Assembly Language Basics. Assembly language and command structure

1. PC architecture

1.1. Registers

1.1.1. General-purpose registers

1.1.2. Segment registers

1.1.3 Flag register

1.2. Organization of memory.

1.3. Data presentation.

1.3.1 Data types

1.3.2 Representation of characters and strings

2. Program statements in assembler

2.1. Assembly language commands

2.2. Addressing modes and machine instruction formats

3. Pseudo-operators

3.1 Data definition directives

3.2 Structure of an assembler program

3.2.1 Program segments. assume directive

3.2.3 Simplified segmentation directive

4. Assembling and linking the program

5. Data transfer commands

5.1 General commands

5.2 Stack commands

5.3 I/O commands

5.4 Address transfer commands

5.5 Flag transfer commands

6. Arithmetic commands

6.1 Arithmetic operations on binary integers

6.1.1 Addition and subtraction

6.1.2 Commands to increment and decrement the destination operand by one

6.2 Multiplication and division

6.3 Change of sign

7. Logical operations

8. Shifts and cyclic shifts

9. String operations

10. Logic and organization of programs

10.1 Unconditional branches

10.2 Conditional jumps

10.4 Procedures in assembly language

10.5 INT Interrupts

10.6 System software

10.6.1.1 Reading the keyboard.

10.6.1.2 Displaying characters on the screen

10.6.1.3 Ending programs.

10.6.2.1 Selecting display modes

11. Disk memory

11.2 File allocation table

11.3 Disk I/O operations

11.3.1 Writing a file to disk

11.3.1.1 ASCIIZ data

11.3.1.2 File number

11.3.1.3 Creating a disk file

11.3.2 Reading a disk file

Introduction

Assembly language is a symbolic representation of machine language. At the lowest hardware level, every process in a personal computer (PC) is driven solely by machine language commands (instructions). Without knowledge of assembly language it is impossible to properly solve hardware-related problems, or even problems that merely depend on the hardware, such as speeding up a program.

Assembly language is a convenient notation for the commands issued directly to PC components, and it requires knowledge of the properties and capabilities of the integrated circuit that contains those components, namely the PC microprocessor. Assembly language is thus tied directly to the internal organization of the PC. It is no coincidence that almost all high-level language compilers support access to the assembly level of programming.

Studying assembly language is an essential part of a professional programmer's training. Assembly language programming requires knowledge of PC architecture, which in turn lets you write more efficient programs in other languages and combine them with assembly language programs.

The manual discusses programming in assembly language for computers based on Intel microprocessors.

This tutorial is addressed to everyone who is interested in processor architecture and the basics of programming in Assembly language, primarily to software product developers.

1. PC architecture

Computer architecture is an abstract representation of a computer, which reflects its structural, circuitry and logical organization.

All modern computers have some common and individual architectural properties. Individual properties are unique to a specific computer model.

The concept of computer architecture includes:

    computer block diagram;

    means and methods of access to elements of the computer block diagram;

    set and availability of registers;

    organization and methods of addressing;

    method of presentation and format of computer data;

    a set of computer machine instructions;

    machine instruction formats;

    interrupt handling.

The main elements of computer hardware are the system unit, keyboard, display devices, disk drives, printing devices (printers) and various communications equipment. The system unit consists of a motherboard, a power supply and expansion slots for additional cards. The motherboard contains the microprocessor, read-only memory (ROM), random access memory (RAM) and a coprocessor.

1.1. Registers

Inside the microprocessor, information is held in a group of 32 registers (16 user, 16 system) that are, to one degree or another, available to the programmer. Since the manual is devoted to programming the 8088 through i486 microprocessors, it is most logical to start with the internal registers that are accessible to the user.

User registers are used by the programmer to write programs. These registers include:

    eight 32-bit registers (general-purpose registers) EAX/AX/AH/AL, EBX/BX/BH/BL, ECX/CX/CH/CL, EDX/DX/DH/DL, EBP/BP, ESI/SI, EDI/DI, ESP/SP;

    six 16-bit segment registers: CS, DS, SS, ES, FS, GS;

    status and control registers: EFLAGS/FLAGS flag register, and EIP/IP command pointer register.

Parts of one 32-bit register are separated by slashes. The prefix E (Extended) denotes the full 32-bit register. Bytes are accessed through the register names ending in L (low) and H (high), for example AL and CH, which denote the low and high bytes of the 16-bit part of a register.

1.1.1. General-purpose registers

EAX/AX/AH/AL (accumulator register) – the accumulator. Used in multiplication and division, in I/O operations, and in some string operations.

EBX/BX/BH/BL (base register) – the base register, often used when addressing data in memory.

ECX/CX/CH/CL (count register) – the counter, used to count loop repetitions.

EDX/DX/DH/DL (data register) – the data register, used to store intermediate data. Some instructions require its use.

All registers in this group allow access to their “lower” parts. Only the lower 16-bit and 8-bit parts of these registers can be addressed on their own; the upper 16 bits are not available as independent objects.
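The way the 16-bit and 8-bit names overlay one 32-bit register can be sketched in Python. This is only an illustrative model: the class `Reg32` and its attribute names `e`, `x`, `h` and `l` are hypothetical and merely mirror the EAX/AX/AH/AL naming.

```python
class Reg32:
    """Hypothetical model of one 32-bit general-purpose register such as EAX."""

    def __init__(self, value=0):
        self.e = value & 0xFFFFFFFF          # full 32-bit value (EAX)

    @property
    def x(self):                             # low 16 bits (AX)
        return self.e & 0xFFFF

    @x.setter
    def x(self, value):
        self.e = (self.e & 0xFFFF0000) | (value & 0xFFFF)

    @property
    def h(self):                             # bits 8..15 (AH)
        return (self.e >> 8) & 0xFF

    @h.setter
    def h(self, value):
        self.e = (self.e & 0xFFFF00FF) | ((value & 0xFF) << 8)

    @property
    def l(self):                             # bits 0..7 (AL)
        return self.e & 0xFF

    @l.setter
    def l(self, value):
        self.e = (self.e & 0xFFFFFF00) | (value & 0xFF)


eax = Reg32(0x12345678)
eax.l = 0xFF    # like: mov al, 0FFh
eax.h = 0x00    # like: mov ah, 0
```

Writing to the low parts leaves the upper 16 bits untouched, and the model deliberately offers no separate name for those upper bits, matching the statement that they are not available as independent objects.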

The following registers support the string instructions, which process sequences of 32-, 16- or 8-bit elements one after another:

ESI/SI (source index register) – the source index. Contains the address of the current source element.

EDI/DI (destination index register) – the destination index. Contains the current address in the destination string.

The microprocessor architecture supports a data structure, the stack, at both the hardware and software level. Special instructions and special registers exist for working with the stack. Note that the stack grows towards lower addresses.

ESP/SP (stack pointer register) – the stack pointer. Contains a pointer to the top of the stack in the current stack segment.

EBP/BP (base pointer register) – the stack base pointer. Designed to provide random access to data inside the stack.

1.1.2. Segment registers

The microprocessor software model has six segment registers: CS, SS, DS, ES, GS, FS. They exist because of the specific way Intel microprocessors organize and use RAM: the hardware supports a program structured as a set of segments. The segment registers indicate which segments are accessible at a given moment. The microprocessor supports the following segment types:

    Code segment. Contains the program's instructions. It is accessed through the CS register (code segment register), which holds the address of the segment of machine instructions that the microprocessor has access to.

    Data segment. Contains the data processed by the program. It is accessed through the DS register (data segment register), which stores the address of the current program's data segment.

    Stack segment. This segment is an area of memory called the stack. The microprocessor organizes the stack on the last in, first out principle. It is accessed through the SS register (stack segment register), which contains the address of the stack segment.

    Additional data segments. The processed data is assumed by default to be in the data segment, but it can also be placed in three additional data segments. When additional data segments are used, their addresses must be specified explicitly with special segment override prefixes in the instruction. The addresses of the additional data segments must be held in the ES, GS and FS registers (extension data segment registers).

Control and status registers

The microprocessor contains several registers that hold information about the state of both the microprocessor itself and the program whose instructions are currently loaded into the pipeline. These are:

    the instruction pointer register EIP/IP;

    the flag register EFLAGS/FLAGS.

Using these registers, you can obtain information about the results of instruction execution and influence the state of the microprocessor itself.

EIP/IP (instruction pointer register) – the instruction pointer. It is 32/16 bits wide and holds the offset of the next instruction to be executed within the current code segment.

EFLAGS/FLAGS (flag register) – the flag register. It is 32/16 bits wide. Individual bits of this register have a specific functional purpose and are called flags. A flag is a bit that takes the value 1 ("flag set") if some condition is met, and the value 0 ("flag cleared") otherwise. The low part of this register is completely analogous to the FLAGS register of the i8086.

1.1.3 Flag register

The flags register is 32-bit and is named EFLAGS (Fig. 1). Individual bits of the register have a specific functional purpose and are called flags. Each of them is assigned a specific name (ZF, CF, etc.). The lower 16 bits of EFLAGS represent the 16-bit FLAGS register used when executing programs written for the i8086 and i286 microprocessors.

Fig.1 Flag Register

Some flags are commonly called condition flags; they change automatically when instructions are executed and record certain properties of the result (for example, whether it is equal to zero). Other flags are called state flags; they are changed by the program and influence the further behavior of the processor (for example, they block interrupts).

Condition flags:

CF (carry flag) - the carry flag. Set to 1 if adding integers produces a carry out of the most significant bit, or if, when subtracting unsigned numbers, the first is less than the second. In shift instructions, the bit shifted out of the operand is placed in CF. CF also records particular outcomes of the multiplication instructions.

OF (overflow flag) - the overflow flag. Set to 1 if adding or subtracting signed integers produces a result whose magnitude exceeds the permissible range (the value overflows into the sign bit).

ZF (zero flag) - the zero flag. Set to 1 if the result of the instruction is 0.

SF (sign flag) - the sign flag. Set to 1 if an operation on signed numbers produces a negative result.

PF (parity flag) - the parity flag. Equal to 1 if the result of the instruction contains an even number of binary ones. Usually taken into account only for I/O operations.

AF (auxiliary carry flag) - the auxiliary carry flag. Records particular outcomes of operations on binary-coded decimal (BCD) numbers.
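As an illustration of how the condition flags follow from a result, here is a minimal Python sketch of an 8-bit addition. The function name `add8_flags` is hypothetical, and the model covers only the six flags listed above for an ADD on byte operands.

```python
def add8_flags(a, b):
    """Add two 8-bit values and return (result, flags) as an 8-bit ADD would set them."""
    a &= 0xFF
    b &= 0xFF
    total = a + b
    result = total & 0xFF
    flags = {
        "CF": int(total > 0xFF),                               # carry out of bit 7
        "ZF": int(result == 0),                                # result is zero
        "SF": result >> 7,                                     # copy of the sign bit
        "OF": int(((a ^ result) & (b ^ result) & 0x80) != 0),  # signed overflow
        "PF": int(bin(result).count("1") % 2 == 0),            # even number of ones
        "AF": int((a & 0x0F) + (b & 0x0F) > 0x0F),             # carry out of bit 3
    }
    return result, flags
```

For example, 7Fh + 01h gives 80h with OF=1 and SF=1 (the signed result no longer fits), while FFh + 01h gives 00h with CF=1 and ZF=1.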

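State flags:

DF (direction flag) - the direction flag. Sets the direction in which string instructions scan strings: with DF=0 strings are processed forward (from lower to higher addresses), with DF=1 in the reverse direction.

IOPL (I/O privilege level) - the I/O privilege level. Used in protected mode to control access to I/O instructions, depending on the privilege of the task.

NT (nested task) – the task nesting flag. Used in protected mode to record the fact that one task is nested within another.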

System flags:

IF (interrupt flag) - the interrupt flag. When IF=0, the processor stops responding to incoming interrupts; when IF=1, interrupts are enabled again.

TF (trap flag) - the trace flag. When TF=1, the processor generates an interrupt (number 1) after executing each instruction, which can be used to trace a program during debugging.

RF (resume flag) – the resume flag. Used when processing interrupts from the debug registers.

VM (virtual 8086 mode) – the virtual 8086 mode flag. 1: the processor operates in virtual 8086 mode; 0: the processor operates in real or protected mode.

AC (alignment check) – the alignment check flag. Enables alignment checking when accessing memory.

1.2. Organization of memory

The physical memory that the microprocessor has access to is called random access memory (RAM). RAM is a sequence of bytes, each with its own unique address (its number), called the physical address.

The range of physical address values is from 0 to 4 GB. The memory management mechanism is implemented entirely in hardware.

The microprocessor hardware supports several models of using RAM:

    Segmented model. In this model, memory for programs is divided into contiguous areas (segments), and the program itself can access only the data located in these segments.

    Paged model. In this case, RAM is treated as a set of fixed-size 4 KB blocks. This model is mainly used to implement virtual memory, which lets programs use more memory space than the amount of physical memory. For a Pentium microprocessor, the size of possible virtual memory can reach 4 TB.

The use and implementation of these models depends on the operating mode of the microprocessor:

    Real address mode (real mode). This mode resembles the operation of the i8086 processor. It is needed to run programs developed for early processor models.

    Protected mode. Provides memory protection using a four-level privilege mechanism, together with paged memory organization.

    Virtual 8086 mode. In this mode, several i8086 programs can run at once, and real-mode programs can operate as well.

Segmentation is an addressing mechanism that provides several independent address spaces. A segment is an independent, hardware-supported block of memory. A program can generally consist of any number of segments, but it has direct access to three main ones (code, data and stack) and to between one and three additional data segments. The operating system places program segments in RAM at specific physical addresses and then loads those addresses into the appropriate registers. Within a segment, the program addresses memory linearly relative to the beginning of the segment, that is, starting from address 0 and ending with an address equal to the size of the segment. The relative address, or offset, that the microprocessor uses to access data within a segment is called the effective address.

Formation of a physical address in real mode

In real mode, the physical address ranges from 0 to 1 MB. The maximum segment size is 64 KB. A specific physical address in RAM is determined by the address of the beginning of the segment and the offset within the segment. The segment start address is taken from the corresponding segment register, which contains only the most significant 16 bits of the physical address of the beginning of the segment. The missing low four bits of the 20-bit address are obtained by shifting the value of the segment register left by 4 bits. The shift is performed in hardware. The resulting 20-bit value is the real physical address of the beginning of the segment.

An address is thus specified as a “segment:offset” pair, where “segment” is the upper 16 bits of the starting address of the memory segment to which the cell belongs, and “offset” is the 16-bit address of the cell counted from the beginning of that segment (the value 16 * segment + offset gives the absolute address of the cell). If, for example, the CS register holds 1234h, then the address pair 1234h:507h defines the absolute address 16*1234h + 507h = 12340h + 507h = 12847h. Such a pair is stored as a double word and, as with numbers, in “inverted” form: the first word contains the offset and the second the segment, and each of these words is in turn stored in “inverted” form (low byte first). For example, the pair 1234h:5678h would be stored as | 78 | 56 | 34 | 12 |.
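The real-mode address calculation and the in-memory layout of a segment:offset pair can be checked with a short Python sketch. The function names `physical_address` and `far_pointer_bytes` are hypothetical, and the model does not reproduce the wrap-around of addresses above 1 MB on the i8086.

```python
def physical_address(segment, offset):
    """Real-mode physical address: segment shifted left by 4 bits, plus offset."""
    return ((segment & 0xFFFF) << 4) + (offset & 0xFFFF)


def far_pointer_bytes(segment, offset):
    """Bytes of a segment:offset pair as stored in memory.

    The offset word comes first, then the segment word; each word is
    stored low byte first.
    """
    return offset.to_bytes(2, "little") + segment.to_bytes(2, "little")
```

With the values from the text, `physical_address(0x1234, 0x507)` evaluates to 12847h, and `far_pointer_bytes(0x1234, 0x5678)` yields the byte sequence 78 56 34 12.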

This mechanism for generating a physical address makes it possible to make the software relocatable, that is, independent of specific loading addresses in RAM.

General information about assembly language

Symbolic assembly language can largely eliminate the disadvantages of machine language programming.

Its main advantage is that in assembly language all program elements are presented in symbolic form. Converting symbolic command names into their binary codes is entrusted to a special program, the assembler, which frees the programmer from labor-intensive work and eliminates the errors that would otherwise be inevitable.

Symbolic names introduced when programming in assembly language usually reflect the semantics of the program, and command abbreviations reflect their main function. For example: PARAM - parameter, TABLE - table, MASK - mask, ADD - addition, SUB - subtraction, etc. Such names are easy for a programmer to remember.

Programming in assembly language requires more complex tools than programming in machine language: you need a computer system based on a microcomputer or PC with a set of peripheral devices (alphanumeric keyboard, character display, floppy drive and printer), as well as resident or cross-programming systems for the required types of microprocessors. Assembly language lets you write and debug much more complex programs effectively than machine language does (up to 1 - 4 KB).

Assembly languages are machine-oriented, i.e., dependent on the machine language and structure of the corresponding microprocessor, since each microprocessor instruction is assigned a specific symbolic name.

Assembly languages provide a significant increase in programmer productivity compared to machine languages while retaining the ability to use all the hardware resources of the microprocessor that are accessible to software. This enables skilled programmers to write programs that run in less time and occupy less memory than programs written in a high-level language.

In this regard, almost all programs for controlling input/output devices (drivers) are written in assembly language, despite the presence of a fairly large range of high-level languages.

Assembly language lets the programmer specify the following:

mnemonics (symbolic names) for each microprocessor machine language command;

a standard format for lines of a program written in assembly language;

formats for specifying the various addressing modes and command variants;

formats for specifying character constants and integer constants in various number systems;

pseudo-commands that control the process of assembling (translating) a program.

In assembly language, a program is written line by line, that is, one line is allocated for each command.

For microcomputers built on the most common types of microprocessors, several variants of assembly language may exist, but usually one is widely used in practice: the so-called standard assembly language.

Programming at the machine instruction level is the minimum level at which programs can be written. The system of machine instructions must be sufficient to implement the required actions by issuing instructions to the computer hardware.

Each machine command consists of two parts:

· operation part - determines “what to do”;

· operand part - defines the objects being processed, “what to do it with”.

The microprocessor machine command, written in assembly language, is one line with the following syntactic form:

label command/directive operand(s) ;comments

In this case, the required field in the line is a command or directive.

The label, command/directive, and operands (if any) are separated by at least one space or tab character.

If a command or directive needs to be continued on the next line, the backslash character is used: \.

By default, assembly language does not distinguish between uppercase and lowercase letters when writing commands or directives.

Direct addressing: The effective address is determined directly by the offset field of the machine instruction, which can be 8, 16, or 32 bits in size.

mov eax, sum ; eax = sum

The assembler replaces sum with the corresponding address stored in the data segment (addressed by the ds register by default) and places the value stored at sum in the eax register.

Indirect addressing in turn has the following types:

· indirect base (register) addressing;

· indirect base (register) addressing with offset;

· indirect index addressing;

· indirect base-index addressing.

Indirect base (register) addressing. With this addressing mode, the effective address of the operand can be held in any of the general-purpose registers except sp/esp and bp/ebp (these are specific registers for working with the stack segment). Syntactically, this addressing mode is expressed by enclosing the register name in square brackets.

mov eax, [esi] ; eax = *esi, the value at address esi
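The difference between direct and indirect register addressing can be modelled in Python with a byte array standing in for the data segment. Everything here is an assumption made for illustration: the 64-byte `memory`, the offset `SUM` chosen for the variable sum, the value 1000 stored there, and the `regs` dictionary.

```python
import struct

memory = bytearray(64)                      # hypothetical flat data segment
SUM = 16                                    # assumed offset of the variable `sum`
struct.pack_into("<I", memory, SUM, 1000)   # like: sum dd 1000


def load32(address):
    """Read a 32-bit little-endian value from the data segment."""
    return struct.unpack_from("<I", memory, address)[0]


regs = {"eax": 0, "esi": 0}

regs["eax"] = load32(SUM)            # direct:   mov eax, sum
direct_value = regs["eax"]

regs["esi"] = SUM                    # esi now holds the address of sum
regs["eax"] = load32(regs["esi"])    # indirect: mov eax, [esi]
indirect_value = regs["eax"]
```

Both loads fetch the same value; the only difference is whether the effective address comes from the instruction itself or from a register.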

Commands can be grouped by purpose (examples of mnemonic operation codes of IBM PC assembler commands are given in parentheses):

· arithmetic operations (ADD and ADC - addition and addition with carry, SUB and SBB - subtraction and subtraction with borrow, MUL and IMUL - unsigned and signed multiplication, DIV and IDIV - unsigned and signed division, CMP - comparison, etc.);

· logical operations (OR, AND, NOT, XOR, TEST, etc.);

· data transfer (MOV - move, XCHG - exchange, IN - input to the microprocessor, OUT - output from the microprocessor, etc.);

· transfer of control (program branches: JMP - unconditional jump, CALL - procedure call, RET - return from a procedure, J* - conditional jumps, LOOP - loop control, etc.);

· character string processing (MOVS - move, CMPS - compare, LODS - load, SCAS - scan; these commands are usually used with the REP prefix (repetition modifier));

· program interrupts (INT - software interrupt, INTO - conditional interrupt on overflow, IRET - return from interrupt);

· microprocessor control (ST* and CL* - setting and clearing flags, HLT - halt, WAIT - wait, NOP - no operation, etc.).

A full list of assembly commands can be found in the literature.

Data Transfer Commands

· MOV dst, src - data transfer (move - copy from src to dst).

Transfers one byte (if src and dst are in byte format) or one word (if src and dst are in word format) between registers or between a register and memory, and can also write an immediate value to a register or memory.

The operands dst and src must have the same format - byte or word.

Src can be of the following types: r (register) - register, m (memory) - memory, i (immediate) - immediate value. Dst can be of type r or m. The following operand combinations cannot be used in one command: rsegm (a segment register) together with i; two operands of type m; two operands of type rsegm. The i operand can also be a simple expression:

mov AX, (152 + 101B) / 15

The expression is evaluated only during translation. MOV does not change the flags.

· PUSH src - push a word onto the stack (push - push from src onto the stack). Places the contents of src - any 16-bit register (including a segment register) or two memory cells containing a 16-bit word - on top of the stack. The flags do not change.

· POP dst - pop a word from the stack (pop - read from the stack into dst). Removes a word from the top of the stack and places it in dst - any 16-bit register (including a segment register) or two memory cells. The flags do not change.
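The effect of PUSH and POP on the stack pointer can be sketched as follows. The class `Stack16` is a hypothetical model of a small stack segment; it performs no bounds checking and stores each word low byte first.

```python
class Stack16:
    """Hypothetical model of a real-mode stack segment addressed through SP."""

    def __init__(self, size=16):
        self.mem = bytearray(size)
        self.sp = size                # empty stack: SP points just past the end

    def push(self, word):
        self.sp -= 2                  # the stack grows towards lower addresses
        self.mem[self.sp:self.sp + 2] = (word & 0xFFFF).to_bytes(2, "little")

    def pop(self):
        word = int.from_bytes(self.mem[self.sp:self.sp + 2], "little")
        self.sp += 2                  # the popped word is left behind in memory
        return word
```

Pushing two words and popping them returns them in reverse order, and SP ends up where it started.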

Assembly language commands (Lecture)

LECTURE PLAN

1. Main groups of operations.

2. Mnemonic codes of Pentium processor commands.

1. Main groups of operations

Microprocessors execute a set of commands that implement the following main groups of operations:

Data transfer operations;

Arithmetic operations;

Logical operations;

Shift operations;

Comparison and testing operations;

Bit operations;

Program control operations;

Processor control operations.

2. Mnemonic codes of Pentium processor commands

Commands are usually described by their mnemonic designations (mnemonic codes), which are used to specify the command when programming in assembly language. The mnemonic codes of some commands may differ between assemblers. For example, the subroutine call command uses the mnemonic code CALL or JSR (“Jump to SubRoutine”). However, the mnemonic codes of most commands are the same or differ only slightly across the main types of microprocessors, since they are abbreviations of the English words that describe the operation performed. Let us look at the command mnemonic codes adopted for Pentium processors.

Data transfer commands. The main command of this group is MOV, which transfers data between two registers or between a register and a memory cell. Some microprocessors implement transfers between two memory cells, as well as block transfers of the contents of several registers to or from memory. For example, microprocessors of the Motorola 68xxx family execute the MOVE command, which transfers data from one memory cell to another, and the MOVEM command, which writes to memory or loads from memory the contents of a specified set of registers (up to 16 registers). The XCHG command exchanges the contents of two processor registers or of a register and a memory cell.

The input command IN and the output command OUT receive data from an external device into a register or send data from a processor register to an external device. These commands specify the number of the interface device (input/output port) through which the data is transferred. Note that many microprocessors have no special instructions for accessing external devices. In that case, input and output are performed with the MOV command, which specifies the address of the required interface device. The external device is thus addressed as a memory cell, and a section of the address space is set aside for the addresses of the interface devices (ports) connected to the system.

Arithmetic commands. The main commands in this group are addition, subtraction, multiplication and division, each of which has a number of variants. The addition command ADD and the subtraction command SUB operate on the contents of two registers, a register and a memory location, or use an immediate operand. The ADC and SBB commands perform addition and subtraction taking into account the value of the carry flag C, which is set when a carry is produced by the previous operation. These commands make it possible to add, step by step, operands that have more bits than the processor word. The NEG command changes the sign of the operand by taking its two's complement.
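The step-by-step addition of operands wider than the processor word can be sketched in Python. The function names are hypothetical; the model assumes 16-bit steps and shows a 32-bit sum built from an ADD on the low words followed by an ADC on the high words.

```python
def add16(a, b, carry_in=0):
    """One 16-bit ADD/ADC step: returns (result, carry_out)."""
    total = (a & 0xFFFF) + (b & 0xFFFF) + carry_in
    return total & 0xFFFF, total >> 16


def add32_with_16bit_steps(x, y):
    """Add two 32-bit numbers using two 16-bit steps."""
    low, carry = add16(x & 0xFFFF, y & 0xFFFF)       # like: add ax, bx
    high, carry = add16(x >> 16, y >> 16, carry)     # like: adc dx, cx
    return (high << 16) | low, carry
```

The carry produced by the low-word addition is exactly what ADC consumes in the second step; the final carry reports overflow of the whole 32-bit sum.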

Multiplication and division can be performed on signed numbers (the IMUL and IDIV commands) or unsigned numbers (the MUL and DIV commands). One of the operands is always located in a register; the second can be in a register, a memory cell, or be an immediate operand. The result is placed in a register. When multiplying (MUL, IMUL), the result is double-width and occupies two registers. When dividing (DIV, IDIV), the dividend is a double-width operand held in two registers, and the quotient and remainder are written to two registers.
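A minimal Python sketch of the double-width behaviour, assuming the 16-bit unsigned forms in which the register pair DX:AX holds the product or the dividend. The function names `mul16` and `div16` are hypothetical.

```python
def mul16(ax, operand):
    """Unsigned MUL with a 16-bit operand: returns the 32-bit product as (DX, AX)."""
    product = (ax & 0xFFFF) * (operand & 0xFFFF)
    return product >> 16, product & 0xFFFF


def div16(dx, ax, operand):
    """Unsigned DIV with a 16-bit operand: DX:AX / operand -> (quotient, remainder)."""
    dividend = ((dx & 0xFFFF) << 16) | (ax & 0xFFFF)
    quotient, remainder = divmod(dividend, operand & 0xFFFF)
    if quotient > 0xFFFF:
        # the processor signals a divide error when the quotient does not fit
        raise OverflowError("quotient does not fit in 16 bits")
    return quotient, remainder
```

Here the quotient corresponds to AX and the remainder to DX after the division; a zero divisor raises Python's ZeroDivisionError, which stands in for the processor's divide error.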

Logical commands. Almost all microprocessors perform the logical operations AND, OR and exclusive OR on the corresponding bits of the operands using the AND, OR and XOR commands. The operations are performed on the contents of two registers, a register and a memory location, or use an immediate operand. The NOT command inverts the value of each bit of the operand.

Shift commands. Microprocessors perform arithmetic, logical and cyclic shifts of the addressed operands by one or more bits. The operand to be shifted can be in a register or memory location, and the shift count is given by an immediate operand contained in the instruction or by the contents of a specified register. The carry flag C in the status register (SR or EFLAGS) usually takes part in the shift: it receives the last bit of the operand shifted out of the register or memory cell.
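The three kinds of shift and the role of the carry flag can be illustrated on 8-bit values. The function names below are hypothetical stand-ins for a logical left shift, an arithmetic right shift and a cyclic left shift; the sketch assumes a shift count of at least 1.

```python
def shl8(value, count=1):
    """Logical shift left; CF receives the last bit shifted out (count >= 1)."""
    value &= 0xFF
    cf = 0
    for _ in range(count):
        cf = (value >> 7) & 1
        value = (value << 1) & 0xFF
    return value, cf


def sar8(value, count=1):
    """Arithmetic shift right; the sign bit is preserved, CF gets the bit shifted out."""
    value &= 0xFF
    cf = 0
    for _ in range(count):
        cf = value & 1
        value = (value >> 1) | (value & 0x80)
    return value, cf


def rol8(value, count=1):
    """Cyclic shift left; the bit leaving bit 7 re-enters at bit 0 and is copied to CF."""
    value &= 0xFF
    cf = 0
    for _ in range(count):
        cf = (value >> 7) & 1
        value = ((value << 1) & 0xFF) | cf
    return value, cf
```

For the operand 10000001b, the logical shift discards the top bit into CF, while the cyclic shift also feeds it back into bit 0.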

Comparison and testing commands. Operands are usually compared with the CMP command, which subtracts the operands and sets the flags N, Z, V, C in the status register according to the result. The result of the subtraction is not saved, and the operands do not change. Subsequent analysis of the flag values makes it possible to determine the relative value (>, <, =) of signed or unsigned operands. The various addressing modes make it possible to compare the contents of two registers, a register and a memory cell, or an immediate operand with the contents of a register or memory cell.
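How the same flags answer both the signed and the unsigned question can be sketched in Python for 8-bit operands. The function names are hypothetical; the flags are named as on x86, where SF, ZF, OF and CF play the roles of N, Z, V and C.

```python
def cmp8(a, b):
    """Model of comparing a with b on 8-bit operands: subtract, keep only the flags."""
    a &= 0xFF
    b &= 0xFF
    result = (a - b) & 0xFF
    return {
        "CF": int(a < b),                                  # borrow: unsigned a < b
        "ZF": int(result == 0),                            # operands are equal
        "SF": result >> 7,                                 # sign bit of the difference
        "OF": int(((a ^ b) & (a ^ result) & 0x80) != 0),   # signed overflow
    }


def unsigned_below(flags):
    """Condition for an unsigned 'less than' (as tested by JB)."""
    return flags["CF"] == 1


def signed_less(flags):
    """Condition for a signed 'less than' (as tested by JL)."""
    return flags["SF"] != flags["OF"]
```

Comparing FFh with 01h shows the difference: as unsigned numbers 255 is not below 1, but as signed numbers -1 is less than 1.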

Some microprocessors execute the test command TST, which is a single-operand version of the compare instruction. This command sets the flags N and Z according to the sign and value (zero or non-zero) of the addressed operand.

Bit operation commands. These commands set the carry flag C in the status register to the value of the tested bit bn of the addressed operand. In some microprocessors, the result of the bit test sets the flag Z instead. The number n of the tested bit is given either by the contents of a register specified in the command or by an immediate operand.

The commands of this group implement different options for changing the tested bit. The BT command leaves the value of this bit unchanged. After the test, the BTS command sets bn=1 and the BTR command sets bn=0. The BTC command inverts the value of bit bn after testing it.
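The four x86 bit-test commands BT, BTS, BTR and BTC differ only in what happens to the bit after it has been copied to the carry flag. A minimal Python sketch, with the hypothetical helper `bit_test` operating on a non-negative integer:

```python
def bit_test(value, n, op="BT"):
    """Return (new_value, CF) for BT, BTS, BTR or BTC applied to bit n of value."""
    cf = (value >> n) & 1        # every variant copies the tested bit into CF
    mask = 1 << n
    if op == "BTS":
        value |= mask            # set the bit
    elif op == "BTR":
        value &= ~mask           # reset (clear) the bit
    elif op == "BTC":
        value ^= mask            # complement (invert) the bit
    elif op != "BT":
        raise ValueError(f"unknown instruction: {op}")
    return value, cf
```

In every case CF reports the value the bit had before the command modified it.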

Program control operations. A large number of commands are used to control the program, among which are:

- unconditional control transfer commands;

- conditional jump commands;

- commands for organizing program loops;

- interrupt commands;

- flag-changing commands.

Unconditional transfer of control is performed by the JMP command, which loads the program counter PC with new content: the address of the next command to be executed. This address is either specified directly in the JMP command (direct addressing) or calculated as the sum of the current contents of PC and an offset specified in the command, which is a signed number (relative addressing). Since PC contains the address of the next program command, the latter method specifies a jump address that is offset from that next address by the given number of bytes. With a positive offset, control passes to later commands of the program; with a negative offset, to earlier ones.
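The relative form of the jump can be checked with a short calculation. This sketch assumes a 16-bit program counter and a one-byte signed offset; the function names and the example addresses are hypothetical.

```python
def short_displacement(byte):
    """Interpret one byte as a signed 8-bit offset."""
    return byte - 0x100 if byte & 0x80 else byte


def relative_jump_target(instruction_address, instruction_length, displacement, bits=16):
    """Target of a relative jump: address of the next command plus the signed offset."""
    next_address = instruction_address + instruction_length
    return (next_address + displacement) & ((1 << bits) - 1)
```

A two-byte jump located at 0100h with the offset byte FEh (that is, -2) targets 0100h again, so it jumps to itself; with the offset byte 05h it targets 0107h.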

A subroutine is also called by unconditionally transferring control using the commandCALL (or JSR ). However, in this case, before loading intoPC new content that specifies the address of the first command of the subroutine, it is necessary to save its current value (the address of the next command) in order to ensure a return to the main program after execution of the subroutine (or to the previous subroutine when nesting subroutines). Conditional jump commands (program branches) load intoPCnew content if certain conditions are met, which are usually set according to the current value of various attributes in the status register. If the condition is not met, then the next program command is executed.

Feature control commands provide writing - reading the contents of the status register in which features are stored, as well as changing the values ​​of individual features. For example, Pentium processors implement the commands LAHF And SAHF , which load the low byte, which contains the signs, from the status register EFLAG to the low byte of the register EAX and padding the low byte EFLAGS from register E AX.. Teams CLC, STC carry out setting the values ​​of the transfer sign CF=0, CF=1, and the command CMC causes the value of this attribute to be inverted. Since attributes determine the flow of program execution during conditional transitions, attribute change commands are usually used to control the program.

Processor control commands. This group includes halt commands, no-operation commands, and a number of commands that determine the operating mode of the processor or of its individual blocks. The HLT command stops program execution and puts the processor into a halted state, which it leaves when an interrupt or a restart signal (Reset) is received. The NOP ("empty") command, which causes no operation to be performed, is used to implement program delays or to fill gaps that arise in the program.

The special commands CLI and STI disable and enable the servicing of interrupt requests. Pentium processors use the control bit (flag) IF in the EFLAGS register for this purpose.

Many modern microprocessors provide an identification command that allows the user or other devices to obtain information about the type of processor used in a given system. In Pentium processors this is the CPUID command; when it executes, the relevant data about the processor is placed in the EAX, EBX, ECX and EDX registers, where it can be read by the user or the operating system.

Depending on the operating modes implemented by the processor and the specified types of data being processed, the set of executed commands can be significantly expanded.

Some processors perform arithmetic operations on binary-coded decimal numbers or execute special instructions to correct the result when processing such numbers. Many high-performance processors include an FPU, a unit for processing floating-point numbers.

A number of modern processors process several integers or floating-point numbers as a group with a single command, according to the SIMD principle ("Single Instruction, Multiple Data"). Executing an operation on several operands simultaneously significantly improves processor performance when working with video and audio data; such operations are widely used in image processing, audio signal processing and other applications. To perform them, processors include special blocks that implement the corresponding instruction sets, which in various processor types (Pentium, Athlon) are named MMX ("Multi-Media Extension"), SSE ("Streaming SIMD Extension") and "3D Extension".

A characteristic feature of Intel processors, starting with the 80286 model, is priority control of memory access, which is provided when the processor operates in protected virtual address mode ("Protected Mode"). This mode is implemented with special groups of commands that organize memory protection in accordance with the adopted priority access algorithm.

NATIONAL UNIVERSITY OF UZBEKISTAN NAMED AFTER MIRZO ULUGBEK

FACULTY OF COMPUTER TECHNOLOGY

On the topic: Semantic parsing of an EXE file.

Completed:

Tashkent 2003.

Preface.

Assembly language and command structure.

EXE file structure (semantic parsing).

COM file structure.

The principle of action and spread of the virus.

Disassembler.

Programs.

Preface

The profession of programmer is remarkable and unique. Today it is impossible to imagine science or everyday life without modern technology, and almost nothing in human activity can be done without computers. This drives the rapid development and refinement of computer technology. Although personal computers began to develop not so long ago, colossal progress has been made in software since then, and these products will remain in wide use for a long time. Computer-related knowledge has grown explosively, as has the corresponding technology. Setting the commercial side aside, we can say that there are no outsiders in this field of professional activity: many people develop programs not for profit or income but of their own free will, out of passion. Of course, this should not affect the quality of a program; in this business there is, so to speak, competition and a demand for quality execution, stable operation and compliance with all modern requirements.

It is also worth noting the appearance of microprocessors in the early 1970s, which replaced large numbers of discrete components. There are several types of microprocessor, which differ greatly from one another in their bit width and built-in instruction sets. The most common come from manufacturers such as Intel, IBM and AMD (the Celeron, for example), and all of these processors are related to the Intel processor architecture.

The spread of microcomputers caused attitudes towards assembly language to be reconsidered, for two main reasons. First, programs written in assembly language require significantly less memory and execution time. Second, knowledge of assembly language and of the resulting machine code gives an understanding of the machine's architecture that working in a high-level language is unlikely to provide.

Although most software professionals develop in high-level languages such as Pascal, C or Delphi, in which programs are easier to write, the most powerful and efficient software is written entirely or partially in assembly language. High-level languages were developed to avoid dependence on the technical details of particular computers. Assembly language, in turn, is designed around the specifics of a particular processor. Therefore, to write an assembly language program for a specific computer, you must know its architecture.

These days the main form of software product is the EXE file. In principle, this lets the author of a program be confident in its integrity, but often this is far from the case, because disassemblers also exist. Using a disassembler, you can find out which interrupts a program uses and recover its code. A person well versed in assembler will have no difficulty reworking the entire program to his taste. Perhaps this is where the most intractable problem arises: the virus. Why do people write viruses? Some ask this question with surprise, some with anger, but there continue to be people who are interested in the task not in order to cause harm but out of interest in system programming. Viruses are written for various reasons: some people like system calls, others want to improve their knowledge of assembler. I will try to explain all this in my course work, which covers not only the structure of the EXE file but also assembly language.

Assembly Language.

It is interesting to follow, from the time of the appearance of the first computers to the present day, the transformation of programmers’ ideas about assembly language.

Once upon a time, assembly was a language without which you could not make a computer do anything useful. Gradually the situation changed: more convenient means of communicating with a computer appeared. But, unlike other languages, assembler did not die; moreover, in principle it could not. Why? In search of an answer, let us try to understand what assembly language is in general.

In short, assembly language is a symbolic representation of machine language. All processes in a machine at the lowest hardware level are driven only by machine language commands (instructions). From this it is clear that, despite the common name, the assembly language is different for each type of computer. This applies to both the appearance of programs written in assembly language and the ideas that this language is a reflection of.

It is impossible to truly solve problems related to hardware (or even, moreover, dependent on hardware, such as increasing the speed of a program), without knowledge of assembler.

A programmer or any other user can use any high-level tools, even programs for constructing virtual worlds, and perhaps not even suspect that the computer does not actually execute the commands of the language in which the program is written, but rather their transformed representation: a dull, monotonous sequence of commands from a completely different language, machine language. Now let us imagine that such a user has a non-standard problem, or that something simply does not work. For example, the program must work with some unusual device or perform other actions that require knowledge of how computer hardware operates. No matter how smart the programmer is, and no matter how good the language in which the program is written, he cannot do without knowledge of assembler. It is no coincidence that almost all high-level language compilers provide means of linking their modules with assembler modules or support access to the assembly level of programming.

Of course, the time of computer generalists has already passed. As they say, you cannot embrace the immensity. But there is something in common, a kind of foundation on which any serious computer education is built. This is knowledge about the principles of computer operation, its architecture and assembly language as a reflection and embodiment of this knowledge.

A typical modern computer (i486 or Pentium based) consists of the following components (Figure 1).

Fig. 1. Computer and peripherals

Fig. 2. Block diagram of a personal computer

From the figure (Figure 1) it is clear that the computer is made up of several physical devices, each of which is connected to one unit, called the system unit. If we think logically, it is clear that it plays the role of some kind of coordinating device. Let's look inside the system unit (no need to try to get inside the monitor - there is nothing interesting there, and besides, it is dangerous): open the case and see some boards, blocks, connecting wires. To understand their functional purpose, let's look at the block diagram of a typical computer (Fig. 2). It does not claim absolute accuracy and is intended only to show the purpose, interconnection and typical composition of the elements of a modern personal computer.

Let's discuss the diagram in Fig. 2 in a somewhat unconventional style.
It is common for a person, when encountering something new, to look for some associations that can help him understand the unknown. What associations does the computer evoke? For example, I often associate a computer with the person himself. Why?

When a person created a computer, somewhere deep inside himself he thought that he was creating something similar to himself. The computer has organs for receiving information from the outside world - a keyboard, a mouse, and magnetic disk drives. In Fig. 2 these organs are located to the right of the system buses. The computer has organs that “digest” the information received - these are the central processor and RAM. And finally, the computer has speech organs that produce the results of processing. These are also some of the devices on the right.

Modern computers, of course, are far from being human. They can be compared to creatures that interact with the outside world at the level of a large but limited set of unconditioned reflexes.
This set of reflexes forms a system of machine commands. No matter how high a level you communicate with a computer, it ultimately comes down to a boring and monotonous sequence of machine commands.
Each machine command is a kind of stimulus to excite one or another unconditioned reflex. The reaction to this stimulus is always unambiguous and “hardwired” in the microcommand block in the form of a microprogram. This microprogram implements actions to implement a machine command, but at the level of signals supplied to certain logical circuits of the computer, thereby controlling various subsystems of the computer. This is the so-called principle of microprogram control.

Continuing the analogy with a person, we note that, so that the computer can eat properly, many operating systems, compilers for hundreds of programming languages, and so on have been invented. But all of them are, in fact, just a platter on which food (programs) is delivered according to certain rules to the stomach (the computer). Only the computer's stomach prefers a monotonous diet: give it structured information in the form of strictly organized sequences of zeros and ones, the combinations of which make up machine language.

Thus, although outwardly a polyglot, the computer understands only one language - the language of machine instructions. Of course, to communicate and work with a computer, it is not necessary to know this language, but almost any professional programmer sooner or later is faced with the need to study it. Fortunately, the programmer does not have to try to comprehend the meaning of various combinations of binary numbers, since back in the 50s, programmers began to use a symbolic analogue of machine language for programming, which was called assembly language. This language accurately reflects all the features of machine language. That is why, unlike high-level languages, assembly language is different for each type of computer.

From all of the above, we can conclude that, since assembly language is "native" to a computer, the most efficient program can only be written in it (provided that it is written by a qualified programmer). There is one small "but" here: this is a very labor-intensive process that requires a lot of attention and practical experience. Therefore, in practice, assembler is used mainly for programs that must work efficiently with the hardware. Sometimes sections of a program that are critical in terms of execution time or memory consumption are written in assembler; they are then packaged as subroutines and combined with code in a high-level language.

It makes sense to start learning the assembly language of any computer only after finding out which part of the computer remains visible and accessible for programming in this language. This is the so-called programming model of the computer, part of which is the programming model of the microprocessor, which contains 32 registers that are, to one degree or another, available for use by the programmer.

These registers can be divided into two large groups:

16 user registers;

16 system registers.

Assembly language programs use registers very intensively. Most registers have a specific functional purpose.

User registers are so called because the programmer can use them when writing programs. These registers include (Fig. 3):

eight 32-bit registers that programmers can use to store data and addresses (also called general purpose registers (GPR)): eax, ebx, ecx, edx, esi, edi, ebp, esp;

six segment registers: cs, ds, ss, es, fs, gs;

status and control registers:

Flags register eflags/flags;

Command pointer register eip/ip.

Fig. 3. User registers of i486 and Pentium microprocessors

Why are many of these registers shown with slashes? These are not different registers; they are parts of one large 32-bit register, and they can be used in a program as separate objects. This was done so that programs written for the earlier 16-bit models of Intel microprocessors, starting with the i8086, would continue to work. The i486 and Pentium microprocessors have mostly 32-bit registers. Their number, with the exception of the segment registers, is the same as in the i8086, but they are wider, which is reflected in their names: they have the prefix e (Extended).

General purpose registers
All registers in this group allow access to their "lower" parts (see Fig. 3). Note in this figure that only the lower 16-bit and 8-bit parts of these registers can be addressed independently. The upper 16 bits of these registers are not available as independent objects. This was done, as noted above, for compatibility with the earlier 16-bit models of Intel microprocessors.
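How the parts of one register overlap can be shown with a short Python model. The class name GeneralRegister and its attribute names (value, word, high, low) are hypothetical and stand for eax, ax, ah and al respectively; this is a sketch of the layout, not of any real API.

```python
class GeneralRegister:
    """Model of one 32-bit general purpose register, for example eax."""

    def __init__(self, value=0):
        self.value = value & 0xFFFFFFFF      # the full register (eax)

    @property
    def word(self):                          # low 16 bits (ax)
        return self.value & 0xFFFF

    @word.setter
    def word(self, new):
        self.value = (self.value & 0xFFFF0000) | (new & 0xFFFF)

    @property
    def high(self):                          # bits 8..15 (ah)
        return (self.value >> 8) & 0xFF

    @high.setter
    def high(self, new):
        self.value = (self.value & 0xFFFF00FF) | ((new & 0xFF) << 8)

    @property
    def low(self):                           # bits 0..7 (al)
        return self.value & 0xFF

    @low.setter
    def low(self, new):
        self.value = (self.value & 0xFFFFFF00) | (new & 0xFF)


eax = GeneralRegister(0x12345678)
eax.high = 0xAB          # writing ah changes only bits 8..15
eax.low = 0xCD           # writing al changes only bits 0..7
print(hex(eax.value))    # 0x1234abcd
eax.word = 0x0001        # writing ax leaves the upper 16 bits intact
print(hex(eax.value))    # 0x12340001
```

The model deliberately has no attribute for the upper 16 bits on their own, which matches the statement that they are not available as an independent object.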

Let us list the registers belonging to the group of general purpose registers. Since these registers are physically located in the microprocessor inside an arithmetic logic unit (ALU), they are also called ALU registers:

eax/ax/ah/al (Accumulator register) - accumulator.
Used to store intermediate data. Some commands require the use of this register;

ebx/bx/bh/bl (Base register) - base register.
Used to store the base address of some object in memory;

ecx/cx/ch/cl (Count register) - counter register.
Used in commands that perform repetitive actions. Its use is often implicit and hidden in the algorithm of the corresponding command.
For example, the loop command, which organizes a loop, not only transfers control to a command located at a certain address but also decrements the ecx/cx register by one and tests its value;

edx/dx/dh/dl (Data register) - data register.
Like the eax/ax/ah/al register, it stores intermediate data. Some commands require its use, and in some commands it is used implicitly.
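The implicit use of the counter register by the loop command can be modelled in Python as follows. The function name run_with_loop is hypothetical, and the sketch assumes the 16-bit form of the command, which works with cx.

```python
def run_with_loop(cx, body):
    """Run `body` the way a block ended by the loop command would run.

    The block executes first; then cx is decremented by one and control
    returns to the start of the block while cx is not zero.
    Returns the number of times the block was executed.
    """
    iterations = 0
    while True:
        body()
        iterations += 1
        cx = (cx - 1) & 0xFFFF   # 16-bit decrement with wrap-around
        if cx == 0:
            break
    return iterations


print(run_with_loop(5, lambda: None))   # 5
print(run_with_loop(0, lambda: None))   # 65536
```

The second call shows a classic pitfall: because the decrement happens before the test, starting with cx equal to zero repeats the block 65536 times rather than zero times.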

The following two registers are used to support so-called chain (string) operations, that is, operations that sequentially process chains of elements, each of which can be 32, 16 or 8 bits long:

esi/si (Source Index register) - source index.
In chain operations this register contains the current address of the element in the source chain;

edi/di (Destination Index register) - destination index.
In chain operations this register contains the current address in the destination chain.

In the microprocessor architecture, a data structure such as a stack is supported at the hardware and software level. To work with the stack, there are special commands in the microprocessor instruction system, and in the microprocessor software model there are special registers for this:

esp/sp (Stack Pointer register) - stack pointer register.
Contains a pointer to the top of the stack in the current stack segment.

ebp/bp (Base Pointer register) - stack frame base pointer register.
Designed to organize random access to data inside the stack.

A stack is a program area for temporary storage of arbitrary data. Of course, data can also be stored in a data segment, but in that case a separate named memory cell must be created for each temporarily stored item, which increases the size of the program and the number of names used. The convenience of the stack is that its area is reusable, and that storing data on the stack and retrieving it from there is done with the efficient push and pop commands without specifying any names.

The stack is traditionally used, for example, to save the contents of registers used by a program before calling a subroutine, which, in turn, will use the processor registers "for its own purposes." The original contents of the registers are popped off the stack after the subroutine returns. Another common technique is to pass a subroutine the parameters it requires via the stack. The subroutine, knowing in what order the parameters were placed on the stack, can take them from there and use them during its execution.

A distinctive feature of the stack is the particular order in which the data it contains is retrieved: at any given time, only the top element of the stack is available, i.e. the element most recently pushed onto it. Popping the top element makes the next element available. Stack elements are located in the memory area allocated for the stack, starting from the bottom of the stack (i.e., from its maximum address) at successively decreasing addresses. The address of the top, accessible element is stored in the stack pointer register SP.

Like any other area of program memory, the stack must be part of some segment or form a separate segment. In either case, the segment address of this segment is placed in the stack segment register SS. Thus, the register pair SS:SP describes the address of the accessible stack cell: SS holds the segment address of the stack, and SP holds the offset of the last data item stored on the stack (Fig. 4, a). Note that in the initial state the stack pointer SP points to a cell that lies under the bottom of the stack and is not part of it.

Fig 4. Stack organization: a - initial state, b - after loading one element (in this example, the contents of the AX register), c - after loading the second element (contents of the DS register), d - after unloading one element, e - after unloading two elements and return to their original state.

Loading onto the stack is carried out by a special stack command, push. This instruction first decrements the contents of the stack pointer by 2 and then places the operand at the address in SP. If, for example, we want to temporarily store the contents of the AX register on the stack, we should execute the command

push AX

The stack goes into the state shown in Fig. 4, b. It can be seen that the stack pointer has shifted up by two bytes (towards lower addresses) and that the operand specified in the push command has been written to this address. The next stack loading command, for example

push DS

will put the stack into the state shown in Fig. 4, c. The stack now stores two elements, and only the top one, pointed to by the stack pointer SP, is accessible. If after some time we need to restore the original contents of the registers saved on the stack, we must execute the pop commands to unload them from the stack:

pop DS
pop AX
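The sequence of stack states in Fig. 4 can be reproduced with a small Python model. The class name Stack16, the 16-byte stack size and the register values are assumptions made for this sketch; the model performs no overflow or underflow checks.

```python
class Stack16:
    """Model of a stack of 16-bit words that grows towards lower addresses."""

    def __init__(self, size_bytes=16):
        self.memory = bytearray(size_bytes)
        # Initial state: SP points just below the bottom of the stack.
        self.sp = size_bytes

    def push(self, word):
        self.sp -= 2                                   # first decrement SP by 2
        self.memory[self.sp] = word & 0xFF             # then store the operand,
        self.memory[self.sp + 1] = (word >> 8) & 0xFF  # low byte first

    def pop(self):
        word = self.memory[self.sp] | (self.memory[self.sp + 1] << 8)
        self.sp += 2                                   # SP moves back down
        return word


stack = Stack16()
ax, ds = 0x1234, 0x5678   # example register contents
stack.push(ax)            # state b: SP = 14
stack.push(ds)            # state c: SP = 12
ds = stack.pop()          # state d: SP = 14
ax = stack.pop()          # state e: SP = 16, the initial state
print(hex(ax), hex(ds), stack.sp)   # 0x1234 0x5678 16
```

Values come off the stack in the reverse of the order in which they were pushed, which is why the text pops DS before AX.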

How big should the stack be? It depends on how intensively it is used in the program. If, for example, you plan to store an array of 10,000 bytes on the stack, then the stack must be at least this size. It should be borne in mind that in some cases the stack is automatically used by the system, in particular, when executing the int 21h interrupt command. With this command, the processor first pushes the return address onto the stack, and then DOS pushes the contents of the registers and other information related to the interrupted program onto the stack. Therefore, even if a program does not use a stack at all, it must still be present in the program and be at least several dozen words in size. In our first example, we allocated 128 words to the stack, which is certainly enough.

Structure of an assembler program

An assembly language program is a collection of blocks of memory called memory segments. A program may consist of one or more such block segments. Each segment contains a collection of language sentences, each of which occupies a separate line of program code.

There are four types of assembler statements:

commands or instructions that are symbolic analogues of machine commands. During the translation process, assembler instructions are converted into the corresponding commands of the microprocessor instruction set;

macrocommands - sentences of program text formatted in a certain way, which are replaced by other sentences during translation;

directives, which are instructions to the assembler translator to perform certain actions. Directives have no counterparts in machine representation;

comment lines containing any characters, including letters of the Russian alphabet. Comments are ignored by the translator.

Assembly syntax

The sentences that make up a program can be a syntactic construct corresponding to a command, macro, directive, or comment. In order for the assembler translator to recognize them, they must be formed according to certain syntactic rules. To do this, it is best to use a formal description of the syntax of the language, like the rules of grammar. The most common ways to describe a programming language in this way are syntax diagrams and extended Backus-Naur forms. For practical use, syntax diagrams are more convenient. For example, the syntax of assembly language statements can be described using the syntax diagrams shown in the following figures.

Fig. 5. Assembly sentence format

Fig. 6. Directive format

Fig. 7. Format of commands and macros

In these pictures:

label name - an identifier whose value is the address of the first byte of the sentence in the source code of the program that it designates;

name - an identifier that distinguishes this directive from other directives of the same name. As a result of the assembler's processing of a particular directive, certain characteristics may be assigned to that name;

an operation code (OPC) and a directive are mnemonic symbols for the corresponding machine instruction, macro instruction or translator directive;

operands are parts of a command, macro or assembler directive that designate the objects on which actions are performed. Assembly language operands are described by expressions with numeric and text constants, labels and variable identifiers using operator signs and some reserved words.

How to use syntax diagrams? It's very simple: all you need to do is find and then follow the path from the diagram's input (on the left) to its output (on the right). If such a path exists, then the sentence or construction is syntactically correct. If there is no such path, then the translator will not accept the construction. When working with syntax diagrams, pay attention to the direction of traversal indicated by the arrows, since some of the paths may be followed from right to left. In essence, syntax diagrams reflect the logic of the translator's operation when it parses the input sentences of the program.

Acceptable characters when writing program text are:

All Latin letters: A-Z, a-z. In this case, uppercase and lowercase letters are considered equivalent;

Numbers from 0 to 9;

Signs ?, @, $, _, &;

Separators, . ()< > { } + / * % ! " " ? \ = # ^.

Assembly language sentences are formed from lexemes, which are syntactically inseparable sequences of valid language symbols that make sense to the translator.

The lexemes are:

identifiers are sequences of valid characters used to designate program objects such as operation codes, variable names and label names. The rule for writing identifiers is as follows: an identifier can consist of one or more characters. The characters may be letters of the Latin alphabet, digits and some special characters: _, ?, $, @. An identifier cannot begin with a digit. An identifier can be up to 255 characters long, although the translator takes into account only the first 32 and ignores the rest. The number of significant characters can be adjusted using the mv command-line option. In addition, the translator can be instructed to distinguish between uppercase and lowercase letters or to ignore the difference (which is the default).
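The identifier rules stated above can be expressed as a short Python check. The function names is_valid_identifier and significant_part are hypothetical; the sketch encodes only the rules given in the text (allowed characters, no leading digit, at most 255 characters, 32 significant characters, case ignored by default) and does not reproduce any particular translator.

```python
import re

# First character: a Latin letter or one of _ ? $ @; then the same set plus digits.
IDENTIFIER = re.compile(r"[A-Za-z_?$@][A-Za-z0-9_?$@]*\Z")


def is_valid_identifier(name):
    """Check a name against the identifier rules described in the text."""
    return len(name) <= 255 and IDENTIFIER.match(name) is not None


def significant_part(name, limit=32):
    """Return the part of the name the translator actually compares:
    the first `limit` characters, with letter case ignored (the default)."""
    return name[:limit].lower()


print(is_valid_identifier("loop_1"))   # True
print(is_valid_identifier("1loop"))    # False: begins with a digit
print(significant_part("MyLabel"))     # mylabel
```

One consequence of the 32-character limit is that two long names differing only after the 32nd character are treated as the same identifier.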

Assembler commands.

Assembler commands give you the means to convey your requirements to the computer, including a mechanism for transferring control within a program (loops and jumps) for logical comparisons and program organization. Programmable tasks are rarely simple: most programs contain a series of loops in which several commands are repeated until a certain condition is reached, and various checks that determine which of several actions should be performed. Some instructions can transfer control, changing the normal sequence of steps by directly modifying the offset value in the instruction pointer. As mentioned earlier, different processors have different commands; here we will look at a number of commands for the 80186, 80286 and 80386 processors.

To describe the state of the flags after executing a certain command, we will use a selection from a table reflecting the structure of the eflags flag register:

The bottom row of this table shows the values of the flags after the command is executed. The following notations are used:

1 - after the command is executed, the flag is set (equal to 1);

0 - after the command is executed, the flag is reset (equal to 0);

r - the value of the flag depends on the result of the command;

After the command is executed, the flag is not defined;

space - after the command is executed, the flag does not change;

The following notation is used to represent operands in syntax diagrams:

r8, r16, r32 - an operand in one of the registers of byte size, word or double word;

m8, m16, m32, m48 - memory operand size byte, word, double word or 48 bits;

i8, i16, i32 - immediate operand size byte, word or double word;

a8, a16, a32 - relative address (offset) in the code segment.

Commands (in alphabetical order):

*These commands are described in detail.

ADD
(ADDition)

Addition

Command diagram:

add destination, source

Purpose: addition of the source and destination operands, of size byte, word or double word.

Work algorithm:

add the source and destination operands;

write the addition result to the receiver;

set flags.

State of flags after command execution:

Application:
The add command is used to add two integer operands. The result of the addition is placed at the address of the first operand. If the result goes beyond the bounds of the destination operand (an overflow occurs), this situation should be handled by analyzing the cf flag and, where appropriate, subsequently using the adc command. For example, let's add the values in the ax register and the memory area ch. When adding, be aware of the possibility of overflow.
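The interplay of add, the cf flag and adc can be illustrated with a Python model of 16-bit addition. The function names add16 and add32_via_16 are hypothetical; the sketch tracks only the carry flag and ignores the other flags that add sets.

```python
def add16(dest, src, carry_in=0):
    """Model of a 16-bit add (carry_in=0) or adc (carry_in=cf).

    Returns the 16-bit result and the new value of the cf flag.
    """
    total = dest + src + carry_in
    return total & 0xFFFF, total >> 16


def add32_via_16(a, b):
    """Add two 32-bit numbers using only 16-bit operations."""
    low, cf = add16(a & 0xFFFF, b & 0xFFFF)   # add: low words, sets cf
    high, cf = add16(a >> 16, b >> 16, cf)    # adc: high words plus cf
    return (high << 16) | low, cf


print(add16(0xFFFF, 0x0001))                       # (0, 1): overflow, cf=1
print(hex(add32_via_16(0x0001FFFF, 0x00000001)[0]))  # 0x20000
```

The second call shows why cf has to be analyzed: the carry out of the low words is exactly what adc adds into the high words.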

Object code (three formats):

Register plus register or memory:

|000000dw|modregr/m|

AX register (AL) plus immediate value:

|0000010w|--data--|data if w=1|

Register or memory plus immediate value:

|100000sw|mod000r/m|--data--|data if sw=01|

CALL
(CALL)

Calling a procedure or task

Command diagram:

Purpose:

transferring control to a near or far procedure with storing the address of the return point on the stack;

switching tasks.

Work algorithm:
determined by the operand type:

Near label - the contents of the eip/ip command pointer are pushed onto the stack and the new address value corresponding to the label is loaded into the same register;

Far label - the contents of the cs register and the eip/ip command pointer are pushed onto the stack. Then new address values corresponding to the far label are loaded into the same registers;

r16, r32 or m16, m32 - define a register or memory cell containing the offset in the current code segment to which control is transferred. When control is transferred, the contents of the eip/ip command pointer are pushed onto the stack;

Memory pointer - defines a memory location containing a 4 or 6 byte pointer to the called procedure. The structure of such a pointer is 2+2 or 2+4 bytes. The interpretation of such a pointer depends on the operating mode of the microprocessor.

State of flags after command execution (except task switching):

executing the command does not affect the flags

When a task is switched, the flag values are changed according to the information about the eflags register in the task state segment (TSS) of the task being switched to.
Application:
The call command allows you to organize a flexible and multi-variant transfer of control to a subroutine while preserving the address of the return point.
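How the return point is preserved for near and far calls can be sketched in Python. The function names (near_call, near_ret, far_call, far_ret) and the use of a Python list as the stack are assumptions made for this illustration; only real-mode behaviour without task switching is modelled.

```python
def near_call(ip_after_call, target, stack):
    """Near call: push the return offset, return the new ip."""
    stack.append(ip_after_call)
    return target


def near_ret(stack):
    """Near return: pop the saved offset back into ip."""
    return stack.pop()


def far_call(cs, ip_after_call, new_cs, new_ip, stack):
    """Far call: push cs, then ip; return the new cs:ip pair."""
    stack.append(cs)
    stack.append(ip_after_call)
    return new_cs, new_ip


def far_ret(stack):
    """Far return: pop ip, then cs; return the restored cs:ip pair."""
    ip = stack.pop()
    cs = stack.pop()
    return cs, ip


stack = []
ip = near_call(0x0103, 0x0200, stack)   # control passes to 0200h
ip = near_ret(stack)                    # and returns to 0103h
print(hex(ip))                          # 0x103
```

Because each call pushes its own return point, nested calls return in the reverse order in which they were made.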

Object code (four formats):

Direct addressing in a segment:

|11101000|disp-low|disp-high|

Indirect addressing in a segment:

|11111111|mod010r/m|

Indirect addressing between segments:

|11111111|mod011r/m|

Direct addressing between segments:

|10011010|offset-low|offset-high|seg-low|seg-high|
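As an illustration of the first format, the following Python sketch builds the three bytes of a direct call within a segment. The function name encode_near_call is hypothetical, and the sketch assumes 16-bit mode, where the displacement is counted from the command that follows the call.

```python
def encode_near_call(call_address, target_address):
    """Encode a direct near call as |11101000|disp-low|disp-high|.

    call_address: offset of the call command itself.
    target_address: offset of the procedure being called.
    """
    next_address = call_address + 3           # opcode + 2 displacement bytes
    disp = (target_address - next_address) & 0xFFFF
    return bytes([0b11101000, disp & 0xFF, disp >> 8])


# A call located at 0100h to a procedure at 0200h:
print(encode_near_call(0x0100, 0x0200).hex())   # e8fd00
# A backward call from 0200h to 0100h gives a negative displacement:
print(encode_near_call(0x0200, 0x0100).hex())   # e8fdfe
```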

CMP
(CoMPare operands)

Operand comparison

Command diagram:

cmp operand1,operand2

Purpose: comparison of two operands.

Work algorithm:

perform subtraction(operand1-operand2);

depending on the result, set the flags, do not change operand1 and operand2 (that is, do not remember the result).

Application:
This command is used to compare two operands by subtraction without changing the operands. The flags are set according to the result of the command. The cmp command is used together with the conditional jump commands and with the setcc command, which sets a byte according to a condition.
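A Python model makes the connection between cmp and the conditional jumps concrete. The function name cmp16 and the dictionary of flags are hypothetical; the sketch covers 16-bit operands and only the zf, cf, sf and of flags.

```python
def cmp16(operand1, operand2):
    """Model of a 16-bit cmp: subtract, set flags, discard the result."""
    diff = (operand1 - operand2) & 0xFFFF
    return {
        "zf": int(diff == 0),                    # operands are equal
        "cf": int((operand1 & 0xFFFF) < (operand2 & 0xFFFF)),  # unsigned borrow
        "sf": diff >> 15,                        # sign bit of the result
        # Signed overflow: the operands have different signs and the sign
        # of the result differs from the sign of operand1.
        "of": int(((operand1 ^ operand2) & (operand1 ^ diff) & 0x8000) != 0),
    }


flags = cmp16(0xFFFF, 0x0001)
equal = flags["zf"] == 1                 # condition tested by je
below = flags["cf"] == 1                 # condition tested by jb (unsigned)
less = flags["sf"] != flags["of"]        # condition tested by jl (signed)
print(equal, below, less)                # False False True
```

The same bit pattern FFFFh is 65535 when read as unsigned and -1 when read as signed, which is why the unsigned condition (jb) and the signed condition (jl) give different answers here.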

Object code (three formats):

Register or memory with register:

|001110dw|modregr/m|

Immediate value with AX (AL) register:

|0011110w|--data--|data if w=1|

Immediate value with register or memory:

|100000sw|mod111r/m|--data--|data if sw=01|

DEC
(DECrement operand by 1)

Decreasing an operand by one

Command diagram:

dec operand

Purpose: Decrease the value of an operand in memory or register by 1.

Work algorithm:
the command subtracts 1 from the operand.

State of flags after command execution:

Application:
The dec command decrements the value of a byte, word or double word in memory or in a register by one. Note that the command does not affect the cf flag.
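
The remark about the cf flag can be shown with a small Python model. This is a sketch under assumptions: only the cf, zf, sf and of flags are modeled, the default operand size is a byte, and the function name dec is simply borrowed from the mnemonic.

```python
def dec(operand: int, flags: dict, bits: int = 8):
    """Model dec: subtract 1, update flags, leave cf untouched."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    value = operand & mask
    result = (value - 1) & mask
    new_flags = dict(flags)                  # cf is carried over unchanged
    new_flags["zf"] = int(result == 0)
    new_flags["sf"] = int(bool(result & sign))
    # Signed overflow happens only when the most negative value
    # wraps around to the most positive one.
    new_flags["of"] = int(value == sign)
    return result, new_flags


# Decrementing 0 wraps to 0FFh, yet cf stays as it was.
result, flags = dec(0x00, {"cf": 0})
```

Because cf is preserved, a loop counter can be decremented with dec between multi-word additions without destroying a pending carry.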

Object code (two formats):

Register: |01001reg|

Register or memory: |1111111w|mod001r/m|

DIV
(DIVide unsigned)

Unsigned division

Command diagram:

div divisor

Purpose: divide one unsigned binary value by another.

Operating algorithm:
The command requires specifying two operands - the dividend and the divisor. The dividend is specified implicitly and its size depends on the size of the divisor, which is specified in the command:

if the divisor is a byte in size, then the dividend must be located in the ax register. After the operation, the quotient is placed in al and the remainder in ah;

if the divisor is a word in size, then the dividend must be located in the register pair dx:ax, with the low-order part of the dividend located in ax. After the operation, the quotient is placed in ax and the remainder in dx;

if the divisor is a double word in size, then the dividend must be located in the register pair edx:eax, with the low-order part of the dividend located in eax. After the operation, the quotient is placed in eax and the remainder in edx.

State of flags after command execution:

Application:
The command performs integer division of the operands, producing the quotient and the remainder. The division can raise exception 0 (division error). This happens in one of two cases: the divisor is 0, or the quotient is too large to fit into the eax/ax/al register.
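
The register pairing and the two error cases described above can be modeled in Python as follows. This is a sketch: the DivideError class is a hypothetical stand-in for processor exception 0, and the function takes the whole double-width dividend as one integer.

```python
class DivideError(Exception):
    """Stands in for processor exception 0 (division error)."""


def div(dividend: int, divisor: int, bits: int):
    """Model unsigned div for an 8-, 16- or 32-bit divisor.

    dividend is the full double-width value (ax, dx:ax or edx:eax).
    Returns (quotient, remainder) as they would appear in
    al/ah, ax/dx or eax/edx.
    """
    if bits not in (8, 16, 32):
        raise ValueError("divisor size must be 8, 16 or 32 bits")
    dividend &= (1 << (2 * bits)) - 1
    divisor &= (1 << bits) - 1
    if divisor == 0:
        raise DivideError("divisor is 0")
    quotient, remainder = divmod(dividend, divisor)
    if quotient >= (1 << bits):
        raise DivideError("quotient does not fit into the register")
    return quotient, remainder


# Word divisor: the dividend is dx:ax = 0001h:0000h, the divisor is 2.
dx, ax = 0x0001, 0x0000
ax, dx = div((dx << 16) | ax, 2, bits=16)
```

With a byte divisor, dividing 1000h by 10h fails in this model because the quotient 100h does not fit into al, which mirrors the second error case.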

Object code:

|1111011w|mod110r/m|

INT
(INTerrupt)

Calling the interrupt service routine

Command diagram:

int interrupt_number

Purpose: call the interrupt service routine with the interrupt number specified by the command operand.

Operating algorithm:

push the flags register eflags/flags and the return address onto the stack. When writing a return address, the contents of the segment register cs are written first, then the contents of the command pointer eip/ip;

reset the if and tf flags to zero;

transfer control to the interrupt service program with the specified number. The control transfer mechanism depends on the operating mode of the microprocessor.
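
The three steps above can be sketched in Python for one particular operating mode. The sketch assumes real mode with 16-bit registers, where the address of the service program is read from the 4-byte table entry at address 4 * number (offset word first, then segment word). It also assumes that ip already points to the command following int. The names software_interrupt, cpu and memory are hypothetical.

```python
IF_MASK = 0x0200  # interrupt enable flag, bit 9 of flags
TF_MASK = 0x0100  # trap flag, bit 8 of flags


def software_interrupt(cpu: dict, memory: bytearray, number: int) -> None:
    """Model int in real mode: push flags, cs, ip; clear if/tf; jump."""

    def push(value: int) -> None:
        cpu["sp"] = (cpu["sp"] - 2) & 0xFFFF
        address = ((cpu["ss"] << 4) + cpu["sp"]) & 0xFFFFF
        memory[address] = value & 0xFF
        memory[address + 1] = (value >> 8) & 0xFF

    # Step 1: save the flags, then the return address (cs first, then ip).
    push(cpu["flags"])
    push(cpu["cs"])
    push(cpu["ip"])
    # Step 2: reset the if and tf flags.
    cpu["flags"] &= ~(IF_MASK | TF_MASK) & 0xFFFF
    # Step 3: load cs:ip from the interrupt vector table.
    entry = 4 * number
    cpu["ip"] = memory[entry] | (memory[entry + 1] << 8)
    cpu["cs"] = memory[entry + 2] | (memory[entry + 3] << 8)


memory = bytearray(1 << 20)                        # 1 MB of address space
memory[0x84:0x88] = bytes([0x34, 0x12, 0x00, 0xF0])  # vector 21h -> F000h:1234h
cpu = {"cs": 0x1000, "ip": 0x0102, "ss": 0x2000, "sp": 0x0100, "flags": 0x0302}
software_interrupt(cpu, memory, 0x21)
```

In other operating modes the lookup in step 3 works differently, so only the first two steps carry over directly.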

State of flags after command execution:

Application:
As you can see from the syntax, there are two forms of this command:

int 3 - has its own individual operation code 0cch and occupies one byte. This circumstance makes it very convenient for use in various software debuggers to set breakpoints by replacing the first byte of any command. The microprocessor, encountering a command with operation code 0cch in the sequence of commands, calls the interrupt processing program with vector number 3, which serves to communicate with the software debugger.

The second form of the command occupies two bytes, has an opcode of 0cdh and allows you to initiate a call to an interrupt service routine with a vector number in the range 0–255. Features of control transfer, as noted, depend on the operating mode of the microprocessor.
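
The breakpoint technique mentioned for int 3 can be sketched as follows. The example is a simplified model of what a debugger does with a program image held in a byte array; the function names set_breakpoint and clear_breakpoint are hypothetical.

```python
INT3 = 0xCC  # one-byte operation code of int 3


def set_breakpoint(code: bytearray, address: int) -> int:
    """Replace the first byte of a command with int 3.

    Returns the original byte so that it can be restored later.
    """
    saved = code[address]
    code[address] = INT3
    return saved


def clear_breakpoint(code: bytearray, address: int, saved: int) -> None:
    """Put the original first byte of the command back."""
    code[address] = saved


# Program image: a 3-byte command followed by the two-byte form int 21h (0CDh 21h).
code = bytearray([0xB8, 0x34, 0x12, 0xCD, 0x21])
saved = set_breakpoint(code, 3)
patched = bytes(code)
clear_breakpoint(code, 3, saved)
```

Since int 3 occupies a single byte, the patch never overwrites part of the following command, whatever the length of the command being replaced.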

Object code (two formats):

int 3: |11001100|

int interrupt_number: |11001101|--type--|

JCC
JCXZ/JECXZ
(Jump if condition)

(Jump if CX=Zero/Jump if ECX=Zero)

Jump if condition is met

Jump if CX/ECX is equal to zero

Command diagram:

jcc label
jcxz label
jecxz label

Purpose: jump within the current code segment depending on some condition.

Command algorithm (except jcxz/jecxz):
Check the state of the flags according to the opcode (the opcode encodes the condition being tested):

if the condition being tested is true, jump to the location indicated by the operand;

if the condition being tested is false, transfer control to the next command.
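
The flag test for jcc can be modeled in Python as below. The sketch covers only a small subset of the condition mnemonics, chosen for illustration; the table name CONDITIONS and the function name jcc are hypothetical, and flags are passed as a dictionary of 0/1 values.

```python
# Each mnemonic maps to the flag condition that it tests.
CONDITIONS = {
    "je":  lambda f: f["zf"] == 1,        # equal
    "jne": lambda f: f["zf"] == 0,        # not equal
    "jb":  lambda f: f["cf"] == 1,        # below (unsigned)
    "jae": lambda f: f["cf"] == 0,        # above or equal (unsigned)
    "jl":  lambda f: f["sf"] != f["of"],  # less (signed)
    "jge": lambda f: f["sf"] == f["of"],  # greater or equal (signed)
}


def jcc(mnemonic: str, flags: dict, next_ip: int, target: int) -> int:
    """Return the new ip: target if the condition holds, else next_ip."""
    return target if CONDITIONS[mnemonic](flags) else next_ip


# Flags as they would be after comparing 3 with 5 (unsigned borrow occurred).
flags = {"cf": 1, "zf": 0, "sf": 1, "of": 0}
taken = jcc("jb", flags, next_ip=0x0102, target=0x0120)
not_taken = jcc("je", flags, next_ip=0x0102, target=0x0120)
```

The command itself never changes the flags; it only reads the state left by an earlier command such as cmp.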

Algorithm for the jcxz/jecxz command:
Checking the condition that the contents of the ecx/cx register are equal to zero:

if the condition being checked