Self-Printing Machine Code (2005)

By Susam Pal on 27 Oct 2005

The following 12-byte program composed of pure x86 machine code
writes itself to standard output when executed in a DOS environment:

fc b1 0c ac 92 b4 02 cd 21 e2 f8 c3

We can write these bytes to a file with the .COM extension and
execute it in DOS. It runs successfully in MS-DOS 6.22, Windows 98,
as well as in DOSBox and writes a copy of itself to standard output.

Contents

  • Demo
  • Quine Conundrums
  • Proper Quines
  • A Note on DOS Services
  • Writing to Video Memory Directly
  • Boot Program

Demo

On a Unix or Linux system, the following commands demonstrate this
program with the help of DOSBox:

echo fc b1 0c ac 92 b4 02 cd 21 e2 f8 c3 | xxd -r -p > foo.com
dosbox -c 'MOUNT C .' -c 'C:\FOO > C:\OUT.COM' -c 'EXIT'
diff foo.com OUT.COM

The diff command should produce no output, confirming that the
output of the program is identical to the program itself.

On an actual MS-DOS 6.22 system or a Windows 98 system, we can
demonstrate this program in the following manner:

C:\>DEBUG
-E 100 fc b1 0c ac 92 b4 02 cd 21 e2 f8 c3
-N FOO.COM
-R CX
CX 0000
:C
-W
Writing 0000C bytes
-Q

C:\>FOO > OUT.COM

C:\>FC FOO.COM OUT.COM
Comparing files FOO.COM and OUT.COM
FC: no differences encountered

In the DEBUG session shown above, we use the debugger command E to
enter the machine code at offset 0x100 of the code segment. Then we
use the N command to name the file we want to write this machine
code to. The command R CX is used to specify that we want to write
0xC (decimal 12) bytes to this file. The W command writes the 12
bytes entered at offset 0x100. The Q command quits the debugger.
Then we run the new FOO.COM program while redirecting its output to
OUT.COM. Finally, we use the FC command to compare the two files and
confirm that they are exactly the same.

Let us disassemble this program now and see what it does. The
output below is generated using the Netwide Disassembler (NDISASM),
a tool that comes with Netwide Assembler (NASM):

$ ndisasm -o 0x100 foo.com
00000100  FC                cld
00000101  B10C              mov cl,0xc
00000103  AC                lodsb
00000104  92                xchg ax,dx
00000105  B402              mov ah,0x2
00000107  CD21              int 0x21
00000109  E2F8              loop 0x103
0000010B  C3                ret

When DOS executes a program in a .COM file, it loads the machine code
in the file at offset 0x100 of the code segment chosen by DOS. That
is why we ask the disassembler to assume a load address of 0x100
with the -o command line option. The first instruction clears the
direction flag. The purpose of this instruction is explained later.
The next instruction sets the register CL to 0xc (decimal 12). The
register CH is already set to 0 by default when a .COM program
starts. Thus setting the register CL to 0xc effectively sets the
entire register CX to 0xc. The register CX is used as a loop counter
for the loop 0x103 instruction that comes later. Every time this
loop instruction executes, it decrements CX and jumps to offset
0x103 if CX is not 0. This results in 12 iterations of the loop.
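
The target 0x103 shown by the disassembler is not stored in the
machine code as such. The byte 0xf8 that follows the opcode 0xe2 is
a signed displacement relative to the address of the next
instruction. The following Python sketch works out that arithmetic.
The function name loop_target is made up for this illustration.

```python
def loop_target(address, rel8):
    """Return the jump target of a 2-byte LOOP instruction at `address`."""
    next_ip = address + 2                          # IP after fetching E2 xx
    disp = rel8 - 0x100 if rel8 >= 0x80 else rel8  # sign-extend the byte
    return (next_ip + disp) & 0xFFFF               # IP wraps at 16 bits


# The instruction E2 F8 sits at offset 0x109 in the 12-byte program.
print(hex(loop_target(0x109, 0xF8)))  # 0x103
```

Here 0xf8 is -8 as a signed byte, so the target is 0x10b - 8 = 0x103.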

In each iteration of the loop, the instructions from offset 0x103 to
offset 0x109 are executed. The lodsb instruction loads a byte from
address DS:SI into AL. When DOS starts executing this program, DS
and SI are set to CS and 0x100 by default, so at the beginning DS:SI
points to the first byte of the program. The xchg instruction
exchanges the values in AX and DX. Thus the byte we just loaded into
AL ends up in DL. Then we set AH to 2 and generate the software
interrupt 0x21 (decimal 33) to write the byte in DL to standard
output. This is how each iteration reads a byte of this program and
writes it to standard output.

The lodsb instruction increments or decrements SI depending on the
state of the direction flag (DF). When DF is cleared, it increments
SI. If DF is set, it decrements SI. We use the cld instruction at
the beginning to clear DF, so that in each iteration of the loop, SI
moves forward to point to the next byte of the program. This is how
the 12 iterations of the loop write 12 bytes of the program to
standard output. In many DOS environments, the DF flag is already in
cleared state when a .COM program starts, so the cld instruction may
be omitted in such environments. However, there are some
environments where DF may not be in cleared state when our program
starts, so it is a best practice to clear DF before relying on it.

Finally, when the loop terminates, we execute the ret instruction to
terminate the program.
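
The walkthrough above can be checked without DOS by stepping through
the bytes in a toy interpreter. The Python sketch below is not a
general x86 emulator. It handles only the eight opcodes that occur
in this program, treats ret as the end of the program, and assumes
the start-up state described above (CH is 0, SI is 0x100). The
function name run is made up for this illustration.

```python
# Toy interpreter for just the opcodes used by the 12-byte program.
PROGRAM = bytes.fromhex("fc b1 0c ac 92 b4 02 cd 21 e2 f8 c3")


def run(program, load=0x100):
    """Run `program` as a .COM image and return what it writes to stdout."""
    mem = bytearray(0x10000)                 # one 64 KiB segment
    mem[load:load + len(program)] = program
    # Start-up state assumed from the text: CH = 0, SI = 0x100.
    ax = dx = cx = 0
    si, ip, df = load, load, 0
    out = bytearray()
    while True:
        op = mem[ip]
        if op == 0xFC:                       # cld
            df = 0
            ip += 1
        elif op == 0xB1:                     # mov cl, imm8
            cx = (cx & 0xFF00) | mem[ip + 1]
            ip += 2
        elif op == 0xAC:                     # lodsb
            ax = (ax & 0xFF00) | mem[si]
            si = (si - 1 if df else si + 1) & 0xFFFF
            ip += 1
        elif op == 0x92:                     # xchg ax, dx
            ax, dx = dx, ax
            ip += 1
        elif op == 0xB4:                     # mov ah, imm8
            ax = (ax & 0x00FF) | (mem[ip + 1] << 8)
            ip += 2
        elif op == 0xCD:                     # int imm8
            if mem[ip + 1] != 0x21 or ax >> 8 != 2:
                raise NotImplementedError("only int 0x21 with AH = 2")
            out.append(dx & 0xFF)            # write DL to standard output
            ip += 2
        elif op == 0xE2:                     # loop rel8
            rel = mem[ip + 1]
            rel = rel - 0x100 if rel >= 0x80 else rel
            cx = (cx - 1) & 0xFFFF
            ip = ip + 2 + rel if cx else ip + 2
        elif op == 0xC3:                     # ret: treated as program exit
            return bytes(out)
        else:
            raise NotImplementedError(hex(op))


print(run(PROGRAM) == PROGRAM)  # True
```

The bytes collected in out are the same 12 bytes that were loaded at
offset 0x100, which is what the diff and FC comparisons confirm on a
real system.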

Quine Conundrums

While reading the description of the self-printing program presented
earlier, one might wonder if it is a quine. While there is no
standardized definition of the term quine, it is generally accepted
that a quine is a computer program that takes no input and produces
an exact copy of its own source code as its output. Since a quine
cannot take any input, tricks involving reading its own source code
or evaluating itself are ruled out.

For example, this shell script is a valid quine:

s='s=\47%s\47;printf "$s" "$s"\n';printf "$s" "$s"

However, the following shell script is not considered a proper
quine:

cat $0

The shell script above reads its own source code which is considered
cheating. Improper quines like this are often called cheating
quines.

Is our 12-byte x86 program a quine? It turns out that we have a
conundrum. There is no notion of source code for our program. There
would have been one if we had written out the source code of this
program in assembly language. In such a case we would first need to
choose an assembler and a proper quine would need to produce an
exact copy of the assembly language source code (not the machine
code bytes) for the chosen assembler. But we are not doing that
here. We want the machine code to produce an exact copy of itself.
There is no source code involved. We only have machine code. So we
could argue that the whole notion of machine code quine is nonsense.
No machine code quine can exist because there is no source code to
produce as output.

However, we could also argue that the machine code is the input for
the CPU that the CPU fetches, decodes, and converts to a sequence of
state changes in the CPU. If we define a machine code quine to be a
machine code program that writes its own bytes, then we could say
that we have a machine code quine here.

Let us now entertain the thought that our 12-byte program is indeed
a machine code quine. Now we have a new conundrum. Is it a proper
quine? This program reads its own bytes from memory and writes
them. Does that make it a cheating quine? What would a proper
quine written in pure machine code even look like? If we look at
the shell script quine above, we see that it contains parts of the
executable part of the script code embedded in a string as data.
Then we format the string cleverly to produce a new string that
looks exactly like the entire shell script. It is a common pattern
followed in many quines. The quine does not read its own code but
it reads some data defined by the code and formats that data to look
like its own code. However, in pure machine code like this the
lines between data and code are blurred. Even if we try to keep the
bytes we want to read at a separate place in the memory and treat
them like data, they would look exactly like machine instructions,
so one might wonder if there is any point in trying to make a
machine code quine that does not read its own bytes. Nevertheless
the next section shows how to accomplish this.

Proper Quines

If the thought of a machine code quine program reading its own bytes
from the memory makes you uncomfortable, here is an adaptation of
the previous program that keeps the machine instructions to be
executed separate from the data bytes to be read by the program.

fc b3 02 b1 14 be 14 01 ac 92 b4 02 cd 21 e2 f8 4b 75 f0 c3
fc b3 02 b1 14 be 14 01 ac 92 b4 02 cd 21 e2 f8 4b 75 f0 c3

Here is how we can demonstrate this 40-byte program:

echo fc b3 02 b1 14 be 14 01 ac 92 b4 02 cd 21 e2 f8 4b 75 f0 c3 | xxd -r -p > foo.com
echo fc b3 02 b1 14 be 14 01 ac 92 b4 02 cd 21 e2 f8 4b 75 f0 c3 | xxd -r -p >> foo.com
dosbox -c 'MOUNT C .' -c 'C:\FOO > C:\OUT.COM' -c 'EXIT'
diff foo.com OUT.COM

Here is the disassembly:

$ ndisasm -o 0x100 foo.com
00000100  FC                cld
00000101  B302              mov bl,0x2
00000103  B114              mov cl,0x14
00000105  BE1401            mov si,0x114
00000108  AC                lodsb
00000109  92                xchg ax,dx
0000010A  B402              mov ah,0x2
0000010C  CD21              int 0x21
0000010E  E2F8              loop 0x108
00000110  4B                dec bx
00000111  75F0              jnz 0x103
00000113  C3                ret
00000114  FC                cld
00000115  B302              mov bl,0x2
00000117  B114              mov cl,0x14
00000119  BE1401            mov si,0x114
0000011C  AC                lodsb
0000011D  92                xchg ax,dx
0000011E  B402              mov ah,0x2
00000120  CD21              int 0x21
00000122  E2F8              loop 0x11c
00000124  4B                dec bx
00000125  75F0              jnz 0x117
00000127  C3                ret

The first 20 bytes are the executable part of the program. The next
20 bytes are the data read by the program. The executable bytes are
identical to the data bytes. The executable part of the program has
an outer loop that iterates twice. In each iteration, it reads the
data bytes and writes them to standard output. Therefore, in two
iterations of the outer loop, it writes the data bytes twice. In
this manner, the output is identical to the program itself.
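
The structure described above can be checked with a few lines of
Python. This is only a sketch of the argument, not an emulation of
the program. The variable names are made up for this illustration.

```python
HALF = bytes.fromhex(
    "fc b3 02 b1 14 be 14 01 ac 92 b4 02 cd 21 e2 f8 4b 75 f0 c3"
)
program = HALF + HALF

code, data = program[:20], program[20:]

# The operand of "mov si, 0x114" is stored little-endian at bytes 6..7.
si = int.from_bytes(code[6:8], "little")
count = code[4]                      # operand of "mov cl, 0x14"

# SI points at the first data byte and CL counts all 20 data bytes.
assert si - 0x100 == len(code)
assert count == len(data)

# Two passes of the outer loop write the data bytes twice.
output = data * 2
print(output == program)  # True
```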

Here is another simpler 32-byte quine based on this approach:

b8 23 09 fe c0 a2 20 01 ba 10 01 cd 21 cd 21 c3
b8 23 09 fe c0 a2 20 01 ba 10 01 cd 21 cd 21 c3

Here are the commands to demonstrate this quine:

echo b8 23 09 fe c0 a2 20 01 ba 10 01 cd 21 cd 21 c3 | xxd -r -p > foo.com
echo b8 23 09 fe c0 a2 20 01 ba 10 01 cd 21 cd 21 c3 | xxd -r -p >> foo.com
dosbox -c 'MOUNT C .' -c 'C:\FOO > C:\OUT.COM' -c 'EXIT'
diff foo.com OUT.COM

Here is the disassembly:

$ ndisasm -o 0x100 foo.com
00000100  B82309            mov ax,0x923
00000103  FEC0              inc al
00000105  A22001            mov [0x120],al
00000108  BA1001            mov dx,0x110
0000010B  CD21              int 0x21
0000010D  CD21              int 0x21
0000010F  C3                ret
00000110  B82309            mov ax,0x923
00000113  FEC0              inc al
00000115  A22001            mov [0x120],al
00000118  BA1001            mov dx,0x110
0000011B  CD21              int 0x21
0000011D  CD21              int 0x21
0000011F  C3                ret

This example too has two parts. The first half has the executable
bytes and the second half has the data bytes. Both parts are
identical. This example sets AH to 9 in the first instruction and
then later uses int 0x21 to invoke the DOS service that prints a
dollar-terminated string beginning at the address specified in
DS:DX. When a .COM program starts, DS already points to the current
code segment, so we do not have to set it explicitly. The dollar
symbol has an ASCII code of 0x24 (decimal 36). We need to be
careful about not having this value anywhere within the data bytes
or this DOS function would prematurely stop printing our data bytes
as soon as it encounters this value. That is why we set AL to 0x23
in the first instruction, then increment it to 0x24 in the second
instruction, and then copy this value to the end of the data bytes
in the third instruction. Finally, we execute int 0x21 twice to
write the data bytes twice to standard output, so that the output
matches the program itself.
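
The dollar terminator trick can be sketched in Python as follows.
The function print_string is a made-up stand-in for the DOS service
selected with AH set to 9. It simply returns the bytes from offset
DX up to, but not including, the first 0x24 byte.

```python
HALF = bytes.fromhex("b8 23 09 fe c0 a2 20 01 ba 10 01 cd 21 cd 21 c3")
LOAD = 0x100

mem = bytearray(0x10000)                 # one 64 KiB segment
mem[LOAD:LOAD + 32] = HALF + HALF

# The program bytes themselves must not contain the terminator.
assert 0x24 not in HALF

# mov ax, 0x923 / inc al / mov [0x120], al
al = 0x23
al = (al + 1) & 0xFF
mem[0x120] = al                          # '$' lands just after the data


def print_string(mem, dx):
    """Return the bytes a dollar-terminated print would write."""
    end = mem.index(0x24, dx)            # first '$' at or after DX
    return bytes(mem[dx:end])


# mov dx, 0x110 followed by int 0x21 twice
output = print_string(mem, 0x110) * 2
print(output == HALF + HALF)  # True
```

If the program stored 0x24 directly as an operand, the byte would
appear inside the data half too and the printing would stop there.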

While both these programs take care not to read the same memory
region that is being executed by the CPU, the data bytes they read
look exactly like the executable bytes. This is what I meant when I
mentioned earlier that the lines between code and data are blurred
in an exercise like this. This is why I do not really see a point in
keeping the executable bytes separate from the data bytes while
writing machine code quines.

A Note on DOS Services

The self-printing programs presented above use int 0x21 which
offers DOS services that support various input/output functions. In
the first two programs, we selected the function to write a
character to standard output by setting AH to 2 before invoking this
software interrupt. In the next program, we selected the function to
write a dollar-terminated string to standard output by setting AH to
9.

The ret instruction in the end too relies on DOS services. When a
.COM program starts, the register SP contains 0xfffe. The stack
memory locations at offset 0xfffe and 0xffff contain 0x00 and 0x00,
respectively. Further, the memory address at offset 0x0000 contains
the instruction int 0x20 which is a DOS service that terminates the
program. As a result, executing the ret instruction pops 0x0000 off
the stack at 0xfffe and loads it into IP. This results in the
instruction int 0x20 at offset 0x0000 getting executed. This
instruction terminates the program and returns to DOS.
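
The effect of ret described above can be traced in a small Python
sketch. It models only the pieces of the segment mentioned in the
text: the two bytes of int 0x20 (machine code cd 20) at offset
0x0000 and the zero word at the top of the stack.

```python
mem = bytearray(0x10000)                      # one 64 KiB segment, all zero
mem[0x0000:0x0002] = bytes.fromhex("cd 20")   # int 0x20 at offset 0x0000
sp = 0xFFFE                                   # mem[0xfffe], mem[0xffff] are 0

# A near ret pops a 16-bit little-endian word from the stack into IP.
ip = mem[sp] | (mem[(sp + 1) & 0xFFFF] << 8)
sp = (sp + 2) & 0xFFFF

print(hex(ip), mem[ip:ip + 2].hex())          # 0x0 cd20
```

After the pop, IP is 0x0000, so the next instruction fetched is the
int 0x20 placed there by DOS.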

Relying on DOS services gives us a comfortable environment to work
with. In particular, DOS implements the notion of standard output
which lets us redirect standard output to a file. This lets us
conveniently compare the original program file and the output file
with the FC command and confirm that they are identical.

But one might wonder if we could avoid relying on DOS services
completely and still write a program that prints its own bytes to
screen. We definitely can. We could write directly to video memory
at address 0xb800:0x0000 and show the bytes of the program on
screen. We could also forgo DOS completely and let BIOS load our
program from the boot sector and execute it. The next two sections
discuss these things.

Writing to Video Memory Directly

Here is an example of an 18-byte self-printing program that writes
directly to the video memory at address 0xb800:0x0000.

fc b4 b8 8e c0 31 ff b1 12 b4 0a ac ab e2 fc f4 eb fd

Here are the commands to create and run this program:

echo fc b4 b8 8e c0 31 ff b1 12 b4 0a ac ab e2 fc f4 eb fd | xxd -r -p > foo.com
dosbox foo.com

With the default code page active, i.e., with code page 437 active,
the program should display an output that looks approximately like
the following and halt:

ⁿ┤╕Ä└1 ▒↕┤◙¼½Γⁿ⌠δ²

Now of course this kind of output looks like gibberish but there is
a quick and dirty way to confirm that this output indeed represents
the bytes of our program. We can use the TYPE command of DOS to
print the program and check if the symbols that appear in its output
seem consistent with the output above. Here is an example:

C:\>TYPE FOO.COM
ⁿ┤╕Ä└1 ▒↕┤
          ¼½Γⁿ⌠δ²
C:\>

This output looks very similar to the previous one except that the
byte value 0x0a is rendered as a line break in this output whereas
in the previous output this byte value is represented as a circle in
a box. This method would not have worked if there were any control
characters such as backspace or carriage return that result in
characters being erased in the displayed output.

A proper way to verify that the output of the program represents the
bytes of the program would be to look up each symbol in the output
in a chart for code page 437 and confirm that the byte value of each
symbol matches each byte value in the program. Here is one such
chart that approximates the symbols in code page 437 with Unicode
symbols: cp437.html.
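
The lookup can also be done in the opposite direction by rendering
the program bytes as code page 437 symbols and comparing the result
with the screen by eye. The Python sketch below uses the cp437 codec
of Python for bytes 0x20 and above. That codec maps bytes below 0x20
to control characters rather than to the graphical symbols that
video memory shows, so the table LOW_GLYPHS is a hand-written
assumption based on the conventional code page 437 glyphs, with the
null byte shown as a space. The name render is made up for this
illustration.

```python
# Glyphs conventionally shown for bytes 0x00 to 0x1f in video memory.
LOW_GLYPHS = " ☺☻♥♦♣♠•◘○◙♂♀♪♫☼►◄↕‼¶§▬↨↑↓→←∟↔▲▼"
assert len(LOW_GLYPHS) == 32


def render(data):
    """Approximate how `data` looks on a code page 437 text screen."""
    return "".join(
        LOW_GLYPHS[b] if b < 0x20 else bytes([b]).decode("cp437")
        for b in data
    )


program = bytes.fromhex(
    "fc b4 b8 8e c0 31 ff b1 12 b4 0a ac ab e2 fc f4 eb fd"
)
print(render(program))
```

The byte 0x7f is not handled specially here since none of the
programs in this post contain it.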

Here is the disassembly of the above program:

$ ndisasm -o 0x100 foo.com
00000100  FC                cld
00000101  B4B8              mov ah,0xb8
00000103  8EC0              mov es,ax
00000105  31FF              xor di,di
00000107  B112              mov cl,0x12
00000109  B40A              mov ah,0xa
0000010B  AC                lodsb
0000010C  AB                stosw
0000010D  E2FC              loop 0x10b
0000010F  F4                hlt
00000110  EBFD              jmp short 0x10f

This program sets ES to 0xb800 and DI to 0. Thus ES:DI points to
the video memory at address 0xb800:0x0000. DS:SI points to the
first instruction of this program by default. Further AH is set to
0xa. This is used to specify the color attribute of the text to be
displayed on screen. Each iteration of the loop in this program
loads a byte of the program and writes it along with the color
attribute to video memory. The lodsb instruction loads a byte of
the program from the memory address specified by DS:SI into AL and
increments SI by 1. AH is already set to 0xa. The value 0xa (binary
00001010) here specifies black as the background color and bright
green as the foreground color. The stosw instruction stores a word
from AX to the memory address specified by ES:DI and increments DI
by 2. In this manner, the byte in AL and its color attribute in AH
get copied to the video memory.
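
The layout that results in video memory can be sketched in Python as
follows. Each character cell takes two bytes: the character byte
from AL followed by the attribute byte from AH. The sketch assumes
the usual text mode reading of the attribute byte where the low four
bits select the foreground color and the next three bits select the
background color. The variable names are made up for this
illustration.

```python
PROGRAM = bytes.fromhex(
    "fc b4 b8 8e c0 31 ff b1 12 b4 0a ac ab e2 fc f4 eb fd"
)
ATTR = 0x0A                               # value placed in AH

video = bytearray()                       # stands in for 0xb800:0x0000
for byte in PROGRAM:                      # lodsb: next program byte in AL
    ax = (ATTR << 8) | byte
    video += ax.to_bytes(2, "little")     # stosw: AL first, then AH

foreground = ATTR & 0x0F                  # 10 is bright green
background = (ATTR >> 4) & 0x07           # 0 is black

print(video[:6].hex(" "))                 # fc 0a b4 0a b8 0a
print(foreground, background)             # 10 0
```

The even offsets of the buffer hold the program bytes and the odd
offsets hold the attribute, which is why DI advances by 2 while SI
advances by 1.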

Once again, if you are not happy about the program reading its own
executable bytes, we can keep the bytes we read separate from the
bytes the CPU executes. Here is a 54-byte program that does this:

fc b3 02 b4 b8 8e c0 31 ff be 1b 01 b9 1b 00 b4
0a ac ab e2 fc 4b 75 f1 f4 eb fd fc b3 02 b4 b8
8e c0 31 ff be 1b 01 b9 1b 00 b4 0a ac ab e2 fc
4b 75 f1 f4 eb fd

Here is how we can create and run this program:

echo fc b3 02 b4 b8 8e c0 31 ff be 1b 01 b9 1b 00 b4 | xxd -r -p > foo.com
echo 0a ac ab e2 fc 4b 75 f1 f4 eb fd fc b3 02 b4 b8 | xxd -r -p >> foo.com
echo 8e c0 31 ff be 1b 01 b9 1b 00 b4 0a ac ab e2 fc | xxd -r -p >> foo.com
echo 4b 75 f1 f4 eb fd | xxd -r -p >> foo.com
dosbox foo.com

With code page 437 active, the output should look approximately like
this:

ⁿ│☻┤╕Ä└1 ╛←☺╣← ┤◙¼½ΓⁿKu±⌠δ²ⁿ│☻┤╕Ä└1 ╛←☺╣← ┤◙¼½ΓⁿKu±⌠δ²

We can clearly see in this output that the first 27 bytes of output
are identical to the next 27 bytes of the output. Like the proper
quines discussed earlier, this one too has two halves that are
identical to each other. The executable code in the first half
reads the data bytes from the second half and prints the data bytes
twice so that the output bytes are an exact copy of all 54 bytes in
the program. Here is the disassembly:

$ ndisasm -o 0x100 foo.com
00000100  FC                cld
00000101  B302              mov bl,0x2
00000103  B4B8              mov ah,0xb8
00000105  8EC0              mov es,ax
00000107  31FF              xor di,di
00000109  BE1B01            mov si,0x11b
0000010C  B91B00            mov cx,0x1b
0000010F  B40A              mov ah,0xa
00000111  AC                lodsb
00000112  AB                stosw
00000113  E2FC              loop 0x111
00000115  4B                dec bx
00000116  75F1              jnz 0x109
00000118  F4                hlt
00000119  EBFD              jmp short 0x118
0000011B  FC                cld
0000011C  B302              mov bl,0x2
0000011E  B4B8              mov ah,0xb8
00000120  8EC0              mov es,ax
00000122  31FF              xor di,di
00000124  BE1B01            mov si,0x11b
00000127  B91B00            mov cx,0x1b
0000012A  B40A              mov ah,0xa
0000012C  AC                lodsb
0000012D  AB                stosw
0000012E  E2FC              loop 0x12c
00000130  4B                dec bx
00000131  75F1              jnz 0x124
00000133  F4                hlt
00000134  EBFD              jmp short 0x133

This disassembly is rather long but we can clearly see that the
bytes from offset 0x100 to offset 0x11a are identical to the bytes
from offset 0x11b to 0x135. These are the bytes we see in the
output of the program too.

Boot Program

The 32-byte program below writes itself to video memory when
executed from the boot sector:

ea 05 7c 00 00 fc b8 00 b8 8e c0 8c c8 8e d8 31
ff be 00 7c b9 20 00 b4 0a ac ab e2 fc f4 eb fd

We can create a boot image that contains these bytes, write it to
the boot sector of a drive and boot an IBM PC compatible computer
with it. On booting, this program prints its own bytes on the
screen.

On a Unix or Linux system, the following commands can be used to
create a boot image with the above program:

echo ea 05 7c 00 00 fc b8 00 b8 8e c0 8c c8 8e d8 31 | xxd -r -p > boot.img
echo ff be 00 7c b9 20 00 b4 0a ac ab e2 fc f4 eb fd | xxd -r -p >> boot.img
echo 55 aa | xxd -r -p | dd seek=510 bs=1 of=boot.img
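
The layout of the resulting 512-byte image may be easier to see in
the following Python sketch, which builds the same bytes: the
program at offset 0, zero padding up to offset 510, and the magic
bytes 0x55 and 0xaa at offsets 510 and 511. The file name boot.img
matches the commands above. The variable names are made up for this
illustration.

```python
PROGRAM = bytes.fromhex(
    "ea 05 7c 00 00 fc b8 00 b8 8e c0 8c c8 8e d8 31"
    "ff be 00 7c b9 20 00 b4 0a ac ab e2 fc f4 eb fd"
)
MAGIC = bytes.fromhex("55 aa")            # boot sector signature

# Pad the program with zero bytes so that the signature lands at 510.
image = PROGRAM.ljust(510, b"\x00") + MAGIC

with open("boot.img", "wb") as f:
    f.write(image)

print(len(image), image[510:].hex(" "))   # 512 55 aa
```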

Now we can test this boot image using DOSBox with the following
command:

dosbox -c cls -c 'boot boot.img'

We can also test this image using QEMU x86 system emulator as
follows:

qemu-system-i386 -fda boot.img

We could also write this image to the boot sector of an actual
physical storage device, such as a USB flash drive, and then boot
the computer with it. Here is an example command that writes the
boot image to the drive represented by the device
path /dev/sdx.

cp boot.img /dev/sdx

CAUTION: You need to be absolutely sure of the device path of the
device being written to. The device path /dev/sdx is only an
example here. If the boot image is written to the wrong device,
access to the data on that device would be lost.

On testing this boot image with an emulator or a real computer, the
output should look approximately like this:

Ω♣|  ⁿ╕ ╕Ä└î╚Ä╪1 ╛ |╣  ┤◙¼½Γⁿ⌠δ²

This looks like gibberish, however every symbol in the above output
corresponds to a byte of the program mentioned earlier. For
example, the first symbol (omega) represents the byte value 0xea,
the second symbol (club) represents the byte value 0x05, and so on.
The chart at cp437.html can be used to confirm that every symbol in
the output indeed represents every byte of the program.

Here is the disassembly of the program:

$ ndisasm -o 0x7c00 boot.img
00007C00  EA057C0000        jmp 0x0:0x7c05
00007C05  FC                cld
00007C06  B800B8            mov ax,0xb800
00007C09  8EC0              mov es,ax
00007C0B  8CC8              mov ax,cs
00007C0D  8ED8              mov ds,ax
00007C0F  31FF              xor di,di
00007C11  BE007C            mov si,0x7c00
00007C14  B92000            mov cx,0x20
00007C17  B40A              mov ah,0xa
00007C19  AC                lodsb
00007C1A  AB                stosw
00007C1B  E2FC              loop 0x7c19
00007C1D  F4                hlt
00007C1E  EBFD              jmp short 0x7c1d
00007C20  0000              add [bx+si],al
00007C22  0000              add [bx+si],al
...

The ellipsis in the end represents the remaining bytes that contain
zeroes and the boot sector magic bytes 0x55 and 0xaa at the end.
They have been omitted here for the sake of brevity.

When a computer boots, the BIOS reads the boot sector code from the
first sector of the boot device into the memory at physical address
0x7c00 and jumps to this address. Most BIOS implementations jump to
0x0000:0x7c00 but there are some implementations that jump to
0x07c0:0x0000 instead. Both these jumps are jumps to the same
physical address 0x7c00 but this difference poses a problem for us
because the offsets in our program depend on which jump the BIOS
executed. In order to make sure that our program can run with both
types of BIOS implementations, we use a popular trick of having the
first instruction of our program execute a jump to address
0x0000:0x7c05 in order to reach the second instruction. This sets
the register CS to 0 and IP to 0x7c05 and we do not have to worry
about the differences between BIOS implementations anymore. We can
now pretend as if a BIOS implementation that jumps to 0x0000:0x7c00
is going to load our program.
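
That the two jump targets refer to the same physical address follows
from real mode address arithmetic, where the segment value is
shifted left by four bits and added to the offset. Here is a short
Python sketch of that calculation. The function name physical is
made up for this illustration and the 20-bit mask assumes the
address wraparound of the original 8086.

```python
def physical(segment, offset):
    """Translate a real mode segment:offset pair to a physical address."""
    return ((segment << 4) + offset) & 0xFFFFF


print(hex(physical(0x0000, 0x7C00)))  # 0x7c00
print(hex(physical(0x07C0, 0x0000)))  # 0x7c00
print(hex(physical(0xB800, 0x0000)))  # 0xb8000
```

The physical address is the same in the first two cases but the
offset of the first instruction is 0x7c00 in one and 0x0000 in the
other, which is why an absolute operand such as the one in
mov si,0x7c00 is only correct after CS and DS are known to be 0.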

The rest of the program is similar to the one in the previous
section. However, there are some small but important differences.
While the DOS environment ensures that AL and CH are initialized
to 0 when a .COM program starts, the BIOS offers no such guarantee
while loading and executing a boot program. This is why we use the
registers AX and CX (as opposed to only AH and CL) in the mov
instructions to initialize them. Similarly, while DOS initializes SI
to 0x100 when a .COM program starts, for a boot program, we set the
register SI ourselves.

If you feel uncomfortable about calling the above program a quine
because it reads its own bytes from the memory, we could make the
program read the bytes it needs to print from a separate place in
memory. We do not execute these bytes. We only read them and copy
them to video memory. The following 76-byte program does this:

ea 05 7c 00 00 fc bb 02 00 b8 00 b8 8e c0 8c c8
8e d8 31 ff be 26 7c b9 26 00 b4 0a ac ab e2 fc
4b 75 f1 f4 eb fd ea 05 7c 00 00 fc bb 02 00 b8
00 b8 8e c0 8c c8 8e d8 31 ff be 26 7c b9 26 00
b4 0a ac ab e2 fc 4b 75 f1 f4 eb fd

Here is how we can create a boot image with this:

echo ea 05 7c 00 00 fc bb 02 00 b8 00 b8 8e c0 8c c8 | xxd -r -p > boot.img
echo 8e d8 31 ff be 26 7c b9 26 00 b4 0a ac ab e2 fc | xxd -r -p >> boot.img
echo 4b 75 f1 f4 eb fd ea 05 7c 00 00 fc bb 02 00 b8 | xxd -r -p >> boot.img
echo 00 b8 8e c0 8c c8 8e d8 31 ff be 26 7c b9 26 00 | xxd -r -p >> boot.img
echo b4 0a ac ab e2 fc 4b 75 f1 f4 eb fd | xxd -r -p >> boot.img
echo 55 aa | xxd -r -p | dd seek=510 bs=1 of=boot.img

Here are the commands to test this boot image:

dosbox -c cls -c 'boot boot.img'
qemu-system-i386 -fda boot.img

The output should look like this:

Ω♣|  ⁿ╗☻ ╕ ╕Ä└î╚Ä╪1 ╛&|╣& ┤◙¼½ΓⁿKu±⌠δ²Ω♣|  ⁿ╗☻ ╕ ╕Ä└î╚Ä╪1 ╛&|╣& ┤◙¼½ΓⁿKu±⌠δ²

Here is the disassembly of this program:

$ ndisasm -o 0x7c00 boot.img
00007C00  EA057C0000        jmp 0x0:0x7c05
00007C05  FC                cld
00007C06  BB0200            mov bx,0x2
00007C09  B800B8            mov ax,0xb800
00007C0C  8EC0              mov es,ax
00007C0E  8CC8              mov ax,cs
00007C10  8ED8              mov ds,ax
00007C12  31FF              xor di,di
00007C14  BE267C            mov si,0x7c26
00007C17  B92600            mov cx,0x26
00007C1A  B40A              mov ah,0xa
00007C1C  AC                lodsb
00007C1D  AB                stosw
00007C1E  E2FC              loop 0x7c1c
00007C20  4B                dec bx
00007C21  75F1              jnz 0x7c14
00007C23  F4                hlt
00007C24  EBFD              jmp short 0x7c23
00007C26  EA057C0000        jmp 0x0:0x7c05
00007C2B  FC                cld
00007C2C  BB0200            mov bx,0x2
00007C2F  B800B8            mov ax,0xb800
00007C32  8EC0              mov es,ax
00007C34  8CC8              mov ax,cs
00007C36  8ED8              mov ds,ax
00007C38  31FF              xor di,di
00007C3A  BE267C            mov si,0x7c26
00007C3D  B92600            mov cx,0x26
00007C40  B40A              mov ah,0xa
00007C42  AC                lodsb
00007C43  AB                stosw
00007C44  E2FC              loop 0x7c42
00007C46  4B                dec bx
00007C47  75F1              jnz 0x7c3a
00007C49  F4                hlt
00007C4A  EBFD              jmp short 0x7c49
00007C4C  0000              add [bx+si],al
00007C4E  0000              add [bx+si],al
...

This program has two identical halves. The first half from offset
0x7c00 to offset 0x7c25 contains the executable bytes. The second
half from offset 0x7c26 to 0x7c4b contains the data bytes read by
the executable bytes. The executable part of the code has an outer
loop that uses the register BX as the counter variable. It sets BX
to 2 so that the outer loop iterates twice. In each iteration, it
reads data bytes from the second half of the program and prints
them. The code to read bytes and print them is very similar to our
earlier program. Since the data bytes in the second half are
identical to the executable bytes in the first half, printing the
data bytes twice amounts to printing all bytes of the program.

While this program does avoid reading the bytes that the CPU
executes, the data bytes look exactly like the executable bytes.
Although I do not see any point in trying to avoid reading
executable bytes in an exercise like this, this program serves as an
example of a self-printing boot program that does not execute the
bytes it reads.
