By Susam Pal on 27 Oct 2005
The following 12-byte program composed of pure x86 machine code
writes itself to standard output when executed in a DOS environment:
fc b1 0c ac 92 b4 02 cd 21 e2 f8 c3
We can write these bytes to a file with the .COM extension and
execute it in DOS. It runs successfully in MS-DOS 6.22, Windows 98,
as well as in DOSBox and writes a copy of itself to standard output.
Contents
- Demo
- Quine Conundrums
- Proper Quines
- A Note on DOS Services
- Writing to Video Memory Directly
- Boot Program
Demo
On a Unix or Linux system, the following commands demonstrate this
program with the help of DOSBox:
echo fc b1 0c ac 92 b4 02 cd 21 e2 f8 c3 | xxd -r -p > foo.com
dosbox -c 'MOUNT C .' -c 'C:\FOO > C:\OUT.COM' -c 'EXIT'
diff foo.com OUT.COM
The diff command should produce no output, confirming that the
output of the program is identical to the program itself.
On an actual MS-DOS 6.22 system or a Windows 98 system, we can
demonstrate this program in the following manner:
C:\>DEBUG
-E 100 fc b1 0c ac 92 b4 02 cd 21 e2 f8 c3
-N FOO.COM
-R CX
CX 0000
:C
-W
Writing 0000C bytes
-Q
C:\>FOO > OUT.COM
C:\>FC FOO.COM OUT.COM
Comparing files FOO.COM and OUT.COM
FC: no differences encountered
In the DEBUG session shown above, we use the debugger command E to
enter the machine code at offset 0x100 of the code segment. Then we
use the N command to name the file we want to write this machine
code to. The command R CX is used to specify that we want to write
0xC (decimal 12) bytes to this file. The W command writes the 12
bytes entered at offset 0x100. The Q command quits the debugger.
Then we run the new FOO.COM program while redirecting its output to
OUT.COM. Finally, we use the FC command to compare the two files and
confirm that they are exactly the same.
Let us disassemble this program now and see what it does. The
output below is generated using the Netwide Disassembler (NDISASM),
a tool that comes with Netwide Assembler (NASM):
$ ndisasm -o 0x100 foo.com
00000100  FC                cld
00000101  B10C              mov cl,0xc
00000103  AC                lodsb
00000104  92                xchg ax,dx
00000105  B402              mov ah,0x2
00000107  CD21              int 0x21
00000109  E2F8              loop 0x103
0000010B  C3                ret
When DOS executes a program in a .COM file, it loads the machine code
in the file at offset 0x100 of the code segment chosen by DOS. That
is why we ask the disassembler to assume a load address of 0x100
with the -o command line option. The first instruction clears the
direction flag. The purpose of this instruction is explained later.
The next instruction sets the register CL to 0xc (decimal 12). The
register CH is already set to 0 by default when a .COM program
starts. Thus setting the register CL to 0xc effectively sets the
entire register CX to 0xc. The register CX is used as a loop counter
for the loop 0x103 instruction that comes later. Every time this
loop instruction executes, it decrements CX and makes a near jump to
offset 0x103 if CX is not 0. This results in 12 iterations of the
loop.
In each iteration of the loop, the instructions from offset 0x103 to
offset 0x109 are executed. The lodsb instruction loads a byte from
address DS:SI into AL. When DOS starts executing this program, DS
and SI are set to CS and 0x100 by default, so at the beginning DS:SI
points to the first byte of the program. The xchg instruction
exchanges the values in AX and DX. Thus the byte we just loaded into
AL ends up in DL. Then we set AH to 2 and generate the software
interrupt 0x21 (decimal 33) to write the byte in DL to standard
output. This is how each iteration reads a byte of this program and
writes it to standard output.
The lodsb instruction increments or decrements SI depending on the
state of the direction flag (DF). When DF is cleared, it increments
SI. If DF is set, it decrements SI. We use the cld instruction at
the beginning to clear DF, so that in each iteration of the loop, SI
moves forward to point to the next byte of the program. This is how
the 12 iterations of the loop write 12 bytes of the program to
standard output. In many DOS environments, the DF flag is already in
cleared state when a .COM program starts, so the CLD instruction may
be omitted in such environments. However, there are some
environments where DF may not be in cleared state when our program
starts, so it is a best practice to clear DF before relying on it.
Finally, when the loop terminates, we execute the RET instruction to
terminate the program.
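The walkthrough above can be traced with a small interpreter. The following Python sketch is only an illustration: it understands just the opcodes that appear in this program, and its starting register values (SI = 0x100, CH = 0) are the .COM startup defaults described in the text, not something obtained from a real DOS. The function name run is made up for this sketch.

```python
PROGRAM = bytes.fromhex("fc b1 0c ac 92 b4 02 cd 21 e2 f8 c3")
LOAD_OFFSET = 0x100  # .COM programs are loaded at offset 0x100


def run(program: bytes) -> bytes:
    """Interpret the handful of opcodes used above and return what
    the program writes to standard output."""
    mem = bytearray(0x10000)
    mem[LOAD_OFFSET:LOAD_OFFSET + len(program)] = program
    ax = cx = dx = 0            # assumed startup values (CH = 0 matters here)
    si = ip = LOAD_OFFSET       # DS:SI and CS:IP both start at 0x100
    df = 0
    out = bytearray()

    while True:
        op = mem[ip]
        if op == 0xFC:          # cld: clear the direction flag
            df = 0
            ip += 1
        elif op == 0xB1:        # mov cl, imm8
            cx = (cx & 0xFF00) | mem[ip + 1]
            ip += 2
        elif op == 0xAC:        # lodsb: AL = [DS:SI], then step SI
            ax = (ax & 0xFF00) | mem[si]
            si = (si + (-1 if df else 1)) & 0xFFFF
            ip += 1
        elif op == 0x92:        # xchg ax, dx
            ax, dx = dx, ax
            ip += 1
        elif op == 0xB4:        # mov ah, imm8
            ax = (mem[ip + 1] << 8) | (ax & 0x00FF)
            ip += 2
        elif op == 0xCD:        # int imm8
            # Only int 0x21 with AH = 2 (write DL) is modelled.
            if mem[ip + 1] != 0x21 or (ax >> 8) != 2:
                raise ValueError("unsupported interrupt call")
            out.append(dx & 0xFF)
            ip += 2
        elif op == 0xE2:        # loop rel8: decrement CX, jump if not 0
            rel = mem[ip + 1]
            if rel >= 0x80:
                rel -= 0x100
            cx = (cx - 1) & 0xFFFF
            ip += 2
            if cx != 0:
                ip = (ip + rel) & 0xFFFF
        elif op == 0xC3:        # ret: treated as program termination
            return bytes(out)
        else:
            raise ValueError(f"unsupported opcode {op:#04x}")


output = run(PROGRAM)
```

Comparing output with PROGRAM plays the role of the diff and FC checks in the demo, although it says nothing about how a real DOS environment behaves.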
Quine Conundrums
While reading the description of the self-printing program presented
earlier, one might wonder if it is a quine. While there is no
standardized definition of the term quine, it is generally accepted
that a quine is a computer program that takes no input and produces
an exact copy of its own source code as its output. Since a quine
cannot take any input, tricks involving reading its own source code
or evaluating itself are ruled out.
For example, this shell script is a valid quine:
s='s=\47%s\47;printf "$s" "$s"\n';printf "$s" "$s"
However, the following shell script is not considered a proper
quine:
cat $0
The shell script above reads its own source code which is considered
cheating. Improper quines like this are often called cheating
quines.
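For readers more at home in Python than in shell, the same idea can be sketched there. The block below is an illustration, not part of the programs discussed in this post: it keeps a well-known two-line Python quine in a string named QUINE, runs it, and captures what it prints. The helper name output_of is made up for this sketch.

```python
import contextlib
import io

# Text of a two-line Python quine. It embeds a template of itself as
# data and formats that template; it never opens or reads its own file.
QUINE = "s = 's = %r\\nprint(s %% s)'\nprint(s % s)\n"


def output_of(source: str) -> str:
    """Run a piece of Python source and return what it prints."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(source, {})
    return buffer.getvalue()


printed = output_of(QUINE)
```

If printed equals QUINE, the program has reproduced its own text without taking any input, which is the property the cat $0 script lacks.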
Is our 12-byte x86 program a quine? It turns out that we have a
conundrum. There is no notion of source code for our program. There
would have been one if we had written out the source code of this
program in assembly language. In such a case we would first need to
choose an assembler and a proper quine would need to produce an
exact copy of the assembly language source code (not the machine
code bytes) for the chosen assembler. But we are not doing that
here. We want the machine code to produce an exact copy of itself.
There is no source code involved. We only have machine code. So we
might argue that the whole notion of machine code quine is nonsense.
No machine code quine can exist because there is no source code to
produce as output.
However, we might also argue that the machine code is the input for
the CPU that the CPU fetches, decodes, and converts to a sequence of
state changes in the CPU. If we define a machine code quine to be a
machine code program that writes its own bytes, then we might say
that we have a machine code quine here.
Let us now entertain the thought that our 12-byte program is indeed
a machine code quine. Now we have a new conundrum. Is it a proper
quine? This program reads its own bytes from memory and writes
them. Does that make it a cheating quine? What would a proper quine
written in pure machine code even look like? If we look at the shell
script quine above, we see that it contains parts of the executable
part of the script code embedded in a string as data. Then we format
the string cleverly to produce a new string that looks exactly like
the entire shell script. It is a common pattern followed in many
quines. The quine does not read its own code but it reads some data
defined by the code and formats that data to look like its own code.
However, in pure machine code like this the lines between data and
code are blurred. Even if we try to keep the bytes we want to read
at a separate place in the memory and treat it like data, they would
look exactly like machine instructions, so one might wonder if there
is any point in trying to make a machine quine that does not read
its own bytes. Nevertheless the next section shows how to accomplish
this.
Proper Quines
If the thought of a machine code quine program reading its own bytes
from the memory makes you uncomfortable, here is an adaptation of
the previous program that keeps the machine instructions to be
executed separate from the data bytes to be read by the program.
fc b3 02 b1 14 be 14 01 ac 92 b4 02 cd 21 e2 f8 4b 75 f0 c3
fc b3 02 b1 14 be 14 01 ac 92 b4 02 cd 21 e2 f8 4b 75 f0 c3
Here is how we can demonstrate this 40-byte program:
echo fc b3 02 b1 14 be 14 01 ac 92 b4 02 cd 21 e2 f8 4b 75 f0 c3 | xxd -r -p > foo.com
echo fc b3 02 b1 14 be 14 01 ac 92 b4 02 cd 21 e2 f8 4b 75 f0 c3 | xxd -r -p >> foo.com
dosbox -c 'MOUNT C .' -c 'C:\FOO > C:\OUT.COM' -c 'EXIT'
diff foo.com OUT.COM
Here is the disassembly:
$ ndisasm -o 0x100 foo.com
00000100  FC                cld
00000101  B302              mov bl,0x2
00000103  B114              mov cl,0x14
00000105  BE1401            mov si,0x114
00000108  AC                lodsb
00000109  92                xchg ax,dx
0000010A  B402              mov ah,0x2
0000010C  CD21              int 0x21
0000010E  E2F8              loop 0x108
00000110  4B                dec bx
00000111  75F0              jnz 0x103
00000113  C3                ret
00000114  FC                cld
00000115  B302              mov bl,0x2
00000117  B114              mov cl,0x14
00000119  BE1401            mov si,0x114
0000011C  AC                lodsb
0000011D  92                xchg ax,dx
0000011E  B402              mov ah,0x2
00000120  CD21              int 0x21
00000122  E2F8              loop 0x11c
00000124  4B                dec bx
00000125  75F0              jnz 0x117
00000127  C3                ret
The first 20 bytes are the executable part of the program. The next
20 bytes are the data read by the program. The executable bytes are
identical to the data bytes. The executable part of the program has
an outer loop that iterates twice. In each iteration, it reads the
data bytes and writes them to standard output. Therefore, in two
iterations of the outer loop, it writes the data bytes twice. In
this manner, the output is identical to the program itself.
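The arithmetic behind this layout can be checked with a short Python sketch. It does not execute any machine code. It only reads the three immediate operands out of the executable half (the pass count from mov bl, the byte count from mov cl, and the data offset from mov si) and works out what two passes over the data would write. All variable names are made up for this sketch.

```python
HALF = bytes.fromhex(
    "fc b3 02 b1 14 be 14 01 ac 92 b4 02 cd 21 e2 f8 4b 75 f0 c3"
)
PROGRAM = HALF + HALF
LOAD_OFFSET = 0x100  # load offset of a .COM program

# Immediate operands taken from the executable half.
passes = PROGRAM[2]                                   # mov bl,0x2
count = PROGRAM[4]                                    # mov cl,0x14
data_offset = int.from_bytes(PROGRAM[6:8], "little")  # mov si,0x114

# The data region, expressed as an index into the program file.
start = data_offset - LOAD_OFFSET
data = PROGRAM[start:start + count]

# Each pass of the outer loop writes the data bytes once.
output = data * passes
```

With these operands the data region is exactly the second half of the file, so writing it twice yields all 40 bytes.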
Here is another simpler 32-byte quine based on this approach:
b8 23 09 fe c0 a2 20 01 ba 10 01 cd 21 cd 21 c3
b8 23 09 fe c0 a2 20 01 ba 10 01 cd 21 cd 21 c3
Here are the commands to demonstrate this quine:
echo b8 23 09 fe c0 a2 20 01 ba 10 01 cd 21 cd 21 c3 | xxd -r -p > foo.com
echo b8 23 09 fe c0 a2 20 01 ba 10 01 cd 21 cd 21 c3 | xxd -r -p >> foo.com
dosbox -c 'MOUNT C .' -c 'C:\FOO > C:\OUT.COM' -c 'EXIT'
diff foo.com OUT.COM
Here is the disassembly:
$ ndisasm -o 0x100 foo.com
00000100  B82309            mov ax,0x923
00000103  FEC0              inc al
00000105  A22001            mov [0x120],al
00000108  BA1001            mov dx,0x110
0000010B  CD21              int 0x21
0000010D  CD21              int 0x21
0000010F  C3                ret
00000110  B82309            mov ax,0x923
00000113  FEC0              inc al
00000115  A22001            mov [0x120],al
00000118  BA1001            mov dx,0x110
0000011B  CD21              int 0x21
0000011D  CD21              int 0x21
0000011F  C3                ret
This example too has two parts. The first half has the executable
bytes and the second half has the data bytes. Both parts are
identical. This example sets AH to 9 in the first instruction and
then later uses int 0x21 to invoke the DOS service that prints a
dollar-terminated string starting at the address specified in DS:DX.
When a .COM program starts, DS already points to the current code
segment, so we don't have to set it explicitly. The dollar symbol
has an ASCII code of 0x24 (decimal 36). We need to be careful about
not having this value anywhere within the data bytes or this DOS
function would prematurely stop printing our data bytes as soon as
it encounters this value. That's why we set AL to 0x23 in the first
instruction, then increment it to 0x24 in the second instruction,
and then copy this value to the end of the data bytes in the third
instruction. Finally, we execute int 0x21 twice to write the data
bytes twice to standard output, so that the output matches the
program itself.
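The role of the dollar terminator can be illustrated with a small Python model. This is a sketch under stated assumptions: dos_print_string is a made-up stand-in for the DOS string-printing service that simply returns the bytes from a given offset up to, but not including, the first 0x24 byte. The rest mirrors the three setup instructions described above.

```python
HALF = bytes.fromhex("b8 23 09 fe c0 a2 20 01 ba 10 01 cd 21 cd 21 c3")
PROGRAM = HALF + HALF
LOAD_OFFSET = 0x100
TERMINATOR = 0x24  # ASCII code of '$'


def dos_print_string(mem: bytearray, dx: int) -> bytes:
    """Hypothetical model of int 0x21 with AH = 9: return the bytes
    starting at offset dx up to the first '$'."""
    end = mem.index(TERMINATOR, dx)
    return bytes(mem[dx:end])


# Lay the program out in a 64 KiB segment as DOS would for a .COM file.
mem = bytearray(0x10000)
mem[LOAD_OFFSET:LOAD_OFFSET + len(PROGRAM)] = PROGRAM

al = 0x23              # low byte of mov ax,0x923
al = (al + 1) & 0xFF   # inc al, so AL becomes 0x24
mem[0x120] = al        # mov [0x120],al places '$' right after the data

# Two int 0x21 calls with DX = 0x110 print the data bytes twice.
output = dos_print_string(mem, 0x110) * 2
```

Note that 0x24 does not occur anywhere in PROGRAM itself, which is the reason for building the terminator at run time instead of storing it in the file.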
While both these programs take care not to read the same memory
region that is being executed by the CPU, the data bytes they read
look exactly like the executable bytes. This is what I meant when I
mentioned earlier that the lines between code and data are blurred
in an exercise like this. This is why I do not really see a point in
keeping the executable bytes separate from the data bytes while
writing machine code quines.
A Note on DOS Services
The self-printing programs presented above use int 0x21 which
offers DOS services that support various input/output functions. In
the first two programs, we selected the function to write a
character to standard output by setting AH to 2 before invoking this
software interrupt. In the next program, we selected the function to
write a dollar-terminated string to standard output by setting AH
to 9.
The ret instruction in the end too relies on DOS services. When a
.COM program starts, the register SP contains 0xfffe. The stack
memory locations at offset 0xfffe and 0xffff contain 0x00 and 0x00,
respectively. Further, the memory address at offset 0x0000 contains
the instruction int 0x20 which is a DOS service that terminates the
program. As a result, executing the ret instruction pops 0x0000 off
the stack at 0xfffe and loads it into IP. This results in the
instruction int 0x20 at offset 0x0000 getting executed. This
instruction terminates the program and returns to DOS.
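The effect of ret described above can be modelled in a few lines of Python. The memory contents below are the ones stated in the text (zero bytes at 0xfffe and 0xffff, int 0x20 at offset 0x0000). The function name near_ret is made up for this sketch.

```python
mem = bytearray(0x10000)                 # one 64 KiB segment, all zeroes
mem[0x0000:0x0002] = bytes.fromhex("cd 20")  # int 0x20 at offset 0x0000
sp = 0xFFFE                              # stack pointer at program start


def near_ret(mem: bytearray, sp: int) -> tuple[int, int]:
    """Pop a 16-bit little-endian return offset; return (new IP, new SP)."""
    ip = int.from_bytes(mem[sp:sp + 2], "little")
    return ip, (sp + 2) & 0xFFFF


ip, sp = near_ret(mem, sp)
next_instruction = bytes(mem[ip:ip + 2])  # the bytes the CPU fetches next
```

After the pop, IP is 0x0000 and the next two bytes fetched are cd 20, that is, int 0x20.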
Relying on DOS services gives us a comfortable environment to work
with. In particular, DOS implements the notion of standard output
which lets us redirect standard output to a file. This lets us
conveniently compare the original program file and the output file
with the FC command and confirm that they are identical.
But one might wonder if we could avoid relying on DOS services
completely and still write a program that prints its own bytes to
screen. We definitely can. We could write directly to video memory
at address 0xb800:0x0000 and show the bytes of the program on
screen. We could also forgo DOS completely and let BIOS load our
program from the boot sector and execute it. The next two sections
discuss these things.
Writing to Video Memory Directly
Here is an example of an 18-byte self-printing program that writes
directly to the video memory at address 0xb800:0x0000.
fc b4 b8 8e c0 31 ff b1 12 b4 0a ac ab e2 fc f4 eb fd
Here are the commands to create and run this program:
echo fc b4 b8 8e c0 31 ff b1 12 b4 0a ac ab e2 fc f4 eb fd | xxd -r -p > foo.com
dosbox foo.com
With the default code page active, i.e., with code page 437 active,
the program should show an output that looks approximately like the
following and halt:
ⁿ┤╕Ä└1 ▒↕┤◙¼½Γⁿ⌠δ²
Now of course this kind of output looks like gibberish but there is
a quick and dirty way to confirm that this output indeed represents
the bytes of our program. We can use the TYPE command of DOS to
print the program and check if the symbols that appear in its output
look consistent with the output above. Here is an example:
C:\>TYPE FOO.COM
ⁿ┤╕Ä└1 ▒↕┤
¼½Γⁿ⌠δ²
C:\>
This output looks very similar to the previous one except that the
byte value 0x0a is rendered as a line break in this output whereas
in the previous output this byte value is represented as a circle in
a box. This method would not have worked if there were any control
characters such as backspace or carriage return that result in
characters being erased in the displayed output.
A proper way to verify that the output of the program represents the
bytes of the program would be to look up each symbol in the output
in a chart for code page 437 and confirm that the byte value of each
symbol matches each byte value in the program. Here is one such
chart that approximates the symbols in code page 437 with Unicode
symbols: cp437.html.
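The chart lookup can also be approximated in Python, whose standard cp437 codec covers the byte values from 0x20 upwards. The codec maps the range 0x00 to 0x1f to control characters rather than to the glyphs shown on screen, so the sketch below adds a small table for the two such bytes in this program. That table and the function name render are assumptions of this sketch, not a complete chart.

```python
PROGRAM = bytes.fromhex(
    "fc b4 b8 8e c0 31 ff b1 12 b4 0a ac ab e2 fc f4 eb fd"
)

# Screen glyphs for the control-range bytes that occur in PROGRAM.
# Only these two entries are needed here; other control bytes are
# left to the codec and come out as control characters.
CONTROL_GLYPHS = {
    0x0A: "\u25d9",  # inverse white circle
    0x12: "\u2195",  # up down arrow
}


def render(data: bytes) -> str:
    """Approximate how each byte appears on a code page 437 text screen."""
    chars = []
    for value in data:
        glyph = CONTROL_GLYPHS.get(value)
        if glyph is None:
            glyph = bytes([value]).decode("cp437")
        chars.append(glyph)
    return "".join(chars)


screen_text = render(PROGRAM)
```

Printing screen_text gives a string that can be compared by eye with the output shown earlier. The byte 0xff decodes to a no-break space, which looks like a blank.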
Here is the disassembly of the above program:
$ ndisasm -o 0x100 foo.com
00000100  FC                cld
00000101  B4B8              mov ah,0xb8
00000103  8EC0              mov es,ax
00000105  31FF              xor di,di
00000107  B112              mov cl,0x12
00000109  B40A              mov ah,0xa
0000010B  AC                lodsb
0000010C  AB                stosw
0000010D  E2FC              loop 0x10b
0000010F  F4                hlt
00000110  EBFD              jmp short 0x10f
This program sets ES to 0xb800 and DI to 0. Thus ES:DI points to
the video memory at address 0xb800:0x0000. DS:SI points to the
first instruction of this program by default. Further AH is set to
0xa. This is used to specify the colour attribute of the text to be
displayed on screen. Each iteration of the loop in this program
loads a byte of the program and writes it along with the colour
attribute to video memory. The lodsb instruction loads a byte of the
program from the memory address specified by DS:SI into AL and
increments SI by 1. AH is already set to 0xa. The value 0xa (binary
00001010) here specifies black as the background colour and bright
green as the foreground colour. The stosw instruction stores a word
from AX to the memory address specified by ES:DI and increments DI
by 2. In this manner, the byte in AL and its colour attribute in AH
get copied to the video memory.
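The layout of the text cells can be sketched in Python. Each stosw stores AL at the lower address and AH at the next one, so every character occupies two bytes of video memory: the character byte followed by its attribute byte. The field split in decode_attribute follows the usual text-mode convention (low four bits foreground, next three bits background, top bit blink). The function names are made up for this sketch.

```python
PROGRAM = bytes.fromhex(
    "fc b4 b8 8e c0 31 ff b1 12 b4 0a ac ab e2 fc f4 eb fd"
)
ATTRIBUTE = 0x0A  # value placed in AH by mov ah,0xa


def decode_attribute(attr: int) -> dict:
    """Split a text-mode attribute byte into its fields."""
    return {
        "foreground": attr & 0x0F,        # 0xa is bright green
        "background": (attr >> 4) & 0x07,  # 0 is black
        "blink": (attr >> 7) & 0x01,
    }


def to_cells(data: bytes, attr: int) -> bytes:
    """Model the lodsb/stosw loop: one (character, attribute) pair per byte."""
    cells = bytearray()
    for value in data:
        cells += bytes([value, attr])  # AL first, then AH (little-endian AX)
    return bytes(cells)


fields = decode_attribute(ATTRIBUTE)
cells = to_cells(PROGRAM, ATTRIBUTE)
```

The 18 program bytes therefore fill 36 bytes of video memory, with the program bytes at the even offsets.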
Once again, if you are not happy about the program reading its own
executable bytes, we can keep the bytes we read separate from the
bytes the CPU executes. Here is a 54-byte program that does this:
fc b3 02 b4 b8 8e c0 31 ff be 1b 01 b9 1b 00 b4
0a ac ab e2 fc 4b 75 f1 f4 eb fd fc b3 02 b4 b8
8e c0 31 ff be 1b 01 b9 1b 00 b4 0a ac ab e2 fc
4b 75 f1 f4 eb fd
Here is how we can create and run this program:
echo fc b3 02 b4 b8 8e c0 31 ff be 1b 01 b9 1b 00 b4 | xxd -r -p > foo.com
echo 0a ac ab e2 fc 4b 75 f1 f4 eb fd fc b3 02 b4 b8 | xxd -r -p >> foo.com
echo 8e c0 31 ff be 1b 01 b9 1b 00 b4 0a ac ab e2 fc | xxd -r -p >> foo.com
echo 4b 75 f1 f4 eb fd | xxd -r -p >> foo.com
dosbox foo.com
With code page 437 active, the output should look approximately like
this:
ⁿ│☻┤╕Ä└1 ╛←☺╣← ┤◙¼½ΓⁿKu±⌠δ²ⁿ│☻┤╕Ä└1 ╛←☺╣← ┤◙¼½ΓⁿKu±⌠δ²
We can clearly see in this output that the first 27 bytes of output
are identical to the next 27 bytes of the output. Like the proper
quines discussed earlier, this one too has two halves that are
identical to each other. The executable code in the first half
reads the data bytes from the second half and prints the data bytes
twice so that the output bytes are an exact copy of all 54 bytes in
the program. Here is the disassembly:
$ ndisasm -o 0x100 foo.com
00000100  FC                cld
00000101  B302              mov bl,0x2
00000103  B4B8              mov ah,0xb8
00000105  8EC0              mov es,ax
00000107  31FF              xor di,di
00000109  BE1B01            mov si,0x11b
0000010C  B91B00            mov cx,0x1b
0000010F  B40A              mov ah,0xa
00000111  AC                lodsb
00000112  AB                stosw
00000113  E2FC              loop 0x111
00000115  4B                dec bx
00000116  75F1              jnz 0x109
00000118  F4                hlt
00000119  EBFD              jmp short 0x118
0000011B  FC                cld
0000011C  B302              mov bl,0x2
0000011E  B4B8              mov ah,0xb8
00000120  8EC0              mov es,ax
00000122  31FF              xor di,di
00000124  BE1B01            mov si,0x11b
00000127  B91B00            mov cx,0x1b
0000012A  B40A              mov ah,0xa
0000012C  AC                lodsb
0000012D  AB                stosw
0000012E  E2FC              loop 0x12c
00000130  4B                dec bx
00000131  75F1              jnz 0x124
00000133  F4                hlt
00000134  EBFD              jmp short 0x133
This disassembly is rather long but we can clearly see that the
bytes from offset 0x100 to offset 0x11a are identical to the bytes
from offset 0x11b to 0x135. These are the bytes we see in the
output of the program too.
Boot Program
The 32-byte program below writes itself to video memory when
executed from the boot sector:
ea 05 7c 00 00 fc b8 00 b8 8e c0 8c c8 8e d8 31
ff be 00 7c b9 20 00 b4 0a ac ab e2 fc f4 eb fd
We can create a boot image that contains these bytes, write it to
the boot sector of a drive and boot an IBM PC compatible computer
with it. On booting, this program prints its own bytes on the
screen.
On a Unix or Linux system, the following commands can be used to
create a boot image with the above program:
echo ea 05 7c 00 00 fc b8 00 b8 8e c0 8c c8 8e d8 31 | xxd -r -p > boot.img
echo ff be 00 7c b9 20 00 b4 0a ac ab e2 fc f4 eb fd | xxd -r -p >> boot.img
echo 55 aa | xxd -r -p | dd seek=510 bs=1 of=boot.img
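The layout these commands produce can also be sketched in Python, which makes the structure of the image explicit: the program bytes at the start, zero padding up to offset 510, and the two signature bytes at offsets 510 and 511. The function name build_boot_image is made up for this sketch.

```python
SECTOR_SIZE = 512
BOOT_SIGNATURE = bytes.fromhex("55 aa")  # bytes at offsets 510 and 511

PROGRAM = bytes.fromhex(
    "ea 05 7c 00 00 fc b8 00 b8 8e c0 8c c8 8e d8 31 "
    "ff be 00 7c b9 20 00 b4 0a ac ab e2 fc f4 eb fd"
)


def build_boot_image(program: bytes) -> bytes:
    """Return a 512-byte boot sector: program, zero padding, signature."""
    room = SECTOR_SIZE - len(BOOT_SIGNATURE)
    if len(program) > room:
        raise ValueError("program does not fit in one boot sector")
    padding = bytes(room - len(program))  # zero bytes
    return program + padding + BOOT_SIGNATURE


image = build_boot_image(PROGRAM)
```

Writing image to a file, for instance with pathlib.Path("boot.img").write_bytes(image), should give the same 512-byte layout as the shell commands above.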
Now we can test this boot image using DOSBox with the following
command:
dosbox -c cls -c 'boot boot.img'
We could also test this image using the QEMU x86 system emulator as
follows:
qemu-system-i386 -fda boot.img
We could also write this image to the boot sector of an actual
physical storage device, such as a USB flash drive, and then boot
the computer with it. Here is an example command that writes the
boot image to the drive represented by the device path /dev/sdx.
cp boot.img /dev/sdx
CAUTION: You need to be absolutely sure of the device path of the
device being written to. The device path /dev/sdx is only an
example here. If the boot image is written to the wrong device,
access to the data on that device would be lost.
On testing this boot image with an emulator or a real computer, the
output should look approximately like this:
Ω♣| ⁿ╕ ╕Ä└î╚Ä╪1 ╛ |╣ ┤◙¼½Γⁿ⌠δ²
This looks like gibberish, however every symbol in the above output
corresponds to a byte of the program mentioned earlier. For
example, the first symbol (omega) represents the byte value 0xea,
the second symbol (club) represents the byte value 0x05, and so on.
The chart at cp437.html can be used to confirm that each symbol in
the output indeed represents each byte of the program.
Here is the disassembly of the program:
$ ndisasm -o 0x7c00 boot.img
00007C00  EA057C0000        jmp 0x0:0x7c05
00007C05  FC                cld
00007C06  B800B8            mov ax,0xb800
00007C09  8EC0              mov es,ax
00007C0B  8CC8              mov ax,cs
00007C0D  8ED8              mov ds,ax
00007C0F  31FF              xor di,di
00007C11  BE007C            mov si,0x7c00
00007C14  B92000            mov cx,0x20
00007C17  B40A              mov ah,0xa
00007C19  AC                lodsb
00007C1A  AB                stosw
00007C1B  E2FC              loop 0x7c19
00007C1D  F4                hlt
00007C1E  EBFD              jmp short 0x7c1d
00007C20  0000              add [bx+si],al
00007C22  0000              add [bx+si],al
...
The ellipsis in the end represents the remaining bytes, which
contain zeroes followed by the boot sector magic bytes 0x55 and 0xaa
at the end. They have been omitted here for the sake of brevity.
When a computer boots, the BIOS reads the boot sector code from the
first sector of the boot device into the memory at physical address
0x7c00 and jumps to this address. Most BIOS implementations jump to
0x0000:0x7c00 but there are some implementations that jump to
0x07c0:0x0000 instead. Both these jumps are jumps to the same
physical address 0x7c00 but this difference poses a problem for us
because the offsets in our program depend on which jump the BIOS
executed. In order to make sure that our program can run with both
types of BIOS implementations, we use a popular trick of having the
first instruction of our program execute a jump to address
0x0000:0x7c05 in order to reach the second instruction. This sets
the register CS to 0 and IP to 0x7c05 and we don't have to worry
about the differences between BIOS implementations anymore. We can
now pretend as if a BIOS implementation that jumps to 0x0000:0x7c00
is going to load our program.
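The address arithmetic behind this can be written out in Python. In real mode, a segment:offset pair refers to the physical address segment * 16 + offset. The function names below are made up for this sketch.

```python
def physical(segment: int, offset: int) -> int:
    """Real-mode physical address of segment:offset (20-bit wrap)."""
    return ((segment << 4) + offset) & 0xFFFFF


def offset_in(segment: int, address: int) -> int:
    """Offset at which a physical address appears within a segment."""
    return address - (segment << 4)


# Both BIOS conventions land on the same physical address.
entry_a = physical(0x0000, 0x7C00)
entry_b = physical(0x07C0, 0x0000)

# The same byte (the cld at physical 0x7c05) has different offsets
# depending on which segment value the BIOS left in CS.
second_instruction = 0x7C05
offset_with_cs_0000 = offset_in(0x0000, second_instruction)
offset_with_cs_07c0 = offset_in(0x07C0, second_instruction)
```

Here entry_a and entry_b are both 0x7c00, while the two offsets come out as 0x7c05 and 0x0005. The far jump removes this ambiguity by loading CS with 0 explicitly.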
The rest of the program is similar to the one in the previous
section. However, there are some small but important differences.
While the DOS environment guarantees that AH and CH are initialized
to 0 when a .COM program starts, the BIOS offers no such guarantee
while loading and executing a boot program. This is why we use the
registers AX and CX (as opposed to only AH and CL) in the mov
instructions to initialize them. Similarly, while DOS initializes SI
to 0x100 when a .COM program starts, for a boot program, we set the
register SI ourselves.
If you feel uncomfortable about calling the above program a quine
because it reads its own bytes from the memory, we could make the
program read the bytes it needs to print from a separate place in
memory. We do not execute these bytes. We only read them and copy
them to video memory. The following 76-byte program does this:
ea 05 7c 00 00 fc bb 02 00 b8 00 b8 8e c0 8c c8
8e d8 31 ff be 26 7c b9 26 00 b4 0a ac ab e2 fc
4b 75 f1 f4 eb fd ea 05 7c 00 00 fc bb 02 00 b8
00 b8 8e c0 8c c8 8e d8 31 ff be 26 7c b9 26 00
b4 0a ac ab e2 fc 4b 75 f1 f4 eb fd
Here is how we can create a boot image with this:
echo ea 05 7c 00 00 fc bb 02 00 b8 00 b8 8e c0 8c c8 | xxd -r -p > boot.img
echo 8e d8 31 ff be 26 7c b9 26 00 b4 0a ac ab e2 fc | xxd -r -p >> boot.img
echo 4b 75 f1 f4 eb fd ea 05 7c 00 00 fc bb 02 00 b8 | xxd -r -p >> boot.img
echo 00 b8 8e c0 8c c8 8e d8 31 ff be 26 7c b9 26 00 | xxd -r -p >> boot.img
echo b4 0a ac ab e2 fc 4b 75 f1 f4 eb fd | xxd -r -p >> boot.img
echo 55 aa | xxd -r -p | dd seek=510 bs=1 of=boot.img
Here are the commands to test this boot image:
dosbox -c cls -c 'boot boot.img'
qemu-system-i386 -fda boot.img
The output should look like this:
Ω♣| ⁿ╗☻ ╕ ╕Ä└î╚Ä╪1 ╛&|╣& ┤◙¼½ΓⁿKu±⌠δ²Ω♣| ⁿ╗☻ ╕ ╕Ä└î╚Ä╪1 ╛&|╣& ┤◙¼½ΓⁿKu±⌠δ²
Here is the disassembly of this program:
$ ndisasm -o 0x7c00 boot.img
00007C00  EA057C0000        jmp 0x0:0x7c05
00007C05  FC                cld
00007C06  BB0200            mov bx,0x2
00007C09  B800B8            mov ax,0xb800
00007C0C  8EC0              mov es,ax
00007C0E  8CC8              mov ax,cs
00007C10  8ED8              mov ds,ax
00007C12  31FF              xor di,di
00007C14  BE267C            mov si,0x7c26
00007C17  B92600            mov cx,0x26
00007C1A  B40A              mov ah,0xa
00007C1C  AC                lodsb
00007C1D  AB                stosw
00007C1E  E2FC              loop 0x7c1c
00007C20  4B                dec bx
00007C21  75F1              jnz 0x7c14
00007C23  F4                hlt
00007C24  EBFD              jmp short 0x7c23
00007C26  EA057C0000        jmp 0x0:0x7c05
00007C2B  FC                cld
00007C2C  BB0200            mov bx,0x2
00007C2F  B800B8            mov ax,0xb800
00007C32  8EC0              mov es,ax
00007C34  8CC8              mov ax,cs
00007C36  8ED8              mov ds,ax
00007C38  31FF              xor di,di
00007C3A  BE267C            mov si,0x7c26
00007C3D  B92600            mov cx,0x26
00007C40  B40A              mov ah,0xa
00007C42  AC                lodsb
00007C43  AB                stosw
00007C44  E2FC              loop 0x7c42
00007C46  4B                dec bx
00007C47  75F1              jnz 0x7c3a
00007C49  F4                hlt
00007C4A  EBFD              jmp short 0x7c49
00007C4C  0000              add [bx+si],al
00007C4E  0000              add [bx+si],al
...
This program has two identical halves. The first half from offset
0x7c00 to offset 0x7c25 contains the executable bytes. The second
half from offset 0x7c26 to 0x7c4b contains the data bytes read by
the executable bytes. The executable part of the code has an outer
loop that uses the register BX as the counter variable. It sets BX
to 2 so that the outer loop iterates twice. In each iteration, it
reads data bytes from the second half of the program and prints
them. The code to read bytes and print them is very similar to our
earlier program. Since the data bytes in the second half are
identical to the executable bytes in the first half, printing the
data bytes twice amounts to printing all bytes of the program.
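The two nested loops can be modelled in Python to see why the screen ends up showing all 76 bytes. This sketch does not decode any instructions. It takes the values the disassembly shows (SI = 0x7c26, CX = 0x26, BX = 2, AH = 0xa) as given and replays only the copying. The function name run_copy_loops is made up for this sketch.

```python
HALF = bytes.fromhex(
    "ea 05 7c 00 00 fc bb 02 00 b8 00 b8 8e c0 8c c8 "
    "8e d8 31 ff be 26 7c b9 26 00 b4 0a ac ab e2 fc "
    "4b 75 f1 f4 eb fd"
)
PROGRAM = HALF + HALF
LOAD_ADDRESS = 0x7C00


def run_copy_loops(mem: bytearray, si_start: int, count: int,
                   passes: int, attribute: int) -> bytes:
    """Replay the outer (BX) and inner (CX) loops; return video memory."""
    video = bytearray()                # DI only ever moves forward
    for _ in range(passes):            # dec bx / jnz
        si = si_start                  # mov si,0x7c26 runs on every pass
        for _ in range(count):         # lodsb / stosw / loop
            video += bytes([mem[si], attribute])
            si += 1
    return bytes(video)


mem = bytearray(0x10000)
mem[LOAD_ADDRESS:LOAD_ADDRESS + len(PROGRAM)] = PROGRAM

video = run_copy_loops(mem, 0x7C26, 0x26, 2, 0x0A)
shown = video[0::2]   # character bytes sit at the even offsets
```

The bytes in shown match PROGRAM even though the loops only ever read offsets 0x7c26 to 0x7c4b, that is, the data half.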
While this program does avoid reading the bytes that the CPU
executes, the data bytes look exactly like the executable bytes.
Although I do not see any point in trying to avoid reading
executable bytes in an exercise like this, this program serves as an
example of a self-printing boot program that does not execute the
bytes it reads.