The processor organizes and accesses memory as a sequence of 8-bit
bytes. Every byte in memory is located by a unique address, called its
physical address, and the range of addresses that can be represented is
called the address space.
There are two common schemes for memory addressing, segmentation and
paging, and both will be used in Skelix.
Segmentation is familiar to us even from the simple and quiet old days
of DOS. Because all registers were 16 bits long at that time, we could
directly access a memory space of only 2^16 = 65536 bytes. That was not
enough for hungry programmers, so Intel used two 16-bit registers as a
Segment:Offset pair to represent a physical address. A 16-bit register
still holds the offset within one segment, and a 16-bit segment
register indicates which segment we are using, so it can select
2^16 = 65536 segments. The good thing is that we can keep our code,
data and stack in separate segments, which prevents them from mingling
together. It sounds as if this scheme gives us 64K segments of 64K
bytes each, so that we could finally access a 2^16 * 2^16 = 4G byte
memory space. That sounds really great, but it is not true.
In fact it works in a slightly tricky way: the segment register is
first shifted 4 bits to the left, and then the offset register is added
to it. For example, the pair 7C00:0189 gives us the physical address
7C189 instead of 7C000189. Note that all memory addresses here are
given in hexadecimal.

      7C000
    +  0189
    -------
      7C189

Now we can calculate the largest value this scheme can represent,
FFFF:FFFF:

      FFFF0
    +  FFFF
    -------
     10FFEF
That is 1M + 65519. The original 8086 has only a 20-bit address bus,
and later PC-compatible machines reproduce its behaviour in real mode
while the A20 address line is disabled (this will be discussed in a
following tutorial), so the part of the address space above 1M wraps
around to physical address 0. For example, address 100010 is mapped to
address 10: accessing 100010 is the same as accessing address 10.
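
The shift-and-add rule and the wrap-around can be checked with a few
lines of Python. This is only a sketch of the arithmetic described
above; the function name physical_address and the address_bits
parameter are made up for this illustration and do not come from
Skelix.

```python
def physical_address(segment, offset, address_bits=20):
    """Real-mode address: (segment << 4) + offset, truncated to the bus width.

    address_bits=20 models the wrap-around at 1M described in the text;
    a larger value (for example 32) shows the result without wrapping.
    """
    linear = ((segment & 0xFFFF) << 4) + (offset & 0xFFFF)
    return linear & ((1 << address_bits) - 1)


if __name__ == "__main__":
    print(hex(physical_address(0x7C00, 0x0189)))                   # 0x7c189
    print(hex(physical_address(0xFFFF, 0xFFFF, address_bits=32)))  # 0x10ffef
    print(hex(physical_address(0xFFFF, 0x0020)))                   # 0x10
```

The last call is FFFF:0020, which is linear address 100010 and ends up
at address 10 once the 21st bit is dropped.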
The other problem with this scheme is that there are different ways to
refer to exactly the same physical address: 07C0:0000 and 0000:7C00
both indicate the physical address 00007C00.
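
To see how many such aliases exist, the following sketch enumerates
every segment:offset pair that lands on a given physical address. The
helper name aliases is hypothetical, and the wrap-around at 1M is
deliberately ignored to keep the example short.

```python
def aliases(physical):
    """Return all (segment, offset) pairs with (segment << 4) + offset == physical.

    Wrap-around above 1M is not modelled here.
    """
    pairs = []
    for segment in range(0x10000):
        offset = physical - (segment << 4)
        if 0 <= offset <= 0xFFFF:
            pairs.append((segment, offset))
    return pairs


if __name__ == "__main__":
    pairs = aliases(0x7C00)
    print(len(pairs))                # number of ways to address 0x7C00
    print((0x07C0, 0x0000) in pairs)
    print((0x0000, 0x7C00) in pairs)
```

Every increase of the segment by 1 can be undone by lowering the offset
by 16, which is why a low address such as 7C00 has so many spellings.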
The other way of addressing memory is the linear address scheme, which
uses a 32-bit linear address. It will be discussed in detail in a later
tutorial.
Booting Process
After power-up or RESET, the processor performs an initialization that
sets its registers to a known state (note that this is a known state,
not a single known value) and places the processor in real mode. The
processor then executes the instruction at physical address FFFFFFF0,
which is usually a far JMP instruction stored in the EPROM. You might
wonder how a segment:offset pair can represent the physical address
FFFFFFF0. There is actually an invisible part of the CS register that
holds a base address. During this initialization the base is set to
FFFF0000 while the visible part of CS is loaded with F000, and with the
initial offset FFF0 this gives FFFF0000 + FFF0 = FFFFFFF0. This happens
only at reset; after that the registers work in the ordinary way.
Once the BIOS has taken full control, it tries to load the operating
system. The BIOS has no idea which OS you are using (Windows, Linux or
other weird stuff like Skelix :)), so creating the environment required
by the OS would be tough work for the BIOS. The BIOS therefore leaves
this work to the OS itself: after POST, the BIOS just loads the first
sector of the boot drive into a fixed location, physical address
00007C00. The code starting at 00007C00 then takes control and starts
creating the environment needed by the OS.
So the BIOS must find a 512-byte sector on the drive it boots from, and
that sector must end with the 16-bit value 0xAA55 (stored little-endian,
so the last two bytes are 55 and AA), which is a flag marking the
sector as a valid boot sector. Skelix boots from floppy disk.
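
The check described above is simple enough to sketch in Python. The
function below is only an illustration of the 512-byte / 0xAA55 rule;
the name is_boot_sector is invented for this example, and a real BIOS
may apply further checks of its own.

```python
SECTOR_SIZE = 512
BOOT_SIGNATURE = 0xAA55


def is_boot_sector(sector):
    """True if sector is 512 bytes long and ends with the word 0xAA55.

    The word is little-endian, so the final two bytes are 0x55, 0xAA.
    """
    if len(sector) != SECTOR_SIZE:
        return False
    return int.from_bytes(sector[510:512], "little") == BOOT_SIGNATURE


if __name__ == "__main__":
    blank = bytes(SECTOR_SIZE)
    signed = bytes(510) + b"\x55\xaa"
    print(is_boot_sector(blank))   # False
    print(is_boot_sector(signed))  # True
```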
Keep in mind that at startup the processor is in real mode and uses
segment:offset pairs to access a 1MB memory space without memory
privilege protection. We can use BIOS interrupts at this stage, even
though Skelix no longer uses them after the following tutorial.
First Cry
Here we look through our first code snippet, 01/first.cry/bootsect.s:

.text

.text marks the start of the code section.

.globl start

The .globl directive makes the symbol start visible to the linker, much
like a symbol with external linkage in C.

.code16

By default the GNU assembler used by GCC generates code for 32-bit
operands and addresses; .code16 tells it to assemble for 16-bit
operands and addresses.

start:
    jmp start

A busy loop is the only work this code does.

.org 0x1fe, 0x90
.word 0xaa55

.org sets the start address of the following data, here 1FE. We write
the AA55 flag at 1FE, which is 510 in decimal, and the gap before it is
filled with 90 hexadecimal, the assembly instruction NOP, which does
nothing.
Okay, now that our first source file is done, we have to compile it and
make it work. To do this automatically we need a Makefile. How to write
and use a Makefile will not be covered in this tutorial; you can find
plenty of material about it on the Internet. The compiler options are
what I am going to focus on.
01/first.cry/Makefile:

AS=as
LD=ld

as and ld are the assembler and linker that GCC uses.

.s.o:
	${AS} -a $< -o $*.o >$*.map

all: final.img

final.img: bootsect
	mv bootsect final.img

bootsect: bootsect.o
	${LD} --oformat binary -N -e start -Ttext 0x7c00 -o bootsect $<

--oformat indicates which target format to generate. binary means a raw
flat binary file with no header or other information, somewhat like a
.com file in DOS. Without this option ld generally uses the ELF format
by default (this actually depends on your system and GCC settings),
which is not what we want, because when the BIOS loads the boot sector
there is no environment for executing ELF files.

The -N option may not be necessary here, but for the convenience of
later programs it makes the text section both readable and writable. I
did not separate the text section from the data section, so there will
be some writes to the text section in later tutorials.

-e start indicates that execution should begin at the start symbol.

-Ttext 0x7c00 gives the text section a base address of 7C00, which is
the start address of the boot sector in memory, so every address that
refers to the text section has the offset 7C00 added. For example, the
start symbol will be at address 7C00 instead of 0 or somewhere at
8XXXXXXX under Linux.
After running make we get the image file final.img, which should be
exactly 512 bytes long.
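
As a cross-check, the expected contents of the image can be rebuilt by
hand. The sketch below assumes that the assembler encodes jmp start as
the two-byte short jump EB FE (a jump back to itself); that encoding is
an assumption about the assembler output, not something stated above,
and the function name is made up for this example.

```python
def build_first_cry_image():
    """Rebuild the expected 512-byte First Cry boot sector.

    Assumption: `jmp start` assembles to the short jump EB FE.
    """
    code = bytes([0xEB, 0xFE])                      # start: jmp start
    padding = bytes([0x90]) * (0x1FE - len(code))   # .org 0x1fe, 0x90 (NOP fill)
    signature = (0xAA55).to_bytes(2, "little")      # .word 0xaa55
    return code + padding + signature


if __name__ == "__main__":
    image = build_first_cry_image()
    print(len(image))          # 512
    print(image[-2:].hex())    # 55aa
```

If the assumption holds, comparing these bytes with the contents of
final.img is a quick way to confirm that the build did what we expect.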
Now we are going to use VMware to create a new virtual machine like
this:
The important parts are that the memory is 4MB and that there is a
100MB IDE hard drive at 0:0; these options will matter in the future.
This virtual machine will be reused, so select these options correctly,
or you will have to modify the source code for memory management and
the file system yourself. Now load final.img as the floppy image and
let VMware boot from the floppy disk first.
Then we power on this virtual machine and get a black screen:
and that is correct, because we just let it keep jumping.
Hello World!
Okay, I have to admit that the program in First Cry is not much fun, so
in this section we are going to make it print the "Hello World!" cliche
on the screen. 01/hello.world/bootsect.s:

.text
.globl start
.code16
start:
    jmp $0x0, $code

We use a jmp instruction to jump over the data that is stored in the
text section; that is why I used the -N option in the last Makefile.

msg:
    .string "Hello World!\x0"
code:
    movw $0xb800, %ax
    movw %ax, %es
    xorw %ax, %ax
    movw %ax, %ds

This gives the segment registers DS and ES the right values. ES refers
to segment B800; as I mentioned when describing segment:offset
addressing, this locates the memory space starting at B8000, which is
the video memory of the color graphics adapter. What is displayed on
the screen directly reflects changes in this area. For example, on an
80x25 screen the first character, at position (0, 0), lives at memory
address B8000 and its color attribute at address B8001. If we change
the content of address B8000 to 0x31, which is the character '1', and
B8001 to 0x07, we get a '1' with a white foreground on a black
background.
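
The layout of the text buffer can be sketched as follows: each screen
cell takes two bytes, the character first and its attribute second.
The helper names cell_addresses and put_char are hypothetical, and the
bytearray merely stands in for the real video memory.

```python
VIDEO_BASE = 0xB8000   # start of color text-mode video memory
COLUMNS = 80
ROWS = 25


def cell_addresses(row, col):
    """Return (character address, attribute address) for a screen cell."""
    offset = 2 * (row * COLUMNS + col)
    return VIDEO_BASE + offset, VIDEO_BASE + offset + 1


def put_char(vram, row, col, ch, attr=0x07):
    """Write a character and its attribute into a simulated buffer."""
    char_addr, attr_addr = cell_addresses(row, col)
    vram[char_addr - VIDEO_BASE] = ord(ch)
    vram[attr_addr - VIDEO_BASE] = attr


if __name__ == "__main__":
    vram = bytearray(ROWS * COLUMNS * 2)
    put_char(vram, 0, 0, "1")
    print([hex(a) for a in cell_addresses(0, 0)])  # ['0xb8000', '0xb8001']
    print(vram[:2].hex())                          # 3107
```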
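    movw $msg, %si
    xorw %di, %di

This sets the correct address values in registers SI and DI for the
MOVSB instruction.

    cld
    movb $0x07, %al

CLD makes the string instructions count upwards, and AL receives the
character color attribute: black background and white foreground.

1:
    cmpb $0, (%si)
    je 1f
    movsb
    stosb
    jmp 1b
1:  jmp 1b

This fills the B8000 area with the "Hello World!" string and the
corresponding color attributes: MOVSB copies one character from DS:SI
to ES:DI, and STOSB then stores the attribute from AL in the following
byte. When the terminating zero byte is reached, the code loops
forever.

.org 0x1fe, 0x90
.word 0xaa55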
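
What the copy loop leaves in video memory can be simulated in a few
lines. This is a sketch of the loop's effect rather than a translation
of the assembly, and the function name fill_text_buffer is invented
for this example.

```python
def fill_text_buffer(msg, attr=0x07):
    """Interleave message bytes with an attribute byte, stopping at NUL.

    Mirrors the loop in the boot sector: one byte copied (movsb), then
    one attribute byte stored (stosb), until a zero byte is found.
    """
    out = bytearray()
    for byte in msg:
        if byte == 0:          # cmpb $0, (%si) / je 1f
            break
        out.append(byte)       # movsb
        out.append(attr)       # stosb
    return bytes(out)


if __name__ == "__main__":
    cells = fill_text_buffer(b"Hello World!\x00")
    print(len(cells))       # 24 bytes: 12 characters, each with an attribute
    print(cells[:4].hex())  # 48076507
```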
We can use the same Makefile as in the First Cry section.