Skelix OS Tutorial
Tutorial 01: Bootstrap

Our Goal

In this tutorial, we are going to make the system boot from a floppy disk and print "Hello World!" on the screen.

Memory Addressing

The processor organizes and accesses memory as a sequence of 8-bit bytes. Every byte in memory is located by a unique address, called its physical address, and the range of addresses that can be represented is called the address space.

There are two common memory addressing schemes, segmentation and paging, and Skelix uses both.

Segmentation has been familiar to us since the simple, quiet old days of DOS. All registers were 16 bits wide at that time, so a single register could address only 2^16 = 65536 bytes of memory directly. That was not enough for hungry programmers, so Intel combined two 16-bit registers into a Segment:Offset pair to represent a physical address. One 16-bit register still holds the offset within a segment, and a 16-bit segment register selects which of the 2^16 = 65536 possible segments is in use. The good thing is that code, data and stack can be kept in separate segments, which prevents them from mingling. It sounds as if this scheme gives us 64K segments of 64K bytes each, so that we can finally access 2^16 * 2^16 = 4G bytes of memory. That sounds really great, but it is not true.

In fact the scheme is a little tricky: the segment value is first shifted left by 4 bits, and then the offset is added to it. For example, the pair 7C00:0189 gives the physical address 7C189 rather than 7C000189. Note that all memory addresses here are given in hexadecimal.

 7C000
+ 0189
-------
 7C189
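
The calculation above can be written as a small Python sketch. The function name physical_address is made up for this illustration and is not part of Skelix.

```python
def physical_address(segment: int, offset: int) -> int:
    """Real-mode address: shift the 16-bit segment left by 4 bits, add the offset."""
    assert 0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF
    return (segment << 4) + offset


print(f"{physical_address(0x7C00, 0x0189):X}")  # prints 7C189
```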

Now we can calculate the largest address this scheme can represent, FFFF:FFFF:

 FFFF0
+ FFFF
-------
10FFEF

That is 1M + 65519. The original 8086 has only a 20-bit address bus, and for compatibility a PC with an 80386 starts up behaving the same way (this will be discussed in the following tutorial), so the addresses beyond 1M wrap around to physical address 0. For example, address 100010 is mapped to address 10: accessing 100010 is the same as accessing 10.
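
As a rough sketch of the wrap-around, assuming that only 20 address lines are in effect, the upper bits of the address can simply be masked off. The names ADDRESS_LINES and wrap are hypothetical and chosen only for this example.

```python
ADDRESS_LINES = 20  # assumption: only 20 address lines are in effect


def physical_address(segment: int, offset: int) -> int:
    """Shift the segment left by 4 bits and add the offset."""
    return (segment << 4) + offset


def wrap(address: int) -> int:
    """Drop every bit above the 20 address lines."""
    return address & ((1 << ADDRESS_LINES) - 1)


top = physical_address(0xFFFF, 0xFFFF)
print(f"{top:X} -> {wrap(top):X}")            # prints 10FFEF -> FFEF
print(f"{0x100010:X} -> {wrap(0x100010):X}")  # prints 100010 -> 10
```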

Another problem with this scheme is that the same physical address can be written in many different ways: 07C0:0000 and 0000:7C00 both refer to physical address 00007C00.
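
To get a feeling for how many spellings one address has, the following sketch lists every segment:offset pair that reaches 7C00. It ignores the wrap-around above 1M, and the helper name aliases is invented for this example.

```python
def aliases(physical: int):
    """Yield every (segment, offset) pair with segment * 16 + offset == physical."""
    for segment in range(0x10000):
        offset = physical - (segment << 4)
        if 0 <= offset <= 0xFFFF:
            yield segment, offset


pairs = list(aliases(0x7C00))
print(len(pairs))                  # prints 1985
print((0x07C0, 0x0000) in pairs)   # prints True
print((0x0000, 0x7C00) in pairs)   # prints True
```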

The other way of addressing memory is the linear address scheme, which uses 32-bit linear addresses. It will be discussed in detail in a later tutorial.

Booting Process

After power-up or RESET, the processor performs an initialization that sets its registers to a known state (note: a known state, which is not the same as a known value) and places it in real mode. The processor then executes the instruction at physical address FFFFFFF0, which is usually a far JMP instruction stored in the BIOS EPROM. You might wonder how a segment:offset pair can represent the physical address FFFFFFF0. The CS register has an invisible part that stores a base address. During this initialization the base is set to FFFF0000 and the visible part of CS is loaded with F000. This happens only at reset; once CS is reloaded, for example by that far JMP, addresses are formed in the ordinary way.
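
The difference between the hidden base and the ordinary calculation can be sketched as follows. The instruction pointer value FFF0 is an assumption taken from the usual 80386 reset state; it is not stated in the text above, and all names are made up for this example.

```python
CS_HIDDEN_BASE = 0xFFFF0000  # base held in the invisible part of CS after reset
CS_VISIBLE = 0xF000          # value visible in CS after reset
IP_AT_RESET = 0xFFF0         # assumption: usual reset value, not given in the text


def first_fetch(hidden_base: int, ip: int) -> int:
    """At reset the hidden base, not CS * 16, is added to the instruction pointer."""
    return hidden_base + ip


print(f"{first_fetch(CS_HIDDEN_BASE, IP_AT_RESET):X}")  # prints FFFFFFF0
print(f"{(CS_VISIBLE << 4) + IP_AT_RESET:X}")           # prints FFFF0
```

The second line shows what the ordinary segment:offset rule would give for the same register values.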

Once the BIOS has taken full control, it tries to load the operating system. The BIOS has no idea which OS you are using, whether Windows, Linux or other weird stuff like Skelix :), so creating the environment required by each OS would be tough work for it. The BIOS therefore leaves this work to the OS itself: after POST, it just loads the first sector of the boot drive to a fixed location, physical address 00007C00, and the code starting at 00007C00 takes control and begins creating the environment the OS needs.

So the BIOS must find a 512-byte sector on the drive it boots from, and the sector must end with the word 0xAA55 (stored in memory as the bytes 55 AA at offsets 510 and 511), a flag that marks the sector as a valid boot sector. Skelix boots from a floppy disk.
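
The check can be sketched in a few lines. This is only an illustration of the rule described above, not the code of any real BIOS, and is_boot_sector is a hypothetical name.

```python
SECTOR_SIZE = 512


def is_boot_sector(sector: bytes) -> bool:
    """True if the sector is 512 bytes long and ends with the bytes 55 AA."""
    return len(sector) == SECTOR_SIZE and sector[510:512] == b"\x55\xaa"


blank = bytes(SECTOR_SIZE)
bootable = bytes(510) + (0xAA55).to_bytes(2, "little")
print(is_boot_sector(blank), is_boot_sector(bootable))  # prints False True
```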

Keep in mind that at startup the processor is in real mode and uses segment:offset pairs to access a 1MB memory space without any memory protection. BIOS interrupts are available at this stage, although Skelix no longer uses them after the following tutorial.

First Cry

Here we will look through our first code snippet:
01/first.cry/bootsect.s
        .text
.text marks the start of the code section.
        .globl   start
.globl makes the symbol start global, that is, visible to the linker, much like a non-static function in C. The linker has to see it because start is used as the entry point.
        .code16
The GNU assembler that GCC uses normally assumes 32-bit operands and addresses in its default configuration; .code16 tells it to assemble the code for 16-bit operands and addresses.
start:
        jmp      start
An endless loop is all this code does.
.org    0x1fe,   0x90
.word   0xaa55
.org sets the address at which the following data is placed, here 1FE, which is 510 in decimal. We write the AA55 flag there, and the gap before it is filled with 90 in hexadecimal, the opcode of the NOP instruction, which does nothing.
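
To see what the assembled sector should look like, here is a sketch that builds the same 512 bytes by hand. It assumes the assembler encodes "jmp start" as the 2-byte short jump EB FE; the function name build_first_cry is invented for this example.

```python
import struct

JMP_TO_SELF = b"\xeb\xfe"  # assumption: "jmp start" becomes a 2-byte short jump
NOP = b"\x90"


def build_first_cry() -> bytes:
    """Lay out the sector: code, NOP padding up to 0x1FE, then the flag."""
    image = JMP_TO_SELF
    image += NOP * (0x1FE - len(image))   # .org 0x1fe, 0x90
    image += struct.pack("<H", 0xAA55)    # .word 0xaa55, stored little-endian
    return image


image = build_first_cry()
print(len(image), image[-2:].hex())  # prints 512 55aa
```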

Okay, now that our first source file is done, we have to compile it and make it work. To do this automatically we need a Makefile. How to write and use a Makefile is not covered in this tutorial; you can find plenty of material about it on the Internet. The compiler options are what I am going to focus on.

01/first.cry/Makefile
AS=as
LD=ld
as and ld are the assembler and linker that GCC uses.
.s.o:
    ${AS} -a $< -o $*.o >$*.map

all: final.img

final.img: bootsect
    mv bootsect final.img

bootsect: bootsect.o
    ${LD} --oformat binary -N -e start -Ttext 0x7c00 -o bootsect $<
--oformat selects the output format. binary means a raw, flat binary file with no header or other information, somewhat like a .com file in DOS. Without this option ld generally produces ELF by default (this actually depends on your system and GCC settings), which is not what we want, because when the BIOS loads the boot sector there is no environment that could execute an ELF file.
-N may not be necessary here, but it is convenient for later programs: it makes the text section both readable and writable. I do not separate the text section from the data section, so later tutorials will write into the text section.
-e start makes the start symbol the entry point, where execution begins.
-Ttext 0x7c00 gives the text section the base address 7C00, which is where the boot sector starts in memory, so every address that refers to the text section has the offset 7C00 added to it. For example, the start symbol will be at address 7C00 instead of 0, or instead of somewhere at 08XXXXXX as is typical for a program under Linux.
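
Why the base address matters can be illustrated with a tiny model of memory. The data byte at offset 5 and the name linked_address are hypothetical; the point is only that an address computed with base 0 misses the place where the BIOS put the sector.

```python
LOAD_ADDRESS = 0x7C00  # where the BIOS places the boot sector


def linked_address(text_base: int, offset_in_section: int) -> int:
    """Address the linker assigns to a symbol in the text section."""
    return text_base + offset_in_section


memory = bytearray(0x10000)
sector = bytearray(512)
sector[5] = ord("H")                      # hypothetical data byte at offset 5
memory[LOAD_ADDRESS:LOAD_ADDRESS + 512] = sector

wrong = linked_address(0x0000, 5)         # text section based at 0
right = linked_address(0x7C00, 5)         # text section based at 7C00
print(chr(memory[right]), memory[wrong])  # prints H 0
```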

After running make we get the image file final.img, which should be exactly 512 bytes long.

Now we are going to use VMware to create a new virtual machine. The important settings are 4MB of memory and a 100MB IDE hard disk at 0:0; these options will matter in the future. This virtual machine will be reused, so set these options correctly, or you will have to modify the memory management and file system source code yourself. Now load final.img as the floppy image and let VMware boot from the floppy disk first.

When we power on this virtual machine we get a black screen, and that is correct, because the code just keeps jumping.

Hello World!

Okay, I have to admit the program in First Cry is not much fun, so in this section we are going to make it print the cliche "Hello World!" on the screen.
01/hello.world/bootsect.s
        .text
        .globl  start
        .code16
start:
        jmp     $0x0,   $code
We use a jmp instruction to jump over the data, which is stored in the text section; that is why I used the -N option in the last Makefile. Because it is a far jump, it also loads CS with 0.
msg:
        .string "Hello World!\x0"
code:
        movw    $0xb800,%ax
        movw    %ax,    %es
        xorw    %ax,    %ax
        movw    %ax,    %ds
Give the segment registers DS and ES the right values. ES refers to segment B800 which, as explained in the discussion of segment:offset addressing, starts at physical address B8000. This is the video memory of the color graphics adapter, and the screen directly reflects every change in this area. For example, on an 80x25 screen the first character, at position (0, 0), is stored at address B8000 and its color attribute at address B8001. If we change the byte at B8000 to 0x31, which is the character '1', and the byte at B8001 to 0x07, we get a white '1' on a black background.
        movw    $msg,   %si
        xorw    %di,    %di
Set registers SI and DI to the correct addresses for the MOVSB instruction: SI points to the message and DI to the start of video memory.
        cld
        movb    $0x07,  %al
cld clears the direction flag so that MOVSB and STOSB count upwards. AL holds the character color attribute: white foreground on a black background.
1:
        cmpb    $0,    (%si)
        je      1f
        movsb
        stosb
        jmp     1b
1:      jmp     1b
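Fill the B8000 area with the "Hello World!" string and its color attributes: MOVSB copies one character from DS:SI to ES:DI, and STOSB stores the attribute in AL right after it. The loop ends at the terminating zero byte, after which the code spins forever.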
.org    0x1fe,  0x90
.word   0xaa55
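
The effect of the listing can be imitated in Python as a rough model. A bytearray stands in for the memory that starts at B8000, and the names cell_address and print_string are invented for this sketch; it mirrors the character and attribute layout, not the exact processor behavior.

```python
VIDEO_BASE = 0xB8000
COLUMNS = 80


def cell_address(row: int, column: int) -> int:
    """Address of the character byte; its attribute byte is at the next address."""
    return VIDEO_BASE + 2 * (row * COLUMNS + column)


def print_string(video: bytearray, message: bytes, attribute: int = 0x07) -> None:
    """Mimic the loop: copy a character (movsb), then store the attribute (stosb)."""
    di = 0
    for character in message:
        if character == 0:           # stop at the terminating zero byte
            break
        video[di] = character        # movsb
        video[di + 1] = attribute    # stosb
        di += 2


video = bytearray(80 * 25 * 2)  # stands in for the memory starting at B8000
print_string(video, b"Hello World!\x00")
print(f"{cell_address(0, 0):X} {cell_address(1, 0):X}")  # prints B8000 B80A0
print(video[:4])  # prints bytearray(b'H\x07e\x07')
```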

We can use the same Makefile as in the First Cry section.
