Skelix OS Tutorial
Tutorial 08: Memory Management

My Goal

Not achieved. I failed to give each task its own virtual 4GB memory space; this tutorial only enables the paging mechanism and triggers a page fault exception. Hugely disappointing.


Paging

The memory management facilities of the 386 processor let us isolate and protect the memory space of each task and, best of all, let each task access 4GB of memory.

These facilities can be divided into two parts: segmentation and paging. We have used segmentation since the first tutorials; it allows tasks to have separate code, data and stack segments. Paging allows memory to be mapped to disk on demand, and we are going to use it in this tutorial.

Because we do not have 4GB of physical memory for each task, we have to virtualize the memory space, and this is handled by the processor's paging mechanism. It divides the linear address space into pages (we use 4KB pages), and each page can be stored either on disk or in memory. The OS tracks the state of those pages via the page directory and page tables: the page directory stores information about the page tables, and the page tables store information about the pages.

When paging is enabled, the processor translates an address used by a task into a physical address in the following steps:
Locates the descriptor in the GDT or LDT selected by the current segment selector, then performs privilege and limit checks to ensure the access is allowed.
Adds the base address stored in the descriptor to the offset to get a linear address.
Divides the linear address by the page size to get the page number it uses.
Checks whether this page is present; if not, a page fault exception occurs.
The exception handler gets a free page, or loads the page from disk on demand.
Because a page fault is a fault-class exception (the saved return address points at the faulting instruction), the processor re-executes the instruction that caused it, and this time the page is present in memory.
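The first three steps can be sketched in user-space C. This is a simplified model, not the full processor behavior: `struct seg_desc`, `to_linear` and `page_number` are hypothetical names, the descriptor is reduced to a base and a limit, and privilege checks are left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified descriptor: only the fields used by the
 * first two translation steps. */
struct seg_desc {
    uint32_t base;    /* segment base address */
    uint32_t limit;   /* highest valid offset in the segment */
};

#define PAGE_SHIFT 12 /* 4KB pages */

/* Steps 1-2: limit check, then base + offset gives the linear address.
 * Returns 0 and stores the result on success, or -1 on a limit
 * violation (the real processor raises an exception instead). */
static int to_linear(const struct seg_desc *d, uint32_t offset, uint32_t *linear)
{
    if (offset > d->limit)
        return -1;
    *linear = d->base + offset;   /* wraps modulo 2^32, like the CPU */
    return 0;
}

/* Step 3: page number = linear address / page size. */
static uint32_t page_number(uint32_t linear)
{
    return linear >> PAGE_SHIFT;
}
```

With a base of 0x00100000 and an offset of 0x2345, the linear address is 0x00102345, which lies in page 0x102.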

The data structures used by the processor are the page directory and the page table; each is an array of 1024 32-bit entries, which exactly fills one 4KB page.
Page table entry
The first figure shows the format of a page directory entry, and the second the format of a page table entry. As you can see, the two formats are quite similar; let's look at the parts they have in common.
Bit 0 (P): indicates whether the page or page table being pointed to is currently loaded in physical memory. When P=0, the page is not in memory, and any attempt to access it causes a page fault exception.
Bit 1 (R/W): indicates whether the page, or the pages (in the case of a page directory entry), are read only (=0) or writable (=1).
Bit 2 (U/S): indicates the privilege level of the page, or the pages (in the case of a page directory entry). At supervisor level (=0) only code running at PL0-2 can access them; at user level (=1) code at any privilege level can access them.
Bits 3, 4, (6), 7, 8: reserved by Intel (bit 6 only in page directory entries); just set them to zero.
Bit 5 (A): set by the processor when the page or pages have been accessed.
Bits 9-11: available to the OS. We are going to use bit 11 to indicate whether a page that is not present is stored on disk.
The page table address field (bits 31-12) of a page directory entry stores the most significant 20 bits of the physical address of the page table it points to; because only 20 bits are stored, the page table must be 4K aligned. Likewise, bits 31-12 of a page table entry store the most significant 20 bits of the physical address of the page it points to. With 20 bits, 2^20 = 1M pages can be addressed, which spans 1M * 4K = 4GB of memory. The D (dirty) bit in a page table entry indicates whether the page's content has been modified. This is useful when swapping the page out to disk: if the page has not been modified and was originally loaded from disk, we can simply discard it instead of writing it back.
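A sketch of these fields as C macros, assuming the 386 layout described above. The `PTE_*` names and the helpers `make_entry`, `entry_addr` and `entry_present` are hypothetical, not taken from the Skelix source; `PTE_ONDISK` is only the bit-11 convention this tutorial proposes.

```c
#include <assert.h>
#include <stdint.h>

/* Flag bits shared by page directory and page table entries. */
#define PTE_P      0x001u      /* bit 0: present                        */
#define PTE_RW     0x002u      /* bit 1: 0 = read only, 1 = writable    */
#define PTE_US     0x004u      /* bit 2: 0 = supervisor, 1 = user       */
#define PTE_A      0x020u      /* bit 5: accessed (set by the CPU)      */
#define PTE_D      0x040u      /* bit 6: dirty, page table entries only */
#define PTE_ONDISK 0x800u      /* bit 11: OS-defined "page is on disk"  */
#define PTE_FRAME  0xFFFFF000u /* bits 31-12: top 20 address bits       */

/* Build an entry from a 4K-aligned physical address and flag bits. */
static uint32_t make_entry(uint32_t phys, uint32_t flags)
{
    assert((phys & ~PTE_FRAME) == 0);   /* must be 4K aligned */
    return phys | (flags & ~PTE_FRAME);
}

/* Physical address of the page (or page table) the entry points to. */
static uint32_t entry_addr(uint32_t e) { return e & PTE_FRAME; }

/* Nonzero if the P bit is set. */
static int entry_present(uint32_t e) { return (e & PTE_P) != 0; }
```

For example, `make_entry(0x0003F000, PTE_P | PTE_RW | PTE_US)` yields 0x0003F007, the same form as the `address|7` entries built by mm_install.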

To translate a linear address into a physical address, the linear address is divided into three parts:
Bits 31-22: the index of an entry in the page directory, which gives the physical address of the page table.
Bits 21-12: the index of an entry in that page table, which gives the physical address of the page.
Bits 11-0: the offset within the page.
For example, take the linear address 0x3E837B0A. Its top 10 bits are 0x0FA, so it refers to entry 0x0FA of the page directory. Say the page directory starts at 0x0005C000 and the top 20 bits of that entry give the page table address 0x0003F000. The next 10 bits of the linear address are 0x037, so we read entry 0x037 of the page table at 0x0003F000; say its top 20 bits give the page address 0x0001B000. Finally, the low 12 bits of the linear address are 0xB0A, and adding them to the page address gives the physical address: 0x0001B000 + 0xB0A = 0x0001BB0A.
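The worked example can be reproduced with a toy model of physical memory. This is a user-space sketch under stated assumptions: physical memory is a static array (`phys`), the helper names are hypothetical, and the table addresses are the made-up ones from the example.

```c
#include <assert.h>
#include <stdint.h>

/* Toy "physical memory": 0x60000 bytes viewed as 32-bit words, just
 * large enough for the example addresses (an assumption). */
#define PHYS_SIZE 0x60000u
static uint32_t phys[PHYS_SIZE / 4];

static uint32_t read_phys(uint32_t addr)              { return phys[addr / 4]; }
static void     write_phys(uint32_t addr, uint32_t v) { phys[addr / 4] = v; }

static uint32_t dir_index(uint32_t lin)   { return lin >> 22; }           /* bits 31-22 */
static uint32_t table_index(uint32_t lin) { return (lin >> 12) & 0x3FFu; } /* bits 21-12 */
static uint32_t page_offset(uint32_t lin) { return lin & 0xFFFu; }        /* bits 11-0  */

/* Two-level walk starting from the page directory address held in CR3.
 * Returns 0 and stores the physical address, or -1 where the CPU would
 * raise a page fault because a present bit is clear. */
static int translate(uint32_t cr3, uint32_t lin, uint32_t *out)
{
    uint32_t pde = read_phys((cr3 & 0xFFFFF000u) + dir_index(lin) * 4);
    if (!(pde & 1u))
        return -1;
    uint32_t pte = read_phys((pde & 0xFFFFF000u) + table_index(lin) * 4);
    if (!(pte & 1u))
        return -1;
    *out = (pte & 0xFFFFF000u) + page_offset(lin);
    return 0;
}
```

After writing 0x0003F001 at 0x0005C000 + 0x0FA*4 and 0x0001B001 at 0x0003F000 + 0x037*4, calling `translate(0x0005C000, 0x3E837B0A, &pa)` stores 0x0001BB0A in `pa`.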
One question remains: how does the processor find the page directory? The answer is a new register, CR3, which stores the physical address of the page directory currently in use; that is why it is also called the PDBR (Page Directory Base Register).
page translation
CR3 must be loaded before paging is enabled. Its value can be changed with a MOV instruction, and it is also loaded from the CR3 field of the TSS during a task switch.

Whenever the processor accesses an entry whose present bit is clear, or a privilege check fails, the page fault exception handler runs. CR2 holds the linear address that caused the exception, and an error code is pushed onto the stack in the following format:
error code for page fault
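The three low bits of the error code can be decoded as below; the strings match the ones printed by do_page_fault later in this tutorial. `PF_*`, `struct pf_info` and `decode_pf` are hypothetical names for this sketch.

```c
#include <assert.h>
#include <string.h>

/* Page fault error code bits pushed by the processor. */
#define PF_PROT  0x1u  /* 0: not-present page, 1: protection violation */
#define PF_WRITE 0x2u  /* 0: read access,      1: write access         */
#define PF_USER  0x4u  /* 0: supervisor mode,  1: user mode            */

struct pf_info {
    const char *cause;   /* why the fault happened         */
    const char *access;  /* what kind of access caused it  */
    const char *mode;    /* processor mode at the time     */
};

/* Translate the raw error code into human-readable fields. */
static struct pf_info decode_pf(unsigned int err)
{
    struct pf_info i;
    i.cause  = (err & PF_PROT)  ? "page-level protection violation" : "not-present page";
    i.access = (err & PF_WRITE) ? "write" : "read";
    i.mode   = (err & PF_USER)  ? "user" : "supervisor";
    return i;
}
```

For instance, an error code of 0x6 decodes as a write to a not-present page from user mode.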
The exception handler usually performs the following steps:
Find a free page in memory, or load the page from disk.
Set the corresponding page directory and page table entries to the correct values.
Invalidate the TLBs. The processor caches the most recently used page directory and page table entries in the Translation Lookaside Buffers (TLBs), so the page directory and page tables in memory are only read when the needed entries are not in the TLBs. Whenever we modify the page directory or a page table, we have to invalidate the TLBs so that they discard the stale entries. Since the TLBs are transparent to software, there are only indirect ways to do this: writing a value into CR3 with MOV, or a task switch.
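These steps can be modeled in user space. This is only a sketch: it assumes the faulting address is below 4MB so one page table is enough, skips loading from disk, and replaces the real TLB flush with a counter; `handle_not_present`, `frame_used` and `flush_tlb` are hypothetical names. On the 386, writing CR3 is the way to flush; from the i486 on, INVLPG can also invalidate a single entry.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE 4096u
#define NFRAMES   1024u                /* 4MB of toy physical memory */

static unsigned char frame_used[NFRAMES];   /* 0 = free */
static uint32_t page_table[1024];           /* one table covers 4MB */

/* Placeholder for a TLB flush; in the kernel this would reload CR3
 * (e.g. movl %cr3,%eax; movl %eax,%cr3). Here it only counts calls. */
static int tlb_flushes;
static void flush_tlb(void) { ++tlb_flushes; }

/* Handle a not-present fault at linear address addr (assumed < 4MB).
 * Returns the frame number used, or -1 if no frame is free. */
static int handle_not_present(uint32_t addr)
{
    int f;
    /* 1. find a free frame, searching from the top like alloc_page */
    for (f = NFRAMES - 1; f >= 0 && frame_used[f]; --f)
        ;
    if (f < 0)
        return -1;
    frame_used[f] = 1;
    /* 2. map it: user, r/w, present */
    page_table[(addr >> 12) & 0x3FFu] = (uint32_t)f * PAGE_SIZE | 0x7u;
    /* 3. discard stale translations */
    flush_tlb();
    return f;
}
```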

Let's look at the code. First, some constants are defined:
08/include/mm.h
#define PAGE_DIR    ((HD0_ADDR+HD0_SIZE+(4*1024)-1) & 0xfffff000)
The page directory is placed at the first 4K boundary at or after HD0_ADDR+HD0_SIZE. Because the page directory must be 4K aligned, the macro adds 4K-1 and then clears the low 12 bits, which is why it looks rather long.
#define PAGE_SIZE    (4*1024)
#define PAGE_TABLE    (PAGE_DIR+PAGE_SIZE)
The page table is placed right after the page directory.
#define MEMORY_RANGE (4*1024*1024)
We use 4MB of memory in Skelix.
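The rounding trick used by PAGE_DIR works for any address. A small check, using a hypothetical `ALIGN_4K` macro written the same way, that also confirms 4MB of memory is exactly 1024 pages, i.e. one full page table:

```c
#include <assert.h>

#define PAGE_SIZE    (4*1024)
#define MEMORY_RANGE (4*1024*1024)

/* Same trick as PAGE_DIR: add (4K - 1), then clear the low 12 bits,
 * giving the first 4K boundary at or above x. */
#define ALIGN_4K(x)  (((x) + PAGE_SIZE - 1) & 0xfffff000)
```

An address that is already aligned stays unchanged, while anything else is rounded up to the next 4K boundary.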

08/mm.c
static char mmap[MEMORY_RANGE/PAGE_SIZE] = {PG_REVERSED, };
This is the map of physical memory, with one entry per 4K page; an entry of 0 means the page is free.
void
mm_install(void) {
    unsigned int *page_dir = ((unsigned int *)PAGE_DIR);
    unsigned int *page_table = ((unsigned int *)PAGE_TABLE);
    unsigned int address = 0;
    int i;
    for(i=0; i<MEMORY_RANGE/PAGE_SIZE; ++i) {
        /* attribute set to: user, r/w, present (7) */
        page_table[i] = address|7;
        address += PAGE_SIZE;
    };
This initializes the page table entries for 0-4MB as an identity mapping (linear address = physical address).
    page_dir[0] = (PAGE_TABLE|7);
Because one page directory entry covers 4MB of memory (1024 entries * 4KB), we only need to set up the first entry of the page directory.
    for (i=1; i<1024; ++i)
        page_dir[i] = 6;
The remaining 1023 page directory entries are set to 6 (user, r/w, not present); all 1024 entries together can refer to a 4GB memory space.
    /* mark the lower 1MB of memory as used */
    for (i=(1*1024*1024)/PAGE_SIZE-1; i>=0; --i)
        mmap[i] = PG_REVERSED;
Because the kernel uses the lower 1MB of memory, we mark those pages as reserved (PG_REVERSED) so they cannot be swapped out and always stay in memory.
    __asm__ (
        "movl    %%eax,    %%cr3\n\t"
        "movl    %%cr0,    %%eax\n\t"
        "orl    $0x80000000,    %%eax\n\t"
        "movl    %%eax,    %%cr0"::"a"(PAGE_DIR));
}
To enable paging, we load the page directory address into CR3 and then just set bit 31 (PG) of CR0. Easy, right?
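To see what mm_install leaves behind, here is a user-space model of its two tables. Assumptions: the tables are ordinary arrays, `model_mm_install` and `lookup` are hypothetical names, and the directory entry uses 0x1000 as a stand-in for PAGE_TABLE, since only its flag bits matter for the lookup. Addresses below 4MB map to themselves, while anything above, such as the 0xeeffeeff address used later, hits a not-present directory entry and would fault.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE 4096u

/* Toy copies of the two tables mm_install builds. */
static uint32_t page_dir[1024], page_table[1024];

static void model_mm_install(void)
{
    uint32_t address = 0;
    int i;
    for (i = 0; i < 1024; ++i, address += PAGE_SIZE)
        page_table[i] = address | 7;   /* user, r/w, present */
    page_dir[0] = 0x1000u | 7;         /* stand-in address for PAGE_TABLE */
    for (i = 1; i < 1024; ++i)
        page_dir[i] = 6;               /* user, r/w, NOT present */
}

/* Returns 1 and the physical address if mapped, or 0 where a page
 * fault would occur. */
static int lookup(uint32_t lin, uint32_t *phys)
{
    uint32_t pde = page_dir[lin >> 22];
    if (!(pde & 1))
        return 0;
    /* only directory entry 0 is present in this model; it stands for page_table */
    uint32_t pte = page_table[(lin >> 12) & 0x3FF];
    if (!(pte & 1))
        return 0;
    *phys = (pte & 0xFFFFF000u) | (lin & 0xFFF);
    return 1;
}
```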

We can easily find a free page in memory by searching the mmap array:
08/mm.c
unsigned int
alloc_page(int type) {
    int i;

    for (i=(sizeof mmap)-1; i>=0 && mmap[i]; --i)
        ;

    if (i < 0) {
        kprintf(KPL_PANIC, "NO MEMORY LEFT");
        halt();
    }
    mmap[i] = type;
    return i;
}

void *
page2mem(unsigned int nr) {
    return (void *)(nr * PAGE_SIZE);
}
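The search in alloc_page can be tried out in user space. A sketch, assuming a 1024-entry map like mmap with the lower 1MB (256 pages) reserved; the `model_*` names and the type constants are hypothetical, and the model returns -1 instead of calling halt() when memory runs out. Because the search starts at the top, the first allocations come from the highest pages.

```c
#include <assert.h>

#define NPAGES 1024                       /* 4MB / 4KB */

/* Hypothetical page types; 0 means free, as in mmap. */
enum { MODEL_FREE = 0, MODEL_RESERVED = 1, MODEL_USED = 2 };

static char model_mmap[NPAGES];

/* Same search as alloc_page: scan from the highest page down to the
 * first free (zero) entry, but return -1 instead of halting. */
static int model_alloc_page(int type)
{
    int i;
    for (i = NPAGES - 1; i >= 0 && model_mmap[i]; --i)
        ;
    if (i >= 0)
        model_mmap[i] = (char)type;
    return i;
}

/* Same conversion as page2mem: page number to its start address. */
static unsigned long model_page2mem(int nr)
{
    return (unsigned long)nr * 4096ul;
}
```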

void
do_page_fault(enum KP_LEVEL kl,
              unsigned int ret_ip, unsigned int ss, unsigned int gs,
              unsigned int fs, unsigned int es, unsigned int ds,
              unsigned int edi, unsigned int esi, unsigned int ebp,
              unsigned int esp, unsigned int ebx, unsigned int edx,
              unsigned int ecx, unsigned int eax, unsigned int isr_nr,
              unsigned int err, unsigned int eip, unsigned int cs,
              unsigned int eflags,unsigned int old_esp, unsigned int old_ss) {
    unsigned int cr2, cr3;
    (void)ret_ip; (void)ss; (void)gs; (void)fs; (void)es;
    (void)ds; (void)edi; (void)esi; (void)ebp; (void)esp;
    (void) ebx; (void)edx; (void)ecx; (void)eax;
    (void)isr_nr; (void)eip; (void)cs; (void)eflags;
    (void)old_esp; (void)old_ss; (void)kl;
    __asm__ ("movl %%cr2, %%eax":"=a"(cr2));
    __asm__ ("movl %%cr3, %%eax":"=a"(cr3));
    kprintf(KPL_PANIC, "\n  The fault at %x cr3:%x was caused by a %s. "
            "The accessing cause of the fault was a %s, when the "
            "processor was executing in %s mode, page %x is free\n",
            cr2, cr3,
            (err&0x1)?"page-level protection violation":"not-present page",
            (err&0x2)?"write":"read",
            (err&0x4)?"user":"supervisor",
            alloc_page(PG_NORMAL));
}
This exception handler does nothing but print information about the exception. It calls alloc_page only to report a free page number; that page is marked as used but is not mapped.

Now we can allocate memory dynamically; new_task changes as follows:
08/init.c
static void
new_task(unsigned int eip) {
    struct TASK_STRUCT *task = page2mem(alloc_page(PG_TASK));
    memcpy(&(task->tss), &(TASK0.tss), sizeof(struct TSS_STRUCT));

    task->tss.esp0 = (unsigned int)task + PAGE_SIZE;
    task->tss.eip = eip;
    task->tss.eflags = 0x3202;
    task->tss.esp = (unsigned int)page2mem(alloc_page(PG_TASK))+PAGE_SIZE;
    task->tss.cr3 = PAGE_DIR;
    task->priority = INITIAL_PRIO;
    task->ldt[0] = DEFAULT_LDT_CODE;
    task->ldt[1] = DEFAULT_LDT_DATA;

    task->next = current->next;
    current->next = task;
    task->state = TS_RUNABLE;
}

Now let's add mm_install to init.c, remember to modify the corresponding line in exceptions.c, and then try to access a memory address beyond 4MB:
08/init.c
void
init(void) {
    char wheel[] = {'\\', '|', '/', '-'};
    int i = 0;

    idt_install();
    pic_install();
    mm_install();      /* <<<<< Here it is */
    kb_install();
    timer_install(100);
    set_tss((unsigned long long)&TASK0.tss);
    set_ldt((unsigned long long)&TASK0.ldt);
    __asm__ ("ltrw    %%ax\n\t"::"a"(TSS_SEL));
    __asm__ ("lldt    %%ax\n\t"::"a"(LDT_SEL));

    kprintf(KPL_DUMP, "Verifying disk partition table....\n");
    verify_DPT();
    kprintf(KPL_DUMP, "Verifying file systems....\n");
    verify_fs();
    kprintf(KPL_DUMP, "Checking / directory....\n");
    verify_dir();

    sti();
    new_task((unsigned int)task1_run);
    new_task((unsigned int)task2_run);
    __asm__ ("movl %%esp,%%eax\n\t" \
             "pushl %%ecx\n\t" \
             "pushl %%eax\n\t" \
             "pushfl\n\t" \
             "pushl %%ebx\n\t" \
             "pushl $1f\n\t" \
             "iret\n" \
             "1:\tmovw %%cx,%%ds\n\t" \
             "movw %%cx,%%es\n\t" \
             "movw %%cx,%%fs\n\t" \
             "movw %%cx,%%gs" \
             ::"b"(USER_CODE_SEL),"c"(USER_DATA_SEL));
    __asm__ ("incb 0xeeffeeff");         /* <<<< Here it is */
    for (;;) {
        __asm__ ("movb    %%al,    0xb8000+160*24"::"a"(wheel[i]));
        /* wrap before indexing past the last element of wheel */
        if (++i == sizeof wheel)
            i = 0;
    }
}
08/exceptions.c
void
page_fault(void) {
    __asm__ ("pushl    %%eax;call    do_page_fault"::"a"(KPL_PANIC));  /* <<< Here it is */
    halt();
}

Finally, add mm.o to KERNEL_OBJS
08/Makefile
KERNEL_OBJS= load.o init.o isr.o timer.o libcc.o scr.o kb.o task.o kprintf.o hd.o exceptions.o fs.o mm.o
make t8

mm result
