http://blog.9cbs.net/xiaohan13916830/archive/2004/06/24/24863.aspx
Foreword
This article does not discuss how a multi-boot loader chooses between different operating systems. Instead, it looks at booting from the operating system's point of view: how a computer goes from power-on, with nothing loaded, to a running operating system. It describes the transition from real mode to protected mode in as much detail as possible. The aim is simply to share some experience with fellow enthusiasts.
This article uses the boot program of PYOS, which is still under development, as its example. PYOS is an experimental operating system. It is not modeled on any existing operating system; the goal is simply to write an operating system from start to finish in order to learn and build up skills. If you are interested, you are welcome to join!
These are notes from my own learning process. If you find mistakes or anything inappropriate, please let me know.
I. What does the computer do after power-on?
When the power button is pressed, the signal line attached to it sends an electrical signal to the motherboard, which passes it on to the power supply. The power supply starts up, powers the whole system, and signals the BIOS that power is ready. The BIOS then runs the power-on self-test (POST), whose main job is to check that every part of the system is receiving power and that memory, the other chips, the keyboard, the mouse, the disk controller and some I/O ports work correctly. When the self-test finishes, control passes back to the BIOS. Next, the BIOS reads its settings to get the boot device order and checks the devices in turn until it finds one that can be booted from (floppy disk, hard disk, CD-ROM, etc.). It then loads and runs that drive's boot sector. But how does the BIOS tell which disk is bootable?
II. Understanding the boot process
The BIOS reads the first sector (512 bytes) of the disk it is checking into memory at 0x0000:0x7C00 (see Figure 3). If the last two bytes of that sector are 0x55 0xAA, the sector is a boot sector and the disk is bootable. The 512-byte program in that sector is usually called the boot program. If the last two bytes are not 0x55 0xAA, the BIOS moves on to the next drive.
From the description above, a boot program has three characteristics:
1. It is exactly 512 bytes long, no more and no less, because the BIOS reads only 512 bytes into memory.
2. It must end with the bytes 0x55 0xAA, which mark it as a boot sector.
3. It always occupies the first sector of the disk (head 0, track 0, sector 1), because that is the only sector the BIOS reads.
(Figure 1)
So when writing a boot program we must observe these three rules. Anything that satisfies them counts as a boot program, at least as far as the BIOS is concerned, even if the sector you write contains no meaningful code at all.
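Rules 1 and 2 amount to a very small check. The C sketch below is only an illustration of that test; `is_boot_sector` is a hypothetical helper, not code taken from any BIOS.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SECTOR_SIZE 512

/* Hypothetical helper: returns 1 if `sector` would be accepted as a boot sector.
 * Rule 1: it is exactly one 512-byte sector.
 * Rule 2: byte 510 is 0x55 and byte 511 is 0xAA. */
int is_boot_sector(const uint8_t *sector, size_t len)
{
    assert(sector != NULL);
    if (len != SECTOR_SIZE)
        return 0;
    return sector[510] == 0x55 && sector[511] == 0xAA;
}
```

A sector full of zeros fails the check; setting the last two bytes to 0x55 and 0xAA is enough to pass it, which is why even a sector with no real code "counts" as a boot program.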
Because the BIOS reads only one sector (512 bytes) into memory, which is clearly not enough for today's large operating systems, the boot sector must itself read the core of the operating system from disk into memory and then jump to it.
III. Reading disk sectors through the BIOS
As described above, the boot program has to read the operating system from disk into memory, so we need to discuss how to read disk sectors without an operating system (since none is running yet). There are generally two ways: programming the disk controller's I/O ports directly, or using BIOS interrupts. The former is the lowest-level method (the BIOS routines are themselves built on it); it is extremely flexible and can place disk contents anywhere in memory, but it is complex to program. The latter works one level higher and gives up some flexibility: for example, it cannot read disk contents into 0x0000:0x0000 to 0x0000:0x03FF. Why not? To answer that, we need to look at how the CPU handles interrupts after power-on.
3.1 BIOS interrupt handling
Anyone who has studied computers will be familiar with interrupts. If you are not, I recommend "Principles of Computer Organization" (Tang Shuofei, Higher Education Press), which describes them in detail; most assembly language texts also cover them. Here I only discuss BIOS interrupts.
(Figure 2)
As the figure shows, when an interrupt signal arrives, the interrupt address forming unit produces an interrupt vector address. This is in effect a pointer to a memory location, and that location holds a jump (JMP) to the interrupt service routine that actually handles the interrupt. The block of memory used for these jumps is called the interrupt vector table. (On the x86 in real mode, each entry is actually a 4-byte segment:offset pointer to the handler rather than a jump instruction.) Where is the interrupt vector table in memory, and where are the actual interrupt handlers?
3.2 The system's memory layout (first 1 MB)
To answer these two questions we need to see how memory is laid out. After power-on, the BIOS arranges the first 1 MB of memory for us, and each region has a specific purpose.
(Figure 3)
With the figure above, both questions are easy to answer. Since 0x00000-0x003FF holds the interrupt vector table, the operating system cannot be read from disk into this area: doing so would overwrite the table, and we could no longer read the disk through BIOS interrupts. You might argue that the table is only needed at the moment INT 0x13 is called, so the read could overwrite it afterwards. In fact, however, the BIOS itself invokes other interrupts several times while it is carrying out the read.
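To make the table layout concrete, here is a small C model of how a real-mode vector number leads to a handler address: entry n sits at 4 × n and holds the handler's offset followed by its segment. The function names are hypothetical, and the byte array stands in for the first 1 KB of memory.

```c
#include <assert.h>
#include <stdint.h>

/* Real-mode IVT: 256 entries of 4 bytes each, at 0x00000-0x003FF. */
uint32_t ivt_entry_address(uint8_t vector)
{
    return (uint32_t)vector * 4u;
}

/* Reads entry `vector` from a copy of the first 1 KB of memory.
 * Each entry is little-endian: offset word first, then segment word.
 * Returns the handler's physical address, segment * 16 + offset. */
uint32_t ivt_handler_address(const uint8_t ivt[1024], uint8_t vector)
{
    uint32_t e = ivt_entry_address(vector);
    uint16_t offset  = (uint16_t)(ivt[e]     | (ivt[e + 1] << 8));
    uint16_t segment = (uint16_t)(ivt[e + 2] | (ivt[e + 3] << 8));
    return ((uint32_t)segment << 4) + offset;
}
```

For INT 0x13 the entry lives at 0x4C to 0x4F, well inside the region that must not be overwritten while BIOS disk services are still in use.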
3.3 Reading disk sectors with BIOS interrupt 0x13
With that background we can describe how to read disk sectors through a BIOS interrupt. Reading sectors uses BIOS interrupt 0x13, which takes its parameters in several registers, so those registers must be set before INT 0x13 is executed. Which registers are used, and how are they set? See below:
AH register: the function number; 2 means read sectors.
DL register: the drive number, i.e. which drive to read (0x00 is the first floppy drive, 0x80 the first hard disk).
CH register: the cylinder (track) number to read from.
DH register: the head number to read with.
CL register: the starting sector number (sectors are numbered from 1).
AL register: the number of sectors to read.
After setting these registers, we execute INT 0x13 to call the BIOS and read the specified sectors. The BIOS stores the data at ES:BX, so before the call we must also set ES and BX to the memory location where the data should go. If the read fails, the BIOS sets the carry flag (CF), which is why the code below retries with JC.
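The register list can be packed into 16-bit register values as in the C sketch below. `make_read` and `struct int13_read` are hypothetical names used only for illustration, and the sketch assumes cylinder numbers below 256, so the two extra cylinder bits that INT 0x13 keeps in CL bits 6-7 are always zero.

```c
#include <assert.h>
#include <stdint.h>

/* Register values for BIOS INT 0x13, function AH = 2 (read sectors). */
struct int13_read {
    uint16_t ax;   /* AH = 2 (function), AL = sector count   */
    uint16_t cx;   /* CH = cylinder,     CL = starting sector */
    uint16_t dx;   /* DH = head,         DL = drive number    */
    uint16_t es;   /* ES:BX = destination buffer              */
    uint16_t bx;
};

/* Hypothetical helper; assumes cylinder < 256. */
struct int13_read make_read(uint8_t drive, uint8_t cylinder, uint8_t head,
                            uint8_t sector, uint8_t count,
                            uint16_t es, uint16_t bx)
{
    assert(sector >= 1 && sector <= 63);   /* 6-bit field, counted from 1 */
    struct int13_read r;
    r.ax = (uint16_t)((2u << 8) | count);
    r.cx = (uint16_t)((cylinder << 8) | sector);
    r.dx = (uint16_t)((head << 8) | drive);
    r.es = es;
    r.bx = bx;
    return r;
}

/* Physical address the BIOS writes to: ES * 16 + BX. */
uint32_t buffer_address(const struct int13_read *r)
{
    return ((uint32_t)r->es << 4) + r->bx;
}
```

Filled in with the values Boot.asm uses for SETUP (drive 0, cylinder 0, head 0, sector 2, two sectors, ES:BX = 0x9000:0x0100), this gives AX = 0x0202, CX = 0x0002 and a buffer at 0x90100.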
IV. Memory access in protected mode
No program can avoid accessing memory, but memory access in protected mode is completely different from real mode, so this section describes how memory is addressed in protected mode. It does not try to cover every protected-mode memory access mechanism, only what is needed to switch from real mode to protected mode and access memory afterwards; for the full picture, see the Intel manuals. As the PYOS experiments progress, I will describe more in later experiment reports. I do not cover more now mainly because I have not yet verified it in PYOS, and I don't dare draw conclusions without doing so: as the foreword says, this article is only my own experience, and what I have not tried I cannot claim as experience.
With that said, let's first look at how memory is accessed in real mode.
4.1 Memory access in real mode
When the computer powers up, it is in "real mode". The CPU has a register called CR0 (control register 0) whose lowest bit, bit 0, is the PE (Protection Enable) bit. When this bit is clear the CPU runs in real mode; when it is set the CPU runs in protected mode. The bit is cleared at power-up, so the computer starts in real mode.
In real mode a memory address consists of a segment value and an offset, written in the form used above, for example 0x0000:0x0001. The value before the colon is the segment and the value after it is the offset. The physical address is the segment shifted left by 4 bits plus the offset, $\text{physical} = \text{segment} \times 16 + \text{offset}$, as shown below:
(Figure 4)
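The same calculation in C, as a one-function sketch (`real_mode_physical` is a hypothetical name). Note that many different segment:offset pairs name the same byte.

```c
#include <assert.h>
#include <stdint.h>

/* Real-mode address formation: physical = segment * 16 + offset.
 * The sum can exceed 0xFFFFF, reaching at most 0xFFFF:0xFFFF = 0x10FFEF. */
uint32_t real_mode_physical(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}
```

For example, 0x0000:0x7C00 and 0x07C0:0x0000 both refer to physical address 0x7C00, where the BIOS places the boot sector.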
In protected mode, however, addresses are not formed this way. How are they formed?
4.2 Address formation in protected mode
Addressing is more complicated in protected mode. We first need to distinguish three concepts: logical addresses, linear addresses and physical addresses. A physical address is easy to understand, and a logical address is simply the address used by the program. So what is a linear address?
If paging is not used, the linear address is the physical address: the two correspond one to one, so linear address 0 is physical address 0. A 32-bit CPU, however, has 32 address lines, so it can address $2^{32}$ bytes = 4 GB of memory. That is a huge space, and very few machines today have that much physical memory. How can a 4 GB space be used with limited physical memory? The physical memory is divided into pages. At any moment some pages are in use and others are not, and the unused ones can be loaded with pages from the 4 GB space. This is the mapping from linear addresses to physical addresses. It is a many-to-one mapping: several pages of the linear space may correspond, at different times, to the same page of physical memory. I hope the figure below helps explain paging.
(Figure 5)
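The simplest placement rule, the "direct-mapped" one in Figure 5, can be modeled in a few lines of C. This is an illustration only: it assumes x86's 4 KB page size and, to match the figure's example where linear page 5 lands in physical page 1, a physical memory of just 4 pages. All function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE 4096u   /* assumed page size (x86 uses 4 KB pages) */

/* Direct-mapped placement: linear page p may only occupy frame p % frames. */
uint32_t direct_mapped_frame(uint32_t linear_page, uint32_t frames)
{
    assert(frames > 0);
    return linear_page % frames;
}

/* Split a linear address into its page number and the offset inside the page. */
uint32_t page_number(uint32_t linear) { return linear / PAGE_SIZE; }
uint32_t page_offset(uint32_t linear) { return linear % PAGE_SIZE; }
```

With 4 frames, linear pages 1, 5, 9, ... all compete for physical frame 1, which is why loading page 5 forces whatever is in frame 1 out to disk. A fully associative scheme replaces the `%` with a lookup table, so any page can use any free frame.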
The mapping in Figure 5 is the simplest kind, called "direct-mapped". It is only meant to illustrate the idea; a real operating system usually uses a "fully associative" mapping, in which any page of the linear address space can be mapped to any page of physical memory, as long as that physical page is free. The figure is enough to explain the idea, though. When page 5 of the linear address space needs to be accessed, the CPU translates it through the address mapping mechanism and finds that it maps to physical page 1. The contents of physical page 1 are then saved to a place on the hard disk (virtual memory), and linear page 5 is loaded into physical page 1 (in practice this swapping is done by the operating system, not by the CPU alone). By now you should be able to tell linear addresses and physical addresses apart. When paging is not used, the CPU treats the linear address as the physical address and puts it directly onto its address lines.
When writing applications we usually use yet another kind of address, the logical address. There is a mapping from logical addresses to linear addresses similar to the one above, usually called "segmentation". It is carried out jointly by the operating system and the CPU hardware: the operating system sets up the mapping tables, and the CPU hardware performs the mapping according to them. These mapping tables are called "descriptor tables". Two of them are important when writing an operating system: the Global Descriptor Table (GDT) and the Local Descriptor Table (LDT). They serve different purposes but are used in much the same way; here we describe the GDT.
Anyone who has studied data structures knows that a table is a data structure. The GDT is one too: when the structure is stored in a contiguous block of memory, it is called a table. A table consists of entries; the entries of the GDT are descriptors, usually just called "descriptors", and since they live in the GDT they are also called "global descriptors". Each descriptor is 8 bytes long. Its structure is shown below:
(Figure 6)
TYPE: the type of the segment. For a data segment the highest of the 4 bits is 0, and the remaining three bits are, from left to right, E, W and A, so a data segment's type is 0EWA. E is the expand-down bit (set means the segment grows downward), W is the writable bit (set means the segment is writable), and A is the accessed bit (the CPU sets it when the segment is accessed). For a code segment the highest bit is 1.
S: 1 means a code or data segment; 0 means a system segment.
DPL: the descriptor privilege level, from 00 to 11, i.e. privilege levels 0, 1, 2 and 3.
P: the present bit; 0 means the segment is not present and cannot be used.
AVL: available for use by system software.
D/B: 0 means a 16-bit segment; 1 means a 32-bit segment.
G: the granularity bit. When it is 0, the limit is counted in units of 1 byte; when it is 1, the limit is counted in units of 4 KB, and the lowest 12 bits of the offset are not checked against the limit. (This may be hard to follow now; it is explained below.)
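The field list above can be turned into an encoder. The C sketch below packs base, limit, access byte (P, DPL, S, TYPE) and flags (G, D/B, 0, AVL) into the 8-byte layout; `make_descriptor` and `descriptor_word` are hypothetical helpers written for this article, used here to reproduce the words of the temporary GDT in Setup.asm.

```c
#include <assert.h>
#include <stdint.h>

/* Builds an 8-byte segment descriptor.
 *   base   : 32-bit segment base
 *   limit  : 20-bit segment limit
 *   access : P | DPL(2 bits) | S | TYPE(4 bits), e.g. 0x9A = present, DPL 0, code, readable
 *   flags  : G | D/B | 0 | AVL, e.g. 0xC = 4 KB granularity, 32-bit segment */
uint64_t make_descriptor(uint32_t base, uint32_t limit, uint8_t access, uint8_t flags)
{
    assert(limit <= 0xFFFFF && flags <= 0xF);
    uint64_t d = 0;
    d |= (uint64_t)(limit & 0xFFFF);             /* bytes 0-1: limit 15..0     */
    d |= (uint64_t)(base & 0xFFFFFF) << 16;      /* bytes 2-4: base 23..0      */
    d |= (uint64_t)access << 40;                 /* byte 5: access byte        */
    d |= (uint64_t)((limit >> 16) & 0xF) << 48;  /* byte 6, low: limit 19..16  */
    d |= (uint64_t)flags << 52;                  /* byte 6, high: G D/B 0 AVL  */
    d |= (uint64_t)(base >> 24) << 56;           /* byte 7: base 31..24        */
    return d;
}

/* Word n (0..3) of the descriptor, in the order a listing's DW lines give them. */
uint16_t descriptor_word(uint64_t d, int n)
{
    assert(n >= 0 && n < 4);
    return (uint16_t)(d >> (16 * n));
}
```

Encoding base 0, limit 0x3FFF, access 0x9A and flags 0xC yields the words 0x3FFF, 0x0000, 0x9A00, 0x00C0, which is exactly the code segment descriptor GDT_SYSTEM_CODE in Setup.asm.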
Two fields are particularly interesting: the base address and the segment limit. The base address is easy to understand: it gives the address where the segment starts (a linear address, which equals the physical address when paging is off). The limit, as its name suggests, limits the size of the segment, but with a twist. The highest accessible address of a segment is computed as follows:
$$\text{highest address} = \text{base} + \text{limit} \times \text{unit} + (\text{unit} - 1)$$
If an offset would go beyond this highest address, the CPU raises an exception (typically a general-protection fault). This prevents one program from illegally accessing another program's memory, so memory is protected; hence the name "protected mode".
So if the limit is 0 and the unit is one byte, the highest accessible address of the segment is the base address itself, and the segment is 1 byte long. When the unit is 4 KB, the CPU does not check the lowest 12 bits of the offset, which can be at most 0xFFF, so the segment is 4 KB long. In general:
$$(\text{limit} + 1) \times \text{unit} = \text{segment size}$$
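The two formulas can be checked with a short C sketch (hypothetical helper names; only expand-up segments are modeled). The highest valid offset is simply the segment size minus 1, which is the same as limit × unit + (unit − 1).

```c
#include <assert.h>
#include <stdint.h>

/* Granularity unit: 1 byte when G = 0, 4 KB when G = 1. */
static uint64_t unit_of(int g) { return g ? 4096u : 1u; }

/* (limit + 1) * unit = segment size */
uint64_t segment_size(uint32_t limit, int g)
{
    assert(limit <= 0xFFFFF);
    return ((uint64_t)limit + 1) * unit_of(g);
}

/* Highest valid offset: limit * unit + (unit - 1), i.e. size - 1.
 * With G = 1 this is the limit with the unchecked low 12 bits all set. */
uint64_t max_offset(uint32_t limit, int g)
{
    return segment_size(limit, g) - 1;
}

/* 1 if `offset` lies inside an expand-up segment, 0 if the CPU would fault. */
int offset_ok(uint32_t offset, uint32_t limit, int g)
{
    return offset <= max_offset(limit, g);
}
```

With the limit 0x3FFF and G = 1 used in Setup.asm, the segment is (0x3FFF + 1) × 4 KB = 64 MB, so offset 0x3FFFFFF is the last legal byte and 0x4000000 faults.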
Now we can describe how memory is accessed in protected mode under segmentation. I stress "segmentation" because protected mode has another memory access mechanism, paging, which converts linear addresses to physical addresses. Paging is optional: when it is not used, the linear address goes directly onto the address lines as the physical address. Segmentation, however, cannot be avoided. A so-called "pure paging" setup merely treats the whole linear address space as one segment; there is no way to truly bypass segmentation, because it is built into the CPU's memory access mechanism. This article describes only segmentation because, as mentioned earlier, I have not yet experimented with paging.
We already know that the translation from a program's logical address to a linear address is done through a descriptor, and that descriptors are stored in a descriptor table. A table holds many descriptors, so which one is used? That is determined by a value that says which entry in the table to use, usually called a "segment selector". A segment selector is 2 bytes long; the information it carries is shown below:
(Figure 7)
Where:
RPL: the requested privilege level, 00 to 11, i.e. the four privilege levels described above.
TI: the table indicator; 0 means the selector refers to the global descriptor table, 1 means it refers to the local descriptor table.
Index: which descriptor in the table to use. The index has 13 bits, so each descriptor table can hold 8K (8192) entries, and since each entry is 8 bytes, as described earlier, a descriptor table is at most 64 KB.
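The selector layout (bits 15..3 index, bit 2 TI, bits 1..0 RPL) can be packed and unpacked as in this C sketch; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Segment selector: bits 15..3 = index, bit 2 = TI, bits 1..0 = RPL. */
uint16_t make_selector(uint16_t index, int ti, int rpl)
{
    assert(index < 8192 && (ti == 0 || ti == 1) && rpl >= 0 && rpl <= 3);
    return (uint16_t)((index << 3) | (ti << 2) | rpl);
}

uint16_t selector_index(uint16_t sel) { return sel >> 3; }
int selector_ti(uint16_t sel)         { return (sel >> 2) & 1; }
int selector_rpl(uint16_t sel)        { return sel & 3; }
```

Index 1 in the GDT at privilege level 0 gives 0x0008 and index 2 gives 0x0010, the two selectors that Setup.asm and System.asm use for the code and data segments.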
You may have noticed something: if you clear the low 3 bits of a segment selector, what remains is exactly the byte offset of the descriptor within the descriptor table! Intel's engineers were quite clever here. This arrangement makes selecting a descriptor fast: clear the low 3 bits of the selector, add the base address of the descriptor table, and you immediately have the address of the descriptor, from which the descriptor can be fetched directly. So where is the base address of the descriptor table?
The base address of a descriptor table is its starting address in memory, i.e. the address of the first descriptor in the table. The system keeps it in special registers: the Global Descriptor Table Register (GDTR) holds the base address of the global descriptor table, and the Local Descriptor Table Register (LDTR) is used for the local descriptor table (strictly, LDTR holds a selector for an LDT descriptor in the GDT, from which the CPU caches the table's base and limit). Their structure is shown below:
(Figure 8)
Here the table limit gives the size of the table in bytes minus 1; it works like the segment limit described above, so it is not discussed further.
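Putting the GDTR and the selector trick together: LGDT loads a 16-bit limit followed by a 32-bit base, and the descriptor a GDT selector refers to lives at base + (selector with its low 3 bits cleared). The C sketch below assumes TI = 0; the helper names, and the GDT offset 0x0108 used in the example, are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* GDTR contents as loaded by LGDT: a 16-bit limit (table size in bytes
 * minus 1) followed by a 32-bit linear base address. */
struct gdtr {
    uint16_t limit;
    uint32_t base;
};

/* Builds a GDTR value from three 16-bit words laid out like GDT_ADDR in
 * Setup.asm: limit, base bits 15..0, base bits 31..16. */
struct gdtr gdtr_from_words(uint16_t limit, uint16_t base_lo, uint16_t base_hi)
{
    struct gdtr r = { limit, ((uint32_t)base_hi << 16) | base_lo };
    return r;
}

/* Address of the descriptor a GDT selector (TI = 0) refers to.
 * Clearing the low 3 bits of the selector gives index * 8, the byte offset.
 * Returns 0 if the 8-byte descriptor would extend past the table limit. */
int descriptor_address(const struct gdtr *g, uint16_t sel, uint32_t *out)
{
    uint32_t off = sel & 0xFFF8u;
    if (off + 7 > g->limit)
        return 0;
    *out = g->base + off;
    return 1;
}
```

If the GDT label happened to sit at offset 0x0108 of segment 0x9000, the words 0x7FFF, 0x0108, 0x0009 would give a base of 0x90108, and selector 0x0008 would fetch its descriptor from 0x90110.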
In protected mode the segment registers from real mode are still used, but they no longer hold a segment base address. Instead they hold segment selectors, and so they are also called "segment selector registers". When accessing memory we supply a segment selector, not a segment base address.
For example, suppose I want to use the second entry in the GDT, i.e. the second segment descriptor. The segment selector is built as follows:
RPL: 00, because we are writing an operating system, which runs at privilege level 0.
TI: 0, because we use the global descriptor table.
Index: 1. We use the second global descriptor; the first global descriptor has index 0 and the second has index 1, which is 01 in binary.
So our segment selector is 0000 0000 0000 1000 in binary, i.e. 0x0008. In protected mode, therefore, the logical address 0x0008:0x0000 means offset 0 within the segment described by the second descriptor of the GDT. What linear address does this logical address correspond to? See the figure below:
(Figure 9)
This should make clear how a logical address is converted into a 32-bit linear address.
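The whole translation can be summarized in a small C model: look up the descriptor selected by the selector, check the offset against the limit, and add the base. The table below is a copy of the temporary GDT from Setup.asm written as 64-bit values; everything else (function names, the simplification to GDT-only, expand-up segments) is an assumption of this sketch, not how the CPU is implemented.

```c
#include <assert.h>
#include <stdint.h>

/* The temporary GDT from Setup.asm: null, code and data descriptors. */
static const uint64_t gdt[] = {
    0x0000000000000000ull,
    0x00C09A0000003FFFull,   /* code: base 0, limit 0x3FFF, G = 1, 32-bit */
    0x00C0920000003FFFull,   /* data: same range, writable                */
};

static uint32_t desc_base(uint64_t d)
{
    return (uint32_t)((d >> 16) & 0xFFFFFF) | (uint32_t)((d >> 56) & 0xFF) << 24;
}
static uint32_t desc_limit(uint64_t d)
{
    return (uint32_t)(d & 0xFFFF) | (uint32_t)((d >> 48) & 0xF) << 16;
}
static int desc_g(uint64_t d)       { return (int)((d >> 55) & 1); }
static int desc_present(uint64_t d) { return (int)((d >> 47) & 1); }

/* Translates selector:offset to a linear address. Returns 1 on success and
 * 0 where the CPU would fault (null or out-of-range index, segment not
 * present, offset past the limit). Only GDT selectors are modeled. */
int logical_to_linear(uint16_t sel, uint32_t offset, uint32_t *linear)
{
    uint16_t index = sel >> 3;
    if ((sel & 4) || index == 0 || index >= sizeof gdt / sizeof gdt[0])
        return 0;
    uint64_t d = gdt[index];
    if (!desc_present(d))
        return 0;
    uint64_t unit = desc_g(d) ? 4096u : 1u;
    uint64_t highest = ((uint64_t)desc_limit(d) + 1) * unit - 1;
    if (offset > highest)
        return 0;
    *linear = desc_base(d) + offset;
    return 1;
}
```

Because both descriptors have base 0, 0x0008:0x0000 maps to linear address 0 (the start of SYSTEM) and 0x0010:0xB8000 maps to 0xB8000 (the text-mode screen that System.asm writes to).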
V. The PYOS boot program
PYOS is an operating system still being written; it is an experimental project. I have discussed the purpose and motivation elsewhere; here, in keeping with the topic of this article, I only discuss writing the PYOS boot program. It was written with reference to the Linux 0.11 boot code, but PYOS is not based on Linux, and the two boot programs differ in many places. Let's first look at the memory layout of the whole PYOS boot stage:
(Figure 10)
The figure above is the PYOS memory layout, and it also shows the boot flow. PYOS boots in two stages. First, BOOT is read in by the BIOS. BOOT then reads SETUP. SETUP reads the SYSTEM program into a temporary area, moves it to the start of memory (0x00000), sets up a GDT with segment descriptors covering the SYSTEM program, switches the CPU to protected mode, and jumps to SYSTEM. From that point PYOS itself takes over; SYSTEM will become the real PYOS kernel. The data area in the figure stores parameters that need to be passed between BOOT, SETUP and PYOS.
The two-stage design is mainly for ease of future extension: each stage is almost independent, so BOOT or SETUP can be rewritten to support other boot methods. SYSTEM is first read into a temporary area because, as mentioned, data cannot be read directly into the area occupied by the real-mode interrupt vector table. Once the data has been read, no more BIOS interrupts are called, and the program is moved over the old interrupt vector table. The interrupt table for protected mode will be set up by SYSTEM, and all of memory is then handed over to SYSTEM. For the memory layout of PYOS processes, I plan to follow Linux 0.11, which arranges memory as follows:
(Figure 11, source: "Linux Kernel 0.11 Complete Annotation")
Each process gets 64 MB of space, and 4 GB / 64 MB = 64, so the system supports at most 64 processes. The segment limit of a process's segments is therefore 64 MB. Each process uses two entries of the GDT, a data segment descriptor and a code segment descriptor, each with a 64 MB limit.
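The arithmetic behind the 64-process limit, as a C sketch. It assumes, for illustration, that process number `nr` occupies the nr-th 64 MB slot of the linear address space; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define TASK_SPACE (64u * 1024u * 1024u)   /* 64 MB per process */

/* Number of 64 MB slots in a 4 GB linear address space. */
uint32_t max_processes(void)
{
    return (uint32_t)((1ull << 32) / TASK_SPACE);
}

/* Linear base address of process nr's code and data segments,
 * assuming slot nr starts at nr * 64 MB. */
uint32_t process_base(uint32_t nr)
{
    assert(nr < max_processes());
    return nr * TASK_SPACE;
}
```

Under this assumption process 1 starts at 0x4000000 and the last process, number 63, starts at 0xFC000000, just 64 MB below the 4 GB boundary.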
VI. PYOS boot program source code
The complete source code of the PYOS boot program is given below. Since SYSTEM is not finished yet, here it just prints one character to show that booting works. The code has fairly detailed comments. If anything is still unclear, visit the operating system experiment area of http://purec.binghua.com (Pure C Forum) to read earlier PYOS experiment reports, which have detailed annotations, explanations of the principles, and instructions for compiling and running the experiments.
; Boot.asm 0.04
; For PYOS4
; xieyubo@126.com
;
; This is the fourth version of Boot.asm. It changes a lot and follows the Linux 0.11 design.
; Note that the version identifier has changed: to make future changes easier,
; each file now has its own version number and states which system it is for.
; This version is for the PYOS4 system.
;
; Memory allocation in this version:
; The boot area starts at 0x90000
; The highest end address is 0x9FFFF
; At most 64 KB in total
; All startup code lives in one segment, which makes calls between parts easy
; The startup code has two parts, BOOT and SETUP, following the Linux 0.11 design,
; but unlike Linux 0.11, BOOT does not move itself to 0x90000; it jumps straight to 0x90100.
; 0x90000 ~ 0x900FF (256 B): reserved for key data obtained from the BIOS
; 0x90100 ~ 0x904FF (1 KB): SETUP is stored here; SETUP is 1 KB long
[BITS 16]                   ; generate 16-bit code
[ORG 0x7C00]                ; the BIOS loads the boot sector at 0x0000:0x7C00
; -----------------------------------------------------------------------------
    JMP Main
; -----------------------------------------------------------------------------
; Data definitions
MSG         DB "Loading PYOS ..."   ; message to display
            DB 13, 10, 0            ; 13 = carriage return, 10 = line feed, 0 = end of string
BootSeg     EQU 0x0000              ; segment base of BOOT
SetupSeg    EQU 0x9000              ; segment base of SETUP
SetupOffset EQU 0x0100              ; offset of SETUP
SetupSize   EQU 1024                ; size of SETUP,
                                    ; must be a multiple of 512
BootDriver  DB 0                    ; saves the boot drive number
; -----------------------------------------------------------------------------
ShowMessage:
; Displays the zero-terminated string at DS:SI using BIOS INT 0x10
    MOV AH, 0x0E            ; teletype output function
    MOV BH, 0x00            ; display page number
    MOV BL, 0x07            ; character attribute
.NextChar:
    LODSB                   ; AL = [DS:SI], SI = SI + 1
    OR AL, AL
    JZ .Return              ; stop at the terminating 0
    INT 0x10
    JMP .NextChar
.Return:
    RET
; -----------------------------------------------------------------------------
Main:
; Set up the data segment first, so that [BootDriver] is addressed correctly
    MOV AX, BootSeg
    MOV DS, AX
    MOV [BootDriver], DL    ; the BIOS passes the boot drive number in DL
    MOV SI, MSG
    CALL ShowMessage        ; display the message
; Read SETUP
; Read from the second sector of the disk into 0x90100
.ReadFloppy:
    MOV AX, SetupSeg
    MOV ES, AX
    MOV BX, SetupOffset     ; ES:BX = 0x9000:0x0100
    MOV AH, 2               ; function 2: read sectors
    MOV DL, [BootDriver]    ; drive number
    MOV CH, 0               ; cylinder 0
    MOV DH, 0               ; head 0
    MOV CL, 2               ; start at sector 2
    MOV AL, SetupSize / 512 ; number of sectors to read (2 sectors = 1 KB)
    INT 0x13
    JC .ReadFloppy          ; CF set means the read failed: retry
; Save the boot drive number at 0x90000 (ES = SetupSeg = 0x9000 here)
    MOV AL, [BootDriver]
    MOV [ES:0], AL
; Jump to SETUP
    JMP SetupSeg:SetupOffset
; -----------------------------------------------------------------------------
    TIMES 510 - ($ - $$) DB 0
    DB 0x55
    DB 0xAA
; Setup.asm 0.04
; For PYOS4
; xieyubo@126.com
; This SETUP program continues the boot work started by BOOT:
; it reads system information from the BIOS into a designated place,
; initializes the GDT and LDT, and performs the switch from real mode to protected mode.
; The protected-mode code (SYSTEM) is also read in by this program.
[BITS 16]
[ORG 0x0100]
; -----------------------------------------------------------------------------
    JMP Main
; -----------------------------------------------------------------------------
SetupSeg     EQU 0x9000
SetupOffset  EQU 0x0100
SetupSize    EQU 1024       ; SETUP is 1 KB,
                            ; must be a multiple of 512
SystemSeg    EQU 0x0000
SystemOffset EQU 0x0000
SystemSize   EQU 1024       ; SYSTEM is 1 KB;
                            ; must be a multiple of 512 and may not match the real size
; Descriptors of the temporary GDT
; Three entries: the null descriptor reserved by Intel, a code segment and a data segment
GDT_ADDR:
    DW 0x7FFF               ; GDT limit (table size in bytes minus 1)
    DW GDT                  ; GDT base, low 16 bits (offset of GDT within segment 0x9000)
    DW 0x0009               ; GDT base, high 16 bits: base = 0x90000 + GDT
GDT:
GDT_NULL:
    DW 0x0000
    DW 0x0000
    DW 0x0000
    DW 0x0000
GDT_SYSTEM_CODE:
    DW 0x3FFF               ; limit bits 15..0; with G = 1: (0x3FFF + 1) * 4 KB = 64 MB
    DW 0x0000               ; base bits 15..0
    DW 0x9A00               ; access byte 0x9A (present, DPL 0, code, readable); base bits 23..16 = 0
    DW 0x00C0               ; G = 1, D = 1 (32-bit); limit bits 19..16 = 0; base bits 31..24 = 0
GDT_SYSTEM_DATA:
    DW 0x3FFF               ; limit bits 15..0, as above
    DW 0x0000               ; base bits 15..0
    DW 0x9200               ; access byte 0x92 (present, DPL 0, data, writable)
    DW 0x00C0               ; G = 1, D/B = 1; limit bits 19..16 = 0; base bits 31..24 = 0
; -----------------------------------------------------------------------------
; Subroutine: wait until the keyboard controller's input buffer is empty
Empty_8042:
    IN AL, 0x64             ; read the 8042 status register
    TEST AL, 0x2            ; bit 1 = input buffer full
    JNZ Empty_8042
    RET
; -----------------------------------------------------------------------------
Main:
; Initialize the segment registers. BIOS interrupts and CALL use the stack (SS:SP),
; which the BIOS set up when the CPU started or was reset. We have now switched
; segments, so set them up again.
    MOV AX, SetupSeg
    MOV DS, AX
    MOV ES, AX
    MOV SS, AX
    MOV SP, 0xFFFF
; -----------------------------------------------------------------------------
; It is not yet decided which useful information should be read from the BIOS,
; so this block is skipped for now
; -----------------------------------------------
; 0x90000 (1 B): boot drive number, stored by BOOT
; -----------------------------------------------------------------------------
; Next, read SYSTEM into memory right after SETUP.
; 0x00000 currently holds the BIOS interrupt vector table, so SYSTEM cannot be read
; directly to 0x00000; otherwise the BIOS disk interrupt could no longer be called.
.ReadFloppy:
    MOV AX, SetupSeg
    MOV ES, AX
    MOV BX, SetupOffset + SetupSize     ; ES:BX = 0x9000:0x0500
    MOV AH, 2                           ; function 2: read sectors
    MOV DL, [0]                         ; boot drive number saved by BOOT at 0x90000
    MOV CH, 0                           ; cylinder 0
    MOV DH, 0                           ; head 0
    MOV CL, 1 + 1 + SetupSize / 512     ; start sector of SYSTEM
    ; (the first 1: sectors are numbered from 1; the second 1: the boot sector)
    MOV AL, SystemSize / 512            ; number of sectors to read (2 sectors = 1 KB)
    INT 0x13
    JC .ReadFloppy                      ; CF set means the read failed: retry
; Disable interrupts before overwriting the interrupt vector table at 0x00000:
; a hardware interrupt (e.g. the timer) arriving after this point would otherwise
; jump through a vector that has already been overwritten.
    CLI
; Move the SYSTEM just read to 0x00000
    CLD
    MOV SI, SetupOffset + SetupSize     ; source DS:SI = 0x9000:0x0500
    MOV AX, SystemSeg
    MOV ES, AX
    MOV DI, SystemOffset                ; destination ES:DI = 0x0000:0x0000
    MOV CX, SystemSize / 4              ; copy 4 bytes at a time
    REP MOVSD
; Initialize protected mode
    LGDT [GDT_ADDR]                     ; load the GDT descriptor into GDTR
; Enable the A20 address line
    CALL Empty_8042
    MOV AL, 0xD1                        ; 8042 command: write output port
    OUT 0x64, AL
    CALL Empty_8042
    MOV AL, 0xDF                        ; output port value with A20 enabled
    OUT 0x60, AL
    CALL Empty_8042
; Enter 32-bit protected mode
    MOV EAX, CR0
    OR EAX, 1                           ; set CR0.PE
    MOV CR0, EAX
    JMP DWORD 0x8:0x0                   ; far jump: selector 0x08 (code segment), offset 0 = start of SYSTEM
; -----------------------------------------------------------------------------
    TIMES 1024 - ($ - $$) DB 0
; System.asm 0.04
; For PYOS4
; xieyubo@126.com
; This program is written entirely in 32-bit assembly; it is the core module of the system.
[BITS 32]
[ORG 0x0]
; -----------------------------------------------------------------------------
    JMP Main
; -----------------------------------------------------------------------------
Main:
; Set up the data segment register
    MOV AX, 0x10            ; selector 0x10: third GDT entry, the data segment
    MOV DS, AX
    MOV CL, '1'
    MOV [0xB8000], CL       ; character in the first cell of the color text screen
    MOV CL, 0x04
    MOV [0xB8001], CL       ; its attribute: red on black
    JMP $                   ; hang here
; -----------------------------------------------------------------------------
One point in the program above was also mentioned in an earlier experiment report: the A20 address line. "Linux Kernel 0.11 Complete Annotation" describes the A20 line in great detail; the author also lists several other ways to enable A20 and analyzes the problems each may cause. It is a very good book, and I recommend it to everyone. The electronic version (PDF) of this book and other related resources can be downloaded from the Pure C Forum (http://purec.binghua.com).
Below is a screenshot of the program running. For now it can only boot and cannot do anything else; I hope it will do more next time.
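Why enabling A20 matters can be seen from a small model: with the A20 line disabled, address bit 20 is forced to 0, so real-mode addresses just above 1 MB wrap around to the bottom of memory, and in protected mode every odd megabyte would alias the even one below it. `physical_with_a20` is a hypothetical helper for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Real-mode physical address, with address bit 20 forced to 0
 * when the A20 line is disabled. */
uint32_t physical_with_a20(uint16_t segment, uint16_t offset, int a20_enabled)
{
    uint32_t addr = ((uint32_t)segment << 4) + offset;
    return a20_enabled ? addr : (addr & ~(1u << 20));
}
```

0xFFFF:0x0010 reaches 0x100000 with A20 enabled but wraps to 0x00000 with it disabled; since PYOS segments span 64 MB, A20 must be on before entering protected mode.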
References
1. Intel, "IA-32 Intel® Architecture Software Developer's Manual, Volume 3: System Programming Guide", 2001.
2. Zhao Jiong, "Linux Kernel 0.11 Complete Annotation", 2003.
3. Tang Shuofei, "Principles of Computer Organization", Higher Education Press.
Original: http://purec.binghua.com/Article/showArticle.asp?ArticleId=81