The NT Insider: Stop Interrupting Me - Of PICs and APICs
Creation time: 2005-03-16
Article attribute: translation
Article submission:
Tombkeeper (T0MBKEEPER_AT_HOTMAIL.COM)
Dong Yan translation
Greatdong_2001@163.com
Although Windows was designed to run on a variety of platforms, most of us actually use it on 32-bit x86 systems. As we saw in the last issue of The NT Insider ("Don't Call Us - Calling Conventions for the x86", V10N1), a thorough understanding of how the x86 builds a call frame is of unparalleled benefit when analyzing a crash dump, especially when no symbols are available. When you write a driver you become part of the operating system, and a deep understanding of the platform the operating system runs on helps with both development and debugging.
Remember, always use the HAL for platform-specific code. But that does not mean you should know nothing about the platform. We have written a series of articles exploring some of the platform's inner workings: the ones that interest us and that we think every driver developer should know. This article is the first in that series. We assume the reader understands the basics of devices and knows something about Windows drivers: interrupt service routines, IRQLs, and related topics.
The Interrupt
Without input and output, there is not much a CPU is good for (duh!). When a device's state changes, whether because it has data to transfer or because some other external condition needs attention, the device sets one of its device-specific registers. To detect the change, a driver could test that register over and over, but that is far too inefficient. The alternative is for the device to generate an interrupt that notifies the system asynchronously when its state changes. With interrupts we can ignore the device while it has nothing for us, so no clock cycles are wasted and overall system performance improves.
Since interrupts are an integral part of devices and of device driver development, we will explore how interrupts work internally. Along the way we will explain the path from the device generating an interrupt to the device driver being notified of it. We take the driver developer's point of view, which means we will not get bogged down in hardware details. So if you are a hardware person who knows how interrupt controllers are wired, please don't complain that we didn't discuss the differences between the 8259 and the 8259A-2, how to program OCW4, or what data is exchanged when the CPU asserts INTA the second time. It's not that we don't know (well, not everything, of course); it's just that for the purposes of this article we don't need to care about these things.
The Interrupt Descriptor Table
To understand interrupts, you first have to understand the Interrupt Descriptor Table (IDT). Simply put, the IDT is an array of function pointers. Each element of the array points either to an interrupt handler (called an Interrupt Service Routine in driver development) or to an exception handler. Here we only care about interrupts, so exceptions are ignored. The index into the IDT is the "interrupt vector", and the interrupt vector is a UCHAR value. Note that this limits the combined number of interrupt and exception handlers to 256. Each CPU has its own IDT, which can be viewed in WinDbg with the !kdex2x86.idt command. The following is an excerpt of that command's output on my test system.
1: kd> !kdex2x86.idt
IDT for processor #0
...
dd: 80ac2ac2 (nt!_KiUnexpectedInterrupt173)
de: 80ac2acc (nt!_KiUnexpectedInterrupt174)
df: 80ac2ad6 (nt!_KiUnexpectedInterrupt175)
e0: 80ac2ae0 (nt!_KiUnexpectedInterrupt176)
e1: 804e0084 (hal!HalpIpiHandler)
e2: 80ac2af4 (nt!_KiUnexpectedInterrupt178)
e3: 804dfdd8 (hal!HalpLocalApicErrorService)
e4: 80ac2b08 (nt!_KiUnexpectedInterrupt180)
...
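To make the "array of function pointers" idea concrete, here is a minimal user-mode sketch in C. It is only a model: real IDT entries are gate descriptors rather than bare pointers, and every name in it (idt_init, idt_connect, idt_dispatch, the handler signature) is hypothetical, not a Windows or HAL API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define IDT_ENTRIES 256            /* a UCHAR vector can index 256 slots */

typedef void (*handler_fn)(void);  /* hypothetical handler signature */

handler_fn idt[IDT_ENTRIES];
int unexpected_count;              /* calls that hit an unclaimed vector */
int device_count;                  /* calls to the sample device handler */

/* Default handler, in the spirit of the KiUnexpectedInterruptNNN entries. */
void unexpected_interrupt(void) { unexpected_count++; }

/* Sample handler standing in for a driver's ISR. */
void sample_device_isr(void) { device_count++; }

/* Fill every slot with the default handler. */
void idt_init(void)
{
    for (size_t i = 0; i < IDT_ENTRIES; i++)
        idt[i] = unexpected_interrupt;
}

/* Install a handler for one vector. */
void idt_connect(uint8_t vector, handler_fn fn)
{
    assert(fn != NULL);
    idt[vector] = fn;
}

/* "Take" an interrupt: the vector is simply the array index. */
void idt_dispatch(uint8_t vector)
{
    idt[vector]();
}
```

After idt_init() and idt_connect(0x42, sample_device_isr), dispatching vector 0x42 runs the sample handler, while dispatching an unclaimed vector such as 0xdd falls through to the default handler.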
When a device generates an interrupt, how does an interrupt vector end up indexing the CPU's IDT so that the corresponding interrupt service routine gets called? That is where the hardware interrupt controller comes in. Let's look at two Intel implementations of the "Programmable Interrupt Controller" (PIC): the traditional 8259 PIC and the more capable "Advanced PIC" (APIC). We will also explore the Windows interrupt handling mechanism, so that we understand how the PIC and APIC are used in the real world.
The 8259
The original IBM PC used Intel's 8259 PIC. An 8259 system supports only a single processor and provides only 8 interrupt request lines, IRQ0-IRQ7, so it can handle at most 8 interrupt sources. Later, a second 8259 was added to the system and cascaded to the master 8259 through three cascade lines (CAS0-CAS2). The INT line of the second 8259 is also connected to IRQ2 of the master, so the 8 new IRQs, minus the one used for the cascade, give a total of 15 IRQs. Figure 1 is a simplified schematic.
Figure 1
All modern motherboards either use two physical 8259 chips or use other chips that emulate the pair. OK, history class is over; let's dig into the details.
Each device that needs to interrupt is given a unique IRQ and connected to the 8259. To generate an interrupt on its IRQ, the device asserts the corresponding IRQ line on the bus.
The 8259 assigns a priority to each IRQ: the lower the IRQ number, the higher the priority. So IRQ0 is the highest and IRQ15 is the lowest, right? Well, no. Remember that the second 8259 is connected to IRQ2? The actual IRQ priority order is:
IRQ0, IRQ1, (now off to the second 8259) IRQ8-IRQ15, (now back to the first 8259) IRQ3-IRQ7
It is not obvious at first glance, but after a bit of head scratching it does make sense.
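The ordering above can be written down as a small ranking function. This is just a sketch of the default cascade order described in the text; the function names are made up for illustration.

```c
#include <assert.h>

/* Rank of an IRQ in the cascaded pair: 0 is served first, 14 last.
 * Returns -1 for IRQ2 (consumed by the cascade) and for out-of-range input. */
int irq_priority_rank(int irq)
{
    if (irq < 0 || irq > 15 || irq == 2)
        return -1;
    if (irq < 2)
        return irq;          /* IRQ0, IRQ1 keep ranks 0 and 1 */
    if (irq >= 8)
        return irq - 6;      /* IRQ8..IRQ15 take ranks 2..9 */
    return irq + 7;          /* IRQ3..IRQ7 take ranks 10..14 */
}

/* Nonzero if IRQ a is served ahead of IRQ b under the default ordering. */
int irq_outranks(int a, int b)
{
    int ra = irq_priority_rank(a);
    int rb = irq_priority_rank(b);
    assert(ra >= 0 && rb >= 0);
    return ra < rb;
}
```

With this model, irq_outranks(8, 3) is true even though 8 is the larger number, which is exactly the surprise the cascade introduces.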
Each IRQ is maskable, meaning it can be disabled by programming the 8259's "Interrupt Mask Register" (IMR). If an IRQ is masked, interrupts from the device on that IRQ are ignored. Further, higher-priority IRQs are serviced before lower-priority ones, and a higher-priority IRQ can interrupt a lower-priority one. So if IRQ0 is asserted while IRQ1 is being serviced, processing of the IRQ1 interrupt is suspended and the IRQ0 interrupt is delivered to the CPU.
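Masking is easy to picture as bit twiddling. The sketch below models the two mask registers in plain C rather than touching any hardware; it assumes the usual 8259 convention that a set IMR bit means "masked", and the type and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Model of the two mask registers. A set bit means "masked" (disabled). */
typedef struct {
    uint8_t master_imr;   /* IRQ0-IRQ7  */
    uint8_t slave_imr;    /* IRQ8-IRQ15 */
} pic_pair;

void pic_mask_irq(pic_pair *p, int irq)
{
    assert(irq >= 0 && irq <= 15);
    if (irq < 8)
        p->master_imr |= (uint8_t)(1u << irq);
    else
        p->slave_imr |= (uint8_t)(1u << (irq - 8));
}

void pic_unmask_irq(pic_pair *p, int irq)
{
    assert(irq >= 0 && irq <= 15);
    if (irq < 8)
        p->master_imr &= (uint8_t)~(1u << irq);
    else
        p->slave_imr &= (uint8_t)~(1u << (irq - 8));
}

/* Nonzero if a request on this IRQ can reach the CPU. */
int pic_irq_deliverable(const pic_pair *p, int irq)
{
    assert(irq >= 0 && irq <= 15);
    if (irq < 8)
        return !(p->master_imr & (1u << irq));
    /* Requests on the second 8259 must also pass the master's IRQ2 input. */
    return !(p->slave_imr & (1u << (irq - 8))) &&
           !(p->master_imr & (1u << 2));
}
```

Note the side effect of the cascade in this model: masking IRQ2 on the master silences every IRQ on the second 8259.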
It is important to note that although the hardware has priorities, Windows does not actually use them. Windows imposes its own priority scheme by directly manipulating the IMR of the 8259 (see the sidebar at the end of this article).
Now that we know how devices connect to the 8259 and how the 8259s connect to each other, how is the 8259 connected to the CPU?
An x86 CPU has two interrupt lines, LINT0 and LINT1. In the 8259 configuration, LINT1 is used for the "Non-Maskable Interrupt" (NMI), which is generated when a serious, potentially unrecoverable error is detected. It is called non-maskable because there is no way to mask it: apart from the processor itself, nothing can block it. A typical example of an NMI is a memory parity error.
LINT0 is used as the "Interrupt Input Line" (INTR) and is connected to the INT pin of the master 8259. The 8259 notifies the system of an interrupt by asserting INT. When the CPU acknowledges, the 8259 sends an 8-bit value to the CPU over the bus (a value the O/S programmed into the PIC). This 8-bit value is the interrupt vector for the corresponding IRQ. The vector is used as an index into the IDT to find the address of the interrupt service routine (ISR), and the CPU then jumps to the ISR to do whatever processing the interrupt requires.
Well, that doesn't look too bad. But what if more than 15 devices need to interrupt? Then they have to share, and this is where chained ISRs come in. In this case the OS calls the first ISR registered on the interrupt vector to notify it of the interrupt. Suppose the interrupt is level-triggered (such as a line-based interrupt on the PCI bus). The OS calls each ISR associated with the vector in turn until one returns TRUE, which means the interrupt belonged to that ISR's device. Shared interrupts have their problems, because a badly written ISR can hang the system. In fact, even well-written ISRs can hang the system. For an explanation of this problem, see the article at
Http://www.microsoft.com/hwdev/platform/Proc/apic.asp. Moreover, the 8259 cannot be used in multiprocessor systems, which makes it badly dated now that even desktop machines come with two CPUs. So we use the APIC instead.
The APIC
What is usually called the APIC is actually made up of two parts: the "LAPIC" and the "IOAPIC". Each (logical) CPU in the system generally has one LAPIC on chip. So if the system has four CPUs, there are four LAPICs (note that because every logical processor has a LAPIC, two hyper-threaded CPUs also mean four LAPICs). The IOAPIC is part of the Intel chipset, and Pentium IV and later systems can have any number of IOAPICs. Systems with CPUs before the Pentium IV are limited to 8. An IOAPIC can be designed to support up to 64 interrupt input lines (INTINs), but the IOAPICs in most standard systems have 24 INTINs. INTINs play the same role as the IRQs of the 8259. In other words, each device that generates interrupts gets an INTIN. As you can see, an APIC configuration has enough INTINs to reduce the sharing problems of the 8259. OK, with that out of the way, let's take a closer look at the two parts of the APIC.
LAPIC
As mentioned earlier, the LAPIC generally lives on the CPU itself, and CPUs since the Pentium have one. Other vendors may choose not to implement the APIC as part of the CPU. Either way, remember the two lines LINT0 and LINT1? In the APIC configuration these lines are connected to the LAPIC, and the LAPIC is connected to the system bus (Intel CPUs before the Pentium IV actually had a separate APIC bus, but we will ignore that hardware detail). Every CPU in the system is connected this way, so messages such as "Interprocessor Interrupts" (IPIs) can be delivered from one LAPIC to another, as shown in Figure 2.
Figure 2 - The LAPIC and IOAPIC
One important LAPIC register deserves attention: the "Task Priority Register" (TPR). The operating system sets the priority at which a CPU is running by writing a value to its TPR. These priorities range from 0 to 15, with 0 the lowest. Each INTIN interrupt is assigned a vector from 16 to 255, and when the interrupt is sent from the IOAPIC to the LAPIC, its priority is calculated with the following formula:
Priority = FLOOR (Vector / 16)
That is, the priority is the upper four bits of the vector. The x86 architecture predefines vectors 0 through 31, so the OS assigns device interrupts vectors above 31. When an interrupt is requested, if its priority is less than or equal to the current TPR value of the target processor's LAPIC, it does not take effect on that CPU. In this way the OS controls interrupt priority by choosing which IDT vectors it assigns to interrupt handlers and by writing the corresponding value to the LAPIC TPR while those handlers run. The higher the vector the OS assigns, the higher the priority of the interrupt.
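The priority arithmetic is small enough to try out. The sketch below assumes the priority is the integer quotient of the vector divided by 16 (its upper four bits), which is how Intel's architecture manuals define the priority class, and it applies the "less than or equal to the TPR is held off" rule from the text. The function names are ours, not a hardware or Windows interface.

```c
#include <assert.h>
#include <stdint.h>

/* Priority of a vector: the upper four bits, i.e. vector / 16. */
unsigned priority_class(uint8_t vector)
{
    return (unsigned)vector >> 4;
}

/* Nonzero if the LAPIC would let this vector through at the given
 * task priority (0-15). Equal or lower priorities are held off. */
int lapic_accepts(uint8_t vector, unsigned task_priority)
{
    assert(task_priority <= 15);
    return priority_class(vector) > task_priority;
}
```

For example, vector 0x42 has priority 4, so it gets through when the task priority is 3 but not when it is 4, while the IPI vector 0xe1 from the earlier IDT dump has priority 14.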
IOAPIC
To keep this article relatively simple, we only discuss systems with a single 24-INTIN IOAPIC. The IOAPIC is also attached to the system bus, but not directly. It is actually connected to a bridge (also part of the Intel chipset), and that bridge is connected to the system bus (see Figure 3). Just as with the 8259, every device in the system that needs to interrupt is connected to an INTIN line on the bus. The difference is that INTIN lines carry no implicit priority. Remember, with the APIC, priority is handled through the interrupt vector number and the TPR in the LAPIC. The IOAPIC contains a set of registers called the I/O Redirection Table (IOREDTBL). Every INTIN has a 64-bit IOREDTBL entry that describes the corresponding interrupt: its processor affinity, whether it is edge or level triggered, its polarity, whether it is masked, and the vector associated with it. Of course, the OS is responsible for filling this table with values that match the IDT and the characteristics of the devices connected to the INTINs.
The IOREDTBL has some interesting quirks; for example, the bit mask that describes each interrupt's CPU affinity is only 8 bits wide. Ugh! We won't worry about such complications here, and limit the discussion to systems with 8 or fewer CPUs.
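For the curious, here is a sketch of packing and unpacking the fields of a redirection entry that the text mentions. The bit positions are an assumption on our part, taken from Intel's 82093AA I/O APIC datasheet (vector in bits 0-7, polarity in bit 13, trigger mode in bit 15, mask in bit 16, destination in bits 56-63); the other fields are left out, and the type and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint8_t vector;          /* bits 0-7: IDT vector to deliver        */
    int     active_low;      /* bit 13: polarity, 1 = active low       */
    int     level_triggered; /* bit 15: trigger mode, 1 = level        */
    int     masked;          /* bit 16: 1 = interrupt masked           */
    uint8_t destination;     /* bits 56-63: 8-bit destination field    */
} redir_entry;

/* Unpack the fields we care about from a raw 64-bit entry. */
redir_entry redir_decode(uint64_t raw)
{
    redir_entry e;
    e.vector          = (uint8_t)(raw & 0xFF);
    e.active_low      = (int)((raw >> 13) & 1);
    e.level_triggered = (int)((raw >> 15) & 1);
    e.masked          = (int)((raw >> 16) & 1);
    e.destination     = (uint8_t)(raw >> 56);
    return e;
}

/* Pack the same fields back into a raw 64-bit entry. */
uint64_t redir_encode(redir_entry e)
{
    return (uint64_t)e.vector
         | ((uint64_t)(e.active_low != 0) << 13)
         | ((uint64_t)(e.level_triggered != 0) << 15)
         | ((uint64_t)(e.masked != 0) << 16)
         | ((uint64_t)e.destination << 56);
}

/* Treating the destination as a one-bit-per-CPU mask, as the text does,
 * only CPUs 0-7 can be named. */
int redir_targets_cpu(redir_entry e, unsigned cpu)
{
    assert(cpu < 8);
    return (e.destination >> cpu) & 1;
}
```

The assert in redir_targets_cpu is the 8-CPU limit in miniature: with only eight destination bits there is simply no way to name a ninth processor in this form.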
Aside: PCI to IOAPIC
Before we discuss how interrupts are handled, we should look at how the PCI bus is connected to the IOAPIC. That topic deserves an article of its own, but all you need to know is that each PCI interrupt request line (PIRQx) is connected to an IOAPIC INTIN line. The connection can be hard wired, or it can be set up dynamically by the BIOS.
Handling APIC-Based Interrupts in Windows
The details covered so far apply to any operating system that uses a PIC or an APIC. Every OS will have an IDT, will need to program the PIC to reflect that IDT, and must have ISRs to service the interrupts. Because this is The NT Insider, this article would not be complete unless we tied all of this to how Windows actually handles interrupts on the x86. Since the 8259 is on its way out, we only discuss the APIC.
OK, so what happens when a device raises an interrupt? Suppose a level-triggered device is connected to INTIN16 and the OS has assigned it vector 0x42. When the device interrupts, it asserts INTIN16, and then ...
The assertion of INTIN16 arrives at the IOAPIC. If the IOAPIC sees that the interrupt is not masked, it generates a message on the system bus through the bridge chipset.
In our simplified scenario, an idle CPU sees the interrupt and jumps to the interrupt handler pointed to by index 0x42 of the IDT. Remember, unlike with the 8259, there is no need to go back to the IOAPIC to ask for the vector; the vector is part of the message sent over the system bus.
Because the CPU knows it is about to handle an interrupt, it pushes the EIP, ESP, SS, and CS registers onto the stack before calling the generic Windows interrupt dispatcher pointed to by IDT[0x42].
The generic Windows interrupt dispatcher pushes the EBP, EAX, EBX, ECX, EDX, EDI, ESI, ES, DS, FS, and GS registers. To verify these steps, we set a breakpoint in a function that the generic dispatcher calls after it has pushed all the required registers. It turns out that KiChainedDispatch is called for IDT entries with chained ISRs and KiInterruptDispatch for non-chained ones (these function names were obtained with a rather unsophisticated "disassembler": set a breakpoint in an ISR and you can see them on the stack). Part of the stack follows:
nt!KiInterruptDispatch+0x89 (FPO: [0,2] TrapFrame @ f42f3c30)
hal!KfLowerIrql+0x35 (FPO: [0,0,0])
nt!KeSetPriorityThread+0xc2 (FPO: [Non-Fpo])
nt!PspExitThread+0x9c (FPO: [Non-Fpo])
So what do we have here? We can see that the generic Windows interrupt handler called KiInterruptDispatch, but what is the rest of the stuff on the stack? Remember, interrupts can happen at any time, and this one arrived while a thread was exiting. We have not yet discussed what happens to the code that was running on the CPU when the interrupt arrived, and now is the time. When the interrupt is delivered, a "trap frame" is created that saves the state the CPU was in before the interrupt (cue the reader: "Aha! So that's why all those registers get pushed before my ISR is called!"). In WinDbg, the ".trap" command lets us set our context to that of the trap frame we supply. Let's use the value shown next to TrapFrame for KiInterruptDispatch.
1: kd> .trap f42f3c30
ErrCode = 00000000
eax=00000000 ebx=00000000 ecx=00000000 edx=ffdff538 esi=81e32da8 edi=00000009
eip=804e049d esp=f42f3ca4 ebp=f42f3cb4 iopl=0 nv up ei pl zr na po nc
cs=0008 ss=0010 ds=01ff es=01ff fs=077a gs=7f30 efl=00000246
hal!KfLowerIrql+0x35:
804e049d 3bc8 cmp ecx,eax
1: kd> kb
*** Stack trace for last set context - .thread/.cxr resets it
ChildEBP RetAddr Args to Child
f42f3ca0 80a35ad8 8214c228 81e32da8 00000000 hal!KfLowerIrql+0x35
f42f3cb4 80bb295a 00e32da8 00000010 81e32da8 nt!KeSetPriorityThread+0xc2
f42f3d40 80bb3360 00000000 00000000 f6a79680 nt!PspExitThread+0x9c
Our context is now that of the code that was interrupted! So it follows that if we restore the CPU's context from this trap frame when we return from the interrupt, the interrupted code will never know an interrupt happened.
Once the required registers have been saved and we are in one of the KiXxx routines, Windows obtains the priority of the interrupt. In Windows this is called the interrupt's "Interrupt Request Level" (IRQL, pronounced "er-kwel"). The HAL decides which IRQL a device gets when the device is assigned its resources. Note that this has nothing to do with the physical INTIN line used to generate the interrupt, and it cannot be changed. The Device IRQL (DIRQL) corresponds to the interrupt priority of the device (a higher IRQL means a higher interrupt priority), and therefore to the value written to the LAPIC TPR.
What Windows does next is raise the current CPU's IRQL (IRQL is a per-CPU concept) to the DIRQL assigned by the HAL. After that, the CPU can only be interrupted by interrupts at a higher IRQL.
Now that we are at DIRQL, we are guaranteed that the same interrupt will not arrive again on the same CPU, so execution of the ISR is synchronized on the current CPU. Windows must now acquire a lock to ensure that execution of the ISR is also synchronized with the other CPUs in the system. Because an INTIN can be shared, there may be several ISRs associated with this interrupt; each ISR's own lock must be acquired before the ISR runs and released after it finishes. Once Windows has the lock, the ISR can run.
For interrupts in Windows, each IDT vector also has a linked list of PKINTERRUPT objects. Because this structure is only semi-documented, not all of its fields are meant to be accessed, but it can be dumped with the debugger's "dt" command:
1: kd> dt nt!_KINTERRUPT
+0x000 Type : Int2B
+0x002 Size : Int2B
+0x004 InterruptListEntry : _LIST_ENTRY
+0x00c ServiceRoutine : Ptr32
+0x010 ServiceContext : Ptr32 Void
+0x014 SpinLock : Uint4B
+0x018 TickCount : Uint4B
+0x01c ActualLock : Ptr32 Uint4B
+0x020 DispatchAddress : Ptr32
+0x024 Vector : Uint4B
+0x028 Irql : UChar
+0x029 SynchronizeIrql : UChar
+0x02a FloatingSave : UChar
+0x02b Connected : UChar
+0x02c Number : Char
+0x02d ShareVector : UChar
+0x030 Mode : _KINTERRUPT_MODE
+0x034 ServiceCount : Uint4B
+0x038 DispatchCount : Uint4B
+0x03c DispatchCode : [106] Uint4B
From the structure definition you can see that the objects are actually linked through the InterruptListEntry field. You can also see the vector the ISR is attached to, the corresponding DIRQL, and the spin lock that must be acquired before the ISR runs. Because we acquired this spin lock in the previous step, Windows starts with the first PKINTERRUPT and calls its ISR (the structure's ServiceRoutine field).
The ISR checks its hardware to see whether it is the one interrupting; let's assume it is. The ISR tells the device to stop interrupting, probably queues a DpcForIsr, and returns TRUE to indicate that the interrupt has been handled. Remember, our interrupt is level-triggered, so the process can stop here because we have found the ISR that handles this interrupt. If the ISR returns FALSE, Windows assumes that several devices are sharing the INTIN, releases the spin lock, and moves on to the next PKINTERRUPT in the list. It then acquires that object's lock and calls its ISR, repeating the process until one returns TRUE.
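The walk over the chain just described can be sketched in a few lines of ordinary C. This is a user-mode toy, not the actual KiChainedDispatch code: the interrupt_object type, its fields, the integer "lock", and the sample ISR are all hypothetical stand-ins for the real structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef bool (*service_routine)(void *context);

/* Hypothetical stand-in for one interrupt object on a vector's list. */
typedef struct interrupt_object {
    struct interrupt_object *next;      /* plays the role of InterruptListEntry */
    service_routine          isr;       /* plays the role of ServiceRoutine */
    void                    *context;   /* plays the role of ServiceContext */
    int                      lock_held; /* toy lock: 1 while the ISR runs */
} interrupt_object;

/* Walk the chain for a level-triggered interrupt: take each object's lock,
 * call its ISR, drop the lock, and stop at the first ISR that returns true.
 * Returns the claiming object, or NULL if nobody claimed the interrupt. */
interrupt_object *chained_dispatch(interrupt_object *head)
{
    for (interrupt_object *io = head; io != NULL; io = io->next) {
        assert(!io->lock_held);
        io->lock_held = 1;
        bool claimed = io->isr(io->context);
        io->lock_held = 0;
        if (claimed)
            return io;
    }
    return NULL;
}

/* Sample ISR: the context points at a flag meaning "my device is asserting". */
bool sample_isr(void *context)
{
    int *device_interrupting = context;
    if (!*device_interrupting)
        return false;            /* not our device */
    *device_interrupting = 0;    /* tell the device to stop interrupting */
    return true;
}
```

With two objects chained on one vector and only the second device asserting, chained_dispatch calls the first ISR, gets false, moves on, and returns the second object once its ISR has quieted the device.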
After the ISR returns, Windows does the following:
. Releases the ISR's lock.
. Restores the CPU's IRQL to what it was before the interrupt occurred.
. Restores the previously saved registers and issues an "Interrupt Return" (IRET). IRET is used here because the CPU needs to know that we are returning from an ISR, so that it restores the EIP, ESP, SS, and CS registers (that is, the "trap frame" discussed earlier).
Done! We have just traced interrupt processing all the way from the device asserting its IOAPIC INTIN to the ISR of the corresponding driver.
Conclusion
There is still plenty that this article has not covered, but I figured I would have to write the NT Insider Bathroom Bible before I could explain it all in detail. If your curiosity is still burning at this point, I strongly recommend getting hold of the Intel Architecture Manuals and poking around with your favorite debugger. You never know what you will find! (Translator's note: this reminds me of "Life is like a box of chocolates ..." It seems readers are meant to be Forrest Gump!)
------------------------------------------
Wait! Doesn't IRQ == Interrupt Priority?
Not under Windows, whatever the hardware documentation may say. Depending on its configuration, a system uses either a Programmable Interrupt Controller (PIC) or an Advanced Programmable Interrupt Controller (APIC). If you have read the accompanying article, you know that the APIC's INTIN lines carry no implicit priority, while the PIC's IRQ lines do. This may lead some people to think that IRQ (that is, PIC priority) is directly related to IRQL (interrupt priority). That is wrong. The misconception is very common, so forget it right now! Where, how, and why a device is connected to the PIC has absolutely nothing to do with the priority of that device's interrupts under Windows.
For example, assume the system uses a PIC. Device X is connected to IRQ3 and device Y is connected to IRQ7. If Windows used the PIC's priorities, X's interrupt would always take precedence over Y's because X's IRQ priority (its hardware interrupt priority) is higher. However, Windows pays no attention to the PIC's hardware interrupt priorities. It is therefore perfectly normal for Y's interrupt to end up with a higher priority than X's. On Windows, the IRQ of a device never tells you how urgent its interrupts are. This is true whether a PIC or an APIC is used. Can you change it? No, you really can't. That is just how Windows does it.