Asynchronous Calls
Jack
Hong Kong University of Science and Technology
Introduction
In general, a simple asynchronous call works as follows: the initiator issues the request, notifies the executor, and then goes on with other work, later waiting for the executor at a synchronization point; the executor performs the actual operation and notifies the initiator upon completion. Two roles are therefore involved in an asynchronous call: the initiator and the executor. Both are entities that can run on their own, which we will call active objects, and they coordinate with each other at synchronization points. In this article we discuss asynchronous calls mainly in the context of general-purpose computers running multi-process, multithreaded, time-sharing operating systems. From the operating system's point of view, active objects include processes, threads, and hardware; an interrupt can be regarded as borrowing the CPU in the context of some process or thread. Synchronization can be accomplished with the primitives the operating system provides: mutexes, semaphores, and so on.
Let us first look at asynchronous file reads and writes in Windows (throughout this article, unless otherwise noted, "Windows" refers to Windows NT/2000). ReadFile and WriteFile provide asynchronous interfaces. Take ReadFile as an example:
BOOL ReadFile(HANDLE hFile, LPVOID lpBuffer, DWORD nNumberOfBytesToRead, LPDWORD lpNumberOfBytesRead, LPOVERLAPPED lpOverlapped);
If the last parameter, lpOverlapped, is not NULL and the file was opened with the FILE_FLAG_OVERLAPPED flag, the call is asynchronous: ReadFile returns immediately even if the operation has not completed (it returns FALSE and GetLastError() returns ERROR_IO_PENDING). The caller can later wait for the operation (which may already have completed) with a wait function such as WaitForSingleObject, and once it is complete, call GetOverlappedResult to obtain the outcome: whether it succeeded, how many bytes were transferred, and so on. Here the initiator is the application and the executor is the operating system itself; how the executor actually carries out the work is discussed later. Synchronization between the two is done through a Windows event.
Abstracting and extending this asynchronous process, we can break an asynchronous call down into two problems: one is the driving force of execution, the other is the scheduling of active objects. Simply put, the former is how each active object (a thread, a process, or some piece of code) obtains the CPU; the latter is how active objects cooperate to guarantee that the overall operation proceeds correctly. In general, processes and threads obtain the CPU by being scheduled directly by the operating system, while finer granularities, such as scheduling individual pieces of code, often require a more elaborate model (for example inside an operating system implementation, where thread granularity is too coarse). When few participants are involved, the scheduling of active objects can be accomplished with the basic synchronization primitives; in more complex cases it may be more practical to go through a dedicated scheduler.
Driving force and scheduling
As mentioned above, an asynchronous call mainly needs to solve two problems: the driving force of execution and scheduling. The most common arrangement is a primary (caller) process or thread plus one or more worker processes or threads, completing the asynchronous call through the synchronization mechanisms the operating system provides. In its most general form this synchronization mechanism is one or more barriers, each corresponding to a synchronization point: every active object that must synchronize at that point waits at the corresponding barrier until all of them arrive. In some simplified situations, for example when the workers do not care about synchronizing with the caller, the barrier degenerates into a semaphore; with only one worker it degenerates into a Windows event or a condition variable. Now consider a more complicated situation. Suppose several threads collaborate to complete a job, with ordering constraints among them, and the operating system acts as the scheduler of this job, responsible for letting the appropriate thread obtain the CPU at the appropriate time. Clearly, among concurrently executing threads, one thread is inherently asynchronous with respect to another; if there is a call relationship between them, it is an asynchronous call. The operating system can use the basic synchronization mechanisms to let the appropriate thread be scheduled while unfinished threads wait. For example, suppose four threads A, B, C, D complete a job with the ordering constraints A > B; C > D, where ">" means the thread on the left must run before the one on the right, and ";" means the two sides may run simultaneously. Assume further that one operation of B must call C to be completed; that operation is clearly an asynchronous call.
We can place a synchronization point at each ">" and implement it with a semaphore: thread B waits on the first semaphore (released by A), and thread D waits on the second (released by C). In this example both the driving force and the scheduling are provided by the operating system's basic mechanisms (thread scheduling and synchronization primitives).
Abstracting this process, we can describe it as: several active objects (including pieces of code) coordinated by a scheduler to complete a job; in practice this "scheduler" may be nothing more than a set of scheduling rules. A process or thread obtains the CPU simply by being scheduled, so we mainly consider how a piece of code (such as a function) gets executed. Using a worker thread to call the function is the intuitive and universal solution, and this method is indeed very common in user space (user mode). In kernel mode, the CPU can also be obtained through an interrupt, which can be arranged by registering an IDT entry and triggering a software interrupt. The ICs on hardware devices are another source of driving force. For the scheduling of active objects, the most basic tools are the synchronization mechanisms mentioned earlier; another common mechanism is the callback function. Note that a callback generally runs in a different context from the initiator, for example in a different thread of the same process, and this difference imposes some limits. If the callback must occur in the caller's process (thread) context, mechanisms such as signals under UNIX or APCs under Windows are needed, which we will explain later. So what does a callback function do? Most commonly, combined with the synchronization mechanisms, it releases a mutex, a semaphore, or a Windows event (or signals a condition variable), so that other objects waiting at the synchronization point can be scheduled and resume execution. This can be seen as notifying the scheduler (the operating system) that certain active objects waiting on synchronization may be rescheduled, thereby re-entering the scheduler. With a different kind of scheduler, however, synchronization objects may not be needed in this process at all; in some extreme cases the scheduler may even require a strict order by itself.
In practical applications, depending on the constraints of the environment, implementations of the driving force and scheduling of asynchronous calls can differ considerably. We illustrate with the following examples.
Asynchrony inside the operating system: asynchronous I/O
Windows NT/2000 is a preemptive, time-sharing operating system. Its scheduling unit is the thread, and its I/O architecture is completely asynchronous; that is, synchronous I/O is actually implemented on top of asynchronous I/O. When a user thread requests I/O, a transition from user mode to kernel mode occurs (the operating system maps the kernel into the 2 GB-4 GB address range of every process, identically for each process). This transition is carried out through an interrupt by certain exported system services; for example, ReadFile actually executes NtReadFile (ZwReadFile). Note that the running context is still the current thread. NtReadFile is implemented on the asynchronous I/O framework of the Windows kernel, with the assistance of the I/O manager. It should be pointed out that the I/O manager is just an abstraction made up of a number of APIs; there is no actual I/O manager thread running.
Windows I/O drivers are layered in a stack. Each driver provides a consistent interface for initialization, cleanup, and functional calls. Calls between drivers are made with I/O Request Packets (IRPs) instead of the stack used for ordinary function calls. The operating system and the PnP manager initialize and clean up the appropriate drivers at the appropriate times according to the registry. For a functional call, the IRP specifies a function number and the corresponding context or parameters (the I/O stack location). A driver may call another driver, either synchronously (the thread context does not change) or asynchronously. NtReadFile is implemented by sending one or more IRPs to the top of the driver stack and then either waiting on a corresponding completion event (the synchronous case) or returning directly (the overlapped case); this part executes in the initiating thread. When a driver processes an IRP it may complete it immediately, or it may complete it in an interrupt: for example, it sends a request to the hardware device (usually by writing an I/O port), and when the device finishes it raises an interrupt, and the result of the operation is obtained in the interrupt handler. Windows has two kinds of interrupts, hardware device interrupts and software interrupts, divided into several priority levels (IRQLs). There are two main software interrupts, the DPC (Deferred Procedure Call) and the APC (Asynchronous Procedure Call), both at lower priorities. A driver can register an ISR (Interrupt Service Routine) for a hardware interrupt, usually by modifying an entry of the IDT; similarly, the operating system registers the appropriate interrupt handlers for DPCs and APCs (also in the IDT). It is worth noting that DPCs are per-processor, each processor having its own DPC queue, while APCs are per-thread.
Each thread has its own APC queues (actually a kernel APC queue and a user APC queue, with different delivery policies). As you might imagine, the APC is not an interrupt in the strict sense, because a true interrupt may happen in any thread context; it is called an interrupt mainly because of the IRQL elevation (from PASSIVE_LEVEL to APC_LEVEL), and APC delivery generally takes place at moments such as thread switches. When an interrupt occurs, the operating system calls the interrupt handling routine. For a hardware device's ISR, the usual processing is to mask the device interrupt, queue a DPC request, and return; not much CPU time should be spent in a device's interrupt handler, mainly to avoid losing other interrupts. Since the IRQL of a hardware device interrupt is higher than that of the DPC interrupt, the DPC queued in the ISR is held back until the ISR returns and the IRQL drops to a lower level; the DPC interrupt then fires, and the DPC performs operations such as reading data from the hardware device, refreshing the request, and re-enabling the device interrupt. An ISR or DPC may run in the context of whichever thread happened to be interrupted; to that innocent thread the interruption is invisible, and the system can be considered to have borrowed a slice of its time.
In general, there are two main sources of driving force in the asynchronous I/O architecture of Windows. One is the thread that initiates the request, in whose context some kernel code executes; the other is the ISR and DPC, which execute in the context of whatever thread was interrupted. Scheduling commonly uses callbacks and events: for example, when sending an IRP to the next-lower driver, one can specify a completion routine, which is called when the lower driver completes the request, and often this routine simply signals an event. Linux deserves a mention here as well. Linux 2.6 has a similar interrupt mechanism, with software interrupts of several priorities, the softirqs; analogous to the DPC, Linux also provides a special software interrupt, the tasklet. Linux does not have a layered driver architecture like Windows, so its asynchronous I/O is somewhat rougher: at what used to be blocking points, the operation now returns -EIOCBRETRY and lets the caller retry later. In this method the entire operation is performed by one function: each time progress is possible, the function is executed again from the beginning, and the parts already completed no longer perform actual I/O. The biggest benefit is that existing file systems and drivers do not have to be rewritten; wherever a synchronous call used to block, only small changes to the system are needed. To provide POSIX AIO semantics on top of this, some user threads may be needed to perform the retries (recall that Windows can do this with interrupts and DPCs). Solaris follows a similar approach: if the device supports asynchronous I/O, the work is completed through interrupts; otherwise internal LWPs are used to simulate it.
Application: Design of an asynchronous HTTP server
Suppose we want to design an HTTP server whose goals include high concurrency, a lean implementation (partial support for HTTP/1.1), and a plug-in architecture. This set of requirements arises in many situations. Overall, an HTTP server can be compared to a multithreaded operating system: the OS schedules, the worker threads execute, and the worker threads provide the service (that is, process HTTP requests). On this basis, the main consideration is scheduling granularity: if the granularity is too large, concurrency decreases; if it is too small, efficiency suffers because of task switching (consider the cost of context switching). This is a trade-off. Similar to Apache (and other HTTP servers), we can divide the processing of an HTTP request into several states and, based on these states, construct a state machine for HTTP processing. We can then use the processing of each state as the scheduling granularity. One scheduling pass works as follows: a worker thread removes an HTTP_CONTEXT structure from the global task queue, performs the processing appropriate to the current state, sets the next state according to the state machine, and puts the context back into the global task queue. In this way the HTTP states, driven by this scheduling policy, make up a complete HTTP processing pass. Clearly, the call from one state to the processing of the next state can be considered asynchronous. The design of an HTTP state machine is shown below.
Figure 1. HTTP state machine
The worker thread's function really consists of two operations: take an HTTP_CONTEXT from the state queue, call the HTTP_CONTEXT's Service() function, and repeat. On this architecture it is easy to introduce asynchronous I/O and the plug-in mechanism. In fact, we can also use an event mechanism (such as select/poll) plus a user thread to simulate asynchronous I/O. For both asynchronous I/O and plug-ins we use a retry scheme similar to the AIO in Linux 2.6, together with a callback invoked on completion. In a state, if an I/O operation (recv or send) is required, an asynchronous I/O request is issued (either asynchronous I/O provided by the operating system, or asynchronous I/O simulated by the user thread); at this point the corresponding HTTP_CONTEXT is not returned to the state queue, and it is the I/O callback that puts it back into the state queue so that it gets a chance to be rescheduled. When the HTTP_CONTEXT is rescheduled it checks the I/O state (which can be done with some flag bits); if the I/O has completed, processing continues and the next state is set; otherwise a new I/O request can be issued and the wait repeated. Plug-ins can use a similar scheme: for example, if a plug-in has to communicate with an external server, the HTTP_CONTEXT can be returned to the state queue when that communication completes. Clearly, plug-ins have a many-to-many relationship with the HTTP states: a plug-in can register itself at the several states it cares about, and some shortcut paths can also be set up to increase processing efficiency.
Conclusion
In general, the design and application of asynchronous calls is a problem of managing multiple active objects: how to provide the driving force of execution, and how to guarantee the logical order of execution. The main considerations are the granularity of the active objects and the execution method, using synchronization or callbacks to establish the scheduling order, or using approximate scheduling plus some robust error-handling mechanisms to guarantee correct semantics. As an example of the latter, with an event-based socket the notification of a readable event can be redundant, occurring more often than data actually arrives. If a non-blocking socket is used, some read() (or recv()) calls will then simply return EWOULDBLOCK, and as long as the program handles this case (by using a non-blocking socket instead of a blocking one), the occasional spurious notification is acceptable. In this situation the readiness report is only an approximation.