Daniel Robbins, President/CEO, Gentoo Technologies, Inc., July 2000
POSIX (Portable Operating System Interface) threads are a powerful way to improve the responsiveness and performance of your code. In this series, Daniel Robbins shows you exactly how to use threads in your programs. Many behind-the-scenes details are covered along the way, so by the end of the series you will be ready to write your own multithreaded programs with POSIX threads.
Threads are fun
Knowing how to use threads properly should be part of every good programmer's repertoire. Threads are similar to processes. Like processes, threads are time-sliced by the kernel. On a uniprocessor system, the kernel uses time slicing to simulate simultaneous execution of threads, just as it does with processes. On a multiprocessor system, threads can actually run simultaneously, just as multiple processes can.
So why is multithreading preferable to multiple independent processes for most cooperative tasks? Because threads share the same memory space. Different threads can access the same variable in memory, so every thread in your program can read or write a global variable. If you have ever written non-trivial code with fork(), you will recognize how valuable this is. Why? Although fork() lets you create multiple processes, it also raises a communication problem: how do you get multiple processes to talk to each other when each has its own independent memory space? There is no simple answer. There are many kinds of local IPC (inter-process communication), but they all suffer from two important drawbacks:
First, they impose some form of additional kernel overhead, which lowers performance. Second, in almost all situations IPC is not a "natural" extension of your code, and it often dramatically increases the complexity of your program.
A double whammy: neither overhead nor complexity is a good thing. If you have ever had to make massive changes to a program so that it supports IPC, you will really appreciate the simple memory-sharing approach that threads provide. Because all threads live in the same memory space, POSIX threads do not need to make expensive and complicated long-distance calls. With a little synchronization, all the threads in your program can read and modify its existing data structures. You do not have to pump data through a file descriptor or squeeze it into a tight shared memory segment. For this reason alone, you should consider the one-process/multithread model rather than the multiprocess/single-thread model.
Threads are nimble
There is more. Threads also happen to be extremely nimble. Compared to a standard fork(), they carry far less overhead. The kernel does not need to make a new independent copy of the process's memory space, file descriptors, and so on. This saves a lot of CPU time, making thread creation ten to a hundred times faster than creating a new process. Because of this, you can use a whole bunch of threads without worrying too much about CPU or memory overhead. You do not take the big CPU hit that fork() incurs. This means you can generally create a thread whenever it makes sense in your program.
Of course, just like processes, threads will take advantage of multiple CPUs. This is a really great feature if your software is designed to run on a multiprocessor machine (and if the software is open source, it will probably end up running on quite a few of them). The performance of certain kinds of threaded programs, CPU-intensive ones in particular, scales almost linearly with the number of processors in the system. If you are writing a very CPU-intensive program, you will definitely want to find ways to use multiple threads in your code. Once you are adept at writing threaded code, you will be able to approach coding challenges in new and creative ways, without cumbersome IPC and other complex communication mechanisms. All of these benefits work together to make multithreaded programming more fun, fast, and flexible.
Threads are portable
If you are familiar with Linux programming, you may know about the __clone() system call. __clone() is similar to fork(), but it also has many thread-like properties. For example, with __clone() the new child process can selectively share parts of its parent's execution environment (memory space, file descriptors, and so on). That is the good part. But __clone() also has a drawback. As the __clone() man page states: "The __clone call is Linux-specific and should not be used in programs intended to be portable. For programming threaded applications (multiple threads of control in the same memory space), it is better to use a library implementing the POSIX 1003.1c thread API, such as the LinuxThreads library. See pthread_create(3thr)."
So although __clone() offers many of the benefits of threads, it is not portable. That does not mean you cannot use it in your code, but you should weigh this fact when considering __clone() for your software. Fortunately, as the __clone() man page points out, there is a better alternative: POSIX threads. If you want to write portable multithreaded code that runs on Solaris, FreeBSD, Linux, and other platforms, POSIX threads are the way to go.
Your first thread
Here is a simple example program that uses POSIX threads:
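thread1.c
    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    /* Runs in the new thread: prints a greeting once a second, 20 times. */
    void *thread_function(void *arg) {
        int i;
        for (i = 0; i < 20; i++) {
            printf("Thread says hi!\n");
            sleep(1);
        }
        return NULL;
    }

    int main(void) {
        pthread_t mythread;

        /* pthread_create() returns non-zero on failure. */
        if (pthread_create(&mythread, NULL, thread_function, NULL)) {
            printf("error creating thread.");
            abort();
        }
        /* Wait for the new thread to finish. */
        if (pthread_join(mythread, NULL)) {
            printf("error joining thread.");
            abort();
        }
        exit(0);
    }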
To compile this program, save it as thread1.c and type:
$ gcc thread1.c -o thread1 -lpthread
To run it, type:
$ ./thread1
Understanding thread1.c
thread1.c is an extremely simple threaded program. Although it does not do anything useful, it will help you understand how threads work at run time. Let's walk through the program step by step. main() declares a variable called mythread, which has the type pthread_t. The pthread_t type is defined in pthread.h and is often called a "thread id" (commonly abbreviated as "tid"). Think of it as a kind of thread handle.
After mythread is declared (remember that mythread is just a "tid", a handle to the thread we are about to create), we call the pthread_create function to create a real, living thread. Do not be confused by the fact that pthread_create() sits inside an "if" statement. Because pthread_create() returns zero on success and a non-zero value on failure, placing the call inside an if() is simply a convenient way to detect a failed call. Now look at the arguments to pthread_create. The first, &mythread, is a pointer to mythread. The second argument, currently NULL, can be used to define certain attributes for the thread. Because the default thread attributes work for us, we simply set it to NULL. The third argument is the name of the function that the new thread will execute when it starts. In this example, the function is thread_function(). When thread_function() returns, the new thread terminates. In this example, the thread function does not do much: it prints "Thread says hi!" 20 times and then exits. Note that thread_function() accepts a void * as an argument and also returns a void *. This means that a void * can be used to pass an arbitrary piece of data to the new thread, and that the new thread can return an arbitrary piece of data when it finishes. So how do you pass an arbitrary argument to the thread? Easy: use the fourth argument to pthread_create(). In this example, because there is no need to pass any data to our trivial thread_function(), the fourth argument is set to NULL.
As you may have guessed, after pthread_create() returns successfully, the program contains two threads. Wait a minute, two threads? Didn't we create only one? Yes, we created only one thread. But the main program is itself a thread. Think of it this way: a program that does not use POSIX threads at all is single-threaded (that single thread is called the "main" thread). After creating one new thread, there are two.
You probably have at least two important questions at this point. The first: what does the main thread do after the new thread is created? Answer: the main thread simply continues with the next line of the program (in this example, the "if (pthread_join(...))" statement). The second: what happens to the new thread when it finishes? Answer: the new thread comes to a halt and then, as part of its cleanup, waits to be merged with, or "joined" by, another thread.
Now let's look at pthread_join(). Just as pthread_create() split one thread into two, pthread_join() merges two threads back into one. The first argument to pthread_join() is the tid mythread. The second argument is a pointer to a void pointer. If this pointer is not NULL, pthread_join places the thread's void * return value at the location it points to. Because we do not care about the return value of thread_function(), we set it to NULL.
You will notice that thread_function() takes 20 seconds to complete. Long before thread_function() finishes, the main thread will already have called pthread_join(). When this happens, the main thread blocks (goes to sleep) and waits for thread_function() to complete. When thread_function() completes, pthread_join() returns, and the program is back to a single main thread. By the time the program exits, every new thread has been merged with pthread_join(). This is exactly how you should handle every new thread you create in your programs. A new thread that is never joined still counts against the system's maximum number of threads. This means that failing to clean up threads properly will eventually cause pthread_create() calls to fail.
No parents, no children
If you have used the fork() system call, you are probably familiar with the concept of parent and child processes. When you create a new process with fork(), the new process is the child and the original process is the parent. This creates a hierarchical relationship that can be useful, especially when waiting for child processes to terminate. For example, the waitpid() function tells the current process to wait for the termination of a child process. waitpid() is used to implement a simple cleanup routine in the parent process.
Things are a little more interesting with POSIX threads. You may have noticed that I have deliberately avoided the terms "parent thread" and "child thread". That is because this hierarchical relationship does not exist in POSIX threads. Although the main thread can create a new thread, and that new thread can create another new thread, the POSIX threads standard treats all of them as equals. So the concept of waiting for a child thread to exit makes no sense here. The POSIX threads standard does not record any "family" information. This lack of genealogy has one major implication: if you want to wait for a thread to terminate, you must pass that thread's tid to pthread_join(). The thread library cannot figure out the tid for you.
To most developers this does not sound like good news, because it can complicate programs that use more than one thread. But do not let it worry you. The POSIX threads standard provides all the tools you need to manage multiple threads gracefully. In fact, the absence of a parent/child relationship opens up more creative ways of using threads in your programs. For example, if thread 1 creates thread 2, thread 1 does not have to be the one that calls pthread_join() for thread 2; any other thread in the program can do so. This allows for interesting possibilities when you write heavily threaded code. For example, you can create a global "dead list" that holds all stopped threads, and have a dedicated cleanup thread wait for items to be added to the list. The cleanup thread calls pthread_join() to merge each stopped thread with itself. Now the entire cleanup is handled neatly and efficiently by a single thread.
Synchronized swimming
Now let's look at some code that does something a little unexpected. Here is thread2.c:
thread2.c
    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    int myglobal;  /* shared by both threads, deliberately unprotected */

    void *thread_function(void *arg) {
        int i, j;
        for (i = 0; i < 20; i++) {
            j = myglobal;   /* copy the global into a local */
            j = j + 1;
            printf(".");
            fflush(stdout);
            sleep(1);
            myglobal = j;   /* write back one second later */
        }
        return NULL;
    }

    int main(void) {
        pthread_t mythread;
        int i;

        if (pthread_create(&mythread, NULL, thread_function, NULL)) {
            printf("error creating thread.");
            abort();
        }
        for (i = 0; i < 20; i++) {
            myglobal = myglobal + 1;
            printf("o");
            fflush(stdout);
            sleep(1);
        }
        if (pthread_join(mythread, NULL)) {
            printf("error joining thread.");
            abort();
        }
        printf("\nmyglobal equals %d\n", myglobal);
        exit(0);
    }
Understanding thread2.c
Like our first program, this program creates a new thread. Both the main thread and the new thread then increment a global variable called myglobal 20 times. But the program itself produces some unexpected results. To compile it, type:
$ gcc thread2.c -o thread2 -lpthread
To run it, type:
$ ./thread2
Output:
$ ./thread2
..o.o.o.o.o.o.o.o.o.o.o.o.o.o.o.
myglobal equals 21
Quite unexpected! Because myglobal starts at zero and both the main thread and the new thread increment it 20 times, myglobal should equal 40 at the end of the program. Since the output says myglobal equals 21, something is clearly wrong. But what?
Give up? OK, let me explain what is going on. First, look at thread_function(). Notice how it copies myglobal into a local variable called j? It then increments j, sleeps for one second, and only then copies the new value of j back into myglobal. That is the key. Imagine what happens if the main thread increments myglobal right after the new thread has copied the value of myglobal into j. When thread_function() writes the value of j back to myglobal, it overwrites the modification made by the main thread.
When writing threaded programs, you want to avoid useless side effects like this one, because they are a waste of time (unless, of course, you are writing articles about POSIX threads). So how can we eliminate this problem?
Because the problem arises from copying myglobal into j and holding it there for one second before writing it back, we could try to avoid the temporary local variable and increment myglobal directly. Although this solution will probably work for this particular example, it is still not correct. And if we were performing a relatively complex mathematical operation on myglobal rather than a simple increment, it would fail. But why?
To understand the problem, you need to remember that threads run concurrently. Even on a uniprocessor machine (where the kernel uses time slicing to simulate true multitasking), we can, from the programmer's perspective, imagine both threads executing at the same time. thread2.c has a problem because thread_function() relies on the assumption that myglobal will not be modified during the roughly one second between reading it and writing it back. We need some way for one thread to tell the others to keep their hands off while it is making changes to myglobal. I will explain exactly how to do that in the next article.