Beej's Network Socket Programming Guide

--------------------------------------------------------------------------------
Introduction

Does socket programming frustrate you? Is it hard to dig useful information out of the man pages? You want to write Internet programs, but you don't know which structures to set up before calling connect(). Well, I have already done this nasty business, and I will share what I learned with everyone. If you know C and want to cross the swamp of network programming, you have come to the right place.

--------------------------------------------------------------------------------
Audience

This document is a guide, not a reference book. If you are just starting out with socket programming and are looking for a place to begin, then you are my reader. It is not, however, a complete guide to socket programming.

--------------------------------------------------------------------------------
Platform and compiler

The code in this document was compiled successfully with GNU GCC on a Linux PC, and also with GCC on HP-UX. Note, however, that not every code snippet has been tested on its own.

--------------------------------------------------------------------------------
Contents

1) What is a socket?
2) Two types of Internet sockets
3) Network theory
4) Structures
5) Converting byte order
6) IP addresses and how to handle them
7) socket()
8) bind()
9) connect()
10) listen()
11) accept()
12) send() and recv()
13) sendto() and recvfrom()
14) close() and shutdown()
15) getpeername()
16) gethostname()
17) Domain Name Service (DNS)
18) Client-server background
19) A simple server
20) A simple client
21) Datagram sockets
22) Blocking
23) select(): synchronous I/O multiplexing
24) Reference material

--------------------------------------------------------------------------------
What is a socket?
You often hear people talk about "sockets", and you may still not know exactly what the word means. Here it is: a socket is a way for a program to talk to other programs using standard UNIX file descriptors.

What? You may have heard some UNIX hacker say, "Hey, everything in UNIX is a file!" That person was probably talking about this: whenever a UNIX program does any kind of I/O, it does it by reading or writing a file descriptor. A file descriptor is simply an integer associated with an open file. But (and here's the catch) that file can be a network connection, a FIFO, a pipe, a terminal, a real file on disk, or just about anything else. Everything in UNIX is a file! So when you want to communicate with another program over the Internet, you will do it through a file descriptor. Take that to heart.
Now you may be thinking, "So where do I get the file descriptor for network communication?" You call the socket() system routine. It returns a socket descriptor, and you then pass that descriptor to the send() and recv() calls to communicate.

"But," you may object, "if it's a file descriptor, why can't I just use the ordinary read() and write() calls to communicate through the socket?" The short answer: you can! The longer answer: you can, but send() and recv() give you much finer control over your data transmission.

Here is the situation: there are many kinds of sockets. There are DARPA Internet addresses (Internet sockets), path names on a local node (UNIX sockets), CCITT X.25 addresses (X.25 sockets, which you can safely ignore), and probably others, depending on which UNIX flavor you run. This document deals only with the first kind: Internet sockets.

--------------------------------------------------------------------------------
Two types of Internet sockets

What's this? Two types of Internet sockets? Yes. Well, no, I'm lying. There are more, but I don't want to scare you, so we'll only talk about two. Beyond these two, "raw sockets" are also very powerful and well worth a look.

So what are the two types? One is "stream sockets"; the other is "datagram sockets". From here on they will be referred to as "SOCK_STREAM" and "SOCK_DGRAM" respectively. Datagram sockets are sometimes called "connectionless sockets" (though you can connect() them if you really want to).

Stream sockets are reliable, two-way, connected communication streams. If you output two items into the socket in the order "1, 2", they will arrive at the other end in the order "1, 2". They are also error-free: they have their own error control, which isn't discussed here.

What uses stream sockets? You may have heard of telnet. It uses stream sockets: you want all the characters you type to arrive in the same order you typed them, right? Likewise, web browsers use the HTTP protocol, which uses stream sockets to fetch pages.
In fact, if you telnet to a web site on port 80 and type "GET pagename", you will get the HTML page back.

How do stream sockets achieve such high-quality data transmission? They use a protocol called the Transmission Control Protocol, otherwise known as "TCP" (see RFC-793 for the details). TCP makes sure your data arrives in order and without errors. You may have heard "TCP" before as part of "TCP/IP", where "IP" stands for "Internet Protocol" (see RFC-791). IP deals primarily with Internet routing and is not responsible for data integrity.

So what about datagram sockets? Why are they called connectionless? Why are they unreliable? Here are some facts: if you send a datagram, it may arrive. It may arrive out of order. If it does arrive, the data inside the packet will be error-free.
Datagram sockets also use IP for routing, but they don't use TCP. They use the "User Datagram Protocol", or "UDP" (see RFC-768).

Why are they connectionless? Mainly because you don't have to maintain an open connection, as you do with stream sockets. You just build a packet, put an IP header with destination information on it, and send it out. No connection is needed. Datagram sockets are generally used for packet-by-packet transfers of information. Sample applications include tftp and bootp.

You may be thinking, "How do these programs work at all if data can get lost?" Well, my friend, each one has its own protocol on top of UDP. For example, the TFTP protocol says that for each packet sent, the recipient has to send back a packet that says "I got it!" (an "acknowledgment", or "ACK", packet). If the sender of the original packet gets no reply within, say, five seconds, it retransmits the packet until it finally gets an ACK. This acknowledgment procedure is very important when implementing SOCK_DGRAM applications.

--------------------------------------------------------------------------------
Network theory

Since I just mentioned protocol layering, it's time to talk about how networks really work, and to show some examples of how SOCK_DGRAM packets are built. You can skip this section if you already know this material.

It's time to learn about data encapsulation! This is very important, so important that you must master it whether you learn it in a networking course or here (Figure 1: Data Encapsulation). Basically, it says this: a packet is born; the packet is wrapped ("encapsulated") in a header by the first protocol (here, TFTP); then the whole thing, TFTP header included, is encapsulated again by the next protocol (here, UDP), then again by the next one (IP), and so on down to the final protocol on the hardware (physical) layer (here, Ethernet).

When another computer receives the packet, the hardware strips the Ethernet header, the kernel strips the IP and UDP headers, and the TFTP program strips the TFTP header, which finally leaves the data.
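To make the TFTP acknowledgment concrete, here is a minimal sketch that builds only the TFTP layer of an ACK packet. Per RFC 1350, that layer is a 2-byte opcode (4 for ACK) followed by a 2-byte block number, both in network byte order. The function name tftp_build_ack is hypothetical. The UDP, IP and Ethernet headers described above are not built here, because the kernel and the hardware add them.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>   /* htons() */

#define TFTP_OPCODE_ACK 4   /* RFC 1350: opcode of an ACK packet */

/* Hypothetical helper: writes the 4-byte TFTP ACK for `block` into `buf`.
   Returns the number of bytes written, or 0 if `buflen` is too small.
   Only the TFTP header is built; the kernel wraps it in UDP and IP. */
size_t tftp_build_ack(unsigned char *buf, size_t buflen, uint16_t block)
{
    uint16_t opcode = htons(TFTP_OPCODE_ACK);
    uint16_t blk    = htons(block);

    if (buflen < 4)
        return 0;
    memcpy(buf,     &opcode, 2);   /* bytes 0-1: opcode, network byte order */
    memcpy(buf + 2, &blk,    2);   /* bytes 2-3: block number being acknowledged */
    return 4;
}
```

A receiver acknowledges a block by sending these four bytes back with sendto() on a SOCK_DGRAM socket. The next layer down (UDP) then prepends its own header, exactly as the encapsulation description says.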
Now we can finally talk about the famous Layered Network Model. This model of network functionality has many advantages over other models. For example, you can write socket programs without caring how the data is physically transmitted (serial port, Ethernet, Attachment Unit Interface (AUI), or other media), because programs on the lower levels deal with it for you. The actual network hardware and topology are transparent to the socket programmer.

Without further ado, here are the layers of the full model. If you have a networking exam coming up, memorize them:

    Application
    Presentation
    Session
    Transport
    Network
    Data Link
    Physical

The Physical Layer is the hardware (serial port, Ethernet, and so on). The Application Layer is as far from the physical layer as you can get: it is where users interact with the network. This model is so general that you could probably use it as a car repair manual if you really wanted to.
Applied to UNIX, the layered model looks like this:

    Application Layer (telnet, ftp, etc.)
    Transport Layer (TCP, UDP)
    Internet Layer (IP and routing)
    Network Access Layer

By now you can probably see how these layers correspond to the encapsulation of the original data. See how much work goes into building a simple packet? Yikes, and you'd have to type the packet headers in yourself with "cat"! Just kidding. For stream sockets, all you have to do is send() the data out. For datagram sockets, you encapsulate the data in the method of your choosing and sendto() it out. The kernel builds the Transport Layer and Internet Layer for you, and the hardware handles the Network Access Layer. Ah, modern technology.

That ends our crash course in network theory. Oh, I forgot to tell you about routing. But I'm not going to talk about it. If you really care, consult the IP RFC.

--------------------------------------------------------------------------------
Structures

Finally, we get to talk about programming. In this section I'll cover the data types used by the sockets interface, because some of them are very important.

First the easy one: a socket descriptor. A socket descriptor has this type:

    int

Just a regular int.

Things get weird from here, so read through and bear with me. Know this: there are two byte orderings. Either the most significant byte (sometimes called an "octet", that is, a group of eight bits) comes first, or the least significant byte comes first. The former is called "Network Byte Order" (NBO). Some machines store their numbers internally in this order, and some don't. When I say something has to be in NBO, you have to call a function (such as htons()) to convert it from "Host Byte Order". If I don't mention NBO, leave the value in Host Byte Order.

My First Structure(TM): struct sockaddr.
This structure holds socket address information for many types of sockets:

    struct sockaddr {
        unsigned short sa_family;    /* address family, AF_xxx */
        char           sa_data[14];  /* 14 bytes of protocol address */
    };

sa_family can be any of several values, but in this document it will always be AF_INET. sa_data contains a destination address and port number for the socket. Packing these into sa_data by hand is rather unwieldy.

To deal with struct sockaddr, programmers created a parallel structure, struct sockaddr_in ("in" stands for "Internet"):

    struct sockaddr_in {
        short int          sin_family;   /* address family */
        unsigned short int sin_port;     /* port number */
        struct in_addr     sin_addr;     /* Internet address */
        unsigned char      sin_zero[8];  /* padding to the size of struct sockaddr */
    };

This structure makes it easy to reference the elements of a socket address.
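Here is a minimal sketch of filling in a struct sockaddr_in under the rules spelled out in this section. It zeroes the whole structure, which also clears sin_zero. It then sets the family in host byte order and stores the port and address in network byte order. The helper name make_inet_addr is hypothetical, and it uses inet_addr() to parse the dotted address.

```c
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical helper: fills *sa for dotted-decimal `ip` and `port`
   (given in host byte order). Returns 0 on success, -1 if `ip` is invalid. */
int make_inet_addr(struct sockaddr_in *sa, const char *ip, unsigned short port)
{
    in_addr_t addr = inet_addr(ip);        /* already in network byte order */

    if (addr == INADDR_NONE)               /* error value (also 255.255.255.255) */
        return -1;
    memset(sa, 0, sizeof *sa);             /* zeroes sin_zero as well */
    sa->sin_family      = AF_INET;         /* host byte order */
    sa->sin_port        = htons(port);     /* network byte order */
    sa->sin_addr.s_addr = addr;            /* network byte order */
    return 0;
}
```

The filled structure is handed to the socket calls through a cast, for example `connect(sockfd, (struct sockaddr *)&sa, sizeof sa)`.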
Note that sin_zero, which pads the structure to the length of a struct sockaddr, should be set to all zeros with bzero() or memset(). Also, and this is the important part, a pointer to a struct sockaddr_in can be cast to a pointer to a struct sockaddr and vice versa. So even though connect() wants a struct sockaddr *, you can still use a struct sockaddr_in and cast it at the last minute. Note also that sin_family corresponds to sa_family in struct sockaddr and should be set to AF_INET. Finally, sin_port and sin_addr must be in Network Byte Order!

"But," you may object, "how can an entire structure, struct in_addr sin_addr, be in Network Byte Order?" To answer that, look at struct in_addr:

    /* Internet address (a structure for historical reasons) */
    struct in_addr {
        uint32_t s_addr;   /* 32-bit IPv4 address; older systems declared it unsigned long */
    };

It used to be one of the worst unions alive, but those days are gone. So if you have declared ina to be a struct sockaddr_in, then ina.sin_addr.s_addr refers to the 4-byte IP address, in Network Byte Order. Even if your unfortunate system still uses the awful union, you can reference the 4-byte IP address in exactly the same way (thanks to #defines).

--------------------------------------------------------------------------------
Converting byte order

On to the next section. We've talked a lot about converting between network and host byte order; now it's time to practice!

You can convert two types: short (two bytes) and long (four bytes). Here "long" means a 32-bit value: htonl() and ntohl() take a uint32_t even on systems where a C long is eight bytes. The functions work for the unsigned variants as well. Say you want to convert a short from Host Byte Order to Network Byte Order. Start with "h" for "host", follow it with "to", then "n" for "network", and "s" for "short": h-to-n-s, or htons() ("Host to Network Short"). Almost too easy...
You can use every combination of "n", "h", "s" and "l" you want, except the really silly ones. For example, there is no stolh() ("Short to Long Host") function, not here and not anywhere. But there are:

    htons()  "Host to Network Short"
    htonl()  "Host to Network Long"
    ntohs()  "Network to Host Short"
    ntohl()  "Network to Host Long"

Now you may think you've got these down.
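The following small sketch shows the conversions at work. It checks whether the host is big-endian, in which case host byte order already is network byte order. It also shows that htonl() always puts the most significant byte first in memory. Both function names are hypothetical.

```c
#include <stdint.h>
#include <arpa/inet.h>   /* htons(), htonl(), ntohs(), ntohl() */

/* Hypothetical helper: returns 1 if this host stores the most significant
   byte first (host byte order == network byte order), otherwise 0. */
int host_is_big_endian(void)
{
    uint16_t probe = 0x0102;
    return *(const unsigned char *)&probe == 0x01;
}

/* Hypothetical helper: returns the first byte in memory of `value`
   after conversion to network byte order (always the most significant). */
unsigned char first_wire_byte(uint32_t value)
{
    uint32_t wire = htonl(value);
    return *(const unsigned char *)&wire;
}
```

On a big-endian machine such as a 68000, these functions are identity operations. Calling them anyway costs nothing and keeps the program correct on little-endian hosts.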
You may also wonder, "What do I do if I need to change the byte order of a char?" Then you'll realize, "Never mind." You may also think that since your 68000 machine already uses Network Byte Order, you don't need to call htonl() on your IP addresses. You would be right, but when you port your program to a machine with the opposite byte order, it will fail. Be portable! This is a UNIX world! Remember: put your data in Network Byte Order before you put it on the network.

One last point: why do sin_addr and sin_port need to be in Network Byte Order in a struct sockaddr_in, but sin_family does not? The answer: sin_addr and sin_port are encapsulated in the packet at the IP and UDP layers, respectively, so they must be in Network Byte Order. The sin_family field, however, is used only by the kernel to determine what type of address the structure contains, so it must be in Host Byte Order. And since sin_family is never sent out on the network, it can stay in Host Byte Order.

--------------------------------------------------------------------------------
IP addresses and how to handle them

Luckily, there are plenty of functions for manipulating IP addresses. You don't need to work them out by hand and pack them into a long with the << operator.

First, say you have a struct sockaddr_in ina and an IP address "132.241.5.10" that you want to store in it. The function to use is inet_addr(), which converts an IP address in numbers-and-dots notation into a 32-bit unsigned value (type in_addr_t). Here is how:

    ina.sin_addr.s_addr = inet_addr("132.241.5.10");

Note that inet_addr() already returns the address in Network Byte Order, so you don't need to call htonl().

The snippet above is not very robust, because it has no error checking. inet_addr() returns -1 on error. Remember binary numbers? (unsigned)-1 happens to correspond to the IP address 255.255.255.255!
That's the broadcast address! Big mistake! Remember to check for errors properly.

All right, now you can convert a string IP address to its binary form. What about the other direction? What if you have a struct in_addr and want to print it in numbers-and-dots notation? For that, use inet_ntoa() ("ntoa" means "network to ASCII"), like this:

    printf("%s", inet_ntoa(ina.sin_addr));

That prints the IP address. Note that inet_ntoa() takes a struct in_addr as its argument, not a long. Also note that it returns a pointer to a char. That pointer refers to a static buffer inside inet_ntoa(), so each call to inet_ntoa() overwrites the IP address from the previous call.
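The sketch below follows both pieces of advice in this section. It checks inet_addr() for its error value. It also copies the string returned by inet_ntoa() into caller-owned storage before another call can overwrite it. The function name copy_dotted_ipv4 is hypothetical. Because the error value has the same bit pattern, the check also rejects the legitimate address 255.255.255.255. The newer inet_pton() and inet_ntop() avoid both pitfalls, if your system has them.

```c
#include <stdio.h>
#include <string.h>
#include <netinet/in.h>   /* struct in_addr, INADDR_NONE, INET_ADDRSTRLEN */
#include <arpa/inet.h>    /* inet_addr(), inet_ntoa() */

/* Hypothetical helper: parses dotted-decimal `text` and writes its
   normalized dotted form into `out` (at least INET_ADDRSTRLEN bytes).
   Returns 0 on success, -1 on error. */
int copy_dotted_ipv4(const char *text, char *out, size_t outlen)
{
    struct in_addr ina;

    if (outlen < INET_ADDRSTRLEN)
        return -1;
    ina.s_addr = inet_addr(text);
    if (ina.s_addr == INADDR_NONE)      /* inet_addr() signals errors with -1 */
        return -1;

    /* inet_ntoa() returns a static buffer: copy it out before calling again */
    snprintf(out, outlen, "%s", inet_ntoa(ina));
    return 0;
}
```

Because each result lands in its own array, two calls in a row keep both addresses intact.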
For example:

    char *a1, *a2;
    ...
    a1 = inet_ntoa(ina1.sin_addr);  /* this is 198.92.129.1 */
    a2 = inet_ntoa(ina2.sin_addr);  /* this is 132.241.5.10 */
    printf("address 1: %s\n", a1);
    printf("address 2: %s\n", a2);

The output is:

    address 1: 132.241.5.10
    address 2: 132.241.5.10

If you need to keep the address, strcpy() it into a character array of your own.

That's all on this topic for now. Later, you'll learn how to convert a string like "whitehouse.gov" into its corresponding IP address (see the DNS section below).

--------------------------------------------------------------------------------
socket()

I guess I can't put it off any longer: I have to talk about the socket() system call. Here's the breakdown: #include
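socket() takes a domain, a type and a protocol. It returns a new descriptor, or -1 with errno set on error. Here is a hedged sketch of typical use. It asks for an Internet (AF_INET) socket of the given type and passes 0 as the protocol so the kernel picks TCP for SOCK_STREAM or UDP for SOCK_DGRAM. The wrapper name make_inet_socket is hypothetical.

```c
#include <stdio.h>        /* perror() */
#include <sys/types.h>
#include <sys/socket.h>   /* socket(), AF_INET, SOCK_STREAM, SOCK_DGRAM */

/* Hypothetical wrapper: returns a new Internet socket descriptor of
   `type` (SOCK_STREAM or SOCK_DGRAM), or -1 after printing the error. */
int make_inet_socket(int type)
{
    int sockfd = socket(AF_INET, type, 0);   /* 0: let the kernel choose TCP/UDP */

    if (sockfd == -1)
        perror("socket");                    /* errno says what went wrong */
    return sockfd;
}
```

The result is an ordinary small int. Like any other file descriptor, it should eventually be released with close().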
Simple enough? Let's look at an example: #include
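Here is a minimal sketch of the usual bind() sequence. It assumes you want to bind a socket to a given port on all local interfaces (INADDR_ANY). Passing port 0 asks the kernel to pick any free port. The helper name bound_socket is hypothetical. The SO_REUSEADDR option is an optional extra that lets a restarted server rebind a port still lingering from its previous run. Drop it if you don't need it.

```c
#include <string.h>
#include <unistd.h>        /* close() */
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical helper: creates a socket of `type` bound to `port`
   (host byte order) on every local interface. Returns the fd or -1. */
int bound_socket(int type, unsigned short port)
{
    struct sockaddr_in my_addr;
    int yes = 1;
    int sockfd = socket(AF_INET, type, 0);

    if (sockfd == -1)
        return -1;
    /* optional: allow quick rebinding after a server restart */
    setsockopt(sockfd, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof yes);

    memset(&my_addr, 0, sizeof my_addr);          /* zero sin_zero too */
    my_addr.sin_family      = AF_INET;            /* host byte order */
    my_addr.sin_port        = htons(port);        /* network byte order */
    my_addr.sin_addr.s_addr = htonl(INADDR_ANY);  /* any local address */

    if (bind(sockfd, (struct sockaddr *)&my_addr, sizeof my_addr) == -1) {
        close(sockfd);                            /* don't leak the fd */
        return -1;
    }
    return sockfd;
}
```

For example, a server that wants to be found at a well-known port would call `bound_socket(SOCK_STREAM, 3490)` before listen().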
If you only use connect() to talk to a remote machine and don't care what your local port number is (as with telnet, where only the remote port matters), you can simply call connect(). It checks whether the socket is bound and, if it isn't, binds it to an unused local port.

--------------------------------------------------------------------------------
connect()

Let's pretend for a few minutes that you're a telnet application. Your user commands you to get a socket file descriptor. You comply and call socket(). Next, the user tells you to connect to "132.241.5.10" on port 23 (the standard telnet port). What do you do now? Luckily, you are now reading the section on connect(): how to connect to a remote host. You don't want to disappoint your user. The connect() call looks like this: #include
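The following hedged sketch covers the telnet scenario above. It creates a stream socket and connect()s it to a dotted-decimal address and port. There is no bind() call, because, as noted earlier, connect() binds the socket to an unused local port when needed. The function name connect_to is hypothetical.

```c
#include <string.h>
#include <unistd.h>        /* close() */
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical helper: connects a new TCP socket to ip:port (port in
   host byte order). Returns the connected descriptor, or -1 on error. */
int connect_to(const char *ip, unsigned short port)
{
    struct sockaddr_in dest;
    int sockfd;

    memset(&dest, 0, sizeof dest);
    dest.sin_family      = AF_INET;
    dest.sin_port        = htons(port);      /* network byte order */
    dest.sin_addr.s_addr = inet_addr(ip);    /* already network byte order */
    if (dest.sin_addr.s_addr == INADDR_NONE)
        return -1;                           /* not a valid dotted address */

    sockfd = socket(AF_INET, SOCK_STREAM, 0);
    if (sockfd == -1)
        return -1;
    if (connect(sockfd, (struct sockaddr *)&dest, sizeof dest) == -1) {
        close(sockfd);                       /* don't leak the descriptor */
        return -1;
    }
    return sockfd;
}
```

With this helper, `connect_to("132.241.5.10", 23)` corresponds to the telnet example.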
--------------------------------------------------------------------------------
listen()

What if you don't want to connect to a remote host? Say, just for kicks, that you want to wait for incoming connections and handle them in some way. The process takes two steps: first you listen(), then you accept() (see below).

The listen() call is fairly simple, but needs a little explanation:

    int listen(int sockfd, int backlog);

sockfd is the socket file descriptor returned by socket(). backlog is the number of connections allowed on the incoming queue. What does that mean? Incoming connections wait in this queue until you accept() them (see below), and backlog limits how many can queue up. Most systems silently limit this number to about 20; you can probably get away with setting it to 5 or 10.

Like the other functions, listen() returns -1 on error and sets the global variable errno.

As you can probably imagine, you need to call bind() before you call listen(); otherwise the kernel picks a random port for you to listen on. So if you're going to listen for incoming connections, the sequence of system calls is:

    socket();
    bind();
    listen();
    /* accept() goes here */

I'll leave it at that instead of giving sample code, since it's fairly self-explanatory. (The code in the accept() section below is more complete.) The really tricky part is the call to accept().

--------------------------------------------------------------------------------
accept()

Here's what happens: someone far away tries to connect() to your machine on a port that you are listen()ing on. The connection is queued, waiting to be accept()ed. You call accept() and tell it to get the pending connection. It returns a brand new socket file descriptor to use for this single connection!
That's right: you now have two socket descriptors. The original one is still listening on your port, and the new one is ready to send() and recv() data. That's all there is to it! The call looks like this: #include
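This hedged sketch shows the listen() and accept() steps of the socket(); bind(); listen(); accept() sequence described above. It assumes the caller already has a bound stream socket. The names start_listening and accept_one, and the backlog of 10, are illustrative choices. Note that the length argument must be set to the size of the address structure before accept() is called. accept() then overwrites it with the actual size.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

#define BACKLOG 10   /* how many pending connections the queue may hold */

/* Hypothetical helper: marks bound socket `sockfd` as passive so that
   incoming connections are queued. Returns 0, or -1 with errno set. */
int start_listening(int sockfd)
{
    return listen(sockfd, BACKLOG);
}

/* Hypothetical helper: takes one pending connection off the queue.
   Blocks until a client connect()s. Returns the new descriptor for that
   connection (sockfd keeps listening), or -1 on error. */
int accept_one(int sockfd, struct sockaddr_in *their_addr)
{
    socklen_t sin_size = sizeof *their_addr;   /* in: buffer size; out: actual size */

    return accept(sockfd, (struct sockaddr *)their_addr, &sin_size);
}
```

A real server calls listen() once and then calls accept() in a loop, handling each new descriptor separately while the original keeps listening.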
#include
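send() may hand the kernel fewer bytes than you asked it to send. Programs therefore commonly wrap it in a loop that resumes where the previous call stopped. Here is a hedged sketch of such a loop. The name send_all is hypothetical. On return, *len holds the number of bytes actually sent, so after an error the caller can tell how far it got.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper: calls send() until all *len bytes of `buf` have
   been sent. Returns 0 on success, -1 on error (errno set by send());
   either way, *len is updated to the number of bytes actually sent. */
int send_all(int sockfd, const char *buf, size_t *len)
{
    size_t total = 0;     /* bytes sent so far */
    ssize_t n = 0;

    while (total < *len) {
        n = send(sockfd, buf + total, *len - total, 0);
        if (n == -1)
            break;        /* give up; errno describes the failure */
        total += (size_t)n;
    }
    *len = total;
    return n == -1 ? -1 : 0;
}
```

For small messages, the loop usually runs only once. It matters for large buffers, where a single send() may stop partway through.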
Here is some sample code that calls send():

    char *msg = "Beej was here!";
    int len, bytes_sent;
    ...
    len = strlen(msg);
    bytes_sent = send(sockfd, msg, len, 0);
    ...

send() returns the number of bytes actually sent, and this may be less than the number you asked it to send! Sometimes you tell it to send a whole lot of data and it just can't handle it all. It sends as much as it can and trusts you to send the rest later. Remember: if the value returned by send() doesn't match len, it's up to you to send the rest. The good news: if the packet is small (less than about 1K), send() will probably manage to send the whole thing in one go. Finally, send() returns -1 on error and sets errno.

The recv() call is very similar:

    int recv(int sockfd, void *buf, int len, unsigned int flags);

sockfd is the socket descriptor to read from, buf is the buffer to read the data into, and len is the maximum length of the buffer. flags can be set to 0 (see the recv() man page for flag information).

recv() returns the number of bytes actually read into the buffer, or -1 on error (with errno set). A return value of 0 means the remote side has closed the connection.

Easy, wasn't it? You can now send and receive data on stream sockets. You're a UNIX network programmer!

--------------------------------------------------------------------------------
sendto() and recvfrom()

"This is all very nice," you say, "but you haven't said anything about connectionless datagram sockets." No problem, let's start now.

Since datagram sockets aren't connected to a remote host, what piece of information do we need to supply before we send a packet? That's right: the destination address! Here's the call:

    int sendto(int sockfd, const void *msg, int len, unsigned int flags,
               const struct sockaddr *to, int tolen);

As you can see, this call is basically the same as send(), with two extra pieces of information. to is a pointer to a struct sockaddr containing the destination IP address and port. tolen can simply be set to sizeof(struct sockaddr).
Just like send(), sendto() returns the number of bytes actually sent (which, again, may be less than the number you asked it to send!), or -1 on error.

recv() and recvfrom() are just as similar. The synopsis of recvfrom() is:

    int recvfrom(int sockfd, void *buf, int len, unsigned int flags,
                 struct sockaddr *from, int *fromlen);

Again, this is just like recv() with two extra parameters. from is a pointer to a local struct sockaddr that will be filled in with the IP address and port of the sending machine. fromlen is a pointer to a local int that should be initialized to sizeof(struct sockaddr). When the function returns, fromlen holds the length of the address actually stored in from.
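The following hedged sketch exercises both calls over the loopback interface. One datagram socket is bound to 127.0.0.1 on a kernel-chosen port and receives with recvfrom(). A second socket sends to it with sendto(). Note how fromlen is initialized before recvfrom() and updated by it. The function name and the loopback setup are illustrative assumptions. getsockname(), used to learn which port the kernel chose, is an extra call not covered in this section.

```c
#include <string.h>
#include <unistd.h>        /* close() */
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical helper: sends `msg` from one UDP socket to another over
   127.0.0.1 and receives it into `buf`. *from receives the sender's
   address. Returns the byte count from recvfrom(), or -1 on error. */
ssize_t udp_loopback_roundtrip(const char *msg, char *buf, size_t buflen,
                               struct sockaddr_in *from)
{
    struct sockaddr_in addr;
    socklen_t addrlen = sizeof addr;
    socklen_t fromlen = sizeof *from;     /* must be set before recvfrom() */
    ssize_t n = -1;
    int rx = socket(AF_INET, SOCK_DGRAM, 0);
    int tx = socket(AF_INET, SOCK_DGRAM, 0);

    if (rx == -1 || tx == -1)
        goto done;

    memset(&addr, 0, sizeof addr);
    addr.sin_family      = AF_INET;
    addr.sin_port        = htons(0);                 /* kernel picks a port */
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);   /* 127.0.0.1 */
    if (bind(rx, (struct sockaddr *)&addr, sizeof addr) == -1 ||
        getsockname(rx, (struct sockaddr *)&addr, &addrlen) == -1)
        goto done;

    /* no connection, so every sendto() names its destination */
    if (sendto(tx, msg, strlen(msg), 0,
               (struct sockaddr *)&addr, sizeof addr) == -1)
        goto done;

    /* blocks until the datagram arrives; fromlen then holds the real size */
    n = recvfrom(rx, buf, buflen, 0, (struct sockaddr *)from, &fromlen);

done:
    if (rx != -1) close(rx);
    if (tx != -1) close(tx);
    return n;
}
```

The sending socket is never bound explicitly. The kernel gives it a local port on the first sendto(), and that address is what recvfrom() reports in *from.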
recvfrom() returns the number of bytes received, or -1 on error (with errno set).

Remember, if you connect() a datagram socket, you can then simply use send() and recv() for all your transactions. The socket is still a datagram socket and the packets still use UDP, but the socket interface automatically adds the destination and source information for you.

--------------------------------------------------------------------------------
close() and shutdown()

You've been send()ing and recv()ing data all day, and now you're ready to close the connection on your socket descriptor. That's easy: just use the regular UNIX file descriptor close() function:

    close(sockfd);

This prevents any more reads and writes on the socket. Anyone attempting to read or write the socket on the remote end will receive an error.

If you want a little more control over how the socket closes, you can use shutdown(). It lets you cut off communication in one direction, or in both directions (as close() does):

    int shutdown(int sockfd, int how);

sockfd is the socket file descriptor you want to shut down, and how is one of the following:

    0 (SHUT_RD)    further receives are disallowed
    1 (SHUT_WR)    further sends are disallowed
    2 (SHUT_RDWR)  further sends and receives are disallowed (like close())

shutdown() returns 0 on success and -1 on failure (with errno set). Note that shutdown() does not free the socket descriptor; you still need close() for that.

If you use shutdown() on an unconnected datagram socket, it simply makes the socket unavailable for further send() and recv() calls (remember that you can use those only if you connect() your datagram socket).

--------------------------------------------------------------------------------
getpeername()

This function is so easy that I almost didn't give it its own section. But here it is anyway. getpeername() tells you who is at the other end of a connected stream socket. The synopsis: #include
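getpeername() fills in a struct sockaddr with the address of the remote end of a connected stream socket. Its length argument works like recvfrom()'s fromlen. Here is a hedged sketch that formats the peer as "a.b.c.d:port" using inet_ntoa() and ntohs(). The helper name peer_to_string and the output format are illustrative choices.

```c
#include <stdio.h>          /* snprintf() */
#include <sys/types.h>
#include <sys/socket.h>     /* getpeername() */
#include <netinet/in.h>
#include <arpa/inet.h>      /* inet_ntoa(), ntohs() */

/* Hypothetical helper: writes "ip:port" for the peer of connected
   socket `sockfd` into `out`. Returns 0, or -1 with errno set
   (for example ENOTCONN if the socket is not connected). */
int peer_to_string(int sockfd, char *out, size_t outlen)
{
    struct sockaddr_in peer;
    socklen_t len = sizeof peer;    /* in: buffer size; out: actual size */

    if (getpeername(sockfd, (struct sockaddr *)&peer, &len) == -1)
        return -1;
    snprintf(out, outlen, "%s:%u",
             inet_ntoa(peer.sin_addr), (unsigned)ntohs(peer.sin_port));
    return 0;
}
```

The result describes only the network endpoint (an address and a port), not anything about the user on the other machine.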
Here is the synopsis: #include
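Sections 16 and 17 in the contents cover two name-related tasks: asking for the local machine's name with gethostname(), and turning a host name into an IP address through DNS. Here is a hedged sketch of both. For the lookup it substitutes getaddrinfo(), the POSIX successor to the older gethostbyname() interface, restricted to IPv4 to match this guide. Treat that substitution and both helper names as assumptions rather than this guide's own code.

```c
#include <string.h>
#include <unistd.h>        /* gethostname() */
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>         /* getaddrinfo(), freeaddrinfo() */
#include <netinet/in.h>
#include <arpa/inet.h>     /* inet_ntop() */

/* Hypothetical helper: stores this machine's host name in `name`.
   Returns 0 on success, -1 on error (errno set). */
int local_host_name(char *name, size_t len)
{
    if (len == 0 || gethostname(name, len) == -1)
        return -1;
    name[len - 1] = '\0';   /* POSIX doesn't promise termination if truncated */
    return 0;
}

/* Hypothetical helper: resolves `host` (a name or a dotted address) to its
   first IPv4 address, written in dotted form into `out` (INET_ADDRSTRLEN
   bytes or more). Returns 0 on success, -1 on failure. */
int lookup_ipv4(const char *host, char *out, size_t outlen)
{
    struct addrinfo hints, *res;
    const char *ok;

    memset(&hints, 0, sizeof hints);
    hints.ai_family   = AF_INET;       /* IPv4 only, as in this guide */
    hints.ai_socktype = SOCK_STREAM;   /* one result per address, not per protocol */
    if (getaddrinfo(host, NULL, &hints, &res) != 0)
        return -1;                     /* failure code is returned, not put in errno */

    ok = inet_ntop(AF_INET,
                   &((const struct sockaddr_in *)res->ai_addr)->sin_addr,
                   out, (socklen_t)outlen);
    freeaddrinfo(res);                 /* the result list is heap-allocated */
    return ok ? 0 : -1;
}
```

For a real name such as "whitehouse.gov", lookup_ipv4() performs a DNS query. A dotted string such as "127.0.0.1" is simply parsed, with no query.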
Here is an example: #include
To test this server, run it on one machine and telnet to it from another:

    $ telnet remotehostname 3490

where remotehostname is the name of the machine the server is running on.
Server code: #include
            /* parent doesn't need this */
            while (waitpid(-1, NULL, WNOHANG) > 0);  /* clean up child processes */
        }
    }

If you're picky, you may not like that all my code sits in one big main() function. If it bothers you, feel free to split it into smaller functions. You can also use the client in the next section to get the string that this server sends.

--------------------------------------------------------------------------------
A simple client

This program is even simpler than the server. All it does is connect to the host you specify on the command line, on port 3490, and get the string that the server sends.
Client code: #include
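The client described above connects to port 3490 and reads whatever the server sends. A stream socket delivers a stream of bytes rather than separate messages, so one recv() may return only part of the string. The hedged sketch below shows just the reading half. It calls recv() until the buffer is full or recv() returns 0, which means the server has closed the connection. The function name recv_until_close is hypothetical.

```c
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper: reads from connected stream socket `sockfd`
   until the peer closes it or `buf` is full (one byte is reserved for
   a terminating '\0'). Returns the number of bytes stored, or -1. */
ssize_t recv_until_close(int sockfd, char *buf, size_t buflen)
{
    size_t total = 0;

    if (buflen == 0)
        return -1;
    while (total < buflen - 1) {
        ssize_t n = recv(sockfd, buf + total, buflen - 1 - total, 0);
        if (n == -1)
            return -1;            /* errno says why */
        if (n == 0)
            break;                /* 0: the server closed the connection */
        total += (size_t)n;
    }
    buf[total] = '\0';
    return (ssize_t)total;
}
```

To simulate the server hanging up without closing the descriptor, the other end can call shutdown(fd, SHUT_WR) (how = 1). The reader then sees recv() return 0.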
listener sits on a machine waiting for incoming packets on port 4590. talker sends a packet to that port on a machine you specify; the packet contains whatever the user enters on the command line.

Here is listener.c: #include
Notice that we're using an unconnected datagram socket here! Here is talker.c: #include
--------------------------------------------------------------------------------
Blocking

Blocking: you've probably heard of it. "Block" is techie jargon for "sleep". You may have noticed that when you ran listener above, it just sat there waiting for a packet to arrive. What actually happened is that it called recvfrom(), there was no data, and so recvfrom() is said to "block" (that is, sleep there) until some data arrives.

Lots of functions block. accept() blocks. All the recv*() functions block. They can do this because they're allowed to. When you first create a socket descriptor with socket(), the kernel sets it to blocking. If you don't want a socket to block, you have to call fcntl(): #include
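The hedged sketch below switches a descriptor to non-blocking mode with fcntl(). It reads the current flags with F_GETFL and adds O_NONBLOCK with F_SETFL, so the other flags are not overwritten. After that, a recv() that would have slept returns -1 immediately, with errno set to EAGAIN or EWOULDBLOCK. Both helper names are hypothetical.

```c
#include <errno.h>
#include <fcntl.h>          /* fcntl(), F_GETFL, F_SETFL, O_NONBLOCK */
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper: puts `fd` into non-blocking mode while keeping
   its other file status flags. Returns 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);

    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/* Hypothetical helper: tries one recv() on non-blocking `fd`.
   Returns 1 if it would have blocked (no data yet), 0 if it returned
   data or end-of-stream, and -1 on any other error. */
int recv_would_block(int fd, char *buf, size_t len)
{
    ssize_t n = recv(fd, buf, len, 0);

    if (n >= 0)
        return 0;
    if (errno == EAGAIN || errno == EWOULDBLOCK)
        return 1;           /* nothing to read right now */
    return -1;
}
```

Polling a non-blocking socket in a tight loop like this wastes CPU time. Waiting on several descriptors without busy-looping is the job of select().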
In this example, the first argument to select() should be set to sockfd+1, since sockfd is certainly higher than the standard input descriptor (0).

When select() returns, readfds has been modified to show which of the descriptors you selected are ready for reading. You can test them with the FD_ISSET() macro described below.

Before going further, let me explain how to manipulate these sets. Each set has the type fd_set. The following macros operate on this type:

    FD_ZERO(fd_set *set)           clears a file descriptor set
    FD_SET(int fd, fd_set *set)    adds fd to the set
    FD_CLR(int fd, fd_set *set)    removes fd from the set
    FD_ISSET(int fd, fd_set *set)  tests whether fd is in the set

Finally, what is this odd struct timeval? Sometimes you don't want to wait forever for someone to send you data. Maybe you want to print "Still Going..." every 96 seconds even though nothing has happened. This structure lets you specify a timeout. If the time runs out and select() still hasn't found a ready descriptor, it returns so you can continue processing.

struct timeval looks like this:

    struct timeval {
        int tv_sec;   /* seconds */
        int tv_usec;  /* microseconds */
    };

Just set tv_sec to the number of seconds to wait, and tv_usec to the number of additional microseconds to wait. Yes, that's microseconds, not milliseconds. There are 1,000 microseconds in a millisecond and 1,000 milliseconds in a second, so a second is 1,000,000 microseconds. Why "usec"? The "u" is supposed to look like the Greek letter μ (mu), which we use for "micro". Also, when select() returns, timeout may have been updated to show the time still remaining; this depends on which flavor of UNIX you're running.

Yay! We have a microsecond timer! Well, don't count on it. On a standard UNIX system the timeslice is around 100 milliseconds, so you may have to wait that long no matter how small you set your struct timeval.
A few more things of interest. If you set the fields of your struct timeval to 0, select() times out immediately, effectively polling all the descriptors in your sets. If you pass NULL as the timeout, select() never times out and waits until the first descriptor is ready. Finally, if you don't care about waiting on a particular set, you can pass NULL for that set in the call to select().
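The hedged sketch below uses the four macros together with the zero-timeout rule just described. It checks several descriptors once, without waiting, and reports which are readable. It passes NULL for the write and exception sets. The loop also computes the first select() argument as the highest descriptor plus one. The function name and the results-array interface are illustrative choices, and every descriptor must be below FD_SETSIZE.

```c
#include <stddef.h>
#include <sys/select.h>     /* select(), fd_set, FD_* macros */
#include <sys/time.h>       /* struct timeval */
#include <unistd.h>         /* pipe(), read(), write() for callers */

/* Hypothetical helper: polls fds[0..n-1] for readability without
   blocking and sets ready[i] to 1 or 0. Returns the number of readable
   descriptors, or -1 on error. Each fd must be below FD_SETSIZE. */
int poll_readable(const int *fds, int *ready, size_t n)
{
    fd_set readfds;
    struct timeval tv;
    int maxfd = -1;
    int count;
    size_t i;

    FD_ZERO(&readfds);                  /* start from an empty set */
    for (i = 0; i < n; i++) {
        FD_SET(fds[i], &readfds);       /* watch this descriptor */
        if (fds[i] > maxfd)
            maxfd = fds[i];
    }
    tv.tv_sec  = 0;                     /* zero timeout: check once, return */
    tv.tv_usec = 0;

    /* NULL: we don't care about the write and exception sets */
    count = select(maxfd + 1, &readfds, NULL, NULL, &tv);
    if (count == -1)
        return -1;
    for (i = 0; i < n; i++)
        ready[i] = FD_ISSET(fds[i], &readfds) ? 1 : 0;  /* set was modified */
    return count;
}
```

Because select() overwrites readfds, the set must be rebuilt with FD_ZERO() and FD_SET() before every call, as this function does.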
The following code snippet waits up to 2.5 seconds for something to appear on standard input: #include
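Here is a hedged sketch of that 2.5-second wait. It is written as a function that takes the descriptor to watch, so it works for standard input (descriptor 0) or for a socket. Setting tv_sec = 2 and tv_usec = 500000 gives 2.5 seconds. The names wait_readable and wait_for_stdin are hypothetical. The struct timeval is filled in fresh on every call, because some systems update it.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

#define STDIN 0   /* file descriptor for standard input */

/* Hypothetical helper: waits up to `sec` seconds plus `usec` microseconds
   for `fd` to become readable. Returns 1 if readable, 0 on timeout,
   -1 on error. */
int wait_readable(int fd, long sec, long usec)
{
    fd_set readfds;
    struct timeval tv;
    int rc;

    tv.tv_sec  = sec;
    tv.tv_usec = usec;
    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);

    /* writefds and exceptfds are not needed, so pass NULL */
    rc = select(fd + 1, &readfds, NULL, NULL, &tv);
    if (rc == -1)
        return -1;
    if (rc == 0)
        return 0;                       /* timed out */
    return FD_ISSET(fd, &readfds) ? 1 : 0;
}

/* Hypothetical usage: the 2.5-second wait on standard input. On a
   line-buffered terminal, input counts only after RETURN is pressed. */
void wait_for_stdin(void)
{
    if (wait_readable(STDIN, 2, 500000) == 1)
        printf("Standard input is ready!\n");
    else
        printf("Timed out.\n");
}
```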