Socket programming principle

zhaozj 2021-02-16

1 Introduction

The UNIX I/O command set evolved from the commands in Multics and earlier systems, and follows an open-read-write-close pattern. When a user process performs I/O, it first calls open to obtain the right to use the specified file or device. The call returns an integer, the file descriptor, which identifies the I/O that the process is carrying out on the opened file or device. The process then calls read and write as many times as needed to transfer data. When all transfers are complete, the process calls close to tell the operating system that it has finished using the object.
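
As an illustration of the open-read-write-close pattern, here is a minimal sketch of a file copy on a POSIX system. The function name copy_file is hypothetical and chosen only for this example.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Copy src to dst with the open-read-write-close pattern.
   Returns the number of bytes copied, or -1 on error. */
long copy_file(const char *src, const char *dst)
{
    assert(src != NULL && dst != NULL);

    int in = open(src, O_RDONLY);            /* open: obtain a descriptor */
    if (in < 0)
        return -1;
    int out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (out < 0) {
        close(in);
        return -1;
    }

    char buf[4096];
    long total = 0;
    ssize_t n;
    while ((n = read(in, buf, sizeof buf)) > 0) {    /* read repeatedly */
        ssize_t off = 0;
        while (off < n) {                            /* write may be partial */
            ssize_t w = write(out, buf + off, (size_t)(n - off));
            if (w < 0) {
                close(in);
                close(out);
                return -1;
            }
            off += w;
        }
        total += n;
    }

    close(in);                               /* close: done with both objects */
    close(out);
    return n < 0 ? -1 : total;
}
```

The integer descriptors returned by open are the only handle the process needs; every later call refers to the file through them.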

When the TCP/IP protocols were integrated into the UNIX kernel, this amounted to adding a new type of I/O operation to UNIX. The interaction between a UNIX user process and the network protocols is much more complicated than the interaction between a user process and a traditional I/O device. First, the two processes involved in a network operation run on different machines; how is a connection between them established? Second, many network protocols exist; how can one general mechanism support several of them? These are the problems that a network application programming interface has to solve.

UNIX systems offer two kinds of network application programming interface: sockets, from UNIX BSD, and TLI, from UNIX System V. Because Sun adopted UNIX BSD, which supports TCP/IP, TCP/IP applications developed rapidly, and the socket interface became widely used in network software. It has since been brought to the microcomputer operating systems DOS and Windows, where it is a powerful tool for developing network applications. This chapter discusses it in detail.

2 Basic concepts of socket programming

The following concepts need to be established before starting socket programming.

2.1 The concept of network interprocess communication

The concept of interprocess communication originated in stand-alone systems. Because each process runs in its own address space, the operating system provides facilities that let two communicating processes exchange data without interfering with each other: pipes, named pipes and signals in UNIX BSD, and the messages, shared memory and semaphores of System V. All of these are limited to communication between processes on the same machine. Network interprocess communication has to solve communication between processes on different hosts (communication on the same machine can be treated as a special case). The first problem to solve is how to identify a process across the network. On a single host, a process ID uniquely identifies a process. In a network environment, however, process IDs that each host assigns independently cannot identify a process uniquely. For example, host A may assign process ID 5 to some process while host B also has a process 5, so the phrase "process 5" is meaningless by itself.

Second, operating systems support many network protocols; different protocols work in different ways and use different address formats. Network interprocess communication therefore also has to solve the problem of distinguishing among multiple protocols.

To solve these problems, the TCP/IP protocols introduce the following concepts.

Port

A port is a communication endpoint in the network that can be named and addressed; it is a resource that the operating system can allocate.

According to the OSI seven-layer model, the biggest functional difference between the transport layer and the network layer is that the transport layer provides process-to-process communication. In this sense, the final address in network communication is not just a host address but also includes an identifier that describes the process. To this end, TCP/IP introduces the concept of the protocol port (port for short) to identify the communicating process. A port is an abstract software structure (comprising some data structures and I/O buffers). Once an application (that is, a process) has bound itself to a port through a system call, data that the transport layer delivers to that port is received by that process, and data that the process hands to the transport layer goes out through that port. In TCP/IP implementations, port operations resemble ordinary I/O operations: obtaining a port is like obtaining a locally unique I/O file, which the process can access with the ordinary read and write primitives.

Like a file descriptor, each port has an integer identifier, the port number, that distinguishes it from other ports. Because the two TCP/IP transport-layer protocols, TCP and UDP, are completely independent software modules, their port numbers are independent as well: TCP can have a port 255 and UDP can also have a port 255 without any conflict.

Port number allocation is an important issue. There are two basic approaches. The first is global allocation, a centralized method in which a recognized central authority assigns ports according to user needs and publishes the results. The second is local allocation, also called dynamic binding: when a process needs transport-layer services, it asks the local operating system, which returns a locally unique port number, and the process then binds itself to that port with the appropriate system call. TCP/IP port allocation combines the two approaches. It divides port numbers into two parts. A small number are reserved ports, assigned globally to service processes. Each standard server therefore has a globally recognized port (a well-known port), and its port number is the same even on different machines. The remaining ports are free ports, allocated locally. Both TCP and UDP specify that only port numbers below 256 can be reserved ports. (Current IANA practice treats ports 0 through 1023 as well-known ports.)
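
A minimal sketch of local allocation and of the independence of the TCP and UDP port spaces, assuming the POSIX sockets API and the loopback interface. The helper name bind_loopback is hypothetical. Passing port 0 asks the operating system to pick a free port, and getsockname reports which one it chose.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Bind a socket of the given type to 127.0.0.1:port.
   port 0 asks the OS for a free port (local allocation).
   Returns the descriptor, or -1 on error; *bound receives the port in use. */
int bind_loopback(int type, unsigned short port, unsigned short *bound)
{
    assert(bound != NULL);

    int fd = socket(AF_INET, type, 0);
    if (fd < 0)
        return -1;

    struct sockaddr_in a;
    memset(&a, 0, sizeof a);
    a.sin_family = AF_INET;
    a.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    a.sin_port = htons(port);

    socklen_t len = sizeof a;
    if (bind(fd, (struct sockaddr *)&a, len) < 0 ||
        getsockname(fd, (struct sockaddr *)&a, &len) < 0) {
        close(fd);
        return -1;
    }
    *bound = ntohs(a.sin_port);
    return fd;
}
```

Calling bind_loopback(SOCK_STREAM, 0, &p) and then bind_loopback(SOCK_DGRAM, p, &q) would normally succeed with q equal to p, because a TCP port and a UDP port with the same number do not conflict.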

Address

The two communicating processes in network communication are on different machines. In an internetwork, the two machines may be on different networks, which are connected through interconnection devices (gateways, bridges, routers and so on). Addressing therefore takes three levels:

1. A host may be connected to several networks, so a specific network address must be given;

2. Each host on a network must have a unique address;

3. Each process on a host must have an identifier that is unique on that host.

Typically, a host address consists of a network ID and a host ID, and is represented in TCP/IP as a 32-bit integer; TCP and UDP both use a 16-bit port number to identify user processes.

Network byte order

Different computers store multi-byte values in different orders: some store the low-order byte at the starting address (little-endian) and some store the high-order byte there (big-endian). To guarantee that data is interpreted correctly, a network protocol must specify a network byte order. TCP/IP uses the big-endian format for the 16-bit and 32-bit integers that appear in protocol headers.
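
A short sketch of converting values to network byte order with the standard htonl and htons functions. The helper name pack_endpoint and the 6-byte layout are assumptions made for this example, not part of any protocol.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Write a 32-bit address and a 16-bit port into buf in network byte
   order (high-order byte first), whatever the host's own byte order. */
void pack_endpoint(unsigned char buf[6], uint32_t host_addr, uint16_t host_port)
{
    assert(buf != NULL);
    uint32_t a = htonl(host_addr);   /* host to network, 32 bits */
    uint16_t p = htons(host_port);   /* host to network, 16 bits */
    memcpy(buf, &a, 4);
    memcpy(buf + 4, &p, 2);
}
```

For the address 0x7F000001 (127.0.0.1) and port 21, the buffer holds the bytes 7F 00 00 01 00 15 on both little-endian and big-endian hosts. The receiver applies ntohl and ntohs to get host-order values back.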

Connection

The communication link between two processes is called a connection. Internally, a connection takes the form of some buffers and a set of protocol mechanisms; externally, it offers higher reliability than connectionless communication.

Half-association

In summary, a triple can uniquely identify a process across the whole network:

(protocol, local address, local port number)

Such a triple is called a half-association; it specifies one half of a connection.

Full association

A complete network interprocess communication involves two processes, and both must use the same high-level protocol. In other words, one end cannot use TCP while the other end uses UDP. A complete network communication is therefore identified by a five-tuple:

(protocol, local address, local port number, remote address, remote port number)

Such a five-tuple is called an association: two half-associations with the same protocol combine into one association, which fully specifies a connection.

2.2 Service types

In a layered network architecture, the layers depend on one another strictly in one direction, and the division of labor and cooperation between layers is concentrated at the interface between adjacent layers. "Service" is an abstract concept that describes the relationship between adjacent layers: it is the set of operations that each layer offers to the layer immediately above it. The lower layer is the service provider and the upper layer is the user requesting service. Services take the form of primitives, such as system calls or library functions. A system call is a service primitive that the operating system kernel offers to network applications or higher-level protocols. Layer N always offers layer N+1 a more complete service than layer N-1 does; otherwise layer N would have no reason to exist.

In OSI terminology, the network layer and the layers below it are also called the communication subnet; they provide only point-to-point communication and have no notion of programs or processes. The transport layer implements end-to-end communication and introduces the concept of network interprocess communication. It also has to deal with error control, flow control, sequencing (packet ordering), connection management and so on, and offers different service types for this purpose:

Connection-oriented (virtual circuit) or connectionless

A connection-oriented service is an abstraction of the telephone system's service model: every complete data transfer goes through establishing a connection, using the connection and terminating the connection. During the transfer, packets do not carry the destination address; they use a connection identifier (connect ID) instead. In essence, a connection is a pipe: data is received in the same order in which it was sent, and with the same content. TCP provides a connection-oriented virtual circuit.

A connectionless service is an abstraction of the postal system's service model: each packet carries a complete destination address and travels through the system independently. A connectionless service does not guarantee packet order, does not recover from or retransmit after packet errors, and does not guarantee reliable delivery. UDP provides a connectionless datagram service.

The types and typical applications of these two kinds of service are listed below:

Service type          Service                   Example
Connection-oriented   Reliable message stream   File transfer (FTP)
                      Reliable byte stream      Remote login (Telnet)
                      Unreliable connection     Digitized voice
Connectionless        Unreliable datagram       E-mail
                      Acknowledged datagram     Registered mail within e-mail
                      Request-reply             Network database query

Sequencing

In network transmission, two consecutive packets may take different paths on their way from one end to the other, so they may arrive at the destination in a different order from the one in which they were sent. "Sequencing" means that data is received in the same order in which it was sent. TCP provides this service.

Error control

Error control guarantees that the data the application receives is free of errors. Errors are usually detected with a checksum, and error-free delivery is ensured by having both sides use acknowledgments. TCP provides this service.

Flow control

Flow control is a mechanism that regulates the data transfer rate during a transfer so that data is not lost. TCP provides this service.

Byte stream

A byte-stream service treats the data being transferred purely as a sequence of bytes and imposes no boundaries on the data stream. TCP provides a byte-stream service.

Message

The receiver must preserve the sender's message boundaries. UDP provides a message service.
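
The difference between a byte stream and a message service can be observed without a network. The sketch below is an assumption-laden stand-in: it uses a local AF_UNIX socket pair created with the POSIX socketpair call instead of real TCP and UDP sockets, and the function name first_recv_size is hypothetical. It sends two 2-byte messages and then asks for 3 bytes in one recv call.

```c
#include <assert.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Send "ab" and "cd" over a local socket pair of the given type, then
   request 3 bytes with a single recv(). Returns the number of bytes
   that recv() delivered, or -1 on error. */
int first_recv_size(int type)
{
    assert(type == SOCK_STREAM || type == SOCK_DGRAM);

    int sv[2];
    if (socketpair(AF_UNIX, type, 0, sv) < 0)
        return -1;

    char buf[3];
    ssize_t n = -1;
    if (send(sv[0], "ab", 2, 0) == 2 && send(sv[0], "cd", 2, 0) == 2)
        n = recv(sv[1], buf, sizeof buf, MSG_WAITALL);  /* fill buf if the type allows */

    close(sv[0]);
    close(sv[1]);
    return (int)n;
}
```

With SOCK_STREAM the call delivers 3 bytes that span both sends, because the stream has no boundaries. With SOCK_DGRAM it delivers only the 2 bytes of the first message, because the message boundary is preserved.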

Full duplex / half duplex

End-to-end data is transferred in both directions at once (full duplex) or in one direction at a time (half duplex).

Buffering / out-of-band data

In a byte-stream service there are no message boundaries, so a user process can read or write any number of bytes at a time. To make sure that transfers are correct, or when the protocol uses flow control, data has to be buffered. Some applications with special needs, such as interactive applications, ask for this buffering to be turned off.

During a data transfer, a user may want certain information delivered promptly, outside the normal data stream, such as the UNIX interrupt key (Delete or Control-C) or the terminal flow-control characters (Control-S and Control-Q). Such information is called out-of-band data. Logically, it is as if the user process had a separate channel for this data, one that is associated with each pair of connected stream sockets. Because the Berkeley Software Distribution implementation of out-of-band data is inconsistent with the Host Requirements specified in RFC 1122, application writers who want to minimize interoperability problems are advised not to use out-of-band data unless it is needed to interoperate with an existing service.

2.3 Client/server model

In TCP/IP network applications, the main pattern of interaction between two communicating processes is the client/server model: the client sends a service request to the server, and the server, on receiving the request, provides the corresponding service. The client/server model rests on two points. First, networks are built because hardware and software resources, computing power and information are unevenly distributed and need to be shared, so hosts with many resources provide services and hosts with fewer resources request them; the roles are not equal. Second, network interprocess communication is completely asynchronous. Communicating processes have no parent-child relationship and share no memory buffers, so a mechanism is needed to establish contact between processes that wish to communicate and to synchronize their data exchange. That mechanism is TCP/IP built on the client/server model.

In operation, the client/server model uses an active-request approach.

The server starts first and provides services on request:

1. Open a communication channel and tell the local host that the server is willing to accept client requests on a publicly recognized address (a well-known port, such as 21 for FTP);

2. Wait for a client request to arrive at that port;

3. On receiving an iterative service request, process the request and send a reply. On receiving a concurrent service request, start a new process to handle it (on UNIX, with fork and exec). The new process handles this one client's request and does not need to respond to other requests. When the service is complete, it closes its communication link with the client and terminates;

4. Return to step 2 and wait for another client request;

5. Shut down the server.
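
The server steps above can be sketched with the POSIX sockets API. This is a minimal concurrent echo server under stated assumptions: it listens on the loopback address only, handles a concurrent request with fork alone (no exec), and echoes a single message per client. The function names open_listener and serve_one are hypothetical.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <signal.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Step 1: open a channel on 127.0.0.1:port and announce willingness to
   accept requests. Returns the listening descriptor, or -1 on error. */
int open_listener(unsigned short port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    struct sockaddr_in a;
    memset(&a, 0, sizeof a);
    a.sin_family = AF_INET;
    a.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    a.sin_port = htons(port);
    if (bind(fd, (struct sockaddr *)&a, sizeof a) < 0 || listen(fd, 5) < 0) {
        close(fd);
        return -1;
    }
    signal(SIGCHLD, SIG_IGN);          /* let the system reap finished children */
    return fd;
}

/* Steps 2 and 3 for one concurrent request: wait for a client, then fork
   a child that echoes one message and terminates. Returns 0, or -1 on error. */
int serve_one(int listen_fd)
{
    assert(listen_fd >= 0);

    int conn = accept(listen_fd, NULL, NULL);   /* step 2: wait for a request */
    if (conn < 0)
        return -1;

    pid_t pid = fork();                         /* step 3: new process per client */
    if (pid == 0) {
        char buf[256];
        close(listen_fd);                       /* the child serves only this client */
        ssize_t n = recv(conn, buf, sizeof buf, 0);
        if (n > 0)
            send(conn, buf, (size_t)n, 0);
        close(conn);                            /* close the link and terminate */
        _exit(0);
    }
    close(conn);                                /* the parent keeps only the listener */
    return pid < 0 ? -1 : 0;
}
```

Step 4 is then a loop such as while (serve_one(fd) == 0) { }, and step 5 is a final close(fd).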

The client:

1. Open a communication channel and connect to the specific port on the host where the server runs;

2. Send a service request message to the server, then wait for and receive the reply; continue making requests as needed;

3. Close the communication channel and terminate once the requests are finished.
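
The client steps above can be sketched as a single function using the POSIX sockets API. The name request_once is hypothetical, and the sketch assumes that one send and one recv are enough for the exchange, which holds for short messages but not in general on a byte stream.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Connect to ip:port, send msg, and read one reply into reply[0..cap).
   Returns the number of reply bytes, or -1 on error. */
int request_once(const char *ip, unsigned short port,
                 const char *msg, char *reply, size_t cap)
{
    assert(ip != NULL && msg != NULL && reply != NULL);

    struct sockaddr_in srv;
    memset(&srv, 0, sizeof srv);
    srv.sin_family = AF_INET;
    srv.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &srv.sin_addr) != 1)
        return -1;

    int fd = socket(AF_INET, SOCK_STREAM, 0);          /* step 1: open a channel */
    if (fd < 0)
        return -1;
    if (connect(fd, (struct sockaddr *)&srv, sizeof srv) < 0) {
        close(fd);
        return -1;
    }

    ssize_t n = -1;
    size_t len = strlen(msg);
    if (send(fd, msg, len, 0) == (ssize_t)len)         /* step 2: request ... */
        n = recv(fd, reply, cap, 0);                   /* ... and wait for the reply */

    close(fd);                                         /* step 3: close the channel */
    return (int)n;
}
```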

The process described above shows that:

1. The client and server processes play asymmetric roles, so they are coded differently.

2. The server process usually starts before any client request arrives. As long as the system is running, the server process exists until it terminates normally or is forced to terminate.

2.4 Socket types

TCP/IP sockets come in the following three types.

Stream socket (SOCK_STREAM)

Provides a connection-oriented, reliable data transfer service: data is delivered without errors and without duplication, and is received in the order in which it was sent. Built-in flow control prevents data overrun; data is treated as a byte stream with no length limit. The File Transfer Protocol (FTP) uses stream sockets.

Datagram socket (SOCK_DGRAM)

Provides a connectionless service. Data is sent as independent packets with no guarantee of error-free delivery; data may be lost or duplicated, and may arrive out of order. The Network File System (NFS) uses datagram sockets.

Raw socket (SOCK_RAW)

This interface allows direct access to lower-layer protocols such as IP and ICMP. It is commonly used to test new protocol implementations or to access new devices configured in existing services.
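
A brief sketch of creating sockets of different types with the POSIX sockets API and asking the system which type it recorded, using the standard SO_TYPE socket option. The helper names are hypothetical. Raw sockets are left out of the usage below because creating one typically needs elevated privileges and an explicit protocol.

```c
#include <assert.h>
#include <sys/socket.h>
#include <unistd.h>

/* Create an Internet-domain socket of the requested type with the
   default protocol. Returns the descriptor, or -1 on error. */
int open_inet_socket(int type)
{
    return socket(AF_INET, type, 0);
}

/* Report the type (SOCK_STREAM, SOCK_DGRAM, ...) of an existing socket,
   or -1 on error. */
int socket_type_of(int fd)
{
    assert(fd >= 0);
    int type = -1;
    socklen_t len = sizeof type;
    if (getsockopt(fd, SOL_SOCKET, SO_TYPE, &type, &len) < 0)
        return -1;
    return type;
}
```

For example, socket_type_of(open_inet_socket(SOCK_DGRAM)) yields SOCK_DGRAM when both calls succeed.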

3 Basic socket system calls

To illustrate the principles of socket programming, several basic socket system calls are described below.

3.1 Creating a socket - socket()

An application must own a socket before it can use one. The socket() system call gives the application the means to create a socket; its call format is:

SOCKET PASCAL FAR socket(int af, int type, int protocol);

The call takes three parameters: af, type and protocol. The af parameter specifies the domain in which communication takes place. UNIX systems support address families such as AF_UNIX, AF_INET and AF_NS, whereas Windows supports only AF_INET, the Internet domain. The address family is therefore the same as the protocol family. The type parameter describes the type of socket to create. The protocol parameter names the specific protocol the socket uses; a caller that does not want to name one sets it to 0 to get the default. Based on these three parameters, socket() creates a socket, allocates the corresponding resources to it, and returns an integer socket descriptor. The socket() call therefore in effect specifies the "protocol" element of the association five-tuple.

See 5.2.23 for a detailed description of socket().
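
A small sketch of how the three parameters interact, written against the POSIX version of socket() rather than the Windows declaration shown above. On POSIX systems the call returns -1 and sets errno on failure. The helper name try_socket is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Try socket(af, type, protocol). Returns 1 if a socket was created
   (it is closed again at once), or 0 on failure with errno in *err. */
int try_socket(int af, int type, int protocol, int *err)
{
    assert(err != NULL);
    int fd = socket(af, type, protocol);
    if (fd < 0) {
        *err = errno;
        return 0;
    }
    *err = 0;
    close(fd);
    return 1;
}
```

Passing 0 as the protocol selects the default for the type, so (AF_INET, SOCK_STREAM, 0) and (AF_INET, SOCK_STREAM, IPPROTO_TCP) both succeed, while a mismatched pair such as (AF_INET, SOCK_STREAM, IPPROTO_UDP) is rejected.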

3.2 Specifying a local address - bind()

A socket created with socket() exists in a name space (address family) but has no name. bind() associates a socket address (comprising the local host address and the local port number) with the created socket; that is, it gives the socket a name and thereby specifies the local half-association. Its call format is:

int PASCAL FAR bind(SOCKET s, const struct sockaddr FAR * name, int namelen);

The parameter s is the descriptor of an unconnected socket returned by socket(). The parameter name is the local address (name) assigned to the socket; its length is variable and its structure depends on the communication domain. namelen gives the length of name.

If no error occurs, bind() returns 0. Otherwise it returns SOCKET_ERROR.

Addresses play an important role in setting up socket communication, and a network application designer needs a clear understanding of the socket address structures. For example, UNIX BSD has a set of data structures that describe socket addresses, of which the one for the TCP/IP protocols is:

struct sockaddr_in {
    short sin_family;          /* AF_INET */
    u_short sin_port;          /* 16-bit port number, network byte order */
    struct in_addr sin_addr;   /* 32-bit IP address, network byte order */
    char sin_zero[8];          /* reserved */
};

See 5.2.2 for a detailed description of bind().
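
A minimal sketch of filling in a sockaddr_in and passing it to bind(), assuming the POSIX sockets API, where bind() returns -1 on error instead of SOCKET_ERROR. The helper names make_addr and name_socket are hypothetical.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Fill a sockaddr_in from a dotted-quad string and a host-order port.
   Returns 0 on success, -1 if ip is not a valid IPv4 address. */
int make_addr(struct sockaddr_in *sa, const char *ip, unsigned short port)
{
    assert(sa != NULL && ip != NULL);
    memset(sa, 0, sizeof *sa);            /* also clears sin_zero */
    sa->sin_family = AF_INET;
    sa->sin_port = htons(port);           /* stored in network byte order */
    return inet_pton(AF_INET, ip, &sa->sin_addr) == 1 ? 0 : -1;
}

/* Give socket fd the local name in *sa. Returns bind()'s result:
   0 on success, -1 on error. */
int name_socket(int fd, const struct sockaddr_in *sa)
{
    assert(sa != NULL);
    return bind(fd, (const struct sockaddr *)sa, sizeof *sa);
}
```

Binding a second stream socket to an address and port that another socket already holds normally fails, which is how the system keeps each half-association unique.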

3.3 Establishing a socket connection - connect() and accept()

These two system calls complete the establishment of a full association; connect() is used to establish the connection. A process using a connectionless socket may also call connect(), but no actual messages are exchanged between the processes and the call returns directly from the local operating system. The advantage is that the programmer does not have to specify the destination address for every piece of data, and if a datagram arrives whose destination port has no "connection" established with any socket, that port can be judged to be inoperable. accept() lets the server wait for an actual connection from a client process.

The call format of connect() is:

int PASCAL FAR connect(SOCKET s, const struct sockaddr FAR * name, int namelen);

The parameter s is the descriptor of the local socket on which the connection is to be established. The parameter name points to a structure that describes the peer socket's address, and namelen gives the length of that address.

If no error occurs, connect() returns 0. Otherwise it returns SOCKET_ERROR. For a connection-oriented protocol, the call causes a connection to be actually established between the local system and the remote system.

The address family is always contained in the first two bytes of the socket address structure, and it is tied to a particular protocol family through the socket() call. bind() and connect() therefore do not need a protocol parameter.

See 5.2.4 for a detailed description of connect().
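
A minimal sketch of calling connect() on a connectionless socket, assuming the POSIX sockets API. No packets are exchanged by the call; the system only records the peer address, after which send() can be used without naming the destination each time. The function name udp_connect is hypothetical.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Create a datagram socket and "connect" it to ip:port.
   Returns the descriptor, or -1 on error. */
int udp_connect(const char *ip, unsigned short port)
{
    assert(ip != NULL);

    struct sockaddr_in peer;
    memset(&peer, 0, sizeof peer);
    peer.sin_family = AF_INET;
    peer.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &peer.sin_addr) != 1)
        return -1;

    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    /* Returns at once from the local system; nothing is sent to the peer. */
    if (connect(fd, (struct sockaddr *)&peer, sizeof peer) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

After fd = udp_connect("127.0.0.1", port), a plain send(fd, data, len, 0) goes to the recorded peer.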

The call format of accept() is:

SOCKET PASCAL FAR accept(SOCKET s, struct sockaddr FAR * addr, int FAR * addrlen);

The parameter s is the local socket descriptor; listen() should be called on it before it is passed to accept(). addr points to a client socket address structure that receives the address of the connecting entity. The exact format of addr is determined by the address family established when the socket was created. addrlen is the length in bytes of the client socket address. If no error occurs, accept() returns a value of type SOCKET, the descriptor of the socket that accepted the connection. Otherwise it returns INVALID_SOCKET.

accept() is used by connection-oriented servers. The parameters addr and addrlen hold the client's address information. Before the call, addr points to an empty address structure and addrlen holds the size of that structure in bytes. After accept() is called, the server waits to accept a connection request on socket s; the client issues that request with connect(). When a connection request arrives, accept() takes the first request on the pending-connection queue, stores that client's socket address and its length in addr and addrlen, and creates a new socket with the same properties as s. The new socket can be used to serve concurrent requests.

See 5.2.1 for a detailed description of accept().
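
A minimal sketch of accept() with the address parameters, assuming the POSIX sockets API, where the length argument has type socklen_t and the error return is -1. The helper name accept_peer is hypothetical. Note that the length is set to the size of the buffer before the call.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Accept one pending connection on listen_fd and store the client's
   address in *peer. Returns the new connected descriptor, or -1. */
int accept_peer(int listen_fd, struct sockaddr_in *peer)
{
    assert(peer != NULL);
    socklen_t len = sizeof *peer;   /* in: buffer size; out: bytes stored */
    memset(peer, 0, sizeof *peer);
    return accept(listen_fd, (struct sockaddr *)peer, &len);
}
```

The returned descriptor is distinct from listen_fd, so the listening socket stays available for further clients while the new one carries the conversation.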

The four socket system calls socket(), bind(), connect() and accept() together establish a complete five-tuple association. socket() specifies the protocol element of the five-tuple; its use does not depend on whether the caller is a client or a server, or on whether the service is connection-oriented. bind() specifies the local pair in the five-tuple, namely the local host address and port number; its use does depend on whether the service is connection-oriented. On the server side, bind() must be called whether or not the service is connection-oriented. On the client side, a connection-oriented client may skip bind(), because connect() completes the binding automatically; a connectionless client must call bind() to obtain a unique address. The discussion above applies only to the client/server model; sockets can in fact be used very flexibly. The one rule to follow is that a complete association must be established before processes communicate.
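
The automatic binding performed by connect() on the client side can be observed with getsockname(). This is a sketch assuming the POSIX sockets API; the helper name local_port is hypothetical, and the value reported for a socket that has not been bound yet is an observation about common implementations rather than a guarantee.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Return the local port of an AF_INET socket in host byte order,
   or -1 on error. */
int local_port(int fd)
{
    assert(fd >= 0);
    struct sockaddr_in a;
    socklen_t len = sizeof a;
    memset(&a, 0, sizeof a);
    if (getsockname(fd, (struct sockaddr *)&a, &len) < 0)
        return -1;
    return ntohs(a.sin_port);
}
```

A client stream socket typically reports port 0 right after socket(); once connect() succeeds, local_port returns the port the system chose, which completes the local half of the association without an explicit bind().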

When reposting, please credit the original URL: https://www.9cbs.com/read-19854.html
