Detailed explanation of the official BitTorrent protocol (formal translation)
Document description: this translation was completed independently by Lone Wave
Original text: http://bitconjurer.org/bittorrent/protocol.html
Author: Bram Cohen
Translated and revised by Lone Wave
Reposting is welcome; please keep this header with the file
For non-commercial reference only
Detailed explanation of the official BitTorrent protocol
BitTorrent (BT) is a file distribution protocol. It identifies content by URL and is designed to integrate seamlessly with the web. Its advantage over plain HTTP is that downloaders fetching the same file at the same time also upload to each other, so the file source can support a very large number of downloaders with only a modest increase in its own load.
A BT file distribution involves the following entities:
· An ordinary web server
· A static 'metainfo' file
· A BT tracker
· An 'original' downloader
· The end users' web browsers
· The end users' downloaders
Ideally, a single file has many end users.
To set up a BT server, follow these steps:
1. Start running a tracker (or have one running already);
2. Start running an ordinary web server, such as Apache (or have one running already);
3. Associate the .torrent extension with the MIME type application/x-bittorrent on the web server (or have done so already);
4. Create a metainfo (.torrent) file from the complete file to be published and the tracker's URL;
5. Put the metainfo file on the web server;
6. Link to the metainfo (.torrent) file from a web page;
7. Start the original downloader, which already has the complete file.
To download with BT, a user does the following:
1. Install the BT client program (or have it installed already);
2. Browse the web;
3. Click on a link to a .torrent file;
4. Choose where to save the file locally, or select an unfinished download to resume;
5. Wait for the download to complete;
6. Tell the downloader to exit (it keeps uploading until then).
The connectivity is as follows:
· The web site serves static files as usual, but starts the BitTorrent helper application on the client (hereinafter the official client program);
· The tracker receives information from all downloaders and gives each of them a random list of peers. This is done over HTTP or HTTPS;
· Downloaders check in with the tracker periodically to keep it informed of their progress, and upload to and download from each other over direct connections. These connections use the BitTorrent peer protocol, which runs over TCP;
· The original downloader only uploads and never downloads, because it already has the entire file. It is needed to get every part of the file into the network. For popular downloads, the original downloader can often be taken down after a fairly short time, because several downloads have completed and been left running (they then do the uploading in place of the original downloader).
The metainfo file and tracker responses are both sent in a simple, efficient, and extensible format called bencoding (B-encoding). Bencoded messages are nested dictionaries and lists (as in Python), which contain strings and integers. Extensibility comes from ignoring unexpected dictionary keys, so additional optional keys can be added later. The rules of bencoding are as follows:
· Strings are written as the string length in base ten, followed by a colon and the original string.
For example, 4:spam corresponds to 'spam'.
· Integers are written as an 'i', followed by the number in base ten, followed by an 'e'. For example, i3e corresponds to 3 and i-3e corresponds to -3. Integers have no size limit. i-0e is invalid. Every encoding with a leading zero, such as i03e, is invalid, except i0e, which corresponds to 0.
· Lists are encoded as an 'l', followed by their elements (also bencoded), followed by an 'e'. For example, l4:spam4:eggse corresponds to ['spam', 'eggs'].
· Dictionaries are encoded as a 'd', followed by alternating keys and their corresponding values, followed by an 'e'.
For example, d3:cow3:moo4:spam4:eggse corresponds to {'cow': 'moo', 'spam': 'eggs'}
and d4:spaml1:a1:bee corresponds to {'spam': ['a', 'b']}.
Keys must be strings and must appear in sorted order (sorted as raw strings, not alphanumerically).
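The rules above can be sketched as a small encoder and decoder. This is a minimal illustration rather than the official implementation: the names bencode, bdecode, and _decode_at are hypothetical, text strings are assumed to be UTF-8, and the decoder returns strings as bytes and does not check that dictionary keys arrive in sorted order.

```python
import re

_INT = re.compile(rb"-?[1-9][0-9]*|0")  # rejects "-0" and leading zeros


def bencode(value):
    """Encode an int, str/bytes, list, or dict with the rules listed above."""
    if isinstance(value, bool):
        raise TypeError("booleans have no bencoded form")
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")  # assumption: text is encoded as UTF-8
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(item) for item in value) + b"e"
    if isinstance(value, dict):
        pairs = []
        for key, item in value.items():
            if isinstance(key, str):
                key = key.encode("utf-8")
            if not isinstance(key, bytes):
                raise TypeError("dictionary keys must be strings")
            pairs.append((key, item))
        pairs.sort(key=lambda pair: pair[0])  # keys sorted as raw strings
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in pairs) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


def bdecode(data):
    """Decode exactly one bencoded value; strings are returned as bytes."""
    value, end = _decode_at(data, 0)
    if end != len(data):
        raise ValueError("trailing bytes after the bencoded value")
    return value


def _decode_at(data, pos):
    lead = data[pos:pos + 1]
    if lead == b"i":
        end = data.index(b"e", pos)
        digits = data[pos + 1:end]
        if not _INT.fullmatch(digits):
            raise ValueError("invalid integer %r" % digits)
        return int(digits), end + 1
    if lead in (b"l", b"d"):
        items, pos = [], pos + 1
        while data[pos:pos + 1] != b"e":
            item, pos = _decode_at(data, pos)
            items.append(item)
        if lead == b"l":
            return items, pos + 1
        # A dictionary is a flat run of key, value, key, value, ...
        if len(items) % 2 or not all(isinstance(k, bytes) for k in items[::2]):
            raise ValueError("dictionary needs string keys paired with values")
        return dict(zip(items[::2], items[1::2])), pos + 1
    # Anything else must be a length-prefixed string.
    colon = data.index(b":", pos)
    prefix = data[pos:colon]
    if not prefix.isdigit():
        raise ValueError("invalid string length")
    end = colon + 1 + int(prefix)
    if end > len(data):
        raise ValueError("string runs past the end of the data")
    return data[colon + 1:end], end
```

For instance, bencode({'spam': ['a', 'b']}) produces the bytes d4:spaml1:a1:bee, and bdecode reverses it.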
The metainfo file is a bencoded dictionary with the following keys:
announce
The URL of the tracker.
info
This key maps to a dictionary with the following keys:
The name key maps to a string giving the suggested name under which to save the file (or directory). It is purely advisory.
The piece length key maps to the number of bytes in each piece the file is split into. For transfer, the file is split into fixed-size pieces that are all the same length, except possibly the last one, which may be truncated (what remains is shorter than one piece). The piece length is almost always a power of two, most commonly 2^18 = 256K (official BitTorrent versions before 3.2 used 2^20 = 1M as the default).
The pieces key maps to a string whose length is a multiple of 20. It is divided into 20-byte strings, each of which is the SHA1 hash of the piece at the corresponding index.
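As a rough illustration of how the pieces string relates to the file contents, the sketch below splits a byte string into pieces and concatenates their SHA1 digests. The function names piece_hashes and hash_for_index are hypothetical, and the data is held in memory only to keep the example short.

```python
import hashlib

PIECE_LENGTH = 2 ** 18  # 256K, the most common piece length


def piece_hashes(data, piece_length=PIECE_LENGTH):
    """Return the value for the pieces key: one 20-byte SHA1 digest per piece."""
    digests = []
    for start in range(0, len(data), piece_length):
        piece = data[start:start + piece_length]  # the last piece may be shorter
        digests.append(hashlib.sha1(piece).digest())
    return b"".join(digests)


def hash_for_index(pieces, index):
    """Pick the 20-byte hash that belongs to the piece at the given index."""
    return pieces[20 * index:20 * index + 20]
```

A downloader would compare hash_for_index(pieces, i) with the SHA1 digest of piece i once that piece has arrived.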
There is also a key length or a key files, but never both and never neither. If length is present, the metainfo file describes the download of a single file (the single-file case); otherwise it describes a set of files arranged in a directory structure (the multi-file case).
In the single-file case, length maps to the length of the file in bytes.
For the purposes of the other keys, the multi-file case is treated as a single large file formed by concatenating the files in the order they appear in the files list. The files key maps to a list of dictionaries, each of which contains the following keys:
length
The length of the file in bytes.
path
A list of strings giving subdirectory names, the last of which is the name of the file itself.
(A zero-length list is an error.)
In the single-file case, the name key is the name of a file; in the multi-file case, it is the name of a directory.
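The concatenation rule for the multi-file case can be made concrete with a short sketch. It assumes the info dictionary has already been decoded into Python values with text keys; file_spans and files_for_piece are hypothetical helper names, and paths are joined with '/' purely for display.

```python
def file_spans(info):
    """Return (path, start, end) byte ranges within the concatenated stream."""
    if ("length" in info) == ("files" in info):
        raise ValueError("exactly one of length and files must be present")
    if "length" in info:  # single-file case: name is the file name
        return [(info["name"], 0, info["length"])]
    spans, offset = [], 0
    for entry in info["files"]:  # multi-file case: name is the directory name
        if not entry["path"]:
            raise ValueError("a zero-length path list is an error")
        path = "/".join([info["name"], *entry["path"]])
        spans.append((path, offset, offset + entry["length"]))
        offset += entry["length"]
    return spans


def files_for_piece(info, index):
    """List the files that contribute at least one byte to the given piece."""
    start = index * info["piece length"]
    end = start + info["piece length"]
    return [path for path, lo, hi in file_spans(info)
            if max(lo, start) < min(hi, end)]
```

With a piece length of 4 and files of 6 and 3 bytes, piece 1 covers bytes 4 to 7 of the stream and therefore spans both files.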
Tracker queries are two-way. The tracker receives information through HTTP GET parameters and returns a bencoded message. Although the current tracker runs its own web server, it could run just as well as, for example, an Apache module.
The tracker's GET requests have the following keys:
info_hash
The 20-byte SHA1 hash of the bencoded form of the info value from the metainfo file. Note that this bencoded form is a substring of the metainfo file. This value will almost certainly have to be escaped.
peer_id
A 20-byte string that this downloader uses as its ID. Each downloader generates its own ID at random at the start of a new download. This value will also almost certainly have to be escaped.
ip
An optional parameter giving the IP address (or DNS host name) of this peer. It is generally used for the original downloader when it runs on the same machine as the tracker.
port
The port this peer is listening on. The official default is to try port 6881 first; if that port is taken, the downloader tries 6882, then 6883, and so on, and gives up if no free port has been found by 6889.
uploaded
The total amount uploaded so far, encoded in base-ten ASCII.
downloaded
The total amount downloaded so far, encoded in base-ten ASCII.
left
The number of bytes this peer still has to download, encoded in base-ten ASCII. This number cannot be computed from the file length and the amount downloaded, because the download may be a resume, and some downloaded data may have failed the integrity check and had to be downloaded again.
event
An optional key whose value is started, completed, or stopped (or empty, which is the same as not being present). If the key is absent, the request is one of the announcements made at regular intervals. An announcement with started is sent when a download first begins, and one with completed is sent when the download finishes. No completed is sent if the file was already complete when the downloader started. Downloaders send an announcement with stopped when they cease downloading.
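A request with these keys might be assembled as in the sketch below. It is an illustration only: announce_url is a hypothetical helper and the tracker address in the usage note is made up. The escaping of the raw 20-byte values is done by the standard library's urllib.parse.urlencode, which percent-encodes bytes values.

```python
from urllib.parse import urlencode

EVENTS = ("started", "completed", "stopped")


def announce_url(announce, info_hash, peer_id, port, uploaded, downloaded,
                 left, event=None, ip=None):
    """Build the GET URL for one tracker announcement."""
    if len(info_hash) != 20 or len(peer_id) != 20:
        raise ValueError("info_hash and peer_id must be 20 bytes each")
    params = {
        "info_hash": info_hash,  # raw bytes, escaped by urlencode
        "peer_id": peer_id,      # raw bytes, escaped by urlencode
        "port": port,
        "uploaded": uploaded,    # integers are written in base-ten ASCII
        "downloaded": downloaded,
        "left": left,
    }
    if ip:
        params["ip"] = ip
    if event:  # omitted for the regular periodic announcements
        if event not in EVENTS:
            raise ValueError("event must be started, completed, or stopped")
        params["event"] = event
    separator = "&" if "?" in announce else "?"
    return announce + separator + urlencode(params)
```

For example, announce_url("http://tracker.example/announce", info_hash, peer_id, 6881, 0, 0, 1000, event="started") yields the URL for the first announcement of a new download.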
The tracker's response is also a bencoded dictionary. If the response has the key failure reason, its value is a human-readable string explaining why the query failed, and no other keys are required. Otherwise it must have two keys: interval, the number of seconds the downloader should wait between regular requests, and peers. The peers key maps to a list of dictionaries, one per peer, each containing the keys peer id, ip, and port, which give the peer's self-selected ID, its IP address or DNS host name as a string, and its port number. Note that a downloader may send a request outside the regular schedule if an event happens or it needs more peers.
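Handling such a response could look like the following sketch. It assumes the response has already been bencode-decoded into a dictionary whose keys and strings are bytes; parse_tracker_response and TrackerFailure are hypothetical names.

```python
class TrackerFailure(Exception):
    """Raised when the tracker answers with a failure reason."""


def parse_tracker_response(response):
    """Return (interval_seconds, [(peer_id, host, port), ...])."""
    if b"failure reason" in response:
        # No other keys are required in this case.
        reason = response[b"failure reason"].decode("utf-8", "replace")
        raise TrackerFailure(reason)
    interval = response[b"interval"]
    peers = [
        (peer[b"peer id"], peer[b"ip"].decode("ascii"), peer[b"port"])
        for peer in response[b"peers"]
    ]
    return interval, peers
```

The caller would wait interval seconds before its next regular announcement and try connecting to the returned peers in the meantime.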
If you want to extend the metainfo file or tracker queries, please coordinate with Bram Cohen to make sure that all extensions are compatible.
BitTorrent's peer protocol operates over TCP. It performs efficiently without setting any socket options.
Peer connections are symmetrical. Messages sent in both directions look the same, and data can flow in either direction. The peer protocol refers to pieces of the file by index as described in the metainfo file, starting at zero. When a peer finishes downloading a piece and checks that its hash matches, it announces to all of its peers that it has that piece.
Each end of a connection holds two bits of state: choked or not, and interested or not. Choking is a notification that no data will be sent until unchoking happens. The reasoning and common techniques behind choking are explained later.
Data transfer takes place whenever one side is interested and the other side is not choking. The interest state must be kept up to date at all times: whenever a downloader has nothing it would request from a peer if unchoked, it must express lack of interest, even while it is choked. Implementing this properly is tricky, but it lets downloaders know which peers will start downloading immediately once unchoked.
Connections start out choked and not interested.
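The per-connection state can be pictured as four flags, two for each direction. The sketch below is one possible way to hold them; the class and field names are assumptions made for illustration and do not come from the protocol text.

```python
from dataclasses import dataclass


@dataclass
class ConnectionState:
    # Connections start out choked and not interested, in both directions.
    am_choking: bool = True        # we are choking the peer
    am_interested: bool = False    # we want something the peer has
    peer_choking: bool = True      # the peer is choking us
    peer_interested: bool = False  # the peer wants something we have

    def can_download(self):
        """Data flows to us when we are interested and the peer is not choking."""
        return self.am_interested and not self.peer_choking

    def can_upload(self):
        """Data flows to the peer when it is interested and we are not choking."""
        return self.peer_interested and not self.am_choking
```

A fresh ConnectionState() allows no transfer in either direction until both an interest and an unchoke have been signalled.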
When data is being transferred, downloaders should keep several piece requests queued up at once to get good TCP performance (this is called 'pipelining'). On the other side, requests that cannot be written to the TCP buffer immediately should be queued in memory rather than kept in an application-level network buffer, so that they can all be discarded when a choke happens.
The peer wire protocol consists of a handshake followed by a never-ending stream of length-prefixed messages. The handshake starts with the character nineteen (decimal), followed by the string 'BitTorrent protocol'. The leading character is a length prefix, put there in the hope that other new protocols will do the same and so be easy to tell apart.
All later integers sent in the protocol are encoded as four bytes big-endian.
After the fixed header come eight reserved bytes, which are all zero in current implementations. If you want to use these eight reserved bytes to extend the protocol, please coordinate with Bram Cohen to make sure that all extensions are compatible.
Next comes the 20-byte SHA1 hash of the bencoded form of the info value from the metainfo file. This is the same value that is announced to the tracker as info_hash, except that here it is raw rather than escaped. If the two sides do not send the same value, they sever the connection. The one possible exception is a downloader that wants to handle multiple downloads over a single port: it may wait for an incoming connection to send its hash first, and respond with the same hash if that hash is in its list.
After the hash comes the 20-byte peer ID, which is reported in tracker requests and contained in the peer lists in tracker responses. If the receiving side's peer ID does not match the one the initiating side expects, the initiating side severs the connection.
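Put together, the handshake is 68 bytes long: 1 length byte, 19 bytes of protocol name, 8 reserved bytes, 20 bytes of hash, and 20 bytes of peer ID. The sketch below builds and checks one; build_handshake and parse_handshake are hypothetical names, and raising ValueError stands in for severing the connection.

```python
PROTOCOL_NAME = b"BitTorrent protocol"  # 19 bytes, hence the leading 19
HANDSHAKE_LENGTH = 1 + len(PROTOCOL_NAME) + 8 + 20 + 20


def build_handshake(info_hash, peer_id):
    if len(info_hash) != 20 or len(peer_id) != 20:
        raise ValueError("info_hash and peer_id must be 20 bytes each")
    reserved = bytes(8)  # all zero in current implementations
    return bytes([len(PROTOCOL_NAME)]) + PROTOCOL_NAME + reserved + info_hash + peer_id


def parse_handshake(data, expected_info_hash, expected_peer_id=None):
    """Return (reserved, peer_id), or raise ValueError to signal a disconnect."""
    if len(data) != HANDSHAKE_LENGTH:
        raise ValueError("handshake must be exactly 68 bytes")
    if data[0] != len(PROTOCOL_NAME) or data[1:20] != PROTOCOL_NAME:
        raise ValueError("not a BitTorrent handshake")
    reserved, info_hash, peer_id = data[20:28], data[28:48], data[48:68]
    if info_hash != expected_info_hash:
        raise ValueError("info_hash mismatch")
    if expected_peer_id is not None and peer_id != expected_peer_id:
        raise ValueError("unexpected peer id")
    return reserved, peer_id
```

The initiating side would pass the peer ID it received from the tracker as expected_peer_id; an accepting side that does not know the ID in advance can leave it as None.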
That completes the handshake. Next comes an alternating stream of length prefixes and messages. Messages of length zero are keepalives and are ignored. Keepalives are generally sent once every two minutes, but timeouts can be applied much sooner when data is expected.
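Reading such a stream means cutting a growing receive buffer into complete messages. The following sketch shows one way to do it, assuming the handshake bytes have already been consumed; frame and split_messages are hypothetical names.

```python
import struct


def frame(body):
    """Prefix a message body with its length as four bytes big-endian."""
    return struct.pack(">I", len(body)) + body


KEEPALIVE = frame(b"")  # a message of length zero


def split_messages(buffer):
    """Return (complete message bodies, leftover bytes); keepalives are dropped."""
    messages, pos = [], 0
    while len(buffer) - pos >= 4:
        (length,) = struct.unpack_from(">I", buffer, pos)
        if len(buffer) - pos - 4 < length:
            break  # the rest of this message has not arrived yet
        body = buffer[pos + 4:pos + 4 + length]
        pos += 4 + length
        if length:  # zero-length messages are keepalives and are ignored
            messages.append(body)
    return messages, buffer[pos:]
```

The leftover bytes are kept and prepended to whatever the socket delivers next.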
Every non-keepalive message starts with a single byte that gives its type. The possible values are:
· 0 - choke
· 1 - unchoke
· 2 - interested
· 3 - not interested
· 4 - have
· 5 - bitfield
· 6 - request
· 7 - piece
· 8 - cancel
'choke', 'unchoke', 'interested', and 'not interested' messages have no payload.
'bitfield' is only ever sent as the first message. Its payload is a bitfield in which each index the downloader has is set to 1 and the rest are set to 0. Downloaders that have nothing yet may skip the 'bitfield' message. The first byte of the bitfield corresponds to indices 0-7 from high bit to low bit, the second byte to indices 8-15, and so on. Spare bits at the end are set to 0.
The payload of a 'have' message is a single number: the index of the piece that the downloader has just completed and whose hash it has checked.
'request' messages contain an index, begin, and length. The last two are byte counts: begin is the offset within the piece, and length is the number of bytes requested. The length is generally a power of two unless it is truncated by the end of the file. Current implementations use 2^15, and close connections that request more than 2^17.
'cancel' messages have the same payload as 'request' messages. They are generally sent only near the end of a download, in what is called 'endgame mode'. When a download is almost complete, the last few pieces tend to all be downloaded over a single slow line, which takes a long time. To make sure the last pieces arrive quickly, once requests for all the pieces a downloader still lacks are pending, it sends requests for everything to every peer it is downloading from. To keep this from becoming badly inefficient, it sends cancels to the other peers each time a piece arrives.
'piece' messages contain an index, begin, and the piece data. Note that they are correlated with 'request' messages implicitly. An unexpected piece may arrive if 'choke' and 'unchoke' messages are sent in quick succession, if the transfer is going very slowly, or both.
Downloaders generally download pieces in random order, which does a reasonably good job of keeping them from holding only a strict subset or superset of the pieces held by any of their peers.
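The message layouts described above (a type byte followed by a payload, all integers four bytes big-endian) can be sketched as follows. The helper names and constants are hypothetical; only the type numbers, the bit order of the bitfield, and the 2^15 and 2^17 sizes are taken from the text.

```python
import struct
from enum import IntEnum


class MessageType(IntEnum):
    CHOKE = 0
    UNCHOKE = 1
    INTERESTED = 2
    NOT_INTERESTED = 3
    HAVE = 4
    BITFIELD = 5
    REQUEST = 6
    PIECE = 7
    CANCEL = 8


REQUEST_LENGTH = 2 ** 15      # the size current implementations ask for
MAX_REQUEST_LENGTH = 2 ** 17  # larger requests get the connection closed


def message(msg_type, payload=b""):
    """Length prefix, one type byte, then the payload."""
    return struct.pack(">IB", len(payload) + 1, msg_type) + payload


def have(index):
    return message(MessageType.HAVE, struct.pack(">I", index))


def request(index, begin, length=REQUEST_LENGTH):
    return message(MessageType.REQUEST, struct.pack(">III", index, begin, length))


def cancel(index, begin, length=REQUEST_LENGTH):
    # Same payload as the request it withdraws.
    return message(MessageType.CANCEL, struct.pack(">III", index, begin, length))


def piece(index, begin, block):
    return message(MessageType.PIECE, struct.pack(">II", index, begin) + block)


def bitfield(have_indices, num_pieces):
    bits = bytearray((num_pieces + 7) // 8)  # spare bits at the end stay 0
    for index in have_indices:
        if not 0 <= index < num_pieces:
            raise ValueError("piece index out of range")
        bits[index // 8] |= 0x80 >> (index % 8)  # index 0 is the high bit
    return message(MessageType.BITFIELD, bytes(bits))


def parse_request(body):
    """Parse a request body (type byte plus payload); reject oversized lengths."""
    msg_type, index, begin, length = struct.unpack(">BIII", body)
    if msg_type != MessageType.REQUEST:
        raise ValueError("not a request message")
    if length > MAX_REQUEST_LENGTH:
        raise ValueError("request too large; the connection should be closed")
    return index, begin, length
```

In endgame mode, a downloader would send request(index, begin) to every peer it is downloading from and, once the data arrives from one of them, send the matching cancel(index, begin) to the others.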
Choking is done for several reasons. TCP congestion control behaves very poorly when sending over many connections at once. Choking also lets each downloader ensure that it gets a consistent download rate.
The choking algorithm described below is the one currently deployed. It is very important that any new algorithm works well both in a network consisting entirely of itself and in a network consisting mostly of this one.
A good choking algorithm should meet several criteria. It should cap the number of simultaneous uploads to get good TCP performance. It should avoid choking and unchoking in quick alternation, known as 'fibrillation'. It should reciprocate to peers that let it download. Finally, it should occasionally try out unused connections to find out whether they are better than the ones currently in use, which is known as optimistic unchoking.
The currently deployed choking algorithm avoids fibrillation by changing which peers are choked only once every 10 seconds. It handles reciprocation and the upload cap by unchoking the 4 interested peers from which it gets the best download rates. Peers that have a better rate but are not interested are unchoked as well, and if one of them becomes interested, the worst of the uploaders is choked. If a downloader has the complete file, it uses its upload rate rather than its download rate to decide whom to unchoke.
For optimistic unchoking, at any one time there is a single peer that is unchoked regardless of its upload rate (if it is interested, it counts as one of the 4 allowed downloaders). Which peer is optimistically unchoked rotates every 30 seconds. To give them a decent chance of getting a complete piece to upload (and so of earning data in return), newly connected peers are three times as likely as others to be chosen as the optimistic unchoke.
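One round of the choking decision might be sketched as below. This is a simplified reading of the description, not the deployed code: Peer, choose_unchoked, and pick_optimistic are hypothetical names, rates are assumed to be measured elsewhere, and the caller is assumed to invoke choose_unchoked every 10 seconds and pick_optimistic every 30 seconds.

```python
import random
from dataclasses import dataclass

MAX_DOWNLOADERS = 4  # interested peers allowed to download at once


@dataclass
class Peer:
    name: str
    rate: float          # download rate from it (or our upload rate to it once we are complete)
    interested: bool
    is_new: bool = False


def choose_unchoked(peers, optimistic=None):
    """Return the names of the peers to unchoke for the next 10 seconds."""
    unchoked, slots = set(), MAX_DOWNLOADERS
    if optimistic is not None:
        unchoked.add(optimistic.name)  # unchoked regardless of its rate
        if optimistic.interested:
            slots -= 1                 # counts as one of the four downloaders
    for peer in sorted(peers, key=lambda p: p.rate, reverse=True):
        if slots == 0:
            break
        if peer.name in unchoked:
            continue
        # Faster peers that are not interested are unchoked too; they only
        # take up a slot once they become interested in a later round.
        unchoked.add(peer.name)
        if peer.interested:
            slots -= 1
    return unchoked


def pick_optimistic(peers, rng=random):
    """Pick the optimistic unchoke; new connections are three times as likely."""
    weights = [3 if peer.is_new else 1 for peer in peers]
    return rng.choices(peers, weights=weights, k=1)[0]
```

Because uninterested peers ranked above the slowest chosen downloader are already unchoked, a peer that becomes interested displaces that slowest downloader at the next round without any extra bookkeeping.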
Glossary (original term: rendering used by the translator):
Protocol: protocol
'metainfo' file: meta-information file (.torrent file)
'original' downloader: "original" downloader
Key: key
Bencoding: B-encoded format
SHA1 hash: SHA1 check code
Piece: block
Single-file case: single-file download
Multi-file case: multi-file download
Choking: blocking others
Choked: being blocked by others
Interested: concerned
Four bytes big-endian: four bytes, big-endian (to be determined)
Congestion control: congestion control (to be determined)
Optimistic unchoking: tentative unchoking