Tracker Server Source Code Analysis 4: Tracker Class
Author: Ma Ying-jeou
Date: 2004-6-10
This article analyzes the Tracker class, which is defined in track.py.
Before starting, let us review the previous articles to get our bearings.
The BT source code falls into two main parts: one implements the Tracker server, the other implements the BT client. This series of articles covers the implementation of the Tracker server.
BT clients and the Tracker server communicate over the Tracker HTTP protocol, while BT clients communicate with each other over the BT peer protocol.
The Tracker server's job is to collect information about clients and help them discover one another, so that clients can connect to each other and exchange the file pieces they need.
In the Tracker server implementation, the RawServer class first provides the network server functionality, and the HTTPHandler class then performs the first layer of protocol parsing. Because the Tracker HTTP protocol is carried over HTTP, HTTPHandler first parses the client's request according to HTTP, which yields the URL and the HTTP message headers. It then hands the URL and headers to the Tracker class for the second layer of parsing, wraps the result in an HTTP response, and sends it to the client.
The Tracker class parses the Tracker HTTP protocol. From the URL and HTTP headers produced by the first layer, it extracts the client's information (the client's IP address, port, the amount of data downloaded, the amount remaining, and so on). It then combines the information about all current downloaders into a list of other downloaders of the same file (not all of them, only a selection) and hands this list to HTTPHandler, which returns it to the client.
With this layering, the structure of the whole Tracker server implementation is fairly clear.
To analyze the Tracker class, we first need to understand the "status file".
• Status file:
As the first article explained, starting a Tracker server requires at least one parameter: the status file. The Tracker initialization function mainly reads the specified status file and initializes itself from it. So we must first be clear about what the status file is for.
1. The role of the status file:
If the Tracker server stops because of some accident, the downloaders it was serving can no longer continue normally, and their earlier effort is wasted. This cannot be tolerated, so all downloaders must be able to carry on after the Tracker restarts. The Tracker server therefore periodically saves the necessary download state to the status file; when it is restarted after a stop, it restores the "scene" from this information so that the downloaders can continue.
2. Status file format:
The status file holds a fairly complex dictionary nested four levels deep.
To analyze this dictionary in detail, one point must be understood: a single Tracker server can serve several groups of downloaders, each downloading a different file, at the same time.
A group of downloaders fetching the same file necessarily share the same torrent file, and they find the same Tracker server through it. Another group fetching a different file holds a different torrent file, but the two torrent files may point to the same Tracker server. Hence "a single Tracker server can serve several groups of downloaders downloading different files". In practice, sites that offer BT downloads run a few dedicated Tracker servers, each of which provides tracking for many files.
With this understood, we continue with the format of the status file.
First-level dictionary:
The Tracker initialization function contains this code:
if exists(self.dfile):
    h = open(self.dfile, 'rb')
    ds = h.read()
    h.close()
    tempstate = bdecode(ds)
else:
    tempstate = {}
This code reads the status file. Since the data is stored bencoded, it must be decoded; the result is a dictionary saved in tempstate, which is the first-level dictionary. It has two keys, peers and completed, which record the peers taking part in downloads and the peers that have finished downloading (anything that appears in completed must also appear in peers). The values for both keys are dictionaries; we focus on the second-level dictionary stored under the peers key.
Second-level dictionary:
Key: the SHA hash of the info section of the torrent file
Value: a third-level dictionary
Explanation: each downloaded file is uniquely identified by its torrent file. The SHA hash computed over the info section of the torrent file is a 20-byte string that uniquely identifies the downloaded file. The second-level dictionary uses this string as its key and stores the information about the downloaders of that file.
Third-level dictionary:
Key: the downloader's peer id
Value: a fourth-level dictionary
Explanation: each downloader creates a 20-byte string that identifies itself, called the peer id. The third-level dictionary stores the information of each downloader of the file selected in the second-level dictionary.
Fourth-level dictionary:
Keys: ip, port, left, etc.
Values: the downloader's IP address, port number, and the number of bytes not yet downloaded
There are also two optional keys, given ip and nat, which are related to NAT and will be discussed later.
With this four-level nested dictionary understood, we can continue with the analysis of Tracker.
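The four levels can be pictured as a literal Python dictionary. The sketch below is only an illustration: the hash input, the peer id, the address and the byte counts are made-up values, and the completed entry is simply left empty.

```python
import hashlib

# Made-up stand-ins: a real info hash is the SHA-1 of the bencoded info
# section of the torrent file, and a real peer id is chosen by the client.
info_hash = hashlib.sha1(b"example info section").digest()   # 20 bytes
peer_id = b"-XX0001-abcdefghijkl"                             # 20 bytes

state = {                                   # level 1: the whole status file
    "peers": {                              # level 2: keyed by info hash
        info_hash: {                        # level 3: keyed by peer id
            peer_id: {                      # level 4: one downloader
                "ip": "192.168.112.1",
                "port": 9999,
                "left": 2000,
            },
        },
    },
    "completed": {},
}

# Walking down the levels reaches a single downloader's record.
record = state["peers"][info_hash][peer_id]
print(record["ip"], record["port"], record["left"])
```

Looking up one downloader therefore takes three keys in a row: peers, then the info hash, then the peer id.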
Below we look at the member functions of the Tracker class.
• Initialization function __init__():
It begins by initializing some parameters. These are the harder ones to understand:
self.response_size = config['response_size']
self.max_give = config['max_give']
To understand these two parameters, look at the explanation of the "numwant" key in the more detailed BT protocol specification:
· numwant: Optional. Number of peers that the client would like to receive from the tracker. This value is permitted to be zero. If omitted, typically defaults to 50 peers. If a client wants a large peer list in the response, then it... should specify the numwanted parameter.
In other words, by default the number of peers in the Tracker's response is response_size. A downloader that wants more peers must include the numwant key in its request and state how many it would like, say 300. The Tracker then takes the smaller of 300 and max_give as the number of peers to return.
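The rule just described fits in a few lines. This is a minimal sketch of that rule as I read it, not the tracker's own code; the helper name peers_to_return and the values 50 and 200 are made up for the example.

```python
def peers_to_return(params, response_size, max_give):
    """Hypothetical helper: how many peers to put in the response."""
    if "numwant" in params:
        # The client asked for a specific number: cap it at max_give.
        return min(int(params["numwant"]), max_give)
    # No numwant in the request: fall back to the default size.
    return response_size

print(peers_to_return({}, 50, 200))                  # default applies
print(peers_to_return({"numwant": "300"}, 50, 200))  # capped at max_give
print(peers_to_return({"numwant": "0"}, 50, 200))    # zero is permitted
```

Note that the request parameters arrive as strings, so numwant has to be converted to an integer before the comparison.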
self.natcheck = config['nat_check']
self.only_local_override_ip = config['only_local_override_ip']
These two parameters are related to NAT, so we finally have to talk about NAT.
If a BT client sits in a local area network and reaches the Tracker server through NAT, the client IP address that the Tracker obtains from the connection is the NAT's public IP. If other clients try to connect to that IP, they will be rejected by the NAT.
With NAT traversal techniques, outside clients can in some cases get through the NAT and connect to clients inside the LAN; I have posted about this on the forum, for anyone who is interested. I originally thought BT also used some NAT traversal technique, but it now seems that it does not, perhaps because such techniques are complicated and are not guaranteed to work in every case.
Now let us look at the explanation of the "ip" key in the more detailed protocol specification:
· ip: Optional. The true IP address of the client machine, in dotted quad format. Notes: ... In general this parameter is not necessary as the address of the client can be determined from the IP address from which the HTTP request came. The parameter is only needed in the case where the IP address that the request came in on is not the IP address of the client. This happens if the client is communicating to the tracker through a proxy (or a transparent web proxy/cache.) It also is necessary when both the client and the tracker are on the same local side of a NAT gateway. The reason for this is that otherwise the tracker would give out the internal (RFC1918) address of the client, which is not routeable. Therefore the client must explicitly state its (external, routeable) IP address to be given out to external peers. Various trackers treat this parameter differently. Some only honor it only if the IP address that the request came in on is in RFC1918 space. Others honor it unconditionally, while others ignore it completely.
A client's request to the Tracker server may thus contain an "ip" parameter that specifies its own IP address. You may ask why the client should tell the Tracker its own IP address when the Tracker can get it from the connection. The reason is that real networks are complicated. If the client is behind a NAT in a LAN, or reaches the Tracker through a proxy server, then the IP address obtained from the connection is not the client's real IP address; to learn the real one, the client must report it in the protocol. So there are two IP addresses: the one obtained from the connection, which I call the "connect IP", and the one the client passes in the "ip" parameter, which I call the "real IP". Obviously, the Tracker should record the client's "real IP" and give that address to other downloaders.
The "ip" parameter is optional: if the client has a public IP and does not go through NAT or a proxy, it need not pass the parameter, and the "connect IP" is the "real IP".
According to the specification, the "ip" parameter is useful in two cases:
1. The client has a public IP but connects to the Tracker server through a proxy server, so it must pass "ip".
2. The client is in a LAN, and the Tracker happens to be in the same LAN ... (How does this case play out? I haven't figured it out :)
Now back to natcheck and only_local_override_ip:
nat_check: how many times to check if a downloader is behind a NAT (0 = don't check)
only_local_override_ip: whether to ignore the IP passed in the GET parameter when it is a public IP. The default is 1.
This is still hard to grasp; it should become clearer when we come back to it later.
self.becache1 = {}
self.becache2 = {}
self.cache1 = {}
self.cache2 = {}
self.times = {}
Five dictionaries are created here. Leaving times aside, what is the role of the other four?
Let us look at a post by Bram Cohen on the BT porting mailing list:
There are two new GET parameters for the tracker in the latest release. They are -
key=xxxx - this is like peer id, but it's only known to the client and the tracker. It allows clients to be behind dynamic IP. If a peer announced a key previously, then it's accepted if and only if it gives the same key again. If no key was given, then the fallback is checking that the IP has not changed. If the IP has changed, mainline currently will give a peer list but not change any data related to that peer, so that peers behind dynamic IP .
compact=1 - when a client sends this, the 'peers' return value is a single string whose length is a multiple of 6 rather than a dict. To extract peer information from the string, chop it into substrings of length 6. For each substring, the first four bytes are the IP and the last two are the port, encoded big-endian. This results in huge bandwidth savings.
Everybody developing ports should implement these keys, they're very useful.
-Bram
BT keeps evolving, and so does the protocol. Two new keys were introduced, one of which is compact. If the client's request contains compact=1, compact mode is in effect: the Tracker's response uses a more compact form than before, which saves bandwidth.
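The 6-byte layout described in the post can be tried out directly. The sketch below follows that description (four bytes of IPv4 address, then a two-byte big-endian port); the helper names pack_peer and unpack_peers and the sample addresses are my own and are not taken from the tracker source.

```python
import socket
import struct

def pack_peer(ip, port):
    """Encode one peer as 4 bytes of IPv4 address + 2 bytes of big-endian port."""
    return socket.inet_aton(ip) + struct.pack(">H", port)

def unpack_peers(blob):
    """Chop a compact 'peers' string into (ip, port) pairs."""
    if len(blob) % 6 != 0:
        raise ValueError("compact peer string length must be a multiple of 6")
    peers = []
    for i in range(0, len(blob), 6):
        ip = socket.inet_ntoa(blob[i:i + 4])
        (port,) = struct.unpack(">H", blob[i + 4:i + 6])
        peers.append((ip, port))
    return peers

blob = pack_peer("192.168.112.1", 9999) + pack_peer("10.0.0.2", 6881)
print(len(blob))
print(unpack_peers(blob))
```

Each peer costs exactly six bytes here, whereas the normal mode sends a bencoded dictionary with ip, port and peer id for every peer.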
becache1 and cache1 are used in normal mode, while becache2 and cache2 are used in compact mode. Their initialization follows immediately.
if exists(self.dfile):
    h = open(self.dfile, 'rb')
    ds = h.read()
    h.close()
    tempstate = bdecode(ds)
else:
    tempstate = {}
if tempstate.has_key('peers'):
    self.state = tempstate
else:
    self.state = {}
    self.state['peers'] = tempstate
self.downloads = self.state.setdefault('peers', {})
self.completed = self.state.setdefault('completed', {})
statefiletemplate(self.state)
This part of the code reads the status file, initializes the two dictionaries downloads and completed, and checks that the data read is valid.
Now downloads holds the information of all downloaders, and completed holds the information of all downloaders that have finished downloading.
for x, dl in self.downloads.items():
    self.times[x] = {}
    for y, dat in dl.items():
        self.times[x][y] = 0
        if not dat.get('nat', 1):
            ip = dat['ip']
            gip = dat.get('given ip')
            if gip and is_valid_ipv4(gip) and (not self.only_local_override_ip or is_local_ip(ip)):
                ip = gip
            self.becache1.setdefault(x, {})[y] = Bencached(bencode({'ip': ip, 'port': dat['port'], 'peer id': y}))
            self.becache2.setdefault(x, {})[y] = compact_peer_info(ip, dat['port'])
Here times, becache1, and becache2 are initialized. They are two-level nested dictionaries: the first-level key is the hash of the info section of the torrent file, and the second-level key is the downloader's peer id. becache1 stores a Bencached object, while becache2 stores a string that combines the IP and port.
After the parameters are set comes:
rawserver.add_task(self.save_dfile, self.save_dfile_interval)
We have seen add_task() many times. Here it means that save_dfile() is called at a fixed interval to save the status file.
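One way such a periodic save can work is for the task to re-register itself each time it runs. The sketch below assumes that add_task(func, delay) schedules a single call after the delay; that assumption is mine and is not shown in the code quoted here. ToyScheduler and ToyTracker are made-up stand-ins with a fake clock, not the real RawServer or Tracker.

```python
import heapq

class ToyScheduler:
    """Hypothetical stand-in for the server's task queue, driven by a fake clock."""
    def __init__(self):
        self.now = 0.0
        self._queue = []
        self._counter = 0  # tie-breaker so heapq never has to compare functions

    def add_task(self, func, delay):
        # Assumption: one call to add_task schedules exactly one future call.
        heapq.heappush(self._queue, (self.now + delay, self._counter, func))
        self._counter += 1

    def run_until(self, deadline):
        # Run every task that is due on or before the deadline, in time order.
        while self._queue and self._queue[0][0] <= deadline:
            when, _, func = heapq.heappop(self._queue)
            self.now = when
            func()
        self.now = deadline

class ToyTracker:
    def __init__(self, scheduler, interval):
        self.scheduler = scheduler
        self.interval = interval
        self.saves = []  # times at which a "save" happened
        scheduler.add_task(self.save_dfile, interval)

    def save_dfile(self):
        # Re-arm first so the save repeats, then do the (pretend) save.
        self.scheduler.add_task(self.save_dfile, self.interval)
        self.saves.append(self.scheduler.now)

sched = ToyScheduler()
tracker = ToyTracker(sched, interval=300)
sched.run_until(1000)
print(tracker.saves)
```

With an interval of 300 and a run of 1000 time units, the pretend save fires three times.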
There is some more code after this that I did not read carefully, such as allow_get and allowed_dir; understanding it requires reading the related code. If you have studied those parts, I hope you will fill them in.
After initialization comes the most important and longest piece of Tracker code: get().
• get():
In the third article we saw that, after HTTPHandler completes the first layer of parsing of the Tracker HTTP protocol, it calls Tracker.get() for the second layer. Its parameters are the URL and the HTTP message headers.
This function first calls urlparse() to parse the URL, for example:
/announce?ip=192.168.112.1&port=9999&left=2000
Parsing yields the path, which is announce, and the parameters:
ip: 192.168.112.1
port: 9999
left: 2000
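The same split can be reproduced with the standard library of a current Python. This is only a sketch using urllib.parse from Python 3; the tracker source discussed here predates it and does its own parameter handling.

```python
from urllib.parse import urlparse, parse_qs

url = "/announce?ip=192.168.112.1&port=9999&left=2000"

parts = urlparse(url)
path = parts.path.lstrip("/")          # "announce"
# parse_qs returns a list per key; keep the first value of each.
params = {k: v[0] for k, v in parse_qs(parts.query).items()}

print(path)
print(params)
```

All values come back as strings, so numeric fields such as port and left still need to be converted before use.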
The request is then handled separately according to the path.
Normally a client's request to the Tracker uses the path announce. Sometimes, though, a third party wants to query the status of the Tracker server, and it can do so with another path, such as scrape. On some dedicated BT download sites we can see information such as the current number of downloaders and seeds, which is obtained from the Tracker server.
We only look at the case where the path is announce.
First the parameters passed by the client are validated: whether the info_hash key is present, whether the IP address is legal, and so on.
Then,
ip = connection.get_ip()
The IP obtained this way comes from the connection between the client and the Tracker server; it is the "connect IP".
Next,
ip_override = 0
if params.has_key('ip') and is_valid_ipv4(params['ip']) and (not self.only_local_override_ip or is_local_ip(ip)):
    ip_override = 1
The purpose of this code is to decide whether the "real IP" should replace the "connect IP" when the client's IP address is saved. If ip_override is 1, the "real IP" is saved, that is, the "connect IP" is overridden by the "real IP".
Analyzing source code is really a process of guessing the author's intent, and my guess is this:
If the client passes a "real IP" in its request, then since it has reported one, the Tracker should of course save the "real IP". But if the "real IP" is a public IP and only_local_override_ip = 1, meaning that a public "real IP" is to be ignored, then the "connect IP" is saved.
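The condition can be tried on a few inputs in isolation. Read literally, is_local_ip in the quoted condition is applied to ip, which at that point holds the "connect IP"; the sketch below follows that literal reading, which would also match the specification's remark that some trackers honor the parameter only when the request came in from RFC1918 space. The helpers are my own stand-ins for is_valid_ipv4 and is_local_ip (here "local" simply means the three RFC1918 ranges), and choose_ip is a hypothetical name.

```python
import ipaddress

# Stand-in definition of "local": the three RFC1918 private ranges.
RFC1918 = [ipaddress.ip_network(n)
           for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]

def is_valid_ipv4(text):
    try:
        ipaddress.IPv4Address(text)
        return True
    except ValueError:
        return False

def is_local_ip(text):
    addr = ipaddress.IPv4Address(text)
    return any(addr in net for net in RFC1918)

def choose_ip(params, connect_ip, only_local_override_ip):
    """Hypothetical helper: return (ip_to_save, ip_override)."""
    given = params.get("ip")
    override = bool(
        given
        and is_valid_ipv4(given)
        and (not only_local_override_ip or is_local_ip(connect_ip))
    )
    return (given if override else connect_ip), override

# Request arrives from a private address: the reported IP is honored.
print(choose_ip({"ip": "1.2.3.4"}, "192.168.1.5", 1))
# Request arrives from a public address: the reported IP is ignored.
print(choose_ip({"ip": "1.2.3.4"}, "5.6.7.8", 1))
# With only_local_override_ip = 0 the reported IP is always honored.
print(choose_ip({"ip": "1.2.3.4"}, "5.6.7.8", 0))
```

Under this reading, the option decides whose word is trusted: a client that connects from a public address cannot replace it with an arbitrary address of its choosing.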
To be honest, I still do not understand why a parameter like only_local_override_ip is needed.
if peers.has_key(myid):
    myinfo = peers[myid]
    if myinfo.has_key('key'):
        if params.get('key') != myinfo['key']:
            return (200, 'OK', {'Content-Type': 'text/plain', 'Pragma': 'no-cache'},
                bencode({'failure reason': 'key did not match key supplied earlier'}))
        confirm = 1
    elif myinfo['ip'] == ip:
        confirm = 1
else:
    confirm = 1
This code deals with authentication. I did not read it carefully; for the meaning of "key", see Bram Cohen's post above.
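The three branches can be restated as a small function to make the rule from Bram Cohen's post easier to follow. This is a sketch under one assumption that the quoted code does not show: that confirm starts out false before these lines run. The name confirm_peer and the sample records are made up.

```python
def confirm_peer(stored, params, connect_ip):
    """Hypothetical helper: return (confirmed, failure_reason).

    stored is the record already saved for this peer id, or None if the
    peer id has not been seen before.
    """
    if stored is None:
        return True, None              # first announce from this peer id
    if "key" in stored:
        if params.get("key") != stored["key"]:
            return False, "key did not match key supplied earlier"
        return True, None              # same key as before
    # No key on record: fall back to checking that the IP has not changed.
    return stored["ip"] == connect_ip, None

print(confirm_peer(None, {}, "1.2.3.4"))
print(confirm_peer({"key": "abc", "ip": "1.2.3.4"}, {"key": "abc"}, "9.9.9.9"))
print(confirm_peer({"key": "abc", "ip": "1.2.3.4"}, {"key": "xyz"}, "1.2.3.4"))
print(confirm_peer({"ip": "1.2.3.4"}, {}, "9.9.9.9"))
```

A peer that supplied a key earlier can change its IP and still be accepted, which is what lets clients behind dynamic IP keep their record.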
Next, if the verification passes and the event is not "stopped", the client's information is saved; if information for this client already exists, it is updated. Note that the ip_override flag comes into play here: if it is set, the "real IP" is saved; otherwise the "connect IP" is saved.
if port == 0:
    peers[myid]['nat'] = 2**30
elif self.natcheck and not ip_override:
    to_nat = peers[myid].get('nat', -1)
    if to_nat and to_nat < self.natcheck:
        NatCheck(self.connectback_result, infohash, myid, ip, port, self.rawserver)
else:
    peers[myid]['nat'] = 0
As for the first case, port == 0, I do not know what it means. The second case is the one where NAT needs to be checked. Roughly, the Tracker server actively performs a BT peer protocol handshake with the client. If the handshake succeeds, the client can be connected to directly. This matters: if the Tracker server cannot connect directly to the client, other downloaders cannot connect to it either. The NatCheck class used here is also a Handler class; the details are left for the reader to analyze.
data = {'interval': self.reannounce_interval}
From here to the end, the code handles normal mode and compact mode separately, returning randomly chosen peers from becache1 or becache2, respectively.
Here we summarize the use of cache1, becache1, cache2, and becache2. cache1 and cache2 seem to serve no purpose, since I could not see what they are for from the code. becache1 and becache2 cache peers for normal mode and compact mode, respectively. They are initialized from the status file; when a new peer appears, it is added to both caches; on a "stopped" event, the corresponding peer is removed from them. Finally, the Tracker picks random peers from the appropriate cache and returns them to the client.
• connectback_result():
This function is the callback given to the NatCheck class. It does some processing based on the result of the Tracker server's attempt to connect to the client. The result parameter indicates whether the Tracker managed to establish a connection to the client. If it did, the peer is evidently not behind NAT; otherwise it is.
record['nat'] += 1
I did not understand this: why not simply set record['nat'] = 1?
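One possible reading of the nat field, offered only as a guess, is that it counts failed connect-back attempts. That would fit the description of nat_check quoted earlier ("how many times to check if a downloader is behind a NAT") and the default of -1 used when no value has been recorded yet. The sketch below assumes that a success resets the counter to 0 and each failure adds 1; the function names and the limit of 3 are made up.

```python
def needs_nat_check(record, natcheck_limit):
    """Hypothetical helper: should the tracker try to connect back again?"""
    to_nat = record.get("nat", -1)     # -1 means "never checked"
    # 0 means "known reachable"; a count at or above the limit means "give up".
    return bool(to_nat) and to_nat < natcheck_limit

def update_nat_counter(record, connected):
    """Hypothetical helper: record the outcome of one connect-back attempt."""
    if connected:
        record["nat"] = 0              # reachable, so not behind NAT
    elif "nat" not in record:
        record["nat"] = 1              # first failure
    else:
        record["nat"] += 1             # one more failure
    return record

record = {}
attempts = 0
while needs_nat_check(record, 3):
    update_nat_counter(record, connected=False)
    attempts += 1
print(attempts, record)

# A very large value such as 2**30 is never below the limit,
# so under this reading such a peer would never be checked.
print(needs_nat_check({"nat": 2 ** 30}, 3))
```

If this reading is right, a counter rather than a plain flag is what lets the Tracker retry an unreachable peer a limited number of times before giving up.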
Finally, if the connection was established, becache1 and becache2 are updated.
The analysis of the Tracker class is now basically complete. There are places I have not fully understood and many where the analysis is not deep enough; I hope we can discuss them further. This concludes the Tracker server analysis series. After a break, I will start writing analysis articles from the perspective of the BT client. Tired!