Client Source Code Analysis, Part 3: The StorageWrapper Class
Author: Ma Ying-jeou
Date: 2004-6-30
Role of StorageWrapper: it splits each file piece further into sub-pieces and creates the Request messages for those sub-pieces. When a sub-piece arrives, its data is written to disk.
Please read this together with the analysis of the Storage class.
Some explanations:
1. To improve transfer performance, BT cuts each file piece into multiple sub-pieces.
2. To get a sub-piece, BT must send a Request message to a peer that has that sub-piece (for the Request message, see the "BT Protocol Specification").
3. For example, a piece of size 256K with index 10 is divided into 16 sub-pieces of 16K each, so a Request message has to be generated for each of these 16 sub-pieces. Before being issued, these requests are kept in inactive_requests as a list. For this piece, the entry at inactive_requests[10] (the subscript is the piece index) holds the following list: [(0, 16K), (16K, 16K), (32K, 16K), (48K, 16K), (64K, 16K), (80K, 16K), (96K, 16K), (112K, 16K), (128K, 16K), (144K, 16K), (160K, 16K), (176K, 16K), (192K, 16K), (208K, 16K), (224K, 16K), (240K, 16K)]. This is done in the _make_inactive() function. Because these requests have not been sent yet, they are called inactive requests. Once a request has been sent, it is called an active request. numactive records, for each piece, how many requests have been sent and not yet answered. When a sub-piece is received, the number of active requests drops by 1. amount_inactive records the total size of the sub-pieces for which no request has been issued yet.
4. Every time a sub-piece is obtained, it is written to disk. If the piece that the sub-piece belongs to has no space allocated on disk yet, space must first be allocated for the whole piece. How is space allocated for a piece? This is the hardest code to understand in the StorageWrapper class. The "space allocation algorithm" itself is very simple, but because the code has no comments, it took me several days to work it out. For the detailed analysis, see the comments in _piece_came_in().
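The split described in item 3 can be sketched on its own. This is a minimal standalone version written in modern Python (the class analyzed here is Python 2); the function name make_inactive and the constant K are chosen for illustration, and the loop follows the same idea as _make_inactive(): emit full-size requests, then one final request for whatever remains.

```python
def make_inactive(piece_length, request_size):
    """Split one piece into a list of (begin, length) sub-piece requests."""
    requests = []
    begin = 0
    # Emit full-size requests while more than one request's worth remains.
    while begin + request_size < piece_length:
        requests.append((begin, request_size))
        begin += request_size
    # The final request covers whatever is left and may be shorter.
    requests.append((begin, piece_length - begin))
    return requests


K = 1024  # illustrative constant: 1K bytes
requests = make_inactive(256 * K, 16 * K)
print(len(requests))   # 16
print(requests[0], requests[-1])
```

A piece whose length is not a multiple of the request size simply ends with a shorter request, which is how the last piece of a file is handled.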
# Names this class relies on. In the original module, dummy_status and
# dummy_data_flunked are no-op default callbacks; they are not shown here.
from sha import sha
from threading import Event
from bitfield import Bitfield

class StorageWrapper:
    def __init__(self, storage, request_size, hashes,
            piece_size, finished, failed,
            statusfunc = dummy_status, flag = Event(), check_hashes = True,
            data_flunked = dummy_data_flunked):
        self.storage = storage  # the Storage object
        self.request_size = request_size  # sub-piece size
        self.hashes = hashes  # digest of each piece, taken from the torrent file
        self.piece_size = piece_size  # piece size
        self.data_flunked = data_flunked  # callback invoked when a piece fails its integrity check
        self.total_length = storage.get_total_length()  # total file size
        self.amount_left = self.total_length  # amount of the file not yet obtained
        # Sanity-check the total file size.
        # The first test uses len(hashes) - 1 because the last piece may be
        # shorter than piece_size.
        if self.total_length <= piece_size * (len(hashes) - 1):
            raise ValueError, 'bad data from tracker - total too small'
        if self.total_length > piece_size * len(hashes):
            raise ValueError, 'bad data from tracker - total too big'
        # Two callbacks, invoked when the download completes and when it fails
        self.finished = finished
        self.failed = failed
        # The roles of the following variables were introduced above.
        self.numactive = [0] * len(hashes)
        # inactive_requests: every entry is initialized to 1, meaning that requests
        # still have to be sent for that piece. During the check of the disk file
        # below, the entries of pieces that are already complete are set to None,
        # meaning that no request needs to be sent for them.
        self.inactive_requests = [1] * len(hashes)
        self.amount_inactive = self.total_length
        # Are we in endgame mode? Endgame mode is described in "Incentives Build
        # Robustness in BitTorrent". As shown later, endgame mode is entered once
        # the request for the last sub-piece has been issued.
        self.endgame = False
        self.have = Bitfield(len(hashes))
        # Whether each piece has had its integrity checked
        self.waschecked = [check_hashes] * len(hashes)
        # These two variables are used by the "space allocation algorithm"
        self.places = {}
        self.holes = []
        if len(hashes) == 0:
            finished()
            return
        targets = {}
        total = len(hashes)
        # Check every piece
        for i in xrange(len(hashes)):
            # If space for this piece has not been fully allocated on disk, the piece
            # still has to be downloaded, so an entry is added to the targets
            # dictionary (if one already exists it is reused). The key is the digest
            # of the piece; the value is a list, to which the index of this piece is
            # appended.
            # This confused me for a while, because I had always assumed that
            # different pieces have different digests. Later I understood: two
            # different pieces can have the same digest, as long as their contents
            # are identical.
            # This is very important for the analysis that follows.
            if not self._waspre(i):
                targets.setdefault(hashes[i], []).append(i)
                total -= 1
        numchecked = 0.0
        if total and check_hashes:
            statusfunc({"activity": 'checking existing file', "fractionDone": 0})
        # This is a function nested inside another function. C++ allows nested
        # classes, but it does not seem to have nested functions. This function can
        # only be used inside __init__().
        # It is called once a piece has been confirmed as obtained.
        # piece: index of the piece
        # pos: the position on disk where this piece is stored
        # For example, piece 5 may be stored in the position of piece 2. See the
        # "space allocation algorithm" later.
        def markgot(piece, pos, self = self, check_hashes = check_hashes):
            self.places[piece] = pos
            self.have[piece] = True
            self.amount_left -= self._piecelen(piece)
            self.amount_inactive -= self._piecelen(piece)
            # No request needs to be sent for this piece.
            self.inactive_requests[piece] = None
            self.waschecked[piece] = check_hashes
        lastlen = self._piecelen(len(hashes) - 1)  # length of the last piece
        # For every piece
        for i in xrange(len(hashes)):
            # If space for this piece has not been fully allocated on disk, add the
            # index of the piece to holes.
            if not self._waspre(i):
                self.holes.append(i)
            # Otherwise the space has been allocated, but that still does not
            # guarantee that the piece has been fully obtained. As mentioned in the
            # analysis of Storage, the file may contain "holes".
            # If no check is required, simply call markgot() to mark the piece as
            # obtained. This is obviously an irresponsible approach.
            elif not check_hashes:
                markgot(i, i)
            # If a validity check is required
            else:
                # sha is a Python built-in module that wraps the SHA-1 digest
                # algorithm. SHA-1 computes a 160-bit (20-byte) message digest from
                # data of any length. The torrent file stores the digest of every
                # piece. After receiving a piece, the receiver computes its digest and
                # compares it with the corresponding value in the torrent file. If
                # they differ, the data was changed in transit and must be discarded.
                # First, a sha object is built from the lastlen bytes of data that
                # start at the beginning of piece i.
                sh = sha(self.storage.read(piece_size * i, lastlen))
                # Compute the digest of this data
                sp = sh.digest()
                # Then update the sha object with the remaining data of piece i.
                # See the Python documentation for sha.update(). For two pieces of
                # data a and b,
                #     sh = sha(a)
                #     sh.update(b)
                # is equivalent to sh = sha(a + b).
                # So the update below leaves sh in the same state as
                #     sha(self.storage.read(piece_size * i, self._piecelen(i)))
                sh.update(self.storage.read(piece_size * i + lastlen, self._piecelen(i) - lastlen))
                # Therefore s, computed here, is the digest of piece i.
                # (My original confusion: why not compute the digest of piece i
                # directly instead of taking this detour? After analyzing the "space
                # allocation algorithm" the code made sense: sp, the digest of the
                # first lastlen bytes, is compared with hashes[-1] below to recognize
                # the last piece when it is stored in the position of piece i.)
                s = sh.digest()
                # If the computed digest matches hashes[i] (which comes from the
                # torrent file), this piece is valid and already exists on disk.
                if s == hashes[i]:
                    markgot(i, i)
                # The data matches another piece of the same length that still needs
                # space: that piece is stored in position i.
                elif (targets.get(s)
                        and self._piecelen(i) == self._piecelen(targets[s][-1])):
                    markgot(targets[s].pop(), i)
                # The first lastlen bytes match the last piece, which we do not have
                # yet: the last piece is stored in position i.
                elif (not self.have[len(hashes) - 1]
                        and sp == hashes[-1]
                        and (i == len(hashes) - 1 or not self._waspre(len(hashes) - 1))):
                    markgot(len(hashes) - 1, i)
                else:
                    self.places[i] = i
                if flag.isSet():
                    return
                numchecked += 1
                statusfunc({'fractionDone': 1 - float(self.amount_left) / self.total_length})
        # If all pieces have been obtained, finish.
        if self.amount_left == 0:
            finished()
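The equivalence between update() and hashing the concatenated data, which the check loop in __init__() relies on, can be tried in isolation. This is a minimal sketch in modern Python, using hashlib.sha1 as a stand-in for the old sha module; the function name prefix_and_full_digest and the sample bytes are made up for illustration.

```python
import hashlib


def prefix_and_full_digest(data, last_len):
    """Return (digest of data[:last_len], digest of all of data) in one pass."""
    sh = hashlib.sha1(data[:last_len])
    prefix_digest = sh.digest()      # digest() does not reset the hash state
    sh.update(data[last_len:])       # continue hashing the rest of the piece
    full_digest = sh.digest()
    return prefix_digest, full_digest


piece = b"0123456789abcdef"          # pretend this is one full piece on disk
sp, s = prefix_and_full_digest(piece, 5)
print(s == hashlib.sha1(piece).digest())
print(sp == hashlib.sha1(piece[:5]).digest())
```

One read of the piece therefore yields both values that the loop needs: the digest of the whole piece and the digest of a prefix as long as the last piece.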

    # Check whether space has already been allocated on disk for a piece, by
    # calling Storage::was_preallocated()
    def _waspre(self, piece):
        return self.storage.was_preallocated(piece * self.piece_size,
            self._piecelen(piece))

    # Get the length of the specified piece. Only the last piece may be shorter
    # than piece_size.
    def _piecelen(self, piece):
        if piece < len(self.hashes) - 1:
            return self.piece_size
        else:
            return self.total_length - piece * self.piece_size

    # Return the amount of the file still to be obtained
    def get_amount_left(self):
        return self.amount_left

    # Whether any part of the file has been obtained
    def do_I_have_anything(self):
        return self.amount_left < self.total_length

    # Cut the specified piece into sub-pieces
    def _make_inactive(self, index):
        # First get the length of this piece
        length = min(self.piece_size, self.total_length - self.piece_size * index)
        l = []
        x = 0
        # For better transfer performance, BT divides each piece into smaller
        # sub-pieces. The sub-piece size is defined in download.py:
        #     'download_slice_size', 2 ** 14,
        #     "How many bytes to query for per request."
        # so a sub-piece is 16K. The loop below cuts a piece into sub-pieces.
        while x + self.request_size < length:
            l.append((x, self.request_size))
            x += self.request_size
        l.append((x, length - x))
        # Save l in the inactive_requests list
        self.inactive_requests[index] = l

    # Whether we are in endgame mode. For endgame mode, see "Incentives Build
    # Robustness in BitTorrent".
    def is_endgame(self):
        return self.endgame

    def get_have_list(self):
        return self.have.tostring()

    def do_I_have(self, index):
        return self.have[index]

    # Does the specified piece still have requests that have not been issued?
    # Returns true if so, false otherwise.
    def do_I_have_requests(self, index):
        return not not self.inactive_requests[index]

    # Create a request for the specified piece. The return value is a pair, for
    # example (32K, 16K), meaning that the sub-piece starts at offset 32K and its
    # size is 16K.
    def new_request(self, index):
        # returns (begin, length)
        # If the request list for this piece has not been created yet, call
        # _make_inactive() to create it.
        # (inactive_requests[index] is initialized to 1.)
        if self.inactive_requests[index] == 1:
            self._make_inactive(index)
        # numactive[index] records how many requests have been issued for this piece.
        self.numactive[index] += 1
        rs = self.inactive_requests[index]
        # Take the smallest request (the one with the lowest starting offset) out of
        # inactive_requests.
        r = min(rs)
        rs.remove(r)
        # amount_inactive records the total size of the sub-pieces for which no
        # request has been issued yet.
        self.amount_inactive -= r[1]
        # If this was the last sub-piece, enter endgame mode
        if self.amount_inactive == 0:
            self.endgame = True
        # Return this request
        return r

    def piece_came_in(self, index, begin, piece):
        try:
            return self._piece_came_in(index, begin, piece)
        except IOError, e:
            self.failed('IO Error ' + str(e))
            return True

    # Called when a sub-piece has been obtained.
    # index: index of the piece that the sub-piece belongs to
    # begin: starting offset of the sub-piece within the piece
    # piece: the actual data
    def _piece_came_in(self, index, begin, piece):
        # If no sub-piece of this piece has been obtained before, space must first be
        # allocated on disk. The space allocation algorithm works as follows.
        # Suppose there are 6 pieces in total and pieces 0, 1 and 4 have space
        # allocated. Then
        #     holes: [2, 3, 5]
        #     places: {0: 0, 1: 1, 4: 4}
        # Now space has to be allocated for piece 5. The idea is to temporarily give
        # piece 5 the space that should belong to piece 2. After allocation,
        #     holes: [3, 5]
        #     places: {0: 0, 1: 1, 4: 4, 5: 2}
        # Suppose the next step is to allocate space for piece 2. The space of 2 is
        # occupied by 5, so the data of 5 is moved to the space of 3, and 2 can use
        # its own space. After allocation,
        #     holes: [5]
        #     places: {0: 0, 1: 1, 2: 2, 4: 4, 5: 3}
        # Finally, space is allocated for piece 3. Its space is occupied by 5, so the
        # data of 5 is moved to 5's own space, and 3 can use its own space.
        # After allocation,
        #     holes: []
        #     places: {0: 0, 1: 1, 2: 2, 3: 3, 4: 4, 5: 5}
        # The rather obscure code below implements this space allocation algorithm.
        if not self.places.has_key(index):
            n = self.holes.pop(0)
            if self.places.has_key(n):
                # Piece n is stored in a borrowed position: move it to position n.
                oldpos = self.places[n]
                old = self.storage.read(self.piece_size * oldpos, self._piecelen(n))
                if self.have[n] and sha(old).digest() != self.hashes[n]:
                    self.failed('data corrupted on disk - maybe you have two copies running?')
                    return True
                self.storage.write(self.piece_size * n, old)
                self.places[n] = n
                if index == oldpos or index in self.holes:
                    self.places[index] = oldpos
                else:
                    # Some piece p occupies position index: move it to oldpos.
                    for p, v in self.places.items():
                        if v == index:
                            break
                    self.places[index] = index
                    self.places[p] = oldpos
                    old = self.storage.read(self.piece_size * index, self.piece_size)
                    self.storage.write(self.piece_size * oldpos, old)
            elif index in self.holes or index == n:
                if not self._waspre(n):
                    self.storage.write(self.piece_size * n, self._piecelen(n) * chr(0xFF))
                self.places[index] = n
            else:
                # Some piece p occupies position index: move it to position n.
                for p, v in self.places.items():
                    if v == index:
                        break
                self.places[index] = index
                self.places[p] = n
                old = self.storage.read(self.piece_size * index, self._piecelen(n))
                self.storage.write(self.piece_size * n, old)
        # Call Storage::write() to write this sub-piece to disk. Note that it is
        # written into the space given by places[index].
        self.storage.write(self.places[index] * self.piece_size + begin, piece)
        # A sub-piece has arrived, so the number of outstanding requests drops by one.
        self.numactive[index] -= 1
        # If there are neither requests that have not been issued nor requests still
        # outstanding (numactive[index] == 0 means that every issued request has
        # received its response data), the entire piece has been obtained.
        if not self.inactive_requests[index] and not self.numactive[index]:
            # Check the validity of the entire piece. If it passes the check
            if sha(self.storage.read(self.piece_size * self.places[index], self._piecelen(index))).digest() == self.hashes[index]:
                # "I" now have this piece
                self.have[index] = True
                self.inactive_requests[index] = None
                # and its validity has been checked
                self.waschecked[index] = True
                self.amount_left -= self._piecelen(index)
                if self.amount_left == 0:
                    self.finished()
            # If it fails the validity check
            else:
                self.data_flunked(self._piecelen(index))
                # The piece has to be discarded and requested again
                self.inactive_requests[index] = 1
                self.amount_inactive += self._piecelen(index)
                return False
        return True

    # Called when a request for a sub-piece that was sent to a peer is lost
    def request_lost(self, index, begin, length):
        self.inactive_requests[index].append((begin, length))
        self.amount_inactive += length
        self.numactive[index] -= 1

    def get_piece(self, index, begin, length):
        try:
            return self._get_piece(index, begin, length)
        except IOError, e:
            self.failed('IO Error ' + str(e))
            return None

    def _get_piece(self, index, begin, length):
        if not self.have[index]:
            return None
        if not self.waschecked[index]:
            # Check the hash of the piece. If it is wrong, return None
            if sha(self.storage.read(self.piece_size * self.places[index], self._piecelen(index))).digest() != self.hashes[index]:
                self.failed('told file complete on start-up, but piece failed hash check')
                return None
            # The hash check passed
            self.waschecked[index] = True
        # Check whether the sub-piece extends past the end of the piece
        if begin + length > self._piecelen(index):
            return None
        # Call Storage::read() to read the sub-piece data from disk and return it.
        return self.storage.read(self.piece_size * self.places[index] + begin, length)
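The worked example in the _piece_came_in() comments (six pieces, with pieces 0, 1 and 4 already allocated) can be replayed with a bookkeeping-only sketch. It is written in modern Python and leaves out all disk reads, writes and hash checks; the function name allocate and the returned list of moves are inventions for illustration, and only the updates to places and holes follow the branches of the original code.

```python
def allocate(index, places, holes):
    """Pick a disk position for piece `index`, updating places and holes in place.

    Returns the data moves the real code would perform, as
    (piece, from_position, to_position) tuples.
    """
    moves = []
    if index in places:
        return moves                    # the piece already has a position
    n = holes.pop(0)                    # first position not yet allocated
    if n in places:
        # Piece n sits in a borrowed position; move it home to position n.
        oldpos = places[n]
        moves.append((n, oldpos, n))
        places[n] = n
        if index == oldpos or index in holes:
            places[index] = oldpos      # reuse the position piece n vacated
        else:
            # Some piece p occupies position `index`; move it to oldpos.
            p = next(k for k, v in places.items() if v == index)
            moves.append((p, index, oldpos))
            places[p] = oldpos
            places[index] = index
    elif index in holes or index == n:
        places[index] = n               # take the fresh position directly
    else:
        # Some piece p occupies position `index`; move it to position n.
        p = next(k for k, v in places.items() if v == index)
        moves.append((p, index, n))
        places[p] = n
        places[index] = index
    return moves


places = {0: 0, 1: 1, 4: 4}
holes = [2, 3, 5]
for index in (5, 2, 3):
    moves = allocate(index, places, holes)
    print(index, moves, holes, dict(sorted(places.items())))
```

Replaying pieces 5, 2 and 3 in that order ends with every piece in its own position and no holes left, matching the three steps listed in the comments.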