1 In recent years, peer-to-peer (P2P) computing has drawn wide attention in academia, but industry still has no unified definition of P2P. Several commonly used definitions are listed below for reference:

1. Peer-to-peer is a type of Internet network that allows a group of computer users running the same networking program to connect with one another and directly access files on one another's hard drives.

2. Peer-to-peer networking (P2P) is an application that runs on a personal computer and shares files with other users across the Internet. P2P networks work by connecting individual computers directly instead of going through a central server.

3. P2P is a distributed network in which participants share part of their hardware resources (processing capability, storage, network connectivity, printers, etc.). These shared resources provide services and content over the network and can be accessed directly by other peers without passing through intermediate entities. Each participant in such a network is both a provider of resources (services and content), i.e. a server, and a consumer of them, i.e. a client.

Although these definitions differ slightly, their common point is that P2P breaks the traditional Client/Server (C/S) mode: every node in the network has equal status, acting both as a server that provides services to other nodes and as a client that consumes the services other nodes provide. The figure below compares the P2P and C/S modes:
Client/Server mode vs. Peer-to-Peer mode. P2P technology is characterized by the following aspects:

Decentralization: the resources and services in the network are dispersed across all nodes, and information transfer and service delivery take place directly between nodes, without intermediate links or servers, avoiding possible bottlenecks.

Scalability: P2P's decentralization also gives it advantages in scalability and robustness. As users join a P2P network, the demand for service grows, but the overall resources and service capacity of the system expand in step, so the system can always meet users' needs. Because the whole system is fully distributed, there is no bottleneck; in theory its scalability can be considered almost unlimited.

Robustness: the P2P architecture is naturally attack-resistant and fault-tolerant. Since services are delivered between dispersed nodes, damage to some nodes or parts of the network has little effect on the rest. A P2P network typically adjusts its overall topology when some nodes fail, maintaining connectivity among the remaining nodes. P2P networks are usually built in a self-organizing way, allow nodes to join and leave freely, and can adapt continuously to changes in bandwidth, node population, and load.

Cost/performance ratio: the performance advantage is an important reason why P2P has received wide attention. With the development of hardware, the computing power, storage capacity, and network bandwidth of personal computers have grown rapidly, roughly following Moore's law. A P2P architecture can effectively exploit the large number of ordinary nodes spread across the Internet, distributing computational tasks or data to all nodes and using their idle computing power and storage to achieve high-performance computing and mass storage. By harnessing the abundant free resources in the network, it can deliver higher computing and storage capacity at lower cost.
Privacy protection: in a P2P network, information is transmitted directly between nodes without passing through a central point, so the chance of users' private information being eavesdropped on or leaked is greatly reduced. In addition, current solutions to Internet privacy problems rely mainly on relay forwarding, hiding the communication among many network entities. In traditional anonymous communication systems this mechanism is realized by a few relay server nodes; in P2P, every participant can provide relay forwarding, which greatly improves the flexibility and reliability of anonymous communication and offers users better privacy protection.

Load balancing: since each node in a P2P environment is both a server and a client, the demands on the computing power and storage of a traditional C/S server are reduced; and because resources are distributed across many nodes, load balancing over the whole network is better achieved.

P2P technology thus has unmatched advantages and broad application prospects. P2P application software keeps emerging, and user numbers have grown dramatically. According to data from www.slyck.com in March 2004, the user counts of the major P2P applications ranged from millions to tens of millions and were rising sharply, placing a huge load on Internet bandwidth. P2P computing technology is also continually being applied in military, commercial, government information, communication, and other fields. Compared with traditional distributed systems, P2P applications can be divided into the following types:

P2P networks for sharing files and other content, such as Napster, Gnutella, eDonkey, eMule, BitTorrent, etc.;
P2P networks for sharing computing power and storage capacity, such as SETI@home, Avaki, Popular Power, etc.;
Collaborative processing and service-sharing platforms based on the P2P model, such as JXTA, Magi, Groove, .NET My Services, etc.;
Instant messaging, including ICQ, OICQ, Yahoo! Messenger, etc.;
Secure P2P communication and information sharing, such as Skype, Crowds, Onion Routing, etc.
1.2 P2P Technology Research Status at Home and Abroad

1.2.1 Topology Research

Topology in a P2P network refers to the physical or logical interconnection relationships between the computing units of the distributed system, and the topology among nodes has always been an important basis for determining the system type. The topologies widely used on the Internet today are centralized and hierarchical, whereas P2P systems generally build non-centralized topologies; during construction they must solve the problems of naming, organizing, and locating the large number of nodes the system contains, of node join/leave behavior, and of error recovery. P2P research distinguishes four topology forms: Centralized Topology; Decentralized Unstructured Topology; Decentralized Structured Topology (also known as DHT networks); and Partially Decentralized (semi-distributed) Topology.

The centralized topology is the structural mode used by the first generation of P2P networks, and its classic case is the famous MP3-sharing software Napster. Its greatest advantage is that maintenance and discovery are simple: because resource discovery depends on a centralized directory system, the discovery algorithm is flexible and efficient and supports complex queries. Its biggest problems resemble those of the traditional client/server structure: it is prone to single points of failure, "hot spot" phenomena, and legal disputes. Napster was one of the earliest P2P systems and grew rapidly in a short time. It is not a pure P2P system: a central server stores the indexes and storage locations of all the music files uploaded by users. When a user needs a music file, it first connects to the Napster server, the server performs the retrieval and returns the information of the users holding the file, and the requester then transfers the file directly from its owner.
Napster was the first system to separate file query from file transfer, which effectively saves the bandwidth of the central server and reduces the system's file-transfer delay. The biggest hidden danger of this approach lies in the central server: if it fails, the whole system is paralyzed. When the number of users grows to 10^5 or more, Napster's performance drops sharply. Another problem is security: Napster provides no effective security mechanism. In the Napster model, a group of high-performance central servers stores the directory information of all active peers in the network. When a peer needs to query a file, it issues a query request to a central server; the server performs the retrieval and returns a list of the addresses of peers that satisfy the query. On receiving the response, the querying peer selects a peer according to information such as network traffic and delay, establishes a connection, and starts the file transfer. The working principle of Napster is shown in Figure 1. Such a peer network model has several problems, mainly: (1) paralysis of the central server easily brings down the whole network, so its reliability and security are low; (2) as the network grows, the cost of maintaining and updating the central index server rises sharply; (3) the existence of the central server invites disputes over the copyright of shared resources, for which this model has been attacked as an impure P2P network model. For small networks, the centralized directory model has certain advantages in management and control, but given these defects it is unsuitable for large-scale applications.
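The centralized directory just described can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Napster's actual protocol: the names (`IndexServer`, `publish`, `search`) are invented for the example, and the real system adds authentication, chat, and transfer negotiation.

```python
# Minimal sketch of a Napster-style centralized index. The server only
# answers "who has this file?"; the transfer itself happens peer-to-peer.

class IndexServer:
    """Central directory: file name -> set of peer addresses."""

    def __init__(self):
        self.index = {}  # filename -> set of peer addresses

    def publish(self, peer, filenames):
        """A peer registers the files it shares."""
        for name in filenames:
            self.index.setdefault(name, set()).add(peer)

    def unpublish(self, peer):
        """Remove a departing peer from every entry."""
        for peers in self.index.values():
            peers.discard(peer)

    def search(self, filename):
        """Return the peers holding the file; the requester then
        connects to one of them directly, not through this server."""
        return sorted(self.index.get(filename, ()))

server = IndexServer()
server.publish("peer-a:6699", ["song.mp3", "talk.mp3"])
server.publish("peer-b:6699", ["song.mp3"])
print(server.search("song.mp3"))   # both peers hold the file
server.unpublish("peer-a:6699")
print(server.search("song.mp3"))   # only peer-b remains
```

The single point of failure is visible in the sketch itself: every `search` goes through one `IndexServer` object, so losing it paralyzes discovery even though the files still sit on the peers.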
Figure: Napster structure and the Napster interface.

The fully distributed unstructured topology organizes the overlay as a random graph whose node degrees follow a "power-law" [A][B] distribution, so that destination nodes can be discovered, dynamic changes of the network are tolerated, and fault tolerance and availability are good. It can also support complex queries, such as multi-keyword queries with regular expressions and fuzzy queries. The most typical case is Gnutella. Search relies on TTL (time-to-live)-limited flooding, random walks, or selective forwarding, so the network diameter is uncontrolled and scalability is poor.

Gnutella is a P2P file-sharing system. Its biggest difference from Napster is that Gnutella is a pure P2P system with no index server; it uses a flooding discovery mechanism over a fully random graph together with random forwarding (random walker), and it controls the propagation of search messages by decrementing a TTL (time to live). For the details, see the protocol specification [Gnutella protocol, Chinese translation]. In the Gnutella distributed network model, every networked computer is equivalent in function, being both a client and a server, and is therefore called a servent (server + client). As nodes keep joining, the network keeps growing, and locating peers by flooding causes a sharp increase in network traffic, leaving many low-bandwidth nodes overloaded by the drain on network resources. In the early Gnutella network there were consequently serious partition and broken-link phenomena: a query could reach only a small portion of the network, so the network scaled badly. Solving Gnutella's scalability problem is therefore critical to the network's further development.
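The TTL-limited flooding described above can be sketched as follows. This is an illustration, not the Gnutella wire protocol: the overlay is a plain adjacency structure, the field and function names are invented, and duplicate suppression stands in for the message-GUID mechanism real servents use.

```python
# Sketch of TTL-limited flooding over an unstructured overlay.

def flood_search(graph, start, target_key, ttl):
    """Forward the query to all neighbours, one ring of the overlay per
    remaining TTL unit. Returns (nodes reached, whether any of them
    holds target_key). A seen-set suppresses duplicate deliveries."""
    seen = {start}
    frontier = [start]
    hit = target_key in graph[start]["keys"]
    for _ in range(ttl):
        nxt = []
        for node in frontier:
            for nb in graph[node]["neighbours"]:
                if nb not in seen:
                    seen.add(nb)
                    nxt.append(nb)
                    if target_key in graph[nb]["keys"]:
                        hit = True
        frontier = nxt
    return seen, hit

overlay = {
    "A": {"neighbours": ["B", "C"], "keys": []},
    "B": {"neighbours": ["A", "D"], "keys": []},
    "C": {"neighbours": ["A"], "keys": []},
    "D": {"neighbours": ["B"], "keys": ["song.mp3"]},
}
_, found = flood_search(overlay, "A", "song.mp3", ttl=1)
print(found)   # TTL 1 reaches only B and C: not found
_, found = flood_search(overlay, "A", "song.mp3", ttl=2)
print(found)   # TTL 2 also reaches D, which holds the file
```

The trade-off in the text is visible here: a larger TTL finds more, but the number of messages grows with every ring of neighbours the flood touches, which is exactly why low-bandwidth nodes get overloaded.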
Without the support of a deterministic topology, an unstructured network cannot guarantee the efficiency of resource discovery: even when the destination node exists, discovery may fail. The accuracy and scalability of discovery are therefore the two key problems facing unstructured networks. Current research on such structures focuses mainly on improved discovery algorithms and replication strategies to raise the accuracy and performance of discovery.
The flooding search algorithm of the original Gnutella was carried into the second-generation Gnutella protocol, whose most classic implementation is BearShare. In unstructured networks the overlay is regarded as a completely random graph, and the links between nodes do not follow any predefined topology. These systems generally provide no performance guarantees, but their fault tolerance is good, they support complex queries, and frequent node joins and departures have little impact. However, query results may be incomplete, queries are slow, the load that broadcast querying imposes on the system is very large, and there is a scalability problem. Because of the poor scalability of random-graph search in unstructured systems, a large body of research has turned to how to build highly structured systems. The focus of current research is how to locate information efficiently, and the newest results are DHT-based distributed discovery and routing algorithms. These algorithms avoid both a Napster-style central server and Gnutella-style broadcast: a distributed hash function maps an input keyword to exactly one node, and a routing algorithm then establishes a connection with that node. The latest results center on the distributed hash table (DHT): a hash table maintained collectively by a large number of nodes over a wide area, split into non-contiguous blocks, with each node assigned its own block and acting as that block's manager. The distributed hash table originated in recent research on building overlay routing networks that reduce routing-table size and routing delay. The basic premise of these new topologies is that the participating nodes are dynamic, so self-organization and fault tolerance become important design goals.
A hash function maps the name or keyword of an object to a 128-bit or 160-bit hash value, and the nodes of a DHT-based system are mapped into the same identifier space: if the hash function maps a node's name to a hash value, that value fixes the node's position in the space. In research on SDDS (Scalable Distributed Data Structures) [A], Gribble et al. implemented a highly scalable, fault-tolerant SDDS cluster. Beyond the one-dimensional space of the DHT table, richer topologies have been introduced to reflect the structure of the underlying network. A DHT structure adapts to the dynamic joining and leaving of nodes and has good scalability, robustness, uniformity of node-ID distribution, and self-organizing capability. Since the overlay uses a deterministic topology, a DHT can provide accurate discovery: as long as the destination node exists in the network, the DHT can always find it, so the accuracy of discovery is guaranteed. The most classic cases are Tapestry, Chord, CAN, and Pastry.

Tapestry provides a distributed, fault-tolerant lookup and routing platform on which a variety of P2P applications can be developed (OceanStore is one application on this platform). Tapestry's idea derives from Plaxton: in Plaxton's scheme, a node uses its own neighbor table to forward a message step by step according to the destination ID. Tapestry builds on Plaxton's idea and incorporates a fault-tolerance mechanism, thereby adapting to the dynamic changes of P2P networks. OceanStore is a P2P application for global data storage that takes Tapestry as its lookup and routing infrastructure. Any user can join the OceanStore system, either contributing their own storage or using the resources in the system. By using replication and caching, OceanStore improves lookup efficiency.
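The key-to-node mapping at the heart of a DHT can be sketched as follows, assuming SHA-1 (160-bit, as above) and the successor rule used by Chord-style rings; other DHTs use other distance metrics. The helper names `ident` and `responsible_node` are invented for the sketch.

```python
# Sketch of the basic DHT idea: hash node names and object keywords into
# one m-bit identifier space, and store each key on the first node whose
# ID is >= the key's hash, wrapping around the ring.

import hashlib
from bisect import bisect_left

M = 160  # identifier-space width in bits, matching SHA-1

def ident(s, m=M):
    """Map a node name or object keyword to an m-bit identifier."""
    return int(hashlib.sha1(s.encode()).hexdigest(), 16) % (2 ** m)

def responsible_node(node_ids, key_hash):
    """Successor rule: first node ID >= key_hash on the sorted ring,
    wrapping past the largest ID back to the smallest."""
    i = bisect_left(node_ids, key_hash)
    return node_ids[i % len(node_ids)]

nodes = sorted(ident(f"node-{i}") for i in range(8))
owner = responsible_node(nodes, ident("song.mp3"))
# Every peer can compute the same owner independently, with no index
# server and no broadcast:
assert owner in nodes
```

This determinism is what guarantees accurate discovery: if the owner node is alive, the same computation at any peer names it.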
Recently Tapestry has been substantially improved to adapt to the dynamic characteristics of P2P networks, adding extra mechanisms that give the network soft state and providing self-organization, robustness, scalability, and dynamic adaptability. Its weaknesses are that performance degrades under high network load, that it relies on global information, that root nodes fail easily, and that its resilience is limited.

Pastry is a scalable distributed object location and routing protocol proposed by Microsoft Research that can be used to build large-scale P2P systems. In Pastry, each node is assigned a 128-bit node identifier (nodeId); all nodeIds form a circular identifier space ranging from 0 to 2^128 - 1, and a node is assigned its nodeId at random in this 128-bit space when it joins the system, for example by hashing its IP address.

MIT has carried out several P2P-related research projects: Chord, Grid, and RON. The goal of the Chord project is to provide a distributed resource-discovery service suited to the P2P environment; by adding routing information on top of consistent hashing, locating a specified object requires only a routing table of logarithmic size. In DHT technology, each network node is assigned a unique node identifier (node ID) in a certain way, each resource object obtains a unique resource identifier (object ID) through a hash operation, and the resource is stored at the node whose ID equals or is closest to the object ID; when the resource is needed, the same method locates the node storing it. Chord's main contribution is thus a distributed lookup protocol that maps a specified keyword (key) to the corresponding node. Algorithmically, Chord is a variant of consistent hashing. MIT's Grid and RON projects provide a system framework for lookup in distributed wide-area networks.
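Chord's logarithmic-size routing table and greedy lookup can be sketched on a small 6-bit ring, using the classic ten-node example identifier set. This is a simplified illustration: finger tables are precomputed globally here, whereas real Chord builds and repairs them with a periodic stabilization protocol, and the helper names are invented for the sketch.

```python
# Sketch of Chord-style lookup: each node keeps m fingers (successors of
# node + 2^i) and forwards a query to the closest preceding finger.

M = 6               # identifier space: 0 .. 2^6 - 1
RING = 2 ** M

def between(x, a, b):
    """True if x lies in the open ring interval (a, b)."""
    return a < x < b if a < b else x > a or x < b

def succ_id(ident, nodes):
    """First node ID >= ident, wrapping (nodes sorted ascending)."""
    for n in nodes:
        if n >= ident:
            return n
    return nodes[0]

def fingers(node, nodes):
    """finger[i] = successor of (node + 2^i) mod 2^m."""
    return [succ_id((node + 2 ** i) % RING, nodes) for i in range(M)]

def lookup(start, key, nodes):
    """Route greedily via the closest preceding finger; return the node
    responsible for key and the number of hops taken."""
    table = {n: fingers(n, nodes) for n in nodes}
    node, hops = start, 0
    while True:
        succ = succ_id((node + 1) % RING, nodes)
        if between(key, node, succ) or key == succ:
            return succ, hops
        nxt = next((f for f in reversed(table[node])
                    if between(f, node, key)), succ)
        node, hops = nxt, hops + 1

nodes = [1, 8, 14, 21, 32, 38, 42, 48, 51, 56]
owner, hops = lookup(8, 54, nodes)
# Key 54 is owned by node 56, found in a few hops rather than by
# visiting the ring node by node.
```

Because each hop roughly halves the remaining ring distance, the hop count stays around O(log n), while each node stores only m = log2(ring size) fingers.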
The CAN (Content-Addressable Network) project of ACIRI (the AT&T Center for Internet Research at ICSI) is unique in using a multi-dimensional identifier space to implement the distributed hash algorithm. CAN maps all nodes into an n-dimensional Cartesian space and assigns each node a zone of its own as evenly as possible. The hash function used by CAN hashes a (key, value) pair to a point in the Cartesian space and stores the pair at the node owning the zone containing that point. CAN's routing algorithm is quite direct and simple: knowing the coordinates of the target point, the request is passed to whichever of the current node's neighboring zones is coordinate-wise closest. CAN is a system with good scalability: for n nodes and system dimension d, the routing path length is O(n^(1/d)), and the routing-table state each node maintains is O(d), independent of the network size.

The biggest problem with DHT structures is that their maintenance mechanism is complex: in particular, the network churn caused by nodes joining and leaving greatly increases the maintenance cost of the DHT. Another problem facing DHTs is that they support only exact keyword-match queries and cannot support complex queries over content or semantics.
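CAN's key-to-point mapping and its per-dimension routing cost can be sketched as follows. The sketch assumes the d-dimensional space is split into a uniform grid of zones, whereas real CAN splits zones unevenly as nodes join; the function names are invented for the illustration.

```python
# Sketch of CAN: hash a key to a point in the d-dimensional unit torus,
# find the zone owning it, and count greedy routing hops between zones.

import hashlib

def key_to_point(key, d=2):
    """Slice a SHA-1 digest into d coordinates in [0, 1)^d."""
    h = hashlib.sha1(key.encode()).digest()
    step = len(h) // d
    return tuple(int.from_bytes(h[i * step:(i + 1) * step], "big")
                 / 2 ** (8 * step) for i in range(d))

def zone_of(point, side):
    """Cell of a uniform side x ... x side grid that owns the point."""
    return tuple(min(int(c * side), side - 1) for c in point)

def routing_hops(src_zone, dst_zone, side):
    """Greedy routing crosses one neighbouring zone per hop, so the hop
    count is the Manhattan distance on the torus."""
    return sum(min(abs(a - b), side - abs(a - b))
               for a, b in zip(src_zone, dst_zone))

p = key_to_point("song.mp3", d=2)
z = zone_of(p, side=4)   # which of the 16 zones stores the key
assert all(0 <= c < 1 for c in p) and all(0 <= c < 4 for c in z)
```

With n zones arranged as a side^d grid (side = n^(1/d)), the worst-case Manhattan distance is d * side / 2, which is the O(d * n^(1/d)) path length the text quotes, while each zone only needs to know its 2d neighbours.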
The semi-distributed structure (called a hybrid structure in some literature) draws on the advantages of both the centralized structure and the fully distributed unstructured topology: it selects the nodes with higher performance (in processing power, storage, bandwidth, etc.) as super nodes (called SuperNodes or Hubs in the English literature). Each super node stores the information of part of the other nodes in the system; the discovery algorithm forwards queries only among super nodes, and a super node then forwards the query request to the appropriate leaf nodes. The semi-distributed structure is thus hierarchical: the super nodes form a high-speed forwarding layer, and each super node together with its ordinary leaf nodes forms an autonomous cluster. The most typical case is KaZaA.

KaZaA is one of the world's most popular P2P applications: according to CA statistics, worldwide downloads of KaZaA exceed 250 million, and file transfers through KaZaA consume 40% of the Internet's bandwidth. The reason it is so successful is that it combines the respective advantages of Napster and Gnutella. Structurally it uses Gnutella's fully distributed approach, so it scales well, since it needs no central index server to store file names; it automatically promotes high-performance nodes to super nodes (SuperNodes), each of which stores the file information of the leaf nodes nearest to it, and these super nodes connect to form an overlay network. Thanks to the index function of the SuperNodes, search efficiency is greatly improved.

Figure: semi-distributed structure (SuperNode) and the KaZaA interface.

Comparing the overall performance of the four structures yields the results shown in Table 1-1.
Table 1-1: Performance comparison of the four structures

Criterion / Topology    Centralized   Fully distributed unstructured   Fully distributed structured   Semi-distributed
Scalability             poor          poor                             good                           medium
Reliability             poor          good                             good                           medium
Maintainability         best          best                             good                           medium
Discovery efficiency    highest       medium                           high                           medium
Complex query support   supported     supported                        not supported                  supported
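The two-tier super-node search described above can be sketched as follows. This is an illustration of the structure, not KaZaA's proprietary protocol: leaves register their file lists with one super node, and a query travels only among super nodes. The class and method names are invented, and the single-hop peer query stands in for TTL-limited flooding among super nodes.

```python
# Sketch of semi-distributed (super-node) search: ordinary peers index
# their files at a super node; queries are answered from leaf indexes
# and forwarded only between super nodes.

class SuperNode:
    def __init__(self, name):
        self.name = name
        self.peers = []   # neighbouring super nodes
        self.index = {}   # filename -> set of leaf addresses

    def register(self, leaf, filenames):
        """A leaf node reports the files it shares."""
        for f in filenames:
            self.index.setdefault(f, set()).add(leaf)

    def search(self, filename):
        """Answer from the local leaf index, then ask neighbouring super
        nodes (one hop here; a real overlay floods with a TTL)."""
        hits = set(self.index.get(filename, ()))
        for sn in self.peers:
            hits |= sn.index.get(filename, set())
        return sorted(hits)

a, b = SuperNode("SN-a"), SuperNode("SN-b")
a.peers.append(b); b.peers.append(a)
a.register("leaf-1", ["x.mp3"])
b.register("leaf-2", ["x.mp3", "y.mp3"])
assert a.search("x.mp3") == ["leaf-1", "leaf-2"]
```

The design point is visible in miniature: leaves never see query traffic, and the flood is confined to the (much smaller) super-node layer, which is why search scales better than pure Gnutella while avoiding Napster's single central index.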
1.2.2 Domestic P2P Research Status

R&D by academic institutions
Peking University: Maze. Maze is a file-sharing system for peer-to-peer computing developed by the networking laboratory of Peking University, with central control and peer-to-peer connections; its control model is similar to Napster's, and its peer search method resembles Gnutella's. Any computer on the network, whether on the internal network or the external network, can join and leave the Maze system freely by installing and running the Maze client software. Each node can share the files in one or more of its directories with the other members of the system and can use the resources shared by other members. Maze supports keyword-based resource retrieval, and resources can also be obtained directly through friend relationships.
Tsinghua University: Granary. Granary is a peer-to-peer storage service system independently developed by Tsinghua University; it stores data in object format. In addition, Granary designed a specialized node-information collection algorithm, PeerWindow, and a structured overlay routing protocol, Tourist.
Huazhong University of Science and Technology: AnySee. AnySee is a live video streaming system designed and developed by HUST. It uses a one-to-many service model and supports some NATs and firewalls, improving the scalability of live video systems; at the same time, following proximity principles and interval-scheduling ideas, it uses a Landmark-based algorithm to build the multicast tree directly at the application layer, overcoming the load that connection-graph construction and maintenance impose on one-to-many systems such as ESM. A more detailed introduction can be found in [Communications of the China Computer Federation, pp. 38-51, an introduction to peer-to-peer computing by Zheng Weimin et al.].

Products from enterprise R&D
Guangzhou Software Technology Co., Ltd.: Poco. Poco is China's largest P2P user sharing platform: a third-generation P2P resource-exchange platform that is safe and traffic-controlled, has no central server, and is one of the few profitable P2P platforms. It has so far built a user base of 26 million, with an average of 585,000 users online, peaks above 710,000, and a user population consisting entirely of broadband users, making it the leading P2P sharing platform in China. [A]

Shenzhen 点石 Software Co., Ltd.: OP. OP, also known as OpenExt Media Desktop, is a network entertainment content platform and a successor to Napster. With it, users can directly find the music, films, software, games, pictures, books, and documents they want most; the file capacity shared online at any time amounts to "100,000 films, millions of songs, and 10 million pictures". OP integrates Internet Explorer, Windows Media Player, RealOne Player, and ACDSee, making it a domestic online entertainment content platform. [A]
P2P-based live online TV: PPLive. PPLive is a piece of shareware for large-scale live video broadcasting over the Internet. It uses a mesh model to effectively solve the bandwidth and load limitations of current network video-on-demand services, achieving smoother playback as the number of users grows and greatly improving overall service quality. (During the 2005 Super Girl finals this software was extremely popular; Hunan Satellite TV is one example of its use.) For other commercial software, please visit the P2P portal http://www.ppcn.net/.

1.2.3 Foreign P2P Technology Research and Applications

The academic groups formed abroad around P2P research mainly include the P2P Working Group (P2PWG) and the Global Grid Forum (GGF). The main purpose of the P2P Working Group is to promote the establishment of a P2P computing infrastructure and the corresponding standardization. After P2PWG was established, it unified the terminology of P2P computing and produced related drafts, but its standardization work progressed slowly. P2PWG has since been merged into GGF, which now manages P2P computing; GGF is responsible for standardization work related to grid computing and P2P computing. Among the companies supporting P2P computing, Microsoft, Sun, and Intel have invested the most. Microsoft set up the Pastry project group, mainly responsible for research and development of P2P computing technology, and has released a Pastry-based package, SimPastry/VisPastry; Rice University also released the FreePastry package on the basis of Pastry. In August 2000, Intel announced the establishment of its P2P working group and formally launched P2P research; after its founding, the group cooperated actively with application developers to build P2P application platforms. In 2002, Intel released the P2P Accelerator Kit and the P2P Security API package for the .NET infrastructure, enabling Microsoft
.NET developers to quickly build secure P2P Web applications. Sun developed the JXTA project based on Java technology. JXTA is an open-source P2P platform based on Java that any individual or organization can join; the project has therefore attracted a large number of P2P researchers and developers, and JXTA-based instant-chat packages have been published. JXTA defines a set of core services: authentication, resource discovery, and management. For security, JXTA includes a cryptographic package that can be used to encrypt messages, thereby guaranteeing their privacy, authentication, and integrity. On top of the JXTA core, various optional JXTA services are also defined, including content management, information search, and service management. Based on the core services and the optional services, users can develop P2P applications on a variety of JXTA platforms.

The practical applications of P2P are mainly reflected in the following areas.

P2P distributed storage: a P2P distributed storage system is a data storage system over a peer-to-peer network that provides efficient, robust, load-balanced file access. Such systems include OceanStore, FarSite, etc. Semi-distributed P2P applications based on super-node structures, such as KaZaA, eDonkey, Morpheus, and BitTorrent, also fall within the scope of distributed storage, and their user numbers have grown dramatically.

Computing-power sharing: nodes joining a peer network can share computing capability in addition to storage capability.

P2P application-layer multicast: application-layer multicast implements multicast functionality at the application layer without requiring network-layer support, which avoids the situation where multicast applications cannot be deployed because the network layer does not support multicast.
Application-layer multicast requires a scalable overlay network connecting the participating application nodes; the Internet Indirection Infrastructure (i3), for example, realizes multicast through indirect access. P2P technology has expanded from file sharing into applications in many other fields within only a few years, which demonstrates that it has a very broad application prospect.

1.3 Research Content

As P2P applications boom, discovery technology, a core issue of P2P applications, is driven not only by the internal logic of the technology itself but also by technological and demand trends.

1.3.1 The Degree-Diameter Trade-off and Its Impact on Discovery Algorithms

As described above, researchers have established the asymptotic trade-off relationship between node degree and network diameter.
As the asymptotic curves show, obtaining a shorter path length inevitably increases the node degree, while changes in the actual connection state of the network raise the cost of maintaining a large number of neighbor relationships. Moreover, researchers have demonstrated that even an average path length of O(log n) or O(log log n) cannot meet the needs of network applications whose state changes rapidly. New discovery algorithms, built on the DHTs' deterministic view of the network topology, remain constrained by this trade-off.

1.3.2 The Impact of Small-World Theory on P2P Discovery Technology

The power-law degree distribution observed in real unstructured networks can be interpreted simply: a few nodes have high "degree" while the degree of most nodes is low. Nodes of higher degree are connected far better than other nodes, so the probability of finding information through them is high. In unstructured P2P systems, discovery has always relied on flooding, which, compared with the heuristic discovery algorithms of DHTs, is less reliable and consumes more network resources. The latest research starts from two directions: improving the reliability of the discovery algorithm, and finding shortest paths in the random graph, that is, re-examining the overlay network. The small-world characteristics and power laws show that the topology of real networks is neither the completely random graph assumed by unstructured systems nor the fixed topology that DHT discovery algorithms adopt.

The small-world [A][B] model: the network topology has high clustering and short-chain characteristics. In a network model satisfying the small-world property, the nodes can be divided into clusters according to their aggregation, and each cluster contains at least one high-degree central node.
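The small-world model just described, in the Watts-Strogatz style of a clustered ring lattice plus a few random "short chains", can be illustrated with a stdlib-only sketch. The parameter values and function names are arbitrary choices for the illustration, not taken from the text.

```python
# Sketch of a small-world construction: start from a ring lattice (high
# clustering, long paths) and rewire a fraction p of edges into random
# long-range links, which sharply shortens the average path length.

import random
from collections import deque

def ring_lattice(n, k):
    """Each node linked to its k nearest ring neighbours (k even)."""
    g = {v: set() for v in range(n)}
    for v in range(n):
        for j in range(1, k // 2 + 1):
            g[v].add((v + j) % n); g[(v + j) % n].add(v)
    return g

def rewire(g, p, rng):
    """With probability p, replace an edge by a random long-range link."""
    n = len(g)
    for v in list(g):
        for w in sorted(g[v]):
            if w > v and rng.random() < p:
                g[v].discard(w); g[w].discard(v)
                u = rng.randrange(n)
                while u == v or u in g[v]:
                    u = rng.randrange(n)
                g[v].add(u); g[u].add(v)
    return g

def avg_path_length(g):
    """Mean BFS distance over all reachable ordered pairs."""
    total = pairs = 0
    for s in g:
        dist = {s: 0}; q = deque([s])
        while q:
            v = q.popleft()
            for w in g[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1; q.append(w)
        total += sum(dist.values()); pairs += len(dist) - 1
    return total / pairs

rng = random.Random(7)
regular = ring_lattice(200, 4)
small_world = rewire(ring_lattice(200, 4), 0.1, rng)
# A few random short cuts noticeably shrink the average path length:
print(avg_path_length(regular) > avg_path_length(small_world))
```

This is exactly the observation that discovery algorithms try to exploit: most links stay local (clusters), yet a handful of long-range links bring the whole network within a short hop count, so finding those "short chains" is what shortens search paths.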
Many studies have shown that P2P networks exemplified by Gnutella exhibit the small-world property: the network contains a large number of highly connected nodes, and "short chains" exist between some node pairs. Shortening the path length in a P2P discovery algorithm thus becomes the problem of finding these "short chains"; in DHT discovery algorithms in particular, how to generate and find short chains is a new idea in discovery-algorithm design. The introduction of small-world characteristics has had a significant impact on P2P discovery algorithms.

1.3.3 The Contradiction Between Semantic Queries and DHTs

1.4 Achievements and Shortcomings of P2P Discovery Technology

The most important research results on P2P discovery technology to date are the unstructured discovery algorithms based on small-world theory and the structured discovery algorithms based on DHTs. In particular, DHTs and their discovery techniques provide a new method for organizing resources. As P2P systems move into practical use, factors that affect routing in the physical network have begun to affect the efficiency of P2P discovery algorithms. On the one hand, the nodes of real networks differ greatly in capability, i.e. they are heterogeneous: with the popularization of the Internet over more than a decade, devices such as laptops, mobile phones, and PDAs differ widely in computing power, storage space, and battery capacity. In addition, the real network is divided by routers and switches into different autonomous regions, exhibiting a strict hierarchy. On the other hand, there is network fluctuation (churn), which includes node joins, departures, failures, migration, concurrent join processes, network partition, and so on. DHT discovery algorithms such as Chord, CAN, and Koorde are designed and implemented for the worst case of network fluctuation.
Since the degree of each node is kept minimal, the membership state that must be maintained is relatively small, so the network can recover quickly from the impact of fluctuation. However, keeping only a small amount of routing state makes the discovery algorithm itself high-latency, because every lookup must contact multiple nodes; in a stable network this cost is unnecessary. Moreover, DHT's exact keyword mapping hinders its use in complex query applications.

2 Complex P2P network topology models

2.1 Internet topology models
As the symbol of today's information society, the Internet keeps growing at exponential speed. The "appearance" of today's Internet differs enormously from its prototype, ARPANET, and it can be regarded as an "ecosystem" composed of computers. Although the Internet was built by human hands, no one can say exactly what this creation looks like or how it works as a whole. Internet topology modeling research seeks the laws hidden in this seemingly chaotic network that we do not yet know. Discovering the inner mechanisms of the Internet topology is a necessary step in understanding the Internet, and the basis for using the Internet at a higher level. However, the Internet's inborn heterogeneity, its dynamic growth, and its now enormous scale pose great challenges to topology modeling, which remains an open problem and plays an important role in computer network research. The Internet topology is the "skeleton" of the Internet as a self-organized system; a simulation of the Internet integrates this topology with traffic protocols, that is, the protocols execute over the topology network and generate traffic. The Internet topology model is thus the foundation of any Internet system model; it can even be called the essence of Internet modeling, namely a tool that people use to analyze the Internet for decision-making or control. The topology part of the Internet model portrays the Internet macroscopically and reflects general trends, so its applications are also large in scale.
The need for Internet topology models arises mainly from the following aspects: (1) many new applications or experiments are unsuitable for running directly on the Internet, some of them being outright harmful, such as large-scale simulation of worm propagation; (2) for protocols that depend on the network topology, the current Internet can provide only a single test sample during the research and development phase, so a full evaluation requires experiments over many simulated topologies; (3) from a national security perspective, network behavior must be modeled and controlled, as in the NMS (Network Modeling and Simulation) program of the US Defense Advanced Research Projects Agency (DARPA).

2.1.1 The random network topology model

A random network is formed from N vertices by randomly connecting M edges among them. Another way to generate a random network is to fix a probability p and attempt every possible connection independently with probability p. If we choose M = pN(N-1)/2, the two random network models can be related to each other. Even for such simple models, the study of their geometric properties is far from trivial; the foundational work on random network geometry was carried out by Paul Erdős, Alfréd Rényi, and later Béla Bollobás, beginning in the late 1950s. Random networks occupy an important position in Internet topology research.

2.1.2 Parameters describing random networks

Random networks have some important parameters. The degree of a node is the number of edges linking it to other nodes; degree is the basic parameter describing the local characteristics of the network. Not all nodes in a network have the same degree, and the distribution of node degrees over the system can be described by a distribution function. The clustering coefficient describes the probability that a pair of nodes which are both connected to a third node are also connected to each other.
In the sense of counting the edges attached to each node, the degree distribution function reflects the macroscopic statistical characteristics of the network system, and in theory other quantities characterizing its global features can be calculated from it. For node i with degree k_i, consider the subnet consisting of its k_i nearest neighbors: the ratio of the actual number of edges E(i) present among those neighbors to the total number possible when they are fully connected, that is C(i) = 2E(i) / (k_i(k_i - 1)), is defined as the clustering coefficient of node i.

2.1.3 Topology models

Internet topology models can be divided into two categories: one describes the characteristics of the Internet topology, including the Waxman model, Tiers, Transit-Stub, and the power laws; the other describes the mechanism by which those characteristics form, including the BA model of Barabási and Albert, ESF, and an improved model, GLP. Since explanations of how the Internet's power laws form are not yet mature, most models of the second kind are really graph-generation algorithms that produce power laws.
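Before turning to the specific models, the definitions above (the two random-network constructions and the clustering coefficient) can be made concrete in a short sketch; the helper names are mine, not the report's:

```python
import random
from itertools import combinations

def gnp_graph(n, p, seed=None):
    """G(n, p): attempt each of the N(N-1)/2 possible edges with probability p."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    for u, v in combinations(range(n), 2):
        if rng.random() < p:
            adj[u].add(v)
            adj[v].add(u)
    return adj

def gnm_graph(n, m, seed=None):
    """G(n, M): place exactly m edges, chosen uniformly among all pairs."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    for u, v in rng.sample(list(combinations(range(n), 2)), m):
        adj[u].add(v)
        adj[v].add(u)
    return adj

def clustering(adj, i):
    """C(i) = 2*E(i) / (k_i*(k_i - 1)): fraction of possible links actually
    present among node i's neighbors."""
    nbrs = adj[i]
    k = len(nbrs)
    if k < 2:
        return 0.0
    e = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
    return 2.0 * e / (k * (k - 1))

n, p = 200, 0.05
g = gnp_graph(n, p, seed=1)
mean_degree = sum(len(g[v]) for v in g) / n   # expected value is p*(n-1)
mean_c = sum(clustering(g, v) for v in g) / n
```

In a G(n, p) graph the average clustering coefficient tends toward p itself, which is one reason a purely random graph cannot reproduce the high clustering observed in real networks.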
For the first class of models, the foundation is the discovery of metrics that capture Internet topology characteristics. A model of this class selects several existing or newly discovered metrics and then derives their values from actual Internet topology data. Evaluating such a model therefore has two aspects: on the one hand, evaluating the topology data used; on the other, evaluating the metrics themselves. Among all discovered metrics, the most basic is the node degree frequency f_d, which is the most important basis for judging whether a generated graph resembles the Internet topology. Early researchers either considered Internet node degrees to be completely random (as in the Waxman model) or regular (as in the Tiers model); the discovery of the power laws proved that the Internet topology lies between the two, following a power-law distribution. Accordingly, Internet topology models can be divided into the following three categories. (1) Random: the Internet topology is treated as a completely disordered state at large scale. The Waxman model is a random model similar to the Erdős-Rényi model. It has two versions: RG1 first arranges the nodes on a rectangular coordinate grid, the distance between nodes being their Euclidean distance; RG2 assigns inter-node distances drawn uniformly from (0, L). In both versions the connection probability P(u, v) between a pair of nodes decays with their distance: the closer the nodes, the larger the probability. (2) Hierarchical: derived from the hierarchical characteristics of the Internet structure, nodes on the same layer are close to each other while nodes on different layers differ greatly; the arrangement of nodes within a layer borrows the Waxman model's method.
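The random-category connection rule can be sketched as follows (an RG1-style Waxman generator; the form P(u, v) = alpha * exp(-d(u, v) / (beta * L)) is the standard Waxman probability, and the parameter values here are illustrative):

```python
import math
import random

def waxman_rg1(n, alpha=0.4, beta=0.2, grid=100, seed=None):
    """RG1 Waxman: place n nodes on a rectangular grid and connect u, v with
    probability P(u, v) = alpha * exp(-d(u, v) / (beta * L)), where d is the
    Euclidean distance and L is the largest inter-node distance."""
    rng = random.Random(seed)
    pos = [(rng.uniform(0, grid), rng.uniform(0, grid)) for _ in range(n)]
    dist = lambda a, b: math.hypot(a[0] - b[0], a[1] - b[1])
    L = max(dist(pos[u], pos[v]) for u in range(n) for v in range(u + 1, n))
    edges = []
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < alpha * math.exp(-dist(pos[u], pos[v]) / (beta * L)):
                edges.append((u, v))
    return pos, edges

pos, edges = waxman_rg1(100, seed=7)
```

Closer pairs get a higher connection probability, exactly as the text states; alpha scales overall edge density, while beta controls how far long edges reach.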
The Tiers model divides the Internet into three levels: LAN (local area network), MAN (metropolitan area network), and WAN (wide area network). A model instance contains only one WAN; by specifying the number of MANs and LANs and the number of nodes inside each, the topology graph is generated. The Transit-Stub model divides AS domains into a transit class and a stub class. In this model, transit nodes form node groups, and one or more transit node groups constitute the core of the topology graph, while stub nodes are distributed around the transit node groups. Transit-Stub is part of the GT-ITM (Georgia Tech Internetwork Topology Models) package, and GT-ITM is sometimes used to mean the Transit-Stub model. (3) Power law: in 1999, Faloutsos et al. analyzed three BGP datasets collected by NLANR (National Lab for Applied Network Research) between 1997 and 1998, together with data from 1995, and found that four power laws hold in the Internet topology. A power law states that one quantity is proportional to a fixed power of another.

2.2 Small-world networks
In reality, most networks are neither completely regular nor completely random.
(2) Clustering coefficient of the WS network. When p = 0, the clustering coefficient of the regular network is C(0); C(0) depends on the network structure and is independent of the size N, so it has a relatively large value. As the rewiring probability p grows the network becomes randomized, but for small p the clustering coefficient stays near C(0). (3) Degree distribution. The WS model sits between the regular network and the random network. The degree distribution of the regular network is a delta function at k = K, while at p = 1 the random network has a Poisson distribution peaked at the mean value. As p goes from 0 to 1, the original delta function gradually broadens toward the Poisson distribution.

After the study of small-world networks, more and more scientists threw themselves into complex network research, finding that many other geometric features are likewise widespread and carry specific structure-function relationships. The scale-free network is an important aspect of this work. A scale-free network is one whose degree distribution follows a power law; it is called scale-free because such a distribution lacks a characteristic scale for the quantity it describes. Power laws hold a special status in statistics, in physical phase transitions and critical phenomena, and in self-organized criticality (SOC). Scale-free networks in fact also exhibit small-world characteristics.

3 Unstructured P2P search algorithms

Search algorithms can be divided into two major classes according to their strategy: blind search and informed search. A blind search propagates the query through the network, spreading it to each node, and locates resources through this flooding; an informed search uses information already present in the system to assist the search process.
Since an informed search has some knowledge about resources, it can find them relatively fast.

3.1 Blind search

The flooding method used in the initial Gnutella protocol is the typical blind search. In the network, no node knows what resources the other nodes hold. When a node looks for a file, it passes the query to its neighboring nodes. If a neighbor holds the resource, it returns a QueryHit message to the requester; if not, it forwards the message to its own neighbors. In this way the query spreads among the nodes like a flood, hence the name flooding search. Since this strategy first traverses a node's neighbors and only then spreads further, it is also called breadth-first search (BFS). BFS propagates the message to all neighbors, which consumes a great deal of network bandwidth, produces many duplicate messages, has relatively low efficiency, and scales poorly. Some researchers have improved on BFS by randomly choosing a certain proportion of the neighbors to forward the message to, rather than sending the query to all neighbors as flooding does. This modification greatly reduces the query traffic in the network, but its hit rate is not as good as that of BFS. Iterative deepening: this policy starts with a small TTL value; if the TTL drops to 0 before the resource is found, the query is reissued with a higher TTL. The strategy can reduce the search radius, but in the worst case its delay is large. Random walk: the requester issues k query messages to k randomly selected neighbors. Each walker then periodically contacts the requester directly, asking whether it should continue. If the requester agrees, the walker randomly selects its next hop; otherwise the search stops.
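The bandwidth/latency tradeoff between flooding and the k-walker random walk can be illustrated with a small simulation (the overlay below is a random graph standing in for a Gnutella-like network; all parameters are my assumptions):

```python
import random

def flood(adj, start, target, ttl):
    """BFS-style flooding with a TTL; returns (found, messages sent)."""
    msgs, frontier, seen = 0, {start}, {start}
    for _ in range(ttl):
        nxt = set()
        for node in frontier:
            for nbr in adj[node]:
                msgs += 1                      # every forwarded copy costs a message
                if nbr not in seen:
                    seen.add(nbr)
                    nxt.add(nbr)
        if target in seen:
            return True, msgs
        frontier = nxt
    return target in seen, msgs

def random_walk(adj, start, target, k, max_steps, rng):
    """k walkers, each stepping to a uniformly random neighbor."""
    msgs, walkers = 0, [start] * k
    for _ in range(max_steps):
        for i, w in enumerate(walkers):
            walkers[i] = rng.choice(list(adj[w]))
            msgs += 1
            if walkers[i] == target:
                return True, msgs
    return False, msgs

# A random overlay where every node has at least three neighbors.
rng = random.Random(0)
n = 500
adj = {v: set() for v in range(n)}
for v in range(n):
    while len(adj[v]) < 3:
        u = rng.randrange(n)
        if u != v:
            adj[v].add(u)
            adj[u].add(v)

found_f, cost_f = flood(adj, 0, 250, ttl=7)
found_w, cost_w = random_walk(adj, 0, 250, k=4, max_steps=2000, rng=rng)
```

Flooding reaches a target in few rounds but pays a message for every edge it touches, while walkers trade latency for bandwidth, matching the tradeoff described above.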
Gnutella2 creates supernodes, each of which stores the file index of its attached leaf nodes; these supernodes connect to one another to form an overlay network. When a leaf node needs to query a file, it first looks in the index of the supernode it is connected to. If the file is found there, a direct connection is established using the stored IP address of the machine holding it. If not, the supernode sends the query to the other supernodes connected to it, until the desired resource is found.

3.2 Informed search methods

3.2.1 The cache method

This method differs from blind search in that every node, whether a central node or a simple one, keeps a local cache of earlier search results. A new search need not reach the node that actually stores the resource: as long as some node on the search path has cached a previous search, the source IP address can be obtained there and the request message returned. This greatly reduces search traffic and improves efficiency.

3.2.2 The mobile-agent-based method

A mobile agent is a program that can migrate from host to host in a heterogeneous network and interact with other agents or with resources, which makes it well suited to helping users carry out information retrieval tasks in a network environment. Researchers in Italy have done some frontier work combining mobile agents with P2P, part of which embeds an agent runtime environment into the P2P software. When a node needs to search, it dispatches a mobile agent that records the search information. When the agent arrives at a new machine, it performs the resource search there; if the desired resource is absent, it carries the search information onward to that machine's neighbors, and if the resource is found it returns to the requesting machine.
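The two-tier supernode lookup described in the Gnutella2 paragraph above can be sketched as follows (class names and addresses are illustrative, not taken from any real Gnutella2 implementation):

```python
class SuperNode:
    """Holds an index of its leaves' files and links to other supernodes."""
    def __init__(self, name):
        self.name = name
        self.index = {}        # filename -> leaf address (IP-like string)
        self.peers = []        # neighboring supernodes

    def register_leaf(self, addr, files):
        for f in files:
            self.index[f] = addr

    def lookup(self, filename, seen=None):
        # Check the local leaf index first; on a miss, forward to peer
        # supernodes, using a 'seen' set to avoid forwarding loops.
        seen = seen or set()
        seen.add(self.name)
        if filename in self.index:
            return self.index[filename]
        for peer in self.peers:
            if peer.name not in seen:
                hit = peer.lookup(filename, seen)
                if hit:
                    return hit
        return None

a, b = SuperNode("A"), SuperNode("B")
a.peers.append(b); b.peers.append(a)
a.register_leaf("10.0.0.5", ["song.mp3"])
b.register_leaf("10.0.0.9", ["paper.pdf"])
```

A returned address lets the leaf open a direct connection to the machine holding the file, so the supernode tier carries only queries, not content.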
A summary of search methods can be found in [Chonggang Wang, Bo Li: "Peer-to-Peer Overlay Networks: A Survey"] and ["Hybrid Search Schemes for Unstructured Peer-to-Peer Networks"].

4 P2P information security issues

4.1 Intellectual property protection in P2P

Sharing networks in general raise intellectual property protection issues. P2P sharing applications such as Gnutella and Kazaa do not themselves store content subject to copyright protection; they only keep indexes of where it is stored on the network. Nevertheless, the prosperity of P2P sharing software has undoubtedly accelerated the distribution of pirated media and increased the difficulty of protecting intellectual property. The RIAA (Recording Industry Association of America) has waged a long legal war against such software, and the famous Napster was the first prey of that war. Another battlefield pits the RIAA against private individuals who use P2P to exchange commercial music: since January 2004 the RIAA has filed suits against roughly 1,000 parties. Despite this, more than 150,000,000 songs are still freely downloadable on the network. Post-Napster P2P sharing software is more decentralized than Napster and harder to control: even if the company operating it is shut down, the network itself survives, at least for a while. On the other hand, post-Napster P2P software is also urgently seeking a symbiotic, mutually beneficial relationship with content producers. How to put this sharing software to more legal and reasonable use is a topic for the new era; after all, besides pirated material, P2P can share a great deal of beneficial information. The network society, like natural society, has its own tendency to find a balance between disorder and order.
P2P technology has brought a revolutionary improvement to network information sharing, but if it is to keep benefiting the majority of users over the long term, this improvement must not come at the cost of the basic interests of content providers. That requires intellectual property protection mechanisms that do not, to any great extent, affect the performance of existing P2P sharing software. Some P2P vendors and other companies are already studying such problems; this may be one of the most challenging technical issues facing the next generation of P2P sharing software.

4.2 Network virus propagation

With the deepening of computer network applications, the threat that computer viruses pose to information security keeps increasing. In the P2P environment especially, the convenient sharing and rapid routing mechanisms offer certain network viruses better opportunities for intrusion. Because logically adjacent nodes in a P2P network may be geographically far apart, and because the number of participating nodes is very large, a virus propagating through a P2P system spreads quickly, affects a large area, covers a wide range, and causes great damage. The nodes of a P2P network differ in their ability to defend against viruses; as long as one node is infected, the virus can spread to its neighbors through the internal sharing and communication mechanisms. Within a short time this can cause network congestion or even paralysis, loss of shared information, and theft of confidential information, and a network virus can even take full control of the entire network. A prominent example is the marked increase in 2003 of viruses spread through instant messaging software. Senior technical managers at Symantec and McAfee predict that instant messaging software will become one of the main carriers of network virus propagation and hacker attacks.
As P2P technology develops, a variety of network viruses specifically targeting P2P systems will appear, using system vulnerabilities to damage, disintegrate, and take control of systems rapidly. The potential virus crisis therefore places higher requirements on the security and robustness of P2P systems, and a complete, efficient, and safe anti-virus system urgently needs to be established. Other information security issues include reactionary and pornographic films circulating in P2P networks and their negative impact on the nation and on adolescents. For details, see [Cheng Xueqi et al., Information Technology Express, 2004, No. 3, "P2P technology and information security"].

5 Closing remarks

5.1 Some P2P resources

http://www.ppcn.net/ - the Chinese P2P portal
http://iptps05.cs.cornell.edu/ - IPTPS, the well-known international workshop on P2P
http://www.jxta.org/ - the Java P2P technology website
http://www.hpl.hp.com/techreports/2002/HPL-2002-57R1.PDF - a very classic P2P survey article
http://david.weekly.org/code/napster.php3 - the protocol of the first P2P software, Napster
http://www.globus.org/alliance/publications/papers/kazaa.pdf - an analysis of the famous Kazaa software
http://p2p.weblogsinc.com/ - a well-known P2P weblog
http://www.google.com - and finally, of course, the good helper Google :)

5.2 Acknowledgments and suggestions

This article is part of a technical report (with modifications) that I wrote at the Institute of Computing Technology, Chinese Academy of Sciences. It was first posted on my personal homepage on 2005-4-25, with the aim of giving more friends a rough understanding of P2P technology through reading. To keep the article easy to understand, I deleted some of the very detailed parts when making the first edition, and it was warmly received. During this time I have received email from many friends, including classmates at school and technical staff at companies, affirming the article and hoping to discuss P2P problems; my thanks to all of them here.
Because P2P technology has so many aspects, a single article certainly cannot cover everything, so one must study P2P systematically. Personally, I believe you must read the papers and read the programs. I remember that when writing this article I collected 624 papers (610 in English, 14 in Chinese) from Google, CiteSeer, IEEE, ACM, and the Wanfang journal database, and then classified them: surveys, topology and networks, search algorithms, security, and so on. This step is definitely necessary. In addition, since most P2P software is open source, reading the source code gives a much deeper understanding.

2005-4-25 first edition; 2005-11-3 revised edition. The author hopes that future versions will be more complete, and also hopes the article can bring you some help.

Roger (luojw@ics.ict.ac.cn)
Institute of Computing Technology, Chinese Academy of Sciences
2005-11-3
(1) Average path length. The rewired edges, called shortcuts in the figure, have a large impact on the average path length of the whole network. Analysis shows that once p >= 2/(NK), that is, once the system is guaranteed to contain at least one shortcut, the average path length of the system begins to decline. Even a few shortcuts can significantly reduce the average path length of the network. This is because the impact of each shortcut on the system is nonlinear: it affects not only the two points it directly connects, but also the nearest neighbors of those two points, their second neighbors, and so on.
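This effect is easy to check numerically. The sketch below (parameter choices are mine) measures the average path length of a ring lattice before and after adding a handful of random shortcuts:

```python
import random
from collections import deque

def ring_lattice(n, k):
    """Each node linked to its k nearest neighbors on each side."""
    adj = {v: set() for v in range(n)}
    for v in range(n):
        for j in range(1, k + 1):
            adj[v].add((v + j) % n)
            adj[(v + j) % n].add(v)
    return adj

def avg_path_length(adj):
    """Average BFS distance over all ordered pairs of nodes."""
    n = len(adj)
    total = 0
    for s in adj:
        dist = {s: 0}
        q = deque([s])
        while q:
            u = q.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    q.append(w)
        total += sum(dist.values())
    return total / (n * (n - 1))

rng = random.Random(3)
g = ring_lattice(200, 2)
before = avg_path_length(g)          # grows linearly with N on a pure ring
for _ in range(5):                   # add just five random shortcuts
    u, v = rng.sample(range(200), 2)
    g[u].add(v); g[v].add(u)
after = avg_path_length(g)
```

On this 200-node ring the average path is about 25 hops before rewiring; already a handful of shortcuts pulls it down, and the gain per shortcut is largest for the first few, matching the nonlinearity described above.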
In the Internet environment, the network topology does not entirely match a random network topology. Watts and Strogatz found that obtaining both of the above properties requires only a random modification of a regular network. The modification works as follows: each edge of each vertex of the regular network is, with probability p, disconnected at its far end and reconnected, the new endpoint being chosen at random from the other vertices in the network; if the chosen vertex is already connected to this vertex, another vertex is chosen at random instead. When p = 0 the result is the regular network and when p = 1 it is a random network, while over a large region of 0 < p < 1 the network combines the high clustering of the regular lattice with the short average path of the random graph.
The geometry of the network is shown in the figure.
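The rewiring procedure described above can be sketched directly (a simplified Watts-Strogatz construction; the parameter values are illustrative):

```python
import random

def watts_strogatz(n, k, p, seed=None):
    """Start from a ring where each node connects to its k neighbors on each
    side, then rewire the far endpoint of each edge with probability p to a
    random vertex, skipping self-loops and duplicate links as in the text."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    for v in range(n):
        for j in range(1, k + 1):
            adj[v].add((v + j) % n)
            adj[(v + j) % n].add(v)
    for v in range(n):
        for j in range(1, k + 1):
            w = (v + j) % n
            if w in adj[v] and rng.random() < p:
                new = rng.randrange(n)
                while new == v or new in adj[v]:
                    new = rng.randrange(n)   # redraw on self-loop or duplicate
                adj[v].remove(w)
                adj[w].remove(v)
                adj[v].add(new)
                adj[new].add(v)
    return adj

regular = watts_strogatz(100, 2, 0.0, seed=5)   # p = 0: the ring is untouched
small_world = watts_strogatz(100, 2, 0.1, seed=5)
```

With p = 0 every node keeps exactly 2k links; rewiring preserves the total number of edges while a small p already produces the small-world regime discussed in the text.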
The hop-plot approximation (hop-plot exponent H): the total number of pairs of nodes within h hops of each other is proportional to the H-th power of h. Power Law 3 (eigen exponent E): the i-th largest eigenvalue L_i of the adjacency matrix is proportional to the E-th power of i. In general, a power law between two variables x and y means there is a constant c such that y is proportional to the c-th power of x. Two definitions are needed: (1) the rank r_v of a node v is its index in the sequence of all nodes sorted by degree in descending order; (2) the eigenvalues of the adjacency matrix are arranged in descending order, the i-th being L_i. Power Law 1 (rank exponent R): the degree d_v of a node v is proportional to the R-th power of its rank r_v. Power Law 2 (outdegree exponent O): the frequency f_d of degree d is proportional to the O-th power of d. In the Waxman connection probability P(u, v) = a * exp(-d(u, v) / (b * L)), d(u, v) denotes the distance between nodes u and v, L is the longest distance between any pair of nodes, and the parameters a and b take values in (0, 1).

On another front, the degree of network fluctuation severely affects the efficiency of the discovery algorithm. At the same time, as a resource organization and discovery technology, P2P must support complex queries, such as keyword and content queries. Although the information retrieval and data mining fields provide a large number of mature semantic query techniques, the distributed hashing of DHT algorithms makes them suitable only for exact search; supporting the multi-keyword search of Web search engines requires the introduction of new methods. The main reason lies in the way DHT works: a DHT P2P system locates and discovers objects by applying a hash function to exact keywords. The hash function always tries to ensure that the generated hash values are uniformly distributed, so two objects whose contents are highly similar but not exactly identical are given completely different hash values and stored on two completely random nodes.
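This scattering is easy to demonstrate. The toy sketch below assumes a 1024-node identifier space, with SHA-1 standing in for whatever hash function a given DHT actually uses:

```python
import hashlib

def sha1_hex(key):
    """Hex digest of a keyword, as a DHT might compute it."""
    return hashlib.sha1(key.encode("utf-8")).hexdigest()

def node_for(key, n_nodes=1024):
    """Map a keyword to a node id by reducing its hash into the id space."""
    return int(sha1_hex(key), 16) % n_nodes

# Two nearly identical keywords receive completely unrelated hash values,
# so they are stored at (almost certainly) unrelated nodes.
d1, d2 = sha1_hex("peer-to-peer network"), sha1_hex("peer-to-peer networks")
n1, n2 = node_for("peer-to-peer network"), node_for("peer-to-peer networks")
```

The uniform spread of hash values is exactly what makes exact-match lookup fast and similarity-based semantic search hard.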
Therefore DHT can provide exact-match queries, but supporting semantics is very difficult, and research on DHT-based semantic resource management is scarce. Because the exact keyword mapping that characterizes DHT makes it hard to combine with research results from the information retrieval field, the large-scale application of DHT-based P2P systems is hindered. [a] DHT discovery technology, then, is built entirely on a deterministic topology, which gives it strong control over the guidance of the network and strict management of nodes and data. At the same time, this insistence on a deterministic structure limits the efficiency of the discovery algorithm. Research analyzing DHT-based discovery algorithms has revealed a relationship between two important parameters: the degree (representing the neighbor relations a node maintains, i.e., its routing table) and the link length (the average path length of the discovery algorithm). Studying DHT discovery algorithms through the degree and the diameter, researchers found that these algorithms exhibit an asymptotic curve relationship between the two, as shown below. In an N-node network, the figure shows intuitively that when the degree is N, the diameter of the algorithm is O(1); when each node maintains only one neighbor, the diameter is O(N). These are the two extreme cases of the degree-diameter relationship. At the same time, graph theory shows that degree and diameter cannot both be made arbitrarily small: with degree d, the diameter is at least on the order of log N / log d.

To better support multicast, anycast, and host mobility, the Internet Indirection Infrastructure proposes a communication abstraction based on rendezvous points. In this structure, a packet is not sent directly to its destination; instead, an identifier is assigned to each packet, and the destination node receives the corresponding packets according to that identifier.
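The identifier-based delivery just described can be sketched in a few lines (class and method names are illustrative; the real indirection infrastructure additionally spreads its rendezvous state across a DHT):

```python
class RendezvousPoint:
    """Stores triggers (identifier -> receiver addresses) and forwards
    matching packets to every registered receiver."""
    def __init__(self):
        self.triggers = {}   # identifier -> list of destination addresses
        self.delivered = []  # (address, payload) pairs, for inspection

    def insert_trigger(self, ident, addr):
        # A receiver registers: "send me every packet carrying this identifier".
        self.triggers.setdefault(ident, []).append(addr)

    def send(self, ident, payload):
        # Senders address packets to identifiers, never to hosts directly.
        for addr in self.triggers.get(ident, []):
            self.delivered.append((addr, payload))

rp = RendezvousPoint()
rp.insert_trigger("news-feed", "10.0.0.5")
rp.insert_trigger("news-feed", "10.0.0.9")   # two receivers = cheap multicast
rp.send("news-feed", "hello")
```

Because a receiver can re-register its trigger from a new address, the same indirection also supports mobility, and registering several receivers under one identifier yields an application-layer multicast.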
The identifier in effect represents the rendezvous point of the information. The destination node tells the rendezvous point, by means of a trigger, which packets it wants to receive; when the rendezvous point receives a packet, it forwards it to the corresponding destination node according to the trigger. This indirection infrastructure actually constitutes an overlay network on top of the Internet and requires the overlay's routing system to provide corresponding support; DHT's discovery mechanism is precisely a good foundation platform for implementing such application-layer multicast.

CPU processing capability. There have already been some computing-power-sharing systems based on peer networks, for example SETI@home. The current SETI@home still uses a centralized directory strategy similar to Napster's; Xenoservers will take a step toward truly peer-to-peer applications. Such computing-power-sharing systems can serve applications that require large-scale computing power, such as gene database retrieval and password cracking.
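The work distribution performed by such systems can be sketched with the usual master/worker pattern (a toy illustration; the function names and the substring-search "computation" are stand-ins, and real systems such as SETI@home add result verification and fault tolerance):

```python
def split_work(data, n_workers):
    """Master side: divide a search task into roughly equal work units."""
    size = (len(data) + n_workers - 1) // n_workers
    return [data[i:i + size] for i in range(0, len(data), size)]

def worker_search(unit, pattern):
    """Worker side: scan one unit; here the 'computation' is substring search."""
    return [item for item in unit if pattern in item]

database = [f"gene-{i:04d}" for i in range(1000)]
units = split_work(database, 4)
results = []
for unit in units:                   # in a real system, these run on peer hosts
    results.extend(worker_search(unit, "025"))
```

The idle cycles of many ordinary machines add up to large-scale computing power at low cost, which is exactly the performance/price advantage of P2P discussed at the start of this report.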