FTP is one of the main services of the Internet, and FTP servers store a large amount of shared software, technical material, and multimedia data. Because each FTP server holds many directories, its directory and file structure is complicated. Finding a particular file on a single server is already difficult, and finding it across multiple FTP servers is harder still. Web-based FTP search engines solve this problem well. There are currently many FTP search engines at home and abroad; the better-known ones include Northern Daily Network, Lily Valley Search, and FTP Star Search. To better serve our campus users and the users of the main network node, we designed our own FTP search engine.
1. FTP Search Engine
The FTP search engine consists of three modules: data acquisition, data query, and site maintenance. To implement it, file information is first collected from each FTP site and stored in a database. The user is then given a query interface that accepts search criteria; the criteria are translated into the database query language, the query is executed, and the results are displayed to the user. Once the search engine is running, the database must be kept consistent with the FTP sites, so management and maintenance work is also needed, such as updating the file information of known sites and adding new FTP sites. Its structure is shown below.
We built the FTP search engine on Red Hat Linux 8.0. The web server is Apache, the database is MySQL, and the programming language is PHP.
2. Database Structure and Settings
2.1 File Information Analysis
On an FTP site, the root directory contains many folders and files. The information about each file includes its name, address, size, date, type, and so on. A field is set up in the file table for each of these items. The file name is stored in the Name field; since file names generally do not exceed 255 characters, the field is of type VARCHAR with length 255. The Host field holds the name of the FTP site, indicating which site the file is on. The Address field gives the exact URL of the file; because some file URLs are long, its type is LONGTEXT. With these fields, the file can be located on the network. The size, time, and date of the file are also recorded so that users can identify the file they want. Finally, because queries access the file name field most often, it is made an index field, which improves query speed.
2.2 FTP Site Information Analysis
An FTP site is usually described by a server name, a user name, and a password. The corresponding fields are the site name, the site IP address, the user name, and the user password. The site name is of type VARCHAR with length 60, the IP address is VARCHAR with length 50, the user name is VARCHAR with length 50, and the password is set to the password type with length 60. The FTP site name is also accessed frequently, so it is made an index field as well.
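As a rough illustration, the field analysis above might translate into table definitions along the following lines. This is only a sketch under stated assumptions: the table names FileAddress and FTPServer and the fields Name, Host, and Address are the ones this article uses, while the remaining column names, the column types for size and date, and the use of VARCHAR(60) for the "password type" field are hypothetical choices made for the example.

```php
<?php
// Hypothetical schema for the two tables described in the text.
// Column names other than Name, Host and Address are assumptions.
function schema_statements(): array
{
    return [
        'FileAddress' => "CREATE TABLE FileAddress (
            Name     VARCHAR(255) NOT NULL,  -- file name, indexed for searching
            Host     VARCHAR(60)  NOT NULL,  -- FTP site the file is on
            Address  LONGTEXT     NOT NULL,  -- full URL of the file
            Size     BIGINT UNSIGNED,        -- size in bytes (assumed type)
            FileDate VARCHAR(20),            -- date/time as listed (assumed type)
            INDEX idx_name (Name)
        )",
        'FTPServer' => "CREATE TABLE FTPServer (
            SiteName VARCHAR(60) NOT NULL,   -- indexed site name
            IP       VARCHAR(50) NOT NULL,
            UserName VARCHAR(50) NOT NULL,
            Password VARCHAR(60) NOT NULL,   -- assumed stand-in for 'password type'
            INDEX idx_site (SiteName)
        )",
    ];
}

// Print the statements so they can be reviewed or piped into a MySQL client.
foreach (schema_statements() as $table => $sql) {
    echo "-- $table\n$sql;\n\n";
}
```

Keeping the statements in one function makes it easy to create both tables from an installation script; the index on Name matches the observation that file name lookups dominate the query load.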
2.3 Database Structure Settings
Based on the above analysis, the database is set up as follows. The file_address database contains two tables, FileAddress and FTPServer. The FileAddress table records the file information collected from the FTP sites, and the FTPServer table records the information of each FTP site.
3. Data Acquisition
To build a search engine, the file information of each FTP site must first be collected and recorded in the database so that it can be searched. There are many FTP sites on the Internet. To collect information about one of them, the acquisition program reads the site information from the FTPServer table and then logs in to the site. Most FTP servers open a public access area, called "anonymous FTP", that offers free file services to the public; the user name is normally anonymous and the password is an email address. The acquisition program logs in with this user name and password, walks through all directories of the site, and reads the file information in each directory. It then analyzes the information it receives and stores it in the corresponding fields of the data table. After finishing one site, the program reads the information of the next FTP site and collects its file information. This loop continues until file information has been collected for all known FTP sites.
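The "analyze it" step above can be made concrete with a small parsing sketch. It assumes the server returns Unix-style "ls -l" lines, which is common but not universal, and the helper name parse_list_line() and the returned array keys are hypothetical, not taken from the original program.

```php
<?php
// Parse one line of a Unix-style "ls -l" listing. Other listing formats
// (for example DOS-style) are not handled by this sketch.
// parse_list_line() is a hypothetical helper, not part of the original program.
function parse_list_line(string $line): ?array
{
    // type+permissions, link count, owner, group, size, date, name
    $pattern = '/^([\-dl])[rwxsStT\-]{9}\s+\d+\s+\S+\s+\S+\s+(\d+)\s+'
             . '([A-Za-z]{3}\s+\d{1,2}\s+(?:\d{4}|\d{1,2}:\d{2}))\s+(.+)$/';
    if (!preg_match($pattern, $line, $m)) {
        return null;                      // unrecognised line, e.g. "total 12"
    }
    return [
        'type' => $m[1],                  // "-" file, "d" directory, "l" link
        'size' => (int) $m[2],            // size in bytes
        'date' => preg_replace('/\s+/', ' ', $m[3]),  // collapse padding
        'name' => $m[4],                  // may contain spaces
    ];
}

// Example with two made-up listing lines.
$samples = [
    '-rw-r--r--   1 ftp      ftp          1024 Jan 12  2003 readme.txt',
    'drwxr-xr-x   2 ftp      ftp          4096 Mar  3 10:15 My Files',
];
foreach ($samples as $line) {
    print_r(parse_list_line($line));
}
```

The type character is what lets the acquisition program decide between storing a file record and descending into a subdirectory.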
The program first connects to the FTP site and logs in with the corresponding user name and password. At this point the current directory is usually the root directory, but not always, so the program must first get the current directory and then start collecting the site's file information from there. If this directory is empty (it contains only the two entries "." and ".."), the site has no files and the program logs out. If it is not empty, the program checks whether each entry is a directory. If it is, the program changes the current directory to that subdirectory, scans it, and applies the same check; whenever it finds a further subdirectory, it descends into it in the same way. The data collection program is outlined below.

    get_ftp_info()
    {
        ftp_connect();      /* connect to the FTP site */
        ftp_login();        /* log in to the FTP site */
        ftp_pwd();          /* get the current directory */
        get_path_info();    /* call get_path_info() to process this directory */
        ftp_quit();         /* log out */
    }

get_path_info() is the most important function of the acquisition program. It uses recursive calls to process each directory and file in turn and writes the directory and file information to the database. Here is the implementation code of this function.
    get_path_info()
    {
        ftp_rawlist();        /* read the directory listing */
        if (dir_is_empty)     /* nothing but "." and "..": return */
            return;
        get_path_info();      /* recursive call for each subdirectory */
    }

    //--------------------------------------------------
    function get_path_info($ftpserver, $ftplink, $ftp_dir, $sqlmasterlink)
    {
        mysql_query("use fileaddress", $sqlmasterlink);
        $n_list = ftp_rawlist($ftplink, $ftp_dir);
        // $n_list cannot be a global variable: each recursion level needs its own list.
        if (count($n_list) <= 2) return;   // at least two entries: ".." and "."
        for ($i = 0; $i ...
            // ... (part of the original listing is missing here) ...
                    ... \n";
                    break;
                }
                case "d": {
                    // skip "." and ".." so that the recursion terminates
                    if (($filename != ".") && ($filename != "..")) {
                        ftp_chdir($ftplink, $filename);
                        $ftp_dir = ftp_pwd($ftplink);
                        get_path_info($ftpserver, $ftplink, $ftp_dir, $sqlmasterlink);
                        ftp_chdir($ftplink, "..");
                    }
                    break;
                }
            } // end of switch
        } // end of for
    } // end of function get_path_info()
    //--------------------------------------------------

4. Data Query
Data query covers the design of the query page, the query program, and the handling of query results. The query page is served by the web server and collects the information about the file the user wants to find. The user browses to this page, then fills in and submits a form containing the search criteria, such as the file name and size. After the form is submitted to the web server, the query program analyzes it, generates a query statement, and sends the statement to the database server. The query program then processes the result and generates a web page of hyperlinks for the user to browse.

The information entered by the user is converted into the database query language before the query is run. The regular expressions supported by the MySQL query language can be used for this. The commonly used forms are as follows:
a. The special character "^" matches the beginning of a string. For example, the pattern "^Hello" matches the string "Hello, PHP World!" but does not match "Say Hello to you".
b. The special character "$" matches the end of a string. For example, the pattern "you$" matches "how are you" but does not match "your".
c. When "^" and "$" are used together, the pattern requires an exact match. For example, the pattern "^Hello$" matches only the string "Hello".

When looking up a file, the file name entered by the user is placed in the variable $qfilename, and the following query statement is generated:

    $sql = "select * from $dbtablename where name regexp \"[^ ]*(" . $qfilename . ")[^ ]*" . "\"";

The system provides two retrieval interfaces: file name retrieval and advanced retrieval. Here is the interface and functionality of advanced retrieval. The file information collected into the database includes the file name, size, date, and network address of each file. From the database, the query result can generate the URL of each file, which is its download address. Each result entry has the structure: file name (file size and date), where the file name is a hyperlink to the file's URL. The figure shows the output page of the FTP search engine.
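Returning to the query statement in section 4: because the user's input is placed directly inside a regular expression and an SQL string, characters such as "(", "." or a quotation mark change the meaning of the query. The following sketch shows one way this could be handled; it is not the original program's approach. The helper build_name_pattern() and the function search_files() are hypothetical names, $pdo is assumed to be an already opened PDO connection, and the table and column names follow this article.

```php
<?php
// Hypothetical helper: turn raw user input into a literal-substring pattern
// for MySQL REGEXP by backslash-escaping regular-expression metacharacters.
function build_name_pattern(string $input): string
{
    return preg_replace('/[\\\\^$.\\[\\]|()*+?{}]/', '\\\\$0', trim($input));
}

// Sketch of running the query through PDO with a bound parameter, so the
// pattern never becomes part of the SQL text. $pdo is an assumed, already
// opened connection; FileAddress and Name are the names used in the article.
function search_files(PDO $pdo, string $input): array
{
    $pattern = build_name_pattern($input);
    if ($pattern === '') {
        return [];                        // an empty pattern is not a useful search
    }
    $stmt = $pdo->prepare('SELECT * FROM FileAddress WHERE Name REGEXP ?');
    $stmt->execute([$pattern]);
    return $stmt->fetchAll(PDO::FETCH_ASSOC);
}

// The escaping step can be tried without a database connection.
echo build_name_pattern('setup(1).exe'), "\n";
```

With the metacharacters escaped, a search for setup(1).exe looks for that literal text instead of treating the parentheses as a group and the dot as a wildcard. Users who want "^" and "$" anchors would need a separate, deliberately unescaped input mode.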