A spider program written with HttpUnit: it can detect error pages on a website!

xiaoxiao 2021-03-06

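This program comes from the book Java Tools for Extreme Programming. It starts at a site's home page, follows every link that stays on the same host, and prints each URL it checks.

import com.meterware.httpunit.*;
import java.util.HashSet;
import java.util.Set;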

public class CheckSite {

    private WebConversation conversation;

    // URLs already seen, so that no link is checked twice
    private Set<String> checkedLinks;

    // Only links on this host are fetched and followed
    private String host = "www.sohu.com";

    public static void main(String[] args) throws Exception {
        CheckSite cs = new CheckSite();
        cs.setUp();
        cs.testEntireSite();
    }

    public void setUp() {
        conversation = new WebConversation();
        checkedLinks = new HashSet<String>();
    }

    public void testEntireSite() throws Exception {
        WebResponse response = conversation.getResponse("http://" + host);
        checkAllLinks(response);
        System.out.println("Site check finished. Links checked: "
                + checkedLinks.size() + " : " + checkedLinks);
    }

    private void checkAllLinks(WebResponse response) throws Exception {
        if (!isHtml(response)) {
            return;
        }
        WebLink[] links = response.getLinks();
        System.out.println(response.getTitle() + " - links found = " + links.length);
        // The source listing is cut off after "for (int i = 0; i".
        // The rest of this loop is a reconstruction (an assumption):
        // record each link and check only those not seen before.
        for (int i = 0; i < links.length; i++) {
            String url = links[i].getRequest().getURL().toExternalForm();
            if (checkedLinks.add(url)) {
                checkLink(links[i]);
            }
        }
    }

    private boolean isHtml(WebResponse response) {
        return response.getContentType().equals("text/html");
    }

    private void checkLink(WebLink link) throws Exception {
        WebRequest request = link.getRequest();
        java.net.URL url = request.getURL();
        System.out.println("checking link: " + url);
        String linkHost = url.getHost();
        // Follow the link only if it stays on the site being checked
        if (linkHost.equals(this.host)) {
            WebResponse response = conversation.getResponse(request);
            this.checkAllLinks(response);
        }
    }
}
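
The crawl above boils down to three checks per link: has it been seen before, is it on the same host, and is the response HTML. The sketch below replays that logic against an in-memory map instead of a live site, so it runs without HttpUnit or a network connection. The class `SiteCheckSketch`, the `Page` type, the `example.test` URLs, and the `brokenLinks` list are all hypothetical and not part of the book's listing. Treating a status of 400 or above as broken is likewise an assumption made for illustration.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch only: an in-memory stand-in for a site, not the HttpUnit-based program.
class SiteCheckSketch {

    // Hypothetical canned response; this is not an HttpUnit type.
    static final class Page {
        final int status;
        final String contentType;
        final List<String> links;

        Page(int status, String contentType, String... links) {
            this.status = status;
            this.contentType = contentType;
            this.links = List.of(links);
        }
    }

    private final Map<String, Page> site;  // URL -> canned response
    private final String host;             // only this host is followed
    final Set<String> checkedLinks = new LinkedHashSet<>();
    final List<String> brokenLinks = new ArrayList<>();

    SiteCheckSketch(Map<String, Page> site, String host) {
        this.site = site;
        this.host = host;
    }

    void check(String url) {
        if (!checkedLinks.add(url)) {
            return; // seen before: this is what stops link cycles
        }
        if (!host.equals(URI.create(url).getHost())) {
            return; // off-site link: counted, but never fetched
        }
        Page page = site.get(url);
        if (page == null || page.status >= 400) {
            brokenLinks.add(url); // assumption: unknown URL or error status means broken
            return;
        }
        if (!"text/html".equals(page.contentType)) {
            return; // only HTML pages carry links to follow
        }
        for (String link : page.links) {
            check(link);
        }
    }

    // Made-up demo data: a home page, one sub-page, an image, and a dead link.
    static Map<String, Page> demoSite() {
        return Map.of(
            "http://example.test/", new Page(200, "text/html",
                "http://example.test/a", "http://example.test/logo.png",
                "http://other.test/x", "http://example.test/"),
            "http://example.test/a", new Page(200, "text/html",
                "http://example.test/", "http://example.test/missing"),
            "http://example.test/logo.png", new Page(200, "image/png"),
            "http://example.test/missing", new Page(404, "text/html"));
    }

    public static void main(String[] args) {
        SiteCheckSketch spider = new SiteCheckSketch(demoSite(), "example.test");
        spider.check("http://example.test/");
        System.out.println("Links checked: " + spider.checkedLinks.size());
        System.out.println("Broken links: " + spider.brokenLinks);
    }
}
```

Running `main` on the demo map should report 5 links checked and one broken link, http://example.test/missing. The off-site link is counted but never followed, which mirrors the host comparison in checkLink.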

When reposting, please cite the original address: https://www.9cbs.com/read-83696.html
