Building a spider-like page grabber: how do I strip the HTML, JS, and CSS from a captured page and keep the remaining content?

xiaoxiao  2021-03-06  17

Topic: Building a spider-like page grabber: how do I strip the HTML, JS, and CSS from a captured page and keep the remaining content? Author: PCIBM (PCIBM) Credit value: 67 Forum: Web Development, ASP Question points: 50 Replies: 4 Posted: 2004-12-03 11:01:38

I am writing a program that fetches pages the way a spider does. Once a page has been captured, how do I remove the HTML, JS, and CSS so that only the page content remains?

Reply from: Butcher2002 (personal opinion only; not guaranteed to be correct) Reputation: 100 2004-12-03 11:06:00 Score: 0 onclick=alert(Test.outerHTML)

Reply from: huangchao (super) Reputation: 100 2004-12-03 11:07:00 Score: 0 Following this thread.

Reply from: Babyt Reputation: 100 2004-12-03 11:13:00 Score: 0 Try running the captured content through this function:

<%
Function RemoveHTML(strHTML)
    Dim objRegExp, Match, Matches
    Set objRegExp = New RegExp
    objRegExp.IgnoreCase = True
    objRegExp.Global = True
    ' Match each tag: "<", then as few characters as possible, then ">"
    objRegExp.Pattern = "<.+?>"
    ' Run the match against the input
    Set Matches = objRegExp.Execute(strHTML)
    ' Walk the match collection and remove every matched tag
    For Each Match In Matches
        strHTML = Replace(strHTML, Match.Value, "")
    Next
    RemoveHTML = strHTML
    Set objRegExp = Nothing
End Function
%>
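
The function above removes the tags themselves, but the text inside script and style elements is not a tag, so the JS and CSS source would stay in the output. Below is a minimal sketch of one way to drop those blocks first, assuming VBScript 5.5 or later (for the non-greedy `*?` quantifier). The name StripPage and the whitespace cleanup at the end are my own additions, not part of Babyt's reply, and regular expressions will not cope with every malformed page.

```vbscript
<%
' Hypothetical helper (not from the thread): returns the visible text of a page.
' ByVal keeps the caller's variable unchanged.
Function StripPage(ByVal strHTML)
    Dim objRegExp
    Set objRegExp = New RegExp
    objRegExp.IgnoreCase = True
    objRegExp.Global = True

    ' 1. Remove <script> and <style> elements together with their contents.
    '    [\s\S] is used because "." does not match line breaks.
    objRegExp.Pattern = "<(script|style)[^>]*>[\s\S]*?</\1\s*>"
    strHTML = objRegExp.Replace(strHTML, "")

    ' 2. Remove HTML comments.
    objRegExp.Pattern = "<!--[\s\S]*?-->"
    strHTML = objRegExp.Replace(strHTML, "")

    ' 3. Remove the remaining tags, including tags that span several lines.
    objRegExp.Pattern = "<[^>]+>"
    strHTML = objRegExp.Replace(strHTML, "")

    ' 4. Collapse runs of whitespace left behind by the removed markup.
    objRegExp.Pattern = "\s+"
    strHTML = objRegExp.Replace(strHTML, " ")

    StripPage = Trim(strHTML)
    Set objRegExp = Nothing
End Function
%>
```

For example, calling StripPage on "<html><head><style>p{color:red}</style><script>alert(1)</script></head><body><p>Hello <b>world</b></p></body></html>" should return "Hello world". Entities such as &nbsp; are left as they are; decoding them would need a separate step.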

Reply from: pswdf (小) Reputation: 108 2004-12-03 11:30:00 Score: 0 Use a regular expression to replace them.

If it is a URL, don't capture it.

When reposting, please cite the original URL: https://www.9cbs.com/read-45505.html
