Extract the links from a web page

xiaoxiao2021-03-06

Enter a URL, and the page below extracts the links contained in that web page. The implementation is short because most of the work is done by a single regular expression.
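As a standalone illustration of the same idea, here is a minimal console sketch in VB.NET. It is not the article's code: the module name LinkExtractor, the function ExtractLinks, and the regular expression (which captures the href value of each opening <a> tag) are assumptions chosen for this example. A pattern like this is a heuristic, not an HTML parser, so links built by script or written in unusual markup will be missed.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Public Module LinkExtractor

    ' Assumed, illustrative pattern: matches an opening <a ...> tag and captures
    ' its href value (double-quoted, single-quoted or unquoted) in the group "url".
    Private ReadOnly HrefPattern As New Regex( _
        "<a\s[^>]*?href\s*=\s*['""]?(?<url>[^'""\s>]+)['""]?[^>]*>", _
        RegexOptions.IgnoreCase)

    ' Returns every href value found in the given HTML, in document order.
    Public Function ExtractLinks(ByVal html As String) As List(Of String)
        Dim links As New List(Of String)()
        For Each m As Match In HrefPattern.Matches(html)
            links.Add(m.Groups("url").Value)
        Next
        Return links
    End Function

    Public Sub Main()
        ' Hypothetical sample input mixing quote styles and tag case.
        Dim sample As String = _
            "<p><a href=""index.htm"">Home</a> and " & _
            "<A class='x' HREF='http://example.com/'>Example</A></p>"
        For Each link As String In ExtractLinks(sample)
            Console.WriteLine(link)
        Next
    End Sub

End Module
```

Run as a console program, it prints index.htm and http://example.com/ on separate lines. The sketch uses Regex.Matches because it only needs to collect the links; the page below instead calls Regex.Replace with a MatchEvaluator, which invokes MatchHandler once per match and builds a new copy of the page from the handler's return values.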

The geturl.aspx code is as follows:

<%@ Page Language="vb" Codebehind="geturl.aspx.vb" AutoEventWireup="false" Inherits="aspxweb.geturl" %>

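<form runat="server">
<%-- Control markup inferred from the code-behind declarations; attributes other than Width/Height are assumed. --%>
<asp:Label id="Label1" runat="server" />
<asp:TextBox id="UrlTextBox" runat="server" Text="http://lucky_love.www1.dotnetplayground.com/" />
<asp:Button id="ScrapeButton" runat="server" OnClick="ScrapeButton_Click" />
<asp:Label id="TipResult" runat="server" />
<asp:TextBox id="ResultLabel" runat="server" TextMode="MultiLine" Width="100%" Height="400" />
</form>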

The code-behind file geturl.aspx.vb is as follows:

Imports System.IO
Imports System.Net
Imports System.Text
Imports System.Text.RegularExpressions
Imports System

Public Class getURL
    Inherits System.Web.UI.Page

    Protected WithEvents Label1 As System.Web.UI.WebControls.Label
    Protected WithEvents UrlTextBox As System.Web.UI.WebControls.TextBox
    Protected WithEvents ScrapeButton As System.Web.UI.WebControls.Button
    Protected WithEvents TipResult As System.Web.UI.WebControls.Label
    Protected WithEvents ResultLabel As System.Web.UI.WebControls.TextBox

#Region " Web Form Designer Generated Code "

    'This call is required by the Web Form Designer.
    Private Sub InitializeComponent()

    End Sub

    Private Sub Page_Init(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Init
        'CODEGEN: This method call is required by the Web Form Designer.
        'Do not modify it using the code editor.
        InitializeComponent()
    End Sub

#End Region

    Private Sub Page_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load
        'Put user code to initialize the page here
        Label1.Text = "Please enter a URL:"
        ScrapeButton.Text = "Extract HREF links"
    End Sub

    Private Report As New StringBuilder()
    Private WebPage As String
    Private CountOfMatches As Int32

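    ' Wired to the button through OnClick in the .aspx markup (no Handles clause here).
    Public Sub ScrapeButton_Click(ByVal sender As System.Object, ByVal e As System.EventArgs)
        WebPage = GrabUrl()

        ' MatchHandler is called once for every match and returns its replacement text.
        Dim myDelegate As New MatchEvaluator(AddressOf MatchHandler)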

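        ' Assumed pattern: captures the href value of each opening <a> tag in a group
        ' named "url" (hypothetical name; MatchHandler must read the same group).
        Dim linksExpression As New Regex( _
            "<a\s[^>]*?href\s*=\s*['""]?(?<url>[^'""\s>]+)['""]?[^>]*>", _
            RegexOptions.Multiline Or RegexOptions.IgnoreCase Or RegexOptions.IgnorePatternWhitespace)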

        Dim newWebPage As String = linksExpression.Replace(WebPage, myDelegate)

        ' Summary shown in TipResult; the <br> line-break markup is an assumption.
        TipResult.Text = "HREF links extracted from " & UrlTextBox.Text & _
            ": found and organized " & CountOfMatches.ToString() & " links<br>" & _
            Report.ToString().Replace(Environment.NewLine, "<br>")

        TipResult.Text &= "<br>Page