In Software Development, Creativity Is the Soul: Extracting Web Page Links with HttpWebRequest and Regular Expressions
Sunhai
Development tools: Microsoft Visual Studio .NET 2003
Operating system: Windows XP
Question: What is the use of extracting links from a web page? You can build, for example, a web address collector, an email address collector, a picture or Flash collector, and more. This article describes an efficient and simple way to extract the link addresses in a web page.
There are several ways to extract link addresses from a web page. In the VS.NET development environment there are two main ones:

1. Use the AxWebBrowser control: load the page in the control, then extract the links.
2. Do without the AxWebBrowser control: first get the page source, then extract the links from it.

I tried the first way first. It has to wait for the page to finish loading, and the browser downloads a lot of extra content, so it is slow. I therefore recommend combining HttpWebRequest with regular expressions to obtain the links in a page. This article covers the following steps:

1. Get the page source with HttpWebRequest
2. Get the link addresses with a regular expression
3. Remove duplicate addresses
4. Save the addresses to XML

Getting the page source with HttpWebRequest
    'At the top of the file:
    Imports System.IO
    Imports System.Net

    Dim url As String = "http://sunhai.tianyablog.com" 'This is my site, come and visit
    Dim httpReq As System.Net.HttpWebRequest
    Dim httpResp As System.Net.HttpWebResponse
    Dim httpURL As New System.Uri(url)
    httpReq = CType(WebRequest.Create(httpURL), HttpWebRequest)
    httpReq.Method = "GET"
    'KeepAlive gets or sets a value indicating whether a persistent connection
    'is established with the Internet resource; set it before sending the request
    httpReq.KeepAlive = False
    httpResp = CType(httpReq.GetResponse(), HttpWebResponse)
    Dim reader As StreamReader = _
        New StreamReader(httpResp.GetResponseStream, System.Text.Encoding.GetEncoding("GB2312"))
    Dim respHTML As String = reader.ReadToEnd() 'respHTML is the page source
    reader.Close()
    httpResp.Close()
Isn't it simple? For the concepts involved, MSDN has very detailed documentation: click "Help" in VS.NET 2003, click "Search", type the name (for example HttpWebRequest) and press Enter, and all the information is there. Whenever you meet a programming concept you do not understand, please search MSDN first; I will not repeat this advice below.

Getting the link addresses with a regular expression
    Dim strRegex As String = "http://([\w-]+\.)+[\w-]+(/[\w- ./?%&=]*)?" 'This is the expression
    Dim r As System.Text.RegularExpressions.Regex
    Dim m As System.Text.RegularExpressions.MatchCollection
    r = New System.Text.RegularExpressions.Regex(strRegex, System.Text.RegularExpressions.RegexOptions.IgnoreCase)
    m = r.Matches(respHTML)
    Dim i As Integer
    For i = 0 To m.Count - 1
        Form1.DefInstance.ListBox1.Items.Add(m(i).Value) 'Form1.DefInstance is the shared instance of Form1
    Next i
    Form1.DefInstance.ListBox1.Visible = True 'Make the ListBox visible

The code adds each match to ListBox1 through Form1.DefInstance, a shared property that returns the instance of Form1. For a shared method or property we can use the "ClassName.SharedMember" form without creating an instance ourselves. The property is set up as follows:
    Private Shared m_vb6FormDefInstance As Form1

    Public Shared Property DefInstance() As Form1
        Get
            'Create the form if no live instance exists yet
            If m_vb6FormDefInstance Is Nothing OrElse m_vb6FormDefInstance.IsDisposed Then
                m_vb6FormDefInstance = New Form1
            End If
            DefInstance = m_vb6FormDefInstance
        End Get
        Set(ByVal Value As Form1)
            m_vb6FormDefInstance = Value
        End Set
    End Property
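The extraction step can also be tried on its own, without a form or a network request. The following is a minimal console sketch under stated assumptions: it reuses the article's expression, while the names LinkExtractDemo and ExtractLinks and the sample HTML string are made up for illustration.

```vbnet
Imports System
Imports System.Collections
Imports System.Text.RegularExpressions

Module LinkExtractDemo
    ' Returns every http:// address found in html, in order of appearance.
    ' ExtractLinks is a hypothetical helper name, not part of the article's code.
    Function ExtractLinks(ByVal html As String) As ArrayList
        Dim pattern As String = "http://([\w-]+\.)+[\w-]+(/[\w- ./?%&=]*)?"
        Dim r As New Regex(pattern, RegexOptions.IgnoreCase)
        Dim links As New ArrayList
        Dim m As Match
        For Each m In r.Matches(html)
            links.Add(m.Value)
        Next
        Return links
    End Function

    Sub Main()
        ' Made-up test input; in the article this would be respHTML
        Dim sampleHtml As String = _
            "<a href=""http://example.com/a.html"">A</a> " & _
            "<img src=""http://img.example.com/b.gif"">"
        Dim link As String
        For Each link In ExtractLinks(sampleHtml)
            Console.WriteLine(link)
        Next
    End Sub
End Module
```

Because the expression starts with a literal http://, it finds absolute addresses only; relative links such as href="page2.html" are not matched.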
Removing duplicate addresses
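Duplicates can also be removed without relying on the order of the list, by remembering the addresses already seen in a Hashtable. This is a minimal sketch of that swapped-in technique, not the method used in this article; DedupeDemo, RemoveDuplicates and the sample addresses are hypothetical, and the items are assumed to be non-null strings.

```vbnet
Imports System
Imports System.Collections

Module DedupeDemo
    ' Returns the items of source with duplicates dropped,
    ' keeping the first occurrence of each item in its original order.
    Function RemoveDuplicates(ByVal source As ArrayList) As ArrayList
        Dim seen As New Hashtable
        Dim result As New ArrayList
        Dim item As String
        For Each item In source
            If Not seen.ContainsKey(item) Then
                seen.Add(item, Nothing) ' only the key matters here
                result.Add(item)
            End If
        Next
        Return result
    End Function

    Sub Main()
        ' Made-up sample addresses
        Dim urls As New ArrayList
        urls.Add("http://example.com/a")
        urls.Add("http://example.com/b")
        urls.Add("http://example.com/a")
        Dim u As String
        For Each u In RemoveDuplicates(urls)
            Console.WriteLine(u)
        Next
    End Sub
End Module
```

The comparison is case-sensitive, so addresses that differ only in letter case are kept as separate items.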
    Dim countForms As Integer 'The following code removes duplicate addresses
    Dim lstForms() As String
    Dim curId As Integer
    With FormBrow.DefInstance.ListBox1
        If .Items.Count > 0 Then 'nothing to do for an empty list
            ReDim Preserve lstForms(0)
            lstForms(0) = CStr(.Items(0)) 'first item of the new array = first item of the list
            For countForms = 1 To .Items.Count - 1 '.Items.Count is the number of items in the list
                curId = UBound(lstForms) 'curId is the index of the last item in the new array
                If CStr(.Items(countForms)) <> lstForms(curId) Then 'item differs from the last one kept
                    ReDim Preserve lstForms(curId + 1) 'grow the new array by one
                    lstForms(curId + 1) = CStr(.Items(countForms)) 'copy the item into the new array
                End If
            Next countForms
            .Items.Clear() 'delete all items of the old list
            For countForms = 0 To UBound(lstForms) 'write the new array back to the list
                .Items.Add(lstForms(countForms))
            Next countForms
        End If
    End With

Note that each item is compared only with the last item kept, so this code removes adjacent duplicates only; the list must be sorted first (for example by setting the ListBox's Sorted property to True).

Exporting the addresses to XML

Extensible Markup Language (XML) is a markup language that provides a format for describing data. It allows more precise declarations of content across multiple platforms and more meaningful search results. In addition, XML separates data from its presentation. In HTML, for example, tags tell the browser to display data as bold or italic; in XML, tags are used only to describe the data, such as a city name, a temperature, or an atmospheric pressure. In XML, data is displayed in the browser by means of a style sheet, such as Extensible Stylesheet Language (XSL) or Cascading Style Sheets (CSS). Because XML separates data from presentation and processing, you can display and process the same data as needed by applying different style sheets and applications.
XML is a subset of SGML optimized for delivery over the Web. It is defined by the World Wide Web Consortium (W3C). This standardization ensures that structured data is uniform and independent of any application or vendor.
XML is at the core of many features of Visual Studio .NET and the .NET Framework. Since XML is a format that many different applications can use, we can save the collected links as XML.

XmlTextWriter is an implementation of the XmlWriter class that provides an API for writing XML to a file, a stream, or a TextWriter. The class has many validation and checking rules to ensure that the XML it writes is well-formed. When one of these rules is violated, an exception is thrown, and these exceptions should be caught. XmlTextWriter has several constructors, each of which specifies a different kind of destination for the XML data. The code below uses the constructor that writes XML to a file.

The code first uses the Formatting property to specify the format of the XML being written. When this property is set to Indented, the writer uses the Indentation and IndentChar properties to indent child elements. The code shows the write method that corresponds to each XML node type. For example, writing an element calls the WriteElementString method, and writing an attribute calls the WriteAttributeString method. For nested levels you can use WriteStartElement / WriteEndElement pairs; to create more complex attributes you can use WriteStartAttribute / WriteEndAttribute.

Note how the code uses the WriteStartDocument method to write an XML declaration with version number "1.0". If you want the writer to check that the document is well-formed (XML declaration first, DOCTYPE in the prolog, only one root-level element, and so on), you must call this optional WriteStartDocument method before calling any other write method. Next, the code calls the WriteDocType method to write a document type named "urls". The third parameter of the WriteDocType call makes the writer emit SYSTEM "urls.dtd". With this in place, the XML file names an external DTD against which it is to be validated.
Finally, the code calls the Flush method to save the XML data to the file before calling the Close method. (Although this example really needs only the Close method, there are cases where the generated XML must be saved while the writer is kept open for reuse; Flush serves those cases.)
To check the output of XmlTextWriter, you can perform a round-trip test: read the generated file back with XmlTextReader to verify that the XML is well-formed.
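Such a round-trip check might look like the following minimal sketch. RoundTripDemo and IsWellFormed are hypothetical names, and the sketch assumes that urls.dtd may not exist on disk, so it tells the reader not to load the external DTD.

```vbnet
Imports System
Imports System.Xml

Module RoundTripDemo
    ' Returns True if the file can be read from start to end as well-formed XML.
    ' A missing file is not handled here and raises the usual I/O exception.
    Function IsWellFormed(ByVal fileName As String) As Boolean
        Dim reader As New XmlTextReader(fileName)
        ' Assumption: do not try to load the external DTD named in the DOCTYPE
        reader.XmlResolver = Nothing
        Try
            While reader.Read()
                ' Nothing to do: reading every node is the check
            End While
            Return True
        Catch ex As XmlException
            Return False
        Finally
            reader.Close()
        End Try
    End Function

    Sub Main(ByVal args() As String)
        ' Usage: pass the path of the saved XML file as the first argument
        If args.Length > 0 Then
            Console.WriteLine(IsWellFormed(args(0)))
        End If
    End Sub
End Module
```

This checks well-formedness only; it does not validate the document against the DTD.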
    'At the top of the file:
    Imports System.Xml

    Private Sub saveXml()
        Dim saveFileDialog1 As New SaveFileDialog
        saveFileDialog1.Filter = "xml|*.xml"
        saveFileDialog1.Title = "Save a xml File"
        saveFileDialog1.ShowDialog()
        If saveFileDialog1.FileName <> "" Then 'if the file name is not empty
            Dim fileName As String = saveFileDialog1.FileName
            If Not System.IO.File.Exists(fileName) Then 'if no file with the same name exists
                Dim myXmlTextWriter As XmlTextWriter = New XmlTextWriter(fileName, Nothing)
                myXmlTextWriter.Formatting = System.Xml.Formatting.Indented 'indent the output
                myXmlTextWriter.WriteStartDocument(False)
                myXmlTextWriter.WriteDocType("urls", Nothing, "urls.dtd", Nothing)
                myXmlTextWriter.WriteComment("This file save the Urls") 'comment
                myXmlTextWriter.WriteStartElement("urls") 'start the root element
                myXmlTextWriter.WriteStartElement("URL1", Nothing) 'start the child element
                myXmlTextWriter.WriteAttributeString("Now", CStr(Now)) 'record the time
                For countAll As Integer = 0 To ListBox1.Items.Count - 1
                    'Take the last 3 characters of the URL as the element name;
                    'they must form a valid XML name or the output will not be well-formed
                    Dim title As String = Strings.Right(CStr(ListBox1.Items.Item(countAll)), 3)
                    Dim body As String = CStr(lstMuLu.Items.Item(countAll))
                    myXmlTextWriter.WriteElementString(title, Nothing, body)
                Next
                myXmlTextWriter.WriteEndElement()
                myXmlTextWriter.WriteEndElement()
                'Write the XML to the file and close myXmlTextWriter
                myXmlTextWriter.Flush()
                myXmlTextWriter.Close()
            End If
        End If
    End Sub
In software development, creativity is the soul

In the practice of software development, what we truly find ourselves short of is never technology but creativity. Creativity is the soul of software development. As development tools are upgraded and evolve, software development is becoming more and more like building with blocks. We spend more of our effort learning the development tools than writing low-level code, because the tools already handle much of that code for us.

It is not that I am especially proficient in software development. In fact, I started teaching myself programming (VB6) in October 2003 and moved to VS.NET in November. In January 2004 I completed my first piece of software, AdKing. As long as the learning method is right, mastery is not difficult. I discussed this in my first VS.NET article, "VS.NET learning methodology".

Everyone engaged in software development (naturally including me) might ask: what is the creative idea behind the software? Can you be sure that your creativity has exhausted the technology you have mastered? How much of your time goes into writing code, and how much into thinking? How many different applications could be written with the technique in this article alone? What matters is not how many applications you have written: if you can list a large number of possible projects and schemes, you can naturally choose the one that suits you best. As the saying goes, sharpening the axe does not delay the woodcutting.

Take "getting the link addresses with a regular expression" as an example. As long as the value of strRegex is changed flexibly, we can extract whatever we want from the page source. If you want to develop an email address collector, strRegex = "[\w-]+@([\w-]+\.)+[\w-]+" will do. What if you want to collect Flash? If you can list 50 schemes, please tell me. If you can list more schemes than I can, I salute you and will take you as my teacher.

My QQ: 26624998
My website: http://blog.9cbs.net/2066/
This article's address: http://www.9cbs.net/develop/read_article.asp?Id=23731
January 24, 2004