Searcharoo Too: Populating the Search Catalog with a C# Spider


Download the Source Code for this article [zip 8kb]

Comment on this article at thecodeproject

Article I describes building a simple search engine that crawls the filesystem from a specified folder and indexes all HTML (or other types of) documents. A basic design and object model were developed, as well as a query/results page which you can see here.

This second article in the series discusses replacing the 'filesystem crawler' with a 'web spider' to search and catalog a website by following the links in the HTML. The challenges involved include:

- Downloading HTML (and other document types) via HTTP
- Parsing the HTML looking for links to other pages
- Ensuring that we don't keep recursively searching the same pages, resulting in an infinite loop
- Parsing the HTML to extract the words to populate the search catalog from Article I
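The third challenge (avoiding an infinite loop) comes down to remembering which URLs have already been visited. The following is a minimal sketch of that idea only: the `links` dictionary is a hypothetical in-memory stand-in for real pages, and the `Crawl` function is an illustrative name, not part of the Searcharoo code.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical link graph standing in for real pages: URL -> links found on that page.
var links = new Dictionary<string, string[]>
{
    ["/"]          = new[] { "/news.htm", "/about.htm" },
    ["/news.htm"]  = new[] { "/", "/about.htm" },   // links back to the start page
    ["/about.htm"] = new[] { "/news.htm" },
};

// Returns pages in the order they were first visited; each URL is processed only once.
List<string> Crawl(string start)
{
    var visited = new HashSet<string>();
    var order = new List<string>();
    var pending = new Queue<string>();
    pending.Enqueue(start);
    while (pending.Count > 0)
    {
        string url = pending.Dequeue();
        if (!visited.Add(url)) continue;   // already seen: skipping it breaks the cycle
        order.Add(url);
        if (links.TryGetValue(url, out var found))
            foreach (string next in found) pending.Enqueue(next);
    }
    return order;
}

Console.WriteLine(string.Join(", ", Crawl("/")));   // prints "/, /news.htm, /about.htm"
```

The sketch uses a queue rather than recursion purely to keep it short; the 'visited' check is the part that matters, and it works the same way in a recursive spider.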

Design

The design from Article I remains unchanged...

A Catalog contains a collection of Words, and each Word contains a reference to every File that it appears in
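To make the Catalog/Word/File relationship concrete, here is a rough sketch using generic collections. It is not the Article I object model (which defines its own Catalog, Word and File classes); the `catalog` dictionary, the `AddWord` helper and the sample file names are assumptions made purely for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the Catalog: word -> set of files (URLs) containing that word.
var catalog = new Dictionary<string, HashSet<string>>(StringComparer.OrdinalIgnoreCase);

// Record that 'word' appears in 'file', creating the word's entry on first use.
void AddWord(string word, string file)
{
    if (!catalog.TryGetValue(word, out var files))
    {
        files = new HashSet<string>();
        catalog[word] = files;
    }
    files.Add(file);
}

AddWord("snow", "/alaska.htm");
AddWord("snow", "/norway.htm");
AddWord("cold", "/norway.htm");

// Looking up a word yields every file it appears in.
Console.WriteLine("'snow' appears in " + catalog["snow"].Count + " file(s)");
```

Whatever populates this structure (a filesystem crawler or a web spider) is independent of how it is searched, which is the point the article makes about the unchanged design.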

... the object model is the same too ...

What has changed is the way the Catalog is populated. Instead of looping through folders in the filesystem to look for files to open, the code requires the URL of a start page which it will load, index and then attempt to follow every link within that page.

Code structure

Some of the code from Article I will be referenced again, but we've added a new page - SearcharooSpider.aspx - that does the HTTP access and HTML link parsing (making the code that walks directories in the filesystem - SearcharooCrawler.aspx - obsolete). We've also changed the name of the search page to SearcharooToo.aspx so you can use it side-by-side with the old one.

Searcharoo.cs
    Implementation of the object model; compiled into both ASPX pages.
    RE-USED FROM ARTICLE I

SearcharooCrawler.aspx
    OBSOLETE, REPLACED WITH SPIDER

SearcharooToo.aspx
    <%@ Page Language="C#" Src="Searcharoo.cs" %>
    <%@ Import Namespace="Searcharoo.Net" %>
    Retrieves the Catalog object from the Cache and allows searching via an HTML form.
    UPDATED SINCE ARTICLE I to improve usability, and renamed to SearcharooToo.aspx

SearcharooSpider.aspx
    <%@ Page Language="C#" Src="Searcharoo.cs" %>
    <%@ Import Namespace="Searcharoo.Net" %>
    Starting from the start page, download and index every linked page.
    NEW PAGE FOR THIS ARTICLE

There are three fundamental tasks for a search spider:

1. Finding the pages to index
2. Downloading each page successfully
3. Parsing the page content and indexing it

The big search engines - Yahoo, Google, MSN - all 'spider' the internet to build their search catalogs. Following links to find documents requires us to write an HTML parser that can find and interpret the links, and then follow them. This includes being able to follow HTTP-302 redirects, recognising the type of document that has been returned, determining what character set/encoding was used (for text and HTML documents), etc. - basically a mini-browser! We'll start small and attempt to build a passable spider using C#...

Build the spider [SearcharooSpider_alpha.aspx]

Getting Started - Download a page

To get something working quickly, let's just try to download the 'start page' - say the root page of the local machine (i.e. Step 2 - Downloading pages). Here is the simplest possible code to get the contents of an HTML page from a website (localhost in this case):

    using System.Net;
    using System.Text;   // for UTF8Encoding
    /*...*/
    string url = "http://localhost/";   // just for testing
    WebClient browser = new WebClient();
    UTF8Encoding enc = new UTF8Encoding();
    string fileContents = enc.GetString(browser.DownloadData(url));

Listing 1 - Simplest way to download an HTML document

The first thing to notice is the inclusion of the System.Net namespace. It contains a number of useful classes including WebClient, which is a very simple 'browser-like' object that can download text or data from a given URL.

The second thing is that we assume the page is encoded using UTF-8, using the UTF8Encoding class to convert the downloaded Byte[] array into a string. If the page returned was encoded differently (say, Shift_JIS or GB2312) then this conversion would produce garbage. We'll have to fix this later.

The third thing, which might not be immediately obvious, is that I haven't actually specified a page in the URL. We rely on the server to resolve the request and return the default document to us - however the server might have issued a 302 Redirect to another page (or another directory, or even another site). WebClient will successfully follow those redirects but its interface has no simple way for the code to query what the page's actual URL is (after the redirects). We'll have to fix this later, too, otherwise it's impossible to resolve relative URLs within the page.

Despite those problems, we now have the full text of the 'start page' in a variable. That means we can begin to work on the code for Step 1 - Finding pages to index.
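The UTF-8 assumption described above can be seen failing without any network access. This is only a sketch: the `downloaded` byte array is a hypothetical stand-in for what `DownloadData` might return from a server that sent ISO-8859-1 (chosen here instead of Shift_JIS or GB2312 because it is available in .NET without registering extra code pages).

```csharp
using System;
using System.Text;

// Pretend these bytes came from a server that encoded the page as ISO-8859-1.
Encoding serverEncoding = Encoding.GetEncoding("iso-8859-1");
byte[] downloaded = serverEncoding.GetBytes("caf\u00E9");   // "café"

// Decoding with the encoding the server actually used recovers the text...
string correct = serverEncoding.GetString(downloaded);

// ...but blindly assuming UTF-8 (as Listing 1 does) corrupts the accented character.
string garbled = new UTF8Encoding().GetString(downloaded);

Console.WriteLine(correct == "caf\u00E9");      // True
Console.WriteLine(garbled == "caf\u00E9");      // False
Console.WriteLine(garbled.Contains('\uFFFD'));  // True: the invalid byte became a replacement character
```

The ASCII part of the text survives either way, which is why this kind of bug often goes unnoticed on English-only test pages.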

Parsing the page

There are two options (OK, probably more, but two main options) for parsing the links (and other data) out of HTML:

1. Reading in the entire page string, building a DOM and walking through its elements looking for links, or
2. Using Regular Expressions to find link patterns in the page string.

Although I suspect "commercial" search engines might use option 1 (building a DOM), it's much simpler to use Regular Expressions. Because my initial test website had very well-formed HTML, I could get away with this code:

    // requires: using System.Collections; using System.Text.RegularExpressions;
    // Create ArrayLists to hold the links we find...
    ArrayList linkLocal = new ArrayList();
    ArrayList linkExternal = new ArrayList();
    // Dodgy Regex will find *some* links
    foreach (Match match in Regex.Matches(htmlData
        , @"(?<=<(a|area)\s+href="").*?(?=""\s*/?>)"
        , RegexOptions.IgnoreCase | RegexOptions.ExplicitCapture))
    {
        string link = match.Value;           // Regex matches from the opening "quote
        int spacePos = link.IndexOf(' ');    // find first space (i.e. no spaces in URL)
        int quotePos = link.IndexOf('"');    // or first closing quote (single quotes not supported)
        int chopPos = (quotePos < spacePos ? quotePos : spacePos); // end URL at the first space or quote
        if (chopPos > 0)
        {   // chop URL
            link = link.Substring(0, chopPos);
        }
        if ((link.Length > 8) && (link.Substring(0, 7).ToLower() == "http://"))
        {   // Assumes all links beginning with http:// are _external_
            linkExternal.Add(link);
        }
        else
        {   // otherwise they're "relative"/internal links so we concatenate the base URL
            link = startingUrl + link;
            linkLocal.Add(link);
        }
    } // end looping through Matches of the 'link' pattern in the HTML data

Listing 2 - Simplest way to find links in a page

As with the first cut of page-downloading, there are a number of problems with this code. Firstly, the Regular Expression used to find the links is *very* restrictive, i.e. it will find

    <a href="news.htm">News</a>

because the href appears first in the a (or area) tag, and the URL itself is double-quoted. However that code will have trouble with a lot of valid links, including:

    <a href='news.htm'>News</a>
    <a href=news.htm>News</a>
    <a class="cssLink" href="news.htm">news</a>

It will also attempt to use 'internal page links' (beginning with #), and it assumes that any link beginning with http:// is external, without first checking the server name against the target server. Despite the bugs, when tested against tailored HTML pages this code will successfully parse the links into the linkLocal ArrayList, ready for processing - coupling that list of URLs with the code to download URLs, we can effectively 'spider' a website!
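The last two shortcomings (following '#' links, and treating every http:// link as external) can be avoided by letting System.Uri resolve each link and then comparing host names. The sketch below is one possible approach, not the code in the ZIP file; the `Classify` function and the sample URLs are hypothetical.

```csharp
using System;

// Returns null for links that should be skipped; otherwise the absolute URL and
// whether it points at a different host than the page it was found on.
(Uri Target, bool IsExternal)? Classify(Uri page, string href)
{
    if (string.IsNullOrWhiteSpace(href) || href.StartsWith("#", StringComparison.Ordinal))
        return null;                                   // in-page anchor: nothing new to download
    if (!Uri.TryCreate(page, href, out Uri target))
        return null;                                   // could not be resolved against the page URL
    if (target.Scheme != Uri.UriSchemeHttp && target.Scheme != Uri.UriSchemeHttps)
        return null;                                   // e.g. mailto: or javascript: links
    bool external = !string.Equals(target.Host, page.Host, StringComparison.OrdinalIgnoreCase);
    return (target, external);
}

var page = new Uri("http://localhost/news/index.htm");
foreach (string href in new[] { "#top", "page2.htm", "../contact.htm",
                                "http://localhost/a.htm", "http://example.com/" })
{
    var result = Classify(page, href);
    Console.WriteLine(result is null
        ? href + " -> skipped"
        : href + " -> " + result.Value.Target + " (external: " + result.Value.IsExternal + ")");
}
```

Note that "http://localhost/a.htm" is classified as local here even though it begins with http://, because the comparison is made on the host name rather than on the link's prefix.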

Download More Pages

The basic code is shown below - comments show where additional code is required, either from the listings above or in Article I.

protected

Void Page_Load

(

Object

Sender

SYSTEM

.Eventargs e

)

{

/ * The initial function call * /

StartingPageurl

=

"http:// localhost /"

;

// Get from Web.config

Parseurl

(

StartingPageurl

,

New UTF

8encoding

(

)

,

NEW WebClient

(

)

)

;

}

/ * This is Called Recursively for Every Link We Find * /

public

Void

Parseurl

(

String

URL

UTF

8encoding

ENC

, WebClient

Browser

)

{

IF

(

Visited

.Contains

(

URL

)

)

{

// URL Already Spide, Skip and Go To Next Link Response

.Write

(

"
"

URL

"Already SpideRed "

)

;

}

Else

{

// add this url to the 'visited' list, so we'll Skip it at Id Across It Again

Visited

.Ad

(

URL

)

;

String

FileContents

=

ENC

.Getstring

(

Browser

.Downloaddata

(

URL

)

)

;

// from Listing 1

// ### pseudo-code ###

// 1. Find Links in The Downloaded Page (Add to Linklocal ArrayList - Code in Listing 2)

// 2. Extract and <meta> Tag Description, Keywords (Same As Version 1 Listing 4)</p> <p>// 3. Remove All HTML and Whitespace (Same As Version 1)</p> <p>// 4. Convert Words to string Array, And Add to Catalog (Same As Version 1 Listing 7)</p> <p>// 5. If any links werefact, Recursively Call this Page</p> <p>IF</p> <p>(</p> <p>NULL</p> <p>!</p> <p>=</p> <p>PMD</p> <p>.Locallinks</p> <p>)</p> <p>Foreach</p> <p>(</p> <p>Object</p> <p>LINK</p> <p>in</p> <p>PMD</p> <p>.Locallinks</p> <p>)</p> <p>{</p> <p>Parseurl</p> <p>(Convert)</p> <p>.Tostring</p> <p>(</p> <p>LINK</p> <p>)</p> <p>,</p> <p>ENC</p> <p>,</p> <p>Browser</p> <p>)</p> <p>;</p> <p>}</p> <p>}</p> <p>}</p> <p>Listing 3 - Combining The link Parsing and page Download Code.</p> <p>Review The Three Fundamental Tasks for a search Spider, and you can see we've developed ENOUGH CODE To Build IT:</p> <p>Finding the pages to index - we can start at a specific Url and find links using Listings 2 & 3. Downloading each page successfully - we can do this using the WebClient in Listings 1 & 2. Parsing the page content and indexing it - we already have this code from Article IAlthough the example above is picky about what links it will find, it will work to 'spider' and then search a website! FYI, you can view the 'alpha version' of the code and use it in conjunction with . the other files from Article I to search the catalog The remainder of this article discusses the changes required to this code to fix the shortcomings discussed earlier; the ZIP file contains a complete set of updated code.</p> <p>FIX The Spider [Searcharoospider.aspx]</p> <p>Problem 1 - Correctly Parsing Relative Links</p> <p>The Alpha Code Fails To Follow 'Relative' AND 'Absolute' Links (Eg. "../../News/page.htm" and "/news/page2.htm" respective) Partly Because It Does Not 'Remember' What What What? folder / subdirectory it is parsing. 
My first instinct was to build a new 'Url' class which would take a page URL and a link, and encapsulate the code required to build the complete link by resolving directory traversal (eg "../" Absolute References (Eg. Starting with "/"). The code 10 NEED TO do Something Like this:</p> <p>Page URLLLINK IN PAGERESULT Should Behttp: //localhost/news/page2.htmhttp://localhost/news/page2.htmhttp://localhost/news/../localhost/contact.htmhttp: // Localhost / news // downloads / http: // localhost / downloads / etc.</p> <p>Solution: URI Class</p> <p>The first lesson to learn when you have a class library at your disposal is LOOK BEFORE YOU CODE. It was almost by accident that I stumbled across the Uri class, which has a constructor -new Uri (baseUri, relativeUri)</p> <p>- That does exactly what I need. No re-inventing the wheel!</p> <p>Problem 2 - Following Redirects</p> <p>Following relative links is made even more difficult because the WebClient class, while it enabled us to quickly get the spider up-and-running, is pretty dumb. It does not expose all the properties and methods required to properly emulate a web browser's behaviour. .. IT IS Capable of Following Redirects Issued By A Server, But It Has No Simple Interface To Communicate To The Calling Code Exactly What URL IT Ended Up Requesting.</p> <p>Solution: httpwebrequest & httpwebresponse classes</p> <p>The HttpWebRequest and httpwebresponse classes provide a MUch More Powerful Interface for http communication. Httpwebrequest Has A Number of Useful Properties, include:</p> <p>! 
AllowAutoRedirect - configurable MaximumAutomaticRedirections - redirection can be limited to prevent 'infinite loops' in naughty pages UserAgent - set to "Mozilla / 6.0 (MSIE 6.0; Windows NT 5.1; Searcharoo.NET Robot)" (see Problem 5 below) KeepAlive - efficient Use of connections timeout - configurable based on the expected performance of the target Website</p> <p>which are set in the code to help us get the pages we want HttpWebResponse has one key property - ResponseUri - that returns the final Uri that was read; for example, if we tried to access http:. // localhost / and the server issued a 302-redirect to /en/index.html then the HttpWebResponseInstance.ResponseUri would be http: //localhost/en/index.html and NOT just http:. // localhost / This is important because unless we know the URL of the CURRENT PAGE, WE Cannot Process Relative Links Correctly (See Problem 1) .problem 3 - Using The Correct Character-set by When Download Files</p> <p>Getting content-type</p> <p>Solution: httpwebresponse and the encoding namespace</p> <p>The Httpwebresponse Has Another Advantage Over WebClient: It's Easier To Access Http Server Headers Such as the contenttype and contentencoding. This enables the following code to be written:</p> <p>IF</p> <p>(</p> <p>WebResponse</p> <p>.Contentencoding</p> <p>!</p> <p>= String</p> <p>Empty</p> <p>)</p> <p>{</p> <p>// use the httpheader content-type in preference to the one set in meta</p> <p>HTMLDOC</p> <p>.Encoding</p> <p>=</p> <p>WebResponse</p> <p>.Contentencoding</p> <p>;</p> <p>}</p> <p>Else</p> <p>IF</p> <p>(</p> <p>HTMLDOC</p> <p>.Encoding</p> <p>=</p> <p>= String</p> <p>Empty</p> <p>)</p> <p>{</p> <p>// Todo: if still no encoding determined, Try to readline the stream untric we find</p> <p>// * meta content-type or * </ head> (ie. 
Stop loops for meta)</p> <p>HTMLDOC</p> <p>.Encoding</p> <p>=</p> <p>"UTF-8"</p> <p>;</p> <p>// default</p> <p>}</p> <p>//http://www.c-sharpcorner.com/code/2003/Dec/readingwebpagesources.asp system</p> <p>.Io</p> <p>.StreamReader</p> <p>Stream</p> <p>=</p> <p>New system</p> <p>.Io</p> <p>.StreamReader</p> <p>(</p> <p>WebResponse</p> <p>.GetResponsestream</p> <p>(</p> <p>)</p> <p>, ENCODING</p> <p>.Geetencoding</p> <p>(</p> <p>HTMLDOC</p> <p>.Encoding</p> <p>)</p> <p>)</p> <p>;</p> <p>HTMLDOC</p> <p>.Uri</p> <p>=</p> <p>WebResponse</p> <p>.RESPONSEURI</p> <p>;</p> <p>// We * May * Have Been Redirected ... and we want the * final * URL</p> <p>HTMLDOC</p> <p>.Length</p> <p>=</p> <p>WebResponse</p> <p>.ContentLength;</p> <p>HTMLDOC</p> <p>.All .all .all .all</p> <p>=</p> <p>Stream</p> <p>.Readtoend</p> <p>(</p> <p>)</p> <p>;</p> <p>Stream</p> <p>.Close</p> <p>(</p> <p>)</p> <p>;</p> <p>Listing 4 - Check The Http Content Encoding and Use The Correct Encoding Class to Process The Byte [] Array Returned from The Server</p> <p>Elsewhere in The Code We Use The ContentType To Paarse Out The Mime-Type of The Data, So That We Can Ignore Images, Stylesheets (AND, For this Version, Word, PDF, Zip and other file types).</p> <p>Problem 4 - Does Not Recognise Many Valid Link Formats</p> <p>When Building The Alpha Code I Implement I Could Find To Locate Links in A String - (? <= <(A | area) / s href = "). *? (? =" / S * /? >). The problem is this it is far too dumb to find the massity of links.</p> <p>Solution: Smarter Regular Expressions</p> <p>. 
Regular Expressions can be very powerful, and clearly a more complex expression was required Not being an expert in this area, I turned to Google and eventually Matt Bourne who posted a couple of very useful Regex patterns, which resulted in the following code:</p> <p>// http://msdn.microsoft.com/library/en-us/script56/HTML/JS56JSGRPREGEXPSYNTAX.ASP</p> <p>// Original Regex, Just Found <a href=""> links; and was "broky" by spaces, out-of-order, etc</p> <p>@ "(? <= <A / s href ="). *? (? = "/ s * /?>)"</p> <p>Foreach</p> <p>(Match</p> <p>Match</p> <p>in Regex</p> <p>.Matches</p> <p>(</p> <p>HTMLDATA</p> <p>, @</p> <p>"(? <anchor> </ s * (a | area) / s * (?: (?: / b / w / b / s * (?: = / s * (?:"</p> <p>"[^"</p> <p>"] *"</p> <p>"| '[^'] * '| [^"</p> <p>"<>] ) / s *)?) *)? / s *>)"</p> <p>, RegexOptions</p> <p>.IGNORECASE</p> <p>| Regexoptions</p> <p>.Explicitcapture</p> <p>)</p> <p>)</p> <p>{</p> <p>// PARSE All Attributes from With Tags ... Important when're out of Order !!</p> <p>// in addition to the 'href' attribute, there is might also be 'alt', 'class', 'style', 'Area', etc ...</p> <p>// there might also be 'spaces' between the attributes and the year be ",', or uNuoted</p> <p>LINK</p> <p>= String</p> <p>Empty</p> <p>;</p> <p>Foreach</p> <p>(Match</p> <p>Submatch</p> <p>in Regex</p> <p>.Matches</p> <p>(</p> <p>Match</p> <p>.Value</p> <p>.Tostring</p> <p>(</p> <p>)</p> <p>, @</p> <p>"(? <name> / b / w / b) / s * = / s * ("</p> <p>"(? <value> [^"</p> <p>"] *)"</p> <p>"| '(? <value> [^'] *) '| (? 
<value> [^"</p> <p>"<> / s] ) / s *) "</p> <p>, RegexOptions</p> <p>.IGNORECASE</p> <p>| Regexoptions</p> <p>.Explicitcapture</p> <p>)</p> <p>)</p> <p>{</p> <p>// We're only intended in the Href Attribute (Although IN FUTURE MAYBE INDEX THE 'Alt' / 'Title'?)</p> <p>IF</p> <p>(</p> <p>"href"</p> <p>=</p> <p>=</p> <p>Submatch</p> <p>.Groups</p> <p>[</p> <p>1</p> <p>]</p> <p>.Tostring</p> <p>(</p> <p>)</p> <p>.Tolower</p> <p>(</p> <p>)</p> <p>)</p> <p>{</p> <p>LINK</p> <p>=</p> <p>Submatch</p> <p>.Groups</p> <p>[</p> <p>2</p> <p>]</p> <p>.Tostring</p> <p>(</p> <p>)</p> <p>;</p> <p>Break</p> <p>;</p> <p>}</p> <p>}</p> <p>/ * check for interNal / External Link and supported scheme, the add to arraylist * /</p> <p>}</p> <p>// foreach</p> <p>Listing 5 - More Powerful Regex Matching</p> <p>Listing 5 Performs Three Steps:</p> <p>Match Entire Link Tags (from <to>) include. The Match.value for Each Match Could Be and of the link Samples Shown Earlier <a href='news.htm'> <a href = news. HTM> <A class="csslink" href="news.htm"> <area shape = Rect "COORDS =" 0, 0, 110, 20 "href =" news.htm "> <area href = 'news.htm' Shape = "Rect" COORDS = "0,0,110,20"> The second expression matches the key-value pairs of each attribute, so it will return: href = 'news.htm' href = news.htm class = "csslink" HREF = "news.htm" Shape = "Rect" COORDS = "0, 0, 110, 20" href = "news.htm" href = 'news.htm' shape = "Rect" Coords = "0, 0, 110, 20" WE Access The Groups Wtem for The HREF Attribute, Which Becomes a Link for US to Process.the Combination of these Two regular expressions makes the link paingsing a lot more row.</p> <p>Problem 5 - Poor meta-tag handling</p> <p>The alpha has very rudimentary META tag handling - so primative that it accidentally assumed <META NAME = "" CONTENT = ""> instead of the correct <META HTTP-EQUIV = "" CONTENT = ""> format There are two reasons to. 
Process the Meta Tags Correctly: (1) TO Get The Description and keywords for this document, and (2) read the robots tag so much Our spider behaves nicely when our spotted.</p> <p>Solution: Smarter Regular Expressions and Support for more tags</p> <p>Using a variation of the Regular Expressions from Problem 4, the code parses out the META tags as required, adds Keywords and Description to the indexed content and stores the Description for display on the Search Results page.string</p> <p>Metakey</p> <p>= String</p> <p>Empty</p> <p>,</p> <p>Metavalue</p> <p>= String</p> <p>Empty</p> <p>;</p> <p>Foreach</p> <p>(Match</p> <p>Metamatch</p> <p>in Regex</p> <p>.Matches</p> <p>(</p> <p>HTMLDATA</p> <p>, @</p> <p>"<meta / s * (?: (?: / b (/ w | -) / b / s * (?: = / s * (?:"</p> <p>"[^"</p> <p>"] *"</p> <p>"| '[^'] * '| [^"</p> <p>"<>] ) / s *)?) *) /? / s *>"</p> <p>, RegexOptions</p> <p>.IGNORECASE</p> <p>| Regexoptions</p> <p>.Explicitcapture</p> <p>)</p> <p>)</p> <p>{</p> <p>Metakey</p> <p>= String</p> <p>Empty</p> <p>;</p> <p>Metavalue</p> <p>= String</p> <p>Empty</p> <p>;</p> <p>// loop through the attribute / value pairs inside the tag</p> <p>Foreach</p> <p>(Match</p> <p>Submetamatch</p> <p>in Regex</p> <p>.Matches</p> <p>(</p> <p>Metamatch</p> <p>.Value</p> <p>.Tostring</p> <p>(</p> <p>)</p> <p>, @</p> <p>"(? <name> / b (/ w | -) / b) / s * = / s * ("</p> <p>"(? <value> [^"</p> <p>"] *)"</p> <p>"| '(? <value> [^'] *) '| (? 
<value> [^"</p> <p>"<>] ) / s *) "</p> <p>, RegexOptions</p> <p>.IGNORECASE</p> <p>| Regexoptions</p> <p>.Explicitcapture</p> <p>)</p> <p>)</p> <p>{</p> <p>IF</p> <p>(</p> <p>"http-equiv"</p> <p>=</p> <p>=</p> <p>Submetamatch</p> <p>.Groups</p> <p>[</p> <p>1</p> <p>]</p> <p>.Tostring</p> <p>(</p> <p>)</p> <p>.Tolower</p> <p>(</p> <p>)</p> <p>)</p> <p>{</p> <p>Metakey</p> <p>=</p> <p>Submetamatch</p> <p>.Groups</p> <p>[</p> <p>2</p> <p>]</p> <p>.Tostring</p> <p>(</p> <p>)</p> <p>;</p> <p>}</p> <p>IF</p> <p>(</p> <p>(</p> <p>"name"</p> <p>=</p> <p>=</p> <p>Submetamatch</p> <p>.Groups</p> <p>[</p> <p>1</p> <p>]</p> <p>.Tostring</p> <p>(</p> <p>)</p> <p>.Tolower</p> <p>(</p> <p>)</p> <p>)</p> <p>&</p> <p>&</p> <p>(</p> <p>Metakey</p> <p>=</p> <p>= String</p> <p>Empty</p> <p>)</p> <p>)</p> <p>{</p> <p>// if it's already set, http-equiv takes precedence</p> <p>Metakey</p> <p>=</p> <p>Submetamatch</p> <p>.Groups</p> <p>[</p> <p>2</p> <p>]</p> <p>.Tostring</p> <p>(</p> <p>)</p> <p>;</p> <p>}</p> <p>IF</p> <p>(</p> <p>"Content"</p> <p>=</p> <p>=</p> <p>Submetamatch</p> <p>.Groups</p> <p>[</p> <p>1</p> <p>]</p> <p>.Tostring</p> <p>(</p> <p>)</p> <p>.Tolower</p> <p>(</p> <p>)</p> <p>)</p> <p>{</p> <p>Metavalue</p> <p>=</p> <p>Submetamatch</p> <p>.Groups</p> <p>[</p> <p>2</p> <p>]</p> <p>.Tostring</p> <p>(</p> <p>)</p> <p>;</p> <p>}</p> <p>}</p> <p>Switch</p> <p>(</p> <p>Metakey</p> <p>.Tolower</p> <p>(</p> <p>)</p> <p>)</p> <p>{</p> <p>Case</p> <p>"description"</p> <p>:</p> <p>HTMLDOC</p> <p>.Description</p> <p>=</p> <p>Metavalue</p> <p>;</p> <p>Break</p> <p>;</p> <p>Case</p> <p>"keywords"</p> <p>:</p> <p>Case</p> <p>"keyword"</p> <p>:</p> <p>HTMLDOC</p> <p>Keywords</p> <p>=</p> <p>Metavalue</p> <p>;</p> <p>Break</p> <p>;</p> <p>Case</p> <p>"robots"</p> <p>:</p> <p>Case</p> <p>"robot"</p> <p>:</p> <p>HTMLDOC</p> <p>.Setrobotdirective</p> <p>(</p> <p>Metavalue</p> <p>)</p> <p>;</p> <p>Break</p> <p>;</p> <p>}</p> <p>}</p> <p>Listing 6 - Parsing Meta Tags Is A Two Step Process, 
Because We Have to Check The 'Name / Http-Equiv' So That We know What the Content Relate To!</p> <p>It also obeys the ROBOTS NOINDEX and NOFOLLOW directives if they appear in the META tags (you can read more about the Robot Exclusion Protocol as it relates to META tags; note that we have not implemented support for the robots.txt file which sites in the root of a website - perhaps in version 3) We also set our User-Agent (Solution 2) to indicate that we are a Robot so that the web log of any site we spider will clearly differentiate our requests from regular browsers; it!. Also Enables U.</p> <p>Spidering the web!</p> <p>When you load the SearcharooSpider.aspx page it immediately begins spidering, starting with either (a) the root document in the folder where the file is located, OR (b) the location specified in web.config (if it exists).</p> <p>Screenshot 1 - The title of each page is displayed as it is space Factbook as test data</p> <p>Once the catalog is Built, You are ready to search.</p> <p>Performing the search</p> <p>All The Hard Work Was Done in Article 1 - This Code Is Repeated for Your Information ...</p> <p>/// <summary> Returns all the files which contain the searchword </ summary></p> <p>/// <Returns> Hashtable </ returns> Public HashTable Search</p> <p>(</p> <p>String</p> <p>Searchword</p> <p>)</p> <p>{</p> <p>// Apply the Same 'Trim' As When We're Building The Catalog</p> <p>Searchword</p> <p>=</p> <p>Searchword</p> <p>RIM</p> <p>(</p> <p>'?'</p> <p>,</p> <p>'/ "</p> <p>,</p> <p>','</p> <p>,</p> <p>'/' '</p> <p>,</p> <p>';'; '</p> <p>,</p> <p>':'</p> <p>,</p> <p>'.'. '</p> <p>,</p> <p>'</p> <p>,</p> <p>')'</p> <p>)</p> <p>.Tolower</p> <p>(</p> <p>)</p> <p>;</p> <p>Hashtable</p> <p>RetVal</p> <p>=</p> <p>NULL</p> <p>;</p> <p>IF</p> <p>(</p> <p>Index</p> <p>.Containskey</p> <p>(</p> <p>Searchword</p> <p>)</p> <p>)</p> <p>{</p> <p>// does all the work !!! 
Word</p> <p>Thematch</p> <p>=</p> <p>(Word</p> <p>)</p> <p>Index</p> <p>[</p> <p>Searchword</p> <p>]</p> <p>;</p> <p>RetVal</p> <p>=</p> <p>Thematch</p> <p>.Infiles</p> <p>(</p> <p>)</p> <p>;</p> <p>// Return the Collection of File Objects</p> <p>}</p> <p>Return</p> <p>RetVal</p> <p>;</p> <p>}</p> <p>Article 1 Listing 8 - The Search Method of The Catalog Object</p> <p>We have not modified any of the Search objects in the diagram at the start of this article, in an effort to show how data encapsulation allows you to change both the way you collect data (ie. From filesystem crawling to website spidering) and the way you present data (ie. updating the search results page) without affecting your data tier. In article 3 we'll examine if it's possible to convert the Search objects to use a database back-end without affecting the collection and presentation classes ...</p> <p>Improving the results [SearcharoOTOO.ASPX]</p> <p>THESE Are The Changes We Will Make To The Results Page:</p> <p>ENABLE SEARCHING for more Than One Word and Requiring All Terms To Appear in The Resulting Document Matches (Boolean and Search) Improved Formatting, Impruted:</p> <p>Pre-Filled Search Box on The Results Page Document Count For Each Term in The Query, and Link to View Those Results Time Taken To Perform Query</p> <p>The first change to support searching on muliple terms is to 'parse' the query typed by the user This means:. Trimming whitespace from around the query, and compressing whitespace between the query terms We then Split the query into an Array [] of. 
Words and Trim Any Puncture from Around Each Term.Searchterm</p> <p>= Request</p> <p>.QueryString</p> <p>[</p> <p>"SearchFor"</p> <p>]</p> <p>.Tostring</p> <p>(</p> <p>)</p> <p>RIM</p> <p>(</p> <p>'' '</p> <p>)</p> <p>;</p> <p>Regex R</p> <p>=</p> <p>New Regex</p> <p>(@</p> <p>"/ s "</p> <p>)</p> <p>;</p> <p>// Remove All Whitespace</p> <p>Searchterm</p> <p>= r</p> <p>.Replace</p> <p>(</p> <p>Searchterm</p> <p>,</p> <p>""</p> <p>)</p> <p>;</p> <p>// TO a Single Space</p> <p>Searchterma</p> <p>=</p> <p>Searchterm</p> <p>.Split</p> <p>(</p> <p>'' '</p> <p>)</p> <p>;</p> <p>// THEN SPLIT</p> <p>for</p> <p>(</p> <p>INT i</p> <p>=</p> <p>0</p> <p>i <</p> <p>Searchterma</p> <p>.Length</p> <p>i</p> <p> </p> <p> </p> <p>)</p> <p>{</p> <p>// Array of Search Terms</p> <p>Searchterma</p> <p>[i</p> <p>]</p> <p>=</p> <p>Searchterma</p> <p>[i</p> <p>]</p> <p>RIM</p> <p>(</p> <p>'' '</p> <p>,</p> <p>'?'</p> <p>,</p> <p>'/ "</p> <p>,</p> <p>','</p> <p>,</p> <p>'/' '</p> <p>,</p> <p>';'; '</p> <p>,</p> <p>':'</p> <p>,</p> <p>'.'. '</p> <p>,</p> <p>'</p> <p>,</p> <p>')'</p> <p>)</p> <p>.Tolower</p> <p>(</p> <p>)</p> <p>;</p> <p>// Get Trimmed Individally</p> <p>}</p> <p>Listing 7 - The Search Method of The Catalog Object</p> <p>Now that we have an Array of the individual search terms, we will find ALL the documents matching each individual term. This is done using the same m_catalog.Search () method from Article I. 
After each search we check if any results were returned, And Store The SearchResultsarrayArray to Process Further.</p> <p>// array of arrays of results match one of the search criteria hashtable</p> <p>[</p> <p>]</p> <p>SearchResultsarrayArray</p> <p>=</p> <p>New hashtable</p> <p>[</p> <p>Searchterma</p> <p>.Length</p> <p>]</p> <p>;</p> <p>// FinalResultsArray Is Populated with Pages That * Match * All The Search Criteria HybridDictionary</p> <p>FinalResultsArray</p> <p>=</p> <p>New hybriddictionary</p> <p>(</p> <p>)</p> <p>;</p> <p>// html output stringstring</p> <p>Matches</p> <p>=</p> <p>""</p> <p>;</p> <p>Bool</p> <p>BothertofindMatches</p> <p>=</p> <p>True</p> <p>;</p> <p>int</p> <p>IndexOfshortestResultset</p> <p>=</p> <p>-</p> <p>1</p> <p>,</p> <p>LengthofshortestResultset</p> <p>=</p> <p>-</p> <p>1</p> <p>;</p> <p>for</p> <p>(</p> <p>INT i</p> <p>=</p> <p>0</p> <p>i <</p> <p>Searchterma</p> <p>.Length</p> <p>i</p> <p> </p> <p> </p> <p>)</p> <p>{</p> <p>SearchResultsarrayArray</p> <p>[i</p> <p>]</p> <p>=</p> <p>M_catalog</p> <p>.Search</p> <p>(</p> <p>Searchterma</p> <p>[i</p> <p>]</p> <p>.Tostring</p> <p>(</p> <p>)</p> <p>)</p> <p>;</p> <p>// ##### THE Search #####</p> <p>IF</p> <p>(</p> <p>NULL</p> <p>=</p> <p>=</p> <p>SearchResultsarrayArray</p> <p>[i</p> <p>]</p> <p>)</p> <p>{</p> <p>Matches</p> <p> </p> <p>=</p> <p>Searchterma</p> <p>[i</p> <p>]</p> <p> </p> <p><font color = gray style = 'font-size: xx-small> (not found </ font> "</p> <p>;</p> <p>BothertofindMatches</p> <p>=</p> <p>False</p> <p>;</p> <p>// if * any one * of the Terms isn't Found, There Won't be a 'set' of matches</p> <p>}</p> <p>Else</p> <p>{</p> <p>int</p> <p>Resultsinthisset</p> <p>=</p> <p>SearchResultsarrayArray</p> <p>[i</p> <p>]</p> <p>.Count</p> <p>;</p> <p>Matches</p> <p> </p> <p>=</p> <p>"<a href = /"? 
Searchfor = "</p> <p> </p> <p>Searchterma</p> <p>[i</p> <p>]</p> <p> </p> <p>"/"> "</p> <p> </p> <p>Searchterma</p> <p>[i</p> <p>]</p> <p> </p> <p>"</a> <font color = gray style = 'font-size: xx-small'> ("</p> <p> </p> <p>Resultsinthisset</p> <p> </p> <p>") </ font>"</p> <p>;</p> <p>IF</p> <p>(</p> <p>(</p> <p>LengthofshortestResultset</p> <p>=</p> <p>=</p> <p>-</p> <p>1</p> <p>)</p> <p>|</p> <p>|</p> <p>(</p> <p>LengthofshortestResultSet></p> <p>Resultsinthisset</p> <p>)</p> <p>)</p> <p>{</p> <p>IndexOfshortestResultset</p> <p>= I</p> <p>;</p> <p>LengthofshortestResultset</p> <p>=</p> <p>Resultsinthisset</p> <p>;</p> <p>}</p> <p>}</p> <p>}</p> <p>Listing 8 - Find The Results for Each of The Terms Individally</p> <p>Describing how we find the documents that match ALL words in the query is easiest with an example, so imagine we're searching for the query "snow cold weather" in the CIA World FactBook. Listing 8 found the Array of documents matching each word, and placed them inside another Array. "snow" has 10 matching documents, "cold" has 43 matching documents and "weather" has 22 matching documents.Obviously the maximum possible number the of overall matches is 10 (the smallest result set), and minimum is zero -. maybe there are NO documents that appear in all three collections Both of these possibilities catered for - indexOfShortestResultSet remembers which word had fewest results and botherToFindMatches is set to false if any word fails to get a single match.</p> <p>Diagram 1 - Finding The Intersection of The Result Sets for Each Word Involves Traversing The 'Array Of Arrays'</p> <p>Listing 9 shows how we approached this problem. It may not be the most efficient way to do it, but it works! 
Basically we choose the smallest resultset and loop through its matching Files, looping through the SearchResultsArrayArray (counter 'cx') looking for That Same File in All The Other Results.</p> <p>Imagine, Referring to The Diagram Above, That We Begin with [0] [0] FILE D (We Start with Index [0] "Snow" Because IT's The Smallst Set, Not Just Because it's item 0). The loop bellow Start Checking All The Other Files To See IT Finds D Again ... But n't start in set [0] BECAUSE WE Already Know That D Is UNIQUE IN this set. "IF (cx == c)" Checks That Condition and prevents looping through resultset [0].</p> <p>Counter 'CX' Will Be Incremented To 1, And The Loop Will Begin Examing Items [1] [0], [1] [1], [1] [2], [1] [3], [1] [4 (FILES G, E, S, H, K, D) But 'ney (fo.key = fox.key) "Won't match because we are still search for matches to file [0] [0] D. However , on the next orthopes, sowe ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, ? file exists in both sets I chose a very simple solution - count the number of sets we're comparing totalcount - and keep adding to the matchcount when we find the file in a set We can then safely break out of that loop (knowing. That the file is unique within a resultset, and we would noting the next resultset.</p> <p>After the looping has completed, "if (matchcount == totalcount)" then we know this file exists in ALL the sets, and can be added to the FinalResultsArray, which is what we'll use to show the results page to the user.</p> <p>The looping will continue with 'cx' incremented to 2, and the "weather" matches will be checked for file D. It is found at position [2] [2] and the matchcount will be adjusted accordingly. 
The whole looping process will then begin again in the "snow" matches with [0][1] file G, and all the other files will again be checked against this one to see if it exists in all sets.</p>
<p>After a LOT of looping, the code will discover that only files D and G exist in all three sets, and the finalResultsArray will have just two elements, which it passes to the same display code from Listings 10-13 in Article I.</p>
<p>// Find the common files from the array of arrays of documents</p>
<p>// matching one of the criteria</p>
<p>if (botherToFindMatches) { // all words have *some* matches</p>
<p>int c = indexOfShortestResultSet; // loop through the *shortest* resultset</p>
<p>Hashtable searchResultsArray = searchResultsArrayArray[c];</p>
<p>if (null != searchResultsArray)</p>
<p>foreach (object foundInFile in searchResultsArray) { // for each file in the *shortest* result set</p>
<p>DictionaryEntry fo = (DictionaryEntry)foundInFile; // find matching files in the other resultsets</p>
<p>int matchcount = 0, totalcount = 0, weight = 0;</p>
<p>for (int cx = 0; cx < searchResultsArrayArray.Length; cx++) {</p>
<p>totalcount += (cx + 1); // keep track, so we can compare at the end (if term is in all results)</p>
<p>if (cx == c) { // current resultset</p>
<p>matchcount += (cx + 1); // implicitly matches in the current resultset</p>
<p>weight += (int)fo.Value; // sum the weighting</p>
<p>} else {</p>
<p>Hashtable searchResultsArrayx = searchResultsArrayArray[cx];</p>
<p>if (null != searchResultsArrayx)</p>
<p>foreach (object foundInFilex in searchResultsArrayx) { // for each file in the result set</p>
<p>DictionaryEntry fox = (DictionaryEntry)foundInFilex;</p>
<p>if (fo.Key == fox.Key) { // see if it matches</p>
<p>matchcount += (cx + 1); // and if it matches, track the matchcount</p>
<p>weight += (int)fox.Value; // and weighting; then break out of loop, since</p>
<p>break; // no need to keep looping</p>
<p>}</p>
<p>} // foreach</p>
<p>} // if</p>
<p>} // for</p>
<p>if ((matchcount > 0) && (matchcount == totalcount)) { // was matched in each Array</p>
<p>fo.Value = weight; // set the 'weight' in the combined results to the sum of individual document matches</p>
<p>if (!finalResultsArray.Contains(fo.Key)) finalResultsArray.Add(fo.Key, fo);</p>
<p>} // if</p>
<p>} // foreach</p>
<p>} // if</p>
<p>Listing 9 - Finding the sub-set of documents that contain every word in the query.
There are three nested loops in there - I never said this was efficient!</p>
<p>The algorithm described above is performing a boolean AND query on all the words in the query, i.e. the example is searching for "snow AND cold AND weather". If we wished to build an OR query, we could simply loop through all the files and filter out duplicates. OR queries aren't that useful unless you can combine them with AND clauses - but this is not supported in Version 2!</p>
<p>BTW, the variables in that code which I've called "Array" for simplicity are actually either Hashtables or HybridDictionaries. Don't be confused when you look at the code - there were good reasons why each Collection class was chosen (mainly that I didn't know in advance the final number of items, so using an Array was too hard).</p>
<p>The finished result</p>
<p>Screenshot 2 - The search input page has minor changes, including the filename change to SearcharooToo.aspx!</p>
<p>Screenshot 3 - You can refine your search, see the number of matches for each search term, view the time taken to perform the search and, most importantly, see the documents containing all the words in your query!</p>
<p>Using the sample code</p>
<p>The goal of this article was to build a simple search engine that you can install just by placing some files on your website; so you can copy Searcharoo.cs, SearcharooSpider.aspx and SearcharooToo.aspx to your web root and away you go! However ...</p>
<p>To change those defaults you need to add some settings to web.config:</p>
<p><appSettings></p>
<p><add key="searcharoo_virtualroot" value="http://localhost/" /> <!-- website to spider --></p>
<p><add key="searcharoo_requesttimeout" value="5" /> <!-- 5 second timeout when downloading --></p>
<p><add key="searcharoo_recursionlimit" value="200" /> <!-- max pages to index --></p>
<p></appSettings></p>
<p>Listing 14 - web.config</p>
<p>Then simply navigate to</p>
<p>http://localhost/SearcharooToo.aspx (or wherever you put the Searcharoo files) and it will build the catalog.</p>
<p>If your application re-starts for any reason (i.e. you compile code into the /bin/ folder, or change web.config settings) the catalog will need to be rebuilt - the next user who performs a search will trigger the catalog build. ...</p>
<p>Future</p>
<p>SearcharooSpider.aspx greatly increases the utility of Searcharoo, because you can now index your static and dynamic (e.g. database generated) pages to allow visitors to search your site. That means you could use it with products like Microsoft Content Management Server (CMS), which does not expose its content database directly.</p>
<p>The two remaining (major) problems with Searcharoo are:</p>
<p>(a) it cannot persist the catalog to disk or a database - meaning that a lot of memory is used to store the catalog, and</p>
<p>(b) most websites contain more than just HTML pages; they also link to Microsoft Word or other Office files, Adobe Acrobat (PDF) files and other forms of content which Searcharoo currently cannot 'understand' (i.e. parse and catalog).</p>
<p>The next articles in this series will (hopefully) examine these two problems in more detail ...</p>
<p>Glossary</p>
<p>Term - Meaning</p>
<p>HTML - Hyper Text Markup Language</p>
<p>HTTP - Hyper Text Transfer Protocol</p>
<p>URL - Uniform Resource Locator</p>
<p>URI - Uniform Resource Identifier</p>
<p>DOM - Document Object Model</p>
<p>302 Redirect - The HTTP status code that tells a browser to redirect to a different URL/page.</p>
<p>UTF-8 - Unicode Transformation Format - 8 bit</p>
<p>MIME Type - Multipurpose Internet Mail Extensions</p>
<p>Spider - Program that goes from webpage to webpage by finding and following links in the HTML: visualize a spider crawling on a web :)</p>
<p>Crawler - Although the terms 'spider' and 'crawler' are often used interchangeably, we'll use 'crawler' to refer to a program that locates target pages on a filesystem or
external 'list'; whereas a 'spider' will find other pages via embedded links.</p>
<p>Shift_JIS, GB2312 - Character sets ...</p>
<p>Search Engine Glossary</p>
<p>Postscript: What about code-behind and Visual-Studio.NET? (from Article I)</p>
<p>You'll notice the two ASPX pages use the src="Searcharoo.cs" @Page attribute to share the common object model without compiling to an assembly, with the page code written inline using <script runat="server"> tags (similar to ASP 3.0).</p>
<p>The advantage of this approach is that you can place these three files in any ASP.NET website and they'll 'just work'. There are no other dependencies (although they work better if you set some web.config settings) and no DLLs to worry about.</p>
<p>However, this also means these pages cannot be edited in Visual-Studio.NET, because it does not support the @Page src="" attribute, instead preferring the codebehind="" attribute coupled with CS files compiled to the /bin/ directory. To get these pages working in Visual-Studio.NET you'll have to set up a Project and add the CS file and the two ASPX files (you can move the <script> code into the code-behind if you like), then compile.</p>
<p>Links</p>
<p>Code for this article [zip 24kb]</p>
<p>Article I - which describes the data model and initial implementation</p>
<p>Working with Single-File Web Forms Pages in Visual Studio .NET (to help those wanting to use Visual-Studio)</p></div><div class="text-center mt-3 text-grey"> When reposting, please credit the original address: https://www.9cbs.com/read-124772.html</div></div></div><div class="card card-postlist border-white shadow"><div class="card-body"><div class="card-title"><div class="d-flex justify-content-between"><div><b>New Post</b>(<span class="posts">0</span>) 
</div><div></div></div></div><ul class="postlist list-unstyled"> </ul></div></div></div></div></div></div></body></html>