DotLucene: Full-Text Search in 37 Lines of Code

xiaoxiao, 2021-03-06

DotLucene is a powerful open-source search engine, ported to .NET (C#) from Apache's Lucene (Java) project.

DotLucene is highly efficient and supports ranked search results, hit highlighting, searching unstructured data, and localization. It is also compatible with Lucene's index format, so you can move between platforms without losing any index data.

This article shows how to implement full-text search with DotLucene in just a few lines of code.

This article is translated from Dan Letecky's CodeProject article "DotLucene: Full-Text Search for Your Intranet or Website Using 37 Lines of Code". Copyright belongs to the original author.

Translator: Samuel Chen

Download the source code for this article (363 KB)

DotLucene online demo

Download the source code with a prebuilt index and HTML documents

DotLucene: An Excellent Full-Text Search Engine

Is it possible to write a full-text search in 37 lines of code? Well, I'm going to use a few tricks and let DotLucene do the hard work.

DotLucene is a .NET port of the Jakarta Lucene search engine, maintained by George Aroush et al. Here are some of its features:

- It can be used in ASP.NET, WinForms, or console applications
- Very good performance
- Ranked search results
- Query-term highlighting in search results
- Searches both structured and unstructured data
- Metadata searching (date queries, searching a specific field...)
- The index is roughly 30% of the size of the indexed text
- It can also store the full indexed documents
- Pure managed .NET code in a single assembly (244 KB)
- Very friendly license (Apache Software License 2.0)
- Localization (support for Brazilian Portuguese, Czech, Chinese, Dutch, English, French, Japanese, Korean, and Russian)
- Extensible (source code included)

Note

Don't take the line count too seriously. I will show you the core functionality in no more than 37 lines of code, but turning it into a real, practical application will take more time...

Demo project

Here we will build a simple demo project that does the following:

- Indexes the HTML files found in a specified directory (including subdirectories)
- Uses an ASP.NET application to search the index
- Highlights the query terms in the search results
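The first step needs every HTML file under the chosen directory, subdirectories included. The following is a minimal sketch of that directory walk using only System.IO; the class name HtmlFileFinder and the choice of the .htm and .html extensions are assumptions for illustration, not part of the original project. Each returned path would then be handed to the document-adding method from lines 2-12.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper: collects the HTML files the demo would index.
public static class HtmlFileFinder
{
    // Returns every .htm/.html file under root, including subdirectories,
    // sorted so that the indexing order is deterministic.
    public static List<string> FindHtmlFiles(string root)
    {
        return Directory
            .EnumerateFiles(root, "*.*", SearchOption.AllDirectories)
            .Where(file =>
            {
                // Filter on the real extension instead of a "*.htm" search
                // pattern, which on Windows can also match other extensions.
                string ext = Path.GetExtension(file).ToLowerInvariant();
                return ext == ".htm" || ext == ".html";
            })
            .OrderBy(file => file, StringComparer.Ordinal)
            .ToList();
    }
}
```

Filtering by extension after enumeration keeps the behavior the same on every platform, at the cost of listing non-HTML files too.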

DotLucene has much more potential. In a real application you will probably also want to:

- Add documents to the index as files are added to the directory, without rebuilding the entire index
- Index a variety of file types; DotLucene can index any file type that can be converted to plain text

Why not use Microsoft Indexing Server?

If you prefer Indexing Service, no problem. However, DotLucene offers several advantages:

- DotLucene is 100% managed code in a single assembly with no dependencies.
- You can use it on shared hosting (if you build the index in advance).
- You can index any type of data from any source (database, website...) in any format (email, XML, HTML files...). That is because you supply plain text to the indexer; loading and parsing are up to you.
- You choose which properties ("fields") to include in the index, so you can search by those fields (for example, author, date, or keywords).
- It is open source.
- It is easy to extend.

Line 1: Create the index

The following code creates a new index from scratch. The directory parameter is the path of the directory where the index is stored.

// true = create the index from scratch (overwriting any existing index)
IndexWriter writer = new IndexWriter(directory, new StandardAnalyzer(), true);

In this example we always create the index from scratch, but that is not required: you can also open an existing index and add documents to it. You can update an existing document by deleting it and then adding the new version. (Translator's note: "adding the new version" here means creating a new Document object.)
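To make the delete-then-add idea concrete, here is a toy in-memory inverted index. It is emphatically not DotLucene and does not use its API; the class ToyIndex, its whitespace/punctuation tokenizer, and the choice of the file path as the document key are all illustrative assumptions. It only shows why an update is modeled as removing the old document's terms and indexing the new text.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy in-memory inverted index, NOT DotLucene. It only illustrates that
// "updating" a document means deleting the old version and adding the new one.
public class ToyIndex
{
    // term -> paths of the documents containing that term
    private readonly Dictionary<string, HashSet<string>> postings =
        new Dictionary<string, HashSet<string>>();

    // path -> terms indexed for that document (needed to delete it later)
    private readonly Dictionary<string, HashSet<string>> docTerms =
        new Dictionary<string, HashSet<string>>();

    private static readonly char[] Separators =
        { ' ', '\t', '\r', '\n', '.', ',', ';', ':', '!', '?' };

    private static IEnumerable<string> Tokenize(string text) =>
        text.ToLowerInvariant().Split(Separators, StringSplitOptions.RemoveEmptyEntries);

    public void Add(string path, string text)
    {
        if (docTerms.ContainsKey(path))
            throw new InvalidOperationException("Already indexed, use Update: " + path);

        var terms = new HashSet<string>(Tokenize(text));
        docTerms[path] = terms;
        foreach (string term in terms)
        {
            if (!postings.TryGetValue(term, out HashSet<string> paths))
            {
                paths = new HashSet<string>();
                postings[term] = paths;
            }
            paths.Add(path);
        }
    }

    public void Delete(string path)
    {
        if (!docTerms.TryGetValue(path, out HashSet<string> terms)) return;
        foreach (string term in terms)
        {
            postings[term].Remove(path);
            if (postings[term].Count == 0) postings.Remove(term);
        }
        docTerms.Remove(path);
    }

    // An update is a delete followed by an add of the new version.
    public void Update(string path, string newText)
    {
        Delete(path);
        Add(path, newText);
    }

    // Paths of the documents containing the term, in a stable order.
    public List<string> Search(string term) =>
        postings.TryGetValue(term.ToLowerInvariant(), out HashSet<string> paths)
            ? paths.OrderBy(p => p, StringComparer.Ordinal).ToList()
            : new List<string>();
}
```

After `Update("a.html", "Goodbye world")` on a document that used to say "Hello world", a search for "hello" no longer finds it, while "world" still does.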

Lines 2-12: Add a document

For each HTML document, we add two fields to the index:

- "text" field: the text content of the HTML file (with the tags removed). The text itself is not stored in the index.
- "path" field: the file path. It is indexed as a single term and stored in full in the index.

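// requires: using System.IO; using Lucene.Net.Documents;
public void AddHtmlDocument(string path)
{
    Document doc = new Document();

    string rawText;
    using (StreamReader sr = new StreamReader(path, System.Text.Encoding.Default))
    {
        // ParseHtml strips the HTML tags (this helper is not shown in the article)
        rawText = ParseHtml(sr.ReadToEnd());
    }

    // "text": tokenized and indexed, but not stored
    doc.Add(Field.UnStored("text", rawText));
    // "path": indexed as one term and stored, so results can display it
    doc.Add(Field.Keyword("path", path));

    writer.AddDocument(doc);
}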
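The listing calls a ParseHtml helper that the article never shows. Below is a hypothetical stand-in, assuming a crude regex approach is good enough for a demo: it drops script and style blocks, replaces tags with spaces, decodes entities with WebUtility.HtmlDecode, and collapses whitespace. The class name HtmlText is an assumption, and a real HTML parser would be more robust.

```csharp
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical ParseHtml: a crude tag stripper, not a real HTML parser.
public static class HtmlText
{
    // <script>...</script> and <style>...</style> blocks, across line breaks
    private static readonly Regex ScriptOrStyle = new Regex(
        @"<(script|style)\b[^>]*>.*?</\1\s*>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // any remaining tag
    private static readonly Regex Tag = new Regex(@"<[^>]+>");

    private static readonly Regex Whitespace = new Regex(@"\s+");

    public static string ParseHtml(string html)
    {
        // Drop script/style content entirely: it is not page text.
        string text = ScriptOrStyle.Replace(html, " ");
        // Replace tags with a space so words on either side do not merge.
        text = Tag.Replace(text, " ");
        // Decode entities such as &amp; and &lt; after the tags are gone.
        text = WebUtility.HtmlDecode(text);
        // Collapse runs of whitespace into single spaces.
        return Whitespace.Replace(text, " ").Trim();
    }
}
```

Replacing tags with a space rather than an empty string matters: `<p>Fish</p><p>now</p>` should yield "Fish now", not "Fishnow", or the indexer would see a single nonsense word.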

Lines 13-14: Optimize and save the index

After adding the documents, you need to close the index writer. Optimizing the index before closing it improves search performance.

// merge the index segments to speed up searching
writer.Optimize();
// flush everything to disk and release the index
writer.Close();

Line 15: Open the index for searching

Before searching, you need to open the index. The directory parameter is the path of the directory where the index is stored.

IndexSearcher searcher = new IndexSearcher(directory);

Lines 16-27: Search

Now we parse the query ("text" is the default search field):

// q is the user's query string; terms without a field go to "text"
Query query = QueryParser.Parse(q, "text", new StandardAnalyzer());
Hits hits = searcher.Search(query);
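What "default search field" means can be shown with a deliberately tiny parser. This is not DotLucene's QueryParser (which also handles operators, phrases, and full analysis); ToyQueryParser and its behavior are assumptions for illustration only. It follows the Lucene convention that a bare term searches the default field while `field:term` targets a named field, and it lowercases terms roughly as StandardAnalyzer would.

```csharp
using System;
using System.Collections.Generic;

// Toy illustration, NOT DotLucene's QueryParser: splits the query on spaces,
// sends bare terms to the default field and "field:term" to the named field.
public static class ToyQueryParser
{
    public static List<(string Field, string Term)> Parse(string q, string defaultField)
    {
        var clauses = new List<(string Field, string Term)>();
        foreach (string token in q.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
        {
            int colon = token.IndexOf(':');
            if (colon > 0 && colon < token.Length - 1)
            {
                // explicit field, e.g. path:index.html
                clauses.Add((token.Substring(0, colon),
                             token.Substring(colon + 1).ToLowerInvariant()));
            }
            else
            {
                // no field given: search the default field
                clauses.Add((defaultField, token.ToLowerInvariant()));
            }
        }
        return clauses;
    }
}
```

For example, `Parse("DotLucene path:index.html", "text")` yields two clauses: ("text", "dotlucene") and ("path", "index.html").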

The variable hits is a collection of the matching documents; we will use it to store the results in a DataTable:

DataTable dt = new DataTable();
dt.Columns.Add("path", typeof(string));
dt.Columns.Add("sample", typeof(string));

for (int i = 0; i < hits.Length(); i++)
{
    // get the document from the index
    Document doc = hits.Doc(i);

    // get the document's file name
    // (we can't get the text from the index because we didn't store it there)
    DataRow row = dt.NewRow();
    row["path"] = doc.Get("path");

    dt.Rows.Add(row);
}
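The comment in the loop is the practical consequence of the two field kinds chosen in lines 2-12. The toy model below (not the DotLucene API; ToyFieldIndex and its methods are hypothetical) mimics them: the "text" value is tokenized into the index and then thrown away, while the "path" value is indexed as one untokenized term and also kept as a stored field, so only the path can be read back from a hit.

```csharp
using System;
using System.Collections.Generic;

// Toy model, NOT the DotLucene API, of the two field kinds used here:
//   "text": tokenized and indexed, but its value is not stored
//   "path": indexed as a single untokenized term and stored verbatim
public class ToyFieldIndex
{
    private readonly Dictionary<string, List<int>> textPostings =
        new Dictionary<string, List<int>>();
    private readonly Dictionary<string, List<int>> pathPostings =
        new Dictionary<string, List<int>>();
    // Stored fields per document id; only "path" is ever put here.
    private readonly List<Dictionary<string, string>> stored =
        new List<Dictionary<string, string>>();

    private static void Post(Dictionary<string, List<int>> postings, string term, int id)
    {
        if (!postings.TryGetValue(term, out List<int> ids))
        {
            ids = new List<int>();
            postings[term] = ids;
        }
        if (!ids.Contains(id)) ids.Add(id);
    }

    public int AddDocument(string text, string path)
    {
        int id = stored.Count;
        // Unstored field: every word becomes searchable, the text is discarded.
        foreach (string word in text.ToLowerInvariant()
                     .Split(new[] { ' ', '.', ',' }, StringSplitOptions.RemoveEmptyEntries))
            Post(textPostings, word, id);
        // Keyword field: the whole path is one term, and it is also stored.
        Post(pathPostings, path, id);
        stored.Add(new Dictionary<string, string> { ["path"] = path });
        return id;
    }

    public List<int> SearchText(string word) =>
        textPostings.TryGetValue(word.ToLowerInvariant(), out List<int> ids)
            ? new List<int>(ids) : new List<int>();

    public List<int> SearchPath(string path) =>
        pathPostings.TryGetValue(path, out List<int> ids)
            ? new List<int>(ids) : new List<int>();

    // Returns a stored field's value, or null if that field was not stored.
    public string Get(int id, string field) =>
        stored[id].TryGetValue(field, out string value) ? value : null;
}
```

In this model, searching the text for "search" finds the document, yet `Get(id, "text")` returns null; that is why the highlighting step has to reload the original file from disk.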

Lines 28-37: Query highlighting

We first create a highlighter object and have it mark the query terms in bold (<B>term</B>):

QueryHighlightExtractor highlighter = new QueryHighlightExtractor(query, new StandardAnalyzer(), "<B>", "</B>");

While looping through the results, we load the original text and extract the fragments that best match the query.

for (int i = 0; i < hits.Length(); i++)
{
    // ...
    string plainText;
    using (StreamReader sr = new StreamReader(doc.Get("path"), System.Text.Encoding.Default))
    {
        plainText = ParseHtml(sr.ReadToEnd());
    }
    // at most 2 fragments of about 80 characters each, separated by "..."
    row["sample"] = highlighter.GetBestFragments(plainText, 80, 2, "...");
    // ...
}
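The call above presumably asks for at most 2 fragments of roughly 80 characters, joined by "...". To show the general idea, here is a simplified best-fragment highlighter. It is not the algorithm inside QueryHighlightExtractor; ToyHighlighter, its word-packing strategy, and its count-the-hits scoring are assumptions chosen for clarity, and it takes an already lowercased set of query terms instead of a Query object.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Simplified sketch of "best fragment" highlighting. This is NOT the
// algorithm inside DotLucene's QueryHighlightExtractor.
public static class ToyHighlighter
{
    public static string GetBestFragments(string text, ISet<string> queryTerms,
        int fragmentSize, int maxFragments, string separator)
    {
        // 1. Pack whole words into fragments of at most ~fragmentSize characters.
        var fragments = new List<List<string>>();
        var current = new List<string>();
        int length = 0;
        foreach (string word in text.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
        {
            if (current.Count > 0 && length + 1 + word.Length > fragmentSize)
            {
                fragments.Add(current);
                current = new List<string>();
                length = 0;
            }
            length += (current.Count > 0 ? 1 : 0) + word.Length;
            current.Add(word);
        }
        if (current.Count > 0) fragments.Add(current);

        // 2. Score each fragment by its number of query-term occurrences.
        bool IsHit(string w) => queryTerms.Contains(Normalize(w));
        var best = fragments
            .Select((words, index) => (words, index, score: words.Count(IsHit)))
            .Where(f => f.score > 0)
            .OrderByDescending(f => f.score).ThenBy(f => f.index)
            .Take(maxFragments)
            .OrderBy(f => f.index);   // show the chosen fragments in document order

        // 3. Wrap hits in <B>...</B> and join the fragments with the separator.
        return string.Join(separator, best.Select(f =>
            string.Join(" ", f.words.Select(w => IsHit(w) ? "<B>" + w + "</B>" : w))));
    }

    // Lowercase and trim punctuation so "Lucene." matches the term "lucene".
    private static string Normalize(string word) =>
        word.Trim(',', '.', ';', ':', '!', '?', '"', '(', ')').ToLowerInvariant();
}
```

With the text "DotLucene is a port of Lucene. It is written in C#. Lucene indexes text quickly.", the term set {"lucene"}, a fragment size of 30, and at most 2 fragments, the third fragment contains no hit and is dropped, giving "DotLucene is a port of <B>Lucene.</B>...It is written in C#. <B>Lucene</B>".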

Resources

- DotLucene download
- Run the DotLucene online demo
- DotLucene documentation

When reposting, please credit the original source: https://www.9cbs.com/read-51119.html
