Building an Oracle Text index on binary files

xiaoxiao2021-03-06  59

Oracle Text is Oracle's full-text search technology, included in both the Standard and Enterprise editions of Oracle9i. Oracle Text uses standard SQL to index, search, and analyze text and documents stored in Oracle databases, in files, or on the web. It can perform linguistic analysis on documents and supports several ways of finding them, including keyword search, context queries, Boolean operations, pattern matching, mixed thematic queries, and HTML/XML section searching. It is particularly strong at mixed queries that combine text with structured relational attributes. An example follows.

The existing document table ZYCONTENT_TABLE stores uploaded files, with the file content held in the BLOB column BLOB_CONTENT. Oracle Text is used to index the binary documents in that BLOB column. The details are as follows:

First, preparation. The schema that owns the document table is ZYFILEUP, and the document table is ZYCONTENT_TABLE. The table is defined as:

CREATE TABLE ZYCONTENT_TABLE (
  NAME          VARCHAR(128) UNIQUE NOT NULL,
  MIME_TYPE     VARCHAR(128),
  DOC_SIZE      NUMBER,
  DAD_CHARSET   VARCHAR(128),
  LAST_UPDATED  DATE,
  CONTENT_TYPE  VARCHAR(128),
  CONTENT       LONG RAW,
  BLOB_CONTENT  BLOB
);

Second, authorization. Before the schema that owns the document table can build a text index, a DBA must grant ZYFILEUP the following privileges:

GRANT "CTXAPP" TO "ZYFILEUP";
ALTER USER "ZYFILEUP" DEFAULT ROLE ALL;
GRANT EXECUTE ON "CTXSYS"."CTX_ADM" TO "ZYFILEUP";
GRANT EXECUTE ON "CTXSYS"."CTX_CATSEARCH" TO "ZYFILEUP";
GRANT EXECUTE ON "CTXSYS"."CTX_CONTAINS" TO "ZYFILEUP";
GRANT EXECUTE ON "CTXSYS"."CTX_DDL" TO "ZYFILEUP";
GRANT EXECUTE ON "CTXSYS"."CTX_DOC" TO "ZYFILEUP";
GRANT EXECUTE ON "CTXSYS"."CTX_QUERY" TO "ZYFILEUP";
GRANT EXECUTE ON "CTXSYS"."CTX_ULEXER" TO "ZYFILEUP";
GRANT EXECUTE ON "CTXSYS"."CTX_XPCONTAINS" TO "ZYFILEUP";

Third, building the index. Connect to the database as the ZYFILEUP user and create the text index preferences:

begin
  ctx_ddl.create_preference('mylex', 'BASIC_LEXER');
  ctx_ddl.set_attribute('mylex', 'printjoins', '_-');
  ctx_ddl.create_preference('mywordlist', 'BASIC_WORDLIST');
  ctx_ddl.set_attribute('mywordlist', 'PREFIX_INDEX', 'TRUE');
  ctx_ddl.set_attribute('mywordlist', 'PREFIX_MIN_LENGTH', 1);
  ctx_ddl.set_attribute('mywordlist', 'PREFIX_MAX_LENGTH', 5);
  ctx_ddl.set_attribute('mywordlist', 'SUBSTRING_INDEX', 'YES');
end;
/

begin
  ctx_ddl.create_preference('cnlex', 'CHINESE_LEXER');
end;
/

Then create the index:

drop index i_zycontent_table force;

create index i_zycontent_table on zycontent_table (blob_content)
  indextype is ctxsys.context
  parameters ('DATASTORE CTXSYS.DIRECT_DATASTORE FILTER CTXSYS.INSO_FILTER LEXER ZYFILEUP.CNLEX WORDLIST ZYFILEUP.MYWORDLIST');
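One way to confirm that the index was built cleanly is to look at the errors Oracle Text logged while filtering documents. The query below is a minimal sketch, assuming it is run as ZYFILEUP right after the CREATE INDEX statement. It reads the CTX_USER_INDEX_ERRORS view, where each document that could not be indexed is listed with its rowid in ERR_TEXTKEY.

```sql
-- List the documents Oracle Text failed to index, newest first.
select err_index_name, err_timestamp, err_textkey, err_text
  from ctx_user_index_errors
 where err_index_name = 'I_ZYCONTENT_TABLE'
 order by err_timestamp desc;
```

An empty result suggests that every document passed through the filter and lexer without error.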

Fourth, synchronizing and dropping the index. To synchronize the index, use either:

alter index i_zycontent_table rebuild parameters ('resume memory 20M');

or

EXEC CTX_DDL.SYNC_INDEX ('i_zycontent_table', '20m');
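Whether a synchronization is needed at all can be checked beforehand. As a rough sketch, assuming the index name used above, the CTX_USER_PENDING view lists the rows that have been inserted or updated but not yet synchronized into the index:

```sql
-- Rows waiting to be synchronized into the text index.
select pnd_index_name, pnd_rowid, pnd_timestamp
  from ctx_user_pending
 where pnd_index_name = 'I_ZYCONTENT_TABLE';
```

Once the synchronization has run, the rows it processed should no longer appear in this view.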

To drop the index:

Drop Index i_zycontent_table force;

Fifth, an introduction to the index features.

1. Indexable file types. Oracle Text can filter and extract content from documents in many formats. It supports more than 150 document formats, and the most common ones, such as MS Office documents and PDF, can all be searched with Oracle Text.

2. Filters. For plain text files such as TXT, HTML, and XML, use the empty filter, NULL_FILTER. For binary files, use INSO_FILTER. If the BLOB column of the document table holds both binary files and plain text files, INSO_FILTER is still required, but it is best to store and index plain text and binary files separately. Oracle Text also provides packages that extract the text of a binary file into plain text.

3. Dictionaries. You can define custom dictionaries for different languages. A dictionary (thesaurus) holds synonyms and the hierarchical relationships between words. Oracle Text offers multilingual features that support documents written in Western languages, Japanese, Korean, and Traditional and Simplified Chinese.
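The text extraction mentioned in point 2 is available through the CTX_DOC package. The block below is a minimal sketch rather than the author's own procedure: it assumes the index i_zycontent_table already exists, that it is run from SQL*Plus, and that a document named 'example.doc' (a hypothetical name) is stored in the table. It identifies the document by rowid and prints the first 80 characters of the filtered plain text.

```sql
set serveroutput on

declare
  v_rowid varchar2(18);
  v_text  clob;
begin
  -- Identify documents by rowid instead of by primary key.
  ctx_doc.set_key_type('ROWID');

  -- 'example.doc' is a hypothetical document name.
  select rowid into v_rowid
    from zycontent_table
   where name = 'example.doc';

  -- Filter the binary document into a temporary CLOB as plain text.
  ctx_doc.filter('i_zycontent_table', v_rowid, v_text, true);

  dbms_output.put_line(dbms_lob.substr(v_text, 80, 1));

  -- Release the temporary CLOB allocated by ctx_doc.filter.
  dbms_lob.freetemporary(v_text);
end;
/
```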

Sixth, the format of a text query statement:

select name
  from zycontent_table
 where contains(blob_content, 'Education Informatization', 1) > 0
 order by name;
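CONTAINS also accepts query operators inside the search string. As a sketch, the statements below vary the query above with the AND, NEAR, and NOT operators. The search terms are simply reused from that query as placeholders, and the span of 10 is an arbitrary choice.

```sql
-- Both terms must occur in the document; rank by relevance score.
select score(1), name
  from zycontent_table
 where contains(blob_content, 'education and informatization', 1) > 0
 order by score(1) desc;

-- The two terms must occur within a span of 10 words of each other.
select name
  from zycontent_table
 where contains(blob_content, 'near((education, informatization), 10)', 1) > 0;

-- "education" must occur and "informatization" must not.
select name
  from zycontent_table
 where contains(blob_content, 'education not informatization', 1) > 0;
```

The label 1 passed as the third argument of CONTAINS is what ties the condition to SCORE(1) in the select list.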

The CONTAINS function provides powerful query features, such as "and" and "or" relationships, proximity (NEAR, ;), and exclusion (NOT, ~). It also makes it convenient to find text in different languages from a single keyword, provided the dictionary has been set up in advance.

Seventh, problems in practical use. In my own work I mostly use Oracle Text to index Chinese documents in Word, Excel, PowerPoint, HTML, and PDF format. However, I found that whether I use INSO_FILTER or NULL_FILTER, and whether I use BASIC_LEXER or CHINESE_LEXER, documents in RTF format cannot be searched. I tried on both Windows 2000 and Windows XP without success, and I do not know why.

Overall, Oracle Text's text retrieval is still excellent. The files do not even need to be stored in the database: if they are kept in an operating system directory, Oracle Text can still build an index on them in the database.
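Indexing files that live in an operating system directory is done with the FILE_DATASTORE datastore type. What follows is a sketch under several assumptions: the table zyfile_table, the preference name myfiles, and the directory /data/docs are all hypothetical, the cnlex lexer preference created earlier is reused, and the files must be readable by the database server.

```sql
-- Hypothetical table: each row names one file under the datastore path.
create table zyfile_table (
  id        number primary key,
  file_name varchar2(256)
);

begin
  -- FILE_DATASTORE reads each document from the file named in the indexed column.
  ctx_ddl.create_preference('myfiles', 'FILE_DATASTORE');
  -- Assumed directory on the database server.
  ctx_ddl.set_attribute('myfiles', 'PATH', '/data/docs');
end;
/

create index i_zyfile_table on zyfile_table (file_name)
  indextype is ctxsys.context
  parameters ('DATASTORE myfiles FILTER CTXSYS.INSO_FILTER LEXER cnlex');
```

Queries then use CONTAINS on the file_name column, and the index covers the contents of the files rather than their names.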

When reposting, please cite the original address: https://www.9cbs.com/read-112973.html
