RTF file parser

xiaoxiao2021-03-06  91

(1) Description: This converter parses an RTF file stream into a logical structure held in memory.

(2) RTF format description: The reference is Microsoft's RTF format specification.

(3) Code description

I. First comes the logical structure, which is defined through the following interfaces:

    class IParser;
    class IParagraphs;
    class IParagraph;
    class ISentences;
    class ISentence;
    class IParaStyle;

    class IParser
    {
        /* the methods of the rtf parser */
    public:
        /* parse the rtf stream and create format text
           char* stream - (in) the rtf stream buffer
           return error code if need */
        virtual int Parse(char* stream) = 0;
        /* release object */
        virtual void release() = 0;

        /* the properties of the parser */
    public:
        /* get paragraphs of rtf */
        virtual IParagraphs* get_paragraphs() = 0;
        /* get fonts count */
        virtual int get_fntcount() = 0;
        /* get font */
        virtual FontStyle* get_fnt(int nIndex) = 0;
        /* get color count */
        virtual int get_clrcount() = 0;
        /* get color */
        virtual ColorStyle get_color(int nIndex) = 0;
    };

    class IParagraphs
    {
        /* the methods of paragraphs */
    public:
        /* add paragraph into table */
        virtual void add(IParagraph* paragraph) = 0;
        /* release object */
        virtual void release() = 0;
        /* copy object */
        virtual IParagraphs* copy() = 0;

        /* the properties of paragraphs */
    public:
        /* the paragraph count */
        virtual int count() = 0;
        /* the paragraph item */
        virtual IParagraph* item(int nIndex) = 0;
        /* the top item */
        virtual IParagraph* top() = 0;
        /* the header item */
        virtual IParagraph* header() = 0;
    };

    class IParagraph
    {
        /* the methods of paragraph */
    public:
        /* release object */
        virtual void release() = 0;

        /* the properties of paragraph */
    public:
        /* get paragraph sentences */
        virtual ISentences* get_sentences() = 0;
        /* get paragraph style */
        virtual IParaStyle* get_style() = 0;
    };

    class ISentences
    {
        /* the methods of sentences */
    public:
        /* add sentence into table */
        virtual void add(ISentence* sentence) = 0;
        /* release object */
        virtual void release() = 0;

        /* the properties of sentences */
    public:
        /* the sentence count */
        virtual int count() = 0;
        /* the sentence item */
        virtual ISentence* item(int nIndex) = 0;
    };

    class ISentence
    {
        /* the methods of sentence */
    public:
        /* release object */
        virtual void release() = 0;

        /* the properties of sentence */
    public:
        /* content property */
        virtual char* get_content() = 0;
        virtual void set_content(char* content) = 0;
        /* sentence style */
        virtual ISentStyle* get_style() = 0;
        virtual void set_style(ISentStyle* style) = 0;
        /* sentence size */
        virtual SIZE& get_size() = 0;
    };

II. Implementation notes: The IParser interface is implemented by XParser, which relies mainly on two data structures.

Stack mstk: This stack handles the pairing of { and }. When the stack becomes empty, the stream has been fully parsed.

Stack mstatus: This stack records which attribute the current { } group belongs to. For example, in {\fonttbl{\f0\f1\fprq2\fcharset134 Times New Roman;}}, the attribute fonttbl is pushed when the first { is seen, and the attribute f is pushed when the second { is seen. Each } pops one entry. The top of the stack therefore tells the parser which attribute it is currently processing.

III. Main code description: The most important part is the parsing code, which is mainly the following function:

    int XParser::Parse(char* stream)
    {
        char* psz = stream;
        /* parse the rtf stream */
        while (psz)
        {
            switch (*psz)
            {
            case '{':
            {
                /* push the '{' into the stack and pop it when '}' is found */
                mstk.push(*psz);
                psz++;
                /* get the main key word and push it into the status stack */
                ErrorCode nErrNum;
                long nOff = 0;
                if ((nErrNum = (ErrorCode)ParseStatus(psz, nOff)) != EC_OK)
                    return nErrNum;
                psz += nOff; /* move stream pointer */
                break;
            }
            case '}':
            {
                /* unmatched '}': the brace stack is already empty */
                if (!mstk.size())
                    return EC_STACKOVERFLOW;
                /* pair with '{' and pop it */
                mstk.pop();
                /* here trace debug info */
                tracedebug();
                /* pop status */
                mstatus.pop();
                /* parse is finished */
                if (!mstk.size())
                    return EC_OK;
                /* move stream pointer */
                psz++;
                break;
            }
            case '\\':
            {
                if (IsBreak(*psz, *(psz + 1)))
                {
                    /* the backslash starts an attribute */
                    ErrorCode nErrNum;
                    long nOff = 0;
                    if ((nErrNum = (ErrorCode)ParseProperty(psz, nOff)) != EC_OK)
                        return nErrNum;
                    psz += nOff;
                }
                else
                {
                    /* the backslash belongs to the text content */
                    ErrorCode nErrNum;
                    long nOff = 0;
                    if ((nErrNum = (ErrorCode)ParseDefault(psz, nOff)) != EC_OK)
                        return nErrNum;
                    psz += nOff;
                }
                break;
            }
            default:
            {
                ErrorCode nErrNum;
                long nOff = 0;
                if ((nErrNum = (ErrorCode)ParseDefault(psz, nOff)) != EC_OK)
                    return nErrNum;
                psz += nOff;
                break;
            }
            }
        }
        return EC_OK;
    }

The characters \, { and } have special meaning in RTF, but they can also appear in the text content itself. So when the parser encounters a \, it must look at the next character to decide whether an attribute string starts here. That is the job of the call IsBreak(*psz, *(psz + 1)). If it is an attribute, the stream is processed as an attribute by ParseProperty; if not, it is processed as an ordinary string by ParseDefault.

When reposting, please credit the original address: https://www.9cbs.com/read-107699.html
