An extensible framework for formatting and analyzing text (source code attached)

zhaozj, 2021-02-17

[Declaration] If you copy or redistribute this article, please include this statement. Thank you. Original source: http://morningspace.51.net/, moyingzz@etang.com

This article is the documentation for the TextFormator Framework (formerly titled "TextFormator Framework Introduction"). The source code is available here.

Origins

TextFormator grew out of the following two things:

At the beginning of the year I asked on the 9CBS forum where I could find a good "code statistics and analysis tool", and got no response. Some time ago, a colleague showed me a small tool he had written in his spare time, which generates colored HTML text from a C program.

Since then I had been looking for a general solution that could meet a variety of application needs, including the two above. Fortunately I then had a fairly idle stretch of time and was able to put the idea into practice. I spent one week on everything from design to coding to documentation comments, including, of course, writing this article ^^, and it let me relive the creative passion of my college days. I remember that in my junior year I spent three days writing, in assembly language, a "Student File Manager" that supported hierarchical menus in text mode.

Perhaps this was just a passing enthusiasm. Perhaps this simple framework has little practical value, and good code statistics tools may already exist. Still, for me, writing it was good practice in OO design and in using the STL. As for releasing it as open source, the aim is to "throw out a brick to attract jade": I hope interested colleagues will continue to develop and improve it, since a framework offers more flexibility and can be better targeted than ready-made tools. If you have any suggestions, ideas, or problems regarding the framework, please contact me: Moying@etang.com

This is an open-source, extensible application framework: you can extend the framework code to meet different application needs. TextFormator is suitable for processing source text in multiple programming languages (such as C/C++, Java, Pascal, MASM) and also supports plain text. It can perform the following kinds of processing:

Formatted output in any form, for example: generating colored HTML text, indented typesetting, and deletion

Code statistics and analysis in various forms, for example: keyword lookup, comment-line counts, function statistics

The code is released as open source and follows OO design ideas, aiming for considerable flexibility and extensibility. It is based on the C++ STL and aims for considerable portability.

The original text is read into memory as a character stream and parsed line by line. The stream of parse results then serves as the input stream for formatting or statistical processing. After formatting by one or more handlers, the characters are written to a file or to standard output. For statistics, the text itself is not output; the output consists of the statistical results instead. The flow is illustrated as follows:

[Figure: processing flow]

Class diagram

At present the framework is not very complicated. Even so, some work is still at the design stage, as can be seen from the "Detailed Description" section later. The following is an overview of the framework's class diagram. As it shows, the overall structure of the framework is organized in a way similar to the Strategy pattern:

[Figure: class diagram overview]

Core section

ParseHandler

ParseHandler is an abstract class. Every handler used during parsing should derive from it; it defines the basic behavior of such handlers. Its main member is the Accept method, which receives the line of text currently being processed and, starting from the current position, tries to parse it in some specific way. Which way is used is up to the subclass. If parsing succeeds, the handler stores the result in a TokenInfo structure, moves the current position to the next segment of the line, and returns true. Otherwise it returns false, telling the framework to hand control to the next handler. For how to write actual handlers, see the classes in the Extend part: StringParseHandler, NumberParseHandler, OperatorParseHandler, IdentifierParseHandler, WhitespaceParseHandler, CommentParseHandler.

ParseHandler also introduces the concept of priority, implemented somewhat like Java thread priorities. This reflects a practical need. Taking C/Java code as an example, the handler that parses comments clearly has priority over the handler that parses identifiers, because an identifier that appears inside a comment should still be treated as part of the comment. The handler that parses strings has the same priority as the comment handler: a comment marker inside a string should be treated as part of the string, and a string inside a comment should be treated as part of the comment. In practice, ordinary handlers use the normal priority, while handlers for constructs such as comments and strings use MAX_PRIORITY.

In addition, the framework predefines a special DefaultParseHandler, which steps in when none of the other handlers can parse the current string. Its priority is defined as MIN_PRIORITY. Do not use MIN_PRIORITY when deriving your own handlers. DefaultParseHandler is normally called very rarely, and its handling of strings is extremely crude. When it is called, it usually means that your set of handlers is not complete enough for the text being parsed.
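The accept contract, the priority ordering, and the fallback to a default handler described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the framework's actual code: the TokenInfo fields, the lowercase accept signature, the numeric priority values (including the name NORM_PRIORITY), the one-character behavior of the default handler, and the helpers parseLine and demoParse are all invented for this example.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical token record; the real TokenInfo layout is not shown in the article.
struct TokenInfo {
    std::string type;
    std::string text;
};

// Assumed numeric values, loosely following Java's thread priorities.
enum Priority { MIN_PRIORITY = 1, NORM_PRIORITY = 5, MAX_PRIORITY = 10 };

class ParseHandler {
public:
    explicit ParseHandler(int priority) : priority_(priority) {}
    virtual ~ParseHandler() = default;
    int priority() const { return priority_; }
    // Success: fill `out`, advance `pos` past the token, return true.
    // Failure: leave `pos` unchanged and return false.
    virtual bool accept(const std::string& line, std::size_t& pos, TokenInfo& out) = 0;
private:
    int priority_;
};

// Treats everything from "//" to the end of the line as one comment token.
class CommentParseHandler : public ParseHandler {
public:
    CommentParseHandler() : ParseHandler(MAX_PRIORITY) {}
    bool accept(const std::string& line, std::size_t& pos, TokenInfo& out) override {
        if (line.compare(pos, 2, "//") != 0) return false;
        out = {"comment", line.substr(pos)};
        pos = line.size();
        return true;
    }
};

// Accepts [A-Za-z_][A-Za-z0-9_]* starting at the current position.
class IdentifierParseHandler : public ParseHandler {
public:
    IdentifierParseHandler() : ParseHandler(NORM_PRIORITY) {}
    bool accept(const std::string& line, std::size_t& pos, TokenInfo& out) override {
        auto isStart = [](unsigned char c) { return std::isalpha(c) || c == '_'; };
        auto isBody = [](unsigned char c) { return std::isalnum(c) || c == '_'; };
        if (!isStart(line[pos])) return false;
        std::size_t end = pos + 1;
        while (end < line.size() && isBody(line[end])) ++end;
        out = {"identifier", line.substr(pos, end - pos)};
        pos = end;
        return true;
    }
};

// Assumed fallback behavior: consume exactly one character so parsing always advances.
class DefaultParseHandler : public ParseHandler {
public:
    DefaultParseHandler() : ParseHandler(MIN_PRIORITY) {}
    bool accept(const std::string& line, std::size_t& pos, TokenInfo& out) override {
        out = {"unknown", std::string(1, line[pos])};
        ++pos;
        return true;
    }
};

// Tries handlers from high to low priority; falls back to the default handler.
std::vector<TokenInfo> parseLine(const std::string& line,
                                 std::vector<std::unique_ptr<ParseHandler>>& handlers) {
    std::stable_sort(handlers.begin(), handlers.end(),
                     [](const auto& a, const auto& b) { return a->priority() > b->priority(); });
    DefaultParseHandler fallback;
    std::vector<TokenInfo> tokens;
    std::size_t pos = 0;
    while (pos < line.size()) {
        TokenInfo tok;
        bool handled = false;
        for (auto& h : handlers) {
            if (h->accept(line, pos, tok)) { handled = true; break; }
        }
        if (!handled) fallback.accept(line, pos, tok);
        tokens.push_back(tok);
    }
    return tokens;
}

// Registers two handlers (in "wrong" order on purpose) and parses one line.
std::vector<TokenInfo> demoParse(const std::string& line) {
    std::vector<std::unique_ptr<ParseHandler>> handlers;
    handlers.push_back(std::make_unique<IdentifierParseHandler>());
    handlers.push_back(std::make_unique<CommentParseHandler>());
    return parseLine(line, handlers);
}
```

In this sketch, calling demoParse on a line such as `x // y` yields an identifier token, one token from the default handler for the blank, and a single comment token. The priority order only matters when more than one handler could accept at the same position; because a failed accept leaves the position untouched, the fallback is what guarantees that the loop always advances.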

LineParser

LineParser parses the text line by line. Once parsing is complete, the result can be obtained through the GetToKensInfolist method. Specific parse handlers can be registered and removed at run time through the RegistParsehandler and UnReigistParsehandler methods, so the behavior of a LineParser can be adjusted on the fly. For example, a LineParser originally set up to parse C programs can be reconfigured to parse assembly programs; it is the same parser object before and after.

LineParser automatically creates a DefaultParseHandler when it is constructed. During parsing, the handlers are tried in order of priority from high to low until one is found that can handle the current string. If no handler can parse the current string, DefaultParseHandler takes over. Each handler that parses successfully moves the current position "pointer" to the next segment of the line, whereas a handler that cannot process the current string leaves the position unchanged. DefaultParseHandler is therefore not redundant: it exists to avoid potential infinite loops and to let parsing continue until useful parse information is obtained again, even if the results produced by DefaultParseHandler itself carry little meaning.

FormatHandler

FormatHandler is an abstract class. Every handler used during formatted output should derive from it; it defines the basic behavior of such handlers. Its main member is the Format method, which receives a parsed string and processes it in some particular way. The particular way is up to the subclass. The processing directly modifies the string passed in, so when several handlers process the same string, pay attention to their order: a different order may produce a different result.

As a secondary use of FormatHandler, in cooperation with LineFormator, you can implement code or text analysis and statistics. You simply make no modification to the string in the handler and only analyze its content. This is enough to implement some very useful features, for example counting comment lines in code, counting keyword frequency, counting functions, or finding how often long functions occur. The statistics can be kept in the respective handlers. In the Extend part, Count::KeywordCountHandler, Count::CommentCountHandler, and count.cpp demonstrate some simple statistics.
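The two uses of a format handler described above, in-place modification where order matters and read-only statistics, can be sketched as follows. This is only an illustration: the format signature, the handler classes EscapeHtmlHandler and BoldHandler, the helper applyAll, and this particular CommentCountHandler body are assumptions made for the example, not the framework's real classes.

```cpp
#include <string>
#include <vector>

class FormatHandler {
public:
    virtual ~FormatHandler() = default;
    // Processes the text in place; handlers that run later see the modified text.
    virtual void format(std::string& text) = 0;
};

// Replaces the HTML special characters &, < and > with entities.
class EscapeHtmlHandler : public FormatHandler {
public:
    void format(std::string& text) override {
        std::string out;
        for (char c : text) {
            switch (c) {
                case '&': out += "&amp;"; break;
                case '<': out += "&lt;"; break;
                case '>': out += "&gt;"; break;
                default:  out += c;
            }
        }
        text = out;
    }
};

// Wraps the text in a <b> element.
class BoldHandler : public FormatHandler {
public:
    void format(std::string& text) override { text = "<b>" + text + "</b>"; }
};

// Statistics handler: inspects the text but never modifies it.
class CommentCountHandler : public FormatHandler {
public:
    void format(std::string& text) override {
        if (text.compare(0, 2, "//") == 0) ++count_;
    }
    int count() const { return count_; }
private:
    int count_ = 0;
};

// Applies the handlers one after another, in the order given.
std::string applyAll(std::string text, const std::vector<FormatHandler*>& handlers) {
    for (FormatHandler* h : handlers) h->format(text);
    return text;
}
```

Escaping first and then wrapping produces valid markup around the escaped text, while wrapping first and then escaping also escapes the tags that were just added. That difference is the order dependence the text warns about. The counting handler returns the text unchanged and keeps its result in its own member.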

LineFormator

LineFormator formats the parsed text line by line. After processing, the result can be obtained through the getFormattedLines method. Specific format handlers can be set dynamically through the Registformathandler and unreigistformathandler methods, so the behavior of a LineFormator can be adjusted on the fly. For example, a LineFormator originally set up for one kind of processing of C programs can be reconfigured into one that generates HTML output; it is the same formator object.

Unlike LineParser, LineFormator allows multiple format handlers to be registered for strings of the same category, so the same string can be processed several times. Each pass, of course, changes the string. For example, for comments you can register an Indent::NormalFormatHandler with the LineFormator and then an Htmlize::BodyFormatHandler. The result is that the original text is first processed by the indent handler and then turned into HTML text. You can add any number of further handlers in any combination, as long as the addition and combination make practical sense. This feature brings convenience and flexibility to practical applications. Note that the framework calls the handlers in the same order in which you registered them. In the Extend part, the Htmlize and Indent handlers, together with htmlize.cpp, indent.cpp, and indenthtmlize.cpp, demonstrate what is described above.

Helper part

Session

The design of the Session class imitates the session facility in ASP/JSP. It is only an imitation (the functionality is similar) and is far less complicated. Its main role is to make it convenient for handlers to pass information to one another, both handlers used during parsing and handlers used during formatted output. Internally it holds a map container: you insert a value under a specified key with the SET method and retrieve it later with the GET method. This design lightens the burden on the framework code, which does not need to concern itself with the specifics of information transfer between handlers. It also makes future extension easier: you can use the session in your own handler subclasses to deal with application-specific needs.
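A key/value session of the kind described above might look roughly like this. It is a minimal sketch: the string value type, the lowercase set and get signatures, the fallback parameter, and the key name inBlockComment are assumptions for illustration, since the article does not show the real interface.

```cpp
#include <map>
#include <string>

class Session {
public:
    // Stores `value` under `key`, replacing any previous value.
    void set(const std::string& key, const std::string& value) { values_[key] = value; }

    // Returns the stored value, or `fallback` if the key is absent.
    std::string get(const std::string& key, const std::string& fallback = "") const {
        auto it = values_.find(key);
        return it == values_.end() ? fallback : it->second;
    }

private:
    std::map<std::string, std::string> values_;
};

// A parse-side handler could record that a block comment has been opened...
void noteCommentStart(Session& session) { session.set("inBlockComment", "true"); }

// ...and a format-side handler could read that flag later.
bool insideComment(const Session& session) {
    return session.get("inBlockComment", "false") == "true";
}
```

Because the framework only stores and returns values, it never needs to know what a given key means. That is what keeps the transfer of information between handlers out of the framework code.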

Context

Context represents the context of the parsing and formatted-output process. The framework code places information such as the current line number, the current string (after parsing), and the current position in the string into it for handler subclasses to use. Future versions may include more information. Context holds a Session member internally. In fact, although handler subclasses were described above as passing information through the Session object, a handler cannot access the session directly; it has to obtain the actual Session object through the Context.

In a sense, Session is itself a kind of context. The difference is that the framework code cannot predict what that context will actually mean, which sets it apart from the other members of Context. In other words, Context is a semi-open object. Clear, stable information that falls within the framework's reach should not be placed in the Session; it can instead be a fixed member of Context, sitting alongside the Session. Because such information does not change over time, making it a fixed member does not count as hard-coding. The remaining, volatile information that the framework cannot determine in advance is best left to the Session.

FileHelper

A helper class for file operations. It provides file reading and writing, and can be used both elsewhere in the framework and in specific applications built on the framework.

FileFinder

A helper class for file operations. It traverses a specified directory and its subdirectories for specific files, and can be used both elsewhere in the framework and in applications built on the framework. However, this class currently works only on the Windows platform, and the macro __windows__ must be defined before use; see the portability.h file.

Extend part

Currently, this part includes the following:

Several parse handler classes derived from ParseHandler, including:

StringParseHandler: parses strings
NumberParseHandler: parses numbers
OperatorParseHandler: parses operators
IdentifierParseHandler: parses identifiers and keywords
WhitespaceParseHandler: parses blanks and tabs
CommentParseHandler: parses comments

Several formatting classes derived from FormatHandler, including:

Htmlize::BodyOutputHandler: HTML-formatted output of the text
Indent::NormalOutputHandler, Indent::WhitespaceOutputHandler, Indent::OperatorOutputHandler: indented typesetting
Count::KeywordCountHandler, Count::CommentCountHandler: keyword and comment-line statistics

File list

Note: click a link to view the corresponding source file. The files are HTML text produced by Htmlize processing.

GeneralDefine.h: global type definitions
portability.h: macro definitions used for porting between platforms
LineParser.h and its .cpp file: the text parsing component (Line Parser)
LineFormator.h and its .cpp file: the text formatting component (Line Formator)
ParseHandler.h: abstract class definition of the parse handler (Parse Handler), plus the predefined default parse handler
ConcreteParseHandlers.h: definition and implementation of the concrete parse handlers
FormatHandler.h: abstract class definition of the format handler (Format Handler)
HtmlFormatHandlers.h: definition and implementation of Format Handler derived classes that support HTML-formatted output
IndentFormatHandlers.h: definition and implementation of Format Handler derived classes that support indentation; a companion header defines and implements the derived classes that support statistics
Context.h: defines the context used during parsing and formatted output
The session header (.h): helper class for passing information between handlers
FileHelper.h: helper class related to file operations
The FileFinder header (.h) and filefinder.cpp: helper class related to file operations
htmlize.cpp: demo program that uses the framework to produce HTML-formatted output of source code
indent.cpp: demo program that uses the framework to indent source code
indenthtmlize.cpp: demo program that uses the framework to indent source code and then produce HTML-formatted output
count.cpp: demo program that uses the framework to gather some simple statistics on source code (the number of occurrences of void and for, and the number of comments)
batch.cpp: demonstrates the use of FileFinder; combined with the other demo programs, it enables batch file processing

About portability

I have tested the framework code and the sample programs on MSVC and G++. Under MSVC, both the PJ STL and STLport were tested, and some errors were corrected along the way; see the changelog attached to the source code. The MSVC command-line compiler is version 12.00.8168 for 80x86, the G++ command-line compiler is EGCS-2.91.57 19980901 (EGCS-1.1 release), and STLport is version 4.5 release. So far, the tests on all of the above platforms have succeeded.

The conflict between efficiency and flexibility

The efficiency of this framework is not optimal. After carefully weighing efficiency against flexibility and extensibility, I chose the latter.

Unsatisfying points in the current framework and possible improvements

When LineFormator is used for statistical analysis, some of its interfaces and code are redundant, although this does not affect use.

The existing framework supports parsing a specific kind of text by defining classes derived from ParseHandler. The side effect is that supporting a new kind of text requires changing the source code to add new ParseHandler-derived classes. Another feasible approach is to define a template file that holds the information needed to parse a new kind of text. Then the source code need not change; only the template file is modified. However, this requires sensibly extracting the common characteristics of various programming languages to form the template, and it raises the demands on the framework code accordingly.

The current framework has no GUI, and the demo programs run only from the command line. In a future extension I hope to add a GUI, but as an extension it should not fall within the scope of the framework itself.

Namespace naming suggestions for extending the framework:

For Parse Handler extensions built on the framework that apply to multiple programming languages, use names of the form textFormator::xxx, where xxx is the specific Parse Handler (for example textFormator::StringParseHandler).

For extensions that apply to only one programming language, use names of the form textFormator::language_name::xxx, where language_name is the name of the language (for example textFormator::Pascal::CommentParseHandler).

For specific applications whose requirements are relatively stable, consider adding them to the Extend part of TextFormator under names of the form textFormator::util_name::xxx, where util_name is the application name (for example textFormator::Htmlize::BodyFormatHandler).
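The three naming levels suggested above map directly onto nested C++ namespaces. The sketch below only illustrates the layout: the exact spelling and capitalization of the namespaces are uncertain in this copy of the article, and the empty class bodies and the qualifiedName helper are hypothetical.

```cpp
#include <string>

namespace textFormator {

// Applies to many languages: lives directly in the top-level namespace.
class StringParseHandler {
public:
    static std::string qualifiedName() { return "textFormator::StringParseHandler"; }
};

// Applies to one language only: nested under the language name.
namespace Pascal {
class CommentParseHandler {
public:
    static std::string qualifiedName() { return "textFormator::Pascal::CommentParseHandler"; }
};
}  // namespace Pascal

// Belongs to one stable application: nested under the application name.
namespace Htmlize {
class BodyFormatHandler {
public:
    static std::string qualifiedName() { return "textFormator::Htmlize::BodyFormatHandler"; }
};
}  // namespace Htmlize

}  // namespace textFormator
```

With this layout, a Pascal-specific comment handler and a C-specific one could share the class name CommentParseHandler without clashing, since each sits in its own language namespace.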

--Morning -

When reposting, please cite the original address: https://www.9cbs.com/read-29142.html
