Some Fun with SAX


Chris Lovett August 21, 2000

View and download the source code of this article

Now I know the secret to getting a lot of reader attention: just write an article about the latest technology that readers cannot get their hands on yet. Now that the .NET Framework SDK technology preview is available, I hope you have had a chance to get to know it.

In this article, I take a closer look at the Visual Basic® SAX interfaces included in the July 2000 Microsoft XML Parser beta. For a change, I decided to write some Visual Basic code, and I ended up writing quite a lot of it, all of it exercising the MSXML2.VBSAXXMLReader30 class. The resulting application is described below.

OASIS Conformance Testing

I started by writing a Visual Basic test harness to run the OASIS XML Conformance Test Suite. I needed to run these tests for another project anyway, so I figured I could kill two birds with one stone.

The test harness loads a large XML document that serves as the index of all the tests to run. Each test is listed like this:

    <TEST TYPE="not-wf" ENTITIES="none" ID="not-wf-sa-001" URI="not-wf/sa/001.xml" SECTIONS="3.1 [41]">
        Attribute values must start with attribute names, not "?".
    </TEST>

Each TEST element has the following attributes:

    Attribute  Value              Description
    TYPE       not-wf             The parser is expected to report a well-formedness error.
               invalid            A validating parser is expected to report a validity error; a nonvalidating parser should pass the test.
               valid              Both validating and nonvalidating parsers should pass the test.
    ENTITIES   none               Whether the test requires support for loading external entities.
    ID         not-wf-sa-001      Unique test identifier.
    URI        not-wf/sa/001.xml  Location of the actual XML test file to parse.
    SECTIONS   3.1 [41]           Reference to the relevant sections of the XML 1.0 specification.

OasisTest.cls

The main entry point of the OasisTest class module takes the URL of the xmlconf.xml master index document:

    Public Sub Run(TestURL As String)

When it is called, I load the test index into a DOMDocument object, select all the TEST elements, and then call my SAX test code for each test:

    Dim doc As DOMDocument
    Dim node As IXMLDOMElement
    Dim tests As IXMLDOMNodeList

    Set doc = New DOMDocument30
    doc.async = False
    doc.Load TestURL
    Set tests = doc.selectNodes("//TEST")
    Set node = tests.nextNode()
    ' Cancelled is assumed to be a flag that is set when the user stops the run
    While Not node Is Nothing And Not Cancelled
        ' DocBase is assumed to hold the base URL of the index document
        RunTest DocBase, node
        Set node = tests.nextNode()
    Wend
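For readers without Visual Basic at hand, the same index walk can be sketched in Python with the standard library. This is only an illustration, not the article's code: the `INDEX_XML` stand-in, the `list_tests` and `run_all` names, and the second TEST entry are all hypothetical, and the real xmlconf.xml is a much larger file.

```python
import xml.etree.ElementTree as ET

# A tiny stand-in for the xmlconf.xml index. Only the first entry's
# attribute values come from the article; the second is hypothetical.
INDEX_XML = """<TESTSUITE>
  <TEST TYPE="not-wf" ENTITIES="none" ID="not-wf-sa-001"
        URI="not-wf/sa/001.xml" SECTIONS="3.1 [41]">
    Attribute values must start with attribute names, not "?".
  </TEST>
  <TEST TYPE="valid" ENTITIES="none" ID="example-valid-001"
        URI="valid/sa/001.xml" SECTIONS="3">
    Hypothetical second entry.
  </TEST>
</TESTSUITE>"""


def list_tests(index_xml):
    """Return (ID, TYPE, URI) for every TEST element at any depth."""
    root = ET.fromstring(index_xml)
    return [(t.get("ID"), t.get("TYPE"), t.get("URI")) for t in root.iter("TEST")]


def run_all(index_xml, run_test, cancelled=lambda: False):
    """Call run_test for each entry until the list ends or cancelled() is true."""
    count = 0
    for entry in list_tests(index_xml):
        if cancelled():
            break
        run_test(entry)
        count += 1
    return count
```

The `cancelled` callback plays the role of the loop's cancel flag, so a user interface can stop a long run between tests.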

I also create an empty document that holds a log of all the test results. When the user clicks the Generate Report button, this document is transformed into the final test report using the Template.xsl style sheet. To actually run a test with the MSXML2.VBSAXXMLReader object, I use the following method:

    Public Sub RunTest(DocBase As String, Element As IXMLDOMElement)

First, this method creates a new SAX reader object and configures it to process external entities and to call back into my implementations of IVBSAXContentHandler, IVBSAXDTDHandler, and IVBSAXErrorHandler:

    Dim contentHandler As ContentHandler
    Set contentHandler = New ContentHandler

    Dim reader As VBSAXXMLReader30
    Set reader = New VBSAXXMLReader30
    reader.putFeature "http://xml.org/sax/features/external-parameter-entities", True
    reader.putFeature "http://xml.org/sax/features/external-general-entities", True
    ' The same object receives content, error, and DTD callbacks
    Set reader.contentHandler = contentHandler
    Set reader.errorHandler = contentHandler
    Set reader.dtdHandler = contentHandler

Note that it is quite convenient to implement all three handler interfaces on the same class (in this case, the ContentHandler class). To start the actual parsing of the test file, I only need to call the parseURL method:

    reader.parseURL uri
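The one-class-for-all-handlers arrangement carries over to other SAX bindings. Here is a minimal sketch using Python's standard `xml.sax` module rather than MSXML; the `TestHandler` and `run_parse` names are hypothetical. It requests both external-entity features and simply skips any that the underlying driver refuses, since support differs between parsers.

```python
import io
import xml.sax
from xml.sax.handler import (ContentHandler, DTDHandler, ErrorHandler,
                             feature_external_ges, feature_external_pes)


class TestHandler(ContentHandler, DTDHandler, ErrorHandler):
    """One object that receives content, DTD, and error callbacks."""

    def __init__(self):
        super().__init__()
        self.elements = []   # element names, in document order
        self.fatal = None    # message of the first fatal error, if any

    def startElement(self, name, attrs):
        self.elements.append(name)

    def fatalError(self, exception):
        self.fatal = str(exception)
        raise exception      # stop parsing at the first fatal error


def run_parse(xml_bytes):
    """Parse xml_bytes and return the handler holding the results."""
    handler = TestHandler()
    parser = xml.sax.make_parser()
    for feature in (feature_external_ges, feature_external_pes):
        try:
            parser.setFeature(feature, True)
        except xml.sax.SAXException:
            pass             # this driver does not support the feature
    parser.setContentHandler(handler)
    parser.setDTDHandler(handler)
    parser.setErrorHandler(handler)
    try:
        parser.parse(io.BytesIO(xml_bytes))
    except xml.sax.SAXParseException:
        pass                 # already recorded by fatalError
    return handler
```

After `run_parse` returns, a test harness can inspect `handler.fatal` to decide whether a not-wf test passed.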

I then check the result, compare the actual output with the expected output, and so on.

ContentHandler.cls

The ContentHandler class module implements the SAX callback interfaces. The code begins as follows:

    Option Explicit
    Implements IVBSAXContentHandler
    Implements IVBSAXDTDHandler
    Implements IVBSAXErrorHandler

The ContentHandler class then implements all the methods defined on these interfaces. Much of the code in this class deals with generating canonical XML output, which can then be compared with the expected output. This involves the following:

Escaping all special characters as entities: The special characters include & < and >, and line breaks are written out as character references. This is how the expected output files in the test suite are stored, so I have to do the same.

Sorting attributes: Because attribute order is not significant, and some parsers return default attributes in a different order than others, the attributes are sorted so that they match the expected output file. To do this, I used a Visual Basic QuickSort algorithm that I found on MSDN. See the Sorter.cls module.

Saving notation declarations: Notation declarations arrive through the IVBSAXDTDHandler interface. They need to be saved and then sorted for comparison with the expected output. Storing and sorting the notations is done in the DocType.cls module.

Capturing and storing error messages: This is done in the implementation of the IVBSAXErrorHandler fatalError method.

XML Statistics

Another fun thing to do with SAX-level XML processing is to count elements and attributes. I wrote another simple IVBSAXContentHandler implementation that counts the number of elements, attributes, text nodes, text characters, and name characters, and displays a "tag ratio" that indicates how much markup there is relative to the actual text content in the file. Running it against the Hamlet.xml file reports the tag ratio for that file. When you try this, you get a feel for how fast SAX-level processing can be. It is certainly much faster than building a DOMDocument object model and walking the tree to count everything.

Filter.cls

This class module implements the IVBSAXContentHandler and IVBSAXErrorHandler SAX callback interfaces and increments a set of counters as the callbacks arrive. For example, the startElement method does the following:

    Private Sub IVBSAXContentHandler_startElement( _
            ByVal strNamespaceURI As String, _
            ByVal strLocalName As String, _
            ByVal strQName As String, _
            ByVal attributes As IVBSAXAttributes)
        Dim i As Integer
        ' Count the element and the characters in its name
        Elements = Elements + 1
        NameChars = NameChars + Len(strQName)
        AttributeNodes = AttributeNodes + attributes.length
        ' Attribute names count as markup; attribute values count as text
        For i = 0 To attributes.length - 1
            NameChars = NameChars + Len(attributes.getQName(i))
            TextChars = TextChars + Len(attributes.getValue(i))
        Next
    End Sub
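The counting approach is easy to try outside Visual Basic as well. The sketch below uses Python's standard `xml.sax` module in place of MSXML; the `StatsHandler` and `count` names are hypothetical. The article does not give the formula behind its tag ratio, so the `tag_ratio` method assumes one possible definition: name characters divided by name characters plus text characters.

```python
import xml.sax


class StatsHandler(xml.sax.ContentHandler):
    """Counts elements, attributes, name characters, and text characters."""

    def __init__(self):
        super().__init__()
        self.elements = 0
        self.attributes = 0
        self.name_chars = 0
        self.text_chars = 0

    def startElement(self, name, attrs):
        self.elements += 1
        self.name_chars += len(name)
        self.attributes += len(attrs)
        for attr_name in attrs.getNames():
            # Attribute names count as markup, attribute values as text
            self.name_chars += len(attr_name)
            self.text_chars += len(attrs.getValue(attr_name))

    def characters(self, content):
        self.text_chars += len(content)

    def tag_ratio(self):
        # Assumed definition: share of counted characters that are names
        total = self.name_chars + self.text_chars
        return self.name_chars / total if total else 0.0


def count(xml_bytes):
    """Parse xml_bytes and return the populated StatsHandler."""
    handler = StatsHandler()
    xml.sax.parseString(xml_bytes, handler)
    return handler
```

Because nothing is retained beyond a handful of integers, memory use stays flat no matter how large the input document is.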

XML Filtering

While writing the code for the Filter IVBSAXContentHandler implementation, I found that adding filtering would not take much extra work. The Filter tab contains a number of options. Here I have chosen to load Hamlet.xml and convert all element names and attribute names to proper case.

Formatting

SAX-level processing also lets you format XML documents by adding new lines and indentation based on the nesting level. You can control the amount of indentation and which whitespace character is used for indentation. When the format is set to "indent", the following input:

Chris

Lovett

becomes:

Chris

Lovett

The indentation algorithm works by keeping a stack of integers that represent the "content" model at each level of the document. The possible values are:

    Const CONTENT_EMPTY = 0
    Const CONTENT_MIXED = 1
    Const CONTENT_ELEMENT = 2
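A stack of content models like this can drive indentation as follows. This is a minimal sketch in Python's standard `xml.sax`, not the article's Visual Basic code, and it makes several assumptions: an element starts out empty, any character data marks it as mixed, and a child element starting inside a non-mixed parent marks that parent as element-only. Only element-only content is indented. The `IndentHandler` and `indent_xml` names are hypothetical, and the input is assumed to have no whitespace between tags.

```python
import xml.sax
from xml.sax.saxutils import escape, quoteattr

CONTENT_EMPTY, CONTENT_MIXED, CONTENT_ELEMENT = 0, 1, 2


class IndentHandler(xml.sax.ContentHandler):
    """Re-serializes a document, indenting element-only content."""

    def __init__(self, indent="  "):
        super().__init__()
        self.indent = indent
        self.out = []
        self.content = []    # one content-model entry per open element

    def _newline(self, level):
        if self.out:         # no leading newline before the root element
            self.out.append("\n" + self.indent * level)

    def startElement(self, name, attrs):
        # A child inside a non-mixed parent makes the parent element-only
        if self.content and self.content[-1] != CONTENT_MIXED:
            self.content[-1] = CONTENT_ELEMENT
        if not self.content or self.content[-1] == CONTENT_ELEMENT:
            self._newline(len(self.content))
        attr_text = "".join(" %s=%s" % (k, quoteattr(v)) for k, v in attrs.items())
        self.out.append("<%s%s>" % (name, attr_text))
        self.content.append(CONTENT_EMPTY)

    def characters(self, text):
        if self.content:
            self.content[-1] = CONTENT_MIXED
        self.out.append(escape(text))

    def endElement(self, name):
        # Only element-only content gets its end tag on a new line
        if self.content.pop() == CONTENT_ELEMENT:
            self._newline(len(self.content))
        self.out.append("</%s>" % name)


def indent_xml(xml_bytes, indent="  "):
    """Return xml_bytes re-serialized with indentation."""
    handler = IndentHandler(indent)
    xml.sax.parseString(xml_bytes, handler)
    return "".join(handler.out)
```

Mixed content is passed through untouched, which avoids injecting whitespace into running text where it would change the meaning of the document.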

The content model of a new element starts out as CONTENT_EMPTY. When the IVBSAXContentHandler_characters method is called, the content model of the current element is set to CONTENT_MIXED. If the content is not mixed when a child element starts, the content model becomes CONTENT_ELEMENT. I have not fully tested this code, so I recommend against using it in heavy-duty production applications. Most of the time, however, it seems to work quite well. There is also plenty you could add to it. For example, you will notice that an empty element written with separate start and end tags is output as a single empty-element tag. This is simply because I write the ">" character lazily, holding it back until the next event arrives; if that event is endElement, the element is closed as an empty-element tag. This makes the code a bit messy, because you have to remember to write the ">" character before writing any text content or child elements.

Attributes to Elements

Finally, this little check box causes all attributes to be written as child elements instead. For example, when this check box is selected, the following XML:

... address='67 Seventh Av.' city='Salt Lake City' state='UT' zip='84152' contract='true' />

becomes the following (when indentation is also turned on):

998-72-3567

Ringer

Albert

801 826-0752

67 Seventh Av.

Salt Lake City

UT

84152

True

This is done simply with the following code:

    If (attrsToElements) Then
        For i = 0 To attributes.length - 1
            name = attributes.getQName(i)
            ' Leave namespace declarations alone
            If (name <> "xmlns" And Mid(name, 1, 6) <> "xmlns:") Then
                content(level) = CONTENT_ELEMENT
                Call WriteInd(level, content(level))
                ' Write the attribute as <name>value</name>
                outputStream.Write ("<" & FilterName(name) & ">")
                outputStream.Write (EscContent(attributes.getValue(i)))
                outputStream.Write ("</" & FilterName(name) & ">")
            End If
        Next
    End If
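The same transformation can be sketched with Python's standard `xml.sax` module (again, not the article's Visual Basic). The `AttrsToElements` and `attrs_to_elements` names are hypothetical, and the sketch assumes that namespace declarations should stay on the start tag as attributes, which the code above implies but does not show.

```python
import xml.sax
from xml.sax.saxutils import escape, quoteattr


def is_ns_decl(name):
    """True for xmlns and xmlns:prefix attributes."""
    return name == "xmlns" or name.startswith("xmlns:")


class AttrsToElements(xml.sax.ContentHandler):
    """Writes every non-namespace attribute as a child element."""

    def __init__(self):
        super().__init__()
        self.out = []

    def startElement(self, name, attrs):
        # Namespace declarations are kept as attributes (an assumption)
        kept = "".join(" %s=%s" % (k, quoteattr(v))
                       for k, v in attrs.items() if is_ns_decl(k))
        self.out.append("<%s%s>" % (name, kept))
        for k, v in attrs.items():
            if not is_ns_decl(k):
                self.out.append("<%s>%s</%s>" % (k, escape(v), k))

    def characters(self, text):
        self.out.append(escape(text))

    def endElement(self, name):
        self.out.append("</%s>" % name)


def attrs_to_elements(xml_bytes):
    """Return xml_bytes with attributes rewritten as child elements."""
    handler = AttrsToElements()
    xml.sax.parseString(xml_bytes, handler)
    return "".join(handler.out)
```

For instance, calling `attrs_to_elements` on a hypothetical `<author city='Salt Lake City' state='UT'/>` element yields an `author` element with `city` and `state` children.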

Conclusion

The Visual Basic SAX interfaces included in the July 2000 Microsoft XML Parser beta make it quite easy to write high-performance, streaming XML processing applications. I spent about a day writing these small samples and had a lot of fun doing it. I hope you have fun with SAX, too.

Chris Lovett is a program manager on the Microsoft XML team.
