INSIDE MSXML PERFORMANCE
Chris Lovett Microsoft Corporation
February 21, 2000
Download the Source Code for this article (1.17MB)
Contents
· Metrics
· MSXML Features
· Working Set
· Megabytes Per Second
· Attributes vs. Elements
· First DOM Walk Working Set Delta
· createNode Overhead
· Walk vs. selectSingleNode
· Save
· Namespaces
· Free-Threaded Documents
· Delayed Memory Cleanup
· Virtual Memory
· IDispatch
· Scripting
· The Dreaded "//" Operator
· Prune the Search Tree
· Cross-Threading Models
· Conclusion
I definitely got the message from your online comments that we need more "novice-level" material and some real XML applications. However, this article was already in the pipeline, and it is intended for the advanced XML developer. (After all, this column is called "Extreme XML"!) That said, this article assumes you are familiar with XML, and with the Microsoft XML Parser (MSXML) in particular. See the MSDN XML Developer's Center for more information.
So, you're designing your XML-based Web application and you need to know what kind of performance to expect from your XML server. Obviously, this depends a lot on what processing you plan to do. It is hard to generalize, because so many factors can affect performance, such as the size of the XML documents, how much script code is used to process them, and how much output is generated. For example, major variables that can affect the performance of MSXML include:
· The kind of XML data
· The ratio of tags to text
· The ratio of attributes to elements
· The amount of discarded white space
To illustrate some of these variables, I'll use four sample data files. Shown below is a snippet from each file to show you what each looks like:
Ado.xml
This sample file is a persisted ADO Recordset object, and it is extremely attribute-heavy. Each attribute value is short, with little wasted white space, making it a data-dense document.
... phone='408 286-2428' address='22 Cleveland Av. #14' city='San Jose' state='CA' zip='95128' contract='True' name='systypes' id='4' uid='1' type='S' userstat='0' sysstat='113' indexdel='0' schema_ver='1' refdate='1900-01-01T00:00:00' crdate='1996-04-03T03:38:57.387000000' version='0' deltrig='0' instrig='0' updtrig='0' seltrig='0' category='0' cache='0'/>
Hamlet.xml
This sample file consists of Shakespeare's play "Hamlet." The file is a well-balanced combination of text and element markup, with no attributes.
Ot.xml
This sample file contains the entire Old Testament. Each tag is only one or two characters long, which keeps the ratio of tags to text very low.
... In the beginning God created the heaven and the earth. ...
Northwind.xml
This sample file contains a portion of the Northwind database that ships with Microsoft Access. It uses elements instead of attributes, has a high tag-to-text ratio, and has a lot of extra white space.
Another major factor is whether the file is stored on disk as UTF-8 or UCS-2. For most XML documents in English, UTF-8 is half the size of UCS-2, because ASCII characters take a single byte in UTF-8 instead of two. But this is not true for all languages. For some Asian languages, UTF-8 is actually larger than UCS-2, because it can expand to three bytes per character in the worst case. To be fair, the best format to use for measuring performance is UCS-2 on disk, so that the numbers are more globally meaningful.
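The size difference is easy to check for yourself. The following Python sketch is illustrative only: the function name encoded_sizes and the sample strings are not from the article, and it assumes that UTF-16LE without a byte-order mark can stand in for UCS-2, which holds only for characters in the Basic Multilingual Plane.

```python
def encoded_sizes(text: str) -> tuple[int, int]:
    """Return (UTF-8 bytes, UCS-2 bytes) for text in the Basic Multilingual Plane."""
    # Python has no "ucs-2" codec. For BMP characters, UTF-16LE without a
    # byte-order mark yields the same bytes as UCS-2 (2 bytes per character).
    if any(ord(ch) > 0xFFFF for ch in text):
        raise ValueError("UCS-2 cannot represent characters outside the BMP")
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))


for label, text in [("English", "In the beginning"), ("Japanese", "東京都庁")]:
    utf8, ucs2 = encoded_sizes(text)
    print(f"{label}: UTF-8 = {utf8} bytes, UCS-2 = {ucs2} bytes")
# English: UTF-8 = 16 bytes, UCS-2 = 32 bytes
# Japanese: UTF-8 = 12 bytes, UCS-2 = 8 bytes
```

The English text doubles in UCS-2, while each of the four Japanese characters costs three bytes in UTF-8 but only two in UCS-2.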
The following table shows the UCS-2 file size, number of unique names, number of elements and attributes, number of text nodes, and amount of text content (in Unicode characters) for each of our sample files. It also shows a "tagginess factor," which is the percentage of the file taken up by element and attribute name characters.
Sample file     File size (bytes)  Unique names  Elements and attributes  Text nodes  Text content (characters)  Tagginess (%)
Ado.xml         2,171,812          53            63,722                   61,462      3,890                      18.7
Hamlet.xml      559,260                          6,637                    5,472       170,545                    5.9
Ot.xml          7,663,624          12            71,417                   47,302      3,236,900                  1.4
Northwind.xml   488,140            12            3,680                    2,761       31,155                     6.0
The number of unique names is interesting because MSXML "atomizes" element and attribute names, meaning it creates only one string object for each unique name and points to that object from each element or attribute that shares the same name. This is important because the names of elements and attributes are typically highly repetitive. For example, the Ado.xml sample actually contains 63,722 element and attribute names, which consume a total of 407,148 bytes of the overall file size.
This is a tag-to-file-size ratio of over 18 percent! But out of all these names, only 53 are unique. So instead of using 407 KB of memory to store the names, MSXML needs only a small amount of memory to store 53 strings.
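MSXML's internal name table is not documented in this article, so the following is only a minimal Python sketch of the idea, assuming a dictionary-based table. The NameTable class and name_metrics function are hypothetical. The sketch counts each element or attribute name once per node and ignores end tags, so its tagginess figure approximates, rather than reproduces, however the table above was computed.

```python
import xml.etree.ElementTree as ET


class NameTable:
    """Hypothetical name table: stores one shared string object per unique name."""

    def __init__(self) -> None:
        self._names: dict[str, str] = {}

    def atomize(self, name: str) -> str:
        # Return the stored object for an already-seen name; otherwise store this one.
        return self._names.setdefault(name, name)

    def __len__(self) -> int:
        return len(self._names)


def name_metrics(xml_text: str) -> tuple[int, int, float]:
    """Return (element and attribute names, unique names, tagginess %).

    Assumption: each name is counted once per element or attribute node
    (end tags ignored); tagginess = name characters / total characters.
    """
    root = ET.fromstring(xml_text)
    table = NameTable()
    occurrences = 0
    name_chars = 0
    for elem in root.iter():                   # document order, root included
        for name in [elem.tag, *elem.attrib]:  # element name, then attribute names
            table.atomize(name)
            occurrences += 1
            name_chars += len(name)
    return occurrences, len(table), 100.0 * name_chars / len(xml_text)


doc = "<rows><row city='San Jose' state='CA'/><row city='Oakland' state='CA'/></rows>"
print(name_metrics(doc))

# The Ado.xml figures quoted above: 407,148 bytes of names in a 2,171,812-byte file.
print(round(100 * 407_148 / 2_171_812, 1))  # 18.7
```

In the tiny sample, seven name occurrences collapse to four table entries. Applied to Ado.xml, the same scheme would map 63,722 name occurrences onto just 53 entries.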