Inside MSXML Performance (MSXML Performance Analysis) (3)

zhaozj2021-02-17  58

MSXML Features

Next, let's examine some important scenarios associated with the Document Object Model (DOM), including loading, saving, walking a DOM tree, and creating a new DOM tree in memory.

DOM

The MSXML Document Object Model ("Microsoft.XMLDOM", CLSID_DOMDocument, IID_IXMLDOMDocument) is the starting point for all XML processing within the MSXML parser. The fastest way to load an XML document is to use the default "rental" threading model (which means the DOM document can be used by only one thread at a time; it doesn't matter which thread) with validateOnParse, resolveExternals, and preserveWhiteSpace all disabled:

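// Create the DOM document (default "rental" threading model).
var doc = new ActiveXObject("Microsoft.XMLDOM");
// Turn off the optional work that slows down loading.
doc.validateOnParse = false;
doc.resolveExternals = false;
doc.preserveWhiteSpace = false;
doc.load("Test.xml");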

Working set

When using the DOM, the first metric to consider is the working set. Memory is used to load Msxml.dll and the other .dll files on which it depends. Some of these other .dll files are "delay loaded," which means the working set is not affected until that .dll is used. MSXML is a COM DLL, so you typically use the standard COM APIs (CoInitialize and CoCreateInstance) to create a new XML document object. The minimum working set for a simple Visual C++ 6.0 command-line application that uses COM is about one megabyte. (This includes the following .dll files: Ntdll.dll, Kernel32.dll, Ole32.dll, Rpcrt4.dll, Advapi32.dll, Gdi.dll, User32.dll, and Oleaut32.dll.) The first call to CoCreateInstance for an IXMLDOMDocument object loads Msxml.dll and Shlwapi.dll, which adds another 745 KB on top of this. Once all the .dll files are loaded, a new IXMLDOMDocument object is only about 8 KB.
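
As a quick arithmetic check, the approximate figures quoted above can be added up to give the working set of a process before any XML data is loaded. This is only a sketch: the constant names and the helper startupWorkingSetKb are made up for illustration, and "about one megabyte" is taken as 1024 KB.

```javascript
// Rough figures quoted in the text, in kilobytes.
var BASELINE_KB = 1024;     // minimal COM command-line application ("about one megabyte")
var MSXML_KB = 745;         // added when Msxml.dll and Shlwapi.dll are first loaded
var EMPTY_DOCUMENT_KB = 8;  // each new, empty IXMLDOMDocument object

// Hypothetical helper: approximate working set before any XML data is loaded.
function startupWorkingSetKb(documentCount) {
  return BASELINE_KB + MSXML_KB + EMPTY_DOCUMENT_KB * documentCount;
}

console.log(startupWorkingSetKb(1));   // 1777
console.log(startupWorkingSetKb(100)); // 2569
```

The fixed cost of loading the DLLs dominates; each extra empty document adds very little.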

The memory used by the XML data loaded into an XML document is anywhere from one to four times the size of the XML file on disk, depending on the "tagginess" of the data being loaded and whether the file was already in a Unicode format on disk. The following is a very rough formula for estimating the memory required for a given XML document:

ws = 32(n + t) + 12t + 50u + 2w

The following table describes the parts of the formula:

Part    Description
ws      The size of the working set, in bytes.
n       The number of element and attribute nodes in the tree. Each element, attribute, attribute value, and text content has one node (for example, an element with one attribute and text content, such as <element attribute="value">text</element>, yields four nodes).
t       The number of text nodes.
u       The number of unique element and attribute names.
w       The number of Unicode characters in text content (including attribute values). Note that single-byte ANSI text takes twice as many bytes once loaded into memory, because all text is stored as Unicode characters, which are two bytes each.

This assumes you do not set the preserveWhiteSpace flag; when you do, more nodes are created to preserve the white space between elements, using more memory.
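
The estimate can be worked through in a few lines of code. The sketch below reads the formula as a sum of four terms, ws = 32(n + t) + 12t + 50u + 2w; the function name, the property names, and the sample counts are invented for illustration and do not come from any of the sample files.

```javascript
// Very rough working-set estimate in bytes, reading the formula as:
// ws = 32(n + t) + 12t + 50u + 2w
function estimateWorkingSetBytes(counts) {
  var n = counts.elementAndAttributeNodes; // element and attribute nodes
  var t = counts.textNodes;                // text nodes
  var u = counts.uniqueNames;              // unique element and attribute names
  var w = counts.unicodeChars;             // Unicode characters of text content
  return 32 * (n + t) + 12 * t + 50 * u + 2 * w;
}

// Made-up counts, for illustration only.
var estimate = estimateWorkingSetBytes({
  elementAndAttributeNodes: 1000,
  textNodes: 600,
  uniqueNames: 20,
  unicodeChars: 15000
});
console.log(estimate); // 89400
```

With these counts the per-node overhead (51,200 bytes) outweighs the text itself (30,000 bytes), which is why "taggy" documents grow the most when loaded.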

For the sample data above, we see the following working set numbers (not including the initial startup working set):

Sample           Working set (bytes)   Ratio to file size
Ado.xml          4,689,920             2.6
Hamlet.xml       704,512               1.25
Ot.xml           10,720,000            1.39
Northwind.xml    249,856               0.51

An element-heavy XML document that contains a lot of white space between elements and is stored in Unicode can actually be smaller in memory than on disk. Files that have a more balanced ratio of elements to text content, such as Hamlet.xml and Ot.xml, end up at about 1.25 to 1.5 times the UCS-2 file size when in memory. Files that are very data-dense, such as Ado.xml, end up more than twice the disk-file size when loaded into memory.

Megabytes per second

For the megabytes-per-second metric, I loaded each sample file 10 times in a loop on a Pentium II 450-MHz dual-processor computer running Windows 2000, measured the load times, and averaged the results:

Sample           Load time (milliseconds)   MB/second   Nodes/second
Ado.xml          677                        3.2         184,909
Hamlet.xml       104                        5.3         116,432
Ot.xml           1063                       7.2         111,682
Northwind.xml    62                         7.8         103,887

Also shown in this table is a measure of nodes per second. Notice how this is inversely related to megabytes per second. The more nodes processed per buffer of input data, the slower the absolute throughput. Conversely, the more compact the nodes are (as in Ado.xml), the higher the nodes per second.
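
A measurement of this kind could be reproduced roughly as sketched below. This is not the harness used for the numbers above. The function names and the createDocument factory are hypothetical, setting async to false so that load blocks until parsing finishes is an added assumption, and a megabyte is taken as 1024 × 1024 bytes. Under Windows Script Host the factory could return new ActiveXObject("Microsoft.XMLDOM").

```javascript
// Times repeated loads of one file. createDocument is a caller-supplied
// factory that returns a fresh DOM document object for each iteration.
function timeLoads(createDocument, path, iterations) {
  var timesMs = [];
  for (var i = 0; i < iterations; i++) {
    var doc = createDocument();
    doc.async = false;              // assumption: make load() synchronous
    doc.validateOnParse = false;
    doc.resolveExternals = false;
    doc.preserveWhiteSpace = false;
    var start = new Date().getTime();
    doc.load(path);
    timesMs.push(new Date().getTime() - start);
  }
  return timesMs;
}

// Converts a list of load times into the metrics shown in the table.
function loadThroughput(fileBytes, nodeCount, timesMs) {
  var totalMs = 0;
  for (var i = 0; i < timesMs.length; i++) {
    totalMs += timesMs[i];
  }
  var averageMs = totalMs / timesMs.length;
  var seconds = averageMs / 1000;
  return {
    averageMs: averageMs,
    mbPerSecond: fileBytes / (1024 * 1024) / seconds,
    nodesPerSecond: nodeCount / seconds
  };
}

// Made-up inputs: a 2 MB file with 100,000 nodes and two timed loads.
var result = loadThroughput(2 * 1024 * 1024, 100000, [480, 520]);
console.log(result.averageMs, result.mbPerSecond, result.nodesPerSecond); // 500 4 200000
```

Passing the document factory in as a parameter keeps the timing loop independent of how the DOM object is created, so the arithmetic can be checked without MSXML installed.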

Attributes vs. Elements

You could conclude from this that attribute-heavy formats (such as that of Ado.xml) deliver more data per second than element-heavy formats. But this should not be the reason for you to switch everything to attributes. There are many other factors to consider in the decision to use attributes versus elements.
