Inside MSXML Performance

Chris Lovett
Microsoft Corporation

February 21, 2000

Download the Source Code for this article (1.17MB)

Contents

· Metrics

· MSXML Features

· Working Set

· Megabytes Per Second

· Attributes vs. Elements

· First DOM Walk Working Set Delta

· createNode Overhead

· Walk vs. selectSingleNode

· Save

· Namespaces

· Free-Threaded Documents

· Delayed Memory Cleanup

· Virtual Memory

· IDispatch

· Scripting

· The Dreaded "//" Operator

· Prune the Search Tree

· Cross-Threading Models

· Conclusion

I definitely got the message from your online comments that we need more "novice-level" material and some real XML applications. However, this article was already in the pipeline and is intended for the advanced XML developer. (After all, this column is called "Extreme XML"!) That said, this article assumes you are familiar with XML and with the Microsoft XML Parser (MSXML) in particular. See the MSDN XML Developer's Center for more information.

So, you're designing your XML-based Web application and you need to know what kind of performance to expect from your XML server. Obviously, this depends a lot on what processing you plan to do. It is hard to generalize, because there are too many factors that can affect performance, such as the size of the XML documents, how much script code is used to process them, and how much output is generated.

For example, major variables that can affect the performance of MSXML include:

· The kind of XML data

· The ratio of tags to text

· The ratio of attributes to elements

· The amount of discarded white space
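The variables listed above can be estimated for any document before benchmarking it. The following is a minimal sketch in Python using the standard `xml.sax` module, not MSXML itself; the `ShapeCounter` class, the `measure` helper, and the sample string (including its `city` attribute) are made up for illustration.

```python
import xml.sax


class ShapeCounter(xml.sax.ContentHandler):
    """Collects rough counts for the document 'shape' variables."""

    def __init__(self):
        super().__init__()
        self.elements = 0
        self.attributes = 0
        self.text_chars = 0        # character data that contains real text
        self.whitespace_chars = 0  # whitespace-only runs between tags

    def startElement(self, name, attrs):
        self.elements += 1
        self.attributes += len(attrs)

    def characters(self, content):
        # The parser may deliver text in several chunks; totals still add up.
        if content.strip():
            self.text_chars += len(content)
        else:
            self.whitespace_chars += len(content)


def measure(xml_text):
    """Parse xml_text and return the populated counter."""
    counter = ShapeCounter()
    xml.sax.parseString(xml_text.encode("utf-8"), counter)
    return counter


# Hypothetical sample, loosely modeled on an element-heavy order record.
sample = (
    "<Item>\n"
    "  <OrderID>10326</OrderID>\n"
    "  <ShipAddress city='Madrid'>C/ Araquil, 67</ShipAddress>\n"
    "</Item>"
)
stats = measure(sample)
attrs_per_element = stats.attributes / stats.elements
print(stats.elements, stats.attributes, stats.text_chars, stats.whitespace_chars)
```

Counts like these only describe the input. They say nothing about how a particular parser will behave, but they make it easier to compare two documents on the same terms.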

To illustrate some of these variables, I'll use four sample data files. Shown below is a snippet from each file to show you what each looks like:

ADO.XML

This sample file is a persisted ADO Recordset object and is extremely attribute heavy. Each attribute value is short, with little wasted white space, making it a data-dense document.

phone='408 286-2428' address='22 Cleveland Av. #14' city='San Jose' state='CA'
zip='95128' contract='True' name='systypes' id='4' uid='1' type='S' userstat='0'
sysstat='113' indexdel='0' schema_ver='1' refdate='1900-01-01T00:00:00'
crdate='1996-04-03T03:38:57.387000000' version='0' deltrig='0' instrig='0' updtrig='0' seltrig='0' category='0' cache='0' />

Hamlet.xml

This sample file consists of Shakespeare's play "Hamlet." The file is a well-balanced combination of text and element markup, with no attributes.

<TITLE>SCENE I. Elsinore. A platform before the castle.</TITLE>
<STAGEDIR>FRANCISCO at his post. Enter to him BERNARDO</STAGEDIR>
<SPEECH>
<SPEAKER>BERNARDO</SPEAKER>
<LINE>Who's there?</LINE>
</SPEECH>

Ot.xml

This sample file contains the entire Old Testament. The tags are short, typically one or two characters, which lowers the ratio of tags to text.

<book>
<bktlong>The First Book of Moses, called Genesis.</bktlong>
<bktshort>Genesis</bktshort>
<chapter> <chtitle>Chapter 1</chtitle>
<v> <vn>1</vn> <p>In the beginning, God created the heaven and the earth.</p> </v>
...

Northwind.xml

This sample file contains a portion of the Northwind database that ships with Microsoft Access. It uses elements instead of attributes, has a high tag-to-text ratio, and has a lot of extra white space.

<ORDERIDS>
<Item>
<OrderID>10326</OrderID>
<OrderDate>11/10/94</OrderDate>
<ShipAddress>C/ Araquil, 67</ShipAddress>
</Item>
...

Another major factor is whether the original file is stored as UCS-2. For most XML documents in English, UTF-8 is half the size of UCS-2 because the Latin characters compress down to a single byte in UTF-8. But this is not true for all languages. For some Asian languages, UTF-8 is actually larger than UCS-2, because it can expand to three bytes per character in the worst case. To be fair, the best format to use for measuring performance is UCS-2 on disk, so that the numbers are more globally meaningful.
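The size difference between the two encodings is easy to check. The sketch below uses Python rather than MSXML and assumes that `utf-16-le` stands in for UCS-2, which holds for characters in the Basic Multilingual Plane, where the two encodings produce identical bytes. The `encoded_sizes` helper and the Japanese sample line are made up for illustration.

```python
def encoded_sizes(text):
    """Return (utf8_bytes, ucs2_bytes) for the same text, without a byte-order mark."""
    # Assumption: utf-16-le equals UCS-2 for the BMP-only text used here.
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))


# ASCII-only markup and text: one byte per character in UTF-8, two in UCS-2.
english = "<LINE>Who's there?</LINE>"

# Hypothetical Japanese sample: each kana or kanji takes three bytes in UTF-8
# but only two in UCS-2, while the ASCII tag punctuation goes the other way.
japanese = "<行>そこにいるのは誰だ</行>"

english_utf8, english_ucs2 = encoded_sizes(english)
japanese_utf8, japanese_ucs2 = encoded_sizes(japanese)
print(english_utf8, english_ucs2)
print(japanese_utf8, japanese_ucs2)
```

For mixed documents the outcome depends on how much of the file is ASCII markup and how much is non-Latin text, which is one reason a fixed-width on-disk format gives more comparable numbers.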
The following table shows the UCS-2 file sizes, number of unique names, number of elements and attributes, number of text nodes, and amount of text content (in Unicode characters) for each of our sample files. It also shows a "tagginess factor," which is the ratio of element and attribute name characters to the other characters in the file.

Sample          File size (bytes)   Unique names   Elements and attributes   Text nodes   Text content (characters)   Tagginess (percent)
ADO.XML         2,171,812           53             63,722                    61,462       3,890                       18.7
Hamlet.xml      559,260             (unreadable)   6,637                     5,472        170,545                     5.9
Ot.xml          7,663,624           12             71,417                    47,302       3,236,900                   1.4
Northwind.xml   488,140             12             3,680                     2,761        31,155                      6.0

The number of unique names is interesting because MSXML "atomizes" element and attribute names, meaning it creates only one string object for each unique name and points to that object from each element or attribute that shares the same name.
This is important because the names of elements and attributes are typically highly repetitive. For example, the Ado.xml sample actually contains 63,722 element and attribute names, which consume a total of 407,148 bytes of the overall file size. This is a tag-to-file size ratio of over 18 percent! But out of all these names, only 53 unique names remain. So instead of using 407 KB of memory to store the names, only a small amount of memory is required.
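The idea behind name atomization can be sketched in a few lines. This is an illustration in Python of the general technique (a name table that hands out one shared string per distinct name), not a description of how MSXML implements it internally; the `NameTable` class and the sample names are hypothetical. The last lines recompute the Ado.xml tag-to-file-size ratio from the two byte counts quoted above.

```python
class NameTable:
    """Hands out one shared string object per distinct name."""

    def __init__(self):
        self._names = {}

    def atomize(self, name):
        # Return the stored string if an equal one exists, else store this one.
        return self._names.setdefault(name, name)

    def __len__(self):
        return len(self._names)


table = NameTable()

# Build equal-but-separately-created strings, as a parser would when it
# slices each name out of the input buffer.
raw_names = ["".join(["ci", "ty"]), "".join(["c", "ity"]), "".join(["zi", "p"])]
atoms = [table.atomize(name) for name in raw_names]

# Both "city" occurrences now refer to the same object; only two names are stored.
shared = atoms[0] is atoms[1]
unique_count = len(table)

# Figures quoted in the article for Ado.xml: name bytes and total file bytes.
tag_ratio_percent = 407_148 / 2_171_812 * 100
print(shared, unique_count, round(tag_ratio_percent, 1))
```

With a table like this, the memory spent on names grows with the number of distinct names rather than with the number of elements and attributes, which is the effect the paragraph above describes.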