Learn about XML

Dare Obasanjo

Microsoft Corporation

July 2003

Summary: Learn how the Extensible Markup Language (XML) facilitates universal data access. XML is a plain-text, Unicode-based metalanguage: a language for defining markup languages. It is not tied to any programming language, operating system, or software vendor. XML provides access to a wide range of technologies for manipulating, structuring, transforming, and querying data.

Contents

Introduction
Ubiquitous XML
XML 1.0 Syntax
Infoset and the XML Family of Technologies
Summary
More Information

Introduction

XML was originally envisioned as a language for defining new document formats for the World Wide Web. XML is derived from the Standard Generalized Markup Language (SGML) and can be considered a metalanguage: a language for defining markup languages. Both SGML and XML are text-based formats that use tags (text delimited by '<' and '>') to describe the structure of a document. Web developers will notice that XML resembles HTML, because both are derived from SGML.

As XML usage has grown, it has become widely recognized that XML is useful not only for describing new document formats for the Web but also for describing structured data. Examples of structured data include the information typically found in spreadsheets, program configuration files, and network protocols.

XML is preferable to earlier data formats because it can easily represent both tabular data (such as relational data from a database or a spreadsheet) and semi-structured data (such as a web page or a business document). Widely used existing formats such as comma-separated values (CSV) files handle tabular data well but handle semi-structured data poorly, while formats such as RTF are specialized for semi-structured text documents. This flexibility is why XML has been widely adopted as a lingua franca for information interchange.

Ubiquitous XML

Besides its ability to represent both structured and semi-structured data, XML has a number of other characteristics that make it a widely used data representation format. XML is extensible, platform independent, and supports internationalization because it is fully Unicode based. XML is a text-based format, so users can read and edit XML documents with standard text editing tools when necessary.

XML is extensible in several ways. First, unlike HTML, XML has no fixed vocabulary. Instead, users can define application-specific or industry-specific vocabularies in XML. Second, applications that process or consume XML formats are more resistant to changes in the structure of the XML they receive than applications that use other formats, as long as those changes are additive. For example, an application that relies on processing an element with a customer-id attribute typically does not break if a last-purchase-date attribute is added to that element. Such resilience is rare in other data formats and is a significant benefit of using XML.
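The resilience to additive changes described above can be sketched in a few lines of C#. This is only an illustration: the <customer> element name, the attribute values, and the class and method names are hypothetical and do not come from the article.

```csharp
using System;
using System.Xml;

public class ExtensibilityDemo {
    // Reads only the attribute this application cares about;
    // any other attributes on the element are simply never looked at.
    static string GetCustomerId(string xml) {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);
        return doc.DocumentElement.GetAttribute("customer-id");
    }

    public static void Main() {
        // Hypothetical original and extended versions of the same element.
        string before = "<customer customer-id='c42' />";
        string after  = "<customer customer-id='c42' last-purchase-date='2003-06-30' />";

        Console.WriteLine(GetCustomerId(before));
        Console.WriteLine(GetCustomerId(after));
    }
}
```

Both calls print c42: the code asks for customer-id by name, so the extra attribute does not affect it. A format parsed by column position would not usually tolerate such an addition as gracefully.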

XML is not tied to any programming language, operating system, or software vendor. In fact, it is fairly straightforward to produce or consume XML in a wide variety of programming languages. This platform independence makes XML very useful for achieving interoperability between different programming platforms and operating systems.
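As a rough illustration of how little code it takes to produce XML, the following C# sketch writes a small fragment with the .NET Framework XmlTextWriter class. The element names and values are assumptions chosen for this example, and other platforms offer comparable writer APIs.

```csharp
using System;
using System.IO;
using System.Xml;

public class WriteXmlDemo {
    public static void Main() {
        // Collect the generated markup in memory so it can be printed afterwards.
        StringWriter output = new StringWriter();
        XmlTextWriter writer = new XmlTextWriter(output);
        writer.Formatting = Formatting.Indented;

        // The writer emits matching start and end tags and escapes text content.
        writer.WriteStartElement("compact-disc");
        writer.WriteElementString("price", "16.95");
        writer.WriteElementString("artist", "Nelly");
        writer.WriteElementString("title", "Nellyville");
        writer.WriteEndElement();
        writer.Flush();

        Console.WriteLine(output.ToString());
    }
}
```

Because the result is plain text, any XML parser on any platform can consume it without knowing which language produced it.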

Many have realized the benefits of publishing data as XML, which has led to a proliferation of XML data sources. Information sources as diverse as business documents, databases, and business communications are being, or have been, converted to use XML as their representation format. Microsoft products such as Microsoft Office®, Microsoft SQL Server™, and the Microsoft .NET Framework enable end users and developers to produce and consume documents, network messages, and other data as XML.

XML 1.0 Syntax

As mentioned earlier, the W3C XML 1.0 recommendation describes a text-based format that uses a syntax similar to HTML to describe structured and semi-structured data.

Comparison of XML and HTML

Both HTML and XML documents consist of elements. Each element has a start tag (for example, <order>), an end tag (for example, </order>), and the information between the two tags (referred to as the element content). Elements can be annotated with attributes, which carry metadata about the element and its content.

There are, however, significant differences between HTML and XML. XML is case sensitive and HTML is not: in XML the start tags <Table> and <table> are different, whereas in HTML they are the same. Another difference is that XML introduces the concept of well-formedness. The well-formedness rules eliminate some of the ambiguity inherent in processing HTML: for example, all attribute values must be quoted, and every element must either have both a start tag and an end tag or be explicitly marked as an empty element. For a brief description of well-formedness, see section D.2 of the XML FAQ.
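The well-formedness rules can be checked with any conforming XML parser. The following is a minimal C# sketch that uses the .NET Framework XmlDocument class; the IsWellFormed helper and the sample strings are hypothetical and exist only for this illustration.

```csharp
using System;
using System.Xml;

public class WellFormedDemo {
    // Returns true if the parser accepts the text as well-formed XML.
    static bool IsWellFormed(string xml) {
        try {
            new XmlDocument().LoadXml(xml);
            return true;
        } catch (XmlException) {
            return false;
        }
    }

    public static void Main() {
        // Quoted attribute value, matching tags, explicit empty element.
        Console.WriteLine(IsWellFormed("<table border=\"1\"><tr /></table>"));
        // Unquoted attribute value.
        Console.WriteLine(IsWellFormed("<table border=1></table>"));
        // Start and end tag names differ in case.
        Console.WriteLine(IsWellFormed("<Table></table>"));
        // The <tr> element has no end tag and is not marked as empty.
        Console.WriteLine(IsWellFormed("<table><tr></table>"));
    }
}
```

The program prints True for the first string and False for the other three. An HTML browser would typically render all four without complaint, which is exactly the ambiguity the XML rules remove.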

The most significant difference between HTML and XML is that HTML has a predefined set of elements and attributes whose behavior is fully specified, whereas XML does not. Instead, document authors can create XML vocabularies specific to their application or business needs. XML vocabularies already exist for a wide range of industries and applications, from financial reporting (XBRL) and financial services (FpML) to web documents (XHTML) and network protocols (SOAP). Because authors do not have to specify how an XML document is rendered or displayed, they can focus on the semantic information relevant to their problem domain when creating a document. By separating content from presentation, XML vocabularies allow information and content to be reused on a much larger scale.

Anatomy of an XML Document

The following example is an XML document that represents a customer order at a music store. Note that the document expresses both rigidly structured data (the information describing the compact discs) and semi-structured data (the special instructions and comments about a particular customer), and that it does so very simply.

    <?xml version="1.0" encoding="iso-8859-1" ?>
    <?xml-stylesheet href="orders.xsl"?>
    <order id="ORD123456">
      <customer>
        <first-name>Dare</first-name>
        <last-name>Obasanjo</last-name>
        <address>
          <street>One Microsoft Way</street>
          <city>Redmond</city>
          <state>WA</state>
          <zip>98052</zip>
        </address>
      </customer>
      <items>
        <compact-disc>
          <price>16.95</price>
          <artist>Nelly</artist>
          <title>Nellyville</title>
        </compact-disc>
        <compact-disc>
          <price>17.55</price>
          <artist>Baby D</artist>
          <title>Lil Chopper Toy</title>
        </compact-disc>
      </items>
      <!-- Always remember to go the extra mile for the customer -->
      <special-instructions xmlns:html="http://www.w3.org/1999/xhtml/">
        <html:p>If customer is not available at the address then attempt
          leaving package at one of the following locations listed in order of
          which should be attempted first
          <html:ol>
            <html:li>Next door</html:li>
            <html:li>Front desk</html:li>
            <html:li>On doorstep</html:li>
          </html:ol>
          <html:b>Note</html:b> Remember to leave a note detailing where
          to pick up the package.
        </html:p>
      </special-instructions>
    </order>

The document begins with an optional XML declaration that specifies the version of XML in use, followed by the character encoding of the document. Next is an xml-stylesheet processing instruction that binds a style sheet to the document. The style sheet contains formatting instructions that a user application (for example, a web browser) can use to render the XML document in a more visually appealing way. Processing instructions are generally used to embed application-specific information in an XML document. For example, most applications that process the document above ignore the xml-stylesheet processing instruction, whereas an application that displays XML documents (such as a web browser) uses the information in the processing instruction to locate the style sheet that contains the instructions for displaying the document.

Unicode + Angle Brackets = Interoperability

The XML 1.0 syntax is text based and easy to parse, which makes XML a preferred data interchange format when interoperating across platforms.
XML parsers are available on a wide variety of commonly used operating systems, so disparate components on different platforms can easily standardize on XML as an interchange format when needed.

Because it is based on Unicode, XML is also well suited to sharing information on a global network such as the Web.

Infoset and the XML Family of Technologies

Representing data as XML brings significant benefits: the text-based XML syntax provides platform interoperability and extensibility. But this is only one of the benefits XML offers developers. Another major benefit is that users gain access to a wide range of technologies for manipulating, structuring, transforming, and querying data.

XML Information Set

The W3C XML Information Set recommendation describes an abstract representation of an XML document. The XML Information Set (Infoset) serves primarily as a set of definitions that other XML technologies use to formally describe the parts of an XML document they operate on. Several W3C XML technologies are described in terms of the XML Infoset, including SOAP 1.2, XML Schema, and XQuery.

The XML Infoset is a tree-based, hierarchical representation of an XML document. The infoset of an XML document consists of a number of information items, which are abstract representations of the components of an XML document. There are information items for the document, elements, attributes, processing instructions, comments, characters, notations, namespaces, unparsed entities, unexpanded entity references, and the document type declaration. The XML Infoset is the formally recommended mechanism for defining which information in an XML document is considered significant. For example, the Infoset does not distinguish between the two forms of an empty element. According to the XML Infoset, the following two representations

    <test></test>
    <test />

are the same.
Similarly, the kind of quotation marks used for attribute values is not significant, so according to the XML Infoset the elements

    <test attr='value' />
    <test attr="value" />

are the same. Appendix D of the W3C XML Information Set recommendation lists the aspects of the XML 1.0 syntax that the Infoset does not consider significant.

The XML Information Set recommendation also introduces the concept of synthetic infosets. A synthetic infoset is one created by means other than parsing the text of an XML document. Synthetic infosets lay the groundwork for using XML technologies to process non-XML data, provided that the data can be mapped to the XML Infoset. One example of a synthetic infoset is the ObjectXPathNavigator, which lets users query objects in the .NET Framework with XPath or transform them with XSLT.

Schema Languages

An XML schema language is used to describe the structure and content of an XML document. For example, a schema can specify that a document contains one or more compact-disc elements, each of which contains price, title, and artist child elements. During document interchange, an XML schema describes the contract between the producer and the consumer of the XML, because it describes what constitutes a valid XML message between the two. Although there are many schema languages for XML, from DTDs to XDR, the most authoritative today is the W3C XML Schema Definition language, usually abbreviated as XSD.

XSD is unique among XML schema languages because it attempts to extend the role of an XML schema beyond merely describing the contract for documents exchanged between two entities. XSD introduces the concept of the Post Schema Validation Infoset (PSVI).
A conformant XSD processor accepts an XML Infoset as input and, upon validation, converts it into a Post Schema Validation Infoset (PSVI). The PSVI is the original input Infoset augmented with new information items and with new properties added to existing information items. The W3C XML Schema recommendation lists the PSVI contributions.

Type annotations are an important class of PSVI contributions. Elements and attributes become strongly typed and have data type information associated with them. Strongly typed XML has many uses: it can be mapped to objects with technologies such as the .NET Framework XmlSerializer, mapped to relational tables with technologies such as SQLXML and the .NET Framework DataSet, or processed with XML query languages that take advantage of strong typing, such as XPath 2.0 and XQuery. The following example is a schema fragment that describes the items element of the sample XML document.

    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
      <xs:element name="items">
        <xs:complexType>
          <xs:sequence>
            <xs:element ref="compact-disc" minOccurs="0" maxOccurs="unbounded" />
          </xs:sequence>
        </xs:complexType>
      </xs:element>
      <xs:element name="compact-disc">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="price" type="xs:decimal" />
            <xs:element name="artist" type="xs:string" />
            <xs:element name="title" type="xs:string" />
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:schema>

Tree-Model APIs

A tree-model API presents an XML document as a tree of nodes, which is typically loaded into memory all at once. The most widely used tree-model API for XML is the W3C Document Object Model (DOM).
The DOM supports programmatically navigating, manipulating, and modifying an XML document.

The following example uses the XmlDocument class in the .NET Framework to get the artist name and title of the first compact-disc in the items element.

    using System;
    using System.Xml;

    public class Test {
        public static void Main(string[] args) {
            XmlDocument doc = new XmlDocument();
            doc.Load("test.xml");

            // The document element is items; its first child is the first compact-disc.
            XmlElement firstCD = (XmlElement) doc.DocumentElement.FirstChild;
            XmlElement artist = (XmlElement) firstCD.GetElementsByTagName("artist")[0];
            XmlElement title = (XmlElement) firstCD.GetElementsByTagName("title")[0];

            Console.WriteLine("Artist={0}, Title={1}", artist.InnerText, title.InnerText);
        }
    }

Cursor-Based APIs

An XML cursor API acts like a lens that moves over the XML document and focuses on one part of the document at a time. The XPathNavigator class in the .NET Framework is an XML cursor API.
An advantage of XML cursor APIs over tree-model APIs is that they do not require the entire document to be loaded into memory, which gives the producer of the XML an opportunity to optimize by generating parts of the document only when they are needed.

The following example uses the XPathNavigator class in the .NET Framework to get the artist name and title of the first compact-disc in the items element.

    using System;
    using System.Xml;
    using System.Xml.XPath;

    public class Test {
        public static void Main(string[] args) {
            XmlDocument doc = new XmlDocument();
            doc.Load("test.xml");
            XPathNavigator nav = doc.CreateNavigator();

            nav.MoveToFirstChild(); // move from the root node to the document element (items)
            nav.MoveToFirstChild(); // move from the items element to the first compact-disc element

            // move from the compact-disc element to the artist element
            nav.MoveToFirstChild(); // first child is price
            nav.MoveToNext();       // next sibling is artist
            string artist = nav.Value;

            // move from the artist element to the title element
            nav.MoveToNext();
            string title = nav.Value;

            Console.WriteLine("Artist={0}, Title={1}", artist, title);
        }
    }

Streaming APIs

Streaming APIs for XML let a user process an XML document while holding only the context of the node currently being processed in memory. Such APIs make it possible to process large XML files without a prohibitive memory footprint. There are two main classes of streaming APIs for XML processing: push-based XML parsers and pull-based XML parsers.

A push-based parser (such as SAX) moves through the XML data stream and "pushes" events to registered event handlers (callback methods) as XML nodes are encountered.
A pull-based parser (such as the XmlReader class in the .NET Framework) acts as a forward-only cursor over the XML data stream.

The following example uses the XmlReader class in the .NET Framework to get the artist name and title of the first compact-disc in the items element.

    using System;
    using System.Xml;

    public class Test {
        public static void Main(string[] args) {
            string artist = null, title = null;
            XmlTextReader reader = new XmlTextReader("test.xml");
            reader.MoveToContent(); // move from the root node to the document element (items)

            /* keep reading until we get to the first <artist> element */
            while (reader.Read()) {
                if ((reader.NodeType == XmlNodeType.Element) && reader.Name.Equals("artist")) {
                    artist = reader.ReadElementString();
                    title = reader.ReadElementString(); // title is the element after artist
                    break;
                }
            }
            reader.Close();

            Console.WriteLine("Artist={0}, Title={1}", artist, title);
        }
    }

XML Query

In some cases, extracting information from an XML document with an API is too cumbersome, either because the criteria for locating the data are complex or because the API does not expose the XML document in a way that suits a particular query.
XML query languages (such as XPath 1.0 and the forthcoming XQuery) provide rich mechanisms for extracting information from XML documents.

The following example shows how to use XPath to get the artist name and title of the first compact-disc in the items element.

    using System;
    using System.Xml.XPath;

    public class Test {
        public static void Main(string[] args) {
            XPathDocument doc = new XPathDocument("test.xml");
            XPathNavigator nav = doc.CreateNavigator();

            // The union returns matching nodes in document order: artist, then title.
            XPathNodeIterator iterator =
                nav.Select("/items/compact-disc[1]/artist | /items/compact-disc[1]/title");

            iterator.MoveNext();
            Console.WriteLine("Artist={0}", iterator.Current.Value);

            iterator.MoveNext();
            Console.WriteLine("Title={0}", iterator.Current.Value);
        }
    }

XML Transformation

Users often need to transform an XML document from one vocabulary to another. Sometimes this is to produce a print format or to present the document in a web browser; at other times documents received from an external entity need to be converted to a more familiar format.

XSLT is the premier XML transformation language. A transformation expressed in XSLT describes rules for transforming a source tree into a result tree. The transformation is achieved by associating patterns with templates. A pattern is an XPath expression that can be thought of as a regular expression that matches parts of an XML source tree, as opposed to matching parts of a string. Patterns are matched against elements in the source tree. On a successful match, a template is instantiated to create part of the result tree.
In constructing the result tree, elements from the source tree can be filtered and reordered, and arbitrary structure can be added.

The following XSLT style sheet transforms the items element into an XHTML web page that lists the compact discs.

    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"
                    xmlns="http://www.w3.org/1999/xhtml">
      <xsl:output method="xml" indent="yes"
          doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"
          doctype-public="-//W3C//DTD XHTML 1.0 Transitional//EN" />

      <xsl:template match="/">
        <html lang="en" xml:lang="en">
          <head>
            <title>Order Information - ORD123456</title>
          </head>
          <body>
            <table border="1">
              <tr><th>Artist</th><th>Title</th><th>Price</th></tr>
              <xsl:for-each select="items/compact-disc">
                <tr>
                  <td><xsl:value-of xmlns="" select="artist" /></td>
                  <td><xsl:value-of xmlns="" select="title" /></td>
                  <td><xsl:value-of xmlns="" select="price" /></td>
                </tr>
              </xsl:for-each>
            </table>
          </body>
        </html>
      </xsl:template>
    </xsl:stylesheet>

The style sheet generates the following XHTML document:

    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
    <html lang="en" xml:lang="en" xmlns="http://www.w3.org/1999/xhtml">
      <head>
        <title>Order Information - ORD123456</title>
      </head>
      <body>
        <table border="1">
          <tr>
            <th>Artist</th>
            <th>Title</th>
            <th>Price</th>
          </tr>
          <tr>
            <td>Nelly</td>
            <td>Nellyville</td>
            <td>16.95</td>
          </tr>
          <tr>
            <td>Baby D</td>
            <td>Lil Chopper Toy</td>
            <td>17.55</td>
          </tr>
        </table>
      </body>
    </html>

In a web browser it is displayed as shown below.

    Artist   Title             Price
    Nelly    Nellyville        16.95
    Baby D   Lil Chopper Toy   17.55

Summary

XML is not just a text format for describing documents; it is a mechanism for describing structured and semi-structured data, with a family of technologies for processing that data. Powerful abstractions such as the XML Infoset make it possible to use XML technologies to process data that is not textual XML, such as file systems, the Windows® registry, relational databases, and even programming language objects.
XML enables us to achieve universal data access.

More Information

XML in 10 Points
Lessons from the Component Wars: An XML Manifesto
XML Information Set Recommendation