Perl XML QuickStart: The Perl XML Interfaces
By Kip Hampton
April 18, 2001
Introduction
A recent flurry of questions to the Perl-XML mailing list points to the need for a document that gives new users a quick, how-to overview of the various Perl XML modules. For the next few months I will be devoting this column solely to that purpose.
The XML modules available from CPAN can be divided into three main categories: modules that provide unique interfaces to XML data (usually concerned with translating data between an XML instance and Perl data structures), modules that implement one of the standard XML APIs, and special-purpose modules that seek to simplify the execution of some specific XML-related task. This month we will be looking at the first of these, the Perl-specific XML interfaces.
use Disclaimer qw(:standard);
This is not an exercise in comparative performance benchmarking, nor is it my intention to suggest that any one module is inherently more useful than another. Choosing the right XML module for your project depends largely upon the nature of the project and your past experience. Different interfaces lend themselves to different kinds of tasks and to different kinds of people. My only goal is to offer working examples of the various interfaces by defining two simple tasks, and then showing how to achieve the same net result using each of the selected modules.
The Tasks
While the uses for XML are rich and varied, most XML-related tasks can be divided into two groups: those related to extracting data from existing XML documents, and those related to creating new XML documents using data from other sources. With this in mind, the examples that we will use for our module introductions will consist of extracting a specific set of data from an XML file, and marking up a Perl data structure in a specific XML format.
Task One: Extracting Information
First, consider the following XML fragment:
<?xml version="1.0"?>
<camelids>
  <species name="Camelus dromedarius">
    <common-name>Dromedary, or Arabian Camel</common-name>
    <physical-characteristics>
      <appearance>
        The dromedary camel is characterized by a long-curved
        neck, deep-narrow chest, and a single hump.
        ...
      </appearance>
    </physical-characteristics>
    <natural-history>
      <food-habits>
        The dromedary camel is an herbivore.
        ...
      </food-habits>
      <reproduction>
        The dromedary camel has a lifespan of about 40-50 years
        ...
      </reproduction>
      <behavior>
        With the exception of rutting males, dromedaries show
        very little aggressive behavior.
        ...
      </behavior>
      <habitat>
        The camels prefer desert conditions characterized by a
        long dry season and a short rainy season.
        ...
      </habitat>
    </natural-history>
    <conservation status="no special status">
      <detail>
        Since the dromedary camel is domesticated, the camel has
        no special status in conservation.
      </detail>
    </conservation>
  </species>
  ...
</camelids>
Now let's say that the complete document (available with this month's sample code) contains the same information for all the members of the Camelidae family, not just our friend the single-humped Dromedary Camel. To illustrate how each module might be used to extract a subset of the data stored in this document, we will write a tiny script that parses the camelids.xml document and, for each species found, prints a line to STDOUT containing that species' common name, Latin name (in parentheses), and conservation status. Having processed the entire document, the output of each script should yield the following result:
Bactrian Camel (Camelus bactrianus) endangered
Dromedary, or Arabian Camel (Camelus dromedarius) no special status
Llama (Lama glama) no special status
Guanaco (Lama guanicoe) special concern
Vicuna (Vicugna vicugna) endangered
Task Two: Creating an XML Document
To demonstrate how each of the selected modules may be used to create XML documents from other data sources, we will write a small script that marks up a simple Perl hash, containing URLs to a few cool camelid-related pages on the Web, as a simple XHTML document.
Here's the hash:
my %camelid_links = (
    one   => { url => 'http://www.online.discovery.com/news/picture/may99/photo20.html',
               description => 'Bactrian Camel in front of Great ' .
                              'Pyramids in Giza, Egypt.'},
    two   => { url => 'http://www.fotos-online.de/english/m/09/9532.htm',
               description => 'Dromedary Camel illustrates the ' .
                              'importance of accessorizing.'},
    three => { url => 'http://www.eskimo.com/~wallama/funny.htm',
               description => 'Charlie - biography of a narcissistic llama.'},
    four  => { url => 'http://arrow.colorado.edu/travels/other/turkey.html',
               description => 'A visual metaphor for the perl5-porters ' .
                              'list?'},
    five  => { url => 'http://www.galaonline.org/pics.htm',
               description => 'Many cool alpacas.'},
    six   => { url => 'http://www.thpf.de/suedamerikareise/galerie/vicunas.htm',
               description => 'Wild Vicunas in a scenic landscape.'}
);
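The writing samples later in this article load this hash with require "files/camelid_links.pl" followed by a call to get_camelid_data(). That file belongs to the downloadable sample code and is not reproduced here. As a rough sketch only, assuming the helper simply wraps the hash in a subroutine (and abbreviating the data to two entries), it could look something like this:

```perl
# Hypothetical contents of files/camelid_links.pl. This is an assumption;
# the real file ships with the article's sample code.
use strict;
use warnings;

sub get_camelid_data {
    # Abbreviated to two of the six entries for brevity.
    my %camelid_links = (
        five => { url         => 'http://www.galaonline.org/pics.htm',
                  description => 'Many cool alpacas.' },
        six  => { url         => 'http://www.thpf.de/suedamerikareise/galerie/vicunas.htm',
                  description => 'Wild Vicunas in a scenic landscape.' },
    );
    # Returned as a flat key/value list; the caller copies it into its own hash.
    return %camelid_links;
}

1;    # a file loaded with require must end with a true value
```

Returning the hash as a list keeps the calling code identical to the samples: my %camelid_links = get_camelid_data();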
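And here is an example of the document that we hope to create from that hash:
<?xml version="1.0"?>
<html>
  <body>
    <a href="http://www.eskimo.com/~wallama/funny.htm">Charlie -
      biography of a narcissistic llama.</a>
    <a href="http://www.online.discovery.com/news/picture/may99/photo20.html">Bactrian
      Camel in front of Great Pyramids in Giza, Egypt.</a>
    <a href="http://www.fotos-online.de/english/m/09/9532.htm">Dromedary
      Camel illustrates the importance of accessorizing.</a>
    <a href="http://www.galaonline.org/pics.htm">Many cool alpacas.</a>
    <a href="http://arrow.colorado.edu/travels/other/turkey.html">A visual
      metaphor for the perl5-porters list?</a>
    <a href="http://www.thpf.de/suedamerikareise/galerie/vicunas.htm">Wild
      Vicunas in a scenic landscape.</a>
  </body>
</html>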
It's important to note that while the resulting XML is indented for readability (as shown above), this sort of fine-grained whitespace handling is not part of our sample requirement. All we care about is that the resulting document is well-formed XML, and that it accurately reflects the data stored in our hash.
With our tasks defined, let's get straight to the code samples.
Samples of the Perl-Specific XML Interfaces
XML::Simple
Originally created to simplify the task of reading and writing config files in an XML format, XML::Simple translates data between XML documents and native Perl data structures with no intervening abstract interface. Elements and attributes are accessed using nested references.
Reading
use XML::Simple;

my $file = 'files/camelids.xml';
my $xs1  = XML::Simple->new();

# Parse the file into a nested Perl data structure.
my $doc = $xs1->XMLin($file);

# Species are keyed on their name attribute (the Latin name).
foreach my $key (keys %{ $doc->{species} }) {
    print $doc->{species}->{$key}->{'common-name'} . ' (' . $key . ') ';
    print $doc->{species}->{$key}->{conservation}->{status} . "\n";
}
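To make "nested references" concrete, here is a hand-written approximation of the structure that XMLin() builds for a single species, followed by the same kind of loop. The hash literal is an assumption based on XML::Simple's default options (the root element is dropped, and repeated elements are folded into a hash keyed on their name attribute); the real structure also carries the other child elements, which are left out here. Only core Perl is used, so the sketch runs without the module or the sample file.

```perl
use strict;
use warnings;

# Approximate shape of the XMLin() result for camelids.xml (an assumption,
# trimmed to the parts that the reading task needs).
my $doc = {
    species => {
        'Camelus dromedarius' => {
            'common-name' => 'Dromedary, or Arabian Camel',
            conservation  => {
                status => 'no special status',
                detail => 'Since the dromedary camel is domesticated, ...',
            },
        },
    },
};

# Build one output line per species: common name, Latin name, status.
my @lines;
foreach my $key (sort keys %{ $doc->{species} }) {
    my $species = $doc->{species}{$key};
    push @lines, "$species->{'common-name'} ($key) $species->{conservation}{status}";
}

print "$_\n" for @lines;
```

Seen this way, the Latin name is never looked up as a value; it is the hash key itself, which is why the reading script prints $key between the parentheses.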
Writing
use XML::Simple;

require "files/camelid_links.pl";
my %camelid_links = get_camelid_data();

my $xsimple = XML::Simple->new();

# NoAttr emits every hash key as a child element rather than an attribute.
print $xsimple->XMLout(\%camelid_links,
                       NoAttr  => 1,
                       XMLDecl => '<?xml version="1.0"?>');
Note that the requirements of the data-to-document task reveal one of XML::Simple's few weaknesses: it does not allow us to decide which keys in our hash should be returned as elements and which should be returned as attributes. The output from the sample above would be close to the requirement, but not close enough. For those cases where we prefer to manipulate the contents of an XML document using native Perl data structures, but need finer control over the output, a combination of XML::Simple and XML::Writer works nicely.
The following illustrates how to use XML::Writer to meet the output requirement.
use XML::Writer;

require "files/camelid_links.pl";
my %camelid_links = get_camelid_data();

my $writer = XML::Writer->new();

$writer->xmlDecl();
$writer->startTag('html');
$writer->startTag('body');

# One <a> element per hash entry: the URL becomes the href attribute
# and the description becomes the element's text content.
foreach my $item (keys %camelid_links) {
    $writer->startTag('a', 'href' => $camelid_links{$item}->{url});
    $writer->characters($camelid_links{$item}->{description});
    $writer->endTag('a');
}

$writer->endTag('body');
$writer->endTag('html');
$writer->end();
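Note that keys() returns hash keys in no guaranteed order, so the links produced by the loop above may come out in any sequence. The task as defined does not require a particular order, but if reproducible output is wanted, one option is to iterate over an explicit list of keys. A minimal sketch follows, using core Perl only; the @order array and the cut-down hash are hypothetical and not part of the original sample code.

```perl
use strict;
use warnings;

# Cut-down stand-in for %camelid_links (two entries, for illustration).
my %camelid_links = (
    one  => { url         => 'http://www.online.discovery.com/news/picture/may99/photo20.html',
              description => 'Bactrian Camel in front of Great Pyramids in Giza, Egypt.' },
    five => { url         => 'http://www.galaonline.org/pics.htm',
              description => 'Many cool alpacas.' },
);

# Hypothetical explicit ordering; keys missing from the hash are skipped.
my @order = qw(one two three four five six);
my @items = grep { exists $camelid_links{$_} } @order;

foreach my $item (@items) {
    # In the XML::Writer script, the startTag/characters/endTag calls go here.
    print "$item: $camelid_links{$item}->{url}\n";
}
```

The same loop header can replace the keys() loop in any of the writing samples in this article.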
XML::SimpleObject
XML::SimpleObject provides an object-oriented interface to XML data using accessor methods that are reminiscent of the Document Object Model.
Reading
use XML::Parser;
use XML::SimpleObject;

my $file = 'files/camelids.xml';

# XML::SimpleObject wraps the tree produced by XML::Parser's Tree style.
my $parser = XML::Parser->new(ErrorContext => 2, Style => "Tree");
my $xso    = XML::SimpleObject->new($parser->parsefile($file));

foreach my $species ($xso->child('camelids')->children('species')) {
    print $species->child('common-name')->value;
    print ' (' . $species->attribute('name') . ') ';
    print $species->child('conservation')->attribute('status');
    print "\n";
}
Writing
.
XML::TreeBuilder
The XML::TreeBuilder distribution ships with two modules: XML::Element, for creating or accessing the contents of XML element nodes, and XML::TreeBuilder, a factory package that simplifies the building of document trees from existing XML files.
Reading
use XML::TreeBuilder;

my $file = 'files/camelids.xml';
my $tree = XML::TreeBuilder->new();
$tree->parse_file($file);

# Walk every <species> element, wherever it appears in the tree.
foreach my $species ($tree->find_by_tag_name('species')) {
    print $species->find_by_tag_name('common-name')->as_text;
    print ' (' . $species->attr_get_i('name') . ') ';
    print $species->find_by_tag_name('conservation')->attr_get_i('status');
    print "\n";
}
Writing
use XML::Element;

require "files/camelid_links.pl";
my %camelid_links = get_camelid_data();

my $root   = XML::Element->new('html');
my $body   = XML::Element->new('body');
# The XML declaration is built as a processing-instruction pseudo-element.
my $xml_pi = XML::Element->new('~pi', text => 'xml version="1.0"');
$root->push_content($body);

foreach my $item (keys %camelid_links) {
    my $link = XML::Element->new('a', 'href' => $camelid_links{$item}->{url});
    $link->push_content($camelid_links{$item}->{description});
    $body->push_content($link);
}

print $xml_pi->as_XML;
print $root->as_XML();
XML::Twig
XML::Twig stands apart from the other Perl-only XML interfaces in that it combines an inventive Perlish interface with many of the features found in the standard XML APIs. For a more detailed introduction to XML::Twig, see the XML.com article "Using XML::Twig" listed under Resources.
Reading
use XML::Twig;

my $file = 'files/camelids.xml';
my $twig = XML::Twig->new();

$twig->parsefile($file);

my $root = $twig->root;

# <species> elements are direct children of the root element.
foreach my $species ($root->children('species')) {
    print $species->first_child_text('common-name');
    print ' (' . $species->att('name') . ') ';
    print $species->first_child('conservation')->att('status');
    print "\n";
}
Writing
use XML::Twig;

require "files/camelid_links.pl";
my %camelid_links = get_camelid_data();

my $root = XML::Twig::Elt->new('html');
my $body = XML::Twig::Elt->new('body');
$body->paste($root);

# Build each link element, then attach it as the last child of <body>.
foreach my $item (keys %camelid_links) {
    my $link = XML::Twig::Elt->new('a');
    $link->set_att('href', $camelid_links{$item}->{url});
    $link->set_text($camelid_links{$item}->{description});
    $link->paste('last_child', $body);
}

print qq|<?xml version="1.0"?>|;
$root->print;
These examples have illustrated the basic usage of the more generic Perl XML modules. My goal has been to provide just enough example code to give you a feel for what it is like to work with each of these modules. Next month we will look at those Perl modules that implement one of the standard XML interfaces; specifically, XML::DOM, XML::XPath, and the various SAX and SAX-like modules.
Resources
Download the sample code
A complete list of the XML modules available from CPAN
Perl-XML mailing list archives
Using XML::Twig