Fresh meat
Technically, RSS is a well-formed XML document, so it can be processed with standard XML programming techniques. There are two main approaches: SAX (the Simple API for XML) and DOM (the Document Object Model).
A SAX parser traverses the XML document from start to finish, calling a specific function each time it encounters a particular type of tag. For example, it calls one function to process an opening tag, another to process a closing tag, and a third to process the data between the two. The parser's only job is to traverse the document sequentially; the functions it calls are responsible for processing the tags it finds. Once a tag has been processed, the parser moves on to the next element in the document, and the process repeats.
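The calling order described above can be made visible with a small trace. This is only a minimal sketch: the inline sample document and the $events array are assumptions made for illustration, and case folding is switched off so that tag names are recorded exactly as written.

```php
<?php
// Hypothetical inline sample; any well-formed XML would do.
$xml = '<item><title>Example</title></item>';

$events = [];  // records the order in which the handlers fire

$parser = xml_parser_create();
// 0 = keep tag names as written instead of upper-casing them.
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);

xml_set_element_handler(
    $parser,
    // called for every opening tag
    function ($p, $name, $attrs) use (&$events) {
        $events[] = "start:$name";
    },
    // called for every closing tag
    function ($p, $name) use (&$events) {
        $events[] = "end:$name";
    }
);
xml_set_character_data_handler(
    $parser,
    // called for the text found between tags
    function ($p, $data) use (&$events) {
        $events[] = "data:$data";
    }
);

// Passing true as the third argument marks this as the final chunk.
xml_parse($parser, $xml, true);

print_r($events);
```

The trace begins with the opening tags in document order and ends with the matching closing tags, which is the sequential behaviour the handlers in this article rely on.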
A DOM parser, on the other hand, reads the entire XML document into memory and converts it into a hierarchical tree structure. It also provides an API for accessing the different nodes of the tree (and the content attached to each node). Recursive processing combined with these API functions lets developers distinguish between different types of nodes (elements, attributes, character data, comments, and so on) and perform different operations depending on the node type and the node's depth in the document tree.
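As a rough illustration of the tree-based approach, the sketch below uses PHP's DOM extension (DOMDocument) to walk a tree recursively and react to each node's type and depth. The sample document and the walk() helper are hypothetical names introduced here, not part of the script discussed in this article.

```php
<?php
// Hypothetical inline sample containing an element, an attribute, text and a comment.
$xml = '<channel><!-- note --><item id="1"><title>Example</title></item></channel>';

$doc = new DOMDocument();
$doc->loadXML($xml);   // the whole document is now an in-memory tree

$lines = [];

// Recursively visit a node, indenting the output by tree depth.
function walk(DOMNode $node, int $depth, array &$lines): void
{
    $indent = str_repeat('  ', $depth);
    switch ($node->nodeType) {
        case XML_ELEMENT_NODE:
            $lines[] = $indent . 'element: ' . $node->nodeName;
            foreach ($node->attributes as $attr) {
                $lines[] = $indent . '  attribute: ' . $attr->name . '=' . $attr->value;
            }
            // Only elements are descended into; text and comment nodes are leaves.
            foreach ($node->childNodes as $child) {
                walk($child, $depth + 1, $lines);
            }
            break;
        case XML_TEXT_NODE:
            $lines[] = $indent . 'text: ' . $node->nodeValue;
            break;
        case XML_COMMENT_NODE:
            $lines[] = $indent . 'comment: ' . trim($node->nodeValue);
            break;
    }
}

walk($doc->documentElement, 0, $lines);
echo implode("\n", $lines), "\n";
```

Unlike the SAX approach, nothing is processed until the whole document has been loaded, which costs memory but makes the depth and position of every node available at any time.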
SAX and DOM parsers are available for nearly every language, including your favorite, PHP. I will use PHP's SAX parser to process the RDF example in this article. Of course, the DOM parser could be used just as easily.
Let us look at a simple example and keep it in mind. Below is the RDF file I will use, taken directly from http://www.freshmeat.net/:
<?xml version="1.0" encoding="ISO-8859-1"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns="http://purl.org/rss/1.0/" xmlns:dc="http://purl.org/dc/elements/1.1/">
<channel>
<link>http://freshmeat.net/</link>
<description>... and cross-platform open source software. Thousands of applications are
meticulously cataloged in the freshmeat.net database, and links to new
code are added daily.</description>
<items>
<rdf:Seq>
</rdf:Seq>
</items>
</channel>
<image>
<link>http://freshmeat.net/</link>
</image>
<item>
<link>http://freshmeat.net/releases/69583/</link>
</item>
<item>
<link>http://freshmeat.net/releases/69581/</link>
</item>
</rdf:RDF>
Below is a PHP script that parses this document and displays the data in it:
<?php
// XML file
$file = "FM-Releases.rdf";

// set up some variables for use by the parser
$currentTag = "";
$flag = "";

// create parser
$xp = xml_parser_create();

// set element and character data handlers
xml_set_element_handler($xp, "elementBegin", "elementEnd");
xml_set_character_data_handler($xp, "characterData");
// with case folding enabled, tag names reach the handlers in upper case
xml_parser_set_option($xp, XML_OPTION_CASE_FOLDING, true);

// read XML file
if (!($fp = fopen($file, "r")))
{
    die("Could not read $file");
}

// parse data
while ($xml = fread($fp, 4096))
{
    if (!xml_parse($xp, $xml, feof($fp)))
    {
        die("XML parser error: " .
            xml_error_string(xml_get_error_code($xp)));
    }
}

// destroy parser
xml_parser_free($xp);

// opening tag handler
function elementBegin($parser, $name, $attributes)
{
    global $currentTag, $flag;
    // export the name of the current tag to the global scope
    $currentTag = $name;
    // if within an item block, set a flag
    if ($name == "ITEM")
    {
        $flag = 1;
    }
}

// closing tag handler
function elementEnd($parser, $name)
{
    global $currentTag, $flag;
    $currentTag = "";
    // if exiting an item block, print a line and reset the flag
    if ($name == "ITEM")
    {
        echo "<hr />";
        $flag = 0;
    }
}

// character data handler
function characterData($parser, $data)
{
    global $currentTag, $flag;
    // if within an item block, print item data
    if (($currentTag == "TITLE" || $currentTag == "LINK" ||
        $currentTag == "DESCRIPTION") && $flag == 1)
    {
        echo "$currentTag: $data <br />";
    }
}
?>
Don't understand it? Don't worry, it is all explained below.
Capture the flag
The first thing this script does is set up some global variables:
// XML file
$file = "FM-Releases.rdf";

// set up some variables for use by the parser
$currentTag = "";
$flag = "";
The $currentTag variable holds the name of the element the parser is currently processing. You will soon see why it is needed.
My ultimate goal is to display each individual item in the channel, along with its link, so I also need to know when the parser enters and exits an <item> block. That is the job of the $flag variable.
The next step is to initialize the SAX parser and start parsing the RSS document.
// create parser
$xp = xml_parser_create();

// set element and character data handlers
xml_set_element_handler($xp, "elementBegin", "elementEnd");
xml_set_character_data_handler($xp, "characterData");
// with case folding enabled, tag names reach the handlers in upper case
xml_parser_set_option($xp, XML_OPTION_CASE_FOLDING, true);

// read XML file
if (!($fp = fopen($file, "r")))
{
    die("Could not read $file");
}

// parse data
while ($xml = fread($fp, 4096))
{
    if (!xml_parse($xp, $xml, feof($fp)))
    {
        die("XML parser error: " .
            xml_error_string(xml_get_error_code($xp)));
    }
}

// destroy parser
xml_parser_free($xp);
This code is simple, and the comments explain it well enough. The xml_parser_create() function creates a parser instance and assigns it to the handle $xp. Callback functions are then registered to handle opening tags, closing tags, and the character data between them. Finally, xml_parse(), combined with repeated fread() calls, reads the RDF file and parses it.
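The chunk-by-chunk parsing loop can also be tried without a file. The sketch below is an assumption-laden variant, not the article's script: parseChunks() is a hypothetical helper, str_split() stands in for the repeated fread() calls, and xml_get_current_line_number() is added so that an error message says where parsing stopped.

```php
<?php
// Hypothetical helper: feed XML to the parser chunk by chunk.
// Returns null on success, or an error message on failure.
function parseChunks(array $chunks): ?string
{
    $xp = xml_parser_create();
    $last = count($chunks) - 1;

    foreach ($chunks as $i => $chunk) {
        // The third argument must be true only for the final chunk.
        if (!xml_parse($xp, $chunk, $i === $last)) {
            return sprintf(
                '%s at line %d',
                xml_error_string(xml_get_error_code($xp)),
                xml_get_current_line_number($xp)
            );
        }
    }
    return null;
}

// A well-formed document and one with a mismatched closing tag,
// each cut into 5-byte pieces to imitate small reads.
$good = parseChunks(str_split("<a>\n<b>text</b>\n</a>", 5));
$bad  = parseChunks(str_split("<a>\n<b>text</a>\n</a>", 5));

var_dump($good);
var_dump($bad);
```

Because the parser keeps its state between calls, a tag split across two chunks is not a problem; only the flag on the last call tells it that the document is complete.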
Each time the parser encounters an opening tag in the document, it calls elementBegin().
// opening tag handler
function elementBegin($parser, $name, $attributes)
{
    global $currentTag, $flag;
    // export the name of the current tag to the global scope
    $currentTag = $name;
    // if within an item block, set a flag
    if ($name == "ITEM")
    {
        $flag = 1;
    }
}
This function receives the name and attributes of the current tag as its parameters. The tag name is assigned to the global variable $currentTag. If the tag is an opening <item> tag, $flag is set to 1.
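One detail worth checking is the form in which the tag name reaches the handler. With XML_OPTION_CASE_FOLDING enabled, as in this script, names are upper-cased before the handler sees them, so comparisons have to be made against "ITEM" rather than "item". The following sketch shows the difference; startTagNames() and the sample document are hypothetical.

```php
<?php
// Hypothetical helper: return the tag names passed to the opening tag
// handler for a tiny sample document, with case folding on (1) or off (0).
function startTagNames(int $caseFolding): array
{
    $names = [];

    $xp = xml_parser_create();
    xml_parser_set_option($xp, XML_OPTION_CASE_FOLDING, $caseFolding);
    xml_set_element_handler(
        $xp,
        function ($parser, $name, $attributes) use (&$names) {
            $names[] = $name;
        },
        function ($parser, $name) {
            // closing tags are ignored in this sketch
        }
    );
    xml_parse($xp, '<channel><item><title>t</title></item></channel>', true);

    return $names;
}

print_r(startTagNames(1)); // case folding on: CHANNEL, ITEM, TITLE
print_r(startTagNames(0)); // case folding off: channel, item, title
```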
Similarly, when the parser encounters a closing tag, the closing tag handler elementEnd() is called.
// closing tag handler
function elementEnd($parser, $name)
{
    global $currentTag, $flag;
    $currentTag = "";
    // if exiting an item block, print a line and reset the flag
    if ($name == "ITEM")
    {
        echo "<hr />";
        $flag = 0;
    }
}
The closing tag handler also receives the tag name as a parameter. It always resets $currentTag to an empty string, and if the tag is a closing </item> tag, it also resets $flag to 0.
So how is the character data between the tags handled? This is the interesting part. Let's take a look at the character data handler, characterData().
// character data handler
function characterData($parser, $data)
{
    global $currentTag, $flag;
    // if within an item block, print item data
    if (($currentTag == "TITLE" || $currentTag == "LINK" ||
        $currentTag == "DESCRIPTION") && $flag == 1)
    {
        echo "$currentTag: $data <br />";
    }
}
If you look at the parameters passed to this function, you will notice that it receives only the data between the opening and closing tags, with no indication of which tag the parser is currently processing. That is why the global variable $currentTag was introduced in the first place.
If the value of $flag is 1, that is, if the parser is currently inside an <item> block, and the current element is a title, link, or description, the data is printed.
The entire RDF document is processed in this sequential manner, and output is displayed for each <item> it contains.
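The same flag-and-current-tag idea can be packaged without global variables. The sketch below is one possible variation under stated assumptions, not the script above: parseItems() and the sample feed are hypothetical, closures replace the named handlers, and items are collected into an array instead of being echoed. Text is appended rather than assigned because the parser may deliver one element's character data in more than one call.

```php
<?php
// Hypothetical helper: return every <item> as an array of its
// title, link and description (keys are upper-case because of case folding).
function parseItems(string $xml): array
{
    $items = [];
    $current = null;   // the item being built, or null when outside <item>
    $tag = '';         // name of the element currently open

    $xp = xml_parser_create();
    xml_parser_set_option($xp, XML_OPTION_CASE_FOLDING, 1);

    xml_set_element_handler(
        $xp,
        function ($p, $name, $attrs) use (&$current, &$tag) {
            $tag = $name;
            if ($name === 'ITEM') {
                // entering an item block: plays the role of $flag = 1
                $current = ['TITLE' => '', 'LINK' => '', 'DESCRIPTION' => ''];
            }
        },
        function ($p, $name) use (&$items, &$current, &$tag) {
            $tag = '';
            if ($name === 'ITEM') {
                // leaving an item block: plays the role of $flag = 0
                $items[] = $current;
                $current = null;
            }
        }
    );
    xml_set_character_data_handler(
        $xp,
        function ($p, $data) use (&$current, &$tag) {
            // keep data only inside an item, and only for the three known fields
            if ($current !== null && isset($current[$tag])) {
                $current[$tag] .= $data;
            }
        }
    );

    xml_parse($xp, $xml, true);
    return $items;
}

// Hypothetical sample feed; the channel title is skipped because it is outside <item>.
$sample = '<channel><title>Feed</title>'
        . '<item><title>One</title><link>http://example.com/1</link></item>'
        . '<item><title>Two</title><link>http://example.com/2</link></item>'
        . '</channel>';

print_r(parseItems($sample));
```

Returning an array leaves the choice of output format to the caller, which may be convenient when the items are to be placed inside an existing page layout.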