XML Watch: Planet Blog


Bringing developer communities together

Level: Introductory

Edd Dumbill (edd@xml.com), Editor, xmlhack.com, February 2004

Edd Dumbill explains how aggregating weblogs through RSS can improve communication within software developer communities, and how XML and RDF can be used to describe multiple communities.

Blogging is fun. Weblogs, a cross between a home page and a personal journal, have become one of the most popular ways to express yourself on the Web. They have grown up alongside RSS (an abbreviation expanded either as RDF Site Summary or as Really Simple Syndication). Whether delivered directly to a desktop RSS reader or through a site such as Blogdex, Meerkat, or Bloglines, weblog entries are one of the most widespread uses of syndication. RSS has been written about in many places on the Web; if you are not familiar with it, see the links in Resources. In fact, when I wrote in this column about the Redland RDF Toolkit, the sample code was a simple RSS 1.0 aggregator.

As I mentioned there, a personal aggregator is a very common way to consume RSS. Straw (see Resources) is a typical example; Figure 1 shows a screenshot. The benefit of choosing your own RSS feeds is that what you get is highly personalized. The drawback is that what you get is highly personalized: you never see anything beyond what you chose!

Figure 1. A personal RSS aggregator on the GNOME desktop

Recently, more and more developers working on open source projects have started their own weblogs. In general, these developers are far more interested in writing code than in updating Web pages. They have been drawn to blogging by the wide availability of quick blogging tools such as PyBlosxom and Movable Type. Many free software developers have also used Advogato for some time (see Resources); Advogato has a diary system. The trend is toward giving users more control over their own content and its presentation.

Reading the weblogs of developers on related projects is a good way to learn about progress, design decisions, and opportunities. It builds conversation, and it energizes and motivates a developer community that would otherwise be only loosely held together. It is not hard to see how the same approach could be used in organizations of all sizes.

Tracking these journals through a personal aggregator has drawbacks, however. The project's feeds are mixed in with every other source you follow, and unless the list of participants never changes, you will not notice when a new face starts writing. It is much better to read these journals as a group than to hunt down each one individually. For this reason, people wanting to foster a stronger sense of community around open source projects have begun to set up Web sites that aggregate developers' weblogs. The earliest example is Planet Gnome, whose contributors include many people from Red Hat and Novell/Ximian as well as many independent developers. Next came Monologue, which aggregates the weblogs of developers on Mono, the Novell/Ximian project to build a .NET runtime and C# compiler (see Resources).

Planet RDF

Encouraged by the buzz around these community journals, I got together with a few friends interested in RDF and Semantic Web technology to build a similar site devoted to the Semantic Web. The site is called Planet RDF, and you may enjoy reading the journals it collects. Figure 2 shows a screenshot of the site.

Figure 2. The Planet RDF Web site

Not surprisingly, Planet RDF is built entirely from XML and RDF technology. The rest of this article discusses the architecture of the aggregator, the format of its configuration file, and how to set up an aggregator of your own.

The aggregator's requirements are modest: a list of RSS feeds, an RSS parser, a database of entries, and a way of extracting and formatting the aggregated results. Because many people generously share their work on the Web, most of these components were already available. Matt Biddulph had written an RSS aggregator in 2003 for a different purpose. With so much ready-made, setting up the site took the development team (Matt Biddulph, Dave Beckett, and Phil McCarthy) only three hours, all of it spent on XSLT and CSS! I describe each component below.

RSS source list

To build a site like Planet RDF, the list of RSS sources needs to provide the following information for each contributor: a name, the weblog's URL, and the weblog's title.

We wanted to take Dave Beckett's list of Semantic Web weblogs (see Resources) and feed it into the aggregator, converting it with XSLT. A format for lists of weblog RSS feeds (sometimes called a blogroll) already exists: Outline Processor Markup Language, or OPML (see Resources). But that format supports only a title and a URL, so it does not really meet the requirements. Noting that much of the metadata we needed matches elements of the Friend of a Friend (FOAF) vocabulary, we settled on the format shown in Listing 1. For brevity, only two members are listed.

Listing 1. The RSS source list

    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/"
    xmlns:rss="http://purl.org/rss/1.0/"
    xmlns:dc="http://purl.org/dc/elements/1.1/">

Planet RDF

Joe Bloggs

Joe Blogs' Blog

Frederique Smith

Freddie's Blog
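A list in this style is also straightforward to read with general-purpose XML tooling. The following is a minimal sketch in current Python using the standard xml.etree.ElementTree module; it is not the Planet RDF code. The element nesting in SAMPLE (foaf:Group, foaf:member, foaf:Agent, foaf:weblog, foaf:Document, rdfs:seeAlso, rss:channel), the example.org URLs, and the helper name list_blogs are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace prefixes taken from the declarations in Listing 1.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "rss": "http://purl.org/rss/1.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}
RDF_ABOUT = "{%s}about" % NS["rdf"]

# Hypothetical sample: the nesting and the example.org URLs are assumptions.
SAMPLE = """\
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/"
    xmlns:rss="http://purl.org/rss/1.0/"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <foaf:Group>
    <foaf:name>Planet RDF</foaf:name>
    <foaf:member>
      <foaf:Agent>
        <foaf:name>Joe Bloggs</foaf:name>
        <foaf:weblog>
          <foaf:Document rdf:about="http://example.org/joe/">
            <dc:title>Joe Blogs' Blog</dc:title>
            <rdfs:seeAlso>
              <rss:channel rdf:about="http://example.org/joe/index.rdf"/>
            </rdfs:seeAlso>
          </foaf:Document>
        </foaf:weblog>
      </foaf:Agent>
    </foaf:member>
    <foaf:member>
      <foaf:Agent>
        <foaf:name>Frederique Smith</foaf:name>
        <foaf:weblog>
          <foaf:Document rdf:about="http://example.org/freddie/">
            <dc:title>Freddie's Blog</dc:title>
            <rdfs:seeAlso>
              <rss:channel rdf:about="http://example.org/freddie/index.rdf"/>
            </rdfs:seeAlso>
          </foaf:Document>
        </foaf:weblog>
      </foaf:Agent>
    </foaf:member>
  </foaf:Group>
</rdf:RDF>
"""


def list_blogs(xml_text):
    """Return one dict per member: name, weblog URL, title, and feed URL."""
    root = ET.fromstring(xml_text)
    blogs = []
    for agent in root.findall(".//foaf:member/foaf:Agent", NS):
        doc = agent.find("foaf:weblog/foaf:Document", NS)
        if doc is None:
            continue  # member without a weblog: nothing to aggregate
        channel = doc.find("rdfs:seeAlso/rss:channel", NS)
        blogs.append({
            "name": agent.findtext("foaf:name", default="", namespaces=NS),
            "weblog": doc.get(RDF_ABOUT),
            "title": doc.findtext("dc:title", default="", namespaces=NS),
            "feed": channel.get(RDF_ABOUT) if channel is not None else None,
        })
    return blogs


blog_list = list_blogs(SAMPLE)
for blog in blog_list:
    print(blog["name"], blog["feed"])
```

Because the path expressions name only the elements they need, extra child elements added to a member later are simply skipped. The price is that this approach depends on the exact nesting, which an RDF-aware reader would not.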

Although the format in Listing 1 carries some RDF-specific redundancy compared with plain XML, it is still easy to process with general-purpose XML tools such as XPath and XSLT. Note that the semantics of group membership are stated explicitly in the document, reusing existing vocabulary rather than reinventing it. This exploits one of the benefits of RDF: the format can be extended with new features while remaining backward compatible with existing RDF processors. (If you want to know why I think this is a wonderful thing, see my article "Sticking with it - RDF" in Resources.) Listing 2 shows some changes we may want to make in the future; they will not affect any RDF-aware software written for the earlier version of the format.

Listing 2. Extending the source list

Joe Bloggs

Joe Blogs' Blog

Joe Bloggs' Work Journal

Listing 2 shows the extensions. Software written for the format in Listing 1 should in fact process the second weblog entry as well. RDF-aware software written for Listing 1 simply ignores the additional homepage and image properties. Software that is not RDF-aware has to rely on assumptions about the cardinality and order of elements, so it usually uses XPath expressions to find the data it needs.

The blog list is easily processed with the Redland RDF Toolkit (see Resources). Listing 3 shows two methods of the Foaf class in bloginfo.py (see Resources for a link to the code).

Listing 3. Reading information from the FOAF blog list in bloginfo.py

    from __future__ import generators
    from RDF import *

    class Foaf:
        def __init__(self, url):
            # Parse the blog list at url into an in-memory RDF model
            self.model = Model()
            Parser().parse_into_model(self.model, url)
            # Property nodes used when querying the model
            self.name = Node(uri_string="http://xmlns.com/foaf/0.1/name")
            self.nick = Node(uri_string="http://xmlns.com/foaf/0.1/nick")
            self.title = Node(uri_string="http://purl.org/dc/elements/1.1/title")

        def blogs(self):
            # None acts as a wildcard: match every foaf:weblog statement
            statement = Statement(subject=None,
                predicate=Node(uri_string="http://xmlns.com/foaf/0.1/weblog"),
                object=None)
            for i in self.model.find_statements(statement):
                yield i.object
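The query in blogs() works by pattern matching: a statement template is built in which None stands for "anything", and the model returns every statement that fits. The idea can be illustrated without Redland. The sketch below uses plain tuples as triples; TripleStore, its find_statements signature, the blank-node labels, and the example.org URLs are hypothetical stand-ins, not Redland's API.

```python
FOAF_WEBLOG = "http://xmlns.com/foaf/0.1/weblog"
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"


class TripleStore:
    """Toy in-memory store of (subject, predicate, object) tuples."""

    def __init__(self, triples):
        self.triples = list(triples)

    def find_statements(self, subject=None, predicate=None, obj=None):
        """Yield every triple matching the pattern; None is a wildcard."""
        pattern = (subject, predicate, obj)
        for triple in self.triples:
            if all(p is None or p == t for p, t in zip(pattern, triple)):
                yield triple


def blogs(store):
    """Yield the object of every foaf:weblog statement in the store."""
    for _subject, _predicate, obj in store.find_statements(predicate=FOAF_WEBLOG):
        yield obj


store = TripleStore([
    ("_:joe", FOAF_NAME, "Joe Bloggs"),
    ("_:joe", FOAF_WEBLOG, "http://example.org/joe/"),
    ("_:freddie", FOAF_NAME, "Frederique Smith"),
    ("_:freddie", FOAF_WEBLOG, "http://example.org/freddie/"),
])
weblogs = list(blogs(store))
print(weblogs)
```

A member with two foaf:weblog statements would simply yield two results, so a query of this kind copes with additional weblog entries without any change.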

The constructor __init__ takes the URL of the blog list as its parameter and parses it into an RDF model. The blogs() method then finds every weblog in the list by returning the object of each statement whose predicate is foaf:weblog.

RSS parser

Many kinds of RSS parser are available on the Web, ranging from the very strict (such as the RDF parser used in my article "Tracking provenance of RDF data") to the very liberal, which accept almost any input and make the best of it (RSS parsing enthusiasts hold that "liberal" means "practical"). Among the liberal parsers, we chose Mark Pilgrim's feed parser, which is written in our programming language of choice, Python (see Resources for more information). As Listing 4 shows, it is quite simple to use.

Listing 4. Processing RSS feeds with Mark Pilgrim's RSS parser

    import rssparser

    # etag and modified are set to the values we found when
    # we last polled this RSS feed
    data = rssparser.parse(rss, etag=etag, modified=modified,
        agent='planet rdf aggregator 0.1; http://planet.rdfhack.com/')

    # data is a dictionary; the feed's entries are under the 'items' key
    for item in data['items']:
        print(item['title'])
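Once items have been parsed in this way, an aggregator needs a stable key for each one so that it can tell new entries from ones it has already stored. The following is a minimal sketch, assuming the key is a hash of the feed URL combined with the item's link; the exact recipe used by Planet RDF may differ. The function names, the use of SHA-1, and the dictionary used as a store are all hypothetical.

```python
import hashlib


def entry_key(feed_url, item):
    """Return a stable key built from the feed URL and the item's link.

    The title and description are deliberately left out, so editing
    them does not change the key.
    """
    raw = "%s\n%s" % (feed_url, item.get("link", ""))
    return hashlib.sha1(raw.encode("utf-8")).hexdigest()


def store_new_entries(feed_url, items, seen):
    """Save items into the dict `seen`; return the keys that were new."""
    new_keys = []
    for item in items:
        key = entry_key(feed_url, item)
        if key not in seen:
            new_keys.append(key)
        seen[key] = item  # a later poll overwrites the stored copy
    return new_keys


seen = {}
feed = "http://example.org/joe/index.rdf"

# First poll: the entry is new.
first = store_new_entries(
    feed, [{"link": "http://example.org/joe/1", "title": "Hello"}], seen)

# Second poll: the title was edited, but the key is unchanged.
second = store_new_entries(
    feed, [{"link": "http://example.org/joe/1", "title": "Hello, world"}], seen)

# Third poll: the link itself changed, so the entry looks brand new.
third = store_new_entries(
    feed, [{"link": "http://example.org/joe/one", "title": "Hello, world"}], seen)
```

The third poll shows the weakness of any link-based key: a corrected URL produces a second entry, and the stale one stays in the store.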

The result of the parse() call in Listing 4 is a Python dictionary. All of the RSS entries are stored in an array under the key "items". The last two lines of Listing 4 iterate over the entries and print their titles.

Aggregation

The code for the aggregation part of Planet RDF is relatively simple. It polls each RSS feed in the list in turn, computes a key for each RSS entry, and stores all of the entries. Computing the key is a little tricky: it combines the URL of the RSS file with that of the entry. With this approach, a feed's provider can change the title or description of an entry without it being treated as a new one. Unfortunately, if a URL contains an error that is later corrected, the aggregator treats the corrected entry as new, and the erroneous entry goes undetected. One strategy for avoiding this is for producers to assign a fixed identifier to each RSS entry, but that practice is clearly not widely adopted.

You can see how the aggregation is carried out by reading the code (see Resources). More interesting than the XML plumbing, however, is how the final Web page is created. The output of the aggregator is actually an RSS file, created with rss.py (written by Mark Nottingham). This RSS file contains all of the entries from the participating weblogs, ordered from newest to oldest. To provide HTML output, an XSLT stylesheet is applied to it. The benefit of this approach is that a ready-made RSS file of the aggregated weblogs is available for users to add to their personal RSS readers.

Running the software

If you want to go further, you can download the software through the links in Resources and try it yourself. You need Python 2.2 or later and the Redland RDF framework installed (see Resources). Download the source files and then create a bloggers.rdf file similar to Listing 1.

You can test your file: modify the __main__ section of bloginfo.py so that it points to your RDF file, then run python bloginfo.py. You must also modify the main aggregator, chumpologica.py, setting the output directory and data directory at the top of the script. Then simply run python chumpologica.py bloggers.rdf. The aggregator runs and stores an RDF file in the directory you specified; this is an RSS 1.0 file, which you can then style with XSLT.

Conclusion

The Planet Gnome aggregator has shown that the output is most attractive when the full content of every weblog entry is included. By convention, this is done by carrying entity-escaped HTML in the body of the RSS description. Norm Walsh explains why this is a bad thing (see Resources). RSS 1.0 has a slightly better mechanism for this, called content:encoded (see Resources). The Planet RDF code accepts content:encoded wherever it is available, and cleans up abuse of the rss:description property by removing escaped HTML from it; that HTML is moved into the content:encoded property of the RSS 1.0 file. In most cases, the HTML is repaired with the excellent HTML Tidy tool (see Resources) so that the final output conforms to XHTML 1.0.

Some problems remain unsolved, because weblogs are oriented toward the work of individuals, such as:

Entries created by an organization or an ad-hoc group of members, or by systems such as bug trackers

With the growing number of Planet-style aggregators (as I write, Planet Apache and Planet SuSE are under active development), the variety of software for creating aggregation sites is growing too. At least three codebases are now available for creating such a site, from Monologue, Planet Gnome, and Planet RDF. It would be good if these three could be merged, or at least share a configuration file format such as the RDF blog list in Listing 1.

Beyond that, we want a higher-level way to describe each Planet, perhaps an über-aggregator: a Planetarium! (In fact Jeff Waugh, who created Planet Gnome, has registered the domain name "planetplanet.org"; you can take a look there.) I leave you with the code in Listing 5, which hints at how multiple Planets might be described: a processor can retrieve the membership list of each Planet by following the seeAlso link. If you choose RDF/XML, creating an über configuration file and aggregating all of the RDF blog lists is just as easy.

Resources

Read the Weblogs at Harvard Law home page, which includes Donna Wentworth's definition of weblogs: Web sites that are continually updated with links, commentary, and anything else the author likes, with new entries added at the top while older ones sink down the page.

Learn more about personal aggregators, which have proven to be the most popular way of reading weblogs. Examples include NetNewsWire for Mac OS X, SharpReader for Windows .NET, and Straw for the UNIX GNOME desktop.

Community sites built around open source development activity include Monologue, Planet Gnome, and Planet Debian.

Explore Outline Processor Markup Language (OPML), a hierarchical document outline format that is increasingly used for RSS subscription lists.

The team that built the Planet RDF site: Matt Biddulph coded the aggregator; Dave Beckett maintains the RDF blogger list and created the Redland RDF framework; Phil McCarthy did the design.

Read Dave Beckett's "Semantic Weblogs" list, which is converted by an XSLT stylesheet into the RDF source list used by the aggregator.

Many software developers use the Advogato blogging tool, which is a diary system.

Learn more about Friend-of-a-Friend (FOAF), a vocabulary for creating machine-readable home pages that describe people, the links between them, and the things they create and do. See Edd Dumbill's earlier developerWorks columns "Finding friends with XML and RDF" (June 2002) and "Support online communities with FOAF" (August 2002).

Mark Pilgrim's feed parser parses RSS liberally, and Mark Nottingham's rss.py is a good serializer for RSS.

Read "Sticking with it - RDF", which explains the advantages of using RDF to represent XML vocabularies.

Download the Redland RDF Toolkit, which includes a very good Python module for processing RDF (used in building Planet RDF). The article "Tracking provenance of RDF data" (July 2003) also covers the Redland toolkit.

The source code of Planet RDF, known as "Chumpologica", is available on Matt Biddulph's Web site.

The RSS 1.0 Content Module specifies a content:encoded element, which avoids harmful overloading of the description element in RSS.

In the XML.com article "Escaped Markup Considered Harmful", learn why embedding HTML in XML by escaping special characters is a bad idea.

We spend a lot of time checking the HTML we receive, so why not check the (X)HTML we output as well? HTML Tidy is an excellent tool for the task.

Read James Lewin's article "Content feeds with RSS 2.0" for a better understanding of this important format (developerWorks, December 2003).

Find more XML resources on the developerWorks XML zone, including previous installments of the XML Watch column.

Browse for discounted XML books at the Developer Bookstore.

IBM's DB2 database provides not only relational database storage but also XML-related tools such as the DB2 XML Extender, which acts as a bridge between XML and relational systems. To learn more about DB2, visit the DB2 Developer Domain.

Find out how to become an IBM Certified Developer in XML and related technologies.

About the author

Edd Dumbill is editor of XML.com and of the XML developer news site XMLhack. He is co-author of O'Reilly's Programming Web Services with XML-RPC, and co-founder of and adviser to the Pharmalicensing life science intellectual property exchange. Edd is also chair of the XML Europe conference. You can contact Edd at edd@xml.com.
