Original link: http://blog.9cbs.net/zhaozexin/archive/2005/02/06/282333.aspx?Pending=true
A couple of days ago I noticed that Sina had finally opened an RSS channel. It was not easy; I had been waiting a long time. My goal is a simple portlet that reads Sina's RSS feed and displays the latest news entries each time it loads. I spent an afternoon searching SourceForge and Google for open-source Java RSS libraries, and there are quite a few (as an aside, SourceForge's search is really bad). After some simple filtering, three libraries looked promising: Rome, RSSUtils and RSSLib4J. My evaluation of each follows.

I. Rome

Rome is an open-source project on java.net, and the current version is 0.5. As for the name, its introduction cites "all roads lead to Rome", which is meant as an allusion to RSS. Rome was probably extracted by Sun from one of its own projects; the package and class names feel a lot like the J2SDK. It supports all versions of RSS as well as Atom 0.3 (Atom is a content syndication format similar to RSS). Rome itself provides the API and its implementation, and a separate Rome-Fetcher project is dedicated to retrieving RSS content, which is exactly what I need. Following the Fetcher example, parsing RSS is quite simple, as in this code fragment:

    FeedFetcher fetcher = new HttpURLFeedFetcher();
    SyndFeed feed = fetcher.retrieveFeed(feedUrl);
    // The title is transcoded from ISO8859-1 to UTF-8 (explained below).
    System.out.println(feedUrl + " has a title: "
            + new String(feed.getTitle().getBytes("ISO8859-1"), "UTF-8")
            + " and contains " + feed.getEntries().size() + " entries.");
    for (Iterator iter = feed.getEntries().iterator(); iter.hasNext();) {
        SyndEntry entry = (SyndEntry) iter.next();
        System.out.println("  " + entry.getTitle() + " [" + entry.getPublishedDate() + "]");
    }

Why the new String(feed.getTitle().getBytes("ISO8859-1"), "UTF-8") transcoding? When Rome parses the Sina News RSS, it tries to read the encoding from the URLConnection headers and otherwise always falls back to ISO8859-1. Sina's RSS response headers do not include any encoding information, so the text has to be transcoded.
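To make the transcoding trick above concrete, here is a minimal, JDK-only sketch. It assumes the feed bytes were really UTF-8 but were decoded as ISO-8859-1; the class name TranscodeDemo and the method repair are made up for illustration and are not part of Rome.

```java
import java.nio.charset.StandardCharsets;

class TranscodeDemo {
    // Hypothetical helper: undoes a wrong ISO-8859-1 decode of UTF-8 bytes.
    // ISO-8859-1 maps every byte to exactly one char, so getBytes() recovers
    // the original bytes unchanged, and they can then be decoded as UTF-8.
    static String repair(String wronglyDecoded) {
        byte[] rawBytes = wronglyDecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(rawBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Two Chinese characters, written as escapes so the source file
        // encoding does not matter.
        String original = "\u65b0\u6d6a";

        // Simulate what happens when UTF-8 bytes are read as ISO-8859-1.
        byte[] utf8Bytes = original.getBytes(StandardCharsets.UTF_8);
        String garbled = new String(utf8Bytes, StandardCharsets.ISO_8859_1);

        System.out.println("garbled length:  " + garbled.length());
        System.out.println("repaired equals original: " + repair(garbled).equals(original));
    }
}
```

The round trip only works because the wrong decode was ISO-8859-1, which loses no bytes. If the text had been decoded with a charset that replaces unknown bytes, the original data could not be recovered this way.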
One more thing: the entry's publication date (getPublishedDate() in the code above) also comes back null. Rome tries a variety of patterns to parse the date, and Sina's date format does conform to RFC 822. But Rome parses it with SimpleDateFormat and forgets that SimpleDateFormat's parsing depends on the locale. Since my default locale is China, SimpleDateFormat's parse method fails on the English date string. Adding Locale.setDefault(Locale.ENGLISH) before the code above fixes it, but it never feels right.

If you would rather not transcode, Rome also provides an XmlReader class that infers the encoding by examining the headers and the XML content. Modify the HttpURLFeedFetcher source like this:

    // Replace the InputStreamReader with XmlReader
    // InputStreamReader reader = new InputStreamReader(is, ResponseHandler.getCharacterEncoding(connection));
    XmlReader reader = new XmlReader(connection);
    SyndFeedInput input = new SyndFeedInput();
    SyndFeed feed = input.build(reader);

With that change, Chinese text displays correctly without transcoding. But when I then parsed the Baidu News RSS with the modified source, the underlying JDOM failed with an error saying the XML was malformed. Perhaps Rome's developers ran into the same problem, which is why the fetcher does not use XmlReader.

II. RSSUtils

RSSUtils is a toolkit. There is an article, "RSS Utilities: A Tutorial", devoted to displaying RSS content with its taglib, but I could only find the toolkit itself to download, so naturally I could not see its source code. Judging from the decompiled code, though, it was also written by an expert inside Sun: the design is elegant and the code is concise. It implements a handler that parses the XML content with SAX, and internally uses reflection and the JavaBean mechanism to construct the RSS element objects and assign their values. The code fragment is as follows:

    RssParser parser = new RssParserImpl();
    Rss rss = parser.parse(new URL(url));
    System.out.println(rss.getChannel().getTitle());
    for (Iterator iter = rss.getChannel().getItems().iterator(); iter.hasNext();) {
        Item item = (Item) iter.next();
        System.out.println("  " + item.getTitle() + " " + item.getPubDate());
    }

As shown above, the code is quite simple. No transcoding is needed, and the date is displayed correctly too (because it is not parsed at all; the string is returned as-is).
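If the raw pubDate string does need to become a Date, the locale pitfall described for Rome applies again. The sketch below shows one way to avoid changing the JVM-wide default: pass an explicit English locale to SimpleDateFormat. The class and method names are hypothetical, and the pattern is an assumption about a typical RFC 822 style date; it is not taken from Rome or RSSUtils.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

class PubDateParser {
    // Assumed pattern for dates such as "Sun, 06 Feb 2005 10:30:00 +0800".
    static final String RFC822_PATTERN = "EEE, dd MMM yyyy HH:mm:ss Z";

    // Parses with an explicit English locale, so day and month names are
    // matched in English regardless of the JVM default locale.
    static Date parsePubDate(String text) {
        try {
            return new SimpleDateFormat(RFC822_PATTERN, Locale.ENGLISH).parse(text);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Not an RFC 822 style date: " + text, e);
        }
    }

    // Reports whether the same pattern can parse the text under a given locale.
    static boolean parsesWith(Locale locale, String text) {
        try {
            new SimpleDateFormat(RFC822_PATTERN, locale).parse(text);
            return true;
        } catch (ParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String sample = "Sun, 06 Feb 2005 10:30:00 +0800";
        System.out.println("epoch millis: " + parsePubDate(sample).getTime());
        // The result for a Chinese locale depends on the locale data
        // shipped with the JDK in use.
        System.out.println("parses with Locale.CHINA: " + parsesWith(Locale.CHINA, sample));
    }
}
```

Compared with Locale.setDefault(Locale.ENGLISH), this keeps the change local to the one formatter, so the rest of the application still sees the user's locale.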
However, the toolkit was never formally released, and the code has some rough spots, such as printing to System.out, which is annoying. And if the RSS XML is missing some content, it also dumps a long printStackTrace, which is rather embarrassing. There is another big problem: when I use it to parse Baidu News, it fails immediately with:

    org.xml.sax.SAXParseException: Character conversion error: "Unconvertible UTF-8 character beginning with 0xB0"

I searched online, and the cause may be a minor incompatibility between Java's modified UTF-8 and standard UTF-8. For details, see "Supplementary Characters in the Java Platform".

III. RSSLib4J

RSSLib4J is a project on SourceForge. Its home page is http://sourceforge.net/projects/rsslib4j, and the latest version is 0.2. Only 0.2, yet its development status is already listed as stable. It also supports all RSS versions. RSSLib4J parses RSS the same way RSSUtils does. I read the source code: the design is simple, the readability is average, and it uses a lot of if statements. The code fragment is as follows:

    RSSHandler hand = new RSSHandler();
    RSSParser.parseXmlFile(new URL(url), hand, false);
    RSSChannel ch = hand.getRSSChannel();
    System.out.println(ch.getTitle());
    LinkedList lst = hand.getRSSChannel().getItems();
    for (int i = 0; i