[Old] 2004-1-27 13:53:44 Software Technology Frontier: About XML and RSS (1) - A Preliminary Analysis of Newz Crawler

zhaozj2021-02-16  52

Newz Crawler is not easy to use: its Chinese support is poor, headlines often come up empty, and channel titles get changed as well.

It looks like I will have to write one myself.

I studied its data format last night, and it is relatively simple. Each .ncw file corresponds to a top-level folder; its content lists the channels in that folder and their corresponding file names. For example: News.ncw

Each channel corresponds to a file in the feed directory, for example {17C1E396-1C2F-4E34-B106-7940E3933454}.ncn

The content of xxx.ncn is the title and offline data.

If I do write one, there are several questions that need careful consideration:

1. The data storage format. If the amount of data is quite large, for example when offline data is kept, is storing it all in a single file appropriate?

2. How to deal with new posts. This may not matter much for a blog, but it matters more for a forum, where it directly affects how lively the forum is and how efficiently questions get answered.

3. How to support existing forums while changing the existing code as little as possible. Existing forums fall roughly into two styles. One is flat: all replies follow the original post and are related only by time. The other is a tree: any reply can itself be replied to, so all the posts form a tree. Personally I think the latter is better and closer to natural thinking habits, but it is much more troublesome to implement.

4. Parsing and implementing the RSS protocol is only a matter of workload; there is no technical difficulty.
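
As a rough illustration of points 2 and 4, here is a minimal sketch in Python of reading an RSS 2.0 document and picking out the posts that have not been seen before. It is not how Newz Crawler works. The sample feed, the function names parse_items and new_items, and the choice of <guid> (falling back to <link>) as a post's identity are all assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed, used only for illustration.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example channel</title>
    <item>
      <title>First post</title>
      <link>http://example.com/1</link>
      <guid>http://example.com/1</guid>
    </item>
    <item>
      <title>Second post</title>
      <link>http://example.com/2</link>
    </item>
  </channel>
</rss>"""


def parse_items(feed_xml):
    """Return (channel_title, list of item dicts) from an RSS 2.0 string."""
    root = ET.fromstring(feed_xml)
    channel = root.find("channel")
    items = []
    for item in channel.findall("item"):
        title = item.findtext("title", default="")
        link = item.findtext("link", default="")
        # Assumption: identify a post by <guid>, falling back to <link>.
        key = item.findtext("guid") or link
        items.append({"key": key, "title": title, "link": link})
    return channel.findtext("title", default=""), items


def new_items(items, seen_keys):
    """Return items whose key is not in seen_keys, then mark them as seen."""
    fresh = [it for it in items if it["key"] not in seen_keys]
    seen_keys.update(it["key"] for it in fresh)
    return fresh


if __name__ == "__main__":
    seen = set()  # in a real reader this would be loaded from disk
    channel_title, items = parse_items(SAMPLE_FEED)
    for it in new_items(items, seen):
        print(channel_title, "|", it["title"], "|", it["link"])
```

Calling new_items a second time with the same seen set returns an empty list, which is the behaviour a reader needs in order to highlight only fresh posts. How the seen set is persisted goes back to the storage question in point 1.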

When reposting, please cite the original address: https://www.9cbs.com/read-18339.html
