RSS.py
Mark Nottingham's RSS.py is a Python library for RSS processing. It is
very complete and well written. It requires Python 2.2 and PyXML 0.7.1.
Installation is easy: just download the Python file from Mark's home
page and copy it somewhere on your PYTHONPATH.
Most users of RSS.py need concern themselves with only two of the classes
it provides: CollectionChannel and TrackingChannel. The latter seems the
more useful of the two. TrackingChannel is a data structure that contains
all the RSS data, indexed by the key of each item. CollectionChannel is a
similar data structure, but organized more the way RSS documents
themselves are, with the top-level channel information pointing to the
item details by means of hash values of the URLs. You will probably also
use the utility namespace declarations in the RSS.ns structure. Listing 1
is a simple script that downloads and parses an RSS feed of Python news
and prints out the information from each of the items as a simple
listing.
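To make the difference between the two layouts concrete, here is a plain-dictionary illustration written in modern Python 3. It does not use RSS.py and is not its real internal representation; the sample URLs, the tracking and collection variable names, the to_collection() helper, and the use of SHA-1 digests as the "hash values for the URLs" are all assumptions made for this sketch.

```python
import hashlib

# Hypothetical sample data; keys follow a (namespace, local name) convention.
RSS10 = "http://purl.org/rss/1.0/"
TITLE = (RSS10, "title")
LINK = (RSS10, "link")

# Tracking-style layout: each item's properties indexed by the item's key (its URL).
tracking = {
    "http://example.org/a": {TITLE: "First story", LINK: "http://example.org/a"},
    "http://example.org/b": {TITLE: "Second story", LINK: "http://example.org/b"},
}


def to_collection(channel_props, tracking_index):
    """Reorganize a tracking-style index the way an RSS document is laid out:
    channel data at the top, referring to item details by a hash of each URL."""
    item_refs = []
    items_by_hash = {}
    for url, props in tracking_index.items():
        # Stand-in for "hash values for the URLs": a stable SHA-1 hex digest.
        ref = hashlib.sha1(url.encode("utf-8")).hexdigest()
        item_refs.append(ref)
        items_by_hash[ref] = props
    return {"channel": channel_props, "item_refs": item_refs, "items": items_by_hash}


collection = to_collection({TITLE: "Example feed"}, tracking)

# Walk the channel's references to reach each item's details.
for ref in collection["item_refs"]:
    print(collection["items"][ref][TITLE])  # prints "First story", then "Second story"
```

SHA-1 hex digests are used here instead of the built-in hash() because string hashing is randomized per process in Python 3, which would make the references change from run to run.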
Listing 1
We start by creating a TrackingChannel instance and then populate it
with data parsed from the RSS feed at http://www.python.org/channews.rdf.
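Because RSS.py targets Python 2.2 and PyXML, the following is only a rough standard-library stand-in for that parse step, written for Python 3. It reads an abbreviated, hypothetical RSS 1.0 sample with xml.etree.ElementTree and builds a TrackingChannel-like index: each item URL (its rdf:about value) mapped to a dictionary of properties keyed by (namespace, local name) tuples. The names SAMPLE, parse_items() and split_tag() are made up for this sketch and are not part of RSS.py.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RSS10 = "http://purl.org/rss/1.0/"

# Hypothetical, abbreviated RSS 1.0 sample standing in for a downloaded feed.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/">
  <channel rdf:about="http://example.org/news">
    <title>Example news</title>
  </channel>
  <item rdf:about="http://example.org/a">
    <title>First story</title>
    <link>http://example.org/a</link>
  </item>
  <item rdf:about="http://example.org/b">
    <title>Second story</title>
  </item>
</rdf:RDF>"""


def split_tag(tag):
    """Turn ElementTree's '{namespace}local' tag form into a (namespace, local) tuple."""
    if tag.startswith("{"):
        namespace, local = tag[1:].split("}", 1)
        return (namespace, local)
    return (None, tag)  # element with no namespace


def parse_items(xml_text):
    """Index each RSS 1.0 item's child elements by the item's rdf:about URL."""
    root = ET.fromstring(xml_text)
    index = {}
    for item in root.findall("{%s}item" % RSS10):
        url = item.get("{%s}about" % RDF)
        index[url] = {split_tag(child.tag): (child.text or "").strip() for child in item}
    return index


tracking = parse_items(SAMPLE)
for url, props in tracking.items():
    print(url, props.get((RSS10, "title"), "(none)"))
```

ET.fromstring() also accepts bytes, so the same function could be given a document fetched with urllib.request.urlopen(url).read().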
RSS.py uses tuples, pairing a namespace with a local name, as the
property names for RSS data. This may seem an unusual approach to those
not used to XML processing techniques, but it is actually a useful way
of being precise about what was in the original RSS file. In effect, an
RSS 0.91 title element is not considered equivalent to an RSS 1.0 one.
There is enough data for the application to ignore this distinction if
it likes, by ignoring the namespace portion of each tuple; but the basic
API is wedded to the syntax of the original RSS file, so that this
information is not lost. In the code, we use this property data to
gather all the items from the news feed for display. Notice that we are
careful not to assume which properties any particular item might have.
We retrieve properties using a safe form of lookup, which provides a
default value if the property is not found, rather than a plain lookup,
which raises an error when the property is missing.
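Both points, tuple property names and safe lookups, can be illustrated with an ordinary Python 3 dictionary whose keys are (namespace, local name) tuples. This is a hypothetical item, not data returned by RSS.py; representing the namespace-less RSS 0.91 title with None and the by_local_name() helper are assumptions of this sketch.

```python
RSS10 = "http://purl.org/rss/1.0/"

# Hypothetical item: properties keyed by (namespace, local name) tuples.
# The RSS 0.91 title has no namespace, so None is used for it in this sketch.
item = {
    (RSS10, "title"): "RSS 1.0 title",
    (None, "title"): "RSS 0.91 title",
    (RSS10, "link"): "http://example.org/a",
}

# Same local name, different namespace: these are distinct properties.
print((RSS10, "title") == (None, "title"))  # prints False

# Safe form: get() falls back to a default when the property is absent.
safe_desc = item.get((RSS10, "description"), "(none)")

# Unsafe form: subscripting raises KeyError when the property is absent.
try:
    unsafe_desc = item[(RSS10, "description")]
except KeyError:
    unsafe_desc = "(missing!)"


def by_local_name(props, local):
    """Ignore the namespace half of each key and collect values by local name."""
    return [value for (namespace, name), value in props.items() if name == local]


print(by_local_name(item, "title"))  # prints ['RSS 1.0 title', 'RSS 0.91 title']
```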
This precaution is necessary because you never know which elements a given RSS feed uses. Listing 2 shows the output from Listing 1.
Listing 2
Of course, you should expect somewhat different output, because the
news items will have changed by the time you try it. The RSS.py channel
objects also provide methods for adding and modifying RSS information,
and you can write the result back out in RSS 1.0 format using the
output() method. Try this out by writing back out the information parsed
in Listing 1. Kick off the script in interactive mode by running
python -i listing1.py. At the resulting Python prompt, invoke the
output() method on the channel object from Listing 1.
The result is an RSS 1.0 document printed to the console. You must have
RSS.py version 0.42 or later for this to work, because earlier versions
have a bug in the output() method.
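If you cannot run RSS.py, the idea behind output() can be approximated with ElementTree in Python 3. The to_rss10() and add_props() functions below are hypothetical stand-ins, not RSS.py's implementation: they take a channel URL, channel properties, and a tracking-style item index (all sample values), list the items in the channel's rdf:Seq as RSS 1.0 does, and write each item out with its rdf:about URL.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RSS10 = "http://purl.org/rss/1.0/"
ET.register_namespace("rdf", RDF)
ET.register_namespace("", RSS10)  # serialize RSS 1.0 elements in the default namespace


def add_props(parent, props):
    """Write (namespace, local) -> text properties as child elements."""
    for (namespace, local), value in props.items():
        if namespace is None:
            continue  # this sketch writes only namespaced (RSS 1.0-style) properties
        ET.SubElement(parent, "{%s}%s" % (namespace, local)).text = value


def to_rss10(channel_url, channel_props, items):
    """Serialize channel data plus a tracking-style item index as RSS 1.0."""
    root = ET.Element("{%s}RDF" % RDF)
    channel = ET.SubElement(root, "{%s}channel" % RSS10, {"{%s}about" % RDF: channel_url})
    add_props(channel, channel_props)
    # The channel lists its items as rdf:li references inside an rdf:Seq.
    seq = ET.SubElement(ET.SubElement(channel, "{%s}items" % RSS10), "{%s}Seq" % RDF)
    for url in items:
        ET.SubElement(seq, "{%s}li" % RDF, {"{%s}resource" % RDF: url})
    # Each item's details follow as sibling elements identified by rdf:about.
    for url, props in items.items():
        item = ET.SubElement(root, "{%s}item" % RSS10, {"{%s}about" % RDF: url})
        add_props(item, props)
    return ET.tostring(root, encoding="unicode")


doc = to_rss10(
    "http://example.org/news",
    {(RSS10, "title"): "Example news"},
    {"http://example.org/a": {(RSS10, "title"): "First story"}},
)
print(doc)
```

Parsing the printed string back with ET.fromstring() is a quick way to confirm that it is well-formed. The sketch does not check that elements RSS 1.0 requires, such as the channel's link and description, are present.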