rssparser.py
Mark Pilgrim offers another module for RSS file parsing. It doesn't provide all the features and options that RSS.py does, but it does offer a very liberal parser, which deals well with all the confusing diversity in the world of RSS. To quote from the rssparser.py page:
You see, most RSS feeds suck. Invalid characters, unescaped ampersands (Blogger feeds), invalid entities (Radio feeds), unescaped and invalid HTML (The Register's feed most days). Or just a bastardized mix of RSS 0.9x elements with RSS 1.0 elements (Movable Type feeds).
Then there are feeds, like Aaron's feed, which are too bleeding edge. He puts an excerpt in the description element but puts the full text in the content:encoded element (as CDATA). This is valid RSS 1.0, but nobody actually uses it (except Aaron), few news aggregators support it, and many parsers choke on it. Other parsers are confused by the new elements (guid) in RSS 0.94 (see Dave Winer's feed for an example). And then there's Jon Udell's feed, with the fullitem element that he just sort of made up.
It's funny to consider this given that XML and Web services are supposed to increase interoperability. Anyway, rssparser.py is designed to deal with all the madness.
Installing rssparser.py is also very easy. You download the Python file (see Resources), rename it from "rssparser.py.txt" to "rssparser.py", and copy it to a directory on your PYTHONPATH. I also suggest getting the optional timeoutsocket module, which improves the timeout behavior of socket operations in Python and thus makes fetching RSS feeds less likely to stall the application thread when an error occurs.
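To make the timeout concern concrete, here is a minimal sketch that uses only the standard library of a current Python 3 interpreter. It is not taken from rssparser.py or timeoutsocket; the function name `fetch_feed` and the 10-second default are hypothetical choices for illustration.

```python
import http.client
import urllib.request


def fetch_feed(url, timeout_seconds=10.0):
    """Return the raw bytes of a feed, or None if it cannot be fetched.

    fetch_feed is a hypothetical helper, not part of rssparser.py.
    The timeout bounds how long a blocking socket operation may wait,
    so an unresponsive server cannot stall the calling thread forever.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout_seconds) as response:
            return response.read()
    except (OSError, ValueError, http.client.HTTPException):
        # OSError covers URLError and socket timeouts; ValueError covers
        # malformed URLs; HTTPException covers broken HTTP responses.
        return None
```

A caller can treat `None` as "skip this feed for now" and move on to the next one. For code that cannot pass a timeout to each call, `socket.setdefaulttimeout()` sets a process-wide default for newly created sockets instead.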
Listing 3 is a script equivalent to Listing 1, but using rssparser.py rather than RSS.py.
Listing 3
As you can see, the code is much simpler. The trade-off between RSS.py and rssparser.py is largely that the former has more features and preserves more syntactic information from the RSS feed, while the latter is simpler and parses more forgivingly (the RSS.py parser only accepts well-formed XML).
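The difference between a strict and a forgiving parser can be sketched with the standard library. The example below is an illustration only and does not reproduce how rssparser.py works internally: it assumes Python 3, handles just one of the problems mentioned earlier (unescaped ampersands), and the names `parse_strict`, `parse_forgiving`, `item_titles`, and `SAMPLE` are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# An ampersand that does not begin a predefined XML entity or a numeric
# character reference. Unknown named entities such as &nbsp; also match,
# so they end up as literal text rather than causing a parse failure.
BARE_AMPERSAND = re.compile(r"&(?!(?:amp|lt|gt|quot|apos|#[0-9]+|#x[0-9A-Fa-f]+);)")

# A made-up feed with an unescaped ampersand in an item title.
SAMPLE = """<rss version="0.91"><channel><title>Example</title>
<item><title>Cats & dogs</title><link>http://example.com/1</link></item>
</channel></rss>"""


def parse_strict(text):
    """Parse only well-formed XML; raises ET.ParseError otherwise."""
    return ET.fromstring(text)


def parse_forgiving(text):
    """Try a strict parse first, then retry after escaping bare ampersands."""
    try:
        return ET.fromstring(text)
    except ET.ParseError:
        repaired = BARE_AMPERSAND.sub("&amp;", text)
        return ET.fromstring(repaired)


def item_titles(root):
    """Return the title text of each item under the channel element."""
    return [title.text for title in root.findall("channel/item/title")]
```

Calling `parse_strict(SAMPLE)` raises `ET.ParseError`, while `item_titles(parse_forgiving(SAMPLE))` recovers the item title. The repair step costs some fidelity, because the parser no longer sees the feed exactly as published, which mirrors the trade-off described above.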
The output should be the same as in Listing 2.