XPaths and tails
ElementTree implements a subset of XPath in its .find*()
methods. Using this style can be much more concise than
nesting code to look within levels of subnodes, especially for
XPaths that contain wildcards. For example, if I were interested
in all the timestamps of hits to my Web server, I could examine
weblog.xml using:
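A minimal sketch of such a query follows. It assumes each hit is an <entry> child of the root with a <dateTime> child; those element names, the sample data, and the use of the standard-library xml.etree.ElementTree module are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for weblog.xml; element names and values are assumed.
SAMPLE = """<weblog>
  <entry><host>192.0.2.10</host><dateTime>19/Aug/2001:01:46:01</dateTime></entry>
  <entry><host>192.0.2.11</host><dateTime>20/Aug/2001:09:12:44</dateTime></entry>
</weblog>"""

# For a file on disk you would use ET.parse('weblog.xml').getroot() instead.
weblog = ET.fromstring(SAMPLE)

# 'entry/dateTime' matches every <dateTime> child of every <entry> child.
timestamps = [ts.text for ts in weblog.findall('entry/dateTime')]
for ts in timestamps:
    print(ts)
```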
Of course, for a standard, shallow document like weblog.xml, it is easy to do the same thing with list comprehensions:
Listing 10. Using list comprehensions to find and filter nested subelements
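One way the comprehension-based version might look is sketched here. The <entry> and <dateTime> element names, the sample data, and the '19/Aug' filter are assumptions chosen for illustration; the sketch relies on the fact that iterating over an Element yields its direct children.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for weblog.xml; element names and values are assumed.
SAMPLE = """<weblog>
  <entry><dateTime>19/Aug/2001:01:46:01</dateTime></entry>
  <entry><dateTime>19/Aug/2001:23:59:59</dateTime></entry>
  <entry><dateTime>20/Aug/2001:09:12:44</dateTime></entry>
</weblog>"""

weblog = ET.fromstring(SAMPLE)

# Outer loop walks the root's children; inner loop looks one level further
# down; the condition keeps only timestamps from one (assumed) day.
aug19 = [ts.text
         for entry in weblog
         for ts in entry.findall('dateTime')
         if ts.text.startswith('19/Aug')]

for ts in aug19:
    print(ts)
```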
Prose-oriented XML documents, however, tend to have much more
variable document structure, and typically nest tags at least
five or six levels deep. For example, an XML schema like DocBook or TEI
might have citations in sections, subsections,
bibliographies, or sometimes within italics tags, or in
blockquotes, and so on. Finding every <citation> element
would require a cumbersome (probably recursive) search across
levels. Or, using XPath, you could just write:
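A hedged sketch of the wildcard search, using a small made-up document rather than real DocBook or TEI (the tag names other than <citation> and the "ref-N" labels are placeholders). The './/citation' path asks for <citation> elements at any depth below the starting node.

```python
import xml.etree.ElementTree as ET

# Made-up prose document with citations nested at different depths.
DOC = """<article>
  <section>
    <para>See <citation>ref-1</citation>.</para>
    <section>
      <blockquote><para><emphasis><citation>ref-2</citation></emphasis></para></blockquote>
    </section>
  </section>
  <bibliography><citation>ref-3</citation></bibliography>
</article>"""

article = ET.fromstring(DOC)

# './/' means "any descendant", so nesting depth does not matter.
citations = [c.text for c in article.findall('.//citation')]
print(citations)
```

The results come back in document order, so no recursive walk is needed to collect them.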
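However, XPath support in ElementTree is limited: You cannot use the various functions contained in full XPath, and the version described here cannot search on attributes (later releases, including the xml.etree.ElementTree module in the Python standard library, accept simple attribute predicates such as [@name='value']). In what it does, though, the XPath subset in ElementTree greatly aids readability and expressiveness.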
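Where a path expression cannot state a condition on attributes, one fallback is to let .findall() do the structural part and filter on attributes in plain Python. The sketch below assumes a hypothetical status attribute on each <entry> and a <resource> child; neither name is taken from the real weblog.xml.

```python
import xml.etree.ElementTree as ET

# Hypothetical data: the 'status' attribute and <resource> tag are assumed.
SAMPLE = """<weblog>
  <entry status="200"><resource>/index.html</resource></entry>
  <entry status="404"><resource>/missing.html</resource></entry>
  <entry status="200"><resource>/about.html</resource></entry>
</weblog>"""

weblog = ET.fromstring(SAMPLE)

# The path selects the elements; Element.get() reads the attribute,
# returning None when it is absent.
not_found = [entry.find('resource').text
             for entry in weblog.findall('entry')
             if entry.get('status') == '404']
print(not_found)
```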
I want to mention one more quirk of ElementTree before I wrap
up. XML documents can contain mixed content. Prose-oriented XML,
in particular, tends to intersperse PCDATA and tags rather freely.
But where exactly should you store the text that comes between
child nodes? Since an ElementTree Element instance has a single
.text attribute -- which contains a string -- that does not
really leave space for a broken sequence of strings. The solution
ElementTree adopts is to give each node a .tail attribute, which
contains all the text after a closing tag but before the next
element begins or the parent element is closed. For example:
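A small sketch of how .text and .tail divide up mixed content, using an invented one-line document and the standard-library xml.etree.ElementTree module (both are assumptions for illustration):

```python
import xml.etree.ElementTree as ET

# Invented mixed-content document: text is interleaved with child elements.
doc = ET.fromstring('<a>begin<b>inside</b>middle<c>inside</c>end</a>')
b, c = doc.find('b'), doc.find('c')

# .text is the text before the first child; the root has nothing after it.
print(repr(doc.text), repr(doc.tail))   # 'begin' None
# Each child's .tail is the text between its closing tag and the next tag.
print(repr(b.text), repr(b.tail))       # 'inside' 'middle'
print(repr(c.text), repr(c.tail))       # 'inside' 'end'

# itertext() stitches .text and .tail values back together in document order.
full_text = ''.join(doc.itertext())
print(full_text)
```

Note that 'middle' and 'end' belong to the child elements as their tails, not to the parent <a>, even though they sit inside <a> in the source.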