Querying, Transformation & Real-World XML

XPath: Querying XML Documents

XPath is the query language for XML — the way you select specific elements, attributes, and text from an XML document tree. It is used in XSLT, XSD, web scraping, test automation, and anywhere you need to point at a specific piece of XML data.

The XML Tree Model

An XML document is a tree. The root element is the trunk. Child elements are branches. Attributes and text content are leaves. XPath navigates this tree using path expressions, similar to how filesystem paths navigate directories.

```text
catalog
├── book[@id="bk101"]
│   ├── title — "The Great Gatsby"
│   ├── author — "F. Scott Fitzgerald"
│   └── price[@currency="USD"] — "12.99"
└── book[@id="bk102"]
    ├── title — "Thinking, Fast and Slow"
    ├── author — "Daniel Kahneman"
    └── price[@currency="USD"] — "16.99"
```
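The sketch below ties the diagram to real markup. It writes the same catalog out as XML and parses it with lxml, the third-party library used later in this section. `walk` is a hypothetical helper written only for this illustration: it prints each element indented by depth, so its output should mirror the tree above. The final line then follows one branch with a path expression.

```python
from lxml import etree

# The catalog from the diagram, written out as markup (illustrative sample data).
CATALOG = """<catalog>
  <book id="bk101">
    <title>The Great Gatsby</title>
    <author>F. Scott Fitzgerald</author>
    <price currency="USD">12.99</price>
  </book>
  <book id="bk102">
    <title>Thinking, Fast and Slow</title>
    <author>Daniel Kahneman</author>
    <price currency="USD">16.99</price>
  </book>
</catalog>"""

root = etree.fromstring(CATALOG)


def walk(node, depth=0):
    """Hypothetical helper: print each element, its attributes and its text, indented by depth."""
    attrs = "".join(f'[@{name}="{value}"]' for name, value in node.attrib.items())
    text = (node.text or "").strip()
    print("    " * depth + node.tag + attrs + (f' "{text}"' if text else ""))
    for child in node:  # direct children; this sample contains only elements
        walk(child, depth + 1)


walk(root)

# A path expression follows the same branches: catalog -> book -> title -> text leaf
titles = root.xpath("/catalog/book/title/text()")
print(titles)
```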

An XPath 1.0 expression evaluates to one of four types: a node-set (an unordered collection of nodes), a string, a number, or a boolean. A path that matches nothing does not fail. It returns an empty node-set.
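One way to see the four result types is to evaluate one expression of each kind with lxml. lxml implements XPath 1.0 through libxml2 and maps the results to Python types:

- a list for a node-set
- a float for a number
- a `str` subclass for a string
- a `bool` for a boolean

The sample markup is made up for this illustration.

```python
from lxml import etree

root = etree.fromstring(
    "<catalog>"
    "<book id='bk101'><price>12.99</price></book>"
    "<book id='bk102'><price>16.99</price></book>"
    "</catalog>"
)

node_set = root.xpath("//book")            # node-set -> Python list of elements
number = root.xpath("count(//book)")       # number   -> Python float
string = root.xpath("string(//book/@id)")  # string   -> str subclass (value of the first match)
boolean = root.xpath("count(//book) > 1")  # boolean  -> Python bool
empty = root.xpath("//magazine")           # no match -> empty list, not an error

print(node_set, number, string, boolean, empty)
```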

XPath Fundamentals

Navigation Axes

`/` — root or direct child. /catalog/book selects book elements that are direct children of the root catalog element.

`//` — descendant at any depth. //title selects every <title> element anywhere in the document.

`.` — current context node. Useful in XSLT templates.

`..` — parent of the current node. //title/.. selects every parent of a title element (in this case, all book elements).

`@` — attribute. //book/@id selects the id attribute of every book element.
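Here is each navigation step from the list above evaluated against a small, made-up catalog with lxml. The comment on each line names the step being exercised.

```python
from lxml import etree

root = etree.fromstring(
    "<catalog>"
    "<book id='bk101'><title>The Great Gatsby</title></book>"
    "<book id='bk102'><title>Thinking, Fast and Slow</title></book>"
    "</catalog>"
)

children = root.xpath("/catalog/book")  # /  : book children of the root catalog element
all_titles = root.xpath("//title")      # // : title elements at any depth
first_book = children[0]
self_node = first_book.xpath(".")       # .  : the context node itself (here, the first book)
parents = root.xpath("//title/..")      # .. : the parent of each title, i.e. the books
ids = root.xpath("//book/@id")          # @  : the id attribute of every book

print(len(children), len(all_titles), self_node[0].get("id"))
print([p.tag for p in parents], ids)
```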

Predicates (Filters)

Predicates appear in square brackets and filter a node-set:

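```text
//book[1]                       first book (per parent)
//book[last()]                  last book (per parent)
//book[@genre='fiction']        books with genre="fiction"
//book[price > 15]              books whose price is greater than 15
//book[contains(title,'Great')] books with "Great" in the title
//book[@id and @genre]          books with BOTH id and genre attributes
//book[not(@genre)]             books WITHOUT a genre attribute
//book[position() <= 3]         first three books (per parent)
```

A positional predicate applies to each parent separately. `//book[1]` selects every book that is the first `book` child of its parent, not only the first book in the document. Write `(//book)[1]` to get the single first book in the document. The two forms agree in the catalog above because every book shares the same parent, and the same per-parent rule applies to `last()` and `position()`.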
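The per-parent behaviour of positional predicates only shows up when elements have different parents. The sketch below uses a hypothetical catalog split into two `shelf` elements to make the difference visible. The `genre` attributes are sample data.

```python
from lxml import etree

# Hypothetical catalog split across two shelves, so the books have two different parents.
root = etree.fromstring(
    "<catalog>"
    "<shelf><book id='a1'/><book id='a2' genre='fiction'/></shelf>"
    "<shelf><book id='b1' genre='fiction'/><book id='b2'/></shelf>"
    "</catalog>"
)


def ids(expr):
    """Return the id attribute of every book the expression selects, in document order."""
    return [book.get("id") for book in root.xpath(expr)]


print(ids("//book[1]"))                 # ['a1', 'b1']  first book of EACH shelf
print(ids("(//book)[1]"))               # ['a1']        first book in the whole document
print(ids("//book[last()]"))            # ['a2', 'b2']  last book of each shelf
print(ids("//book[@genre='fiction']"))  # ['a2', 'b1']
print(ids("//book[not(@genre)]"))       # ['a1', 'b2']
print(ids("//book[@id and @genre]"))    # ['a2', 'b1']
```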

Essential XPath Functions

```text
text()                        node test (not a function): selects text nodes
contains(str, substring)      true if string contains substring
starts-with(str, prefix)      true if string starts with prefix
normalize-space(str)          trim + collapse internal whitespace
count(node-set)               number of nodes
sum(node-set)                 sum of the nodes' values, converted to numbers
concat(str1, str2, ...)       string concatenation
not(expression)               boolean negation
position()                    position in current node-set
string-length(string)         number of characters
substring(str, start, length) extract substring (positions start at 1)
translate(str, from, to)      character replacement (like tr)
```
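The string functions are easiest to understand on a concrete value. This sketch evaluates several of them with lxml against a single made-up `title` element that has extra whitespace. Note that XPath numbers, including the result of `string-length`, come back as floats, and that `substring` counts from 1.

```python
from lxml import etree

doc = etree.fromstring("<title>  The   Great Gatsby  </title>")

clean = doc.xpath("normalize-space(.)")                      # 'The Great Gatsby'
length = doc.xpath("string-length(normalize-space(.))")      # 16.0
word = doc.xpath("substring(normalize-space(.), 5, 5)")      # 'Great' (characters 5 to 9)
no_dash = doc.xpath("translate('bk-101', '-', '')")          # 'bk101'
joined = doc.xpath("concat('bk', '-', '101')")               # 'bk-101'
starts = doc.xpath("starts-with(normalize-space(.), 'The')") # True

print(clean, length, word, no_dash, joined, starts)
```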

Practical Function Examples

```text
//book[contains(title, 'Great')]
//book[starts-with(@id, 'bk1')]
count(//book)
sum(//book/price)
//book[string-length(title) > 20]
normalize-space(//book[1]/title/text())
```
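Running the practical examples above against the two-book catalog from the tree diagram gives the results shown in the comments. The markup is a trimmed copy of that catalog. The summed price is a float, so it is approximately, not exactly, 29.98.

```python
from lxml import etree

root = etree.fromstring("""<catalog>
  <book id="bk101"><title>The Great Gatsby</title><price>12.99</price></book>
  <book id="bk102"><title>Thinking, Fast and Slow</title><price>16.99</price></book>
</catalog>""")

great = root.xpath("//book[contains(title, 'Great')]/@id")          # ['bk101']
bk1 = root.xpath("//book[starts-with(@id, 'bk1')]/@id")             # ['bk101', 'bk102']
n = root.xpath("count(//book)")                                     # 2.0
total = root.xpath("sum(//book/price)")                             # approximately 29.98
long_titles = root.xpath("//book[string-length(title) > 20]/@id")   # ['bk102'] (23 characters)
first_title = root.xpath("normalize-space(//book[1]/title/text())") # 'The Great Gatsby'

print(great, bk1, n, total, long_titles, first_title)
```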

XPath Axes (Advanced Navigation)

The full axis syntax is axisname::nodetest[predicate]. Most expressions use the abbreviated forms:

| Full axis | Abbreviated | Selects |
|---|---|---|
| `child::book` | `book` | Children named `book` |
| `descendant::title` | `.//title` | All descendants named `title` |
| `parent::*` | `..` | Parent element |
| `ancestor::catalog` | (none) | All `catalog` ancestors |
| `attribute::id` | `@id` | Attribute named `id` |
| `following-sibling::book` | (none) | `book` siblings after the current node |
| `preceding-sibling::book` | (none) | `book` siblings before the current node |
| `self::book` | (none) | The current node, if it is a `book` |
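The axes without an abbreviation are shown below from the middle book of a made-up three-book catalog. One detail worth knowing is that `preceding-sibling` is a reverse axis. In a predicate on a reverse axis, `[1]` means the nearest node, not the first one in document order.

```python
from lxml import etree

root = etree.fromstring(
    "<catalog>"
    "<book id='bk101'/><book id='bk102'/><book id='bk103'/>"
    "</catalog>"
)

middle = root.xpath("//book[@id='bk102']")[0]
last_book = root.xpath("//book[@id='bk103']")[0]

after = middle.xpath("following-sibling::book/@id")    # ['bk103']
before = middle.xpath("preceding-sibling::book/@id")   # ['bk101']
ancestors = middle.xpath("ancestor::*")                # [<catalog>]
is_book = middle.xpath("boolean(self::book)")          # True
parent_tag = middle.xpath("name(parent::*)")           # 'catalog'
# Reverse axis: [1] is the closest preceding sibling, not the first book in the document
nearest_before = last_book.xpath("preceding-sibling::book[1]/@id")  # ['bk102']

print(after, before, [a.tag for a in ancestors], is_book, parent_tag, nearest_before)
```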

Complete Practical Examples

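```text
All books by a specific author:
//book[author='Daniel Kahneman']

Title text of the first book:
/catalog/book[1]/title/text()

All id attribute values:
//@id

Books published after 1 January 2010 (XPath 2.0+, which compares these as strings):
//book[publish_date > '2010-01-01']

The same in XPath 1.0, where > converts both sides to numbers, so the hyphens must go:
//book[translate(publish_date, '-', '') > 20100101]

Books that have a description element:
//book[description]

Parent of the element with id="bk101":
//*[@id='bk101']/..

Count of fiction books:
count(//book[@genre='fiction'])

Books with price between 10 and 20:
//book[price >= 10 and price <= 20]

The most expensive book (XPath 2.0+, which provides max()):
//book[price = max(//book/price)]

The most expensive book (XPath 1.0, which has no max()):
//book[not(price < //book/price)]
```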
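The two XPath 1.0 pitfalls above can be checked directly with lxml, which implements XPath 1.0 only. The `publish_date` values and the third book are hypothetical sample data. The first query shows the string comparison silently matching nothing, because `'2010-01-01'` converts to NaN.

```python
from lxml import etree

root = etree.fromstring("""<catalog>
  <book id="bk101"><price>12.99</price><publish_date>2000-10-01</publish_date></book>
  <book id="bk102"><price>16.99</price><publish_date>2011-10-25</publish_date></book>
  <book id="bk103"><price>5.95</price><publish_date>2009-03-10</publish_date></book>
</catalog>""")

# XPath 1.0 converts both sides of > to numbers; a date string becomes NaN, so nothing matches
naive = root.xpath("//book[publish_date > '2010-01-01']/@id")                 # []

# Removing the hyphens turns 2011-10-25 into the comparable number 20111025
recent = root.xpath("//book[translate(publish_date, '-', '') > 20100101]/@id")  # ['bk102']

# XPath 1.0 stand-in for max(): no other book has a higher price
priciest = root.xpath("//book[not(price < //book/price)]/@id")               # ['bk102']

print(naive, recent, priciest)
```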

XPath and Namespaces

When querying namespaced XML, you must declare namespace prefixes for the XPath processor:

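```python
from lxml import etree

# Map a prefix of your choosing ("bk") to the namespace URI used in the document
tree = etree.parse("catalog.xml")
ns = {"bk": "http://example.com/books"}
titles = tree.xpath("//bk:title/text()", namespaces=ns)
```

Without the namespace mapping, //title returns nothing in a document where <title> is in a namespace. This also applies to a default namespace declared with a bare xmlns="..." attribute, because in XPath 1.0 an unprefixed name matches only elements that are in no namespace.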
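The default-namespace case catches many people, so here it is end to end with lxml. The document is a hypothetical one that reuses the `http://example.com/books` URI from above. The prefix in the mapping does not need to appear in the document. `local-name()` is shown as a namespace-agnostic fallback, at the cost of matching `title` in any namespace.

```python
from lxml import etree

# Hypothetical document whose elements sit in a default namespace (no prefix in the markup)
root = etree.fromstring(
    '<catalog xmlns="http://example.com/books">'
    "<book><title>The Great Gatsby</title></book>"
    "</catalog>"
)

unprefixed = root.xpath("//title/text()")  # []: 'title' here means "title in no namespace"

ns = {"bk": "http://example.com/books"}    # any prefix works; only the URI must match
prefixed = root.xpath("//bk:title/text()", namespaces=ns)

# Fallback that ignores namespaces entirely by comparing local names
by_local_name = root.xpath("//*[local-name()='title']/text()")

print(unprefixed, prefixed, by_local_name)
```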

Where XPath Is Used

XSLT — every <xsl:template match=""> and <xsl:value-of select=""> uses XPath. Understanding XPath is required for writing XSLT.

XSD — <xs:key>, <xs:keyref>, and <xs:unique> use a restricted subset of XPath in their <xs:selector> and <xs:field> children to define identity constraints.

Web scraping — Selenium, Scrapy, and lxml all accept XPath selectors for locating elements in HTML documents.

Selenium/Playwright test automation — select DOM elements by structure, text content, or attribute conditions that CSS selectors cannot express.

XML databases — XQuery (a superset of XPath) powers XML-native databases like BaseX and MarkLogic.

Configuration processing — XSLT-based build pipelines and tools that read or patch XML configuration files use XPath to locate the settings they change.
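To see XPath doing the work inside XSLT, the sketch below runs a tiny stylesheet with lxml's `etree.XSLT` class. lxml uses libxslt, an XSLT 1.0 processor. The stylesheet and the sample catalog are illustrative. Every `match` and `select` attribute in the stylesheet holds an XPath expression.

```python
from lxml import etree

catalog = etree.fromstring(
    "<catalog>"
    "<book id='bk101'><title>The Great Gatsby</title><price>12.99</price></book>"
    "<book id='bk102'><title>Thinking, Fast and Slow</title><price>16.99</price></book>"
    "</catalog>"
)

# match="/catalog", select="book[price &gt; 15]" and select="title" are all XPath
stylesheet = etree.XML("""<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/catalog">
    <xsl:for-each select="book[price &gt; 15]">
      <xsl:value-of select="title"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>""")

transform = etree.XSLT(stylesheet)
result = str(transform(catalog.getroottree()))  # text output: one title per line
print(result)
```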

Testing XPath in the Browser

Any modern browser lets you test XPath directly in the Developer Tools console:

```javascript
$x("//h1")                          // returns an array of all h1 elements
$x("//a[contains(@href,'github')]") // all links whose href contains "github"
$x("//button[text()='Submit']")     // buttons whose own text node is exactly "Submit"
```

In Chrome DevTools you can also open the Elements panel, press Ctrl+F (Cmd+F on macOS), and paste an XPath expression to highlight the matching elements in the DOM tree.
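The same queries can be tried outside the browser. This sketch uses `lxml.html` to parse a small, made-up HTML snippet. It also shows why `text()='Submit'` is stricter than it looks: it compares only the button's own text nodes, and requires an exact match. Surrounding whitespace or a wrapping `<span>` makes it miss. `normalize-space()` compares the element's whole trimmed string value instead.

```python
from lxml import html

# Made-up page for illustration
page = html.fromstring("""<html><body>
  <h1>Docs</h1>
  <a href="https://github.com/example">Source</a>
  <a href="/about">About</a>
  <button> Submit </button>
  <button><span>Submit</span></button>
</body></html>""")

headings = page.xpath("//h1")
github_links = page.xpath("//a[contains(@href,'github')]/@href")
exact_text = page.xpath("//button[text()='Submit']")             # misses both buttons
normalized = page.xpath("//button[normalize-space()='Submit']")  # matches both buttons

print(len(headings), github_links, len(exact_text), len(normalized))
```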

See [XSLT: Transforming XML](/tutorials/xml-fundamentals/xslt-transformation) where XPath is the select and match language for all transformations, and [XML in the Wild](/tutorials/xml-fundamentals/xml-in-the-wild) where XPath powers real-world XML parsing across RSS, SVG, and SOAP formats.

Example

```python
# XPath in Python (lxml)
from lxml import etree

tree = etree.parse("catalog.xml")

# Select all book titles
titles = tree.xpath("//book/title/text()")

# Filter: fiction books only
fiction = tree.xpath("//book[@genre='fiction']")

# Count and sum (XPath numbers come back as Python floats)
book_count = tree.xpath("count(//book)")
total_price = tree.xpath("sum(//book/price)")
```

```java
// XPath in Selenium (Java); assumes `driver` is an already-initialized WebDriver
driver.findElement(By.xpath("//button[contains(text(),'Submit')]"));
driver.findElement(By.xpath("//input[@type='email']"));
// Every tr except the first one inside each parent element (typically skips a header row)
driver.findElements(By.xpath("//table//tr[position()>1]"));
```

```javascript
// Browser DevTools console
$x("//h1")
$x("//a[contains(@href,'example.com')]")
```