XML Core

DTD & XML Schema (XSD) Validation

A well-formed XML document can say anything. A valid XML document says what it is supposed to. This lesson teaches the two major schema languages — DTD and XSD — so you can define and enforce data contracts between systems.

Why Validation Matters

When two systems exchange XML, both sides need to agree on what the document should contain. A schema defines that agreement — which elements exist, what order they appear in, what data types they contain, which are required, and which are optional.

Without validation, your application might accept a document where <price> contains "free" instead of a number, or where a required <customer-id> is missing. The bug then surfaces downstream, far from its source. Validation catches both kinds of problem (missing or misplaced structure and, with XSD, wrong data types) when the document is parsed, before your application logic ever sees the data.

Think of schema validation as the XML equivalent of input validation in web applications — never trust incoming data without checking it against a known contract.

DTD — The Original Schema Language

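DTDs (Document Type Definitions) predate XML Schema and use a non-XML syntax. They are still used in legacy systems, in the DOCTYPE declarations of HTML 4 and XHTML 1.0 pages, and in some publishing workflows (DocBook, DITA). HTML5's <!DOCTYPE html> keeps the declaration but references no DTD.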

```xml
<?xml version="1.0"?>
<!DOCTYPE catalog [
  <!ELEMENT catalog (book+)>
  <!ELEMENT book (title, author, price, publish_date?)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT author (#PCDATA)>
  <!ELEMENT price (#PCDATA)>
  <!ELEMENT publish_date (#PCDATA)>
  <!ATTLIST book id ID #REQUIRED>
  <!ATTLIST book genre CDATA #IMPLIED>
]>
<catalog>
  <book id="bk101" genre="fiction">
    <title>The Great Gatsby</title>
    <author>F. Scott Fitzgerald</author>
    <price>12.99</price>
  </book>
</catalog>
```

DTD Syntax

Element declarations: <!ELEMENT name content-model>

  • (#PCDATA) — parsed character data (text content)
  • (child1, child2) — sequence: child1 followed by child2
  • (child1 | child2) — choice: one or the other
  • + — one or more
  • * — zero or more
  • ? — zero or one (optional)

Attribute declarations: <!ATTLIST element-name attr-name type default>

  • Types: CDATA (text), ID (unique per document), IDREF (reference to an ID), IDREFS (space-separated list of ID references), NMTOKEN (a single name token), enumeration (list of allowed values)
  • Defaults: #REQUIRED (must be present), #IMPLIED (optional), "value" (literal default), #FIXED "value" (always this value)
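To see several of these declarations working together, here is a small self-contained document with an internal DTD subset. The vocabulary (library, magazine, issue, and the format, lang, and reviews attributes) is invented for illustration; the document is intended to validate with a DTD-validating parser such as xmllint --valid --noout.

```xml
<?xml version="1.0"?>
<!DOCTYPE library [
  <!-- choice plus repetition: any mix of book and magazine, zero or more -->
  <!ELEMENT library (book | magazine)*>
  <!-- sequence: one title, then one or more authors -->
  <!ELEMENT book (title, author+)>
  <!-- issue is optional -->
  <!ELEMENT magazine (title, issue?)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT author (#PCDATA)>
  <!ELEMENT issue (#PCDATA)>
  <!-- ID, enumeration with a literal default, and a #FIXED value -->
  <!ATTLIST book
            id     ID                       #REQUIRED
            format (hardcover | paperback)  "paperback"
            lang   CDATA                    #FIXED "en">
  <!-- IDREF: must match the ID of some element in this document -->
  <!ATTLIST magazine reviews IDREF #IMPLIED>
]>
<library>
  <!-- lang is omitted here; a validating parser supplies the fixed value "en" -->
  <book id="bk101" format="hardcover">
    <title>The Great Gatsby</title>
    <author>F. Scott Fitzgerald</author>
  </book>
  <magazine reviews="bk101">
    <title>Literary Review</title>
  </magazine>
</library>
```

Changing reviews="bk101" to an id that does not exist, or format="hardcover" to a value outside the enumeration, should make the document invalid while leaving it well-formed.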

DTD Limitations

DTDs have significant limitations that led to the creation of XML Schema:

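  • No data types for element content: you cannot say "this must be an integer between 1 and 100" (attribute types such as ID and NMTOKEN are the only typing available)
  • Non-XML syntax: you cannot use standard XML tools to process DTDs
  • No namespace support: a prefix is just part of the element name, so documents that mix XML vocabularies are hard to validate
  • Limited cardinality: only +, *, and ?; "exactly 3" or "between 2 and 5" can only be written by spelling the element out repeatedly, e.g. (item, item, item)
  • No value constraints such as patterns, numeric ranges, or typed enumerations for element content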

XML Schema (XSD) — The Modern Standard

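An XSD schema is itself an XML document, so ordinary XML tools can parse, transform, and even validate it (the W3C publishes a "schema for schemas" for exactly that purpose). XSD supports built-in data types, namespaces, complex type definitions, type derivation (inheritance), precise cardinality control, and value constraints.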

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xs:element name="catalog">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="book" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="title"        type="xs:string"/>
              <xs:element name="author"       type="xs:string"/>
              <xs:element name="price"        type="xs:decimal"/>
              <xs:element name="publish_date" type="xs:date" minOccurs="0"/>
            </xs:sequence>
            <!-- xs:ID mirrors the DTD's ID type: unique within the document -->
            <xs:attribute name="id"    type="xs:ID" use="required"/>
            <xs:attribute name="genre" type="xs:string"/>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

</xs:schema>
```
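For comparison with the DTD version, here is an instance document this schema is meant to accept. It assumes the schema above is saved as catalog.xsd next to it; the publish_date value is illustrative data.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- catalog.xml: assumes the schema is saved as catalog.xsd in the same directory -->
<catalog xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:noNamespaceSchemaLocation="catalog.xsd">
  <book id="bk101" genre="fiction">
    <title>The Great Gatsby</title>
    <author>F. Scott Fitzgerald</author>
    <price>12.99</price>
    <!-- optional (minOccurs="0"); xs:date requires the YYYY-MM-DD form -->
    <publish_date>1925-04-10</publish_date>
  </book>
</catalog>
```

Changing <price> to "free", or moving <author> before <title>, should make the validator reject the document: the first breaks the xs:decimal type, the second breaks the xs:sequence order. The DTD version would accept the "free" price, since #PCDATA allows any text.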

XSD Core Concepts

Simple types: types for text-only values, with no child elements or attributes. XSD ships with many built-in simple types, including:

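| Type | Description |
|---|---|
| xs:string | Character sequence |
| xs:integer | Whole number |
| xs:decimal | Decimal number |
| xs:boolean | true, false, 1, or 0 |
| xs:date | Calendar date in YYYY-MM-DD form (2026-02-20) |
| xs:dateTime | Date and time |
| xs:anyURI | URI reference (URL or URN) |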
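A validator checks the text exactly as written, so the lexical form matters. The fragment below shows values written in forms these built-in types accept; the element names are hypothetical and the schema that would assign each type is not shown, so the comments name the intended type instead.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical element names; each comment names the intended built-in type -->
<event>
  <name>XML Meetup</name>                    <!-- xs:string -->
  <seats>120</seats>                         <!-- xs:integer: no decimal point -->
  <fee>15.50</fee>                           <!-- xs:decimal: no exponent, so 1.55E1 is rejected -->
  <public>true</public>                      <!-- xs:boolean -->
  <day>2026-02-20</day>                      <!-- xs:date: YYYY-MM-DD -->
  <starts>2026-02-20T18:30:00Z</starts>      <!-- xs:dateTime: T separator, optional timezone -->
  <homepage>https://example.com/</homepage>  <!-- xs:anyURI -->
</event>
```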

Complex types: types for elements that contain child elements, attributes, or both, defined with <xs:complexType>.
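Complex types can also be named and reused, which is where XSD's inheritance comes in. A minimal sketch of derivation by extension, assuming hypothetical type names ProductType and BookType: the derived type keeps the base type's children and attributes and appends its own children after them.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Named base type (hypothetical): title, price, and a unique id -->
  <xs:complexType name="ProductType">
    <xs:sequence>
      <xs:element name="title" type="xs:string"/>
      <xs:element name="price" type="xs:decimal"/>
    </xs:sequence>
    <xs:attribute name="id" type="xs:ID" use="required"/>
  </xs:complexType>

  <!-- Derived type: inherits title, price, and id, then appends author -->
  <xs:complexType name="BookType">
    <xs:complexContent>
      <xs:extension base="ProductType">
        <xs:sequence>
          <xs:element name="author" type="xs:string"/>
        </xs:sequence>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>

  <!-- A <book> must therefore contain title, price, author, in that order -->
  <xs:element name="book" type="BookType"/>
</xs:schema>
```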

Compositors — define how child elements are organized:

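  • <xs:sequence>: children must appear in the declared order
  • <xs:choice>: exactly one of the listed alternatives appears (more than one selection is allowed only if the choice itself has maxOccurs greater than 1)
  • <xs:all>: children may appear in any order, each at most once (an XSD 1.0 rule that XSD 1.1 relaxes)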
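The two less common compositors look like this in practice. The contact and address vocabularies below are hypothetical; each element declared at the top level could serve as a document root.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xs:element name="contact">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="name" type="xs:string"/>
        <!-- choice: exactly one of email or phone follows the name -->
        <xs:choice>
          <xs:element name="email" type="xs:string"/>
          <xs:element name="phone" type="xs:string"/>
        </xs:choice>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <xs:element name="address">
    <xs:complexType>
      <!-- all: children in any order, each at most once; street may be omitted -->
      <xs:all>
        <xs:element name="street" type="xs:string" minOccurs="0"/>
        <xs:element name="city"   type="xs:string"/>
        <xs:element name="zip"    type="xs:string"/>
      </xs:all>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

Under this schema an <address> with <zip> before <city> is accepted, while a <contact> containing both <email> and <phone> is not.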

Cardinality: minOccurs and maxOccurs can be set on local element declarations (and on compositors); both default to 1:

  • minOccurs="0": optional element
  • maxOccurs="unbounded": any number of occurrences
  • minOccurs="2" maxOccurs="5": between 2 and 5 occurrences
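Putting these together, a hypothetical panel element might combine the default, a fixed count, a range, and an unbounded optional element. Note the "exactly 3" case, which a DTD can only express by repeating the element name.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Hypothetical vocabulary illustrating minOccurs/maxOccurs -->
  <xs:element name="panel">
    <xs:complexType>
      <xs:sequence>
        <!-- no attributes given: exactly one chair (both default to 1) -->
        <xs:element name="chair"    type="xs:string"/>
        <!-- exactly 3 judges -->
        <xs:element name="judge"    type="xs:string" minOccurs="3" maxOccurs="3"/>
        <!-- between 2 and 5 reviewers -->
        <xs:element name="reviewer" type="xs:string" minOccurs="2" maxOccurs="5"/>
        <!-- optional and repeatable -->
        <xs:element name="note"     type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```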

Restrictions and Facets

Facets narrow a base type to a specific range or pattern:

```xml
<!-- Integer between 1 and 100 -->
<xs:element name="score">
  <xs:simpleType>
    <xs:restriction base="xs:integer">
      <xs:minInclusive value="1"/>
      <xs:maxInclusive value="100"/>
    </xs:restriction>
  </xs:simpleType>
</xs:element>

<!-- Enumeration: allowed string values -->
<xs:element name="genre">
  <xs:simpleType>
    <xs:restriction base="xs:string">
      <xs:enumeration value="fiction"/>
      <xs:enumeration value="non-fiction"/>
      <xs:enumeration value="biography"/>
    </xs:restriction>
  </xs:simpleType>
</xs:element>

<!-- Pattern: regex validation. XSD patterns always match the whole
     value, so no ^ or $ anchors are needed -->
<xs:element name="product-code">
  <xs:simpleType>
    <xs:restriction base="xs:string">
      <xs:pattern value="[A-Z]{2}-[0-9]{4}"/>
    </xs:restriction>
  </xs:simpleType>
</xs:element>
```
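Restrictions do not have to be anonymous. A minimal sketch, assuming hypothetical names PriceType, ProductCodeType, and item, of defining named simple types once and reusing them for both an element and an attribute, with two further facets (fractionDigits and maxLength):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Non-negative decimal with at most 2 fractional digits -->
  <xs:simpleType name="PriceType">
    <xs:restriction base="xs:decimal">
      <xs:minInclusive value="0"/>
      <xs:fractionDigits value="2"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- Product code such as AB-1234 -->
  <xs:simpleType name="ProductCodeType">
    <xs:restriction base="xs:string">
      <xs:pattern value="[A-Z]{2}-[0-9]{4}"/>
    </xs:restriction>
  </xs:simpleType>

  <xs:element name="item">
    <xs:complexType>
      <xs:sequence>
        <!-- named type used for an element -->
        <xs:element name="price" type="PriceType"/>
        <!-- anonymous restriction: at most 200 characters -->
        <xs:element name="description">
          <xs:simpleType>
            <xs:restriction base="xs:string">
              <xs:maxLength value="200"/>
            </xs:restriction>
          </xs:simpleType>
        </xs:element>
      </xs:sequence>
      <!-- named type used for an attribute -->
      <xs:attribute name="code" type="ProductCodeType" use="required"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

Naming a type lets several declarations share one definition, so a change to the rule (say, allowing three decimal places) happens in one place.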

Linking a Schema to a Document

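```xml
<!-- Option 1, XSD: point to the schema through the xsi namespace -->
<catalog xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:noNamespaceSchemaLocation="catalog.xsd">
  <!-- book elements -->
</catalog>

<!-- Option 2, DTD: point to the DTD with a DOCTYPE declaration, placed after
     the XML declaration (if any) and before the root element -->
<!DOCTYPE catalog SYSTEM "catalog.dtd">
<catalog>
  <!-- book elements -->
</catalog>
```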
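xsi:noNamespaceSchemaLocation is only for schemas without a targetNamespace. When the vocabulary lives in a namespace, the instance uses xsi:schemaLocation, whose value is a whitespace-separated list of namespace/location pairs. A sketch of the start of both files, assuming the hypothetical namespace http://example.com/catalog:

```xml
<!-- catalog.xsd: the schema declares the namespace it defines;
     elementFormDefault="qualified" puts local elements such as book in it too -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://example.com/catalog"
           xmlns="http://example.com/catalog"
           elementFormDefault="qualified">
  <!-- element declarations as before -->
</xs:schema>

<!-- catalog.xml: the namespace URI first, then where to find its schema -->
<catalog xmlns="http://example.com/catalog"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://example.com/catalog catalog.xsd">
  <!-- book elements -->
</catalog>
```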

When to Use Which

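| Situation | Recommendation |
|---|---|
| New projects | XSD |
| Data type validation required | XSD |
| Namespace support needed | XSD |
| Legacy system compatibility | DTD |
| Very simple structure | Either |
| HTML doctype declaration | DTD syntax (the DOCTYPE line itself; HTML5's <!DOCTYPE html> references no DTD) |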

Validation Tools

  • Online: xmlvalidation.com, freeformatter.com/xml-validator-xsd
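  • Command line: xmllint --noout --schema catalog.xsd catalog.xml (XSD), or xmllint --noout --valid catalog.xml (against the DTD named in the DOCTYPE)
  • Python: from lxml import etree; schema = etree.XMLSchema(etree.parse("catalog.xsd")); schema.assertValid(etree.parse("catalog.xml"))
  • Java: SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI).newSchema(new File("catalog.xsd")).newValidator().validate(new StreamSource(new File("catalog.xml")))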
  • IDE: VS Code XML extension, IntelliJ IDEA, Eclipse all provide real-time XSD validation with red squiggles

See also: [Namespaces & Modular XML](/tutorials/xml-fundamentals/namespaces) for how namespaces interact with XSD validation, and [XML Syntax & Well-Formed Documents](/tutorials/xml-fundamentals/syntax-well-formed) for the foundational syntax rules that validation builds on.

Example

```xml
<!-- Element with decimal content plus a required attribute -->
<xs:element name="price">
  <xs:complexType>
    <xs:simpleContent>
      <xs:extension base="xs:decimal">
        <xs:attribute name="currency" type="xs:string" use="required"/>
      </xs:extension>
    </xs:simpleContent>
  </xs:complexType>
</xs:element>

<!-- Pattern facet (regex). In XSD, \d matches any Unicode decimal digit,
     not only 0-9; use [0-9] to restrict the value to ASCII digits -->
<xs:restriction base="xs:string">
  <xs:pattern value="[A-Z]{2}-\d{4}"/>
</xs:restriction>
```

Command-line validation (--noout suppresses echoing the parsed document): xmllint --schema catalog.xsd --noout catalog.xml
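To make the price declaration concrete, here is an instance it should accept, with two rejected variants kept as comments. The currency code is illustrative; the declaration only requires some string.

```xml
<!-- Accepted: decimal content plus the required currency attribute -->
<price currency="USD">12.99</price>

<!-- Rejected variants, shown as comments:
     <price>12.99</price>                 missing the required currency attribute
     <price currency="USD">free</price>   content is not an xs:decimal
-->
```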