An XML document is valid only if it conforms to the structural and syntactic rules of a DTD.
It is therefore important to know the basics of creating DTDs.
In this module, you will create DTDs for existing XML documents. You will also use a DTD as a guideline for creating new XML documents that conform to the rules laid out in the DTD.
After completing this module, you will have the skills and knowledge necessary to:
- Describe a valid XML document
- Describe the process for creating a DTD
- Declare basic elements in a DTD and specify their content
- Write element declarations for mixed content and declare empty elements
- Reference DTD declarations in XML
- Create a DTD file which works with a separate XML file
- Create a DTD from an existing set of tags
In the next lesson, the concept of a valid XML document will be discussed.
To use a Document Type Definition (DTD) as a guideline for creating a new XML document that conforms to the rules laid out in the DTD, follow these steps:
1. Understand the DTD Structure
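A DTD defines the structure, elements, and attributes allowed in an XML document. Its typing is limited: element text is plain character data, and only attributes carry simple types. A DTD specifies:
- Elements: their hierarchy, order, and number of occurrences.
- Attributes: their types, allowed values, and defaults.
- Entities: named pieces of text or external content that the document can reference.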
Example of a simple DTD:
<!ELEMENT catalog (book+)>
<!ELEMENT book (title, author, year-published, isbn)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT author (#PCDATA)>
<!ELEMENT year-published (#PCDATA)>
<!ELEMENT isbn (#PCDATA)>
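To see what a parser actually extracts from these declarations, the following is a minimal sketch that loads the catalog DTD with lxml and lists each declared element. It assumes lxml is installed; `DTD_TEXT` and `declared_elements` are names made up for this illustration.

```python
import io

from lxml import etree

# The catalog DTD from the example, held in a string for convenience.
DTD_TEXT = """\
<!ELEMENT catalog (book+)>
<!ELEMENT book (title, author, year-published, isbn)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT author (#PCDATA)>
<!ELEMENT year-published (#PCDATA)>
<!ELEMENT isbn (#PCDATA)>
"""


def declared_elements(dtd_text):
    """Return {element name: declaration type} for each <!ELEMENT> in the DTD."""
    dtd = etree.DTD(io.StringIO(dtd_text))
    return {decl.name: decl.type for decl in dtd.iterelements()}


declarations = declared_elements(DTD_TEXT)
for name, kind in declarations.items():
    # lxml reports element-only content as "element" and #PCDATA content as "mixed".
    print(name, kind)
```

Listing the declarations this way is a quick check that the DTD itself parses and declares every element you expect before you write documents against it.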
2. Link the DTD to Your XML Document
Include a document type declaration (DOCTYPE) after the XML declaration, if there is one, and before the root element. The DTD can be:
- Internal: Defined directly in the XML file.
- External: Stored as a separate file.
Internal DTD Example:
<!DOCTYPE catalog [
<!ELEMENT catalog (book+)>
<!ELEMENT book (title, author, year-published, isbn)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT author (#PCDATA)>
<!ELEMENT year-published (#PCDATA)>
<!ELEMENT isbn (#PCDATA)>
]>
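Because an internal DTD travels with the document, a validating parser can check the document in a single pass. The sketch below assumes lxml is installed; the helper name `is_valid_against_internal_dtd` and the sample document are illustrative only.

```python
from lxml import etree

# A complete document: the internal DTD subset followed by the content it governs.
XML_WITH_INTERNAL_DTD = b"""<?xml version="1.0"?>
<!DOCTYPE catalog [
<!ELEMENT catalog (book+)>
<!ELEMENT book (title, author, year-published, isbn)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT author (#PCDATA)>
<!ELEMENT year-published (#PCDATA)>
<!ELEMENT isbn (#PCDATA)>
]>
<catalog>
  <book>
    <title>A Certain Justice</title>
    <author>P.D. James</author>
    <year-published>1998</year-published>
    <isbn>0375401091</isbn>
  </book>
</catalog>
"""


def is_valid_against_internal_dtd(xml_bytes):
    """Parse with DTD validation switched on; report success or failure.

    Note: this also returns False for documents that are not well-formed.
    """
    parser = etree.XMLParser(dtd_validation=True)
    try:
        etree.fromstring(xml_bytes, parser)
    except etree.XMLSyntaxError:
        return False
    return True


print(is_valid_against_internal_dtd(XML_WITH_INTERNAL_DTD))

# Dropping a required child element should make the same document invalid.
missing_isbn = XML_WITH_INTERNAL_DTD.replace(b"<isbn>0375401091</isbn>", b"")
print(is_valid_against_internal_dtd(missing_isbn))
```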
External DTD Example:
<!DOCTYPE catalog SYSTEM "catalog.dtd">
The external DTD file catalog.dtd contains:
<!ELEMENT catalog (book+)>
<!ELEMENT book (title, author, year-published, isbn)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT author (#PCDATA)>
<!ELEMENT year-published (#PCDATA)>
<!ELEMENT isbn (#PCDATA)>
3. Create the XML Document
Use the DTD's rules to structure your XML content. Make sure the element names, their order and nesting, and any attributes match the declarations in the DTD.
Example XML Document:
<!DOCTYPE catalog SYSTEM "catalog.dtd">
<catalog>
  <book>
    <title>A Certain Justice</title>
    <author>P.D. James</author>
    <year-published>1998</year-published>
    <isbn>0375401091</isbn>
  </book>
  <book>
    <title>Ashworth Hall</title>
    <author>Anne Perry</author>
    <year-published>1997</year-published>
    <isbn>0449908445</isbn>
  </book>
</catalog>
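When documents are generated by a program rather than typed by hand, the DTD's content model can be written down once and reused. The following is a rough sketch using only the Python standard library; `BOOK_FIELDS` and `build_catalog` are hypothetical names, and the code mirrors the DTD by convention only (it does not read catalog.dtd).

```python
import xml.etree.ElementTree as ET

# Child order required by <!ELEMENT book (title, author, year-published, isbn)>.
BOOK_FIELDS = ("title", "author", "year-published", "isbn")


def build_catalog(books):
    """Build a <catalog> element whose children follow the DTD's order."""
    if not books:
        # <!ELEMENT catalog (book+)> requires at least one book.
        raise ValueError("the DTD requires at least one book")
    catalog = ET.Element("catalog")
    for record in books:
        book = ET.SubElement(catalog, "book")
        for field in BOOK_FIELDS:
            ET.SubElement(book, field).text = record[field]
    return catalog


books = [
    {"title": "A Certain Justice", "author": "P.D. James",
     "year-published": "1998", "isbn": "0375401091"},
    {"title": "Ashworth Hall", "author": "Anne Perry",
     "year-published": "1997", "isbn": "0449908445"},
]

root = build_catalog(books)
# Prepend the document type declaration that points at the external DTD.
document = ('<!DOCTYPE catalog SYSTEM "catalog.dtd">\n'
            + ET.tostring(root, encoding="unicode"))
print(document)
```

Building the children from one ordered tuple keeps every generated book in the sequence the DTD demands. The result should still be run through a real validator.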
4. Validate the XML Document
Use an XML editor or parser to validate the XML against the DTD. Many tools (e.g., XML Notepad, Eclipse, or online XML validators) can check for conformance.
If using a parser in a programming language, e.g., Python with lxml:
from lxml import etree

# Load the DTD and parse the XML document
dtd = etree.DTD("catalog.dtd")
tree = etree.parse("catalog.xml")

# Validate the XML against the DTD
if dtd.validate(tree):
    print("XML is valid!")
else:
    print("XML is invalid:", dtd.error_log.filter_from_errors())
5. Iterate and Correct Errors
Adjust the XML file to address any validation errors identified during the validation step. Ensure all required elements are included, attributes are defined correctly, and the structure matches the DTD.
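The correct-and-revalidate loop might look like the sketch below, which assumes lxml is installed. It validates a deliberately broken document, collects the reported errors, then validates a corrected version. The helper `validation_errors` and both sample documents are made up for illustration.

```python
import io

from lxml import etree

DTD_TEXT = """\
<!ELEMENT catalog (book+)>
<!ELEMENT book (title, author, year-published, isbn)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT author (#PCDATA)>
<!ELEMENT year-published (#PCDATA)>
<!ELEMENT isbn (#PCDATA)>
"""


def validation_errors(dtd, root):
    """Return a list of (line, message) pairs; empty if the document is valid."""
    if dtd.validate(root):
        return []
    return [(entry.line, entry.message)
            for entry in dtd.error_log.filter_from_errors()]


dtd = etree.DTD(io.StringIO(DTD_TEXT))

# First attempt: <year-published> is missing, so the book content model fails.
broken = etree.fromstring(
    "<catalog><book><title>Ashworth Hall</title>"
    "<author>Anne Perry</author><isbn>0449908445</isbn></book></catalog>"
)
errors = validation_errors(dtd, broken)
for line, message in errors:
    print(f"line {line}: {message}")

# Second attempt: the missing element is added in the position the DTD requires.
fixed = etree.fromstring(
    "<catalog><book><title>Ashworth Hall</title><author>Anne Perry</author>"
    "<year-published>1997</year-published><isbn>0449908445</isbn></book></catalog>"
)
remaining = validation_errors(dtd, fixed)
print("errors after the fix:", len(remaining))
```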
By following these steps, you ensure your XML document adheres to the rules defined in the DTD, maintaining both structure and consistency.
XML predefines no elements at all. Instead, XML allows you to define your own elements as needed. However, these elements and the documents built from them are not completely arbitrary: they have to follow a specific set of rules elaborated in this module. A well-formed document is one that follows these rules. Well-formedness is the minimum criterion that XML processors and browsers require in order to read a file.
This module explores the different parts of an XML document, such as tags, text, attributes, and elements, and discusses the primary rules for each part. Particular attention is paid to how XML differs from HTML. Along the way I introduce several new XML constructs, including comments, processing instructions, entity references, and CDATA sections.
This module is not an exhaustive discussion of well-formedness rules. Some of the rules I present must be adjusted slightly for documents that have a document type definition (DTD), and there are additional well-formedness rules that define the relationship between the document and its DTD, but these will be explored in later modules.
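Well-formedness can be checked without any DTD, because it concerns only the syntax of the document itself. The following is a small sketch using the parser in Python's standard library; the helper `well_formedness_error` and the sample strings are illustrative and cover only a few of the rules.

```python
import xml.etree.ElementTree as ET


def well_formedness_error(xml_text):
    """Return None if the text is well-formed, else the parser's error message."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return str(exc)
    return None


samples = {
    "properly nested": "<book><title>Ashworth Hall</title></book>",
    "comment, CDATA, entity": "<note><!-- ok --><![CDATA[1 < 2]]> &amp; more</note>",
    "missing end tag": "<book><title>Ashworth Hall</book>",
    "unquoted attribute": "<book id=1/>",
    "overlapping elements": "<b><i>text</b></i>",
    "two root elements": "<a/><b/>",
}

results = {label: well_formedness_error(text) for label, text in samples.items()}
for label, error in results.items():
    print(f"{label}: {'well-formed' if error is None else error}")
```

Unquoted attribute values and overlapping tags are tolerated by many HTML parsers, but an XML parser rejects them outright.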
An XML document can result from the aggregation of various chunks of data (entities, schemas, and DTDs) coming from different network locations. In these cases, the BaseURI property tells you where these nodes come from. If the XML document is being processed through a stream (for example, an in-memory string), no URI is available and the BaseURI property returns the empty string.
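The same idea can be tried in Python, with the caveat that this is an analogy rather than the BaseURI property itself. The sketch assumes lxml is installed and uses its `base` attribute on elements, which reports None instead of an empty string when no location is known. The URL is a hypothetical placeholder.

```python
from lxml import etree

XML_TEXT = "<catalog><book><title>Ashworth Hall</title></book></catalog>"

# Parsed from an in-memory string: the parser has no location to report.
anonymous_root = etree.fromstring(XML_TEXT)

# Same string, but the caller states where the data (hypothetically) came from.
located_root = etree.fromstring(
    XML_TEXT, base_url="http://example.com/catalog.xml"
)

print(anonymous_root.base)
print(located_root.base)
# Child nodes inherit the base URI of the document that contains them.
print(located_root[0].base)
```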