Lesson 2 | What is a parser? |
Objective | Explain how an XML parser works. |
What is an XML parser?
XML parsers operate on XML documents by creating a hierarchical representation of these documents.
Once the document hierarchy is created, any element in the XML document can be accessed, changed, deleted, or added. For example, assume that the following elements are included in an XML document:
<ITEM>
<NAME>Computer Cable</NAME>
<LIST-PRICE>$50</LIST-PRICE>
</ITEM>
Document Hierarchy
In this example, the document hierarchy created by the XML parser lets an application that uses the XML document obtain the value of the <LIST-PRICE> element for this inventory item, namely $50. A processing application can also add another item to the document or change the existing list price of an item. To do so, it uses a set of application programming interfaces (APIs) exposed by the parser to programmatically obtain and manipulate the elements in the hierarchical representation of the document. Two widely used APIs exist for this purpose: the Document Object Model (DOM) and the Simple API for XML (SAX), both of which are discussed later in the course.
All XML parsers are required to check the well-formedness of an XML document according to the XML specification and report any errors. In addition, some parsers (known as validating parsers) also check whether a given XML document conforms to a specified DTD. In this case, the XML parser reports any inconsistencies (errors) between the XML document and the DTD.
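As a rough illustration of the access, change, and add operations described above, the following Python sketch uses the standard library's `xml.dom.minidom` module. The `<INVENTORY>` wrapper element, the second item ("Power Adapter", "$20"), the new price "$45", and the helper names `text_of` and `make_item` are assumptions made for this example; they do not come from the lesson.

```python
from xml.dom.minidom import parseString

# The lesson's <ITEM> fragment, wrapped in a hypothetical <INVENTORY> root
# so that a second item can be appended next to it.
SOURCE = """<INVENTORY>
<ITEM>
<NAME>Computer Cable</NAME>
<LIST-PRICE>$50</LIST-PRICE>
</ITEM>
</INVENTORY>"""


def text_of(element):
    """Concatenate the text nodes that sit directly under an element."""
    return "".join(
        node.data for node in element.childNodes
        if node.nodeType == node.TEXT_NODE
    )


def make_item(doc, name, list_price):
    """Build a new <ITEM> subtree with NAME and LIST-PRICE children."""
    item = doc.createElement("ITEM")
    for tag, value in (("NAME", name), ("LIST-PRICE", list_price)):
        child = doc.createElement(tag)
        child.appendChild(doc.createTextNode(value))
        item.appendChild(child)
    return item


doc = parseString(SOURCE)  # the parser builds the document hierarchy

# Access: read the value of the first <LIST-PRICE> element.
price = doc.getElementsByTagName("LIST-PRICE")[0]
original_price = text_of(price)

# Change: overwrite the text node that holds the price.
price.firstChild.data = "$45"

# Add: append another item under the root element.
doc.documentElement.appendChild(make_item(doc, "Power Adapter", "$20"))

names = [text_of(n) for n in doc.getElementsByTagName("NAME")]
print(original_price, text_of(price), names)
```

The sketch uses the DOM style because it keeps the whole hierarchy in memory, which is what makes in-place changes and additions straightforward.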
Relationship between XML Parser and the Document Hierarchy
An XML parser is a software tool or library that reads XML documents and provides access to their content and structure. The document hierarchy refers to the nested, tree-like organization of elements within an XML document, where elements can contain other elements, attributes, and text.
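To make the tree-like organization concrete, here is a minimal sketch that parses the `<ITEM>` fragment from earlier in the lesson with Python's `xml.etree.ElementTree` and lists each element indented under its parent. The function name `outline` is hypothetical and chosen only for this example.

```python
import xml.etree.ElementTree as ET

SOURCE = """<ITEM>
<NAME>Computer Cable</NAME>
<LIST-PRICE>$50</LIST-PRICE>
</ITEM>"""


def outline(element, depth=0):
    """Return one indented line per element, parents before children."""
    text = (element.text or "").strip()
    label = element.tag + (f": {text}" if text else "")
    lines = ["  " * depth + label]
    for child in element:  # iterating an element yields its child elements
        lines.extend(outline(child, depth + 1))
    return lines


root = ET.fromstring(SOURCE)
tree_lines = outline(root)
print("\n".join(tree_lines))
```

Each level of indentation in the printed outline corresponds to one level of nesting in the document.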
The relationship between an XML parser and the document hierarchy is foundational and symbiotic:
- Parsing and Understanding Structure:
- Recognition of Hierarchy: An XML parser reads the XML document sequentially, recognizing the start and end tags of elements to understand the nesting and hierarchy of the document. It identifies parent and child relationships among elements.
- Building a Tree Model: Many XML parsers construct an in-memory representation of the XML document's hierarchy, such as the Document Object Model (DOM) tree. Each node in this tree represents an element, attribute, or text node, reflecting the structure of the XML document.
- Validation Against Definitions:
- Conformance to DTD or Schema: If a Document Type Definition (DTD) or XML Schema is provided, a validating parser uses it to validate the document's structure. It ensures that the hierarchy of elements in the document matches the rules defined in the DTD or Schema.
- For example, if the DTD declares that an `INVENTORY` element contains one or more `PRODUCT` elements and that each `PRODUCT` contains `NAME`, `QTYAVAIL`, and `UNITPRICE` elements in that order, the parser checks the document against those rules.
- Access and Manipulation:
- Navigation: By understanding the document hierarchy, the parser allows applications to navigate through the XML structure. You can traverse from parent to child elements, access sibling elements, and retrieve the content of specific nodes.
- Data Extraction: Applications can extract data from specific parts of the document by querying the hierarchical structure. For instance, you can retrieve all `NAME` elements within `PRODUCT` elements to get a list of product names.
- Error Detection:
- Structural Errors: The parser can detect errors in the document hierarchy, such as missing end tags, improperly nested elements, or elements that are out of order according to the DTD or Schema.
- Content Errors: It can also validate the content within elements if data types and formats are specified in the Schema.
- Transformation and Serialization:
- Modifying the Hierarchy: Some parsers allow applications to modify the document hierarchy by adding, removing, or rearranging elements and then outputting the modified XML.
- Serialization: After manipulation, the parser can serialize the in-memory hierarchical structure back into XML format for storage or transmission.
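The sequential recognition of start and end tags, and the structural error detection listed above, can be sketched with Python's standard `xml.sax` module. This is an illustrative sketch, not a description of any particular parser's internals: the class `PathRecorder` and the functions `element_paths` and `structural_error` are hypothetical names, and the broken sample document is made up for the example.

```python
import xml.sax


class PathRecorder(xml.sax.ContentHandler):
    """Track nesting with a stack as start and end tags are reported."""

    def __init__(self):
        super().__init__()
        self.stack = []   # currently open elements, outermost first
        self.paths = []   # one "parent/child" path per element seen

    def startElement(self, name, attrs):
        self.stack.append(name)
        self.paths.append("/".join(self.stack))

    def endElement(self, name):
        self.stack.pop()


def element_paths(xml_text):
    """Parse the text sequentially and return the path of every element."""
    handler = PathRecorder()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.paths


def structural_error(xml_text):
    """Return the parser's error message, or None if the text is well-formed."""
    try:
        element_paths(xml_text)
        return None
    except xml.sax.SAXParseException as exc:
        return exc.getMessage()


GOOD = "<ITEM><NAME>Computer Cable</NAME><LIST-PRICE>$50</LIST-PRICE></ITEM>"
BROKEN = "<ITEM><NAME>Computer Cable</ITEM>"  # <NAME> is never closed

paths = element_paths(GOOD)
print(paths)
print(structural_error(BROKEN))
```

Note that this sketch checks well-formedness only. Validating a document against a DTD or Schema requires a validating parser, which the Python standard library does not provide.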
Summary
- XML Parser: Reads and interprets the XML document, understanding its hierarchical structure.
- Document Hierarchy: The organized, nested structure of elements within the XML document.
- Relationship: The parser relies on the document hierarchy to accurately read, validate, and provide access to the XML data. It builds an in-memory representation of this hierarchy, enabling applications to navigate and manipulate the XML content effectively.
By parsing the document hierarchy, the XML parser enables applications to work with complex XML documents in a structured way, ensuring that the data is well-formed and, when a DTD or Schema is supplied, structurally valid according to it.
In the next lesson, you will learn how to differentiate between the types of parsers and how they are used.