Lesson 6 | The SAX model |
Objective | Explain the Simple API for XML (SAX) model for parsing XML documents. |
The SAX Model | XML Parser
A SAX-based parser does not build the entire document tree of the parsed XML document. Instead, it generates events at certain points during parsing, such as:
- The start of a document
- The end of the document
- The start of an element
- The end of an element
An application using a SAX parser provides the code needed to handle these events and process the information generated from each.
A SAX parser discards any event that is not handled by the application. To illustrate the operation of a SAX parser, consider the following XML document named order.xml:
<ORDER>
<CUSTOMER-NAME>Smith Jones</CUSTOMER-NAME>
<DOLLAR-AMOUNT>$300.00</DOLLAR-AMOUNT>
<TERMS>net 30</TERMS>
</ORDER>
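To make the event sequence for this document visible, the following is a minimal sketch that records each callback as the parser fires it. It assumes the standard JAXP classes (javax.xml.parsers.SAXParserFactory and org.xml.sax.helpers.DefaultHandler); the class name SaxEventTrace and the method trace() are hypothetical names chosen for illustration, and the document is supplied as an inline string with well-formed closing tags.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: records the name of every SAX event in document order.
class SaxEventTrace {

    static List<String> trace(String xml) {
        List<String> events = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startDocument() {
                events.add("startDocument");
            }

            @Override
            public void endDocument() {
                events.add("endDocument");
            }

            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attributes) {
                events.add("startElement " + qName);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                events.add("endElement " + qName);
            }
        };
        try {
            // Checked exceptions are wrapped only to keep the sketch short.
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
        return events;
    }

    public static void main(String[] args) {
        String orderXml = "<ORDER>"
                + "<CUSTOMER-NAME>Smith Jones</CUSTOMER-NAME>"
                + "<DOLLAR-AMOUNT>$300.00</DOLLAR-AMOUNT>"
                + "<TERMS>net 30</TERMS>"
                + "</ORDER>";
        // Print one line per event, in the order the parser reported them.
        trace(orderXml).forEach(System.out::println);
    }
}
```

The trace begins with startDocument and ends with endDocument, with a startElement and endElement pair for each of the four elements in between. Callbacks that the handler does not override, such as the one for character data, fall through to the empty defaults in DefaultHandler.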
Event-based API
Because SAX is an event-based API, an application using a SAX-based parser obtains a SAX parser object and passes it both the document order.xml and a document handler. The document handler implements the org.xml.sax.DocumentHandler interface (the SAX 1 interface; SAX 2 replaces it with org.xml.sax.ContentHandler). When the SAX parser encounters the <DOLLAR-AMOUNT> start tag, for example, it invokes the user-defined DocumentHandler.startElement() method and passes the element name DOLLAR-AMOUNT to it as an argument. The processing application supplies the implementation of the startElement() method to perform whatever processing is required when this element is encountered.
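The wiring described above can be sketched as follows. Because DocumentHandler is deprecated in current Java releases, this sketch swaps in org.xml.sax.helpers.DefaultHandler, which implements the SAX 2 ContentHandler interface; its startElement() takes a namespace URI, local name, qualified name, and attributes rather than the two arguments of the SAX 1 method. The class name OrderHandler, its counter field, and the parse() helper are hypothetical, and main() assumes that order.xml sits in the working directory.

```java
import java.io.File;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: reacts only to the DOLLAR-AMOUNT element.
class OrderHandler extends DefaultHandler {
    int dollarAmountElements = 0;

    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes attributes) {
        // The parser passes the element name without angle brackets.
        if ("DOLLAR-AMOUNT".equals(qName)) {
            dollarAmountElements++;
            System.out.println("Found element: " + qName);
        }
    }

    // Obtains a parser and hands it both the document and the handler.
    static OrderHandler parse(InputSource document) {
        OrderHandler handler = new OrderHandler();
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(document, handler);
        } catch (Exception e) {
            // Checked exceptions are wrapped only to keep the sketch short.
            throw new IllegalStateException("could not parse document", e);
        }
        return handler;
    }

    public static void main(String[] args) {
        // Assumption: order.xml exists in the current working directory.
        String systemId = new File("order.xml").toURI().toASCIIString();
        OrderHandler handler = parse(new InputSource(systemId));
        System.out.println("DOLLAR-AMOUNT elements seen: "
                + handler.dollarAmountElements);
    }
}
```

All other events (for ORDER, CUSTOMER-NAME, and TERMS) still reach the handler, but the inherited no-op methods ignore them, which is how unhandled events are effectively discarded.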
Which XML parsers are event based?
An event-based XML parser processes an XML document as a stream and reports events, such as the start of an element, the end of an element, and character data, as it encounters them. The application receives these events and uses them to process the data contained in the document.
Some popular event-based XML parsers include:
- Simple API for XML (SAX): An event-driven API, originally defined for Java, through which a parser reports events as it processes an XML document. Many Java parsers implement it.
- Expat: A fast, non-validating, stream-oriented XML parser written in C, with bindings available for many programming languages.
- libxml2: A C-based XML parser that provides both DOM and SAX interfaces.
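- RapidXML: A fast, non-validating XML parser written in C++. Note that it parses in situ into an in-memory DOM-style tree rather than delivering SAX-style events, so it is not event based in the strict sense.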
Event-based XML parsers are typically faster and use less memory than DOM-based parsers, since they do not create a complete in-memory representation of the document. However, they can be more difficult to use than DOM-based parsers, since the application must process events as they occur and maintain state information as it processes the document.
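The state-keeping burden mentioned above can be illustrated with a short sketch. To extract the text of the DOLLAR-AMOUNT element from the order.xml document shown earlier, the handler has to remember whether it is currently inside that element, because character data arrives in a separate callback and may be delivered in several chunks. The class name AmountExtractor and the method extract() are hypothetical, and the sketch assumes the standard JAXP parser and the SAX 2 DefaultHandler class.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: captures the text content of DOLLAR-AMOUNT.
class AmountExtractor extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();
    private boolean inAmount = false; // state: are we inside DOLLAR-AMOUNT?
    private String amount;            // result, set when the element ends

    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes attributes) {
        if ("DOLLAR-AMOUNT".equals(qName)) {
            inAmount = true;
            text.setLength(0); // start collecting from an empty buffer
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // The parser may split text across several calls, so append each chunk.
        if (inAmount) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("DOLLAR-AMOUNT".equals(qName)) {
            amount = text.toString().trim();
            inAmount = false;
        }
    }

    // Returns the amount text, or null if the element does not occur.
    static String extract(String xml) {
        AmountExtractor handler = new AmountExtractor();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            // Checked exceptions are wrapped only to keep the sketch short.
            throw new IllegalStateException("could not parse XML", e);
        }
        return handler.amount;
    }

    public static void main(String[] args) {
        String orderXml = "<ORDER>"
                + "<CUSTOMER-NAME>Smith Jones</CUSTOMER-NAME>"
                + "<DOLLAR-AMOUNT>$300.00</DOLLAR-AMOUNT>"
                + "<TERMS>net 30</TERMS>"
                + "</ORDER>";
        System.out.println(extract(orderXml));
    }
}
```

With a DOM-based parser the same value could be read from the finished tree in one lookup; here the boolean flag and the buffer are the state the application must maintain itself, in exchange for never holding the whole document in memory.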
The next lesson concludes this module.
SAX Model - Exercise
Click the Exercise link below for an optional exercise in which you practice parsing an XML file using XML4J.
SAX Model - Exercise