Creating Documents

Lesson 3: Rules for a well-formed XML document
Objective: List the rules for constructing well-formed XML documents.

Rules for constructing well-formed XML Documents

Well-formedness is essential in XML. The W3C XML Recommendation states that violations of well-formedness constraints are fatal errors: once a conforming XML parser detects one, it must stop normal processing. As a result, a document that is not well-formed will not load as XML in a browser and will not be processed by an XML parser.
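To see a fatal error in practice, the following minimal sketch uses the parser in Python's standard library. The helper name is_well_formed is hypothetical and chosen only for this illustration; other parsers report the error differently.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts text, False on a fatal error."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        # Any well-formedness violation stops parsing at this point.
        return False
    return True


print(is_well_formed("<NAME><FIRST>John</FIRST></NAME>"))
print(is_well_formed("<NAME><FIRST>John</NAME>"))
```

The first call prints True. The second prints False because the FIRST element is never closed, so the parser gives up instead of guessing what the author meant.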

Five rules for well-formed documents

Five basic rules will help you construct well-formed XML documents. You should commit these rules to memory:
  1. XML uses elements to mark up content.
XML elements consist of a start tag and an end tag. Start tags begin with < and end with >. End tags begin with </ and end with >. Element names in XML are case sensitive. They may start with a letter, an underscore character, or a colon character. The next characters in an element name may be letters, digits, underscores, hyphens, periods, and colons but not white space. Spaces, carriage returns, line feeds, and tabs are all treated as white space in XML.
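The naming rule can be sketched as a regular expression. This is a simplified, ASCII-only check: the XML Recommendation also allows many non-ASCII letters, which this sketch ignores, and the names NAME_RE and is_valid_name are hypothetical.

```python
import re

# First character: letter, underscore, or colon.
# Following characters: letters, digits, underscores, hyphens, periods, colons.
# ASCII-only simplification; real XML names allow many more characters.
NAME_RE = re.compile(r"[A-Za-z_:][A-Za-z0-9_.:-]*")


def is_valid_name(name):
    """Return True if name follows the simplified element-name rule."""
    return NAME_RE.fullmatch(name) is not None


for candidate in ["FIRST", "_id", "PURCHASE-ORDER", "1st", "FIRST NAME"]:
    print(candidate, is_valid_name(candidate))
```

The first three candidates are accepted. "1st" is rejected because it starts with a digit, and "FIRST NAME" is rejected because it contains white space.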
  2. Tags cannot be inferred and must be explicitly closed.
All start tags must have corresponding ending tags. All ending tags must have corresponding start tags.
For the document to be well-formed, it must be written in the following way:

<NAME><FIRST>John</FIRST></NAME>
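A parser will not supply a missing end tag on the author's behalf. The sketch below, which assumes Python's standard library parser, reports what the parser complains about and where. The helper name describe_error is hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.parsers.expat import ErrorString


def describe_error(text):
    """Return None for well-formed text, else (message, line, column)."""
    try:
        ET.fromstring(text)
    except ET.ParseError as exc:
        line, column = exc.position  # position of the fatal error
        return ErrorString(exc.code), line, column
    return None


print(describe_error("<NAME><FIRST>John</FIRST></NAME>"))
print(describe_error("<NAME><FIRST>John</NAME>"))
```

The first call prints None. The second prints a tuple describing the error, because the end tag for FIRST is missing.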

  3. An empty element must be closed with />.
Empty elements may be used for elements that have no content.
You may be familiar with the <IMG> and <BR> empty tags from HTML. In HTML, empty tags are not required to have a closing tag or the /> form. In XML, empty elements must be closed with /> (a start tag immediately followed by its end tag, as in <BR></BR>, is also well-formed). For example:

<PURCHASE-ORDER NUMBER="1234"/>
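As a rough illustration, the sketch below parses the empty element above with Python's standard library parser and confirms that it carries an attribute but no content. The helper names parses and is_empty and the DOC wrapper element are hypothetical.

```python
import xml.etree.ElementTree as ET


def parses(text):
    """Return True if text is accepted by the parser."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


def is_empty(element):
    """Return True if element has no child elements and no text."""
    return len(element) == 0 and not element.text


order = ET.fromstring('<PURCHASE-ORDER NUMBER="1234"/>')
print(order.tag, order.attrib, is_empty(order))

# An HTML-style unclosed <BR> is rejected; the /> form is accepted.
print(parses("<DOC><BR></DOC>"))
print(parses("<DOC><BR/></DOC>"))
```

The unclosed <BR> fails because the parser treats it as a start tag that never receives an end tag.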
  4. XML elements that have attributes (name-value pairs) must enclose the attribute values in single or double quotation marks.

Attribute values for XML documents

In HTML, either of the following would be considered correct:
1) <TD WIDTH=25%>
2) <TD WIDTH="25%">

XML, however, requires that all attribute values be enclosed in quotes. In other words, of the two examples above, only the second would be well-formed in XML, and then only if the matching </TD> end tag were also present.
Some authors prefer XML documents with few or no attribute/value pairs, while others convey a great deal of data in them. The concept of attributes and attribute values will be explored later in this course.
For example:
<BOOK ISBN="345671">
      <AUTHOR>James Gosling</AUTHOR>
</BOOK>

In this example, the <BOOK> element has an attribute named ISBN with the value 345671.
Note that the attribute value is enclosed in double quotation marks for the document to be well-formed.
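The quoting rule can be checked with a short sketch that assumes Python's standard library parser. The helper name read_attribute is hypothetical. It returns None when the text is not well-formed, and also when the attribute is simply absent.

```python
import xml.etree.ElementTree as ET


def read_attribute(text, name):
    """Return the root element's attribute value, or None if unavailable."""
    try:
        root = ET.fromstring(text)
    except ET.ParseError:
        # An unquoted attribute value is a fatal error.
        return None
    return root.get(name)


print(read_attribute('<TD WIDTH="25%"></TD>', "WIDTH"))  # double quotes
print(read_attribute("<TD WIDTH='25%'></TD>", "WIDTH"))  # single quotes
print(read_attribute("<TD WIDTH=25%></TD>", "WIDTH"))    # no quotes

book = '<BOOK ISBN="345671"><AUTHOR>James Gosling</AUTHOR></BOOK>'
print(read_attribute(book, "ISBN"))
```

Both quoted forms yield 25%, the unquoted form yields None, and the BOOK example yields 345671.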
  5. XML elements must nest properly: they must be closed in the reverse order in which they were opened.

For example, the following XML document is not well-formed because it violates the rule for correctly nesting elements:
<NAME><FIRST>John</NAME></FIRST>

The correct nesting of these elements would be:
<NAME><FIRST>John</FIRST></NAME>
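One way to picture the nesting rule is as a stack: each start tag is pushed, and each end tag must match the tag on top. The sketch below is a teaching aid rather than a real parser. It assumes ASCII names and attribute values that contain no angle brackets, and the names TAG_RE and nesting_error are hypothetical.

```python
import re

# Matches <NAME ...>, </NAME>, and <NAME .../> (simplified, ASCII names only).
TAG_RE = re.compile(r"<(/?)([A-Za-z_:][A-Za-z0-9_.:-]*)[^<>]*?(/?)>")


def nesting_error(text):
    """Return None if tags nest properly, else a short description."""
    stack = []
    for match in TAG_RE.finditer(text):
        closing, name, self_closing = match.groups()
        if self_closing:
            continue  # an empty element opens and closes at once
        if not closing:
            stack.append(name)  # start tag: remember it
        elif not stack:
            return f"end tag </{name}> has no start tag"
        elif stack[-1] != name:
            return f"expected </{stack[-1]}> but found </{name}>"
        else:
            stack.pop()  # end tag matches the most recent start tag
    if stack:
        return f"missing end tag </{stack[-1]}>"
    return None


print(nesting_error("<NAME><FIRST>John</NAME></FIRST>"))
print(nesting_error("<NAME><FIRST>John</FIRST></NAME>"))
```

For the badly nested example the function reports that it expected </FIRST> but found </NAME>. For the corrected example it returns None.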

The next lesson shows you how to determine the inherent structure of information within XML documents.
