To understand what XML is and how it works, you must know what a markup language is. Markup languages are designed to tell machines, particularly computers, how to process data. The term "markup" derives from early print publishers, who would "mark up" text by hand to indicate to the printer which font size to use where, in which weight, using what form of alignment, and so forth.
In other words, the earliest markup languages were dedicated to passing formatting instructions.
Tags
Markup instructions are generally referred to as tags, and the process of marking up a document is sometimes called tagging. Early word-processing programs required the user to perform manual tagging. Today, most tagging in programs happens behind the scenes, and usually takes place using a proprietary system.
The divergence of methods for tagging text in proprietary systems made it hard for people to exchange data with each other.
With the advent of the Internet, the ability for authors to interchange documents in a format that was easy to use, yet powerful and aesthetically acceptable, became more valuable and more imperative.
Procedural and Logical Markup
Markup specifically designed to affect the appearance of a document is commonly called procedural markup because it instructs the computer (the browser in the case of HTML) how to render, or display, the text. But organizations that process huge numbers of documents, such as government and bureaucratic entities, quickly found that it was more important to know what the data represented than how it looked.
As a result, markup was created that described the content of the page. This type of markup is called descriptive, or logical, markup.
XML documents use markup to describe the semantics of the data included in the documents. Since XML has no predefined tags like those used in HTML, a set of tags is created for each type of document. In that sense, XML is a metalanguage.
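To make the idea of descriptive markup concrete, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree module. The document and its tag names (invoice, customer, total) are invented for this illustration. They are not part of XML itself, which is exactly the point: the author of a document type chooses tags that say what the data is, not how it should look.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: these tag names are made up for the example.
# XML does not predefine them; they describe what each value means.
DOCUMENT = """<invoice number="1001">
  <customer>Ada Lovelace</customer>
  <total currency="USD">149.95</total>
</invoice>"""


def describe(xml_text):
    """Return (tag, text) pairs for each child element of the root."""
    root = ET.fromstring(xml_text)
    return [(child.tag, child.text) for child in root]


if __name__ == "__main__":
    for tag, text in describe(DOCUMENT):
        print(f"{tag}: {text}")
```

Nothing in the document says which font or alignment to use. A program reading it learns only that one value is a customer and another is a total, and is free to present them however it likes.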
The next lesson defines metalanguages and discusses this aspect of XML.
Binary Files
A binary file, at its simplest, is just a stream of bits (1s and 0s). It is up to the application that created the binary file to understand what all of the bits mean. That is why binary files can be read and produced only by programs that have been specifically written to understand them. For example, when you save a document in Microsoft Word, using a version before 2003, the file created (which has a .doc extension) is in a binary format. If you open the file in a text editor such as Notepad, you will not see the document as it appears in Word; at best you will see the occasional line of text surrounded by unreadable characters, with no sign of formatting such as bold or italic. The content of the file other than the actual text is metadata, literally information about information. Mixing data and metadata is both common and straightforward in a binary file.
Metadata can specify things such as which words should be shown in bold, what text is to be displayed in a table, and so on.
To interpret this file, you need the help of the application that created it. Without a converter that has in-depth knowledge of the underlying binary format, you will not be able to open a document created in Word with a similar application such as WordPerfect.
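The way a binary file mixes data and metadata, and why only a program that knows the layout can read it, can be sketched in a few lines of Python. The record layout below is hypothetical and invented purely for illustration (it is not the Word format): one byte of style flags, a two-byte length, then the text itself.

```python
import struct

# Hypothetical layout, assumed for this example only:
# 1 byte of style flags, a 2-byte big-endian text length, then UTF-8 text.
HEADER = struct.Struct(">BH")
BOLD = 0x01


def encode_run(text, bold):
    """Pack a run of text together with its formatting metadata."""
    data = text.encode("utf-8")
    flags = BOLD if bold else 0
    return HEADER.pack(flags, len(data)) + data


def decode_run(blob):
    """Unpack a run; this works only because we know the layout."""
    flags, length = HEADER.unpack_from(blob, 0)
    start = HEADER.size
    text = blob[start:start + length].decode("utf-8")
    return text, bool(flags & BOLD)


if __name__ == "__main__":
    blob = encode_run("Hello", bold=True)
    print(blob)              # the raw bytes, as a text editor would meet them
    print(decode_run(blob))  # the same bytes, read by code that knows the format
```

Viewed as raw bytes, the word "Hello" is recognizable but the three bytes in front of it are meaningless without the layout. Only decode_run, which plays the role of the creating application, can tell that the first byte means bold and the next two give the length of the text.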
The main advantage of binary formats is that they are concise and can be expressed in a relatively small space.
This means that more files can be stored but, more importantly nowadays, less bandwidth is used when transporting these files across networks.
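A rough way to see the size difference is to store the same number three ways. This is only a sketch under simple assumptions: the number is an unsigned 32-bit integer, the text form is its decimal digits in ASCII, and the tag name value is a placeholder chosen for the example.

```python
import struct


def binary_size(value):
    """Bytes needed to store value as a fixed-width unsigned 32-bit integer."""
    return len(struct.pack(">I", value))


def text_size(value):
    """Bytes needed to store value as decimal digits in ASCII."""
    return len(str(value).encode("ascii"))


def marked_up_size(value, tag="value"):
    """Bytes needed when the digits are wrapped in a (hypothetical) tag pair."""
    return len(f"<{tag}>{value}</{tag}>".encode("ascii"))


if __name__ == "__main__":
    n = 1234567890
    print(binary_size(n), text_size(n), marked_up_size(n))
```

For this value the binary form takes 4 bytes, the plain digits take 10, and the tagged form takes 25. The gap depends on the data: a very small number can be shorter as text than as a fixed-width binary field, so the comparison illustrates a general tendency, not a guarantee.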