To quote ``DocBook -- The Definitive Guide'' (see Further Reading at the end of this section), DocBook provides a system for writing structured documents using SGML or XML. In the following, I shall focus on the XML variant of DocBook, because the SGML variant is being phased out.
DocBook has been developed with a slightly different mindset than the systems I discussed in the two previous articles (POD article, LaTeX/latex2html article).
By changing the DTD, almost arbitrary constraints can be imposed on a DocBook document. For example, the organizing committee of a conference might adapt the DocBook DTD in such a way that all the articles in the conference's proceedings have a uniform look and carry all the necessary author information.
The particular DocBook features mentioned above imply uses of DocBook documents that are not possible, at least not easily, with POD or LaTeX documents.
For example, we load the XML::DOM module into Perl to access XML-compliant documents, and Python ships with the xml.dom module, which has been designed for the same purpose.
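As a minimal sketch of such programmatic access, the following Python snippet reads a DocBook fragment with xml.dom.minidom, the lightweight DOM implementation in Python's standard library. The fragment and the helper name chapter_title are made up for this illustration.

```python
from xml.dom import minidom

# Hypothetical DocBook fragment, used only for this illustration.
SOURCE = """<chapter id="chapter-introduction">
  <title>Introduction</title>
  <para>This chapter provides a quick introduction to XYZ.</para>
</chapter>"""


def chapter_title(xml_text):
    """Return the id attribute and the title text of the root element."""
    doc = minidom.parseString(xml_text)
    root = doc.documentElement
    # The first <title> below the root is the root element's own title.
    title = root.getElementsByTagName("title")[0]
    return root.getAttribute("id"), title.firstChild.data


print(chapter_title(SOURCE))
```

The same idea carries over to Perl's XML::DOM, which follows the same W3C DOM interface.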
The World Wide Web Consortium (W3C, http://www.w3c.org) has even defined a language for XML translations, called XSLT (see for example http://www.w3.org/TR/xslt and http://www.oasis-open.org/cover/xsl.html). XSLT itself is a language defined within the XML framework, which makes XML documents and XSLT stylesheets look quite similar: loads of angle brackets.
Popular transformation tools are:
Installing both tools together with the necessary DSSSL or XSL stylesheets is quite tricky, so I recommend that beginners install them from .deb or .rpm packages.
Being general-purpose translators, neither tool is restricted to transforming DocBook documents. If you feed them the right style sheets, they will do other translations, too.
The DocBook/XML syntax resembles HTML. The fundamental difference between the two is the strictness with which the syntax is enforced. Many HTML browsers are extremely forgiving about unterminated elements, and they often silently ignore unknown elements or attributes. DocBook/XML translators reject input that does not comply with the DTD, print detailed error messages, and refuse to produce any output in such cases.
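To get a feeling for this strictness, here is a small sketch in Python. Note the assumption: xml.dom.minidom does not validate against a DTD, so the check below only covers well-formedness (properly nested and terminated elements), which is a weaker test than the one a DocBook translator applies. The helper name is_well_formed and both sample strings are hypothetical.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def is_well_formed(xml_text):
    """Return True if xml_text parses as XML, False otherwise.

    This checks well-formedness only; it does NOT validate against a DTD.
    """
    try:
        minidom.parseString(xml_text)
    except ExpatError:
        return False
    return True


# Every element is closed: accepted.
GOOD = "<para>An <emphasis>important</emphasis> point.</para>"
# <emphasis> is never closed: a browser would shrug, an XML parser refuses.
BAD = "<para>An <emphasis>important point.</para>"

print(is_well_formed(GOOD), is_well_formed(BAD))
```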
DocBook/XML is spoken in several variants, which differ in how they interpret the closing tag of an element. The most verbose dialect always closes <tag> with </tag>. Another variant allows for abbreviating the closing tag to </>, and yet another allows dropping the closing tag for empty elements altogether. I prefer writing out every end tag, a style that has proven advantageous in deeply nested structures such as nested lists. So, in this article only the form <tag> ... </tag> will appear.
Special characters are written with the ampersand-semicolon convention as they are in HTML. The most frequently used special characters are ``&amp;'' for ``&'', ``&lt;'' for ``<'', and ``&gt;'' for ``>''.
Comments are bracketed between ``<!--'' and ``-->''.
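When text is generated by a program, the ampersand-semicolon convention need not be applied by hand. As a sketch, Python's standard library offers xml.sax.saxutils.escape and unescape for exactly these three characters; the sample string is made up for this illustration.

```python
from xml.sax.saxutils import escape, unescape

# Hypothetical piece of text containing all three special characters.
raw = "if (a < b && b > c)"

# escape() replaces &, <, and > with their entity references.
encoded = escape(raw)
print(encoded)

# unescape() reverses the replacement.
print(unescape(encoded))
```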
As already mentioned, DocBook documents must adhere to the structure that is defined in a DTD. Every document starts with selecting a particular DTD:
    <!DOCTYPE                                         (1)
      book                                            (2)
      PUBLIC "-//OASIS//DTD DocBook XML V4.1//EN"     (3)
      "/usr/share/sgml/db41xml/docbookx.dtd"          (4)
      [ ]                                             (5)
    >
where I have broken the expression (from ``<'' to ``>'') into several lines for easier analysis, and added numbers in parentheses for reference.
Part (1) tells the system that we are about to choose our DTD. Part (2) defines element book to be the root element of our document. Part (3), the public identifier, selects the DTD to use. The public identifier is the string in quotes. The system identifier, part (4), tells the translation tools where to find the DTD on the local computer system. Within the square brackets, part (5), we could place so-called entity definitions, but I do not want to go into detail on entities in this introduction, so we leave this space empty.
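The parts of the declaration are also visible to programs. The following Python sketch reads them back through the doctype node of xml.dom.minidom. It assumes that the parser does not try to load the DTD file itself, which is the default behaviour of minidom as far as I know, so the path need not exist on your machine. The helper name doctype_parts and the tiny document body are made up.

```python
from xml.dom import minidom

# The declaration from above, followed by a minimal, hypothetical book.
SOURCE = """<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.1//EN"
    "/usr/share/sgml/db41xml/docbookx.dtd" []>
<book><title>XYZ</title></book>"""


def doctype_parts(xml_text):
    """Return root element name, public identifier, and system identifier."""
    doctype = minidom.parseString(xml_text).doctype
    return doctype.name, doctype.publicId, doctype.systemId


for part in doctype_parts(SOURCE):
    print(part)
```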
Now, we start the text with the root element, in our case book. What elements go into book is defined in the DocBook DTD. These are, for example, bookinfo or chapter. For a comprehensive list of allowed elements, consult ``The Definitive Guide''. The elements allowed within bookinfo or chapter are also defined in the DocBook DTD, as are all elements. The only way to construct a valid document is to obey all the rules prescribed by the DTD.
What might look like a drag at first sight -- Rules? Rules suck! -- is the key to opening up the document to programmatic access. As the document complies with the DTD, all post-processing can rely on that very fact. Good for the programmers of the post-processors! I have to admit that the number of elements and the elements' mutual relationships are tough to pick up. However, the relations are logical: a chapter contains one or more (introductory) paragraphs and one or more Level 1 sections. No section, on the other hand, contains a chapter; that would be nonsense. Having a copy of ``The Definitive Guide'' right next to the keyboard also helps to learn DocBook. Further down, there is a short compilation of commonly used tags.
Here comes a very short, but complete DocBook document.
    <!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.1//EN"
        "/usr/share/sgml/db41xml/docbookx.dtd" []>

    <book>
      <bookinfo>
        <title>XYZ (version 0.8.15) User's Manual</title>
      </bookinfo>

      <chapter id = "chapter-introduction">
        <title>Introduction</title>

        <para>
          This chapter provides a quick introduction to XYZ.
        </para>

        <sect1 id = "section-syntax">
          <title>Syntax</title>

          <para>
            In this section we present an outline of the syntax of the XYZ language.
          </para>
        </sect1>

        <sect1 id = "section-core-library">
          <title>Core Library</title>

          <para>
            Even if no additional libraries are loaded to a XYZ program, it has access to some core library functions.
          </para>
        </sect1>
      </chapter>

      <chapter id = "chapter-commands">
        <title>Commands</title>

        <sect1 id = "section-interactive-commands">
          <title>Interactive Commands</title>

          <para> ... </para>

          <sect2 id = "section-interactive-commands-argumentless">
            <title>Argumentless Commands</title>

            <para> ... </para>
          </sect2>
        </sect1>

        <sect1 id = "section-non-interactive-commands">
          <title>Non-Interactive Commands</title>

          <para> ... </para>

          <sect2 id = "section-non-interactive-commands-argumentless">
            <title>Argumentless Commands</title>

            <para> ... </para>
          </sect2>
        </sect1>
      </chapter>
    </book>
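Because the nesting of chapters and sections is fixed by the DTD, a post-processor can rely on it. As a sketch of that idea, the Python code below walks an abridged copy of the example document and collects a table of contents. The helper names (child_elements, title_of, outline) are my own, and the code assumes every sectioning element has a title child, as the document above does.

```python
from xml.dom import minidom

# Abridged copy of the example document (DOCTYPE omitted for brevity).
SOURCE = """<book>
  <bookinfo><title>XYZ (version 0.8.15) User's Manual</title></bookinfo>
  <chapter id="chapter-introduction">
    <title>Introduction</title>
    <sect1 id="section-syntax"><title>Syntax</title><para>...</para></sect1>
  </chapter>
  <chapter id="chapter-commands">
    <title>Commands</title>
    <sect1 id="section-interactive-commands">
      <title>Interactive Commands</title>
      <sect2 id="section-interactive-commands-argumentless">
        <title>Argumentless Commands</title><para>...</para>
      </sect2>
    </sect1>
  </chapter>
</book>"""

SECTIONING = ("chapter", "sect1", "sect2", "sect3", "sect4", "sect5", "sect6")


def child_elements(node):
    """Return the element children of node, skipping whitespace text."""
    return [c for c in node.childNodes if c.nodeType == c.ELEMENT_NODE]


def title_of(element):
    """Return the text of the element's own <title> child, or ''."""
    for child in child_elements(element):
        if child.tagName == "title":
            return "".join(t.data for t in child.childNodes
                           if t.nodeType == t.TEXT_NODE).strip()
    return ""


def outline(node, depth=0):
    """Collect (depth, id, title) for all sectioning elements below node."""
    entries = []
    for child in child_elements(node):
        if child.tagName in SECTIONING:
            entries.append((depth, child.getAttribute("id"), title_of(child)))
            entries.extend(outline(child, depth + 1))
    return entries


root = minidom.parseString(SOURCE).documentElement
for depth, ident, title in outline(root):
    print("  " * depth + title + " [" + ident + "]")
```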
To help the aspiring DocBook writer make sense of the many elements that the DocBook standard defines, I have compiled a list of useful, frequently used tags.
Root section tags define the outermost element of any document.
book
<book>
paragraphs or chapters
</book>
article
<article>
paragraphs or level 1 sections
</article>
Sectioning elements divide the document into logical parts like chapters, sections, paragraphs, and so on.
chapter, sect1, ..., sect6
<chapter>
title followed by paragraphs or level N+1 sections
</chapter>
Define a section. Commonly, chapter and section elements carry the id attribute, which allows for referencing the elements with, for example, <xref linkend = "label"></xref>.
para
<para>
paragraph text
</para>
Group several lines of text together to form a paragraph. This is the workhorse element in many documents.
programlisting
<programlisting>
program text
</programlisting>
Render a longish piece of program text -- preserving the line breaks. The program is assumed to be written in the language specified in the role attribute. Note that within programlisting all special characters retain their meaning! This means in particular that you cannot use the control characters ``<'', ``>'', and ``&'' inside of it. There are several workarounds for this problem. Either you replace all control characters with their mnemonic equivalents (``&lt;'', ``&gt;'', and ``&amp;'' in our example), or you wrap the program code in a CDATA section, like, for example,
    <programlisting>
    <![CDATA[
    cout << "value = <" << &p << ">\n";
    ]]>
    </programlisting>
or, if the program is stored in file my-program.pl, pull in the whole file with
    <programlisting>
      <inlinemediaobject>
        <imageobject>
          <imagedata format = "linespecific" fileref = "my-program.pl"></imagedata>
        </imageobject>
      </inlinemediaobject>
    </programlisting>
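The first two workarounds are equivalent from the parser's point of view. The Python sketch below, again with xml.dom.minidom, parses the same line of program text once with mnemonic equivalents and once wrapped in a CDATA section, and compares what comes out. The helper name listing_text is hypothetical.

```python
from xml.dom import minidom

# The same program line, once with entities, once in a CDATA section.
# Raw strings keep the backslash in "\n" as two literal characters.
ESCAPED = (r'<programlisting>cout &lt;&lt; "value = &lt;" &lt;&lt; &amp;p '
           r'&lt;&lt; "&gt;\n";</programlisting>')
WRAPPED = (r'<programlisting><![CDATA[cout << "value = <" << &p '
           r'<< ">\n";]]></programlisting>')


def listing_text(xml_text):
    """Return the character data of the root element as one string."""
    root = minidom.parseString(xml_text).documentElement
    # Text nodes and CDATA section nodes both expose their content as .data.
    return "".join(node.data for node in root.childNodes)


print(listing_text(ESCAPED))
print(listing_text(ESCAPED) == listing_text(WRAPPED))
```

For longer listings the CDATA form is usually easier to maintain, because the program text stays readable in the source.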
Generate the three typical types of lists.
The items or definitions are typically formed by one or more paragraphs, but they are allowed to contain program listings, too. The terms usually are one or more words, not paragraphs.
<itemizedlist>
<listitem>
first item
</listitem>
<listitem>
second item
</listitem>
...
</itemizedlist>
<orderedlist>
<listitem>
first item
</listitem>
<listitem>
second item
</listitem>
...
</orderedlist>
<variablelist>
<varlistentry>
<term>first term</term>
<listitem>
first definition
</listitem>
</varlistentry>
<varlistentry>
<term>second term</term>
<listitem>
second definition
</listitem>
</varlistentry>
...
</variablelist>
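Lists with many entries are good candidates for generation by a program. As a rough sketch, the Python function below builds a variablelist from (term, definition) pairs with xml.dom.minidom. It wraps each definition in a single para element, which is an assumption on my side about how you want the items formed; the function name and the sample entries are made up.

```python
from xml.dom import minidom


def variablelist(entries):
    """Build a <variablelist> element from (term, definition) pairs."""
    doc = minidom.Document()
    vlist = doc.createElement("variablelist")
    for term_text, definition in entries:
        term = doc.createElement("term")
        term.appendChild(doc.createTextNode(term_text))

        # Each definition becomes one paragraph inside the list item.
        para = doc.createElement("para")
        para.appendChild(doc.createTextNode(definition))
        item = doc.createElement("listitem")
        item.appendChild(para)

        entry = doc.createElement("varlistentry")
        entry.appendChild(term)
        entry.appendChild(item)
        vlist.appendChild(entry)
    return vlist


# Hypothetical entries; special characters are escaped on output.
options = [("-v", "Print <verbose> output."), ("-q", "Be quiet.")]
print(variablelist(options).toprettyxml(indent="  "))
```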
emphasis
Highlight a short part of the document; usually a single word.
filename
Mark a word as a filename.
literal
<literal role = "classification">literal something</literal>
Mark a word as being a literal expression. Use this tag only as a last resort, if no more specific tag matches. To calm one's bad conscience, literal often gets decorated with a role attribute, which describes more precisely the kind of literal.
replaceable
Mark a meta-variable.
title
Give a name to a section or a formal element, like a table.
Cross references refer to other parts of the same DocBook document or to other documents on the World Wide Web. Targets of the former are all elements that carry an id attribute; targets of the latter are selected with uniform resource locators (URLs).
link
Install a (hyper-)link to the spot identified via target within the current document.
ulink
Install a hyper-link to a WWW-accessible document identified by a complete URL. A complete URL includes the protocol, for example, http://.
xref
Install a (hyper-)link to the spot identified via target within the current document. A translator will add text around an xref element. For example, an xref to a section might be decorated with the text ``see section''.
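Internal cross references only work if their targets exist. As a sketch of a simple consistency check, the Python function below collects all id attributes and reports every linkend of a link or xref element that points nowhere. It assumes the attribute names id and linkend as used in this article; the function name dangling_references and the sample document are hypothetical.

```python
from xml.dom import minidom

# Hypothetical document with one valid and one dangling cross reference.
SOURCE = """<chapter id="chapter-commands">
  <title>Commands</title>
  <sect1 id="section-syntax">
    <title>Syntax</title>
    <para>See <xref linkend="chapter-commands"></xref> and
    <link linkend="section-missing">the missing section</link>.</para>
  </sect1>
</chapter>"""


def dangling_references(xml_text):
    """Return the sorted linkend values that match no id in the document."""
    doc = minidom.parseString(xml_text)
    ids = set()
    targets = []
    # "*" matches every element in the document, including the root.
    for element in doc.getElementsByTagName("*"):
        if element.hasAttribute("id"):
            ids.add(element.getAttribute("id"))
        if element.tagName in ("xref", "link") and element.hasAttribute("linkend"):
            targets.append(element.getAttribute("linkend"))
    return sorted(t for t in targets if t not in ids)


print(dangling_references(SOURCE))
```

A validating translator performs this check on its own; the sketch is merely a quick test that needs no DTD.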
Ugh, I left out tons of stuff, but only to give you a smooth, non-frightening introduction. Some great things DocBook handles that I have not discussed are
Also left out is everything related to changing the DTD or changing the style sheets.
Next month: Texinfo