Linux, Java and XML

Eoin Lane
eoinlane@esatclear.ie

This article is a basic introduction to the new web markup
language XML and the transformation language XSL. Here I show
how the Apache web server can be configured, using the servlet engine
JServ, to do server side XML/XSL transformation using Apache's Cocoon servlet.

Future updates for this article will be located at http://www.inconn.ie/article/cocoon.htm.

Introduction

The eXtensible Markup Language (XML) is a powerful new web markup language (a W3C Recommendation since February 1998). It offers a clean way of separating web content and
style. A lot has been written about XML, but to be used effectively in web design the technologies behind it must be understood. To this end I have added my own two pence worth to the already
vast amount of literature out there on the subject. This article is not
a place to learn XML, nor is it a place where the capabilities of XML are
explored to their fullest, but it is a place where the technologies behind XML can be
put into practice immediately.

Before I go any further, I should recommend the two sites where
definitive information on XML can be obtained. The first is the World
Wide Web Consortium (W3C) site, http://www.w3.org/. The W3C are responsible for the XML specification. The second site is the
XML frequently asked
questions site (http://www.ucc.ie/xml/),
which will answer any other questions. I also recommend the XML
pages hosted by IBM,
http://www.ibm.com/xml/,
where you will find a wide range of excellent tutorials and articles on XML.

SGML (around since 1986, well before the web) is the mother of all mark-up
languages. SGML can be used to document any
conceivable system; from complex aeronautical design to ancient Chinese
dialects. However, it
suffers from being overly complex and unwieldy for routine web
applications. HTML is basically a very cut down version
of SGML, originally designed with the scientific publishing community
in mind. It is a
simple mark-up language (it has been said "anyone with a pulse
can learn it") and with the explosion of the web it is clear that the people with pulses have spoken. Since its foundation the web has
grown in complexity and it has long outgrown its lowly beginning in the
scientific community.

Today web pages need to be dynamic, interactive,
back-ended with databases, secure and eye catching to compete in an ever
more crowded cyberspace. Enter XML, a new mark-up language to deal
with the complexities of modern web design. XML is only 20 percent as
complex as SGML and can handle 80 percent of SGML situations (believe me
when you are talking about coding ancient Chinese dialects, 80 percent
is plenty). In the following section I will briefly compare two markup examples, one in HTML and the other in XML, demonstrating the benefits of an XML approach. In the final section I will show you
how to set up an Apache web server to serve an XML document so
that you can start using XML in your web design immediately.

HTML

The following example is a very simple HTML document that everyone will be familiar with:
Two important points can be made about this document.
First, the content and style are tied together in the document. Second, it would be very difficult for a search program to search
this document and extract the mail address of Eoin Lane.
XML addresses these two issues.

XML

The XML equivalent is as follows:
The first thing to note is that this document, like all
other valid XML
documents, is well formed. To be well formed,
every opening tag in a document must have a matching closing tag. A program
searching for the mail address then has only to locate the text in between
the opening and closing tags of mail.
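To make the point about searching concrete, here is a minimal shell sketch. It is not the article's own listing: the document it writes is a hypothetical stand-in, and the author and name element names are assumptions (only the mail element is named in the text). The extract_mail helper is likewise made up for illustration.

```shell
# Write a small, hypothetical XML document to a temporary file.
# The <author> and <name> element names are assumptions for illustration.
xml_file=$(mktemp)
cat > "$xml_file" <<'EOF'
<?xml version="1.0"?>
<author>
  <name>Eoin Lane</name>
  <mail>eoinlane@esatclear.ie</mail>
</author>
EOF

# Print the text found between <mail> and </mail>.
# This only works when the whole element sits on one line.
extract_mail() {
    sed -n 's:.*<mail>\(.*\)</mail>.*:\1:p' "$1"
}

mail=$(extract_mail "$xml_file")
echo "$mail"
rm -f "$xml_file"
```

A line-based sed expression like this is only good enough for a quick demonstration. A real search program would use a proper XML parser, which is exactly what well formed markup makes possible.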
The second and crucial point is that this XML document contains just data. There is nothing in this document
that dictates how to display the author's name or his mail address. In practice it is easier to
think about web design in terms of data and presentation separately. In
the design of medium to large web sites, where all the pages have the
same look and only the data is changing from page to page, this is
clearly a better solution. It also allows a division of labour where style and content can be
handled by two different departments, working independently. Finally, it makes it possible to have one set of data with a number of ways of presenting
it.
An XML document can be presented using two different methods. One is to use a Cascading Style Sheet (CSS) (see http://www.w3.org/style/css/) to style the text, as is done with
HTML. The second is to use a transformation language called XSL, which
converts the XML document into HTML, XML, PDF, PostScript, or
LaTeX. As to which one to use, the W3C (the people responsible for these specifications) has this to say:

    Use CSS when you can, use XSL when you must.

They go on to say:
The reason is that CSS is much easier to use, easier to
learn, thus easier to maintain and cheaper. There are WYSIWYG editors
for CSS and in general there are more tools for CSS than for XSL. But
CSS's simplicity means it has its limitations. Some things you cannot
do with CSS, or with CSS alone. Then you need XSL, or at least the
transformation part of XSL.

So what are the things you cannot do with
CSS? In general everything that needs transformations. For example, if
you have a list and want it displayed in lexicographical order, or if
words have to be replaced by other words, or if empty elements have to
be replaced by text. CSS can do some text generation, but only for generating small things, such as numbers of section headers.

XSL

XSL (eXtensible Stylesheet
Language, http://www.w3.org/style/xsl/) is
the language used to transform and display XML documents. It is not yet finished so
beware! It is a complex document formatting
language whose stylesheets are themselves XML documents. It can be further subdivided
into two parts: transformation (XSLT) and formatting objects (sometimes
referred to as FO, XSL:FO or simply XSL). For the sake of simplicity I
will only deal with XSLT here.

XSL Transformations (XSLT)

On the 16th of November 1999 the World Wide Web Consortium
announced the publication of XSLT as a W3C Recommendation. This
basically means that XSLT is stable and is not expected to change in the
near future. The above XML document can be transformed into an HTML document and
subsequently displayed on any browser using the following XSLT file.
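As a rough illustration of what such a stylesheet can look like, the sketch below writes a small XML document and an XSLT 1.0 stylesheet to a temporary directory. This is not the author's original listing. The author and name element names are assumptions, and so is xsltproc: it is a separate command line XSLT processor, not part of the Cocoon setup described in this article, and it is used only if it happens to be installed.

```shell
# Work in a throwaway directory; remove "$workdir" when finished.
workdir=$(mktemp -d)

# Hypothetical input document (element names are assumptions).
cat > "$workdir/author.xml" <<'EOF'
<?xml version="1.0"?>
<author>
  <name>Eoin Lane</name>
  <mail>eoinlane@esatclear.ie</mail>
</author>
EOF

# A minimal XSLT 1.0 stylesheet that turns the document into HTML.
cat > "$workdir/author.xsl" <<'EOF'
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>
  <xsl:template match="/author">
    <html>
      <body>
        <h1><xsl:value-of select="name"/></h1>
        <p><xsl:value-of select="mail"/></p>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
EOF

# Apply the stylesheet only if the (assumed) xsltproc tool is available.
if command -v xsltproc >/dev/null 2>&1; then
    xsltproc "$workdir/author.xsl" "$workdir/author.xml"
else
    echo "xsltproc not found; stylesheet written to $workdir/author.xsl"
fi
```

The template matches the root author element and copies the name and mail values into HTML markup. All presentation decisions live in the stylesheet, so the XML document itself stays pure data.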
To learn more about XSLT, I recommend the XSLINFO site
(http://www.xslinfo.com/)
as a good starting point. I also found the revised Chapter 14 of the
XML
Bible (http://metalab.unc.edu/xml/books/bible/updates/14.html)
to be very good. This revision is based on the specifications that
eventually became the recommendation.

With the arrival of the next generation of browsers,
i.e. Netscape 5 (currently under construction at http://www.mozilla.org/),
this transformation will be done client side. When an XML
file is requested the
corresponding XSL file will be sent along with it, and the transformation will be done by
the browser. Currently a lot of browsers are only capable of
displaying HTML, and until that changes the transformation must be done server
side. This can be accomplished by using Java
servlets (Java server side programs).

The Cocoon servlet is such a
servlet, written by some very clever people at Apache (http://www.apache.org/). It basically takes
an XML document and transforms it using an XSL document. An example of
such a transformation would be to convert the XML document into HTML
so that the browser can display it. So if your web
server is configured to run servlets, and you include the Cocoon
servlet, then you can start designing your web pages using XML. The rest of this article will show exactly how to do this.

How do I do it?

I have tested the following instructions on a fresh installation of Red Hat 6.0, so I know they work.

Apache Web Server

First set up the Apache web server. On Red Hat this comes
pre-installed, but I want you to blow it away using:

    rpm -e --nodeps apache

and do not worry about the error
messages. Next get hold of the most recent Apache (http://www.apache.org/) (currently version 1.3.9) and copy it somewhere handy. I put mine in
/usr/local/src. Untar and unzip the file using:

    tar zxvf apache_1.3.9.tar.gz

This will
expand the installation into the directory
/usr/local/src/apache_1.3.9. Change into this directory
and configure, build and install the application using the
following:

    ./configure --prefix=/usr/local/apache --mandir=/usr/local/man --enable-shared=max
    make
    make install
This will install Apache into the directory
/usr/local/apache. The important file to note here is
httpd.conf, which can be found in the directory
/usr/local/apache/conf. This file contains most of the
important information necessary to run Apache correctly. It contains
information on where to serve the web documents from, virtual web
servers and folder aliases. We will be returning to this file shortly, so become familiar with its general
layout. At this stage I had to reboot Linux and then start Apache using the following
instruction:

    /usr/local/apache/bin/apachectl start

To test it, point your web browser to http://localhost/ and
you're in business, hopefully!
For good web design and planning I would refer you to an article that
I found invaluable in setting up my own web site: Better Web Site
Design under
Linux (http://www.linuxgazette.com/issue43/gibbs/Web_Design.html).

Java and JSDK

As of October, IBM have released the Java Development Kit 1.1.8 for
Linux. It claims to be faster than the corresponding Blackdown
(http://www.blackdown.org/)
and Sun
JDKs. Download the IBM JDK (see
http://www.ibm.com/java/).
Again untar and unzip this into the
/usr/local/src/jdk118 directory. Next, download
JavaSoft's JSDK2.0 (http://java.sun.com/products/servlet/), the Solaris version (not JSDK2.1 or any other flavours you might be
tempted to get), and untar and unzip it. Again I put it in
/usr/local/src/JSDK2.0. Add the following or equivalent
to /etc/profile to make them available to your system:

    JAVA_HOME="/usr/local/src/jdk118"
    JSDK_HOME="/usr/local/src/JSDK2.0"
    CLASSPATH="$JAVA_HOME/lib/classes.zip:$JSDK_HOME/lib/jsdk.jar"
    PATH="$JAVA_HOME/bin:$JSDK_HOME/bin:$PATH"
    export PATH CLASSPATH JAVA_HOME JSDK_HOME
To test them,
run java -version at the command prompt. You should get back the
following message:

    java version "1.1.8"

To test the servlet development kit, run
servletrunner. If all goes well you
should get back the following:

    servletrunner starting with settings:
    port = 8080
    backlog = 50
    max handlers = 100
    timeout = 5000
    servlet dir = ./examples
    document dir = ./examples
    servlet propfile = ./examples/servlet.properties
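If either test fails, a common cause is a CLASSPATH entry that points at a file which is not there. The following sketch, which is my own addition and uses a hypothetical helper name (missing_classpath_entries), lists every entry of a colon separated class path that does not exist on disk.

```shell
# Print every entry of a colon-separated class path that does not
# exist on disk, one per line. The function name is hypothetical.
missing_classpath_entries() {
    old_ifs=$IFS
    IFS=:
    for entry in $1; do
        # Skip empty entries produced by "::" or a trailing colon.
        [ -n "$entry" ] || continue
        [ -e "$entry" ] || printf '%s\n' "$entry"
    done
    IFS=$old_ifs
}

# Check the class path currently set in the environment (may be empty).
missing_classpath_entries "${CLASSPATH:-}"
```

No output means every entry was found. With the settings above, an empty result suggests that classes.zip and jsdk.jar are where /etc/profile says they are.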
We are now ready to install Apache's servlet engine, ApacheJServ.

ApacheJServ

Again, download the latest ApacheJServ (version 1.0 at this time,
although version 1.1 is in its final beta stage) from Apache's Java site
(http://java.apache.org/)
and expand it into /usr/local/src/ApacheJServ-1.0/. Configure, make and
install it using the following instructions:

    ./configure --with-apache-install=/usr/local/apache --with-jsdk=/usr/local/src/JSDK2.0
    make
    make install

When this has successfully completed, add the following line to the end
of the httpd.conf file that I referred to earlier during the Apache web
server installation:

    Include /usr/local/src/ApacheJServ-1.0/example/jserv.conf

and restart the web server using:

    /usr/local/apache/bin/apachectl restart

Now
comes the moment of truth. Point your web browser to
http://localhost/example/Hello
and if you get back the following two lines:

    Example Apache JServ Servlet
    Congratulations, Apache JServ is working!

then you are almost home.

Cocoon

Finally, download the latest version of Cocoon (version 1.5 at this time) from Apache's Java site
(http://java.apache.org/).
Cocoon is distributed as a Java jar file and can be extracted using the
jar command. First, create the directory
/usr/local/src/cocoon and then expand the Cocoon jar file
into it:

    mkdir /usr/local/src/cocoon
    jar -xvf Cocoon_1.5.jar

Now comes the tricky part:
configuring the JServ engine to recognise a file with a
.xml extension and to use the Cocoon servlet to process and
serve it.
Locate the file jserv.properties, which you will find in the
directory /usr/local/src/ApacheJServ-1.0/example/, and at
the end of the section that begins:

    # CLASSPATH environment value passed to the JVM

add lines of the following form:

    wrapper.classpath=/usr/local/src/cocoon/bin/xxx.jar

In the case of Cocoon 1.5 this means adding the following three lines:

    wrapper.classpath=/usr/local/src/cocoon/bin/fop.0110.jar
    wrapper.classpath=/usr/local/src/cocoon/bin/openxml.106-fix.jar
    wrapper.classpath=/usr/local/src/cocoon/bin/xslp.19991017-fix.jar
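Rather than typing the wrapper.classpath lines above by hand, they can be generated from whatever jar files are actually present. The sketch below is an illustration under assumptions: the classpath_lines helper and the COCOON_BIN variable are hypothetical names, and Cocoon.jar is skipped on the assumption that it belongs in the repositories setting rather than in wrapper.classpath.

```shell
# Print one wrapper.classpath line per jar file in the given directory.
# Cocoon.jar is skipped (assumption: it is listed under "repositories").
classpath_lines() {
    for jar in "$1"/*.jar; do
        # If the glob matched nothing, the literal pattern is not a file.
        [ -e "$jar" ] || continue
        case "$(basename "$jar")" in
            Cocoon.jar) continue ;;
        esac
        printf 'wrapper.classpath=%s\n' "$jar"
    done
}

# COCOON_BIN is a hypothetical override; the default is the path used above.
classpath_lines "${COCOON_BIN:-/usr/local/src/cocoon/bin}"
```

Review the output before pasting it into jserv.properties, since a bin directory may contain jars you do not want on the JVM class path.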
The exact jar file names will change between Cocoon versions. The next file to locate is the example.properties file,
again found in the /usr/local/src/ApacheJServ-1.0/example/
directory. Add the Cocoon jar to its repositories setting:

    repositories=/usr/local/src/cocoon/bin/Cocoon.jar

In my example.properties file it meant changing the line:

    repositories=/usr/local/src/ApacheJServ-1.0/example

to the following:

    repositories=/usr/local/src/ApacheJServ-1.0/example,/usr/local/src/cocoon/bin/Cocoon.jar

Also add the following line to the end of the
example.properties file:

    servlet.org.apache.cocoon.Cocoon.initArgs=properties=/usr/local/src/cocoon/bin/cocoon.properties

The JServ engine is now properly configured and all that is left
for us to do is to tell Apache to direct any call to an XML file (or
any other file you want Cocoon to process) to the Cocoon servlet. For
this we need the JServ configuration file,
jserv.conf, mentioned earlier (again in the same directory). Include the following line:

    ApJServAction .xml /example/org.apache.cocoon.Cocoon

In order to
access the Cocoon documentation and examples, add the following lines to
the alias section of
your httpd.conf file:

    Alias /xml/ "/usr/local/src/cocoon/"
    Alias /xml/example/ "/usr/local/src/cocoon/example/"

Restart the web server for this to take effect:

    /usr/local/apache/bin/apachectl restart
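With edits spread over several files, it is easy to miss one. As a rough aid, the sketch below searches each file for a fragment of the line that should have been added. The check_config helper and the JSERV_DIR variable are hypothetical names of my own, and the paths assume the layout used in this article, with the Apache configuration file at /usr/local/apache/conf/httpd.conf.

```shell
# Report whether a file contains a given literal string.
# Returns non-zero when the string (or the file) is missing.
check_config() {
    if grep -qF -- "$2" "$1" 2>/dev/null; then
        echo "ok: $2"
    else
        echo "MISSING in $1: $2"
        return 1
    fi
}

# Paths below assume the install locations used in this article.
JSERV_DIR=/usr/local/src/ApacheJServ-1.0/example
check_config "$JSERV_DIR/jserv.conf" "ApJServAction .xml" || true
check_config "$JSERV_DIR/example.properties" "/usr/local/src/cocoon/bin/Cocoon.jar" || true
check_config /usr/local/apache/conf/httpd.conf "Alias /xml/" || true
```

This only confirms that the text is present somewhere in each file. It does not check that the line sits in the right section or that Apache accepts the resulting configuration.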
Now point your browser to http://localhost/xml/
to browse the documentation and
http://localhost/xml/example/
to try out the examples. If Cocoon complains about exceeding a memory limit, open the file cocoon.properties found in the /usr/local/src/cocoon/ directory. Find the line

    store.memory = 150000

and change it to something lower like 15000. To try out the PDF examples, which I think
are very cool, you have to have Acrobat Reader installed as a
Netscape plug-in, but it is worth the extra effort to get this
working.

Cocoon 2

The Cocoon 1.x series has basically been a work in progress.
What started out as a simple servlet for static XSL transformation has grown into
something much more. With this ongoing development, design
decisions taken at the beginning of the project are now hampering future
development as the scale and the scope of the project become
apparent. To add to this, XSL is also a work in progress,
although the current version of XSLT has become a W3C Recommendation (as of November 16, 1999).
Cocoon 2 intends to address these issues and provide us with a
servlet for XML transformations that scales to handle large quantities
of web traffic. Web design of medium to large sites in the
future will be based entirely around XML, as its benefits become apparent, and the Cocoon 2 servlet will hopefully provide us with a way to use it effectively.

Conclusions

Even as I have
been writing this article, Apache have opened a new site dedicated exclusively to
XML
(see http://xml.apache.org/).
The Cocoon project has obviously grown beyond all expectations, and with
the coming of Cocoon 2 it will be a commercially viable servlet that
makes the design of web sites in XML a reality. The people at
Apache deserve a lot of credit for this, so write to them and thank them,
join the mailing list and generally lend your support. After
all, this is open source code and this is what Linux is all about.