Put XML to work in your applications
Beyond the hype, the Web-friendly language plays well on e3000s
By Mark Wonsil
In recent months, the hype behind the eXtensible Markup Language (XML) has finally started to wane. Left behind are two diametrically opposed groups: those who believe XML is the solution for everything, and those who believe XML has no use whatsoever.
Regardless of these claims, it is worthwhile to look at one of the goals behind XML: to separate content from presentation. In practice, this means the serving program makes no assumptions about the client. The client can be a Web browser, a cell phone, a personal digital assistant, or a program running on an HP e3000.
Conversely, if a program on an e3000 creates a well-formed XML data stream, it could work with many clients. The client doesn't care if the data comes from SQL Server on an NT box, MS Access on a Windows machine, Oracle running on HP-UX, or IMAGE/SQL running on MPE/iX. XML data looks the same regardless of its source.
In the past year, some relational database vendors, including both Oracle and Microsoft, have added XML capabilities to their database products. These products will accept SQL statements and return the result sets as simple marked-up text streams. One of the reasons database vendors are turning to XML is the fact that XML is just plain text. Text can pass through firewalls using HTTP and can be read by a myriad of operating systems.
While this sounds like good news for Java and C++ programmers, it is problematic to manipulate XML files in record-oriented programming languages like COBOL, Fortran and Basic. Fortunately, there is a way to represent XML files in a record-oriented manner. The key is to know that XML is a scaled-down version of SGML.
SGML to ESIS
SGML has been around for quite some time; certainly before the proliferation of object-oriented languages. The developers of SGML used Unix, and Unix provides a large suite of record-oriented tools like sed, grep, and awk among others.
James Clark's sgmls utility, built on the ARCSGML parser written by Charles Goldfarb, the father of SGML, exported SGML documents in a record-oriented format for use by the standard Unix utilities. The format is a representation of the Element Structure Information Set (ESIS). In this syntax, the first character of each record determines the type of data in the rest of the record. There are record codes for the start of an element, the end of an element, attributes and data. A Python programmer developed one such ESIS implementation called PYX (see Figure 1). Traditional record-oriented languages such as COBOL are well equipped to handle PYX records. Figure 2 shows a sample XML file, and Figure 3 shows the same file in the PYX format.
In order to create the ESIS stream, you first must have the XML. Since developers can define their own markup languages in XML, how will we know the format of the data? This is one of the popular criticisms of XML.
Every day there seems to be another three- or four-letter acronym announcing yet another markup language. Terry Floyd of the Support Group calls them XTLAs: eXtended Three Letter Acronyms. Fortunately, in the case of databases, there is an unofficial standard called the rowset.
The rowset comes in two flavors. The most popular is one that has one element as the record indicator
We know there are no generic XML parsers for COBOL or Fortran. But one way to do something you cannot do yourself is to ask someone else to do it for you, i.e. do it by proxy. A parsing proxy can create a process which requests an XML resource on the network, parses the stream, converts it to an ESIS representation, and passes it back to our calling program.
Employing Java to parse
Given its strength in handling XML and networking, a Java program, even one running on an e3000, would be a good candidate as a parsing service. There are several XML parsers available for Java, including Xerces (see sidebar).
Most popular parsers can process XML files in one of two ways. The first is by building a Document Object Model (DOM) tree. In this method, the parser loads the entire document into memory, with each element, attribute and piece of data represented by a node on a virtual tree. This method is useful if you want to change the structure of the document by adding or deleting elements. However, a very large record set may exceed the memory resources of the program.
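As a rough illustration of the DOM approach, the sketch below uses Python's standard xml.dom.minidom module rather than a Java parser such as Xerces. The sample rowset, its element names and the add_field helper are all assumptions made up for this example.

```python
from xml.dom.minidom import parseString

# Hypothetical rowset; the element names are invented for illustration.
XML = "<rowset><row><name>Widget</name><qty>12</qty></row></rowset>"


def add_field(xml_text, field, value):
    """Load the whole document into a DOM tree and append a field to every row."""
    doc = parseString(xml_text)          # the entire document is now in memory
    for row in doc.getElementsByTagName("row"):
        node = doc.createElement(field)  # new element node
        node.appendChild(doc.createTextNode(value))
        row.appendChild(node)            # structural change made on the tree
    return doc.documentElement.toxml()


print(add_field(XML, "warehouse", "A1"))
```

Because the tree is held in memory, changing the structure is easy, but memory use grows with the size of the document.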
The second method of parsing XML is called the Simple API for XML (SAX). In this method, the program registers call-back functions for predefined events. Some of these events include the start of the document, the end of the document, a starting tag, an ending tag or data. See Figure 6 for an example in Java, ParsingProxy.java, a parsing program that will convert an XML stream to an ESIS stream.
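Figure 6 gives the Java version. Purely as a companion sketch, the same event-driven idea can be shown with Python's standard xml.sax module. The PyxHandler class and the xml_to_pyx function are hypothetical names. The sketch assumes the PYX convention of writing attribute records after the start tag and escaping newlines in data, and it drops whitespace-only text for brevity.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class PyxHandler(ContentHandler):
    """Collects one PYX-style record per SAX event (a sketch, not a full PYX writer)."""

    def __init__(self):
        super().__init__()
        self.records = []

    def startElement(self, name, attrs):
        self.records.append("(" + name)
        for key in attrs.getNames():
            self.records.append("A%s %s" % (key, attrs.getValue(key)))

    def endElement(self, name):
        self.records.append(")" + name)

    def characters(self, content):
        # A parser may deliver one run of text in several calls;
        # this sketch emits one data record per call.
        if content.strip():
            escaped = content.replace("\\", "\\\\").replace("\n", "\\n")
            self.records.append("-" + escaped)


def xml_to_pyx(xml_text):
    """Parse XML text and return the list of PYX-style records."""
    handler = PyxHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.records


for record in xml_to_pyx('<row id="1"><name>Widget</name></row>'):
    print(record)
```

Since each callback handles one event and nothing is kept except the output records, the parser never needs the whole document in memory.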
Streaming into variables
So now we can pass a request to our parsing service and return an ESIS stream. How does the data stream get into the program's variables? One method is to evaluate the first character of each record and act on it appropriately. (See Figure 7 for an example in COBOL.)
This sample program reads the ESIS stream and processes each record type. When the program reads a start-tag record, which begins with (, it pushes the name of the element onto an element stack. When the program reads a data record, which begins with -, it knows the full path of the element from the values in the element stack. The top element in the stack is the name of the current element. At an end-tag record, which begins with ), the program pops the top element off the element stack. The programmer assigns the values of the element to variables after any necessary type conversion, as all XML data is in string form. While this will do the job, there are other methods we can try to improve data binding.
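The element-stack logic described above is shown in COBOL in Figure 7. The short Python sketch below illustrates the same steps under simplifying assumptions: the function name bind_pyx and the sample records are invented, and attribute records are ignored.

```python
def bind_pyx(records):
    """Walk PYX-style records and return (element path, data) pairs."""
    stack = []   # names of the currently open elements
    pairs = []
    for record in records:
        code, rest = record[:1], record[1:]
        if code == "(":            # start tag: push the element name
            stack.append(rest)
        elif code == ")":          # end tag: pop the current element
            stack.pop()
        elif code == "-":          # data: the stack gives the full path
            pairs.append(("/".join(stack), rest))
        # other record types (for example "A" attributes) are skipped here
    return pairs


SAMPLE = ["(rowset", "(row", "(name", "-Widget", ")name",
          "(qty", "-12", ")qty", ")row", ")rowset"]

for path, value in bind_pyx(SAMPLE):
    # All XML data arrives as strings; convert where the program needs a number.
    if path.endswith("/qty"):
        value = int(value)
    print(path, repr(value))
```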
In a COBOL program, one method would be to have the parsing service take all the data and return it with the fields separated by a delimiting character. In this way, a COBOL program can take the string and use the UNSTRING verb to put the data into the appropriate working storage variables. Once again, this will work well for strings but the programmer will still have to perform some data conversions for computational items. It also requires that the fields be returned by the parsing service in the correct sequence.
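To make the delimiter idea concrete, here is a small Python sketch of both sides of the exchange. It is not COBOL, and the field names, their order and the | delimiter are assumptions chosen for the example. The split plays the role that UNSTRING would play in the COBOL program.

```python
from decimal import Decimal

# Hypothetical field sequence that the service and the caller agree on in advance.
FIELD_ORDER = ("name", "qty", "price")


def join_fields(row, delimiter="|"):
    """Service side: emit the fields of one row in the agreed sequence."""
    return delimiter.join(row[field] for field in FIELD_ORDER)


def unstring(line, delimiter="|"):
    """Caller side: split the record and convert the computational items."""
    name, qty, price = line.split(delimiter)
    return {"name": name, "qty": int(qty), "price": Decimal(price)}


line = join_fields({"name": "Widget", "qty": "12", "price": "4.50"})
print(line)
print(unstring(line))
```

The sketch only works if the delimiter never appears in the data and both sides use the same field sequence, which is the constraint noted above.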
If you are working in Fortran, you could try a different method. Fortran has the concept of an internal file. An internal file can be just a string, and the program can perform a READ on the string as if the data were coming from a file. The advantage of this is that the READ will perform the necessary type conversion from text to the appropriate type. As the programmer, you must ensure that all of the expected data is in the correct sequence.
Finally, you could use the same method of data binding that the relational databases use: a preprocessor. Those who use Allbase/SQL or Oracle are used to coding with EXEC SQL statements. A preprocessor converts these statements into the actual code required to access the database and to bind the data to variables.
In order to do something similar, you would have to define the data stream using a schema language like the World Wide Web Consortium's XML Schema or James Clark's TREX. This option is left as an exercise for the reader, or an undergraduate student in need of a project.
Other improvements to our parsing proxy could include making the proxy socket-based and multi-threaded. This would save on the time it would take to start a Java Virtual Machine. You could also pre-connect to resources, like databases. A pool of resources would improve the response time to the client, as database opens are fairly expensive.
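A socket-based, multi-threaded service could be structured along the following lines. This is only a sketch in Python using the standard socketserver module, not the Java service the article has in mind. To stay self-contained, it assumes the client sends one line of XML text instead of naming a network resource, and all class and function names are hypothetical.

```python
import socket
import socketserver
import threading
import xml.sax
from xml.sax.handler import ContentHandler


class _Pyx(ContentHandler):
    """Minimal XML-to-PYX converter (elements and data only)."""

    def __init__(self):
        super().__init__()
        self.records = []

    def startElement(self, name, attrs):
        self.records.append("(" + name)

    def endElement(self, name):
        self.records.append(")" + name)

    def characters(self, content):
        if content.strip():
            self.records.append("-" + content)


class ProxyHandler(socketserver.StreamRequestHandler):
    """Handles one client: read a line of XML, write back PYX records."""

    def handle(self):
        xml_line = self.rfile.readline().strip()
        converter = _Pyx()
        xml.sax.parseString(xml_line, converter)
        self.wfile.write(("\n".join(converter.records) + "\n").encode("utf-8"))


def start_proxy(host="127.0.0.1", port=0):
    """Start the service once; each connection then gets its own thread."""
    server = socketserver.ThreadingTCPServer((host, port), ProxyHandler)
    server.daemon_threads = True
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def ask_proxy(address, xml_text):
    """Client side: send the XML and read records until the server closes."""
    with socket.create_connection(address) as conn:
        conn.sendall(xml_text.encode("utf-8") + b"\n")
        with conn.makefile("rb") as reply:
            return reply.read().decode("utf-8").splitlines()
```

Because the service stays running between requests, the cost of starting the runtime is paid once. A pool of pre-opened database connections could be held by the server object in the same way.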
Why get started?
As reported in the August 2000 3000 NewsWire, Oracle has ended support for its database on MPE/iX. Many HP e3000 users are now looking for ways to get at their Oracle data since there will be no more SQL*Net software upgrades for MPE/iX. XML can provide a way to get at that data again.
Mark Wonsil is president of 4M Enterprises (www.4m-ent.com) and has worked with HP 3000s for over 19 years, assisting companies where the 3000 has been the solution for manufacturing and healthcare. He became interested in XML after doing EDI integration projects for an HP 3000-based EDI translator.
Copyright The 3000 NewsWire. All rights reserved.