Put XML to work in your applications

Design 3000 Sponsor Message

May 2001

Beyond the hype, the Web-friendly language plays well on e3000s

By Mark Wonsil

In recent months, the hype behind the eXtensible Markup Language (XML) has finally started to wane. Left behind are two diametrically opposed camps: those who believe XML is the solution for everything, and those who believe it has no use whatsoever.

Regardless of these claims, it is worth looking at one of the goals behind XML: to separate content from presentation. In practice, this means the serving program makes no assumptions about the client, which can be a Web browser, a cell phone, a personal digital assistant or a program running on an HP e3000.
**Figure 1: PYX record codes**

| PYX | Meaning |
|-----|---------|
| ( | Start tag |
| ) | End tag |
| A | Attribute |
| - | Data |
| ? | Processing instruction |

Conversely, if a program on an e3000 creates a well-formed XML data stream, it could work with many clients. The client doesn’t care if the data comes from SQL Server on an NT box, MS Access on a Windows machine, Oracle running on HP-UX, or IMAGE/SQL running on MPE/iX. XML data looks the same regardless of its source.

In the past year, relational database vendors, including Oracle and Microsoft, have added XML capabilities to their products. These products accept SQL statements and return the result sets as simple marked-up text streams. One reason database vendors are turning to XML is that XML is just plain text: text can pass through firewalls over HTTP and can be read by a myriad of operating systems.

While this sounds like good news for Java and C++ programmers, XML is awkward to manipulate in record-oriented languages such as COBOL, Fortran and Basic. Fortunately, there is a way to represent XML files in a record-oriented manner. The key is that XML is a scaled-down version of SGML.

SGML to ESIS

SGML has been around for quite some time, certainly since before object-oriented languages proliferated. The developers of SGML used Unix, and Unix provides a large suite of record-oriented tools such as sed, grep and awk.
**Figure 2: A sample XML file**

Charles Goldfarb, the father of SGML, wrote the ARCSGML parser, from which James Clark developed sgmls, a utility that exports SGML documents in a record-oriented format for use by the standard Unix utilities. That format is the Element Structure Information Set (ESIS). In this syntax, the first character of each record determines the type of data on the rest of the record, with record codes for an element's start, an element's end, attributes and data. A Python programmer developed one such ESIS implementation, called PYX (see Figure 1).
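Because the first character says everything about a record, reading PYX needs little more than a dispatch on that character. The following is a minimal Python sketch, not taken from any PYX tool: `classify_pyx` and `PYX_CODES` are our own names, and it assumes an attribute record holds the attribute name, one space, then the value, and that newlines inside data are escaped as `\n` so each record fits on one line.

```python
# Record codes from Figure 1, keyed by the first character of each record.
PYX_CODES = {
    "(": "start",
    ")": "end",
    "A": "attribute",
    "-": "data",
    "?": "pi",
}


def classify_pyx(record):
    """Split one PYX record into (kind, payload) using its first character."""
    if not record:
        raise ValueError("empty PYX record")
    kind = PYX_CODES.get(record[0])
    if kind is None:
        raise ValueError("unknown PYX record code: %r" % record[0])
    rest = record[1:]
    if kind == "attribute":
        # Assumed layout: attribute name, one space, attribute value.
        name, _, value = rest.partition(" ")
        return kind, (name, value)
    if kind == "data":
        # Data records carry newlines escaped as backslash-n; restore them.
        return kind, rest.replace("\\n", "\n")
    return kind, rest


for line in ["(ADVERTISER", "Aid 42", "-Acme Widgets", ")ADVERTISER"]:
    print(classify_pyx(line))
```

A COBOL program does the same thing with an EVALUATE on the first byte of the record.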

Traditional record-oriented languages such as COBOL are well-equipped to handle PYX records. Figure 2 shows a sample XML file, and Figure 3 shows the same file in the PYX format.

**Figure 3**

To create the ESIS stream, you must first have the XML. Since developers can define their own markup languages in XML, how will we know the format of the data? This is one of the most common criticisms of XML.

Every day there seems to be another three- or four-letter acronym announcing yet another markup language. Terry Floyd of the Support Group calls them XTLAs: eXtended Three Letter Acronyms. Fortunately, in the case of databases, there is an unofficial standard called the rowset.

**Figure 4: Advertisers by element**

The rowset comes in two flavors. The more popular one uses one element as the record indicator and a contained element for each field in the record, with the data between the start and end tags. This is called an element-centric representation. (See Figure 4 for an XML representation of advertisers by element.) The other flavor is similar, but attributes hold the data and the record element is empty. This attribute-centric rowset can be slightly more compact because an empty element needs no end tag. (See Figure 5 for an XML representation of advertisers by attribute.) Both Oracle and SQL Server can generate an element-centric rowset.
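Producing either flavor takes only a few lines with a standard XML library, which also takes care of escaping characters such as & and <, one of the things that makes an XML stream well-formed. This minimal Python sketch uses the standard `xml.etree.ElementTree` module; the ROWSET and ROW element names, the advertiser fields and the function names are assumptions for illustration, since the exact names in Figures 4 and 5 may differ.

```python
import xml.etree.ElementTree as ET


def element_centric(records, rowset_tag="ROWSET", row_tag="ROW"):
    """One element per record, one child element per field, data as element text."""
    rowset = ET.Element(rowset_tag)
    for record in records:
        row = ET.SubElement(rowset, row_tag)
        for field, value in record.items():
            ET.SubElement(row, field).text = str(value)
    return ET.tostring(rowset, encoding="unicode")


def attribute_centric(records, rowset_tag="ROWSET", row_tag="ROW"):
    """One empty element per record, with each field carried in an attribute."""
    rowset = ET.Element(rowset_tag)
    for record in records:
        ET.SubElement(rowset, row_tag, {f: str(v) for f, v in record.items()})
    return ET.tostring(rowset, encoding="unicode")


advertisers = [{"NAME": "Acme & Sons", "CITY": "Troy"}]  # hypothetical data
print(element_centric(advertisers))
print(attribute_centric(advertisers))
```

The attribute-centric string comes out shorter because each ROW is written as an empty element with no separate end tag.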

We know of no generic XML parsers for COBOL or Fortran. But when you cannot do something yourself, you can ask someone else to do it for you: do it by proxy. A parsing proxy is a process that requests an XML resource on the network, parses the stream, converts it to an ESIS representation and passes it back to the calling program.

Employing Java to parse

Given its strengths in handling XML and networking, a Java program, even one running on an e3000, is a good candidate for the parsing service. Several XML parsers are available for Java, including Xerces (see sidebar).

**Figure 5: Advertisers by attribute**

Most popular parsers can process XML in one of two ways. The first is to build a Document Object Model (DOM) tree. In this method, the parser loads the entire document into memory, with each element, attribute and piece of data represented by a node in a tree. This method is useful if you want to change the structure of the document by adding or deleting elements. However, a very large record set may exceed the program's memory resources.
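Here is what the DOM approach looks like with Python's standard `xml.dom.minidom` module. The whole document is parsed into a tree before the function does anything, which is what makes structural edits easy and what makes very large record sets costly. This is a minimal sketch; the function name and the ROWSET, ROW and NAME element names are hypothetical.

```python
from xml.dom import minidom


def drop_rows_missing(xml_text, required_field, row_tag="ROW"):
    """Delete every row element that lacks a child element named required_field."""
    doc = minidom.parseString(xml_text)  # entire document now held in memory
    # Copy the node list first, because we mutate the tree while iterating.
    for row in list(doc.getElementsByTagName(row_tag)):
        if not row.getElementsByTagName(required_field):
            row.parentNode.removeChild(row)
    result = doc.documentElement.toxml()
    doc.unlink()  # break internal reference cycles so memory is freed promptly
    return result


sample = "<ROWSET><ROW><NAME>Acme</NAME></ROW><ROW><CITY>Troy</CITY></ROW></ROWSET>"
print(drop_rows_missing(sample, "NAME"))  # <ROWSET><ROW><NAME>Acme</NAME></ROW></ROWSET>
```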

The second method of parsing XML uses the Simple API for XML (SAX). Here the program registers callback functions for predefined events, such as the start of the document, the end of the document, a start tag, an end tag or character data. Because the parser reports events as it reads and never builds the whole tree, SAX copes well with large record sets. Figure 6 shows ParsingProxy.java, a Java program that converts an XML stream to an ESIS stream.
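For readers who want to experiment without a Java setup, the same SAX idea fits in a few lines of Python's standard `xml.sax` module. This is a minimal sketch, not a port of ParsingProxy.java: `PyxWriter` and `xml_to_pyx` are our own names, character data is buffered so that one text run becomes one data record, and newlines in data are escaped as `\n`, a common PYX convention, so each record stays on one line.

```python
import io
import xml.sax


def _escape(text):
    # Keep each PYX record on one line by escaping embedded newlines.
    return text.replace("\n", "\\n")


class PyxWriter(xml.sax.ContentHandler):
    """SAX callbacks that write one PYX record per parsing event."""

    def __init__(self, out):
        super().__init__()
        self.out = out
        self._text = []  # character data may arrive in several chunks

    def _flush_text(self):
        if self._text:
            self.out.write("-" + _escape("".join(self._text)) + "\n")
            self._text = []

    def startElement(self, name, attrs):
        self._flush_text()
        self.out.write("(" + name + "\n")
        for attr_name in attrs.getNames():
            self.out.write("A%s %s\n" % (attr_name, _escape(attrs.getValue(attr_name))))

    def endElement(self, name):
        self._flush_text()
        self.out.write(")" + name + "\n")

    def characters(self, content):
        self._text.append(content)

    def processingInstruction(self, target, data):
        self._flush_text()
        self.out.write("?%s %s\n" % (target, data))


def xml_to_pyx(xml_text):
    """Convert an XML string to a PYX string without building a tree."""
    out = io.StringIO()
    xml.sax.parseString(xml_text.encode("utf-8"), PyxWriter(out))
    return out.getvalue()


print(xml_to_pyx('<ROW id="7"><NAME>Acme</NAME></ROW>'), end="")
```

Whitespace used to indent an XML file also arrives as character data, so a consumer will usually skip data records that contain only blanks.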

Streaming into variables

We can now pass a request to our parsing service and receive an ESIS stream in return. But how does the data get into the program's variables? One method is to evaluate the first character of each record and act on it accordingly. (See Figure 7 for an example in COBOL.)

This sample program reads the ESIS stream and processes each record type. When the program reads a start tag, “(”, it pushes the element's name onto an element stack. When it reads a data record, “-”, it knows the full path of the element from the values in the stack; the top entry is the name of the current element. At an end tag, “)”, the program pops the top element off the stack. The programmer assigns each element's value to a variable after any necessary type conversion, since all XML data is in string form. This does the job, but there are other ways to improve data binding.
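The element-stack technique is not tied to COBOL. Below is a minimal Python sketch of the same logic; the ROWSET, ROW, NAME and ADCOUNT element names and the `Advertiser` record are hypothetical stand-ins for whatever the real stream contains, and the input is assumed to be PYX lines with their trailing newlines already stripped.

```python
from dataclasses import dataclass


@dataclass
class Advertiser:
    name: str = ""
    ad_count: int = 0


def load_advertisers(pyx_lines):
    """Bind PYX data records to Advertiser fields using an element stack."""
    stack = []       # open element names; the top is the current element
    rows = []
    current = None   # record being filled between "(ROW" and ")ROW"
    for line in pyx_lines:
        code, payload = line[:1], line[1:]
        if code == "(":
            stack.append(payload)
            if payload == "ROW":
                current = Advertiser()
        elif code == ")":
            if not stack or stack[-1] != payload:
                raise ValueError("unbalanced end tag: " + payload)
            stack.pop()
            if payload == "ROW":
                rows.append(current)
                current = None
        elif code == "-" and current is not None:
            field = stack[-1]
            if field == "NAME":
                current.name = payload.replace("\\n", "\n")
            elif field == "ADCOUNT":
                current.ad_count = int(payload)  # all XML data arrives as text
        # Attribute ("A") and processing-instruction ("?") records are ignored here.
    return rows
```

Called with `stream.read().splitlines()`, it returns one `Advertiser` per ROW element.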

In COBOL, one method is to have the parsing service collect all of a record's data and return it in a single string, with the fields separated by a delimiting character. The COBOL program can then use the UNSTRING verb to move the data into the appropriate working-storage variables. Again, this works well for strings, but the programmer still has to convert data for computational items. It also requires the parsing service to return the fields in the correct sequence.
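The delimited-string approach splits the work in two: the proxy flattens each record into one line in an agreed field order, and the caller splits it positionally, much as UNSTRING does. This minimal Python sketch shows both halves; the field list, the `|` delimiter and the function names are assumptions, and the scheme only works if the delimiter never appears in the data.

```python
# Hypothetical field order agreed between the parsing service and the caller.
FIELD_ORDER = ("NAME", "CITY", "ADCOUNT")
DELIMITER = "|"


def to_delimited(fields, order=FIELD_ORDER, delimiter=DELIMITER):
    """Proxy side: emit one record's text values in the agreed sequence."""
    values = [fields.get(name, "") for name in order]
    for value in values:
        if delimiter in value:
            raise ValueError("delimiter found inside data: %r" % value)
    return delimiter.join(values)


def unstring(record, delimiter=DELIMITER):
    """Caller side: split positionally, then convert the numeric field."""
    name, city, ad_count = record.split(delimiter)
    return name, city, int(ad_count)  # the string-to-number step is still ours


line = to_delimited({"CITY": "Troy", "ADCOUNT": "12", "NAME": "Acme"})
print(line)            # Acme|Troy|12
print(unstring(line))  # ('Acme', 'Troy', 12)
```

Note that the dictionary's own key order does not matter; only the agreed FIELD_ORDER does, which is exactly the sequencing dependency described above.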

Fortran streams

If you are working in Fortran, you can try a different method: the internal file. An internal file can be just a character string, and the program can perform a READ on it as if the data were coming from an external file. The advantage is that a formatted READ performs the type conversion from text to the appropriate type. As the programmer, you must still ensure that all of the expected data arrives in the correct sequence.
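Python has no internal files, but a short sketch shows what a formatted READ on one does for you: slice fixed-width fields and convert each to its declared type. The format list loosely mimics a Fortran edit list such as (A10, I5, F8.2); `read_fixed` and `RECORD_FORMAT` are hypothetical names, and real Fortran edit descriptors handle blanks, implied decimal points and errors in ways this sketch does not.

```python
# Each entry is (type code, field width), loosely like Fortran's A10, I5, F8.2.
RECORD_FORMAT = [("A", 10), ("I", 5), ("F", 8)]


def read_fixed(record, fmt=RECORD_FORMAT):
    """Slice a fixed-width text record and convert each field to its type."""
    values, pos = [], 0
    for code, width in fmt:
        field = record[pos:pos + width]
        pos += width
        if code == "A":
            values.append(field.rstrip())   # character field: drop blank padding
        elif code == "I":
            values.append(int(field))       # int() ignores surrounding blanks
        elif code == "F":
            values.append(float(field))     # requires an explicit decimal point
        else:
            raise ValueError("unsupported field type: " + code)
    return values


record = f"{'Acme':<10}{12:>5}{45.5:>8.2f}"
print(read_fixed(record))  # ['Acme', 12, 45.5]
```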

Preprocessors

Finally, you could bind data the same way relational databases do: with a preprocessor. Those who use Allbase/SQL or Oracle are used to coding EXEC SQL statements. A preprocessor converts these statements into the actual code required to access the database and to bind the data to variables.

To do something similar, you would have to define the data stream using a schema language such as the World Wide Web Consortium's XML Schema or James Clark's TREX. This option is left as an exercise for the reader, or for an undergraduate student in need of a project.

Other improvements to our parsing proxy could include making it socket-based and multi-threaded. A proxy that keeps running between requests avoids the time it takes to start a Java Virtual Machine for each one. It could also pre-connect to resources such as databases; a pool of open connections would improve response time to the client, since database opens are fairly expensive.
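The pooling idea is independent of language. This minimal Python sketch uses the standard `queue` and `threading` modules: the expensive opens happen once, when the pool is built, and each worker thread borrows a resource and hands it back when done. `ResourcePool` and `open_connection` are hypothetical; in a real proxy the factory would open a database connection rather than create a placeholder object.

```python
import queue
import threading
from contextlib import contextmanager


class ResourcePool:
    """Fixed-size pool: resources are opened once, then borrowed and returned."""

    def __init__(self, factory, size):
        self._idle = queue.Queue()
        for _ in range(size):
            self._idle.put(factory())  # pay the expensive open up front

    @contextmanager
    def lease(self, timeout=None):
        resource = self._idle.get(timeout=timeout)  # blocks until one is free
        try:
            yield resource
        finally:
            self._idle.put(resource)  # always hand it back, even on error


opened = []


def open_connection():
    # Stand-in for an expensive open, such as a database connection.
    opened.append(object())
    return opened[-1]


pool = ResourcePool(open_connection, size=2)
used = []
lock = threading.Lock()


def worker():
    with pool.lease() as conn:
        with lock:
            used.append(conn)


threads = [threading.Thread(target=worker) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(opened), len(used))  # 2 10
```

Ten requests are served while only two connections are ever opened.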

Why get started?

As reported in the August 2000 issue of the 3000 NewsWire, Oracle has ended support for its database on MPE/iX. Since there will be no more SQL*Net software upgrades for MPE/iX, many HP e3000 users are now looking for other ways to get at their Oracle data. XML can provide a way to get at that data again.

Beyond that, there is no reason the HP e3000 cannot serve up XML data, too. This would let information on the HP reach the many presentation clients that consume XML. Today XML data can be displayed on cell phones, in Web pages and as Web graphics. It is also readily available to programming languages such as Perl, Python, Visual Basic and JavaScript, and it can be rendered as a PDF document. The HP e3000 has always worked well with other systems, and XML adds yet another tool to the interoperability toolbox of MPE/iX.

Mark Wonsil is president of 4M Enterprises (www.4m-ent.com) and has worked with HP 3000s for over 19 years, assisting companies where the 3000 has been the solution for manufacturing and healthcare. He became interested in XML after doing EDI integration projects for an HP 3000-based EDI translator company.