Overview of the tools in the xmlf90 suite

SAX

Flib SAX is a SAX level 1.0 implementation in Fortran 90.

Stream Xpath

Stream Xpath is a library that emulates some features of the XPath standard while working within the stream model of SAX.

Its small memory footprint makes it well suited to processing large data files, for which standard XPath (built on top of the memory-intensive DOM) would not be appropriate. However, the stream paradigm requires the user to keep careful track of the state of the parser.
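The kind of state tracking that the stream model demands can be sketched in a few lines. The example below is only an illustration of the idea and is not the Stream Xpath API: it uses Python's standard xml.sax module, and the PathCollector class, the select_text function, and the sample document are all hypothetical names introduced here. The handler keeps a stack of open elements and a "collecting" flag, which is exactly the state a DOM-based XPath query would manage for the user.

```python
import xml.sax


class PathCollector(xml.sax.ContentHandler):
    """Collect the text of every element whose full path equals `target`."""

    def __init__(self, target):
        super().__init__()
        self.target = target      # e.g. "/inventory/item/price"
        self.stack = []           # names of the currently open elements
        self.collecting = False   # parser state the user has to manage
        self.chunks = []          # text pieces of the current match
        self.results = []

    def _path(self):
        return "/" + "/".join(self.stack)

    def startElement(self, name, attrs):
        self.stack.append(name)
        if self._path() == self.target:
            self.collecting = True
            self.chunks = []

    def characters(self, content):
        # Text may be delivered in several pieces, so accumulate it.
        if self.collecting:
            self.chunks.append(content)

    def endElement(self, name):
        if self.collecting and self._path() == self.target:
            self.results.append("".join(self.chunks).strip())
            self.collecting = False
        self.stack.pop()


def select_text(xml_bytes, target):
    """Stream through the document once and return the matching texts."""
    handler = PathCollector(target)
    xml.sax.parseString(xml_bytes, handler)
    return handler.results


# Hypothetical sample data, used only for this illustration.
SAMPLE = (
    b"<inventory>"
    b"<item><name>bolt</name><price>0.25</price></item>"
    b"<item><name>nut</name><price>0.10</price></item>"
    b"</inventory>"
)

if __name__ == "__main__":
    print(select_text(SAMPLE, "/inventory/item/price"))
```

Only the stack of open element names and the text of the current match are held in memory, so the cost does not grow with the size of the file. The price is that forgetting to reset a flag such as collecting silently corrupts later matches.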

WXML

WXML is a library that facilitates the writing of well-formed XML, including such features as automatic start-tag completion, attribute pretty-printing, and element indentation. There are also helper routines to handle the output of numerical arrays.
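To make "automatic start-tag completion" and "element indentation" concrete, here is a minimal sketch of the technique, written in Python for brevity. It is not the WXML interface: the XmlWriter class and its method names are hypothetical. The start tag of a new element is held back until something other than an attribute arrives, so attributes can still be added after the element has been opened, and the depth of the open-element stack drives the indentation.

```python
from xml.sax.saxutils import escape


class XmlWriter:
    """Toy XML writer with lazy start-tag completion and indentation."""

    def __init__(self, indent="  "):
        self.lines = []       # finished output lines
        self.stack = []       # names of the currently open elements
        self.pending = None   # start tag not yet completed with '>'
        self.indent = indent

    def _complete_start_tag(self, empty=False):
        # Close the pending start tag, if there is one.
        if self.pending is not None:
            self.lines.append(self.pending + ("/>" if empty else ">"))
            self.pending = None

    def new_element(self, name):
        self._complete_start_tag()
        self.pending = self.indent * len(self.stack) + "<" + name
        self.stack.append(name)

    def add_attribute(self, name, value):
        if self.pending is None:
            raise ValueError("attributes must directly follow new_element")
        quoted = escape(str(value), {'"': "&quot;"})
        self.pending += ' %s="%s"' % (name, quoted)

    def add_pcdata(self, text):
        # In this sketch pcdata goes on its own indented line.
        self._complete_start_tag()
        self.lines.append(self.indent * len(self.stack) + escape(text))

    def end_element(self, name):
        if not self.stack or self.stack[-1] != name:
            raise ValueError("mismatched end tag: " + name)
        self.stack.pop()
        if self.pending is not None:
            # Nothing was written inside: emit an empty-element tag.
            self._complete_start_tag(empty=True)
        else:
            self.lines.append(self.indent * len(self.stack) + "</" + name + ">")

    def result(self):
        if self.stack:
            raise ValueError("unclosed element: " + self.stack[-1])
        return "\n".join(self.lines)


if __name__ == "__main__":
    w = XmlWriter()
    w.new_element("atom")
    w.add_attribute("id", "a1")
    w.new_element("mass")
    w.add_pcdata("1.008")
    w.end_element("mass")
    w.end_element("atom")
    print(w.result())
```

Because every end tag is checked against the stack, a writer built this way cannot produce mismatched or unclosed tags without raising an error, which is the sense in which such a library "facilitates" well-formed output.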

See also the examples in the Examples/wxml subdirectory of the main distribution.

Jon Wakelin has written Jumbo90, a CML-formatting library on top of a slightly modified WXML. For examples of CML-formatting in strict WXML, see the Examples/cml subdirectory of the main xmlf90 distribution. The two strands of WXML will be merged very soon.

FDOM

FDOM is a DOM level 1.0 implementation in Fortran 95. We have implemented almost all the instance methods, although it is unlikely that any of the class methods will ever be implemented. The FDOM is still evolving but is already in a usable state. More importantly, as all of the interfaces are standard, changes to the code will only take place behind the scenes.

See also the examples in the Examples/dom subdirectory of the main distribution.
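For readers unfamiliar with the DOM style, the sketch below shows the standard DOM level 1 calls (getElementsByTagName, getAttribute, firstChild, data) on a small document. It uses Python's xml.dom.minidom as a stand-in, not FDOM itself; the item_prices function and the sample data are hypothetical. The point is only that the whole tree is built in memory first and then queried, in contrast to the stream model of SAX.

```python
from xml.dom.minidom import parseString

# Hypothetical sample data, used only for this illustration.
SAMPLE = (
    "<inventory>"
    '<item id="a"><price>0.25</price></item>'
    '<item id="b"><price>0.10</price></item>'
    "</inventory>"
)


def item_prices(xml_text):
    """Return a dict mapping each item's id attribute to its price."""
    doc = parseString(xml_text)              # builds the full tree in memory
    prices = {}
    for item in doc.getElementsByTagName("item"):
        ident = item.getAttribute("id")
        price = item.getElementsByTagName("price")[0]
        # The first child of <price> is the text node holding the number.
        prices[ident] = float(price.firstChild.data)
    return prices


if __name__ == "__main__":
    print(item_prices(SAMPLE))
```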

Jon Wakelin, Alberto Garcia, April 2004

Guidelines for developers

The parser is built on several levels:

  1. Upper-level modules

  • m_xml_parser: The main module

  • m_xml_error: Basic error handling

  2. Intermediate layer

  • m_sax_fsm (A finite-state machine to parse the input)

  3. Basic data structures and file interfaces

  • m_sax_reader: File interface and character handling as per XML specs.

  • m_sax_buffer: Basic homemade “variable length string”, with some limitations (size, of course), but avoiding the use of dynamic structures for now.

  • m_sax_dictionary: Simple, not dynamic.

  • m_sax_charset: A simple hashing method for sets of characters.

  • m_sax_elstack: Simple stack to check well-formedness.

  • m_sax_entities: Entity replacement utilities.

  4. Something which does not really belong in the parser but which is useful to massage the data extracted from the file:

  • m_sax_converters: Routines to turn pcdata chunks into numerical arrays
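The idea behind turning pcdata chunks into numerical arrays can be sketched as follows. This is an illustration in Python, not the m_sax_converters code: the append_numbers function is a hypothetical name, and the sketch assumes that chunk boundaries fall on whitespace, so that no number is split across two chunks. A preallocated array and a running count are passed in and updated on every call, so that data arriving in several pieces ends up in one contiguous array.

```python
def append_numbers(chunk, array, ndata):
    """Parse whitespace-separated numbers from `chunk` into `array`.

    Values are stored starting at index `ndata`; the updated count is
    returned so the next chunk can continue where this one stopped.
    Assumes that no number is split across chunk boundaries.
    """
    for token in chunk.split():
        if ndata >= len(array):
            raise OverflowError("array too small for the pcdata contents")
        array[ndata] = float(token)
        ndata += 1
    return ndata


if __name__ == "__main__":
    values = [0.0] * 6          # preallocated, fixed-size storage
    count = 0
    for piece in ("1.0 2.5\n", "  3 4e1 "):
        count = append_numbers(piece, values, count)
    print(count, values[:count])
```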

There are obviously a number of hardwired limitations, which should be removed in a later version:

  • Buffer size in buffer_t definition. This is not as serious as it looks. Only long unbroken pcdata sections and overly long attribute names or values will be affected. Long SGML declarations and comments might be truncated, but they are not relevant anyway.

  • Maximum number of attributes in an element tag (set in m_sax_dictionary)
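The effect of a hardwired buffer size can be illustrated with a small fixed-capacity buffer. The sketch below is in Python and is not the buffer_t implementation: the FixedBuffer class, its capacity, and the truncated flag are hypothetical. Text beyond the capacity is dropped and the overflow is recorded, which is the behaviour that would affect an overly long attribute value or an unbroken pcdata run.

```python
class FixedBuffer:
    """Fixed-capacity character buffer that truncates on overflow."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.chars = []
        self.truncated = False   # set once any text has been dropped

    def add(self, text):
        room = self.capacity - len(self.chars)
        if len(text) > room:
            self.truncated = True
        self.chars.extend(text[:room])   # keep only what still fits

    def value(self):
        return "".join(self.chars)

    def reset(self):
        self.chars = []
        self.truncated = False


if __name__ == "__main__":
    buf = FixedBuffer(8)
    buf.add("abcde")
    buf.add("fghij")
    print(buf.value(), buf.truncated)
```

Replacing such a buffer with a growable string changes only the add method, which is why a swap of this kind can be close to a drop-in replacement.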

While the parser does not use any variable-length strings (to keep it compatible with existing Fortran 90 compilers) or dynamic data structures for attribute dictionaries, etc., such improvements could be incorporated almost as drop-in replacements for existing sub-modules.