Overview of the tools in the xmlf90 suite
SAX
Flib SAX is a SAX level 1.0 implementation in Fortran 90.
Stream Xpath
Stream Xpath is a library that emulates some of the features of the XPath standard while working within the stream model of SAX.
Its small memory footprint makes it well suited to processing large data files, for which standard XPath (built on top of the memory-intensive DOM) would not be appropriate. However, the stream paradigm forces the user to be careful about controlling the state of the parser.
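As a rough illustration of what controlling the parser state can mean in practice, the sketch below searches the same file twice. It assumes the `flib_xpath` routines as described in the xmlf90 user guide (`open_xmlfile`, `get_node`, `rewind_xmlfile`); the file name `inventory.xml` and the element names are hypothetical. Each search moves the stream position forward, so the second search needs an explicit rewind.

```fortran
program two_pass
  ! Sketch only: assumes the flib_xpath interface from the xmlf90 user guide.
  use flib_xpath
  implicit none

  type(xml_t)        :: fxml
  integer            :: status
  character(len=100) :: what

  call open_xmlfile("inventory.xml", fxml, status)   ! hypothetical file
  if (status /= 0) stop "cannot open inventory.xml"

  ! First pass: every successful search advances the stream position.
  do
     call get_node(fxml, path="//description", pcdata=what, status=status)
     if (status /= 0) exit        ! no more matches
     print *, "Description: ", trim(what)
  end do

  ! The stream has now been consumed; rewind before searching again.
  call rewind_xmlfile(fxml)

  do
     call get_node(fxml, path="//price", pcdata=what, status=status)
     if (status /= 0) exit
     print *, "Price: ", trim(what)
  end do
end program two_pass
```

Without the rewind, the second loop would start at the end of the file and find nothing.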
WXML
WXML is a library that facilitates the writing of well-formed XML, including such features as automatic start-tag completion, attribute pretty-printing, and element indentation. There are also helper routines to handle the output of numerical arrays.
See also the examples in the Examples/wxml subdirectory of the main distribution.
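A minimal sketch of writing a small document with WXML might look as follows. It assumes the `flib_wxml` routines as given in the xmlf90 user guide (`xml_OpenFile`, `xml_NewElement`, `xml_AddAttribute`, `xml_AddPcdata`, `xml_EndElement`, `xml_Close`); the file name and element names are hypothetical.

```fortran
program write_inventory
  ! Sketch only: assumes the flib_wxml routines from the xmlf90 user guide.
  use flib_wxml
  implicit none

  type(xmlf_t) :: xf

  call xml_OpenFile("inventory.xml", xf, indent=.true.)  ! hypothetical file name
  call xml_AddXMLDeclaration(xf, "UTF-8")

  call xml_NewElement(xf, "inventory")
  call xml_NewElement(xf, "item")
  call xml_AddAttribute(xf, "id", "003")     ! the <item ...> start tag is still open here
  call xml_NewElement(xf, "description")     ! opening a child completes the pending start tag
  call xml_AddPcdata(xf, "Washing machine")
  call xml_EndElement(xf, "description")
  call xml_EndElement(xf, "item")
  call xml_EndElement(xf, "inventory")

  call xml_Close(xf)
end program write_inventory
```

The caller never writes angle brackets directly. The library tracks which start tag is pending and closes it when the next element or text arrives, which is what "automatic start-tag completion" refers to.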
Jon Wakelin has written Jumbo90, a CML-formatting library on top of a slightly modified WXML. For examples of CML-formatting in strict WXML, see the Examples/cml subdirectory of the main xmlf90 distribution. The two strands of WXML will be merged very soon.
FDOM
FDOM is a DOM level 1.0 implementation in Fortran 95. We have implemented almost all the instance methods, although it is unlikely that any of the class methods will ever be implemented. FDOM is still evolving but is already in a usable state. More importantly, because all of the interfaces are standard, changes to the code will only take place behind the scenes.
See also the examples in the Examples/dom subdirectory of the main distribution.
Jon Wakelin, Alberto Garcia, April 2004
Guidelines for developers
The parser is built on several levels:
Upper-level modules
m_xml_parser: The main module
m_xml_error: Basic error handling
Intermediate layer
m_sax_fsm: A finite-state machine to parse the input
Basic data structures and file interfaces
m_sax_reader: File interface and character handling as per XML specs.
m_sax_buffer: Basic homemade "variable-length string" with some limitations (notably a fixed maximum size), but avoiding the use of dynamic structures for now.
m_sax_dictionary: Simple attribute dictionary of fixed size, not dynamic.
m_sax_charset: A simple hashing method for sets of characters.
m_sax_elstack: Simple stack to check well-formedness.
m_sax_entities: Entity replacement utilities.
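The well-formedness check performed with an element stack can be pictured with the following self-contained sketch. It is not the actual `m_sax_elstack` code: the module name, the type, the routines, and the limits `MAX_DEPTH` and `NAME_LEN` are all hypothetical. A start tag pushes its name, and an end tag is accepted only if it matches the innermost open element.

```fortran
module elstack_sketch
  ! Hypothetical fixed-size element stack; not the actual m_sax_elstack code.
  implicit none
  private
  public :: elstack_t, push, pop, is_empty

  integer, parameter :: MAX_DEPTH = 20     ! hardwired nesting limit (assumed value)
  integer, parameter :: NAME_LEN  = 100    ! longer names would be truncated

  type elstack_t
     integer                 :: n = 0
     character(len=NAME_LEN) :: names(MAX_DEPTH)
  end type elstack_t

contains

  subroutine push(stack, name, ok)
    ! Called for a start tag; fails only if the stack is full.
    type(elstack_t), intent(inout) :: stack
    character(len=*), intent(in)   :: name
    logical, intent(out)           :: ok
    ok = (stack%n < MAX_DEPTH)
    if (ok) then
       stack%n = stack%n + 1
       stack%names(stack%n) = name
    end if
  end subroutine push

  subroutine pop(stack, name, ok)
    ! Called for an end tag; ok is .true. only if 'name' matches
    ! the innermost element that is still open.
    type(elstack_t), intent(inout) :: stack
    character(len=*), intent(in)   :: name
    logical, intent(out)           :: ok
    ok = .false.
    if (stack%n == 0) return
    if (stack%names(stack%n) /= name) return
    stack%n = stack%n - 1
    ok = .true.
  end subroutine pop

  function is_empty(stack) result(empty)
    type(elstack_t), intent(in) :: stack
    logical :: empty
    empty = (stack%n == 0)
  end function is_empty
end module elstack_sketch

program check_nesting
  use elstack_sketch
  implicit none
  type(elstack_t) :: stack
  logical :: ok

  ! Simulates the tag sequence <a><b></b></a> followed by a stray </c>.
  call push(stack, "a", ok)
  call push(stack, "b", ok)
  call pop(stack, "b", ok)
  print *, "close b accepted: ", ok
  call pop(stack, "a", ok)
  print *, "close a accepted: ", ok
  call pop(stack, "c", ok)            ! nothing is open, so this is rejected
  print *, "stray close accepted: ", ok
  print *, "all elements closed: ", is_empty(stack)
end program check_nesting
```

A document is well formed with respect to nesting when every end tag is accepted and the stack is empty at the end of the input.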
One further module does not really belong in the parser, but is useful for massaging the data extracted from the file:
m_sax_converters: Routines to turn pcdata chunks into numerical arrays
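The kind of conversion involved can be sketched with a Fortran internal read. This is only an illustration and may differ from what `m_sax_converters` actually does; the program name and the helper `count_tokens` are hypothetical, and the sketch assumes blank-separated values held in a single character variable.

```fortran
program pcdata_to_array
  ! Hypothetical sketch; m_sax_converters may work differently.
  implicit none
  character(len=64) :: pcdata
  real, allocatable :: values(:)
  integer :: n, ios

  pcdata = " 1.0  2.5 -3.75  4.0e2 "     ! text as it might arrive from a pcdata handler

  n = count_tokens(pcdata)
  allocate(values(n))
  read(pcdata, *, iostat=ios) values     ! internal list-directed read
  if (ios /= 0) stop "malformed number in pcdata"
  print *, "number of values:", n
  print *, values

contains

  function count_tokens(str) result(ntok)
    ! Counts blank-separated tokens in str.
    character(len=*), intent(in) :: str
    integer :: ntok, i
    logical :: in_token
    ntok = 0
    in_token = .false.
    do i = 1, len(str)
       if (str(i:i) /= " ") then
          if (.not. in_token) ntok = ntok + 1
          in_token = .true.
       else
          in_token = .false.
       end if
    end do
  end function count_tokens
end program pcdata_to_array
```

Counting the tokens first lets the array be sized exactly before the read. A SAX parser may deliver character data in several chunks, so a real converter would also have to cope with a number split across two chunks.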
There are obviously a number of hardwired limitations, which should be removed in a later version:
Buffer size in buffer_t definition. This is not as serious as it looks. Only long unbroken pcdata sections and overly long attribute names or values will be affected. Long SGML declarations and comments might be truncated, but they are not relevant anyway.
Maximum number of attributes in an element tag (set in m_sax_dictionary)
While the parser does not use any variable-length strings (to keep it compatible with existing Fortran 90 compilers) or dynamic data structures for attribute dictionaries, etc., such improvements could be incorporated almost as drop-in replacements for existing sub-modules.
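To make the buffer limitation concrete, here is a small self-contained sketch of a fixed-size buffer that reports truncation. It is a hypothetical stand-in, not the actual `m_sax_buffer` code: the routine names, the component names, and the deliberately tiny `MAX_BUFF_SIZE` are assumptions chosen for illustration.

```fortran
module buffer_sketch
  ! Hypothetical stand-in for a fixed-size buffer; not the actual m_sax_buffer code.
  implicit none
  private
  public :: buffer_t, add_to_buffer, buffer_to_str

  integer, parameter :: MAX_BUFF_SIZE = 16   ! deliberately tiny to show truncation

  type buffer_t
     integer                      :: nchars = 0
     character(len=MAX_BUFF_SIZE) :: str = ""
  end type buffer_t

contains

  subroutine add_to_buffer(s, buffer, truncated)
    ! Appends as much of s as still fits and flags any loss.
    character(len=*), intent(in)  :: s
    type(buffer_t), intent(inout) :: buffer
    logical, intent(out)          :: truncated
    integer :: n
    n = min(len(s), MAX_BUFF_SIZE - buffer%nchars)   ! characters that still fit
    truncated = (n < len(s))
    if (n > 0) buffer%str(buffer%nchars+1:buffer%nchars+n) = s(1:n)
    buffer%nchars = buffer%nchars + n
  end subroutine add_to_buffer

  function buffer_to_str(buffer) result(str)
    type(buffer_t), intent(in)   :: buffer
    character(len=buffer%nchars) :: str
    str = buffer%str(1:buffer%nchars)
  end function buffer_to_str
end module buffer_sketch

program buffer_demo
  use buffer_sketch
  implicit none
  type(buffer_t) :: buf
  logical :: truncated

  call add_to_buffer("short", buf, truncated)
  print *, truncated, " [", buffer_to_str(buf), "]"
  call add_to_buffer(" but then far too long", buf, truncated)   ! exceeds the 16-character limit
  print *, truncated, " [", buffer_to_str(buf), "]"
end program buffer_demo
```

Because callers only see `add_to_buffer` and `buffer_to_str`, the fixed-length component could later be swapped for an allocatable string without touching the calling code, which is the sense in which a replacement would be close to drop-in.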