XML-hul Module

1 Introduction

The XML-hul module recognizes and validates the XML (Extensible Markup Language) format [XML].

The module can be invoked with the following command-line options:

jhove ... -m XML-hul [-x sax-class] ...

XML Parser Options

The XML-hul module can use any XML parser that conforms to the SAX2 interfaces. Note that if the optional SAX2 LexicalHandler interface isn't supported by the parser, JHOVE will only be able to report a restricted set of representation information.
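Whether a given parser supports LexicalHandler can be checked through the standard SAX2 property mechanism. The following is a minimal sketch, not JHOVE's own code; the class name LexicalHandlerProbe and its method are hypothetical and introduced only for illustration.

```java
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical helper for illustration; not part of JHOVE.
class LexicalHandlerProbe {
    // Standard SAX2 property identifier used to register a LexicalHandler.
    static final String LEXICAL_HANDLER =
            "http://xml.org/sax/properties/lexical-handler";

    /** Returns true if the reader accepts a LexicalHandler, false otherwise. */
    static boolean supportsLexicalHandler(XMLReader reader) {
        try {
            // DefaultHandler2 is a no-op implementation of LexicalHandler.
            reader.setProperty(LEXICAL_HANDLER, new DefaultHandler2());
            return true;
        } catch (SAXNotRecognizedException | SAXNotSupportedException e) {
            // The parser does not know or does not allow this property.
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        XMLReader reader =
                SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        System.out.println("LexicalHandler supported: "
                + supportsLexicalHandler(reader));
    }
}
```

A parser for which this check returns false can still be used for parsing, but lexical events such as DTD boundaries and comments would not be delivered to the application.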

The actual parser used is either:

  1. The parser specified by the -x sax-class command-line option (whose class file must be found on the CLASSPATH at the time of execution);
  2. The value of the edu.harvard.hul.ois.jhove.saxClass property in the ${user.home}/jhove/jhove.properties Java properties file, where ${user.home} is the standard Java user.home property; or
  3. The default parser of your Java Runtime Environment (JRE).

For example, to use the latest Apache Xerces release, specify sax-class as org.apache.xerces.parsers.SAXParser.
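The selection among the three sources above can be sketched in Java as follows. This is an illustrative sketch only, not JHOVE's implementation: it assumes the list is in order of precedence (command-line option first, then the properties file, then the JRE default), and the class SaxParserResolver and its method names are hypothetical.

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.XMLReader;

// Hypothetical helper for illustration; not part of JHOVE.
class SaxParserResolver {
    static final String SAX_CLASS_PROPERTY = "edu.harvard.hul.ois.jhove.saxClass";

    /** Loads ${user.home}/jhove/jhove.properties, or returns empty properties. */
    static Properties loadUserProperties() throws Exception {
        Properties props = new Properties();
        Path file = Paths.get(System.getProperty("user.home"),
                "jhove", "jhove.properties");
        if (Files.isReadable(file)) {
            try (InputStream in = Files.newInputStream(file)) {
                props.load(in);
            }
        }
        return props;
    }

    /**
     * Picks the parser class name. Assumed precedence: the -x option value,
     * then the property; null means "use the JRE default parser".
     */
    static String resolveClassName(String optionValue, Properties props) {
        if (optionValue != null && !optionValue.isEmpty()) {
            return optionValue;
        }
        String fromProps = props.getProperty(SAX_CLASS_PROPERTY);
        if (fromProps != null && !fromProps.isEmpty()) {
            return fromProps;
        }
        return null;
    }

    /** Instantiates the named parser, which must be on the CLASSPATH. */
    static XMLReader createReader(String className) throws Exception {
        if (className == null) {
            return SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        }
        return (XMLReader) Class.forName(className)
                .getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) throws Exception {
        String option = args.length > 0 ? args[0] : null;
        XMLReader reader =
                createReader(resolveClassName(option, loadUserProperties()));
        System.out.println("Using parser: " + reader.getClass().getName());
    }
}
```

Passing org.apache.xerces.parsers.SAXParser as the first argument would select Xerces, provided its jar is on the CLASSPATH; otherwise Class.forName fails with a ClassNotFoundException.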

Module Configuration Options

This module can be configured with the following parameters:

schema=schema-URL;local-schema-path

Specifies a local schema file to use for validation in place of any reference to the external schema in an XML document. Using a local file is typically faster and more reliable than retrieving the schema over a network, and is recommended when processing large volumes of XML. This parameter can be declared multiple times, once for each schema to be mapped.

Example: schema=http://example.com/schema.xsd;C:\schemas\example.com\schema.xsd
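To illustrate how such a parameter could be interpreted, the sketch below splits the value at the first semicolon and serves the local file through a standard SAX EntityResolver. It is a hedged illustration rather than JHOVE's actual code: the class LocalSchemaMap and its methods are hypothetical, and whether a particular parser consults the EntityResolver for XML Schema locations is parser-dependent.

```java
import java.io.File;
import java.util.HashMap;
import java.util.Map;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration; not part of JHOVE.
class LocalSchemaMap implements EntityResolver {
    private static final String PREFIX = "schema=";
    private final Map<String, String> localPaths = new HashMap<>();

    /** Registers one "schema=schema-URL;local-schema-path" parameter. */
    void addParameter(String param) {
        if (!param.startsWith(PREFIX)) {
            throw new IllegalArgumentException("Not a schema parameter: " + param);
        }
        String value = param.substring(PREFIX.length());
        // Split at the first semicolon: URL on the left, local path on the right.
        int sep = value.indexOf(';');
        if (sep <= 0 || sep == value.length() - 1) {
            throw new IllegalArgumentException("Expected URL;path in: " + param);
        }
        localPaths.put(value.substring(0, sep), value.substring(sep + 1));
    }

    /** Returns the local path mapped to the URL, or null if none is mapped. */
    String localPathFor(String schemaUrl) {
        return localPaths.get(schemaUrl);
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        String path = localPaths.get(systemId);
        if (path == null) {
            return null; // let the parser resolve the entity itself
        }
        return new InputSource(new File(path).toURI().toString());
    }
}
```

An instance populated with one addParameter call per declared parameter could then be passed to XMLReader.setEntityResolver before parsing.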

withtextmd=true

Indicates that textMD metadata should be included as part of a document's representation information.

2 Coverage

The XML-hul module recognizes and validates the following public profiles:

3 Well-Formedness

JHOVE uses the criteria for XML well-formedness defined by [XML].

4 Validity

JHOVE uses the criteria for XML validity defined by [XML].

Note that the concept of validity applies only to XML files that explicitly reference a DTD or XML Schema. JHOVE determines whether the file references either one and, if so, automatically invokes the SAX2 parser in validating mode. Otherwise, the parser is invoked in a mode that checks only for well-formedness.
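One way such a decision could be made is a preliminary, non-validating pass that looks for a DOCTYPE declaration or an xsi:schemaLocation or xsi:noNamespaceSchemaLocation attribute on the root element. The sketch below shows that idea under stated assumptions; it is not taken from the JHOVE source, and the class ValidationModeProbe and its methods are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical helper for illustration; not part of JHOVE.
class ValidationModeProbe {
    static final String XSI_NS = "http://www.w3.org/2001/XMLSchema-instance";

    /** Records a DOCTYPE or schema hint, then aborts at the root element. */
    private static final class Detector extends DefaultHandler2 {
        boolean hasDtd;
        boolean hasSchema;
        boolean rootSeen;

        @Override
        public void startDTD(String name, String publicId, String systemId) {
            hasDtd = true;
        }

        @Override
        public void startElement(String uri, String local, String qName,
                                 Attributes atts) throws SAXException {
            rootSeen = true;
            hasSchema = atts.getValue(XSI_NS, "schemaLocation") != null
                    || atts.getValue(XSI_NS, "noNamespaceSchemaLocation") != null;
            // Nothing past the root start tag is needed, so stop the parse.
            throw new SAXException("root element reached");
        }
    }

    /** Returns true if the document declares a DTD or names an XML Schema. */
    static boolean referencesDtdOrSchema(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();
        Detector detector = new Detector();
        reader.setContentHandler(detector);
        reader.setProperty("http://xml.org/sax/properties/lexical-handler", detector);
        // Supply empty content for external entities so the probe fetches nothing.
        reader.setEntityResolver((pub, sys) -> new InputSource(new StringReader("")));
        try {
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (SAXException e) {
            if (!detector.rootSeen) {
                throw e; // a genuine error before the root element
            }
        }
        return detector.hasDtd || detector.hasSchema;
    }

    /** Creates a reader; validate=true requests DTD validation via JAXP. */
    static XMLReader newReader(boolean validate) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setValidating(validate);
        return factory.newSAXParser().getXMLReader();
    }
}
```

In this sketch setValidating(true) only requests DTD validation; validating against an XML Schema generally needs additional parser-specific or JAXP configuration, which is left out here.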

5 Representation Information

The MIME type is reported as: text/xml

In addition to the standard JHOVE representation information, the following XML-specific properties are reported:

Note that the notations and entities reported above are only those that appear in the XML file, not all of those that are defined in the DTD or XML Schema associated with the file.

6 Additional Module Properties