Parser (JHOVE - JSTOR/Harvard Object Validation Environment 1.16.0 API)

java.lang.Object
- edu.harvard.hul.ois.jhove.module.pdf.Parser

```
public class Parser
extends Object
```
The Parser class implements limited syntactic analysis for PDF. It is not intended to be a full parser; its main job is to track the nesting of syntactic elements, such as the starts and ends of dictionaries and arrays.
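
As a rough illustration of the nesting bookkeeping described above, the following self-contained sketch counts unmatched dictionary (`<<`/`>>`) and array (`[`/`]`) delimiters in a list of already-split tokens. `NestingTracker` and its methods are hypothetical names and not part of JHOVE. The real Parser works on Token objects supplied by a Tokenizer, so treat this as a simplified model of the idea rather than the library's implementation.

```java
import java.util.List;

/** Hypothetical stand-in for the depth counters kept by a nesting-aware parser. */
final class NestingTracker {
    private int dictDepth;   // dictionary starts not yet matched by ends
    private int arrayDepth;  // array starts not yet matched by ends

    /** Updates the counters for one token; rejects an end that has no matching start. */
    void accept(String token) {
        switch (token) {
            case "<<":
                dictDepth++;
                break;
            case ">>":
                if (dictDepth == 0) {
                    throw new IllegalStateException("Unmatched dictionary end");
                }
                dictDepth--;
                break;
            case "[":
                arrayDepth++;
                break;
            case "]":
                if (arrayDepth == 0) {
                    throw new IllegalStateException("Unmatched array end");
                }
                arrayDepth--;
                break;
            default:
                break; // other tokens do not affect nesting in this sketch
        }
    }

    int getDictDepth() { return dictDepth; }

    int getArrayDepth() { return arrayDepth; }

    /** Feeds every token to a fresh tracker and returns it for inspection. */
    static NestingTracker track(List<String> tokens) {
        NestingTracker tracker = new NestingTracker();
        for (String token : tokens) {
            tracker.accept(token);
        }
        return tracker;
    }
}
```

Two plain counters cannot detect interleaved delimiters such as `<< [ >> ]`; catching that case would require a stack of open elements.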

Constructor Summary

Constructors
Constructor and Description
`Parser(Tokenizer tokenizer)` Constructor.

Method Summary

Modifier and Type	Method and Description
`int`	`getArrayDepth()` Returns the number of array starts not yet matched by array ends.
`int`	`getDictDepth()` Returns the number of dictionary starts not yet matched by dictionary ends.
`Set<String>`	`getLanguageCodes()` Returns the language code set from the Tokenizer.
`Token`	`getNext()` Gets a token.
`Token`	`getNext(Class<?> clas, String errMsg)` A class-sensitive version of getNext.
`Token`	`getNext(long max)` Gets a token.
`long`	`getOffset()` Returns the current offset into the file.
`boolean`	`getPDFACompliant()` Returns false if either the parser or the tokenizer has detected non-compliance with PDF/A restrictions.
`String`	`getWSString()` Returns the Tokenizer's current whitespace string.
`PdfArray`	`readArray()` Reads an array.
`PdfDictionary`	`readDictionary()` Reads a dictionary.
`PdfObject`	`readObject(boolean allowPseudo)` Reads an object.
`PdfObject`	`readObjectDef()` Reads an object definition, from wherever we are in the stream to the completion of one full object after the obj keyword.
`PdfObject`	`readObjectDef(Numeric objNumTok)` Reads an object definition, given the first numeric object, which has already been read and is passed as an argument.
`void`	`reset()` Clears the state of the parser so that it can start reading at a different place in the file.
`void`	`resetLoose()` Clears the state of the parser so that it can start reading at a different place in the file and ignore any nesting errors.
`void`	`scanMode(boolean flag)` If flag is true, tokens that are not delimited by whitespace (e.g., literal and hexadecimal strings) are not parsed.
`void`	`seek(long offset)` Positions the file to the specified offset, and resets the state for a new token stream.
`void`	`setEncrypted(boolean encrypted)` Tells this Parser, and its Tokenizer, whether the file is encrypted.
`void`	`setObjectMap(Map<Long,PdfObject> objectMap)` Sets the object map on which the parser will work.
`void`	`setPDFACompliant(boolean pdfACompliant)` Sets the value of the pdfACompliant flag.

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Constructor Detail
  - Parser
```
public Parser(Tokenizer tokenizer)
```
    Constructor. A Parser works with a Tokenizer that feeds it tokens.
    
    Parameters:
    
    tokenizer - The Tokenizer which the parser will use
- Method Detail
  - setObjectMap
```
public void setObjectMap(Map<Long,PdfObject> objectMap)
```
    Sets the object map on which the parser will work.
  - reset
```
public void reset()
```
    Clears the state of the parser so that it can start reading at a different place in the file. This empties the stack and zeroes the dictionary and array depth counters.
  - resetLoose
```
public void resetLoose()
```
    Clears the state of the parser so that it can start reading at a different place in the file and ignore any nesting errors. Sets the stack and the dictionary and array depth counters to a large number so that nesting exceptions won't be thrown.
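
    The difference between the two reset methods can be sketched with a single counter. This is a minimal model, assuming that a nesting error is raised when an end is seen at depth zero. `DepthCounter` and `LOOSE_DEPTH` are hypothetical names, and the value JHOVE actually uses for its "large number" is not stated in this documentation.

```java
/** Hypothetical model of one nesting counter with strict and loose resets. */
final class DepthCounter {
    /** Arbitrary large value chosen for this sketch only. */
    private static final int LOOSE_DEPTH = 1_000_000;

    private int depth;

    /** Strict reset: the next unmatched end is reported as a nesting error. */
    void reset() {
        depth = 0;
    }

    /** Loose reset: start so deep that unmatched ends cannot reach zero in practice. */
    void resetLoose() {
        depth = LOOSE_DEPTH;
    }

    void start() {
        depth++;
    }

    void end() {
        if (depth == 0) {
            throw new IllegalStateException("Nesting error: end without matching start");
        }
        depth--;
    }

    int depth() {
        return depth;
    }
}
```

    A loose reset suits the case where reading begins in the middle of a structure whose opening delimiters were never seen, so closing delimiters without a matching start are expected.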
  - getNext
```
public Token getNext()
              throws IOException,
                     PdfException
```
    Gets a token. Uses Tokenizer.getNext, and keeps track of the depth of dictionary and array nesting.
    
    Throws:
    
    IOException
    
    PdfException
  - getNext
```
public Token getNext(long max)
              throws IOException,
                     PdfException
```
    Gets a token. Uses Tokenizer.getNext, and keeps track of the depth of dictionary and array nesting.
    
    Parameters:
    
    max - Maximum allowable size of the token
    
    Throws:
    
    IOException
    
    PdfException
  - getNext
```
public Token getNext(Class<?> clas,
                     String errMsg)
              throws IOException,
                     PdfException
```
    A class-sensitive version of getNext. The token which is obtained must be of the specified class (or a subclass thereof), or a PdfInvalidException with message errMsg will be thrown.
    
    Throws:
    
    IOException
    
    PdfException
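
    The class check described above can be sketched with `Class.isInstance`, which accepts the named class and any of its subclasses. Every type in this block is a hypothetical stand-in rather than a JHOVE class, and `DemoInvalidException` merely plays the role of PdfInvalidException.

```java
import java.util.Iterator;
import java.util.List;

/** Hypothetical stand-ins; none of these types are the JHOVE classes. */
final class TypedTokenDemo {
    static class DemoToken {}
    static class DemoNumeric extends DemoToken {}
    static class DemoInteger extends DemoNumeric {}  // subclass, to show subclasses are accepted
    static class DemoName extends DemoToken {}

    /** Plays the role of PdfInvalidException in this sketch. */
    static class DemoInvalidException extends Exception {
        DemoInvalidException(String message) {
            super(message);
        }
    }

    private final Iterator<DemoToken> source;

    TypedTokenDemo(List<DemoToken> tokens) {
        this.source = tokens.iterator();
    }

    /**
     * Returns the next token if it is an instance of clas (or of a subclass);
     * otherwise throws an exception carrying errMsg.
     * Throws NoSuchElementException if the source is exhausted.
     */
    DemoToken getNext(Class<?> clas, String errMsg) throws DemoInvalidException {
        DemoToken token = source.next();
        if (!clas.isInstance(token)) {
            throw new DemoInvalidException(errMsg);
        }
        return token;
    }
}
```

    A caller that expects, say, a numeric token can pass `DemoNumeric.class` together with a message describing what was expected. The failed check then surfaces as one exception, with no manual `instanceof` test at each call site.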
  - getDictDepth
```
public int getDictDepth()
```
    Returns the number of dictionary starts not yet matched by dictionary ends.
  - setEncrypted
```
public void setEncrypted(boolean encrypted)
```
    Tells this Parser, and its Tokenizer, whether the file is encrypted.
  - getArrayDepth
```
public int getArrayDepth()
```
    Returns the number of array starts not yet matched by array ends.
  - getWSString
```
public String getWSString()
```
    Returns the Tokenizer's current whitespace string.
  - getLanguageCodes
```
public Set<String> getLanguageCodes()
```
    Returns the language code set from the Tokenizer.
  - getPDFACompliant
```
public boolean getPDFACompliant()
```
    Returns false if either the parser or the tokenizer has detected non-compliance with PDF/A restrictions. A value of true is no guarantee that the file is compliant.
  - setPDFACompliant
```
public void setPDFACompliant(boolean pdfACompliant)
```
    Sets the value of the pdfACompliant flag. This may be used to clear a previous detection of non-compliance. If the parameter has a value of true, the tokenizer's pdfACompliant flag is also set to true.
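
    The interplay of the getter and setter can be modelled as follows. This is a sketch built only from the descriptions above: the getter combines both flags, and the setter propagates only a value of true. `ComplianceFlagDemo` and `DemoTokenizer` are hypothetical names, and leaving the tokenizer untouched on false is an assumption inferred from the wording.

```java
/** Hypothetical model of a parser-level flag layered over a tokenizer-level flag. */
final class ComplianceFlagDemo {

    /** Stand-in for the Tokenizer's own PDF/A flag. */
    static final class DemoTokenizer {
        private boolean pdfACompliant = true;

        boolean getPDFACompliant() {
            return pdfACompliant;
        }

        void setPDFACompliant(boolean value) {
            pdfACompliant = value;
        }
    }

    private final DemoTokenizer tokenizer;
    private boolean pdfACompliant = true;

    ComplianceFlagDemo(DemoTokenizer tokenizer) {
        this.tokenizer = tokenizer;
    }

    /** False as soon as either side has recorded non-compliance. */
    boolean getPDFACompliant() {
        return pdfACompliant && tokenizer.getPDFACompliant();
    }

    /** Sets the parser flag; only a value of true is pushed down to the tokenizer. */
    void setPDFACompliant(boolean value) {
        pdfACompliant = value;
        if (value) {
            tokenizer.setPDFACompliant(true);
        }
    }
}
```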
  - readObjectDef
```
public PdfObject readObjectDef()
                        throws IOException,
                               PdfException
```
    Reads an object definition, from wherever we are in the stream to the completion of one full object after the obj keyword.
    
    Throws:
    
    IOException
    
    PdfException
  - readObjectDef
```
public PdfObject readObjectDef(Numeric objNumTok)
                        throws IOException,
                               PdfException
```
    Reads an object definition, given the first numeric object, which has already been read and is passed as an argument. This is called by the no-argument readObjectDef; the only other case in which it will be called is for a cross-reference stream, which can be distinguished from a cross-reference table only once the first token is read.
    
    Throws:
    
    IOException
    
    PdfException
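
    The reason for the one-argument overload can be sketched on plain strings. In PDF syntax a classic cross-reference table begins with the keyword `xref`, while a cross-reference stream is an ordinary indirect object that begins `<objNum> <genNum> obj`, so the caller must consume the first token before it knows which one it has. The names `XrefStartDemo`, `classify`, and `readHeader` are hypothetical, and the real method takes a Numeric token and not a `long`.

```java
import java.util.Iterator;

/** Hypothetical sketch: decide what follows after the first token has been read. */
final class XrefStartDemo {
    enum Kind { XREF_TABLE, OBJECT_DEFINITION }

    /** Classifies the structure by its first token. */
    static Kind classify(String firstToken) {
        if ("xref".equals(firstToken)) {
            return Kind.XREF_TABLE;
        }
        // Anything else must be an object number; parseLong throws
        // NumberFormatException if the token is not numeric.
        Long.parseLong(firstToken);
        return Kind.OBJECT_DEFINITION;
    }

    /**
     * Finishes reading "<objNum> <genNum> obj" when objNum was already consumed.
     * Returns {objNum, genNum}.
     */
    static long[] readHeader(long objNum, Iterator<String> rest) {
        long generation = Long.parseLong(rest.next());
        String keyword = rest.next();
        if (!"obj".equals(keyword)) {
            throw new IllegalArgumentException("Expected obj keyword, got " + keyword);
        }
        return new long[] { objNum, generation };
    }
}
```

    Passing the already-read number into `readHeader` mirrors the overload's purpose: the token does not have to be pushed back onto the stream.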
  - readObject
```
public PdfObject readObject(boolean allowPseudo)
                     throws IOException,
                            PdfException
```
    Reads an object. By design, this reader has a number of limitations:
    - It doesn't retain the contents of streams.
    - It doesn't recognize a stream when it's pointing at the stream's dictionary; it will just read the dictionary.
    
    Functions which it uses may call it recursively to build up structures. If it encounters a token inappropriate for an object start, it throws a PdfException on which getToken() may be called to retrieve that token.
    
    Throws:
    
    IOException
    
    PdfException
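
    The pattern of an exception that carries the offending token can be sketched as follows. Only the idea of calling getToken() on the caught exception comes from the description above. The toy grammar, `TokenCarryingDemo`, and `DemoParseException` are hypothetical, and tokens are plain strings here.

```java
/** Hypothetical sketch of an exception that hands the rejected token back to the caller. */
final class TokenCarryingDemo {

    /** Plays the role of a PdfException that remembers the token it rejected. */
    static final class DemoParseException extends Exception {
        private final String token;

        DemoParseException(String message, String token) {
            super(message);
            this.token = token;
        }

        String getToken() {
            return token;
        }
    }

    /** Accepts tokens that may begin an object in this toy grammar; rejects closers. */
    static String readObjectStart(String token) throws DemoParseException {
        if (token.equals("]") || token.equals(">>") || token.equals("endobj")) {
            throw new DemoParseException("Token cannot start an object", token);
        }
        return token;
    }

    /** Returns the rejected token, or null if the token was acceptable. */
    static String offendingToken(String token) {
        try {
            readObjectStart(token);
            return null;
        } catch (DemoParseException e) {
            return e.getToken();
        }
    }
}
```

    Because the token travels with the exception, the caller can inspect it and decide how to proceed even though it has already been consumed from the stream.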
  - readArray
```
public PdfArray readArray()
                   throws IOException,
                          PdfException
```
    Reads an array. When this is called, we have already read the ArrayStart token, and arrayDepth has been incremented to reflect this.
    
    Throws:
    
    IOException
    
    PdfException
  - readDictionary
```
public PdfDictionary readDictionary()
                             throws IOException,
                                    PdfException
```
    Reads a dictionary. When this is called, we have already read the DictionaryStart token, and dictDepth has been incremented to reflect this. Only for use in this special case, where we're picking up a dictionary in midstream.
    
    Throws:
    
    IOException
    
    PdfException
  - getOffset
```
public long getOffset()
```
    Returns the current offset into the file.
  - seek
```
public void seek(long offset)
          throws IOException,
                 PdfException
```
    Positions the file to the specified offset, and resets the state for a new token stream.
    
    Throws:
    
    IOException
    
    PdfException
  - scanMode
```
public void scanMode(boolean flag)
```
    If flag is true, tokens that are not delimited by whitespace (e.g., literal and hexadecimal strings) are not parsed.
    
    Parameters:
    
    flag - Scan mode flag
