public abstract class Tokenizer extends Object
Modifier and Type | Field | Description |
---|---|---|
protected int | _ch | Character code of current character. |
protected RandomAccessFile | _file | Source from which to read bytes. |
static char[] | PDFDOCENCODING | Mapping between PDFDocEncoding and Unicode code points. |
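The PDFDOCENCODING field is described only as a mapping between PDFDocEncoding and Unicode code points. The following is a minimal sketch of how such a table might be used. It assumes the table is a 256-entry char array indexed by the unsigned byte value. The class PdfDocDecodeSketch and its decode helper are hypothetical, and only a few of the non-identity entries from the PDF specification's PDFDocEncoding table are filled in.

```java
public class PdfDocDecodeSketch {
    // Hypothetical stand-in for Tokenizer.PDFDOCENCODING: 256 entries,
    // indexed by the unsigned value of each byte.
    static final char[] PDFDOC = new char[256];

    static {
        // Start from an identity mapping. PDFDocEncoding agrees with this for
        // printable ASCII and most of 0xA1-0xFF, but not for every byte.
        for (int i = 0; i < PDFDOC.length; i++) {
            PDFDOC[i] = (char) i;
        }
        // A few of the entries where PDFDocEncoding differs (partial list only).
        PDFDOC[0x80] = '\u2022'; // BULLET
        PDFDOC[0x81] = '\u2020'; // DAGGER
        PDFDOC[0x82] = '\u2021'; // DOUBLE DAGGER
        PDFDOC[0x83] = '\u2026'; // HORIZONTAL ELLIPSIS
        PDFDOC[0x84] = '\u2014'; // EM DASH
        PDFDOC[0x85] = '\u2013'; // EN DASH
        PDFDOC[0xA0] = '\u20AC'; // EURO SIGN
    }

    // Decodes a byte string one byte at a time by table lookup.
    static String decode(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length);
        for (byte b : bytes) {
            sb.append(PDFDOC[b & 0xFF]); // mask so bytes 0x80-0xFF index correctly
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] input = {0x41, (byte) 0x80, (byte) 0xA0};
        System.out.println(decode(input));
    }
}
```

A flat lookup array keeps decoding to one index operation per byte, which suits a tokenizer that converts strings as it reads them.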
Constructor | Description |
---|---|
Tokenizer() | Constructor. |
Modifier and Type | Method | Description |
---|---|---|
void | addLanguageCode(String langCode) | Add a string to the set of language codes. |
abstract void | backupChar() | Back up a byte so it will be read again. |
Set<String> | getLanguageCodes() | Return the set of language codes. |
Token | getNext() | Parses out and returns a token from the input file. |
Token | getNext(long max) | Parses out and returns a token from the input file, limiting the token to a maximum size of max. |
long | getOffset() | Return the current offset into the file. |
boolean | getPDFACompliant() | Returns the value of the pdfACompliant flag, which indicates that the tokenizer hasn't detected non-compliance. |
String | getWSString() | Returns the last white space string read by the tokenizer. |
protected abstract void | initStream(Stream token) | Initialization code for a Stream object. |
abstract int | readChar() | Get a character from the file or stream, using a buffer. |
int | readChar1(boolean utf16) | Read a character in one-byte or two-byte format, as requested. |
void | scanMode(boolean flag) | If true, do not attempt to parse tokens that are not whitespace-delimited, e.g., literal and hexadecimal strings. |
abstract void | seek(long offset) | Set the Tokenizer to a new position in the file. |
protected void | seekReset(long offset) | Reset after a seek. |
void | setEncrypted(boolean encrypted) | Tell this object whether or not the file is encrypted. |
void | setPDFACompliant(boolean pdfACompliant) | Set the value of the pdfACompliant flag. |
protected abstract void | setStreamOffset(Stream token) | Sets the offset of a Stream to the current file position. |
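scanMode(true) turns off parsing of tokens that are not delimited by white space, such as literal and hexadecimal strings. The hypothetical ScanModeSketch below illustrates the difference on a toy input. It assumes that in scan mode a literal string such as (a b) is simply split at white space. It ignores dictionary delimiters (<<, >>) and every other PDF delimiter, so it is not how Tokenizer itself is implemented.

```java
import java.util.ArrayList;
import java.util.List;

public class ScanModeSketch {
    // Hypothetical tokenizer showing what a scan-mode flag could switch off.
    static List<String> tokenize(String input, boolean scanMode) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        int n = input.length();
        while (i < n) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
                continue;
            }
            int start = i;
            if (!scanMode && c == '(') {
                // Literal string: balanced parentheses, may contain white space.
                int depth = 0;
                while (i < n) {
                    char d = input.charAt(i++);
                    if (d == '\\') {
                        i++; // skip the escaped character
                        continue;
                    }
                    if (d == '(') {
                        depth++;
                    } else if (d == ')' && --depth == 0) {
                        break;
                    }
                }
            } else if (!scanMode && c == '<') {
                // Hexadecimal string: runs to the next '>'.
                int close = input.indexOf('>', i);
                i = (close < 0) ? n : close + 1;
            } else {
                // Plain whitespace-delimited token.
                while (i < n && !Character.isWhitespace(input.charAt(i))) {
                    i++;
                }
            }
            tokens.add(input.substring(start, Math.min(i, n)));
        }
        return tokens;
    }

    public static void main(String[] args) {
        String src = "(a b) <4142> x";
        System.out.println(tokenize(src, false)); // [(a b), <4142>, x]
        System.out.println(tokenize(src, true));  // [(a, b), <4142>, x]
    }
}
```

In scan mode the parser never has to track nesting or escapes, which is presumably why it is useful when skimming a file for keywords.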
public static char[] PDFDOCENCODING
protected RandomAccessFile _file
protected int _ch
public Token getNext() throws IOException, PdfException
Throws: IOException, PdfException
public Token getNext(long max) throws IOException, PdfException
Parameters: max - Maximum allowable size of the token.
Throws: IOException, PdfException
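getNext(long max) takes the maximum allowable size of the token. The following sketches one way such a limit could be enforced. It assumes a string input and whitespace-delimited tokens only. The MaxTokenSketch class is hypothetical, and it throws IllegalStateException where the real method declares IOException and PdfException.

```java
public class MaxTokenSketch {
    private final String input;
    private int pos = 0; // index of the next unread character

    MaxTokenSketch(String input) {
        this.input = input;
    }

    // Returns the next whitespace-delimited token, or null at end of input.
    // Throws IllegalStateException if the token would exceed max characters
    // (a stand-in for whatever exception the real class raises).
    String getNext(long max) {
        while (pos < input.length() && Character.isWhitespace(input.charAt(pos))) {
            pos++;
        }
        if (pos >= input.length()) {
            return null;
        }
        int start = pos;
        while (pos < input.length() && !Character.isWhitespace(input.charAt(pos))) {
            if (pos - start >= max) {
                throw new IllegalStateException("Token exceeds maximum size of " + max);
            }
            pos++;
        }
        return input.substring(start, pos);
    }

    public static void main(String[] args) {
        MaxTokenSketch t = new MaxTokenSketch("  12 0 obj");
        System.out.println(t.getNext(10)); // 12
    }
}
```

Checking the length inside the loop stops reading as soon as the limit is passed, rather than buffering an arbitrarily long token first.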
public long getOffset()
public Set<String> getLanguageCodes()
public void setEncrypted(boolean encrypted)
public boolean getPDFACompliant()
A return value of true is no guarantee that the file is compliant.
public void setPDFACompliant(boolean pdfACompliant)
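The pdfACompliant flag can only report that no violation has been seen, which is why true is no guarantee of compliance. The hypothetical sketch below clears the flag when one example rule fails. PDF/A-1 (ISO 19005-1) requires hexadecimal strings to contain an even number of non-white-space characters, each a hexadecimal digit. Which checks Tokenizer actually performs is not stated on this page.

```java
public class PdfACompliantSketch {
    // Starts true; cleared as soon as any non-compliance is detected.
    private boolean pdfACompliant = true;

    public boolean getPDFACompliant() {
        return pdfACompliant;
    }

    public void setPDFACompliant(boolean pdfACompliant) {
        this.pdfACompliant = pdfACompliant;
    }

    // PDF white-space characters: NUL, HT, LF, FF, CR, SP.
    static boolean isPdfWhitespace(char c) {
        return c == '\0' || c == '\t' || c == '\n' || c == '\f' || c == '\r' || c == ' ';
    }

    static boolean isHexDigit(char c) {
        return (c >= '0' && c <= '9') || (c >= 'A' && c <= 'F') || (c >= 'a' && c <= 'f');
    }

    // Example check on the text between '<' and '>' of a hexadecimal string.
    void checkHexString(String body) {
        int digits = 0;
        for (int i = 0; i < body.length(); i++) {
            char c = body.charAt(i);
            if (isPdfWhitespace(c)) {
                continue;
            }
            if (!isHexDigit(c)) {
                setPDFACompliant(false);
                return;
            }
            digits++;
        }
        if (digits % 2 != 0) {
            setPDFACompliant(false);
        }
    }

    public static void main(String[] args) {
        PdfACompliantSketch s = new PdfACompliantSketch();
        s.checkHexString("41 42");
        System.out.println(s.getPDFACompliant()); // true
        s.checkHexString("414");
        System.out.println(s.getPDFACompliant()); // false
    }
}
```

Because nothing in the sketch ever sets the flag back to true, one violation anywhere in the file is enough to report non-compliance.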
public String getWSString()
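getWSString() exposes the white space read before the most recent token. The following is a sketch. It assumes the tokenizer simply records the exact run of PDF white-space characters (NUL, tab, line feed, form feed, carriage return, space) that it skips. The WhitespaceSketch class and its next() method are hypothetical. Keeping the raw string lets a caller distinguish, for example, CR LF from a lone LF after a keyword.

```java
public class WhitespaceSketch {
    private final String input;
    private int pos = 0;
    private String wsString = ""; // white space skipped before the last token

    WhitespaceSketch(String input) {
        this.input = input;
    }

    // PDF white-space characters: NUL, HT, LF, FF, CR, SP.
    static boolean isPdfWhitespace(char c) {
        return c == '\0' || c == '\t' || c == '\n' || c == '\f' || c == '\r' || c == ' ';
    }

    // Skips white space, remembering exactly which characters were skipped,
    // then returns the following run of non-white-space characters (or null).
    String next() {
        int wsStart = pos;
        while (pos < input.length() && isPdfWhitespace(input.charAt(pos))) {
            pos++;
        }
        wsString = input.substring(wsStart, pos);
        if (pos >= input.length()) {
            return null;
        }
        int start = pos;
        while (pos < input.length() && !isPdfWhitespace(input.charAt(pos))) {
            pos++;
        }
        return input.substring(start, pos);
    }

    String getWSString() {
        return wsString;
    }

    public static void main(String[] args) {
        WhitespaceSketch w = new WhitespaceSketch("obj\r\n  stream");
        w.next();
        w.next();
        System.out.println(w.getWSString().length()); // 4
    }
}
```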
public abstract void seek(long offset) throws IOException, PdfException
Parameters: offset - The offset in bytes from the start of the file.
Throws: IOException, PdfException
protected void seekReset(long offset)
public abstract int readChar() throws IOException
Throws: IOException
public int readChar1(boolean utf16) throws IOException
Throws: IOException
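readChar1(boolean utf16) reads either one byte or a two-byte unit. The following is a minimal sketch. It assumes the two bytes are combined big-endian, the byte order PDF uses for UTF-16 text strings, and that a missing second byte is reported as end of data. ReadChar1Sketch works over a byte array instead of a file and is hypothetical.

```java
public class ReadChar1Sketch {
    private final byte[] data;
    private int pos = 0;

    ReadChar1Sketch(byte[] data) {
        this.data = data;
    }

    // Returns the next unsigned byte (0-255), or -1 at end of data.
    int readChar() {
        return pos < data.length ? data[pos++] & 0xFF : -1;
    }

    // One byte when utf16 is false; otherwise two bytes combined big-endian.
    int readChar1(boolean utf16) {
        int c1 = readChar();
        if (!utf16 || c1 < 0) {
            return c1;
        }
        int c2 = readChar();
        if (c2 < 0) {
            return -1; // odd trailing byte: treated as end of data in this sketch
        }
        return (c1 << 8) | c2;
    }

    public static void main(String[] args) {
        ReadChar1Sketch r = new ReadChar1Sketch(new byte[]{0x00, 0x41});
        System.out.println(r.readChar1(true)); // 65
    }
}
```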
public abstract void backupChar()
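The abstract readChar(), backupChar() and seek() methods, together with getOffset() and seekReset(), describe a byte source with one-step pushback. The following sketches how a concrete subclass might implement them. It assumes an in-memory byte array in place of the _file RandomAccessFile. BufferedSourceSketch is hypothetical and omits the read buffer that the readChar() summary mentions.

```java
public class BufferedSourceSketch {
    private final byte[] data; // stands in for the RandomAccessFile _file
    private long offset = 0;   // offset of the next byte to be read
    private int ch = -1;       // last character read, like the _ch field

    BufferedSourceSketch(byte[] data) {
        this.data = data;
    }

    // Returns the next byte (0-255), or -1 at end of data.
    int readChar() {
        if (offset >= data.length) {
            ch = -1;
            return ch;
        }
        ch = data[(int) offset++] & 0xFF;
        return ch;
    }

    // Back up one byte so the next readChar() returns it again.
    // Does nothing if the previous read hit end of data or followed a seek.
    void backupChar() {
        if (ch >= 0 && offset > 0) {
            offset--;
        }
    }

    long getOffset() {
        return offset;
    }

    // Moves to an absolute offset from the start of the data.
    void seek(long newOffset) {
        if (newOffset < 0 || newOffset > data.length) {
            throw new IllegalArgumentException("Offset out of range: " + newOffset);
        }
        offset = newOffset;
        seekReset(newOffset);
    }

    // Discards per-position state that is stale after a seek.
    protected void seekReset(long newOffset) {
        ch = -1;
    }

    public static void main(String[] args) {
        BufferedSourceSketch s = new BufferedSourceSketch("abc".getBytes());
        s.readChar();
        s.readChar();
        s.backupChar();
        System.out.println((char) s.readChar()); // b
    }
}
```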
public void addLanguageCode(String langCode)
public void scanMode(boolean flag)
Parameters: flag - Scan mode flag; if true, literal and hexadecimal strings are not parsed.
protected abstract void initStream(Stream token) throws IOException, PdfException
Throws: IOException, PdfException
protected abstract void setStreamOffset(Stream token) throws IOException, PdfException
Throws: IOException, PdfException
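setStreamOffset records where a stream's data begins. The PDF specification requires the stream keyword to be followed by an end-of-line marker of either CR LF or LF alone, not CR alone, so the data starts after that marker. The following is a sketch. It assumes the caller already knows the offset just past the keyword. StreamOffsetSketch and streamDataOffset are hypothetical, and returning -1 for a bad marker is an arbitrary choice.

```java
public class StreamOffsetSketch {
    // Given the offset just past the "stream" keyword, returns the offset of
    // the first byte of stream data, skipping the required EOL marker.
    // Returns -1 if the marker is missing or is a lone CR.
    static long streamDataOffset(byte[] data, int afterKeyword) {
        int i = afterKeyword;
        if (i < data.length && data[i] == '\r') {
            if (i + 1 < data.length && data[i + 1] == '\n') {
                return i + 2; // CR LF
            }
            return -1; // CR alone is not permitted
        }
        if (i < data.length && data[i] == '\n') {
            return i + 1; // LF alone
        }
        return -1; // no EOL marker
    }

    public static void main(String[] args) {
        byte[] d = "stream\r\nDATA".getBytes(java.nio.charset.StandardCharsets.US_ASCII);
        System.out.println(streamDataOffset(d, 6)); // 8
    }
}
```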
Copyright © 2008–2017 The Open Preservation Foundation. All rights reserved.