java.lang.Object
|
+--superwaba.ext.xplat.xml.XmlTokenizer
|
+--superwaba.ext.xplat.xml.XmlReader
Class to read HTML or XML documents, reporting events to handlers (for example, ContentHandler).
Note: While in the SAX 2.0 spirit, this implementation is not fully compliant. Speed and footprint took precedence over what the author judged to be details.
Unlike SAX, methods that report tag names, such as ContentHandler.startElement(int, superwaba.ext.xplat.xml.AttributeList), receive an integer tag code rather than the name itself. This, again, is for performance reasons: comparing integers is notably more efficient than comparing strings, and XML applications compare tag names heavily.
The tag code must uniquely identify the name of the tag. The default implementation, getTagCode(byte[], int, int), simply hashes the tag name; it can be overridden to suit specific needs.
Tag names should be translated to tag codes as soon as they are known (when reading the DTD, for instance), or computed in advance and saved into a static correspondence table.
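A static correspondence table could look like the following sketch. TagTable is hypothetical, and its tagCode method is only a stand-in for getTagCode(byte[], int, int), whose actual hash is not documented here. A real table should be filled by calling the reader's own getTagCode so that the codes match the ones reported to the ContentHandler. The table also enforces the uniqueness requirement stated above.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical correspondence table between tag names and tag codes.
// tagCode() is a stand-in for XmlReader.getTagCode(byte[], int, int): the hash
// SuperWaba actually uses is not documented here, so a real table should be
// filled with the reader's own getTagCode() to guarantee matching codes.
final class TagTable {
    private final Map<Integer, String> names = new HashMap<>();

    static int tagCode(byte[] b, int offset, int count) {
        int h = 0;
        for (int i = offset; i < offset + count; i++) {
            h = 31 * h + (b[i] & 0xFF);   // simple polynomial hash over the bytes
        }
        return h;
    }

    static int tagCode(String name) {
        byte[] bytes = name.getBytes(StandardCharsets.US_ASCII);
        return tagCode(bytes, 0, bytes.length);
    }

    /** Registers a tag name (e.g. while reading the DTD) and returns its code. */
    int register(String name) {
        int code = tagCode(name);
        String previous = names.putIfAbsent(code, name);
        // A tag code must identify exactly one name: reject hash collisions.
        if (previous != null && !previous.equals(name)) {
            throw new IllegalStateException("tag code collision: " + previous + " / " + name);
        }
        return code;
    }

    /** Maps a reported tag code back to its name, or null if unknown. */
    String nameOf(int code) {
        return names.get(code);
    }
}
```

With this stand-in hash, "Aa" and "BB" produce the same code (2112), so registering both throws. The collision check is what guarantees that a code identifies exactly one name.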
Field Summary
protected CharacterConverter converter
    charsetName - protected to allow non-default locale encoding
protected int tagNameHashId
    hash ID of the current tag name, set by foundStartTagName or foundEndTagName
Constructor Summary
XmlReader()
    Constructor.
Method Summary
void foundAttributeName(byte[] buffer, int offset, int count)
    Method called when an attribute name has been found.
void foundAttributeValue(byte[] buffer, int offset, int count, byte dlm)
    Method called when an attribute value has been found.
void foundCharacter(char charFound)
    Method called when a character has been found in the content as the result of resolving a character reference.
void foundCharacterData(byte[] buffer, int offset, int count)
    Method called when character data content has been found.
void foundComment(byte[] buffer, int offset, int count)
    Method called when a comment has been found.
void foundEndEmptyTag()
    Method called when an empty-tag has been found.
void foundEndOfInput(int count)
    Method called when the end of the input was found, and tokenization is about to end.
void foundEndTagName(byte[] buffer, int offset, int count)
    Method called when an end-tag has been found.
void foundStartTagName(byte[] buffer, int offset, int count)
    Method called when a start-tag has been found.
ContentHandler getContentHandler()
    Return the current content handler.
protected int getTagCode(byte[] b, int offset, int count)
    Method to compute the tag code identifying a tag name.
void parse(byte[] input, int offset, int count)
    Parse XML data from an array of bytes, offset and count.
void parse(Stream input)
    Parse an XML document from a Stream.
void parse(Stream input, byte[] buffer, int start, int end, int pos)
    Parse an XML document from an already buffered Stream.
void parse(XmlReadable input)
    Parse an XmlReadable.
AttributeList.Filter setAttributeListFilter(AttributeList.Filter filter)
    Set an AttributeList.Filter to filter the attributes entered in the AttributeList.
void setContentHandler(ContentHandler cntHandler)
    Allow an application to register a content event handler.
void setNewlineSignificant(boolean val)
    Enable or disable coalescing of white space, according to HTML rules.
Methods inherited from class superwaba.ext.xplat.xml.XmlTokenizer
disableReferenceResolution, foundDeclaration, foundInvalidData, foundProcessingInstruction, foundReference, foundStartOfInput, getAbsoluteOffset, isDataCDATA, resolveCharacterReference, setCdataContents, setStrictlyXml, tokenize (four overloads)
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait (three overloads)
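Field Detail
protected CharacterConverter converter
charsetName - protected to allow non-default locale encoding
protected int tagNameHashId
Hash ID of the current tag name, set by foundStartTagName or foundEndTagName.
Constructor Detail
public XmlReader()
Constructor.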
Method Detail
public void setContentHandler(ContentHandler cntHandler)
Allow an application to register a content event handler.
If the application does not register a content handler, all content events reported by the parser are silently ignored.
Applications may register a new or different handler in the middle of a parse, and the parser must begin using the new handler immediately.
cntHandler - the content handler.
See Also: getContentHandler()
public AttributeList.Filter setAttributeListFilter(AttributeList.Filter filter)
Set an AttributeList.Filter to filter the attributes entered in the AttributeList.
filter - AttributeList.Filter to set, or null if the current AttributeList filter must be removed
public ContentHandler getContentHandler()
Return the current content handler.
See Also: setContentHandler(superwaba.ext.xplat.xml.ContentHandler)
public final void parse(Stream input)
throws SyntaxException
The application can use this method to instruct the XML reader to begin parsing an XML document read from a Stream.
Here is the general contract for all parse methods.
Applications may not invoke this method while a parse is in progress (they should instead create a new XmlReader for each nested XML document). Once a parse is complete, an application may reuse the same XmlReader object, possibly with a different input source.
During the parse, the XmlReader provides information about the XML document through the registered event handlers.
This method is synchronous: it does not return until parsing has ended. If a client application wants to terminate parsing early, it should throw an exception.
input - The input source for the top-level XML document.
See Also: setContentHandler(superwaba.ext.xplat.xml.ContentHandler)
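The contract says a client should throw an exception to stop parsing early. The sketch below shows that pattern with stand-in types. TagListener, StopParsing and EarlyStop.parse are hypothetical and only mimic a synchronous parse that reports tags in order. With the real API, the exception would be thrown from a ContentHandler callback and caught around the XmlReader.parse(...) call. Using an unchecked exception is an assumption, based on the fact that the callback methods documented here declare no throws clause.

```java
// Hypothetical stand-ins, not SuperWaba types: a callback interface and a
// synchronous "parse" that reports tag names in document order.
interface TagListener {
    void startTag(String name);
}

// Unchecked exception used only to unwind out of the parse early.
final class StopParsing extends RuntimeException {
    StopParsing() {
        super(null, null, false, false);   // no message, no cause, no stack trace
    }
}

final class EarlyStop {
    static void parse(String[] tags, TagListener listener) {
        for (String t : tags) {
            listener.startTag(t);
        }
    }

    /** Counts tags up to and including the first "title", then stops parsing. */
    static int tagsUntilTitle(String[] tags) {
        int[] seen = {0};
        try {
            parse(tags, name -> {
                seen[0]++;
                if (name.equals("title")) {
                    throw new StopParsing();
                }
            });
        } catch (StopParsing expected) {
            // Parsing was ended on purpose; the count so far is the result.
        }
        return seen[0];
    }
}
```

A dedicated exception type keeps a deliberate stop distinguishable from a real parse failure such as SyntaxException.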
public final void parse(Stream input,
byte[] buffer,
int start,
int end,
int pos)
throws SyntaxException
Compared with the general method above, this method takes additional arguments describing bytes already read from the stream. Use it when the HTML document is embedded within an HTTP stream and part of that stream has already been read into a buffer.
See the general contract of parse(Stream).
input - stream to parse
buffer - buffer, already filled with bytes read from the input stream
start - starting position in the buffer
end - ending position in the buffer
pos - read position of the byte at offset 0 in the buffer
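To illustrate these arguments, the sketch below finds where an HTTP body begins in a buffer that already holds the first bytes of a response (HTTP/1.x headers end with an empty line, CR LF CR LF). That index would serve as start, the number of bytes filled as end, and pos as the stream position of buffer[0] (0 if the buffer was filled from the beginning of the stream). HttpBody and its methods are hypothetical helpers. Treating end as an exclusive bound is an assumption.

```java
// Hypothetical helpers, not SuperWaba code, for computing the arguments of
// parse(Stream input, byte[] buffer, int start, int end, int pos).
final class HttpBody {
    /**
     * Returns the index just past the blank line (CR LF CR LF) that ends the
     * HTTP headers within the first `filled` bytes of buf, or -1 if not found.
     */
    static int bodyStart(byte[] buf, int filled) {
        for (int i = 0; i + 3 < filled; i++) {
            if (buf[i] == '\r' && buf[i + 1] == '\n' && buf[i + 2] == '\r' && buf[i + 3] == '\n') {
                return i + 4;
            }
        }
        return -1;
    }

    /** Absolute stream position of buf[index], given pos for buf[0]. */
    static int absolutePosition(int pos, int index) {
        return pos + index;
    }
}
```

For a buffer holding "HTTP/1.0 200 OK\r\nContent-Type: text/html\r\n\r\n<html>", bodyStart returns 44, the index of the '<' that opens the document.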
public final void parse(XmlReadable input)
throws SyntaxException
input - The input source for the top-level XML document.
public final void parse(byte[] input,
int offset,
int count)
throws SyntaxException
See the general contract of parse(Stream).
input - byte array to parse
offset - position of the first byte in the array
count - number of bytes to parse
public void setNewlineSignificant(boolean val)
Enable or disable coalescing of white space, according to HTML rules. White space is any character less than or equal to the ASCII space (0x20).
This method makes it possible to process the contents of pre-formatted lines, such as the contents of the <PRE> tag. When a parse starts, newlines are not significant; hence setNewlineSignificant must be called after the parse has started. For example, to make all newlines significant:
class MyXmlReader extends XmlReader {
    public void foundStartOfInput(byte[] input, int offset, int count) {
        // Called once the parse has started, so the setting is not reset afterwards.
        setNewlineSignificant(true);
    }
}
Note: calls are stacked. Each call with false cancels one earlier call with true:
setNewlineSignificant(true);  // newlines are significant - stack is 1
setNewlineSignificant(true);  // newlines are significant - stack is 2
setNewlineSignificant(false); // newlines are still significant - stack is 1
setNewlineSignificant(false); // newlines are no longer significant - stack is 0
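The stacked behaviour amounts to a depth counter. The sketch below is a minimal model of it. NewlineStack is a hypothetical name, not part of SuperWaba. It assumes (this is not documented) that a call with false at depth zero is ignored rather than driving the count negative.

```java
// Hypothetical model of the "stacked" setNewlineSignificant semantics:
// true pushes, false pops, and newlines are significant while depth > 0.
final class NewlineStack {
    private int depth;

    void setNewlineSignificant(boolean val) {
        if (val) {
            depth++;
        } else if (depth > 0) {   // assumption: extra "false" calls are ignored
            depth--;
        }
    }

    boolean isNewlineSignificant() {
        return depth > 0;
    }
}
```

Counting instead of storing a single flag lets nested <PRE> regions each turn significance on and off without cancelling the enclosing one.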
val - true if newline characters must be significant, false if they must be collapsed according to HTML rules.
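The coalescing itself can be pictured with the following sketch. It is hypothetical: WhiteSpace.coalesce is not a SuperWaba method. It assumes that, when newlines are significant, only '\n' bytes are preserved while other runs of white space (bytes up to 0x20) still collapse to a single space. The actual XmlReader rules may differ.

```java
// Hypothetical sketch, not SuperWaba code: HTML-style white-space coalescing
// where "white space" is any byte <= 0x20, as documented above.
final class WhiteSpace {
    static String coalesce(byte[] buf, int offset, int count, boolean newlineSignificant) {
        StringBuilder out = new StringBuilder(count);
        boolean inRun = false;                 // true while inside a run of white space
        for (int i = offset; i < offset + count; i++) {
            int c = buf[i] & 0xFF;             // unsigned byte value
            if (newlineSignificant && c == '\n') {
                out.append('\n');              // assumption: keep each newline as-is
                inRun = false;
            } else if (c <= 0x20) {
                if (!inRun) {
                    out.append(' ');           // a whole run becomes one space
                    inRun = true;
                }
            } else {
                out.append((char) c);          // bytes mapped to chars one-to-one (Latin-1 style)
                inRun = false;
            }
        }
        return out.toString();
    }
}
```

For the bytes of "a  \t b\n\nc", this yields "a b c" when newlines are not significant, and "a b\n\nc" when they are.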
protected int getTagCode(byte[] b,
int offset,
int count)
Method to compute the tag code identifying a tag name. This is the value passed to ContentHandler methods to report a tag name. Derived classes may override it.
b - byte array containing the bytes to be hashed
offset - position of the first byte in the array
count - number of bytes to be hashed
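One reason to override getTagCode: HTML tag names are case-insensitive, so a subclass might fold ASCII letters to lower case before hashing, making <BODY> and <body> report the same code. The class below is a hypothetical, self-contained sketch of such a hash. In a real subclass, its loop would form the body of the protected getTagCode(byte[] b, int offset, int count) override. The multiplier 31 is an arbitrary choice, not SuperWaba's.

```java
// Hypothetical case-insensitive tag hash. In a subclass of XmlReader, the same
// loop could be the body of an override of
//     protected int getTagCode(byte[] b, int offset, int count)
final class CaseInsensitiveTagCode {
    static int tagCode(byte[] b, int offset, int count) {
        int h = 0;
        for (int i = offset; i < offset + count; i++) {
            int c = b[i] & 0xFF;
            if (c >= 'A' && c <= 'Z') {
                c += 'a' - 'A';          // fold ASCII upper case to lower case
            }
            h = 31 * h + c;
        }
        return h;
    }
}
```

Any override must keep the uniqueness requirement: names that should be distinct must still map to distinct codes.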
public void foundStartTagName(byte[] buffer,
int offset,
int count)
Method called when a start-tag has been found. This implementation sets tagNameHashId. Derived classes may override this method.
buffer - byte array containing the name of the tag that started
offset - position of the first character of the tag name in the array
count - number of bytes the tag name is made of
public void foundEndTagName(byte[] buffer,
int offset,
int count)
Method called when an end-tag has been found. This implementation sets tagNameHashId. Derived classes may override this method.
buffer - byte array containing the name of the tag that ended
offset - position of the first character of the tag name in the array
count - number of bytes the tag name is made of
public final void foundEndEmptyTag()
This method is called just after all events related to the start tag have been reported. The implied tag name is that of the start tag (i.e., the most recently reported start-tag).
This method is declared final in XmlReader, so subclasses of XmlReader cannot override it.
Example: <FOO A="B"/> generates:
- foundStartTagName("FOO");
- foundAttributeName("A");
- foundAttributeValue("B");
- foundEndEmptyTag();
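To make that event order concrete, the toy scanner below records the callbacks an empty tag would trigger. It is hypothetical and deliberately narrow: it handles one well-formed empty tag with double-quoted values and no entity handling. It does not reflect how XmlTokenizer works internally.

```java
import java.util.ArrayList;
import java.util.List;

// Toy illustration (not SuperWaba code) of the event order for an empty tag.
// Accepts only a single well-formed tag like <FOO A="B"/> with "..." values.
final class EmptyTagEvents {
    static List<String> events(String tag) {
        List<String> out = new ArrayList<>();
        int i = 1;                                        // skip '<'
        int nameEnd = i;
        while (tag.charAt(nameEnd) > ' ' && tag.charAt(nameEnd) != '/') {
            nameEnd++;
        }
        out.add("foundStartTagName(" + tag.substring(i, nameEnd) + ")");
        i = nameEnd;
        while (true) {
            while (tag.charAt(i) <= ' ') {
                i++;                                      // skip white space
            }
            if (tag.charAt(i) == '/') {
                break;                                    // reached "/>"
            }
            int eq = tag.indexOf('=', i);
            out.add("foundAttributeName(" + tag.substring(i, eq) + ")");
            int close = tag.indexOf('"', eq + 2);         // value is "..." right after '='
            out.add("foundAttributeValue(" + tag.substring(eq + 2, close) + ")");
            i = close + 1;
        }
        out.add("foundEndEmptyTag()");
        return out;
    }
}
```

For <BR/>, the same scanner reports only foundStartTagName(BR) followed by foundEndEmptyTag().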
public final void foundCharacterData(byte[] buffer,
int offset,
int count)
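Method called when character data content has been found. This method is declared final in XmlReader, so subclasses of XmlReader cannot override it.
buffer - byte array containing the character data that was found
offset - position of the first character data in the array
count - number of bytes the character data content is made of
public final void foundCharacter(char charFound)
Method called when a character has been found in the content as the result of resolving a character reference. This method is declared final in XmlReader, so subclasses of XmlReader cannot override it.
charFound - the resolved character; if the character is invalid, this value is set to '\uffff', which is not a valid Unicode character.
See Also: XmlTokenizer.foundReference(byte[], int, int)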
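For context on where charFound comes from, here is a minimal sketch of numeric character reference resolution that follows the '\uffff' convention for invalid input. CharRef is hypothetical and is not XmlTokenizer.resolveCharacterReference. It accepts only the text between '&' and ';' in the forms #65 and #x41, and it treats named references such as amp as invalid.

```java
// Hypothetical sketch (not SuperWaba code) of numeric character reference
// resolution: "#65" and "#x41" both give 'A'; anything invalid gives '\uffff'.
final class CharRef {
    static char resolve(String ref) {
        if (ref.length() < 2 || ref.charAt(0) != '#') {
            return '\uffff';                      // not a numeric reference
        }
        boolean hex = ref.charAt(1) == 'x' || ref.charAt(1) == 'X';
        String digits = hex ? ref.substring(2) : ref.substring(1);
        if (digits.isEmpty()) {
            return '\uffff';
        }
        try {
            int code = Integer.parseInt(digits, hex ? 16 : 10);
            // Only values that fit in one Java char (and are not U+FFFF) are accepted.
            if (code < 0 || code >= 0xFFFF) {
                return '\uffff';
            }
            return (char) code;
        } catch (NumberFormatException e) {
            return '\uffff';                      // bad digits or overflow
        }
    }
}
```

Returning a sentinel instead of throwing lets the tokenizer keep going and leaves the decision about invalid references to the application.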
public final void foundAttributeName(byte[] buffer,
int offset,
int count)
Method called when an attribute name has been found. This method is declared final in XmlReader, so subclasses of XmlReader cannot override it.
buffer - byte array containing the attribute name
offset - position of the first character of the attribute name in the array
count - number of bytes the attribute name is made of
public final void foundAttributeValue(byte[] buffer,
int offset,
int count,
byte dlm)
Method called when an attribute value has been found. This method is declared final in XmlReader, so subclasses of XmlReader cannot override it.
buffer - byte array containing the attribute value
offset - position of the first character of the attribute value in the array
count - number of bytes the attribute value is made of
dlm - delimiter that started the attribute value (' or "), or '\0' if none
public final void foundComment(byte[] buffer,
int offset,
int count)
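Method called when a comment has been found. This method is declared final in XmlReader, so subclasses of XmlReader cannot override it.
buffer - byte array containing the comment (without the <!-- and --> delimiters)
offset - position of the first character of the comment in the array
count - number of bytes the comment is made of
public final void foundEndOfInput(int count)
Method called when the end of the input was found, and tokenization is about to end. This method is declared final in XmlReader, so subclasses of XmlReader cannot override it.
count - count of bytes parsed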