superwaba.ext.xplat.html
Class HtmlReader

java.lang.Object
  |
  +--superwaba.ext.xplat.xml.XmlTokenizer
        |
        +--superwaba.ext.xplat.xml.XmlReader
              |
              +--superwaba.ext.xplat.html.HtmlReader

public class HtmlReader
extends XmlReader

HtmlReader extends XmlReader in order to:


Fields inherited from class superwaba.ext.xplat.xml.XmlReader
converter, tagNameHashId
 
Constructor Summary
HtmlReader()
          Constructor
 
Method Summary
 void foundEndTagName(byte[] buffer, int offset, int count)
          Method called when an end-tag has been found.
protected  void foundReference(byte[] input, int offset, int count)
          Method called when a reference has been found in content.
 void foundStartTagName(byte[] buffer, int offset, int count)
          Method called when a start-tag has been found.
protected  int getTagCode(byte[] b, int offset, int count)
          Method to compute the tag code identifying a tag name.
 
Methods inherited from class superwaba.ext.xplat.xml.XmlReader
foundAttributeName, foundAttributeValue, foundCharacter, foundCharacterData, foundComment, foundEndEmptyTag, foundEndOfInput, getContentHandler, parse, parse, parse, parse, setAttributeListFilter, setContentHandler, setNewlineSignificant
 
Methods inherited from class superwaba.ext.xplat.xml.XmlTokenizer
disableReferenceResolution, foundDeclaration, foundInvalidData, foundProcessingInstruction, foundStartOfInput, getAbsoluteOffset, isDataCDATA, resolveCharacterReference, setCdataContents, setStrictlyXml, tokenize, tokenize, tokenize, tokenize
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

HtmlReader

public HtmlReader()
Constructor
Method Detail

foundReference

protected void foundReference(byte[] input,
                              int offset,
                              int count)
Description copied from class: XmlTokenizer
Method called when a reference has been found in content.

It can be a named or numeric character reference, or an entity reference.  Because references have several syntaxes, the validity of the reference "name" is not verified beforehand.

For convenience, the static method XmlTokenizer.resolveCharacterReference(byte[],int,int) converts a character reference into its UCS-2 encoded value.

Note:  foundReference is called only if XmlTokenizer.disableReferenceResolution(boolean disable) has been called first, with disable set to true.  Otherwise, foundReference is never called, and XmlTokenizer.foundCharacter(char) is called instead.  This design makes it easy to handle both simple XML documents (only predefined named character entities and numeric character entities) and documents that have user-defined internal or external entities.  This is explained below.

When working with a set of externally defined entities, call disableReferenceResolution(true) to turn off automatic reference resolution.  Your code in foundReference can then make a quick check to see whether the found reference is numeric.  If it is numeric (it starts with a # character), call resolveCharacterReference; if it is not, check whether the reference belongs to the list of entities defined for the parsed document.  If it does, do the substitution; if not, call resolveCharacterReference, because it could be one of the XML predefined entities.
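
The decision order described above can be sketched in plain Java.  This is only an illustration under stated assumptions, not the SuperWaba implementation: the class ReferenceSketch, its entity table, and the helper resolveCharacterReferenceStandIn are hypothetical names invented here, and the helper merely stands in for XmlTokenizer.resolveCharacterReference, whose exact behavior and return type are not shown on this page.  The sketch also assumes the reference name arrives without the surrounding & and ; characters.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class ReferenceSketch {
    // Hypothetical table of entities defined for the parsed document.
    private final Map<String, String> documentEntities = new HashMap<>();

    public ReferenceSketch() {
        documentEntities.put("company", "ACME Corp.");
    }

    // Stand-in (assumption) for XmlTokenizer.resolveCharacterReference:
    // handles numeric references and the five XML predefined entities.
    // Returns -1 when the name cannot be resolved.
    static int resolveCharacterReferenceStandIn(byte[] input, int offset, int count) {
        String name = new String(input, offset, count, StandardCharsets.US_ASCII);
        if (name.startsWith("#x") || name.startsWith("#X")) {
            return parse(name.substring(2), 16);
        }
        if (name.startsWith("#")) {
            return parse(name.substring(1), 10);
        }
        switch (name) {
            case "lt":   return '<';
            case "gt":   return '>';
            case "amp":  return '&';
            case "quot": return '"';
            case "apos": return '\'';
            default:     return -1;
        }
    }

    private static int parse(String digits, int radix) {
        try {
            return Integer.parseInt(digits, radix);
        } catch (NumberFormatException e) {
            return -1;
        }
    }

    // Mirrors the order in the text: numeric check first, then the
    // document's own entities, then the predefined entities as a fallback.
    // Returns null when the reference is unknown.
    public String resolve(byte[] input, int offset, int count) {
        if (count > 0 && input[offset] == '#') {
            int c = resolveCharacterReferenceStandIn(input, offset, count);
            return c < 0 ? null : String.valueOf((char) c);
        }
        String name = new String(input, offset, count, StandardCharsets.US_ASCII);
        String replacement = documentEntities.get(name);
        if (replacement != null) {
            return replacement;
        }
        int c = resolveCharacterReferenceStandIn(input, offset, count);
        return c < 0 ? null : String.valueOf((char) c);
    }
}
```

In a real subclass, the body of resolve would sit inside an overridden foundReference, after disableReferenceResolution(true) has been called; how the substituted text is then passed on depends on the application.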

By default, each character reference is naturally reported via XmlTokenizer.foundCharacter(char), which, again, supersedes the foundReference notification.

Derived classes may override this method.

Overrides:
foundReference in class XmlTokenizer
Tags copied from class: XmlTokenizer
Parameters:
input - byte array containing the reference name
offset - position of the first character of the reference name in the array
count - number of bytes the reference name is made of
See Also:
XmlTokenizer.setStrictlyXml(boolean toSet)

getTagCode

protected int getTagCode(byte[] b,
                         int offset,
                         int count)
Description copied from class: XmlReader
Method to compute the tag code identifying a tag name.

This is the value passed to the ContentHandler when reporting a tag name.  Derived classes may override it.

Overrides:
getTagCode in class XmlReader
Tags copied from class: XmlReader
Parameters:
b - byte array containing the bytes to be hashed
offset - position of the first byte in the array
count - number of bytes to be hashed
Returns:
the corresponding hash code
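
To make the (b, offset, count) contract concrete, here is a minimal sketch of a tag-code function that hashes a byte slice.  The actual algorithm used by XmlReader and HtmlReader is not documented on this page; the class name TagCodeSketch, the multiplier 31, and the folding of ASCII upper case to lower case (so that BODY and body get the same code) are all assumptions made for illustration.

```java
public class TagCodeSketch {
    // Hypothetical tag-code computation: hashes the count bytes that start
    // at offset, folding ASCII letters to lower case first (assumption).
    static int tagCode(byte[] b, int offset, int count) {
        int hash = 0;
        for (int i = offset; i < offset + count; i++) {
            int c = b[i] & 0xFF;            // treat the byte as unsigned
            if (c >= 'A' && c <= 'Z') {
                c += 'a' - 'A';             // fold to lower case
            }
            hash = 31 * hash + c;
        }
        return hash;
    }
}
```

With this particular choice, the code of an ASCII tag name equals String.hashCode() of its lower-case form, which makes it easy to precompute constants for known tags.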

foundStartTagName

public final void foundStartTagName(byte[] buffer,
                                    int offset,
                                    int count)
Description copied from class: XmlTokenizer
Method called when a start-tag has been found.

Derived classes of XmlTokenizer may override this method; HtmlReader declares it final.

Overrides:
foundStartTagName in class XmlReader
Tags copied from class: XmlTokenizer
Parameters:
buffer - byte array containing the name of the tag that started
offset - position of the first character of the tag name in the array
count - number of bytes the tag name is made of

foundEndTagName

public final void foundEndTagName(byte[] buffer,
                                  int offset,
                                  int count)
Description copied from class: XmlTokenizer
Method called when an end-tag has been found.

Derived classes of XmlTokenizer may override this method; HtmlReader declares it final.

Overrides:
foundEndTagName in class XmlReader
Tags copied from class: XmlTokenizer
Parameters:
buffer - byte array containing the name of the tag that ended
offset - position of the first character of the tag name in the array
count - number of bytes the tag name is made of
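
Both tag-name callbacks describe the name as a slice of a larger byte array rather than as a String.  The following sketch shows one way such a slice could be turned into a String for debugging.  The class and method names are hypothetical, the code uses standard Java rather than the SuperWaba class library, and the ISO-8859-1 decoding (one byte per character) is an assumption, since the encoding of the buffer is not stated here.

```java
import java.nio.charset.StandardCharsets;

public class TagNameSliceSketch {
    // Decodes the count bytes starting at offset, without copying or
    // inspecting the rest of the buffer.
    static String sliceToString(byte[] buffer, int offset, int count) {
        return new String(buffer, offset, count, StandardCharsets.ISO_8859_1);
    }
}
```

For the input <p>text</p>, the start-tag name would be the slice (offset 1, count 1) and the end-tag name the slice (offset 9, count 1), assuming the offsets are relative to the start of that buffer.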