Module parser
This module provides a very lenient HTML/XML lexer. The SoupLexer class is
initialized with a listener object, which receives all low-level events
(such as starttag, endtag, and text). Listeners must implement the
ListenerInterface.
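The listener pattern described above can be sketched as follows. This is a toy stand-in, not SoupLexer: the class names ToyLexer and RecordingListener, and the handle_starttag/handle_endtag/handle_text method signatures, are hypothetical and only illustrate how a lexer pushes low-level events to a listener object without rejecting any input.

```python
import re


class RecordingListener:
    """Hypothetical listener: collects every event it receives, in order.

    The method names are illustrative, not the real ListenerInterface.
    """

    def __init__(self):
        self.events = []

    def handle_starttag(self, name, data):
        self.events.append(("starttag", name, data))

    def handle_endtag(self, name, data):
        self.events.append(("endtag", name, data))

    def handle_text(self, data):
        self.events.append(("text", data))


# Loosely matches "<name ...>" or "</name>"; everything else is text.
_TAG = re.compile(r"<(/?)([A-Za-z][^\s/>]*)[^>]*>")


class ToyLexer:
    """A deliberately lenient lexer: malformed markup just becomes text.

    Unlike a real streaming lexer, this sketch assumes the whole document
    arrives in a single feed() call.
    """

    def __init__(self, listener):
        self.listener = listener

    def feed(self, source):
        pos = 0
        for match in _TAG.finditer(source):
            if match.start() > pos:
                self.listener.handle_text(source[pos:match.start()])
            closing, name = match.group(1), match.group(2).lower()
            if closing:
                self.listener.handle_endtag(name, match.group(0))
            else:
                self.listener.handle_starttag(name, match.group(0))
            pos = match.end()
        if pos < len(source):
            self.listener.handle_text(source[pos:])


# Usage: tag names are normalized to lower case; raw data is kept as-is.
listener = RecordingListener()
ToyLexer(listener).feed("<p>Hi<br>there</P>")
```

Because such a lexer only reports what it actually sees, an omitted end tag in the input produces no endtag event at this level; supplying those is left to the layer above.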
On top of the lexer sits the SoupParser class, which itself implements the
ListenerInterface (the parser listens to the lexer). The parser adds HTML
semantics to the lexed data and passes the events on to a building listener
(BuildingListenerInterface). In addition to the events sent by the lexer, the
SoupParser class generates endtag events (with empty data arguments) for
implicitly closed elements. Furthermore, it knows about CDATA elements like
<script> or <style> and modifies the lexer state accordingly.
The actual semantics are provided by a DTD query class (implementing
DTDInterface).
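The division of labor between lexer, parser, DTD query object and building listener can be sketched as below. Everything here is hypothetical: ToyDTD, StubLexer, ToyParser, and the methods empty(), cdata(), nestable() and enter_cdata() are illustrative names, not the actual SoupParser or DTDInterface API, and the rule tables cover only a handful of elements.

```python
class ToyDTD:
    """Hypothetical DTD query object; method names are illustrative only."""

    EMPTY = {"br", "hr", "img", "input", "meta", "link"}
    CDATA = {"script", "style"}
    # Opening the key element implicitly closes an open element in its set.
    CLOSED_BY = {
        "p": {"p"},
        "li": {"li"},
        "td": {"td", "th"},
        "th": {"td", "th"},
        "tr": {"tr", "td", "th"},
    }

    def empty(self, name):
        return name in self.EMPTY

    def cdata(self, name):
        return name in self.CDATA

    def nestable(self, outer, inner):
        """True if `inner` may appear directly inside an open `outer`."""
        return outer not in self.CLOSED_BY.get(inner, ())


class StubLexer:
    """Stands in for a real lexer; it only records CDATA state switches."""

    def __init__(self):
        self.cdata_element = None

    def enter_cdata(self, name):
        self.cdata_element = name


class ToyParser:
    """Listens to lexer events, applies DTD rules, and forwards the result
    to a builder (here simply a list of event tuples)."""

    def __init__(self, lexer, builder, dtd):
        self.lexer, self.builder, self.dtd = lexer, builder, dtd
        self.stack = []  # currently open elements, innermost last

    def handle_starttag(self, name, data):
        # Synthesize endtags (with empty data) for implicitly closed elements.
        while self.stack and not self.dtd.nestable(self.stack[-1], name):
            self.builder.append(("endtag", self.stack.pop(), ""))
        self.builder.append(("starttag", name, data))
        if self.dtd.empty(name):
            return  # empty elements never stay open
        self.stack.append(name)
        if self.dtd.cdata(name):
            # Ask the lexer to treat everything up to </name> as raw text.
            self.lexer.enter_cdata(name)

    def handle_endtag(self, name, data):
        if name in self.stack:
            # Close any still-open children implicitly before `name` itself.
            while self.stack[-1] != name:
                self.builder.append(("endtag", self.stack.pop(), ""))
            self.stack.pop()
        # Stray endtags are passed through unchanged.
        self.builder.append(("endtag", name, data))

    def handle_text(self, data):
        self.builder.append(("text", data))


# Usage: feed the parser events by hand, as a lexer would.
lexer, events = StubLexer(), []
parser = ToyParser(lexer, events, ToyDTD())
parser.handle_starttag("ul", "<ul>")
parser.handle_starttag("li", "<li>")
parser.handle_text("one")
parser.handle_starttag("li", "<li>")  # implicitly closes the first <li>
parser.handle_text("two")
parser.handle_endtag("ul", "</ul>")   # implicitly closes the second <li>
parser.handle_starttag("script", "<script>")
```

Keeping the element rules in a separate DTD object means the parser's stack logic stays unchanged when the rules change; passing a different DTD object changes which endtags are synthesized and which elements switch the lexer into CDATA mode.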
SoupLexer
(X)HTML Tagsoup Lexer
DEFAULT_LEXER
(X)HTML Tagsoup Lexer
SoupParser
By design, the parser is a tagsoup parser, so that it can process most of
the "HTML" found in the wild.
DEFAULT_PARSER
By design, the parser is a tagsoup parser, so that it can process most of
the "HTML" found in the wild.