
Module parser

This module provides a very lenient HTML/XML lexer. The SoupLexer class is initialized with a listener object, which receives all low-level events (such as starttag, endtag, and text). Listeners must implement the ListenerInterface.
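
The lexer/listener relationship described above can be illustrated with a minimal, self-contained sketch. It does not use the real SoupLexer; ToyLexer, RecordingListener, and the method signatures shown are assumptions made for illustration only, and the actual ListenerInterface may define more events and different arguments.

```python
import re


class RecordingListener:
    """Hypothetical listener: records every low-level event it receives."""

    def __init__(self):
        self.events = []

    def handle_starttag(self, name, data):
        self.events.append(("starttag", name, data))

    def handle_endtag(self, name, data):
        self.events.append(("endtag", name, data))

    def handle_text(self, data):
        self.events.append(("text", data))


class ToyLexer:
    """Illustrative lenient lexer (not the real SoupLexer)."""

    # Anything that does not look like a tag is passed through as text.
    _TAG = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)[^<>]*>")

    def __init__(self, listener):
        self._listener = listener

    def feed(self, source):
        pos = 0
        for match in self._TAG.finditer(source):
            if match.start() > pos:
                self._listener.handle_text(source[pos:match.start()])
            closing, name = match.group(1), match.group(2).lower()
            if closing:
                self._listener.handle_endtag(name, match.group(0))
            else:
                self._listener.handle_starttag(name, match.group(0))
            pos = match.end()
        if pos < len(source):
            self._listener.handle_text(source[pos:])


listener = RecordingListener()
ToyLexer(listener).feed("<p>Hi <b>there</p>")
for event in listener.events:
    print(event)
```

Note that the sketch reports the unclosed <b> exactly as it appears in the input; deciding what that means is left to whatever listens to the lexer.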

On top of the lexer there is the SoupParser class, which itself implements the ListenerInterface (the parser listens to the lexer). The parser adds HTML semantics to the lexed data and passes the events on to a building listener (BuildingListenerInterface). In addition to the events sent by the lexer, the SoupParser class generates endtag events (with empty data arguments) for implicitly closed elements. Furthermore, it knows about CDATA elements like <script> or <style> and modifies the lexer state accordingly.

The actual semantics are provided by a DTD query class (implementing DTDInterface).
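
How a parser might combine lexer events with a DTD query to generate endtag events for implicitly closed elements can be sketched as follows. This is a simplified illustration under stated assumptions, not the SoupParser implementation: ToyDTD, its nestable method, ToyParser, and RecordingBuilder are hypothetical names, and dropping stray end tags is a simplification chosen for this sketch.

```python
class ToyDTD:
    """Hypothetical DTD query: may `name` appear directly inside `outer`?"""

    _FORBIDDEN = {("p", "p"), ("li", "li")}

    def nestable(self, outer, name):
        return (outer, name) not in self._FORBIDDEN


class RecordingBuilder:
    """Hypothetical building listener: records the events it is passed."""

    def __init__(self):
        self.events = []

    def handle_starttag(self, name, data):
        self.events.append(("starttag", name, data))

    def handle_endtag(self, name, data):
        self.events.append(("endtag", name, data))

    def handle_text(self, data):
        self.events.append(("text", data))


class ToyParser:
    """Illustrative parser: listens to lexer events, forwards to a builder."""

    def __init__(self, builder, dtd):
        self._builder = builder
        self._dtd = dtd
        self._stack = []  # names of currently open elements

    def handle_starttag(self, name, data):
        # Implicitly close open elements that must not contain the new one.
        while self._stack and not self._dtd.nestable(self._stack[-1], name):
            self._builder.handle_endtag(self._stack.pop(), "")
        self._stack.append(name)
        self._builder.handle_starttag(name, data)

    def handle_endtag(self, name, data):
        if name not in self._stack:
            return  # stray end tag: simply dropped in this sketch
        # Implicitly close everything still open inside the element.
        while self._stack[-1] != name:
            self._builder.handle_endtag(self._stack.pop(), "")
        self._stack.pop()
        self._builder.handle_endtag(name, data)

    def handle_text(self, data):
        self._builder.handle_text(data)


builder = RecordingBuilder()
parser = ToyParser(builder, ToyDTD())
# Events as a lexer might report them for "<p>one<p><b>two</p>".
parser.handle_starttag("p", "<p>")
parser.handle_text("one")
parser.handle_starttag("p", "<p>")
parser.handle_starttag("b", "<b>")
parser.handle_text("two")
parser.handle_endtag("p", "</p>")
for event in builder.events:
    print(event)
```

The generated endtag events carry an empty data argument, which lets the building listener distinguish them from end tags that were literally present in the input.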


Author: André Malo

Classes
  SoupLexer
(X)HTML Tagsoup Lexer
  DEFAULT_LEXER
(X)HTML Tagsoup Lexer
  SoupParser
The parser is actually a tagsoup parser by design in order to process most of the "HTML" that can be found out there.
  DEFAULT_PARSER
The parser is actually a tagsoup parser by design in order to process most of the "HTML" that can be found out there.
Variables
  __package__ = 'tdi.markup.soup'