- HTMLParser.HTMLParser(markupbase.ParserBase)
  - AdvancedHTMLFormatter
    - AdvancedHTMLMiniFormatter
      - AdvancedHTMLSlimTagMiniFormatter
    - AdvancedHTMLSlimTagFormatter
class AdvancedHTMLFormatter(HTMLParser.HTMLParser)

A formatter that pretty-prints HTML. It does not interpret CSS, so text that is preformatted only through CSS rules will still be reformatted.
It does, however, understand "pre", "code" and "script" tags and will not try to format their contents.

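The behaviour described above (indent everything except the contents of "pre", "code" and "script") can be sketched on top of the standard library parser. This is a simplified illustration, not the library's implementation: the class name SketchFormatter and its get_html method are hypothetical, it uses Python 3's html.parser rather than the Python 2 HTMLParser module shown in this listing, and it ignores comments and doctypes and does not re-escape character references.

```python
from html.parser import HTMLParser

# Tags whose contents are copied through unchanged (from the class description).
PRESERVE_TAGS = ("pre", "code", "script")


class SketchFormatter(HTMLParser):
    """Hypothetical, simplified pretty-printer for illustration only."""

    def __init__(self, indent="    "):
        super().__init__()
        self.indent = indent
        self.depth = 0      # current nesting level
        self.preserve = 0   # greater than 0 while inside a preserved tag
        self.parts = []     # output fragments, joined by get_html()

    def _prefix(self):
        # Inside a preserved tag nothing is added; elsewhere start a new indented line.
        return "" if self.preserve else "\n" + self.indent * self.depth

    def handle_starttag(self, tag, attrs):
        self.parts.append(self._prefix() + self.get_starttag_text())
        if tag in PRESERVE_TAGS:
            self.preserve += 1
        self.depth += 1

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags do not change the nesting level.
        self.parts.append(self._prefix() + self.get_starttag_text())

    def handle_endtag(self, tag):
        self.depth = max(self.depth - 1, 0)
        if self.preserve:
            # Closing tag inside (or of) a preserved region: no added whitespace.
            if tag in PRESERVE_TAGS:
                self.preserve -= 1
            self.parts.append("</%s>" % tag)
        else:
            self.parts.append(self._prefix() + "</%s>" % tag)

    def handle_data(self, data):
        if self.preserve:
            self.parts.append(data)  # keep whitespace exactly as written
        elif data.strip():
            # Collapse runs of whitespace and put the text on its own line.
            self.parts.append(self._prefix() + " ".join(data.split()))

    def get_html(self):
        return "".join(self.parts).lstrip("\n")


formatter = SketchFormatter()
formatter.feed("<div><p>Hi   there</p><pre>  a\n   b</pre></div>")
formatter.close()
pretty = formatter.get_html()
print(pretty)
```

In this sketch the text inside "p" is re-indented and its inner whitespace collapsed, while the text inside "pre" keeps its original spacing and line break.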
- Method resolution order:
- AdvancedHTMLFormatter
- HTMLParser.HTMLParser
- markupbase.ParserBase
Methods defined here:
- __init__(self, indent='    ', encoding='utf-8')
- Create a pretty formatter.
@param indent <str/int>, Default '    ' [4 spaces] - Either a space/tab/newline that represents one level of indent, or an integer to use that number of spaces
@param encoding <str/None>, Default 'utf-8' - Encoding to use for the document. Pass None to leave the encoding untouched.
- feed(self, contents)
- feed - Load contents
@param contents - HTML contents
- getHTML(self)
- getHTML - Get the full HTML as contained within this tree, converted to valid XHTML
@returns - String
- getRoot(self)
- getRoot - Returns the root tag
@return - AdvancedTag at the root. If the document has multiple root nodes, this is a "holder" tag whose tagName is constants.INVISIBLE_ROOT_TAG
- getRootNodes(self)
- getRootNodes - Gets all objects at the "root" (first level; no parent). Use this if you may have multiple roots (not children of <html>),
for example in an AJAX request where <html> may not be your root.
Note: If there are multiple root nodes (i.e. no <html> at the top), getRoot returns a special holder tag. This method handles that case
automatically and returns all root nodes.
@return list<AdvancedTag> - A list of the AdvancedTags at the root level of the tree.
- handle_charref(self, charRef)
- Internal for parsing
- handle_comment(self, comment)
- Internal for parsing
- handle_data(self, data)
- handle_data - Internal for parsing
- handle_decl(self, decl)
- Internal for parsing
- handle_endtag(self, tagName)
- handle_endtag - Internal for parsing
- handle_entityref(self, entity)
- Internal for parsing
- handle_startendtag(self, tagName, attributeList)
- handle_startendtag - Internal for parsing
- handle_starttag(self, tagName, attributeList, isSelfClosing=False)
- handle_starttag - Internal for parsing
- parseFile(self, filename)
- parseFile - Parses a file and creates the DOM tree and indexes
@param filename <str/file> - A filename string or a file object. A file object is not closed by this method; the caller must close it.
- parseStr(self, html)
- parseStr - Parses a string and creates the DOM tree and indexes.
@param html <str> - valid HTML
- setRoot(self, root)
- setRoot - Sets the root node, and reprocesses the indexes
@param root - AdvancedTag to be new root
- unknown_decl(self, decl)
- Internal for parsing
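Two of the parameter conventions documented above can be sketched with small helpers: an indent given as either a string or an integer, and a source given as either a filename or an already open file object that is left open for the caller. The helper names normalize_indent and read_source are hypothetical and are not part of the library; this only illustrates the documented conventions.

```python
import io


def normalize_indent(indent):
    """Return the string used for one indent level (hypothetical helper)."""
    if isinstance(indent, int):
        return " " * indent  # an integer means "this many spaces"
    return indent            # a string is used as-is


def read_source(source, encoding="utf-8"):
    """Read HTML from a filename or an open file object (hypothetical helper)."""
    if isinstance(source, str):
        # This function opened the file, so it also closes it.
        with open(source, encoding=encoding) as handle:
            return handle.read()
    # The caller opened the file object, so the caller closes it.
    return source.read()


stream = io.StringIO("<p>hello</p>")
html = read_source(stream)
print(repr(normalize_indent(2)), repr(normalize_indent("\t")), stream.closed)
```

Leaving a passed-in file object open keeps ownership with the code that opened it, which matches the note on parseFile.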
Methods inherited from HTMLParser.HTMLParser:
- check_for_whole_start_tag(self, i)
- # Internal -- check to see if we have a complete starttag; return end
# or -1 if incomplete.
- clear_cdata_mode(self)
- close(self)
- Handle any buffered data.
- error(self, message)
- get_starttag_text(self)
- Return full source of start tag: '<...>'.
- goahead(self, end)
- # Internal -- handle data as far as reasonable. May leave state
# and data to be processed by a subsequent call. If 'end' is
# true, force handling all data as if followed by EOF marker.
- handle_pi(self, data)
- # Overridable -- handle processing instruction
- parse_bogus_comment(self, i, report=1)
- # Internal -- parse bogus comment, return length or -1 if not terminated
# see http://www.w3.org/TR/html5/tokenization.html#bogus-comment-state
- parse_endtag(self, i)
- # Internal -- parse endtag, return end or -1 if incomplete
- parse_html_declaration(self, i)
- # Internal -- parse html declarations, return length or -1 if not terminated
# See w3.org/TR/html5/tokenization.html#markup-declaration-open-state
# See also parse_declaration in _markupbase
- parse_pi(self, i)
- # Internal -- parse processing instr, return end or -1 if not terminated
- parse_starttag(self, i)
- # Internal -- handle starttag, return end or -1 if not terminated
- reset(self)
- Reset this instance. Loses all unprocessed data.
- set_cdata_mode(self, elem)
- unescape(self, s)
Data and other attributes inherited from HTMLParser.HTMLParser:
- CDATA_CONTENT_ELEMENTS = ('script', 'style')
- entitydefs = None
Methods inherited from markupbase.ParserBase:
- getpos(self)
- Return current line number and offset.
- parse_comment(self, i, report=1)
- # Internal -- parse comment, return length or -1 if not terminated
- parse_declaration(self, i)
- # Internal -- parse declaration (for use by subclasses).
- parse_marked_section(self, i, report=1)
- # Internal -- parse a marked section
# Override this to handle MS-word extension syntax <![if word]>content<![endif]>
- updatepos(self, i, j)
- # Internal -- update line number and offset. This should be
# called for each piece of data exactly once, in order -- in other
# words the concatenation of all the input strings to this
# function should be exactly the entire input.

class AdvancedHTMLMiniFormatter(AdvancedHTMLFormatter)

AdvancedHTMLMiniFormatter - A formatter that reformats a document, keeping only functional
whitespace and removing all indentation and nesting spaces.

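A rough sketch of "mini" formatting, again built on Python 3's standard html.parser rather than on the library itself. The text does not define "functional whitespace", so this sketch assumes an interpretation: whitespace-only text between tags is dropped, runs of whitespace inside text collapse to a single space, and the contents of "pre", "code" and "script" are kept verbatim. The class name SketchMiniFormatter is hypothetical.

```python
import re
from html.parser import HTMLParser


class SketchMiniFormatter(HTMLParser):
    """Hypothetical minifier; an assumed reading of "functional whitespace"."""

    PRESERVE_TAGS = ("pre", "code", "script")

    def __init__(self):
        super().__init__()
        self.preserve = 0   # greater than 0 while inside a preserved tag
        self.parts = []     # output fragments

    def handle_starttag(self, tag, attrs):
        self.parts.append(self.get_starttag_text())
        if tag in self.PRESERVE_TAGS:
            self.preserve += 1

    def handle_startendtag(self, tag, attrs):
        # Emit self-closing tags once, exactly as written.
        self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in self.PRESERVE_TAGS and self.preserve:
            self.preserve -= 1
        self.parts.append("</%s>" % tag)

    def handle_data(self, data):
        if self.preserve:
            self.parts.append(data)                     # verbatim
        elif data.strip():
            self.parts.append(re.sub(r"\s+", " ", data))  # collapse whitespace runs
        # Whitespace-only text between tags is dropped.

    def get_html(self):
        return "".join(self.parts)


mini_formatter = SketchMiniFormatter()
mini_formatter.feed("<ul>\n    <li>one   two</li>\n</ul>\n<pre>  keep\n  this </pre>")
mini_formatter.close()
mini = mini_formatter.get_html()
print(repr(mini))
```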
- Method resolution order:
- AdvancedHTMLMiniFormatter
- AdvancedHTMLFormatter
- HTMLParser.HTMLParser
- markupbase.ParserBase
Methods defined here:
- __init__(self, encoding='utf-8')
- Create a mini formatter.
@param encoding <str/None>, Default 'utf-8' - Encoding to use for the document. Pass None to leave the encoding untouched.
Methods inherited from AdvancedHTMLFormatter:
- feed(self, contents)
- feed - Load contents
@param contents - HTML contents
- getHTML(self)
- getHTML - Get the full HTML as contained within this tree, converted to valid XHTML
@returns - String
- getRoot(self)
- getRoot - Returns the root tag
@return - AdvancedTag at the root. If the document has multiple root nodes, this is a "holder" tag whose tagName is constants.INVISIBLE_ROOT_TAG
- getRootNodes(self)
- getRootNodes - Gets all objects at the "root" (first level; no parent). Use this if you may have multiple roots (not children of <html>),
for example in an AJAX request where <html> may not be your root.
Note: If there are multiple root nodes (i.e. no <html> at the top), getRoot returns a special holder tag. This method handles that case
automatically and returns all root nodes.
@return list<AdvancedTag> - A list of the AdvancedTags at the root level of the tree.
- handle_charref(self, charRef)
- Internal for parsing
- handle_comment(self, comment)
- Internal for parsing
- handle_data(self, data)
- handle_data - Internal for parsing
- handle_decl(self, decl)
- Internal for parsing
- handle_endtag(self, tagName)
- handle_endtag - Internal for parsing
- handle_entityref(self, entity)
- Internal for parsing
- handle_startendtag(self, tagName, attributeList)
- handle_startendtag - Internal for parsing
- handle_starttag(self, tagName, attributeList, isSelfClosing=False)
- handle_starttag - Internal for parsing
- parseFile(self, filename)
- parseFile - Parses a file and creates the DOM tree and indexes
@param filename <str/file> - A filename string or a file object. A file object is not closed by this method; the caller must close it.
- parseStr(self, html)
- parseStr - Parses a string and creates the DOM tree and indexes.
@param html <str> - valid HTML
- setRoot(self, root)
- setRoot - Sets the root node, and reprocesses the indexes
@param root - AdvancedTag to be new root
- unknown_decl(self, decl)
- Internal for parsing
Methods inherited from HTMLParser.HTMLParser:
- check_for_whole_start_tag(self, i)
- # Internal -- check to see if we have a complete starttag; return end
# or -1 if incomplete.
- clear_cdata_mode(self)
- close(self)
- Handle any buffered data.
- error(self, message)
- get_starttag_text(self)
- Return full source of start tag: '<...>'.
- goahead(self, end)
- # Internal -- handle data as far as reasonable. May leave state
# and data to be processed by a subsequent call. If 'end' is
# true, force handling all data as if followed by EOF marker.
- handle_pi(self, data)
- # Overridable -- handle processing instruction
- parse_bogus_comment(self, i, report=1)
- # Internal -- parse bogus comment, return length or -1 if not terminated
# see http://www.w3.org/TR/html5/tokenization.html#bogus-comment-state
- parse_endtag(self, i)
- # Internal -- parse endtag, return end or -1 if incomplete
- parse_html_declaration(self, i)
- # Internal -- parse html declarations, return length or -1 if not terminated
# See w3.org/TR/html5/tokenization.html#markup-declaration-open-state
# See also parse_declaration in _markupbase
- parse_pi(self, i)
- # Internal -- parse processing instr, return end or -1 if not terminated
- parse_starttag(self, i)
- # Internal -- handle starttag, return end or -1 if not terminated
- reset(self)
- Reset this instance. Loses all unprocessed data.
- set_cdata_mode(self, elem)
- unescape(self, s)
Data and other attributes inherited from HTMLParser.HTMLParser:
- CDATA_CONTENT_ELEMENTS = ('script', 'style')
- entitydefs = None
Methods inherited from markupbase.ParserBase:
- getpos(self)
- Return current line number and offset.
- parse_comment(self, i, report=1)
- # Internal -- parse comment, return length or -1 if not terminated
- parse_declaration(self, i)
- # Internal -- parse declaration (for use by subclasses).
- parse_marked_section(self, i, report=1)
- # Internal -- parse a marked section
# Override this to handle MS-word extension syntax <![if word]>content<![endif]>
- updatepos(self, i, j)
- # Internal -- update line number and offset. This should be
# called for each piece of data exactly once, in order -- in other
# words the concatenation of all the input strings to this
# function should be exactly the entire input.

class AdvancedHTMLSlimTagFormatter(AdvancedHTMLFormatter)

AdvancedHTMLSlimTagFormatter - Formats HTML with slim start tags,
which may break some xhtml-compatible parsers.
For example, <span id="abc" > will become <span id="abc">.
The remainder will be pretty-printed. For mini-printing, @see AdvancedHTMLSlimTagMiniFormatter
If slimSelfClosing=True is passed to __init__, <br /> will become <br/> as well.

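The difference between regular and "slim" tags can be sketched as a small rendering function. This is an illustration only: render_start_tag and its parameter names are hypothetical, and only the two documented transformations (<span id="abc" > to <span id="abc">, and <br /> to <br/>) are taken from the text. How the library pads tags that have no attributes is not stated here, so that case is an assumption of this sketch.

```python
from html import escape


def render_start_tag(tag_name, attributes, is_self_closing=False,
                     slim=False, slim_self_closing=False):
    """Render a start tag from (name, value) pairs (hypothetical helper)."""
    attr_text = "".join(
        ' %s="%s"' % (name, escape(value or "", quote=True))
        for name, value in attributes
    )
    if is_self_closing:
        # Slim self-closing tags drop the space before "/>".
        closer = "/>" if slim_self_closing else " />"
    else:
        # Slim start tags drop the space before ">".
        closer = ">" if slim else " >"
    return "<" + tag_name + attr_text + closer


regular_span = render_start_tag("span", [("id", "abc")])
slim_span = render_start_tag("span", [("id", "abc")], slim=True)
regular_br = render_start_tag("br", [], is_self_closing=True, slim=True)
slim_br = render_start_tag("br", [], is_self_closing=True, slim=True,
                           slim_self_closing=True)
print(regular_span, slim_span, regular_br, slim_br)
```

Keeping the self-closing behaviour behind its own flag mirrors the slimSelfClosing parameter: slim start tags alone leave <br /> unchanged.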
- Method resolution order:
- AdvancedHTMLSlimTagFormatter
- AdvancedHTMLFormatter
- HTMLParser.HTMLParser
- markupbase.ParserBase
Methods defined here:
- __init__(self, indent='    ', encoding='utf-8', slimSelfClosing=False)
- __init__ - Construct an AdvancedHTMLSlimTagFormatter
@see AdvancedHTMLFormatter
@param slimSelfClosing <bool> Default False - If True, will use slim self-closing tags,
e.g. <br /> becomes <br/>
- handle_starttag = handle_starttag_slim(self, tagName, attributeList, isSelfClosing=False)
- handle_starttag_slim - Handles parsing a start tag, but with "slim" start tags
@see AdvancedHTMLFormatter.handle_starttag
Methods inherited from AdvancedHTMLFormatter:
- feed(self, contents)
- feed - Load contents
@param contents - HTML contents
- getHTML(self)
- getHTML - Get the full HTML as contained within this tree, converted to valid XHTML
@returns - String
- getRoot(self)
- getRoot - Returns the root tag
@return - AdvancedTag at the root. If the document has multiple root nodes, this is a "holder" tag whose tagName is constants.INVISIBLE_ROOT_TAG
- getRootNodes(self)
- getRootNodes - Gets all objects at the "root" (first level; no parent). Use this if you may have multiple roots (not children of <html>),
for example in an AJAX request where <html> may not be your root.
Note: If there are multiple root nodes (i.e. no <html> at the top), getRoot returns a special holder tag. This method handles that case
automatically and returns all root nodes.
@return list<AdvancedTag> - A list of the AdvancedTags at the root level of the tree.
- handle_charref(self, charRef)
- Internal for parsing
- handle_comment(self, comment)
- Internal for parsing
- handle_data(self, data)
- handle_data - Internal for parsing
- handle_decl(self, decl)
- Internal for parsing
- handle_endtag(self, tagName)
- handle_endtag - Internal for parsing
- handle_entityref(self, entity)
- Internal for parsing
- handle_startendtag(self, tagName, attributeList)
- handle_startendtag - Internal for parsing
- parseFile(self, filename)
- parseFile - Parses a file and creates the DOM tree and indexes
@param filename <str/file> - A filename string or a file object. A file object is not closed by this method; the caller must close it.
- parseStr(self, html)
- parseStr - Parses a string and creates the DOM tree and indexes.
@param html <str> - valid HTML
- setRoot(self, root)
- setRoot - Sets the root node, and reprocesses the indexes
@param root - AdvancedTag to be new root
- unknown_decl(self, decl)
- Internal for parsing
Methods inherited from HTMLParser.HTMLParser:
- check_for_whole_start_tag(self, i)
- # Internal -- check to see if we have a complete starttag; return end
# or -1 if incomplete.
- clear_cdata_mode(self)
- close(self)
- Handle any buffered data.
- error(self, message)
- get_starttag_text(self)
- Return full source of start tag: '<...>'.
- goahead(self, end)
- # Internal -- handle data as far as reasonable. May leave state
# and data to be processed by a subsequent call. If 'end' is
# true, force handling all data as if followed by EOF marker.
- handle_pi(self, data)
- # Overridable -- handle processing instruction
- parse_bogus_comment(self, i, report=1)
- # Internal -- parse bogus comment, return length or -1 if not terminated
# see http://www.w3.org/TR/html5/tokenization.html#bogus-comment-state
- parse_endtag(self, i)
- # Internal -- parse endtag, return end or -1 if incomplete
- parse_html_declaration(self, i)
- # Internal -- parse html declarations, return length or -1 if not terminated
# See w3.org/TR/html5/tokenization.html#markup-declaration-open-state
# See also parse_declaration in _markupbase
- parse_pi(self, i)
- # Internal -- parse processing instr, return end or -1 if not terminated
- parse_starttag(self, i)
- # Internal -- handle starttag, return end or -1 if not terminated
- reset(self)
- Reset this instance. Loses all unprocessed data.
- set_cdata_mode(self, elem)
- unescape(self, s)
Data and other attributes inherited from HTMLParser.HTMLParser:
- CDATA_CONTENT_ELEMENTS = ('script', 'style')
- entitydefs = None
Methods inherited from markupbase.ParserBase:
- getpos(self)
- Return current line number and offset.
- parse_comment(self, i, report=1)
- # Internal -- parse comment, return length or -1 if not terminated
- parse_declaration(self, i)
- # Internal -- parse declaration (for use by subclasses).
- parse_marked_section(self, i, report=1)
- # Internal -- parse a marked section
# Override this to handle MS-word extension syntax <![if word]>content<![endif]>
- updatepos(self, i, j)
- # Internal -- update line number and offset. This should be
# called for each piece of data exactly once, in order -- in other
# words the concatenation of all the input strings to this
# function should be exactly the entire input.

class AdvancedHTMLSlimTagMiniFormatter(AdvancedHTMLMiniFormatter)

AdvancedHTMLSlimTagMiniFormatter - A "mini" formatter that
removes all non-functional whitespace (including all indentations)
Also uses "slim" start tags, @see AdvancedHTMLSlimTagFormatter for more info

- Method resolution order:
- AdvancedHTMLSlimTagMiniFormatter
- AdvancedHTMLMiniFormatter
- AdvancedHTMLFormatter
- HTMLParser.HTMLParser
- markupbase.ParserBase
Methods defined here:
- __init__(self, encoding='utf-8', slimSelfClosing=False)
- __init__ - Create an AdvancedHTMLSlimTagMiniFormatter
@see AdvancedHTMLMiniFormatter
@param slimSelfClosing <bool> Default False - If True, will use slim self-closing tags,
e.g. <br /> becomes <br/>
- handle_starttag = handle_starttag_slim(self, tagName, attributeList, isSelfClosing=False)
- handle_starttag_slim - Handles parsing a start tag, but with "slim" start tags
@see AdvancedHTMLFormatter.handle_starttag
Methods inherited from AdvancedHTMLFormatter:
- feed(self, contents)
- feed - Load contents
@param contents - HTML contents
- getHTML(self)
- getHTML - Get the full HTML as contained within this tree, converted to valid XHTML
@returns - String
- getRoot(self)
- getRoot - Returns the root tag
@return - AdvancedTag at the root. If the document has multiple root nodes, this is a "holder" tag whose tagName is constants.INVISIBLE_ROOT_TAG
- getRootNodes(self)
- getRootNodes - Gets all objects at the "root" (first level; no parent). Use this if you may have multiple roots (not children of <html>),
for example in an AJAX request where <html> may not be your root.
Note: If there are multiple root nodes (i.e. no <html> at the top), getRoot returns a special holder tag. This method handles that case
automatically and returns all root nodes.
@return list<AdvancedTag> - A list of the AdvancedTags at the root level of the tree.
- handle_charref(self, charRef)
- Internal for parsing
- handle_comment(self, comment)
- Internal for parsing
- handle_data(self, data)
- handle_data - Internal for parsing
- handle_decl(self, decl)
- Internal for parsing
- handle_endtag(self, tagName)
- handle_endtag - Internal for parsing
- handle_entityref(self, entity)
- Internal for parsing
- handle_startendtag(self, tagName, attributeList)
- handle_startendtag - Internal for parsing
- parseFile(self, filename)
- parseFile - Parses a file and creates the DOM tree and indexes
@param filename <str/file> - A filename string or a file object. A file object is not closed by this method; the caller must close it.
- parseStr(self, html)
- parseStr - Parses a string and creates the DOM tree and indexes.
@param html <str> - valid HTML
- setRoot(self, root)
- setRoot - Sets the root node, and reprocesses the indexes
@param root - AdvancedTag to be new root
- unknown_decl(self, decl)
- Internal for parsing
Methods inherited from HTMLParser.HTMLParser:
- check_for_whole_start_tag(self, i)
- # Internal -- check to see if we have a complete starttag; return end
# or -1 if incomplete.
- clear_cdata_mode(self)
- close(self)
- Handle any buffered data.
- error(self, message)
- get_starttag_text(self)
- Return full source of start tag: '<...>'.
- goahead(self, end)
- # Internal -- handle data as far as reasonable. May leave state
# and data to be processed by a subsequent call. If 'end' is
# true, force handling all data as if followed by EOF marker.
- handle_pi(self, data)
- # Overridable -- handle processing instruction
- parse_bogus_comment(self, i, report=1)
- # Internal -- parse bogus comment, return length or -1 if not terminated
# see http://www.w3.org/TR/html5/tokenization.html#bogus-comment-state
- parse_endtag(self, i)
- # Internal -- parse endtag, return end or -1 if incomplete
- parse_html_declaration(self, i)
- # Internal -- parse html declarations, return length or -1 if not terminated
# See w3.org/TR/html5/tokenization.html#markup-declaration-open-state
# See also parse_declaration in _markupbase
- parse_pi(self, i)
- # Internal -- parse processing instr, return end or -1 if not terminated
- parse_starttag(self, i)
- # Internal -- handle starttag, return end or -1 if not terminated
- reset(self)
- Reset this instance. Loses all unprocessed data.
- set_cdata_mode(self, elem)
- unescape(self, s)
Data and other attributes inherited from HTMLParser.HTMLParser:
- CDATA_CONTENT_ELEMENTS = ('script', 'style')
- entitydefs = None
Methods inherited from markupbase.ParserBase:
- getpos(self)
- Return current line number and offset.
- parse_comment(self, i, report=1)
- # Internal -- parse comment, return length or -1 if not terminated
- parse_declaration(self, i)
- # Internal -- parse declaration (for use by subclasses).
- parse_marked_section(self, i, report=1)
- # Internal -- parse a marked section
# Override this to handle MS-word extension syntax <![if word]>content<![endif]>
- updatepos(self, i, j)
- # Internal -- update line number and offset. This should be
# called for each piece of data exactly once, in order -- in other
# words the concatenation of all the input strings to this
# function should be exactly the entire input.