HTML5Lib is an HTML parser that follows the official WHATWG HTML5 specification.
The parser is designed to handle all flavours of HTML and parses invalid documents using well-defined error handling rules compatible with the behaviour of major desktop web browsers.
The parsed output is placed in a tree structure.
It supports output to ElementTree, DOM and lxml tree formats as well as a simple custom format.
HTML5Lib is packaged with distutils.
HTML5Lib is also available in Ruby, Python and PHP versions.
What's New in This Release:
· Parses valid and invalid HTML documents to a tree
· Support for minidom, ElementTree (including cElementTree and lxml.etree), BeautifulSoup (deprecated) and custom simpletree output formats
· DOM to SAX converter
· Reports parse errors
· Character encoding detection
· Filtering and serializing of trees
· HTML+CSS sanitizer
· Many unit tests
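Character encoding detection in the HTML5 parsing algorithm begins by checking the first bytes of the input for a byte order mark (BOM), and only then falls back to other sources such as a <meta> prescan. The sketch below covers just that first BOM step. It uses the Python standard library, not html5lib's own code. The function names `sniff_bom` and `decode_html_bytes` are hypothetical, and the windows-1252 fallback is an assumption for illustration: the real fallback depends on the user's locale.

```python
from typing import Optional, Tuple

# The three byte order marks the HTML encoding sniffing step recognises.
_BOMS = (
    (b"\xef\xbb\xbf", "utf-8"),
    (b"\xfe\xff", "utf-16-be"),
    (b"\xff\xfe", "utf-16-le"),
)


def sniff_bom(data: bytes) -> Optional[Tuple[str, int]]:
    """Return (encoding, bom_length) if data starts with a known BOM, else None."""
    for bom, encoding in _BOMS:
        if data.startswith(bom):
            return encoding, len(bom)
    return None


def decode_html_bytes(data: bytes, fallback: str = "windows-1252") -> str:
    """Hypothetical helper: decode using a BOM if present, else a fallback.

    The <meta> prescan and other sniffing steps are deliberately omitted.
    """
    sniffed = sniff_bom(data)
    if sniffed is not None:
        encoding, skip = sniffed
        # Strip the BOM itself before decoding the rest of the document.
        return data[skip:].decode(encoding)
    # No BOM: assume the fallback and replace any undecodable bytes.
    return data.decode(fallback, errors="replace")
```

For example, `decode_html_bytes(b"\xef\xbb\xbf<p>caf\xc3\xa9</p>")` detects the UTF-8 BOM and returns `"<p>café</p>"`. A BOM takes precedence over any declared encoding, which is why it is checked first.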

Via: HTML5Lib (Python) 0.95 / 1.0b1