Apache Nutch builds on Lucene Java, adding web-specific components such as a crawler, a link-graph database, and parsers for HTML and other document formats.
Nutch can run on a single machine, but works better in Hadoop clusters.
Plugins are available to extend its functionality.
What's New in This Release:
· Ensure duplicate tags do not exist in microformat-reltag tag set.
· A better fallback value for the date field.
· Get rid of the dreaded.
· Upgrade to Hadoop 1.2.0.
· Upgrade to Tika 1.3.
Via: Apache Nutch 2.2.1