05 July, 2013

Apache Nutch 2.2.1


Developer: Apache Software Foundation
Website: nutch.apache.org
License / Price: Apache License
Platforms: Windows / Linux / Mac OS / BSD / Solaris
Databases: N/A
Language: Java
Last Updated: July 5th, 2013, 17:59 GMT
Category: Search Engines

Apache Nutch builds on Lucene Java, adding web-specific components such as a crawler, a link-graph database, and parsers for HTML and other document formats.
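Nutch's batch crawl is usually described as a repeated cycle of inject, generate, fetch, parse and updatedb steps, which are also the names of its command-line jobs. The toy below is a single-process sketch of that cycle, meant only to show how a crawl database and a link graph interact. It does not use any Nutch classes; the class name, the hard-coded "web" and the example.com-style URLs are all hypothetical, and it ignores robots.txt, politeness, scoring and real HTTP.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Toy, in-memory illustration of an inject -> generate -> fetch -> parse -> updatedb
 * crawl cycle. Hypothetical sketch only: nothing here is Nutch's actual API.
 */
public class ToyCrawlCycle {

    // Hypothetical link graph: page URL -> outlinks discovered when the page is "parsed".
    static final Map<String, List<String>> WEB = Map.of(
        "http://a.example/", List.of("http://b.example/", "http://c.example/"),
        "http://b.example/", List.of("http://c.example/", "http://d.example/"),
        "http://c.example/", List.of("http://a.example/"),
        "http://d.example/", List.of()
    );

    /** Runs up to {@code rounds} crawl cycles from the seeds; returns URLs in fetch order. */
    public static List<String> crawl(List<String> seeds, int rounds) {
        // Crawl database: URL -> already fetched? (insertion order kept for readability).
        Map<String, Boolean> crawlDb = new LinkedHashMap<>();
        for (String seed : seeds) {
            crawlDb.putIfAbsent(seed, false);               // "inject" the seed URLs
        }
        List<String> fetchedInOrder = new ArrayList<>();
        for (int round = 0; round < rounds; round++) {
            // "generate": build a fetch list from every URL not fetched yet.
            List<String> fetchList = new ArrayList<>();
            for (Map.Entry<String, Boolean> e : crawlDb.entrySet()) {
                if (!e.getValue()) {
                    fetchList.add(e.getKey());
                }
            }
            if (fetchList.isEmpty()) {
                break;                                      // nothing left to crawl
            }
            List<String> discovered = new ArrayList<>();
            for (String url : fetchList) {
                // "fetch" + "parse": look the page's outlinks up in the toy graph.
                fetchedInOrder.add(url);
                crawlDb.put(url, true);
                discovered.addAll(WEB.getOrDefault(url, List.of()));
            }
            // "updatedb": record newly discovered URLs as unfetched; known ones are kept.
            for (String link : discovered) {
                crawlDb.putIfAbsent(link, false);
            }
        }
        return fetchedInOrder;
    }

    public static void main(String[] args) {
        System.out.println(crawl(List.of("http://a.example/"), 1));
        System.out.println(crawl(List.of("http://a.example/"), 3));
    }
}
```

With one round only the seed is fetched; with three rounds the pages are fetched in the order a, b, c, d, and the link from c back to a is not followed again because the crawl database already marks a as fetched.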

Nutch can run on a single machine, but it performs better when deployed on a Hadoop cluster.

Its functionality can be extended through plugins.
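In Nutch's configuration, the set of active plugins is chosen by the plugin.includes property, a regular expression matched against plugin names. The sketch below only illustrates that selection idea in plain Java; the class and method names are hypothetical, the plugin list and regex are illustrative rather than Nutch's defaults, and it assumes the whole name must match the expression.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch of regex-based plugin selection, loosely modelled on the idea
 * behind Nutch's plugin.includes property. Not Nutch's actual plugin loader.
 */
public class PluginSelectionSketch {

    /** Returns the plugin names fully matched by the include regex, in input order. */
    public static List<String> activePlugins(List<String> available, String includeRegex) {
        Pattern include = Pattern.compile(includeRegex);
        List<String> active = new ArrayList<>();
        for (String name : available) {
            // matches() requires the entire name to match, so "protocol-httpclient"
            // is not selected by the alternative "protocol-http".
            if (include.matcher(name).matches()) {
                active.add(name);
            }
        }
        return active;
    }

    public static void main(String[] args) {
        // Illustrative plugin names; a real installation lists its own plugin directories.
        List<String> available = List.of(
            "protocol-http", "protocol-httpclient", "parse-html", "parse-tika",
            "index-basic", "index-more", "microformats-reltag");
        String includes = "protocol-http|parse-(html|tika)|index-(basic|more)";
        System.out.println(activePlugins(available, includes));
    }
}
```

Running main prints the five names the expression selects (protocol-http, parse-html, parse-tika, index-basic, index-more); adding a plugin is then a matter of widening the expression rather than changing code.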

What's New in This Release:

· Ensure duplicate tags do not exist in microformat-reltag tag set.
· A better fallback value for the date field.
· Get rid of the dreaded.
· Upgrade to Hadoop 1.2.0.
· Upgrade to Tika 1.3.
