Apache Hadoop Pig is a platform for analyzing large data sets. It consists of a high-level language for expressing data analysis programs, coupled with infrastructure for evaluating those programs.
The salient property of Pig programs is that their structure is amenable to substantial parallelization, which in turn enables them to handle very large data sets.
Here are some key features of "Apache Hadoop Pig":
· Ease of programming. It is trivial to achieve parallel execution of simple, "embarrassingly parallel" data analysis tasks. Complex tasks comprised of multiple interrelated data transformations are explicitly encoded as data flow sequences, making them easy to write, understand, and maintain.
· Optimization opportunities. The way in which tasks are encoded permits the system to optimize their execution automatically, allowing the user to focus on semantics rather than efficiency.
· Extensibility. Users can create their own functions to do special-purpose processing.
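To make the "data flow sequence" idea concrete, here is a minimal sketch in plain Python. It is not Pig Latin and does not use any Pig API; the sample records and the names is_engaged and run_flow are hypothetical, invented only for illustration. It shows a task written as a chain of steps (filter, group, aggregate), with one step delegated to a user-defined function.

```python
from collections import defaultdict

# Hypothetical input records: (user, url, seconds_on_page)
visits = [
    ("alice", "/home", 12),
    ("bob", "/home", 3),
    ("alice", "/docs", 45),
    ("carol", "/docs", 30),
    ("bob", "/docs", 1),
]


def is_engaged(record, min_seconds=5):
    """Hypothetical user-defined function: keep visits lasting at least min_seconds."""
    return record[2] >= min_seconds


def run_flow(records):
    # Step 1: filter the records with the user-defined predicate.
    engaged = [r for r in records if is_engaged(r)]

    # Step 2: group the remaining records by URL.
    groups = defaultdict(list)
    for _user, url, seconds in engaged:
        groups[url].append(seconds)

    # Step 3: aggregate each group on its own (visit count, total seconds).
    return {url: (len(secs), sum(secs)) for url, secs in groups.items()}


result = run_flow(visits)
print(result)
```

Here result maps each URL to a (visit count, total seconds) pair. Each group in step 3 is aggregated independently of the others, which is the kind of structure that lets a system such as Pig spread the work across many machines.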
Requirements:
· Java 1.6.x or higher
· Ant
· Cygwin (Windows only)
· Apache Hadoop 0.20.x or higher
What's New in This Release:
· This release includes the DateTime data type, the RANK, CUBE and ROLLUP operators, Groovy UDFs, custom reducer estimation, schema-based tuples and HCatalog DDL integration.
Via: Apache Hadoop Pig 0.11.0