tPigAggregate
Adds one or more columns to the output of grouped data to create data for use by Pig.
tPigAggregate groups the original data by column and adds one or more columns to the output of the grouped data.
tPigAggregate Standard properties
These properties are used to configure tPigAggregate running in the Standard Job framework.
The Standard tPigAggregate component belongs to the Big Data and the Processing families.
The component in this framework is available when you are using one of the Talend solutions with Big Data.
Basic settings
Schema and Edit schema | A schema is a row description. It defines the number of fields (columns) to be processed. Click Edit schema to make changes to the schema.
| Built-in: The schema will be
| Repository: The schema already
Group by | Click the plus button to add one or more columns to set tuples in
Operations | Click the plus button to add one or more columns to generate one
| Additional Output Column: Select a
| Function: Functions for operation
| Input Column: Select a column in
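The Group by and Operations settings above can be pictured with a small sketch. This is not the code the component generates; it is a plain Python illustration of the same idea, roughly what a Pig GROUP ... BY followed by FOREACH ... GENERATE computes. The column names (`state`, `amount`, `order_count`, `total_amount`), the helper `aggregate`, and the set of supported functions are all assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical input rows; column names are illustrative only.
rows = [
    {"state": "CA", "amount": 10.0},
    {"state": "CA", "amount": 5.0},
    {"state": "NY", "amount": 7.5},
]

# Assumed set of functions for this sketch; the component's own
# Function list may differ.
FUNCTIONS = {
    "count": len,
    "sum": sum,
    "min": min,
    "max": max,
}


def aggregate(rows, group_by, operations):
    """Group rows by the group_by columns, then add one column per operation.

    operations is a list of
    (additional_output_column, function_name, input_column) tuples,
    mirroring the three fields of the Operations table.
    """
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[col] for col in group_by)
        groups[key].append(row)

    result = []
    for key, members in groups.items():
        # Start each output row with the grouping columns.
        out = dict(zip(group_by, key))
        for out_col, func_name, in_col in operations:
            values = [member[in_col] for member in members]
            out[out_col] = FUNCTIONS[func_name](values)
        result.append(out)
    return result


aggregated = aggregate(
    rows,
    group_by=["state"],
    operations=[
        ("order_count", "count", "amount"),
        ("total_amount", "sum", "amount"),
    ],
)
for row in aggregated:
    print(row)
```

Each entry in `operations` corresponds to one row of the Operations table: the name of the additional output column, the function to apply, and the input column it reads from.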
Advanced settings
Increase parallelism | Select this check box to set the number of reduce tasks for the
tStatCatcher Statistics | Select this check box to gather the Job processing metadata at the
Global Variables
Global Variables | ERROR_MESSAGE: the error message generated by the component.
| A Flow variable functions during the execution of a component while an After variable functions after the execution of the component.
| To fill up a field or expression with a variable, press Ctrl +
| For further information about variables, see
Usage
Usage rule | This component is commonly used as an intermediate step together with
Prerequisites | The Hadoop distribution must be properly installed, so as to guarantee the interaction
| For further information about how to install a Hadoop distribution, see the manuals
Limitation | Knowledge of Pig scripts is required.
Related scenario
For a tPigAggregate related scenario, see Aggregating values and sorting data of tAggregateRow.