August 17, 2023

tPigAggregate – Docs for ESB 5.x




This component is available in the Palette of the Studio only if you have subscribed to
one of the Talend solutions with Big Data.

tPigAggregate Properties

Component family

Big Data / Hadoop



This component groups the original data by column and adds one or
more additional columns to the output of the grouped data.


The tPigAggregate component adds
one or more additional columns to the output of the grouped data to
create data to be used by Pig.

Basic settings

Schema and Edit schema

A schema is a row description. It defines the number of fields to be processed and passed on
to the next component. The schema is either Built-In or
stored remotely in the Repository.

Since version 5.6, both the Built-In mode and the Repository mode are
available in any of the Talend solutions.

Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are available:

  • View schema: choose this option to view the
    schema only.

  • Change to built-in property: choose this option
    to change the schema to Built-in for local changes.

  • Update repository connection: choose this option to change
    the schema stored in the repository and decide whether to propagate the changes to
    all the Jobs upon completion. If you just want to propagate the changes to the
    current Job, you can select No upon completion and
    choose this schema metadata again in the [Repository



Built-in: The schema will be
created and stored locally for this component only. Related topic:
Talend Studio User Guide.



Repository: The schema already
exists and is stored in the Repository, so it can be reused in
various projects and Job designs. Related topic: see
Talend Studio User Guide.
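
To make the idea of a schema as a row description more concrete, here is a minimal sketch in Python. It is not how the Studio stores schemas; the column names and the `conforms` helper are hypothetical and only illustrate that a schema fixes the number and type of the fields passed to the next component.

```python
# A schema as an ordered list of (column name, Python type) pairs.
# The columns are hypothetical and only serve the illustration.
schema = [("customer", str), ("amount", float)]


def conforms(row, schema):
    """Return True when `row` has as many fields as the schema and
    every value has the type declared at the same position."""
    if len(row) != len(schema):
        return False
    return all(isinstance(value, col_type)
               for value, (_, col_type) in zip(row, schema))


ok = conforms(("alice", 10.0), schema)      # matches the schema
too_short = conforms(("alice",), schema)    # one field is missing
```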


Group by

Click the plus button to add one or more columns of the source data
to use as the grouping condition.



Operations

Click the plus button to add one or more rows, each of which
generates an additional output column based on these settings:

Additional Output Column: Select a
column in the original data as output column.

Function: Select the function to
apply to the input data.

Input Column: Select a column in
the original data as input column.
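
The grouping and the additional output columns described above can be pictured with a small Python sketch. In Pig Latin this roughly corresponds to a GROUP ... BY statement followed by FOREACH ... GENERATE. The sample rows, the column names and the set of functions are assumptions made for the illustration; the functions actually offered by the component are listed in the Studio.

```python
from collections import defaultdict

# Hypothetical input rows; the column names are illustrative only.
rows = [
    {"customer": "alice", "amount": 10.0},
    {"customer": "bob", "amount": 5.0},
    {"customer": "alice", "amount": 2.5},
]

# Functions a user might pick in the Function column (illustrative subset).
FUNCTIONS = {
    "count": len,
    "sum": sum,
    "min": min,
    "max": max,
    "avg": lambda values: sum(values) / len(values),
}


def aggregate(rows, group_by, operations):
    """Group rows by the `group_by` columns, then compute one output
    column per (output_column, function, input_column) operation."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[col] for col in group_by)
        groups[key].append(row)

    result = []
    for key, members in groups.items():
        out = dict(zip(group_by, key))
        for output_column, function, input_column in operations:
            values = [m[input_column] for m in members]
            out[output_column] = FUNCTIONS[function](values)
        result.append(out)
    return result


totals = aggregate(
    rows,
    group_by=["customer"],
    operations=[("total", "sum", "amount"), ("orders", "count", "amount")],
)
```

Each entry in `operations` plays the role of one row in the Operations table: an output column name, a function, and the input column the function reads.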

Advanced settings

Increase parallelism

Select this check box to set the number of reduce tasks for the
MapReduce Jobs.
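
As a rough picture of what the number of reduce tasks changes, the sketch below spreads group keys over a configurable number of reducers. It assumes simple hash partitioning with a CRC32 hash, which is a simplification chosen for the example; the partitioner actually used by a given Job may differ. In Pig Latin, the reducer count is set with the PARALLEL clause.

```python
import zlib


def assign_reducer(key, num_reducers):
    """Map a group key to a reducer index with a stable hash."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def partition(keys, num_reducers):
    """Return one list of keys per reducer."""
    buckets = [[] for _ in range(num_reducers)]
    for key in keys:
        buckets[assign_reducer(key, num_reducers)].append(key)
    return buckets


# Hypothetical group keys, spread over two reducers.
keys = ["alice", "bob", "carol", "dave"]
buckets = partition(keys, num_reducers=2)
```

All rows that share a group key land on the same reducer, so raising the number of reducers only helps when the data contains enough distinct keys.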


tStatCatcher Statistics

Select this check box to gather the Job processing metadata at the
Job level as well as at each component level.

Global Variables

ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable and it returns a string. For
components that have a Die on error check box, this variable is set only when that
check box is cleared.

A Flow variable functions during the execution of a component while an After variable
functions after the execution of the component.

To fill in a field or expression with a variable, press Ctrl + Space
to access the variable list and choose the variable to use from it.

For further information about variables, see Talend Studio
User Guide.
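
The behavior of an After variable such as ERROR_MESSAGE can be sketched as follows. This is an illustration only: the `global_map` dictionary, the `run_component` helper and the `tPigAggregate_1_ERROR_MESSAGE` key are assumptions modeled on the description above, not the code the Studio generates.

```python
# Stand-in for the Job-wide variable store; this dict is only an
# illustration of the idea, not the Studio's implementation.
global_map = {}


def run_component(name, action, die_on_error=False):
    """Run `action`; when it fails and die_on_error is off, record the
    message as an After variable and let the Job continue."""
    try:
        action()
    except Exception as exc:
        if die_on_error:
            raise
        global_map[f"{name}_ERROR_MESSAGE"] = str(exc)


def failing_step():
    # Hypothetical failure used to populate the variable.
    raise ValueError("column 'amount' not found")


run_component("tPigAggregate_1", failing_step)

# Read the variable once the component has finished executing.
message = global_map.get("tPigAggregate_1_ERROR_MESSAGE")
```

The value is read only after `run_component` returns, which mirrors the difference between an After variable and a Flow variable.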


This component is commonly used as an intermediate step, together with
an input component and an output component.


The Hadoop distribution must be properly installed to guarantee its interaction
with Talend Studio. The following list presents MapR-related information as an
example:
  • Ensure that you have installed the MapR client on the machine where the Studio is
    installed, and added the MapR client library to the PATH variable of that machine. According
    to MapR’s documentation, the library or libraries of a MapR client corresponding to
    each OS version can be found under MAPR_INSTALL
    . For example, the library for
    Windows is lib
    in the MapR
    client jar file. For further information, see the following link from MapR:

    Without adding the specified library or libraries, you may encounter the following
    error: no MapRClient in java.library.path.

  • Set the -Djava.library.path argument, for example, in the Job Run VM arguments area
    of the Run/Debug view in the [Preferences] dialog box. This argument gives the Studio the
    path to the native library of that MapR client, which allows subscription-based
    users to make full use of the Data viewer to view,
    locally in the Studio, the data stored in MapR. For further information about how to
    set this argument, see the section describing how to view data in the Talend Big Data Getting Started Guide.
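
As a small aid for the second point in the list above, the sketch below composes the VM argument from one or more directories. The directory shown is a placeholder, not a real MapR location; use the native library folder of your own MapR client installation.

```python
import os


def library_path_argument(native_dirs):
    """Build the -Djava.library.path VM argument from a list of
    directories, joined with the platform's path separator."""
    return "-Djava.library.path=" + os.pathsep.join(native_dirs)


# Hypothetical directory; substitute the native library folder of your
# own MapR client installation.
arg = library_path_argument(["/opt/example/mapr/native"])
```

The resulting string is what would be pasted into the Job Run VM arguments area.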

For further information about how to install a Hadoop distribution, see the manuals
corresponding to the Hadoop distribution you are using.


The activity of this component can be logged using the log4j feature. For more information on this feature, see Talend Studio User Guide.

For more information on the log4j logging levels, see the Apache documentation at


Knowledge of Pig scripts is required.


Related scenario

For a tPigAggregate related scenario, see Scenario 1: Aggregating values
and sorting data of tAggregateRow.

Document get from Talend