August 17, 2023

tAggregateRow – Docs for ESB 5.x

tAggregateRow

tAggregateRow.png

tAggregateRow properties

Component family

Processing

 

Function

tAggregateRow receives a flow and
aggregates it based on one or more columns. Each output line provides
the aggregation key and the corresponding results of the set
operations (min, max, sum, and so on).

Purpose

Helps to provide a set of metrics based on values or calculations.
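As a sketch of what this aggregation amounts to, the following plain-Java example (not Talend's generated code; class and field names are illustrative) groups rows by a key column and computes min, max, sum, and average per group:

```java
import java.util.*;

// Illustrative sketch of the grouping-and-aggregation logic tAggregateRow
// performs: rows are grouped by a key column, then set operations
// (min, max, sum, avg) are computed for each group.
public class AggregateSketch {
    public static Map<String, DoubleSummaryStatistics> aggregate(
            List<Map.Entry<String, Double>> rows) {
        Map<String, DoubleSummaryStatistics> groups = new LinkedHashMap<>();
        for (Map.Entry<String, Double> row : rows) {
            // accumulate each value into the statistics of its group key
            groups.computeIfAbsent(row.getKey(), k -> new DoubleSummaryStatistics())
                  .accept(row.getValue());
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Double>> rows = List.of(
            Map.entry("France", 10.0),
            Map.entry("France", 20.0),
            Map.entry("Spain", 5.0));
        aggregate(rows).forEach((country, stats) ->
            System.out.printf("%s min=%.1f max=%.1f sum=%.1f avg=%.1f%n",
                country, stats.getMin(), stats.getMax(), stats.getSum(),
                stats.getAverage()));
    }
}
```

java.util.DoubleSummaryStatistics happens to accumulate exactly the kind of set operations listed above, which keeps the sketch short.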

Basic settings

Schema and Edit
Schema

A schema is a row description. It defines the number of fields to
be processed and passed on to the next component. The schema is
either Built-in or stored remotely
in the Repository.

Since version 5.6, both the Built-In mode and the Repository mode are
available in any of the Talend solutions.

Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:

  • View schema: choose this option to view the
    schema only.

  • Change to built-in property: choose this option
    to change the schema to Built-in for local
    changes.

  • Update repository connection: choose this option to change
    the schema stored in the repository and decide whether to propagate the changes to
    all the Jobs upon completion. If you just want to propagate the changes to the
    current Job, you can select No upon completion and
    choose this schema metadata again in the [Repository
    Content]
    window.

This component offers the advantage of the dynamic schema feature. This allows you to
retrieve unknown columns from source files or to copy batches of columns from a source
without mapping each column individually. For further information about dynamic schemas,
see Talend Studio
User Guide.

This dynamic schema feature is designed for retrieving unknown columns
of a table and is recommended for this purpose only; it is not
recommended for creating tables.

 

 

Built-in: The schema will be
created and stored locally for this component only. Related topic:
see Talend Studio User Guide.

 

 

Repository: The schema already
exists and is stored in the Repository, hence can be reused in
various projects and Job flowcharts. Related topic: see
Talend Studio User Guide.

 

Group by

Define the aggregation sets, the values of which will be used for
calculations.

 

 

Output Column: Select the column
label in the list offered based on the schema structure you defined.
You can add as many output columns as you wish to make more precise
aggregations.

Example: Select Country to calculate an average of values for each
country in a list, or select both Country and Region if you want to
compare one country's regions with another country's regions.

 

 

Input Column: Match the input
column label with your output columns, in case the output label of
the aggregation set needs to be different.

 

Operations

Select the type of operation along with the value to use for the
calculation and the output field.

 

 

Output Column: Select the
destination field in the list.

 

 

Function: Select the operator
among: count, min, max, avg, sum, first, last, list, list(objects),
count(distinct), standard deviation.

 

 

Input column: Select the input
column from which the values are taken to be aggregated.

 

 

Ignore null values: Select the
check boxes corresponding to the names of the columns for which you
want the NULL value to be ignored.
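To illustrate the effect of this option, here is a minimal, hypothetical sketch (not Talend code) of a sum that skips NULL values in the selected column:

```java
import java.util.*;

// Sketch of the "Ignore null values" behavior: with the box checked,
// null entries are simply skipped before aggregating, so they do not
// distort sums or averages. Purely illustrative.
public class IgnoreNulls {
    public static double sumIgnoringNulls(List<Double> values) {
        double sum = 0.0;
        for (Double v : values) {
            if (v != null) sum += v;   // null entries do not contribute
        }
        return sum;
    }

    public static void main(String[] args) {
        // Arrays.asList (unlike List.of) permits null elements
        List<Double> values = Arrays.asList(3.0, null, 4.0);
        System.out.println(sumIgnoringNulls(values)); // 7.0
    }
}
```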

Advanced settings

Delimiter (only for list operation)

Enter the delimiter you want to use to separate the values
returned by the list operation.
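As an illustration (the delimiter and values below are made up), the list function behaves like a delimiter-joined concatenation of a group's values:

```java
import java.util.*;

// Sketch of the "list" function with a custom delimiter: the values of
// the input column within one group are concatenated, separated by the
// delimiter set in Advanced settings. Illustrative, not Talend's code.
public class ListFunction {
    public static String listWithDelimiter(List<String> values, String delimiter) {
        return String.join(delimiter, values);
    }

    public static void main(String[] args) {
        System.out.println(listWithDelimiter(List.of("task1", "task2", "task3"), ";"));
        // task1;task2;task3
    }
}
```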

 

Use financial precision, this is the max precision for
“sum” and “avg” operations, checked option heaps more memory and
slower than unchecked.

Select this check box to use financial precision. This is a maximum
precision, but it consumes more memory and slows down processing.

Warning

We advise you to use the BigDecimal type for the
output in order to obtain precise results.
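The following standalone example illustrates why: summing decimal fractions as double accumulates binary rounding error, whereas BigDecimal (the kind of decimal arithmetic financial precision relies on) sums exactly. The values are illustrative.

```java
import java.math.BigDecimal;

// Why a BigDecimal output column matters for sum/avg: ten additions of
// 0.1 drift in binary double arithmetic but stay exact in BigDecimal.
public class PrecisionDemo {
    public static void main(String[] args) {
        double doubleSum = 0.0;
        BigDecimal exactSum = BigDecimal.ZERO;
        for (int i = 0; i < 10; i++) {
            doubleSum += 0.1;                              // binary rounding error
            exactSum = exactSum.add(new BigDecimal("0.1")); // exact decimal add
        }
        System.out.println(doubleSum); // 0.9999999999999999
        System.out.println(exactSum);  // 1.0
    }
}
```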

 

Check type overflow (slower)

Checks the type of data to ensure that the Job doesn’t
crash.

 

Check ULP (Unit in the Last Place), ensure that a value
will be incremented or decremented correctly, only float and
double types. (slower)

Select this check box to ensure the most precise results possible
for the Float and Double types.
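For background, a ULP is the spacing between two adjacent representable float or double values; Java exposes it as Math.ulp. The sketch below (illustrative values) shows that near 1.0e8f one ULP is 8, so incrementing by 1.0f is silently lost, which is the kind of error this check guards against:

```java
// A ULP (unit in the last place) is the gap between adjacent representable
// floating-point values. Near large magnitudes the gap exceeds 1, so small
// increments or decrements can vanish entirely.
public class UlpDemo {
    public static void main(String[] args) {
        System.out.println(Math.ulp(1.0f));    // gap around 1.0f (about 1.19e-7)
        System.out.println(Math.ulp(1.0e8f));  // gap around 1.0e8f: 8.0
        float big = 1.0e8f;
        System.out.println(big + 1.0f == big); // true: the increment is lost
    }
}
```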

 

tStatCatcher Statistics

Check this box to collect the log data at component level. Note that this check box is not available in
the Map/Reduce version of the component.

Global Variables

ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable and it returns a string. This
variable functions only if the Die on error check box is
cleared, if the component has one.

A Flow variable functions during the execution of a component while an After variable
functions after the execution of the component.

To fill in a field or expression with a variable, press
Ctrl + Space to access the variable list and choose the variable
to use from it.

For further information about variables, see Talend Studio
User Guide.

Usage

This component handles a flow of data, so it requires input and
output components and is defined as an intermediary step.
tAggregateRow is usually used together
with the tSortRow component.

Usage in Map/Reduce Jobs

If you have subscribed to one of the Talend solutions with Big Data, you can also
use this component as a Map/Reduce component. In a Talend Map/Reduce Job, this
component is used as an intermediate step and other components used along with it must be
Map/Reduce components, too. They generate native Map/Reduce code that can be executed
directly in Hadoop.

For further information about a Talend Map/Reduce Job, see the sections
describing how to create, convert and configure a Talend Map/Reduce Job of the
Talend Big Data Getting Started Guide.

Note that in this documentation, unless otherwise explicitly stated, a scenario presents
only Standard Jobs, that is to say traditional Talend data
integration Jobs, and not Map/Reduce Jobs.

Usage in Storm Jobs

If you have subscribed to one of the Talend solutions with Big Data, you can also
use this component as a Storm component. In a Talend Storm Job, this component is used as
an intermediate step and other components used along with it must be Storm components, too.
They generate native Storm code that can be executed directly in a Storm system.

The Storm version does not support the use of global variables.

You need to use the Storm Configuration tab in the
Run view to define the connection to a given Storm
system for the whole Job.

This connection is effective on a per-Job basis.

For further information about a Talend Storm Job, see the sections
describing how to create and configure a Talend Storm Job of the Talend Big Data Getting Started Guide.

Note that in this documentation, unless otherwise explicitly stated, a scenario presents
only Standard Jobs, that is to say traditional Talend data
integration Jobs.

Log4j

The activity of this component can be logged using the log4j feature. For more
information on this feature, see Talend Studio User Guide.

For more information on the log4j logging levels, see the Apache documentation at http://logging.apache.org/log4j/1.2/apidocs/org/apache/log4j/Level.html.

Limitation

n/a

Scenario 1: Aggregating values
and sorting data

The following scenario describes a four-component Job. The input component is a CSV
file that contains countries and notation values to be sorted by best average value. This
component is connected to a tAggregateRow operator, in
charge of the average calculation, then to a tSortRow
component for the ascending sort. The output flow goes to a new CSV file.

Use_Case_tAggregateRow1.png
  • From the File folder in the Palette, drop a tFileInputDelimited component to the design workspace.

  • Click the label and rename it as Countries, or rename it
    from the View tab panel.

  • In the Basic settings tab panel of this
    component, define the filepath and the delimitation criteria. Or select the
    metadata file in the repository if it exists.

  • Click Edit schema… and set the columns:
    Countries and Points to match the
    file structure. If your file description is stored in the Metadata area
    of the Repository, the schema is automatically uploaded when you select
    Repository in the Schema type field.

  • Then from the Processing folder in the
    Palette, drop a tAggregateRow component to the design workspace. Rename it as
    Calculation.

  • Connect Countries to Calculation via
    a right-click and select Row > Main.

  • Double-click Calculation (tAggregateRow component) to set the properties. Click Edit schema and define the output schema. You can add
    as many columns as you need to hold the set operations results in the output
    flow.

Use_Case_tAggregateRow3.png
  • In this example, we’ll calculate the average notation value per country and we
    will display the max and the min notation for each country, given that each
    country holds several notations. Click OK when
    the schema is complete.

  • To carry out the various set operations, back in the Basic settings panel, define the sets holding the operations in
    the Group By area. In this example, select
    Country as the group-by column. Note that the
    output column needs to be defined as a key field in the schema. The first column
    mentioned as output column in the Group By
    table is the main set of the calculation. All other output sets are secondary, by
    order of display.

  • Select the input column from which the values will be taken.

  • Then fill in the various operations to be carried out. The functions are
    average, min, and
    max for this use case. Select the input columns
    the values are taken from, and select the check boxes in the Ignore null values list as needed.

Use_Case_tAggregateRow4.png
  • Drop a tSortRow component from the Palette onto the design workspace. For more
    information regarding this component, see tSortRow properties.

  • Connect the tAggregateRow to this new
    component using a row main link.

  • On the Component view of the tSortRow component, define the column the sorting is
    based on, the sorting type and order.

Use_Case_tAggregateRow5.png
  • In this case, the column to be sorted by is Country, the
    sort type is alphabetical and the order is ascending.

  • Drop a tFileOutputDelimited from the
    Palette to the design workspace and define
    it to set the output flow.

  • Connect the tSortRow component to this output
    component.

  • In the Component view, enter the output
    filepath. Edit the schema if need be. In this case, the delimited file is of CSV
    type. Select the Include Header check box
    to reuse the schema column labels in your output flow.

  • Press F6 to execute the Job. The CSV file
    thus created contains the aggregation result.
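The whole pipeline of this scenario can be sketched in plain Java (a hypothetical stand-in for the generated Job, with made-up sample data): parse Country;Points records, aggregate per country, and keep the countries in ascending order:

```java
import java.util.*;

// Plain-Java sketch of what Scenario 1 wires together graphically:
// parse "Country;Points" records, aggregate per country (avg, min, max),
// then output countries sorted ascending. The sample data is invented;
// a real Job would read the file configured in tFileInputDelimited.
public class Scenario1Sketch {
    public static Map<String, DoubleSummaryStatistics> run(List<String> lines) {
        // TreeMap keeps country keys in ascending alphabetical order,
        // mirroring the tSortRow step
        Map<String, DoubleSummaryStatistics> byCountry = new TreeMap<>();
        for (String line : lines) {
            String[] fields = line.split(";");
            byCountry.computeIfAbsent(fields[0], k -> new DoubleSummaryStatistics())
                     .accept(Double.parseDouble(fields[1]));
        }
        return byCountry;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("France;12", "Spain;8", "France;18", "Spain;10");
        run(lines).forEach((country, s) ->
            System.out.printf("%s;%.1f;%.1f;%.1f%n",
                country, s.getAverage(), s.getMin(), s.getMax()));
    }
}
```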

Use_Case_tAggregateRow6.png

Scenario 2: Aggregating values based on dynamic schema

In this scenario, a four-component Java Job uses a tAggregateRow component to read data from a CSV file, group the data,
and then send the grouping result to the Run console
and an output file. A dynamic schema is used in this Job. For more information about
dynamic schema, see Talend Studio User Guide.

Use_Case_tAggregateRow7.png
  • Drop the components required for this use case: tFileInputDelimited, tAggregateRow, tLogRow and
    tFileOutputDelimited from the Palette to the design workspace.

  • Connect these components together using Row
    > Main links.

  • Double-click the tFileInputDelimited
    component to display its Basic settings view.

Use_Case_tAggregateRow8.png

Warning

The dynamic schema feature is only supported in Built-In mode and requires the input file to have a
header row.

  • Select Built-In from the Property Type list.

  • Click the […] button next to the File Name field to browse to your input file. In this
    use case, we use a simple CSV file that has only three columns, as shown below:

Use_Case_tAggregateRow9.png
  • Specify the header row in the Header field. In
    this use case, the first row is the header row.

  • Select Built-In from the Schema list, and click Edit
    schema
    to set the input schema.

Use_Case_tAggregateRow10.png

Warning

The dynamic column must be defined in the last row of the
schema.

  • In the schema editor, add two columns and name them Task
    and Other respectively. Set the data type of the
    Other column to Dynamic to retrieve all the columns undefined in the schema.

  • Click OK to close the schema editor.

  • Double-click the tAggregateRow component to
    display the Basic settings view.

Use_Case_tAggregateRow11.png
  • Click Sync columns to reuse the input schema
    for the output row. If needed, click Edit
    schema
    and rename the columns in the output schema. In this use
    case, we simply keep the schema as it is.

  • Add a row in the Group by table by clicking
    the plus button, and select Other in both Output column and Input column
    position
    fields to group the data entries by the dynamic
    column.

Warning

Dynamic column aggregation can be carried out only for the grouping
operation.

  • Add a row in the Operations table by clicking
    the plus button, select Task in both Output column and Input column
    position
    fields, and select list in the
    Function field so that all the entries of
    the Task column are listed in the grouping result.

  • To view the output in the form of a table on the Run console, double-click the tLogRow component and select the Table option in the Basic
    settings
    view.

  • Double-click the tFileOutputDelimited
    component to display its Basic settings
    view.

Use_Case_tAggregateRow14.png
  • Click the […] button next to the File Name field to browse to the directory where you
    want to save the output file, and then enter a name for the file.

  • Select the Include Header check box to
    retrieve the column names as well as the grouped data.

  • Save your Job and press F6 to run it.

    As shown in the Job execution result, the data entries are grouped as per
    Team and Status.
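The grouping logic of this scenario can be sketched in plain Java (a hypothetical stand-in for the Job; column names and data are invented): each row carries a fixed Task value plus the remaining columns bundled as one dynamic value, and the list function concatenates the tasks of each group:

```java
import java.util.*;

// Sketch of Scenario 2's logic: group rows by the dynamic column bundle
// (here a "Team;Status" string standing in for the Dynamic type) and
// apply the "list" function to the Task column. Illustrative only.
public class Scenario2Sketch {
    public static Map<String, String> groupTasks(List<String[]> rows, String delimiter) {
        Map<String, StringJoiner> groups = new LinkedHashMap<>();
        for (String[] row : rows) {   // row[0] = Task, row[1] = dynamic part
            groups.computeIfAbsent(row[1], k -> new StringJoiner(delimiter))
                  .add(row[0]);
        }
        Map<String, String> result = new LinkedHashMap<>();
        groups.forEach((k, v) -> result.put(k, v.toString()));
        return result;
    }

    public static void main(String[] args) {
        List<String[]> rows = List.of(
            new String[]{"task1", "TeamA;Done"},
            new String[]{"task2", "TeamA;Done"},
            new String[]{"task3", "TeamB;Pending"});
        groupTasks(rows, ",").forEach((group, tasks) ->
            System.out.println(group + " -> " + tasks));
    }
}
```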

Use_Case_tAggregateRow16.png

Source: Talend documentation, https://help.talend.com