
Warning
This component is available in the Palette of Talend Studio only if you have
subscribed to the relevant edition of one of the Talend solutions
with Big Data.
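Component family
FileScale
Note that this component is deprecated.

Function
tFSAggregate performs an aggregation operation on the input data, grouping it by one or more key columns. This component has high speed capabilities for aggregating large volumes of data.

Purpose
tFSAggregate helps set up grouped, aggregated data from large input files.
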
Basic settings

Schema type and Edit Schema
A schema is a row description; it defines the number of fields to be processed. Click Edit schema to make changes to the schema.

Repository: You have already created the schema and stored it in the Repository.

Built-in: You create and store the schema locally for this component only.

Property type
Either Built-in or Repository. Since version 5.6, both the Built-in mode and the Repository mode are available.

Built-in: No property data is stored centrally.

Repository: Select the repository file where the properties are stored.

Input File Name
Name of the file holding the data you want to collect.

Output File Name
Name of the file where you want to write the collected data.

Record separator (char)
Character, string or regular expression to separate records.

Field separator (char)
Character, string or regular expression to separate fields in a record.

Header
Number of records to be skipped at the beginning of the file.

Footer
Number of records to be skipped at the end of the file.
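The following plain-Python sketch shows how the Record separator, Field separator, Header and Footer settings combine when a file is split into records and fields. It illustrates the settings under stated assumptions and is not FileScale's parser. The function name read_records and its parameters are hypothetical. Both separators are treated as regular expressions, so literal metacharacters such as | must be escaped.

```python
import re


def read_records(text, record_sep="\n", field_sep=";", header=0, footer=0):
    """Split raw file content into records, then each record into fields.

    Hypothetical helper: record_sep and field_sep are regular expressions,
    so escape literal metacharacters (for example r"\|").
    """
    # Drop empty records, such as the one produced by a trailing separator.
    records = [r for r in re.split(record_sep, text) if r != ""]
    # Skip `header` records at the start and `footer` records at the end;
    # max() keeps the slice empty when header + footer exceed the total.
    end = max(header, len(records) - footer)
    body = records[header:end]
    return [re.split(field_sep, r) for r in body]


if __name__ == "__main__":
    sample = "id;name\n1;Ann\n2;Bob\nTOTAL;2\n"
    print(read_records(sample, header=1, footer=1))
```

In this example, one header record and one footer record (the made-up TOTAL line) are skipped. The two remaining data records are then split into fields.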

Group by
Column: List of the columns of the input schema.
Key Attribute: Select the check box next to each column you want to use to group the data.

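Operations
Additional Output Column: Enter a name for the additional output column.

Function: Select the type of the aggregation operation to perform:
count: calculates the number of values,
avg: calculates the average,
sum: calculates the sum.

Input Column: Select the input column from which the values to aggregate are taken.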
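The Group by and Operations tables together describe a classic group-by aggregation. The sketch below expresses that model in plain Python to clarify what each table entry contributes. It is not the FileScale implementation, and the function aggregate is hypothetical. Two behaviors are assumptions: count counts every row in the group, and empty values receive no special handling.

```python
from collections import defaultdict


def aggregate(rows, key_columns, operations):
    """Group rows by the Key Attribute columns and apply each operation.

    rows: list of dicts mapping column name -> string value.
    operations: list of (additional_output_column, function, input_column).
    """
    groups = defaultdict(list)
    for row in rows:
        # The tuple of key-column values identifies the group.
        key = tuple(row[c] for c in key_columns)
        groups[key].append(row)

    results = []
    for key, members in groups.items():
        # Each output row starts with the key columns of its group.
        out = dict(zip(key_columns, key))
        for output_col, function, input_col in operations:
            values = [m[input_col] for m in members]
            if function == "count":
                out[output_col] = len(values)  # rows in the group
            elif function == "sum":
                out[output_col] = sum(float(v) for v in values)
            elif function == "avg":
                out[output_col] = sum(float(v) for v in values) / len(values)
            else:
                raise ValueError(f"unsupported function: {function}")
        results.append(out)
    return results


if __name__ == "__main__":
    # Made-up sample values, grouped on two key columns.
    rows = [
        {"idState": "CA", "RegTime": "2014", "Sum1": "10"},
        {"idState": "CA", "RegTime": "2014", "Sum1": "30"},
        {"idState": "CA", "RegTime": "2015", "Sum1": "5"},
    ]
    ops = [("total", "sum", "Sum1"), ("n", "count", "Sum1")]
    print(aggregate(rows, ["idState", "RegTime"], ops))
```

Selecting several Key Attribute columns simply makes the group key a combination of values. In the example, each distinct (idState, RegTime) pair forms its own output row.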
Advanced settings |
Generate FSLang File
Select this check box to generate the FSLang file corresponding to the aggregation defined in the component.

Assign FileScale Path
Select this check box and then click the three-dot button next to the FileScale Path field to browse to the FileScale executable file.

Specify Number of Process Child
Select this check box and enter the number of child processes to use.

Sort results
Select this check box to sort the results.

Custom FileScale Parameter (separated
Enter the parameters for any specific operation you want to add to the aggregation.

tStatCatcher Statistics
Select this check box to gather the Job processing metadata at the Job level as well as at each component level.
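
Global Variables
ERROR_MESSAGE: the error message generated by the component when an error occurs. This is an After variable and it returns a string.
A Flow variable functions during the execution of a component, while an After variable functions after the execution of the component.
To fill up a field or expression with a variable, press Ctrl + Space to access the variable list and choose the variable to use from it.
For further information about variables, see Talend Studio User Guide.
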
Usage
This component handles files, therefore it does not require input or output connections.

Limitation
The component is limited by the available physical memory and Central Processing Unit (CPU) of the machine that runs it.
Warning
Make sure that you have unzipped the FileScale executable file delivered by Talend
and saved it locally.
You must define the path of this executable file in the Advanced settings view of tFSAggregate.
This scenario describes a Job that uses the tFSAggregate
component to aggregate a very large customer data file at high speed. The Job groups
the customers by the State in which they are based and calculates, for each State,
the number of customers and the average of their incomes.
In this scenario, we have already stored the input schemas of the large input file in
the Repository. For more information about storing schema metadata in the Repository,
see Talend Studio
User Guide.
The input file contains nine columns: id,
CustomerName, CustomerAddress,
idState, id2,
RegTime, RegisterTime,
Sum1 and Sum2.

-
In the Repository tree view, expand Metadata and the file node where you have stored the
input schemas, and drop the relevant metadata onto the design workspace. The [Components] dialog box displays.

-
Select tFSAggregate from the list and click
OK to close the dialog box. The tFSAggregate component displays in the
design workspace.
-
Double-click tFSAggregate to display its
Basic settings view.

All tFSAggregate property fields are automatically
filled in. If you did not store your input schemas in the Repository, select
Built-in in the Schema Type and Property
Type fields and fill in the details manually.
-
In the Output File Name field, browse to the
output file to which you want to write the aggregated data.
-
In the Group by table, select the check
box(es) next to the column name(s) you want to use to group the data. You can
select multiple columns as the aggregation key if you want to group the data based on
multiple criteria. For this scenario, we use the
idState column to group the data.

-
In the Operations table, click the plus
button to add two lines, one for each output column that will hold the results of
the aggregation operations.

-
In the first line of the Additional Output
column list, enter a name for the first additional output column,
count in this scenario.
-
Click in the first line of the Function list
and select the aggregation operation you want to perform, count in this scenario.
-
Click in the first line of the Input Column
list and select the column from which the input values are to be taken,
id in this scenario.
Thus, the count column is added to the output file and contains
the number of customer ids in each State group.
-
In the second line of the Additional Output
column list, enter a name for the second additional output
column, avg in this scenario.
-
Click in the second line of the Function list
and select the aggregation operation you want to perform, avg in this scenario.
-
Click in the second line of the Input Column
list and select the column from which the input values are to be taken,
Sum1 in this scenario.
Thus, the avg column is added to the output file and contains
the average Sum1 value of the customers in each State.
-
Click the Advanced settings tab to display
the Advanced settings view, select the Assign
FileScale Path check box to display the FileScale Path field, and then browse to the FileScale executable file
delivered by Talend.
-
Save your Job and press F6 to execute it.
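To sanity-check the output of this scenario on a small extract, you could compute the same two aggregates outside the Studio. The sketch below is a hypothetical plain-Python equivalent and is not part of the Talend Job. It assumes a semicolon field separator, a header row naming the nine input columns, and made-up sample values. The function name aggregate_by_state is illustrative only.

```python
import csv
import io

# Made-up extract using the nine columns of the scenario's input file.
SAMPLE = (
    "id;CustomerName;CustomerAddress;idState;id2;RegTime;RegisterTime;Sum1;Sum2\n"
    "1;Ann;1 Main St;CA;11;2014-01-02;2014-01-02;100;0\n"
    "2;Bob;2 Oak St;CA;12;2014-02-03;2014-02-03;300;0\n"
    "3;Cy;3 Elm St;NY;13;2014-03-04;2014-03-04;50;0\n"
)


def aggregate_by_state(lines, delimiter=";"):
    """Single-pass count of id and average of Sum1 per idState."""
    totals = {}  # idState -> [number of records, running sum of Sum1]
    for row in csv.DictReader(lines, delimiter=delimiter):
        acc = totals.setdefault(row["idState"], [0, 0.0])
        acc[0] += 1  # one id per record
        acc[1] += float(row["Sum1"])
    # The average is derived from the two running totals at the end.
    return {state: {"count": n, "avg": s / n} for state, (n, s) in totals.items()}


if __name__ == "__main__":
    print(aggregate_by_state(io.StringIO(SAMPLE)))
    # For a real extract: open(path, newline="") instead of io.StringIO.
```

The sketch keeps only a running count and sum per State instead of buffering records. Its memory use therefore depends on the number of States, not on the size of the input file.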

A progress bar displays below the tFSAggregate
component in the design workspace to show the completed percentage of the aggregation
operation, so you can follow how quickly the large input file is processed.
When the progress bar reaches 100%, the data has been grouped by idState and
written to the output file together with the two newly defined output columns, count and avg.