tSurviveFields

Centralizes data from various and heterogeneous sources to create a master copy of
data for MDM.

tSurviveFields receives a flow
and merges it based on one or more columns. The aggregation key and the relevant result
of operations (such as min, max, sum etc) are provided for each output line.

tSurviveFields Standard properties

These properties are used to configure tSurviveFields running in the Standard Job framework.

The Standard
tSurviveFields component belongs to the Data Quality, the Talend MDM and the Processing families.

This component is available in Talend Data Management Platform, Talend Big Data Platform, Talend Real Time Big Data Platform, Talend Data Services Platform, Talend MDM Platform and Talend Data Fabric.

Basic settings

Schema and Edit schema	A schema is a row description, it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.
	Built-in: The schema will be created and stored locally for this component only. Related topic: see Talend Studio User Guide.
	Repository: The schema already exists and is stored in the Repository, hence can be reused in various projects and job designs. Related topic: see Talend Studio User Guide.
Key	Define the merge sets, the values of which will be used for calculations. Output column: Select the column name from the list that reflects the schema structure you defined. You can add as many output columns as you wish to make more precise aggregations. Input column: Match each input column name with your output columns, in case the output column of the aggregation set needs to be different. Warning: The columns in the Key table must NOT appear in the Operations table. If you want all the columns of the output schema to be filled in, they must appear either in the Key table or in the Operations table.
Operations	Output column: From the list, select the output column which will result from the selected merge operation. Function: Select the type of merge operation to be performed from the list. The list includes count, min, max, avg, sum, first, last, list, list(object), count(distinct), standard deviation, max length and best rank. Input column: From the list, select the input column from which the values are to be selected for the merge operation. Rank column: Only available with the best rank function. From the list, select the column you want to use as a rank value for the merge operation. Then the input column will be replaced with the value which has the greater rank. Ignore null values: Select the check boxes which correspond to the names of the columns for which you want the NULL value to be ignored.

Advanced settings

Delimiter (only for list operation)	Between double quotation marks, enter the delimiter you want to use for the list operation.
Use financial precision, this is the max precision for “sum” and “avg” operations, checked option heaps more memory and slower than unchecked.	This check box, which enables financial precision, is selected by default. Clear the check box if you want to use less memory and thus optimize performance.
Check type overflow (slower)	Checks the data type to ensure that the job does not crash. If you select this check box, the system will be slower.
Check ULP (Unit in the Last Place), ensure that a value will be incremented or decremented correctly, only for float and double types. (slower)	Select this check box to launch ULP verification. If you select this check box, the system will be slower.
tStatCatcher Statistics	Select this check box to collect log data at the Job and the component levels.

Global Variables

Global Variables	ERROR_MESSAGE: the error message generated by the component when an error occurs. This is an After variable and it returns a string. This variable functions only if the Die on error check box is cleared, if the component has this check box. A Flow variable functions during the execution of a component while an After variable functions after the execution of the component. To fill up a field or expression with a variable, press Ctrl + Space to access the variable list and choose the variable to use from it. For further information about variables, see Talend Studio User Guide.

ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable and it returns a string. This
variable functions only if the Die on error check box is
cleared, if the component has this check box.

A Flow variable functions during the execution of a component while an After variable
functions after the execution of the component.

To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to use from it.

For further information about variables, see
Talend Studio

User Guide.

Usage

Usage rule	This component requires an input component and an output component.

Merging the content of several rows using different columns as rank
values

This scenario applies only to Talend Data Management Platform, Talend Big Data Platform, Talend Real Time Big Data Platform, Talend Data Services Platform, Talend MDM Platform and Talend Data Fabric.

This scenario describes a three-component Job that uses the tSurviveFields component to merge, based on different rank values, the content
of data rows in different columns and then writes the result in an output file.

In this scenario, we have already stored the input schemas of the input file in the
Repository. For further information about storing schema metadata in the Repository, see

Talend Studio User
Guide.

The input file contains four columns: grp,
gender, firstname and count.
The data in the input file has problems such as duplication, first names spelled differently
or wrongly and different information for the same customer.

Setting up the Job

In the Repository tree view, expand Metadata and the FileExcel
node where you have stored the input schemas and then drop it onto the design
workspace.

The tFileInputExcel component which contains your
schema displays on the workspace.
Drop a tSurviveFields and a tFileOutputExcel component from the Palette onto the design workspace.
Link the components together using Row > Main connections.

Configuring the components

Double-click tFileInputExcel to display its
Basic settings view.

All tFileInputExcel property fields are
automatically filled in. If you did not define your input schemas locally in the
Repository, fill in the details manually after selecting Built-in in the Schema and Property Type fields.
Double-click tSurviveFields to display its
Basic settings view and define the component
properties.
Click Sync columns to retrieve the schema from the
preceding component. You can click the […] next to
Edit schema to view the schema.
In the Key area, click the [+] button to add a new line, and click the field and select the name of
the column you want to use to merge the data from the list.

You can select multiple columns as an aggregation set if you want to merge data
based on multiple criteria. For this scenario, we want to use the
grp column to merge the data.
In the Operations area, click the [+] button to add new rows. Here you can define the output
columns that will hold the results of the merge operation. In this scenario, we want to
merge the data in the firstname, gender and
count columns.
Click in the first field of the Output column and
select the first output column that will hold the merge results.
- Click in the first field of the Function column
  and select the merge operation you want to perform.
- Click in the first field of the Input Column
  list and select the column from which the input values are to be taken.
- Click in the first field of the Rank column and
  select the column that will be used as a basis for the merge operation.
- Repeat the same process to define the parameters for the merge operation for all
  the columns you want to write in the output file.
  
  Here we want to read data from the firstname and
  gender input columns and write only the values with the
  maximum rank (row count) in firstname and
  gender output columns. We also want to read data from the
  count input column and write its maximum value in a
  count output column.
Double-click the tFileOutputExcel component to open
its Basic settings view.
Specify the path to the target file, select the Include
header check box, and leave the other settings as they are.

Executing the Job

Save your Job and press F6 to execute it.

A progress bar displays to show the percentage of the merge operation completed.
When the percentage progress bar reaches 100%, the specified data is regrouped and
written in the defined output columns.

The figure below illustrates a sample of the output data after the merge
operation.

Document get from Talend https://help.talend.com

Thank you for watching.

Docs 7.x

0 Comments

Inline Feedbacks

View all comments

tSurviveFields – Docs for ESB 7.x

tSurviveFields

tSurviveFields Standard properties

Basic settings

Advanced settings

Global Variables

Usage

Merging the content of several rows using different columns as rank
values

Setting up the Job

Configuring the components

Executing the Job

My Website Links

Tags

tSurviveFields

tSurviveFields Standard properties

Basic settings

Advanced settings

Global Variables

Usage

Merging the content of several rows using different columns as rank values

Setting up the Job

Configuring the components

Executing the Job

Merging the content of several rows using different columns as rank
values