August 17, 2023

tFSUnique – Docs for ESB 5.x

tFSUnique


Warning

This component is available in the Palette of the Studio only if you
have subscribed to the relevant edition of one of the Talend solutions
with Big Data.

tFSUnique Properties

Component family

FileScale

Note that this component is deprecated.

Function

tFSUnique makes the records of a file
unique by retrieving one occurrence of each record from the input
file. It can process large-scale files at high speed so that the
output file contains only unique records. To optimize performance,
the component usually sorts the data before processing it.
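
The sort-then-deduplicate idea described above can be sketched in plain Java. This is only an illustration of the general technique, not the FileScale implementation: the class name, the method name and the sample records are hypothetical, and the sketch holds all records in memory, which the real component is designed to avoid for large files.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class SortThenUnique {
    // Hypothetical helper: sort the records, then keep a record only
    // when it differs from the record immediately before it.
    static List<String> uniqueRecords(List<String> records) {
        List<String> sorted = new ArrayList<>(records);
        Collections.sort(sorted);
        List<String> unique = new ArrayList<>();
        String previous = null;
        for (String line : sorted) {
            if (!line.equals(previous)) {
                unique.add(line);
            }
            previous = line;
        }
        return unique;
    }

    public static void main(String[] args) {
        // Made-up sample records, one per line of an imaginary input file.
        List<String> input = Arrays.asList("b;2", "a;1", "b;2", "c;3", "a;1");
        System.out.println(uniqueRecords(input)); // prints [a;1, b;2, c;3]
    }
}
```

Because duplicates are adjacent after sorting, each record only needs to be compared with its predecessor.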

Purpose

tFSUnique produces an output
file that contains no duplicate records.

Basic settings

Schema type and Edit
Schema

A schema is a row description: it defines the number of fields to be processed and
passed on to the next component. The schema is either Built-in or stored remotely in the
Repository.

Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:

  • View schema: choose this option to view the
    schema only.

  • Change to built-in property: choose this option
    to change the schema to Built-in for local
    changes.

  • Update repository connection: choose this option to change
    the schema stored in the repository and decide whether to propagate the changes to
    all the Jobs upon completion. If you just want to propagate the changes to the
    current Job, you can select No upon completion and
    choose this schema metadata again in the [Repository Content] window.

 

 

Repository: You have already
created the schema and stored it in the Repository. You can reuse it
in various projects and Job flowcharts. Related topic: see Talend Studio User Guide.

 

 

Built-in: You create and store
the schema locally for this component only. Related topic: see
Talend Studio User Guide.

 

Property type

Either Built-in or Repository.

Since version 5.6, both the Built-In mode and the Repository mode are
available in any of the Talend solutions.

 

 

Built-in: No property data stored
centrally.

 

 

Repository: Select the repository
file where Properties are stored. The fields that follow are
pre-filled using the fetched data.

 

Input File Name

Name of the file you want to rewrite with unique records.

 

Output File Name

Name of the file where you want to write the modified data.

 

Record separator (char)

Character, string or regular expression to separate records
(lines).

 

Field separator (char)

Character, string or regular expression to separate fields in a
record.

 

Header

Number of records to be skipped at the beginning of the
file.

 

Footer

Number of records to be skipped at the end of the file.

 

Unique key

Column: list of the columns in the
schema.

Key attribute: select the check box
next to each column you want to use as a key attribute. The
component keeps only one occurrence of each record for the
selected key column(s).

Case sensitive: select this check
box to distinguish between upper and lower case.
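
How the Key attribute and Case sensitive options interact can be illustrated with a short Java sketch. Everything in it is an assumption made for illustration: the class and method names, the zero-based column indexes, the sample data, and in particular the choice to keep the first occurrence of each key, since this page does not say which occurrence the component retains.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Pattern;

public class UniqueByKey {
    // Hypothetical helper: keep the first record seen for each distinct key.
    // keyColumns holds zero-based indexes of the columns ticked as key attributes.
    static List<String> uniqueByKey(List<String> records, String fieldSeparator,
                                    int[] keyColumns, boolean caseSensitive) {
        Set<List<String>> seenKeys = new HashSet<>();
        List<String> kept = new ArrayList<>();
        for (String line : records) {
            // Quote the separator so it is treated literally; -1 keeps trailing empty fields.
            String[] fields = line.split(Pattern.quote(fieldSeparator), -1);
            List<String> key = new ArrayList<>();
            for (int col : keyColumns) {
                String value = fields[col];
                key.add(caseSensitive ? value : value.toLowerCase(Locale.ROOT));
            }
            if (seenKeys.add(key)) {
                kept.add(line);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Made-up records: id;name, with the name column (index 1) as the key.
        List<String> input = Arrays.asList("1;Smith", "2;smith", "3;Jones", "4;Smith");
        System.out.println(uniqueByKey(input, ";", new int[] {1}, true));
        // prints [1;Smith, 2;smith, 3;Jones]
        System.out.println(uniqueByKey(input, ";", new int[] {1}, false));
        // prints [1;Smith, 3;Jones]
    }
}
```

With case sensitivity on, "Smith" and "smith" count as different keys; with it off, they collapse into one.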

Advanced settings

Generate FSLang File

Select this check box to generate the FSLang file corresponding to
your Job and click the three-dot button next to the FSLang File Name field to specify its
path and its name.

 

Assign FileScale Path

Select this check box and then click the three-dot button next to
the FileScale Path field to select
the FileScale program executable file required to run the
component.

 

Specify Number of Process Child

Select this check box and enter the number of child processes to
use for carrying out the operation.

 

Custom FileScale Parameter (separated by,)

Enter the parameters for any specific operation you want to add to
the FileScale executable call.

 

tStatCatcher Statistics

Select this check box to gather the Job processing metadata at a
Job level as well as at each component level.

Global Variables

ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable and it returns a string. This
variable functions only if the Die on error check box is
cleared, if the component has this check box.

A Flow variable functions during the execution of a component while an After variable
functions after the execution of the component.

To fill a field or expression with a variable, press Ctrl + Space
to access the variable list and choose the variable to use from it.

For further information about variables, see Talend Studio
User Guide.
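
As a rough illustration of reading an After variable, the sketch below simulates the Job's globalMap with an ordinary HashMap. It assumes the usual Talend key convention of component name followed by the variable name, and that the component is labelled tFSUnique_1 in the Job; the class name, the helper method and the error text are hypothetical. Inside a real Job, the expression typically typed into a field would be ((String)globalMap.get("tFSUnique_1_ERROR_MESSAGE")).

```java
import java.util.HashMap;
import java.util.Map;

public class ErrorMessageLookup {
    // Hypothetical helper: read <componentName>_ERROR_MESSAGE from the map,
    // returning an empty string when no error was recorded.
    static String errorMessageOf(Map<String, Object> globalMap, String componentName) {
        Object value = globalMap.get(componentName + "_ERROR_MESSAGE");
        return value == null ? "" : (String) value;
    }

    public static void main(String[] args) {
        // Stand-in for the globalMap that a Job provides; the entry is made up.
        Map<String, Object> globalMap = new HashMap<>();
        globalMap.put("tFSUnique_1_ERROR_MESSAGE", "Input file not found");

        System.out.println(errorMessageOf(globalMap, "tFSUnique_1")); // prints Input file not found
    }
}
```

Since ERROR_MESSAGE is an After variable, it is only meaningful in a component that runs after tFSUnique has finished.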

Usage

This component works directly on files, so it does not require input
or output data flows. It is used to process large-scale
files.

Limitation

Limits are imposed by physical memory and CPU
architecture. For example, the number of duplicate records
cannot exceed 4194303 (18014398509481983 for the 64-bit version).
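
The two limits quoted above equal 2^22 - 1 and 2^54 - 1 respectively. The short Java check below only confirms that arithmetic; it makes no claim about how FileScale stores its counters internally, and the class and method names are hypothetical.

```java
public class DuplicateLimits {
    // Largest value representable with the given number of bits: 2^bits - 1.
    static long maxOfBits(int bits) {
        return (1L << bits) - 1;
    }

    public static void main(String[] args) {
        System.out.println(maxOfBits(22)); // prints 4194303
        System.out.println(maxOfBits(54)); // prints 18014398509481983
    }
}
```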

Related Scenarios

For related scenarios, see: Scenario: Combining filtering and sorting processes in a large scale file.


Document obtained from Talend: https://help.talend.com