August 17, 2023

tFSSort – Docs for ESB 5.x

tFSSort


Warning

This component is available in the Palette of the Studio only if you have
subscribed to the relevant edition of one of the Talend solutions with Big
Data.

tFSSort Properties

Component family

FileScale

Note that this component is deprecated.

Function

tFSSort sorts input data by sort
type and order based on one or several columns. This component can
sort large scale files at very high speed.

Purpose

tFSSort helps create metrics
and classification tables.

Basic settings

Schema type and Edit
Schema

A schema is a row description. It defines the number of fields to be processed and
passed on to the next component. The schema is either Built-in or stored remotely in the
Repository.

Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:

  • View schema: choose this option to view the
    schema only.

  • Change to built-in property: choose this option
    to change the schema to Built-in for local
    changes.

  • Update repository connection: choose this option to change
    the schema stored in the repository and decide whether to propagate the changes to
    all the Jobs upon completion. If you just want to propagate the changes to the
    current Job, you can select No upon completion and
    choose this schema metadata again in the [Repository Content] window.

 

 

Repository: You have already
created the schema and stored it in the Repository. You can reuse it
in various projects and Job flowcharts. Related topic: see Talend Studio User Guide.

 

 

Built-in: You create and store
the schema locally for this component only. Related topic: see
Talend Studio User Guide.

 

Property type

Either Built-in or Repository.

Since version 5.6, both the Built-In mode and the Repository mode are
available in any of the Talend solutions.

 

 

Built-in: No property data stored
centrally.

 

 

Repository: Select the repository
file where Properties are stored. The fields that follow are
pre-filled in using the fetched data.

 

Input File Name

Name of the file holding the data you want to sort.

 

Output File Name

Name of the file where you want to write the sorted data.

 

Record separator (char)

Character, string or regular expression to separate records
(lines).

 

Field separator (char)

Character, string or regular expression to separate fields in a
record.

 

Header

Number of records to be skipped at the beginning of the
file.

 

Footer

Number of records to be skipped at the end of the file.
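
As a rough illustration of how the separator, Header and Footer settings interact, the sketch below splits raw text into records, drops the leading and trailing records, and splits what remains into fields. It is not FileScale's implementation: the class and method names are hypothetical, and the separators are treated as literal strings rather than regular expressions to keep the example short.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class RecordReaderSketch {

    /**
     * Splits raw text into records, skips header/footer records,
     * and splits each remaining record into fields.
     * Separators are taken literally here (an assumption of this sketch).
     */
    static List<String[]> parse(String content, String recordSep, String fieldSep,
                                int header, int footer) {
        // Pattern.quote makes the separators literal instead of regex patterns.
        String[] records = content.split(Pattern.quote(recordSep));
        List<String[]> rows = new ArrayList<>();
        // Keep only the records between the header and the footer.
        for (int i = header; i < records.length - footer; i++) {
            // Limit -1 keeps trailing empty fields.
            rows.add(records[i].split(Pattern.quote(fieldSep), -1));
        }
        return rows;
    }

    public static void main(String[] args) {
        // Hypothetical sample: one header record, two data records, one footer record.
        String content = "id;city\n1;Paris\n2;Austin\ntotal;2";
        for (String[] row : parse(content, "\n", ";", 1, 1)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

With Header and Footer both set to 1, only the two data records in the sample are kept.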

 

Criteria

Click the plus button to add as many lines as required for the
sort to be complete.

Schema column: Click in the cell
and select the column label from your schema, which the sort will be
based on.

Note

The order is essential as it determines the sorting
priority.

Sort type: Click in the cell and
select the sort type: numerical or alphabetical.

Order type: Click in the cell and
select the order type: ascending or descending.
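
The effect of the Criteria table can be sketched in plain Java: each line becomes a comparator, and the comparators are chained in table order so that earlier lines take priority. This is only an illustration of the semantics described above. The Criterion class and column indexes are hypothetical, and Java's default string ordering may differ from the collation FileScale uses for alphabetical sorts.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class CriteriaSketch {

    /** One line of a criteria table: column index, sort type, order type. */
    static final class Criterion {
        final int column;
        final boolean numeric;     // true = numerical, false = alphabetical
        final boolean descending;  // true = descending, false = ascending

        Criterion(int column, boolean numeric, boolean descending) {
            this.column = column;
            this.numeric = numeric;
            this.descending = descending;
        }
    }

    /** Chains the criteria in list order, so earlier lines take priority. */
    static Comparator<String[]> toComparator(List<Criterion> criteria) {
        Comparator<String[]> result = (a, b) -> 0;
        for (Criterion c : criteria) {
            Comparator<String[]> next;
            if (c.numeric) {
                next = Comparator.comparingDouble((String[] r) -> Double.parseDouble(r[c.column]));
            } else {
                next = Comparator.comparing((String[] r) -> r[c.column]);
            }
            if (c.descending) {
                next = next.reversed();
            }
            result = result.thenComparing(next);
        }
        return result;
    }

    /** Returns a sorted copy of the rows; the input list is left untouched. */
    static List<String[]> sort(List<String[]> rows, List<Criterion> criteria) {
        List<String[]> sorted = new ArrayList<>(rows);
        sorted.sort(toComparator(criteria));
        return sorted;
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
                new String[] {"Paris", "10"},
                new String[] {"Austin", "9"},
                new String[] {"Paris", "9"});
        // Priority 1: column 0, alphabetical, ascending.
        // Priority 2: column 1, numerical, ascending (9 sorts before 10).
        List<Criterion> criteria = Arrays.asList(
                new Criterion(0, false, false),
                new Criterion(1, true, false));
        for (String[] row : sort(rows, criteria)) {
            System.out.println(String.join(";", row));
        }
    }
}
```

Choosing the numerical sort type matters for the second column: compared as text, "10" would sort before "9".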

Advanced settings

Generate FSLang File

Select this check box to generate the FSLang file corresponding to
your Job and click the three-dot button next to the FSLang File Name field to specify its
path and its name.

 

Assign FileScale Path

Select this check box and then click the three-dot button next to
the FileScale Path field to select
the FileScale program executable file required to execute the
component.

 

Specify Number of Process Child

Select this check box and enter the number of child processes to
use for carrying out the sort.

 

Custom FileScale Parameter (separated by,)

Enter the parameters for any specific operation you want to add to
the FileScale executable call.

 

tStatCatcher Statistics

Select this check box to gather the Job processing metadata at a
Job level as well as at each component level.

Global Variables

ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable and it returns a string. If
the component has a Die on error check box, this variable
functions only when that check box is cleared.

A Flow variable functions during the execution of a component while an After variable
functions after the execution of the component.

To fill a field or expression with a variable, press Ctrl + Space
to access the variable list and choose the variable to use from it.

For further information about variables, see Talend Studio
User Guide.
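
As a hedged illustration, the sketch below shows how an After variable such as ERROR_MESSAGE is typically read in Java code within a Job, for example in a tJava component placed after tFSSort. It assumes the usual key pattern of component name plus variable name, and a component named tFSSort_1; both should be checked against the variable list in your Studio. The map in main is only a stand-in for the globalMap that a generated Job provides.

```java
import java.util.HashMap;
import java.util.Map;

class GlobalVariableSketch {

    /** Reads a component's ERROR_MESSAGE; returns null if none was set. */
    static String errorMessageOf(Map<String, Object> globalMap, String componentName) {
        // Assumed key pattern: "<component name>_<VARIABLE NAME>".
        return (String) globalMap.get(componentName + "_ERROR_MESSAGE");
    }

    public static void main(String[] args) {
        // Stand-in for the globalMap of a generated Job (assumption of this sketch).
        Map<String, Object> globalMap = new HashMap<>();
        // Hypothetical value, set here only so the example prints something.
        globalMap.put("tFSSort_1_ERROR_MESSAGE", "sample error text");

        String message = errorMessageOf(globalMap, "tFSSort_1");
        System.out.println(message != null ? "Sort failed: " + message : "No error reported");
    }
}
```

Inside a Job, the equivalent one-line expression would be `((String) globalMap.get("tFSSort_1_ERROR_MESSAGE"))`, again assuming the component is named tFSSort_1.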

Usage

This component handles files; therefore, it does not require input
and output data flows. It is used to sort data in large scale
files.

Limitation

Limitations are imposed by physical memory and CPU
architecture. For example, the total length of the processed files cannot
exceed the file system limit for LargeFile support (the maximum value of a
signed 64-bit integer).
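
To put a number on that limit: assuming it refers to a length in bytes held in a signed 64-bit integer, the ceiling is 2^63 - 1, which Java exposes as Long.MAX_VALUE. The class name below is hypothetical.

```java
class LargeFileLimitSketch {

    /** Largest value a signed 64-bit integer can hold: 2^63 - 1. */
    static long maxSigned64() {
        return Long.MAX_VALUE;
    }

    public static void main(String[] args) {
        // Interpreted as a byte count, this is just under 8 EiB (1 EiB = 2^60 bytes).
        System.out.println(maxSigned64() + " bytes");
    }
}
```

In practice the available disk space and the file system's own maximum file size are reached long before this value.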

Scenario: Sorting entries in a large scale file

Warning

Make sure that you have unzipped and saved locally the FileScale
executable file delivered by Talend. You must define the path of this
executable file in the Advanced settings view of tFSSort.

This scenario describes a Job that quickly sorts a large amount of data in a
large scale file according to two defined columns.

In this scenario, we have already stored the input schemas of the large input file in
the repository. For more information about storing schema metadata in the Repository
tree view, see Talend Studio User Guide.

The input file contains 10 columns: id, surname, firstname, zipcode, city,
dateofbirth, streetname, streetnr, statecode, and state.

Use_Case_tFSSort1.png
  • In the Repository tree view, expand Metadata and the file node where you have stored the
    input schemas and drop the relevant metadata onto the design workspace.

    The [Component] dialog box displays.

Use_Case_tFSSort2.png
  • Select tFSSort from the list and click
    OK to close the dialog box.

    The tFSSort component displays in the
    workspace.

  • Double-click tFSSort to display its Basic settings view.

Use_Case_tFSSort.png

All tFSSort property fields are automatically filled
in. If you did not store your input schemas in the repository, fill in the
details manually after selecting Built-in in the
Schema Type and Property Type fields.

  • In the Output File Name field, browse to the
    file in which you want to write the sorted data.

  • In the Criteria table, click the plus button
    to add columns to the list and then select the schema columns you want to use as
    base for the sorting operation.

    In this scenario, we want to sort the data according to the
    city and surname columns. We want
    to sort the data alphabetically in ascending order.

  • In the Sort type and Order type columns, click in the cell and select alpha and asc for
    the sort type and order type respectively.

  • Click the Advanced settings tab to display
    the advanced settings view and then select the Assign FileScale Path
    check box to display the FileScale Path field and browse to the FileScale
    executable file.

  • Save your Job and press F6 to execute
    it.

Use_Case_tFSSort3.png

A progress bar displays below the tFSSort component
in the design workspace to show the completed percentage of the sorting operation. This
progress bar shows how quickly the large input file is being sorted.

When the progress bar reaches 100%, the data, sorted first by
city name and then by surname, is written to the defined output file.
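
One way to spot-check the result of this scenario is to confirm that the output lines are ordered by city and then by surname. The sketch below does this for lines already read into memory. The column positions follow the schema order listed in the scenario (surname second, city fifth); the field separator, the sample records, and the use of Java's default string ordering are assumptions that may not match your file or FileScale's collation.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Pattern;

class SortedOutputCheck {

    // Zero-based positions in the scenario's schema:
    // id, surname, firstname, zipcode, city, ...
    static final int SURNAME = 1;
    static final int CITY = 4;

    /** Returns true if the lines are in ascending city, then surname order. */
    static boolean isSorted(List<String> lines, String fieldSep) {
        Comparator<String[]> byCityThenSurname = Comparator
                .comparing((String[] r) -> r[CITY])
                .thenComparing((String[] r) -> r[SURNAME]);
        String[] previous = null;
        for (String line : lines) {
            String[] current = line.split(Pattern.quote(fieldSep), -1);
            // Any record that sorts before its predecessor breaks the order.
            if (previous != null && byCityThenSurname.compare(previous, current) > 0) {
                return false;
            }
            previous = current;
        }
        return true;
    }

    public static void main(String[] args) {
        // Hypothetical records, truncated to the first five columns.
        List<String> lines = Arrays.asList(
                "1;Adams;Ann;73301;Austin",
                "2;Baker;Bob;73301;Austin",
                "3;Abel;Cy;75001;Paris");
        System.out.println(isSorted(lines, ";"));
    }
}
```

For a very large output file, the same comparison could be applied while streaming the file line by line instead of loading it into a list.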


Document obtained from Talend: https://help.talend.com