August 17, 2023

tFSPartitionFile – Docs for ESB 5.x




This component will be available in the Palette of the studio on the condition that you have
subscribed to the relevant edition of one of the Talend solutions
with Big Data.

tFSPartitionFile Properties

Component family


Note that this component is deprecated.


tFSPartitionFile enables you to
partition mass data from an input file based on the hash or
round-robin partitioning method before writing it to an output file.
This facilitates the management of very large tables.

This component has real-time capabilities for partitioning large
scale files. To optimize performance, the component usually sorts
data before processing it.


Helps partition mass data before writing it to an output file.

Basic settings

Schema type and Edit

A schema is a row description: it defines the number of fields to be processed and
passed on to the next component. The schema is either Built-in or stored remotely in the
Repository.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
  • View schema: choose this option to view the
    schema only.

  • Change to built-in property: choose this option
    to change the schema to Built-in for local changes.

  • Update repository connection: choose this option to change
    the schema stored in the repository and decide whether to propagate the changes to
    all the Jobs upon completion. If you just want to propagate the changes to the
    current Job, you can select No upon completion and
    choose this schema metadata again in the [Repository



Repository: You have already
created the schema and stored it in the Repository. You can reuse it
in various projects and Job flowcharts. Related topic: see Talend Studio User Guide.



Built-in: You create and store
the schema locally for this component only. Related topic: see
Talend Studio User Guide.


Property type

Either Built-in or Repository.

Since version 5.6, both the Built-In mode and the Repository mode are
available in any of the Talend solutions.



Built-in: No property data stored



Repository: Select the repository
file where Properties are stored. The fields that follow are
pre-filled using the fetched data.


Input File Name

Name of the file holding the data you want to partition.


Output File Name

Name of the file where you want to write the partitioned data.


The generated set of the output files will be postfixed with
an auto-increment number depending on the number of partitions
you define.


Record separator (char)

Character, string or regular expression to separate records


Field separator (char)

Character, string or regular expression to separate fields.



Number of records to be skipped at the beginning of the file.



Number of records to be skipped at the end of the file.
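
The separator and header/footer settings can be pictured with a minimal Python sketch. It is an illustration only, not the FileScale implementation: the function name split_records and the sample data are hypothetical, and the separators are treated as literal strings rather than regular expressions.

```python
def split_records(text, record_sep="\n", field_sep=";", header=0, footer=0):
    """Split raw text into rows of fields, skipping header and footer records."""
    records = text.split(record_sep)
    # A trailing record separator leaves an empty last element; drop it.
    if records and records[-1] == "":
        records.pop()
    # Never let the footer cut back past the header.
    end = max(header, len(records) - footer)
    return [record.split(field_sep) for record in records[header:end]]


# Hypothetical sample: one header record, two data records, one footer record.
sample = "id;city\n1;Paris\n2;Lyon\n-- end of file --\n"
parsed = split_records(sample, header=1, footer=1)
print(parsed)  # [['1', 'Paris'], ['2', 'Lyon']]
```

Only the records between the skipped header and footer are left to be partitioned.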


Number of partitions

Number of partitions in the file.



Select from the list the partition method you want to use:

Round-robin: The records are
partitioned on a round-robin basis so that each partition contains a
more or less equal number of rows and load balancing is achieved.
Because there is no partition key, rows are assigned to the partitions
in turn, regardless of their content.

Hash: The records are hashed into
partitions based on the value of a key column or columns selected
from the file schema.

Select the Partition Key check
box that corresponds to the column(s) you want to use as a base to
partition data.
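
The difference between the two methods can be sketched in a few lines of Python. This is an illustration of the general techniques only: the function names are hypothetical, and CRC32 is an assumed stand-in for the hash function, which this page does not specify.

```python
import zlib


def round_robin_partition(rows, n_partitions):
    """Assign rows to partitions in turn: row 0 to partition 0, row 1 to partition 1, and so on."""
    partitions = [[] for _ in range(n_partitions)]
    for index, row in enumerate(rows):
        partitions[index % n_partitions].append(row)
    return partitions


def hash_partition(rows, n_partitions, key_columns):
    """Assign each row to the partition given by a hash of its key column values."""
    partitions = [[] for _ in range(n_partitions)]
    for row in rows:
        key = "\x1f".join(str(row[column]) for column in key_columns)
        # CRC32 is stable across runs, unlike Python's built-in hash() for strings.
        bucket = zlib.crc32(key.encode("utf-8")) % n_partitions
        partitions[bucket].append(row)
    return partitions


names = ["Ann", "Bob", "Cy", "Dee", "Ann", "Bob", "Eve"]
rows = [{"id": i, "firstname": name} for i, name in enumerate(names)]

rr = round_robin_partition(rows, 3)
hp = hash_partition(rows, 3, ["firstname"])
print([len(p) for p in rr])  # [3, 2, 2]
```

Round-robin keeps partition sizes within one row of each other, while hashing guarantees that rows sharing the same key values always land in the same partition, at the cost of possibly uneven sizes.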

Advanced settings

Generate FSLang File

Select this check box to generate the FSLang file corresponding to
your Job and click the three-dot button next to the FSLang File Name field to specify its
path and its name.


Assign FileScale Path

Select this check box and then click the three-dot button next to
the FileScale Path field to select
the FileScale program executable file that the component requires.

Specify Number of Process Child

Select this check box and enter the number of child processes to
use for carrying out the operation.


Sort results

Select this check box to sort the results.


Custom FileScale Parameter (separated by,)

Enter the parameters for any specific operation you want to add to
the FileScale executable call.


tStatCatcher Statistics

Select this check box to gather the Job processing metadata at a
Job level as well as at each component level.

Global Variables

ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable and it returns a string. This
variable functions only if the Die on error check box is
cleared, if the component has this check box.

A Flow variable functions during the execution of a component while an After variable
functions after the execution of the component.

To fill in a field or expression with a variable, press Ctrl + Space
to access the variable list and choose the variable to use from it.

For further information about variables, see Talend Studio
User Guide.


This component handles files; therefore, it does not require input
and output data flows. It is used to partition data in large-scale files.


Limitations are imposed by physical memory and CPU
architecture. For example, the total length of the processed files cannot
exceed the file system limit for LargeFile support (the maximum value of a
signed 64-bit integer).
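
For a sense of scale, the following Python sketch works out that limit. The helper name within_large_file_limit is hypothetical and only illustrates the arithmetic, not how FileScale checks sizes.

```python
MAX_SIGNED_64 = 2**63 - 1  # largest value of a signed 64-bit integer


def within_large_file_limit(file_sizes):
    """Return True if the combined size in bytes fits in a signed 64-bit integer."""
    return sum(file_sizes) <= MAX_SIGNED_64


print(MAX_SIGNED_64)  # 9223372036854775807
print(within_large_file_limit([5 * 2**40, 3 * 2**40]))  # True
```

The limit is just under 8 EiB, so in practice physical memory and CPU are likely to be the tighter constraints.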

Scenario: Partitioning mass data based on the hash method before writing it to an
output file


Make sure that you have unzipped and saved locally the FileScale
executable file delivered by Talend. You must define the path of this
executable file in the Advanced settings view of tFSPartitionFile.

This scenario describes a Job that uses the tFSPartitionFile
component to partition a very large data set at high speed, using the hash
method with two of the input columns as partition keys.

  • Drop the following components from the Palette to the design workspace: tRowGenerator, tFileOutputDelimited and tFSPartitionFile.

  • Connect tRowGenerator first to tFileOutputDelimited using a Row > Main link and then to tFSPartitionFile using an OnSubjobOk link.

In this scenario, the tRowGenerator component
generates data according to the schema you define in the component Basic settings view and sends it to the input file. If the
data is generated without errors, the tFSPartitionFile component
partitions the data into six subsets according to the defined partition method.

  • Click tRowGenerator to display its Basic settings view and define the component
    properties.
  • Click the three-dot button next to RowGenerator
    to open the component editor where you can define your
    schema.
  • In the upper half of the editor, click the plus button to add the columns you
    want to write in the input file.

  • Define the schema and set the parameters of the columns.

    In this scenario, the input file contains five columns:
    id, firstname,
    lastname, city and


Make sure to define the length of your columns. Otherwise, an error
message will display when executing your Job.

  • If required, click the Preview tab in the
    lower half of the editor to display the corresponding view and then click the
    View button to display a sample of the
    generated data.

  • Click OK to validate your schema and close
    the tRowGenerator editor.

  • Click tFileOutputDelimited to display its
    Basic settings view and define the
    component properties.

  • Set the tFileOutputDelimited properties as required.

For this scenario, we want to define a context variable for the input file path. You
can create context variables in different ways. For more information about how to create
and use context variables, see Talend Studio User Guide. In
this example, we define the context variable directly from the component
view.
  • Place your pointer in the field that you want to parameterize, File Name in this example, and then press F5.

    A dialog box displays.

  • Give a name to this new variable and select its type from the Type list.

  • In the Default value field, type in the
    context value you want to use for the input file path.

  • Click Finish to validate your changes and
    close the dialog box.

    The newly created variable is displayed in the File Name
    field and in the Contexts view.
  • In the Basic settings view of tFileOutputDelimited, click the Edit schema button to display the schema you defined in the
    editor and modify it if required.

  • Click OK to close the dialog box.

  • Click tFSPartitionFile to open its Basic settings view and define the component
    properties.
  • Set schema and property type to Built-In.

  • Click the Edit schema button to display a
    dialog box. Here you can define your column schema. This schema must correspond
    to the input file schema.

  • Click OK to close the schema dialog box.

    The defined column schema displays in the Partition configuration table.

  • Set the input and output file names using the context variable you defined
    earlier by pressing Ctrl + Space and selecting
    the variable from the list.

  • Define the record and field separators and then the header and footer of the
    file, if any.

  • In the Number of partitions field, enter the
    number of data subsets you want to create, six in this scenario.

  • From the Partition list, select the partition
    method you want to use, Hash in this scenario.
  • In the Partition configuration table, select
    the check boxes that correspond to the input columns you want to use as
    partition keys, id and firstname in
    this scenario.

  • Click the Advanced settings tab to display
    the advanced settings view.

  • Select the Assign FileScale Path check box to
    display the FileScale Path field and then
    browse to the executable file delivered by Talend.

  • Save your Job and press F6 to execute it.

A progress bar displays below the tFSPartitionFile
component in the design workspace to show the completed percentage of the operation.
It gives a direct indication of how quickly the large input file is being partitioned.

When the progress bar reaches 100%, the data has been partitioned into six
subsets as defined in the component settings and written to the defined output files.


The generated set of the output files is postfixed with an auto-increment number
depending on the number of partitions you define, six in this scenario.
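
The outcome of this scenario can be approximated outside the Studio with a short Python sketch. It is not how FileScale works internally. Everything here is an assumption made for illustration: the partition_file helper, the CRC32 hash, the header row in the input file, the semicolon separator, the generated sample values, and the output_0.csv to output_5.csv naming pattern. Only four of the five scenario columns are used, because the fifth is not named on this page.

```python
import csv
import os
import tempfile
import zlib


def partition_file(input_path, output_path, n_partitions, key_columns, field_sep=";"):
    """Hash-partition a delimited file with a header row into numbered output files."""
    base, ext = os.path.splitext(output_path)
    # Assumed naming pattern: output.csv becomes output_0.csv, output_1.csv, ...
    paths = [f"{base}_{i}{ext}" for i in range(n_partitions)]
    files = [open(p, "w", newline="", encoding="utf-8") for p in paths]
    try:
        writers = [csv.writer(f, delimiter=field_sep) for f in files]
        with open(input_path, newline="", encoding="utf-8") as src:
            reader = csv.reader(src, delimiter=field_sep)
            header = next(reader)
            key_idx = [header.index(column) for column in key_columns]
            for writer in writers:
                writer.writerow(header)
            for row in reader:
                key = "\x1f".join(row[i] for i in key_idx)
                bucket = zlib.crc32(key.encode("utf-8")) % n_partitions
                writers[bucket].writerow(row)
    finally:
        for f in files:
            f.close()
    return paths


# Build a small input file standing in for the tRowGenerator output.
workdir = tempfile.mkdtemp()
in_path = os.path.join(workdir, "input.csv")
with open(in_path, "w", newline="", encoding="utf-8") as f:
    w = csv.writer(f, delimiter=";")
    w.writerow(["id", "firstname", "lastname", "city"])
    for i in range(60):
        w.writerow([i, f"first{i % 7}", f"last{i}", f"city{i % 3}"])

out_paths = partition_file(in_path, os.path.join(workdir, "output.csv"), 6, ["id", "firstname"])
print([os.path.basename(p) for p in out_paths])
```

Each input row ends up in exactly one of the six files, chosen by the hash of its id and firstname values.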

Document get from Talend