
Warning
This component is available in the Palette of the Studio only if you have subscribed to the relevant edition of one of the Talend solutions with Big Data.
Component family |
FileScale |
Note that this component is deprecated. |
Function |
tFSPartitionFile has real-time capabilities for partitioning large data sets. |
|
Purpose |
Helps partition mass data before writing it to an output file. |
|
Basic settings |
Schema type and Edit Schema |
A schema is a row description: it defines the number of fields to be processed. Click Edit schema to make changes to the schema.
|
|
|
Repository: You have already |
|
|
Built-in: You create and store |
|
Property type |
Either Built-in or Repository. Since version 5.6, both the Built-In mode and the Repository mode are |
|
|
Built-in: No property data stored |
|
|
Repository: Select the repository |
|
Input File Name |
Name of the file holding the data you want to partition. |
|
Output File Name |
Name of the file where you want to write the partitioned data. Note: The generated set of output files is postfixed with an auto-increment number. |
|
Record separator (char) |
Character, string or regular expression to separate records |
|
Field separator (char) |
Character, string or regular expression to separate fields in a |
|
Header |
Number of records to be skipped at the beginning of the file. |
|
Footer |
Number of records to be skipped at the end of the file. |
|
Number of partitions |
Number of partitions in the file. |
|
Partition |
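Select from the list the partition method you want to use:
Round-robin: The records are distributed to the partitions in turn.
Hash: The records are hashed into partitions. Select the Partition Key check box of each column you want to use as a partition key. |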
Advanced settings |
Generate FSLang File |
Select this check box to generate the FSLang file corresponding to |
|
Assign FileScale Path |
Select this check box and then click the three-dot button next to |
|
Specify Number of Process Child |
Select this check box and enter the number of child processes to |
|
Sort results |
Select this check box to sort the results. |
|
Custom FileScale Parameter (separated by,) |
Enter the parameters for any specific operation you want to add to |
|
tStatCatcher Statistics |
Select this check box to gather the Job processing metadata at a |
Global Variables |
ERROR_MESSAGE: the error message generated by the component.
A Flow variable functions during the execution of a component while an After variable functions after the execution of the component.
To fill up a field or expression with a variable, press Ctrl + Space and select the variable from the list.
For further information about variables, see Talend Studio User Guide. |
|
Usage |
This component handles files therefore it does not require input |
|
Limitation |
Limitation is imposed by limits of physical memory and CPU |
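
The two partition methods listed under Partition can be illustrated with a minimal Python sketch. This is not FileScale code: the function names are hypothetical, and CRC32 is only an assumed stand-in for whatever hash the tool actually uses. It shows the general idea that round-robin depends on record position, whereas hash depends only on the values of the key fields.

```python
import zlib
from typing import Sequence


def round_robin_partition(record_index: int, num_partitions: int) -> int:
    """Assign records to partitions in turn: 0, 1, ..., n-1, 0, 1, ..."""
    return record_index % num_partitions


def hash_partition(record: Sequence[str], key_indexes: Sequence[int],
                   num_partitions: int) -> int:
    """Assign a record to a partition from a hash of its key fields.

    CRC32 is an illustrative assumption, not the hash used by FileScale.
    """
    # Join the key fields with a separator that is unlikely to occur in data.
    key = "\x1f".join(record[i] for i in key_indexes)
    return zlib.crc32(key.encode("utf-8")) % num_partitions


# Hypothetical sample records; the first two fields act as partition keys.
records = [
    ["1", "Ada", "Lovelace"],
    ["2", "Alan", "Turing"],
    ["1", "Ada", "Byron"],
]

for index, record in enumerate(records):
    print(record,
          "round-robin ->", round_robin_partition(index, 2),
          "hash ->", hash_partition(record, key_indexes=[0, 1], num_partitions=2))
```

With hash partitioning, records that share the same key values always land in the same partition, regardless of their order in the file.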
Warning
Make sure that you have unzipped and saved locally the FileScale executable file delivered by Talend. You must define the path of this executable file in the Advanced settings view of tFSPartitionFile.
This scenario describes a Job that uses the tFSPartitionFile
component to partition a very large data set at high speed, using the hash
method with two of the input columns as partition keys.
- Drop the following components from the Palette to the design workspace: tRowGenerator, tFileOutputDelimited and tFSPartitionFile.

- Connect tRowGenerator first to tFileOutputDelimited using a Row > Main link and then to tFSPartitionFile using an OnSubjobOk link.
In this scenario, the tRowGenerator component generates data according to the schema you define in its Basic settings view and sends it to the input file. If the data is generated without errors, the tFSPartitionFile component partitions the data into six subsets according to the defined partition method.
- Click tRowGenerator to display its Basic settings view and define the component properties.
- Click the three-dot button next to RowGenerator Editor to open the component editor where you can define your schema.

- In the upper half of the editor, click the plus button to add the columns you want to write in the input file.
- Define the schema and set the parameters of the columns.
In this scenario, the input file contains five columns: id, firstname, lastname, city and age.
Warning
Make sure to define the length of your columns. Otherwise, an error message is displayed when you execute your Job.
- If required, click the Preview tab in the lower half of the editor to display the corresponding view and then click the View button to display a sample of the generated data.
- Click OK to validate your schema and close the tRowGenerator editor.
- Click tFileOutputDelimited to display its Basic settings view and define the component properties.

- Set the tFileOutputDelimited properties.
For this scenario, we want to define a context variable for the input file path. You can create context variables in different ways. For more information about how to create and use context variables, see Talend Studio User Guide. In this example, we define the context variable directly from the component view.
- Place your pointer in the field that you want to parameterize, File Name in this example, and then press F5.
A dialog box displays.

- Give a name to this new variable and select its type from the Type list.
- In the Default value field, type in the context value you want to use for the input file path.
- Click Finish to validate your changes and close the dialog box. The newly created variable is displayed in the File Name field and in the Contexts view.
- In the Basic settings view of tFileOutputDelimited, click the Edit schema button to display the schema you defined in the editor and modify it if required.
- Click OK to close the dialog box.
- Click tFSPartitionFile to open its Basic settings view and define the component properties.

- Set schema and property type to Built-in.
- Click the Edit schema button to display a dialog box. Here you can define your column schema. This schema must correspond to the input file schema.
- Click OK to close the schema dialog box. The defined column schema displays in the Partition configuration table.
- Set the input and output file names using the context variable you defined earlier by pressing Ctrl + Space and selecting the variable from the list.
- Define the record and field separators and then the header and footer of the file, if any.
- In the Number of partitions field, enter the number of data subsets you want to create, six in this scenario.
- From the Partition list, select the partition method you want to use, Hash in this scenario.
- In the Partition configuration table, select the check boxes that correspond to the input columns you want to use as partition keys, id and firstname in this scenario.
- Click the Advanced settings tab to display the advanced settings view.

- Select the Assign FileScale Path check box to display the FileScale Path field and then browse to the executable file delivered by Talend.
- Save your Job and press F6 to execute it.
A progress bar displays below the tFSPartitionFile component in the design workspace to show the completed percentage of the operation. This progress bar shows how quickly the large input data is partitioned.
When the progress bar reaches 100%, the data is partitioned into six subsets as defined in the component settings and written to the defined output files, as shown in the capture below.

The generated set of the output files is postfixed with an auto-increment number
depending on the number of partitions you define, six in this scenario.
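
To make the result of this scenario concrete, the following Python sketch imitates what the Job produces: it reads a delimited file, skips header and footer records, hashes each record on its key columns and writes one numbered output file per partition. It is an illustration only, not the FileScale implementation. The function name `partition_file`, the CRC32 hash, the `_<n>` naming pattern with numbering from 0, and the sample data are all assumptions made for this example. The sketch also loads the whole file into memory, which a tool designed for very large files would avoid.

```python
import csv
import tempfile
import zlib
from pathlib import Path
from typing import List, Sequence


def partition_file(input_path: Path, output_path: Path, num_partitions: int,
                   key_columns: Sequence[int], field_separator: str = ";",
                   header: int = 0, footer: int = 0) -> List[Path]:
    """Hash-partition a delimited file into numbered output files."""
    with input_path.open(newline="", encoding="utf-8") as source:
        rows = list(csv.reader(source, delimiter=field_separator))
    # Skip leading and trailing records, like the Header and Footer settings.
    rows = rows[header:len(rows) - footer]

    # Assumed naming scheme: <stem>_<n><suffix>, with n starting at 0.
    paths = [output_path.with_name(f"{output_path.stem}_{n}{output_path.suffix}")
             for n in range(num_partitions)]
    buckets: List[List[List[str]]] = [[] for _ in range(num_partitions)]
    for row in rows:
        # CRC32 of the joined key fields is an illustrative hash choice.
        key = "\x1f".join(row[i] for i in key_columns)
        buckets[zlib.crc32(key.encode("utf-8")) % num_partitions].append(row)

    for path, bucket in zip(paths, buckets):
        with path.open("w", newline="", encoding="utf-8") as target:
            csv.writer(target, delimiter=field_separator).writerows(bucket)
    return paths


# Usage with made-up sample data: six partitions keyed on id and firstname.
with tempfile.TemporaryDirectory() as tmp:
    source_file = Path(tmp) / "input.csv"
    source_file.write_text(
        "id;firstname;lastname;city;age\n"
        "1;Ada;Lovelace;London;36\n"
        "2;Alan;Turing;London;41\n"
        "3;Grace;Hopper;New York;85\n",
        encoding="utf-8",
    )
    outputs = partition_file(source_file, Path(tmp) / "output.csv",
                             num_partitions=6, key_columns=[0, 1], header=1)
    print([path.name for path in outputs])
```

Because the partition index depends only on the key values, some of the six files may be empty when the input contains few distinct keys.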