
Warning
This component will be available in the Palette of the studio on the condition that you have
subscribed to the relevant edition of one of the Talend solutions
with Big Data.
Component family |
FileScale |
Note that this component is deprecated. |
Function |
tFSFilterColumns makes specified |
|
Purpose |
Helps homogenizing schemas in a file either by removing unwanted |
|
Basic settings |
Schema type and Edit |
A schema is a row description. It defines the number of fields to Click Edit schema to make changes to the schema. If the
|
|
|
Repository: You have already |
|
|
Built-in: You create and store |
|
Property type |
Either Built-in or Repository. Since version 5.6, both the Built-In mode and the Repository mode are |
|
|
Built-in: No property data stored |
|
|
Repository: Select the repository |
|
Input File Name |
Name of the file holding the data you want to filter. |
|
Output File Name |
Name of the file where you want to write the filtered data. |
|
Record separator (char) |
Character, string or regular expression to separate records |
|
Field separator (char) |
Character, string or regular expression to separate fields in a |
|
Header |
Number of records to be skipped in the beginning of the |
|
Footer |
Number of records to be skipped at the end of the file. |
|
Column configuration |
Column: List of the columns to be
Keep it: Select the check box next |
Advanced settings |
Generate FSLang File |
Select this check box to generate the FSLang file corresponding to |
|
Assign FileScale Path |
Select this check box and then click the three-dot button next to |
|
Specify Number of Process Child |
Select this check box and enter the number of child processes to |
|
Sort results |
Select this check box to sort the results. |
|
Custom FileScale Parameter (separated by,) |
Enter the parameters for any specific operation you want to add to |
|
tStatCatcher Statistics |
Select this check box to gather the job processing metadata at a |
Global Variables |
ERROR_MESSAGE: the error message generated by the A Flow variable functions during the execution of a component while an After variable To fill up a field or expression with a variable, press Ctrl + For further information about variables, see Talend Studio |
|
Usage |
This component handles files therefore it does not require input |
|
Limitation |
Limitation is imposed by limits of physical memory and CPU |
Warning
Make sure that you have unzipped and saved locally the FileScale
executable file delivered by
Talend
. You must define the path of this executable file in the
Advanced settings
view of each of the FileScale components.
This scenario describes a Job that uses several FileScale components which are
connected with the FS Combine link. Using this link,
the different FileScale components will process data simultaneously and thus optimize
performance. This scenario uses the following FileScale components:
-
tFSFilterColumns: to filter columns in a table based on column name mapping,
-
tFSFilterRows: to define a row filter on the table to keep only the rows that have
a value greater than or equal to 5000.0, -
tFSUnique: to recuperate only one occurrence
of the name record, in case duplicates are present, -
tFSSort: to sort data in alphabetical order
based on the name column.
To design such a scenario, complete the following:
-
Drop the following components from the Palette to the design workspace: tRowGenerator, tFileOutputDelimited, tFSFilterColumns, tFSFilterRows, tFSUnique and
tFSSort.

-
Connect tRowGenerator first to tFileOutputDelimited using a Row > Main link and then to tFSFilterColumns using an OnSubjobOk link.
-
Connect all the FileScale components together using the Row > FS Combine link.
Note
As the FS Combine link joins all data processes
designed in the Job and executes them simultaneously, the sequence of the FileScale
components is of no importance. You can order them the way you like.
Warning
In order for such scenario to execute correctly, you must define the same
input and output files in all the FileScale components.
In this scenario, the tRowGenerator component will
generate data according to the schema you define in the component Basic settings view and send it to the input file. If data generation is
errorless, data in the input file will be processed simultaneously by the four FileScale
components and thus optimize performance.
-
Click tRowGenerator to display its Basic settings view and define the component
properties. -
Click the three-dot button next to RowGenerator
Editor to open the component editor where you can define your
schema.

-
In the upper half of the editor, click the plus button to add the columns you
want to write in the input file. -
Define the schema and set the parameters of the columns.
In this scenario, the input file contains nine columns:
id, name,
dateOfBirth, amount and five other
columns that hold extra information. -
If required, click the Preview tab in the
lower half of the editor to display the corresponding view and then click the
View button to display a sample of the
generated data. -
Click OK to validate your schema and close
the tRowGenerator editor. -
Click tFileOutputDelimited to display its
Basic settings view and define the
component properties.

-
Set the tFileOutputDelimited
properties.
For this scenario, we have defined two context variables:
filename for the input file name and
folder for the input file path. For more information about
context variables, check Talend Studio User
Guide.
-
Put your pointer in the File Name field and
then press Ctrl + Space and select the context
variable for the file path (context.folder). -
Enter
+"/"+
to separate the file path from the file name and then
press Ctrl+Space to display the context
variable list. -
Select the context variable for the file name
(context.filename). -
If required, click the Edit schema button to
view the schema you defined in the editor or modify it. -
Click tFSFilterColumns to open its Basic settings view and define the component
properties.

-
Set schema and property type to Built-In.
-
Click the Edit schema button to display a
dialog box. Here you can define your column schema. This schema must corresponds
to the input file schema. -
Click OK to close the schema dialog
box.The defined column schema displays in the Column
configuration table. -
Set the input file path and name using the variable contexts you define
earlier by pressing Ctrl + Space and selecting
the variables from the list. Separate the two variables by
+"/"+
. -
Click the three-dot button next to the Output File
Name and browse to the output file where you want to write the
processed data.In this example, we have defined a context variable for the output file the
same way we did for the input file. -
Define the record and field separators and then the header and footer of the
file, if any. -
In the Column configuration table, select the
check boxes of the columns you want to write in the output file, id,
name, dateOfBirth and amount in this
scenario. -
Click the Advanced settings tab to display
the advanced settings view.

-
Select the Generate FSLang File check box to
display the FSLang File Name field and then
browse to where you want to generate the FSLang file corresponding to your Job.In this example, we have defined a context variable for the FSLang output
file as we did for the input and output files. -
Select the Assign FileScale Path check box to
display the FileScale Path field and then
browse to the FileScale executable file delivered by Talend. -
Click tFSFilterRows to display the Basic settings view and define the component
properties.

-
Click the Edit schema button to display the
schema dialog box. Here you can define your column schema. This schema must
corresponds to the input file schema.

-
From the panel to the right, delete the output column that do not map to the
column filter you defined in tFSFilterColumns,
all extraInfo columns in this scenario. -
Click OK to close the schema dialog
box. -
Define the components properties as you did with
tFSFilterColumns. Use the same input and output files. -
Select a logical operator from the corresponding list, And in this scenario.
-
In the Conditions table, click the plus
button to add a line to the table. -
Click in the line and then select the column on which you want to base your
row filter.In this scenario, we want write out only the rows where the
amount value is greater than or equal to 5000.0. -
Click the Advanced settings tab to display
the advanced settings view. -
Select the Assign FileScale Path check box to
display the FileScale Path field and then
browse to the FileScale executable file delivered by Talend. -
Click tFSUnique to display the Basic settings view and define the component
properties.

-
Define the components properties as you did with
tFSFilterRows. Use the same input and output files. -
In the Unique key table, select the check box
of the column you want to use as a key attribute.In this scenario, we want to recuperate only one occurrence of each name in
the name column. -
If you want to take into account upper and lower cases, select the
corresponding Case Sensitive check box. -
Click the Advanced settings tab to display
the advanced settings view. -
Select the Assign FileScale Path check box to
display the FileScale Path field and then
browse to the FileScale executable file delivered by Talend. -
Click tFSSort to display its Basic settings view and define the component
properties.

-
Define the components properties as you did with
tFSUnique. Use the same input and output files. -
In the Criteria table, click the plus button
to add a line to the table. -
Click in the Schema column and then select
the column by which you want to sort data. -
Click in the Sort type column and select the
sort type. -
Click in the Order type column and select the
order type.
In this scenario, you want to output data alphabetically in ascending order based on
the name column.
-
Click the Advanced settings tab to display
the advanced settings view. -
Select the Assign FileScale Path check box to
display the FileScale Path field and then
browse to the FileScale executable file delivered by Talend. -
Save your job and press F6 to execute it.
A progress bar displays in the design workspace to show the completed percentage of
the simultaneous operations. This progress bar will make it evident how the huge input
data is processed at a very high speed.
When the percentage progress bar reaches 100%, the data is written in the output file.
A simple comparison between the input data and the processed data will show us
that:
The output file has only the id, name, dateOfBirth and
amount columns. Only the rows where the
amount value is greater than or equal to 5000.0 are listed.
There is only one occurrence of each of the names and data records are written in an
alphabetical ascending order.
The below two captures show a sample of the large scale data before and after
processing.
Input data:

Output data:
