Warning
This component is available in the Palette of the
studio only if you have subscribed to the relevant edition of one of the
Talend solutions with Big Data.
Component family |
FileScale |
Note that this component is deprecated. |
Function |
tFSJoin combines fields from two |
|
Purpose |
Helps combine a leftset file with a rightset file using a join |
|
Basic settings |
Property Type |
Either Built-in or Repository. Since version 5.6, both the Built-In mode and the Repository mode are |
|
|
Built-in: No property data is |
|
|
Repository: Select the repository |
|
Input File |
|
|
Schema and Edit |
A schema is a row description, it defines the number of fields to be processed and Click Edit schema to make changes to the schema. If the
|
|
|
Built-in: You create and store |
|
|
Repository: You have already |
|
File Name |
Path to the leftset file (input file) that holds the data you want |
|
Header |
Number of records to be skipped at the beginning of the |
|
Footer |
Number of records to be skipped at the end of the file. |
|
Rightset File |
|
|
Schema and Edit |
A schema is a row description, it defines the number of fields to be processed and Since version 5.6, both the Built-In mode and the Repository mode are Click Edit schema to make changes to the schema. If the
|
|
|
Built-in: You create and store |
|
|
Repository: You have already |
|
File Name |
Path to the rightset file (lookup file) that holds the data you |
|
Header |
Number of records to be skipped at the beginning of the |
|
Footer |
Number of records to be skipped at the end of the file. |
|
Record Separator (char) |
Character, string or regular expression to separate records |
|
Field Separator (char) |
Character, string or regular expression to separate fields in a |
|
Join Key |
Input column: Select the
Rightset column: Select the lookup |
|
Join Mode |
Select from the list the mode that defines the join between the
Inner join: this most common join
Left-outer-join: with this join
Right-outer-join: This join type
Full-outer-join: applies both left |
|
Output File Name |
Path of the output file where you want to write the combined |
|
Output Reject File |
Path of the output file where you want to write the rejected |
Advanced settings |
Generate FSLang File |
Select this check box to generate the FSLang file corresponding to
Left FSLang File Name: to specify
Right FSLang File Name: to specify
Join FSLang File Name: to specify |
|
Assign FileScale Path |
Select this check box and then click the three-dot button next to |
|
Specify Number of Process Child |
Select this check box and enter the number of child processes to |
|
Custom FileScale Parameter (separated by,) |
Enter the parameters for any specific operation you want to add to |
|
Set temporary path |
Select this check box to set the directory for temporary files of
Note: Avoid using the system partition to store temporary files |
|
Custom Hash Ratio |
Enter a ratio between the maximum memory used to execute the
For example, for a default value of this parameter equal to 0.5, |
|
tStatCatcher Statistics |
Select this check box to gather the Job processing metadata at a |
Global Variables |
ERROR_MESSAGE: the error message generated by the
A Flow variable functions during the execution of a component while an After variable
To fill up a field or expression with a variable, press Ctrl +
For further information about variables, see Talend Studio |
|
Usage |
This component handles files; therefore, it does not require input |
|
Limitation |
Limitations depend on the limits imposed by the physical memory |
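The four modes listed under Join Mode follow the usual relational join semantics. The following is a minimal Python sketch of those semantics on in-memory rows, offered only as an illustration: it is not how FileScale performs the join, and the helper name `join_rows` and the sample rows are hypothetical. The column names are borrowed from the scenario described later on this page.

```python
def join_rows(left, right, key, mode="inner"):
    """Join two lists of dicts on `key` (hypothetical helper, illustration only).

    mode is one of "inner", "left-outer", "right-outer", "full-outer".
    """
    if mode not in {"inner", "left-outer", "right-outer", "full-outer"}:
        raise ValueError(f"unknown join mode: {mode}")

    # Column names on each side, used to pad unmatched rows with None.
    left_cols = sorted({c for row in left for c in row})
    right_cols = sorted({c for row in right for c in row})

    # Index the right (lookup) rows by join key.
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)

    result = []
    matched_keys = set()
    for lrow in left:
        matches = index.get(lrow[key], [])
        if matches:
            matched_keys.add(lrow[key])
            for rrow in matches:
                result.append({**rrow, **lrow})
        elif mode in ("left-outer", "full-outer"):
            # Keep the unmatched left row; right-side columns stay None.
            result.append({**dict.fromkeys(right_cols), **lrow})

    if mode in ("right-outer", "full-outer"):
        for rrow in right:
            if rrow[key] not in matched_keys:
                # Keep the unmatched right row; left-side columns stay None.
                result.append({**dict.fromkeys(left_cols), **rrow})
    return result


# Hypothetical sample rows: one command matches a client, one does not.
orders = [
    {"id_command": 1, "price_command": 10.0, "id_client": 100},
    {"id_command": 2, "price_command": 25.5, "id_client": 999},
]
clients = [
    {"id_client": 100, "firstname_client": "Ada", "lastname_client": "Lovelace"},
    {"id_client": 200, "firstname_client": "Alan", "lastname_client": "Turing"},
]

for mode in ("inner", "left-outer", "right-outer", "full-outer"):
    print(mode, len(join_rows(orders, clients, "id_client", mode)))
```

With these sample rows, the inner join keeps only the command whose id_client exists on both sides, while the outer modes also keep the unmatched rows from the left side, the right side, or both.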
Warning
Make sure that you have unzipped and saved locally the FileScale executable file
delivered by Talend. You must define the path of this executable
file in the Advanced settings view of tFSJoin.
This scenario describes a Job that uses the tFSJoin
component to combine data from an input (leftset) file with data from a
lookup (rightset) file by using one column common to both as the join key. This Job also
outputs the rejected data, that is, the data that has no match in the lookup file.
- Drop the following components from the Palette to the design workspace: two tRowGenerator components, two tFileOutputDelimited components and one tFSJoin component.
- Connect the tRowGenerator components to the tFileOutputDelimited components using Row > Main links.
- Use Trigger > OnSubjobOk links to connect the two tRowGenerator components together and then the second tRowGenerator to tFSJoin.
In this scenario, the first tRowGenerator
component will generate the lookup data according to the schema you define in the
component editor and will send it to the lookup file. The second tRowGenerator component will generate the main data
according to the schema you define in the component editor and will send it to the
input file.
Note
You must have at least one column common to both the main and lookup files in
order to be able to combine data according to this column.
If the data is generated without errors, the tFSJoin
component compares the data in both files according to the join key, combines the
matching data and writes it to an output file, and finally writes the rejected data to
another output file.
Configuring the tRowGenerator and tFileOutputDelimited components
- Click tRowGenerator to display its Basic settings view and define the component properties.
- Click the […] button next to RowGenerator Editor to open the component editor where you can define your schema.
- In the upper half of the editor, click the plus button to add the columns you want to write in the input file.
- Define the schema and set the parameters of the columns.
  In this scenario, the lookup file contains three columns: firstname_client, lastname_client and id_client.
  If required, click the Preview tab in the lower half of the editor to display the corresponding view and then click the View button to display a sample of the generated data.
- Click tFileOutputDelimited to display its Basic settings view and define the component properties.
- Click the Edit schema button to display the schema you defined in the editor and modify it if required.
- In the design workspace, double-click the second tRowGenerator to open its editor and define the main input schema as you did with the lookup schema.
  In this scenario, the main input file contains three columns: id_command, price_command and id_client.
- Click the second tFileOutputDelimited to display its Basic settings view and define the component properties.
- Click the Edit schema button to display the input schema you defined in the editor and modify it if required.
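To reproduce comparable test files outside the studio, the sketch below is a rough Python stand-in for what the two tRowGenerator and tFileOutputDelimited pairs produce. Only the column names and their order come from this scenario; the function names, file names, the `;` field separator, the row counts and the value ranges are all assumptions made for illustration.

```python
import csv
import os
import random
import tempfile


def write_lookup_file(path, n_clients, seed=0):
    """Write rows of firstname_client;lastname_client;id_client (assumed layout)."""
    rng = random.Random(seed)
    first_names = ["Ann", "Bob", "Cleo", "Dan"]      # hypothetical values
    last_names = ["Smith", "Jones", "Brown", "Lee"]  # hypothetical values
    with open(path, "w", newline="") as f:
        writer = csv.writer(f, delimiter=";")
        for id_client in range(1, n_clients + 1):
            writer.writerow([rng.choice(first_names), rng.choice(last_names), id_client])


def write_main_file(path, n_commands, max_client_id, seed=1):
    """Write rows of id_command;price_command;id_client (assumed layout)."""
    rng = random.Random(seed)
    with open(path, "w", newline="") as f:
        writer = csv.writer(f, delimiter=";")
        for id_command in range(1, n_commands + 1):
            price = round(rng.uniform(1, 500), 2)
            writer.writerow([id_command, price, rng.randint(1, max_client_id)])


# Write both files to a temporary directory. max_client_id is larger than
# n_clients, so some commands may reference a client that is absent from
# the lookup file; such rows would end up as rejects in the join.
out_dir = tempfile.mkdtemp()
lookup_path = os.path.join(out_dir, "lookup.csv")
main_path = os.path.join(out_dir, "main.csv")
write_lookup_file(lookup_path, n_clients=5)
write_main_file(main_path, n_commands=10, max_client_id=8)
print("lookup file:", lookup_path)
print("main file:", main_path)
```

Both files share the id_client column, which is the condition stated in the note above for joining them.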
Configuring the tFSJoin component
- Click tFSJoin to open its Basic settings view and define the component properties.
- Click the Edit schema button to display a dialog box. Here you can define your column schema. This schema must correspond to the input file schema.
- In the Input File area, set the properties of the input file and click Edit schema to view/modify the input schema, if required.
  Do the same in the Rightset File area.
- Set the record and field separators in the corresponding fields.
- In the Join Key table, click the plus button to add a line in the table and then click in the line and select a column from the input file and one from the lookup file. These columns, common to both files, provide the values used to link the data. Repeat the operation to add as many join keys as needed.
  Warning
  You can use only the final columns in the input and lookup files as join keys.
- From the Join Mode list, select the mode you want to use as a base to join data.
- In the Output File Name field, set the path to the output file that will hold the combined data.
- Select the Output Reject File check box if you want to output the rejected data after the join. In the field that displays, set the path to the reject file.
- Click the Advanced settings tab to display the advanced settings view.
- Select the Assign FileScale Path check box to display the FileScale Path field and then browse to the executable file delivered by Talend.
- Press Ctrl+S to save your Job.
- Press F6 or click Run on the Run tab to execute the Job.
  A progress bar displays below the tFSJoin component in the design workspace to show the completed percentage of the operation. It also shows how quickly the large input data is partitioned.
  When the progress bar reaches 100%, combined data for matched pairs is written in the output file and data without a match is written in the reject output file.
  Below is a sample of the output joined client data. In the output file, you can see client first and last names combined with the command id and command price based on the client id.
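As a cross-check on small files, the behavior described in this scenario can be approximated in plain Python. The sketch below is a stand-in under stated assumptions, not FileScale: it loads the whole lookup file into memory, assumes `;` as the field separator and a newline as the record separator, treats id_client (the third column in both files) as the join key, and assumes id_client is unique in the lookup file. The names `read_records` and `join_files`, the file names and the sample rows are hypothetical.

```python
import csv
import os
import tempfile


def read_records(path, header=0, footer=0, field_sep=";"):
    """Read delimited records, skipping `header` leading and `footer` trailing records."""
    with open(path, newline="") as f:
        rows = list(csv.reader(f, delimiter=field_sep))
    return rows[header:len(rows) - footer]


def join_files(main_path, lookup_path, out_path, reject_path, field_sep=";"):
    """Inner-join main rows to lookup rows on id_client; unmatched main rows are rejects."""
    # Assumption: id_client is unique in the lookup file (a later row would overwrite).
    lookup = {}
    for firstname, lastname, id_client in read_records(lookup_path, field_sep=field_sep):
        lookup[id_client] = (firstname, lastname)

    joined = rejected = 0
    with open(out_path, "w", newline="") as out_f, \
            open(reject_path, "w", newline="") as rej_f:
        out = csv.writer(out_f, delimiter=field_sep)
        rej = csv.writer(rej_f, delimiter=field_sep)
        for id_command, price, id_client in read_records(main_path, field_sep=field_sep):
            if id_client in lookup:
                out.writerow([id_command, price, id_client, *lookup[id_client]])
                joined += 1
            else:
                rej.writerow([id_command, price, id_client])
                rejected += 1
    return joined, rejected


# Tiny hypothetical input files written to a temporary directory.
work_dir = tempfile.mkdtemp()
main_p = os.path.join(work_dir, "main.csv")
lookup_p = os.path.join(work_dir, "lookup.csv")
out_p = os.path.join(work_dir, "joined.csv")
reject_p = os.path.join(work_dir, "rejects.csv")

with open(main_p, "w", newline="") as f:
    f.write("1;10.5;100\n2;20.0;999\n3;7.25;100\n")
with open(lookup_p, "w", newline="") as f:
    f.write("Ada;Lovelace;100\nAlan;Turing;200\n")

counts = join_files(main_p, lookup_p, out_p, reject_p)
print("joined, rejected:", counts)
```

In this sketch, commands 1 and 3 reference client 100 and are written to the joined file with the client's first and last names appended, while command 2 references a client that is absent from the lookup file and goes to the reject file.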