tFSJoin

Warning

This component will be available in the Palette of the
studio on the condition that you have subscribed to the relevant edition of one of the
Talend solutions with Big Data..

tFSJoin properties

Component family	FileScale	Note that this component is deprecated.
Function	tFSJoin combines fields from two files together by using values common to both. It compares columns from the leftset file (input file) with reference columns from the rightset file (lookup file) and outputs the combined data and/or the rejected data.
Purpose	Helps combining a leftset file with a rightset file using a join predicate in order to identify the data on which to perform the join.
Basic settings	Property Type	Either Built-in or Repository. Since version 5.6, both the Built-In mode and the Repository mode are available in any of the Talend solutions.
		Built-in: No property data is stored centrally.
		Repository: Select the repository file where Properties are stored. The fields that follow are pre-filled using the fetched data.
	Input File
	Schema and Edit Schema	A schema is a row description, it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository. Click Edit schema to make changes to the schema. If the current schema is of the Repository type, three options are available: View schema: choose this option to view the schema only. Change to built-in property: choose this option to change the schema to Built-in for local changes. Update repository connection: choose this option to change the schema stored in the repository and decide whether to propagate the changes to all the Jobs upon completion. If you just want to propagate the changes to the current Job, you can select No upon completion and choose this schema metadata again in the [Repository Content] window.
		Built-in: You create and store the schema locally for this component only. Related topic: see Talend Studio User Guide.
		Repository: You have already created and stored the schema in the Repository. You can reuse it in other projects and Job designs. Related topic: see Talend Studio User Guide.
	File Name	Path to the leftset file (input file) that holds the data you want to join.
	Header	Number of records to be skipped in the beginning of the file.
	Footer	Number of records to be skipped at the end of the file.
	Rightset File
	Schema and Edit Schema	A schema is a row description, it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository. Since version 5.6, both the Built-In mode and the Repository mode are available in any of the Talend solutions. Click Edit schema to make changes to the schema. If the current schema is of the Repository type, three options are available: View schema: choose this option to view the schema only. Change to built-in property: choose this option to change the schema to Built-in for local changes. Update repository connection: choose this option to change the schema stored in the repository and decide whether to propagate the changes to all the Jobs upon completion. If you just want to propagate the changes to the current Job, you can select No upon completion and choose this schema metadata again in the [Repository Content] window.
		Built-in: You create and store the schema locally for this component only. Related topic: see Talend Studio User Guide.
		Repository: You have already created and stored the schema in the Repository. You can reuse it in other projects and Job designs. Related topic: see Talend Studio User Guide.
	File Name	Path to the rightset file (lookup file) that holds the data you want to join.
	Header	Number of records to be skipped in the beginning of the file.
	Footer	Number of records to be skipped at the end of the file.
	Record Separator (char)	Character, string or regular expression to separate records (lines).
	Field Separator (char)	Character, string or regular expression to separate fields in a record.
	Join Key	Input column: Select the column(s) from the input file that needs to be checked against the rightset (lookup) column. Rightset column: Select the lookup key columns that you will use as a reference against which you want to compare the columns from the input file.
	Join Mode	Select from the list the mode that defines the join between the input file and the rightset file: Inner join: this most common join type creates an output table by combining column values of two tables based upon the join predicate. Left-outer-join: with this join type, the output table always contains all records of the leftset table even if the join condition does not find any matching record in the rightset table. Right-outer-join: This join type joins tables with a reversed treatment of Left_outer-join. Full-outer-join: applies both left and right outer joins. NULL values will be given for every column of the table that lacks a matching row.
	Output File Name	Path of the output file where you want to write the combined data.
	Output Reject File	Path of the output file where you want to write the rejected data.
Advanced settings	Generate FSLang File	Select this check box to generate the FSLang file corresponding to your Job and click the three-dot button next to each of the following fields: Left FSLang File Name: to specify the path and name of the left FSLang file, Right FSLang File Name: to specify the path and name of the right FSLang file, Join FSLang File Name: to specify the path and name of the FSLang file that will hold the combined data,
	Assign FileScale Path	Select this check box and then click the three-dot button next to the FileScale Path field to select the FileScale program executable file required to execute the component.
	Specify Number of Process Child	Select this check box and enter the number of child processes to use to carry out aggregation.
	Custom FileScale Parameter (separated by,)	Enter the parameters for any specific operation you want to add to the FileScale executable call.
	Set temporary path	Select this check box to set the directory for temporary files of tFSJoin when you are working on large files. Make sure the directory for temporary files already exists and it is in a partition that has adequate disk space. Note Avoid using the system partition to store temporary files because it may not be accessible to applications.
	Custom Hash Ratio	Enter a ratio between the maximum memory used to execute the FileScale process and input file size. If the size of any of the input files is under this ratio, an in-memory join-hash algorithm is tried first, to optimize performance. For example, For a default value of this parameter equals to 0.5, and if the memory of your computer is1GB, the FileScale process will use an in-memory join-hash algorithm when the input file size is under 500MB.
	tStatCatcher Statistics	Select this check box to gather the Job processing metadata at a Job level as well as at each component level.
Global Variables	ERROR_MESSAGE: the error message generated by the component when an error occurs. This is an After variable and it returns a string. This variable functions only if the Die on error check box is cleared, if the component has this check box. A Flow variable functions during the execution of a component while an After variable functions after the execution of the component. To fill up a field or expression with a variable, press Ctrl + Space to access the variable list and choose the variable to use from it. For further information about variables, see Talend Studio User Guide.
Usage	This component handles files therefore it does not require input or output data flows. It is used to join data in large scale files.
Limitation	Limitations depend on the limits imposed by the physical memory and CPU architecture. For example, the total length of processed files cannot exceed the file system limit for LargeFile support (maximum value of 64 signed bits).

Scenario: Combining data from two columns and outputting the joined and rejected
data

Warning

Make sure that you have unzipped and saved locally the FileScale executable file
delivered by Talend. You must define the path of this executable
file in the Advanced settings view of tFSJoin.

This scenario describes a Job that uses the tFSJoin
component to combine data from an input (leftset) file with data from a
lookup (rightset) file by using one column common to both as the join key. This Job also
outputs the rejected data, data that does not have a match in the lookup file.

Dropping and linking the components

Drop the following components from the Palette to the design workspace: two tRowGenerator components, two tFileOutputDelimited components and one tFSJoin component.
Connect the tRowGenerator components to
the tFileOutputDelimited components using
Row > Main
links.
Use Trigger > OnSubjobOk links to connect the two tRowGenerator components together and then the second
tRowGenerator to tFSJoin.

Configuring the components

In this scenario, the first tRowGenerator
component will generate the lookup data according to the schema you define in the
component editor and will send it to the lookup file. The second tRowGenerator component will generate the main data
according to the schema you define in the component editor and will send it to the
input file.

Note

You must have at least one column common to both the main and lookup files in
order to be able to combine data according to this column.

If data generation is errorless, the tFSJoin
component will compare data in both files according to the join key, combine common
data and writes it in an output file and finally writes rejected data in another
output file.

Configuring the tRowGenerator and tFileOutputDelimited components

Click tRowGenerator to display its
Basic settings view and define the
component properties.
Click the […] button next to RowGenerator Editor to open the component editor
where you can define your schema.
In the upper half of the editor, click the plus button to add the columns
you want to write in the input file.
Define the schema and set the parameters of the columns.

In this scenario, the lookup file contains three columns:
firstname_client, lastname_client
and id_client.

If required, click the Preview tab in the
lower half of the editor to display the corresponding view and then click
the View button to display a sample of the
generated data.
Click tFileOutputDelimited to display its
Basic settings view and define the
component properties.
Click the Edit schema button to display
the schema you defined in the editor and modify it if required.
In the design workspace, double-click the second tRowGenerator to open its editor and define the main input
schema as you did with the lookup schema.

In this scenario, the main input file contains three columns:
id_command, price_command and
id_client.
Click the second tFileOutputDelimited to
display its Basic settings view and define
the component properties.
Click the Edit schema button to display
the input schema you defined in the editor and modify it if required.

Configuring the tFSJoin component

Click tFSJoin to open its Basic settings view and define the component
properties.
Click the Edit schema button to display a
dialog box. Here you can define your column schema. This schema must
correspond to the input file schema.
In the Input File area, set the
properties of the input file and click Edit
schema to view/modify the input schema, if required.

Do the same in the Rightset File
area.
Set the record and field separators in the corresponding fields.
In the Join Key table, click the plus
button to add a line in the table and then click in the line and select a
column from the input file and one of the lookup file. You want to use these
columns common to both files as the value to link data. Repeat the operation
to add as many join keys as needed.

Warning

You can use only the final columns in the input and lookup
files as join keys.
From the Join Mode list, select the mode
you want to use as a base to join data.
In the Output File Name field, set the
path to the output file that will hold the combined data.
Select the Output Reject File check box
if you want to output the rejected data after the join. Set in the field
that displays the path to the reject file.
Click the Advanced settings tab to
display the advanced settings view.
Select the Assign FileScale Path check
box to display the FileScale Path field and
then browse to the executable file delivered by Talend.

Saving and executing the Job

Press Ctrl+S to save your Job.
Press F6 or click Run on the Run tab to
execute the Job.

A progress bar displays below the tFSJoin
component in the design workspace to show the completed percentage of the
operation. This progress bar will make it evident how the huge input data is
partitioned at a very high speed.

When the percentage progress bar reaches 100%, combined data for matched
pairs is written in the output file and data without a match is written in
the reject output file.

Below is a sample of the output joined client data. In the output file,
you can see client first and last names combined with the command id and
command price based on the client id.

Document get from Talend https://help.talend.com

Thank you for watching.

Docs 5.x

0 Comments

Inline Feedbacks

View all comments

tFSJoin – Docs for ESB 5.x

tFSJoin

Warning

tFSJoin properties

Note

Scenario: Combining data from two columns and outputting the joined and rejected
data

Warning

Dropping and linking the components

Configuring the components

Note

Warning

Saving and executing the Job

My Website Links

Tags

tFSJoin

Warning

tFSJoin properties

Note

Scenario: Combining data from two columns and outputting the joined and rejected data

Warning

Dropping and linking the components

Configuring the components

Note

Warning

Saving and executing the Job

Scenario: Combining data from two columns and outputting the joined and rejected
data