August 17, 2023

tFSJoin – Docs for ESB 5.x

tFSJoin

tFSJoin_icon32.png

Warning

This component will be available in the Palette of the
studio on the condition that you have subscribed to the relevant edition of one of the
Talend solutions with Big Data..

tFSJoin properties

Component family

FileScale

Note that this component is deprecated.

Function

tFSJoin combines fields from two
files together by using values common to both. It compares columns
from the leftset file (input file) with reference columns from the
rightset file (lookup file) and outputs the combined data and/or the
rejected data.

Purpose

Helps combining a leftset file with a rightset file using a join
predicate in order to identify the data on which to perform the
join.

Basic settings

Property Type

Either Built-in or Repository.

Since version 5.6, both the Built-In mode and the Repository mode are
available in any of the Talend solutions.

 

 

Built-in: No property data is
stored centrally.

 

 

Repository: Select the repository
file where Properties are stored. The fields that follow are
pre-filled using the fetched data.

 

Input File

 

Schema and Edit
Schema

A schema is a row description, it defines the number of fields to be processed and
passed on to the next component. The schema is either Built-in or stored remotely in the
Repository.

Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:

  • View schema: choose this option to view the
    schema only.

  • Change to built-in property: choose this option
    to change the schema to Built-in for local
    changes.

  • Update repository connection: choose this option to change
    the schema stored in the repository and decide whether to propagate the changes to
    all the Jobs upon completion. If you just want to propagate the changes to the
    current Job, you can select No upon completion and
    choose this schema metadata again in the [Repository
    Content]
    window.

 

 

Built-in: You create and store
the schema locally for this component only. Related topic: see
Talend Studio User
Guide
.

 

 

Repository: You have already
created and stored the schema in the Repository. You can reuse it in
other projects and Job designs. Related topic: see Talend Studio User
Guide
.

 

File Name

Path to the leftset file (input file) that holds the data you want
to join.

 

Header

Number of records to be skipped in the beginning of the
file.

 

Footer

Number of records to be skipped at the end of the file.

 

Rightset File

 

Schema and Edit
Schema

A schema is a row description, it defines the number of fields to be processed and
passed on to the next component. The schema is either Built-in or stored remotely in the
Repository.

Since version 5.6, both the Built-In mode and the Repository mode are
available in any of the Talend solutions.

Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:

  • View schema: choose this option to view the
    schema only.

  • Change to built-in property: choose this option
    to change the schema to Built-in for local
    changes.

  • Update repository connection: choose this option to change
    the schema stored in the repository and decide whether to propagate the changes to
    all the Jobs upon completion. If you just want to propagate the changes to the
    current Job, you can select No upon completion and
    choose this schema metadata again in the [Repository
    Content]
    window.

 

 

Built-in: You create and store
the schema locally for this component only. Related topic: see
Talend Studio User
Guide
.

 

 

Repository: You have already
created and stored the schema in the Repository. You can reuse it in
other projects and Job designs. Related topic: see Talend Studio User
Guide
.

 

File Name

Path to the rightset file (lookup file) that holds the data you
want to join.

 

Header

Number of records to be skipped in the beginning of the
file.

 

Footer

Number of records to be skipped at the end of the file.

 

Record Separator (char)

Character, string or regular expression to separate records
(lines).

 

Field Separator (char)

Character, string or regular expression to separate fields in a
record.

 

Join Key

Input column: Select the
column(s) from the input file that needs to be checked against the
rightset (lookup) column.

Rightset column: Select the lookup
key columns that you will use as a reference against which you want
to compare the columns from the input file.

 

Join Mode

Select from the list the mode that defines the join between the
input file and the rightset file:

Inner join: this most common join
type creates an output table by combining column values of two
tables based upon the join predicate.

Left-outer-join: with this join
type, the output table always contains all records of the leftset
table even if the join condition does not find any matching record
in the rightset table.

Right-outer-join: This join type
joins tables with a reversed treatment of Left_outer-join.

Full-outer-join: applies both left
and right outer joins. NULL values will be given for every column of
the table that lacks a matching row.

 

Output File Name

Path of the output file where you want to write the combined
data.

 

Output Reject File

Path of the output file where you want to write the rejected
data.

Advanced settings

Generate FSLang File

Select this check box to generate the FSLang file corresponding to
your Job and click the three-dot button next to each of the
following fields:

Left FSLang File Name: to specify
the path and name of the left FSLang file,

Right FSLang File Name: to specify
the path and name of the right FSLang file,

Join FSLang File Name: to specify
the path and name of the FSLang file that will hold the combined
data,

 

Assign FileScale Path

Select this check box and then click the three-dot button next to
the FileScale Path field to select
the FileScale program executable file required to execute the
component.

 

Specify Number of Process Child

Select this check box and enter the number of child processes to
use to carry out aggregation.

 

Custom FileScale Parameter (separated by,)

Enter the parameters for any specific operation you want to add to
the FileScale executable call.

 

Set temporary path

Select this check box to set the directory for temporary files of
tFSJoin when you are working on
large files. Make sure the directory for temporary files already
exists and it is in a partition that has adequate disk space.

Note

Avoid using the system partition to store temporary files
because it may not be accessible to applications.

 

Custom Hash Ratio

Enter a ratio between the maximum memory used to execute the
FileScale process and input file size. If the size of any of the
input files is under this ratio, an in-memory join-hash algorithm is
tried first, to optimize performance.

For example, For a default value of this parameter equals to 0.5,
and if the memory of your computer is1GB, the FileScale process will
use an in-memory join-hash algorithm when the input file size is
under 500MB.

 

tStatCatcher Statistics

Select this check box to gather the Job processing metadata at a
Job level as well as at each component level.

Global Variables

ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable and it returns a string. This
variable functions only if the Die on error check box is
cleared, if the component has this check box.

A Flow variable functions during the execution of a component while an After variable
functions after the execution of the component.

To fill up a field or expression with a variable, press Ctrl +
Space
to access the variable list and choose the variable to use from it.

For further information about variables, see Talend Studio
User Guide.

Usage

This component handles files therefore it does not require input
or output data flows. It is used to join data in large scale
files.

Limitation

Limitations depend on the limits imposed by the physical memory
and CPU architecture. For example, the total length of processed
files cannot exceed the file system limit for LargeFile support
(maximum value of 64 signed bits).

Scenario: Combining data from two columns and outputting the joined and rejected
data

Warning

Make sure that you have unzipped and saved locally the FileScale executable file
delivered by Talend. You must define the path of this executable
file in the Advanced settings view of tFSJoin.

This scenario describes a Job that uses the tFSJoin
component to combine data from an input (leftset) file with data from a
lookup (rightset) file by using one column common to both as the join key. This Job also
outputs the rejected data, data that does not have a match in the lookup file.

Dropping and linking the components

  1. Drop the following components from the Palette to the design workspace: two tRowGenerator components, two tFileOutputDelimited components and one tFSJoin component.

    Use_Case_FSJoin.png
  2. Connect the tRowGenerator components to
    the tFileOutputDelimited components using
    Row > Main
    links.

  3. Use Trigger > OnSubjobOk links to connect the two tRowGenerator components together and then the second
    tRowGenerator to tFSJoin.

Configuring the components

In this scenario, the first tRowGenerator
component will generate the lookup data according to the schema you define in the
component editor and will send it to the lookup file. The second tRowGenerator component will generate the main data
according to the schema you define in the component editor and will send it to the
input file.

Note

You must have at least one column common to both the main and lookup files in
order to be able to combine data according to this column.

If data generation is errorless, the tFSJoin
component will compare data in both files according to the join key, combine common
data and writes it in an output file and finally writes rejected data in another
output file.

Configuring the tRowGenerator and tFileOutputDelimited components

  1. Click tRowGenerator to display its
    Basic settings view and define the
    component properties.

  2. Click the […] button next to RowGenerator Editor to open the component editor
    where you can define your schema.

    Use_Case_FSJoin1.png
  3. In the upper half of the editor, click the plus button to add the columns
    you want to write in the input file.

  4. Define the schema and set the parameters of the columns.

    In this scenario, the lookup file contains three columns:
    firstname_client, lastname_client
    and id_client.

    If required, click the Preview tab in the
    lower half of the editor to display the corresponding view and then click
    the View button to display a sample of the
    generated data.

  5. Click tFileOutputDelimited to display its
    Basic settings view and define the
    component properties.

    Use_Case_FSJoin2.png
  6. Click the Edit schema button to display
    the schema you defined in the editor and modify it if required.

  7. In the design workspace, double-click the second tRowGenerator to open its editor and define the main input
    schema as you did with the lookup schema.

    In this scenario, the main input file contains three columns:
    id_command, price_command and
    id_client.

    Use_Case_FSJoin3.png
  8. Click the second tFileOutputDelimited to
    display its Basic settings view and define
    the component properties.

    Use_Case_FSJoin4.png
  9. Click the Edit schema button to display
    the input schema you defined in the editor and modify it if required.

Configuring the tFSJoin component

  1. Click tFSJoin to open its Basic settings view and define the component
    properties.

    Use_Case_FSJoin5.png
  2. Click the Edit schema button to display a
    dialog box. Here you can define your column schema. This schema must
    correspond to the input file schema.

  3. In the Input File area, set the
    properties of the input file and click Edit
    schema
    to view/modify the input schema, if required.

    Do the same in the Rightset File
    area.

  4. Set the record and field separators in the corresponding fields.

  5. In the Join Key table, click the plus
    button to add a line in the table and then click in the line and select a
    column from the input file and one of the lookup file. You want to use these
    columns common to both files as the value to link data. Repeat the operation
    to add as many join keys as needed.

    Warning

    You can use only the final columns in the input and lookup
    files as join keys.

  6. From the Join Mode list, select the mode
    you want to use as a base to join data.

  7. In the Output File Name field, set the
    path to the output file that will hold the combined data.

  8. Select the Output Reject File check box
    if you want to output the rejected data after the join. Set in the field
    that displays the path to the reject file.

  9. Click the Advanced settings tab to
    display the advanced settings view.

    Use_Case_FSJoin6.png
  10. Select the Assign FileScale Path check
    box to display the FileScale Path field and
    then browse to the executable file delivered by Talend.

Saving and executing the Job

  1. Press Ctrl+S to save your Job.

  2. Press F6 or click Run on the Run tab to
    execute the Job.

    A progress bar displays below the tFSJoin
    component in the design workspace to show the completed percentage of the
    operation. This progress bar will make it evident how the huge input data is
    partitioned at a very high speed.

    When the percentage progress bar reaches 100%, combined data for matched
    pairs is written in the output file and data without a match is written in
    the reject output file.

    Below is a sample of the output joined client data. In the output file,
    you can see client first and last names combined with the command id and
    command price based on the client id.

    Use_Case_FSJoin7.png

Document get from Talend https://help.talend.com
Thank you for watching.
Subscribe
Notify of
guest
0 Comments
Inline Feedbacks
View all comments
0
Would love your thoughts, please comment.x
()
x