August 16, 2023

Configuring key generation – Docs for ESB 6.x

  1. Double-click tGenKey to display the Basic settings view and define the
    component properties.

    Use_Case_tGenKey11.png

    You can click the import icon (match_rule_import_icon.png) to import blocking
    keys from the match rules created with the VSR algorithm and tested in the
    Profiling perspective of Talend Studio, and use them in your Job. Otherwise,
    define the blocking key parameters as described in the steps below.

  2. Under the Algorithm table, click the [+] button to add a row in this table.
  3. On the column column, click the newly added row and select from the list the
    column you want to process using an algorithm. In this example, select DoB.
  4. On the algorithm column, click the newly added row and select from the list
    the algorithm you want to apply to the corresponding column. In this example,
    select substring(a,b).
  5. Click in the value column and enter the value for the selected algorithm, when
    needed. In this scenario, type in 6;10.

    The substring(a,b) algorithm extracts the characters of a string between two
    specified indices and returns the new substring. The first character is at
    index 0. In this scenario, for a given DoB of “21-01-1995”, 6;10 returns only
    the year of birth, “1995”, which is the substring from the 7th to the 10th
    character.

    In this example, we want to generate a functional key that holds the last four
    characters of the date of birth, which correspond to the year of birth, for
    each data row. We do not want to define any extra options on these columns.

    You can select the Show help check box to display instructions on how to set
    the algorithm and option parameters.

    Once you have defined the tGenKey properties, you can display a statistical
    view of these parameters. To do so:
  6. Right-click the tGenKey component and select View Key Profile in the
    contextual menu.

    Use_Case_tGenKey12.png

    The View Key Profile editor opens. It shows statistics on the number of blocks
    so that you can adapt the parameters according to the results you want to get.

    Note: When you are processing a large amount of data and this component is
    used to partition data for a matching component (such as tRecordMatching or
    tMatchGroup), it is preferable to have a limited number of rows in each block.
    About 50 rows per block is considered optimal, but this depends on the number
    of fields to compare, the total number of rows and the time considered
    acceptable for data processing.

    From the key editor, you can:

    • edit the Limit
      of rows used to calculate the statistics.

    • click match_rule_import_icon.png and import blocking keys
      from the Studio repository and use them in your Job.

    • edit the input column you want to process using an
      algorithm.

    • edit the parameters of the algorithm you want to
      apply to input columns.

    After each modification, click the Refresh button at the top right of the
    editor to see its effect on the statistics.
  7. Click OK to close the View Key Profile editor.
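
The behaviour of substring(a,b) described in step 5 can be illustrated outside the Studio. The sketch below is not Talend's implementation. It only mirrors what the text states: indices start at 0, and 6;10 applied to “21-01-1995” yields “1995”, which implies that the second index is exclusive. The function names parse_value and substring_key are hypothetical.

```python
def parse_value(setting):
    """Split an 'a;b' value such as '6;10' into two integer indices."""
    a, b = setting.split(";")
    return int(a), int(b)


def substring_key(text, a, b):
    """Return the characters of text from index a (0-based) up to, not including, b."""
    return text[a:b]


# Same input as in the scenario: a date of birth and the value 6;10.
a, b = parse_value("6;10")
key = substring_key("21-01-1995", a, b)
print(key)  # 1995
```

All rows whose DoB ends with the same year receive the same functional key and therefore fall into the same block.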

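The block statistics discussed in step 6 amount to counting how many rows share each functional key. As a rough sketch, assuming rows are available as dictionaries with a DoB field, the following counts rows per block and lists the blocks that exceed a chosen limit. The sample rows, the function names and the MAX_ROWS_PER_BLOCK constant are hypothetical; the value 50 is the rule of thumb quoted in the note, not a fixed threshold of the component.

```python
from collections import Counter

MAX_ROWS_PER_BLOCK = 50  # rule of thumb from the note, not a hard limit


def block_sizes(rows, column, a, b):
    """Count how many rows share each blocking key (characters a..b-1 of column)."""
    return Counter(row[column][a:b] for row in rows)


def oversized_blocks(sizes, limit=MAX_ROWS_PER_BLOCK):
    """Return the keys whose block holds more rows than limit."""
    return {key: n for key, n in sizes.items() if n > limit}


# Hypothetical sample rows; real data would come from the Job's input flow.
rows = [
    {"name": "Ann", "DoB": "21-01-1995"},
    {"name": "Bob", "DoB": "03-07-1995"},
    {"name": "Cy", "DoB": "15-11-1988"},
]

sizes = block_sizes(rows, "DoB", 6, 10)
print(dict(sizes))                       # {'1995': 2, '1988': 1}
print(oversized_blocks(sizes, limit=1))  # {'1995': 2}
```

If many blocks turn out to be oversized, a narrower key (for example, combining the year with another column) is one way to reduce the number of rows per block.
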
Document sourced from Talend: https://help.talend.com