Defining the match rule

In the tMatchGroup basic settings, click
Preview to open the configuration wizard
and define the matching key and the survivorship function.

You can use the configuration wizard to import match rules created and tested
in the studio and stored in the repository, and use them in your match Jobs. For
further information, see Importing match rules from the studio repository.

Make sure that the matching algorithm selected in the basic settings of the
component is the same as the one defined in the configuration wizard.
Otherwise, the Job runs with default values for the parameters that are not
compatible between the two algorithms.

Define the match rule as follows:
- In the Key definition table, click
  the [+] button to add a line in the
  table. Click in the Input Key Attribute
  column and select the column on which you want to do the matching
  operation, first_name in this scenario.
- Click in the Matching Function column
  and select Soundex from the list. This
  method matches processed entries according to a standard English
  phonetic algorithm which indexes strings by sound, as pronounced in
  English.
- From the Tokenized measure list,
  select not to use a tokenized distance for the selected
  algorithm.
- Set the Threshold to 0.8 and the Confidence Weight to 1.
- Select Null Match None in the
  Handle Null column in order to have
  matching results where null values have minimal effect.
- Select Most common in the Survivorship Function column. This method
  validates the most frequent name value in each group of
  duplicates.
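
As an illustration of the phonetic matching configured above, here is a minimal Python sketch of the standard American Soundex encoding. It is not Talend's implementation: the names soundex and same_sound are hypothetical, and treating a null value as matching nothing is only an assumption about the effect of Null Match None. How tMatchGroup turns phonetic codes into a score that is compared with the 0.8 threshold is not covered here.

```python
# Letter groups of standard American Soundex and their digits.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(name):
    """Return the 4-character Soundex code of name, or "" if it has no A-Z letter."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    result = letters[0]
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters that share a code
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code is kept after it
    return (result + "000")[:4]


def same_sound(a, b):
    """Assumption: a null (None) value never matches, not even another null."""
    if a is None or b is None:
        return False
    return soundex(a) == soundex(b)


print(soundex("Robert"), soundex("Rupert"))
print(same_sound("Jon", "John"), same_sound(None, "John"))
```

Names that sound alike in English, such as Robert and Rupert, receive the same code, which is why Soundex groups spelling variants of first_name.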
Define the survivorship rule as follows:
- In the Default Survivorship Rules
  table, click the [+] button to add a
  line in the table. Click in the Data
  Type column and select Number.
- Click in the Survivorship Function
  column and select Largest (for numbers)
  from the list. This method validates the largest numerical value in each
  group.
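
The two survivorship functions used in this scenario can be sketched in Python as follows. This is an assumption-laden illustration, not the component's code: the helper names, the rules dictionary, and the age column are hypothetical, and the tie-breaking behavior (first value seen wins) comes from Python's Counter, not from Talend.

```python
from collections import Counter


def most_common(values):
    """Most frequent non-null value; on a tie, the first one seen wins."""
    present = [v for v in values if v is not None]
    return Counter(present).most_common(1)[0][0] if present else None


def largest(values):
    """Largest non-null numeric value, or None if every value is null."""
    present = [v for v in values if v is not None]
    return max(present) if present else None


def build_master(group, rules):
    """Apply one survivorship function per column to build a master record."""
    return {col: fn([rec.get(col) for rec in group]) for col, fn in rules.items()}


# Hypothetical group of duplicates; "age" stands in for any Number column.
group = [
    {"first_name": "Jon", "age": 31},
    {"first_name": "John", "age": 35},
    {"first_name": "John", "age": 33},
]
master = build_master(group, {"first_name": most_common, "age": largest})
print(master)
```

The resulting record combines values from several inputs, so it does not have to be identical to any single input record.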
Set the Hide groups of less than parameter to
decide which groups to show in the result chart and matching table. This
parameter lets you hide groups that contain fewer records than the value you set.
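
The effect of this parameter can be pictured with a short Python sketch. The function name visible_groups is hypothetical, and the sketch assumes that "less than" is a strict comparison, so a group whose size equals the value is still shown.

```python
def visible_groups(groups, hide_less_than):
    """Keep only the groups that contain at least hide_less_than records."""
    return [g for g in groups if len(g) >= hide_less_than]


# Hypothetical groups of first names that were matched together.
groups = [["Jon", "John", "Jhon"], ["Mary", "Marie"], ["Paul"]]
print(visible_groups(groups, 2))  # the single-record group is hidden
```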
Click the Chart button in the wizard to
execute the Job in the defined configuration and have the results directly in
the wizard.

The matching chart gives a global picture of the duplicates in the analyzed
data. The matching table shows the details of the items in each group,
colors the groups in accordance with their color in the matching chart, and
marks the master records with true.
The master record in each group is the result of merging two
similar records according to the phonetic algorithm and survivorship rule. The
master record is a new record that does not exist in the input data.
Click OK to close the wizard.

Source: Talend documentation, https://help.talend.com


Docs 6.x
