August 15, 2023

Matching measures – Docs for ESB 6.x


To compare one attribute of two records, you can use any of the implemented matching
functions, such as Exact, Levenshtein and
Jaro-Winkler, or a custom matching algorithm you
created.
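To illustrate what a per-attribute matching function returns, the following Python sketch implements three similarity scores in the range [0, 1]. It is not Talend's implementation. The names exact, exact_ignore_case, levenshtein_distance, and levenshtein_similarity are hypothetical. Normalizing the Levenshtein edit distance by the length of the longer string is also an assumption, and Talend may normalize differently.

```python
def exact(a: str, b: str) -> float:
    """Return 1.0 if the strings are identical, else 0.0."""
    return 1.0 if a == b else 0.0


def exact_ignore_case(a: str, b: str) -> float:
    """Return 1.0 if the strings are equal after case folding, else 0.0."""
    return 1.0 if a.casefold() == b.casefold() else 0.0


def levenshtein_distance(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Map edit distance to [0, 1]; normalization by the longer length is assumed."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty strings are treated as identical
    return 1.0 - levenshtein_distance(a, b) / longest


print(exact("USA", "usa"))                                  # 0.0
print(exact_ignore_case("USA", "usa"))                      # 1.0
print(round(levenshtein_similarity("France", "Frence"), 3))  # 0.833
```

Note that exact_ignore_case scores 1.0 for strings that differ only in case, so a score of 1 from it does not imply an exact match.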

You can also compare two records on several attributes. For two records to match, the
following two conditions must hold:

  • When using the T-Swoosh algorithm, the score for each matching function in the
    match rule must exceed the threshold, if one is specified. By default, the
    threshold is set to 1. For most matching functions, this means that an exact
    match is required. The exceptions are Exact – ignore case and potentially any
    custom matching function.

  • The global score, computed as a weighted average of the scores of the different
    matching functions, must exceed the match threshold. The score is equal to
    $\frac{\sum_i w_i \, s_i(r_1, r_2)}{\sum_i w_i}$, where $w_i$ is the
    confidence weight of matching function $i$ and $s_i(r_1, r_2)$ is the
    score of matching function $i$ over records $r_1$ and $r_2$.
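The two conditions can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not Talend's implementation. The names MatchFunction, global_score, records_match, and exact, and the attribute names in the usage lines, are hypothetical. The default per-function threshold of 1 follows the text. Both comparisons are treated as inclusive (score >= threshold), because a default threshold of 1 could never be met otherwise. The sketch evaluates only a single pair of records and does not model how T-Swoosh groups records.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Record = Dict[str, str]


@dataclass
class MatchFunction:
    """One matching function applied to one attribute (hypothetical model)."""
    attribute: str
    score: Callable[[str, str], float]  # returns a similarity in [0, 1]
    weight: float = 1.0                 # confidence weight w_i
    threshold: Optional[float] = 1.0    # per-function threshold; None = not set


def global_score(r1: Record, r2: Record, functions: List[MatchFunction]) -> float:
    """Weighted average: sum(w_i * s_i(r1, r2)) / sum(w_i)."""
    total_weight = sum(f.weight for f in functions)
    weighted_sum = sum(f.weight * f.score(r1[f.attribute], r2[f.attribute])
                       for f in functions)
    return weighted_sum / total_weight


def records_match(r1: Record, r2: Record, functions: List[MatchFunction],
                  match_threshold: float) -> bool:
    """Apply both match conditions; comparisons are assumed to be inclusive."""
    # Condition 1: each function must reach its own threshold, if one is set.
    for f in functions:
        s = f.score(r1[f.attribute], r2[f.attribute])
        if f.threshold is not None and s < f.threshold:
            return False
    # Condition 2: the weighted global score must reach the match threshold.
    return global_score(r1, r2, functions) >= match_threshold


def exact(a: str, b: str) -> float:
    return 1.0 if a == b else 0.0


r1 = {"name": "Smith", "city": "Paris"}
r2 = {"name": "Smith", "city": "paris"}
rule = [MatchFunction("name", exact, weight=3.0, threshold=None),
        MatchFunction("city", exact, weight=1.0, threshold=None)]
print(global_score(r1, r2, rule))         # 0.75
print(records_match(r1, r2, rule, 0.7))   # True
print(records_match(r1, r2, rule, 0.85))  # False
```

The name attribute carries three times the weight of city, so the case mismatch on city lowers the global score only to (3 × 1 + 1 × 0) / 4 = 0.75.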

[Figure: matching_measures_example_tmatchgroup.png, a match rule example in tMatchGroup]

In this example, the score for the Levenshtein metric on the country attribute must
exceed 0.7, and the global score, with a confidence weight of 1 on each of the two
measures, must exceed 0.85.
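The example rule can be expressed as a small Python check over two precomputed scores. This is a sketch, not Talend code. The function name example_rule_matches is hypothetical. The second measure is not named in the text, so it appears only as other_score and is assumed to have no per-function threshold. As in the rule description, comparisons are treated as inclusive, which is an assumption.

```python
COUNTRY_THRESHOLD = 0.7  # per-function threshold on the Levenshtein score for country
MATCH_THRESHOLD = 0.85   # threshold on the global score
WEIGHTS = (1.0, 1.0)     # confidence weight of 1 on each of the two measures


def example_rule_matches(country_score: float, other_score: float) -> bool:
    """Apply the example match rule to two precomputed scores in [0, 1].

    other_score stands for the second, unnamed measure; it is assumed to
    have no per-function threshold of its own.
    """
    # Condition 1: the Levenshtein score on country must reach its threshold.
    if country_score < COUNTRY_THRESHOLD:
        return False
    # Condition 2: the weighted average of both scores must reach the match threshold.
    w_country, w_other = WEIGHTS
    global_score = (w_country * country_score + w_other * other_score) / (w_country + w_other)
    return global_score >= MATCH_THRESHOLD


print(example_rule_matches(0.8, 0.95))  # True (global score 0.875)
print(example_rule_matches(0.8, 0.85))  # False (global score about 0.825)
```

Assuming that scores lie in [0, 1], the other score can contribute at most 1. With equal weights, a global score of at least 0.85 therefore already forces a country score of at least 0.7. In this particular configuration, the per-function threshold never rejects a pair that the global threshold would accept.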

[Figure: matching_measures_example2_tmatchgroup.png, global score computation for two records in tMatchGroup]

This example shows the weighted average computation that yields the global score of
two similar records.
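For a worked instance of this computation, assume hypothetical scores of $s_1 = 0.75$ for the Levenshtein measure on country and $s_2 = 1.0$ for the second measure, with the confidence weights of 1 from the example. These values are illustrative and are not the ones shown in the figure:

```latex
\text{global score}
  = \frac{w_1 \, s_1(r_1, r_2) + w_2 \, s_2(r_1, r_2)}{w_1 + w_2}
  = \frac{1 \times 0.75 + 1 \times 1.0}{1 + 1}
  = \frac{1.75}{2}
  = 0.875
```

Since 0.75 exceeds the 0.7 threshold on country and 0.875 exceeds the 0.85 match threshold, these two hypothetical records would match.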


Document retrieved from Talend: https://help.talend.com