August 15, 2023

Scenario 1: Grouping output data in separate flows according to the minimal distance computed in each record – Docs for ESB 6.x

This scenario applies only to a subscription-based Talend Platform solution or Talend Data Fabric.

This scenario describes a basic Job that compares columns in the input file, using the
Jaro-Winkler matching method on the lname and fname columns and the q-grams matching
method on the address1 column. It then groups the output records into three output flows:

  • Uniques: lists the records whose group score (the minimal distance computed
    in the record) is equal to 1.

  • Matches: lists the records whose group score (the minimal distance computed
    in the record) is higher than the threshold you define in the
    Confidence threshold field.

  • Suspects: lists the records whose group score (the minimal distance computed
    in the record) is below the threshold you define in the
    Confidence threshold field.
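
The routing rule behind the three flows can be sketched in a few lines of Python. This is only an illustration of the rule as described above, not the code the Job generates. The function name, the group_score field and the sample records are hypothetical. The text does not say where a record goes when its score is exactly equal to the threshold, so the sketch assumes it goes to Matches.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]


def route_by_group_score(records: List[Record], threshold: float) -> Dict[str, List[Record]]:
    """Split records into uniques, matches and suspects by group score.

    'group_score' is a hypothetical field name standing in for the
    minimal distance computed in each record.
    """
    flows: Dict[str, List[Record]] = {"uniques": [], "matches": [], "suspects": []}
    for record in records:
        score = record["group_score"]
        if score == 1.0:
            # A score of exactly 1 is tested first, since it is also above any threshold below 1.
            flows["uniques"].append(record)
        elif score >= threshold:
            # Assumption: a score equal to the threshold is treated as a match.
            flows["matches"].append(record)
        else:
            flows["suspects"].append(record)
    return flows


# Hypothetical sample data, for illustration only.
sample = [
    {"lname": "Smith", "fname": "John", "group_score": 1.0},
    {"lname": "Smyth", "fname": "Jon", "group_score": 0.92},
    {"lname": "Smit", "fname": "Joan", "group_score": 0.71},
]

flows = route_by_group_score(sample, threshold=0.9)
for name, rows in flows.items():
    print(name, [row["lname"] for row in rows])
```

The order of the tests matters: a record is checked for the Uniques flow before its score is compared with the threshold, so each record lands in exactly one flow.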

use_case_tmatchgroup.png

For another scenario that groups the output records in one single output flow, see
Scenario 2: Comparing columns and grouping in the output flow duplicate records that have the same functional key.


Document sourced from Talend: https://help.talend.com