Scenario 1: Grouping output data in separate flows according to the minimal distance computed in each record
This scenario applies only to a subscription-based Talend Platform solution or Talend Data Fabric.
This scenario describes a Job that compares records using the Jaro-Winkler matching method on the lname and fname columns and the q-grams matching method on the address1 column. It then groups the output records into three output flows:
- Uniques: lists the records whose group score (the minimal distance computed in the record) is equal to 1.
- Matches: lists the records whose group score is higher than the threshold you define in the Confidence threshold field.
- Suspects: lists the records whose group score is below the threshold you define in the Confidence threshold field.
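The routing rule in the list above can be sketched in a few lines. This is a minimal illustration, not Talend's generated code: the `Record`, `Flows`, and `route` names are hypothetical, and the handling of a score exactly equal to the threshold (sent to Matches here) is an assumption, because the list only covers scores higher or lower than the threshold.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """Hypothetical output record carrying the group score computed for it."""
    key: str
    group_score: float  # minimal distance computed in the record, in [0, 1]


@dataclass
class Flows:
    """The three output flows: Uniques, Matches and Suspects."""
    uniques: list = field(default_factory=list)
    matches: list = field(default_factory=list)
    suspects: list = field(default_factory=list)


def route(records, confidence_threshold):
    """Split records into the three flows according to their group score."""
    flows = Flows()
    for rec in records:
        if rec.group_score == 1.0:
            # A score of 1 always means Uniques, checked before the threshold.
            flows.uniques.append(rec)
        elif rec.group_score >= confidence_threshold:
            # Assumption: a score equal to the threshold is treated as a match.
            flows.matches.append(rec)
        else:
            flows.suspects.append(rec)
    return flows


if __name__ == "__main__":
    sample = [Record("a", 1.0), Record("b", 0.95), Record("c", 0.7)]
    result = route(sample, confidence_threshold=0.9)
    print([r.key for r in result.uniques])   # ['a']
    print([r.key for r in result.matches])   # ['b']
    print([r.key for r in result.suspects])  # ['c']
```

Testing for a score of 1 first keeps unique records out of Matches, since 1 is also higher than any threshold below 1.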
For another scenario that groups the output records in a single output flow, see
Scenario 2: Comparing columns and grouping in the output flow duplicate records that have the same functional key.
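To make the two matching methods named at the start of this scenario more concrete, the sketch below implements the textbook Jaro-Winkler similarity and one common q-gram similarity (trigrams, normalized by the total number of q-grams). It is an illustration under stated assumptions, not Talend's implementation: the function names are hypothetical, and Talend's q-grams method may pad strings or normalize the score differently.

```python
from collections import Counter


def jaro(s1, s2):
    """Classic Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    # Count characters that match within the sliding window.
    for i, c in enumerate(s1):
        for j in range(max(0, i - window), min(i + window + 1, len2)):
            if not matched2[j] and s2[j] == c:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order / 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1, s2, p=0.1, max_prefix=4):
    """Jaro similarity boosted by the length of the common prefix (up to 4)."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:max_prefix], s2[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return j + prefix * p * (1 - j)


def qgrams(s, q=3):
    """Multiset of the overlapping substrings of length q (no padding)."""
    return Counter(s[i:i + q] for i in range(len(s) - q + 1))


def qgram_similarity(s1, s2, q=3):
    """1 minus the q-gram distance divided by the total number of q-grams."""
    g1, g2 = qgrams(s1, q), qgrams(s2, q)
    total = sum(g1.values()) + sum(g2.values())
    if total == 0:
        return 1.0 if s1 == s2 else 0.0
    distance = sum(((g1 - g2) + (g2 - g1)).values())
    return 1.0 - distance / total


if __name__ == "__main__":
    print(round(jaro_winkler("MARTHA", "MARHTA"), 4))  # 0.9611
    print(qgram_similarity("abcd", "abce"))            # 0.5
```

Jaro-Winkler favours strings that share a prefix, which suits short values such as first and last names, while q-grams compare overlapping substrings and tolerate reordered or inserted words, which suits longer values such as addresses.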