Scenario: Labeling suspect pairs with assigned labels
This scenario applies only to a subscription-based Talend Platform solution with Big Data or to Talend Data Fabric.
For further information about the two workflows used when
matching with Spark, see the documentation on Talend Help Center (https://help.talend.com).
The use case described here uses:
- a tFileInputDelimited component to read the input suspect pairs generated by tMatchPairing;
- a tMatchPredict component to label the suspect records automatically and to group together the suspect records that match the label set in the component properties; and
- a tFileOutputDelimited component to output the labeled duplicate records and the groups created from the suspect records that match the label set in the tMatchPredict properties.
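The read, label, group and write flow of this Job can be illustrated outside Talend. The following minimal Python sketch is illustrative only and does not reproduce how tMatchPredict works internally. It assumes a hypothetical input schema (`id_a`, `id_b`, `score`), replaces the trained classification model with a simple score threshold (the hypothetical `predict_label` function), and groups the records connected by pairs that carry the chosen label.

```python
import csv
import io

# Hypothetical suspect pairs; the column names (id_a, id_b, score) are
# assumptions for illustration, not the actual tMatchPairing output schema.
PAIRS_CSV = """id_a,id_b,score
1,2,0.95
2,3,0.91
4,5,0.40
6,7,0.88
"""


def predict_label(score: float, threshold: float = 0.85) -> str:
    """Hypothetical stand-in for a trained model: a plain score threshold."""
    return "MATCH" if score >= threshold else "NO_MATCH"


def find(parent: dict, x: str) -> str:
    # Union-find lookup with path halving.
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def label_and_group(pairs_csv: str, target_label: str = "MATCH"):
    """Label every pair, then group records linked by pairs with target_label."""
    labeled = []
    parent = {}
    for row in csv.DictReader(io.StringIO(pairs_csv)):
        label = predict_label(float(row["score"]))
        labeled.append({**row, "label": label})
        if label == target_label:
            a, b = row["id_a"], row["id_b"]
            parent.setdefault(a, a)
            parent.setdefault(b, b)
            parent[find(parent, a)] = find(parent, b)  # merge the two groups
    groups = {}
    for record in parent:
        groups.setdefault(find(parent, record), []).append(record)
    return labeled, sorted(sorted(g) for g in groups.values())


def write_output(labeled, out) -> None:
    """Write the labeled pairs as delimited text, like a file output step."""
    writer = csv.DictWriter(out, fieldnames=["id_a", "id_b", "score", "label"])
    writer.writeheader()
    writer.writerows(labeled)


if __name__ == "__main__":
    labeled, groups = label_and_group(PAIRS_CSV)
    buf = io.StringIO()
    write_output(labeled, buf)
    print(buf.getvalue())
    print(groups)
```

In this sketch, union-find places transitively matched records in one group: the pairs 1–2 and 2–3 yield the group {1, 2, 3}, and the pair 4–5, which falls below the threshold, creates no group.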