August 15, 2023

Scenario 2: Computing suspect pairs and suspect sample from source data – Docs for ESB 6.x

This scenario applies only to a subscription-based Talend Platform solution with Big Data or Talend Data Fabric.

In this example, tMatchPairing uses a blocking key to compute the
pairs of suspect duplicates in a list of early childhood education centers in
Chicago.
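The idea of a blocking key can be illustrated with a minimal Python sketch. It is not how tMatchPairing is implemented: the key definition (first three letters of the name plus the ZIP code), the field names and the sample records are all assumptions made for illustration. Records that share a key land in the same block, and only records within a block are paired as suspect duplicates.

```python
from collections import defaultdict
from itertools import combinations


def blocking_key(record):
    """Hypothetical key: first three alphanumeric characters of the name plus the ZIP code."""
    name = "".join(ch for ch in record["name"].lower() if ch.isalnum())
    return name[:3] + "|" + record["zip"]


def suspect_pairs(records):
    """Return sorted pairs of record ids that share a blocking key."""
    blocks = defaultdict(list)
    for record in records:
        blocks[blocking_key(record)].append(record["id"])
    pairs = []
    for ids in blocks.values():
        # Only records inside the same block are compared with each other.
        pairs.extend(combinations(sorted(ids), 2))
    return sorted(pairs)


# Made-up records for illustration; not taken from the Chicago data set.
records = [
    {"id": 1, "name": "Little Stars Academy", "zip": "60601"},
    {"id": 2, "name": "Little Stars Acad.", "zip": "60601"},
    {"id": 3, "name": "Bright Horizons", "zip": "60614"},
    {"id": 4, "name": "Litle Stars Academy", "zip": "60601"},
]
print(suspect_pairs(records))  # [(1, 2), (1, 4), (2, 4)]
```

Blocking keeps the number of comparisons small: record 3 is never compared with the others because its key differs.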

The use case described here uses:

  • a tFileInputDelimited component to read the source file,
    which contains a list of early childhood education centers in Chicago coming
    from ten different sources;

  • a tMatchPairing component to pre-analyze the data, compute
    pairs of suspect duplicates, and generate a pairing model for use by the
    tMatchPredict component;

  • three tFileOutputDelimited
    components to output the suspect duplicates, a sample of suspect pairs and the
    unique records; and

  • a tLogRow component to
    output the exact duplicates.
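The four outputs listed above can be pictured with a rough Python sketch. It only mimics the shape of the data flow, under stated assumptions: the `route` function, the tuple layout of the rows, the key function and the random sampling strategy are hypothetical and do not describe the internals of tMatchPairing.

```python
import random
from collections import defaultdict
from itertools import combinations


def route(rows, key, sample_size, seed=0):
    """Split rows into exact duplicates, suspect pairs, a pair sample and unique rows."""
    seen = set()
    exact_duplicates, distinct = [], []
    for row in rows:
        if row in seen:
            # An identical row was already read: report it as an exact duplicate.
            exact_duplicates.append(row)
        else:
            seen.add(row)
            distinct.append(row)

    # Group the remaining rows by the (hypothetical) blocking key.
    blocks = defaultdict(list)
    for row in distinct:
        blocks[key(row)].append(row)
    pairs = [pair for block in blocks.values() for pair in combinations(block, 2)]

    # Rows that appear in no suspect pair are treated as unique.
    paired = {row for pair in pairs for row in pair}
    unique_rows = [row for row in distinct if row not in paired]

    # Assumption: the sample is a plain random subset of the suspect pairs.
    rng = random.Random(seed)
    sample = rng.sample(pairs, min(sample_size, len(pairs)))
    return exact_duplicates, pairs, sample, unique_rows


# Made-up rows (name, ZIP code) for illustration only.
rows = [
    ("Little Stars Academy", "60601"),
    ("Little Stars Academy", "60601"),
    ("Little Stars Acad.", "60601"),
    ("Bright Horizons", "60614"),
]
duplicates, pairs, sample, uniques = route(
    rows, key=lambda r: r[0][:3].lower() + r[1], sample_size=1
)
print(duplicates)
print(pairs)
print(uniques)
```

In the Job itself, each of these results goes to its own component: the suspect pairs, the sample and the unique records to the tFileOutputDelimited components, and the exact duplicates to tLogRow.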


Source: Talend documentation, https://help.talend.com