Scenario: Comparing four columns using different matching methods and collecting
encountered duplicates
This scenario applies only to a subscription-based Talend Platform solution or Talend Data Fabric.
This scenario describes a four-component job that collects, in two separate files,
all unique entries and all duplicate entries found in a few selected columns, based on the
Levenshtein and Double Metaphone matching types.
The input file in this example looks like the following:
ID;Status;FirstName;Email;City;Initial;ZipCode
1;married;Paul;pnewman@comp.com;New York;P.N.;55677
2;single;Raul;rnewman@comp.com;New Ork;R.N.;55677
3;single;Mary;mnewman@comp.com;Chicago;M.N;66898
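To illustrate what Levenshtein-based matching does with this sample, here is a minimal Python sketch. It is not Talend's implementation: the function names (`levenshtein`, `similarity`, `split_rows`), the case-insensitive comparison, the normalization of the distance to a 0 to 1 score, and the 0.8 threshold are all assumptions chosen for the example. It also compares a single column only and leaves out Double Metaphone entirely.

```python
import csv
import io

# Sample input copied from the scenario (semicolon-separated).
SAMPLE = """\
ID;Status;FirstName;Email;City;Initial;ZipCode
1;married;Paul;pnewman@comp.com;New York;P.N.;55677
2;single;Raul;rnewman@comp.com;New Ork;R.N.;55677
3;single;Mary;mnewman@comp.com;Chicago;M.N;66898
"""


def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of insertions, deletions, substitutions."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Assumed scoring: case-insensitive, 1.0 means identical strings."""
    a, b = a.casefold(), b.casefold()
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def split_rows(rows, column, threshold):
    """Return (uniques, duplicates). A row counts as a duplicate when its
    `column` value scores at least `threshold` against an earlier unique row."""
    uniques, duplicates = [], []
    for row in rows:
        if any(similarity(row[column], kept[column]) >= threshold
               for kept in uniques):
            duplicates.append(row)
        else:
            uniques.append(row)
    return uniques, duplicates


rows = list(csv.DictReader(io.StringIO(SAMPLE), delimiter=";"))
uniques, duplicates = split_rows(rows, column="City", threshold=0.8)
print([r["ID"] for r in uniques])     # ['1', '3']
print([r["ID"] for r in duplicates])  # ['2']
```

With these assumptions, "New Ork" is one edit away from "New York" (score 0.875), so row 2 lands in the duplicates list. Comparing `FirstName` instead gives "Paul" and "Raul" a score of 0.75, which falls below the 0.8 threshold. That shows how sensitive the outcome is to the chosen column and threshold.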
Document obtained from Talend: https://help.talend.com