Scenario 2: Creating a synonym index for people names using tMap
This scenario applies only to a subscription-based Talend Platform solution or Talend Data Fabric.
In this scenario, a four-component Job creates an index that stores people’s first names
and their corresponding nicknames.
The source data to be used in this scenario is stored in a .csv
file, an extract of which is shown below:
Country;FirstName;Nickname1;Nickname2;Nickname3;Nickname4
France;Anne;Ninon;Annie;Ninette;Ann
France;Bernadette;Nad;Netty;Dadette
France;Albert;Al
France;Alexandre;Alex
France;Alfred-Hubert;Alu
France;Andrew;Andy
France;Anthony;Anton;Tony;Tonio
France;Artus;Artie
France;Benoit;Ben
France;Catherine;Cate;Katherine;Kathryn
France;Charles;Charlie;Charlot;Chuck
France;Christophe;Christian;Chris;Kris;Kristof
France;Christian;Chris
This data describes people’s home country (not to be inserted into the index), first
names (reference entries) and frequently used nicknames (synonyms).
The four components used in this Job are:
- tFileInputDelimited: this component reads the source data and passes it to tMap.
- tMap: this component transforms the source data into two separate columns, one for the
  first names and one for the nicknames, while ignoring the people’s home country
  information.
- tSynonymOutput: this component creates the index of interest in this scenario and feeds
  it with the synonyms given in the source file.
- tLogRow: this component lists the data that has been inserted into the newly created
  index.
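
To make the reshaping performed by tMap concrete, the following Python sketch reproduces it outside Talend. It is not the code the Job generates, only an illustration under stated assumptions: the function name to_index_rows is hypothetical, and joining the nicknames with "|" is an assumed convention for this example, not a documented setting of tSynonymOutput. The sample holds the header and two records copied from the extract above.

```python
import csv
import io

# Header line plus two records copied from the source extract.
SAMPLE = """Country;FirstName;Nickname1;Nickname2;Nickname3;Nickname4
France;Anne;Ninon;Annie;Ninette;Ann
France;Albert;Al
"""


def to_index_rows(text, separator="|"):
    """Return (first_name, synonyms) pairs, dropping the Country column.

    The separator is an assumption made for this sketch.
    """
    reader = csv.reader(io.StringIO(text), delimiter=";")
    next(reader)  # skip the header line
    rows = []
    for record in reader:
        if len(record) < 2:
            continue  # skip blank or malformed lines
        first_name = record[1]
        # Keep only the nickname columns that are actually filled in.
        nicknames = [name for name in record[2:] if name]
        rows.append((first_name, separator.join(nicknames)))
    return rows


for word, synonyms in to_index_rows(SAMPLE):
    print(word, "->", synonyms)
```

Each resulting pair corresponds to one entry of the index: the first name is the reference entry and the joined nicknames are its synonyms. Records with fewer than four nicknames, such as the one for Albert, simply yield a shorter synonym string.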