tFirstnameMatch
Matches first names against a reference index in order to standardize
data.
This component is available in Talend Data Management Platform, Talend Big Data Platform, Talend Real Time Big Data Platform, Talend Data Services Platform, Talend MDM Platform and Talend Data Fabric.
tFirstnameMatch compares the first name column from the input flow with
first names in an embedded reference index and outputs the matching first names.
It does not support Chinese characters.
The index covers about 162 countries and, for some of them, holds more than
1000 reference first names.
tFirstnameMatch checks first names against an index file embedded
in the component itself. It searches the index file according to the input gender and
input country you specify in the component settings. When you do not use gender and
country as a search basis, first names are searched across the entire index,
regardless of country.
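The lookup described above can be pictured with a minimal sketch. It is an illustration only: the tiny REFERENCE_INDEX, the match_first_name function and the case-insensitive comparison are all assumptions made for this example, and the layout of the real embedded index is not described here.

```python
from typing import Optional

# Hypothetical miniature index: (first name, ISO3 country code, gender).
# The real index is embedded in the component; this layout is an assumption.
REFERENCE_INDEX = [
    ("John", "USA", "M"),
    ("Mary", "USA", "F"),
    ("Jean", "FRA", "M"),
    ("Jean", "USA", "F"),
    ("Sean", "IRL", "M"),
]


def match_first_name(name: str,
                     gender: Optional[str] = None,
                     country: Optional[str] = None) -> Optional[str]:
    """Return the reference first name equal to `name`, or None.

    When `gender` or `country` is None, that criterion is ignored, so the
    search runs over every entry of the index.
    Case-insensitive comparison is an assumption of this sketch.
    """
    wanted = name.strip().lower()
    for ref_name, ref_country, ref_gender in REFERENCE_INDEX:
        if gender is not None and ref_gender != gender:
            continue  # skip entries of the other gender
        if country is not None and ref_country != country:
            continue  # skip entries of other countries
        if ref_name.lower() == wanted:
            return ref_name
    return None
```

Calling match_first_name("sean") searches all countries, while match_first_name("sean", country="USA") restricts the search to entries tagged USA in this made-up index.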
The index file has reference first names for about 162 countries. Some of the countries
listed in the index have more than 1000 reference first names. Such countries include
USA, GBR, AUS, IRL, CAN, FRA, NZL, CHE and NLD. For example, the index file has more
than 8000 American first names, more than 4000 British first names, more than 2000
Australian first names and so on.
Other countries have fewer than 1000 reference first names stored in the index file.
For such countries, it is advisable not to select a country column, so that the input
first name is checked against the reference first names of all countries in the index
file.
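The advice above amounts to a simple decision rule, sketched below. The counts are illustrative placeholders, "XXX" is a made-up country code, and country_filter is a hypothetical helper; none of these come from the component itself.

```python
from typing import Dict, Optional

# Illustrative counts only; the real figures live in the embedded index.
# "XXX" is a made-up code standing in for a sparsely covered country.
REFERENCE_COUNTS = {"USA": 8000, "GBR": 4000, "AUS": 2000, "XXX": 150}


def country_filter(country: str,
                   counts: Dict[str, int] = REFERENCE_COUNTS,
                   threshold: int = 1000) -> Optional[str]:
    """Return `country` when its coverage exceeds `threshold`.

    Returning None means "do not filter by country", that is, check the
    first name against the reference names of all countries.
    """
    if counts.get(country, 0) > threshold:
        return country
    return None
```

In a Job, the equivalent choice is simply whether or not to select the Use Country check box for a given data set.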
tFirstnameMatch Standard properties
These properties are used to configure tFirstnameMatch running in the Standard Job framework.
The Standard tFirstnameMatch component belongs to the Data Quality family.
Basic settings
Schema and Edit |
A schema is a row description, it defines the number of fields to be processed and One read-only column, FIRSTNAMEMATCH is |
 |
Built-in: The schema will be |
 |
Repository: The schema already |
First Names |
Select the column that contains first names. |
Use Gender |
Optional parameter: select this check box and then, from the list, select the column that contains the gender. Expected genders are M (masculine) and F (feminine). |
Use Country |
Optional parameter: select this check box and then, from the list, select the column that contains the country. |
Fuzzy Search |
Select this check box if you want to get the best match possible, in case several matches are available. |
Advanced settings
tStatCatcher Statistics |
Select this check box to gather the processing metadata at the Job |
Global Variables
Global Variables |
ERROR_MESSAGE: the error message generated by the A Flow variable functions during the execution of a component while an After variable To fill up a field or expression with a variable, press Ctrl + For further information about variables, see |
Usage
Usage rule |
This component is not startable and it requires input and output |
Limitation/prerequisite |
The index used to standardize the first names is embedded in this |
Matching first names with a reference index
This scenario applies only to Talend Data Management Platform, Talend Big Data Platform, Talend Real Time Big Data Platform, Talend Data Services Platform, Talend MDM Platform and Talend Data Fabric.
This scenario describes a four-component Job that matches the
name column of an input flow against the reference index.
The output of this first name match is displayed in the FIRSTNAMEMATCH
output column along with all other columns defined in the input schema of the
tFirstnameMatch component.
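As a rough picture of the resulting row shape, the sketch below appends a FIRSTNAMEMATCH field to a copy of an input row. The add_match_column helper and the sample row are hypothetical, and using None for "no match" is an assumption; the value the component actually emits in that case is not stated here.

```python
from typing import Dict, Optional


def add_match_column(row: Dict[str, str],
                     match: Optional[str]) -> Dict[str, Optional[str]]:
    """Return a copy of `row` with FIRSTNAMEMATCH appended as the last column.

    All input columns are kept unchanged and in their original order.
    None as the "no match" value is an assumption of this sketch.
    """
    out: Dict[str, Optional[str]] = dict(row)
    out["FIRSTNAMEMATCH"] = match
    return out


# Hypothetical input row and match result, for illustration only.
INPUT_ROW = {"name": "jon", "gender": "M"}
OUTPUT_ROW = add_match_column(INPUT_ROW, "Jon")
```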
Dropping the components and linking them together
- Drop the following components from the Palette to the design workspace: tFixedFlowInput, tFilterColumns, tFirstnameMatch and tLogRow.
- Connect the first three components using Row > Main links.
- Connect tFirstnameMatch to tLogRow using a Row > Output link.
Configuring the input data
- Double-click tFixedFlowInput to display the Basic settings view and define the component properties.
- Click the […] button next to Edit schema to open a dialog box and add as many columns as needed to the input schema.
  In this example, the input data flow is made of several columns including one for first names (name), two for country codes (iso2 and iso3) and one for gender (gender).
- In the Mode area, select the Use Inline Content (delimited file) option to display the corresponding view.
- Set the row and field separators in the corresponding fields, if any.
- In the Content area, type in the data for the input flow according to the schema you defined earlier.
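To make the relationship between the schema, the separators and the inline content concrete, here is a minimal sketch of how delimited content maps onto the four example columns. The sample rows, the parse_inline_content helper and the separator values (newline for rows, semicolon for fields) are assumptions chosen for illustration, not data from this scenario.

```python
from typing import Dict, List

# Column names taken from the example schema in this scenario.
COLUMNS = ["name", "iso2", "iso3", "gender"]

# Hypothetical inline content: one record per row, fields separated by ";".
CONTENT = "Jon;US;USA;M\nMarie;FR;FRA;F\n"


def parse_inline_content(content: str,
                         columns: List[str],
                         row_sep: str = "\n",
                         field_sep: str = ";") -> List[Dict[str, str]]:
    """Split delimited content into rows keyed by the schema column names."""
    rows = []
    for line in content.split(row_sep):
        if not line.strip():
            continue  # ignore empty rows such as the one after a trailing separator
        values = line.split(field_sep)
        rows.append(dict(zip(columns, values)))
    return rows


ROWS = parse_inline_content(CONTENT, COLUMNS)
```

Each row must supply its fields in the same order as the schema columns, which is why the content is typed in after the schema has been defined.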
Configuring the process of matching data
- Click the tFilterColumns component to display its Basic settings view and define the component properties.
- Click the […] button next to Edit schema to open a dialog box.
- Select the name and gender columns from the input schema and move them to the output schema.
- Click OK to validate your changes and close the dialog box.
- Click tFirstnameMatch to display the Basic settings view and define the component properties.
- Click the […] button next to Edit schema to view the input and output schemas, and then click OK to close the dialog box.
  The output schema of this component is the same as the input schema plus one fixed column: FIRSTNAMEMATCH.
- From the First Names list, select the column that holds the first names, name in this example.
- If required, select the Use Gender or the Use Country check box and, from the list, select the column that contains the gender or country respectively.
  This will optimize system performance and will give more precise results.
- If required, select the Fuzzy Search check box to get the best possible first-name match when several matches are available.
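The idea of picking a single best match among several close candidates can be sketched as follows. This is a stand-in, not the component's algorithm: the matching method used by Fuzzy Search is not described here, so the example swaps in the similarity ratio from Python's standard difflib module. The REFERENCE_NAMES list, the best_match helper and the 0.6 cutoff are assumptions.

```python
import difflib
from typing import List, Optional

# Hypothetical reference names; the real ones are in the embedded index.
REFERENCE_NAMES = ["John", "Jon", "Joan", "Mary", "Marie"]


def best_match(name: str,
               candidates: List[str] = REFERENCE_NAMES,
               cutoff: float = 0.6) -> Optional[str]:
    """Return the single closest reference name, or None if nothing is close.

    Uses difflib's similarity ratio as a stand-in for the component's own
    fuzzy logic, which is not documented in this page.
    """
    # Compare in lower case, then map back to the reference spelling.
    by_lower = {c.lower(): c for c in candidates}
    hits = difflib.get_close_matches(name.lower(), list(by_lower), n=1, cutoff=cutoff)
    return by_lower[hits[0]] if hits else None
```

With this made-up list, best_match("Marry") resolves to the reference spelling Mary, while an input that resembles no reference name returns None.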
Executing the Job
- Click the tLogRow component to display the Basic settings view and define its properties according to the display mode you prefer.
- In the Mode area, select Table (print values in cells of a table).
- Save the Job and press F6 to execute it.
All the output columns, including FIRSTNAMEMATCH, are listed in the
Run console. The FIRSTNAMEMATCH column
shows the best possible match for each first name.