Warning
This component is available in the Palette of Talend Studio only if
you have subscribed to one of the Talend Platform products.
Component family |
Data Quality |
|
Function |
tFirstnameMatch compares first names against a reference index embedded in the component. This index has first names for about 162 countries. |
|
Purpose |
Helps ensure the data quality of first names by checking them against a reference index. |
|
Basic settings |
Schema and Edit schema |
A schema is a row description that defines the number of fields to be processed. Since version 5.6, both the Built-In mode and the Repository mode are available. One read-only column, FIRSTNAMEMATCH, is added to the output schema. |
|
|
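Built-in: The schema is created and stored locally for this component only. |
|
|
Repository: The schema already exists and is stored in the Repository, so it can be reused. |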
|
First Names |
Select the column that contains first names. |
|
Use Gender |
Optional parameter: select this check box and then, from the list, select the column that contains the gender. Expected genders are M (masculine) and F (feminine). |
|
Use Country |
Optional parameter: select this check box and then, from the list, select the column that contains the country. |
|
Fuzzy Search |
Select this check box if you want to get the best match possible, in case several matches are available. |
Advanced settings |
tStatCatcher Statistics |
Select this check box to gather the processing metadata at the Job level as well as at each component level. |
Global Variables |
ERROR_MESSAGE: the error message generated by the component when an error occurs. A Flow variable functions during the execution of a component while an After variable functions after the execution of the component. To fill up a field or expression with a variable, press Ctrl + Space to access the variable list. For further information about variables, see Talend Studio User Guide. |
|
Usage |
This component is not startable and it requires input and output components. |
|
Limitation/prerequisite |
The index used to standardize the first names is embedded in this component. |
tFirstnameMatch checks first names against an index
file embedded in the component itself. This component searches first names in the index
file according to the input gender and input country you specify in the component
settings. When you do not use the gender and country as a search basis, first names are
searched throughout the entire index, regardless of country.
The index file has reference first names for about 162 countries. Some of the
countries listed in the index have more than 1000 reference first names. Such countries
include USA, GBR, AUS, IRL, CAN, FRA, NZL, CHE and NLD. For example, the index file has
more than 8000 American first names, more than 4000 British first names, more than 2000
Australian first names and so on.
Some other countries have fewer than 1000 reference first names stored in the index
file. For such countries, it is advisable not to select a country column so that the
input first name is checked against all reference first names of all countries in the
index file.
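The scoping described above can be pictured with a short sketch. It is only an illustration: the component's internal lookup is not documented here, and the miniature index, the `candidates` helper and the `exact_match` helper are hypothetical names and data invented for this example.

```python
# Hypothetical miniature index: (first name, ISO3 country code, gender).
# These entries are illustrative only, not taken from the component's index.
SAMPLE_INDEX = [
    ("John", "USA", "M"),
    ("Mary", "USA", "F"),
    ("Jean", "FRA", "M"),
    ("Jean", "USA", "F"),
    ("Marie", "FRA", "F"),
]


def candidates(index, country=None, gender=None):
    """Return the reference names to search, narrowed by optional filters.

    With no country or gender given, every entry in the index is a candidate.
    """
    return [
        name
        for name, entry_country, entry_gender in index
        if (country is None or entry_country == country)
        and (gender is None or entry_gender == gender)
    ]


def exact_match(name, index, country=None, gender=None):
    """Return the reference name equal to `name` (ignoring case), or None."""
    for reference in candidates(index, country, gender):
        if reference.lower() == name.lower():
            return reference
    return None
```

In this sketch, passing a country with few reference names shrinks the candidate list, which is why leaving the country out can be the better choice for sparsely covered countries.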
This scenario describes a four-component Job that matches the
name column of an input flow against the reference index.
The output of this first name match is displayed in the
FIRSTNAMEMATCH output column along with all other columns
defined in the input schema of the tFirstnameMatch
component.
To drop and link the components of interest, proceed as follows:
- Drop the following components from the Palette to the design workspace: tFixedFlowInput, tFilterColumns, tFirstnameMatch and tLogRow.
- Connect the first three components using Row > Main links.
- Connect tFirstnameMatch to tLogRow using a Row > Output link.
To configure the input data, perform the following operations:
- Double-click tFixedFlowInput to display the Basic settings view and define the component properties.
- From the Schema list, set the schema type to Built-In and click the three-dot button next to Edit Schema. A dialog box opens.
- Click the plus button to add as many lines as needed for the input schema you want to create from internal variables.
  In this example, the input data flow is made of several columns including one for first names (name), two for country codes (iso2 and iso3) and one for gender (gender).
- Click OK to close the dialog box. The defined columns display in the Mode area of the component basic settings view.
- In the Mode area, select the Use Inline Content (delimited file) option to display the corresponding view.
- Set the row and field separators to use in your input flow in the corresponding fields.
- In the Content area, type in the data for the input flow according to the schema you defined earlier.
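As a rough picture of what inline delimited content amounts to, the sketch below splits a block of text into rows using a row separator and a field separator. It is not Talend code: the `parse_inline_content` function, the choice of ";" and a line break as separators, and the two sample rows are assumptions made for this example, and the schema is reduced to the four columns named above.

```python
def parse_inline_content(content, columns, row_separator="\n", field_separator=";"):
    """Split delimited text into a list of rows keyed by column name."""
    rows = []
    for line in content.split(row_separator):
        if not line.strip():
            continue  # skip empty rows, such as the one after a trailing separator
        values = [value.strip() for value in line.split(field_separator)]
        if len(values) != len(columns):
            raise ValueError(
                f"expected {len(columns)} fields, got {len(values)}: {line!r}"
            )
        rows.append(dict(zip(columns, values)))
    return rows


# Columns from the scenario; the sample rows are hypothetical.
SCHEMA = ["name", "iso2", "iso3", "gender"]
SAMPLE_CONTENT = "Jon;US;USA;M\nMarie;FR;FRA;F\n"

rows = parse_inline_content(SAMPLE_CONTENT, SCHEMA)
```

The field count check reflects the requirement that the typed content follow the schema defined earlier: a row with a missing or extra field is rejected rather than silently misaligned.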
To match the first names against the index, select the data columns of interest and
then configure tFirstnameMatch as follows:
- Click the tFilterColumns component to display its Basic settings view and define the component properties.
  The tFilterColumns component enables you to build the output schema based on the column names of the input schema.
- Click the three-dot button next to Edit schema to display a dialog box where you can define the output schema.
- Select the name and gender columns from the input schema and move them to the output schema.
- Click OK to validate your changes and close the dialog box.
- Click tFirstnameMatch to display the Basic settings view and define the component properties.
- If required, click the three-dot button next to Edit schema to view the input and output schemas, and then click OK to close the dialog box.
  Note: The output schema of this component is the same as the input schema plus one fixed column: FIRSTNAMEMATCH.
- From the First Names list, select the column that holds the first names, name in this example.
- If required, select the Use Gender or Use Country check box and then select from the list the column that contains the gender or country respectively. This optimizes system performance and gives more precise results.
- If required, select the Fuzzy Search check box to get the best possible first-name match in case several matches are available.
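The two configuration stages above (keeping selected columns, then appending a FIRSTNAMEMATCH column) can be sketched as follows. This is a minimal illustration under stated assumptions, not the component's implementation: the reference names are hypothetical, the fuzzy step uses Python's `difflib.get_close_matches` as a stand-in because the component's own fuzzy algorithm is not described here, and returning `None` when nothing matches is likewise an assumption.

```python
import difflib

# Hypothetical reference names standing in for the embedded index.
REFERENCE_NAMES = ["John", "Jonathan", "Marie", "Mary"]


def filter_columns(row, keep):
    """Keep only the listed columns, as tFilterColumns does for a schema."""
    return {column: row[column] for column in keep}


def first_name_match(name, reference_names, fuzzy=False):
    """Return the matching reference name, or None when nothing matches.

    An exact, case-insensitive match is tried first. With fuzzy=True, the
    closest reference name by difflib similarity is used as a fallback
    (an assumed stand-in for the component's Fuzzy Search option).
    """
    by_lower = {reference.lower(): reference for reference in reference_names}
    exact = by_lower.get(name.lower())
    if exact is not None or not fuzzy:
        return exact
    close = difflib.get_close_matches(name.lower(), list(by_lower), n=1, cutoff=0.6)
    return by_lower[close[0]] if close else None


def match_rows(rows, name_column="name", fuzzy=False):
    """Return the input rows with one extra FIRSTNAMEMATCH column each."""
    return [
        {**row, "FIRSTNAMEMATCH": first_name_match(row[name_column], REFERENCE_NAMES, fuzzy)}
        for row in rows
    ]
```

The output rows keep every input column and add exactly one column, mirroring the note above about the output schema.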
To display the output, click tLogRow to open the
Basic settings view and define the component
properties according to the display mode you prefer.
In the Mode area, select Table (print values in cells of a table).
Then save the Job and press F6 to execute
it.
All the output columns including FIRSTNAMEMATCH are listed in
the Run console. The
FIRSTNAMEMATCH column outputs the best possible match for
each first name.