July 30, 2023

tFirstnameMatch – Docs for ESB 7.x

tFirstnameMatch

Matches first names against a reference index in order to standardize
data.

tFirstnameMatch compares the first name column from the input flow with
first names in an embedded reference index and outputs the matching first names.
It does not support Chinese characters.

This index has first names for about 162 countries, and it has more than
1000 reference first names for some countries.

tFirstnameMatch checks first names against an index file embedded
in the component itself. This component searches first names in the index file according
to the input gender and input country you specify in the component settings. When you do
not use the gender and country as a search basis, first names are searched across the
whole index, regardless of country.
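
The search scoping described above can be pictured with a minimal Python sketch. It is
not the component's actual implementation: the tiny INDEX list, its entries, the
case-insensitive comparison and the find_first_name helper are all hypothetical
stand-ins for the embedded index file and its lookup logic.

```python
# Hypothetical miniature index: (first name, gender, ISO 3166-1 alpha-3 country).
# The real index is embedded in the component and is far larger.
INDEX = [
    ("John", "M", "USA"),
    ("Mary", "F", "USA"),
    ("Pierre", "M", "FRA"),
    ("Marie", "F", "FRA"),
]


def find_first_name(name, gender=None, country=None):
    """Return the reference spelling of `name`, or None if nothing matches.

    When `gender` or `country` is None, that criterion is ignored, so the
    lookup spans every entry of the index.
    """
    wanted = name.strip().lower()  # case-insensitive comparison is an assumption
    for ref_name, ref_gender, ref_country in INDEX:
        if gender is not None and ref_gender != gender:
            continue
        if country is not None and ref_country != country:
            continue
        if ref_name.lower() == wanted:
            return ref_name
    return None


print(find_first_name("pierre", gender="M", country="FRA"))  # Pierre
print(find_first_name("pierre", country="USA"))              # None
print(find_first_name("pierre"))                             # Pierre
```

In this sketch, restricting the search to the wrong country misses a name that a search
over the whole index finds, which is why an unscoped search can be preferable when a
country has few reference names.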

The index file has reference first names for about 162 countries. Some of the countries
listed in the index have more than 1000 reference first names. Such countries include
USA, GBR, AUS, IRL, CAN, FRA, NZL, CHE and NLD. For example, the index file has more
than 8000 American first names, more than 4000 British first names, more than 2000
Australian first names and so on.

Some other countries have fewer than 1000 reference first names stored in the index file.
For such countries, it is advisable not to select a country column so that the input
first name is checked against all reference first names of all countries in the index
file.

tFirstnameMatch Standard properties

These properties are used to configure tFirstnameMatch running in the Standard Job framework.

The Standard tFirstnameMatch component belongs to the Data Quality family.

This component is available in Talend Data Management Platform, Talend Big Data Platform, Talend Real Time Big Data Platform, Talend Data Services Platform, Talend MDM Platform and Talend Data Fabric.

Basic settings

Schema and Edit Schema

A schema is a row description. It defines the number of fields to be processed and
passed on to the next component. The schema is either Built-in or stored remotely in the
Repository.

One read-only column, FIRSTNAMEMATCH, is
added to the output schema automatically.

 

Built-in: The schema is created and stored locally for this component only.
Related topic: see Talend Studio User Guide.

 

Repository: The schema already exists and is stored in the Repository, so it can be
reused in various projects and Job designs. Related topic: see Talend Studio User Guide.

First Names

Select the column that contains first names.

Use Gender

Optional parameter: select this check box and then from the list,
select the column that contains the gender. This will optimize
system performance and give more precise results.

Expected genders are M (masculine) and F (feminine).
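
If the gender column of the input flow uses other spellings, it may need to be
normalized to M or F upstream of the component. A minimal Python sketch of such a
normalization follows; the alias table and the normalize_gender helper are hypothetical,
and only the target values M and F come from this page.

```python
# Hypothetical aliases; only the target values "M" and "F" come from the
# component documentation.
GENDER_ALIASES = {
    "m": "M", "male": "M", "masculine": "M",
    "f": "F", "female": "F", "feminine": "F",
}


def normalize_gender(value):
    """Map a raw gender value to "M" or "F", or None if it is not recognized."""
    if value is None:
        return None
    return GENDER_ALIASES.get(value.strip().lower())


print(normalize_gender(" Male "))   # M
print(normalize_gender("F"))        # F
print(normalize_gender("unknown"))  # None
```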

Use Country

Optional parameter: select this check box and then from the list,
select the column that contains the country ISO 3166-1 alpha-3
codes. This will optimize system performance and give more precise
results.
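
When the input flow carries two-letter country codes only, they have to be converted to
ISO 3166-1 alpha-3 codes before being used here. The Python sketch below shows one way
to do that for the nine countries named earlier on this page; the to_alpha3 helper is
hypothetical, and the mapping table is deliberately partial.

```python
# Partial ISO 3166-1 alpha-2 -> alpha-3 mapping, limited to the countries
# named on this page. A real Job would need a complete table.
ALPHA2_TO_ALPHA3 = {
    "US": "USA", "GB": "GBR", "AU": "AUS",
    "IE": "IRL", "CA": "CAN", "FR": "FRA",
    "NZ": "NZL", "CH": "CHE", "NL": "NLD",
}


def to_alpha3(code):
    """Return the alpha-3 code for `code`, or None if it is not in the table.

    Three-letter inputs are upper-cased and passed through without validation.
    """
    cleaned = code.strip().upper()
    if len(cleaned) == 3:
        return cleaned
    return ALPHA2_TO_ALPHA3.get(cleaned)


print(to_alpha3("fr"))   # FRA
print(to_alpha3("USA"))  # USA
print(to_alpha3("ZZ"))   # None
```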

Fuzzy Search

Select this check box if you want to get the best match possible,
including approximate matches.
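
To illustrate what an approximate match means, the Python sketch below falls back to the
closest reference name when there is no exact match. It uses the standard difflib
module as a stand-in: the similarity measure that tFirstnameMatch actually applies is not
documented here, and the reference list, the cutoff value and the best_match helper are
hypothetical.

```python
import difflib

# Hypothetical reference names standing in for the embedded index.
REFERENCE_NAMES = ["Michael", "Michelle", "Mitchell"]


def best_match(name, fuzzy=False, cutoff=0.8):
    """Return the exact reference name, or the closest one when `fuzzy` is True.

    Returns None when nothing matches (or nothing is close enough).
    """
    by_lower = {ref.lower(): ref for ref in REFERENCE_NAMES}
    key = name.strip().lower()
    if key in by_lower:
        return by_lower[key]
    if not fuzzy:
        return None
    # difflib ranks candidates by SequenceMatcher ratio; this is an assumption,
    # not the component's documented algorithm.
    close = difflib.get_close_matches(key, list(by_lower), n=1, cutoff=cutoff)
    return by_lower[close[0]] if close else None


print(best_match("Micheal"))              # no exact match
print(best_match("Micheal", fuzzy=True))  # closest reference name
```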

Advanced settings

tStatCatcher Statistics

Select this check box to gather the processing metadata at the Job
level as well as at each component level.

Global Variables

ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable and it returns a string. This
variable functions only if the component has a Die on error check box and that
check box is cleared.

A Flow variable functions during the execution of a component while an After variable
functions after the execution of the component.

To fill in a field or expression with a variable, press Ctrl + Space
to access the variable list and choose the variable to use from it.

For further information about variables, see Talend Studio User Guide.

Usage

Usage rule

This component is not startable and it requires input and output
components.

Limitation/prerequisite

The index used to standardize the first names is embedded in this
component. For the time being, it can handle only Latin
names.

Matching first names with a reference index

This scenario applies only to Talend Data Management Platform, Talend Big Data Platform, Talend Real Time Big Data Platform, Talend Data Services Platform, Talend MDM Platform and Talend Data Fabric.

This scenario describes a four-component Job that matches the
name column of an input flow against the reference index.

The output of this first name match is displayed in the FIRSTNAMEMATCH
output column along with all other columns defined in the input schema of the
tFirstnameMatch component.

Dropping the components and linking them together

  1. Drop the following components from the Palette to the design workspace: tFixedFlowInput, tFilterColumns, tFirstnameMatch and tLogRow.
  2. Connect the first three components using Row > Main links.
  3. Connect tFirstnameMatch to
    tLogRow using a Row > Output link.
tFirstnameMatch_1.png

Configuring the input data

  1. Double-click tFixedFlowInput to display the Basic settings
    view and define the component properties.
  2. Click the […] button next to
    Edit schema to open a dialog box and add as many
    columns as needed to the input schema.

    tFirstnameMatch_2.png

    In this example, the input data flow is made of several columns including one
    for first names (name), two for country codes
    (iso2 and iso3) and one
    for gender (gender).

  3. In the Mode area, select
    the Use Inline Content (delimited file)
    option to display the corresponding view.

    tFirstnameMatch_3.png

  4. Set the row and field separators in the corresponding fields,
    if any.
  5. In the Content area,
    type in the data for the input flow according to the schema you defined
    earlier.
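
As an illustration of how inline delimited content lines up with the schema, the Python
sketch below splits a small sample into rows. The sample values are invented, the
semicolon field separator and newline row separator are assumptions about the settings
chosen in step 4, and parse_inline_content is a hypothetical helper, not part of Talend.

```python
# Column names taken from the example schema on this page.
COLUMNS = ["name", "iso2", "iso3", "gender"]

# Invented sample data; ";" and "\n" are assumed separator settings.
INLINE_CONTENT = "john;US;USA;M\nmarie;FR;FRA;F\n"


def parse_inline_content(content, field_sep=";", row_sep="\n"):
    """Split delimited inline content into one dict per row, keyed by column."""
    rows = []
    for line in content.split(row_sep):
        if not line:
            continue  # skip the empty piece after a trailing row separator
        values = line.split(field_sep)
        rows.append(dict(zip(COLUMNS, values)))
    return rows


rows = parse_inline_content(INLINE_CONTENT)
for row in rows:
    print(row)
```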

Configuring the process of matching data

You need to select the data columns of interest before matching them using
tFirstnameMatch.

  1. Click the tFilterColumns component to
    display its Basic settings view and define
    the component properties.
  2. Click the […] button next to
    Edit schema to open a dialog box.
  3. Select the name and
    gender columns from the input schema and move them to
    the output schema.

    tFirstnameMatch_4.png

  4. Click OK to validate your changes and
    close the dialog box.
  5. Click tFirstnameMatch to display the
    Basic settings view and define the
    component properties.

  6. Click the […] button next to
    Edit schema to view the input and output schemas, and
    then click OK to close the dialog
    box.

    The output schema of this component is the same as the input schema plus one
    fixed column: FIRSTNAMEMATCH.

    tFirstnameMatch_5.png

  7. From the First Names
    list, select the column that holds the first names, name
    in this example.
  8. If required, select the Use Gender or the Use Country
    check box and, from the list, select the column that contains the gender or
    country respectively.

    This will optimize system performance and will give more precise
    results.
  9. If required, select the Fuzzy Search
    check box to get the best possible first-name match, including approximate
    matches, when several matches are available.
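
The effect of these steps on the data flow can be sketched in Python: the column filter
keeps only name and gender, and the matching step returns the same columns plus
FIRSTNAMEMATCH. The reference table, the sample rows and both helper functions are
hypothetical, and writing None when no match is found is an assumption, since this page
does not state what the component outputs in that case.

```python
# Hypothetical reference data keyed by (lower-cased name, gender).
REFERENCE = {("john", "M"): "John", ("marie", "F"): "Marie"}


def filter_columns(row, keep=("name", "gender")):
    """Keep only the selected columns, as the column filter does in this scenario."""
    return {col: row[col] for col in keep}


def add_firstname_match(row):
    """Return a copy of `row` with a FIRSTNAMEMATCH column appended."""
    key = (row["name"].strip().lower(), row["gender"])
    out = dict(row)
    out["FIRSTNAMEMATCH"] = REFERENCE.get(key)  # None on no match: an assumption
    return out


input_rows = [
    {"name": "john", "iso2": "US", "iso3": "USA", "gender": "M"},
    {"name": "qwerty", "iso2": "FR", "iso3": "FRA", "gender": "F"},
]
output_rows = [add_firstname_match(filter_columns(r)) for r in input_rows]
for row in output_rows:
    print(row)
```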

Executing the Job

  1. Click the tLogRow
    component to display the Basic settings
    view and define its properties according to the display mode you prefer.
  2. In the Mode area, select
    Table (print values in cells of a table).
  3. Save the Job and press F6 to execute it.
tFirstnameMatch_6.png

All the output columns, including FIRSTNAMEMATCH, are listed in the
Run console. The FIRSTNAMEMATCH column
holds the best possible match for each first name.


Source: Talend documentation, https://help.talend.com