August 17, 2023

tFirstnameMatch – Docs for ESB 5.x

tFirstnameMatch

Warning

This component is available in the Palette of Talend Studio only if
you have subscribed to one of the Talend Platform products.

tFirstnameMatch properties

Component family

Data Quality

 

Function

tFirstnameMatch compares the
first name column from the input flow with first names in an
embedded reference index and outputs the matching first
names.

This index has first names for about 162 countries, and it has
more than 1000 reference first names for some countries. For further
information, see About the reference index embedded in tFirstnameMatch.

Purpose

Helps ensure the data quality of first names by checking them against
a reference index in order to standardize data.

Basic settings

Schema and Edit Schema

A schema is a row description: it defines the number of fields to be processed and
passed on to the next component. The schema is either Built-in or stored remotely in the
Repository.

Since version 5.6, both the Built-In mode and the Repository mode are
available in any of the Talend solutions.

One read-only column, FIRSTNAMEMATCH, is
added to the output schema automatically.

 

 

Built-in: The schema will be
created and stored locally for this component only. Related topic:
see Talend Studio User Guide.

 

 

Repository: The schema already
exists and is stored in the Repository, hence can be reused in
various projects and job designs. Related topic: see Talend Studio User Guide.

 

First Names

Select the column that contains first names.

 

Use Gender

Optional parameter: select this check box and then from the list,
select the column that contains the gender. This will optimize
system performance and give more precise results.

Expected genders are M (masculine) and F (feminine).

 

Use Country

Optional parameter: select this check box and then from the list,
select the column that contains the country ISO 3166-1 alpha-3
codes. This will optimize system performance and give more precise
results.

 

Fuzzy Search

Select this check box if you want to get the best match possible,
including approximate matches.

Advanced settings

tStatCatcher Statistics

Select this check box to gather the processing metadata at the Job
level as well as at each component level.

Global Variables

ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable and it returns a string. This
variable functions only if the Die on error check box is
cleared, if the component has this check box.

A Flow variable functions during the execution of a component while an After variable
functions after the execution of the component.

To fill in a field or expression with a variable, press Ctrl + Space
to access the variable list and choose the variable to use from it.

For further information about variables, see Talend Studio
User Guide.

Usage

This component is not startable and it requires input and output
components.

Limitation/prerequisite

The index used to standardize the first names is embedded in this
component. For the time being, it is able to handle Latin
names.
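
As an illustration of how the ERROR_MESSAGE After variable described above is typically read in Job code, the following minimal Java sketch looks the value up in a map that stands in for Talend's globalMap. The component label tFirstnameMatch_1, the fallback text and the sample message are assumptions made for this example; in a real Job the equivalent expression would be ((String)globalMap.get("tFirstnameMatch_1_ERROR_MESSAGE")).

```java
import java.util.HashMap;
import java.util.Map;

class ErrorMessageSketch {
    // Builds the globalMap key under which a component's After variable is stored.
    static String errorKey(String componentName) {
        return componentName + "_ERROR_MESSAGE";
    }

    // Returns the recorded error message, or a fallback text when none was stored.
    static String readError(Map<String, Object> globalMap, String componentName) {
        Object value = globalMap.get(errorKey(componentName));
        return value == null ? "no error recorded" : (String) value;
    }

    public static void main(String[] args) {
        // Stand-in for the globalMap that Talend provides to generated Job code.
        Map<String, Object> globalMap = new HashMap<>();
        // Hypothetical message, stored here only so the lookup has something to return.
        globalMap.put("tFirstnameMatch_1_ERROR_MESSAGE", "sample failure");
        System.out.println(readError(globalMap, "tFirstnameMatch_1"));
    }
}
```

Because this is an After variable, such a lookup is only meaningful in a component that runs after tFirstnameMatch has finished.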

About the reference index embedded in tFirstnameMatch

tFirstnameMatch checks first names against an index
file embedded in the component itself. This component searches first names in the index
file according to the input gender and input country you specify in the component
settings. When you do not use the gender and country as a search basis, first names are
searched across the entire index, regardless of country.
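
The search behavior described above can be pictured with a small Java sketch. It is not the component's actual implementation: the Entry class, the sampleIndex data and the match method are all hypothetical, and a null gender or country simply stands for an option that is not selected.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

class IndexLookupSketch {
    // One reference entry: first name, gender (M or F), ISO 3166-1 alpha-3 country code.
    static final class Entry {
        final String name;
        final String gender;
        final String country;

        Entry(String name, String gender, String country) {
            this.name = name;
            this.gender = gender;
            this.country = country;
        }
    }

    // Tiny hypothetical index; the embedded index is far larger.
    static List<Entry> sampleIndex() {
        return Arrays.asList(
                new Entry("Jean", "M", "FRA"),
                new Entry("Jean", "F", "USA"),
                new Entry("Aoife", "F", "IRL"));
    }

    // Exact, case-insensitive lookup. A null gender or country means the
    // corresponding option is not used, so that criterion is not applied.
    static Optional<String> match(List<Entry> index, String firstName,
                                  String gender, String country) {
        for (Entry e : index) {
            boolean genderOk = gender == null || e.gender.equals(gender);
            boolean countryOk = country == null || e.country.equals(country);
            if (genderOk && countryOk && e.name.equalsIgnoreCase(firstName)) {
                return Optional.of(e.name);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<Entry> index = sampleIndex();
        // Restricted to French masculine names.
        System.out.println(match(index, "jean", "M", "FRA"));
        // An Irish name is not found when the search is restricted to FRA.
        System.out.println(match(index, "Aoife", "F", "FRA"));
        // Without gender and country, the whole index is searched.
        System.out.println(match(index, "aoife", null, null));
    }
}
```

Narrowing by gender and country reduces the number of entries compared, which is one way to read the statement that these options optimize performance and precision.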

The index file has reference first names for about 162 countries. Some of the
countries listed in the index have more than 1000 reference first names. Such countries
include USA, GBR, AUS, IRL, CAN, FRA, NZL, CHE and NLD. For example, the index file has
more than 8000 American first names, more than 4000 British first names, more than 2000
Australian first names and so on.

Some other countries have fewer than 1000 reference first names stored in the index
file. For such countries, it is advisable not to select a country column so that the
input first name is checked against all reference first names of all countries in the
index file.
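
The Use Country option expects ISO 3166-1 alpha-3 codes such as USA, GBR or FRA. If your data only carries two-letter (alpha-2) codes, one possible way to derive the three-letter form before the matching step is the standard java.util.Locale class. The helper name toAlpha3 is an assumption for this sketch, and where you would call it in a Job (for example in a mapping component) is left open.

```java
import java.util.Locale;

class CountryCodeSketch {
    // Converts an ISO 3166-1 alpha-2 code (e.g. "FR") to its alpha-3 form.
    // Throws an unchecked exception if the input is not a well-formed region
    // or if no three-letter code is known for it.
    static String toAlpha3(String alpha2) {
        Locale locale = new Locale.Builder().setRegion(alpha2).build();
        return locale.getISO3Country();
    }

    public static void main(String[] args) {
        for (String code : new String[] {"US", "GB", "FR"}) {
            System.out.println(code + " -> " + toAlpha3(code));
        }
    }
}
```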

Scenario: Matching first names with a reference index

This scenario describes a four-component Job that matches the
name column of an input flow with the reference index.

The output of this first name match is displayed in the
FIRSTNAMEMATCH output column along with all other columns
defined in the input schema of the tFirstnameMatch
component.

Use_Case_FirstnameMatch.png

Dropping the components and linking them together

To drop and link the components of interest, proceed as follows:

  1. Drop the following components from the Palette to the design workspace: tFixedFlowInput, tFilterColumns, tFirstnameMatch and tLogRow.

  2. Connect the first three components using Row > Main links.

  3. Connect tFirstnameMatch to tLogRow using a Row > Output link.

Configuring the input data

To configure the input data, perform the following operations:

  1. Double-click tFixedFlowInput to display
    the Basic settings view and define the
    component properties.

    Use_Case_FirstnameMatch1.png
  2. From the Schema list, set the schema type
    to Built-In and click the three-dot button
    next to Edit Schema. A dialog box
    displays.

    Use_Case_FirstnameMatch2.png
  3. Click the plus button to add as many lines as needed for the input schema
    you want to create from internal variables.

    In this example, the input data flow is made of several columns including
    one for first names (name), two for country codes
    (iso2 and iso3) and one for
    gender (gender).

  4. Click OK to close the dialog box.

    The defined columns display in the Mode
    area of the component basic settings view.

  5. In the Mode area, select the Use Inline Content (delimited file) option to
    display the corresponding view.

    Use_Case_FirstnameMatch3.png
  6. In the corresponding fields, set the row and field separators that you
    want to use in your input flow.

  7. In the Content area, type in the data for
    the input flow according to the schema you defined earlier.
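
To make the relationship between the separators and the inline content concrete, here is a minimal Java sketch that splits delimited content into rows and fields. It only imitates what the input component does with inline content; the column order (name, iso2, iso3, gender), the separators and the two sample rows are assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class InlineContentSketch {
    // Splits inline content into rows, then each row into fields.
    // Separators are treated as literal text, not as regular expressions.
    static List<String[]> parse(String content, String rowSeparator, String fieldSeparator) {
        List<String[]> rows = new ArrayList<>();
        for (String line : content.split(Pattern.quote(rowSeparator))) {
            if (line.isEmpty()) {
                continue; // skip blank rows
            }
            // The -1 limit keeps trailing empty fields so every row has the same width.
            rows.add(line.split(Pattern.quote(fieldSeparator), -1));
        }
        return rows;
    }

    public static void main(String[] args) {
        // Hypothetical content: name;iso2;iso3;gender
        String content = "Jean;FR;FRA;M\nMary;US;USA;F\n";
        for (String[] row : parse(content, "\n", ";")) {
            System.out.println(String.join(" | ", row));
        }
    }
}
```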

Configuring the process of matching data

To configure the matching process, select the data columns of interest and then match them
using tFirstnameMatch.

  1. Click the tFilterColumns component to
    display its Basic settings view and define
    the component properties.

    Use_Case_FirstnameMatch4.png

    The tFilterColumns component enables you
    to build the output schema based on the column names of the input
    schema.

  2. Click the three-dot button next to Edit schema to display a dialog box
    where you can define the output schema.

  3. Select the name and gender
    columns from the input schema and move them to the output schema.

  4. Click OK to validate your changes and
    close the dialog box.

  5. Click tFirstnameMatch to display the
    Basic settings view and define the
    component properties.

    Use_Case_FirstnameMatch5.png
  6. If required, click the three-dot button next to Edit schema to view the
    input and output schemas, and then click OK to close the dialog box.

    Note

    The output schema of this component is the same as the input schema
    plus one fixed column: FIRSTNAMEMATCH.

    Use_Case_FirstnameMatch6.png
  7. From the First Names list, select the
    column that holds the first names, name in this
    example.

  8. If required, select the Use Gender or
    Use Country check box and then select
    from the list the column that contains the gender or country respectively.
    This will optimize system performance and give more precise
    results.

  9. If required, select the Fuzzy Search
    check box to get the best possible first-name match when
    several matches are available.
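
The source does not say which similarity measure Fuzzy Search relies on. As a rough illustration of what "best match possible, including approximate matches" can mean, the following Java sketch picks the reference name with the smallest Levenshtein edit distance to the input. The choice of Levenshtein distance, the class and method names, and the sample names are all assumptions, not a description of the component's internals.

```java
import java.util.Arrays;
import java.util.List;

class FuzzyMatchSketch {
    // Classic dynamic-programming Levenshtein distance using two rolling rows.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Returns the reference name closest to the input (case-insensitive).
    // On ties the earliest reference wins; returns null for an empty list.
    static String bestMatch(String input, List<String> references) {
        String best = null;
        int bestDistance = Integer.MAX_VALUE;
        String needle = input.toLowerCase();
        for (String reference : references) {
            int distance = levenshtein(needle, reference.toLowerCase());
            if (distance < bestDistance) {
                bestDistance = distance;
                best = reference;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Hypothetical reference names.
        List<String> references = Arrays.asList("Sophie", "Sofia", "Sonia");
        System.out.println(bestMatch("Sophi", references));
    }
}
```

A production matcher would normally also apply a maximum distance so that very dissimilar inputs return no match at all; that threshold is omitted here to keep the sketch short.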

Executing the Job

Click tLogRow to display the
Basic settings view and define the component
properties according to the display mode you prefer.

In the Mode area, select Table (print values in cells of a table).

Then save the Job and press F6 to execute
it.

Use_Case_FirstnameMatch7.png

All the output columns, including FIRSTNAMEMATCH, are listed in
the Run console. The FIRSTNAMEMATCH column outputs the best possible
match for each first name.


Document source: Talend, https://help.talend.com