July 30, 2023

tPersonator – Docs for ESB 7.x

tPersonator

Ensures the quality of a US and Canadian contact database by checking,
verifying, moving and appending contact data.

This component is the result of a collaboration between Talend and Melissa Data, one of the world leaders in global address
validation.

For more information about Melissa Data
and its software tools, visit the website.

Using the Personator™ Consumer Web Service, the tPersonator component verifies, corrects and adds data to
enrich your database. You can use several tPersonator components to perform one or more
actions.

The component in this framework is available in Talend Data Management Platform, Talend Big Data Platform, Talend Real Time Big Data Platform, Talend Data Services Platform, Talend MDM Platform and in Talend Data Fabric.

Setting up the Genderization Policy parameter of tPersonator

The behavior of the Genderization Policy
parameter, also known as gender aggression, depends on the Gender Population parameter. The following table shows the results
of the tPersonator component according to the
options that you set:

Gender Population   Genderization Policy   First name gender
                                           Male                        Neutral   Female
                                           Always   Often   Commonly   Always    Commonly   Often   Always
Mixed               Neutral                M        M       N          N         N          F       F
Mixed               Conservative           M        N       N          N         N          N       F
Mixed               Aggressive             M        M       M          N         F          F       F
Male                Neutral                M        M       M          N         N          F       F
Male                Conservative           M        M       N          N         N          N       F
Male                Aggressive             M        M       M          M         N          F       F
Female              Neutral                M        M       N          N         F          F       F
Female              Conservative           M        N       N          N         N          F       F
Female              Aggressive             M        M       N          F         F          F       F
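
The table can also be read as a lookup keyed by Gender Population and Genderization Policy. The following Python sketch encodes the rows of the table for illustration only. The names GENDER_TABLE, FIRST_NAME_CLASSES and resolve_gender are hypothetical and not part of the component, and the column grouping (three Male columns, one Neutral column, three Female columns) is inferred from the table header.

```python
# Column order of the table: how strongly the first name is associated
# with a gender. The grouping is an assumption inferred from the header.
FIRST_NAME_CLASSES = (
    "male_always", "male_often", "male_commonly",
    "neutral",
    "female_commonly", "female_often", "female_always",
)

# (Gender Population, Genderization Policy) -> one result per column.
GENDER_TABLE = {
    ("Mixed", "Neutral"): "MMNNNFF",
    ("Mixed", "Conservative"): "MNNNNNF",
    ("Mixed", "Aggressive"): "MMMNFFF",
    ("Male", "Neutral"): "MMMNNFF",
    ("Male", "Conservative"): "MMNNNNF",
    ("Male", "Aggressive"): "MMMMNFF",
    ("Female", "Neutral"): "MMNNFFF",
    ("Female", "Conservative"): "MNNNNFF",
    ("Female", "Aggressive"): "MMNFFFF",
}


def resolve_gender(population, policy, first_name_class):
    """Return 'M', 'N' or 'F' for one cell of the table."""
    row = GENDER_TABLE[(population, policy)]
    return row[FIRST_NAME_CLASSES.index(first_name_class)]
```

For example, resolve_gender("Mixed", "Aggressive", "male_commonly") reads the third column of the Mixed/Aggressive row.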

tPersonator output schema

The tPersonator component generates
read-only columns that are all of type String.

Here is the list of the output columns in the original order:

  • md_RecordID
  • md_AddressDeliveryInstallation
  • md_AddressExtras
  • md_AddressKey
  • md_AddressLine1
  • md_AddressLine2
  • md_AddressLockBox
  • md_AddressPostDirection
  • md_AddressPreDirection
  • md_AddressPrivateMailboxName
  • md_AddressPrivateMailboxRange
  • md_AddressRouteService
  • md_AddressStreetName
  • md_AddressStreetSuffix
  • md_AddressSuiteName
  • md_AddressSuiteNumber
  • md_AddressTypeCode
  • md_AreaCode
  • md_CBSACode
  • md_CBSADivisionCode
  • md_CBSADivisionLevel
  • md_CBSADivisionTitle
  • md_CBSALevel
  • md_CBSATitle
  • md_CarrierRoute
  • md_CensusBlock
  • md_CensusKey
  • md_CensusTract
  • md_City
  • md_CityAbbreviation
  • md_CompanyName
  • md_CongressionalDistrict
  • md_CountryCode
  • md_CountryName
  • md_CountyFIPS
  • md_CountyName
  • md_CountySubdivisionCode
  • md_CountySubdivisionName
  • md_DateOfBirth
  • md_DateOfDeath
  • md_DeliveryIndicator
  • md_DeliveryPointCheckDigit
  • md_DeliveryPointCode
  • md_DemographicsGender
  • md_DemographicsResults
  • md_DomainName
  • md_ElementarySchoolDistrictCode
  • md_ElementarySchoolDistrictName
  • md_EmailAddress
  • md_Gender
  • md_Gender2
  • md_HouseholdIncome
  • md_Latitude
  • md_LengthOfResidence
  • md_Longitude
  • md_MailboxName
  • md_MaritalStatus
  • md_MelissaAddressKey
  • md_NameFirst
  • md_NameFirst2
  • md_NameFull
  • md_NameLast
  • md_NameLast2
  • md_NameMiddle
  • md_NameMiddle2
  • md_NamePrefix
  • md_NamePrefix2
  • md_NameSuffix
  • md_NameSuffix2
  • md_NewAreaCode
  • md_Occupation
  • md_OwnRent
  • md_PhoneCountryCode
  • md_PhoneCountryName
  • md_PhoneExtension
  • md_PhoneNumber
  • md_PhonePrefix
  • md_PhoneSuffix
  • md_PlaceCode
  • md_Plus4
  • md_PostalCode
  • md_PresenceOfChildren
  • md_PrivateMailBox
  • md_RecordExtras
  • md_Reserved
  • md_Salutation
  • md_SecondarySchoolDistrictCode
  • md_SecondarySchoolDistrictName
  • md_StateDistrictLower
  • md_StateDistrictUpper
  • md_StateName
  • md_Suite
  • md_TopLevelDomain
  • md_UTC
  • md_UnifiedSchoolDistrictCode
  • md_UnifiedSchoolDistrictName
  • md_UrbanizationName
  • md_Results

tPersonator Standard properties

These properties are used to configure tPersonator
running in the Standard Job framework.

The standard tPersonator component belongs
to the Data Quality family.

Basic Settings

Schema and Edit schema

A schema is a row description. It defines the number of fields
(columns) to be processed and passed on to the next component.

Click Sync columns to retrieve the schema from the previous component connected in the
Job.

Select the Schema type:

  • Built-In: You create and store the schema locally for this component
    only.

  • Repository: You have already created the schema and stored it in the
    Repository. You can reuse it in various projects and Job designs.

Click Edit schema to make changes to the schema. If the current schema is of the Repository type, three options are available:

  • View schema: choose this option to view the schema only.

  • Change to built-in property: choose this option to change the schema to Built-in for local changes.

  • Update repository connection: choose this option to change the schema stored in the repository and decide whether
    to propagate the changes to all the Jobs upon completion. If you just want to
    propagate the changes to the current Job, you can select No upon completion and choose this schema metadata
    again in the Repository Content window.

The output schema contains read-only columns. For more information, see the list of the output columns.

Input mapping

Associate the Personator field with the Input column.

Actions

Select the actions to perform:

  • Check Action: Standardizes the data and ensures that it is valid. This action analyzes each data item
    separately. For example, if the zip code does not match the city, the Check Action corrects it
    without affecting the other data.
  • Verify Action: Ensures that the different data items are associated with each other. This action
    analyzes the data as a whole. For example, if you perform this action on the addresses,
    the Verify Action verifies whether the address is associated with the same name,
    phone and email in other databases.
  • Move Action: Retrieves the most recent address. The database must contain at least the last name
    or company name and an address.
  • Append Action: Adds missing data.

Depending on the action, some inputs are mandatory.

Centric Hint

Available if you select Append Action or Verify Action.

Select the reference data:

  • Auto: Select to use the address as the reference data. If the address is not
    available, the phone number is used. If the phone number is not available, the
    email is used. If the email is not available, the SSN (Social Security
    Number) is used.
  • Address
  • Phone
  • Email
  • SSN: This reference data is available if you select Verify Action.

Append Options

Available if you select Append Action.

Select one action:

  • Blank: Select to append data when your database does not contain the address,
    phone, email, name or company.
  • Check error: Select to append data when an error occurs in the address, phone, email, name
    or company:

    • An address error occurs when the address is not found in the
      database, is not partially verified, or cannot be
      corrected. In this case, the component does not return any of the
      result codes AS01, AS02 or AS03.

    • A phone error occurs when the phone number does not contain 7 or 10
      digits. In this case, the component does not return either of the result
      codes PS01 or PS02.

    • An email error occurs when the email is not found in the database,
      or if the email is unconfirmed. In this case, the component does
      not return either of the result codes ES01 or ES03.

    • A name error occurs when the name was not parsed successfully. In this case, the
      component does not return the result code NS01.

    • A company error occurs when the company name is not
      provided.

    For more information on these result codes, see the result code description tables.
  • Always: Select to append data,
    regardless of whether the address, phone, email, name or
    company in your database is blank or incorrect.
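
The three Append Options can be summarized as a small decision function. This is a minimal Python sketch of the rules described above, not the component's implementation. The names SUCCESS_CODES, has_error and should_append are hypothetical, and treating "the component does not return the result codes" as "none of these codes is present" is an assumption.

```python
# Result codes whose absence signals an error, as listed for "Check error".
SUCCESS_CODES = {
    "address": {"AS01", "AS02", "AS03"},
    "phone": {"PS01", "PS02"},
    "email": {"ES01", "ES03"},
    "name": {"NS01"},
}


def has_error(kind, value, result_codes):
    """True when the data item of the given kind is considered in error."""
    if kind == "company":
        # A company error occurs when the company name is not provided.
        return not value.strip()
    # Assumption: an error means none of the success codes was returned.
    return SUCCESS_CODES[kind].isdisjoint(result_codes)


def should_append(option, kind, value, result_codes):
    """Decide whether data of `kind` would be appended under an Append Option."""
    if option == "Always":
        return True
    if option == "Blank":
        return not value.strip()
    if option == "Check error":
        return has_error(kind, value, result_codes)
    raise ValueError(f"unknown Append Option: {option}")
```

With this reading, should_append("Check error", "phone", "555", {"AS01"}) is true because neither PS01 nor PS02 was returned.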
Address Options

Diacritics: auto, on or off. Set to on to return French accented
characters. If set to auto,
those characters are returned if present in your database.

Advanced Address Correction: Select to perform an advanced correction
of the address. It uses the full name or company name and can
correct or append the house number, street name, city, state and zip
code.

Use Preferred City: Select to use the city name preferred by the
postal services.

Name Options

Name Hint

  • Definitely Full: Select to treat the name in this order:
    first name, middle name, last name, regardless of formatting or
    punctuation.
  • Very Likely Full: Select to treat the name in this order:
    first name, middle name, last name, unless the order is
    indicated by formatting or punctuation.
  • Probably Full: Select to let the statistical logic
    determine the name order, with a bias toward this order: first
    name, middle name, last name.
  • Varying: Select to let the statistical logic
    determine the name order, with no bias toward either name
    order.
  • Probably Inverse: Select to let the statistical logic
    determine the name order, with a bias toward this order: last
    name, middle name, first name.
  • Very Likely Inverse: Select to treat the name in this order:
    last name, middle name, first name, unless the order is
    indicated by formatting or punctuation.
  • Definitely Inverse: Select to treat the name in this order:
    last name, middle name, first name, regardless of formatting or
    punctuation.
  • Mixed First Name: Select if the last name is missing. The name field
    must contain only prefixes, first names and middle names.
  • Mixed Last Name: Select if the first and middle names are missing.
    The name field must contain only last names and suffixes.

Middle Name Logic

  • Parse Logic: Select to consider the middle name as
    part of the last name. This consideration is possible if the
    middle name is a common last name. In this case, the last name
    is hyphenated.
  • Hyphenated Last: Select to consider the second word as part
    of the last name.
  • Middle Name: Select to consider the second word as a
    middle name.

Salutation Format: Select the
salutation format. For example, for John Smith:

  • Formal: Mr. Smith
  • Informal: John
  • First/Last: Smith

Gender Population: Mixed, Male or Female.
Select the predominant gender in your database.

Genderization Policy:
Neutral, Conservative or Aggressive. For more information on
this option, see the table of results.

Correct First Name: Select to correct the spelling of the first
name.

Standardize Company: Select to
apply title case and abbreviate the company name. For example, melissa data corporation is replaced by Melissa Data Corp.

Email Options

  • Database Lookup: Select to
    verify the domain names using a database of valid domains.
  • Standardize Casing: Select to
    lowercase the email characters before any action.
  • Correct Syntax: Select to
    correct the syntax of the email. This option supports simple
    email syntax: local part + @ + domain + ‘.’ + top-level domain. For example,
    jsmith@domain,coj is replaced by
    jsmith@domain.com.
  • Update Domain: Select to update
    the domain name if it is out of date.
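
The simple email syntax named under Correct Syntax (local part + @ + domain + ‘.’ + top-level domain) can be checked with a regular expression. The Python sketch below only tests whether an address already has that shape, and lowercases it as Standardize Casing does. It does not reproduce the correction logic of the service, and the names SIMPLE_EMAIL, has_simple_syntax and standardize_casing are hypothetical.

```python
import re

# local part + "@" + domain + "." + top-level domain.
# Deliberately simplified: the full address grammar is more permissive.
SIMPLE_EMAIL = re.compile(
    r"[A-Za-z0-9._%+-]+"                  # local part
    r"@"
    r"[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)*"  # domain, with optional subdomains
    r"\.[A-Za-z]{2,}"                     # top-level domain
)


def has_simple_syntax(email):
    """True when the whole string matches the simple email shape."""
    return SIMPLE_EMAIL.fullmatch(email) is not None


def standardize_casing(email):
    """Lowercase the email characters."""
    return email.lower()
```

Here has_simple_syntax("jsmith@domain,coj") is false because of the comma, which is the kind of input the Correct Syntax option is described as repairing.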
Address Output Groups

Basic (Default): Select to
return the basic address.

Address Details: Select to return the detailed
address.

Plus4: Select to return the +4 code.

PrivateMailBox: Select to return the private mail
box number. These mail boxes are the private mail boxes in
commercial mail receiving agencies.

Suite: Select to return the
apartment number.

Parsed Address: Select to return the address
details.

Geographic Output Groups

Census: Select to return
census information.

Census2: Select to return more census
information.

GeoCode: Select to return the
geocode.

Other Output Groups

Demographic Basic: Select to
return a string containing all the results of the demographics. Commas
delimit the results.

Name Details: Select to return the name details such as
the gender, salutation…

Parsed Email: Select to return the
email details such as the domain name, mailbox name…

Parsed Phone: Select to return the phone number details such as the extension,
prefix…

Depending on the action, some inputs are mandatory.

Mandatory inputs for each action

Check

The database must contain at least one of the following:

  • Address and zip code
  • Address and city or state
  • Phone number
  • Email
  • Full name
  • First name and last name

Verify

The database must contain at least two of the following:

  • Address and zip code or address, city
    and state
  • Phone number
  • Email
  • Full name or last name or first and
    last names
  • Company name

Note: If the database contains only names and
company names, you cannot perform the verify action. The
results cannot be accurate enough.

Move

The database must contain at least one of the following:

  • Address and full name
  • Address and first and last names
  • Address and company name

Append

The mandatory inputs depend on the data to append.

To append a name or company name, the
database must contain at least one of the following:

  • Address, city and state or address
    and zip code
  • Phone number
  • Email

To append an address, the
database must contain at least one of the following:

  • Phone number
  • Email

To append a phone number, the
database must contain at least one of the following:

  • Address, city and state or address
    and zip code
  • Email

To append an email, the
database must contain at least one of the following:

  • Address, city and state or address
    and zip code
  • Phone number
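
The mandatory-input rules above can be expressed as predicates over the set of fields available in the database. The following Python sketch is one possible reading of those rules. The field keys ("address", "zip", "full_name" and so on) and the function names are hypothetical and do not correspond to component settings.

```python
def _located_address(fields):
    # "Address and zip code" or "address, city and state".
    return "address" in fields and (
        "zip" in fields or {"city", "state"} <= fields
    )


def can_check(fields):
    """Check Action: at least one of the listed inputs."""
    return (
        ("address" in fields and bool({"zip", "city", "state"} & fields))
        or bool({"phone", "email", "full_name"} & fields)
        or {"first_name", "last_name"} <= fields
    )


def can_verify(fields):
    """Verify Action: at least two of the listed inputs."""
    groups = {
        "address": _located_address(fields),
        "phone": "phone" in fields,
        "email": "email" in fields,
        "name": "full_name" in fields or "last_name" in fields,
        "company": "company" in fields,
    }
    present = {name for name, ok in groups.items() if ok}
    # Names plus company names alone are not enough, per the note.
    return len(present) >= 2 and present != {"name", "company"}


def can_move(fields):
    """Move Action: an address plus a person or company name."""
    has_name = "full_name" in fields or {"first_name", "last_name"} <= fields
    return "address" in fields and (has_name or "company" in fields)


def can_append(target, fields):
    """Append Action: the required inputs depend on the data to append."""
    located = _located_address(fields)
    rules = {
        "name": located or "phone" in fields or "email" in fields,
        "company": located or "phone" in fields or "email" in fields,
        "address": "phone" in fields or "email" in fields,
        "phone": located or "email" in fields,
        "email": located or "phone" in fields,
    }
    return rules[target]
```

Each function expects a Python set, for example can_verify({"phone", "email"}).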

Advanced Settings

tStatCatcher Statistics

Select this check box to gather the Job processing metadata at the Job level
as well as at each component level.

License Key

To enter a license key, click the […] button
next to the field.

Number of Retries

Define the number of retries before the Job
fails.

Timeout in Seconds

Define the timeout period, in seconds.

Cache Directory

Browse to the cache directory.

Batch Request Size (1-100)

Define the number of messages to be delivered in
each batch.

Multi-Threading

Select to use more than one thread in the same Job to handle the
response from the Melissa Data service.

Thread Count (1-10)

Define the maximum number of threads.

Show Debug Console Output

Select to show the debug console output.
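
To illustrate the documented ranges, the Python sketch below validates a Batch Request Size (1-100) and a Thread Count (1-10), and splits a list of records into batches of the chosen size. How the component groups and sends requests internally is not described here, so the batching shown is an assumption, and validate_advanced_settings and batches are hypothetical names.

```python
def validate_advanced_settings(batch_size, thread_count):
    """Raise ValueError when a value falls outside the documented range."""
    if not 1 <= batch_size <= 100:
        raise ValueError("Batch Request Size must be between 1 and 100")
    if not 1 <= thread_count <= 10:
        raise ValueError("Thread Count must be between 1 and 10")


def batches(records, batch_size):
    """Yield consecutive slices of at most batch_size records."""
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]
```

With a batch size of 2, five records are split into three batches, the last one holding a single record.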

Scenario: Verifying and enriching a database

The Job in this scenario uses the tPersonator component to ensure the
quality of a customer database and enrich this
database.

In this Job, the following data are used:

  • Company name
  • Address
  • State
  • Zip code
  • Full name
  • Phone number
  • Date of company foundation
  • Date of birth

Setting up the Job

  1. Drop the following components from the Palette onto the
    design workspace: tFileInputDelimited,
    tPersonator and tLogRow.
  2. Connect the components together using the Main
    links.
tPersonator_1.png

Configuring the tFileInputDelimited component

  1. Double-click tFileInputDelimited to open its Basic settings view.

    tPersonator_2.png

  2. Select Built-in as Property Type and Schema.
  3. Click […] next to Edit schema.
  4. Click the [+] button to add the columns.

    tPersonator_3.png

  5. In File name/Stream, browse to the input file.

Configuring the tPersonator component

  1. Double-click the tPersonator component to open its Basic settings view.
  2. Select Built-in as Schema.
  3. To check the schema, click the […] button next to Edit schema.

    tPersonator_4.png

  4. In Input Mapping, click the [+] button to associate the
    tPersonator fields with the input data:

    1. CompanyName with Company
    2. AddressLine1 with Address
    3. State with State
    4. PostalCode/ZipCode with Zip
    5. FullName with FullName
    6. PhoneNumber with PhoneNumber
  5. Select Check Action, Append Action, Verify Action and Move Action.
  6. In Centric Hint, select Address.
  7. In Append Options, select Always.
  8. In Address Options:

    1. Diacritics: Select auto.
    2. Select the Advanced Address Correction and Use Preferred City check boxes.
  9. In Name Options, select the following options:

    1. Name Hint: Varying
    2. Middle Name Logic: Parse Logic
    3. Salutation Format: Formal
    4. Gender Population: Mixed
    5. Genderization Policy: Aggressive
    6. Correct First Name and Standardize Company check boxes
  10. Select all the check boxes in Email Options, Address Output Groups,
    Geographic Output Groups and Other Output Groups.

    You have the following configuration:

    tPersonator_5.png

  11. Click the Advanced settings tab:

    1. Clear the tStatCatcher Statistics check box.
    2. Enter a License Key.
    3. Enter the Number of Retries: 5.
    4. Enter the Timeout in Seconds: 100.
    5. Browse to the Cache Directory.
    6. Enter the Batch Request Size (1-100): 2.
    7. Select the Multi-Threading check box.
    8. Enter the Thread Count: 3.
    9. Select the Show Debug Console Output check box.

Configuring the tLogRow component

  1. Double-click the tLogRow component to open
    its Basic settings view.
  2. In the Mode area, select Table (print values in cells of a table).

    tPersonator_6.png

Saving and executing the Job

Save your Job and press F6 to execute the Job.

The database is enriched and the results are displayed on the
console. Here is an example of the output columns that have been enriched.

tPersonator_7.png

As you can see, result codes are indicated in the last column. For more information on these result codes, see
this description table.
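
When post-processing the output, it can help to split the result codes by category. The Python sketch below groups codes by their two-letter prefix (for example AS, PS, ES and NS, as used in the codes mentioned in this document). It assumes that the result column holds a comma-separated string, which this document does not state explicitly, and group_result_codes is a hypothetical helper name.

```python
from collections import defaultdict


def group_result_codes(md_results):
    """Split a result string and group the codes by two-letter prefix."""
    grouped = defaultdict(list)
    # Assumption: codes are separated by commas, e.g. "AS01,NS01".
    for code in md_results.split(","):
        code = code.strip()
        if code:
            grouped[code[:2]].append(code)
    return dict(grouped)
```

An empty string yields an empty dictionary, so rows without result codes need no special handling.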


Document obtained from Talend: https://help.talend.com