August 17, 2023

tAddressRowCloud – Docs for ESB 5.x

tAddressRowCloud


Warning

This component will be available in the Palette of
Talend Studio on the condition that you have subscribed to one of
the Talend Platform products.

tAddressRowCloud properties

Component family

Data Quality

 

Function

tAddressRowCloud verifies and
formats international addresses in the Cloud by using online
services.

Purpose

tAddressRowCloud enables you to
parse address data and get formatted addresses quickly, accurately
and without installing any software.

Address data is corrected against the latest online reference data
from several providers, including Loqate, MelissaData, Google and
QAS. tAddressRowCloud proposes
alternatives for missing address data such as country or postal
code, and enriches addresses with other elements such as
latitude and longitude.

Warning

Each data row requires one or more calls to the web service of
the provider, and the number of requests per row varies by
provider. Your quota depends on the license granted
by the web service provider. Do not run the component
on a data set that exceeds your quota; otherwise you will get
error messages and the addresses will not be corrected.
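
As a rough pre-flight check, the quota warning above comes down to simple arithmetic. The sketch below is illustrative only: the function name and the figures (rows, requests per row, remaining quota) are assumptions, not values published by Talend or by any provider. Take the real numbers from your provider license.

```python
def fits_quota(row_count, requests_per_row, remaining_quota):
    """Return (fits, needed): whether the worst-case number of web service
    calls for a data set stays within the remaining provider quota."""
    if row_count < 0 or requests_per_row < 1 or remaining_quota < 0:
        raise ValueError("row_count and remaining_quota must be >= 0, requests_per_row >= 1")
    needed = row_count * requests_per_row
    return needed <= remaining_quota, needed


# Hypothetical figures: 12,000 rows, up to 2 calls per row, 20,000 calls left.
fits, needed = fits_quota(12_000, 2, 20_000)
print(f"needed={needed}, fits={fits}")
```

With these assumed figures the data set would need more calls than remain, so it would have to be split or the quota extended before running the Job.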

Basic settings

Schema

A schema is a row description. It defines the number of fields to be processed and passed on
to the next component. The schema is either Built-In or
stored remotely in the Repository.

Since version 5.6, both the Built-In mode and the Repository mode are
available in any of the Talend solutions.

Built-In: You create and store the schema locally for this
component only. Related topic: see Talend Studio
User Guide.

Repository: You have already created the schema and
stored it in the Repository. You can reuse it in various projects and Job designs. Related
topic: see Talend Studio User Guide.

Edit Schema

Click the […] button and define
the input and output schema of the address data.

The output schema of tAddressRowCloud proposes several read-only address
columns including a VerificationLevel column which provides you with a
verification status of the processed address. The verification
levels in this column are defined by Talend. For
further information, see Address verification levels in tAddressRowCloud.

Some of the output columns may be empty, depending on the
address provider you select in the component basic settings when
executing the Job.

Address Provider

Select from the list the provider of the reference data against
which you want to validate and format input addresses.

The list of address providers includes Google, Loqate, QAS and
MelissaData.

 

License/API key

Enter the license or the API key provided by the address provider
you select from the list. You must visit the provider website,
register and get the license/API key.

When you select Google as a provider, the component uses the
Google Places API. You must generate the key from the Google
Developer Console at https://developers.google.com/console/help/new/ and set the
key in this field.

Only Loqate 

Processing Mode

This option is applied only to the Loqate provider.

Select from the list the mode of address validation you want to
have:

Verify and Geocode (selected by
default): with this mode, the component standardizes and corrects
addresses and enriches them with latitude and longitude information.

Note

Combining address verification and geocoding will cost extra
credits. For further information, see Cloud Price Card.

Verify only: with this mode, the
component standardizes and corrects addresses without enriching them
with latitude and longitude information.

Only QAS

Country

This option is applied only to the QAS provider.

Select from the list the country corresponding to your input
addresses.

When you select QAS as a provider, the component uses the QAS Pro
OnDemand service. For further information about Experian address
verification, see the product sheet at https://www.edq.com/globalassets/product-sheets/address-verification.pdf.

Only QAS 

QAS OnDemand username

This option is applied only to the QAS provider.

Enter the username you can find in the license provided by
QAS.

You can check your username from the QAS OnDemand portal at https://ondemand.qas.com/index.htm.

Only QAS 

Password

This option is applied only to the QAS provider.

Enter the password you can find in the license provided by
QAS.

You can check your password from the QAS OnDemand portal at https://ondemand.qas.com/index.htm.

 

Use security mode to connect

Select this check box to connect to the Cloud in a secure mode.
This may have a slight impact on performance.

This check box is not available with all address providers.

 

Mapping

Address field: add lines to the
table and select from the predefined address list the fields that
will hold input addresses.

tAddressRowCloud provides a long
list of address fields because some countries have more complex
address structures than others. For further information about the
input fields, see Address fields in tLoqateAddressRow.

Input Column: add lines to the
table and select from the list the columns that hold the input
addresses. The input schema can have one or multiple columns and can
have columns that do not represent address data.

 

Use Additional Output

This option is not available for the QAS provider.

Select this check box and use the Output Mapping
table to add more address columns to the output
schema:

Address field: add lines to the
table and select from a predefined address list the fields of the
extra information you want to output.

These predefined address fields vary according to the provider you
select from the Address Provider list.
For further information about the additional address fields,
check the provider website.

Output Column: select from the
list the columns that will hold the additional address
information. You must first add these additional columns to the
tAddressRowCloud output schema
through the Edit Schema button.

tAddressRowCloud maps the values
of the address fields to the columns you select in Output Column.

If an output column you select in this table has exactly the same name
as an input column, the input column value is overwritten by
the value given by the component.

Advanced settings

Fields in this view will vary according to the address provider
you select in the basic settings view.

Address Line Separator: define
the string which will separate the output address components within
the output address fields.

If you keep the default option, Default, the component uses the line
separator defined for the address provider you select: for example,
it uses the line break string (<BR>) with Loqate and
a semicolon (;) with MelissaData.

Default Country: select the
country whose ISO 3166-1 alpha-3 code is used
when parsing data if no identifiable country is found in an
input record.

Forced Country: select the
country whose ISO 3166-1 alpha-3 code is used
for all input records when parsing data.
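
One way to read the two country options is as a simple precedence rule. The sketch below is an interpretation of the descriptions above, not the component's code: the function name is hypothetical, and the precedence when both options are set (forced wins) is an assumption.

```python
def resolve_country(detected_country, default_country=None, forced_country=None):
    """Pick the ISO 3166-1 alpha-3 code used to parse one input record.

    Assumed precedence: a forced country overrides everything, a country
    identified in the record comes next, and the default country is only a
    fallback when nothing could be identified.
    """
    if forced_country:
        return forced_country
    if detected_country:
        return detected_country
    return default_country


print(resolve_country(None, default_country="FRA"))    # fallback to the default
print(resolve_country("DEU", default_country="FRA"))   # identified country kept
print(resolve_country("DEU", forced_country="FRA"))    # forced country applied
```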

Output Script: select the
script into which the output address is transliterated.

The script list differs according to the address provider you
select.

When the address provider is Loqate or MelissaData:

If you keep the default option, Not set,
in this field, the component checks the input data
and decides to use Native or Latin according to whether the larger
portion of the input is Native or Latin.

Select Latin to encode the
parsing results in Latin, or western characters.

Select Native/Match input to
encode the parsing results using the country script wherever
possible.

The Native/Match input script
includes the following supported character sets (scripts) and
languages tAddressRowCloud can
transliterate:

Cyrl – Cyrillic (Russia),

Grek – Greek (Greece),

Hebr – Hebrew (Israel),

Hani – Kanji (Japan),

Hans – simplified Chinese (China),

Arab – Arabic (United Arab Emirates),

Thai – Thai (Thailand),

Hang – Hangul (South Korea).
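
To give a feel for the Not set behavior described above, where the script is chosen according to the larger portion of the input, here is a minimal sketch. The component's actual detection logic is not documented here, so the function name, the use of Unicode character names, and the tie-breaking rule (ties go to Latin) are all assumptions.

```python
import unicodedata


def dominant_script(text):
    """Return "Latin" or "Native" depending on which kind of letter
    makes up the larger share of the alphabetic characters in text."""
    latin = native = 0
    for ch in text:
        if not ch.isalpha():
            continue  # digits, spaces and punctuation carry no script information
        if unicodedata.name(ch, "").startswith("LATIN"):
            latin += 1
        else:
            native += 1
    # Assumption: a tie (including input with no letters) falls back to Latin.
    return "Latin" if latin >= native else "Native"


print(dominant_script("10 rue de la Paix, Paris"))
print(dominant_script("Москва, Тверская улица 7"))
```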

Minimum match score: set the
minimum match score a record must reach in order not to be reverted.
The default value is zero, and valid values are between zero and
100.

This option is very helpful when you want to get, in the output
fields, the input data if a specific level of verification (minimum
match score) was not reached.
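
The effect of the minimum match score can be pictured as a per-record decision. The following sketch is only an illustration under stated assumptions: the function and its dictionary-based rows are hypothetical, and it assumes a record scoring below the minimum is output with its input data and the Reverted verification level.

```python
def apply_minimum_match_score(input_row, corrected_row, match_score, minimum=0):
    """Return the row to output: the corrected row if the score reaches the
    minimum, otherwise the original input data flagged as Reverted."""
    if not 0 <= minimum <= 100:
        raise ValueError("minimum match score must be between 0 and 100")
    if match_score < minimum:
        reverted = dict(input_row)
        reverted["VerificationLevel"] = "Reverted"
        return reverted
    return corrected_row


source = {"address": "12 rue exemple"}
fixed = {"address": "12 Rue Exemple", "VerificationLevel": "Verified"}
print(apply_minimum_match_score(source, fixed, match_score=65, minimum=80))
print(apply_minimum_match_score(source, fixed, match_score=95, minimum=80))
```

With the default minimum of zero no record is ever reverted, which matches the default behavior described above.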

Minimum interval between two queries (milliseconds): set, in
milliseconds, the minimum wait period between two queries.

Limit of retrying the same query in case it fails (times): set the
number of times a query should be retried in case of failure.

Interval between two retries of the same query (milliseconds): set, in
milliseconds, the minimum wait period between two tries of the same query.

Delay before forcing the termination of the query executor (seconds):
set, in seconds, the wait period
before forcing the query executor to shut down.
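
The retry limit and the retry interval work together roughly as in the sketch below. This is not the component's implementation: the function name and parameters are hypothetical, and it assumes the limit counts retries after the first attempt rather than total attempts.

```python
import time


def call_with_retries(query, retry_limit=3, retry_interval_ms=500, sleep=time.sleep):
    """Run query(); if it raises, retry up to retry_limit more times,
    waiting retry_interval_ms between tries. Re-raises the last error."""
    failures = 0
    while True:
        try:
            return query()
        except Exception:
            if failures >= retry_limit:
                raise  # retry budget exhausted: surface the failure
            failures += 1
            sleep(retry_interval_ms / 1000.0)


# Demo with a stand-in query that fails twice before succeeding.
calls = {"n": 0}


def flaky_query():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("temporary failure")
    return "OK"


result = call_with_retries(flaky_query, retry_limit=3, retry_interval_ms=10)
print(result, calls["n"])
```

Keep in mind that each retry is another call to the provider, so a high retry limit may also consume quota faster.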

tStatCatcher Statistics

Select this check box to collect log data at the component
level.

Global Variables

ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable and it returns a string. This
variable functions only if the Die on error check box is
cleared, if the component has this check box.

A Flow variable functions during the execution of a component while an After variable
functions after the execution of the component.

To fill up a field or expression with a variable, press Ctrl + Space
to access the variable list and choose the variable to use from it.

For further information about variables, see Talend Studio
User Guide.

Usage

This component is an intermediary step. It requires an input flow
and an output flow.

Limitation

n/a

Address verification levels in tAddressRowCloud

The tAddressRowCloud component outputs a
VerificationLevel column. This column lists the address verification levels
defined by Talend.

The providers supported by the component (Loqate, MelissaData and so on)
have different verification levels because they use different databases and
different algorithms to verify addresses. The verification results of each
provider are mapped to the Talend verification levels.

The verification levels output by the component are described below.

Verification levels and their descriptions

Verified: A complete match is made between the input data and a single
record from the available reference data.

Partially Verified: A partial match is made between the input data and a single record
from the available reference data.

Unverified: Unable to verify the address. Output fields will contain input
data.

Ambiguous: More than one close reference data match is found.

Conflict: More than one close reference data match is found with conflicting
values.

Reverted: The record cannot be verified with a minimum acceptable level.
Output fields will contain input data.
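
Downstream of the component, the VerificationLevel column is typically used to decide which rows can be trusted. The sketch below shows one possible split. Which levels count as acceptable is a business decision, so the ACCEPTED set, the function name and the sample rows are assumptions made for illustration.

```python
from collections import Counter

# Assumption: only these two levels are considered safe to load as-is.
ACCEPTED = {"Verified", "Partially Verified"}


def split_by_level(rows, accepted=ACCEPTED):
    """Split rows into (accepted_rows, review_rows) by VerificationLevel."""
    ok, review = [], []
    for row in rows:
        target = ok if row.get("VerificationLevel") in accepted else review
        target.append(row)
    return ok, review


rows = [
    {"id": 1, "VerificationLevel": "Verified"},
    {"id": 2, "VerificationLevel": "Ambiguous"},
    {"id": 3, "VerificationLevel": "Partially Verified"},
    {"id": 4, "VerificationLevel": "Reverted"},
]
ok_rows, review_rows = split_by_level(rows)
print(Counter(r["VerificationLevel"] for r in rows))
print([r["id"] for r in ok_rows], [r["id"] for r in review_rows])
```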

Scenario: Parsing addresses against reference data in the Cloud

This scenario describes a three-component Job that:

  • uses the tFixedFlowInput component to
    generate the address data to be analyzed,

  • uses the tAddressRowCloud component to parse,
    standardize and format the addresses in the Cloud through the Address Validation
    API,

  • uses a tFileOutputExcel component to output
    the correctly formatted addresses in an .xls file.

use_case-tcloudaddressrow.png

You must have an internet connection to be able to use tAddressRowCloud.

Setting up the Job

  1. Drop the following components from the Palette onto the design workspace: tFixedFlowInput, tAddressRowCloud and tFileOutputExcel.

  2. Connect the three components together using the Main links.

Configuring the input component

  1. Double-click tFixedFlowInput to open its
    Basic settings view in the Component tab.

    use_case-tcloudaddressrow2.png
  2. Create the schema through the Edit Schema button.

    In the open dialog box, click the [+]
    button and add the columns that will hold the information in the input
    address, in this example: Address and
    Country.

  3. Click OK.

    An address column and a country column
    are created in the Inline Table.

  4. In the Number of rows field, set the
    number of rows to 1.

  5. In the Mode area, select the Use Inline Table option.

  6. In the Content table, enter the address
    data you want to analyze, for example:

    Set the country for the three address lines to
    FRA.

Parsing addresses against Loqate

Setting the schema and defining address mapping

  1. Double-click tAddressRowCloud to display
    the Basic settings view and define the
    component properties.

    use_case-tcloudaddressrow3.png
  2. If required, click Sync columns to
    retrieve the schema defined in the input component.

  3. Click the Edit schema button to open the
    schema dialog box.

    tAddressRowCloud proposes several
    predefined read-only address columns, as shown in the capture below.

    use_case-tcloudaddressrow4.png

    The STATUS column returns the status of
    processing input addresses. For further information about process status,
    see Process status in tLoqateAddressRow.

    The AddressVerificationCode column returns the verification
    code for the processed address. For further information about what values
    this code is made up of and the implications of each segment, see Address verification codes in tLoqateAddressRow.

  4. Move any of the input columns to the output schema according to your
    needs, click OK, and accept the propagation of the
    changes when prompted.

    You can also add columns directly in the output schema to retrieve
    additional address information from the Loqate repository.

  5. Select from the Address Provider list the
    provider of the reference data against which you want to validate and format
    input addresses, Loqate in this
    example.

  6. Select the Use security mode to connect
    check box to connect to the provider repository in a secure mode.

    This may have a slight impact on performance.

  7. In the License/API key field, enter the
    license key provided by Loqate.

  8. From the Processing Mode list,
    select one of the following options:

    • Verify and Geocode (selected by default): standardizes and corrects
      addresses and enriches them with latitude and longitude.

      Note: Combining address verification and geocoding will
      cost extra credits. For further information, see
      Cloud Price Card.

    • Verify only: standardizes and corrects addresses without enriching
      them with latitude and longitude.

  9. In the Mapping table:

    • Use the [+] button to add lines
      in the table.

    • Click in the Address Field column
      and select from the list predefined in the component the fields that
      hold the input address, Address and
      Country in this example.

      The component will map the values of these fields to the input
      columns you set in this table.

      tAddressRowCloud provides a list
      of individual fields because some countries have more complex
      addressing structures than others.

    • Click in the Input Column column
      and select from the list of the input schema the columns that hold
      the input address, address and
      country in this example.

Defining additional address fields

  1. If required, select the Use Additional Output
    check box to retrieve additional address information from
    the provider repository.

  2. Click the Edit schema button to open the
    schema dialog box and add in the output schema the columns which will hold
    the extra address information.

  3. In the Output Mapping table:

    • Use the [+] button to add lines
      in the table.

    • Click in the Address Field column
      and select from the predefined list the additional address fields
      you want to add to the output schema.

    • Click in the Output Column column
      and select from tAddressRowCloud
      output schema the columns that will hold the additional address
      information.

    The component maps the values of the address fields in the Loqate
    repository to the output columns you set in the table.

  4. Set the parameters in the Advanced settings
    view according to your needs.

    The default parameters are not changed for this example.

Configuring the output component and executing the Job

  1. Double-click the tFileOutputExcel
    component to display the Basic settings
    view and define the component properties.

    use_case-tcloudaddressrow5.png
  2. Set the destination file name as well as the sheet name and then select
    the Define all columns auto size check
    box.

  3. Save your Job and press F6 to execute
    it.

    The tAddressRowCloud component uploads
    data to the cloud, retrieves the corrected data and writes the result in the
    output file.

  4. Right-click the output component and select Data Viewer
    to display the formatted address data.

    use_case-tcloudaddressrow6.png

    tAddressRowCloud matches input address
    data against the Loqate repository.

    The all_info and Geo_info
    columns retrieve additional address information from the
    Raw_Response and GeoAccuracy
    columns respectively in the Loqate repository. The
    Raw_Response column provides you with all address
    information from the provider repository without any formatting. If you want
    this information to be more readable, you must parse it as JSON or
    XML.

    The STATUS output column returns the OK status
    for all address rows. This means that the verification process of all
    address rows could be completed successfully by the component. For further
    information about process status, see Process status in tLoqateAddressRow.

    The VerificationLevel output column
    provides you with a verification status of the processed addresses. For
    further information, see Verification status.

    The AddressVerificationCode output column returns a
    verification code for each of the processed address rows. For example, the
    first verification code V44-I45-P3-100 means:

    • Verification status = V (verified): a complete match was made
      between the input address and a single record from the available
      reference data.

    • Post-processed verification match level = 4 (premises): the level
      to which the input data matches the available reference data once
      all changes and additions performed during the verification process
      have been taken into account.

    • Pre-processed verification match level = 4 (premises): the level
      to which the input data matches the available reference data prior
      to any changes or additions performed during the verification
      process.

    • Parsing status = I (identified and parsed): all components of the
      input data have been able to be identified and placed into output
      fields.

    • Lexicon identification match level = 4 (premises): using pattern
      matching, a numeric value or word has been identified as a premises
      number or name.

    • Context identification match level = 5 (delivery point, PostBox or
      SubBuilding): a numeric value or word has been identified as a post
      box number or sub building name.

    • Postcode Status = P3 (added): the primary postal code for the
      country has been added.

    • Match score = 100 (complete similarity): the input data and
      closest reference data match completely.

    For further information about what values this code is made up of and the
    implications of each segment, see Address verification codes in tLoqateAddressRow.
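
The segment breakdown above can also be applied programmatically, for example to filter rows on the match score. The sketch below splits a code such as V44-I45-P3-100 into the eight values listed above. The pattern is inferred from this single example, so its strictness (single-letter statuses, one-digit levels, a letter plus a digit for the postcode status) is an assumption, and the function and key names are hypothetical.

```python
import re

# Layout inferred from the example code V44-I45-P3-100 (assumption).
_AVC_PATTERN = re.compile(
    r"^(?P<verification_status>[A-Z])(?P<post_match_level>\d)(?P<pre_match_level>\d)"
    r"-(?P<parsing_status>[A-Z])(?P<lexicon_level>\d)(?P<context_level>\d)"
    r"-(?P<postcode_status>[A-Z]\d)"
    r"-(?P<match_score>\d{1,3})$"
)

_NUMERIC_KEYS = (
    "post_match_level", "pre_match_level",
    "lexicon_level", "context_level", "match_score",
)


def parse_verification_code(code):
    """Split an address verification code into its named segments."""
    match = _AVC_PATTERN.match(code.strip())
    if match is None:
        raise ValueError(f"unrecognized verification code: {code!r}")
    parts = match.groupdict()
    for key in _NUMERIC_KEYS:
        parts[key] = int(parts[key])
    return parts


parsed = parse_verification_code("V44-I45-P3-100")
print(parsed)
```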

Parsing addresses against MelissaData

You can run the Scenario: Parsing addresses against reference data in the Cloud Job against
the MelissaData repository as follows:

  1. In the tFixedFlowInput Basic settings, create the schema through the
    Edit Schema button.

    use_case-tcloudaddressrow7.png

    In the open dialog box, click the [+]
    button and add one column that will hold the information in the input
    address, in this example: address.

  2. Click OK.

    An address column is created in the Inline Table.

  3. In the Inline Table, enter the
    address data you want to analyze, for example:

  4. In the basic settings of tAddressRowCloud, select MelissaData from the
    Address Provider list.

    use_case-tcloudaddressrow8.png
  5. In the License/API key field, enter
    the license key provided by MelissaData.

  6. In the Mapping table, click the [+] button to add a line and then select Address.

    The component will map the values of this field to the input column you
    set in this table.

  7. If required, select the Use Additional Output
    check box and use the Output Mapping
    table to retrieve additional address information from the
    provider repository.

    For further information, see Defining additional address fields.

  8. Leave the parameters in the Advanced settings
    view unchanged.

  9. Save your Job and press F6 to execute
    it.

    The tAddressRowCloud component uploads
    data to the cloud, retrieves the corrected data and writes the result in the
    output file.

  10. Right-click the output component and select Data Viewer
    to display the formatted address data.

    use_case-tcloudaddressrow9.png

    tAddressRowCloud matches input address
    data against the MelissaData data repository and writes formatted addresses
    in the output file.

    The AddressVerificationCode output column returns a
    verification code for each of the processed address rows. These codes are
    written in comma-delimited lists. Each code consists of two letters followed
    by two numbers. These codes indicate different statuses and errors. For
    example, the AC02 code means that the state name is
    corrected based on the combination of city name and zip code.

    For a complete list of the result codes and their meanings, and for further
    information about all the output columns, see the Address Object Reference
    Guide, which you can download from the Support Center of MelissaData at http://www.melissadata.com/.

    The VerificationLevel output column
    provides you with a verification status of the processed addresses. For
    further information, see Address verification levels in tAddressRowCloud.
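
Because the MelissaData codes arrive as a comma-delimited list of two letters followed by two digits, they are easy to split and validate before looking them up in the reference guide. A minimal sketch follows. The function names are hypothetical, the sample value is invented for illustration, and no meaning is assigned to any code beyond what the guide states.

```python
import re

# Format stated above: two letters followed by two numbers.
_RESULT_CODE = re.compile(r"^[A-Z]{2}\d{2}$")


def split_result_codes(value):
    """Split a comma-delimited result code list and validate each code."""
    codes = [c.strip() for c in value.split(",") if c.strip()]
    invalid = [c for c in codes if not _RESULT_CODE.match(c)]
    if invalid:
        raise ValueError(f"unexpected result codes: {invalid}")
    return codes


def group_by_prefix(codes):
    """Group codes by their two-letter prefix, e.g. {"AC": ["AC02"]}."""
    groups = {}
    for code in codes:
        groups.setdefault(code[:2], []).append(code)
    return groups


codes = split_result_codes("AC02, AC03,AS01")  # hypothetical column value
print(codes)
print(group_by_prefix(codes))
```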

Parsing addresses against Google

You can run the Scenario: Parsing addresses against reference data in the Cloud Job against
the Google Places API as follows:

  1. In the tFixedFlowInput Basic settings, create the schema through the
    Edit Schema button.

    use_case-tcloudaddressrow10.png

    In the open dialog box, click the [+]
    button and add one column that will hold the information in the input
    address, in this example: address.

  2. Click OK.

    An address column is created in the Inline Table.

  3. In the Inline Table, enter the
    address data you want to analyze, for example:

  4. In the basic settings of tAddressRowCloud, select Google
    from the Address Provider list.

    use_case-tcloudaddressrow11.png
  5. In the License/API key field, enter the
    API key you generate from the Google Developer Console at https://developers.google.com/console/help/new/.

  6. In the Mapping table, click the [+] button to add a line and then select Address.

    Address is the only available field when Google is
    the address provider. The component will map the values of this field to the
    input column you set in this table.

  7. If required, select the Use Additional Output
    check box and use the Output Mapping
    table to retrieve additional address information from the
    provider repository.

    For further information, see Defining additional address fields.

  8. In the Advanced settings view, set
    Output Script to FRENCH and leave the other parameters unchanged.

  9. Save your Job and press F6 to execute
    it.

    The tAddressRowCloud component uploads
    data to the cloud, retrieves the corrected data and writes the result in the
    output file.

  10. Right-click the output component and select Data Viewer
    to display the formatted address data.

    use_case-tcloudaddressrow12.png

    tAddressRowCloud matches input address
    data against Google Places API and writes formatted addresses in the output
    file.

    The VerificationLevel output column
    provides you with a verification status of the processed addresses. For
    further information, see Address verification levels in tAddressRowCloud.

Parsing addresses against QAS

You can run the Scenario: Parsing addresses against reference data in the Cloud Job using the
QAS Pro OnDemand service and verify the accuracy and completeness of
addresses.

  1. In the tFixedFlowInput Basic settings, create the schema through the
    Edit Schema button.

    use_case-tcloudaddressrow_qas.png

    In the open dialog box, click the [+]
    button and add one column that will hold the information in the input
    address, in this example: address.

  2. Click OK.

    An address column is created in the Inline Table.

  3. In the Inline Table, enter the
    address data you want to analyze, for example:

  4. In the basic settings of tAddressRowCloud, select QAS
    from the Address Provider list.

    use_case-tcloudaddressrow_qas2.png
  5. From the Country list, select the country
    corresponding to your input addresses, United States in
    this example.

  6. In the QAS OnDemand username and
    Password fields, enter the username and the password
    that you can find in the license provided by
    QAS.

  7. In the Mapping table, click the [+] button to add a line and then select Address.

    The component will map the values of this field to the input column you
    set in this table.

  8. Leave the parameters in the Advanced settings
    view unchanged.

  9. Save your Job and press F6 to execute
    it.

    The tAddressRowCloud component uploads
    data to the cloud, validates and retrieves the corrected data and writes the
    result in the output file.

  10. Right-click the output component and select Data Viewer
    to display the formatted address data.

    use_case-tcloudaddressrow_qas3.png

    tAddressRowCloud validates input address
    data against QAS Pro OnDemand and writes formatted addresses in the output
    file.

    The VerificationLevel output column
    provides you with a verification status of the processed addresses. For
    further information, see Address verification levels in tAddressRowCloud.


Document obtained from Talend: https://help.talend.com