August 15, 2023

tBatchAddressRowCloud – Docs for ESB 6.x

tBatchAddressRowCloud

Uses batch processing to parse address data and get formatted addresses quickly,
accurately and without installing any software.

Address data is corrected against the latest online reference data from providers that
support batch services, including Loqate and MelissaData. tBatchAddressRowCloud proposes alternatives for missing address data such as
country or postal code, and addresses are enriched with other elements such as latitude
and longitude. The advantage of this component over tAddressRowCloud is that you gain performance when dealing with large amounts
of data.

tBatchAddressRowCloud splits addresses
from input streams into several files (batches) and verifies and formats addresses in each
file by using online batch services.
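The splitting step can be pictured with a minimal sketch. It is not the component's actual implementation; the function name split_into_batches and the sample rows are hypothetical, and the sketch only shows how an input stream could be grouped into fixed-size batches before each batch is sent to an online service.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def split_into_batches(rows: Iterable[dict], batch_size: int) -> Iterator[List[dict]]:
    """Yield consecutive lists holding at most batch_size rows each."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Hypothetical input stream of seven address rows.
addresses = [{"ID": i, "Address1": f"{i} Example Street"} for i in range(1, 8)]
batches = list(split_into_batches(addresses, batch_size=3))
batch_sizes = [len(batch) for batch in batches]
```

With seven rows and a batch size of three, the last batch holds the single remaining row.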

tBatchAddressRowCloud Standard properties

These properties are used to configure tBatchAddressRowCloud running in the Standard Job framework.

The Standard
tBatchAddressRowCloud component belongs to the Data Quality family.

This component is available in the Palette of the Studio only if you have subscribed to one of the Talend Platform products.

Basic settings

Schema

A schema is a row description. It defines the number of fields (columns) to
be processed and passed on to the next component. The schema is either Built-In or stored remotely in the Repository.

 

Built-In: You create and store the
schema locally for this component only. Related topic: see
Talend Studio User Guide.

 

Repository: You have already created
the schema and stored it in the Repository. You can reuse it in various projects and
Job designs. Related topic: see
Talend Studio User Guide.

Edit Schema

Click the […] button and define
the input and output schema of the address data.

The output schema of tBatchAddressRowCloud proposes several read-only
address columns including a VerificationLevel column which provides you with a
verification status of the processed address. The verification
levels in this column are defined by Talend. For further information, see Address verification levels in tAddressRowCloud.

Some of the output columns may also be empty, depending on the
address provider you select in the component basic settings when
executing the Job.

Address Provider

Select from the list the provider of the reference data against
which you want to validate and format addresses.

The list of address providers includes Loqate and
MelissaData.

Default Country

Select the country whose ISO 3166-1 alpha-3 code
should be used when parsing data if no identifiable country is
found in an input record.

License/API key

Enter the license or the API key provided by the address provider
you select from the list.

You must visit the provider website, register and get the
license/API key.

Batch job name

Enter, between quotation marks, a name of your choice to give to
the batch files that will be generated and saved on the Loqate
server. These files hold the results of batch processing.

Number of rows in each batch file

Enter the number of address records you want to group in each
batch file.

Loqate website login

Enter your login provided by Loqate.

Loqate website password

Enter the password provided by Loqate.

Processing Mode

This option is applied only to the Loqate provider.

Select from the list the mode of address validation you want to
have:

Verify and Geocode (selected by
default): with this mode, the component standardizes and corrects
addresses and enriches them with latitude and longitude information.

Note:

Combining address verification and geocoding will use
extra credits. For further information, see Cloud Price Card.

Verify only: with this mode, the
component standardizes and corrects addresses without enriching them
with latitude and longitude information.

Input Mapping

Address field: add lines to the
table and select from the predefined address list the fields that
will hold input addresses.

tBatchAddressRowCloud provides a
long list of address fields because some countries have more complex
address structures than others. For further information about the
input fields, see Address fields in tLoqateAddressRow.

Input Column: add lines to the
table and select from the list the columns that hold the input
addresses. The input schema can have one or multiple columns and can
have columns that do not represent address data.

Use Additional Output

Select this check box and use the Output Mapping
table to add more address columns to the
output schema:

Address field: add lines to the
table and select from a predefined address list the fields of the
extra information you want to output.

These predefined address fields vary according to the provider you
select from the Address Provider
list. For further information about the additional address fields,
check the provider website.

Output Column: select from the
list the columns that will hold the additional address
information. You must first add these additional columns to the
tBatchAddressRowCloud output
schema through the Edit Schema
button.

tBatchAddressRowCloud maps the
values of the address fields to the output columns in the Output Column.

If an output column in the Output Address table has exactly the same
name as an input column, the input column value is overwritten
by the value returned by the component.
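The overwrite behavior can be illustrated with a short sketch. It is an assumption about how such a mapping behaves in general, not the component's code; merge_row, the field names and the sample values are all hypothetical.

```python
def merge_row(input_row: dict, address_fields: dict, output_mapping: dict) -> dict:
    """Copy the input row, then write each mapped address field to its output column.

    output_mapping maps an address field name to an output column name.
    An output column that shares its name with an input column replaces it.
    """
    output_row = dict(input_row)
    for field, column in output_mapping.items():
        output_row[column] = address_fields.get(field)
    return output_row


# Hypothetical data: the provider returns a corrected postal code.
input_row = {"ID": 1, "PostalCode": "7500", "Country": "France"}
address_fields = {"PostalCode": "75001", "Latitude": "48.86"}
output_mapping = {"PostalCode": "PostalCode", "Latitude": "Lat"}

merged = merge_row(input_row, address_fields, output_mapping)
```

Here the PostalCode input value is replaced because the output column has the same name, while Country passes through unchanged and Lat is added as a new column.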

Advanced settings

Fields in this view will vary according to the address provider
you select in the basic settings view.

Address Line Separator: define
the string which will separate the output address components within
the output address fields.

If you keep the default option, Default, in this field, the component uses the line
separator that matches the address provider you select: for example,
it uses the line break string (<BR>) with Loqate
and a semicolon (;) with MelissaData.
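As a rough sketch of this rule, the snippet below joins address components with either a custom separator or the provider default. The two default values come from the description above; the function name join_address_lines and the sample address are hypothetical, and the sketch does not reproduce the component's internal logic.

```python
# Provider defaults as described for the Default option.
DEFAULT_SEPARATORS = {"Loqate": "<BR>", "MelissaData": ";"}


def join_address_lines(lines, provider, separator="Default"):
    """Join non-empty address components with the chosen separator."""
    if separator == "Default":
        separator = DEFAULT_SEPARATORS[provider]
    return separator.join(line for line in lines if line)


lines = ["10 Downing Street", "", "London"]
loqate_address = join_address_lines(lines, "Loqate")
melissa_address = join_address_lines(lines, "MelissaData")
custom_address = join_address_lines(lines, "Loqate", separator=", ")
```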

Forced Country: select the
country name for which the ISO 3166-1 alpha-3 code should be used
for all input records when parsing data.
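The difference between Forced Country and the Default Country basic setting can be sketched as follows. This is an illustrative reading of the two descriptions, not the component's code: resolve_country is a hypothetical name, and the assumption is that a forced country always wins while a default country is only a fallback.

```python
from typing import Optional


def resolve_country(detected: Optional[str],
                    default_country: Optional[str] = None,
                    forced_country: Optional[str] = None) -> Optional[str]:
    """Return the ISO 3166-1 alpha-3 code to use for one input record."""
    if forced_country:
        # Forced Country applies to all input records.
        return forced_country
    # Default Country applies only when no country was identified.
    return detected or default_country


with_forced = resolve_country("GBR", default_country="FRA", forced_country="DEU")
with_default = resolve_country(None, default_country="FRA")
with_detected = resolve_country("GBR", default_country="FRA")
```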

Output Script: select the
script into which the output address is transliterated.

The script list differs according to the address provider you
select.

When the address provider is Loqate or MelissaData:

If you keep the default option, Not set,
in this field, the component checks the input
data and decides to use Native or Latin according to whether the
bigger portion of the input is Native or Latin.

Select Latin to encode the
parsing results in Latin, or western characters.

Select Native/Match input to
encode the parsing results using the country script wherever
possible.

The Native/Match input script
includes the following supported character sets (scripts) and
languages tBatchAddressRowCloud can
transliterate:

Cyrl – Cyrillic (Russia)

Grek – Greek (Greece)

Hebr – Hebrew (Israel)

Hani – Kanji (Japan)

Hans – simplified Chinese (China)

Arab – Arabic (United Arab Emirates)

Thai – Thai (Thailand)

Hang – Hangul (South Korea)
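The behavior of the Not set option, which picks Native or Latin depending on which script makes up the bigger portion of the input, can be approximated with the sketch below. The real heuristic used by the providers is not documented here, so this is only an assumption: it counts letters whose Unicode name starts with LATIN against all other letters, and it returns Latin on a tie.

```python
import unicodedata


def choose_output_script(text: str) -> str:
    """Return "Native" if most letters are non-Latin, otherwise "Latin"."""
    latin = native = 0
    for ch in text:
        if not ch.isalpha():
            continue  # ignore digits, spaces and punctuation
        if unicodedata.name(ch, "").startswith("LATIN"):
            latin += 1
        else:
            native += 1
    return "Native" if native > latin else "Latin"


cyrillic_choice = choose_output_script("ул. Тверская 7, Москва")
latin_choice = choose_output_script("10 Downing Street, London")
```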

Minimum match score: set the
minimum match score a record must reach in order not to be reverted.
The default value is zero, and valid values are between zero and
100.

This option is very helpful when you want to get, in the output
fields, the input data if a specific level of verification (minimum
match score) was not reached.
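A minimal sketch of this revert rule, assuming the match score is available as a number between 0 and 100 for each record. The function name apply_minimum_match_score and the sample rows are hypothetical.

```python
def apply_minimum_match_score(input_row: dict, verified_row: dict,
                              match_score: float, minimum: float = 0) -> dict:
    """Keep the verified row only if its score reaches the minimum."""
    if not 0 <= minimum <= 100:
        raise ValueError("minimum must be between 0 and 100")
    # A record that does not reach the minimum is reverted to the input data.
    return verified_row if match_score >= minimum else input_row


original = {"Address1": "10 downing st"}
verified = {"Address1": "10 Downing Street"}

kept = apply_minimum_match_score(original, verified, match_score=95, minimum=80)
reverted = apply_minimum_match_score(original, verified, match_score=60, minimum=80)
```

With the default minimum of zero, no record is ever reverted.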

Minimum interval between two queries (milliseconds): set, in milliseconds, the minimum wait
period between two queries.

Limit of retrying the same query in case it fails (times): set the number of times a query should
be retried in case of failure.

Interval between two retries of the same query (milliseconds): set, in milliseconds, the minimum
wait period between two tries of the same query.

Delay before forcing the termination of the query executor (seconds): set, in seconds, the wait
period before forcing the query executor to shut down.
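The retry limit and the retry interval work together in the usual way for a retry loop. The sketch below is a generic illustration under that assumption, not the component's query executor; run_with_retries and flaky_query are hypothetical names, and the sleep function is injectable so the example runs without actually waiting.

```python
import time


def run_with_retries(query, max_retries: int, retry_interval_ms: int, sleep=time.sleep):
    """Call query(); on failure, wait and retry up to max_retries times."""
    attempt = 0
    while True:
        try:
            return query()
        except Exception:
            if attempt >= max_retries:
                raise  # retry limit reached, give up
            attempt += 1
            sleep(retry_interval_ms / 1000.0)


# Hypothetical query that fails twice before succeeding.
calls = {"count": 0}


def flaky_query():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary failure")
    return "OK"


waits = []  # records the requested wait periods instead of sleeping
result = run_with_retries(flaky_query, max_retries=3, retry_interval_ms=500, sleep=waits.append)
```

In this run the query is called three times and two wait periods of half a second are requested.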

Use mockup mode (no credit consumption)

Before being able to use this option, you must run your Job at
least once to create batch files on the Loqate server.

This option is only for testing or for development needs. You will
not be charged for batch processing if you select to run the Job
with this option.

Select this check box to simulate execution and responses from the
Loqate server by using as output a batch file that has been
previously processed and saved on the server.

Batch ID: set the identifier of
the batch file you want to use as input in your Job.

You can get the file identifier if you log in at Everything Location
and access the Loqate server at Online Batch Cleansing.

tStatCatcher Statistics

Select this check box to collect log data at the component
level.

Global Variables


ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable and it returns a string. This
variable functions only if the Die on error check box is
cleared, if the component has this check box.

A Flow variable functions during the execution of a component while an After variable
functions after the execution of the component.

To fill up a field or expression with a variable, press Ctrl + Space
to access the variable list and choose the variable to use from it.

For further information about variables, see
Talend Studio User Guide.

Usage

Usage rule

This component is an intermediary step. It requires input and
output flows.

Scenario: Parsing addresses against reference data in the Cloud using batch
processing

This scenario applies only to a subscription-based Talend Platform solution or Talend Data Fabric.

This scenario describes a Job which uses:

  • the tFixedFlowInput component to generate the
    address data to be analyzed,

  • the tBatchAddressRowCloud component to parse,
    standardize and format the addresses in the Cloud through the Address Validation
    API,

  • the tFileOutputExcel component to output the
    correctly formatted addresses in an .xls file.

use_case-tbatchaddressrowcloud.png

You must have an internet connection to be able to use tBatchAddressRowCloud.

Setting up the Job

  1. Drop the following components from the Palette onto the design workspace: tFixedFlowInput, tBatchAddressRowCloud and tFileOutputExcel.
  2. Connect the three components together using the Main links.

Configuring the input component

  1. Double-click tFixedFlowInput to open its
    Basic settings view in the Component tab.

    use_case-tbatchaddressrowcloud2.png

  2. Create the schema through the Edit Schema
    button.

    In the open dialog box, click the [+]
    button and add the columns that will hold the information in the input
    address. For this example, add ID,
    Organization, Address1 x 8,
    Locality, AdministrativeArea,
    PostalCode and Country.
  3. Click OK.
  4. In the Number of rows field, enter
    1.
  5. In the Mode area, select the Use Inline Content option.
  6. In the Content table, enter the address
    data you want to analyze, for example:

Parsing addresses against Loqate

Setting the schema and selecting an address provider

  1. Double-click tBatchAddressRowCloud to
    display the Basic settings view and define
    the component properties.

    use_case-tbatchaddressrowcloud3.png

  2. If required, click Sync columns to
    retrieve the schema defined in the input component.
  3. Click the Edit schema button to open the
    schema dialog box.

    tBatchAddressRowCloud proposes several
    predefined read-only address columns as shown in the capture below.
    use_case-tbatchaddressrowcloud4.png

    The STATUS column returns the status of
    processing input addresses. For further information about process status,
    see Process status in tLoqateAddressRow.
    The AddressVerificationCode column returns the verification
    code for the processed address. For further information about what values
    this code is made up of and the implications of each segment, see Address verification codes in tLoqateAddressRow.
    The VerificationLevel output column
    provides you with a verification status of the processed addresses. For
    further information, see Address verification levels in tAddressRowCloud.
  4. Move any of the input columns to the output schema if you want to show
    them in the verification results. Then click OK
    and accept the propagation of the changes.

    You can also add columns directly in the output schema to retrieve
    additional address information from the provider repository.
  5. Select from the Address Provider list the
    provider of the reference data against which you want to validate and format
    input addresses, Loqate in this
    example.

    You can also validate addresses against the MelissaData online service.
  6. In the License/API key field, enter the
    license key provided by Loqate.
  7. In the Batch job name field, enter
    between quotation marks a name of your choice to give to the batch files
    that will be generated and saved on the Loqate server.

    Set the number of address records you want to group in each batch file in
    the Number of rows in each batch file
    field.
  8. Enter the login and password provided by Loqate in the Loqate website login and Loqate website password fields, respectively.
  9. From the Processing Mode list,
    select:

    • Verify and Geocode (selected by default): to standardize and correct
      addresses and enrich them with latitude and longitude.

      Combining address verification and geocoding will cost
      extra credits. For further information, see Cloud Price Card.

    • Verify only: to standardize and correct addresses without enriching
      them with latitude and longitude.

Defining address mapping and setting advanced parameters

  1. In the Input Mapping table:

    • Use the [+] button to add lines
      in the table.

    • Click in the Address Field column
      and select from the predefined list the fields that hold the input
      address, Address in this
      example.

      The component will map the values of these fields to the input
      columns you set in this table.

      tBatchAddressRowCloud provides a
      list of individual fields because some countries have more complex
      addressing structures than others.

    • Click in the Input Column column
      and select from the list of the input schema the columns that hold
      the input address you want to parse, Address1 in this example.

  2. If required, select the Use Additional Output
    check box and define in the table what extra address
    fields you want to retrieve from the provider repository and add to the
    parsing results. For an example of how to use this table, check Defining additional address fields.

    The Address field column holds predefined
    address fields which vary according to the provider you select. The
    Output Column column holds the fields
    you want to use to output the extra information. You must first add these
    additional columns to the component schema through the Edit Schema button.
  3. Click the Advanced settings tab and set the
    parameters in this view according to your needs.

    In this example:
    • Select the Use mockup mode (no credit consumption)
      check box.

      This check box enables you to simulate execution and responses from
      the Loqate server by using as input a batch file that has been already
      processed by the Job and saved on the server.

    • Log in to Everything Location
      and access the Loqate server at Online Batch Cleansing
      to fetch the identifier of the batch file you
      want to use as output in your Job.

      use_case-tbatchaddressrowcloud6.png
    • Set the identifier in the Batch ID
      field.

      use_case-tbatchaddressrowcloud8.png

      This option is used only for testing or for development needs.

    • Leave all other default parameters as they are.

Configuring the output component and executing the Job

  1. Double-click the tFileOutputExcel
    component to display the Basic settings
    view and define the component properties.

    use_case-tbatchaddressrowcloud7.png

  2. Set the destination file name as well as the sheet name and then select
    the Define all columns auto size check
    box.
  3. Save your Job and press F6 to execute
    it.

    The tBatchAddressRowCloud component
    parses addresses using batch processing. It corrects addresses using the
    online batch service of Loqate and writes the result in batch files on the
    Loqate server.
  4. Right-click the output component and select Data Viewer
    to display the formatted address data.

    use_case-tbatchaddressrowcloud5.png

    tBatchAddressRowCloud matches input
    address data against the Loqate repository.
    The STATUS output column returns the OK status
    for all address rows. This means that the verification process of all
    address rows could be completed successfully by the component. For further
    information about process status, see Process status in tLoqateAddressRow.
    The VerificationLevel output column
    provides you with the verification levels defined by Talend.
    For further information, see Address verification levels in tAddressRowCloud.
    The AddressVerificationCode output column returns a
    verification code for each of the processed address rows.
    For further information about what values this code is made up of and the
    implications of each segment, see Address verification codes in tLoqateAddressRow.

Document obtained from Talend: https://help.talend.com