August 17, 2023

tBatchAddressRowCloud – Docs for ESB 5.x

tBatchAddressRowCloud

Warning

This component will be available in the Palette of
Talend Studio on the condition that you have subscribed to one of
the Talend Platform products.

tBatchAddressRowCloud properties

Component family

Data Quality

 

Function

tBatchAddressRowCloud splits
addresses from input streams into several files (batches) and
verifies and formats addresses in each file by using online batch
services.

Purpose

tBatchAddressRowCloud enables you
to use batch processing to parse address data and get formatted
addresses quickly, accurately and without installing any software.

Address data is corrected against the latest online reference data
from providers that support batch services, including Loqate and
MelissaData. tBatchAddressRowCloud
proposes alternatives for missing address data such as country or
postal code, and enriches addresses with other elements such as
latitude and longitude.

The advantage of this component over tAddressRowCloud is better performance when
dealing with large amounts of data.

Warning

The quota and charges depend on the license provided by
the web service provider. Check the providers' websites for more
information.

Basic settings

Schema

A schema is a row description. It defines the number of fields to be processed and passed on
to the next component. The schema is either Built-In or
stored remotely in the Repository.

Since version 5.6, both the Built-In mode and the Repository mode are
available in any of the Talend solutions.

 

 

Built-In: You create and store the schema locally for this
component only. Related topic: see Talend Studio
User Guide.

 

 

Repository: You have already created the schema and
stored it in the Repository. You can reuse it in various projects and Job designs. Related
topic: see Talend Studio User Guide.

 

Edit Schema

Click the […] button and define
the input and output schema of the address data.

The output schema of tBatchAddressRowCloud proposes several read-only address
columns including a VerificationLevel column which provides you with a
verification status of the processed address. The verification
levels in this column are defined by Talend. For
further information, see Address verification levels in tAddressRowCloud.

Some of the output columns may be empty, depending on the
address provider you select in the component basic settings when
executing the Job.

 

Address Provider

Select from the list the provider of the reference data against
which you want to validate and format addresses.

The list of address providers includes Loqate and
MelissaData.

 

Default Country

Select the country whose ISO 3166-1 alpha-3 code should be used
when parsing data if no identifiable country is found in an
input record.
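
For reference, an ISO 3166-1 alpha-3 code is the three-letter identifier of a country, such as FRA for France. The following sketch is not part of the component; it only shows how such a code can be looked up with the standard java.util.Locale class. The class and method names are illustrative assumptions.

```java
import java.util.Locale;

// Illustrative helper, not part of tBatchAddressRowCloud.
final class CountryCodeExample {

    // Returns the ISO 3166-1 alpha-3 code for a two-letter (alpha-2) country code.
    static String toAlpha3(String alpha2) {
        Locale locale = new Locale.Builder().setRegion(alpha2).build();
        return locale.getISO3Country();
    }

    public static void main(String[] args) {
        System.out.println(toAlpha3("FR")); // FRA
        System.out.println(toAlpha3("US")); // USA
    }
}
```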

 

License/API key

Enter the license or the API key provided by the address provider
you select from the list.

You must visit the provider website, register and get the
license/API key.

Only Loqate 

Batch job name

Enter a name of your choice to give to the batch files that will
be generated and saved on the Loqate server to hold the results of
batch processing.

Only Loqate 

Number of rows in each batch file

Enter the number of address records you want to group in each
batch file.
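
To picture what this setting controls, the sketch below groups a list of address records into consecutive batches of a fixed maximum size. It is a simplified illustration and not the component's actual implementation; the class name, method name and sample records are assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative helper, not part of tBatchAddressRowCloud.
final class BatchSplitExample {

    // Splits the records into consecutive batches of at most rowsPerBatch rows.
    static <T> List<List<T>> split(List<T> records, int rowsPerBatch) {
        if (rowsPerBatch <= 0) {
            throw new IllegalArgumentException("rowsPerBatch must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < records.size(); start += rowsPerBatch) {
            int end = Math.min(start + rowsPerBatch, records.size());
            batches.add(new ArrayList<>(records.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        // Placeholder records standing in for address rows.
        List<String> records = Arrays.asList("a", "b", "c", "d", "e");
        System.out.println(split(records, 2)); // [[a, b], [c, d], [e]]
    }
}
```

With five records and two rows per batch, the last batch holds the single remaining record.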

Only Loqate 

Loqate website login

Enter your login provided by Loqate.

Only Loqate 

Loqate website password

Enter the password provided by Loqate.

Only Loqate 

Processing Mode

This option is applied only to the Loqate provider.

Select from the list the mode of address validation you want to
have:

Verify and Geocode (selected by
default): with this mode, the component standardizes and corrects
addresses and enriches them with latitude and longitude information.

Note

Combining address verification and geocoding will use extra
credits. For further information, see Cloud Price Card.

Verify only: with this mode, the
component standardizes and corrects addresses without enriching them
with latitude and longitude information.

 

Input Mapping

Address field: add lines to the
table and select from the predefined address list the fields that
will hold input addresses.

tBatchAddressRowCloud provides a
long list of address fields because some countries have more complex
address structures than others. For further information about the
input fields, see Address fields in tLoqateAddressRow.

Input Column: add lines to the
table and select from the list the columns that hold the input
addresses. The input schema can have one or multiple columns and can
have columns that do not represent address data.

 

Use Additional Output

Select this check box and use the Output Mapping table
to add more address columns to the output schema:

Address field: add lines to the
table and select from a predefined address list the fields of the
extra information you want to output.

These predefined address fields vary according to the provider you
select from the Address Provider list. For further information
about the additional address fields, check the provider website.

Output Column: select from the
list the columns that will hold the additional address
information. You must first add these additional columns to the
tBatchAddressRowCloud output schema
through the Edit Schema button.

tBatchAddressRowCloud maps the
values of the address fields to the output columns in the Output Column.

If an output column in the Output Address table has exactly the same name
as an input column, the input column value is overwritten by
the value given by the component.

Advanced settings

Fields in this view will vary according to the address provider
you select in the basic settings view.

Address Line Separator: define
the string which will separate the output address components within
the output address fields.

If you keep the default option, Default, in this field, the component uses the line
separator of the address provider you select: for example,
it uses the line break string (<BR>) with Loqate and
a semicolon (;) with MelissaData.
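
A downstream step may need to break an output address field back into its components. The sketch below assumes the field uses the <BR> separator mentioned above; the class name, method name and sample address are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Illustrative helper, not part of tBatchAddressRowCloud.
final class AddressLineSplitExample {

    // Splits an address field on a literal separator such as "<BR>" or ";".
    static List<String> splitLines(String address, String separator) {
        // Pattern.quote treats the separator as plain text, not as a regex.
        return Arrays.asList(address.split(Pattern.quote(separator)));
    }

    public static void main(String[] args) {
        // Made-up sample value for demonstration only.
        String address = "10 Downing Street<BR>London<BR>SW1A 2AA";
        System.out.println(splitLines(address, "<BR>"));
    }
}
```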

Forced Country: select the
country name for which the ISO 3166-1 alpha-3 code should be used
for all input records when parsing data.

Output Script: select the
script into which the output address is transliterated.

The script list differs according to the address provider you
select.

When the address provider is Loqate or MelissaData:

If you keep the default option, Not set, in this field,
the component checks the input data
and decides to use Native or Latin according to whether the larger
portion of the input is Native or Latin.

Select Latin to encode the
parsing results in Latin, or western characters.

Select Native/Match input to
encode the parsing results using the country script wherever
possible.

The Native/Match input script
includes the following supported character sets (scripts) and
languages tBatchAddressRowCloud can
transliterate:

Cyrl – Cyrillic (Russia),

Grek – Greek (Greece),

Hebr – Hebrew (Israel),

Hani – Kanji (Japan),

Hans – simplified Chinese (China),

Arab – Arabic (United Arab Emirates),

Thai – Thai (Thailand),

Hang – Hangul (South Korea).
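
The Not set behavior described above, choosing Native or Latin according to the larger portion of the input, can be pictured with the following sketch. The component's actual detection logic is not documented here, so this is only one plausible reading: letters are counted per Unicode script, and ties fall back to Latin, which is an assumption. The class and method names are illustrative.

```java
// Illustrative helper, not part of tBatchAddressRowCloud.
final class ScriptGuessExample {

    // Returns "Native" when non-Latin letters outnumber Latin letters, else "Latin".
    static String guessScript(String input) {
        int latin = 0;
        int other = 0;
        for (int i = 0; i < input.length(); ) {
            int codePoint = input.codePointAt(i);
            i += Character.charCount(codePoint);
            if (!Character.isLetter(codePoint)) {
                continue; // digits, spaces and punctuation are ignored
            }
            if (Character.UnicodeScript.of(codePoint) == Character.UnicodeScript.LATIN) {
                latin++;
            } else {
                other++;
            }
        }
        // Assumption: a tie falls back to Latin.
        return other > latin ? "Native" : "Latin";
    }

    public static void main(String[] args) {
        System.out.println(guessScript("10 Downing Street")); // Latin
        // "\u041c\u043e\u0441\u043a\u0432\u0430" is "Moscow" written in Cyrillic.
        System.out.println(guessScript("\u041c\u043e\u0441\u043a\u0432\u0430 1")); // Native
    }
}
```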

Minimum match score: set the
minimum match score a record must reach in order not to be reverted.
The default value is zero, and valid values are between zero and
100.

This option is very helpful when you want to get, in the output
fields, the input data if a specific level of verification (minimum
match score) was not reached.
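
The effect of this threshold can be summarized by the sketch below: when the score of a record is below the minimum, the input value is kept instead of the verified value. The names are illustrative, and treating a score equal to the minimum as sufficient is an assumption based on the wording "must reach".

```java
// Illustrative helper, not part of tBatchAddressRowCloud.
final class MatchScoreExample {

    // Returns the verified value when the score reaches the minimum, else the input value.
    static String selectOutput(String input, String verified, int matchScore, int minimumMatchScore) {
        if (minimumMatchScore < 0 || minimumMatchScore > 100) {
            throw new IllegalArgumentException("minimumMatchScore must be between 0 and 100");
        }
        return matchScore >= minimumMatchScore ? verified : input;
    }

    public static void main(String[] args) {
        // Made-up values for demonstration only.
        System.out.println(selectOutput("1 main st", "1 Main Street", 95, 80)); // 1 Main Street
        System.out.println(selectOutput("1 main st", "1 Main Street", 60, 80)); // 1 main st
    }
}
```

With the default minimum of zero, no record is reverted in this sketch, since every score reaches the threshold.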

Minimum interval between two queries (milliseconds): set,
in milliseconds, the minimum wait period between two queries.

Limit of retrying the same query in case it fails (times):
set the number of times a query should be
retried in case of failure.

Interval between two retries of the same query (milliseconds):
set, in milliseconds, the minimum wait
period between two tries of the same query.
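
The two retry settings above can be read together as a simple retry loop: a failed query is tried again up to the configured limit, with a pause between tries. The sketch below is a generic illustration under that reading, not the component's code. The names are hypothetical, and counting the first call separately from the retries is an assumption.

```java
import java.util.function.Supplier;

// Illustrative helper, not part of tBatchAddressRowCloud.
final class RetryExample {

    // Runs the query once, then retries it up to retryLimit times on failure,
    // waiting retryIntervalMillis between two tries.
    static <T> T callWithRetry(Supplier<T> query, int retryLimit, long retryIntervalMillis) {
        if (retryLimit < 0) {
            throw new IllegalArgumentException("retryLimit must be >= 0");
        }
        RuntimeException lastFailure = null;
        for (int attempt = 0; attempt <= retryLimit; attempt++) {
            try {
                return query.get();
            } catch (RuntimeException e) {
                lastFailure = e;
                if (attempt < retryLimit) {
                    pause(retryIntervalMillis);
                }
            }
        }
        throw lastFailure; // every try failed
    }

    private static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting to retry", e);
        }
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Hypothetical query that fails twice before succeeding.
        String result = callWithRetry(() -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("temporary failure");
            }
            return "ok";
        }, 3, 10L);
        System.out.println(result + " after " + calls[0] + " calls"); // ok after 3 calls
    }
}
```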

Delay before forcing the termination of the query executor (seconds):
set, in seconds, the wait period
before forcing the query executor to shut down.

Interval to check the output handler (milliseconds):
set, in milliseconds, the time to wait
before retrieving the cleansing results in the next batch
file.

Fetch and reuse previous batch data (no credit consumption)

This option is only for testing or for development needs. You will
not be charged for batch processing if you select this option before
running the Job.

Select this check box to simulate execution and responses from the
Loqate server by using as input a batch file that has been processed
and saved on the server.

Batch ID: set the identifier of
the batch file you want to use as input in your Job.

You can get the file identifier by accessing the Loqate server at
Your Batch Cloud Jobs.

tStatCatcher Statistics

Select this check box to collect log data at the component
level.

Global Variables

ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable and it returns a string. This
variable functions only if the Die on error check box is
cleared, if the component has this check box.

A Flow variable functions during the execution of a component while an After variable
functions after the execution of the component.

To fill up a field or expression with a variable, press Ctrl + Space
to access the variable list and choose the variable to use from it.

For further information about variables, see Talend Studio
User Guide.
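
In the Java code of a Job, global variables are typically read from the globalMap object with a key made of the component's unique name and the variable name. The sketch below simulates globalMap with a plain HashMap so that it can run outside Talend Studio. The unique name tBatchAddressRowCloud_1, the helper class and the sample message are assumptions for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative helper; in a real Job, globalMap is provided by Talend Studio.
final class GlobalVariableExample {

    // Reads the ERROR_MESSAGE After variable of the given component, or null if unset.
    static String readErrorMessage(Map<String, Object> globalMap, String componentName) {
        return (String) globalMap.get(componentName + "_ERROR_MESSAGE");
    }

    public static void main(String[] args) {
        // Stand-in for the globalMap of a running Job.
        Map<String, Object> globalMap = new HashMap<>();
        // Made-up error message for demonstration only.
        globalMap.put("tBatchAddressRowCloud_1_ERROR_MESSAGE", "Sample error");

        System.out.println(readErrorMessage(globalMap, "tBatchAddressRowCloud_1")); // Sample error
    }
}
```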

Usage

This component is an intermediary step. It requires input and
output flows.

Limitation

n/a


Document retrieved from Talend: https://help.talend.com