tAddressRowCloud
Verifies and formats international addresses in the Cloud by using online
services.
tAddressRowCloud enables you to parse address data and get formatted
addresses quickly, accurately and without installing any software.
Address data is corrected against the latest online reference data from several providers,
including Loqate, MelissaData, Google and QAS. tAddressRowCloud
proposes alternatives for missing address data such as country or postal code, and
enriches addresses with other elements such as latitude and longitude.
For further information about the terms of service of the Google
Places API, see Terms of Service.
Address verification levels in tAddressRowCloud
The tAddressRowCloud component outputs a VerificationLevel column. This column lists
the address verification levels defined by Talend.
The providers supported in the component (Loqate, MelissaData and so on) have different
verification levels because they use different databases and different algorithms to verify
addresses. The verification results returned by each provider are mapped to the Talend
verification levels.
The following table describes the verification levels that are output by the component.
Verification levels | Description |
---|---|
Verified | A complete match is made between the input data and a single record from the |
Partially Verified | A complete match is made between the input data and a single record from the |
Unverified | A partial match is made between the input data and a single record from the |
Ambiguous | More than one close reference data match is found. |
Conflict | More than one close reference data match is found with conflicting |
Reverted | The record cannot be verified with a minimum acceptable level. Output fields |
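As an illustration of how a downstream step might use the VerificationLevel column, here is a minimal Python sketch that sorts rows into three buckets. The level names are copied from the table above; the assumption that the column contains exactly these strings, the bucket names, and the triage policy itself are hypothetical and not part of the component.

```python
from collections import defaultdict

# Level names copied from the table above. That the component emits exactly
# these strings is an assumption; adjust them to match your actual output.
ACCEPT = {"Verified"}
REVIEW = {"Partially Verified", "Ambiguous", "Conflict"}
# Any other value ("Unverified", "Reverted", empty, unknown) is rejected.


def triage(rows, level_column="VerificationLevel"):
    """Group rows into 'accept', 'review' and 'reject' lists.

    The policy is hypothetical: it only shows one way to act on the levels.
    """
    buckets = defaultdict(list)
    for row in rows:
        level = (row.get(level_column) or "").strip()
        if level in ACCEPT:
            buckets["accept"].append(row)
        elif level in REVIEW:
            buckets["review"].append(row)
        else:
            buckets["reject"].append(row)
    return dict(buckets)


if __name__ == "__main__":
    sample = [
        {"address": "1 Rue de l'Abbaye, Paris", "VerificationLevel": "Verified"},
        {"address": "1 nonsense st", "VerificationLevel": "Reverted"},
    ]
    for bucket, items in triage(sample).items():
        print(bucket, len(items))
```

Which levels count as acceptable depends on the use case, so the two sets at the top are the part to adapt first.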
tAddressRowCloud Standard properties
These properties are used to configure tAddressRowCloud running in the Standard Job framework.
The Standard tAddressRowCloud component belongs to the Data Quality family.
This component is available in Talend Data Management Platform, Talend Big Data Platform, Talend Real Time Big Data Platform, Talend Data Services Platform, Talend MDM Platform and Talend Data Fabric.
Basic settings
Schema | A schema is a row description. It defines the number of fields |
 | Built-In: You create and store the schema locally for this component |
 | Repository: You have already created the schema and stored it in the |
Edit Schema | Click the […] button and define the The output schema of tAddressRowCloud Also some of the output columns could be empty depending on what |
Address Provider | Select from the list the provider of the reference data against which The list of address providers includes Google, Loqate, QAS and |
License/API key | Enter the license or the API key provided by the address provider you When you select Google as a provider, the component uses the Google |
Processing Mode | This option is applied only to the Loqate provider. Select from the list the mode of address validation you want to
- Verify and Geocode (selected by default): with this mode, the component standardizes and corrects addresses and enriches them with latitude and longitude information. Note: Combining address verification and geocoding will cost extra
- Verify only: with this mode, the |
Country | This option is applied only to the QAS provider. Select from the list the country corresponding to your input When you select QAS as a provider, the component uses the QAS Pro |
QAS OnDemand username | This option is applied only to the QAS provider. Enter the username you can find in the license provided by QAS. You can check your username from the QAS OnDemand |
Password | This option is applied only to the QAS provider. Enter the password you can find in the license provided by QAS. You can check your password from the QAS OnDemand |
Use security mode to connect | Select this check box to connect to the Cloud in a secure mode. This This check box is not available with all address providers. |
Mapping | Address field: add lines to the table The address list includes the following columns for all address
Input Column: add lines to the table |
Use Additional Output | This option is not available for the QAS provider. Select this check box and use the Output
Address field: add lines to the table These predefined address fields vary according to the provider you
Output Column: select from the list
tAddressRowCloud maps the values of If you select to have an output column in the Output Address table that has the exact name of an input |
Die on error | Select the check box to stop the execution of the Job when an error Clear the check box to skip any rows on error and complete the process for |
Advanced settings
Fields in this view will vary according to the address provider you
- Address Line Separator: define the
  If you keep the default option, Default in this field, the component uses the line
- Default Country: select the country
- Forced Country: select the country
- Output Script: select the
  The script list differs according to the address provider you
  When the address provider is Loqate or MelissaData:
  If you keep the default option, Not
  Select Latin to encode the parsing
  Select Native/Match input to encode
  The Native/Match input script
- Minimum match score: set the minimum
  This option is very helpful when you want to get, in the output
- Minimum interval between two queries
- Limit of retrying the same query in case it
- Interval between two retries of the same query
- Delay before forcing the termination of the |
tStat | Select this check box to collect log data at the component |
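To make the query interval and retry options listed above more concrete, the following Python sketch shows what such settings usually mean in client code that calls an online service. It is a generic illustration, not the component's implementation: the names `Throttle`, `call_with_retry`, `retry_limit` and `retry_interval` are hypothetical, and the exact semantics of the component's options may differ.

```python
import time


class Throttle:
    """Enforce a minimum interval (in seconds) between two consecutive queries."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous query, None before the first one

    def wait(self):
        """Block until at least min_interval has passed since the last query."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def call_with_retry(query, retry_limit=3, retry_interval=1.0, sleep=time.sleep):
    """Call query(); if it raises, retry up to retry_limit more times.

    retry_interval is the pause between two attempts of the same query.
    The last error is re-raised once the retry limit is exhausted.
    """
    attempt = 0
    while True:
        try:
            return query()
        except Exception:
            if attempt >= retry_limit:
                raise
            attempt += 1
            sleep(retry_interval)
```

A caller would typically invoke `throttle.wait()` before each request and wrap the request itself in `call_with_retry`. The `clock` and `sleep` parameters exist only so the timing logic can be exercised without real delays.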
Global Variables
Global Variables | ERROR_MESSAGE: the error message generated by the
A Flow variable functions during the execution of a component while an After variable
To fill up a field or expression with a variable, press Ctrl +
For further information about variables, see |
Usage
Usage rule | This component is usually used as an intermediate component, and it requires an
This component enables you to create a data flow, using a Row > Main link, and to create a reject flow with a Row > Reject link filtering the data in error. |
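The relationship between the Main flow, the Reject flow and the Die on error option can be pictured with a short Python sketch. This is only a conceptual model under stated assumptions: `run_rows`, `verify` and the `errorMessage` column name are hypothetical and do not describe the code generated by the Studio.

```python
def run_rows(rows, verify, die_on_error=False):
    """Send each row through verify() and split the results into two flows.

    Rows that verify() processes go to the main flow. Rows on which it raises
    go to the reject flow with the error text attached, unless die_on_error
    is set, in which case the first error stops the whole run.
    """
    main, reject = [], []
    for row in rows:
        try:
            main.append(verify(row))
        except Exception as exc:
            if die_on_error:
                raise
            # "errorMessage" is a hypothetical column name for this sketch.
            reject.append({**row, "errorMessage": str(exc)})
    return main, reject


if __name__ == "__main__":
    def fake_verify(row):
        # Stand-in for the call to the address provider.
        if not row["address"]:
            raise ValueError("empty address")
        return {**row, "STATUS": "OK"}

    ok_rows, rejected = run_rows([{"address": "1 Rue de l'Abbaye, Paris"}, {"address": ""}], fake_verify)
    print(len(ok_rows), "main row(s),", len(rejected), "rejected row(s)")
```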
Parsing addresses against reference data in the Cloud
This scenario applies only to Talend Data Management Platform, Talend Big Data Platform, Talend Real Time Big Data Platform, Talend Data Services Platform, Talend MDM Platform and Talend Data Fabric.
This scenario describes a three-component Job that:
- uses the tFixedFlowInput component to generate the address data to be analyzed,
- uses the tAddressRowCloud component to parse, standardize and format the addresses in the Cloud through the Address Validation API,
- uses a tFileOutputExcel component to output the correctly formatted addresses in an .xls file.
You must have an internet connection to use tAddressRowCloud.
Setting up the Job
- Drop the following components from the Palette onto the design workspace: tFixedFlowInput, tAddressRowCloud and tFileOutputExcel.
- Connect the three components together using the Main links.
Configuring the input component
- Double-click tFixedFlowInput to open its Basic settings view in the Component tab.
- Create the schema through the Edit Schema button.
  In the dialog box that opens, click the [+] button and add the columns that will hold the information in the input address, in this example: Address and Country.
- Click OK.
  Address and Country columns are created in the Inline Table.
- In the Number of rows field, set the number of rows to 1.
- In the Mode area, select the Use Inline Table option.
- In the Content table, enter the address data you want to analyze, for example:
    "1 Chemin de l'Abbaye, Paris"
    "1 Rue de l'Abbaye, Paris"
    "1 Place de l'Abbaye basset, Paris"
  Set the country for the three address lines to FRA.
Parsing addresses against Loqate
Setting the schema and defining address mapping
- Double-click tAddressRowCloud to display the Basic settings view and define the component properties.
- If required, click Sync columns to retrieve the schema defined in the input component.
- Click the Edit schema button to open the schema dialog box.
  tAddressRowCloud proposes several predefined read-only address columns.
  The STATUS column returns the status of processing input addresses. For further information about process status, see Process status in tLoqateAddressRow.
  The AddressVerificationCode column returns the verification code for the processed address. For further information about what values this code is made up of and the implications of each segment, see Address verification codes in tLoqateAddressRow.
- Move any of the input columns to the output schema according to your needs, click OK and accept the prompt to propagate the changes.
  You can also add columns directly in the output schema to retrieve additional address information from the Loqate repository.
- Select from the Address Provider list the provider of the reference data against which you want to validate and format input addresses, Loqate in this example.
- Select the Use security mode to connect check box to connect to the provider repository in a secure mode.
  This may have a slight impact on performance.
- In the License/API key field, enter the license key provided by Loqate.
- From the Processing Mode list, select:
  Option | To… |
  ---|---|
  Verify and Geocode (selected by default) | standardize and correct addresses and enrich them with latitude and longitude. Note: Combining address verification and geocoding will cost extra credits. For further information, see Cloud Price Card. |
  Verify only | standardize and correct addresses without enriching them with latitude and longitude. |
- In the Mapping table:
  - Use the [+] button to add lines in the table.
  - Click in the Address Field column and select from the list predefined in the component the fields that hold the input address, Address and Country in this example.
    The component will map the values of these fields to the input columns you set in this table.
    tAddressRowCloud provides a list of individual fields because some countries have more complex addressing structures than others.
  - Click in the Input Column column and select from the list of the input schema the columns that hold the input address, address and country in this example.
Defining additional address fields
- If required, select the Use Additional Output check box to retrieve additional address information from the provider repository.
- Click the Edit schema button to open the schema dialog box and add in the output schema the columns which will hold the extra address information. Add all_info and Geo_info columns for this example.
- In the Output Mapping table:
  - Use the [+] button to add lines in the table.
  - Click in the Address Field column and select from the predefined list the additional address fields you want to add to the output schema.
  - Click in the Output Column column and select from tAddressRowCloud output schema the columns that will hold the additional address information.
  The component maps the values of the address fields in the Loqate repository to the output columns you set in the table.
- Set the parameters in the Advanced settings view according to your needs.
  The default parameters are not changed for this example.
Configuring the output component and executing the Job
- Double-click the tFileOutputExcel component to display the Basic settings view and define the component properties.
- Set the destination file name as well as the sheet name and then select the Define all columns auto size check box.
- Save your Job and press F6 to execute it.
  The tAddressRowCloud component uploads data to the cloud, retrieves the corrected data and writes the result in the output file.
- Right-click the output component and select Data Viewer to display the formatted address data.
  tAddressRowCloud matches input address data against the Loqate repository.
  The all_info and Geo_info columns retrieve additional address information from the Raw_Response and GeoAccuracy columns respectively in the Loqate repository. The Raw_Response column provides you with all address information from the provider repository without any formatting. If you want this information to be more readable, you must parse it using a JSON or XML parser.
  The STATUS output column returns the OK status for all address rows. This means that the component completed the verification process successfully for all address rows. For further information about process status, see Process status in tLoqateAddressRow.
  The VerificationLevel output column provides you with a verification status of the processed addresses. For further information, see Verification status.
  The AddressVerificationCode output column returns a verification code for each of the processed address rows. For example, the first verification code V44-I45-P3-100 means:
  - Verification status = V (verified): a complete match was made between the input address and a single record from the available reference data.
  - Post-processed verification match level = 4 (premises): the level to which the input data matches the available reference data once all changes and additions performed during the verification process have been taken into account.
  - Pre-processed verification match level = 4 (premises): the level to which the input data matches the available reference data prior to any changes or additions performed during the verification process.
  - Parsing status = I (identified and parsed): all components of the input data have been identified and placed into output fields.
  - Lexicon identification match level = 4 (premises): using pattern matching, a numeric value or word has been identified as a premises number or name.
  - Context identification match level = 5 (delivery point, PostBox or SubBuilding): a numeric value or word has been identified as a post box number or sub building name.
  - Postcode Status = P3 (added): the primary postal code for the country has been added.
  - Match score = 100 (complete similarity): the input data and closest reference data match completely.
  For further information about what values this code is made up of and the implications of each segment, see Address verification codes in tLoqateAddressRow.
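The segment breakdown above can be expressed as a small parser. The following Python sketch splits a code such as V44-I45-P3-100 into the eight values described in the list. The regular expression is inferred from that single example, and the names `VerificationCode` and `parse_verification_code` are hypothetical. The sketch does not interpret the values, since the full set of possible letters and levels is documented in tLoqateAddressRow rather than here.

```python
import re
from typing import NamedTuple


class VerificationCode(NamedTuple):
    verification_status: str   # "V" in the example (verified)
    post_match_level: int      # post-processed verification match level
    pre_match_level: int       # pre-processed verification match level
    parsing_status: str        # "I" in the example (identified and parsed)
    lexicon_match_level: int   # lexicon identification match level
    context_match_level: int   # context identification match level
    postcode_status: str       # "P3" in the example (added)
    match_score: int           # 100 in the example (complete similarity)


# Pattern inferred from the example code only; treat it as an assumption.
_AVC_PATTERN = re.compile(r"([A-Z])(\d)(\d)-([A-Z])(\d)(\d)-([A-Z]\d)-(\d{1,3})")


def parse_verification_code(code: str) -> VerificationCode:
    """Split an address verification code into its eight segments."""
    match = _AVC_PATTERN.fullmatch(code.strip())
    if match is None:
        raise ValueError(f"Unrecognized verification code: {code!r}")
    status, post, pre, parsing, lexicon, context, postcode, score = match.groups()
    return VerificationCode(
        status, int(post), int(pre), parsing, int(lexicon), int(context), postcode, int(score)
    )


if __name__ == "__main__":
    print(parse_verification_code("V44-I45-P3-100"))
```

Returning a named tuple keeps each segment addressable by name, which is convenient when filtering rows, for example on `match_score` or `verification_status`.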
Parsing addresses against MelissaData
You can run the Parsing addresses against reference data in the Cloud Job against the
MelissaData repository by doing the following:
- In the tFixedFlowInput Basic settings, create the schema through the Edit Schema button.
  In the dialog box that opens, click the [+] button and add one column that will hold the information in the input address, in this example: address.
- Click OK.
  An address column is created in the Inline Table.
- In the Inline Table, enter the address data you want to analyze, for example:
    "1211 AVENUE OF AMERICAS FL 8 10036 NEW YORK USA"
    "B69 2lt 9kings United Kingdom ave"
    "1729号 黄兴路 China, 200433"
    "15 Rue Nelaton Paris PARIS 92800 France"
    "1211 AVENUE OF AMERICAS FL 8 10036 NEW YORK"
- In the basic settings of tAddressRowCloud, select MelissaData from the Address Provider list.
- In the License/API key field, enter the license key provided by MelissaData.
- In the Mapping table, click the [+] button to add a line and then select Address.
  The component will map the values of this field to the input column you set in this table.
- If required, select the Use Additional Output check box and use the Output Mapping table to retrieve additional address information from the provider repository.
  For further information, see Defining additional address fields.
- Leave the parameters in the Advanced settings view unchanged.
- Save your Job and press F6 to execute it.
  The tAddressRowCloud component uploads data to the cloud, retrieves the corrected data and writes the result in the output file.
- Right-click the output component and select Data Viewer to display the formatted address data.
  tAddressRowCloud matches input address data against the MelissaData data repository and writes formatted addresses in the output file.
  The AddressVerificationCode output column returns a verification code for each of the processed address rows. These codes are written in comma-delimited lists. Each code consists of two letters followed by two numbers. These codes indicate different statuses and errors. For example, the AC02 code means that the state name is corrected based on the combination of city name and zip code.
  For a complete list of the meaning of the result codes and for further information about all the output columns, see the Address Object Reference Guide you can download from the Support Center of MelissaData at http://www.melissadata.com/.
  The VerificationLevel output column provides you with a verification status of the processed addresses. For further information, see Address verification levels in tAddressRowCloud.
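Because the MelissaData result codes arrive as a comma-delimited list of four-character codes (two letters followed by two digits), they are easy to split and validate downstream. The Python sketch below is a minimal illustration under stated assumptions: the function names are hypothetical, only the AC02 description comes from this page, and the second code in the sample input is there only to show the format.

```python
import re

# Format stated above: two letters followed by two numbers.
_RESULT_CODE = re.compile(r"[A-Z]{2}\d{2}")

# Only AC02 is described on this page; look up other codes in the
# provider's reference guide.
KNOWN_CODES = {
    "AC02": "State name corrected based on the combination of city name and zip code",
}


def split_result_codes(value: str) -> list:
    """Split a comma-delimited result code string and check each code's format."""
    codes = [part.strip() for part in value.split(",") if part.strip()]
    for code in codes:
        if not _RESULT_CODE.fullmatch(code):
            raise ValueError(f"Unexpected result code format: {code!r}")
    return codes


def describe(codes):
    """Map each code to a description, when one is known."""
    return {code: KNOWN_CODES.get(code, "see the provider reference guide") for code in codes}


if __name__ == "__main__":
    # Sample input: the second code only illustrates the format.
    print(describe(split_result_codes("AC02, AS01")))
```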
Parsing addresses against Google
You can run the Parsing addresses against reference data in the Cloud Job against the
Google Places API by doing the following:
- In the tFixedFlowInput Basic settings, create the schema through the Edit Schema button.
  In the dialog box that opens, click the [+] button and add one column that will hold the information in the input address, in this example: address.
- Click OK.
  An address column is created in the Inline Table.
- In the Inline Table, enter the address data you want to analyze, for example:
    "1211 AVENUE OF AMERICAS FL 8 10036 NEW YORK USA"
    "B69 2lt 9kings United Kingdom ave"
    "1729号 黄兴路 China, 200433"
    "15 Rue Nelaton Paris PARIS 92800 France"
    "1211 AVENUE OF AMERICAS FL 8 10036 NEW YORK"
    "1 Rue de l'Abbaye, Paris"
    "1 Chemin de l'Abbaye, Paris"
    "1 Place de l'Abbaye basset, Paris"
    "8000 Cummings Hall,Hanover,New Hampshire,03755,"
- In the basic settings of tAddressRowCloud, select Google from the Address Provider list.
- In the License/API key field, enter the API key you generate from the Google Developer Console at https://developers.google.com/console/help/new/.
- In the Mapping table, click the [+] button to add a line and then select Address.
  The component will map the values of this field to the input column you set in this table.
- If required, select the Use Additional Output check box and use the Output Mapping table to retrieve additional address information from the provider repository.
  For further information, see Defining additional address fields.
- In the Advanced settings view, set Output Script to FRENCH and leave the other parameters unchanged.
- Save your Job and press F6 to execute it.
  The tAddressRowCloud component uploads data to the cloud, retrieves the corrected data and writes the result in the output file.
- Right-click the output component and select Data Viewer to display the formatted address data.
  tAddressRowCloud matches input address data against the Google Places API and writes formatted addresses in the output file.
  The VerificationLevel output column provides you with a verification status of the processed addresses. For further information, see Address verification levels in tAddressRowCloud.
Parsing addresses against QAS
You can run the Parsing addresses against reference data in the Cloud Job using the
QAS Pro OnDemand service and verify the accuracy and completeness of
addresses.
- In the tFixedFlowInput Basic settings, create the schema through the Edit Schema button.
  In the dialog box that opens, click the [+] button and add one column that will hold the information in the input address, in this example: address.
- Click OK.
  An address column is created in the Inline Table.
- In the Inline Table, enter the address data you want to analyze, for example:
    "1 nonsense st, nowhereville, SC,11111"
    "14 elmwood,rome,ga,30161"
    "300 n quincy pl, charlestown,MA,02129"
    "reba st,pelion,SC,29123"
    "1445 montebello st,montebello,90640"
    "43400 gadsden ave,lancaster,ca,93534"
    "po box 123,san francisco,ca,94104"
    "43400 gadsden ave apt 3,lancaster,ca,93534"
- In the basic settings of tAddressRowCloud, select QAS from the Address Provider list.
- From the Country list, select the country corresponding to your input addresses, United States in this example.
- In the QAS OnDemand username and Password fields, enter respectively the username and password you can find in the license provided by QAS.
- In the Mapping table, click the [+] button to add a line and then select Address.
  The component will map the values of this field to the input column you set in this table.
- Leave the parameters in the Advanced settings view unchanged.
- Save your Job and press F6 to execute it.
  The tAddressRowCloud component uploads data to the cloud, validates and retrieves the corrected data and writes the result in the output file.
- Right-click the output component and select Data Viewer to display the formatted address data.
  tAddressRowCloud validates input address data against QAS Pro OnDemand and writes formatted addresses in the output file.
  The VerificationLevel output column provides you with a verification status of the processed addresses. For further information, see Address verification levels in tAddressRowCloud.