Warning
This component will be available in the Palette of
Talend Studio on the condition that you have subscribed to one of
the Talend Platform products.
Component family |
Data Quality |
|
Function |
tAddressRowCloud verifies and |
|
Purpose |
tAddressRowCloud enables you to Address data is corrected against the latest online reference data Warning: Each data row needs one or several calls to the web service of |
|
Basic settings |
Schema |
A schema is a row description. It defines the number of fields to be processed and passed on Since version 5.6, both the Built-In mode and the Repository mode are |
|
|
Built-In: You create and store the schema locally for this |
|
|
Repository: You have already created the schema and |
|
Edit Schema |
Click the […] button and define The output schema of tAddressRowCloud proposes several read-only address Also some of the output columns could be empty depending on what |
|
Address Provider |
Select from the list the provider of the reference data against The list of address providers includes Google, Loqate, QAS and |
|
License/API key |
Enter the license or the API key provided by the address provider When you select Google as a provider, the component uses the |
Only Loqate |
Processing Mode |
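This option is applied only to the Loqate provider. Select from the list the mode of address validation you want to use: –Verify and Geocode (selected by default): standardizes and corrects addresses and enriches them with latitude and longitude. Note: Combining address verification and geocoding will cost extra credits. –Verify only: with this mode, the component standardizes and corrects addresses without enriching them with latitude and longitude. |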
Only QAS |
Country |
This option is applied only to the QAS provider. Select from the list the country corresponding to your input When you select QAS as a provider, the component uses the QAS Pro |
Only QAS |
QAS OnDemand username |
This option is applied only to the QAS provider. Enter the username you can find in the license provided by You can check your username from the QAS OnDemand portal at https://ondemand.qas.com/index.htm. |
Only QAS |
Password |
This option is applied only to the QAS provider. Enter the password you can find in the license provided by You can check your password from the QAS OnDemand portal at https://ondemand.qas.com/index.htm. |
|
Use security mode to connect |
Select this check box to connect to the Cloud in a secure mode. This check box is not available with all address providers. |
|
Mapping |
Address field: add lines to the tAddressRowCloud provides a long Input Column: add lines to the |
|
Use Additional Output |
This option is not available for the QAS provider. Select this check box and use the Output Address field: add lines to the These predefined address fields vary according to the provider you Output Column: select from the tAddressRowCloud maps the values If you select to have an output column in the Output Address table that has the exact name |
Advanced settings |
Fields in this view will vary according to the address provider –Address Line Separator: define If you keep the default option, Default in this field, the component uses the line –Default Country: select the –Forced Country: select the –Output Script: select the The script list differs according to the address provider you When the address provider is Loqate or MelissaData: If you keep the default option, Not Select Latin to encode the Select Native/Match input to The Native/Match input script
–Minimum match score: set the This option is very helpful when you want to get, in the output –Minimum interval between two queries –Limit of retrying the same query in case it –Interval between two retries of the same –Delay before forcing the termination of the |
|
tStatCatcher |
Select this check box to collect log data at the component |
|
Global Variables |
ERROR_MESSAGE: the error message generated by the A Flow variable functions during the execution of a component while an After variable To fill up a field or expression with a variable, press Ctrl + For further information about variables, see Talend Studio |
|
Usage |
This component is an intermediary step. It requires an input and |
|
Limitation |
n/a |
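The query-related options in the Advanced settings (a limit on retrying the same query and an interval between two retries) suggest a retry loop around each web service call. The sketch below is only an illustration in Python of how such settings could interact. It is not the component's code: `call_with_retries`, its parameters and the use of `ConnectionError` are hypothetical, and the exact semantics of each option are assumptions.

```python
import time
from typing import Callable


def call_with_retries(
    query: Callable[[], str],
    max_retries: int = 3,            # assumed meaning of the retry limit option
    retry_interval_s: float = 1.0,   # assumed meaning of the retry interval option
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Run query(); after a failure, wait and retry up to max_retries times."""
    attempts = 0
    while True:
        try:
            return query()
        except ConnectionError:
            attempts += 1
            if attempts > max_retries:
                raise  # give up once the retry limit is exceeded
            sleep(retry_interval_s)


# Simulated web service call that fails twice before answering.
calls = {"n": 0}


def flaky_query() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("simulated failure")
    return "ok"


result = call_with_retries(flaky_query, max_retries=3, retry_interval_s=0.0)
print(result, calls["n"])
```

With a retry limit of 3, the simulated query succeeds on its third call. A query that keeps failing is attempted once plus `max_retries` times before the error is raised.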
The tAddressRowCloud component outputs a VerificationLevel column. This column lists the address verification levels defined by Talend.
The providers supported by the component (Loqate, MelissaData and so on) have different verification levels, because these providers use different databases and different algorithms to verify addresses. The address verification results of the providers are mapped to Talend verification levels.
The table below describes the verification levels that are output by the component.
Verification levels | Description |
---|---|
Verified | A complete match is made between the input data and a single record from the available reference data. |
Partially Verified | A partial match is made between the input data and a single record |
Unverified | Unable to verify the address. Output fields will contain input |
Ambiguous | More than one close reference data match is found. |
Conflict | More than one close reference data match is found with conflicting |
Reverted | The record cannot be verified with a minimum acceptable level. |
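Downstream of the component, the VerificationLevel column can be used to route rows. The following Python sketch models the six levels from the table above and flags the rows that a reviewer might want to inspect. It is an illustration only: the `ACCEPTED` policy, the `needs_review` helper and the sample rows are hypothetical, and it assumes that the column contains the level names exactly as spelled in the table.

```python
from enum import Enum


class VerificationLevel(Enum):
    """The six verification levels listed in the table."""
    VERIFIED = "Verified"
    PARTIALLY_VERIFIED = "Partially Verified"
    UNVERIFIED = "Unverified"
    AMBIGUOUS = "Ambiguous"
    CONFLICT = "Conflict"
    REVERTED = "Reverted"


# Hypothetical policy: levels that a downstream step accepts without review.
ACCEPTED = {VerificationLevel.VERIFIED, VerificationLevel.PARTIALLY_VERIFIED}


def needs_review(level_text: str) -> bool:
    """Return True when a row should be routed to manual review."""
    level = VerificationLevel(level_text.strip())  # raises ValueError if unknown
    return level not in ACCEPTED


# Sample rows (made up for the illustration).
rows = [
    {"address": "row 1", "VerificationLevel": "Verified"},
    {"address": "row 2", "VerificationLevel": "Ambiguous"},
    {"address": "row 3", "VerificationLevel": "Reverted"},
]
to_review = [r["address"] for r in rows if needs_review(r["VerificationLevel"])]
print(to_review)  # ['row 2', 'row 3']
```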
This scenario describes a three-component Job that:
- uses the tFixedFlowInput component to generate the address data to be analyzed,
- uses the tAddressRowCloud component to parse, standardize and format the addresses in the Cloud through the Address Validation API,
- uses a tFileOutputExcel component to output the correctly formatted addresses in an .xls file.
You must have an internet connection to use tAddressRowCloud.
- Drop the following components from the Palette onto the design workspace: tFixedFlowInput, tAddressRowCloud and tFileOutputExcel.
- Connect the three components together using the Main links.
- Double-click tFixedFlowInput to open its Basic settings view in the Component tab.
- Create the schema through the Edit Schema button.
  In the open dialog box, click the [+] button and add the columns that will hold the information in the input address, in this example: Address and Country.
- Click OK.
  An address column and a country column are created in the Inline Table.
- In the Number of rows field, set the number of rows to 1.
- In the Mode area, select the Use Inline Table option.
- In the Content table, enter the address data you want to analyze, for example:
  "1 Chemin de l'Abbaye, Paris"
  "1 Rue de l'Abbaye, Paris"
  "1 Place de l'Abbaye basset, Paris"
  Set the country for the three address lines to FRA.
Setting the schema and defining address mapping
- Double-click tAddressRowCloud to display the Basic settings view and define the component properties.
- If required, click Sync columns to retrieve the schema defined in the input component.
- Click the Edit schema button to open the schema dialog box.
  tAddressRowCloud proposes several predefined read-only address columns.
  The STATUS column returns the status of processing input addresses. For further information about process status, see Process status in tLoqateAddressRow.
  The AddressVerificationCode column returns the verification code for the processed address. For further information about what values this code is made up of and the implications of each segment, see Address verification codes in tLoqateAddressRow.
- Move any of the input columns to the output schema according to your needs, click OK and accept to propagate the changes.
  You can also add columns directly in the output schema to retrieve additional address information from the Loqate repository.
- Select from the Address Provider list the provider of the reference data against which you want to validate and format input addresses, Loqate in this example.
- Select the Use security mode to connect check box to connect to the provider repository in a secure mode.
  This may have a slight impact on performance.
- In the License/API key field, enter the license key provided by Loqate.
- From the Processing Mode list, select:
  Option | To… |
  ---|---|
  Verify and Geocode (selected by default) | standardize and correct addresses and enrich them with latitude and longitude. Note: Combining address verification and geocoding will cost extra credits. For further information, see Cloud Price Card. |
  Verify only | standardize and correct addresses without enriching them with latitude and longitude. |
- In the Mapping table:
  - Use the [+] button to add lines in the table.
  - Click in the Address Field column and select from the list predefined in the component the fields that hold the input address, Address and Country in this example.
    The component will map the values of these fields to the input columns you set in this table.
    tAddressRowCloud provides a list of individual fields because some countries have more complex addressing structures than others.
  - Click in the Input Column column and select from the list of the input schema the columns that hold the input address, address and country in this example.
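The Mapping table pairs each predefined address field with a column of the input schema. As a rough Python illustration of that pairing, and not of how the component implements it, the sketch below applies such a mapping to one input row. The `build_request_fields` helper and the dictionary layout are hypothetical; only the field names (Address, Country), the column names (address, country) and the sample values come from this example.

```python
from typing import Dict

# Mapping table of this example: address field -> input column.
mapping: Dict[str, str] = {"Address": "address", "Country": "country"}


def build_request_fields(row: Dict[str, str], mapping: Dict[str, str]) -> Dict[str, str]:
    """Return the address fields for one input row, keyed by address field name."""
    missing = [col for col in mapping.values() if col not in row]
    if missing:
        raise KeyError(f"input row lacks mapped columns: {missing}")
    return {field: row[column] for field, column in mapping.items()}


# One row of the inline table defined in tFixedFlowInput.
row = {"address": "1 Rue de l'Abbaye, Paris", "country": "FRA"}
fields = build_request_fields(row, mapping)
print(fields)
```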
Defining additional address fields
- If required, select the Use Additional Output check box to retrieve additional address information from the provider repository.
- Click the Edit schema button to open the schema dialog box and add in the output schema the columns which will hold the extra address information.
- In the Output Mapping table:
  - Use the [+] button to add lines in the table.
  - Click in the Address Field column and select from the predefined list the additional address fields you want to add to the output schema.
  - Click in the Output Column column and select from the tAddressRowCloud output schema the columns that will hold the additional address information.
  The component maps the values of the address fields in the Loqate repository to the output columns you set in the table.
- Set the parameters in the Advanced settings view according to your needs.
  The default parameters are not changed for this example.
- Double-click the tFileOutputExcel component to display the Basic settings view and define the component properties.
- Set the destination file name as well as the sheet name and then select the Define all columns auto size check box.
- Save your Job and press F6 to execute it.
  The tAddressRowCloud component uploads data to the cloud, retrieves the corrected data and writes the result in the output file.
- Right-click the output component and select Data Viewer to display the formatted address data.
  tAddressRowCloud matches input address data against the Loqate repository.
  The all_info and Geo_info columns retrieve additional address information from the Raw_Response and GeoAccuracy columns respectively in the Loqate repository. The Raw_Response column provides you with all address information from the provider repository without any formatting. If you want this information to be more readable, you must parse it as JSON or XML.
  The STATUS output column returns the OK status for all address rows. This means that the component completed the verification process successfully for all address rows. For further information about process status, see Process status in tLoqateAddressRow.
  The VerificationLevel output column provides you with a verification status of the processed addresses. For further information, see Verification status.
  The AddressVerificationCode output column returns a verification code for each of the processed address rows. For example, the first verification code, V44-I45-P3-100, means:
  - Verification status = V (verified): a complete match was made between the input address and a single record from the available reference data.
  - Post-processed verification match level = 4 (premises): the level to which the input data matches the available reference data once all changes and additions performed during the verification process have been taken into account.
  - Pre-processed verification match level = 4 (premises): the level to which the input data matches the available reference data prior to any changes or additions performed during the verification process.
  - Parsing status = I (identified and parsed): all components of the input data have been identified and placed into output fields.
  - Lexicon identification match level = 4 (premises): using pattern matching, a numeric value or word has been identified as a premises number or name.
  - Context identification match level = 5 (delivery point, PostBox or SubBuilding): a numeric value or word has been identified as a post box number or sub building name.
  - Postcode Status = P3 (added): the primary postal code for the country has been added.
  - Match score = 100 (complete similarity): the input data and closest reference data match completely.
  For further information about what values this code is made up of and the implications of each segment, see Address verification codes in tLoqateAddressRow.
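To work with these segments programmatically, the code can be split into its parts. The Python sketch below does this for the example code V44-I45-P3-100. It is a minimal illustration under stated assumptions: the regular expression is inferred from this single example and may not cover every code that Loqate returns, and the `VerificationCode` type and `parse_verification_code` function are hypothetical names.

```python
import re
from typing import NamedTuple


class VerificationCode(NamedTuple):
    verification_status: str   # "V" in the example (verified)
    post_processed_level: int  # 4 in the example (premises)
    pre_processed_level: int   # 4 in the example (premises)
    parsing_status: str        # "I" in the example (identified and parsed)
    lexicon_level: int         # 4 in the example (premises)
    context_level: int         # 5 in the example (delivery point, PostBox or SubBuilding)
    postcode_status: str       # "P3" in the example (added)
    match_score: int           # 100 in the example (complete similarity)


# Assumed layout: <letter><digit><digit>-<letter><digit><digit>-<letter><digit>-<score>
_PATTERN = re.compile(r"^([A-Z])(\d)(\d)-([A-Z])(\d)(\d)-([A-Z]\d)-(\d{1,3})$")


def parse_verification_code(code: str) -> VerificationCode:
    """Split a verification code into its named segments."""
    match = _PATTERN.match(code.strip())
    if match is None:
        raise ValueError(f"unexpected verification code: {code!r}")
    v, post, pre, p, lex, ctx, postcode, score = match.groups()
    return VerificationCode(
        v, int(post), int(pre), p, int(lex), int(ctx), postcode, int(score)
    )


parsed = parse_verification_code("V44-I45-P3-100")
print(parsed.verification_status, parsed.match_score)  # V 100
```

A parsed code makes it straightforward to filter rows, for example by keeping only those whose `match_score` reaches a threshold of your choosing.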
You can run the Scenario: Parsing addresses against reference data in the Cloud Job against the MelissaData repository by doing the following:
- In the tFixedFlowInput Basic settings, create the schema through the Edit Schema button.
  In the open dialog box, click the [+] button and add one column that will hold the information in the input address, in this example: address.
- Click OK.
  An address column is created in the Inline Table.
- In the Inline Table, enter the address data you want to analyze, for example:
  "1211 AVENUE OF AMERICAS FL 8 10036 NEW YORK USA"
  "B69 2lt 9kings United Kingdom ave"
  "1729号 黄兴路 China, 200433"
  "15 Rue Nelaton Paris PARIS 92800 France"
  "1211 AVENUE OF AMERICAS FL 8 10036 NEW YORK"
- In the basic settings of tAddressRowCloud, select MelissaData from the Address Provider list.
- In the License/API key field, enter the license key provided by MelissaData.
- In the Mapping table, click the [+] button to add a line and then select Address.
  The component will map the values of this field to the input column you set in this table.
- If required, select the Use Additional Output check box and use the Output Mapping table to retrieve additional address information from the provider repository.
  For further information, see Defining additional address fields.
- Leave the parameters in the Advanced settings view unchanged.
- Save your Job and press F6 to execute it.
  The tAddressRowCloud component uploads data to the cloud, retrieves the corrected data and writes the result in the output file.
- Right-click the output component and select Data Viewer to display the formatted address data.
  tAddressRowCloud matches input address data against the MelissaData data repository and writes formatted addresses in the output file.
  The AddressVerificationCode output column returns a verification code for each of the processed address rows. These codes are written in comma-delimited lists. Each code consists of two letters followed by two digits. These codes indicate different statuses and errors. For example, the AC02 code means that the state name is corrected based on the combination of city name and zip code.
  For a complete list of the meaning of the result codes and for further information about all the output columns, see the Address Object Reference Guide you can download from the Support Center of MelissaData at http://www.melissadata.com/.
  The VerificationLevel output column provides you with a verification status of the processed addresses. For further information, see Address verification levels in tAddressRowCloud.
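Because the MelissaData codes arrive as a comma-delimited list of two-letter, two-digit codes, they are easy to split and validate. The Python sketch below shows one way to do so. It is an illustration only: `split_result_codes` and `describe` are hypothetical helpers, the lookup table contains only the AC02 meaning stated above, and the sample value (including the second code) is made up for the demonstration.

```python
import re
from typing import Dict, List

# Each code is two letters followed by two digits, for example "AC02".
_CODE = re.compile(r"^[A-Z]{2}\d{2}$")

# Only the meaning given on this page; see the vendor guide for the full list.
KNOWN_MEANINGS: Dict[str, str] = {
    "AC02": "state name corrected based on the combination of city name and zip code",
}


def split_result_codes(value: str) -> List[str]:
    """Split a comma-delimited code list and reject malformed entries."""
    codes = [part.strip() for part in value.split(",") if part.strip()]
    bad = [c for c in codes if not _CODE.match(c)]
    if bad:
        raise ValueError(f"malformed result codes: {bad}")
    return codes


def describe(value: str) -> Dict[str, str]:
    """Attach a description to each code, where one is known."""
    return {
        code: KNOWN_MEANINGS.get(code, "see the Address Object Reference Guide")
        for code in split_result_codes(value)
    }


# Sample column value, made up for the illustration.
sample = "AC02, AS01"
print(split_result_codes(sample))  # ['AC02', 'AS01']
print(describe(sample)["AC02"])
```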
You can run the Scenario: Parsing addresses against reference data in the Cloud Job against Google Places API by doing the following:
- In the tFixedFlowInput Basic settings, create the schema through the Edit Schema button.
  In the open dialog box, click the [+] button and add one column that will hold the information in the input address, in this example: address.
- Click OK.
  An address column is created in the Inline Table.
- In the Inline Table, enter the address data you want to analyze, for example:
  "1211 AVENUE OF AMERICAS FL 8 10036 NEW YORK USA"
  "B69 2lt 9kings United Kingdom ave"
  "1729号 黄兴路 China, 200433"
  "15 Rue Nelaton Paris PARIS 92800 France"
  "1211 AVENUE OF AMERICAS FL 8 10036 NEW YORK"
  "1 Rue de l'Abbaye, Paris"
  "1 Chemin de l'Abbaye, Paris"
  "1 Place de l'Abbaye basset, Paris"
  "8000 Cummings Hall,Hanover,New Hampshire,03755,"
- In the basic settings of tAddressRowCloud, select Google from the Address Provider list.
- In the License/API key field, enter the API key you generate from the Google Developer Console at https://developers.google.com/console/help/new/.
- In the Mapping table, click the [+] button to add a line and then select Address.
  Address is the only available field when Google is the address provider. The component will map the values of this field to the input column you set in this table.
- If required, select the Use Additional Output check box and use the Output Mapping table to retrieve additional address information from the provider repository.
  For further information, see Defining additional address fields.
- In the Advanced settings view, set Output Script to FRENCH and leave the other parameters unchanged.
- Save your Job and press F6 to execute it.
  The tAddressRowCloud component uploads data to the cloud, retrieves the corrected data and writes the result in the output file.
- Right-click the output component and select Data Viewer to display the formatted address data.
  tAddressRowCloud matches input address data against Google Places API and writes formatted addresses in the output file.
  The VerificationLevel output column provides you with a verification status of the processed addresses. For further information, see Address verification levels in tAddressRowCloud.
You can run the Scenario: Parsing addresses against reference data in the Cloud Job using the QAS Pro OnDemand service to verify the accuracy and completeness of addresses.
- In the tFixedFlowInput Basic settings, create the schema through the Edit Schema button.
  In the open dialog box, click the [+] button and add one column that will hold the information in the input address, in this example: address.
- Click OK.
  An address column is created in the Inline Table.
- In the Inline Table, enter the address data you want to analyze, for example:
  "1 nonsense st, nowhereville, SC,11111"
  "14 elmwood,rome,ga,30161"
  "300 n quincy pl, charlestown,MA,02129"
  "reba st,pelion,SC,29123"
  "1445 montebello st,montebello,90640"
  "43400 gadsden ave,lancaster,ca,93534"
  "po box 123,san francisco,ca,94104"
  "43400 gadsden ave apt 3,lancaster,ca,93534"
- In the basic settings of tAddressRowCloud, select QAS from the Address Provider list.
- From the Country list, select the country corresponding to your input addresses, United States in this example.
- In the QAS OnDemand username and Password fields, enter the username and password that you can find in the license provided by QAS.
- In the Mapping table, click the [+] button to add a line and then select Address.
  The component will map the values of this field to the input column you set in this table.
- Leave the parameters in the Advanced settings view unchanged.
- Save your Job and press F6 to execute it.
  The tAddressRowCloud component uploads data to the cloud, validates it, retrieves the corrected data and writes the result in the output file.
- Right-click the output component and select Data Viewer to display the formatted address data.
  tAddressRowCloud validates input address data against QAS Pro OnDemand and writes formatted addresses in the output file.
  The VerificationLevel output column provides you with a verification status of the processed addresses. For further information, see Address verification levels in tAddressRowCloud.