tLoqateAddressRow
Parses, verifies, cleanses, standardizes, transliterates, and formats international
addresses.
tLoqateAddressRow enables you to parse structured
or unstructured text into labeled addresses, automatically placing each address component in
the correct address field. You can compare address data against reference data to ensure
that it is accurate and complete. You can correct spellings, add missing address data
such as city, city area, region or postcode, and enrich addresses with other elements such
as latitude, longitude and other relevant data.
This address management component is the result of Talend's collaboration with Loqate, one of the world leaders in high-quality,
accurate location information.
For more information about the company and its software tools, visit
http://www.loqate.com/.
This component uses the Loqate Global Knowledge Repository containing definitive
address and geographic reference data for over 240 countries in multiple languages
and character sets.
tLoqateAddressRow uses the
Q2.2 2016 release.
Address fields in tLoqateAddressRow
Some countries have more complex addressing structures than others. As such, the use of
individual fields in this component will vary based on the input country and the available
reference data.
The table below lists all input and output fields in tLoqateAddressRow. [in] designates a field that can be used on input,
[out] designates a field that may be present on output, and [in,out] designates a field
that can be used for both input and output.
Field name |
Description |
---|---|
|
used to specify the full mailing address in the relevant country. |
|
used to specify input data for the address line in the relevant country, split |
|
used to specify the full address including line breaks without the |
|
used to specify the individual lines contained within the |
|
used to provide the country name or code. |
|
used to provide the ISO 3166 official country name. |
|
used to provide the ISO 3166 2-character country code. |
|
used to provide the ISO 3166 3-character country code. |
|
used to provide the ISO 3166 3-digit numeric country code. |
|
used to provide the largest geographic data element within a country. |
|
used to provide the most common geographic data element within a country. For |
|
used to provide the smallest geographic data element within a country. For |
|
used to provide the most common population center data element within a country. |
|
used to provide a smaller population center data element, dependent on the |
|
used to provide the smallest population center data element, dependent on both |
|
used to provide the most common street or block data element within a country. |
|
used to provide the dependent street or block data element within a country. For |
|
used to provide the descriptive name identifying an individual location, if such |
|
used to provide the alphanumeric code identifying an individual location, if |
|
used to provide the secondary identifiers for a particular delivery point. For |
|
used to provide the complete postal code for a particular delivery point, if |
|
used to provide the primary postal code used for a particular country. For |
|
used to provide secondary postal code information, if used in a particular |
|
used to provide the business name associated with a particular delivery point, |
|
used to provide the post box for a particular delivery point, if it |
|
used to list any words that could not be matched to a particular address |
|
used to provide the WGS 84 latitude in decimal degrees format. |
|
used to provide the WGS 84 longitude in decimal degrees format. |
|
used to provide the GeoAccuracy code. For further information, see GeoAccuracy Code. |
|
used to provide the radius of accuracy in meters, giving an indication of the |
GeoAccuracy Code
The GeoAccuracy code is made up of the following values:
- The geocoding status.
- The geocoding level.
For example, the P3 GeoAccuracy code implies:
- P: a single geocode was found matching the input address.
- 3: the geocode level is Thoroughfare.
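The two-part structure described above can be split programmatically. The following Java class is a minimal sketch, assuming a GeoAccuracy code is always one status letter followed by one level digit, as in the P3 example; the class name GeoAccuracyCode is hypothetical and is not part of tLoqateAddressRow or of any Loqate API.

```java
/**
 * Hypothetical helper: splits a GeoAccuracy code such as "P3" into the
 * geocoding status (a letter) and the geocoding level (a digit).
 * Assumes the code is exactly one letter followed by one ASCII digit.
 */
public final class GeoAccuracyCode {
    public final char status; // geocoding status, e.g. 'P' = single geocode found
    public final int level;   // geocoding level, e.g. 3 = thoroughfare

    private GeoAccuracyCode(char status, int level) {
        this.status = status;
        this.level = level;
    }

    /** Parses "<letter><digit>" and rejects any other shape. */
    public static GeoAccuracyCode parse(String code) {
        if (code == null || code.length() != 2) {
            throw new IllegalArgumentException("Unexpected GeoAccuracy code: " + code);
        }
        char s = code.charAt(0);
        char d = code.charAt(1);
        if (!Character.isLetter(s) || d < '0' || d > '9') {
            throw new IllegalArgumentException("Unexpected GeoAccuracy code: " + code);
        }
        return new GeoAccuracyCode(s, d - '0');
    }

    public static void main(String[] args) {
        GeoAccuracyCode c = parse("P3");
        System.out.println("status=" + c.status + ", level=" + c.level); // status=P, level=3
    }
}
```

Mapping the letter and digit to their meanings is then a lookup against the geocoding status and level tables.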
The tables below describe the geocoding status and level in
detail.
Geocoding status |
Description |
---|---|
|
a single geocode was found matching the input address. |
|
a geocode was able to be interpolated from the input |
|
multiple candidate geocodes were found to match the input |
|
a geocode was not able to be generated for the input |
Geocoding level |
Description |
---|---|
|
delivery point (PostBox or SubBuilding). |
|
premises (Premises or Building). |
|
thoroughfare. |
|
locality. |
|
administrative area. |
|
none. |
Address verification codes in tLoqateAddressRow
The tLoqateAddressRow component
outputs an ACCURACYCODE
column. This column holds the
verification codes for processed addresses.
The verification code is made up of the following values:
Verification code segment |
Example value (V44-I44-P3-100) |
---|---|
The verification status |
V |
The post-processed verification match level |
4 |
The pre-processed verification match level |
4 |
The parsing status |
I |
The lexicon identification match level |
4 |
The context identification match level |
4 |
The postcode status |
P3 |
The match score |
100 |
For example, the V44-I44-P3-100 verification code implies:
- Verification status = V (verified): a complete match was made between the input address and a single record from the available reference data.
- Post-processed verification match level = 4 (premises): the level to which the input data matches the available reference data once all changes and additions performed during the verification process have been taken into account.
- Pre-processed verification match level = 4 (premises): the level to which the input data matches the available reference data prior to any changes or additions performed during the verification process.
- Parsing status = I (identified and parsed): all components of the input data have been identified and placed into output fields.
- Lexicon identification match level = 4 (premises): using pattern matching, a numeric value or word has been identified as a premises number or name.
- Context identification match level = 4 (premises): using the least accurate form of matching, a numeric value or word has been identified as a premises number or name.
- Postcode status = P3 (added): the primary postal code for the country has been added.
- Match score = 100 (complete similarity): the input data and closest reference data match completely.
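The eight segments listed above can be pulled apart with a regular expression. The following Java class is a minimal sketch, assuming the layout seen in the example codes on this page: a status letter and two level digits, a dash, a parsing letter and two level digits, a dash, a two-character postcode status, a dash, and a match score from 0 to 100. The class name VerificationCode is hypothetical, and codes from other Loqate releases are not checked here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical parser for ACCURACYCODE values such as "V44-I44-P3-100".
 * The segment layout is inferred from the examples in this documentation.
 */
public final class VerificationCode {
    private static final Pattern FORMAT = Pattern.compile(
            "([A-Z])([0-9])([0-9])-([A-Z])([0-9])([0-9])-([A-Z][0-9])-([0-9]{1,3})");

    public final char verificationStatus; // e.g. 'V' = verified
    public final int postProcessedLevel;  // level after verification changes
    public final int preProcessedLevel;   // level before verification changes
    public final char parsingStatus;      // e.g. 'I' = identified and parsed
    public final int lexiconLevel;        // lexicon identification match level
    public final int contextLevel;        // context identification match level
    public final String postcodeStatus;   // e.g. "P3"
    public final int matchScore;          // 0..100

    private VerificationCode(Matcher m) {
        verificationStatus = m.group(1).charAt(0);
        postProcessedLevel = Integer.parseInt(m.group(2));
        preProcessedLevel = Integer.parseInt(m.group(3));
        parsingStatus = m.group(4).charAt(0);
        lexiconLevel = Integer.parseInt(m.group(5));
        contextLevel = Integer.parseInt(m.group(6));
        postcodeStatus = m.group(7);
        matchScore = Integer.parseInt(m.group(8));
    }

    /** Parses a full code; throws IllegalArgumentException on any other shape. */
    public static VerificationCode parse(String code) {
        Matcher m = FORMAT.matcher(code == null ? "" : code.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Unexpected verification code: " + code);
        }
        VerificationCode vc = new VerificationCode(m);
        if (vc.matchScore > 100) {
            throw new IllegalArgumentException("Match score out of range: " + code);
        }
        return vc;
    }

    public static void main(String[] args) {
        VerificationCode vc = parse("V44-I44-P3-100");
        // Prints: V 4 4 I 4 4 P3 100
        System.out.println(vc.verificationStatus + " " + vc.postProcessedLevel + " "
                + vc.preProcessedLevel + " " + vc.parsingStatus + " " + vc.lexiconLevel + " "
                + vc.contextLevel + " " + vc.postcodeStatus + " " + vc.matchScore);
    }
}
```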
The following sections explain each segment of the
verification code in more detail.
Verification status
The verification status can be one of the following:
Status |
Description |
---|---|
|
the address was parsed and an exact match in the reference |
|
the reference data has more detail than the input data for |
|
the input data could not be parsed. The output fields will |
|
more than one item in the reference data matches the input |
|
individual address components are valid, but the address is |
|
the address was parsed and verified but a minimum |
Post-processed verification match level
The post-processed verification match level gives the level to which the
input data matches the available reference data once all changes and additions performed
during the verification process have been taken into account.
Match level |
Description |
---|---|
|
delivery point (PostBox or SubBuilding). |
|
premises (Premises or Building). |
|
thoroughfare. |
|
locality. |
|
administrative area. |
|
none. |
Pre-processed verification match level
The pre-processed verification match level gives the level to which the
input data matches the available reference data prior to any changes or additions
performed during the verification process.
Match level |
Description |
---|---|
|
delivery point (PostBox or SubBuilding). |
|
premises (Premises or Building). |
|
thoroughfare. |
|
locality. |
|
administrative area. |
|
none. |
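Comparing the two match levels shows whether the changes and additions made during verification raised the match level. The following Java sketch assumes, as in the examples on this page, that the post-processed and pre-processed levels are the second and third characters of ACCURACYCODE and that a larger digit means a finer level (5 = delivery point, 4 = premises, 3 = thoroughfare). The class name MatchLevelDelta is hypothetical.

```java
/**
 * Hypothetical helper: returns post-processed minus pre-processed match level
 * for an ACCURACYCODE such as "V44-I44-P3-100". A positive result suggests
 * that verification improved the match level.
 */
public final class MatchLevelDelta {
    public static int levelGain(String accuracyCode) {
        if (accuracyCode == null || accuracyCode.length() < 3) {
            throw new IllegalArgumentException("Code too short: " + accuracyCode);
        }
        int post = digit(accuracyCode.charAt(1)); // post-processed level
        int pre = digit(accuracyCode.charAt(2));  // pre-processed level
        return post - pre;
    }

    private static int digit(char ch) {
        if (ch < '0' || ch > '9') {
            throw new IllegalArgumentException("Expected a digit, got: " + ch);
        }
        return ch - '0';
    }

    public static void main(String[] args) {
        System.out.println(levelGain("V44-I44-P3-100")); // 0: level unchanged by verification
    }
}
```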
Parsing status
The parsing status can be one of the following:
- I (identified and parsed): all input data was identified and placed into different address fields.
- U (unable to parse): not all input data was identified and parsed.
Lexicon identification match level
The lexicon identification match level gives the level to which the input
data has some recognized form, through the use of:
- pattern matching, for example a numeric value could be a premises number, and
- lexicon matching, for example rd could be a Thoroughfare type (road) and London could be a Locality.
Match level |
Description |
---|---|
|
delivery point (PostBox or SubBuilding). |
|
premises (Premises or Building). |
|
thoroughfare. |
|
locality. |
|
administrative area. |
|
none. |
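To make the two techniques concrete, the following Java class is a toy tagger: a purely numeric token is tagged as a possible premises number (pattern matching) and known words such as rd or London are looked up in a tiny lexicon (lexicon matching). The class name, tag names and lexicon entries are hypothetical; this only illustrates the idea and is not how Loqate implements lexicon identification.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Toy illustration of pattern matching and lexicon matching on address tokens. */
public final class LexiconTagger {
    // Hypothetical two-entry lexicon, keyed by lower-case token
    private static final Map<String, String> LEXICON = new HashMap<>();
    static {
        LEXICON.put("rd", "ThoroughfareType"); // "rd" -> road
        LEXICON.put("london", "Locality");
    }

    /** Tags each whitespace-separated token; "?" marks a candidate, not a certainty. */
    public static List<String> tag(String input) {
        List<String> tags = new ArrayList<>();
        for (String token : input.trim().split("\\s+")) {
            String key = token.toLowerCase(Locale.ROOT);
            if (token.matches("[0-9]+")) {
                tags.add(token + "=Premises?");                  // pattern matching
            } else if (LEXICON.containsKey(key)) {
                tags.add(token + "=" + LEXICON.get(key) + "?");  // lexicon matching
            } else {
                tags.add(token + "=?");                          // not recognized
            }
        }
        return tags;
    }

    public static void main(String[] args) {
        System.out.println(tag("10 Station rd London"));
        // [10=Premises?, Station=?, rd=ThoroughfareType?, London=Locality?]
    }
}
```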
Context identification match level
The context identification match level gives the level to which the input data can be
recognized based on the context in which it appears.
This is the least accurate form of matching and is based on identifying a word as, for
instance, a Thoroughfare based on it being preceded by something that could be a Premise
and followed by something that could be a Locality, the latter items being identified
through a match against the reference data or the lexicon.
Match level |
Description |
---|---|
|
delivery point (PostBox or SubBuilding). |
|
premises (Premise or Building). |
|
thoroughfare. |
|
locality. |
|
administrative area. |
|
none. |
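The context rule described above (a word between a possible Premise and a possible Locality is probably a Thoroughfare) can be sketched in Java as follows. This is a toy example, assuming a numeric token stands for a possible Premise and a small hard-coded set stands in for the reference data or lexicon; the class name and the locality set are hypothetical, and this is not Loqate's algorithm.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Toy illustration of context identification for a Thoroughfare. */
public final class ContextGuesser {
    // Hypothetical stand-in for localities found in reference data or the lexicon
    private static final Set<String> KNOWN_LOCALITIES =
            new HashSet<>(Arrays.asList("london", "boise"));

    /** Returns the index of the token guessed to be a Thoroughfare, or -1 if none. */
    public static int guessThoroughfare(String[] tokens) {
        for (int i = 1; i + 1 < tokens.length; i++) {
            boolean premiseBefore = tokens[i - 1].matches("[0-9]+");
            boolean localityAfter =
                    KNOWN_LOCALITIES.contains(tokens[i + 1].toLowerCase(Locale.ROOT));
            if (premiseBefore && localityAfter) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String[] tokens = {"280", "Capitol", "Boise"};
        System.out.println(guessThoroughfare(tokens)); // 1, i.e. "Capitol"
    }
}
```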
Postcode status
The postal code status can be one of the following values:
Status |
Description |
---|---|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Match score
The match score gives the similarity between the input data and closest reference data
match as a percentage between 0 and 100. 100% means complete similarity.
Process status in tLoqateAddressRow
The tLoqateAddressRow component outputs a
STATUS
column. This column holds the processing status of each input address, which can
be one of the following:
Status |
Description |
---|---|
|
the process completed normally. The score must be examined |
|
an exception occurred during processing input records, |
|
the process could not be completed because the server has |
|
the input record contains invalid data, often due to the |
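Because a normally completed process still requires the score to be examined, a downstream step might check both read-only columns. The following Java method is a minimal sketch, assuming that psOK (the status value shown in the scenario below) denotes normal completion and that the match score is the last dash-separated segment of ACCURACYCODE; the class name RowFilter and the threshold value are hypothetical.

```java
/**
 * Hypothetical row check: accept a row only when STATUS is "psOK" and the
 * match score at the end of ACCURACYCODE reaches minScore.
 */
public final class RowFilter {
    public static boolean accept(String status, String accuracyCode, int minScore) {
        if (!"psOK".equals(status) || accuracyCode == null) {
            return false;
        }
        int dash = accuracyCode.lastIndexOf('-');
        if (dash < 0 || dash == accuracyCode.length() - 1) {
            return false; // no score segment
        }
        try {
            int score = Integer.parseInt(accuracyCode.substring(dash + 1).trim());
            return score >= minScore;
        } catch (NumberFormatException e) {
            return false; // unexpected format: treat the row as not acceptable
        }
    }

    public static void main(String[] args) {
        System.out.println(accept("psOK", "V44-I45-P7-100", 90)); // true
    }
}
```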
tLoqateAddressRow Standard properties
These properties are used to configure tLoqateAddressRow running in the Standard Job framework.
The Standard
tLoqateAddressRow component belongs to the Data Quality family.
This component is available in Talend Data Management Platform, Talend Big Data Platform, Talend Real Time Big Data Platform, Talend Data Services Platform, Talend MDM Platform and Talend Data Fabric.
Basic settings
Schema |
A schema is a row description. It defines the number of fields to be processed |
 |
Built-in: You create the schema and store it |
 |
Repository: You have already created the schema |
Edit Schema |
Click the […] button and define the input and Make sure to define in the output schema all columns necessary to output the |
Input Address |
Address field: add lines to the table and
tLoqateAddressRow provides a long list of
Input Column: add lines to the table and select |
Output Address |
Address field: add lines to the table and
tLoqateAddressRow provides a long list of
Output Column: add lines to the table and If you select to have an output column in the Output In the output schema, there are two output standard columns that are read-only: – – |
Loqate Data Path |
Set the path to the Loqate Global Knowledge Repository provided by Loqate and You must order and download the Loqate Local API and the Global Knowledge |
Advanced settings
Server options |
Set the server options as follows: –Address Line Separator: define the string –Default Country: select the country name for –Forced Country: select the country name for –Output Script: use this option to Select Latin to encode the parsing results in Select Native to encode the parsing results Below is a list of the character sets (scripts) and languages tLoqateAddressRow can transliterate:
–Minimum match score: specify the minimum match This option is very helpful when you want to get, in the output fields, the |
tStat |
Select this check box to collect log data at the component level. |
Global Variables
Global Variables |
ERROR_MESSAGE: the error message generated by the A Flow variable functions during the execution of a component while an After variable To fill up a field or expression with a variable, press Ctrl + For further information about variables, see |
Usage
Usage rule |
This component is an intermediary step. It requires an input and output |
Installing the Loqate Local API
Before being able to use the tLoqateAddressRow
component, you must install the Loqate Local API.
- Download and run the Loqate Local API installer.
Starting from Loqate release 2018Q1, the installation path must not contain
spaces. For more information, see https://support.loqate.com/getting-started/installers/local-api-install-process/.
- After the Loqate Local API has been installed successfully, run the Loqate
Install Manager and enter your license key to download and install the Loqate
data packs. For more information, see https://support.loqate.com/getting-started/installers/data-installation-and-update-process/.
- Add the Loqate installation path to the Path environment variable.
If the Path environment variable is not properly configured, the following
error is displayed when running a Job using tLoqateAddressRow:
java.lang.UnsatisfiedLinkError: C:\Loqate\lqtjava.dll: Can't find dependent libraries
- Restart Talend Studio.
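To check the Path step before restarting Talend Studio, the following Java class is a hypothetical diagnostic, not shipped with Talend or Loqate. It lists the PATH entries that contain the Loqate JNI library file, building the platform-specific file name with System.mapLibraryName (lqtjava.dll on Windows). It only checks for that file, not for the DLLs that lqtjava.dll itself depends on, and it follows the Windows-oriented instruction above.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical diagnostic: which PATH entries contain the Loqate JNI library? */
public final class LoqatePathCheck {
    /** Returns the entries of pathValue (split on the OS path separator) that contain fileName. */
    public static List<String> dirsContaining(String pathValue, String fileName) {
        List<String> hits = new ArrayList<>();
        if (pathValue == null) {
            return hits;
        }
        for (String entry : pathValue.split(Pattern.quote(File.pathSeparator))) {
            if (!entry.isEmpty() && new File(entry, fileName).isFile()) {
                hits.add(entry);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // "lqtjava" maps to "lqtjava.dll" on Windows and "liblqtjava.so" on Linux
        String fileName = System.mapLibraryName("lqtjava");
        List<String> hits = dirsContaining(System.getenv("PATH"), fileName);
        if (hits.isEmpty()) {
            System.out.println(fileName + " not found on PATH");
        } else {
            System.out.println(fileName + " found in: " + hits);
        }
    }
}
```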
Parsing addresses against Loqate data
This scenario applies only to Talend Data Management Platform, Talend Big Data Platform, Talend Real Time Big Data Platform, Talend Data Services Platform, Talend MDM Platform and Talend Data Fabric.
This scenario describes a three-component Job that:
- uses the tFixedFlowInput component to generate the address data to be analyzed,
- uses the tLoqateAddressRow component to parse, standardize and format the US addresses generated by the tFixedFlowInput component,
- uses a tFileOutputExcel component to output the correctly formatted addresses in an .xls file.
Before being able to use the tLoqateAddressRow component, you must order and download the Loqate
Local API and the Global Knowledge Repository from http://www.loqate.com/.
tLoqateAddressRow uses the Q2.2 2016
release.
Setting up the Job
- Drop the following components from the Palette onto the design workspace: tFixedFlowInput, tLoqateAddressRow and tFileOutputExcel.
- Connect the three components together using the Main links.
Configuring the input component
- Double-click tFixedFlowInput to open its Basic settings view in the Component tab.
- Create the schema through the Edit Schema button. In the open dialog box, click the plus button and add the columns that will hold the information in the input address, in this example: address_input, COUNTRY and data_description.
- Click OK.
- In the Number of rows field, set the number of rows to 1.
- In the Mode area, select the Use Inline Content (delimited file) option, and set the row and field separators in the corresponding fields.
- In the Content table, enter the address data you want to analyze, for example:
Boise Town Square 421 N Cole Rd 83704,,wrong data
Boise Capitol 280 S Capitol Blvd 83702,us,both address coutry correct
Federal Way 3563 South Federal Way 83705,US, both correct
Salmon Creek In-Store (ALB) 14300 NE 20th Ave Ste.B-101 Vancouver WA 98686,US,both correct
Battle Ground 2500 West Main Street,,no country;address miss(Battle Ground WA 98604 )
Battle Ground 2500 West abcd Street,,no country address changed
south southjkjkjkjkjkj,,wrong data
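The following Java class sketches how each line of the inline content maps onto the three schema columns (address_input, COUNTRY and data_description). It assumes "\n" as the row separator and "," as the field separator, and it only mimics the splitting; it is not the tFixedFlowInput implementation, and the class name is hypothetical. Note that an empty COUNTRY field is kept as an empty string.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: split delimited inline content into schema columns. */
public final class InlineContentSplit {
    public static List<String[]> split(String content) {
        List<String[]> rows = new ArrayList<>();
        for (String line : content.split("\n")) {
            if (line.isEmpty()) {
                continue;
            }
            // limit -1 keeps empty fields, such as a missing COUNTRY value
            rows.add(line.split(",", -1));
        }
        return rows;
    }

    public static void main(String[] args) {
        String content = "Boise Town Square 421 N Cole Rd 83704,,wrong data\n"
                + "Boise Capitol 280 S Capitol Blvd 83702,us,both address coutry correct\n";
        for (String[] row : split(content)) {
            System.out.println("address_input=" + row[0] + " | COUNTRY=" + row[1]
                    + " | data_description=" + row[2]);
        }
    }
}
```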
Configuring the tLoqateAddressRow component
- Double-click tLoqateAddressRow to display the Basic settings view and define the component properties.
- Click the Edit schema button and define in the output schema all the columns necessary to hold the formatted address you want to get from tLoqateAddressRow.
Two output columns are read-only: STATUS and ACCURACYCODE. The first column returns the status of processing input addresses. For further information about process status, see Process status in tLoqateAddressRow. The second column returns the verification code for the processed address. For further information about what values this code is made up of and the implications of each segment, see Address verification codes in tLoqateAddressRow.
In this example, using the same address_input column in the output schema will output the input address. This could be helpful to compare how the address elements were parsed and standardized.
- Click OK and accept to propagate the changes.
- In the Input Address table:
  - add lines in the table,
  - in the Address Field column, click a line and select from the list, predefined in the component, the fields that hold the input address, Address and Country in this example.
  - in the Input Column column, click a line and select from the list of the input schema the columns that hold the input address, address_input and COUNTRY in this example.
- In the Output Address table:
  - add lines in the table,
  - in the Address Field column, click a line and select from the list, predefined in the component, the fields that will hold the output address. The component will map the values of these fields to the output columns you set in this table. tLoqateAddressRow provides a long list of individual fields because some countries have more complex addressing structures than others. For further information about the output fields, see Address fields in tLoqateAddressRow.
  - in the Output Column column, click a line and select from the list the columns that will hold the standardized output address.
- In the Loqate Data Path field, set the path to the Loqate data folder provided by Loqate and installed locally.
Setting a JVM argument and finalizing the Job
- Double-click the tFileOutputExcel component to display the Basic settings view and define the component properties.
- Set the destination file name as well as the sheet name and then select the Include header and Define all columns auto size check boxes.
- Click the Run tab and then in the open view click Advanced settings.
- Select the Use specific JVM arguments check box and then click New….
- In the pop-up window, set the following JVM argument:
-Djava.library.path=<path/to/lqtjava.dll/folder/>
In this argument, you must indicate the folder where the Loqate library, called liblqtjava.so on Linux or lqtjava.dll on Windows, is installed. Without the correct JVM argument setting, the following error is to be expected:
java.lang.Error: java.lang.UnsatisfiedLinkError
- Save your Job and press F6 to execute it.
tLoqateAddressRow reads the input address data. It parses, verifies, cleanses and standardizes addresses and gives the result in the output rows you defined in the output schema. tLoqateAddressRow matches input address data against the Loqate data files you downloaded locally.
The STATUS standard output column returns the psOK status for all address rows. This means that the component completed the verification process successfully for all address rows. For further information about process status, see Process status in tLoqateAddressRow.
The ACCURACYCODE standard output column returns a verification code for each of the processed address rows. For example, the first verification code, V44-I45-P7-100, means:
  - Verification status = V (verified): a complete match was made between the input address and a single record from the available reference data.
  - Post-processed verification match level = 4 (premises): the level to which the input data matches the available reference data once all changes and additions performed during the verification process have been taken into account.
  - Pre-processed verification match level = 4 (premises): the level to which the input data matches the available reference data prior to any changes or additions performed during the verification process.
  - Parsing status = I (identified and parsed): all components of the input data have been identified and placed into output fields.
  - Lexicon identification match level = 4 (premises): using pattern matching, a numeric value or word has been identified as a premises number or name.
  - Context identification match level = 5 (delivery point, PostBox or SubBuilding): a numeric value or word has been identified as a post box number or sub building name.
  - Postcode status = P7 (added): the primary postal code for the country has been verified and a secondary postal code has been added.
  - Match score = 100 (complete similarity): the input data and closest reference data match completely.
For further information about what values this code is made up of and the implications of each segment, see Address verification codes in tLoqateAddressRow.
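To check the -Djava.library.path setting outside the Job, the following Java class is a minimal diagnostic sketch, assuming it is launched with the same JVM argument as the Job, for example java -Djava.library.path=<path/to/lqtjava.dll/folder/> LoqateLibraryCheck. It prints java.library.path and tries System.loadLibrary with the base name lqtjava, which the JVM maps to lqtjava.dll on Windows and liblqtjava.so on Linux. The class name is hypothetical.

```java
/** Hypothetical diagnostic: can the JVM load the Loqate JNI library? */
public final class LoqateLibraryCheck {
    /** Returns true if the native library loads, false if an UnsatisfiedLinkError occurs. */
    public static boolean canLoad(String baseName) {
        try {
            System.loadLibrary(baseName);
            return true;
        } catch (UnsatisfiedLinkError e) {
            System.err.println("Cannot load " + System.mapLibraryName(baseName) + ": " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("java.library.path = " + System.getProperty("java.library.path"));
        System.out.println("lqtjava loaded: " + canLoad("lqtjava"));
    }
}
```

If the library loads here but the Job still fails, the JVM argument in the Run tab is the next thing to compare.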