tMelissaDataAddress
Verifies if an address is properly formatted and corrects any formatting or
spelling errors in each row.
This address management component is the result of a collaboration between
Talend and Melissa Data, one of the world leaders in global address validation.
For more information about the company and its software tools, visit http://www.melissadata.com/.
tMelissaDataAddress validates, corrects and standardizes Canadian and
United States addresses. It iterates over each row and checks the input addresses
against a Melissa Data data file.
tMelissaDataAddress uses the June 2017 release of the AddressObject library from Melissa Data.
The Data Quality Suite from Melissa Data and the data used to validate addresses
are updated regularly, but the AddressObject library
has not been modified since June 2017.
APIs used in tMelissaDataAddress
- Address Object is used to clean up contact data.
- GeoCoder Object is used to access geographic data. GeoCode and GeoPoint are
two different GeoCoder API methods. Using GeoCode, you can retrieve the latitude
and longitude coordinates of a 9-digit ZIP code centroid. Using GeoPoint, you can
retrieve the rooftop-level latitude and longitude coordinates of addresses,
provided that you have purchased the corresponding license.
- RightFielder Object is used to parse and reorganize input data into usable
data types. The Parse(String address1Str) method is used to parse fields.
Input fields used in tMelissaDataAddress
The table below lists all input fields in tMelissaDataAddress:
Field name | Description |
---|---|
Address1 | This field is used to map the first line of the address. |
Address2 | This field is used to map the second line of the address. |
Company | This field is used to map the company name. |
City | This field is used to map the city name. |
State | This field is used to map the state name. |
Postal | This field is used to map the postal ZIP code. |
Output standard columns used in tMelissaDataAddress
The table below lists all the output standard columns in tMelissaDataAddress. These read-only columns are automatically added to
the output schema.
Output column | Description |
---|---|
COMPANY_STANDARDIZED | This column returns a standardized company name. |
ADDRESSLINE1_STANDARDIZED | This column returns the first line of the address. |
ADDRESSLINE2_STANDARDIZED | This column returns the second line of the address. |
CITY_STANDARDIZED | This column returns a standardized city name. |
STATE_STANDARDIZED | This column returns a two-letter abbreviation for the state. |
POSTAL_STANDARDIZED | This column returns the postal ZIP code. |
COUNTRY_STANDARDIZED | This column returns a two-letter abbreviation for the country. |
RESULTS_CODE | This column returns verification codes that indicate data quality. For example, the AC02 code means that the state name is corrected based on the combination of city name and ZIP code. For a complete list of the result codes, visit http://www.melissadata.com/. |
tMelissaDataAddress Standard properties
These properties are used to configure tMelissaDataAddress running in the Standard Job framework.
The Standard
tMelissaDataAddress component belongs to the Data Quality family.
This component is available in Talend Data Management Platform, Talend Big Data Platform, Talend Real Time Big Data Platform, Talend Data Services Platform, Talend MDM Platform and Talend Data Fabric.
Basic settings
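Property | Description |
---|---|
Schema and Edit schema | A schema is a row description. It defines the number of fields (columns) to be processed and passed on to the next component. Built-in: You create the schema and store it locally for this component only. Repository: You have already created the schema and stored it in the Repository, so you can reuse it in various projects and Job designs. |
Input address | Click the [+] button to add lines to the table. Click the Address Field column and select from the predefined list the fields that hold the input address data. The component will map the values of these fields to the input columns you set in this table. Click the Input Column column and select from the list the columns from the input schema that hold the input address data. |
Output address | Use this table to add extra columns to the output. Click the [+] button to add lines to the table. Click the Address Field column and select from the predefined list the fields that hold the output address data. The component will map the values of these fields to the output columns you set in this table. Click the Output Column column and select from the list the columns from the output schema that will hold the extra information. |
Specify your MelissaData license | Enter the license key provided by Melissa Data when you ordered the Data Quality Suite or the Address Object API. This software key unlocks the full functionality of Address Object. For more information, visit http://www.melissadata.com/ and download the Reference Guide. If your GeoCoder license has expired, you can use it in demo mode. |
Specify your MelissaData DataFile folder | Set the path to the Melissa Data data folder provided by Melissa Data. You must order and download the Data Quality Suite or the Address Object API from Melissa Data. |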
Advanced settings
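Property | Description |
---|---|
GeoCoder Licensing Agreement | Select the license you purchased. If you have not purchased a GeoPoint or a GeoCode license, select No Melissa GeoCoder License Was Purchased. You cannot check the license validity at the initialization of the |
tStat | Select this check box to gather the Job processing metadata at the Job level. |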
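The GeoCoder Licensing Agreement choice determines which geographic data you can expect in the output. The following Python sketch encodes the rules stated on this page. GeoCode returns the latitude and longitude of the 9-digit ZIP code centroid. GeoPoint returns rooftop-level coordinates. Without either license, no latitude, longitude, or GeoCode result codes are returned. Only the "No Melissa GeoCoder License Was Purchased" label comes from this page. The other enum labels and the function itself are hypothetical and are not part of Talend or Melissa Data.

```python
from enum import Enum
from typing import Optional


class GeoCoderLicense(Enum):
    NONE = "No Melissa GeoCoder License Was Purchased"  # label quoted on this page
    GEOCODE = "GeoCode"    # hypothetical label
    GEOPOINT = "GeoPoint"  # hypothetical label


def coordinate_precision(choice: GeoCoderLicense) -> Optional[str]:
    """Describe the latitude/longitude precision to expect for a license choice.

    Returns None when no coordinates (and no GeoCode result codes) are returned.
    """
    if choice is GeoCoderLicense.GEOPOINT:
        return "rooftop"  # rooftop-level coordinates of the address
    if choice is GeoCoderLicense.GEOCODE:
        return "9-digit ZIP code centroid"
    return None  # no GeoCoder license purchased
```

A downstream step could use a check like this to avoid flagging empty latitude and longitude columns as errors when no GeoCoder license was selected.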
Global Variables
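Property | Description |
---|---|
Global Variables | ERROR_MESSAGE: the error message generated by the component when an error occurs. This is an After variable and it returns a string. A Flow variable functions during the execution of a component, while an After variable functions after the execution of the component. To fill up a field or expression with a variable, press Ctrl + Space to access the variable list and choose the variable to use from it. For further information about variables, see the Talend Studio User Guide. |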
Usage
Property | Description |
---|---|
Usage rule | This component is usually used as an intermediate component, and it requires an input component and an output component. |
Editing addresses against a Melissa Data data file
This Job uses the tFixedFlowInput component to generate the
address data to be analyzed, the tMelissaDataAddress component to
analyze the input schema and validate, correct and standardize the US addresses
generated by tFixedFlowInput, and a tLogRow component to display
the correctly formatted addresses on the console.
This scenario applies only to Talend Data Management Platform, Talend Big Data Platform, Talend Real Time Big Data Platform, Talend Data Services Platform, Talend MDM Platform and Talend Data Fabric.
Prerequisites to using the tMelissaDataAddress component
Before using the tMelissaDataAddress component, complete the following prerequisites:
- To retrieve longitude and latitude data and the GeoCode result codes, you
must have purchased a GeoCode or a GeoPoint license.
- To successfully execute a Job with the tMelissaDataAddress component, you must
have installed Melissa Data with the GeoPoint and GeoCode data files.
- Add the path to the folder containing the mdAddr library to the system
environment variables. For example, use
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:<path to folder containing libmdAddr.so>
on Linux and
PATH=%PATH%;<path to folder containing mdAddr.dll>
on Windows. If the system environment variable is not set correctly, expect the
following error: java.lang.Error: java.lang.UnsatisfiedLinkError.
- On Linux, restart your computer after setting your system environment
variables so that the changes take effect.
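Before running the Job, you can confirm that the mdAddr library is reachable through the variable you set. The following Python sketch is a hypothetical helper and is not part of Talend or Melissa Data. It assumes the library file names given above (libmdAddr.so on Linux, mdAddr.dll on Windows). It checks each folder listed in LD_LIBRARY_PATH (Linux) or PATH (Windows) for that file.

```python
import os
import sys
from typing import Mapping, Optional


def mdaddr_library_name(platform: str = sys.platform) -> str:
    """File name of the mdAddr native library on the given platform."""
    return "mdAddr.dll" if platform.startswith("win") else "libmdAddr.so"


def find_mdaddr_folder(env: Optional[Mapping[str, str]] = None,
                       platform: str = sys.platform) -> Optional[str]:
    """Return the first listed folder that holds the mdAddr library, or None.

    Hypothetical helper: Linux is checked through LD_LIBRARY_PATH and Windows
    through PATH, matching the two examples in the prerequisites.
    """
    env = os.environ if env is None else env
    windows = platform.startswith("win")
    variable = "PATH" if windows else "LD_LIBRARY_PATH"
    separator = ";" if windows else ":"
    name = mdaddr_library_name(platform)
    for folder in env.get(variable, "").split(separator):
        if folder and os.path.isfile(os.path.join(folder, name)):
            return folder
    return None


if __name__ == "__main__":
    found = find_mdaddr_folder()
    if found is None:
        print(f"{mdaddr_library_name()} not found; expect java.lang.UnsatisfiedLinkError")
    else:
        print(f"{mdaddr_library_name()} found in {found}")
```

The helper only confirms that the file exists in a listed folder. It does not load the library, so it cannot detect other problems, such as a library built for the wrong architecture.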
Setting up the Job
- Drop the following components from the Palette onto the design workspace: tFixedFlowInput, tMelissaDataAddress and tLogRow.
- Connect the three components together using Row > Main connections.
Configuring the input component
- Double-click tFixedFlowInput to open its Basic settings view in the Component tab.
- Click Edit schema to make changes to the schema.
- Click the [+] button to add the columns that will hold the address data to
your input schema. For this example, add:
  - input_company
  - input_address1
  - input_address2
  - input_city
  - input_state
  - input_postal
- Click OK.
- In the Number of rows field, set the number of rows to 1.
- In the Mode area, select the Use Inline Content (delimited file) option, and
set the row and field separators in the corresponding fields.
- In the Content table, enter the address data you want to analyze. For example:
  Talend Inc.|5150 El Camino Real|Suite C-31|Los Altos||94022|
  Talend Inc.|6 Executive Circle|Suite 200|Irvine|California|92614|
  Talend Inc.|220 White Plains Road|Suite 390|Tarrytown|New York|10591|
  Talend Inc.|8 New England Executive Park|Suite 170|Burlington|Massachusetts|01803|
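The following Python sketch shows how the delimited inline content maps onto the six input columns. It splits the example rows on the | field separator and a newline row separator. It illustrates the mapping only and does not reproduce how tFixedFlowInput is implemented. It assumes that the trailing | on each row closes the last field, not that it starts an extra empty field.

```python
from typing import Dict, List

INPUT_COLUMNS = ["input_company", "input_address1", "input_address2",
                 "input_city", "input_state", "input_postal"]

CONTENT = """Talend Inc.|5150 El Camino Real|Suite C-31|Los Altos||94022|
Talend Inc.|6 Executive Circle|Suite 200|Irvine|California|92614|
Talend Inc.|220 White Plains Road|Suite 390|Tarrytown|New York|10591|
Talend Inc.|8 New England Executive Park|Suite 170|Burlington|Massachusetts|01803|"""


def parse_inline_content(content: str, field_sep: str = "|",
                         row_sep: str = "\n") -> List[Dict[str, str]]:
    """Split inline content into rows keyed by the input schema columns."""
    rows = []
    for line in content.split(row_sep):
        if not line:
            continue  # skip blank lines
        fields = line.split(field_sep)
        # Assumption: a trailing separator closes the last field; drop the empty tail.
        if fields and fields[-1] == "":
            fields = fields[:-1]
        if len(fields) != len(INPUT_COLUMNS):
            raise ValueError(f"expected {len(INPUT_COLUMNS)} fields, got {len(fields)}: {line!r}")
        rows.append(dict(zip(INPUT_COLUMNS, fields)))
    return rows
```

The first row has an empty input_state value, which gives the component a missing state to fill in. All values stay strings, so the ZIP code 01803 keeps its leading zero. If input_postal were typed as an integer in the schema, that leading zero would be lost.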
Configuring the tMelissaDataAddress component
- Double-click tMelissaDataAddress to display the Basic settings view and
define the component properties.
- Click Sync columns to retrieve the schema from the preceding component.
- Click the Edit schema button to view the input and output schema and edit
the output schema, if necessary. Read-only columns are added to the output schema:
  - COMPANY_STANDARDIZED returns the standard company name.
  - ADDRESSLINE1_STANDARDIZED returns the first line of the street address.
  - ADDRESSLINE2_STANDARDIZED returns the second line of the street address.
  - CITY_STANDARDIZED returns the standard city name.
  - STATE_STANDARDIZED returns a two-letter abbreviation for the state name.
  - POSTAL_STANDARDIZED returns the postal ZIP code.
  - COUNTRY_STANDARDIZED returns a two-letter abbreviation for the country name.
  - RESULTS_CODE returns verification codes.
- Click OK to close the dialog box.
- In the Input Address table:
  - Use the [+] button to add lines in the table.
  - Click in the Address Field column and select from the predefined list the
  fields that hold the input address data. The component will map the values
  of these fields to the input columns you set in this table.
  - Click in the Input Column column and select from the list the columns from
  the input schema that hold the input address data you want to parse.
- In the Output Address table, you can define additional address fields:
  - Use the [+] button to add lines in the table. These lines will hold the
  extra information you want to retrieve from Melissa Data, such as the Address
  Key, the country name or longitude and latitude data.
  - Click in the Address Field column and select from the predefined list the
  fields that hold the output address data. The component will map the values
  of these fields to the output columns you set in this table.
  - Click in the Output Column column and select from the list the columns from
  the output schema that will hold the extra information. If you click Sync
  Columns after adding columns to the output schema, they are removed.
- In the Specify your MelissaData license field, enter the license key
provided by Melissa Data when you ordered the Data Quality Suite or the Address
Object API. If the license key you entered is not correct, you can use GeoCoder
in demo mode.
- In the Specify your MelissaData DataFile folder field, set the path to the
Melissa Data data folder provided by Melissa Data.
- In the Advanced settings view of the component, select the license you
purchased. If you have not purchased a GeoPoint or a GeoCode license, select
No Melissa GeoCoder License Was Purchased to run the Job. Note that you will
then not be able to retrieve latitude and longitude data or GeoCode result codes.
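The Input Address table maps Melissa Data address fields to columns of the incoming schema. The following Python sketch models that table as a dictionary. It reports mappings that point to unknown address fields or to input columns that do not exist. The field names come from the input fields table. The scenario mapping assumes one table line per input column, and the check function is hypothetical rather than a Talend API.

```python
from typing import Dict, Iterable, List

# Address fields listed in "Input fields used in tMelissaDataAddress".
ADDRESS_FIELDS = ("Address1", "Address2", "Company", "City", "State", "Postal")


def check_input_mapping(mapping: Dict[str, str], input_schema: Iterable[str]) -> List[str]:
    """Return readable problems in an Address field -> input column mapping."""
    columns = set(input_schema)
    problems = []
    for field, column in mapping.items():
        if field not in ADDRESS_FIELDS:
            problems.append(f"unknown Address field: {field}")
        if column not in columns:
            problems.append(f"{field} is mapped to missing input column: {column}")
    return problems


# Assumed mapping for this scenario: one line per row of the Input Address table.
SCENARIO_MAPPING = {
    "Company": "input_company",
    "Address1": "input_address1",
    "Address2": "input_address2",
    "City": "input_city",
    "State": "input_state",
    "Postal": "input_postal",
}
SCENARIO_SCHEMA = ["input_company", "input_address1", "input_address2",
                   "input_city", "input_state", "input_postal"]
```

With the scenario schema and mapping, the check returns an empty list. Renaming a schema column without updating the table produces a "missing input column" entry.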
Saving and executing the Job
tMelissaDataAddress reads the input address rows, corrects and formats the
addresses, and returns the results as standardized address output rows.
tMelissaDataAddress also matches street names against ZIP codes, matches
geographic data to ZIP code and city information, and parses street addresses,
returning all these results in different output columns. This example shows only
some of the output columns written by the tMelissaDataAddress component:
- GetAddressKey returns the Address Key.
- GetCountyName returns the county name.
- GetTimeZone returns the time zone.
- GetLongitude returns the longitude data.
- GetLatitude returns the latitude data.
- GeoCodeResult returns the GeoCode result codes.
- The output standard columns return the standard company name, up to two
street address lines, the standard city name, a two-letter abbreviation for the
state name, the postal ZIP code, and a two-letter abbreviation for the country name.
- The RESULTS_CODE output column returns verification codes for each processed
address row. These codes are written as comma-delimited lists. Each code consists
of two letters followed by two digits and indicates a status or an error. For
example, the AC02 code means that the state name is corrected based on the
combination of city name and ZIP code, and the AS01 code means that the street
address is valid and deliverable.
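RESULTS_CODE holds a comma-delimited list of four-character codes, so downstream logic usually needs to split it before testing for a particular code. The following Python sketch splits the list and checks the two-letters-two-digits format described above. It assumes uppercase letters, as in AC02 and AS01. It describes only the two codes explained on this page. For the complete list, refer to Melissa Data. The function names are hypothetical.

```python
import re
from typing import List

# Assumption: codes use uppercase letters, as in AC02 and AS01.
CODE_FORMAT = re.compile(r"[A-Z]{2}[0-9]{2}")

# Only the codes explained on this page; see http://www.melissadata.com/ for the full list.
KNOWN_CODES = {
    "AC02": "State name corrected based on the city name and ZIP code",
    "AS01": "Street address is valid and deliverable",
}


def split_results_code(value: str) -> List[str]:
    """Split a comma-delimited RESULTS_CODE value and validate each code's format."""
    codes = [part.strip() for part in value.split(",") if part.strip()]
    malformed = [code for code in codes if not CODE_FORMAT.fullmatch(code)]
    if malformed:
        raise ValueError(f"unexpected result code format: {malformed}")
    return codes


def is_deliverable(value: str) -> bool:
    """True if the row's RESULTS_CODE contains AS01."""
    return "AS01" in split_results_code(value)


def describe(value: str) -> List[str]:
    """Describe known codes; unknown (but well-formed) codes are returned as-is."""
    return [KNOWN_CODES.get(code, code) for code in split_results_code(value)]
```

For example, is_deliverable("AC02,AS01") is true: the state was corrected and the street address was found deliverable.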
For a complete list of the result codes and for further information about all the
output columns, visit http://www.melissadata.com/.