tQASBatchAddressRow
Corrects any formatting or spelling errors, adds missing data and gives the
verification status for each row.
Indeed, the address may not always have enough information to be matched to a single
deliverable result in the QAS files. For more information about the verification status, see
QuickAccess verification levels (verification status).
The address management components discussed here are the result of Talend's collaboration with Experian QAS, one of the world leaders in global address data quality.
For more information about the enterprise and its software tools, visit http://www.qas.com.
tQASBatchAddressRow verifies addresses in a column. It iterates over each row and checks the input addresses against the locally installed QAS Batch Application with the help of a Dynamic Library. The Dynamic Library file extension is .dll in Windows and .so in Linux.
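As a small illustration of the platform difference above, the following Python sketch returns the library extension for a given operating system. The helper name dynamic_library_extension is hypothetical and not part of Talend or QAS; only the .dll/.so mapping comes from the text.

```python
import platform

# Extensions named in the text: .dll on Windows, .so on Linux.
LIBRARY_EXTENSIONS = {"Windows": ".dll", "Linux": ".so"}


def dynamic_library_extension(system_name=None):
    """Return the Dynamic Library extension for the given (or current) OS.

    Hypothetical helper for illustration only.
    """
    system_name = system_name or platform.system()
    try:
        return LIBRARY_EXTENSIONS[system_name]
    except KeyError:
        # Only Windows and Linux are mentioned in the documentation.
        raise ValueError(
            f"No QAS Batch library extension documented for {system_name!r}"
        ) from None


print(dynamic_library_extension("Linux"))
print(dynamic_library_extension("Windows"))
```

Calling the helper without an argument uses platform.system(), so it raises ValueError on any system other than Windows or Linux.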
The advantage of this component over tQASAddressRow is that it does not need to call a web service to verify postal address data. This component uses QAS files to verify postal addresses and thus optimizes performance, especially when dealing with large amounts of data.
For further information on installation and on configuration parameters, see
QuickAddress Batch and Setting configuration parameters in the QAS files
respectively.
tQASBatchAddressRow uses QAS Batch 4.80 and
7.53 on both Linux and Windows.
Setting configuration parameters in the QAS files
- For both Linux and Windows, edit the qaworld.ini file to configure the relevant country section for the output schema of tQASBatchAddressRow.
  For example, if your address layout for Luxembourgish addresses has three lines, the configuration could look like the following:

      [LUX]
      CountryBase=LUX
      CleaningAction=Address
      LUXAddressLineCount=3
      LUXAddressLine1=W60
      LUXAddressLine2=W60
      LUXAddressLine3=W60,C11,L11

  The following parameters control the layout:
  - LUXAddressLineCount=3, where 3 is the number of lines of the address.
  - LUXAddressLineN, where the values are element codes separated by commas. In this example, LUXAddressLine3=W60,C11,L11 means that the maximum width of the third line (LUXAddressLine3) of the address is 60 characters (W60). The postal code (C11) and the locality (L11) appear on this line.
  - LUXCapitaliseItem=L11 means that the locality (L11) appears in upper case in the formatted address.
  For more information about setting the output address format in qaworld.ini, see the QAS documentation.
- Set the path and the library path environment variables to point to the QAS files.
  For Linux, open the ~/.profile file in your home folder and add the following lines, modifying them according to your extract location:

      # for QAS Batch JNI
      export PATH=$PATH:/path/to/qasbatch/apps #the folder which contains qaworld.ini
      export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/path/to/jni_wrapper_folder

  For Windows, add the path to the folder which contains qaworld.ini to the PATH environment variable.
- If you install QAS Batch API manually on Linux, do the following:
  - Add a new line at the end of ./apps/qalicn.ini and put a valid license.
  - Put valid files which contain country address data into the right folder, and configure qawserve.ini to add country support. There must be three elements for each country line: a short name, a full country name, and a data path, which can be relative or absolute.
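The element-code syntax of the LUXAddressLineN values described in the first step can be sketched in Python as follows. This is only an illustration: parse_address_line is a hypothetical helper, and the assumption that every W followed by digits is a width code is inferred from the single W60 example above, not taken from the QAS documentation.

```python
import re


def parse_address_line(value):
    """Split a value such as "W60,C11,L11" into a width and element codes.

    Hypothetical helper: only W60 (width), C11 (postal code) and
    L11 (locality) are described in the text; the returned structure
    is an illustrative assumption.
    """
    width = None
    elements = []
    for code in (part.strip() for part in value.split(",")):
        if not code:
            continue
        match = re.fullmatch(r"W(\d+)", code)
        if match:
            # Assumed: W<number> sets the maximum line width in characters.
            width = int(match.group(1))
        else:
            elements.append(code)
    return {"max_width": width, "elements": elements}


print(parse_address_line("W60,C11,L11"))
# {'max_width': 60, 'elements': ['C11', 'L11']}
```

A line that only sets a width, such as W60, yields an empty list of elements.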
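To check that the environment variables from the second step are in effect, a sketch such as the following could be run in the same shell session. The function folders_containing is a hypothetical helper; it only looks for qaworld.ini in the folders listed in PATH and prints LD_LIBRARY_PATH, and it does not validate the QAS installation itself.

```python
import os


def folders_containing(file_name, path_value, sep=os.pathsep):
    """Return the folders of a PATH-style string that contain file_name."""
    hits = []
    for folder in path_value.split(sep):
        # Skip empty entries produced by leading or doubled separators.
        if folder and os.path.isfile(os.path.join(folder, file_name)):
            hits.append(folder)
    return hits


if __name__ == "__main__":
    found = folders_containing("qaworld.ini", os.environ.get("PATH", ""))
    print("qaworld.ini found in:", found or "no PATH folder")
    # LD_LIBRARY_PATH is only relevant on Linux.
    print("LD_LIBRARY_PATH =", os.environ.get("LD_LIBRARY_PATH", "(not set)"))
```

An empty result suggests that the profile change has not been loaded yet, for example because the session was started before ~/.profile was edited.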
tQASBatchAddressRow Standard properties
These properties are used to configure tQASBatchAddressRow running in the Standard Job framework.
The Standard tQASBatchAddressRow component belongs to the Data Quality family.
This component is available in Talend Data Management Platform, Talend Big Data Platform, Talend Real Time Big Data Platform, Talend Data Services Platform, Talend MDM Platform and Talend Data Fabric.
Basic settings
Schema |
A schema is a row description, it defines the number of fields to be processed Click Sync columns to retrieve the schema from |
 |
Built-in: You create the schema and store it |
 |
Repository: You have already created the schema |
Edit schema |
Click the […] button Make sure to define in the output schema all columns necessary to output the |
QAS Version |
Select from the list the QAS Batch version to use for tQASBatchAddressRow. |
Country |
Select from the list the country corresponding to your input addresses. If you want to have a global output schema, select Universal from this list. |
Choose the address column |
Select from the list the address column you want to analyze. |
Specify the configuration file |
Click the […] button |
Advanced settings
tStat |
Select this check box to collect log data at the component level. |
Global Variables
Global Variables |
ERROR_MESSAGE: the error message generated by the A Flow variable functions during the execution of a component while an After variable To fill up a field or expression with a variable, press Ctrl + For further information about variables, see |
Usage
Usage rule |
This component is an intermediary step. It requires an input flow as well as an |
Limitation/prerequisite |
Before being able to use this component, you must install the QAS Batch |
Editing addresses against QAS files and giving the verification status
This scenario applies only to Talend Data Management Platform, Talend Big Data Platform, Talend Real Time Big Data Platform, Talend Data Services Platform, Talend MDM Platform and Talend Data Fabric.
The three-component Job created in Talend Studio for this scenario uses the tQASBatchAddressRow component to analyze the input column and display the correctly formatted addresses along with their verification status on the console.
Setting up the Job
- Drop the following components from the Palette onto the design workspace: tFixedFlowInput, tQASBatchAddressRow and tLogRow.
- Connect the components together using Row > Main connections.
Configuring the components
- Make sure you installed the relevant country datasets from QAS.
- Make sure the default output layout parameters in the
qaworld.ini file are appropriate, or edit the
parameters for the relevant country as needed.
You can find more information about editing the
qaworld.ini file on Talend Help Center (https://help.talend.com).
- Double-click tFixedFlowInput to display its Basic settings view and define its properties.
- Click the […] button next to Edit Schema to open a dialog box, and add one column: addr. Then click OK to close the dialog box.
- Define the data for the input column:
  - In the Mode area, select Use Inline Table.
  - Click the [+] button to add rows in the table.
  - Click each row and enter the input addresses.
- Double-click the tQASBatchAddressRow component to display its Basic settings view and define the component properties.
- From the QAS Version list, select the version of the QAS Batch API you installed, QAS V7.50(+) in this example.
- From the Country list, select the country corresponding to your input addresses.
- From the Choose the address column list, select the address column you want to analyze, addr in this example.
- Click the […] button next to the Specify the configuration file field and browse to the QAS configuration file installed locally.
- Click the […] button next to Edit schema and define in the output schema the columns necessary to hold the formatted address.
  The output schema depends on the output layout for the selected country in the qaworld.ini file.
- Double-click the tLogRow component to display its Basic settings view and select Table in the Mode area to display the Job execution result in table cells.
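Because the output schema depends on the layout in qaworld.ini, it can help to list the address lines configured for a country before defining the schema columns. The following Python sketch reads them with the standard configparser module. It rests on assumptions: address_lines is a hypothetical helper, the sample text reuses the Luxembourg key names from the configuration section, the key pattern is assumed to hold for other country codes, and a real qaworld.ini may contain entries that configparser cannot parse.

```python
import configparser


def address_lines(ini_text, country):
    """Return the configured address-line values for one country section.

    Hypothetical helper; assumes keys named <country>AddressLineCount
    and <country>AddressLine<N>, as in the Luxembourg example.
    """
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.optionxform = str  # keep the original key case
    parser.read_string(ini_text)
    section = parser[country]
    count = int(section[f"{country}AddressLineCount"])
    return [section[f"{country}AddressLine{n}"] for n in range(1, count + 1)]


# Sample text modelled on the Luxembourg example, not a real qaworld.ini.
sample = """
[LUX]
CountryBase=LUX
CleaningAction=Address
LUXAddressLineCount=3
LUXAddressLine1=W60
LUXAddressLine2=W60
LUXAddressLine3=W60,C11,L11
"""

for number, spec in enumerate(address_lines(sample, "LUX"), start=1):
    print(number, spec)
```

For the sample above this lists three address lines, which indicates how many formatted address lines the configured layout produces.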
Executing the Job
tQASBatchAddressRow reads the input rows, corrects and formats the addresses, and returns the verification status in the STATUS column and the formatted result in the ADDRESS_LINE_1, ADDRESS and POSTAL_CODE_CITY columns. For further information on the status column, check the corresponding documentation at http://www.qas.com.