tBatchAddressRowCloud
Uses batch processing to parse address data and return formatted addresses quickly and accurately, without installing any software.
Address data is corrected against the latest online reference data from providers that support batch services, including Loqate and MelissaData. tBatchAddressRowCloud proposes alternatives for missing address data such as country or postal code, and enriches addresses with other elements such as latitude and longitude.
The advantage of this component over tAddressRowCloud is better performance when dealing with large amounts of data.
tBatchAddressRowCloud splits addresses
from input streams into several files (batches) and verifies and formats addresses in each
file by using online batch services.
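The batching idea described above can be illustrated with a minimal sketch. This is not the component's implementation; the function name `split_into_batches` and the sample rows are hypothetical, and `batch_size` stands in for the Number of rows in each batch file setting.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def split_into_batches(rows: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive lists holding at most batch_size rows each."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(rows)
    while True:
        # Take the next batch_size rows from the stream, if any remain.
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Hypothetical input stream of 11 address rows.
addresses = [f"address {i}" for i in range(1000, 1011)]
batches = list(split_into_batches(addresses, 5))
```

With 11 rows and a batch size of 5, the stream is grouped into three batches, the last one holding the single remaining row.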
tBatchAddressRowCloud Standard properties
These properties are used to configure tBatchAddressRowCloud running in the Standard Job framework.
The Standard tBatchAddressRowCloud component belongs to the Data Quality family.
This component is available in the Palette of the Studio only if you have subscribed to one of the Talend Platform products.
Basic settings
Schema |
A schema is a row description. It defines the number of fields (columns) to |
|
Built-In: You create and store the |
|
Repository: You have already created |
Edit Schema |
Click the […] button and define The output schema of tBatchAddressRowCloud proposes several read-only Also some of the output columns could be empty depending on what |
Address Provider |
Select from the list the provider of the reference data against The list of address providers includes Loqate and |
Default Country |
Select the country name for which the ISO 3166-1 alpha-3 code |
License/API key |
Enter the license or the API key provided by the address provider You must visit the provider website, register and get the |
Batch job name |
Enter, between quotation marks, a name of your choice to give to |
Number of rows in each batch file |
Enter the number of address records you want to group in each |
Loqate website login |
Enter your login provided by Loqate. |
Loqate website password |
Enter the password provided by Loqate. |
Processing Mode |
This option is applied only to the Loqate provider. Select from the list the mode of address validation you want to –Verify and Geocode (selected by
default): with this mode, the component standardizes and corrects addresses and enriches them with latitude and longitude information. Note:
Combining address verification and geocoding will use –Verify only: with this mode, the |
Input Mapping |
Address field: add lines to the
tBatchAddressRowCloud provides a
Input Column: add lines to the |
Use Additional Output |
Select this check box and use the Output
Address field: add lines to the These predefined address fields vary according to the provider you
Output Column: select from the
tBatchAddressRowCloud maps the If you select to have an output column in the Output Address table that has the exact |
Advanced settings
Fields in this view will vary according to the address provider –Address Line Separator: define If you keep the default option, Default in this field, the component uses the line –Forced Country: select the –Output Script: select the The script list differs according to the address provider you When the address provider is Loqate or MelissaData: If you keep the default option, Not Select Latin to encode the Select Native/Match input to The Native/Match input script
–Minimum match score: set the This option is very helpful when you want to get, in the output –Minimum interval between two queries –Limit of retrying the same query in case it –Interval between two retries of the same –Delay before forcing the termination of the |
|
Use mockup mode (no credit consumption) |
Before being able to use this option, you must run your Job at This option is only for testing or for development needs. You will Select this check box to simulate execution and responses from the –Batch ID: set the identifier of You can get the file identifier if you log in at Everything |
tStat |
Select this check box to collect log data at the component |
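The Advanced settings listed above include a limit on retrying the same query and an interval between two retries. How the component applies them internally is not described here; the sketch below only shows the general retry pattern such settings usually control. The names `call_with_retries`, `retry_limit` and `retry_interval_s` are hypothetical, and the unit (seconds) is an assumption.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def call_with_retries(
    query: Callable[[], T],
    retry_limit: int,
    retry_interval_s: float,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run query, retrying up to retry_limit times with a fixed pause between tries."""
    retries = 0
    while True:
        try:
            return query()
        except Exception:
            retries += 1
            if retries > retry_limit:
                raise  # Retry budget exhausted: surface the last error.
            sleep(retry_interval_s)


# Simulated query that fails twice before succeeding (hypothetical).
calls = {"count": 0}


def flaky_query() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("simulated failure")
    return "ok"


waits: List[float] = []  # Record pauses instead of really sleeping.
result = call_with_retries(flaky_query, retry_limit=3, retry_interval_s=0.5, sleep=waits.append)
```

Passing `sleep` as a parameter keeps the example fast to run and makes the pauses observable.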
Global Variables
Global Variables |
ERROR_MESSAGE: the error message generated by the A Flow variable functions during the execution of a component while an After variable To fill up a field or expression with a variable, press Ctrl + For further information about variables, see |
Usage
Usage rule |
This component is an intermediary step. It requires an input and |
Scenario: Parsing addresses against reference data in the Cloud using batch processing
This scenario applies only to a subscription-based Talend Platform solution or Talend Data Fabric.
This scenario describes a Job which uses:
- the tFixedFlowInput component to generate the address data to be analyzed,
- the tBatchAddressRowCloud component to parse, standardize and format the addresses in the Cloud through the Address Validation API,
- the tFileOutputExcel component to output the correctly formatted addresses in an .xls file.
You must have an internet connection to be able to use tBatchAddressRowCloud.
Setting up the Job
- Drop the following components from the Palette onto the design workspace: tFixedFlowInput, tBatchAddressRowCloud and tFileOutputExcel.
- Connect the three components together using the Main links.
Configuring the input component
- Double-click tFixedFlowInput to open its Basic settings view in the Component tab.
- Create the schema through the Edit Schema button. In the open dialog box, click the [+] button and add the columns that will hold the information in the input address. For this example, add ID, Organization, Address1 x 8, Locality, AdministrativeArea, PostalCode and Country.
- Click OK.
- In the Number of rows field, enter 1.
- In the Mode area, select the Use Inline Content option.
- In the Content table, enter the address data you want to analyze, for example:
    1000 23 girdwood road london sw18 GBR
    1001 1111 bayhill drive ste 290 san bruno ca USA
    1002 23 girdwood road london sw18 GBR
    1003 1111 bayhill drive ste 290 san bruno ca USA
    1004 23 girdwood road london sw18 GBR
    1005 1111 bayhill drive ste 290 san bruno ca USA
    1006 23 girdwood road london sw18 GBR
    1007 1111 bayhill drive ste 290 san bruno ca USA
    1008 23 girdwood road london sw18 GBR
    1009 1111 bayhill drive ste 290 san bruno ca USA
    1010 23 girdwood road london sw18 GBR
    ...
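For a larger test set, inline content of this shape can be generated rather than typed. The sketch below rests on several assumptions that the text above does not confirm: that "Address1 x 8" means the columns Address1 to Address8, that fields are separated by a semicolon and rows by a newline, and that the sample values fall into the columns shown in the two dictionaries.

```python
# Assumed 14-column schema: ID, Organization, Address1..Address8, Locality,
# AdministrativeArea, PostalCode, Country.
COLUMNS = (
    ["ID", "Organization"]
    + [f"Address{i}" for i in range(1, 9)]
    + ["Locality", "AdministrativeArea", "PostalCode", "Country"]
)

# Assumed placement of the sample values into columns.
UK_ADDRESS = {
    "Address1": "23 girdwood road",
    "Locality": "london",
    "PostalCode": "sw18",
    "Country": "GBR",
}
US_ADDRESS = {
    "Address1": "1111 bayhill drive ste 290",
    "Locality": "san bruno",
    "AdministrativeArea": "ca",
    "Country": "USA",
}


def build_row(record_id: int, address: dict, separator: str = ";") -> str:
    """Serialize one record, leaving columns without a value empty."""
    values = {"ID": str(record_id), **address}
    return separator.join(values.get(column, "") for column in COLUMNS)


# IDs 1000 to 1010, alternating between the two sample addresses.
rows = [build_row(1000 + i, UK_ADDRESS if i % 2 == 0 else US_ADDRESS) for i in range(11)]
content = "\n".join(rows)
```

The resulting `content` string could then be pasted into the inline content of the input component, after adjusting the separators to match the component settings.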
Parsing addresses against Loqate
Setting the schema and selecting an address provider
- Double-click tBatchAddressRowCloud to display the Basic settings view and define the component properties.
- If required, click Sync columns to retrieve the schema defined in the input component.
- Click the Edit schema button to open the schema dialog box.
  tBatchAddressRowCloud proposes several predefined read-only address columns, as shown in the capture below.
  The STATUS column returns the status of processing input addresses. For further information about process status, see Process status in tLoqateAddressRow.
  The AddressVerificationCode column returns the verification code for the processed address. For further information about what values this code is made up of and the implications of each segment, see Address verification codes in tLoqateAddressRow.
  The VerificationLevel output column provides you with a verification status of the processed addresses. For further information, see Address verification levels in tAddressRowCloud.
- Move any of the input columns to the output schema if you want to show them in the verification results, click OK and accept the prompt to propagate the changes.
  You can also add columns directly in the output schema to retrieve additional address information from the provider repository.
- Select from the Address Provider list the provider of the reference data against which you want to validate and format input addresses, Loqate in this example.
  You can also validate addresses against the MelissaData online service.
- In the License/API key field, enter the license key provided by Loqate.
- In the Batch job name field, enter between quotation marks a name of your choice to give to the batch files that will be generated and saved on the Loqate server.
  Set the number of address records you want to group in each batch file in the Number of rows in each batch file field.
- Enter the login and password provided by Loqate in the Loqate website login and Loqate website password fields respectively.
- From the Processing Mode list, select:
  - Verify and Geocode (selected by default) to standardize and correct addresses and enrich them with latitude and longitude. Combining address verification and geocoding will cost extra credits. For further information, see Cloud Price Card.
  - Verify only to standardize and correct addresses without enriching them with latitude and longitude.
Defining address mapping and setting advanced parameters
- In the Input Mapping table:
  - Use the [+] button to add lines in the table.
  - Click in the Address Field column and select from the predefined list the fields that hold the input address, Address in this example.
    The component will map the values of these fields to the input columns you set in this table.
    tBatchAddressRowCloud provides a list of individual fields because some countries have more complex addressing structures than others.
  - Click in the Input Column column and select from the list of the input schema the columns that hold the input address you want to parse, Address1 in this example.
- If required, select the Use Additional Output check box and define in the table what extra address fields you want to retrieve from the provider repository and add to the parsing results. For an example on how to use this table, check Defining additional address fields.
  The Address field column holds predefined address fields which vary according to the provider you select. The Output Column column holds the fields you want to use to output the extra information. You must first add these additional columns to the component schema through the Edit Schema button.
- Click the Advanced settings tab and set the parameters in this view according to your needs. In this example:
  - Select the Use mockup mode (no credit consumption) check box.
    This check box enables you to simulate execution and responses from the Loqate server by using as input a batch file that has been already processed by the Job and saved on the server.
  - Log in to Everything Location and access the Loqate server at Online Batch Cleansing to fetch the identifier of the batch file you want to use as output in your Job.
  - Set the identifier in the Batch ID field.
    This option is used only for testing or for development needs.
  - Leave all other default parameters as they are.
Configuring the output component and executing the Job
- Double-click the tFileOutputExcel component to display the Basic settings view and define the component properties.
- Set the destination file name as well as the sheet name and then select the Define all columns auto size check box.
- Save your Job and press F6 to execute it.
  The tBatchAddressRowCloud component parses addresses using batch processing. It corrects addresses using the online batch service of Loqate and writes the result in batch files on the Loqate server.
- Right-click the output component and select Data Viewer to display the formatted address data.
  tBatchAddressRowCloud matches input address data against the Loqate repository.
  The STATUS output column returns the OK status for all address rows. This means that the component completed the verification process successfully for all address rows. For further information about process status, see Process status in tLoqateAddressRow.
  The VerificationLevel output column provides you with a verification level defined by Talend. For further information, see Address verification levels in tAddressRowCloud.
  The AddressVerificationCode output column returns a verification code for each of the processed address rows. For further information about what values this code is made up of and the implications of each segment, see Address verification codes in tLoqateAddressRow.
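When the output holds many rows, checking the STATUS column by eye becomes impractical. The sketch below shows one way to list the rows whose status is not OK, assuming the output has been loaded as a list of dictionaries keyed by column name. The function name `rows_not_ok`, the sample rows and the "FAILED" value are hypothetical; only the STATUS column name and the OK value come from the text above.

```python
from typing import Dict, List


def rows_not_ok(rows: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Return the rows whose STATUS column is missing or different from OK."""
    return [row for row in rows if row.get("STATUS", "").strip().upper() != "OK"]


# Hypothetical output rows; a real run of this scenario returns OK for every row.
sample_rows = [
    {"ID": "1000", "STATUS": "OK"},
    {"ID": "1001", "STATUS": "OK"},
    {"ID": "1002", "STATUS": "FAILED"},  # hypothetical non-OK value
]
failed_rows = rows_not_ok(sample_rows)
```

An empty result means every row completed the verification process, which is the outcome described for this scenario.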