July 30, 2023

tMap – Docs for ESB 7.x

tMap

Transforms and routes data from single or multiple sources to single or multiple
destinations.

tMap is an advanced component that integrates as a plugin into
Talend Studio.

Tip:

There is no guaranteed order among the output flows of
tMap. To have the output flows executed one by one,
you can write them to temporary files or memory, and then read and insert them into
files or databases using different subjobs linked by Trigger > OnSubjobOK connections.

Depending on the Talend
product you are using, this component can be used in one, some or all of the following
Job frameworks:

tMap Standard properties

These properties are used to configure tMap running in the Standard Job framework.

The Standard
tMap component belongs to the Processing family.

The component in this framework is available in all Talend
products.

Basic settings

Map editor

It allows you to define the tMap
routing and transformation properties.

If needed, click the

tMap_1.png

button at the top of the input area to open the
Property Settings dialog box,
which provides the following options:

  • Die on error: Select this
    check box if you want to kill the Job if there is an error. This
    check box is selected by default.

  • Lookup in parallel: Select
    this check box to maximize the data transformation performance in a
    Job that handles multiple lookup input flows with large amounts
    of data.

  • Enable Auto-Conversion of
    types: If your input and output columns across a
    mapping are of different data types, select this check box to
    enable automatic type conversion at runtime and avoid
    compilation errors.

    This option is enabled by default if the Enable Auto-Conversion of types check box is
    selected in the Project
    Settings
    view when this component is added. You
    can also override the default conversion behavior of this
    component by setting conversion rules in the Project Settings view. For more
    information, see
    Talend Studio User
    Guide
    .

    Note that auto conversion between Date and BigDecimal is not
    supported.

  • Store on disk: The options
    provided in this area are identical to the relevant options
    provided on the Basic settings
    and Advanced settings tabs
    respectively. Settings made in the Property Settings dialog box are reflected in
    the respective tab views and vice versa.

This
component offers the advantage of the dynamic schema feature. This allows you to
retrieve unknown columns from source files or to copy batches of columns from a source
without mapping each column individually. For further information about dynamic schemas,
see
Talend Studio

User Guide.

This
dynamic schema feature is designed for the purpose of retrieving unknown columns of a
table and is recommended to be used for this purpose only; it is not recommended for
creating tables.

Mapping links display as

Auto: the default setting; links display as
curves

Curves: the mapping displays as curved
links

Lines: the mapping displays as straight
lines. This last option slightly enhances performance.

Temp data directory path Enter the path where you want to store the temporary data
generated for lookup loading. For more information on this folder, see

Talend Studio User
Guide
.

Preview

The preview is a snapshot of the Mapper data. It becomes
available when the Mapper properties have been filled in with data. The
preview synchronization takes effect only after saving changes.

Advanced settings

Max buffer size (nb of rows) Type in the amount of memory, in number of rows, that you
want to allocate to processed data.
Ignore trailing zeros for BigDecimal Select this check box to ignore trailing zeros for
BigDecimal data.

tStatCatcher Statistics

Select this check box to gather the Job processing metadata at the Job
level as well as at each component level.

Global Variables

Global Variables

ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable and it returns a string. This
variable functions only if the Die on error check box is
cleared, if the component has this check box.

A Flow variable functions during the execution of a component while an After variable
functions after the execution of the component.

To fill in a field or expression with a variable, press Ctrl +
Space
to access the variable list and choose the variable to use from it.

For further information about variables, see
Talend Studio

User Guide.
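In generated Talend Job code, component variables such as ERROR_MESSAGE are stored in a shared globalMap and read back with a cast. The snippet below is a minimal standalone sketch of that access pattern, not Talend-generated code; the component name tMap_1 and the message text are illustrative only.

```java
import java.util.HashMap;
import java.util.Map;

public class GlobalVarDemo {
    // Stand-in for the globalMap that Talend's generated code shares across components.
    static Map<String, Object> globalMap = new HashMap<>();

    // Typical access pattern in a Job: ((String) globalMap.get("tMap_1_ERROR_MESSAGE"))
    static String errorMessage(String componentId) {
        return (String) globalMap.get(componentId + "_ERROR_MESSAGE");
    }

    public static void main(String[] args) {
        // A component would populate this entry when an error occurs.
        globalMap.put("tMap_1_ERROR_MESSAGE", "type mismatch in column Age");
        System.out.println(errorMessage("tMap_1"));
    }
}
```

Note that the key is the component name followed by the variable name, which is why the variable list offered by Ctrl + Space is scoped per component.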

Usage

Usage rule

Possible uses range from a simple reorganization of fields to the most
complex Jobs of data multiplexing or demultiplexing, transformation,
concatenation, inversion, filtering, and more.

Limitation

The use of tMap requires a minimum knowledge of Java
in order to fully exploit its functionalities.

This component is a junction step, and for this reason cannot be a
start or end component in the Job.

Mapping data using a filter and a simple explicit join

The Job described below reads data from a CSV file whose schema is stored in
the Repository, looks up a reference file whose schema is also stored in
the Repository, and then extracts data from these two files, based on a defined filter, to
an output file and reject files.

Linking the components

  1. Drop two tFileInputDelimited components,
    tMap and three tFileOutputDelimited components onto the design workspace.
  2. Rename the two tFileInputDelimited
    components as Cars and Owners,
    either by double-clicking the label in the design workspace or via the
    View tab of the Component view.
  3. Connect the two input components to tMap
    using Row > Main connections and label the connections as
    Cars_data and Owners_data
    respectively.
  4. Connect tMap to the three output
    components using Row > New Output (Main) connections and name the output
    connections as Insured, Reject_NoInsur and Reject_OwnerID
    respectively.

    tMap_2.png

Configuring the components

  1. Double-click the tFileInputDelimited
    component labelled Cars to display its Basic settings view.

    tMap_3.png

  2. Select Repository from the Property type list and select the component’s
    schema, cars in this scenario, from the Repository Content dialog box. The remaining
    fields are filled in automatically.
  3. Double-click the component labelled Owners and repeat
    the setting operation. Select the appropriate metadata entry,
    owners in this scenario.

    Note:

    In this scenario, the input schemas are stored in the Metadata node of the Repository tree view for easy retrieval. For further
    information regarding metadata creation in the Repository, see

    Talend Studio User Guide
    .

  4. Double-click the tMap component to open
    the Map Editor.

    Note that the input area is already filled with the defined input tables
    and that the top table is the main input table, and the respective row
    connection labels are displayed on the top bar of the table.
  5. Create a join between the two tables on the ID_Owner
    column by simply dropping the ID_Owner column from the
    Cars_data table onto the
    ID_Owner column in the
    Owners_data table.
  6. Define this join as an inner join by clicking the tMap settings button, clicking in the Value field for Join Model,
    clicking the small button that appears in the field, and selecting Inner Join from the Options dialog box.

    tMap_4.png

  7. Drag all the columns of the Cars_data table to the
    Insured table.
  8. Drag the ID_Owner,
    Registration, and ID_Reseller
    columns of the Cars_data table and the
    Name column of the Owners_data
    table to the Reject_NoInsur table.
  9. Drag all the columns of the Cars_data table to the
    Reject_OwnerID table.

    For more information regarding data mapping, see
    Talend Studio
    User Guide
    .
  10. Click the plus arrow button at the top of the Insured
    table to add a filter row.

    Drag the ID_Insurance column of the
    Owners_data table to the filter condition area and
    enter the formula meaning ‘not undefined’: Owners_data.ID_Insurance !=
    null.
    With this filter, the Insured table will gather all
    the records that include an insurance ID.
    tMap_5.png

  11. Click the tMap settings button at the top
    of the Reject_NoInsur table and set Catch output reject to true to define the table as a standard reject output flow to
    gather the records that do not include an insurance ID.

    tMap_6.png

  12. Click the tMap settings button at the top
    of the Reject_OwnerID table and set Catch lookup inner join reject to true so that this output table will gather the
    records from the Cars_data flow with missing or
    unmatched owner IDs.

    tMap_7.png

    Click OK to validate the mappings and
    close the Map Editor.
  13. Double-click each of the output components, one after the other, to define
    their properties. If you want a new file to be created, browse to the
    destination output folder, and type in a file name including the
    extension.

    tMap_8.png

    Select the Include header check box to
    reuse the column labels from the schema as header row in the output
    file.
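The routing configured in the Map Editor above can be summarized in plain Java. This is a hand-written sketch of what the settings do, not Talend-generated code; the record types and field names are simplified stand-ins for the cars and owners schemas.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class InsuranceRouting {
    record Car(int idOwner, String registration) {}
    record Owner(int idOwner, String name, String idInsurance) {}

    // Routes each car the way the tMap above does:
    //  - no matching owner          -> Reject_OwnerID  (catch lookup inner join reject)
    //  - matched, no insurance ID   -> Reject_NoInsur  (catch output reject)
    //  - matched, insurance present -> Insured         (filter: ID_Insurance != null)
    static Map<String, List<Car>> route(List<Car> cars, Map<Integer, Owner> owners) {
        Map<String, List<Car>> out = new HashMap<>();
        for (String t : List.of("Insured", "Reject_NoInsur", "Reject_OwnerID"))
            out.put(t, new ArrayList<>());
        for (Car c : cars) {
            Owner o = owners.get(c.idOwner());   // the inner join on ID_Owner
            if (o == null) out.get("Reject_OwnerID").add(c);
            else if (o.idInsurance() != null) out.get("Insured").add(c);
            else out.get("Reject_NoInsur").add(c);
        }
        return out;
    }
}
```

Each car thus lands in exactly one of the three output tables, which is why the three output files together cover the whole input.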

Executing the Job

  1. Press Ctrl + S to save
    your Job.
  2. Press F6 to run the
    Job.

    The output files are created, which contain the relevant data
    as defined.
    tMap_9.png

Mapping data using inner join rejections

This scenario, based on scenario 1, adds one input file containing details about
resellers and extra fields in the main output table. Two filters on inner joins are
added to gather specific rejections.

Linking the components

  1. Drop a tFileInputDelimited component and
    a tFileOutputDelimited component to the
    design workspace, and label the components as Resellers
    and No_Reseller_ID respectively.
  2. Connect the Resellers component to the tMap component using a Row >
    Main connection, and label the
    connection as Resellers_data.
  3. Connect the tMap component to the new
    tFileOutputDelimited component by using
    the Row connection named
    Reject_ResellerID.

    tMap_10.png

Configuring the components

  1. Double-click the Resellers component to display its
    Basic settings view.

    tMap_11.png

  2. Select Repository from the Property type list and select the component’s
    schema, resellers in this scenario, from the Repository Content dialog box. The remaining
    fields are filled in automatically.

    Note:

    In this scenario, the input schemas are stored in the Metadata node of the Repository tree view for easy retrieval. For further
    information regarding metadata creation in the Repository, see

    Talend Studio User Guide
    .

  3. Double-click the tMap component to open
    the Map Editor.

    Note that the schema of the new input component is already added in the
    Input area.
  4. Create a join between the main input flow and the new input flow by
    dropping the ID_Reseller column of the
    Cars_data table to the
    ID_Reseller column of the
    Resellers_data table.
  5. Click the tMap settings button at the top
    of the Resellers_data table and set Join Model to Inner
    Join
    .

    tMap_12.png

  6. Drag all the columns except ID_Reseller of the
    Resellers_data table to the main output table,
    Insured.

    tMap_13.png

    Note:

    When two inner joins are defined, you either need to define two
    different inner join reject tables to differentiate the two rejections
    or, if there is only one inner join reject output, both inner join
    rejections will be stored in the same output.

  7. Click the [+] button at the top of the
    output area to add a new output table, and name this new output table
    Reject_ResellerID.
  8. Drag all the columns of the Cars_data table to the
    Reject_ResellerID table.
  9. Click the tMap settings button and select
    Catch lookup inner join reject to
    true to define this new output table as
    an inner join reject output.

    If the defined inner join cannot be established, the information about
    the relevant cars will be gathered through this output flow.
    tMap_14.png

  10. Now apply filters on the two Inner Join reject outputs in order to
    distinguish the two types of rejection.

    In the first Inner Join output table, Reject_OwnerID,
    click the plus arrow button to add a filter line and fill it with the
    following formula to gather only owner ID related rejection:
    Owners_data.ID_Owner==null
  11. In the second Inner Join output table,
    Reject_ResellerID, repeat the same operation using
    the following formula: Resellers_data.ID_Reseller==null

    tMap_15.png

    Click OK to validate the map settings and
    close the Mapper Editor.
  12. Double-click the No_Reseller_ID component to display
    its Basic settings view.

    tMap_16.png

    Specify the output file path and select the Include
    Header
    check box, and leave the other parameters as they
    are.
  13. To demonstrate the work of the Mapper, in this example, remove reseller
    IDs 5 and 8 from the input file
    Resellers.csv.
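Because both reject tables catch inner join rejects, it is the two null-check filters above that decide where a rejected row lands. A minimal hand-written sketch of that decision (not generated code; the method and parameter names are ours):

```java
public class RejectSplit {
    // After the two lookups, each rejected row carries possibly-null join results.
    // The filters configured above separate the rejections:
    //   Owners_data.ID_Owner == null       -> Reject_OwnerID
    //   Resellers_data.ID_Reseller == null -> Reject_ResellerID
    static String rejectTable(Integer joinedOwnerId, Integer joinedResellerId) {
        if (joinedOwnerId == null) return "Reject_OwnerID";
        if (joinedResellerId == null) return "Reject_ResellerID";
        return null; // not a reject: the row went to the main output
    }
}
```

Without these filters, rejections from both inner joins would be mixed in whichever reject output they reach.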

Executing the Job

  1. Press Ctrl + S to save your Job.
  2. Press F6 to run the Job.

The four output files are all created in the specified folder, containing
information as defined. The output file
No_Reseller_ID.csv contains the
cars information related to reseller IDs
5 and 8, which are missing in
the input file Resellers.csv.

tMap_17.png

As a third advanced use scenario, based on scenario 2, you could add a
new input table containing insurance details, for example.

Set up an Inner Join between the two lookup input tables (Owners
and Insurance) in the Mapper to create a cascading lookup and hence retrieve
insurance details via the Owners table data.

Advanced mapping using filters, explicit joins and rejections

This scenario introduces a Job that allows you to find BMW owners who have two
to six children (inclusive), for sales promotion purpose for example.

Linking the components

  1. Drop three tFileInputDelimited
    components, a tMap component, and two
    tFileOutputDelimited components from
    the Palette onto the design workspace, and
    label them to best describe their functions.
  2. Connect the input components to the tMap
    using Row > Main connections.

    Pay attention to the file you connect first as it will automatically be
    set as Main flow, and all the other
    connections will be Lookup flows. In this
    example, the connection for the input component Owners
    is the Main flow.
    tMap_18.png

Configuring the components

  1. Define the properties of each input component in the respective Basic settings view. Define the properties of
    Owners.

    tMap_19.png

  2. Select Repository from the Property type list and select the component’s
    schema, owners in this scenario, from the Repository Content dialog box. The remaining
    fields are filled in automatically.

    Note:

    In this scenario, the input schemas are stored in the Metadata node of the Repository tree view for easy retrieval. For further
    information regarding metadata creation in the Repository, see

    Talend Studio User Guide
    .

    In the same way, set the properties of the other input components:
    Cars and Resellers. These two
    Lookup flows will fill in secondary
    (lookup) tables in the input area of the Map
    Editor
    .
  3. Then double-click the tMap component to
    launch the Map Editor and define the
    mappings and filters.

    Set an explicit join between the Main
    flow Owner and the Lookup flow Cars by dropping the
    ID_Owner column of the Owners
    table to the ID_Owner column of the
    Cars table.
    The explicit join is displayed along with a hash key.
    tMap_20.png

  4. In the Expr. Key field of the
    Make column, type in a filter. In this use case,
    simply type in "BMW" as the search is focused on the owners of
    this particular make.

    tMap_21.png

  5. Implement a cascading join between the two lookup tables
    Cars and Resellers on the
    ID_Reseller column in order to retrieve resellers
    information.
  6. As you want to reject the null values into a separate table and exclude
    them from the standard output, click the tMap
    settings
    button and set Join
    Model
    to Inner Join in each
    of the Lookup tables.

    tMap_22.png

  7. In the tMap settings, you can set Match
    Model
    to Unique match,
    First match, or All matches. In this use case, the All
    matches
    option is selected. Thus, if several matches are found
    for the Inner Join (rows matching the explicit join as well as the filter),
    all of them will be added to the output flow (either the rejection or the
    regular output).

    Note:

    The Unique match option functions as
    a Last match. The First match and All
    matches
    options function as named.

  8. On the output area of the Map Editor,
    click the plus button to add two tables, one for the full matches and the
    other for the rejections.
  9. Drag all the columns of the Owners table, the
    Registration, Make and
    Color columns of the Cars
    table, and the ID_Reseller and
    Name_Reseller columns of the
    Resellers table to the main output table.
  10. Drag all the columns of the Owners table to the
    reject output table.
  11. Click the Filter button at the top of the
    main output table to display the Filter
    expression area.

    Type in a filter statement to narrow down the number of rows loaded in the
    main output flow. In this use case, the statement reads:
    Owners.Children_Nr >= 2 && Owners.Children_Nr <=
    6
    .
  12. In the reject output table, click the tMap
    settings
    button and set the reject types.

    Set Catch output reject to true to collect data about BMW car owners who
    have less than two or more than six children.
    Set Catch lookup inner join reject to
    true to collect data about owners of
    other car makes and owners for whom the reseller information is not
    found.
    tMap_23.png

    Click OK to validate the mappings and
    close the Map Editor.
    On the design workspace, right-click the tMap and pull the respective output link to the relevant
    output components.
  13. Define the properties of the output components in their respective
    Basic settings view.

    In this use case, simply specify the output file paths and select the
    Include Header check box, and leave the
    other parameters as they are.
    tMap_24.png
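tMap filter and join expressions are plain Java, so the settings above can be checked in isolation. The sketch below is illustrative only: keep() mirrors the output filter from step 11, and allMatches() mimics the All matches behavior of producing one output row per matching lookup row (the method names are ours, not Talend's).

```java
import java.util.List;
import java.util.stream.Collectors;

public class OwnerFilter {
    // Mirrors the main output filter:
    // Owners.Children_Nr >= 2 && Owners.Children_Nr <= 6
    static boolean keep(int childrenNr) {
        return childrenNr >= 2 && childrenNr <= 6;
    }

    // With Match Model = All matches, every matching lookup row yields an output row.
    static List<String> allMatches(String owner, List<String> matchingCars) {
        return matchingCars.stream()
                .map(car -> owner + "/" + car)
                .collect(Collectors.toList());
    }
}
```

Note that the filter bounds are inclusive, matching the "two to six children (inclusive)" requirement of the scenario.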

Executing the Job

  1. Press Ctrl + S to save your Job.
  2. Press F6 to run it.

    The main output file contains the information related to BMW owners who
    have two to six children, and the reject output file contains the
    information about the rest of the car owners.
    tMap_25.png

Advanced mapping with filters and different rejections

This scenario is a modified version of the preceding scenario. It describes a Job that
applies filters to limit the search to BMW and Mercedes owners who have two to six
children and divides unmatched data into different reject output flows.

Linking the components

  1. Take the same Job as in Advanced mapping using filters, explicit joins and rejections.
  2. Drop a new tFileOutputDelimited component
    from the Palette on the design workspace,
    and name it Rejects_BMW_Mercedes to present its
    functionality.
  3. Connect the tMap component to the new
    output component using a Row connection and
    label the connection according to the functionality of the output component.

    This connection label will appear as the name of the new output table in
    the Map Editor.
  4. Relabel the existing output connections and output components to reflect
    their functionality.

    The existing output tables in the Map
    Editor
    will be automatically renamed according to the
    connection labels. In this example, relabel the existing output connections
    BMW_Mercedes_withChildren and
    Owners_Other_Makes respectively.
    tMap_26.png

Configuring the components

  1. Double-click the tMap component to launch
    the Map Editor to change the mappings and
    the filters.

    Note that the output area contains a new, empty output table named
    Rejects_BMW_Mercedes. You can adjust the position
    of the table by selecting it and clicking the Up or Down arrow button at
    the top of the output area.
  2. Remove the Expr. key filter
    (“BMW”) from the Cars table in
    the input area.
  3. Click the Filters button to display the
    Filter field, and type in a new filter
    to limit the search to BMW or
    Mercedes car makes. The statement reads as follows:
    Cars.Make.equals("BMW") ||
    Cars.Make.equals("Mercedes")

    tMap_27.png

  4. Select all the columns of the main output table and drop them down to the
    new output table.

    Alternatively, you can also drag the corresponding columns from the
    relevant input tables to the new output table.
  5. Click the tMap settings button at the top
    of the new output table and set Catch output
    reject
    to true to collect
    data about BMW and Mercedes owners who have less than two or more than six
    children.
  6. In the Owners_Other_Makes table, set Catch lookup inner join reject to true to collect data about owners of other car
    makes and owners for whom the reseller information is not found.

    tMap_28.png

  7. Click OK to validate the mappings and
    close the Map Editor.
  8. Define the properties of the output components in their respective
    Basic settings view.

    In this use case, simply specify the output file paths and select the
    Include Header check box, and leave the
    other parameters as they are.
    tMap_29.png
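The filter entered in step 3 is an ordinary Java boolean expression. Below is a sketch with a defensive variant: in tMap you would write the expression exactly as in step 3, but if Make can be null in the input data, putting the constant first avoids a NullPointerException (this variant is our suggestion, not part of the scenario).

```java
public class MakeFilter {
    // As written in the scenario:
    // Cars.Make.equals("BMW") || Cars.Make.equals("Mercedes")
    static boolean keep(String make) {
        return make.equals("BMW") || make.equals("Mercedes");
    }

    // Null-safe variant: constant first, so a null Make is simply filtered out.
    static boolean keepNullSafe(String make) {
        return "BMW".equals(make) || "Mercedes".equals(make);
    }
}
```

Also note that equals() is required here; == would compare object references, not string values.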

Executing the Job

  1. Press Ctrl + S to save the Job.
  2. Press F6 to run it.

    The content of the main output file shows that the
    filtered rows have been correctly passed on.
    tMap_30.png

Advanced mapping with lookup reload at each row

The following scenario describes a Job that retrieves people details from a
lookup database, based on a join on the age. The main flow source data is read from a
MySQL database table called people_age that contains people details
such as a numeric id, an alphanumeric first name and last name, and a numeric age. The
age is either 40 or 60. The number of records in this table is intentionally
restricted.

The reference or lookup information is also stored in a MySQL database
table called large_data_volume. This lookup table contains a
number of records, including the cities that the people from the main flow have visited. For
the sake of clarity, the number of records is restricted, but in normal use the
usefulness of the feature described in the example below is more obvious with very large
reference data volumes.

To optimize performance, a database connection component is used at the
beginning of the Job to open the connection to the lookup database table, so that the
connection does not have to be reopened every time a row is loaded from the lookup table.

An Expression Filter is applied to this lookup source flow, in order to
select only data from people whose age is equal to 60 or 40. This way only the relevant
rows from the lookup database table are loaded for each row from the main flow.

Therefore this Job shows how, from a limited number of main flow rows, the
lookup join can be optimized to load only results matching the expression key.

Note:

Generally speaking, as the lookup loading is performed for each main
flow row, this option is mainly interesting when a limited number of rows is
processed in the main flow while a large number of reference rows are to be looked
up.

The join is solved on the age field. Then, using the
relevant loading option in the tMap component
editor, the lookup database information is loaded for each main flow incoming row.

For this Job, the metadata has been prepared for the source and connection
components. For more information on how to set up the DB connection schema metadata, see
the relevant section in the
Talend Studio User Guide
.

This Job is composed of five components: four database components and a
mapping component.

Linking the components

  1. Drop the DB Connection under the Metadata
    node of the Repository to the design
    workspace. In this example, the source table is called
    people_age.
  2. Select tMysqlInput from the list that
    pops up when dropping the component.

    tMap_31.png

  3. Drop the lookup DB connection table from the Metadata node to the design workspace selecting tMysqlInput from the list that pops up. In this
    Job, the lookup is called large_data_volume.
  4. In the same way, drop the DB connection from the Metadata node to the design workspace, selecting tMysqlConnection from the list that pops up. This
    component creates a permanent connection to the lookup database table, so
    that the connection does not have to be reopened every time a row is loaded
    from the lookup table.
  5. Then drop the tMap component from the
    Processing family, and the tMysqlOutput and tMysqlCommit components from the Database family, from the Palette
    on the right-hand side of the editor onto the design workspace.
  6. Now connect all the components together. To do so, right-click the
    tMysqlInput component corresponding to
    the people table and drag the link towards tMap.
  7. Release the link over the tMap component,
    the main row flow is automatically set up.
  8. Rename the Main row link to
    people, to identify more easily the main flow
    data.
  9. Perform the same operation to connect the lookup table
    (large_data_volume) to the tMap component and the tMap
    to the tMysqlOutput component.
  10. A dialog box prompts for the name of the output link. In this example, the
    output flow is named people_mixandmatch.
  11. Rename also the lookup row connection link to
    large_volume, to help identify the reference data
    flow.
  12. Connect tMysqlConnection to tMysqlInput using the trigger link OnSubjobOk.
  13. Connect the tMysqlInput component to the
    tMysqlCommit component using the
    trigger link OnSubjobOk.

    tMap_32.png

Configuring the components

  1. Double-click the tMap component to open
    the graphical mapping editor.

    tMap_33.png

  2. The Output table (created
    automatically when you linked the tMap to
    the tMysqlOutput component) will be formed by the
    matching rows from the lookup flow (large_data_volume)
    and the main flow (people_age).

    Select the main flow rows that are to be passed on to the output and drag
    them over to paste them in the Output table (to the right hand side of the
    mapping editor).
    In this example, the selection from the main flow includes the following
    fields: id, first_name,
    last_Name and age.
    From the lookup table, the following column is selected:
    city.
    Drop the selected columns from the input tables
    (people and large_volume) to
    the output table.
  3. Now set up the join between the main and lookup flows.

    Select the age column of the main flow table (on top)
    and drag it towards the age column of the lookup flow
    table (large_volume in this example).
    A key icon appears next to the linked expression on the lookup table. The
    join is now established.
  4. Click the tMap settings button, click the
    three-dot button corresponding to Lookup
    Model
    , and select the Reload at each
    row
    option from the Options dialog box in order to reload the lookup for each
    row being processed.

    tMap_34.png

  5. In the same way, set Match Model to
    All matches in the Lookup table, in
    order to gather all instances of age matches in the
    output flow.
  6. Now implement the filtering, based on the age column,
    in the Lookup table. The GlobalMapKey field
    is automatically created when you select the Reload
    at each row
    option. You can use this expression to
    dynamically filter the reference data in order to load only the relevant
    information when joining with the main flow.

    As mentioned in the introduction of the scenario, the main flow data
    contains only people whose age is either 40 or 60. To avoid
    loading all lookup rows, including ages that are different from 40 and 60,
    you can use the main flow age as a global variable to feed the lookup
    filtering.
    tMap_35.png

  7. Drop the Age column from the main flow table to the
    Expr. field of the lookup table.
  8. Then in the globalMap Key field, put in
    the variable name, using the expression. In this example, it reads:
    "people.Age"

    Click OK to save the mapping setting and
    go back to the design workspace.
  9. To finalize the implementation of the dynamic filtering of the lookup
    flow, you need now to add a WHERE clause in the query of the database
    input.

    tMap_36.png

  10. At the end of the Query field, following
    the Select statement, type in the following WHERE clause:
    WHERE AGE
    ='"+((Integer)globalMap.get("people.Age"))+"'"

  11. Make sure that the type corresponds to the column used as the variable. In
    this use case, Age is of Integer
    type. Use the variable the way you set it in the globalMap key field of the Map Editor.
  12. Double-click the tMysqlOutput component
    to define its properties.

    tMap_37.png

  13. Select the Use an existing connection
    check box to leverage the created DB connection.

    Define the target table name and relevant DB actions.
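Putting steps 8 and 10 together: for each main-flow row, tMap publishes the row's age under the "people.Age" globalMap key, and the lookup input's query picks it up. Below is a simplified standalone sketch of how the final query string is assembled; the table and column names come from the scenario, and the plain map here stands in for Talend's globalMap.

```java
import java.util.HashMap;
import java.util.Map;

public class LookupReload {
    // Rebuilds the lookup query for the current main-flow row, as in step 10:
    // "SELECT ... WHERE AGE='" + ((Integer) globalMap.get("people.Age")) + "'"
    static String lookupQuery(Map<String, Object> globalMap) {
        return "SELECT * FROM large_data_volume WHERE AGE='"
                + ((Integer) globalMap.get("people.Age")) + "'";
    }

    public static void main(String[] args) {
        Map<String, Object> globalMap = new HashMap<>();
        globalMap.put("people.Age", 60); // set by tMap before each lookup reload
        System.out.println(lookupQuery(globalMap));
    }
}
```

Because the lookup is reloaded at each row, this query runs once per main-flow row, each time with the current row's age.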

Executing the Job

  1. Press Ctrl + S to save the Job.
  2. Click the Run tab at the bottom of the
    design workspace, to display the Job execution tab.
  3. From the Debug Run view, click the
    Traces Debug button to view the data
    processing progress.

    For easier viewing, you can maximize the Job design view during execution by
    simply double-clicking the Job name tab.
    tMap_38.png

    The lookup data is reloaded for each of the main flow’s rows,
    corresponding to the age constraint. All age matches
    are retrieved in the lookup rows and grouped together in the output
    flow.
    Therefore, if you check out the data contained in the newly created
    people_mixandmatch table, you will find all the
    age duplicates corresponding to different
    individuals whose age equals 60 or 40, along with the cities where they have
    been.
    tMap_39.png

Mapping with join output tables

The following scenario describes a Job that processes reject flows without
separating them from the main flow.

Linking the components

  1. In the Repository tree view, click
    Metadata > File delimited. Drag and drop the
    customers metadata onto the workspace.

    The customers metadata contains information about
    customers, such as their ID, their name or their address, etc.
    For more information about centralizing metadata, see
    Talend Studio
    User Guide
    .
  2. In the dialog box that asks you to choose which component type you want to
    use, select tFileInputDelimited and click
    OK.
  3. Drop the states metadata onto the design workspace.
    Select the same component in the dialog box and click OK.

    The states metadata contains the ID of the state,
    and its name.
  4. Drop a tMap and two tLogRow components from the Palette onto the design workspace.
  5. Connect the customers component to the tMap, using a Row >
    Main
    connection.
  6. Connect the states component to the tMap, using a Row >
    Main
    connection. This flow will automatically be defined as
    Lookup.

    tMap_40.png

Configuring the components

  1. Double-click the tMap component to open
    the Map Editor.

    Drop the idState column from the main input table to
    the idState column of the lookup table to create a
    join.
    Click the tMap settings button and set
    Join Model to Inner Join.
  2. Click the Property Settings button at the
    top of the input area to open the Property
    Settings
    dialog box, and clear the Die
    on error
    check box in order to handle the execution errors.

    The ErrorReject table is automatically
    created.
    tMap_41.png

  3. Select the id, idState,
    RegTime, and RegisterTime columns in
    the input table and drag them to the ErrorReject table.

    tMap_42.png

  4. Click the [+] button at the top right of
    the editor to add an output table. In the dialog box that opens, select
    New output. In the field next to it,
    type in the name of the table, out1. Click OK.
  5. Drag the following columns from the input tables to the
    out1 table: id,
    CustomerName, idState, and
    LabelState.

    Add two columns, RegTime and
    RegisterTime, to the end of the
    out1 table and set their date formats:
    "dd/MM/yyyy HH:mm" and "yyyy-MM-dd
    HH:mm:ss.SSS"
    respectively.
  6. Click in the Expression field for the
    RegTime column, and press Ctrl+Space to display the auto-completion list. Find and
    double-click
    TalendDate.parseDate
    . Change the pattern to
    ("dd/MM/yyyy HH:mm",row1.RegTime).
  7. Do the same thing for the RegisterTime column, but
    change the pattern to ("yyyy-MM-dd
    HH:mm:ss.SSS",row1.RegisterTime)
    .

    tMap_43.png

  8. Click the [+] button at the top of the
    output area to add an output table. In the dialog box that opens, select
    Create join table from, choose
    out1, and name it rejectInner.
    Click OK.
  9. Click the tMap settings button and set
    Catch lookup inner join reject to
    true in order to handle rejects.
  10. Drag the id, CustomerName, and
    idState columns from the input tables to the
    corresponding columns of the rejectInner table.

    Click in the Expression field for the
    LabelState column, and type in
    "UNKNOWN".
  11. Click in the Expression field for the
    RegTime column, press Ctrl+Space, and select TalendDate.parseDate. Change the
    pattern to ("dd/MM/yyyy HH:mm",row1.RegTime).
  12. Click in the Expression field for the
    RegisterTime column, press Ctrl+Space, and select TalendDate.parseDate, but change
    the pattern to ("yyyy-MM-dd
    HH:mm:ss.SSS",row1.RegisterTime)
    .

    If the data from row1 does not match the expected pattern, the row
    will be routed to the ErrorReject flow.
    tMap_43.png

    Click OK to validate the changes and
    close the editor.
  13. Double-click the first tLogRow component
    to display its Component view.

    Click Sync columns to retrieve the
    schema structure from the mapper if needed.
    In the Mode area, select Table.
    Do the same thing with the second tLogRow.
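The date handling configured in the steps above can be sketched outside the Studio. TalendDate.parseDate accepts java.text.SimpleDateFormat-style patterns, so the following plain-Java approximation (not Talend's generated code) shows why a well-formed value maps successfully while a malformed one raises the parse error that, with Die on error cleared, sends the row to the ErrorReject flow:

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;

public class ParseDateSketch {
    // Approximates TalendDate.parseDate with SimpleDateFormat-style
    // patterns: parsing fails when the value does not match the pattern.
    static boolean isValid(String pattern, String value) {
        SimpleDateFormat fmt = new SimpleDateFormat(pattern);
        fmt.setLenient(false); // strict matching against the pattern
        try {
            fmt.parse(value);
            return true;       // row can be mapped to out1 or rejectInner
        } catch (ParseException e) {
            return false;      // with Die on error cleared, row goes to ErrorReject
        }
    }

    public static void main(String[] args) {
        // The two patterns set in the out1 and rejectInner tables:
        System.out.println(isValid("dd/MM/yyyy HH:mm", "12/04/2010 15:30"));
        System.out.println(isValid("yyyy-MM-dd HH:mm:ss.SSS", "12/04/2010 15:30"));
    }
}
```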

Executing the Job

  1. Press Ctrl + S to save your Job.
  2. Press F6 to execute it.

The Run console displays the main output
flow and the ErrorReject flow. The main output flow combines both valid data
and inner join rejects, while the ErrorReject flow contains the error
information about rows with unparseable date formats.

tMap_45.png

For examples of how to use dynamic
schemas with tMap, see:

tMap MapReduce properties (deprecated)

These properties are used to configure tMap running in the MapReduce Job framework.

The MapReduce
tMap component belongs to the Processing family.

The component in this framework is available in all subscription-based Talend products with Big Data
and Talend Data Fabric.

The MapReduce framework is deprecated from Talend 7.3 onwards. Use Talend Jobs for Apache Spark to accomplish your integration tasks.

Basic settings

Map editor

It allows you to define the tMap
routing and transformation properties.

Note: If you do not want to handle execution errors, you can click
the Property Settings button at the top
of the input area and select the Die on error
check box (selected by default) in the Property Settings dialog box. It will kill the Job if there is
an error.
Note: To maximize the data transformation performance in a Job
that handles multiple lookup input flows with large amounts of data, you can
select the Lookup in parallel check box
in the Property Settings dialog box.

However, in a Map/Reduce Job, only one expression key is
allowed per mapping component. If you need to use multiple expression keys
to join different input tables, use multiple tMap components one after
another.

Mapping links display as

Auto: the default setting; mapping links
display as curves.

Curves: the mapping links display as
curves.

Lines: the mapping links display as
straight lines. This last option slightly enhances
performance.

Temp data directory path Enter the path where you want to store the temporary
data generated for lookup loading. For more information on this folder,
see
Talend Studio User
Guide
.

Preview

The preview is an instant snapshot of the Mapper data. It becomes available when
the Mapper properties have been filled in with data. The preview synchronization
takes effect only after changes are saved.

Use replicated join

Select this check box to perform a replicated join between the input flows.
By replicating each lookup table into memory, this type of join doesn’t require an
additional shuffle-and-sort step, thus speeding up the whole process.

You need to ensure that the lookup tables fit entirely in
memory.
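A replicated join amounts to copying each (small) lookup table into an in-memory hash map and probing it once per main-flow row, which is why no shuffle-and-sort step is needed but the lookup must fit in memory. A minimal plain-Java sketch of the idea, not Talend's actual implementation:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ReplicatedJoinSketch {
    // Replicated join: the lookup table is small enough to hold in memory,
    // so each main row is resolved by a hash probe with no shuffle-and-sort.
    static List<String> join(List<String[]> main, Map<Integer, String> lookup) {
        List<String> out = new ArrayList<>();
        for (String[] row : main) {
            int key = Integer.parseInt(row[0]);
            String label = lookup.get(key);   // in-memory probe
            if (label != null) {              // inner-join semantics
                out.add(row[1] + "," + label);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<Integer, String> states = new HashMap<>();
        states.put(1, "California");
        states.put(2, "Texas");
        List<String[]> customers = List.of(
            new String[]{"1", "Ada"},
            new String[]{"3", "Grace"});      // no match: row is dropped
        System.out.println(join(customers, states));
    }
}
```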

Advanced settings

Max buffer size (nb of rows) Type in the size of physical memory, in number of rows,
you want to allocate to processed data.
Ignore trailing zeros for BigDecimal Select this check box to ignore trailing zeros for
BigDecimal data.

Global Variables

Global Variables

ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable and it returns a string. This
variable functions only if the Die on error check box is
cleared, if the component has this check box.

A Flow variable functions during the execution of a component while an After variable
functions after the execution of the component.

To fill in a field or expression with a variable, press Ctrl +
Space
to access the variable list and choose the variable to use from it.

For further information about variables, see
Talend Studio

User Guide.
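In the Java code that Talend generates for a Job, such variables typically live in a shared globalMap, keyed by the component name and the variable name. The following sketch only simulates that convention (the component name tMap_1 and the map itself are illustrative stand-ins, not the generated code):

```java
import java.util.HashMap;
import java.util.Map;

public class GlobalVarSketch {
    // Simulates the Job-level globalMap: component variables are stored
    // under "<componentName>_<VARIABLE>". The names here are illustrative.
    static final Map<String, Object> globalMap = new HashMap<>();

    static String errorMessageOf(String component) {
        return (String) globalMap.get(component + "_ERROR_MESSAGE");
    }

    public static void main(String[] args) {
        // Simulate tMap_1 recording an error message after its execution
        // (ERROR_MESSAGE is an After variable).
        globalMap.put("tMap_1_ERROR_MESSAGE", "Unparseable date: \"bad value\"");
        System.out.println(errorMessageOf("tMap_1"));
    }
}
```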

Usage

Usage rule

In a
Talend
Map/Reduce Job, this component is used as an intermediate
step and other components used along with it must be Map/Reduce components, too. They
generate native Map/Reduce code that can be executed directly in Hadoop.

As explained earlier, if you need to use multiple expression keys
to join different input tables, use multiple tMap components one after another.

For further information about a
Talend
Map/Reduce Job, see the sections
describing how to create, convert and configure a
Talend
Map/Reduce Job of the

Talend Open Studio for Big Data Getting Started Guide
.

Note that in this documentation, unless otherwise
explicitly stated, a scenario presents only Standard Jobs,
that is to say traditional
Talend
data integration Jobs, not Map/Reduce Jobs.

Related scenarios

No scenario is available for the Map/Reduce version of this component yet.

tMap properties for Apache Spark Batch

These properties are used to configure tMap running in the Spark Batch Job framework.

The Spark Batch
tMap component belongs to the Processing family.

The component in this framework is available in all subscription-based Talend products with Big Data
and Talend Data Fabric.

Basic settings

Map editor

It allows you to define the tMap routing and transformation properties, but note
that only the Load once lookup
model is supported by Spark Batch Jobs.

For further information about this Load once lookup model, see the related
description of Handling Lookups.

When you click the Property Settings
button at the top of the input area, a Property
Settings
dialog box is displayed in which you can set the
following parameters:

  • If you do not want to handle execution errors, select the
    Die on error check box
    (selected by default). It will kill the Job if there is an
    error.

  • To maximize the data transformation performance in a Job that
    handles multiple lookup input flows with large amounts of data,
    you can select the Lookup in
    parallel
    check box.

  • Temp data directory path:
    enter the path where you want to store the temporary data
    generated for lookup loading. For more information on this
    folder, see
    Talend Studio User
    Guide
    .

  • Max buffer size (nb of rows): enter the size of physical
    memory, in number of rows, you want to allocate to processed
    data.

Mapping links display as

Auto: the default setting; mapping links display as
curves.

Curves: the mapping links display as curves.

Lines: the mapping links display as straight
lines. This last option slightly enhances performance.

Preview

The preview is an instant snapshot of the Mapper data. It becomes
available when the Mapper properties have been filled in with data. The
preview synchronization takes effect only after changes are saved.

Use replicated join

Select this check box to perform a replicated join between the input
flows. By replicating each lookup table into memory, this type of join
doesn’t require an additional shuffle-and-sort step, thus speeding up
the whole process.

You need to ensure that the lookup tables fit entirely in memory.

Max buffer size (nb of rows) Type in the size of physical memory, in number of rows, you
want to allocate to processed data.

Usage

Usage rule

This component is used as an intermediate step.

This component, along with the Spark Batch component Palette it belongs to,
appears only when you are creating a Spark Batch Job.

Note that in this documentation, unless otherwise explicitly stated, a
scenario presents only Standard Jobs, that is to
say traditional
Talend
data integration Jobs.

Spark Connection

In the Spark
Configuration
tab in the Run
view, define the connection to a given Spark cluster for the whole Job. In
addition, since the Job expects its dependent jar files for execution, you must
specify the directory in the file system to which these jar files are
transferred so that Spark can access these files:

  • Yarn mode (Yarn client or Yarn cluster):

    • When using Google Dataproc, specify a bucket in the
      Google Storage staging bucket
      field in the Spark configuration
      tab.

    • When using HDInsight, specify the blob to be used for Job
      deployment in the Windows Azure Storage
      configuration
      area in the Spark
      configuration
      tab.

    • When using Altus, specify the S3 bucket or the Azure
      Data Lake Storage for Job deployment in the Spark
      configuration
      tab.
    • When using Qubole, add a
      tS3Configuration to your Job to write
      your actual business data in the S3 system with Qubole. Without
      tS3Configuration, this business data is
      written in the Qubole HDFS system and destroyed once you shut
      down your cluster.
    • When using on-premise
      distributions, use the configuration component corresponding
      to the file system your cluster is using. Typically, this
      system is HDFS and so use tHDFSConfiguration.

  • Standalone mode: use the
    configuration component corresponding to the file system your cluster is
    using, such as tHDFSConfiguration or
    tS3Configuration.

    If you are using Databricks without any configuration component present
    in your Job, your business data is written directly in DBFS (Databricks
    Filesystem).

This connection is effective on a per-Job basis.

Related scenarios

tMap properties for Apache Spark Streaming

These properties are used to configure tMap running in the Spark Streaming Job framework.

The Spark Streaming
tMap component belongs to the Processing family.

This component is available in Talend Real Time Big Data Platform and Talend Data Fabric.

Basic settings

Map editor

It allows you to define the tMap
routing and transformation properties.

When you click the Property Settings
button at the top of the input area, a Property
Settings
dialog box is displayed in which you can set the
following parameters:

  • If you do not want to handle execution errors, select the
    Die on error check box
    (selected by default). It will kill the Job if there is an
    error.

  • To maximize the data transformation performance in a Job that
    handles multiple lookup input flows with large amounts of data,
    you can select the Lookup in
    parallel
    check box.

  • Temp data directory path:
    enter the path where you want to store the temporary data
    generated for lookup loading. For more information on this
    folder, see
    Talend Studio User
    Guide
    .

  • Max buffer size (nb of rows): enter the size of physical
    memory, in number of rows, you want to allocate to processed
    data.

Mapping links display as

Auto: the default setting; mapping links display as
curves.

Curves: the mapping links display as curves.

Lines: the mapping links display as straight
lines. This last option slightly enhances performance.

Preview

The preview is an instant snapshot of the Mapper data. It becomes
available when the Mapper properties have been filled in with data. The
preview synchronization takes effect only after changes are saved.

Use replicated join

Select this check box to perform a replicated join between the input
flows. By replicating each lookup table into memory, this type of join
doesn’t require an additional shuffle-and-sort step, thus speeding up
the whole process.

You need to ensure that the lookup tables fit entirely in memory.

Usage

Usage rule

tMap usually works with a Lookup Input component, such as tMongoDBLookupInput, to construct and consume a lookup flow. In this situation,
you must use Reload at each row or Reload at each row (cache) to read data from the lookup flow; this approach ensures that no redundant records are stored in memory before being sent to tMap. For a use case in which tMap
is used with a Lookup Input component, see Reading and writing data in MongoDB using a Spark Streaming Job. Note that in a streaming Job, Reload at each row and Reload at each row (cache) are supported by the Lookup Input
components only.

Spark Connection

In the Spark
Configuration
tab in the Run
view, define the connection to a given Spark cluster for the whole Job. In
addition, since the Job expects its dependent jar files for execution, you must
specify the directory in the file system to which these jar files are
transferred so that Spark can access these files:

  • Yarn mode (Yarn client or Yarn cluster):

    • When using Google Dataproc, specify a bucket in the
      Google Storage staging bucket
      field in the Spark configuration
      tab.

    • When using HDInsight, specify the blob to be used for Job
      deployment in the Windows Azure Storage
      configuration
      area in the Spark
      configuration
      tab.

    • When using Altus, specify the S3 bucket or the Azure
      Data Lake Storage for Job deployment in the Spark
      configuration
      tab.
    • When using Qubole, add a
      tS3Configuration to your Job to write
      your actual business data in the S3 system with Qubole. Without
      tS3Configuration, this business data is
      written in the Qubole HDFS system and destroyed once you shut
      down your cluster.
    • When using on-premise
      distributions, use the configuration component corresponding
      to the file system your cluster is using. Typically, this
      system is HDFS and so use tHDFSConfiguration.

  • Standalone mode: use the
    configuration component corresponding to the file system your cluster is
    using, such as tHDFSConfiguration or
    tS3Configuration.

    If you are using Databricks without any configuration component present
    in your Job, your business data is written directly in DBFS (Databricks
    Filesystem).

This connection is effective on a per-Job basis.

Related scenarios

For a related scenario, see Analyzing a Twitter flow in near real-time.


Document retrieved from Talend https://help.talend.com