Component family |
Processing |
|
Function |
tMap is an advanced component, |
|
Purpose |
tMap transforms and routes data |
|
Basic settings |
Map editor |
It allows you to define the tMap NoteIf you do not want to handle execution errors, you can click NoteTo maximize the data transformation performance in a Job that However, in a Map/Reduce Job, only one This component offers the advantage of the dynamic schema feature. This allows you to This dynamic schema feature is designed for the purpose of retrieving unknown columns |
|
Mapping links display as |
Auto: the default setting is
Curves: the mapping display as
Lines: the mapping displays as |
Temp data directory path | Enter the path where you want to store the temporary data generated for lookup loading. For more information on this folder, see Talend Studio User Guide. |
|
|
Preview |
The preview is an instant shot of the Mapper data. It becomes |
Use replicated join |
Select this check box to perform a replicated join between the Note that this feature is available only to the Map/Reduce version |
|
Advanced settings | Max buffer size (nb of rows) | Type in the size of physical memory, in number of rows, you want to allocate to processed data. |
Ignore trailing zeros for BigDecimal | Select this check box to ignore trailing zeros for BigDecimal data. |
|
tStatCatcher Statistics |
Select this check box to gather the Job processing metadata at the |
|
Global Variables |
ERROR_MESSAGE: the error message generated by the A Flow variable functions during the execution of a component while an After variable To fill up a field or expression with a variable, press Ctrl + For further information about variables, see Talend Studio |
|
Usage |
Possible uses are from a simple reorganization of fields to the |
|
Usage in Map/Reduce Jobs |
If you have subscribed to one of the Talend solutions with Big Data, you can also As explained earlier, If you need to use multiple expression keys For further information about a Talend Map/Reduce Job, see the sections Note that in this documentation, unless otherwise explicitly stated, a scenario presents |
|
Log4j |
The activity of this component can be logged using the log4j feature. For more information on this feature, see Talend Studio User For more information on the log4j logging levels, see the Apache documentation at http://logging.apache.org/log4j/1.2/apidocs/org/apache/log4j/Level.html. |
|
Limitation |
The use of tMap supposes minimum This component is a junction step, and for this reason cannot be |
Note
For further information, see Talend Studio User
Guide.
The Job described below aims at reading data from a csv file with its schema stored in
the Repository, looking up at a reference file, the schema of which is also stored in
the Repository, then extracting data from these two files based on a defined filter to
an output file and reject files.
-
Drop two tFileInputDelimited components,
tMap and three tFileOutputDelimited components onto the design workspace. -
Rename the two tFileInputDelimited
components as Cars and Owners,
either by double-clicking the label in the design workspace or via the
View tab of the Component view. -
Connect the two input components to tMap
using Row > Main connections and label the connections as
Cars_data and Owners_data
respectively. -
Connect tMap to the three output
components using Row > New Output (Main) connections and name the output
connections as Insured, Reject_NoInsur and Reject_OwnerID
respectively.
-
Double-click the tFileInputDelimited
component labelled Cars to display its Basic settings view. -
Select Repository from the Property type list and select the component’s
schema, cars in this scenario, from the [Repository Content] dialog box. The rest fields
are automatically filled. -
Double-click the component labelled Owners and repeat
the setting operation. Select the appropriate metadata entry,
owners in this scenario.Note
In this scenario, the input schemas are stored in the Metadata node of the Repository tree view for easy retrieval. For further
information regarding metadata creation in the Repository, see
Talend Studio User Guide. -
Double-click the tMap component to open
the Map Editor.Note that the input area is already filled with the defined input tables
and that the top table is the main input table, and the respective row
connection labels are displayed on the top bar of the table. -
Create a join between the two tables on the ID_Owner
column by simply dropping the ID_Owner column from the
Cars_data table onto the
ID_Owner column in the
Owners_data table. -
Define this join as an inner join by clicking the tMap settings button, clicking in the Value field for Join Model,
clicking the small button that appears in the field, and selecting Inner Join from the [Options] dialog box. -
Drag all the columns of the Cars_data table to the
Insured table. -
Drag the ID_Owner,
Registration, and ID_Reseller
columns of the Cars_data table and the
Name column of the Owners_data
table to the Reject_NoInsur table. -
Drag all the columns of the Cars_data table to the
Reject_OwnerID table.For more information regarding data mapping, see Talend Studio
User Guide. -
Click the plus arrow button at the top of the Insured
table to add a filter row.Drag the ID_Insurance column of the
Owners_data table to the filter condition area and
enter the formula meaning ‘not undefined’:Owners_data.ID_Insurance !=
null.With this filter, the Insured table will gather all
the records that include an insurance ID. -
Click the tMap settings button at the top
of the Reject_NoInsur table and set Catch output reject to true to define the table as a standard reject output flow to
gather the records that do not include an insurance ID. -
Click the tMap settings button at the top
of the Reject_OwnerID table and set Catch lookup inner join reject to true so that this output table will gather the
records from the Cars_data flow with missing or
unmatched owner IDs.Click OK to validate the mappings and
close the Map Editor. -
Double-click each of the output components, one after the other, to define
their properties. If you want a new file to be created, browse to the
destination output folder, and type in a file name including the
extension.Select the Include header check box to
reuse the column labels from the schema as header row in the output
file.
-
Press Ctrl + S to save your Job.
-
Press F6 to run the Job.
The output files are created, which contain the relevant data as defined.
For examples of how to use dynamic
schemas with tMap, see:
This scenario, based on scenario 1, adds one input file containing details about
resellers and extra fields in the main output table. Two filters on inner joins are
added to gather specific rejections.
-
Drop a tFileInputDelimited component and
a tFileOutputDelimited component to the
design workspace, and label the components as Resellers
and No_Reseller_ID respectively. -
Connect it to the Mapper using a Row >
Main connection, and label the
connection as Resellers_data. -
Connect the tMap component to the new
tFileOutputDelimited component by using
the Row connection named
Reject_ResellerID.
-
Double-click the Resellers component to display its
Basic settings view. -
Select Repository from the Property type list and select the component’s
schema, resellers in this scenario, from the [Repository Content] dialog box. The rest fields
are automatically filled.Note
In this scenario, the input schemas are stored in the Metadata node of the Repository tree view for easy retrieval. For further
information regarding metadata creation in the Repository, see
Talend Studio User Guide. -
Double-click the tMap component to open
the Map Editor.Note that the schema of the new input component is already added in the
Input area. -
Create a join between the main input flow and the new input flow by
dropping the ID_Reseller column of the
Cars_data table to the
ID_Reseller column of the
Resellers_data table. -
Click the tMap settings button at the top
of the Resellers_data table and set Join Model to Inner
Join. -
Drag all the columns except ID_Reseller of the
Resellers_data table to the main output table,
Insured.Note
When two inner joins are defined, you either need to define two
different inner join reject tables to differentiate the two rejections
or, if there is only one inner join reject output, both inner join
rejections will be stored in the same output. -
Click the [+] button at the top of the
output area to add a new output table, and name this new output table
Reject_ResellerID. -
Drag all the columns of the Cars_data table to the
Reject_ResellerID table. -
Click the tMap settings button and select
Catch lookup inner join reject to
true to define this new output table as
an inner join reject output.If the defined inner join cannot be established, the information about
the relevant cars will be gathered through this output flow. -
Now apply filters on the two Inner Join reject outputs, in order for to
distinguish the two types of rejection.In the first Inner Join output table, Reject_OwnerID,
click the plus arrow button to add a filter line and fill it with the
following formula to gather only owner ID related rejection:
Owners_data.ID_Owner==null
-
In the second Inner Join output table,
Reject_ResellerID, repeat the same operation using
the following formula:Resellers_data.ID_Reseller==null
Click OK to validate the map settings and
close the Mapper Editor. -
Double-click the No_Reseller_ID component to display
its Basic settings view.Specify the output file path and select the Include
Header check box, and leave the other parameters as they
are. -
To demonstrate the work of the Mapper, in this example, remove reseller
IDs 5 and 8 from the input file
Resellers.csv.
-
Press Ctrl + S to save your Job.
-
Press F6 to run the Job.
The four output files are all created in the specified folder, containing
information as defined. The output file
No_Reseller_ID.csv contains the
cars information related to reseller IDs
5 and 8, which are missing in
the input file Resellers.csv.For examples of how to use dynamic
schemas with tMap, see:
As third advanced use scenario, based on the scenario 2, add a new Input table
containing Insurance details for example.
Set up an Inner Join between two lookup input tables (Owners and Insurance) in the
Mapper to create a cascade lookup and hence retrieve Insurance details via the Owners
table data.
This scenario introduces a Job that allows you to find BMW owners who have two to six
children (inclusive), for sales promotion purpose for example.
-
Drop three tFileInputDelimited
components, a tMap component, and two
tFileOutputDelimited components from
the Palette onto the design workspace, and
label them to best describe their functions. -
Connect the input components to the tMap
using Row > Main connections.Pay attention to the file you connect first as it will automatically be
set as Main flow, and all the other
connections will be Lookup flows. In this
example, the connection for the input component Owners
is the Main flow.
-
Define the properties of each input components in the respective Basic settings view. Define the properties of
Owners. -
Select Repository from the Property type list and select the component’s
schema, owners in this scenario, from the [Repository Content] dialog box. The rest fields
are automatically filled.Note
In this scenario, the input schemas are stored in the Metadata node of the Repository tree view for easy retrieval. For further
information regarding metadata creation in the Repository, see
Talend Studio User Guide.In the same way, set the properties of the other input components:
Cars and Resellers. These two
Lookup flows will fill in secondary
(lookup) tables in the input area of the Map
Editor. -
Then double-click the tMap component to
launch the Map Editor and define the
mappings and filters.Set an explicit join between the Main
flow Owner and the Lookup flow Cars by dropping the
ID_Owner column of the Owners
table to the ID_Owner column of the
Cars table.The explicit join is displayed along with a hash key.
-
In the Expr. Key field of the
Make column, type in a filter. In this use case,
simply type in"BMW"
as the search is focused on the owners of
this particular make. -
Implement a cascading join between the two lookup tables
Cars and Resellers on the
ID_Reseller column in order to retrieve resellers
information. -
As you want to reject the null values into a separate table and exclude
them from the standard output, click the tMap
settings button and set Join
Model to Inner Join in each
of the Lookup tables. -
In the tMap settings, you can set Match
Model to Unique match,
First match, or All matches. In this use case, the All
matches option is selected. Thus if several matches are found
in the Inner Join, rows matching the explicit join as well as the filter,
all of them will be added to the output flow (either in rejection or the
regular output).Note
The Unique match option functions as
a Last match. The First match and All
matches options function as named. -
On the output area of the Map Editor,
click the plus button to add two tables, one for the full matches and the
other for the rejections. -
Drag all the columns of the Owners table, the
Registration, Make and
Color columns of the Cars
table, and the ID_Reseller and
Name_Reseller columns of the
Resellers table to the main output table. -
Drag all the columns of the Owners table to the
reject output table. -
Click the Filter button at the top of the
main output table to display the Filter
expression area.Type in a filter statement to narrow down the number of rows loaded in the
main output flow. In this use case, the statement reads:
Owners.Children_Nr >= 2 && Owners.Children_Nr <=
.
6 -
In the reject output table, click the tMap
settings button and set the reject types.Set Catch output reject to true to collect data about BMW car owners who
have less than two or more than six children.Set Catch lookup inner join reject to
true to collect data about owners of
other car makes and owners for whom the reseller information is not
found.Click OK to validate the mappings and
close the Map Editor.On the design workspace, right-click the tMap and pull the respective output link to the relevant
output components. -
Define the properties of the output components in their respective
Basic settings view.In this use case, simple specify the output file paths and select the
Include Header check box, and leave the
other parameters as they are.
-
Press Ctrl + S to save your Job.
-
Press F6 to run it.
The main output file contains the information related to BMW owners who
have two to six children, and the reject output file contains the
information about the rest of the car owners.For examples of how to use dynamic
schemas with tMap, see:
This scenario is a modified version of the preceding scenario. It describes a Job that
applies filters to limit the search to BMW and Mercedes owners who have two to six
children and divides unmatched data into different reject output flows.
-
Take the same Job as in Scenario 4: Advanced mapping using filters, explicit joins and rejections.
-
Drop a new tFileOutputDelimited component
from the Palette on the design workspace,
and name it Rejects_BMW_Mercedes to present its
functionality. -
Connect the tMap component to the new
output component using a Row connection and
label the connection according to the functionality of the output component.This connection label will appear as the name of the new output table in
the Map Editor. -
Relabel the existing output connections and output components to reflect
their functionality.The existing output tables in the Map
Editor will be automatically renamed according to the
connection labels. In this example, relabel the existing output connections
BMW_Mercedes_withChildren and
Owners_Other_Makes respectively.
-
Double-click the tMap component to launch
the Map Editor to change the mappings and
the filters.Note that the output area contains a new, empty output table named
Rejects_BMW_Mercedes. You can adjust the position
of the table by selecting it and clicking the Up or Down arrow button at
the top of the output area. -
Remove the Expr. key filter
(“BMW”) from the Cars table in
the input area. -
Click the Filters button to display the
Filter field, and type in a new filter
to limit the search to BMW or
Mercedes car makes. The statement reads as follows:
Cars.Make.equals("BMW") ||
Cars.Make.equals("Mercedes") -
Select all the columns of the main output table and drop them down to the
new output table.Alternatively, you can also drag the corresponding columns from the
relevant input tables to the new output table. -
Click the tMap settings button at the top
of the new output table and set Catch output
reject to true to collect
data about BMW and Mercedes owners who have less than two or more than six
children. -
In the Owners_Other_Makes table, set Catch lookup inner join reject to true to collect data about owners of other car
makes and owners for whom the reseller information is not found. -
Click OK to validate the mappings and
close the Map Editor. -
Define the properties of the output components in their respective
Basic settings view.In this use case, simple specify the output file paths and select the
Include Header check box, and leave the
other parameters as they are.
-
Press Ctrl + S to save the Job.
-
Press F6 to run it.
The output files contain content of the main output flow shows that the
filtered rows have correctly been passed on.For examples of how to use dynamic
schemas with tMap, see:
The following scenario describes a Job that retrieves people details from a lookup
database, based on a join on the age. The main flow source data is read from a MySQL
database table called people_age that contains people details such
as numeric id, alphanumeric first name and last name and numeric age. The people age is
either 40 or 60. The number of records in this table is intentionally restricted.
The reference or lookup information is also stored in a MySQL database table called
large_data_volume. This lookup table contains a number of
records including the city where people from the main flow have been to. For the sake of
clarity, the number of records is restricted but, in a normal use, the usefulness of the
feature described in the example below is more obvious for very large reference data
volume.
To optimize performance, a database connection component is used in the beginning of
the Job to open the connection to the lookup database table in order not to do that
every time we want to load a row from the lookup table.
An Expression Filter is applied to this lookup source flow, in order to select only
data from people whose age is equal to 60 or 40. This way only the relevant rows from
the lookup database table are loaded for each row from the main flow.
Therefore this Job shows how, from a limited number of main flow rows, the lookup join
can be optimized to load only results matching the expression key.
Note
Generally speaking, as the lookup loading is performed for each main flow row,
this option is mainly interesting when a limited number of rows is processed in the
main flow while a large number of reference rows are to be looked up to.
The join is solved on the age field. Then, using the relevant
loading option in the tMap component editor, the lookup
database information is loaded for each main flow incoming row.
For this Job, the metadata has been prepared for the source and connection components.
For more information on how to set up the DB connection schema metadata, see the
relevant section in the Talend Studio User Guide.
This Job is formed with five components, four database components and a mapping
component.
-
Drop the DB Connection under the Metadata
node of the Repository to the design
workspace. In this example, the source table is called
people_age. -
Select tMysqlInput from the list that
pops up when dropping the component. -
Drop the lookup DB connection table from the Metadata node to the design workspace selecting tMysqlInput from the list that pops up. In this
Job, the lookup is called large_data_volume. -
The same way, drop the DB connection from the Metadata node to the design workspace selecting tMysqlConnection from the list that pops up. This
component creates a permanent connection to the lookup database table in
order not to do that every time we want to load a row from the lookup
table. -
Then pick the tMap component from the
Processing family, and the tMysqlOutput and tMysqlCommit components from the Database family in the Palette
to the right hand side of the editor. -
Now connect all the components together. To do so, right-click the
tMysqlInput component corresponding to
the people table and drag the link towards tMap. -
Release the link over the tMap component,
the main row flow is automatically set up. -
Rename the Main row link to
people, to identify more easily the main flow
data. -
Perform the same operation to connect the lookup table
(large_data_volume) to the tMap component and the tMap
to the tMysqlOutput component. -
A dialog box prompts for a name to the output link. In this example, the
output flow is named: people_mixandmatch. -
Rename also the lookup row connection link to
large_volume, to help identify the reference data
flow. -
Connect tMysqlConnection to tMysqlInput using the trigger link OnSubjobOk.
-
Connect the tMysqlInput component to the
tMysqlCommit component using the
trigger link OnSubjobOk.
-
Double-click the tMap component to open
the graphical mapping editor. -
The Output table (that was created
automatically when you linked the tMap to
the tMySQLOutput will be formed by the
matching rows from the lookup flow (large_data_volume)
and the main flow (people_age).Select the main flow rows that are to be passed on to the output and drag
them over to paste them in the Output table (to the right hand side of the
mapping editor).In this example, the selection from the main flow include the following
fields: id, first_name,
last_Name and age.From the lookup table, the following column is selected:
city.Drop the selected columns from the input tables
(people and large_volume) to
the output table. -
Now set up the join between the main and lookup flows.
Select the age column of the main flow table (on top)
and drag it towards the age column of the lookup flow
table (large_volume in this example).A key icon appears next to the linked expression on the lookup table. The
join is now established. -
Click the tMap settings button, click the
three-dot button corresponding to Lookup
Model, and select the Reload at each
row option from the [Options] dialog box in order to reload the lookup for each
row being processed. -
In the same way, set Match Model to
All matches in the Lookup table, in
order to gather all instances of age matches in the
output flow. -
Now implement the filtering, based on the age column,
in the Lookup table. The GlobalMapKey field
is automatically created when you selected the Reload
at each row option. Indeed you can use this expression to
dynamically filter the reference data in order to load only the relevant
information when joining with the main flow.As mentioned in the introduction of the scenario, the main flow data
contains only people whose age is either 40 or 60. To avoid the pain of
loading all lookup rows, including ages that are different from 40 and 60,
you can use the main flow age as global variable to feed the lookup
filtering. -
Drop the Age column from the main flow table to the
Expr. field of the lookup table. -
Then in the globalMap Key field, put in
the variable name, using the expression. In this example, it reads:
"people.Age"
Click OK to save the mapping setting and
go back to the design workspace. -
To finalize the implementation of the dynamic filtering of the lookup
flow, you need now to add a WHERE clause in the query of the database
input. -
At the end of the Query field, following
theSelect
statement, type in the following WHERE clause:
WHERE AGE
='"+((Integer)globalMap.get("people.Age"))+"'" -
Make sure that the type corresponds to the column used as variable. In
this use case, Age is of Integer
type. And use the variable the way you set in the globalMap key field of the map editor. -
Double-click the tMysqloutput component
to define its properties. -
Select the Use an existing connection
check box to leverage the created DB connection.Define the target table name and relevant DB actions.
-
Press Ctrl + S to save the Job.
-
Click the Run tab at the bottom of the
design workspace, to display the Job execution tab. -
From the Debug Run view, click the
Traces Debug button to view the data
processing progress.For more comfort, you can maximize the Job design view while executing by
simply double-clicking on the Job name tab.The lookup data is reloaded for each of the main flow’s rows,
corresponding to the age constraint. All age matches
are retrieved in the lookup rows and grouped together in the output
flow.Therefore if you check out the data contained in the newly created
people_mixandmatch table, you will find all the
age duplicates corresponding to different
individuals whose age equals to 60 or 40 and the city where they have been
to.For examples of how to use dynamic
schemas with tMap, see:
The following scenario describes a Job that processes reject flows without separating
them from the main flow.
-
In the Repository tree view, click
Metadata > File delimited. Drag and drop the
customers metadata onto the workspace.The customers metadata contains information about
customers, such as their ID, their name or their address, etc.For more information about centralizing metadata, see Talend Studio
User Guide. -
In the dialog box that asks you to choose which component type you want to
use, select tFileInputDelimited and click
OK. -
Drop the states metadata onto the design workspace.
Select the same component in the dialog box and click OK.The states metadata contains the ID of the state,
and its name. -
Drop a tMap and two tLogRow components from the Palette onto the design workspace.
-
Connect the customers component to the tMap, using a Row >
Main connection. -
Connect the states component to the tMap, using a Row >
Main connection. This flow will automatically be defined as
Lookup.
-
Double-click the tMap component to open
the Map Editor.Drop the idState column from the main input table to
the idState column of the lookup table to create a
join.Click the tMap settings button and set
Join Model to Inner Join. -
Click the Property Settings button at the
top of the input area to open the [Property
Settings] dialog box, and clear the Die
on error check box in order to handle the execution errors.The ErrorReject table is automatically
created. -
Select the id, idState,
RegTime and RegisterTime in
the input table and drag them to the ErrorReject table. -
Click the [+] button at the top right of
the editor to add an output table. In the dialog box that opens, select
New output. In the field next to it,
type in the name of the table, out1. Click OK. -
Drag the following columns from the input tables to the
out1 table: id,
CustomerName, idState, and
LabelState.Add two columns, RegTime and
RegisterTime, to the end of the
out1 table and set their date formats:
"dd/MM/yyyy HH:mm"
and"yyyy-MM-dd
respectively.
HH:mm:ss.SSS" -
Click in the Expression field for the
RegTime column, and press Ctrl+Space to display the auto-completion list. Find and
double-clickTalendDate.parseDate
. Change the pattern to
("dd/MM/yyyy HH:mm",row1.RegTime)
. -
Do the same thing for the RegisterTime column, but
change the pattern to("yyyy-MM-dd
.
HH:mm:ss.SSS",row1.RegisterTime) -
Click the [+] button at the top of the
output area to add an output table. In the dialog box that opens, select
Create join table from, choose
Out1, and name it rejectInner.
Click OK. -
Click the tMap settings button and set
Catch lookup inner join reject to
true in order to handle rejects. -
Drag the id, CustomerName, and
idState columns from the input tables to the
corresponding columns of the rejectInner table.Click in the Expression field for the
LabelState column, and type in
"UNKNOWN"
. -
Click in the Expression field for the
RegTime column, press Ctrl+Space, and selectTalendDate.parseDate
. Change the
pattern to("dd/MM/yyyy HH:mm",row1.RegTime)
. -
Click in the Expression field for the
RegisterTime column, press Ctrl+Space, and selectTalendDate.parseDate
, but change
the pattern to("yyyy-MM-dd
.
HH:mm:ss.SSS",row1.RegisterTime)If the data from row1 has a wrong pattern, it will
be returned by the ErrorReject flow.Click OK to validate the changes and
close the editor. -
Double-click the first tLogRow component
to display its Component view.Click Sync columns to retrieve the
schema structure from the mapper if needed.In the Mode area, select Table.
Do the same thing with the second tLogRow.
-
Press Ctrl + S to save your Job.
-
Press F6 to execute it.
The Run console displays the main out
flow and the ErrorReject flow. The main output flow unites both valid data
and inner join rejects, while the ErrorReject flow contains the error
information about rows with unparseable date formats.For examples of how to use dynamic
schemas with tMap, see: