tReplicate
Duplicates the incoming flow into two identical output flows.
This component allows you to perform different operations on the same
schema.
Depending on the Talend product you are using, this component can be used in one, some or all of the following Job frameworks:
- Standard: see tReplicate Standard properties. The component in this framework is available in all Talend products.
- MapReduce: see tReplicate MapReduce properties (deprecated). The component in this framework is available in all subscription-based Talend products with Big Data and Talend Data Fabric.
- Spark Batch: see tReplicate properties for Apache Spark Batch. The component in this framework is available in all subscription-based Talend products with Big Data and Talend Data Fabric.
- Spark Streaming: see tReplicate properties for Apache Spark Streaming. This component is available in Talend Real Time Big Data Platform and Talend Data Fabric.
- Storm: see tReplicate Storm properties (deprecated). This component is available in Talend Real Time Big Data Platform and Talend Data Fabric.
tReplicate Standard properties
These properties are used to configure tReplicate running in the Standard Job framework.
The Standard
tReplicate component belongs to the Orchestration family.
The component in this framework is available in all Talend
products.
Basic settings
Schema and Edit schema |
A schema is a row description. It defines the number of fields (columns) to be processed and passed on to the next component. Click Edit schema to make changes to the schema; if you make changes, the schema automatically becomes built-in. Click Sync columns to retrieve the schema from the previous component connected in the Job. |
 |
Built-In: You create and store the schema locally for this component only. |
 |
Repository: You have already created the schema and stored it in the Repository. You can reuse it in various projects and Job designs. |
Global Variables
Global Variables |
ERROR_MESSAGE: the error message generated by the component when an error occurs. This is an After variable and it returns a string. A Flow variable functions during the execution of a component, while an After variable functions after the execution of the component. To fill a field or expression with a variable, press Ctrl + Space to access the variable list and choose the variable to use from it. For further information about variables, see the Talend Studio documentation. |
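For example, a downstream tJava component can read this After variable from globalMap. A minimal sketch, assuming the component is named tReplicate_1 (the name is illustrative; globalMap is provided by the Talend-generated Job code):

```java
// Inside a tJava component, after the subJob containing tReplicate_1 has run.
String errorMessage = (String) globalMap.get("tReplicate_1_ERROR_MESSAGE");
if (errorMessage != null) {
    System.err.println("tReplicate_1 reported: " + errorMessage);
}
```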
Usage
Usage rule |
This component is not startable (green background) and it requires an output link. |
Connections |
Outgoing links (from this component to another): Row: Main. Trigger: Run if; On Component Ok; On Component Error.
Incoming links (from one component to this one): Row: Main; Reject.
For further information regarding connections, see the Talend Studio documentation. |
Replicating a flow and sorting two identical flows respectively
This scenario describes a Job that reads names and states from a CSV file,
replicates the input flow, sorts the two identical flows by name and by state
respectively, and displays the sorted data on the console.
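Before walking through the Studio steps, here is what the Job does, expressed as a minimal plain-Java sketch. This is an illustration only, not the code Talend generates; the file name and column positions follow the example below.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ReplicateAndSortSketch {
    public static void main(String[] args) throws Exception {
        // tFileInputDelimited: read the flow, skipping the one-line header.
        List<String[]> rows;
        try (Stream<String> lines = Files.lines(Paths.get("Names&States.csv"))) {
            rows = lines.skip(1)
                        .map(line -> line.split(";"))   // columns: name, state
                        .collect(Collectors.toList());
        }

        // tReplicate: duplicate the flow into two identical lists.
        List<String[]> byName = new ArrayList<>(rows);
        List<String[]> byState = new ArrayList<>(rows);

        // tSortRow x2: alphabetical ascending, on name and on state.
        byName.sort(Comparator.comparing((String[] r) -> r[0]));
        byState.sort(Comparator.comparing((String[] r) -> r[1]));

        // tLogRow x2: print both sorted flows.
        byName.forEach(r -> System.out.println(r[0] + " | " + r[1]));
        byState.forEach(r -> System.out.println(r[0] + " | " + r[1]));
    }
}
```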

Setting up the Job
- Drop the following components from the Palette to the design workspace: one tFileInputDelimited component, one tReplicate component, two tSortRow components, and two tLogRow components.
- Connect tFileInputDelimited to tReplicate using a Row > Main link.
- Repeat the step above to connect tReplicate to the two tSortRow components, and to connect each tSortRow component to a tLogRow component.
- Label the components to better identify their functions.
Configuring the components
- Double-click the tFileInputDelimited component to open its Basic settings view in the Component tab.
- Click the […] button next to the File name/Stream field to browse to the file from which you want to read the input flow. In this example, the input file is Names&States.csv, which contains two columns, name and state:

    name;state
    Andrew Kennedy;Mississippi
    Benjamin Carter;Louisiana
    Benjamin Monroe;West Virginia
    Bill Harrison;Tennessee
    Calvin Grant;Virginia
    Chester Harrison;Rhode Island
    Chester Hoover;Kansas
    Chester Kennedy;Maryland
    Chester Polk;Indiana
    Dwight Nixon;Nevada
    Dwight Roosevelt;Mississippi
    Franklin Grant;Nebraska

- Fill in the Header, Footer and Limit fields according to your needs. In this example, type in 1 in the Header field to skip the first row of the input file.
- Click Edit schema to define the data structure of the input flow.
- Double-click the first tSortRow component to open its Basic settings view.
- In the Criteria panel, click the [+] button to add one row and set the sorting parameters for the schema column to be processed. To sort the input data by name, select name under Schema column, alpha as the sorting type, and asc as the sorting order (see the sketch after this list). For more information about those parameters, see tSortRow Standard properties.
- Double-click the second tSortRow component and repeat the step above to define the sorting parameters for the state column.
- In the Basic settings view of each tLogRow component, select Table in the Mode area for a better view of the Job execution result.
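The alpha type compares values lexically as strings, whereas tSortRow's numeric sorting type compares them as numbers; the distinction matters for digit-only columns. A minimal plain-Java illustration (not tSortRow's actual code):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SortTypeSketch {
    public static void main(String[] args) {
        List<String> values = new ArrayList<>(List.of("10", "9", "2"));

        // alpha: lexical comparison, as used for the name column above.
        values.sort(Comparator.naturalOrder());
        System.out.println(values);   // [10, 2, 9]

        // Numeric comparison: what the numeric sorting type corresponds to.
        values.sort(Comparator.comparingInt(Integer::parseInt));
        System.out.println(values);   // [2, 9, 10]
    }
}
```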
Saving and executing the Job
- Press Ctrl+S to save your Job.
- Execute the Job by pressing F6 or by clicking Run on the Run tab.
  The data sorted by name and by state are both displayed on the console.
tReplicate MapReduce properties (deprecated)
These properties are used to configure tReplicate running in the MapReduce Job framework.
The MapReduce
tReplicate component belongs to the Processing family.
The component in this framework is available in all subscription-based Talend products with Big Data
and Talend Data Fabric.
The MapReduce framework is deprecated from Talend 7.3 onwards. Use Talend Jobs for Apache Spark to accomplish your integration tasks.
Basic settings
Schema and Edit schema |
A schema is a row description. It defines the number of fields (columns) to be processed and passed on to the next component. Click Edit schema to make changes to the schema; if you make changes, the schema automatically becomes built-in. Click Sync columns to retrieve the schema from the previous component connected in the Job. |
 |
Built-In: You create and store the schema locally for this component only. |
 |
Repository: You have already created the schema and stored it in the Repository. You can reuse it in various projects and Job designs. |
Global Variables
Global Variables |
ERROR_MESSAGE: the error message generated by the component when an error occurs. This is an After variable and it returns a string. A Flow variable functions during the execution of a component, while an After variable functions after the execution of the component. To fill a field or expression with a variable, press Ctrl + Space to access the variable list and choose the variable to use from it. For further information about variables, see the Talend Studio documentation. |
Usage
Usage rule |
This component is not startable and it requires an input link. In a Talend Map/Reduce Job, it is used as an intermediate step, and the other components used along with it must be Map/Reduce components too; they generate native Map/Reduce code that can be executed directly in Hadoop. For further information about Talend Map/Reduce Jobs, see the Talend Big Data documentation. Note that in this documentation, unless otherwise explicitly stated, a scenario presents only Standard Jobs, that is to say traditional Talend data integration Jobs. |
Connections |
Outgoing links (from this component to another): Row: Main. Trigger: Run if; On Component Ok; On Component Error.
Incoming links (from one component to this one): Row: Main; Reject.
For further information regarding connections, see the Talend Studio documentation. |
Related scenarios
No scenario is available for the Map/Reduce version of this component yet.
tReplicate properties for Apache Spark Batch
These properties are used to configure tReplicate running in the Spark Batch Job framework.
The Spark Batch
tReplicate component belongs to the Processing family.
The component in this framework is available in all subscription-based Talend products with Big Data
and Talend Data Fabric.
Basic settings
Schema and Edit schema |
A schema is a row description. It defines the number of fields (columns) to be processed and passed on to the next component. Click Edit schema to make changes to the schema; if you make changes, the schema automatically becomes built-in. Click Sync columns to retrieve the schema from the previous component connected in the Job. |
 |
Built-In: You create and store the schema locally for this component only. |
 |
Repository: You have already created the schema and stored it in the Repository. You can reuse it in various projects and Job designs. |
Cache replicated RDD |
Select this check box to store the replicated RDD in the cache. From the Storage level drop-down list that is displayed, select how the cached RDDs are stored, for example in memory only or in memory and on disk. For further information about each storage level, see https://spark.apache.org/docs/latest/programming-guide.html#rdd-persistence. |
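In Spark terms, the check box corresponds to persisting the replicated dataset so that both downstream branches reuse it instead of recomputing the upstream lineage. A minimal sketch with Spark's Java API (class, file and variable names are illustrative, not what the generated Job contains):

```java
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.storage.StorageLevel;

public class CacheReplicatedRddSketch {
    public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext("local[*]", "tReplicateSketch");

        JavaRDD<String> input = sc.textFile("Names&States.csv");

        // tReplicate feeds the same RDD to two branches; persisting it lets
        // the second branch reuse the cached partitions instead of re-reading
        // the file. The storage level mirrors the Storage level drop-down list.
        input.persist(StorageLevel.MEMORY_AND_DISK());

        long branch1 = input.filter(l -> l.endsWith("Mississippi")).count();
        long branch2 = input.filter(l -> l.startsWith("Chester")).count();
        System.out.println(branch1 + " / " + branch2);

        sc.stop();
    }
}
```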
Usage
Usage rule |
This component is used as an intermediate step. This component, along with the Spark Batch component Palette it belongs to, appears only when you are creating a Spark Batch Job. Note that in this documentation, unless otherwise explicitly stated, a scenario presents only Standard Jobs, that is to say traditional Talend data integration Jobs. |
Spark Connection |
In the Spark Configuration tab in the Run view, define the connection to a given Spark cluster for the whole Job. In addition, since the Job expects its dependent jar files for execution, you must specify the directory in the file system to which these jar files are transferred so that Spark can access these files.
This connection is effective on a per-Job basis. |
Related scenarios
No scenario is available for the Spark Batch version of this component
yet.
tReplicate properties for Apache Spark Streaming
These properties are used to configure tReplicate running in the Spark Streaming Job framework.
The Spark Streaming
tReplicate component belongs to the Processing family.
This component is available in Talend Real Time Big Data Platform and Talend Data Fabric.
Basic settings
Schema and Edit schema |
A schema is a row description. It defines the number of fields (columns) to be processed and passed on to the next component. Click Edit schema to make changes to the schema; if you make changes, the schema automatically becomes built-in. Click Sync columns to retrieve the schema from the previous component connected in the Job. |
 |
Built-In: You create and store the schema locally for this component only. |
 |
Repository: You have already created the schema and stored it in the Repository. You can reuse it in various projects and Job designs. |
Cache replicated RDD |
Select this check box to store the replicated RDD in the cache. From the Storage level drop-down list that is displayed, select how the cached RDDs are stored, for example in memory only or in memory and on disk. For further information about each storage level, see https://spark.apache.org/docs/latest/programming-guide.html#rdd-persistence. |
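In the streaming case the same idea applies per micro-batch: persisting the replicated stream caches each micro-batch's RDDs for both branches. A hedged sketch with Spark Streaming's Java API (host, port and names are illustrative):

```java
import org.apache.spark.SparkConf;
import org.apache.spark.storage.StorageLevel;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

public class CacheReplicatedDStreamSketch {
    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf()
                .setMaster("local[*]").setAppName("tReplicateStreamingSketch");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(5));

        JavaDStream<String> lines = jssc.socketTextStream("localhost", 9999);

        // Persist the replicated stream so both branches reuse the cached
        // RDDs of each micro-batch instead of recomputing them.
        lines.persist(StorageLevel.MEMORY_ONLY());

        lines.filter(l -> l.contains("Mississippi")).print();
        lines.filter(l -> l.contains("Nebraska")).print();

        jssc.start();
        jssc.awaitTermination();
    }
}
```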
Usage
Usage rule |
This component is used as an intermediate step. This component, along with the Spark Streaming component Palette it belongs to, appears only when you are creating a Spark Streaming Job. Note that in this documentation, unless otherwise explicitly stated, a scenario presents only Standard Jobs, that is to say traditional Talend data integration Jobs. |
Spark Connection |
In the Spark Configuration tab in the Run view, define the connection to a given Spark cluster for the whole Job. In addition, since the Job expects its dependent jar files for execution, you must specify the directory in the file system to which these jar files are transferred so that Spark can access these files.
This connection is effective on a per-Job basis. |
Related scenarios
No scenario is available for the Spark Streaming version of this component
yet.
tReplicate Storm properties (deprecated)
These properties are used to configure tReplicate running in the Storm Job framework.
The Storm
tReplicate component belongs to the Processing family.
This component is available in Talend Real Time Big Data Platform and Talend Data Fabric.
The Storm framework is deprecated from Talend 7.1 onwards. Use Talend Jobs for Apache Spark Streaming to accomplish your Streaming related tasks.
Basic settings
Schema and Edit schema |
A schema is a row description. It defines the number of fields (columns) to be processed and passed on to the next component. Click Edit schema to make changes to the schema; if you make changes, the schema automatically becomes built-in. Click Sync columns to retrieve the schema from the previous component connected in the Job. |
 |
Built-In: You create and store the schema locally for this component only. |
 |
Repository: You have already created the schema and stored it in the Repository. You can reuse it in various projects and Job designs. |
Usage
Usage rule |
This component is not startable and it requires an input link. The Storm version does not support the use of global variables. You need to use the Storm Configuration tab in the Run view to define the connection to a given Storm system for the whole Job; this connection is effective on a per-Job basis. Note that in this documentation, unless otherwise explicitly stated, a scenario presents only Standard Jobs, that is to say traditional Talend data integration Jobs. |
Connections |
Outgoing links (from this component to another): Row: Main. Trigger: Run if; On Component Ok; On Component Error.
Incoming links (from one component to this one): Row: Main; Reject.
For further information regarding connections, see the Talend Studio documentation. |
Related scenarios
No scenario is available for the Storm version of this component
yet.