August 17, 2023

tReplicate – Docs for ESB 5.x

tReplicate


tReplicate Properties

Component family

Orchestration


Function

Duplicates the incoming schema into two identical output
flows.

If you have subscribed to one of the Talend solutions with Big Data, you are
able to use this component in a Talend Map/Reduce or Storm Job. In either
of those contexts, tReplicate
belongs to the Processing component family.

Purpose

Allows you to perform different operations on the same
schema.
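The idea can be sketched outside of the Studio. tReplicate itself produces no code you write by hand (Talend generates Java inside the Job); the Python sketch below only mimics its behavior, with invented sample data: one input flow is duplicated so that two independent operations can run on identical copies.

```python
# Illustration only, not Talend-generated code: duplicate one incoming
# flow into two independent copies, then process each copy differently.
import copy

def replicate(rows):
    """Return two independent copies of the incoming flow."""
    return copy.deepcopy(rows), copy.deepcopy(rows)

rows = [{"name": "Alice", "state": "Ohio"}, {"name": "Bob", "state": "Utah"}]
flow1, flow2 = replicate(rows)

upper = [r["name"].upper() for r in flow1]   # one operation on copy 1
states = sorted(r["state"] for r in flow2)   # a different one on copy 2
```

Because the copies are independent, modifying one flow never affects the other, which is exactly why the same schema can feed different downstream operations.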

Basic settings

Schema and Edit
Schema

A schema is a row description; it defines the number of fields
that will be processed and passed on to the next component. The
schema is either Built-in or stored remotely
in the Repository.

Since version 5.6, both the Built-In mode and the Repository mode are
available in any of the Talend solutions.

Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:

  • View schema: choose this option to view the
    schema only.

  • Change to built-in property: choose this option
    to change the schema to Built-in for local
    changes.

  • Update repository connection: choose this option to change
    the schema stored in the repository and decide whether to propagate the changes to
    all the Jobs upon completion. If you just want to propagate the changes to the
    current Job, you can select No upon completion and
    choose this schema metadata again in the [Repository
    Content]
    window.

Click Sync columns to retrieve
the schema from the previous component in the Job.

This component offers the advantage of the dynamic schema feature. This allows you to
retrieve unknown columns from source files or to copy batches of columns from a source
without mapping each column individually. For further information about dynamic schemas,
see Talend Studio
User Guide.

The dynamic schema feature is designed for retrieving unknown columns
of a table and should be used for that purpose only; it is not recommended
for creating tables.
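The principle behind a dynamic schema can be illustrated in plain code. The sketch below (sample data and delimiter are invented for illustration; this is not the Studio's implementation) shows columns being discovered from the source at run time rather than declared up front:

```python
# Sketch of the dynamic-schema idea: column names are not mapped
# individually but retrieved from the source file at run time.
import csv, io

# Stand-in for a delimited source file whose columns are unknown in advance.
source = io.StringIO("name;state\nAlice;Ohio\nBob;Utah\n")

reader = csv.DictReader(source, delimiter=";")
columns = reader.fieldnames   # the unknown columns, discovered dynamically
rows = list(reader)           # every row keeps the full, unmapped column set
```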


Built-in: The schema will be
created and stored locally for this component only. Related topic:
see Talend Studio User Guide.


Repository: The schema already
exists and is stored in the Repository, and can therefore be reused in
various projects and Job designs. Related topic: see
Talend Studio User Guide.

Global Variables

ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable and it returns a string. This
variable functions only if the component has a Die on
error check box and that check box is cleared.

A Flow variable functions during the execution of a component while an After variable
functions after the execution of the component.

To fill in a field or expression with a variable, press Ctrl+Space
to access the variable list and choose the variable to use.

For further information about variables, see Talend Studio
User Guide.
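In the Java code that Talend generates, component variables such as ERROR_MESSAGE live in a shared map and are read by key (conventionally named after the component instance, e.g. a key like "tReplicate_1_ERROR_MESSAGE"). The Python dictionary below only mimics that lookup pattern; the component name and failure message are hypothetical:

```python
# Illustration only: mimic reading an After variable from a shared map,
# the way generated Job code looks up <component>_ERROR_MESSAGE.
global_map = {}

def run_component(name, fail=False):
    """Hypothetical component run that records an error message on failure."""
    if fail:
        global_map[name + "_ERROR_MESSAGE"] = "simulated failure"

run_component("tReplicate_1", fail=True)

# After the component has executed, the variable is available for lookup.
error = global_map.get("tReplicate_1_ERROR_MESSAGE")
```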

Usage

This component is not startable (green background); it requires an
input component and an output component.

Usage in Map/Reduce Jobs

If you have subscribed to one of the Talend solutions with Big Data, you can also
use this component as a Map/Reduce component. In a Talend Map/Reduce Job, this
component is used as an intermediate step and other components used along with it must be
Map/Reduce components, too. They generate native Map/Reduce code that can be executed
directly in Hadoop.

For further information about a Talend Map/Reduce Job, see the sections
describing how to create, convert and configure a Talend Map/Reduce Job of the
Talend Big Data Getting Started Guide.

Note that in this documentation, unless otherwise explicitly stated, a scenario presents
only Standard Jobs, that is, traditional Talend data
integration (non-Map/Reduce) Jobs.

Usage in Storm Jobs

If you have subscribed to one of the Talend solutions with Big Data, you can also
use this component as a Storm component. In a Talend Storm Job, this component is used as
an intermediate step and other components used along with it must be Storm components, too.
They generate native Storm code that can be executed directly in a Storm system.

The Storm version does not support the use of global variables.

You need to use the Storm Configuration tab in the
Run view to define the connection to a given Storm
system for the whole Job.

This connection is effective on a per-Job basis.

For further information about a Talend Storm Job, see the sections
describing how to create and configure a Talend Storm Job of the Talend Big Data Getting Started Guide.

Note that in this documentation, unless otherwise explicitly stated, a scenario presents
only Standard Jobs, that is, traditional Talend data
integration Jobs.

Connections

Outgoing links (from this component to another):

Row: Main.

Trigger: Run if; On Component Ok;
On Component Error.

Incoming links (from one component to this one):

Row: Main; Reject.

For further information regarding connections, see
Talend Studio User
Guide.

Log4j

The activity of this component can be logged using the log4j feature. For more information on this feature, see Talend Studio User Guide.

For more information on the log4j logging levels, see the Apache documentation at http://logging.apache.org/log4j/1.2/apidocs/org/apache/log4j/Level.html.

Scenario: Replicating a flow and sorting two identical flows respectively

The scenario describes a Job that reads an input flow containing names and states
from a CSV file, replicates the input flow, sorts the two identical flows by
name and by state respectively, and displays the sorted data on the console.
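The whole dataflow of this scenario can be sketched in a few lines. This is not the Job's generated code; the sample rows below merely stand in for the contents of Names&States.csv:

```python
# Hedged sketch of the scenario's dataflow: read rows, replicate the flow,
# sort one copy by name and the other by state, then print both.
import copy

rows = [
    {"name": "Carol", "state": "Texas"},
    {"name": "Alice", "state": "Ohio"},
    {"name": "Bob",   "state": "Alabama"},
]

copy1, copy2 = copy.deepcopy(rows), copy.deepcopy(rows)   # tReplicate
by_name = sorted(copy1, key=lambda r: r["name"])          # first tSortRow
by_state = sorted(copy2, key=lambda r: r["state"])        # second tSortRow

for flow in (by_name, by_state):                          # the two tLogRow
    for r in flow:
        print(r["name"], r["state"])
```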

use_case_treplicate.png

Setting up the Job

  1. Drop the following components from the Palette to the design workspace: one tFileInputDelimited component, one tReplicate component, two tSortRow components, and two tLogRow components.

  2. Connect tFileInputDelimited to tReplicate using a Row > Main link.

  3. Repeat the step above to connect tReplicate to two tSortRow
    components respectively and connect tSortRow to tLogRow.

  4. Label the components to better identify their functions.

Configuring the components

  1. Double-click the tFileInputDelimited
    component to open its Basic settings view
    in the Component tab.

    use_case_treplicate1.png
  2. Click the […] button next to the
    File name/Stream field to browse to the
    file from which you want to read the input flow. In this example, the input
    file is Names&States.csv, which
    contains two columns: name and state.

  3. Fill in the Header, Footer and Limit fields
    according to your needs. In this example, type in 1 in the Header field to
    skip the first row of the input file.

  4. Click Edit schema to define the data
    structure of the input flow.

    use_case_treplicate2.png
  5. Double-click the first tSortRow component
    to open its Basic settings view.

    use_case_treplicate3.png
  6. In the Criteria panel, click the
    [+] button to add one row and set the
    sorting parameters for the schema column to be processed. To sort the input
    data by name, select name under Schema column. Select alpha as the sorting type and asc as the sorting order.

    For more information about those parameters, see tSortRow properties.

  7. Double-click the second tSortRow
    component and repeat the step above to define the sorting parameters for the
    state column.

    use_case_treplicate4.png
  8. In the Basic settings view of each
    tLogRow component, select Table in the Mode area for a better view of the Job execution
    result.
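The sorting criteria chosen in steps 5 to 7 can be illustrated with a small sketch (the values are invented): the alpha sorting type compares values as strings, which matters whenever the data could also be read numerically.

```python
# Sketch of the "alpha" vs numeric sorting types: alphabetical order
# compares character by character, so "10" sorts before "2".
values = ["10", "9", "2"]
alpha_asc = sorted(values)          # alpha, asc: compares as strings
num_asc = sorted(values, key=int)   # numeric, asc: compares as numbers
```

For the name and state columns in this scenario the values are plain text, so alpha with asc gives the expected alphabetical order.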

Saving and executing the Job

  1. Press Ctrl+S to save your Job.

  2. Execute the Job by pressing F6 or
    clicking Run on the Run tab.

    use_case_treplicate5.png

    The data sorted by name and state are both displayed on the
    console.


Document retrieved from Talend: https://help.talend.com