August 16, 2023

tJavaRow – Docs for ESB 6.x

tJavaRow

Provides a code editor that lets you enter the Java code to be applied to each row
of the flow.

tJavaRow allows you to enter customized
code which you can integrate in a Talend program.

Depending on the Talend solution you
are using, this component can be used in one, some or all of the following Job
frameworks:

tJavaRow Standard properties

These properties are used to configure tJavaRow running in the Standard Job framework.

The Standard
tJavaRow component belongs to the Custom Code family.

The component in this framework is generally available.

Basic settings

Schema and Edit
Schema

A schema is a row description. It defines the number of fields (columns) to
be processed and passed on to the next component. The schema is either Built-In or stored remotely in the Repository.

This component offers the
advantage of the dynamic schema feature. This allows you to retrieve unknown columns
from source files or to copy batches of columns from a source without mapping each
column individually. For further information about dynamic schemas, see
Talend Studio

User Guide.

This dynamic schema
feature is designed to retrieve unknown columns of a table and should be used
for that purpose only; it is not recommended for creating tables.

 

Built-In: You create and store the
schema locally for this component only. Related topic: see
Talend Studio

User Guide.

 

Repository: You have already created
the schema and stored it in the Repository. You can reuse it in various projects and
Job designs. Related topic: see
Talend Studio

User Guide.

When the schema to be reused has default values that are integers or
functions, ensure that these default values are not enclosed within quotation marks. If
they are, you must remove the quotation marks manually.

You can find more details about how to verify default
values in a retrieved schema in Talend Help Center (https://help.talend.com).

 

Click Edit schema to make changes to the schema.
If the current schema is of the Repository type, three
options are available:

  • View schema: choose this option to view the
    schema only.

  • Change to built-in property: choose this
    option to change the schema to Built-in for
    local changes.

  • Update repository connection: choose this
    option to change the schema stored in the repository and decide whether to propagate
    the changes to all the Jobs upon completion. If you just want to propagate the
    changes to the current Job, you can select No
    upon completion and choose this schema metadata again in the [Repository Content] window.

Click Sync columns to retrieve the schema from
the previous component connected in the Job.

Generate code

Click this button to automatically generate the code in the Code field to map the columns of the input schema with those of the output
schema. This generation does not change anything in your schema.

This mapping relates the columns that have the same name in both schemas.
You can then adapt the generated code to the actual mapping you need.
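For instance, if both schemas contain name and city columns, the generated code pairs them up by name. The fragment below uses the input_row and output_row variables that Talend generates for the component's incoming and outgoing links; the column names are illustrative, not taken from a specific schema:

```java
// Generated mapping: columns with matching names are copied across.
output_row.name = input_row.name;
output_row.city = input_row.city;
```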

Code

Enter the Java code to be applied to each line of the data
flow.

Advanced settings

Import

Enter the Java code to import, if necessary, external libraries used in the Code field of the Basic settings
view.

tStatCatcher Statistics

Select this check box to collect the log data at a component
level.

Global Variables

Global Variables

NB_LINE: the number of rows read by an input component or
transferred to an output component. This is an After variable and it returns an
integer.

ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable and it returns a string. This
variable functions only if the Die on error check box is
cleared, if the component has this check box.

A Flow variable functions during the execution of a component while an After variable
functions after the execution of the component.

To fill in a field or expression with a variable, press Ctrl +
Space
to access the variable list and choose the variable to use from it.

For further information about variables, see
Talend Studio

User Guide.

To enter a global variable (for example COUNT of
tFileRowCount) in the Code box, you need to type in the entire piece of code manually, that is to
say ((Integer)globalMap.get("tFileRowCount_COUNT")).
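The cast is needed because globalMap stores plain Objects. The following standalone sketch simulates this lookup outside a Job: in a real Job, globalMap is provided by the generated code and the "tFileRowCount_COUNT" entry is set by a tFileRowCount component upstream; here we populate it by hand for illustration.

```java
import java.util.HashMap;
import java.util.Map;

public class GlobalMapDemo {
    // In a generated Talend Job, this map exists already; we recreate it here.
    public static final Map<String, Object> globalMap = new HashMap<>();

    public static int readCount() {
        // Values are stored as Object, so an explicit cast to Integer is required.
        return ((Integer) globalMap.get("tFileRowCount_COUNT"));
    }

    public static void main(String[] args) {
        globalMap.put("tFileRowCount_COUNT", 125); // simulate the upstream component
        System.out.println("Rows counted: " + readCount());
    }
}
```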

Usage

Usage rule

This component is used as an intermediary between two other
components. It must be linked to both an input and an output
component.

Function

tJavaRow allows you to enter
customized code which you can integrate in a Talend program. With tJavaRow, you can enter the Java code to be
applied to each row of the flow.

Purpose

tJavaRow allows you to broaden the
functionality of
Talend
Jobs, using the
Java language.

Limitation

Knowledge of Java language is necessary.

Scenario: Transforming data
line by line using tJavaRow

In this scenario, information about a few cities, read from an input delimited file, is
transformed using Java code through the tJavaRow
component and printed on the console.

use_case-tjavarow1-1.png

Setting up the Job

  1. Drop a tFileInputDelimited component and
    a tJavaRow component from the Palette onto the design workspace, and label them
    to better identify their roles in the Job.
  2. Connect the two components using a Row >
    Main connection.

Configuring the components

  1. Double-click the tFileInputDelimited
    component to display its Basic settings
    view in the Component tab.

    use_case-tjavarow1-2.png

  2. In the File name/Stream field, type in
    the path to the input file in double quotation marks, or browse to the path
    by clicking the […] button, and define
    the first line of the file as the header.

    In this example, the input file has the following content:
  3. Click the […] button next to Edit
    schema to open the [Schema] dialog box, and
    define the data structure of the input file. Then, click OK to validate the schema setting and close the
    dialog box.

    use_case-tjavarow1-3.png

  4. Double-click the tJavaRow component to
    display its Basic settings view in the
    Component tab.

    use_case-tjavarow1-4.png

  5. Click Sync columns to make sure that the
    schema is correctly retrieved from the preceding component.
  6. In the Code field, enter the code to be
    applied on each line of data based on the defined schema columns.

    In this example, we want to transform the city names to upper case, group
    digits of numbers larger than 1000 using the thousands separator for ease of
    reading, and print the data on the console:
    Note:

    In the Code field, input_row refers to the link that connects to
    tJavaRow.
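The original code sample is not reproduced here, but the transformation it describes can be sketched as a standalone Java program. The column names (city, population) and the use of java.text.NumberFormat for the thousands separator are assumptions for illustration; in the actual Code field you would read from input_row and write to output_row instead of calling helper methods:

```java
import java.text.NumberFormat;
import java.util.Locale;

public class CityRowDemo {
    // Transform the city name to upper case.
    static String transformCity(String city) {
        return city.toUpperCase();
    }

    // Group digits with the thousands separator, e.g. 2138551 -> "2,138,551".
    static String formatPopulation(long population) {
        return NumberFormat.getIntegerInstance(Locale.US).format(population);
    }

    public static void main(String[] args) {
        System.out.println(transformCity("Paris") + ": " + formatPopulation(2138551));
    }
}
```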

Saving and executing the Job

  1. Press Ctrl+S to save your Job.
  2. Press F6 or click Run on the Run tab to
    execute the Job.

    The city information is transformed by the Java code set through tJavaRow and displayed on the console.
    use_case-tjavarow1-5.png

Scenario: Using tJavaRow to handle file content based on a dynamic schema

This scenario applies only to a subscription-based Talend solution.

This scenario describes a three-component Job that uses Java code through a tJavaRow component to display the content of an input file
and pass it to the output component. As all the components in this Job support the
dynamic schema feature, we can leverage this feature to save the time of configuring
each column of the schema.

Setting up the Job

  1. Drop tFileInputDelimited, tJavaRow, and tFileOutputDelimited from the Palette onto the design workspace, and label them according
    to their roles in the Job.
  2. Connect the components in series using Row > Main links.

    use_case-tjavarow2-1.png

Configuring the input and output components

  1. Double-click the tFileInputDelimited
    component, which is labeled Source, to
    display its Basic settings view.

    use_case-tjavarow2-2.png

    Warning:

    The dynamic schema feature is only supported in Built-In mode and requires the input file
    to have a header row.

  2. In the File name/Stream field, type in
    the path to the input file in double quotation marks, or browse to the path
    by clicking the […] button.
  3. In the Header field, type in 1 to define the first line of the file as the
    header.
  4. Click the […] button next to Edit schema to open the [Schema] dialog box.

    use_case-tjavarow2-3.png

  5. Click the [+] button to add a column,
    give a name to the column, dyna in this
    example, and select Dynamic from the
    Type list. This dynamic column will
    retrieve the three columns, FirstName,
    LastName and Address, of the input file.
  6. Click OK to validate the setting and
    close the dialog box.
  7. Double-click the tFileOutputDelimited
    component, which is labeled Target, to
    display its Basic settings view.

    use_case-tjavarow2-4.png

  8. Define the output file path in the File
    Name
    field.
  9. Select the Include Header check box to
    include the header in the output file. Leave all the other settings as they
    are.

Configuring the tJavaRow component

  1. Double-click tJavaRow to display its
    Basic settings view and define the
    component's properties.

    use_case-tjavarow2-5.png

  2. Click Sync columns to make sure that the
    schema is correctly retrieved from the preceding component.
  3. In the Code field, enter the following
    code to display the content of the input file and pass the data to the next
    component based on the defined dynamic schema column:

    Note:

    In the Code field, input_row and output_row correspond to the links to and from tJavaRow.
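Although the original code sample is not reproduced here, a typical Code-field body for this pass-through scenario would print the dynamic column and copy it to the output row. This is a sketch of the Code field content, not a standalone program: input_row and output_row are the generated variables mentioned in the note, and dyna is the dynamic column defined in step 5:

```java
// Display the content of the dynamic column on the console.
System.out.println(input_row.dyna);
// Pass the dynamic column on to the next component unchanged.
output_row.dyna = input_row.dyna;
```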

Saving and executing the Job

  1. Press Ctrl+S to save your Job.
  2. Press F6, or click Run on the Run
    tab, to execute the Job.

    The content of the input file is displayed on the console and written to
    the output file.
    use_case-tjavarow2-6.png

tJavaRow properties for Apache Spark Batch

These properties are used to configure tJavaRow running in the Spark Batch Job framework.

The Spark Batch
tJavaRow component belongs to the Custom Code family.

The component in this framework is available only if you have subscribed to one
of the
Talend
solutions with Big Data.

Basic settings

Schema and Edit
Schema

A schema is a row description. It defines the number of fields (columns) to
be processed and passed on to the next component. The schema is either Built-In or stored remotely in the Repository.

 

Built-In: You create and store the
schema locally for this component only. Related topic: see
Talend Studio

User Guide.

 

Repository: You have already created
the schema and stored it in the Repository. You can reuse it in various projects and
Job designs. Related topic: see
Talend Studio

User Guide.

When the schema to be reused has default values that are integers or
functions, ensure that these default values are not enclosed within quotation marks. If
they are, you must remove the quotation marks manually.

You can find more details about how to verify default
values in a retrieved schema in Talend Help Center (https://help.talend.com).

 

Click Edit schema to make changes to the schema.
If the current schema is of the Repository type, three
options are available:

  • View schema: choose this option to view the
    schema only.

  • Change to built-in property: choose this
    option to change the schema to Built-in for
    local changes.

  • Update repository connection: choose this
    option to change the schema stored in the repository and decide whether to propagate
    the changes to all the Jobs upon completion. If you just want to propagate the
    changes to the current Job, you can select No
    upon completion and choose this schema metadata again in the [Repository Content] window.

Click Sync columns to retrieve the schema from
the previous component connected in the Job.

Note that the input schema and the output schema of this component can
be different.

Map type

Select the type of the Map transformation you need to write. This
allows the component to automatically select the method accordingly and
declare the variables to be used in your custom code.

The available types are:

  • Map: it returns only one
    output record for each input record. It uses Spark's
    PairFunction method.

  • FlatMap: it returns 0 or
    more output records for each input record. It uses Spark’s
    FlatMapFunction
    method.

For further information about these methods, see Apache Spark’s
documentation about its Java API in https://spark.apache.org/docs/latest/api/java/index.html.

Generate code

Click this button to automatically generate the code in the Code field to map the columns of the input schema with those of the output
schema. This generation does not change anything in your schema.

The generated sample code shows what the pre-defined variables are for the input and the
output RDDs and how these variables can be used.

Code

Write the custom body of the method you have selected from the
Map type drop-down list. You need
to use the input schema and the output schema to manage the columns of
the input and the output RDD records. This custom code is applied on a
row-by-row basis in the RDD records.

For example, if the input schema contains a user column, you need to use the input.user variable to access the user column of each input record.
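The Map-type contract can be illustrated outside of Talend's generated scaffolding with a minimal standalone sketch. The Row class and its user column are illustrative stand-ins; in a real Job the input and output record classes are generated from the component schemas:

```java
public class MapTypeDemo {
    // Illustrative record type; generated from the schema in a real Job.
    static class Row {
        String user;
        Row(String user) { this.user = user; }
    }

    // Map type: exactly one output record is produced per input record.
    static Row map(Row input) {
        Row output = new Row(null);
        output.user = input.user.toUpperCase();
        return output;
    }

    public static void main(String[] args) {
        System.out.println(map(new Row("alice")).user);
    }
}
```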

For further information about the available variables in writing the
custom code, see the default comment displayed in this field.

Advanced settings

Import

Enter the Java code to import, if necessary, external libraries used in the Code field of the Basic settings
view.

Global Variables

Global Variables

ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable and it returns a string. This
variable functions only if the Die on error check box is
cleared, if the component has this check box.

A Flow variable functions during the execution of a component while an After variable
functions after the execution of the component.

To fill in a field or expression with a variable, press Ctrl +
Space
to access the variable list and choose the variable to use from it.

For further information about variables, see
Talend Studio

User Guide.

To enter a global variable (for example COUNT of
tFileRowCount) in the Code box, you need to type in the entire piece of code manually, that is to
say ((Integer)globalMap.get("tFileRowCount_COUNT")).

Usage

Usage rule

This component is used as an intermediate step.

This component, along with the Spark Batch component Palette it belongs to, appears only
when you are creating a Spark Batch Job.

Note that in this documentation, unless otherwise
explicitly stated, a scenario presents only Standard Jobs,
that is to say traditional
Talend
data integration Jobs.

Spark Connection

You need to use the Spark Configuration tab in
the Run view to define the connection to a given
Spark cluster for the whole Job. In addition, since the Job expects its dependent jar
files for execution, you must specify the directory in the file system to which these
jar files are transferred so that Spark can access these files:

  • Yarn mode: when using Google
    Dataproc, specify a bucket in the Google Storage staging
    bucket
    field in the Spark
    configuration
    tab; when using other distributions, use a
    tHDFSConfiguration
    component to specify the directory.

  • Standalone mode: you need to choose
    the configuration component depending on the file system you are using, such
    as tHDFSConfiguration
    or tS3Configuration.

This connection is effective on a per-Job basis.

Limitation

Knowledge of Spark and Java language is necessary.

Related scenarios

No scenario is available for the Spark Batch version of this component
yet.

tJavaRow properties for Apache Spark Streaming

These properties are used to configure tJavaRow running in the Spark Streaming Job framework.

The Spark Streaming
tJavaRow component belongs to the Custom Code family.

The component in this framework is available only if you have subscribed to Talend Real-time Big Data Platform or Talend Data
Fabric.

Basic settings

Schema and Edit
Schema

A schema is a row description. It defines the number of fields (columns) to
be processed and passed on to the next component. The schema is either Built-In or stored remotely in the Repository.

 

Built-In: You create and store the
schema locally for this component only. Related topic: see
Talend Studio

User Guide.

 

Repository: You have already created
the schema and stored it in the Repository. You can reuse it in various projects and
Job designs. Related topic: see
Talend Studio

User Guide.

When the schema to be reused has default values that are integers or
functions, ensure that these default values are not enclosed within quotation marks. If
they are, you must remove the quotation marks manually.

You can find more details about how to verify default
values in a retrieved schema in Talend Help Center (https://help.talend.com).

 

Click Edit schema to make changes to the schema.
If the current schema is of the Repository type, three
options are available:

  • View schema: choose this option to view the
    schema only.

  • Change to built-in property: choose this
    option to change the schema to Built-in for
    local changes.

  • Update repository connection: choose this
    option to change the schema stored in the repository and decide whether to propagate
    the changes to all the Jobs upon completion. If you just want to propagate the
    changes to the current Job, you can select No
    upon completion and choose this schema metadata again in the [Repository Content] window.

Click Sync columns to retrieve the schema from
the previous component connected in the Job.

Note that the input schema and the output schema of this component can
be different.

Map type

Select the type of the Map transformation you need to write. This
allows the component to automatically select the method accordingly and
declare the variables to be used in your custom code.

The available types are:

  • Map: it returns only one
    output record for each input record. It uses Spark's
    PairFunction method.

  • FlatMap: it returns 0 or
    more output records for each input record. It uses Spark’s
    FlatMapFunction
    method.

For further information about these methods, see Apache Spark’s
documentation about its Java API in https://spark.apache.org/docs/latest/api/java/index.html.

Generate code

Click this button to automatically generate the code in the Code field to map the columns of the input schema with those of the output
schema. This generation does not change anything in your schema.

The generated sample code shows what the pre-defined variables are for the input and the
output RDDs and how these variables can be used.

Code

Write the custom body of the method you have selected from the
Map type drop-down list. You need
to use the input schema and the output schema to manage the columns of
the input and the output RDD records. This custom code is applied on a
row-by-row basis in the RDD records.

For example, if the input schema contains a user column, you need to use the input.user variable to access the user column of each input record.

For further information about the available variables in writing the
custom code, see the default comment displayed in this field.

Advanced settings

Import

Enter the Java code to import, if necessary, external libraries used in the Code field of the Basic settings
view.

Usage

Usage rule

This component is used as an intermediate step.

This component, along with the Spark Streaming component Palette it belongs to, appears
only when you are creating a Spark Streaming Job.

Note that in this documentation, unless otherwise explicitly stated, a scenario presents
only Standard Jobs, that is to say traditional
Talend
data
integration Jobs.

Spark Connection

You need to use the Spark Configuration tab in
the Run view to define the connection to a given
Spark cluster for the whole Job. In addition, since the Job expects its dependent jar
files for execution, you must specify the directory in the file system to which these
jar files are transferred so that Spark can access these files:

  • Yarn mode: when using Google
    Dataproc, specify a bucket in the Google Storage staging
    bucket
    field in the Spark
    configuration
    tab; when using other distributions, use a
    tHDFSConfiguration
    component to specify the directory.

  • Standalone mode: you need to choose
    the configuration component depending on the file system you are using, such
    as tHDFSConfiguration
    or tS3Configuration.

This connection is effective on a per-Job basis.

Limitation

Knowledge of Spark and Java language is necessary.

Related scenarios

No scenario is available for the Spark Streaming version of this component
yet.


Document retrieved from Talend: https://help.talend.com