
tJavaRow – Docs for ESB 7.x

tJavaRow

Provides a code editor that lets you enter the Java code to be applied to each row
of the flow.

tJavaRow allows you to enter customized
code which you can integrate in a Talend program.

Depending on the Talend product you are using, this component can be used in one, some or all of the following Job frameworks: Standard, Spark Batch, and Spark Streaming.

tJavaRow Standard properties

These properties are used to configure tJavaRow running in the Standard Job framework.

The Standard
tJavaRow component belongs to the Custom Code family.

The component in this framework is available in all Talend products.

Basic settings

Schema and Edit
Schema

A schema is a row description. It defines the number of fields
(columns) to be processed and passed on to the next component. When you create a Spark
Job, avoid the reserved word line when naming the
fields.

This component offers the advantage of the dynamic schema feature. This allows you to retrieve unknown columns from source files or to copy batches of columns from a source without mapping each column individually. For further information about dynamic schemas, see the Talend Studio User Guide.

The dynamic schema feature is designed to retrieve unknown columns of a table and is recommended for that purpose only; it is not recommended for creating tables.

Built-In: You create and store the schema locally for this component only.

Repository: You have already created the schema and stored it in the Repository. You can reuse it in various projects and Job designs.

When the schema to be reused has default values that are
integers or functions, ensure that these default values are not enclosed within
quotation marks. If they are, you must remove the quotation marks manually.

You can find more details about how to verify default values in a retrieved schema in Talend Help Center (https://help.talend.com).

Click Edit schema to make changes to the schema. If the current schema is of the Repository type, three options are available:

  • View schema: choose this
    option to view the schema only.

  • Change to built-in property:
    choose this option to change the schema to Built-in for local changes.

  • Update repository connection:
    choose this option to change the schema stored in the repository and decide whether
    to propagate the changes to all the Jobs upon completion. If you just want to
    propagate the changes to the current Job, you can select No upon completion and choose this schema metadata
    again in the Repository Content
    window.

Click Sync
columns
to retrieve the schema from the previous component connected in the
Job.

Generate code

Click this button to automatically generate the code in the Code field to map the columns of the input schema with those of the output
schema. This generation does not change anything in your schema.

The principle of this mapping is to relate the columns that have the same name. Then you can adapt the generated code to the actual mapping you need.
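
For example, if both the input and the output schema contained city and population columns (hypothetical names used only for illustration), the generated code would amount to one assignment per matching column, along these lines:

    // Illustrative sketch: one assignment per same-named column
    output_row.city = input_row.city;
    output_row.population = input_row.population;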

Code

Enter the Java code to be applied to each line of the data
flow.

Advanced settings

Import

Enter the Java code to import, if necessary, external libraries
used in the Code field of the Basic settings view.
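
For instance, if the code in the Code field used a class such as java.text.NumberFormat, the Import field could contain a standard import line like:

    import java.text.NumberFormat;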

tStatCatcher Statistics

Select this check box to collect the log data at the component level.

Global Variables

NB_LINE: the number of rows read by an input component or
transferred to an output component. This is an After variable and it returns an
integer.

ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable and it returns a string. This
variable functions only if the Die on error check box is
cleared, if the component has this check box.

A Flow variable functions during the execution of a component while an After variable
functions after the execution of the component.

To fill in a field or expression with a variable, press Ctrl+Space to access the variable list and choose the variable to use from it.

For further information about variables, see the Talend Studio User Guide.

To enter a global variable (for example, COUNT of tFileRowCount) in the Code box, you need to type in the entire piece of code manually, that is to say ((Integer)globalMap.get("tFileRowCount_COUNT")).
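
As an illustration, such a variable could be used in the Code field as follows (the tFileRowCount component must have executed earlier in the Job for the value to be available):

    // Read the row count published by tFileRowCount and print it.
    int count = ((Integer) globalMap.get("tFileRowCount_COUNT"));
    System.out.println("row count: " + count);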

Usage

Usage rule

This component is used as an intermediary between two other
components. It must be linked to both an input and an output
component.

Function

tJavaRow allows you to enter customized code which you can integrate in a Talend program. With tJavaRow, you can enter the Java code to be applied to each row of the flow.

Purpose

tJavaRow allows you to broaden the
functionality of
Talend
Jobs, using the
Java language.

Limitation

Knowledge of the Java language is necessary.

Transforming data
line by line using tJavaRow

In this scenario, the information of a few cities read from an input delimited file is
transformed using Java code through the tJavaRow
component and printed on the console.

tJavaRow_1.png

Setting up the Job

  1. Drop a tFileInputDelimited component and
    a tJavaRow component from the Palette onto the design workspace, and label them
    to better identify their roles in the Job.
  2. Connect the two components using a Row >
    Main connection.

Configuring the components

  1. Double-click the tFileInputDelimited
    component to display its Basic settings
    view in the Component tab.

    tJavaRow_2.png

  2. In the File name/Stream field, type in
    the path to the input file in double quotation marks, or browse to the path
    by clicking the […] button, and define
    the first line of the file as the header.

    In this example, the input file has the following content:
  3. Click the […] button next to Edit
    schema to open the Schema dialog box, and
    define the data structure of the input file. Then, click OK to validate the schema setting and close the
    dialog box.

    tJavaRow_3.png

  4. Double-click the tJavaRow component to
    display its Basic settings view in the
    Component tab.

    tJavaRow_4.png

  5. Click Sync columns to make sure that the
    schema is correctly retrieved from the preceding component.
  6. In the Code field, enter the code to be
    applied on each line of data based on the defined schema columns.

    In this example, we want to transform the city names to upper case, group
    digits of numbers larger than 1000 using the thousands separator for ease of
    reading, and print the data on the console:
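    A minimal sketch of such code, assuming a String column city and an
    Integer column population (both column names are assumptions), could
    look like this:

        // Convert the city name to upper case and format the number with
        // a grouping (thousands) separator before printing to the console.
        String city = input_row.city.toUpperCase();
        String population = String.format("%,d", input_row.population);
        System.out.println(city + " - " + population);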
    Note:

    In the Code field, input_row refers to the link that connects to
    tJavaRow.

Saving and executing the Job

  1. Press Ctrl+S to save your Job.
  2. Press F6 or click Run on the Run tab to
    execute the Job.

    The city information is transformed by the Java code set through tJavaRow and displayed on the console.
    tJavaRow_5.png

Using tJavaRow to handle file content based on a dynamic schema

This scenario applies only to subscription-based Talend products.

This scenario describes a three-component Job that uses Java code through a tJavaRow component to display the content of an input file
and pass it to the output component. As all the components in this Job support the
dynamic schema feature, we can leverage this feature to save the time of configuring
each column of the schema.

Setting up the Job

  1. Drop tFileInputDelimited, tJavaRow, and tFileOutputDelimited from the Palette onto the design workspace, and label them according
    to their roles in the Job.
  2. Connect the components in a series using Row > Main links.

    tJavaRow_6.png

Configuring the input and output components

  1. Double-click the tFileInputDelimited
    component, which is labeled Source, to
    display its Basic settings view.

    tJavaRow_7.png

    Warning:

    The dynamic schema feature is only supported in Built-In mode and requires the input file
    to have a header row.

  2. In the File name/Stream field, type in
    the path to the input file in double quotation marks, or browse to the path
    by clicking the […] button.
  3. In the Header field, type in 1 to define the first line of the file as the
    header.
  4. Click the […] button next to Edit schema to open the Schema dialog box.

    tJavaRow_8.png

  5. Click the [+] button to add a column,
    give a name to the column, dyna in this
    example, and select Dynamic from the
    Type list. This dynamic column will
    retrieve the three columns, FirstName,
    LastName and Address, of the input file.
  6. Click OK to validate the setting and
    close the dialog box.
  7. Double-click the tFileOutputDelimited
    component, which is labeled Target, to
    display its Basic settings view.

    tJavaRow_9.png

  8. Define the output file path in the File
    Name
    field.
  9. Select the Include Header check box to
    include the header in the output file. Leave all the other settings as
    they are.

Configuring the tJavaRow component

  1. Double-click tJavaRow to display its
    Basic settings view and define the
    component's properties.

    tJavaRow_10.png

  2. Click Sync columns to make sure that the
    schema is correctly retrieved from the preceding component.
  3. In the Code field, enter the following
    code to display the content of the input file and pass the data to the next
    component based on the defined dynamic schema column:
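    A minimal sketch, using the dyna dynamic column defined earlier, could
    be:

        // Print the dynamic column (that is, all columns of the current
        // row) and pass it to the next component unchanged.
        System.out.println(input_row.dyna);
        output_row.dyna = input_row.dyna;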

    Note:

    In the Code field, input_row and output_row correspond to the links to and from tJavaRow.

Saving and executing the Job

  1. Press Ctrl+S to save your Job.
  2. Press F6, or click Run on the Run
    tab, to execute the Job.

    The content of the input file is displayed on the console and written to
    the output file.
    tJavaRow_11.png

tJavaRow properties for Apache Spark Batch

These properties are used to configure tJavaRow running in the Spark Batch Job framework.

The Spark Batch
tJavaRow component belongs to the Custom Code family.

The component in this framework is available in all subscription-based Talend products with Big Data
and Talend Data Fabric.

Basic settings

Schema and Edit
Schema

A schema is a row description. It defines the number of fields (columns) to be processed and passed on to the next component. When you create a Spark Job, avoid the reserved word line when naming the fields.

Built-In: You create and store the schema locally for this component only.

Repository: You have already created the schema and stored it in the Repository. You can reuse it in various projects and Job designs.

When the schema to be reused has default values that are
integers or functions, ensure that these default values are not enclosed within
quotation marks. If they are, you must remove the quotation marks manually.

You can find more details about how to verify default values in a retrieved schema in Talend Help Center (https://help.talend.com).

Click Edit schema to make changes to the schema. If the current schema is of the Repository type, three options are available:

  • View schema: choose this
    option to view the schema only.

  • Change to built-in property:
    choose this option to change the schema to Built-in for local changes.

  • Update repository connection:
    choose this option to change the schema stored in the repository and decide whether
    to propagate the changes to all the Jobs upon completion. If you just want to
    propagate the changes to the current Job, you can select No upon completion and choose this schema metadata
    again in the Repository Content
    window.

Click Sync
columns
to retrieve the schema from the previous component connected in the
Job.

Note that the input schema and the output schema of this component can
be different.

Map type

Select the type of the Map transformation you need to write. This
allows the component to automatically select the method accordingly and
declare the variables to be used in your custom code.

The available types are:

  • Map: it returns only one output record for each input record. It uses
    Spark's PairFunction method.

  • FlatMap: it returns 0 or
    more output records for each input record. It uses Spark’s
    FlatMapFunction
    method.

For further information about these methods, see Apache Spark's documentation about its Java API at https://spark.apache.org/docs/latest/api/java/index.html.

Generate code

Click this button to automatically generate the code in the Code field to map the columns of the input schema with those of the output
schema. This generation does not change anything in your schema.

The generated sample code shows what the pre-defined variables are for the input and the
output RDDs and how these variables can be used.

Code

Write the custom body of the method you have selected from the
Map type drop-down list. You need
to use the input schema and the output schema to manage the columns of
the input and the output RDD records. This custom code is applied on a
row-by-row basis in the RDD records.

For example, if the input schema contains a user column, you need to use the input.user variable to get the user column of each input record.

For further information about the available variables in writing the
custom code, see the default comment displayed in this field.
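
For instance, with the Map type and a String column named user (as in the example above), a minimal method body might be:

    // Sketch only: copy the user column to the output record,
    // normalizing it to lower case on the way.
    output.user = input.user.toLowerCase();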

Advanced settings

Import

Enter the Java code to import, if necessary, external libraries
used in the Code field of the Basic settings view.

Global Variables

ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable and it returns a string. This
variable functions only if the Die on error check box is
cleared, if the component has this check box.

A Flow variable functions during the execution of a component while an After variable
functions after the execution of the component.

To fill in a field or expression with a variable, press Ctrl+Space to access the variable list and choose the variable to use from it.

For further information about variables, see the Talend Studio User Guide.

To enter a global variable (for example, COUNT of tFileRowCount) in the Code box, you need to type in the entire piece of code manually, that is to say ((Integer)globalMap.get("tFileRowCount_COUNT")).

Usage

Usage rule

This component is used as an intermediate step.

This component, along with the Spark Batch component Palette it belongs to,
appears only when you are creating a Spark Batch Job.

Note that in this documentation, unless otherwise explicitly stated, a scenario presents only Standard Jobs, that is to say traditional Talend data integration Jobs.

Spark Connection

In the Spark
Configuration
tab in the Run
view, define the connection to a given Spark cluster for the whole Job. In
addition, since the Job expects its dependent jar files for execution, you must
specify the directory in the file system to which these jar files are
transferred so that Spark can access these files:

  • Yarn mode (Yarn client or Yarn cluster):

    • When using Google Dataproc, specify a bucket in the
      Google Storage staging bucket
      field in the Spark configuration
      tab.

    • When using HDInsight, specify the blob to be used for Job
      deployment in the Windows Azure Storage
      configuration
      area in the Spark
      configuration
      tab.

    • When using Altus, specify the S3 bucket or the Azure
      Data Lake Storage for Job deployment in the Spark
      configuration
      tab.
    • When using Qubole, add a
      tS3Configuration to your Job to write
      your actual business data in the S3 system with Qubole. Without
      tS3Configuration, this business data is
      written in the Qubole HDFS system and destroyed once you shut
      down your cluster.
    • When using on-premise
      distributions, use the configuration component corresponding
      to the file system your cluster is using. Typically, this
      system is HDFS and so use tHDFSConfiguration.

  • Standalone mode: use the
    configuration component corresponding to the file system your cluster is
    using, such as tHDFSConfiguration or
    tS3Configuration.

    If you are using Databricks without any configuration component present
    in your Job, your business data is written directly in DBFS (Databricks
    Filesystem).

This connection is effective on a per-Job basis.

Limitation

Knowledge of Spark and the Java language is necessary.

Related scenarios

No scenario is available for the Spark Batch version of this component
yet.

tJavaRow properties for Apache Spark Streaming

These properties are used to configure tJavaRow running in the Spark Streaming Job framework.

The Spark Streaming
tJavaRow component belongs to the Custom Code family.

This component is available in Talend Real Time Big Data Platform and Talend Data Fabric.

Basic settings

Schema and Edit
Schema

A schema is a row description. It defines the number of fields (columns) to be processed and passed on to the next component. When you create a Spark Job, avoid the reserved word line when naming the fields.

Built-In: You create and store the schema locally for this component only.

Repository: You have already created the schema and stored it in the Repository. You can reuse it in various projects and Job designs.

When the schema to be reused has default values that are
integers or functions, ensure that these default values are not enclosed within
quotation marks. If they are, you must remove the quotation marks manually.

You can find more details about how to verify default values in a retrieved schema in Talend Help Center (https://help.talend.com).

Click Edit schema to make changes to the schema. If the current schema is of the Repository type, three options are available:

  • View schema: choose this
    option to view the schema only.

  • Change to built-in property:
    choose this option to change the schema to Built-in for local changes.

  • Update repository connection:
    choose this option to change the schema stored in the repository and decide whether
    to propagate the changes to all the Jobs upon completion. If you just want to
    propagate the changes to the current Job, you can select No upon completion and choose this schema metadata
    again in the Repository Content
    window.

Click Sync
columns
to retrieve the schema from the previous component connected in the
Job.

Note that the input schema and the output schema of this component can
be different.

Map type

Select the type of the Map transformation you need to write. This
allows the component to automatically select the method accordingly and
declare the variables to be used in your custom code.

The available types are:

  • Map: it returns only one output record for each input record. It uses
    Spark's PairFunction method.

  • FlatMap: it returns 0 or
    more output records for each input record. It uses Spark’s
    FlatMapFunction
    method.

For further information about these methods, see Apache Spark's documentation about its Java API at https://spark.apache.org/docs/latest/api/java/index.html.

Generate code

Click this button to automatically generate the code in the Code field to map the columns of the input schema with those of the output
schema. This generation does not change anything in your schema.

The generated sample code shows what the pre-defined variables are for the input and the
output RDDs and how these variables can be used.

Code

Write the custom body of the method you have selected from the
Map type drop-down list. You need
to use the input schema and the output schema to manage the columns of
the input and the output RDD records. This custom code is applied on a
row-by-row basis in the RDD records.

For example, if the input schema contains a user column, you need to use the input.user variable to get the user column of each input record.

For further information about the available variables in writing the
custom code, see the default comment displayed in this field.

Advanced settings

Import

Enter the Java code to import, if necessary, external libraries
used in the Code field of the Basic settings view.

Usage

Usage rule

This component is used as an intermediate step.

This component, along with the Spark Streaming component Palette it belongs to, appears
only when you are creating a Spark Streaming Job.

Note that in this documentation, unless otherwise explicitly stated, a scenario presents only Standard Jobs, that is to say traditional Talend data integration Jobs.

Spark Connection

In the Spark
Configuration
tab in the Run
view, define the connection to a given Spark cluster for the whole Job. In
addition, since the Job expects its dependent jar files for execution, you must
specify the directory in the file system to which these jar files are
transferred so that Spark can access these files:

  • Yarn mode (Yarn client or Yarn cluster):

    • When using Google Dataproc, specify a bucket in the
      Google Storage staging bucket
      field in the Spark configuration
      tab.

    • When using HDInsight, specify the blob to be used for Job
      deployment in the Windows Azure Storage
      configuration
      area in the Spark
      configuration
      tab.

    • When using Altus, specify the S3 bucket or the Azure
      Data Lake Storage for Job deployment in the Spark
      configuration
      tab.
    • When using Qubole, add a
      tS3Configuration to your Job to write
      your actual business data in the S3 system with Qubole. Without
      tS3Configuration, this business data is
      written in the Qubole HDFS system and destroyed once you shut
      down your cluster.
    • When using on-premise
      distributions, use the configuration component corresponding
      to the file system your cluster is using. Typically, this
      system is HDFS and so use tHDFSConfiguration.

  • Standalone mode: use the
    configuration component corresponding to the file system your cluster is
    using, such as tHDFSConfiguration or
    tS3Configuration.

    If you are using Databricks without any configuration component present
    in your Job, your business data is written directly in DBFS (Databricks
    Filesystem).

This connection is effective on a per-Job basis.

Limitation

Knowledge of Spark and the Java language is necessary.

Related scenarios

No scenario is available for the Spark Streaming version of this component
yet.

