August 15, 2023

tFileOutputJSON – Docs for ESB 6.x

tFileOutputJSON

Receives data and rewrites it in a JSON structured data block in an output
file.

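Depending on the Talend solution you
are using, this component can be used in one, some or all of the following Job
frameworks:

  • Standard: see tFileOutputJSON Standard properties.

  • MapReduce: see tFileOutputJSON MapReduce properties.

  • Spark Batch: see tFileOutputJSON properties for Apache Spark Batch.

  • Spark Streaming: see tFileOutputJSON properties for Apache Spark Streaming.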

tFileOutputJSON Standard properties

These properties are used to configure tFileOutputJSON running in the Standard Job framework.

The Standard
tFileOutputJSON component belongs to the File family.

The component in this framework is generally available.

Basic settings

File Name

Name and path of the output file.

Generate an array json

Select this check box to generate an array JSON file.

Name of data block

Enter a name for the data block to be written, between double
quotation marks.

This field disappears when the Generate an
array json
check box is selected.
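To make the difference between these two settings concrete, the plain-Java sketch below builds both shapes from the same rows. It assumes that, with Generate an array json cleared, the rows are wrapped in an object keyed by the data block name, and that, with it selected, the file holds a bare top-level array. This is not the code Talend generates: the class and method names are hypothetical, and every value is written as a JSON string for simplicity.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical sketch of the two top-level shapes of a JSON output file. */
public class JsonBlockSketch {

    // Escape the characters that must be escaped inside a JSON string.
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    // One row becomes one JSON object; columns keep their schema order.
    static String toObject(Map<String, String> row) {
        StringJoiner fields = new StringJoiner(",", "{", "}");
        row.forEach((k, v) -> fields.add(quote(k) + ":" + quote(v)));
        return fields.toString();
    }

    // "Generate an array json" selected (assumed shape): a bare array of objects.
    static String toArray(List<Map<String, String>> rows) {
        StringJoiner items = new StringJoiner(",", "[", "]");
        for (Map<String, String> row : rows) items.add(toObject(row));
        return items.toString();
    }

    // Check box cleared (assumed shape): the array is keyed by the data block name.
    static String namedBlock(String blockName, List<Map<String, String>> rows) {
        return "{" + quote(blockName) + ":" + toArray(rows) + "}";
    }

    public static void main(String[] args) {
        Map<String, String> row = new LinkedHashMap<>();
        row.put("id", "1");
        row.put("name", "Ann");
        List<Map<String, String>> rows = List.of(row);
        System.out.println(namedBlock("customers", rows)); // {"customers":[{"id":"1","name":"Ann"}]}
        System.out.println(toArray(rows));                  // [{"id":"1","name":"Ann"}]
    }
}
```

The data block name is the only thing that distinguishes the two outputs here, which is consistent with the Name of data block field disappearing when the array option is selected.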

Schema and Edit Schema

A schema is a row description. It defines the number of fields (columns) to
be processed and passed on to the next component. The schema is either Built-In or stored remotely in the Repository.

Click Edit schema to make changes to the schema.
If the current schema is of the Repository type, three
options are available:

  • View schema: choose this option to view the
    schema only.

  • Change to built-in property: choose this
    option to change the schema to Built-in for
    local changes.

  • Update repository connection: choose this
    option to change the schema stored in the repository and decide whether to propagate
    the changes to all the Jobs upon completion. If you just want to propagate the
    changes to the current Job, you can select No
    upon completion and choose this schema metadata again in the [Repository Content] window.

Built-In: You create and store the
schema locally for this component only. Related topic: see the
Talend Studio User Guide.

Repository: You have already created
the schema and stored it in the Repository. You can reuse it in various projects and
Job designs. Related topic: see the Talend Studio User Guide.

Sync columns

Click to synchronize the output file schema with the input file
schema. The Sync function is available only once a Row connection
links the preceding component to this output component.

Advanced settings

Create directory if not exists

This check box is selected by default. This option creates the
directory that will hold the output files if it does not already
exist.
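As a rough illustration of what this option amounts to, the sketch below uses java.nio.file to create any missing parent directories before the file is written; without that step, writing into a folder that does not exist typically fails with a NoSuchFileException. The class and method names are hypothetical and this is not Talend's generated code.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical sketch of the "Create directory if not exists" behaviour. */
public class CreateDirSketch {

    static void writeOutput(Path file, String json, boolean createDirIfNotExists) throws IOException {
        Path parent = file.toAbsolutePath().getParent();
        if (createDirIfNotExists && parent != null) {
            // Creates every missing ancestor; does nothing if the directory already exists.
            Files.createDirectories(parent);
        }
        // With the option off and the folder missing, this write fails.
        Files.write(file, json.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        Path base = Files.createTempDirectory("json-out");
        Path out = base.resolve("nested/dir/out.json");
        writeOutput(out, "{\"data\":[]}", true);
        System.out.println(Files.exists(out)); // true
    }
}
```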

tStatCatcher Statistics

Select this check box to gather the Job processing metadata at a
Job level as well as at each component level.

Global Variables

NB_LINE: the number of rows read by an input component or
transferred to an output component. This is an After variable and it returns an
integer.

ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable and it returns a string. If
the component has a Die on error check box, this variable functions only when that
check box is cleared.

A Flow variable functions during the execution of a component while an After variable
functions after the execution of the component.

To fill up a field or expression with a variable, press Ctrl + Space
to access the variable list and choose the variable to use from it.

For further information about variables, see the Talend Studio User Guide.
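In a Job, After variables are read from the globalMap, usually in a tJava component in a later subJob. The key is typically the component's unique name followed by the variable name, so the expression inserted by Ctrl + Space usually has the form ((Integer)globalMap.get("tFileOutputJSON_1_NB_LINE")). The sketch below simulates the globalMap with a HashMap so it runs outside the Studio; the component name tFileOutputJSON_1 and the helper methods are assumptions for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of reading After variables from a Job's globalMap. */
public class GlobalVarSketch {

    // NB_LINE for the given component, or null if the component has not run yet.
    static Integer nbLine(Map<String, Object> globalMap, String componentName) {
        return (Integer) globalMap.get(componentName + "_NB_LINE");
    }

    // ERROR_MESSAGE for the given component, or null when no error was recorded.
    static String errorMessage(Map<String, Object> globalMap, String componentName) {
        return (String) globalMap.get(componentName + "_ERROR_MESSAGE");
    }

    public static void main(String[] args) {
        // Stand-in for the globalMap a Talend Job provides; filled by hand here
        // to simulate tFileOutputJSON_1 having written 100 rows.
        Map<String, Object> globalMap = new HashMap<>();
        globalMap.put("tFileOutputJSON_1_NB_LINE", 100);

        System.out.println("Rows written: " + nbLine(globalMap, "tFileOutputJSON_1")); // Rows written: 100
        String err = errorMessage(globalMap, "tFileOutputJSON_1");
        System.out.println("Error: " + (err == null ? "none" : err));                   // Error: none
    }
}
```

Because these are After variables, reading them in the same subJob before the component finishes would return null in this simulation, which is why the lookups are null-safe.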

Usage

Usage rule

Use this component to rewrite received data in a JSON structured
output file.

Scenario: Writing a JSON structured file

This is a two-component scenario in which a tRowGenerator component
generates random data, which a tFileOutputJSON component then writes to a
JSON-structured output file.

Procedure

  1. Drop a tRowGenerator and a tFileOutputJSON component onto the workspace from the
    Palette.
  2. Link the components using a Row > Main
    connection.
  3. Double-click tRowGenerator to define
    its Basic Settings properties in the
    Component view.

  4. Click […] next to Edit Schema to display the corresponding dialog box and define
    the schema.

  5. Click [+] to add as many columns as
    needed.
  6. Under Columns, type in the column
    names.
  7. Under Type, select the data type from the
    list.
  8. Click OK to close the dialog box.
  9. Click [+] next to RowGenerator Editor to open the corresponding dialog box.

  10. Under Functions, select pre-defined functions
    for the columns, if required, or select […]
    to set customized function parameters in the Function parameters tab.
  11. Enter the number of rows to be generated in the corresponding field.
  12. Click OK to close the dialog box.
  13. Click tFileOutputJSON to set its Basic Settings properties in the Component view.

  14. Click […] to browse to where you want the
    output JSON file to be generated and enter the file name.
  15. Enter a name for the data block to be generated in the corresponding field,
    between double quotation marks.
  16. Select Built-In as the Schema type.
  17. Click Sync Columns to retrieve the schema
    from the preceding component.
  18. Press F6 to run the Job.

The data from the input schema is written in a JSON structured data block in the
output file.
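For readers who want to reason about the scenario outside the Studio, the sketch below reproduces its data flow in plain Java: a seeded Random stands in for tRowGenerator, and a small writer stands in for tFileOutputJSON with a data block name. It is not the code Talend generates. The column names, the customers block name and the class and method names are all hypothetical, and the exact formatting of Talend's output (value quoting, whitespace) may differ.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.StringJoiner;

/** Hypothetical plain-Java rendering of the tRowGenerator -> tFileOutputJSON scenario. */
public class ScenarioSketch {

    static final String[] NAMES = {"Ann", "Bob", "Chen", "Dana"};

    static final class Row {
        final int id;
        final String name;
        Row(int id, String name) { this.id = id; this.name = name; }
    }

    // Stand-in for tRowGenerator: n rows with a sequential id and a random name.
    static List<Row> generateRows(int n, long seed) {
        Random rnd = new Random(seed);
        List<Row> rows = new ArrayList<>();
        for (int i = 1; i <= n; i++) {
            rows.add(new Row(i, NAMES[rnd.nextInt(NAMES.length)]));
        }
        return rows;
    }

    // Stand-in for tFileOutputJSON with a data block name. The block name and the
    // generated names are plain alphanumerics, so no string escaping is done here.
    static String toJson(String blockName, List<Row> rows) {
        StringJoiner items = new StringJoiner(",", "[", "]");
        for (Row r : rows) {
            items.add("{\"id\":" + r.id + ",\"name\":\"" + r.name + "\"}");
        }
        return "{\"" + blockName + "\":" + items + "}";
    }

    public static void main(String[] args) throws IOException {
        Path out = Files.createTempFile("customers", ".json");
        Files.write(out, toJson("customers", generateRows(5, 42L)).getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(Files.readAllBytes(out), StandardCharsets.UTF_8));
    }
}
```

Seeding the generator makes each run produce the same rows, which is convenient when comparing output files between runs.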

tFileOutputJSON MapReduce properties

These properties are used to configure tFileOutputJSON running in the MapReduce Job framework.

The MapReduce
tFileOutputJSON component belongs to the MapReduce family.

The component in this framework is available only if you have subscribed to one
of the Talend solutions with Big Data.

Basic settings

Schema and Edit Schema

A schema is a row description. It defines the number of fields (columns) to
be processed and passed on to the next component. The schema is either Built-In or stored remotely in the Repository.

Click Edit schema to make changes to the schema.
If the current schema is of the Repository type, three
options are available:

  • View schema: choose this option to view the
    schema only.

  • Change to built-in property: choose this
    option to change the schema to Built-in for
    local changes.

  • Update repository connection: choose this
    option to change the schema stored in the repository and decide whether to propagate
    the changes to all the Jobs upon completion. If you just want to propagate the
    changes to the current Job, you can select No
    upon completion and choose this schema metadata again in the [Repository Content] window.

Built-In: You create and store the
schema locally for this component only. Related topic: see the
Talend Studio User Guide.

Repository: You have already created
the schema and stored it in the Repository. You can reuse it in various projects and
Job designs. Related topic: see the Talend Studio User Guide.

Folder

Enter the folder on HDFS where you want to store the JSON output
file(s).

The folder will be created automatically if it does not
exist.

Ensure that you have properly configured the connection to the Hadoop
distribution to be used in the Hadoop configuration tab of the Run view.

Output type

Select the structure for the JSON output file(s).

  • All in one block: the
    received data will be written into one data block.

  • One row per record: the
    received data will be written into separate data blocks row
    by row.

Name of data block

Type in the name of the data block for the JSON output
file(s).

This field will be available only if you select All in one block from the Output type list.
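The difference between the two Output type options can be sketched in plain Java, assuming that All in one block wraps every record in a single object keyed by the data block name, and that One row per record writes one standalone JSON object per line. Neither shape is confirmed by this page, and the class and method names are hypothetical. Records are passed in already serialized to keep the sketch short.

```java
import java.util.List;
import java.util.StringJoiner;

/** Hypothetical sketch of the two Output type layouts. */
public class OutputTypeSketch {

    // "All in one block" (assumed shape): every record inside one named block.
    static String allInOneBlock(String blockName, List<String> records) {
        StringJoiner array = new StringJoiner(",", "[", "]");
        records.forEach(array::add);
        return "{\"" + blockName + "\":" + array + "}";
    }

    // "One row per record" (assumed shape): one self-contained JSON object per line.
    static String oneRowPerRecord(List<String> records) {
        return String.join("\n", records);
    }

    public static void main(String[] args) {
        List<String> records = List.of("{\"id\":1}", "{\"id\":2}");
        System.out.println(allInOneBlock("rows", records));
        System.out.println(oneRowPerRecord(records));
    }
}
```

A line-per-record layout is also what Spark's JSON data source reads by default, and it lets a large output be split across tasks without parsing one enclosing block, which is a common reason to prefer it on a cluster.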

Action

Select the action that you want to perform on the data:

  • Overwrite: the data on
    HDFS will be overwritten if it already exists.

  • Create: the data will be
    created.

Global Variables

ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable and it returns a string. If
the component has a Die on error check box, this variable functions only when that
check box is cleared.

A Flow variable functions during the execution of a component while an After variable
functions after the execution of the component.

To fill up a field or expression with a variable, press Ctrl + Space
to access the variable list and choose the variable to use from it.

For further information about variables, see the Talend Studio User Guide.

Usage

Usage rule

Use this component to rewrite received data in a JSON structured output file.

In a Talend Map/Reduce Job, it is used as an end component and requires
a transformation component as its input link. The other components used along with it
must also be Map/Reduce components. These components generate native Map/Reduce code
that can be executed directly in Hadoop.

Once a Map/Reduce Job is opened in the workspace, tFileOutputJSON, along with the rest of
the MapReduce family, appears in the Palette of the Studio.

Note that in this documentation, unless otherwise
explicitly stated, a scenario presents only Standard Jobs,
that is to say traditional Talend data integration Jobs, not Map/Reduce Jobs.

Hadoop Connection

You need to use the Hadoop Configuration tab in the
Run view to define the connection to a given Hadoop
distribution for the whole Job.

This connection is effective on a per-Job basis.

Prerequisites

The Hadoop distribution must be properly installed to guarantee interaction
with Talend Studio. The following list presents MapR-related information as an
example.

  • Ensure that you have installed the MapR client on the machine where the Studio is installed,
    and added the MapR client library to the PATH variable of that machine. According
    to MapR’s documentation, the library or libraries of a MapR client corresponding to
    each OS version can be found under MAPR_INSTALL\hadoop\hadoop-VERSION\lib\native.
    For example, the library for Windows is \lib\native\MapRClient.dll in the MapR
    client jar file. For further information, see the following link from MapR: http://www.mapr.com/blog/basic-notes-on-configuring-eclipse-as-a-hadoop-development-environment-for-mapr.

    Without adding the specified library or libraries, you may encounter the following
    error: no MapRClient in java.library.path.

  • Set the -Djava.library.path argument, for example, in the Job Run VM arguments area
    of the Run/Debug view in the [Preferences] dialog box, opened from the Window menu.
    This argument gives the Studio the path to the native library of the MapR client,
    which allows subscription-based users to make full use of the Data viewer to view,
    locally in the Studio, the data stored in MapR.

For further information about how to install a Hadoop distribution, see the manuals
corresponding to the Hadoop distribution you are using.
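The error quoted above is the standard message of the UnsatisfiedLinkError that the JVM raises when System.loadLibrary cannot find a native library on java.library.path. A small, hypothetical diagnostic class like the one below can be run with the same VM argument, for example -Djava.library.path=<folder containing the MapR native library>, to check whether the library is found before launching a Job. The class and method names are assumptions; only standard JDK calls are used.

```java
/** Hypothetical diagnostic: checks whether a native library can be loaded. */
public class NativeLibCheck {

    // Returns true if the JVM can find and load the named native library.
    // The name is mapped per OS, e.g. MapRClient -> MapRClient.dll on Windows.
    static boolean canLoad(String libName) {
        try {
            System.loadLibrary(libName);
            return true;
        } catch (UnsatisfiedLinkError e) {
            // The message typically starts with "no <libName> in java.library.path".
            System.err.println(e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("java.library.path = " + System.getProperty("java.library.path"));
        System.out.println("File name looked up: " + System.mapLibraryName("MapRClient"));
        System.out.println("MapRClient loadable: " + canLoad("MapRClient"));
    }
}
```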

Related scenarios

No scenario is available for the Map/Reduce version of this component yet.

tFileOutputJSON properties for Apache Spark Batch

These properties are used to configure tFileOutputJSON running in the Spark Batch Job framework.

The Spark Batch
tFileOutputJSON component belongs to the File family.

The component in this framework is available only if you have subscribed to one
of the Talend solutions with Big Data.

Basic settings

Define a storage configuration component

Select the configuration component to be used to provide the configuration
information for the connection to the target file system such as HDFS.

If you leave this check box clear, the target file system is the local
system.

The configuration component to be used must be present in the same Job. For
example, if you have dropped a tHDFSConfiguration component in the Job, you can select it to write
the result in a given HDFS system.

Schema and Edit Schema

A schema is a row description. It defines the number of fields (columns) to
be processed and passed on to the next component. The schema is either Built-In or stored remotely in the Repository.

Click Edit schema to make changes to the schema.
If the current schema is of the Repository type, three
options are available:

  • View schema: choose this option to view the
    schema only.

  • Change to built-in property: choose this
    option to change the schema to Built-in for
    local changes.

  • Update repository connection: choose this
    option to change the schema stored in the repository and decide whether to propagate
    the changes to all the Jobs upon completion. If you just want to propagate the
    changes to the current Job, you can select No
    upon completion and choose this schema metadata again in the [Repository Content] window.

Built-In: You create and store the
schema locally for this component only. Related topic: see the
Talend Studio User Guide.

Repository: You have already created
the schema and stored it in the Repository. You can reuse it in various projects and
Job designs. Related topic: see the Talend Studio User Guide.

Folder

Browse to, or enter the path pointing to the data to be used in the file system.

Note that this path must point to a folder rather than a file.

The button for browsing does not work with the Spark Local mode; if you are using the Spark Yarn or the Spark Standalone mode,
ensure that you have properly configured the connection in a configuration component in
the same Job, such as tHDFSConfiguration.

Output type

Select the structure for the JSON output file(s).

  • All in one block: the
    received data will be written into one data block.

  • One row per record: the
    received data will be written into separate data blocks row
    by row.

Name of data block

Type in the name of the data block for the JSON output
file(s).

This field will be available only if you select All in one block from the Output type list.

Action

Select the action that you want to perform on the data:

  • Overwrite: the data on
    HDFS will be overwritten if it already exists.

  • Create: the data will be
    created.
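This page does not say what Create does when the target folder already exists. The sketch below assumes it fails rather than touching existing data, which matches Spark's default save behaviour, while Overwrite replaces the folder. It works on the local file system with java.nio.file only; the part-00000.json file name and all class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

/** Hypothetical local-file-system sketch of the Create and Overwrite actions. */
public class ActionSketch {

    enum Action { CREATE, OVERWRITE }

    static void writeFolder(Path folder, String json, Action action) throws IOException {
        if (Files.exists(folder)) {
            if (action == Action.CREATE) {
                // Assumption: Create refuses to touch output that is already there.
                throw new FileAlreadyExistsException(folder.toString());
            }
            // Overwrite: delete the old folder, children before their parents.
            try (Stream<Path> paths = Files.walk(folder)) {
                paths.sorted(Comparator.reverseOrder()).forEach(p -> {
                    try {
                        Files.delete(p);
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                });
            }
        }
        Files.createDirectories(folder);
        // Hypothetical part-file name, in the style of Hadoop/Spark output folders.
        Files.write(folder.resolve("part-00000.json"), json.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        Path out = Files.createTempDirectory("action").resolve("out");
        writeFolder(out, "{\"id\":1}", Action.CREATE);    // folder did not exist: written
        writeFolder(out, "{\"id\":2}", Action.OVERWRITE); // existing folder replaced
        try {
            writeFolder(out, "{\"id\":3}", Action.CREATE); // folder exists: refused
        } catch (FileAlreadyExistsException e) {
            System.out.println("Create refused: " + e.getMessage());
        }
    }
}
```

Failing on existing output by default is a defensive choice: an accidental rerun cannot silently replace earlier results unless Overwrite is chosen explicitly.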

Usage

Usage rule

This component is used as an end component and requires an input link.

This component, along with the Spark Batch component Palette it belongs to, appears only
when you are creating a Spark Batch Job.

Note that in this documentation, unless otherwise
explicitly stated, a scenario presents only Standard Jobs,
that is to say traditional Talend data integration Jobs.

Spark Connection

You need to use the Spark Configuration tab in
the Run view to define the connection to a given
Spark cluster for the whole Job. In addition, since the Job expects its dependent jar
files for execution, you must specify the directory in the file system to which these
jar files are transferred so that Spark can access these files:

  • Yarn mode: when using Google Dataproc, specify a bucket in the Google Storage staging
    bucket field in the Spark configuration tab; when using other distributions, use a
    tHDFSConfiguration component to specify the directory.

  • Standalone mode: you need to choose the configuration component depending on the
    file system you are using, such as tHDFSConfiguration or tS3Configuration.

This connection is effective on a per-Job basis.

Related scenarios

No scenario is available for the Spark Batch version of this component
yet.

tFileOutputJSON properties for Apache Spark Streaming

These properties are used to configure tFileOutputJSON running in the Spark Streaming Job framework.

The Spark Streaming
tFileOutputJSON component belongs to the File family.

The component in this framework is available only if you have subscribed to Talend Real-time Big Data Platform or Talend Data
Fabric.

Basic settings

Define a storage configuration component

Select the configuration component to be used to provide the configuration
information for the connection to the target file system such as HDFS.

If you leave this check box clear, the target file system is the local
system.

The configuration component to be used must be present in the same Job. For
example, if you have dropped a tHDFSConfiguration component in the Job, you can select it to write
the result in a given HDFS system.

Schema and Edit Schema

A schema is a row description. It defines the number of fields (columns) to
be processed and passed on to the next component. The schema is either Built-In or stored remotely in the Repository.

Click Edit schema to make changes to the schema.
If the current schema is of the Repository type, three
options are available:

  • View schema: choose this option to view the
    schema only.

  • Change to built-in property: choose this
    option to change the schema to Built-in for
    local changes.

  • Update repository connection: choose this
    option to change the schema stored in the repository and decide whether to propagate
    the changes to all the Jobs upon completion. If you just want to propagate the
    changes to the current Job, you can select No
    upon completion and choose this schema metadata again in the [Repository Content] window.

Built-In: You create and store the
schema locally for this component only. Related topic: see the
Talend Studio User Guide.

Repository: You have already created
the schema and stored it in the Repository. You can reuse it in various projects and
Job designs. Related topic: see the Talend Studio User Guide.

Folder

Browse to, or enter the path pointing to the data to be used in the file system.

Note that this path must point to a folder rather than a file.

The button for browsing does not work with the Spark Local mode; if you are using the Spark Yarn or the Spark Standalone mode,
ensure that you have properly configured the connection in a configuration component in
the same Job, such as tHDFSConfiguration.

Output type

Select the structure for the JSON output file(s).

  • All in one block: the
    received data will be written into one data block.

  • One row per record: the
    received data will be written into separate data blocks row
    by row.

Name of data block

Type in the name of the data block for the JSON output
file(s).

This field will be available only if you select All in one block from the Output type list.

Action

Select the action that you want to perform on the data:

  • Overwrite: the data on
    HDFS will be overwritten if it already exists.

  • Create: the data will be
    created.

Usage

Usage rule

This component is used as an end component and requires an input link.

This component, along with the Spark Streaming component Palette it belongs to, appears
only when you are creating a Spark Streaming Job.

Note that in this documentation, unless otherwise explicitly stated, a scenario presents
only Standard Jobs, that is to say traditional Talend data integration Jobs.

Spark Connection

You need to use the Spark Configuration tab in
the Run view to define the connection to a given
Spark cluster for the whole Job. In addition, since the Job expects its dependent jar
files for execution, you must specify the directory in the file system to which these
jar files are transferred so that Spark can access these files:

  • Yarn mode: when using Google Dataproc, specify a bucket in the Google Storage staging
    bucket field in the Spark configuration tab; when using other distributions, use a
    tHDFSConfiguration component to specify the directory.

  • Standalone mode: you need to choose the configuration component depending on the
    file system you are using, such as tHDFSConfiguration or tS3Configuration.

This connection is effective on a per-Job basis.

Related scenarios

No scenario is available for the Spark Streaming version of this component
yet.


Documentation retrieved from Talend: https://help.talend.com