August 17, 2023

tFileOutputJSON – Docs for ESB 5.x

tFileOutputJSON


tFileOutputJSON properties

Component Family

File / Output

 

Function

tFileOutputJSON writes data to a
JSON structured output file.

If you have subscribed to one of the Talend solutions with Big Data, you are able to
use this component in a Talend Map/Reduce Job to generate Map/Reduce code. For
further information, see tFileOutputJSON in Talend Map/Reduce Jobs. In that
situation, tFileOutputJSON belongs to the MapReduce component family.

Purpose

tFileOutputJSON receives data and
rewrites it in a JSON structured data block in an output
file.

Basic settings

File Name

Name and path of the output file.

 

Generate an array json

Select this check box to generate an array JSON file.

 

Name of data block

Enter a name for the data block to be written, between double
quotation marks.

This field disappears when the Generate an array json check
box is selected.
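
For illustration, assume a two-column schema (id, name) and the data block name
"results"; both names are hypothetical. With the check box cleared, the rows are
nested under the block name:

```json
{"results": [{"id": "1", "name": "Alice"}, {"id": "2", "name": "Bob"}]}
```

With Generate an array json selected, the same rows are written as a top-level
array instead, which is why the block name field disappears:

```json
[{"id": "1", "name": "Alice"}, {"id": "2", "name": "Bob"}]
```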

 

Schema and Edit
Schema

A schema is a row description. It defines the number of fields to
be processed and passed on to the next component. The schema is
either Built-in or stored remotely
in the Repository.

Since version 5.6, both the Built-In mode and the Repository mode are
available in any of the Talend solutions.

Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:

  • View schema: choose this option to view the
    schema only.

  • Change to built-in property: choose this option
    to change the schema to Built-in for local
    changes.

  • Update repository connection: choose this option to change
    the schema stored in the repository and decide whether to propagate the changes to
    all the Jobs upon completion. If you just want to propagate the changes to the
    current Job, you can select No upon completion and
    choose this schema metadata again in the [Repository Content] window.

 

 

Built-in: The schema will be
created and stored locally for this component only. Related topic:
see Talend Studio User Guide.

 

 

Repository: The schema already
exists and is stored in the Repository, hence can be reused in
various projects and Job flowcharts. Related topic: see
Talend Studio User Guide.

 

Sync columns

Click to synchronize the output file schema with the input file
schema. The Sync function only displays once the Row connection is
linked with the Output component.

Advanced settings

Create directory if not exists

This check box is selected by default. This option creates the
directory that will hold the output files if it does not already
exist.

 

tStatCatcher Statistics

Select this check box to gather the Job processing metadata at a
Job level as well as at each component level.

Global Variables

NB_LINE: the number of rows read by an input component or
transferred to an output component. This is an After variable and it returns an
integer.

ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable and it returns a string. This
variable functions only if the Die on error check box is
cleared, if the component has this check box.

A Flow variable functions during the execution of a component while an After variable
functions after the execution of the component.

To fill in a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to use from it.

For further information about variables, see Talend Studio
User Guide.
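
As a sketch of how a downstream component (for example, a tJava) reads the
NB_LINE After variable: the component label tFileOutputJSON_1 is the Studio
default and is assumed here, and the globalMap is populated below by hand only
to make the example self-contained; in a real Job the Talend runtime fills it.

```java
import java.util.HashMap;
import java.util.Map;

public class NbLineSketch {

    // The expression Ctrl + Space inserts for NB_LINE resolves to a cast like this.
    static Integer nbLine(Map<String, Object> globalMap) {
        return (Integer) globalMap.get("tFileOutputJSON_1_NB_LINE");
    }

    public static void main(String[] args) {
        Map<String, Object> globalMap = new HashMap<>();
        // Simulated: the runtime sets this After variable once the component has run.
        globalMap.put("tFileOutputJSON_1_NB_LINE", 100);
        System.out.println("Rows written: " + nbLine(globalMap));
    }
}
```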

Usage

Use this component to rewrite received data in a JSON structured
output file.

Usage in Map/Reduce Jobs

If you have subscribed to one of the Talend solutions with Big Data, you can also
use this component as a Map/Reduce component. In a Talend Map/Reduce Job, this
component is used as an intermediate step and other components used along with it must be
Map/Reduce components, too. They generate native Map/Reduce code that can be executed
directly in Hadoop.

You need to use the Hadoop Configuration tab in the
Run view to define the connection to a given Hadoop
distribution for the whole Job.

For further information about a Talend Map/Reduce Job, see the sections
describing how to create, convert and configure a Talend Map/Reduce Job of the
Talend Big Data Getting Started Guide.

Note that in this documentation, unless otherwise explicitly stated, a scenario
presents only Standard Jobs, that is to say traditional Talend data integration
Jobs, not Map/Reduce Jobs.

Log4j

The activity of this component can be logged using the log4j feature. For more
information on this feature, see the Talend Studio User Guide.

For more information on the log4j logging levels, see the Apache documentation at http://logging.apache.org/log4j/1.2/apidocs/org/apache/log4j/Level.html.

Limitation

n/a

tFileOutputJSON in Talend
Map/Reduce Jobs

Warning

The information in this section is only for users that have subscribed to one of
the Talend solutions with Big Data and is not applicable to
Talend Open Studio for Big Data users.

In a Talend Map/Reduce Job, tFileOutputJSON, as well as the whole Map/Reduce Job using it, generates
native Map/Reduce code. This section presents the specific properties of tFileOutputJSON when it is used in that situation. For
further information about a Talend Map/Reduce Job, see the Talend Big Data Getting Started Guide.

Component family

MapReduce / Output

 

Function

In a Map/Reduce Job, tFileOutputJSON receives data from a transformation
component and outputs the data as one or more JSON files to
HDFS.

Basic settings

Schema and Edit
Schema

A schema is a row description. It defines the number of fields to be processed and passed on
to the next component. The schema is either Built-In or
stored remotely in the Repository.

Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:

  • View schema: choose this option to view the
    schema only.

  • Change to built-in property: choose this option
    to change the schema to Built-in for local
    changes.

  • Update repository connection: choose this option to change
    the schema stored in the repository and decide whether to propagate the changes to
    all the Jobs upon completion. If you just want to propagate the changes to the
    current Job, you can select No upon completion and
    choose this schema metadata again in the [Repository Content] window.

   

Built-In: You create and store the schema locally for this
component only. Related topic: see Talend Studio
User Guide.

   

Repository: You have already created the schema and
stored it in the Repository. You can reuse it in various projects and Job designs. Related
topic: see Talend Studio User Guide.

 

Folder

Enter the folder on HDFS where you want to store the JSON output
file(s).

The folder will be created automatically if it does not
exist.

Note that you need to ensure you have properly configured the connection to the
Hadoop distribution to be used in the Hadoop
configuration tab in the Run view.

 

Output type

Select the structure for the JSON output file(s).

  • All in one block: the
    received data will be written into one data block.

  • One row per record: the
    received data will be written into separate data blocks row
    by row.
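
For illustration (hypothetical block name "results" and columns id/name), the
two structures differ as follows. All in one block produces a single JSON
document:

```json
{"results": [{"id": "1", "name": "Alice"}, {"id": "2", "name": "Bob"}]}
```

One row per record writes each incoming row as its own data block, one per line:

```json
{"id": "1", "name": "Alice"}
{"id": "2", "name": "Bob"}
```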

 

Name of data block

Type in the name of the data block for the JSON output
file(s).

Warning

This field will be available only if you select All in one block from the Output type list.

 

Action

Select the action that you want to perform on the data:

  • Overwrite: the data on
    HDFS will be overwritten if it already exists.

  • Create: the data will be
    created.

Global Variables

ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable and it returns a string. This
variable functions only if the Die on error check box is
cleared, if the component has this check box.

A Flow variable functions during the execution of a component while an After variable
functions after the execution of the component.

To fill in a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to use from it.

For further information about variables, see Talend Studio
User Guide.

Usage

In a Talend Map/Reduce Job, it is used as an end component and requires
a transformation component as input link. The other components used along with it must be
Map/Reduce components, too. They generate native Map/Reduce code that can be executed
directly in Hadoop.

Once a Map/Reduce Job is opened in the workspace, tFileOutputJSON as well as the MapReduce
family appears in the Palette of
the Studio.

Note that in this documentation, unless otherwise explicitly stated, a scenario
presents only Standard Jobs, that is to say traditional Talend data integration
Jobs, not Map/Reduce Jobs.

Hadoop Connection

You need to use the Hadoop Configuration tab in the
Run view to define the connection to a given Hadoop
distribution for the whole Job.

This connection is effective on a per-Job basis.

Prerequisites

The Hadoop distribution must be properly installed to guarantee interaction with
Talend Studio. The following list presents MapR-related information as an
example.

  • Ensure that you have installed the MapR client on the machine where the Studio is,
    and added the MapR client library to the PATH variable of that machine. According
    to MapR's documentation, the library or libraries of a MapR client corresponding to
    each OS version can be found under MAPR_INSTALL\hadoop\hadoop-VERSION\lib\native.
    For example, the library for
    Windows is lib\native\MapRClient.dll in the MapR
    client jar file. For further information, see the following link from MapR: http://www.mapr.com/blog/basic-notes-on-configuring-eclipse-as-a-hadoop-development-environment-for-mapr.

    Without adding the specified library or libraries, you may encounter the following
    error: no MapRClient in java.library.path.

  • Set the -Djava.library.path argument, for example, in the Job Run VM arguments area
    of the Run/Debug view in the [Preferences] dialog box. This argument provides the
    Studio with the path to the native library of that MapR client. This allows
    subscription-based users to make full use of the Data viewer to view
    locally in the Studio the data stored in MapR. For further information about how to
    set this argument, see the section describing how to view data in the Talend Big Data Getting Started Guide.
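
As a sketch, the VM argument added in the Job Run VM arguments area takes the
following form; the MAPR_INSTALL placeholder and version segment are
assumptions and must be replaced with the actual path on your machine:

```
-Djava.library.path="<MAPR_INSTALL>\hadoop\hadoop-VERSION\lib\native"
```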

For further information about how to install a Hadoop distribution, see the manuals
corresponding to the Hadoop distribution you are using.

Scenario: Writing a JSON structured file

This is a two-component scenario in which a
tRowGenerator component generates random data, which a
tFileOutputJSON component then writes to a JSON structured
output file.

  1. Drop a tRowGenerator and a tFileOutputJSON component onto the workspace from the
    Palette.

  2. Link the components using a Row > Main
    connection.

  3. Double click tRowGenerator to define
    its Basic Settings properties in the
    Component view.

  4. Click […] next to Edit Schema to display the corresponding dialog box and define
    the schema.

  5. Click [+] to add the number of columns
    desired.

  6. Under Columns type in the column
    names.

  7. Under Type, select the data type from the
    list.

  8. Click OK to close the dialog box.

  9. Click […] next to RowGenerator Editor to open the corresponding dialog box.

  10. Under Functions, select pre-defined functions
    for the columns, if required, or select […]
    to set customized function parameters in the Function
    parameters
    tab.

  11. Enter the number of rows to be generated in the corresponding field.

  12. Click OK to close the dialog box.

  13. Click tFileOutputJSON to set its Basic Settings properties in the Component view.

  14. Click […] to browse to where you want the
    output JSON file to be generated and enter the file name.

  15. Enter a name for the data block to be generated in the corresponding field,
    between double quotation marks.

  16. Select Built-In as the Schema type.

  17. Click Sync Columns to retrieve the schema
    from the preceding component.

  18. Press F6 to run the Job.


The data from the input schema is written in a JSON structured data block in the
output file.
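
Outside the Studio, the result of this scenario can be approximated in plain
Java. This is only a sketch of the output format, not the code the Studio
generates; the block name "resultdata" and the id/name columns are assumptions
standing in for whatever you defined in the schema.

```java
import java.util.Arrays;
import java.util.List;

public class JsonBlockSketch {

    // Build {"blockName":[{"id":"...","name":"..."}, ...]} from rows of id/name pairs,
    // mirroring the data block structure tFileOutputJSON writes.
    static String buildBlock(String blockName, List<String[]> rows) {
        StringBuilder sb = new StringBuilder();
        sb.append("{\"").append(blockName).append("\":[");
        for (int i = 0; i < rows.size(); i++) {
            String[] r = rows.get(i);
            if (i > 0) sb.append(",");
            sb.append("{\"id\":\"").append(r[0])
              .append("\",\"name\":\"").append(r[1]).append("\"}");
        }
        return sb.append("]}").toString();
    }

    public static void main(String[] args) {
        // Stand-in for the rows tRowGenerator would produce.
        List<String[]> rows = Arrays.asList(
            new String[]{"1", "Alice"},
            new String[]{"2", "Bob"});
        System.out.println(buildBlock("resultdata", rows));
        // {"resultdata":[{"id":"1","name":"Alice"},{"id":"2","name":"Bob"}]}
    }
}
```

In a real Job you would write this string to the file chosen in File Name; the
sketch only shows how the schema columns end up nested under the data block name.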


Document retrieved from Talend: https://help.talend.com