
tFileOutputJSON – Docs for ESB 7.x

tFileOutputJSON

Receives data and rewrites it in a JSON structured data block in an output
file.

Depending on the Talend product you are using, this component can be used in one, some, or all of the following Job frameworks: Standard, MapReduce (deprecated), Spark Batch, and Spark Streaming.

tFileOutputJSON Standard properties

These properties are used to configure tFileOutputJSON running in the Standard Job framework.

The Standard tFileOutputJSON component belongs to the File family.

The component in this framework is available in all Talend products.

Basic settings

File Name

Name and path of the output file.

Warning: Use an absolute path (rather than a relative path) in this field to avoid possible errors.
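
In the Studio, this field expects a Java string, for example (the path below is a hypothetical illustration; adjust it to your own file system):

    "D:/output/result.json"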

Generate an array json

Select this check box to generate an array JSON file.

Name of data block

Enter a name for the data block to be written, between double
quotation marks.

This field disappears when the Generate an array json check box is selected.
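
For illustration, assuming a two-column schema (id, name) and the data block name "results" (all hypothetical), the component writes a structure of this shape:

    {"results":[{"id":1,"name":"Andrew"},{"id":2,"name":"Bill"}]}

With the Generate an array json check box selected, the same rows are written as a top-level JSON array instead:

    [{"id":1,"name":"Andrew"},{"id":2,"name":"Bill"}]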

Schema and Edit Schema

A schema is a row description. It defines the number of fields
(columns) to be processed and passed on to the next component. When you create a Spark
Job, avoid the reserved word line when naming the
fields.

Click Edit schema to make changes to the schema. If the current schema is of the Repository type, three options are available:

  • View schema: choose this
    option to view the schema only.

  • Change to built-in property:
    choose this option to change the schema to Built-in for local changes.

  • Update repository connection:
    choose this option to change the schema stored in the repository and decide whether
    to propagate the changes to all the Jobs upon completion. If you just want to
    propagate the changes to the current Job, you can select No upon completion and choose this schema metadata
    again in the Repository Content
    window.

 

Built-In: You create and store the schema locally for this component
only.

 

Repository: You have already created the schema and stored it in the
Repository. You can reuse it in various projects and Job designs.

Sync columns

Click to synchronize the output file schema with the input file
schema. The Sync function only displays once the Row connection is
linked with the Output component.

Advanced settings

Create directory if not exists

This check box is selected by default. This option creates the
directory that will hold the output files if it does not already
exist.

tStatCatcher Statistics

Select this check box to gather the Job processing metadata at a
Job level as well as at each component level.

Global Variables

Global Variables

NB_LINE: the number of rows read by an input component or
transferred to an output component. This is an After variable and it returns an
integer.

ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable and it returns a string. This
variable functions only if the Die on error check box is
cleared, if the component has this check box.

A Flow variable functions during the execution of a component while an After variable
functions after the execution of the component.

To fill a field or expression with a variable, press Ctrl + Space to access the variable list and choose the variable to use from it.

For further information about variables, see the Talend Studio User Guide.
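
As a minimal sketch, assuming the component is named tFileOutputJSON_1 in your Job (a hypothetical name), a tJava component placed after it on an OnSubjobOk trigger could read these After variables from the globalMap:

    // Code for a tJava component; globalMap is provided by the generated Job.
    Integer nbLine = (Integer) globalMap.get("tFileOutputJSON_1_NB_LINE");
    String errorMessage = (String) globalMap.get("tFileOutputJSON_1_ERROR_MESSAGE");
    System.out.println("Rows written: " + nbLine);
    if (errorMessage != null) {
        System.err.println("Component error: " + errorMessage);
    }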

Usage

Usage rule

Use this component to rewrite received data in a JSON structured
output file.

Writing a JSON structured file

This is a two-component scenario in which a tRowGenerator component generates random data that a tFileOutputJSON component then writes to a JSON structured output file.

tFileOutputJSON_1.png

Procedure

  1. Drop a tRowGenerator and a tFileOutputJSON component onto the workspace from the
    Palette.
  2. Link the components using a Row > Main
    connection.
  3. Double-click tRowGenerator to define its Basic Settings properties in the
    Component view.

    tFileOutputJSON_2.png

  4. Click […] next to Edit Schema to display the corresponding dialog box and define
    the schema.

    tFileOutputJSON_3.png

  5. Click [+] to add the number of columns
    desired.
  6. Under Columns type in the column
    names.
  7. Under Type, select the data type from the
    list.
  8. Click OK to close the dialog box.
  9. Click [+] next to RowGenerator Editor to open the corresponding dialog box.

    tFileOutputJSON_4.png

  10. Under Functions, select pre-defined functions
    for the columns, if required, or select […]
    to set customized function parameters in the Function
    parameters
    tab.
  11. Enter the number of rows to be generated in the corresponding field.
  12. Click OK to close the dialog box.
  13. Click tFileOutputJSON to set its Basic Settings properties in the Component view.

    tFileOutputJSON_5.png

  14. Click […] to browse to where you want the
    output JSON file to be generated and enter the file name.
  15. Enter a name for the data block to be generated in the corresponding field,
    between double quotation marks.
  16. Select Built-In as the Schema type.
  17. Click Sync Columns to retrieve the schema
    from the preceding component.
  18. Press F6 to run the Job.

    tFileOutputJSON_6.png

The data from the input schema is written in a JSON structured data block in the
output file.
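
For reference, the following plain-Java sketch mirrors what this Job does conceptually: it generates a few random rows and writes them under a named data block. The file path, block name, column names, and row count are hypothetical stand-ins for the values you set in the Studio.

    import java.io.FileWriter;
    import java.io.IOException;
    import java.util.Random;

    public class JsonBlockSketch {
        public static void main(String[] args) throws IOException {
            Random rnd = new Random();
            // Build {"results":[{...},{...},...]} the way tFileOutputJSON
            // structures its output: one named block holding all rows.
            StringBuilder sb = new StringBuilder("{\"results\":[");
            int rows = 5; // stands in for the row count set in the RowGenerator Editor
            for (int i = 0; i < rows; i++) {
                if (i > 0) sb.append(',');
                sb.append("{\"id\":").append(i)
                  .append(",\"value\":").append(rnd.nextInt(100)).append('}');
            }
            sb.append("]}");
            try (FileWriter out = new FileWriter("out.json")) {
                out.write(sb.toString());
            }
        }
    }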

tFileOutputJSON MapReduce properties (deprecated)

These properties are used to configure tFileOutputJSON running in the MapReduce Job framework.

The MapReduce
tFileOutputJSON component belongs to the MapReduce family.

The component in this framework is available in all subscription-based Talend products with Big Data
and Talend Data Fabric.

The MapReduce framework is deprecated from Talend 7.3 onwards. Use Talend Jobs for Apache Spark to accomplish your integration tasks.

Basic settings

Schema and Edit Schema

A schema is a row description. It defines the number of fields
(columns) to be processed and passed on to the next component. When you create a Spark
Job, avoid the reserved word line when naming the
fields.

Click Edit schema to make changes to the schema. If the current schema is of the Repository type, three options are available:

  • View schema: choose this
    option to view the schema only.

  • Change to built-in property:
    choose this option to change the schema to Built-in for local changes.

  • Update repository connection:
    choose this option to change the schema stored in the repository and decide whether
    to propagate the changes to all the Jobs upon completion. If you just want to
    propagate the changes to the current Job, you can select No upon completion and choose this schema metadata
    again in the Repository Content
    window.

 

Built-In: You create and store the schema locally for this component
only.

 

Repository: You have already created the schema and stored it in the
Repository. You can reuse it in various projects and Job designs.

Folder

Enter the folder on HDFS where you want to store the JSON
output file(s).

The folder will be created automatically if it does not
exist.

Note that you need to ensure you have properly configured the connection to the Hadoop distribution to be used, in the Hadoop configuration tab of the Run view.

Output type

Select the structure for the JSON output file(s).

  • All in one block: the received
    data will be written into one data block.

  • One row per record: the received
    data will be written into separate data blocks row by row.

Name of data block

Type in the name of the data block for the JSON output
file(s).

This field will be available only if you select All in one block from the Output type list.
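
As a hypothetical illustration with two records (field names invented), the two output types differ as follows. All in one block, with the data block named "results":

    {"results":[{"id":1},{"id":2}]}

One row per record, a sketch of the likely shape, with each record written as its own block:

    {"id":1}
    {"id":2}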

Action

Select the action that you want to perform on the data:

  • Overwrite: the data on HDFS will
    be overwritten if it already exists.

  • Create: the data will be
    created.

Advanced settings

Use local timezone for date

Select this check box to use the local date of the machine on which your Job is executed. If you leave this check box clear, UTC is automatically used to format Date-type data.

Global Variables

Global Variables

ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable and it returns a string. This
variable functions only if the Die on error check box is
cleared, if the component has this check box.

A Flow variable functions during the execution of a component while an After variable
functions after the execution of the component.

To fill a field or expression with a variable, press Ctrl + Space to access the variable list and choose the variable to use from it.

For further information about variables, see the Talend Studio User Guide.

Usage

Usage rule

Use this component to rewrite received data in a JSON
structured output file.

In a Talend Map/Reduce Job, it is used as an end component and requires a transformation component as its input link. The other components used along with it must be Map/Reduce components too. They generate native Map/Reduce code that can be executed directly in Hadoop.

Once a Map/Reduce Job is opened in the workspace, tFileOutputJSON as well as the MapReduce family
appears in the Palette of the
Studio.

Note that in this documentation, unless otherwise explicitly stated, a scenario presents only Standard Jobs, that is, traditional Talend data integration Jobs, not Map/Reduce Jobs.

Hadoop Connection

You need to use the Hadoop Configuration tab in the
Run view to define the connection to a given Hadoop
distribution for the whole Job.

This connection is effective on a per-Job basis.

Prerequisites

The Hadoop distribution must be properly installed to guarantee interaction with Talend Studio. The following list presents MapR-related information as an example.

  • Ensure that you have installed the MapR client on the machine where the Studio is,
    and added the MapR client library to the PATH variable of that machine. According
    to MapR's documentation, the library or libraries of a MapR client corresponding to
    each OS version can be found under MAPR_INSTALL\hadoop\hadoop-VERSION\lib\native.
    For example, the library for Windows is \lib\native\MapRClient.dll in the MapR
    client jar file. For further information, see the following link from MapR: http://www.mapr.com/blog/basic-notes-on-configuring-eclipse-as-a-hadoop-development-environment-for-mapr.

    Without adding the specified library or libraries, you may encounter the following
    error: no MapRClient in java.library.path.

  • Set the -Djava.library.path argument, for example, in the Job Run VM arguments area
    of the Run/Debug view in the Preferences dialog box in the Window menu. This
    argument provides the Studio with the path to the native library of that MapR
    client, and allows subscription-based users to make full use of the Data viewer to
    view, locally in the Studio, the data stored in MapR. An example is shown below.
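
For example, the VM argument could look like the following line, where the path is a hypothetical MapR client installation directory; replace it with the actual location of the native library on your machine:

    -Djava.library.path="C:\opt\mapr\hadoop\hadoop-2.7.0\lib\native"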

For further information about how to install a Hadoop distribution, see the manuals
corresponding to the Hadoop distribution you are using.

Related scenarios

No scenario is available for the Map/Reduce version of this component yet.

tFileOutputJSON properties for Apache Spark Batch

These properties are used to configure tFileOutputJSON running in the Spark Batch Job framework.

The Spark Batch
tFileOutputJSON component belongs to the File family.

The component in this framework is available in all subscription-based Talend products with Big Data
and Talend Data Fabric.

Basic settings

Define a storage configuration component

Select the configuration component to be used to provide the configuration
information for the connection to the target file system such as HDFS.

If you leave this check box clear, the target file system is the local
system.

The configuration component to be used must be present in the same Job.
For example, if you have dropped a tHDFSConfiguration component in the Job, you can select it to write
the result in a given HDFS system.

Schema and Edit Schema

A schema is a row description. It defines the number of fields
(columns) to be processed and passed on to the next component. When you create a Spark
Job, avoid the reserved word line when naming the
fields.

Click Edit schema to make changes to the schema. If the current schema is of the Repository type, three options are available:

  • View schema: choose this
    option to view the schema only.

  • Change to built-in property:
    choose this option to change the schema to Built-in for local changes.

  • Update repository connection:
    choose this option to change the schema stored in the repository and decide whether
    to propagate the changes to all the Jobs upon completion. If you just want to
    propagate the changes to the current Job, you can select No upon completion and choose this schema metadata
    again in the Repository Content
    window.

 

Built-In: You create and store the schema locally for this component
only.

 

Repository: You have already created the schema and stored it in the
Repository. You can reuse it in various projects and Job designs.

Folder

Browse to, or enter the path pointing to the data to be used in the file system.

Note that this path must point to a folder rather than a
file.

The button for browsing does not work with the Spark Local mode; if you are using the other Spark Yarn modes that the Studio supports with your distribution, ensure that you have properly configured the connection in a configuration component in the same Job, such as tHDFSConfiguration. Use the configuration component that corresponds to the file system to be used.

Output type

Select the structure for the JSON output file(s).

  • All in one block: the received
    data will be written into one data block.

  • One row per record: the received
    data will be written into separate data blocks row by row.

Name of data block

Type in the name of the data block for the JSON output
file(s).

This field will be available only if you select All in one block from the Output type list.

Action

Select the action that you want to perform on the data:

  • Overwrite: the data on HDFS will
    be overwritten if it already exists.

  • Create: the data will be
    created.

Advanced settings

Use local timezone for date

Select this check box to use the local date of the machine on which your Job is executed. If you leave this check box clear, UTC is automatically used to format Date-type data.

Usage

Usage rule

This component is used as an end component and requires an input link.

This component, along with the Spark Batch component Palette it belongs to,
appears only when you are creating a Spark Batch Job.

Note that in this documentation, unless otherwise explicitly stated, a scenario presents only Standard Jobs, that is to say traditional Talend data integration Jobs.

Spark Connection

In the Spark Configuration tab in the Run view, define the connection to a given Spark cluster for the whole Job. In addition, since the Job expects its dependent jar files for execution, you must specify the directory in the file system to which these jar files are transferred so that Spark can access these files:

  • Yarn mode (Yarn client or Yarn cluster):

    • When using Google Dataproc, specify a bucket in the
      Google Storage staging bucket
      field in the Spark configuration
      tab.

    • When using HDInsight, specify the blob to be used for Job
      deployment in the Windows Azure Storage
      configuration
      area in the Spark
      configuration
      tab.

    • When using Altus, specify the S3 bucket or the Azure
      Data Lake Storage for Job deployment in the Spark
      configuration
      tab.
    • When using Qubole, add a
      tS3Configuration to your Job to write
      your actual business data in the S3 system with Qubole. Without
      tS3Configuration, this business data is
      written in the Qubole HDFS system and destroyed once you shut
      down your cluster.
    • When using on-premise
      distributions, use the configuration component corresponding
      to the file system your cluster is using. Typically, this
      system is HDFS and so use tHDFSConfiguration.

  • Standalone mode: use the
    configuration component corresponding to the file system your cluster is
    using, such as tHDFSConfiguration or
    tS3Configuration.

    If you are using Databricks without any configuration component present
    in your Job, your business data is written directly in DBFS (Databricks
    Filesystem).

This connection is effective on a per-Job basis.

Related scenarios

No scenario is available for the Spark Batch version of this component
yet.

tFileOutputJSON properties for Apache Spark Streaming

These properties are used to configure tFileOutputJSON running in the Spark Streaming Job framework.

The Spark Streaming
tFileOutputJSON component belongs to the File family.

This component is available in Talend Real Time Big Data Platform and Talend Data Fabric.

Basic settings

Define a storage configuration component

Select the configuration component to be used to provide the configuration
information for the connection to the target file system such as HDFS.

If you leave this check box clear, the target file system is the local
system.

The configuration component to be used must be present in the same Job.
For example, if you have dropped a tHDFSConfiguration component in the Job, you can select it to write
the result in a given HDFS system.

Schema and Edit Schema

A schema is a row description. It defines the number of fields
(columns) to be processed and passed on to the next component. When you create a Spark
Job, avoid the reserved word line when naming the
fields.

Click Edit schema to make changes to the schema. If the current schema is of the Repository type, three options are available:

  • View schema: choose this
    option to view the schema only.

  • Change to built-in property:
    choose this option to change the schema to Built-in for local changes.

  • Update repository connection:
    choose this option to change the schema stored in the repository and decide whether
    to propagate the changes to all the Jobs upon completion. If you just want to
    propagate the changes to the current Job, you can select No upon completion and choose this schema metadata
    again in the Repository Content
    window.

 

Built-In: You create and store the schema locally for this component
only.

 

Repository: You have already created the schema and stored it in the
Repository. You can reuse it in various projects and Job designs.

Folder

Browse to, or enter the path pointing to the data to be used in the file system.

Note that this path must point to a folder rather than a
file.

The button for browsing does not work with the Spark Local mode; if you are using the other Spark Yarn modes that the Studio supports with your distribution, ensure that you have properly configured the connection in a configuration component in the same Job, such as tHDFSConfiguration. Use the configuration component that corresponds to the file system to be used.

Output type

Select the structure for the JSON output file(s).

  • All in one block: the received
    data will be written into one data block.

  • One row per record: the received
    data will be written into separate data blocks row by row.

Name of data block

Type in the name of the data block for the JSON output
file(s).

This field will be available only if you select All in one block from the Output type list.

Action

Select the action that you want to perform on the data:

  • Overwrite: the data on HDFS will
    be overwritten if it already exists.

  • Create: the data will be
    created.

Advanced settings

Write empty batches

Select this check box to allow your Spark Job to create an empty batch when the incoming batch is empty.

For further information about when this is desirable behavior, see this discussion.

Use local timezone for date

Select this check box to use the local date of the machine on which your Job is executed. If you leave this check box clear, UTC is automatically used to format Date-type data.

Usage

Usage rule

This component is used as an end component and requires an input link.

This component, along with the Spark Streaming component Palette it belongs to, appears
only when you are creating a Spark Streaming Job.

Note that in this documentation, unless otherwise explicitly stated, a scenario presents only Standard Jobs, that is to say traditional Talend data integration Jobs.

Spark Connection

In the Spark Configuration tab in the Run view, define the connection to a given Spark cluster for the whole Job. In addition, since the Job expects its dependent jar files for execution, you must specify the directory in the file system to which these jar files are transferred so that Spark can access these files:

  • Yarn mode (Yarn client or Yarn cluster):

    • When using Google Dataproc, specify a bucket in the
      Google Storage staging bucket
      field in the Spark configuration
      tab.

    • When using HDInsight, specify the blob to be used for Job
      deployment in the Windows Azure Storage
      configuration
      area in the Spark
      configuration
      tab.

    • When using Altus, specify the S3 bucket or the Azure
      Data Lake Storage for Job deployment in the Spark
      configuration
      tab.
    • When using Qubole, add a
      tS3Configuration to your Job to write
      your actual business data in the S3 system with Qubole. Without
      tS3Configuration, this business data is
      written in the Qubole HDFS system and destroyed once you shut
      down your cluster.
    • When using on-premise
      distributions, use the configuration component corresponding
      to the file system your cluster is using. Typically, this
      system is HDFS and so use tHDFSConfiguration.

  • Standalone mode: use the
    configuration component corresponding to the file system your cluster is
    using, such as tHDFSConfiguration or
    tS3Configuration.

    If you are using Databricks without any configuration component present
    in your Job, your business data is written directly in DBFS (Databricks
    Filesystem).

This connection is effective on a per-Job basis.

Related scenarios

No scenario is available for the Spark Streaming version of this component
yet.


Source: Talend, https://help.talend.com