
Warning
This component will be available in the Palette of Talend Studio on the condition that you have subscribed to one of the Talend solutions with Big Data.
Component family | Big Data / Hadoop
Function | tHDFSOutput writes data flows it receives into a given Hadoop distributed file system (HDFS). If you have subscribed to one of the Talend solutions with Big Data, you can also use this component in a Talend Map/Reduce Job; the Map/Reduce-specific properties are described later in this section.
Purpose | tHDFSOutput transfers data flows into a given Hadoop distributed file system (HDFS).
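The operation behind this component is a plain HDFS file write. The following minimal Java sketch, given for orientation only, shows the equivalent call sequence with the Hadoop FileSystem API; the NameNode URI, target path, separators and sample rows are assumptions for the example and are not values generated by the Studio.

    // Minimal sketch of an HDFS text write, assuming a NameNode at masternode:8020.
    import java.io.BufferedWriter;
    import java.io.OutputStreamWriter;
    import java.net.URI;
    import java.nio.charset.StandardCharsets;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsWriteSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // NameNode URI, as entered in the component's Basic settings (assumed value).
            FileSystem fs = FileSystem.get(URI.create("hdfs://masternode:8020"), conf);
            Path target = new Path("/user/talend/out/customers.csv"); // assumed File Name
            try (BufferedWriter writer = new BufferedWriter(
                    new OutputStreamWriter(fs.create(target, true), StandardCharsets.UTF_8))) {
                writer.write("id;name" + "\n"); // header row (Include header)
                writer.write("1;Alice" + "\n"); // data rows from the incoming flow
            }
            fs.close();
        }
    }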
Basic settings
Property type | Either Built-in or Repository.
Built-in: No property data is stored centrally.
Repository: Select the repository file in which the properties are stored. Since version 5.6, both the Built-In mode and the Repository mode are available in any of the Talend solutions.
Schema and Edit schema | A schema is a row description. It defines the number of fields to be processed and passed on to the next component. Since version 5.6, both the Built-In mode and the Repository mode are available in any of the Talend solutions. Click Edit schema to make changes to the schema. If the current schema is of the Repository type, three options are available: View schema, Change to built-in property, and Update repository connection.
Built-In: You create and store the schema locally for this component only.
Repository: You have already created the schema and stored it in the Repository. You can reuse it in various projects and Job designs.
Use an existing connection | Select this check box and in the Component List click the relevant connection component to reuse the connection details you already defined. Note: when a Job contains the parent Job and the child Job, the Component List presents only the connection components in the same Job level.
Version
Distribution | Select the cluster you are using from the drop-down list. The options in the list vary depending on the component you are using. In order to connect to a custom distribution, once you have selected Custom, click the [...] button to display the dialog box in which you can define the configuration of that custom distribution.
Hadoop version | Select the version of the Hadoop distribution you are using. The available options vary depending on the component you are using.
Authentication
Use kerberos authentication | If you are accessing the Hadoop cluster running with Kerberos security, select this check box, then enter the Kerberos principal name for the NameNode in the field displayed. This check box is available depending on the Hadoop distribution you are connecting to.
Use a keytab to authenticate | Select the Use a keytab to authenticate check box to log into a Kerberos-enabled Hadoop system using a given keytab file. A keytab file contains pairs of Kerberos principals and encrypted keys. Enter the principal to be used in the Principal field and the access path to the keytab file in the Keytab field. Note that the user that executes a keytab-enabled Job is not necessarily the one a principal designates but must have the right to read the keytab file being used.
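For reference, the sketch below shows how a keytab-based Kerberos login is typically performed with the Hadoop client API before any HDFS access; the principal and keytab path are illustrative assumptions standing in for the Principal and Keytab fields.

    // Hedged sketch of a keytab login, assuming Kerberos-enabled Hadoop client libraries.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.security.UserGroupInformation;

    public class KerberosLoginSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            conf.set("hadoop.security.authentication", "kerberos");
            UserGroupInformation.setConfiguration(conf);
            // Assumed principal and keytab path, mirroring the Principal and Keytab fields.
            UserGroupInformation.loginUserFromKeytab(
                    "hdfsuser@EXAMPLE.COM", "/etc/security/keytabs/hdfsuser.keytab");
            // Any FileSystem obtained after this call acts as the keytab's principal.
        }
    }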
NameNode URI | Type in the URI of the Hadoop NameNode. The NameNode is the master node of a Hadoop system. For example, if a machine named masternode hosts the NameNode, its location is hdfs://masternode:portnumber.
User name | Enter the user authentication name of HDFS.
Group | Enter the membership, including the authentication user, under which the HDFS instances were started. This field is available depending on the distribution you are using.
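In simple (non-Kerberos) authentication, the User name field corresponds to the user the HDFS client connects as. A small sketch of that mapping, with an assumed URI and user name:

    // Sketch: connect to HDFS as a given simple-authentication user (assumed values).
    import java.net.URI;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;

    public class HdfsUserSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // The third argument plays the same role as exporting HADOOP_USER_NAME
            // before running a command-line client.
            FileSystem fs = FileSystem.get(URI.create("hdfs://masternode:8020"), conf, "hdfs_user");
            System.out.println(fs.getUri());
            fs.close();
        }
    }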
File Name | Browse to, or enter the location of the file in HDFS to which you write the data.
File type
Type | Select the type of the file to be processed. The type of the file may be: Text file, or Sequence file (a flat file consisting of binary key/value pairs).
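A Sequence file stores binary key/value pairs rather than delimited text, which is why the text-oriented options below do not apply to it. A hedged sketch of writing one with the Hadoop API, using assumed path and key/value types:

    // Sketch: write a Sequence file of LongWritable/Text pairs (assumed types and path).
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Text;

    public class SequenceFileSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Path target = new Path("hdfs://masternode:8020/user/talend/out/data.seq");
            try (SequenceFile.Writer writer = SequenceFile.createWriter(conf,
                    SequenceFile.Writer.file(target),
                    SequenceFile.Writer.keyClass(LongWritable.class),
                    SequenceFile.Writer.valueClass(Text.class))) {
                writer.append(new LongWritable(1L), new Text("1;Alice"));
            }
        }
    }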
Action | Select an operation in HDFS:
Create: Creates a file with data using the file name defined.
Overwrite: Overwrites the data in the file defined.
Append: Inserts the data into the file defined; the file is created if it does not exist.
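In HDFS client terms, the three actions roughly correspond to creating a file without overwrite, creating it with overwrite, and appending to it. The sketch below illustrates that correspondence; it is an interpretation, not the component's generated code.

    // Sketch: map the Action choices to FileSystem calls (assumed helper, not Studio code).
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsActionSketch {
        static FSDataOutputStream open(FileSystem fs, Path file, String action) throws Exception {
            switch (action) {
                case "CREATE":    return fs.create(file, false); // fails if the file already exists
                case "OVERWRITE": return fs.create(file, true);  // replaces the existing data
                case "APPEND":    return fs.exists(file) ? fs.append(file) : fs.create(file, false);
                default: throw new IllegalArgumentException(action);
            }
        }
    }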
Row separator | Enter the separator used to identify the end of a row. This field is not available for a Sequence file.
Field separator | Enter a character, string, or regular expression to separate fields for the transferred data. This field is not available for a Sequence file.
Custom encoding | You may encounter encoding issues when you process the stored data. In that situation, select this check box to display the Encoding list. Select the encoding from the list, or select Custom and define it manually. This option is not available for a Sequence file.
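Choosing a custom encoding amounts to wrapping the HDFS output stream in a writer that uses the selected charset, as in this small sketch (the charset name is an assumed example):

    // Sketch: write text to HDFS with an explicit, non-default encoding.
    import java.io.OutputStreamWriter;
    import java.io.Writer;
    import java.nio.charset.Charset;

    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class EncodingSketch {
        static Writer openWriter(FileSystem fs, Path file) throws Exception {
            // "ISO-8859-15" stands in for the encoding selected in the component.
            return new OutputStreamWriter(fs.create(file, true), Charset.forName("ISO-8859-15"));
        }
    }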
Compression | Select the Compress the data check box to compress the output data. Hadoop provides different compression formats that help reduce the space needed for storing files and speed up data transfer.
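Under the hood, such compression is done by wrapping the output stream in one of Hadoop's compression codecs. The sketch below uses GzipCodec as an assumed example; the codec the component actually applies depends on your settings and distribution.

    // Sketch: compress the HDFS output stream with a Hadoop codec (GzipCodec assumed).
    import java.io.OutputStream;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.compress.CompressionCodec;
    import org.apache.hadoop.io.compress.GzipCodec;
    import org.apache.hadoop.util.ReflectionUtils;

    public class CompressionSketch {
        static OutputStream openCompressed(FileSystem fs, Configuration conf, Path file) throws Exception {
            CompressionCodec codec = ReflectionUtils.newInstance(GzipCodec.class, conf);
            // Rows written through this stream are stored compressed in HDFS.
            return codec.createOutputStream(fs.create(file, true));
        }
    }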
Include header | Select this check box to output the header of the data. This option is not available for a Sequence file.
Advanced settings
Hadoop properties | Talend Studio uses a default configuration for its engine to perform operations in Hadoop. If you need a custom configuration in a specific situation, complete this table with the property or properties to be customized; at runtime, the customized properties override the default ones. For further information about the properties required by Hadoop and its related systems such as HDFS, see the documentation of the Hadoop distribution you are using or Apache's Hadoop documentation.
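Each row of the Hadoop properties table is effectively a name/value pair applied on top of the default client configuration, as this sketch illustrates; the property names shown are common HDFS client settings chosen as examples.

    // Sketch: override default Hadoop client properties, as the properties table does.
    import org.apache.hadoop.conf.Configuration;

    public class HadoopPropertiesSketch {
        static Configuration customized() {
            Configuration conf = new Configuration();
            conf.set("dfs.replication", "2");
            conf.set("dfs.client.use.datanode.hostname", "true");
            return conf; // customized values override those loaded from the *-site.xml defaults
        }
    }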
tStatCatcher Statistics | Select this check box to collect log data at the component level.
Dynamic settings | Click the [+] button to add a row in the table and fill the Code field with a context variable to choose your HDFS connection dynamically from multiple connections planned in your Job. The Dynamic settings table is available only when the Use an existing connection check box is selected in the Basic settings view. For more information on Dynamic settings and context variables, see the Talend Studio User Guide.
Global Variables | ERROR_MESSAGE: the error message generated by the component when an error occurs. This is an After variable and it returns a string. A Flow variable functions during the execution of a component, while an After variable functions after the execution of the component. To fill up a field or expression with a variable, press Ctrl + Space to access the variable list and choose the variable to use from it. For further information about variables, see the Talend Studio User Guide.
Usage | This component needs an input component.
Prerequisites | The Hadoop distribution must be properly installed, so as to guarantee the interaction with Talend Studio. For further information about how to install a Hadoop distribution, see the manuals corresponding to the Hadoop distribution you are using.
Log4j | The activity of this component can be logged using the log4j feature. For more information on this feature, see the Talend Studio User Guide. For more information on the log4j logging levels, see the Apache documentation at http://logging.apache.org/log4j/1.2/apidocs/org/apache/log4j/Level.html.
Limitations | JRE 1.6+ is required.
Warning
The information in this section is only for users that have subscribed to one of the Talend solutions with Big Data and is not applicable to Talend Open Studio for Big Data users.
In a Talend Map/Reduce Job, tHDFSOutput, as well as the other Map/Reduce components preceding it, generates native Map/Reduce code. This section presents the specific properties of tHDFSOutput when it is used in that situation. For further information about a Talend Map/Reduce Job, see the Talend Big Data Getting Started Guide.
Component family | MapReduce / Output
Basic settings
Property type | Either Built-in or Repository.
Built-in: No property data is stored centrally.
Repository: Reuse properties stored centrally under the Hadoop Cluster node of the Repository tree. The fields that come after are pre-filled in using the fetched data. For further information about the Hadoop Cluster node, see the Talend Big Data Getting Started Guide.
Schema and Edit schema | A schema is a row description. It defines the number of fields to be processed and passed on to the next component.
Built-In: You create and store the schema locally for this component only.
Repository: You have already created the schema and stored it in the Repository. You can reuse it in various projects and Job designs.
Folder | Browse to, or enter the directory in HDFS where the data is written. This path must point to a folder rather than a file, because a Talend Map/Reduce Job needs to write in its target folder not only the final result but also the multiple part- files generated in performing the Map/Reduce computations. Note that you need to ensure that the connection to the Hadoop distribution to be used is properly configured in the Hadoop Configuration tab of the Run view.
File type
Type | Select the type of the file to be processed. The type of the file may be: Text file, or Sequence file (a flat file consisting of binary key/value pairs).
Action | Select an operation in HDFS:
Create: Creates a file and writes data in it.
Overwrite: Overwrites the file existing in the directory specified in the Folder field.
Row separator | Enter the separator used to identify the end of a row. This field is not available for a Sequence file.
Field separator | Enter a character, string, or regular expression to separate fields for the transferred data. This field is not available for a Sequence file.
Include header | Select this check box to output the header of the data. This option is not available for a Sequence file.
Custom encoding | You may encounter encoding issues when you process the stored data. In that situation, select this check box to display the Encoding list. Select the encoding from the list, or select Custom and define it manually. This option is not available for a Sequence file.
Compression | Select the Compress the data check box to compress the output data. Hadoop provides different compression formats that help reduce the space needed for storing files and speed up data transfer.
Merge result to single file | Select this check box to merge the final part files into a single file and put that file in a specified directory. Once you select it, you need to enter the path to, or browse to, the folder in which you want to store the merged file. The following check boxes are used to manage the source and the target files:
Remove source dir: select this check box to remove the source files after the merge.
Override target file: select this check box to override the file already existing in the target location; this option does not override the folder.
This option is not available for a Sequence file.
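The merge performed here is comparable to concatenating the part- files of the output folder into one target file. A hedged sketch using FileUtil.copyMerge from the Hadoop 1.x/2.x client libraries of this era (the call was removed in later Hadoop releases); the paths are assumptions:

    // Sketch: merge part- files into a single HDFS file (assumed paths).
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.FileUtil;
    import org.apache.hadoop.fs.Path;

    public class MergeSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);
            Path partsDir = new Path("/user/talend/out");             // folder holding the part-* files
            Path merged = new Path("/user/talend/merged/result.csv"); // target single file
            // deleteSource=false keeps the part- files; pass true to remove the source directory.
            FileUtil.copyMerge(fs, partsDir, fs, merged, false, conf, null);
        }
    }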
Advanced settings
Advanced separator (for number) | Select this check box to change the separators used for numbers. By default, the thousands separator is a comma (,) and the decimal separator is a period (.).
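The effect of swapping the separators can be seen with a plain Java number format, as in this illustrative sketch (the chosen separators are examples, not the component's defaults):

    // Sketch: format a number with custom thousands and decimal separators.
    import java.text.DecimalFormat;
    import java.text.DecimalFormatSymbols;

    public class NumberSeparatorSketch {
        public static void main(String[] args) {
            DecimalFormatSymbols symbols = new DecimalFormatSymbols();
            symbols.setGroupingSeparator('.'); // thousands separator
            symbols.setDecimalSeparator(',');  // decimal separator
            DecimalFormat format = new DecimalFormat("#,##0.00", symbols);
            System.out.println(format.format(1234567.89)); // prints 1.234.567,89
        }
    }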
Global Variables | ERROR_MESSAGE: the error message generated by the component when an error occurs. This is an After variable and it returns a string. A Flow variable functions during the execution of a component, while an After variable functions after the execution of the component. To fill up a field or expression with a variable, press Ctrl + Space to access the variable list and choose the variable to use from it. For further information about variables, see the Talend Studio User Guide.
Usage | In a Talend Map/Reduce Job, it is used as an end component and requires a transformation component as input link. Once a Map/Reduce Job is opened in the workspace, tHDFSOutput, as well as the rest of the MapReduce family, appears in the Palette of the Studio. Note that in this documentation, unless otherwise explicitly stated, a scenario presents only Standard Jobs, that is to say traditional Talend data integration Jobs, and non Map/Reduce Jobs.
Hadoop Connection | You need to use the Hadoop Configuration tab in the Run view to define the connection to a given Hadoop distribution for the whole Job. This connection is effective on a per-Job basis.
- Related topic: see Scenario 1: Writing data in a delimited file.
- Related topic: see Scenario: Computing data with Hadoop distributed file system.
If you are a subscription-based Big Data user, you can also consult a Talend Map/Reduce Job that uses the Map/Reduce version of tHDFSOutput: