Warning
This component will be available in the Palette of Talend Studio on the condition that you have subscribed to one of the Talend solutions with Big Data.
Component family
Big Data / Hadoop
Function
tHDFSInput reads a file located on a given Hadoop distributed file system (HDFS) and puts the data it extracts into a Talend flow. If you have subscribed to one of the Talend solutions with Big Data, you can also use this component in a Talend Map/Reduce Job to generate native Map/Reduce code.
Purpose
tHDFSInput extracts the data in an HDFS file for other components to process it.
Basic settings
Property type
Either Built-in or Repository.
Built-in: No property data stored centrally.
Repository: Select the repository file in which the properties are stored. Since version 5.6, both the Built-In mode and the Repository mode are available in any of the Talend solutions.
Schema and Edit schema
A schema is a row description. It defines the number of fields (columns) to be processed and passed on to the next component. Since version 5.6, both the Built-In mode and the Repository mode are available in any of the Talend solutions. Click Edit schema to make changes to the schema. If the schema is changed, it automatically becomes built-in.
Built-In: You create and store the schema locally for this component only.
Repository: You have already created the schema and stored it in the Repository. You can reuse it in various projects and Job designs.
Use an existing connection
Select this check box and in the Component List click the relevant connection component to reuse the connection details you already defined.
Note: When a Job contains the parent Job and the child Job, Component List presents only the connection components in the same Job level.
Version
Distribution
Select the cluster you are using from the drop-down list. The options in the list vary depending on the component you are using.
In order to connect to a custom distribution, once selecting Custom, click the [...] button to display the dialog box in which you can define that custom setup.
Hadoop version
Select the version of the Hadoop distribution you are using. The available options vary depending on the component you are using.
Authentication
Use kerberos authentication
If you are accessing the Hadoop cluster running with Kerberos security, select this check box, then enter the Kerberos principal name for the NameNode in the field displayed.
This check box is available depending on the Hadoop distribution you are connecting to.
Use a keytab to authenticate
Select the Use a keytab to authenticate check box to log into a Kerberos-enabled system using a given keytab file. A keytab file contains pairs of Kerberos principals and encrypted keys.
Note that the user that executes a keytab-enabled Job is not necessarily the one a principal designates but must have the right to read the keytab file being used.
NameNode URI
Type in the URI of the Hadoop NameNode. The NameNode is the master node of a Hadoop system.
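As a rough illustration, a NameNode URI takes the form hdfs://host:port; the host name and port below are placeholders, not values from this document. A small sketch of how such a URI breaks down:

```python
from urllib.parse import urlparse

def parse_namenode_uri(uri: str):
    """Split an HDFS NameNode URI into its scheme, host, and port."""
    parts = urlparse(uri)
    if parts.scheme != "hdfs":
        raise ValueError("expected an hdfs:// URI, got: " + uri)
    return parts.scheme, parts.hostname, parts.port

# Hypothetical host and port, for illustration only.
print(parse_namenode_uri("hdfs://masternode:8020"))  # → ('hdfs', 'masternode', 8020)
```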
User name
Enter the user authentication name of HDFS.
Group
Enter the membership including the authentication user under which the HDFS instances were started. This field is available depending on the distribution you are using.
File Name
Browse to, or enter the directory in HDFS where the data you need to use is stored. If the path you set points to a folder, this component will read all of the files stored in that folder.
File type
Type
Select the type of the file to be processed. The type of the file may be Text File or Sequence File.
Row separator
Enter the separator used to identify the end of a row.
This field is not available for a Sequence file.
Field separator
Enter a character, string or regular expression to separate fields for the transferred data.
This field is not available for a Sequence file.
Header
Set values to ignore the header of the transferred data. For example, enter 0 to ignore no rows for data without a header.
This field is not available for a Sequence file.
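For text files, the row separator, field separator and header settings behave like a plain delimited-text parse. A minimal sketch of that behavior; the sample data and separators below are illustrative, not taken from this document:

```python
def parse_delimited(text, row_sep="\n", field_sep=";", header=1):
    """Split text into rows, skip `header` leading rows, then split fields."""
    rows = [r for r in text.split(row_sep) if r]
    return [row.split(field_sep) for row in rows[header:]]

# Illustrative sample: one header row, then two data rows.
sample = "id;name\n1;alice\n2;bob"
print(parse_delimited(sample))  # → [['1', 'alice'], ['2', 'bob']]
```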
Custom encoding
You may encounter encoding issues when you process the stored data. In that situation, select this check box to display the Encoding list. Select the encoding from the list or select Custom and define it manually.
This option is not available for a Sequence file.
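Selecting an explicit encoding matters when the stored bytes are not UTF-8. A small illustration; the sample text and charsets are invented for the example:

```python
# Bytes as they might sit in HDFS, written with a Latin-1 charset.
raw = "café".encode("latin-1")

# Decoding with the matching charset recovers the text...
assert raw.decode("latin-1") == "café"

# ...while assuming UTF-8 fails outright on this byte sequence.
try:
    raw.decode("utf-8")
except UnicodeDecodeError:
    print("not valid UTF-8; select the matching encoding instead")
```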
Compression
Select the Uncompress the data check box to uncompress the input data. Hadoop provides different compression formats that help reduce the space needed for storing files and speed up data transfer.
This option is not available for a Sequence file.
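For instance, gzip is one compression codec commonly used with Hadoop; the round trip below sketches what "uncompress the data" amounts to (the payload is invented for the example):

```python
import gzip

payload = b"1;alice\n2;bob\n"        # illustrative row data
compressed = gzip.compress(payload)  # as it might be stored in HDFS

# Reading the file back requires the inverse transformation.
assert gzip.decompress(compressed) == payload
print(len(payload), "raw bytes round-tripped through gzip")
```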
Advanced settings
Include sub-directories if path is directory
Select this check box to read not only the folder you have specified but also its sub-folders.
Hadoop properties
Talend Studio uses a default configuration for its engine to perform operations in a Hadoop distribution. If you need to use a custom configuration in a specific situation, complete this table with the property or properties to be customized; at runtime, the customized property or properties override the default ones.
For further information about the properties required by Hadoop and its related systems such as HDFS, see the documentation of the Hadoop distribution you are using.
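For example, dfs.replication is a standard HDFS property; an entry in this table takes a quoted property name and a quoted value (the value below is purely illustrative, not a recommendation):

```
Property             Value
"dfs.replication"    "1"
```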
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to choose your HDFS connection dynamically. The Dynamic settings table is available only when the Use an existing connection check box is selected. For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Global Variables
ERROR_MESSAGE: the error message generated by the component when an error occurs. This is an After variable and it returns a string.
A Flow variable functions during the execution of a component while an After variable functions after the execution of the component.
To fill up a field or expression with a variable, press Ctrl + Space to access the variable list and choose the variable to use from it.
For further information about variables, see Talend Studio User Guide.
Usage
This component needs an output link.
Prerequisites
The Hadoop distribution must be properly installed, so as to guarantee the interaction with Talend Studio.
For further information about how to install a Hadoop distribution, see the manuals corresponding to the Hadoop distribution you are using.
Log4j
The activity of this component can be logged using the log4j feature. For more information on this feature, see Talend Studio User Guide. For more information on the log4j logging levels, see the Apache documentation at http://logging.apache.org/log4j/1.2/apidocs/org/apache/log4j/Level.html.
Limitations
JRE 1.6+ is required.
Warning
The information in this section is only for users that have subscribed to one of the Talend solutions with Big Data and is not applicable to Talend Open Studio for Big Data users.
In a Talend Map/Reduce Job, tHDFSInput, as well as the whole Map/Reduce Job using it, generates
native Map/Reduce code. This section presents the specific properties of tHDFSInput when it is used in that situation. For further
information about a Talend Map/Reduce Job, see the Talend Big Data Getting Started Guide.
Component family
MapReduce / Input
Basic settings
Property type
Either Built-in or Repository.
Built-in: no property data stored centrally.
Repository: reuse properties stored in the Repository. The fields that come after are pre-filled in using the fetched data. For further information about the Hadoop connection metadata in the Repository, see Talend Studio User Guide.
Schema and Edit schema
A schema is a row description. It defines the number of fields (columns) to be processed and passed on to the next component. Click Edit Schema to make changes to the schema. Note that if you make changes, the schema automatically becomes built-in.
Built-In: You create and store the schema locally for this component only.
Repository: You have already created the schema and stored it in the Repository. You can reuse it in various projects and Job designs.
Folder/File
Browse to, or enter the directory in HDFS where the data you need to use is stored. If the path you set points to a folder, this component will read all of the files stored in that folder. If you want to specify more than one file or directory in this field, separate each path using a comma (,). If the file to be read is a compressed one, enter the file name with its extension; the component then automatically uncompresses it at runtime.
Note that you need to define the connection to the Hadoop distribution to be used in the Hadoop Configuration tab of the Run view.
Die on error
Select this check box to stop the execution of the Job when an error occurs. Clear the check box to skip any rows on error and complete the process for error-free rows.
File type
Type
Select the type of the file to be processed. The type of the file may be Text File or Sequence File.
Row separator
Enter the separator used to identify the end of a row.
This field is not available for a Sequence file.
Field separator
Enter a character, string or regular expression to separate fields for the transferred data.
This field is not available for a Sequence file.
Header
Enter the number of rows to be skipped in the beginning of the file. For example, enter 0 to ignore no rows for data without a header.
This field is not available for a Sequence file.
Custom Encoding
You may encounter encoding issues when you process the stored data. In that situation, select this check box to display the Encoding list. Then select the encoding to be used from the list or select Custom and define it manually.
This option is not available for a Sequence file.
Advanced settings
Advanced separator (for number)
Select this check box to change the separator used for numbers. By default, the thousands separator is a comma (,) and the decimal separator is a period (.).
Trim all columns
Select this check box to remove the leading and trailing whitespace from all columns.
Check column to trim
This table is filled automatically with the schema being used. Select the check box(es) corresponding to the column(s) to be trimmed.
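Trimming selected columns amounts to stripping whitespace from only the chosen fields of each row. A minimal sketch; the column flags and data are invented for the example:

```python
def trim_columns(rows, trim_flags):
    """Strip whitespace from the fields whose corresponding flag is set."""
    return [
        [field.strip() if flag else field
         for field, flag in zip(row, trim_flags)]
        for row in rows
    ]

rows = [[" 1 ", " alice "], [" 2 ", " bob "]]
# Trim only the second column, as if only its check box were selected.
print(trim_columns(rows, [False, True]))  # → [[' 1 ', 'alice'], [' 2 ', 'bob']]
```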
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Global Variables
ERROR_MESSAGE: the error message generated by the component when an error occurs. This is an After variable and it returns a string.
A Flow variable functions during the execution of a component while an After variable functions after the execution of the component.
To fill up a field or expression with a variable, press Ctrl + Space to access the variable list and choose the variable to use from it.
For further information about variables, see Talend Studio User Guide.
Usage
In a Talend Map/Reduce Job, it is used as a start component and requires a transformation component as output link. Once a Map/Reduce Job is opened in the workspace, tHDFSInput as well as the whole MapReduce component family appears in the Palette of the Studio. Note that in this documentation, unless otherwise explicitly stated, a scenario presents only Standard Jobs, that is to say traditional Talend data integration Jobs.
Hadoop Connection
You need to use the Hadoop Configuration tab in the Run view to define the connection to a given Hadoop distribution for the whole Job. This connection is effective on a per-Job basis.
- Related topic, see Scenario 1: Writing data in a delimited file.
- Related topic, see Scenario: Computing data with Hadoop distributed file system.
If you are a subscription-based Big Data user, you can also consult a Talend Map/Reduce Job using the Map/Reduce version of tHDFSInput:
