
Component family |
File/Input |
|
Function |
tFileInputFullRow reads complete rows from a given file. If you have subscribed to one of the Talend solutions with Big Data, you can also use this component in a Talend Map/Reduce Job. |
|
Purpose |
tFileInputFullRow opens a file and reads it row by row, sending each complete row to the next component. |
|
Basic settings |
Schema and Edit |
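A schema is a row description. It defines the number of fields to be processed and passed on to the next component. Since version 5.6, both the Built-In mode and the Repository mode are available. Click Edit schema to make changes to the schema. For this component, the schema is read-only and consists of one column, line. |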
|
|
File Name |
Name of the file and/or the variable to be processed. For further information about how to define and use a variable in a Job, see Talend Studio User Guide. |
|
Row separator |
Enter the separator used to identify the end of a row. |
Header |
Enter the number of rows to be skipped at the beginning of the file. |
|
|
Footer |
Enter the number of rows to be skipped at the end of the file. |
|
Limit |
Maximum number of rows to be processed. If Limit = 0, no row is read or processed. |
|
Skip empty rows |
Select this check box to skip the empty rows. |
|
Die on error |
Select this check box to stop the execution of the Job when an error occurs. Clear the check box to skip any rows on error and complete the process for error-free rows. |
Advanced settings |
Encoding |
Select the encoding from the list or select Custom and define it manually. |
|
Extract lines at random |
Select this check box to set the number of lines to be extracted at random. |
|
tStatCatcher Statistics |
Select this check box to gather the Job processing metadata at a Job level as well as at each component level. |
Global Variables |
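NB_LINE: the number of rows processed. This is an After variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the component when an error occurs. This is an After variable and it returns a string.
A Flow variable functions during the execution of a component while an After variable functions after the execution of the component.
To fill up a field or expression with a variable, press Ctrl + Space to access the variable list and choose the variable to use from it.
For further information about variables, see Talend Studio User Guide. |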
|
Usage |
Use this component to read full rows in delimited files. |
|
Log4j |
The activity of this component can be logged using the log4j feature. For more information on this feature, see Talend Studio User Guide. For more information on the log4j logging levels, see the Apache documentation at http://logging.apache.org/log4j/1.2/apidocs/org/apache/log4j/Level.html. |
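To show how the Row separator, Header, Footer, Limit, Skip empty rows, Encoding and Extract lines at random settings interact, here is a minimal plain-Java sketch. It is not the code Talend generates. The class `FullRowReader` and its parameters are hypothetical, and several conventions are assumptions: a negative `limit` stands in for an empty Limit field, Header and Footer are counted on raw rows before empty rows are skipped, and reservoir sampling is a swapped-in technique used only to illustrate random extraction. `Files.readString` requires Java 11 or later.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;
import java.util.regex.Pattern;

// Hypothetical plain-Java sketch of tFileInputFullRow's settings.
// It is NOT the code Talend generates; names and conventions are assumptions.
class FullRowReader {

    // Splits content into full rows. Each returned String is one "line" value;
    // field separators inside a row are left untouched.
    // Assumptions: Header/Footer count raw rows before empty rows are skipped,
    // a negative limit means "no limit", limit == 0 returns no rows, and the
    // limit counts emitted rows only.
    static List<String> readRows(String content, String rowSeparator, int header,
                                 int footer, int limit, boolean skipEmptyRows) {
        List<String> raw = new ArrayList<>(
                Arrays.asList(content.split(Pattern.quote(rowSeparator), -1)));
        // A trailing separator produces one empty last element: drop it.
        if (!raw.isEmpty() && raw.get(raw.size() - 1).isEmpty()) {
            raw.remove(raw.size() - 1);
        }
        int from = Math.min(Math.max(header, 0), raw.size());
        int to = Math.max(from, raw.size() - Math.max(footer, 0));
        List<String> rows = new ArrayList<>();
        for (String row : raw.subList(from, to)) {
            if (limit >= 0 && rows.size() >= limit) {
                break; // Limit reached
            }
            if (skipEmptyRows && row.isEmpty()) {
                continue; // Skip empty rows
            }
            rows.add(row);
        }
        return rows;
    }

    // Reads a file in the chosen encoding (Encoding setting), then applies readRows.
    static List<String> readFile(Path file, Charset encoding, String rowSeparator,
                                 int header, int footer, int limit,
                                 boolean skipEmptyRows) throws IOException {
        String content = Files.readString(file, encoding);
        return readRows(content, rowSeparator, header, footer, limit, skipEmptyRows);
    }

    // Extract lines at random, illustrated with reservoir sampling (a swapped-in
    // technique): returns up to n rows, each input row equally likely to be kept.
    static List<String> sampleRows(List<String> rows, int n, Random random) {
        List<String> reservoir = new ArrayList<>();
        for (int i = 0; i < rows.size(); i++) {
            if (reservoir.size() < n) {
                reservoir.add(rows.get(i));
            } else {
                int j = random.nextInt(i + 1); // uniform in [0, i]
                if (j < n) {
                    reservoir.set(j, rows.get(i));
                }
            }
        }
        return reservoir;
    }

    public static void main(String[] args) {
        String content = "id;name\n1;Alice\n\n2;Bob\ntotal;2\n";
        // Header = 1, Footer = 1, no limit, Skip empty rows selected
        List<String> rows = readRows(content, "\n", 1, 1, -1, true);
        System.out.println(rows);        // [1;Alice, 2;Bob]
        System.out.println(rows.size()); // 2, the NB_LINE analog
    }
}
```

Splitting with `Pattern.quote` treats the Row separator as a literal string, so separators such as `||` or `.` are not interpreted as regular expressions.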
Warning
The information in this section is only for users who have subscribed to one of
the Talend solutions with Big Data and is not applicable to
Talend Open Studio for Big Data users.
In a Talend Map/Reduce Job, tFileInputFullRow, as well as the whole Map/Reduce Job using it,
generates native Map/Reduce code. This section presents the specific properties of
tFileInputFullRow when it is used in that
situation. For further information about a Talend Map/Reduce Job, see the Talend Big Data Getting Started Guide.
Component family |
MapReduce / Input |
|
Basic settings |
Property type |
Either Built-in or Repository. |
Built-in: no property data stored centrally. |
||
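Repository: reuse properties stored centrally under the Hadoop Cluster node of the Repository tree. The fields that come after are pre-filled in using the fetched data. For further information about the Hadoop Cluster node, see the Talend Big Data Getting Started Guide. |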
||
Schema and Edit |
A schema is a row description. It defines the number of fields to be processed and passed on to the next component. Click Edit schema to make changes to the schema. Note that if you make changes, the schema automatically becomes built-in. |
|
Built-In: You create and store the schema locally for this component only. |
||
Repository: You have already created the schema and stored it in the Repository. You can reuse it in various projects and Job designs. |
||
|
Folder/File |
Browse to, or enter, the directory in HDFS where the data you need to use is stored.
If the path you set points to a folder, this component will read all of the files stored in that folder.
If you want to specify more than one file or directory in this field, …
If the file to be read is a compressed one, enter the file name with its extension.
Note that you need … |
Die on error |
Clear the check box to skip any rows on error and complete the process for error-free rows. |
|
|
Row separator |
Enter the separator used to identify the end of a row. |
Header |
Enter the number of rows to be skipped at the beginning of the file. |
|
Skip empty rows |
Select this check box to skip the empty rows. |
|
Advanced settings |
Custom Encoding |
You may encounter encoding issues when you process the stored data. In that situation, select this check box to display the Encoding list. Then select the encoding to be used from the list or select Custom and define it manually. |
Global Variables |
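ERROR_MESSAGE: the error message generated by the component when an error occurs. This is an After variable and it returns a string.
A Flow variable functions during the execution of a component while an After variable functions after the execution of the component.
To fill up a field or expression with a variable, press Ctrl + Space to access the variable list and choose the variable to use from it.
For further information about variables, see Talend Studio User Guide. |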
|
Usage |
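In a Talend Map/Reduce Job, it is used as a start component and requires a transformation component as output link. Once a Map/Reduce Job is opened in the workspace, tFileInputFullRow as well as the MapReduce family appears in the Palette of the Studio. Note that in this documentation, unless otherwise explicitly stated, a scenario presents only Standard Jobs, that is to say traditional Talend data integration Jobs. |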
|
Hadoop Connection |
You need to use the Hadoop Configuration tab in the Run view to define the connection to a given Hadoop distribution for the whole Job. This connection is effective on a per-Job basis. |
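The Folder/File setting resolves a path either to a single file or to a folder of files. The following is a hypothetical local-file-system analog in plain Java, not HDFS client code and not the Map/Reduce code Talend generates. The class `FolderRowReader` is invented for illustration; it assumes that a folder path means every regular file directly inside that folder, and it handles gzip (`.gz`) only as one example of a compressed input.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical local-file-system analog of the Folder/File setting.
// It does not touch HDFS and is not the Map/Reduce code Talend generates.
class FolderRowReader {

    // Returns every full row under the given path. For a folder, each regular
    // file directly inside it is read in name order; sub-folders are skipped
    // (an assumption of this sketch).
    static List<String> readPath(Path path, Charset encoding) throws IOException {
        List<Path> files = new ArrayList<>();
        if (Files.isDirectory(path)) {
            try (Stream<Path> entries = Files.list(path)) {
                entries.filter(Files::isRegularFile).sorted().forEach(files::add);
            }
        } else {
            files.add(path);
        }
        List<String> rows = new ArrayList<>();
        for (Path file : files) {
            rows.addAll(readFile(file, encoding));
        }
        return rows;
    }

    // Reads one file line by line; a ".gz" name is decompressed on the fly
    // (gzip only, for illustration).
    static List<String> readFile(Path file, Charset encoding) throws IOException {
        boolean gzip = file.getFileName().toString().endsWith(".gz");
        try (InputStream raw = Files.newInputStream(file);
             InputStream in = gzip ? new GZIPInputStream(raw) : raw;
             BufferedReader reader = new BufferedReader(new InputStreamReader(in, encoding))) {
            return reader.lines().collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("fullrow");
        Files.write(dir.resolve("a.txt"), List.of("1;Alice", "2;Bob"), StandardCharsets.UTF_8);
        try (OutputStream out = new GZIPOutputStream(Files.newOutputStream(dir.resolve("b.txt.gz")))) {
            out.write("3;Carol\n".getBytes(StandardCharsets.UTF_8));
        }
        System.out.println(readPath(dir, StandardCharsets.UTF_8)); // [1;Alice, 2;Bob, 3;Carol]
    }
}
```

Sorting the folder entries only makes the sketch deterministic; the order in which the actual component processes the files is not specified here.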
The following scenario creates a two-component Job that aims at reading complete rows
in a file and displaying the output in the Run log
console.
-
Drop a tFileInputFullRow and a tLogRow from the Palette onto the design workspace.
-
Right-click the tFileInputFullRow
component and connect it to tLogRow using a
Row > Main link.
-
In the design workspace, select tFileInputFullRow.
-
Click the Component tab to define the basic
settings for tFileInputFullRow.
-
In the Basic settings view, set Schema to Built-In.
-
Click the three-dot […] button next to the
Edit schema field to see the data to pass
on to the tLogRow component. Note that the
schema is read-only and consists of one column,
line.
-
Fill in a path to the file to process in the File
Name field, or click the three-dot […] button to browse to it. This field is
mandatory. In this scenario, the file to read is test5. It
holds three rows, each consisting of two fields separated by a
semicolon.
-
Define the Row separator used to identify the
end of a row.
-
Set the Header to 1. In this scenario, the
footer and the number of processed rows are not set.
-
From the design workspace, select tLogRow and
click the Component tab to define its basic
settings. For more information, see tLogRow.
-
Save your Job and press F6 to execute
it.
tFileInputFullRow reads the three rows one by
one, ignoring field separators, and the complete rows are displayed on the
Run console.
Note
To extract fields from rows, use tExtractDelimitedFields, tExtractPositionalFields, or tExtractRegexFields. For more information, see tExtractDelimitedFields, tExtractPositionalFields and tExtractRegexFields.
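The Note distinguishes full rows from the fields inside them. As a rough plain-Java illustration (an assumption, not the behavior of any Talend component), the sketch below keeps each row whole, the way tFileInputFullRow passes it to tLogRow, and then splits it on the semicolon field separator, the kind of step tExtractDelimitedFields performs downstream. The class `ExtractFieldsSketch` and the sample rows are hypothetical; the actual contents of test5 are not given in this scenario.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration: full rows versus fields extracted from them.
class ExtractFieldsSketch {

    // Splits one full row on a literal field separator; -1 keeps trailing
    // empty fields so that "a;" yields two fields.
    static List<String> extractFields(String line, String fieldSeparator) {
        return Arrays.asList(line.split(Pattern.quote(fieldSeparator), -1));
    }

    public static void main(String[] args) {
        // Hypothetical rows standing in for the unknown contents of test5.
        List<String> lines = List.of("1;Alice", "2;Bob", "3;Carol");
        for (String line : lines) {
            System.out.println(line);                     // full row, e.g. 1;Alice
            System.out.println(extractFields(line, ";")); // fields, e.g. [1, Alice]
        }
    }
}
```

Because the full-row reader never looks at the semicolon, the field separator only matters at the extraction step.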