
Component family |
File/Input |
|
Function |
tFileInputDelimited reads a given file row by row with simple separated fields. If you have subscribed to one of the Talend solutions with Big Data, you are also able to use this component in a Talend Map/Reduce Job, as described in the Map/Reduce section below. |
|
Purpose |
Opens a file and reads it row by row to split each row up into fields, then sends the fields, as defined in the schema, to the next component via a Row link.
|
Basic settings |
Property type |
Either Built-in or Repository. |
|
|
Built-in: No property data stored centrally.
|
|
Repository: Select the repository file where the properties are stored.
|
File Name/Stream |
File name: Name and path of the file to be processed.
Stream: The data flow to be processed. The data must be added to the flow so that tFileInputDelimited can fetch it via the corresponding variable. This variable could be already pre-defined in your Studio, or you can define it manually and use it according to the design of your Job. In order to avoid the inconvenience of hand writing, you can press Ctrl+Space and select the variable of interest from the auto-completion list. Related topic to the available variables: see Talend Studio User Guide.
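When the Stream option is used, the field holds a Java expression that casts an object stored in globalMap to a java.io.InputStream (see the tFileFetch scenario below). The following is a minimal, self-contained sketch of that pattern; the key tFileFetch_1_INPUT_STREAM and the sample data are illustrative assumptions, not part of the component itself.

  import java.io.BufferedReader;
  import java.io.ByteArrayInputStream;
  import java.io.InputStream;
  import java.io.InputStreamReader;
  import java.util.HashMap;
  import java.util.Map;

  public class StreamFieldSketch {
      public static void main(String[] args) throws Exception {
          Map<String, Object> globalMap = new HashMap<String, Object>();
          // An upstream component (e.g. tFileFetch) would register the stream:
          globalMap.put("tFileFetch_1_INPUT_STREAM",
                  new ByteArrayInputStream("1;Smith;35\n2;Jones;42\n".getBytes("UTF-8")));

          // The expression typed in File name/Stream resolves to this cast:
          InputStream in = (InputStream) globalMap.get("tFileFetch_1_INPUT_STREAM");

          // Row separator \n, field separator ; -- as set in Basic settings
          BufferedReader reader = new BufferedReader(new InputStreamReader(in, "UTF-8"));
          String row;
          while ((row = reader.readLine()) != null) {
              String[] fields = row.split(";");
              System.out.println(java.util.Arrays.toString(fields));
          }
          reader.close();
      }
  }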
|
Row separator |
Enter the separator used to identify the end of a row. |
|
Field separator |
Enter a character, a string, or a regular expression to separate the fields of the transferred data.
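Because the separator can be a regular expression, regex metacharacters must be escaped. A small sketch of the difference, using plain Java string splitting for illustration (the sample row is an assumption):

  public class FieldSeparatorSketch {
      public static void main(String[] args) {
          String row = "1|Smith|35";
          // "\\|" escapes the pipe so it matches a literal | character:
          System.out.println(java.util.Arrays.toString(row.split("\\|")));
          // prints [1, Smith, 35]
          // An unescaped "|" is the regex alternation operator and splits
          // between every character, which is rarely what you want.
      }
  }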
|
CSV options |
Select this check box to include CSV-specific parameters such as Escape char and Text enclosure.
|
Header |
Enter the number of rows to be skipped at the beginning of the file.
Note: When using the dynamic schema feature, the first row of the file provides the names of the dynamic columns and should therefore be skipped (set Header to at least 1). For further information about dynamic schemas, see Talend Studio User Guide.
|
Footer |
Number of rows to be skipped at the end of the file. |
|
Limit |
Maximum number of rows to be processed. If Limit = 0, no row is read or processed.
|
Schema and Edit schema |
A schema is a row description; it defines the number of fields to be processed and passed on to the next component. Click Edit schema to make changes to the schema. Note that if you make changes, the schema automatically becomes built-in.
Note that if the input value of any non-nullable primitive field is null, the row of data including that field will be rejected. This component offers the advantage of the dynamic schema feature: it allows you to retrieve unknown columns from source files or to copy batches of columns from a source without mapping each column individually. The dynamic schema feature is designed for the purpose of retrieving unknown columns of a table and is recommended to be used for this purpose only. Warning: When using the dynamic schema feature, specific restrictions apply to the dynamic column; see Talend Studio User Guide for details. |
|
|
Built-in: The schema will be created and stored locally for this component only.
|
|
Repository: The schema already exists and is stored in the Repository; it can therefore be reused in various projects and Job designs.
|
Skip empty rows |
Select this check box to skip the empty rows. |
|
Uncompress as zip file |
Select this check box to uncompress the input file. |
|
Die on error |
Select this check box to stop the execution of the Job when an error occurs. Clear the check box to skip any rows on error and complete the process for error-free rows. To catch the rows on error, use a Row > Reject link. |
Advanced settings |
Advanced separator (for numbers) |
Select this check box to modify the separators used for numbers:
Thousands separator: define the separator used for thousands (by default, a comma).
Decimal separator: define the separator used for decimals (by default, a period).
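As an illustration of what these separators mean, here is a hedged sketch in plain Java that parses a number written with a period as thousands separator and a comma as decimal separator; the format pattern and sample value are assumptions for the example:

  import java.text.DecimalFormat;
  import java.text.DecimalFormatSymbols;

  public class NumberSeparatorSketch {
      public static void main(String[] args) throws Exception {
          DecimalFormatSymbols symbols = new DecimalFormatSymbols();
          symbols.setGroupingSeparator('.');  // Thousands separator
          symbols.setDecimalSeparator(',');   // Decimal separator
          DecimalFormat format = new DecimalFormat("#,##0.##", symbols);
          System.out.println(format.parse("1.234,56").doubleValue()); // 1234.56
      }
  }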
|
Extract lines at random |
Select this check box to set the number of lines to be extracted at random.
|
Encoding |
Select the encoding from the list or select Custom and define it manually.
|
Trim all columns |
Select this check box to remove the leading and trailing whitespace from all columns.
|
Check each row structure against schema |
Select this check box to check whether the total number of columns in each row is consistent with the schema.
|
Check date |
Select this check box to check the date format strictly against the input schema. |
|
Check columns to trim |
This table is filled automatically with the schema being used. Select the check box(es) corresponding to the column(s) you want to trim.
|
Split row before field |
Select this check box to split rows before splitting fields.
|
Permit hexadecimal (0xNNN) or octal (0NNNN) for numeric types |
Select this check box if any of your numeric types (long, integer, short, or byte type) will be parsed from a hexadecimal or octal string. In the table that displays, select the check box next to each column to be parsed this way. This table appears only when the Permit hexadecimal (0xNNN) or octal (0NNNN) for numeric types check box is selected.
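These notations are the ones accepted by the decode methods of Java's numeric wrapper types, presumably what backs this option; a short sketch (the literal values are examples only):

  public class DecodeSketch {
      public static void main(String[] args) {
          System.out.println(Integer.decode("0x1A"));  // 26 (hexadecimal)
          System.out.println(Integer.decode("032"));   // 26 (octal, leading 0)
          System.out.println(Long.decode("0xFFFF"));   // 65535
      }
  }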
|
tStatCatcher Statistics |
Select this check box to gather the processing metadata at the Job level as well as at each component level.
Global Variables |
NB_LINE: the number of rows processed. This is an After variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the component when an error occurs. This is an After variable and it returns a string.
A Flow variable functions during the execution of a component, while an After variable functions after the execution of the component.
To fill up a field or expression with a variable, press Ctrl + Space to access the variable list and choose the variable to use from it.
For further information about variables, see Talend Studio User Guide.
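As a hedged example of consuming these variables, the following snippet could be placed in a downstream tJava component, assuming the component is labeled tFileInputDelimited_1 (in generated Job code, globalMap is already in scope):

  // Runs inside a tJava component; globalMap is provided by the generated Job.
  Integer nbLine = (Integer) globalMap.get("tFileInputDelimited_1_NB_LINE");
  String errorMessage = (String) globalMap.get("tFileInputDelimited_1_ERROR_MESSAGE");
  System.out.println("Rows processed: " + nbLine);
  if (errorMessage != null) {
      System.out.println("Last error: " + errorMessage);
  }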
|
Usage |
Use this component to read a file and separate the fields it contains using a defined separator.
|
Log4j |
The activity of this component can be logged using the log4j feature. For more information on this feature, see Talend Studio User Guide. For more information on the log4j logging levels, see the Apache documentation at http://logging.apache.org/log4j/1.2/apidocs/org/apache/log4j/Level.html.
|
Limitation |
Due to license incompatibility, one or more JARs required to use this component are not provided. You can install the missing JARs for this particular component by clicking the Install button on the Component tab view.
Warning
The information in this section is only for users that have subscribed to one of
the Talend solutions with Big Data and is not applicable to
Talend Open Studio for Big Data users.
In a Talend Map/Reduce Job, tFileInputDelimited, as well as the whole Map/Reduce Job using it,
generates native Map/Reduce code. This section presents the specific properties of
tFileInputDelimited when it is used in that
situation. For further information about a Talend Map/Reduce Job, see the Talend Big Data Getting Started Guide.
Component family |
MapReduce / Input |
|
Basic settings |
Property type |
Either Built-in or Repository. |
Built-in: no property data stored centrally.
Repository: reuse properties stored centrally in the Repository. When you select this option, the fields that come after are pre-filled in using the fetched data. For further information about the Hadoop connection properties, see the Talend Big Data Getting Started Guide.
Schema and Edit Schema |
A schema is a row description. It defines the number of fields to be processed and passed on to the next component. Click Edit Schema to make changes to the schema. Note that if you make changes, the schema automatically becomes built-in.
|
Built-In: You create and store the schema locally for this component only.
Repository: You have already created the schema and stored it in the Repository. You can reuse it in various projects and Job designs.
Folder/File |
Browse to, or enter the directory in HDFS where the data you need to use is stored. If the path you set points to a folder, this component will read all of the files stored in that folder. If you want to specify more than one file or directory in this field, separate each path using a comma (,). If the file to be read is a compressed one, enter the file name with its extension; the component then automatically decompresses it at execution time.
Note that you need to ensure the connection to the Hadoop distribution to be used is properly configured in the Hadoop Configuration tab of the Run view. |
Die on error |
Clear the check box to skip any rows on error and complete the process for error-free rows. |
|
|
Row separator |
Enter the separator used to identify the end of a row. |
Field separator |
Enter a character, a string, or a regular expression to separate the fields of the transferred data.
|
Header |
Enter the number of rows to be skipped at the beginning of the file.
|
CSV options |
Select this check box to include CSV-specific parameters such as Escape char and Text enclosure.
|
Skip empty rows |
Select this check box to skip the empty rows. |
|
Advanced settings |
Custom Encoding |
You may encounter encoding issues when you process the stored data. In that situation, select this check box to display the Encoding list. Then select the encoding to be used from the list or select Custom and define it manually.
Advanced separator (for number) |
Select this check box to change the separator used for numbers. By default, the thousands separator is a comma (,) and the decimal separator is a period (.).
|
Trim all columns |
Select this check box to remove the leading and trailing whitespace from all columns.
|
Check column to trim |
This table is filled automatically with the schema being used. Select the check box(es) corresponding to the column(s) you want to trim.
|
Check each row structure against schema |
Select this check box to check whether the total number of columns in each row is consistent with the schema.
|
Check date |
Select this check box to check the date format strictly against the input schema. |
|
Decode String for long, int, short, byte Types |
Select this check box if any of your numeric types (long, integer, short, or byte type) will be parsed from a hexadecimal or octal string.
|
Global Variables |
ERROR_MESSAGE: the error message generated by the component when an error occurs. This is an After variable and it returns a string.
A Flow variable functions during the execution of a component, while an After variable functions after the execution of the component.
To fill up a field or expression with a variable, press Ctrl + Space to access the variable list and choose the variable to use from it.
For further information about variables, see Talend Studio User Guide.
|
Usage |
In a Talend Map/Reduce Job, it is used as a start component and requires a transformation component as output link. The other components used along with it must be Map/Reduce components, too; they generate native Map/Reduce code that can be executed directly in Hadoop. Once a Map/Reduce Job is opened in the workspace, tFileInputDelimited as well as the whole MapReduce component family appears in the Palette of the Studio. Note that in this documentation, unless otherwise explicitly stated, a scenario presents only Standard Jobs, that is to say traditional Talend data integration Jobs.
|
Hadoop Connection |
You need to use the Hadoop Configuration tab in the Run view to define the connection to a given Hadoop distribution for the whole Job. This connection is effective on a per-Job basis. |
The following scenario creates a two-component Job, which aims at reading each row of
a file, selecting delimited data and displaying the output in the Run log console.

-
Drop a tFileInputDelimited component and
a tLogRow component from the Palette to the design workspace. -
Right-click on the tFileInputDelimited
component and select Row > Main. Then drag it onto the tLogRow component and release when the plug symbol shows
up.
-
Select the tFileInputDelimited component
again, and define its Basic settings: -
Fill in a path to the file in the File
Name field. This field is mandatory. Warning:
If the path of the file contains some accented characters, you will
get an error message when executing your Job. For more information
regarding the procedures to follow when the support of accented
characters is missing, see the Talend Installation
and Upgrade Guide of the Talend
Solution you are using. -
Define the Row separator used to identify the end of a row. Then define the Field
separator used to delimit fields in a row. -
In this scenario, the header and footer limits are not set, and the
Limit of processed rows is set
to 50. -
Set the Schema as either local
(Built-in) or remotely managed
(Repository) to define the data to pass
on to the tLogRow component. -
You can load and/or edit the schema via the Edit
Schema function. Related topics: see Talend Studio User
Guide. -
Enter the encoding standard the input file is encoded in. This setting is
meant to ensure encoding consistency throughout all input and output
files. -
Select the tLogRow component and define the
Field separator to use for the output
display. Related topic: tLogRow. -
Select the Print schema column name in front of each
value check box to retrieve the column labels in the output
displayed.
-
Press Ctrl+S to save your Job.
-
Go to the Run tab, and click Run to execute the Job.
The file is read row by row and the extracted fields are displayed on the
Run log as defined in both components'
Basic settings. The log sums up all parameters in a header, followed by the result of the
Job.
This scenario describes a four-component Job used to fetch data from a voluminous file
almost as soon as it has been read. The data is displayed in the Run view. The advantage of this technique is that you do not have to
wait for the entire file to be downloaded, before viewing the data.

-
Drop the following components onto the workspace: tFileFetch, tSleep,
tFileInputDelimited, and tLogRow. -
Connect tSleep and tFileInputDelimited using a Trigger > OnComponentOk
link and connect tFileInputDelimited to
tLogRow using a Row > Main link.
-
Double-click tFileFetch to display the
Basic settings tab in the Component view and set the properties. -
From the Protocol list, select the
appropriate protocol to access the server on which your data is
stored. -
In the URI field, enter the URI required
to access the server on which your file is stored. -
Select the Use cache to save the resource
check box to add your file data to the cache memory. This option allows you
to use the streaming mode to transfer the data. -
In the workspace, click tSleep to display
the Basic settings tab in the Component view and set the properties. By default, tSleep's Pause field is set to 1
second. Do not change this setting. It pauses the second Job in order to
give the first Job, containing tFileFetch,
the time to read the file data. -
In the workspace, double-click tFileInputDelimited to display its Basic settings tab in the Component view and set the properties.
-
In the File name/Stream field:
– Delete the default content.
– Press Ctrl+Space to view the variables
available for this component. – Select tFileFetch_1_INPUT_STREAM from the
auto-completion list, to add the following variable to the Filename field:
((java.io.InputStream)globalMap.get("tFileFetch_1_INPUT_STREAM"))
. -
From the Schema list, select Built-in and click […] next to the Edit
schema field to describe the structure of the file that you
want to fetch. The US_Employees file is composed of six
columns: ID, Employee,
Age, Address,
State, and EntryDate. Click [+] to add the six columns listed above, then click OK. -
In the workspace, double-click tLogRow to
display its Basic settings in the Component view and click Sync Columns to ensure that the schema structure is properly
retrieved from the preceding component.
-
Click the Job tab and then the
Extra view. -
Select the Multi thread execution check
box in order to run the two Jobs at the same time. Bear in mind that the
second Job has a one-second delay according to the properties set in
tSleep. This option allows you to fetch
the data almost as soon as it is read by tFileFetch, thanks to the tFileInputDelimited component. -
Save the Job and press F6 to run it.
The data is displayed in the console almost as soon as it is
read.
For a scenario concerning the use of dynamic
schemas in tFileInputDelimited, see Scenario 4: Writing dynamic columns from a MySQL database to an output file.