tFileInputFullRow
tFileInputFullRow reads a given file row by row and sends complete rows of data, as defined in the schema, to the next component via a Row link.
Depending on the Talend
product you are using, this component can be used in one, some or all of the following
Job frameworks:
- Standard: see tFileInputFullRow Standard properties. The component in this framework is available in all Talend products.
- MapReduce: see tFileInputFullRow MapReduce properties (deprecated). The component in this framework is available in all subscription-based Talend products with Big Data and Talend Data Fabric.
- Spark Batch: see tFileInputFullRow properties for Apache Spark Batch. The component in this framework is available in all subscription-based Talend products with Big Data and Talend Data Fabric.
- Spark Streaming: see tFileInputFullRow properties for Apache Spark Streaming. This component is available in Talend Real Time Big Data Platform and Talend Data Fabric.
tFileInputFullRow Standard properties
These properties are used to configure tFileInputFullRow running in the Standard Job framework.
The Standard
tFileInputFullRow component belongs to the File family.
The component in this framework is available in all Talend
products.
Basic settings
Schema and Edit schema
A schema is a row description. It defines the number of fields (columns) to be processed and passed on to the next component. Click Edit schema to make changes to the schema.

Built-In: You create and store the schema locally for this component only.

Repository: You have already created the schema and stored it in the Repository. You can reuse it in various projects and Job designs.
File Name
Specify the path to the file to be processed. Warning: use an absolute path (instead of a relative path) for this field to avoid possible errors.
Row separator
The separator used to identify the end of a row.

Header
Enter the number of rows to be skipped at the beginning of the file.

Footer
Enter the number of rows to be skipped at the end of the file.

Limit
Enter the maximum number of rows to be processed. If the value is set to 0, no row is read or processed.

Skip empty rows
Select this check box to skip the empty rows. (A plain-Java sketch of how these read options behave follows this table.)
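The read options above translate into straightforward line-by-line logic. The following is a minimal plain-Java sketch of that behavior, not the code Talend Studio generates; the file path and option values are illustrative only (note that in the component itself, a Limit of 0 means no row is processed, whereas the sketch uses a negative value to mean "no limit").

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

public class FullRowReaderSketch {
    public static void main(String[] args) throws IOException {
        // Illustrative values mirroring the Basic settings above (not Talend defaults).
        String fileName = "/tmp/input.csv"; // File Name: an absolute path is recommended
        int header = 1;                     // Header: rows skipped at the beginning
        int footer = 0;                     // Footer: rows skipped at the end
        int limit = -1;                     // Limit: maximum rows processed; negative means "no limit" in this sketch
        boolean skipEmptyRows = true;       // Skip empty rows

        List<String> lines = Files.readAllLines(Paths.get(fileName), StandardCharsets.UTF_8);

        int end = lines.size() - footer;    // drop the footer rows
        int processed = 0;
        for (int i = header; i < end; i++) {            // drop the header rows
            String row = lines.get(i);
            if (skipEmptyRows && row.trim().isEmpty()) {
                continue;                               // ignore empty rows
            }
            if (limit >= 0 && processed >= limit) {
                break;                                  // stop once the Limit is reached
            }
            processed++;
            // The whole row is emitted as a single field; field separators are not interpreted.
            System.out.println(row);
        }
    }
}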
Advanced settings
Encoding
Select the encoding from the list or select Custom and define it manually.

Extract lines at random
Select this check box to set the number of lines to be extracted randomly.

tStatCatcher Statistics
Select this check box to gather the Job processing metadata at a Job level as well as at each component level.
Global Variables
NB_LINE: the number of rows processed. This is an After variable and it returns an integer.

ERROR_MESSAGE: the error message generated by the component when an error occurs. This is an After variable and it returns a string.

A Flow variable functions during the execution of a component, while an After variable functions after the execution of the component.

To fill up a field or expression with a variable, press Ctrl + Space to access the variable list and choose the variable to use from it.

For further information about variables, see the Talend Studio User Guide.
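For example, an After variable can be read from a downstream component such as tJava through the globalMap, as in the short sketch below; the label tFileInputFullRow_1 is only an assumed example and must match the actual name of the component in your Job.

// Typed into a tJava component triggered after tFileInputFullRow
// ("tFileInputFullRow_1" is an assumed label; use your component's actual name).
Integer nbLine = (Integer) globalMap.get("tFileInputFullRow_1_NB_LINE");
String errorMessage = (String) globalMap.get("tFileInputFullRow_1_ERROR_MESSAGE");

System.out.println("Rows processed: " + nbLine);
if (errorMessage != null) {
    System.out.println("Last error: " + errorMessage);
}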
Usage
Usage rule
Use this component to read full rows in delimited files that can get very large.
Reading full rows in a delimited file
The following scenario creates a two-component Job that aims at reading complete rows in
the delimited file states.csv and displaying the rows on
the console.
The content of the file states.csv that holds ten rows
of data is as follows:
StateID;StateName
1;Alabama
2;Alaska
3;Arizona
4;Arkansas
5;California
6;Colorado
7;Connecticut
8;Delaware
9;Florida
10;Georgia
Reading full rows in a delimited file
- Create a new Job and add a tFileInputFullRow component and a tLogRow component by typing their names in the design workspace or dropping them from the Palette.
- Link the tFileInputFullRow component to the tLogRow component using a Row > Main connection.
- Double-click the tFileInputFullRow component to open its Basic settings view on the Component tab.
- Click the […] button next to Edit schema to view the data to be passed on to the tLogRow component. Note that the schema is read-only and it consists of only one column, line.
- In the File Name field, browse to or enter the path to the file to be processed. In this scenario, it is E:/states.csv.
- In the Row Separator field, enter the separator used to identify the end of a row. In this example, it is the default value.
- In the Header field, enter 1 to skip the header row at the beginning of the file.
- Double-click the tLogRow component to open its Basic settings view on the Component tab. In the Mode area, select Table (print values in cells of a table) for better readability of the result.
- Press Ctrl+S to save your Job and then F6 to execute it.
As shown above, the ten rows of data in the delimited file states.csv are read one by one, ignoring field separators, and the complete rows of data are displayed on the console. To extract fields from rows, you must use tExtractDelimitedFields, tExtractPositionalFields, or tExtractRegexFields. For more information, see tExtractDelimitedFields, tExtractPositionalFields and tExtractRegexFields.
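For reference, the following plain-Java sketch mimics what this Job does (skip one header row, then emit each complete line) and shows how a follow-up split on the semicolon, the job of a component such as tExtractDelimitedFields, would recover the fields. It is an illustration only, not the code generated by the Studio.

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

public class StatesFullRowSketch {
    public static void main(String[] args) throws IOException {
        // Path used in the scenario; adjust it to your environment.
        List<String> lines = Files.readAllLines(Paths.get("E:/states.csv"), StandardCharsets.UTF_8);

        // Header = 1: skip the "StateID;StateName" row.
        for (String row : lines.subList(1, lines.size())) {
            // tFileInputFullRow keeps the whole row as one value; the ";" is not interpreted.
            System.out.println(row);

            // Splitting the row into fields is the job of a follow-up component.
            String[] fields = row.split(";");
            System.out.println("  -> StateID=" + fields[0] + ", StateName=" + fields[1]);
        }
    }
}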
tFileInputFullRow MapReduce properties (deprecated)
These properties are used to configure tFileInputFullRow running in the MapReduce Job framework.
The MapReduce
tFileInputFullRow component belongs to the MapReduce family.
The component in this framework is available in all subscription-based Talend products with Big Data
and Talend Data Fabric.
The MapReduce framework is deprecated from Talend 7.3 onwards. Use Talend Jobs for Apache Spark to accomplish your integration tasks.
Basic settings
Property type
Either Built-In or Repository.

Built-In: No property data stored centrally.

Repository: Select the repository file where the properties are stored. The properties are stored centrally under the Hadoop Cluster node of the Repository tree. For further information about the Hadoop Cluster node, see the Getting Started Guide. The fields that come after are pre-filled in using the fetched data.
Schema and Edit schema
A schema is a row description. It defines the number of fields (columns) to be processed and passed on to the next component. Click Edit schema to make changes to the schema. Note: If you make changes, the schema automatically becomes built-in.

Built-In: You create and store the schema locally for this component only.

Repository: You have already created the schema and stored it in the Repository. You can reuse it in various projects and Job designs.
Folder/File
Browse to, or enter the path pointing to the data to be used in the file system. If the path you set points to a folder, this component reads all of the files stored in that folder, for example, /user/talend/in; if sub-folders exist, they are automatically ignored unless you define the property mapreduce.input.fileinputformat.input.dir.recursive to be true in the Hadoop properties of the Job (see the plain-Hadoop sketch after this table). If you want to specify more than one file or directory in this field, separate each path using a comma (,). If the file to be read is a compressed one, enter the file name with its extension; the component then automatically decompresses it at runtime. Note that you need to ensure that the connection to the Hadoop distribution to be used is properly configured in the Hadoop Configuration tab of the Run view.
Die on error
Clear the check box to skip any rows on error and complete the process for error-free rows.
Row separator
The separator used to identify the end of a row.

Header
Enter the number of rows to be skipped at the beginning of the file.

Skip empty rows
Select this check box to skip the empty rows.
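As a point of reference, the recursive-read property mentioned under Folder/File is a standard Hadoop setting. The sketch below shows how it is typically declared in a plain MapReduce driver; this is not the code the Studio generates, and the input path and job name are examples only.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

public class RecursiveInputSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Equivalent of setting the property to true in the Hadoop properties:
        // sub-folders under the input folder are then read instead of being ignored.
        conf.setBoolean("mapreduce.input.fileinputformat.input.dir.recursive", true);

        Job job = Job.getInstance(conf, "read-full-rows");
        job.setInputFormatClass(TextInputFormat.class);
        // Folder/File: pointing to a folder means every file it contains is read.
        FileInputFormat.addInputPath(job, new Path("/user/talend/in"));
        // A real driver would go on to set the mapper, reducer, and output format.
    }
}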
Advanced settings
Custom Encoding
You may encounter encoding issues when you process the stored data. In that situation, select this check box to display the Encoding list. Then select the encoding to be used from the list or select Custom and define it manually.
Global Variables
ERROR_MESSAGE: the error message generated by the component when an error occurs. This is an After variable and it returns a string.

A Flow variable functions during the execution of a component, while an After variable functions after the execution of the component.

To fill up a field or expression with a variable, press Ctrl + Space to access the variable list and choose the variable to use from it.

For further information about variables, see the Talend Studio User Guide.
Usage
Usage rule
In a Talend Map/Reduce Job, this component is used as a start component and requires an output link. The other components used along with it must be Map/Reduce components too, as they generate native Map/Reduce code that can be executed directly in Hadoop. Once a Map/Reduce Job is opened in the workspace, tFileInputFullRow as well as the MapReduce family appears in the Palette of the Studio. Note that in this documentation, unless otherwise explicitly stated, a scenario presents only Standard Jobs, that is to say traditional Talend data integration Jobs.
Hadoop Connection
You need to use the Hadoop Configuration tab in the Run view to define the connection to a given Hadoop distribution for the whole Job. This connection is effective on a per-Job basis.
Related scenarios
No scenario is available for the Map/Reduce version of this component yet.
tFileInputFullRow properties for Apache Spark Batch
These properties are used to configure tFileInputFullRow running in the Spark Batch Job framework.
The Spark Batch
tFileInputFullRow component belongs to the File family.
The component in this framework is available in all subscription-based Talend products with Big Data
and Talend Data Fabric.
Basic settings
Define a storage configuration component
Select the configuration component to be used to provide the configuration information for the connection to the target file system such as HDFS. If you leave this check box clear, the target file system is the local system. The configuration component to be used must be present in the same Job.
Property type
Either Built-In or Repository.

Built-In: No property data stored centrally.

Repository: Select the repository file where the properties are stored. The properties are stored centrally under the Hadoop Cluster node of the Repository tree. For further information about the Hadoop Cluster node, see the Getting Started Guide. The fields that come after are pre-filled in using the fetched data.
Schema and Edit schema
A schema is a row description. It defines the number of fields (columns) to be processed and passed on to the next component. Click Edit schema to make changes to the schema. Note: If you make changes, the schema automatically becomes built-in.

Built-In: You create and store the schema locally for this component only.

Repository: You have already created the schema and stored it in the Repository. You can reuse it in various projects and Job designs.
Folder/File
Browse to, or enter the path pointing to the data to be used in the file system. If the path you set points to a folder, this component reads all of the files stored in that folder, for example, /user/talend/in; if sub-folders exist, they are automatically ignored unless you define the property spark.hadoop.mapreduce.input.fileinputformat.input.dir.recursive to be true in the Advanced properties table in the Spark configuration tab.
If you want to specify more than one file or directory in this field, separate each path using a comma (,). If the file to be read is a compressed one, enter the file name with its extension; the component then automatically decompresses it at runtime.
The button for browsing does not work with the Spark Local mode; if you are using the other Spark modes that the Studio supports, ensure that you have properly configured the connection in a configuration component in the same Job, such as tHDFSConfiguration.
Die on error
Select the check box to stop the execution of the Job when an error occurs.
Row separator
The separator used to identify the end of a row.

Header
Enter the number of rows to be skipped at the beginning of the file.

Skip empty rows
Select this check box to skip the empty rows.
Advanced settings
Set minimum partitions
Select this check box to control the number of partitions to be created from the input data, instead of relying on Spark's default partitioning behavior. In the displayed field, enter, without quotation marks, the minimum number of partitions you want to obtain. When you want to control the partition number, you can generally set at least as many partitions as the number of executors used for parallelism, while keeping in mind the available memory and the data transfer pressure on your network. (A plain Spark sketch of this option follows this table.)
Custom Encoding
You may encounter encoding issues when you process the stored data. In that situation, select this check box to display the Encoding list. Then select the encoding to be used from the list or select Custom and define it manually.
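To relate these options to the plain Spark API, the sketch below reads a folder with textFile and a minimum-partition count, and sets the recursive-read property mentioned under Folder/File. This is only an analogy written against the public Spark Java API, not the code generated by the Studio; the path, master, and numbers are examples.

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class SparkFullRowSketch {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setAppName("read-full-rows")
                .setMaster("local[*]")
                // Equivalent of the Advanced property that makes sub-folders readable.
                .set("spark.hadoop.mapreduce.input.fileinputformat.input.dir.recursive", "true");

        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            // The second argument plays the role of "Set minimum partitions".
            int minPartitions = 4;
            JavaRDD<String> rows = sc.textFile("/user/talend/in", minPartitions);

            // Each element is a complete row; no field separator is interpreted.
            rows.take(10).forEach(System.out::println);
        }
    }
}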
Usage
Usage rule
This component is used as a start component and requires an output link. This component, along with the Spark Batch component Palette it belongs to, appears only when you are creating a Spark Batch Job. Note that in this documentation, unless otherwise explicitly stated, a scenario presents only Standard Jobs, that is to say traditional Talend data integration Jobs.
Spark Connection
In the Spark Configuration tab in the Run view, define the connection to a given Spark cluster for the whole Job. In addition, since the Job expects its dependent jar files for execution, you must specify the directory in the file system to which these jar files are transferred so that Spark can access these files.
This connection is effective on a per-Job basis.
Related scenarios
No scenario is available for the Spark Batch version of this component
yet.
tFileInputFullRow properties for Apache Spark Streaming
These properties are used to configure tFileInputFullRow running in the Spark Streaming Job framework.
The Spark Streaming
tFileInputFullRow component belongs to the File family.
This component is available in Talend Real Time Big Data Platform and Talend Data Fabric.
Basic settings
Define a storage configuration component
Select the configuration component to be used to provide the configuration information for the connection to the target file system such as HDFS. If you leave this check box clear, the target file system is the local system. The configuration component to be used must be present in the same Job.
Property type
Either Built-In or Repository.

Built-In: No property data stored centrally.

Repository: Select the repository file where the properties are stored. The properties are stored centrally under the Hadoop Cluster node of the Repository tree. For further information about the Hadoop Cluster node, see the Getting Started Guide. The fields that come after are pre-filled in using the fetched data.
Schema and Edit schema
A schema is a row description. It defines the number of fields (columns) to be processed and passed on to the next component. Click Edit schema to make changes to the schema. Note: If you make changes, the schema automatically becomes built-in.

Built-In: You create and store the schema locally for this component only.

Repository: You have already created the schema and stored it in the Repository. You can reuse it in various projects and Job designs.
Folder/File
Browse to, or enter the path pointing to the data to be used in the file system. If the path you set points to a folder, this component reads all of the files stored in that folder, for example, /user/talend/in; if sub-folders exist, they are automatically ignored unless you define the property spark.hadoop.mapreduce.input.fileinputformat.input.dir.recursive to be true in the Advanced properties table in the Spark configuration tab.
If you want to specify more than one file or directory in this field, separate each path using a comma (,). If the file to be read is a compressed one, enter the file name with its extension; the component then automatically decompresses it at runtime. (A plain Spark Streaming sketch of watching a folder for complete rows follows this table.)
The button for browsing does not work with the Spark Local mode; if you are using the other Spark modes that the Studio supports, ensure that you have properly configured the connection in a configuration component in the same Job, such as tHDFSConfiguration.
Die on error
Select the check box to stop the execution of the Job when an error occurs.
Row separator
The separator used to identify the end of a row.

Header
Enter the number of rows to be skipped at the beginning of the file.

Skip empty rows
Select this check box to skip the empty rows.
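As a rough analogue of reading complete rows in a streaming Job, the sketch below uses the public Spark Streaming Java API to watch a folder and print each new line as one full row. It is an illustration only, not the code the Studio generates; the directory, master, and batch interval are example values.

import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

public class StreamingFullRowSketch {
    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf().setAppName("stream-full-rows").setMaster("local[2]");
        // A micro-batch every 5 seconds (example value).
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(5));

        // Watches the folder for new files and emits each line as one complete row.
        JavaDStream<String> rows = jssc.textFileStream("/user/talend/in");
        rows.print();

        jssc.start();
        jssc.awaitTermination();
    }
}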
Advanced settings
Set minimum partitions
Select this check box to control the number of partitions to be created from the input data, instead of relying on Spark's default partitioning behavior. In the displayed field, enter, without quotation marks, the minimum number of partitions you want to obtain. When you want to control the partition number, you can generally set at least as many partitions as the number of executors used for parallelism, while keeping in mind the available memory and the data transfer pressure on your network.
Custom Encoding
You may encounter encoding issues when you process the stored data. In that situation, select this check box to display the Encoding list. Then select the encoding to be used from the list or select Custom and define it manually.
Usage
Usage rule
This component is used as a start component and requires an output link. This component is only used to provide the lookup flow (the right side of a join operation) to the main flow of a tMap component. This component, along with the Spark Streaming component Palette it belongs to, appears only when you are creating a Spark Streaming Job. Note that in this documentation, unless otherwise explicitly stated, a scenario presents only Standard Jobs, that is to say traditional Talend data integration Jobs.
Spark Connection
In the Spark Configuration tab in the Run view, define the connection to a given Spark cluster for the whole Job. In addition, since the Job expects its dependent jar files for execution, you must specify the directory in the file system to which these jar files are transferred so that Spark can access these files.
This connection is effective on a per-Job basis.
Related scenarios
No scenario is available for the Spark Streaming version of this component
yet.