August 17, 2023

tFileInputFullRow – Docs for ESB 5.x

tFileInputFullRow


tFileInputFullRow properties

Component family

File/Input

 

Function

tFileInputFullRow reads a given
file row by row.

If you have subscribed to one of the Talend solutions with Big Data, you are
able to use this component in a Talend Map/Reduce Job to generate
Map/Reduce code. For further information, see tFileInputFullRow in Talend
Map/Reduce Jobs
.

Purpose

tFileInputFullRow opens a file, reads it row by row, and sends each
complete row, as defined in the schema, to the next component in the
Job via a Row link.

Basic settings

Schema and Edit
Schema

A schema is a row description. It defines the number of fields to
be processed and passed on to the next component. The schema is
either Built-in or stored remotely
in the Repository.

Since version 5.6, both the Built-In mode and the Repository mode are
available in any of the Talend solutions.

Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:

  • View schema: choose this option to view the
    schema only.

  • Change to built-in property: choose this option
    to change the schema to Built-in for local
    changes.

  • Update repository connection: choose this option to change
    the schema stored in the repository and decide whether to propagate the changes to
    all the Jobs upon completion. If you just want to propagate the changes to the
    current Job, you can select No upon completion and
    choose this schema metadata again in the [Repository
    Content]
    window.

 

File Name

Name of the file and/or the variable to be processed.

For further information about how to define and use a variable in
a Job, see Talend Studio
User Guide.

 

Row separator

Enter the separator used to identify the end of a row.

 

Header

Enter the number of rows to be skipped at the beginning of the file.

 

Footer

Enter the number of rows to be skipped at the end of the file.

 

Limit

Maximum number of rows to be processed. If Limit = 0, no row is
read or processed.

 

Skip empty rows

Select this check box to skip the empty rows.
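Taken together, the Header, Footer, Limit and Skip empty rows settings behave roughly like the following stand-alone Java sketch. The class and method names are invented for this illustration; this is not Talend's generated code.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class FullRowReader {
    // Returns complete rows, applying Header, Footer, Limit and
    // "Skip empty rows" as described in the settings above.
    static List<String> readFullRows(Path file, int header, int footer,
                                     int limit, boolean skipEmpty) throws IOException {
        List<String> lines = Files.readAllLines(file);
        List<String> out = new ArrayList<>();
        if (limit == 0) return out;  // per the Limit setting: 0 reads nothing
        // Drop the trailing Footer rows, then skip the leading Header rows.
        int end = Math.max(header, lines.size() - footer);
        for (int i = header; i < end; i++) {
            String row = lines.get(i);
            if (skipEmpty && row.isEmpty()) continue;
            if (limit > 0 && out.size() == limit) break;
            out.add(row);  // the whole row, field separators included
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        Path f = Files.createTempFile("test5", ".csv");
        Files.write(f, List.of("col1;col2", "a;1", "", "b;2", "c;3", "eof"));
        // Header = 1, Footer = 1, Limit = 3, skip empty rows
        System.out.println(readFullRows(f, 1, 1, 3, true)); // prints [a;1, b;2, c;3]
    }
}
```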

 

Die on error

Select this check box to stop the execution of the Job when an error occurs.

Clear the check box to skip any rows on error and complete the process for error-free rows.
When errors are skipped, you can collect the rows on error using a Row
> Reject
link.
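Conceptually, the two states of the Die on error check box behave like the following sketch, where rows that fail are either re-thrown (Job stops) or diverted to a reject list (the Row > Reject flow). The sample rows and parsing logic are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class DieOnErrorDemo {
    public static void main(String[] args) {
        List<String> rows = List.of("1;ok", "oops", "2;ok");
        List<Integer> main = new ArrayList<>();
        List<String> reject = new ArrayList<>();
        boolean dieOnError = false;  // check box cleared
        for (String row : rows) {
            try {
                main.add(Integer.parseInt(row.split(";")[0]));
            } catch (NumberFormatException e) {
                if (dieOnError) throw e;  // selected: execution stops here
                reject.add(row);          // cleared: row goes to the Reject flow
            }
        }
        System.out.println(main + " rejected=" + reject); // prints [1, 2] rejected=[oops]
    }
}
```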

Advanced settings

Encoding

Select the encoding from the list or select Custom and
define it manually. This field is compulsory for database data handling.
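In plain Java terms, choosing an encoding corresponds to passing a charset when reading the file, as in this small sketch; reading ISO-8859-1 bytes as UTF-8, for instance, would garble accented characters.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class EncodingDemo {
    public static void main(String[] args) throws IOException {
        Path f = Files.createTempFile("enc", ".txt");
        // Write a row containing a non-ASCII character in ISO-8859-1.
        Files.write(f, "café;1".getBytes(Charset.forName("ISO-8859-1")));
        // Reading with the matching encoding preserves the character.
        List<String> rows = Files.readAllLines(f, Charset.forName("ISO-8859-1"));
        System.out.println(rows.get(0)); // prints café;1
    }
}
```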

 

Extract lines at random

Select this check box to set the number of lines to be extracted
randomly.

 

tStatCatcher Statistics

Select this check box to gather the Job processing metadata at a
Job level as well as at each component level.

Global Variables

NB_LINE: the number of rows processed. This is an After
variable and it returns an integer.

ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable and it returns a string. This
variable functions only if the Die on error check box is
cleared, if the component has this check box.

A Flow variable functions during the execution of a component while an After variable
functions after the execution of the component.

To fill in a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to use from it.

For further information about variables, see Talend Studio
User Guide.
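In generated Job code, these variables are read from the globalMap using the component's label as a key. The sketch below simulates the globalMap with a plain HashMap so the lookup pattern can run stand-alone; the label tFileInputFullRow_1 is an assumed example.

```java
import java.util.HashMap;
import java.util.Map;

public class GlobalVarsDemo {
    public static void main(String[] args) {
        // In a real Job, globalMap is provided by Talend's generated code;
        // here it is simulated so the example is self-contained.
        Map<String, Object> globalMap = new HashMap<>();
        globalMap.put("tFileInputFullRow_1_NB_LINE", 3);  // set After execution

        // Typical lookup in a tJava component after this component has run:
        int nbLine = (Integer) globalMap.get("tFileInputFullRow_1_NB_LINE");
        System.out.println("rows processed: " + nbLine);
    }
}
```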

Usage

Use this component to read full rows in delimited files that can
get very large. You can also create a rejection flow using a
Row > Reject link to filter
the data which does not correspond to the type defined. For an
example of how to use these two links, see Scenario 2: Extracting correct and erroneous data from an XML field in a delimited
file
.

Log4j

The activity of this component can be logged using the log4j feature. For more information on this feature, see Talend Studio User
Guide
.

For more information on the log4j logging levels, see the Apache documentation at http://logging.apache.org/log4j/1.2/apidocs/org/apache/log4j/Level.html.

tFileInputFullRow in Talend
Map/Reduce Jobs

Warning

The information in this section is only for users that have subscribed to one of
the Talend solutions with Big Data and is not applicable to
Talend Open Studio for Big Data users.

In a Talend Map/Reduce Job, tFileInputFullRow, as well as the whole Map/Reduce Job using it,
generates native Map/Reduce code. This section presents the specific properties of
tFileInputFullRow when it is used in that
situation. For further information about a Talend Map/Reduce Job, see the Talend Big Data Getting Started Guide.

Component family

MapReduce / Input

 

Basic settings

Property type

Either Built-in or Repository.

   

Built-in: no property data stored
centrally.

   

Repository: reuse properties
stored centrally under the Hadoop
Cluster
node of the Repository tree.

The fields that follow are pre-filled using the fetched
data.

For further information about the Hadoop
Cluster
node, see the Getting Started Guide.

 

Schema and Edit
Schema

A schema is a row description. It defines the number of fields to be processed and passed on
to the next component. The schema is either Built-In or
stored remotely in the Repository.

Click Edit Schema to make changes to the schema. Note that if you make changes, the schema automatically becomes
built-in.

   

Built-In: You create and store the schema locally for this
component only. Related topic: see Talend Studio
User Guide.

   

Repository: You have already created the schema and
stored it in the Repository. You can reuse it in various projects and Job designs. Related
topic: see Talend Studio User Guide.

 

Folder/File

Browse to, or enter, the directory in HDFS where the data to be used is stored.

If the path you set points to a folder, this component reads all of
the files stored in that folder, for example /user/talend/in; any
sub-folders are automatically ignored unless you define the path like
/user/talend/in/*.

If you want to specify more than one file or directory in this
field, separate each path using a comma (,).

If the file to be read is a compressed one, enter the file name
with its extension; tFileInputFullRow then automatically decompresses it at
runtime. The supported compression formats and their corresponding
extensions are:

  • DEFLATE: *.deflate

  • gzip: *.gz

  • bzip2: *.bz2

  • LZO: *.lzo
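Selecting a decompressor by file extension can be sketched as follows. Only gzip and DEFLATE streams are in the JDK; bzip2 and LZO would need an extra library (for example Apache Commons Compress), so they are left unhandled in this illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import java.util.zip.InflaterInputStream;

public class CompressionByExtension {
    // Picks a decompressing stream from the extension, mirroring the
    // table above; .bz2 and .lzo would need a third-party stream.
    static InputStream open(Path file) throws IOException {
        InputStream raw = Files.newInputStream(file);
        String name = file.getFileName().toString();
        if (name.endsWith(".gz"))      return new GZIPInputStream(raw);
        if (name.endsWith(".deflate")) return new InflaterInputStream(raw);
        return raw;
    }

    public static void main(String[] args) throws IOException {
        Path f = Files.createTempFile("rows", ".gz");
        try (GZIPOutputStream out = new GZIPOutputStream(Files.newOutputStream(f))) {
            out.write("full;row\n".getBytes());
        }
        try (InputStream in = open(f)) {
            System.out.println(new String(in.readAllBytes()).trim()); // prints full;row
        }
    }
}
```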

Note that you need
to ensure you have properly configured the connection to the Hadoop
distribution to be used in the Hadoop
configuration
tab in the Run view.

 

Die on error

Clear the check box to skip any rows on error and complete the process for error-free rows.
When errors are skipped, you can collect the rows on error using a Row
> Reject
link.

 

Row separator

Enter the separator used to identify the end of a row.

 

Header

Enter the number of rows to be skipped at the beginning of the file.

 

Skip empty rows

Select this check box to skip the empty rows.

Advanced settings

Custom Encoding

You may encounter encoding issues when you process the stored data. In that situation, select
this check box to display the Encoding list.

Then select the encoding to be used from the list or select
Custom and define it
manually.

Global Variables

ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable and it returns a string. This
variable functions only if the Die on error check box is
cleared, if the component has this check box.

A Flow variable functions during the execution of a component while an After variable
functions after the execution of the component.

To fill in a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to use from it.

For further information about variables, see Talend Studio
User Guide.

Usage

In a Talend Map/Reduce Job, it is used as a start component and requires
a transformation component as its output link. The other components used along with it must be
Map/Reduce components, too. They generate native Map/Reduce code that can be executed
directly in Hadoop.

Once a Map/Reduce Job is opened in the workspace, tFileInputFullRow as well as the
MapReduce family appears in the Palette of the Studio.

Note that in this documentation, unless otherwise explicitly stated, a scenario presents
only Standard Jobs, that is to say traditional Talend data
integration Jobs, not Map/Reduce Jobs.

Hadoop Connection

You need to use the Hadoop Configuration tab in the
Run view to define the connection to a given Hadoop
distribution for the whole Job.

This connection is effective on a per-Job basis.

Scenario: Reading full rows in a delimited file

The following scenario creates a two-component Job that aims at reading complete rows
in a file and displaying the output in the Run log
console.

  1. Drop a tFileInputFullRow and a tLogRow from the Palette onto the design workspace.

  2. Right-click on the tFileInputFullRow
    component and connect it to tLogRow using a
    Row Main link.

    [Screenshot: Use_Case_tFileInputFullRow.png]
  3. In the design workspace, select tFileInputFullRow.

  4. Click the Component tab to define the basic
    settings for tFileInputFullRow.

    [Screenshot: Use_Case_tFileInputFullRow1.png]
  5. In the Basic settings view, set Schema to Built-In.

  6. Click the three-dot […] button next to the
    Edit schema field to see the data to pass
    on to the tLogRow component. Note that the
    schema is read-only and it consists of one column,
    line.

    [Screenshot: Use_Case_tFileInputFullRow2.png]
  7. Fill in a path to the file to process in the File
    Name
    field, or click the three-dot […] button. This field is
    mandatory. In this scenario, the file to read is test5. It
    holds three rows where each row consists of two fields separated by a
    semicolon.

  8. Define the Row separator used to identify the
    end of a row.

  9. Set the Header to 1; in this scenario, the
    footer and the maximum number of processed rows are not set.

  10. From the design workspace, select tLogRow and
    click the Component tab to define its basic
    settings. For more information, see tLogRow.

  11. Save your Job and press F6 to execute
    it.

    [Screenshot: Use_Case_tFileInputFullRow3.png]

    tFileInputFullRow reads the three rows one by one,
    ignoring field separators, and the complete rows are displayed on the
    Run console.

    Note

    To extract only fields from rows, use tExtractDelimitedFields, tExtractPositionalFields, or tExtractRegexFields. For more information, see tExtractDelimitedFields, tExtractPositionalFields and tExtractRegexFields.
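The scenario above can be replayed outside the Studio with a short Java sketch. The file content below is hypothetical, shaped like test5: a header row followed by three data rows of two semicolon-separated fields.

```java
import java.util.List;

public class ScenarioDemo {
    public static void main(String[] args) {
        // Hypothetical content of test5: one header row plus three data
        // rows, each with two fields separated by a semicolon.
        List<String> file = List.of("firstname;id", "Andrew;1", "John;2", "Mary;3");
        int header = 1;  // Header = 1 skips the first row
        for (int i = header; i < file.size(); i++) {
            // Field separators are ignored: each whole line is one "line"
            // column, which is what tLogRow displays on the Run console.
            System.out.println(file.get(i));
        }
    }
}
```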


Source: Talend documentation, https://help.talend.com