August 17, 2023

tFileInputFullRow – Docs for ESB 5.x

tFileInputFullRow


tFileInputFullRow properties

Component family

File/Input

 

Function

tFileInputFullRow reads a given
file row by row.

If you have subscribed to one of the Talend solutions with Big Data, you are
able to use this component in a Talend Map/Reduce Job to generate
Map/Reduce code. For further information, see tFileInputFullRow in Talend
Map/Reduce Jobs
.

Purpose

tFileInputFullRow opens a file, reads it row by row, and sends each
complete row, as defined in the schema, to the next component in the
Job via a Row link.

Basic settings

Schema and Edit
Schema

A schema is a row description. It defines the number of fields to
be processed and passed on to the next component. The schema is
either Built-in or stored remotely
in the Repository.

Since version 5.6, both the Built-In mode and the Repository mode are
available in any of the Talend solutions.

Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:

  • View schema: choose this option to view the
    schema only.

  • Change to built-in property: choose this option
    to change the schema to Built-in for local
    changes.

  • Update repository connection: choose this option to change
    the schema stored in the repository and decide whether to propagate the changes to
    all the Jobs upon completion. If you just want to propagate the changes to the
    current Job, you can select No upon completion and
    choose this schema metadata again in the [Repository
    Content]
    window.

 

File Name

Name of the file and/or the variable to be processed.

For further information about how to define and use a variable in
a Job, see Talend Studio
User Guide.

 

Row separator

Enter the separator used to identify the end of a row.

 

Header

Enter the number of rows to be skipped at the beginning of the file.

 

Footer

Enter the number of rows to be skipped at the end of the file.

 

Limit

Maximum number of rows to be processed. If Limit = 0, no row is
read or processed.

 

Skip empty rows

Select this check box to skip the empty rows.
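Taken together, the Header, Footer, Limit and Skip empty rows settings behave roughly like the following stand-alone Java sketch. The class and method names are invented for this illustration; this is not Talend's generated code.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class FullRowReader {
    // Returns complete rows, applying Header, Footer, Limit and
    // "Skip empty rows" as described in the settings above.
    static List<String> readFullRows(Path file, int header, int footer,
                                     int limit, boolean skipEmpty) throws IOException {
        List<String> lines = Files.readAllLines(file);
        List<String> out = new ArrayList<>();
        if (limit == 0) return out;  // per the Limit setting: 0 reads nothing
        // Drop the trailing Footer rows, then skip the leading Header rows.
        int end = Math.max(header, lines.size() - footer);
        for (int i = header; i < end; i++) {
            String row = lines.get(i);
            if (skipEmpty && row.isEmpty()) continue;
            if (limit > 0 && out.size() == limit) break;
            out.add(row);  // the whole row, field separators included
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        Path f = Files.createTempFile("test5", ".csv");
        Files.write(f, List.of("col1;col2", "a;1", "", "b;2", "c;3", "eof"));
        // Header = 1, Footer = 1, Limit = 3, skip empty rows
        System.out.println(readFullRows(f, 1, 1, 3, true)); // prints [a;1, b;2, c;3]
    }
}
```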

 

Die on error

Select this check box to stop the execution of the Job when an error occurs.

Clear the check box to skip any rows on error and complete the process for error-free rows.
When errors are skipped, you can collect the rows on error using a Row
> Reject
link.
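Conceptually, the two states of the Die on error check box behave like the following sketch, where rows that fail are either re-thrown (Job stops) or diverted to a reject list (the Row > Reject flow). The sample rows and parsing logic are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class DieOnErrorDemo {
    public static void main(String[] args) {
        List<String> rows = List.of("1;ok", "oops", "2;ok");
        List<Integer> main = new ArrayList<>();
        List<String> reject = new ArrayList<>();
        boolean dieOnError = false;  // check box cleared
        for (String row : rows) {
            try {
                main.add(Integer.parseInt(row.split(";")[0]));
            } catch (NumberFormatException e) {
                if (dieOnError) throw e;  // selected: execution stops here
                reject.add(row);          // cleared: row goes to the Reject flow
            }
        }
        System.out.println(main + " rejected=" + reject); // prints [1, 2] rejected=[oops]
    }
}
```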

Advanced settings

Encoding

Select the encoding from the list or select Custom and
define it manually. This field is compulsory for database data handling.
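In plain Java terms, choosing an encoding corresponds to passing a charset when reading the file, as in this small sketch; reading ISO-8859-1 bytes as UTF-8, for instance, would garble accented characters.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class EncodingDemo {
    public static void main(String[] args) throws IOException {
        Path f = Files.createTempFile("enc", ".txt");
        // Write a row containing a non-ASCII character in ISO-8859-1.
        Files.write(f, "café;1".getBytes(Charset.forName("ISO-8859-1")));
        // Reading with the matching encoding preserves the character.
        List<String> rows = Files.readAllLines(f, Charset.forName("ISO-8859-1"));
        System.out.println(rows.get(0)); // prints café;1
    }
}
```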

 

Extract lines at random

Select this check box to set the number of lines to be extracted
randomly.

 

tStatCatcher Statistics

Select this check box to gather the Job processing metadata at a
Job level as well as at each component level.

Global Variables

NB_LINE: the number of rows processed. This is an After
variable and it returns an integer.

ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable and it returns a string. This
variable functions only if the Die on error check box is
cleared, if the component has this check box.

A Flow variable functions during the execution of a component while an After variable
functions after the execution of the component.

To fill in a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to use from it.

For further information about variables, see Talend Studio
User Guide.
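In generated Job code, these variables are read from the globalMap using the component's label as a key. The sketch below simulates the globalMap with a plain HashMap so the lookup pattern can run stand-alone; the label tFileInputFullRow_1 is an assumed example.

```java
import java.util.HashMap;
import java.util.Map;

public class GlobalVarsDemo {
    public static void main(String[] args) {
        // In a real Job, globalMap is provided by Talend's generated code;
        // here it is simulated so the example is self-contained.
        Map<String, Object> globalMap = new HashMap<>();
        globalMap.put("tFileInputFullRow_1_NB_LINE", 3);  // set After execution

        // Typical lookup in a tJava component after this component has run:
        int nbLine = (Integer) globalMap.get("tFileInputFullRow_1_NB_LINE");
        System.out.println("rows processed: " + nbLine);
    }
}
```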

Usage

Use this component to read full rows in delimited files that can
get very large. You can also create a rejection flow using a
Row > Reject link to filter
the data which does not correspond to the type defined. For an
example of how to use these two links, see Scenario 2: Extracting correct and erroneous data from an XML field in a delimited
file
.

Log4j

The activity of this component can be logged using the log4j feature. For more information on this feature, see Talend Studio User
Guide
.

For more information on the log4j logging levels, see the Apache documentation at http://logging.apache.org/log4j/1.2/apidocs/org/apache/log4j/Level.html.

tFileInputFullRow in Talend
Map/Reduce Jobs

Warning

The information in this section is only for users that have subscribed to one of
the Talend solutions with Big Data and is not applicable to
Talend Open Studio for Big Data users.

In a Talend Map/Reduce Job, tFileInputFullRow, as well as the whole Map/Reduce Job using it,
generates native Map/Reduce code. This section presents the specific properties of
tFileInputFullRow when it is used in that
situation. For further information about a Talend Map/Reduce Job, see the Talend Big Data Getting Started Guide.

Component family

MapReduce / Input

 

Basic settings

Property type

Either Built-in or Repository.

   

Built-in: no property data stored
centrally.

   

Repository: reuse properties
stored centrally under the Hadoop
Cluster
node of the Repository tree.

The fields that follow are pre-filled using the fetched
data.

For further information about the Hadoop
Cluster
node, see the Getting Started Guide.

 

Schema and Edit
Schema

A schema is a row description. It defines the number of fields to be processed and passed on
to the next component. The schema is either Built-In or
stored remotely in the Repository.

Click Edit Schema to make changes to the schema. Note that if you make changes, the schema automatically becomes
built-in.

   

Built-In: You create and store the schema locally for this
component only. Related topic: see Talend Studio
User Guide.

   

Repository: You have already created the schema and
stored it in the Repository. You can reuse it in various projects and Job designs. Related
topic: see Talend Studio User Guide.

 

Folder/File

Browse to, or enter, the directory in HDFS where the data to be used is stored.

If the path you set points to a folder, this component reads all of
the files stored in that folder, for example /user/talend/in; any
sub-folders are automatically ignored unless you define the path like
/user/talend/in/*.

If you want to specify more than one file or directory in this
field, separate each path using a comma (,).

If the file to be read is a compressed one, enter the file name
with its extension; tFileInputFullRow then automatically decompresses it at
runtime. The supported compression formats and their corresponding
extensions are:

  • DEFLATE: *.deflate

  • gzip: *.gz

  • bzip2: *.bz2

  • LZO: *.lzo
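Selecting a decompressor by file extension can be sketched as follows. Only gzip and DEFLATE streams are in the JDK; bzip2 and LZO would need an extra library (for example Apache Commons Compress), so they are left unhandled in this illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import java.util.zip.InflaterInputStream;

public class CompressionByExtension {
    // Picks a decompressing stream from the extension, mirroring the
    // table above; .bz2 and .lzo would need a third-party stream.
    static InputStream open(Path file) throws IOException {
        InputStream raw = Files.newInputStream(file);
        String name = file.getFileName().toString();
        if (name.endsWith(".gz"))      return new GZIPInputStream(raw);
        if (name.endsWith(".deflate")) return new InflaterInputStream(raw);
        return raw;
    }

    public static void main(String[] args) throws IOException {
        Path f = Files.createTempFile("rows", ".gz");
        try (GZIPOutputStream out = new GZIPOutputStream(Files.newOutputStream(f))) {
            out.write("full;row\n".getBytes());
        }
        try (InputStream in = open(f)) {
            System.out.println(new String(in.readAllBytes()).trim()); // prints full;row
        }
    }
}
```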

Note that you need
to ensure you have properly configured the connection to the Hadoop
distribution to be used in the Hadoop
configuration
tab in the Run view.

 

Die on error

Clear the check box to skip any rows on error and complete the process for error-free rows.
When errors are skipped, you can collect the rows on error using a Row
> Reject
link.

 

Row separator

Enter the separator used to identify the end of a row.

 

Header

Enter the number of rows to be skipped at the beginning of the file.

 

Skip empty rows

Select this check box to skip the empty rows.

Advanced settings

Custom Encoding

You may encounter encoding issues when you process the stored data. In that situation, select
this check box to display the Encoding list.

Then select the encoding to be used from the list or select
Custom and define it
manually.

Global Variables

ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable and it returns a string. This
variable functions only if the Die on error check box is
cleared, if the component has this check box.

A Flow variable functions during the execution of a component while an After variable
functions after the execution of the component.

To fill in a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to use from it.

For further information about variables, see Talend Studio
User Guide.

Usage

In a Talend Map/Reduce Job, it is used as a start component and requires
a transformation component as its output link. The other components used along with it must be
Map/Reduce components, too. They generate native Map/Reduce code that can be executed
directly in Hadoop.

Once a Map/Reduce Job is opened in the workspace, tFileInputFullRow as well as the
MapReduce family appears in the Palette of the Studio.

Note that in this documentation, unless otherwise explicitly stated, a scenario presents
only Standard Jobs, that is to say traditional Talend data
integration Jobs, not Map/Reduce Jobs.

Hadoop Connection

You need to use the Hadoop Configuration tab in the
Run view to define the connection to a given Hadoop
distribution for the whole Job.

This connection is effective on a per-Job basis.

Scenario: Reading full rows in a delimited file

The following scenario creates a two-component Job that aims at reading complete rows
in a file and displaying the output in the Run log
console.

  1. Drop a tFileInputFullRow and a tLogRow from the Palette onto the design workspace.

  2. Right-click on the tFileInputFullRow
    component and connect it to tLogRow using a
    Row Main link.

    [Screenshot: Use_Case_tFileInputFullRow.png]
  3. In the design workspace, select tFileInputFullRow.

  4. Click the Component tab to define the basic
    settings for tFileInputFullRow.

    [Screenshot: Use_Case_tFileInputFullRow1.png]
  5. In the Basic settings view, set Schema to Built-In.

  6. Click the three-dot […] button next to the
    Edit schema field to see the data to pass
    on to the tLogRow component. Note that the
    schema is read-only and it consists of one column,
    line.

    [Screenshot: Use_Case_tFileInputFullRow2.png]
  7. Fill in a path to the file to process in the File
    Name
    field, or click the three-dot […] button. This field is
    mandatory. In this scenario, the file to read is test5. It
    holds three rows where each row consists of two fields separated by a
    semicolon.

  8. Define the Row separator used to identify the
    end of a row.

  9. Set the Header to 1; in this scenario, the
    footer and the maximum number of processed rows are not set.

  10. From the design workspace, select tLogRow and
    click the Component tab to define its basic
    settings. For more information, see tLogRow.

  11. Save your Job and press F6 to execute
    it.

    [Screenshot: Use_Case_tFileInputFullRow3.png]

    tFileInputFullRow reads the three rows one by one,
    ignoring field separators, and the complete rows are displayed on the
    Run console.

    Note

    To extract only fields from rows, use tExtractDelimitedFields, tExtractPositionalFields, or tExtractRegexFields. For more information, see tExtractDelimitedFields, tExtractPositionalFields and tExtractRegexFields.
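The scenario above can be replayed outside the Studio with a short Java sketch. The file content below is hypothetical, shaped like test5: a header row followed by three data rows of two semicolon-separated fields.

```java
import java.util.List;

public class ScenarioDemo {
    public static void main(String[] args) {
        // Hypothetical content of test5: one header row plus three data
        // rows, each with two fields separated by a semicolon.
        List<String> file = List.of("firstname;id", "Andrew;1", "John;2", "Mary;3");
        int header = 1;  // Header = 1 skips the first row
        for (int i = header; i < file.size(); i++) {
            // Field separators are ignored: each whole line is one "line"
            // column, which is what tLogRow displays on the Run console.
            System.out.println(file.get(i));
        }
    }
}
```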


Source: Talend documentation, https://help.talend.com