August 17, 2023

tExtractDelimitedFields – Docs for ESB 5.x

tExtractDelimitedFields

tExtractDelimitedFields properties

Component family

Processing/Fields

 

Function

tExtractDelimitedFields generates
multiple columns from a given column in a delimited file.

If you have subscribed to one of the Talend solutions with Big Data, you are
able to use this component in a Talend Map/Reduce Job to generate
Map/Reduce code. For further information, see tExtractDelimitedFields in Talend Map/Reduce Jobs.

Purpose

tExtractDelimitedFields extracts
fields from within a string so that they can be written elsewhere,
for example.

Basic settings

Field to split

Select the incoming field to split from the Field to split list.

 

Ignore NULL as the source data

Select this check box to ignore the Null value in the source data.

Clear this check box to generate the Null records that correspond
to the Null value in the source data.

 

Field separator

Enter a character, string, or regular expression to separate fields in the transferred
data.

Note

Since this component uses regex to split a field and the regex
syntax uses special characters as operators, make sure to
precede the regex operator you use as a field separator with a
double backslash. For example, you have to use “\\|” instead of
“|”.
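
The escaping rule can be illustrated in plain Java, because the separator is interpreted as a regular expression. The following is a minimal sketch that assumes the split behaves like `String.split`; the class name and sample value are hypothetical and are not taken from the generated Job code.

```java
import java.util.Arrays;

public class SeparatorEscapeSketch {
    // Splits a value using the separator as a Java regular expression.
    static String[] split(String value, String separatorRegex) {
        return value.split(separatorRegex);
    }

    public static void main(String[] args) {
        String value = "Ann|Lee|Paris"; // hypothetical sample data

        // Escaped pipe: the Java literal "\\|" is the regex \| and matches a literal "|".
        System.out.println(Arrays.toString(split(value, "\\|"))); // [Ann, Lee, Paris]

        // Unescaped pipe: the regex | is an alternation of two empty patterns,
        // so it matches between characters and breaks the value apart.
        System.out.println(split(value, "|").length);
    }
}
```

With the unescaped separator the value is cut into single characters instead of three fields, which is why the operator needs to be escaped.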

 

Die on error

Clear the check box to skip any rows on error and complete the process for error-free rows.
When errors are skipped, you can collect the rows on error using a Row > Reject link.

 

Schema and Edit Schema

A schema is a row description. It defines the number of fields to be processed and passed on
to the next component. The schema is either Built-In or
stored remotely in the Repository.

Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:

  • View schema: choose this option to view the
    schema only.

  • Change to built-in property: choose this option
    to change the schema to Built-in for local
    changes.

  • Update repository connection: choose this option to change
    the schema stored in the repository and decide whether to propagate the changes to
    all the Jobs upon completion. If you just want to propagate the changes to the
    current Job, you can select No upon completion and
    choose this schema metadata again in the [Repository Content]
    window.

Click Sync columns to retrieve the schema from the
previous component connected in the Job.

 

 

Built-in: You create the schema
and store it locally for the component. Related topic: see
Talend Studio User Guide.

 

 

Repository: The schema already
exists and is stored in the Repository, so it can be reused in
various projects and Job flowcharts. Related topic: see
Talend Studio User Guide.

Advanced settings

Advanced separator (for number)

Select this check box to modify the separators used for
numbers.

 

Trim column

Select this check box to remove leading and trailing whitespace
from all columns.

 

Check each row structure against schema

Select this check box to check whether the total number of columns
in each row is consistent with the schema. If not consistent, an
error message will be displayed on the console.
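
As an illustration of what such a structure check amounts to, the sketch below compares the number of fields obtained from a row with the number of columns in the schema. It is a hedged approximation, not the component's actual implementation; the method name, separator, and sample rows are hypothetical.

```java
public class RowStructureCheckSketch {
    // Returns true when the row yields exactly as many fields as the schema defines.
    static boolean matchesSchema(String row, String separatorRegex, int schemaColumnCount) {
        // A limit of -1 keeps trailing empty fields, so "a;b;" counts as three fields.
        return row.split(separatorRegex, -1).length == schemaColumnCount;
    }

    public static void main(String[] args) {
        int schemaColumnCount = 3; // hypothetical schema with three columns
        String[] rows = { "a;b;c", "a;b", "a;b;" };
        for (String row : rows) {
            if (!matchesSchema(row, ";", schemaColumnCount)) {
                // Mirrors the idea of reporting the inconsistency on the console.
                System.err.println("Row does not match schema: " + row);
            }
        }
    }
}
```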

 

Validate date

Select this check box to check the date format strictly against the input schema.
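
To show what strict date checking means in practice, here is a small Java sketch based on `SimpleDateFormat` with lenient parsing turned off. How the component validates dates internally is not stated in this page, so treat this as an assumption; the pattern `yyyy-MM-dd` and the sample values are hypothetical.

```java
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.util.Date;

public class StrictDateSketch {
    // Returns true only if the whole text is a real calendar date in the given pattern.
    static boolean isValid(String text, String pattern) {
        SimpleDateFormat format = new SimpleDateFormat(pattern);
        format.setLenient(false); // reject impossible dates such as February 30
        ParsePosition position = new ParsePosition(0);
        Date parsed = format.parse(text, position);
        // Also require that no trailing characters were left unparsed.
        return parsed != null && position.getIndex() == text.length();
    }

    public static void main(String[] args) {
        System.out.println(isValid("2023-08-17", "yyyy-MM-dd"));  // true
        System.out.println(isValid("2023-02-30", "yyyy-MM-dd"));  // false
        System.out.println(isValid("2023-08-17x", "yyyy-MM-dd")); // false
    }
}
```

A lenient parser would silently roll 2023-02-30 over to March 2, which is the kind of value a strict check is meant to catch.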

 

tStatCatcher Statistics

Select this check box to gather the processing metadata at the Job
level as well as at each component level.

Global Variables

ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable and it returns a string. This
variable functions only if the Die on error check box is
cleared, if the component has this check box.

NB_LINE: the number of rows read by an input component or
transferred to an output component. This is an After variable and it returns an
integer.

A Flow variable functions during the execution of a component while an After variable
functions after the execution of the component.

To fill up a field or expression with a variable, press Ctrl + Space
to access the variable list and choose the variable to use from it.

For further information about variables, see Talend Studio
User Guide.
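
As a rough illustration of reading an After variable, the sketch below looks up NB_LINE in a map that stands in for the Job's globalMap. The key format `componentName_VARIABLE` and the component label `tExtractDelimitedFields_1` are assumptions made for this example; check the variable list (Ctrl + Space) for the exact names in your Job.

```java
import java.util.HashMap;
import java.util.Map;

public class GlobalVariableSketch {
    // Reads the NB_LINE variable of a component, or 0 if it has not been set yet.
    static int rowCount(Map<String, Object> globalMap, String componentName) {
        Object value = globalMap.get(componentName + "_NB_LINE");
        return value == null ? 0 : (Integer) value;
    }

    public static void main(String[] args) {
        // Stand-in for the Job's globalMap; in a real Job it is populated at run time.
        Map<String, Object> globalMap = new HashMap<>();
        globalMap.put("tExtractDelimitedFields_1_NB_LINE", 42); // hypothetical value

        System.out.println(rowCount(globalMap, "tExtractDelimitedFields_1")); // 42
    }
}
```

Because NB_LINE is an After variable, a lookup like this is only meaningful in a component that runs after the extraction has finished.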

Usage

This component handles a flow of data, so it requires input
and output components. It lets you extract data from a
delimited field using a Row > Main link, and create a reject
flow that filters out data whose type does not match the
defined type.
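
The main and reject flows can be pictured with the following Java sketch. It assumes a hypothetical output schema of a string column followed by an integer column and routes rows that fail the type conversion to a reject list; it is an illustration of the idea, not the code generated by the Studio.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RejectFlowSketch {
    final List<Object[]> main = new ArrayList<>();  // rows matching the schema
    final List<String> rejects = new ArrayList<>(); // original values that failed

    // Splits each value into (name, age); rows whose age is missing or
    // not an integer are sent to the reject list instead of the main list.
    void process(List<String> values, String separatorRegex) {
        for (String value : values) {
            String[] fields = value.split(separatorRegex, -1);
            try {
                main.add(new Object[] { fields[0], Integer.valueOf(fields[1].trim()) });
            } catch (NumberFormatException | ArrayIndexOutOfBoundsException e) {
                rejects.add(value);
            }
        }
    }

    public static void main(String[] args) {
        RejectFlowSketch flow = new RejectFlowSketch();
        flow.process(Arrays.asList("Ann;31", "Bob;n/a", "Cy"), ";"); // hypothetical rows
        System.out.println("main=" + flow.main.size() + " rejects=" + flow.rejects.size());
    }
}
```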

Log4j

The activity of this component can be logged using the log4j feature. For more information on this feature, see Talend Studio User Guide.

For more information on the log4j logging levels, see the Apache documentation at http://logging.apache.org/log4j/1.2/apidocs/org/apache/log4j/Level.html.

Limitation

n/a

tExtractDelimitedFields in Talend Map/Reduce Jobs

Warning

The information in this section is only for users that have subscribed to one of
the Talend solutions with Big Data and is not applicable to
Talend Open Studio for Big Data users.

In a Talend Map/Reduce Job, tExtractDelimitedFields, as well as the whole Map/Reduce Job using it,
generates native Map/Reduce code. This section presents the specific properties of
tExtractDelimitedFields when it is used in that
situation. For further information about a Talend Map/Reduce Job, see the Talend Big Data Getting Started Guide.

Component family

Processing / Fields

 

Basic settings

Property type

Either Built-in or Repository.

   

Built-in: no property data stored
centrally.

   

Repository: reuse properties
stored centrally under the Hadoop Cluster
node of the Repository tree.

The fields that follow are pre-filled using the fetched
data.

For further information about the Hadoop Cluster
node, see the Getting Started Guide.

 

Field

Select the column in which the fields are to be split.

 

Schema and Edit Schema

A schema is a row description. It defines the number of fields to be processed and passed on
to the next component. The schema is either Built-In or
stored remotely in the Repository.

Click Edit Schema to make changes to the schema. Note that if you make changes, the schema automatically becomes
built-in.

   

Built-In: You create and store the schema locally for this
component only. Related topic: see Talend Studio
User Guide.

   

Repository: You have already created the schema and
stored it in the Repository. You can reuse it in various projects and Job designs. Related
topic: see Talend Studio User Guide.

 

Die on error

Clear the check box to skip any rows on error and complete the process for error-free rows.
When errors are skipped, you can collect the rows on error using a Row > Reject link.

 

Field separator

Enter a character, string, or regular expression to separate fields in the transferred
data.

 

CSV options

Select this check box to include CSV specific parameters such as
Escape char and Text enclosure.

Advanced settings

Custom Encoding

You may encounter encoding issues when you process the stored data. In that situation, select
this check box to display the Encoding list.

Then select the encoding to be used from the list or select
Custom and define it
manually.

Advanced separator (for number)

Select this check box to change the separator used for numbers. By
default, the thousands separator is a comma (,) and the decimal separator is a period (.).
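
The effect of the two separators can be sketched in Java as a normalization step before the number is parsed. This is an assumption about the general approach rather than the component's actual code; the class name and sample values are hypothetical.

```java
import java.math.BigDecimal;

public class NumberSeparatorSketch {
    // Removes the thousands separator and maps the decimal separator to "."
    // so that the text can be parsed as a plain decimal number.
    static BigDecimal parse(String text, char thousands, char decimal) {
        String normalized = text.replace(String.valueOf(thousands), "")
                                .replace(decimal, '.');
        return new BigDecimal(normalized);
    }

    public static void main(String[] args) {
        // Default separators described above: comma for thousands, period for decimals.
        System.out.println(parse("1,234.56", ',', '.')); // 1234.56
        // Custom separators, as commonly used in some European locales.
        System.out.println(parse("1.234,56", '.', ',')); // 1234.56
    }
}
```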

 

Trim all columns

Select this check box to remove leading and trailing
whitespace from all columns. When this check box is cleared, the
Check column to trim table is
displayed, which lets you select particular columns to trim.

 

Check column to trim

This table is filled automatically with the schema being used. Select the check box(es)
corresponding to the column(s) to be trimmed.

 

Check each row structure against schema

Select this check box to check whether the total number of columns
in each row is consistent with the schema. If not consistent, an
error message will be displayed on the console.

 

Check date

Select this check box to check the date format strictly against the input schema.

 

Permit hexadecimal (0xNNN) or octal (0NNNN) for numeric types

Select this check box if any of your numeric types (long, integer, short, or byte) will
be parsed from a hexadecimal or octal string.
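
The difference this option makes can be seen with the standard Java methods `Long.decode`, which understands the `0x` and leading-`0` prefixes, and `Long.parseLong`, which reads decimal digits only. The sketch below is an illustration under the assumption that the option switches between these two behaviors; the method name is hypothetical.

```java
public class HexOctalSketch {
    // Parses a numeric string, optionally honoring hexadecimal (0x...) and octal (0...) prefixes.
    static long parseNumeric(String text, boolean permitHexOrOctal) {
        return permitHexOrOctal ? Long.decode(text) : Long.parseLong(text);
    }

    public static void main(String[] args) {
        System.out.println(parseNumeric("0x1F", true)); // 31 (hexadecimal)
        System.out.println(parseNumeric("017", true));  // 15 (octal)
        System.out.println(parseNumeric("017", false)); // 17 (plain decimal)
        // parseNumeric("0x1F", false) would throw NumberFormatException.
    }
}
```

Note that the same string, such as 017, yields a different number depending on the option, so it is worth enabling it only when the source data really uses these notations.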

 

tStatCatcher Statistics

Select this check box to collect log data at the component
level.

Global Variables

ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable and it returns a string. This
variable functions only if the Die on error check box is
cleared, if the component has this check box.

A Flow variable functions during the execution of a component while an After variable
functions after the execution of the component.

To fill up a field or expression with a variable, press Ctrl + Space
to access the variable list and choose the variable to use from it.

For further information about variables, see Talend Studio
User Guide.

Usage

If you have subscribed to one of the Talend solutions with Big Data, you can also
use this component as a Map/Reduce component. In a Talend Map/Reduce Job, this
component is used as an intermediate step and other components used along with it must be
Map/Reduce components, too. They generate native Map/Reduce code that can be executed
directly in Hadoop.

Once a Map/Reduce Job is opened in the workspace, tExtractDelimitedFields as well as the
MapReduce family appear in the Palette of the Studio.

Note that in this documentation, unless otherwise explicitly stated, a scenario presents
only Standard Jobs, that is to say traditional Talend data
integration Jobs, and not Map/Reduce Jobs.

Hadoop Connection

You need to use the Hadoop Configuration tab in the
Run view to define the connection to a given Hadoop
distribution for the whole Job.

This connection is effective on a per-Job basis.

Scenario: Extracting fields from a comma-delimited file

This scenario describes a three-component Job in which the tExtractDelimitedFields component is used to extract two columns from a
comma-delimited file.

First names and last names are extracted and displayed in the corresponding defined
columns on the console.

Linking the components

  1. Drop the following components from the Palette onto the design workspace: tFileInputDelimited, tExtractDelimitedFields, and tLogRow.

  2. Connect them using Row > Main links.

Configuring the components

  1. Double-click the tFileInputDelimited
    component to open its Basic settings
    view.

  2. In the Basic settings view, set Property Type to Built-In.

  3. Click the […] button next to the
    File Name field to select the path to
    the input file.

    Note

    The File Name field is mandatory.

    The input file used in this scenario is called test5.
    It is a text file that holds comma-delimited data.

  4. In the Basic settings view, fill in all
    other fields as needed. For more information, see tFileInputDelimited. In this scenario, the header and the
    footer are not set and there is no limit on the number of processed
    rows.

  5. Click Edit schema to describe the data
    structure of this input file. In this scenario, the schema is made of one
    column, name.

  6. Double-click the tExtractDelimitedFields
    component to open its Basic settings
    view.

  7. From the Field to split list, select the
    column to split, name in this scenario.

  8. In the Field separator field, enter the
    corresponding separator.

  9. Click Edit schema to describe the data
    structure of this processing component.

  10. In the output panel of the [Schema of tExtractDelimitedFields]
    dialog box, click the plus button to
    add two columns for the output schema, firstname and
    lastname.

    In this scenario, we want to split the name column
    into two columns in the output flow, firstname and
    lastname.

  11. Click OK to close the [Schema of tExtractDelimitedFields] dialog
    box.

  12. In the design workspace, select tLogRow
    and click the Component tab to define its
    basic settings. For more information, see tLogRow.
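
The transformation configured in the steps above can be approximated in a few lines of Java. This is only a sketch: the contents of test5 are not shown on this page, so the sample rows are hypothetical, and returning null for a missing lastname is an assumption rather than documented behavior.

```java
public class NameSplitSketch {
    // Splits the "name" column on a comma into { firstname, lastname }.
    // A missing second field is returned as null (assumption for this sketch).
    static String[] splitName(String name) {
        String[] parts = name.split(",", -1);
        String firstname = parts[0].trim();
        String lastname = parts.length > 1 ? parts[1].trim() : null;
        return new String[] { firstname, lastname };
    }

    public static void main(String[] args) {
        String[] rows = { "Andrew,Smith", "Maria,Lopez" }; // hypothetical input rows
        for (String row : rows) {
            String[] out = splitName(row);
            // Rough stand-in for what tLogRow prints: one line per row, columns separated by "|".
            System.out.println(out[0] + "|" + out[1]);
        }
    }
}
```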

Executing the Job

  1. Press Ctrl + S to save your Job.

  2. Press F6 to execute it.

Document sourced from Talend: https://help.talend.com