August 17, 2023

tPatternExtract – Docs for ESB 5.x

tPatternExtract


Warning

This component is available in the Palette of
Talend Studio only if you have subscribed to one of
the Talend Platform products.

tPatternExtract properties

Component family

Data Quality

Function

tPatternExtract extracts from a column
all data strings that match a given Java regular
expression.
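To illustrate what "all data strings that match a given Java regular expression" means, the following plain-Java sketch uses `java.util.regex` to collect every match found in one cell value. It is not the code the component generates. The class name `PatternExtractSketch`, the method `extractAll`, and the sample string are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PatternExtractSketch {

    // Returns every substring of 'value' that matches 'regex', in order of appearance.
    static List<String> extractAll(String value, String regex) {
        List<String> matches = new ArrayList<>();
        if (value == null) {
            return matches; // treat a null cell as containing no matches
        }
        Matcher matcher = Pattern.compile(regex).matcher(value);
        while (matcher.find()) {          // find() resumes scanning after the previous match
            matches.add(matcher.group()); // group() returns the text of the current match
        }
        return matches;
    }

    public static void main(String[] args) {
        String cell = "Order 17 shipped, order 42 pending";
        System.out.println(extractAll(cell, "\\d+")); // prints [17, 42]
    }
}
```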

Purpose

tPatternExtract lets you output all
data that match a given pattern. You can then
apply any required operation to the extracted
data.

Basic settings

Column to check

Select the column you want to
analyze.

Schema and Edit schema

A schema is a row description. It defines
the number of fields to be processed and passed on to the next component. The schema
is either Built-in or stored remotely in the
Repository.

Built-in:
You create the schema and store it locally for
this component only. Related topic: see
Talend Studio User Guide.

Repository:
You have already created the schema and stored it
in the Repository. You can reuse it in various
projects and Job designs. Related topic: see
Talend Studio User Guide.

Since version 5.6, both the Built-In mode and the Repository mode are
available in any of the Talend solutions.

Pattern type

Select from the list the pattern to check the
data against. The component extracts all data
that match the selected pattern.

Advanced settings

tStatCatcher Statistics

Select this check box to collect log data at
the component level.

Global Variables

NB_LINE: the number of rows read by an input component or
transferred to an output component. This is an After variable and it returns an
integer.

NB_LINE_OK: the number of rows matching a given pattern.
This is an After variable and it returns an integer.

NB_LINE_REJECT: the number of rows not matching a given
pattern. This is an After variable and it returns an integer.

ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable and it returns a string. If
the component has a Die on error check box, this variable functions only when that
check box is cleared.

A Flow variable functions during the execution of a component while an After variable
functions after the execution of the component.

To fill in a field or expression with a variable, press Ctrl + Space
to access the variable list and choose the variable to use from it.

For further information about variables, see Talend Studio
User Guide.
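In generated Job code, After variables are stored in a map called `globalMap` under a key built from the component label and the variable name. The sketch below shows the usual cast-and-read pattern, for example in a tJava component placed in a later subjob. It assumes the component label is `tPatternExtract_1`, uses a plain `HashMap` in place of the map the Job fills in at run time, and relies on made-up counts. The class and the `key` helper are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

public class GlobalVariableSketch {

    // Builds the key under which an After variable is stored,
    // e.g. "tPatternExtract_1" + "_" + "NB_LINE_OK".
    static String key(String componentLabel, String variable) {
        return componentLabel + "_" + variable;
    }

    public static void main(String[] args) {
        // Stand-in for the globalMap populated by a running Job; the values are made up.
        Map<String, Object> globalMap = new HashMap<>();
        globalMap.put(key("tPatternExtract_1", "NB_LINE_OK"), 7);
        globalMap.put(key("tPatternExtract_1", "NB_LINE_REJECT"), 3);

        // After variables are returned as Object, so integer counters are cast back to Integer.
        Integer ok = (Integer) globalMap.get(key("tPatternExtract_1", "NB_LINE_OK"));
        Integer rejected = (Integer) globalMap.get(key("tPatternExtract_1", "NB_LINE_REJECT"));
        System.out.println(ok + " matching, " + rejected + " rejected"); // prints 7 matching, 3 rejected
    }
}
```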

Usage

This component is an intermediary step. It
requires an input flow and an output
flow.

Limitation

n/a

Scenario: Extracting only the data that corresponds to a defined pattern from a delimited file

This scenario describes a four-component Job in which the tPatternExtract component extracts only
customers’ email addresses (the data that match the Email Address
pattern) from a delimited file that holds
different customer data. The Job then writes the extracted data into another
delimited file. A tFilterColumns component
is used to adapt the output schema.

Use_Case_tPatternExtract.png

Setting up the Job

  1. Drop the following components from the Palette to the design
    workspace: tFileInputDelimited, tPatternExtract, tFilterColumns, and
    tFileOutputDelimited.

  2. Connect the tFileInputDelimited component to the
    tPatternExtract
    component using a Row > Main connection.

  3. Connect the tPatternExtract component to the
    tFilterColumns component using the Row > Matching Data
    connection.

  4. Connect the tFilterColumns component to the
    tFileOutputDelimited component using a
    Row > Main connection.
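As its name suggests, the Row > Matching Data connection carries the data that match the pattern, and the NB_LINE_OK and NB_LINE_REJECT variables count matching and non-matching rows. The plain-Java sketch below imitates that split. The rule that a row "matches" when the checked column contains at least one match is an assumption made for illustration and may not be the component's exact rule. The class, record, and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class MatchingDataRoutingSketch {

    // Rows routed to each side, with counters named after the component's After variables.
    record Routed(List<String> matching, List<String> rejected) {
        int nbLineOk() { return matching.size(); }
        int nbLineReject() { return rejected.size(); }
    }

    // Hypothetical rule: a value matches when it contains at least one pattern match.
    static Routed route(List<String> checkedColumn, Pattern pattern) {
        List<String> matching = new ArrayList<>();
        List<String> rejected = new ArrayList<>();
        for (String value : checkedColumn) {
            if (value != null && pattern.matcher(value).find()) {
                matching.add(value);  // would continue on the Matching Data flow
            } else {
                rejected.add(value);  // counted as not matching
            }
        }
        return new Routed(matching, rejected);
    }

    public static void main(String[] args) {
        Routed r = route(List.of("abc 12", "no digits", "7"), Pattern.compile("\\d+"));
        System.out.println(r.nbLineOk() + " " + r.nbLineReject()); // prints 2 1
    }
}
```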

Configuring the components

  1. Double-click tFileInputDelimited to display its
    Basic settings
    view and define the component properties, including
    the input file name, the number of header rows to
    skip, and the schema.

    Use_Case_tPatternExtract1.png

    In this scenario, the delimited file holds names,
    email addresses and telephone numbers, all in a
    single column:
    Name_Telephone_Address. The
    following shows an extract of the input file.

    Use_Case_tPatternExtract2.png

    Therefore, define the input schema as follows:

    Use_Case_tPatternExtract3.png
  2. Double-click tPatternExtract to display its
    Basic settings
    view and define the component properties.

    Use_Case_tPatternExtract4.png
  3. From the Column to check list, select the column whose data you
    want to check against the defined pattern,
    Name_Telephone_Address in this example.

  4. In the Pattern type list, select the pattern according to which you
    want to extract data, /Regex/internet/Email Address in this
    example.

  5. In the Basic settings
    view of the tFilterColumns component, click the
    […] button next
    to Edit schema to
    open the [Schema]
    dialog box.

    Use_Case_tPatternExtract5.png
  6. Select the column of interest from the Input schema,
    and click the right arrow button to copy the column
    to the output schema. Then, click OK to close the dialog
    box.

  7. Double-click tFileOutputDelimited to display its
    Basic settings
    view and define the component properties.

    Use_Case_tPatternExtract6.png
  8. In the File Name
    field, specify the path to the file you want to
    write the output data to.

  9. Define the row and field separators in the
    corresponding fields, if any. In this example, we
    want to separate customers’ email addresses by
    semicolons.
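The configuration above extracts email addresses from the single Name_Telephone_Address column and then uses tFilterColumns to keep only the column of interest. The plain-Java sketch below imitates those two steps. The `EMAIL` regex is a simplified stand-in and is not the /Regex/internet/Email Address pattern shipped with Talend. The sample rows, the `MatchedRow` record, and the method names are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EmailExtractSketch {

    // Simplified stand-in for the Email Address pattern; NOT the regex shipped with Talend.
    static final Pattern EMAIL =
            Pattern.compile("[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}");

    // Hypothetical shape of a matching row: the checked column plus the extracted value.
    record MatchedRow(String nameTelephoneAddress, String email) {}

    // Stand-in for tPatternExtract: one output row per email found in the checked column.
    static List<MatchedRow> extractEmails(List<String> column) {
        List<MatchedRow> out = new ArrayList<>();
        for (String value : column) {
            Matcher m = EMAIL.matcher(value);
            while (m.find()) {
                out.add(new MatchedRow(value, m.group()));
            }
        }
        return out;
    }

    // Stand-in for tFilterColumns: keep only the email column.
    static List<String> filterColumns(List<MatchedRow> rows) {
        List<String> emails = new ArrayList<>();
        for (MatchedRow row : rows) {
            emails.add(row.email());
        }
        return emails;
    }

    public static void main(String[] args) {
        List<String> column = List.of(
                "Jane Smith jane.smith@example.com 555-0142",
                "Bob Lee no-email 555-0199",
                "Ann Wu ann.wu@example.org 555-0110");
        System.out.println(filterColumns(extractEmails(column)));
        // prints [jane.smith@example.com, ann.wu@example.org]
    }
}
```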

Executing the Job

  • Save your Job and press F6
    to execute it.

    Customers’ email addresses are extracted from the
    selected column according to the defined Email
    pattern and written to the output file using
    semicolons as row separators. You can then, for
    example, send an email to all your customers in one
    go.

    Use_Case_tPatternExtract8.png
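The output step writes the extracted addresses as single-column rows separated by semicolons. The sketch below builds that content in plain Java and writes it to a temporary file. It appends the row separator after every row, including the last one. That is an assumption, so check it against the file that tFileOutputDelimited actually produces. The class and method names and the sample addresses are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class DelimitedOutputSketch {

    // Builds the file content by appending the row separator after each single-column row.
    static String toDelimited(List<String> rows, String rowSeparator) {
        StringBuilder sb = new StringBuilder();
        for (String row : rows) {
            sb.append(row).append(rowSeparator);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        List<String> emails = List.of("jane.smith@example.com", "ann.wu@example.org");
        String content = toDelimited(emails, ";");

        // A temporary file keeps the sketch self-contained; a real Job writes to the File Name path.
        Path out = Files.createTempFile("emails", ".csv");
        Files.writeString(out, content, StandardCharsets.UTF_8);
        String written = Files.readString(out, StandardCharsets.UTF_8);
        Files.deleteIfExists(out);

        System.out.println(written); // prints jane.smith@example.com;ann.wu@example.org;
    }
}
```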

Document retrieved from Talend: https://help.talend.com