August 17, 2023

tFindRegexlibExpressions – Docs for ESB 5.x

tFindRegexlibExpressions

Warning

This component will be available in the Palette of
Talend Studio on the condition that you have subscribed to one of
the Talend Platform products.

Component family

Data Quality

 

Function

tFindRegexlibExpressions connects
to a web service at http://regexlib.com to get a list of regular
expressions for all languages, even those that are not supported by
Talend.

Purpose

tFindRegexlibExpressions returns
a data set holding information about all of the regular expressions
that match the request sent to the web server. Then you can keep
this information

Basic settings

Schema and Edit
Schema

These fields are read-only. The schema of this component contains
the following fields: Title, Expression, Description,
Matches, Non-Matches, Author, and Rating.

 

Regexp Substring

Define a regular expression substring you want to use as a filter
on the regular expression list.

 

Key Words

Enter the key word(s) you want to use as a filter on the regular
expression list. Key words are separated by commas.

 

Min Rate

Define the minimum rating a regular expression must have to be kept
in the regular expression list.

 

Relative path

Type in the relative path pointing to the pattern folder you need
to create under the Patterns > Regex node in the DQ Repository
tree view for keeping the retrieved patterns. For example, if you
need to create a folder called phone with a sub-folder uk for the
phone patterns used in the U.K., type in "phone/uk" in this
Relative path field.

To actually create the pattern folder in the DQ Repository, you must
import into it the retrieved regular expressions that have been
stored in a .csv file. For further information about how to import
regular expressions from a .csv file, see the Talend Studio
User Guide.
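
As an illustration of how the Regexp Substring, Key Words and Min Rate settings narrow the list, here is a minimal Java sketch. It is not the component's implementation: the Entry class, the sample entries, and the matching rules (substring searched in the expression, every key word searched case-insensitively in the description, rating compared with "at least") are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class RegexlibFilterSketch {

    /** Hypothetical record type; only the fields the filters need are modeled. */
    static final class Entry {
        final String title;
        final String expression;
        final String description;
        final int rating;

        Entry(String title, String expression, String description, int rating) {
            this.title = title;
            this.expression = expression;
            this.description = description;
            this.rating = rating;
        }
    }

    /** Splits the Key Words value on commas, trimming blanks and dropping empty items. */
    static List<String> splitKeyWords(String keyWords) {
        List<String> result = new ArrayList<>();
        if (keyWords == null) {
            return result;
        }
        for (String part : keyWords.split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                result.add(trimmed.toLowerCase(Locale.ROOT));
            }
        }
        return result;
    }

    /**
     * Assumed semantics: the substring must occur in the expression, every
     * key word must occur in the description, and the rating must be at
     * least minRate. An empty or null substring disables that filter.
     */
    static boolean accepts(Entry e, String regexpSubstring, String keyWords, int minRate) {
        if (regexpSubstring != null && !regexpSubstring.isEmpty()
                && !e.expression.contains(regexpSubstring)) {
            return false;
        }
        String description = e.description.toLowerCase(Locale.ROOT);
        for (String keyWord : splitKeyWords(keyWords)) {
            if (!description.contains(keyWord)) {
                return false;
            }
        }
        return e.rating >= minRate;
    }

    public static void main(String[] args) {
        // Invented sample data, not actual regexlib.com content.
        List<Entry> entries = Arrays.asList(
                new Entry("Email check", "^\\S+@\\S+$", "Simple email validator", 3),
                new Entry("US phone", "^\\d{3}-\\d{4}$", "Phone number", 4),
                new Entry("Old email", "^.+@.+$", "Loose email pattern", 0));
        for (Entry e : entries) {
            if (accepts(e, "", "email", 1)) {
                System.out.println(e.title);
            }
        }
    }
}
```

The call in main mirrors the scenario later in this page: key word "email" and a minimum rating of 1, with no expression substring.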

Advanced settings

tStatCatcher Statistics

Select this check box to collect log data at the component
level.

Global Variables

ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable and it returns a string. This
variable functions only if the Die on error check box is
cleared, if the component has this check box.

A Flow variable functions during the execution of a component while an After variable
functions after the execution of the component.

To fill in a field or expression with a variable, press Ctrl + Space
to access the variable list and choose the variable to use from it.

For further information about variables, see Talend Studio
User Guide.
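
In Java code inside a Job, After variables are typically read from the Job's globalMap under a key made of the component's unique name followed by the variable name. The sketch below imitates that lookup with a plain HashMap so that it runs on its own; the instance name tFindRegexlibExpressions_1, the helper errorMessageOf, and the sample message are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

public class ErrorMessageSketch {

    /** Returns the ERROR_MESSAGE After variable of a component, or a fallback if it is unset. */
    static String errorMessageOf(Map<String, Object> globalMap, String componentName) {
        Object value = globalMap.get(componentName + "_ERROR_MESSAGE");
        return value == null ? "no error reported" : (String) value;
    }

    public static void main(String[] args) {
        // Stand-in for the globalMap available inside a generated Job.
        Map<String, Object> globalMap = new HashMap<>();
        // Invented sample value; a real Job would set this only when an error occurs.
        globalMap.put("tFindRegexlibExpressions_1_ERROR_MESSAGE", "connection timed out");

        System.out.println(errorMessageOf(globalMap, "tFindRegexlibExpressions_1"));
    }
}
```

Because ERROR_MESSAGE is an After variable, a lookup like this is only meaningful in a component that runs after tFindRegexlibExpressions has finished.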

Usage

This component is a start component. It requires an output flow,
usually to a csv file. You can later import all collected expressions
from a well-formatted csv file into Talend Studio.

For more information about importing patterns, see the Talend Studio
User Guide.

Limitation

n/a

Scenario: Connecting to a web service and returning a list of regular
expressions

This scenario is a three-component Java Job created in Talend Studio.

This scenario:

  • uses the tFindRegexlibExpressions component to
    connect to a web server and collects all regular expressions that have the word
    “email” in their description field,

  • uses the tMap component to reorganize the
    incoming data in the output flow and also to concatenate the two fields from the
    incoming data flow in one output column,

  • and finally writes all collected expressions to a csv file.

This Job can also be generated automatically from the Patterns > Regex
node in the DQ Repository tree view. For further information about how to
generate a Job to retrieve regular expressions, see the Talend Studio
User Guide.

Configuring the tFindRegexlibExpressions component

Use_Case_tFindRegexlibExpressions.png
  1. Drop the following components from the Palette onto the design workspace: tFindRegexlibExpressions, tMap, and tFileOutputDelimited.

  2. Double-click the tFindRegexlibExpressions component to open its Basic settings view and define its
    properties.

    Use_Case_tFindRegexlibExpressions1.png

    The schema of this component is read-only and it contains the following
    fields: Title, Expression, Description, Matches, Non-Matches, Author,
    Rating and Relative_path.

  3. In the Regexp Substring field, define a
    regular expression substring you want to use as a filter on the regular
    expression list.

  4. In the Key Words field, define the key
    word(s) you want to use as a filter on the regular expression list.

  5. In the Min Rate field, define the minimum rating a regular
    expression must have to be kept in the regular expression list.

  6. In the Relative path field, type in the relative path pointing to the
    folder to be created in the Patterns > Regex node of the DQ Repository
    tree view for the retrieved patterns. In this example, this folder is
    email.

    In this scenario, we want tFindRegexlibExpressions to collect all regular
    expressions on the web server that have the word “email” in their
    Description field and whose rating is at least 1.

  7. Connect tFindRegexlibExpressions and
    tMap using a Main row link.

Configuring the tMap component

  1. Double-click the tMap component to open
    the Map Editor and perform the necessary field
    reorganization and concatenation.

    Use_Case_tFindRegexlibExpressions2.png
  2. In the Map Editor, click the plus button
    in the upper-right corner to open a dialog box where you can give a name to
    the new output table, regex in this scenario.

    This creates a new link with the same name in the tMap
    component, which you can use to connect tMap to the next component.

  3. In the lower-right corner of the Map Editor, click the plus button to
    define the fields in the regex output table.

  4. In the upper half of the Map Editor, drop fields from the input table to
    fill the fields of the output schema as necessary. For more information
    regarding data mapping, see the Talend Studio User Guide.

    In this scenario, we want to concatenate the Matches and Non-Matches
    fields from the incoming data flow into one output column: Purpose. We
    also want a new column in the output schema called Path. Finally, we do
    not want any rating-related information in the output schema.

  5. Click OK to validate and close the Map
    Editor.

  6. Right-click tMap and select the regex link to connect tMap to tFileOutputDelimited.
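
The mapping described in step 4 can be pictured as a function from one input row to one regex output row. The following Java sketch is only an approximation of what the tMap expressions achieve: the helper toRegexRow, the separator between Matches and Non-Matches, the set of columns carried over unchanged, and the use of Relative_path as the source of Path are assumptions, since the steps above do not specify them.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

public class RegexMappingSketch {

    /** Replaces a missing value with an empty string so concatenation never yields "null". */
    static String nullToEmpty(String value) {
        return value == null ? "" : value;
    }

    /**
     * Builds the regex output row: Matches and Non-Matches are merged into
     * Purpose, Path is added, and Rating is left out.
     */
    static Map<String, String> toRegexRow(Map<String, String> in, String separator) {
        Map<String, String> out = new LinkedHashMap<>();
        out.put("Title", in.get("Title"));
        out.put("Expression", in.get("Expression"));
        out.put("Description", in.get("Description"));
        out.put("Purpose",
                nullToEmpty(in.get("Matches")) + separator + nullToEmpty(in.get("Non-Matches")));
        out.put("Author", in.get("Author"));
        // Assumption: Path is fed from the Relative_path input column.
        out.put("Path", in.get("Relative_path"));
        return out;
    }

    public static void main(String[] args) {
        // Invented sample row, not actual regexlib.com content.
        Map<String, String> in = new HashMap<>();
        in.put("Title", "Email check");
        in.put("Expression", "^\\S+@\\S+$");
        in.put("Description", "Simple email validator");
        in.put("Matches", "a@b.com");
        in.put("Non-Matches", "ab.com");
        in.put("Author", "someone");
        in.put("Rating", "3");
        in.put("Relative_path", "email");

        System.out.println(toRegexRow(in, " | "));
    }
}
```

In the Map Editor itself, the same concatenation would be written as a Java expression on the Purpose column rather than as a separate method.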

Configuring the output component

  1. Double-click tFileOutputDelimited to
    display its Basic settings and define its
    properties.

    Use_Case_tFindRegexlibExpressions3.png
  2. Click the three-dot button next to the File Name
    field to browse to the file where you want to write the
    output data.

  3. Define the row and field separators in the corresponding fields.

  4. Select the Append check box if you want
    to add the new rows at the end of the records.

  5. Select the Include Header check box to
    include column headers in the output data.

  6. If needed, click Edit schema to view the
    input and output data flows.
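
To make the effect of the separator, Append and Include Header settings concrete, here is a small Java sketch that writes rows to a delimited file. It is a simplified stand-in, not the code that tFileOutputDelimited generates: the write helper is hypothetical, values are not escaped, and the choice not to repeat the header when appending to a non-empty file is an assumption.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;
import java.util.List;

public class DelimitedOutputSketch {

    /** Writes rows joined by fieldSeparator, each terminated by rowSeparator. */
    static void write(Path file, List<String> header, List<List<String>> rows,
                      String fieldSeparator, String rowSeparator,
                      boolean append, boolean includeHeader) throws IOException {
        boolean fileHasContent = Files.exists(file) && Files.size(file) > 0;
        StringBuilder sb = new StringBuilder();
        // Assumption: the header is skipped when appending to a non-empty file.
        if (includeHeader && !(append && fileHasContent)) {
            sb.append(String.join(fieldSeparator, header)).append(rowSeparator);
        }
        for (List<String> row : rows) {
            sb.append(String.join(fieldSeparator, row)).append(rowSeparator);
        }
        byte[] bytes = sb.toString().getBytes(StandardCharsets.UTF_8);
        if (append) {
            Files.write(file, bytes, StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        } else {
            // With no options, Files.write creates the file or truncates an existing one.
            Files.write(file, bytes);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("regex", ".csv");
        List<String> header = Arrays.asList("Title", "Expression");
        // Invented sample rows.
        List<List<String>> first = Arrays.asList(Arrays.asList("Email check", "^\\S+@\\S+$"));
        List<List<String>> second = Arrays.asList(Arrays.asList("Loose email", "^.+@.+$"));

        write(file, header, first, ";", "\n", false, true);
        write(file, header, second, ";", "\n", true, true);

        System.out.print(new String(Files.readAllBytes(file), StandardCharsets.UTF_8));
        Files.delete(file);
    }
}
```

Regular expressions often contain the characters commonly used as separators, so a real output would need a field separator and escaping policy that cannot collide with the Expression column.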

Job execution

Save your Job and press F6 to execute it.

tFindRegexlibExpressions connects to the web
server and collects all regular expressions that match the request. tMap
performs the defined field reorganization and concatenation and passes the
output flow to tFileOutputDelimited. The output file will look something like
the following:

Use_Case_tFindRegexlibExpressions4.png

You can later import all collected regular expressions from a well-formatted
csv file into Talend Studio. For more information about importing patterns,
see the Talend Studio User Guide.


Document retrieved from Talend: https://help.talend.com