July 30, 2023

tFindRegexlibExpressions – Docs for ESB 7.x

tFindRegexlibExpressions

Returns a dataset holding information about all of the regular expressions that
match the request sent to the web server.

tFindRegexlibExpressions connects to
a web service at http://regexlib.com to get a list of regular expressions for all languages, even those that are not
supported by Talend.
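
Because the returned list can contain expressions written for other regex engines, it can be useful to check which ones the Java engine accepts before relying on them. The following is a minimal sketch of such a check. It assumes that "compiles with java.util.regex" is a reasonable first test; the class and method names are hypothetical and are not part of the component.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class RegexCompatibilityCheck {

    /** Returns true if the expression compiles with java.util.regex. */
    static boolean compilesInJava(String expression) {
        try {
            Pattern.compile(expression);
            return true;
        } catch (PatternSyntaxException e) {
            // The syntax is not accepted by the Java regex engine.
            return false;
        }
    }

    public static void main(String[] args) {
        // A plain five-digit pattern.
        System.out.println(compilesInJava("^\\d{5}$"));
        // A Python-style named group, which uses non-Java syntax.
        System.out.println(compilesInJava("(?P<year>\\d{4})"));
    }
}
```

A pattern that compiles may still behave differently from what its author intended in another language, so this check only filters out syntax the Java engine rejects.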

tFindRegexlibExpressions Standard properties

These properties are used to configure tFindRegexlibExpressions running in the Standard Job framework.

The Standard tFindRegexlibExpressions component belongs to the Data Quality family.

This component is available in Talend Data Management Platform, Talend Big Data Platform, Talend Real Time Big Data Platform, Talend Data Services Platform, Talend MDM Platform and Talend Data Fabric.

Basic settings

Schema and Edit Schema

These fields are read-only. The schema of this component contains
the following fields: Title, Expression, Description,
Matches, Non-Matches, Author, Rating.

Regexp Substring

Define a regular expression substring you want to use as a filter
on the regular expression list.

Key Words

Enter the key word(s) you want to use as a filter on the regular
expression list. Key words are separated by commas.

Min Rate

Define a regular expression rating you want to use as a filter on
the regular expression list.

Relative path

Type in the relative path pointing to the pattern folder you need
to create under the Patterns > Regex node in the DQ Repository
tree view for keeping the retrieved patterns. For example, if you
need to create a folder called phone with a sub-folder uk for the
phone patterns used in the U.K., type in "phone/uk" in this
Relative path field.

The pattern folder is actually created in the DQ Repository only
when you import into it the retrieved regular expressions that
have been stored in a .csv file. For further information about how
to import regular expressions from a .csv file, see the
Talend Studio User Guide.

Advanced settings

tStatCatcher Statistics

Select this check box to collect log data at the component
level.

Global Variables


ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable and it returns a string. This
variable functions only if the Die on error check box is
cleared, if the component has this check box.

A Flow variable functions during the execution of a component while an After variable
functions after the execution of the component.

To fill a field or expression with a variable, press Ctrl + Space
to access the variable list and choose the variable to use from it.

For further information about variables, see the
Talend Studio User Guide.
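
As an illustration of reading the After variable, the sketch below stands in for the Job's globalMap with a plain java.util.Map. The key "tFindRegexlibExpressions_1_ERROR_MESSAGE" assumes the component instance is named tFindRegexlibExpressions_1 and that the variable is stored under the component name followed by the variable name; the class and helper names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

class ErrorMessageLookup {

    // Assumed key: <component instance name> + "_ERROR_MESSAGE".
    static final String KEY = "tFindRegexlibExpressions_1_ERROR_MESSAGE";

    /** Returns the stored error message, or the fallback if none was set. */
    static String errorMessageOrDefault(Map<String, Object> globalMap, String fallback) {
        Object value = globalMap.get(KEY);
        return value == null ? fallback : (String) value;
    }

    public static void main(String[] args) {
        // Stand-in for the map a Job exposes to its components.
        Map<String, Object> globalMap = new HashMap<>();
        System.out.println(errorMessageOrDefault(globalMap, "no error"));

        // Made-up message, only to show the lookup after a failure.
        globalMap.put(KEY, "Connection refused");
        System.out.println(errorMessageOrDefault(globalMap, "no error"));
    }
}
```

Inside a Job, the equivalent expression would typically be a cast of globalMap.get(...) with the same key. Since this is an After variable, it is only meaningful once the component has finished executing.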

Usage

Usage rule

This component is a start component. It requires an output flow,
usually written to a CSV file. You can later import all collected
expressions from a well-formatted CSV file into Talend Studio.

For more information about importing patterns, see the
Talend Studio User Guide.

Connecting to a web service and returning a list of regular expressions

This scenario applies only to Talend Data Management Platform, Talend Big Data Platform, Talend Real Time Big Data Platform, Talend Data Services Platform, Talend MDM Platform and Talend Data Fabric.

This scenario is a three-component Java Job created in Talend Studio.

This scenario:

  • uses the tFindRegexlibExpressions component to connect to a web server
    and collect all regular expressions that have the word “email” in their
    description field,

  • uses the tMap component to
    reorganize the incoming data in the output flow and to concatenate two
    fields from the incoming data flow into one output column,

  • and finally writes all collected expressions to a CSV file.

In this scenario, we want tFindRegexlibExpressions
to collect all regular expressions on the web server that have the word “email” in their
Description field and whose rating is at least 1.
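
The selection rule described here can be pictured as a simple predicate on each entry. The sketch below is only a local illustration of that rule: the actual filtering is done through the web service, the case-insensitive match is an assumption, the class and field names are hypothetical, and the sample entries are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ScenarioFilterSketch {

    /** Minimal stand-in for one returned row; only the fields the rule needs. */
    static final class Entry {
        final String title;
        final String description;
        final int rating;

        Entry(String title, String description, int rating) {
            this.title = title;
            this.description = description;
            this.rating = rating;
        }
    }

    /** Keeps entries whose description mentions the key word and whose rating is at least minRate. */
    static List<Entry> filter(List<Entry> entries, String keyWord, int minRate) {
        String needle = keyWord.toLowerCase();
        List<Entry> kept = new ArrayList<>();
        for (Entry e : entries) {
            // Case-insensitive containment is an assumption of this sketch.
            boolean mentionsKeyWord = e.description.toLowerCase().contains(needle);
            if (mentionsKeyWord && e.rating >= minRate) {
                kept.add(e);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Made-up sample entries, not real results from the web service.
        List<Entry> sample = Arrays.asList(
                new Entry("Simple email", "Matches a basic email address", 3),
                new Entry("Unrated email", "Another email pattern", 0),
                new Entry("US zip", "Matches a five-digit zip code", 4));
        for (Entry e : filter(sample, "email", 1)) {
            System.out.println(e.title);
        }
    }
}
```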

This Job can also be generated automatically from the Patterns > Regex node in the DQ Repository tree view. For further information about how to generate
a Job to retrieve regular expressions, see the
Talend Studio User Guide.

tFindRegexlibExpressions_1.png

Configuring the tFindRegexlibExpressions component

  1. Drop the following components from the Palette onto the design workspace: tFindRegexlibExpressions, tMap, and tFileOutputDelimited.
  2. Double-click the tFindRegexlibExpressions component to open its Basic settings view and define its
    properties.

    tFindRegexlibExpressions_2.png

    The schema of this component is read-only and it contains the following
    fields: Title, Expression, Description,
    Matches, Non-Matches, Author, Rating and
    Relative_path.

  3. In the Regexp Substring field, define a
    regular expression substring you want to use as a filter on the regular
    expression list.
  4. In the Key Words field, define the key
    word(s) you want to use as a filter on the regular expression list.
  5. In the Min Rate field, define a regular
    expression rating you want to use as a filter on the regular expression
    list.
  6. In the Relative path field, type in the relative path pointing to the
    folder to be created in the Patterns > Regex node of the DQ Repository
    tree view for the retrieved patterns. In this example, this folder is
    email.
  7. Connect tFindRegexlibExpressions and
    tMap using a Main row link.

Configuring the tMap component

  1. Double-click the tMap component to open
    the Map Editor and carry out the necessary field
    reorganization and concatenation.

    tFindRegexlibExpressions_3.png

  2. In the Map Editor, click the plus button
    in the upper-right corner to open a dialog box where you can give a name to
    the new output table, regex in this scenario.

    This creates a new output link in the tMap component
    with the same name, which you can use to connect
    tMap to the next component.
  3. In the lower-right corner of the Map Editor, click the plus button to
    define the fields in the regex output table.
  4. In the upper half of the Map Editor, drop fields from the input table to
    fill the fields of the output schema as necessary. For more information
    regarding data mapping, see the Talend Studio User Guide.

    In this scenario, we want to concatenate the Matches and
    Non-Matches fields from the incoming data flow into one output column,
    Purpose. We also want a new column in the output schema called
    Path. Finally, we do not want any rating-related information
    in the output schema.
  5. Click OK to validate and close the Map
    Editor.
  6. Right-click tMap and select the regex link to connect tMap to tFileOutputDelimited.
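
The mapping configured in the steps above can be sketched in plain Java as follows. This is an illustration under assumptions, not the code the Studio generates: the class and field names are hypothetical Java-safe spellings of the schema columns, the " | " separator is arbitrary, the exact column list of the regex output table is a guess, and filling Path from the relative path is also an assumption.

```java
class TMapSketch {

    /** Incoming row; fields mirror the read-only schema (Java-safe spellings assumed). */
    static final class InputRow {
        String title, expression, description, matches, nonMatches, author, relativePath;
        int rating;
    }

    /** Row of the "regex" output table; the column list is an assumption. */
    static final class RegexRow {
        String title, expression, description, purpose, author, path;
    }

    static RegexRow map(InputRow in) {
        RegexRow out = new RegexRow();
        out.title = in.title;
        out.expression = in.expression;
        out.description = in.description;
        // Concatenate Matches and Non-Matches into the single Purpose column.
        out.purpose = in.matches + " | " + in.nonMatches;
        out.author = in.author;
        // New Path column; taking its value from the relative path is an assumption.
        out.path = in.relativePath;
        // Rating is deliberately not copied to the output row.
        return out;
    }

    public static void main(String[] args) {
        // Made-up sample values.
        InputRow in = new InputRow();
        in.matches = "a@b.com";
        in.nonMatches = "a@b";
        in.relativePath = "email";
        in.rating = 3;
        RegexRow out = map(in);
        System.out.println(out.purpose + " -> " + out.path);
    }
}
```

In the Map Editor itself, the concatenation would be entered as a Java string expression on the Purpose column, using the input row's column names.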

Configuring the output component

  1. Double-click tFileOutputDelimited to
    display its Basic settings and define its
    properties.

    tFindRegexlibExpressions_4.png

  2. Click the three-dot button next to the File Name
    field to browse to the file where you want to write the
    output data.
  3. Define the row and field separators in the corresponding fields.
  4. Select the Append check box if you want
    to add the new rows after the records already in the file.
  5. Select the Include Header check box to
    include column headers in the output data.
  6. If needed, click Edit schema to view the
    input and output data flows.
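
To make the effect of these settings concrete, here is a minimal sketch of a delimited writer with configurable separators, an append flag and an optional header. It is not the component's implementation: the class and method names are hypothetical, values are not quoted or escaped, and skipping the header when appending to a non-empty file is an assumption of this sketch.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

class DelimitedWriterSketch {

    static void write(Path file, List<String> header, List<List<String>> rows,
                      String fieldSeparator, String rowSeparator,
                      boolean append, boolean includeHeader) throws IOException {
        boolean fileHasContent = Files.exists(file) && Files.size(file) > 0;
        StringBuilder sb = new StringBuilder();
        // Assumption: no second header when appending to a file that already has rows.
        if (includeHeader && !(append && fileHasContent)) {
            sb.append(String.join(fieldSeparator, header)).append(rowSeparator);
        }
        for (List<String> row : rows) {
            // No quoting or escaping is done; values must not contain the separators.
            sb.append(String.join(fieldSeparator, row)).append(rowSeparator);
        }
        byte[] bytes = sb.toString().getBytes(StandardCharsets.UTF_8);
        if (append) {
            Files.write(file, bytes, StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        } else {
            Files.write(file, bytes, StandardOpenOption.CREATE,
                    StandardOpenOption.TRUNCATE_EXISTING, StandardOpenOption.WRITE);
        }
    }
}
```

Calling write once with append set to false and then again with append set to true leaves one header line followed by the rows of both calls.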

Saving and executing the Job

  1. Press Ctrl + S to save
    the Job.
  2. Press F6 to run the
    Job.

tFindRegexlibExpressions
connects to the web server and collects all regular expressions that match the
request. tMap performs the defined field
reorganization and concatenation and passes the output flow to tFileOutputDelimited. The output file will look something
like the following:

tFindRegexlibExpressions_5.png

You can later import all collected regular expressions from a
well-formatted CSV file into Talend Studio. For more information
about importing patterns, see the Talend Studio User Guide.


Document retrieved from Talend: https://help.talend.com