July 30, 2023

tMultiPatternCheck – Docs for ESB 7.x

tMultiPatternCheck

Checks all existing data in multiple columns against a given Java regular
expression.

tMultiPatternCheck can give two output flows:
Matching Data and Non-Matching Data. The first
collects all data that match a given pattern, and the second collects all data that do not
match the pattern. You can then implement any required corrections.

tMultiPatternCheck Standard properties

These properties are used to configure tMultiPatternCheck running in the Standard Job framework.

The Standard
tMultiPatternCheck component belongs to the Data Quality family.

This component is available in Talend Data Management Platform, Talend Big Data Platform, Talend Real Time Big Data Platform, Talend Data Services Platform, Talend MDM Platform and Talend Data Fabric.

Basic settings

Schema and Edit schema

A schema is a row description, it defines the number of fields to be processed and
passed on to the next component. The schema is either Built-in or stored remotely in the Repository.

 

Built-in: You create the schema and store it
locally for this component only. Related topic: see
Talend Studio User Guide
.

 

Repository: You have already created the schema
and stored it in the Repository. You can reuse it in various projects and job designs.
Related topic: see
Talend Studio User Guide
.

Logical operator used to combine check conditions

In the case you want to combine the conditions you set on columns, select from
this list the combine mode you want to use.

Columns To Check

Set a regular expression for each of the analyzed columns.

  • Column: list of the analyzed columns.
  • Property: The pattern against which you want to check the data is either
    Built-in or stored remotely in the
    Repository.
  • Pattern Selection: You can click fields in this
    column only if you have selected Repository from the
    Property list. To select the pattern against which
    you want to check the column data from the Pattern
    Selector
    , click the […] button in the
    corresponding field. These patterns are retrieved from the DQ Repository of Talend Studio. The list includes the system and user-defined patterns.
  • Pattern Regex: You can enter your own customized
    regular expression if you have selected Built-in in
    the Property column.
  • Is Case sensitive: select the check boxes of the
    column name where you want to consider, when doing the pattern check, lower
    and upper cases.
  • Check: select the check boxes of the column(s) you
    want to check against the defined patterns.
  • Message: leave this column empty to have automatic
    messages about which pattern has invalidated the data row and caused it to
    be rejected. You can also enter your own personalized message to enrich the
    Job result with information about the patterns that cause the row to be
    rejected.

Advanced settings

tStatCatcher
Statistics

Select this check box to collect log data at the component level.

Global Variables

Global Variables

NB_LINE: the number of rows read by an input component or
transferred to an output component. This is an After variable and it returns an
integer.

NB_LINE_OK: the number of rows matching a given pattern.
This is an After variable and it returns an integer.

NB_LINE_REJECT: the number of rows not matching a given
pattern. This is an After variable and it returns an integer.

ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable and it returns a string. This
variable functions only if the Die on error check box is
cleared, if the component has this check box.

A Flow variable functions during the execution of a component while an After variable
functions after the execution of the component.

To fill up a field or expression with a variable, press Ctrl +
Space
to access the variable list and choose the variable to use from it.

For further information about variables, see
Talend Studio

User Guide.

Usage

Usage rule

This component is usually used as an intermediate component, and it requires an
input component and an output component.

Limitation

When exporting a Job using
tMultiPatternCheck with a user-defined pattern
stored in the DQ Repository, you must export the
pattern with the Job. Otherwise, the pattern will switch to Built-in
mode when importing the Job in another project.

When importing a Job using tMultiPatternCheck with a
system pattern, the pattern will switch to Built-in mode.

For more information about importing Jobs with
data profiling items in Talend Studio, see the
documentation on Talend Help Center (https://help.talend.com).

Checking the data in multiple columns against patterns

This scenario applies only to Talend Data Management Platform, Talend Big Data Platform, Talend Real Time Big Data Platform, Talend Data Services Platform, Talend MDM Platform and Talend Data Fabric.

This scenario describes a four-component Job that checks customers’ last and first names
and email against the relevant patterns. It lists data that matches the selected patterns and
data that does not.

The check results are written in two output files: the first for the values
that match the selected patterns and the second for the values that do not match the selected
patterns. Rejected data has a message to tell what pattern was not validated.

Setting up the Job

In this scenario, you have already stored the main input schema in the
Repository. For more information about storing schema metadata in the Repository, see
Talend Studio User Guide.

The main input table contains three columns: lname, fname and email. You want to check
the entries in these columns against patterns.

  1. In the Repository tree
    view, expand Metadata > DB Connections where you have stored the main input schema and drop the relevant
    file onto the design workspace.
  2. From the Components dialog box, select the
    tMysqlInput component and click
    OK to drop it onto the workspace.

    The input table used in this scenario is called
    customer. It holds several columns including the
    three columns against which we want to do a pattern check.

  3. Drop the following components from the Palette onto the design workspace: tMultiPatternCheck and two tLogRow.
  4. Connect the main input component to tMultiPatternCheck using a Row > Main connection.
  5. Connect tMultiPatternCheck to the two tLogRow components using the Matches and Non Matches
    connections.
tMultiPatternCheck_1.png

Configuring the components

  1. Double-click tMultiPatternCheck to display its Basic settings view and define its properties.

    tMultiPatternCheck_2.png

  2. Click Edit schema to
    open a dialog box. Here you can define the data you want to pass to the output
    components, and then click OK to close
    the dialog box.

    In this example you pass to the tMultiPatternCheck
    component all the main input columns.

    tMultiPatternCheck_3.png

  3. Click in the Property column and select
    REPOSITORY from the list.

    In this example, you want to retrieve patterns from the DQ
    Repository
    of Talend Studio.

  4. Click in the Pattern
    Selection
    column and select from the Pattern
    Selector
    the patterns against which you want to check the data in
    the columns.

    In this example, you want to check if customer first and last names start
    with upper case and if emails are valid addresses.

  5. Select from the Pattern Selector the
    Starts with uppercase pattern for the first and last
    names and the Email Address for the customer email.
  6. In the Is Case Sensitive
    column, select the check boxes next to the column name where you want to
    consider, when doing the pattern check, the lower and upper cases.
  7. In the Check column,
    select the check boxes next to the column names you want to check against the
    defined patterns, all columns in this example.
  8. Leave the Message column
    empty if you want to have the automatic message about what pattern is not
    validated. Otherwise, set your own message.
  9. Configure the output components:

    1. Double-click the first tLogRow component to
      display its Basic settings view and define its
      properties.

      tMultiPatternCheck_4.png

    2. In the Mode area, select the
      Table option to print results in a
      table.
    3. Do the same for the second tLogRow
      component.

Executing the Job

Save your Job and press F6 to execute it.

Two output tables are written on the console. The first table lists the data entries
in the three defined columns that match the selected patterns. The second table
lists non match entries in the three columns according to the used patterns.

The REGEX_INVALIDITY_MESSAGE column in the second table provides the name of the
patterns that were not validated and because of which the rows were rejected.

The figure below illustrates extractions of the two output tables:

tMultiPatternCheck_5.png


Document get from Talend https://help.talend.com
Thank you for watching.
Subscribe
Notify of
guest
0 Comments
Inline Feedbacks
View all comments
0
Would love your thoughts, please comment.x
()
x