August 17, 2023

tMultiPatternCheck – Docs for ESB 5.x

tMultiPatternCheck

tPatternCheck_icon32_white.png

Warning

This component will be available in the Palette of
Talend Studio on the condition that you have subscribed to one of
the Talend Platform products.

tMultiPatternCheck properties

Component family

Data Quality

 

Function

tMultiPatternCheck checks all existing data in
multiple columns against a given Java regular expression.

Purpose

tMultiPatternCheck can give two output flows:
Matching Data and Non-Matching Data. The first collects all data that match a given
pattern, and the second collects all data that do not match the pattern. You can
then implement any required corrections.

Basic settings

Schema and Edit schema

A schema is a row description, it defines the number of fields to be processed
and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.

Since version 5.6, both the Built-In mode and the Repository mode are
available in any of the Talend solutions.

 

 

Built-in: You create the schema and store it
locally for this component only. Related topic: see Talend Studio User
Guide
.

 

 

Repository: You have already created the schema
and stored it in the Repository. You can reuse it in various projects and job
designs. Related topic: see Talend Studio User
Guide
.

 

Logical operator used to combine check conditions

In the case you want to combine the conditions you set on columns, select from
this list the combine mode you want to use.

 

Columns to check

Set a regular expression for each of the analyzed columns.

Column: list of the analyzed columns.

Check pattern: Select from the list the
pattern against which you want to check the column data.

These patterns are retrieved from the DQ
Repository
of your studio. The list includes the system and
user-defined patterns.

If you want to customize the data quality pattern against which to check the
column, select Custom from the pattern list.

Custom Pattern: enter your own customized
regular expression if you have selected Custom in
the Check Pattern column.

Is Case sensitive: select the check boxes of
the column name where you want to consider, when doing the pattern check, lower and
upper cases.

Check: select the check boxes of the column(s)
you want to check against the defined patterns.

Message: leave this column empty to have
automatic messages about which pattern has invalidated the data row and caused it to
be rejected.

You can also enter your own personalized message to enrich the Job result with
information about the patterns that cause the row to be rejected.

Advanced settings

tStatCatcher
Statistics

Select this check box to collect log data at the component level.

Global Variables

NB_LINE: the number of rows read by an input component or
transferred to an output component. This is an After variable and it returns an
integer.

NB_LINE_OK: the number of rows matching a given pattern.
This is an After variable and it returns an integer.

NB_LINE_REJECT: the number of rows not matching a given
pattern. This is an After variable and it returns an integer.

ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable and it returns a string. This
variable functions only if the Die on error check box is
cleared, if the component has this check box.

A Flow variable functions during the execution of a component while an After variable
functions after the execution of the component.

To fill up a field or expression with a variable, press Ctrl +
Space
to access the variable list and choose the variable to use from it.

For further information about variables, see Talend Studio
User Guide.

Usage

This component is an intermediary step. It requires an input flow as well as an
output.

Limitation

n/a

Scenario: Checking the data in multiple columns against patterns

This scenario describes a four-component Job that checks customers’ last and first names
and email against the relevant patterns. It lists data that matches the selected patterns and
data that does not.

The check results are written in two output files: the first for the values that match the
selected patterns and the second for the values that do not match the selected patterns.
Rejected data has a message to tell what pattern was not validated.

Setting up the Job

In this scenario, we have already stored the main input schema in the Repository. For
more information about storing schema metadata in the Repository, see Talend Studio User
Guide
.

The main input table contains three columns: lname,
fname and email. We want to check the entries in
these columns against patterns.

  1. In the Repository tree view, expand MetadataDB Connections
    where you have stored the main input schema and drop the relevant file onto the design
    workspace.

    The [Components] dialog box is displayed.

    Use_Case_tRecordMatching.png
  2. Select the tMysqlInput component, and click
    OK to drop it onto the workspace.

    The input table used in this scenario is called customer. It
    holds several columns including the three columns against which we want to do a pattern
    check.

  3. Drop the following components from the Palette onto
    the design workspace: tMultiPatternCheck and two
    tLogRow.

    Use_Case_tMultiPatternCheck.png
  4. Connect the main input component to tMultiPatternCheck using a Main > Row
    link.

  5. Connect tMultiPatternCheck to the two tLogrow components using the Matches, and Non Matches links.

Configuring the components

  1. Double-click tMultiPatternCheck to display its
    Basic settings view and define its properties.

    Use_Case_tMultiPatternCheck2.png
  2. Click Edit schema to open a dialog box. Here you
    can define the data you want to pass to the output components, and then click OK to close the dialog box.

    Use_Case_tMultiPatternCheck3.png

    In this example we want to pass to the tMultiPatternCheck component all the columns in the main input
    columns.

  3. Click in the Check Pattern column and select from
    the list the patterns against which you want to check the data in the columns.

    In this example, you want to check if customer first and last names start with upper
    case and if emails are valid addresses.

  4. Select from the pattern list the Starts with uppercase pattern
    for the first and last names and the Email Address for the customer
    email.

    The patterns in this list are retrieved from the DQ
    Repository
    of your studio. The list includes the system and user-defined
    patterns.

  5. In the Is Case Sensitive column, select the check
    boxes next to the column name where you want to consider, when doing the pattern check,
    the lower and upper cases.

  6. In the Check column, select the check boxes next to
    the column names you want to check against the defined patterns, all columns in this
    example.

  7. Leave the Message column empty if you want to have
    the automatic message about what pattern is not validated. Otherwise, set your own
    message.

  8. Double-click the first tLogRow component to display
    its Basic settings view and define its
    properties.

    Use_Case_tMultiPatternCheck4.png
  9. In the Mode area, select the Table option to print results in a table.

    Do the same for the second tLogRow
    component.

Executing the Job

  • Save your Job and press F6 to execute it.

    Two output tables are written on the console. The first table lists the data entries
    in the three defined columns that match the selected patterns. The second table lists
    non match entries in the three columns according to the used patterns.

    The REGEX_INVALIDITY_MESSAGE column in the second
    table provides the name of the patterns that were not validated and because of which the
    rows were rejected.

    The figure below illustrates extractions of the two output tables.

    use_case-tmultipatterncheck5and6.png

Document get from Talend https://help.talend.com
Thank you for watching.
Subscribe
Notify of
guest
0 Comments
Inline Feedbacks
View all comments
0
Would love your thoughts, please comment.x
()
x