Warning
This component is available in the Palette of Talend Studio only if you have subscribed to one of the Talend Platform products.
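Component family | Data Quality
Function | tMultiPatternCheck checks all existing data in multiple columns against the patterns you define.
Purpose | tMultiPatternCheck can give two output flows: one for the data that matches the defined patterns and one for the data that does not match them.
Basic settings | Schema and Edit schema | A schema is a row description. It defines the number of fields to be processed and passed on to the next component. Since version 5.6, both the Built-In mode and the Repository mode are available in any of the Talend solutions.
| | Built-in: You create the schema and store it locally for this component only.
| | Repository: You have already created the schema and stored it in the Repository.
| Logical operator used to combine check conditions | If you want to combine the conditions you set on columns, select the logical operator to use from the list.
| Columns to check | Set a regular expression for each of the analyzed columns:
| | –Column: list of the analyzed columns.
| | –Check pattern: select from the list the pattern against which to check the column data. These patterns are retrieved from the DQ Repository of your studio and include system and user-defined patterns. If you want to customize the data quality pattern against which to check the column, use the Custom Pattern column.
| | –Custom Pattern: enter your own customized regular expression.
| | –Is Case sensitive: select the check boxes of the columns for which the pattern check should be case sensitive.
| | –Check: select the check boxes of the column(s) you want to check against the defined patterns.
| | –Message: leave this column empty to have an automatic message indicating which pattern was not validated. You can also enter your own personalized message to enrich the Job result.
Advanced settings | tStatCatcher Statistics | Select this check box to collect log data at the component level.
Global Variables | NB_LINE: the number of rows read by an input component or transferred to an output component.
| NB_LINE_OK: the number of rows matching a given pattern.
| NB_LINE_REJECT: the number of rows not matching a given pattern.
| ERROR_MESSAGE: the error message generated by the component when an error occurs.
| A Flow variable functions during the execution of a component, while an After variable functions after the execution of the component.
| To fill up a field or expression with a variable, press Ctrl + Space to access the variable list and choose the variable to use from it.
| For further information about variables, see Talend Studio User Guide.
Usage | This component is an intermediary step. It requires an input flow as well as an output flow.
Limitation | n/a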
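The following Java sketch shows one way the Columns to check settings and the logical operator could combine into a single decision for each row. It is an illustration under stated assumptions, not the code Talend generates:

- The ColumnRule record, the Operator enum (with AND and OR values) and the rowMatches method are hypothetical.
- The whole value must match the regular expression.
- Clearing Is Case sensitive is taken to correspond to Pattern.CASE_INSENSITIVE.
- Columns whose Check box is cleared are skipped.

```java
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical sketch, not Talend-generated code: evaluates one row against
// per-column rules and combines the results with a logical operator.
class MultiPatternSketch {

    // Assumed operator values; the component's list may differ.
    enum Operator { AND, OR }

    // One line of a hypothetical "Columns to check" table.
    record ColumnRule(String column, String regex, boolean caseSensitive, boolean check) {
        boolean matches(String value) {
            int flags = caseSensitive ? 0 : Pattern.CASE_INSENSITIVE;
            // Assumption: a null value never matches, and the whole value must match.
            return value != null && Pattern.compile(regex, flags).matcher(value).matches();
        }
    }

    static boolean rowMatches(Map<String, String> row, List<ColumnRule> rules, Operator op) {
        for (ColumnRule rule : rules) {
            if (!rule.check()) {
                continue; // assumption: unchecked columns are ignored
            }
            boolean ok = rule.matches(row.get(rule.column()));
            if (op == Operator.AND && !ok) {
                return false; // under AND, one failing column rejects the row
            }
            if (op == Operator.OR && ok) {
                return true; // under OR, one passing column accepts the row
            }
        }
        // AND: every checked rule passed (vacuously true if none was checked).
        // OR: no checked rule passed.
        return op == Operator.AND;
    }

    public static void main(String[] args) {
        List<ColumnRule> rules = List.of(
                new ColumnRule("fname", "[A-Z].*", true, true),
                new ColumnRule("lname", "[A-Z].*", true, true));
        Map<String, String> row = Map.of("fname", "Ada", "lname", "lovelace");
        System.out.println(rowMatches(row, rules, Operator.AND)); // false: lname fails
        System.out.println(rowMatches(row, rules, Operator.OR));  // true: fname passes
    }
}
```

With AND, a single failing column is enough to send the row to the non-matching flow. With OR, a single passing column is enough to accept it. Making the lname rule case-insensitive would let "lovelace" pass "[A-Z].*" as well.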
This scenario describes a four-component Job that checks customers’ last names, first names and email addresses against the relevant patterns. It lists the data that matches the selected patterns and the data that does not.
The check results are written to two output flows, which are printed on the console. The first holds the values that match the selected patterns and the second holds the values that do not match them. Each rejected row carries a message indicating which pattern was not validated.
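The split into two output flows, and the NB_LINE, NB_LINE_OK and NB_LINE_REJECT global variables, can be pictured with the following Java sketch. It is a hedged illustration only:

- The MatchRouter class, its counter fields and the email rule in main are hypothetical.
- The sketch assumes that every row read lands in exactly one of the two flows.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical sketch: routes rows to a "Matches" or a "Non Matches" flow and
// keeps counters named after the component's global variables.
class MatchRouter {
    final List<Map<String, String>> matches = new ArrayList<>();
    final List<Map<String, String>> nonMatches = new ArrayList<>();
    int nbLine;       // analogue of NB_LINE
    int nbLineOk;     // analogue of NB_LINE_OK
    int nbLineReject; // analogue of NB_LINE_REJECT

    void route(List<Map<String, String>> rows, Predicate<Map<String, String>> rowMatches) {
        for (Map<String, String> row : rows) {
            nbLine++;
            if (rowMatches.test(row)) {
                matches.add(row);
                nbLineOk++;
            } else {
                nonMatches.add(row);
                nbLineReject++;
            }
        }
    }

    public static void main(String[] args) {
        // Hypothetical rule: exactly one '@' with text on both sides.
        Predicate<Map<String, String>> emailOk = row -> {
            String email = row.get("email");
            return email != null && email.matches("[^@]+@[^@]+");
        };
        MatchRouter router = new MatchRouter();
        router.route(List.of(
                Map.of("email", "ada@example.com"),
                Map.of("email", "not-an-email")), emailOk);
        System.out.printf("NB_LINE=%d NB_LINE_OK=%d NB_LINE_REJECT=%d%n",
                router.nbLine, router.nbLineOk, router.nbLineReject);
        // prints NB_LINE=2 NB_LINE_OK=1 NB_LINE_REJECT=1
    }
}
```

Under this sketch's assumption, NB_LINE always equals NB_LINE_OK plus NB_LINE_REJECT.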
In this scenario, we have already stored the main input schema in the Repository. For
more information about storing schema metadata in the Repository, see Talend Studio User
Guide.
Among its columns, the main input table contains lname, fname and email. We want to check the entries in these three columns against patterns.
- In the Repository tree view, expand Metadata – DB Connections, where you have stored the main input schema, and drop the relevant table schema onto the design workspace. The [Components] dialog box is displayed.
- Select the tMysqlInput component and click OK to drop it onto the workspace. The input table used in this scenario is called customer. It holds several columns, including the three columns we want to check against patterns.
- Drop the following components from the Palette onto the design workspace: tMultiPatternCheck and two tLogRow components.
- Connect the main input component to tMultiPatternCheck using a Main > Row link.
- Connect tMultiPatternCheck to the two tLogRow components using the Matches and Non Matches links.
- Double-click tMultiPatternCheck to display its Basic settings view and define its properties.
- Click Edit schema to open a dialog box. Here you can define the data you want to pass to the output components. Then click OK to close the dialog box. In this example, we want tMultiPatternCheck to pass on all the columns of the main input.
- Click in the Check Pattern column and select from the list the patterns against which you want to check the data in the columns. In this example, we want to check whether customer first and last names start with an uppercase letter and whether emails are valid addresses.
- Select the Starts with uppercase pattern for the first and last names and the Email Address pattern for the customer email. The patterns in this list are retrieved from the DQ Repository of your studio. The list includes the system and user-defined patterns.
- In the Is Case Sensitive column, select the check boxes next to the columns for which the pattern check should distinguish between lowercase and uppercase.
- In the Check column, select the check boxes next to the columns you want to check against the defined patterns. In this example, select all columns.
- Leave the Message column empty to get the automatic message indicating which pattern is not validated. Otherwise, enter your own message.
- Double-click the first tLogRow component to display its Basic settings view and define its properties.
- In the Mode area, select the Table option to print the results in a table. Do the same for the second tLogRow component.
- Save your Job and press F6 to execute it.
Two output tables are written to the console. The first table lists the entries in the three defined columns that match the selected patterns. The second table lists the entries in these columns that do not match the patterns. The REGEX_INVALIDITY_MESSAGE column in the second table gives the names of the patterns that were not validated and that caused the rows to be rejected. The figure below shows extracts of the two output tables.
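The scenario's checks can be approximated with the following Java sketch, which builds a REGEX_INVALIDITY_MESSAGE-style value for each row. It rests on several assumptions:

- The two regular expressions are simplified stand-ins. They are not the Starts with uppercase and Email Address definitions stored in the DQ Repository.
- The message format is hypothetical: column name, colon, pattern name, with entries separated by semicolons. The component's actual message text may differ.
- The patterns are compiled without Pattern.CASE_INSENSITIVE, so the checks are case sensitive.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical sketch of the scenario's checks. The regular expressions are
// simplified stand-ins, NOT the definitions stored in the DQ Repository.
class ScenarioSketch {
    record Check(String column, String patternName, Pattern pattern) {}

    static final Pattern STARTS_WITH_UPPERCASE = Pattern.compile("[A-Z].*");
    static final Pattern EMAIL_ADDRESS = Pattern.compile("[^@\\s]+@[^@\\s]+\\.[^@\\s]+");

    static final List<Check> CHECKS = List.of(
            new Check("lname", "Starts with uppercase", STARTS_WITH_UPPERCASE),
            new Check("fname", "Starts with uppercase", STARTS_WITH_UPPERCASE),
            new Check("email", "Email Address", EMAIL_ADDRESS));

    // Returns the failed checks in a hypothetical "column: pattern; ..." format,
    // or an empty string when every checked column matches.
    static String invalidityMessage(Map<String, String> row) {
        List<String> failures = new ArrayList<>();
        for (Check c : CHECKS) {
            String value = row.get(c.column());
            if (value == null || !c.pattern().matcher(value).matches()) {
                failures.add(c.column() + ": " + c.patternName());
            }
        }
        return String.join("; ", failures);
    }

    public static void main(String[] args) {
        Map<String, String> good = Map.of("lname", "Smith", "fname", "Ann", "email", "ann.smith@example.com");
        Map<String, String> bad = Map.of("lname", "smith", "fname", "Ann", "email", "ann.smith@example");
        System.out.println("[" + invalidityMessage(good) + "]"); // prints []
        System.out.println(invalidityMessage(bad));
        // prints lname: Starts with uppercase; email: Email Address
    }
}
```

In this sketch, an empty message means the row would go to the matching flow. A non-empty message means it would go to the non-matching flow, with the message as the extra column.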