Warning
This component will be available in the Palette of
Talend Studio on the condition that you have subscribed to one of
the Talend Platform products.
Component |
Data Quality |
|
Function |
tPatternExtract extracts from a column |
|
Purpose |
tPatternExtract allows you to output all |
|
Basic |
Column to check |
Select the column you want to |
|
Schema and |
A schema is a row description, it defines
Since version 5.6, both the Built-In mode and the Repository mode are |
|
Pattern type |
Select from the list the pattern you want to |
|
|
Built-in: |
|
|
Repository: |
Advanced |
tStatCatcher |
Select this check box to collect log data at |
Global |
NB_LINE: the number of rows read by an input component or
NB_LINE_OK: the number of rows matching a given pattern.
NB_LINE_REJECT: the number of rows not matching a given
ERROR_MESSAGE: the error message generated by the
A Flow variable functions during the execution of a component while an After variable
To fill up a field or expression with a variable, press Ctrl +
For further information about variables, see Talend Studio |
|
Usage |
This component is an intermediary step. It |
|
Limitation |
n/a |
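In Talend Jobs, component variables such as NB_LINE_OK are typically read from the Job's globalMap under a key of the form `<component name>_<VARIABLE>`. The following Java sketch simulates that lookup with a plain HashMap. The component name tPatternExtract_1, the helper methods, and the counter values are illustrative assumptions, not output from a real Job.

```java
import java.util.HashMap;
import java.util.Map;

class GlobalVariablesSketch {
    // Builds the key under which a component's variable is assumed to be stored.
    static String key(String componentName, String variable) {
        return componentName + "_" + variable;
    }

    // Reads an integer counter, returning 0 when the variable has not been set.
    static int counter(Map<String, Object> globalMap, String componentName, String variable) {
        Object value = globalMap.get(key(componentName, variable));
        return value instanceof Integer ? (Integer) value : 0;
    }

    public static void main(String[] args) {
        // Stand-in for the Job's globalMap; the values are made up for illustration.
        Map<String, Object> globalMap = new HashMap<>();
        globalMap.put("tPatternExtract_1_NB_LINE", 10);
        globalMap.put("tPatternExtract_1_NB_LINE_OK", 7);
        globalMap.put("tPatternExtract_1_NB_LINE_REJECT", 3);

        int ok = counter(globalMap, "tPatternExtract_1", "NB_LINE_OK");
        int rejected = counter(globalMap, "tPatternExtract_1", "NB_LINE_REJECT");
        System.out.println("matching=" + ok + ", non-matching=" + rejected);
    }
}
```

Because these counters are filled in as the component runs, a lookup of this kind is normally only meaningful in a component that executes after tPatternExtract has finished.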
This scenario describes a four-component Job in which the tPatternExtract component
extracts only customers’ email addresses (those that match the Email
address pattern) from a delimited file that holds
different customer data. The Job then writes the extracted data to another
delimited file. A tFilterColumns component
is used to adapt the output schema.
- Drop the following components from the Palette to the design workspace: tFileInputDelimited, tPatternExtract, tFilterColumns, and tFileOutputDelimited.
- Connect the tFileInputDelimited component to the tPatternExtract component using a Row > Main connection.
- Connect the tPatternExtract component to the tFilterColumns component using the Row > Matching Data connection.
- Connect the tFilterColumns component to the tFileOutputDelimited component using a Row > Main connection.
- Double-click tFileInputDelimited to display its Basic settings view and define the component properties, including the input file name, the number of header rows to skip, and the schema.
  In this scenario, the delimited file holds names, email addresses and telephone numbers, all in a single column: Name_Telephone_Address. The following shows an extract of the input file.
  Therefore, define the input schema as follows:
- Double-click tPatternExtract to display its Basic settings view and define the component properties.
- From the Column to check list, select the column whose data you want to check against the defined pattern, Name_Telephone_Address in this example.
- In the Pattern type list, select the pattern to use for extracting data, /Regex/internet/Email Address in this example.
- In the Basic settings view of the tFilterColumns component, click the […] button next to Edit schema to open the [Schema] dialog box.
- Select the column of interest from the Input schema, and click the right arrow button to copy the column to the output schema. Then, click OK to close the dialog box.
- Double-click tFileOutputDelimited to display its Basic settings view and define the component properties.
- In the File Name field, specify the path to the file you want to write the output data to.
- Define the row and field separators in the corresponding fields as needed. In this example, we want to separate customers’ email addresses by semicolons.
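To make the tPatternExtract and tFilterColumns settings more concrete, here is a minimal Java sketch of the two operations they describe: pulling every email address out of a mixed single-column value, then keeping only the column of interest. The regular expression is a deliberately simplified assumption and is not the definition stored under /Regex/internet/Email Address. The class, the method names, the `id` column, and the sample row are likewise hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ExtractAndFilterSketch {
    // Simplified email pattern (an assumption; the Studio's Email Address pattern may differ).
    static final Pattern EMAIL =
            Pattern.compile("[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}");

    // Pattern extraction: returns every substring of the value that matches, in order.
    static List<String> extract(String value) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = EMAIL.matcher(value);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    // Column filtering: copies only the listed columns that exist in the row.
    static Map<String, String> keep(Map<String, String> row, List<String> columns) {
        Map<String, String> out = new LinkedHashMap<>();
        for (String column : columns) {
            if (row.containsKey(column)) {
                out.put(column, row.get(column));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical input row; "id" is an extra column invented for the example.
        Map<String, String> row = new LinkedHashMap<>();
        row.put("id", "1");
        row.put("Name_Telephone_Address", "Jane Doe 555-0100 jane.doe@example.com");

        for (String email : extract(row.get("Name_Telephone_Address"))) {
            Map<String, String> extracted = new LinkedHashMap<>(row);
            extracted.put("Name_Telephone_Address", email);
            System.out.println(keep(extracted, List.of("Name_Telephone_Address")));
        }
    }
}
```

The sketch uses `Matcher.find()` rather than `matches()` because the address is embedded in a longer value. A whole-string match would reject every row in this scenario.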
- Save your Job and press F6 to execute it.
  Customers’ email addresses are extracted from the selected column according to the defined Email pattern and written to the output file using semicolons as row separators. You can then, for example, send an email to all your customers in one go.
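The effect of using a semicolon as the row separator can be sketched in Java as follows. This is an illustration only. The addresses are made up, the temporary file stands in for whatever path is set in File Name, and the helper places the separator between rows, whereas the output component may also write a trailing separator.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

class SemicolonOutputSketch {
    // Joins one value per row using the given row separator.
    static String join(List<String> rows, String rowSeparator) {
        return String.join(rowSeparator, rows);
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical extracted addresses, one per output row.
        List<String> emails = List.of("jane.doe@example.com", "john.roe@example.org");

        // Write all rows to a temporary file, separated by semicolons.
        Path out = Files.createTempFile("emails", ".csv");
        Files.write(out, join(emails, ";").getBytes(StandardCharsets.UTF_8));

        // Read the file back to show the resulting recipient list.
        System.out.println(new String(Files.readAllBytes(out), StandardCharsets.UTF_8));
    }
}
```

With a semicolon between rows, the whole file reads as a single recipient list, which is what makes the "one go" mailing in this scenario practical.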