tPatternExtract

Warning

This component is available in the Palette of Talend Studio only if you have subscribed to one of the Talend Platform products.

tPatternExtract properties

Component family	Data Quality
Function	tPatternExtract extracts from a column all data strings that match a given Java regular expression.
Purpose	tPatternExtract allows you to output all data that match a given pattern. You can then perform any required operation on the extracted data.
Basic settings	Column to check	Select the column you want to analyze.
	Schema and Edit schema	A schema is a row description: it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository. Since version 5.6, both the Built-in mode and the Repository mode are available in any of the Talend solutions.
		Built-in: You create the schema and store it locally for this component only. Related topic: see Talend Studio User Guide.
		Repository: You have already created the schema and stored it in the Repository. You can reuse it in various projects and job designs. Related topic: see Talend Studio User Guide.
	Pattern type	Select from the list the pattern to check the data against. All data that match the selected pattern are extracted.
Advanced settings	tStatCatcher Statistics	Select this check box to collect log data at the component level.
Global Variables	NB_LINE: the number of rows read by an input component or transferred to an output component. This is an After variable and it returns an integer.
	NB_LINE_OK: the number of rows matching a given pattern. This is an After variable and it returns an integer.
	NB_LINE_REJECT: the number of rows not matching a given pattern. This is an After variable and it returns an integer.
	ERROR_MESSAGE: the error message generated by the component when an error occurs. This is an After variable and it returns a string. This variable functions only if the Die on error check box is cleared, if the component has this check box.
	A Flow variable functions during the execution of a component while an After variable functions after the execution of the component.
	To fill in a field or expression with a variable, press Ctrl + Space to access the variable list and choose the variable to use from it.
	For further information about variables, see Talend Studio User Guide.
Usage	This component is an intermediary step. It requires an input flow as well as an output.
Limitation	n/a
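
The function described above, extracting every string in a column value that matches a Java regular expression, can be illustrated with the standard java.util.regex API. This is only a minimal sketch and not the code that Talend Studio generates: the class name PatternExtractSketch, the method extractAll, the sample value, and the simplified email expression are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: hypothetical names, not Talend's generated code.
final class PatternExtractSketch {

    // Simplified email expression chosen for this example (an assumption,
    // not the pattern shipped with Talend Studio).
    static final Pattern EMAIL =
            Pattern.compile("[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}");

    // Returns every substring of value that matches pattern, in order of appearance.
    static List<String> extractAll(String value, Pattern pattern) {
        List<String> matches = new ArrayList<>();
        if (value == null) {
            return matches; // nothing to extract from an empty cell
        }
        Matcher matcher = pattern.matcher(value);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // Made-up column value mixing a name, a phone number and an email address.
        String cell = "Jane Doe 555-0100 jane.doe@example.com";
        System.out.println(extractAll(cell, EMAIL)); // prints [jane.doe@example.com]
    }
}
```

Using find() in a loop rather than matches() is what allows the pattern to be located inside a longer value instead of requiring the whole value to match.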

Scenario: Extracting only the data that corresponds to a defined pattern from a delimited file

This scenario describes a four-component Job where the tPatternExtract component is used to extract only customers’ email addresses (those that match the Email address pattern) from a delimited file that holds different customer data. The Job then writes the extracted data into another delimited file. A tFilterColumns component is used to adapt the output schema.

Setting up the Job

Drop the following components from the Palette to the design workspace: tFileInputDelimited, tPatternExtract, tFilterColumns, and tFileOutputDelimited.
Connect the tFileInputDelimited component to the tPatternExtract component using a Row > Main connection.
Connect the tPatternExtract component to the tFilterColumns component using the Row > Matching Data connection.
Connect the tFilterColumns component to the tFileOutputDelimited component using a Row > Main connection.

Configuring the components

Double-click tFileInputDelimited to display its Basic settings view and define the component properties, including the input file name, the number of header rows to skip, and the schema.

In this scenario, the delimited file holds names, email addresses and telephone numbers, all in a single column: Name_Telephone_Address. The following shows an extract of the input file.

Therefore, define the input schema as follows:
Double-click tPatternExtract to display its Basic settings view and define the component properties.
From the Column to check list, select the column whose data you want to check against the pattern, Name_Telephone_Address in this example.
In the Pattern type list, select the pattern that the extracted data must match, /Regex/internet/Email Address in this example.
In the Basic settings view of the tFilterColumns component, click the […] button next to Edit schema to open the [Schema] dialog box.
Select the column of interest from the Input schema, and click the right arrow button to copy the column to the output schema. Then, click OK to close the dialog box.
Double-click tFileOutputDelimited to display its Basic settings view and define the component properties.
In the File Name field, specify the path to the file you want to write the output data to.
Define the row and field separators in the corresponding fields, if needed. In this example, a semicolon is used as the row separator so that customers’ email addresses are separated by semicolons.
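
The configuration above can be pictured in plain Java as follows. This is a hedged sketch of what the Job is meant to produce, not the Job's real implementation: the class EmailJobSketch, the method extractEmails, the in-memory sample rows, and the simplified email expression are hypothetical, and the sketch assumes at most one email address per row.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: mimics the flow input rows -> pattern extraction -> output file.
final class EmailJobSketch {

    // Simplified email expression (an assumption, not Talend's built-in Email Address pattern).
    static final Pattern EMAIL =
            Pattern.compile("[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}");

    // Row separator chosen in the scenario for the output file.
    static final String ROW_SEPARATOR = ";";

    // Keeps the first email found in each row and joins the results with the row separator.
    // Rows without an email address are skipped, like rows that do not take the matching flow.
    static String extractEmails(List<String> rows) {
        List<String> emails = new ArrayList<>();
        for (String row : rows) {
            Matcher matcher = EMAIL.matcher(row);
            if (matcher.find()) {
                emails.add(matcher.group());
            }
        }
        return String.join(ROW_SEPARATOR, emails);
    }

    public static void main(String[] args) {
        // Made-up rows standing in for the single Name_Telephone_Address column.
        List<String> rows = List.of(
                "Jane Doe 555-0100 jane.doe@example.com",
                "John Roe 555-0101",
                "Ann Poe ann@example.org 555-0102");
        System.out.println(extractEmails(rows));
        // prints jane.doe@example.com;ann@example.org
    }
}
```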

Executing the Job

Save your Job and press F6 to execute it.

Customers’ email addresses are extracted from the selected column according to the defined Email pattern and written to the output file using semicolons as row separators. You can then, for example, send an email to all your customers in one go.
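
Once the Job has run, the After variables listed in the properties table (NB_LINE, NB_LINE_OK, NB_LINE_REJECT) report how many rows were read, matched, and not matched. The sketch below only illustrates how such counts relate to each other. The class RowCountSketch, the method countRows, the sample rows, and the simplified email expression are hypothetical, and the map is a stand-in for the example rather than Talend's own variable storage.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Illustrative only: derives counts analogous to the component's After variables.
final class RowCountSketch {

    // Counts the rows read, the rows containing a match, and the rows without one.
    static Map<String, Integer> countRows(List<String> rows, Pattern pattern) {
        int matched = 0;
        for (String row : rows) {
            if (pattern.matcher(row).find()) {
                matched++;
            }
        }
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("NB_LINE", rows.size());                  // rows read
        counts.put("NB_LINE_OK", matched);                   // rows matching the pattern
        counts.put("NB_LINE_REJECT", rows.size() - matched); // rows not matching
        return counts;
    }

    public static void main(String[] args) {
        // Simplified email expression and made-up rows (assumptions for the example).
        Pattern email = Pattern.compile("[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}");
        List<String> rows = List.of(
                "Jane Doe 555-0100 jane.doe@example.com",
                "John Roe 555-0101",
                "Ann Poe ann@example.org 555-0102");
        System.out.println(countRows(rows, email));
        // prints {NB_LINE=3, NB_LINE_OK=2, NB_LINE_REJECT=1}
    }
}
```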

Source: Talend documentation, https://help.talend.com
