tFilterRow
tFilterRow filters input rows by setting one or more conditions on the selected
columns.
Depending on the Talend
product you are using, this component can be used in one, some or all of the following
Job frameworks:
- Standard: see tFilterRow Standard properties. The component in this framework is available in all Talend products.
- MapReduce: see tFilterRow MapReduce properties (deprecated). The component in this framework is available in all subscription-based Talend products with Big Data and Talend Data Fabric.
- Spark Batch: see tFilterRow properties for Apache Spark Batch. The component in this framework is available in all subscription-based Talend products with Big Data and Talend Data Fabric.
- Spark Streaming: see tFilterRow properties for Apache Spark Streaming. This component is available in Talend Real Time Big Data Platform and Talend Data Fabric.
- Storm: see tFilterRow Storm properties (deprecated). This component is available in Talend Real Time Big Data Platform and Talend Data Fabric.
tFilterRow Standard properties
These properties are used to configure tFilterRow running in the Standard Job framework.
The Standard
tFilterRow component belongs to the Processing family.
The component in this framework is available in all Talend
products.
Basic settings
Schema and Edit schema |
A schema is a row description. It defines the number of fields (columns) to be processed and passed on to the next component. The schema of this component is built-in only. |
Logical operator used to combine conditions |
Select a logical operator to combine simple conditions and to combine the filter results of both modes if any advanced conditions are defined.
And: returns the boolean value of true if all defined conditions are true; otherwise false.
Or: returns the boolean value of true if at least one of the defined conditions is true; otherwise false. |
Conditions |
Click the plus button to add as many simple conditions as needed. Based on these conditions, rows are either accepted or rejected.
Input column: Select the column of the schema the condition is to be applied on.
Function: Select the function to apply to the input column.
Operator: Select the operator to bind the input column with the value.
Value: Type in the filtered value, between double quotes if needed. |
Use advanced mode |
Select this check box when the operations you want to perform cannot be carried out through the standard functions offered, for example, different logical operations in the same component. If multiple advanced conditions are defined, use a logical operator between the conditions: && (logical AND) or || (logical OR).
|
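Advanced-mode conditions are plain Java boolean expressions evaluated against the current row, which is exposed as input_row. The following standalone sketch (not Talend-generated code; the Row class is a hypothetical stand-in for the row structure the Studio generates, with fields named after schema columns) illustrates how && and || combine in one expression:

```java
// Minimal sketch of an advanced-mode condition evaluated outside the Studio.
// Row is a stand-in for the generated row structure; field names mirror schema columns.
public class AdvancedModeSketch {
    static class Row {
        String City;
        Integer Age;
        Row(String city, Integer age) { City = city; Age = age; }
    }

    // Equivalent of the advanced-mode expression:
    // input_row.Age > 10 && (input_row.City.equals("Chicago") || input_row.City.equals("New York"))
    static boolean accept(Row input_row) {
        return input_row.Age > 10
                && (input_row.City.equals("Chicago") || input_row.City.equals("New York"));
    }

    public static void main(String[] args) {
        System.out.println(accept(new Row("Chicago", 30)));  // true
        System.out.println(accept(new Row("Boston", 30)));   // false: city not matched
        System.out.println(accept(new Row("New York", 8)));  // false: age too low
    }
}
```

A row is accepted only when the whole expression evaluates to true.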
Advanced settings
tStatCatcher Statistics |
Select this check box to gather the Job processing metadata at the Job level as well as at each component level. |
Global Variables
Global Variables |
ERROR_MESSAGE: the error message generated by the component when an error occurs. This is an After variable and it returns a string.
NB_LINE: the number of rows read by an input component or transferred to an output component. This is an After variable and it returns an integer.
NB_LINE_OK: the number of rows matching the filter. This is an After variable and it returns an integer.
NB_LINE_REJECTED: the number of rows rejected. This is an After variable and it returns an integer.
A Flow variable functions during the execution of a component while an After variable functions after the execution of the component.
To fill up a field or expression with a variable, press Ctrl + Space to access the variable list and choose the variable to use from it.
For further information about variables, see Talend Studio User Guide. |
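These After variables are read from the Job's globalMap once the component has finished, typically in a tJava component. The sketch below simulates that pattern (the map is populated by hand here, and the component label tFilterRow_1 is an assumption; in a real Job the Studio populates globalMap and the key prefix matches your component's label):

```java
import java.util.HashMap;
import java.util.Map;

public class GlobalVariablesSketch {
    public static void main(String[] args) {
        // Stand-in for Talend's globalMap; in a real Job the Studio fills it in.
        Map<String, Object> globalMap = new HashMap<>();
        globalMap.put("tFilterRow_1_NB_LINE_OK", 7);
        globalMap.put("tFilterRow_1_NB_LINE_REJECTED", 3);

        // Typical retrieval pattern used after tFilterRow has executed:
        Integer ok = (Integer) globalMap.get("tFilterRow_1_NB_LINE_OK");
        Integer rejected = (Integer) globalMap.get("tFilterRow_1_NB_LINE_REJECTED");
        System.out.println("accepted: " + ok + ", rejected: " + rejected);
    }
}
```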
Usage
Usage rule |
This component is not startable (green background) and it requires an input component and an output component. |
Filtering a list of names using simple conditions
The following scenario shows a Job that uses simple conditions to filter a list of
records. This scenario will output two tables: the first will list all male persons with
a last name shorter than nine characters and aged between 10 and 80 years; the second
will list all rejected records, each with an error message in the same table
explaining why the record was rejected.
Dropping and linking components
- Drop tFixedFlowInput, tFilterRow and tLogRow from the Palette onto the design workspace.
- Connect the tFixedFlowInput to the tFilterRow, using a Row > Main link. Then, connect the tFilterRow to the tLogRow, using a Row > Filter link.
- Drop a second tLogRow from the Palette onto the design workspace and rename it as reject. Then, connect the tFilterRow to the reject, using a Row > Reject link.
- Label the components to better identify their roles in the Job.
Configuring the components
- Double-click tFixedFlowInput to display its Basic settings view and define its properties.
- Click the […] button next to Edit schema to define the schema for the input data. In this example, the schema is made of the following four columns: LastName (type String), Gender (type String), Age (type Integer) and City (type String). When done, click OK to validate the schema setting and close the dialog box. A new dialog box opens and asks you if you want to propagate the schema. Click Yes.
- Set the row and field separators in the corresponding fields if needed. In this example, use the default settings for both, namely the row separator is a carriage return and the field separator is a semicolon.
- Select the Use Inline Content (delimited file) option in the Mode area and type in the input data in the Content field. The input data used in this example is shown below:
Van Buren;M;73;Chicago
Adams;M;40;Albany
Jefferson;F;66;New York
Adams;M;9;Albany
Jefferson;M;30;Chicago
Carter;F;26;Chicago
Harrison;M;40;New York
Roosevelt;F;15;Chicago
Monroe;M;8;Boston
Arthur;M;20;Albany
Pierce;M;18;New York
Quincy;F;83;Albany
McKinley;M;70;Boston
Coolidge;M;4;Chicago
Monroe;M;60;Chicago
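With the default separators above, each carriage-return-delimited row is split into fields on the semicolon. A quick sketch of that parsing outside the Studio (using two sample rows from the data above):

```java
public class InlineContentSketch {
    public static void main(String[] args) {
        // Two sample rows from the inline content; rows separated by a line break,
        // fields separated by a semicolon, matching the defaults in the scenario.
        String content = "Van Buren;M;73;Chicago\nAdams;M;40;Albany";
        for (String row : content.split("\n")) {
            String[] fields = row.split(";");
            System.out.println("LastName=" + fields[0] + ", Gender=" + fields[1]
                    + ", Age=" + Integer.parseInt(fields[2]) + ", City=" + fields[3]);
        }
    }
}
```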
- Double-click tFilterRow to display its Basic settings view and define its properties.
- In the Conditions table, add four conditions and fill in the filtering parameters.
- From the InputColumn list field of the first row, select LastName, from the Function list field, select Length, from the Operator list field, select Lower than, and in the Value column, type in 9 to limit the length of last names to nine characters.
- From the InputColumn list field of the second row, select Gender, from the Operator list field, select Equals, and in the Value column, type in M in double quotes to filter records of male persons.
Warning: In the Value field, you must type in your values between double quotes for all types of values, except for integer values, which do not need quotes.
- From the InputColumn list field of the third row, select Age, from the Operator list field, select Greater than, and in the Value column, type in 10 to set the lower limit to 10 years.
- From the InputColumn list field of the fourth row, select Age, from the Operator list field, select Lower than, and in the Value column, type in 80 to set the upper limit to 80 years.
- To combine the conditions, select And so that only those records that meet all the defined conditions are accepted.
- In the Basic settings of the tLogRow components, select Table (print values in cells of a table) in the Mode area.
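Outside the Studio, the four simple conditions combined with And amount to the following Java predicate. This is a sketch, not Talend-generated code; the Person record is a hypothetical stand-in whose components mirror the schema defined above:

```java
import java.util.List;

public class SimpleConditionsSketch {
    record Person(String lastName, String gender, int age, String city) {}

    // The four conditions combined with And:
    // LastName Length Lower than 9, Gender Equals "M",
    // Age Greater than 10, Age Lower than 80.
    static boolean accept(Person p) {
        return p.lastName().length() < 9
                && p.gender().equals("M")
                && p.age() > 10
                && p.age() < 80;
    }

    public static void main(String[] args) {
        List<Person> input = List.of(
                new Person("Van Buren", "M", 73, "Chicago"),  // 9 characters: rejected
                new Person("Adams", "M", 40, "Albany"),       // accepted
                new Person("Jefferson", "F", 66, "New York"), // female: rejected
                new Person("Monroe", "M", 8, "Boston"));      // under 10: rejected
        // Accepted rows go to the Filter flow, the rest to the Reject flow.
        for (Person p : input) {
            System.out.println((accept(p) ? "filter: " : "reject: ") + p);
        }
    }
}
```

Note that "Van Buren" is nine characters long, so Length Lower than 9 rejects it.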
Executing the Job
Save the Job and press F6 to execute it.
The first table lists all the male persons aged between 10 and 80 years whose
last names are made up of less than nine characters, and the second table lists
all the records that do not match the filter conditions. Each rejected record
has a corresponding error message that explains the reason for rejection.
Filtering a list of names through different logical operations
This scenario is based on the previous one. An extra filtering condition is added so that
only those records of people from New York and Chicago are accepted. Without changing
the filter settings defined in the previous scenario, advanced conditions are added in
this scenario to enable both logical AND and logical OR operations in the same tFilterRow component.
Procedure
- Double-click the tFilterRow component to show its Basic settings view.
- Select the Use advanced mode check box, and type in the following expression in the text field:
input_row.City.equals("Chicago") || input_row.City.equals("New York")
This defines two conditions on the City column of the input data to filter records that contain the cities of Chicago and New York, and uses a logical OR to combine the two conditions so that records satisfying either condition will be accepted.
- Press Ctrl+S to save the Job and press F6 to execute it. The result list of the previous scenario has been further filtered, and only the records containing the cities of New York and Chicago are accepted.
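Since And is the selected logical operator, the advanced condition is combined with the result of the simple conditions: a row must pass both to be accepted. A standalone Java sketch of the combined filter (hypothetical helper methods, mirroring the conditions configured in the two scenarios):

```java
public class CombinedFilterSketch {
    // Simple conditions from the previous scenario, combined with And.
    static boolean simpleConditions(String lastName, String gender, int age) {
        return lastName.length() < 9 && gender.equals("M") && age > 10 && age < 80;
    }

    // Advanced-mode expression:
    // input_row.City.equals("Chicago") || input_row.City.equals("New York")
    static boolean advancedCondition(String city) {
        return city.equals("Chicago") || city.equals("New York");
    }

    // With And selected, a row passes only if both modes accept it.
    static boolean accept(String lastName, String gender, int age, String city) {
        return simpleConditions(lastName, gender, age) && advancedCondition(city);
    }

    public static void main(String[] args) {
        System.out.println(accept("Pierce", "M", 18, "New York")); // true
        System.out.println(accept("Arthur", "M", 20, "Albany"));   // false: city filtered out
    }
}
```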
tFilterRow MapReduce properties (deprecated)
These properties are used to configure tFilterRow running in the MapReduce Job framework.
The MapReduce
tFilterRow component belongs to the Processing family.
The component in this framework is available in all subscription-based Talend products with Big Data
and Talend Data Fabric.
The MapReduce framework is deprecated from Talend 7.3 onwards. Use Talend Jobs for Apache Spark to accomplish your integration tasks.
Basic settings
Schema and Edit schema |
A schema is a row description. It defines the number of fields (columns) to be processed and passed on to the next component. |
Logical operator used to combine conditions |
Select a logical operator to combine simple conditions and to combine the filter results of both modes if any advanced conditions are defined.
And: returns the boolean value of true if all defined conditions are true; otherwise false.
Or: returns the boolean value of true if at least one of the defined conditions is true; otherwise false. |
Conditions |
Click the plus button to add as many simple conditions as needed. Based on these conditions, rows are either accepted or rejected.
Input column: Select the column of the schema the condition is to be applied on.
Function: Select the function to apply to the input column.
Operator: Select the operator to bind the input column with the value.
Value: Type in the filtered value, between double quotes if needed. |
Use advanced mode |
Select this check box when the operations you want to perform cannot be carried out through the standard functions offered, for example, different logical operations in the same component. If multiple advanced conditions are defined, use a logical operator between the conditions: && (logical AND) or || (logical OR).
|
Global Variables
Global Variables |
ERROR_MESSAGE: the error message generated by the component when an error occurs. This is an After variable and it returns a string.
A Flow variable functions during the execution of a component while an After variable functions after the execution of the component.
To fill up a field or expression with a variable, press Ctrl + Space to access the variable list and choose the variable to use from it.
For further information about variables, see Talend Studio User Guide. |
Usage
Usage rule |
In a Talend Map/Reduce Job, this component is used as an intermediate step, and the other components used along with it must be Map/Reduce components too. They generate native Map/Reduce code that can be executed directly in Hadoop. For further information about Talend Map/Reduce Jobs, see the Talend Big Data Getting Started Guide. Note that in this documentation, unless otherwise explicitly stated, a scenario presents only Standard Jobs, that is to say traditional Talend data integration Jobs. |
Related scenarios
No scenario is available for the Map/Reduce version of this component yet.
tFilterRow properties for Apache Spark Batch
These properties are used to configure tFilterRow running in the Spark Batch Job framework.
The Spark Batch
tFilterRow component belongs to the Processing family.
The component in this framework is available in all subscription-based Talend products with Big Data
and Talend Data Fabric.
Basic settings
Schema and Edit schema |
A schema is a row description. It defines the number of fields (columns) to be processed and passed on to the next component. |
Logical operator used to combine conditions |
Select a logical operator to combine simple conditions and to combine the filter results of both modes if any advanced conditions are defined.
And: returns the boolean value of true if all defined conditions are true; otherwise false.
Or: returns the boolean value of true if at least one of the defined conditions is true; otherwise false. |
Conditions |
Click the plus button to add as many simple conditions as needed. Based on these conditions, rows are either accepted or rejected.
Input column: Select the column of the schema the condition is to be applied on.
Function: Select the function to apply to the input column.
Operator: Select the operator to bind the input column with the value.
Value: Type in the filtered value, between double quotes if needed. |
Use advanced mode |
Select this check box when the operations you want to perform cannot be carried out through the standard functions offered, for example, different logical operations in the same component. If multiple advanced conditions are defined, use a logical operator between the conditions: && (logical AND) or || (logical OR).
|
Usage
Usage rule |
This component is used as an intermediate step. This component, along with the Spark Batch component Palette it belongs to, appears only when you are creating a Spark Batch Job. Note that in this documentation, unless otherwise explicitly stated, a scenario presents only Standard Jobs, that is to say traditional Talend data integration Jobs. |
Spark Connection |
In the Spark
Configuration tab in the Run view, define the connection to a given Spark cluster for the whole Job. In addition, since the Job expects its dependent jar files for execution, you must specify the directory in the file system to which these jar files are transferred so that Spark can access these files.
This connection is effective on a per-Job basis. |
Related scenarios
No scenario is available for the Spark Batch version of this component
yet.
tFilterRow properties for Apache Spark Streaming
These properties are used to configure tFilterRow running in the Spark Streaming Job framework.
The Spark Streaming
tFilterRow component belongs to the Processing family.
This component is available in Talend Real Time Big Data Platform and Talend Data Fabric.
Basic settings
Schema and Edit schema |
A schema is a row description. It defines the number of fields (columns) to be processed and passed on to the next component. |
Logical operator used to combine conditions |
Select a logical operator to combine simple conditions and to combine the filter results of both modes if any advanced conditions are defined.
And: returns the boolean value of true if all defined conditions are true; otherwise false.
Or: returns the boolean value of true if at least one of the defined conditions is true; otherwise false. |
Conditions |
Click the plus button to add as many simple conditions as needed. Based on these conditions, rows are either accepted or rejected.
Input column: Select the column of the schema the condition is to be applied on.
Function: Select the function to apply to the input column.
Operator: Select the operator to bind the input column with the value.
Value: Type in the filtered value, between double quotes if needed. |
Use advanced mode |
Select this check box when the operations you want to perform cannot be carried out through the standard functions offered, for example, different logical operations in the same component. If multiple advanced conditions are defined, use a logical operator between the conditions: && (logical AND) or || (logical OR).
|
Usage
Usage rule |
This component is used as an intermediate step. This component, along with the Spark Streaming component Palette it belongs to, appears only when you are creating a Spark Streaming Job. Note that in this documentation, unless otherwise explicitly stated, a scenario presents only Standard Jobs, that is to say traditional Talend data integration Jobs. |
Spark Connection |
In the Spark
Configuration tab in the Run view, define the connection to a given Spark cluster for the whole Job. In addition, since the Job expects its dependent jar files for execution, you must specify the directory in the file system to which these jar files are transferred so that Spark can access these files.
This connection is effective on a per-Job basis. |
Related scenarios
No scenario is available for the Spark Streaming version of this component
yet.
tFilterRow Storm properties (deprecated)
These properties are used to configure tFilterRow running in the Storm Job framework.
The Storm
tFilterRow component belongs to the Processing family.
This component is available in Talend Real Time Big Data Platform and Talend Data Fabric.
The Storm framework is deprecated from Talend 7.1 onwards. Use Talend Jobs for Apache Spark Streaming to accomplish your Streaming related tasks.
Basic settings
Schema and Edit schema |
A schema is a row description. It defines the number of fields (columns) to be processed and passed on to the next component. |
Logical operator used to combine conditions |
Select a logical operator to combine simple conditions and to combine the filter results of both modes if any advanced conditions are defined.
And: returns the boolean value of true if all defined conditions are true; otherwise false.
Or: returns the boolean value of true if at least one of the defined conditions is true; otherwise false. |
Conditions |
Click the plus button to add as many simple conditions as needed. Based on these conditions, rows are either accepted or rejected.
Input column: Select the column of the schema the condition is to be applied on.
Function: Select the function to apply to the input column.
Operator: Select the operator to bind the input column with the value.
Value: Type in the filtered value, between double quotes if needed. |
Use advanced mode |
Select this check box when the operations you want to perform cannot be carried out through the standard functions offered, for example, different logical operations in the same component. If multiple advanced conditions are defined, use a logical operator between the conditions: && (logical AND) or || (logical OR).
|
Usage
Usage rule |
If you have subscribed to one of the Talend solutions with Big Data, you can also use this component as a Storm component. In a Talend Storm Job, this component is used as an intermediate step. The Storm version does not support the use of global variables. You need to use the Storm Configuration tab in the Run view to define the connection to a given Storm system for the whole Job. This connection is effective on a per-Job basis. For further information about Talend Storm Jobs, see the Talend Big Data Getting Started Guide. Note that in this documentation, unless otherwise explicitly stated, a scenario presents only Standard Jobs, that is to say traditional Talend data integration Jobs. |
Related scenarios
No scenario is available for the Storm version of this component
yet.