tCompareColumns
Compares two columns to design useful features for generating a classification
model.
tCompareColumns writes the comparison results to columns that you add
manually to the output schema.
In local mode, Apache Spark 2.0.0 and 2.4.0 are supported.
tCompareColumns properties for Apache Spark Batch
These properties are used to configure tCompareColumns running in the Spark Batch Job framework.
The Spark Batch tCompareColumns component belongs to the Natural Language Processing family.
The component in this framework is available in all Talend Platform products with Big Data and in Talend Data Fabric.
Basic settings
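Schema and Edit Schema
A schema is a row description. It defines the number of fields to be processed and passed on to the next component. Click Sync columns to retrieve the schema from the previous component. Click Edit schema to make changes to the schema.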
Add as many columns as necessary to the output schema, according to the
algorithms defined in the Comparison options table.
Built-In: You create and store the schema locally for this component only.
Repository: You have already created the schema and stored it in the Repository.
Comparison options
In this table, set the rules for comparing tokens in two columns.
The column specified in Main column contains the
In the Algorithms column, select the algorithm to
Output column(s): Specify the columns that contain
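The idea behind the Comparison options table can be sketched in plain Python: each rule names an algorithm and an output column, and the result of comparing the main column with a second column is written to that output column. This is only an illustrative sketch, not the component's implementation. The algorithm names (`exact_match`, `same_first_char`), the column names (`token`, `lookup`, `is_exact`, `same_initial`) and the function `compare_columns` are all hypothetical and do not come from the component's actual algorithm list.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical comparison algorithms; the component's real list may differ.
def exact_match(main: str, ref: str) -> bool:
    """True when both tokens are identical (case-sensitive)."""
    return main == ref

def same_first_char(main: str, ref: str) -> bool:
    """True when both tokens are non-empty and share the same first character."""
    return bool(main) and bool(ref) and main[0] == ref[0]

ALGORITHMS: Dict[str, Callable[[str, str], bool]] = {
    "exact_match": exact_match,
    "same_first_char": same_first_char,
}

def compare_columns(
    rows: List[Dict[str, object]],
    main_column: str,
    reference_column: str,
    options: List[Tuple[str, str]],
) -> List[Dict[str, object]]:
    """Apply each (algorithm name, output column) rule to every row.

    The input rows are left untouched; each returned row is a copy with
    one extra field per rule, holding the comparison result.
    """
    compared = []
    for row in rows:
        out = dict(row)
        for algorithm_name, output_column in options:
            algorithm = ALGORITHMS[algorithm_name]
            out[output_column] = algorithm(row[main_column], row[reference_column])
        compared.append(out)
    return compared

# Example input: one token column and one column to compare it with.
rows = [
    {"token": "Paris", "lookup": "Paris"},
    {"token": "paris", "lookup": "Paris"},
    {"token": "Lyon", "lookup": "London"},
]

# Each rule pairs an algorithm with the manually added output column.
options = [("exact_match", "is_exact"), ("same_first_char", "same_initial")]

result = compare_columns(rows, "token", "lookup", options)
for row in result:
    print(row)
```

Each output column holds one Boolean feature per row, which is the kind of value a classification model can consume directly. The output schema needs one added column per rule, which mirrors the instruction to add as many columns as the table defines.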
Usage
Usage rule
This component is used as an intermediate step. This component, along with the Spark Batch component Palette it belongs to, appears only when you are creating a Spark Batch Job.
Spark Batch Connection
In the Spark Configuration tab in the Run view, define the connection to a given Spark cluster for the whole Job. In addition, since the Job expects its dependent jar files for execution, you must specify the directory in the file system to which these jar files are transferred so that Spark can access these files.
This connection is effective on a per-Job basis.
Related scenarios
No scenario is available for the Standard version of this component yet.