tCompareColumns properties
Compares two columns to design useful features for generating a classification model.
tCompareColumns outputs comparison results to manually added columns.
This component can run only with Spark 1.6 and 2.0.
tCompareColumns properties for Apache Spark Batch
These properties are used to configure tCompareColumns running in the Spark Batch Job framework.
The Spark Batch tCompareColumns component belongs to the Natural Language Processing family.
The component in this framework is available when you have subscribed to any Talend Platform product with Big Data or Talend Data Fabric.
Basic settings
Schema and Edit Schema |
A schema is a row description. It defines the number of fields (columns) to
Click Sync columns to retrieve the schema from
Click Edit schema to make changes to the schema.
Add as many columns as necessary to the output schema according to the algorithms defined in the Comparison options table:
Built-In: You create and store the |
Repository: You have already created |
Comparison options |
In this table, set the rules for comparing tokens in two columns.
The column specified in Main column contains the
In the Algorithms column, select the algorithm to
Output column(s): Specify the columns that contain |
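The idea behind the Comparison options table (compare a token in the main column with a token in a second column, then write the result to a column added to the output schema) can be sketched in plain Python. This is only an illustration, not the component's implementation: the algorithm names (exact_match, case_insensitive_match), the column names (token, previous_token, is_same) and the compare_columns helper are all hypothetical, and the algorithms actually offered by tCompareColumns may differ.

```python
# Illustrative sketch only: this is NOT the tCompareColumns implementation.
# Algorithm names, column names and the helper below are hypothetical.

def exact_match(main_token, other_token):
    """Return True when both tokens are identical (case-sensitive)."""
    return main_token == other_token


def case_insensitive_match(main_token, other_token):
    """Return True when both tokens are identical, ignoring case."""
    return main_token.lower() == other_token.lower()


# Hypothetical registry standing in for the choices of an "Algorithms" column.
ALGORITHMS = {
    "exact_match": exact_match,
    "case_insensitive_match": case_insensitive_match,
}


def compare_columns(rows, main_column, other_column, algorithm, output_column):
    """Compare two columns row by row and store the result in output_column.

    The input rows are left untouched; new row dictionaries are returned.
    """
    compare = ALGORITHMS[algorithm]
    result = []
    for row in rows:
        new_row = dict(row)  # copy so the caller's data is not modified
        new_row[output_column] = compare(row[main_column], row[other_column])
        result.append(new_row)
    return result


rows = [
    {"token": "Paris", "previous_token": "in"},
    {"token": "the", "previous_token": "The"},
]

exact = compare_columns(rows, "token", "previous_token", "exact_match", "is_same")
loose = compare_columns(
    rows, "token", "previous_token", "case_insensitive_match", "is_same"
)
```

In this sketch, the output column plays the role of a column added manually to the output schema: one such column would be needed for each comparison rule.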
Usage
Usage rule |
This component is used as an intermediate step.
This component, along with the Spark Batch component Palette it belongs to, appears only |
Spark Batch Connection |
You need to use the Spark Configuration tab in the Run view to define the connection to a given Spark cluster for the whole Job.
In addition, since the Job expects its dependent jar files for execution, you must specify the directory in the file system to which these jar files are transferred so that Spark can access these files:
This connection is effective on a per-Job basis. |
Related scenarios
No scenario is available for the Standard version of this component yet.