tSVMModel
Generates an SVM-based classifier model that can be used by tPredict to classify given elements.
tSVMModel applies the SVM algorithm to analyze feature vectors that are typically prepared and provided by tModelEncoder.
It generates a binary classification model from this analysis and writes this model either to memory or to a file system supported by this component, such as HDFS or S3.
tSVMModel properties for Apache Spark Batch
These properties are used to configure tSVMModel running in the Spark Batch Job framework.
The Spark Batch
tSVMModel component belongs to the Machine Learning family.
This component is available in Talend Platform products with Big Data and
in Talend Data Fabric.
Basic settings
Schema and Edit schema |
A schema is a row description. It defines the number of fields (columns) to be processed. Click Edit schema to make changes to the schema. |
Label |
Select the input column used to provide classification labels. The records of this column are used as the class labels of the elements to be classified. Since an SVM model is a binary classification model, only two classes are expected. |
Vector to process |
Select the input column used to provide features. Very often, this column is the output of tModelEncoder. |
Save the model on file |
Select this check box to store the model in a given file system. Otherwise, the model is stored in memory. |
Step size |
Enter the size (numerical value) of the initial step of the gradient descent. Selecting the best step size is often delicate in practice. Generally speaking, a step size that is too large can prevent the training from converging. On the other hand, the smaller the step size is, the more slowly the training converges. |
Number of iterations |
Enter the number of iterations you want the Job to perform to train the model. |
Fraction of data to be used per iteration |
Enter the fraction (expressed in decimal) of the input data to be used in each iteration. The default value 1.0 means that the whole data set is taken. |
Regularization parameter |
Enter the regularization number to be used by the Updater function. |
Updater function |
Select, from the available functions, the one used to calculate the form of the hyperplane that separates the two classes. This function updates the weights in each iteration, taking the regularization parameter into account. For example, in a two-dimensional space, this hyperplane can be a line. |
Gradient function |
Select the loss function used to calculate the margin between the two classes. |
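The way the settings above interact can be sketched in plain Python. This is only an illustration, not the component's implementation: it assumes a hinge loss as the gradient function, a squared-L2 style updater, and a step that decays as the initial step size divided by the square root of the iteration number. All function and variable names (`hinge_gradient`, `l2_update`, `train_svm`, `predict`, `toy_data`) are hypothetical, and labels are assumed to be 0 or 1.

```python
import math
import random


def hinge_gradient(weights, features, label):
    """Return (subgradient, loss) of the hinge loss for one example.

    Assumption: label is 0 or 1 and is mapped to -1 / +1 internally.
    """
    y = 2 * label - 1
    margin = y * sum(w * x for w, x in zip(weights, features))
    if margin < 1:
        # The example is inside the margin or misclassified.
        return [-y * x for x in features], 1 - margin
    return [0.0] * len(features), 0.0


def l2_update(weights, gradient, step, reg_param):
    """Squared-L2 style update: shrink the weights, then step against the gradient."""
    return [w * (1 - step * reg_param) - step * g
            for w, g in zip(weights, gradient)]


def train_svm(data, step_size=1.0, iterations=100, fraction=1.0,
              reg_param=0.01, seed=42):
    """Train a linear SVM (no intercept) with mini-batch subgradient descent.

    data is a list of (label, features) pairs.
    """
    rng = random.Random(seed)
    n_features = len(data[0][1])
    weights = [0.0] * n_features
    for i in range(1, iterations + 1):
        # "Fraction of data": each row is sampled with probability `fraction`.
        batch = [row for row in data if rng.random() < fraction]
        if not batch:
            continue
        total = [0.0] * n_features
        for label, features in batch:
            grad, _ = hinge_gradient(weights, features, label)
            total = [t + g for t, g in zip(total, grad)]
        avg = [t / len(batch) for t in total]
        # "Step size": the initial step, assumed here to decay with the iteration.
        step = step_size / math.sqrt(i)
        weights = l2_update(weights, avg, step, reg_param)
    return weights


def predict(weights, features):
    """Classify by the side of the hyperplane on which the point falls."""
    return 1 if sum(w * x for w, x in zip(weights, features)) > 0 else 0


# Toy data (made up for illustration), separable by a line through the origin.
toy_data = [(1, [2.0, 1.0]), (1, [1.5, 2.0]),
            (0, [-1.0, -1.5]), (0, [-2.0, -0.5])]
weights = train_svm(toy_data)
predictions = [predict(weights, features) for _, features in toy_data]
```

In this sketch, a larger `reg_param` shrinks the weights more strongly at each iteration, and a `fraction` below 1.0 makes each iteration cheaper at the cost of a noisier gradient estimate.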
Advanced settings
Use feature scaling |
If the training cannot converge on your data, select this check box to scale the features, which reduces the condition number of the training data. Reducing the condition number can often improve the convergence rate. |
Intercept |
Select this check box to allow tSVMModel to automatically calculate the intercept constant. In general, an intercept can guarantee that the residuals of your model have a mean of zero. |
Validate data before training |
Select this check box to validate the vectors of the training data before the training starts. |
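As a rough illustration of the feature scaling and intercept options above, the following Python sketch scales each feature column to unit standard deviation and models the intercept as an extra constant feature. It is a minimal sketch under these assumptions, not the component's actual behavior; the helper names `scale_features` and `add_intercept` and the sample rows are hypothetical.

```python
import math


def scale_features(rows):
    """Divide each column by its sample standard deviation.

    Columns with zero variance are left unchanged. Requires at least two rows.
    Returns the scaled rows and the standard deviations that were used.
    """
    n = len(rows)
    dims = len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(dims)]
    stds = []
    for j in range(dims):
        var = sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1)
        stds.append(math.sqrt(var))
    scaled = [[x / s if s > 0 else x for x, s in zip(r, stds)] for r in rows]
    return scaled, stds


def add_intercept(rows):
    """Append a constant 1.0 to each vector so that its weight acts as the intercept."""
    return [r + [1.0] for r in rows]


# Made-up rows whose two columns differ in magnitude by a factor of 100 or more.
raw_rows = [[1.0, 100.0], [2.0, 300.0], [3.0, 500.0]]
scaled_rows, column_stds = scale_features(raw_rows)
rows_with_bias = add_intercept(scaled_rows)
```

After scaling, both columns vary on a comparable scale, which is the situation in which gradient descent tends to converge more easily.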
Usage
Usage rule |
This component is used as an end component and requires an input link. You can accelerate the training process by adjusting the stopping conditions such as the number of iterations. |
Model evaluation |
The parameters you need to set are free parameters, so their values may be provided by previous experiments or empirical guesses. Therefore, you need to train the classifier model you are generating with different sets of parameter values and compare the results, for example by using a confusion matrix. You need to select the scores to be used depending on the algorithm you want to use to train your classifier model. For a general explanation about confusion matrix, see https://en.wikipedia.org/wiki/Confusion_matrix from Wikipedia. |
Spark Connection |
In the Spark Configuration tab in the Run view, define the connection to a given Spark cluster for the whole Job. In addition, since the Job expects its dependent jar files for execution, you must specify the directory in the file system to which these jar files are transferred so that Spark can access these files. This connection is effective on a per-Job basis. |
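For the model evaluation described in this table, a confusion matrix for a binary classifier and a few scores derived from it can be computed as in the following Python sketch. The function names `confusion_matrix` and `scores` and the sample labels are hypothetical, and which score matters most depends on your use case.

```python
def confusion_matrix(actual, predicted, positive=1):
    """Count true/false positives and negatives for a binary classifier."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for a, p in zip(actual, predicted):
        if p == positive:
            counts["tp" if a == positive else "fp"] += 1
        else:
            counts["fn" if a == positive else "tn"] += 1
    return counts


def scores(cm):
    """Derive accuracy, precision, recall and F1 from the counts.

    A score whose denominator is zero is reported as 0.0.
    """
    tp, fp, tn, fn = cm["tp"], cm["fp"], cm["tn"], cm["fn"]
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up labels: the true classes and the classes predicted by a model.
actual_labels = [1, 1, 1, 0, 0, 0, 1, 0]
predicted_labels = [1, 0, 1, 0, 1, 0, 1, 0]
matrix = confusion_matrix(actual_labels, predicted_labels)
result = scores(matrix)
```

Computing the same scores for models trained with different parameter values gives a simple basis for comparing them.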
Related scenarios
No scenario is available for the Spark Batch version of this component
yet.