tLogisticRegressionModel
Analyzes feature vectors, usually pre-processed by tModelEncoder, to generate a classifier model that is used by tPredict to classify given elements.
tLogisticRegressionModel analyzes incoming datasets by applying the Logistic Regression algorithm. It generates a classification model from this analysis and writes the model either to memory or to a given file system.
tLogisticRegressionModel properties for Apache Spark Batch
These properties are used to configure tLogisticRegressionModel running in the Spark Batch Job framework.
The Spark Batch
tLogisticRegressionModel component belongs to the Machine Learning family.
This component is available in Talend Platform products with Big Data and
in Talend Data Fabric.
Basic settings
Schema and Edit schema
A schema is a row description. It defines the number of fields (columns) to be processed and passed on to the next component. Click Edit schema to make changes to the schema.
Label column
Select the input column used to provide classification labels.
Feature column
Select the input column used to provide features. Very often, this column is the output of the feature engineering computations performed by tModelEncoder.
Save the model on file
Select this check box to store the model in a given file system. Otherwise, the model is stored in memory.
ElasticNet mixing parameter
Enter the ElasticNet coefficient (numerical value) used for the regularization calculation. The value varies between 0.0 and 1.0, indicating the weights of the L1 and L2 penalties: a value of 0.0 corresponds to a pure L2 penalty and a value of 1.0 to a pure L1 penalty. For further information about how ElasticNet is implemented in Spark, see the Spark MLlib documentation. For further information about ElasticNet itself, see Regularization and variable selection via the elastic net.
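To illustrate how the mixing parameter blends the two penalties, here is a minimal pure-Python sketch following Spark MLlib's elastic net formulation, regParam * (alpha * L1 + (1 - alpha) / 2 * L2). The function and variable names are illustrative only, not part of the component:

```python
def elastic_net_penalty(weights, reg_param, alpha):
    """Spark-style elastic net penalty.

    alpha (the ElasticNet mixing parameter) = 0.0 gives a pure L2 (Ridge)
    penalty, alpha = 1.0 a pure L1 (Lasso) penalty; values in between mix
    the two. reg_param is the overall regularization coefficient.
    """
    l1 = sum(abs(w) for w in weights)        # ||w||_1
    l2 = sum(w * w for w in weights)         # ||w||_2 squared
    return reg_param * (alpha * l1 + (1.0 - alpha) / 2.0 * l2)

weights = [0.5, -1.5, 2.0]
print(elastic_net_penalty(weights, reg_param=0.1, alpha=0.0))  # pure L2: 0.325
print(elastic_net_penalty(weights, reg_param=0.1, alpha=1.0))  # pure L1: 0.4
```

Raising the penalty shrinks the model's weights, which can reduce overfitting at the cost of some training accuracy.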
Fit an intercept term
Select this check box to allow tLogisticRegressionModel to automatically calculate the intercept constants and include them in the regression computation. In general, an intercept should be present to guarantee that the residuals of your model have a mean of zero.
Maximum number of iterations
Enter the number of iterations you want the Job to perform to train the model.
Regularization
Enter the regularization coefficient (numerical value) to be used for the penalty calculation, along with the ElasticNet mixing parameter. For further information about how this parameter is implemented in Spark, see the Spark MLlib documentation.
Threshold
Enter the threshold (numerical value ranging between 0.0 and 1.0) used to separate positive predictions from negative ones: an element whose predicted probability exceeds the threshold is classified as positive. The default threshold is 0.5.
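The following minimal Python sketch (names illustrative, not part of the component) shows how such a threshold separates positive from negative predictions for a logistic model's probability output:

```python
import math

def sigmoid(z):
    """Map a raw model score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def classify(score, threshold=0.5):
    # Probabilities above the threshold are labeled positive (1),
    # all others negative (0).
    return 1 if sigmoid(score) > threshold else 0

print(classify(0.2))                  # sigmoid(0.2) is about 0.55 -> 1
print(classify(0.2, threshold=0.7))   # 0.55 is below 0.7 -> 0
```

Raising the threshold makes the classifier more conservative about positive predictions, trading recall for precision.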
Convergence tolerance
Enter the convergence score which the iterations are expected to reach. In general, a smaller value results in higher prediction accuracy but requires more iterations, and thus a longer training time. Note that in some cases your model may not be able to reach the tolerance you set, regardless of the number of iterations you allow the Job to perform.
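To illustrate how the maximum number of iterations and the convergence tolerance interact as stopping conditions, here is a single-feature logistic regression trained by gradient descent in plain Python. This is a conceptual sketch with illustrative names, not the Spark implementation used by the component:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logreg(xs, ys, max_iter=100, tol=1e-4, lr=0.5):
    """Fit weight w and intercept b by gradient descent.

    Training stops early when the parameter update magnitude drops
    below the convergence tolerance `tol`, or after `max_iter`
    iterations, whichever comes first.
    """
    w, b = 0.0, 0.0
    for i in range(max_iter):
        gw = gb = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y   # prediction error
            gw += err * x
            gb += err
        gw /= len(xs)
        gb /= len(xs)
        w -= lr * gw
        b -= lr * gb
        if max(abs(lr * gw), abs(lr * gb)) < tol:
            return w, b, i + 1             # converged early
    return w, b, max_iter                  # stopped by the iteration cap

# Toy dataset: points at x >= 2 belong to the positive class.
w, b, used = train_logreg([0.0, 1.0, 2.0, 3.0], [0, 0, 1, 1])
```

A tighter tolerance or a higher iteration cap lets training run longer; loosening either stops it sooner, which speeds up the Job but can leave the model short of its best fit.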
Usage
Usage rule
This component is used as an end component and requires an input link. You can accelerate the training process by adjusting the stopping conditions, such as the maximum number of iterations or the convergence tolerance, but note that stopping the training too early can affect the quality of the model.
Model evaluation
The parameters you need to set are free parameters, so their values may be provided by previous experiments or empirical guesses; they do not have optimal values applicable to all datasets. Therefore, you need to train the classifier model you are generating with different sets of parameter values until you obtain satisfactory evaluation scores, such as a good confusion matrix. You need to select the scores to be used depending on the algorithm you want to use to evaluate your classifier model. For an example of how the confusion matrix is used in a classification Job, see Creating a classification model to filter spam. For a general explanation of the confusion matrix, see https://en.wikipedia.org/wiki/Confusion_matrix from Wikipedia.
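As a sketch of what a confusion matrix captures, the following plain-Python example (illustrative only) counts true/false positives and negatives for a binary classifier and derives precision and recall from them:

```python
def confusion_matrix(labels, predictions):
    """Count true/false positives/negatives for binary (0/1) outputs."""
    tp = fp = tn = fn = 0
    for y, p in zip(labels, predictions):
        if p == 1 and y == 1:
            tp += 1        # true positive
        elif p == 1 and y == 0:
            fp += 1        # false positive
        elif p == 0 and y == 0:
            tn += 1        # true negative
        else:
            fn += 1        # false negative
    return {"tp": tp, "fp": fp, "tn": tn, "fn": fn}

labels      = [1, 0, 1, 1, 0, 0]
predictions = [1, 0, 0, 1, 1, 0]
m = confusion_matrix(labels, predictions)
precision = m["tp"] / (m["tp"] + m["fp"])  # 2 / 3
recall    = m["tp"] / (m["tp"] + m["fn"])  # 2 / 3
```

Comparing such scores across models trained with different parameter values is how you pick the best-performing parameter set.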
Spark Connection
In the Spark Configuration tab in the Run view, define the connection to a given Spark cluster for the whole Job. In addition, since the Job expects its dependent jar files for execution, you must specify the directory in the file system to which these jar files are transferred so that Spark can access them. This connection is effective on a per-Job basis.
Related scenario
tLogisticRegressionModel is used the same way as
tRandomForestModel. For a scenario in which tRandomForestModel is used, see Creating a classification model to filter spam.