tLinearRegressionModel
Builds a linear regression model using a training dataset.
This component analyzes feature vectors, usually prepared and provided by tModelEncoder, to generate a linear regression model that
expresses how an outcome depends on a given set of features. tPredict then uses this model to predict the outcome for
features of the same type that it receives.
This model can be used directly in the same Job or written to a file
system for later use.
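As a rough illustration of what training and prediction mean here, the sketch below fits a one-feature linear model by ordinary least squares and then predicts the outcome for a new feature value. It is not the component's implementation, which runs on Spark. The names `fit_line` and `predict` and the sample data are hypothetical.

```python
from statistics import mean


def fit_line(features, labels):
    """Fit label = slope * feature + intercept by ordinary least squares."""
    x_bar, y_bar = mean(features), mean(labels)
    sxx = sum((x - x_bar) ** 2 for x in features)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(features, labels))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def predict(model, feature):
    """Apply a fitted (slope, intercept) pair to one feature value."""
    slope, intercept = model
    return slope * feature + intercept


# Hypothetical training data generated from label = 2 * feature + 1.
model = fit_line([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])

# Outcome predicted for a feature value that was not in the training data.
prediction = predict(model, 5.0)
```

The fitted pair plays the role of the model that is either reused in the same Job or saved for later use; the prediction step corresponds to what tPredict does with it.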
tLinearRegressionModel properties for Apache Spark Batch
These properties are used to configure tLinearRegressionModel running in the Spark Batch Job framework.
The Spark Batch
tLinearRegressionModel component belongs to the Machine Learning family.
This component is available in Talend Platform products with Big Data and
in Talend Data Fabric.
Basic settings
|
Schema and Edit schema |
A schema is a row description. It defines the number of fields. Click Edit schema to make changes to the schema.
|
|
Label column |
Select the input column used to provide Double-type labels. |
|
Feature column |
Select the input column used to provide Vector-type features. |
|
Save the model on file |
Select this check box to store the model in a given file system. Otherwise, the model is only available for direct use in the same Job. |
|
ElasticNet mixing parameter |
Enter the ElasticNet coefficient (numerical value) used for the regularization calculation. The value varies between 0.0 and 1.0, indicating the relative weights of the L1 and L2 regularization terms. For further information about how ElasticNet is implemented in Spark, see ML linear methods in the Spark documentation. For further information about ElasticNet, see Regularization and variable selection via the elastic net. |
|
Fit an intercept term |
Select this check box to allow tLinearRegressionModel to automatically calculate the intercept term. In general, an intercept should be present to guarantee that the residuals of the model have a mean of zero. |
|
Standardize features before fitting |
Select this check box to scale the features to make them normally |
|
Maximum number of iterations |
Enter the number of iterations you want the Job to perform to train the model. |
|
Regularization |
Enter the regularization coefficient (numerical value) to be used along with the ElasticNet mixing parameter. For further information about how this parameter is implemented in Spark, see ML linear methods in the Spark documentation. |
|
Convergence tolerance |
Enter the convergence score that the iterations are expected to reach. In general, a smaller value results in higher accuracy. Note, however, that in some cases your model may not be able to reach the convergence score you set. |
|
Solver algorithm |
Select the algorithm used for optimization.
|
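To make the interplay between the ElasticNet mixing parameter and the Regularization coefficient more concrete, here is a minimal sketch of the penalty term in the form documented for Spark's linear methods: regularization coefficient times (mixing times the L1 norm, plus (1 minus mixing) times half the squared L2 norm). The function name `elastic_net_penalty` is hypothetical, and the assumption that the component passes both fields unchanged to Spark is not confirmed by this page.

```python
def elastic_net_penalty(weights, reg_param, mixing):
    """Penalty added to the training loss for a given weight vector.

    reg_param stands for the Regularization field and mixing for the
    ElasticNet mixing parameter field (assumed mapping, not confirmed).
    """
    if not 0.0 <= mixing <= 1.0:
        raise ValueError("mixing must be between 0.0 and 1.0")
    l1 = sum(abs(w) for w in weights)
    l2_squared = sum(w * w for w in weights)
    return reg_param * (mixing * l1 + (1.0 - mixing) * 0.5 * l2_squared)


weights = [3.0, -4.0]  # hypothetical model coefficients

lasso_like = elastic_net_penalty(weights, reg_param=0.1, mixing=1.0)  # L1 only
ridge_like = elastic_net_penalty(weights, reg_param=0.1, mixing=0.0)  # L2 only
blended = elastic_net_penalty(weights, reg_param=0.1, mixing=0.5)     # equal mix
```

A mixing value of 1.0 leaves only the L1 term, which tends to drive some coefficients to zero, while 0.0 leaves only the L2 term, which shrinks all coefficients without eliminating them. A regularization coefficient of 0.0 disables the penalty whatever the mixing value is.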
Usage
|
Usage rule |
This component is used as an end component and requires an input link. You can accelerate the training process by adjusting the stopping conditions, such as the maximum number of iterations or the convergence tolerance. |
|
Model evaluation |
The parameters you need to set are free parameters, so their values may be provided by earlier experiments or empirical guesses. Therefore, you need to train the relationship model you are generating with different sets of parameter values and compare the results. For general information about validating a regression-based relationship model, see https://en.wikipedia.org/wiki/Regression_validation. |
|
Spark Connection |
In the Spark Configuration tab in the Run view, define the connection to a given Spark cluster for the whole Job. In addition, since the Job expects its dependent jar files for execution, you must specify the directory in the file system to which these jar files are transferred so that Spark can access these files.
This connection is effective on a per-Job basis. |
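The effect of the stopping conditions and the advice to try several parameter sets can be sketched in a few lines. The example below swaps in plain gradient descent for simplicity; it is not the solver the component uses. Everything in it is hypothetical: the names `train` and `rmse`, the `learning_rate` argument (which has no counterpart among the component's settings), the candidate values, and the data.

```python
def train(xs, ys, learning_rate, max_iter, tol):
    """Gradient descent on mean squared error for y = w * x + b.

    Stops after max_iter iterations, or as soon as the loss improves by
    less than tol. Returns (w, b, iterations_used).
    """
    w, b = 0.0, 0.0
    n = len(xs)
    previous_loss = float("inf")
    for iteration in range(1, max_iter + 1):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        loss = sum(e * e for e in errors) / n
        if previous_loss - loss < tol:
            return w, b, iteration  # convergence tolerance reached
        previous_loss = loss
        grad_w = 2.0 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2.0 * sum(errors) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b, max_iter  # iteration budget exhausted


def rmse(w, b, xs, ys):
    """Root mean squared error of the model on the given data."""
    return (sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)) ** 0.5


# Hypothetical data following y = 2 * x + 1, split into training and held-out parts.
train_x, train_y = [0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0]
test_x, test_y = [4.0, 5.0], [9.0, 11.0]

# Candidate (max_iter, tol) pairs: a loose, fast setting and two stricter ones.
candidates = [(5, 1e-6), (200, 1e-6), (2000, 1e-12)]

results = []
for max_iter, tol in candidates:
    w, b, used = train(train_x, train_y, learning_rate=0.05, max_iter=max_iter, tol=tol)
    results.append({"max_iter": max_iter, "tol": tol, "iterations": used,
                    "w": w, "b": b, "rmse": rmse(w, b, test_x, test_y)})

# Keep the setting with the lowest error on the held-out data.
best = min(results, key=lambda r: r["rmse"])
```

The loosest setting finishes quickest but leaves the largest held-out error, which is the trade-off to watch when shortening training. Comparing candidates on data that was not used for training is one simple form of the regression validation referred to above.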
Related scenarios
No scenario is available for the Spark Batch version of this component
yet.