tALSModel
Generates a user-ranking-product association matrix, based on the given user-product
interaction data.
This matrix is used by tRecommend to estimate users’
preferences.
tALSModel leverages Spark to process
large amounts of information about users’ preferences for given products.
It receives this information from its preceding Spark
component and performs ALS (Alternating Least Squares) computations over
it in order to generate a fine-tuned
product recommender model, which it writes to a given file system in the Parquet
format.
In local mode, Apache Spark 1.3.0 and later versions are supported.
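The alternating idea behind the ALS computations mentioned above can be illustrated with a minimal pure-Python sketch. It is not the Spark implementation that tALSModel runs: it assumes a single latent factor, plain ridge regularization, and a small hypothetical set of ratings (`RATINGS`); the function names are also hypothetical.

```python
import math

# Hypothetical sample data: (user index, product index, rating) triples.
RATINGS = [
    (0, 0, 1.0), (0, 1, 2.0), (0, 2, 3.0),
    (1, 0, 2.0), (1, 1, 4.0), (1, 2, 6.0),
]


def als_rank1(ratings, n_users, n_items, reg=0.01, iterations=10):
    """Fit one latent factor per user and per product by alternating least squares."""
    user_f = [1.0] * n_users
    item_f = [1.0] * n_items
    for _ in range(iterations):
        # Hold the product factors fixed: each user factor is then the
        # solution of a one-dimensional regularized least-squares problem.
        for u in range(n_users):
            rows = [(i, r) for (uu, i, r) in ratings if uu == u]
            num = sum(r * item_f[i] for i, r in rows)
            den = reg + sum(item_f[i] ** 2 for i, r in rows)
            user_f[u] = num / den
        # Hold the user factors fixed and update each product factor the same way.
        for i in range(n_items):
            rows = [(u, r) for (u, ii, r) in ratings if ii == i]
            num = sum(r * user_f[u] for u, r in rows)
            den = reg + sum(user_f[u] ** 2 for u, r in rows)
            item_f[i] = num / den
    return user_f, item_f


def training_rmse(ratings, user_f, item_f):
    """Root-mean-square error of the predicted ratings (user factor * product factor)."""
    errors = [(r - user_f[u] * item_f[i]) ** 2 for (u, i, r) in ratings]
    return math.sqrt(sum(errors) / len(errors))
```

Calling `als_rank1(RATINGS, n_users=2, n_items=3)` returns the two factor lists; the predicted rating for a user-product pair is the product of the corresponding factors. With more than one latent factor, each update becomes a small linear system instead of a single division.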
tALSModel properties for Apache Spark Batch
These properties are used to configure tALSModel running in the Spark Batch Job framework.
The Spark Batch
tALSModel component belongs to the Machine Learning family.
This component is available in Talend Platform products with Big Data and
in Talend Data Fabric.
Basic settings
Define a storage configuration |
Select the configuration component to be used to provide the configuration If you leave this check box clear, the target file system is the local The configuration component to be used must be present in the same Job. |
Feature table |
Complete this table to map the input columns with the
three factors required to compute the recommender model.
This map allows tALSModel to read the right type of data for each |
Training percentage |
Enter the percentage (expressed in the decimal form) of |
Number of latent factors |
Enter the number of the latent factors, with which each |
Number of iterations |
Enter the number of iterations you want the Job to This number should be smaller than 30 in order to avoid However, if you need to perform more than 30 iterations, |
Regularization factor |
Enter the regularization number you want to use to avoid |
Build model for implicit feedback data |
Select this check box to enable tALSModel to handle the implicit data Contrary to the explicit data sets such as the ranking of If you leave this check box clear, tALSModel handles the explicit data sets For related details about how the ALS model handles the |
Confidence coefficient for implicit |
Enter the number to indicate the level of confidence you |
Parquet model path |
Enter the directory in which you need to store the The button for browsing does not work with the Spark tHDFSConfiguration |
Parquet model name |
Enter the name you need to use for the recommender |
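As an illustration of the implicit feedback setting above, the following sketch shows one common way in which observed interaction counts are turned into a binary preference and a confidence weight. It assumes the formulation from Hu, Koren and Volinsky ("Collaborative Filtering for Implicit Feedback Datasets"), in which the confidence is 1 + alpha * count; whether the confidence coefficient of this component corresponds exactly to `alpha` is an assumption, and the function name and sample keys are hypothetical.

```python
def to_preference_and_confidence(interactions, alpha):
    """Map implicit interaction strengths to (preference, confidence) pairs.

    interactions: dict keyed by (user, product) with a non-negative strength,
                  for example a number of views or purchases.
    alpha:        assumed confidence coefficient; larger values give observed
                  interactions more weight.
    """
    result = {}
    for (user, product), strength in interactions.items():
        # Any observed interaction counts as a positive preference.
        preference = 1.0 if strength > 0 else 0.0
        # Confidence grows linearly with the strength of the interaction.
        confidence = 1.0 + alpha * strength
        result[(user, product)] = (preference, confidence)
    return result
```

For example, with `alpha=2.0`, a product viewed three times by a user gets preference 1.0 with confidence 7.0, while a product never viewed gets preference 0.0 with the baseline confidence 1.0.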
Advanced settings
Set Checkpoint Interval |
Set the frequency of checkpoints. It is recommended to leave the default Before setting a value for this parameter, activate checkpointing and set For further information about checkpointing the |
Usage
Usage rule |
This component is used as an end component and requires an input link. Note that the parameters you need to set are free |
MLlib installation |
In Apache Spark V1.3 or earlier versions of Spark, the For further information about MLlib and this library, see |
RMSE score |
These scores can be output to the console of the Run view
when you execute the Job, provided that you have added the following code to the Log4j view in the Project Settings dialog box.
These scores are output along with the other Log4j INFO-level information. If you want to If you are using a subscription-based version of the Studio, the activity of this For more information on the log4j logging levels, see the Apache documentation at http://logging.apache.org/log4j/1.2/apidocs/org/apache/log4j/Level.html. |
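For readers unfamiliar with the RMSE (root-mean-square error) score, the following sketch shows how such a score can be computed on held-out data. It assumes that the rows not selected by the training percentage are used for scoring, which this page does not state explicitly; the function names and the fixed seed are hypothetical and do not reflect how the component splits its input.

```python
import math
import random


def split_by_training_percentage(rows, training_percentage, seed=42):
    """Shuffle the rows and split them; training_percentage is a decimal such as 0.8."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = int(round(len(shuffled) * training_percentage))
    return shuffled[:cut], shuffled[cut:]


def rmse(actual, predicted):
    """Root-mean-square error between actual and predicted ratings."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))
```

A lower score means that the predicted ratings are closer to the actual ones; a score of 0 means that every prediction matches exactly.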
Related scenarios
No scenario is available for the Spark Batch version of this component
yet.