tKMeansStrModel
Analyzes incoming datasets in near real-time by applying the K-Means
algorithm.
This component analyzes streaming feature vectors to continuously adapt an existing
clustering model to changing circumstances. The incoming data is usually pre-processed
by tModelEncoder, and the K-Means model it produces is used by
tPredict to cluster given elements.
The component continuously updates a K-Means clustering model from this
analysis and stores the model either in memory or in a given file
system.
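The following is a minimal sketch of the mechanism this component is based on: Spark MLlib's StreamingKMeans, which folds every micro-batch of feature vectors into the current cluster centers. The stream source, host and port, dimension and parameter values are illustrative assumptions for the sketch, not part of the generated Talend Job.

// Minimal sketch of streaming K-Means training with Spark MLlib.
import org.apache.spark.SparkConf
import org.apache.spark.mllib.clustering.StreamingKMeans
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StreamingKMeansSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("StreamingKMeansSketch").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(10))

    // Feature vectors arriving as text lines such as "0.1 4.2 1.3"; in a
    // Talend Job this role is played by the rows produced by tModelEncoder.
    val features = ssc.socketTextStream("localhost", 9999)
      .map(line => Vectors.dense(line.trim.split("\\s+").map(_.toDouble)))

    val model = new StreamingKMeans()
      .setK(3)                        // Number of clusters (K)
      .setDecayFactor(0.5)            // Decay factor
      .setRandomCenters(3, 0.0, 42L)  // dimension = Size of your feature vector

    // Continuously update the clustering model with each incoming micro-batch.
    model.trainOn(features)

    // The up-to-date model can then assign cluster indices to new vectors,
    // which is roughly what tPredict does with the stored model.
    model.predictOn(features).print()

    ssc.start()
    ssc.awaitTermination()
  }
}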
tKMeansStrModel properties for Apache Spark Streaming
These properties are used to configure tKMeansStrModel running in the Spark Streaming Job framework.
The Spark Streaming
tKMeansStrModel component belongs to the Machine Learning family.
This component is available in Talend Real Time Big Data Platform and Talend Data Fabric.
Basic settings
Save on disk |
Select this check box to store the clustering model in HDFS. In this case, you need to enter the time interval (in minutes) at the end of which the up-to-date model is written out. If you clear this check box, your model will be stored in memory. |
Path |
In the Path field, enter the HDFS directory in which the model is to be stored or from which it is to be read; otherwise, the model is kept in memory. This field is available when you select the check boxes used to save a model to or read a model from a file system. |
Load a precomputed model from disk |
Select this check box to use an existing K-Means model stored in the file system as the starting point of the streaming clustering computation.
If you clear this Load a precomputed model from disk check box, a new K-Means model is computed from scratch at runtime. |
Vector to process |
Select the input column used to provide feature vectors. Very often, this column is the output of the feature engineering computations performed by tModelEncoder. This list appears when you have cleared the Load a precomputed model from disk check box. |
Size of your feature vector |
Enter the size of the feature vectors to be processed from the column selected in the Vector to process list. |
Display the vector size |
Select this check box to display the size of the feature vectors to be used in the computation. This feature will slow down your Job but is useful when you do not know the exact size of your feature vectors. |
Number of clusters (K) |
Enter the number of clusters into which you want tKMeansStrModel to cluster data. In general, a large number of clusters can decrease prediction errors but increases the risk of overfitting. This field appears when you have cleared the Load a precomputed model from disk check box to create a new model from scratch. |
Decay factor |
Enter the decay rate (ranging between 0 and 1) to be applied to the previously seen data when the cluster centers are updated. A lower decay rate means more importance is attached to the new data (see the sketch after this table). |
Time unit |
Select the unit on which the decay rate is applied: each data point or each batch of data. |
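The effect of the decay factor can be reasoned about with the forgetful update rule documented for Spark MLlib's StreamingKMeans, which this component presumably relies on. The sketch below shows the rule in one dimension; the helper function and the numbers are purely illustrative.

// Forgetful update rule behind the Decay factor setting (one dimension shown
// for simplicity; real centers are vectors). For each cluster, given the
// previous center c with accumulated weight n, the centroid x of the points
// assigned to it in the new batch (m points), and the decay factor a:
//   cNew = (c * n * a + x * m) / (n * a + m)
//   nNew = n * a + m
// a = 1 keeps the full history; a = 0 uses only the most recent batch
// (or point, depending on the selected Time unit).
def updateCenter(c: Double, n: Double, x: Double, m: Double, a: Double): (Double, Double) = {
  val nNew = n * a + m
  ((c * n * a + x * m) / nNew, nNew)
}

// An old center at 10.0 built from 100 points meets a new batch centroid at
// 0.0 built from 10 points:
println(updateCenter(10.0, 100, 0.0, 10, 1.0)) // (~9.09, 110.0): history dominates
println(updateCenter(10.0, 100, 0.0, 10, 0.1)) // (5.0, 20.0): new data dominates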
Advanced settings
Display the centers after the |
Select this check box to output the vectors of the cluster centers during the execution of the Job. This feature is often useful when you need to understand how the cluster centers evolve as the model is updated (see the sketch below). |
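The cluster centers correspond to what Spark MLlib exposes as clusterCenters on the latest model. The short sketch below shows one way to inspect them directly; it assumes the model and features variables from the earlier sketch and is not part of the component itself.

// Print the current cluster centers after each batch interval; assumes
// `model` is the StreamingKMeans instance and `features` the DStream of
// vectors it is trained on.
features.foreachRDD { _ =>
  model.latestModel().clusterCenters.zipWithIndex.foreach {
    case (center, i) => println(s"center $i: $center")
  }
}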
Usage
Usage rule |
This component is used as an end component and requires an input link. |
Model evaluation |
The parameters you need to set are free parameters and so their values may be provided by previous experiments or empirical guesses. Therefore, you need to train the clustering model you are generating with different sets of parameter values until you obtain the best evaluation result (see the sketch below). |
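One common way to compare the candidate models, assuming the underlying Spark MLlib API, is to score each one on held-out feature vectors with the within-set sum of squared errors (computeCost). The sketch below is illustrative only; the function and variable names are assumptions.

import org.apache.spark.mllib.clustering.StreamingKMeans
import org.apache.spark.mllib.linalg.Vector
import org.apache.spark.rdd.RDD

// Within-set sum of squared errors (WSSSE) of the current model on a
// held-out set of feature vectors; lower is better for a fixed K.
def wssse(model: StreamingKMeans, validation: RDD[Vector]): Double =
  model.latestModel().computeCost(validation)

// Train one StreamingKMeans per candidate parameter set (K, decay factor,
// time unit) on the same stream, then keep the configuration whose WSSSE on
// the validation vectors is lowest, while watching for overfitting as K grows.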
Related scenarios
No scenario is available for the Spark Streaming version of this component
yet.