tNaiveBayesModel
Generates a classifier model that is used by tPredict to
classify given elements.
tNaiveBayesModel analyzes incoming datasets by applying Bayes’ law
with the (naive) assumption that the analyzed features of an element are independent of each
other.
It generates a classification model out of this analysis and writes
this model to a given file system in the PMML (Predictive Model Markup
Language) format.
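The naive independence assumption described above can be illustrated with a small, self-contained sketch. It is not the component's implementation and does not use Spark or PMML; the function names (`train`, `predict`), the add-one smoothing, and the toy weather data are assumptions made purely for illustration.

```python
import math
from collections import Counter, defaultdict


def train(rows, labels):
    """Count class frequencies and per-feature value frequencies."""
    class_counts = Counter(labels)
    # value_counts[label][feature_index][value] -> number of occurrences
    value_counts = defaultdict(lambda: defaultdict(Counter))
    vocab = defaultdict(set)  # distinct values seen for each feature
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[label][i][value] += 1
            vocab[i].add(value)
    return class_counts, value_counts, vocab


def predict(model, row):
    """Return the label maximising log P(label) + sum_i log P(x_i | label)."""
    class_counts, value_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)
        for i, value in enumerate(row):
            # Each feature contributes independently (the "naive" assumption).
            # Add-one smoothing avoids log(0) for values unseen in this class.
            num = value_counts[label][i][value] + 1
            den = count + len(vocab[i])
            score += math.log(num / den)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data: (outlook, temperature) -> play?
rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"), ("rainy", "cool")]
labels = ["no", "no", "yes", "yes"]
model = train(rows, labels)
prediction = predict(model, ("rainy", "hot"))
```

Because the per-feature probabilities are simply multiplied (added in log space), the model only needs one frequency table per feature and class, which is what keeps training cheap.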
In local mode, Apache Spark 1.3.0, 1.4.0, 1.5.0, 1.6.0, 2.0.0, 2.3.0 and 2.4.0 are
supported.
tNaiveBayesModel properties for Apache Spark Batch
These properties are used to configure tNaiveBayesModel running in the Spark Batch Job framework.
The Spark Batch
tNaiveBayesModel component belongs to the Machine Learning family.
This component is available in Talend Platform products with Big Data and
in Talend Data Fabric.
Basic settings
Define a storage configuration |
Select the configuration component to be used to provide the configuration of the target file system. If you leave this check box clear, the target file system is the local system. The configuration component to be used must be present in the same Job. |
Spark version |
Select the Spark version you are using. For Spark V1.4 onwards, the parameters to be set are:
For Spark 1.3, see the parameters explained in the following rows of this table. |
Column type |
Complete this table to define the feature type of each input column in
order to compute the classifier model.
|
Training percentage |
Enter the percentage (expressed in decimal form) of the input data to be used to train the classifier model. |
PMML model path |
Enter the directory in which you need to store the generated classifier model. The button for browsing does not work with the Spark ... tHDFSConfiguration ... For further information about the PMML format used by Naive Bayes ... |
Parquet model name |
Enter the name you need to use for the classifier model. |
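As an illustration of the Training percentage setting above, the sketch below splits a dataset using a decimal fraction. It is a minimal example under stated assumptions: the helper `split_rows`, the value `0.7`, the fixed seed, and the idea that the remaining rows are held out for evaluation are all hypothetical, and the component's actual splitting logic is not described here.

```python
import random


def split_rows(rows, training_fraction, seed=42):
    """Shuffle rows and split them into (training, held_out) lists."""
    if not 0.0 < training_fraction <= 1.0:
        raise ValueError("training_fraction must be in (0, 1]")
    shuffled = list(rows)  # copy so the caller's data is left untouched
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * training_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical input: ten rows, 70% used for training (0.7 in decimal form).
rows = list(range(10))
training, held_out = split_rows(rows, 0.7)
```

Fixing the seed makes the split reproducible, which helps when comparing models trained with different parameter values.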
Usage
Usage rule |
This component is used as an end component and requires an input link. |
||
Model evaluation |
The parameters you need to set are free parameters and so their values ... Therefore, you need to train the classifier model you are generating ...
|
||
Scores |
These scores can be output to the console of the Run view
when you execute the Job, provided that you have added the required logging code to the Log4j view in the Project Settings dialog box.
These scores are output along with the other Log4j INFO-level information. If you want to ... If you are using a subscription-based version of the Studio, the activity of this ... For more information on the log4j logging levels, see the Apache documentation at http://logging.apache.org/log4j/1.2/apidocs/org/apache/log4j/Level.html. |
Related scenarios
No scenario is available for the Spark Batch version of this component
yet.