Creating a classification model to filter spam
This scenario applies only to a subscription-based Talend Platform solution with Big Data or to Talend Data Fabric.
- tModelEncoder: several tModelEncoder components are used to transform given SMS text messages into feature sets.
- tRandomForestModel: it analyzes the features incoming from tModelEncoder to build a classification model that understands what a junk message or a normal message could look like.
- tClassify: in a new Job, it applies this classification model to a new set of SMS text messages to separate the spam from the normal messages. In this scenario, the result of this classification is used to evaluate the accuracy of the model, since the classification of the messages processed by tClassify is already known and explicitly marked.
- A configuration component such as tHDFSConfiguration in each Job: this component is used to connect to the file system to which the jar files dependent on the Job are transferred during the execution of the Job. This file-system-related configuration component is required unless you run your Spark Jobs in the Local mode.
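
The two ideas behind these components that are easiest to illustrate outside the Studio are turning a message into a feature set and checking predictions against labels that are already known. The following is a minimal, standalone Python sketch of both and is not Talend code. The function names `tokenize`, `hashed_term_frequencies` and `accuracy`, the `spam`/`ham` label values and the sample messages are hypothetical. The hashing approach is only one common way to build a feature vector and is an assumption here, since this section does not state which transformations tModelEncoder applies.

```python
import re
import zlib


def tokenize(message):
    """Lowercase a message and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", message.lower())


def hashed_term_frequencies(tokens, num_features=16):
    """Map tokens to a fixed-length count vector (the "hashing trick").

    Assumption: this stands in for whatever feature transformation the
    real Job applies; it is not taken from the Talend components.
    """
    vector = [0] * num_features
    for token in tokens:
        # crc32 is deterministic across runs, unlike Python's built-in hash().
        index = zlib.crc32(token.encode("utf-8")) % num_features
        vector[index] += 1
    return vector


def accuracy(known_labels, predicted_labels):
    """Return the fraction of predictions that match the known labels."""
    if not known_labels or len(known_labels) != len(predicted_labels):
        raise ValueError("label lists must be non-empty and of equal length")
    matches = sum(1 for known, predicted in zip(known_labels, predicted_labels)
                  if known == predicted)
    return matches / len(known_labels)


# Hypothetical example: one message turned into a feature set.
tokens = tokenize("WIN a FREE prize now")
features = hashed_term_frequencies(tokens)

# Hypothetical example: four test messages whose true class is already marked.
known = ["spam", "ham", "ham", "spam"]
predicted = ["spam", "ham", "spam", "spam"]
score = accuracy(known, predicted)  # 3 of 4 predictions match -> 0.75
```

Because the test messages are already marked as spam or normal, the accuracy is simply the share of messages for which the predicted class equals the marked class.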