August 15, 2023

Creating a classification model to filter spam – Docs for ESB 6.x

This scenario applies only to a subscription-based Talend Platform solution with Big Data or Talend Data Fabric.

In this scenario, you create Spark Batch Jobs. The key components to be used are as follows:

  • tModelEncoder: several tModelEncoder components are used to transform given SMS text messages
    into feature sets.

  • tRandomForestModel: it analyzes the features coming from tModelEncoder to build a
    classification model that learns what a junk message and a normal message look like.

  • tClassify: in a new Job, it applies this classification model to a new set of SMS
    text messages to separate the spam from the normal messages. Because the correct
    classification of these messages is already known and explicitly marked, the
    result is used to evaluate the accuracy of the model.

  • A configuration component such as tHDFSConfiguration in each Job: this component connects to the
    file system to which the JAR files that the Job depends on are transferred during the
    execution of the Job.

    This file-system-related configuration component is required unless you run your
    Spark Jobs in the Local mode.
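
The accuracy check described for tClassify can be illustrated outside Talend with a minimal Python sketch. This is not how the component is implemented; it only assumes that each test message carries a known label and a predicted label. The `evaluate` function, the label values and the sample data are hypothetical and chosen for illustration.

```python
from collections import Counter


def evaluate(actual, predicted):
    """Return (accuracy, confusion counts) for two equal-length label lists."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("at least one labelled message is required")
    # Count each (actual, predicted) pair; ("normal", "spam") is a false alarm.
    confusion = Counter(zip(actual, predicted))
    correct = sum(n for (a, p), n in confusion.items() if a == p)
    return correct / len(actual), confusion


# Hypothetical data: the explicit markings versus the labels a model returned.
actual = ["spam", "normal", "normal", "spam", "normal"]
predicted = ["spam", "normal", "spam", "spam", "normal"]

accuracy, confusion = evaluate(actual, predicted)
print(f"accuracy = {accuracy:.2f}")
print(f"normal messages flagged as spam = {confusion[('normal', 'spam')]}")
```

Keeping the per-pair counts alongside the overall accuracy shows which kind of error dominates, for example normal messages wrongly flagged as spam.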

Document sourced from Talend