August 15, 2023

Transforming messages to words – Docs for ESB 6.x

  1. Double-click the tModelEncoder component labelled Tokenize to
    open its Component view. This component tokenizes the SMS messages
    into words.

    use_case-trandomforestmodel4.png

  2. Click the Sync columns button to retrieve the schema from the
    preceding component.
  3. Click the […] button next to Edit schema to open the schema editor.
  4. On the output side, click the [+] button to add one row and, in the
    Column column, rename it to sms_tokenizer_words. This column carries
    the tokenized messages.

    use_case-trandomforestmodel5.png

  5. In the Type column, select Object for the sms_tokenizer_words row.
  6. Click OK to validate these changes.
  7. In the Transformations table, add one row by clicking the [+]
    button and then proceed as follows:

    1. In the Input column column, select the column
      that provides data to be transformed to features. In this scenario, it
      is sms_contents.
    2. In the Output column column, select the column
      that carries the features. In this scenario, it is
      sms_tokenizer_words.
    3. In the Transformation column, select the
      algorithm to be used for the transformation. In this scenario, it is
      Regex tokenizer.
    4. In the Parameters column, enter the parameters
      you want to customize for use in the algorithm you have selected. In
      this scenario, enter
      pattern=\W;minTokenLength=3.
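
The Parameters value is a semicolon-separated list of name=value pairs. As a rough illustration of that structure only, the following Python sketch breaks the string from this scenario into its parts. The helper parse_parameters is hypothetical and is not part of Talend; it makes no claim about how tModelEncoder reads the field internally.

```python
def parse_parameters(raw):
    """Split a 'name=value;name=value' string into a dictionary.

    Hypothetical helper for illustration; not a Talend API.
    """
    params = {}
    for pair in raw.split(";"):
        if not pair.strip():
            continue  # ignore empty segments, e.g. a trailing semicolon
        name, _, value = pair.partition("=")
        params[name.strip()] = value.strip()
    return params


# The parameter string used in this scenario.
params = parse_parameters(r"pattern=\W;minTokenLength=3")
print(params)
```

Here pattern holds the regular expression that separates tokens and minTokenLength holds the minimum length a token needs in order to be kept.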

Using this transformation, tModelEncoder splits each input message on
non-word characters (the \W pattern), keeps only the tokens that contain at
least 3 characters and puts the result in the sms_tokenizer_words column.
Thus currency symbols, punctuation, short numeric values and words such as
a, an or to are excluded from this column.
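
The effect of pattern=\W;minTokenLength=3 can be approximated outside the Studio with Python's standard re module. This is a minimal sketch, not the component's implementation: the function regex_tokenize and the sample message are made up for illustration, and lowercasing the input and restricting \W to ASCII are assumptions about how the underlying tokenizer behaves.

```python
import re


def regex_tokenize(message, pattern=r"\W", min_token_length=3):
    """Split a message on the pattern and drop tokens that are too short.

    Hypothetical helper for illustration; not a Talend API.
    """
    # Assumption: the input is lowercased before it is split.
    # re.ASCII makes \W mean "anything except a-z, A-Z, 0-9 and _".
    tokens = re.split(pattern, message.lower(), flags=re.ASCII)
    # Splitting on single characters yields empty strings between
    # adjacent separators; the length filter removes them as well.
    return [t for t in tokens if len(t) >= min_token_length]


# Made-up sample message, not taken from the scenario's data set.
sample = "Free entry! Txt WIN to claim a £5 prize."
words = regex_tokenize(sample)
print(words)
```

For this sample the sketch keeps free, entry, txt, win, claim and prize, and drops to, a, 5, the currency symbol and the punctuation. Digits count as word characters in a regular expression, so in this sketch a number of three or more digits would pass the length filter.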


Document sourced from Talend: https://help.talend.com