August 15, 2023

Scenario 2: Extracting the stems of English words from a specific DB column – Docs for ESB 6.x

This scenario applies only to a subscription-based Talend Platform solution or Talend Data Fabric.

This scenario describes a six-component Job that carries out linguistic
normalization on the data in the translation column and
extracts the base part (word stem) of every English word.

The aim of this Job is to build a dictionary of the stems of the English
words listed in the translation column. This dictionary
can be used later to check new words before they are added to the
selected table. The extracted English stems are written to an output file
along with the number of times each stem occurs in the
translation column.
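
To make the idea of a stem dictionary with occurrence counts concrete, here is a minimal Python sketch. It is not the algorithm the Job uses: the suffix list, the naive_stem and count_stems helpers, and the sample strings are all hypothetical, and a real stemmer applies far more rules than this one.

```python
from collections import Counter
import re

# Hypothetical, deliberately naive suffix list (an assumption for illustration);
# a real stemmer handles many more cases.
SUFFIXES = ("ing", "ed", "es", "s")


def naive_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least three characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def count_stems(translations):
    """Count how often each stem occurs across all values of the column."""
    counts = Counter()
    for text in translations:
        for word in re.findall(r"[A-Za-z]+", text):
            counts[naive_stem(word)] += 1
    return counts


# Made-up values standing in for the translation column.
sample = ["walking dogs", "walked the dog", "walks"]
stem_counts = count_stems(sample)

# One "stem;count" line per stem, as it might appear in an output file.
output_lines = [f"{stem};{count}" for stem, count in sorted(stem_counts.items())]
```

With the sample values above, "walking", "walked" and "walks" all reduce to the same stem, so the count for that stem reflects every inflected form found in the column.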

In this scenario, we have already stored the main input schema in the
Repository. For more information about storing schema metadata in the
Repository, see the Talend Studio User Guide.

The main input table contains eight columns: id_key, id_lang,
translation, id_status, id_user_trans, id_user_validate,
id_editor and date. We want to extract the stems of the English
words in the translation column.
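
As a rough illustration of the input side, the following Python sketch builds an in-memory SQLite table with the eight columns named above and reads back only the translation column. The table name, the column types, and the sample row are assumptions made for this example; the source specifies only the column names.

```python
import sqlite3

# In-memory database; the table name "translations" and all column types
# are assumptions for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE translations (
        id_key INTEGER PRIMARY KEY,
        id_lang TEXT,
        translation TEXT,
        id_status INTEGER,
        id_user_trans INTEGER,
        id_user_validate INTEGER,
        id_editor INTEGER,
        date TEXT
    )
    """
)

# Made-up sample row, not data from the original scenario.
conn.execute(
    "INSERT INTO translations VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    (1, "en", "walking dogs", 1, 10, 11, 12, "2023-01-01"),
)

# List the column names, then read only the column to be stemmed.
column_names = [d[0] for d in conn.execute("SELECT * FROM translations").description]
rows = [r[0] for r in conn.execute("SELECT translation FROM translations ORDER BY id_key")]
```

Selecting only the translation column keeps the later stemming step independent of the other seven columns.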


Document retrieved from Talend: https://help.talend.com