Scenario 3: Extracting exact match by using Index rules
This scenario applies only to a subscription-based Talend Platform solution or Talend Data Fabric.
In this scenario, you standardize long descriptions of customer products by matching the
input flow against the data contained in an index. The scenario explains how to use Index
rules to tokenize product data and then check each token against an index to extract
exact matches.
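The tokenize-then-look-up idea can be sketched in plain Python. This is only an illustration of the principle, not how Talend implements Index rules: the set BRAND_INDEX, the sample description and both function names are hypothetical, and the sketch matches single-word tokens only, whereas index entries may be sequences of words.

```python
# Hypothetical stand-in for an index: a set of reference entries.
# The real indexes in this scenario are generated by tSynonymOutput.
BRAND_INDEX = {"acme", "globex"}


def tokenize(description):
    """Split a product description into lowercase, whitespace-separated tokens."""
    return description.lower().split()


def extract_exact_matches(description, index):
    """Return the tokens that appear verbatim in the index, in input order."""
    return [token for token in tokenize(description) if token in index]


# Only the token that is identical to an index entry is extracted.
matches = extract_exact_matches("ACME garden hose 25 m green", BRAND_INDEX)
```

A token that merely resembles an entry (for example a misspelled brand name) is not returned, which is what distinguishes an exact match from a fuzzy one.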
For this scenario, you must first create indexes for the brand, range, color and unit of
the customer products. Use a Job with the tSynonymOutput component to generate the indexes
and feed them with entries and synonyms. The capture below shows an example Job:
Below is a sample of the generated indexes for this scenario:
Each of the generated indexes has strings (sequences of words) in one column and their
corresponding synonyms in the second column. These strings are used as reference data
against which the product data, generated by tFixedFlowInput, is matched. For further
information about index creation, see tSynonymOutput.
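As a rough mental model of the two-column layout, an index can be pictured as rows that pair a reference string with its synonyms. The sketch below assumes that picture; the sample rows, build_lookup and standardize are hypothetical names, and a Python dictionary stands in for the index that tSynonymOutput actually generates.

```python
# Hypothetical sample rows mirroring the two-column layout:
# (reference string, list of synonyms). Not taken from the real indexes.
COLOR_ROWS = [
    ("green", ["grn", "verde"]),
    ("blue", ["blu", "bleu"]),
]


def build_lookup(rows):
    """Map every reference string and every synonym to its reference string."""
    lookup = {}
    for entry, synonyms in rows:
        lookup[entry.lower()] = entry
        for synonym in synonyms:
            lookup[synonym.lower()] = entry
    return lookup


def standardize(token, lookup):
    """Return the reference string for a token, or None when nothing matches exactly."""
    return lookup.get(token.lower())


color_lookup = build_lookup(COLOR_ROWS)
standard_color = standardize("GRN", color_lookup)
```

Keeping one lookup per attribute (brand, range, color, unit) mirrors the separate indexes created for this scenario and lets each token be checked against every attribute in turn.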
In this scenario, the generated indexes are defined as context variables. For further
information about context variables, see Talend Studio User Guide.