July 30, 2023

tSynonymOutput – Docs for ESB 7.x

tSynonymOutput

Creates a Lucene index and feeds it with entries and the related synonyms it
receives.

tSynonymOutput creates synonym indexes that components such as
tStandardizeRow or tSynonymSearch can refer to when processing data.

For further information about how to access and manage the words
and the reference entries (documents) of an existing synonym index
using the synonym index editor, see the Talend Studio User Guide.

For further information about available synonym indexes, see the
appendix about data synonym dictionaries in the Talend Studio User Guide.

Note: This component was enhanced in Studio version 7.3. If your indexes
were created with version 7.2 or lower, you need to update them. The
location of the migration procedure depends on the Studio installation:

  • With the installer: /addons/scripts/Lucene_Migration_Tool/README.md
  • With no installer: in the license email, click the link in Migration tool for Lucene Indexes from version 4 to version 8

tSynonymOutput Standard properties

These properties are used to configure tSynonymOutput running in the Standard Job framework.

The Standard tSynonymOutput component belongs to the Data Quality family.

The component in this framework is available in Talend Data Management Platform, Talend Big Data Platform, Talend Real Time Big Data Platform, Talend Data Services Platform, Talend MDM Platform and in Talend Data Fabric.

Basic settings

Schema and Edit schema

A schema is a row description. It defines the number of fields to be processed and
passed on to the next component. The schema is either Built-in or stored remotely in the
Repository.

 

Built-in: The schema is created and stored locally for this component
only. Related topic: see Talend Studio User Guide.

 

Repository: The schema already exists and is stored in the Repository,
so it can be reused in various projects and Job designs. Related topic:
see Talend Studio User Guide.

Index path

Type in or browse to the location where you want to create and
store the synonym index. If the specified directory does not exist,
the component will create it.

Operations

Select the index operation to be performed in the directory given in
the Index path field.

(Delete and) initialize an index:
creates a new index and then fills it with the entries and the
corresponding synonyms; if an index already exists, deletes it
before creating a new one.

Insert new documents: inserts new
entries and synonyms into the given existing index. Duplicates are
not inserted.

Update existing documents and insert if not existing: updates existing
entries and synonyms, and adds new ones to the given index.

Delete existing documents: deletes
the entries with their synonyms if the same entries are identified
in the incoming data flow from the preceding component.
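The four operations can be pictured with a small in-memory model. The sketch below is illustrative only: it uses a Map instead of a Lucene index, the class and method names (SynonymIndexModel, initialize, insert, upsert, delete) are hypothetical, and the sample city values are made up. It assumes that "Duplicates are not inserted" means an incoming entry is skipped when the same entry already exists.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative in-memory model of the four operations (not Talend's implementation). */
final class SynonymIndexModel {
    // entry -> synonyms; the real component stores documents in a Lucene index on disk
    private final Map<String, List<String>> docs = new LinkedHashMap<>();

    /** (Delete and) initialize an index: drop everything, then load the incoming rows. */
    void initialize(Map<String, List<String>> rows) {
        docs.clear();
        docs.putAll(rows);
    }

    /** Insert new documents: entries that already exist are skipped (assumption). */
    void insert(Map<String, List<String>> rows) {
        for (Map.Entry<String, List<String>> row : rows.entrySet()) {
            docs.putIfAbsent(row.getKey(), row.getValue());
        }
    }

    /** Update existing documents and insert if not existing. */
    void upsert(Map<String, List<String>> rows) {
        docs.putAll(rows);
    }

    /** Delete existing documents: remove the entries found in the incoming flow. */
    void delete(List<String> entries) {
        for (String entry : entries) {
            docs.remove(entry);
        }
    }

    /** Returns a copy of the current contents, in insertion order. */
    Map<String, List<String>> snapshot() {
        return new LinkedHashMap<>(docs);
    }

    public static void main(String[] args) {
        SynonymIndexModel index = new SynonymIndexModel();
        index.initialize(Map.of("Paris", List.of("Paname")));
        index.insert(Map.of("Paris", List.of("ignored"), "Lyon", List.of("Lugdunum")));
        index.upsert(Map.of("Lyon", List.of("Lugdunum", "Lyons")));
        index.delete(List.of("Paris"));
        System.out.println(index.snapshot()); // prints {Lyon=[Lugdunum, Lyons]}
    }
}
```

In this model the only difference between insert and upsert is whether an existing entry is overwritten, which matches the distinction the descriptions above draw.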

Entry

Select the column whose values become the entries of the given index.
Each entry serves as the reference for the associated synonyms that are
inserted alongside it in the index.

Synonyms

Select the column whose values provide the synonyms corresponding to
the index entries.

Synonym separator

Type in the separator to be used to separate the synonyms of each
index entry. By default, this separator is |.
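When a separated synonym string is handled in Java code outside the component, note that the default separator | is also a regular-expression metacharacter, so it has to be quoted before being passed to String.split. The sketch below shows one way to do this; the class name SynonymSplitter and the sample values are hypothetical, and the code does not claim to mirror how the component itself parses the column.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Illustrative helper that splits a synonym string on a literal separator. */
final class SynonymSplitter {

    /** Splits on the literal separator, trimming blanks and dropping empty parts. */
    static List<String> split(String synonyms, String separator) {
        List<String> result = new ArrayList<>();
        if (synonyms == null || synonyms.isEmpty()) {
            return result;
        }
        // Pattern.quote treats "|" as plain text rather than as regex alternation
        for (String part : synonyms.split(Pattern.quote(separator))) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                result.add(trimmed);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(split("Bob|Bobby| Rob", "|")); // prints [Bob, Bobby, Rob]
    }
}
```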

Advanced settings

tStatCatcher Statistics

Select this check box to collect log data at the Job and the
component levels.

Global Variables

ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable and it returns a string. This
variable functions only if the Die on error check box is
cleared, if the component has this check box.

A Flow variable functions during the execution of a component while an After variable
functions after the execution of the component.

To fill up a field or expression with a variable, press Ctrl + Space
to access the variable list and choose the variable to use from it.

For further information about variables, see Talend Studio User Guide.
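In Talend-generated Java code, component variables are typically read from the Job's globalMap using a key of the form component label, underscore, variable name. The sketch below simulates that map with a plain HashMap so it can run outside the Studio. The component label tSynonymOutput_1, the helper class ErrorMessageLookup and the sample message are assumptions for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative lookup of an After variable in a simulated globalMap. */
final class ErrorMessageLookup {

    /** Returns the component's ERROR_MESSAGE, or a fallback text when it is unset. */
    static String errorMessage(Map<String, Object> globalMap, String componentName) {
        Object value = globalMap.get(componentName + "_ERROR_MESSAGE");
        return value == null ? "(no error reported)" : (String) value;
    }

    public static void main(String[] args) {
        // Stand-in for the globalMap that a Talend Job provides at runtime
        Map<String, Object> globalMap = new HashMap<>();
        System.out.println(errorMessage(globalMap, "tSynonymOutput_1"));

        // Sample value only; a real message would be set by the component on failure
        globalMap.put("tSynonymOutput_1_ERROR_MESSAGE", "Index path is not writable");
        System.out.println(errorMessage(globalMap, "tSynonymOutput_1"));
    }
}
```

Inside a Job, the equivalent expression in a downstream component would be ((String) globalMap.get("tSynonymOutput_1_ERROR_MESSAGE")), evaluated after tSynonymOutput has finished because this is an After variable.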

Usage

Usage rule

This component needs incoming data from the preceding component
for creating or updating indexes.

Connections

Outgoing links (from this component to another):

Row: Main; Reject

Trigger: Run if; On Component Ok;
On Component Error.

Incoming links (from one component to this one):

Row: Main; Reject

For further information regarding connections, see
Talend Studio User Guide.

Creating a synonym index for city names

This scenario applies only to Talend Data Management Platform, Talend Big Data Platform, Talend Real Time Big Data Platform, Talend Data Services Platform, Talend MDM Platform and Talend Data Fabric.

In this scenario, a three-component Job creates an index of the standardized city
names that provides references to the city synonyms used in the client data of an
enterprise.

To create this index, you need a source file to provide the city names and their
corresponding synonyms. In this scenario, this is a .csv file and
reads as follows:

Two columns are found in this file:

  • the left one is the CityName column which holds the
    standard city names as reference data.

  • the right one is the Synonyms column which holds various
    synonyms collected across the client data of this enterprise.

The three components used in this Job are:

  • tFileInputDelimited: this component loads
    data from the source file and inputs them to tSynonymOutput.

  • tSynonymOutput: this component creates the
    index of interest in this scenario and feeds it with the synonyms given in the
    source file.

  • tLogRow: this component lists the data that
    have been inserted into the newly created index.

tSynonymOutput_1.png

Setting up the Job

To replicate this scenario, proceed as follows:

  1. Drop tFileInputDelimited, tSynonymOutput and tLogRow from the Palette
    onto the design workspace.

    You can change the displayed name of each of these components, as has
    been done for the tFileInputDelimited component, which appears as
    CityNames in this scenario. For further information, see
    Talend Studio User Guide.
  2. Right-click the tFileInputDelimited
    (CityNames) component to open the contextual
    menu.
  3. From this menu, select Row > Main.
  4. Click the tSynonymOutput component to
    create a connection between these two components.
  5. Do the same thing to connect tSynonymOutput to tLogRow.

Configuring the components

  1. Double-click tFileInputDelimited
    (CityNames) to open its Basic settings view.

    tSynonymOutput_2.png

  2. In the File name/Stream field, specify
    the path to the input file.
  3. Click the […] button next to Edit schema to open the Schema dialog box, click the [+] button twice to add two columns, and name them
    respectively CityName and Synonyms
    corresponding to the input file structure.

    When done, click OK to close the dialog
    box and propagate the schema setting to the next component.
    tSynonymOutput_3.png

    You can also set up this tFileInputDelimited component from the
    file metadata stored in the Repository, which lets you reuse the
    configuration of that metadata automatically. For further information about
    how to create and use this metadata, see Talend Studio User Guide.
  4. Double-click tSynonymOutput to open its
    Basic settings view.

    tSynonymOutput_4.png

  5. In the Index path field, type in or
    browse to the location where you need to create the index.
  6. In the Operation field, select the
    operation you need to perform on this created index as well as the related
    synonyms. In this example, select (Delete and) initialize an index.
  7. In the Entry field, select the column to
    be used to receive and store the standard reference data. In the source file
    used in this scenario, the CityName column is holding
    the standard city names, so select CityName.
  8. In the Synonyms field, select the column
    to be used to receive and store the synonyms. In this scenario, select
    Synonyms.
  9. In the Basic settings view of the
    tLogRow component, select the Table option for a more readable display of the
    Job execution result.

Executing the Job

Press F6 to run this Job.

An index is created in the specified directory, and the city names and
their synonyms are inserted into the index. These entries, along with their
status, are displayed on the Console.
tSynonymOutput_5.png

Creating a synonym index for people names using tMap

This scenario applies only to Talend Data Management Platform, Talend Big Data Platform, Talend Real Time Big Data Platform, Talend Data Services Platform, Talend MDM Platform and Talend Data Fabric.

In this scenario, a four-component Job creates an index storing people's first names
and their related nicknames.

The source data to be used in this scenario is stored in a .csv file, an extract of which is shown below:

This data describes people’s home country (not to be inserted into the index),
first names (reference entries) and frequently used nicknames (synonyms).

The four components used in this Job are:

  • tFileInputDelimited: this component reads the source
    data and inputs them to tSynonymOutput.

  • tMap: this component transforms the source
    data into two separate columns holding the first names and the nicknames, while
    ignoring the people’s home country information.

  • tSynonymOutput: this component creates the index of
    interest in this scenario and feeds it with the synonyms given in the source file.

  • tLogRow: this component lists the data that have
    been inserted into the newly created index.

tSynonymOutput_6.png

Setting up the Job

To replicate this scenario, proceed as follows:

  1. Drop tFileInputDelimited, tMap, tSynonymOutput and tLogRow
    from the Palette onto the design
    workspace.

    You can change the displayed name of each of these components. For further
    information, see Talend Studio User Guide.
  2. Right-click the tFileInputDelimited
    component to open the contextual menu, and select Row > Main
    to connect it with the tMap component.
  3. Do the same to connect tMap to
    tSynonymOutput using a Row > Main link.

    A dialog box pops up to prompt you to name this link you are creating.
    tSynonymOutput_7.png

  4. Type in synonyms, for example, then click OK to validate this name and thus close this
    dialog box.
  5. Connect tSynonymOutput to
    tLogRow using a Row > Main link again.

Configuring the components

Configure the data input

  1. Double-click tFileInputDelimited to open
    its Component view.

    tSynonymOutput_8.png

  2. In the File name/Stream field, specify
    the path to the input file.
  3. Click the […] button next to Edit schema to open the Schema dialog box, click the [+] button to add six columns and name them
    Country, FirstName, Nickname1,
    Nickname2, Nickname3 and
    Nickname4 corresponding to the input file
    structure.

    When done, click OK to close the dialog
    box and propagate the schema setting to the next component.
    tSynonymOutput_9.png

    You can also set up this tFileInputDelimited component from the
    file metadata stored in the Repository, which lets you reuse the
    configuration of that metadata automatically. For further information about
    how to create and use this metadata, see Talend Studio User Guide.

Configure data structure transformation

  1. Double-click tMap to open the map
    editor.

    tSynonymOutput_10.png

  2. At the bottom right corner (synonyms) of the
    Schema editor view, click the [+] button to add two rows and name them
    FirstName and Nicknames. These
    two columns appear in the synonyms table on the right
    side of the map editor.
  3. On the input side (left) of the upper part, select the
    FirstName column and drop it to the
    FirstName column on the output side (right).
  4. In the Expression field of the Nicknames column on the output side (right),
    type in DqStringHandling.safeConcat('|',).
  5. On the input side (left) of the upper part, select in sequence the
    columns from Nickname1 to
    Nickname4 and drop them onto the
    Nicknames column, then edit the expression in the
    Expression field so that it reads
    DqStringHandling.safeConcat('|', row1.Nickname1, row1.Nickname2, row1.Nickname3, row1.Nickname4).
  6. Click OK to validate these changes and
    accept the propagation prompted by the dialog box that pops up.
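The tMap expression configured in the steps above merges the four nickname columns into a single string separated by |, which is the shape the Synonyms column of tSynonymOutput expects. The sketch below is a hypothetical stand-in written for illustration, not Talend's DqStringHandling code. It assumes that "safe" means null or empty nicknames are skipped, and the class name NicknameConcat and the sample names are made up.

```java
import java.util.StringJoiner;

/** Illustrative stand-in for a null-tolerant concatenation of nickname columns. */
final class NicknameConcat {

    /** Joins the non-null, non-empty values with the separator (assumed behaviour). */
    static String concat(char separator, String... values) {
        StringJoiner joiner = new StringJoiner(String.valueOf(separator));
        for (String value : values) {
            if (value != null && !value.isEmpty()) {
                joiner.add(value);
            }
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        // A row with only two of the four nickname columns filled in
        System.out.println(concat('|', "Bob", null, "Bobby", "")); // prints Bob|Bobby
    }
}
```

Skipping missing values keeps rows with fewer than four nicknames from producing empty synonyms such as a trailing separator.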

Configure index creation and console output

  1. Double-click tSynonymOutput to open its
    Basic settings view.

    tSynonymOutput_11.png

  2. In the Index path field, type in or
    browse to the location where you need to create the index.
  3. In the Operation field, select the
    operation you need to perform on this created index as well as the related
    synonyms. In this example, select (Delete and) initialize an index.
  4. In the Entry field, select the column to
    be used to receive and store the reference entries. In this scenario, the
    FirstName column is holding the reference entries,
    so select FirstName.
  5. In the Synonyms field, select the column
    to be used to receive and store the synonyms. In this scenario, select
    Nicknames.
  6. In the Basic settings view of the
    tLogRow component, select the Table option for a more readable display of the
    Job execution result.

Executing the Job

Press F6 to run this Job.

The index is created and you can view its contents and the entry status on
the Console.
tSynonymOutput_12.png


Document sourced from Talend: https://help.talend.com