tSynonymOutput

Creates a Lucene index and feeds it with entries and the related synonyms it
receives.

tSynonymOutput creates synonym
indexes that some components like tStandardizeRow or
tSynonymSearch can refer to when processing data.

For further information about how to access and manage the words
and the reference entries (documents) of an existing synonym index
using the synonym index editor, see the
Talend Studio User Guide.

For further information about available synonym indexes, see the
appendix about data synonym dictionaries in the
Talend Studio User
Guide.

Note: This component has been enhanced as of Studio version
7.3. If your indexes were created with version 7.2 or lower, you need to update them. The
location of the migration procedure depends on how Studio was installed:

With the installer: /addons/scripts/Lucene_Migration_Tool/README.md
With no installer: in the license email, click the link in Migration tool for Lucene Indexes from version 4 to version 8

tSynonymOutput Standard properties

These properties are used to configure tSynonymOutput running in the Standard Job framework.

The Standard
tSynonymOutput component belongs to the Data Quality family.

The component in this framework is available in Talend Data Management Platform, Talend Big Data Platform, Talend Real Time Big Data Platform, Talend Data Services Platform, Talend MDM Platform and in Talend Data Fabric.

Basic settings

Schema and Edit schema	A schema is a row description: it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.
	Built-in: The schema is created and stored locally for this component only. Related topic: see Talend Studio User Guide.
	Repository: The schema already exists and is stored in the Repository, so it can be reused in various projects and Job designs. Related topic: see Talend Studio User Guide.
Index path	Type in or browse to the location where you want to create and store the synonym index. If the specified directory does not exist, the component will create it.
Operations	Select the index operation to be performed in the directory given in the Index path field.
	(Delete and) initialize an index: creates a new index and then fills it with the entries and the corresponding synonyms; if an index already exists, deletes it before creating a new one.
	Insert new documents: inserts new entries and synonyms into the given existing index. Duplicates are not inserted.
	Update existing documents and insert if not existing: updates existing entries and synonyms, and adds new ones to the given index.
	Delete existing documents: deletes the entries with their synonyms if the same entries are identified in the incoming data flow from the preceding component.
Entry	Select the column whose values are inserted as the entries of the given index. These entries serve as the reference for the associated synonyms inserted alongside them in the index.
Synonyms	Select the column whose values are inserted as the synonyms of the corresponding index entries.
Synonym separator	Type in the separator used between the synonyms of each index entry. By default, this separator is `|`.
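
The four operations above can be pictured with a small in-memory model. This is a conceptual sketch only, not the component's implementation: a plain dictionary stands in for the Lucene index, the function name apply_operation and the short operation labels are hypothetical, and it assumes that a "duplicate" means an entry that is already present in the index.

```python
def apply_operation(index, rows, operation):
    """Return a new dict modelling the index after the operation is applied.

    index: dict mapping entry -> list of synonyms (stand-in for the real index)
    rows: iterable of (entry, synonyms) pairs from the incoming flow
    operation: "initialize", "insert", "update" or "delete" (hypothetical labels)
    """
    if operation == "initialize":
        # Any existing content is discarded before the new index is filled.
        return {entry: list(synonyms) for entry, synonyms in rows}

    result = {entry: list(synonyms) for entry, synonyms in index.items()}
    for entry, synonyms in rows:
        if operation == "insert":
            # Assumption: an entry that already exists counts as a duplicate.
            result.setdefault(entry, list(synonyms))
        elif operation == "update":
            # Existing entries are overwritten, missing ones are added.
            result[entry] = list(synonyms)
        elif operation == "delete":
            # Entries found in the incoming flow are removed with their synonyms.
            result.pop(entry, None)
        else:
            raise ValueError("unknown operation: " + operation)
    return result


index = apply_operation({}, [("New York", ["NY", "New York"])], "initialize")
index = apply_operation(index, [("New York", ["Big Apple"]), ("Dedham", ["deadham"])], "insert")
```

In this model the second call leaves the New York entry unchanged and only adds Dedham, which mirrors the "Duplicates are not inserted" behavior described for Insert new documents.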

Advanced settings

tStatCatcher Statistics	Select this check box to collect log data at the Job and the component levels.

Global Variables

Global Variables	ERROR_MESSAGE: the error message generated by the component when an error occurs. This is an After variable and it returns a string. This variable functions only if the Die on error check box is cleared, if the component has this check box. A Flow variable functions during the execution of a component while an After variable functions after the execution of the component. To fill up a field or expression with a variable, press Ctrl + Space to access the variable list and choose the variable to use from it. For further information about variables, see Talend Studio User Guide.

Usage

Usage rule	This component needs incoming data from the preceding component for creating or updating indexes.
Connections	Outgoing links (from this component to another):
	Row: Main; Reject
	Trigger: Run if; On Component Ok; On Component Error.
	Incoming links (from one component to this one):
	Row: Main; Reject
	For further information regarding connections, see Talend Studio User Guide.

Creating a synonym index for city names

This scenario applies only to Talend Data Management Platform, Talend Big Data Platform, Talend Real Time Big Data Platform, Talend Data Services Platform, Talend MDM Platform and Talend Data Fabric.

In this scenario, a three-component Job creates an index of the standardized city
names that provides references to the city synonyms used in the client data of an
enterprise.

To create this index, you need a source file to provide the city names and their
corresponding synonyms. In this scenario, this is a .csv file and
reads as follows:

CityName;Synonyms
North Reading;Redding|North Reading|N. Reading|N Reading|N Redding|NR
Young America;YA|Young America
Dedham;Dedham|dedham|deadham
New York;NY|New York

Two columns are found in this file:

the left one is the CityName column which holds the
standard city names as reference data.
the right one is the Synonyms column which holds various
synonyms collected across the client data of this enterprise.
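
To make the file layout concrete, the sketch below reads the sample with Python's standard csv module and splits the Synonyms column on the default `|` separator. It only illustrates the data structure and is not how the Job itself reads the file; the names SAMPLE and read_synonym_rows are hypothetical.

```python
import csv
import io

# The sample source file shown above, embedded as a string for illustration.
SAMPLE = """CityName;Synonyms
North Reading;Redding|North Reading|N. Reading|N Reading|N Redding|NR
Young America;YA|Young America
Dedham;Dedham|dedham|deadham
New York;NY|New York
"""


def read_synonym_rows(text, field_delimiter=";", synonym_separator="|"):
    """Return a list of (entry, [synonyms]) pairs, one per data row."""
    reader = csv.DictReader(io.StringIO(text), delimiter=field_delimiter)
    return [
        (row["CityName"], row["Synonyms"].split(synonym_separator))
        for row in reader
    ]


rows = read_synonym_rows(SAMPLE)
```

Each pair holds one reference entry from CityName and the list of synonyms that the Synonyms column carries for it.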

The three components used in this Job are:

tFileInputDelimited: this component loads
data from the source file and inputs them to tSynonymOutput.
tSynonymOutput: this component creates the
index of interest in this scenario and feeds it with the synonyms given in the
source file.
tLogRow: this component lists the data that
have been inserted into the newly created index.

Setting up the Job

To replicate this scenario, proceed as follows:

Drop tFileInputDelimited, tSynonymOutput and tLogRow from the Palette
onto the design workspace.

You can change the displayed name of each of these components, as has
been done for the tFileInputDelimited
component, which appears as CityNames in this scenario.
For further information, see
Talend Studio User
Guide.
Right-click the tFileInputDelimited
(CityNames) component to open the contextual
menu.
From this menu, select Row >
Main.
Click the tSynonymOutput component to
create a connection between these two components.
Do the same to connect tSynonymOutput to tLogRow.

Configuring the components

Double-click tFileInputDelimited
(CityNames) to open its Basic
settings view.
In the File name/Stream field, specify
the path to the input file.
Click the […] button next to Edit schema to open the Schema dialog box, click the [+] button twice to add two columns, and name them
respectively CityName and Synonyms
corresponding to the input file structure.

When done, click OK to close the dialog
box and propagate the schema setting to the next component.

You can also configure this tFileInputDelimited
component using the metadata stored in the Repository. This lets you reuse the
configuration of the corresponding metadata automatically. For further information about
how to create and use this metadata, see
Talend Studio User
Guide.
Double-click tSynonymOutput to open its
Basic settings view.
In the Index path field, type in or
browse to the location where you need to create the index.
In the Operation field, select the
operation you need to perform on the index and its
synonyms. In this example, select (Delete and) initialize an
index.
In the Entry field, select the column to
be used to receive and store the standard reference data. In the source file
used in this scenario, the CityName column is holding
the standard city names, so select CityName.
In the Synonyms field, select the column
to be used to receive and store the synonyms. In this scenario, select
Synonyms.
In the Basic settings view of the
tLogRow component, select the Table option for a more readable display of the
Job execution result.

Executing the Job

Press F6 to run this Job.

An index is created in the specified directory, and the city names and
their synonyms are inserted into the index. These entries, along with their
status, are displayed on the Console.

Creating a synonym index for people names using tMap

This scenario applies only to Talend Data Management Platform, Talend Big Data Platform, Talend Real Time Big Data Platform, Talend Data Services Platform, Talend MDM Platform and Talend Data Fabric.

In this scenario, a four-component Job creates an index storing people's names
and their related nicknames.

The source data to be used in this scenario is stored in a .csv file, an extract of which is shown below:

Country;FirstName;Nickname1;Nickname2;Nickname3;Nickname4
France;Anne;Ninon;Annie;Ninette;Ann
France;Bernadette;Nad;Netty;Dadette
France;Albert;Al
France;Alexandre;Alex
France;Alfred-Hubert;Alu
France;Andrew;Andy
France;Anthony;Anton;Tony;Tonio
France;Artus;Artie
France;Benoit;Ben
France;Catherine;Cate;Katherine;Kathryn
France;Charles;Charlie;Charlot;Chuck
France;Christophe;Christian;Chris;Kris;Kristof
France;Christian;Chris

This data describes people’s home country (not to be inserted into the index),
first names (reference entries) and frequently used nicknames (synonyms).

The four components used in this Job are:

tFileInputDelimited: this component reads the source
data and inputs them to tSynonymOutput.
tMap: this component is used to transform the source
data into two separate columns representing the first names and the nicknames, while
ignoring the people’s home country information.
tSynonymOutput: this component creates the index of
interest in this scenario and feeds it with the synonyms given in the source file.
tLogRow: this component lists the data that have
been inserted into the newly created index.
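
The reshaping that tMap performs in this Job can be sketched as follows. The actual tMap expression is not shown in this section, so this is only an illustration under stated assumptions: the function name to_entry_and_synonyms is hypothetical, and the nicknames are assumed to be joined with the default `|` synonym separator.

```python
def to_entry_and_synonyms(fields, synonym_separator="|"):
    """Map one source row to (first name, joined nicknames).

    fields: Country, FirstName, then up to four nickname values.
    The Country value at position 0 is ignored.
    """
    first_name = fields[1]
    # Rows may carry fewer than four nicknames; skip missing or empty ones.
    nicknames = [nickname for nickname in fields[2:] if nickname]
    return first_name, synonym_separator.join(nicknames)


line = "France;Bernadette;Nad;Netty;Dadette"
entry, synonyms = to_entry_and_synonyms(line.split(";"))
```

The resulting pair matches the two-column shape (entry and synonyms) that tSynonymOutput expects from the preceding component.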