tSynonymOutput
Creates a Lucene index and feeds it with entries and the related synonyms it
receives.
tSynonymOutput creates synonym
indexes that some components like tStandardizeRow or
tSynonymSearch can refer to when processing data.
For further information about how to access and manage the words
and the reference entries (documents) of an existing synonym index
using the synonym index editor, see the
Talend Studio User Guide.
For further information about available synonym indexes, see the
appendix about data synonym dictionaries in the
Talend Studio User
Guide.
7.3. If your indexes were created with version 7.2 or lower, you need to update them. The
location of the migration procedure depends on the Studio installation:
- With the installer: /addons/scripts/Lucene_Migration_Tool/README.md
- With no installer: in the license email, click the link in Migration tool for Lucene Indexes from version 4 to version 8
tSynonymOutput Standard properties
These properties are used to configure tSynonymOutput running in the Standard Job framework.
The Standard
tSynonymOutput component belongs to the Data Quality family.
The component in this framework is available in Talend Data Management Platform, Talend Big Data Platform, Talend Real Time Big Data Platform, Talend Data Services Platform, Talend MDM Platform and in Talend Data Fabric.
Basic settings
Schema and Edit |
A schema is a row description, it defines the number of fields to be processed and |
 |
Built-in: The schema will be |
 |
Repository: The schema already |
Index path |
Type in or browse to the location where you want to create and |
Operations |
Select the index operation to be performed in directory given in
(Delete and) initialize an index:
Insert new documents: inserts new
Update existing documents and insert if not
Delete existing documents: deletes |
Entry |
Select the column you need to insert to create the entries of the |
Synonyms |
Select the column you need to insert to create the synonyms |
Synonym separator |
Type in the separator to be used to separate the synonyms of each |
Advanced settings
tStatCatcher Statistics |
Select this check box to collect log data at the Job and the |
Global Variables
Global Variables |
ERROR_MESSAGE: the error message generated by the A Flow variable functions during the execution of a component while an After variable To fill up a field or expression with a variable, press Ctrl + For further information about variables, see |
Usage
Usage rule |
This component needs incoming data from the preceding component |
Connections |
Outgoing links (from this component to another):
Row: Main; Reject
Trigger: Run if; On Component Ok;
Incoming links (from one component to this one):
Row: Main; Reject
For further information regarding connections, see |
Creating a synonym index for city names
This scenario applies only to Talend Data Management Platform, Talend Big Data Platform, Talend Real Time Big Data Platform, Talend Data Services Platform, Talend MDM Platform and Talend Data Fabric.
In this scenario, a three-component Job creates an index of the standardized city
names that provides references to the city synonyms used in the client data of an
enterprise.
To create this index, you need a source file to provide the city names and their
corresponding synonyms. In this scenario, this is a .csv file and
reads as follows:
CityName;Synonyms
North Reading;Redding|North Reading|N. Reading|N Reading|N Redding|NR
Young America;YA|Young America
Dedham;Dedham|dedham|deadham
New York;NY|New York
Two columns are found in this file:
- the left one is the CityName column, which holds the standard city names as reference data.
- the right one is the Synonyms column, which holds various synonyms collected across the client data of this enterprise.
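To illustrate how each line of this file breaks down into a reference entry and its synonyms, here is a minimal Java sketch. The class CitySynonymLine and its parse method are hypothetical names introduced for this example only; they are not part of Talend Studio, and the sketch assumes that ';' separates the two columns and '|' separates the synonyms, as in the sample file.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration, not a Talend class: holds one line of
// the sample file as a reference entry plus its list of synonyms.
class CitySynonymLine {
    final String entry;
    final List<String> synonyms;

    CitySynonymLine(String entry, List<String> synonyms) {
        this.entry = entry;
        this.synonyms = synonyms;
    }

    // Assumption: ';' separates the two columns, '|' separates the synonyms.
    static CitySynonymLine parse(String line) {
        // Limit of 2 keeps everything after the first ';' in one piece.
        String[] columns = line.split(";", 2);
        String entry = columns[0].trim();
        List<String> synonyms = new ArrayList<>();
        if (columns.length > 1) {
            // '|' is a regex metacharacter, so it must be escaped for split().
            for (String candidate : columns[1].split("\\|")) {
                String synonym = candidate.trim();
                if (!synonym.isEmpty()) {
                    synonyms.add(synonym);
                }
            }
        }
        return new CitySynonymLine(entry, synonyms);
    }

    public static void main(String[] args) {
        CitySynonymLine row = parse("Young America;YA|Young America");
        // Prints: Young America -> [YA, Young America]
        System.out.println(row.entry + " -> " + row.synonyms);
    }
}
```

In the Job itself no such parsing code is needed: tFileInputDelimited splits the columns, and the separator between synonyms is declared in the Synonym separator field of tSynonymOutput.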
The three components used in this Job are:
- tFileInputDelimited: this component loads data from the source file and passes it to tSynonymOutput.
- tSynonymOutput: this component creates the index of interest in this scenario and feeds it with the synonyms given in the source file.
- tLogRow: this component lists the data that has been inserted into the newly created index.

Setting up the Job
To replicate this scenario, proceed as follows:
- Drop tFileInputDelimited, tSynonymOutput and tLogRow from the Palette onto the design workspace.
  You can change the displayed name of each of these components, as has been done for the tFileInputDelimited component, which appears as CityNames in this scenario. For further information, see Talend Studio User Guide.
- Right-click the tFileInputDelimited (CityNames) component to open the contextual menu.
- From this menu, select Row > Main.
- Click the tSynonymOutput component to create a connection between these two components.
- Do the same thing to connect tSynonymOutput to tLogRow.
Configuring the components
- Double-click tFileInputDelimited (CityNames) to open its Basic settings view.
- In the File name/Stream field, specify the path to the input file.
- Click the […] button next to Edit schema to open the Schema dialog box, click the [+] button twice to add two columns, and name them CityName and Synonyms respectively, corresponding to the input file structure.
  When done, click OK to close the dialog box and propagate the schema setting to the next component.
  You can also configure this tFileInputDelimited component from the established metadata stored in the Repository. This allows you to reuse the configuration of the corresponding metadata automatically. For further information about how to create and use this metadata, see Talend Studio User Guide.
- Double-click tSynonymOutput to open its Basic settings view.
- In the Index path field, type in or browse to the location where you need to create the index.
- In the Operation field, select the operation you need to perform on this index and the related synonyms. In this example, select (Delete and) initialize an index.
- In the Entry field, select the column to be used to receive and store the standard reference data. In the source file used in this scenario, the CityName column holds the standard city names, so select CityName.
- In the Synonyms field, select the column to be used to receive and store the synonyms. In this scenario, select Synonyms.
- In the Basic settings view of the tLogRow component, select the Table option for a more readable display of the Job execution result.
Executing the Job
When the Job is executed, the city names and their synonyms are inserted into the index. These entries, along with their status, are displayed on the Console.

Creating a synonym index for people names using tMap
This scenario applies only to Talend Data Management Platform, Talend Big Data Platform, Talend Real Time Big Data Platform, Talend Data Services Platform, Talend MDM Platform and Talend Data Fabric.
In this scenario, a four-component Job creates an index storing people's first names and their related nicknames.
The source data to be used in this scenario is stored in a .csv file, an extract of which is shown below:
Country;FirstName;Nickname1;Nickname2;Nickname3;Nickname4
France;Anne;Ninon;Annie;Ninette;Ann
France;Bernadette;Nad;Netty;Dadette
France;Albert;Al
France;Alexandre;Alex
France;Alfred-Hubert;Alu
France;Andrew;Andy
France;Anthony;Anton;Tony;Tonio
France;Artus;Artie
France;Benoit;Ben
France;Catherine;Cate;Katherine;Kathryn
France;Charles;Charlie;Charlot;Chuck
France;Christophe;Christian;Chris;Kris;Kristof
France;Christian;Chris
This data describes people’s home country (not to be inserted into the index),
first names (reference entries) and frequently used nicknames (synonyms).
The four components used in this Job are:
- tFileInputDelimited: this component reads the source data and passes it to tMap.
- tMap: this component transforms the source data into two separate columns representing the first names and the nicknames, while ignoring the people’s home country information.
- tSynonymOutput: this component creates the index of interest in this scenario and feeds it with the synonyms given in the source file.
- tLogRow: this component lists the data that has been inserted into the newly created index.

Setting up the Job
To replicate this scenario, proceed as follows:
- Drop tFileInputDelimited, tMap, tSynonymOutput and tLogRow from the Palette onto the design workspace.
  You can change the displayed name of each of these components. For further information, see Talend Studio User Guide.
- Right-click the tFileInputDelimited component to open the contextual menu, and select Row > Main to connect it with the tMap component.
- Do the same thing to connect tMap to tSynonymOutput using a Row > Main link.
  A dialog box pops up to prompt you to name the link you are creating.
- Type in synonyms, for example, then click OK to validate this name and close the dialog box.
- Connect tSynonymOutput to tLogRow using a Row > Main link again.
Configuring the components
Configure the data input
- Double-click tFileInputDelimited to open its Component view.
- In the File name/Stream field, specify the path to the input file.
- Click the […] button next to Edit schema to open the Schema dialog box, click the [+] button to add six columns and name them Country, FirstName, Nickname1, Nickname2, Nickname3 and Nickname4, corresponding to the input file structure.
  When done, click OK to close the dialog box and propagate the schema setting to the next component.
  You can also configure this tFileInputDelimited component from the established metadata stored in the Repository. This allows you to reuse the configuration of the corresponding metadata automatically. For further information about how to create and use this metadata, see Talend Studio User Guide.
Configure data structure transformation
- Double-click tMap to open the map editor.
- At the bottom right corner (synonyms) of the Schema editor view, click the [+] button to add two rows and name them FirstName and Nicknames. These two columns appear in the synonyms table on the right side of the map editor.
- On the input side (left) of the upper part, select the FirstName column and drop it to the FirstName column on the output side (right).
- In the Expression field of the Nicknames column on the output side (right), type in DqStringHandling.safeConcat('|',).
- On the input side (left) of the upper part, select sequentially the columns from Nickname1 to Nickname4 and drop them to the Nicknames column, and edit the expression in the Expression field so that it reads DqStringHandling.safeConcat('|', row1.Nickname1, row1.Nickname2, row1.Nickname3, row1.Nickname4).
- Click OK to validate these changes and accept the propagation prompted by the dialog box that pops up.
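Because most rows of the source file carry fewer than four nicknames, some of the Nickname columns passed to the expression are empty. The following Java sketch shows one way such a concatenation can behave. It is a hypothetical stand-in written for this example, not the implementation of DqStringHandling.safeConcat; the class name NicknameJoiner, the method name joinSkippingBlanks, and the choice to skip null or empty values are all assumptions.

```java
import java.util.StringJoiner;

// Hypothetical stand-in for illustration; this is NOT Talend's DqStringHandling.
class NicknameJoiner {

    // Joins the non-null, non-empty values with the given separator.
    // Skipping null and empty values is an assumption made by this sketch.
    static String joinSkippingBlanks(char separator, String... values) {
        StringJoiner joiner = new StringJoiner(String.valueOf(separator));
        for (String value : values) {
            if (value != null && !value.isEmpty()) {
                joiner.add(value);
            }
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        // Row "France;Albert;Al": Nickname2 to Nickname4 are missing.
        // Prints: Al
        System.out.println(joinSkippingBlanks('|', "Al", null, null, null));

        // Row "France;Anthony;Anton;Tony;Tonio": Nickname4 is missing.
        // Prints: Anton|Tony|Tonio
        System.out.println(joinSkippingBlanks('|', "Anton", "Tony", "Tonio", null));
    }
}
```

Under these assumptions, each output row carries a single Nicknames value whose parts are separated by '|', which is the shape the Synonyms column of tSynonymOutput expects when '|' is the configured synonym separator.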
Configure index creation and console output
- Double-click tSynonymOutput to open its Basic settings view.
- In the Index path field, type in or browse to the location where you need to create the index.
- In the Operation field, select the operation you need to perform on this index and the related synonyms. In this example, select (Delete and) initialize an index.
- In the Entry field, select the column to be used to receive and store the reference entries. In this scenario, the FirstName column holds the reference entries, so select FirstName.
- In the Synonyms field, select the column to be used to receive and store the synonyms. In this scenario, select Nicknames.
- In the Basic settings view of the tLogRow component, select the Table option for a more readable display of the Job execution result.
Executing the Job
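When the Job is executed, the first names and their nicknames are inserted into the index, and the inserted data is displayed on the Console.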
