August 15, 2023

Indexing clean and deduplicated data in Elasticsearch – Docs for ESB 6.x

  • Make sure the Elasticsearch cluster and Elasticsearch-head are started
    before you execute the Job.

    For more information about Elasticsearch-head, which is a plugin for browsing
    an Elasticsearch cluster, see https://mobz.github.io/elasticsearch-head/.
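    Before running the Job, you can also confirm from a script that the
    cluster is reachable. The following Python sketch is a hypothetical
    helper, not part of Talend. It assumes the cluster listens over plain
    HTTP at the same address you enter in the Nodes field (for example
    localhost:9200) and has no authentication enabled. It calls the
    Elasticsearch `_cluster/health` endpoint and treats a `green` or
    `yellow` status as "up" (a single-node cluster usually reports yellow).

```python
import json
import urllib.request


def base_url(node: str) -> str:
    """Turn a Nodes value such as "localhost:9200" into an HTTP base URL.

    Hypothetical helper: assumes plain HTTP when no scheme is given and
    strips the surrounding quotes a Talend string field may contain.
    """
    node = node.strip().strip('"')
    if not node.startswith(("http://", "https://")):
        node = "http://" + node
    return node.rstrip("/")


def cluster_is_up(node: str, timeout: float = 3.0) -> bool:
    """Return True if GET /_cluster/health reports status green or yellow."""
    url = base_url(node) + "/_cluster/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            health = json.load(resp)
    except (OSError, ValueError):
        # Connection refused, timeout, HTTP error, or a non-JSON body.
        return False
    return isinstance(health, dict) and health.get("status") in ("green", "yellow")


if __name__ == "__main__":
    print("cluster up:", cluster_is_up("localhost:9200"))
```

    If this prints False, start the cluster (or fix the address) before
    pressing F6.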

  1. Double-click the tMatchIndex component to open its
    Basic settings view and define its properties.

  2. In the Elasticsearch configuration area, use the Nodes
    field to enter the location of the cluster that hosts the
    Elasticsearch system, for example:

    "localhost:9200"

  3. In the Index field, enter the name of the index to be
    created in Elasticsearch, for example:

    education-agencies-chicago

  4. If you need to clean the Elasticsearch index specified in the
    Index field, select the Reset index check box.

  5. In the Pairing model folder field, enter the path to the
    local folder from which you want to retrieve the pairing model files.

  6. Press F6 to save and execute the
    Job.
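Because the Index field names an index that tMatchIndex creates in
Elasticsearch, the name has to satisfy Elasticsearch's index-naming rules
(for example, uppercase letters are rejected). The Python sketch below is
a hypothetical pre-flight check, not part of Talend; its rule set follows
the Elasticsearch documentation, where the colon is disallowed only from
Elasticsearch 7.0 onward, so it is slightly stricter than older versions.

```python
# Characters Elasticsearch does not accept in index names
# (':' is disallowed from Elasticsearch 7.0 onward).
FORBIDDEN_CHARS = set('\\/*?"<>| ,#:')
# Index names may not start with these characters.
FORBIDDEN_PREFIXES = ("-", "_", "+")


def is_valid_index_name(name: str) -> bool:
    """Return True if `name` follows the Elasticsearch index-naming rules."""
    if not name or name in (".", ".."):
        return False
    if name != name.lower():  # lowercase only
        return False
    if name.startswith(FORBIDDEN_PREFIXES):
        return False
    if any(ch in FORBIDDEN_CHARS for ch in name):
        return False
    return len(name.encode("utf-8")) <= 255  # at most 255 bytes
```

For instance, `is_valid_index_name("education-agencies-chicago")` is
True, while `is_valid_index_name("Education-Agencies")` is False.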

tMatchIndex created the
education-agencies-chicago index in Elasticsearch,
populated it with the clean data and computed the best suffixes based on the
blocking key values.

You can browse the index created by tMatchIndex using the
Elasticsearch-head plugin.
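Elasticsearch-head is one way to inspect the result; the same check can be
made against the Elasticsearch REST API. The Python sketch below is a
hypothetical helper that assumes the cluster is reachable over plain HTTP
at the Nodes address with no authentication. It calls the standard
`_search` endpoint and returns the `_source` body of a few documents; the
fields in those documents depend on your input data.

```python
import json
import urllib.parse
import urllib.request


def search_url(node: str, index: str, size: int = 5) -> str:
    """Build the _search URL for `index` on a node such as "localhost:9200"."""
    base = node if node.startswith(("http://", "https://")) else "http://" + node
    query = urllib.parse.urlencode({"size": size})
    return f"{base.rstrip('/')}/{urllib.parse.quote(index)}/_search?{query}"


def sample_documents(node: str, index: str, size: int = 5) -> list:
    """Fetch up to `size` documents from `index` and return their _source bodies."""
    with urllib.request.urlopen(search_url(node, index, size), timeout=5) as resp:
        body = json.load(resp)
    return [hit.get("_source", {}) for hit in body["hits"]["hits"]]
```

Calling `sample_documents("localhost:9200", "education-agencies-chicago")`
returns a list of dictionaries, one per document; it raises
`urllib.error.URLError` if the cluster cannot be reached.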

You can now use the indexed data as a reference data set for the
tMatchIndexPredict component.

You can find an example of how to perform continuous matching
using tMatchIndexPredict on the Talend Help Center (https://help.talend.com).


Source: Talend Help Center (https://help.talend.com).