August 15, 2023

Scenario: Doing continuous matching using tMatchIndexPredict – Docs for ESB 6.x

This scenario applies only to a subscription-based Talend Platform solution with Big Data or Talend Data Fabric.

After indexing lookup data in Elasticsearch using tMatchIndex, you do
not need to restart the matching process from scratch. The
tMatchIndexPredict component compares new data records with the
lookup data stored in Elasticsearch.

In this example, a list of early childhood education centers in Chicago coming from ten
different sources has been cleaned, deduplicated and indexed in Elasticsearch. You want
to match new records, which contain information about early childhood education centers
in Chicago, against the reference data set stored in Elasticsearch.

tMatchIndexPredict uses pairing and matching models to group together
records from the input data and the matching records from the reference data set indexed
in Elasticsearch, and to label the suspect pairs.

tMatchIndexPredict outputs potential duplicates and unique records in
separate files.
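
As a rough illustration of the flow described above, and not of how Talend implements it, the following Python sketch routes new records into suspect duplicates and unique records. Everything in it is an assumption made for the example: an in-memory list stands in for the Elasticsearch index, a blocking key on the ZIP code stands in for the pairing model, and a string similarity threshold from the standard library stands in for the trained matching model. The field names, sample records, and threshold are hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical reference records standing in for the data indexed in Elasticsearch.
REFERENCE = [
    {"id": 1, "name": "Little Stars Learning Center", "zip": "60614"},
    {"id": 2, "name": "Bright Horizons Lakeview", "zip": "60657"},
]


def blocking_key(record):
    # Stand-in for the pairing model: only records sharing a ZIP code are compared.
    return record["zip"]


def similarity(a, b):
    # Stand-in for the matching model: case-insensitive string similarity in [0, 1].
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def route(new_records, reference, threshold=0.85):
    """Split new records into (suspect duplicates, unique records)."""
    # Group reference records by blocking key so each new record
    # is compared only with plausible candidates.
    index = {}
    for ref in reference:
        index.setdefault(blocking_key(ref), []).append(ref)

    suspects, uniques = [], []
    for rec in new_records:
        candidates = index.get(blocking_key(rec), [])
        best = max(
            candidates,
            key=lambda ref: similarity(rec["name"], ref["name"]),
            default=None,
        )
        if best is not None and similarity(rec["name"], best["name"]) >= threshold:
            # Keep the id of the reference record the new record resembles.
            suspects.append((rec, best["id"]))
        else:
            uniques.append(rec)
    return suspects, uniques
```

Calling `route(new_records, REFERENCE)` returns two lists that could then be written to separate files, mirroring the two outputs of the component.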

Before you begin:

  • You generated a pairing model.

    You can find an example of how to generate a pairing
    model on Talend Help Center (https://help.talend.com).

  • You generated a matching model.

    You can find an example of how to generate a
    matching model on Talend Help Center (https://help.talend.com).

  • You indexed clean and deduplicated data in Elasticsearch. New data records
    are matched against this reference data to determine whether they are
    unique records or suspect duplicates.

    You can find an example of how to index clean and
    deduplicated data in Elasticsearch on Talend Help Center (https://help.talend.com).

  • The Elasticsearch cluster must be running Elasticsearch 5+.
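
The version prerequisite can be checked before running the Job. The sketch below is one possible way to do it in Python with the standard library only: it reads the `version.number` field that the Elasticsearch root endpoint returns and compares the major version with 5. The URL `http://localhost:9200` and the function names are assumptions for the example; a secured cluster would also need authentication, which is not shown.

```python
import json
from urllib.request import urlopen


def major_version(version_number):
    # Extract the major version from a string such as "5.6.16".
    return int(version_number.split(".")[0])


def is_supported(version_number, minimum_major=5):
    # True when the cluster meets the "Elasticsearch 5+" prerequisite.
    return major_version(version_number) >= minimum_major


def cluster_version(base_url="http://localhost:9200"):
    # base_url is an assumption; replace it with the address of your cluster.
    # The root endpoint returns a JSON document that includes version.number.
    with urlopen(base_url, timeout=5) as response:
        info = json.load(response)
    return info["version"]["number"]
```

With a cluster reachable at the assumed address, `is_supported(cluster_version())` indicates whether the prerequisite is met.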


Document retrieved from Talend: https://help.talend.com