Scenario: Doing continuous matching using tMatchIndexPredict
This scenario applies only to a subscription-based Talend Platform solution with Big Data or to Talend Data Fabric.
After you index lookup data in Elasticsearch using tMatchIndex, you do
not need to restart the matching process from scratch: the
tMatchIndexPredict component compares new data records with the
lookup data stored in Elasticsearch.
In this example, a list of early childhood education centers in Chicago coming from ten
different sources has been cleaned, deduplicated and indexed in Elasticsearch. You want
to match new records containing information about early childhood education centers
in Chicago against the reference data set stored in Elasticsearch.
tMatchIndexPredict uses pairing and matching models to group records
from the input data with the matching records from the reference data set indexed
in Elasticsearch, and to label the suspect pairs.
tMatchIndexPredict outputs potential duplicates and unique records in
separate files.
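The grouping and labeling described above can be pictured with a small, self-contained Python sketch. It is not how tMatchIndexPredict is implemented: the trained pairing and matching models are replaced by simple stand-ins (a hypothetical ZIP-code blocking key and a `difflib` name-similarity score with an assumed threshold of 0.85), and the Elasticsearch index is replaced by an in-memory dictionary. The `Record` fields, the sample center names and all function names are illustrative assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class Record:
    """Simplified education-center record (hypothetical schema)."""
    id: str
    name: str
    zip_code: str


def pairing_key(record: Record) -> str:
    # Stand-in for the pairing model: only records that share a ZIP code
    # become candidate pairs, which keeps the number of comparisons small.
    return record.zip_code


def match_score(a: Record, b: Record) -> float:
    # Stand-in for the matching model: case-insensitive name similarity in [0, 1].
    return SequenceMatcher(None, a.name.lower(), b.name.lower()).ratio()


def build_index(reference: list[Record]) -> dict[str, list[Record]]:
    # In-memory stand-in for the clean, deduplicated data indexed in Elasticsearch.
    index: dict[str, list[Record]] = {}
    for ref in reference:
        index.setdefault(pairing_key(ref), []).append(ref)
    return index


def predict(new_records: list[Record], index: dict[str, list[Record]],
            threshold: float = 0.85):
    """Split new records into labeled suspect pairs and unique records."""
    suspects: list[tuple[Record, Record, float]] = []
    uniques: list[Record] = []
    for rec in new_records:
        # Score the new record against every reference record in its block.
        scored = [(match_score(rec, ref), ref)
                  for ref in index.get(pairing_key(rec), [])]
        if scored:
            score, best = max(scored, key=lambda pair: pair[0])
            if score >= threshold:
                # Label the pair (new record, reference record) as a suspect duplicate.
                suspects.append((rec, best, round(score, 3)))
                continue
        # No candidate scored high enough: treat the record as unique.
        uniques.append(rec)
    return suspects, uniques


# Usage with made-up sample data.
reference = [
    Record("r1", "Little Stars Learning Center", "60614"),
    Record("r2", "Sunny Days Preschool", "60601"),
]
incoming = [
    Record("n1", "Little Stars Learning Centre", "60614"),
    Record("n2", "Bright Horizons Academy", "60614"),
]
suspects, uniques = predict(incoming, build_index(reference))
# n1 is paired with r1 as a suspect duplicate; n2 is a unique record.
```

Blocking first and scoring second mirrors, in this sketch, the split between a pairing step (which limits the candidate pairs) and a matching step (which decides whether a pair is a suspect duplicate); the two result lists correspond to the separate duplicate and unique outputs.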
Before you begin, make sure that:
- You generated a pairing model. You can find an example of how to generate a pairing
model on Talend Help Center (https://help.talend.com).
- You generated a matching model. You can find an example of how to generate a
matching model on Talend Help Center (https://help.talend.com).
- Clean and deduplicated data has been indexed in Elasticsearch so that new data records
can be matched against it to determine whether they are unique records or
suspect duplicates. You can find an example of how to index clean and
deduplicated data in Elasticsearch on Talend Help Center (https://help.talend.com).
- The Elasticsearch cluster is running Elasticsearch 5 or later.
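One way to confirm the version requirement is to read the cluster information that an Elasticsearch node returns for `GET /`, whose `version.number` field holds the version string. The Python sketch below assumes a node reachable at `http://localhost:9200` without authentication; the helper names `fetch_cluster_info`, `major_version` and `meets_minimum` are hypothetical and are not part of Talend.

```python
import json
import urllib.request


def fetch_cluster_info(url: str = "http://localhost:9200") -> dict:
    # GET / on an Elasticsearch node returns cluster info, including version.number.
    # The URL is an assumption; add authentication or TLS settings as your cluster requires.
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def major_version(info: dict) -> int:
    # version.number looks like "5.6.16" or "7.17.0"; keep the leading component.
    return int(info["version"]["number"].split(".")[0])


def meets_minimum(info: dict, minimum_major: int = 5) -> bool:
    # True when the cluster satisfies the "Elasticsearch 5 or later" requirement.
    return major_version(info) >= minimum_major
```

For example, `meets_minimum(fetch_cluster_info())` returns `True` for a 5.x or later cluster; the parsing helpers are kept separate from the HTTP call so they can be checked without a running node.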