Scenario 2: Matching customer data through multiple passes
This scenario applies only to a subscription-based Talend Platform solution or Talend Data Fabric.
The Job in this scenario groups similar customer records by running them through two
consecutive matching passes (tMatchGroup components) and
outputs the calculated matches in groups. Each pass provides its matches to the pass
that follows, so that the latter can add more matches identified with new rules and
blocking keys.
In this Job:
- The tMysqlInput component connects to the
  customer records to be processed.
- Each of the tGenKey components defines a way
  to partition data records. The first key partitions data into many groups and the
  second key creates fewer groups that overlap the previous blocks, depending on
  the blocking key definition.
- The tMap component renames the key generated
  by the second tGenKey component.
- The first tMatchGroup processes the
  partitions defined by the first tGenKey, and
  the second tMatchGroup processes those defined
  by the second tGenKey.
  Warning: The two tMatchGroup components must
  have the same schema.
- The tLogRow component presents the matching
  results after the two passes.
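
The multi-pass idea described above can be illustrated outside Talend Studio with a minimal Python sketch. It is not how tGenKey or tMatchGroup are implemented. The sample records, the field names (name, city, zip), the choice of blocking keys, the 0.85 threshold, and the use of difflib as the similarity measure are all assumptions made for illustration. The first pass blocks on a fine-grained key (many small groups), and the second pass blocks on a coarser key (fewer, overlapping groups) and adds matches to the groups that the first pass already built.

```python
from collections import defaultdict
from difflib import SequenceMatcher

# Hypothetical customer records; the field names are assumptions.
RECORDS = [
    {"id": 1, "name": "John Smith", "city": "Boston", "zip": "02101"},
    {"id": 2, "name": "Jon Smith", "city": "Boston", "zip": "02101"},
    {"id": 3, "name": "John Smith", "city": "Boston", "zip": "02110"},
    {"id": 4, "name": "Mary Jones", "city": "Denver", "zip": "80201"},
]


def similar(a, b, threshold=0.85):
    """Stand-in matching rule: case-insensitive string similarity."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


class Groups:
    """Union-find over record ids, so a later pass can extend earlier groups."""

    def __init__(self, ids):
        self.parent = {i: i for i in ids}

    def find(self, i):
        while self.parent[i] != i:
            self.parent[i] = self.parent[self.parent[i]]  # path halving
            i = self.parent[i]
        return i

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            # Keep the smallest id as the group representative.
            self.parent[max(ra, rb)] = min(ra, rb)


def match_pass(records, groups, blocking_key):
    """Compare records only within the same block and merge the matches."""
    blocks = defaultdict(list)
    for rec in records:
        blocks[blocking_key(rec)].append(rec)
    for block in blocks.values():
        for i, left in enumerate(block):
            for right in block[i + 1:]:
                if similar(left["name"], right["name"]):
                    groups.union(left["id"], right["id"])


def current_groups(records, groups):
    """Return the groups as sorted lists of record ids."""
    result = defaultdict(list)
    for rec in records:
        result[groups.find(rec["id"])].append(rec["id"])
    return sorted(result.values())


def run_two_passes(records):
    groups = Groups(rec["id"] for rec in records)
    # Pass 1: fine-grained blocking key, many small blocks.
    match_pass(records, groups, lambda rec: rec["zip"])
    after_first = current_groups(records, groups)
    # Pass 2: coarser blocking key, fewer blocks that overlap the first ones.
    match_pass(records, groups, lambda rec: rec["city"])
    return after_first, current_groups(records, groups)


if __name__ == "__main__":
    first, second = run_two_passes(RECORDS)
    print("After pass 1:", first)   # [[1, 2], [3], [4]]
    print("After pass 2:", second)  # [[1, 2, 3], [4]]
```

In this sketch, record 3 is missed by the first pass because its postal code puts it in a different block, and the second pass picks it up because all three Boston records share the coarser key. Both passes work on the same record layout, which loosely mirrors the requirement that the two tMatchGroup components have the same schema.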