Run the analysis with different probability distributions
- Switch back to the Integration perspective, select Poisson distribution in the basic settings of tDuplicateRow, and run the Job.
- In the Profiling perspective, click Chart below the Matching Key table to show the duplicates generated according to the Poisson distribution.
- Run the Job with the Geometric distribution, then click Chart in the Profiling perspective to show the duplicates generated according to the Geometric distribution.

The table below shows how the generated duplicates differ according to the probability distribution you select in the tDuplicateRow component.

Probability distribution | Duplicate results | Description

Bernoulli distribution

The curve is symmetrical. The groups of duplicates are distributed evenly on each side of an average value, 4 in this example. This value is the average number of duplicates in a group, and it is the number you set in the Average group size field in the basic settings of the tDuplicateRow component.

Poisson distribution

The curve is not symmetrical. The groups of duplicates are distributed unevenly.

Geometric distribution

The form of the curve is determined by the percentage you set for the duplicated records in the tDuplicateRow basic settings. The higher the percentage, the fewer groups with many records you will have.

In this example, the percentage for the duplicate records is set to 80%. This is why many groups of two-record duplicates are generated (148 groups), while groups of 14, 15 and 16 duplicates occur only once each.
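
The difference in curve shape between the three distributions can also be reproduced outside the Studio. The Python sketch below is an illustration only, not the tDuplicateRow implementation: it draws group sizes from three textbook distributions that share an average of 4, using only the standard library. Several points are assumptions made for this example. The Bernoulli option is modeled as a binomial count (8 Bernoulli trials with probability 0.5), the geometric distribution is parameterized by its mean rather than by the duplicated-records percentage, and all function and variable names are hypothetical.

```python
import math
import random
from collections import Counter

AVERAGE = 4  # assumed average group size, matching the example value above


def sample_binomial(rng, n, p):
    """Count successes in n independent Bernoulli(p) trials."""
    return sum(1 for _ in range(n) if rng.random() < p)


def sample_poisson(rng, lam):
    """Knuth's multiplication method; adequate for a small mean."""
    limit = math.exp(-lam)
    k, product = 0, rng.random()
    while product > limit:
        k += 1
        product *= rng.random()
    return k


def sample_geometric(rng, p):
    """Trials up to and including the first success (values 1, 2, ...)."""
    u = 1.0 - rng.random()  # lies in (0, 1], so log(u) is defined
    return int(math.log(u) / math.log(1.0 - p)) + 1


def group_size_histogram(sampler, groups=20000, seed=42):
    """Draw one size per group and count how often each size occurs."""
    rng = random.Random(seed)
    return Counter(sampler(rng) for _ in range(groups))


hist = {
    "binomial": group_size_histogram(lambda r: sample_binomial(r, 2 * AVERAGE, 0.5)),
    "poisson": group_size_histogram(lambda r: sample_poisson(r, AVERAGE)),
    "geometric": group_size_histogram(lambda r: sample_geometric(r, 1 / AVERAGE)),
}

# Text chart: one '#' per 200 groups of a given size.
for name, counts in hist.items():
    print(name)
    for size in sorted(counts):
        print(f"  {size:2d} {'#' * (counts[size] // 200)}")
```

In the printed charts, the binomial counts are roughly mirror images on each side of 4, the Poisson counts have a longer tail toward large sizes, and the geometric counts decrease steadily from the smallest size. The exact numbers will not match the Profiling chart, because the component's own parameterization is not reproduced here.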