August 15, 2023

Further restricting the use of sensitive data – Docs for ESB 6.x

When you shuffle data, masking sensitive data is still advised. Also consider
the relationships between columns when you shuffle data, and make sure that the
original data set cannot be reconstructed.

In this scenario, last names and first names are grouped together, but the email
addresses are not in the same group. Consequently, after shuffling, the values in
the email column no longer correspond to the values in the lname and fname
columns. Because the email column usually contains information about
first names and last names, it may help attackers reconstruct the original
data.

Additionally, the address1, city and
email columns are not in any group, so they were not shuffled. Each email
address therefore stays on the same row as its original street address and city.
This means it is possible to infer, for example, that Robert Damstra lives at
1619 Stillman Court, Lynnwood.
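
The effect described above can be illustrated with a minimal Python sketch. It
is not the tDataShuffling implementation: the shuffle_groups function, the
sample rows and the seed are all invented for this example. Columns listed in a
group move together to a new row, while columns outside any group stay where
they are, so an unshuffled email remains next to the unshuffled address.

```python
import random


def shuffle_groups(rows, groups, seed=None):
    """Shuffle rows group by group (hypothetical helper, not a Talend API).

    Columns in the same group are moved together, so their values stay
    consistent with each other. Columns that belong to no group are left
    on their original row.
    """
    rng = random.Random(seed)
    out = [dict(r) for r in rows]  # copy so the input is not modified
    for group in groups:
        # One independent permutation of the row indices per group.
        order = list(range(len(rows)))
        rng.shuffle(order)
        for target, source in enumerate(order):
            for col in group:
                out[target][col] = rows[source][col]
    return out


# Made-up sample data; none of these values come from the Talend scenario.
rows = [
    {"fname": "Ann", "lname": "Lee", "email": "ann.lee@example.com",
     "address1": "12 Elm Street", "city": "Springfield"},
    {"fname": "Bob", "lname": "Ray", "email": "bob.ray@example.com",
     "address1": "7 Oak Avenue", "city": "Riverton"},
    {"fname": "Cid", "lname": "Fox", "email": "cid.fox@example.com",
     "address1": "3 Pine Road", "city": "Lakewood"},
]

# Only fname and lname are grouped, as in the scenario described above.
shuffled = shuffle_groups(rows, groups=[["fname", "lname"]], seed=1)

for row in shuffled:
    print(row)
```

Whatever permutation is drawn, the email, address1 and city values keep their
original row, so the name embedded in an email address still points to the
right street address.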

Building on this scenario, you can further restrict the use of actual sensitive data:

  • To avoid the use of real credit card numbers, you can mask credit card numbers
    using the tDataMasking component.

  • To prevent customers from being identified by their email addresses, you can
    mask email addresses using the tDataMasking component.

  • To make it more difficult to read real addresses, you can add the
    address1 and city columns to other groups.
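
As a rough illustration of what masking means for the first two items, the
Python sketch below replaces the identifying characters with a placeholder. It
does not reproduce the masking functions of tDataMasking: mask_email and
mask_card are hypothetical helpers, and the sample values are invented.

```python
def mask_email(email):
    """Replace the local part of an email address with X characters.

    Hypothetical helper: the domain is kept, the part before "@" is hidden.
    A value without "@" is masked entirely.
    """
    local, sep, domain = email.partition("@")
    if not sep:
        return "X" * len(email)
    return "X" * len(local) + "@" + domain


def mask_card(number):
    """Replace every digit of a card number with X, keeping separators."""
    return "".join("X" if ch.isdigit() else ch for ch in number)


# Made-up sample values, not taken from the Talend scenario.
print(mask_email("ann.lee@example.com"))   # XXXXXXX@example.com
print(mask_card("4111 1111 1111 1111"))    # XXXX XXXX XXXX XXXX
```

This sketch keeps the length of the original value, which can itself leak a
little information; a fixed-length replacement would avoid that.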

Note:

As tDataShuffling is supported on the Spark
framework, you can convert this standard Job to a Spark Batch Job by editing the Job
properties. This way you do not need to redefine the settings of the components in
the Job.


Source: Talend documentation, https://help.talend.com