tDataMasking
Hides original data with random characters or figures to protect the actual data
while having a functional substitute for occasions when it is not advisable to show
sensitive real data.
tDataMasking reads a data set row by row
and creates a structurally similar but inauthentic version of the data after having applied
specific functions on data fields. It generates one row for each input row.
You will be able to use the functional substitute for purposes such as
testing and training. When manipulating Personally Identifiable Information (PII) or
Sensitive Personal Data (SPD), you might want to protect and mask this data.
The definition of sensitive data is broad and may differ from one country
to the other or from one organization to the other. Basically, sensitive data can be
personal information or business information which includes anything that poses a risk to
the person or company in question.
Globally, Credit/Debit card data for example is considered
to be sensitive. Sensitive data is any piece of information that can be used to identify or
locate a person. A non-exhaustive list of personal sensitive data may include: first and
last names, email addresses, addresses, Social Security Number (SSN), credit card numbers,
bank account numbers, race, gender, date of birth, salary and geolocation combined with
time.
For further information about personal sensitive data, see
Personally Identifiable Information.
Also, business sensitive data may include trade secrets,
acquisition plans, financial data and customer information, among other possibilities.
In local mode, Apache Spark 1.6.0 and later versions are supported.
Depending on the Talend
product you are using, this component can be used in one, some or all of the following
Job frameworks:
- Standard: see tDataMasking Standard properties.
  The component in this framework is available in Talend Data Management Platform, Talend Big Data Platform, Talend Real Time Big Data Platform, Talend Data Services Platform, Talend MDM Platform and in Talend Data Fabric.
- Spark Batch: see tDataMasking properties for Apache Spark Batch.
  The component in this framework is available in all Talend Platform products with Big Data and in Talend Data Fabric.
- Spark Streaming: see tDataMasking properties for Apache Spark Streaming.
  This component is available in Talend Real Time Big Data Platform and Talend Data Fabric.
Data masking capabilities
Masking functions can be consistent, bijective and/or random functions, and they can check
that the input data is in a valid format.
Random data masking
Random masking consists of masking an input value with a randomly generated
value.
When there are multiple occurrences of the same value in the input dataset, it can be
masked with different values.
Different values from the input dataset can be masked with the same value.
The tDataMasking component can mask data randomly:
- The A value is masked with D when it first appears in the input dataset.
- The B and C values are masked with E.
- The A value is masked with F when it appears in the input dataset for the second time.
Random data masking examples
Input values | Extra Parameter | Examples of masked values |
---|---|---|
newuser@domain.com | “4” | ohsbser@domain.com |
admin@company.com | “4” | lneen@company.com |
newuser@domain.com | “4” | qzmaser@domain.com |
Input values | Extra Parameter | Examples of masked values |
---|---|---|
newuser@domain.com | “aaaaaa” | rxvsas |
admin@company.com | “aaaaaa” | bbwpba |
newuser@domain.com | “a9aaa9” | r8daw1 |
Input values | Examples of masked values |
---|---|
190049418437621 | 2590459222147 22 |
271083561478941 | 1900846274448 17 |
190049418437621 | 2730364078284 70 |
117029 | 1750694861914 69 |
Consistent data masking
When the same value appears twice in the input data, consistent masking functions
output the same masked value in the same Job execution.
However, two different input values can be masked with the same value in the output.
The tDataMasking component can mask data consistently:
- The A value is masked with D, regardless of the number of occurrences in the input dataset.
- The B and C values are masked with E.
Consistent data masking examples
left part of domain with consistent items function:
Input values | Extra Parameter | Examples of masked values |
---|---|---|
newuser@domain.com | “talend,value,newcompany” | newuser@newcompany.com |
admin@company.com | “talend,value,newcompany” | admin@value.com |
newuser@domain.com | “talend,value,newcompany” | newuser@newcompany.com |
user@company.com | “talend,value,newcompany” | user@value.com |
user@domain.com | “talend,value,newcompany” | user@newcompany.com |
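The difference between random and consistent masking within one Job execution can be sketched in Java as follows. This is a minimal illustration, not the component's implementation: the class `ConsistentMaskSketch` and its methods are hypothetical names, and the replacement alphabet (lowercase Latin letters) is an assumption.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

// Hypothetical illustration; not the tDataMasking implementation.
final class ConsistentMaskSketch {
    private final Map<String, String> cache = new HashMap<>();
    private final Random random = new Random();

    // Random masking: a fresh value on every call, so repeated inputs may differ.
    String maskRandomly(String input) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            sb.append((char) ('a' + random.nextInt(26)));
        }
        return sb.toString();
    }

    // Consistent masking: the first masked value chosen for an input is reused
    // for every later occurrence of that input during this run.
    String maskConsistently(String input) {
        return cache.computeIfAbsent(input, this::maskRandomly);
    }

    public static void main(String[] args) {
        ConsistentMaskSketch m = new ConsistentMaskSketch();
        System.out.println(m.maskConsistently("newuser"));
        System.out.println(m.maskConsistently("admin"));
        System.out.println(m.maskConsistently("newuser")); // same as the first line
    }
}
```

Nothing in this sketch prevents two different inputs from receiving the same masked value, which matches the behavior described for consistent masking.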
Bijective data masking
Bijective masking functions have the following characteristics:
- They are consistent masking functions.
- They are injective, meaning that they output two different masked values for two different input values.
- They check that the input data is in a valid format. If the input value is valid, bijective masking functions output a valid value. If the input value is not valid, they output an invalid value or replace values with null, depending on the masking function used.
The tDataMasking component can mask data bijectively:
- The A value is masked with D, regardless of the number of occurrences in the input dataset.
- The B value is masked with E.
- The C value is masked with F.
Bijective data masking examples
Input values | Example of masked values |
---|---|
190049418437621 | 289052428331901 |
271083561478941 | 234112758889352 |
190049418437621 | 289052428331901 |
117029 | null |
Repeatable data masking
To produce repeatable masked values between Job executions, define a seed or a password
in the Advanced settings of the component.
For a given combination of input and seed values, the same masked value is produced.
When using Format-Preserving Encryption methods, the same masked value is produced for a
given combination of an input value and a password.
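The idea that a given combination of input and seed always yields the same masked value can be sketched as follows. This is an illustration under stated assumptions: `RepeatableMaskSketch` is a hypothetical class, and deriving the generator state from the seed and the input's hash code is an assumed scheme, not the component's documented algorithm.

```java
import java.util.Random;

// Hypothetical illustration; not the tDataMasking implementation.
final class RepeatableMaskSketch {
    // Assumed scheme: the generator is initialized from both the seed and the input,
    // so the same (input, seed) pair gives the same digits in every execution.
    static String mask(String input, long seed) {
        Random random = new Random(seed ^ input.hashCode());
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            sb.append((char) ('0' + random.nextInt(10)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(mask("117029", 42L));
        System.out.println(mask("117029", 42L)); // identical to the previous line
    }
}
```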
Data masking functions in the masking components
There are several functions in the masking components which vary
according to the data type of the column.
It is advisable to use the functions predefined in the component with
columns that contain personally identifiable information, such as first and last names,
email addresses, addresses, SSNs, credit card numbers, bank account numbers, genders,
dates of birth and salaries.
Format-preserving encryption in the masking components
The component uses Format-Preserving Encryption (FPE)
methods to generate masked output values in the same format as the input values.
Java 8u161 is the minimum required version to use the FF1 with
AES method. To be able to use this FPE method with Java versions
earlier than 8u161, download the Java Cryptography Extension (JCE) unlimited
strength jurisdiction policy files from the Oracle website.
The FPE methods are based on a National Institute of
Standards and Technology (NIST) standard:
- FF1 with AES relies on the Advanced Encryption Standard in CBC mode.
- FF1 with SHA-2 relies on the secure hash function HMAC-256.
The FPE methods are bijective methods, except when
using tweaks.
FPE methods are less strong than classical encryption algorithms. If you want to keep the data
format, use the masking components. Otherwise, use the tDataEncrypt component, whose encryption is stronger.
The FF1 with
AES and FF1 with SHA-2 methods
require a password to generate encrypted and repeatable masked values. Those FPE methods
do not use a seed.
You can specify this password in the
password for FF1 method field, from the
Advanced Settings of the component.
You can use tweaks so that the
bijection is not performed. It makes the encryption stronger. A unique tweak is
generated for each record and applies to all data of a record. The tweaks change at
each Job execution. You can unmask the data by using the
tDataUnmasking component and the corresponding tweaks.
Format-preserving encryption in the tDataMasking component
When using the FF1 with
AES and FF1 with SHA-2 methods, input values must contain a
minimum number of characters to be masked. Otherwise, the function returns null.
For example, you want to mask S426A789QQ using the Keep n first
digits and replace following ones function with the following
parameters:
- FF1 with AES or FF1 with SHA-2
- The Digits alphabet
- “2” as an extra parameter
There are only 4 digits to be masked because you decided to keep the first two
digits, which is fewer than the 6 digits required for the Digits alphabet. As a result, the function returns null.
The minimum number of characters
required in the input values varies depending on the selected Alphabet.
When selecting Best
guess, the number varies depending on the represented alphabets in
the input values.
Alphabet | Minimum number of characters to mask |
---|---|
Alphanumeric | 4 |
Digits | 6 |
Latin extended | 3 |
Hiragana | 4 |
Katakana | 3 |
Kanji | 2 |
Hangul | 2 |
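The minimum-length rule can be sketched as a simple check. This is an illustration only: `FpeMinimumSketch` and its methods are hypothetical, the minimum counts are copied from the table above, and the digit counting assumes ASCII digits.

```java
import java.util.Map;

// Hypothetical illustration; not the tDataMasking implementation.
final class FpeMinimumSketch {
    // Minimum number of characters to mask per alphabet, copied from the table above.
    static final Map<String, Integer> MIN_CHARS = Map.of(
            "Alphanumeric", 4, "Digits", 6, "Latin extended", 3,
            "Hiragana", 4, "Katakana", 3, "Kanji", 2, "Hangul", 2);

    // Counts the ASCII digits left to mask after keeping the first `keep` digits.
    static int digitsToMask(String input, int keep) {
        int digits = 0;
        for (char c : input.toCharArray()) {
            if (c >= '0' && c <= '9') {
                digits++;
            }
        }
        return Math.max(0, digits - keep);
    }

    // True when enough digits remain for the Digits alphabet.
    static boolean canMaskDigits(String input, int keep) {
        return digitsToMask(input, keep) >= MIN_CHARS.get("Digits");
    }

    public static void main(String[] args) {
        System.out.println(digitsToMask("S426A789QQ", 2));  // digits left to mask
        System.out.println(canMaskDigits("S426A789QQ", 2)); // too few for Digits
    }
}
```

With S426A789QQ and two kept digits, only four digits remain, so the check fails, which corresponds to the case where the function returns null.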
Alphabets
When using the Replace
all, Replace characters between two positions,
Replace n first digits and Replace n
last digits with FPE methods, you can select an alphabet.
Characters that belong to the
selected alphabet are masked with characters from the same alphabet.
When selecting the Best
guess alphabet, masked values contain characters from all character
types represented in the input values. Best
guess is the default alphabet.
Any unrecognized character is
copied to the output as is.
The following alphabets are
supported:
Alphabet | Character Type | Unicode Range (version 11.0) | Corresponding characters |
---|---|---|---|
Alphanumeric | Latin numbers | [0030-0039] | [0-9] |
Alphanumeric | Latin lower-cased letters | [0061-007A] | [a-z] |
Alphanumeric | Latin upper-cased letters | [0041-005A] | [A-Z] |
Digits | Latin numbers | [0030-0039] | [0-9] |
Latin extended | Latin numbers | [0030-0039] | [0-9] |
Latin extended | Latin lower-cased letters | [0061-007A] | [a-z] |
Latin extended | Latin extended lower-cased letters | [00DF-00F6] [00F8-00FF] | [ß-ö] [ø-ÿ] |
Latin extended | Latin upper-cased letters | [0041-005A] | [A-Z] |
Latin extended | Latin extended upper-cased letters | [00C0-00D6] [00D8-00DE] | [À-Ö] [Ø-Þ] |
Hiragana | Hiragana | [3041-3096] 30FC 309D 309E | [ぁ-ゖ] ー ゝ ゞ |
Katakana | Half-width Katakana | [FF66-FF9D] (https://www.unicode.org/charts/PDF/UFF00.pdf) | [ヲ-ン] |
Katakana | Full-width Katakana | [30A1-30FA] 30FC 30FD 30FE | [ァ-ヺ] ー ヽ ヾ |
Katakana | Full-width Katakana | Phonetic extension: [31F0-31FF] | [ㇰ-ㇿ] |
Kanji | Kanji | CJK Extension A: [4E00-9FEF] [3400-4DB5] | [一-U+9FEF] [㐀-䶵] |
Kanji | Kanji | CJK Extension B: [20000-2A6D6] | [U+20000-U+2A6D6] |
Kanji | Kanji | CJK Extension C: [2A700-2B734] | [U+2A700-U+2B734] |
Kanji | Kanji | CJK Extension D: [2B740-2B81D] | [U+2B740-U+2B81D] |
Kanji | Kanji | CJK Extension E: [2B820-2CEA1] | [U+2B820-U+2CEA1] |
Kanji | Kanji | CJK Extension F: [2CEB0-2EBE0] | [U+2CEB0-U+2EBE0] |
Kanji | Kanji | CJK Compatibility Ideographs: [F900-FA6D] [FA70-FAD9] | [豈-舘] [U+FA70-U+FAD9] |
Kanji | Kanji | CJK Compatibility Ideographs Supplement: [2F800-2FA1D] | [U+2F800-U+2FA1D] |
Kanji | Kanji | KangXi Radicals: [2F00-2FD5] | [⼀-⿕] |
Kanji | Kanji | CJK Radicals Supplement: [2E80-2E99] [2E9B-2EF3] | [⺀-⺙] [⺛-⻳] |
Kanji | Kanji | CJK Symbols and Punctuation: [3005-3005] [3007-3007] [3021-3029] [3038-303B] | [々-々] [〇-〇] [〡-〩] [〸-〻] |
Hangul | Hangul | [AC00-D7AF] | [가-U+D7AF] |
Character handling functions
Function | Random masking | Consistent masking | Format-preserving encryption | Input data validation |
---|---|---|---|---|
Replace all | ||||
Replace n first chars | ||||
Replace n last chars | ||||
Replace characters between two positions | ||||
Replace all letters | ||||
Replace all digits | ||||
Keep n first digits and replace following ones | ||||
Keep n last digits and replace previous ones | ||||
Keep characters between two positions | ||||
Remove n first chars | N/A | N/A | N/A | N/A |
Remove n last chars | N/A | N/A | N/A | N/A |
Remove characters between two positions | N/A | N/A | N/A | N/A |
Replace all
This function masks all characters from the input values.
This function can be used on Strings.
When using the FF1
with AES and FF1 with SHA-2
methods, the input values must contain at least two characters to mask. Otherwise, the
function returns null.
Option | Description |
---|---|
Method | The Randomly method randomly selects a character. As a result, two identical input values can be masked with different output values. When the same value appears twice in the input data, consistent masking outputs the same masked value. This function can also encrypt the output masked values with the FF1 with AES and FF1 with SHA-2 methods. The FPE methods are bijective methods, except when using tweaks. The FF1 with AES and FF1 with SHA-2 methods require a password. You can specify this password in the password for FF1 method field, from the Advanced Settings of the component. |
Extra parameter | The optional extra parameter must be a character. |
In the first two examples, all characters are masked with the character you
define as an extra parameter.
In the third example, the masked value contains
characters from all alphabets represented in the input value.
Input value | Method | Alphabet | Extra parameter | Example of a masked value |
---|---|---|---|---|
Jack | Randomly | | “a” | aaaa |
S1000D | Randomly | | “4” | 444444 |
S1000D | FF1 with SHA-2 | Best guess | | 2MTW72 |
Replace n first chars
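This function replaces the n first characters of the input values. The other characters
remain as is.
Option | Description |
---|---|
Method | The Randomly method randomly selects a character. As a result, two identical input values can be masked with different output values. When the same value appears twice in the input data, consistent masking outputs the same masked value. This function can also encrypt the output masked values with the FF1 with AES and FF1 with SHA-2 methods. The FPE methods are bijective methods, except when using tweaks. The FF1 with AES and FF1 with SHA-2 methods require a password. You can specify this password in the password for FF1 method field, from the Advanced Settings of the component. |
Extra parameter | This function requires an extra parameter. The extra parameter must be a number, which is the number of characters to mask. You can also define a replacement character after a comma, as in “2,s”. |
In the following examples, the first two characters
are masked.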
In the first example, the replacement character is not defined. The first
two characters are masked with random characters.
In the second example, the
first two characters are masked with the defined character.
Input value | Method | Extra parameter | Example of a masked value |
---|---|---|---|
Jack | Randomly | “2” | Pvck |
S1000D | Randomly | “2,s” | ss000D |
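The behavior shown in these two examples can be sketched as follows. This is a simplified illustration, not the component's code: `ReplaceFirstCharsSketch` is a hypothetical class, and when no replacement character is given it always draws lowercase Latin letters, whereas the component's output may follow the case or type of the original character.

```java
import java.util.Random;

// Hypothetical illustration; not the tDataMasking implementation.
final class ReplaceFirstCharsSketch {
    // Replaces the first n characters. A null replacement means
    // "pick a random lowercase letter for each position" (assumption).
    static String replaceFirst(String input, int n, Character replacement, Random random) {
        int limit = Math.min(n, input.length());
        StringBuilder sb = new StringBuilder(input);
        for (int i = 0; i < limit; i++) {
            char c = (replacement != null) ? replacement : (char) ('a' + random.nextInt(26));
            sb.setCharAt(i, c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Random random = new Random();
        System.out.println(replaceFirst("Jack", 2, null, random));  // two random letters, then "ck"
        System.out.println(replaceFirst("S1000D", 2, 's', random)); // ss000D
    }
}
```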
Replace n last chars
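This function replaces the n last characters of the input values. The other characters
remain as is.
Option | Description |
---|---|
Method | The Randomly method randomly selects a character. As a result, two identical input values can be masked with different output values. When the same value appears twice in the input data, consistent masking outputs the same masked value. This function can also encrypt the output masked values with the FF1 with AES and FF1 with SHA-2 methods. The FPE methods are bijective methods, except when using tweaks. The FF1 with AES and FF1 with SHA-2 methods require a password. You can specify this password in the password for FF1 method field, from the Advanced Settings of the component. |
Extra parameter | This function requires an extra parameter. The extra parameter must be a number, which is the number of characters to mask. You can also define a replacement character after a comma, as in “2,s”. |
In the following examples, the last two characters
are masked.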
In the first example, the replacement character is not defined. The last
two characters are masked with random characters.
In the second example, the
last two characters are masked with the defined character.
Input value | Method | Extra parameter | Example of a masked value |
---|---|---|---|
Jack | Randomly | “2” | Jadq |
S1000D | Randomly | “2,s” | S100ss |
Replace characters between two positions
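This function replaces the characters between two positions. Characters
outside the interval are copied to the output as is.
Option | Description |
---|---|
Method | The Randomly method randomly selects a character. As a result, two identical input values can be masked with different output values. When the same value appears twice in the input data, consistent masking outputs the same masked value. This function can also encrypt the output masked values with the FF1 with AES and FF1 with SHA-2 methods. The FPE methods are bijective methods, except when using tweaks. The FF1 with AES and FF1 with SHA-2 methods require a password. You can specify this password in the password for FF1 method field, from the Advanced Settings of the component. |
Extra parameter | This function requires two extra parameters. The first two extra parameters must be numbers, which are the start and end positions of the interval. You can enter a third extra parameter, which is the replacement character. |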
In the first example, the first three characters are masked with the defined
character.
In the second example, the replacement character is not defined. The second, third
and fourth characters are masked with random characters.
Input value | Method | Extra parameter | Example of a masked value |
---|---|---|---|
Jack | Randomly | “1,3,p” | pppk |
S1000D | Randomly | “2,4” | S0640D |
Replace all letters
Option | Description |
---|---|
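Method | The Randomly method randomly selects a character. As a result, two identical input values can be masked with different output values. When the same value appears twice in the input data, consistent masking outputs the same masked value. This function can also encrypt the output masked values with the FF1 with AES and FF1 with SHA-2 methods. The FPE methods are bijective methods, except when using tweaks. The FF1 with AES and FF1 with SHA-2 methods require a password. You can specify this password in the password for FF1 method field, from the Advanced Settings of the component. |
Extra parameter | The optional extra parameter is the replacement character. |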
In the first example, the replacement character is not defined. All letters are
masked with random characters.
In the second example, all letters are masked with the defined character.
Input value | Method | Extra parameter | Example of a masked value |
---|---|---|---|
Jack | Randomly | “” | Zvxn |
S1000D | Randomly | “q” | q1000q |
Replace all digits
Option | Description |
---|---|
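Method | The Randomly method randomly selects a character. As a result, two identical input values can be masked with different output values. When the same value appears twice in the input data, consistent masking outputs the same masked value. This function can also encrypt the output masked values with the FF1 with AES and FF1 with SHA-2 methods. The FPE methods are bijective methods, except when using tweaks. The FF1 with AES and FF1 with SHA-2 methods require a password. You can specify this password in the password for FF1 method field, from the Advanced Settings of the component. |
Alphabet | Digits is the only alphabet available with the FF1 with AES and FF1 with SHA-2 methods. |
Extra parameter | The optional extra parameter is the replacement character. |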
In the first example, the replacement character is not defined. All digits are masked
with random characters.
In the second example, all digits are masked with the defined character.
In the third example, all digits are masked with the defined digit.
Input value | Method | Extra parameter | Example of a masked value |
---|---|---|---|
Jack | Randomly | “” | Jack |
S1000D | Randomly | “q” | SqqqqD |
S1000D | Randomly | “8” | S8888D |
Keep n first digits and replace following ones
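This function keeps the n first digits of the input values and replaces the following
ones with digits. Non-digit characters remain as is.
Option | Description |
---|---|
Method | The Randomly method randomly selects a character. As a result, two identical input values can be masked with different output values. When the same value appears twice in the input data, consistent masking outputs the same masked value. This function can also encrypt the output masked values with the FF1 with AES and FF1 with SHA-2 methods. The FPE methods are bijective methods, except when using tweaks. The FF1 with AES and FF1 with SHA-2 methods require a password. You can specify this password in the password for FF1 method field, from the Advanced Settings of the component. |
Extra parameter | This function requires an extra parameter. The extra parameter is the number of digits to keep. |
In the first example, the input value does not contain any digits, so the input value is
copied as is to the output.
In the second example, the first two digits are copied to the output as is. The
following ones are masked with random digits.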
Input value | Extra parameter | Example of a masked value |
---|---|---|
Jack | “2” | Jack |
S1000D | “2” | S1023D |
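A minimal sketch of this behavior with the Randomly method, assuming ASCII digits only. The class `KeepFirstDigitsSketch` is hypothetical and is not the component's code.

```java
import java.util.Random;

// Hypothetical illustration; not the tDataMasking implementation.
final class KeepFirstDigitsSketch {
    // Keeps the first `keep` digits, replaces later digits with random digits,
    // and copies non-digit characters unchanged.
    static String mask(String input, int keep, Random random) {
        StringBuilder sb = new StringBuilder(input.length());
        int seen = 0;
        for (char c : input.toCharArray()) {
            if (c >= '0' && c <= '9') {
                seen++;
                sb.append(seen <= keep ? c : (char) ('0' + random.nextInt(10)));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Random random = new Random();
        System.out.println(mask("Jack", 2, random));   // Jack (no digits to mask)
        System.out.println(mask("S1000D", 2, random)); // S10 + two random digits + D
    }
}
```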
Keep n last digits and replace previous ones
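This function keeps the n last digits of the input values and replaces the previous ones
with digits. Non-digit characters remain as is.
Option | Description |
---|---|
Method | The Randomly method randomly selects a character. As a result, two identical input values can be masked with different output values. When the same value appears twice in the input data, consistent masking outputs the same masked value. This function can also encrypt the output masked values with the FF1 with AES and FF1 with SHA-2 methods. The FPE methods are bijective methods, except when using tweaks. The FF1 with AES and FF1 with SHA-2 methods require a password. You can specify this password in the password for FF1 method field, from the Advanced Settings of the component. |
Extra parameter | This function requires an extra parameter. The extra parameter is the number of digits to keep. |
In the first example, the input value does not contain any digits, so the input value is
copied as is to the output.
In the second example, the last two digits are copied to the output as is. The
previous ones are masked with random digits.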
Input value | Extra parameter | Example of a masked value |
---|---|---|
Jack | “2” | Jack |
S1000D | “2” | S8900D |
Keep characters between two positions
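This function keeps the characters between two positions. Characters
outside the interval are removed.
Option | Description |
---|---|
Extra parameter | This function requires two extra parameters. The extra parameters must be numbers, which are the start and end positions of the interval. |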
In the first example, the first three characters are kept, while the other ones are
removed.
In the second example, the second, third and fourth characters are kept, while the other ones
are removed.
Input value | Extra parameter | Example of a masked value |
---|---|---|
Jack | “1,3” | Jac |
S1000D | “2,4” | 100 |
Remove characters between two positions
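This function removes the characters between two positions. Characters
outside the interval are copied to the output as is.
Option | Description |
---|---|
Extra parameter | This function requires two extra parameters. The extra parameters must be numbers, which are the start and end positions of the interval. |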
In the first example, the first three characters are removed, while the other ones
are kept.
In the second example, the second, third and fourth characters are removed, while the
other ones are kept.
Input value | Extra parameter | Example of a masked value |
---|---|---|
Jack | “1,3” | k |
S1000D | “2,4” | S0D |
Remove n first chars
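This function removes the n first characters of the input values. The other characters
are copied to the output as is.
Option | Description |
---|---|
Extra parameter | This function requires an extra parameter. The extra parameter is the number of characters to remove. |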
In the first example, the first two characters are removed.
In the second example, the first four characters are removed.
Input value | Extra parameter | Example of a masked value |
---|---|---|
Jack | “2” | ck |
S1000D | “4” | 0D |
Remove n last chars
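This function removes the n last characters of the input values. The other characters are copied to
the output as is.
Option | Description |
---|---|
Extra parameter | This function requires an extra parameter. The extra parameter is the number of characters to remove. |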
In the first example, the last two characters are removed.
In the second example, the last four characters are removed.
Input value | Extra parameter | Example of a masked value |
---|---|---|
Jack | “2” | Ja |
S1000D | “4” | S1 |
Date handling functions
You can mask dates.
Function | Random masking | Consistent masking | Format-preserving encryption | Input data validation | Note |
---|---|---|---|---|---|
Date variance | | | | | You can use the tPatternMasking component to mask dates in a bijective manner. However, the variation in days is not guaranteed. |
Keep year and set day and month to 01/01 | | | | | – |
Date variance
This function varies the input date by the number of days
specified as an extra parameter.
If the input date is null, then the function returns the current date.
Option | Description |
---|---|
Extra parameter | This function requires an extra parameter. The extra parameter must be a number of days. If the extra parameter is “0”, it is replaced with 31. For example, if the input date is 05-11-2016, then the generated date is randomly selected within that number of days before or after 05-11-2016. |
In the first example, the extra parameter is “0”. Then, the function replaces this value with 31. The generated date, 07-07-2018, is randomly selected between 01-06-2018 (31 days before the input date) and 02-08-2018 (31 days after the input date).
In the second example, the extra parameter is “4”. The generated date, 01-07-2018, is randomly selected between 28-06-2018 (4 days before the input date) and 06-07-2018 (4 days after the input date).
Input value | Extra parameter | Example of a masked value |
---|---|---|
02-07-2018 | “0” | 07-07-2018 |
02-07-2018 | “4” | 01-07-2018 |
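The variance rule can be sketched with `java.time`. This is an illustration under stated assumptions: `DateVarianceSketch` is a hypothetical class, the extra parameter is assumed to be non-negative, and drawing the offset uniformly (including an offset of zero days) is an assumption about the distribution.

```java
import java.time.LocalDate;
import java.util.Random;

// Hypothetical illustration; not the tDataMasking implementation.
final class DateVarianceSketch {
    // Shifts the date by a random number of days in [-n, n];
    // a parameter of 0 is replaced with 31, as described above.
    static LocalDate vary(LocalDate input, int days, Random random) {
        int n = (days == 0) ? 31 : days;
        int offset = random.nextInt(2 * n + 1) - n; // uniform in [-n, n] (assumption)
        return input.plusDays(offset);
    }

    public static void main(String[] args) {
        LocalDate input = LocalDate.of(2018, 7, 2);
        System.out.println(vary(input, 0, new Random())); // within 31 days of the input
        System.out.println(vary(input, 4, new Random())); // within 4 days of the input
    }
}
```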
Keep year and set day and month to 01/01
This function sets the month and day of the input date to January 1 but does not
change the year.
If the input date is null, the function returns January 1
of the current year, for example 01-01-2019.
This function requires no extra
parameter.
In the following example, the function returns January 1 of the year of the input date.
Input value | Example of a masked value |
---|---|
24-12-2019 | 01-01-2019 |
Number handling functions
You can mask numbers.
Function | Random masking | Consistent masking | Format-preserving encryption | Input data validation | Note |
---|---|---|---|---|---|
Generate value between two values | | | | | To mask values in a bijective manner, you can use the tPatternMasking component. |
Numeric variance | | | | | – |
Generate value between two values
This function generates a random value between
user-defined minimum and maximum values.
Option | Description |
---|---|
Extra parameter | This function requires an extra parameter. The minimum and maximum values are specified as an extra parameter, separated by a comma, as in “50,99”. If the user-defined minimum and maximum values do not use the right format, the function returns predefined masked values. |
In the following example, the masked value has been randomly selected between the minimum value (50) and the
maximum value (99) defined as extra parameters.
Input value | Extra parameter | Example of a masked value |
---|---|---|
24 | “50,99” | 93 |
Numeric variance
This function varies the input numeric value, based on the percentage
specified as an extra parameter.
This function applies only to numeric data types: Integer, Long, Float and Double.
Option | Description |
---|---|
Extra parameter | This function requires an extra parameter. The extra parameter must be a number; this parameter is the percentage of variance. For example, if the input is 100 and the parameter is 10, then the generated value will be a random value between 90 and 110. If the extra parameter is 0, it will be replaced with 10. |
In the following example, the masked value has been randomly selected
between 5 (10 – 50%) and 15 (10 + 50%).
Input value | Extra parameter | Example of a masked value |
---|---|---|
10 | “50” | 7 |
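A minimal sketch of the percentage rule for a Double input. `NumericVarianceSketch` is a hypothetical class, and the uniform draw of the variation rate is an assumption.

```java
import java.util.Random;

// Hypothetical illustration; not the tDataMasking implementation.
final class NumericVarianceSketch {
    // Varies the input by a random rate within plus or minus `percent` percent;
    // a parameter of 0 is replaced with 10, as described above.
    static double vary(double input, int percent, Random random) {
        int p = (percent == 0) ? 10 : percent;
        double rate = (random.nextDouble() * 2 - 1) * p / 100.0; // uniform draw (assumption)
        return input * (1 + rate);
    }

    public static void main(String[] args) {
        System.out.println(vary(10, 50, new Random())); // a value between 5 and 15
    }
}
```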
Bank account generation functions
You can generate bank account numbers.
To mask bank account numbers by keeping the original country and using the Format-Preserving
Encryption, use the Bank account masking
function.
Function | Random generation | Consistent generation | Bijective generation | Input data validation |
---|---|---|---|---|
Generate account number | ||||
Generate account number and keep original country |
Generate account number
This function generates a valid French bank account number.
This function only applies on String values.
This function requires no extra
parameter.
A French IBAN is a 27-character code. The numbers
are randomly generated but must satisfy check algorithms. The last two digits of the IBAN are known as
the “clef RIB” and are generated with an algorithm, and the third and fourth characters of the
IBAN are also generated through an algorithm.
In the following example, the masked value is a French IBAN number, regardless of the
input value.
Input value | Example of a masked value |
---|---|
A26 | FR76 3000 6000 0112 3456 7890 189 |
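The check on the third and fourth characters can be illustrated with the standard IBAN mod-97 validation. This is a sketch under an assumption: the text does not name the algorithm, so the ISO 13616 check is used here as the usual one, and `IbanCheckSketch` is a hypothetical class that does not verify the “clef RIB”.

```java
import java.math.BigInteger;

// Hypothetical illustration; not the tDataMasking implementation.
final class IbanCheckSketch {
    // Standard IBAN check: move the first four characters to the end, replace
    // letters with numbers (A=10 ... Z=35), and test that the value mod 97 is 1.
    static boolean isValidIban(String iban) {
        String s = iban.replace(" ", "").toUpperCase();
        if (s.length() < 5) {
            return false;
        }
        String rearranged = s.substring(4) + s.substring(0, 4);
        StringBuilder digits = new StringBuilder();
        for (char c : rearranged.toCharArray()) {
            if (c >= '0' && c <= '9') {
                digits.append(c);
            } else if (c >= 'A' && c <= 'Z') {
                digits.append(c - 'A' + 10);
            } else {
                return false;
            }
        }
        return new BigInteger(digits.toString()).mod(BigInteger.valueOf(97)).intValue() == 1;
    }

    public static void main(String[] args) {
        System.out.println(isValidIban("FR76 3000 6000 0112 3456 7890 189")); // true
    }
}
```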
Generate account number and keep original country
This function generates a valid bank account number for the
original country.
If the input is a correct IBAN number, the function generates an IBAN number
from the same country as the input value. The function takes into account the IBAN
format, which differs from one country to another.
If the input value is a correct US account number, the function keeps the first
nine digits and randomly masks the other digits.
If the input value is not a correct account number, the function generates
a valid French IBAN number.
In the first example, the input value is not a correct account number, so the
masked value is a valid French IBAN number.
In the second example, the input value is a correct US account number, so the masked
value is also a correct US account number.
Input value | Example of a masked value |
---|---|
1234567890 | FR76 3000 1007 9412 3456 7890 185 |
091000019 6564833713 | 091000019 3602742991 |
Credit card generation functions
You can generate credit card numbers.
To mask credit card numbers by using the Format-Preserving Encryption, use the Credit Card masking functions.
Function | Random generation | Consistent generation | Bijective generation | Input data validation |
---|---|---|---|---|
Generate credit card | ||||
Generate credit card and keep original bank |
Generate credit card
This function generates a valid credit card number.
This function requires no extra
parameter.
This function applies on String or Long values.
- Visa
- MasterCard
- American Express
One type is randomly chosen and a credit card number is randomly generated. Then,
the generated credit card number passes algorithms that detect false credit card
numbers.
In the following example, the masked value is a valid Visa credit card number,
regardless of the input value.
Input value | Example of a masked value |
---|---|
A26 | 4346065537027896 |
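The kind of check that detects false credit card numbers can be illustrated with the Luhn algorithm. This is a sketch under an assumption: the text does not name the algorithms used, so Luhn is shown as the common checksum for card numbers, and `LuhnCheckSketch` is a hypothetical class.

```java
// Hypothetical illustration; not the tDataMasking implementation.
final class LuhnCheckSketch {
    // Luhn checksum: from the right, double every second digit, subtract 9 from
    // results above 9, and require the total to be a multiple of 10.
    static boolean passesLuhn(String number) {
        int sum = 0;
        boolean doubleIt = false;
        for (int i = number.length() - 1; i >= 0; i--) {
            int d = number.charAt(i) - '0';
            if (d < 0 || d > 9) {
                return false; // non-digit character
            }
            if (doubleIt) {
                d *= 2;
                if (d > 9) {
                    d -= 9;
                }
            }
            sum += d;
            doubleIt = !doubleIt;
        }
        return !number.isEmpty() && sum % 10 == 0;
    }

    public static void main(String[] args) {
        System.out.println(passesLuhn("4346065537027896")); // true
    }
}
```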
Generate credit card and keep original bank
If the input value is a correct Visa, MasterCard or American Express
credit card number, this function generates a credit card number from the same company
and keeps the prefix.
This function applies on String or Long values.
This function requires no extra
parameter.
The generated credit card number passes algorithms that detect false
credit card numbers.
In the following example, the input value is a valid American Express credit card
number. The masked value is also a valid American Express credit card number.
Input value | Example of a masked value |
---|---|
346992550391727 | 348482709815527 |
Data generation functions
You can generate output data different from the input data.
Function | Random generation | Consistent generation | Bijective generation | Input data validation |
---|---|---|---|---|
Generate from pattern | ||||
Generate Uuid | ||||
Generate sequence | ||||
Generate from file/list |
Generate from pattern
This function generates a value based on a user-defined
pattern.
This function is applied only on Strings.
Option | Description |
---|---|
Extra parameter | This function requires an extra parameter. The extra parameter is a pattern that defines the generated value: for example, a is replaced with a random Latin lowercase letter and 9 with a random digit. All other characters are copied to the generated value as is. You can also use \n to call the group placed after the nth “,” character in the extra parameter. |
This function does not work correctly if a comma ‘,’ is used in the
pattern.
In the following example:
- a characters are replaced with random Latin lowercase letters.
- s characters are not masked in the generated output.
- \2 calls the group placed after the second “,” character, which is @talend.com.
Input value | Extra parameter | Example of a masked value |
---|---|---|
A26 | “aaaass\2,@gmail.com,@talend.com” | hjdfss@talend.com |
In the following example:
- \3 calls the group placed after the third “,” character, which is a.
- 9 characters are masked with random digits.
Input value | Extra parameter | Example of a masked value |
---|---|---|
A26 | “\39999,D,Z,a” | a4825 |
Generate UUID
This function masks the input value with a randomly generated
universally unique identifier (UUID).
This function uses the UUID.randomUUID()
method
provided by Java. This Java method does not use a seed, meaning that if you run the job
twice, the function generates different UUIDs.
This function is applied on Strings.
This function requires no extra
parameter.
In the following example, the masked value is a randomly generated UUID.
Input value | Example of a masked value |
---|---|
A26 | 28e92000-aafa-4ec3-bd56-240f192a4a8c |
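The call named above can be tried in isolation. In this sketch only `UUID.randomUUID()` comes from the text; the wrapper class `UuidMaskSketch` and its `mask` method are hypothetical.

```java
import java.util.UUID;

// Hypothetical wrapper around the Java method named in the text.
final class UuidMaskSketch {
    // The input is ignored: the masked value is a new random UUID on every call.
    static String mask(String input) {
        return UUID.randomUUID().toString();
    }

    public static void main(String[] args) {
        System.out.println(mask("A26"));
        System.out.println(mask("A26")); // differs from the previous line
    }
}
```

Because no seed is involved, two calls with the same input return different values, within a run as well as between runs.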
Generate sequence
This function returns the extra parameter for the first row and
increases this number by 1 for each following row.
This function can be applied to all data types except Date (Integer, Long,
Strings, etc.).
Option | Description |
---|---|
Extra parameter | This function requires an extra parameter. The extra parameter must be a number. |
In the following example, the sequence starts from the value
set as an extra parameter.
Input values | Extra parameter | Examples of masked values |
---|---|---|
21 | “0” | 0 |
A48 | “0” | 1 |
Generate from file/list
This function randomly replaces the input value with one of the
user-defined values.
This function is applied to Strings or numerical data types.
Option | Description |
---|---|
Method | The Randomly method randomly selects the value from the list (or file). As a result, two identical input values can be masked with different output values. The Consistently method masks identical input values with the same output value. When using the Consistently method, the probability of generating duplicates can be calculated in the same way as the probability that, in a group of n people, two people have the same birthday. |
Extra parameter | This function requires an extra parameter. The extra parameter can be a list of values or the path to a file containing the values. The values of a list must be stored in a String and separated by commas, as in “help,documentation”. If you use the Apache Spark Batch or the Apache Spark Streaming version of the component, enter the appropriate prefix before the file path. Paths to folders are not supported. If the extra parameter is not set, the function returns an empty String. |
In the following example, the masked value is one of the values set as extra
parameters.
Input value | Method | Extra parameter | Example of a masked value |
---|---|---|---|
21 | Randomly | “help,documentation” | help |
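The duplicate probability mentioned for the Consistently method can be estimated with the classic birthday-problem computation. This is a sketch under assumptions: `DuplicateProbabilitySketch` is a hypothetical class, and it assumes that each of the d replacement values is equally likely, which may not match the component's exact formula.

```java
// Hypothetical illustration of the birthday-problem computation.
final class DuplicateProbabilitySketch {
    // Probability that at least two of n draws from d equally likely values coincide:
    // 1 - (d/d) * ((d-1)/d) * ... * ((d-n+1)/d).
    static double probability(int n, int d) {
        double allDistinct = 1.0;
        for (int i = 0; i < n; i++) {
            allDistinct *= (double) (d - i) / d;
        }
        return 1.0 - allDistinct;
    }

    public static void main(String[] args) {
        // 23 people and 365 possible birthdays: slightly above one half.
        System.out.println(probability(23, 365));
    }
}
```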
Phone number generation functions
You can generate French, German, Japanese, UK and US phone
numbers.
To mask phone numbers by using the Format-Preserving Encryption, use the Phone masking functions.
Function | Random generation | Consistent generation | Bijective generation | Input data validation |
---|---|---|---|---|
Generate French phone number | ||||
Generate German phone number | ||||
Generate Japanese phone number | ||||
Generate UK phone number | ||||
Generate US phone number |
Generate French phone number
This function generates a valid French phone number, regardless of
the input value.
This function only applies on Strings.
This function requires no extra
parameter.
Input value | Example of a masked value |
---|---|
A26 | +33 307066271 |
Generate German phone number
This function generates a valid German phone number, regardless of
the input value.
This function only applies on Strings.
This function requires no extra
parameter.
Input value | Example of a masked value |
---|---|
A26 | 030 30748511 |
Generate Japanese phone number
This function generates a valid Japanese phone number, regardless
of the input value.
This function only applies on Strings.
This function requires no extra
parameter.
Input value | Example of a masked value |
---|---|
A26 | 03-2419-1781 |
Generate UK phone number
This function generates a valid UK phone number, regardless of the
input value.
This function only applies on Strings.
This function requires no extra
parameter.
Input value | Example of a masked value |
---|---|
A26 | 020 3705 5907 |
Generate US phone number
This function generates a valid US phone number, regardless of the
input value.
This function only applies on Strings.
This function requires no extra
parameter.
Input value | Example of a masked value |
---|---|
A26 | 527-708-5526 |
Social Security Number (SSN) generation functions
You can generate Social Security Numbers.
To mask SSNs by using the Format-Preserving Encryption, use the Social Security Number (SSN) masking functions.
Function | Random generation | Consistent generation | Bijective generation | Input data validation |
---|---|---|---|---|
Generate French SSN number | ||||
Generate German SSN number | ||||
Generate Japanese SSN number | ||||
Generate UK SSN number | ||||
Generate US SSN number | ||||
Generate Chinese SSN number | ||||
Generate Indian SSN number |
Generate French SSN number
This function generates a valid French social security number,
regardless of the input value.
This function only applies on Strings.
Input value | Example of a masked value |
---|---|
A26 | 2760774865895 37 |
Generate German SSN number
This function generates a valid German social security number,
regardless of the input value.
This function only applies on Strings.
Input value | Example of a masked value |
---|---|
A26 | 96918234144 |
Generate Japanese SSN number
This function generates a valid Japanese social security number,
regardless of the input value.
This function only applies on Strings.
Input value | Example of a masked value |
---|---|
A26 | 680917875625 |
Generate UK SSN number
This function generates a valid UK social security number, regardless of
the input value.
This function only applies on Strings.
Input value | Example of a masked value |
---|---|
A26 | BY 15 61 20 D |
Generate US SSN number
This function generates a valid US social security number, regardless of
the input value.
This function only applies on Strings.
Input value | Example of a masked value |
---|---|
A26 | 437-02-2223 |
Generate Chinese SSN number
This function generates a valid Chinese social security number,
regardless of the input value.
This function only applies on Strings.
Input value | Example of a masked value |
---|---|
A26 | 653024204001080102 |
Generate Indian SSN number
This function generates a valid Indian social security number,
regardless of the input value.
This function only applies on Strings.
Input value | Example of a masked value |
---|---|
A26 | 142543864863 |
Bank account masking function
You can mask IBAN and US bank account numbers.
Function | Random masking | Consistent masking | Format-preserving encryption | Input data validation |
---|---|---|---|---|
Mask account number and keep original country |
This function applies on String values.
Two methods are available: FF1 with AES
and FF1 with SHA-2. This function requires no
alphabet and no extra parameter.
If the input is a valid IBAN number, the function masks it with an IBAN number
from the same country. The function takes into account the IBAN format, which
differs from one country to another.
If the input is a valid US account number, the function masks all digits.
If the input is neither a valid IBAN nor US account number and there is:
- no “Invalid” output flow, the function returns null in the main flow.
- an “Invalid” output flow, the corresponding data are sent to the
“Invalid” output flow.
Bank account number | Logic | Valid if… |
---|---|---|
IBAN number |
The whole string is verified. The first two The first two digits are generated through a For French and |
|
US | The first nine digits are verified. |
To verify whether the format of an IBAN number is valid or not, you can refer
to this IBAN registry.
In the following example, the Keep
format check box is selected to preserve the space from the input
value.
Input value | Method | Example of masked value |
---|---|---|
SV43ACAT00000000000000123123 | FF1 with SHA-2 | SV53FAGI78247154681080694193 |
FR49 2867 2609 7580 N16P 4ZFM V39 | FF1 with AES | null (Cause: Invalid IBAN number) |
159 753 321 16 | FF1 with SHA-2 | 607 503 340 92 |
4556156203746391 | FF1 with AES | null (Cause: Invalid bank account number) |
RO49 AAaA 1b31 1000 9344 0000 | FF1 with SHA-2 | null (Cause: Lowercase letters) |
ST23000200000289355710148 | FF1 with AES | ST30061989350589302375875 |
The given outputs are valid bank account numbers.
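As an illustration of the kind of check that IBAN validation involves, the sketch below applies the standard ISO 13616 mod-97 test. It is not the component's code: the function name is hypothetical, country-specific lengths are not checked, and rejecting lowercase letters simply mirrors the "Lowercase letters" case in the example table. The sample IBANs in the usage lines are the widely published GB and DE examples, not values taken from this page.

```python
def is_valid_iban(raw):
    """Return True if raw passes the generic ISO 13616 mod-97 check."""
    iban = raw.replace(" ", "")
    # Generic shape: 2 letters, 2 check digits, then an alphanumeric part.
    if not (5 <= len(iban) <= 34) or not iban.isascii() or not iban.isalnum():
        return False
    if iban != iban.upper():
        return False  # assumption: lowercase letters are rejected
    if not (iban[:2].isalpha() and iban[2:4].isdigit()):
        return False
    # Move the first four characters to the end, then map A..Z to 10..35.
    rearranged = iban[4:] + iban[:4]
    numeric = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(numeric) % 97 == 1


print(is_valid_iban("GB82 WEST 1234 5698 7654 32"))
print(is_valid_iban("DE89370400440532013000"))
print(is_valid_iban("4556156203746391"))
```

A full validator would also compare the length and layout against the per-country entries of the IBAN registry.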
Address masking functions
You can mask addresses.
This function only applies on Strings.
Function | Random masking | Consistent masking | Format-preserving encryption | Input data validation |
---|---|---|---|---|
Address masking |
This function masks digits with other digits and other
characters with X.
The following case-insensitive keywords will not be masked in
the output: ALLEE, ALLEY, ALLÉE, AREA, AUFFAHRT, AV, AV., AVDA, AVE,
AVE., AVENIDA, AVENUE, BACKROAD, BANLIEUE, BD, BD., BLV, BLV., BLVD, BOULEVARD,
BREVE, BULEVAR, BVD, BVD., BYWAY, CALLE, CAMINHO, CAMINO, CARREFOUR, CARREGGIATA,
CARRETERA, CHAUSSEE, CHAUSSÉE, CHEMIN, CITE, CITÉ, CORTO, COUR, COURT, CRT, CT, CT.,
CURTO, DR, DR., DRIVE, DRIVEWAY, ESD, ESPLANADA, ESPLANADE, ESTRADA, FAUBOURG,
FORUM, FREEWAY, GLEIS, HIGHWAY, HWY, IMPASSE, INDUSTRIAL, INDUSTRIALE, INDUSTRIELLE,
KURZ, LANE, LUNGOMARE, MANEIRA, MODO, PARKWAY, PARVIS, PASSAGE, PASSERELLE,
PERIFERIA, PERIFERICO, PERIFÉRICO, PERIPHERAL, PERIPHERIQUE, PIAZZA, PISTA, PL, PL.,
PLACE, PLATZ, PLAZA, PONT, PORTE, PROMENADE, PERIPHERIQUE, PÉRIPHÉRIQUE, QUADRADO,
QUAI, R, R., RD, RD., ROAD, ROUTE, RTE, RUA, RUE, SQUARE, ST, ST., STD, STR, STRADA,
STRASSE, STREET, SUBURB, SUBURBIO, SUBÚRBIO, TERRASSE, TRACK, UBER, VIA, VIALE,
VILLA, VLE, VOIE, VORORT, VÍA, WAY, WEG, ZONA, ZONE, ÁREA, ÜBER.
Option | Description |
---|---|
Extra parameter | The optional extra parameter can be:
Those keywords are added to the default list and will not be masked in the output. |
In the first example, “venelle” is not part of the list of keywords. As a result,
this word is masked in the output.
In the second example, “venelle”
is added to the list of keywords. As a result, this word is not masked in the
output.
Input value | Extra parameter | Example of a masked value |
---|---|---|
3 venelle Artémis | “” | 5 XXXXXXX XXXXXXX |
3 venelle Artémis | “venelle,enceinte” | 6 venelle XXXXXXX |
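The masking rule described above (digits replaced with other digits, other characters replaced with X, keywords left untouched) can be sketched as follows. This is a minimal illustration under stated assumptions, not the component's implementation: `mask_address` is a hypothetical name, `DEFAULT_KEYWORDS` is only a short excerpt of the default list, tokens are split on spaces, and every non-digit character of a non-keyword token becomes X.

```python
import random

# Short excerpt of the default keyword list, for illustration only.
DEFAULT_KEYWORDS = {"RUE", "AVENUE", "ROAD", "STREET", "VIA", "ALLÉE"}


def mask_address(address, extra_keywords="", rng=None):
    """Mask an address, keeping default and user-defined keywords as is."""
    rng = rng or random.Random()
    keywords = DEFAULT_KEYWORDS | {
        word.strip().upper() for word in extra_keywords.split(",") if word.strip()
    }
    masked_tokens = []
    for token in address.split(" "):
        if token.upper() in keywords:
            masked_tokens.append(token)  # keyword: copied unchanged
        else:
            masked_tokens.append(
                "".join(
                    str(rng.randint(0, 9)) if ch.isdigit() else "X" for ch in token
                )
            )
    return " ".join(masked_tokens)


print(mask_address("3 venelle Artémis"))
print(mask_address("3 venelle Artémis", "venelle,enceinte"))
```

The keyword comparison is done on the upper-cased token, which is one simple way to make the match case-insensitive.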
Email masking functions
You can mask email addresses.
Function | Random masking | Consistent masking | Format-preserving encryption | Input data validation |
---|---|---|---|---|
Mask email full domain by character | ||||
Mask email left part of domain by character | ||||
Mask email local part by character |
Mask email local part
This function masks all characters before the @ character. Two
methods are available: By character and From a list of
values.
This function only applies on Strings.
This function requires an extra parameter.
Option | Description |
---|---|
Method | When using the By character method, this function masks what comes before the @ character with a character. When using the From a list of values method, this function |
Extra parameter | This function requires an extra parameter.
When using the By When |
In the first example, all characters before the @ character are masked
with the user-defined character.
In the second example, all characters before the @
character are masked with one of the values from the user-defined
list.
Input value | Method | Extra parameter | Example of masked value |
---|---|---|---|
johnsmith@company.com | By character | “p” | ppppppppp@company.com |
johnsmith@company.com | From a list of values | “z,x,c,h” | xxxxxxxxx@company.com |
Mask email full domain
This function masks what comes after the @ character. Two methods
are available: By character and From a list of
values.
This function only applies on Strings.
Option | Description |
---|---|
Method | When using the By character method, this function masks what comes after the @ character with a character. When using the From a list of values method, this function |
Extra parameter | This function requires an extra parameter.
When using the By When |
In the following example, what comes after the @ character is masked
with one of the values from the user-defined list.
Input value | Method | Extra parameter | Example of a masked value |
---|---|---|---|
johnsmith@company.com | From a list of values | “newtalend.com,newcompany.org” | johnsmith@newtalend.com |
Mask email left part of domain
This function masks what comes between the @ character and the dot in
email addresses. Two methods are available: By character and
From a list of values.
This function only applies on Strings.
Option | Description |
---|---|
Method | When using the By character method, this function masks what comes between the @ character and the dot with a character. When using the |
Extra parameter | This function requires an extra parameter.
When using the By When using the From a list of |
In the following example, the characters between the @ character and the
dot are masked with one of the values from the user-defined list.
Input value | Method | Extra parameter | Example of a masked value |
---|---|---|---|
johnsmith@company.com | From a list of values | “newtalend,talendforge” | johnsmith@newtalend.com |
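The three email functions differ only in which part of the address they replace. The sketch below illustrates that split; it is not the component's code. The function names are hypothetical, the input is assumed to contain exactly one @ character, and "the dot" is assumed to be the first dot after the @ character.

```python
import random


def mask_local_part(email, char):
    """By character: replace every character before '@' with char."""
    local, at, domain = email.partition("@")
    return char * len(local) + at + domain


def mask_full_domain(email, values, rng):
    """From a list of values: replace everything after '@' with one value."""
    local, at, _ = email.partition("@")
    return local + at + rng.choice(values)


def mask_domain_left_part(email, values, rng):
    """From a list of values: replace what lies between '@' and the dot."""
    local, at, domain = email.partition("@")
    _, dot, tail = domain.partition(".")  # assumption: first dot after '@'
    return local + at + rng.choice(values) + dot + tail


rng = random.Random(42)
print(mask_local_part("johnsmith@company.com", "p"))
print(mask_full_domain("johnsmith@company.com", ["newtalend.com", "newcompany.org"], rng))
print(mask_domain_left_part("johnsmith@company.com", ["newtalend", "talendforge"], rng))
```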
Credit Card masking functions
Function | Random masking | Consistent masking | Format-preserving encryption | Input data validation |
---|---|---|---|---|
Mask Credit Card and keep bank | ||||
Mask Credit Card |
These functions:
- apply on String values,
- support all credit card types and
- keep the original format of the credit card number. For example, if the input has 13
digits, the output has 13 digits.
A credit card number is considered invalid when it does not satisfy the Luhn algorithm.
If the input is invalid and there is:
- no “Invalid” output flow, the function returns null in the main flow.
- an “Invalid” output flow, the corresponding data are sent to the “Invalid” output
flow.
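The Luhn check that decides whether a credit card number is valid can be sketched as follows. The function name is hypothetical, and the sketch ignores any non-digit character such as the spaces kept by the Keep format option.

```python
def luhn_is_valid(number):
    """Return True if the digits of number satisfy the Luhn algorithm."""
    digits = [int(ch) for ch in number if "0" <= ch <= "9"]
    if len(digits) < 2:
        return False
    total = 0
    # Walk from the rightmost digit; double every second digit.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


print(luhn_is_valid("4570 5624 6978 6793"))
print(luhn_is_valid("0123 4567 8987 6543 210"))
```

The first call uses a valid input value from the example tables of this section; the second uses the input that those tables report as masked to null.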
Mask Credit Card and keep bank
This function masks a credit card number while keeping the Bank
Identification Number/Issuer Identification Number (BIN/IIN).
The output value is a valid credit card number.
This function:
- keeps the first six digits,
- masks the other digits and
- generates the last digit using the Luhn algorithm.
Two methods are available: FF1 with AES and FF1 with
SHA-2. This function requires no alphabet and no extra parameter.
In the following example, the Keep format check box is selected to
preserve the space from the input value.
Credit card number | Method | Example of masked value |
---|---|---|
4570 5624 6978 6793 | FF1 with AES | 4570 5678 2786 4430 |
374140537770721 | FF1 with AES | 374140100455098 |
5168690988613241 | FF1 with SHA-2 | 5168699616108078 |
5158495805899854 | FF1 with SHA-2 | 5158494455420285 |
0123 4567 8987 6543 210 | FF1 with AES | null |
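The three steps of this function (keep the first six digits, mask the others, compute the last digit with the Luhn algorithm) can be sketched as below. This is an illustration only: random digits stand in for the FF1 encryption, so the sketch is neither consistent nor bijective, the input is not validated, and the function names are hypothetical.

```python
import random


def luhn_check_digit(payload_digits):
    """Compute the Luhn check digit for a list of digits without the last one."""
    total = 0
    # The digit next to the future check digit is doubled first.
    for index, digit in enumerate(reversed(payload_digits)):
        if index % 2 == 0:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return (10 - total % 10) % 10


def mask_card_keep_bin(card_number, rng):
    """Keep the first six digits, randomize the rest, fix the last digit."""
    digits = [int(ch) for ch in card_number if "0" <= ch <= "9"]
    payload = digits[:6] + [rng.randint(0, 9) for _ in digits[6:-1]]
    new_digits = iter(payload + [luhn_check_digit(payload)])
    # Put the new digits back into the original layout (spaces are kept).
    return "".join(
        str(next(new_digits)) if "0" <= ch <= "9" else ch for ch in card_number
    )


print(mask_card_keep_bin("4570 5624 6978 6793", random.Random(2024)))
```

Masking all digits, as the Mask Credit Card function does, would follow the same pattern with an empty kept prefix.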
Mask Credit Card
The output value is a valid credit card number.
This function:
- masks all digits and
- generates the last digit using the Luhn algorithm.
Two methods are available: FF1 with AES and FF1 with
SHA-2. This function requires no alphabet and no extra parameter.
In the following example, the Keep format check box is selected to
preserve the space from the input value.
Credit card number | Method | Example of masked value |
---|---|---|
4570 5624 6978 6793 | FF1 with AES | 4931 3744 4754 2072 |
374140537770721 | FF1 with AES | 749381687018333 |
5168690988613241 | FF1 with SHA-2 | 4138106541683084 |
5158495805899854 | FF1 with SHA-2 | 9641013768742255 |
0123 4567 8987 6543 210 | FF1 with AES | null |
Phone masking functions
You can mask French, German, Japanese, UK and US phone
numbers.
Function | Random masking | Consistent masking | Format-preserving encryption | Input data validation |
---|---|---|---|---|
Mask French phone number | ||||
Mask German phone number | ||||
Mask Japanese phone number | ||||
Mask UK phone number | ||||
Mask US phone number |
Mask French phone number
This function generates a unique random French phone number related to the
input.
This function masks the last six digits. Input values that contain at least six digits
are regarded as valid phone numbers.
If the input value is not valid, the function returns null.
This function only applies on Strings.
If there are duplicates in the input data, you
will get the same duplicates in the masked values. In the same way, if there are no
duplicates in the input data, there will be no duplicates in the masked values.
Option | Description |
---|---|
Method | The default Basic method uses a proprietary algorithm. Note: As the masking methods are stronger, it is recommended to use the FF1
algorithms rather than the Basic method. This function can encrypt the output masked
The FPE methods are bijective methods, except when The FF1 with You can specify this password in the |
Extra parameter | This function requires no extra parameter. |
In the following example, the Keep
format check box is selected to preserve the spaces from the input
value.
Input value | Method | Example of masked value |
---|---|---|
02 40 99 90 99 | FF1 with AES | 02 40 89 78 01 |
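The behavior described for this function (validate the input, replace the last six digits, keep the separators, give equal inputs equal outputs) can be sketched as follows. This is a rough illustration, not the component's algorithm: an HMAC-SHA-256 digest is used as a stand-in for FF1, so the sketch is deterministic but not bijective, and two distinct inputs could in principle collide. The function name and the `password` and `count` parameters are assumptions.

```python
import hashlib
import hmac


def mask_phone_last_digits(phone, password, count=6):
    """Replace the last count digits of phone, keeping all other characters."""
    digit_positions = [i for i, ch in enumerate(phone) if "0" <= ch <= "9"]
    if len(digit_positions) < count:
        return None  # fewer digits than required: treated as invalid
    original_digits = "".join(phone[i] for i in digit_positions)
    digest = hmac.new(
        password.encode("utf-8"), original_digits.encode("utf-8"), hashlib.sha256
    ).hexdigest()
    # Derive count replacement digits from the keyed digest.
    replacement = str(int(digest, 16) % 10**count).zfill(count)
    chars = list(phone)
    for position, new_digit in zip(digit_positions[-count:], replacement):
        chars[position] = new_digit
    return "".join(chars)


print(mask_phone_last_digits("02 40 99 90 99", "secret"))
print(mask_phone_last_digits("12 34", "secret"))
```

Passing a different `count` (for example 8 or 7) adapts the sketch to the other phone masking functions of this section.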
Mask German phone number
This function generates a unique random German phone number related to the
input.
This function masks the last eight digits. Input values that contain at least eight
digits are regarded as valid phone numbers.
If the input value is not valid, the function returns null.
This function only applies on Strings.
If there are duplicates in the input data, you
will get the same duplicates in the masked values. In the same way, if there are no
duplicates in the input data, there will be no duplicates in the masked values.
Option | Description |
---|---|
Method | The default Basic method uses a proprietary algorithm. Note: As the masking methods are stronger, it is recommended to use the FF1
algorithms rather than the Basic method. This function can encrypt the output masked
The FPE methods are bijective methods, except when The FF1 with You can specify this password in the |
Extra parameter | This function requires no extra parameter. |
In the following example, the Keep
format check box is selected to preserve the dash from the input
value.
Input value | Method | Example of masked value |
---|---|---|
636-48018 | FF1 with AES | 389-54922 |
Mask Japanese phone number
This function generates a unique random Japanese phone number related to the
input.
This function masks the last seven digits. Input values that contain at least seven
digits are regarded as valid phone numbers.
If the input value is not valid, the function returns null.
This function only applies on Strings.
If there are duplicates in the input data, you
will get the same duplicates in the masked values. In the same way, if there are no
duplicates in the input data, there will be no duplicates in the masked values.
Option | Description |
---|---|
Method | The default Basic method uses a proprietary algorithm. Note: As the masking methods are stronger, it is recommended to use the FF1
algorithms rather than the Basic method. This function can encrypt the output masked
The FPE methods are bijective methods, except when The FF1 with You can specify this password in the |
Extra parameter | This function requires no extra parameter. |
In the following example, the Keep
format check box is selected to preserve the dashes from the input
value.
Input value | Method | Example of masked value |
---|---|---|
052-2451-4455 | FF1 with AES | 052-2970-7735 |
Mask UK phone number
This function generates a unique random UK phone number related to the
input.
This function masks the last seven digits. Input values that contain at least seven
digits are regarded as valid phone numbers.
If the input value is not valid, the function returns null.
This function only applies on Strings.
If there are duplicates in the input data, you
will get the same duplicates in the masked values. In the same way, if there are no
duplicates in the input data, there will be no duplicates in the masked values.
Option | Description |
---|---|
Method | The default Basic method uses a proprietary algorithm. Note: As the masking methods are stronger, it is recommended to use the FF1
algorithms rather than the Basic method. This function can encrypt the output masked
The FPE methods are bijective methods, except when The FF1 with You can specify this password in the |
Extra parameter | This function requires no extra parameter. |
Input value | Method | Example of masked value |
---|---|---|
02071231234 | FF1 with AES | 02074444306 |
Mask US phone number
This function generates a unique random US phone number related to the
input.
This function masks the last six digits. Input values that contain at least six digits
are regarded as valid phone numbers.
If the input value is not valid, the function returns null.
This function only applies on Strings.
If there are duplicates in the input data, you
will get the same duplicates in the masked values. In the same way, if there are no
duplicates in the input data, there will be no duplicates in the masked values.
Option | Description |
---|---|
Method | The default Basic method uses a proprietary algorithm. Note: As the masking methods are stronger, it is recommended to use the FF1
algorithms rather than the Basic method. This function can encrypt the output masked
The FPE methods are bijective methods, except when The FF1 with You can specify this password in the |
Extra parameter | This function requires no extra parameter. |
In the following example, the Keep
format check box is selected to preserve the dash from the input
value.
Input value | Method | Example of masked value |
---|---|---|
636-48018 | FF1 with AES | 389-54922 |
Social Security Number (SSN) masking functions
You can mask French, German, Japanese, UK, US, Chinese and Indian Social
Security Numbers.
Function | Random masking | Consistent masking | Format-preserving encryption | Input data validation |
---|---|---|---|---|
Mask French SSN number | ||||
Mask German SSN number | ||||
Mask Japanese SSN number | ||||
Mask UK SSN number | ||||
Mask US SSN number | ||||
Mask Chinese SSN number | ||||
Mask Indian SSN number |
Mask French SSN number
This function generates a unique random French social security number
related to the input.
This function only applies on Strings.
If there are duplicates in the input data, you
will get the same duplicates in the masked values. In the same way, if there are no
duplicates in the input data, there will be no duplicates in the masked values.
If the input value is not valid, the function returns null.
Option | Description |
---|---|
Method | The default Basic method uses a proprietary algorithm. Note: As the masking methods are stronger, it is recommended to use the FF1
algorithms rather than the Basic method. This function can encrypt the output masked
The FPE methods are bijective methods, except when The FF1 with You can specify this password in the |
Extra parameter | This function requires no extra parameter. |
In the following example, the input value is a valid SSN number. The masked value is
also a valid SSN number.
Input value | Method | Examples of masked value |
---|---|---|
171125612301521 | FF1 with AES | 113056322612896 |
Mask German SSN number
This function generates a unique random German social security number
related to the input.
This function only applies on Strings.
If there are duplicates in the input data, you
will get the same duplicates in the masked values. In the same way, if there are no
duplicates in the input data, there will be no duplicates in the masked values.
If the input value is not valid, the function returns null.
Option | Description |
---|---|
Method | The default Basic method uses a proprietary algorithm. Note: As the masking methods are stronger, it is recommended to use the FF1
algorithms rather than the Basic method. This function can encrypt the output masked
The FPE methods are bijective methods, except when The FF1 with You can specify this password in the |
Extra parameter | This function requires no extra parameter. |
In the following example, the input value is a valid SSN number. The masked value is
also a valid SSN number.
Input value | Method | Examples of masked value |
---|---|---|
12123456123 | FF1 with AES | 04538250629 |
Mask Japanese SSN number
This function generates a unique random Japanese social security number
related to the input.
This function only applies on Strings.
If there are duplicates in the input data, you
will get the same duplicates in the masked values. In the same way, if there are no
duplicates in the input data, there will be no duplicates in the masked values.
If the input value is not valid, the function returns null.
Option | Description |
---|---|
Method | The default Basic method uses a proprietary algorithm. Note: As the masking methods are stronger, it is recommended to use the FF1
algorithms rather than the Basic method. This function can encrypt the output masked
The FPE methods are bijective methods, except when The FF1 with You can specify this password in the |
Extra parameter | This function requires no extra parameter. |
In the following example, the input value is a valid SSN number. The masked value is
also a valid SSN number.
Input value | Method | Examples of masked value |
---|---|---|
123456789012 | FF1 with AES | 283950101162 |
Mask UK SSN number
This function generates a unique random UK social security number
related to the input.
This function only applies on Strings.
If there are duplicates in the input data, you
will get the same duplicates in the masked values. In the same way, if there are no
duplicates in the input data, there will be no duplicates in the masked values.
If the input value is not valid, the function returns null.
Option | Description |
---|---|
Method | The default Basic method uses a proprietary algorithm. Note: As the masking methods are stronger, it is recommended to use the FF1
algorithms rather than the Basic method. This function can encrypt the output masked
The FPE methods are bijective methods, except when The FF1 with You can specify this password in the |
Extra parameter | This function requires no extra parameter. |
In the following example, the input value is a valid SSN number. The masked value is
also a valid SSN number.
Input value | Method | Examples of masked value |
---|---|---|
PP132459A | FF1 with AES | PC916049A |
Mask US SSN number
This function generates a unique random US social security number
related to the input.
This function only applies on Strings.
If there are duplicates in the input data, you
will get the same duplicates in the masked values. In the same way, if there are no
duplicates in the input data, there will be no duplicates in the masked values.
If the input value is not valid, the function returns null.
Option | Description |
---|---|
Method | The default Basic method uses a proprietary algorithm. Note: As the masking methods are stronger, it is recommended to use the FF1
algorithms rather than the Basic method. This function can encrypt the output masked
The FPE methods are bijective methods, except when The FF1 with You can specify this password in the |
Extra parameter | This function requires no extra parameter. |
In the following example, the input value is a valid SSN number. The masked value is
also a valid SSN number.
Input value | Method | Examples of masked value |
---|---|---|
153654862 | FF1 with AES | 828521191 |
Mask Chinese SSN number
This function generates a unique random Chinese social security
number related to the input.
This function only applies on Strings.
If there are duplicates in the input data, you
will get the same duplicates in the masked values. In the same way, if there are no
duplicates in the input data, there will be no duplicates in the masked values.
If the input value is not valid, the function returns null.
Option | Description |
---|---|
Method | The default Basic method uses a proprietary algorithm. Note: As the masking methods are stronger, it is recommended to use the FF1
algorithms rather than the Basic method. This function can encrypt the output masked
The FPE methods are bijective methods, except when The FF1 with You can specify this password in the |
Extra parameter | This function requires no extra parameter. |
In the following example, the input value is a valid SSN number. The masked value is
also a valid SSN number.
Input value | Method | Examples of masked value |
---|---|---|
130503196704010012 | FF1 with AES | 510304190708135114 |
Mask Indian SSN number
This function generates a unique random Indian social security number
related to the input.
This function only applies on Strings.
If there are duplicates in the input data, you
will get the same duplicates in the masked values. In the same way, if there are no
duplicates in the input data, there will be no duplicates in the masked values.
If the input value is not valid, the function returns null.
Option | Description |
---|---|
Method | The default Basic method uses a proprietary algorithm. Note: As the masking methods are stronger, it is recommended to use the FF1
algorithms rather than the Basic method. This function can encrypt the output masked
The FPE methods are bijective methods, except when The FF1 with You can specify this password in the |
Extra parameter | This function requires no extra parameter. |
In the following example, the input value is a valid SSN number. The masked value is
also a valid SSN number.
Input value | Method | Examples of masked value |
---|---|---|
186034828209 | FF1 with AES | 203307371407 |
Set to null
You can nullify values from the input data.
This function returns null.
Option | Description |
---|---|
Method | Not applicable |
Extra parameter | This function requires no extra parameter. |
Input value | Examples of masked value |
---|---|
Arthur | null |
09-05-2019 | null |
tDataMasking Standard properties
These properties are used to configure tDataMasking running in the Standard Job framework.
The Standard tDataMasking component belongs to the Data Quality family.
The component in this framework is available in Talend Data Management Platform, Talend Big Data Platform, Talend Real Time Big Data Platform, Talend Data Services Platform, Talend MDM Platform and in Talend Data Fabric.
Basic settings
Schema and Edit |
A schema is a row description. It defines the number of fields Click Sync Click Edit
The output schema of this component contains read-only
columns:
|
|
Built-In: You create and store the schema locally for this component |
|
Repository: You have already created the schema and stored it in the |
Modifications |
Define in the table what fields to change and how to change
Input Column: Select the column from These Category: select a category of masking functions from the list.
Function: Select the function that The functions you can For example, if the column type Method: Select the Basic method or one FF1 algorithm (Format-Preserving The Basic method is the default Note: As the masking methods are stronger, it is recommended to use the FF1
algorithms rather than the Basic method. The FF1 with AES method is based Note: Java 8u161 is the minimum
required version to use the FF1 with AES method. To be able to use this FPE method with Java versions earlier than 8u161, download the Java Cryptography Extension (JCE) unlimited strength jurisdiction policy files from Oracle website. The FF1 with AES and The Method list is only available for functions that use Format-Preserving When using the Characters that When selecting the Best guess alphabet, masked values contain characters Any unrecognized character is copied to the output as is.
Extra Parameter: This field is used
Keep format: this function is only |
Advanced settings
Password for FF1 methods |
Set the password |
Use tweaks with FF1 Encryption |
Select this If bijective |
Seed for random generator |
Set a random number if you want to generate If you do not set the seed, the component |
Encoding |
Select the encoding from the list or select Custom and define it manually. If you select Custom and leave the field empty, the supported When you set Function to Generate from |
Output the original row |
Select this check box to output original data rows in addition to the |
Should null input return |
This check box is selected by If the input is |
Should empty input return empty |
When this check box is selected, empty values are left unchanged in |
Send invalid data to “Invalid” output flow |
This check box is selected by default.
The data are considered invalid when:
|
tStat |
Select this check box to gather the Job processing metadata at the Job level |
Usage
Usage rule |
This component is an intermediary step. It requires an input and |
Altering data values to restrict the use of actual sensitive data
This scenario applies only to Talend Data Management Platform, Talend Big Data Platform, Talend Real Time Big Data Platform, Talend Data Services Platform, Talend MDM Platform and Talend Data Fabric.
With the tDataMasking component, you can replace
sensitive information such as credit card or social security numbers with realistic values,
allowing production data to be safely used for purposes such as testing and training.
This scenario describes a Job which uses:
- the tFixedFlowInput component to generate personal data including credit card numbers,
- the tDataMasking component to hide specific original data with random characters or figures,
- the tFileOutputExcel component to output the substitute data set.
Setting up the Job
- Drop the following components from the Palette onto the design workspace: tFixedFlowInput, tDataMasking and tFileOutputExcel.
- Connect the three components together using the Main links.
Configuring the input component
- Double-click tFixedFlowInput to open its Basic settings view in the Component tab.
- Create the schema through the Edit Schema button. In the open dialog box, click the [+] button and add the columns that will hold the initial input data.
- Click OK.
- In the Number of rows field, enter 1.
- In the Mode area, select the Use Inline Content option.
- In the Content table, enter the customer data you want to replace with realistic values, for example:
0|4244487462024688|Nowmer|Sheri|A.|2433 Bailey Road|Tlaxiaco|Oaxaca|15057|Mexico|271-555-9715|SheriNowmer@@Tlaxiaco.org
1|3458687462024688||Sheri|A.|2433 Bailey Road|Tlaxiaco|Oaxaca|15057|Mexico|271-555-9715|SheriNowmer@Tlaxiaco.org.org
2|4639587470586299|Whelply|Derrick|I.|2219 Dewing Avenue|Sooke|BC|17172|Canada|211-555-7669|DerrickWhelply@Sooke.org
3|2541387475757600|Derry|Jeanne||7640 First Ave.|Issaquah|WA|73980|USA|656-555-2272|JeanneDerry@Issaquah.org
4|7845987500482201|Spence|Michael|J.|337 Tosca Way|Burnaby|BC|74674|Canada|929-555-7279|MichaelSpence@Burnaby.org
5|1547887514054179|Gutierrez|Maya||8668 Via Neruda|Novato|CA|57355|$$#|387-555-7172|MayaGutierrez@Novato.org
6|5469887517782449|Damstra|Robert|F.|1619 Stillman Court|Lynnwood|WA|90792|$$#|922-555-5465|RobertDamstra@Lynnwood.org
7|54896387521172800|Kanagaki|Rebecca||2860 D Mt. Hood Circle|||13343|Mexico|515-555-6247|RebeccaKanagaki@Tlaxiaco.org
8|47859687539744377||Kim|H.|6064 Brodia Court|San Andres|DF|12942|Mexico|411-555-6825|Kim@Brunner@San Andresorg
9|35698487544797658||Brenda|C.|7560 Trees Drive||BC|$$|Canada|815-555-3975|BrendaBlumberg@Richmond.org
10|36521487568712234|Stanz|Darren|M.|1019 Kenwal Rd.|$$#|OR|82017|USA|847-555-5443|DarrenStanz@Lake Oswego.org
...
Replacing actual data with realistic values
-
Double-click tDataMasking to display the
Basic settings view and define the component
properties.
-
If required, click Sync columns to retrieve
the schema defined in the input component.
-
Click the Edit schema button to open the
schema dialog box. tDataMasking proposes one predefined
read-only column, as shown in the capture below. This column identifies by true
or false whether the output record is an original or a substitute record, respectively.
-
Move any of the input columns to the output schema if you want to show them in
the results, click OK and accept to propagate
the changes.
In the Modifications table, click the [+] button to add four rows, and
perform the following actions:
- In the Input Column, select the
columns whose content you want to substitute.
- In the Category column, select
from the list the category to which the function you want to use to mask data
belongs.
- In the Function column, select
from the list the function you want to use to mask data.
- When available, in the Method
column, select from the list the method to be used by the function to
mask data.
- When available, in the Extra Parameter
column, enter a value, a pattern or a path to be used by the function to
mask data.
In this example, the Job will generate
inauthentic credit card numbers, replace the first three letters of first names,
replace last names with names from a local file and replace the local part in
email addresses with X
characters.
-
Click the Advanced settings tab and select
the Output the original row check box. The Job will add the original data rows to the substitute data.
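To make the effect of these settings more concrete, the following Java sketch imitates three of the substitutions described in this procedure, together with the pairing of each original row with its substitute. It is only an illustration under stated assumptions, not the component's implementation: the class and method names are hypothetical, the generated card numbers are plain random digits with no checksum, the replacement letters are random uppercase letters, and the replacement of last names from a local file is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

class MaskingSketch {

    /** Replaces the part before the first '@' with 'X' characters; values without '@' are returned unchanged. */
    static String maskEmailLocalPart(String email) {
        int at = email.indexOf('@');
        if (at < 0) {
            return email;
        }
        StringBuilder sb = new StringBuilder(email.length());
        for (int i = 0; i < at; i++) {
            sb.append('X');
        }
        return sb.append(email.substring(at)).toString();
    }

    /** Replaces the first n characters (fewer for short values) with random uppercase letters. */
    static String replaceFirstChars(String value, int n, Random rnd) {
        int count = Math.min(n, value.length());
        StringBuilder sb = new StringBuilder(value.length());
        for (int i = 0; i < count; i++) {
            sb.append((char) ('A' + rnd.nextInt(26)));
        }
        return sb.append(value.substring(count)).toString();
    }

    /** Generates a string of random digits with the requested length. */
    static String randomDigits(int length, Random rnd) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append((char) ('0' + rnd.nextInt(10)));
        }
        return sb.toString();
    }

    /** Emits the original row marked true, followed by its substitute marked false. */
    static List<String> withOriginal(String card, String firstName, String email, Random rnd) {
        List<String> out = new ArrayList<>();
        out.add("true|" + card + "|" + firstName + "|" + email);
        out.add("false|" + randomDigits(card.length(), rnd) + "|"
                + replaceFirstChars(firstName, 3, rnd) + "|" + maskEmailLocalPart(email));
        return out;
    }

    public static void main(String[] args) {
        Random rnd = new Random();
        for (String row : withOriginal("4639587470586299", "Derrick", "DerrickWhelply@Sooke.org", rnd)) {
            System.out.println(row);
        }
    }
}
```

In this sketch the leading true or false value plays the role of the predefined read-only column; its position in the real output schema may differ.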
Configuring the output component and executing the Job
-
Double-click the tFileOutputExcel component
to display the Basic settings view and define
the component properties.
-
Set the destination file name as well as the sheet name and then select the
Define all columns auto size check box.
-
Save your Job and press F6 to execute
it. The tDataMasking component substitutes data
in the selected columns and writes the result in an output file.
-
Right-click the output component and select Data
Viewer to display the original and substituted data. tDataMasking outputs original and substitute
rows marked respectively with true and false in the
ORIGINAL_MARK column. It generates inauthentic credit
card numbers, replaces the first three letters of first names, replaces last
names with names from a local file and finally replaces the part before the @
sign in email addresses with X characters, as defined in the component basic
settings.
Sensitive personal information in the input data has been “hidden”, but the data
still looks real and consistent. The substitute data is still usable for
purposes other than production.
tDataMasking properties for Apache Spark Batch
These properties are used to configure tDataMasking running in the Spark Batch Job framework.
The Spark Batch
tDataMasking component belongs to the Data Quality family.
The component in this framework is available in all Talend Platform products with Big Data and in Talend Data Fabric.
Basic settings
Schema and Edit Schema |
A schema is a row description. It defines the number of fields Click Sync Click Edit
The output schema of this component contains read-only
columns:
|
|
Built-In: You create and store the schema locally for this component |
|
Repository: You have already created the schema and stored it in the |
Modifications |
Define in the table what fields to change and how to change
Input Column: Select the column from
These
Category: Select a category of masking functions from the list.
Function: Select the function that
The functions you can
For example, if the column type
Method: Select the Basic method or one FF1 algorithm (Format-Preserving
The Basic method is the default
Note: Because the FF1 algorithms are stronger masking methods, it is recommended to use them rather than the Basic method.
The FF1 with AES method is based
Note: Java 8u161 is the minimum required version to use the FF1 with AES method. To use this FPE method with Java versions earlier than 8u161, download the Java Cryptography Extension (JCE) unlimited strength jurisdiction policy files from the Oracle website.
The FF1 with AES and
The Method list is only available for functions that use Format-Preserving
When using the Characters that
When selecting the Best guess alphabet, masked values contain characters
Any unrecognized character is copied to the output as is.
Extra Parameter: This field is used
Keep format: this function is only |
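The note about Java 8u161 and the JCE unlimited strength policy files can be checked on a given JVM before running a Job. The sketch below uses the standard javax.crypto.Cipher.getMaxAllowedKeyLength call. It assumes that the FF1 with AES method needs AES keys longer than 128 bits, which is what the unlimited strength policy unlocks; the source does not state the key size, and the class name is hypothetical.

```java
import java.security.NoSuchAlgorithmException;
import javax.crypto.Cipher;

class JcePolicyCheck {

    /** Returns the maximum AES key length, in bits, allowed by the installed policy files. */
    static int maxAesKeyBits() {
        try {
            return Cipher.getMaxAllowedKeyLength("AES");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("AES is not available on this JVM", e);
        }
    }

    /** True when 256-bit AES keys are permitted (assumed requirement, see the lead-in). */
    static boolean allowsAes256() {
        return maxAesKeyBits() >= 256;
    }

    public static void main(String[] args) {
        System.out.println("java.version = " + System.getProperty("java.version"));
        System.out.println("Max allowed AES key length = " + maxAesKeyBits());
        System.out.println("AES-256 allowed: " + allowsAes256());
    }
}
```

If the last line prints false on a Java version earlier than 8u161, installing the policy files mentioned in the note is the likely remedy.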
Advanced settings
Password for FF1 methods |
Set the password |
Use tweaks with FF1 Encryption |
Select this If bijective |
Seed for random generator |
Set a random number if you want to generate If you do not set the seed, the component |
Encoding |
Select the encoding from the list or select Custom and define it manually. If you select Custom and leave the field empty, the supported When you set Function to Generate from |
Output the original row |
Select this check box to output original data rows in addition to the |
Should null input return |
This check box is selected by |
Should empty input return empty |
When this check box is selected, empty values are left unchanged in |
Send invalid data to “Invalid” output flow |
This check box is selected by default.
The data are considered invalid when:
|
tStat |
Select this check box to gather the Job processing metadata at the Job level |
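The description of the Seed for random generator setting is incomplete in the table above. The general principle behind a seed can still be illustrated: a random generator initialised with the same seed produces the same sequence on every run. The Java sketch below shows this with java.util.Random. It is an assumption that the component behaves comparably; the class and method names are hypothetical.

```java
import java.util.Random;

class SeedSketch {

    /** Generates random digits from a generator initialised with the given seed. */
    static String randomDigits(int length, long seed) {
        Random rnd = new Random(seed);
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append((char) ('0' + rnd.nextInt(10)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Two calls with the same seed print the same digits.
        System.out.println(randomDigits(16, 12345L));
        System.out.println(randomDigits(16, 12345L));
        // A seed that changes between runs gives a different sequence each time.
        System.out.println(randomDigits(16, System.nanoTime()));
    }
}
```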
Usage
Usage rule |
This component is used as an intermediate step. This component, along with the Spark Batch component Palette it belongs to, Note that in this documentation, unless otherwise explicitly stated, a |
Spark Connection |
In the Spark
Configuration tab in the Run view, define the connection to a given Spark cluster for the whole Job. In addition, since the Job expects its dependent jar files for execution, you must specify the directory in the file system to which these jar files are transferred so that Spark can access these files:
This connection is effective on a per-Job basis. |
Related scenarios
No scenario is available for the Spark Batch version of this component
yet.
tDataMasking properties for Apache Spark Streaming
These properties are used to configure tDataMasking running in the Spark Streaming Job framework.
The Spark Streaming
tDataMasking component belongs to the Data Quality family.
This component is available in Talend Real Time Big Data Platform and Talend Data Fabric.
Basic settings
Schema and Edit Schema |
A schema is a row description. It defines the number of fields Click Sync Click Edit
The output schema of this component contains read-only
columns:
|
|
Built-In: You create and store the schema locally for this component |
|
Repository: You have already created the schema and stored it in the |
Modifications |
Define in the table what fields to change and how to change
Input Column: Select the column from
These
Category: Select a category of masking functions from the list.
Function: Select the function that
The functions you can
For example, if the column type
Method: Select the Basic method or one FF1 algorithm (Format-Preserving
The Basic method is the default
Note: Because the FF1 algorithms are stronger masking methods, it is recommended to use them rather than the Basic method.
The FF1 with AES method is based
Note: Java 8u161 is the minimum required version to use the FF1 with AES method. To use this FPE method with Java versions earlier than 8u161, download the Java Cryptography Extension (JCE) unlimited strength jurisdiction policy files from the Oracle website.
The FF1 with AES and
The Method list is only available for functions that use Format-Preserving
When using the Characters that
When selecting the Best guess alphabet, masked values contain characters
Any unrecognized character is copied to the output as is.
Extra Parameter: This field is used
Keep format: this function is only |
Advanced settings
Password for FF1 methods |
Set the password |
Use tweaks with FF1 Encryption |
Select this If bijective |
Seed for random generator |
Set a random number if you want to generate If you do not set the seed, the component |
Encoding |
Select the encoding from the list or select Custom and define it manually. If you select Custom and leave the field empty, the supported When you set Function to Generate from |
Output the original row |
Select this check box to output original data rows in addition to the |
Should null input return |
This check box is selected by |
Should empty input return empty |
When this check box is selected, empty values are left unchanged in |
Send invalid data to “Invalid” output flow |
This check box is selected by default.
The data are considered invalid when:
|
tStat |
Select this check box to gather the Job processing metadata at the Job level |
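The Should empty input return empty option in the table above leaves empty values unchanged when selected. A minimal Java sketch of that rule follows. The names are hypothetical. Two behaviours are assumptions, because the corresponding descriptions are cut off above: when the option is cleared, the masking function is applied to the empty value, and null values are simply passed through.

```java
import java.util.function.UnaryOperator;

class EmptyInputSketch {

    /**
     * Applies a masking function unless the value is empty and emptyReturnsEmpty is set.
     * Null values are passed through unchanged (assumption, see the lead-in).
     */
    static String applyMask(String input, boolean emptyReturnsEmpty, UnaryOperator<String> maskFn) {
        if (input == null) {
            return null;
        }
        if (input.isEmpty() && emptyReturnsEmpty) {
            return input;
        }
        return maskFn.apply(input);
    }

    public static void main(String[] args) {
        // Stand-in masking function for the demonstration.
        UnaryOperator<String> mask = s -> "MASKED";
        System.out.println("[" + applyMask("", true, mask) + "]");
        System.out.println("[" + applyMask("", false, mask) + "]");
        System.out.println("[" + applyMask("Sheri", true, mask) + "]");
    }
}
```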
Usage
Usage rule |
This component, along with the Spark Streaming component Palette it belongs to, appears This component is used as an intermediate step. You need to use the Spark Configuration tab in the This connection is effective on a per-Job basis. For further information about a Note that in this documentation, unless otherwise explicitly stated, a |
Spark Connection |
In the Spark
Configuration tab in the Run view, define the connection to a given Spark cluster for the whole Job. In addition, since the Job expects its dependent jar files for execution, you must specify the directory in the file system to which these jar files are transferred so that Spark can access these files:
This connection is effective on a per-Job basis. |
Related scenarios
No scenario is available for the Spark Streaming version of this component
yet.