tPatternMasking
Masks data that follows a specific pattern and can transform the original data in
a consistent manner, if needed.
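As an illustration of what masking "in a consistent manner" can mean, the following minimal sketch derives the replacement digits from a keyed hash of the input, so the same input and key always give the same masked value. It is not the component's implementation; the function name `consistent_mask` and the `key` parameter are hypothetical and exist only for this example.

```python
import hashlib
import hmac
import random


def consistent_mask(value, key):
    """Mask digits so that the same (value, key) pair always yields the same output.

    Hypothetical helper for illustration only; not part of tPatternMasking.
    """
    # Derive a repeatable seed from the input value and a secret key.
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).digest()
    rng = random.Random(int.from_bytes(digest, "big"))
    # Replace digits with pseudo-random digits; leave all other characters alone.
    return "".join(str(rng.randrange(10)) if ch.isdigit() else ch for ch in value)


first = consistent_mask("615/67/7489", b"example-key")
second = consistent_mask("615/67/7489", b"example-key")
print(first == second)  # the two calls agree because the seed is derived from the input
```

Seeding from a keyed hash rather than from the raw value means that someone without the key cannot reproduce the mapping.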
If you need to mask heterogeneous data, you can use the
tDataMasking component. For more information, see the
tDataMasking documentation on Talend Help Center (https://help.talend.com).
tPatternMasking replaces pattern-specific and generic data with random
characters from a specified range of date and numeric values or a set of named values.
The actual data is protected, and the substitute data remains functional for situations
where showing sensitive real data is not advisable.
The masked data keeps looking real and consistent and remains usable for purposes such as
testing and training. Masking is most commonly needed for data that contains
Personally Identifiable Information (PII) or Sensitive Personal Data (SPD).
The component creates a similar but inauthentic version of the data after performing the
data masking operations you defined on data fields:
- The component identifies spaces, slashes (/), dashes (-) and points (.) in the
input as separators.
- The component preserves the pattern of the input values in the masked
output.
- The component generates one row for each input row.
For example, the masked output for 615/67/7489 could be
379/48/1789.
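The behavior described above can be sketched in a few lines of Python. This is only an approximation under stated assumptions, not the component's actual algorithm: the function `mask_preserving_pattern` and the constant `SEPARATORS` are hypothetical names, and the choice to leave non-digit, non-separator characters unchanged is an assumption made for this example.

```python
import random

# The four separators named in the list above: space, slash, dash and point.
SEPARATORS = set(" /-.")


def mask_preserving_pattern(value, rng):
    """Replace each digit with a random digit and keep separators in place."""
    out = []
    for ch in value:
        if ch in SEPARATORS:
            out.append(ch)  # separators are copied through unchanged
        elif ch.isdigit():
            out.append(str(rng.randrange(10)))  # digit replaced by a random digit
        else:
            out.append(ch)  # assumption: other characters are left as they are
    return "".join(out)


rng = random.Random()
masked = mask_preserving_pattern("615/67/7489", rng)
print(masked)  # same layout as the input: 3 digits, slash, 2 digits, slash, 4 digits
```

Because the function returns exactly one string per call, mapping it over a list of inputs yields one output row for each input row.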
When the input data does not match the pattern you defined,
tPatternMasking outputs null.
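A minimal sketch of this fallback, assuming the pattern can be expressed as a regular expression: inputs that do not match the whole pattern produce `None` (Python's counterpart of null), while every input row still yields one output row. The pattern and the function name `mask_or_none` are hypothetical and are not taken from the component's configuration.

```python
import random
import re

# Hypothetical pattern for illustration: 3 digits, 2 digits and 4 digits,
# separated by slashes.
PATTERN = re.compile(r"\d{3}/\d{2}/\d{4}")


def mask_or_none(value, rng):
    """Return a masked value when the input matches PATTERN, otherwise None."""
    if value is None or PATTERN.fullmatch(value) is None:
        return None
    # Replace each digit with a random digit; the slashes stay in place.
    return "".join(str(rng.randrange(10)) if ch.isdigit() else ch for ch in value)


rng = random.Random()
rows = ["615/67/7489", "61567-7489", "abc"]
masked_rows = [mask_or_none(v, rng) for v in rows]  # one output per input row
print(masked_rows)  # the second and third entries are None
```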
In local mode, Apache Spark 1.6.0, 2.2.0, 2.3.0 and 2.4.0 are supported.
Depending on the Talend product you are using, this component can be used in one, some
or all of the following Job frameworks:
- Standard: see tPatternMasking Standard properties.
The component in this framework is available in Talend Data Management Platform, Talend Big Data Platform, Talend Real Time Big Data Platform, Talend Data Services Platform, Talend MDM Platform and in Talend Data Fabric.
- Spark Batch: see tPatternMasking properties for Apache Spark Batch.
The component in this framework is available in all Talend Platform products with Big Data and in Talend Data Fabric.
- Spark Streaming: see tPatternMasking properties for Apache Spark Streaming.
This component is available in Talend Real Time Big Data Platform and Talend Data Fabric.
- tPatternMasking Standard properties
These properties are used to configure tPatternMasking running in the Standard Job framework.
- tPatternMasking properties for Apache Spark Batch
These properties are used to configure tPatternMasking running in the Spark Batch Job framework.
- tPatternMasking properties for Apache Spark Streaming
These properties are used to configure tPatternMasking running in the Spark Streaming Job framework.
- Masking Australian phone numbers
The Job in this scenario uses the tPatternMasking component to mask Australian phone numbers in the same format as the input values.
- Masking Medicare beneficiary identifiers