August 16, 2023

tVerifyEmail – Docs for ESB 6.x

tVerifyEmail

Verifies and formats email addresses against patterns and regular
expressions.

tVerifyEmail
verifies whether email addresses comply with specific rules and corrects addresses that do not
match the rules by using the content of specific columns.

Depending on the Talend solution you
are using, this component can be used in one, some or all of the following Job
frameworks:

  • Standard

  • Spark Batch

  • Spark Streaming

tVerifyEmail Standard properties

These properties are used to configure tVerifyEmail running in the Standard Job framework.

The Standard
tVerifyEmail component belongs to the Data Quality family.

The component in this framework is available when you have subscribed to one of
the Talend Platform products or Talend Data
Fabric.

Basic settings

Schema

A schema is a row description. It defines the number of fields (columns) to
be processed and passed on to the next component. The schema is either Built-In or stored remotely in the Repository.

 

Built-In: You create and store the
schema locally for this component only. Related topic: see
Talend Studio User Guide.

Repository: You have already created
the schema and stored it in the Repository. You can reuse it in various projects and
Job designs. Related topic: see
Talend Studio User Guide.

Edit Schema

Click Edit schema to make changes to the
schema. Note that if you make changes, the schema automatically becomes built-in.

The output schema of tVerifyEmail has different read-only
columns depending on the options you select in the component Basic settings
view. Read-only output columns include:

VerificationLevel: provides the verification
status of the processed email addresses, as follows:

VALID: means that the email address complies with the defined
rule.

INVALID: means that the email address does not comply with the
defined rule.

CORRECTED: means that the input email does not comply with the
defined rule and has been corrected by using the content of the selected columns. This
status is available only when you select the Use column content
option in the LOCAL Part Options section.

VERIFIED: means that the email address exists at the domain.
This status is available only when you select the Check with mail server callback
option.

REJECTED: means that the email address does not exist at the domain.
This status is available only when you select the Check with mail server callback
option.

Suggested_Email: provides suggested content
for the email part before the @ sign. The email string is built from the columns you
select in the Use column content view.

Column to validate

Select from the list the column you want to validate with tVerifyEmail.

Check the entire email with regular expression

Select this check box if you want to match the complete email address against a specific
regular expression.

Complete regular expression: enter the regular expression
against which you want to match email addresses.

This match is done as a first step to optimize the matching process and exclude addresses
that have problems before going any further to match the local and domain parts of email
addresses.
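As a rough illustration of this first step, the sketch below pre-filters addresses with a single Java regular expression before any finer checks run. The class name, method name and the pattern itself are assumptions made for this example; they are not the component's defaults, and a pattern this loose only rejects obviously malformed input.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Talend: pre-filters addresses with one
// regular expression before any local-part or domain-part check.
class WholeEmailCheck {
    // Illustrative pattern only: exactly one "@", no whitespace,
    // and at least one dot in the domain part.
    static final Pattern COMPLETE =
            Pattern.compile("^[^@\\s]+@[^@\\s]+\\.[^@\\s.]+$");

    // Returns true when the whole address matches the pattern.
    static boolean passesFirstStep(String email) {
        return email != null && COMPLETE.matcher(email).matches();
    }

    public static void main(String[] args) {
        System.out.println(passesFirstStep("jsmith@example.com")); // true
        System.out.println(passesFirstStep("jsmith.example.com")); // false
    }
}
```

Addresses that fail here can be flagged immediately, so the more specific local and domain rules only run on plausible candidates.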

LOCAL Part Options

Fields in this section will vary according to what option you select. “LOCAL part” in an
email address refers to the string before the @ sign.

Use regular expression: enter in the Pattern field the expression against which you want to check the
local part of the email address.

Use simplified pattern: enter in the Pattern field the simplified pattern against which you want to check
the local part of the email address. Select the Show syntax of simplified pattern
option to display the syntax to use for simplified patterns.

Use column content: use the fields in this view to
define the content against which you want to check the local part of the email. If the local
part does not match what you have defined, it is rewritten by using the content of the
fields.

Enable case-sensitive pattern matching: select this
check box to enable case-sensitive pattern matching of the local part of email addresses.
You can use case-sensitive pattern matching with each of the above options.
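The effect of the regular expression option combined with the case-sensitivity check box can be sketched in plain Java as follows. This is an illustration under assumptions, not the component's code: the class and method names are hypothetical, and the local part is taken to be everything before the last @ sign.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Talend: checks the local part of an
// address against a regular expression, with optional case sensitivity.
class LocalPartCheck {
    // Returns the string before the last "@", or null when there is no "@".
    static String localPart(String email) {
        int at = email.lastIndexOf('@');
        return at < 0 ? null : email.substring(0, at);
    }

    // Matches the local part against the regex; when caseSensitive is false,
    // the pattern is compiled with the CASE_INSENSITIVE flag.
    static boolean matches(String email, String regex, boolean caseSensitive) {
        String local = localPart(email);
        if (local == null) {
            return false;
        }
        int flags = caseSensitive ? 0 : Pattern.CASE_INSENSITIVE;
        return Pattern.compile(regex, flags).matcher(local).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("JSmith@example.com", "[a-z]+", false)); // true
        System.out.println(matches("JSmith@example.com", "[a-z]+", true));  // false
    }
}
```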

DOMAIN Part Options

Fields in this view will vary according to what option you select.

Check the Top-level Domains and the following ones:
select this check box to verify the part of the email address that follows the last dot.
You can use the Additional Top-level Domains table to add
further top-level domains against which you want to validate email addresses.

Check domains with a black list: select this option to
check email domains against the domains you define as blacklisted in the
Domain list table.

Check domains with a white list: select this option to
check email domains against the domains you define as whitelisted in the
Domain list table.
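The three domain checks can be pictured as simple set lookups. The sketch below is only an illustration: the class, the method names and the sample lists are assumptions, and comparing domains in lower case is a choice made here rather than documented component behavior.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of Talend: domain-side checks as set lookups.
class DomainCheck {
    // Returns the lower-cased part after the last "@", or "" when there is none.
    static String domain(String email) {
        int at = email.lastIndexOf('@');
        return at < 0 ? "" : email.substring(at + 1).toLowerCase(Locale.ROOT);
    }

    // Returns the part of the domain that follows the last dot, or "" when
    // the domain contains no dot.
    static String topLevelDomain(String email) {
        String d = domain(email);
        int dot = d.lastIndexOf('.');
        return dot < 0 ? "" : d.substring(dot + 1);
    }

    static boolean tldAllowed(String email, Set<String> allowedTlds) {
        return allowedTlds.contains(topLevelDomain(email));
    }

    static boolean blackListed(String email, Set<String> blackList) {
        return blackList.contains(domain(email));
    }

    static boolean whiteListed(String email, Set<String> whiteList) {
        return whiteList.contains(domain(email));
    }

    public static void main(String[] args) {
        Set<String> tlds = Set.of("com", "org");          // sample values
        Set<String> blackList = Set.of("spam.example");   // sample values
        System.out.println(tldAllowed("jsmith@example.com", tlds));       // true
        System.out.println(blackListed("jsmith@Spam.example", blackList)); // true
    }
}
```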

Check with mail server callback

Select this check box to enable the verification of email addresses by
the SMTP server.

With this technique, the mail server verifies the complete address
(the parts before and after the @ sign). The component establishes an SMTP
connection to the mail exchanger (MX) of the email address, then
queries the exchanger to make sure that it accepts the address as
valid. This is done in the same way as sending an email to the
address, except that the process stops once the mail exchanger accepts
or rejects the address.

It is not advisable to enable SMTP verification when you have
many email addresses with different domains to check, as some mail
servers may not reply correctly and may even blacklist your IP
address.

SMTP verification does not work properly in the following cases:

– When the mail server requires authentication,

– When the mail server has a security policy that may put your IP
address on a black list and reject your queries,

– When the mail server takes too long to reply (time out),

– When the mail server generates any other unexpected exception.

In all these cases, the component results only take into account
the results from the other rules you set in the component
settings.
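To make the callback idea concrete, the sketch below lists the SMTP commands such a check would typically send and shows one way to read the server's reply to RCPT TO. It opens no network connection. The class, the method names, the INCONCLUSIVE outcome and the exact reply codes treated as acceptance or rejection are assumptions for illustration, not the component's documented logic; real servers vary.

```java
import java.util.List;

// Hypothetical sketch, not part of Talend: the shape of an SMTP callback
// check, without any network I/O.
class SmtpCallbackSketch {
    enum Outcome { VERIFIED, REJECTED, INCONCLUSIVE }

    // Commands sent after connecting to the mail exchanger. The dialogue
    // stops at QUIT, before DATA, so no message is ever delivered.
    static List<String> dialogue(String heloHost, String sender, String recipient) {
        return List.of(
                "HELO " + heloHost,
                "MAIL FROM:<" + sender + ">",
                "RCPT TO:<" + recipient + ">",
                "QUIT");
    }

    // Interprets the reply to RCPT TO from its three-digit code.
    // Assumed mapping: 250/251 accept, 550/551/553 reject, anything else
    // (time-outs, authentication required, temporary errors) is inconclusive,
    // so the other rules decide the result.
    static Outcome interpretRcptReply(String replyLine) {
        if (replyLine == null || replyLine.length() < 3) {
            return Outcome.INCONCLUSIVE;
        }
        switch (replyLine.substring(0, 3)) {
            case "250":
            case "251":
                return Outcome.VERIFIED;
            case "550":
            case "551":
            case "553":
                return Outcome.REJECTED;
            default:
                return Outcome.INCONCLUSIVE;
        }
    }

    public static void main(String[] args) {
        dialogue("checker.example", "probe@checker.example", "jsmith@example.com")
                .forEach(System.out::println);
        System.out.println(interpretRcptReply("550 5.1.1 User unknown")); // REJECTED
    }
}
```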

Advanced settings

tStatCatcher Statistics

Select this check box to gather the Job processing metadata at the Job level
as well as at each component level.

Global Variables

ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable and it returns a string. This
variable functions only if the Die on error check box is
cleared, if the component has this check box.

A Flow variable functions during the execution of a component while an After variable
functions after the execution of the component.

To fill a field or expression with a variable, press Ctrl + Space
to access the variable list and choose the variable to use.

For further information about variables, see
Talend Studio User Guide.
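In generated Job code, component variables are conventionally read from a shared map under a key made of the component's unique name and the variable name, for example ((String)globalMap.get("tVerifyEmail_1_ERROR_MESSAGE")). The sketch below imitates that lookup with an ordinary java.util.Map so it can run on its own; the component name tVerifyEmail_1, the helper class and the sample message are assumptions for this example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for reading an After variable. In a real Job the
// map is provided by the generated code; here it is a plain HashMap.
class GlobalVariableSketch {
    // Builds the key: <component unique name>_<VARIABLE NAME>.
    static String key(String componentName, String variable) {
        return componentName + "_" + variable;
    }

    // Returns the error message stored for the component, or null if none.
    static String errorMessage(Map<String, Object> globalMap, String componentName) {
        Object value = globalMap.get(key(componentName, "ERROR_MESSAGE"));
        return value == null ? null : (String) value;
    }

    public static void main(String[] args) {
        Map<String, Object> globalMap = new HashMap<>();
        // Sample value; in a Job this is set by the component after it runs.
        globalMap.put("tVerifyEmail_1_ERROR_MESSAGE", "sample error");
        System.out.println(errorMessage(globalMap, "tVerifyEmail_1"));
    }
}
```

Because ERROR_MESSAGE is an After variable, a lookup like this is only meaningful in a component or subJob that runs after tVerifyEmail has finished.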

Usage

Usage rule

This component is usually used as an intermediate component, and it requires an
input component and an output component.

Scenario: Verify email addresses against column content and domain names

This scenario applies only to a subscription-based Talend Platform solution or Talend Data Fabric.

This scenario describes a Job which uses:

  • the tFixedFlowInput component to generate the
    email addresses to be analyzed,

  • the tVerifyEmail component to format the email
    addresses through the Talend email API,

  • the tFileOutputExcel component to output the
    formatted addresses in an .xls file.

use_case-tverifyemail.png

Setting up the Job

  1. Drop the following components from the Palette onto the design workspace: tFixedFlowInput, tVerifyEmail and
    tFileOutputExcel.
  2. Connect the three components together using the Main links.

Configuring the input component

  1. Double-click tFixedFlowInput to open its
    Basic settings view in the Component tab.

    use_case-tverifyemail2.png

  2. Create the schema through the Edit Schema
    button.

    In the open dialog box, click the [+] button
    and add the columns that will hold input address data. For this example, add
    firstname, lastname and
    email.
  3. Click OK.
  4. In the Number of rows field, enter
    1.
  5. In the Mode area, select the Use Inline Table option.
  6. In the Inline table, use the [+] button to add lines to the table and then enter the
    address data you want to analyze.

Verifying and formatting email addresses

  1. Double-click tVerifyEmail to display the
    Basic settings view and define the component
    properties.

    use_case-tverifyemail3.png

  2. If required, click Sync columns to retrieve
    the schema defined in the input component.
  3. Click the Edit schema button to open the
    schema dialog box.

    tVerifyEmail proposes predefined read-only
    address columns as shown in the capture below.
    use_case-tverifyemail4.png

    The VerificationLevel column returns the
    verification status of input email addresses. The SuggestedEmail column returns suggested content for the email part
    before the @ sign. This column is shown in the output schema only if you select
    the Use column content option in the LOCAL Part Options section. For further information about
    output columns, see tVerifyEmail Standard properties.
  4. Move any of the input columns to the output schema if you want to show them in
    the verification results, then click OK and accept
    the prompt to propagate the changes.
  5. From the Column to validate list, select the
    email column.
  6. In the LOCAL Part Options section, select the
    Use column content option.

    In this example, you want to check the email part before the @ sign to see if
    it starts with the first letter of the first name followed by the family name,
    all in lower case. If the local part does not match what you have defined,
    tVerifyEmail will rewrite it by using the
    parameters you define.
  7. In the DOMAIN Part Options, select:

    • the Check the default Top-level Domains and the following ones
      check box and define in the table the
      additional top-level domain against which you want to validate email
      addresses.

    • the Check domains with a black list
      check box and define in the Domain list
      table the domain to consider as black listed.

  8. Select the Check with mail server callback
    check box to enable the mail server to verify the complete address and accept or
    reject the email.
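The local-part rule used in this scenario (first letter of the first name followed by the family name, all in lower case) can be sketched as follows. This is an illustration under assumptions: the class and method names are hypothetical, the result is returned as a two-element array of status and suggested address, and leaving the suggestion empty for VALID and INVALID rows is a choice made here rather than documented behavior.

```java
import java.util.Locale;

// Hypothetical sketch, not part of Talend: checks the local part against
// column content and proposes a corrected address when it does not match.
class ColumnContentSketch {
    // First letter of the first name + family name, in lower case.
    static String expectedLocalPart(String firstname, String lastname) {
        return (firstname.substring(0, 1) + lastname).toLowerCase(Locale.ROOT);
    }

    // Returns {status, suggested address}. The suggestion is empty unless
    // the status is CORRECTED.
    static String[] check(String firstname, String lastname, String email) {
        int at = email.lastIndexOf('@');
        if (at < 0 || firstname.isEmpty() || lastname.isEmpty()) {
            return new String[] {"INVALID", ""};
        }
        String expected = expectedLocalPart(firstname, lastname);
        if (email.substring(0, at).equals(expected)) {
            return new String[] {"VALID", ""};
        }
        // Keep the original domain and replace only the local part.
        return new String[] {"CORRECTED", expected + email.substring(at)};
    }

    public static void main(String[] args) {
        String[] result = check("John", "Smith", "john.smith@example.com");
        System.out.println(result[0] + " " + result[1]); // CORRECTED jsmith@example.com
    }
}
```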

Configuring the output component and executing the Job

  1. Double-click the tFileOutputExcel component
    to display the Basic settings view and define
    the component properties.

    use_case-tverifyemail5.png

  2. Set the destination file name as well as the sheet name and then select the
    Define all columns auto size check box.
  3. Save your Job and press F6 to execute
    it.

    The tVerifyEmail component analyzes email
    addresses and corrects those that do not match what you have defined in the
    local and domain part options.
  4. Right-click the output component and select Data Viewer
    to display the formatted email addresses.

    use_case-tverifyemail6.png

    tVerifyEmail matches input addresses against
    the rule you set in the LOCAL part options
    section and the parameters you set for the domain names.
    The VerificationLevel output column returns
    the status as VALID, INVALID,
    CORRECTED and REJECTED according
    to what you set/selected in tVerifyEmail basic
    settings.
    All email addresses labeled as CORRECTED have a suggested
    address in the SuggestedEmail output column.

tVerifyEmail properties for Apache Spark Batch

These properties are used to configure tVerifyEmail running in the Spark Batch Job framework.

The Spark Batch
tVerifyEmail component belongs to the Data Quality family.

The component in this framework is available when you have subscribed to any Talend Platform product with Big Data or Talend Data
Fabric.

Basic settings

Schema

A schema is a row description. It defines the number of fields (columns) to
be processed and passed on to the next component. The schema is either Built-In or stored remotely in the Repository.

 

Built-In: You create and store the
schema locally for this component only. Related topic: see
Talend Studio User Guide.

Repository: You have already created
the schema and stored it in the Repository. You can reuse it in various projects and
Job designs. Related topic: see
Talend Studio User Guide.

Edit Schema

Click Edit schema to make changes to the
schema. Note that if you make changes, the schema automatically becomes built-in.

The output schema of tVerifyEmail has different read-only
columns depending on the options you select in the component Basic settings
view. Read-only output columns include:

VerificationLevel: provides the verification
status of the processed email addresses, as follows:

VALID: means that the email address complies with the defined
rule.

INVALID: means that the email address does not comply with the
defined rule.

CORRECTED: means that the input email does not comply with the
defined rule and has been corrected by using the content of the selected columns. This
status is available only when you select the Use column content
option in the LOCAL Part Options section.

VERIFIED: means that the email address exists at the domain.
This status is available only when you select the Check with mail server callback
option.

REJECTED: means that the email address does not exist at the domain.
This status is available only when you select the Check with mail server callback
option.

Suggested_Email: provides suggested content
for the email part before the @ sign. The email string is built from the columns you
select in the Use column content view.

Column to validate

Select from the list the column you want to validate with tVerifyEmail.

Check the entire email with regular expression

Select this check box if you want to match the complete email address against a specific
regular expression.

Complete regular expression: enter the regular expression
against which you want to match email addresses.

This match is done as a first step to optimize the matching process and exclude addresses
that have problems before going any further to match the local and domain parts of email
addresses.

LOCAL Part Options

Fields in this section will vary according to what option you select. “LOCAL part” in an
email address refers to the string before the @ sign.

Use regular expression: enter in the Pattern field the expression against which you want to check the
local part of the email address.

Use simplified pattern: enter in the Pattern field the simplified pattern against which you want to check
the local part of the email address. Select the Show syntax of simplified pattern
option to display the syntax to use for simplified patterns.

Use column content: use the fields in this view to
define the content against which you want to check the local part of the email. If the local
part does not match what you have defined, it is rewritten by using the content of the
fields.

Enable case-sensitive pattern matching: select this
check box to enable case-sensitive pattern matching of the local part of email addresses.
You can use case-sensitive pattern matching with each of the above options.

DOMAIN Part Options

Fields in this view will vary according to what option you select.

Check the Top-level Domains and the following ones:
select this check box to verify the part of the email address that follows the last dot.
You can use the Additional Top-level Domains table to add
further top-level domains against which you want to validate email addresses.

Check domains with a black list: select this option to
check email domains against the domains you define as blacklisted in the
Domain list table.

Check domains with a white list: select this option to
check email domains against the domains you define as whitelisted in the
Domain list table.

Global Variables

ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable and it returns a string. This
variable functions only if the Die on error check box is
cleared, if the component has this check box.

A Flow variable functions during the execution of a component while an After variable
functions after the execution of the component.

To fill a field or expression with a variable, press Ctrl + Space
to access the variable list and choose the variable to use.

For further information about variables, see
Talend Studio User Guide.

Usage

Usage rule

This component is used as an intermediate step.

This component, along with the Spark Batch component Palette it belongs to, appears only
when you are creating a Spark Batch Job.

Note that in this documentation, unless otherwise
explicitly stated, a scenario presents only Standard Jobs,
that is to say traditional Talend data integration Jobs.

Spark Connection

You need to use the Spark Configuration tab in
the Run view to define the connection to a given
Spark cluster for the whole Job. In addition, since the Job expects its dependent jar
files for execution, you must specify the directory in the file system to which these
jar files are transferred so that Spark can access these files:

  • Yarn mode: when using Google Dataproc, specify a bucket in the
    Google Storage staging bucket field in the Spark configuration tab;
    when using other distributions, use a tHDFSConfiguration component
    to specify the directory.

  • Standalone mode: you need to choose
    the configuration component depending on the file system you are using, such
    as tHDFSConfiguration or tS3Configuration.

This connection is effective on a per-Job basis.

Related scenarios

No scenario is available for the Spark Batch version of this component
yet.

tVerifyEmail properties for Apache Spark Streaming

These properties are used to configure tVerifyEmail running in the Spark Streaming Job framework.

The Spark Streaming
tVerifyEmail component belongs to the Data Quality family.

The component in this framework is available only if you have subscribed to Talend Real-time Big Data Platform or Talend Data
Fabric.

Basic settings

Schema

A schema is a row description. It defines the number of fields (columns) to
be processed and passed on to the next component. The schema is either Built-In or stored remotely in the Repository.

 

Built-In: You create and store the
schema locally for this component only. Related topic: see
Talend Studio User Guide.

Repository: You have already created
the schema and stored it in the Repository. You can reuse it in various projects and
Job designs. Related topic: see
Talend Studio User Guide.

Edit Schema

Click Edit schema to make changes to the
schema. Note that if you make changes, the schema automatically becomes built-in.

The output schema of tVerifyEmail has different read-only
columns depending on the options you select in the component Basic settings
view. Read-only output columns include:

VerificationLevel: provides the verification
status of the processed email addresses, as follows:

VALID: means that the email address complies with the defined
rule.

INVALID: means that the email address does not comply with the
defined rule.

CORRECTED: means that the input email does not comply with the
defined rule and has been corrected by using the content of the selected columns. This
status is available only when you select the Use column content
option in the LOCAL Part Options section.

VERIFIED: means that the email address exists at the domain.
This status is available only when you select the Check with mail server callback
option.

REJECTED: means that the email address does not exist at the domain.
This status is available only when you select the Check with mail server callback
option.

Suggested_Email: provides suggested content
for the email part before the @ sign. The email string is built from the columns you
select in the Use column content view.

Column to validate

Select from the list the column you want to validate with tVerifyEmail.

Check the entire email with regular expression

Select this check box if you want to match the complete email address against a specific
regular expression.

Complete regular expression: enter the regular expression
against which you want to match email addresses.

This match is done as a first step to optimize the matching process and exclude addresses
that have problems before going any further to match the local and domain parts of email
addresses.

LOCAL Part Options

Fields in this section will vary according to what option you select. “LOCAL part” in an
email address refers to the string before the @ sign.

Use regular expression: enter in the Pattern field the expression against which you want to check the
local part of the email address.

Use simplified pattern: enter in the Pattern field the simplified pattern against which you want to check
the local part of the email address. Select the Show syntax of simplified pattern
option to display the syntax to use for simplified patterns.

Use column content: use the fields in this view to
define the content against which you want to check the local part of the email. If the local
part does not match what you have defined, it is rewritten by using the content of the
fields.

Enable case-sensitive pattern matching: select this
check box to enable case-sensitive pattern matching of the local part of email addresses.
You can use case-sensitive pattern matching with each of the above options.

DOMAIN Part Options

Fields in this view will vary according to what option you select.

Check the Top-level Domains and the following ones:
select this check box to verify the part of the email address that follows the last dot.
You can use the Additional Top-level Domains table to add
further top-level domains against which you want to validate email addresses.

Check domains with a black list: select this option to
check email domains against the domains you define as blacklisted in the
Domain list table.

Check domains with a white list: select this option to
check email domains against the domains you define as whitelisted in the
Domain list table.

Global Variables

ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable and it returns a string. This
variable functions only if the Die on error check box is
cleared, if the component has this check box.

A Flow variable functions during the execution of a component while an After variable
functions after the execution of the component.

To fill a field or expression with a variable, press Ctrl + Space
to access the variable list and choose the variable to use.

For further information about variables, see
Talend Studio User Guide.

Usage

Usage rule

This component, along with the Spark Streaming component Palette it belongs to, appears
only when you are creating a Spark Streaming Job.

This component is used as an intermediate step.

You need to use the Spark Configuration tab in the
Run view to define the connection to a given Spark cluster
for the whole Job.

This connection is effective on a per-Job basis.

For further information about a Talend Spark Streaming Job, see the sections
describing how to create, convert and configure a Talend Spark Streaming Job in the
Talend Open Studio for Big Data Getting Started Guide.

Note that in this documentation, unless otherwise
explicitly stated, a scenario presents only Standard Jobs,
that is to say traditional Talend data integration Jobs.

Spark Connection

You need to use the Spark Configuration tab in
the Run view to define the connection to a given
Spark cluster for the whole Job. In addition, since the Job expects its dependent jar
files for execution, you must specify the directory in the file system to which these
jar files are transferred so that Spark can access these files:

  • Yarn mode: when using Google Dataproc, specify a bucket in the
    Google Storage staging bucket field in the Spark configuration tab;
    when using other distributions, use a tHDFSConfiguration component
    to specify the directory.

  • Standalone mode: you need to choose
    the configuration component depending on the file system you are using, such
    as tHDFSConfiguration or tS3Configuration.

This connection is effective on a per-Job basis.

Related scenarios

No scenario is available for the Spark Streaming version of this component
yet.


Document sourced from Talend: https://help.talend.com