July 30, 2023

tMysqlCDC – Docs for ESB 7.x

tMysqlCDC

Extracts only the changes made to the source operational data and makes them
available to the target system(s) using database CDC views.

tMysqlCDC extracts source system data
that has changed since the last extraction and transports it to one or more other systems.

tMysqlCDC Standard properties

These properties are used to configure tMysqlCDC running in the Standard Job framework.

The Standard
tMysqlCDC component belongs to the Databases family.

The component in this framework is available in all subscription-based Talend products.

Note: This component is a specific version of a dynamic database
connector. The properties related to database settings vary depending on your database
type selection. For more information about dynamic database connectors, see Dynamic database components.

Basic settings

Database

Select a type of database from the list and click
Apply.

Property type

Either Built-in or Repository.

 

Built-in: No property data stored
centrally.

 

Repository: Select the repository
file in which the properties are stored. The fields that follow are
completed automatically using the data retrieved.

Warning:

Reset the database type by clicking the relevant
button to select the CDC connection.

Use an existing connection

Select this check box and in the Component List click the relevant connection component to
reuse the connection details you already defined.

Note: When a Job contains the parent Job and the child Job, if you
need to share an existing connection between the two levels, for example, to share the
connection created by the parent Job with the child Job, you have to:

  1. In the parent level, register the database connection
    to be shared in the Basic
    settings
    view of the connection component which creates that very database
    connection.

  2. In the child level, use a dedicated connection
    component to read that registered database connection.

For an example of how to share a database connection
across Job levels, see Talend Studio User Guide.

Host

Database server IP address.

Port

Database server listening port number.

Database

Name of the database.

Username and
Password

Database user authentication data.

To enter the password, click the […] button next to the
password field, and then in the pop-up dialog box enter the password between double quotes
and click OK to save the settings.

Schema using CDC and Edit
Schema

A schema is a row description. It defines the number of fields to
be processed and passed on to the next component. The schema is
either Built-in or stored remotely
in the Repository.

 

Built-In: You create and store the schema locally for this component
only.

 

Repository: You have already created the schema and stored it in the
Repository. You can reuse it in various projects and Job designs.

Warning:

Reset the database type by clicking the relevant
button to select the schema of the CDC
connection.

 

Click Edit schema to make changes to the schema. If the current schema is of the Repository type, three options are available:

  • View schema: choose this
    option to view the schema only.

  • Change to built-in property:
    choose this option to change the schema to Built-in for local changes.

  • Update repository connection:
    choose this option to change the schema stored in the repository and decide whether
    to propagate the changes to all the Jobs upon completion. If you just want to
    propagate the changes to the current Job, you can select No upon completion and choose this schema metadata
    again in the Repository Content
    window.

Table using CDC

Select the source table from which changes made to data are to be
captured.

Subscriber

Enter the name of the application that will use the change
table.

Events to catch

Insert: Select this check box to
catch the data inserted in the change table since the last
extraction.

Update: Select this check box to
catch the data updated in the change table since the last
extraction.

Delete: Select this check box to
catch the data deleted in the change table since the last
extraction.

Limit

Maximum number of consumed rows a subscriber can recover from the
change table, per execution.
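The interaction between Events to catch and Limit can be pictured with a small standalone Java sketch. It assumes a hypothetical in-memory list of change records (the Change class and its fields are illustrative, not Talend's change-table layout). It also assumes the limit applies after the rows are filtered by event type. It is not the code Talend generates.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the Events to catch and Limit settings; the Change
// class is an illustrative model, not the layout of Talend's change table.
final class CdcEventFilter {

    enum EventType { INSERT, UPDATE, DELETE }

    static final class Change {
        final EventType type;
        final String key;

        Change(EventType type, String key) {
            this.type = type;
            this.key = key;
        }
    }

    // Keeps the selected event types in their original order and stops
    // once `limit` rows have been collected for this execution.
    static List<Change> select(List<Change> changes, Set<EventType> eventsToCatch, int limit) {
        List<Change> batch = new ArrayList<>();
        for (Change change : changes) {
            if (batch.size() >= limit) {
                break;
            }
            if (eventsToCatch.contains(change.type)) {
                batch.add(change);
            }
        }
        return batch;
    }

    public static void main(String[] args) {
        List<Change> changes = List.of(
                new Change(EventType.INSERT, "c1"),
                new Change(EventType.UPDATE, "c2"),
                new Change(EventType.DELETE, "c3"),
                new Change(EventType.UPDATE, "c4"));
        // Catch Update and Delete only, with a Limit of 2 rows per execution.
        for (Change change : select(changes, EnumSet.of(EventType.UPDATE, EventType.DELETE), 2)) {
            System.out.println(change.type + " " + change.key);
        }
        // prints "UPDATE c2", then "DELETE c3"; c4 waits for the next execution
    }
}
```

With a Limit of 2, the remaining update (c4) is not lost in this model. It simply stays pending for a later run.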

Advanced settings

Additional JDBC parameters

Specify additional connection properties for the database
connection you are creating.

Not available when the Use an existing connection check box is selected.
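As a rough illustration of where these parameters end up, the following sketch assembles a MySQL Connector/J URL from the Host, Port and Database fields and appends the additional parameters after a question mark. The MysqlUrl class is hypothetical, and the exact URL Talend generates may differ. useSSL and serverTimezone are standard Connector/J properties, used here only as sample values.

```java
// Hypothetical helper: shows how Host, Port, Database and the Additional JDBC
// parameters string could combine into a MySQL Connector/J connection URL.
final class MysqlUrl {

    static String build(String host, int port, String database, String additionalParams) {
        StringBuilder url = new StringBuilder("jdbc:mysql://")
                .append(host).append(':').append(port)
                .append('/').append(database);
        // Extra properties follow '?' as key=value pairs joined with '&'.
        if (additionalParams != null && !additionalParams.isBlank()) {
            url.append('?').append(additionalParams.trim());
        }
        return url.toString();
    }

    public static void main(String[] args) {
        System.out.println(build("localhost", 3306, "crm", "useSSL=false&serverTimezone=UTC"));
        // prints jdbc:mysql://localhost:3306/crm?useSSL=false&serverTimezone=UTC
    }
}
```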

Keep data in CDC table

Select this check box to keep the changes available
to one or more target systems, even after they have been
consulted.
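The effect of this check box on repeated extractions can be modeled as below. This is a simplified, hypothetical in-memory stand-in for one subscriber's change table, not Talend's implementation. Without Keep data in CDC table, consumed changes are removed, so the next run only sees newer changes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory stand-in for one subscriber's change table.
final class ChangeTableSketch {

    private final List<String> pending = new ArrayList<>();

    // Records a captured change, for example "U:c1" for an update of row c1.
    void capture(String change) {
        pending.add(change);
    }

    // Returns the pending changes. Unless keepData is true, they are removed
    // afterwards, so the next extraction only sees newer changes.
    List<String> consume(boolean keepData) {
        List<String> batch = new ArrayList<>(pending);
        if (!keepData) {
            pending.clear();
        }
        return batch;
    }

    public static void main(String[] args) {
        ChangeTableSketch table = new ChangeTableSketch();
        table.capture("U:c1");
        System.out.println(table.consume(true));  // [U:c1] and the change is kept
        System.out.println(table.consume(false)); // [U:c1] and the change is removed
        System.out.println(table.consume(false)); // []
    }
}
```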

Enable Streaming Result

Select this check box to enable streaming instead of buffering. Streaming allows the
code to read from a large table without consuming a large amount of memory,
which improves performance.

Trim all the String/Char columns

Select this check box to remove leading and trailing whitespace
from all the String/Char columns.

Trim column

Remove leading and trailing whitespace from defined
columns.

Note:

Clear Trim all the String/Char columns to enable Trim column in this
field.
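Here is a minimal sketch of the two trimming options. It assumes a row is modeled as a map from column name to value, and the TrimSketch class is hypothetical. It uses String.trim(), which removes leading and trailing characters up to U+0020. Talend's generated code may treat whitespace differently.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: a row is modeled as a column-name -> value map.
final class TrimSketch {

    // trimAll = true mimics "Trim all the String/Char columns"; otherwise only
    // the columns listed in `columns` are trimmed (the "Trim column" table).
    static Map<String, Object> trim(Map<String, Object> row, boolean trimAll, Set<String> columns) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (Map.Entry<String, Object> entry : row.entrySet()) {
            Object value = entry.getValue();
            boolean selected = trimAll || columns.contains(entry.getKey());
            // Only String values are trimmed; other types pass through unchanged.
            if (selected && value instanceof String) {
                value = ((String) value).trim();
            }
            out.put(entry.getKey(), value);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("CustomerName", "  Smith ");
        row.put("City", " Paris ");
        // Trim column with only CustomerName selected.
        System.out.println(trim(row, false, Set.of("CustomerName")));
        // Trim all the String/Char columns.
        System.out.println(trim(row, true, Set.of()));
    }
}
```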

tStatCatcher Statistics

Select this check box to collect log data at the component
level.

Enable parallel execution

Select this check box to perform high-speed data processing by treating
multiple data flows simultaneously. Note that this feature depends on the ability of the
database or the application to handle multiple inserts in parallel, as well as on the
number of CPUs allocated. In the Number of parallel executions
field, either:

  • Enter the number of parallel executions desired.
  • Press Ctrl + Space and select the
    appropriate context variable from the list. For further information, see
    Talend Studio User Guide.

When parallel execution is enabled, note the following restrictions:

  • The Action on table field is not available with the
    parallelization function. Therefore, you must use a tCreateTable component if you
    want to create a table.
  • When parallel execution is enabled, it is not
    possible to use global variables to retrieve return values in a
    subjob.

Global Variables


NB_LINE: the number of rows processed. This is an After
variable and it returns an integer.

ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable and it returns a string. This
variable functions only if the Die on error check box is
cleared, if the component has this check box.

A Flow variable functions during the execution of a component while an After variable
functions after the execution of the component.

To fill up a field or expression with a variable, press Ctrl + Space
to access the variable list and choose the variable to use from it.

For further information about variables, see Talend Studio User Guide.
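In Talend-generated Java code, After variables are typically read from globalMap under the component's unique name followed by the variable name, for example ((Integer) globalMap.get("tMysqlCDC_1_NB_LINE")). The sketch below assumes a component named tMysqlCDC_1. It substitutes a plain HashMap for globalMap so it runs outside a Job, and the report helper is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Standalone sketch: inside a Talend Job, globalMap is provided by the generated
// code. Here a HashMap stands in for it so the example runs on its own.
final class GlobalVarSketch {

    // After variables are stored under "<component unique name>_<VARIABLE>".
    static String key(String componentName, String variable) {
        return componentName + "_" + variable;
    }

    // Hypothetical helper that reads NB_LINE and ERROR_MESSAGE for one component.
    static String report(Map<String, Object> globalMap, String componentName) {
        Integer nbLine = (Integer) globalMap.get(key(componentName, "NB_LINE"));
        String error = (String) globalMap.get(key(componentName, "ERROR_MESSAGE"));
        return componentName + " processed " + nbLine + " rows"
                + (error == null ? "" : " (error: " + error + ")");
    }

    public static void main(String[] args) {
        Map<String, Object> globalMap = new HashMap<>();
        // Simulates the value the Job would have stored after tMysqlCDC_1 finished.
        globalMap.put("tMysqlCDC_1_NB_LINE", 42);
        System.out.println(report(globalMap, "tMysqlCDC_1"));
        // prints tMysqlCDC_1 processed 42 rows
    }
}
```

Because NB_LINE is an After variable, in a real Job it only holds a meaningful value once tMysqlCDC_1 has finished executing.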

Usage

Usage rule

This component is used as a start component. It requires an output
component and a Row > Main link.

 

Populating a data warehouse

This scenario applies only to subscription-based Talend products.

The following Java scenario creates a three-component Job that populates a data
warehouse. A tMysqlInput component reads your customer
data stored in the Customer database. A tMap component
allows you to modify this data, and the modifications are transmitted to the
Leadfact table in the CRM database through a tMysqlOutput component.

Linking the components

  1. Drop the following components from the Palette onto the design workspace: tMysqlInput, tMap, and
    tMysqlOutput.
  2. Connect the three components using Row > Main
    links.


Configuring the components

  1. In the design workspace, select tMysqlInput and click the Component tab to define its basic settings.


  2. Set Property Type to Repository and then select the connection to the
    Customer database that holds the information about your clients. The
    connection details will display automatically in the corresponding
    fields.

    Note:

    If you have not stored the database connection details in the
    Metadata entry in the Repository,
    select Built-in in the property type
    list and set the connection details manually.

  3. Set Schema to Repository and click the three-dot button to select the
    schema of the Customer database stored in the Metadata entry.

    Related topics: see Talend Studio User Guide.
  4. In the Table Name field, enter the name
    of the table holding the information you want to modify, in this example:
    customers.
  5. Click Guess Query to retrieve all data
    from your table.
  6. Double-click the tMap component to open
    the Map Editor. Notice that the Input area
    is already filled with the metadata of the input component.


  7. Drag the fields in the input zone to the fields in the Leadfact
    table in the output zone. For more information regarding data
    mapping, see Talend Studio User Guide.
  8. Click OK to validate the
    operation.
  9. In the design workspace, select tMysqlOutput and click the Component tab to define its basic settings.


  10. Set Property Type to Repository and then select the connection to the
    CRM data warehouse. The connection details will display automatically in the
    corresponding fields.

    Note:

    If you have not stored the CRM data warehouse connection details in
    the Metadata entry in the Repository,
    select Built-in in the property type
    list and set the connection details manually.

    Related topics: see Talend Studio User Guide.
  11. In the Table Name field, enter the name
    of the table you want to populate with modified data, in this example:
    leadfact.
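In step 5, Guess Query builds a SELECT statement from the columns of the schema. The sketch below imitates that idea. The GuessQuerySketch class and the column names are hypothetical, and the query text Talend actually produces (line breaks, quoting) may differ.

```java
import java.util.List;

// Hypothetical sketch of what Guess Query produces: a SELECT that lists every
// schema column of the table. Talend's exact output formatting may differ.
final class GuessQuerySketch {

    static String guess(String table, List<String> columns) {
        StringBuilder sql = new StringBuilder("SELECT ");
        for (int i = 0; i < columns.size(); i++) {
            if (i > 0) {
                sql.append(", ");
            }
            // Backticks quote identifiers in MySQL.
            sql.append('`').append(table).append("`.`").append(columns.get(i)).append('`');
        }
        return sql.append(" FROM `").append(table).append('`').toString();
    }

    public static void main(String[] args) {
        System.out.println(guess("customers", List.of("CustomerID", "CustomerName")));
        // prints SELECT `customers`.`CustomerID`, `customers`.`CustomerName` FROM `customers`
    }
}
```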

Executing the Job

  1. Press Ctrl + S to save your Job.
  2. Press F6 to run the Job to create and
    populate the table Leadfact in the CRM data
    warehouse.

Retrieving modified data using CDC

This scenario applies only to subscription-based Talend products.

This scenario is based on the preceding one. It continuously populates and
modifies the data stored in the CRM warehouse and, every night, uses the CDC function to
retrieve these modifications and save them in a dedicated table. The modifications can then
be extracted by the departments concerned.

Configuring CDC

Before being able to retrieve modified data from the CRM data warehouse, you
must:

  1. Set up the database connection dedicated to CDC,

  2. Set up a database connection to the source data and identify the table to
    catch,

  3. Set the connection between the CDC and the data.

Create connections and subscribers

  1. In the Repository tree view and under
    Metadata, create a connection to your
    database dedicated to CDC, in this scenario
    CDC_connection.

    Note:

    Ensure that the database connection for CDC is on the same server as
    the source data whose changes are to be captured.

  2. In the Repository tree view and under
    Metadata, create a connection to the
    source data warehouse and identify the table to catch, in this scenario
    CRM_connection.
  3. Right-click the CRM connection and select Retrieve schema from the drop-down menu to
    retrieve the schema of the table to catch.
  4. Right-click CDC Foundation of CRM and select Create CDC
    in the drop-down menu.

    The Create Change Data Capture dialog box displays.

  5. In the Set link Connection field, select
    CDC_connection.
  6. Click Create Subscriber. The Create Subscriber and Execute SQL Script dialog
    box displays.


  7. Click Execute and then Close.
  8. Click Finish to validate the creation of
    the subscriber table.

    In the CDC Foundation folder, the
    relevant subscriber table displays.

Specify which table the subscriber wants to subscribe to and then activate
the subscription

  1. Right-click the Leadfact schema in the source CRM and
    select Add CDC in the drop-down list. The
    Create Subscriber and Execute SQL Script
    dialog box displays.


  2. In the Events to catch check boxes,
    select Insert, Update and Delete to catch
    inserted, updated or deleted data.
  3. In the Subscriber Name field, enter the
    name of the subscriber that will have access to the modifications, in this
    scenario Sub_Mktg for the Marketing department.
  4. Click Execute and then Close to validate the subscription.

    In the CDC Foundation folder, the two
    created tables display and the schema node of the captured table is marked
    with a green CDC symbol.

Create the new subscribers Sub_Finance and
Sub_Sales for the Treasury and Sales departments
respectively

  1. Right-click Leadfact and select Edit CDC Subscribers from the drop-down list. The
    Edit CDC dialog box displays.
  2. Click Add. The Input subscriber name dialog box displays.
  3. Enter the name of the subscriber, in this scenario
    Sub_Finance and
    Sub_Sales.
  4. Click Execute and then Close to validate the creation operation.

Modifying the CRM data

Modify the data of your customers in your CRM, for example, convert all customer
names to upper case.

  1. Double-click the tMap component and
    enter row1.CustomerName.toUpperCase() in front of the CustomerName column to convert all customer names
    to upper case.
  2. Click OK.
  3. Double-click the tMysqlOutput
    component.
  4. In the Action on table field, select
    None.
  5. In the Action on data field, select
    Insert or update to insert or update
    table data.
  6. Save your Job and press F6 to execute
    it.
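The tMap expression and the Insert or update action can be sketched together in plain Java. This sketch assumes a hypothetical Leadfact table modeled as a map keyed by customer ID, and UpsertSketch is not Talend code. Because the rows already exist from the first Job, each write in this model is an update. That is consistent with the U modification type that appears when the changes are extracted.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of the modification Job: the Leadfact table is modeled as a
// map from an assumed customer ID key to the customer name.
final class UpsertSketch {

    // Mirrors the tMap expression row1.CustomerName.toUpperCase(), with a null
    // guard added (not in the documented expression) because toUpperCase() on a
    // null name would throw a NullPointerException.
    static String mapName(String customerName) {
        return customerName == null ? null : customerName.toUpperCase();
    }

    // "Insert or update": overwrite the row when the key already exists,
    // otherwise insert it. Returns true when an existing row was updated.
    static boolean insertOrUpdate(Map<Integer, String> leadfact, int customerId, String name) {
        boolean exists = leadfact.containsKey(customerId);
        leadfact.put(customerId, name);
        return exists;
    }

    public static void main(String[] args) {
        Map<Integer, String> leadfact = new LinkedHashMap<>();
        leadfact.put(1, "Ann Lee"); // row written by the first Job
        boolean updated = insertOrUpdate(leadfact, 1, mapName("Ann Lee"));
        System.out.println(updated + " " + leadfact); // true {1=ANN LEE}
    }
}
```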

To view all changes done on data, right-click the Leadfact
table and select View All Changes to open the
relevant dialog box.

Extracting change data

After setting up the CDC environment, you can now design a Job using the tMysqlCDC
component to incrementally extract the change data from the
Leadfact table. To do that:

  1. From the Palette, drop the tMysqlCDC and tLogRow components to the design workspace.
  2. Link the two components using a Row > Main
    link.


  3. Double-click the tMysqlCDC component to
    define its properties.


  4. Set Property Type to Repository and then select the connection
    corresponding to your MySQL database,
    CDC_connection in this scenario. The connection
    details will display automatically in the corresponding fields.

    Note:

    If you have not stored the CRM data warehouse connection details in
    the Metadata entry in the Repository,
    select Built-in in the property type
    list and set the connection details manually.

  5. In the Schema using CDC field, select
    Repository and then select the schema
    of the Leadfact table stored in the Metadata entry.
  6. In the Table using CDC field, enter the
    name of the table captured by the CDC, in this scenario
    Leadfact.
  7. In the Subscriber field, enter the name
    of the subscriber that will extract modified data,
    Sub_Mktg,
    Sub_Sales, and Sub_Finance for the
    Marketing, Sales and Treasury Departments respectively.
  8. In the Events to catch field, select the
    check boxes corresponding to the type of the modified data the subscriber
    will extract. In this scenario, select the three check boxes for the three
    subscribers.
  9. Double-click the tLogRow component to
    set its properties.


  10. Click the Sync columns button to
    retrieve the schema from the preceding component.
  11. Save your Job and press F6 to execute
    it.


The customer names are converted to upper case, and the modification type displayed
here is U, which stands for Update.

Once these modifications are extracted, they are no longer available in the change
table. To verify their extraction, right-click the Leadfact
table captured by CDC and then select View All
Changes. The extracted changes are no longer displayed.


Document retrieved from Talend: https://help.talend.com