tMysqlCDC
Extracts only the changes made to the source operational data and makes them
available to the target system(s) using database CDC views.
tMysqlCDC extracts source system data
that has changed since the last extraction and transports it to one or more other systems.
tMysqlCDC Standard properties
These properties are used to configure tMysqlCDC running in the Standard Job framework.
The Standard
tMysqlCDC component belongs to the Databases family.
The component in this framework is available in all subscription-based Talend products.
Note: This component is a specific version of a dynamic database connector. The properties related to database settings vary depending on your database
type selection. For more information about dynamic database connectors, see Dynamic database components.
Basic settings
Database |
Select a type of database from the list and click |
Property type |
Either Built-in or Repository. |
 |
Built-in: No property data stored |
 |
Repository: Select the repository Warning:
Reset the database type by clicking the relevant |
Use an existing connection |
Select this check box and in the Component List click the relevant connection component to Note: When a Job contains the parent Job and the child Job, if you
need to share an existing connection between the two levels, for example, to share the connection created by the parent Job with the child Job, you have to:
For an example about how to share a database connection |
Host |
Database server IP address. |
Port |
Database server listening port number. |
Database |
Name of the database. |
Username and |
Database user authentication data. To enter the password, click the […] button next to the |
Schema using CDC and Edit |
A schema is a row description, it defines the number of fields to |
 |
Built-In: You create and store the schema locally for this component |
 |
Repository: You have already created the schema and stored it in the Warning:
Reset the database type by clicking the relevant |
 |
Click Edit
|
Table using CDC |
Select the source table from which changes made to data are to be |
Subscriber |
Enter the name of the application that will use the change |
Events to catch |
Insert: Select this check box to
Update: Select this check box to
Delete: Select this check box to |
Limit |
Maximum number of consumed rows a subscriber can recover from the |
Advanced settings
Additional JDBC parameters |
Specify additional connection properties for the database Not available when the Use an existing |
Keep data in CDC table | Select this check box to keep the changes made available to one or more target systems, even after they have been consulted. |
Enable Streaming Result | Select this check box to enable streaming instead of buffering, which allows the code to read from a large table without consuming a large amount of memory, thereby optimizing performance. |
Trim all the String/Char columns |
Select this check box to remove leading and trailing whitespace |
Trim column |
Remove leading and trailing whitespace from defined Note:
Select Trim all the String/Char |
tStatCatcher Statistics |
Select this check box to collect log data at the component |
Enable parallel execution |
Select this check box to perform high-speed data processing by treating
multiple data flows simultaneously. Note that this feature depends on the ability of the database or the application to handle multiple inserts in parallel, as well as on the number of CPUs allocated. In the Number of parallel executions field, either:
Note that when parallel execution is enabled, it is not possible to use global
|
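The Host, Port, Database, and Additional JDBC parameters settings above correspond to the parts of a MySQL JDBC connection URL. The following is a minimal sketch of how such a URL is put together, assuming the standard MySQL Connector/J form `jdbc:mysql://host:port/database?key=value&key=value`. The class `MysqlJdbcUrlSketch`, the host and database names, and the sample parameters are illustrative assumptions; how the component assembles its URL internally is not described in this document.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper for illustration only; not part of Talend.
final class MysqlJdbcUrlSketch {

    // Builds a MySQL JDBC URL from the basic connection settings and
    // appends any additional JDBC parameters as a query string.
    static String build(String host, int port, String database,
                        Map<String, String> additionalParams) {
        StringBuilder url = new StringBuilder("jdbc:mysql://")
                .append(host).append(':').append(port)
                .append('/').append(database);
        if (additionalParams != null && !additionalParams.isEmpty()) {
            StringJoiner query = new StringJoiner("&", "?", "");
            for (Map.Entry<String, String> p : additionalParams.entrySet()) {
                query.add(p.getKey() + "=" + p.getValue());
            }
            url.append(query.toString());
        }
        return url.toString();
    }

    public static void main(String[] args) {
        // Sample values are assumptions, not taken from the scenario.
        Map<String, String> params = new LinkedHashMap<>();
        params.put("useSSL", "false");
        params.put("characterEncoding", "UTF-8");
        System.out.println(build("localhost", 3306, "crm", params));
    }
}
```

A `LinkedHashMap` is used so that the parameters appear in the order they were entered, which makes the resulting URL predictable.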
Global Variables
Global Variables |
NB_LINE: the number of rows processed. This is an After
ERROR_MESSAGE: the error message generated by the
A Flow variable functions during the execution of a component while an After variable
To fill up a field or expression with a variable, press Ctrl +
For further information about variables, see |
Usage
Usage rule |
This component is used as a start component. It requires an output |
Populating a data warehouse
This scenario applies only to subscription-based Talend products.
The following Java scenario creates a three-component Job that populates a data
warehouse. A tMysqlInput component reads your customer
data stored in the Customer base. A tMap component
allows you to modify this data and the modifications are transmitted to the
Leadfact table in the CRM database through a tMysqlOutput component.
Linking the components
- Drop the following components from the Palette onto the design workspace: tMysqlInput, tMap, and tMysqlOutput.
- Connect the three components using Row > Main links.
Configuring the components
- In the design workspace, select tMysqlInput and click the Component tab to define its basic settings.
- Set Property Type to Repository and then select the connection to the Customer database that holds the information about your clients. The connection details display automatically in the corresponding fields.
  Note: If you have not stored the database connection details in the Metadata entry in the Repository, select Built-in in the property type list and set the connection details manually.
- Set Schema to Repository and click the three-dot button to select the schema of the Customer database stored in the Metadata entry.
  Related topics: see Talend Studio User Guide.
- In the Table Name field, enter the name of the table holding the information you want to modify, in this example: customers.
- Click Guess Query to retrieve all data from your table.
- Double-click the tMap component to open the Map Editor. Notice that the Input area is already filled with the metadata of the input component.
- Drag the fields in the input zone to the fields in the Leadfact table in the output zone. For more information regarding data mapping, see Talend Studio User Guide.
- Click OK to validate the operation.
- In the design workspace, select tMysqlOutput and click the Component tab to define its basic settings.
- Set Property Type to Repository and then select the connection to the CRM data warehouse. The connection details display automatically in the corresponding fields.
  Note: If you have not stored the CRM data warehouse connection details in the Metadata entry in the Repository, select Built-in in the property type list and set the connection details manually.
  Related topics: see Talend Studio User Guide.
- In the Table Name field, enter the name of the table you want to populate with modified data, in this example: leadfact.
Executing the Job
- Press Ctrl + S to save your Job.
- Press F6 to run the Job to create and populate the table Leadfact in the CRM data warehouse.
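The data flow of this Job (read rows, map fields, write rows) can be sketched in plain Java. This is only an in-memory illustration under stated assumptions: lists stand in for the customers and Leadfact tables, the class and field names are hypothetical (only CustomerName appears in this document), and the sample rows are invented. It is not the code that Talend Studio generates.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// In-memory stand-in for the tMysqlInput -> tMap -> tMysqlOutput Job.
// Hypothetical names; not generated by Talend Studio.
final class PopulateLeadfactSketch {

    // One row of the source "customers" table (columns are assumptions).
    static final class CustomerRow {
        final int id;
        final String customerName;

        CustomerRow(int id, String customerName) {
            this.id = id;
            this.customerName = customerName;
        }
    }

    // One row of the target "leadfact" table (columns are assumptions).
    static final class LeadfactRow {
        final int id;
        final String customerName;

        LeadfactRow(int id, String customerName) {
            this.id = id;
            this.customerName = customerName;
        }
    }

    // tMap step: copy each input field to the matching output field.
    static LeadfactRow map(CustomerRow in) {
        return new LeadfactRow(in.id, in.customerName);
    }

    // Input -> map -> output, one row at a time.
    static List<LeadfactRow> run(List<CustomerRow> customers) {
        List<LeadfactRow> leadfact = new ArrayList<>();
        for (CustomerRow row : customers) {
            leadfact.add(map(row));
        }
        return leadfact;
    }

    public static void main(String[] args) {
        // Invented sample data.
        List<CustomerRow> customers = Arrays.asList(
                new CustomerRow(1, "Alice Martin"),
                new CustomerRow(2, "Bob Chen"));
        for (LeadfactRow row : run(customers)) {
            System.out.println(row.id + " | " + row.customerName);
        }
    }
}
```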
Retrieving modified data using CDC
This scenario applies only to subscription-based Talend products.
This scenario is based on the preceding one. It continuously populates and
modifies the data stored in the CRM warehouse and, every night, retrieves these
modifications and saves them in a dedicated table using the CDC function. The departments
concerned can then extract these modifications.
Configuring CDC
Before being able to retrieve modified data from the CRM data warehouse, you
must:
- Set up the database connection dedicated to CDC,
- Set up a database connection to the source data and identify the table to catch,
- Set the connection between the CDC and the data.
Create connections and subscribers
- In the Repository tree view and under Metadata, create a connection to your database dedicated to CDC, in this scenario CDC_connection.
  Note: Ensure that the database connection for CDC is on the same server as the source data whose changes are to be captured.
- In the Repository tree view and under Metadata, create a connection to the source data warehouse and identify the table to catch, in this scenario CRM_connection.
- Right-click the CRM connection and select Retrieve schema from the drop-down menu to retrieve the schema of the table to catch.
- Right-click CDC Foundation of CRM and select Create CDC in the drop-down menu. The Create Change Data Capture dialog box displays.
- In the Set link Connection field, select CDC_connection.
- Click Create Subscriber. The Create Subscriber and Execute SQL Script dialog box displays.
- Click Execute and then Close.
- Click Finish to validate the creation of the subscriber table. In the CDC Foundation folder, the relevant subscriber table displays.
Specify which table the subscriber wants to subscribe to and then activate the subscription
- Right-click the Leadfact schema in the source CRM and select Add CDC in the drop-down list. The Create Subscriber and Execute SQL Script dialog box displays.
- In the Events to catch check boxes, select Insert, Update and Delete to catch inserted, updated or deleted data.
- In the Subscriber Name field, enter the name of the subscriber that will have access to the modifications, in this scenario Sub_Mktg for the Marketing department.
- Click Execute and then Close to validate the subscription.
In the CDC Foundation folder, the two created tables display and the schema node of the captured table is marked with a green CDC symbol.
Create the new subscribers Sub_Finance and Sub_Sales for the Treasury and Sales departments respectively
- Right-click Leadfact and select Edit CDC Subscribers from the drop-down list. The Edit CDC dialog box displays.
- Click Add. The Input subscriber name dialog box displays.
- Enter the name of the subscriber, in this scenario Sub_Finance and Sub_Sales.
- Click Execute and then Close to validate the creation operation.
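The relationship between the events to catch, the subscribers, and the captured changes can be pictured with a small in-memory model. The sketch below assumes that every captured change is made available separately to each subscriber, and that events not selected in Events to catch are ignored. The class `SubscriberFanOutSketch` and its methods are hypothetical; they do not reproduce the tables or SQL scripts that Talend Studio creates.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory model of a CDC subscription; not Talend code.
final class SubscriberFanOutSketch {

    enum Event { INSERT, UPDATE, DELETE }

    // Events selected when CDC is added to the table.
    private final Set<Event> eventsToCatch;
    // Pending changes, kept separately for each subscriber (assumption).
    private final Map<String, List<String>> pendingBySubscriber = new LinkedHashMap<>();

    SubscriberFanOutSketch(Set<Event> eventsToCatch) {
        this.eventsToCatch = eventsToCatch;
    }

    void addSubscriber(String name) {
        pendingBySubscriber.put(name, new ArrayList<String>());
    }

    // Records a change for every subscriber, if the event is caught.
    void record(Event event, String rowKey) {
        if (!eventsToCatch.contains(event)) {
            return;
        }
        for (List<String> pending : pendingBySubscriber.values()) {
            pending.add(event + ":" + rowKey);
        }
    }

    List<String> pendingFor(String subscriber) {
        return pendingBySubscriber.get(subscriber);
    }

    public static void main(String[] args) {
        SubscriberFanOutSketch cdc =
                new SubscriberFanOutSketch(EnumSet.allOf(Event.class));
        cdc.addSubscriber("Sub_Mktg");
        cdc.addSubscriber("Sub_Finance");
        cdc.addSubscriber("Sub_Sales");
        cdc.record(Event.UPDATE, "1"); // row key "1" is invented
        System.out.println(cdc.pendingFor("Sub_Finance"));
    }
}
```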
Modifying the CRM data
Modify the data of your customers in your CRM, for example, convert all customer
names to upper case.
- Double-click the tMap component and enter row1.CustomerName.toUpperCase() in front of the CustomerName column to convert all customer names to upper case.
- Click OK.
- Double-click the tMysqlOutput component.
- In the Action on table field, select None.
- In the Action on data field, select Insert or update to insert or update table data.
- Save your Job and press F6 to execute it.
To view all changes made to the data, right-click the Leadfact
table and select View All Changes to open the
relevant dialog box.
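The two operations used in this procedure, the upper-case conversion and the Insert or update action, can be sketched as follows. Note that the expression `row1.CustomerName.toUpperCase()` throws a NullPointerException in Java if CustomerName is null, so the sketch adds a null check. The sketch assumes that rows are matched on a key column (here a hypothetical integer `id`) and uses a map in place of the Leadfact table. The code U for an update comes from this scenario; the code I for an insert is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of "Insert or update"; not Talend code.
final class InsertOrUpdateSketch {

    // Null-safe variant of row1.CustomerName.toUpperCase().
    static String toUpper(String customerName) {
        return customerName == null ? null : customerName.toUpperCase();
    }

    // Inserts the row if the key is absent, updates it otherwise.
    // Returns "I" for an insert (assumed code) or "U" for an update.
    static String insertOrUpdate(Map<Integer, String> leadfact,
                                 int id, String customerName) {
        boolean exists = leadfact.containsKey(id);
        leadfact.put(id, toUpper(customerName));
        return exists ? "U" : "I";
    }

    public static void main(String[] args) {
        Map<Integer, String> leadfact = new LinkedHashMap<>();
        // Invented sample row, written twice to show both outcomes.
        System.out.println(insertOrUpdate(leadfact, 1, "Alice Martin"));
        System.out.println(insertOrUpdate(leadfact, 1, "Alice Martin"));
        System.out.println(leadfact.get(1));
    }
}
```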
Extracting change data
After setting up the CDC environment, you can now design a Job using the tMysqlCDC
component to incrementally extract the change data from the
Leadfact table. To do that:
- From the Palette, drop the tMysqlCDC and tLogRow components to the design workspace.
- Link the two components using a Row > Main link.
- Double-click the tMysqlCDC component to define its properties.
- Set Property Type to Repository and then select the connection corresponding to your MySQL database, CDC_connection in this scenario. The connection details display automatically in the corresponding fields.
  Note: If you have not stored the CRM data warehouse connection details in the Metadata entry in the Repository, select Built-in in the property type list and set the connection details manually.
- In the Schema using CDC field, select Repository and then select the schema of the Leadfact table stored in the Metadata entry.
- In the Table using CDC field, enter the name of the table captured by the CDC, in this scenario Leadfact.
- In the Subscriber field, enter the name of the subscriber that will extract the modified data: Sub_Mktg, Sub_Sales, or Sub_Finance for the Marketing, Sales and Treasury departments respectively.
- In the Events to catch field, select the check boxes corresponding to the type of the modified data the subscriber will extract. In this scenario, select the three check boxes for the three subscribers.
- Double-click the tLogRow component to set its properties.
- Click the Sync columns button to retrieve the schema from the preceding component.
- Save your Job and press F6 to execute it.
The customer names are converted to upper case and the modification type displayed
is U, which stands for Update.
Once these modifications are extracted, they are no longer available in the modified
table. To verify their extraction, right-click the Leadfact
table captured by the CDC and then select View All
Changes. The extracted changes no longer display.
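The consumption behavior described here, together with the Limit and Keep data in CDC table settings, can be modeled in a few lines. The sketch below is an assumption-based illustration: it treats Limit as a cap on the rows returned in one run, removes consumed rows unless the keep option is set, and stores changes in a list rather than a database table. The class `ChangeConsumptionSketch` and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical in-memory model of change consumption; not Talend code.
final class ChangeConsumptionSketch {

    static final class Change {
        final String subscriber;
        final String type; // for example "U" for Update
        final String rowKey;

        Change(String subscriber, String type, String rowKey) {
            this.subscriber = subscriber;
            this.type = type;
            this.rowKey = rowKey;
        }
    }

    private final List<Change> changeTable = new ArrayList<>();

    void add(String subscriber, String type, String rowKey) {
        changeTable.add(new Change(subscriber, type, rowKey));
    }

    // Returns up to 'limit' changes for the subscriber. Consumed changes
    // are removed unless keepData is true.
    List<Change> consume(String subscriber, int limit, boolean keepData) {
        List<Change> consumed = new ArrayList<>();
        Iterator<Change> it = changeTable.iterator();
        while (it.hasNext() && consumed.size() < limit) {
            Change change = it.next();
            if (change.subscriber.equals(subscriber)) {
                consumed.add(change);
                if (!keepData) {
                    it.remove();
                }
            }
        }
        return consumed;
    }

    int remainingFor(String subscriber) {
        int count = 0;
        for (Change change : changeTable) {
            if (change.subscriber.equals(subscriber)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        ChangeConsumptionSketch cdc = new ChangeConsumptionSketch();
        cdc.add("Sub_Mktg", "U", "1"); // invented row keys
        cdc.add("Sub_Mktg", "U", "2");
        System.out.println(cdc.consume("Sub_Mktg", 10, false).size());
        System.out.println(cdc.remainingFor("Sub_Mktg"));
    }
}
```

In this model, a second call to `consume` for the same subscriber returns nothing once the changes have been extracted without the keep option, which mirrors the empty View All Changes dialog box described above.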