August 17, 2023

tOracleCDC – Docs for ESB 5.x

tOracleCDC

tOracleCDC.png

tOracleCDC properties

Component family

Databases/Oracle

 

Function

tOracleCDC extracts source system
data that has changed since the last extraction and transports it to
another/other system(s).

Purpose

tOracleCDC addresses data
extraction and transportation needs. It only extracts the changes
done to the source operational data and makes them available to the
target system(s) using database CDC views.

Basic settings

Property of the CDC connection

Either Built-in or Repository.

 

 

Built-In: You create and store the schema locally for this
component only. Related topic: see Talend Studio
User Guide.

 

 

Repository: You have already created the schema and
stored it in the Repository. You can reuse it in various projects and Job designs. Related
topic: see Talend Studio User Guide.

 

Use an existing connection

Select this check box and in the Component List click the
relevant connection component to reuse the connection details you already defined.

Note

When a Job contains the parent Job and the child Job, if you need to share an existing
connection between the two levels, for example, to share the connection created by the
parent Job with the child Job, you have to:

  1. In the parent level, register the database connection to be shared in the
    Basic settings view of the connection
    component which creates that very database connection.

  2. In the child level, use a dedicated connection component to read that
    registered database connection.

For an example about how to share a database connection across Job levels, see
Talend Studio User
Guide
.

 

Connection Type

Select a connection type from the drop-down list:

  • Oracle SID: Select this
    connection type to uniquely identify a particular database
    on a system.

  • Oracle Service Name: Select
    this connection type to use the TNS alias that you give when
    you connect to the remote database.

  • Oracle OCI: Select this
    connection type to use Oracle Call Interface with a set of
    C-language software APIs that provide an interface to the
    Oracle database.

  • Oracle CUSTOM: Select this
    connection type to access a clustered database.

 

DB Version

Select the version of the Oracle database being used.

 

Host

Specify the host name or IP address of the Oracle database.

This field appears only when Oracle
SID
or Oracle Service
Name
is selected from the Connection Type drop-down list.

 

Port

Specify the listening port number of the Oracle database.

This field appears only when Oracle
SID
or Oracle Service
Name
is selected from the Connection Type drop-down list.

 

Database

Specify the Oracle database name.

This field appears only when Oracle
SID
or Oracle Service
Name
is selected from the Connection Type drop-down list.

 

Local Service Name

Specify the local service name of the Oracle database.

This field appears only when Oracle
OCI
is selected from the Connection Type drop-down list.

 

URL

Specify the URL of the remote Oracle database.

This field appears only when Oracle
CUSTOM
is selected from the Connection Type drop-down list.

 

Oracle schema

Specify the name of the Oracle schema.

 

Username and
Password

Specify the user authentication data of the Oracle
database.

To enter the password, click the […] button next to the
password field, and then in the pop-up dialog box enter the password between double quotes
and click OK to save the settings.

 

Schema using CDC and Edit
schema

A schema is a row description, it defines the number of fields to
be processed and passed on to the next component. The schema is
either Built-in or stored remotely
in the Repository.

Since version 5.6, both the Built-In mode and the Repository mode are
available in any of the Talend solutions.

Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:

  • View schema: choose this option to view the
    schema only.

  • Change to built-in property: choose this option
    to change the schema to Built-in for local
    changes.

  • Update repository connection: choose this option to change
    the schema stored in the repository and decide whether to propagate the changes to
    all the Jobs upon completion. If you just want to propagate the changes to the
    current Job, you can select No upon completion and
    choose this schema metadata again in the [Repository
    Content]
    window.

The schema is set by default according to the CDC mode being
used.

 

 

Built-In: You create and store the schema locally for this
component only. Related topic: see Talend Studio
User Guide.

 

 

Repository: You have already created the schema and
stored it in the Repository. You can reuse it in various projects and Job designs. Related
topic: see Talend Studio User Guide.

 

CDC mode

Select a CDC mode to be used:

  • Trigger: the CDC trigger
    mode.

  • Log: the CDC Redo/Archive
    log mode. This mode is available only when the DB Version is Oracle 11-6 or a previous
    version.

  • XStream: the CDC XStream
    mode. This mode is available only when the Connection Type is Oracle OCI and the DB Version is Oracle 12-7.

For more information about CDC modes, see Talend Studio User
Guide
.

 

Generate LCR type

Select a type for the LCR to be generated, either LCR Object or LCR
XML Document
.

This list appears only when the CDC mode is XStream.

 

Table using CDC

Specify the source table from which changes made to data are to be
captured.

 

Set data table schema

Select this check box if you want to specify the schema of the CDC
table.

This check box appears only when the CDC mode is Trigger or Log.

 

Subscriber

Specify the name of the application that will use the change table
if you use the Trigger mode to
capture changes done on data.

This field appears only when the CDC mode is Trigger.

 

Events to catch

Insert: Select this check box to
catch the data inserted in the change table since the last
extraction.

Update: Select this check box to
catch the data updated in the change table since the last
extraction.

Delete: Select this check box to
catch the data deleted in the change table since the last
extraction.

These check boxes appear only when the CDC mode is Trigger or Log.

 

Limit

Specify the maximum number of consumed rows a subscriber can
recover from the change table, per execution.

This field appears only when the CDC mode is Trigger.

 

Outbound server name

Specify the XStream outbound server name.

This field appears only when the CDC mode is XStream.

 

Keep listening

Select this check box to keep monitoring the outbound server for
data changes. The Job will not stop until you manually stop it when
you need to.

This check box appears only when the CDC mode is XStream.

Advanced settings

Keep data in CDC Table

Select this check box to keep the changes made available to one or
more target systems, even after they have been consulted.

This field appears only when the CDC mode is Trigger or Log.

 

Use cursor

Select this check box to specify the number of rows you want to
work with at any given time. This option optimises
performance.

This field appears only when the CDC mode is Trigger or Log.

 

Trim all the String/Char columns

Select this check box to delete any spaces found at the beginning
or end of fields, in all of the columns containing character
strings.

This field appears only when the CDC mode is Trigger or Log.

 

Trim column

Remove leading and trailing whitespace from defined
columns.

Note

Clear Trim all the String/Char
columns
to enable Trim columns in the
table.

This table appears only when the CDC mode is Trigger or Log.

 

Acknowledge interval(seconds)

Specify the value for the OCI_ATTR_XSTREAM_ACK_INTERVAL attribute
for the outbound server. The default value is 30 seconds.

For more infomrmation about this attribute, see http://docs.oracle.com/cd/E18283_01/server.112/e16545/xstrm_oci_intro.htm#CEGDICBB.

This field appears only when the CDC mode is XStream.

 

Idle timeout(seconds)

Specify the value for the OCI_ATTR_XSTREAM_IDLE_TIMEOUT attribute
for the outbound server. The default value is 1 second.

For more information about this attribute, see http://docs.oracle.com/cd/E18283_01/server.112/e16545/xstrm_oci_intro.htm#CEGDICBB.

This field appears only when the CDC mode is XStream.

 

tStatCatcher Statistics

Select this check box to collect the log data at component
level.

 

Enable parallel execution

Select this check box to perform high-speed data processing, by treating multiple data flows
simultaneously. Note that this feature depends on the database or the application ability to
handle multiple inserts in parallel as well as the number of CPU affected. In the Number of parallel executions field, either:

  • Enter the number of parallel executions desired.

  • Press Ctrl + Space and select the appropriate
    context variable from the list. For further information, see Talend Studio
    User Guide
    .

Note that when parallel execution is enabled, it is not possible to use global variables to
retrieve return values in a subjob.

Warning

  • The Action on table
    field is not available with the parallelization function. Therefore, you
    must use a tCreateTable component if you
    want to create a table.

  • When parallel execution is enabled, it is not possible to use global
    variables to retrieve return values in a subjob.

Global Variables 

NB_LINE: the number of rows processed. This is an After
variable and it returns an integer.

ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable and it returns a string. This
variable functions only if the Die on error check box is
cleared, if the component has this check box.

A Flow variable functions during the execution of a component while an After variable
functions after the execution of the component.

To fill up a field or expression with a variable, press Ctrl +
Space
to access the variable list and choose the variable to use from it.

For further information about variables, see Talend Studio
User Guide.

Usage

This component is used as a start component. It requires an output
component and row Main link.

Log4j

The activity of this component can be logged using the log4j feature. For more information on this feature, see Talend Studio User
Guide
.

For more information on the log4j logging levels, see the Apache documentation at http://logging.apache.org/log4j/1.2/apidocs/org/apache/log4j/Level.html.

Scenario 1: Populating a data warehouse

The following Java scenario creates a three-component Job that populates a data
warehouse. A tOracleInput component reads your client
data stored in the CLIENT table. A tMap component allows you to modify this data and the modifications are
transmitted to the Leadfact table in the database through a
tOracleOutput component.

  1. Drop the following components from the Palette onto the design workspace: tOracleInput, tMap, and
    tOracleOutput.

  2. Connect the three components using Row Main
    links. Click the link between tMap
    and tOracleOutput and name it
    leadfact.

    Use_Case_tOracleCDC1_1.png
  3. Double-click tOracleInput to display its
    Basic settings view and define its
    properties.

    Use_Case_tOracleCDC1_2.png
  4. Set Property Type to Repository and then select the connection to the
    TALEND database that holds the information about your
    clients. The connection details will display automatically in the corresponding
    fields.

    Note

    If you have not stored the DB connection details in the Metadata entry in the Repository, select
    Built-in in the property type list and
    set the connection details manually.

  5. Set Schema to Repository and click the three-dot button to select the schema
    of the CLIENT table stored in the Metadata folder.

    Related topics: see Talend Studio User
    Guide
    .

  6. In the Table Name field, enter the name of
    the table holding the information you want to modify,
    CLIENT in this example.

  7. Click Guess Query to retrieve all data from
    your table.

  8. Double-click the tMap component to open the
    Map Editor. Notice that the input area to
    the left is already filled with the metadata of the input component.

    Use_Case_tOracleCDC1_3.png
  9. Drag the fields in the input zone to the fields in the leadfact
    table in the output zone. For more information regarding data
    mapping, see Talend Studio User
    Guide
    .

  10. Click OK to validate the operation.

  11. In the design workspace, double-click tOracleOutput to display its Basic
    settings
    view and define its properties.

    Use_Case_tOracleCDC1_4.png
  12. Set Property Type to Repository and then select the cdc_data
    connection to the data warehouse. The connection details will display
    automatically in the corresponding fields

    Note

    If you have not stored the data warehouse connection details in the
    Metadata folder in the Repository tree view, select Built-in in the property type list and set the
    connection details manually.

    Related topics: see Talend Studio User
    Guide
    .

  13. In the Table Name field, enter the name of
    the table you want to populate with modified data, LEADFACT
    in this example.

  14. Save your job and press F6 to create and
    populate the LEADFACT table in the data warehouse.

Scenario 2: Retrieving modified data using Oracle CDC Redo log mode

Warning

Before modifying data in the LEADFACT table and retrieving
these modifications using the tOracleCDC
component, you must activate the Redo log mode
in the Oracle database that holds the table to monitor. For more information on
how to activate the Redo log mode, see
Talend Studio User Guide.

This scenario is based on the preceding one. It continuously populates and modifies
the data stored in the LEADFACT table, and retrieves and saves,
every night, these modifications in a dedicated table using the CDC function. These
modifications could be then extracted by the various concerned departments.

Step 1: Configuring CDC

Before being able to retrieve modified data from the LEADFACT
table, you must:

  1. first set up the DB connection dedicated to CDC,

  2. second, set up a DB connection to the source data and identify the table
    to catch,

  3. finally, set the connection between the CDC and the data.

    To do that:

  4. In the Repository tree view and under
    Metadata, create a connection to your
    database dedicated to CDC, cdc_publisher in this
    scenario.

    Note

    Ensure that the DB connection for CDC is on the same server with the
    source data to which changes are to be captured.

  5. In the Repository tree view and under
    Metadata, create a connection to the
    source data warehouse and identify the table to catch,
    LEADFACT in this scenario.

  6. Right-click the connection to the orcle database and
    select Retrieve schema from the drop-down
    menu to retrieve the schema of the table to catch.

  7. Right-click CDC Foundation of the
    cdc-data connection and select Create CDC in the drop-down menu.

    The [Create Change Data Capture] dialog
    box displays

    Use_Case_tOracleCDC_Create_Sub2_1.png
  8. Click the three-dot button next to the Set link
    Connection
    field and select the connection that corresponds
    to CDC, cdc_publisher in this example.

  9. In the Options area, select the Log mode check box.

  10. Click Create Subscriber. The [Create Subscriber and Execute SQL Script] dialog
    box displays.

    Use_Case_tOracleCDC_Create_Sub2_2.png
  11. Click Execute and then Close to close the dialog box.

  12. Click Finish to validate the creation of
    the subscriber table.

    In the CDC Foundation folder, the
    relevant subscriber table displays.

    You must specify which table the subscriber wants to subscribe to and
    then

  13. Right-click the LEADFACT schema of the
    cdc-data connection and select Add CDC in the drop-down list. The [Create Subscriber and Execute SQL Script] dialog
    box displays.

    Note

    For Oracle databases and in Talend Studio version 3.2 or subsequent versions,
    the CDC system creates an alias for the source table(s) monitored. This
    avoids problems relating to identifier length upon creation of the table
    of changes and its associated view. For CDC systems which are already in
    place, the table names are maintained.

    Use_Case_tOracleCDC_Create_Sub2_3.png
  14. Click Execute and then Close to validate the subscription.

    In the CDC Foundation folder, the two
    created tables display and the schema node of the catched table is marked
    with a green CDC symbol.

    Use_Case_tOracleCDC_arbo.png

Step 2: Modifying the data

Modify the data of your clients in the LEADFACT table, for
example, convert all customer names to upper case.

  1. Double-click the tMap component and enter
    row1.LASTNAME.toUpperCase()in front of the CLASTNAME column to convert all customer names to
    upper case.

  2. Click Ok.

  3. Double-click the tOracleOutput
    component.

  4. In the Action on table field, select
    None.

  5. In the Action on data field, select
    Insert or update to insert or update
    table data.

  6. Save your Job and press F6 to execute
    it.

To view all changes done on data, right-click the LEADFACT
table and select View All Changes to open the
relevant dialog box.

Step 3: Extracting change data

After setting up the CDC environment, you can now design a job using the Oracle
CDC component to incrementally extract the change data from the
LEADFACT table. To do that:

  1. From the Palette, drop the OracleCDC and tLogRow components to the design workspace.

  2. Link the two components using a Row Main
    link.

    Use_Case_tOracleCDC2_1.png
  3. Double-click tOracleCDC to open its
    Basic settings view and define its
    properties.

    Use_Case_tOracleCDC2_2.png
  4. Set Property Type to Repository and then select the schema
    corresponding to your Oracle DB table, cdc_publisher in
    this scenario. The connection details will display automatically in the
    corresponding fields

    Note

    If you have not stored the data warehouse connection details in the
    Metadata folder in the Repository tree view, select Built-in in the property type list and set
    the connection details manually.

  5. In the Schema using CDC field, select
    Repository and then select the schema
    of the LEADFACT table stored in the Metadata folder.

  6. In the Table using CDC field, enter the
    name of the table captured by the CDC, in this scenario
    Leadfact.

  7. In the Events to catch field, select the
    check boxes corresponding to the type of the modified data the subscriber
    will extract. In this scenario, select the three check boxes for the three
    subscribers.

  8. Double-click tLogRow to display its
    Basic settings view and define its
    properties.

    Use_Case_tMysqlCDC3.png
  9. Click the Sync columns button to retrieve
    the schema from the preceding component.

  10. Click Edit schema to open the schema
    dialog box.

  11. In the TALEND_CDC_CREATION_DATE line of
    the Date Pattern column, enter between
    brackets the desired date format: "yyyy-MM-dd".

  12. Save your Job and press F6 to execute
    it.

In the Redo log mode, changes done on data are
indicated in the following way: modifications are equal to first, an “update and
delete” operation (UO), and then to an “update and
insert” operation (UN). Thus, client data displays
twice:

– First, data is deleted (UO).

-Second, data is inserted (UN).

Once these modifications are extracted, they are no more available in the modified
table. To verify their extraction, right-click the LEADFACT
table catched by the CDC and then select Views All
Changes
. The extracted changes do not display anymore.

For another CDC scenario using the Trigger mode,
see Scenario 2: Retrieving modified data using CDC.


Document get from Talend https://help.talend.com
Thank you for watching.
Subscribe
Notify of
guest
0 Comments
Inline Feedbacks
View all comments
0
Would love your thoughts, please comment.x
()
x