Component family |
Databases/Oracle |
|
Function |
tOracleCDC extracts source system |
|
Purpose |
tOracleCDC addresses data |
|
Basic settings |
Property of the CDC connection |
Either Built-in or Repository. |
|
|
Built-In: You create and store the schema locally for this |
|
|
Repository: You have already created the schema and |
|
Use an existing connection |
Select this check box and in the Component List click the NoteWhen a Job contains the parent Job and the child Job, if you need to share an existing
For an example about how to share a database connection across Job levels, see |
|
Connection Type |
Select a connection type from the drop-down list:
|
|
DB Version |
Select the version of the Oracle database being used. |
|
Host |
Specify the host name or IP address of the Oracle database. This field appears only when Oracle |
|
Port |
Specify the listening port number of the Oracle database. This field appears only when Oracle |
|
Database |
Specify the Oracle database name. This field appears only when Oracle |
|
Local Service Name |
Specify the local service name of the Oracle database. This field appears only when Oracle |
|
URL |
Specify the URL of the remote Oracle database. This field appears only when Oracle |
|
Oracle schema |
Specify the name of the Oracle schema. |
|
Username and |
Specify the user authentication data of the Oracle To enter the password, click the […] button next to the |
|
Schema using CDC and Edit |
A schema is a row description, it defines the number of fields to Since version 5.6, both the Built-In mode and the Repository mode are Click Edit schema to make changes to the schema. If the
The schema is set by default according to the CDC mode being |
|
|
Built-In: You create and store the schema locally for this |
|
|
Repository: You have already created the schema and |
|
CDC mode |
Select a CDC mode to be used:
For more information about CDC modes, see Talend Studio User |
|
Generate LCR type |
Select a type for the LCR to be generated, either LCR Object or LCR This list appears only when the CDC mode is XStream. |
|
Table using CDC |
Specify the source table from which changes made to data are to be |
|
Set data table schema |
Select this check box if you want to specify the schema of the CDC This check box appears only when the CDC mode is Trigger or Log. |
|
Subscriber |
Specify the name of the application that will use the change table This field appears only when the CDC mode is Trigger. |
|
Events to catch |
Insert: Select this check box to
Update: Select this check box to
Delete: Select this check box to These check boxes appear only when the CDC mode is Trigger or Log. |
|
Limit |
Specify the maximum number of consumed rows a subscriber can This field appears only when the CDC mode is Trigger. |
|
Outbound server name |
Specify the XStream outbound server name. This field appears only when the CDC mode is XStream. |
|
Keep listening |
Select this check box to keep monitoring the outbound server for This check box appears only when the CDC mode is XStream. |
Advanced settings |
Keep data in CDC Table |
Select this check box to keep the changes made available to one or This field appears only when the CDC mode is Trigger or Log. |
|
Use cursor |
Select this check box to specify the number of rows you want to This field appears only when the CDC mode is Trigger or Log. |
|
Trim all the String/Char columns |
Select this check box to delete any spaces found at the beginning This field appears only when the CDC mode is Trigger or Log. |
|
Trim column |
Remove leading and trailing whitespace from defined NoteClear Trim all the String/Char This table appears only when the CDC mode is Trigger or Log. |
|
Acknowledge interval(seconds) |
Specify the value for the OCI_ATTR_XSTREAM_ACK_INTERVAL attribute For more infomrmation about this attribute, see http://docs.oracle.com/cd/E18283_01/server.112/e16545/xstrm_oci_intro.htm#CEGDICBB. This field appears only when the CDC mode is XStream. |
|
Idle timeout(seconds) |
Specify the value for the OCI_ATTR_XSTREAM_IDLE_TIMEOUT attribute For more information about this attribute, see http://docs.oracle.com/cd/E18283_01/server.112/e16545/xstrm_oci_intro.htm#CEGDICBB. This field appears only when the CDC mode is XStream. |
|
tStatCatcher Statistics |
Select this check box to collect the log data at component |
|
Enable parallel execution |
Select this check box to perform high-speed data processing, by treating multiple data flows
Note that when parallel execution is enabled, it is not possible to use global variables to Warning
|
Global Variables |
NB_LINE: the number of rows processed. This is an After ERROR_MESSAGE: the error message generated by the A Flow variable functions during the execution of a component while an After variable To fill up a field or expression with a variable, press Ctrl + For further information about variables, see Talend Studio |
|
Usage |
This component is used as a start component. It requires an output |
|
Log4j |
The activity of this component can be logged using the log4j feature. For more information on this feature, see Talend Studio User For more information on the log4j logging levels, see the Apache documentation at http://logging.apache.org/log4j/1.2/apidocs/org/apache/log4j/Level.html. |
The following Java scenario creates a three-component Job that populates a data
warehouse. A tOracleInput component reads your client
data stored in the CLIENT table. A tMap component allows you to modify this data and the modifications are
transmitted to the Leadfact table in the database through a
tOracleOutput component.
-
Drop the following components from the Palette onto the design workspace: tOracleInput, tMap, and
tOracleOutput. -
Connect the three components using Row Main
links. Click the link between tMap
and tOracleOutput and name it
leadfact. -
Double-click tOracleInput to display its
Basic settings view and define its
properties. -
Set Property Type to Repository and then select the connection to the
TALEND database that holds the information about your
clients. The connection details will display automatically in the corresponding
fields.Note
If you have not stored the DB connection details in the Metadata entry in the Repository, select
Built-in in the property type list and
set the connection details manually. -
Set Schema to Repository and click the three-dot button to select the schema
of the CLIENT table stored in the Metadata folder.Related topics: see Talend Studio User
Guide. -
In the Table Name field, enter the name of
the table holding the information you want to modify,
CLIENT in this example. -
Click Guess Query to retrieve all data from
your table. -
Double-click the tMap component to open the
Map Editor. Notice that the input area to
the left is already filled with the metadata of the input component. -
Drag the fields in the input zone to the fields in the leadfact
table in the output zone. For more information regarding data
mapping, see Talend Studio User
Guide. -
Click OK to validate the operation.
-
In the design workspace, double-click tOracleOutput to display its Basic
settings view and define its properties. -
Set Property Type to Repository and then select the cdc_data
connection to the data warehouse. The connection details will display
automatically in the corresponding fieldsNote
If you have not stored the data warehouse connection details in the
Metadata folder in the Repository tree view, select Built-in in the property type list and set the
connection details manually.Related topics: see Talend Studio User
Guide. -
In the Table Name field, enter the name of
the table you want to populate with modified data, LEADFACT
in this example. -
Save your job and press F6 to create and
populate the LEADFACT table in the data warehouse.
Warning
Before modifying data in the LEADFACT table and retrieving
these modifications using the tOracleCDC
component, you must activate the Redo log mode
in the Oracle database that holds the table to monitor. For more information on
how to activate the Redo log mode, see
Talend Studio User Guide.
This scenario is based on the preceding one. It continuously populates and modifies
the data stored in the LEADFACT table, and retrieves and saves,
every night, these modifications in a dedicated table using the CDC function. These
modifications could be then extracted by the various concerned departments.
Before being able to retrieve modified data from the LEADFACT
table, you must:
-
first set up the DB connection dedicated to CDC,
-
second, set up a DB connection to the source data and identify the table
to catch, -
finally, set the connection between the CDC and the data.
To do that:
-
In the Repository tree view and under
Metadata, create a connection to your
database dedicated to CDC, cdc_publisher in this
scenario.Note
Ensure that the DB connection for CDC is on the same server with the
source data to which changes are to be captured. -
In the Repository tree view and under
Metadata, create a connection to the
source data warehouse and identify the table to catch,
LEADFACT in this scenario. -
Right-click the connection to the orcle database and
select Retrieve schema from the drop-down
menu to retrieve the schema of the table to catch. -
Right-click CDC Foundation of the
cdc-data connection and select Create CDC in the drop-down menu.The [Create Change Data Capture] dialog
box displays -
Click the three-dot button next to the Set link
Connection field and select the connection that corresponds
to CDC, cdc_publisher in this example. -
In the Options area, select the Log mode check box.
-
Click Create Subscriber. The [Create Subscriber and Execute SQL Script] dialog
box displays. -
Click Execute and then Close to close the dialog box.
-
Click Finish to validate the creation of
the subscriber table.In the CDC Foundation folder, the
relevant subscriber table displays.You must specify which table the subscriber wants to subscribe to and
then -
Right-click the LEADFACT schema of the
cdc-data connection and select Add CDC in the drop-down list. The [Create Subscriber and Execute SQL Script] dialog
box displays.Note
For Oracle databases and in Talend Studio version 3.2 or subsequent versions,
the CDC system creates an alias for the source table(s) monitored. This
avoids problems relating to identifier length upon creation of the table
of changes and its associated view. For CDC systems which are already in
place, the table names are maintained. -
Click Execute and then Close to validate the subscription.
In the CDC Foundation folder, the two
created tables display and the schema node of the catched table is marked
with a green CDC symbol.
Modify the data of your clients in the LEADFACT table, for
example, convert all customer names to upper case.
-
Double-click the tMap component and enter
row1.LASTNAME.toUpperCase()
in front of the CLASTNAME column to convert all customer names to
upper case. -
Click Ok.
-
Double-click the tOracleOutput
component. -
In the Action on table field, select
None. -
In the Action on data field, select
Insert or update to insert or update
table data. -
Save your Job and press F6 to execute
it.
To view all changes done on data, right-click the LEADFACT
table and select View All Changes to open the
relevant dialog box.
After setting up the CDC environment, you can now design a job using the Oracle
CDC component to incrementally extract the change data from the
LEADFACT table. To do that:
-
From the Palette, drop the OracleCDC and tLogRow components to the design workspace.
-
Link the two components using a Row Main
link. -
Double-click tOracleCDC to open its
Basic settings view and define its
properties. -
Set Property Type to Repository and then select the schema
corresponding to your Oracle DB table, cdc_publisher in
this scenario. The connection details will display automatically in the
corresponding fieldsNote
If you have not stored the data warehouse connection details in the
Metadata folder in the Repository tree view, select Built-in in the property type list and set
the connection details manually. -
In the Schema using CDC field, select
Repository and then select the schema
of the LEADFACT table stored in the Metadata folder. -
In the Table using CDC field, enter the
name of the table captured by the CDC, in this scenario
Leadfact. -
In the Events to catch field, select the
check boxes corresponding to the type of the modified data the subscriber
will extract. In this scenario, select the three check boxes for the three
subscribers. -
Double-click tLogRow to display its
Basic settings view and define its
properties. -
Click the Sync columns button to retrieve
the schema from the preceding component. -
Click Edit schema to open the schema
dialog box. -
In the TALEND_CDC_CREATION_DATE line of
the Date Pattern column, enter between
brackets the desired date format:"yyyy-MM-dd"
. -
Save your Job and press F6 to execute
it.
In the Redo log mode, changes done on data are
indicated in the following way: modifications are equal to first, an “update and
delete” operation (UO), and then to an “update and
insert” operation (UN). Thus, client data displays
twice:
– First, data is deleted (UO).
-Second, data is inserted (UN).
Once these modifications are extracted, they are no more available in the modified
table. To verify their extraction, right-click the LEADFACT
table catched by the CDC and then select Views All
Changes. The extracted changes do not display anymore.
For another CDC scenario using the Trigger mode,
see Scenario 2: Retrieving modified data using CDC.