
Component family |
Databases/MySQL |
|
Function |
tMysqlSCD reflects and tracks |
|
Purpose |
tMysqlSCD addresses Slowly |
|
Basic settings |
Property type |
Either Built-in or Repository. Since version 5.6, both the Built-In mode and the Repository mode are |
|
|
Built-in: No property data stored |
|
|
Repository: Select the Repository |
|
Use an existing connection |
Select this check box and in the Component List click the NoteWhen a Job contains the parent Job and the child Job, if you need to share an existing
For an example about how to share a database connection across Job levels, see |
|
DB Version |
Select the Mysql version you are using. |
|
Host |
Database server IP address. |
|
Port |
Listening port number of DB server. |
|
Database |
Name of the database. |
|
Username and |
DB user authentication data. To enter the password, click the […] button next to the |
|
Table |
Name of the table to be written. Note that only one table can be |
|
Action on table |
On the table defined, you can perform one of the following
None: No operation is carried
Create a table: The table does not
Create a table if not exists: The |
|
Schema and Edit |
A schema is a row description. It defines the number of fields to be processed and passed on Since version 5.6, both the Built-In mode and the Repository mode are Click Edit schema to make changes to the schema. If the
|
|
|
Built-in: The schema is created |
|
|
Repository: The schema already |
|
SCD Editor |
The SCD editor helps to build and configure the data flow for For more information, see SCD management methodologies. |
|
Use memory saving mode |
Select this check box to maximize system performance. |
|
Source keys include Null |
Select this check box to allow the source key columns to have Null WarningSpecial attention should be paid to the uniqueness of the |
|
Die on error |
This check box is cleared by default, meaning to skip the row on |
Advanced settings |
Additional JDBC Parameters |
Specify additional connection properties for the DB connection you |
|
End date time details |
Specify the time value of the SCD end date time setting in the This field appears only when SCD Type |
|
tStatCatcher Statistics |
Select this check box to collect log data at the component |
|
Debug mode |
Select this check box to display each step during processing |
Dynamic settings |
Click the [+] button to add a row in the table and fill The Dynamic settings table is available only when the For more information on Dynamic settings and context |
|
Global Variables |
NB_LINE_UPDATED: the number of rows updated. This is an
NB_LINE_INSERTED: the number of rows inserted. This is an
NB_LINE_REJECTED: the number of rows rejected. This is an ERROR_MESSAGE: the error message generated by the A Flow variable functions during the execution of a component while an After variable To fill up a field or expression with a variable, press Ctrl + For further information about variables, see Talend Studio |
|
Usage |
This component is used as Output component. It requires an Input |
Slowly Changing Dimensions (SCDs) are dimensions that have data that slowly
changes. The SCD editor offers the simplest method of building the data flow for the
SCD outputs. In the SCD editor, you can map columns, select surrogate key columns,
and set column change attributes through combining SCD types.
The following figure illustrates an example of the SCD editor.

You must choose one or more source keys columns from the incoming data to
ensure its unicity.
You must set one surrogate key column in the dimension table and map it to an
input column in the source table. The value of the surrogate key links a record
in the source to a record in the dimension table. The editor uses this mapping
to locate the record in the dimension table and to determine whether a record is
new or changing. The surrogate key is typically the primary key in the source,
but it can be an alternate key as long as it uniquely identifies a record and
its value does not change.
Source keys: Drag one or more columns from
the Unused panel to the Source keys panel to be used as the key(s) that ensure the
unicity of the incoming data.
Surrogate keys: Set the column where the
generated surrogate key will be stored. A surrogate key can be generated based
on a method selected on the Creation
list.
Creation: Select any of the below methods to
be used for the key generation:
Auto increment: auto-incremental key.
Input field: key is provided in an input
field.
When selected, you can drag the appropriate field from the Unused panel to the complement field.
Routine: from the complement field, you can press Ctrl+
Space to display the autocompletion list and select the
appropriate routine.
Table max +1: the maximum value from the SCD
table is incremented to create a surrogate key.
DB Sequence: from the complement field, you can enter the name of the existing
database sequence that will automatically increment the column indicated in the
name field.
Note
This option is only available through the SCD
Editor of the tOracleSCD
component.
The Slowly Changing Dimensions support four types of changes: Type 0 through Type
3. You can apply any of the SCD types to any column in a source
table by a simple drag-and-drop operation.
Type 0: is not used frequently. Some
dimension data may be overwritten and other may stay unchanged over time. This
is most appropriate when no effort has been made to deal with the changing
dimension issues.
Type 1: no history is kept in the database.
New data overwrites old data. Use this type if tracking changes is not
necessary. this is most appropriate when correcting certain typos, for example
the spelling of a name.
Type2: the whole history is stored in the
database. This type tracks historical data by inserting a new record in the
dimensional table with a separate key each time a change is made. This is most
appropriate to track updates, for example.
SCD Type 2 principle lies in the fact that a
new record is added to the SCD table when changes are detected on the columns
defined. Note that although several changes may be made to the same record on
various columns defined as SCD Type 2, only one
additional line tracks these changes in the SCD table.
The SCD schema in this type should include SCD-specific extra columns that
hold standard log information such as:
–start: adds a column to your SCD schema to
hold the start date. You can select one of the input schema columns as a start
date in the SCD table.
–end: adds a column to your SCD schema to
hold the end date value for a record. When the record is currently active, the
end date is NULL or you can select Fixed Year Value and fill in a fictive year to avoid
having a null value in the end date field.
–version: adds a column to your SCD schema
to hold the version number of the record.
–active: adds a column to your SCD schema to
hold the true or false status value. this column helps to easily spot the active
record.
Type 3: only the information about a
previous value of a dimension is written into the database. This type tracks
changes using separate columns. This is most appropriate to track only the
previous value of a changing column.
This five-component Java scenario describes a Job that tracks changes in four of the
columns in a source delimited file, writes changes and the history of changes in an SCD
table, and displays error information on the Run
console.
The source delimited file contains various personal details including
firstname, lastname,
address, city,
company, age, and
status. An id column helps ensuring the
unicity of the data.

We want any change in the marital status to overwrite the existing old status record.
This type of change is equivalent to an SCD Type
1.
We want to insert a new record in the dimensional table with a separate key each time
a person changes his/her company. This type of change is equivalent to an SCD Type 2.
We want to track only the previous city and previous address of a person. This type of
change is equivalent to an SCD Type 3.
To realize this kind of scenario, it is better to divide it into three main steps:
defining the main flow of the Job, setting up the SCD editor, and finally creating the
relevant SCD table in the database.
-
Drop the following components from the Palette onto the design workspace: a tMysqlConnection, a tFileInputDelimited, a tMysqlSCD, a tMysqlCommit,
and two tLogRow components. -
Connect the tFileInputDelimited, the
first tLogRow, and the tMysqlSCD using the Row
Main link. This is the main flow of your Job. -
Connect the tMysqlConnection to the
tFileInputDelimited and tMysqlSCD to tMysqlCommit using the OnComponntOk trigger. -
Connect the tMysqlSCD to the second
tLogRow using the Row
Rejects link. Two columns,
errorCode and errorMessage,
are added to the schema. This connection collects error information.
-
In the design workspace, double-click tMysqlConnection to display its Basic
settings view and set the database connection details. The
tMysqlConnection component should be
used to avoid setting several times the same DB connection when multiple DB
components are used.Note
If you have already stored the connection details locally in the
Repository, drop the needed
metadata item to the design workspace and the database connection detail
will automatically display in the relevant fields. For more information
about Metadata, see Talend Studio User
Guide.In this scenario, we want to connect to the SCD table where changes in the
source delimited file will be tracked down. -
In the design workspace, double-click tFileInputDelimited to display its Basic settings view.
-
Click the three-dot button next to the File Name
field to select the path to the source delimited file,
dataset.csv in this scenario, that contains the
personal details. -
Define the row and field separators used in the source file.
Note
The File Name, Row separator, and Field
separators are mandatory. -
If needed, set Header, Footer, and Limit.
In this scenario, set Header to
1. Footer and limit for the number of processed
rows are not set. -
Click Edit schema to describe the data
structure of the source delimited file.In this scenario, the source schema is made of eight columns:
id, firstName,
lastName, address,
city, company,
age, and status. -
Define the basic settings for the first tLogRow in order to view the content of the source file with
varying attributes in cells of a table on the console before being processed
through the SCD component.
-
In the design workspace, click the tMysqlSCD and select the Component tab to define its basic settings.
-
In the Basic settings view, select the
Use an existing connection check box to
reuse the connection details defined on the tMysqlConnection properties. -
In the Table field, enter the table name
to be used to track changes. -
If needed, click Sync columns to retrieve
the output data structure from the tFileInputDelimited. -
In the design workspace, double-click tMysqlCommit to define its basic settings.
-
Select the relevant connection on the Component
list if more than one connection exists. -
Define the basic settings of the second tLogRow in order to view reject information in cells of a
table.
-
Double-click the tMysqlSCD component in
the design workspace or click the three-dot button next to the SCD Editor
in the component’s Basic settings view to
open the SCD editor and build the data flow
for the SCD outputs.All the columns from the preceding component are displayed in the
Unused panel of the SCD editor. All the other panels in the SCD editor are empty. -
From the Unused list, drop the
id column to the Source
keys panel to use it as the key to ensure the unicity of the
incoming data. -
In the Surrogate keys panel, enter a name
for the surrogate key in the Name field,
SK1 in this scenario. -
From the Creation list, select the method
to be used for the surrogate key generation, Auto-increment in this scenario. -
From the Unused list, drop the
firstname and lastname columns
to the Type 0 panel, changes in these two
columns do not interest us. -
Drop the status column to the
Type 1 panel. The new value will
overwrite the old value. -
Drop the company column to the
Type 2 panel. Each time a person
changes his/her company, a new record will be inserted in the dimensional
table with a separate key.In the Versioning area:
– Define the start and end columns of your SCD table that will hold the
start and end date values. The end date is null for current records until a
change is detected. Then the end date gets filled in and a new record is
added with no end date.In this scenario, we select Fixed Year
Value for the end column and
fill in a fictive year to avoid having a null value in the end date
field.– Select the version check box to hold
the version number of the record.– Select the active check box to spot
the column that will hold the True or
False status. True for the current active record and False for the modified record. -
Drop the address and city
columns to the Type 3 panel to
track only the information about the previous value of the address and
city.For more information about SCD types, see SCD management methodologies.
-
Click OK to validate your configuration
and close the SCD editor.
-
Click Edit schema to view the input and
output data structures.The SCD output schema should include the SCD-specific columns defined in
the SCD editor to hold standard log
information.Note
If you adjust any of the input schema definitions, you need to check,
and reconfigure if necessary, the output flow definitions in the
SCD editor to ensure that the
output data structure is properly updated. -
In the Basic settings view of the
tMysqlSCD component, select Create table if not exists from the Action on table list to avoid creating and
defining the SCD table manually.
Save your Job and press F6 to execute it.
The console shows the content of the input delimited file, and your SCD table is
created in your database, containing the initial dataset.

Janet gets divorced and moves to
Adelanto at 355 Golf Rd. She works at
Greenwood.
Adam gets married and moves to Belmont
at 2505 Alisson ct. He works at
Scoop.
Martin gets a new job at Phillips and
Brothers.
Update the delimited file with the above information and press F6 to run your Job.
The console shows the updated personal information and the rejected data, and the
SCD table shows the history of valid changes made to the input file along with the
status and version number. Because the name of Martin’s new company exceeds the
length of the column company defined in the schema, this change
is directed to the reject flow instead of being logged in the SCD table.
