July 30, 2023

tPostgresqlSCDELT – Docs for ESB 7.x

tPostgresqlSCDELT

Addresses Slowly Changing Dimension needs through SQL queries (server-side
processing mode), and logs the changes into a dedicated PostgreSQL SCD table.

tPostgresqlSCDELT reflects and tracks changes in a dedicated PostgreSQL SCD table.

tPostgresqlSCDELT Standard properties

These properties are used to configure tPostgresqlSCDELT running in the Standard Job framework.

The Standard
tPostgresqlSCDELT component belongs to the Business Intelligence and the Databases families.

The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database
connector. The properties related to database settings vary depending on your database
type selection. For more information about dynamic database connectors, see Dynamic database components.

Basic settings

Database

Select a type of database from the list and click Apply.

Property type

Either Built-in or Repository.

Built-in: No property data is stored centrally. Enter properties manually.

Repository: Select the repository file where the properties are stored. The fields
that follow are pre-filled using the fetched data.

Use an existing connection

Select this check box and in the Component List click the relevant connection component to
reuse the connection details you already defined.

Note: When a Job contains a parent Job and a child Job, and you
need to share an existing connection between the two levels, for example, to share the
connection created by the parent Job with the child Job, you have to:

  1. In the parent level, register the database connection to be shared in the
    Basic settings view of the connection component that creates that
    database connection.

  2. In the child level, use a dedicated connection
    component to read that registered database connection.

For an example of how to share a database connection
across Job levels, see Talend Studio User Guide.

DB Version

List of database versions.

Host

The IP address of the database server.

Port

Listening port number of the database server.

Database

Name of the database.

Username and Password

User authentication data for a dedicated database.

To enter the password, click the […] button next to the
password field, and then in the pop-up dialog box enter the password between double quotes
and click OK to save the settings.

Source table

Name of the input PostgreSQL table whose data changes are captured.

Table

Name of the table to be written. Note that only one table can be
written at a time.

Action on table

Select to perform one of the following operations on the table
defined:

None: No action carried out on the
table.

Drop and create table: The table is
removed and created again.

Create table: A new table gets
created.

Create table if not exists: A table
gets created if it does not exist.

Clear table: The table content is
deleted. You can roll back the operation.

Truncate table: The table content
is deleted. You cannot roll back the
operation.
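
As a rough guide, each option above corresponds to a familiar kind of SQL statement. The sketch below is an assumption about that correspondence, not the SQL the component actually generates. The function name `statements_for_action` and the `columns_ddl` argument are hypothetical.

```python
from typing import List


def statements_for_action(action: str, table: str, columns_ddl: str) -> List[str]:
    """Return illustrative SQL for one 'Action on table' option (assumed mapping)."""
    create = f"CREATE TABLE {table} ({columns_ddl})"
    mapping = {
        "None": [],
        "Drop and create table": [f"DROP TABLE {table}", create],
        "Create table": [create],
        "Create table if not exists": [
            f"CREATE TABLE IF NOT EXISTS {table} ({columns_ddl})"
        ],
        # Row-level delete: the option described above as reversible.
        "Clear table": [f"DELETE FROM {table}"],
        # Bulk removal: the option described above as not reversible.
        "Truncate table": [f"TRUNCATE TABLE {table}"],
    }
    return mapping[action]


for statement in statements_for_action("Drop and create table", "employee_scd", "sk INTEGER"):
    print(statement)
```

Keeping the mapping in one dictionary makes it easy to compare the options side by side. An unknown option name raises a `KeyError` rather than silently doing nothing.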

Schema and Edit schema

A schema is a row description. It defines the number of fields
(columns) to be processed and passed on to the next component. When you create a Spark
Job, avoid the reserved word line when naming the
fields.

Click Edit schema to make changes to the schema. If the current schema is of the
Repository type, three options are available:

  • View schema: choose this
    option to view the schema only.

  • Change to built-in property:
    choose this option to change the schema to Built-in for local changes.

  • Update repository connection:
    choose this option to change the schema stored in the repository and decide whether
    to propagate the changes to all the Jobs upon completion. If you just want to
    propagate the changes to the current Job, you can select No upon completion and choose this schema metadata
    again in the Repository Content
    window.

 

Built-in: The schema is created
and stored locally for this component only. Related topic: see
Talend Studio User Guide.

Repository: The schema already
exists and is stored in the Repository, hence can be reused. Related
topic: see Talend Studio User Guide.

Surrogate Key

Select the surrogate key column from the list.

Creation

Select the method to be used for the surrogate key
generation.

For more information regarding the creation methods, see SCD management methodology.

Source Keys

Select one or more columns to be used as keys, to ensure the
uniqueness of incoming data.

Use SCD Type 1 fields

Use Type 1 if tracking changes is not necessary, for example, to
correct typos. Select the columns of the
schema that will be checked for changes.

Use SCD Type 2 fields

Use Type 2 if changes need to be tracked, for example, to
trace updates. Select the columns of the
schema that will be checked for changes.
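
The difference between the two field types can be sketched as a small decision function: a change in a Type 1 column calls for an overwrite, a change in a Type 2 column calls for a new version, and both can apply to the same row. This is a minimal illustration of the concept under those assumptions, not the component's implementation. The function name, action labels, and sample rows are hypothetical.

```python
from typing import Dict, List, Optional


def classify_change(existing: Optional[Dict], incoming: Dict,
                    type1_fields: List[str], type2_fields: List[str]) -> List[str]:
    """List the SCD actions implied by comparing an incoming row with the stored one."""
    if existing is None:
        # No stored row for this source key yet: plain insert.
        return ["insert"]
    actions = []
    if any(existing[f] != incoming[f] for f in type1_fields):
        actions.append("type1_overwrite")    # history is not kept
    if any(existing[f] != incoming[f] for f in type2_fields):
        actions.append("type2_new_version")  # history is kept
    return actions


stored = {"id": 7, "name": "Ann Lee", "role": "analyst", "salary": 100.0}
update = {"id": 7, "name": "Ann Lee", "role": "manager", "salary": 100.0}
print(classify_change(stored, update, ["name", "role"], ["salary"]))
```

An empty list means that none of the checked columns changed.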

SCD type 2 fields

Click the [+] button to add as many rows as needed, one row per column. Click
the arrow on the right side of the cell and, from the drop-down list displayed, select
the column whose value changes will be tracked using Type 2 SCD.

This table is available only when the Use SCD type 2 fields
option is selected.

Start date

Specify the column that holds the start date for
type 2 SCD.

This list is available only when the Use SCD type 2 fields
option is selected.

End date

Specify the column that holds the end date for type
2 SCD.

This list is available only when the Use SCD type 2 fields
option is selected.

Note: To avoid duplicated change records, it is recommended to
select a column that can identify each change for this field.

Log active status

Select this check box and from the
Active field drop-down list displayed, select
the column that holds the true or false status value, which helps to spot the active
record for type 2 SCD.

This option is available only when the Use SCD type 2 fields
option is selected.

Log versions

Select this check box and from the Version field drop-down list displayed, select the column
that holds the version number of the record for type 2 SCD.

This option is available only when the Use SCD type 2 fields
option is selected.
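
The start date, end date, active status, and version columns work together when a Type 2 change arrives: the current record is closed and a new one is opened. The sketch below shows that bookkeeping on in-memory rows. It assumes an open record has no end date, that the active flag is stored as 1 or 0, and that the version increases by one. These are illustrative conventions, not confirmed component behavior. The column names are borrowed from the scenario later in this document, and `apply_type2_change` is a hypothetical helper.

```python
from datetime import date
from typing import Dict, List


def apply_type2_change(history: List[Dict], source_key: int, changes: Dict,
                       next_sk: int, today: date) -> List[Dict]:
    """Close the active record for source_key and append a new version of it."""
    # Raises StopIteration if the key has no active record.
    current = next(r for r in history
                   if r["id"] == source_key and r["active_status"] == 1)
    closed = {**current, "end_date": today, "active_status": 0}
    new_row = {**current, **changes, "sk": next_sk, "start_date": today,
               "end_date": None, "active_status": 1,
               "version": current["version"] + 1}
    # Return a new list so the input history is left untouched.
    return [closed if r is current else r for r in history] + [new_row]


history = [{"sk": 1, "id": 2, "salary": 13000.0, "start_date": date(2023, 1, 1),
            "end_date": None, "active_status": 1, "version": 1}]
for row in apply_type2_change(history, 2, {"salary": 17000.0}, 2, date(2023, 7, 30)):
    print(row)
```

After the call, exactly one record per source key remains active, which is what the active status column is meant to make easy to spot.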

Advanced settings

Additional JDBC Parameters

Specify additional JDBC parameters for the
database connection created.

This property is not available when the Use an existing connection
check box in the Basic settings view is selected.
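
For orientation, a PostgreSQL JDBC URL has the form `jdbc:postgresql://host:port/database`, with extra parameters appended as a query string. The sketch below shows how the Host, Port, Database, and Additional JDBC Parameters values could combine into such a URL. How the component assembles the URL internally is an assumption here, and the helper name and sample values (`localhost`, `hr`) are hypothetical.

```python
def postgresql_jdbc_url(host: str, port: int, database: str,
                        extra_params: str = "") -> str:
    """Build a PostgreSQL JDBC URL, appending extra parameters if any are given."""
    url = f"jdbc:postgresql://{host}:{port}/{database}"
    # Extra parameters are written as key=value pairs joined with '&'.
    return f"{url}?{extra_params}" if extra_params else url


print(postgresql_jdbc_url("localhost", 5432, "hr", "connectTimeout=10"))
```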

Debug mode

Select this check box to display each step while entries are
processed in the database.

tStatCatcher Statistics

Select this check box to collect log data at the component
level.

Global Variables

Global Variables

ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable and it returns a string. This
variable functions only if the Die on error check box is
cleared, if the component has this check box.

A Flow variable functions during the execution of a component while an After variable
functions after the execution of the component.

To fill a field or expression with a variable, press Ctrl + Space
to access the variable list and choose the variable to use from it.

For further information about variables, see Talend Studio User Guide.

Usage

Usage rule

This component is used as an output component. It requires an
input component and a Row > Main link as input.

Dynamic settings

Click the [+] button to add a row in the table
and fill the Code field with a context
variable to choose your database connection dynamically from multiple
connections planned in your Job. This feature is useful when you need to
access database tables having the same data structure but in different
databases, especially when you are working in an environment where you
cannot change your Job settings, for example, when your Job has to be
deployed and executed independent of Talend Studio.

The Dynamic settings table is available only when the Use an existing connection
check box is selected in the Basic settings view. Once a dynamic parameter is
defined, the Component List box in the Basic settings view becomes unusable.

For examples on using dynamic parameters, see Reading data from databases through context-based dynamic connections and Reading data from different MySQL databases using dynamically loaded connection parameters. For more information on Dynamic settings
and context variables, see Talend Studio User Guide.
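
Conceptually, the dynamic setting replaces a fixed choice in the Component List with a name looked up at run time. The sketch below models that idea only. The context key `connection_name`, the component names, and the helper are hypothetical and do not reflect the code that Talend Studio generates.

```python
from typing import Dict, List


def resolve_connection(context: Dict[str, str], available: List[str]) -> str:
    """Pick the connection component named by a context variable."""
    name = context["connection_name"]
    if name not in available:
        raise KeyError(f"No connection component named {name!r} in this Job")
    return name


# Two connections planned in the same Job; the context decides which one is used.
planned = ["tPostgresqlConnection_1", "tPostgresqlConnection_2"]
print(resolve_connection({"connection_name": "tPostgresqlConnection_2"}, planned))
```

Because the name comes from the context, the same deployed Job can target a different database by changing a context value only.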

Tracking data changes in a PostgreSQL table using the tPostgreSQLSCDELT component

This scenario describes a Job that captures the employee data changes in a PostgreSQL
table using SCD (Slowly Changing Dimensions) Type 1 and Type 2 methods implemented by the
tPostgreSQLSCDELT component, and writes both the current and historical data to an SCD
dimension table.

The input data contains employee details including name, role, and salary, plus an
id column that helps ensure the uniqueness of the input data.

First, the following employee data is inserted into a new PostgreSQL table.

Later, the table is updated with the following renewed employee data.

The role of Thomas Johnson changes from developer to tester, the role of Teddy Brown
changes from tester to writer, and his salary is raised from 13000.00 to 17000.00.
In addition, a new employee record with id 4 is inserted. In this scenario,

  • the existing name and role data is overwritten by the new data, so the SCD Type 1
    method is performed on these columns, and

  • the full history of the salary data is retained: a new record with the changed
    data is always created and the previous record is closed, so the SCD Type 2 method
    is performed on this column.

For more information about SCD types, see SCD management methodology.

Creating a Job for tracking data changes in a PostgreSQL table using tPostgresqlSCDELT

  1. Create a new Job and add a tPostgreSQLConnection
    component, a tCreateTable component, two
    tFixedFlowInput components, two
    tPostgreSQLInput components, two
    tPostgreSQLOutput components, two
    tPostgreSQLSCDELT components, and two
    tLogRow components to the Job.

    tPostgresqlSCDELT_1.png

  2. Link the first tFixedFlowInput component to the first
    tPostgreSQLOutput component using a Row > Main connection.
  3. Do the same to link the first tPostgreSQLInput component
    to the first tLogRow component, the second
    tFixedFlowInput component to the second
    tPostgreSQLOutput component, and the second
    tPostgreSQLInput component to the second
    tLogRow component.
  4. Link the tPostgreSQLConnection component to the
    tCreateTable component using a Trigger > On Subjob Ok connection.
  5. Do the same to link the first tFixedFlowInput component
    to the first tPostgreSQLSCDELT component, the first
    tPostgreSQLSCDELT component to the first
    tPostgreSQLInput component, the first
    tPostgreSQLInput component to the second
    tFixedFlowInput component, the second
    tFixedFlowInput component to the second
    tPostgreSQLSCDELT component, and the second
    tPostgreSQLSCDELT component to the second
    tPostgreSQLInput component.

Opening a connection to a PostgreSQL database

  1. Double-click the tPostgreSQLConnection component to open
    its Basic settings view.

    tPostgresqlSCDELT_2.png

  2. In the Host, Port, Database, Schema, Username, and Password fields, enter the information required for the
    connection to the PostgreSQL database.

Creating a PostgreSQL table

  1. Double-click the tCreateTable component to open its
    Basic settings view.
  2. In the Basic settings view, select Postgresql from the
    Database Type list for this scenario.

    tPostgresqlSCDELT_3.png

  3. Select the Use an existing connection check box and from
    the Component List drop-down list displayed, select the
    connection component to reuse the connection created by it,
    tPostgreSQLConnection_1 in this example.
  4. In the Table Name field, fill in a name for the table to
    be created, employee in this example.
  5. From the Table Action list, select Create table if not exists.
  6. Click the […] button next to Edit schema and in the pop-up dialog box,
    define the schema by adding
    four columns: id of Integer type as the primary key,
    name and role of String type,
    and salary of Double type.

    In the end, a new table employee is created to store
    the employee data.

    tPostgresqlSCDELT_4.png
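
The table definition built in the steps above can be pictured as a single DDL statement. The sketch below derives one from the four-column schema. The mapping from Talend types to PostgreSQL types in `TYPE_MAP` is an assumption chosen for illustration and may differ from the types Talend Studio actually uses, and `create_table_ddl` is a hypothetical helper.

```python
from typing import List, Tuple

# Assumed Talend-to-PostgreSQL type mapping (illustrative only).
TYPE_MAP = {"Integer": "INTEGER", "String": "VARCHAR", "Double": "DOUBLE PRECISION"}

# (column name, Talend type, is primary key), as defined in the schema step.
EMPLOYEE_SCHEMA: List[Tuple[str, str, bool]] = [
    ("id", "Integer", True),
    ("name", "String", False),
    ("role", "String", False),
    ("salary", "Double", False),
]


def create_table_ddl(table: str, columns: List[Tuple[str, str, bool]],
                     if_not_exists: bool = True) -> str:
    """Build a CREATE TABLE statement from a list of column descriptions."""
    parts = []
    for name, talend_type, is_key in columns:
        column = f"{name} {TYPE_MAP[talend_type]}"
        if is_key:
            column += " PRIMARY KEY"
        parts.append(column)
    clause = "IF NOT EXISTS " if if_not_exists else ""
    return f"CREATE TABLE {clause}{table} ({', '.join(parts)})"


print(create_table_ddl("employee", EMPLOYEE_SCHEMA))
```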

Inserting data into the new PostgreSQL table

  1. Double-click the first tFixedFlowInput component to open
    its Basic settings view.
  2. Click the […] button next to Edit schema and in the pop-up dialog box,
    define the schema by adding
    four columns: id of Integer type as the primary key,
    name and role of String type,
    and salary of Double type.

    tPostgresqlSCDELT_5.png

  3. Click OK to save the schema changes. In the pop-up
    dialog box, click Yes to propagate the
    schema to the next component.
  4. Select Use Inline Content in the
    Mode area. Then in the Content
    field displayed, enter the following employee data to be inserted.

  5. Double-click the first tPostgreSQLOutput component to
    open its Basic settings view.
  6. Select the Use an existing connection check box and from
    the Component List drop-down list displayed, select the
    connection component to reuse the connection created by it,
    tPostgreSQLConnection_1 in this example.
  7. In the Table field, enter the name of the table into
    which the employee data will be written, employee in this
    example.
  8. In the Action on table drop-down list, select
    Default.
  9. In the Action on data drop-down list, select
    Insert to insert the employee data transferred from
    the first tFixedFlowInput component.
  10. Click the […] button next to Edit schema
    to check whether the schema of tPostgreSQLOutput is the
    same as the schema of tFixedFlowInput.

    tPostgresqlSCDELT_6.png

Tracking inserted data changes and writing the changes into a SCD dimension table

  1. Double-click the first tPostgreSQLSCDELT component to
    open its Basic settings view.

    tPostgresqlSCDELT_7.png

  2. Select the Use an existing connection check box and from
    the Component List drop-down list displayed, select the
    connection component to reuse the connection created by it,
    tPostgreSQLConnection_1 in this example.
  3. In the Source table field, enter the name of the table
    whose data changes will be captured, employee in this
    example.
  4. In the Table field, enter the name of the SCD dimension
    table that will store both the current and historical employee data,
    employee_scd in this example.
  5. Select Create table from the Action on table drop-down list to create the
    SCD dimension table.
  6. Click the […] button next to Edit schema and in the pop-up dialog box, define
    the schema by adding nine columns: sk (as the primary key) and
    id of Integer type, name and
    role of String type, salary of
    Double type, start_date and
    end_date of Date type with the Date Pattern
    dd-MM-yyyy, and active_status
    and version of Integer type. When done, click
    OK to save the changes and close the dialog
    box.

    tPostgresqlSCDELT_8.png

  7. From the Surrogate key drop-down list, select the name
    of the column that will be used as the primary key of the SCD dimension table,
    sk in this example.
  8. Select DB sequence from the
    Creation drop-down list and in the
    Sequence field displayed, enter the name of the
    PostgreSQL sequence used to generate the surrogate key for the SCD Type 2
    method, employee_sequence in this example.
  9. Click the [+] button below the Source keys
    table to add a new line, and click the
    Name cell and select the key column of the source
    table from the drop-down list, id in this example.
  10. Select the Use SCD type 1 fields check box, and click the
    [+] button below the SCD type 1 fields
    table twice to add two lines. Then click each cell and
    from the drop-down list, select the column on which the SCD Type 1 method will
    be performed. In this example, they are name and
    role.
  11. Select the Use SCD type 2 fields check box, and click the
    [+] button below the SCD type 2 fields
    table to add a line. Then click the cell and select the
    column on which the SCD Type 2 method will be performed. In this example, it is
    salary.
  12. From the Start date and End date
    drop-down lists, select the columns used to hold the start date and end date
    values for the SCD Type 2 method respectively, start_date
    and end_date in this example.
  13. Select the Log active status check box and from the
    Active field drop-down list displayed, select the
    column used to hold the active status value for the SCD Type 2 method, which
    helps identify the active records, active_status in this
    example.
  14. Select the Log versions check box and from the
    Version field drop-down list, select the column used
    to hold the version number of the records for the SCD Type 2 method,
    version in this example.
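
The settings entered in the steps above can be summarized as one configuration record, and a quick check can confirm that every column they reference exists in the nine-column schema. The sketch below is only a way to review the configuration outside the Studio. The dictionary keys and the `missing_columns` helper are hypothetical names, while the values come from this scenario.

```python
from typing import Dict, List

SCD_SCHEMA = ["sk", "id", "name", "role", "salary",
              "start_date", "end_date", "active_status", "version"]

SCD_SETTINGS: Dict = {
    "source_table": "employee",
    "table": "employee_scd",
    "surrogate_key": "sk",
    "creation": "DB sequence",
    "sequence": "employee_sequence",
    "source_keys": ["id"],
    "type1_fields": ["name", "role"],
    "type2_fields": ["salary"],
    "start_date": "start_date",
    "end_date": "end_date",
    "active_field": "active_status",
    "version_field": "version",
}


def missing_columns(settings: Dict, schema: List[str]) -> List[str]:
    """Return the columns referenced by the settings that the schema lacks."""
    referenced = [
        settings["surrogate_key"],
        *settings["source_keys"],
        *settings["type1_fields"],
        *settings["type2_fields"],
        settings["start_date"],
        settings["end_date"],
        settings["active_field"],
        settings["version_field"],
    ]
    return [column for column in referenced if column not in schema]


print(missing_columns(SCD_SETTINGS, SCD_SCHEMA))
```

An empty result means every configured column is present in the schema.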

Retrieving the data updates from the SCD dimension table

  1. Double-click the first tPostgreSQLInput component to
    open its Basic settings view.
  2. Select the Use an existing connection check box and from
    the Component List drop-down list displayed, select the
    connection component to reuse the connection created by it,
    tPostgreSQLConnection_1 in this example.
  3. Click the […] button next to Edit schema and in the pop-up dialog box, define
    the schema by adding nine columns: sk (as the primary key) and
    id of Integer type, name and
    role of String type, salary of
    Double type, start_date and
    end_date of Date type with the Date Pattern
    yyyy-MM-dd, and active_status
    and version of Integer type. When done, click
    OK to save the changes and close the dialog
    box.

    The schema of the first tPostgreSQLInput component is
    the same as the schema of the first tPostgreSQLSCDELT
    component, so you can copy and paste it.

  4. In the Query field, enter the SQL command used to
    retrieve data from the SCD dimension table,
    select * from employee_scd in this example.
  5. In the Table Name field, enter the name of the SCD
    dimension table where you will retrieve the data updates,
    employee_scd in this example.
  6. Double-click the first tLogRow component and in the
    Mode area on its Basic settings
    view, select Table to display
    the retrieved data in a table.

Updating data in the PostgreSQL table

  1. Double-click the second tFixedFlowInput component to
    open its Basic settings view.
  2. Click the […] button next to Edit schema and in the pop-up dialog box,
    define the schema by adding
    four columns: id of Integer type as the primary key,
    name and role of String type,
    and salary of Double type.

    This schema is the same as the schema of the first
    tFixedFlowInput component, so you can copy and
    paste it.

  3. Click OK to save the schema changes. In the pop-up
    dialog box, click Yes to propagate the
    schema to the next component.
  4. Select Use Inline Content in the
    Mode area. Then in the Content
    field displayed, enter the following employee data to update the existing
    data.

  5. Double-click the second tPostgreSQLOutput component to
    open its Basic settings view.
  6. Select the Use an existing connection check box and from
    the Component List drop-down list displayed, select the
    connection component to reuse the connection created by it,
    tPostgreSQLConnection_1 in this example.
  7. In the Table field, enter the name of the table in
    which the data will be updated, employee in this
    example.
  8. Select Default from the Action on table
    drop-down list.
  9. Select Insert or update from the Action on data
    drop-down list.

Tracking data update changes and writing the changes into the SCD dimension table

  1. Double-click the second tPostgreSQLSCDELT component to
    open its Basic settings view.
  2. Repeat steps 2 through 14 in the procedure Tracking inserted data changes and writing the changes into a SCD dimension table to configure the second tPostgreSQLSCDELT
    component.

Retrieving the data update changes from the SCD dimension table

  1. Double-click the second tPostgreSQLInput component to
    open its Basic settings view.
  2. Repeat steps 2 through 5 in the procedure Retrieving the data updates from the SCD dimension table to configure the second tPostgreSQLInput component.
  3. Double-click the second tLogRow component and in the
    Mode area on its Basic settings
    view, select Table to display
    the retrieved data in a table.

Executing the Job to track data changes in a PostgreSQL table using tPostgreSQLSCDELT

  1. Press Ctrl + S to save
    the Job.
  2. Press F6 to execute the
    Job.

    tPostgresqlSCDELT_9.png

    As shown above, the old role developer for
    Thomas Johnson is overwritten directly by the new
    role tester because SCD Type 1 is performed on the
    role column, and a new record with the surrogate
    key value set to 26 is created for Teddy Brown's
    salary update from 13000.00 to 17000.00 because
    SCD Type 2 is performed on the salary column.


Source: Talend documentation, https://help.talend.com