August 16, 2023

tStewardshipTaskOutput – Docs for ESB 6.x

tStewardshipTaskOutput

This component creates tasks in the
Talend Data Stewardship Console
database and lists these tasks in the stewardship console.

This component is deprecated since 6.4 along with the deprecation of
Talend Data Stewardship Console. Consider migrating to
Talend Data Stewardship.

Warning:

This component is available in the Palette of the Studio only if you have subscribed to the relevant Talend Platform product.

tStewardshipTaskOutput writes data, in the form of
tasks, in the
Talend Data Stewardship Console
database and thus makes it possible to list these tasks in the data stewardship
console. An authorized steward can then intervene to do the composite matching on the
listed data or to ensure that data is consistent and complete.

Note:

To better understand the purpose of this component, see the
Talend Data Stewardship Console User Guide.

How to set the URL to access
Talend Data Stewardship Console

When using the components to interact with the
Talend Data Stewardship Console, you need to set the URL correctly to access the
application:

  • To interact with the
    Talend Data Stewardship Console
    application using the SOAP services, the URL
    should be in the format of <protocol>://<host>:<port>/<context>/services/TDSCWS?wsdl.

  • To write tasks into the
    Talend Data Stewardship Console
    application, the URL should be in the format of
    <protocol>://<host>:<port>/<context>/services/dsctaskloader.

Note that the parameter <context> in the URL is
different depending on whether Talend Data Stewardship Console
is installed standalone (standalone installation) or installed together with the MDM
server (embedded installation).

For more information about how to install Talend Data Stewardship Console
as a standalone application, see Talend Data Stewardship Console User Guide.

For more information about how to install the MDM server, see the
Talend Installation and Upgrade Guide.

Below are the default parameters of the URL to access the
Talend Data Stewardship Console
application in the two installation modes:

Default settings  Standalone installation                                                  Embedded installation
protocol          http                                                                     http
host              localhost                                                                localhost
port              8080                                                                     8180
context           /org.talend.datastewardship                                              /talendmdm
SOAP service URL  http://localhost:8080/org.talend.datastewardship/services/TDSCWS?wsdl    http://localhost:8180/talendmdm/services/TDSCWS?wsdl
Task loader URL   http://localhost:8080/org.talend.datastewardship/services/dsctaskloader  http://localhost:8180/talendmdm/services/dsctaskloader
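
The URL formats above can be assembled from the four parameters. The following is a minimal Java sketch, not part of the product: the class and method names (TdscUrlSketch, baseUrl, soapServiceUrl, taskLoaderUrl) are hypothetical, and only the URL patterns and default values come from this page.

```java
final class TdscUrlSketch {

    // Joins the URL parts. A leading slash on the context is tolerated,
    // because the default contexts are listed as "/org.talend.datastewardship"
    // and "/talendmdm".
    static String baseUrl(String protocol, String host, int port, String context) {
        String ctx = context.startsWith("/") ? context.substring(1) : context;
        return protocol + "://" + host + ":" + port + "/" + ctx;
    }

    // URL for interacting with the console through the SOAP services.
    static String soapServiceUrl(String protocol, String host, int port, String context) {
        return baseUrl(protocol, host, port, context) + "/services/TDSCWS?wsdl";
    }

    // URL for writing tasks into the console.
    static String taskLoaderUrl(String protocol, String host, int port, String context) {
        return baseUrl(protocol, host, port, context) + "/services/dsctaskloader";
    }

    public static void main(String[] args) {
        // Default standalone installation, then default embedded installation.
        System.out.println(taskLoaderUrl("http", "localhost", 8080, "/org.talend.datastewardship"));
        System.out.println(taskLoaderUrl("http", "localhost", 8180, "/talendmdm"));
    }
}
```

Since tStewardshipTaskOutput writes tasks, the task loader URL is presumably the one to enter in its Url field.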

tStewardshipTaskOutput Standard properties

These properties are used to configure tStewardshipTaskOutput
running in the Standard Job framework.

The Standard
tStewardshipTaskOutput component belongs to the Data Stewardship
Console (deprecated) family.

Basic settings

Schema and Edit Schema

A schema is a row description. It defines the number of
fields that will be processed and passed on to the next component. The
schema is either built-in or remote in the Repository.

Click Edit schema to make changes to the schema.
If the current schema is of the Repository type, three
options are available:

  • View schema: choose this option to view the
    schema only.

  • Change to built-in property: choose this
    option to change the schema to Built-in for
    local changes.

  • Update repository connection: choose this
    option to change the schema stored in the repository and decide whether to propagate
    the changes to all the Jobs upon completion. If you just want to propagate the
    changes to the current Job, you can select No
    upon completion and choose this schema metadata again in the [Repository Content] window.

 

Built-in: You create the schema
and store it locally for this component only. Related topic: see
Talend Studio User Guide.

 

Repository: You have already
created the schema and stored it in the Repository. You can reuse it in
various projects and Job designs. Related topic: see
Talend Studio User Guide.

Url

Enter the appropriate URL to access the
Talend Data Stewardship Console
application.

For more information about the URL settings, see How to set the URL to access Talend Data Stewardship Console.

Username and Password

Type in the user authentication data for the stewardship
console database.

To enter the password, click the […] button next to the
password field, and then in the pop-up dialog box enter the password between double quotes
and click OK to save the settings.

Task name

Type in a name for the task you want to list in the
Talend Data Stewardship Console.

Type

Select the type of the tasks you want to write:

Resolution: data resolution tasks
represent the results of the data matching processes done on data across
heterogeneous sources.

Data: data integrity tasks are
the results of the data integrity processes done on data.

For further information on task types and task
management, see
Talend Data Stewardship Console User Guide.

Created by

Type in the name of the task creator.

Note:

The task creators correspond to the users of
Talend MDM Web UI. For further information, see
Talend MDM Web UI User Guide.

Owner

Type in the name of the task owner.

Note:

The task owners correspond to the users of
Talend MDM Web UI. For further information, see
Talend MDM Web UI User Guide.

Star

Type in a number, 0 through 5, that you want to assign to
the tasks as a numerical rating, in the form of stars, to highlight
importance.

Tag

Type in the name of the tag category you want to
associate with the tasks you want to write.

Warning:

The tag categories must have been created in the
stewardship console beforehand. For further information about
how to create tag categories, see
Talend Data Stewardship Console User Guide.

Looping column

Select a column in the input schema on which to base the
loop. Whenever the looping column value changes, the component will
close the previous element (task) and open a new one (new task).

Note:

The looping column is typically the group id
generated by the tMatchGroup
component. For further information, see tMatchGroup.
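
The looping behavior described above can be pictured with a short Java sketch. It is an illustration only, not the component's actual code: the class name, the row layout (string arrays) and the sample values are hypothetical. It starts a new task each time the looping value changes from one row to the next.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

final class LoopingColumnSketch {

    // Splits rows into tasks: whenever the value in the looping column differs
    // from the value in the previous row, the current task is closed and a new
    // one is opened.
    static List<List<String[]>> splitIntoTasks(List<String[]> rows, int loopingIndex) {
        List<List<String[]>> tasks = new ArrayList<>();
        List<String[]> current = null;
        String previous = null;
        for (String[] row : rows) {
            String value = row[loopingIndex];
            if (current == null || !Objects.equals(value, previous)) {
                current = new ArrayList<>();
                tasks.add(current);
            }
            current.add(row);
            previous = value;
        }
        return tasks;
    }

    public static void main(String[] args) {
        // Hypothetical rows: {group id, first name}. The group id "1" appears
        // again after "2", so it opens a third task instead of rejoining the first.
        List<String[]> rows = Arrays.asList(
                new String[] {"1", "John"},
                new String[] {"1", "Jon"},
                new String[] {"2", "Mary"},
                new String[] {"1", "Johnny"});
        System.out.println(splitIntoTasks(rows, 0).size() + " tasks");
    }
}
```

If the component behaves this way, rows that belong to the same group need to arrive consecutively in the input flow, for example sorted on the looping column.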

Source/Target selector

Select a column in the input schema that will decide if
the task records defined according to the looping column will be a
target record or a source record.

Source

Select a column in the input schema.

Score

Select the matching score column in the input schema.

Weights

Select the column that defines the matching distance for
each column in the input schema.

Extra info

If required, use the plus button to add one or more rows
for any extra information you want to add to any of the source records.

In the Title
column, enter the information key.

In the Message
column, enter the information you want to add. In the Column column, click in the added row
and select the source column to which you want to add the extra
information.

The data steward will be able to see this added
information whenever they place the pointer on the source record
column in the Talend Data Stewardship Console.
This information helps the steward make a more informed decision
when resolving the task.

Record column

Use the plus button to add as many rows as needed and
then click in each of the rows and select the columns in the input
schema that will form the target record.

Max tasks per commit

Define the maximum number of the tasks per commit.
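
As an illustration of what a per-commit maximum implies, the Java sketch below splits a list of tasks into consecutive batches of at most N tasks. This page does not describe how the component commits internally, so this is an assumption-based sketch; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class CommitBatchSketch {

    // Splits tasks into consecutive batches, each holding at most
    // maxTasksPerCommit tasks. Only the last batch may be smaller.
    static <T> List<List<T>> toCommits(List<T> tasks, int maxTasksPerCommit) {
        if (maxTasksPerCommit < 1) {
            throw new IllegalArgumentException("maxTasksPerCommit must be at least 1");
        }
        List<List<T>> commits = new ArrayList<>();
        for (int start = 0; start < tasks.size(); start += maxTasksPerCommit) {
            int end = Math.min(start + maxTasksPerCommit, tasks.size());
            commits.add(new ArrayList<>(tasks.subList(start, end)));
        }
        return commits;
    }

    public static void main(String[] args) {
        // Five hypothetical tasks with a maximum of two tasks per commit.
        List<String> tasks = Arrays.asList("t1", "t2", "t3", "t4", "t5");
        for (List<String> commit : toCommits(tasks, 2)) {
            System.out.println(commit);
        }
    }
}
```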

Advanced settings

tStatCatcher Statistics

Select this check box to gather the processing metadata
at the Job level as well as at each component level.

Global Variables

ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable and it returns a string. This
variable functions only if the Die on error check box is
cleared, if the component has this check box.

NB_LINE: the number of rows processed. This is an After
variable and it returns an integer.

A Flow variable functions during the execution of a component while an After variable
functions after the execution of the component.

To fill in a field or expression with a variable, press Ctrl + Space
to access the variable list and choose the variable to use from it.

For further information about variables, see
Talend Studio User Guide.
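
In a Job, After variables are typically read from the globalMap with a key made of the component instance name followed by the variable name. The Java sketch below simulates this with a plain map so that it runs outside the Studio. The instance name tStewardshipTaskOutput_1, the helper class and the sample value are assumptions for illustration.

```java
import java.util.HashMap;
import java.util.Map;

final class GlobalVariablesSketch {

    // Builds a short summary from the two After variables of a component.
    // Keys follow the pattern <component instance name>_<variable name>.
    static String summary(Map<String, Object> globalMap, String componentName) {
        Integer nbLine = (Integer) globalMap.get(componentName + "_NB_LINE");
        String error = (String) globalMap.get(componentName + "_ERROR_MESSAGE");
        return "rows=" + (nbLine == null ? 0 : nbLine)
                + ", error=" + (error == null ? "none" : error);
    }

    public static void main(String[] args) {
        // Stand-in for the globalMap that the Studio provides inside a Job.
        Map<String, Object> globalMap = new HashMap<>();
        globalMap.put("tStewardshipTaskOutput_1_NB_LINE", 12); // hypothetical value
        System.out.println(summary(globalMap, "tStewardshipTaskOutput_1"));
    }
}
```

Inside a Job, the equivalent expression would be ((Integer)globalMap.get("tStewardshipTaskOutput_1_NB_LINE")), used in a component that runs after tStewardshipTaskOutput has finished.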

Usage

Usage rule

Use this component to write data records held in tasks.
This component must have an input flow.

You can also find more information about how to
increase the timeout values for a Job using
tStewardshipTaskOutput on Talend Help Center
(https://help.talend.com).

 

Scenario: Writing data records in the stewardship console database

This scenario applies only to a subscription-based Talend Platform solution with MDM or Talend Data Fabric.

This scenario describes a five-component Job that generates data records in the form
of tasks and loads them into the stewardship console database.

These tasks later require the intervention of an authorized data steward to merge,
compare and resolve the data records that they hold. For further
information, see
Talend Data Stewardship Console
User Guide.

In this scenario:

  • A tFixedFlowInput component generates input
    data flow that has five columns: Source, Firstname,
    Lastname, DOB (date of birth), and
    PostalCode. This data has problems such as duplication,
    first or last names spelled differently or wrongly, different information for
    the same customer, etc.

  • A tMatchGroup data quality component carries
    out matching operations on data across the heterogeneous sources defined in the
    input Source column. This component groups the output
    columns by a blocking value to optimize the matching operation and compare only
    the records that have the same blocking value, the Source
    column in this scenario. For more information on grouping output columns and
    using blocking values, see tMatchGroup.

  • A tMap component filters the input flow into
    unique data records and data records that have matching distances.

  • The unique data records are displayed on the Run console via the tLogRow
    component. All other data records that have a matching distance are sent to the
    Talend Data Stewardship Console
    database through the
    tStewardshipTaskOutput component and are
    displayed in the stewardship console. An authorized data steward can then
    intervene to merge the data records with matching distances.

Use_Case_tStewardshipTaskOutput.png

For detailed information about related scenarios, see Scenario 1: Generating functional keys in the output flow and
Scenario 2: Comparing columns and grouping in the output flow duplicate records that have the same functional key.

  • Drop the following components from the Palette onto the design workspace: tFixedFlowInput, tMatchGroup,
    tMap, tStewardshipTaskOutput and tLogRow.

  • Connect the first three components together using the Main link.

  • Double-click tFixedFlowInput to display the
    Basic settings view and define the
    component properties as described in Scenario 1: Generating functional keys in the output flow.

    The tFixedFlowInput component generates an
    input data flow that has five columns: Source, Firstname,
    Lastname, DOB (date of birth), and
    PostalCode. This data has problems such as duplication,
    first or last names spelled differently or wrongly, different information for
    the same customer, etc.

Use_Case_tStewardshipTaskOutput3.png
  • Double-click the tMatchGroup component to
    display the Basic Settings view and define the
    component properties.

Use_Case_tStewardshipTaskOutput2.png
  • Click Sync columns to retrieve the schema
    from the preceding component.

  • If required, click the Edit schema button to
    view the input and output schema and do any modifications in the output
    schema.

Use_Case_tStewardshipTaskOutput4.png
Note:

In the output schema of this component, there are four output standard columns
that are read-only. For more information, see tMatchGroup Standard properties.

  • In the Key definition table, click the
    [+] button to add to the list the columns
    on which you want to do the matching operation, FirstName
    and LastName in this scenario.

  • Click in the first and second cells of the Matching
    type
    column and select from the list the method(s) to be used for
    the matching operation, Jaro-Winkler in this
    example.

  • Click in the first and second cells of the Confidence
    Weight
    column and set the numerical weights for each of the
    columns used as key attributes.

  • Click the [+] button below the Blocking Definition table to add a line in the table
    then click in the line and select from the list the column you want to use as a
    blocking value, Source in this example.

    Using a blocking value reduces the number of record pairs that need to
    be examined. The input data is partitioned into exhaustive blocks based on the
    data source, and comparison is restricted to record pairs within each block.
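
The effect of blocking on the number of comparisons can be estimated with a short Java sketch. It assumes that every record is compared with every other record in the same block, so a block of n records yields n(n-1)/2 pairs. The class name and the sample source values are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class BlockingPairsSketch {

    // Number of distinct pairs among n records.
    static long pairs(long n) {
        return n * (n - 1) / 2;
    }

    // Without blocking, every record is paired with every other record.
    static long pairsWithoutBlocking(List<String> blockingValues) {
        return pairs(blockingValues.size());
    }

    // With blocking, only records that share a blocking value are paired.
    static long pairsWithBlocking(List<String> blockingValues) {
        Map<String, Long> blockSizes = new HashMap<>();
        for (String value : blockingValues) {
            blockSizes.merge(value, 1L, Long::sum);
        }
        long total = 0;
        for (long size : blockSizes.values()) {
            total += pairs(size);
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical Source values for six input records.
        List<String> sources = Arrays.asList("A", "A", "A", "B", "B", "B");
        System.out.println("without blocking: " + pairsWithoutBlocking(sources));
        System.out.println("with blocking:    " + pairsWithBlocking(sources));
    }
}
```

For six records split evenly over two sources, the count drops from 15 pairs to 6.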

  • Double-click the tMap component to open the
    Map Editor.

Use_Case_tStewardshipTaskOutput5.png

The input area to the left is already filled with the input schema coming from the
previous component in the Job design.

  • Click the [+] button in the upper right
    corner of the output area to add as many output tables as needed, two in this
    example: uniques and groups. The first
    table will group the unique data records and the second will group all the
    records that have matching distances to the master record in each group.

  • Drop the input columns to fill in the first output schema. For further
    information regarding data mapping, see
    Talend Studio User Guide.

    All the columns will be automatically filled in the Schema Editor in the lower half of the Map
    Editor.

  • Click

    Expression_Filter.png

    in the upper right corner of the first output table to add
    a condition to filter the data in the first output table:
    row2.GRP_SIZE == 1.

  • Drop the input columns to fill in the second output schema and add the
    following filter: row2.GRP_SIZE > 1 ||
    !row2.MASTER.

  • In the Schema Editor of the second output
    table, click the [+] button to add two extra
    columns: weight and istarget. The
    first measures the matching distance and the second decides whether the record
    will be a target record or a source record.

  • Click OK to close the Map Editor.

  • In the design workspace, right-click tMap and
    select the uniques link and drop it on the tLogRow component. Do the same to connect tMap to tStewardshipTaskOutput with the groups
    link.
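
The two tMap filter expressions used above can be read as plain Java boolean conditions. The sketch below applies them to three kinds of rows to show where each row ends up. It assumes that GRP_SIZE is 0 on non-master records, since the group size is computed only on the master record. The class and method names are hypothetical.

```java
final class TMapFilterSketch {

    // Filter of the first output table (uniques): row2.GRP_SIZE == 1
    static boolean goesToUniques(int grpSize) {
        return grpSize == 1;
    }

    // Filter of the second output table (groups): row2.GRP_SIZE > 1 || !row2.MASTER
    static boolean goesToGroups(int grpSize, boolean master) {
        return grpSize > 1 || !master;
    }

    public static void main(String[] args) {
        // A record without any match: master of a group of one.
        System.out.println("unique record -> uniques: " + goesToUniques(1)
                + ", groups: " + goesToGroups(1, true));
        // The master record of a group of two.
        System.out.println("master of 2   -> uniques: " + goesToUniques(2)
                + ", groups: " + goesToGroups(2, true));
        // A non-master record (GRP_SIZE assumed to be 0).
        System.out.println("non-master    -> uniques: " + goesToUniques(0)
                + ", groups: " + goesToGroups(0, false));
    }
}
```

Under that assumption, each row matches exactly one of the two filters, so no record is sent to both outputs.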

  • Double-click the tStewardshipTaskOutput
    component to display its Basic settings view
    and define the component properties.

Use_Case_tStewardshipTaskOutput6.png
  • In the Schema list, select Built-In and click the […] button next to Edit
    schema
    to open a dialog box.

Use_Case_tStewardshipTaskOutput7.png

The data is collected from the columns defined in the groups
output table in the tMap component.

  • Click OK to close the dialog box and proceed
    to the next step.

  • In the Url field, enter the URL for
    connecting to the stewardship console database.

  • In the Username and Password fields, enter your login and password to connect to the
    MDM server.

  • In the Task name field, enter a functional
    name for the task you want to list in
    Talend Data Stewardship Console.

  • From the Type list, select the type of the
    tasks you want to write in the stewardship console: Resolution or Data. In this
    example, only resolution tasks are to be written.

    For further information on task type, see
    Talend Data Stewardship Console
    User Guide.

  • In the Created by field, enter between
    inverted commas the name of the task creator, Administrator
    in this example. The task creator corresponds to a user of
    Talend MDM Web UI. For further information, see
    Talend MDM Web UI
    User Guide.

  • In the Owner field, enter between inverted
    commas the name of the task owner, the user to whom the task is assigned,
    Administrator in this example.

Note:

A task can be assigned to a specific user either from the Basic settings view of the tStewardshipTaskOutput component, or directly from the stewardship
console by an administrator. For further information, see tStewardshipTaskOutput.

  • In the Star field, enter between inverted
    commas the number of stars, 0 through 5, you want to assign to the task in the
    stewardship console to highlight importance.

  • In the Tags field, enter between inverted
    commas the name of the tag category associated with the tasks you want to write,
    not used in this example.

    For further information, see
    Talend Data Stewardship Console
    User Guide.

  • From the Looping column list, select a column
    in the input schema on which to base the loop, GID in this
    example.

  • From the Source/Target selector list, select
    the column that will decide if the record will be a target record or a source
    record.

  • From the Source list, select a source column
    in the input schema.

  • From the Score list, select the matching
    score column in the input schema.

  • From the Weights list, select the column that
    defines the matching distance for the input columns.

  • In the Extra info table, click the

    plus_button.png

    button to add one or several rows that you can use to add
    extra information to one or several records in the created task.

Note:

You can click the

AddAll_Button.png

button to add all your schema in one go without having to add
it row by row.

Use_Case_tStewardshipTaskOutput11.png
  • In the Title column, enter between inverted
    commas the role of the person who adds the information.

  • In the Info column, enter between inverted
    commas the extra information you want to attach to the selected column.

  • Click in the Scope column row and select from
    the list the record to which you want to add the extra information,
    PostalCode in this example.

    This will append a red mark to the PostalCode column
    when you open the relevant task in
    Talend Data Stewardship Console.

When the data steward places the pointer on this mark, the attached information is
displayed. Such information may help the steward in resolving the data record.

  • In the Record Column table, click the

    plus_button.png

    button to add the rows you want to show in each of the
    tasks to create in
    Talend Data Stewardship Console
    .

  • Click in each of the rows and select the column you want to show in each of the
    created tasks. In this example, each task must have four columns:
    Firstname, Lastname,
    PostalCode and DOB.

Note:

You can click the

AddAll_Button.png

button to add all your input schema in one go without having to
add it row by row.

  • Double-click the tLogRow component to display
    its Basic settings view and define the
    component properties.

  • Save your Job and press F6 to execute
    it.

Use_Case_tStewardshipTaskOutput8.png

The Run console displays the four columns from the
input flow.

The identifier for each group (task) is listed in the GID column
next to the corresponding record. The number of records in each of the tasks is listed
in the GRP_SIZE column and computed only on the master record. The
MASTER column indicates with “true” that the corresponding
record is a master record. The SCORE column lists the calculated
distance between the input record and the master record according to the Jaro-Winkler matching algorithm.
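
For readers unfamiliar with the algorithm, the Java sketch below implements the textbook Jaro-Winkler similarity (prefix scale 0.1, prefix capped at four characters). It is meant only to show how such a score is derived. The implementation and the weighting used by tMatchGroup may differ, and the sample strings are not taken from this scenario.

```java
final class JaroWinklerSketch {

    // Textbook Jaro similarity: a value between 0.0 (no match) and 1.0 (identical).
    static double jaro(String s1, String s2) {
        if (s1.equals(s2)) return 1.0;
        int len1 = s1.length(), len2 = s2.length();
        if (len1 == 0 || len2 == 0) return 0.0;
        // Characters match only if they are equal and not too far apart.
        int window = Math.max(0, Math.max(len1, len2) / 2 - 1);
        boolean[] matched1 = new boolean[len1];
        boolean[] matched2 = new boolean[len2];
        int matches = 0;
        for (int i = 0; i < len1; i++) {
            int lo = Math.max(0, i - window);
            int hi = Math.min(len2 - 1, i + window);
            for (int j = lo; j <= hi; j++) {
                if (!matched2[j] && s1.charAt(i) == s2.charAt(j)) {
                    matched1[i] = true;
                    matched2[j] = true;
                    matches++;
                    break;
                }
            }
        }
        if (matches == 0) return 0.0;
        // Count matched characters that appear in a different order.
        int k = 0, outOfOrder = 0;
        for (int i = 0; i < len1; i++) {
            if (!matched1[i]) continue;
            while (!matched2[k]) k++;
            if (s1.charAt(i) != s2.charAt(k)) outOfOrder++;
            k++;
        }
        double m = matches;
        return (m / len1 + m / len2 + (m - outOfOrder / 2.0) / m) / 3.0;
    }

    // Jaro-Winkler: boosts the Jaro score for strings sharing a common prefix.
    static double jaroWinkler(String s1, String s2) {
        double j = jaro(s1, s2);
        int maxPrefix = Math.min(4, Math.min(s1.length(), s2.length()));
        int prefix = 0;
        while (prefix < maxPrefix && s1.charAt(prefix) == s2.charAt(prefix)) prefix++;
        return j + prefix * 0.1 * (1.0 - j);
    }

    public static void main(String[] args) {
        // Classic textbook pair, not data from this scenario.
        System.out.printf("%.4f%n", jaroWinkler("MARTHA", "MARHTA"));
    }
}
```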

All other input records that have a matching distance are listed in
Talend Data Stewardship Console
waiting for a data steward to merge, compare and
resolve the data records.


Document sourced from Talend: https://help.talend.com