July 30, 2023

tHashInput – Docs for ESB 7.x

tHashInput

Reads data that tHashOutput has loaded into cache memory, providing a
high-speed data feed for transactions that involve a large amount of
data.

The components of the Technical family are hidden from the Palette by default. For more information about how to show
them on the Palette, see the Talend Studio User Guide.

tHashInput Standard properties

These properties are used to configure tHashInput running in the Standard Job framework.

The Standard tHashInput component belongs to the Technical family.

The component in this framework is available in all Talend products.

Basic settings

Schema and Edit schema

A schema is a row description. It defines the number of fields to
be processed and passed on to the next component. The schema is
either built-in or remotely stored in the Repository.

Click Edit schema to make changes to the schema. If the current schema is of the Repository type, three options are available:

  • View schema: choose this
    option to view the schema only.

  • Change to built-in property:
    choose this option to change the schema to Built-in for local changes.

  • Update repository connection:
    choose this option to change the schema stored in the repository and decide whether
    to propagate the changes to all the Jobs upon completion. If you just want to
    propagate the changes to the current Job, you can select No upon completion and choose this schema metadata
    again in the Repository Content
    window.

This component supports the dynamic schema feature, which allows you to
retrieve unknown columns from source files or to copy batches of columns from a source
without mapping each column individually. For further information about dynamic schemas,
see the Talend Studio User Guide.

The dynamic schema feature is designed for retrieving unknown columns of a
table and should be used for this purpose only; it is not recommended for
creating tables.

Built-in: The schema is created
and stored locally for this component only. Related topic: see the
Talend Studio User Guide.

Repository: The schema already
exists and is stored in the Repository, so it can be reused. Related
topic: see the Talend Studio User Guide.

Link with a tHashOutput

Select this check box to connect to a tHashOutput component. It is always selected by
default.

Component list

Drop-down list of available tHashOutput components.

Clear cache after reading

Select this check box to clear the cache after reading the data
loaded by a given tHashOutput component. Any subsequent tHashInput
components will then no longer be able to read the cached data loaded
by that tHashOutput component.

Advanced settings

tStatCatcher Statistics

Select this check box to collect log data at the component
level.

Global Variables

Global Variables

ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable and it returns a string. This
variable functions only if the Die on error check box is
cleared, if the component has this check box.

NB_LINE: the number of rows processed. This is an After
variable and it returns an integer.

A Flow variable functions during the execution of a component while an After variable
functions after the execution of the component.

To fill up a field or expression with a variable, press Ctrl + Space
to access the variable list and choose the variable to use from it.

For further information about variables, see the Talend Studio User Guide.
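In generated Job code, After variables are usually read from the globalMap object using keys of the form <component unique name>_<VARIABLE>, for example in a tJava component placed after the subJob. The sketch below models globalMap as a plain java.util.Map so it runs outside Talend Studio. It assumes the component's unique name is tHashInput_1, the helper nbLine is hypothetical, and the stored value 100000 is illustrative, not a result from this page.

```java
import java.util.HashMap;
import java.util.Map;

public class NbLineExample {
    // Hypothetical helper: reads the NB_LINE After variable of a component,
    // returning 0 when the component has not set it yet.
    static int nbLine(Map<String, Object> globalMap, String componentName) {
        Integer value = (Integer) globalMap.get(componentName + "_NB_LINE");
        return value == null ? 0 : value;
    }

    public static void main(String[] args) {
        // Stand-in for the globalMap object available in generated Talend Job code.
        Map<String, Object> globalMap = new HashMap<>();
        // Illustrative value; in a real Job, tHashInput_1 sets it after it finishes.
        globalMap.put("tHashInput_1_NB_LINE", 100000);

        System.out.println("Rows read: " + nbLine(globalMap, "tHashInput_1"));

        // ERROR_MESSAGE is a String and stays null unless an error occurred.
        String error = (String) globalMap.get("tHashInput_1_ERROR_MESSAGE");
        System.out.println("Error: " + (error == null ? "none" : error));
    }
}
```

Because After variables are only set once the component has finished, reading them inside the same running flow would return null, which is why the helper falls back to 0.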

Usage

Usage rule

This component is used along with tHashOutput and reads the data that
tHashOutput has loaded into cache memory. Together, these twin components
offer high-speed data access for transactions that involve a massive
amount of data.
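The relationship between the twin components can be pictured as a shared in-memory cache that one side fills and the other side reads. The following is a conceptual sketch only, not Talend's generated code or internal API: the class HashCacheSketch and its write and read methods are hypothetical, and each cache is modeled as a list of rows stored under the name of its tHashOutput. The clearAfterReading flag mirrors the Clear cache after reading check box.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class HashCacheSketch {
    // One in-memory list of rows per tHashOutput, keyed by component name (hypothetical model).
    private final Map<String, List<Object[]>> caches = new HashMap<>();

    // Models tHashOutput: load a row into its cache.
    void write(String hashOutputName, Object[] row) {
        caches.computeIfAbsent(hashOutputName, k -> new ArrayList<>()).add(row);
    }

    // Models tHashInput: read every cached row of the selected tHashOutput and,
    // if requested, clear that cache afterwards ("Clear cache after reading").
    List<Object[]> read(String hashOutputName, boolean clearAfterReading) {
        List<Object[]> rows = caches.getOrDefault(hashOutputName, new ArrayList<>());
        List<Object[]> copy = new ArrayList<>(rows);
        if (clearAfterReading) {
            caches.remove(hashOutputName);
        }
        return copy;
    }

    public static void main(String[] args) {
        HashCacheSketch cache = new HashCacheSketch();
        cache.write("tHashOutput_1", new Object[] {1, 3});
        cache.write("tHashOutput_1", new Object[] {2, 4});
        System.out.println(cache.read("tHashOutput_1", true).size());  // prints 2
        System.out.println(cache.read("tHashOutput_1", false).size()); // prints 0: cache was cleared
    }
}
```

The second read returns nothing, which corresponds to the documented effect of Clear cache after reading on any subsequent tHashInput components.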

Reading data from the cache memory for high-speed data access

The following Job reads a large amount of data that two tHashOutput
components have loaded into cache memory and passes it to a tFileOutputDelimited component. The goal of this scenario is to show
how fast mass data can be read and written. In practice, a data feed generated in
this way can serve as lookup table input in use cases where a large amount of data
needs to be referenced.

Dropping and linking the components

  1. Drag and drop the following components from the Palette to the workspace: tFixedFlowInput (X2), tHashOutput (X2), tHashInput and tFileOutputDelimited.
  2. Connect the first tFixedFlowInput to the
    first tHashOutput using a Row > Main
    link.
  3. Connect the second tFixedFlowInput to the
    second tHashOutput using a Row > Main
    link.
  4. Connect the first subJob (from tFixedFlowInput_1) to the second subJob (to tFixedFlowInput_2) using an OnSubjobOk link.
  5. Connect tHashInput to tFileOutputDelimited using a Row > Main
    link.
  6. Connect the second subJob to the last subJob using an OnSubjobOk link.

    tHashInput_1.png

Configuring the components

Configuring data inputs and hash cache

  1. Double-click the first tFixedFlowInput component to display its Basic settings view.

    tHashInput_2.png

  2. Select Built-In from the Schema drop-down list.

    Note:

    You can select Repository from the Schema drop-down list to fill
    in the relevant fields automatically if the relevant metadata has
    been stored in the Repository. For more information about Metadata,
    see the Talend Studio User Guide.

  3. Click Edit schema to define the data
    structure of the input flow. In this case, the input has two columns:
    ID and ID_Insurance. Then click OK to close the dialog box.

    tHashInput_3.png

  4. Fill in the Number of rows field to
    specify the number of rows to output, e.g. 50000.
  5. Select the Use Single Table check
    box. In the Values table and in the
    Value column, assign values to the
    columns, e.g. 1 for ID and 3
    for ID_Insurance.
  6. Perform the same operations for the second tFixedFlowInput component, the only difference being
    the values: 2 for ID and 4 for ID_Insurance in this case.
  7. Double-click the first tHashOutput to
    display its Basic settings view.

    tHashInput_4.png

  8. Select Built-In from the Schema drop-down list and click Sync columns to retrieve the schema from the
    previous component. Select Keep all
    from the Keys management drop-down list
    and keep the Append check box
    selected.
  9. Perform the same operations for the second tHashOutput component, and select the Link with a tHashOutput check box.
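The data loaded by the two subJobs can be sketched as follows. This assumes that, because the second tHashOutput is linked with the first, both components load into the same cache (modeled here as a single list), and that Keep all with Append selected simply adds every row. The class and the loadRows helper are hypothetical illustrations, not Talend code.

```java
import java.util.ArrayList;
import java.util.List;

public class LinkedHashOutputSketch {
    // Models one subJob: tFixedFlowInput emits `count` identical rows and
    // tHashOutput (Keep all, Append selected) adds them all to the cache.
    static void loadRows(List<int[]> cache, int count, int id, int idInsurance) {
        for (int i = 0; i < count; i++) {
            cache.add(new int[] {id, idInsurance});
        }
    }

    public static void main(String[] args) {
        // Assumption: the linked tHashOutput components share one cache.
        List<int[]> cache = new ArrayList<>();
        loadRows(cache, 50000, 1, 3); // first subJob
        loadRows(cache, 50000, 2, 4); // second subJob, triggered by OnSubjobOk
        // prints "Rows available to tHashInput: 100000"
        System.out.println("Rows available to tHashInput: " + cache.size());
    }
}
```

Under that assumption, tHashInput reading tHashOutput_1 sees 2 × 50000 = 100000 rows: first the (1, 3) rows, then the (2, 4) rows.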

Configuring data retrieval from hash cache and data output

  1. Double-click tHashInput to display
    its Basic settings view.

    tHashInput_5.png

  2. Select Built-In from the Schema drop-down list. Click Edit schema to define the data structure,
    which is the same as that of tHashOutput.
  3. Select tHashOutput_1 from the
    Component list drop-down
    list.
  4. Double-click tFileOutputDelimited to
    display its Basic settings view.

    tHashInput_6.png

  5. Select Built-In from the Property Type drop-down list. In the
    File Name field, enter the full
    path and name of the file, e.g. "E:/Allr70207V5.0/Talend-All-r70207-V5.0.0NB/workspace/out.csv".
  6. Select the Include Header check box
    and click Sync columns to retrieve the
    schema from the previous component.
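With Include Header selected, the output file starts with a line of column names followed by one line per row. The sketch below builds that content in memory; the ";" field separator and "\n" row separator are assumptions (this scenario does not state the separators), and DelimitedOutputSketch with its toDelimited helper is hypothetical rather than what tFileOutputDelimited generates.

```java
import java.util.Arrays;
import java.util.List;

public class DelimitedOutputSketch {
    // Assumed separators, not stated in this scenario.
    static final String FIELD_SEPARATOR = ";";
    static final String ROW_SEPARATOR = "\n";

    // Builds the file content: a header line (Include Header) and one line per row.
    static String toDelimited(String[] header, List<int[]> rows) {
        StringBuilder sb = new StringBuilder(String.join(FIELD_SEPARATOR, header)).append(ROW_SEPARATOR);
        for (int[] row : rows) {
            for (int i = 0; i < row.length; i++) {
                if (i > 0) {
                    sb.append(FIELD_SEPARATOR);
                }
                sb.append(row[i]);
            }
            sb.append(ROW_SEPARATOR);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<int[]> rows = Arrays.asList(new int[] {1, 3}, new int[] {2, 4});
        // In the Job this text would be written to out.csv; here it is printed instead.
        System.out.print(toDelimited(new String[] {"ID", "ID_Insurance"}, rows));
    }
}
```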

Saving and executing the Job

  1. Press Ctrl+S to save the Job.
  2. Press F6, or click Run on the Run tab to
    execute the Job.

tHashInput_7.png

The result shows that the mass entries are read and written very rapidly.

Clearing the memory before loading data to it in case an iterator exists in the same subJob

This scenario demonstrates how the Append option of
tHashOutput, when cleared, helps remove repetitive or unwanted data
when an iterator exists in the same subJob as tHashOutput.

To build the Job, do the following:

Dropping and linking the components

  1. Drag and drop the following components from the Palette to the workspace: tLoop, tFixedFlowInput,
    tHashOutput, tHashInput and tLogRow.
  2. Connect tLoop to tFixedFlowInput using a Row
    > Iterate link.
  3. Connect tFixedFlowInput to tHashOutput using a Row > Main link.
  4. Connect tHashInput to tLogRow using a Row > Main link.
  5. Connect tLoop to tHashInput using an OnSubjobOk link.

    tHashInput_8.png

Configuring the components

Configuring data input and hash cache

  1. Double-click the tLoop component to
    display its Basic settings view.

    tHashInput_9.png

  2. Select For as the loop type. Type in 1, 2 and 1 in the From, To and Step
    fields respectively. Keep the Values are increasing check box
    selected.
  3. Double-click the tFixedFlowInput
    component to display its Basic settings
    view.

    tHashInput_10.png

  4. Select Built-In from the Schema drop-down list.

    Note:

    You can select Repository from the Schema drop-down list to fill
    in the relevant fields automatically if the relevant metadata has
    been stored in the Repository. For more information about Metadata,
    see the Talend Studio User Guide.

  5. Click Edit schema to define the data
    structure of the input flow. In this case, the input has one column:
    Name.

    tHashInput_11.png

  6. Click OK to close the dialog
    box.
  7. Fill in the Number of rows field to
    specify the entries to output, for example 1.
  8. Select the Use Single Table check
    box. In the Values table, assign a
    value to the Name field, e.g. Marx.
  9. Double-click tHashOutput to display
    its Basic settings view.

    tHashInput_12.png

  10. Select Built-In from the Schema drop-down list and click Sync columns to retrieve the schema from the
    previous component. Select Keep all
    from the Keys management drop-down list
    and deselect the Append check
    box.
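The effect of the loop and of clearing Append can be simulated in a few lines. The sketch assumes that the tLoop For settings (From 1, To 2, Step 1, values increasing) produce two iterations including the upper bound, and that with Append cleared tHashOutput empties its cache before loading data on each iteration, as this scenario's title describes. AppendOptionSketch and runSubJob are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

public class AppendOptionSketch {
    // Simulates the tLoop -> tFixedFlowInput -> tHashOutput subJob and returns
    // the rows left in the cache for tHashInput to read afterwards.
    static List<String> runSubJob(int from, int to, int step, boolean append) {
        List<String> cache = new ArrayList<>();
        // Assumed tLoop "For" semantics: the To value is included.
        for (int i = from; i <= to; i += step) {
            if (!append) {
                // Append cleared: the cache is emptied before each load.
                cache.clear();
            }
            // tFixedFlowInput emits one row per iteration (Number of rows = 1).
            cache.add("Marx");
        }
        return cache;
    }

    public static void main(String[] args) {
        // prints "Append cleared:  1 row(s)"
        System.out.println("Append cleared:  " + runSubJob(1, 2, 1, false).size() + " row(s)");
        // prints "Append selected: 2 row(s)"
        System.out.println("Append selected: " + runSubJob(1, 2, 1, true).size() + " row(s)");
    }
}
```

With Append selected, the row from the first iteration would remain in the cache, so tHashInput would read a duplicate "Marx" row.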

Configuring data retrieval from hash cache and data output

  1. Double-click tHashInput to display
    its Basic settings view.

    tHashInput_13.png

  2. Select Built-In from the Schema drop-down list. Click Edit schema to define the data structure,
    which is the same as that of tHashOutput.
  3. Select tHashOutput_2 from the
    Component list drop-down
    list.
  4. Double-click tLogRow to display its
    Basic settings view.

    tHashInput_14.png

  5. Select Built-In from the Schema drop-down list and click Sync columns to retrieve the schema from the
    previous component. In the Mode area,
    select Table (print values in cells of a table).

Saving and executing the Job

  1. Press Ctrl+S to save the Job.
  2. Press F6, or click Run on the Run tab to
    execute the Job.

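    Only one row is output, although tFixedFlowInput generated two rows
    (one in each loop iteration): because the Append check box is cleared,
    the cache is cleared before data is loaded to it on each iteration.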
    tHashInput_15.png


Document obtained from Talend: https://help.talend.com