August 17, 2023

tCassandraInput – Docs for ESB 5.x

tCassandraInput


Warning

This component is available in the Palette of the studio only if you have subscribed to
one of the Talend solutions with Big Data.

tCassandraInput properties

Component family

Big Data / Cassandra

 

Function

tCassandraInput allows you to
read data from a Cassandra keyspace and send that data into the Talend
flow.

Purpose

tCassandraInput allows you to
extract the desired data from a standard or super column family of a
Cassandra keyspace so as to apply changes to the data.

Basic settings

Use existing connection

Select this check box and in the Component List click the
relevant connection component to reuse the connection details you already defined.

 

DB Version

Select the Cassandra version you are using.

 

Host

Hostname or IP address of the Cassandra server.

 

Port

Listening port number of the Cassandra server.

 

Required authentication

Select this check box to provide credentials for the Cassandra
authentication.

This check box appears only if you do not select the Use existing connection check box.

 

Username

Fill in this field with the username for the Cassandra
authentication.

 

Password

Fill in this field with the password for the Cassandra
authentication.

To enter the password, click the […] button next to the
password field, and then in the pop-up dialog box enter the password between double quotes
and click OK to save the settings.

Keyspace configuration

Keyspace

Type in the name of the keyspace from which you want to read
data.

Column family configuration

Column family

Type in the name of the column family from which you want to read
data.

 

Column family type

Standard: Column family is of
standard type.

Super: Column family is of super
type.

 

Include key in output columns

Select this check box to include the key of the column family in
output columns.

  • Key column: select the
    key column from the list.

 

Row key type

Select the appropriate Talend data type for the row key from the
list.

 

Row key Cassandra type

Select the corresponding Cassandra type for the row key from the
list.

Warning

The value of the Default
option varies with the selected row key type. For example, if
you select String from the
Row key type list, the
value of the Default option
will be UTF8.

For more information about the mapping table between Cassandra
type and Talend data type, see Mapping table between Cassandra type and Talend data type.

 

Include super key output columns

Select this check box to include the super key of the column
family in output columns.

  • Super key column: select
    the desired super key column from the list.

This check box appears only if you select Super from the Column family type
drop-down list.

 

Super column type

Select the type of the super column from the list.

 

Super column Cassandra type

Select the corresponding Cassandra type for the super column from
the list.

For more information about the mapping table between Cassandra
type and Talend data type, see Mapping table between Cassandra type and Talend data type.

Query configuration

Specify row keys

Select this check box to specify the row keys of the column family
directly.

 

Row Keys

Type in the specific row keys of the column family in the correct
format depending on the row key type.

This field appears only if you select the Specify row keys check box.

 

Key start

Type in the start row key of the correct data type.

 

Key end

Type in the end row key of the correct data type.

 

Key limit

Type in the number of rows to be read between the start row key
and the end row key.

 

Specify columns

Select this check box to specify the column names of the column
family directly.

 

Columns

Type in the specific column names of the column family in the
correct format depending on the column type.

This field appears only if you select the Specify columns check box.

 

Columns range start

Type in the start column name of the correct data type.

 

Columns range end

Type in the end column name of the correct data type.

 

Columns range limit

Type in the number of columns to be read between the start column
and the end column.

 

Schema and Edit Schema

A schema is a row description. It defines the number of fields to be processed and passed on
to the next component. The schema is either Built-In or
stored remotely in the Repository.

Since version 5.6, both the Built-In mode and the Repository mode are
available in any of the Talend solutions.

Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:

  • View schema: choose this option to view the
    schema only.

  • Change to built-in property: choose this option
    to change the schema to Built-in for local
    changes.

  • Update repository connection: choose this option to change
    the schema stored in the repository and decide whether to propagate the changes to
    all the Jobs upon completion. If you just want to propagate the changes to the
    current Job, you can select No upon completion and
    choose this schema metadata again in the [Repository Content]
    window.

Advanced settings

tStatCatcher Statistics

Select this check box to gather the Job processing metadata at the
Job level as well as at each component level.

Global Variables

NB_LINE: the number of rows read by an input component or
transferred to an output component. This is an After variable and it returns an
integer.

ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable and it returns a string. This
variable functions only if the Die on error check box is
cleared, if the component has this check box.

A Flow variable functions during the execution of a component while an After variable
functions after the execution of the component.

To fill in a field or expression with a variable, press Ctrl + Space
to access the variable list and choose the variable to use from it.

For further information about variables, see Talend Studio
User Guide.

Usage

This component always needs an output link.

Log4j

The activity of this component can be logged using the log4j feature. For more information on this feature, see Talend Studio User
Guide.

For more information on the log4j logging levels, see the Apache documentation at http://logging.apache.org/log4j/1.2/apidocs/org/apache/log4j/Level.html.

Limitation

n/a
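
The global variables listed above (NB_LINE and ERROR_MESSAGE) are usually read in Java expressions through a Job-level map named globalMap, with keys of the form component label plus variable name. The following is a minimal sketch of that lookup, not the code the studio generates: the map is simulated with a plain HashMap so that the example runs on its own, and the component label tCassandraInput_1 and the helper names are assumptions.

```java
import java.util.HashMap;
import java.util.Map;

class GlobalVariablesSketch {

    // Reads NB_LINE for a component label; returns 0 when the variable is not set yet
    // (After variables are only available once the component has finished).
    static int rowsRead(Map<String, Object> globalMap, String componentLabel) {
        Object value = globalMap.get(componentLabel + "_NB_LINE");
        return value == null ? 0 : (Integer) value;
    }

    // Reads ERROR_MESSAGE; returns null when no error was recorded.
    static String errorMessage(Map<String, Object> globalMap, String componentLabel) {
        return (String) globalMap.get(componentLabel + "_ERROR_MESSAGE");
    }

    public static void main(String[] args) {
        // Stand-in for the map a running Job would provide (assumption of this sketch).
        Map<String, Object> globalMap = new HashMap<>();
        globalMap.put("tCassandraInput_1_NB_LINE", 3);

        System.out.println("Rows read: " + rowsRead(globalMap, "tCassandraInput_1"));
        System.out.println("Error: " + errorMessage(globalMap, "tCassandraInput_1"));
    }
}
```

Inside a Job, the same lookup is typically written inline in a field or expression, with the cast matching the type stated for the variable (integer for NB_LINE, string for ERROR_MESSAGE).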

Mapping table between Cassandra type and Talend data type

The following table presents the mapping relationships between Cassandra type and
Talend data type.

Cassandra Type    Talend Data Type

BytesType         byte[]
AsciiType         String
UTF8Type          String
IntegerType       Object
Int32Type         Integer
LongType          Long
UUIDType          String
TimeUUIDType      String
DateType          Date
BooleanType       Boolean
FloatType         Float
DoubleType        Double
DecimalType       BigDecimal
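
For use in custom Java code, the mapping above can be written down as a lookup table. This is only a sketch that copies the table row by row. The class and method names are hypothetical, and resolving Date to java.util.Date and BigDecimal to java.math.BigDecimal is an assumption based on the usual Java types behind those Talend type names.

```java
import java.math.BigDecimal;
import java.util.Date;
import java.util.LinkedHashMap;
import java.util.Map;

class CassandraTypeMappingSketch {

    // Cassandra type name -> Java class behind the Talend data type, copied from the table.
    static final Map<String, Class<?>> TALEND_TYPE = new LinkedHashMap<>();

    static {
        TALEND_TYPE.put("BytesType", byte[].class);
        TALEND_TYPE.put("AsciiType", String.class);
        TALEND_TYPE.put("UTF8Type", String.class);
        TALEND_TYPE.put("IntegerType", Object.class);
        TALEND_TYPE.put("Int32Type", Integer.class);
        TALEND_TYPE.put("LongType", Long.class);
        TALEND_TYPE.put("UUIDType", String.class);
        TALEND_TYPE.put("TimeUUIDType", String.class);
        TALEND_TYPE.put("DateType", Date.class);
        TALEND_TYPE.put("BooleanType", Boolean.class);
        TALEND_TYPE.put("FloatType", Float.class);
        TALEND_TYPE.put("DoubleType", Double.class);
        TALEND_TYPE.put("DecimalType", BigDecimal.class);
    }

    // Returns the mapped class, or fails loudly for a type the table does not list.
    static Class<?> talendTypeFor(String cassandraType) {
        Class<?> type = TALEND_TYPE.get(cassandraType);
        if (type == null) {
            throw new IllegalArgumentException("Unmapped Cassandra type: " + cassandraType);
        }
        return type;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, Class<?>> entry : TALEND_TYPE.entrySet()) {
            System.out.println(entry.getKey() + " -> " + entry.getValue().getSimpleName());
        }
    }
}
```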

Scenario: Handling data with Cassandra

This scenario describes a simple Job that reads the employee data from a CSV file,
writes the data to a Cassandra keyspace, then extracts the personal information of some
employees and displays the information on the console.

use_case_cassandrainput.png

This scenario requires six components, which are:

  • tCassandraConnection: opens a connection
    to the Cassandra server.

  • tFileInputDelimited: reads the input
    file, defines the data structure and sends it to the next component.

  • tCassandraOutput: writes the data it
    receives from the preceding component into a Cassandra keyspace.

  • tCassandraInput: reads the data from the
    Cassandra keyspace.

  • tLogRow: displays the data it receives
    from the preceding component on the console.

  • tCassandraClose: closes the connection to
    the Cassandra server.

Dropping and linking the components

  1. Drop the following components from the Palette onto the design workspace: tCassandraConnection, tFileInputDelimited, tCassandraOutput, tCassandraInput, tLogRow
    and tCassandraClose.

  2. Connect tFileInputDelimited to tCassandraOutput using a Row > Main link.

  3. Do the same to connect tCassandraInput to
    tLogRow.

  4. Connect tCassandraConnection to tFileInputDelimited using a Trigger > OnSubjobOk
    link.

  5. Do the same to connect tFileInputDelimited to tCassandraInput and tCassandraInput to tCassandraClose.

  6. Label the components to better identify their functions.
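
The OnSubjobOk links set up above mean that each subjob starts only after the previous one has finished without error. The sketch below models that ordering in plain Java. It is an illustration of the trigger semantics only, and the Subjob interface, the runChain method and the subjob names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class SubjobChainSketch {

    // A unit of work that may fail, standing in for one subjob of the Job.
    interface Subjob {
        void run() throws Exception;
    }

    // Runs the subjobs in insertion order and stops at the first failure.
    // Returns the names of the subjobs that completed successfully.
    static List<String> runChain(Map<String, Subjob> subjobs) {
        List<String> completed = new ArrayList<>();
        for (Map.Entry<String, Subjob> entry : subjobs.entrySet()) {
            try {
                entry.getValue().run();
            } catch (Exception e) {
                System.err.println(entry.getKey() + " failed: " + e.getMessage());
                break; // OnSubjobOk: the following subjobs are not triggered
            }
            completed.add(entry.getKey());
        }
        return completed;
    }

    public static void main(String[] args) {
        // Same order as the links in this scenario; the bodies are placeholders.
        Map<String, Subjob> job = new LinkedHashMap<>();
        job.put("open connection", () -> System.out.println("connection opened"));
        job.put("file to Cassandra", () -> System.out.println("rows written"));
        job.put("Cassandra to console", () -> System.out.println("rows displayed"));
        job.put("close connection", () -> System.out.println("connection closed"));

        System.out.println("Completed: " + runChain(job));
    }
}
```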

Configuring the components

Opening a Cassandra connection

  1. Double-click the tCassandraConnection
    component to open its Basic settings view
    in the Component tab.

    use_case_cassandrainput1.png

  2. Select the Cassandra version that you are using from the DB Version list. In this example, it is Cassandra 1.1.2.

  3. In the Server field, type in the hostname
    or IP address of the Cassandra server. In this example, it is localhost.

  4. In the Port field, type in the listening
    port number of the Cassandra server.

  5. If required, type in the authentication information for the Cassandra
    connection: Username and Password.

Reading the input data

  1. Double-click the tFileInputDelimited
    component to open its Component view.

    use_case_cassandrainput2.png

  2. Click the […] button next to the
    File Name/Stream field to browse to the
    file that you want to read data from. In this scenario, the file path is
    D:/Input/Employees.csv. The CSV file
    contains four columns: id, age, name
    and ManagerID.

  3. In the Header field, enter 1 so that the first row in the CSV file will be
    skipped.

  4. Click Edit schema to define the data to
    pass on to the tCassandraOutput component.

    use_case_cassandrainput7.png
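
The effect of the Header setting and of the four-column schema can be sketched in plain Java as follows. This is not the component's implementation. The semicolon separator, the integer type of age and ManagerID, and the sample rows are assumptions made for the example; only the column names and the integer id come from this scenario.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class DelimitedInputSketch {

    // One row of the scenario schema: id, age, name, ManagerID.
    static final class Employee {
        final int id;
        final int age;
        final String name;
        final int managerId;

        Employee(int id, int age, String name, int managerId) {
            this.id = id;
            this.age = age;
            this.name = name;
            this.managerId = managerId;
        }
    }

    // Skips the first `header` lines, then turns every remaining line into an Employee.
    static List<Employee> parse(List<String> lines, int header, String separator) {
        List<Employee> rows = new ArrayList<>();
        for (int i = header; i < lines.size(); i++) {
            String[] f = lines.get(i).split(Pattern.quote(separator), -1);
            rows.add(new Employee(
                    Integer.parseInt(f[0].trim()),
                    Integer.parseInt(f[1].trim()),
                    f[2].trim(),
                    Integer.parseInt(f[3].trim())));
        }
        return rows;
    }

    public static void main(String[] args) {
        // Made-up file content; the real Employees.csv is not shown in this scenario.
        List<String> lines = Arrays.asList(
                "id;age;name;ManagerID",
                "1;34;Alice;0",
                "2;28;Bob;1");
        for (Employee e : parse(lines, 1, ";")) {
            System.out.println(e.id + " | " + e.name + " | " + e.age + " | " + e.managerId);
        }
    }
}
```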

Writing data to a Cassandra keyspace

  1. Double-click the tCassandraOutput
    component to open its Basic settings view
    in the Component tab.

    use_case_cassandrainput3.png
  2. Type in required information for the connection or use the existing
    connection you have configured before. In this scenario, the Use existing connection check box is
    selected.

  3. In the Keyspace configuration area, type
    in the name of the keyspace: Employee in
    this example, and select Drop keyspace if exists and create
    from the Action on keyspace list.

  4. In the Column family configuration area,
    type in the name of the column family: Employee_Info in this example, and select Drop column family if exists and create from
    the Action on column family list.

    The Define column family structure check
    box appears. In this example, clear this check box.

  5. In the Action on data list, select the
    action you want to carry out, Upsert in
    this example.

  6. Click Sync columns to retrieve the schema
    from the preceding component.

  7. Select the key column of the column family from the Key column list. In this example, it is id.

    If needed, select the Include key in columns check box.
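
The Upsert action selected above writes each incoming row under its key column, whether or not that key already exists. The sketch below models this with an in-memory map keyed by id. It is an illustration under assumptions rather than the component's code: the column family is simulated, the names are hypothetical, and overwriting only the supplied columns of an existing row is this sketch's reading of upsert behaviour.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

class UpsertSketch {

    // Inserts the row if the key is new; otherwise overwrites only the supplied columns.
    static void upsert(Map<Integer, Map<String, Object>> columnFamily,
                       int key, Map<String, Object> columns) {
        columnFamily.computeIfAbsent(key, k -> new LinkedHashMap<>()).putAll(columns);
    }

    public static void main(String[] args) {
        // Simulated Employee_Info column family, keyed by the id column.
        Map<Integer, Map<String, Object>> employeeInfo = new TreeMap<>();

        Map<String, Object> first = new LinkedHashMap<>();
        first.put("name", "Alice"); // made-up values
        first.put("age", 34);
        upsert(employeeInfo, 1, first);

        Map<String, Object> update = new LinkedHashMap<>();
        update.put("age", 35);
        upsert(employeeInfo, 1, update); // same key: updates age, keeps name

        System.out.println(employeeInfo);
    }
}
```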

Reading data from the Cassandra keyspace

  1. Double-click the tCassandraInput
    component to open its Component
    view.

    use_case_cassandrainput4.png
  2. Type in required information for the connection or use the existing
    connection you have configured before. In this scenario, the Use existing connection check box is
    selected.

  3. In the Keyspace configuration area, type
    in the name of the keyspace: Employee in
    this example.

  4. In the Column family configuration area,
    type in the name of the column family: Employee_Info in this example.

  5. Select Edit schema to define the data
    structure to be read from the Cassandra keyspace. In this example, three
    columns id, name and age are
    defined.

    components-use_case_tcassandrainput_schema.png
  6. If needed, select the Include key in output columns
    check box, and then select the key column of the
    column family you want to include from the Key column
    list.

  7. From the Row key type list, select
    Integer because id is of integer type in this example.

    Keep the Default option for the row key
    Cassandra type because its value will become the corresponding Cassandra
    type Int32 automatically.

  8. In the Query configuration area, select
    the Specify row keys check box and specify
    the row keys directly. In this example, three rows will be read. Next,
    select the Specify columns check box and
    specify the column names of the column family directly. This scenario will
    read three columns from the keyspace: id,
    name and age.

  9. If needed, the Key start and the
    Key end fields allow you to define the
    range of rows, and the Key limit field
    allows you to specify the number of rows within the range of rows to be
    read. Similarly, the Columns range start
    and the Columns range end fields allow you
    to define the range of columns of the column family, and the Columns range limit field allows you to specify
    the number of columns within the range of columns to be read.
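
The two ways of selecting rows described in steps 8 and 9 (an explicit list of row keys, or a key range with a limit) can be sketched against an in-memory, key-ordered map. This is a simplified model, not the query the component sends. The sorted map, the inclusive bounds, the method names and the sample rows are all assumptions, and the order in which a real Cassandra cluster returns a key range depends on its partitioner.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

class CassandraQuerySketch {

    // "Specify columns": keeps only the requested columns of one row.
    static Map<String, Object> project(Map<String, Object> row, List<String> columns) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (String column : columns) {
            if (row.containsKey(column)) {
                out.put(column, row.get(column));
            }
        }
        return out;
    }

    // "Specify row keys": reads exactly the listed rows, skipping keys that do not exist.
    static Map<Integer, Map<String, Object>> readByKeys(
            NavigableMap<Integer, Map<String, Object>> columnFamily,
            List<Integer> rowKeys, List<String> columns) {
        Map<Integer, Map<String, Object>> result = new LinkedHashMap<>();
        for (Integer key : rowKeys) {
            Map<String, Object> row = columnFamily.get(key);
            if (row != null) {
                result.put(key, project(row, columns));
            }
        }
        return result;
    }

    // "Key start" / "Key end" / "Key limit": reads at most keyLimit rows from the range.
    // Both bounds are treated as inclusive (assumption of this sketch).
    static Map<Integer, Map<String, Object>> readByRange(
            NavigableMap<Integer, Map<String, Object>> columnFamily,
            int keyStart, int keyEnd, int keyLimit, List<String> columns) {
        Map<Integer, Map<String, Object>> result = new LinkedHashMap<>();
        for (Map.Entry<Integer, Map<String, Object>> e
                : columnFamily.subMap(keyStart, true, keyEnd, true).entrySet()) {
            if (result.size() >= keyLimit) {
                break;
            }
            result.put(e.getKey(), project(e.getValue(), columns));
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up rows keyed by id; the real content of Employee_Info is not shown here.
        NavigableMap<Integer, Map<String, Object>> employeeInfo = new TreeMap<>();
        for (int id = 1; id <= 5; id++) {
            Map<String, Object> row = new LinkedHashMap<>();
            row.put("id", id);
            row.put("name", "employee" + id);
            row.put("age", 30 + id);
            row.put("ManagerID", 1);
            employeeInfo.put(id, row);
        }
        List<String> columns = Arrays.asList("id", "name", "age");
        System.out.println(readByKeys(employeeInfo, Arrays.asList(1, 3, 5), columns));
        System.out.println(readByRange(employeeInfo, 2, 5, 2, columns));
    }
}
```

The Columns range start, Columns range end and Columns range limit fields follow the same idea, applied to the column names of a row instead of the row keys.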

Displaying the information of interest

  1. Double-click the tLogRow component to
    open its Component view.

  2. In the Mode area, select Table (print values in cells of a table).

Closing the Cassandra connection

  1. Double-click the tCassandraClose
    component to open its Component
    view.

    use_case_cassandrainput5.png

  2. Select the connection to be closed from the Component List.

Saving and executing the Job

  1. Press Ctrl+S to save your Job.

  2. Execute the Job by pressing F6 or
    clicking Run on the Run tab.

    The personal information of three employees is displayed on the
    console.

    use_case_cassandrainput6.png


Source: Talend documentation, https://help.talend.com