
tSnowflakeInput

Reads data from a Snowflake table into the data flow of your Job based on an SQL
query.

Depending on the Talend product you are using, this component can be used in one, some
or all of the following Job frameworks:

  • Standard: see tSnowflakeInput Standard properties.

  • Spark Batch: see tSnowflakeInput properties for Apache Spark Batch (technical preview).

tSnowflakeInput Standard properties

These properties are used to configure tSnowflakeInput running in the Standard Job framework.

The Standard
tSnowflakeInput component belongs to the
Cloud family.

The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database
connector. The properties related to database settings vary depending on your database
type selection. For more information about dynamic database connectors, see Dynamic database components.

Basic settings

Database

Select a type of database from the list and click
Apply.

Property Type

Select the way the connection details
will be set.

  • Built-In: The connection details will be set
    locally for this component. You need to specify the values for all
    related connection properties manually.

  • Repository: The connection details stored
    centrally in Repository > Metadata will be reused by this component. You need to click
    the […] button next to it and in the pop-up
    Repository Content dialog box, select the
    connection details to be reused, and all related connection
    properties will be automatically filled in.

This property is not available when another connection component is selected
from the Connection Component drop-down list.

Connection Component

Select the component that opens the database connection to be reused by this
component.

Account

In the Account field, enter, in double quotation marks, the account name
that has been assigned to you by Snowflake.

User Id and Password

Enter, in double quotation marks, your authentication
information to log in to Snowflake.

  • In the User ID field, enter, in double quotation marks, your login name, which is
    defined in Snowflake using the LOGIN_NAME parameter. For details, ask the
    administrator of your Snowflake system.

  • To enter the password, click the […] button next to the
    password field, and then in the pop-up dialog box enter the password between double quotes
    and click OK to save the settings.

Warehouse

Enter, in double quotation marks, the name of the
Snowflake warehouse to be used. This name is case-sensitive and is normally upper
case in Snowflake.

Schema

Enter, within double quotation marks, the name of the
database schema to be used. This name is case-sensitive and is normally upper case
in Snowflake.

Database

Enter, in double quotation marks, the name of the
Snowflake database to be used. This name is case-sensitive and is normally upper
case in Snowflake.

Table

Click the […] button and in the displayed wizard, select the Snowflake
table to be used.

Schema and Edit Schema

A schema is a row description. It defines the number of fields
(columns) to be processed and passed on to the next component. When you create a Spark
Job, avoid the reserved word line when naming the
fields.

Built-In: You create and store the schema locally for this component
only.

Repository: You have already created the schema and stored it in the
Repository. You can reuse it in various projects and Job designs.

If the Snowflake data type to
be handled is VARIANT, OBJECT or ARRAY, while defining the schema in the
component, select String for the
corresponding data in the Type
column of the schema editor wizard.

Click Edit
schema
to make changes to the schema. If the current schema is of the Repository type, three options are available:

  • View schema: choose this
    option to view the schema only.

  • Change to built-in property:
    choose this option to change the schema to Built-in for local changes.

  • Update repository connection:
    choose this option to change the schema stored in the repository and decide whether
    to propagate the changes to all the Jobs upon completion. If you just want to
    propagate the changes to the current Job, you can select No upon completion and choose this schema metadata
    again in the Repository Content
    window.

Note that if the input value of any non-nullable primitive
field is null, the row of data including that field will be rejected.

This component offers the advantage of the dynamic schema feature. This allows you to
retrieve unknown columns from source files or to copy batches of columns from a source
without mapping each column individually. For further information about dynamic schemas,
see the Talend Studio User Guide.

The dynamic schema feature is designed for retrieving unknown columns of a table and is
recommended for that purpose only; it is not recommended for creating tables.

Manual query

Select this check box and, in the Full SQL query string field, enter, within double
quotation marks, the query statement to be executed. For information about Snowflake
SQL commands, see the SQL Command Reference.
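
For example, assuming the EMPLOYEE table used later on this page, a statement like the
following could be entered, including the surrounding double quotation marks:

  "SELECT ID, NAME, ONBOARD, SALARY FROM EMPLOYEE"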

Condition

Enter, in double quotation marks, the boolean expression to be used to filter data
from your Snowflake table. This expression is used as the WHERE clause in the SELECT
query run by tSnowflakeInput.
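
For example, with the EMPLOYEE table used later on this page, entering "ID > 2" in this
field makes tSnowflakeInput run a query similar to the following (the selected columns
come from the component schema):

  SELECT ID, NAME, ONBOARD, SALARY FROM EMPLOYEE WHERE ID > 2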

Advanced settings

Additional JDBC
Parameters

Specify additional connection properties for the database connection you are
creating. The properties are separated by semicolons and each property is a key-value
pair, for example, encryption=1;clientname=Talend.

This field is available only when you
select Use this Component from the Connection Component drop-down list and select
Internal from the Storage drop-down list in the Basic settings view.

Use Custom Snowflake Region

Select this check box to specify a custom Snowflake region. This option is available
only when you select Use This Component from the Connection Component drop-down list
in the Basic settings view.

  • Region ID: enter a
    region ID in double quotation marks, for example eu-west-1 or east-us-2.azure.

For more information on Snowflake Region
ID, see Supported Regions.

Login Timeout

Specify the timeout period (in minutes)
of Snowflake login attempts. An error will be generated if no response is received
in this period.

Tracing

Select the log level for the Snowflake JDBC driver. If
enabled, a standard Java log is generated.

Role

Enter, in double quotation marks, the default access
control role to use to initiate the Snowflake session.

This role must already exist and must have been granted to the user ID you are using
to connect to Snowflake. If this field is left empty, the PUBLIC role is automatically
granted. For information about the Snowflake access control model, see Understanding
the Access Control Model.
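
For example, a Snowflake administrator could create such a role and grant it to your
user with statements like the following (the role and user names are illustrative):

  CREATE ROLE ANALYST;
  GRANT ROLE ANALYST TO USER TALEND_USER;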

Allow Snowflake to convert
columns and tables to uppercase

Select this check box to convert lowercase letters in the defined table name and
schema column names to uppercase. Note that unquoted identifiers must match the
Snowflake Identifier Syntax.

If you deselect the check box, all identifiers are
automatically quoted.

This property is not available when you select the
Manual Query check box.

For more information on the Snowflake Identifier Syntax,
see Identifier Syntax.
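
For example, in Snowflake an unquoted identifier is resolved to uppercase, while a
quoted identifier keeps its exact case, so the following two queries can target
different tables (the table names are illustrative):

  SELECT * FROM employee;      -- resolved as EMPLOYEE
  SELECT * FROM "employee";    -- matches only a table created with the lowercase name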

tStatCatcher Statistics

Select this check box to gather the Job processing metadata at the Job level
as well as at each component level.

Global Variables

NB_LINE

The number of rows processed. This is an After variable and it returns an integer.

ERROR_MESSAGE

The error message generated by the component when an error occurs. This is an After
variable and it returns a string.

Usage

Usage rule

This component is the start component of a data flow in your Job. It sends data to
other components via a Row link.

Writing data into and reading data from a Snowflake table

Here is an example of using Talend Snowflake components to connect to Snowflake,
create a Snowflake table, write data into the table, read data from this table, and
finally close the connection to Snowflake.

(Image: layout of the example Job)

Creating a Job for writing and reading Snowflake data

Create a Job to connect to Snowflake, create a Snowflake table, write data into
the table, read data from the table, and finally close the connection to
Snowflake.

Before you begin, make sure that:

  • The Snowflake data warehouse to be used is started and is not in the Suspended
    status.

  • The Snowflake role assigned to you has read and write permissions to the
    database to be connected.

If you are not sure about these requirements, ask the administrator of your Snowflake
system.

  1. In the
    Integration
    perspective of the Studio, create a new Job from the Job Designs node in the Repository tree view.
  2. Add a tSnowflakeConnection component, a
    tSnowflakeRow component, a
    tFixedFlowInput component, a
    tSnowflakeOutput component, a
    tSnowflakeInput component, a
    tLogRow component, and a
    tSnowflakeClose component to the Job.
  3. Connect the tFixedFlowInput component to the tSnowflakeOutput component using a Row > Main connection.
  4. Do the same to connect the tSnowflakeInput component to
    the tLogRow component.
  5. Connect the tSnowflakeConnection component to the tSnowflakeRow component using a Trigger > On Subjob Ok connection.
  6. Do the same to connect the tSnowflakeRow
    component to the tFixedFlowInput
    component, the tFixedFlowInput component to the
    tSnowflakeInput component, and the
    tSnowflakeInput component to the
    tSnowflakeClose component.

Connecting to Snowflake

Configure the tSnowflakeConnection component to connect to
Snowflake.

  1. Double-click the tSnowflakeConnection component to open its Basic settings view.
  2. In the Account field, enter
    the account name assigned by Snowflake.
  3. In the Snowflake Region field, select the region where the
    Snowflake database is located.
  4. In the User Id and the
    Password fields, enter the authentication
    information accordingly.

    Note that this user ID is your user login name. If you do not know your user login
    name yet, ask the administrator of your Snowflake system for details.

  5. In the Warehouse field,
    enter the name of the data warehouse to be used in Snowflake.
  6. In the Schema field, enter
    the name of the database schema to be used.
  7. In the Database field, enter
    the name of the database to be used.

Creating a Snowflake table

Configure the tSnowflakeRow component to create a new
table.

  1. Double-click the tSnowflakeRow component to open its
    Basic settings view on the
    Component tab.
  2. From the Connection Component drop-down list, select the
    tSnowflakeConnection component to reuse the
    connection created by it. In this example, it is
    tSnowflakeConnection_1.
  3. In the Query field, enter the SQL statement used to
    create a new table into which the data will be written.

    In this example, the following SQL statement is used to create or replace a
    table EMPLOYEE with four columns,
    ID of INT type, NAME of
    VARCHAR type, ONBOARD of DATE type, and
    SALARY of VARIANT type.
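
    CREATE OR REPLACE TABLE EMPLOYEE (
      ID INT,
      NAME VARCHAR,
      ONBOARD DATE,
      SALARY VARIANT
    );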

Writing data into Snowflake

Configure the tFixedFlowInput component and the
tSnowflakeOutput component to write data into
Snowflake.

  1. Double-click the tFixedFlowInput component to open its Basic settings view.

    (Image: Basic settings view of the tFixedFlowInput component)

  2. Click the […] button next
    to Edit schema and in the pop-up schema dialog
    box, define the schema by adding four columns, ID of int type,
    NAME and SALARY of String type, and
    ONBOARD of Date type.

    (Image: schema defined for the tFixedFlowInput component)

  3. Click OK to validate these
    changes and accept the propagation prompted by the pop-up dialog box.
  4. In the Number of rows field, enter the number of records to
    be generated, 5 in this example.
  5. In the Mode area, select
    Use Single Table and specify the value for each
    column.

    • ID: the automatically incremented number generated by
      the routine Numeric.sequence("id",1,1).

    • NAME: the random first name generated by the routine
      TalendDataGenerator.getFirstName().

    • ONBOARD: the random date generated by the routine
      TalendDate.getRandomDate("2007-09-01","2017-09-01").

    • SALARY: the semi-structured JSON data with the format
      {"Salary": value}, where the salary value is
      generated by the routine
      Numeric.random(100000,200000).

  6. Double-click the tSnowflakeOutput component to open its Basic settings view.
  7. From the Connection
    Component
    drop-down list, select the tSnowflakeConnection component to reuse the connection created by
    it.
  8. Click the […] button next
    to the Table field and in the pop-up dialog
    box, select the Use custom object check box and enter the name
    of the table into which the data will be written in the Object
    Name
    field. In this example, it is EMPLOYEE,
    the table created by the tSnowflakeRow component. When done,
    click OK to close the dialog box.
  9. From the Output action
    drop-down list, select Insert to write data to
    the table EMPLOYEE.
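
With these settings, each generated row is inserted into the EMPLOYEE table. A rough,
hand-written Snowflake SQL equivalent for a single illustrative row could look like the
following (the values are made up; PARSE_JSON turns the JSON string into a VARIANT
value):

  INSERT INTO EMPLOYEE (ID, NAME, ONBOARD, SALARY)
  SELECT 1, 'Andrew', TO_DATE('2012-03-15'), PARSE_JSON('{"Salary": 152000}');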

Reading data from Snowflake

Configure the tSnowflakeInput component and the
tLogRow component to retrieve data from Snowflake and output
the data on the console.

  1. Double-click the tSnowflakeInput component to open its Basic settings view.
  2. From the Connection Component drop-down list, select the
    tSnowflakeConnection component to reuse the connection created by it.
  3. Click the […] button next to the
    Table field and in the pop-up dialog box, select the
    Use custom object check box and enter the name of the
    table from which the data will be retrieved in the Object
    Name
    field. In this example, it is
    EMPLOYEE, the table created by the
    tSnowflakeRow component. When done, click
    OK to close the dialog box.
  4. Click the […] button
    next to Edit schema and in the pop-up
    schema dialog box, define the schema by adding four columns,
    ID of int type, NAME and
    SALARY of String type, and
    ONBOARD of Date type. This schema is the same as the
    schema of the tSnowflakeOutput component.
  5. Click OK to validate
    these changes and accept the propagation prompted by the pop-up dialog
    box.
  6. Double-click the tLogRow
    component to open its Component
    view.
  7. In the Mode area, select
    Vertical (each row is a key/value
    list)
    for better readability of the results.
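
Configured this way, tSnowflakeInput retrieves every row of the EMPLOYEE table; the
query it runs is roughly equivalent to the following:

  SELECT ID, NAME, ONBOARD, SALARY FROM EMPLOYEE;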

Closing the Snowflake connection

Configure the tSnowflakeClose component to close the
connection to Snowflake.

  1. Double-click the tSnowflakeClose
    component to open the Component
    tab.
  2. From the Connection Component drop-down list, select the
    component that opens the connection you need to close,
    tSnowflakeConnection_1 in this example.

Executing the Job to write and read data in Snowflake

After setting up the Job and configuring the components used in the Job for writing
and reading Snowflake data, you can then execute the Job and verify the Job execution
result.

  1. Press Ctrl + S to save the
    Job.
  2. Press F6 to run the Job.

    As shown below, the data is written into the table EMPLOYEE in
    Snowflake, then read from this table and output to the console in the
    Run view of the Studio.

    (Image: Job execution result in the Run view)

tSnowflakeInput properties for Apache Spark Batch (technical preview)

These properties are used to configure tSnowflakeInput running in the
Spark Batch Job framework.

The Spark Batch
tSnowflakeInput component belongs to the Databases family.

The component in this framework is available in all subscription-based Talend products with Big Data
and Talend Data Fabric.

Basic settings

Use an existing configuration

Select this check box and in the Component List click the relevant connection component to
reuse the connection details you already defined.

Account

In the Account field, enter, in double quotation marks, the account name
that has been assigned to you by Snowflake.

Region

Select an AWS or Azure region from the drop-down list.

Username and Password

Enter, in double quotation marks, your authentication
information to log in to Snowflake.

  • In the User ID field, enter, in double quotation marks, your login name, which is
    defined in Snowflake using the LOGIN_NAME parameter. For details, ask the
    administrator of your Snowflake system.

  • To enter the password, click the […] button next to the
    password field, and then in the pop-up dialog box enter the password between double quotes
    and click OK to save the settings.

Database

Enter, in double quotation marks, the name of the
Snowflake database to be used. This name is case-sensitive and is normally upper
case in Snowflake.

Warehouse

Enter, in double quotation marks, the name of the
Snowflake warehouse to be used. This name is case-sensitive and is normally upper
case in Snowflake.

Schema and Edit Schema

A schema is a row description. It defines the number of fields
(columns) to be processed and passed on to the next component. When you create a Spark
Job, avoid the reserved word line when naming the
fields.

Built-In: You create and store the schema locally for this component
only.

Repository: You have already created the schema and stored it in the
Repository. You can reuse it in various projects and Job designs.

If the Snowflake data type to
be handled is VARIANT, OBJECT or ARRAY, while defining the schema in the
component, select String for the
corresponding data in the Type
column of the schema editor wizard.

Click Edit
schema
to make changes to the schema. If the current schema is of the Repository type, three options are available:

  • View schema: choose this
    option to view the schema only.

  • Change to built-in property:
    choose this option to change the schema to Built-in for local changes.

  • Update repository connection:
    choose this option to change the schema stored in the repository and decide whether
    to propagate the changes to all the Jobs upon completion. If you just want to
    propagate the changes to the current Job, you can select No upon completion and choose this schema metadata
    again in the Repository Content
    window.

Note that if the input value of any non-nullable primitive
field is null, the row of data including that field will be rejected.

This component offers the advantage of the dynamic schema feature. This allows you to
retrieve unknown columns from source files or to copy batches of columns from a source
without mapping each column individually. For further information about dynamic schemas,
see the Talend Studio User Guide.

The dynamic schema feature is designed for retrieving unknown columns of a table and is
recommended for that purpose only; it is not recommended for creating tables.

Table Name

Enter, within double quotation marks, the name of the Snowflake table to be used.
This name is case-sensitive and is normally upper case in Snowflake.

Read from

Select either Table or Query from the drop-down list.
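
If you select Query, you provide the SQL statement to be run instead of a table name.
For example, a statement like the following could be used (it assumes the EMPLOYEE
table from the Standard example above exists in the selected database and schema):

  "SELECT ID, NAME, ONBOARD, SALARY FROM EMPLOYEE WHERE ID > 2"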

Advanced settings

Allow Snowflake to convert
columns and tables to uppercase

Select this check box to convert lowercase letters in the defined table name and
schema column names to uppercase. Note that unquoted identifiers must match the
Snowflake Identifier Syntax.

If you deselect the check box, all identifiers are
automatically quoted.

This property is not available when you select the
Manual Query check box.

For more information on the Snowflake Identifier Syntax,
see Identifier Syntax.

Use Custom Region

Select this check box to use a customized Snowflake region.

Custom Region

Enter, within double quotation marks, the ID of the region to be used, for example
eu-west-1 or east-us-2.azure.
Trim all the String/Char columns

Select this check box to remove leading and trailing whitespace from all
the String/Char columns.

Trim column

Remove the leading and trailing whitespace from the defined columns.

Usage

Usage rule

This component is used as a start component and requires an output link.

Use a tSnowflakeConfiguration component in the same Job to connect
to Snowflake.

Spark Connection

In the Spark
Configuration
tab in the Run
view, define the connection to a given Spark cluster for the whole Job. In
addition, since the Job expects its dependent jar files for execution, you must
specify the directory in the file system to which these jar files are
transferred so that Spark can access these files:

  • Yarn mode (Yarn client or Yarn cluster):

    • When using Google Dataproc, specify a bucket in the
      Google Storage staging bucket
      field in the Spark configuration
      tab.

    • When using HDInsight, specify the blob to be used for Job
      deployment in the Windows Azure Storage
      configuration
      area in the Spark
      configuration
      tab.

    • When using Altus, specify the S3 bucket or the Azure
      Data Lake Storage for Job deployment in the Spark
      configuration
      tab.
    • When using Qubole, add a
      tS3Configuration to your Job to write
      your actual business data in the S3 system with Qubole. Without
      tS3Configuration, this business data is
      written in the Qubole HDFS system and destroyed once you shut
      down your cluster.
    • When using on-premise
      distributions, use the configuration component corresponding
      to the file system your cluster is using. Typically, this
      system is HDFS and so use tHDFSConfiguration.

  • Standalone mode: use the
    configuration component corresponding to the file system your cluster is
    using, such as tHDFSConfiguration or
    tS3Configuration.

    If you are using Databricks without any configuration component present
    in your Job, your business data is written directly in DBFS (Databricks
    Filesystem).

This connection is effective on a per-Job basis.

