July 31, 2023

tSnowflakeBulkExec – Docs for ESB Snow Flake 7.x

tSnowflakeBulkExec

Loads data from files in a folder into a Snowflake table. The folder
can be in an internal Snowflake stage, an Amazon Simple Storage Service (Amazon S3) bucket,
or an Azure container.

Normally, the tSnowflakeOutputBulk and
tSnowflakeBulkExec components work together in a two-step process:

  1. The tSnowflakeOutputBulk component uploads incoming data to a
    storage.
  2. The tSnowflakeBulkExec component loads the data from a storage into a
    Snowflake database table.

You can transform the data before it is loaded into the database table in this two-step
process. These two steps are fused together in the tSnowflakeOutputBulkExec component, detailed in a separate section.

tSnowflakeBulkExec Standard properties

These properties are used to configure tSnowflakeBulkExec running in the Standard
Job framework.

The Standard
tSnowflakeBulkExec component belongs to the Cloud family.

The component in this framework is available in all subscription-based Talend products.

Note: This component is a specific version of a dynamic database
connector. The properties related to database settings vary depending on your database
type selection. For more information about dynamic database connectors, see Dynamic database components.

Basic settings

Database

Select a type of database from the list and click
Apply.

Property
Type

Select the way
the connection details will be set.

  • Built-In: The connection details will be set
    locally for this component. You need to specify the values for
    all related connection properties manually.

  • Repository: The connection details stored
    centrally in Repository > Metadata will be reused by this component. You need to
    click the […] button
    next to it and in the pop-up Repository Content dialog box, select the
    connection details to be reused, and all related connection
    properties will be automatically filled in.

This property is available when Use this Component is selected from the
Connection Component
drop-down list.

Connection Component

Select the component that opens the database connection to be reused by this
component.

Account

In the Account field, enter, in double quotation marks, the account name
that has been assigned to you by Snowflake.

This field is available only when
Use this Component is selected from the
Connection Component drop-down list.

Snowflake
Region

Select an AWS region or an Azure region from
the Snowflake Region drop-down list.

This field is available only when
Use this Component is selected from the
Connection Component drop-down list.

User Id and Password

Enter, in double quotation marks, your authentication
information to log in Snowflake.

  • In the User ID field, enter, in double quotation
    marks, your login name that has been defined in Snowflake using the LOGIN_NAME parameter of Snowflake.
    For details, ask the administrator of your Snowflake system.

  • To enter the password, click the […] button next to the
    password field, and then in the pop-up dialog box enter the password between double quotes
    and click OK to save the settings.

This field is available only when
Use this Component is selected from the
Connection Component drop-down list.

Warehouse

Enter, in double quotation marks, the name of the
Snowflake warehouse to be used. This name is case-sensitive and is normally upper
case in Snowflake.

This field is available only when
Use this Component is selected from the
Connection Component drop-down list.

Schema

Enter, within double quotation marks, the name of the
database schema to be used. This name is case-sensitive and is normally upper case
in Snowflake.

This field is available only when
Use this Component is selected from the
Connection Component drop-down list.

Database

Enter, in double quotation marks, the name of the
Snowflake database to be used. This name is case-sensitive and is normally upper
case in Snowflake.

This field is available only when
Use this Component is selected from the
Connection Component drop-down list.

Table

Click the […] button and in the displayed wizard, select the Snowflake
table to be used.

To load the data into a new table, select
Use custom object in the wizard and
enter the name of the new table in Object
Name
field.

Schema and Edit Schema

A schema is a row description. It defines the number of fields
(columns) to be processed and passed on to the next component. When you create a Spark
Job, avoid the reserved word line when naming the
fields.

Built-In: You create and store the schema locally for this component
only.

Repository: You have already created the schema and stored it in the
Repository. You can reuse it in various projects and Job designs.

If the Snowflake data type to
be handled is VARIANT, OBJECT or ARRAY, while defining the schema in the
component, select String for the
corresponding data in the Type
column of the schema editor wizard.

Click Edit
schema
to make changes to the schema. If the current schema is of the Repository type, three options are available:

  • View schema: choose this
    option to view the schema only.

  • Change to built-in property:
    choose this option to change the schema to Built-in for local changes.

  • Update repository connection:
    choose this option to change the schema stored in the repository and decide whether
    to propagate the changes to all the Jobs upon completion. If you just want to
    propagate the changes to the current Job, you can select No upon completion and choose this schema metadata
    again in the Repository Content
    window.

Note that if the input value of any non-nullable primitive
field is null, the row of data including that field will be rejected.

This
component offers the advantage of the dynamic schema feature. This allows you to
retrieve unknown columns from source files or to copy batches of columns from a source
without mapping each column individually. For further information about dynamic schemas,
see
Talend Studio

User Guide.

This
dynamic schema feature is designed for the purpose of retrieving unknown columns of a
table and is recommended to be used for this purpose only; it is not recommended for the
use of creating tables.

Table
Action

Select the action to be carried out to the table.

  • NONE: Leave
    the table as is.
  • DROP_CREATE: Remove the table and create it again.
  • CREATE:
    Create a new table.
  • CREATE_IF_NOT_EXISTS: Create the table if it does
    not exist.
  • DROP_IF_EXISTS_AND_CREATE: Remove the table if it
    already exists and create again.
  • CLEAR:
    Remove all the data records in the table.
  • TRUNCATE:
    Remove all the rows in the table. This action releases the space
    occupied by the table.
Output
Action

Select the operation you want to perform to the incoming
data and data records in the Snowflake database table. You can insert,
delete, update or merge data in the Snowflake table. This option assumes
that the Snowflake table specified in Table field already exists.

  • INSERT:
    Insert new records in the Snowflake table.
  • UPDATE:
    Update existing records in the Snowflake table.
  • UPSERT:
    Create new records and update existing records. In the Upsert Key Column field displayed,
    you need to specify the key column to be used as the join key for
    the upsert operation.
  • DELETE:
    Remove records from the Snowflake table.
Storage Select the type of storage from which the data will be
loaded to the table.

  • Internal: Load data from files stored in an
    internal Snowflake storage folder. You need also to specify the
    folder within double quotation marks in Stage Folder.
  • S3:
    Load data from files stored in a folder under an Amazon S3
    bucket. You need also to provide information about your S3 user
    account, including Region, Access
    Key
    (in double quotation marks), Secret Key, Bucket (in double quotation
    marks), and Folder (in
    double quotation marks).
  • Azure:
    Load data from files stored in an Azure folder. You need also to
    provide information about your Azure user account, including
    Protocol, Account Name (in double
    quotation marks), Container (in double quotation marks),
    Folder (in double
    quotation marks), and SAS
    Token
    .
Stage
Folder
Specify the Snowflake stage folder to load data from.

This field is available when you
select Internal from the Storage drop-down list in the Basic settings view.

Region Specify the region where the S3 bucket locates.

This field is available when you select
S3 from the Storage drop-down list in the Basic
settings
view.

Access Key and
Secret Key
Enter the authentication information required to connect to
the Amazon S3 bucket to be used.

To enter the password, click
the […] button next to the password
field, and then in the pop-up dialog box enter the password between double
quotes and click OK to save the
settings.

This field is available when you select
S3 from the Storage drop-down list in the Basic
settings
view.

Bucket Enter the name of the bucket to be used to load data. This
bucket must already exist.

This field is available when you select
S3 from the Storage drop-down list in the Basic
settings
view.

Folder Enter the folder (in double quotation marks) from which you
want to load data.

This field is available when S3 or Azure is selected
from the Storage drop-down
list.

Protocol Select the protocol used to create Azure connection.

This field is available when you select
Azure from the Storage drop-down list in the Basic
settings
view.

Account
Name
Enter the name (in double quotation marks) of the Azure
storage account you need to access.

This field is available when you select
Azure from the Storage drop-down list in the Basic
settings
view.

Container Specify the Azure container (in double quotation marks) used for storing and managing
data.

This field is available when you select
Azure from the Storage drop-down list in the Basic
settings
view.

SAS Token Specify the SAS token to grant limited access to objects in
your storage account.

To enter the SAS token, click the
[…] button next to the SAS token
field, and then in the pop-up dialog box enter the password between double
quotes and click OK to save the
settings.

This field is available when you select
Azure from the Storage drop-down list in the Basic
settings
view.

Advanced settings

Additional JDBC
Parameters

Specify additional connection properties for the database connection you are
creating. The properties are separated by semicolon and each property is a key-value
pair, for example, encryption=1;clientname=Talend.

This field is available only when you
select Use this Component from the Connection Component drop-down list and select
Internal from the Storage drop-down list in the Basic settings view.

Use Custom Snowflake
Region
Select this check box to specify a custom
Snowflake region. This option is available only when you select Use This Component from the Connection Component drop-down list in the
Basic settings view.

  • Region ID: enter a
    region ID in double quotation marks, for example eu-west-1 or east-us-2.azure.

For more information on Snowflake Region
ID, see Supported Regions.

Login Timeout

Specify the timeout period (in minutes)
of Snowflake login attempts. An error will be generated if no response is received
in this period.

Role

Enter, in double quotation marks, the default access
control role to use to initiate the Snowflake session.

This role must already exist and has been granted to the
user ID you are using to connect to Snowflake. If this field is left empty, the
PUBLIC role is automatically granted. For information about Snowflake access control
model, see Understanding the Access Control
Model
.

Allow Snowflake to convert
columns and tables to uppercase

Select this check box to convert lowercase in the defined
table name and schema column names to uppercase. Note that unquoted
identifiers should match the Snowflake Identifier Syntax.

If you deselect the check box, all identifiers are
automatically quoted.

This property is not available when you select the
Manual Query check box.

For more information on the Snowflake Identifier Syntax,
see Identifier Syntax.

Temporary Table
Schema
Specifies a schema for the temporary table.

The schema must exist.

Custom DB
Type
Select this check box to specify the DB type for each
column in the schema.

This property is available only when you
select an action with Create Table from the Table
Action
drop down list in the Basic
settings
.

Delete Storage Files On
Success
Delete all the files in the storage folder once the
data is loaded to the table successfully.

This field is
not available when you select Use Custom
Storage Location
.

S3 Max Error
Retry

Specify the maximum data loading
retries when an error occurs during loading data to or from the S3
folder. This parameter defaults to 3. A value of -1 specifies the maximum possible
retries. Only -1 or positive
integers are accepted.

This field is available when you select
S3 from the Storage drop-down list in the Basic
settings
view.

Azure Max Error
Retry

Specify the maximum data loading
retries when an error occurs during loading data to or from the Azure
folder. This parameter defaults to 3. A value of -1 specifies the maximum possible
retries. Only -1 or positive
integers are accepted.

This field is available when you select
Azure from the Storage drop-down list in the Basic
settings
view.

Use Custom S3 Connection
Configuration
Select this check box if you wish to use your custom
S3 configuration.

Option: select the parameter from the list.

Value: enter
the parameter value.

This field is available when you select
S3 from the Storage drop-down list in the Basic
settings
view.

Use Custom Stage Prefix

Select this check box to specify
the path to the folder (with the current stage as the root) from which
the data is loaded. You need also to enter the path to the folder in the
field provided. For example, to load data stored in the files that are
located in myfolder1/myfolder2 under the stage, you need to type
"@~/myfolder1/myfolder2" in
the field.

This field is available when you
select Internal from the Storage drop-down list in the Basic settings view.

Once selected, the Stage
Folder
in Basic
settings
view becomes unavailable.

Use Custom Storage
Location
Select this check box to connect to a custom external
storage, for example, S3.
Copy Command Options Set parameters for the COPY INTO command by selecting
the following options from the drop-down list. The COPY INTO command is
provided by Snowflake. It loads data to a Snowflake database table.

  • Default: Carry out the COPY INTO operation using
    the default settings, as listed in the frame to the right.
  • Table:
    Set the COPY INTO operation parameters using the Options table. To set a
    parameter, click the plus button, select the parameter from the
    Option column, and
    set the parameter value in the Value column.
  • Manual:
    Set the COPY INTO operation parameters in the text frame to the
    right manually.

For information about the parameters of the COPY INTO command, see
the COPY INTO command.

tStatCatcher Statistics

Select this check box to gather the Job processing metadata at the Job level
as well as at each component level.

Global Variables

NB_LINE

The number of rows processed. This is an After variable and it returns an integer.

NB_SUCCESS

The number of rows successfully processed. This is an After variable and it returns an
integer.

NB_REJECT

The number of rows rejected. This is an After variable and it returns an integer.

ERROR_MESSAGE

The error message generated by the component when an error occurs. This is an After
variable and it returns a string.

Usage

Usage rule

This component is an end component of a data flow in your Job.
It receives data from other components through the Row > Main link.

Loading data in a Snowflake table using custom stage path

This scenario describes a Job that loads data into a Snowflake table via the internal
Snowflake stage configured using the Custom Stage Path property.

The input data contains several records.

Then, the input data is inserted into a Snowflake table and the path to the internal
Snowflake stage is customized in the Advanced settings of
tSnowflakeOutputBulk and
tSnowflakeBulkExec.

Finally, the input data is outputted from the tSnowflake table to the console.

Creating a Job for loading data using custom stage path

Create a Job to open a connection to access a Snowflake database, then create a
Snowflake table and several records and insert these records into the table via the
internal Snowflake stage configured using the Custom Stage Path property, and finally get
and display the records on the console.

tSnowflakeBulkExec_1.png

  1. Create a new Job and add a tSnowflakeConnection component, a
    tFixedFlowInput component, a
    tSnowflakeOutputBulk component, a
    tSnowflakeBulkExec component, a
    tSnowflakeInput component, a
    tSnowflakeClose component, and two
    tLogRow components by typing their names in the design
    workspace or dropping them from the Palette.
  2. Link the tFixedFlowInput component to the first
    tLogRow component using a Row >
    Main connection.
  3. Do the same to link the first tLogRow component to the
    tSnowflakeBulkExec component and link the
    tSnowflakeInput component to the second
    tLogRow component.
  4. Link the tSnowflakeConnection component to the
    tFixedFlowInput component using a
    Trigger > OnSubjobOk
    connection.
  5. Do the same to link the tFixedFlowInput component to the
    tSnowflakeInput component and the
    tSnowflakeInput component to the
    tSnowflakeClose component.
  6. Link the tSnowflakeOutputBulk component to the
    tSnowflakeBulkExec component using a
    Trigger > OnComponentOk
    connection.

Opening a connection to access a snowflake database

Configure the tSnowflakeConnection component to open a
connection to access a Snowflake database.

  1. Double-click the tSnowflakeConnection component to open its
    Basic settings view.

    tSnowflakeBulkExec_2.png

  2. In the Account, User Id,
    Password , Warehouse,
    Schema and Database fields, specify
    the authentication information required to access the Snowflake Database.
  3. Select an AWS region available for the Snowflake database from the
    Snowflake Region drop-down list. In this example, it is
    AWS US West.

Loading data to the Snowflake table

Configure the tSnowflakeOutputBulk and
tSnowflakeBulkExec components to load data to the Snowflake
table.

  1. Double-click the tFixedFlowInput component to open its
    Basic settings view.

    tSnowflakeBulkExec_3.png

  2. Click the tSnowflakeBulkExec_4.png button next to Edit schema to
    define the schema. In this example, the schema has only one column: Name.
  3. In the Mode area, select Use Inline
    Table
    . Then add three records. In this example, they are:

  4. Double-click the first tLogRow component to open its
    Basic settings view.
  5. In the Mode area, select Table to
    display data.
  6. Double-click the tSnowflakeOutputBulk component to open
    its Basic settings view.

    tSnowflakeBulkExec_5.png

  7. Specify the connection details required to access Snowflake. In this example,
    from the Connection Component drop-down list displayed,
    select the connection component to reuse its connection details you have already
    defined.
  8. If needed, click the Sync columns button to make sure
    the schema is properly retrieved from the preceding component.

    tSnowflakeBulkExec_6.png

  9. Select Internal from the Storage
    drop-down list and enter demo within double quotes in the
    Stage Folder field.
  10. Double-click the tSnowflakeBulkExec component to open
    its Basic settings view.

    tSnowflakeBulkExec_7.png

  11. Specify the connection details required to access Snowflake. In this example,
    from the Connection Component drop-down list displayed,
    select the connection component to reuse its connection details you have already
    defined.
  12. Click the tSnowflakeBulkExec_4.png button next to the Table field
    to select the table you want to load data. In this example, it is
    DEMO_NAMES.
  13. If needed, click the Sync columns button to make sure
    the schema is properly retrieved from the preceding component.

    tSnowflakeBulkExec_9.png

  14. Select INSERT from the Output
    Action
    drop-down list to insert data to the table
    DEMO_NAMES.
  15. Select Internal from the Storage
    drop-down list and enter demo in the Stage
    Folder
    field.

Using custom stage path to load data

Configure the Advanced settings of the
tSnowflakeOutputBulk and
tSnowflakeBulkExec components to use the Custom Stage Path
property to load data.

  1. Click Advanced settings of the
    tSnowflakeOutputBulk component to open its
    Advanced settings view.

    tSnowflakeBulkExec_10.png

  2. Select the Use Custom Stage Path check box and enter
    @~/demo within double quotes. As a result, the
    Stage Folder you specified in the Basic
    settings
    is disabled.
  3. Select Default from the Put Command
    Options
    drop-down list.
  4. Set Put Command Error Retry to
    3.
  5. Clear the Stop on non-empty Storage Folder check
    box.
  6. Click Advanced settings of the
    tSnowflakeBulkExec component to open its
    Advanced settings view.

    tSnowflakeBulkExec_11.png

  7. Select the Convert columns and table to uppercase check
    box to convert the columns to uppercase.
  8. Select Use Custom Stage Path and enter
    @~/demo within double quotes. As a result, the
    Stage Folder you specified in the Basic
    settings
    is disabled.
  9. Clear the Use Custom Storage Location check box.
  10. Select Manual from the Copy Command
    Options
    drop-down list and enter ON_ERROR='continue'
    FILE_FORMAT=(type=csv field_delimiter=',' compression=gzip
    field_optionally_enclosed_by='"')
    within double quotes.

Retrieving data from the Snowflake table

Configure the tSnowflakeInput component to retrieve data from
the Snowflake table, and then configure the second tLogRow
component to get and display data on the console.

  1. Double-click the tSnowflakeInput component to open its
    Basic settings view.

    tSnowflakeBulkExec_12.png

  2. Specify the connection details required to access Snowflake. In this example,
    from the Connection Component drop-down list displayed,
    select the connection component to reuse its connection details you have already
    defined.
  3. Click the tSnowflakeBulkExec_4.png button next to the Table field
    to select the table you want to retrieve data. In this example, it is
    DEMO_NAMES.
  4. Click the tSnowflakeBulkExec_4.png button next to Edit schema to
    open the schema dialog box and define the schema. In this example, the schema
    has only one column: Name.

    tSnowflakeBulkExec_15.png

  5. Click Advanced settings to open the Advanced
    settings
    view.

    tSnowflakeBulkExec_16.png

  6. Select the Convert columns and table to uppercase check
    box to convert the columns to uppercase.
  7. Double-click the second tLogRow component to open its
    Basic settings view.
  8. In the Mode area, select Table to
    display data.

Executing the Job

Press Ctrl + S to save the Job and
then F6 to execute the Job.

tSnowflakeBulkExec_17.png

As shown above, the Job is executed successfully and the records are displayed on
the console.

Related scenarios


Document get from Talend https://help.talend.com
Thank you for watching.
Subscribe
Notify of
guest
0 Comments
Inline Feedbacks
View all comments
0
Would love your thoughts, please comment.x
()
x