July 31, 2023

tSnowflakeOutputBulk – Docs for ESB Snow Flake 7.x

tSnowflakeOutputBulk

Writes incoming data to files generated in a folder. The folder
can be in an internal Snowflake stage, an Amazon Simple Storage Service (Amazon S3) bucket,
or an Azure container.

Normally, the tSnowflakeOutputBulk and
tSnowflakeBulkExec components work together in a two-step process:

  1. The tSnowflakeOutputBulk component uploads incoming data to
    storage.
  2. The tSnowflakeBulkExec component loads the data from storage into a
    Snowflake database table.

You can transform the data before it is loaded into the database table in this two-step
process. These two steps are fused together in the tSnowflakeOutputBulkExec component, detailed in a separate section.
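
The two-step process above can be sketched roughly as follows for the internal stage case. This is an illustration only, not the components' actual implementation: it merely builds the text of a Snowflake PUT statement (upload) and a COPY INTO statement (load) without connecting to any database. The local file path, the stage folder, the table name, and the assumption that the load step amounts to a COPY INTO statement are all hypothetical.

```java
// Sketch only: builds SQL text for the two steps; it does not connect to Snowflake.
// The file path, stage folder, and table name used in main are hypothetical.
final class TwoStepLoadSketch {

    // Step 1 (upload): a PUT statement copies a local file to a stage location.
    static String putStatement(String localFile, String stagePath) {
        return "PUT file://" + localFile + " " + stagePath;
    }

    // Step 2 (load): a COPY INTO statement loads staged files into a table.
    static String copyStatement(String table, String stagePath) {
        return "COPY INTO " + table + " FROM " + stagePath;
    }

    public static void main(String[] args) {
        String stage = "@~/myfolder1/myfolder2"; // user stage folder, as in the Use Custom Stage Path example
        System.out.println(putStatement("/tmp/out/part_0.csv", stage));
        System.out.println(copyStatement("MY_TABLE", stage));
    }
}
```

Any transformation of the data would take place between the two statements, which is why the process is split into two components.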

tSnowflakeOutputBulk Standard properties

These properties are used to configure tSnowflakeOutputBulk running in the Standard Job framework.

The Standard
tSnowflakeOutputBulk component belongs to the Cloud family.

The component in this framework is available in all subscription-based Talend products.

Note: This component is a specific version of a dynamic database
connector. The properties related to database settings vary depending on your database
type selection. For more information about dynamic database connectors, see Dynamic database components.

Basic settings

Database

Select a type of database from the list and click
Apply.

Property Type

Select the way the connection details
will be set.

  • Built-In: The connection details will be set
    locally for this component. You need to specify the values for all
    related connection properties manually.

  • Repository: The connection details stored
    centrally in Repository > Metadata will be reused by this component. You need to click
    the […] button next to it and in the pop-up
    Repository Content dialog box, select the
    connection details to be reused, and all related connection
    properties will be automatically filled in.

This property is not available when another connection component is selected
from the Connection Component drop-down list.

Connection Component

Select the component that opens the database connection to be reused by this
component.

Account

In the Account field, enter, in double quotation marks, the account name
that has been assigned to you by Snowflake.

This field is available only when you
select Use this Component from the Connection Component drop-down list and select
Internal from the Storage drop-down list in the Basic settings view.

Snowflake Region

Select an AWS region or an Azure region from
the Snowflake Region drop-down list.

This field is available only when you
select Use this Component from the Connection Component drop-down list and select
Internal from the Storage drop-down list in the Basic settings view.

User Id and Password

Enter, in double quotation marks, your authentication
information to log in to Snowflake.

  • In the User ID field, enter, in double quotation
    marks, the login name that has been defined for you in Snowflake using the LOGIN_NAME parameter.
    For details, ask the administrator of your Snowflake system.

  • To enter the password, click the […] button next to the
    password field, and then in the pop-up dialog box enter the password between double quotes
    and click OK to save the settings.

This field is available only when you
select Use this Component from the Connection Component drop-down list and select
Internal from the Storage drop-down list in the Basic settings view.

Warehouse

Enter, in double quotation marks, the name of the
Snowflake warehouse to be used. This name is case-sensitive and is normally upper
case in Snowflake.

This field is available only when you
select Use this Component from the Connection Component drop-down list and select
Internal from the Storage drop-down list in the Basic settings view.

Schema

Enter, within double quotation marks, the name of the
database schema to be used. This name is case-sensitive and is normally upper case
in Snowflake.

This field is available only when you
select Use this Component from the Connection Component drop-down list and select
Internal from the Storage drop-down list in the Basic settings view.

Database

Enter, in double quotation marks, the name of the
Snowflake database to be used. This name is case-sensitive and is normally upper
case in Snowflake.

This field is available only when you
select Use this Component from the Connection Component drop-down list and select
Internal from the Storage drop-down list in the Basic settings view.

Schema and Edit Schema

A schema is a row description. It defines the number of fields
(columns) to be processed and passed on to the next component. When you create a Spark
Job, avoid the reserved word line when naming the
fields.

Built-In: You create and store the schema locally for this component
only.

Repository: You have already created the schema and stored it in the
Repository. You can reuse it in various projects and Job designs.

If the Snowflake data type to
be handled is VARIANT, OBJECT or ARRAY, while defining the schema in the
component, select String for the
corresponding data in the Type
column of the schema editor wizard.

Click Edit schema to make changes to the schema. If the current schema is of the Repository type, three options are available:

  • View schema: choose this
    option to view the schema only.

  • Change to built-in property:
    choose this option to change the schema to Built-in for local changes.

  • Update repository connection:
    choose this option to change the schema stored in the repository and decide whether
    to propagate the changes to all the Jobs upon completion. If you just want to
    propagate the changes to the current Job, you can select No upon completion and choose this schema metadata
    again in the Repository Content
    window.

Note that if the input value of any
non-nullable primitive field is null, the row of data including that field will
be rejected.

This component offers the advantage of the dynamic schema feature. This allows you to
retrieve unknown columns from source files or to copy batches of columns from a source
without mapping each column individually. For further information about dynamic schemas,
see the Talend Studio User Guide.

This dynamic schema feature is designed for retrieving unknown columns of a
table and is recommended for this purpose only; it is not recommended for
creating tables.

Storage

Select the type of storage into which data will be
uploaded.

  • Internal: Store
    the data in a folder in the internal Snowflake storage. You also need to
    specify the folder within double quotation marks in Stage Folder.
  • S3: Store the
    data in an Amazon S3 folder. You also need to provide information about
    your S3 user account, including Region, Access Key
    (within double quotation marks), Secret Key, Bucket (within double quotation marks), and Folder (within double quotation
    marks).
  • Azure: Store the
    data in an Azure folder. You also need to provide information about your
    Azure user account, including Protocol, Account Name
    (within double quotation marks), Container (within double quotation
    marks), Folder (within double
    quotation marks), and SAS Token.

Stage Folder

Specify the Snowflake stage folder to store the data.

This field is available when you
select Internal from the Storage drop-down list in the Basic settings view.

Region

Specify the region where the S3 bucket is located.

This field is available when you select
S3 from the Storage drop-down list in the Basic settings view.

Access Key and Secret Key

Enter the authentication information required to connect to
the Amazon S3 bucket to be used.

To enter the secret key, click
the […] button next to the secret key
field, and then in the pop-up dialog box enter the secret key between double
quotes and click OK to save the
settings.

This field is available when you select
S3 from the Storage drop-down list in the Basic settings view.

Bucket

Enter the name of the bucket (in double quotation marks) to
be used for storing data. This bucket must already exist.

This field is available when you select
S3 from the Storage drop-down list in the Basic settings view.

Folder

Enter the name of the folder (in double quotation marks) in
which you want to store data. This folder will be created if it does not exist at
runtime.

This property is available only when S3 or Azure is selected from the Storage drop-down list.

Server-Side Encryption

Select this check box to encrypt the files to be uploaded
to the S3 bucket on the server side. This check box is selected by default.

This field is available when you select
S3 from the Storage drop-down list in the Basic settings view.

Protocol

Select the protocol used to create an Azure connection.

This field is available when you select
Azure from the Storage drop-down list in the Basic settings view.

Account Name

Enter the Azure storage account name (in double quotation
marks).

This field is available when you select
Azure from the Storage drop-down list in the Basic settings view.

Container

Enter the name (in double quotation marks) of the Azure
container used for storing data.

This field is available when you select
Azure from the Storage drop-down list in the Basic settings view.

SAS Token

Specify the SAS token to grant limited access to objects in
your storage account.

To enter the SAS token, click the
[…] button next to the SAS token
field, and then in the pop-up dialog box enter the token between double
quotes and click OK to save the
settings.

This field is available when you select
Azure from the Storage drop-down list in the Basic settings view.

Advanced settings

Additional JDBC
Parameters

Specify additional connection properties for the database connection you are
creating. The properties are separated by semicolon and each property is a key-value
pair, for example, encryption=1;clientname=Talend.

This field is available only when you
select Use this Component from the Connection Component drop-down list and select
Internal from the Storage drop-down list in the Basic settings view.
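
To make the expected format of the Additional JDBC Parameters value concrete, the following minimal sketch splits a semicolon-separated string into key-value pairs. It only illustrates the format described above; the class and method names are hypothetical, and this is not how the component or the Snowflake JDBC driver necessarily parses the value.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch only: illustrates the "key=value;key=value" format of the field.
final class JdbcParamsSketch {

    // Splits the string on semicolons and each pair on its first '=' sign.
    static Map<String, String> parse(String params) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String pair : params.split(";")) {
            String trimmed = pair.trim();
            if (trimmed.isEmpty()) {
                continue; // tolerate empty input or a trailing semicolon
            }
            int eq = trimmed.indexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException("Not a key=value pair: " + trimmed);
            }
            result.put(trimmed.substring(0, eq).trim(), trimmed.substring(eq + 1).trim());
        }
        return result;
    }

    public static void main(String[] args) {
        // prints {encryption=1, clientname=Talend}
        System.out.println(parse("encryption=1;clientname=Talend"));
    }
}
```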

Use Custom Snowflake Region

Select this check box to specify a custom
Snowflake region. This option is available only when you select Use This Component from the Connection Component drop-down list in the
Basic settings view.

  • Region ID: enter a
    region ID in double quotation marks, for example eu-west-1 or east-us-2.azure.

For more information on Snowflake Region
ID, see Supported Regions.

Login Timeout

Specify the timeout period (in minutes)
of Snowflake login attempts. An error will be generated if no response is received
in this period.

Role

Enter, in double quotation marks, the default access
control role to use to initiate the Snowflake session.

This role must already exist and must have been granted to the
user ID you are using to connect to Snowflake. If this field is left empty, the
PUBLIC role is automatically granted. For information about the Snowflake access control
model, see Understanding the Access Control Model.

Use Custom Stage Path

Select this check box to upload the data to the files
generated in a folder under the stage. You also need to enter the path to
the folder in the field provided. For example, to upload data to the files
generated in myfolder1/myfolder2 under the
stage, type "@~/myfolder1/myfolder2" in the field.

This field is available when you
select Internal from the Storage drop-down list in the Basic settings view.

Once selected, the Stage Folder
in Basic settings view becomes unavailable.

Put Command Options

Set parameters for the PUT command by selecting one of the
following options from the drop-down list. The PUT command is provided by
Snowflake. It uploads data to a Snowflake stage folder.

  • Default: Carry out the PUT operation using the
    default settings, as listed in the frame to the right.
  • Table:
    Set the PUT operation parameters using the Options table. To set a
    parameter, click the plus button, select the parameter from the
    Option column, and
    set the parameter value in the Value column.
  • Manual:
    Set the PUT operation parameters in the text frame to the right
    manually.

For information about the parameters of the PUT command, see the PUT command.

This field is available when you
select Internal from the Storage drop-down list in the Basic settings view.

Put Command Error Retry

Specify the maximum number of retries
when an error occurs while loading data to the internal
Snowflake storage. This parameter defaults to 3. A value of -1 specifies the maximum possible
number of retries. Only -1 or positive
integers are accepted.

This field is available when you
select Internal from the Storage drop-down list in the Basic settings view.
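
The rule for accepted retry values (a default of 3, -1 for the maximum possible retries, otherwise a positive integer) can be expressed as a small validation routine. This is a sketch under stated assumptions: the class and method names are hypothetical, and treating -1 as Integer.MAX_VALUE is one possible reading of "maximum possible retries", not documented component behavior.

```java
// Sketch only: encodes the accepted values described for the retry settings.
final class RetrySettingSketch {

    static final int DEFAULT_RETRIES = 3; // documented default

    // Only -1 or positive integers are accepted; 0 and other negatives are not.
    static boolean isValid(int value) {
        return value == -1 || value > 0;
    }

    // Resolves the configured value to an effective retry limit.
    // Assumption: -1 ("maximum possible retries") is modeled as Integer.MAX_VALUE.
    static int effectiveRetries(Integer configured) {
        if (configured == null) {
            return DEFAULT_RETRIES;
        }
        if (!isValid(configured)) {
            throw new IllegalArgumentException("Retries must be -1 or a positive integer: " + configured);
        }
        return configured == -1 ? Integer.MAX_VALUE : configured;
    }

    public static void main(String[] args) {
        System.out.println(effectiveRetries(null)); // prints 3
        System.out.println(effectiveRetries(5));    // prints 5
        System.out.println(isValid(0));             // prints false
    }
}
```

The same rule applies to the S3 Max Error Retry and Azure Max Error Retry settings described in this section.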

S3 Max Error Retry

Specify the maximum number of retries
when an error occurs while loading data to or from the S3
folder. This parameter defaults to 3. A value of -1 specifies the maximum possible
number of retries. Only -1 or positive
integers are accepted.

This field is available when you select
S3 from the Storage drop-down list in the Basic settings view.

Azure Max Error Retry

Specify the maximum number of retries
when an error occurs while loading data to or from the Azure
folder. This parameter defaults to 3. A value of -1 specifies the maximum possible
number of retries. Only -1 or positive
integers are accepted.

This field is available when you select
Azure from the Storage drop-down list in the Basic settings view.

Use Custom S3 Connection Configuration

Select this check box if you wish to use your custom
S3 configuration.

Option: select the parameter from the list.

Value: enter the parameter value.

This field is available when you select
S3 from the Storage drop-down list in the Basic settings view.

Non-empty Storage Folder Action

Specify the action to be performed when the storage folder
specified for uploading data is not empty.

  • Add New Files:
    continues to process the Job and adds new files to the folder.
  • Cancel Upload:
    stops the operation.
  • Replace Existing Files:
    cleans the storage folder before adding new
    files.

Chunk Size (bytes)

Specify the size for the files generated, which
defaults to 52428800 bytes.

With this option specified,
the incoming data may be stored in multiple files. Because data is written
to files record by record, the actual size of each file generated can be
larger or smaller than the specified value, but the difference is no more than the size of
the last record stored in the file.

This option can significantly
affect performance, so set it carefully. See File Sizing Best Practices and
Limitations for related information.
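
The effect of the chunk size on the generated files can be illustrated with a small simulation. This is a sketch under an assumed roll-over policy (a file is closed as soon as its size reaches the chunk size after a whole record has been written); the component's actual policy is not described here, and the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch only: simulates splitting records into files by a target chunk size.
final class ChunkSizeSketch {

    // Returns the size in bytes of each simulated file.
    // Records are never split, so a file may exceed chunkSize by less than
    // the size of the last record written to it.
    static List<Long> fileSizes(List<String> records, long chunkSize) {
        List<Long> sizes = new ArrayList<>();
        long current = 0;
        for (String record : records) {
            current += record.getBytes(StandardCharsets.UTF_8).length;
            if (current >= chunkSize) { // assumed roll-over rule
                sizes.add(current);
                current = 0;
            }
        }
        if (current > 0) {
            sizes.add(current); // last, possibly smaller, file
        }
        return sizes;
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList("aaaa", "aaaa", "aaaa", "aaaa", "aaaa");
        // Five 4-byte records with a 10-byte chunk size: prints [12, 8]
        System.out.println(fileSizes(records, 10));
    }
}
```

In this run the first file is 2 bytes larger than the chunk size and the second is smaller, which matches the description above.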

Use Custom Local Folder

Specify a local folder as a temporary folder for holding the files
generated. With this option selected, files for storing the incoming data are
first generated in the specified local folder and are then moved to the specified
storage after all the incoming data is uploaded.

Number of file requests threads

Specify the number of threads used for sending Put
requests in parallel when writing the data in the files.

tStatCatcher Statistics

Select this check box to gather the Job processing metadata at the Job level
as well as at each component level.

Global Variables

NB_LINE

The number of rows processed. This is an After variable and it returns an integer.

NB_SUCCESS

The number of rows successfully processed. This is an After variable and it returns an
integer.

NB_REJECT

The number of rows rejected. This is an After variable and it returns an integer.

ERROR_MESSAGE

The error message generated by the component when an error occurs. This is an After
variable and it returns a string.
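
In a Talend Job, After variables such as these are typically read from the Job's globalMap once the component has finished, using a key made of the component's unique name followed by the variable name. The sketch below imitates that lookup with a plain HashMap standing in for globalMap so that it runs on its own; the component name tSnowflakeOutputBulk_1 and the values are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: a HashMap stands in for the globalMap of a Talend Job.
final class GlobalVariablesSketch {

    // Reads the After variables of one component and formats a short summary.
    static String summary(Map<String, Object> globalMap, String componentName) {
        Integer total = (Integer) globalMap.get(componentName + "_NB_LINE");
        Integer ok = (Integer) globalMap.get(componentName + "_NB_SUCCESS");
        Integer rejected = (Integer) globalMap.get(componentName + "_NB_REJECT");
        String error = (String) globalMap.get(componentName + "_ERROR_MESSAGE");
        return "rows=" + total + ", success=" + ok + ", rejected=" + rejected
                + (error == null ? "" : ", error=" + error);
    }

    public static void main(String[] args) {
        Map<String, Object> globalMap = new HashMap<>();
        globalMap.put("tSnowflakeOutputBulk_1_NB_LINE", 10);   // hypothetical values
        globalMap.put("tSnowflakeOutputBulk_1_NB_SUCCESS", 9);
        globalMap.put("tSnowflakeOutputBulk_1_NB_REJECT", 1);
        // prints rows=10, success=9, rejected=1
        System.out.println(summary(globalMap, "tSnowflakeOutputBulk_1"));
    }
}
```

Because these are After variables, reading them before the component has completed would return null.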

Usage

Usage rule

This component is an end component of a data
flow in your Job. It receives data from other components through the Row > Main link.

Related scenarios

For use cases in relation with tSnowflakeOutputBulk, see the
following scenario:


Document retrieved from Talend: https://help.talend.com