
tSnowflakeOutputBulkExec

Writes incoming data to files generated in a folder and then loads
the data into a Snowflake database table. The folder can be in an internal Snowflake stage, an Amazon Simple Storage Service (Amazon S3)
bucket, or an Azure container.

This component incorporates the
operations of the tSnowflakeOutputBulk and the tSnowflakeBulkExec components.

tSnowflakeOutputBulkExec Standard properties

These properties are used to configure tSnowflakeOutputBulkExec running in the Standard Job framework.

The Standard
tSnowflakeOutputBulkExec component belongs to the Cloud family.

The component in this framework is available in all subscription-based Talend products.

Note: This component is a specific version of a dynamic database
connector. The properties related to database settings vary depending on your database
type selection. For more information about dynamic database connectors, see Dynamic database components.

Basic settings

Database

Select a type of database from the list and click
Apply.

Property
Type

Select the way
the connection details will be set.

  • Built-In: The connection details will be set
    locally for this component. You need to specify the values for
    all related connection properties manually.

  • Repository: The connection details stored
    centrally in Repository > Metadata will be reused by this component. You need to
    click the […] button
    next to it and in the pop-up Repository Content dialog box, select the
    connection details to be reused, and all related connection
    properties will be automatically filled in.

This property is available when Use this Component is selected from the
Connection Component
drop-down list.

Connection Component

Select the component that opens the database connection to be reused by this
component.

Account

In the Account field, enter, in double quotation marks, the account name
that has been assigned to you by Snowflake.

This field is available only when
Use this Component is selected from the
Connection Component drop-down list.

Snowflake Region

Select an AWS region or an Azure region from
the Snowflake Region drop-down list.

This field is available when you
select Internal from the Storage drop-down list in the Basic settings view.

User Id and Password

Enter, in double quotation marks, your authentication
information to log in to Snowflake.

  • In the User ID field, enter, in double quotation
    marks, your login name that has been defined in Snowflake using the LOGIN_NAME parameter of Snowflake.
    For details, ask the administrator of your Snowflake system.

  • To enter the password, click the […] button next to the
    password field, and then in the pop-up dialog box enter the password between double quotes
    and click OK to save the settings.

This field is available only when
Use this Component is selected from the
Connection Component drop-down list.

Warehouse

Enter, in double quotation marks, the name of the
Snowflake warehouse to be used. This name is case-sensitive and is normally upper
case in Snowflake.

This field is available only when
Use this Component is selected from the
Connection Component drop-down list.

Schema

Enter, within double quotation marks, the name of the
database schema to be used. This name is case-sensitive and is normally upper case
in Snowflake.

This field is available only when
Use this Component is selected from the
Connection Component drop-down list.

Database

Enter, in double quotation marks, the name of the
Snowflake database to be used. This name is case-sensitive and is normally upper
case in Snowflake.

This field is available only when
Use this Component is selected from the
Connection Component drop-down list.

Table

Click the […] button and in the displayed wizard, select the Snowflake
table to be used.

To load the data into a new table, select
Use custom object in the wizard and
enter the name of the new table in the Object
Name field.

Schema and Edit Schema

A schema is a row description. It defines the number of fields
(columns) to be processed and passed on to the next component. When you create a Spark
Job, avoid the reserved word line when naming the
fields.

Built-In: You create and store the schema locally for this component
only.

Repository: You have already created the schema and stored it in the
Repository. You can reuse it in various projects and Job designs.

If the Snowflake data type to
be handled is VARIANT, OBJECT or ARRAY, while defining the schema in the
component, select String for the
corresponding data in the Type
column of the schema editor wizard.

Click Edit
schema
to make changes to the schema. If the current schema is of the Repository type, three options are available:

  • View schema: choose this
    option to view the schema only.

  • Change to built-in property:
    choose this option to change the schema to Built-in for local changes.

  • Update repository connection:
    choose this option to change the schema stored in the repository and decide whether
    to propagate the changes to all the Jobs upon completion. If you just want to
    propagate the changes to the current Job, you can select No upon completion and choose this schema metadata
    again in the Repository Content
    window.

Note that if the input value of any non-nullable primitive
field is null, the row of data including that field will be rejected.

This
component offers the advantage of the dynamic schema feature. This allows you to
retrieve unknown columns from source files or to copy batches of columns from a source
without mapping each column individually. For further information about dynamic schemas,
see
Talend Studio
User Guide.

This
dynamic schema feature is designed for the purpose of retrieving unknown columns of a
table and is recommended to be used for this purpose only; it is not recommended
for creating tables.

Table
Action

Select the action to be carried out on the table (roughly equivalent Snowflake statements are sketched after this list).

  • NONE: Leave
    the table as is.
  • DROP_CREATE: Remove the table and create it again.
  • CREATE:
    Create a new table.
  • CREATE_IF_NOT_EXISTS: Create the table if it does
    not exist.
  • DROP_IF_EXISTS_AND_CREATE: Remove the table if it
    already exists and create it again.
  • CLEAR:
    Remove all the data records in the table.
  • TRUNCATE:
    Remove all the rows in the table. This action releases the space
    occupied by the table.
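
For reference, a hedged sketch of the Snowflake statements these actions roughly
correspond to (the table MY_TABLE and its column are hypothetical):

    -- DROP_IF_EXISTS_AND_CREATE: drop the table only if it exists, then recreate it.
    DROP TABLE IF EXISTS MY_TABLE;
    CREATE TABLE MY_TABLE (NAME VARCHAR(10));

    -- CLEAR: delete all data records but keep the table definition.
    DELETE FROM MY_TABLE;

    -- TRUNCATE: remove all rows and release the space occupied by the table.
    TRUNCATE TABLE MY_TABLE;
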
Output
Action

Select the operation you want to perform on the incoming
data and the data records in the Snowflake database table. You can insert,
delete, update or merge data in the Snowflake table. This option assumes
that the Snowflake table specified in the Table field already exists.

  • INSERT:
    Insert new records in the Snowflake table.
  • UPDATE:
    Update existing records in the Snowflake table.
  • UPSERT:
    Create new records and update existing records (see the sketch after
    this list). In the Upsert Key Column field displayed,
    you need to specify the key column to be used as the join key for
    the upsert operation.
  • DELETE:
    Remove records from the Snowflake table.
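
As a hedged illustration only (not the component's literal implementation), an
UPSERT keyed on a hypothetical NAME column behaves like a Snowflake MERGE on the
target table:

    -- Hypothetical target MY_TABLE(NAME, AGE); STAGED_ROWS stands for the incoming rows.
    MERGE INTO MY_TABLE t
    USING STAGED_ROWS s
      ON t.NAME = s.NAME                          -- Upsert Key Column used as the join key
    WHEN MATCHED THEN
      UPDATE SET t.AGE = s.AGE                    -- existing key: update the record
    WHEN NOT MATCHED THEN
      INSERT (NAME, AGE) VALUES (s.NAME, s.AGE);  -- new key: insert the record
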
Storage

Select the type of storage to upload the incoming data to
and to load data into the table from.

  • Internal: Store
    the incoming data in a folder in the internal Snowflake storage and then
    load data from the folder. You also need to specify the folder, within
    double quotation marks, in Stage Folder.
  • S3: Store the
    incoming data in an Amazon S3 folder and then load data from the folder.
    You also need to provide information about your S3 user account,
    including Region, Access Key (within double quotation
    marks), Secret Key, Bucket (within double quotation marks),
    and Folder (within double
    quotation marks).
  • Azure: Store the
    incoming data in an Azure folder and then load data from the folder. You
    also need to provide information about your Azure user account, including
    Protocol, Account Name (within double quotation
    marks), Container (within double
    quotation marks), Folder (within
    double quotation marks), and SAS Token.
Stage Folder

Specify the folder under the Snowflake stage to write
incoming data to and to load data from.

This field is available when you
select Internal from the Storage drop-down list in the Basic settings view.

Region

Specify the region where the S3 bucket is located.

This field is available when you select
S3 from the Storage drop-down list in the Basic
settings
view.

Access Key and Secret Key

Enter the authentication information required to connect to
the Amazon S3 bucket to be used.

To enter the password, click
the […] button next to the password
field, and then in the pop-up dialog box enter the password between double
quotes and click OK to save the
settings.

This field is available when you select
S3 from the Storage drop-down list in the Basic
settings
view.

Bucket

Enter the name of the bucket (in double quotation marks) to
be used for storing the incoming data and loading data from. This bucket must
already exist.

This field is available when you select
S3 from the Storage drop-down list in the Basic
settings
view.

Folder

Enter the name of the folder (in double quotation marks) to
be used for storing the incoming data and loading data from. This folder will be
created at runtime if it does not exist.

This property is
available only when S3 or Azure is selected from the Storage drop-down list.

Server-Side Encryption

Select this check box to encrypt the files to be uploaded
to the S3 bucket on the server side. This check box is selected by default.

This field is available when you select
S3 from the Storage drop-down list in the Basic
settings
view.

Protocol

Select the protocol used to create the Azure connection.

This field is available when you select
Azure from the Storage drop-down list in the Basic
settings
view.

Account Name

Enter the Azure storage account name (in double quotation
marks).

This field is available when you select
Azure from the Storage drop-down list in the Basic
settings
view.

Container

Enter the Azure container name (in double quotation marks).

This field is available when you select
Azure from the Storage drop-down list in the Basic
settings
view.

SAS Token

Specify the SAS token to grant limited access to objects in
your storage account.

To enter the SAS token, click the
[…] button next to the SAS token
field, and then in the pop-up dialog box enter the token between double
quotes and click OK to save the
settings.

This field is available when you select
Azure from the Storage drop-down list in the Basic
settings
view.

Advanced settings

Additional JDBC
Parameters

Specify additional connection properties for the database connection you are
creating. The properties are separated by semicolons and each property is a key-value
pair, for example, encryption=1;clientname=Talend.

This field is available only when you
select Use this Component from the Connection Component drop-down list and select
Internal from the Storage drop-down list in the Basic settings view.

Use Custom Snowflake
Region
Select this check box to specify a custom
Snowflake region. This option is available only when you select Use this Component from the Connection Component drop-down list in the
Basic settings view.

  • Region ID: enter a
    region ID in double quotation marks, for example eu-west-1 or east-us-2.azure.

For more information on Snowflake Region
ID, see Supported Regions.

Login Timeout

Specify the timeout period (in minutes)
of Snowflake login attempts. An error will be generated if no response is received
in this period.

Role

Enter, in double quotation marks, the default access
control role to use to initiate the Snowflake session.

This role must already exist and must have been granted to the
user ID you are using to connect to Snowflake. If this field is left empty, the
PUBLIC role is automatically granted. For information about the Snowflake access control
model, see Understanding the Access Control Model.
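
For context, granting such a role in Snowflake is done with standard SQL; a minimal
sketch, in which the role TALEND_LOADER, user TALEND_USER, and object names are all
hypothetical:

    -- Create a role and grant it the privileges the Job needs.
    CREATE ROLE IF NOT EXISTS TALEND_LOADER;
    GRANT USAGE ON WAREHOUSE MY_WH TO ROLE TALEND_LOADER;
    GRANT USAGE ON DATABASE MY_DB TO ROLE TALEND_LOADER;
    GRANT USAGE ON SCHEMA MY_DB.MY_SCHEMA TO ROLE TALEND_LOADER;
    GRANT SELECT, INSERT ON TABLE MY_DB.MY_SCHEMA.MY_TABLE TO ROLE TALEND_LOADER;

    -- Grant the role to the user the component connects as.
    GRANT ROLE TALEND_LOADER TO USER TALEND_USER;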

Allow Snowflake to convert
columns and tables to uppercase

Select this check box to convert lowercase letters in the defined
table name and schema column names to uppercase. Note that unquoted
identifiers should match the Snowflake Identifier Syntax.

If you deselect the check box, all identifiers are
automatically quoted.

This property is not available when you select the
Manual Query check box.

For more information on the Snowflake Identifier Syntax,
see Identifier Syntax.
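
To illustrate why this matters, a short sketch of Snowflake's standard identifier
resolution (the table and column names are only examples):

    -- Unquoted identifiers are resolved and stored as uppercase:
    CREATE TABLE demo_names (name VARCHAR(10));   -- stored as DEMO_NAMES(NAME)
    SELECT name FROM DEMO_NAMES;                  -- resolves to the same table

    -- Quoted identifiers keep their exact case and must be quoted on every use:
    CREATE TABLE "demo_names" ("name" VARCHAR(10));
    SELECT "name" FROM "demo_names";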

Temporary
Table
Schema

Specify a schema for the temporary table.

The schema must exist.

Custom DB
Type
Select this check box to specify the DB type for each
column in the schema.

This property is available only when you
select an action that creates the table from the Table
Action drop-down list in the Basic
settings view.

Delete Storage Files On
Success
Delete all the files in the storage folder once the
data is loaded into the table successfully.

This field is
not available when you select Use Custom
Storage Location.

Use Custom Stage
Path

Select this check box to upload the data to the files
generated in a folder under the stage. You also need to enter the path to
the folder in the field provided. For example, to upload data to the files
generated in myfolder1/myfolder2 under the
stage, type "@~/myfolder1/myfolder2" in the field.

This field is available when you
select Internal from the Storage drop-down list in the Basic settings view.

Once selected, the Stage Folder
in Basic settings view becomes unavailable.
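
This is the same path notation Snowflake uses for the user stage (@~) in its PUT and
LIST commands, which can be run from a client such as SnowSQL; a minimal sketch (the
local file name is only an example):

    -- Upload a local file into myfolder1/myfolder2 under the user stage.
    PUT file:///tmp/part_00000.csv.gz @~/myfolder1/myfolder2;

    -- List the files currently staged under that folder.
    LIST @~/myfolder1/myfolder2;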

Use Custom S3 Connection
Configuration
Select this check box if you wish to use your custom
S3 configuration.

Option: select the parameter from the list.

Value: enter
the parameter value.

This field is available when you select
S3 from the Storage drop-down list in the Basic
settings
view.

Copy Command Options

Set parameters for the COPY INTO command by selecting
the following options from the drop-down list. The COPY INTO command is
provided by Snowflake. It loads data into a Snowflake database table.

  • Default: Carry out the COPY INTO operation using
    the default settings, as listed in the frame to the right.
  • Table:
    Set the COPY INTO operation parameters using the Options table. To set a
    parameter, click the plus button, select the parameter from the
    Option column, and
    set the parameter value in the Value column.
  • Manual:
    Set the COPY INTO operation parameters in the text frame to the
    right manually.

For information about the parameters of the COPY INTO command, see
the COPY INTO command.
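
As a hedged illustration of what these options configure (the stage path, table name,
and option values are only examples), a COPY INTO statement with typical parameters
looks like this:

    -- Load staged CSV files into the table, skipping records that fail to load.
    COPY INTO MY_TABLE
      FROM @~/mystagefolder
      FILE_FORMAT = (TYPE = CSV FIELD_DELIMITER = ',' COMPRESSION = GZIP
                     FIELD_OPTIONALLY_ENCLOSED_BY = '"')
      ON_ERROR = 'CONTINUE'
      PURGE = FALSE;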

Put Command
Options
Set parameters for the PUT command by selecting the
following options from the drop-down list. The PUT command is provided by
Snowflake. It uploads data to a Snowflake stage folder.

  • Default: Carry out the PUT operation using the
    default settings, as listed in the frame to the right.
  • Table:
    Set the PUT operation parameters using the Options table. To set a
    parameter, click the plus button, select the parameter from the
    Option column, and
    set the parameter value in the Value column.
  • Manual:
    Set the PUT operation parameters in the text frame to the right
    manually.

For information about the parameters of the PUT command, see the PUT command.

This field is available when you
select Internal from the Storage drop-down list in the Basic settings view.
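
A hedged sketch of a PUT command with some of its optional parameters (file and stage
paths are only examples):

    -- Upload a local file to the stage folder, compressing it and using four upload threads.
    PUT file:///tmp/part_00000.csv @~/mystagefolder
      AUTO_COMPRESS = TRUE
      PARALLEL = 4
      OVERWRITE = TRUE;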

Put Command Error
Retry

Specify the maximum number of data loading
retries when an error occurs while loading data into the internal
Snowflake storage. This parameter defaults to 3. A value of -1 specifies the maximum possible
number of retries. Only -1 or positive
integers are accepted.

This field is available when you
select Internal from the Storage drop-down list in the Basic settings view.

S3 Max Error
Retry

Specify the maximum number of data loading
retries when an error occurs while loading data to or from the S3
folder. This parameter defaults to 3. A value of -1 specifies the maximum possible
number of retries. Only -1 or positive
integers are accepted.

This field is available when you select
S3 from the Storage drop-down list in the Basic
settings
view.

Azure Max Error
Retry

Specify the maximum number of data loading
retries when an error occurs while loading data to or from the Azure
folder. This parameter defaults to 3. A value of -1 specifies the maximum possible
number of retries. Only -1 or positive
integers are accepted.

This field is available when you select
Azure from the Storage drop-down list in the Basic
settings
view.

Non-empty
Storage
Folder
Action
Select
one of the following options:

  • Add New Files:
    continues to process the Job even if the selected storage folder already
    contains files and adds new files to the folder.
  • Cancel Upload:
    stops the operation when the storage folder used for loading data is not
    empty.
  • Replace Existing
    Files
    : cleans the storage folder before adding new
    files.
Chunk Size
(bytes)
Specify the size of the files generated, which
defaults to 52428800 bytes (50 MB).

With this option specified,
the incoming data may be stored in multiple files. Since data is stored
in the files on a per-record basis, the actual size of each file generated can be
larger or smaller than the specified value, but by no more than the size of
the last record stored in the file.

This option can significantly
affect performance, so set it carefully. See File Sizing Best Practices and
Limitations
for related information.

Number of file requests
threads
Specify the number of threads used for sending PUT
requests in parallel when writing the data to the files.

tStatCatcher Statistics

Select this check box to gather the Job processing metadata at the Job level
as well as at each component level.

Global Variables

NB_LINE

The number of rows processed. This is an After variable and it returns an integer.

NB_SUCCESS

The number of rows successfully processed. This is an After variable and it returns an
integer.

NB_REJECT

The number of rows rejected. This is an After variable and it returns an integer.

ERROR_MESSAGE

The error message generated by the component when an error occurs. This is an After
variable and it returns a string.

Usage

Usage rule

This component is
an
end component of a data flow in your Job. It receives data
from other components through
the
Row > Main
link.

It can also send error messages to other components via a Row >
Rejects link. The information
provided about an error can include:

  • The name of the column in error.

  • The number of the row in error.

  • The category of the error, such as a parsing error or a conversion
    error.

  • The character offset in the line in error.

  • The related error message.

  • The offset of the byte in error.

  • The number of the line in error.

  • SQLSTATE for the error.

  • Snowflake error code.

Loading Data Using COPY Command

This scenario describes a Job that loads data into a Snowflake table using the COPY
command, with Copy Command Options customized so that the load continues
(ON_ERROR='continue') in case any record is rejected.

The input data contains several records, one of which violates the length limitation.

Then, the input data is inserted into a Snowflake table and the record that violates the
length limitation is rejected.

Creating a Job for loading data using the COPY command

Create a Job that opens a connection to a Snowflake database, creates a
Snowflake table, generates several records and inserts these records into the table using the COPY
command, and finally retrieves and displays the records on the console.

tSnowflakeOutputBulkExec_1.png

  1. Create a new Job and add a tSnowflakeConnection component, a
    tFixedFlowInput component, a
    tSnowflakeOutputBulkExec component, a
    tSnowflakeInput component, a
    tSnowflakeClose component, two
    tSnowflakeRow components and two
    tLogRow components by typing their names in the design
    workspace or dropping them from the Palette.
  2. Link the tFixedFlowInput component to the
    tSnowflakeOutputBulkExec component using a
    Row > Main connection.
  3. Do the same to link the tSnowflakeOutputBulkExec component
    to the first tLogRow component and link the
    tSnowflakeInput component to the second
    tLogRow component.
  4. Link the tSnowflakeConnection component to the first
    tSnowflakeRow component using a
    Trigger > OnSubjobOk
    connection.
  5. Do the same to link the first tSnowflakeRow component to the
    second tSnowflakeRow component, the second
    tSnowflakeRow component to the
    tFixedFlowInput component, the
    tFixedFlowInput component to the
    tSnowflakeInput component, the
    tSnowflakeInput component to the
    tSnowflakeClose component.

Opening a connection to access a Snowflake database

Configure the tSnowflakeConnection component to open a
connection to access a Snowflake database.

  1. Double-click the tSnowflakeConnection component to open its
    Basic settings view.

    tSnowflakeOutputBulkExec_2.png

  2. In the Account, User Id,
    Password, Warehouse,
    Schema and Database fields, specify
    the authentication information required to access the Snowflake database.
  3. Select an AWS region available for the Snowflake database from the
    Snowflake Region drop-down list. In this example, it is
    AWS US West.

Creating a Snowflake table

Configure the tSnowflakeRow components to create a table. If the
table already exists, it will be dropped and then created. Next, configure the
tFixedFlowInput component to generate several records, one of
which violates the length limitation.

  1. Double-click the first tSnowflakeRow component to open its
    Basic settings view.

    tSnowflakeOutputBulkExec_3.png

  2. Specify the connection details required to access Snowflake. In this example,
    from the Connection Component drop-down list displayed,
    select the connection component to reuse its connection details you have already
    defined.
  3. In the Query field, enter the query statement between
    double quotation marks to drop the table if it already exists. In this example,
    it is drop table if exists demo_names.
  4. Double-click the second tSnowflakeRow component to open its
    Basic settings view.

    tSnowflakeOutputBulkExec_4.png

  5. Specify the connection details required to access Snowflake. In this example,
    from the Component List drop-down list displayed, select
    the connection component to reuse its connection details you have already
    defined.
  6. In the Query field, enter the query statement between
    double quotation marks to create a table. In this example, it is
    create table demo_names (name varchar(10));.
  7. Double-click the tFixedFlowInput component to open its
    Basic settings view.

    tSnowflakeOutputBulkExec_5.png

  8. Click the tSnowflakeOutputBulkExec_6.png button next to Edit schema to
    define the schema. In this example, the schema has only one column: Name.
  9. In the Mode area, select Use Inline
    Content. Then add three records. In this example, they
    are:

Loading data to the Snowflake table

Configure the Basic settings of the
tSnowflakeOutputBulkExec component to insert data to the
Snowflake table.

  1. Double-click the tSnowflakeOutputBulkExec component to
    open its Basic settings view.

    tSnowflakeOutputBulkExec_7.png

  2. Specify the connection details required to access Snowflake. In this example,
    from the Connection Component drop-down list displayed,
    select the connection component to reuse its connection details you have already
    defined.
  3. In the Table field, select the table you created in the
    second tSnowflakeRow component in the previous step. In
    this example, it is DEMO_NAMES.
  4. If needed, click the Sync columns button to make sure
    the schema is properly retrieved from the preceding component.
  5. Select INSERT from the Output
    Action
    drop-down list.
  6. Select Internal from the Storage
    drop-down list and enter demo in the Stage
    Folder
    field.

Using COPY command to load data

Configure the Advanced settings of the
tSnowflakeOutputBulkExec component to use the COPY command to
load data, and then configure the first tLogRow component to
display data on the console.

  1. Click Advanced settings to open the Advanced settings
    view.

    tSnowflakeOutputBulkExec_8.png

  2. Select the Convert columns and table to uppercase check
    box and the Delete Storage Files On Success check box to
    convert the columns to uppercase and delete all files in the storage once the
    Job runs successfully.
  3. Clear the Use Custom Stage Path check box. Otherwise,
    the Stage Folder you specified in the Basic
    settings has no effect.
  4. Select Manual from the Copy Command
    Options drop-down list and enter ON_ERROR='continue'
    FILE_FORMAT=(type=csv field_delimiter=',' compression=gzip
    field_optionally_enclosed_by='"')
    within double quotes. A sketch of the resulting COPY statement is given after
    this list.
  5. Select Default from the Put Command
    Options
    drop-down list.
  6. Set Put Command Error Retry to
    3.
  7. Clear the Stop on non-empty Storage Folder check box.
  8. Double-click the first tLogRow component to open its
    Basic settings view.
  9. In the Mode area, select Table to
    display data.
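
With these settings, the statement issued by the component should be roughly
equivalent to the following sketch (the exact stage path used internally may differ;
demo and DEMO_NAMES come from the earlier steps):

    COPY INTO DEMO_NAMES
      FROM @~/demo
      FILE_FORMAT = (TYPE = CSV FIELD_DELIMITER = ',' COMPRESSION = GZIP
                     FIELD_OPTIONALLY_ENCLOSED_BY = '"')
      ON_ERROR = 'CONTINUE';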

Retrieving data from the Snowflake table

Configure the tSnowflakeInput component to retrieve data from
the Snowflake table, and then configure the second tLogRow
component to display data on the console.

  1. Double-click the tSnowflakeInput component to open its
    Basic settings view.

    tSnowflakeOutputBulkExec_9.png

  2. Specify the connection details required to access Snowflake. In this example,
    from the Connection Component drop-down list displayed,
    select the connection component to reuse its connection details you have already
    defined.
  3. In the Table field, select the table you created in the
    second tSnowflakeRow component in the previous step. In
    this example, it is DEMO_NAMES.
  4. Click the tSnowflakeOutputBulkExec_6.png button next to Edit schema to
    open the schema dialog box and define the schema. In this example, the schema
    has only one column: Name.

    tSnowflakeOutputBulkExec_11.png

  5. Click Advanced settings to open the Advanced settings
    view.

    tSnowflakeOutputBulkExec_12.png

  6. Select the Convert columns and table to
    uppercase check box to convert the columns to uppercase.
  7. Double-click the second tLogRow component to open its
    Basic settings view.
  8. In the Mode area, select Table to
    display data.

Executing the Job

Press Ctrl + S to save the Job and
then press F6 to execute it.

tSnowflakeOutputBulkExec_13.png

As shown above, the Job is executed successfully: the records are displayed on
the console, and the one that violates the length limitation is rejected.

Related scenarios

For use cases in relation with tSnowflakeOutputBulkExec, see the
following scenario:

