tRedshiftOutput
Writes, updates, modifies or deletes the data in a database.
tRedshiftOutput executes the action
defined on the table and/or on the data of a table, according to the
input flow from the previous component.
Depending on the Talend
product you are using, this component can be used in one, some or all of the following
Job frameworks:
- Standard: see tRedshiftOutput Standard properties. The component in this framework is available in all Talend products.
- Spark Batch: see tRedshiftOutput properties for Apache Spark Batch. The component in this framework is available in all subscription-based Talend products with Big Data and Talend Data Fabric.
- Spark Streaming: see tRedshiftOutput properties for Apache Spark Streaming. The streaming version of this component does not support Spark 1.3. This component is available in Talend Real Time Big Data Platform and Talend Data Fabric.
tRedshiftOutput Standard properties
These properties are used to configure tRedshiftOutput running in the Standard Job framework.
The Standard
tRedshiftOutput component belongs to the Cloud and the Databases families.
The component in this framework is available in all Talend
products.
This component is a dynamic database connector. The properties related to database settings vary depending on your database type selection. For more information about dynamic database connectors, see Dynamic database components.
Basic settings
Database |
Select a type of database from the list and click Apply. |
Property type |
Either Built-In or Repository. |
 |
Built-In: No property data stored centrally. |
 |
Repository: Select the repository file where the properties are stored. |
|
Click this icon to open a database connection wizard and store the database connection parameters you set in the component Basic settings view. For more information about setting up and storing database connection parameters, see the related documentation. |
Use an existing connection |
Select this check box and in the Component List, select the relevant connection component to reuse the connection details you already defined. Note: When a Job contains the parent Job and the child Job, if you need to share an existing connection between the two levels, for example, to share the connection created by the parent Job with the child Job, you have to register the database connection to be shared in the connection component of the parent Job and read that registered connection from a dedicated connection component in the child Job. For an example about how to share a database connection across Job levels, see the related documentation. |
Host |
Hostname or IP address of the database server. |
Port |
Listening port number of the database server. |
Database |
Database name. The bucket and the Redshift database to be used must be in the same region on Amazon. |
Schema |
Exact name of the schema. |
Username and Password |
Database user authentication data. To enter the password, click the […] button next to the password field, enter the password between double quotes in the dialog box that opens, and click OK to save the settings. |
Additional JDBC Parameters |
Specify additional JDBC properties for the connection you are creating. The properties are separated by ampersand (&) and each property is a key-value pair.
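For example, you could enter ssl=true & sslfactory=com.amazon.redshift.ssl.NonValidatingFactory to create the connection over SSL; the parameter names here are those of the Amazon Redshift JDBC driver and should be adapted to the driver version you actually use. |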
Table |
Name of the table to which the data will be written. Note that only one table can be written at a time and that the table must exist for the insert operation to succeed. |
Action on table |
On the table defined, you can perform one of the following operations:
None: No operation is carried out.
Drop and create a table: The table is removed and created again.
Create a table: The table does not exist and gets created.
Create a table if not exists: The table is created if it does not exist.
Drop a table if exists and create: The table is removed if it already exists and created again.
Clear a table: The table content is deleted. |
Action on data |
On the data of the table defined, you can perform:
Insert: Add new entries to the table. If duplicates are found, the Job stops.
Update: Make changes to existing entries.
Insert or update: Insert a new record. If the record with the given reference already exists, an update is made.
Update or insert: Update the record with the given reference. If the record does not exist, a new record is inserted.
Delete: Remove entries corresponding to the input flow.
Warning:
It is necessary to specify at least one column as a primary key on which the Update and Delete operations are based. You can do that by clicking Edit schema and selecting the check box(es) next to the column(s) you want to use as primary key(s). |
Schema and Edit schema |
A schema is a row description. It defines the number of fields (columns) to be processed and passed on to the next component. The schema is either Built-In or stored remotely in the Repository. |
 |
Built-In: You create and store the schema locally for this component only. |
 |
Repository: You have already created the schema and stored it in the Repository. You can reuse it in various projects and Job designs. When the schema to be reused has default values that are integers or functions, ensure that these default values are not enclosed within quotation marks. If they are, you must remove the quotation marks manually. You can find more details about how to verify default values in a retrieved schema in the related documentation. |
 |
Click Edit schema to make changes to the schema. If you make changes, the schema automatically becomes built-in.
|
Die on error |
This check box is selected by default. Clear the check box to skip any rows on error and complete the process for error-free rows. |
Advanced settings
Use alternate schema |
Select this option to use a schema other than the one defined in the connection component selected from the Component List. This option is available when Use an existing connection is selected. |
Extend Insert |
Select this check box to carry out a bulk insert of a defined set of lines instead of inserting lines one by one. In the Number of rows per insert field, enter the number of rows to be inserted per operation. Amazon Redshift requires the number of rows per insert to be less than 32767. Note:
This option is not compatible with the Reject link. You should therefore clear the check box if you are using a Row > Reject link with this component. |
Use Batch |
Select this check box to activate the batch mode for data processing. Note:
This check box is available only when you have selected the Update or the Delete option in the Action on data list. |
Batch Size |
Specify the number of records to be processed in each batch. This field appears only when the Use Batch check box is selected. |
Commit every |
Enter the number of rows to be completed before committing batches of rows together into the database. This option ensures transaction quality (but not rollback) and, above all, better performance at execution. |
Additional Columns |
This option is not offered if you create (with or without drop) the DB table. It allows you to call SQL functions to perform actions on columns that are not insert, update or delete actions, or actions that require particular preprocessing. |
 |
Name: Type in the name of the schema column to be altered or inserted as a new column. |
 |
SQL expression: Type in the SQL statement to be executed in order to alter or insert the relevant column data. |
 |
Position: Select Before, Replace or After, depending on the action to be performed on the reference column. |
 |
Reference column: Type in a column of reference that the component can use to place or replace the new or altered column. |
Use field options |
Select this check box to customize a request, especially when there is double action on data. |
JDBC url |
Select a way to access an Amazon Redshift database from the JDBC url drop-down list.
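For reference, the JDBC URL of an Amazon Redshift database typically follows the format jdbc:redshift://<endpoint>:<port>/<database>; the placeholders are illustrative, and the actual URL is built from the connection parameters set in this component.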
|
tStatCatcher Statistics |
Select this check box to collect log data at the component level. |
Enable parallel execution |
Select this check box to perform high-speed data processing by treating multiple data flows simultaneously. Note that this feature depends on the ability of the database or the application to handle multiple inserts in parallel, as well as on the number of CPUs available. In the Number of parallel executions field, either enter the number of parallel executions desired, or press Ctrl + Space and select the appropriate context variable from the list.
Note that when parallel execution is enabled, it is not possible to use global variables to retrieve return values in a subJob.
|
The Row > Reject link is not available if any of these three options is selected: Die on error, Extend Insert, or Use Batch. Also, to make sure your Job runs properly, do not select any of these three options when a Row > Reject link is present.
Global Variables
Global Variables |
NB_LINE: the number of rows processed. This is an After variable and it returns an integer.
NB_LINE_UPDATED: the number of rows updated. This is an After variable and it returns an integer.
NB_LINE_INSERTED: the number of rows inserted. This is an After variable and it returns an integer.
NB_LINE_DELETED: the number of rows deleted. This is an After variable and it returns an integer.
NB_LINE_REJECTED: the number of rows rejected. This is an After variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the component when an error occurs. This is an After variable and it returns a string. This variable functions only if the Die on error check box is cleared, if the component has this check box.
A Flow variable functions during the execution of a component, while an After variable functions after the execution of the component.
To fill up a field or expression with a variable, press Ctrl + Space to access the variable list and choose the variable to use from it.
For further information about variables, see the related documentation.
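For example, once this component has been executed, you can retrieve the number of processed rows elsewhere in the Job with an expression such as ((Integer)globalMap.get("tRedshiftOutput_1_NB_LINE")), where tRedshiftOutput_1 stands for the name of this component in the Job and must be adapted to your actual design. |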
Usage
Usage rule |
This component covers all possible SQL database queries. It allows you to carry out actions on a table or on the data of a table in an Amazon Redshift database. It enables you to create a reject flow, with a Row > Rejects link filtering the data in error. For a usage example, see Retrieving data in error with a Reject link. |
Dynamic settings |
Click the [+] button to add a row in the table and fill the Code field with a context variable to choose your database connection dynamically. This feature is useful when you need to access database tables that have the same data structure but are located in different databases, especially when you are working in an environment where you cannot change your Job settings, for example, when your Job has to be deployed and executed independently of Talend Studio. The Dynamic settings table is available only when the Use an existing connection check box is selected in the Basic settings view. Once a dynamic parameter is defined, the Component List box in the Basic settings view becomes unusable. For examples on using dynamic parameters, see Reading data from databases through context-based dynamic connections and Reading data from different MySQL databases using dynamically loaded connection parameters. For more information on Dynamic settings and context variables, see the related documentation. |
Limitation |
Due to license incompatibility, one or more JARs required to use this component are not provided. You can install the missing JARs for this particular component by clicking the Install button on the Component tab view. You can also find out and add all missing JARs easily on the Modules tab in the Integration perspective of the Studio. |
Related scenarios
For a related scenario, see Handling data with Redshift.
tRedshiftOutput properties for Apache Spark Batch
These properties are used to configure tRedshiftOutput running in the Spark Batch Job framework.
The Spark Batch
tRedshiftOutput component belongs to the Databases family.
The component in this framework is available in all subscription-based Talend products with Big Data
and Talend Data Fabric.
Basic settings
Property type |
Either Built-In or Repository. Built-In: No property data stored centrally.
Repository: Select the repository file where the properties are stored. |
|
Click this icon to open a database connection wizard and store the database connection parameters you set in the component Basic settings view. For more information about setting up and storing database connection parameters, see the related documentation. |
Use an existing connection |
Select this check box and in the Component List, select the relevant connection component to reuse the connection details you already defined. |
Host |
Enter the endpoint of the database you need to connect to in Redshift. |
Port |
Enter the port number of the database you need to connect to in Redshift. The related information can be found in the Cluster Database Properties area in the Web console of your Redshift. For further information, see Managing clusters console. |
Username and Password |
Enter the authentication information to the Redshift database you need to connect to. To enter the password, click the […] button next to the password field, enter the password between double quotes in the dialog box that opens, and click OK to save the settings. |
Database |
Enter the name of the database you need to connect to in Redshift. The related information can be found in the Cluster Database Properties area in the Web console of your Redshift. For further information, see Managing clusters console. The bucket and the Redshift database to be used must be in the same region on Amazon. |
Schema |
Enter the name of the database schema to be used in Redshift. The default schema is called PUBLIC. A schema in terms of Redshift is similar to an operating system directory. For further information about Redshift schemas, see the Amazon Redshift documentation. |
Additional JDBC Parameters |
Specify additional JDBC properties for the connection you are creating. The properties are separated by ampersand (&) and each property is a key-value pair. |
S3 configuration |
Select the tS3Configuration component from which you want Spark to use the configuration details to connect to S3. You need to drop the tS3Configuration component to be used alongside this component in the same Job so that it is displayed on the S3 configuration list. |
S3 temp path |
Enter the location in S3 in which the data to be transferred to Redshift is temporarily stored. This path is independent of the temporary path you need to set in the Basic settings tab of tS3Configuration. |
Table |
Enter the name of the table to which the data will be written. If this table does not exist, you need to select Create from the Save mode list to allow tRedshiftOutput to create it. |
Schema and Edit schema |
A schema is a row description. It defines the number of fields (columns) to be processed and passed on to the next component. The schema is either Built-In or stored remotely in the Repository. |
 |
Built-In: You create and store the schema locally for this component only. |
 |
Repository: You have already created the schema and stored it in the Repository. You can reuse it in various projects and Job designs. |
 |
Click Edit schema to make changes to the schema. If you make changes, the schema automatically becomes built-in.
|
Save mode |
Select the actions you want tRedshiftOutput to perform on the specified table.
|
Advanced settings
Distribution style |
Select the distribution style to be applied by tRedshiftOutput on the data to be written. For further information about each of the distribution styles, see the Amazon Redshift documentation. |
||
Define sort key |
Select this check box to sort the data to be written based on given columns of the data. Once you select it, you need to select the column(s) to be used as the sort key. |
||
Use staging table |
Select the Use staging table check box to have tRedshiftOutput write the data to a staging table and replace the target table with it only once the write has succeeded. This feature is available only when you have selected Overwrite from the Save mode list and is recommended when you need to keep the existing data of the target table available in case the overwrite fails. |
||
Define pre-actions |
Select this check box and in the field that is displayed, add a semicolon-separated list of SQL statements to be executed before tRedshiftOutput starts to write the data. For example, using the following statement, you remove all of the rows from the Movie table that meet the condition over the Movie and the Director tables.
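The statement below only illustrates the expected syntax; the director_id join column and the country filter are hypothetical placeholders:
-- remove the Movie rows whose related Director row matches the condition
DELETE FROM Movie USING Director
WHERE Movie.director_id = Director.director_id AND Director.country = 'US';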
|
||
Define post-actions |
Select this check box and in the field that is displayed, add a semicolon-separated list of SQL statements to be executed after tRedshiftOutput has successfully written the data. For example, using the following statement, you grant the Select privilege on the Movie table to the user ychen.
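A statement that does this could be the following (standard Redshift SQL, using the table and user names from the example above):
GRANT SELECT ON Movie TO ychen;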
|
||
Define extra copy options |
Select this check box and in the field that is displayed, add a
tRedshiftOutput uses the Copy statement of For further information about the extra options you can choose, |
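For instance, an entry such as TRUNCATECOLUMNS MAXERROR 100 (illustrative values; both are documented Redshift Copy options) truncates over-long string values instead of rejecting them and tolerates up to 100 load errors. |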
||
Use Timestamp format for Date type |
Select the check box to output dates, hours, minutes and seconds contained in your data against the Date type. If you clear this check box, only years, months and days are output. The format used by Deltalake is yyyy-MM-dd HH:mm:ss. |
Usage
Usage rule |
This component is used as an end component and requires an input link. This component should use a tRedshiftConfiguration component present in the same Job to connect to Redshift. This component, along with the Spark Batch component Palette it belongs to, appears only when you are creating a Spark Batch Job. Note that in this documentation, unless otherwise explicitly stated, a scenario presents only Standard Jobs, that is to say traditional Talend data integration Jobs. |
Spark Connection |
In the Spark Configuration tab in the Run view, define the connection to a given Spark cluster for the whole Job. In addition, since the Job expects its dependent JAR files for execution, you must specify the directory in the file system to which these JAR files are transferred so that Spark can access them.
This connection is effective on a per-Job basis. |
Related scenarios
For a scenario about how to use the same type of component in a Spark Batch Job, see Writing and reading data from MongoDB using a Spark Batch Job.
tRedshiftOutput properties for Apache Spark Streaming
These properties are used to configure tRedshiftOutput running in the Spark Streaming Job framework.
The Spark Streaming
tRedshiftOutput component belongs to the Databases family.
This component is available in Talend Real Time Big Data Platform and Talend Data Fabric.
Basic settings
Property type |
Either Built-In or Repository. Built-In: No property data stored centrally.
Repository: Select the repository file where the properties are stored. |
|
Click this icon to open a database connection wizard and store the database connection parameters you set in the component Basic settings view. For more information about setting up and storing database connection parameters, see the related documentation. |
Use an existing connection |
Select this check box and in the Component List, select the relevant connection component to reuse the connection details you already defined. |
Host |
Enter the endpoint of the database you need to connect to in Redshift. |
Port |
Enter the port number of the database you need to connect to in Redshift. The related information can be found in the Cluster Database Properties area in the Web console of your Redshift. For further information, see Managing clusters console. |
Username and Password |
Enter the authentication information to the Redshift database you need to connect to. To enter the password, click the […] button next to the password field, enter the password between double quotes in the dialog box that opens, and click OK to save the settings. |
Database |
Enter the name of the database you need to connect to in Redshift. The related information can be found in the Cluster Database Properties area in the Web console of your Redshift. For further information, see Managing clusters console. The bucket and the Redshift database to be used must be in the same region on Amazon. |
Schema |
Enter the name of the database schema to be used in Redshift. The default schema is called PUBLIC. A schema in terms of Redshift is similar to an operating system directory. For further information about Redshift schemas, see the Amazon Redshift documentation. |
Additional JDBC Parameters |
Specify additional JDBC properties for the connection you are creating. The properties are separated by ampersand (&) and each property is a key-value pair. |
S3 configuration |
Select the tS3Configuration component from which you want Spark to use the configuration details to connect to S3. You need to drop the tS3Configuration component to be used alongside this component in the same Job so that it is displayed on the S3 configuration list. |
S3 temp path |
Enter the location in S3 in which the data to be transferred to Redshift is temporarily stored. This path is independent of the temporary path you need to set in the Basic settings tab of tS3Configuration. |
Table |
Enter the name of the table to which the data will be written. If this table does not exist, you need to select Create from the Save mode list to allow tRedshiftOutput to create it. |
Schema and Edit schema |
A schema is a row description. It defines the number of fields (columns) to be processed and passed on to the next component. The schema is either Built-In or stored remotely in the Repository. |
 |
Built-In: You create and store the schema locally for this component only. |
 |
Repository: You have already created the schema and stored it in the Repository. You can reuse it in various projects and Job designs. |
 |
Click Edit schema to make changes to the schema. If you make changes, the schema automatically becomes built-in.
|
Save mode |
Select the actions you want tRedshiftOutput to perform on the specified table.
|
Advanced settings
Distribution style |
Select the distribution style to be applied by tRedshiftOutput on the data to be written. For further information about each of the distribution styles, see the Amazon Redshift documentation. |
||
Define sort key |
Select this check box to sort the data to be written based on given columns of the data. Once you select it, you need to select the column(s) to be used as the sort key. |
||
Use staging table |
Select the Use staging table check box to have tRedshiftOutput write the data to a staging table and replace the target table with it only once the write has succeeded. This feature is available only when you have selected Overwrite from the Save mode list and is recommended when you need to keep the existing data of the target table available in case the overwrite fails. |
||
Define pre-actions |
Select this check box and in the field that is displayed, add a semicolon-separated list of SQL statements to be executed before tRedshiftOutput starts to write the data. For example, using the following statement, you remove all of the rows from the Movie table that meet the condition over the Movie and the Director tables.
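The statement below only illustrates the expected syntax; the director_id join column and the country filter are hypothetical placeholders:
-- remove the Movie rows whose related Director row matches the condition
DELETE FROM Movie USING Director
WHERE Movie.director_id = Director.director_id AND Director.country = 'US';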
|
||
Define post-actions |
Select this check box and in the field that is displayed, add a semicolon-separated list of SQL statements to be executed after tRedshiftOutput has successfully written the data. For example, using the following statement, you grant the Select privilege on the Movie table to the user ychen.
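A statement that does this could be the following (standard Redshift SQL, using the table and user names from the example above):
GRANT SELECT ON Movie TO ychen;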
|
||
Define extra copy options |
Select this check box and in the field that is displayed, add a list of extra options to be appended to the Copy statement generated at runtime. tRedshiftOutput uses the Copy statement of Redshift SQL to write the data. For further information about the extra options you can choose, see the Amazon Redshift documentation for the Copy command.
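For instance, an entry such as TRUNCATECOLUMNS MAXERROR 100 (illustrative values; both are documented Redshift Copy options) truncates over-long string values instead of rejecting them and tolerates up to 100 load errors. |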
||
Use Timestamp format for Date type |
Select the check box to output dates, hours, minutes and seconds contained in your data against the Date type. If you clear this check box, only years, months and days are output. The format used by Deltalake is yyyy-MM-dd HH:mm:ss. |
Usage
Usage rule |
This component is used as an end component and requires an input link. This component should use a tRedshiftConfiguration component present in the same Job to connect to Redshift. This component, along with the Spark Streaming component Palette it belongs to, appears only when you are creating a Spark Streaming Job. Note that in this documentation, unless otherwise explicitly stated, a scenario presents only Standard Jobs, that is to say traditional Talend data integration Jobs. |
Spark Connection |
In the Spark Configuration tab in the Run view, define the connection to a given Spark cluster for the whole Job. In addition, since the Job expects its dependent JAR files for execution, you must specify the directory in the file system to which these JAR files are transferred so that Spark can access them.
This connection is effective on a per-Job basis. |
Related scenarios
For a scenario about how to use the same type of component in a Spark Streaming Job, see
Reading and writing data in MongoDB using a Spark Streaming Job.