tRedshiftOutput
Writes, updates, modifies or deletes the data in a database.
tRedshiftOutput executes the action
defined on the table and/or on the data of a table, according to the
input flow from the previous component.
Depending on the Talend solution you are using, this component can be used in one, some or all of the following Job frameworks:
- Standard: see tRedshiftOutput Standard properties. The component in this framework is generally available.
- Spark Batch: see tRedshiftOutput properties for Apache Spark Batch. The component in this framework is available only if you have subscribed to one of the Talend solutions with Big Data.
- Spark Streaming: see tRedshiftOutput properties for Apache Spark Streaming. The streaming version of this component does not support Spark 1.3. The component in this framework is available only if you have subscribed to Talend Real-time Big Data Platform or Talend Data Fabric.
tRedshiftOutput Standard properties
These properties are used to configure tRedshiftOutput running in the Standard Job framework.
The Standard tRedshiftOutput component belongs to the Cloud and the Databases families.
The component in this framework is generally available.
Basic settings
|
Property type |
Either Built-in or Repository |
|
|
Built-in: No property data stored |
|
|
Repository: Select the repository |
|
|
Click this icon to open a database connection wizard and store the
For more information about setting up and storing database connection |
|
Use an existing connection |
Select this check box and in the Component
Note:
When a Job contains the parent Job and the child Job, if you need to share an existing connection between the two levels, for example, to share the connection created by the parent Job with the child Job, you have to:
For an example about how to share a database connection across Job levels, see |
|
Host |
Hostname or IP address of the database server. |
|
Port |
Listening port number of the database server. |
|
Database |
Database name. |
|
Schema |
Exact name of the schema. |
|
Username and Password |
Database user authentication data. To enter the password, click the […] button next to the |
|
Additional JDBC Parameters |
Specify additional JDBC properties for the connection you are creating. The |
|
Table |
Name of the table to which the data will be written. Note that only |
|
Action on table |
On the table defined, you can perform one of the following
None: No operation is carried
Drop and create a table: The table is
Create a table: The table does not
Create a table if not exists: The table
Drop a table if exists and create: The
Clear a table: The table content is |
|
Action on data |
On the data of the table defined, you can perform:
Insert: Add new entries to the table.
Update: Make changes to existing
Insert or update: Insert a new record. If
Update or insert: Update the record with the
Delete: Remove entries corresponding to
Warning:
It is necessary to specify at least one |
|
Schema and Edit |
A schema is a row description. It defines the number of fields (columns) to
This component offers the
This dynamic schema |
|
|
Built-In: You create and store the |
|
|
Repository: You have already created
When the schema to be reused has default values that are integers or
You can find more details about how to verify default |
|
|
Click Edit schema to make changes to the schema.
|
|
Die on error |
This check box is selected by default. Clear the check box to skip the |
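As a rough illustration of how the Host, Port, Database and Additional JDBC Parameters fields relate to one another, the following sketch assembles a Redshift JDBC URL from those four values. It is not the code that the component generates: the class and method names are hypothetical, the host name is a placeholder, and appending the additional parameters after a `?` is an assumption about how the fields are combined.

```java
// Hypothetical helper for illustration; not part of Talend or of the Redshift JDBC driver.
final class RedshiftUrlSketch {

    // Builds jdbc:redshift://host:port/database and, when an additional
    // parameter string such as "ssl=true" is given, appends it after a '?'.
    static String buildUrl(String host, int port, String database, String additionalParams) {
        StringBuilder url = new StringBuilder("jdbc:redshift://")
                .append(host).append(':').append(port).append('/').append(database);
        if (additionalParams != null && !additionalParams.trim().isEmpty()) {
            url.append('?').append(additionalParams.trim());
        }
        return url.toString();
    }

    public static void main(String[] args) {
        // Placeholder endpoint; 5439 is the default Redshift port.
        System.out.println(buildUrl("example-cluster.redshift.amazonaws.com", 5439, "dev", "ssl=true"));
    }
}
```

Keeping the additional parameters as a single free-form string mirrors the single Additional JDBC Parameters field, so the sketch does not validate individual parameter names.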
Advanced settings
|
Extend Insert |
Select this check box to carry out a bulk insert of a defined set of
Number of rows per insert: enter the
Note:
This option is not compatible with the Reject link. You should therefore clear the check |
|
Use Batch |
Select this check box to activate the batch mode for data processing.
Note:
This check box is available only when you have selected the |
|
Batch Size |
Specify the number of records to be processed in each batch. This field appears only when the Use batch mode |
|
Commit every |
Enter the number of rows to be completed before committing batches of |
|
Additional Columns |
This option is not offered if you create (with or without drop) the DB |
|
|
Name: Type in the name of the schema |
|
|
SQL expression: Type in the SQL |
|
|
Position: Select Before, Replace or After |
|
|
Reference column: Type in a column of |
|
Use field options |
Select this check box to customize a request, especially when there is |
|
tStat |
Select this check box to collect log data at the component |
|
Enable parallel execution |
Select this check box to perform high-speed data processing by treating multiple data flows simultaneously. Note that this feature depends on the ability of the database or the application to handle multiple inserts in parallel, as well as on the number of CPUs allocated.
In the Number of parallel executions field, either:
Note that when parallel execution is enabled, it is not possible to use global variables to
|
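To make the idea behind Extend Insert and Number of rows per insert more concrete, the sketch below groups a number of rows into multi-row INSERT statements of a fixed maximum size. This only illustrates the general technique. The SQL that the component actually emits is not shown on this page, and the class, method, table and column names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration of grouping rows into multi-row INSERT statements.
final class ExtendedInsertSketch {

    // Builds one parameterized INSERT carrying rowCount "(?, ?, ...)" groups.
    static String buildInsert(String table, List<String> columns, int rowCount) {
        String group = "(" + String.join(", ", Collections.nCopies(columns.size(), "?")) + ")";
        return "INSERT INTO " + table + " (" + String.join(", ", columns) + ") VALUES "
                + String.join(", ", Collections.nCopies(rowCount, group));
    }

    // Splits totalRows into chunks of at most rowsPerInsert rows and
    // returns one statement per chunk; the last chunk may be smaller.
    static List<String> plan(String table, List<String> columns, int totalRows, int rowsPerInsert) {
        if (rowsPerInsert <= 0) {
            throw new IllegalArgumentException("rowsPerInsert must be positive");
        }
        List<String> statements = new ArrayList<>();
        for (int done = 0; done < totalRows; done += rowsPerInsert) {
            int size = Math.min(rowsPerInsert, totalRows - done);
            statements.add(buildInsert(table, columns, size));
        }
        return statements;
    }

    public static void main(String[] args) {
        // Five rows with two rows per insert yield three statements.
        for (String sql : plan("movie", Arrays.asList("id", "title"), 5, 2)) {
            System.out.println(sql);
        }
    }
}
```

A larger chunk size means fewer round trips to the database but longer statements, which is the trade-off the Number of rows per insert field exposes.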
Global Variables
|
Global Variables |
NB_LINE: the number of rows processed. This is an After
NB_LINE_UPDATED: the number of rows updated. This is an
NB_LINE_INSERTED: the number of rows inserted. This is an
NB_LINE_DELETED: the number of rows deleted. This is an
NB_LINE_REJECTED: the number of rows rejected. This is an
ERROR_MESSAGE: the error message generated by the
A Flow variable functions during the execution of a component while an After variable
To fill up a field or expression with a variable, press Ctrl +
For further information about variables, see |
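The variables listed above are typically read in Java code from a Job-wide map whose keys combine the component label and the variable name, for example tRedshiftOutput_1_NB_LINE. The sketch below simulates that map with a plain HashMap so that it runs outside a Job. The component label tRedshiftOutput_1, the helper class and the stored value are assumptions made for illustration only.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for reading an After variable; the map is simulated.
final class GlobalVariableSketch {

    // Looks up "<component>_<variable>" and returns 0 when the value
    // is missing or is not an Integer (for example before the component has run).
    static int readCount(Map<String, Object> globalMap, String component, String variable) {
        Object value = globalMap.get(component + "_" + variable);
        return value instanceof Integer ? (Integer) value : 0;
    }

    public static void main(String[] args) {
        Map<String, Object> globalMap = new HashMap<>();
        // Simulated value, as if the component had processed 42 rows.
        globalMap.put("tRedshiftOutput_1_NB_LINE", 42);
        System.out.println(readCount(globalMap, "tRedshiftOutput_1", "NB_LINE"));
        System.out.println(readCount(globalMap, "tRedshiftOutput_1", "NB_LINE_REJECTED"));
    }
}
```

Returning 0 for a missing key is a design choice of this sketch, which avoids a null check at every call site.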
Usage
|
Usage rule |
This component covers all possible SQL database queries. It allows you to carry out actions on a table or on the data of a table in an Amazon Redshift database. It enables you to create a reject flow, with a Row > Rejects link filtering the data in error. For a usage example, see Scenario: Retrieving data in error with a Reject link. |
|
Dynamic settings |
Click the [+] button to add a
The Dynamic settings table is
For examples on using dynamic parameters, see Scenario: Reading data from databases through context-based dynamic connections and Scenario: Reading data from different MySQL databases using dynamically loaded connection parameters.
For more information on Dynamic |
|
Limitation |
Due to license incompatibility, one or more JARs required to use this component are not |
Related scenarios
For a related scenario, see Scenario: Handling data with Redshift.
tRedshiftOutput properties for Apache Spark Batch
These properties are used to configure tRedshiftOutput running in the Spark Batch Job framework.
The Spark Batch tRedshiftOutput component belongs to the Databases family.
The component in this framework is available only if you have subscribed to one of the Talend solutions with Big Data.
Basic settings
|
Property type |
Either Built-In or Repository. Built-In: No property data stored centrally.
Repository: Select the repository file where the |
|
|
Click this icon to open a database connection wizard and store the database connection
For more information about setting up and storing database connection parameters, see |
|
Use an existing connection |
Select this check box and in the Component |
|
Host |
Enter the endpoint of the database you need to connect to in Redshift. |
|
Port |
Enter the port number of the database you need to connect to in Redshift.
The related information can be found in the Cluster Database Properties area in the Web
For further information, see Managing clusters console. |
|
Username and Password |
Enter the authentication information to the Redshift database you need to connect
To enter the password, click the […] button next to the |
|
Database |
Enter the name of the database you need to connect to in Redshift.
The related information can be found in the Cluster Database Properties area in the Web
For further information, see Managing clusters console. |
|
Schema |
Enter the name of the database schema to be used in Redshift. The default schema is called
A schema in terms of Redshift is similar to an operating system directory. For further |
|
Additional JDBC Parameters |
Specify additional JDBC properties for the connection you are creating. The |
|
S3 configuration |
Select the tS3Configuration component
You need to drop the tS3Configuration |
|
S3 temp path |
Enter the location in S3 in which the data to be transferred from or to
This path is independent of the temporary path you need to set in the |
|
Table |
Enter the name of the table to which the data will be written. Note that only one table
If this table does not exist, you need to select Create |
|
Schema and Edit |
A schema is a row description. It defines the number of fields (columns) to |
|
|
Built-In: You create and store the |
|
|
Repository: You have already created |
|
|
Click Edit schema to make changes to the schema.
|
|
Save mode |
Select the actions you want tRedshiftOutput to perform on
|
Advanced settings
|
Distribution style |
Select the distribution style to be applied by tRedshiftOutput on the data to be written. For further information about each distribution style, see Distribution styles. |
|
Define sort key |
Select this check box to sort the data to be written based on given columns of the data. Once you have selected it, select the column(s) to be used as the sort key(s). |
|
Use staging table |
Select the Use staging table check box to make tRedshiftOutput create and write data in a staging table and upon
This feature is available only when you have selected Overwrite from the Save mode list and is |
|
Define pre-actions |
Select this check box and in the field that is displayed, add a semicolon-separated (;)
For example, using the following statement, you remove all of the rows from the Movie table that meet the condition over the Movie and the Director tables.
|
|
Define post-actions |
Select this check box and in the field that is displayed, add a semicolon-separated (;)
For example, using the following statement, you grant the Select privilege on the Movie table to the user ychen.
|
|
Define extra copy options |
Select this check box and in the field that is displayed, add a semicolon-separated (;)
tRedshiftOutput uses the Copy statement of Redshift SQL
For further information about the extra options you can choose, see Optional parameters. |
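The pre-action and post-action fields take several SQL statements in one semicolon-separated string. As a minimal sketch of that convention, the code below splits such a string into individual statements. How the component itself parses the field is not documented here, so the naive split on `;` (which would mishandle a semicolon inside a string literal) is an assumption, and the two sample statements are one possible phrasing rather than the statements from the original page.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of splitting a semicolon-separated action list.
final class ActionListSketch {

    // Splits on ';', trims each piece and drops empty pieces, so a
    // trailing semicolon does not produce an empty statement.
    static List<String> splitStatements(String actions) {
        List<String> statements = new ArrayList<>();
        for (String part : actions.split(";")) {
            String sql = part.trim();
            if (!sql.isEmpty()) {
                statements.add(sql);
            }
        }
        return statements;
    }

    public static void main(String[] args) {
        // Sample post-actions; table and user names follow the example in the text.
        String postActions = "GRANT SELECT ON TABLE movie TO ychen; ANALYZE movie;";
        for (String sql : splitStatements(postActions)) {
            System.out.println(sql);
        }
    }
}
```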
Usage
|
Usage rule |
This component is used as an end component and requires an input link.
This component should use a tRedshiftConfiguration
This component, along with the Spark Batch component Palette it belongs to, appears only
Note that in this documentation, unless otherwise |
|
Spark Connection |
You need to use the Spark Configuration tab in the Run view to define the connection to a given Spark cluster for the whole Job.
In addition, since the Job expects its dependent jar files for execution, you must specify the directory in the file system to which these jar files are transferred so that Spark can access these files:
This connection is effective on a per-Job basis. |
Related scenarios
For a scenario about how to use the same type of component in a Spark Batch Job, see Writing and reading data from MongoDB using a Spark Batch Job.
tRedshiftOutput properties for Apache Spark Streaming
These properties are used to configure tRedshiftOutput running in the Spark Streaming Job framework.
The Spark Streaming tRedshiftOutput component belongs to the Databases family.
The component in this framework is available only if you have subscribed to Talend Real-time Big Data Platform or Talend Data Fabric.
Basic settings
|
Property type |
Either Built-In or Repository. Built-In: No property data stored centrally.
Repository: Select the repository file where the |
|
|
Click this icon to open a database connection wizard and store the database connection
For more information about setting up and storing database connection parameters, see |
|
Use an existing connection |
Select this check box and in the Component |
|
Host |
Enter the endpoint of the database you need to connect to in Redshift. |
|
Port |
Enter the port number of the database you need to connect to in Redshift.
The related information can be found in the Cluster Database Properties area in the Web
For further information, see Managing clusters console. |
|
Username and Password |
Enter the authentication information to the Redshift database you need to connect
To enter the password, click the […] button next to the |
|
Database |
Enter the name of the database you need to connect to in Redshift.
The related information can be found in the Cluster Database Properties area in the Web
For further information, see Managing clusters console. |
|
Schema |
Enter the name of the database schema to be used in Redshift. The default schema is called
A schema in terms of Redshift is similar to an operating system directory. For further |
|
Additional JDBC Parameters |
Specify additional JDBC properties for the connection you are creating. The |
|
S3 configuration |
Select the tS3Configuration component
You need to drop the tS3Configuration |
|
S3 temp path |
Enter the location in S3 in which the data to be transferred from or to
This path is independent of the temporary path you need to set in the |
|
Table |
Enter the name of the table to which the data will be written. Note that only one table
If this table does not exist, you need to select Create |
|
Schema and Edit |
A schema is a row description. It defines the number of fields (columns) to |
|
|
Built-In: You create and store the |
|
|
Repository: You have already created |
|
|
Click Edit schema to make changes to the schema.
|
|
Save mode |
Select the actions you want tRedshiftOutput to perform on
|
Advanced settings
|
Distribution style |
Select the distribution style to be applied by tRedshiftOutput on the data to be written. For further information about each distribution style, see Distribution styles. |
|
Define sort key |
Select this check box to sort the data to be written based on given columns of the data. Once you have selected it, select the column(s) to be used as the sort key(s). |
|
Use staging table |
Select the Use staging table check box to make tRedshiftOutput create and write data in a staging table and upon
This feature is available only when you have selected Overwrite from the Save mode list and is |
|
Define pre-actions |
Select this check box and in the field that is displayed, add a semicolon-separated (;)
For example, using the following statement, you remove all of the rows from the Movie table that meet the condition over the Movie and the Director tables.
|
|
Define post-actions |
Select this check box and in the field that is displayed, add a semicolon-separated (;)
For example, using the following statement, you grant the Select privilege on the Movie table to the user ychen.
|
|
Define extra copy options |
Select this check box and in the field that is displayed, add a semicolon-separated (;)
tRedshiftOutput uses the Copy statement of Redshift SQL
For further information about the extra options you can choose, see Optional parameters. |
Usage
|
Usage rule |
This component is used as an end component and requires an input link.
This component should use a tRedshiftConfiguration
This component, along with the Spark Streaming component Palette it belongs to, appears
Note that in this documentation, unless otherwise explicitly stated, a scenario presents |
|
Spark Connection |
You need to use the Spark Configuration tab in the Run view to define the connection to a given Spark cluster for the whole Job.
In addition, since the Job expects its dependent jar files for execution, you must specify the directory in the file system to which these jar files are transferred so that Spark can access these files:
This connection is effective on a per-Job basis. |
Related scenarios
For a scenario about how to use the same type of component in a Spark Streaming Job, see Reading and writing data in MongoDB using a Spark Streaming Job.