tS3Configuration
Reuses the connection configuration to S3N or S3A in the same Job.
tS3Configuration provides S3N or S3A connection information for the file system related components used in the same Spark Job. The Spark cluster to be used reads this configuration to eventually connect to S3N (S3 Native Filesystem) or S3A.
Depending on the Talend solution you are using, this component can be used in one, some or all of the following Job frameworks:
- Spark Batch: see tS3Configuration properties for Apache Spark Batch. The component in this framework is available only if you have subscribed to one of the Talend solutions with Big Data.
- Spark Streaming: see tS3Configuration properties for Apache Spark Streaming. The component in this framework is available only if you have subscribed to Talend Real-time Big Data Platform or Talend Data Fabric.
tS3Configuration properties for Apache Spark Batch
These properties are used to configure tS3Configuration running in the Spark Batch Job framework.
The Spark Batch tS3Configuration component belongs to the Storage family.
The component in this framework is available only if you have subscribed to one of the Talend solutions with Big Data.
Basic settings
Access Key |
Enter the access key ID that uniquely identifies an AWS Account. |
Access Secret |
Enter the secret access key, constituting the security credentials in combination with the access key. To enter the secret key, click the […] button next to the Access Secret field, then enter the key between double quotes in the dialog box that opens and click OK. |
Bucket name |
Enter the bucket name and the folder you need to use. You need to separate the bucket name and the folder name using a slash (/). |
Temp folder |
Enter the location of the temp folder in S3. This folder is automatically created if it does not exist at runtime. |
Use s3a filesystem |
Select this check box to use the S3A filesystem instead of S3N, the filesystem used by default. This feature is available only with certain Hadoop distributions used with Spark. |
Set region |
Select this check box and select the region to connect to. This feature is available only with certain Hadoop distributions used with Spark. |
Set endpoint |
Select this check box and, in the Endpoint field that is displayed, enter the Amazon region endpoint you need to use. If you leave this check box clear, the default endpoint is used. This feature is available only with certain Hadoop distributions used with Spark. |
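The properties above correspond to standard Hadoop filesystem configuration keys that Spark reads at runtime. As an illustration only (the component generates this configuration for you; the helper below is a hypothetical sketch, not part of the Talend API), the equivalent spark.hadoop.* properties can be assembled like this:

```python
def build_s3_spark_props(access_key, secret_key, use_s3a=False, endpoint=None):
    """Sketch of the spark.hadoop.* properties an S3 configuration
    component typically injects into a Spark Job (hypothetical helper)."""
    if use_s3a:
        # S3A credentials use the fs.s3a.* property names.
        props = {
            "spark.hadoop.fs.s3a.access.key": access_key,
            "spark.hadoop.fs.s3a.secret.key": secret_key,
        }
        if endpoint:
            # Corresponds to the Set endpoint check box.
            props["spark.hadoop.fs.s3a.endpoint"] = endpoint
    else:
        # S3N (the default) uses the older fs.s3n.* property names.
        props = {
            "spark.hadoop.fs.s3n.awsAccessKeyId": access_key,
            "spark.hadoop.fs.s3n.awsSecretAccessKey": secret_key,
        }
    return props
```

In a Job, these values come from the fields described above; you never set them by hand unless you are bypassing the component.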
Usage
Usage rule |
This component is used with no need to be connected to other components. You need to drop tS3Configuration along with the file system related subJob to be run in the same Job so that the configuration is used by the whole Job at runtime. This component, along with the Spark Batch component Palette it belongs to, appears only when you are creating a Spark Batch Job. Only one tS3Configuration component is allowed per Job. |
Spark Connection |
You need to use the Spark Configuration tab in the Run view to define the connection to a given Spark cluster for the whole Job. In addition, since the Job expects its dependent JAR files for execution, you must specify the directory in the file system to which these JAR files are transferred so that Spark can access them. This connection is effective on a per-Job basis. |
Limitation |
Due to license incompatibility, one or more JARs required to use this component are not provided. You can install the missing JARs for this particular component by clicking the Install button on the Component tab. |
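File system related components combine the bucket and folder from the Bucket name property into an S3N or S3A URI when they access S3. A minimal sketch of that assembly (a hypothetical helper for illustration, assuming the slash-separated "bucket/folder" form described above):

```python
def s3_uri(bucket_and_folder, use_s3a=False):
    """Build the S3N or S3A URI for a 'bucket/folder' string
    (hypothetical helper, not Talend code)."""
    scheme = "s3a" if use_s3a else "s3n"
    # Normalize stray leading/trailing slashes before prefixing the scheme.
    return f"{scheme}://{bucket_and_folder.strip('/')}/"
```

For example, a Bucket name of "mybucket/myfolder" with Use s3a filesystem selected would resolve to s3a://mybucket/myfolder/.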
Related scenarios
For a scenario about how to use the same type of component in a Spark Batch Job, see Writing and reading data from MongoDB using a Spark Batch Job.
tS3Configuration properties for Apache Spark Streaming
These properties are used to configure tS3Configuration running in the Spark Streaming Job framework.
The Spark Streaming tS3Configuration component belongs to the Storage family.
The component in this framework is available only if you have subscribed to Talend Real-time Big Data Platform or Talend Data Fabric.
Basic settings
Access Key |
Enter the access key ID that uniquely identifies an AWS Account. |
Access Secret |
Enter the secret access key, constituting the security credentials in combination with the access key. To enter the secret key, click the […] button next to the Access Secret field, then enter the key between double quotes in the dialog box that opens and click OK. |
Bucket name |
Enter the bucket name and the folder you need to use. You need to separate the bucket name and the folder name using a slash (/). |
Use s3a filesystem |
Select this check box to use the S3A filesystem instead of S3N, the filesystem used by default. This feature is available only with certain Hadoop distributions used with Spark. |
Set region |
Select this check box and select the region to connect to. This feature is available only with certain Hadoop distributions used with Spark. |
Set endpoint |
Select this check box and, in the Endpoint field that is displayed, enter the Amazon region endpoint you need to use. If you leave this check box clear, the default endpoint is used. This feature is available only with certain Hadoop distributions used with Spark. |
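The value expected by the Set endpoint field follows Amazon's regional S3 endpoint naming pattern. As an illustration of which host name corresponds to a given region (a hypothetical helper; note that us-east-1 historically resolves to the global endpoint):

```python
def region_endpoint(region):
    """Return the S3 endpoint host for an AWS region name
    (illustrative sketch, not Talend code)."""
    if region == "us-east-1":
        # us-east-1 uses the original global endpoint.
        return "s3.amazonaws.com"
    return f"s3.{region}.amazonaws.com"
```

For example, entering s3.eu-west-1.amazonaws.com in the Endpoint field targets the EU (Ireland) region.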
Usage
Usage rule |
This component is used with no need to be connected to other components. You need to drop tS3Configuration along with the file system related subJob to be run in the same Job so that the configuration is used by the whole Job at runtime. This component, along with the Spark Streaming component Palette it belongs to, appears only when you are creating a Spark Streaming Job. Note that in this documentation, unless otherwise explicitly stated, a scenario presents only Standard Jobs, that is to say traditional Talend data integration Jobs. |
Spark Connection |
You need to use the Spark Configuration tab in the Run view to define the connection to a given Spark cluster for the whole Job. In addition, since the Job expects its dependent JAR files for execution, you must specify the directory in the file system to which these JAR files are transferred so that Spark can access them. This connection is effective on a per-Job basis. |
Limitation |
Due to license incompatibility, one or more JARs required to use this component are not provided. You can install the missing JARs for this particular component by clicking the Install button on the Component tab. |
Related scenarios
For a scenario about how to use the same type of component in a Spark Streaming Job, see Reading and writing data in MongoDB using a Spark Streaming Job.