tAmazonEMRManage
tAmazonEMRManage Standard properties
These properties are used to configure tAmazonEMRManage
running in the Standard Job framework.
The Standard
tAmazonEMRManage component belongs to the Cloud family.
The component in this framework is available in all Talend
products.
Basic settings
Access key and Secret key |
Specify the access keys (the access key ID in the Access To enter the secret key, click the […] button next to |
Inherit credentials from AWS role |
Select this check box to leverage the instance profile credentials. The credentials can |
Assume role |
If you temporarily need some access permissions associated |
Action |
Select an action to be performed from the list, either
Start or Stop.
|
Region |
Specify the AWS region by selecting a region name from |
Cluster name |
Enter the name of the cluster. |
Cluster version |
Select the version of the cluster. You can also select the Customize Version and This property is not available when the Customize Version and |
Application |
Select the applications to be installed on the You can also select the Customize Version and This property is available when an EMR version is |
Service role |
Enter the IAM (Identity and Access Management) role for |
Job flow role |
Enter the IAM role for the EC2 instances that Amazon EMR |
Enable log |
Select this check box to enable logging and in the field |
Use EC2 key pair |
Select this check box to associate an Amazon EC2 (Elastic |
Predicate |
Specify the cluster(s) that you want to stop:
This list is available only when Stop is selected from the Action list. |
Instance count |
Enter the number of Amazon EC2 instances to |
Master instance type |
Select the type of the master instance to initialize. |
Slave instance type |
Select the type of the slave instance to initialize. |
Advanced settings
STS Endpoint |
Select this check box and in the field displayed, specify the This check box is available only when the Assume role check box is selected. |
Wait for cluster ready |
Select this check box to let your Job wait until the |
Visible to all users |
Select this check box to make the cluster visible to all |
Termination Protect |
Select this check box to enable termination protection to |
Enable debug |
Select this check box to enable the debug mode. |
Customize Version and Application |
Select this check box to customize the version of the cluster and the
|
Subnet id |
Specify the identifier of the Amazon VPC (Virtual Private |
Availability Zone |
Specify the availability zone for your cluster’s EC2 |
Master security group |
Specify the security group for the master instance. |
Additional master security groups |
Specify additional security groups for the master |
Slave security group |
Specify the security group for the slave instances. |
Additional slave security groups |
Specify additional security groups for the slave |
Service Access Security Group |
Specify the identifier of the Amazon EC2 security group for the Amazon For how to create a private subnet to enable service access security |
Actions |
Specify the bootstrap actions associated with the
For more information about the bootstrap actions, see |
Steps |
Specify the job flow step(s) to be invoked on the cluster
For more information about the job flow steps, see StepConfig. |
Keep alive after steps complete |
Select this check box to keep the job flow alive after |
Wait for steps to complete |
Select this check box to let your Job wait until the job This check box is available only when the Wait for cluster ready check box is |
Properties |
Specify the classification and property information
|
tStatCatcher Statistics |
Select this check box to gather the Job processing metadata at the Job level |
Global Variables
CLUSTER_FINAL_ID |
The ID of the cluster. This is an After variable and it returns a string. |
CLUSTER_FINAL_NAME |
The name of the cluster. This is an After variable and it returns a string. |
ERROR_MESSAGE |
The error message generated by the component when an error occurs. This is an After |
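The After variables listed above are read from globalMap once the component has finished. The following is a minimal sketch, not the component's own code: it assumes the key pattern `<component name>_<VARIABLE>` (the same pattern used by the tJava code in the scenario of this document), assumes the component is named tAmazonEMRManage_1, and uses a plain HashMap with hypothetical values in place of the globalMap that a running Job provides.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: in a real Job, globalMap is provided by the Job; a HashMap stands in for it here.
class EmrGlobalsSketch {

    // Builds the globalMap key, assuming the "<component>_<VARIABLE>" naming pattern.
    static String key(String componentName, String variable) {
        return componentName + "_" + variable;
    }

    // Returns the variable as a String, or the fallback when it has not been set.
    static String read(Map<String, Object> globalMap, String componentName,
                       String variable, String fallback) {
        Object value = globalMap.get(key(componentName, variable));
        return value == null ? fallback : value.toString();
    }

    public static void main(String[] args) {
        Map<String, Object> globalMap = new HashMap<>();
        // Hypothetical values, as they might look after tAmazonEMRManage_1 has finished.
        globalMap.put("tAmazonEMRManage_1_CLUSTER_FINAL_ID", "j-EXAMPLE");
        globalMap.put("tAmazonEMRManage_1_CLUSTER_FINAL_NAME", "talend-doc-emr-cluster");

        String component = "tAmazonEMRManage_1"; // assumed component name
        System.out.println("Cluster ID: " + read(globalMap, component, "CLUSTER_FINAL_ID", "<not set>"));
        System.out.println("Cluster name: " + read(globalMap, component, "CLUSTER_FINAL_NAME", "<not set>"));
        // ERROR_MESSAGE was never put into the map, so the fallback is printed.
        System.out.println("Error: " + read(globalMap, component, "ERROR_MESSAGE", "<none>"));
    }
}
```

Reading through a fallback avoids a NullPointerException in downstream code when a variable such as ERROR_MESSAGE was never set.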
Usage
Usage rule |
tAmazonEMRManage is usually used |
Managing an Amazon EMR cluster
Creating an Amazon EMR cluster management Job
Create a Job to start a new Amazon EMR cluster, then resize the
cluster, and finally list the ID and name information of the instance groups in the
cluster.
- Create a new Job and add a tAmazonEMRManage component, a tAmazonEMRResize component, a tAmazonEMRListInstances component, and a tJava component by typing their names in the design workspace or dropping them from the Palette.
- Link the tAmazonEMRManage component to the tAmazonEMRResize component using a Trigger > OnSubjobOk connection.
- Link the tAmazonEMRResize component to the tAmazonEMRListInstances component using a Trigger > OnSubjobOk connection.
- Link the tAmazonEMRListInstances component to the tJava component using a Row > Iterate connection.
Starting a new Amazon EMR cluster
Configure the tAmazonEMRManage component to start a new Amazon EMR cluster.
- Double-click the tAmazonEMRManage component to open its Basic settings view.
- In the Access Key and Secret Key fields, enter the authentication credentials required to access Amazon S3.
- From the Action list, select Start to start a cluster.
- Select the AWS region from the Region drop-down list. In this example, it is Asia Pacific (Tokyo).
- In the Cluster name field, enter the name of the cluster to be started. In this example, it is talend-doc-emr-cluster.
- From the Cluster version and Application drop-down lists, select the version of the cluster and the application to be installed on the cluster.
- Select the Enable log check box and in the field displayed, specify the path to a folder in an S3 bucket where you want Amazon EMR to write the log data. In this example, it is s3://talend-doc-emr-bucket.
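The log path entered for Enable log points to a location in an S3 bucket. As a minimal sketch, assuming you want to check such a path in your own code before using it, the following hypothetical helper verifies the s3:// scheme and splits the path into bucket and folder. The class and method names are illustrative only and are not part of the component.

```java
import java.net.URI;

// Sketch only: a hypothetical helper for checking an S3 log path; not part of tAmazonEMRManage.
class LogPathSketch {

    // Returns the bucket name of an s3:// path, or throws if the path is not an S3 path.
    static String bucketOf(String logPath) {
        URI uri = URI.create(logPath);
        if (!"s3".equalsIgnoreCase(uri.getScheme()) || uri.getAuthority() == null) {
            throw new IllegalArgumentException("Not an s3:// path: " + logPath);
        }
        return uri.getAuthority();
    }

    // Returns the folder part without the leading slash, or an empty string for the bucket root.
    static String folderOf(String logPath) {
        String path = URI.create(logPath).getPath();
        return (path == null || path.isEmpty()) ? "" : path.substring(1);
    }

    public static void main(String[] args) {
        String logPath = "s3://talend-doc-emr-bucket"; // value used in this example
        System.out.println("Bucket: " + bucketOf(logPath));
        System.out.println("Folder: '" + folderOf(logPath) + "'");
    }
}
```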
Resizing the Amazon EMR cluster by adding a new task instance group
Configure the tAmazonEMRResize
component to resize a running Amazon EMR cluster by adding a new task instance
group.
- Double-click the tAmazonEMRResize component to open its Basic settings view.
- In the Access Key and Secret Key fields, enter the authentication credentials required to access Amazon S3.
- From the Action drop-down list, select Add task instance group to resize the cluster by adding a new task instance group.
- In the Cluster id field, enter the ID of the cluster to be resized. In this example, the returned value of the global variable CLUSTER_FINAL_ID of the previous tAmazonEMRManage component is used. Note that you can retrieve the global variable by pressing Ctrl + Space and selecting the relevant global variable from the list.
- In the Group name field, enter the name of the task instance group to be added in the cluster. In this example, it is talend-doc-instance-group.
- In the Instance count field, specify the number of the instances to be created.
- From the Task instance type drop-down list, select the type of the instances to be created.
Listing the instance groups in the Amazon EMR cluster
Configure the tAmazonEMRListInstances component to list the instance groups in a running cluster.
- Double-click the tAmazonEMRListInstances component to open its Basic settings view.
- In the Access Key and Secret Key fields, enter the authentication credentials required to access Amazon S3.
- Select the AWS region from the Region drop-down list. In this example, it is Asia Pacific (Tokyo).
- Clear the Filter master and core instances check box to list all instance groups, including the Master, Core, and Task type instance groups.
- In the Cluster id field, enter the ID of the cluster for which to list the instance groups. In this example, the returned value of the global variable CLUSTER_FINAL_ID of the previous tAmazonEMRManage component is used.
- Double-click the tJava component to open its Basic settings view.
- In the Code field, enter the following code to print the ID and Name information of each instance group in the cluster.

    System.out.println("===== Instance Group =====");
    System.out.println("Instance Group ID: " + (String)globalMap.get("tAmazonEMRListInstances_1_CURRENT_GROUP_ID"));
    System.out.println("Instance Group Name: " + (String)globalMap.get("tAmazonEMRListInstances_1_CURRENT_GROUP_NAME"));
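Because tAmazonEMRListInstances is linked to tJava with a Row > Iterate connection, the tJava code is expected to run once per instance group, with the CURRENT_GROUP_ID and CURRENT_GROUP_NAME entries updated for each iteration. The following standalone sketch only simulates that behavior under this assumption: a HashMap stands in for globalMap, a loop stands in for the Iterate connection, and the group IDs and names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch only: simulates how the tJava code sees one instance group per iteration.
class InstanceGroupIterationSketch {

    // Same lookups as the tJava code above, returned as one line instead of printed.
    static String describe(Map<String, Object> globalMap) {
        String id = (String) globalMap.get("tAmazonEMRListInstances_1_CURRENT_GROUP_ID");
        String name = (String) globalMap.get("tAmazonEMRListInstances_1_CURRENT_GROUP_NAME");
        return "Instance Group ID: " + id + ", Instance Group Name: " + name;
    }

    // Stand-in for the Iterate connection: each {id, name} pair overwrites the current values.
    static List<String> simulate(String[][] groups) {
        Map<String, Object> globalMap = new HashMap<>();
        List<String> lines = new ArrayList<>();
        for (String[] group : groups) {
            globalMap.put("tAmazonEMRListInstances_1_CURRENT_GROUP_ID", group[0]);
            globalMap.put("tAmazonEMRListInstances_1_CURRENT_GROUP_NAME", group[1]);
            lines.add(describe(globalMap));
        }
        return lines;
    }

    public static void main(String[] args) {
        // Hypothetical instance groups; real IDs and names come from the cluster.
        String[][] groups = {
            {"ig-EXAMPLE1", "Master instance group"},
            {"ig-EXAMPLE2", "Core instance group"},
            {"ig-EXAMPLE3", "talend-doc-instance-group"}
        };
        for (String line : simulate(groups)) {
            System.out.println("===== Instance Group =====");
            System.out.println(line);
        }
    }
}
```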
Executing the Job to manage the Amazon EMR cluster
After setting up the Job and configuring the components used to manage the Amazon EMR cluster, execute the Job and verify the execution result.
- Press Ctrl + S to save the Job and then F6 to execute the Job. The Job starts and resizes the Amazon EMR cluster, and then lists all instance groups in the cluster.
- View the cluster details on the Amazon EMR Cluster List page to validate the Job execution result.