tAmazonEMRManage
tAmazonEMRManage Standard properties
These properties are used to configure tAmazonEMRManage running in the Standard Job framework.
The Standard tAmazonEMRManage component belongs to the Cloud family.
The component in this framework is generally available.
Basic settings
Access key and Secret key |
Specify the access keys (the access key ID in the Access To enter the secret key, click the […] button next to |
Inherit credentials from AWS role |
Select this check box to leverage the instance profile credentials. These |
Assume role |
Select this check box and specify the values for the following parameters used to
create a new assumed role session.
For more information about assuming roles, see AssumeRole. |
Action |
Select an action to be performed from the list, either Start or Stop. |
Region |
Specify the AWS region by selecting a region name from the list or |
Cluster name |
Enter the name of the cluster. |
Cluster version |
Select the version of the cluster. |
Application |
Select the applications to be installed on the cluster. This list is available only when an EMR version is selected from the |
Service role |
Enter the IAM (Identity and Access Management) role for the Amazon EMR |
Job flow role |
Enter the IAM role for the EC2 instances that Amazon EMR manages. The |
Enable log |
Select this check box to enable logging and in the field displayed |
Use EC2 key pair |
Select this check box to associate an Amazon EC2 (Elastic Compute |
Predicate |
Specify the cluster(s) that you want to stop:
This list is available only when Stop |
Instance count |
Enter the number of Amazon EC2 instances to initialize. |
Master instance type |
Select the type of the master instance to initialize. |
Slave instance type |
Select the type of the slave instance to initialize. |
Advanced settings
STS Endpoint |
Select this check box and in the field displayed, specify the AWS Security Token This check box is available only when the Assume |
Wait for cluster ready |
Select this check box to let your Job wait until the launch of the |
Visible to all users |
Select this check box to make the cluster visible to all IAM |
Termination Protect |
Select this check box to enable termination protection to prevent |
Enable debug |
Select this check box to enable the debug mode. |
Subnet id |
Specify the identifier of the Amazon VPC (Virtual Private Cloud) |
Availability Zone |
Specify the availability zone for your cluster’s EC2 instances. |
Master security group |
Specify the security group for the master instance. |
Additional master security groups |
Specify additional security groups for the master instance and |
Slave security group |
Specify the security group for the slave instances. |
Additional slave security groups |
Specify additional security groups for the slave instances and |
Actions |
Specify the bootstrap actions associated with the cluster, by clicking
For more information about the bootstrap actions, see BootstrapActionConfig. |
Steps |
Specify the job flow step(s) to be invoked on the cluster after its
For more information about the job flow steps, see StepConfig. |
Keep alive after steps complete |
Select this check box to keep the job flow alive after completing all |
Wait for steps to complete |
Select this check box to let your Job wait until the job flow steps This check box is available only when the Wait |
Properties |
Specify the classification and property information supplied to the
|
tStatCatcher Statistics |
Select this check box to gather the Job processing metadata at the Job |
Global Variables
Global Variables |
CLUSTER_FINAL_ID: the ID of the cluster. This is an After
CLUSTER_FINAL_NAME: the name of the cluster. This is an
ERROR_MESSAGE: the error message generated by the A Flow variable functions during the execution of a component while an After variable To fill up a field or expression with a variable, press Ctrl + For further information about variables, see |
Usage
Usage rule |
tAmazonEMRManage is usually used as a |
Managing an Amazon EMR cluster
Creating an Amazon EMR cluster management Job
Create a Job to start a new Amazon EMR cluster, then resize the
cluster, and finally list the ID and name information of the instance groups in the
cluster.

- Create a new Job and add a tAmazonEMRManage component, a tAmazonEMRResize component, a tAmazonEMRListInstances component, and a tJava component by typing their names in the design workspace or dropping them from the Palette.
- Link the tAmazonEMRManage component to the tAmazonEMRResize component using a Trigger > OnSubjobOk connection.
- Link the tAmazonEMRResize component to the tAmazonEMRListInstances component using a Trigger > OnSubjobOk connection.
- Link the tAmazonEMRListInstances component to the tJava component using a Row > Iterate connection.
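The OnSubjobOk links chain the subJobs so that each one starts only after the previous one has finished without error. The following is a rough stand-alone illustration of that control flow in plain Java. It is not Talend-generated code, and the class name `SubjobChainSketch` and method `runChain` are hypothetical names chosen for this sketch.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;

public class SubjobChainSketch {

    /**
     * Runs the named steps in insertion order and stops at the first one
     * that throws, mimicking a chain of OnSubjobOk triggers.
     * Returns the names of the steps that completed successfully.
     */
    public static List<String> runChain(Map<String, Callable<Void>> steps) {
        List<String> completed = new ArrayList<>();
        for (Map.Entry<String, Callable<Void>> step : steps.entrySet()) {
            try {
                step.getValue().call();
                completed.add(step.getKey());
            } catch (Exception e) {
                // A failed step does not fire the "ok" trigger, so the chain ends here.
                break;
            }
        }
        return completed;
    }

    public static void main(String[] args) {
        // Placeholder steps that do nothing; a real Job would start, resize, and list here.
        Map<String, Callable<Void>> steps = new LinkedHashMap<>();
        steps.put("tAmazonEMRManage", () -> null);
        steps.put("tAmazonEMRResize", () -> null);
        steps.put("tAmazonEMRListInstances", () -> null);
        System.out.println(runChain(steps));
    }
}
```

In this sketch, a failure while resizing would mean the listing step never runs, which is the behavior the OnSubjobOk connections are meant to give the Job.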
Starting a new Amazon EMR cluster
Configure the tAmazonEMRManage component to start a new Amazon EMR cluster.
- Double-click the tAmazonEMRManage component to open its Basic settings view.
- In the Access Key and Secret Key fields, enter the authentication credentials required to access Amazon S3.
- From the Action list, select Start to start a cluster.
- Select the AWS region from the Region drop-down list. In this example, it is Asia Pacific (Tokyo).
- In the Cluster name field, enter the name of the cluster to be started. In this example, it is talend-doc-emr-cluster.
- From the Cluster version and Application drop-down lists, select the version of the cluster and the application to be installed on the cluster.
- Select the Enable log check box and in the field displayed, specify the path to a folder in an S3 bucket where you want Amazon EMR to write the log data. In this example, it is s3://talend-doc-emr-bucket.
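The log path entered for Enable log is an `s3://` URI. As a minimal sketch, assuming you want to sanity-check such a value before using it, the following plain Java splits the URI into its bucket and folder parts. The class `LogUriSketch` and its methods are hypothetical helpers, not part of Talend or the AWS SDK, and the check is purely syntactic: it does not contact AWS or verify that the bucket exists.

```java
import java.net.URI;

public class LogUriSketch {

    /** Returns the bucket name of an s3:// URI, or throws if the value is not one. */
    public static String bucketOf(String logUri) {
        URI uri = URI.create(logUri);
        if (!"s3".equalsIgnoreCase(uri.getScheme()) || uri.getAuthority() == null) {
            throw new IllegalArgumentException("Not an s3:// URI: " + logUri);
        }
        return uri.getAuthority();
    }

    /** Returns the folder part of the URI; empty when the URI points at the bucket root. */
    public static String folderOf(String logUri) {
        String path = URI.create(logUri).getPath();
        // Drop the leading slash so "s3://bucket/logs" yields "logs".
        return path == null ? "" : path.replaceFirst("^/", "");
    }

    public static void main(String[] args) {
        String logUri = "s3://talend-doc-emr-bucket";
        System.out.println("Bucket: " + bucketOf(logUri));
        System.out.println("Folder is bucket root: " + folderOf(logUri).isEmpty());
    }
}
```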
Resizing the Amazon EMR cluster by adding a new task instance group
Configure the tAmazonEMRResize
component to resize a running Amazon EMR cluster by adding a new task instance
group.
- Double-click the tAmazonEMRResize component to open its Basic settings view.
- In the Access Key and Secret Key fields, enter the authentication credentials required to access Amazon S3.
- From the Action drop-down list, select Add task instance group to resize the cluster by adding a new task instance group.
- In the Cluster id field, enter the ID of the cluster to be resized. In this example, the returned value of the global variable CLUSTER_FINAL_ID of the previous tAmazonEMRManage component is used. Note that you can retrieve the global variable by pressing Ctrl + Space and selecting the relevant global variable from the list.
- In the Group name field, enter the name of the task instance group to be added in the cluster. In this example, it is talend-doc-instance-group.
- In the Instance count field, specify the number of the instances to be created.
- From the Task instance type drop-down list, select the type of the instances to be created.
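Global variables such as CLUSTER_FINAL_ID are read from the Job's globalMap under a key that combines the component's unique name with the variable name. The sketch below illustrates that lookup in plain Java, with a `HashMap` standing in for globalMap. The component name `tAmazonEMRManage_1` is an assumption (the default name of the first such component in a Job), the cluster ID `j-EXAMPLE` is a made-up value, and the class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

public class ClusterIdLookupSketch {

    /** Builds the globalMap key for a component variable, e.g. tAmazonEMRManage_1_CLUSTER_FINAL_ID. */
    public static String key(String componentName, String variable) {
        return componentName + "_" + variable;
    }

    /** Reads the cluster ID and fails fast when the upstream component has not set it. */
    public static String clusterId(Map<String, Object> globalMap, String componentName) {
        Object value = globalMap.get(key(componentName, "CLUSTER_FINAL_ID"));
        if (value == null) {
            throw new IllegalStateException(componentName + " has not set CLUSTER_FINAL_ID");
        }
        return (String) value;
    }

    public static void main(String[] args) {
        // Stand-in for the Job's globalMap, pre-filled with a made-up cluster ID.
        Map<String, Object> globalMap = new HashMap<>();
        globalMap.put("tAmazonEMRManage_1_CLUSTER_FINAL_ID", "j-EXAMPLE");
        System.out.println(clusterId(globalMap, "tAmazonEMRManage_1"));
    }
}
```

The explicit null check is a design choice for the sketch: it surfaces a missing value with a clear message when the upstream component has not yet set the variable.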
Listing the instance groups in the Amazon EMR cluster
Configure the tAmazonEMRListInstances component and the tJava component to list the instance groups in a running cluster.
- Double-click the tAmazonEMRListInstances component to open its Basic settings view.
- In the Access Key and Secret Key fields, enter the authentication credentials required to access Amazon S3.
- Select the AWS region from the Region drop-down list. In this example, it is Asia Pacific (Tokyo).
- Clear the Filter master and core instances check box to list all instance groups, including the Master, Core, and Task type instance groups.
- In the Cluster id field, enter the ID of the cluster for which to list the instance groups. In this example, the returned value of the global variable CLUSTER_FINAL_ID of the previous tAmazonEMRManage component is used.
- Double-click the tJava component to open its Basic settings view.
- In the Code field, enter the following code to print the ID and Name information of each instance group in the cluster.

    System.out.println("===== Instance Group =====");
    System.out.println("Instance Group ID: " + (String)globalMap.get("tAmazonEMRListInstances_1_CURRENT_GROUP_ID"));
    System.out.println("Instance Group Name: " + (String)globalMap.get("tAmazonEMRListInstances_1_CURRENT_GROUP_NAME"));
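Because tJava is attached through a Row > Iterate connection, its code runs once per instance group, and the CURRENT_GROUP_ID and CURRENT_GROUP_NAME entries hold the values for the group being processed. The following stand-alone simulation sketches that behavior outside Talend. The `HashMap` stands in for globalMap, the group IDs are made-up placeholders, and the class `IterateSketch` and method `describeCurrentGroup` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class IterateSketch {

    private static final String ID_KEY = "tAmazonEMRListInstances_1_CURRENT_GROUP_ID";
    private static final String NAME_KEY = "tAmazonEMRListInstances_1_CURRENT_GROUP_NAME";

    /** Builds the same three lines that the tJava code prints for the current group. */
    public static List<String> describeCurrentGroup(Map<String, Object> globalMap) {
        List<String> lines = new ArrayList<>();
        lines.add("===== Instance Group =====");
        lines.add("Instance Group ID: " + (String) globalMap.get(ID_KEY));
        lines.add("Instance Group Name: " + (String) globalMap.get(NAME_KEY));
        return lines;
    }

    public static void main(String[] args) {
        // Made-up groups standing in for what the list component would return.
        String[][] groups = {
            {"ig-EXAMPLE1", "Master"},
            {"ig-EXAMPLE2", "talend-doc-instance-group"}
        };
        Map<String, Object> globalMap = new HashMap<>();
        for (String[] group : groups) {
            // Each iteration overwrites the CURRENT_* entries before the print code runs.
            globalMap.put(ID_KEY, group[0]);
            globalMap.put(NAME_KEY, group[1]);
            describeCurrentGroup(globalMap).forEach(System.out::println);
        }
    }
}
```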
Executing the Job to manage the Amazon EMR cluster
After setting up the Job and configuring the components used to manage the Amazon EMR cluster, execute the Job and verify the result.
- Press Ctrl + S to save the Job and then F6 to execute the Job. The Job starts and resizes the Amazon EMR cluster, and then lists all instance groups in the cluster.
- View the cluster details on the Amazon EMR Cluster List page to validate the Job execution result.