July 30, 2023

tAmazonEMRManage – Docs for ESB 7.x

tAmazonEMRManage

Launches or terminates a cluster on Amazon EMR (Elastic MapReduce).

tAmazonEMRManage Standard properties

These properties are used to configure tAmazonEMRManage
running in the Standard Job framework.

The Standard tAmazonEMRManage component belongs to the Cloud family.

The component in this framework is available in all Talend products.

Basic settings

Access key and Secret key

Specify the access keys (the access key ID in the Access Key field and
the secret access key in the Secret Key field) required to access Amazon
Web Services. For more information on AWS access keys, see Access keys
(access key ID and secret access key).

To enter the secret key, click the […] button next to
the Secret Key field, and then in the pop-up dialog box enter the secret key between double
quotes and click OK to save the settings.

Inherit credentials from AWS role

Select this check box to leverage the instance profile credentials. The credentials can
be used on Amazon EC2 instances or AWS ECS, and are delivered through the Amazon EC2
metadata service. To use this option, your Job must be running within Amazon EC2 or
other services that can leverage IAM Roles for access to resources. For more
information, see Using an IAM Role to Grant Permissions to Applications Running on
Amazon EC2 Instances.

Assume role

If you temporarily need access permissions associated
with an AWS IAM role that is not granted to your user account, select this check box to
assume that role. Then specify the values for the following parameters to create a new
assumed role session.

Action

Select an action to be performed from the list, either
Start or Stop.

  • Start: launch an
    Amazon EMR cluster.

  • Stop: terminate an
    Amazon EMR cluster.

Region

Specify the AWS region by selecting a region name from
the list or entering a region between double quotation marks (for
example, "us-east-1"). For more information about
how to specify the AWS region, see Choose an AWS Region.

Cluster name

Enter the name of the cluster.

Cluster version

Select the version of the cluster.

You can also select the Customize Version and Application check box on
the Advanced settings view to customize the cluster version information.

This property is not available when the Customize Version and
Application check box is selected.

Application

Select the applications to be installed on the
cluster.

You can also select the Customize Version and Application check box on
the Advanced settings view to customize the application information.

This property is available when an EMR version is selected from the
Cluster version list and the Customize Version and Application check box
is cleared.

Service role

Enter the IAM (Identity and Access Management) role for
the Amazon EMR service. The default role is EMR_DefaultRole. To use this default role, you must have
already created it.

Job flow role

Enter the IAM role for the EC2 instances that Amazon EMR
manages. The default role is EMR_EC2_DefaultRole.
To use this default role, you must have already created it.

Enable log

Select this check box to enable logging and in the field
displayed specify the path to a folder in an S3 bucket where you want
Amazon EMR to write the log data.

Use EC2 key pair

Select this check box to associate an Amazon EC2 (Elastic
Compute Cloud) key pair with the cluster and in the field displayed
enter the name of your EC2 key pair.

Predicate

Specify the cluster(s) that you want to stop:

  • All running clusters:
    all running clusters will be stopped.

  • All running clusters with predefined name: all running clusters
    with the given name will be stopped. In the Cluster name field
    displayed, you need to specify the name of the clusters to be
    stopped.

  • Running cluster with predefined id: the running cluster with the
    given ID will be stopped. In the Cluster id field displayed, you
    need to specify the ID of the cluster to be stopped.

This list is available only when Stop is selected from the Action list.
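
The three Predicate options can be read as three filters over the running clusters. The following is a minimal sketch of that selection logic only, not the component's implementation: the StopOption enum, the Cluster class, the selectClustersToStop method, and the cluster IDs are all hypothetical names and values invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class StopPredicateSketch {

    // Hypothetical mirror of the three options in the Predicate list.
    enum StopOption { ALL_RUNNING, RUNNING_WITH_NAME, RUNNING_WITH_ID }

    // Hypothetical, minimal view of a cluster: ID, name, and whether it is running.
    static final class Cluster {
        final String id;
        final String name;
        final boolean running;

        Cluster(String id, String name, boolean running) {
            this.id = id;
            this.name = name;
            this.running = running;
        }
    }

    // Returns the IDs of the clusters that the chosen option would stop.
    static List<String> selectClustersToStop(List<Cluster> clusters, StopOption option, String nameOrId) {
        List<String> ids = new ArrayList<>();
        for (Cluster c : clusters) {
            if (!c.running) {
                continue; // every option only considers running clusters
            }
            boolean match;
            switch (option) {
                case ALL_RUNNING:
                    match = true;
                    break;
                case RUNNING_WITH_NAME:
                    match = c.name.equals(nameOrId);
                    break;
                default: // RUNNING_WITH_ID
                    match = c.id.equals(nameOrId);
                    break;
            }
            if (match) {
                ids.add(c.id);
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        List<Cluster> clusters = new ArrayList<>();
        clusters.add(new Cluster("j-AAA", "talend-doc-emr-cluster", true));
        clusters.add(new Cluster("j-BBB", "other-cluster", true));
        clusters.add(new Cluster("j-CCC", "talend-doc-emr-cluster", false));

        System.out.println(selectClustersToStop(clusters, StopOption.ALL_RUNNING, null));
        System.out.println(selectClustersToStop(clusters, StopOption.RUNNING_WITH_NAME, "talend-doc-emr-cluster"));
        System.out.println(selectClustersToStop(clusters, StopOption.RUNNING_WITH_ID, "j-BBB"));
    }
}
```

In this sketch a cluster that is not running is never selected, whichever option is chosen, which matches the wording of all three options.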

Instance count

Enter the number of Amazon EC2 instances to
initialize.

Master instance type

Select the type of the master instance to initialize.

Slave instance type

Select the type of the slave instance to initialize.

Advanced settings

STS Endpoint

Select this check box and in the field displayed, specify the
AWS Security Token Service endpoint, for example, sts.amazonaws.com, where session credentials are retrieved from.

This check box is available only when the Assume role check box is selected.

Wait for cluster ready

Select this check box to let your Job wait until the
launch of the cluster is completed.
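
Waiting for a cluster launch is, conceptually, a polling loop over the cluster state. The sketch below illustrates that idea under stated assumptions and is not how the component is known to implement it: the waitUntilReady method, the polling interval, and the poll limit are hypothetical, and treating the Amazon EMR states RUNNING and WAITING as "ready" is an assumption. The state source is simulated so that the example runs without AWS access.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.function.Supplier;

public class WaitForClusterSketch {

    // Polls a state source until the cluster is usable, has terminated, or maxPolls is reached.
    // Returns true only if the cluster reached a state treated here as "ready".
    static boolean waitUntilReady(Supplier<String> stateSource, long pollMillis, int maxPolls) {
        for (int i = 0; i < maxPolls; i++) {
            String state = stateSource.get();
            if (state.equals("RUNNING") || state.equals("WAITING")) {
                return true; // assumption: these states count as "ready"
            }
            if (state.startsWith("TERMINAT")) {
                return false; // TERMINATING, TERMINATED, TERMINATED_WITH_ERRORS
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt and give up
                return false;
            }
        }
        return false; // still not ready after maxPolls checks
    }

    public static void main(String[] args) {
        // Simulated sequence of states; a real Job would query Amazon EMR instead.
        Iterator<String> states = Arrays.asList("STARTING", "BOOTSTRAPPING", "WAITING").iterator();
        boolean ready = waitUntilReady(states::next, 10L, 10);
        System.out.println("Cluster ready: " + ready);
    }
}
```

The poll limit keeps the loop from blocking forever if the cluster never leaves a startup state.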

Visible to all users

Select this check box to make the cluster visible to all
IAM users.

Termination Protect

Select this check box to enable termination protection to
prevent instances in the cluster from shutting down due to errors or
issues during processing.

Enable debug

Select this check box to enable the debug mode.

Customize Version and Application

Select this check box to customize the version of the cluster and the
applications to be installed on the cluster.

  • Cluster version: enter the version of the
    cluster.

  • Applications: click the
    [+] button below the table to add as
    many rows as needed, each row for an application, and specify
    the application by clicking the right side of the cell and
    selecting the application from the drop-down list displayed, or
    just entering the application name in the cell if it is not in
    the list.

Subnet id

Specify the identifier of the Amazon VPC (Virtual Private
Cloud) subnet where you want the job flow to launch.

Availability Zone

Specify the availability zone for your cluster’s EC2
instances.

Master security group

Specify the security group for the master instance.

Additional master security groups

Specify additional security groups for the master
instance and separate them with a comma, for example, gname1, gname2, gname3.

Slave security group

Specify the security group for the slave instances.

Additional slave security groups

Specify additional security groups for the slave
instances and separate them with a comma, for example, gname1, gname2, gname3.

Service Access Security Group

Specify the identifier of the Amazon EC2 security group for the Amazon
EMR service to access clusters in a VPC private subnet.

For information on how to create a private subnet to enable a service access
security group on Amazon EMR, see Scenario 2: VPC with Public and
Private Subnets (NAT).

Actions

Specify the bootstrap actions associated with the
cluster, by clicking the [+]
button below the table to add as many rows as needed, each row for a
bootstrap action, and setting the following parameters for each
action:

  • Name: enter the name of
    the bootstrap action.

  • Script location: specify
    the location of the script run by the bootstrap action, for
    example, s3://ap-northeast-1.elasticmapreduce/bootstrap-actions/run-if.

  • Arguments: enter the list
    of command line arguments (separated by commas) passed to the
    bootstrap action script, for example, "arg0","arg1","arg2".

For more information about the bootstrap actions, see
BootstrapActionConfig.

Steps

Specify the job flow step(s) to be invoked on the cluster
after its launch, by clicking the [+] button below the table to add as many rows as
needed, each row for a step, and setting the following parameters for
each step:

  • Name: enter the name of
    the job flow step.

  • Action on Failure: click
    the cell and from the drop-down list select the action to take
    if the job flow step fails.

  • Main Class: enter the
    name of the main class in the specified JAR file. If not
    specified, the JAR file should specify a Main-Class in its
    manifest file.

  • Jar: enter the path to
    the JAR file run during the step, for example, "s3://inputjar/test.jar".

  • Args: enter the list of
    command line arguments (separated by commas) passed to the JAR
    file’s main function when executed, for example, "arg0","arg1","arg2".

For more information about the job flow steps, see StepConfig.
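
The Main Class rule above (use the value entered, otherwise fall back to the Main-Class entry of the JAR manifest) can be checked locally before a JAR is uploaded. The following sketch uses the standard java.util.jar API to read a manifest. The resolveMainClass and parse helpers and the class name com.example.WordCount are hypothetical, and the fallback logic only illustrates the rule as described here rather than Amazon EMR's actual code.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

public class MainClassSketch {

    // Returns the explicit Main Class value if one is given; otherwise the
    // Main-Class attribute of the manifest, or null if neither is present.
    static String resolveMainClass(String mainClassField, Manifest manifest) {
        if (mainClassField != null && !mainClassField.trim().isEmpty()) {
            return mainClassField.trim();
        }
        return manifest.getMainAttributes().getValue(Attributes.Name.MAIN_CLASS);
    }

    // Parses manifest text; used here so the example needs no JAR file on disk.
    static Manifest parse(String text) {
        try {
            return new Manifest(new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8)));
        } catch (IOException e) {
            throw new IllegalArgumentException("Invalid manifest", e);
        }
    }

    public static void main(String[] args) {
        Manifest manifest = parse("Manifest-Version: 1.0\nMain-Class: com.example.WordCount\n\n");

        // Main Class left empty: the manifest entry is used.
        System.out.println(resolveMainClass("", manifest));

        // Main Class filled in: the explicit value takes precedence.
        System.out.println(resolveMainClass("com.example.Other", manifest));
    }
}
```

For a JAR file on the local disk, the manifest can be obtained with new java.util.jar.JarFile(path).getManifest() instead of the parse helper.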

Keep alive after steps complete

Select this check box to keep the job flow alive after
completing all steps.

Wait for steps to complete

Select this check box to let your Job wait until the job
flow steps are completed.

This check box is available only when the Wait for cluster ready check box is
selected.

Properties

Specify the classification and property information
supplied to the configuration object of the EMR cluster to be created,
by clicking the [+] button below
the table to add as many rows as needed, each row for a property, and
setting the following parameters:

  • Classification: specify
    the classification of the configuration.

  • Key: enter the key of the
    property.

  • Value: enter the value of
    the property.

tStatCatcher Statistics

Select this check box to gather the Job processing metadata at the Job level
as well as at each component level.

Global Variables

CLUSTER_FINAL_ID

The ID of the cluster. This is an After variable and it returns a string.

CLUSTER_FINAL_NAME

The name of the cluster. This is an After variable and it returns a string.

ERROR_MESSAGE

The error message generated by the component when an error occurs. This is an After
variable and it returns a string.
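
As a hedged illustration of how these After variables are typically read later in a Job: Talend Jobs keep component variables in a map named globalMap, under keys of the form <component unique name>_<VARIABLE>. The sketch below assumes the component is named tAmazonEMRManage_1 and replaces globalMap with an ordinary HashMap holding made-up values so that it runs on its own. The class name and the key helper are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

public class GlobalVariablesSketch {

    // Builds the globalMap key for a component variable: <component unique name>_<VARIABLE>.
    static String key(String componentName, String variable) {
        return componentName + "_" + variable;
    }

    public static void main(String[] args) {
        // Stand-in for the globalMap that a Talend Job provides; the values are made up.
        Map<String, Object> globalMap = new HashMap<>();
        globalMap.put("tAmazonEMRManage_1_CLUSTER_FINAL_ID", "j-EXAMPLE");
        globalMap.put("tAmazonEMRManage_1_CLUSTER_FINAL_NAME", "talend-doc-emr-cluster");

        // The values are stored as Object, so they are cast back to String when read.
        String clusterId = (String) globalMap.get(key("tAmazonEMRManage_1", "CLUSTER_FINAL_ID"));
        String clusterName = (String) globalMap.get(key("tAmazonEMRManage_1", "CLUSTER_FINAL_NAME"));

        System.out.println("Cluster ID: " + clusterId);
        System.out.println("Cluster name: " + clusterName);
    }
}
```

Inside a Job, the equivalent expression for a component field would be ((String)globalMap.get("tAmazonEMRManage_1_CLUSTER_FINAL_ID")), again assuming that component name. Because these are After variables, the values are only set once the component has finished running.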

Usage

Usage rule

tAmazonEMRManage is usually used
as a standalone component.

Managing an Amazon EMR cluster

Here’s an example of using Talend components to manage an Amazon EMR
cluster.

Creating an Amazon EMR cluster management Job

Create a Job to start a new Amazon EMR cluster, then resize the
cluster, and finally list the ID and name information of the instance groups in the
cluster.

tAmazonEMRManage_1.png

  1. Create a new Job and add a tAmazonEMRManage component, a tAmazonEMRResize component, a tAmazonEMRListInstances component, and a tJava component by typing their names in the design workspace or
    dropping them from the Palette.
  2. Link the tAmazonEMRManage component to
    the tAmazonEMRResize component using a Trigger > OnSubjobOk
    connection.
  3. Link the tAmazonEMRResize component to
    the tAmazonEMRListInstances component using a
    Trigger > OnSubjobOk connection.
  4. Link the tAmazonEMRListInstances
    component to the tJava component using a Row > Iterate
    connection.

Starting a new Amazon EMR cluster

Configure the tAmazonEMRManage
component to start a new Amazon EMR cluster.

  1. Double-click the tAmazonEMRManage
    component to open its Basic settings view.

    tAmazonEMRManage_2.png

  2. In the Access Key and Secret Key fields, enter the authentication credentials
    required to access Amazon Web Services.
  3. From the Action list, select Start to start a cluster.
  4. Select the AWS region from the Region
    drop-down list. In this example, it is Asia Pacific (Tokyo).
  5. In the Cluster name field, enter the
    name of the cluster to be started. In this example, it is
    talend-doc-emr-cluster.
  6. From the Cluster version and Application drop-down lists, select the version of the
    cluster and the application to be installed on the cluster.
  7. Select the Enable log check box and in
    the field displayed, specify the path to a folder in an S3 bucket where you want
    Amazon EMR to write the log data. In this example, it is
    s3://talend-doc-emr-bucket.

Resizing the Amazon EMR cluster by adding a new task instance group

Configure the tAmazonEMRResize
component to resize a running Amazon EMR cluster by adding a new task instance
group.

  1. Double-click the tAmazonEMRResize
    component to open its Basic settings view.

    tAmazonEMRManage_3.png

  2. In the Access Key and Secret Key fields, enter the authentication credentials
    required to access Amazon Web Services.
  3. From the Action drop-down list, select
    Add task instance group to resize the cluster
    by adding a new task instance group.
  4. In the Cluster id field, enter the ID
    of the cluster to be resized. In this example, the returned value of the global
    variable CLUSTER_FINAL_ID of the previous tAmazonEMRManage component is used.

    Note that you can retrieve the global variable by pressing Ctrl + Space and selecting the relevant global variable
    from the list.
  5. In the Group name field, enter the
    name of the task instance group to be added in the cluster. In this example, it is
    talend-doc-instance-group.
  6. In the Instance count field, specify
    the number of the instances to be created.
  7. From the Task instance type drop-down
    list, select the type of the instances to be created.

Listing the instance groups in the Amazon EMR cluster

Configure the tAmazonEMRListInstances component and the tJava component to retrieve and display the ID and name information of all
instance groups in a running cluster.

  1. Double-click the tAmazonEMRListInstances component to open its Basic settings view.

    tAmazonEMRManage_4.png

  2. In the Access Key and Secret Key fields, enter the authentication credentials
    required to access Amazon Web Services.
  3. Select the AWS region from the Region
    drop-down list. In this example, it is Asia Pacific (Tokyo).
  4. Clear the Filter master and core instances check box to list all
    instance groups, including the Master, Core, and Task type instance groups.
  5. In the Cluster id field, enter the ID
    of the cluster for which to list the instance groups. In this example, the returned
    value of the global variable CLUSTER_FINAL_ID of the previous
    tAmazonEMRManage component is used.
  6. Double-click the tJava component to
    open its Basic settings view.

    tAmazonEMRManage_5.png

  7. In the Code field, enter the following
    code to print the ID and Name information of each instance group in the
    cluster.
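
The original snippet for the last step is not reproduced here. As a minimal sketch of what such code could look like, the example below assumes that tAmazonEMRListInstances_1 exposes the current instance group through the global variables CURRENT_GROUP_ID and CURRENT_GROUP_NAME. Those variable names, the component name, the describeCurrentGroup helper, and the sample values are assumptions to verify against the tAmazonEMRListInstances documentation. The Iterate connection is simulated with a loop and globalMap with a HashMap so that the example runs outside a Job.

```java
import java.util.HashMap;
import java.util.Map;

public class ListInstanceGroupsSketch {

    // Formats what the tJava code would print for the instance group of the current iteration.
    static String describeCurrentGroup(Map<String, Object> globalMap) {
        // Assumed variable names; check them in the component's Global Variables list.
        String id = (String) globalMap.get("tAmazonEMRListInstances_1_CURRENT_GROUP_ID");
        String name = (String) globalMap.get("tAmazonEMRListInstances_1_CURRENT_GROUP_NAME");
        return "Instance group ID: " + id + ", name: " + name;
    }

    public static void main(String[] args) {
        // Stand-in for the Job's globalMap; IDs and names below are made up.
        Map<String, Object> globalMap = new HashMap<>();
        String[][] groups = {
            {"ig-EXAMPLE1", "Master"},
            {"ig-EXAMPLE2", "talend-doc-instance-group"}
        };

        // Each loop pass plays the role of one Row > Iterate execution of tJava.
        for (String[] group : groups) {
            globalMap.put("tAmazonEMRListInstances_1_CURRENT_GROUP_ID", group[0]);
            globalMap.put("tAmazonEMRListInstances_1_CURRENT_GROUP_NAME", group[1]);
            System.out.println(describeCurrentGroup(globalMap));
        }
    }
}
```

In the tJava Code field itself, only the body of describeCurrentGroup followed by a System.out.println call would be needed, because the Job supplies globalMap and runs the code once per instance group.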

Executing the Job to manage the Amazon EMR cluster

After setting up the Job and configuring the components used in the
Job for managing the Amazon EMR cluster, you can execute the Job and verify the Job
execution result.

  1. Press Ctrl + S to save the Job and
    then F6 to execute the Job.

    tAmazonEMRManage_6.png

    As shown above, the Job starts and resizes the Amazon EMR
    cluster, and then lists all instance groups in the cluster.
  2. View the cluster details on the Amazon EMR Cluster List page to
    validate the Job execution result.

    tAmazonEMRManage_7.png


Source: Talend documentation, https://help.talend.com