July 30, 2023

tGoogleDataprocManage – Docs for ESB 7.x

tGoogleDataprocManage

Creates or deletes a Dataproc cluster in the Global region on Google Cloud
Platform.

tGoogleDataprocManage Standard properties

These properties are used to configure tGoogleDataprocManage running in the Standard Job framework.

The Standard
tGoogleDataprocManage component belongs to the Cloud family.

The component in this framework is available in all Talend products with Big Data
and in Talend Data Fabric.

Basic settings

Project identifier

Enter the ID of your Google Cloud Platform project.

If you are not certain about your project ID, check it in the Manage
Resources page of your Google Cloud Platform services.

Cluster identifier

Enter the ID of the Dataproc cluster to be used.

Provide Google Credentials in file

Leave this check box clear when you
launch your Job from a machine on which the Google Cloud SDK has been
installed and authorized to use your user account credentials to access
Google Cloud Platform. In this situation, this machine is often your
local machine.

When you launch your Job from a remote
machine, such as a JobServer, select this check box and, in the
Path to Google Credentials file field that is
displayed, enter the path to this JSON file as stored on the
JobServer machine.

For further information about this Google
Credentials file, see the administrator of your Google Cloud Platform or
visit the Google Cloud Platform Auth Guide.
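Outside of Talend, Google client libraries and tools resolve such a service-account JSON key through the GOOGLE_APPLICATION_CREDENTIALS environment variable; the path this component asks for plays the same role. A minimal sketch of that mechanism, using a stand-in key file (the project ID and email below are hypothetical, and the file written here is an illustration, not a real credential):

```python
import json
import os
import tempfile

# A service-account key file is a JSON document; the fields below are among
# those Google includes in a downloaded key. Values are placeholders.
key = {
    "type": "service_account",
    "project_id": "my-gcp-project",  # hypothetical project ID
    "client_email": "job@my-gcp-project.iam.gserviceaccount.com",
    "private_key": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
}

with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    json.dump(key, f)
    key_path = f.name

# Google client libraries pick up the key from this environment variable
# when no explicit credentials are passed to them.
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = key_path

with open(os.environ["GOOGLE_APPLICATION_CREDENTIALS"]) as f:
    loaded = json.load(f)
print(loaded["type"])  # service_account
```

On a local machine authorized through the Google Cloud SDK, this variable is unnecessary, which is why the check box stays clear in that case.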

Action

Select the action you want tGoogleDataprocManage to
perform on your cluster:

  • Start to create a cluster

  • Stop to destroy a cluster

Version

Select the version of the image to be used to create a Dataproc cluster.

Region

From this drop-down list, select the Google Cloud region to
be used.

Zone

Select the geographic zone in which the computing resources
are used and your data is stored and processed. The available zones vary
depending on the region you have selected from the
Region drop-down list.

In Google Cloud terms, a zone is an isolated location
within a region, the broader geographical unit employed by Google Cloud.

Instance configuration

Enter the parameters that determine the number of master and worker
instances to be used by the Dataproc cluster to be created, and the
performance of these masters and workers.
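Under the hood, these Basic settings map onto the cluster configuration of the Dataproc REST API. A sketch of the JSON body such a configuration produces (the field names are those of the Dataproc API's Cluster and InstanceGroupConfig messages; the project, cluster name, machine types, and counts are illustrative choices, not defaults):

```python
# Illustrative cluster definition in the shape of the Dataproc REST API;
# comments relate each part back to the component's properties.
cluster = {
    "projectId": "my-gcp-project",       # Project identifier (hypothetical)
    "clusterName": "my-cluster",         # Cluster identifier (hypothetical)
    "config": {
        "gceClusterConfig": {
            "zoneUri": "europe-west1-b", # Zone
        },
        "softwareConfig": {
            "imageVersion": "1.4",       # Version (image version)
        },
        "masterConfig": {                # Instance configuration: masters
            "numInstances": 1,
            "machineTypeUri": "n1-standard-4",
            "diskConfig": {"bootDiskSizeGb": 500, "numLocalSsds": 0},
        },
        "workerConfig": {                # Instance configuration: workers
            "numInstances": 2,
            "machineTypeUri": "n1-standard-4",
            "diskConfig": {"bootDiskSizeGb": 500, "numLocalSsds": 0},
        },
    },
}

workers = cluster["config"]["workerConfig"]["numInstances"]
print(workers)  # 2
```

The Start action submits a body of this shape to create the cluster; Stop deletes the cluster identified by projectId and clusterName.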

Advanced settings

Wait for cluster ready

Select this check box to keep this component running until the cluster is
completely set up.

When you clear this check box, this component stops running immediately after
sending the cluster creation command.
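Waiting for the cluster to be ready amounts to polling its status until it reaches a running state. A generic sketch of such a loop, using a stand-in status function in place of a real Dataproc API call (Dataproc reports cluster states such as CREATING, RUNNING, and ERROR):

```python
import time

def wait_for_cluster(get_status, timeout_s=600, poll_interval_s=5):
    """Poll get_status() until it returns "RUNNING", or fail on timeout.

    get_status stands in for a real Dataproc clusters.get call.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        status = get_status()
        if status == "RUNNING":
            return status
        if status == "ERROR":
            raise RuntimeError("cluster entered ERROR state")
        time.sleep(poll_interval_s)
    raise TimeoutError("cluster not ready within timeout")

# Simulated status sequence: the cluster is still being created on the
# first two polls, then becomes ready.
statuses = iter(["CREATING", "CREATING", "RUNNING"])
print(wait_for_cluster(lambda: next(statuses), poll_interval_s=0))  # RUNNING
```

Clearing the check box corresponds to skipping this loop entirely and returning as soon as the creation request is sent.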

Master disk size

Enter a number, without quotation marks, to determine the size in GB of the
disk of each master instance.

Master local SSD

Enter a number without quotation marks to determine the number of local
solid-state drive (SSD) storage devices to be added to each master
instance.

According to Google, these local SSDs are suitable only
for temporary storage such as caches, processing space, or low-value data. It
is recommended to store important data using Google's durable storage options.
For further information about these storage options, see Durable storage options.

Worker disk size

Enter a number, without quotation marks, to determine the size in GB of the
disk of each worker instance.

Worker local SSD

Enter a number without quotation marks to determine the number of local
solid-state drive (SSD) storage devices to be added to each worker
instance.

According to Google, these local SSDs are suitable only
for temporary storage such as caches, processing space, or low-value data. It
is recommended to store important data using Google's durable storage options.
For further information about these storage options, see Durable storage options.

Network or Subnetwork

Select either check box to specify a Google Compute Engine network or subnetwork
for the cluster to be created, enabling intra-cluster communications.

As Google does not allow network and subnetwork to be used concurrently,
selecting one check box hides the other check box.

For further information about Google Dataproc cluster network configuration,
see Dataproc Network.

Initialization action

In this table, select the initialization actions that are available in the
shared bucket on Google Cloud Storage to run on all the nodes in your
Dataproc cluster immediately after this cluster is set up.

If you need to use custom initialization scripts, upload them to this shared
Google bucket so that tGoogleDataprocManage can read
them.

  • In the Executable file column, enter the
    Google Cloud Storage URI of the script to be run, for example
    gs://dataproc-initialization-actions/MyScript

  • In the Executable timeout column, enter, within
    double quotation marks, the maximum duration allowed for the
    execution. If the executable has not completed by the end of this
    timeout, an explanatory error message is returned. The value is a
    number of seconds with up to nine fractional digits, followed by
    "s", for example, "3.5s" for 3.5 seconds.

For further information about this shared bucket and the initialization
actions, see Initialization actions.
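The timeout string described above follows Google's JSON encoding of a Duration: a decimal number of seconds with up to nine fractional digits, suffixed with "s". A small sketch of a validator for that format (is_valid_timeout is a hypothetical helper, not part of the component):

```python
import re

# Duration JSON encoding: integer seconds, an optional fractional part of
# one to nine digits, and a mandatory trailing "s".
DURATION_RE = re.compile(r'^\d+(\.\d{1,9})?s$')

def is_valid_timeout(value: str) -> bool:
    """Check an Executable timeout value such as "3.5s" or "600s"."""
    return bool(DURATION_RE.fullmatch(value))

print(is_valid_timeout("3.5s"))           # True
print(is_valid_timeout("600s"))           # True
print(is_valid_timeout("3.1234567890s"))  # False: ten fractional digits
```

Values without the trailing "s", or with more than nine fractional digits, are rejected by the API.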

tStatCatcher Statistics

Select this check box to collect log data at the component level.

Usage

Usage rule

This component is used standalone in a subJob.


Document from Talend https://help.talend.com