tHDFSConfiguration properties for Apache Spark Batch
These properties are used to configure tHDFSConfiguration running in the Spark Batch Job framework.
The Spark Batch tHDFSConfiguration component belongs to the Storage family.
Either Built-In or Repository.
Built-In: No property data stored centrally.
Repository: Select the repository file where the
Select the cluster you are using from the drop-down list. The options in the
list vary depending on the component you are using. Among these options, the following
ones require specific configuration:
Select the version of the Hadoop distribution you are using. The available
Use kerberos authentication
If you are accessing a Hadoop cluster secured with Kerberos,
select this check box, then enter the Kerberos
principal name for the NameNode in the field displayed. This enables you to use
your user name to authenticate against the credentials stored in Kerberos.
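As an illustration of what a NameNode principal name looks like, the sketch below splits a principal of the general Kerberos form primary/instance@REALM into its parts and substitutes the _HOST placeholder that Hadoop configurations commonly use for the host name. The principal value, the host name, and the helper names parse_principal and resolve_host are assumptions made for this example; they are not part of the component.

```python
from typing import NamedTuple, Optional


class Principal(NamedTuple):
    primary: str
    instance: Optional[str]
    realm: str


def parse_principal(text: str) -> Principal:
    """Split 'primary/instance@REALM' into its parts; the instance is optional."""
    name, sep, realm = text.rpartition("@")
    if not sep or not name or not realm:
        raise ValueError(f"missing realm in principal: {text!r}")
    primary, _, instance = name.partition("/")
    if not primary:
        raise ValueError(f"missing primary in principal: {text!r}")
    return Principal(primary, instance or None, realm)


def resolve_host(principal: Principal, hostname: str) -> Principal:
    """Replace the _HOST placeholder with a concrete host name."""
    if principal.instance == "_HOST":
        return principal._replace(instance=hostname)
    return principal


# Hypothetical values for illustration; use the principal defined for your cluster.
nn = parse_principal("nn/_HOST@EXAMPLE.COM")
print(resolve_host(nn, "namenode01.example.com"))
```

A check like this can help catch a missing realm or an empty primary before the value is entered in the field.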
This check box is available depending on the Hadoop distribution you are
Use a keytab to
Select the Use a keytab to authenticate
Note that the user that executes a keytab-enabled Job is not necessarily
Type in the URI of the Hadoop NameNode, the master node of a
The User name field is available when you are not using
Enter the membership including the authentication user under which the HDFS instances were
Use datanode hostname
Select the Use datanode hostname check box to allow the
uses a default configuration for its engine to perform
operations in a Hadoop distribution. If you need to use a custom configuration in a specific
situation, complete this table with the property or properties to be customized. Then at
runtime, the customized property or properties will override those default ones.
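The override behavior described above can be pictured with a minimal sketch: custom entries replace default entries that share the same property name, and all other defaults are kept. The default values, the custom row, and the helper names below are assumptions chosen for illustration (the real defaults depend on your distribution), and the code does not reflect how the component is implemented internally. The XML output only mirrors the configuration/property layout of Hadoop *-site.xml files.

```python
import xml.etree.ElementTree as ET

# Assumed defaults for illustration only; real defaults depend on the distribution.
DEFAULTS = {
    "dfs.replication": "3",
    "dfs.client.use.datanode.hostname": "false",
}

# A row a user might enter in the properties table (hypothetical value).
CUSTOM = {
    "dfs.client.use.datanode.hostname": "true",
}


def effective_properties(defaults, custom):
    """Return the defaults with every custom entry overriding the same key."""
    merged = dict(defaults)
    merged.update(custom)
    return merged


def to_hadoop_xml(props):
    """Render properties in the <configuration>/<property> layout of *-site.xml."""
    root = ET.Element("configuration")
    for name in sorted(props):
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = props[name]
    return ET.tostring(root, encoding="unicode")


merged = effective_properties(DEFAULTS, CUSTOM)
print(to_hadoop_xml(merged))
```

Only the property named in the custom table changes; dfs.replication keeps its assumed default.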
For further information about the properties required by Hadoop and its related systems, such
as HDFS and Hive, see the documentation of the Hadoop distribution you
are using, or see Apache’s Hadoop documentation at http://hadoop.apache.org/docs and select the version of the documentation you want. For demonstration purposes, the links to some properties are listed below:
Setup HDFS encryption configurations
If the HDFS transparent encryption has been enabled in your cluster, select
For further information about the HDFS transparent encryption and its KMS proxy, see Transparent Encryption in HDFS.
This component is used with no need to be connected to other
You need to drop tHDFSConfiguration along with the file
This component, along with the Spark Batch component Palette it belongs to,
Note that in this documentation, unless otherwise explicitly stated, a
The Hadoop distribution must be properly installed, so as to guarantee the interaction
For further information about how to install a Hadoop distribution, see the manuals
In the Spark Configuration tab in the Run
view, define the connection to a given Spark cluster for the whole Job. In
addition, since the Job expects its dependent jar files for execution, you must
specify the directory in the file system to which these jar files are
transferred so that Spark can access these files:
This connection is effective on a per-Job basis.
Specific Spark timeout
When encountering network issues, Spark by
Add the following properties to