Property type
|
Either Built-In or Repository.
Built-In: No property data stored centrally.
Repository: Select the repository file where the
properties are stored.
|
Distribution
|
Select the cluster you are using from the drop-down list. The options in the
list vary depending on the component you are using. Among these options, the following
ones require specific configuration:
-
If available in this Distribution drop-down list, the Microsoft HD Insight option allows you to
use a Microsoft HD Insight cluster. To do so, you need to configure
the connections to the WebHCat service, the HD Insight service and the
Windows Azure Storage service of that cluster in the areas that are
displayed. A demonstration video showing how to configure this connection is
available at the following link: https://www.youtube.com/watch?v=A3QTT6VsNoM.
-
If you select Amazon EMR, you can find more details about how
to configure an Amazon EMR cluster in Talend Help Center (https://help.talend.com).
-
The Custom option allows you to connect to a cluster different from any of the
distributions given in this list, that is to say, to connect to a cluster not
officially supported by Talend.
-
Select Import from existing version to import an officially supported
distribution as the base, and then add the other required jar files that the
base distribution does not provide.
-
Select Import from zip to
import the configuration zip for the custom distribution to be used. This zip
file should contain the libraries of the different Hadoop elements and the index
file of these libraries.
In Talend Exchange, members of the Talend community have shared some
ready-to-use configuration zip files, which you can download from this Hadoop
configuration list and use directly in your connection. However, because of
the ongoing evolution of the different Hadoop-related projects, you might not be
able to find the configuration zip corresponding to your distribution in this
list. In that case, it is recommended to use the Import from existing version
option to take an existing distribution as the base and add the jars required by
your distribution.
Note that custom versions are not officially supported by Talend.
Talend and its community provide you with the opportunity to connect to
custom versions from the Studio but cannot guarantee that the configuration of
whichever version you choose will be easy, due to the wide range of different
Hadoop distributions and versions that are available. As such, you should only
attempt to set up such a connection if you have sufficient Hadoop experience to
handle any issues on your own.
Note:
In this dialog box, the active check box must be kept
selected so as to import the jar files pertinent to the connection to be
created between the custom distribution and this component.
For a step-by-step example of how to connect to a custom
distribution and share this connection, see Connecting to a custom Hadoop distribution.
|
Hadoop version
|
Select the version of the Hadoop distribution you are using. The available
options vary depending on the component you are using. As Hadoop has evolved,
note the following changes:
-
If you use Hortonworks Data Platform V2.2, the configuration files of your
cluster might be using environment variables such as ${hdp.version}. If so, you
need to set the mapreduce.application.framework.path property in
the Hadoop properties table of this component, with the path value explicitly
pointing to the MapReduce framework archive of your cluster. For example:
|
mapreduce.application.framework.path=/hdp/apps/2.2.0.0-2041/mapreduce/mapreduce.tar.gz#mr-framework |
-
If you use Hortonworks Data Platform V2.0.0, the distribution and the Talend
Job must run on the same type of operating system, such as Windows or Linux.
Otherwise, you have to use Talend Jobserver to execute the Job on the same type
of operating system as the one on which your Hortonworks Data Platform V2.0.0
distribution runs.
|
Use kerberos authentication
|
If you are accessing a Hadoop cluster running
with Kerberos security, select this check box, then enter the Kerberos
principal name for the NameNode in the field displayed. This enables you to use
your user name to authenticate against the credentials stored in Kerberos.
-
If this cluster is a MapR cluster of version 4.0.1 or later, you can set the MapR
ticket authentication configuration in addition or as an alternative by following the
explanation in Connecting to a security-enabled MapR.
Keep in mind that this configuration generates a new MapR security ticket for the username
defined in the Job at each execution. If you need to reuse an existing ticket issued for the
same username, leave both the Force MapR ticket authentication check box and the
Use Kerberos authentication check box clear; MapR should then be able to find
that ticket automatically.
This check box is available depending on the Hadoop distribution you are
connecting to.
|
Use a keytab to authenticate
|
Select the Use a keytab to authenticate
check box to log into a Kerberos-enabled system using a given keytab file. A keytab
file contains pairs of Kerberos principals and encrypted keys. You need to enter the
principal to be used in the Principal field and
the access path to the keytab file itself in the Keytab field. This keytab file must be stored on the machine on
which your Job actually runs, for example, on a Talend Jobserver.
Note that the user that executes a keytab-enabled Job is not necessarily
the one a principal designates but must have the right to read the keytab file being
used. For example, if the user name you are using to execute a Job is user1 and the principal to be used is guest,
ensure that user1 has the right to read the keytab
file to be used.
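The read-permission requirement above can be checked before running the Job. The following is a minimal sketch, not part of the Studio: the helper name check_keytab_readable is hypothetical, and it assumes the script is run by the same operating-system user (user1 in the example) on the machine where the Job executes.

```python
import os


def check_keytab_readable(keytab_path):
    """Return (ok, message) for the OS user running this script.

    Hypothetical helper: it only tests file-system access and does not
    validate the principals or keys stored inside the keytab.
    """
    if not os.path.isfile(keytab_path):
        return False, f"{keytab_path} does not exist on this machine"
    if not os.access(keytab_path, os.R_OK):
        return False, f"the current user cannot read {keytab_path}"
    return True, f"the current user can read {keytab_path}"
```

Run it as the user that executes the Job, passing the same path you enter in the Keytab field.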
|
NameNode URI
|
Type in the URI of the Hadoop NameNode, the master node of a Hadoop system. For
example, if you have chosen a machine called masternode as the NameNode, the location is hdfs://masternode:portnumber. If you are using WebHDFS, the location should be
webhdfs://masternode:portnumber; if this WebHDFS is secured
with SSL, the scheme should be swebhdfs and you need to use
a tLibraryLoad in the Job to load the library required by
the secured WebHDFS.
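The three URI forms described above can be summarized in a short sketch. The function name namenode_uri is hypothetical, and the port numbers in the usage lines are placeholders only; use the ports configured on your cluster.

```python
def namenode_uri(host, port, webhdfs=False, ssl=False):
    """Build the NameNode location string for the chosen access mode."""
    if ssl and not webhdfs:
        # The swebhdfs scheme only applies to WebHDFS secured with SSL.
        raise ValueError("ssl=True requires webhdfs=True")
    if webhdfs:
        scheme = "swebhdfs" if ssl else "webhdfs"
    else:
        scheme = "hdfs"
    return f"{scheme}://{host}:{port}"


# Placeholder ports, for illustration only.
print(namenode_uri("masternode", 8020))
print(namenode_uri("masternode", 50070, webhdfs=True))
print(namenode_uri("masternode", 50470, webhdfs=True, ssl=True))
```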
|
User name
|
The User name field is available when you are not using
Kerberos to authenticate. In the User name field, enter the
login user name for your distribution. If you leave it empty, the user name of the machine
hosting the Studio will be used.
|
Group
|
Enter the group that includes the authentication user under which the HDFS instances were
started. This field is available depending on the distribution you are using.
|
Use datanode hostname
|
Select the Use datanode
hostname check box to allow the Job to access datanodes via
their hostnames. This sets the dfs.client.use.datanode.hostname property to true. When connecting to an S3N filesystem, you must select this check
box.
|
Hadoop properties
|
Talend Studio uses a default configuration for its engine to perform
operations in a Hadoop distribution. If you need to use a custom configuration in a specific
situation, complete this table with the property or properties to be customized. Then at
runtime, the customized property or properties will override those default ones.
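The override behavior described above can be pictured as a simple merge in which entries from the table win over defaults. This is only an illustration of the precedence rule, not the Studio's actual mechanism; the function name and the default values shown are assumptions.

```python
def effective_configuration(defaults, custom):
    """Return a new dict in which custom entries override default ones."""
    merged = dict(defaults)  # copy, so the defaults stay untouched
    merged.update(custom)    # customized properties take precedence
    return merged


# Illustrative stand-ins for default values, not the Studio's real defaults.
defaults = {
    "dfs.client.use.datanode.hostname": "false",
    "dfs.replication": "3",
}
# Properties as they might be entered in the Hadoop properties table.
custom = {"dfs.client.use.datanode.hostname": "true"}

config = effective_configuration(defaults, custom)
print(config["dfs.client.use.datanode.hostname"])
print(config["dfs.replication"])
```

Properties that are not listed in the table keep their default values.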
For further information about the properties required by Hadoop and its related systems such
as HDFS and Hive, see the documentation of the Hadoop distribution you
are using, or see Apache's Hadoop documentation at http://hadoop.apache.org/docs and then select the version of the documentation you want. For demonstration purposes, the links to some properties are listed below:
|
Setup HDFS encryption configurations
|
If HDFS transparent encryption has been enabled in your cluster, select
the Setup HDFS encryption configurations check
box and, in the HDFS encryption key provider field
that is displayed, enter the location of the KMS proxy.
For further information about the HDFS transparent encryption and its KMS proxy, see Transparent Encryption in HDFS.
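As a rough illustration of what a KMS proxy location can look like, Hadoop's KMS client provider commonly uses the form kms://<transport>@<host>:<port>/kms. Whether the HDFS encryption key provider field expects exactly this form is an assumption here, so check the documentation of your distribution. The function name, host and port below are hypothetical.

```python
def kms_provider_uri(host, port, https=False):
    """Build a KMS location in the kms://<transport>@<host>:<port>/kms form."""
    transport = "https" if https else "http"
    return f"kms://{transport}@{host}:{port}/kms"


# Hypothetical host and port, for illustration only.
print(kms_provider_uri("kms.example.com", 16000))
```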
|