tHBaseOutput
Writes columns of data into a given HBase database.
tHBaseOutput receives
data from its preceding component, creates a table in a given HBase database and writes the
received data into this HBase table.
Depending on the Talend
product you are using, this component can be used in one, some or all of the following
Job frameworks:
- Standard: see tHBaseOutput Standard properties.
  The component in this framework is available in all Talend products with Big Data and in Talend Data Fabric.
- MapReduce: see tHBaseOutput MapReduce properties (deprecated).
  The component in this framework is available in all subscription-based Talend products with Big Data and Talend Data Fabric.
- Spark Batch: see tHBaseOutput properties for Apache Spark Batch.
  The component in this framework is available in all subscription-based Talend products with Big Data and Talend Data Fabric.
- Spark Streaming: see tHBaseOutput properties for Apache Spark Streaming.
  This component is available in Talend Real Time Big Data Platform and Talend Data Fabric.
tHBaseOutput Standard properties
These properties are used to configure tHBaseOutput running in the Standard Job framework.
The Standard
tHBaseOutput component belongs to the Big Data and the Databases NoSQL families.
The component in this framework is available in all Talend products with Big Data
and in Talend Data Fabric.
Basic settings
Property type |
Either Built-In or Repository. Built-In: No property data stored centrally.
Repository: Select the repository file where the |
Click this icon to open a database connection wizard and store the For more information about setting up and storing database |
Use an existing connection |
Select this check box and in the Component List click the relevant connection component to |
Distribution |
Select the cluster you are using from the drop-down list. The options in the
list vary depending on the component you are using. Among these options, the following ones require specific configuration:
|
HBase version |
Select the version of the Hadoop distribution you are using. The available |
Hadoop version of the distribution |
This list is displayed only when you have selected Custom from the distribution list to connect to a cluster not yet |
Zookeeper quorum |
Type in the name or the URL of the Zookeeper service you use to coordinate the transaction |
Zookeeper client port |
Type in the number of the client listening port of the Zookeeper service you are |
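The two Zookeeper fields correspond to the standard HBase client properties hbase.zookeeper.quorum and hbase.zookeeper.property.clientPort. As a minimal sketch, assuming you only need those two properties, the following Python function builds them from a list of host names and a port. The host names are hypothetical, and this is not the code Talend runs internally.

```python
def zookeeper_client_properties(quorum_hosts, client_port=2181):
    """Build the two HBase client properties behind the Zookeeper fields.

    quorum_hosts: list of ZooKeeper host names (hypothetical values in the demo).
    client_port: the port ZooKeeper listens on for clients (2181 is the HBase default).
    """
    if not quorum_hosts:
        raise ValueError("at least one ZooKeeper host is required")
    port = int(client_port)
    if not 0 < port < 65536:
        raise ValueError("client port must be between 1 and 65535")
    return {
        # HBase expects a comma-separated list of host names here.
        "hbase.zookeeper.quorum": ",".join(h.strip() for h in quorum_hosts),
        # Values are kept as strings, as they appear in hbase-site.xml.
        "hbase.zookeeper.property.clientPort": str(port),
    }


if __name__ == "__main__":
    props = zookeeper_client_properties(["zk1.example.com", "zk2.example.com"], 2181)
    for name, value in props.items():
        print(f"{name}={value}")
```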
Use kerberos authentication |
If the database to be used is running with Kerberos security, select this
check box, then enter the principal names in the displayed fields. You should be able to find this information in the hbase-site.xml file of the cluster to be used.
If you need to use a Kerberos keytab file to log in, select Use a keytab to authenticate. A keytab file contains Note that the user that executes a keytab-enabled Job is not necessarily |
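The principal names usually appear in hbase-site.xml under the standard HBase properties hbase.master.kerberos.principal and hbase.regionserver.kerberos.principal. The following Python sketch reads them from the file contents with the standard-library XML parser. The sample realm EXAMPLE.COM is hypothetical; _HOST is the usual Hadoop placeholder for the local host name.

```python
import xml.etree.ElementTree as ET

# Standard HBase property names that hold the Kerberos principals.
PRINCIPAL_KEYS = (
    "hbase.master.kerberos.principal",
    "hbase.regionserver.kerberos.principal",
)


def read_principals(hbase_site_xml):
    """Return the Kerberos principal properties defined in an hbase-site.xml document."""
    root = ET.fromstring(hbase_site_xml.strip())
    found = {}
    # Hadoop-style configuration files list <property><name/><value/></property> entries.
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        if name in PRINCIPAL_KEYS:
            found[name] = (prop.findtext("value") or "").strip()
    return found


# Hypothetical excerpt of an hbase-site.xml file.
SAMPLE = """
<configuration>
  <property>
    <name>hbase.master.kerberos.principal</name>
    <value>hbase/_HOST@EXAMPLE.COM</value>
  </property>
  <property>
    <name>hbase.regionserver.kerberos.principal</name>
    <value>hbase/_HOST@EXAMPLE.COM</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>2181</value>
  </property>
</configuration>
"""

if __name__ == "__main__":
    for key, value in read_principals(SAMPLE).items():
        print(f"{key} = {value}")
```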
Schema and Edit |
A schema is a row description. It defines the number of fields Click Edit
Built-In: You create and store the schema locally for this component |
Repository: You have already created the schema and stored it in the When the schema to be reused has default values that are You can find more details about how to |
Set table Namespace mappings |
Enter the string to be used to construct the mapping between an Apache HBase table and a For the valid syntax you can use, see http://doc.mapr.com/display/MapR40x/Mapping+Table+Namespace+Between+Apache+HBase+Tables+and+MapR+Tables. |
Table name |
Type in the name of the HBase table you need to create. |
Action on table |
Select the action you need to take for creating an HBase |
Custom Row Key |
Select this check box to use the customized row keys. Once For example, you can type in |
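As an illustration of what a customized row key can look like, the following Python sketch generates keys made of a fixed prefix and an increasing counter. The prefix cust_ and the zero-padding option are hypothetical choices. Padding matters because HBase sorts row keys lexicographically by their bytes, so without it a key ending in 10 sorts before one ending in 2.

```python
from itertools import islice


def custom_row_keys(prefix, start=1, width=0):
    """Yield row keys made of a prefix plus a running counter.

    width > 0 left-pads the counter with zeros so that the lexicographic order
    HBase uses for row keys matches the numeric order of the counter.
    """
    n = start
    while True:
        yield prefix + str(n).zfill(width)
        n += 1


if __name__ == "__main__":
    print(list(islice(custom_row_keys("cust_"), 3)))
    print(list(islice(custom_row_keys("cust_", width=4), 3)))
```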
Families |
Complete this table to map the columns of the table to be used with the schema columns you The Column column of this table is automatically filled |
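To make the mapping concrete, the following Python sketch turns one input row into HBase-style cells addressed as family:qualifier. It assumes, for illustration only, that the schema column name is reused as the column qualifier and that null values produce no cell; the family names info and stats are hypothetical.

```python
def to_cells(row_key, row, families):
    """Map one input row to (row key, "family:qualifier", value) cells.

    families maps each schema column name to the column family it is written to.
    The column name doubles as the qualifier (an assumption of this sketch).
    """
    cells = []
    for column, family in families.items():
        value = row.get(column)
        if value is None:
            continue  # this sketch writes no cell for a null value
        cells.append((row_key, f"{family}:{column}", str(value)))
    return cells


if __name__ == "__main__":
    row = {"name": "Alice", "city": "Paris", "orders": 12, "note": None}
    families = {"name": "info", "city": "info", "orders": "stats", "note": "info"}
    for cell in to_cells("cust_1", row, families):
        print(cell)
```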
Custom timestamp column |
Select a Long column from your schema to provide |
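HBase versions each cell with a long timestamp, by convention the number of milliseconds since the Unix epoch. A minimal Python sketch, assuming the selected Long column holds such a value, picks the timestamp for a row's cells and otherwise falls back to the current time, which approximates what HBase does when no timestamp is supplied. The column name event_ts is hypothetical.

```python
import time


def cell_timestamp(row, timestamp_column=None):
    """Return the timestamp (epoch milliseconds) to attach to a row's cells."""
    if timestamp_column is None:
        # No custom timestamp column: use the current time in milliseconds.
        return int(time.time() * 1000)
    value = row[timestamp_column]
    # A Long column maps to a Python int; reject anything else, including bool.
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError(f"{timestamp_column} must hold a Long (integer) value")
    return value


if __name__ == "__main__":
    print(cell_timestamp({"event_ts": 1700000000000}, "event_ts"))
```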
Die on error |
This check box is cleared by default, meaning to skip the row on |
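The skip-or-stop choice can be sketched in Python as follows. The write callback is a hypothetical stand-in for the actual write to HBase: with die_on_error left at False, failing rows are collected and skipped; with True, the first error stops processing.

```python
def write_rows(rows, write, die_on_error=False):
    """Write rows one by one and skip failing rows unless die_on_error is set.

    Returns (number of rows written, list of (row, error message) for skipped rows).
    """
    written, rejected = 0, []
    for row in rows:
        try:
            write(row)
        except Exception as exc:
            if die_on_error:
                raise  # stop the whole run on the first error
            rejected.append((row, str(exc)))
            continue
        written += 1
    return written, rejected


if __name__ == "__main__":
    def fake_write(row):
        # Hypothetical writer: fails when a row has no key.
        if "key" not in row:
            raise ValueError("missing row key")

    print(write_rows([{"key": 1}, {}, {"key": 2}], fake_write))
```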
Advanced settings
Use batch mode |
Select this check box to activate the batch mode for data processing. |
Batch size |
Specify the number of records to be processed in each batch. This field appears only when the Use batch mode |
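Batch mode groups records so that they are sent together rather than one at a time. The following Python sketch shows only the grouping logic, assuming that the last batch may be smaller than Batch size; how each batch is actually submitted to HBase is outside its scope.

```python
from itertools import islice


def batches(records, batch_size):
    """Yield consecutive lists of at most batch_size records."""
    if batch_size < 1:
        raise ValueError("batch size must be at least 1")
    it = iter(records)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return  # input exhausted
        yield batch


if __name__ == "__main__":
    for batch in batches(range(10), 4):
        print(batch)
```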
Properties |
If you need to use custom configuration for your database, complete this table with the For example, you need to define the value of the dfs.replication property as 1 for the Note:
This table is not available when you are using an existing |
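Assuming each row of the Properties table is a property name and a value, the table carries the same information as the property entries of a Hadoop-style configuration file such as hbase-site.xml. The following Python sketch renders such pairs, here the dfs.replication example from the text, in that file format, which can help when comparing your settings with the cluster's own configuration files.

```python
import xml.etree.ElementTree as ET


def properties_to_xml(properties):
    """Render name/value pairs as a Hadoop-style <configuration> document."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    # encoding="unicode" returns a str without an XML declaration.
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(properties_to_xml({"dfs.replication": 1}))
```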
tStatCatcher Statistics |
Select this check box to collect log data at the component |
Family parameters |
Type in the names and, if need be, the custom performance Note: The parameter Compression type allows you to select the
format for output data compression. |
Global Variables
Global Variables |
NB_LINE: the number of rows read by an input component or
ERROR_MESSAGE: the error message generated by the A Flow variable functions during the execution of a component while an After variable To fill up a field or expression with a variable, press Ctrl + For further information about variables, see |
Usage
Usage rule |
This component is normally an end component of a Job and always |
Prerequisites |
Before starting, ensure that you have met the Loopback IP prerequisites expected by your The Hadoop distribution must be properly installed, so as to guarantee the interaction
For further information about how to install a Hadoop distribution, see the manuals |
Related scenario
For a scenario using the Standard version of tHBaseOutput, see Exchanging customer data with HBase.
tHBaseOutput MapReduce properties (deprecated)
These properties are used to configure tHBaseOutput running in the MapReduce Job framework.
The MapReduce
tHBaseOutput component belongs to the MapReduce and the Databases families.
The component in this framework is available in all subscription-based Talend products with Big Data
and Talend Data Fabric.
The MapReduce framework is deprecated from Talend 7.3 onwards. Use Talend Jobs for Apache Spark to accomplish your integration tasks.
Basic settings
Property type |
Either Built-In or Repository. Built-In: No property data stored centrally.
Repository: Select the repository file where the |
Click this icon to open a database connection wizard and store the For more information about setting up and storing database |
Distribution |
Select the cluster you are using from the drop-down list. The options in the
list vary depending on the component you are using. Among these options, the following ones require specific configuration:
In the Map/Reduce version of this component, the distribution you |
HBase version |
Select the version of the Hadoop distribution you are using. The available |
Zookeeper quorum |
Type in the name or the URL of the Zookeeper service you use to coordinate the transaction |
Zookeeper client port |
Type in the number of the client listening port of the Zookeeper service you are |
Use kerberos authentication |
If the database to be used is running with Kerberos security, select this
check box, then enter the principal names in the displayed fields. You should be able to find this information in the hbase-site.xml file of the cluster to be used.
If you need to use a Kerberos keytab file to log in, select Use a keytab to authenticate. A keytab file contains Note that the user that executes a keytab-enabled Job is not necessarily |
Schema and Edit |
A schema is a row description. It defines the number of fields Click Edit
Built-In: You create and store the schema locally for this component |
Repository: You have already created the schema and stored it in the |
Table name |
Type in the name of the HBase table in which you need to write |
Row key column |
Select the column used as the row key column of the HBase Then, if need be, select the Store row key |
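The following Python sketch illustrates a row key taken from a schema column. It assumes, for illustration only, that selecting the Store row key option additionally writes the key column's value as a regular cell; the flag name store_row_key and the family name info are hypothetical.

```python
def row_to_put(row, row_key_column, families, store_row_key=False):
    """Return (row key, {"family:qualifier": value}) for one input row.

    families maps schema columns to column families. The key column is also
    written as a cell only when store_row_key is True (assumed behavior).
    """
    key = str(row[row_key_column])
    cells = {}
    for column, family in families.items():
        if column == row_key_column and not store_row_key:
            continue  # the key column only identifies the row
        value = row.get(column)
        if value is not None:
            cells[f"{family}:{column}"] = str(value)
    return key, cells


if __name__ == "__main__":
    row = {"id": 7, "name": "Alice"}
    families = {"id": "info", "name": "info"}
    print(row_to_put(row, "id", families))
    print(row_to_put(row, "id", families, store_row_key=True))
```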
Families |
Complete this table to map the columns of the table to be used with the schema columns you The Column column of this table is automatically filled |
Advanced settings
Use batch mode |
Select this check box to activate the batch mode for data processing. |
Batch size |
Specify the number of records to be processed in each batch. This field appears only when the Use batch mode |
Properties |
If you need to use custom configuration for your database, complete this table with the For example, you need to define the value of the dfs.replication property as 1 for the |
Use local timezone for date | Select this check box to use the local time zone of the machine on which your Job is executed. If you leave this check box clear, UTC is automatically used to format the Date-type data. |
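The difference between the two settings shows up when the same instant is formatted both ways. In this Python sketch, the input is an epoch-millisecond value and the output pattern is a hypothetical choice; only the UTC result is independent of the machine running it.

```python
from datetime import datetime, timezone


def format_date(epoch_millis, use_local_timezone=False, fmt="%Y-%m-%d %H:%M:%S"):
    """Format an epoch-millisecond instant in UTC or in the machine's local time zone."""
    seconds = epoch_millis / 1000
    if use_local_timezone:
        dt = datetime.fromtimestamp(seconds)  # local time zone of this machine
    else:
        dt = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return dt.strftime(fmt)


if __name__ == "__main__":
    print("UTC:  ", format_date(0))
    print("Local:", format_date(0, use_local_timezone=True))
```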
Global Variables
Global Variables |
ERROR_MESSAGE: the error message generated by the A Flow variable functions during the execution of a component while an After variable To fill up a field or expression with a variable, press Ctrl + For further information about variables, see |
Usage
Usage rule |
In a The Hadoop configuration you use for the whole Job and the Hadoop distribution you use for Once a Map/Reduce Job is opened in the workspace, tHBaseOutput as well as the MapReduce Note that in this documentation, unless otherwise |
Hadoop Connection |
You need to use the Hadoop Configuration tab in the This connection is effective on a per-Job basis. |
Prerequisites |
Before starting, ensure that you have met the Loopback IP prerequisites expected by your The Hadoop distribution must be properly installed, so as to guarantee the interaction
For further information about how to install a Hadoop distribution, see the manuals |
Related scenarios
No scenario is available for the Map/Reduce version of this component yet.
tHBaseOutput properties for Apache Spark Batch
These properties are used to configure tHBaseOutput running in the Spark Batch Job framework.
The Spark Batch
tHBaseOutput component belongs to the Databases family.
The component in this framework is available in all subscription-based Talend products with Big Data
and Talend Data Fabric.
Basic settings
Storage configuration |
Select the tHBaseConfiguration component from which the |
Property type |
Either Built-In or Repository. Built-In: No property data stored centrally.
Repository: Select the repository file where the |
Click this icon to open a database connection wizard and store the For more information about setting up and storing database |
Schema and Edit |
A schema is a row description. It defines the number of fields Click Edit
Built-In: You create and store the schema locally for this component |
Repository: You have already created the schema and stored it in the |
Table name |
Type in the name of the HBase table in which you need to write |
Row key column |
Select the column used as the row key column of the HBase Then, if need be, select the Store row |
Custom Row Key |
Select this check box to use the customized row keys. Once For example, you can type in |
Families |
Complete this table to map the columns of the table to be used with the schema columns you The Column column of this table is automatically filled |
Advanced settings
Use batch mode |
Select this check box to activate the batch mode for data processing. |
Batch size |
Specify the number of records to be processed in each batch. This field appears only when the Use batch mode |
Use local timezone for date | Select this check box to use the local time zone of the machine on which your Job is executed. If you leave this check box clear, UTC is automatically used to format the Date-type data. |
Usage
Usage rule |
This component is used as an end component and requires an input link. This component uses a tHBaseConfiguration component present in the same Job to connect to This component, along with the Spark Batch component Palette it belongs to, Note that in this documentation, unless otherwise explicitly stated, a |
Spark Connection |
In the Spark
Configuration tab in the Run view, define the connection to a given Spark cluster for the whole Job. In addition, since the Job expects its dependent jar files for execution, you must specify the directory in the file system to which these jar files are transferred so that Spark can access these files:
This connection is effective on a per-Job basis. |
Related scenarios
For a scenario about how to use the same type of component in a Spark Batch Job, see Writing and reading data from MongoDB using a Spark Batch Job.
tHBaseOutput properties for Apache Spark Streaming
These properties are used to configure tHBaseOutput running in the Spark Streaming Job framework.
The Spark Streaming
tHBaseOutput component belongs to the Databases family.
This component is available in Talend Real Time Big Data Platform and Talend Data Fabric.
Basic settings
Storage configuration |
Select the tHBaseConfiguration component from which the |
Property type |
Either Built-In or Repository. Built-In: No property data stored centrally.
Repository: Select the repository file where the |
Click this icon to open a database connection wizard and store the For more information about setting up and storing database |
Schema and Edit |
A schema is a row description. It defines the number of fields Click Edit
Built-In: You create and store the schema locally for this component |
Repository: You have already created the schema and stored it in the |
Table name |
Type in the name of the HBase table in which you need to write |
Row key column |
Select the column used as the row key column of the HBase Then, if need be, select the Store row |
Custom Row Key |
Select this check box to use the customized row keys. Once For example, you can type in |
Families |
Complete this table to map the columns of the table to be used with the schema columns you The Column column of this table is automatically filled |
Advanced settings
Use batch mode |
Select this check box to activate the batch mode for data processing. |
Batch size |
Specify the number of records to be processed in each batch. This field appears only when the Use batch mode |
Use local timezone for date | Select this check box to use the local time zone of the machine on which your Job is executed. If you leave this check box clear, UTC is automatically used to format the Date-type data. |
Usage
Usage rule |
This component is used as an end component and requires an input link. This component uses a tHBaseConfiguration component present in the same Job to connect to This component, along with the Spark Streaming component Palette it belongs to, appears Note that in this documentation, unless otherwise explicitly stated, a scenario presents |
Spark Connection |
In the Spark
Configuration tab in the Run view, define the connection to a given Spark cluster for the whole Job. In addition, since the Job expects its dependent jar files for execution, you must specify the directory in the file system to which these jar files are transferred so that Spark can access these files:
This connection is effective on a per-Job basis. |
Related scenarios
For a scenario about how to use the same type of component in a Spark Streaming Job, see
Reading and writing data in MongoDB using a Spark Streaming Job.