Warning

This component will be available in the Palette of Talend Studio on the condition that you have subscribed to one of the Talend solutions with Big Data.
| | |
|---|---|
| Component family | Big Data / HBase |
| Function | tHBaseInput extracts columns corresponding to the schema definition from an HBase table. If you have subscribed to one of the Talend solutions with Big Data, you can also use this component in a Talend Map/Reduce Job. |
| Purpose | tHBaseInput reads data from a given HBase database and extracts the columns of interest. |
Basic settings

| Setting | Description |
|---|---|
| Property type | Either Built-in or Repository. Built-in: no property data is stored centrally; you enter the properties manually. Repository: select the Repository file in which the properties are stored; the fields that follow are completed automatically. |
| (Connection wizard icon) | Click this icon to open a database connection wizard and store the database connection parameters you set in the component Basic settings view. For more information about setting up and storing database connection parameters, see Talend Studio User Guide. |
| Use an existing connection | Select this check box and, in the Component List, click the relevant connection component to reuse the connection details you have already defined. Note: not available for the Map/Reduce version of this component. |
| Distribution | Select the cluster you are using from the drop-down list. The options in the list vary depending on the component you are using. To connect to a custom distribution, select Custom, then click the button that appears to display the dialog box in which you can import the required configuration. In the Map/Reduce version of this component, the distribution you select must be the same as the one defined for the whole Job in the Hadoop Configuration tab of the Run view. |
| HBase version | Select the version of the Hadoop distribution you are using. The available options vary depending on the distribution you have selected. |
| Hadoop version of the distribution | This list is displayed only when you have selected Custom from the Distribution list. From this list, select the Hadoop version your custom distribution is based on. |
| Zookeeper quorum | Type in the name or the URL of the Zookeeper service you use to coordinate the transactions between the Studio and HBase. |
| Zookeeper client port | Type in the number of the client listening port of the Zookeeper service you are using. |
| Use kerberos authentication | If you are accessing an HBase database running with Kerberos security, select this check box and enter the Kerberos principal information in the fields that appear. If you need to use a Kerberos keytab file to log in, select Use a keytab to authenticate and specify the keytab file to be used. Note that the user that executes a keytab-enabled Job is not necessarily the one a principal designates, but that user must have the right to read the keytab file being used. |
| Schema and Edit schema | A schema is a row description. It defines the number of fields to be processed and passed on to the next component. Click Edit schema to make changes to the schema. Built-In: you create and store the schema locally for this component only. Repository: you have already created the schema and stored it in the Repository, so you can reuse it in various projects and Job designs. |
| Table name | Type in the name of the HBase table from which you need to extract columns. |
| Define a row selection | Select this check box, then in the Start row and End row fields that appear, enter the row keys that bound the range of rows to extract. This differs from the filters you can set in the Advanced settings tab, which rely on HBase filter features: the row selection works on row keys only. |
| Mapping | Complete this table to map the columns of the HBase table to be used with the schema columns you have defined for the data flow to be processed. |
| Die on error | Select this check box to stop the execution of the Job when an error occurs. Clear the check box to skip any rows on error and complete the process for error-free rows. |
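The row selection above restricts the scan to a half-open, lexicographically ordered range of row keys, the way an HBase scan does. As a rough illustration of these semantics only (plain Python, not Talend code; the helper name is ours):

```python
def in_row_selection(row_key: str, start_row: str, end_row: str) -> bool:
    """Return True if row_key falls in [start_row, end_row): the start
    row is inclusive and the end row is exclusive, as in an HBase scan."""
    return start_row <= row_key < end_row

# Row keys are compared as byte strings, so "10" sorts before "2".
keys = ["1", "10", "2", "3"]
selected = [k for k in keys if in_row_selection(k, "1", "3")]
# "3" is excluded because the end row is exclusive.
```

Because the comparison is lexicographic, numeric-looking keys sort as text; this is worth keeping in mind when choosing start and end rows.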
Advanced settings

| Setting | Description |
|---|---|
| tStatCatcher Statistics | Select this check box to collect log data at the component level. |
| Properties | If you need to use a custom configuration for your HBase, complete this table with the property or properties to be customized. For example, to define the value of the dfs.replication property as 1 for the HBase configuration, add one row to this table, enter the name of the property in the Property column and its value in the Value column. Note: this table is not available when you are using an existing connection. |
| Is by filter | Select this check box to use HBase filters to perform fine-grained data selection from your database. Once you select it, the Filter table used to define the filtering conditions becomes available. These filters are advanced features provided by HBase and are subject to the constraints explained in Apache's HBase documentation. |
| Logical operation | Select the operator you need to use to define the logical relation between the filters you define. |
| Filter | Click the button under this table to add as many rows as required, each row representing a filter. The parameters to set depend on the Filter type you select. |
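Conceptually, each row of the Properties table overrides one Hadoop/HBase configuration key on top of the defaults coming from the cluster. A minimal Python sketch of that merge (not Talend internals; the default values shown are illustrative):

```python
# Default configuration as loaded from the cluster (values illustrative).
base_conf = {"dfs.replication": "3", "hbase.client.retries.number": "35"}

# Rows of the Properties table: one property name and value per row,
# here forcing dfs.replication to 1 as in the example above.
custom_rows = [("dfs.replication", "1")]

# Custom rows take precedence over the defaults.
effective_conf = {**base_conf, **dict(custom_rows)}
```

Properties left out of the table keep their cluster-side defaults.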
Global Variables

NB_LINE: the number of rows read by an input component or transferred to an output component. This is an After variable and it returns an integer.

ERROR_MESSAGE: the error message generated by the component when an error occurs. This is an After variable and it returns a string.

A Flow variable functions during the execution of a component, while an After variable functions after the execution of the component. To fill up a field or expression with a variable, press Ctrl + Space to access the variable list and choose the variable to use from it. For further information about variables, see Talend Studio User Guide.

Usage

This component is a start component of a Job and always needs an output link.

Usage in Map/Reduce Jobs

In a Talend Map/Reduce Job, this component is used as a start component and requires an output link. You need to use the Hadoop Configuration tab in the Run view to define the connection to a given Hadoop distribution for the whole Job. The Hadoop configuration you use for the whole Job and the Hadoop distribution you use for this component must be the same. For further information about a Talend Map/Reduce Job, see the sections describing how to create and configure a Talend Map/Reduce Job in the Talend Big Data Getting Started Guide. Note that in this documentation, unless otherwise explicitly stated, a scenario presents only Standard Jobs, that is to say traditional Talend data integration Jobs.

Prerequisites

Before starting, ensure that you have met the Loopback IP prerequisites expected by HBase. The Hadoop distribution must be properly installed, so as to guarantee the interaction with Talend Studio. For further information about how to install a Hadoop distribution, see the manuals corresponding to the distribution you are using.

Log4j

The activity of this component can be logged using the log4j feature. For more information on this feature, see Talend Studio User Guide. For more information on the log4j logging levels, see the Apache documentation at http://logging.apache.org/log4j/1.2/apidocs/org/apache/log4j/Level.html.
This table presents the HBase filters available in Talend Studio and the parameters required by those filters.
| Filter type | Filter column | Filter family | Filter operation | Filter value | Filter comparator type | Objective |
|---|---|---|---|---|---|---|
| Single Column Value Filter | Yes | Yes | Yes | Yes | Yes | It compares the values of a given column against the value defined for this filter. |
| Family filter | | Yes | Yes | | Yes | It returns the columns of the family that meet the filtering condition. |
| Qualifier filter | Yes | | Yes | | Yes | It returns the columns whose column qualifiers match the filtering condition. |
| Column prefix filter | Yes | Yes | | | | It returns all columns of which the qualifiers have the prefix defined for this filter. |
| Multiple column prefix filter | Yes (multiple prefixes are separated by a comma) | Yes | | | | It works the same way as a Column prefix filter but allows several prefixes to be specified. |
| Column range filter | Yes (the ends of a range are separated by a comma) | Yes | | | | It allows intra-row scanning and returns all matching columns of a scanned row. |
| Row filter | | | Yes | Yes | Yes | It filters on row keys and returns all rows that match the filtering condition. |
| Value filter | | | Yes | Yes | Yes | It returns only columns that have a specific value. |
The use of the HBase filters listed above is subject to revisions made by Apache in its Apache HBase project; therefore, to fully understand how to use these filters, we recommend reading Apache's HBase documentation.
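The same filters exist outside Talend Studio in HBase's textual filter language, accepted by the HBase Thrift and REST gateways. As a small illustration, a Single Column Value Filter comparable to the first row of the table above can be expressed as a filter string (plain Python string building; the family and column names are from this document's scenario, and the helper name is ours):

```python
def single_column_value_filter(family: str, qualifier: str, op: str, value: str) -> str:
    """Build a SingleColumnValueFilter expression in HBase's filter
    language. The 'binary:' prefix selects the binary comparator,
    one of the comparator types the Filter table exposes."""
    return f"SingleColumnValueFilter('{family}', '{qualifier}', {op}, 'binary:{value}')"

# Keep rows whose family1:age cell is >= 30.
f = single_column_value_filter("family1", "age", ">=", "30")
# f == "SingleColumnValueFilter('family1', 'age', >=, 'binary:30')"
```

The operation and comparator arguments map directly onto the Filter operation and Filter comparator type columns of the table above.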
In this scenario, a six-component Job is used to exchange customer data with a given HBase.

The six components are:

- tHBaseConnection: creates a connection to your HBase database.
- tFixedFlowInput: creates the data to be written into your HBase. In a real use case, this component could be replaced by other input components, such as tFileInputDelimited.
- tHBaseOutput: writes the data it receives from the preceding component into your HBase.
- tHBaseInput: extracts the columns of interest from your HBase.
- tLogRow: presents the execution result.
- tHBaseClose: closes the transaction.
To replicate this scenario, proceed as the following sections illustrate.
Note

Before starting the replication, your HBase and Zookeeper services should have been correctly installed and properly configured. This scenario explains only how to use the Talend solution to exchange data with a given HBase.
To do this, proceed as follows:
- Drop tHBaseConnection, tFixedFlowInput, tHBaseOutput, tHBaseInput, tLogRow and tHBaseClose from the Palette onto the design workspace.
- Right-click tHBaseConnection to open its contextual menu and select the Trigger > On Subjob Ok link from this menu to connect this component to tFixedFlowInput.
- Do the same to create the OnSubjobOk link from tFixedFlowInput to tHBaseInput and then to tHBaseClose.
- Right-click tFixedFlowInput and select the Row > Main link to connect this component to tHBaseOutput.
- Do the same to create the Main link from tHBaseInput to tLogRow.

The components to be used in this scenario are all placed and linked. You then need to configure them one after another.
To configure the connection to your Zookeeper service and thus to the HBase of
interest, proceed as follows:
- On the design workspace of your Studio, double-click the tHBaseConnection component to open its Component view.
- Select Hortonworks Data Platform 1.0 from the HBase version list.
- In the Zookeeper quorum field, type in the name or the URL of the Zookeeper service you are using. In this example, the name of the service in use is hbase.
- In the Zookeeper client port field, type in the number of the client listening port. In this example, it is 2181.
- If the Zookeeper znode parent location has been defined in the Hadoop cluster you are connecting to, select the Set zookeeper znode parent check box and enter the value of this property in the field that is displayed.
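For reference, the two Zookeeper settings above correspond to standard HBase client properties. An equivalent client-side hbase-site.xml fragment, using the values from this example, would look like the following (the property names are standard HBase ones; this is a sketch, not a file generated by the Studio):

```xml
<configuration>
  <!-- Zookeeper quorum: "hbase" in this example -->
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>hbase</value>
  </property>
  <!-- Zookeeper client port: 2181 in this example -->
  <property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>2181</value>
  </property>
</configuration>
```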
To do this, proceed as follows:
- On the design workspace, double-click the tFixedFlowInput component to open its Component view.
- In this view, click the three-dot button next to Edit schema to open the schema editor.
- Click the plus button three times to add three rows and, in the Column column, rename them respectively as: id, name and age.
- In the Type column, click each of these rows and, from the drop-down list, select the data type of each row. In this scenario, they are Integer for id and age, and String for name.
- Click OK to validate these changes and accept the propagation prompted by the pop-up dialog box.
- In the Mode area, select Use Inline Content (delimited file) to display the fields for editing.
- In the Content field, type in the delimited data to be written into the HBase, separated with the semicolon ";". In this example, the data is:

```text
1;Albert;23
2;Alexandre;24
3;Alfred-Hubert;22
4;André;40
5;Didier;28
6;Anthony;35
7;Artus;32
8;Benoît;56
9;Catherine;34
10;Charles;21
11;Christophe;36
12;Christian;67
13;Clément;64
14;Danniel;54
15;Elisabeth;58
16;Emile;32
17;Gregory;30
```
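The inline content is plain delimited text: one record per line, fields separated by semicolons. A quick Python sketch of how such records decompose into the id, name and age schema columns (illustrative only; Talend parses this content internally):

```python
# First three records of the sample data above.
content = """1;Albert;23
2;Alexandre;24
3;Alfred-Hubert;22"""

rows = []
for line in content.splitlines():
    id_, name, age = line.split(";")
    # id and age are Integer in the schema, name is String.
    rows.append((int(id_), name, int(age)))
```

Each tuple corresponds to one row that tFixedFlowInput passes to tHBaseOutput.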
- Double-click tHBaseOutput to open its Component view.

  Note: If this component does not have the same schema as the preceding component, a warning icon appears. In this case, click the Sync columns button to retrieve the schema from the preceding component; once done, the warning icon disappears.
- Select the Use an existing connection check box and then select the connection you have configured earlier. In this example, it is tHBaseConnection_1.
- In the Table name field, type in the name of the table to be created in the HBase. In this example, it is customer.
- In the Action on table field, select the action of interest from the drop-down list. In this scenario, select Drop table if exists and create. This way, if a table named customer already exists in the HBase, it will be disabled and deleted before the current table is created.
- Click the Advanced settings tab to open the corresponding view.
- In the Family parameters table, add two rows by clicking the plus button, rename them as family1 and family2 respectively, and leave the other columns empty. These two column families will be created in the HBase using the default family performance options.

  Note: The Family parameters table is available only when the action you have selected in the Action on table field creates a table in HBase. For further information about this Family parameters table, see tHBaseOutput.
- In the Families table of the Basic settings view, enter the family names in the Family name column, each corresponding to the column this family contains. In this example, the id and age columns belong to family1 and the name column to family2.

  Note: These column families should already exist in the HBase to be connected to; if not, you need to define them in the Family parameters table of the Advanced settings view so that they are created at runtime.
To do this, perform the following operations:
- Double-click tHBaseInput to open its Component view.
- Select the Use an existing connection check box and then select the connection you have configured earlier. In this example, it is tHBaseConnection_1.
- Click the three-dot button next to Edit schema to open the schema editor.
- Click the plus button three times to add three rows and rename them as id, name and age respectively in the Column column. This means that you extract these three columns from the HBase.
- Select the type of each of the three columns. In this example, Integer for id and age, and String for name.
- Click OK to validate these changes and accept the propagation prompted by the pop-up dialog box.
- In the Table name field, type in the name of the table from which you extract the columns of interest. In this scenario, the table is customer.
- In the Mapping table, the Column column has already been filled in automatically since the schema was defined, so simply enter the name of each family in the Column family column, each corresponding to the column it contains.
- Double-click tHBaseClose to open its Component view.
- In the Component List field, select the connection you need to close. In this example, it is tHBaseConnection_1.
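The Mapping table pairs each schema column with the column family that stores it, which is how HBase addresses a cell: "family:qualifier". A small Python illustration using this scenario's families (not Talend code):

```python
# Column-family mapping from this scenario: id and age live in family1,
# name lives in family2.
mapping = {"id": "family1", "name": "family2", "age": "family1"}

# Fully qualified HBase column names derived from the mapping.
qualified = [f"{family}:{column}" for column, family in mapping.items()]
# e.g. the id column is read from the HBase column family1:id
```

tHBaseInput uses the same pairing to know which HBase cells feed each schema column.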
To execute this Job, press F6.

Once done, the Run view opens automatically, where you can check the execution result. The columns of interest are extracted and you can process them according to your needs.

Log in to your HBase database to check the customer table this Job has created.