tPigStoreResult
tPigStoreResult Standard properties
These properties are used to configure tPigStoreResult running in the Standard Job framework.
The Standard tPigStoreResult component belongs to the Big Data and the Processing families.
The component in this framework is available when you are using one of the Talend solutions with Big Data.
Basic settings
Property type
Either Built-In or Repository. If you select Repository, a [...] button appears; you can click it to select the centrally stored properties, and the related fields are then filled in automatically. Otherwise, if you select Built-In, you need to manually set each of the properties.
Schema and Edit schema
A schema is a row description. It defines the number of fields (columns) to be processed and passed on to the next component. Click Edit schema to make changes to the schema.
Built-In: You create and store the schema locally for this component only.
Repository: You have already created the schema and stored it in the Repository. You can reuse it in various projects and Job designs.
Use S3 endpoint
Select this check box to write data into a given Amazon S3 bucket. Once this Use S3 endpoint check box is selected, you need to enter the S3 connection parameters, such as the access credentials and the bucket name, in the fields that appear.
Note that the format of the S3 file is S3N (S3 Native Filesystem).
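For illustration, storing to an S3N location amounts to a Pig STORE statement targeting an s3n:// URI. The bucket and folder names below are hypothetical placeholders, and the separator is only an example:

```pig
-- Hypothetical sketch of the STORE statement behind the component when
-- writing through the S3N scheme; bucket and path are placeholders.
STORE result INTO 's3n://my-bucket/pig/output' USING PigStorage(';');
```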
Result folder URI
Select the path to the result file in which data is stored.
Remove result directory if exists
Select this check box to remove an existing result directory.
Note: This check box is disabled when you select HCatStorer from the Store function list.
Store function
Select a store function for the data to be stored, for example PigStorage, HCatStorer, HBaseStorage, or SequenceFileStorage.
Note that when the file format to be used is PARQUET, you might be prompted to find the specific Parquet jar file and install it into the Studio. This jar file can be downloaded from Apache's site. You can find more details about how to install external modules in Talend Help Center (https://help.talend.com).
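As a sketch, the PigStorage store function corresponds to a plain Pig STORE statement. The relation name, paths, and separator below are illustrative only:

```pig
-- Illustrative Pig script for the PigStorage store function; the input
-- path, output path, and schema are placeholders.
raw = LOAD '/user/talend/in/customers.csv' USING PigStorage(';')
      AS (id:int, name:chararray, city:chararray);

-- The output directory must not already exist unless it is removed first,
-- which is what the Remove result directory if exists option handles.
STORE raw INTO '/user/talend/out/customers' USING PigStorage(';');
```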
HCatalog Configuration
Fill the following fields to configure HCatalog managed tables on HDFS (Hadoop Distributed File System):
Distribution and Version: Select the Hadoop distribution to which you have defined the connection in the tPigLoad component of the same Job.
HCat metastore: Enter the location of the HCatalog metastore.
Database: The database in which the data is to be stored.
Table: The table in which the data is to be stored.
Partition filter: Fill this field with the partition keys to list partitions by filter.
Note: The HCatalog Configuration area is enabled only when you select HCatStorer from the Store function list.
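The HCatStorer function boils down to a Pig STORE statement such as the following sketch. The database, table, and partition values are placeholders, and the fully qualified storer class name may differ between HCatalog versions and distributions:

```pig
-- Hypothetical HCatStorer call; 'talend_db.sales' and the partition
-- specification 'country=FR' are placeholders.
STORE result INTO 'talend_db.sales'
  USING org.apache.hive.hcatalog.pig.HCatStorer('country=FR');
```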
HBase configuration
This area is available to the HBaseStorage function.
Distribution and Version: Select the Hadoop distribution to which you have defined the connection in the tPigLoad component of the same Job.
Zookeeper quorum: Type in the name or the URL of the Zookeeper service you use to coordinate the transaction between Talend and HBase.
Zookeeper client port: Type in the number of the client listening port of the Zookeeper service you are using.
Table name: Enter the name of the HBase table you need to store data in.
Row key column: Select the column used as the row key column of the HBase table.
Store row key column to HBase column: Select this check box to make the row key column an HBase column.
Mapping: Complete this table to map the columns of the table to be used with the schema columns you have defined for the data flow to be processed. The Column column of this table is automatically filled once you have defined the schema.
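Under the hood, this configuration corresponds to Pig's HBaseStorage storer. In the following sketch, the table name and the column-family mapping are placeholders; the first column of the relation is written as the row key:

```pig
-- Illustrative HBaseStorage store; 'customers' and the 'info' column
-- family are placeholders for your own table and mapping.
STORE result INTO 'hbase://customers'
  USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('info:name info:city');
```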
Field separator
Enter a character, string, or regular expression to separate fields for the transferred data.
Note: This field is enabled only when you select PigStorage from the Store function list.
Sequence Storage configuration
This area is available only to the SequenceFileStorage function. Since a SequenceFile record consists of key/value pairs, select the schema columns holding the key and the value:
Key column: Select the Key column of a key/value record.
Value column: Select the Value column of a key/value record.
Advanced settings
Register jar
Click the [+] button to add rows to the table and, from these rows, browse to the jar files to be registered.
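Each row added in this table amounts to a Pig REGISTER statement in the generated script. The jar path below is a placeholder:

```pig
-- Registering an external jar so its UDFs and storers become available;
-- the path is illustrative only.
REGISTER /opt/talend/lib/piggybank.jar;
```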
HBaseStorage configuration
Add and set more HBaseStorage storer options in this table. The available options include:
loadKey: enter true to store the row key as the first column of the schema;
gt: the minimum key value (exclusive);
lt: the maximum key value (exclusive);
gte: the minimum key value (inclusive);
lte: the maximum key value (inclusive);
limit: the maximum number of rows to retrieve per region;
caching: the number of rows to cache;
caster: the converter to use for reading and writing values.
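Options set in this table are passed to HBaseStorage as its second constructor argument, as in the following sketch. The table name, column mapping, and option values are placeholders:

```pig
-- Hypothetical example of passing storer options to HBaseStorage;
-- '-loadKey true' stores the row key, '-caster' selects the converter.
STORE result INTO 'hbase://customers'
  USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
    'info:name info:city', '-loadKey true -caster HBaseBinaryConverter');
```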
Define the jars to register
This check box appears when you are using HCatStorer. By default, you can leave it clear, as the required jar files are registered automatically.
tStatCatcher Statistics
Select this check box to gather the Job processing metadata at the Job level as well as at each component level.
Global Variables
Global Variables
ERROR_MESSAGE: the error message generated by the component when an error occurs. This is an After variable and it returns a string.
A Flow variable functions during the execution of a component, while an After variable functions after the execution of the component.
To fill up a field or expression with a variable, press Ctrl + Space to access the variable list and choose the variable to use from it.
For further information about variables, see Talend Studio User Guide.
Usage
Usage rule
This component is always used to end a Pig process and needs tPigLoad at the beginning of the process to read the data. It automatically reuses the connection created by that tPigLoad component. Note that additional configuration may be required if you use a Hortonworks distribution.
Prerequisites
The Hadoop distribution must be properly installed, so as to guarantee the interaction with Talend Studio. For further information about how to install a Hadoop distribution, see the manuals corresponding to the distribution you are using.
Limitation
Knowledge of Pig scripts is required. If you select HCatStorer as the store function, knowledge of HCatalog DDL (HCatalog Data Definition Language, a subset of Hive Data Definition Language) is also required.
Related Scenario
- For a related scenario in which tPigStoreResult uses the Local mode, see Scenario: Sorting data in ascending order of tPigSort.
- For a related scenario in which tPigStoreResult uses the Map/Reduce mode, see Scenario: Loading an HBase table.
