tPigCode
Extends the functionalities of a
Talend
Job through using Pig scripts.
The tPigCode enters personalized Pig code to integrate
it in
Talend
program. You can execute this code only once.
tPigCode Standard properties
These properties are used to configure tPigCode running in the Standard Job framework.
The Standard
tPigCode component belongs to the Big Data and the Processing families.
The component in this framework is available when you are using one of the Talend solutions with Big Data.
Basic settings
|
Schema and Edit |
A schema is a row description. It defines the number of fields (columns) to Click Edit schema to make changes to the schema.
|
|
|
Built-In: You create and store the |
|
Repository: You have already created |
|
|
Scripts |
Type in Pig scripts you want to execute depending on the task you Pig components output tuples and automatically set up an alias for The alias syntax is |
Advanced settings
|
tStatCatcher Statistics |
Select this check box to gather the Job processing metadata at the |
|
Enable escape |
Select this check box so that you can simply write plain Pig code |
Global Variables
|
Global Variables |
ERROR_MESSAGE: the error message generated by the A Flow variable functions during the execution of a component while an After variable To fill up a field or expression with a variable, press Ctrl + For further information about variables, see |
Usage
|
Usage rule |
This component is commonly used as intermediate step together with A tPigCode component can execute If a particular .jar file is required to execute a statement, you |
|
Prerequisites |
The Hadoop distribution must be properly installed, so as to guarantee the interaction
For further information about how to install a Hadoop distribution, see the manuals |
|
Limitation |
Knowledge of Pig scripts is required. |
Scenario: Selecting a column of data from an input file and store it into a local
file
This scenario applies only to a Talend solution with Big Data.
This scenario describes a three-component Job that selects a column of data that
matches filter condition defined in tPigCode and stores
the result into a local file.
Setting up the Job
-
Drop the following components from the Palette to the design workspace: tPigCode, tPigLoad,
tPigStoreResult. -
Right-click tPigLoad to connect it to
tPigCode using a Row > Pig Combine connection. -
Right-click tPigCode to connect it to
tPigStoreResult using a Row > Pig Combine connection.
Loading the data
-
Double-click tPigLoad to open its
Basic settings view.
-
Click the three-dot button next to Edit
schema to add columns for tPigLoad.
-
Click the plus button to add Name,
Country and Age and click
OK to save the setting. - Select Local from the Mode area.
-
Fill in the Input filename field with the
full path to the input file.In this scenario, the input file is CustomerList
which contains rows of names, country names and age. - Select PigStorage from the Load function list.
- Leave rest of the settings as they are.
Configuring the tPigCode component
-
Double-click tPigCode component to open
its Basic settings view.
-
Click Sync columns to retrieve the schema
structure from the preceding component. -
Fill in the Script Code field with
following expression:This1tPigCode_1_row2_RESULT = foreach tPigLoad_1_row1_RESULT generate $0 as name;
filter expression selects column Name from
CustomerList.
Saving the result data to a local file
-
Double-click tPigStoreResult to open its
Basic settings view.
-
Click Sync columns to retrieve the schema
structure from the preceding component. -
Fill in the Result file field with the
full path to the result file.In this scenario, the result is saved in Result
file. -
Select Remove result directory if
exists. - Select PigStorage from the Store function list.
- Leave rest of the settings as they are.
Executing the Job
Save your Job and press F6 to run it.
The Result file is generated containing the selected column
of data.