Warning
This component will be available in the Palette of Talend Studio on the condition that you have subscribed to one of the Talend solutions with Big Data.
Component family |
Big Data / Hadoop |
|
Function |
This component allows you to enter personalized Pig code to |
|
Purpose |
tPigCode extends the |
|
Basic settings |
Schema and Edit schema |
A schema is a row description. It defines the number of fields to be processed and passed on to the next component. Since version 5.6, both the Built-In mode and the Repository mode are available. Click Edit schema to make changes to the schema.
|
|
Built-In: You create and store the schema locally for this component only. |
|
Repository: You have already created the schema and stored it in the Repository. |
||
Scripts |
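Type in the Pig scripts you want to execute depending on the task you need to perform. Pig components output tuples and automatically set up an alias for each tuple. The alias syntax is <component name>_<row name>_RESULT, for example tPigCode_1_row2_RESULT. |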
|
Advanced settings |
tStatCatcher Statistics |
Select this check box to gather the Job processing metadata at the Job level as well as at each component level. |
Enable escape |
Select this check box so that you can simply write plain Pig code |
|
Global Variables |
ERROR_MESSAGE: the error message generated by the component when an error occurs. A Flow variable functions during the execution of a component while an After variable functions after the execution of the component. To fill up a field or expression with a variable, press Ctrl + Space to access the variable list. For further information about variables, see Talend Studio User Guide. |
|
Usage |
This component is commonly used as an intermediate step together with ... A tPigCode component can execute ... If a particular .jar file is required to execute a statement, you ... |
|
Prerequisites |
The Hadoop distribution must be properly installed, so as to guarantee the interaction with Talend Studio.
For further information about how to install a Hadoop distribution, see the manuals corresponding to the Hadoop distribution you are using. |
|
Log4j |
The activity of this component can be logged using the log4j feature. For more information on this feature, see Talend Studio User Guide. For more information on the log4j logging levels, see the Apache documentation at http://logging.apache.org/log4j/1.2/apidocs/org/apache/log4j/Level.html. |
|
Limitation |
Knowledge of Pig scripts is required. |
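As a rough illustration of the Scripts property described above, the following sketch shows what a single statement typed into tPigCode might look like. It assumes the aliases follow the pattern used in the scenario on this page (tPigLoad_1_row1_RESULT for the incoming link, tPigCode_1_row2_RESULT for the outgoing one) and that the incoming schema has Name, Country and Age columns in that order. The filter condition itself is hypothetical and only serves as an example.

```pig
-- Hypothetical filter: keep only the rows whose third field (Age) is above 30.
-- The alias names are assumptions; they depend on the component and row
-- names used in your own Job.
tPigCode_1_row2_RESULT = FILTER tPigLoad_1_row1_RESULT BY $2 > 30;
```

The statement reads from the alias set up by the preceding component and assigns its result to the alias of the outgoing row, which is how the next component picks up the data.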
This scenario describes a three-component Job that selects a column of data that
matches the filter condition defined in tPigCode and stores
the result in a local file.
- Drop the following components from the Palette to the design workspace: tPigCode, tPigLoad, tPigStoreResult.
- Right-click tPigLoad to connect it to tPigCode using a Row > Pig Combine connection.
- Right-click tPigCode to connect it to tPigStoreResult using a Row > Pig Combine connection.
- Double-click tPigLoad to open its Basic settings view.
- Click the three-dot button next to Edit schema to add columns for tPigLoad.
- Click the plus button to add Name, Country and Age and click OK to save the setting.
- Select Local from the Mode area.
- Fill in the Input filename field with the full path to the input file. In this scenario, the input file is CustomerList, which contains rows of names, country names and ages.
- Select PigStorage from the Load function list.
- Leave the rest of the settings as they are.
- Double-click the tPigCode component to open its Basic settings view.
- Click Sync columns to retrieve the schema structure from the preceding component.
- Fill in the Script Code field with the following expression:
      tPigCode_1_row2_RESULT = foreach tPigLoad_1_row1_RESULT generate $0 as name;
  This expression selects the Name column from CustomerList.
- Double-click tPigStoreResult to open its Basic settings view.
- Click Sync columns to retrieve the schema structure from the preceding component.
- Fill in the Result file field with the full path to the result file. In this scenario, the result is saved in the Result file.
- Select Remove result directory if exists.
- Select PigStorage from the Store function list.
- Leave the rest of the settings as they are.
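
For readers who want to check the logic outside the Studio, the Job configured above corresponds roughly to the standalone Pig Latin script sketched below. This is not the code the Studio generates. The file paths, the ';' field delimiter, the column types and the relation names (customers, names) are all assumptions made for illustration.

```pig
-- Load the input file. The path, the ';' delimiter and the column types
-- are placeholders, not values taken from the scenario.
customers = LOAD '/path/to/CustomerList' USING PigStorage(';')
            AS (Name:chararray, Country:chararray, Age:int);

-- Same projection as the tPigCode expression: keep only the first field.
names = FOREACH customers GENERATE $0 AS name;

-- Store the result. Pig fails if the output directory already exists,
-- which is what the "Remove result directory if exists" option guards against.
STORE names INTO '/path/to/Result' USING PigStorage(';');
```

Saved to a file (for example a hypothetical select_name.pig), a script like this can be run in local mode with `pig -x local select_name.pig`, which matches the Local mode selected in tPigLoad.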