tPigDistinct
tPigDistinct Standard properties
These properties are used to configure tPigDistinct running in the Standard Job framework.
The Standard tPigDistinct component belongs to the Big Data and the Processing families.
The component in this framework is available when you are using one of the Talend solutions with Big Data.
Basic settings
|
Schema and Edit schema |
A schema is a row description that defines the number of fields (columns). Click Edit schema to make changes to the schema.
|
|
|
Built-in: The schema will be |
|
|
Repository: The schema already |
Advanced settings
|
Increase parallelism |
Select this check box to set the number of reduce tasks for the |
|
tStatCatcher Statistics |
Select this check box to gather the Job processing metadata at the |
Global Variables
|
Global Variables |
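ERROR_MESSAGE: the error message generated by the component when an error occurs.
A Flow variable functions during the execution of a component, while an After variable functions after the execution of the component.
To fill up a field or expression with a variable, press Ctrl + Space to access the variable list and choose the variable to use from it.
For further information about variables, see |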
Usage
|
Usage rule |
This component is commonly used as an intermediate step together with
Warning: This component will not maintain the original order in |
|
Prerequisites |
The Hadoop distribution must be properly installed, so as to guarantee the interaction
For further information about how to install a Hadoop distribution, see the manuals |
|
Limitation |
Knowledge of Pig scripts is required. |
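The warning in the usage rule can be illustrated with a small language-neutral sketch. The Python below is not Talend or Pig code; it only mimics the behavior of a distinct step under the assumption that duplicates are whole-row duplicates. The function names and the sample rows are hypothetical.

```python
# Illustrative only: mimics the semantics of a distinct step.
# This is neither Talend-generated code nor Pig Latin.

def distinct_rows(rows):
    """Return one copy of each row. As with the component, the output
    order is unspecified and may differ from the input order."""
    return list(set(rows))


def distinct_rows_sorted(rows, key=None):
    """Deduplicate, then impose an explicit order, because the distinct
    step itself does not keep the original one."""
    return sorted(set(rows), key=key)


# Hypothetical sample data: (name, age) rows with two duplicates.
rows = [
    ("alice", 30),
    ("bob", 25),
    ("alice", 30),
    ("carol", 41),
    ("bob", 25),
]

unique = distinct_rows(rows)          # three rows, in no guaranteed order
ordered = distinct_rows_sorted(rows)  # three rows, sorted by name then age

print(len(unique))
print(ordered)
```

If a downstream step depends on row order, the sketch suggests sorting explicitly after the distinct step instead of relying on the input order.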
Related scenario
For more information about the tPigDistinct component in use, see Scenario: Filtering rows of data based on a condition and saving the result to a local file of tPigFilterRow.