tPartition
Allows you to visually define how an input dataset is partitioned.
The tPartition splits the input dataset into a given number of partitions.
tPartition properties for Apache Spark Batch
These properties are used to configure tPartition running in the Spark Batch Job framework.
The Spark Batch tPartition component belongs to the Processing family.
The component in this framework is available in all Talend products with Big Data and in Talend Data Fabric.
Basic settings
Schema and Edit schema | A schema is a row description. It defines the number of fields (columns) to be processed and passed on to the next component. Click Edit schema to make changes to the schema. Click Sync columns to retrieve the schema from the previous component connected in the Job. |
Number of partitions | Enter the number of partitions into which you want to split the input dataset. |
Partition key | Complete this table to define the key to be used for the partitioning. In the Partition key table, the schema columns are automatically listed; select the check box for each column you want to use as part of the key. This partitioning proceeds in hash mode, that is to say, records meeting the same criteria (the key) are dispatched into the same partition (see the hash-partitioning sketch after this table). |
Use custom partitioner | Select this check box to use a Spark partitioner that you import from outside the Studio, for example, a partitioner you have developed yourself. In this situation, you need to give the following information: the fully qualified class name (FQCN) of the partitioner to be used and the total number of partitions to be created. A sketch of such a partitioner follows this table. |
Sort within partitions | Select this check box to sort the records within each partition. This feature is useful when a partition contains several distinct key values (see the sorting sketch after this table). |
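To make the hash mode concrete, here is a minimal plain-Spark sketch in Java, not code generated by the Studio; the local JavaSparkContext, the sample records, and the partition count of 3 are illustrative. Records sharing a key always land in the same partition, which is exactly what the Partition key table configures.

```java
import java.util.Arrays;

import org.apache.spark.HashPartitioner;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;

import scala.Tuple2;

public class HashPartitionSketch {
    public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext("local[*]", "hash-partition-sketch");

        // Keyed records: the key plays the role of the Partition key column(s).
        JavaPairRDD<String, Integer> rows = sc.parallelizePairs(Arrays.asList(
                new Tuple2<>("FR", 1), new Tuple2<>("US", 2),
                new Tuple2<>("FR", 3), new Tuple2<>("DE", 4)));

        // Hash mode: a record goes to partition hash(key) mod numPartitions,
        // so both "FR" records end up in the same partition.
        JavaPairRDD<String, Integer> partitioned =
                rows.partitionBy(new HashPartitioner(3)); // 3 = Number of partitions

        System.out.println(partitioned.getNumPartitions()); // prints 3
        sc.close();
    }
}
```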
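A custom partitioner for Spark is a class that extends org.apache.spark.Partitioner and implements its two abstract methods, numPartitions() and getPartition(). The following hypothetical sketch shows the shape of a class that could be compiled, packaged in a jar, and then imported into the Studio; the class name CountryPartitioner and its routing rule are purely illustrative.

```java
import org.apache.spark.Partitioner;

// Hypothetical partitioner: routes one business-critical key to its own
// partition and hashes everything else. Package this in a jar to import it.
public class CountryPartitioner extends Partitioner {
    private final int numPartitions;

    public CountryPartitioner(int numPartitions) {
        this.numPartitions = numPartitions;
    }

    @Override
    public int numPartitions() {
        return numPartitions;
    }

    @Override
    public int getPartition(Object key) {
        if ("FR".equals(key)) {
            return 0; // dedicate partition 0 to this key
        }
        int h = (key == null ? 0 : key.hashCode()) % numPartitions;
        return h < 0 ? h + numPartitions : h; // keep the result non-negative
    }
}
```

In plain Spark such a class would be used as rows.partitionBy(new CountryPartitioner(3)); in the Studio it would be referenced by its fully qualified class name once its jar has been added to the Job.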
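When sorting is combined with partitioning, plain Spark offers repartitionAndSortWithinPartitions, which performs the repartitioning and the per-partition sort by key in a single shuffle. The fragment below continues the hash-partitioning sketch above and reuses its rows pair RDD.

```java
// Continues the hash-partitioning sketch above (same imports, same `rows`).
// One shuffle both repartitions and sorts each partition by key, which is
// cheaper than partitioning first and sorting the records afterwards.
JavaPairRDD<String, Integer> sorted =
        rows.repartitionAndSortWithinPartitions(new HashPartitioner(3));
```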
Usage
Usage rule | This component is used as an intermediate step. This component, along with the Spark Batch component Palette it belongs to, appears only when you are creating a Spark Batch Job. Note that in this documentation, unless otherwise explicitly stated, a scenario presents only Standard Jobs, that is to say traditional Talend data integration Jobs. |
Spark Connection | In the Spark Configuration tab in the Run view, define the connection to a given Spark cluster for the whole Job. In addition, since the Job expects its dependent jar files for execution, you must specify the directory in the file system to which these jar files are transferred so that Spark can access them. This connection is effective on a per-Job basis (see the connection sketch after this table). |
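For orientation only, the connection that the Spark Configuration tab defines corresponds roughly to the following plain-Spark settings; the master URL and the jar path are placeholders, not values the Studio uses.

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class ConnectionSketch {
    public static void main(String[] args) {
        // Rough plain-Spark analogue of the Spark Configuration tab:
        // which cluster to connect to and where the dependent jars live.
        SparkConf conf = new SparkConf()
                .setAppName("partition-job")                          // illustrative name
                .setMaster("spark://master-host:7077")                // placeholder cluster URL
                .set("spark.jars", "hdfs:///tmp/deps/job-deps.jar");  // placeholder jar location

        JavaSparkContext sc = new JavaSparkContext(conf);
        sc.close();
    }
}
```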
Related scenarios
No scenario is available for the Spark Batch version of this component yet.