tWindow
Applies a given Spark window on the incoming RDDs and sends the window-based RDDs
to its following component.
tWindow enables the Spark Job you are designing to perform window operations. For further information about Spark windows, see Window Operations in the Apache Spark Streaming documentation.
tWindow properties for Apache Spark Streaming
These properties are used to configure tWindow running in the Spark Streaming Job framework.
The Spark Streaming tWindow component belongs to the Processing family.
The streaming version of this component is available in Talend Real Time Big Data Platform and in
Talend Data Fabric.
Basic settings
Window duration |
Enter, without quotation marks, the duration (in milliseconds) that defines the length of the window. For example, if the batch size defined in the Spark Configuration tab is 2 seconds, a ... |
Define the slide duration |
Select the Define the slide duration check box and in the ... For example, if the batch size defined in the Spark ... If you leave this check box clear, the slide duration is assumed to be the batch size. Both the window duration and the slide duration must be multiples of the batch size defined in the Spark Configuration tab. |
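The two rules above (the slide duration defaults to the batch size, and both durations must be multiples of the batch size) can be illustrated with a small plain-Python simulation. This is only a sketch, not the code that tWindow generates: the function names `resolve_durations` and `simulate_windows` are hypothetical, and the handling of the first, partially filled windows is an assumption modeled on how Spark Streaming window operations are generally described.

```python
from typing import List, Optional, Sequence, Tuple


def resolve_durations(batch_ms: int, window_ms: int,
                      slide_ms: Optional[int] = None) -> Tuple[int, int]:
    """Return (window, slide) expressed as a number of batches.

    Hypothetical helper: applies the defaulting and multiple-of-batch
    rules described in the Basic settings table.
    """
    if batch_ms <= 0:
        raise ValueError("batch size must be a positive number of milliseconds")
    # Check box left clear: the slide duration is assumed to be the batch size.
    if slide_ms is None:
        slide_ms = batch_ms
    for name, value in (("window", window_ms), ("slide", slide_ms)):
        if value <= 0 or value % batch_ms != 0:
            raise ValueError(
                f"{name} duration {value} ms is not a positive multiple "
                f"of the batch size {batch_ms} ms"
            )
    return window_ms // batch_ms, slide_ms // batch_ms


def simulate_windows(batches: Sequence[str], batch_ms: int, window_ms: int,
                     slide_ms: Optional[int] = None) -> List[List[str]]:
    """List the batches that each emitted window would contain.

    Assumption: a window is emitted at every slide interval and holds the
    batches received during the last `window_ms` milliseconds, so the
    earliest windows may hold fewer batches than later ones.
    """
    window_len, slide_len = resolve_durations(batch_ms, window_ms, slide_ms)
    windows = []
    end = slide_len
    while end <= len(batches):
        start = max(0, end - window_len)
        windows.append(list(batches[start:end]))
        end += slide_len
    return windows
```

With a 2-second batch size, a 6000 ms window and a 4000 ms slide, `simulate_windows(["b1", "b2", "b3", "b4", "b5", "b6"], 2000, 6000, 4000)` groups the batches as `[b1, b2]`, `[b2, b3, b4]` and `[b4, b5, b6]`, while a 5000 ms window is rejected because it is not a multiple of the batch size.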
Usage
Usage rule |
This component is used as an intermediate step. This component does not change the data schema but controls the pace of the processing of ... This component, along with the Spark Streaming component Palette it belongs to, appears ... Note that in this documentation, unless otherwise explicitly stated, a scenario presents ... |
Spark Connection |
In the Spark Configuration tab in the Run view, define the connection to a given Spark cluster for the whole Job. In addition, since the Job expects its dependent jar files for execution, you must specify the directory in the file system to which these jar files are transferred so that Spark can access these files. This connection is effective on a per-Job basis. |
Related scenarios
For a related scenario, see Analyzing a Twitter flow in near real-time.