Acts as an interface to integrate Flume and the Spark Streaming Job developed with the
Studio to continuously read data from a given Flume agent.
tFlumeInput continuously reads data from a given Flume agent and sends this data to the components that follow it.
tFlumeInput properties for Apache Spark Streaming
These properties are used to configure tFlumeInput running in the Spark Streaming Job framework.
The Spark Streaming
tFlumeInput component belongs to the Messaging family.
Host and Port
Enter the hostname and the port of the machine used as the sink (the data output point) of the Flume system to be used.
Select the approach used to read data from Flume: the push-based approach, in which the Flume agent pushes data to a receiver set up by the Spark Streaming Job, or the pull-based approach, in which the Job pulls data from a custom Flume sink.
For further information about these two approaches, see https://spark.apache.org/docs/1.3.1/streaming-flume-integration.html.
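As a rough conceptual illustration of the difference between the two approaches (this is a hedged sketch only; the classes and method names below are invented for illustration and are not the code the Studio generates, nor the Spark Streaming Flume API):

```python
from collections import deque

class PushReceiver:
    """Push-based approach: the Flume agent pushes events to a receiver
    that the streaming job exposes on a known host and port."""
    def __init__(self):
        self.received = []

    def on_event(self, event):
        # Called by the Flume agent when it delivers an event.
        self.received.append(event)

class PollableSink:
    """Pull-based approach: events are buffered in a custom Flume sink,
    and the streaming job pulls batches from it at its own pace."""
    def __init__(self):
        self._buffer = deque()

    def append(self, event):
        # The Flume agent writes events into the sink's buffer.
        self._buffer.append(event)

    def poll(self, max_events):
        # The streaming job drains up to max_events buffered events.
        batch = []
        while self._buffer and len(batch) < max_events:
            batch.append(self._buffer.popleft())
        return batch

# Push: the Flume agent drives delivery to the listening receiver.
receiver = PushReceiver()
receiver.on_event(b"event-1")

# Pull: the streaming job drives delivery by polling the buffering sink.
sink = PollableSink()
sink.append(b"event-2")
batch = sink.poll(max_events=10)
```

In the push-based approach the Job must be listening before the agent sends data; in the pull-based approach the events wait in the sink until the Job pulls them.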
Schema and Edit schema
A schema is a row description. It defines the number of fields (columns) to be processed and passed on to the next component.
Built-In: You create and store the schema locally for this component only.
Repository: You have already created the schema and stored it in the Repository. You can reuse it in various projects and Job designs.
This read-only line column is used by tFlumeInput to automatically extract the body of an input Flume event.
Select the encoding from the list or select Custom and define it manually.
This encoding is used by tFlumeInput to decode the body of the input Flume events.
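To make the roles of the line column and the selected encoding concrete, here is a hedged Python sketch. The event structure and field names are illustrative only (a Flume event carries headers plus a raw byte body); the point is that the body bytes are decoded into a single string field using the encoding chosen in the component:

```python
# Illustrative only: a Flume event modeled as headers plus a raw byte body.
event = {
    "headers": {"host": "agent-1", "timestamp": "1408064251"},
    "body": "caf\u00e9 au lait".encode("ISO-8859-1"),  # raw bytes on the wire
}

selected_encoding = "ISO-8859-1"  # the encoding selected in the component

# The read-only `line` column receives the decoded event body as one string.
line = event["body"].decode(selected_encoding)
row = {"line": line}
```

Choosing the wrong encoding here would not fail silently for non-ASCII bytes: the decode step would either raise an error or produce garbled characters in the line column.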
This component is used as a start component and requires an output link.
At runtime, the tFlumeInput component keeps listening to the sink and reads new events as they arrive.
This component, along with the Spark Streaming component Palette it belongs to, appears only when you are creating a Spark Streaming Job.
Note that in this documentation, unless otherwise explicitly stated, a scenario presents only Standard Jobs, that is to say traditional Talend data integration Jobs.
In the Spark Configuration tab in the Run view, define the connection to a given Spark cluster for the whole Job. In addition, since the Job expects its dependent jar files for execution, you must specify the directory in the file system to which these jar files are transferred so that Spark can access these files.
This connection is effective on a per-Job basis.
Due to license incompatibility, one or more JARs required to use this component are not provided. You can install the missing JARs for this particular component by clicking the Install button on the Component tab view.
No scenario is available for the Spark Streaming version of this component yet.