tKinesisInputAvro
Acts as a consumer of an Amazon Kinesis stream and pulls messages from it.
Using the Kinesis Client Library (KCL) provided by Amazon, tKinesisInputAvro consumes Avro-formatted
data from a given Amazon Kinesis stream (an ordered sequence of data
records), builds an RDD from this data and sends the RDD to the
components that follow it.
tKinesisInputAvro properties for Apache Spark Streaming
These properties are used to configure tKinesisInputAvro running in the Spark Streaming Job framework.
The Spark Streaming tKinesisInputAvro component belongs to the Messaging family.
The streaming version of this component is available in Talend Real Time Big Data Platform and in
Talend Data Fabric.
Basic settings
Schema and Edit |
A schema is a row description. It defines the number of fields |
Access key |
Enter the access key ID that uniquely identifies an AWS |
Secret key |
Enter the secret access key, constituting the security To enter the password, click the […] button next to the |
Stream name |
Enter the name of the Kinesis stream you want tKinesisInputAvro to pull data from. |
Endpoint URL |
Enter the endpoint of the Kinesis service to be used. For example, https://kinesis.us-east-1.amazonaws.com. More valid Kinesis endpoint URLs |
Explicitly set authentication |
Select this check box to use the explicit authentication mechanism to connect to Kinesis. Since this security mechanism requires the AWS Region parameter to be explicitly set, you It is recommended to use the explicit authentication to gain better security when the While if you leave this check box clear, an older authentication mechanism is used. This |
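Because the explicit authentication mechanism needs an AWS Region in addition to the Endpoint URL, the two values have to agree. The following is a minimal Python sketch, not part of the component, that builds the standard regional endpoint from a region code and reads the region back from such an endpoint. The helper names kinesis_endpoint and region_from_endpoint are hypothetical, and the sketch assumes the standard kinesis.<region>.amazonaws.com host pattern shown in the example above; endpoints that follow another pattern are rejected rather than guessed.

```python
from urllib.parse import urlparse


def kinesis_endpoint(region: str) -> str:
    """Build the standard regional Kinesis endpoint URL for a region code."""
    return f"https://kinesis.{region}.amazonaws.com"


def region_from_endpoint(endpoint_url: str) -> str:
    """Extract the region code from a standard regional Kinesis endpoint URL.

    Raises ValueError when the host does not follow the
    kinesis.<region>.amazonaws.com pattern (assumption of this sketch).
    """
    host = urlparse(endpoint_url).hostname or ""
    parts = host.split(".")
    if len(parts) != 4 or parts[0] != "kinesis" or parts[2:] != ["amazonaws", "com"]:
        raise ValueError(f"not a standard Kinesis endpoint: {endpoint_url!r}")
    return parts[1]


if __name__ == "__main__":
    url = kinesis_endpoint("us-east-1")
    print(url, region_from_endpoint(url))
```

A check of this kind can be run before a Job is configured, so that a mismatch between the endpoint and the region is caught early.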
Advanced settings
Checkpoint interval |
Enter the time interval (in milliseconds) at the end of which tKinesisInputAvro saves its read position in the Kinesis stream. Data records in a Kinesis stream are grouped into partitions (shards in terms of Kinesis) |
Initial position stream |
Select the starting position to read data from the stream in the absence of the Kinesis
checkpoint information.
|
Storage level |
Select how you want the received data to be cached. For further information about the |
Use hierarchical mode |
Select this check box to map the binary (including hierarchical) Avro schema to the Once you select it, you need to set the following parameter(s):
|
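The Checkpoint interval and Initial position stream settings described above work together: a saved checkpoint decides where reading resumes, and the initial position applies only when no checkpoint exists. The following Python sketch is a toy model of that behavior for a single shard, not the KCL implementation. The function names and the record layout (timestamp in milliseconds, payload) are hypothetical; the position names TRIM_HORIZON and LATEST are taken from the KCL InitialPositionInStream values, and the options actually offered by the component are not listed in this section.

```python
from typing import List, Optional, Tuple

Record = Tuple[int, str]  # (arrival time in ms, payload), in stream order


def start_index(num_records: int, checkpoint: Optional[int], initial_position: str) -> int:
    """Return the index of the first record to read from one shard."""
    if checkpoint is not None:
        return checkpoint + 1          # resume right after the saved position
    if initial_position == "TRIM_HORIZON":
        return 0                       # oldest record still in the shard
    if initial_position == "LATEST":
        return num_records             # only records that arrive from now on
    raise ValueError(f"unknown initial position: {initial_position}")


def consume(records: List[Record], checkpoint_interval_ms: int,
            checkpoint: Optional[int] = None,
            initial_position: str = "TRIM_HORIZON") -> Tuple[List[str], Optional[int]]:
    """Read records and save the read position once per checkpoint interval.

    Returns the payloads read and the last saved checkpoint (a record index).
    """
    read: List[str] = []
    last_saved_at: Optional[int] = None
    for i in range(start_index(len(records), checkpoint, initial_position), len(records)):
        ts, payload = records[i]
        read.append(payload)
        if last_saved_at is None:
            last_saved_at = ts         # the interval clock starts at the first record read
        if ts - last_saved_at >= checkpoint_interval_ms:
            checkpoint = i             # save the position of the read
            last_saved_at = ts
    return read, checkpoint


if __name__ == "__main__":
    shard = [(0, "a"), (400, "b"), (1000, "c"), (1300, "d")]
    payloads, saved = consume(shard, checkpoint_interval_ms=1000)
    print(payloads, saved)
    # A restart resumes after the saved position, so later records are read again.
    print(consume(shard, 1000, checkpoint=saved))
```

In this model a shorter interval means fewer records are read again after a restart, at the cost of saving the position more often.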
Usage
Usage rule |
This component is used as a start component and requires an output link. At runtime, this component keeps listening to the stream and reads new messages once they This component, along with the Spark Streaming component Palette it belongs to, appears Note that in this documentation, unless otherwise explicitly stated, a scenario presents |
Spark Connection |
In the Spark Configuration tab in the Run view, define the connection to a given Spark cluster for the whole Job. In addition, since the Job expects its dependent jar files for execution, you must specify the directory in the file system to which these jar files are transferred so that Spark can access these files:
This connection is effective on a per-Job basis. |
Limitation |
Due to license incompatibility, one or more JARs required to use |
Related scenarios
No scenario is available for the Spark Streaming version of this component
yet.