August 16, 2023

Using the tDataprepRun component to apply a preparation to a data sample in an Apache Spark Streaming Job – Docs for ESB 6.x


This scenario applies only to Talend Real-time Big Data Platform or Talend Data Fabric.

The tDataprepRun component lets you reuse an existing preparation made in
Talend Data Preparation directly in a Big Data Job. In other words, you can
operationalize the process of applying a preparation to input data that shares
the same model.

The following scenario creates a simple Job that:

  • reads a small sample of customer data,
  • applies an existing preparation to this data,
  • shows the result of the execution in the console.

Use_Case_tDataprepRun_spark_batch_1.png

This assumes that a preparation has been created beforehand, on a dataset with the same
schema as your input data for the Job. In this case, the existing preparation is called
datapreprun_spark. This simple preparation puts the customer last names
into upper case and applies a filter to isolate the customers from California, Texas and
Florida.
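
The effect of such a preparation can be sketched in plain Python. This is only an illustration of the two steps described above (uppercasing last names and filtering by state), not the tDataprepRun component or any Talend API. The field names `first_name`, `last_name` and `state`, the use of full state names, and the two records are all hypothetical and are not taken from the scenario's sample data.

```python
# Illustrative sketch only: mimics what the datapreprun_spark preparation
# is described as doing. Field names and records are hypothetical.

# Assumption: states are stored as full names rather than two-letter codes.
TARGET_STATES = {"California", "Texas", "Florida"}


def apply_preparation(records):
    """Keep customers from the target states and uppercase their last names."""
    prepared = []
    for record in records:
        if record["state"] not in TARGET_STATES:
            continue  # filter step: drop customers from other states
        # uppercase step: copy the record so the input is left unchanged
        prepared.append({**record, "last_name": record["last_name"].upper()})
    return prepared


# Hypothetical input rows, invented for this sketch.
sample = [
    {"first_name": "Ada", "last_name": "Lovelace", "state": "Texas"},
    {"first_name": "Alan", "last_name": "Turing", "state": "Ohio"},
]

for row in apply_preparation(sample):
    print(row)
```

In the actual Job, the preparation is fetched from Talend Data Preparation and applied by the component, so no such logic has to be written by hand.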

Use_Case_tDataprepRun_spark_batch_2.png
The sample data reads as
follows:
Note: The sample data is created for demonstration purposes only.

Prerequisite: ensure that the Spark
cluster has been properly installed and is running.


Document obtained from Talend: https://help.talend.com