August 16, 2023

Using the tDataprepRun component to apply a preparation to a data sample in an Apache Spark Batch Job – Docs for ESB 6.x


This scenario applies only to subscription-based Talend solutions with Big Data.

The tDataprepRun component lets you reuse an existing preparation made in
Talend Data Preparation directly in a Big Data Job. In other words, you can
operationalize a preparation by applying it to any input data that shares the
same model.

The following scenario creates a simple Job that:

  • reads a small sample of customer data,
  • applies an existing preparation to this data,
  • shows the result of the execution in the console.
Use_Case_tDataprepRun_spark_batch_1.png

This scenario assumes that a preparation has been created beforehand, on a
dataset with the same schema as the input data for the Job. In this case, the
existing preparation is called datapreprun_spark. This simple preparation
converts the customer last names to upper case and applies a filter to keep
only the customers from California, Texas and Florida.
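
To make the effect of the preparation concrete, the two steps described above can be sketched in plain Python. This is only an illustration of the logic, not how tDataprepRun is configured or executed: the records, the field names (first_name, last_name, state) and the function apply_preparation are hypothetical and are not the sample data or schema of this scenario.

```python
# Hypothetical illustration only: the records and field names below are
# invented and are not the sample data or schema used in the scenario.
TARGET_STATES = {"California", "Texas", "Florida"}


def apply_preparation(rows):
    """Mimic the two steps described for the datapreprun_spark preparation."""
    prepared = []
    for row in rows:
        # Step 1: put the customer last name into upper case.
        updated = {**row, "last_name": row["last_name"].upper()}
        # Step 2: keep only customers from the three target states.
        if updated["state"] in TARGET_STATES:
            prepared.append(updated)
    return prepared


customers = [
    {"first_name": "Ada", "last_name": "Moore", "state": "Texas"},
    {"first_name": "Ben", "last_name": "Ortiz", "state": "Ohio"},
    {"first_name": "Cara", "last_name": "Nguyen", "state": "Florida"},
]

# Show the prepared rows, as the Job would show its result in the console.
for row in apply_preparation(customers):
    print(row)
```

In the actual Job, the same logic is carried by the preparation itself, which is why the input data must use the same schema as the dataset the preparation was built on.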

Use_Case_tDataprepRun_spark_batch_2.png
The sample data reads as follows:
Note: The sample data is created for demonstration purposes only.

Prerequisite: ensure that the Spark cluster has been properly installed and is running.


Document sourced from Talend: https://help.talend.com