Using the tDataprepRun component to apply a preparation to a data
sample in an Apache Spark Streaming Job

This scenario applies only to Talend Real-time Big Data Platform or Talend Data Fabric.

The tDataprepRun component lets you reuse an existing
preparation made in Talend Data Preparation
directly in a Big Data Job. In other words, you can operationalize the process of applying a
preparation to any input data that follows the same model.

The following scenario creates a simple Job that:

reads a small sample of customer data,
applies an existing preparation to this data,
shows the result of the execution in the console.

This assumes that a preparation has been created beforehand, on a dataset with the same
schema as your input data for the Job. In this case, the existing preparation is called
datapreprun_spark. This simple preparation puts the customer last names
into upper case and applies a filter to isolate the customers from California, Texas and
Florida.

The sample data reads as follows:

James;Butt;California
Daniel;Fox;Connecticut
Donna;Coleman;Alabama
Thomas;Webb;Illinois
William;Wells;Florida
Ann;Bradley;California
Sean;Wagner;Florida
Elizabeth;Hall;Minnesota
Kenneth;Jacobs;Florida
Kathleen;Crawford;Texas
Antonio;Reynolds;California
Pamela;Bailey;Texas
Patricia;Knight;Texas
Todd;Lane;New Jersey
Dorothy;Patterson;Virginia

Note: The sample data is created for demonstration purposes only.
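
To make the expected outcome concrete, the following minimal sketch mimics the effect of the datapreprun_spark preparation on the sample data in plain Python. It is an illustration only: it does not call Talend or Spark, the column names (first_name, last_name, state) and the helper apply_preparation are assumptions introduced here, and the row order printed by the actual Job may differ.

```python
# Illustrative sketch only: reproduces the described effect of the
# datapreprun_spark preparation without using any Talend or Spark API.

SAMPLE = """James;Butt;California
Daniel;Fox;Connecticut
Donna;Coleman;Alabama
Thomas;Webb;Illinois
William;Wells;Florida
Ann;Bradley;California
Sean;Wagner;Florida
Elizabeth;Hall;Minnesota
Kenneth;Jacobs;Florida
Kathleen;Crawford;Texas
Antonio;Reynolds;California
Pamela;Bailey;Texas
Patricia;Knight;Texas
Todd;Lane;New Jersey
Dorothy;Patterson;Virginia"""

# States kept by the filter step of the preparation.
KEPT_STATES = {"California", "Texas", "Florida"}


def apply_preparation(lines):
    """Uppercase the last name and keep only rows from the selected states.

    Hypothetical helper: the column names are assumed, since the source
    only shows three semicolon-separated fields per row.
    """
    result = []
    for line in lines:
        first_name, last_name, state = line.split(";")
        if state in KEPT_STATES:
            result.append(";".join([first_name, last_name.upper(), state]))
    return result


prepared = apply_preparation(SAMPLE.splitlines())
for row in prepared:
    print(row)
```

With this sample, nine of the fifteen rows pass the filter (three each for California, Texas and Florida), and each retained row has its last name in upper case, for example James;BUTT;California.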

Prerequisite: ensure that the Spark cluster has been properly installed and is running.

Source: Talend documentation, https://help.talend.com
