Reading and writing data in MongoDB using a Spark Streaming Job
This scenario applies only to Talend Real-time Big Data Platform or Talend Data Fabric.
In this scenario, you create a Spark Streaming Job to extract data about given movie directors from MongoDB, use this data to filter and complete movie information, and then write the result into a MongoDB collection.
The sample data about movie directors reads as follows:

1;Gregg Araki
2;P.J. Hogan
3;Alan Rudolph
4;Alex Proyas
5;Alex Sichel
This data contains the directors' names and the ID numbers assigned to them.
The structure of this data in MongoDB reads as follows:

{ "_id" : ObjectId("575546da3b1c7e22bc7b2189"), "person" : { "id" : 3, "name" : "Alan Rudolph" } }
{ "_id" : ObjectId("575546da3b1c7e22bc7b218b"), "person" : { "id" : 4, "name" : "Alex Proyas" } }
{ "_id" : ObjectId("575546da3b1c7e22bc7b218c"), "person" : { "id" : 5, "name" : "Alex Sichel" } }
{ "_id" : ObjectId("575546da3b1c7e22bc7b2188"), "person" : { "id" : 1, "name" : "Gregg Arakit" } }
{ "_id" : ObjectId("575546da3b1c7e22bc7b218a"), "person" : { "id" : 2, "name" : "P.J. Hogan" } }
Note that the sample data is created for demonstration purposes only.
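Because the director fields are nested under person, a Job reading this collection has to flatten them before they can serve as a lookup table. A minimal read sketch, assuming the MongoDB Spark connector (com.mongodb.spark) and a hypothetical moviedb.directors collection:

import com.mongodb.spark.MongoSpark
import org.apache.spark.sql.SparkSession

// Hypothetical connection settings; adjust to your deployment.
val spark = SparkSession.builder()
  .appName("ReadDirectors")
  .config("spark.mongodb.input.uri", "mongodb://localhost:27017/moviedb.directors")
  .getOrCreate()

// Load the collection and flatten the nested "person" document
// into plain id/name columns.
val directors = MongoSpark.load(spark)
  .selectExpr("person.id AS id", "person.name AS name")

directors.show()

The selectExpr call turns each nested document into a flat (id, name) row, matching the schema of the semicolon-delimited source data shown earlier.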
Prerequisites:

- The Spark cluster and the MongoDB database to be used have been properly installed and are running.
- The above-mentioned data has been loaded into the MongoDB collection to be used (a loading sketch follows this list).
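If the collection still needs to be populated, the semicolon-delimited records can be written in the nested form shown above. A minimal loading sketch, again assuming the MongoDB Spark connector and a hypothetical moviedb.directors collection:

import com.mongodb.spark.MongoSpark
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("LoadDirectors")
  // Hypothetical output collection; adjust to your deployment.
  .config("spark.mongodb.output.uri", "mongodb://localhost:27017/moviedb.directors")
  .getOrCreate()
import spark.implicits._

// Parse each "id;name" record and wrap it in a "person" struct,
// so each row becomes one { "person" : { "id" : ..., "name" : ... } } document.
val directors = Seq("1;Gregg Araki", "2;P.J. Hogan", "3;Alan Rudolph",
    "4;Alex Proyas", "5;Alex Sichel")
  .map(_.split(";"))
  .map(f => (f(0).toInt, f(1)))
  .toDF("id", "name")
  .selectExpr("struct(id, name) AS person")

MongoSpark.save(directors)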
To replicate this scenario, proceed as follows:
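The steps themselves are built graphically in Talend Studio, typically around components such as tMongoDBLookupInput and tMongoDBOutput. As a rough illustration of the logic such a Job implements, not the Studio procedure itself, here is a minimal Spark Structured Streaming sketch; the socket source, the moviedb database and collection names, and the "movieId;title;directorId" record layout are all assumptions made for the example:

import com.mongodb.spark.MongoSpark
import org.apache.spark.sql.{DataFrame, SparkSession}

object MovieDirectorPipeline {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("MovieDirectorPipeline")
      // Hypothetical URIs; adjust to your MongoDB deployment.
      .config("spark.mongodb.input.uri", "mongodb://localhost:27017/moviedb.directors")
      .config("spark.mongodb.output.uri", "mongodb://localhost:27017/moviedb.movies_enriched")
      .getOrCreate()
    import spark.implicits._

    // Static lookup table built from the director documents shown above.
    val directors = MongoSpark.load(spark)
      .selectExpr("person.id AS directorId", "person.name AS directorName")

    // Illustrative movie stream; a real Job would typically read from Kafka.
    // Each incoming line is expected as "movieId;title;directorId".
    val movies = spark.readStream
      .format("socket")
      .option("host", "localhost")
      .option("port", 9999)
      .load()
      .as[String]
      .map(_.split(";"))
      .map(f => (f(0).toInt, f(1), f(2).toInt))
      .toDF("movieId", "title", "directorId")

    // Keep only movies whose director is known, completing each row
    // with the director's name.
    val enriched = movies.join(directors, "directorId")

    // Write each micro-batch of results back to MongoDB.
    enriched.writeStream
      .foreachBatch { (batch: DataFrame, _: Long) => MongoSpark.save(batch) }
      .start()
      .awaitTermination()
  }
}

The stream-static join keeps only the movies whose director ID matches a document in the lookup collection and completes each surviving row with the director's name, which mirrors the filter-and-complete behavior described above.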