Performing download analysis using a Spark Batch Job
This scenario applies only to a subscription-based Talend solution with Big Data.
In this scenario, you create a Spark Batch Job to analyze how often a given product is downloaded.
In this Job, you analyze the download preferences of specific customers in your
customer base.
The sample data for the customer base is as follows:
10103|Herbert|Clinton|FR|SILVER|28-06-2011|herbert.clinton@msn.com|6571183
10281|Bill|Ford|BE|PLATINUM|13-04-2014|bill.ford@gmail.com|6360604
10390|George|Garfield|GB|SILVER|12-02-2011|george.garfield@gmail.com|7919508
10566|Abraham|Garfield|CN|SILVER|11-10-2012|abraham.garfield@msn.com|9155569
10691|John|Polk|GB|SILVER|05-11-2012|john.polk@gmail.com|6488579
10884|Herbert|Hayes|GB|SILVER|12-10-2007|herbert.hayes@gmail.com|8728181
11020|Chester|Roosevelt|BE|GOLD|28-06-2008|chester.roosevelt@yahoo.com|4172181
11316|Franklin|Madison|BR|SILVER|08-01-2014|franklin.madison@gmail.com|4711801
11707|James|Tyler|ES|GOLD|25-03-2010|james.tyler@gmail.com|7276942
11764|Theodore|McKinley|GB|GOLD|24-08-2013|theodore.mckinley@gmail.com|3224767
11777|Warren|Madison|BE|N/A|23-12-2008|warren.madison@msn.com|6695520
11857|Ronald|Arthur|SG|PLATINUM|01-04-2009|ronald.arthur@msn.fr|6704785
11936|Theodore|Buchanan|NL|SILVER|14-11-2014|theodore.buchanan@yahoo.fr|2783553
11940|Lyndon|Wilson|BR|PLATINUM|27-07-2010|lyndon.wilson@yahoo.com|1247110
12214|Gerald|Jefferson|SG|N/A|06-06-2007|gerald.jefferson@yahoo.com|5879162
12382|Herbert|Taylor|IT|GOLD|22-04-2012|herbert.taylor@msn.com|3873628
12475|Richard|Kennedy|FR|N/A|29-12-2014|richard.kennedy@yahoo.fr|7287388
12479|Calvin|Eisenhower|ES|N/A|06-11-2008|calvin.eisenhower@yahoo.fr|1792573
12531|Chester|Arthur|JP|PLATINUM|23-01-2009|chester.arthur@msn.fr|8772326
12734|Jimmy|Buchanan|IT|SILVER|09-03-2010|jimmy.buchanan@gmail.com|7007786
Each record describes one customer in this customer base. Its fields, separated by
pipes (|), are the customer's ID number, first name, last name, country code, support
level, registration date, email address and phone number.
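To make the record layout concrete, the following is a minimal sketch in plain Python of how one such record could be split into typed fields. It is not part of the Talend Job itself: the Customer type, its field names and the parse_customer function are hypothetical names chosen for illustration, and the date format (day-month-year) is inferred from the sample values.

```python
from datetime import date, datetime
from typing import NamedTuple


class Customer(NamedTuple):
    # Field names are illustrative; the source only describes the columns.
    customer_id: int
    first_name: str
    last_name: str
    country: str
    support_level: str
    registration_date: date
    email: str
    phone: str


def parse_customer(line: str) -> Customer:
    """Split one pipe-delimited record into its eight typed fields."""
    cid, first, last, country, level, registered, email, phone = line.strip().split("|")
    return Customer(
        int(cid), first, last, country, level,
        # Assumption: dates are day-month-year, as in 28-06-2011.
        datetime.strptime(registered, "%d-%m-%Y").date(),
        email, phone,
    )


record = parse_customer(
    "10103|Herbert|Clinton|FR|SILVER|28-06-2011|herbert.clinton@msn.com|6571183"
)
print(record.support_level, record.registration_date.isoformat())
# prints: SILVER 2011-06-28
```

The phone number is kept as a string because it is an identifier rather than a quantity.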
The sample data about the web pages that customers visited is as follows:
10103|/download/products/talend-open-studio
10281|/services/technical-support
10390|/services/technical-support
10566|/download/products/data-integration
10691|/services/training
10884|/download/products/integration-cloud
11020|/services/training
11316|/download/products/talend-open-studio
11707|/download/products/talend-open-studio
11764|/customers
This data contains the ID numbers of the customers who visited different Talend web
pages and the pages they visited.
Reading this data shows that the visits come from customers of different support
levels and serve different purposes. The Job you design matches these visits against
the sample customer base to identify who made them, and then determines which product
the Silver-level customers download most.
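The logic the Job is meant to implement can be sketched in a few lines of plain Python. This is only an illustration of the join, filter and count steps on the sample data. It is not Spark code, and it is not what the Job generates. The function name count_silver_downloads and the DOWNLOAD_PREFIX constant are hypothetical, and treating every path under /download/products/ as a product download is an assumption based on the sample paths.

```python
from collections import Counter

# First ten records of the sample customer base (pipe-delimited).
CUSTOMERS = """\
10103|Herbert|Clinton|FR|SILVER|28-06-2011|herbert.clinton@msn.com|6571183
10281|Bill|Ford|BE|PLATINUM|13-04-2014|bill.ford@gmail.com|6360604
10390|George|Garfield|GB|SILVER|12-02-2011|george.garfield@gmail.com|7919508
10566|Abraham|Garfield|CN|SILVER|11-10-2012|abraham.garfield@msn.com|9155569
10691|John|Polk|GB|SILVER|05-11-2012|john.polk@gmail.com|6488579
10884|Herbert|Hayes|GB|SILVER|12-10-2007|herbert.hayes@gmail.com|8728181
11020|Chester|Roosevelt|BE|GOLD|28-06-2008|chester.roosevelt@yahoo.com|4172181
11316|Franklin|Madison|BR|SILVER|08-01-2014|franklin.madison@gmail.com|4711801
11707|James|Tyler|ES|GOLD|25-03-2010|james.tyler@gmail.com|7276942
11764|Theodore|McKinley|GB|GOLD|24-08-2013|theodore.mckinley@gmail.com|3224767
""".splitlines()

# The sample page-visit records: customer ID and visited path.
CLICKS = """\
10103|/download/products/talend-open-studio
10281|/services/technical-support
10390|/services/technical-support
10566|/download/products/data-integration
10691|/services/training
10884|/download/products/integration-cloud
11020|/services/training
11316|/download/products/talend-open-studio
11707|/download/products/talend-open-studio
11764|/customers
""".splitlines()

# Assumption: paths under this prefix are product downloads.
DOWNLOAD_PREFIX = "/download/products/"


def count_silver_downloads(customers, clicks):
    """Count product downloads made by Silver-level customers."""
    # Lookup table: customer ID -> support level (fields 1 and 5 of each record).
    level_by_id = {}
    for line in customers:
        fields = line.split("|")
        level_by_id[fields[0]] = fields[4]

    counts = Counter()
    for line in clicks:
        customer_id, path = line.split("|")
        # Keep only visits by known Silver-level customers to download pages.
        if level_by_id.get(customer_id) == "SILVER" and path.startswith(DOWNLOAD_PREFIX):
            counts[path[len(DOWNLOAD_PREFIX):]] += 1
    return counts


counts = count_silver_downloads(CUSTOMERS, CLICKS)
print(counts.most_common(1))  # [('talend-open-studio', 2)]
```

On this sample, the third visit to talend-open-studio is not counted because customer 11707 is a Gold-level customer.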
Note that the sample data is created for demonstration purposes only.
To replicate this scenario, proceed as follows: