Scenario: aggregating data from two relations using COGROUP
This scenario applies only to a Talend solution with Big Data.
In this scenario, a four-component Job is designed to aggregate two relations on top of a given Hadoop cluster.
The two relations used in this scenario consist of the following sample data:
- The first relation:
  Alice,turtle,17
  Alice,goldfish,17
  Alice,cat,17
  Bob,dog,18
  Bob,cat,18
  John,dog,19
  Mary,goldfish,16
  Bill,dog,20
  This relation is composed of three columns that read owner, pet and age (of the owners).
- The second relation:
  Cindy,Alice
  Mark,Alice
  Paul,Bob
  Paul,Jane
  John,Mary
  William,Bill
  This relation provides a list of students’ names alongside their friends, of
  which some are pet owners displayed in the first relation. Therefore, the
  schema of this relation contains two columns: student and friend.
Before replicating this scenario, you need to write the sample data into the HDFS system of
the Hadoop cluster to be used. To do this, you can use tHDFSOutput. For further information about this component, see tHDFSOutput.
The data used in this scenario is inspired by the examples that Pig’s documentation
uses to explain the GROUP and the COGROUP operators. For related information, please see
Apache’s documentation for Pig.
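To make the COGROUP operation more concrete, the following Python sketch mimics its behaviour on the sample data. It is only an illustration, not Pig or Talend code: the cogroup helper is a hypothetical name, and grouping the first relation by owner and the second by friend is an assumption, since this section does not state the keys used by the Job. As in Pig, a key that appears in only one relation is still kept, with an empty bag on the other side.

```python
from collections import defaultdict

# Sample data of the two relations used in this scenario.
owners = [  # columns: owner, pet, age
    ("Alice", "turtle", 17),
    ("Alice", "goldfish", 17),
    ("Alice", "cat", 17),
    ("Bob", "dog", 18),
    ("Bob", "cat", 18),
    ("John", "dog", 19),
    ("Mary", "goldfish", 16),
    ("Bill", "dog", 20),
]
friends = [  # columns: student, friend
    ("Cindy", "Alice"),
    ("Mark", "Alice"),
    ("Paul", "Bob"),
    ("Paul", "Jane"),
    ("John", "Mary"),
    ("William", "Bill"),
]


def cogroup(left, left_key, right, right_key):
    """Hypothetical helper imitating COGROUP on two lists of tuples.

    left_key and right_key are column indexes. Every key found in either
    relation maps to a pair of bags: (rows from left, rows from right).
    """
    groups = defaultdict(lambda: ([], []))
    for row in left:
        groups[row[left_key]][0].append(row)
    for row in right:
        groups[row[right_key]][1].append(row)
    # Sort by key only to make the printed result stable.
    return dict(sorted(groups.items()))


# Assumption: group the first relation by owner (column 0)
# and the second relation by friend (column 1).
result = cogroup(owners, 0, friends, 1)

for key, (owner_rows, friend_rows) in result.items():
    print(key, owner_rows, friend_rows)
```

With these assumed keys, Jane gets an empty bag of owner rows because she owns no pet in the first relation, and John gets an empty bag of friend rows because no student lists him as a friend.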
Document get from Talend https://help.talend.com