Scenario: aggregating data from two relations using COGROUP

This scenario applies only to a Talend solution with Big Data.

In this scenario, a four-component Job is designed to aggregate two relations on top of a given Hadoop cluster.

The two relations used in this scenario consist of the following sample data:

Alice,turtle,17
Alice,goldfish,17
Alice,cat,17
Bob,dog,18
Bob,cat,18
John,dog,19
Mary,goldfish,16
Bill,dog,20

This relation is composed of three columns: owner, pet and age (the age of the owner).

Cindy,Alice
Mark,Alice
Paul,Bob
Paul,Jane
John,Mary
William,Bill

This relation lists students alongside their friends, some of whom are the pet owners listed in the first relation. The schema of this relation therefore contains two columns: student and friend.

Before replicating this scenario, you need to write the sample data into the HDFS of the Hadoop cluster to be used. To do this, you can use the tHDFSOutput component. For further information about this component, see tHDFSOutput.

The data used in this scenario is inspired by the examples that Pig's documentation uses to explain the GROUP and COGROUP operators. For related information, see Apache's documentation for Pig.
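To make the expected aggregation easier to picture, here is a minimal Python sketch of what a COGROUP-style operation does with the two sample relations. It is an illustration only, not the code the Job generates. It assumes the first relation is grouped on its owner column and the second on its friend column; the function name cogroup and the variable names are hypothetical.

```python
from collections import defaultdict

# First relation: owner, pet, age
owners = [
    ("Alice", "turtle", 17), ("Alice", "goldfish", 17), ("Alice", "cat", 17),
    ("Bob", "dog", 18), ("Bob", "cat", 18), ("John", "dog", 19),
    ("Mary", "goldfish", 16), ("Bill", "dog", 20),
]

# Second relation: student, friend
friends = [
    ("Cindy", "Alice"), ("Mark", "Alice"), ("Paul", "Bob"),
    ("Paul", "Jane"), ("John", "Mary"), ("William", "Bill"),
]

def cogroup(left, left_key, right, right_key):
    """Group two relations on a shared key (hypothetical helper).

    Returns a dict mapping each key to a pair of lists:
    (matching rows from left, matching rows from right).
    A key found in only one relation gets an empty list for the other.
    """
    groups = defaultdict(lambda: ([], []))
    for row in left:
        groups[left_key(row)][0].append(row)
    for row in right:
        groups[right_key(row)][1].append(row)
    return dict(groups)

# Assumption: group the first relation by owner and the second by friend.
result = cogroup(owners, lambda r: r[0], friends, lambda r: r[1])

for key in sorted(result):
    print(key, result[key])
```

Under this assumption, each key appears once with the rows from both relations side by side. A key found in only one relation, such as Jane (a friend who owns no pet) or John (an owner who is nobody's friend), is kept with an empty list on the other side rather than dropped, which is the main difference from a join.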

Source: Talend documentation, https://help.talend.com
