August 15, 2023

Scenario: aggregating data from two relations using COGROUP – Docs for ESB 6.x


This scenario applies only to a Talend solution with Big Data.

In this scenario, a four-component Job is designed to aggregate two relations on top of a given Hadoop cluster.

[Image: use_case-tpigcogroup1.png]

The two relations used in this scenario consist of the following sample data:

  1. The first relation is composed of three columns: owner, pet and age
    (the age of the owner).
  2. The second relation lists students’ names alongside their friends,
    some of whom are the pet owners listed in the first relation. Therefore,
    the schema of this relation contains two columns: student and friend.
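To make the two schemas and the effect of COGROUP concrete, here is a minimal Python sketch that models each relation as a list of tuples and cogroups them on owner (first relation) and friend (second relation). The rows are hypothetical illustration values, not the scenario's sample data, and the `cogroup` helper is a hypothetical name. It mimics Pig's default (outer) COGROUP, which emits an empty bag when a key appears in only one relation.

```python
from collections import defaultdict

# Hypothetical rows for illustration only (not the scenario's sample data).
# First relation: (owner, pet, age)
pets = [
    ("Alice", "turtle", 30),
    ("Alice", "cat", 30),
    ("Bob", "dog", 25),
]
# Second relation: (student, friend)
friends = [
    ("Cindy", "Alice"),
    ("Mark", "Alice"),
    ("Paul", "Bob"),
    ("Paul", "Jane"),
]


def cogroup(left, left_key, right, right_key):
    """Group two relations on a shared key, in the style of Pig's COGROUP.

    Returns a list of (key, left_bag, right_bag) sorted by key; a bag is
    empty when the key occurs in only one of the two relations.
    """
    groups = defaultdict(lambda: ([], []))
    for row in left:
        groups[left_key(row)][0].append(row)
    for row in right:
        groups[right_key(row)][1].append(row)
    return [(key, l_bag, r_bag) for key, (l_bag, r_bag) in sorted(groups.items())]


# Cogroup on owner (column 0 of pets) and friend (column 1 of friends).
result = cogroup(pets, lambda r: r[0], friends, lambda r: r[1])
for key, pet_bag, friend_bag in result:
    print(key, pet_bag, friend_bag)
```

Each output tuple pairs an owner's pets with the students who list that owner as a friend; "Jane" appears with an empty pet bag because she is a friend but owns no pets in the first relation.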

Before replicating this scenario, write the sample data into HDFS on
the Hadoop cluster to be used. You can do this with tHDFSOutput. For further information about this component, see tHDFSOutput.
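The sample data is stored in HDFS as delimited text, so the component that loads it back must split each line on the same field separator and apply the same column types. The Python sketch below shows that round trip for the first relation. The `;` separator is an assumption; use whichever field separator you actually configure when writing the data. The helper names `to_lines` and `from_lines` and the rows are hypothetical.

```python
# Assumption: fields are separated by ";" and rows by newlines. Match this
# to the separator actually configured when the data is written and loaded.
SEP = ";"


def to_lines(rows):
    """Serialize tuples into delimited text, one row per line."""
    return "".join(SEP.join(str(v) for v in row) + "\n" for row in rows)


def from_lines(text, types):
    """Parse delimited text back into tuples, converting each field's type."""
    rows = []
    for line in text.splitlines():
        if not line:
            continue  # skip blank lines
        fields = line.split(SEP)
        rows.append(tuple(cast(f) for cast, f in zip(types, fields)))
    return rows


# Hypothetical rows using the first relation's schema: owner, pet, age.
pets = [("Alice", "turtle", 30), ("Bob", "dog", 25)]
text = to_lines(pets)
print(text, end="")
restored = from_lines(text, (str, str, int))
```

Declaring age as an integer on load matters: without the cast, the value comes back as the string "30" and numeric aggregation on that column would not behave as expected.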

The data used in this scenario is inspired by the examples that Pig’s documentation
uses to explain the GROUP and COGROUP operators. For related information, see
Apache’s documentation for Pig.
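Pig's documentation notes that GROUP and COGROUP are the same operator; by convention GROUP is used with one relation and COGROUP with two or more. The following Python sketch shows the one-relation case followed by an aggregation over each bag, roughly what `grouped = GROUP pets BY owner;` followed by `FOREACH grouped GENERATE group, COUNT(pets);` does in Pig Latin. The rows and the `group_by` helper are hypothetical illustrations.

```python
from collections import defaultdict


def group_by(relation, key_fn):
    """Pig-style GROUP on one relation: returns (group, bag) pairs sorted by key."""
    groups = defaultdict(list)
    for row in relation:
        groups[key_fn(row)].append(row)
    return sorted(groups.items())


# Hypothetical rows using the first relation's schema: owner, pet, age.
pets = [("Alice", "turtle", 30), ("Alice", "cat", 30), ("Bob", "dog", 25)]

grouped = group_by(pets, lambda r: r[0])  # GROUP pets BY owner
# Aggregate each bag, like FOREACH grouped GENERATE group, COUNT(pets).
pet_counts = [(owner, len(bag)) for owner, bag in grouped]
print(pet_counts)
```

COGROUP applies the same grouping to several relations at once, producing one bag per input relation for each key instead of a single bag.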


Documentation source: Talend, https://help.talend.com