Scenario: Merging two datasets in HDFS

This scenario applies only to a Talend solution with Big Data.

This scenario illustrates how to use tSqoopMerge to
merge two datasets that were imported sequentially to HDFS from the same MySQL table,
with one record modified between the two imports.

The first dataset (the old one before the modifications) to be used in this scenario
reads as follows:

id,wage,mod_date
0,2000,2008-06-26 04:25:59
1,2300,2011-06-12 05:29:45
2,2500,2007-01-15 11:59:13
3,3000,2010-05-02 15:34:05

Its path in HDFS is /user/ychen/target_old.

The second dataset (the new one after the modifications) to be used reads as follows:

id,wage,mod_date
0,2000,2008-06-26 04:25:59
1,2300,2011-06-12 05:29:45
2,2500,2007-01-15 11:59:13
3,4000,2013-10-14 18:00:00

Its path in HDFS is /user/ychen/target_new.

These datasets were both imported by tSqoopImport.
For a scenario about how to use tSqoopImport, see Scenario: Importing a MySQL table to HDFS.

The Job in this scenario merges these two datasets with the newer record overwriting
the older one.
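
To make the expected result concrete, the following is a minimal Python sketch of the merge rule described above. It is not how tSqoopMerge works internally and does not touch HDFS. It only reproduces the "newer record overwrites the older one" behavior on the two sample datasets. The function name merge_by_key is hypothetical, and the sketch assumes that the id column is the key used to match records.

```python
import csv
import io

# The two sample datasets from this scenario, held in memory for illustration.
OLD_DATA = """\
id,wage,mod_date
0,2000,2008-06-26 04:25:59
1,2300,2011-06-12 05:29:45
2,2500,2007-01-15 11:59:13
3,3000,2010-05-02 15:34:05
"""

NEW_DATA = """\
id,wage,mod_date
0,2000,2008-06-26 04:25:59
1,2300,2011-06-12 05:29:45
2,2500,2007-01-15 11:59:13
3,4000,2013-10-14 18:00:00
"""


def merge_by_key(old_text, new_text, key="id"):
    """Merge two CSV texts; a row in new_text replaces the row with the same key.

    Hypothetical helper: `key` is assumed to be the column that identifies a record.
    """
    merged = {}
    # The new dataset is read last, so its rows overwrite matching old rows.
    for text in (old_text, new_text):
        for row in csv.DictReader(io.StringIO(text)):
            merged[row[key]] = row
    # Return the rows ordered by numeric key for readable output.
    return [merged[k] for k in sorted(merged, key=int)]


merged_rows = merge_by_key(OLD_DATA, NEW_DATA)
for row in merged_rows:
    print(",".join(row.values()))
```

In this sketch, the record with id 3 ends up with the wage 4000 and the date 2013-10-14 18:00:00, while the other three records are unchanged.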

Before starting to replicate this scenario, ensure that you have appropriate rights
and permissions to access the Hadoop distribution to be used. Then proceed as
follows:

Source: Talend documentation, https://help.talend.com
