August 15, 2023

Preparing the Hive tables – Docs for ESB 6.x


  1. Create the Hive table to which you want to write data. In this scenario,
    the table is named agg_result, and you can
    create it using the following statement in tHiveRow:
    create table agg_result (id int, name string, address string, sum1 string, postal string, state string, capital string, mostpopulouscity string) partitioned by (type string) row format delimited fields terminated by ';' location '/user/ychen/hive/table/agg_result'

    In this statement, '/user/ychen/hive/table/agg_result' is the HDFS
    directory used in this scenario to store the table. Replace it
    with the directory you want to use in your environment.
    For further information about tHiveRow, see tHiveRow.
  2. Create the two input Hive tables that contain the columns you want to join
    and aggregate into the output Hive table, agg_result. The statements to be used are:
    create table customer (id int, name string, address string, idState int, id2 int, regTime string, registerTime string, sum1 string, sum2 string) row format delimited fields terminated by ';' location '/user/ychen/hive/table/customer'
    and
    create table state_city (id int, postal string, state string, capital int, mostpopulouscity string) row format delimited fields terminated by ';' location '/user/ychen/hive/table/state_city'
  3. Use tHiveRow to load data into the two
    input tables, customer and state_city. The statements to be used are:
    "LOAD DATA LOCAL INPATH 'C:/tmp/customer.csv' OVERWRITE INTO TABLE customer"
    and
    "LOAD DATA LOCAL INPATH 'C:/tmp/State_City.csv' OVERWRITE INTO TABLE state_city"

    The two files, customer.csv and State_City.csv, are local files
    created for this scenario. You need to create your own files to provide
    data to the input Hive tables. The schema of each file must
    match the schema of its corresponding table.
    You can use tRowGenerator and tFileOutputDelimited to create these two files
    easily. For further information about these two components, see tRowGenerator and tFileOutputDelimited.

    For further information about the Hive query language, see https://cwiki.apache.org/confluence/display/Hive/LanguageManual.
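
If you prefer to build the two input files outside the Studio, a short script can do it. The following is a minimal Python sketch and not part of the original scenario. The function names, the row counts and every generated value are hypothetical placeholders; only the column order and the ';' delimiter are taken from the create table statements above.

```python
import csv
import random

# Column order copied from the create table statements for the two input tables.
CUSTOMER_COLUMNS = ["id", "name", "address", "idState", "id2",
                    "regTime", "registerTime", "sum1", "sum2"]
STATE_CITY_COLUMNS = ["id", "postal", "state", "capital", "mostpopulouscity"]


def make_states(count):
    """Build placeholder rows for state_city (capital is declared as int)."""
    return [[i, f"S{i}", f"state_{i}", i, f"city_{i}"]
            for i in range(1, count + 1)]


def make_customers(count, state_count, seed=0):
    """Build placeholder rows for customer; idState stays within 1..state_count."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same file
    rows = []
    for i in range(1, count + 1):
        rows.append([
            i,                              # id
            f"customer_{i}",                # name
            f"{i} Main Street",             # address
            rng.randint(1, state_count),    # idState
            i,                              # id2
            "2023-01-01 00:00:00",          # regTime (string column)
            "2023-01-01 00:00:00",          # registerTime (string column)
            str(rng.randint(0, 1000)),      # sum1 (string column)
            str(rng.randint(0, 1000)),      # sum2 (string column)
        ])
    return rows


def write_rows(path, rows):
    """Write rows as ';'-delimited lines without a header.

    The placeholder values contain no ';' or quote characters, so the csv
    module emits them unquoted, which is what a plain delimited Hive table
    expects.
    """
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle, delimiter=";", lineterminator="\n")
        writer.writerows(rows)
```

For example, calling write_rows("customer.csv", make_customers(100, 50)) and write_rows("State_City.csv", make_states(50)) produces two files in which every idState value falls within the range of generated state ids. Copy the files to the paths used in your LOAD DATA statements.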


Document obtained from Talend: https://help.talend.com