August 15, 2023

Getting the data from the HDFS – Docs for ESB 6.x

  1. Double-click tHDFSGet to define the component in its Basic settings
    view.

    Use_Case_tHDFSGet6.png

  2. Select, for example, Apache 0.20.2 from the Hadoop version list.
  3. In the NameNode URI, Username, and Group fields, enter the
    connection parameters for the HDFS. If you are using WebHDFS, the
    location should be webhdfs://masternode:portnumber. If this WebHDFS
    is secured with SSL, the scheme should be swebhdfs, and you need to
    use a tLibraryLoad in the Job to load the library required by the
    secured WebHDFS.
  4. In the HDFS directory field, type in the location in HDFS where the
    loaded file is stored. In this example, it is /testFile.
  5. Next to the Local directory field, click the three-dot […] button
    to browse to the folder in which to store the files extracted from
    the HDFS. In this scenario, the directory is
    C:/hadoopfiles/getFile/.
  6. Click the Overwrite file field to expand the drop-down list.
  7. From the menu, select always.
  8. In the Files area, click the plus button
    to add a row in which you define the file to be extracted.
  9. In the File mask column, replace newLine between the quotation
    marks with *.txt, and leave the New name column as it is. This
    allows you to extract all the .txt files from the specified
    directory in the HDFS without changing their names. In this
    example, the file is in.txt.
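
The settings chosen in the steps above can be sketched in plain Python to show how they interact. This is only an illustration, not the code Talend generates: the function names namenode_uri and plan_get are hypothetical, the port numbers are placeholders, the hdfs scheme for non-WebHDFS access is an assumption, and the "never" overwrite value is a simplified stand-in for any policy other than always.

```python
from fnmatch import fnmatchcase
import posixpath


def namenode_uri(host, port, webhdfs=False, ssl=False):
    """Build a NameNode URI string for the chosen access method."""
    if webhdfs:
        # Per the steps above: webhdfs, or swebhdfs when secured with SSL.
        scheme = "swebhdfs" if ssl else "webhdfs"
    else:
        scheme = "hdfs"  # assumption: plain HDFS access
    return f"{scheme}://{host}:{port}"


def plan_get(remote_names, mask, hdfs_dir, local_dir,
             overwrite="always", existing=()):
    """Return (source, target) pairs for the files that would be copied."""
    plan = []
    for name in remote_names:
        if not fnmatchcase(name, mask):
            continue  # name does not match the File mask value
        if overwrite != "always" and name in existing:
            continue  # hypothetical "never" policy: keep the local copy
        # New name is left unchanged, so the file keeps its original name.
        plan.append((posixpath.join(hdfs_dir, name),
                     posixpath.join(local_dir, name)))
    return plan


if __name__ == "__main__":
    print(namenode_uri("masternode", 50070, webhdfs=True))
    for source, target in plan_get(["in.txt", "notes.csv"], "*.txt",
                                   "/testFile", "C:/hadoopfiles/getFile/"):
        print(source, "->", target)
```

With the values from this scenario, only in.txt matches the *.txt mask, so it is the only file planned for copying into C:/hadoopfiles/getFile/.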

Source: Talend documentation, https://help.talend.com