July 30, 2023

tFileFetch – Docs for ESB 7.x

tFileFetch

Retrieves a file through the given protocol (HTTP, HTTPS, FTP, or SMB).

tFileFetch Standard properties

These properties are used to configure tFileFetch running in the Standard Job
framework.

The Standard tFileFetch component belongs to the Internet family.

The component in this framework is available in all Talend products.

Basic settings

Protocol

Select the protocol you want to use from the
list and fill in the corresponding fields: http, https, ftp, smb.

The properties differ slightly depending on the
type of protocol selected. The additional fields are defined in this table,
after the basic settings.

URI

Type in the URI of the site from which the file
is to be fetched.

Use cache to save resource

Select this check box to save the data in the
cache.

This option allows you to process the file data
flow (in streaming mode) without saving it on your drive. This is faster and
improves performance.

Domain

Enter the Microsoft server domain name.

Available for the smb protocol.

Username and Password

Enter the authentication information required
to access the server.

To enter the password, click the […] button next to the
password field, and then in the pop-up dialog box enter the password between double quotes
and click OK to save the settings.

Available for the smb protocol.

Destination Directory

Browse to the destination folder where the file
fetched is to be placed.

Warning: Use absolute path (instead of relative path) for
this field to avoid possible errors.

Destination Filename

Enter a new name for the file fetched.

If the Upload file
option in the Advanced settings view is
selected, the upload response will be saved in this file.

Warning: Use absolute path (instead of relative path) for
this field to avoid possible errors.

Create full path according to URI

Select this check box to reproduce the URI directory
path under the destination directory. To save the file at the root of your
destination directory, clear the check box.

Available for the http, https and ftp protocols.
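As a rough illustration of what reproducing the URI directory path means, the sketch below derives a target location from a destination directory and a URI. It is not the component's implementation: the class FullPathSketch, the method target, and the example host and path are hypothetical, and the exact layout tFileFetch produces (for instance, whether the host name is part of the path) is an assumption here.

```java
import java.net.URI;

class FullPathSketch {
    // Returns the assumed target location of the fetched file.
    // createFullPath = true  -> keep the directory part of the URI path
    // createFullPath = false -> place the file at the root of destDir
    static String target(String destDir, String uri, boolean createFullPath) {
        String path = URI.create(uri).getPath();                  // e.g. "/docs/data/file.txt"
        String fileName = path.substring(path.lastIndexOf('/') + 1);
        // Drop a trailing slash so the pieces join cleanly.
        String base = destDir.endsWith("/")
                ? destDir.substring(0, destDir.length() - 1)
                : destDir;
        return createFullPath ? base + path : base + "/" + fileName;
    }

    public static void main(String[] args) {
        String uri = "http://example.com/docs/data/file.txt";      // hypothetical URI
        System.out.println(target("D:/Output", uri, true));
        System.out.println(target("D:/Output", uri, false));
    }
}
```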

Add header

Select this check box if you want to add one or
more HTTP request headers as fetch conditions. In the Headers table, enter the name(s)
of the HTTP header parameter(s) in the Name field and the corresponding value(s) in the
Value field.

Available for the http and https protocols.
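Date-based headers such as If-Modified-Since expect an HTTP-formatted date in the Value field. The following sketch shows one way to produce such a value in Java; the class HeaderValueSketch and the method httpDate are hypothetical names used only for illustration, and the timestamp is the one used in the HTTP scenario on this page.

```java
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

class HeaderValueSketch {
    // Fixed-width HTTP date pattern; the zone is always written as GMT.
    private static final DateTimeFormatter HTTP_DATE =
            DateTimeFormatter.ofPattern("EEE, dd MMM yyyy HH:mm:ss 'GMT'", Locale.ENGLISH);

    // Converts any zoned timestamp to UTC and formats it as an HTTP date.
    static String httpDate(ZonedDateTime time) {
        return HTTP_DATE.format(time.withZoneSameInstant(ZoneOffset.UTC));
    }

    public static void main(String[] args) {
        ZonedDateTime since = ZonedDateTime.of(1994, 10, 29, 19, 43, 31, 0, ZoneOffset.UTC);
        // Name column: "If-Modified-Since"; Value column: the string printed below.
        System.out.println(httpDate(since));
    }
}
```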

POST method

This check box is selected by default. It
allows you to use the POST method. In the Parameters table, enter the name of the
variable(s) in the Name field and the corresponding value(s) in the Value field.

Clear the check box if you want to use the GET
method.

Available for the http and https protocols.
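For orientation, name/value pairs sent with the POST method are conventionally transmitted as a form-encoded request body. The sketch below builds such a body from a map of parameters. It assumes the standard application/x-www-form-urlencoded encoding; whether tFileFetch encodes the Parameters table in exactly this way is not stated on this page. The class PostParametersSketch, the method formEncode, and the sample values are hypothetical.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

class PostParametersSketch {
    // Joins name=value pairs with '&', percent-encoding names and values.
    static String formEncode(Map<String, String> params) {
        StringJoiner body = new StringJoiner("&");
        for (Map.Entry<String, String> entry : params.entrySet()) {
            body.add(URLEncoder.encode(entry.getKey(), StandardCharsets.UTF_8)
                    + "="
                    + URLEncoder.encode(entry.getValue(), StandardCharsets.UTF_8));
        }
        return body.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the rows in the order they were added.
        Map<String, String> params = new LinkedHashMap<>();
        params.put("Email", "user@example.com");   // hypothetical value
        params.put("Password", "s3cret word");     // hypothetical value
        System.out.println(formEncode(params));
    }
}
```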

Die on error

Clear this check box to skip the rows in error
and to complete the process for the error-free rows.

Available for the http, https and ftp protocols.

Read Cookie

Select this check box for tFileFetch to load a web
authentication cookie.

Available for the http, https, ftp and smb protocols.

Save Cookie

Select this check box to save the web page
authentication cookie. This means you will not have to log on to the same web
site in the future.

Available for the http, https, ftp and smb protocols.

Cookie file

Type in the full path to the file which you
want to use to save the cookie or click […] and browse to the desired file to save
the cookie.

Available for the http, https, ftp and smb protocols.

Cookie policy

Choose a cookie policy from this drop-down
list. Four options are available: BROWSER_COMPATIBILITY, DEFAULT, NETSCAPE and RFC_2109.

Available for the http, https, ftp and smb protocols.

Single cookie header

Select this check box to put all cookies into one
request header for maximum compatibility among different servers.

Available for the http, https, ftp and smb protocols.
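To make the idea of a single cookie header concrete, the sketch below folds several cookies into one Cookie header value, with pairs separated by a semicolon and a space. This only illustrates the general HTTP convention and is not taken from the component's code; the class SingleCookieHeaderSketch, the method singleHeader, and the cookie names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

class SingleCookieHeaderSketch {
    // Builds one header value of the form "name1=value1; name2=value2".
    static String singleHeader(Map<String, String> cookies) {
        StringJoiner value = new StringJoiner("; ");
        for (Map.Entry<String, String> cookie : cookies.entrySet()) {
            value.add(cookie.getKey() + "=" + cookie.getValue());
        }
        return value.toString();
    }

    public static void main(String[] args) {
        Map<String, String> cookies = new LinkedHashMap<>();
        cookies.put("SESSIONID", "abc123");   // hypothetical cookie
        cookies.put("lang", "en");            // hypothetical cookie
        // Sent as a single header instead of one Cookie header per cookie.
        System.out.println("Cookie: " + singleHeader(cookies));
    }
}
```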

Advanced settings

tStatCatcher Statistics

Select this check box to collect the log data
at each component level.

Timeout

Enter the number of milliseconds after which
the protocol connection should close.

Available for the http and https protocols.
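A millisecond timeout of this kind corresponds to what plain Java exposes on an HTTP connection. The sketch below sets both the connect and the read timeout on a java.net.HttpURLConnection without opening the connection. Which of the two the Timeout field controls is not stated on this page, so applying the same value to both is an assumption; the class TimeoutSketch, the method prepare, and the example URI are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;

class TimeoutSketch {
    // Creates an unconnected HTTP connection with the given timeout in milliseconds.
    static HttpURLConnection prepare(String uri, int timeoutMillis) {
        try {
            HttpURLConnection connection =
                    (HttpURLConnection) URI.create(uri).toURL().openConnection();
            connection.setConnectTimeout(timeoutMillis); // limit for establishing the connection
            connection.setReadTimeout(timeoutMillis);    // limit for waiting on response data
            return connection;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // No network traffic happens here: openConnection() does not connect.
        HttpURLConnection connection = prepare("http://example.com/file.txt", 30000);
        System.out.println(connection.getConnectTimeout() + " ms");
    }
}
```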

Print response to console

Select this check box to print the server
response in the console.

Available for the http and https protocols.

Upload file

Select this check box to upload one or more
files to the server. For each file to be uploaded, click the [+] button beneath the table
displayed and set the following fields:

  • Name: the value
    of the name attribute of the <input type="file">
    field in the original HTML form.

  • File: the full
    path of the file to upload, e.g. "D:/filefetch.txt".

  • Content-Type: the
    content type of the file to upload. The default value is "application/octet-stream".

  • Charset: the
    character set of the file to upload. The default value is "ISO-8859-1".

This option is available for the http and https protocols, with the
POST method option in the Basic settings view selected.

With this option selected, the upload response will be saved
in the file specified in the Destination Filename field in the Basic settings view.

Enable proxy server

Select this check box if you are connecting
via a proxy and complete the fields which follow with the relevant
information.

Available for the http, https and ftp protocols.

Enable NTLM Credentials

Select this check box if you are using an NTLM
authentication protocol.

Domain: The client
domain name.

Host: The client’s IP
address.

Available for the http and https protocols.

Need authentication

Select this check box and enter the username
and password in the relevant fields, if they are required to access the
protocol.

Available for the http and https protocols.

Support redirection

Select this check box to repeat the
redirection request until redirection is successful and the file can be
retrieved.

Available for the http, https and ftp protocols.

Global Variables

ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable and it returns a string. This
variable functions only if the Die on error check box is
cleared, if the component has this check box.

INPUT_STREAM: the content of the file being fetched. This
is a Flow variable and it returns an InputStream.

A Flow variable functions during the execution of a component while an After variable
functions after the execution of the component.

To fill in a field or expression with a variable, press Ctrl + Space
to access the variable list and choose the variable to use from it.

For further information about variables, see Talend Studio User Guide.
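In generated Job code, component variables are typically read from the globalMap under a key made of the component name and the variable name, for example tFileFetch_1_INPUT_STREAM. The sketch below imitates that lookup outside Talend Studio: the globalMap is simulated with a plain HashMap and the fetched content with an in-memory stream, both of which are assumptions for illustration. The class InputStreamVariableSketch and the method readAll are hypothetical.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

class InputStreamVariableSketch {
    // Reads all lines from the stream stored in the map under the given key.
    static String readAll(Map<String, Object> globalMap, String key) {
        InputStream in = (InputStream) globalMap.get(key);
        StringBuilder text = new StringBuilder();
        try (BufferedReader reader =
                     new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                text.append(line).append('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        // Stand-in for the globalMap that a Job provides at run time.
        Map<String, Object> globalMap = new HashMap<>();
        byte[] sample = "id;name\n1;Ada\n".getBytes(StandardCharsets.UTF_8);
        globalMap.put("tFileFetch_1_INPUT_STREAM", new ByteArrayInputStream(sample));
        System.out.print(readAll(globalMap, "tFileFetch_1_INPUT_STREAM"));
    }
}
```

Inside a Job, the equivalent expression would be a cast of globalMap.get("tFileFetch_1_INPUT_STREAM") to java.io.InputStream, where tFileFetch_1 stands for the name of your component.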

Usage

Usage rule

This component is generally used as a start
component to feed the input flow of a Job and is often connected to the Job
using an OnSubjobOk or OnComponentOk link, depending on the context.

Limitation

Due to license incompatibility, one or more JARs required to use
this component are not provided. You can install the missing JARs for this particular
component by clicking the Install button on the Component tab view. You can also
find and add all missing JARs on the Modules tab in the Integration
perspective of your studio. You can find more details about how to install external modules in
Talend Help Center (https://help.talend.com).

Fetching data through HTTP

This scenario describes a three-component Job which retrieves a file from an HTTP
website, reads data from the fetched file and displays the data on the console.

tFileFetch_1.png

Dropping and linking components

  1. Drop a tFileFetch, a tFileInputDelimited and a tLogRow onto your design workspace.
  2. Link tFileFetch to tFileInputDelimited using a Trigger > On Subjob Ok or
    On Component Ok connection.
  3. Link tFileInputDelimited to tLogRow using a Row > Main
    connection.

Configuring the components

  1. Double-click tFileFetch to open its
    Basic settings view.

    tFileFetch_2.png

  2. Select the protocol you want to use from the list. Here, http is selected.
  3. In the URI field, type in the URI where
    the file to be fetched can be retrieved from. You can paste the URI directly
    in your browser to view the data in the file.
  4. In the Destination directory field,
    browse to the folder where the fetched file is to be stored. In this
    example, it is D:/Output.
  5. In the Destination filename field, type
    in a new name for the file if you want it to be changed. In this example,
    new.txt.
  6. If needed, select the Add header check
    box and define one or more HTTP request headers as fetch conditions. For
    example, to fetch the file only if it has been modified since 19:43:31 GMT,
    October 29, 1994, fill in the Name and
    Value fields with “If-Modified-Since”
    and “Sat, 29 Oct 1994 19:43:31 GMT” respectively in the Headers table. For details about HTTP request
    header definitions, see Header Field Definitions.
  7. Double-click tFileInputDelimited to open
    its Basic settings view.

    tFileFetch_3.png

  8. In the File name field, type in the full
    path to the fetched file, which has been stored locally.
  9. Click the […] button next to Edit schema to open the Schema dialog box. In this example, add one column
    output to store the data from the
    fetched file.

    tFileFetch_4.png

  10. Leave other settings as they are.

Saving and executing the Job

  1. Press Ctrl+S to save your Job.
  2. Press F6 or click Run on the Run tab to
    execute the Job.

    tFileFetch_5.png

    The data of the fetched file is displayed on the console.

Reusing stored cookie to fetch files through HTTP

This scenario describes a two-component Job which logs in to a given website and
then, using the cookie stored in a user-defined local directory, fetches data from this
website.

tFileFetch_6.png

Dropping and linking components

  1. Drop two tFileFetch components onto your
    design workspace.
  2. Link the two components as subJobs using a Trigger > On Subjob Ok
    connection.

Configuring the components

Configuring the first subJob

  1. Double-click tFileFetch_1 to
    open its Component view.

    tFileFetch_7.png

  2. Select the protocol you want to use from the Protocol list. Here, we use the https protocol.
  3. In the URI field, type in the
    URI through which you can log in to the website and fetch the web page accordingly. In
    this example, the URI is https://www.codeproject.com/script/Membership/LogOn.aspx?download=true.
  4. In the Destination directory
    field, browse to the folder where the fetched web page is to be stored. This folder
    will be created on the fly if it does not exist. In this example, type in D:/download.
  5. In the Destination Filename
    field, type in a new name for the file if you want it to be changed. In this example,
    codeproject.html.
  6. Under the Parameters table,
    click the plus button to add two rows and fill in the credentials for accessing the
    desired website.

    In the Name column, type in
    a name for each of the two rows. In this example, they are Email and Password, which are required
    by the website you are logging in to.
    In the Value column, type
    in the authentication information.
  7. Select the Save cookie check
    box.
  8. In the Cookie file field,
    type in the full path to the file which you want to use to save the cookie. In this
    example, it is D:/download/cookie.
  9. Click Advanced settings to
    open its view.
  10. Select the Support redirection
    check box so that the redirection request will be repeated
    until the redirection is successful.

Configuring the second subJob

  1. Double-click tFileFetch_2 to open its
    Component view.

    tFileFetch_8.png

  2. From the Protocol list, select http.
  3. In the URI field, type in the address
    from which you fetch the files of your interest. In this example, the
    address is
    http://www.codeproject.com/script/articles/download.aspx?file=/KB/DLL/File_List_Downloader/FLD02June2011_Source.zip&rp=http://www.codeproject.com/Articles/203991/File-List-Downloader.
  4. In the Destination directory field, type
    in the directory or browse to the folder where you want to store the fetched
    files. This folder can be automatically created if it does not exist yet
    during the execution process. In this example, type in
    D:/download.
  5. In the Destination Filename field, type
    in a new name for the file if you want it to be changed. In this example,
    source.zip.
  6. Clear the POST method check box to
    deactivate the Parameters table.
  7. Select the Read cookie check box.
  8. In the Cookie file field, browse to the
    file which is used to save the cookie. In this example, it is D:/download/cookie.

Saving and executing the Job

  1. Press Ctrl+S to save your Job.
  2. Press F6 or click Run on the Run tab to
    execute the Job.

    Then, go to the local directory D:/download to check the downloaded file.

    tFileFetch_9.png

Related scenario

For an example of transferring data in streaming mode, see Reading data from a remote file in streaming mode.


Document retrieved from Talend: https://help.talend.com