August 17, 2023

tFileFetch – Docs for ESB 5.x

tFileFetch

tFileFetch.png

tFileFetch properties

Component family

Internet

 

Function

tFileFetch retrieves a file via a
defined protocol.

Purpose

tFileFetch allows you to retrieve
file data using the selected protocol.

Basic settings

Protocol

Select the protocol you want to use from the list and fill in the
corresponding fields: http, https, ftp, or smb.

The properties differ slightly depending on the type of protocol
selected. The additional fields are defined in this table, after the
basic settings.

 

URI

Type in the URI of the site from which the file is to be
fetched.

 

Use cache to save resource

Select this check box to save the data in the cache.

This option allows you to process the file data flow (in streaming
mode) without saving it on your drive. This is faster and improves
performance.
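As a rough illustration of the streaming idea outside Talend, the following Python sketch reads a resource line by line from the open response instead of writing it to disk first. The function name count_lines_streaming is hypothetical, and this is an analogy under stated assumptions, not how tFileFetch is implemented.

```python
import urllib.request


def count_lines_streaming(uri: str) -> int:
    """Count the lines of a resource without saving it locally."""
    count = 0
    # The response is a file-like stream; iterating it yields one line
    # at a time, so the content is never written to the local drive.
    with urllib.request.urlopen(uri) as stream:
        for _line in stream:
            count += 1
    return count
```

Any per-line processing could replace the counter; the point is that the data is consumed as it arrives.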

Domain

Enter the Microsoft server domain name.

Available for the smb
protocol.

Username and Password

Enter the authentication information required to access the
server.

To enter the password, click the […] button next to the
password field, and then in the pop-up dialog box enter the password between double quotes
and click OK to save the settings.

Available for the smb
protocol.

 

Destination Directory

Browse to the destination folder where the fetched file is to be
placed.

 

Destination Filename

Enter a new name for the fetched file.

 

Create full path according to URI

Select this check box to reproduce the URI directory path under the
destination directory. To save the file at the root of your destination
directory, clear the check box.

Available for the http, https and ftp protocols.
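The effect of this option can be sketched in Python as follows. The helper destination_path and the example URI are hypothetical, and the sketch assumes that only the path portion of the URI is mirrored; the source does not say whether the host name is also included.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit


def destination_path(uri: str, dest_dir: str, filename: str,
                     full_path: bool) -> str:
    """Return where a fetched file would be written."""
    base = PurePosixPath(dest_dir)
    if full_path:
        # Mirror the directory part of the URI path under dest_dir.
        uri_dir = PurePosixPath(urlsplit(uri).path).parent
        base = base.joinpath(*uri_dir.parts[1:])  # drop the leading "/"
    return str(base / filename)
```

With the check box selected, a URI path of /files/2023/report.txt and a destination of D:/Output would give D:/Output/files/2023/new.txt; with it cleared, D:/Output/new.txt.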

 

Add header

Select this check box if you want to add one or more HTTP request
headers as fetch conditions. In the Headers table, enter the name(s) of the HTTP header
parameter(s) in the Name field and
the corresponding value(s) in the Value field.

Available for the http and
https protocols.

 

POST method

This check box is selected by default. It allows you to use the
POST method. In the Parameters
table, enter the name of the variable(s) in the Name field and the corresponding value in
the Value field.

Clear the check box if you want to use the GET method.

Available for the http and
https protocols.

 

Die on error

Clear this check box to skip the rows in error and to complete the
process for the error-free rows.

Available for the http, https and ftp protocols.

 

Read Cookie

Select this check box for tFileFetch to load a web authentication
cookie.

Available for the http, https, ftp and smb
protocols.

 

Save Cookie

Select this check box to save the web page authentication cookie.
This means you will not have to log on to the same web site in the
future.

Available for the http, https, ftp and smb
protocols.

 

Cookie file

Type in the full path to the file in which to save the cookie, or
click […] and browse to the desired file.

Available for the http, https, ftp and smb
protocols.

 

Cookie policy

Choose a cookie policy from this drop-down list. Four options are available: BROWSER_COMPATIBILITY, DEFAULT, NETSCAPE and RFC_2109.

Available for the http, https, ftp and smb
protocols.

 

Single cookie header

Select this check box to put all cookies into one request header for
maximum compatibility among different servers.

Available for the http, https, ftp and smb
protocols.

Advanced settings

tStatCatcher Statistics

Select this check box to collect the log data at each component
level.

Timeout

Enter the number of milliseconds after which the protocol
connection should close.

Available for the http and
https protocols.

 

Print response to console

Select this check box to print the server response in the
console.

Available for the http and
https protocols.

 

Upload file

Select this check box to upload one or more files to the server.
Then in the Files table displayed,
click the [+] button to add the
file(s) to upload and define the following parameters for each
file:

  • Name: the new name of the
    file after being uploaded, between double quotation
    marks.

  • File: the full path of
    the file to upload, e.g. “D:/filefetch.txt”.

  • Content-Type: the content
    type of the file to upload. The default value is “application/octet-stream”.

  • Charset: the character
    set of the file to upload. The default value is “ISO-8859-1”.

Available for the http and
https protocols.

 

Enable proxy server

Select this check box if you are connecting via a proxy and
complete the fields which follow with the relevant
information.

Available for the http, https and ftp protocols.

 

Enable NTLM Credentials

Select this check box if you are using an NTLM authentication
protocol.

Domain: The client domain
name.

Host: The client’s IP
address.

Available for the http and
https protocols.

 

Need authentication

Select this check box and enter the username and password in the
relevant fields, if they are required to access the protocol.

Available for the http and
https protocols.

 

Support redirection

Select this check box to repeat the redirection request until
redirection is successful and the file can be retrieved.

Available for the http, https and ftp protocols.

Global Variables

ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable and it returns a string. This
variable functions only if the Die on error check box is
cleared, if the component has this check box.

INPUT_STREAM: the content of the file being fetched. This
is a Flow variable and it returns an InputStream.

A Flow variable functions during the execution of a component while an After variable
functions after the execution of the component.

To fill a field or expression with a variable, press Ctrl + Space
to access the variable list and choose the variable to use from it.

For further information about variables, see Talend Studio
User Guide.

Usage

This component is generally used as a start component to feed the
input flow of a Job and is often connected to the Job using an
OnSubjobOk or OnComponentOk link, depending on the
context.

Log4j

The activity of this component can be logged using the log4j feature. For more information on this feature, see Talend Studio User
Guide.

For more information on the log4j logging levels, see the Apache documentation at http://logging.apache.org/log4j/1.2/apidocs/org/apache/log4j/Level.html.

Limitation

Due to license incompatibility, one or more JARs required to use this component are not
provided. You can install the missing JARs for this particular component by clicking the
Install button on the Component tab view. You can also find and add all missing JARs on
the Modules tab in the Integration perspective
of your studio. For details, see https://help.talend.com/display/KB/How+to+install+external+modules+in+the+Talend+products
or the section describing how to configure the Studio in the Talend Installation and Upgrade
Guide.

Scenario 1: Fetching data through HTTP

This scenario describes a three-component Job which retrieves a file from an HTTP
website, reads data from the fetched file and displays the data on the console.

Use_Case_tFileFetch1.png

Dropping and linking components

  1. Drop a tFileFetch, a tFileInputDelimited and a tLogRow onto your design workspace.

  2. Link tFileFetch to tFileInputDelimited using a Trigger > On Subjob Ok or
    On Component Ok connection.

  3. Link tFileInputDelimited to tLogRow using a Row > Main
    connection.

Configuring the components

  1. Double-click tFileFetch to open its
    Basic settings view.

    use_case_tfilefetch_2.png
  2. Select the protocol you want to use from the list. Here, http is selected.

  3. In the URI field, type in the URI of
    the file to be fetched. You can paste the URI directly
    in your browser to view the data in the file.

  4. In the Destination directory field,
    browse to the folder where the fetched file is to be stored. In this
    example, it is D:/Output.

  5. In the Destination filename field, type
    in a new name for the file if you want it to be changed. In this example,
    it is new.txt.

  6. If needed, select the Add header check
    box and define one or more HTTP request headers as fetch conditions. For
    example, to fetch the file only if it has been modified since 19:43:31 GMT,
    October 29, 1994, fill in the Name and
    Value fields with “If-Modified-Since”
    and “Sat, 29 Oct 1994 19:43:31 GMT” respectively in the Headers table. For details about HTTP request
    header definitions, see Header Field Definitions.

  7. Double-click tFileInputDelimited to open
    its Basic settings view.

    use_case_tfilefetch_3.png
  8. In the File name field, type in the full
    path to the fetched file which has been stored locally.

  9. Click the […] button next to Edit schema to open the [Schema] dialog box. In this example, add one column
    output to store the data from the
    fetched file.

    use_case_tfilefetch_4.png
  10. Leave other settings as they are.
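The If-Modified-Since condition from step 6 can be sketched in Python as follows. The URI is a placeholder and the variable names are illustrative; the sketch only builds the request and sends nothing.

```python
import urllib.request
from datetime import datetime, timezone
from email.utils import format_datetime

# Placeholder URI; replace it with the file you want to fetch.
URI = "http://example.com/data.txt"

# Format the timestamp as an HTTP date, which is always expressed in GMT.
since = datetime(1994, 10, 29, 19, 43, 31, tzinfo=timezone.utc)
http_date = format_datetime(since, usegmt=True)

# Build the request; nothing is sent until urlopen() is called on it.
request = urllib.request.Request(URI)
request.add_header("If-Modified-Since", http_date)
```

A server that honors this header answers 304 Not Modified when the file has not changed since the given date, so no file body is transferred.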

Saving and executing the Job

  1. Press Ctrl+S to save your Job.

  2. Press F6 or click Run on the Run tab to
    execute the Job.

    use_case_tfilefetch_5.png

    The data of the fetched file is displayed on the console.
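For readers who want to see the same three steps outside the Studio, here is a minimal Python sketch of the fetch, read and display flow. The function names are hypothetical, the ";" field separator is an assumption about the delimited file, and the code is an analogy rather than what the Job generates.

```python
import csv
import shutil
import urllib.request
from pathlib import Path


def fetch_file(uri: str, dest_dir: str, dest_name: str) -> Path:
    """Download uri into dest_dir/dest_name (the tFileFetch step)."""
    target = Path(dest_dir) / dest_name
    target.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(uri) as response, open(target, "wb") as out:
        shutil.copyfileobj(response, out)
    return target


def read_rows(path: Path, delimiter: str = ";") -> list:
    """Read delimited rows (the tFileInputDelimited step)."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.reader(handle, delimiter=delimiter))


def run_job(uri: str, dest_dir: str, dest_name: str) -> list:
    """Fetch the file, read it, then print each row (the tLogRow step)."""
    rows = read_rows(fetch_file(uri, dest_dir, dest_name))
    for row in rows:
        print("|".join(row))
    return rows
```

Calling run_job with the URI, D:/Output and new.txt would mirror the settings used in this scenario.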

Scenario 2: Reusing stored cookie to fetch files through HTTP

This scenario describes a two-component Job which logs in to a given HTTP website and
then, using the cookie stored in a user-defined local directory, fetches data from this
website.

Use_Case_tFileFetch2.png

Dropping and linking components

  1. Drop two tFileFetch components onto your
    design workspace.

  2. Link the two components as subjobs using a Trigger > On Subjob Ok
    connection.

Configuring the components

Configuring the first subjob

  1. Double-click tFileFetch_1 to open its
    Component view.

    Use_Case_tFileFetch3.png
  2. Select the protocol you want to use from the Protocol list. Here, we use the https protocol.

  3. In the URI field, type in the URI through
    which you can log in to the website and fetch the web page accordingly. In this
    example, the URI is
    https://www.codeproject.com/script/Membership/LogOn.aspx?download=true.

  4. In the Destination directory field,
    browse to the folder where the fetched web page is to be stored. This folder
    will be created on the fly if it does not exist. In this example, type in
    D:/download.

  5. In the Destination Filename field, type
    in a new name for the file if you want it to be changed. In this example,
    it is codeproject.html.

  6. Under the Parameters table, click the
    plus button to add two rows and fill in the credentials for accessing the
    desired website.

    In the Name column, type in a name
    for each of the two rows. In this example, they are
    Email and Password, which are
    required by the website you are logging in to.

    In the Value column, type in the
    authentication information.

  7. Select the Save cookie check box.

  8. In the Cookie file field, type in the
    full path to the file which you want to use to save the cookie. In this
    example, it is D:/download/cookie.

  9. Click Advanced settings to open its
    view.

  10. Select the Support redirection check box
    so that the redirection request will be repeated until the redirection is
    successful.
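The way name and value pairs from a Parameters table typically travel in a POST request can be sketched in Python as follows. The login URI and credential values are placeholders, and the form-encoded body is an assumption about the request format, which the source does not specify. Nothing is sent by this code.

```python
import urllib.request
from urllib.parse import urlencode

# Placeholder values; use the login URI and credentials of your own site.
LOGIN_URI = "https://example.com/login"
parameters = {"Email": "user@example.com", "Password": "secret"}

# Encode the name/value pairs as an application/x-www-form-urlencoded body.
body = urlencode(parameters).encode("ascii")

# Supplying a body makes urllib issue a POST; without one it issues a GET.
post_request = urllib.request.Request(LOGIN_URI, data=body)
get_request = urllib.request.Request(LOGIN_URI)
```

The second request corresponds to clearing the POST method check box, in which case no parameters are posted.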

Configuring the second subjob

  1. Double-click tFileFetch_2 to open its
    Component view.

    Use_Case_tFileFetch4.png
  2. From the Protocol list, select http.

  3. In the URI field, type in the address
    from which you fetch the files of your interest. In this example, the
    address is
    http://www.codeproject.com/script/articles/download.aspx?file=/KB/DLL/File_List_Downloader/FLD02June2011_Source.zip&rp=http://www.codeproject.com/Articles/203991/File-List-Downloader.

  4. In the Destination directory field, type
    in the directory or browse to the folder where you want to store the fetched
    files. This folder can be automatically created if it does not exist yet
    during the execution process. In this example, type in
    D:/download.

  5. In the Destination Filename field, type
    in a new name for the file if you want it to be changed. In this example,
    it is source.zip.

  6. Clear the POST method check box to
    deactivate the Parameters table.

  7. Select the Read cookie check box.

  8. In the Cookie file field, browse to the
    file which is used to save the cookie. In this example, it is D:/download/cookie.
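The save-then-read cookie pattern used by the two subjobs can be sketched in Python as follows. The cookie name, value and domain are made up, the cookie is built by hand as a stand-in for a real login response, and the Netscape-style text file written by MozillaCookieJar is an assumption; the source does not state which format tFileFetch uses for its cookie file.

```python
import http.cookiejar
import os
import tempfile
import time
import urllib.request


def make_cookie(name: str, value: str, domain: str) -> http.cookiejar.Cookie:
    """Build a cookie by hand (stand-in for one set by a login response)."""
    return http.cookiejar.Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=True, domain_initial_dot=False,
        path="/", path_specified=True, secure=False,
        expires=int(time.time()) + 3600, discard=False,
        comment=None, comment_url=None, rest={},
    )


# Hypothetical cookie file location (the scenario uses D:/download/cookie).
cookie_file = os.path.join(tempfile.mkdtemp(), "cookie")

# First subjob: after logging in, save the cookies to the cookie file.
saved = http.cookiejar.MozillaCookieJar(cookie_file)
saved.set_cookie(make_cookie("session", "abc123", "example.com"))
saved.save()

# Second subjob: load the same file and attach it to an opener, so that
# requests made with opener.open() send the stored cookie.
loaded = http.cookiejar.MozillaCookieJar(cookie_file)
loaded.load()
opener = urllib.request.build_opener(
    urllib.request.HTTPCookieProcessor(loaded)
)
```

Because the second jar is loaded from the file alone, the fetch step needs no credentials of its own, which matches the division of work between the two subjobs.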

Saving and executing the Job

  1. Press Ctrl+S to save your Job.

  2. Press F6 or click Run on the Run tab to
    execute the Job.

    Then, go to the local directory D:/download to check the downloaded file.

    use_case_tfilefetch5.png

Related scenario

For an example of transferring data in streaming mode, see Scenario 2: Reading data from a remote file in streaming mode.


Document obtained from Talend: https://help.talend.com