tFileFetch
tFileFetch Standard properties
These properties are used to configure tFileFetch running in the Standard Job framework.
The Standard tFileFetch component belongs to the Internet family.
The component in this framework is available in all Talend products.
Basic settings
Protocol |
Select the protocol you want to use from the The properties differ slightly depending on the |
URI |
Type in the URI of the site from which the file |
Use cache to save |
Select this check box to save the data in the This option allows you to process the file data |
Domain |
Enter the Microsoft server domain name. Available for the smb protocol. |
Username and Password |
Enter the authentication information required To enter the password, click the […] button next to the Available for the smb protocol. |
Destination Directory |
Browse to the destination folder where the file Warning: Use absolute path (instead of relative path) for
this field to avoid possible errors. |
Destination Filename |
Enter a new name for the file fetched. If the Upload file Warning: Use absolute path (instead of relative path) for
this field to avoid possible errors. |
Create full path according to |
It allows you to reproduce the URI directory Available for the http, https and ftp protocols. |
Add header |
Select this check box if you want to add one or Available for the http and https protocols. |
POST method |
This check box is selected by default. It Clear the check box if you want to use the GET Available for the http and https protocols. |
Die on error |
Clear this check box to skip the rows in error Available for the http, https and ftp protocols. |
Read Cookie |
Select this check box for tFileFetch to load a web Available for the http, https, ftp and smb protocols. |
Save Cookie |
Select this check box to save the web page Available for the http, https, ftp and smb protocols. |
Cookie file |
Type in the full path to the file which you Available for the http, https, ftp and smb protocols. |
Cookie policy |
Choose a cookie policy from this drop-down Available for the http, https, ftp and smb protocols. |
Single cookie header |
Check this box to put all cookies into one Available for the http, https, ftp and smb protocols. |
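The Create full path according to option above is described as reproducing the URI directory structure under the destination directory. As a rough illustration outside Talend, the following Python sketch shows one way such a mapping could work. The function name `mirrored_path`, the example host, and the choice to use only the path part of the URI (ignoring the host name) are assumptions for illustration, not documented tFileFetch behavior.

```python
from pathlib import PurePosixPath
from urllib.parse import unquote, urlsplit


def mirrored_path(destination_dir, uri):
    """Return a local path that mirrors the URI's directory structure.

    Assumption: only the path component of the URI is reproduced; the
    scheme, host, query string and fragment are ignored.
    """
    uri_path = unquote(urlsplit(uri).path)
    # Drop empty, "." and ".." segments so the result stays inside destination_dir.
    segments = [s for s in uri_path.split("/") if s not in ("", ".", "..")]
    return PurePosixPath(destination_dir).joinpath(*segments)


# Hypothetical URI; D:/Output is the destination directory used in the scenario.
print(mirrored_path("D:/Output", "http://example.com/docs/2011/report.txt"))
```

With these inputs the sketch prints D:/Output/docs/2011/report.txt, that is, the docs/2011 directories of the URI are recreated below the destination directory.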
Advanced settings
tStatCatcher Statistics |
Select this check box to collect the log data |
Timeout |
Enter the number of milliseconds after which Available for the http and https protocols. |
Print response to |
Select this check box to print the server Available for the http and https protocols. |
Upload file |
Select this check box to upload one or more
This option is available for the http and https protocols, with the With this option selected, the upload response will be saved |
Enable proxy server |
Select this check box if you are connecting Available for the http, https and ftp protocols. |
Enable NTLM Credentials |
Select this check box if you are using an NTLM
Domain: The client
Host: The client’s IP Available for the http and https protocols. |
Need authentication |
Select this check box and enter the username Available for the http and https protocols. |
Support redirection |
Select this check box to repeat the Available for the http, https and ftp protocols. |
Global Variables
Global Variables |
ERROR_MESSAGE: the error message generated by the
INPUT_STREAM: the content of the file being fetched. This A Flow variable functions during the execution of a component while an After variable To fill up a field or expression with a variable, press Ctrl + For further information about variables, see |
Usage
Usage rule |
This component is generally used as a start |
Limitation |
Due to license incompatibility, one or more JARs required to use |
Fetching data through HTTP
This scenario describes a three-component Job which retrieves a file from an HTTP
website, reads data from the fetched file and displays the data on the console.
Dropping and linking components
- Drop a tFileFetch, a tFileInputDelimited and a tLogRow onto your design workspace.
- Link tFileFetch to tFileInputDelimited using a Trigger > On Subjob Ok or On Component Ok connection.
- Link tFileInputDelimited to tLogRow using a Row > Main connection.
Configuring the components
- Double-click tFileFetch to open its Basic settings view.
- Select the protocol you want to use from the list. Here, http is selected.
- In the URI field, type in the URI from which the file can be retrieved. You can paste the URI directly into your browser to view the data in the file.
- In the Destination directory field, browse to the folder where the fetched file is to be stored. In this example, it is D:/Output.
- In the Destination filename field, type in a new name for the file if you want to rename it. In this example, it is new.txt.
- If needed, select the Add header check box and define one or more HTTP request headers as fetch conditions. For example, to fetch the file only if it has been modified since 19:43:31 GMT, October 29, 1994, fill in the Name and Value fields of the Headers table with “If-Modified-Since” and “Sat, 29 Oct 1994 19:43:31 GMT” respectively. For details about HTTP request header definitions, see Header Field Definitions.
- Double-click tFileInputDelimited to open its Basic settings view.
- In the File name field, type in the full path to the fetched file, which is now stored locally.
- Click the […] button next to Edit schema to open the Schema dialog box. In this example, add one column, output, to store the data from the fetched file.
- Leave other settings as they are.
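To make the effect of the If-Modified-Since header in the Add header step more concrete, here is a minimal Python sketch of the same conditional request made outside Talend. It is not how tFileFetch is implemented; the function names and the URL http://example.com/data.txt are hypothetical, and only the header name and date value come from the step above.

```python
import shutil
import urllib.error
import urllib.request
from datetime import datetime, timezone
from email.utils import format_datetime


def build_conditional_request(url, modified_since):
    """Build a GET request that carries an If-Modified-Since header."""
    # format_datetime(..., usegmt=True) produces the HTTP date format and
    # requires a datetime whose tzinfo is timezone.utc.
    header_value = format_datetime(modified_since, usegmt=True)
    return urllib.request.Request(url, headers={"If-Modified-Since": header_value})


def fetch_if_modified(url, modified_since, destination):
    """Save the file to destination; return False if the server answers 304."""
    request = build_conditional_request(url, modified_since)
    try:
        with urllib.request.urlopen(request, timeout=30) as response, \
                open(destination, "wb") as out:
            shutil.copyfileobj(response, out)
        return True
    except urllib.error.HTTPError as err:
        if err.code == 304:  # Not Modified: nothing to download
            return False
        raise


# Hypothetical URL; the date is the one used in the Headers table example.
since = datetime(1994, 10, 29, 19, 43, 31, tzinfo=timezone.utc)
request = build_conditional_request("http://example.com/data.txt", since)
# urllib stores header names capitalized as "If-modified-since".
print(request.get_header("If-modified-since"))
```

The last line prints Sat, 29 Oct 1994 19:43:31 GMT, the same value entered in the Value field. A server that honors the header replies with status 304 when the file has not changed since that date, in which case `fetch_if_modified` returns False and writes nothing.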
Saving and executing the Job
- Press Ctrl+S to save your Job.
- Press F6 or click Run on the Run tab to execute the Job.
The data of the fetched file is displayed on the console.
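For readers who want to see the read-and-display half of this Job in code, the following Python sketch mimics what tFileInputDelimited and tLogRow do here: it reads the locally stored file, maps each line onto the schema, and prints the rows. It is an illustration under stated assumptions, not the code Talend generates; the function names and the semicolon field separator are assumptions.

```python
import csv


def read_delimited(path, schema, field_separator=";"):
    """Read a delimited text file into a list of dicts keyed by schema column."""
    rows = []
    with open(path, newline="", encoding="utf-8") as handle:
        for fields in csv.reader(handle, delimiter=field_separator):
            if not fields:  # skip blank lines
                continue
            # zip() stops at the shorter sequence, so fields beyond the
            # schema are dropped.
            rows.append(dict(zip(schema, fields)))
    return rows


def log_rows(rows):
    """Print each row to the console, one line per row."""
    for row in rows:
        print("|".join(row.values()))
```

With the settings of this scenario, the call would be `log_rows(read_delimited("D:/Output/new.txt", ["output"]))`, where `output` is the single schema column defined above.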
Reusing stored cookie to fetch files through HTTP
This scenario describes a two-component Job which logs in to a given HTTP website and
then, using the cookie stored in a user-defined local directory, fetches data from this
website.
Dropping and linking components
- Drop two tFileFetch components onto your design workspace.
- Link the two components as subJobs using a Trigger > On Subjob Ok connection.
Configuring the components
Configuring the first subJob
- Double-click tFileFetch_1 to open its Component view.
- Select the protocol you want to use from the Protocol list. Here, we use the https protocol.
- In the URI field, type in the URI through which you can log in to the website and fetch the web page accordingly. In this example, the URI is:
  https://www.codeproject.com/script/Membership/LogOn.aspx?download=true
- In the Destination directory field, browse to the folder where the fetched web page is to be stored. This folder will be created on the fly if it does not exist. In this example, type in D:/download.
- In the Destination Filename field, type in a new name for the file if you want to rename it. In this example, it is codeproject.html.
- Under the Parameters table, click the plus button to add two rows and fill in the credentials for accessing the desired website. In the Name column, type in a name for each of the two rows. In this example, they are Email and Password, which are required by the website you are logging in to. In the Value column, type in the authentication information.
- Select the Save cookie check box.
- In the Cookie file field, type in the full path to the file in which you want to save the cookie. In this example, it is D:/download/cookie.
- Click Advanced settings to open its view.
- Select the Support redirection check box so that the redirection request will be repeated until the redirection is successful.
Configuring the second subJob
- Double-click tFileFetch_2 to open its Component view.
- From the Protocol list, select http.
- In the URI field, type in the address from which you fetch the files of interest. In this example, the address is:
  http://www.codeproject.com/script/articles/download.aspx?file=/KB/DLL/File_List_Downloader/FLD02June2011_Source.zip&rp=http://www.codeproject.com/Articles/203991/File-List-Downloader
- In the Destination directory field, type in the directory or browse to the folder where you want to store the fetched files. This folder can be created automatically during execution if it does not exist yet. In this example, type in D:/download.
- In the Destination Filename field, type in a new name for the file if you want to rename it. In this example, it is source.zip.
- Clear the POST method check box to deactivate the Parameters table.
- Select the Read cookie check box.
- In the Cookie file field, browse to the file which is used to save the cookie. In this example, it is D:/download/cookie.
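The two subJobs above follow a common pattern: log in with a POST request and save the session cookie to a file, then reload that file and send the cookie with a later GET request. The Python sketch below shows this pattern with the standard library only. It is an illustration, not the tFileFetch implementation: the function names are hypothetical, and the Netscape-style cookie file written by `MozillaCookieJar` is an assumption that may differ from the format tFileFetch uses.

```python
import http.cookiejar
import shutil
import urllib.parse
import urllib.request


def open_cookie_jar(cookie_file, load_existing=False):
    """Return a file-backed cookie jar, optionally loading saved cookies."""
    jar = http.cookiejar.MozillaCookieJar(cookie_file)
    if load_existing:
        # ignore_discard=True also loads session cookies (no expiry date).
        jar.load(ignore_discard=True)
    return jar


def login_and_save_cookie(login_url, credentials, cookie_file):
    """POST the credentials, then write the received cookies to cookie_file."""
    jar = open_cookie_jar(cookie_file)
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    body = urllib.parse.urlencode(credentials).encode("ascii")
    # Passing data makes urllib send a POST request.
    with opener.open(login_url, data=body, timeout=30) as response:
        page = response.read()
    jar.save(ignore_discard=True)
    return page


def fetch_with_saved_cookie(url, cookie_file, destination):
    """GET the URL with the saved cookies and write the body to destination."""
    jar = open_cookie_jar(cookie_file, load_existing=True)
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    with opener.open(url, timeout=30) as response, open(destination, "wb") as out:
        shutil.copyfileobj(response, out)
```

In terms of this scenario, the first subJob corresponds to calling `login_and_save_cookie` with the login URI, a dictionary holding the Email and Password parameters, and D:/download/cookie as the cookie file. The second subJob corresponds to calling `fetch_with_saved_cookie` with the download address, the same cookie file, and D:/download/source.zip. Unlike tFileFetch, this sketch does not create the destination folder, so it must exist beforehand.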
Saving and executing the Job
- Press Ctrl+S to save your Job.
- Press F6 or click Run on the Run tab to execute the Job.
Then, go to the local directory D:/download to check the downloaded file.
Related scenario
For an example of transferring data in streaming mode, see Reading data from a remote file in streaming mode