Component family |
Internet |
|
Function |
tFileFetch retrieves a file via a |
|
Purpose |
tFileFetch allows you to retrieve |
|
Basic settings |
Protocol |
Select the protocol you want to use from the list and fill in the The properties differ slightly depending on the type of protocol |
|
URI |
Type in the URI of the site from which the file is to be |
|
Use cache to save resource |
Select this check box to save the data in the cache. This option allows you to process the file data flow (in streaming |
Domain |
Enter the Microsoft server domain name. Available for the smb |
|
Username and Password |
Enter the authentication information required to access the To enter the password, click the […] button next to the Available for the smb |
|
|
Destination Directory |
Browse to the destination folder where the file fetched is to be |
|
Destination Filename |
Enter a new name for the file fetched. |
Create full path according to URI |
It allows you to reproduce the URI directory path. To save the Available for the http, https and ftp protocols. |
|
Add header |
Select this check box if you want to add one or more HTTP request Available for the http and |
|
POST method |
This check box is selected by default. It allows you to use the Clear the check box if you want to use the GET method. Available for the http and |
|
Die on error |
Clear this check box to skip the rows in error and to complete the Available for the http, https and ftp protocols. |
|
Read Cookie |
Select this check box for tFileFetch to load a web authentication Available for the http, https, ftp and smb |
|
Save Cookie |
Select this check box to save the web page authentication cookie. Available for the http, https, ftp and smb |
|
Cookie file |
Type in the full path to the file which you want to use to save Available for the http, https, ftp and smb |
|
Cookie policy |
Choose a cookie policy from this drop-down list. Four options are available: BROWSER_COMPATIBILITY, DEFAULT, NETSCAPE and RFC_2109. Available for the http, https, ftp and smb |
|
Single cookie header |
Check this box to put all cookies into one request header for Available for the http, https, ftp and smb |
|
Advanced settings |
tStatCatcher Statistics |
Select this check box to collect the log data at each component |
Timeout |
Enter the number of milliseconds after which the protocol Available for the http and |
|
Print response to console |
Select this check box to print the server response in the Available for the http and |
|
Upload file |
Select this check box to upload one or more files to the server.
Available for the http and |
|
Enable proxy server |
Select this check box if you are connecting via a proxy and Available for the http, https and ftp protocols. |
|
Enable NTLM Credentials |
Select this check box if you are using an NTLM authentication
Domain: The client domain
Host: The client’s IP Available for the http and |
|
Need authentication |
Select this check box and enter the username and password in the Available for the http and |
|
Support redirection |
Select this check box to repeat the redirection request until Available for the http, https and ftp protocols. |
|
Global Variables |
ERROR_MESSAGE: the error message generated by the INPUT_STREAM: the content of the file being fetched. This A Flow variable functions during the execution of a component while an After variable To fill up a field or expression with a variable, press Ctrl + For further information about variables, see Talend Studio |
|
Usage |
This component is generally used as a start component to feed the |
|
Log4j |
The activity of this component can be logged using the log4j feature. For more information on this feature, see Talend Studio User For more information on the log4j logging levels, see the Apache documentation at http://logging.apache.org/log4j/1.2/apidocs/org/apache/log4j/Level.html. |
|
Limitation |
Due to license incompatibility, one or more JARs required to use this component are not |
This scenario describes a three-component Job which retrieves a file from an HTTP
website, reads data from the fetched file and displays the data on the console.
- Drop a tFileFetch, a tFileInputDelimited and a tLogRow onto your design workspace.
- Link tFileFetch to tFileInputDelimited using a Trigger > On Subjob Ok or On Component Ok connection.
- Link tFileInputDelimited to tLogRow using a Row > Main connection.
- Double-click tFileFetch to open its Basic settings view.
- Select the protocol you want to use from the list. Here, http is selected.
- In the URI field, type in the URI from which the file is to be fetched. You can paste the URI directly in your browser to view the data in the file.
- In the Destination directory field, browse to the folder where the fetched file is to be stored. In this example, it is D:/Output.
- In the Destination filename field, type in a new name for the file if you want it to be changed. In this example, it is new.txt.
- If needed, select the Add header check box and define one or more HTTP request headers as fetch conditions. For example, to fetch the file only if it has been modified since 19:43:31 GMT, October 29, 1994, fill in the Name and Value fields with “If-Modified-Since” and “Sat, 29 Oct 1994 19:43:31 GMT” respectively in the Headers table. For details about HTTP request header definitions, see Header Field Definitions.
- Double-click tFileInputDelimited to open its Basic settings view.
- In the File name field, type in the full path to the fetched file that has been stored locally.
- Click the […] button next to Edit schema to open the [Schema] dialog box. In this example, add one column, output, to store the data from the fetched file.
- Leave other settings as they are.
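The steps above can be approximated outside Talend Studio with a short script. The following is a minimal Python sketch of the same flow (conditional fetch, local save, delimited read), not the component's actual implementation, which is Java-based. The function names, the 30-second timeout and the `;` field separator are assumptions made for illustration.

```python
import csv
import urllib.error
import urllib.request
from pathlib import Path


def build_request(uri, headers=None):
    """Build a GET request carrying optional HTTP request headers."""
    return urllib.request.Request(uri, headers=headers or {}, method="GET")


def fetch_to_file(request, dest_dir, dest_name, timeout=30):
    """Save the response body to dest_dir/dest_name.

    Returns the target path, or None when the server answers
    304 Not Modified to a conditional request.
    """
    target = Path(dest_dir) / dest_name
    target.parent.mkdir(parents=True, exist_ok=True)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            target.write_bytes(response.read())
    except urllib.error.HTTPError as err:
        if err.code == 304:  # file unchanged since the given date
            return None
        raise
    return target


def read_delimited(path, delimiter=";"):
    """Return the rows of a delimited text file as lists of strings."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.reader(handle, delimiter=delimiter))
```

To mirror the scenario, call `build_request` with the file's URI and `{"If-Modified-Since": "Sat, 29 Oct 1994 19:43:31 GMT"}`, pass the result to `fetch_to_file` with `D:/Output` and `new.txt`, then print each row returned by `read_delimited`.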
This scenario describes a two-component Job which logs in to a given HTTP website and
then, using a cookie stored in a user-defined local directory, fetches data from this
website.
- Drop two tFileFetch components onto your design workspace.
- Link the two components as subjobs using a Trigger > On Subjob Ok connection.
Configuring the first subjob
- Double-click tFileFetch_1 to open its Component view.
- Select the protocol you want to use from the Protocol list. Here, we use the https protocol.
- In the URI field, type in the URI through which you can log in to the website and fetch the web page accordingly. In this example, the URI is: https://www.codeproject.com/script/Membership/LogOn.aspx?download=true
- In the Destination directory field, browse to the folder where the fetched web page is to be stored. This folder will be created on the fly if it does not exist. In this example, type in D:/download.
- In the Destination Filename field, type in a new name for the file if you want it to be changed. In this example, it is codeproject.html.
- Under the Parameters table, click the plus button to add two rows and fill in the credentials for accessing the desired website. In the Name column, type in a name for each of the two rows. In this example, they are Email and Password, which are required by the website you are logging in to. In the Value column, type in the authentication information.
- Select the Save cookie check box.
- In the Cookie file field, type in the full path to the file which you want to use to save the cookie. In this example, it is D:/download/cookie.
- Click Advanced settings to open its view.
- Select the Support redirection check box so that the redirection request will be repeated until the redirection is successful.
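As a rough illustration of what the first subjob does at the HTTP level, the following Python sketch posts the login parameters and saves any cookies the server returns to a local file. It is an assumption-laden sketch rather than the component's implementation: the function names are hypothetical, the cookie file is written in the Netscape/Mozilla text format (the format used by tFileFetch is not stated here), and redirects are followed by the standard library's default handler.

```python
import http.cookiejar
import urllib.parse
import urllib.request
from pathlib import Path


def build_login_request(uri, parameters):
    """Encode name/value pairs as a form body.

    A urllib Request that carries data is sent with the POST method.
    """
    body = urllib.parse.urlencode(parameters).encode("ascii")
    return urllib.request.Request(uri, data=body)


def login_and_save_cookie(uri, parameters, cookie_file, timeout=30):
    """Post the credentials, save received cookies, return the page bytes."""
    Path(cookie_file).parent.mkdir(parents=True, exist_ok=True)
    jar = http.cookiejar.MozillaCookieJar(str(cookie_file))
    # build_opener also installs the default redirect handler.
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar)
    )
    request = build_login_request(uri, parameters)
    with opener.open(request, timeout=timeout) as response:
        page = response.read()
    # Keep session cookies too; they would otherwise be skipped on save.
    jar.save(ignore_discard=True, ignore_expires=True)
    return page
```

In terms of the scenario, `parameters` would be a dictionary with the Email and Password entries, and `cookie_file` would be D:/download/cookie.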
Configuring the second subjob
- Double-click tFileFetch_2 to open its Component view.
- From the Protocol list, select http.
- In the URI field, type in the address from which you fetch the files of your interest. In this example, the address is: http://www.codeproject.com/script/articles/download.aspx?file=/KB/DLL/File_List_Downloader/FLD02June2011_Source.zip&rp=http://www.codeproject.com/Articles/203991/File-List-Downloader
- In the Destination directory field, type in the directory or browse to the folder where you want to store the fetched files. This folder is created automatically during execution if it does not exist yet. In this example, type in D:/download.
- In the Destination Filename field, type in a new name for the file if you want it to be changed. In this example, it is source.zip.
- Clear the POST method check box to deactivate the Parameters table.
- Select the Read cookie check box.
- In the Cookie file field, browse to the file which is used to save the cookie. In this example, it is D:/download/cookie.
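The second subjob reuses the saved cookie for a plain GET request. A minimal Python sketch of that idea follows. It assumes the cookie file is in the Netscape/Mozilla text format, which may differ from the format tFileFetch writes, and the function names are hypothetical.

```python
import http.cookiejar
import shutil
import urllib.request
from pathlib import Path


def load_cookie_jar(cookie_file):
    """Load cookies from a Netscape/Mozilla-format cookie file."""
    jar = http.cookiejar.MozillaCookieJar(str(cookie_file))
    jar.load(ignore_discard=True, ignore_expires=True)
    return jar


def fetch_with_cookie(uri, cookie_file, dest_dir, dest_name, timeout=30):
    """GET the URI with the stored cookies and save the body locally."""
    target = Path(dest_dir) / dest_name
    target.parent.mkdir(parents=True, exist_ok=True)
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(load_cookie_jar(cookie_file))
    )
    # Opening a bare URL string issues a GET request.
    with opener.open(uri, timeout=timeout) as response:
        with open(target, "wb") as out:
            shutil.copyfileobj(response, out)
    return target
```

With the scenario's values, `cookie_file` would be D:/download/cookie, `dest_dir` D:/download and `dest_name` source.zip.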
For an example of transferring data in streaming mode, see Scenario 2: Reading data from a remote file in streaming mode