tFileList
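Iterates on the files or folders of a given directory that match a filemask pattern.
tFileList can list any kind of file, such as a hidden file or a zero-byte file, as long as the file meets the conditions set in the
Files field.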
tFileList Standard properties
These properties are used to configure tFileList running in the Standard Job framework.
The Standard tFileList component belongs to the File and the Orchestration families.
The component in this framework is available in all Talend products.
Basic settings
Directory |
Path to the directory where the files are stored. Warning: Use an absolute path (instead of a relative path) in
this field to avoid possible errors. |
FileList Type |
Select the type of input you want to iterate on from the
Files if the input is a set of
Directories if the input is a set
Both if the input is a set of the |
Include subdirectories |
Select this check box if the selected input source type includes |
Case Sensitive |
Set the case mode from the list to either create or not create |
Generate Error if no file found |
Select this check box to generate an error message if no files or |
Use Glob Expressions as Filemask |
This check box is selected by default. It filters the results |
Files |
Click the plus button to add as many filter lines as needed:
Filemask: in the added filter |
Order by |
The folders are listed first of all, then the files. You can
By default: alphabetical order, by
By file name: alphabetical order or
By file size: smallest to largest
By modified date: most recent to Note:
If ordering by file name, in |
Order action |
Select a sort order by clicking one of the ASC: ascending order;
DESC: descending |
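The filtering and ordering settings above can be illustrated with a small Java sketch. This is not Talend's implementation: the class `FileMaskSketch` and its methods are hypothetical names, the glob support is reduced to `*` and `?`, and only ordering by file name is shown. It simply demonstrates how a filemask, a case mode, and an ASC/DESC order action could combine.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Pattern;

class FileMaskSketch {

    // Translates a simplified glob ('*' = any run of characters, '?' = one character)
    // into a regular expression. Every other character is matched literally.
    public static Pattern globToPattern(String glob, boolean caseSensitive) {
        StringBuilder regex = new StringBuilder();
        for (char c : glob.toCharArray()) {
            switch (c) {
                case '*': regex.append(".*"); break;
                case '?': regex.append('.'); break;
                default: regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        int flags = caseSensitive ? 0 : Pattern.CASE_INSENSITIVE;
        return Pattern.compile(regex.toString(), flags);
    }

    // Keeps the names that match the mask, then sorts them by name (ASC or DESC).
    public static List<String> filterAndSort(List<String> names, String glob,
                                             boolean caseSensitive, boolean ascending) {
        Pattern mask = globToPattern(glob, caseSensitive);
        List<String> kept = new ArrayList<>();
        for (String name : names) {
            if (mask.matcher(name).matches()) {
                kept.add(name);
            }
        }
        Comparator<String> byName = Comparator.naturalOrder();
        kept.sort(ascending ? byName : byName.reversed());
        return kept;
    }

    public static void main(String[] args) {
        List<String> names = List.of("b.csv", "a.CSV", "a.csv", "notes.txt");
        // Case-sensitive mask, ascending order by name.
        System.out.println(filterAndSort(names, "*.csv", true, true));
        // Case-insensitive mask, descending order by name.
        System.out.println(filterAndSort(names, "*.csv", false, false));
    }
}
```

With a case-sensitive mask, `a.CSV` is filtered out; with a case-insensitive one it is kept. The actual matching rules of the component may differ from this simplified translation.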
Advanced settings
Use Exclude Filemask |
Select this check box to enable Exclude Filemask field to exclude
Exclude Filemask: Fill in the Note:
File types in this field should be quoted with double |
Format file path to slash(/) style(useful on |
Select this check box to format the file path to slash(/) style which is useful on Windows. |
tStatCatcher Statistics |
Select this check box to gather the Job processing metadata at a Job level as well as at each component level. |
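As a rough illustration of why slash-style paths are convenient on Windows, the following Java sketch converts backslashes to forward slashes. The class `SlashStyleSketch`, the method `toSlashStyle`, and the file name `logo.png` are hypothetical; the sketch only shows the idea, not how the component performs the conversion internally.

```java
class SlashStyleSketch {

    // Replaces every Windows-style backslash with a forward slash.
    public static String toSlashStyle(String path) {
        return path.replace('\\', '/');
    }

    public static void main(String[] args) {
        // In Java source code, each backslash of a string literal must be escaped,
        // which is one reason slash-style paths are easier to handle.
        String windowsPath = "E:\\DataFiles\\DI\\images\\logo.png";
        System.out.println(toSlashStyle(windowsPath));
    }
}
```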
Global Variables
Global Variables |
CURRENT_FILE: the current file name. This is a Flow
CURRENT_FILEPATH: the current file path. This is a Flow
CURRENT_FILEEXTENSION: the extension of the current file.
CURRENT_FILEDIRECTORY: the current file directory. This
NB_FILE: the number of files iterated upon so far. This is
ERROR_MESSAGE: the error message generated by the
A Flow variable functions during the execution of a component while an After variable
To fill up a field or expression with a variable, press Ctrl +
For further information about variables, see |
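The variables listed above are read from the Job's globalMap with a cast, as in `((String)globalMap.get("tFileList_1_CURRENT_FILEPATH"))`. The sketch below simulates this with a plain `HashMap` so that it can run outside a Job. The class `GlobalMapSketch` and the method `publishCurrentFile` are hypothetical, the key pattern `<component>_<VARIABLE>` is inferred from the expressions used in this document, and storing the extension without its leading dot and NB_FILE as an `Integer` are assumptions.

```java
import java.util.HashMap;
import java.util.Map;

class GlobalMapSketch {

    // Stores the values for the current file under keys of the form <component>_<VARIABLE>.
    // The path is expected in slash style, for example "D:/temp/tempdata.csv".
    public static void publishCurrentFile(Map<String, Object> globalMap, String component,
                                          String slashStylePath, int filesSoFar) {
        int lastSlash = slashStylePath.lastIndexOf('/');
        String fileName = slashStylePath.substring(lastSlash + 1);
        String directory = lastSlash >= 0 ? slashStylePath.substring(0, lastSlash) : "";
        int lastDot = fileName.lastIndexOf('.');
        // Assumption: the extension is stored without the leading dot.
        String extension = lastDot >= 0 ? fileName.substring(lastDot + 1) : "";

        globalMap.put(component + "_CURRENT_FILE", fileName);
        globalMap.put(component + "_CURRENT_FILEPATH", slashStylePath);
        globalMap.put(component + "_CURRENT_FILEDIRECTORY", directory);
        globalMap.put(component + "_CURRENT_FILEEXTENSION", extension);
        globalMap.put(component + "_NB_FILE", filesSoFar);
    }

    public static void main(String[] args) {
        Map<String, Object> globalMap = new HashMap<>();
        publishCurrentFile(globalMap, "tFileList_1", "D:/temp/tempdata.csv", 1);

        // Values come back as Object, hence the explicit casts.
        String currentPath = ((String) globalMap.get("tFileList_1_CURRENT_FILEPATH"));
        Integer count = ((Integer) globalMap.get("tFileList_1_NB_FILE"));
        System.out.println(currentPath + " (file number " + count + ")");
    }
}
```

Inside a real Job, the map is provided and filled by the generated code, so only the `globalMap.get(...)` expressions are needed.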
Usage
Usage rule |
tFileList provides a list of |
Connections |
Outgoing links (from this component to another):
Row: Iterate
Trigger: On Subjob Ok; On Subjob
Incoming links (from one component to this one):
Row: Iterate.
Trigger: Run if; On Subjob Ok; On
For further information regarding connections, see |
Iterating on a file directory
The following scenario creates a three-component Job, which aims at listing files from
a defined directory, reading each file by iteration, selecting delimited data and
displaying the output in the Run log console.
Dropping and linking the components
- Drop the following components from the Palette to the design workspace: tFileList, tFileInputDelimited, and tLogRow.
- Right-click the tFileList component, and pull an Iterate connection to the tFileInputDelimited component. Then pull a Main row from the tFileInputDelimited to the tLogRow component.
Configuring the components
- Double-click tFileList to display its Basic settings view and define its properties.
- Browse to the Directory that holds the files you want to process. To display the path on the Job itself, use the label (__DIRECTORY__) that shows up when you put the pointer anywhere in the Directory field. Type this label in the Label Format field, which you can find by clicking the View tab.
- In the Basic settings view, from the FileList Type list, select the source type you want to process, Files in this example.
- In the Case sensitive list, select a case mode, Yes in this example, to create a case-sensitive filter on file names.
- Keep the Use Glob Expressions as Filemask check box selected if you want to use glob expressions to filter files, and define a file mask in the Filemask field.
- Double-click tFileInputDelimited to display its Basic settings view and set its properties.
- Fill the File Name field with a variable containing the current file path, as set in the Basic settings of tFileList. Press Ctrl+Space to access the autocomplete list of variables, and select the global variable ((String)globalMap.get("tFileList_1_CURRENT_FILEPATH")). This way, all files in the input directory can be processed.
- Fill in all other fields as detailed in the tFileInputDelimited section. Related topic: tFileInputDelimited.
- Select the last component, tLogRow, to display its Basic settings view and fill in the separator to be used to distinguish field content displayed on the console. Related topic: tLogRow.
Executing the Job
Press Ctrl + S to save your Job, and press
F6 to run it.
The Job iterates on the defined directory, and reads all included files. Then
delimited data is passed on to the last component which displays it on the
console.
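For readers who want to see the logic of this Job outside the Studio, here is a minimal Java sketch of the same flow: list the files of a directory that match a mask, read each one as delimited text, and print each row with a display separator. It is an approximation under stated assumptions, not the code the Job generates: the class `IterateDirectorySketch`, its methods, the `*.csv` mask, and the `;` and `|` separators are all hypothetical choices.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.regex.Pattern;

class IterateDirectorySketch {

    // Plays the role of tFileList: regular files of dir matching the glob, sorted by path.
    public static List<Path> listFiles(Path dir, String glob) throws IOException {
        List<Path> files = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, glob)) {
            for (Path candidate : stream) {
                if (Files.isRegularFile(candidate)) {
                    files.add(candidate);
                }
            }
        }
        Collections.sort(files);
        return files;
    }

    // Plays the role of tFileInputDelimited and tLogRow: splits each non-empty line on the
    // field separator and joins the fields again with the display separator.
    public static List<String> readAndFormat(Path dir, String glob, String fieldSeparator,
                                             String displaySeparator) throws IOException {
        List<String> rows = new ArrayList<>();
        for (Path file : listFiles(dir, glob)) {
            for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
                if (line.isEmpty()) {
                    continue;
                }
                String[] fields = line.split(Pattern.quote(fieldSeparator), -1);
                rows.add(String.join(displaySeparator, fields));
            }
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        // The directory is taken from the first argument, or the working directory by default.
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        for (String row : readAndFormat(dir, "*.csv", ";", "|")) {
            System.out.println(row);
        }
    }
}
```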
Finding duplicate files between two folders
This scenario describes a Job that iterates on files in two folders, transforms the
iteration results to data flows to obtain a list of filenames, and then picks up all
duplicates from the list and shows them on the Run
console, as a preparation step before merging the two folders, for example.
Dropping and linking the components
- From the Palette, drop two tFileList components, two tIterateToFlow components, two tFileOutputDelimited components, a tFileInputDelimited component, a tUniqRow component, and a tLogRow component onto the design workspace.
- Link the first tFileList component to the first tIterateToFlow component using a Row > Iterate connection, and then connect the first tIterateToFlow component to the first tFileOutputDelimited component using a Row > Main connection to form the first subJob.
- Link the second tFileList component to the second tIterateToFlow component using a Row > Iterate connection, and then connect the second tIterateToFlow component to the second tFileOutputDelimited component using a Row > Main connection to form the second subJob.
- Link the tFileInputDelimited to the tUniqRow component using a Row > Main connection, and the tUniqRow component to the tLogRow component using a Row > Duplicates connection to form the third subJob.
- Link the three subJobs using Trigger > On Subjob Ok connections so that they will be triggered one after another, and label the components to better identify their roles in the Job.
Configuring the components
- In the Basic settings view of the first tFileList component, fill the Directory field with the path to the first folder you want to read filenames from, E:/DataFiles/DI/images in this example, and leave the other settings as they are.
- Double-click the first tIterateToFlow component to show its Basic settings view.
- Click the […] button next to Edit schema to open the Schema dialog box and define the schema of the text file the next component will write filenames to. When done, click OK to close the dialog box and propagate the schema to the next component. In this example, the schema contains only one column: Filename.
- In the Value field of the Mapping table, press Ctrl+Space to access the autocomplete list of variables, and select the global variable ((String)globalMap.get("tFileList_1_CURRENT_FILE")) to read the name of each file in the input directory, which will be put into a data flow to pass to the next component.
- In the Basic settings view of the first tFileOutputDelimited component, fill the File Name field with the path of the text file that will store the filenames from the incoming flow, D:/temp/tempdata.csv in this example. This completes the configuration of the first subJob.
- Repeat the steps above to complete the configuration of the second subJob, but:
  - fill the Directory field in the Basic settings view of the second tFileList component with the other folder you want to read filenames from, E:/DataFiles/DQ/images in this example.
  - in the Mapping table of the second tIterateToFlow component, select the CURRENT_FILE variable of the second tFileList component (tFileList_2_CURRENT_FILE if the component keeps its default name).
  - select the Append check box in the Basic settings view of the second tFileOutputDelimited component so that the filenames previously written to the text file will not be overwritten.
- In the Basic settings view of the tFileInputDelimited component, fill the File name/Stream field with the path of the text file that stores the list of filenames, D:/temp/tempdata.csv in this example, and define the file schema, which contains only one column in this example, Filename.
- In the Basic settings view of the tUniqRow component, select the Key attribute check box for the only column, Filename in this example.
- In the Basic settings view of the tLogRow component, select the Table (print values in cells of a table) option for a more readable display.
Executing the Job
- Press Ctrl+S to save your Job.
- Click Run or press F6 to run the Job.
All the duplicate files between the selected folders are displayed on the console.
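The logic of this Job can also be sketched in plain Java: collect the filenames of both folders into one list (the role of the two tFileList, tIterateToFlow, and tFileOutputDelimited subJobs with Append selected), then keep every name that has already been seen (the role of the Duplicates output of tUniqRow). The class `DuplicateFileNamesSketch` and its methods are hypothetical, and the comparison is on file names only, as in the scenario; file contents are not compared.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class DuplicateFileNamesSketch {

    // Returns the names of the regular files found directly in dir, sorted alphabetically.
    public static List<String> fileNames(Path dir) throws IOException {
        List<String> names = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path entry : stream) {
                if (Files.isRegularFile(entry)) {
                    names.add(entry.getFileName().toString());
                }
            }
        }
        Collections.sort(names);
        return names;
    }

    // Returns every name whose value has already appeared earlier in the list.
    public static List<String> duplicates(List<String> names) {
        Set<String> seen = new HashSet<>();
        List<String> repeated = new ArrayList<>();
        for (String name : names) {
            if (!seen.add(name)) { // add() returns false when the name is already present
                repeated.add(name);
            }
        }
        return repeated;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("Usage: DuplicateFileNamesSketch <firstFolder> <secondFolder>");
            return;
        }
        // Appending the second list to the first mirrors the Append check box.
        List<String> all = new ArrayList<>(fileNames(Paths.get(args[0])));
        all.addAll(fileNames(Paths.get(args[1])));
        for (String name : duplicates(all)) {
            System.out.println(name);
        }
    }
}
```

For the folders used in this scenario, the program would be run with E:/DataFiles/DI/images and E:/DataFiles/DQ/images as its two arguments.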
For other scenarios using tFileList, see tFileCopy.