July 30, 2023

tFileList – Docs for ESB 7.x

tFileList

Iterates a set of files or folders in a given directory based on a
filemask pattern.

Note: This component iterates over every file in a directory, including system files,
hidden files, zero-byte files, and so on, as long as the file meets the conditions set in the
Files field.

tFileList Standard properties

These properties are used to configure tFileList running in the Standard Job framework.

The Standard tFileList component belongs to the File and the Orchestration families.

The component in this framework is available in all Talend products.

Basic settings

Directory

Path to the directory where the files are stored.

Warning: Use absolute path (instead of relative path) for
this field to avoid possible errors.

FileList Type

Select the type of input you want to iterate on from the
list:

Files if the input is a set of
files,

Directories if the input is a set
of directories,

Both if the input is a set of the
above two types.

Include subdirectories

Select this check box if the selected input source type includes
sub-directories.

Case Sensitive

Select the case mode from the list to specify whether the filter
on filenames is case sensitive.

Generate Error if no file found

Select this check box to generate an error message if no files or
directories are found.

Use Glob Expressions as Filemask

This check box is selected by default. It filters the results
using a global expression (glob expression).

Files

Click the plus button to add as many filter lines as needed:

Filemask: in the added filter
lines, type in a filename or a filemask using special characters or
regular expressions.
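
As an illustration of how a glob-style filemask selects file names, here is a minimal Java sketch. It relies on the JDK's own glob syntax through FileSystems.getPathMatcher, which is an assumption made for illustration: the matching performed by tFileList may differ in details such as case handling. The class and method names (GlobFilemaskSketch, filter) are hypothetical.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class GlobFilemaskSketch {

    // Returns the names matching the glob pattern, in their original order.
    // Uses the JDK glob syntax (assumption: not necessarily identical to
    // the matching done by tFileList).
    static List<String> filter(List<String> fileNames, String glob) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        List<String> matched = new ArrayList<>();
        for (String name : fileNames) {
            if (matcher.matches(Paths.get(name))) {
                matched.add(name);
            }
        }
        return matched;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("orders.csv", "orders.txt", "notes.md");
        // "*" matches any run of characters within a file name.
        System.out.println(filter(names, "*.csv")); // prints [orders.csv]
    }
}
```

In this sketch, "?" stands for exactly one character, so a mask such as "file?.txt" would match file1.txt but not file12.txt.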

Order by

The folders are listed first of all, then the files. You can
choose to prioritise the folder and file order either:

By default: alphabetical order, by
folder then file;

By file name: alphabetical order or
reverse alphabetical order;

By file size: smallest to largest
or largest to smallest;

By modified date: most recent to
least recent or least recent to most recent.

Note:

If ordering by file name, in the event of identical file names, modified date
takes precedence. If ordering by file size, in the event of identical file sizes,
file name takes precedence. If ordering by modified date, in the event of
identical dates, file name takes precedence.
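
The tie-breaking rules in the note can be pictured as chained comparators. The following Java sketch is only an illustration under stated assumptions: the Entry class, the comparator constants, and sortedNames are hypothetical names, and the sketch does not claim to reproduce how the component sorts internally.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class OrderBySketch {

    // Hypothetical holder for the three attributes used for ordering.
    static final class Entry {
        final String name;
        final long size;      // size in bytes
        final long modified;  // last-modified time in milliseconds

        Entry(String name, long size, long modified) {
            this.name = name;
            this.size = size;
            this.modified = modified;
        }
    }

    // By file name; identical names fall back to modified date.
    static final Comparator<Entry> BY_NAME =
            Comparator.comparing((Entry e) -> e.name).thenComparingLong((Entry e) -> e.modified);

    // By file size; identical sizes fall back to file name.
    static final Comparator<Entry> BY_SIZE =
            Comparator.comparingLong((Entry e) -> e.size).thenComparing((Entry e) -> e.name);

    // By modified date; identical dates fall back to file name.
    static final Comparator<Entry> BY_MODIFIED =
            Comparator.comparingLong((Entry e) -> e.modified).thenComparing((Entry e) -> e.name);

    // Returns the entry names sorted with the given comparator (ascending).
    static List<String> sortedNames(List<Entry> entries, Comparator<Entry> order) {
        List<Entry> copy = new ArrayList<>(entries);
        copy.sort(order);
        List<String> names = new ArrayList<>();
        for (Entry e : copy) {
            names.add(e.name);
        }
        return names;
    }

    public static void main(String[] args) {
        List<Entry> entries = Arrays.asList(
                new Entry("b.txt", 10L, 200L),
                new Entry("a.txt", 10L, 100L),
                new Entry("c.txt", 5L, 300L));
        // a.txt and b.txt have the same size, so the name decides their order.
        System.out.println(sortedNames(entries, BY_SIZE)); // prints [c.txt, a.txt, b.txt]
    }
}
```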

Order action

Select a sort order by clicking one of the
following radio buttons:

ASC: ascending order;

DESC: descending order.

Advanced settings

Use Exclude Filemask

Select this check box to enable the Exclude Filemask field, which
excludes files from the results based on file type:

Exclude Filemask: Fill in the
field with the file types to be excluded from the filemasks defined in the
Basic settings view.

Note:

File types in this field should be enclosed in double
quotation marks and separated by commas.

Format file path to slash(/) style(useful on
Windows)

Select this check box to format the file path to
slash(/) style which is useful on Windows.
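
To show what slash-style formatting means in practice, here is a minimal Java sketch that converts a Windows path with backslashes to forward slashes. The class and method names are hypothetical, and the conversion shown is an assumption about the effect of the option, not a description of the component's internal code.

```java
public class SlashPathSketch {

    // Replaces every backslash with a forward slash.
    static String toSlashStyle(String path) {
        return path.replace('\\', '/');
    }

    public static void main(String[] args) {
        // In Java source, each backslash in a string literal must be escaped.
        System.out.println(toSlashStyle("E:\\DataFiles\\DI\\images\\logo.png"));
        // prints E:/DataFiles/DI/images/logo.png
    }
}
```

Forward slashes are convenient in Java expressions because they need no escaping in string literals.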

tStatCatcher Statistics

Select this check box to gather the Job processing
metadata at a Job level as well as at each component level.

Global Variables

CURRENT_FILE: the current file name. This is a Flow
variable and it returns a string.

CURRENT_FILEPATH: the current file path. This is a Flow
variable and it returns a string.

CURRENT_FILEEXTENSION: the extension of the current file.
This is a Flow variable and it returns a string.

CURRENT_FILEDIRECTORY: the current file directory. This
is a Flow variable and it returns a string.

NB_FILE: the number of files iterated upon so far. This is
a Flow variable and it returns an integer.

ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable and it returns a string. This
variable functions only if the Die on error check box is
cleared, if the component has this check box.

A Flow variable functions during the execution of a component while an After variable
functions after the execution of the component.

To fill up a field or expression with a variable, press Ctrl + Space
to access the variable list and choose the variable to use from it.

For further information about variables, see Talend Studio User Guide.
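
As a rough illustration of how these variables are read in a Java expression, the sketch below uses a plain HashMap as a stand-in for the globalMap that a Job provides at run time. The key names assume a component labelled tFileList_1 and follow the pattern "component label, underscore, variable name"; the class GlobalMapSketch and the method describe are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

public class GlobalMapSketch {

    // Builds a one-line description of the current iteration from the map.
    // Values are stored as Object, so each one is cast to its documented type.
    static String describe(Map<String, Object> globalMap) {
        String file = (String) globalMap.get("tFileList_1_CURRENT_FILE");
        String path = (String) globalMap.get("tFileList_1_CURRENT_FILEPATH");
        Integer count = (Integer) globalMap.get("tFileList_1_NB_FILE");
        return "#" + count + " " + file + " (" + path + ")";
    }

    public static void main(String[] args) {
        // Stand-in for the map filled in by the Job (assumption: sample values).
        Map<String, Object> globalMap = new HashMap<>();
        globalMap.put("tFileList_1_CURRENT_FILE", "orders.csv");
        globalMap.put("tFileList_1_CURRENT_FILEPATH", "/data/in/orders.csv");
        globalMap.put("tFileList_1_NB_FILE", 1);
        System.out.println(describe(globalMap)); // prints #1 orders.csv (/data/in/orders.csv)
    }
}
```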

Usage

Usage rule

tFileList provides a list of
files or folders from a defined directory on which it iterates.

Connections

Outgoing links (from this component to another):

Row: Iterate.

Trigger: On Subjob Ok; On Subjob
Error; Run if; On Component Ok; On Component Error.

Incoming links (from one component to this one):

Row: Iterate.

Trigger: Run if; On Subjob Ok; On
Subjob Error; On Component Ok; On Component Error; Synchronize;
Parallelize.

For further information regarding connections, see Talend Studio User Guide.

Iterating on a file directory

The following scenario creates a three-component Job, which aims at listing files from
a defined directory, reading each file by iteration, selecting delimited data and
displaying the output in the Run log console.

tFileList_1.png

Dropping and linking the components

  1. Drop the following components from the Palette to the design workspace: tFileList, tFileInputDelimited, and tLogRow.
  2. Right-click the tFileList component, and
    pull an Iterate connection to the tFileInputDelimited component. Then pull a
    Main row from the tFileInputDelimited to the tLogRow component.

Configuring the components

  1. Double-click tFileList to display its
    Basic settings view and define its
    properties.

    tFileList_2.png

  2. Browse to the Directory that holds the
    files you want to process. To display the path on the Job itself, use the
    label (__DIRECTORY__) that appears when you hover the pointer over the
    Directory field. Type this label in
    the Label Format field, which you can find by
    clicking the View tab.

    tFileList_3.png

  3. In the Basic settings view and from the
    FileList Type list, select the source
    type you want to process, Files in this
    example.
  4. In the Case Sensitive list, select a case
    mode, Yes in this example, to create a case-sensitive
    filter on file names.
  5. Keep the Use Glob Expressions as Filemask
    check box selected if you want to use global expressions to filter files,
    and define a file mask in the Filemask
    field.
  6. Double-click tFileInputDelimited to
    display its Basic settings view and set its
    properties.

    tFileList_4.png

  7. Fill in the File Name field with a
    variable containing the path of the current file, as provided by
    tFileList. Press Ctrl+Space
    to access the autocomplete list of variables, and select
    the global variable
    ((String)globalMap.get("tFileList_1_CURRENT_FILEPATH")).
    This way, all files in the input directory can be processed.
  8. Fill in all other fields as detailed in the tFileInputDelimited section. Related topic: tFileInputDelimited.
  9. Select the last component, tLogRow, to
    display its Basic settings view and fill in
    the separator to be used to distinguish field content displayed on the
    console. Related topic: tLogRow.

Executing the Job

Press Ctrl + S to save your Job, and press
F6 to run it.

tFileList_5.png

The Job iterates on the defined directory, and reads all included files. Then
delimited data is passed on to the last component which displays it on the
console.
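
Outside of Talend Studio, the behaviour of this Job can be approximated in plain Java. The sketch below is only an analogy under stated assumptions: it lists the files matching a glob mask in a directory, reads each one as delimited text, and returns the rows joined with "|" for display. The class name, the readAll method, the "*.csv" mask, and the ";" separator are all hypothetical choices, not settings taken from the scenario.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.regex.Pattern;

public class IterateDirectorySketch {

    // Lists regular files matching the glob, reads each as delimited text,
    // and returns every row with its fields joined by "|".
    static List<String> readAll(Path directory, String glob, String separator) throws IOException {
        List<Path> files = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(directory, glob)) {
            for (Path p : stream) {
                if (Files.isRegularFile(p)) {
                    files.add(p);
                }
            }
        }
        Collections.sort(files); // deterministic order for the illustration

        List<String> rows = new ArrayList<>();
        for (Path file : files) { // one pass per file, like an Iterate link
            for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
                // Quote the separator so it is treated literally, not as a regex.
                String[] fields = line.split(Pattern.quote(separator), -1);
                rows.add(String.join("|", fields));
            }
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        // Directory to scan: first argument, or the current directory.
        Path directory = Paths.get(args.length > 0 ? args[0] : ".");
        for (String row : readAll(directory, "*.csv", ";")) {
            System.out.println(row);
        }
    }
}
```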

Finding duplicate files between two folders

This scenario describes a Job that iterates on files in two folders, transforms the
iteration results to data flows to obtain a list of filenames, and then picks up all
duplicates from the list and shows them on the Run
console, as a preparation step before merging the two folders, for example.

tFileList_6.png

Dropping and linking the components

  1. From the Palette, drop two tFileList components, two tIterateToFlow components, two tFileOutputDelimited components, a tFileInputDelimited component, a tUniqRow component, and a tLogRow component onto the design workspace.
  2. Link the first tFileList component to the
    first tIterateToFlow component using a
    Row > Iterate connection, and then connect the first tIterateToFlow component to the first tFileOutputDelimited component using a Row > Main
    connection to form the first subJob.
  3. Link the second tFileList component to
    the second tIterateToFlow component using a
    Row > Iterate connection, and then connect the second tIterateToFlow component to the second tFileOutputDelimited component using a Row > Main
    connection to form the second subJob.
  4. Link the tFileInputDelimited to the
    tUniqRow component using a Row > Main
    connection, and the tUniqRow component to
    the tLogRow component using a Row > Duplicates
    connection to form the third subJob.
  5. Link the three subJobs using Trigger >
    On Subjob Ok connections so that they
    will be triggered one after another, and label the components to better
    identify their roles in the Job.

Configuring the components

  1. In the Basic settings view of the first
    tFileList component, fill the Directory field with the path to the first folder
    you want to read filenames from, E:/DataFiles/DI/images in this example, and leave the other
    settings as they are.

    tFileList_7.png

  2. Double-click the first tIterateToFlow
    component to show its Basic settings
    view.

    tFileList_8.png

  3. Double-click the […] button next to
    Edit schema to open the Schema dialog box and define the schema of the
    text file the next component will write filenames to. When done, click
    OK to close the dialog box and
    propagate the schema to the next component.

    In this example, the schema contains only one column: Filename.
    tFileList_9.png

  4. In the Value field of the Mapping table, press Ctrl+Space to access the autocomplete list of variables, and
    select the global variable
    ((String)globalMap.get("tFileList_1_CURRENT_FILE")) to read
    the name of each file in the input directory, which will be put into a data
    flow to pass to the next component.
  5. In the Basic settings view of the first
    tFileOutputDelimited component, fill
    the File Name field with the path of the
    text file that will store the filenames from the incoming flow, D:/temp/tempdata.csv in this example. This
    completes the configuration of the first subJob.

    tFileList_10.png

  6. Repeat the steps above to complete the configuration of the second subJob,
    but:

    • fill the Directory field in the
      Basic settings view of the
      second tFileList component with the
      other folder you want to read filenames from, E:/DataFiles/DQ/images in this
      example.

    • select the Append check box in
      the Basic settings view of the
      second tFileOutputDelimited
      component so that the filenames previously written to the text file
      will not be overwritten.

  7. In the Basic settings view of the
    tFileInputDelimited component, fill the
    File name/Stream field with the path of
    the text file that stores the list of filenames, D:/temp/tempdata.csv in this example, and define the file
    schema, which contains only one column in this example, Filename.

    tFileList_11.png

  8. In the Basic settings view of the
    tUniqRow component, select the
    Key attribute check box for the only
    column, Filename in this example.

    tFileList_12.png

  9. In the Basic settings view of the
    tLogRow component, select the Table (print values in cells of a table) option
    for better display effect.

Executing the Job

  1. Press Ctrl+S to save your Job.
  2. Click Run or press F6 to run the Job.

    All the duplicate files between the selected folders are displayed on the
    console.
    tFileList_13.png
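
The logic of this Job can also be sketched in plain Java: collect the file names from both folders into one list, then report every name that occurs more than once. This is a rough analogy rather than the Job's actual generated code. The class and method names are hypothetical, and names are compared case-sensitively as an assumption.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class DuplicateNamesSketch {

    // Lists the names of the regular files directly inside a folder.
    static List<String> fileNames(Path folder) throws IOException {
        List<String> names = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(folder)) {
            for (Path p : stream) {
                if (Files.isRegularFile(p)) {
                    names.add(p.getFileName().toString());
                }
            }
        }
        return names;
    }

    // Returns, in sorted order, every name that appears more than once.
    static Set<String> duplicates(List<String> names) {
        Set<String> seen = new HashSet<>();
        Set<String> dups = new TreeSet<>();
        for (String name : names) {
            if (!seen.add(name)) { // add() returns false if already present
                dups.add(name);
            }
        }
        return dups;
    }

    // Combines both folder listings, like appending to one temporary file.
    static Set<String> duplicatesBetween(Path first, Path second) throws IOException {
        List<String> all = new ArrayList<>(fileNames(first));
        all.addAll(fileNames(second));
        return duplicates(all);
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.out.println("usage: DuplicateNamesSketch <folder1> <folder2>");
            return;
        }
        System.out.println(duplicatesBetween(Paths.get(args[0]), Paths.get(args[1])));
    }
}
```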

For other scenarios using tFileList, see tFileCopy.


Document obtained from Talend: https://help.talend.com