Warning
This component is available in the Palette of Talend Studio only if you have subscribed to one of the Talend solutions with Big Data.
Component family: Big Data/File
Function: This component checks whether a file exists in a specific HDFS directory.
Purpose: This component checks the existence of a specific file in HDFS.
Basic settings
Property type: Either Built-in or Repository.
Built-in: No property data stored …
Repository: Select the repository …
Since version 5.6, both the Built-In mode and the Repository mode are …
Use an existing connection: Select this check box and in the Component List click the …
Note: When a Job contains the parent Job and the child Job, Component …
Version
Distribution: Select the cluster you are using from the drop-down list. The options in the list vary …
To connect to a custom distribution, select Custom, then click the button to display the dialog box in which you can …
Hadoop version: Select the version of the Hadoop distribution you are using. The available options vary …
Authentication
Use kerberos authentication: If you are accessing a Hadoop cluster running with Kerberos security, select this check box …
This check box is available depending on the Hadoop distribution you are connecting to.
Use a keytab to authenticate: Select the Use a keytab to authenticate check box to log …
Note that the user that executes a keytab-enabled Job is not necessarily the one a …
NameNode URI: Type in the URI of the Hadoop NameNode, the master node of a Hadoop system.
User name: Enter the user name used to authenticate to HDFS.
Group: Enter the membership including the authentication user under which the HDFS instances were …
HDFS directory: Browse to, or enter, the HDFS directory that contains the data you need to use.
File name or relative path: Enter the name of the file whose existence you want to check.
Advanced settings
Hadoop properties: Talend Studio uses a default configuration for its engine to perform …
For further information about the properties required by Hadoop and its related systems such …
tStatCatcher Statistics: Select this check box to gather the Job processing metadata at the Job level as well as at each component level.
Dynamic settings: Click the [+] button to add a row in the table and fill the …
The Dynamic settings table is available only when the …
For more information on Dynamic settings and context …
Global Variables:
EXISTS: whether the specified file exists.
FILENAME: the name of the file processed. This is an …
ERROR_MESSAGE: the error message generated by the …
A Flow variable functions during the execution of a component while an After variable …
To fill up a field or expression with a variable, press Ctrl + Space to access the variable list and select the variable.
For further information about variables, see the Talend Studio User Guide.
Usage: tHDFSExist is a standalone component.
Prerequisites: The Hadoop distribution must be properly installed, so as to guarantee the interaction …
For further information about how to install a Hadoop distribution, see the manuals …
Log4j: The activity of this component can be logged using the log4j feature. For more information on this feature, see the Talend Studio User Guide.
For more information on the log4j logging levels, see the Apache documentation at http://logging.apache.org/log4j/1.2/apidocs/org/apache/log4j/Level.html.
Limitation: JRE 1.6+ is required.
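The NameNode URI, HDFS directory and File name or relative path settings together identify the file being checked. The Java sketch below is only an illustration. It assumes the three values are joined into a single hdfs:// location, and the component's own path handling may differ. The helper name resolve, the host name masternode and the port 8020 are hypothetical example values.

```java
import java.net.URI;

// Illustrative sketch only: combines a NameNode URI, an HDFS directory and a
// file name into one hdfs:// location. The joining rule is an assumption, not
// the component's documented behavior.
public class HdfsPathSketch {

    // Hypothetical helper: builds "<NameNode URI>/<directory>/<file name>".
    static URI resolve(String nameNodeUri, String hdfsDirectory, String fileName) {
        URI base = URI.create(nameNodeUri);
        if (!"hdfs".equals(base.getScheme())) {
            throw new IllegalArgumentException("Expected an hdfs:// URI: " + nameNodeUri);
        }
        // Make the directory absolute and slash-terminated so resolve() keeps it whole.
        String dir = hdfsDirectory.startsWith("/") ? hdfsDirectory : "/" + hdfsDirectory;
        if (!dir.endsWith("/")) {
            dir = dir + "/";
        }
        // An absolute path reference keeps the NameNode's scheme and authority.
        return base.resolve(dir + fileName);
    }

    public static void main(String[] args) {
        // "masternode" and 8020 are hypothetical example values.
        URI target = resolve("hdfs://masternode:8020",
                "/user/ychen/data/hdfs/out/dest", "output.csv");
        System.out.println(target);
        System.out.println("host=" + target.getHost() + ", path=" + target.getPath());
    }
}
```

Check the actual NameNode address of your cluster (for example, the fs.defaultFS value in its Hadoop configuration) rather than relying on the example values.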
In this scenario, the two-component Job checks whether a specific file exists in HDFS
and returns a message to indicate the result of the verification.
In real-world practice, you can take further action on the checked file according to
the verification result, using the other HDFS components provided with the
Studio.
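The following Java sketch roughly illustrates the branching described above. It is not Talend-generated code. The existence check is passed in as a java.util.function.Predicate, which stands in for the lookup that tHDFSExist performs. On a real cluster, Hadoop's FileSystem.exists(Path) typically plays this role. The enum Action, the method nextAction and the in-memory file set are hypothetical.

```java
import java.util.Set;
import java.util.function.Predicate;

// Illustrative sketch only: not Talend-generated code. The Predicate stands in
// for the HDFS lookup performed by tHDFSExist.
public class HdfsExistBranchSketch {

    // Hypothetical labels for the two possible follow-up actions.
    enum Action { PROCESS_FILE, SHOW_NOT_FOUND_MESSAGE }

    // Chooses the next step of the Job from the result of the existence check.
    static Action nextAction(String path, Predicate<String> exists) {
        return exists.test(path) ? Action.PROCESS_FILE : Action.SHOW_NOT_FOUND_MESSAGE;
    }

    public static void main(String[] args) {
        // Hypothetical in-memory stand-in for the HDFS namespace.
        Set<String> fakeHdfs = Set.of("/user/ychen/data/hdfs/out/dest/input.csv");
        String target = "/user/ychen/data/hdfs/out/dest/output.csv";
        System.out.println(target + " -> " + nextAction(target, fakeHdfs::contains));
    }
}
```

Passing the check in as a function keeps the decision logic testable without a running cluster.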
Launch the Hadoop distribution in which you want to check the existence of a
particular file. Then, proceed as follows:
- In the Integration perspective of Talend Studio, create an empty Job, named hdfsexist_file for example, from the Job Designs node in the Repository tree view. For further information about how to create a Job, see the Talend Studio User Guide.
- Drop tHDFSExist and tMsgBox onto the workspace.
- Connect them using the Trigger > Run if link.
- Double-click tHDFSExist to open its Component view.
- In the Version area, select the Hadoop distribution you are connecting to and its version.
- In the Connection area, enter the values of the parameters required to connect to HDFS. In real-world practice, you may use tHDFSConnection to create a connection and reuse it from the current component. For further information, see tHDFSConnection.
- In the HDFS Directory field, browse to, or enter, the path to the folder containing the file to be checked. In this example, browse to /user/ychen/data/hdfs/out/dest.
- In the File name or relative path field, enter the name of the file whose existence you want to check, for example, output.csv.
- Double-click tMsgBox to open its Component view.
- In the Title field, enter the title to be used for the pop-up message box.
- In the Buttons list, select OK. This defines the button displayed on the message box.
- In the Icon list, select Icon information.
- In the Message field, enter the message you want to display once the file check is done. In this example, enter “This file does not exist!”.
- Click the If link to open its Basic settings view, where you can define the condition for checking the existence of this file.
- In the Condition box, press Ctrl+Space to access the variable list and select the global variable EXISTS. Type an exclamation mark before the variable to negate it, so that tMsgBox is triggered only when the file does not exist.
- Press F6 to execute this Job.
Once the Job has run, a message box pops up to indicate that the file output.csv does not exist in the directory you defined
earlier.
If you browse to this directory in HDFS, you can see that this file does not exist.
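In Talend's generated Java code, the EXISTS variable selected with Ctrl+Space typically appears as ((Boolean)globalMap.get("tHDFSExist_1_EXISTS")). The negated Run if condition therefore reads !((Boolean)globalMap.get("tHDFSExist_1_EXISTS")). The sketch below evaluates that expression against a plain HashMap standing in for globalMap. The key assumes the component keeps its default name tHDFSExist_1. The class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only: a plain HashMap stands in for the globalMap that
// Talend-generated Job code uses to share variables between components. The key
// assumes the default component name tHDFSExist_1.
public class RunIfConditionSketch {

    // Mirrors the Run if condition typed in the scenario:
    //   !((Boolean)globalMap.get("tHDFSExist_1_EXISTS"))
    static boolean shouldShowMessage(Map<String, Object> globalMap) {
        return !((Boolean) globalMap.get("tHDFSExist_1_EXISTS"));
    }

    public static void main(String[] args) {
        Map<String, Object> globalMap = new HashMap<>();
        // Simulates tHDFSExist having reported that output.csv is missing.
        globalMap.put("tHDFSExist_1_EXISTS", false);
        System.out.println("Trigger tMsgBox: " + shouldShowMessage(globalMap));
    }
}
```

Because the cast unboxes a Boolean, the expression throws a NullPointerException if EXISTS has not been set, for example when tHDFSExist did not run.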