tMongoDBGridFSPut
Connects to a MongoDB GridFS system to load files into it.
tMongoDBGridFSPut copies files from a local directory into a given MongoDB GridFS system and, if need be, renames these files.
tMongoDBGridFSPut Standard properties
These properties are used to configure tMongoDBGridFSPut running in the Standard Job framework.
The Standard
tMongoDBGridFSPut component belongs to the Big Data and the Databases NoSQL families.
The component in this framework is available in all Talend products with Big Data
and in Talend Data Fabric.
Basic settings
Property type |
Either Built-In or Repository. Built-In: No property data stored centrally.
Repository: Select the repository file where the |
Use an existing connection |
Select this check box and in the Component List click the relevant connection component to Note that when a Job contains the parent Job and the child Job, |
Use replica set address or multiple query routers |
Select this check box to show the Server In the Server addresses table, define the sharded |
Server and Port |
IP address and listening port of the database server. Available when the Use replica set Note that if you use the authentication mechanisms to connect to this MongoDB |
Database |
Name of the database. |
Use SSL connection |
Select this check box to enable the SSL or TLS encrypted connection. Then you need to use the tSetKeystore Note that the SSL connection is available only for the version 2.4 + of |
Set read preference |
Select this check box and from the Read preference If you leave this check box clear, the Job uses the default Read preference, that is to For further information, see MongoDB’s documentation about Replication and its Read |
Required authentication |
Select this check box to enable the database authentication. Among the mechanisms listed on the Authentication mechanism For details about the other mechanisms in this list, see MongoDB Authentication from the MongoDB |
Set Authentication database |
If the username to be used to connect to MongoDB has been created in a specific For further information about the MongoDB Authentication database, see User Authentication database. |
Username and Password |
DB user authentication data. To enter the password, click the […] button next to the Available when the Required If the security system you have selected from the Authentication mechanism drop-down list is Kerberos, you need to |
Bucket |
Enter the name of the bucket you need to write files in. A bucket of |
Local Folder |
Browse to, or enter the path to the folder in which the files to be copied and written |
Use Perl5 Regex Expression as |
Select this check box if you want to use Perl5 regular expressions in the Files field as file For information about Perl5 regular expression syntax, see Perl5 Regular Expression Syntax. |
Files |
In the Files area, the fields to – File mask: type in the file – New name: give a new name to |
Advanced settings
tStatCatcher Statistics |
Select this check box to collect the log data at the component |
No query timeout |
Select this check box to prevent MongoDB servers from stopping idle A cursor for MongoDB is a pointer to the result set of a query. By |
Global Variables
Global Variables |
NB_FILE: the number of files processed. This is an After
ERROR_MESSAGE: the error message generated by the A Flow variable functions during the execution of a component while an After variable To fill up a field or expression with a variable, press Ctrl + For further information about variables, see |
Usage
Usage rule |
This component combines MongoDB GridFS connection and data extraction, thus usually It is often connected to the Job using OnSubjobOk or OnComponentOk link, depending on the context. |
Managing files using MongoDB GridFS
This scenario applies only to Talend products with Big Data.
In this scenario, the MongoDB GridFS components are used to create a Job to manage video
files in MongoDB GridFS.
For further information about the GridFS system of MongoDB, see When to use GridFS.
For demonstration purposes, only one video file, called custom_hadoop.mp4, is used; you can use any of your own video files to replicate
this scenario.
Linking the components
- In the Integration perspective of the Studio, create an empty Job, named FS_video for example, from the Job Designs node in the Repository tree view. For further information about how to create a Job, see Talend Studio User Guide.
- In the workspace, enter the name of the component to be used and select this component from the list that appears. In this scenario, the components are tMongoDBConnection, tMongoDBGridFSPut, tMongoDBGridFSList, tMongoDBGridFSProperties, tFilterColumns, tLogRow, tMongoDBGridFSGet and tMongoDBGridFSDelete.
- Connect tMongoDBConnection to tMongoDBGridFSPut using the Trigger > On Subjob Ok link.
- Repeat this operation to connect tMongoDBGridFSPut to tMongoDBGridFSList, tMongoDBGridFSList to tMongoDBGridFSGet, and then tMongoDBGridFSGet to tMongoDBGridFSDelete.
- Connect tMongoDBGridFSList to tMongoDBGridFSProperties using the Row > Iterate link. This link allows tMongoDBGridFSList to send data to tMongoDBGridFSProperties iteratively.
- Connect tMongoDBGridFSProperties to tFilterColumns using the Row > Main link.
- Do the same to connect tFilterColumns to tLogRow.
Connecting to MongoDB
- Double-click tMongoDBConnection to open its Component view.
- From the DB version list, select the MongoDB version you are using.
- In the Server and Port fields, enter the connection information required to reach MongoDB, that is, the address and the listening port of the database server. If you use the host name of the MongoDB server, ensure that you have added the mapping between this host name and its IP address in the hosts file of the operating system in which the current Job is executed.
- In the Database field, enter the name of the database hosting GridFS. This database is created on the fly if it does not exist.
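For readers who want to check the same settings outside the Studio, the three fields above map naturally onto a standard MongoDB connection string of the form mongodb://host:port/database. The following is a minimal sketch under assumptions: the helper name build_mongo_uri, the server localhost, the port 27017 and the database name talend_gridfs are all hypothetical and do not come from this scenario. It covers only an unauthenticated connection.

```python
from urllib.parse import quote_plus


def build_mongo_uri(server: str, port: int, database: str) -> str:
    """Build a MongoDB connection string for an unauthenticated server.

    Hypothetical helper for illustration; it is not part of Talend.
    The path component of the URI names the default database.
    """
    return f"mongodb://{server}:{port}/{quote_plus(database)}"


# Hypothetical values standing in for the Server, Port and Database fields.
print(build_mongo_uri("localhost", 27017, "talend_gridfs"))
# mongodb://localhost:27017/talend_gridfs
```

A string of this form can be passed to a MongoDB client or shell to confirm that the server is reachable before the Job is run.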
Copying data to MongoDB GridFS
- Double-click tMongoDBGridFSPut to open its Component view.
- Select the Use existing connection check box and from the Connection list, select the component in which the MongoDB connection to be used is defined.
- In the Bucket field, enter the bucket to be used to store files in GridFS. In this example, it is talend_channel/61.
- In the Local folder field, enter the path, or browse to the folder where the files to be uploaded to GridFS are stored. As explained previously, this folder contains one video file called custom_hadoop.mp4.
- In the Files table, add one row by clicking the [+] button and in the Filemask column, enter *.mp4 within the double quotation marks. This allows tMongoDBGridFSPut to copy all the files with the .mp4 extension from the local folder you have specified to the bucket to be used in GridFS.
- Leave the New name column empty, that is to say, leave the double quotation marks in this column as is, so as to keep the name of this video unchanged after being copied to GridFS.
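The effect of the file mask and the empty New name cell can be pictured with a short sketch. It assumes that the mask behaves like a glob pattern and that an empty new name keeps the original file name, as the steps above describe. The function plan_uploads and the extra file names are hypothetical, and the code illustrates the selection logic only; it is not how the component is implemented.

```python
from fnmatch import fnmatchcase


def plan_uploads(local_files, file_mask, new_name=""):
    """Return (local_name, stored_name) pairs for files matching the mask.

    An empty new_name keeps the original name, mirroring an empty
    New name cell (an assumption made for this illustration).
    """
    plan = []
    for name in sorted(local_files):
        if fnmatchcase(name, file_mask):
            plan.append((name, new_name or name))
    return plan


# Hypothetical folder content; only custom_hadoop.mp4 comes from the scenario.
files = ["custom_hadoop.mp4", "notes.txt", "thumbnail.png"]
print(plan_uploads(files, "*.mp4"))
# [('custom_hadoop.mp4', 'custom_hadoop.mp4')]
```

With a non-empty new name, every matching file would be stored under that single name, which is one reason to leave the column empty when a mask can match several files.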
Listing files stored in MongoDB GridFS
Iterating on the files
- Double-click tMongoDBGridFSList to open its Component view.
- Select the Use existing connection check box and from the Connection list, select the component in which the MongoDB connection to be used is defined.
- In the Bucket field, enter the bucket in which the files to be listed are stored. In this example, it is talend_channel/61.
- In the Query field, enter the query to select the files you want tMongoDBGridFSList to iterate on to generate different file lists. In this example, leave the default one in order to iterate on all of the files stored in this talend_channel/61 bucket. As explained previously, only one file, custom_hadoop.mp4, is expected.
Extracting file metadata
- Double-click tMongoDBGridFSProperties to open its Component view.
- Select the Use existing connection check box and from the Connection list, select the component in which the MongoDB connection to be used is defined.
- In the Bucket field, enter the bucket in which the files to be used are stored. In this example, it is talend_channel/61.
- From the Query type list, select the approach you want to use to select the files about which you need to extract the metadata. In this example, select Filename to use the filename attribute of each GridFS file for query.
- In the Filename field, press Ctrl + Space to display the variable list and choose the variable to be used. In this example, select tMongoDBGridFSList.CURRENT_FILENAME from the list. Then the expression to use the CURRENT_FILENAME variable is automatically added. This allows tMongoDBGridFSProperties to read each file name returned by tMongoDBGridFSList.
Filtering attributes
- Double-click tFilterColumns to open its Component view.
- Click the […] button next to Edit schema to open the schema editor.
- On the left side (the input side), select the column to be used and click the button to move this column to the right side (the output side). In this example, move every column to the right side except the contentType column. Each column represents a file attribute and the pre-defined schema of tMongoDBGridFSProperties automatically contains these columns.
- Click OK to validate these changes and accept the propagation prompted by the pop-up dialog box.
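What this filtering step does to each row can be sketched as follows. The sample row is hypothetical: its keys follow the fields commonly found in a GridFS files document (filename, length, chunkSize, contentType), which may differ from the exact columns of the pre-defined tMongoDBGridFSProperties schema, and its values are invented for illustration.

```python
def filter_columns(row, excluded=("contentType",)):
    """Return a copy of the row without the excluded columns."""
    return {key: value for key, value in row.items() if key not in excluded}


# Hypothetical metadata row; values are illustrative only.
row = {
    "filename": "custom_hadoop.mp4",
    "length": 1048576,
    "chunkSize": 261120,
    "contentType": "video/mp4",
}
print(filter_columns(row))
# {'filename': 'custom_hadoop.mp4', 'length': 1048576, 'chunkSize': 261120}
```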
Downloading files from MongoDB GridFS
- Double-click tMongoDBGridFSGet to open its Component view.
- Select the Use existing connection check box and from the Connection list, select the component in which the MongoDB connection to be used is defined.
- In the Bucket field, enter the bucket in which the files to be retrieved are stored. In this example, it is talend_channel/61.
- In the Local folder field, enter the path to the local folder in which you want to store the downloaded files. In this scenario, it is C:/tmp/output.
- Select the Use Document ID as output filename check box to rename each downloaded file using the value of its ObjectID attribute. Since a file in GridFS is identified by its ID rather than by its name, several files may share the same file name. When such files are downloaded into the same directory without being renamed, an exception is returned to alert you that the file being downloaded already exists. To avoid this error, you can either select the Overwrite local files check box to replace the existing file with the latest downloaded one, or rename these files on the fly using their IDs. In this example, the renaming strategy is adopted.
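The naming conflict described in the last step can be reproduced with a small simulation. It is a sketch under assumptions: the records are plain dictionaries standing in for GridFS files, the ID strings are invented, and output_paths is a hypothetical helper rather than the logic of tMongoDBGridFSGet.

```python
from pathlib import PurePosixPath


def output_paths(files, folder, use_id_as_name):
    """Compute the local path of each file and fail on a duplicate path."""
    seen = set()
    paths = []
    for record in files:
        name = record["_id"] if use_id_as_name else record["filename"]
        path = str(PurePosixPath(folder) / name)
        if path in seen:
            # Mirrors the "file already exists" error described above.
            raise FileExistsError(path)
        seen.add(path)
        paths.append(path)
    return paths


# Two hypothetical files that share a name but have distinct IDs.
records = [
    {"_id": "5f1a0000000000000000aaaa", "filename": "custom_hadoop.mp4"},
    {"_id": "5f1a0000000000000000bbbb", "filename": "custom_hadoop.mp4"},
]
print(output_paths(records, "C:/tmp/output", use_id_as_name=True))
```

Calling the same function with use_id_as_name=False raises FileExistsError on the second record, because both files would be written to C:/tmp/output/custom_hadoop.mp4.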
Removing files from MongoDB GridFS
- Double-click tMongoDBGridFSDelete to open its Component view.
- Select the Use existing connection check box and from the Connection list, select the component in which the MongoDB connection to be used is defined.
- In the Bucket field, enter the bucket in which the files to be deleted are stored. In this example, it is talend_channel/61.
- From the Query type list, select the approach you want to use to select the files to be deleted. In this example, select Filename to use the filename attribute of each GridFS file for query.
- In the Filename field, enter the name of the file to be deleted.
Executing the Job
Then you can run this Job.
The tLogRow component is used to present the execution result of the Job.
- If you want to configure the presentation mode, double-click the tLogRow component to open its Component view and in the Mode area, select the Table (print values in cells of a table) radio button.
- Press F6 to run this Job.
Once done, the Run view is opened automatically, where
the metadata of the video custom_hadoop.mp4 in GridFS
is displayed.
The downloaded file can be found in the directory C:/tmp/output, using its ID as its file name.