July 30, 2023

tMongoDBGridFSPut – Docs for ESB 7.x

tMongoDBGridFSPut

Connects to a MongoDB GridFS system to load files into it.

tMongoDBGridFSPut copies files from a
local directory into a given MongoDB GridFS system and, if needed, renames
these files.

tMongoDBGridFSPut Standard properties

These properties are used to configure tMongoDBGridFSPut running in the Standard Job framework.

The Standard
tMongoDBGridFSPut component belongs to the Big Data and the Databases NoSQL families.

The component in this framework is available in all Talend products with Big Data
and in Talend Data Fabric.

Basic settings

Property type

Either Built-In or Repository.

Built-In: No property data stored centrally.

Repository: Select the repository file where the
properties are stored.

Use an existing connection

Select this check box and in the Component List click the relevant connection component to
reuse the connection details you already defined.

Note that when a Job contains a parent Job and a child Job,
the Component List presents only the
connection components at the same Job level.

Use replica set address or multiple query routers

Select this check box to show the Server addresses table.

In the Server addresses table, define the sharded
MongoDB databases or the MongoDB replica sets you want to connect
to.
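For illustration only, the rows of the Server addresses table can be thought of as the host list of a standard MongoDB connection string. The sketch below assumes that format; the class name SeedListUri, the host names and the database name are hypothetical, and the component's actual internal connection logic is not shown in this document.

```java
import java.util.List;

// Hypothetical helper for illustration; it is not part of Talend or the MongoDB driver.
class SeedListUri {

    // Joins "host:port" entries with commas, as in the standard
    // mongodb:// connection string format.
    static String build(List<String> hostsAndPorts, String database) {
        return "mongodb://" + String.join(",", hostsAndPorts) + "/" + database;
    }

    public static void main(String[] args) {
        // Placeholder host names; replace them with your own servers.
        List<String> servers = List.of("mongo1.example.com:27017", "mongo2.example.com:27017");
        System.out.println(build(servers, "video_db"));
    }
}
```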

Server and Port

IP address and listening port of the database server.

Available when the Use replica set address check box is not selected.

Note that if you use the authentication mechanisms to connect to this MongoDB
database, you must enter the name, rather than the IP address, of the host of the database
server.

Database

Name of the database.

Use SSL connection

Select this check box to enable the SSL or TLS encrypted connection.

Then you need to use the tSetKeystore
component in the same Job to specify the encryption information.

Note that the SSL connection is available only for MongoDB version 2.4 and
later.

Set read preference

Select this check box and from the Read preference
drop-down list that is displayed, select the member to which you need to direct the read
operations.

If you leave this check box clear, the Job uses the default Read preference, that is to
say, uses the primary member in a replica set.

For further information, see MongoDB’s documentation about Replication and its Read
preferences.

Required authentication

Select this check box to enable the database authentication.

Among the mechanisms listed in the Authentication mechanism
drop-down list, NEGOTIATE is recommended if
you are not using Kerberos, because it automatically selects the authentication mechanism
best suited to the MongoDB version you are using.

For details about the other mechanisms in this list, see MongoDB Authentication from the MongoDB
documentation.

Set Authentication database

If the username to be used to connect to MongoDB has been created in a specific
Authentication database of MongoDB, select this check box to enter the name of this
Authentication database in the Authentication database
field that is displayed.

For further information about the MongoDB Authentication database, see User Authentication database.

Username and Password

DB user authentication data.

To enter the password, click the […] button next to the
password field, and then in the pop-up dialog box enter the password between double quotes
and click OK to save the settings.

Available when the Required authentication check box is selected.

If the mechanism you have selected from the Authentication mechanism drop-down list is Kerberos, you need to
fill in the User principal, the Realm and the KDC server fields instead of the
Username and the Password fields.
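As a rough illustration of how the Username, Password and Authentication database settings relate to one another, the sketch below assembles an equivalent MongoDB connection string, in which the authSource option names the database where the user was created. The class name AuthUri and all sample values are hypothetical; this is not how the component is documented to work internally, and the Kerberos case is not covered.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; it is not part of Talend.
class AuthUri {

    // Percent-encodes a credential so that characters such as '@' or ':'
    // cannot be mistaken for URI delimiters.
    static String encode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8).replace("+", "%20");
    }

    static String build(String user, String password, String host, int port,
                        String database, String authDatabase) {
        return "mongodb://" + encode(user) + ":" + encode(password) + "@"
                + host + ":" + port + "/" + database
                + "?authSource=" + authDatabase;
    }

    public static void main(String[] args) {
        // Sample values only; the user is assumed to have been created in "admin".
        System.out.println(build("talend", "p@ss:word", "mongo.example.com", 27017, "video_db", "admin"));
    }
}
```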

Bucket

Enter the name of the bucket you need to write files in. A bucket of
GridFS is similar to a folder.

Local Folder

Browse to, or enter the path to the folder in which the files to be copied and written
to GridFS are stored.

Use Perl5 Regex Expression as Filemask

Select this check box if you want to use Perl5 regular expressions in the Files field as file
filters. This is useful when the name of the file to be used contains special characters
such as parentheses.

For information about Perl5 regular expression syntax, see Perl5 Regular Expression Syntax.
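To show why a regular expression helps with file names that contain parentheses, here is a minimal sketch using java.util.regex, whose syntax is close to Perl5 for the constructs used here. The class name RegexFilemaskSketch and the sample file names are hypothetical, and whole-name matching is an assumption about how the mask is applied.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper for illustration; it is not part of Talend.
class RegexFilemaskSketch {

    // Keeps only the file names that match the regular expression in full.
    static List<String> select(List<String> fileNames, String regex) {
        Pattern pattern = Pattern.compile(regex);
        return fileNames.stream()
                .filter(name -> pattern.matcher(name).matches())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> names = List.of("video(1).mp4", "video1.mp4", "notes.txt");
        // The parentheses and the dot are escaped so that they are matched literally.
        System.out.println(select(names, "video\\(\\d+\\)\\.mp4"));
    }
}
```

With these sample names, only video(1).mp4 is selected, because the escaped parentheses must appear literally in the file name.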

Files

In the Files area, the fields to
be completed are:

File mask: enter the name of the file
to be selected from the local directory. Regular expressions can be
used.

New name: give a new name to
the loaded file.
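The following sketch illustrates how a File mask and New name pair could be interpreted: files in the local folder are selected with a glob-style mask, and an empty new name keeps the original name. The class name FilesTableSketch, the temporary folder and the sample files are hypothetical; the code does not connect to MongoDB and only computes the planned names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for illustration; it is not part of Talend.
class FilesTableSketch {

    // Maps each matching local file name to the name it would be stored under.
    static Map<String, String> plan(Path folder, String fileMask, String newName) {
        Map<String, String> uploads = new TreeMap<>();
        try (DirectoryStream<Path> matches = Files.newDirectoryStream(folder, fileMask)) {
            for (Path file : matches) {
                if (Files.isRegularFile(file)) {
                    String localName = file.getFileName().toString();
                    // An empty new name means the file keeps its original name.
                    uploads.put(localName, newName.isEmpty() ? localName : newName);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return uploads;
    }

    // Builds a throwaway folder with two sample files and applies the "*.mp4" mask.
    static Map<String, String> demo() {
        try {
            Path folder = Files.createTempDirectory("gridfs-put");
            Files.createFile(folder.resolve("custom_hadoop.mp4"));
            Files.createFile(folder.resolve("notes.txt"));
            return plan(folder, "*.mp4", "");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In this sample folder, only custom_hadoop.mp4 matches the mask, and it keeps its name because the new name is left empty.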

Advanced settings

tStatCatcher Statistics

Select this check box to collect the log data at the component
level.

No query timeout

Select this check box to prevent MongoDB servers from closing idle
cursors after 10 minutes of inactivity. In this
situation, an idle cursor will stay open until either the results of
this cursor are exhausted or you manually close it using the
cursor.close() method.

A cursor for MongoDB is a pointer to the result set of a query. By
default, that is to say, with this check box being clear, a MongoDB
server automatically stops idle cursors after a given inactivity period
to avoid excess memory use. For further information about MongoDB
cursors, see https://docs.mongodb.org/manual/core/cursors/.

Global Variables

NB_FILE: the number of files processed. This is an After
variable and it returns an integer.

ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable and it returns a string. This
variable functions only if the Die on error check box is
cleared, if the component has this check box.

A Flow variable functions during the execution of a component while an After variable
functions after the execution of the component.

To fill up a field or expression with a variable, press Ctrl + Space
to access the variable list and choose the variable to use from it.

For further information about variables, see Talend Studio User Guide.
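In Talend Jobs, such variables are typically read from the globalMap object with a key made of the component name and the variable name. The sketch below simulates globalMap with a plain HashMap so that it can run outside the Studio; the component name tMongoDBGridFSPut_1, the class name GlobalMapSketch and the stored value are assumptions for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-alone simulation for illustration; in a real Job, globalMap is provided by Talend.
class GlobalMapSketch {

    // Reads the After variables of a component assumed to be named tMongoDBGridFSPut_1.
    static String summary(Map<String, Object> globalMap) {
        Integer nbFile = (Integer) globalMap.get("tMongoDBGridFSPut_1_NB_FILE");
        String error = (String) globalMap.get("tMongoDBGridFSPut_1_ERROR_MESSAGE");
        if (error != null) {
            return "Upload failed: " + error;
        }
        return "Files uploaded: " + (nbFile == null ? 0 : nbFile);
    }

    public static void main(String[] args) {
        Map<String, Object> globalMap = new HashMap<>();
        // Simulated value, as if the component had processed one file.
        globalMap.put("tMongoDBGridFSPut_1_NB_FILE", 1);
        System.out.println(summary(globalMap));
    }
}
```

Because both variables are After variables, an expression of this kind is only meaningful in a component that runs after tMongoDBGridFSPut has finished.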

Usage

Usage rule

This component combines MongoDB GridFS connection and data extraction, so it is usually
used as a single-component subJob to copy data from a user-defined local
directory to GridFS.

It is often connected to the Job using an OnSubjobOk or OnComponentOk link, depending on the context.

Managing files using MongoDB GridFS

This scenario applies only to Talend products with Big Data.

In this scenario, the MongoDB GridFS components are used to create a Job to manage video
files in MongoDB GridFS.

For further information about the GridFS system of MongoDB, see When to use GridFS.

For demonstration purposes, only one video file, called custom_hadoop.mp4, is used; you can use any of your own video files to replicate
this scenario.

tMongoDBGridFSPut_1.png

Linking the components

  1. In the Integration perspective of the Studio, create an empty Job,
    named FS_video for example, from the Job Designs node in the Repository tree view.

    For further information about how to create a Job, see
    Talend Studio User Guide.
  2. In the workspace, enter the name of the component to be used and select this
    component from the list that appears. In this scenario, the components are
    tMongoDBConnection, tMongoDBGridFSPut, tMongoDBGridFSList, tMongoDBGridFSProperties, tFilterColumns, tLogRow, tMongoDBGridFSGet and tMongoDBGridFSDelete.
  3. Connect tMongoDBConnection to tMongoDBGridFSPut using the Trigger
    > On Subjob Ok link.
  4. Repeat this operation to connect tMongoDBGridFSPut to tMongoDBGridFSList, tMongoDBGridFSList to tMongoDBGridFSGet, and then tMongoDBGridFSGet to tMongoDBGridFSDelete.
  5. Connect tMongoDBGridFSList to tMongoDBGridFSProperties using the Row > Iterate link. This link allows tMongoDBGridFSList to send data to tMongoDBGridFSProperties iteratively.
  6. Connect tMongoDBGridFSProperties to tFilterColumns using the Row >
    Main link.
  7. Do the same to connect tFilterColumns to tLogRow.

Connecting to MongoDB

  1. Double-click tMongoDBConnection to open its
    Component view.

    tMongoDBGridFSPut_2.png

  2. From the DB version list, select the MongoDB
    version you are using.
  3. In the Server and Port
    fields, enter the host name or IP address and the listening port of the
    MongoDB server to connect to.

    If you use the host name of the MongoDB server, ensure that you have added the
    mapping between this host name and its IP address in the hosts file of the operating system in which the current Job is
    executed.
  4. In the Database field, enter the name of the database
    hosting GridFS. This database is created on the fly if it does not exist.

Copying data to MongoDB GridFS

  1. Double-click tMongoDBGridFSPut to open its
    Component view.

    tMongoDBGridFSPut_3.png

  2. Select the Use existing connection check box
    and from the Connection list, select the
    component in which the MongoDB connection to be used is defined.
  3. In the Bucket field, enter the bucket to be
    used to store files in GridFS. In this example, it is talend_channel/61.
  4. In the Local folder field, enter the path, or
    browse to the folder where the files to be uploaded to GridFS are stored. As
    explained previously, this folder contains a single video file called custom_hadoop.mp4.
  5. In the Files table, add one row by clicking
    the [+] button and in the Filemask column, enter *.mp4
    within the double quotation marks. This allows tMongoDBGridFSPut to copy all the files with the .mp4 extension from the local folder you have specified
    to the bucket to be used in GridFS.
  6. Leave the New name column empty, that is to
    say, leave the double quotation marks in this column as is, so as to keep the
    name of this video unchanged after being copied to GridFS.

Listing files stored in MongoDB GridFS

Iterating on the files

  1. Double-click tMongoDBGridFSList to open its
    Component view.

    tMongoDBGridFSPut_4.png

  2. Select the Use existing connection check box
    and from the Connection list, select the
    component in which the MongoDB connection to be used is defined.
  3. In the Bucket field, enter the bucket in
    which the files to be listed are stored. In this example, it is talend_channel/61.
  4. In the Query field, enter the query to select
    the files you want tMongoDBGridFSList to
    iterate on to generate different file lists. In this example, leave the default
    one in order to iterate on all of the files stored in this talend_channel/61 bucket.

    As explained previously, only one file, custom_hadoop.mp4, is expected.

Extracting file metadata

  1. Double-click tMongoDBGridFSProperties to open its
    Component view.

    tMongoDBGridFSPut_5.png

  2. Select the Use existing connection check box
    and from the Connection list, select the
    component in which the MongoDB connection to be used is defined.
  3. In the Bucket field, enter the bucket in
    which the files to be used are stored. In this example, it is talend_channel/61.
  4. From the Query type list, select the approach
    you want to use to select the files about which you need to extract the
    metadata. In this example, select Filename to
    use the filename attribute of each GridFS
    file for query.
  5. In the Filename field, press Ctrl + Space to display the variable list and choose the
    variable to be used. In this example, select tMongoDBGridFSList.CURRENT_FILENAME from the list. Then the
    expression to use the CURRENT_FILENAME
    variable is automatically added.

    This allows tMongoDBGridFSProperties to read
    each file name returned by tMongoDBGridFSList.

Filtering attributes

  1. Double-click tFilterColumns to open its
    Component view.

    tMongoDBGridFSPut_6.png

  2. Click the […] button next to Edit schema to open the schema editor.
  3. On the left side (the input side), select the column to be used and click the
    button shown in tMongoDBGridFSPut_7.png to move this column to the right side
    (the output side). In this example, move every column to the right side except
    the contentType column.

    Each column represents a file attribute and the pre-defined schema of tMongoDBGridFSProperties automatically contains these
    columns.

    tMongoDBGridFSPut_8.png

  4. Click OK to validate these changes and accept the
    propagation prompted by the pop-up dialog box.

Downloading files from MongoDB GridFS

  1. Double-click tMongoDBGridFSGet to open its
    Component view.

    tMongoDBGridFSPut_9.png

  2. Select the Use existing connection check box
    and from the Connection list, select the
    component in which the MongoDB connection to be used is defined.
  3. In the Bucket field, enter the bucket in
    which the files to be retrieved are stored. In this example, it is talend_channel/61.
  4. In the Local folder field, enter the path to
    the local folder in which you want to store the downloaded files. In this
    scenario, it is C:/tmp/output.
  5. Select the Use Document ID as output filename
    check box to rename each downloaded file using the value of its ObjectID attribute.

    Since a file in GridFS is identified by its ID rather than by its name, several
    files can share the same file name. When such files are downloaded into the
    same directory without being renamed, an exception is returned to alert that
    the file being downloaded already exists. To avoid this error, you can either
    select the Overwrite local files check box to replace the existing file with
    the latest downloaded one, or rename these files on the fly using their IDs.
    In this example, the strategy of renaming these files is adopted.
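The naming issue described in the last step can be reproduced locally without MongoDB. The sketch below assumes two GridFS entries that share a file name but have different IDs, and creates one empty local file per entry; the class name DownloadNamingSketch and both ID values are made up for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Stand-alone simulation for illustration; it does not connect to GridFS.
class DownloadNamingSketch {

    // Each row is {id, filename}; both IDs are made-up values.
    static final String[][] GRIDFS_FILES = {
        {"64c5f0a1e4b0c1a2b3c4d5e6", "custom_hadoop.mp4"},
        {"64c5f0a1e4b0c1a2b3c4d5e7", "custom_hadoop.mp4"}
    };

    // Creates one local file per entry and returns the names that were already taken.
    static List<String> download(boolean useIdAsName) {
        List<String> clashes = new ArrayList<>();
        try {
            Path outputDir = Files.createTempDirectory("gridfs-get");
            for (String[] file : GRIDFS_FILES) {
                String localName = useIdAsName ? file[0] : file[1];
                try {
                    Files.createFile(outputDir.resolve(localName));
                } catch (FileAlreadyExistsException e) {
                    clashes.add(localName);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return clashes;
    }

    public static void main(String[] args) {
        System.out.println("Clashes when keeping file names: " + download(false));
        System.out.println("Clashes when using IDs: " + download(true));
    }
}
```

Keeping the original names produces one clash on custom_hadoop.mp4, while naming the local files after their IDs produces none, which is the reason the scenario selects the Use Document ID as output filename check box.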

Removing files from MongoDB GridFS

  1. Double-click tMongoDBGridFSDelete to open its
    Component view.

    tMongoDBGridFSPut_10.png

  2. Select the Use existing connection check box
    and from the Connection list, select the
    component in which the MongoDB connection to be used is defined.
  3. In the Bucket field, enter the bucket in
    which the files to be deleted are stored. In this example, it is talend_channel/61.
  4. From the Query type list, select the approach
    you want to use to select the files to be deleted. In this example, select
    Filename to use the filename attribute of each GridFS file for query.
  5. In the Filename field, enter the name of the
    file to be deleted.

Executing the Job

Then you can run this Job.

The tLogRow component is used to present the execution
result of the Job.

  1. To configure the presentation mode, double-click the tLogRow component to open
    its Component view and, in the Mode area, select the Table (print values in
    cells of a table) radio button.
  2. Press F6 to run this Job.

Once done, the Run view is opened automatically, where
the metadata of the video custom_hadoop.mp4 in GridFS
is displayed.

tMongoDBGridFSPut_11.png

The downloaded file can be found in the directory C:/tmp/output, using its ID as its file name.

tMongoDBGridFSPut_12.png


Document sourced from Talend: https://help.talend.com