July 30, 2023

tMongoDBBulkLoad – Docs for ESB 7.x

tMongoDBBulkLoad

Imports data files in different formats (CSV, TSV or JSON) into the specified
MongoDB database so that the data can be further processed.

tMongoDBBulkLoad Standard properties

These properties are used to configure tMongoDBBulkLoad running in the Standard Job framework.

The Standard tMongoDBBulkLoad component belongs to the Big Data and the Databases NoSQL families.

The component in this framework is available in all Talend products with Big Data
and in Talend Data Fabric.

Basic settings

Schema and Edit schema

A schema is a row description. It defines the number of fields
(columns) to be processed and passed on to the next component. When you create a Spark
Job, avoid the reserved word line when naming the
fields.

Click Edit schema to make changes to the schema. If the current schema is of the Repository type, three options are available:

  • View schema: choose this
    option to view the schema only.

  • Change to built-in property:
    choose this option to change the schema to Built-in for local changes.

  • Update repository connection:
    choose this option to change the schema stored in the repository and decide whether
    to propagate the changes to all the Jobs upon completion. If you just want to
    propagate the changes to the current Job, you can select No upon completion and choose this schema metadata
    again in the Repository Content
    window.

MongoDB directory

Fill in this field with the MongoDB home directory.

Use local DB path

Select this check box to provide the information of the local database that you want to
use. MongoDB 3.0 and later versions do not support this feature.

  • Local DB path: type in the
    path to the local database specified when starting the MongoDB
    server.

Use replica set address

Select this check box to connect to a replica set.

  • Replica set name: specify the
    name of the replica set.

  • Replica address: specify
    multiple MongoDB database servers for failover as needed. Note
    that if you leave the replica host or replica port unspecified,
    their default values localhost and 27017 will be used.

Server

Hostname or IP address of the database server. Note that the default
value localhost will be used if the
server is not specified.

This field is available only when the Use replica set address check box is not
selected.

Port

Listening port of the database server. Note that the default value
27017 will be used if the port is
not specified.

This field is available only when the Use replica set address check box is not
selected.
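The following Python sketch shows how the default values apply by turning the connection settings into a single host string. It assumes the string is passed to mongoimport's --host option, which accepts either host:port or the replica set form setName/host1:port1,host2:port2. The function name host_argument and the example host names are hypothetical, and the component's generated code may do this differently.

```python
# Hypothetical helper (not Talend code): builds a mongoimport --host value
# from the Server/Port or replica set settings, applying the documented
# defaults localhost and 27017 when a value is left unspecified.
DEFAULT_HOST = "localhost"
DEFAULT_PORT = 27017


def host_argument(server=None, port=None, replica_set_name=None, replicas=None):
    """Return a --host value; replicas is a list of (host, port) tuples."""
    if replica_set_name is not None:
        # Use replica set address: every member falls back to the defaults.
        members = [f"{h or DEFAULT_HOST}:{p or DEFAULT_PORT}" for h, p in (replicas or [])]
        if not members:
            members = [f"{DEFAULT_HOST}:{DEFAULT_PORT}"]
        # Replica set form: <setName>/<host1:port1>,<host2:port2>
        return f"{replica_set_name}/{','.join(members)}"
    # Single-server form built from the Server and Port fields.
    return f"{server or DEFAULT_HOST}:{port or DEFAULT_PORT}"


print(host_argument())                  # localhost:27017
print(host_argument("db.example.com"))  # db.example.com:27017
print(host_argument(replica_set_name="rs0", replicas=[("h1", 27018), (None, None)]))
# rs0/h1:27018,localhost:27017
```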

Database

Type in the name of the database to import data to.

Collection

Type in the name of the collection to import data to.

Use SSL connection

Select this check box to enable the SSL or TLS encrypted connection.

Then you need to use the tSetKeystore
component in the same Job to specify the encryption information.

Note that the SSL connection is available only for MongoDB 2.4 and later.

Drop collection if exist

Select this check box to remove the collection if it already
exists.

Required authentication

Select this check box to enable the database authentication.

Among the mechanisms listed on the Authentication mechanism
drop-down list, NEGOTIATE is recommended if you are not using Kerberos, because it
automatically selects the authentication mechanism best suited to the MongoDB version
you are using.

For details about the other mechanisms in this list, see MongoDB Authentication from the MongoDB
documentation.

Set Authentication database

If the username to be used to connect to MongoDB has been created in a specific
Authentication database of MongoDB, select this check box to enter the name of this
Authentication database in the Authentication database
field that is displayed.

For further information about the MongoDB Authentication database, see User Authentication database.

Username and Password

DB user authentication data.

To enter the password, click the […] button next to the
password field, and then in the pop-up dialog box enter the password between double quotes
and click OK to save the settings.

Available when the Required authentication check box is selected.

If the security system you have selected from the Authentication mechanism drop-down list is Kerberos, you need to
fill in the User principal, Realm and KDC server fields instead of the Username and Password
fields.
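The sketch below shows one way the authentication settings could map to mongoimport options. It assumes that the component passes credentials through mongoimport's --username, --password and --authenticationDatabase options. auth_arguments is a hypothetical name. The sketch does not model the Kerberos fields (User principal, Realm, KDC server) or the choice of mechanism.

```python
# Hypothetical mapping (not Talend code) from the authentication settings
# to mongoimport options. Kerberos and mechanism selection are not modeled.
def auth_arguments(required_auth, username=None, password=None, auth_db=None):
    """Return the list of mongoimport authentication options to pass."""
    if not required_auth:
        return []  # Required authentication cleared: no credentials are sent
    args = ["--username", username, "--password", password]
    if auth_db:
        # Only when Set Authentication database is selected
        args += ["--authenticationDatabase", auth_db]
    return args


print(auth_arguments(True, "alice", "secret", auth_db="admin"))
```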

Data file

Type in the full path of the file from which the data will be imported
or click the […] button to browse to
the desired data file.

Make sure that the data file is in standard format. For
example, the fields in CSV files should be separated with commas.
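One way to catch a non-standard CSV file before loading it is to check that every row has the same number of fields as the first row. This Python sketch is a hypothetical pre-flight check and is not something the component performs. It uses the standard csv module, so quoted fields that contain commas are counted correctly.

```python
import csv
import io


def check_csv(text, delimiter=","):
    """Return the 1-based numbers of rows whose field count differs from row 1.

    Hypothetical pre-flight check; pass delimiter="\t" for TSV files.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if not rows:
        return []
    expected = len(rows[0])  # the first row sets the expected width
    return [i for i, row in enumerate(rows, start=1) if len(row) != expected]


print(check_csv("id,title\n1,a,b\n2,c\n"))  # [2]
```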

File type

Select the proper file type from the list. CSV, TSV and JSON are
supported.

The JSON file starts with an array

Select this check box to allow tMongoDBBulkLoad to read JSON files that start with an
array.

This check box appears when the File type you have selected is JSON.
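Whether to select this check box depends on the first non-whitespace character of the file. The hypothetical helper below is a minimal Python sketch that classifies a JSON data file on that basis. It assumes the file has already been decoded to text, which may begin with a byte order mark.

```python
def json_layout(text):
    """Classify JSON data as 'array', 'documents', or 'empty'.

    'array'     -> the file is one JSON array: select the check box
    'documents' -> top-level documents one after another: leave it cleared
    """
    stripped = text.lstrip("\ufeff \t\r\n")  # skip a BOM and leading whitespace
    if not stripped:
        return "empty"
    return "array" if stripped[0] == "[" else "documents"


print(json_layout('[{"id": "4"}, {"id": "5"}]'))  # array
```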

Action on data

Select the action that you want to perform on the data.

  • Insert: Insert the data
    into the database.

    Note that when inserting data from CSV or TSV
    files into the MongoDB database, you need to specify fields
    either by selecting the First line is header check box or by
    defining them in the schema.

  • Upsert: Insert the data if
    they do not exist or update the existing data.

    Note that when upserting data into the MongoDB
    database, you need to specify a list of fields for the query
    portion of the upsert operation.
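The difference between the two actions can be sketched on an in-memory list of documents. This Python sketch assumes mongoimport-style upsert behavior: a document whose upsert fields match an existing document replaces it, and any other document is inserted. The function and the sample titles are hypothetical.

```python
def upsert(collection, docs, upsert_fields):
    """Upsert each doc into an in-memory collection (a list of dicts).

    A document matches an existing one when all upsert_fields are equal;
    the match is replaced, otherwise the document is appended (inserted).
    """
    for doc in docs:
        key = tuple(doc.get(f) for f in upsert_fields)
        for i, existing in enumerate(collection):
            if tuple(existing.get(f) for f in upsert_fields) == key:
                collection[i] = doc  # match found: update (replace) it
                break
        else:
            collection.append(doc)  # no match: insert
    return collection


books = [{"id": "1", "title": "Computer Networks"}]
upsert(books, [{"id": "1", "title": "Computer Networks (revised)"},
               {"id": "3", "title": "Life of Pi"}], ["id"])
print(books)
```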

Upsert fields

Specify the fields used in the query portion of the upsert operation, that is, the fields used to match incoming data against existing documents.

This table is available when you select Upsert from the Action on data list.

First line is header

Select this check box to use the first line in CSV or TSV files as a
header.

This check box is available only when you select CSV or
TSV from the File type list.

Ignore blanks

Select this check box to ignore the empty fields in CSV or TSV
files.

This check box is available only when you select CSV or
TSV from the File type list.
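The sketch below shows how First line is header and Ignore blanks affect the documents built from CSV or TSV rows. Field names come from the first line or, if that check box is cleared, from a list that stands in for the schema. When blanks are ignored, empty values are left out of the document. This is a hypothetical Python illustration that keeps every value as a string. It does not model any type conversion mongoimport may perform.

```python
import csv
import io


def rows_to_documents(text, delimiter=",", header_line=True, fields=None,
                      ignore_blanks=False):
    """Turn CSV/TSV text into a list of dicts (hypothetical illustration)."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    if header_line:
        fields = next(reader, [])  # First line is header: names from line 1
    elif fields is None:
        raise ValueError("field names must come from the header or the schema")
    docs = []
    for row in reader:
        doc = {}
        for name, value in zip(fields, row):
            if ignore_blanks and value == "":
                continue  # Ignore blanks: omit the empty field entirely
            doc[name] = value
        docs.append(doc)
    return docs


print(rows_to_documents("id,title,author\n1,Life of Pi,\n", ignore_blanks=True))
```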

Print log

Select this check box to print logs.

Advanced settings

Additional arguments

Complete this table to use the additional arguments as required.

For example, you can use the argument "--jsonArray" to accept the
import of data expressed with multiple MongoDB documents within a single
JSON array. For more information about the additional arguments, go to
http://docs.mongodb.org/manual/reference/program/mongoimport/
and read the description of options.
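The following sketch assembles a mongoimport argument list to show how the settings and the Additional arguments table could combine. It makes two assumptions. First, the executable lives in the bin folder of the MongoDB directory. Second, the installed mongoimport supports --mode=upsert (older releases used --upsert). build_command is a hypothetical name, and Talend's generated command may differ. Additional arguments are appended verbatim at the end.

```python
import os
import shlex


def build_command(mongo_home, db, collection, data_file, file_type="csv",
                  header_line=False, ignore_blanks=False, drop=False,
                  upsert_fields=None, additional_args=()):
    """Assemble a mongoimport argument list (hypothetical sketch)."""
    exe = os.path.join(mongo_home, "bin", "mongoimport")  # assumed layout
    cmd = [exe, "--db", db, "--collection", collection,
           "--type", file_type, "--file", data_file]
    if file_type in ("csv", "tsv"):
        if header_line:
            cmd.append("--headerline")    # First line is header
        if ignore_blanks:
            cmd.append("--ignoreBlanks")  # Ignore blanks
    if drop:
        cmd.append("--drop")              # Drop collection if exist
    if upsert_fields:
        cmd += ["--mode=upsert", "--upsertFields=" + ",".join(upsert_fields)]
    cmd += list(additional_args)          # e.g. "--jsonArray", passed through as-is
    return cmd


print(shlex.join(build_command("D:/MongoDB", "bookstore", "books",
                               "D:/Input/books.json", file_type="json",
                               additional_args=["--jsonArray"])))
```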

tStatCatcher Statistics

Select this check box to collect the log data at a component level.

Global Variables

Global Variables

NB_LINE: the number of rows read by an input component or
transferred to an output component. This is an After variable and it returns an
integer.

ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable and it returns a string. This
variable functions only if the Die on error check box is
cleared, if the component has this check box.

A Flow variable functions during the execution of a component while an After variable
functions after the execution of the component.

To fill in a field or expression with a variable, press Ctrl + Space
to access the variable list and choose the variable to use from it.

For further information about variables, see the Talend Studio User Guide.

Usage

Usage rule

This component can be used together with the tMongoDBInput component to check if the data is imported
as expected.

Limitation

The MongoDB client tool needs to be installed on the machine where
Jobs using this component are executed.

Importing data into MongoDB database

This scenario applies only to Talend products with Big Data.

The following scenario describes a Job that first imports data from a CSV file into
the specified MongoDB collection, then reads data from the MongoDB collection to check
that the import succeeded, next imports data from a JSON file with the same data
structure into the same MongoDB collection, and finally displays the data from the
MongoDB collection to show that the data from the JSON file has also been imported.

tMongoDBBulkLoad_1.png

Dropping and linking the components

  1. Drop the following components from the Palette onto the design workspace: two tMongoDBBulkLoad components, two tMongoDBInput components, and two tLogRow components.
  2. Connect the first tMongoDBBulkLoad to the
    first tMongoDBInput using a Trigger > OnSubjobOk link.
  3. Connect the first tMongoDBInput to the
    first tLogRow using a Row > Main link.
  4. Repeat the two steps above to connect the second tMongoDBBulkLoad to the second tMongoDBInput, and the second tMongoDBInput to the second tLogRow.
  5. Connect the first tMongoDBInput to the
    second tMongoDBBulkLoad using a Trigger > OnSubjobOk link.
  6. Label the two tLogRow components to
    better identify the data displayed on the console.

Configuring the components

Importing data from a CSV file

  1. Double-click the first tMongoDBBulkLoad
    component to open its Basic settings view
    in the Component tab.

    tMongoDBBulkLoad_2.png

  2. In the MongoDB directory field, type in
    the MongoDB home directory. In this example, it is D:/MongoDB.
  3. In the Server and Port fields, fill in the information required for the
    connection to MongoDB. In this example, type in localhost and 27017.
  4. In the Database field,
    type in the database to import data to, bookstore in this
    example.
  5. In the Collection field, type in the collection to
    import data to, books in this example.
  6. Select the Drop collection if exist check
    box to remove the specified collection if it already exists.
  7. Browse to the desired data file from which you want to import data. In
    this example, it is D:/Input/books.csv,
    which is a standard CSV file containing four columns: id, title,
    author, and category.

        id,title,author,category
        1,Computer Networks,Larry Peterson,Computer Science
        2,David Copperfield,Charles Dickens,Language&Literature
        3,Life of Pi,Yann Martel,Language&Literature

  8. Select CSV from the File type list.
  9. Select Insert from the Action on data list.
  10. Select the First line is header check box to use the first line in
    the CSV file as a header.
  11. Select the Ignore blanks check box to ignore the blank
    fields (if any) in the CSV file.
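Selecting Drop collection if exist makes this subjob repeatable, because each run starts from an empty collection. The in-memory Python sketch below illustrates that effect with the books.csv sample from step 7. The dictionary standing in for the bookstore database and the function bulk_load_csv are hypothetical.

```python
import csv
import io

BOOKS_CSV = """id,title,author,category
1,Computer Networks,Larry Peterson,Computer Science
2,David Copperfield,Charles Dickens,Language&Literature
3,Life of Pi,Yann Martel,Language&Literature
"""


def bulk_load_csv(db, collection, text, drop_if_exists=True):
    """Simulate the first subjob on an in-memory database (dict of lists)."""
    if drop_if_exists:
        db.pop(collection, None)                     # Drop collection if exist
    docs = list(csv.DictReader(io.StringIO(text)))  # First line is header
    db.setdefault(collection, []).extend(docs)      # Action on data: Insert
    return len(docs)


bookstore = {}
bulk_load_csv(bookstore, "books", BOOKS_CSV)
bulk_load_csv(bookstore, "books", BOOKS_CSV)  # run the Job a second time
print(len(bookstore["books"]))  # 3, because the collection was dropped first
```

Without the drop (drop_if_exists=False), the second run would leave six documents, with each book duplicated.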

Validating that the CSV file is imported successfully

  1. Double-click the first tMongoDBInput component to open its Basic settings view in the Component tab.

    tMongoDBBulkLoad_3.png

  2. From the DB Version list, select the
    MongoDB version you are using.
  3. In the Server and Port fields, fill in the information required for the
    connection to MongoDB. In this example, type in localhost and 27017.
  4. In the Database field, type in the
    database from which the data will be read, bookstore in this example.
  5. In the Collection field, type in the
    collection from which the data will be read, books in this example.
  6. Click Edit schema to
    define the data structure to be read from the MongoDB collection.

    tMongoDBBulkLoad_4.png

  7. In the Mapping table, the Column field is automatically populated with the
    defined schema. You do not need to fill in the Parent node path column.
  8. Double-click the first tLogRow component to open its Basic settings
    view in the Component tab.

    tMongoDBBulkLoad_5.png

  9. In the Mode area, select Table (print values in cells of a table).
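Table mode lays out each row in bordered cells. The following Python sketch approximates that layout for documents read through the schema columns. format_table is hypothetical, and the exact characters tLogRow prints may differ. A field missing from a document, such as one skipped by Ignore blanks, appears as an empty cell.

```python
def format_table(columns, rows):
    """Render rows (dicts) as a bordered text table, one cell per column."""
    widths = [max([len(c)] + [len(str(r.get(c, ""))) for r in rows]) for c in columns]
    border = "+" + "+".join("-" * w for w in widths) + "+"

    def line(values):
        # Left-align each value and pad it to its column width.
        return "|" + "|".join(str(v).ljust(w) for v, w in zip(values, widths)) + "|"

    out = [border, line(columns), border]
    out += [line([r.get(c, "") for c in columns]) for r in rows]  # missing -> blank
    out.append(border)
    return "\n".join(out)


print(format_table(["id", "title"], [{"id": "1", "title": "Life of Pi"}]))
```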

Importing data from a JSON file

  1. Double-click the second tMongoDBBulkLoad component to open its Basic settings view in the Component tab.

    tMongoDBBulkLoad_6.png

  2. In the MongoDB directory field, type in
    the MongoDB home directory. In this example, it is D:/MongoDB.
  3. In the Server and Port fields, fill in the information required for the
    connection to MongoDB. In this example, type in localhost and 27017.
  4. In the Database field, type in the target
    database to import data to, bookstore in
    this example.
  5. In the Collection field, type in the target collection
    to import data to, books in this example.
  6. Browse to the desired data file from which you want to import data. Here,
    select books.json.

        {
          "id": "4",
          "title": "Les Miserables",
          "author": "Victor Hugo",
          "category": "Language&Literature"
        }
        {
          "id": "5",
          "title": "Advanced Database Systems",
          "author": "Carlo Zaniolo",
          "category": "Database"
        }

  7. Select JSON from the File type list.
  8. Select Insert from the Action on data list.
  9. Click the Advanced settings tab to define the additional arguments as
    needed.

    tMongoDBBulkLoad_7.png

    In this example, add the argument "--jsonArray" to accept the
    imported data within a single JSON array.
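The --jsonArray argument makes mongoimport treat the whole file as one JSON array of documents. If a data file instead holds documents one after another, one way to prepare it is to parse them one at a time and write them out as an array. This Python sketch relies on json.JSONDecoder.raw_decode, which returns a parsed value together with the index where it ended. documents_to_array is a hypothetical name.

```python
import json


def documents_to_array(text):
    """Parse back-to-back JSON documents and return them as one JSON array."""
    decoder = json.JSONDecoder()
    docs, pos = [], 0
    while True:
        # raw_decode does not skip leading whitespace, so skip it here.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos == len(text):
            break
        doc, pos = decoder.raw_decode(text, pos)  # one document, new offset
        docs.append(doc)
    return json.dumps(docs, indent=2)


books_json = '{"id": "4", "title": "Les Miserables"}\n{"id": "5", "title": "Advanced Database Systems"}'
print(documents_to_array(books_json))
```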

Validating that the JSON file is imported successfully

  1. Repeat Step 1 through Step 7 described in the procedure Validating that the CSV file is imported successfully to configure the second tMongoDBInput component.

    tMongoDBBulkLoad_8.png

  2. Repeat Steps 8 and 9 described in the procedure Validating that the CSV file is imported successfully to configure the second tLogRow component.

Saving and executing the Job

  1. Press Ctrl + S to save the Job.
  2. Execute the Job by pressing F6 or clicking Run on the Run tab.

tMongoDBBulkLoad_9.png

The data from the collection books in the MongoDB database bookstore is
displayed on the console, which contains the data imported from both the CSV file
books.csv and the JSON file books.json.


Document retrieved from Talend: https://help.talend.com