tMongoDBOutput
Executes the action defined on the collection in the MongoDB database.
tMongoDBOutput inserts, updates, upserts or deletes documents in a MongoDB database collection based on the incoming flow from the preceding component in the Job.
Depending on the Talend product you are using, this component can be used in one, some or all of the following Job frameworks:
- Standard: see tMongoDBOutput Standard properties. The component in this framework is available in all Talend products with Big Data and in Talend Data Fabric.
- Spark Batch: see tMongoDBOutput properties for Apache Spark Batch. The component in this framework is available in all subscription-based Talend products with Big Data and in Talend Data Fabric.
- Spark Streaming: see tMongoDBOutput properties for Apache Spark Streaming. This component is available in Talend Real Time Big Data Platform and Talend Data Fabric.
tMongoDBOutput Standard properties
These properties are used to configure tMongoDBOutput running in the Standard Job framework.
The Standard
tMongoDBOutput component belongs to the Big Data and the Databases NoSQL families.
The component in this framework is available in all Talend products with Big Data
and in Talend Data Fabric.
Basic settings
Use existing connection
Select this check box and, in the Component List, click the relevant connection component to reuse the connection details you already defined.
DB Version
List of the database versions. Available when the Use existing connection check box is not selected.
Use replica set address
Select this check box to show the Replica address table. In the Replica address table, you can define multiple replica set servers to connect to. Available when the Use existing connection check box is not selected.
Server and Port
IP address and listening port of the database server. Available when the Use existing connection check box is not selected.
Database
Name of the database.
Use SSL connection
Select this check box to enable the SSL or TLS encrypted connection. Then you need to use the tSetKeystore component in the same Job to specify the encryption information. Note that the SSL connection is available only for version 2.4+ of MongoDB.
Set write concern
Select this check box to set the level of acknowledgement requested from MongoDB for write operations. For further information, see the related MongoDB documentation at http://docs.mongodb.org/manual/core/write-concern/.
Bulk write
Select this check box to insert, update or remove data in bulk. Note that this feature is available only when the version of MongoDB you are using is 2.6+. Then you need to select Ordered or Unordered to define how the MongoDB database processes the data sent by the Studio. In the Bulk write size field, enter the size of each batch of data to be processed.
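The Ordered/Unordered choice mirrors MongoDB's bulk-write semantics: an ordered bulk stops at the first failing operation, while an unordered bulk attempts every operation regardless of earlier failures. The following is a minimal, hypothetical pure-Python sketch of that difference (the collection is a plain dict keyed by _id, not a MongoDB driver call):

```python
def bulk_insert(collection, docs, ordered=True):
    """Insert docs into a dict keyed by _id; duplicate keys count as errors.

    Returns (inserted_count, error_ids). Ordered mode stops at the first
    error; unordered mode records it and keeps going -- the same contract
    MongoDB applies to ordered vs unordered bulk writes.
    """
    inserted, errors = 0, []
    for doc in docs:
        if doc["_id"] in collection:
            errors.append(doc["_id"])
            if ordered:
                break          # ordered: abort on first failure
            continue           # unordered: skip the failure and continue
        collection[doc["_id"]] = doc
        inserted += 1
    return inserted, errors

store = {2: {"_id": 2}}        # _id 2 already exists
docs = [{"_id": 1}, {"_id": 2}, {"_id": 3}]
print(bulk_insert(dict(store), docs, ordered=True))    # (1, [2]): stops before _id 3
print(bulk_insert(dict(store), docs, ordered=False))   # (2, [2]): _id 3 still inserted
```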
Required authentication
Select this check box to enable the database authentication. From the Authentication mechanism drop-down list, select the mechanism to be used. For details about the mechanisms in this list, see MongoDB Authentication in the MongoDB documentation.
Set Authentication database
If the username to be used to connect to MongoDB has been created in a specific Authentication database of MongoDB, select this check box to enter the name of this Authentication database. For further information about the MongoDB Authentication database, see User Authentication database.
Username and Password
DB user authentication data. To enter the password, click the […] button next to the Password field, enter the password between double quotes in the dialog box that opens, and click OK. Available when the Required authentication check box is selected.
If the security system you have selected from the Authentication mechanism drop-down list is Kerberos, you need to provide the Kerberos connection details instead of the username and the password.
Collection
Name of the collection in the MongoDB database.
Drop collection if exist
Select this check box to drop the collection if it already exists.
Action on data
The following operations are available:
- Insert: inserts documents.
- Update: updates existing documents.
- Upsert: updates a document if it exists, otherwise inserts it.
- Delete: deletes documents.
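The four actions can be pictured against an in-memory collection. This hypothetical pure-Python sketch (a dict keyed by id, not the MongoDB driver) shows how insert, update, upsert and delete differ:

```python
def apply_action(collection, action, doc, key="id"):
    """Apply a tMongoDBOutput-style action to a dict keyed by doc[key]."""
    k = doc[key]
    if action == "insert":
        collection[k] = doc                      # always writes a new document
    elif action == "update":
        if k in collection:                      # only touches existing documents
            collection[k].update(doc)
    elif action == "upsert":
        if k in collection:
            collection[k].update(doc)            # update when matched...
        else:
            collection[k] = doc                  # ...insert otherwise
    elif action == "delete":
        collection.pop(k, None)                  # removes the matching document
    return collection

col = {}
apply_action(col, "insert", {"id": 1, "author": "Andy"})
apply_action(col, "upsert", {"id": 1, "author": "Anderson"})  # matched: updated
apply_action(col, "upsert", {"id": 2, "author": "Andy"})      # no match: inserted
apply_action(col, "delete", {"id": 1})
print(sorted(col))   # [2]
```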
Schema and Edit schema
A schema is a row description. It defines the number of fields (columns) to be processed and passed on to the next component. Click Edit schema to make changes to the schema.
Click Sync columns to retrieve the schema from the previous component connected in the Job.
Built-In: You create and store the schema locally for this component only.
Repository: You have already created the schema and stored it in the Repository. You can reuse it in various projects and Job designs. When the schema to be reused has default values that are integers or functions, ensure that these default values are not enclosed within quotation marks.
Mapping
Each column of the schema defined for this component represents a field of the documents to be written to the MongoDB database. In this table, enter the parent node path of each field. For example, in a document reading as follows:
{ "_id": 1, "person": { "first": "Andy", "last": "Smith" } }
the first and the last fields have person as their parent node but the _id field does not have any parent node. Once completed, the Mapping table maps the columns first and last to the parent node person and leaves the parent node of _id empty.
Not available when the Generate JSON Document check box in the Advanced settings view is selected.
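The Mapping table effectively turns a flat schema row into a nested document. A minimal sketch of that transformation (a hypothetical helper, not Talend code): given a parent node path per column, build the document the component would write:

```python
def nest_row(row, parent_nodes):
    """Build a nested document from a flat row.

    parent_nodes maps column -> parent node path ("" for top level),
    mirroring the parent node column of the Mapping table.
    """
    doc = {}
    for column, value in row.items():
        target = doc
        path = parent_nodes.get(column, "")
        if path:
            for node in path.split("."):     # walk/create the parent nodes
                target = target.setdefault(node, {})
        target[column] = value
    return doc

row = {"_id": 1, "first": "Andy", "last": "Smith"}
mapping = {"_id": "", "first": "person", "last": "person"}
print(nest_row(row, mapping))
# {'_id': 1, 'person': {'first': 'Andy', 'last': 'Smith'}}
```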
Die on error
This check box is cleared by default, meaning to skip the row on error and to complete the process for error-free rows.
Advanced settings
Generate JSON Document
Select this check box for JSON configuration:
- Configure JSON Tree: click the […] button to open the interface for configuring the JSON tree.
- Group by: click the [+] button to add lines and choose the input columns by which to group the documents.
- Remove root node: select this check box to remove the root node from the generated JSON document.
- Data node and Query node (available for update and upsert actions): enter the names of the data node and the query node of the JSON tree.
Warning: These nodes are mandatory for update and upsert actions. They are intended to enable the update and upsert actions though will not be stored in the database.
No query timeout
Select this check box to prevent MongoDB servers from stopping idle cursors after 10 minutes of inactivity. A cursor for MongoDB is a pointer to the result set of a query. By default, MongoDB servers close idle cursors after 10 minutes of inactivity.
tStatCatcher Statistics
Select this check box to collect the log data at the component level.
Global Variables
NB_LINE: the number of rows read by an input component or transferred to an output component. This is an After variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the component when an error occurs. This is an After variable and it returns a string.
A Flow variable functions during the execution of a component while an After variable functions after the execution of the component.
To fill up a field or expression with a variable, press Ctrl + Space to access the variable list and choose the variable to use from it.
For further information about variables, see Talend Studio User Guide.
Usage
Usage rule
tMongoDBOutput executes the action defined on the collection in the MongoDB database based on the incoming flow from the preceding component in the Job.
Creating a collection and writing data to it
This scenario applies only to Talend products with Big Data.
This scenario creates the collection blog and
writes post data to it.
Linking the components
- Drop tMongoDBConnection, tFixedFlowInput, tMongoDBOutput, tMongoDBClose, tMongoDBInput and tLogRow onto the workspace.
- Rename tFixedFlowInput as blog_post_data, tMongoDBOutput as write_data_to_collection, tMongoDBInput as read_data_from_collection and tLogRow as show_data_from_collection.
- Link tMongoDBConnection to tFixedFlowInput using the OnSubjobOk trigger.
- Link tFixedFlowInput to tMongoDBOutput using a Row > Main connection.
- Link tFixedFlowInput to tMongoDBInput using the OnSubjobOk trigger.
- Link tMongoDBInput to tMongoDBClose using the OnSubjobOk trigger.
- Link tMongoDBInput to tLogRow using a Row > Main connection.
Configuring the components
- Double-click tMongoDBConnection to open its Basic settings view.
- From the DB Version list, select the MongoDB version you are using.
- In the Server and Port fields, enter the connection details. In the Database field, enter the name of the MongoDB database.
- Double-click tFixedFlowInput to open its Basic settings view. Select Use Inline Content (delimited file) in the Mode area. In the Content field, enter the data to write to the MongoDB database, for example:
1;Andy;Open Source Outlook;Open Source,Talend;Talend, the leader of the open source world...
3;Andy;ELT Overview;ELT,Talend;Talend, the big name in the ELT circle...
2;Andy;Data Integration Overview;Data Integration,Talend;Talend, the leading player in the DI field...
- Double-click tMongoDBOutput to open its Basic settings view. Select the Use existing connection and Drop collection if exist check boxes. In the Collection field, enter the name of the collection, namely blog.
- Click the […] button next to Edit schema to open the schema editor.
- Click the [+] button to add five columns in the right part, namely id, author, title, keywords and contents, with the type Integer for id and String for the other columns. Copy all the columns to the input table and click OK to close the editor.
- The columns now appear in the left part of the Mapping area. For the columns author, title, keywords and contents, enter their parent node post. By doing so, those nodes reside under the node post in the MongoDB collection.
- Double-click tMongoDBInput to open its Basic settings view. Select the Use existing connection check box. In the Collection field, enter the name of the collection, namely blog.
- Click the […] button next to Edit schema to open the schema editor.
- Click the [+] button to add five columns, namely id, author, title, keywords and contents, with the type Integer for id and String for the other columns. Click OK to close the editor.
- The columns now appear in the left part of the Mapping area. For the columns author, title, keywords and contents, enter their parent node post so that the data can be retrieved from the correct positions.
- In the Sort by area, click the [+] button to add one line and enter id under Column. Select asc from the Order asc or desc? column to the right of the id column. This way, the retrieved records appear in ascending order of the id column.
Executing the Job
- Press Ctrl+S to save the Job.
- Press F6 to run the Job.
- In the MongoDB command line client, switch to the database talend and read data from the collection blog. You can find that author, title, keywords and contents all reside under the node post. Meanwhile, the records are stored in the same order as the source input.
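The effect of this Job can be pictured in a few lines of plain Python, with no MongoDB involved (the collection is simply a list of dicts): parse the delimited content, nest the non-id fields under post as the Mapping table specifies, then sort by id as tMongoDBInput does:

```python
content = """\
1;Andy;Open Source Outlook;Open Source,Talend;Talend, the leader of the open source world...
3;Andy;ELT Overview;ELT,Talend;Talend, the big name in the ELT circle...
2;Andy;Data Integration Overview;Data Integration,Talend;Talend, the leading player in the DI field..."""

columns = ["id", "author", "title", "keywords", "contents"]
collection = []
for line in content.splitlines():
    row = dict(zip(columns, line.split(";")))
    # every column except id has post as its parent node
    collection.append({"id": int(row["id"]),
                       "post": {c: row[c] for c in columns[1:]}})

# tMongoDBInput with Sort by: id asc
for doc in sorted(collection, key=lambda d: d["id"]):
    print(doc["id"], doc["post"]["title"])
# 1 Open Source Outlook
# 2 Data Integration Overview
# 3 ELT Overview
```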
Upserting records in a collection
This scenario applies only to Talend products with Big Data.
This scenario upserts records in the collection blog: an existing record has its author changed and a new record is added. Before the upsert, the collection blog contains the following records:
1;Andy;Open Source Outlook;Open Source,Talend;Talend, the leader of the open source world...
2;Andy;Data Integration Overview;Data Integration,Talend;Talend, the leading player in the DI field...
3;Andy;ELT Overview;ELT,Talend;Talend, the big name in the ELT circle...
Such records can be inserted into the database by following the instructions in Creating a collection and writing data to it.
Linking the components
- Drop tMongoDBConnection, tFixedFlowInput, tMongoDBOutput, tMongoDBClose, tMongoDBInput and tLogRow from the Palette onto the design workspace.
- Rename tFixedFlowInput as blog_post_data, tMongoDBOutput as write_data_to_collection, tMongoDBInput as read_data_from_collection and tLogRow as show_data_from_collection.
- Link tMongoDBConnection to tFixedFlowInput using the OnSubjobOk trigger.
- Link tFixedFlowInput to tMongoDBOutput using a Row > Main connection.
- Link tFixedFlowInput to tMongoDBInput using the OnSubjobOk trigger.
- Link tMongoDBInput to tMongoDBClose using the OnSubjobOk trigger.
- Link tMongoDBInput to tLogRow using a Row > Main connection.
Configuring the components
- Double-click tMongoDBConnection to open its Basic settings view.
- From the DB Version list, select the MongoDB version you are using.
- In the Server and Port fields, enter the connection details. In the Database field, enter the name of the MongoDB database.
- Double-click tFixedFlowInput to open its Basic settings view. Select Use Inline Content (delimited file) in the Mode area. In the Content field, enter the data for upserting the MongoDB database, for example:
1;Andy;Open Source Outlook;Open Source,Talend;Talend, the leader of the open source world...
2;Andy;Data Integration Overview;Data Integration,Talend;Talend, the leading player in the DI field...
3;Anderson;ELT Overview;ELT,Talend;Talend, the big name in the ELT circle...
4;Andy;Big Data Bang;Big Data,Talend;Talend, the driving force for Big Data applications...
As shown above, the 3rd record has its author changed and the 4th record is new.
- Double-click tMongoDBOutput to open its Basic settings view. Select the Use existing connection and Die on error check boxes. In the Collection field, enter the name of the collection, namely blog. Select Upsert from the Action on data list.
- Click the […] button next to Edit schema to open the schema editor.
- Click the [+] button to add five columns in the right part, namely id, author, title, keywords and contents, with the type Integer for id and String for the other columns. Copy all the columns to the input table and click OK to close the editor.
- In the Advanced settings view, select the Generate JSON Document check box. Select the Remove root node check box. In the Data node and Query node fields, enter "data" and "query" respectively.
- Click the […] button next to Configure JSON Tree to open the configuration interface.
- Right-click the node rootTag and select Add Sub-element from the contextual menu. In the dialog box that appears, type in data for the Data node and click OK to close the window. Repeat this operation to define query as the Query node. Then right-click the node data and select Set As Loop Element from the contextual menu.
Warning: These nodes are mandatory for update and upsert actions. They are intended to enable the update and upsert actions though will not be stored in the database.
- Select all the columns under the Schema list and drop them to the data node. In the window that appears, select Create as sub-element of target node and click OK to close the window. Repeat this operation to drop the id column from the Schema list under the query node.
- Right-click the node id under data and select Add Attribute from the contextual menu. In the dialog box that appears, type in type as the attribute name and click OK to close the window. Right-click the node @type under id and select Set A Fix Value from the contextual menu. In the dialog box that appears, type in integer as the attribute value, ensuring the id values are stored as integers in the database. Click OK to close the window. Repeat this operation to set this attribute for the id node under query. Finally, click OK to close the JSON Tree configuration interface.
- Double-click tMongoDBInput to open its Basic settings view. Select the Use existing connection check box. In the Collection field, enter the name of the collection, namely blog.
- Click the […] button next to Edit schema to open the schema editor. Click the [+] button to add five columns, namely id, author, title, keywords and contents, with the type Integer for id and String for the other columns. Click OK to close the editor.
- The columns now appear in the left part of the Mapping area. For the columns author, title, keywords and contents, enter their parent node post so that the data can be retrieved from the correct positions.
- Double-click tLogRow to open its Basic settings view. In the Mode area, select Table (print values in cells of a table) for better display.
Executing the Job
- Press Ctrl+S to save the Job.
- Press F6 to run the Job.
In the execution result, the 3rd record has its author updated and the 4th record is inserted.
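The upsert behaviour of this Job can be sketched without MongoDB: for each incoming row, match on id (the query node); update the document when a match exists, insert it otherwise. A hypothetical pure-Python rendering:

```python
# The existing collection, keyed by id (a stand-in for MongoDB documents)
existing = {
    1: {"id": 1, "author": "Andy", "title": "Open Source Outlook"},
    2: {"id": 2, "author": "Andy", "title": "Data Integration Overview"},
    3: {"id": 3, "author": "Andy", "title": "ELT Overview"},
}

# The incoming flow: record 3 has a new author, record 4 is new
incoming = [
    {"id": 3, "author": "Anderson", "title": "ELT Overview"},
    {"id": 4, "author": "Andy", "title": "Big Data Bang"},
]

for row in incoming:
    query = {"id": row["id"]}                 # the query node: match on id
    if query["id"] in existing:
        existing[query["id"]].update(row)     # matched: update in place
    else:
        existing[query["id"]] = dict(row)     # no match: insert

print(existing[3]["author"], "|", existing[4]["title"])
# Anderson | Big Data Bang
```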
tMongoDBOutput properties for Apache Spark Batch
These properties are used to configure tMongoDBOutput running in the Spark Batch Job framework.
The Spark Batch
tMongoDBOutput component belongs to the Databases family.
The component in this framework is available in all subscription-based Talend products with Big Data
and Talend Data Fabric.
Basic settings
Property type
Either Built-In or Repository.
Built-In: No property data stored centrally.
Repository: Select the repository file where the properties are stored.
MongoDB configuration
Select this check box and, in the Component List, click the relevant connection component to reuse the connection details you already defined.
Schema and Edit schema
A schema is a row description. It defines the number of fields (columns) to be processed and passed on to the next component. Click Edit schema to make changes to the schema.
If a column in the database is a JSON document and you need to read the entire document, put an asterisk (*) in the DB column column, without quotation marks around it.
Collection
Enter the name of the collection to be used. A MongoDB collection is the equivalent of an RDBMS table and contains documents.
Set write concern
Select this check box to set the level of acknowledgement requested from MongoDB for write operations. For further information, see the related MongoDB documentation at http://docs.mongodb.org/manual/core/write-concern/.
Action on data
The following operations are available:
- Insert: inserts documents.
- Update: updates existing documents.
- Upsert: updates a document if it exists, otherwise inserts it.
- Delete: deletes documents.
Mapping
Each column of the schema defined for this component represents a field of the documents to be written to the MongoDB database. In this table, enter the parent node path of each field. For example, in a document reading as follows:
{ "_id": 1, "person": { "first": "Andy", "last": "Smith" } }
the first and the last fields have person as their parent node but the _id field does not have any parent node. Once completed, the Mapping table maps the columns first and last to the parent node person and leaves the parent node of _id empty.
Advanced settings
Advanced Hadoop MongoDB properties
Add properties to define extra operations you need tMongoDBOutput to perform when writing data. The available properties are listed and explained in the documentation of the MongoDB Connector for Hadoop.
Usage
Usage rule
This component is used as an end component and requires an input link.
This component should use a tMongoDBConfiguration component present in the same Job to connect to a MongoDB database.
This component, along with the Spark Batch component Palette it belongs to, appears only when you are creating a Spark Batch Job.
Note that in this documentation, unless otherwise explicitly stated, a scenario presents only Standard Jobs, that is to say traditional Talend data integration Jobs.
Spark Connection
In the Spark Configuration tab in the Run view, define the connection to a given Spark cluster for the whole Job. In addition, since the Job expects its dependent jar files for execution, you must specify the directory in the file system to which these jar files are transferred so that Spark can access these files.
This connection is effective on a per-Job basis.
Related scenarios
For a scenario in which tMongoDBOutput is used, see Writing and reading data from MongoDB using a Spark Batch Job.
tMongoDBOutput properties for Apache Spark Streaming
These properties are used to configure tMongoDBOutput running in the Spark Streaming Job framework.
The Spark Streaming
tMongoDBOutput component belongs to the Databases family.
This component is available in Talend Real Time Big Data Platform and Talend Data Fabric.
Basic settings
Property type
Either Built-In or Repository.
Built-In: No property data stored centrally.
Repository: Select the repository file where the properties are stored.
MongoDB configuration
Select this check box and, in the Component List, click the relevant connection component to reuse the connection details you already defined.
Schema and Edit schema
A schema is a row description. It defines the number of fields (columns) to be processed and passed on to the next component. Click Edit schema to make changes to the schema.
If a column in the database is a JSON document and you need to read the entire document, put an asterisk (*) in the DB column column, without quotation marks around it.
Collection
Enter the name of the collection to be used. A MongoDB collection is the equivalent of an RDBMS table and contains documents.
Set write concern
Select this check box to set the level of acknowledgement requested from MongoDB for write operations. For further information, see the related MongoDB documentation at http://docs.mongodb.org/manual/core/write-concern/.
Action on data
The following operations are available:
- Insert: inserts documents.
- Update: updates existing documents.
- Upsert: updates a document if it exists, otherwise inserts it.
- Delete: deletes documents.
Mapping
Each column of the schema defined for this component represents a field of the documents to be written to the MongoDB database. In this table, enter the parent node path of each field. For example, in a document reading as follows:
{ "_id": 1, "person": { "first": "Andy", "last": "Smith" } }
the first and the last fields have person as their parent node but the _id field does not have any parent node. Once completed, the Mapping table maps the columns first and last to the parent node person and leaves the parent node of _id empty.
Advanced settings
Advanced Hadoop MongoDB properties
Add properties to define extra operations you need tMongoDBOutput to perform when writing data. The available properties are listed and explained in the documentation of the MongoDB Connector for Hadoop.
Usage
Usage rule
This component is used as an end component and requires an input link.
This component should use a tMongoDBConfiguration component present in the same Job to connect to a MongoDB database.
This component, along with the Spark Streaming component Palette it belongs to, appears only when you are creating a Spark Streaming Job.
Note that in this documentation, unless otherwise explicitly stated, a scenario presents only Standard Jobs, that is to say traditional Talend data integration Jobs.
Spark Connection
In the Spark Configuration tab in the Run view, define the connection to a given Spark cluster for the whole Job. In addition, since the Job expects its dependent jar files for execution, you must specify the directory in the file system to which these jar files are transferred so that Spark can access these files.
This connection is effective on a per-Job basis.
Related scenarios
For a scenario in which tMongoDBOutput is used, see Reading and writing data in MongoDB using a Spark Streaming Job.