July 30, 2023

tNeo4jOutput – Docs for ESB 7.x

tNeo4jOutput

Receives data from the preceding component and writes the data into Neo4j.

tNeo4jOutput is used to write data
into a Neo4j database, and/or update or delete entries in the database based on the index
defined.

tNeo4jOutput Standard properties

These properties are used to configure tNeo4jOutput running in the Standard Job framework.

The Standard
tNeo4jOutput component belongs to the Big Data and the Databases NoSQL families.

The component in this framework is available in all Talend products with Big Data
and in Talend Data Fabric.

Basic settings

Use an existing connection

Select this check box and in the Component List click the relevant connection component to
reuse the connection details you already defined.

DB version

Select the Neo4j version you are using.

This component does not support Neo4j version V3.2.X. Do not reuse the connection to V3.2.X defined in a tNeo4jConnection component.

Do not use a 2.X.X version and a 3.X.X version in the same Job. Otherwise,
class conflict issues occur.

Neo4j version 2.X.X requires Java 7 or higher, but it
supports advanced features such as node labels.

This list is not shown if the Use an
existing connection
check box is selected.

Upon selecting a database version, you will be
prompted to install the corresponding database driver JAR files if not
yet installed. You can find more details
about how to install external modules in Talend Help Center (https://help.talend.com).

Remote server

Select this check box if you use a Neo4j remote server, and specify the root URL in the Server URL field.

  • Set username: this check box
    is available when you have selected the Use a remote server check box and the Neo4j version
    you are using is earlier than 2.2. If the remote Neo4j server
    you want to connect to does not require user credentials, leave
    it clear.

  • Username and Password: enter the authentication
    information to connect to the remote Neo4j server to be used.
    Since Neo4j 2.2, user credentials are always required.

This check box appears only if you do not select the Use an existing connection check box.

Database path

If you use Neo4j in embedded mode, specify the directory
to hold your data files. If the specified directory does not exist, it
will be created.

This field appears only if you do not select the
Use an existing connection
check box or the Remote server
check box.

Shutdown after job

Select this check box to shut down the Neo4j database connection when no more
operations on Neo4j are going to be performed after the current
component.

Alternatively, you can use tNeo4jClose to shut down the
database.

This avoids errors such as “Id file not properly shutdown” at next execution
of Jobs involving Neo4j.

This check box is available only if the Use an existing
connection
check box is not selected.

Mapping

Click the […]
button or double-click the component on the design workspace to open the
indexes and relationships mapping editor. Use it to index nodes or create
relationships during node insertion.

  • Select the Auto
    indexed
    check box for a column to automatically
    index nodes with this property.
  • Index
    creation
    : With a column selected, click the
    [+] button to create as
    many indexes as you want on nodes with the property corresponding to
    the selected column.

    • Name:
      Specify an index name in double quotes.
    • Key: Specify an index
      key in double quotes.
    • Value (empty for current
      row)
      : Specify an index value in double
      quotes. If you leave this field empty, the default value of
      the index added on each node will be the value of this
      property of the current node.
    • Unique: Select this check box if you want
      the defined index to be created only once within the graph,
      rather than on each node.
  • Relationship
    creation
    : With a column selected, click the
    [+] button to create as
    many relationships as you want for nodes with the property
    corresponding to the selected column.

    • Type:
      Specify a relationship type in double quotes.
    • Direction: Select a relationship direction,
      between Outgoing and
      Incoming.
    • Index
      name
      : Specify an index name for the
      relationship, in double quotes.
    • Index
      key
      : Specify an index key for the
      relationship, in double quotes.
    • Value (empty for current row):
      Specify an index value for the relationship, in double
      quotes. If you leave this field empty, the default value of
      the index added on the relationship will be the value of
      this property of the current node.
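
The fallback rule for the Value (empty for current row) field can be sketched in Java as follows. This is only an illustration of the rule described above, not the component's actual code; the class and method names are hypothetical.

```java
// Hypothetical helper illustrating the "Value (empty for current row)" rule.
class IndexValueResolver {
    /**
     * Returns the value to store in the index: the configured value when one
     * is given, otherwise the value of the mapped column in the current row.
     */
    static Object resolve(String configuredValue, Object currentRowValue) {
        if (configuredValue == null || configuredValue.isEmpty()) {
            return currentRowValue; // field left empty: use the row's own value
        }
        return configuredValue;     // field filled in: same value for every row
    }

    public static void main(String[] args) {
        System.out.println(resolve("", "Smith"));    // value taken from the row
        System.out.println(resolve("VIP", "Smith")); // configured value wins
    }
}
```

Leaving the field empty is what makes the index useful for later lookups by column value, for example when a data action other than Insert queries the same index.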

Use label (Neo4j > 2.0)

Select this check box to create nodes with a label. Enter
your label name in the Label name
field.

This check box is not shown if Neo4J 1.X.X is selected from the DB Version list or Delete is selected from the Data action list.

Note that this option works only with Neo4j 2.0 onwards
and Java 7.

Data action

On the data of the node, you can perform:

  • Insert: Add
    a new node to the database.
  • Update: Make
    changes to existing entries.
  • Update or
    insert:
    Search for the node to update using an index and
    make changes. If the node does not exist, a new node is
    inserted.
  • Delete:
    Remove nodes fetched by the index according to the input flow.
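
The difference between the four data actions can be illustrated with a small in-memory model. The sketch below is an assumption-laden simplification: a plain Java map stands in for the Neo4j index, it keeps at most one node per index value, and the class and method names are hypothetical. It does not reflect how tNeo4jOutput is implemented.

```java
import java.util.HashMap;
import java.util.Map;

// In-memory stand-in for an index: index value -> node properties.
class DataActionSketch {
    enum Action { INSERT, UPDATE, UPDATE_OR_INSERT, DELETE }

    final Map<Object, Map<String, Object>> index = new HashMap<>();

    void apply(Action action, Object indexValue, Map<String, Object> row) {
        Map<String, Object> existing = index.get(indexValue);
        switch (action) {
            case INSERT:
                // Always creates a node (simplified here to one node per value).
                index.put(indexValue, new HashMap<>(row));
                break;
            case UPDATE:
                // Changes an existing node; does nothing when no node is found.
                if (existing != null) existing.putAll(row);
                break;
            case UPDATE_OR_INSERT:
                // Updates the node found through the index, or creates one.
                if (existing != null) existing.putAll(row);
                else index.put(indexValue, new HashMap<>(row));
                break;
            case DELETE:
                // Removes the node fetched through the index.
                index.remove(indexValue);
                break;
        }
    }

    public static void main(String[] args) {
        DataActionSketch db = new DataActionSketch();
        Map<String, Object> row = new HashMap<>();
        row.put("name", "Ann");
        row.put("age", 30);
        db.apply(Action.UPDATE_OR_INSERT, "Ann", row);
        System.out.println(db.index);
    }
}
```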

Index name

Specify the index name to query.

This field is available only if the action selected in
Data action is other than
Insert.

Index key

Specify the index key to query.

This field is available only if the action selected in
Data action is other than
Insert.

Index value

Select the index value to query.

This field is available only if the action selected in
Data action is other than
Insert.

Schema and Edit schema

A schema is a row description. It defines the number of fields
(columns) to be processed and passed on to the next component. When you create a Spark
Job, avoid the reserved word line when naming the
fields.

Click Edit
schema
to make changes to the schema. If the current schema is of the Repository type, three options are available:

  • View schema: choose this
    option to view the schema only.

  • Change to built-in property:
    choose this option to change the schema to Built-in for local changes.

  • Update repository connection:
    choose this option to change the schema stored in the repository and decide whether
    to propagate the changes to all the Jobs upon completion. If you just want to
    propagate the changes to the current Job, you can select No upon completion and choose this schema metadata
    again in the Repository Content
    window.

 

Built-In: You create and store the schema locally for this component
only.

 

Repository: You have already created the schema and stored it in the
Repository. You can reuse it in various projects and Job designs.

When the schema to be reused has default values that are
integers or functions, ensure that these default values are not enclosed within
quotation marks. If they are, you must remove the quotation marks manually.

You can find more details about how to
verify default values in retrieved schema in Talend Help Center (https://help.talend.com).

Advanced settings

Commit every

Enter the number of rows to be completed before committing batches of
nodes to the DB. This option ensures transaction quality (but not
rollback) and, above all, better performance at execution.

Warning: This option is only supported by the
embedded mode of the database. You can’t make transactions in REST mode.
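
As a rough illustration of what a commit interval means, the following Java sketch groups incoming rows into batches and reports the size of each committed batch. It is a conceptual model only: no Neo4j API is called, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Conceptual model of "Commit every": rows are grouped into fixed-size commits.
class CommitEverySketch {
    /** Returns the size of each committed batch for the given row count. */
    static List<Integer> commitSizes(int totalRows, int commitEvery) {
        if (commitEvery <= 0) {
            throw new IllegalArgumentException("commitEvery must be positive");
        }
        List<Integer> batches = new ArrayList<>();
        int pending = 0;
        for (int row = 0; row < totalRows; row++) {
            pending++;                    // the row is written in the open batch
            if (pending == commitEvery) { // threshold reached: commit the batch
                batches.add(pending);
                pending = 0;
            }
        }
        if (pending > 0) {
            batches.add(pending);         // commit the final partial batch
        }
        return batches;
    }

    public static void main(String[] args) {
        System.out.println(commitSizes(2500, 1000)); // prints [1000, 1000, 500]
    }
}
```

In this model, batches that have already been committed stay in the database if a later batch fails, which is one way to read the "but not rollback" remark above.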

Batch import

Select this check box to activate the batch mode.

Warning:

  • This option is only supported by the embedded mode of the
    database.
  • It is recommended that you perform a backup operation before
    executing the Job to prevent data corruption.
Note: If you have configured index creation on multiple
columns in the Mapping table, it
is recommended that you select the Unique check box in the index setting for the last
column to avoid creating unwanted redundant indexes that may cause batch
load issues.

If you want more explanations about memory mapping configuration of
batch import, please refer to Neo4j documentation at: http://neo4j.com/docs/stable/batchinsert-examples.html.

Node store mapped memory

Type in the memory size in MB allocated to nodes.

Relationship store mapped memory

Type in the memory size in MB allocated to relationships.

Property store mapped memory

Type in the memory size in MB allocated to properties.

String store mapped memory

Type in the memory size in MB allocated to strings.

Array store mapped memory

Type in the memory size in MB allocated to arrays.

tStatCatcher Statistics

Select this check box to gather the Job processing metadata at the Job
level as well as at each component level.

Global Variables

NB_LINE: the number of rows read by an input component or
transferred to an output component. This is an After variable and it returns an
integer.

ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable and it returns a string. This
variable functions only if the Die on error check box is
cleared, if the component has this check box.

A Flow variable functions during the execution of a component while an After variable
functions after the execution of the component.

To fill up a field or expression with a variable, press Ctrl +
Space
to access the variable list and choose the variable to use from it.

For further information about variables, see the
Talend Studio User Guide.
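
In Java code written inside a Job (for example in a tJava component), After variables are typically read from the Job's globalMap with a key of the form component name plus variable name. The sketch below simulates globalMap with a plain HashMap so that it runs on its own; the component name tNeo4jOutput_1, the helper class, and the stored values are assumptions made for the illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-alone illustration of reading After variables from a globalMap-like map.
class GlobalMapSketch {
    /** Reads <component>_NB_LINE, for example tNeo4jOutput_1_NB_LINE. */
    static Integer nbLine(Map<String, Object> globalMap, String component) {
        return (Integer) globalMap.get(component + "_NB_LINE");
    }

    /** Reads <component>_ERROR_MESSAGE; null when no error was recorded. */
    static String errorMessage(Map<String, Object> globalMap, String component) {
        return (String) globalMap.get(component + "_ERROR_MESSAGE");
    }

    public static void main(String[] args) {
        // Stand-in for the globalMap that Talend provides inside a Job;
        // the value below is made up.
        Map<String, Object> globalMap = new HashMap<>();
        globalMap.put("tNeo4jOutput_1_NB_LINE", 6);

        System.out.println("Rows written: " + nbLine(globalMap, "tNeo4jOutput_1"));
        String error = errorMessage(globalMap, "tNeo4jOutput_1");
        System.out.println("Error: " + (error == null ? "none" : error));
    }
}
```

Because these are After variables, they should be read in a subJob that runs once the tNeo4jOutput component has finished, for example one triggered through an On Subjob Ok connection.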

Usage

Usage rule

This component is used as an output component
and it always needs an incoming link.

Limitation

n/a

Writing data to a Neo4j database and reading specific data from it

This scenario applies only to Talend products with Big Data.

This basic scenario describes a Job composed of two subJobs: the first subJob reads
employees data from a CSV file and writes it to a Neo4j database, and then triggers the
second subJob, which reads the employees data based on certain query conditions from the
Neo4j database and displays the data on the Run
console.

Adding and linking components

  1. Create a Job and add the following components to the Job by typing their
    names in the design workspace or dropping them from the Palette:

    • a tFileInputDelimited component,
      to read the employees data from a CSV file,

    • a tNeo4jOutput component to write
      the employees data to a Neo4j database,

    • a tNeo4jInput component to read
      the employees data from the Neo4j database based on given
      conditions, and

    • a tLogRow component to display
      the data on the Run console.

  2. Link the tFileInputDelimited component to
    the tNeo4jOutput component using a
    Row > Main connection.
  3. Link the tNeo4jInput component to the
    tLogRow component using a Row > Main
    connection.
  4. Link the tFileInputDelimited component to
    the tNeo4jInput component using a
    Trigger > On Subjob Ok connection.
  5. Label the components to better identify their roles in the Job.

    tNeo4jOutput_1.png

Configuring the components

Importing data to the Neo4j database

  1. Double-click the tFileInputDelimited
    component to open its Basic settings view
    on the Components tab.

    tNeo4jOutput_2.png

  2. In the File name/Stream field, specify
    the path to the CSV file that contains the employees data to read.

    The input CSV file used in this example is as follows:
  3. In the Header field, specify the number
    of rows to skip as header rows. In this example, the first row of the CSV
    file is the header row.
  4. Click the […] button next to Edit schema to open the Schema dialog box, and define the input schema based on
    the structure of the input file. In this example, the input schema is
    composed of six columns: employeeID
    (Integer), employeeName (String),
    age (Integer), hireDate (Date), salary (Float), and managerID (String).

    When done, click OK to close the
    Schema dialog box and propagate the
    schema to the next component.
    tNeo4jOutput_3.png

  5. Click the tNeo4jOutput component and
    select the Component tab to open its
    Basic settings view.

    tNeo4jOutput_4.png

  6. Define a Neo4j database connection. In this example, the Neo4j database is
    accessible in REST mode, so select the Remote
    server
    check box and specify the URL of the Neo4j server in
    the Server URL field, “http://localhost:7474/db/data” in this
    example.
  7. If needed, click the Sync columns button
    to ensure the component has the same schema as the preceding
    component.

    Keep the rest of the parameters as they are.
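
To make the six-column input schema concrete, the sketch below parses one delimited line into typed fields. The sample file is not reproduced here, so the semicolon separator, the yyyy-MM-dd date format, and the sample row are all assumptions, and the class name is hypothetical.

```java
import java.time.LocalDate;

// Typed view of one input row, following the schema described in the steps above.
class EmployeeRow {
    final int employeeID;
    final String employeeName;
    final int age;
    final LocalDate hireDate;
    final float salary;
    final String managerID;

    EmployeeRow(int employeeID, String employeeName, int age,
                LocalDate hireDate, float salary, String managerID) {
        this.employeeID = employeeID;
        this.employeeName = employeeName;
        this.age = age;
        this.hireDate = hireDate;
        this.salary = salary;
        this.managerID = managerID;
    }

    /** Parses a line such as "1;Ann;45;2010-01-15;5000.5;m6" (assumed format). */
    static EmployeeRow parse(String line) {
        String[] f = line.split(";", -1); // -1 keeps trailing empty fields
        if (f.length != 6) {
            throw new IllegalArgumentException("expected 6 fields, got " + f.length);
        }
        return new EmployeeRow(
                Integer.parseInt(f[0].trim()),
                f[1].trim(),
                Integer.parseInt(f[2].trim()),
                LocalDate.parse(f[3].trim()), // ISO date, e.g. 2010-01-15
                Float.parseFloat(f[4].trim()),
                f[5].trim());
    }

    public static void main(String[] args) {
        EmployeeRow e = parse("1;Ann;45;2010-01-15;5000.5;m6");
        System.out.println(e.employeeName + " reports to " + e.managerID);
    }
}
```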

Reading data from the Neo4j database

  1. Double-click the tNeo4jInput component to
    open its Basic settings view.

    tNeo4jOutput_5.png

  2. As in the tNeo4jOutput component, specify
    the URL of the Neo4j server to connect to, “http://localhost:7474/db/data” in this example.
  3. Click the […] button next to Edit schema and define the schema for employees
    information display. When done, click OK to
    close the Schema dialog box and propagate
    the schema to the next component.

    tNeo4jOutput_6.png

    The defined schema columns automatically appear in the Mapping table.
  4. In the Query field, type in the Cypher
    query to match the data to read from the Neo4j database. In this example,
    use the following Cypher query to find employees who are more than 40 years old and are under the manager
    m6.

  5. Fill the Return parameter field for each
    schema column with a return parameter in double quotes to map the node
    properties in the Neo4j database with the schema columns.
  6. Double-click the tLogRow component to
    open its Basic settings view, and select
    the Table (print values in cells of a
    table)
    option to display the retrieved information in a
    table.
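
The selection condition described for the query (employees older than 40 whose manager is m6) can be expressed in plain Java as a sanity check of what the Job should return. This is not the Cypher query used in the scenario; it is an in-memory sketch with a hypothetical class and made-up sample data, using the property names from the input schema.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class EmployeeFilterSketch {
    static class Employee {
        final String employeeName;
        final int age;
        final String managerID;

        Employee(String employeeName, int age, String managerID) {
            this.employeeName = employeeName;
            this.age = age;
            this.managerID = managerID;
        }
    }

    /** Returns the names of employees older than 40 under the given manager. */
    static List<String> olderThan40Under(List<Employee> employees, String managerID) {
        List<String> names = new ArrayList<>();
        for (Employee e : employees) {
            if (e.age > 40 && managerID.equals(e.managerID)) {
                names.add(e.employeeName);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Made-up rows, for illustration only.
        List<Employee> employees = Arrays.asList(
                new Employee("Ann", 45, "m6"),
                new Employee("Bob", 39, "m6"),
                new Employee("Cid", 50, "m2"));
        System.out.println(olderThan40Under(employees, "m6")); // prints [Ann]
    }
}
```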

Executing the Job

  1. Press Ctrl+S to save the Job.
  2. Press F6 or click Run on the Run tab to run
    the Job.

    tNeo4jOutput_7.png

    The employees data of the CSV file is written to the Neo4j database and
    then the information of employees matching the set conditions is retrieved
    from the Neo4j database and displayed on the console.

Writing family information to Neo4j and creating relationships

This scenario applies only to Talend products with Big Data.

This scenario describes a Job that will write family information to labeled nodes in a
remote Neo4j database and create relationships based on the family names.

Adding and linking components

  1. Create a Job and add the following components to the Job by typing their
    names in the design workspace or dropping them from the Palette:

    • a tFileInputDelimited component,
      to read the family data from a CSV file,

    • a tNeo4jOutput component to write
      the family data to a Neo4j database and create relationships between
      husband and wife.

  2. Link the tFileInputDelimited component to
    the tNeo4jOutput component using a
    Row > Main connection.
  3. Label the components to better identify their roles in the Job.

    tNeo4jOutput_8.png

Configuring the components

Configuring the data source

  1. Double-click the tFileInputDelimited
    component to open its Basic settings view
    on the Components tab.

    tNeo4jOutput_9.png

  2. In the File name/Stream field, specify
    the path to the CSV file that contains the family data to read.

    The input CSV file used in this example is as follows:
  3. In the Header field, specify the number
    of rows to skip as header rows. In this example, the first row of the CSV
    file is the header row.
  4. Click the […] button next to Edit schema to open the Schema dialog box, and define the input schema based on
    the structure of the input file. In this example, the input schema is
    composed of four columns: name (String),
    gender (String), age (Integer), and family (String).

    When done, click OK to close the
    Schema dialog box and propagate the
    schema to the next component.
    tNeo4jOutput_10.png

Writing data to Neo4j and creating indexes and relationships

  1. Click the tNeo4jOutput component and
    select the Component tab to open its
    Basic settings view.

    tNeo4jOutput_11.png

  2. From the DB Version list, select
    Neo4J 2.X.X to enable node
    labeling.
  3. Define a Neo4j database connection. In this example, the Neo4j database is
    accessible in REST mode, so select the Remote
    server
    check box and specify the URL of the Neo4j server in
    the Server URL field, “http://localhost:7474/db/data” in this
    example.
  4. Double-click the tNeo4jOutput component
    or click the Mapping button on the
    component’s Basic settings view to open the
    index and relationship mapping editor.
  5. With the name column selected from the
    schema panel, click the Index creation tab,
    click the [+] button to add a row in the
    table, and create an index named first_name on this column:

    • In the Name field, enter
      first_name between double
      quotation marks.

    • In the Key field, enter first_name between double quotation
      marks to give the index a key.

    Then click in the schema panel to validate your index creation.
  6. With the family column selected from
    the schema panel, click the Index creation
    tab, click the [+] button to add a row in
    the table, and create an index named family on this column:

    • In the Name field, enter
      family between double
      quotation marks.

    • In the Key field, enter family_name between double quotation
      marks to give the index a key.

    Then click in the schema panel to validate your index creation.
    tNeo4jOutput_12.png

  7. With the family column selected from
    the schema panel, click the Relationship
    creation
    tab, click the [+]
    button to add a row in the table, and create a relationship named Spouse on this column based on the index named
    family:

    • In the Type field, enter
      Spouse between double
      quotation marks.

    • From the Direction list field,
      select either Outgoing or Incoming.

    • In the Index Name field, enter
      family between double
      quotation marks.

    • In the Index Key field, enter
      family_name between double
      quotation marks.

    Then click in the schema panel to validate your relationship creation, and
    click OK to close the mapping
    editor.
    tNeo4jOutput_13.png

  8. Select the Use label (Neo4j > 2.0) check
    box and enter Families between double
    quotation marks in the Label name field so
    that the nodes to be created will be labeled Families.
  9. From the Data action list, select
    Update or insert, and set a reference
    key in the Index area that appears:

    • In Index name field, enter
      first_name between double
      quotation marks.

    • In Index key field, enter
      first_name between double
      quotation marks.

    • From Index value field, select
      name. As the Value field is left blank in index
      creation, the index value will be the value of the name column for each row.

    This way, when the Job is executed, nodes will be inserted or updated in
    the Neo4j database based on the first_name index: for each data row, if a node containing
    the same first name already exists in the database, the node will be
    updated; otherwise, a new node will be created.
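
The way the family index drives relationship creation in the steps above can be pictured with a small in-memory model: each inserted row looks up nodes already indexed under the same family name and links to them. The sketch below is a conceptual simplification with a hypothetical class, made-up names, and an assumed Outgoing direction from the new node to the existing one; it does not call Neo4j.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class SpouseLinkSketch {
    /**
     * Takes rows of {name, family} and returns the relationships created,
     * formatted as "newNode -[Spouse]-> existingNode".
     */
    static List<String> link(String[][] rows) {
        // Models the index named "family": family name -> nodes inserted so far.
        Map<String, List<String>> familyIndex = new LinkedHashMap<>();
        List<String> relationships = new ArrayList<>();
        for (String[] row : rows) {
            String name = row[0];
            String family = row[1];
            List<String> members =
                    familyIndex.computeIfAbsent(family, k -> new ArrayList<>());
            for (String other : members) {
                // A node with the same family name exists: create the relationship.
                relationships.add(name + " -[Spouse]-> " + other);
            }
            members.add(name); // index the new node for later rows
        }
        return relationships;
    }

    public static void main(String[] args) {
        String[][] rows = {
            {"John", "Smith"}, {"Mary", "Smith"},
            {"Paul", "Jones"}, {"Anna", "Jones"}
        };
        System.out.println(link(rows));
        // prints [Mary -[Spouse]-> John, Anna -[Spouse]-> Paul]
    }
}
```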

Executing the Job and checking the result

  1. Press Ctrl+S to save the Job, and press
    F6 or click Run on the Run tab to run
    the Job.
  2. In the address bar of your Web browser, enter the URL of the Neo4j
    database browser, http://localhost:7474/ in this example, and
    enter the following Cypher query in the command line to view the
    nodes.

    As shown in the graphic view, three pairs of nodes labeled Families have been created and those with the
    same family name are linked together via the relationship Spouse.
    tNeo4jOutput_14.png


Source: Talend Help Center, https://help.talend.com