July 30, 2023

tNeo4jBatchOutputRelationship – Docs for ESB 7.x

tNeo4jBatchOutputRelationship

Receives data from the preceding component and writes relationships in bulk into a local Neo4j database.

tNeo4jBatchOutputRelationship Standard properties

These properties are used to configure tNeo4jBatchOutputRelationship running in the Standard Job framework.

The Standard tNeo4jBatchOutputRelationship component belongs to the Big Data and the Databases NoSQL families.

The component in this framework is available in all Talend products with Big Data
and in Talend Data Fabric.

Basic settings

Use an existing connection

Select this check box and in the Component List click the relevant connection component to
reuse the connection details you already defined.

This component supports Neo4j version V3.2.X only and does not support the remote mode. Therefore, do not reuse the connection to versions other than V3.2.X defined in a tNeo4jConnection component and do not select the Remote server check box in tNeo4jConnection.

Do not use a 2.X.X version and a 3.X.X version in the same Job. Otherwise,
class conflict issues occur.

Database path

Specify the directory to hold your data files.

This field appears only if you do not select the
Use an existing connection
check box.

Shutdown after Job

Select this check box to shut down the Neo4j database connection when no more
operations on Neo4j are going to be performed after the current
component.

Alternatively, you can use tNeo4jClose to shut down the
database.

This avoids errors such as “Id file not properly shutdown” at next execution
of Jobs involving Neo4j.

This check box is available only if the Use an existing
connection
check box is not selected.

Field for relationship types

Select the column from the input schema you have defined in the preceding components to provide types for the relationships to be created.

Direction of the relationship

Select the direction of the relationships to be created:

  • Outgoing: The
    relationship goes from the start node to the end node.

  • Incoming: The
    relationship goes from the end node to the start node.
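
The effect of the two directions can be sketched in a few lines of Python. This is only an illustration of the rule stated above, not the component's implementation; the function name relationship_endpoints and the placeholder node names are hypothetical.

```python
def relationship_endpoints(start_node, end_node, direction):
    """Return (source, target) of the relationship for the selected direction.

    Hypothetical helper: it only mirrors the Outgoing/Incoming rule
    described in the documentation.
    """
    if direction == "Outgoing":
        # The relationship points from the start node to the end node.
        return (start_node, end_node)
    if direction == "Incoming":
        # The relationship points from the end node to the start node.
        return (end_node, start_node)
    raise ValueError(f"unknown direction: {direction!r}")


outgoing = relationship_endpoints("actor", "movie", "Outgoing")
incoming = relationship_endpoints("actor", "movie", "Incoming")
print(outgoing, incoming)
```

With Outgoing, an actor row used as the start node becomes the source of the relationship; with Incoming, the same row would make the movie the source.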

Start node of the relationship

Define the start node of each relationship using the node identifier:

  • Name of the batch index: select the tNeo4jBatchOutput component used to create the start nodes. The name of the index for these nodes is retrieved from that component.

  • Field name for the batch index: select the column from the input schema you have defined in the preceding components to provide the name of the start node of each relationship to be created.

End node of the relationship

Define the end node of each relationship using the node identifier:

  • Name of the batch index: select the tNeo4jBatchOutput component used to create the end nodes. The name of the index for these nodes is retrieved from that component.

  • Field name for the batch index: select the column from the input schema you have defined in the preceding components to provide the name of the end node of each relationship to be created.

Die on error

Select the check box to stop the execution of the Job when an error
occurs.

Clear the check box to skip any rows on error and complete the
process for error-free rows.
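
The difference between the two settings can be sketched as follows. This is a minimal Python illustration of the described behavior, assuming a per-row write function; write_rows and reject_negative are hypothetical names, not part of Talend.

```python
def write_rows(rows, write_one, die_on_error=True):
    """Write rows one by one, mimicking the Die on error semantics.

    die_on_error=True  -> the first failure stops the whole run.
    die_on_error=False -> failing rows are skipped, the rest are written.
    """
    written = 0
    skipped = []
    for row in rows:
        try:
            write_one(row)
            written += 1
        except Exception as exc:
            if die_on_error:
                raise  # stop the execution at the first error
            skipped.append((row, str(exc)))  # remember the row and continue
    return written, skipped


def reject_negative(row):
    # Hypothetical writer that fails on negative values.
    if row < 0:
        raise ValueError("negative value")


written, skipped = write_rows([1, -1, 2], reject_negative, die_on_error=False)
print(written, skipped)
```

With die_on_error=True, the same call would raise the ValueError at the second row instead of returning.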

Advanced settings

Neo4j configuration

Add parameters to the table to configure the database to be created.

For further information, see Neo4j documentation: Configuration settings.

When entering values, use the syntax demonstrated by the examples given alongside the column names of this table.

tStatCatcher Statistics

Select this check box to gather the Job processing metadata at the Job level
as well as at each component level.

Global Variables


NB_LINE: the number of rows read by an input component or
transferred to an output component. This is an After variable and it returns an
integer.

ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable and it returns a string. This
variable functions only if the Die on error check box is
cleared, if the component has this check box.

A Flow variable functions during the execution of a component while an After variable
functions after the execution of the component.

To fill in a field or expression with a variable, press Ctrl+Space
to access the variable list and choose the variable to use from it.

For further information about variables, see the Talend Studio User Guide.

Usage

Usage rule

This component is used as an output
component and it always needs an incoming link.

Writing information of actors and movies to Neo4j with hierarchical
relationship using Neo4j Batch components

In this scenario, Neo4j Batch components are used to import data about
actors and movies from two CSV files into a local Neo4j database and to create
relationships between them based on a third CSV file that describes
the actors’ roles in the movies.

This scenario applies only to Talend products with Big Data.

The Neo4j Batch components provided by Talend support bulk writing to a local Neo4j database only. They can be used neither with Neo4j versions prior to V3.2.X nor alongside Neo4j components that are using one of those Neo4j versions.

The components to be used are:

  • One tNeo4jConnection component: it opens the connection to Neo4j to be reused.

  • Three tFileInputDelimited
    components: they read the input information of actors and
    movies.

  • Two tNeo4jBatchOutput components: they
    write information of movies and actors to the connected Neo4j
    database.

  • One tNeo4jBatchOutputRelationship component: it creates
    relationships between actors and movies.

  • One tNeo4jBatchSchema component: it creates a uniqueness constraint on the nodes in the database.

tNeo4jBatchOutputRelationship_1.png

Creating the Neo4j Batch Job

  1. Ensure that your Neo4j service and Neo4j console are stopped.

    If you manage Neo4j from the command line, you can run neo4j status to check
    the status; if you have installed the Neo4j desktop application, you can
    check it directly in that application.

  2. From the Repository on the Integration perspective, create a Job and add the components to be used by typing their names in the design workspace or dropping them from the Palette.
  3. Connect the first tFileInputDelimited component to the
    first tNeo4jBatchOutput component using a
    Row > Main link. This subJob imports the actors data into the Neo4j database.
  4. Connect the second tFileInputDelimited
    component to the second tNeo4jBatchOutput
    component using a Row > Main link. This subJob imports the movies data into the Neo4j database.
  5. Connect the third tFileInputDelimited
    component to the tNeo4jBatchOutputRelationship
    component using a Row > Main link. This subJob creates the relationships between actors and movies.
  6. Connect the subJobs using Trigger > On Subjob Ok links.

Configuring the Neo4j connection to be reused

  1. Double-click the tNeo4jConnection
    component to open its Basic settings view.

    tNeo4jBatchOutputRelationship_2.png

  2. From the DB Version list, select Neo4J 3.2.X.
  3. Ensure that the Use a remote server check box is clear, because the Neo4j Batch components work only in local mode.
  4. In the Database path field, enter the path or browse to the database directory.

Bulk-writing the actors data in Neo4j

  1. Double-click the first tFileInputDelimited component to open its Component view.

    tNeo4jBatchOutputRelationship_3.png

  2. In the File name/Stream field, enter the path or browse to the CSV file that describes the actors’ IDs, names and their labels to be used in Neo4j.

    The input CSV file used in this example reads as follows:

    The double quotation marks on the actor names are not mandatory.

  3. Click the […] button next to Edit schema to open the schema editor, and define the input schema based on
    the structure of the input file.

    In this example, the columns are id,
    name and label, all of type
    String.

    tNeo4jBatchOutputRelationship_4.png

  4. Click OK to close this editor and accept the propagation
    of the schema to the next component.
  5. In the Field separator field, enter a comma (,) to
    replace the default semicolon (;).
  6. Double-click the first tNeo4jBatchOutput component to open its
    Component view.

    tNeo4jBatchOutputRelationship_5.png

  7. Select the Use an existing connection
    check box to reuse the Neo4j database connection opened by the tNeo4jConnection component.
  8. Verify that the Shutdown after Job check box is clear.
  9. From the Field that contains the label list drop-down list, select the column that provides labels.
  10. In the Index name field, enter the name of the index to be created for the nodes.
  11. From the Import identifier drop-down list, select the column that provides IDs.
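
The comma separator and the optional quoting mentioned in the steps above can be illustrated with Python's csv module. This is only a sketch of the quoting semantics, not of how tFileInputDelimited is implemented. The two rows are invented placeholders, not the sample file used in this scenario; only the column names id, name and label come from the schema defined above.

```python
import csv
import io

# Hypothetical placeholder rows (NOT the sample file from this scenario).
# The first name is quoted, the second is not.
SAMPLE = 'a1,"Example Actor One",Actor\na2,Example Actor Two,Actor\n'

COLUMNS = ["id", "name", "label"]  # schema columns, all of type String


def read_rows(text, columns=COLUMNS, separator=","):
    """Parse delimited text into dicts keyed by the schema column names."""
    reader = csv.reader(io.StringIO(text), delimiter=separator, quotechar='"')
    return [dict(zip(columns, fields)) for fields in reader]


rows = read_rows(SAMPLE)
for row in rows:
    print(row)
```

Both rows yield a plain name value without quotation marks, which is why the quotes are optional as long as a value does not itself contain the separator.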

Bulk-writing the movies data into Neo4j

  1. Double-click the second tFileInputDelimited component to open its Component view.

    tNeo4jBatchOutputRelationship_6.png

  2. In the File name/Stream field, enter the path or browse to the CSV file that describes the movies’ IDs, names, release years and their labels to be used in Neo4j.

    The input CSV file used in this example reads as follows:

    The double quotation marks on the movie names are not mandatory.

  3. Click the […] button next to Edit schema to open the schema editor, and define the input schema based on
    the structure of the input file.

    In this example, the columns are id,
    title, released and
    label.

    tNeo4jBatchOutputRelationship_7.png

  4. Click OK to close this editor and accept the propagation
    of the schema to the next component.
  5. In the Field separator field, enter a comma (,) to
    replace the default semicolon (;).
  6. Double-click the second tNeo4jBatchOutput component to open its
    Component view.

    tNeo4jBatchOutputRelationship_8.png

  7. Select the Use an existing connection
    check box to reuse the Neo4j database connection opened by the tNeo4jConnection component.
  8. Verify that the Shutdown after Job check box is clear.
  9. From the Field that contains the label list drop-down list, select the column that provides labels.
  10. In the Index name field, enter the name of the index to be created for the nodes.
  11. From the Import identifier drop-down list, select the column that provides IDs.

Creating relationships in bulk

  1. Double-click the third tFileInputDelimited component to open its Component view.

    tNeo4jBatchOutputRelationship_9.png

  2. In the File name/Stream field, enter the path or browse to the CSV file that describes the actor-movie relationships.

    The input CSV file used in this example reads as follows:

    The double quotation marks on the role names are not mandatory. The value
    ACTED_IN is a user-defined relationship type
    that explains the relationship between the actors and the movies.

  3. Click the […] button next to Edit schema to open the schema editor, and define the input schema based on
    the structure of the input file.

    In this example, the columns are from,
    role, to and
    type.

    tNeo4jBatchOutputRelationship_10.png

  4. Click OK to close this editor and accept the propagation
    of the schema to the next component.
  5. In the Field separator field, enter a comma (,) to
    replace the default semicolon (;).
  6. Double-click the tNeo4jBatchOutputRelationship component
    to open its Component view.

    tNeo4jBatchOutputRelationship_11.png

  7. Select the Use an existing connection
    check box to reuse the Neo4j database connection opened by the tNeo4jConnection component.
  8. Verify that the Shutdown after Job check box is clear.
  9. From the Field for relationship type drop-down list,
    select the column that provides the relationship types.
  10. From the Direction of the relationship drop-down list,
    select Outgoing.
  11. In the Start node of the relationship area, select the
    tNeo4jBatchOutput component that provides the index of
    the start nodes, which is the asActors index in this
    example from the first tNeo4jBatchOutput. Then from the
    Field name for the batch index drop-down list, select the
    column that provides the actor names as the start nodes.
  12. Repeat this action in the End node of the relationship
    area to select the asMovie index from the second
    tNeo4jBatchOutput and then select the column that provides
    the movie names as the end nodes.
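
Conceptually, the settings above tell the component to look up each row's start node in the asActors index and its end node in the asMovie index, then link the two with the type from the type column. The following Python sketch illustrates that lookup under stated assumptions: the dictionaries stand in for the batch indexes, and the node ids, actor names, movie name and role values are all invented placeholders. It does not reflect the component's actual internals.

```python
# Hypothetical in-memory stand-ins for the two batch indexes: each maps the
# value that identifies a node to an invented internal node id.
as_actors = {"Example Actor One": 0, "Example Actor Two": 1}
as_movie = {"Example Movie": 100}

# Hypothetical relationship rows using the schema columns from, role, to, type.
rows = [
    {"from": "Example Actor One", "role": "Lead", "to": "Example Movie", "type": "ACTED_IN"},
    {"from": "Example Actor Two", "role": "Support", "to": "Example Movie", "type": "ACTED_IN"},
]


def resolve_relationships(rows, start_index, end_index):
    """Turn input rows into (start_id, type, end_id) triples via index lookups."""
    resolved = []
    for row in rows:
        start_id = start_index[row["from"]]  # lookup in the start-node index
        end_id = end_index[row["to"]]        # lookup in the end-node index
        resolved.append((start_id, row["type"], end_id))
    return resolved


relationships = resolve_relationships(rows, as_actors, as_movie)
print(relationships)
```

A row whose from or to value is missing from the corresponding index raises a KeyError in this sketch, which is the kind of row-level failure that the Die on error check box governs in the component.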

Adding uniqueness constraints on the nodes

  1. Double-click the tNeo4jBatchSchema component
    to open its Component view.

    tNeo4jBatchOutputRelationship_12.png

  2. Select the Use an existing connection
    check box to reuse the Neo4j database connection opened by the tNeo4jConnection component.
  3. Select the Shutdown after Job check box to properly close the connection after the execution.
  4. In the Schema definition table, add two rows by clicking the [+] button twice:

    1. In the Schema type column, select
      Node property is unique for both of the rows to
      add uniqueness constraints to nodes in Neo4j.
    2. In the For node with Label column, enter, within double quotation marks, Actor and Movie respectively, which are the labels used by the actor nodes and the movie nodes. What you enter here must therefore be identical to the labels previously used when creating those nodes.
    3. In the On property column, enter, within double
      quotation marks, the node properties to which you need to add uniqueness
      constraints. For the actor nodes, enter name and
      for the movie nodes, enter title. The values you
      enter here must be identical to the column names previously defined to
      provide actor names and movie names for the nodes to be created by the
      tNeo4jBatchOutput components.
  5. Press Ctrl+S to save the Job, and press
    F6 or click Run on the Run tab to execute
    the Job.
Once the Job has run successfully, check the result in your Neo4j browser:

tNeo4jBatchOutputRelationship_13.png
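
A uniqueness constraint on a property means that no two nodes with the same label may share a value for that property. As a precaution, the input data can be checked for duplicates before the Job is run. The Python sketch below shows one possible pre-check; the helper duplicate_values and the sample rows are hypothetical and are not part of the Talend scenario.

```python
from collections import Counter


def duplicate_values(rows, column):
    """Return the sorted values of `column` that occur in more than one row."""
    counts = Counter(row[column] for row in rows)
    return sorted(value for value, n in counts.items() if n > 1)


# Hypothetical actor rows; "Example Actor One" appears twice on purpose.
actors = [
    {"id": "a1", "name": "Example Actor One", "label": "Actor"},
    {"id": "a2", "name": "Example Actor Two", "label": "Actor"},
    {"id": "a3", "name": "Example Actor One", "label": "Actor"},
]

duplicates = duplicate_values(actors, "name")
print(duplicates)
```

An empty result indicates that the chosen column is safe to use as the On property value for that label.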

Document obtained from Talend: https://help.talend.com