August 17, 2023

tFileInputJSON – Docs for ESB 5.x

tFileInputJSON

tFileInputJSON_icon32_white.png

tFileInputJSON properties

Component Family

File / Input

 

Function

tFileInputJSON extracts JSON data
from a file according to the JSONPath query.

If you have subscribed to one of the Talend solutions with Big Data, you are
able to use this component in a Talend Map/Reduce Job to generate
Map/Reduce code. For further information, see tFileInputJSON in Talend
Map/Reduce Jobs. In that situation, tFileInputJSON belongs
to the MapReduce component family.

Purpose

tFileInputJSON extracts JSON data
from a file according to a JSONPath query, then transfers the
data to a file, a database table, and so on.

Basic settings

Property type

Either Built-in or Repository.

Since version 5.6, both the Built-In mode and the Repository mode are
available in any of the Talend solutions.

 

 

Built-in: No property data stored
centrally.

 

 

Repository: Select the repository
file where the properties are stored. The fields that follow are
completed automatically using the data retrieved.

 

Schema and Edit
Schema

A schema is a row description. It defines the number of fields to
be processed and passed on to the next component. The schema is
either Built-in or stored remotely
in the Repository.

Since version 5.6, both the Built-In mode and the Repository mode are
available in any of the Talend solutions.

Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:

  • View schema: choose this option to view the
    schema only.

  • Change to built-in property: choose this option
    to change the schema to Built-in for local
    changes.

  • Update repository connection: choose this option to change
    the schema stored in the repository and decide whether to propagate the changes to
    all the Jobs upon completion. If you just want to propagate the changes to the
    current Job, you can select No upon completion and
    choose this schema metadata again in the [Repository
    Content]
    window.

 

 

Built-in: The schema will be
created and stored locally for this component only. Related topic:
see Talend Studio
User Guide.

 

 

Repository: The schema already
exists and is stored in the Repository, hence can be reused in
various projects and Job flowcharts. Related topic: see
Talend Studio
User Guide.

 

Read By XPath

This check box is selected by default. It allows you to show the
Loop JSONPath query field and
the Get nodes check box in the
Mapping table.

 

Use URL

Select this check box to retrieve data directly from the Web.

URL: type in the URL path from
which you will retrieve data.

 

Filename

This field is not available if you select the Use URL check box.

Click the […] button next to
the field to browse to the file from which you will retrieve data or
enter the full path to the file directly.

 

Loop JSONPath query

JSONPath query to specify the loop node of the JSON data.

Available when Read by XPath is
selected.

 

Mapping

Column: shows the schema defined
in the Schema editor.

JSONPath Query: specifies the JSON
node that holds the desired data. For details about JSONPath
expressions, go to http://goessner.net/articles/JsonPath/.

Get nodes: available when Read by XPath is selected. Select this
check box to extract the JSON data of all the nodes specified in the
XPath query list or select the
check box next to a specific node to extract its JSON data only.
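
The JSONPath shapes used in this column can be pictured with a short sketch. Python has no JSONPath engine in its standard library, so the snippet below mimics two common query forms with plain dict access; the movie-collection JSON is an illustrative assumption shaped like the file used in the scenarios later on this page, not the actual file.

```python
import json

# Illustrative JSON shaped like the movie-collection file used in the
# scenarios below; the exact contents are an assumption for this sketch.
doc = json.loads("""
{"movieCollection": [
  {"type": "fiction", "name": "Movie A",
   "details": {"release": "2003", "rating": 8.0, "starring": "Actor A"}},
  {"type": "drama", "name": "Movie B",
   "details": {"release": "2010", "rating": 7.5, "starring": "Actor B"}}
]}
""")

# "$.movieCollection[*].type": from the root ($), loop over every element
# of the movieCollection array and take its "type" field.
types = [movie["type"] for movie in doc["movieCollection"]]

# "$..rating": recursive descent -- collect every "rating" key found
# anywhere under the root, which here lives inside each "details" node.
ratings = [movie["details"]["rating"] for movie in doc["movieCollection"]]

print(types)    # ['fiction', 'drama']
print(ratings)  # [8.0, 7.5]
```

In a Job, each query result is mapped to the schema column listed on the same row of the Mapping table.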

 

Die on error

Select this check box to stop the execution of the Job when an
error occurs. Clear the check box to skip the row on error and
complete the process for error-free rows. If needed, you can collect
the rows on error using a Row >
Reject link.

Advanced settings

Advanced separator (for numbers)

Select this check box to modify the separators used for
numbers:

Thousands separator: define
separators for thousands.

Decimal separator: define
separators for decimals.
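
To illustrate why these settings matter, the hypothetical helper below (a sketch, not Talend's implementation) parses a European-style number in which the thousands separator is a period and the decimal separator is a comma:

```python
def parse_number(text: str, thousands: str = ".", decimal: str = ",") -> float:
    """Parse a number written with custom separators, e.g. "1.234,56"."""
    # Drop the thousands separator, then normalize the decimal separator
    # to the period that float() expects.
    return float(text.replace(thousands, "").replace(decimal, "."))

print(parse_number("1.234,56"))  # 1234.56
```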

 

Validate date

Select this check box to check the date format strictly against
the input schema. This check box is available only if the Read By XPath check box is
selected.

 

Encoding

Select the encoding type from the list or select Custom and define it manually. This field
is compulsory for DB data handling.

 

tStatCatcher Statistics

Select this check box to gather the Job processing metadata at a
Job level as well as at each component level.

Global Variables

NB_LINE: the number of rows processed. This is an After
variable and it returns an integer.

ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable and it returns a string. This
variable functions only if the Die on error check box is
cleared, if the component has this check box.

A Flow variable functions during the execution of a component while an After variable
functions after the execution of the component.

To fill in a field or expression with a variable, press Ctrl +
Space
to access the variable list and choose the variable to use from it.

For further information about variables, see Talend Studio
User Guide.

Usage

This component is a start component of a Job and always needs an
output link.

Usage in Map/Reduce Jobs

In a Talend Map/Reduce Job, it is used as a start component and requires
a transformation component as output link. The other components used along with it must be
Map/Reduce components, too. They generate native Map/Reduce code that can be executed
directly in Hadoop.

You need to use the Hadoop Configuration tab in the
Run view to define the connection to a given Hadoop
distribution for the whole Job.

For further information about a Talend Map/Reduce Job, see the sections
describing how to create, convert and configure a Talend Map/Reduce Job of the
Talend Big Data Getting Started Guide.

Note that in this documentation, unless otherwise explicitly stated, a scenario presents
only Standard Jobs, that is to say traditional Talend data
integration Jobs, and not Map/Reduce Jobs.

Log4j

The activity of this component can be logged using the log4j feature. For more information on this feature, see the Talend Studio User Guide.

For more information on the log4j logging levels, see the Apache documentation at http://logging.apache.org/log4j/1.2/apidocs/org/apache/log4j/Level.html.

tFileInputJSON in Talend
Map/Reduce Jobs

Warning

The information in this section is only for users that have subscribed to one of
the Talend solutions with Big Data and is not applicable to
Talend Open Studio for Big Data users.

In a Talend Map/Reduce Job, tFileInputJSON, as well as the whole Map/Reduce Job using it, generates
native Map/Reduce code. This section presents the specific properties of tFileInputJSON when it is used in that situation. For further
information about a Talend Map/Reduce Job, see the Talend Big Data Getting Started Guide.

Component family

MapReduce / Input

 

Function

In a Map/Reduce Job, tFileInputJSON extracts data from one or more JSON
files on HDFS and sends it to the following transformation
component.

Basic settings

Property type

Either Built-in or Repository.

   

Built-in: no property data stored
centrally.

   

Repository: reuse properties
stored centrally under the File
Json
node of the Repository tree.

The fields that follow are pre-filled with the fetched
data.

For further information about the File
Json node, see the section about setting up a JSON
file schema in the Talend Studio User Guide.

 

Schema and Edit
Schema

A schema is a row description. It defines the number of fields to be processed and passed on
to the next component. The schema is either Built-In or
stored remotely in the Repository.

Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:

  • View schema: choose this option to view the
    schema only.

  • Change to built-in property: choose this option
    to change the schema to Built-in for local
    changes.

  • Update repository connection: choose this option to change
    the schema stored in the repository and decide whether to propagate the changes to
    all the Jobs upon completion. If you just want to propagate the changes to the
    current Job, you can select No upon completion and
    choose this schema metadata again in the [Repository
    Content]
    window.

   

Built-In: You create and store the schema locally for this
component only. Related topic: see Talend Studio
User Guide.

   

Repository: You have already created the schema and
stored it in the Repository. You can reuse it in various projects and Job designs. Related
topic: see Talend Studio User Guide.

 

Folder/File

Enter the path to the file or folder on HDFS from which the data
will be extracted.

If the path you entered points to a folder, all files stored in
that folder will be read.

If the file to be read is a compressed one, enter the file name
with its extension; then tFileInputJSON automatically decompresses it at
runtime. The supported compression formats and their corresponding
extensions are:

  • DEFLATE: *.deflate

  • gzip: *.gz

  • bzip2: *.bz2

  • LZO: *.lzo
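
The extension-to-codec mapping above can be sketched as follows. This is an illustrative Python sketch, not the component's Java implementation; LZO is omitted because the Python standard library has no LZO codec, and the DEFLATE branch assumes zlib framing.

```python
import bz2
import gzip
import io
import zlib
from pathlib import Path

def open_maybe_compressed(path: str):
    """Pick a decompressor from the file extension, as the component does."""
    ext = Path(path).suffix
    if ext == ".gz":
        return gzip.open(path, "rb")
    if ext == ".bz2":
        return bz2.open(path, "rb")
    if ext == ".deflate":
        # Assumes zlib-framed DEFLATE data; a raw stream would need
        # zlib.decompressobj(-zlib.MAX_WBITS) instead.
        return io.BytesIO(zlib.decompress(Path(path).read_bytes()))
    return open(path, "rb")  # plain, uncompressed file
```

For example, pointing it at movies.json.gz would transparently yield the decompressed JSON bytes.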

Note that you need to ensure you have properly configured the connection to
the Hadoop distribution to be used in the Hadoop
configuration tab in the Run view.

 

Die on error

Select this check box to stop the execution of the Job when an error occurs.

Clear the check box to skip any rows on error and complete the process for error-free rows.
When errors are skipped, you can collect the rows on error using a Row
> Reject
link.

 

Loop XPath query

Node within the JSON field, on which the loop is based.

 

Mapping

Complete the Mapping table to
extract the desired data.

  • Column: columns defined
    in the schema to hold the data extracted from the JSON
    field.

  • XPath query: XPath query
    to specify the node within the JSON field to be
    extracted.

  • Get Nodes: select this check box
    to get values from a nested node within the
    JSON field.

Advanced settings

Advanced separator (for number)

Select this check box to change the separators used for numbers. By
default, the thousands separator is a comma (,) and the decimal separator is a period (.).

 

Validate date

Select this check box to check the date format strictly against
the input schema.

 

Encoding

Select the encoding from the list or select Custom and define it manually.

Global Variables

ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable and it returns a string. This
variable functions only if the Die on error check box is
cleared, if the component has this check box.

A Flow variable functions during the execution of a component while an After variable
functions after the execution of the component.

To fill in a field or expression with a variable, press Ctrl +
Space
to access the variable list and choose the variable to use from it.

For further information about variables, see Talend Studio
User Guide.

Usage

In a Talend Map/Reduce Job, it is used as a start component and requires
a transformation component as output link. The other components used along with it must be
Map/Reduce components, too. They generate native Map/Reduce code that can be executed
directly in Hadoop.

Once a Map/Reduce Job is opened in the workspace, tFileInputJSON as well as the MapReduce
family appears in the Palette of
the Studio.

For further information about a Talend Map/Reduce Job, see the sections
describing how to create, convert and configure a Talend Map/Reduce Job of the
Talend Big Data Getting Started Guide.

Note that in this documentation, unless otherwise explicitly stated, a scenario presents
only Standard Jobs, that is to say traditional Talend data
integration Jobs, and not Map/Reduce Jobs.

Hadoop Connection

You need to use the Hadoop Configuration tab in the
Run view to define the connection to a given Hadoop
distribution for the whole Job.

This connection is effective on a per-Job basis.

Prerequisites

The Hadoop distribution must be properly installed, so as to guarantee the interaction
with Talend Studio. The following list presents MapR related information for
example.

  • Ensure that you have installed the MapR client on the machine where the Studio is,
    and added the MapR client library to the PATH variable of that machine. According
    to MapR’s documentation, the library or libraries of a MapR client corresponding to
    each OS version can be found under MAPR_INSTALL\hadoop\hadoop-VERSION\lib\native.
    For example, the library for Windows is lib\native\MapRClient.dll in the MapR
    client jar file. For further information, see the following link from MapR: http://www.mapr.com/blog/basic-notes-on-configuring-eclipse-as-a-hadoop-development-environment-for-mapr.

    Without adding the specified library or libraries, you may encounter the following
    error: no MapRClient in java.library.path.

  • Set the -Djava.library.path argument, for example, in the Job Run VM arguments area
    of the Run/Debug view in the [Preferences] dialog box. This argument provides the
    Studio with the path to the native library of that MapR client. This allows
    subscription-based users to make full use of the Data viewer to view
    locally in the Studio the data stored in MapR. For further information about how to
    set this argument, see the section describing how to view data in the Talend Big Data Getting Started Guide.

For further information about how to install a Hadoop distribution, see the manuals
corresponding to the Hadoop distribution you are using.

Scenario 1: Extracting JSON data from a file

In this scenario, the tFileInputJSON component reads
the JSON data from a file using JSONPath queries and the tLogRow component shows the flat data extracted.

The JSON file contains information about a movie collection.

Linking the components

  1. Drop tFileInputJSON and tLogRow from the Palette onto the Job designer.

  2. Rename tFileInputJSON as read_JSON_data and tLogRow as show_data.

  3. Link the components using a Row >
    Main connection.

    use_case_tfileinputjson_1.png

Configuring the components

  1. Double-click tFileInputJSON to open its
    Basic settings view:

    use_case_tfileinputjson_2.png
  2. Click the […] button next to the
    Edit schema field to open the schema
    editor.

    use_case_tfileinputjson_3.png
  3. Click the [+] button to add five columns,
    namely type, movie_name, release,
    rating and starring, all of type String except the column rating, which is of type Double.

    Click OK to close the editor.

  4. In the pop-up Propagate box, click
    Yes to propagate the schema to the
    subsequent components.

  5. In the Filename field, fill in the path
    to the JSON file.

    In this example, the JSON file is as follows:

  6. Clear the Read By XPath check box.

  7. In the Mapping table, the schema
    automatically appears in the Column
    part.

    use_case_tfileinputjson_4.png

    In the JSONPath query column, enter the
    following queries:

    • For the columns type and
      movie_name, enter the JSONPath
      queries “$.movieCollection[*].type” and “$.movieCollection[*].name”
      respectively. They correspond to the first nodes of the JSON
      data.

      Here, “$.movieCollection[*]”
      stands for the root node relative to the nodes type and name, namely movieCollection.

    • For the columns release,
      rating and starring, enter the JSONPath queries
      “$..release”, “$..rating” and “$..starring” respectively.

      Here, “..” stands for a
      recursive descent, matching the nodes release, rating and starring
      nested under each details node.

  8. Double-click tLogRow to display the
    Basic settings view and select
    Table (print values in cells of a
    table)
    for a better display of the results.

    use_case_tfileinputjson_5.png

Executing the Job

  1. Press Ctrl+S to save the Job.

  2. Press F6 to execute the Job.

    use_case_tfileinputjson_6.png

    As shown above, the source JSON data is collected in a flat file
    table.

Scenario 2: Extracting JSON data from a file using XPath

In this scenario, the tFileInputJSON component reads
the JSON data from a file using XPath queries and the tLogRow component shows the flat data extracted.

The JSON file contains information about a movie collection.

Linking the components

  1. Drop tFileInputJSON and tLogRow from the Palette onto the Job designer.

  2. Rename tFileInputJSON as read_JSON_data and tLogRow as show_data.

  3. Link the components using a Row >
    Main connection.

    use_case_tfileinputjson_1.png

Configuring the components

  1. Double-click tFileInputJSON to open its
    Basic settings view:

    use_case_tfileinputjson_7.png
  2. Click the […] button next to the
    Edit schema field to open the schema
    editor.

    use_case_tfileinputjson_3.png
  3. Click the [+] button to add five columns,
    namely type, movie_name, release,
    rating and starring, all of type String except the column rating, which is of type Double.

    Click OK to close the editor.

  4. In the pop-up Propagate box, click
    Yes to propagate the schema to the
    subsequent components.

  5. In the Filename field, enter the path to
    the JSON file.

    In this example, the JSON file is as follows:

  6. Make sure that the Read By XPath check
    box is selected.

  7. In the Loop JSONPath query field, enter
    “/movieCollection/details”.

  8. In the Mapping table, the schema
    automatically appears in the Column
    part.

    use_case_tfileinputjson_8.png

    In the XPath query column, enter the
    following queries:

    • For the columns type and
      movie_name, enter the XPath queries
      “../type” and “../name” respectively. They correspond
      to the first nodes of the JSON data.

    • For the columns release,
      rating and starring, enter the XPath queries
      “release”, “rating” and “starring” respectively.

  9. Double-click tLogRow to display the
    Basic settings view and select
    Table (print values in cells of a
    table)
    for a better display of the results.

    use_case_tfileinputjson_5.png
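
The loop-plus-relative-query mechanics of steps 7 and 8 can be mimicked in a short sketch. Python has no XPath engine for JSON, so the hypothetical snippet below reproduces the same navigation with plain dict access; the sample data is an assumption shaped like the scenario's movie collection.

```python
# Illustrative data shaped like the scenario's movie collection (an assumption).
movie_collection = [
    {"type": "fiction", "name": "Movie A",
     "details": {"release": "2003", "rating": 8.0, "starring": "Actor A"}},
]

rows = []
# Loop query "/movieCollection/details": each iteration is one details node.
for movie in movie_collection:
    details = movie["details"]
    rows.append({
        "type": movie["type"],            # "../type": step up to the parent
        "movie_name": movie["name"],      # "../name"
        "release": details["release"],    # "release": relative to the loop node
        "rating": details["rating"],      # "rating"
        "starring": details["starring"],  # "starring"
    })

print(rows[0]["movie_name"])  # Movie A
```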

Executing the Job

  1. Press Ctrl+S to save the Job.

  2. Press F6 to execute the Job.

    use_case_tfileinputjson_6.png

    As shown above, the source JSON data is collected in a flat file
    table.

Scenario 3: Extracting JSON data from a URL

In this scenario, tFileInputJSON retrieves the
friends node from a JSON file that contains the
data of a Facebook user and tExtractJSONFields extracts
the data from the friends node for flat data
output.

Note that the JSON file is deployed on the Tomcat server, specifically, located in the
folder <tomcat path>/webapps/docs.

Linking the components

  1. Drop the following components from the Palette onto the design workspace: tFileInputJSON, tExtractJSONFields and tLogRow.

  2. Link tFileInputJSON and tExtractJSONFields using a Row > Main connection.

  3. Link tExtractJSONFields and tLogRow using a Row > Main connection.

    use_case_tfileinputjson_2_1.png

Configuring the components

  1. Double-click tFileInputJSON to display
    its Basic settings view.

    use_case_tfileinputjson_2_2.png
  2. Click the […] button next to the
    Edit schema field to open the schema
    editor.

    use_case_tfileinputjson_2_3.png

    Click the [+] button to add one column,
    namely friends, of the String type.

    Click OK to close the editor.

  3. Clear the Read by XPath check box and
    select the Use URL check box.

    In the URL field, enter the JSON file
    URL, “http://localhost:8080/docs/facebook.json” in this
    case.

    The JSON file is as follows:

  4. Enter the URL in a browser. If the Tomcat server is running, the browser
    displays:

    use_case_tfileinputjson_2_9.png
  5. In the Studio, in the Mapping table,
    enter the JSONPath query “$.user.friends[*]” next to the friends column to retrieve the entire friends node from the source file.

  6. Double-click tExtractJSONFields to
    display its Basic settings view.

    use_case_tfileinputjson_2_4.png
  7. Click the […] button next to the
    Edit schema field to open the schema
    editor.

    use_case_tfileinputjson_2_5.png
  8. Click the [+] button in the right panel
    to add five columns, namely id, name, like_id, like_name and
    like_category, which will hold the
    data of relevant nodes in the JSON field friends.

    Click OK to close the editor.

  9. In the pop-up Propagate box, click
    Yes to propagate the schema to the
    subsequent components.

    use_case_tfileinputjson_2_6.png
  10. In the Loop XPath query field, enter
    “/likes/data”.

  11. In the Mapping table, type in the queries
    of the JSON nodes in the XPath query
    column. The data of those nodes will be extracted and passed to their
    counterpart columns defined in the output schema.

  12. Specifically, define the XPath query “../../id” (querying the “/friends/id” node) for the column id, “../../name”
    (querying the “/friends/name” node) for
    the column name, “id” for the column like_id, “name” for the
    column like_name, and “category” for the column like_category.

  13. Double-click tLogRow to display its
    Basic settings view.

    use_case_tfileinputjson_2_7.png

    Select Table (print values in cells of a
    table)
    for a better display of the results.
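
The two-level step-up of step 12 can be pictured with a short sketch. As before, this is an illustrative Python snippet (an assumption shaped like the scenario's JSON), not Talend's implementation: with the loop set to /likes/data, "../../id" reaches two levels up to the friend record itself.

```python
# Illustrative "friends" record shaped like the scenario's JSON (an assumption).
friend = {
    "id": "100000",
    "name": "Kelly Clarkson",
    "likes": {"data": [
        {"id": "1", "name": "Music", "category": "Interest"},
        {"id": "2", "name": "Movies", "category": "Interest"},
    ]},
}

rows = []
# Loop query "/likes/data": each iteration is one entry of the likes array.
for like in friend["likes"]["data"]:
    rows.append({
        "id": friend["id"],                # "../../id": two levels up
        "name": friend["name"],            # "../../name"
        "like_id": like["id"],             # "id": relative to the loop node
        "like_name": like["name"],         # "name"
        "like_category": like["category"], # "category"
    })

print(len(rows))  # 2
```

Each like thus produces one flat output row that repeats the parent friend's id and name.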

Executing the Job

  1. Press Ctrl + S to save the Job.

  2. Press F6 to execute the Job.

    use_case_tfileinputjson_2_8.png

    As shown above, the friends data of the Facebook user Kelly Clarkson is
    extracted correctly.


Document retrieved from Talend: https://help.talend.com