Component Family |
File / Input |
|
Function |
tFileInputJSON extracts JSON data If you have subscribed to one of the Talend solutions with Big Data, you are |
|
Purpose |
tFileInputJSON extracts JSON data |
|
Basic settings |
Property type |
Either Built-in or Repository. Since version 5.6, both the Built-In mode and the Repository mode are |
|
|
Built-in: No property data stored |
|
|
Repository: Select the repository |
|
Schema and Edit |
A schema is a row description. It defines the number of fields to Since version 5.6, both the Built-In mode and the Repository mode are Click Edit schema to make changes to the schema. If the
|
|
|
Built-in: The schema will be |
|
|
Repository: The schema already |
|
Read By XPath |
This check box is selected by default. It allows you to show the |
|
Use URL |
Select this check box to retrieve data directly from the Web. URL: type in the URL path from |
|
Filename |
This field is not available if you select the Use URL check box. Click the […] button next to |
|
Loop JSONPath query |
JSONPath query to specify the loop node of the JSON data. Available when Read by XPath is |
|
Mapping |
Column: shows the schema defined
JSONPath Query: specifies the JSON
Get nodes: available when Read by XPath is selected. Select this |
Die on error |
Select this check box to stop the execution of the Job when an |
|
Advanced settings |
Advanced separator (for numbers) |
Select this check box to modify the separators used for
Thousands separator: define
Decimal separator: define |
Validate date |
Select this check box to check the date format strictly against |
|
|
Encoding |
Select the encoding type from the list or select Custom and define it manually. This field |
|
tStatCatcher Statistics |
Select this check box to gather the Job processing metadata at a |
Global Variables |
NB_LINE: the number of rows processed. This is an After ERROR_MESSAGE: the error message generated by the A Flow variable functions during the execution of a component while an After variable To fill up a field or expression with a variable, press Ctrl + For further information about variables, see Talend Studio |
|
Usage |
This component is a start component of a Job and always needs an |
|
Usage in Map/Reduce Jobs |
In a Talend Map/Reduce Job, it is used as a start component and requires You need to use the Hadoop Configuration tab in the For further information about a Talend Map/Reduce Job, see the sections Note that in this documentation, unless otherwise explicitly stated, a scenario presents |
|
Log4j |
The activity of this component can be logged using the log4j feature. For more information on this feature, see Talend Studio User For more information on the log4j logging levels, see the Apache documentation at http://logging.apache.org/log4j/1.2/apidocs/org/apache/log4j/Level.html. |
Warning
The information in this section is only for users that have subscribed to one of
the Talend solutions with Big Data and is not applicable to
Talend Open Studio for Big Data users.
In a Talend Map/Reduce Job, tFileInputJSON, as well as the whole Map/Reduce Job using it, generates
native Map/Reduce code. This section presents the specific properties of tFileInputJSON when it is used in that situation. For further
information about a Talend Map/Reduce Job, see the Talend Big Data Getting Started Guide.
Component family |
MapReduce / Input |
|
Function |
In a Map/Reduce Job, tFileInputJSON extracts data from one or more JSON |
|
Basic settings |
Property type |
Either Built-in or Repository. |
Built-in: no property data stored |
||
Repository: reuse properties The fields that come after are pre-filled in using the fetched For further information about the File |
||
Schema and Edit |
A schema is a row description. It defines the number of fields to be processed and passed on Click Edit schema to make changes to the schema. If the
|
|
Built-In: You create and store the schema locally for this |
||
Repository: You have already created the schema and |
||
|
Folder/File |
Enter the path to the file or folder on HDFS from which the data If the path you entered points to a folder, all files stored in If the file to be read is a compressed one, enter the file name
Note that you need |
|
Die on error |
Select this check box to stop the execution of the Job when an error occurs. Clear the check box to skip any rows on error and complete the process for error-free rows. |
Loop XPath query |
Node within the JSON field, on which the loop is based. |
|
Mapping |
Complete the Mapping table to
|
|
Advanced settings |
Advanced separator (for numbers) |
Select this check box to change the separator used for numbers. By |
Validate date |
Select this check box to check the date format strictly against |
|
Encoding |
Select the encoding from the list or select Custom and define it manually. |
|
Global Variables |
ERROR_MESSAGE: the error message generated by the A Flow variable functions during the execution of a component while an After variable To fill up a field or expression with a variable, press Ctrl + For further information about variables, see Talend Studio |
|
Usage |
In a Talend Map/Reduce Job, it is used as a start component and requires Once a Map/Reduce Job is opened in the workspace, tFileInputJSON as well as the MapReduce For further information about a Talend Map/Reduce Job, see the sections Note that in this documentation, unless otherwise explicitly stated, a scenario presents |
|
Hadoop Connection |
You need to use the Hadoop Configuration tab in the This connection is effective on a per-Job basis. |
|
Prerequisites |
The Hadoop distribution must be properly installed, so as to guarantee the interaction
For further information about how to install a Hadoop distribution, see the manuals |
In this scenario, the tFileInputJSON component reads
the JSON data from a file using JSONPath queries and the tLogRow component shows the flat data extracted.
The JSON file contains information about a movie collection.
-
Drop tFileInputJSON and tLogRow from the Palette onto the Job designer.
-
Rename tFileInputJSON as read_JSON_data and tLogRow as show_data.
-
Link the components using a Row >
Main connection.
-
Double-click tFileInputJSON to open its
Basic settings view.
-
Click the […] button next to the
Edit schema field to open the schema
editor.
-
Click the [+] button to add five columns,
namely type, movie_name, release,
rating and starring, with the type of String except for the column rating, which is Double. Click OK to close the editor.
-
In the pop-up Propagate box, click
Yes to propagate the schema to the
subsequent components.
-
In the Filename field, fill in the path
to the JSON file. In this example, the JSON file is as follows:
{"movieCollection": [
    {
        "type": "Action Movie",
        "name": "Brave Heart",
        "details": {
            "release": "1995",
            "rating": "5",
            "starring": "Mel Gibson"
        }
    },
    {
        "type": "Action Movie",
        "name": "Edge of Darkness",
        "details": {
            "release": "2010",
            "rating": "5",
            "starring": "Mel Gibson"
        }
    }
]}
-
Clear the Read By XPath check box.
-
In the Mapping table, the schema
automatically appears in the Column
part. In the JSONPath query column, enter the
following queries:
-
For the columns type and
movie_name, enter the JSONPath
queries “$.movieCollection[*].type” and “$.movieCollection[*].name”
respectively. They correspond to the first nodes of the JSON
data. Here, “$.movieCollection[*]”
stands for the root node relative to the nodes type and name, namely movieCollection.
-
For the columns release,
rating and starring, enter the JSONPath queries
“$..release”, “$..rating” and “$..starring” respectively. Here, “..” stands for the
recursive descent into the details
node, which reaches its child nodes release, rating and starring.
-
Double-click tLogRow to display the
Basic settings view and select
Table (print values in cells of a
table) for a better display of the results.
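The JSONPath mapping in this scenario can be approximated in a few lines of Python, which may help when checking what each query is expected to return. This is only a sketch under stated assumptions: it is not the code Talend generates, the helper names find_recursive and extract_rows are hypothetical, and pairing the per-column results by position to build rows is an assumption about how the component assembles its output.

```python
import json

# The sample file from this scenario, inlined so the sketch is self-contained.
SAMPLE = """
{"movieCollection": [
  {"type": "Action Movie", "name": "Brave Heart",
   "details": {"release": "1995", "rating": "5", "starring": "Mel Gibson"}},
  {"type": "Action Movie", "name": "Edge of Darkness",
   "details": {"release": "2010", "rating": "5", "starring": "Mel Gibson"}}
]}
"""


def find_recursive(node, key):
    """Collect every value stored under `key` at any depth (like `$..key`)."""
    found = []
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                found.append(v)
            found.extend(find_recursive(v, key))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_recursive(item, key))
    return found


def extract_rows(document):
    """Evaluate one query per column, then pair the results by position."""
    movies = document["movieCollection"]
    columns = {
        "type": [m["type"] for m in movies],        # $.movieCollection[*].type
        "movie_name": [m["name"] for m in movies],  # $.movieCollection[*].name
        "release": find_recursive(document, "release"),    # $..release
        # The rating column is declared as Double in the schema.
        "rating": [float(r) for r in find_recursive(document, "rating")],
        "starring": find_recursive(document, "starring"),  # $..starring
    }
    names = list(columns)
    return [dict(zip(names, values)) for values in zip(*columns.values())]


rows = extract_rows(json.loads(SAMPLE))
for row in rows:
    print(row)
```

Each printed dictionary corresponds to one row that tLogRow would display, with one entry per schema column.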
In this scenario, the tFileInputJSON component reads
the JSON data from a file using XPath queries and the tLogRow component shows the flat data extracted.
The JSON file contains information about a movie collection.
-
Drop tFileInputJSON and tLogRow from the Palette onto the Job designer.
-
Rename tFileInputJSON as read_JSON_data and tLogRow as show_data.
-
Link the components using a Row >
Main connection.
-
Double-click tFileInputJSON to open its
Basic settings view.
-
Click the […] button next to the
Edit schema field to open the schema
editor.
-
Click the [+] button to add five columns,
namely type, movie_name, release,
rating and starring, with the type of String except for the column rating, which is Double. Click OK to close the editor.
-
In the pop-up Propagate box, click
Yes to propagate the schema to the
subsequent components.
-
In the Filename field, enter the path to
the JSON file. In this example, the JSON file is as follows:
{"movieCollection": [
    {
        "type": "Action Movie",
        "name": "Brave Heart",
        "details": {
            "release": "1995",
            "rating": "5",
            "starring": "Mel Gibson"
        }
    },
    {
        "type": "Action Movie",
        "name": "Edge of Darkness",
        "details": {
            "release": "2010",
            "rating": "5",
            "starring": "Mel Gibson"
        }
    }
]}
-
Make sure that the Read By XPath check
box is selected.
-
In the Loop JSONPath query field, enter
“/movieCollection/details”.
-
In the Mapping table, the schema
automatically appears in the Column
part. In the XPath query column, enter the
following queries:
-
For the columns type and
movie_name, enter the XPath queries
“../type” and “../name” respectively. They correspond
to the first nodes of the JSON data, one level above the loop node.
-
For the columns release,
rating and starring, enter the XPath queries
“release”, “rating” and “starring” respectively.
-
Double-click tLogRow to display the
Basic settings view and select
Table (print values in cells of a
table) for a better display of the results.
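The XPath-style mapping in this scenario can be pictured as a loop over the details node, where a plain name reads a child of the loop node and “../” steps back to its parent. The Python sketch below illustrates that reading; it is not the code Talend generates, and the function name loop_rows is hypothetical.

```python
import json

# The sample file from this scenario, inlined so the sketch is self-contained.
SAMPLE = """
{"movieCollection": [
  {"type": "Action Movie", "name": "Brave Heart",
   "details": {"release": "1995", "rating": "5", "starring": "Mel Gibson"}},
  {"type": "Action Movie", "name": "Edge of Darkness",
   "details": {"release": "2010", "rating": "5", "starring": "Mel Gibson"}}
]}
"""


def loop_rows(document):
    """Emit one flat row per /movieCollection/details loop node."""
    rows = []
    for movie in document["movieCollection"]:
        details = movie["details"]  # the loop node for this iteration
        rows.append({
            "type": movie["type"],                # ../type (parent of details)
            "movie_name": movie["name"],          # ../name (parent of details)
            "release": details["release"],        # release (child of details)
            "rating": float(details["rating"]),   # rating, Double in the schema
            "starring": details["starring"],      # starring (child of details)
        })
    return rows


rows = loop_rows(json.loads(SAMPLE))
for row in rows:
    print(row)
```

Because every column is resolved relative to the same loop node, the values in a row always come from the same movie.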
In this scenario, tFileInputJSON retrieves the
friends node from a JSON file that contains the
data of a Facebook user and tExtractJSONFields extracts
the data from the friends node for flat data
output.
Note that the JSON file is deployed on the Tomcat server, in the
folder <tomcat path>/webapps/docs.
-
Drop the following components from the Palette onto the design workspace: tFileInputJSON, tExtractJSONFields and tLogRow.
-
Link tFileInputJSON and tExtractJSONFields using a Row > Main connection.
-
Link tExtractJSONFields and tLogRow using a Row > Main connection.
-
Double-click tFileInputJSON to display
its Basic settings view.
-
Click the […] button next to the
Edit schema field to open the schema
editor. Click the [+] button to add one column,
namely friends, of the String type. Click OK to close the editor.
-
Clear the Read By XPath check box and
select the Use URL check box. In the URL field, enter the JSON file
URL, “http://localhost:8080/docs/facebook.json” in this
case. The JSON file is as follows:
{ "user": {
    "id": "9999912398",
    "name": "Kelly Clarkson",
    "friends": [
        {
            "name": "Tom Cruise",
            "id": "55555555555555",
            "likes": {"data": [
                {
                    "category": "Movie",
                    "name": "The Shawshank Redemption",
                    "id": "103636093053996",
                    "created_time": "2012-11-20T15:52:07+0000"
                },
                {
                    "category": "Community",
                    "name": "Positiveretribution",
                    "id": "471389562899413",
                    "created_time": "2012-12-16T21:13:26+0000"
                }
            ]}
        },
        {
            "name": "Tom Hanks",
            "id": "88888888888888",
            "likes": {"data": [
                {
                    "category": "Journalist",
                    "name": "Janelle Wang",
                    "id": "136009823148851",
                    "created_time": "2013-01-01T08:22:17+0000"
                },
                {
                    "category": "Tv show",
                    "name": "Now With Alex Wagner",
                    "id": "305948749433410",
                    "created_time": "2012-11-20T06:14:10+0000"
                }
            ]}
        }
    ]
}}
-
Enter the URL in a browser. If the Tomcat server is running, the browser
displays the content of the JSON file.
-
In the Studio, in the Mapping table,
enter the JSONPath query “$.user.friends[*]” next to the friends column, retrieving the entire friends node from the source file.
-
Double-click tExtractJSONFields to
display its Basic settings view.
-
Click the […] button next to the
Edit schema field to open the schema
editor.
-
Click the [+] button in the right panel
to add five columns, namely id, name, like_id, like_name and
like_category, which will hold the
data of relevant nodes in the JSON field friends. Click OK to close the editor.
-
In the pop-up Propagate box, click
Yes to propagate the schema to the
subsequent components.
-
In the Loop XPath query field, enter
“/likes/data”.
-
In the Mapping table, type in the queries
of the JSON nodes in the XPath query
column. The data of those nodes will be extracted and passed to their
counterpart columns defined in the output schema.
-
Specifically, define the XPath query “../../id” (querying the “/friends/id” node) for the column id, “../../name”
(querying the “/friends/name” node) for
the column name, “id” for the column like_id, “name” for the
column like_name, and “category” for the column like_category.
-
Double-click tLogRow to display its
Basic settings view. Select Table (print values in cells of a
table) for a better display of the results.
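The two-stage extraction in this scenario can be sketched in Python to show which values each query is expected to pick up. This is an illustration under assumptions, not Talend's generated code: the function names read_friends and extract_likes are hypothetical, the sample is inlined instead of being fetched from the Tomcat URL, and passing each friend to the second stage as a JSON string in the friends column is an assumption about how the two components exchange data.

```python
import json

# The sample file from this scenario, inlined so the sketch is self-contained.
SAMPLE = """
{"user": {"id": "9999912398", "name": "Kelly Clarkson", "friends": [
  {"name": "Tom Cruise", "id": "55555555555555", "likes": {"data": [
    {"category": "Movie", "name": "The Shawshank Redemption",
     "id": "103636093053996", "created_time": "2012-11-20T15:52:07+0000"},
    {"category": "Community", "name": "Positiveretribution",
     "id": "471389562899413", "created_time": "2012-12-16T21:13:26+0000"}]}},
  {"name": "Tom Hanks", "id": "88888888888888", "likes": {"data": [
    {"category": "Journalist", "name": "Janelle Wang",
     "id": "136009823148851", "created_time": "2013-01-01T08:22:17+0000"},
    {"category": "Tv show", "name": "Now With Alex Wagner",
     "id": "305948749433410", "created_time": "2012-11-20T06:14:10+0000"}]}}
]}}
"""


def read_friends(document):
    """Stage 1 ($.user.friends[*]): one row per friend, kept as a JSON string."""
    return [{"friends": json.dumps(f)} for f in document["user"]["friends"]]


def extract_likes(friend_rows):
    """Stage 2: loop on /likes/data inside each friends value."""
    rows = []
    for row in friend_rows:
        friend = json.loads(row["friends"])
        for like in friend["likes"]["data"]:      # loop node: /likes/data
            rows.append({
                "id": friend["id"],               # ../../id
                "name": friend["name"],           # ../../name
                "like_id": like["id"],            # id
                "like_name": like["name"],        # name
                "like_category": like["category"],  # category
            })
    return rows


rows = extract_likes(read_friends(json.loads(SAMPLE)))
for row in rows:
    print(row)
```

With this sample, the loop yields four rows, two for each friend, and the friend's id and name are repeated on every row that comes from that friend's likes.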