then sends these fields as defined in the Schema to the next component.
tFileStreamInputXML properties for Apache Spark Streaming
These properties are used to configure tFileStreamInputXML running in the Spark Streaming Job framework.
The Spark Streaming tFileStreamInputXML component belongs to the File family.
Define a storage configuration
Select the configuration component to be used to provide the configuration information for the connection to the target file system such as HDFS.
If you leave this check box clear, the target file system is the local system.
The configuration component to be used must be present in the same Job.
Property type
Either Built-In or Repository.
Built-In: No property data stored centrally.
Repository: Select the repository file where the properties are stored.
The properties are stored centrally under the Hadoop Cluster node of the Repository tree.
The fields that come after are pre-filled in using the fetched data.
For further information about the Hadoop Cluster node, see the Talend Studio documentation.
Schema and Edit schema
A schema is a row description. It defines the number of fields (columns) to be processed and passed on to the next component.
Built-In: You create and store the schema locally for this component only.
Repository: You have already created the schema and stored it in the Repository. You can reuse it in various projects and Job designs.
Folder/File
Browse to, or enter the path pointing to the data to be used in the file system.
If the path you set points to a folder, this component will
read all of the files stored in that folder, for example,
/user/talend/in; if sub-folders exist, the sub-folders are automatically
ignored unless you define the property
spark.hadoop.mapreduce.input.fileinputformat.input.dir.recursive to be
true in the Advanced properties table in the
Spark configuration tab.
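For reference, the property described above can be written in the Advanced properties table as the following key/value pair (this is a sketch of the entry, not generated code):

```
spark.hadoop.mapreduce.input.fileinputformat.input.dir.recursive=true
```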
If you want to specify more than one file or directory in this field, separate each path using a comma (,).
If the file to be read is a compressed one, enter the file name with its extension; this component then automatically decompresses it at runtime.
The button for browsing does not work with the Spark Local mode; if you are using the other Spark modes, ensure that the connection to the target file system is properly configured in the configuration component used in the same Job.
Element to extract
Enter the element from which you need to read the contents and the child elements.
The element defined in this field is used as the root node of any XML tree this component processes.
Note that any content outside this element is ignored.
Loop XPath query
Node of the tree on which the loop is based.
Note its root is the element you have defined in the Element to extract field.
Column: Columns to map. They reflect the schema as defined in the schema editor.
XPath Query: Enter the fields to be extracted from the structured input.
Get nodes: Select this check box to extract the XML content of the nodes specified in the XPath query list. This is useful when the output flow needs to use the XML structure, that is, the Document type.
For further information about the Document type, see the Talend Studio documentation.
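The interplay of the Element to extract field, the Loop XPath query, and the per-column XPath queries can be sketched in plain Python with the standard library. This is an illustration of the extraction logic only, not Talend-generated code; the XML structure, element names, and queries below are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: <customers> plays the role of the "Element to extract";
# anything outside it (here, <ignored>) is skipped.
xml_data = """
<feed>
  <customers>
    <customer><name>Ada</name><city>London</city></customer>
    <customer><name>Linus</name><city>Helsinki</city></customer>
  </customers>
  <ignored>content outside the extracted element is skipped</ignored>
</feed>
"""

root = ET.fromstring(xml_data)
extracted = root.find("customers")          # "Element to extract"

rows = []
for node in extracted.findall("customer"):  # "Loop XPath query": one row per match
    rows.append({
        "name": node.findtext("name"),      # per-column "XPath Query",
        "city": node.findtext("city"),      # relative to the loop node
    })

print(rows)
# [{'name': 'Ada', 'city': 'London'}, {'name': 'Linus', 'city': 'Helsinki'}]
```

Each iteration of the loop node yields one output row, and each column query is evaluated relative to the current loop node, which mirrors how the mapping table above is organized.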
Die on error
Select the check box to stop the execution of the Job when an error occurs.
Custom encoding
You may encounter encoding issues when you process the stored data. In that situation, select this check box to display the Encoding list.
Select the encoding from the list or select Custom and define it manually.
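As a minimal illustration of why the selected encoding matters (plain Python, not Talend code; the sample string is hypothetical): decoding bytes with a mismatched charset produces garbled characters, which is the kind of issue this option addresses.

```python
# The same bytes decoded with a matching and a mismatched encoding.
raw = "café".encode("iso-8859-1")             # data stored as Latin-1 bytes

ok = raw.decode("iso-8859-1")                 # matching encoding
bad = raw.decode("utf-8", errors="replace")   # mismatched encoding

print(ok)   # café
print(bad)  # caf\ufffd -- the 0xE9 byte is not valid UTF-8
```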
This component is used as a start component and requires an output link.
This component, along with the Spark Streaming component Palette it belongs to, appears only when you are creating a Spark Streaming Job.
Note that in this documentation, unless otherwise explicitly stated, a scenario presents only Standard Jobs, that is to say traditional Talend data integration Jobs.
In the Spark
Configuration tab in the Run
view, define the connection to a given Spark cluster for the whole Job. In
addition, since the Job expects its dependent jar files for execution, you must
specify the directory in the file system to which these jar files are
transferred so that Spark can access these files.
This connection is effective on a per-Job basis.
No scenario is available for the Spark Streaming version of this component yet.