tFileInputMSXML
Reads a multi-structure XML file and sends the fields as defined in the different schemas to the next components using Row
connections.
tFileInputMSXML Standard properties
These properties are used to configure tFileInputMSXML running in the Standard Job framework.
The Standard
tFileInputMSXML component belongs to the File and the XML families.
The component in this framework is available in all Talend
products.
Basic settings
File Name |
Name of the file and/or the variable to be processed. Warning: Use an absolute path (instead of a relative path) for
this field to avoid possible errors. |
Root XPath query |
The root of the XML tree, which the query is based on. |
Enable XPath in column “Schema XPath loop” but lose the order |
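Select this check box if you want to define an XPath path in the Schema XPath loop field of the Outputs table. The order of the data in the source XML file is then not kept. Warning:
This option takes effect only if you select the Dom4j generation mode in the Advanced settings. |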
Outputs |
Schema: Define as many schemas as needed.
Schema XPath loop: Enter the node of the XML tree on which the loop is based.
XPath Queries: Enter the fields to be extracted from the structured XML input.
Create empty row: Select this |
Die on error |
Select this check box to stop the execution of the Job when an error occurs. |
Advanced settings
Trim all column |
Select this check box to remove leading and trailing whitespace from all columns. |
Validate date |
Select this check box to check the date format strictly against |
Ignore DTD file | Select this check box to ignore the DTD file indicated in the XML file being processed. |
Generation mode |
Select the appropriate generation mode according to your memory
|
Encoding |
Select the encoding type from the list or select CUSTOM and define it manually. This field |
tStatCatcher Statistics |
Select this check box to gather the Job processing metadata at the Job level as well as at each component level. |
Global Variables
Global Variables |
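NB_LINE: the number of rows processed. This is an After variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the component when an error occurs. This is an After variable and it returns a string.
A Flow variable functions during the execution of a component while an After variable functions after the execution of the component.
To fill up a field or expression with a variable, press Ctrl + Space to access the variable list and choose the variable to use from it. |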
Reading a multi-structure XML file
The following scenario describes a Job which reads a multi-structure XML file,
extracts the desired fields and displays them on the console.
Designing the Job
- Drop a tFileInputMSXML component from the Palette onto the design workspace and double-click the component to open its Basic settings view in the Component tab.
- Browse to the XML file you want to process. In this example, it is D:/Input/multischema_xml.xml, which contains the following data:
    <root>
      <toy>Cat</toy>
      <record>We Belong Together</record>
      <book>As You Like It</book>
      <book>All's Well That Ends Well</book>
      <record>When You Believe</record>
      <toy>Dog</toy>
    </root>
- In the Root XPath query field, enter the root of the XML tree, which the query will be based on. In this example, it is “/root”.
- Select the Enable XPath in column “Schema XPath loop” but lose the order check box. In this example, to extract the desired fields, you need to define an XPath path in the Schema XPath loop field in the Outputs table for each output flow; the order of the data in the source XML file is not kept.
- Click the plus button to add lines in the Outputs table where you can define the output schemas, record and book in this example.
- In the Outputs table, click in the Schema cell and then click the three-dot button to display a dialog box where you can define the schema name. Enter a name for the output schema and click OK to close the dialog box.
- The tFileInputMSXML schema editor appears. Define the schema according to your needs.
- Do the same to define the output schema record.
- In the Schema XPath loop cell, enter the node of the XML tree on which the loop is based. In this example, enter “/book” and “/record” respectively.
- In the XPath Queries cell, enter the fields to be extracted from the structured XML input. In this example, enter the XPath query “.”.
- In the design workspace, drop two tLogRow components from the Palette and connect tFileInputMSXML to tLogRow1 and tLogRow2 using the book and record links respectively. Rename the two tLogRow components as book and record respectively.
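To make the effect of these settings easier to picture, the following is a minimal Python sketch of the same extraction, written outside Talend with the standard xml.etree.ElementTree module. It is only an illustration under stated assumptions, not the code the component generates: the extract helper and its parameters are hypothetical names, each loop path is assumed to be relative to the root element, and the printed format is arbitrary rather than the tLogRow layout.

```python
import xml.etree.ElementTree as ET

# Sample data copied from the scenario's multischema_xml.xml.
XML_DATA = """<root>
<toy>Cat</toy>
<record>We Belong Together</record>
<book>As You Like It</book>
<book>All's Well That Ends Well</book>
<record>When You Believe</record>
<toy>Dog</toy>
</root>"""


def extract(xml_text, root_tag, loops):
    """Hypothetical helper: return {schema name: [text of each looped node]}.

    root_tag plays the role of the Root XPath query ("/root" in the scenario).
    loops maps an output schema name to its Schema XPath loop value.
    """
    root = ET.fromstring(xml_text)
    if root.tag != root_tag:
        raise ValueError(f"expected root element <{root_tag}>, found <{root.tag}>")
    result = {}
    for schema_name, loop_path in loops.items():
        # Assumption: "/book" is resolved relative to the root element.
        nodes = root.findall(loop_path.lstrip("/"))
        # The XPath query "." selects the looped node itself, so read its text.
        result[schema_name] = [node.text for node in nodes]
    return result


rows = extract(XML_DATA, "root", {"book": "/book", "record": "/record"})
for schema_name, values in rows.items():
    for value in values:
        print(f"{schema_name}|{value}")
```

The toy elements are never returned because no output schema loops on them, which mirrors the scenario where only book and record are defined in the Outputs table.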
Saving and executing the Job
- Press Ctrl+S to save your Job.
- Execute the Job by pressing F6 or clicking Run on the Run tab. The multi-structure XML file is read row by row and the extracted fields are displayed on the console. The first two fields are for the book schema, and the last two fields are for the record schema.
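The grouping described above differs from the order of the elements in the source file, which is what the "lose the order" part of the check box name refers to. The short Python sketch below contrasts the two orderings for the sample data. It is an illustration only, assuming one pass per output schema; the variable names are hypothetical and it does not reproduce how the component works internally.

```python
import xml.etree.ElementTree as ET

# Sample data copied from the scenario's multischema_xml.xml.
XML_DATA = """<root>
<toy>Cat</toy>
<record>We Belong Together</record>
<book>As You Like It</book>
<book>All's Well That Ends Well</book>
<record>When You Believe</record>
<toy>Dog</toy>
</root>"""

root = ET.fromstring(XML_DATA)
wanted = ("book", "record")  # the two output schemas of the scenario

# Document order: matching nodes in the sequence they appear in the file.
document_order = [(node.tag, node.text) for node in root if node.tag in wanted]

# Grouped order: one pass per schema, so all book values come before
# all record values regardless of their position in the file.
grouped_order = [(tag, node.text) for tag in wanted for node in root.findall(tag)]

print("document order:", document_order)
print("grouped order: ", grouped_order)
```

Both lists hold the same four values; only their sequence differs, with the first record moving from the first position in the file to the third position in the grouped result.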