August 17, 2023

tFileInputMSXML – Docs for ESB 5.x

tFileInputMSXML


tFileInputMSXML Properties

Component family

XML or File/Input

 

Function

tFileInputMSXML reads an XML structured file
and outputs the multiple schemas it contains.

Purpose

tFileInputMSXML opens a complex
multi-structured file, reads its data structures (schemas) and then
uses Row links to send fields as
defined in the different schemas to the next Job components.

Basic settings

File Name

Name of the file and/or the variable to be processed.

For further information about how to define and use a variable in
a Job, see Talend Studio
User Guide.

 

Root XPath query

The root of the XML tree, which the query is based on.

 

Enable XPath in column “Schema XPath loop” but lose the
order

Select this check box if you want to define an XPath path in the
Schema XPath loop field of the
Outputs table. When it is selected, the output rows no longer
follow the order of the data in the source XML file.

Warning

This option takes effect only if you select the Dom4j generation mode in the
Advanced settings
view.
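
One way to picture why the order is lost is the minimal sketch below. It uses the JDK's javax.xml.xpath API as a stand-in and is not the code the component generates; the class name, method names and sample XML are hypothetical. Evaluating one loop XPath per schema returns all rows of the first schema followed by all rows of the second, whereas a single pass over the children of the root keeps the document order.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class LoopOrderSketch {
    // Hypothetical sample data: book and record elements are interleaved.
    static final String XML =
        "<root><book>B1</book><record>R1</record><book>B2</book><record>R2</record></root>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // One XPath evaluation per schema: rows come out grouped by schema.
    static List<String> perSchema(String xml, String rootXPath, String... loops) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            Document doc = parse(xml);
            List<String> rows = new ArrayList<>();
            for (String loop : loops) {
                NodeList nodes = (NodeList) xpath.evaluate(
                        rootXPath + loop, doc, XPathConstants.NODESET);
                for (int i = 0; i < nodes.getLength(); i++) {
                    Node n = nodes.item(i);
                    rows.add(n.getNodeName() + ":" + n.getTextContent());
                }
            }
            return rows;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // A single pass over the children of the root element: document order is kept.
    static List<String> documentOrder(String xml) {
        List<String> rows = new ArrayList<>();
        Node n = parse(xml).getDocumentElement().getFirstChild();
        for (; n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                rows.add(n.getNodeName() + ":" + n.getTextContent());
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        System.out.println(perSchema(XML, "/root", "/book", "/record"));
        System.out.println(documentOrder(XML));
    }
}
```

With this sample, the per-schema evaluation lists both book rows before both record rows, while the single pass alternates between them as in the file.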

 

Outputs

Schema: Define as many schemas as
needed.

Schema XPath loop: Enter the node
of the XML tree or XPath path which the loop is based on.

XPath Queries: Enter the fields
to be extracted from the structured input.

Create empty row: Select this
check box if you want to create empty rows for the empty field(s) in
the schema.
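
The relationship between the Schema XPath loop and the XPath Queries can be sketched as follows, assuming the JDK's javax.xml.xpath API rather than the component's own generated code. The loop expression selects the nodes that become rows, and each query is evaluated relative to the current loop node to fill one column. The class name, the extract helper and the sample XML are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class OutputsSketch {
    // Hypothetical sample data with two structures under the same root.
    static final String XML = "<root>"
        + "<book id=\"1\"><title>XML Basics</title></book>"
        + "<record id=\"7\"><owner>Ann</owner></record>"
        + "<book id=\"2\"><title>XPath Notes</title></book>"
        + "</root>";

    // Returns one row per node matched by loopXPath; each row holds one
    // value per field query, evaluated relative to the loop node.
    static List<List<String>> extract(String xml, String loopXPath, String... fieldQueries) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList loopNodes = (NodeList) xpath.evaluate(
                    loopXPath, doc, XPathConstants.NODESET);
            List<List<String>> rows = new ArrayList<>();
            for (int i = 0; i < loopNodes.getLength(); i++) {
                List<String> row = new ArrayList<>();
                for (String query : fieldQueries) {
                    row.add(xpath.evaluate(query, loopNodes.item(i)));
                }
                rows.add(row);
            }
            return rows;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // One call per line of the Outputs table (hypothetical schemas).
        System.out.println("book:   " + extract(XML, "/root/book", "@id", "title"));
        System.out.println("record: " + extract(XML, "/root/record", "@id", "owner"));
    }
}
```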

 

Die on error

Select this check box to stop the execution of the Job when an
error occurs. Clear the check box to skip the row on error and
complete the process for error-free rows.
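
The two behaviours can be illustrated with a small, self-contained sketch. It is not the component's implementation: the readRows helper, the integer conversion used to provoke an error, and the sample values are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DieOnErrorSketch {
    // Converts raw field values to integers. A value that cannot be
    // converted either stops processing (dieOnError = true) or is
    // skipped so that the remaining rows are still processed.
    static List<Integer> readRows(List<String> rawValues, boolean dieOnError) {
        List<Integer> rows = new ArrayList<>();
        for (String value : rawValues) {
            try {
                rows.add(Integer.parseInt(value.trim()));
            } catch (NumberFormatException e) {
                if (dieOnError) {
                    throw new IllegalStateException("Row rejected: " + value, e);
                }
                System.err.println("Skipping row: " + e.getMessage());
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String> raw = Arrays.asList("1", "oops", "3");
        // Check box cleared: the faulty row is skipped.
        System.out.println(readRows(raw, false));
        // Check box selected: the first error stops the run.
        try {
            readRows(raw, true);
        } catch (IllegalStateException e) {
            System.out.println("Stopped: " + e.getMessage());
        }
    }
}
```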

Advanced settings

Trim all column

Select this check box to remove leading and trailing whitespace
from defined columns.

 

Validate date

Select this check box to check the date format strictly against
the input schema.
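
As an illustration of what a strict date check means, the sketch below compares lenient and non-lenient parsing with java.text.SimpleDateFormat. Whether the component relies on this exact mechanism is an assumption; the isValid helper, the pattern and the sample values are hypothetical.

```java
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.util.Date;

public class ValidateDateSketch {
    // Returns true when the whole value matches the pattern. In strict
    // mode, impossible dates such as February 30 are rejected; in
    // lenient mode they roll over to a later date and are accepted.
    static boolean isValid(String value, String pattern, boolean strict) {
        SimpleDateFormat format = new SimpleDateFormat(pattern);
        format.setLenient(!strict);
        ParsePosition position = new ParsePosition(0);
        Date parsed = format.parse(value, position);
        return parsed != null && position.getIndex() == value.length();
    }

    public static void main(String[] args) {
        String pattern = "yyyy-MM-dd"; // hypothetical schema date pattern
        System.out.println(isValid("2023-02-28", pattern, true));
        System.out.println(isValid("2023-02-30", pattern, true));
        System.out.println(isValid("2023-02-30", pattern, false));
    }
}
```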

Ignore DTD file

Select this check box to ignore the DTD file indicated
in the XML file being processed.
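
For readers who want to see what ignoring a DTD can look like in plain Java, here is a minimal sketch with the JDK's DOM parser. It is a stand-in technique, not the component's generated code: an entity resolver returns an empty source for every external entity, so the DTD named in the DOCTYPE is never fetched. The class name, the helper and the sample document (including its DTD URL) are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class IgnoreDtdSketch {
    // Hypothetical document that points to a DTD which does not exist.
    static final String XML =
        "<!DOCTYPE root SYSTEM \"http://example.invalid/schema.dtd\">"
        + "<root><book>B1</book></root>";

    // Parses the document while replacing every external entity,
    // including the DTD, with an empty input source.
    static String rootName(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.setEntityResolver(
                    (publicId, systemId) -> new InputSource(new StringReader("")));
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getNodeName();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(rootName(XML));
    }
}
```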

 

Generation mode

Select the appropriate generation mode according to your memory
availability. The available modes are:

  • Slow and memory-consuming
    (Dom4j)

    Note

    This option uses dom4j and is suited to processing
    highly complex XML files.

  • Fast with low memory consumption
    (SAX)
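
The trade-off between the two modes can be sketched with the JDK's own parsers. The DOM parser here stands in for dom4j, which is an assumption made to keep the example free of external libraries; the class name, helpers and sample XML are hypothetical. The tree-based approach loads the whole document into memory before querying it, while the SAX approach handles elements as they stream past and keeps nothing.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class GenerationModeSketch {
    // Hypothetical sample data.
    static final String XML =
        "<root><book>B1</book><record>R1</record><book>B2</book></root>";

    // Tree-based: the full document is built in memory, then queried.
    static int countWithDom(String xml, String elementName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getElementsByTagName(elementName).getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Event-based: elements are counted as the parser reports them.
    static int countWithSax(String xml, String elementName) {
        final int[] count = {0};
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                    new InputSource(new StringReader(xml)),
                    new DefaultHandler() {
                        @Override
                        public void startElement(String uri, String localName,
                                                 String qName, Attributes attributes) {
                            if (qName.equals(elementName)) {
                                count[0]++;
                            }
                        }
                    });
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return count[0];
    }

    public static void main(String[] args) {
        System.out.println(countWithDom(XML, "book"));
        System.out.println(countWithSax(XML, "book"));
    }
}
```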

 

Encoding

Select the encoding type from the list or select CUSTOM and define it manually. This field
is compulsory for DB data handling.

 

tStatCatcher Statistics

Select this check box to gather the Job processing metadata at a
Job level as well as at each component level.

Global Variables

NB_LINE: the number of rows processed. This is an After
variable and it returns an integer.

ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable and it returns a string. This
variable functions only if the Die on error check box is
cleared, if the component has this check box.

A Flow variable functions during the execution of a component while an After variable
functions after the execution of the component.

To fill up a field or expression with a variable, press Ctrl + Space
to access the variable list and choose the variable to use from it.

For further information about variables, see Talend Studio
User Guide.
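
As a rough illustration of reading these After variables once the component has finished, the sketch below uses a plain java.util.Map in place of the Job's global variable map. The key format (component name, underscore, variable name) and the component name tFileInputMSXML_1 are assumptions based on common Talend usage and are not stated on this page; the summarize helper is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

public class GlobalVarsSketch {
    // Builds a short summary from the After variables of one component.
    // Missing entries are treated as "no rows" and "no error".
    static String summarize(Map<String, Object> globalMap, String componentName) {
        Integer nbLine = (Integer) globalMap.get(componentName + "_NB_LINE");
        String error = (String) globalMap.get(componentName + "_ERROR_MESSAGE");
        return "rows=" + (nbLine == null ? 0 : nbLine)
                + ", error=" + (error == null ? "none" : error);
    }

    public static void main(String[] args) {
        // Stand-in for the map a Job would provide (assumed key names).
        Map<String, Object> globalMap = new HashMap<>();
        globalMap.put("tFileInputMSXML_1_NB_LINE", 4);
        System.out.println(summarize(globalMap, "tFileInputMSXML_1"));
    }
}
```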

Log4j

The activity of this component can be logged using the log4j feature. For more information on this feature, see Talend Studio User
Guide.

For more information on the log4j logging levels, see the Apache documentation at http://logging.apache.org/log4j/1.2/apidocs/org/apache/log4j/Level.html.

Limitation

n/a

Scenario: Reading a multi-structure XML file

The following scenario describes a Job which reads a multi-structure XML file,
extracts the desired fields and displays them on the console.

Designing the Job

  1. Drop a tFileInputMSXML component from the
    Palette onto the design workspace and
    double-click the component to open its Basic
    settings
    view in the Component tab.

  2. Browse to the XML file you want to process. In this example, it is
    D:/Input/multischema_xml.xml, which
    contains the following data:

  3. In the Root XPath query field, enter the
    root of the XML tree, which the query will be based on. In this example, it
    is “/root”.

  4. Select the Enable XPath in column “Schema XPath
    loop” but lose the order
    check box.

    In this example, to extract the desired fields, you need to define an XPath
    path in the Schema XPath loop field in the
    Outputs table for each output flow,
    at the cost of not keeping the order of the data in the source XML file.

  5. Click the plus button to add one line to the Outputs table for each output schema you want to define,
    record and book in this
    example.

  6. In the Outputs table, click in the
    Schema cell and then click the three-dot
    button to display a dialog box where you can define the schema name.

    Enter a name for the output schema and click OK to close the dialog box.

  7. The tFileInputMSXML schema editor
    appears.

    Define the schema according to your need.

  8. Do the same to define the output schema record.

  9. In the Schema XPath loop cell, enter the
    node of the XML tree, which the loop is based on. In this example, enter
    “/book” and “/record” respectively.

  10. In the XPath Queries cell, enter the
    fields to be extracted from the structured XML input. In this example, enter
    the XPath query “.”.

  11. In the design workspace, drop two tLogRow
    components from the Palette and connect
    tFileInputMSXML to tLogRow1 and tLogRow2 using the book and
    record links respectively.

    Rename the two tLogRow components as
    book and record respectively.


Saving and executing the Job

  1. Press Ctrl+S to save your Job.

  2. Execute the Job by pressing F6 or
    clicking Run on the Run tab.

    The multi-structure XML file is read row by row and the extracted fields
    are displayed on the console. The first two fields are for the book schema, and the last two fields are for
    the record schema.


Document sourced from Talend: https://help.talend.com