July 30, 2023

tXSDValidator – Docs for ESB 7.x

tXSDValidator

Helps control the data and structure quality of the file or flow to
be processed.

tXSDValidator validates an input XML file or an input XML flow against an XSD file and sends the
validation log to the defined output.
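To make the idea of validating XML against an XSD concrete, here is a minimal standalone sketch that uses the standard javax.xml.validation API. It is not the component's generated code. The class name XsdValidationSketch, the validate helper, and the inline book schema are assumptions made for illustration only.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

class XsdValidationSketch {

    // Hypothetical schema: a <book> element with a <title> child
    // and a required "id" attribute.
    static final String XSD =
        "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
      + "<xs:element name=\"book\">"
      + "<xs:complexType>"
      + "<xs:sequence><xs:element name=\"title\" type=\"xs:string\"/></xs:sequence>"
      + "<xs:attribute name=\"id\" type=\"xs:string\" use=\"required\"/>"
      + "</xs:complexType>"
      + "</xs:element>"
      + "</xs:schema>";

    /** Returns null when the XML is valid, otherwise the parser's error message. */
    static String validate(String xml, String xsd) {
        try {
            SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(xsd)));
            Validator validator = schema.newValidator();
            // With no custom ErrorHandler, the first validation error is thrown.
            validator.validate(new StreamSource(new StringReader(xml)));
            return null;
        } catch (SAXException | IOException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        String ok = "<book id=\"b1\"><title>Example</title></book>";
        String bad = "<book><title>Example</title></book>"; // missing id
        System.out.println("ok  -> " + validate(ok, XSD));
        System.out.println("bad -> " + validate(bad, XSD));
    }
}
```

Returning the message rather than throwing loosely mirrors how the component reports a validation result together with an error text, although the exact wording of the component's messages may differ.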

tXSDValidator Standard properties

These properties are used to configure tXSDValidator running in the Standard Job framework.

The Standard tXSDValidator component belongs to the XML family.

The component in this framework is available in all Talend products.

Basic settings

Mode

Select the validation mode from the drop-down list.

  • File Mode: to validate an
    input file.

  • Flow Mode: to validate an
    input flow.

Schema and Edit schema

A schema is a row description. It defines the number of fields to be
processed and passed on to the next component.

Note that when File Mode is selected
from the Mode list, the schema of this
component is read-only and it contains standard information regarding
the file validation.

XSD file

Specify the path to the XSD reference file. An HTTP URL is also
supported, for example,
http://localhost:8080/book.xsd.

This field is available only when File Mode is selected from the Mode drop-down list.

XML file

Specify the path to the XML file to be validated.

This field is available only when File Mode is selected from the Mode drop-down list.

If XML is valid, display

Type in the message to be displayed on the console if the XML file is
valid.

This field is available only when File Mode is selected from the Mode drop-down list.

If XML is invalid, display

Type in the message to be displayed on the console if the XML file is
invalid.

This field is available only when File Mode is selected from the Mode drop-down list.

Print to console

Select this check box to display the validation message on the
console.

This check box is available only when File Mode is selected from the Mode drop-down list.

Allocate

Click the [+] button to add as many
rows as needed, and in each row set the value of the following
columns:

  • Input Column: click the cell
    and select a column to be validated.

  • XSD File: enter the path to
    the corresponding XSD reference file.

This table is available only when Flow Mode is selected from the Mode drop-down list.

Advanced settings

Enable Features

Click the [+] button to add as many
rows as needed, and in each row enter the feature to be enabled on the
underlying parser between double quotation marks, for example, "http://apache.org/xml/features/honour-all-schemaLocations".

For more information about the features, see https://xerces.apache.org/xerces2-j/features.html.
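As a rough illustration of what enabling a parser feature means, the sketch below sets a feature on a standard SchemaFactory and reports whether the underlying implementation accepted it. How the component applies the features internally is not described here, so the tryEnable helper and the class name ParserFeatureSketch are assumptions. Whether a given feature URI is recognized depends on the parser implementation on the classpath.

```java
import javax.xml.XMLConstants;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;

class ParserFeatureSketch {

    /** Tries to switch a feature on; returns false if the parser rejects it. */
    static boolean tryEnable(SchemaFactory factory, String featureUri) {
        try {
            factory.setFeature(featureUri, true);
            return true;
        } catch (SAXNotRecognizedException | SAXNotSupportedException e) {
            // The implementation in use does not know or allow this feature.
            return false;
        }
    }

    public static void main(String[] args) {
        SchemaFactory factory =
            SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        String feature = "http://apache.org/xml/features/honour-all-schemaLocations";
        System.out.println(feature + " accepted: " + tryEnable(factory, feature));
    }
}
```

Checking the return value instead of letting the exception propagate makes it easy to see which entries of a feature list the parser in use actually supports.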

Encoding

Enter the encoding type between double quotation marks.

tStatCatcher Statistics

Select this check box to gather the Job processing metadata at the Job level
as well as at each component level.

Global Variables

ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable and it returns a string. This
variable functions only if the Die on error check box is
cleared, where the component provides this check box.

DIFFERENCE: the result of the validation. This is a Flow
variable and it returns a string.

VALID: the validation result. This is a Flow variable and
it returns a boolean.

XSD_ERROR_MESSAGE: the XSD error message generated by the
component. This is a Flow variable and it returns a string.

A Flow variable functions during the execution of a component while an After variable
functions after the execution of the component.

To fill in a field or expression with a variable, press Ctrl+Space
to access the variable list and choose the variable to use from it.

For further information about variables, see the Talend Studio User Guide.
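In a Job, these variables are typically read from globalMap with a key built from the component's unique name and the variable name, for example tXSDValidator_1_VALID. The sketch below imitates that lookup with a plain HashMap so that it runs outside Talend Studio. The component name tXSDValidator_1, the describe helper, and the sample values are assumptions for illustration.

```java
import java.util.HashMap;
import java.util.Map;

class GlobalVariableSketch {

    /** Summarizes the validation state stored under the given component name. */
    static String describe(Map<String, Object> globalMap, String componentName) {
        // Keys follow the <componentName>_<VARIABLE> pattern; casts match
        // the documented return types (boolean and string).
        Boolean valid = (Boolean) globalMap.get(componentName + "_VALID");
        String xsdError = (String) globalMap.get(componentName + "_XSD_ERROR_MESSAGE");
        if (Boolean.TRUE.equals(valid)) {
            return "valid";
        }
        return "invalid: " + (xsdError == null ? "no message available" : xsdError);
    }

    public static void main(String[] args) {
        // Stand-in for the globalMap that Talend provides inside a Job.
        Map<String, Object> globalMap = new HashMap<>();
        globalMap.put("tXSDValidator_1_VALID", Boolean.FALSE);
        globalMap.put("tXSDValidator_1_XSD_ERROR_MESSAGE", "sample error text");
        System.out.println(describe(globalMap, "tXSDValidator_1"));
    }
}
```

Because VALID and XSD_ERROR_MESSAGE are Flow variables, a lookup like this is only meaningful while rows are passing through the component.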

Usage

Usage rule

When File Mode is selected, this
component can be used as a standalone component but it is usually linked
to an output component to gather the log data.

Validating data flows against an XSD file

This scenario describes a Job that validates an XML column in the input file ShipOrder.csv against the XSD reference file ShipOrder.xsd and then outputs valid rows into the delimited
file ShipOrder_Valid.csv and invalid rows and error
messages into the delimited file ShipOrder_Invalid.csv.
For a similar use case that validates an XML file, see Validating XML files.

tXSDValidator_1.png

The content of the input file ShipOrder.csv that
includes the XML column ShipOrder to be validated is as
follows:

The content of the XSD reference file ShipOrder.xsd is
as follows:

Adding and linking components

  1. Create a new Job and add a tFileInputDelimited component, a tXSDValidator component, and two tFileOutputDelimited components by typing their names in the
    design workspace or dropping them from the Palette.
  2. Double-click the tXSDValidator component to
    open its Basic settings view and select
    Flow Mode from the Mode drop-down list.
  3. Link the tFileInputDelimited component to the
    tXSDValidator component using a Row > Main
    connection.
  4. Link the tXSDValidator component to the first
    tFileOutputDelimited component using a
    Row > Main connection to output valid rows.
  5. Link the tXSDValidator component to the
    second tFileOutputDelimited component using a
    Row > Rejects connection to output invalid rows.

Configuring the components

  1. Double-click the tFileInputDelimited
    component to open its Basic settings view on
    the Component tab.

    tXSDValidator_2.png

  2. In the File name/Stream field, specify the
    path to the input file. In this example, it is E:/ShipOrder.csv.

    In the Header field, enter 1 to skip the first header row of the input
    file.

    Click the […] button next to Edit schema and define the schema by adding two
    columns ID and ShipOrder of String type.

    tXSDValidator_3.png

  3. Double-click the tXSDValidator component to
    open its Basic settings view on the Component tab.

    tXSDValidator_4.png

  4. Click the Sync columns button to retrieve the
    schema from the preceding tFileInputDelimited
    component, and in the pop-up dialog box, click Yes to propagate the schema to the two tFileOutputDelimited components.

    Add a row in the Allocate table by clicking
    the [+] button. Then click the Input Column cell and select the XML column ShipOrder to be validated from the drop-down list.
    In the XSD File cell, enter the path to the
    XSD reference file, E:/ShipOrder.xsd in this
    example.
  5. Double-click the first tFileOutputDelimited
    component to open its Basic settings view on
    the Component tab.

    tXSDValidator_5.png

  6. In the File Name field, specify the path to
    the output file that will store valid rows. In this example, it is E:/ShipOrder_Valid.csv.

    Select the Include Header check box to
    include column headers in the output file.
  7. Double-click the second tFileOutputDelimited
    component to open its Basic settings view on
    the Component tab.

    tXSDValidator_6.png

  8. Click the […] button next to Edit schema to view its schema.

    tXSDValidator_7.png

    Note that an extra column, errorMessage, which
    holds the error information for invalid rows, is added automatically to the
    schema in addition to the two propagated columns.
  9. In the File Name field, specify the path to
    the output file that will store invalid rows and error messages. In this
    example, it is E:/ShipOrder_Invalid.csv.

    Select the Include Header check box to
    include column headers in the output file.

Saving and executing the Job

  1. Press Ctrl+S to save the Job.
  2. Press F6 to run the Job.

    tXSDValidator_8.png

    As shown above, the output file ShipOrder_Valid.csv contains two valid rows, and the output file
    ShipOrder_Invalid.csv contains one
    invalid row, which does not define the orderid
    attribute, together with the error message.
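The routing performed by this Job can be approximated outside Talend Studio with the sketch below: each row's XML column is validated, valid rows are kept unchanged, and rejected rows gain a third errorMessage value. The schema and rows used here are hypothetical stand-ins, not the contents of ShipOrder.xsd or ShipOrder.csv, and the class FlowRoutingSketch is an assumption for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class FlowRoutingSketch {

    // Hypothetical schema (not the scenario's ShipOrder.xsd):
    // a <shiporder> element whose orderid attribute is required.
    static final String XSD =
        "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
      + "<xs:element name=\"shiporder\">"
      + "<xs:complexType>"
      + "<xs:attribute name=\"orderid\" type=\"xs:string\" use=\"required\"/>"
      + "</xs:complexType>"
      + "</xs:element>"
      + "</xs:schema>";

    final List<String[]> valid = new ArrayList<>();   // rows as {ID, ShipOrder}
    final List<String[]> rejects = new ArrayList<>(); // rows as {ID, ShipOrder, errorMessage}

    /** Validates column 1 of every row and sorts the rows into valid and rejects. */
    static FlowRoutingSketch route(List<String[]> rows, String xsd) {
        Schema schema;
        try {
            schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalArgumentException("Unusable XSD: " + e.getMessage(), e);
        }
        FlowRoutingSketch out = new FlowRoutingSketch();
        for (String[] row : rows) {
            try {
                schema.newValidator()
                      .validate(new StreamSource(new StringReader(row[1])));
                out.valid.add(row);
            } catch (SAXException | IOException e) {
                // Mirror the Rejects flow: original columns plus the error text.
                out.rejects.add(new String[] {row[0], row[1], e.getMessage()});
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> rows = new ArrayList<>();
        rows.add(new String[] {"1", "<shiporder orderid=\"A1\"/>"});
        rows.add(new String[] {"2", "<shiporder/>"}); // orderid missing
        FlowRoutingSketch result = route(rows, XSD);
        System.out.println("valid rows: " + result.valid.size());
        System.out.println("rejected rows: " + result.rejects.size());
    }
}
```

Compiling the schema once and creating a validator per row keeps the per-row cost low, which matters when the input flow is large.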

Document sourced from Talend: https://help.talend.com