August 17, 2023

tFileInputMSDelimited – Docs for ESB 5.x

tFileInputMSDelimited


tFileInputMSDelimited properties

Component family

File/Input

 

Function

tFileInputMSDelimited reads a
complex multi-structured delimited file.

Purpose

tFileInputMSDelimited opens a
complex multi-structured file, reads its data structures (schemas)
and then uses Row links to send
fields as defined in the different schemas to the next Job
components.
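As a rough illustration of what reading a multi-structured file involves, the sketch below splits each line on a separator and groups the rows by the value of the first field, which is assumed to be the record type indicator. It is a standalone example, not the code that the component generates; the class name MultiSchemaRouter, the ";" separator, and the sample rows are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical helper: not part of Talend, for illustration only.
class MultiSchemaRouter {

    // Splits every non-empty line on the separator and groups the rows
    // by their first field (assumed to hold the record type indicator).
    static Map<String, List<String[]>> route(List<String> lines, String separator) {
        Map<String, List<String[]>> flows = new LinkedHashMap<>();
        String splitRegex = Pattern.quote(separator); // treat the separator literally
        for (String line : lines) {
            if (line.isEmpty()) {
                continue;
            }
            String[] fields = line.split(splitRegex, -1); // -1 keeps trailing empty fields
            flows.computeIfAbsent(fields[0], type -> new ArrayList<>()).add(fields);
        }
        return flows;
    }

    public static void main(String[] args) {
        // Invented sample data with three record types.
        List<String> lines = Arrays.asList(
            "A;Disc One;Some Author;2008-01-01",
            "B;First Song",
            "B;Second Song",
            "C;Main Library");
        route(lines, ";").forEach((type, rows) ->
            System.out.println(type + " -> " + rows.size() + " row(s)"));
    }
}
```

Each key of the returned map plays the role of one output flow, in the same way that each schema is sent over its own Row link.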

Basic settings

Multi Schema Editor

The [Multi Schema Editor] helps you
build and configure the data flow of a multi-structured delimited
file and associate one schema with each output.

For more information, see The Multi Schema Editor.

 

Output

Lists all the schemas you define in the [Multi Schema Editor], along with the related record
type and the field separator that corresponds to every schema, if
different field separators are used.

 

Die on error

Select this check box to stop the execution of the Job when an
error occurs. Clear the check box to skip the row on error and
complete the process for error-free rows.
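The difference between the two behaviors can be sketched in plain Java. This is a minimal illustration under assumed names (DieOnErrorSketch, parseAll), not the component's implementation: with the flag set, the first faulty row stops processing; with the flag cleared, the faulty row is skipped and the remaining rows are still handled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: not part of Talend, for illustration only.
class DieOnErrorSketch {

    // Parses each value as an integer. When dieOnError is true, the first
    // unparseable value aborts processing; otherwise that row is skipped.
    static List<Integer> parseAll(List<String> values, boolean dieOnError) {
        List<Integer> parsed = new ArrayList<>();
        for (String value : values) {
            try {
                parsed.add(Integer.parseInt(value.trim()));
            } catch (NumberFormatException e) {
                if (dieOnError) {
                    throw new IllegalStateException("Stopping on bad row: " + value, e);
                }
                // Flag cleared: skip this row and continue with the next one.
            }
        }
        return parsed;
    }
}
```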

Advanced settings

Trim all column

Select this check box to remove leading and trailing whitespace
from defined columns.

 

Validate date

Select this check box to check the date format strictly against
the input schema.
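To show what a strict check means in practice, here is a small sketch that uses the standard java.time API. The pattern "uuuu-MM-dd" and the class name StrictDateCheck are assumptions chosen for the example; the component's own validation logic may differ.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.format.ResolverStyle;

// Hypothetical helper: not part of Talend, for illustration only.
class StrictDateCheck {

    // "uuuu" (proleptic year) is used rather than "yyyy" (year-of-era),
    // because strict resolution of "yyyy" would also require an era field.
    private static final DateTimeFormatter STRICT =
        DateTimeFormatter.ofPattern("uuuu-MM-dd").withResolverStyle(ResolverStyle.STRICT);

    // Returns true only if the text matches the pattern exactly and
    // denotes a date that exists in the calendar.
    static boolean isValid(String text) {
        try {
            LocalDate.parse(text, STRICT);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }
}
```

With this formatter, a value such as 2023-02-30 is rejected instead of being silently adjusted to a nearby valid date.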

 

Advanced separator (for numbers)

Select this check box to modify the separators used for
numbers:

Thousands separator: define
separators for thousands.

Decimal separator: define
separators for decimals.
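The effect of custom number separators can be sketched with the standard java.text classes. The class name NumberSeparators and the sample values are hypothetical, and this is not necessarily how the component parses numbers internally.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.text.ParseException;
import java.util.Locale;

// Hypothetical helper: not part of Talend, for illustration only.
class NumberSeparators {

    // Parses a number written with the given thousands and decimal separators.
    static double parse(String text, char thousands, char decimal) {
        DecimalFormatSymbols symbols = new DecimalFormatSymbols(Locale.ROOT);
        symbols.setGroupingSeparator(thousands);
        symbols.setDecimalSeparator(decimal);
        // In the pattern, ',' and '.' are placeholders; the symbols above
        // decide which characters are actually expected in the input.
        DecimalFormat format = new DecimalFormat("#,##0.###", symbols);
        try {
            return format.parse(text).doubleValue();
        } catch (ParseException e) {
            throw new IllegalArgumentException("Not a number: " + text, e);
        }
    }
}
```

For example, calling parse("1.234,56", '.', ',') reads the text as one thousand two hundred thirty-four and fifty-six hundredths.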

 

tStatCatcher Statistics

Select this check box to gather the Job processing metadata at a
Job level as well as at each component level.

Global Variables

NB_LINE: the number of rows processed. This is an After
variable and it returns an integer.

ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable and it returns a string. This
variable is available only when the Die on error check box is
cleared, provided the component has this check box.

A Flow variable functions during the execution of a component while an After variable
functions after the execution of the component.

To fill a field or expression with a variable, press Ctrl+Space
to access the variable list and choose the variable to use from it.

For further information about variables, see the Talend Studio User Guide.
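As a hedged sketch of how an After variable such as NB_LINE might be read, the example below uses an ordinary map as a stand-in for the Job's variable store. The key format (component name, underscore, variable name) and the component name tFileInputMSDelimited_1 are assumptions for illustration; check the variable list opened with Ctrl+Space for the exact expression in your Studio version.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: not part of Talend, for illustration only.
class GlobalVariableSketch {

    // Looks up the NB_LINE value of a component in a map that stands in
    // for the Job's variable store. Returns 0 when the value is not set,
    // for example because the component has not finished running yet.
    static int rowCount(Map<String, Object> globalMap, String componentName) {
        Object value = globalMap.get(componentName + "_NB_LINE");
        return value == null ? 0 : (Integer) value;
    }

    public static void main(String[] args) {
        Map<String, Object> globalMap = new HashMap<>();
        // Assumed key; in a real Job the component sets this after it completes.
        globalMap.put("tFileInputMSDelimited_1_NB_LINE", 7);
        System.out.println(rowCount(globalMap, "tFileInputMSDelimited_1"));
    }
}
```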

Usage

Use this component to read multi-structured delimited files and
separate fields contained in these files using a defined
separator.

Log4j

The activity of this component can be logged using the log4j feature. For more information on this feature, see the Talend Studio User Guide.

For more information on the log4j logging levels, see the Apache documentation at http://logging.apache.org/log4j/1.2/apidocs/org/apache/log4j/Level.html.

Limitation

Due to license incompatibility, one or more JARs required to use this component are not
provided. You can install the missing JARs for this particular component by clicking the
Install button on the Component tab view. You can also find and add all missing JARs on
the Modules tab in the Integration perspective
of your studio. For details, see https://help.talend.com/display/KB/How+to+install+external+modules+in+the+Talend+products
or the section describing how to configure the Studio in the Talend Installation and Upgrade Guide.

The Multi Schema Editor

The [Multi Schema Editor] enables you to:

  • set the path to the source file,

  • define the source file properties,

  • define data structure for each of the output schemas.

Note

When you define the data structure for each of the output schemas in the [Multi Schema Editor], column names in the different
data structures automatically appear in the input schema lists of the components
that come after tFileInputMSDelimited. However,
you can still define data structures directly in the Basic settings
view of each of these components.

The [Multi Schema Editor] also helps you declare
the schema that should act as the source schema (primary key) for the incoming data
to ensure its uniqueness. The editor uses this mapping to associate all schemas
processed in the delimited file with the source schema in the same file.

Note

The editor opens with the first column, which usually holds the record type
indicator, selected by default. However, once the editor is open, you can select
the check box of any of the schema columns to define it as a primary key.

The figure below shows an example of the [Multi Schema Editor].

Use_Case_tFileInputMSDelimeted1.png

For detailed information about the usage of the Multi Schema Editor,
see Scenario: Reading a multi structure delimited file.

Scenario: Reading a multi structure delimited file

The following scenario creates a Java Job that reads three schemas from a
delimited file and displays their data structure on the Run Job
console.

The delimited file processed in this example looks like the following:

Use_Case_tFileInputMSDelimeted2.png

Dropping and linking components

  1. Drop a tFileInputMSDelimited component
    and three tLogRow components from the
    Palette onto the design
    workspace.

  2. In the design workspace, right-click tFileInputMSDelimited and connect it to tLogRow1, tLogRow2, and tLogRow3
    using the row_A_1, row_B_1, and row_C_1 links
    respectively.

    Use_Case_tFileInputMSDelimeted5.png

Configuring the components

  1. Double-click tFileInputMSDelimited to
    open the Multi Schema Editor.

    Use_Case_tFileInputMSDelimeted.png
  2. Click Browse… next to the File name field to locate the multi schema
    delimited file you need to process.

  3. In the File Settings area:

    - Select from the list the encoding type the source file is encoded in.
    This setting is meant to ensure encoding consistency throughout all input
    and output files.

    - Select the field and row separators used in the source file.

    Note

    Select the Use Multiple Separator
    check box and define the fields that follow accordingly if different
    field separators are used to separate schemas in the source file.

    A preview of the source file data displays automatically in the Preview panel.

    Use_Case_tFileInputMSDelimeted1.png

    Note

    Column 0, which usually holds the
    record type indicator, is selected by default. However, you can select
    the check box of any of the other columns to define it as a primary
    key.

  4. Click Fetch Codes to the right of the
    Preview panel to list the schema and
    record types found in the source file. In this scenario, the source
    file has three schema types (A, B, C).

    Click each schema type in the Fetch Codes
    panel to display its data structure below the Preview panel.

  5. Click in the name cells and set the column names for each of the selected
    schemas.

    In this scenario, the column names are as follows:

    - Schema A: Type, DiscName, Author, Date,

    - Schema B: Type, SongName,

    - Schema C: Type, LibraryName.

    Next, set the primary key from the incoming data to ensure its
    uniqueness (DiscName in this scenario). To do that:

  6. In the Fetch Codes panel, select the
    schema holding the column you want to set as the primary key (schema
    A in this scenario) to display its data
    structure.

  7. Click in the Key cell that corresponds to the
    DiscName column and select the check box that
    appears.

    Use_Case_tFileInputMSDelimeted4.png
  8. Click anywhere in the editor; the false value in the
    Key cell changes to true.

    Next, declare the parent schema by which you want to group the
    other “children” schemas (DiscName in this scenario).
    To do that:

  9. In the Fetch Codes panel, select schema
    B and click the right arrow button to move it to
    the right. Then, do the same with schema C.

    Use_Case_tFileInputMSDelimeted3.png

    Note

    The Cardinality field is not
    compulsory. It helps you to define the number (or range) of fields in
    “children” schemas attached to the parent schema. However, if you set
    the wrong number or range and try to execute the Job, an error message
    is displayed.

  10. In the [Multi Schema Editor], click
    OK to validate all the changes you made
    and close the editor.

    The three defined schemas, along with the corresponding record types and
    field separators, are displayed automatically in the Basic settings
    view of tFileInputMSDelimited.

    Use_Case_tFileInputMSDelimeted6.png

    The three schemas you defined in the [Multi Schema Editor]
    are automatically passed to the three tLogRow components.

  11. If needed, click the Edit schema button
    in the Basic settings view of each of the
    tLogRow components to view the input
    and output data structures you defined in the Multi Schema Editor
    or to modify them.

    Use_Case_tFileInputMSDelimeted7.png
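The parent and child relationship configured in the steps above can be pictured with a small standalone sketch. It assumes that child rows follow their parent row in file order and that cardinality counts the child rows attached to each parent; both are assumptions for illustration, not documented behavior of the component. The class ParentChildGrouping and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: not part of Talend, for illustration only.
class ParentChildGrouping {

    // Attaches every non-parent row to the most recent parent row,
    // keyed by the parent's key column (DiscName would be column 1).
    // Assumes children appear after their parent in the file.
    static Map<String, List<String[]>> group(List<String[]> rows, String parentType, int keyColumn) {
        Map<String, List<String[]>> children = new LinkedHashMap<>();
        String currentKey = null;
        for (String[] row : rows) {
            if (row[0].equals(parentType)) {
                currentKey = row[keyColumn];
                children.putIfAbsent(currentKey, new ArrayList<>());
            } else if (currentKey != null) {
                children.get(currentKey).add(row);
            }
            // Rows that appear before any parent are ignored in this sketch.
        }
        return children;
    }

    // Returns true when every parent has between min and max (inclusive)
    // child rows of the given record type.
    static boolean cardinalityHolds(Map<String, List<String[]>> children,
                                    String childType, int min, int max) {
        for (List<String[]> rows : children.values()) {
            long count = rows.stream().filter(r -> r[0].equals(childType)).count();
            if (count < min || count > max) {
                return false;
            }
        }
        return true;
    }
}
```

In this picture, moving schemas B and C under schema A corresponds to calling group with "A" as the parent type, and a cardinality range corresponds to the min and max arguments of cardinalityHolds.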

Saving and executing the Job

  1. Press Ctrl+S to save your Job.

  2. Press F6 or click Run on the Run tab to
    execute the Job.

    The multi schema delimited file is read row by row and the extracted
    fields are displayed on the Run Job console
    as defined in the [Multi Schema Editor].

    Use_Case_tFileInputMSDelimeted8.png

Document sourced from Talend: https://help.talend.com