tExtractDelimitedFields

tExtractDelimitedFields properties

Component family	Processing/Fields
Function	tExtractDelimitedFields generates multiple columns from a given column in a delimited file. If you have subscribed to one of the Talend solutions with Big Data, you are able to use this component in a Talend Map/Reduce Job to generate Map/Reduce code. For further information, see tExtractDelimitedFields in Talend Map/Reduce Jobs.
Purpose	tExtractDelimitedFields extracts fields from within a string so that they can, for example, be written elsewhere.
Basic settings	Field to split	Select an incoming field from the Field to split list to split.
	Ignore NULL as the source data	Select this check box to ignore the Null value in the source data. Clear this check box to generate the Null records that correspond to the Null value in the source data.
	Field separator	Enter a character, string or regular expression to separate fields for the transferred data. Note: Since this component uses regex to split a field and the regex syntax uses special characters as operators, make sure to precede any regex operator you use as a field separator with a double backslash. For example, you have to use “\\|” instead of “\|”.
	Die on error	Clear the check box to skip any rows on error and complete the process for error-free rows. When errors are skipped, you can collect the rows on error using a Row > Reject link.
	Schema and Edit Schema	A schema is a row description. It defines the number of fields to be processed and passed on to the next component. The schema is either Built-In or stored remotely in the Repository. Click Edit schema to make changes to the schema. If the current schema is of the Repository type, three options are available: View schema: choose this option to view the schema only. Change to built-in property: choose this option to change the schema to Built-in for local changes. Update repository connection: choose this option to change the schema stored in the repository and decide whether to propagate the changes to all the Jobs upon completion. If you just want to propagate the changes to the current Job, you can select No upon completion and choose this schema metadata again in the [Repository Content] window. Click Sync columns to retrieve the schema from the previous component connected in the Job.
		Built-in: You create the schema and store it locally for the component. Related topic: see Talend Studio User Guide.
		Repository: The schema already exists and is stored in the Repository, hence can be reused in various projects and Job flowcharts. Related topic: see Talend Studio User Guide.
Advanced settings	Advanced separator (for number)	Select this check box to modify the separators used for numbers.
	Trim column	Select this check box to remove leading and trailing whitespace from all columns.
	Check each row structure against schema	Select this check box to check whether the total number of columns in each row is consistent with the schema. If not consistent, an error message will be displayed on the console.
	Validate date	Select this check box to check the date format strictly against the input schema.
	tStatCatcher Statistics	Select this check box to gather the processing metadata at the Job level as well as at each component level.
Global Variables	ERROR_MESSAGE: the error message generated by the component when an error occurs. This is an After variable and it returns a string. This variable functions only if the Die on error check box is cleared, if the component has this check box. NB_LINE: the number of rows read by an input component or transferred to an output component. This is an After variable and it returns an integer. A Flow variable functions during the execution of a component while an After variable functions after the execution of the component. To fill up a field or expression with a variable, press Ctrl + Space to access the variable list and choose the variable to use from it. For further information about variables, see Talend Studio User Guide.
Usage	This component handles a flow of data and therefore requires input and output components. It allows you to extract data from a delimited field, using a Row > Main link, and enables you to create a reject flow that filters out data whose type does not match the defined type.
Log4j	The activity of this component can be logged using the log4j feature. For more information on this feature, see Talend Studio User Guide. For more information on the log4j logging levels, see the Apache documentation at http://logging.apache.org/log4j/1.2/apidocs/org/apache/log4j/Level.html.
Limitation	n/a
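
The note on the Field separator property can be illustrated outside the Studio. The following is a minimal sketch only, assuming the separator behaves like a regular expression passed to java.lang.String.split; the class name FieldSeparatorSketch, the helper splitFields and the sample row are hypothetical and are not part of the component.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical illustration of regex-based field separators (not Talend code).
class FieldSeparatorSketch {

    // Splits a value on a regex separator; the -1 limit keeps trailing empty fields.
    static String[] splitFields(String value, String separatorRegex) {
        return value.split(separatorRegex, -1);
    }

    public static void main(String[] args) {
        String row = "Andrew|Smith|Boston"; // assumed sample data

        // "\\|" in Java source is the regex \| , which matches a literal pipe.
        System.out.println(Arrays.toString(splitFields(row, "\\|")));

        // An unescaped "|" is the regex alternation operator, so it matches
        // the empty string and the row is cut between characters.
        System.out.println(splitFields(row, "|").length + " fragments with an unescaped pipe");

        // Pattern.quote escapes every operator in an arbitrary literal separator.
        System.out.println(Arrays.toString(splitFields(row, Pattern.quote("|"))));
    }
}
```

The first and third calls return the three expected fields, while the unescaped pipe returns many more fragments, which is the situation the double backslash is meant to avoid.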

tExtractDelimitedFields in Talend Map/Reduce Jobs

Warning

The information in this section is only for users that have subscribed to one of
the Talend solutions with Big Data and is not applicable to
Talend Open Studio for Big Data users.

In a Talend Map/Reduce Job, tExtractDelimitedFields, as well as the whole Map/Reduce Job using it,
generates native Map/Reduce code. This section presents the specific properties of
tExtractDelimitedFields when it is used in that
situation. For further information about a Talend Map/Reduce Job, see the Talend Big Data Getting Started Guide.

Component family	Processing / Fields
Basic settings	Property type	Either Built-in or Repository.
		Built-in: no property data stored centrally.
		Repository: reuse properties stored centrally under the Hadoop Cluster node of the Repository tree. The fields that come after are pre-filled in using the fetched data. For further information about the Hadoop Cluster node, see the Getting Started Guide.
	Field	Select the column in which the fields are to be split.
	Schema and Edit Schema	A schema is a row description. It defines the number of fields to be processed and passed on to the next component. The schema is either Built-In or stored remotely in the Repository. Click Edit Schema to make changes to the schema. Note that if you make changes, the schema automatically becomes built-in.
		Built-In: You create and store the schema locally for this component only. Related topic: see Talend Studio User Guide.
		Repository: You have already created the schema and stored it in the Repository. You can reuse it in various projects and Job designs. Related topic: see Talend Studio User Guide.
	Die on error	Clear the check box to skip any rows on error and complete the process for error-free rows. When errors are skipped, you can collect the rows on error using a Row > Reject link.
	Field separator	Enter a character, string or regular expression to separate fields for the transferred data.
	CSV options	Select this check box to include CSV specific parameters such as Escape char and Text enclosure.
Advanced settings	Custom Encoding	You may encounter encoding issues when you process the stored data. In that situation, select this check box to display the Encoding list. Then select the encoding to be used from the list or select Custom and define it manually.
	Advanced separator (for number)	Select this check box to change the separator used for numbers. By default, the thousands separator is a comma (,) and the decimal separator is a period (.).
	Trim all columns	Select this check box to remove the leading and trailing whitespace from all columns. When this check box is cleared, the Check column to trim table is displayed, which lets you select particular columns to trim.
	Check column to trim	This table is filled automatically with the schema being used. Select the check box(es) corresponding to the column(s) to be trimmed.
	Check each row structure against schema	Select this check box to check whether the total number of columns in each row is consistent with the schema. If not consistent, an error message will be displayed on the console.
	Check date	Select this check box to check the date format strictly against the input schema.
	Permit hexadecimal (0xNNN) or octal (0NNNN) for numeric types	Select this check box if any of your numeric types (long, integer, short, or byte type), will be parsed from a hexadecimal or octal string.
	tStatCatcher Statistics	Select this check box to collect log data at the component level.
Global Variables	ERROR_MESSAGE: the error message generated by the component when an error occurs. This is an After variable and it returns a string. This variable functions only if the Die on error check box is cleared, if the component has this check box. A Flow variable functions during the execution of a component while an After variable functions after the execution of the component. To fill up a field or expression with a variable, press Ctrl + Space to access the variable list and choose the variable to use from it. For further information about variables, see Talend Studio User Guide.
Usage	If you have subscribed to one of the Talend solutions with Big Data, you can also use this component as a Map/Reduce component. In a Talend Map/Reduce Job, this component is used as an intermediate step and other components used along with it must be Map/Reduce components, too. They generate native Map/Reduce code that can be executed directly in Hadoop. Once a Map/Reduce Job is opened in the workspace, tExtractDelimitedFields as well as the MapReduce family appears in the Palette of the Studio. Note that in this documentation, unless otherwise explicitly stated, a scenario presents only Standard Jobs, that is to say traditional Talend data integration Jobs, and non Map/Reduce Jobs.
Hadoop Connection	You need to use the Hadoop Configuration tab in the Run view to define the connection to a given Hadoop distribution for the whole Job. This connection is effective on a per-Job basis.
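
The effect of the Permit hexadecimal (0xNNN) or octal (0NNNN) for numeric types option can be pictured with standard Java parsing. This is a sketch under assumptions: it contrasts Long.parseLong with Long.decode from the JDK and does not show the code that the component itself generates. The class name NumericDecodeSketch and both helper names are hypothetical.

```java
// Hypothetical illustration of strict versus prefix-aware numeric parsing.
class NumericDecodeSketch {

    // Strict decimal parsing: "0x1F" is rejected and "017" is read as decimal 17.
    static long parseStrict(String text) {
        return Long.parseLong(text.trim());
    }

    // Prefix-aware parsing: a 0x prefix means hexadecimal, a leading 0 means octal.
    static long parsePermissive(String text) {
        return Long.decode(text.trim());
    }

    public static void main(String[] args) {
        System.out.println(parsePermissive("0x1F")); // hexadecimal 1F is 31
        System.out.println(parsePermissive("017"));  // octal 17 is 15
        System.out.println(parseStrict("017"));      // decimal 17
        try {
            parseStrict("0x1F");
        } catch (NumberFormatException e) {
            System.out.println("strict parsing rejects 0x1F");
        }
    }
}
```

Note that the same string, 017, yields a different number depending on the parsing mode, so a permissive mode of this kind is best enabled only when the source data really contains hexadecimal or octal values.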

Scenario: Extracting fields from a comma-delimited file

This scenario describes a three-component Job where the tExtractDelimitedFields component is used to extract two columns from a
comma-delimited file.

First names and last names are extracted and displayed in the corresponding defined
columns on the console.

Linking the components

Drop the following components from the Palette onto the design workspace: tFileInputDelimited, tExtractDelimitedFields, and tLogRow.
Connect them using Row > Main
links.

Configuring the components

Double-click the tFileInputDelimited
component to open its Basic settings
view.
In the Basic settings view, set Property Type to Built-In.
Click the […] button next to the
File Name field to select the path to
the input file.

Note

The File Name field is mandatory.

The input file used in this scenario is called test5.
It is a text file that holds comma-delimited data.
In the Basic settings view, fill in all
other fields as needed. For more information, see tFileInputDelimited. In this scenario, the header and the
footer are not set and there is no limit for the number of processed
rows.
Click Edit schema to describe the data
structure of this input file. In this scenario, the schema is made of one
column, name.
Double-click the tExtractDelimitedFields
component to open its Basic settings
view.
From the Field to split list, select the
column to split, name in this scenario.
In the Field separator field, enter the
corresponding separator.
Click Edit schema to describe the data
structure of this processing component.
In the output panel of the [Schema of
tExtractDelimitedFields] dialog box, click the plus button to
add two columns for the output schema, firstname and
lastname.

In this scenario, we want to split the name column
into two columns in the output flow, firstname and
lastname.
Click OK to close the [Schema of tExtractDelimitedFields] dialog
box.
In the design workspace, select tLogRow
and click the Component tab to define its
basic settings. For more information, see tLogRow.

Executing the Job

Press Ctrl + S to save your Job.
Press F6 to execute it.
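
For readers who want to reason about what this Job does to each row, the following plain Java sketch mimics the split of the name column into firstname and lastname. It is an approximation under assumptions, not the code generated by the Studio: the class ExtractNameSketch, the helper extract, the sample rows and the choice to return null for a rejected row are all hypothetical, and the content of the test5 file is not reproduced here.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of the scenario: name -> firstname, lastname.
class ExtractNameSketch {

    // Splits one value of the name column on the separator regex.
    // Returns null when the number of fields does not match the schema,
    // which stands in for sending the row to a reject flow.
    static String[] extract(String name, String separatorRegex, int expectedColumns) {
        if (name == null) {
            return null;
        }
        String[] parts = name.split(separatorRegex, -1);
        if (parts.length != expectedColumns) {
            return null;
        }
        for (int i = 0; i < parts.length; i++) {
            parts[i] = parts[i].trim(); // comparable to trimming the columns
        }
        return parts;
    }

    public static void main(String[] args) {
        // Assumed sample rows; the real input file is not shown in this scenario.
        List<String> rows = Arrays.asList("Andrew,Smith", "Mary, Jones", "Cher");
        for (String row : rows) {
            String[] fields = extract(row, ",", 2);
            if (fields == null) {
                System.out.println("rejected: " + row);
            } else {
                System.out.println(fields[0] + "|" + fields[1]);
            }
        }
    }
}
```

In this sketch, a row whose field count differs from the two-column output schema is reported as rejected rather than stopping the run, which is comparable to clearing Die on error and collecting such rows through a Row > Reject link.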

Source: Talend documentation, https://help.talend.com
