Component family |
File/Input |
|
Function |
Powerful feature which can replace a number of other components of |
|
Purpose |
Opens a file and reads it row by row, splitting each row into fields |
|
Basic settings |
Property type |
Either Built-in or Repository. |
|
|
Built-in: No property data stored |
|
|
Repository: Select the repository |
|
File Name/Stream |
File name: Name of the file Stream: Data flow to be For further information about how to define and use a variable in Note that in the Map/Reduce version of |
|
Folder/File |
Browse to, or enter the directory in HDFS where the data you need to use is. If the path you set points to a folder, this component will read If you want to specify more than one file or directory in this If the file to be read is a compressed one, enter the file name
Note that you need |
|
Row separator |
Enter the separator used to identify the end of a row. |
|
Regex |
This field can contain multiple lines. Type in your regular
Note: Backslashes need to be Warning: Regex syntax requires double quotes. |
|
Header |
Enter the number of rows to be skipped at the beginning of the file. |
|
Footer |
Number of rows to be skipped at the end of the file. |
|
Limit |
Maximum number of rows to be processed. If Limit = 0, no row is |
|
Ignore error message for the unmatched record |
Select this check box to avoid outputting error messages for records that do not match |
|
Schema and Edit |
A schema is a row description. It defines the number of fields to be processed and passed on Click Edit schema to make changes to the schema. If the
|
|
|
Built-in: The schema will be |
|
|
Repository: The schema already |
|
Skip empty rows |
Select this check box to skip the empty rows. |
|
Die on error |
Select this check box to stop the execution of the Job when an error occurs. Clear the check box to skip any rows on error and complete the process for error-free rows. |
Advanced settings |
Encoding |
Select the encoding from the list or select Custom and In the Map/Reduce version of tFileInputRegex, you need to select the |
|
tStatCatcher Statistics |
Select this check box to gather the Job processing metadata at a |
Global Variables |
NB_LINE: the number of rows processed. This is an After The NB_LINE ERROR_MESSAGE: the error message generated by the A Flow variable functions during the execution of a component while an After variable To fill up a field or expression with a variable, press Ctrl + For further information about variables, see Talend Studio |
|
Usage |
Use this component to read a file and separate fields contained in |
|
Usage in Map/Reduce Jobs |
In a Talend Map/Reduce Job, it is used as a start component and requires You need to use the Hadoop Configuration tab in the This connection is effective on a per-Job basis. For further information about a Talend Map/Reduce Job, see the sections Note that in this documentation, unless otherwise explicitly stated, a scenario presents |
|
Log4j |
The activity of this component can be logged using the log4j feature. For more information on this feature, see Talend Studio User For more information on the log4j logging levels, see the Apache documentation at http://logging.apache.org/log4j/1.2/apidocs/org/apache/log4j/Level.html. |
|
Limitation |
n/a |
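The Row separator, Regex, Header and Skip empty rows properties described above can be illustrated with a minimal plain-Java sketch. This is not the code the component generates; the class name, the `parse` method, the sample pattern and the sample data are all hypothetical, and unmatched rows are simply dropped here, as if error messages for unmatched records were ignored. It shows how each capturing group of the expression becomes one field, and why backslashes are doubled when the expression is written as a double-quoted Java string.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RegexRowSplitter {
    // Hypothetical pattern: three fields separated by semicolons, e.g. "1;Alice;Paris".
    // Inside a double-quoted Java string each regex backslash is doubled: \d becomes "\\d".
    private static final Pattern ROW = Pattern.compile("^(\\d+);(\\w+);(\\w+)$");

    // Splits the content into rows, skips the first `header` rows,
    // and returns one String[] of captured fields per matching row.
    static List<String[]> parse(String content, String rowSeparator, int header) {
        List<String[]> rows = new ArrayList<>();
        String[] lines = content.split(Pattern.quote(rowSeparator));
        for (int i = header; i < lines.length; i++) {
            if (lines[i].isEmpty()) {
                continue; // skip empty rows
            }
            Matcher m = ROW.matcher(lines[i]);
            if (!m.matches()) {
                continue; // assumption: unmatched records are dropped silently
            }
            String[] fields = new String[m.groupCount()];
            for (int g = 1; g <= m.groupCount(); g++) {
                fields[g - 1] = m.group(g); // one capturing group per schema column
            }
            rows.add(fields);
        }
        return rows;
    }

    public static void main(String[] args) {
        String content = "id;name;city\n1;Alice;Paris\nnot a record\n2;Bob;Lyon\n";
        for (String[] fields : parse(content, "\n", 1)) {
            System.out.println(String.join(" | ", fields));
        }
    }
}
```

With the sample data, the header row and the unmatched row are left out, and the two remaining rows each yield three fields.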
The following scenario creates a two-component Job, reading data from an input file
using a regular expression and outputting the extracted data into a positional file.
Dropping and linking the components
- Drop a tFileInputRegex component from the Palette to the design workspace.
- Drop a tFileOutputPositional component the same way.
- Right-click on the tFileInputRegex component and select Row > Main. Drag this main row link onto the tFileOutputPositional component and release when the plug symbol displays.
Configuring the components
- Select the tFileInputRegex again so the Component view shows up, and define the properties:
- The Job is built-in for this scenario. Hence, the Properties are set for this station only.
- Fill in a path to the file in the File Name field. This field is mandatory.
- Define the Row separator identifying the end of a row.
- Then define the Regular expression to delimit the fields of a row, which are to be passed on to the next component. You can type in a regular expression using Java code, and on multiple lines if needed.
  Warning: Regex syntax requires double quotes.
- In this expression, make sure you include all subpatterns matching the fields to be extracted.
- In this scenario, ignore the header, footer and limit fields.
- Select a local (Built-in) Schema to define the data to pass on to the tFileOutputPositional component.
- You can load or create the schema through the Edit Schema function.
- Then define the second component properties:
- Enter the Positional file output path.
- Enter the Encoding standard the output file is encoded in. Note that, for the time being, the encoding consistency verification is not supported.
- Select the Schema type. Click on Sync columns to automatically synchronize the schema with the Input file schema.
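
The data flow of this scenario (one regex-parsed row in, one positional row out) can be sketched in plain Java as follows. This is only an illustration under stated assumptions, not the code generated by the Job: the class and method names, the input layout, the pattern and the column widths are hypothetical, and left-aligned space padding with truncation of overlong values is an assumed formatting choice.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RegexToPositional {
    // Hypothetical input layout: "<id>,<name>", for example "7,Alice".
    private static final Pattern ROW = Pattern.compile("(\\d+),(\\w+)");
    // Hypothetical column widths of the positional output, one per field.
    private static final int[] WIDTHS = {5, 10};

    // Returns the fixed-width form of a row, or null if the row does not match.
    static String toPositional(String row) {
        Matcher m = ROW.matcher(row);
        if (!m.matches()) {
            return null;
        }
        StringBuilder out = new StringBuilder();
        for (int g = 1; g <= m.groupCount(); g++) {
            String field = m.group(g);
            int width = WIDTHS[g - 1];
            if (field.length() > width) {
                field = field.substring(0, width); // assumption: overlong values are cut
            }
            // Left-align the value and pad it with spaces up to the column width.
            out.append(String.format("%-" + width + "s", field));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println("[" + toPositional("7,Alice") + "]");
    }
}
```

Each output row has the same total length (15 characters with the widths assumed here), which is what lets a downstream reader locate fields by position rather than by separator.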