Component family |
File/Output |
|
Function |
tFileOutputDelimited outputs data to a delimited file. If you have subscribed to one of the Talend solutions with Big Data, you can also use this component in a Talend Map/Reduce Job, as described later in this document. |
|
Purpose |
This component writes a delimited file that holds data organized according to the defined schema. |
|
Basic settings |
Property type |
Either Built-in or Repository. |
|
|
Built-in: No property data stored centrally. |
|
|
Repository: Select the repository file where the properties are stored. |
|
Use Output Stream |
Select this check box to process the data flow of interest. Once you have selected it, the Output Stream field displays and you can type in the data flow of interest. The data flow to be processed must be added to the flow in order for this component to fetch the data via the corresponding representative variable. This variable could be already pre-defined in your Studio or provided by the Job design, for example through a tJava component. In order to avoid the inconvenience of hand writing, you could select the variable of interest from the auto-completion list (Ctrl+Space) to fill in the current field, provided that the variable has been properly defined. For further information about how to use a stream, see Scenario 2: Reading data from a remote file in streaming mode. |
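When Use Output Stream is selected, the component writes to a caller-supplied java.io.OutputStream instead of opening a file itself. The sketch below is a simplified stand-in for that behavior, not Talend's actual implementation; the class and method names are illustrative.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

public class OutputStreamSketch {
    // Write delimited rows to a caller-supplied stream, as the component does
    // when Use Output Stream is selected (illustrative stand-in only).
    static void writeRows(OutputStream out, String[][] rows,
                          String fieldSep, String rowSep) throws IOException {
        for (String[] row : rows) {
            out.write((String.join(fieldSep, row) + rowSep)
                    .getBytes(StandardCharsets.UTF_8));
        }
    }

    public static void main(String[] args) throws IOException {
        // Any OutputStream works: a file, a socket, or an in-memory buffer.
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        writeRows(buffer, new String[][]{{"1", "Griffith"}, {"2", "Wilson"}}, ";", "\n");
        System.out.print(buffer.toString(StandardCharsets.UTF_8));
    }
}
```

Because the stream is supplied from outside, the same writing logic can target a local file, a remote destination, or a buffer without changing the component configuration.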
|
File name |
Name or path to the output file and/or the variable to be used. This field becomes unavailable once you have selected the Use Output Stream check box. For further information about how to define and use a variable in a Job, see Talend Studio User Guide. |
|
Row Separator |
Enter the separator used to identify the end of a row. |
|
Field Separator |
Enter a character, a string, or a regular expression to separate fields in the transferred data. |
|
Append |
Select this check box to add the new rows at the end of the file. |
|
Include Header |
Select this check box to include the column header in the file. |
|
Compress as zip file |
Select this check box to compress the output file in zip format. |
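The zip option wraps the delimited output in an archive. The following sketch shows the equivalent operation in plain Java using the standard java.util.zip API; the class name and entry name are illustrative, and the component's actual implementation may differ.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class ZipOutputSketch {
    // Compress delimited content into a zip archive holding one entry,
    // similar in spirit to the Compress as zip file option.
    static byte[] zipContent(String entryName, String content) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry(entryName));
            zip.write(content.getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] zipped = zipContent("out.csv", "1;Griffith\n2;Wilson\n");
        System.out.println("zip size: " + zipped.length + " bytes");
    }
}
```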
|
Schema and Edit |
A schema is a row description; it defines the number of fields to be processed and passed on to the next component. Click Edit schema to make changes to the schema. Note that if you make changes, the schema automatically becomes built-in.
This component offers the advantage of the dynamic schema feature. This allows you to retrieve unknown columns from source files or to copy batches of columns from a source without mapping each column individually. This dynamic schema feature is designed for the purpose of retrieving unknown columns of a table and is recommended to be used for this purpose only. |
|
|
Built-in: You can create the schema and edit it locally for this component only. |
|
|
Repository: You have already created the schema and stored it in the Repository. You can reuse it in various projects and Job designs. |
|
Sync columns |
Click to synchronize the output file schema with the input file schema. |
Advanced settings |
Advanced separator (for numbers) |
Select this check box to modify the separators used for numbers:
Thousands separator: define the separator used for thousands.
Decimal separator: define the separator used for decimals. |
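Custom thousands and decimal separators can be illustrated with the standard java.text.DecimalFormat API. This sketch is not the component's code; it only demonstrates the formatting effect of swapping the two separators.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

public class NumberSeparatorSketch {
    // Format a number with custom thousands and decimal separators, as the
    // Advanced separator option does for numeric columns (illustrative only).
    static String format(double value, char thousands, char decimal) {
        DecimalFormatSymbols symbols = new DecimalFormatSymbols(Locale.ROOT);
        symbols.setGroupingSeparator(thousands);
        symbols.setDecimalSeparator(decimal);
        return new DecimalFormat("#,##0.00", symbols).format(value);
    }

    public static void main(String[] args) {
        // European-style output: dot for thousands, comma for decimals.
        System.out.println(format(1234567.89, '.', ','));  // 1.234.567,89
    }
}
```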
|
CSV options |
Select this check box to include CSV specific parameters such as Escape char and Text enclosure. |
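The effect of a text enclosure and escape character can be sketched as follows. This is a simplified illustration with hypothetical names, not the component's CSV writer; real CSV handling covers more edge cases.

```java
public class CsvOptionsSketch {
    // Wrap a field in the text enclosure and escape embedded enclosure or
    // escape characters, as the CSV options control (simplified sketch).
    static String enclose(String field, char enclosure, char escape) {
        StringBuilder sb = new StringBuilder().append(enclosure);
        for (char c : field.toCharArray()) {
            if (c == enclosure || c == escape) {
                sb.append(escape);
            }
            sb.append(c);
        }
        return sb.append(enclosure).toString();
    }

    public static void main(String[] args) {
        // A field containing the separator and quotes stays a single field.
        System.out.println(enclose("Smith; \"Jo\"", '"', '\\'));
    }
}
```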
|
Create directory if not exists |
This check box is selected by default. It creates the directory that holds the output delimited file, if it does not already exist. |
|
Split output in several files |
In case of very big output files, select this check box to divide the output delimited file into several files.
Rows in each output file: set the number of lines in each output file. |
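The split option partitions the output into chunks of a fixed row count. A minimal sketch of that partitioning, assuming each chunk would then be written to its own file (the names here are illustrative):

```java
import java.util.ArrayList;
import java.util.List;

public class SplitOutputSketch {
    // Divide rows into chunks of at most rowsPerFile, mirroring how the
    // Split output in several files option partitions a large output.
    static List<List<String>> split(List<String> rows, int rowsPerFile) {
        List<List<String>> chunks = new ArrayList<>();
        for (int i = 0; i < rows.size(); i += rowsPerFile) {
            chunks.add(rows.subList(i, Math.min(i + rowsPerFile, rows.size())));
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<String> rows = List.of("r1", "r2", "r3", "r4", "r5");
        System.out.println(split(rows, 2));  // [[r1, r2], [r3, r4], [r5]]
    }
}
```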
|
Custom the flush buffer size |
Select this check box to define the number of lines to write before emptying the buffer.
Row Number: set the number of lines to write. |
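Flushing every N lines trades throughput for durability: buffered lines reach the target earlier. A plain-Java sketch of the idea, assuming a simple line counter (not the component's actual buffering code):

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;

public class FlushBufferSketch {
    // Flush the writer every flushEvery lines, as the Custom the flush
    // buffer size option does (simplified stand-in).
    static void writeWithFlush(Writer target, String[] lines, int flushEvery)
            throws IOException {
        BufferedWriter writer = new BufferedWriter(target);
        int count = 0;
        for (String line : lines) {
            writer.write(line);
            writer.newLine();
            if (++count % flushEvery == 0) {
                writer.flush();  // push buffered lines to the target
            }
        }
        writer.flush();  // flush any remainder
    }

    public static void main(String[] args) throws IOException {
        StringWriter out = new StringWriter();
        writeWithFlush(out, new String[]{"1;a", "2;b", "3;c"}, 2);
        System.out.print(out);
    }
}
```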
|
Output in row mode |
Writes in row mode. |
|
Encoding |
Select the encoding from the list or select Custom and define it manually. This field is compulsory for database data handling. |
|
Don’t generate empty file |
Select this check box if you do not want to generate empty files. |
|
tStatCatcher Statistics |
Select this check box to gather the Job processing metadata at the Job level as well as at each component level. |
Global Variables |
NB_LINE: the number of rows read by an input component or transferred to an output component. This is an After variable and it returns an integer. FILE_NAME: the name of the file being processed. This is a Flow variable and it returns a string. ERROR_MESSAGE: the error message generated by the component when an error occurs. This is an After variable and it returns a string. A Flow variable functions during the execution of a component while an After variable functions after the execution of the component. To fill up a field or expression with a variable, press Ctrl + Space to access the variable list and choose the variable to use from it. For further information about variables, see Talend Studio User Guide. |
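At runtime these variables are published in the Job's globalMap under keys combining the component name and the variable name. The sketch below uses a plain HashMap as a stand-in for Talend's globalMap; the component label and values are illustrative.

```java
import java.util.HashMap;
import java.util.Map;

public class GlobalVarsSketch {
    // Stand-in lookup for Talend's globalMap: component variables live
    // under "<component>_<VARIABLE>" keys (names here are illustrative).
    static Object readVar(Map<String, Object> globalMap, String component, String var) {
        return globalMap.get(component + "_" + var);
    }

    public static void main(String[] args) {
        Map<String, Object> globalMap = new HashMap<>();
        // Values an output component might publish after execution.
        globalMap.put("tFileOutputDelimited_1_NB_LINE", 42);
        globalMap.put("tFileOutputDelimited_1_FILE_NAME", "customerselection.txt");

        // An After variable is read once the component has finished, e.g. in tJava.
        System.out.println("rows written: "
                + readVar(globalMap, "tFileOutputDelimited_1", "NB_LINE"));
    }
}
```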
|
Usage |
Use this component to write a delimited file and separate fields with a field separator value. |
|
Log4j |
The activity of this component can be logged using the log4j feature. For more information on this feature, see Talend Studio User Guide. For more information on the log4j logging levels, see the Apache documentation at http://logging.apache.org/log4j/1.2/apidocs/org/apache/log4j/Level.html. |
|
Limitation |
Due to license incompatibility, one or more JARs required to use this component are not provided. You can install the missing JARs for this particular component by clicking the Install button on the Component tab view. |
Warning
The information in this section is only for users that have subscribed to one of
the Talend solutions with Big Data and is not applicable to
Talend Open Studio for Big Data users.
In a Talend Map/Reduce Job, tFileOutputDelimited, as well as the whole Map/Reduce Job using it,
generates native Map/Reduce code. This section presents the specific properties of
tFileOutputDelimited when it is used in that
situation. For further information about a Talend Map/Reduce Job, see the Talend Big Data Getting Started Guide.
Component family |
MapReduce/Output |
|
Basic settings |
Property type |
Either Built-in or Repository. |
|
|
Built-in: No property data stored centrally. |
|
|
Repository: Select the repository file where the properties are stored, to reuse them. The fields that come after are pre-filled in using the fetched data. For further information about the Hadoop connection metadata, see Talend Studio User Guide. |
|
Click this icon to open a database connection wizard and store the database connection parameters you set in the component Basic settings view. For more information about setting up and storing database connection parameters, see Talend Studio User Guide. |
|
|
Schema and Edit |
A schema is a row description; it defines the number of fields to be processed and passed on to the next component. Click Edit schema to make changes to the schema. Note that if you make changes, the schema automatically becomes built-in.
|
|
|
Built-in: The schema will be created and stored locally for this component only. |
|
|
Repository: The schema already exists and is stored in the Repository; hence it can be reused in various projects and Job designs. |
|
Folder |
Browse to, or enter the directory in HDFS where the data you need to use is stored. This path must point to a folder rather than a file, because a Talend Map/Reduce Job needs to write its final part files in this folder. Note that you need to configure the connection to the Hadoop distribution to be used in the Hadoop Configuration tab of the Run view. |
Action |
Select an operation for writing data: Create: Creates a file and writes data in it. Overwrite: Overwrites the file existing in the directory specified in the Folder field. |
|
|
Row separator |
Enter the separator used to identify the end of a row. |
|
Field separator |
Enter a character, a string, or a regular expression to separate fields in the transferred data. |
|
Include Header |
Select this check box to include the column header in the file. |
|
Custom encoding |
You may encounter encoding issues when you process the stored data. In that situation, select this check box to display the Encoding list. Select the encoding from the list or select Custom and define it manually. |
Compress the data |
Select the Compress the data check box to compress the output data. Hadoop provides different compression formats that help reduce the space needed for storing files and speed up data transfer. |
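To illustrate the space saving, the sketch below gzip-compresses repetitive delimited content with the standard java.util.zip API. Hadoop's gzip codec applies the same DEFLATE compression to part files; this is a plain-Java illustration, not the component's code.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

public class CompressDataSketch {
    // Gzip-compress delimited content and return the compressed bytes.
    static byte[] gzip(String content) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
            gz.write(content.getBytes(StandardCharsets.UTF_8));
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Repetitive rows compress very well.
        String rows = "id;name\n".repeat(1000);
        byte[] compressed = gzip(rows);
        System.out.println(rows.length() + " -> " + compressed.length + " bytes");
    }
}
```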
|
Merge result to single file |
Select this check box to merge the final part files into a single file and put that file in a specified directory. Once selecting it, you need to enter the path to, or browse to, the folder you want to store the merged file in. The check boxes that follow are used to manage the source and the target files after the merge.
This option is not available for a Sequence file. |
|
Advanced settings |
Advanced separator (for number) |
Select this check box to change the separators used for numbers. By default, the thousands separator is a comma (,) and the decimal separator is a period (.). This option is not available for a Sequence file. |
|
CSV options |
Select this check box to include CSV specific parameters such as Escape char and Text enclosure. |
|
Enable parallel execution |
Select this check box to perform high-speed data processing, by treating multiple data flows simultaneously. |
|
Global Variables |
ERROR_MESSAGE: the error message generated by the component when an error occurs. This is an After variable and it returns a string. A Flow variable functions during the execution of a component while an After variable functions after the execution of the component. To fill up a field or expression with a variable, press Ctrl + Space to access the variable list and choose the variable to use from it. For further information about variables, see Talend Studio User Guide. |
|
Usage |
In a Talend Map/Reduce Job, it is used as an end component and requires a transformation component as input link. Once a Map/Reduce Job is opened in the workspace, tFileOutputDelimited as well as the whole Map/Reduce component family appears in the Palette of the Studio. Note that in this documentation, unless otherwise explicitly stated, a scenario presents only Standard Jobs, that is to say traditional Talend data integration Jobs. |
|
Hadoop Connection |
You need to use the Hadoop Configuration tab in the Run view to define the connection to a given Hadoop distribution for the whole Job. This connection is effective on a per-Job basis. |
This scenario describes a three-component Job that extracts certain data from a file
holding information about customers, and then writes the
extracted data in a delimited file.
In the following example, we have already stored the input schema under the Metadata node in the Repository tree view. For more information about storing schema metadata
in the Repository, see Talend Studio User Guide.
-
In the Repository tree view, expand
Metadata and File
delimited in succession and then browse to your input schema,
customers, and drop it on the design workspace. A
dialog box displays where you can select the component type you want to
use. -
Click tFileInputDelimited and then
OK to close the dialog box. A tFileInputDelimited component holding the name of
your input schema appears on the design workspace. -
Drop a tMap component and a tFileOutputDelimited component from the Palette to the design workspace.
-
Link the components together using Row >
Main connections.
Configuring the input component
-
Double-click tFileInputDelimited to open
its Basic settings view. All its property
fields are automatically filled in because you defined your input file
locally. -
If you do not define your input file locally in the Repository tree view, fill in the details manually after
selecting Built-in in the Property type list. -
Click the […] button next to the
File Name field and browse to the input
file, customer.csv in this example.Warning
If the path of the file contains some accented characters, you will
get an error message when executing your Job. For more information
regarding the procedures to follow when the support of accented
characters is missing, see the Talend Installation
and Upgrade Guide of the Talend
solution you are using. -
In the Row Separator and Field Separator fields, enter "\n" and ";"
respectively as line and field separators. -
If needed, set the number of lines used as header and the number of lines
used as footer in the corresponding fields and then set a limit for the
number of processed rows.In this example, Header is set to 6 while
Footer and Limit are not set. -
In the Schema field, schema is
automatically set to Repository and your
schema is already defined since you have stored your input file locally for
this example. Otherwise, select Built-in
and click the […] button next to
Edit Schema to open the [Schema] dialog box where you can define the
input schema, and then click OK to close
the dialog box.
Configuring the mapping component
-
In the design workspace, double-click tMap to open its editor.
-
In the tMap editor, click on top of the panel to the right to open the [Add a new output table] dialog box.
-
Enter a name for the table you want to create, row2
in this example. -
Click OK to validate your changes and
close the dialog box. -
In the table to the left, row1, select the first
three lines (Id, CustomerName and
CustomerAddress) and drop them to the table to the
right. -
In the Schema editor view situated in the
lower left corner of the tMap editor,
change the type of RegisterTime to String in the table to the right. -
Click OK to save your changes and close
the editor.
Configuring the output component
-
In the design workspace, double-click tFileOutputDelimited to open its Basic
settings view and define the component properties. -
In the Property Type field, set the type
to Built-in and fill in the fields that
follow manually. -
Click the […] button next to the
File Name field and browse to the
output file you want to write data in,
customerselection.txt in this example. -
In the Row Separator and Field Separator fields, set "\n" and ";"
respectively as row and field separators. -
Select the Include Header check box if
you want to output columns headers as well. -
Click Edit schema to open the schema
dialog box and verify whether the retrieved schema corresponds to the input
schema. If not, click Sync Columns to
retrieve the schema from the preceding component.
-
Press Ctrl+S to save your Job.
-
Press F6 or click Run on the Run tab to
execute the Job.The three specified columns Id,
CustomerName and
CustomerAddress are output in the defined output
file.
For an example of how to use dynamic
schemas with tFileOutputDelimited, see Scenario 4: Writing dynamic columns from a MySQL database to an output file.
Based on the preceding scenario, this scenario saves the filtered data to a local file
using output stream.
-
Drop tJava from the Palette to the design workspace.
-
Connect tJava to tFileInputDelimited using a Trigger > On Subjob OK
connection.
-
Double-click tJava to open its Basic settings view.
-
In the Code area, type in the following
command:

new java.io.File("C:/myFolder").mkdirs();
globalMap.put("out_file", new java.io.FileOutputStream("C:/myFolder/customerselection.txt", false));

Note
In this scenario, the command we use in the Code area of tJava will
create a new folder C:/myFolder where the output
file customerselection.txt will be saved. You can
customize the command in accordance with actual practice. -
Double-click tFileOutputDelimited to open
its Basic settings view. -
Select the Use Output Stream check box to
enable the Output Stream field in which you
can define the output stream using a command. Fill in the Output Stream field with the
following command:

(java.io.OutputStream)globalMap.get("out_file")

Note
You can customize the command in the Output
Stream field by pressing CTRL+SPACE to
select built-in command from the list or type in the command into
the field manually in accordance with actual practice. In this
scenario, the command we use in the Output
Stream field will call the
java.io.OutputStream
class to output the filtered
data stream to a local file which is defined in the Code area of tJava in this scenario. -
Click Sync columns to retrieve the schema
defined in the preceding component. -
Leave the rest of the components as they were in the previous scenario.
-
Press Ctrl+S to save your Job.
-
Press F6 or click Run on the Run tab to
execute the Job.The three specified columns Id,
CustomerName and
CustomerAddress are output in the defined output
file.
For an example of how to use dynamic
schemas with tFileOutputDelimited, see Scenario 4: Writing dynamic columns from a MySQL database to an output file.