Storage
|
To connect to an HDFS installation, select the Define a storage configuration component
check box and then select the name of the component to use from those
available in the drop-down list.
This option requires you to have previously configured the
connection to the HDFS installation to be used, as described in the
documentation for the tHDFSConfiguration component.
If you leave the Define a
storage configuration component check box unselected,
you can only convert files locally.
|
Configure Component
|
Before you configure this component, you must have already
added a downstream component, linked it to the tHMapInput component, and retrieved the
schema from the downstream component.
To configure the component, click the […] button and, in the Component Configuration window, perform
the following actions.
-
Click the Select button next to the Record structure field and
then, in the Select a
Structure dialog box that opens, select the
map you want to use and then click OK.
This structure must have been previously
created in Talend Data Mapper.
-
Select the Input
Representation to use from the drop-down
list.
Supported input formats are Avro, COBOL, EDI,
Flat, IDocs, JSON and XML.
-
Click Next.
-
Tell the component where each new record
begins. To do so, you need to
fully understand the structure of your data.
How you do this depends on
the input representation being used; you are
presented with one of the following options.
-
Select an appropriate record
delimiter for your data. Note that you must specify
this value without quotes.
-
Separator
lets you specify a separator indicator, such as
\n, to identify a new
line.
Supported indicators are
\n for a Unix-type new line,
\r\n for Windows, \r for Mac, and \t for tab characters.
-
Start/End
with lets you specify the initial
characters that indicate a new record, such as <root, or the characters
that indicate where a record ends.
Start with
also supports new lines:
\n for a Unix-type new line,
\r\n for Windows, \r for Mac, and \t for
tab characters.
Select the Regular
Expression check box if you wish to
enter a regular expression to match the start of a
record. When you select XML or JSON, this check
box is selected by default and a pre-configured
regular expression is provided.
-
Sample File: To test the
signature with a sample file, click the
[…] button, browse to the
file you want to use as a sample, click
Open, and then click
Run to test your
sample.
Testing the signature lets you check
that the total number of records and their minimum
and maximum length correspond to what you expect
based on your knowledge of the data. This step
assumes you have a local subset of your data to
use as a sample.
-
Click Finish.
-
If your input representation is COBOL
or Flat with positional and/or binary encoding
properties, define the signature for the input
record structure:
-
Input Record root
corresponds to the root element in your input
record.
-
Minimum Record
Size corresponds to the size in bytes
of the smallest record. If you set this value too
low, you may encounter performance issues, since
the component will perform more checks than
necessary when looking for a new record.
-
Maximum Record
Size corresponds to the size in bytes
of the largest record, and is used to determine
how much memory is allocated to read the
input.
-
Sample from Workspace or
Sample from File System: To
test the signature with a sample file, click the
[…]
button, and then browse to the file you want to
use as a sample.
Testing the signature lets you
check that the total number of records and their
minimum and maximum length correspond to what you
expect based on your knowledge of the data. This
step assumes you have a local subset of your data
to use as a sample.
-
Footer Size
corresponds to the size in bytes of the footer, if
any. At runtime, the footer will be ignored rather
than being mistakenly included in the last record.
Leave this field empty if there is no footer.
-
Click the Next button to open
the Signature
Parameters window, select the fields
that define the signature of your record input
structure (that is, the fields that identify where a new record
begins), update the Operation and Value columns as
appropriate, and then click Next.
-
In the Record
Signature Test window that opens, check
that your records are correctly delineated by
scrolling through them with the
Back and
Next buttons and performing
a visual check, and then click
Finish.
-
Map the elements from the input structure to
the output structure in the new map that opens, and then
press Ctrl+S to save
your map.
For more information on creating maps, see the
Talend Data Mapper User Guide.
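The record-delimiting options and the signature test described above can be approximated locally on a sample file before you run the Job. The following Python sketch is only an illustration under stated assumptions, not the Talend Data Mapper implementation: the helper names `split_records` and `signature_stats` are hypothetical, separators are given as Python byte strings (such as `b"\n"`) rather than typed without quotes as in the dialog box, and record length is assumed to be measured in bytes, excluding the separator.

```python
import re
from typing import List, Optional, Tuple


def split_records(data: bytes, separator: Optional[bytes] = None,
                  start_with: Optional[bytes] = None,
                  footer_size: int = 0) -> List[bytes]:
    """Split sample data into records (hypothetical helper).

    Give exactly one of `separator`, a literal delimiter such as b"\\n"
    or b"\\r\\n", or `start_with`, a regular expression matching the start
    of a record. The last `footer_size` bytes are dropped first so that a
    footer is not counted as part of the last record.
    """
    if (separator is None) == (start_with is None):
        raise ValueError("give exactly one of separator or start_with")
    if footer_size > 0:
        data = data[:-footer_size]
    if separator is not None:
        # Separator mode: drop empty pieces, such as the one that follows
        # a trailing separator.
        return [r for r in data.split(separator) if r]
    # Start-with mode: each record runs from one match to the next match
    # (or to the end of the data); bytes before the first match are ignored.
    starts = [m.start() for m in re.finditer(start_with, data)]
    bounds = starts + [len(data)]
    return [data[a:b] for a, b in zip(bounds, bounds[1:])]


def signature_stats(records: List[bytes]) -> Tuple[int, int, int]:
    """Return (record count, minimum length, maximum length) in bytes."""
    if not records:
        return (0, 0, 0)
    lengths = [len(r) for r in records]
    return (len(records), min(lengths), max(lengths))


if __name__ == "__main__":
    unix_sample = b"a,1\nbb,22\nccc,333\n"
    print(signature_stats(split_records(unix_sample, separator=b"\n")))
    # (3, 3, 7)
    xml_sample = b"<root><a/></root><root><b/></root>"
    print(signature_stats(split_records(xml_sample, start_with=rb"<root")))
    # (2, 17, 17)
```

If the count or the minimum and maximum lengths differ sharply from what you expect, the separator or start pattern is probably wrong for your data. The footer trim is kept as a separate step so that it corresponds to the Footer Size field.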
|
Die on error
|
This check box is selected by default.
Clear the check box to skip any rows on error and complete the
process for error-free rows.
If you clear the
check box, you can use either or both of the following options:
-
Connect the tHMapInput component to an output
component, for example tAvroOutput, using a Row > Rejects connection. In the output component,
add fixed metadata with the following columns:
- inputRecord: contains the rejected
input record during the transformation.
- recordId: refers to the record
identifier. For a text or binary input, the recordId
specifies the start offset of the record in the input
file. For an Avro input, the recordId specifies the
timestamp when the input was processed.
- errorMessage: contains the
transformation status with details of the cause of the
transformation error.
-
You can retrieve the rejected records in a file.
This feature is triggered by either of the following mechanisms: (1) a context
variable (talend_transform_reject_file_path) or (2) a
system variable set in the Advanced job parameters (spark.hadoop.talend.transform.reject.file.path).
When you set the file path on the Hadoop
Distributed File System (HDFS), no further configuration is
needed. When you set the file path on Amazon S3 or any other
Hadoop-compatible file system, add the associated Spark
advanced configuration parameter.
In case of errors at runtime, tHMapInput checks whether one of these
mechanisms is set and, if so, appends the rejected record to the
designated file. The reject file contains the
concatenation of the rejected records without any additional
metadata.
If the file system you use does not support
appending to a file, a separate file is created for each
rejected record. The file name uses the provided file path as the prefix
and adds a suffix made up of the record's offset in the input file and the
size of the rejected record.
Note: Any errors that occur while storing a rejected record are logged, and
processing continues.
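The reject handling described above can be sketched as follows. This is a minimal local-file illustration in Python, not the component's implementation: `RejectRow`, `resolve_reject_path`, and `store_reject` are hypothetical names, the precedence between the context variable and the Spark property is an assumption, and the `_<offset>_<size>` suffix used when appending is not possible is also an assumption, because the exact suffix format is not documented here.

```python
import logging
import os
import tempfile
from dataclasses import dataclass
from typing import Dict, Optional

log = logging.getLogger(__name__)


@dataclass
class RejectRow:
    """One row on the Rejects connection (column names from this page)."""
    inputRecord: bytes   # the rejected input record
    recordId: int        # start offset (text/binary) or timestamp (Avro)
    errorMessage: str    # cause of the transformation error


def resolve_reject_path(context: Dict[str, str],
                        spark_conf: Dict[str, str]) -> Optional[str]:
    """Return the reject file path, or None if neither mechanism is set.

    Assumption: the context variable wins when both are defined.
    """
    return (context.get("talend_transform_reject_file_path")
            or spark_conf.get("spark.hadoop.talend.transform.reject.file.path"))


def store_reject(reject_path: str, row: RejectRow, can_append: bool) -> str:
    """Write the raw rejected record, with no metadata; return the target path."""
    if can_append:
        target = reject_path
    else:
        # Assumed suffix format: _<offset in input file>_<record size>.
        target = f"{reject_path}_{row.recordId}_{len(row.inputRecord)}"
    try:
        with open(target, "ab") as f:
            f.write(row.inputRecord)
    except OSError as exc:
        # A failure to store a reject is logged; processing continues.
        log.warning("could not store reject in %s: %s", target, exc)
    return target


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = resolve_reject_path(
            {}, {"spark.hadoop.talend.transform.reject.file.path":
                 os.path.join(tmp, "rejects")})
        store_reject(path, RejectRow(b"bad1\n", 0, "parse error"), True)
        store_reject(path, RejectRow(b"bad2\n", 5, "parse error"), True)
        with open(path, "rb") as f:
            print(f.read())  # b'bad1\nbad2\n'
```

The sketch writes only the raw record bytes, which matches the statement that the reject file carries no additional metadata. The recordId and errorMessage values are what the Rejects connection exposes.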
|