Storage
|
To connect to an HDFS installation, select the Define a storage configuration component
check box and then select the name of the component to use from those
available in the drop-down list.
This option requires you to have previously configured the
connection to the HDFS installation to be used, as described in the
documentation for the tHDFSConfiguration component.
If you leave the Define a
storage configuration component check box unselected,
you can only convert files locally.
|
Configure Component
|
To configure the component, click the […] button and, in the Component Configuration window, perform
the following actions.
-
Click the Select button next to the Record structure field. In the Select a Structure
dialog box that opens, select the structure you want to use when
converting your file, and then click OK.
This structure must have been previously created
in Talend Data Mapper.
-
Select the Input
Representation to use from the drop-down
list.
Supported input formats are Avro, COBOL, EDI,
Flat, IDocs, JSON and XML.
-
Select the Output
Representation to use from the drop-down list.
The choices available for the Output representation depend on
what you choose for the Input representation.
Supported output formats are Avro, Flat, JSON and
XML.
-
Click Next.
-
Tell the component where each new record begins.
To do so, you need to fully
understand the structure of your data.
How you do this depends on the
input representation being used, and you are presented with
one of the following options.
-
Select an appropriate record delimiter
for your data. Note that you must specify this value
without quotes.
-
Separator
lets you specify a separator indicator, such as
\n, to identify a new
line.
Supported indicators are \n
for a Unix-type new line,
\r\n for Windows and
\r for Mac, and \t for tab characters.
-
Start/End
with lets you specify the initial
characters that indicate a new record, such as <root, or the characters
that indicate where a record ends. This can also
be a regular expression.
Start with
also supports new lines:
\n for a Unix-type new line,
\r\n for Windows and
\r for Mac, and \t for
tab characters.
-
Sample File: To test the
signature with a sample file, click the
[…] button, browse to the
file you want to use as a sample, click
Open, and then click
Run to test your
sample.
Testing the signature lets you check
that the total number of records and their minimum
and maximum lengths correspond to what you expect
based on your knowledge of the data. This step
assumes you have a local subset of your data to
use as a sample.
-
If your input
representation is COBOL or Flat with positional and/or
binary encoding properties, define the signature for the
input record structure:
-
Input Record
root corresponds to the root element
in your input record.
-
Minimum Record
Size corresponds to the size in bytes
of the smallest record. If you set this value too
low, you may encounter performance issues, since
the component will perform more checks than
necessary when looking for a new record.
-
Maximum Record
Size corresponds to the size in bytes
of the largest record, and is used to determine
how much memory is allocated to read the
input.
- Maximum Block Size
(BLKSIZE) corresponds to the size in
bytes of the largest block in Variable Blocked
files. If you do not know the exact value, you can
enter 32760, which is the
maximum BLKSIZE.
With the Variable Blocked
signature, each block is extracted as a Spark
record. Each map execution processes an entire
block rather than an individual COBOL record, as is
the case with other COBOL signatures.
-
Sample from
Workspace or Sample from File
System: To test the signature with a
sample file, click the […] button, and then
browse to the file you want to use as a sample.
Testing the signature lets you
check that the total number of records and their
minimum and maximum lengths correspond to what you
expect based on your knowledge of the data. This
step assumes you have a local subset of your data
to use as a sample.
-
Footer
Size corresponds to the size in bytes
of the footer, if any. At runtime, the footer will
be ignored rather than being mistakenly included
in the last record. Leave this field empty if
there is no footer.
-
Click the
Next button to open the Signature
Parameters window, select the fields
that define the signature of your record input
structure (that is, the fields that identify where a new record
begins), update the Operation and
Value columns as appropriate, and
then click Next.
-
In the Record Signature
Test window that opens, check that your
records are correctly delineated by scrolling
through them with the Back and Next buttons and performing a visual
check, and then click Finish.
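The Separator and Start with options, and the count, minimum length and maximum length reported by the sample-file test, can be illustrated with the small Python sketch below. It is not Talend Data Mapper's implementation: the function names are hypothetical, and the sketch works on decoded text, so lengths are counted in characters rather than bytes.

```python
import re


def split_by_separator(text: str, separator: str) -> list[str]:
    """Split text into records at each occurrence of a literal separator."""
    # Empty pieces (for example after a trailing newline) are dropped.
    return [record for record in text.split(separator) if record]


def split_by_start(text: str, start_pattern: str) -> list[str]:
    """Begin a new record wherever the regular expression start_pattern matches."""
    starts = [match.start() for match in re.finditer(start_pattern, text)]
    # Each record runs from one match to the next match, or to the end of the text.
    # Any text before the first match is not part of a record.
    ends = starts[1:] + [len(text)]
    return [text[start:end] for start, end in zip(starts, ends)]


def signature_stats(records: list[str]) -> tuple[int, int, int]:
    """Return (number of records, minimum length, maximum length)."""
    if not records:
        return 0, 0, 0
    lengths = [len(record) for record in records]
    return len(records), min(lengths), max(lengths)


if __name__ == "__main__":
    sample = "<root>a</root>\n<root>bb</root>\n"
    print(signature_stats(split_by_start(sample, r"<root")))  # (2, 15, 16)
    print(split_by_separator("a,b\r\nccc\r\n", "\r\n"))        # ['a,b', 'ccc']
```

If the count or the lengths differ from what you expect for your sample, the delimiter or start pattern is probably wrong for your data.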
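The Variable Blocked note (one block per Spark record, possibly containing several COBOL records) can be sketched in Python as follows. This assumes the standard IBM variable-blocked layout: a 4-byte Block Descriptor Word (BDW) before each block and a 4-byte Record Descriptor Word (RDW) before each record, each starting with a big-endian length that includes the descriptor word itself. Large-block and spanned-record extensions are ignored. The function names are hypothetical, and this is not the component's own code.

```python
import struct

MAX_BLKSIZE = 32760  # largest standard BLKSIZE value


def split_vb_blocks(data: bytes, max_block_size: int = MAX_BLKSIZE) -> list[bytes]:
    """Split a Variable Blocked byte stream into whole blocks.

    Assumes each block starts with a 4-byte BDW whose first two bytes are
    the big-endian block length, BDW included.
    """
    blocks = []
    pos = 0
    while pos < len(data):
        if pos + 4 > len(data):
            raise ValueError(f"truncated BDW at offset {pos}")
        (block_len,) = struct.unpack(">H", data[pos:pos + 2])
        if block_len < 4 or block_len > max_block_size or pos + block_len > len(data):
            raise ValueError(f"invalid block length {block_len} at offset {pos}")
        blocks.append(data[pos:pos + block_len])  # one whole block per unit of work
        pos += block_len
    return blocks


def split_vb_records(block: bytes) -> list[bytes]:
    """Split one block into record payloads using each record's RDW."""
    records = []
    pos = 4  # skip the BDW
    while pos < len(block):
        if pos + 4 > len(block):
            raise ValueError(f"truncated RDW at offset {pos}")
        (rec_len,) = struct.unpack(">H", block[pos:pos + 2])
        if rec_len < 4 or pos + rec_len > len(block):
            raise ValueError(f"invalid record length {rec_len} at offset {pos}")
        records.append(block[pos + 4:pos + rec_len])  # payload without the RDW
        pos += rec_len
    return records


if __name__ == "__main__":
    # One block: BDW (length 20) + RDW (length 7) "ABC" + RDW (length 9) "DEFGH".
    block = b"\x00\x14\x00\x00" + b"\x00\x07\x00\x00ABC" + b"\x00\x09\x00\x00DEFGH"
    blocks = split_vb_blocks(block + block)
    print(len(blocks))                  # 2
    print(split_vb_records(blocks[0]))  # [b'ABC', b'DEFGH']
```

Rejecting any block longer than max_block_size is why an accurate Maximum Block Size matters. Entering 32760 is a safe upper bound for standard blocks.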
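The effect of Footer Size can be seen most easily on the simplest positional layout. The Python sketch below assumes fixed-length records, so the minimum and maximum record sizes are equal. This is a simplification: the component itself locates record boundaries from the signature fields you select. The function name is hypothetical.

```python
def split_fixed_records(data: bytes, record_size: int, footer_size: int = 0) -> list[bytes]:
    """Cut fixed-length records from data after discarding a trailing footer."""
    if record_size <= 0:
        raise ValueError("record_size must be positive")
    if not 0 <= footer_size <= len(data):
        raise ValueError("footer_size must be between 0 and len(data)")
    # Drop the footer first so its bytes are never appended to the last record.
    body = data[:len(data) - footer_size]
    if len(body) % record_size:
        raise ValueError("data without footer is not a whole number of records")
    return [body[i:i + record_size] for i in range(0, len(body), record_size)]


if __name__ == "__main__":
    data = b"AAAABBBBCCCCEND"  # three 4-byte records followed by a 3-byte footer
    print(split_fixed_records(data, record_size=4, footer_size=3))
    # [b'AAAA', b'BBBB', b'CCCC']
```

With footer_size left at 0, the same input raises an error because the 3 trailing footer bytes cannot form a record. This is the kind of mistake the Footer Size field prevents.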
|