
tAzureAdlsGen2Output

Uploads incoming data to an ADLS Gen2 file system of an Azure storage
account in the specified format.

tAzureAdlsGen2Output Standard properties

These properties are used to configure tAzureAdlsGen2Output running in the Standard Job framework.

The Standard
tAzureAdlsGen2Output component belongs to the Cloud family.

The component in this framework is available in all subscription-based Talend products with Big Data
and Talend Data Fabric.

Basic settings

Property Type

Select the way the connection details
will be set.

  • Built-In: The connection details will be set
    locally for this component. You need to specify the values for all
    related connection properties manually.

  • Repository: The connection details stored
    centrally in Repository > Metadata will be reused by this component. You need to click
    the […] button next to it and in the pop-up
    Repository Content dialog box, select the
    connection details to be reused, and all related connection
    properties will be automatically filled in.

Schema and Edit schema

A schema is a row description. It defines the number of fields
(columns) to be processed and passed on to the next component. When you create a Spark
Job, avoid the reserved word line when naming the
fields.

  • Built-In: You create and store the schema locally for this component
    only.

  • Repository: You have already created the schema and stored it in the
    Repository. You can reuse it in various projects and Job designs.

Click Edit schema to make changes to the schema.

Note: If you make changes, the schema automatically becomes built-in.

  • View schema: choose this
    option to view the schema only.

  • Change to built-in property:
    choose this option to change the schema to Built-in for local changes.

  • Update repository connection:
    choose this option to change the schema stored in the repository and decide whether
    to propagate the changes to all the Jobs upon completion. If you just want to
    propagate the changes to the current Job, you can select No upon completion and choose this schema metadata
    again in the Repository Content
    window.

Sync columns

Click this button to retrieve the
schema from the previous component connected in the Job.

Authentication method

Select one of the following
authentication methods from the drop-down list: Shared key or Shared access signature.

Account name

Enter the name of the Data Lake
Storage account you need to access. Ensure that the administrator of the
system has granted you the appropriate access permissions to this
account.

Endpoint suffix

Enter the Azure Storage service
endpoint.

The combination of the account name and the
Azure Storage service endpoint forms the endpoint of the storage
account.
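
For example, with the (illustrative) account name mystorageaccount and the endpoint suffix dfs.core.windows.net, the resulting storage account endpoint is https://mystorageaccount.dfs.core.windows.net.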

Shared key

Enter the key associated with the
storage account you need to access. Two keys are available for each account
and by default, either of them can be used for this access. To know how to
get your key, read Manage a storage account.

This field is available if you select Shared key from the Authentication method drop-down list.

SAS token

Enter your account SAS token. You can
get the SAS token for each allowed service on the Microsoft Azure portal
after generating the SAS. The SAS token format is
https://<$storagename>.<$service>.core.windows.net/<$sastoken>,
where <$storagename> is the storage account name, <$service> is the
allowed service name (blob, file, queue, or table), and <$sastoken> is
the SAS token value. For more information, read Constructing the Account
SAS URI.

This field is available if you select
Shared access signature from the
Authentication method drop-down
list.
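
As an illustration only, an account SAS URI for the Blob service might look like https://mystorageaccount.blob.core.windows.net/?sv=2020-08-04&ss=b&sig=... (the account name and token values are placeholders, and the token is truncated).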

Check connection

Click this button to validate the
connection parameters provided.

Filesystem

Enter the name of the target Blob
container.

You can also click the button to
the right of this field and select the desired Blob container from the list
in the dialog box.

Blobs Path

Enter the path to the target
blobs.

Format

Set the format for the incoming data.
Currently, the following formats are supported: CSV, AVRO, JSON, and Parquet.

Field Delimiter

Set the field delimiter. You can
select Semicolon, Comma, Tabulation, and
Space from the drop-down list; you can
also select Other and enter your own in
the Custom field delimiter field.

Record Separator

Set the record separator. You can
select LF, CR, and CRLF from the
drop-down list; you can also select Other
and enter your own in the Custom Record Separator field.

Text Enclosure Character

Enter the character used to enclose
text.

Escape character

Enter the escape character.

Header

Select this check box to insert a header row into the data. The schema column
names will be used as column headers.
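
For example, with Semicolon as the field delimiter, CRLF as the record separator, a double quotation mark as the text enclosure character, and the Header option selected, a two-column schema (id, name) would produce output resembling the following (values illustrative):

    id;name
    1;"Apple"
    2;"Banana"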

File Encoding

Select the file encoding from the
drop-down list.

Advanced settings

tStatCatcher Statistics

Select this check box to gather the Job processing metadata at the Job level
as well as at each component level.

Max batch size

Set the maximum number of lines
allowed in each batch.

Do not change the default
value unless you are facing performance issues. Increasing the batch size
can improve performance, but a value that is too high could cause Job
failures.

Blob Template Name

Enter a string to use as the name prefix for the generated Blob files. Each
generated Blob file is named with this prefix followed by a generated string.

Global Variables

ERROR_MESSAGE

The error message generated by the component when an error occurs. This
is an After variable and it returns a string.

NB_LINE

The number of rows successfully processed. This is an After variable and it returns
an integer.
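
For example, these variables can be read in a tJava component placed after this component through the Job's globalMap. The component name tAzureAdlsGen2Output_1 below is illustrative; use the name shown in your own Job:

    // tJava code: read the After variables from the globalMap.
    // "tAzureAdlsGen2Output_1" is a hypothetical component name.
    Integer nbLine = (Integer) globalMap.get("tAzureAdlsGen2Output_1_NB_LINE");
    String errorMessage = (String) globalMap.get("tAzureAdlsGen2Output_1_ERROR_MESSAGE");
    System.out.println("Rows uploaded: " + nbLine);
    if (errorMessage != null) {
        System.out.println("Upload error: " + errorMessage);
    }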

Usage

Usage rule

This component is usually used as an end component of a Job or
subJob and it always needs an input link.
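
As a point of reference, the following is a minimal sketch of the equivalent upload operation using the Azure SDK for Java (azure-storage-file-datalake). It is not the component's internal implementation; Shared key authentication is assumed, and the account name, shared key, filesystem, and path are placeholders.

    import com.azure.storage.common.StorageSharedKeyCredential;
    import com.azure.storage.file.datalake.DataLakeFileClient;
    import com.azure.storage.file.datalake.DataLakeFileSystemClient;
    import com.azure.storage.file.datalake.DataLakeServiceClient;
    import com.azure.storage.file.datalake.DataLakeServiceClientBuilder;

    import java.io.ByteArrayInputStream;
    import java.nio.charset.StandardCharsets;

    public class AdlsGen2UploadSketch {
        public static void main(String[] args) {
            String accountName = "mystorageaccount";        // placeholder account name
            String endpointSuffix = "dfs.core.windows.net"; // endpoint suffix for ADLS Gen2
            String sharedKey = "<shared-key>";              // placeholder shared key

            // The account name plus the endpoint suffix form the storage account endpoint.
            DataLakeServiceClient service = new DataLakeServiceClientBuilder()
                    .endpoint("https://" + accountName + "." + endpointSuffix)
                    .credential(new StorageSharedKeyCredential(accountName, sharedKey))
                    .buildClient();

            // Filesystem and Blobs Path correspond to the component's Basic settings.
            DataLakeFileSystemClient fs = service.getFileSystemClient("myfilesystem");
            DataLakeFileClient file = fs.getFileClient("mydir/data-sample.csv");

            // CSV content matching the settings used in the scenario below:
            // semicolon delimiter, CRLF record separator, header row.
            byte[] csv = "id;name\r\n1;Apple\r\n2;Banana\r\n".getBytes(StandardCharsets.UTF_8);
            file.create(true);                                         // create, overwriting if present
            file.append(new ByteArrayInputStream(csv), 0, csv.length); // stage the data
            file.flush(csv.length, true);                              // commit the upload
        }
    }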

Accessing Azure ADLS Gen2 storage

This scenario demonstrates the use of the
tAzureAdlsGen2Output and tAzureAdlsGen2Input
components. In the first subJob, a tFixedFlowInput component passes
data to tAzureAdlsGen2Output, which then uploads the data to Azure ADLS
Gen2 storage; in the second subJob, tAzureAdlsGen2Input reads the data
and passes it to tLogRow.

In this scenario, the following data is uploaded and then retrieved.
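
The original sample data is not reproduced here; for this walkthrough, the following illustrative values, matching the id and name schema configured below, are assumed:

    1;Apple
    2;Banana
    3;Cherry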

This scenario requires an Azure storage user
account with permissions for reading and writing files.

Optionally, you can monitor the data using Microsoft Azure Storage
Explorer, a utility for managing your Azure storage resources. Check Azure Storage Explorer for related information.

Accessing Azure ADLS Gen2 storage: establishing the Job

  1. Create a standard Job and drop tFixedFlowInput,
    tAzureAdlsGen2Output, tAzureAdlsGen2Input, and tLogRow onto the workspace.
  2. Connect tFixedFlowInput and
    tAzureAdlsGen2Output using the Row > Main link.
  3. Connect tAzureAdlsGen2Input and
    tLogRow using the Row > Main link.
  4. Connect tFixedFlowInput and
    tAzureAdlsGen2Input using the Trigger > On Subjob Ok link.

    [Figure: the Job design in the workspace (tAzureAdlsGen2Output_1.png)]

Accessing Azure ADLS Gen2 storage: setting up the Job

  1. In the Basic settings
    view of tFixedFlowInput:

    • Click the Edit schema button and add two columns: id (type Integer) and name (type String).
    • Select Use Inline Content(delimited file) and enter the sample data shown above into the Content field.
    • Leave other options as they are.
  2. In the Basic settings
    view of tAzureAdlsGen2Output:

    • Click the Edit schema button and add two columns: id (type Integer) and name (type String).
    • Provide your Azure storage user account credentials in the Authentication method, Account name, Endpoint suffix, and Shared key fields.
    • Validate your Azure storage user account by clicking Check connection.
    • Enter the name of an existing Blob container in Filesystem. You can also click the button to the right of this field and
      select the Blob container from the list in the dialog box.
    • In Blobs Path,
      enter the name of the directory where you want to put the data.
    • Select CSV for Format; Semicolon for Field Delimiter; and CRLF for Record Separator. Select the Header option.
    • Leave other options as they are.
  3. In the Advanced settings
    view of tAzureAdlsGen2Output, enter the
    prefix for the Blob files generated in the Blob Template Name field (data- in this example).
  4. Do exactly the same as described in step 2 for the
    tAzureAdlsGen2Input component. Be sure to propagate the schema to the subsequent
    component when prompted.
  5. In the Basic settings view of tLogRow:

    • Select Table (print values in cells of a table).
    • Leave other options as they are.

Accessing Azure ADLS Gen2 storage: executing the Job

  1. Press F6 to run the Job.
  2. Check the result in the Run console.

    [Figure: the Job execution result in the Run console (tAzureAdlsGen2Output_2.png)]
  3. (Optional) Check the Blob files generated using Microsoft Azure Storage Explorer. See Get started with Storage Explorer
    for related information.
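
With the illustrative data assumed earlier, the table printed by tLogRow in the Run console would resemble the following (approximate rendering):

    .--+------.
    |id|name  |
    |=-+------|
    |1 |Apple |
    |2 |Banana|
    |3 |Cherry|
    '--+------'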
