August 17, 2023

tAzureStorageGet – Docs for ESB 5.x

tAzureStorageGet


Warning

This component is available in the Palette of Talend Studio only if
you have subscribed to one of the Talend solutions with Big Data.

tAzureStorageGet properties

Component Family

Cloud / Azure Storage

 

Function

tAzureStorageGet connects to a
given Azure storage account and retrieves blobs from a given
container of that account.

Purpose

tAzureStorageGet allows you to
specify filters you want to apply on the virtual hierarchy of the
blobs and write selected blobs in a local folder.

Basic settings

Use an existing connection

Select this check box and in the Component List click the
relevant connection component to reuse the connection details you already defined.

 

Account name

Enter the name of the storage account you need to access. A storage account name can be found
in the Manage Access Keys dashboard of the Microsoft Azure Storage system to be used.

 

Account key

Enter the key associated with the storage account you need to access. Two keys are
available for each account and by default, either of them can be used for this
access.

 

Protocol

Select the protocol for this connection to be created.

 

Container

Enter the name of the container you need to retrieve blobs
from.

 

Local folder

Enter the path, or browse to the folder in which you need to store
the retrieved blobs.

 

Blobs

Complete this table to select the blobs to be retrieved. The
parameters to be provided are:

  • Blob prefix: enter
    the common prefix of the names of the blobs you need to
    retrieve. This prefix allows you to filter the blobs
    which have the specified prefix in their names in the
    given container.

    A blob name contains the virtual hierarchy of the blob itself. This hierarchy is a virtual
    path to that blob and is relative to the container where that blob is stored. For example,
    in a container named photos, the name of a photo blob
    might be 2014/US/Oakland/Talend.jpg.

    For this reason, when you define a prefix, you are actually designating a directory level
    as the blob filter, for example, 2014/ or 2014/US/.

    If you want to select the blobs stored directly beneath the container level, that is to say,
    the blobs without virtual path in their names, remove quotation marks and enter
    null.

  • Include subdirectories: select this check box to
    retrieve all of the sub-folders and the blobs in those
    folders beneath the directory level designated in the
    Blob prefix column.
    If you leave this check box clear, tAzureStorageGet returns only the blobs
    directly beneath that directory level.

  • Create parent directories: select this check box to
    replicate the virtual directory of the retrieved blobs
    in the local folder.

    Note that if you leave this check box clear, the local
    folder must already contain the same directory as the
    retrieved blobs have in the container; otherwise, those
    blobs cannot be retrieved.
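
The selection rules above can be illustrated with a minimal Java sketch. It is not the component's actual implementation: the class BlobPrefixFilter, the method select, and the blob names other than 2014/US/Oakland/Talend.jpg are hypothetical, and treating a null prefix as an empty prefix is an assumption based on the description of null above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not part of Talend or of the Azure SDK.
class BlobPrefixFilter {

    // Returns the blob names that one row of the Blobs table would select.
    // Assumption: a null prefix behaves like an empty prefix (container level).
    static List<String> select(List<String> blobNames, String prefix,
                               boolean includeSubdirectories) {
        String p = (prefix == null) ? "" : prefix;
        List<String> selected = new ArrayList<>();
        for (String name : blobNames) {
            if (!name.startsWith(p)) {
                continue; // the blob is outside the designated directory level
            }
            // What remains after the prefix; a "/" means a deeper virtual folder.
            String rest = name.substring(p.length());
            if (includeSubdirectories || !rest.contains("/")) {
                selected.add(name);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        // Sample names; only the first one comes from the text above.
        List<String> blobs = Arrays.asList(
                "2014/US/Oakland/Talend.jpg", "2014/readme.txt", "logo.png");
        System.out.println(select(blobs, "2014/", false));
        System.out.println(select(blobs, "2014/", true));
        System.out.println(select(blobs, null, false));
    }
}
```

With these sample names, the prefix 2014/ without subdirectories selects only 2014/readme.txt, the same prefix with subdirectories also selects 2014/US/Oakland/Talend.jpg, and a null prefix without subdirectories selects only logo.png.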

 

Die on error

Select this check box to stop the execution of the Job when an error occurs.

Clear the check box to skip any rows on error and complete the process for error-free rows.
When errors are skipped, you can collect the rows on error using a Row > Reject link.

Advanced settings

tStatCatcher Statistics

Select this check box to gather the Job processing metadata at the
Job level as well as at each component level.

Usage

This component is used as a standalone component.

Knowledge about Microsoft Azure Storage is required.

Global Variables

ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable and it returns a string. This
variable functions only if the Die on error check box is
cleared, if the component has this check box.

ACCOUNT_NAME: the account name for accessing the storage.
This is an After variable and it returns a string.

ACCOUNT_KEY: the key associated with the account for
accessing the storage. This is an After variable and it returns a string.

CONTAINER: the container name used in this component.
This is an After variable and it returns a string.

LOCAL_FOLDER: the local directory used in this component.
This is an After variable and it returns a string.

A Flow variable functions during the execution of a component while an After variable
functions after the execution of the component.

To fill in a field or expression with a variable, press Ctrl + Space
to access the variable list and choose the variable to use from it.

For further information about variables, see Talend Studio
User Guide.

Log4j

The activity of this component can be logged using the log4j feature. For more information on this feature, see Talend Studio User Guide.

For more information on the log4j logging levels, see the Apache documentation at http://logging.apache.org/log4j/1.2/apidocs/org/apache/log4j/Level.html.

Limitation

n/a

Scenario: Retrieving files from an Azure Storage container

In this scenario, a five-component Job uses Azure Storage components to write files in
a given Azure Storage system and then retrieve selected files (blobs in terms of Azure
Storage) from that system.

use_case-tazurestorageget1.png

Before replicating this scenario, you must have appropriate rights and permissions to
read and write files in the Azure storage account to be used. For further information,
see Microsoft’s documentation for Azure Storage: http://azure.microsoft.com/en-us/documentation/services/storage/.

The talendcontainer container used in this scenario
was created using tAzureStorageContainerCreate in the
scenario Scenario: Creating a container in Azure Storage.

Linking the components

  1. In the Integration perspective
    of the Studio, create an empty Job, named azureTalend for example, from the Job Designs
    node in the Repository tree view.

    For further information about how to create a Job, see Talend Studio User Guide.

  2. Drop tAzureStorageConnection, tAzureStoragePut, tAzureStorageList, tJava and tAzureStorageGet
    onto the workspace.

  3. Connect the Azure Storage components using Trigger > OnSubjobOk links, and
    connect tAzureStorageList to tJava using the Row > Iterate link.

Connecting to an Azure storage account

  1. Double-click tAzureStorageConnection to
    open its Component view.

    use_case-tazurestoragecreate2.png
  2. In the Account name field, enter the name
    of the storage account to be connected to. In this example, it is talendstorage, an account that has been created
    for demonstration purposes.

  3. In the Account key field, paste the
    primary or the secondary key associated with the storage account to be used.
    These keys can be found in the Manage Access Keys dashboard in the Azure
    Storage system to be connected to.

  4. From the Protocol list, select the
    protocol for the endpoint of the storage account to be used. In this
    example, it is HTTPS.
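
Outside the Studio, the same three settings are commonly combined into an Azure Storage connection string of the form DefaultEndpointsProtocol=...;AccountName=...;AccountKey=.... The Java sketch below only assembles such a string from the values of this scenario. Whether tAzureStorageConnection builds its connection this way internally is an assumption, and the class name, the method name and the placeholder key are hypothetical.

```java
import java.util.Locale;

// Hypothetical illustration only; not the code generated by Talend Studio.
class AzureConnectionStringSketch {

    // Combines protocol, account name and account key into one connection string.
    static String build(String protocol, String accountName, String accountKey) {
        return "DefaultEndpointsProtocol=" + protocol.toLowerCase(Locale.ROOT)
                + ";AccountName=" + accountName
                + ";AccountKey=" + accountKey;
    }

    public static void main(String[] args) {
        // "<your-account-key>" is a placeholder; never hard-code a real key.
        System.out.println(build("HTTPS", "talendstorage", "<your-account-key>"));
    }
}
```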

Writing files in Azure Storage

  1. Double-click tAzureStoragePut to open its
    Component view.

    use_case-tazurestorageget3.png
  2. Select the Use an existing connection
    check box and then select the connection you have configured earlier. In
    this example, it is tAzureStorageConnection_1.

  3. In the Container name field, enter the
    name of the container you need to write files in. In this example, it is
    talendcontainer, a container created
    in the scenario Scenario: Creating a container in Azure Storage.

  4. In the Local folder field, enter the
    path, or browse, to the directory where the files to be uploaded are stored. In
    this scenario, they are screenshots of technical processes stored
    locally in E:/photos. Therefore, enter
    E:/photos; this allows tAzureStoragePut to upload all the files of this
    folder and its sub-folders into the talendcontainer container.

    For demonstration purposes, the example photos are organized as follows in
    the E:/photos folder:

    • Directly beneath the E:/photos
      level:

      components-use_case_triakinput_1.png

      components-use_case_triakinput_2.png

      components-use_case_triakinput_3.png

      components-use_case_triakinput_4.png

    • In the E:/photos/mongodb/step1
      directory:

      components-use_case_tmongodbbulkload_1.png

      components-use_case_tmongodbbulkload_2.png

      components-use_case_tmongodbbulkload_3.png

      components-use_case_tmongodbbulkload_4.png

    • In the E:/photos/mongodb/step2
      directory:

      components-use_case_tmongodbbulkload_5.png

      components-use_case_tmongodbbulkload_6.png

      components-use_case_tmongodbbulkload_7.png

      components-use_case_tmongodbbulkload_8.png

  5. In the Azure Storage folder field, enter
    the directory where you want to write files. This directory will be created
    in the container to be used if it does not exist. In this example, enter
    photos.

    If you enter nothing and leave the default quotation marks as they are, the
    files, together with their local directories, are written directly beneath
    the container level.
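
To make the resulting blob names concrete, the following Java sketch derives a blob name from a local file path and the Azure Storage folder value. It is an illustration under stated assumptions, not the component's implementation: the class and method names are hypothetical, and the naming rule (remote folder plus the path relative to the local folder) is inferred from the prefixes photos/ and photos/mongodb/ used later in this scenario.

```java
// Hypothetical illustration only; the naming rule is inferred from this scenario.
class PutBlobNameSketch {

    // Maps a local file under localRoot to the blob name it would get
    // when uploaded into azureFolder (empty or null means container level).
    static String blobName(String localRoot, String localFile, String azureFolder) {
        String root = localRoot.replace('\\', '/');
        String file = localFile.replace('\\', '/');
        if (!root.endsWith("/")) {
            root += "/";
        }
        if (!file.startsWith(root)) {
            throw new IllegalArgumentException(localFile + " is not under " + localRoot);
        }
        // Path of the file relative to the local folder, for example mongodb/step1/x.png.
        String relative = file.substring(root.length());
        if (azureFolder == null || azureFolder.isEmpty()) {
            return relative;
        }
        String folder = azureFolder.endsWith("/") ? azureFolder : azureFolder + "/";
        return folder + relative;
    }

    public static void main(String[] args) {
        System.out.println(blobName("E:/photos",
                "E:/photos/mongodb/step1/components-use_case_tmongodbbulkload_1.png",
                "photos"));
        System.out.println(blobName("E:/photos",
                "E:/photos/components-use_case_triakinput_1.png", ""));
    }
}
```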

Verifying the file transfer

Configuring tAzureStorageList

  1. Double-click tAzureStorageList to open
    its Component view.

    use_case-tazurestorageget2.png
  2. Select the Use an existing connection
    check box and then select the connection you have configured earlier. In
    this example, it is tAzureStorageConnection_1.

  3. In the Container name field, enter the
    name of the container in which you need to check whether the given files
    exist. In this scenario, it is talendcontainer.

  4. Under the Blob filter table, click the
    [+] button to add one row in the
    table.

  5. In the Prefix column, enter the common
    prefix of the names of the files (blobs) to be checked. This prefix
    represents a virtual directory level you designate as the starting point
    down from which files (blobs) are checked. In this example, it is photos/.

    For further information about blob names, see http://msdn.microsoft.com/en-us/library/dd135715.aspx

  6. In the Include sub-directories column,
    select the check box in the newly added row. This allows tAzureStorageList to check all the files at any
    hierarchical level beneath the designated starting point.

Configuring tJava

  1. Double-click tJava to open its Component view.

    use_case-tazurestorageget4.png
  2. In the Code field, enter

  3. In the Outline panel, which, by default,
    is on the left side of the Component
    view, expand the tAzureStorageList
    node.

    use_case-tazurestorageget5.png
  4. From the Outline panel, drop the
    CONTAINER_BLOB global variable into the
    parentheses in the code in the Component
    view so as to make the code read:
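
The code sample for this step is not reproduced on this page. In Talend Jobs, a component's global variables are typically read from globalMap with a key made of the component label and the variable name, so the statement probably resembles the println call in the sketch below. This is an assumption, not the original sample: the label tAzureStorageList_1, the stand-in globalMap and the sample blob name are hypothetical, and in a real tJava component only the println statement would be entered because the Studio provides globalMap.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-alone harness; in a Job, Talend Studio supplies globalMap.
class TJavaGlobalMapSketch {

    // Stand-in for the globalMap available to the code of a tJava component.
    static final Map<String, Object> globalMap = new HashMap<>();

    // Reads the blob name published by the list component for the current iteration.
    // The key assumes the component is labeled tAzureStorageList_1.
    static String currentBlob() {
        return (String) globalMap.get("tAzureStorageList_1_CONTAINER_BLOB");
    }

    public static void main(String[] args) {
        // Simulates what the list component would publish during one iteration.
        globalMap.put("tAzureStorageList_1_CONTAINER_BLOB",
                "photos/mongodb/step1/components-use_case_tmongodbbulkload_1.png");

        // Likely form of the statement entered in the Code field of tJava.
        System.out.println(((String) globalMap.get("tAzureStorageList_1_CONTAINER_BLOB")));
    }
}
```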

Retrieving selected files

  1. Double-click tAzureStorageGet to open its
    Component view.

    use_case-tazurestorageget6.png
  2. Select the Use an existing connection
    check box and then select the connection you have configured earlier. In
    this example, it is tAzureStorageConnection_1.

  3. In the Container name field, enter the
    name of the container from which you need to retrieve files. In this
    scenario, it is talendcontainer.

  4. In the Local folder field, enter the
    path, or browse, to the directory where you want to put the retrieved files.
    In this example, it is E:/screenshots.

  5. Under the Blob table, click the [+] button to add one row in the table.

  6. In the Prefix column, enter the common
    name prefix of the files (blobs) to be retrieved. In this example, it is
    photos/mongodb/.

  7. In the Include sub-directories column,
    select the check box in the newly added row. This allows tAzureStorageGet to retrieve all the files
    (blobs) beneath the photos/mongodb/
    level.

  8. In the Create parent directories column,
    select the check box in the newly added row to create the same directory in
    the specified local folder as the retrieved blobs have in the
    container.

    Note that having this same directory is necessary for successfully
    retrieving blobs. If you leave this check box clear, then you need to create
    the same directory yourself in the target local folder.
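
The effect of the Create parent directories option can be sketched in plain Java. The example below is hypothetical and is not the component's implementation: it writes an empty file for one blob name into a temporary folder standing in for E:/screenshots, first with and then without creating the parent directories. The class and method names are assumptions made for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration only; not Talend's implementation.
class CreateParentDirectoriesSketch {

    // Writes content to <localFolder>/<blobName>. Each "/" in the blob name
    // becomes one directory level below the local folder.
    static Path store(Path localFolder, String blobName, byte[] content,
                      boolean createParentDirectories) {
        Path target = localFolder.resolve(blobName);
        try {
            if (createParentDirectories) {
                Files.createDirectories(target.getParent());
            }
            // Without the parent directories, this write fails.
            return Files.write(target, content);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Stores one example blob in a fresh temporary folder and returns its path.
    static Path demo(boolean createParentDirectories) {
        try {
            Path localFolder = Files.createTempDirectory("screenshots");
            return store(localFolder,
                    "photos/mongodb/step1/components-use_case_tmongodbbulkload_1.png",
                    new byte[0], createParentDirectories);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path stored = demo(true);
        System.out.println(stored + " exists: " + Files.exists(stored));
        try {
            demo(false);
        } catch (UncheckedIOException e) {
            System.out.println("Without parent directories the write fails: "
                    + e.getCause().getClass().getSimpleName());
        }
    }
}
```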

Executing the Job

  • Press F6 to run this Job.

Once done, the Run view is opened automatically,
where you can check the execution result.

use_case-tazurestorageget7.png

The output shows that the Job returns the list of the blobs with the photos prefix in the container.

This can also be seen in the web console of the Azure storage account:

use_case-tazurestorageget8.png

In the specified local folder, the blobs with the photos/mongodb/ prefix have been retrieved and their prefix
transformed to directories.

use_case-tazurestorageget9.png

Document sourced from Talend: https://help.talend.com