July 30, 2023

tDBFSGet – Docs for ESB 7.x

tDBFSGet

Copies files from a given DBFS (Databricks Filesystem), pastes them into a
user-defined directory and, if need be, renames them.

The DBFS (Databricks Filesystem) components are designed for quick and straightforward data transferring with Databricks. If you need to handle more sophisticated scenarios for optimal performance, use Spark Jobs with Databricks.

tDBFSGet Standard properties

These properties are used to configure tDBFSGet running in the Standard Job framework.

The Standard
tDBFSGet component belongs to the Big Data and the File families.

The component in this framework is available in all Talend products with Big Data
and in Talend Data Fabric.

Basic settings

Property type

Either Built-In or Repository.

Built-In: No property data stored centrally.

Repository: Select the repository file where the
properties are stored.

Use an existing connection

Select this check box and, in the Component List, select the connection component
whose connection details you want to reuse.

Note that when a Job contains a parent Job and a child Job, the
Component List presents only the
connection components at the same Job level.

Endpoint

In the Endpoint
field, enter the URL address of your Azure Databricks workspace.
This URL can be found in the Overview blade
of your Databricks workspace page on your Azure portal. For example,
this URL could look like https://westeurope.azuredatabricks.net.

Token

Click the […] button
next to the Token field to enter the
authentication token generated for your Databricks user account. You
can generate or find this token on the User
settings
page of your Databricks workspace. For
further information, see Token management from the
Azure documentation.
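As an illustration, the same Endpoint and Token values can be used to call the Databricks DBFS REST API directly. The sketch below only builds the request (it does not send it); the endpoint, token, and path values are placeholders, not real credentials:

```python
import urllib.parse
import urllib.request

def build_dbfs_list_request(endpoint, token, dbfs_path):
    """Build (but do not send) a DBFS 'list' request for a workspace.

    endpoint and token correspond to the Endpoint and Token fields of
    tDBFSGet; the values passed below are placeholders.
    """
    query = urllib.parse.urlencode({"path": dbfs_path})
    url = f"{endpoint.rstrip('/')}/api/2.0/dbfs/list?{query}"
    # DBFS REST calls authenticate with a Bearer token header.
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )

req = build_dbfs_list_request(
    "https://westeurope.azuredatabricks.net",  # Endpoint field value
    "dapiXXXXXXXXXXXXXXXX",                    # placeholder token
    "/FileStore/tables",                       # DBFS directory
)
print(req.full_url)
```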

DBFS directory

In the DBFS directory field, enter the path pointing
to the data to be used in the DBFS file system.

Local directory

Browse to, or enter the local directory to store the files copied from
DBFS.

Overwrite file

Select whether to overwrite an existing file with the newly copied one.

Include subdirectories

Select this check box if the selected input source type includes
sub-directories.

Files

In the Files area, the fields to be completed are:

File mask: type in the name of the files to be selected from
DBFS. Regular expressions are supported.

New name: give a new name to the obtained file.
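A minimal sketch of how a file mask and new name could be applied when selecting files. The mask and file names are examples, and the numeric-suffix handling of multiple matches is an assumption of this sketch, not the component's documented behaviour:

```python
import re

def select_and_rename(file_names, file_mask, new_name=None):
    """Return (original, target) name pairs for files matching the mask.

    file_mask is treated as a regular expression, as in the File mask
    field. When new_name is given, matches after the first get a numeric
    suffix so targets stay unique (an assumption of this sketch).
    """
    selected = [n for n in file_names if re.fullmatch(file_mask, n)]
    if new_name is None:
        return [(n, n) for n in selected]
    return [
        (n, new_name if i == 0 else f"{new_name}.{i}")
        for i, n in enumerate(selected)
    ]

pairs = select_and_rename(
    ["sales_2023.csv", "sales_2022.csv", "readme.txt"],
    r"sales_\d{4}\.csv",   # example regex file mask
    new_name="sales.csv",  # example New name value
)
```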

Die on error

Select the check box to stop the execution of the Job when an error
occurs.

Clear the check box to skip any rows on error and complete the
process for error-free rows.

Advanced settings

tStatCatcher Statistics

Select this check box to gather the Job processing metadata at the Job level
as well as at each component level.

Usage

Usage rule

This component combines DBFS connection and data extraction, and is thus used as a
single-component subJob to copy data from DBFS to a user-defined local
directory.

It runs standalone and does not generate an input or output flow for other components. It is often connected to the rest of the Job using an OnSubjobOk or OnComponentOk link, depending on the context.


Document from Talend, https://help.talend.com