Using the tDataprepRun component to promote a Job leveraging a preparation across environments
This scenario applies only to a subscription-based Talend solution.
The tDataprepRun component allows you to reuse an existing preparation made in Talend Data Preparation directly in a data integration, Spark Batch
or Spark Streaming Job. In other words, you can operationalize the process of applying a
preparation to input data that has the same schema as the dataset the preparation was built on.
A good practice when using Talend Data Preparation is to set up at least two environments to
work with, for example a development environment and a production environment. When a preparation is ready on
the development environment, you can use the Import/Export Preparation
feature to promote it to the production environment, which has a different URL. For more
information, see Promoting a preparation across environments.
Following this logic, you will likely end up with a preparation that has the same name
on different environments. However, preparations are not identified by their
name but by a technical id, such as
prepid=faf4fe3e-3cec-4550-ae0b-f1ce108f83d5. As a consequence, what you
really have is two distinct preparations, each with its own id.
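To make the difference between name and id concrete, the following minimal Java sketch models each environment as a plain map from technical id to preparation name. It is purely illustrative and calls no Talend API: only the development id comes from the example above, while the production id and the class and method names are hypothetical.

```java
import java.util.Map;

// Hypothetical model: each Talend Data Preparation environment is reduced to
// a map from technical preparation id to display name. No Talend API is used.
public class PreparationIds {

    // Development environment: id taken from the documentation example.
    static final Map<String, String> DEV = Map.of(
            "faf4fe3e-3cec-4550-ae0b-f1ce108f83d5", "customers_leads");

    // Production environment: same name, different (made-up) id, because the
    // promoted preparation is a distinct object on that environment.
    static final Map<String, String> PROD = Map.of(
            "0b9d2c6a-1e47-4f3b-9a8e-7c5d4e3f2a10", "customers_leads");

    // Looks up a preparation name by id; returns null if the id is unknown.
    static String nameForId(Map<String, String> env, String id) {
        return env.get(id);
    }

    public static void main(String[] args) {
        String devId = "faf4fe3e-3cec-4550-ae0b-f1ce108f83d5";
        System.out.println(nameForId(DEV, devId));  // customers_leads
        System.out.println(nameForId(PROD, devId)); // null
    }
}
```

A Job that stores the development id would therefore find nothing on production, even though a preparation with the same name exists there.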
If you wanted to operationalize this preparation in a Talend Job using the regular preparation selection
properties, you would need two Jobs: one for the preparation on the development
environment, with its own URL and id, and a second one for the production environment, with
different parameters.
By using the Dynamic preparation selection checkbox and some
context variables, you can use a single Job to run your preparation, regardless of
the environment. This works because the dynamic preparation selection relies on the preparation path in
Talend Data Preparation, not on the preparation id.
You can then deploy this single Job definition to either your development or your
production environment.
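The following Java sketch illustrates why path-based selection plus context variables lets one Job serve both environments. It rests on stated assumptions: the JobContext record is a hypothetical stand-in for a Talend context group, and the URLs, the /customers_leads path and the production id are invented for illustration. It does not reproduce how tDataprepRun resolves a path internally; it only shows that when the environment-specific values come from the context and the preparation is looked up by path, the same Job logic finds the right id on each environment.

```java
import java.util.Map;

// Hypothetical sketch: one piece of Job logic, two environments, with the
// preparation resolved by path instead of by a hard-coded technical id.
public class DynamicPreparationSelection {

    // Stand-in for a Talend context group (hypothetical field names).
    record JobContext(String dataprepUrl, String preparationPath) {}

    // Hypothetical contents of each environment: path -> technical id.
    static final Map<String, Map<String, String>> PATHS_BY_URL = Map.of(
            "https://dataprep-dev.example.com",
            Map.of("/customers_leads", "faf4fe3e-3cec-4550-ae0b-f1ce108f83d5"),
            "https://dataprep-prod.example.com",
            Map.of("/customers_leads", "0b9d2c6a-1e47-4f3b-9a8e-7c5d4e3f2a10"));

    // One context per environment; only these values differ between them.
    static final Map<String, JobContext> CONTEXTS = Map.of(
            "Development", new JobContext("https://dataprep-dev.example.com", "/customers_leads"),
            "Production", new JobContext("https://dataprep-prod.example.com", "/customers_leads"));

    // Resolves the preparation id at run time from the context's URL and path.
    static String resolvePreparationId(JobContext ctx) {
        Map<String, String> byPath = PATHS_BY_URL.get(ctx.dataprepUrl());
        if (byPath == null || !byPath.containsKey(ctx.preparationPath())) {
            throw new IllegalStateException("No preparation at " + ctx.preparationPath()
                    + " on " + ctx.dataprepUrl());
        }
        return byPath.get(ctx.preparationPath());
    }

    public static void main(String[] args) {
        // Same Job logic, different context: each run gets its environment's id.
        for (String env : new String[] {"Development", "Production"}) {
            System.out.println(env + " -> " + resolvePreparationId(CONTEXTS.get(env)));
        }
    }
}
```

In an actual Job, the equivalent values would be context variables defined once per context (for example, Development and Production), so switching context is what points the Job at a different environment.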
The following scenario creates a simple Job that:
- Receives data from a local CSV file containing customer data
- Dynamically retrieves an existing preparation based on its path and environment
- Applies the preparation to the input data
- Outputs the prepared data into a MySQL database
In this example, the customers_leads preparation has
been created beforehand in Talend Data Preparation. This
simple preparation was created on a dataset that has the same schema as the CSV file used as
input for this Job, and its purpose is to remove invalid values from your customer data.