August 15, 2023

Modeling the accident-prone areas in a city – Docs for ESB 6.x

This scenario applies only to a subscription-based Talend solution with Big data.

In this scenario, the tKMeansModel component is used to
analyze a set of sample geographical data about the destinations of ambulances in a city
in order to model the accident-prone areas.

A model like this can be employed to help determine the optimal locations for building hospitals.


You can download this sample data from here. It consists of pairs of latitudes and longitudes.

The sample data was randomly and automatically generated for demonstration purposes only
and does not in any way reflect the real-world situation of these areas.


This scenario assumes that the following requirements are met:

  • The Spark version to be used is 1.4 or later.

  • The sample data is stored in your Hadoop file system and you have proper rights
    and permissions to at least read it.

  • Your Hadoop cluster is properly installed and is running.

If you are not sure about these requirements, ask the administrator of your
Hadoop system.

The components to be used are:

  • tHDFSConfiguration: it defines the HDFS connection to
    be used by Spark and by the other components.

  • tFileInputDelimited: it loads the sample data into the
    data flow of the Job.

  • tReplicate: it replicates the sample data and caches the

  • tKMeansModel: it analyzes the data to train the model
    and writes the model to HDFS.

  • tModelEncoder: it pre-processes the data to prepare proper
    feature vectors to be used by tKMeansModel.

  • tPredict: it applies the K-Means model to the
    replication of the sample data. In real-world practice, this data should be a
    set of reference data used to test the accuracy of the model.

  • tFileOutputDelimited: it writes the result of the
    prediction to HDFS.
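
The roles of these components can be illustrated with a minimal sketch in plain Python. It is not how Talend or Spark implements them: it only mirrors the flow of reading delimited latitude/longitude pairs, treating each pair as a feature vector, training a K-Means model, and applying that model to the same data. The semicolon delimiter, the six sample coordinates, the choice of two clusters, and all function names are assumptions made for this illustration.

```python
import random

# Hypothetical sample: two well-separated groups of latitude;longitude pairs.
SAMPLE = """\
10.00;20.00
10.10;20.10
9.90;19.90
30.00;40.00
30.10;40.10
29.90;39.90
"""

def parse_rows(text, sep=";"):
    """Role of tFileInputDelimited: read delimited lines into (lat, lon) tuples.
    Each tuple doubles as the feature vector that tModelEncoder would prepare."""
    rows = []
    for line in text.strip().splitlines():
        lat, lon = line.split(sep)
        rows.append((float(lat), float(lon)))
    return rows

def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(point, centers):
    """Index of the center closest to the given point."""
    return min(range(len(centers)),
               key=lambda i: squared_distance(point, centers[i]))

def train_kmeans(points, k, iterations=20, seed=0):
    """Role of tKMeansModel: basic Lloyd iterations returning k cluster centers."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # pick k distinct points as initial centers
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centers)].append(p)
        new_centers = []
        for i, members in enumerate(clusters):
            if members:
                # New center is the coordinate-wise mean of the cluster members.
                new_centers.append(
                    tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                # Keep the previous center if a cluster ended up empty.
                new_centers.append(centers[i])
        if new_centers == centers:
            break  # assignments are stable, so the model has converged
        centers = new_centers
    return centers

def predict(points, centers):
    """Role of tPredict: attach the index of the nearest center to each point."""
    return [(p, nearest(p, centers)) for p in points]

points = parse_rows(SAMPLE)
centers = train_kmeans(points, k=2)
labelled = predict(points, centers)

# Role of tFileOutputDelimited: emit one delimited record per prediction.
for (lat, lon), cluster in labelled:
    print(f"{lat};{lon};{cluster}")
```

In this sketch the cluster centers stand for the modeled accident-prone areas, and the last column of each printed record is the index of the area that the point is assigned to.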

Document obtained from Talend.