Distributed Caching of Scientific Workflows in Multisite Cloud - INRAE - Institut national de recherche pour l’agriculture, l’alimentation et l’environnement
Conference paper, Year: 2020

Distributed Caching of Scientific Workflows in Multisite Cloud

Abstract

Many scientific experiments are performed using scientific workflows, which are becoming increasingly data-intensive. We consider the efficient execution of such workflows in the cloud, leveraging the heterogeneous resources available at multiple cloud sites (geo-distributed data centers). Since workflow users commonly reuse code or data from other workflows, a promising approach to efficient workflow execution is to cache intermediate data in order to avoid re-executing entire workflows. In this paper, we propose a solution for distributed caching of scientific workflows in a multisite cloud. We implemented our solution in the OpenAlea workflow system, together with cache-aware distributed scheduling algorithms. Our experimental evaluation on a three-site cloud with a data-intensive application in plant phenotyping shows that our solution can yield major performance gains, reducing total execution time by up to 42% when 60% of the input data is the same for each new execution.
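The idea of caching intermediate data so that repeated work is skipped can be illustrated with a minimal Python sketch. It is not the authors' implementation and does not use the OpenAlea API: the names `cache_key`, `SiteCache`, and `MultisiteCache`, the in-memory dictionaries standing in for per-site storage, and the keying scheme (a SHA-256 hash of the activity name and its JSON-serializable inputs) are all assumptions made for illustration. Cache-aware scheduling and data transfer costs between sites are not modeled.

```python
import hashlib
import json
from typing import Any, Callable, Dict, List, Optional


def cache_key(activity_name: str, inputs: Dict[str, Any]) -> str:
    """Hash an activity name with its inputs (hypothetical keying scheme)."""
    payload = json.dumps(
        {"activity": activity_name, "inputs": inputs}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class SiteCache:
    """In-memory stand-in for the cache storage of one cloud site."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.store: Dict[str, Any] = {}


class MultisiteCache:
    """Global index recording which site holds each cached intermediate result."""

    def __init__(self, sites: List[SiteCache]) -> None:
        self.sites: Dict[str, SiteCache] = {s.name: s for s in sites}
        self.index: Dict[str, str] = {}  # cache key -> name of the owning site
        self.hits = 0
        self.misses = 0

    def run(
        self,
        activity_name: str,
        func: Callable[..., Any],
        inputs: Dict[str, Any],
        site: str,
    ) -> Any:
        """Return a cached result if any site has it, else compute and store it."""
        key = cache_key(activity_name, inputs)
        owner: Optional[str] = self.index.get(key)
        if owner is not None:
            # Cache hit: reuse the intermediate data instead of re-executing.
            self.hits += 1
            return self.sites[owner].store[key]
        # Cache miss: execute the activity and cache its output at `site`.
        self.misses += 1
        result = func(**inputs)
        self.sites[site].store[key] = result
        self.index[key] = site
        return result


if __name__ == "__main__":
    cache = MultisiteCache([SiteCache("site_a"), SiteCache("site_b")])

    def segment(image_id: int) -> str:
        # Placeholder for an expensive workflow activity.
        return f"mask-{image_id}"

    cache.run("segment", segment, {"image_id": 1}, site="site_a")
    cache.run("segment", segment, {"image_id": 1}, site="site_b")
    print(cache.hits, cache.misses)
```

In this sketch, the second call finds the result in the index and reads it from the site that produced it, so the activity runs only once. A real multisite system would also have to weigh the cost of fetching remote cached data against the cost of recomputing it locally.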
Main file
DEXA_2020.pdf (331.79 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-02962579, version 1 (09-10-2020)

Cite

Gaëtan Heidsieck, Daniel de Oliveira, Esther Pacitti, Christophe Pradal, Francois Tardieu, et al. Distributed Caching of Scientific Workflows in Multisite Cloud. DEXA 2020 - 31st International Conference on Database and Expert Systems Applications, Sep 2020, Bratislava, Slovakia. pp.51-65, ⟨10.1007/978-3-030-59051-2_4⟩. ⟨hal-02962579⟩