Conference papers

Distributed Caching of Scientific Workflows in Multisite Cloud

Abstract: Many scientific experiments are performed using scientific workflows, which are becoming increasingly data-intensive. We consider the efficient execution of such workflows in the cloud, leveraging the heterogeneous resources available at multiple cloud sites (geo-distributed data centers). Since workflow users commonly reuse code or data from other workflows, a promising approach for efficient workflow execution is to cache intermediate data in order to avoid re-executing entire workflows. In this paper, we propose a solution for distributed caching of scientific workflows in a multisite cloud. We implemented our solution in the OpenAlea workflow system, together with cache-aware distributed scheduling algorithms. Our experimental evaluation on a three-site cloud with a data-intensive application in plant phenotyping shows that our solution can yield major performance gains, reducing total time by up to 42% when 60% of the input data is the same for each new execution.

Contributor: Dominique Fournier
Submitted on: Friday, October 9, 2020 - 5:15:36 PM
Last modification on: Monday, March 29, 2021 - 2:44:21 PM





Gaëtan Heidsieck, Daniel de Oliveira, Esther Pacitti, Christophe Pradal, Francois Tardieu, et al. Distributed Caching of Scientific Workflows in Multisite Cloud. DEXA 2020 - 31st International Conference on Database and Expert Systems Applications, Sep 2020, Bratislava, Slovakia. pp.51-65, ⟨10.1007/978-3-030-59051-2_4⟩. ⟨hal-02962579⟩


