Conference papers

Distributed Caching of Scientific Workflows in Multisite Cloud

Abstract : Many scientific experiments are performed using scientific workflows, which are becoming increasingly data-intensive. We consider the efficient execution of such workflows in the cloud, leveraging the heterogeneous resources available at multiple cloud sites (geo-distributed data centers). Since it is common for workflow users to reuse code or data from other workflows, a promising approach for efficient workflow execution is to cache intermediate data in order to avoid re-executing entire workflows. In this paper, we propose a solution for distributed caching of scientific workflows in a multisite cloud. We implemented our solution in the OpenAlea workflow system, together with cache-aware distributed scheduling algorithms. Our experimental evaluation on a three-site cloud with a data-intensive application in plant phenotyping shows that our solution can yield major performance gains, reducing total time by up to 42% when 60% of the input data is the same for each new execution.

https://hal.inrae.fr/hal-02962579
Contributor : Dominique Fournier
Submitted on : Friday, October 9, 2020 - 5:15:36 PM
Last modification on : Monday, March 29, 2021 - 2:44:21 PM

File

DEXA_2020.pdf
Files produced by the author(s)

Citation

Gaëtan Heidsieck, Daniel de Oliveira, Esther Pacitti, Christophe Pradal, Francois Tardieu, et al. Distributed Caching of Scientific Workflows in Multisite Cloud. DEXA 2020 - 31st International Conference on Database and Expert Systems Applications, Sep 2020, Bratislava, Slovakia. pp.51-65, ⟨10.1007/978-3-030-59051-2_4⟩. ⟨hal-02962579⟩
