CMS workflow execution using intelligent job scheduling and data access strategies.
|Hasham, Khawar; Delgado Peris, Antonio; Anjum, Ashiq; Evans, Dave; Gowdy, Stephen; Hernandez, José M.; Huedo, Eduardo; Hufnagel, Dirk; van Lingen, Frank; McClatchey, Richard; and Metson, Simon
Complex scientific workflows can process large amounts of data using thousands of tasks. The turnaround times of these workflows are often affected by latencies in resource discovery, scheduling and data access for the individual workflow processes or actors. Minimizing these latencies improves the overall execution time of a workflow and thus leads to a more efficient and robust processing environment. In this paper, we propose a pilot job concept with intelligent data reuse and job execution strategies that minimize scheduling, queuing, execution and data access latencies. The approach was evaluated first on the CMS Tier0 data processing workflow and then on simulated workflows, to assess its effectiveness in a controlled environment. The results show that it can significantly improve the overall turnaround time of a workflow.
|Workflows; Latency; Pilot Jobs; Grid and clouds; Data cache
|IEEE Transactions on Nuclear Science
|Digital Object Identifier (DOI)
|Web address (URL)
|Publication process dates
|12 Oct 2018, 14:26
Archived with thanks to IEEE Transactions on Nuclear Science
|University of the West of England, European Center for Nuclear Research, Centro de Investigaciones Energéticas, Medioambientales y Tecnológicas, Fermi National Accelerator Laboratory, Universidad Complutense de Madrid and University of Bristol
File Access Level