dc.contributor.author: Kalé, Laxmikant
dc.contributor.author: Meneses-Rojas, Esteban
dc.date.accessioned: 2017-06-05T17:38:02Z
dc.date.available: 2017-06-05T17:38:02Z
dc.date.issued: 2015-03
dc.identifier: https://link.springer.com/article/10.1007/s11227-015-1402-3
dc.identifier.issn: 09208542
dc.identifier.uri: https://hdl.handle.net/2238/7187
dc.description: https://www.scopus.com/inward/record.url?eid=2-s2.0-84924559440&partnerID=40&md5=674dbf01584e263ed866e03fe29708bd
dc.description.abstract: The continuous progress in the performance of supercomputers has made possible the understanding of many fundamental problems in science. Simulation, the third scientific pillar, constantly demands more powerful machines to use algorithms that would otherwise be unviable. That will inevitably lead to the deployment of an exascale machine during the next decade. However, fault tolerance is a major challenge that has to be overcome to make such a machine usable. With an unprecedented number of parts, machines at extreme scale will have a small mean-time-between-failures. The popular checkpoint/restart mechanism used in today’s machines may not be effective at that scale. One promising way to revamp checkpoint/restart is to use message-logging techniques. By storing messages during execution and replaying them in case of a failure, message logging is able to shorten recovery time and save a substantial amount of energy. The downside of message logging is that memory footprint may grow to unsustainable levels. This paper presents a technique that decreases the memory pressure in message-logging protocols by only storing the necessary messages in collective-communication operations. We introduce Camel, a protocol that has a low memory overhead for multicast and reduction operations. Our results show that Camel can reduce memory footprint in a molecular dynamics benchmark by more than 95% on 16,384 cores. © 2015, Springer Science+Business Media New York.
dc.language.iso: eng
dc.publisher: Kluwer Academic Publishers
dc.rights: open access
dc.rights.uri: https://creativecommons.org/licenses/by-nc/3.0/cr/
dc.source: The Journal of Supercomputing, July 2015, Volume 71, Issue 7, pp 2516–2538
dc.subject: Resilience, messages, protocols
dc.subject: Resilience
dc.subject: Messages
dc.subject: Protocols
dc.subject: Research Subject Categories::TECHNOLOGY::Information technology::Computer science::Software engineering
dc.subject: Research Subject Categories::TECHNOLOGY::Information technology::Computer science::Computer science
dc.title: Camel: collective-aware message logging
dc.type: original article


Except where otherwise noted, this item's license is described as open access.