Keynote #1: Challenges and Changes in HPC
Abstract: Supercomputing technology has been developing rapidly, impacting science and society deeply and broadly. Computing-driven and Big-Data-driven scientific discovery has become an essential research approach in fields such as the global environment, life science, nano-materials, and high-energy physics. Furthermore, the rapidly increasing computing requirements of economic and social development also call for the power of next-generation supercomputing systems. Today, advances in computational science, data science, and intelligent science have brought new changes and challenges to HPC systems, technologies, and applications. Usage and delivery models based on cloud computing are also attractive to supercomputer users. The design of future supercomputing systems faces many challenges, including architecture, system software, and the application environment. This talk will analyze the features of HPC, Big Data, and AI application cases and the usage modes of current supercomputing centers, and then discuss the design of a capable platform for the convergence of HPC and AI on future supercomputing systems.
Yutong Lu (Director of the National Supercomputing Center in Guangzhou, China)
Bio: Yutong Lu, ISC Fellow, is a Professor at Sun Yat-sen University (SYSU) and Director of the National Supercomputing Center in Guangzhou. She is a member of the expert committee of the Chinese national key R&D program on HPC and leader of an Innovation Team of the Zhujiang Talent Program, Guangdong Province. Her extensive research and development experience spans several generations of domestic supercomputers in China; she was deputy chief designer of the Tianhe-2 system, the only system to date ranked No. 1 on the Top500 list six times. She won the special prize and the first prize of the National Science and Technology Progress Award in 2014 and 2009, respectively, and has published more than 100 papers and 30 patents. Her research interests include parallel operating systems, high-speed communication, global file systems, and advanced programming environments. She has undertaken National Key R&D projects and major NSFC projects on HPC and Big Data. At present, she is devoted to the research and implementation of system and application platforms for the convergence of HPC, Big Data, and AI in supercomputing centers.
Keynote #2: Resilient Scheduling for High-Performance Computing
Abstract: In this talk, we will discuss resilient scheduling on high-performance computing (HPC) platforms. Resilience is (loosely) defined as the ability to survive failures. Failures are usually handled by adding redundancy, either continuously (replication) or at periodic intervals (migration from a faulty node to a spare node, rollback and recovery). However, the amount of replication and/or the frequency of checkpointing must be optimized carefully, and we will discuss how to optimally decide the checkpointing interval. We will also consider moldable jobs, which allow a processor allocation to be chosen before execution. The objective here is to minimize the overall completion time of the jobs, or makespan, assuming that jobs are subject to arbitrary failure scenarios; hence, jobs need to be re-executed each time they fail until successful completion. This work generalizes the classical framework in which jobs are known offline and do not fail. We introduce a list-based algorithm and prove new approximation ratios for several prominent speedup models (including roofline, communication, and Amdahl). We also introduce a batch-based algorithm, where each job is allowed a restricted number of failures per batch, and prove a new approximation ratio for the arbitrary speedup model. Finally, we will discuss simulation results that compare the algorithms.
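For readers unfamiliar with these notions, two standard formulas from this literature may help. A classical first-order answer to the question of the optimal checkpointing interval is the Young/Daly period, W_opt = sqrt(2 * MTBF * C), where C is the checkpoint cost; and under Amdahl's speedup model, a moldable job with total work w and sequential fraction alpha runs in time t(p) = w * (alpha + (1 - alpha) / p) on p processors. The short Python sketch below illustrates both; the function names and example parameters are illustrative and are not taken from the talk.

import math

def young_daly_period(mtbf: float, checkpoint_cost: float) -> float:
    # First-order optimal checkpointing period (Young 1974 / Daly 2006):
    # the amount of useful work to execute between two checkpoints is
    # W_opt = sqrt(2 * MTBF * C).
    return math.sqrt(2.0 * mtbf * checkpoint_cost)

def amdahl_time(work: float, alpha: float, procs: int) -> float:
    # Execution time of a moldable job under Amdahl's speedup model:
    # t(p) = w * (alpha + (1 - alpha) / p), where alpha is the
    # inherently sequential fraction of the work.
    return work * (alpha + (1.0 - alpha) / procs)

# Illustrative numbers: with a 24-hour platform MTBF and 60-second
# checkpoints, the Young/Daly period is about 54 minutes of work.
print(f"checkpoint every {young_daly_period(24 * 3600, 60) / 60:.1f} min")

# A 10-hour job with a 5% sequential fraction takes about 39 minutes
# on 64 processors under Amdahl's law.
print(f"t(64) = {amdahl_time(10 * 3600, 0.05, 64) / 60:.1f} min")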
Anne Benoit (Associate Professor in the Computer Science Laboratory LIP at ENS Lyon, France)
Bio: Anne Benoit received her PhD from the Institut National Polytechnique de Grenoble in 2003 and her Habilitation à Diriger des Recherches (HDR) from ENS Lyon in 2009. She is currently an Associate Professor in the Computer Science Laboratory LIP at ENS Lyon, France, and the Chair of the IEEE CS Technical Committee on Parallel Processing (TCPP). She is Associate Editor (in Chief) of ParCo, and has been Associate Editor of IEEE TPDS, JPDC, and SUSCOM. She has chaired the Program Committee of several major conferences in her field, in particular SC, IPDPS, ICPP, and HiPC. She is a senior member of the IEEE, and she was elected a Junior Member of the Institut Universitaire de France in 2009. She is the author of one book on algorithm design, 49 papers in international journals, and 100 papers in international conferences, and she has advised 11 PhD theses. Her research interests include algorithm design and scheduling techniques for parallel and distributed platforms, with a focus on energy awareness and resilience.
See http://graal.ens-lyon.fr/~abenoit/ for further information.
Keynote #3: Data-Centric Compute Architectures and Challenges
Abstract: The exponential growth of data stored and used in modern systems suggests that we may need to consider replacing the “traditional” CPU-centric computing paradigm (a.k.a. the von Neumann architecture) with a new data-centric compute paradigm.
Recent developments in inter- and intra-cloud networks, the use of non-volatile memories, and the appearance of new types of applications seem to support such a paradigm shift, but many challenges still need to be resolved to support this new trend. In this talk, I will discuss the data-centric compute paradigm, how new technologies support the need for such a paradigm shift, and the challenges we still face to complete this revolution.
Avi Mendelson (Professor in the CS and EE Departments, Technion, Israel)
Bio: Avi Mendelson is a professor in the CS and EE departments at the Technion, Israel, and a member of the TCE (Technion Computer Engineering Center). He earned his BSc and MSc degrees from the CS department at the Technion and received his PhD from the University of Massachusetts at Amherst (UMass).
Prof. Mendelson has a blend of industrial and academic experience. On the industrial side, he spent 11 years at Intel, where he served as a senior architect and Principal Engineer in the Mobile Computer Architecture Group in Haifa; in this role, he was in charge of the CMP definition and architecture of the Core 2 Duo, Intel's first CMP chip. He also spent 4 years at the Microsoft R&D center, where he managed academic collaborations; a major part of his work at Microsoft focused on student innovation and out-of-the-box uses of the cloud.
His research interests span areas such as computer architecture, accelerators for machine learning, hardware security, power management, reliability, fault tolerance, and HPC.