March 28 (Sunday)
Tutorial #1: RAMP Simulator Tutorial: Protoflex, FAST, HAsim, and RAMP-Gold (8am ~ )
Tutorial #2: Intel Core i7 and Intel Xeon 5500 Microarchitecture, Optimization and Performance Analysis (8am ~ )
** Breakfast @ 7:30am and Lunch @ 12:00
March 29 (Monday)
8:00 - 8:45 Breakfast
8:45 - 9:00 Welcome by General Chair and Program Chair
9:00 - 10:00 Keynote I (Nick Mitchell, IBM)
10:00 - 10:30 Break
10:30 - 11:45 Session 1
11:45 - 1:15 Lunch
1:15 - 2:30 Session 2
2:30 - 3:00 Break
3:00 - 4:40 Session 3
5:00 - 6:30 Reception and Poster Session
March 30 (Tuesday)
8:00 - 8:45 Breakfast
8:45 - 9:45 Keynote II (David Shaw, D. E. Shaw Research and Center for Computational Biology and Bioinformatics, Columbia University)
9:45 - 10:15 Break
10:15 - 11:55 Session 4
11:55 - 1:30 Lunch
1:30 - 3:10 Session 5
3:10 - 3:40 Break
3:40 - 5:20 Session 6
5:20 - 5:30 Concluding Remarks
Tutorial #1: RAMP Simulator Tutorial: Protoflex, FAST, HAsim, and RAMP-Gold
Organizer: Derek Chiou (UT-Austin)
Tutorial #2: Intel Core i7 and Intel Xeon 5500 Microarchitecture, Optimization and Performance Analysis
Organizer: David Levinthal (Intel)
8:00 - 8:45 Breakfast
8:45 - 9:00 Welcome (by the general chairs and program chair)
9:00 - 10:00 Keynote I
Session chair: David Brooks (Harvard University)
Title: The Big Pileup
Speaker: Nick Mitchell (IBM)
10:00 - 10:30 Break
10:30 - 11:45 Session 1: Interactive Workloads
Session chair: Lieven Eeckhout (Ghent University)
Dynamic Program Analysis of Microsoft Windows Applications
Alex Skaletsky, Intel Corporation
Tevi Devor, Intel Corporation
Nadav Chachmon, Intel Corporation
Robert Cohn, Intel Corporation
Kim Hazelwood, University of Virginia
Vladimir Vladimirov, Intel Corporation
Moshe Bach, Intel Corporation
LagAlyzer: A latency profile analysis and visualization tool
Andrea Adamoli, University of Lugano
Milan Jovic, University of Lugano
Matthias Hauswirth, University of Lugano
Characterizing the Design and Performance of Interactive Java Applications
Dmitrijs Zaparanuks, University of Lugano
Matthias Hauswirth, University of Lugano
11:45 - 1:15 Lunch (provided)
1:15 - 2:30 Session 2: Performance Modeling Methodologies
Session chair: Bronis de Supinski (LLNL)
Synthesizing Memory-Level Parallelism Aware Miniature Clones for SPEC CPU2006 and ImplantBench Workloads
Karthik Ganesan, University of Texas at Austin
Jungho Jo, University of Texas at Austin
Lizy K John, University of Texas at Austin
A Methodology for Facilitating a Fair Comparison of Architecture Research Ideas
Veerle Desmet, Ghent University
Sylvain Girbal, Thales Research
Olivier Temam, INRIA Saclay
Statstack: Efficient Modeling of LRU caches
David Eklov, Uppsala University
Erik Hagersten, Uppsala University
2:30 - 3:00 Break
3:00 - 4:40 Session 3: Memory in Multicores
Session chair: Sally McKee (Chalmers)
Modeling Memory Concurrency for Multi-Socket Multi-Core Systems
Anirban Mandal, Renaissance Computing Institute, UNC-Chapel Hill
Rob Fowler, Renaissance Computing Institute, UNC-Chapel Hill
Allan Porterfield, Renaissance Computing Institute, UNC-Chapel Hill
Cache Contention and Application Performance Prediction for Multi-Core Systems
Chi Xu, University of Minnesota
Xi Chen, University of Michigan
Robert Dick, University of Michigan
Zhuoqing Morley Mao, University of Michigan
Memphis: Finding and Fixing NUMA-related Performance Problems on Multi-core Platforms
Collin McCurdy, Oak Ridge National Laboratory
Jeffrey Vetter, Oak Ridge National Laboratory
Understanding Transactional Memory Performance
Donald E Porter, The University of Texas at Austin
Emmett Witchel, The University of Texas at Austin
5:00 - 6:30 Reception and Poster Session
Influences of SIMD Architectures for Scattered Data Interpolation Algorithm
Jean-Charles Tournier, Martin Naef (ABB Inc.)
Hardware Prediction of OS Run-Length For Fine-Grained Resource Customization
David Nellans, Kshitij Sudan, Rajeev Balasubramonian, Erik Brunvand (University of Utah)
Program Behavior Characterization in Large Memory Systems
Parijat Dube, Michael Tsao, Dan Poff, Li Zhang, Alan Bivens (IBM)
Simulation Environment for Studying Overlap of Communication and Computation
Vladimir Subotic*, Jesus Labarta*,**, Mateo Valero*,** (*Barcelona Supercomputing Center, **Universitat Politecnica de Catalunya)
Parallel Scalability Comparison of Commodity Operating Systems on Large Scale Multi-Cores
Yan Cui, Yu Chen, Yuanchun Shi, Qingbo Wu* (Tsinghua, *National University of Defense Technology)
Incorporating Instruction-Based Sampling into AMD CodeAnalyst and OProfile
Paul Drongowski, Lei Yu, Frank Swehosky, Suravee Suthikulpanit, Robert Richter (Advanced Micro Devices)
8:00 - 8:45 Breakfast
8:45 - 9:45 Keynote II
Session chair: David Brooks (Harvard University)
Title: Using Special-Purpose Hardware to Achieve a Hundred-Fold Speedup in Molecular Dynamics Simulations of Proteins
Speaker: David Shaw (D. E. Shaw Research and Center for Computational Biology and Bioinformatics, Columbia University)
9:45 - 10:15 Break
10:15 - 11:55 Session 4: Performance Analysis in Servers and Datacenters
Session chair: Resit Sendag (University of Rhode Island)
The Hadoop Distributed Filesystem: Balancing Portability and Performance
Jeffrey Shafer, Rice University
Scott Rixner, Rice University
Alan L. Cox, Rice University
Scaling OLTP Applications on Commodity Multi-Core Platforms
Yan Cui, Tsinghua University
Yu Chen, Tsinghua University
Yuanchun Shi, Tsinghua University
A Study of Hardware Assisted IP over InfiniBand and its Impact on Enterprise Data Center Performance
Ryan E Grant, Queen's University
Pavan Balaji, Argonne National Laboratory
Ahmad Afsahi, Queen's University
Weak Execution Ordering - Exploiting Iterative Methods on Many-Core GPUs
Jianmin Chen, UFL
Zhuo Huang, UFL
Feiqi Su, UFL
Jih-Kwon Peir, UFL
Jeff Ho, UFL
Lu Peng, LSU
11:55 - 1:30 Lunch (provided)
1:30 - 3:10 Session 5
Session chair: Peter Sweeney (IBM)
Visualizing Complex Dynamics in Many-Core Accelerator Architectures
Aaron Ariel, University of British Columbia
Wilson W. L. Fung, University of British Columbia
Andrew E. Turner, University of British Columbia
Tor M. Aamodt, University of British Columbia
PEBIL: Efficient Static Binary Instrumentation for Linux
Michael A. Laurenzano, San Diego Supercomputer Center
Mustafa M. Tikir, San Diego Supercomputer Center
Laura Carrington, San Diego Supercomputer Center
Allan Snavely, San Diego Supercomputer Center
High-Level Performance Modeling of Task-Based Algorithms: A Blueprint for Understanding the Performance of TBB Algorithms
Alexei Alexandrov, Intel
Douglas Armstrong, Intel
Donald Hayes, Intel
Hrabri Rajic, Intel
Michael Voss, Intel
Exploiting FPGAs for Technology-Aware System-Level Evaluation of Multi-Core Architectures
Paolo Meloni, University of Cagliari - DIEE
Simone Secchi, University of Cagliari - DIEE
Luigi Raffo, University of Cagliari - DIEE
3:10 - 3:40 Break
3:40 - 5:20 Session 6: Microarchitecture Analysis
Session chair: Tejas Karkhanis (IBM)
Runahead Execution vs. Conventional Data Prefetching in the IBM POWER6 Microprocessor
Harold W. Cain, IBM Research
Priya Nagpurkar, IBM Research
An Analysis of Hard to Predict Branches
Celal Ozturk, University of Rhode Island
Resit Sendag, University of Rhode Island
Performance-Effective Operation below Vcc-min
Nikolas Ladas, University of Cyprus
Yiannakis Sazeides, University of Cyprus
Veerle Desmet, Ghent University
Demystifying GPU Microarchitecture through Microbenchmarking
Henry Wong, University of Toronto
Misel-Myrto Papadopoulou, University of Toronto
Maryam Sadooghi-Alvandi, University of Toronto
Andreas Moshovos, University of Toronto
5:20 - 5:30 Concluding Remarks
Title: The Big Pileup
Speaker: Nick Mitchell (IBM)
Abstract: Programmers no longer write monolithic applications; they assemble code from a sea of reusable libraries and frameworks. This layered process of construction has a magnifying effect on local coding decisions. Piece by innocent piece, seemingly harmless constant factors pile up. They become part of an interstitial excess, marbled throughout the code and APIs, and difficult to remove. It is not uncommon for large applications to miss their performance targets by an order of magnitude. We commonly see web requests create objects and invoke methods by the hundreds of thousands to retrieve and format a few database records. Current Java optimizers and garbage collectors don’t address many of these systemic problems. This talk discusses these issues, via many examples, with the goal of motivating research on the programming of large-scale artifacts in a way that local, often ad hoc, decisions can be unwound, rather than pile up, in the large.
Bio: Dr. Nick Mitchell is a member of the IBM T.J. Watson Research Center, working in the Programming Models and Tools department. His research interests include the engineering of large-scale systems, program visualizations, scalability analysis, and program optimizations for complex systems. For the past ten years, he has been studying large-scale applications, working first-hand with IBM customers and IBM's own applications to resolve performance and scalability problems. He has developed widely used visualization tools for performance and memory analysis, and has been crafting the study of runtime bloat. He received his Ph.D. in 2000 under the guidance of Larry Carter and Jeanne Ferrante.
Title: Using Special-Purpose Hardware to Achieve a Hundred-Fold Speedup in Molecular Dynamics Simulations of Proteins
Speaker: David Shaw (D. E. Shaw Research and Center for Computational Biology and Bioinformatics, Columbia University)
Abstract: Molecular dynamics (MD) simulation has long been recognized as a potentially transformative tool for understanding the behavior of proteins and other biological macromolecules, and for developing a new generation of precisely targeted drugs. Many biologically important phenomena, however, occur over timescales that have previously fallen far outside the reach of MD technology. We have constructed a specialized, massively parallel machine, called Anton, that is capable of performing atomic-level simulations of proteins at a speed roughly two orders of magnitude beyond that of the previous state of the art. The machine has now simulated the behavior of a number of proteins for periods as long as a millisecond -- approximately 100 times the length of the longest such simulation previously published -- revealing aspects of protein dynamics that were previously inaccessible to both computational and experimental study. The speed at which Anton performs these simulations is the result of a tightly coupled codesign process in which novel algorithms and architectural features were developed in concert, guided in large part by an iterative process of performance analysis and optimization.
Bio: Dr. David E. Shaw serves as chief scientist of D. E. Shaw Research and as a senior research fellow at the Center for Computational Biology and Bioinformatics at Columbia University. He received his Ph.D. from Stanford University in 1980, served on the faculty of the Computer Science Department at Columbia until 1986, and founded the D. E. Shaw group in 1988. Since 2001, Dr. Shaw has been involved in full-time, hands-on research in the field of computational biochemistry. His lab is currently involved in the development of new algorithms and machine architectures for high-speed molecular dynamics simulations of biological macromolecules, and in the application of such simulations to basic scientific research and computer-assisted drug design. Dr. Shaw was appointed to the President’s Council of Advisors on Science and Technology by President Clinton in 1994, and again by President Obama in 2009. He is a fellow of the American Academy of Arts and Sciences and of the American Association for the Advancement of Science, a member of the Computer Science and Telecommunications Board of the National Academies, and a winner of the ACM Gordon Bell Prize.