*Tutorials
- GPUWattch + GPGPU-Sim: An Integrated Framework for Energy Optimization in Manycore Architectures (Half-day, Morning) UT Austin, Wisconsin, UBC
- Modeling Exascale Applications with SST/macro and Eiger (Half-day, Afternoon) Georgia Tech
*Workshop
- Second International Workshop on Performance Analysis of Workload Optimized Systems (Full-day) IBM
7:30 - 8:30 Breakfast
8:30 - 10:00 Sessions
10:00 - 10:30 Break
10:30 - 12:00 Sessions
12:00 - 1:30 Lunch
1:30 - 3:00 Sessions
3:00 - 3:30 Break
3:30 - 5:30 Sessions
8:30 Breakfast
9:00 - 9:15 Welcome (by the general and program chairs)
9:15 - 10:15 Keynote I: Peta Thread Computing [Slides]
(Michael Shebanow, Samsung Research America)
(Session Chair: Tor Aamodt, UBC)
10:30 - 11:45 Session 1: Best Paper Candidates
(Session Chair: David Brooks)
· Sampled Simulation of Multi-Threaded Applications ~Best Paper Award~
Trevor E. Carlson, Wim Heirman, Lieven Eeckhout (Ghent University)
· XAMP: An eXtensible Analytical Model Platform
Yipeng Wang, Yan Solihin (NCSU)
· Synergistic Coupling of SSD and Hard Disk for QoS-aware Virtual Memory
Ke Liu (Wayne State University), Xuechen Zhang (Georgia Institute of Technology), Kei Davis (Los Alamos National Laboratory), Song Jiang (Wayne State University)
11:45 - 1:15 Lunch
1:20 - 3:00 Session 2: Datacenter and Distributed Computing
(Session Chair: Suzanne Rivoire)
· Increasing Transparent Page Sharing in Java
Kazunori Ogata, Tamiya Onodera (IBM Research - Tokyo)
· Understanding the Implications of Virtual Machine Management on Processor Microarchitecture Design
Xiufeng Sui (ICT, Chinese Academy of Sciences), Tao Sun (University of Science and Technology of China), Tao Li (University of Florida), Lixin Zhang (ICT, Chinese Academy of Sciences), Zilong Wang (University of Science and Technology of China)
· An Analytical Framework for Estimating TCO and Exploring Data Center Design Space
Damien Hardy, Marios Kleanthous, Isidoros Sideris (University of Cyprus), Ali Saidi, Emre Özer (ARM), Yiannakis Sazeides (University of Cyprus)
· Interactive Analysis of Large Distributed Systems with Scalable Topology-based Visualization
Lucas Mello Schnorr, Arnaud Legrand (CNRS), Jean-Marc Vincent (UJF)
Session 3
(Session Chair: Andreas Moshovos)
· Jung Ho Ahn (Seoul National University), Sheng Li (HP), Seongil O (Seoul National University), Norman P. Jouppi (HP)
· A Detailed and Flexible Cycle-Accurate Network-on-Chip Simulator
Nan Jiang, Daniel U. Becker (Stanford University), George Michelogiannakis (Lawrence Berkeley National Lab), James Balfour (Google), Brian Towles (D. E. Shaw), John Kim (KAIST), William J. Dally (NVIDIA/Stanford University)
· How a Single Chip Causes Massive Power Bills - GPUSimPow: A GPGPU Power Simulator
Jan Lucas, Sohan Lal, Michael Andersch, Mauricio Alvarez-Mesa, Ben Juurlink (TU Berlin)
· Parallel GPU Architecture Simulation Framework Exploiting Work Allocation Unit Parallelism
Sangpil Lee, Won W. Ro (Yonsei University)
5:00 - 5:30 Poster Highlights
(Session Chair: Ted Jiang)
· Cache-Coherent Shared Virtual Memory for Chips with Heterogeneous Cores
Blake A. Hechtman (Duke University/AMD), Daniel J. Sorin (Duke University)
· Exascale Workload Characterization and Architecture Implications
Prasanna Balaprakash, Darius Buntinas, Anthony Chan (Argonne National Laboratory), Apala Guha (University of Chicago), Rinku Gupta, Sri Hari Krishna Narayanan (Argonne National Laboratory), Andrew A. Chien (University of Chicago), Paul Hovland, Boyana Norris (Argonne National Laboratory)
· EMERALD: Characterization of Emerging Applications and Algorithms for Low-power Devices
Chuanjun Zhang (Intel), Gi Hyun Ko, JungWook Choi, Shang-nien Tsai, Minje Kim, Abner Guzman Rivera, Rob Rutenbar, Paris Smaragdis (UIUC), Mi Sun Park, Vijay Narayanan (Pennsylvania State University), Hongyi Xin, Onur Mutlu (CMU), Bin Li, Li Zhao, Mei Chen, Ravi Iyer (Intel)
· PAPI 5.0: Measuring Power, Energy, and the Cloud
Vincent M. Weaver (University of Maine), Matt Johnson, Kiran Kasichayanula, James Ralph, Piotr Luszczek, Dan Terpstra (University of Tennessee), Shirley Moore (University of Texas El Paso), John Nelson (University of Tennessee)
· Energy Efficiency of Lossless Data Compression on a Mobile Device: An Experimental Evaluation
Armen Dzhagaryan (The University of Alabama in Huntsville), Martin Burtscher (Texas State University, San Marcos), Aleksandar Milenkovic (The University of Alabama in Huntsville)
· A Virtual Power Management Simulation Framework for Computer Systems
Bishop Brock, Srinivasan Ramani, Ken Vu, Heather Hanson, Michael Floyd (IBM)
· Characterizing the Microarchitectural Side Effects of Operating System Calls
Addison Mayberry, Matthew Laquidara, Charles Weems (University of Massachusetts Amherst)
· QTrace: An Interface for Customizable Full System Instrumentation
Xin Tong, Jack Luo, Andreas Moshovos (University of Toronto)
· Trace Filtering of Multithreaded Applications for CMP Memory Simulation
Alejandro Rico, Alex Ramirez, Mateo Valero (Barcelona Supercomputing Center/UPC)
· A Statistical Machine Learning Based Modeling and Exploration Framework for Run-time Cross-Stack Energy Optimization
Changshu Zhang, Arun Ravindran (University of North Carolina at Charlotte)
· Use of Simple Analytic Performance Models for Streaming Data Applications Deployed on Diverse Architectures
Jonathan Beard, Roger Chamberlain (Washington University in St. Louis)
· A Circuit-Architecture Co-optimization Framework for Evaluating Emerging Memory Hierarchies
Xiangyu Dong (Pennsylvania State University), Norman P. Jouppi (HP), Yuan Xie (Pennsylvania State University)
5:30 - 7:00 Reception and Poster Session
8:30 Breakfast
9:00 - 10:00 Keynote II: Advancing Computer Systems without Technology Progress [Slides]
(Christos Kozyrakis, Stanford University)
(Session Chair: Viji Srinivasan)
10:30 - 12:10 Session 4: Potpourri
(Session Chair: Stijn Eyerman)
· A Mathematical Hard Disk Timing Model for Full System Simulation
Benjamin S. Parsons, Vijay S. Pai (Purdue)
· Wall Clock Based Synchronization: A Parallel Simulation Technology for Cluster Systems
Xiaodong Zhu, Junmin Wu (University of Science and Technology of China), Tao Li (University of Florida), Xianfen Cui (University of Science and Technology of China), Xiufeng Sui (ICT, Chinese Academy of Sciences)
· Performance Analysis of Broadcasting Algorithms on the Intel Single Chip Cloud
John Matienzo, Natalie Enright Jerger (University of Toronto)
· Selecting Benchmark Combinations for the Evaluation of Multicore Throughput
Ricardo A. Velasquez, Pierre Michaud, Andre Seznec (IRISA/INRIA)
12:10 - 1:45 Lunch
1:45 - 3:10 Session 5: Performance and Power Analysis Techniques
(Session Chair: Pierre Michaud)
· Pinpointing Data Locality Bottlenecks with Low Overhead
Xu Liu, John Mellor-Crummey (Rice University)
· Power Measurement Techniques on Standard Compute Nodes: A Quantitative Comparison
Daniel Hackenberg, Thomas Ilsche, Robert Schöne, Daniel Molka, Maik Schmidt, Wolfgang E. Nagel (ZIH, TU Dresden)
· Power/Performance Evaluation of Energy Efficient Ethernet (EEE) for High Performance Computing
Karthikeyan Palavedu Saravanan, Paul Carpenter, Alex Ramirez (Barcelona Supercomputing Center)
· Non-Determinism and Overcount on Modern Hardware Performance Counter Implementations
Vincent M. Weaver (University of Maine), Dan Terpstra (University of Tennessee), Shirley Moore (University of Texas El Paso)
3:40 - 5:20 Session 6: Emerging Platforms
(Session Chair: Vijay Janapa Reddi)
· Characterizing Scalar Opportunities in GPGPU Applications
Zhongliang Chen, David Kaeli (Northeastern University), Norman Rubin (NVIDIA Corporation)
· Quantifying the Energy Efficiency of FFT on Heterogeneous Platforms
Yash Ukidave, Amir Kavyan Ziabari, Perhaad Mistry, Gunar Schirner, David Kaeli (Northeastern University)
· ISA-Independent Workload Characterization and its Implications for Specialized Architectures
Yakun Sophia Shao, David Brooks (Harvard University)
· Evaluating STT-RAM Technology as an Energy Efficient Main Memory Alternative
Emre Kultursay, Mahmut Kandemir, Anand Sivasubramaniam (Pennsylvania State University), Onur Mutlu (Carnegie Mellon University)
5:20 - Concluding Remarks, Best Paper Award
Keynote I: Peta Thread Computing (Michael Shebanow, Samsung Research America)
Abstract: GPUs are playing a transformative role throughout computing, most notably in mobile applications. In present-day application processors (APs), the "brains" of a mobile device, GPUs now represent about half the silicon area. GPUs are becoming incredibly pervasive through their inclusion in mobile devices: in 2012, over 700M smartphones and roughly 120M tablets were shipped, over 800M devices, each with a GPU on board. To perform their processing, GPUs rely on streaming programmable cores, generally known as SIMT processors (SIMT = single-instruction, multiple-thread). SIMT processors are well known for their ability to tolerate very long latency memory accesses given sufficient thread-level parallelism (TLP). To exploit TLP, these multi-core GPUs are capable of running thousands to tens of thousands of threads simultaneously. As mobile GPUs increase in speed and mobile games and other applications reach the complexity of today's desktop games, these GPUs will have tremendous processing capability. Multiply each GPU's parallelism by the number of devices deployed worldwide, and the total capacity of future GPUs will approach peta-scale numbers of threads. In this talk, I outline one future possibility: can we treat the entire interconnected network of mobile GPU devices and cloud GPU servers as one big processor? What are the implications of such a system, and what are some of the hurdles? More importantly, what kinds of applications might such a system enable?
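The abstract's scale argument can be sanity-checked with a quick back-of-envelope calculation. The 2012 shipment figures below come from the abstract itself; the per-GPU thread count is an assumed illustrative value drawn from the "tens of thousands of threads" range mentioned in the talk, not a figure from the source:

```python
# Back-of-envelope sketch of the worldwide GPU thread-capacity claim.
# Shipment figures are from the abstract; threads_per_gpu is an assumption.
smartphones = 700_000_000      # smartphones shipped in 2012 (from the abstract)
tablets = 120_000_000          # tablets shipped in 2012 (from the abstract)
threads_per_gpu = 50_000       # assumed: tens of thousands of threads per future GPU

devices = smartphones + tablets
total_threads = devices * threads_per_gpu
print(f"{total_threads:.1e} concurrent threads")  # prints "4.1e+13 concurrent threads"
```

Even with a single shipment year and these conservative assumptions, the installed base lands in the tens of trillions of threads; accumulating several years of shipments and growing per-GPU parallelism is what pushes the figure toward the peta-scale (10^15) claim in the talk.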
Bio: Michael Shebanow joined Samsung Research America (SRA) in 2012, where he is Vice President of the Advanced Processor Lab (APL). Prior to Samsung, he was with NVIDIA, where he worked in the Tesla product family architecture team (G80, GeForce 68xx series), in the Fermi product family architecture team (GF100) as one of its lead architects and manager of the shader processor design team, in the NVIDIA Research group investigating next-generation graphics and unified programming models for GPUs, and in the Denver CPU design team. Prior to NVIDIA, he managed the development of a number of processors in multiple architecture families (x86-32, x86-64, SPARC v9, 68k, m88k), and in part represented Motorola on the PowerPC architecture definition committee. While a graduate student at UC Berkeley, he was one of the original developers of HPS (superscalar, dynamically scheduled processor architectures). Dr. Shebanow holds 32 patents in the graphics, processor design, and disk controller areas.
Keynote II: Advancing Computer Systems without Technology Progress (Christos Kozyrakis, Stanford University)
Abstract: Computing is now an essential tool for all aspects of human endeavor, including healthcare, education, science, commerce, government, and entertainment. We expect our computers, whether those hidden away in data-centers or those in a handheld form factor, to be capable of running sophisticated algorithms that process rapidly growing volumes of data. In other words, we expect our computers to have exponentially increasing performance at constant cost (energy and chip area). For decades, CMOS technology has been our ally, providing exponential improvements in both transistor density and energy consumption, which we turned into exponential improvements in system performance. Unfortunately, we are now in a phase where transistor cost and energy consumption are barely scaling, making it necessary to rethink the way we build scalable systems.
In this talk, we will consider how to advance computer systems without technology progress. There are several promising directions that combined can provide improvements equivalent to several decades of Moore's law. These directions include massive parallelism with locality awareness, specialization, removing the bloat from our infrastructure, increasing system utilization, and embracing approximate computing. We will review motivating results in these areas, establish that they require cross-layer optimizations across both hardware and software, and discuss the remaining challenges that systems researchers must address.
Bio: Christos Kozyrakis is an Associate Professor of Electrical Engineering & Computer Science at Stanford University. He works on architectures, runtime environments, and programming models for parallel computing systems. At Berkeley, he developed the IRAM architecture, a novel media-processor system that combined vector processing with embedded DRAM technology. At Stanford, he co-led the Transactional Coherence and Consistency (TCC) project, which developed hardware and software mechanisms for programming with transactional memory. He also led the Raksha project, which developed practical hardware support and security policies to deter high-level and low-level security attacks against deployed software. Dr. Kozyrakis is currently working on hardware and software techniques for resource-efficient cloud computing. He is also a member of the Pervasive Parallelism Lab at Stanford, a multi-faculty effort to make parallel computing practical for the masses.
Christos received a BS degree from the University of Crete (Greece) and a PhD degree from the University of California at Berkeley (USA), both in Computer Science. He is the Willard R. and Inez Kerr Bell faculty scholar at Stanford and a senior member of the ACM and the IEEE. Christos has received the NSF Career Award, an IBM Faculty Award, the Okawa Foundation Research Grant, and a Noyce Family Faculty Scholarship.