*Tutorials
- GPUWattch + GPGPU-Sim: An Integrated Framework for Energy Optimization in Manycore Architectures (Half-day, Morning) UT Austin, Wisconsin, UBC
- Modeling Exascale Applications with SST/macro and Eiger (Half-day, Afternoon) Georgia Tech
*Workshop
- Second International Workshop on Performance Analysis of Workload Optimized Systems (Full-day) IBM
7:30 - 8:30 Breakfast
8:30 - 10:00 Sessions
10:00 - 10:30 Break
10:30 - 12:00 Sessions
12:00 - 1:30 Lunch
1:30 - 3:00 Sessions
3:00 - 3:30 Break
3:30 - 5:30 Sessions
8:30 Breakfast
9:00 - 9:15 Welcome (by the general and program chairs)
9:15 - 10:15 Keynote I: Peta Thread Computing [Slides]
(Michael Shebanow, Samsung Research America)
(Session Chair: Tor Aamodt, UBC)
10:30 - 11:45 Session 1: Best Paper Candidates
(Session Chair: David Brooks)
· Sampled Simulation of Multi-Threaded Applications ~Best Paper Award~
Trevor E. Carlson, Wim Heirman, Lieven Eeckhout (Ghent University)
· XAMP: An eXtensible Analytical Model Platform
Yipeng Wang, Yan Solihin (NCSU)
· Synergistic Coupling of SSD and Hard Disk for QoS-aware Virtual Memory
Ke Liu (Wayne State University), Xuechen Zhang (Georgia Institute of Technology), Kei Davis (Los Alamos National Laboratory), Song Jiang (Wayne State University)
11:45 - 1:15 Lunch
1:20 - 3:00 Session 2: Datacenter and Distributed Computing
(Session Chair: Suzanne Rivoire)
· Increasing Transparent Page Sharing in Java
Kazunori Ogata, Tamiya Onodera (IBM Research - Tokyo)
· Understanding the Implications of Virtual Machine Management on Processor Microarchitecture Design
Xiufeng Sui (ICT, Chinese Academy of Sciences), Tao Sun (University of Science and Technology of China), Tao Li (University of Florida), Lixin Zhang (ICT, Chinese Academy of Sciences), Zilong Wang (University of Science and Technology of China)
· An Analytical Framework for Estimating TCO and Exploring Data Center Design Space
Damien Hardy, Marios Kleanthous, Isidoros Sideris (University of Cyprus), Ali Saidi, Emre Özer (ARM), Yiannakis Sazeides (University of Cyprus)
· Interactive Analysis of Large Distributed Systems with Scalable Topology-based Visualization
Lucas Mello Schnorr, Arnaud Legrand (CNRS), Jean-Marc Vincent (UJF)
Session 3
(Session Chair: Andreas Moshovos)
· Jung Ho Ahn (Seoul National University), Sheng Li (HP), Seongil O (Seoul National University), Norman P. Jouppi (HP)
· A Detailed and Flexible Cycle-Accurate Network-on-Chip Simulator
Nan Jiang, Daniel U. Becker (Stanford University), George Michelogiannakis (Lawrence Berkeley National Lab), James Balfour (Google), Brian Towles (D. E. Shaw), John Kim (KAIST), William J. Dally (NVIDIA/Stanford University)
· How a Single Chip Causes Massive Power Bills - GPUSimPow: A GPGPU Power Simulator
Jan Lucas, Sohan Lal, Michael Andersch, Mauricio Alvarez-Mesa, Ben Juurlink (TU Berlin)
· Parallel GPU Architecture Simulation Framework Exploiting Work Allocation Unit Parallelism
Sangpil Lee, Won W. Ro (Yonsei University)
5:00 - 5:30 Poster Highlights
(Session Chair: Ted Jiang)
· Cache-Coherent Shared Virtual Memory for Chips with Heterogeneous Cores
Blake A. Hechtman (Duke University/AMD), Daniel J. Sorin (Duke University)
· Exascale Workload Characterization and Architecture Implications
Prasanna Balaprakash, Darius Buntinas, Anthony Chan (Argonne National Laboratory), Apala Guha (University of Chicago), Rinku Gupta, Sri Hari Krishna Narayanan (Argonne National Laboratory), Andrew A. Chien (University of Chicago), Paul Hovland, Boyana Norris (Argonne National Laboratory)
· EMERALD: Characterization of Emerging Applications and Algorithms for Low-power Devices
Chuanjun Zhang (Intel), Gi Hyun Ko, JungWook Choi, Shang-nien Tsai, Minje Kim, Abner Guzman Rivera, Rob Rutenbar, Paris Smaragdis (UIUC), Mi Sun Park, Vijay Narayanan (Pennsylvania State University), Hongyi Xin, Onur Mutlu (CMU), Bin Li, Li Zhao, Mei Chen, Ravi Iyer (Intel)
· PAPI 5.0: Measuring Power, Energy, and the Cloud
Vincent M. Weaver (University of Maine), Matt Johnson, Kiran Kasichayanula, James Ralph, Piotr Luszczek, Dan Terpstra (University of Tennessee), Shirley Moore (University of Texas El Paso), John Nelson (University of Tennessee)
· Energy Efficiency of Lossless Data Compression on a Mobile Device: An Experimental Evaluation
Armen Dzhagaryan (The University of Alabama in Huntsville), Martin Burtscher (Texas State University, San Marcos), Aleksandar Milenkovic (The University of Alabama in Huntsville)
· A Virtual Power Management Simulation Framework for Computer Systems
Bishop Brock, Srinivasan Ramani, Ken Vu, Heather Hanson, Michael Floyd (IBM)
· Characterizing the Microarchitectural Side Effects of Operating System Calls
Addison Mayberry, Matthew Laquidara, Charles Weems (University of Massachusetts Amherst)
· QTrace: An Interface for Customizable Full System Instrumentation
Xin Tong, Jack Luo, Andreas Moshovos (University of Toronto)
· Trace Filtering of Multithreaded Applications for CMP Memory Simulation
Alejandro Rico, Alex Ramirez, Mateo Valero (Barcelona Supercomputing Center/UPC)
· A Statistical Machine Learning Based Modeling and Exploration Framework for Run-time Cross-Stack Energy Optimization
Changshu Zhang, Arun Ravindran (University of North Carolina at Charlotte)
· Use of Simple Analytic Performance Models for Streaming Data Applications Deployed on Diverse Architectures
Jonathan Beard, Roger Chamberlain (Washington University in St. Louis)
· A Circuit-Architecture Co-optimization Framework for Evaluating Emerging Memory Hierarchies
Xiangyu Dong (Pennsylvania State University), Norman P. Jouppi (HP), Yuan Xie (Pennsylvania State University)
5:30 - 7:00 Reception and Poster Session
8:30 Breakfast
9:00 - 10:00 Keynote II: Advancing Computer Systems without Technology Progress [Slides]
(Christos Kozyrakis, Stanford University)
(Session Chair: Viji Srinivasan)
10:30 - 12:10 Session 4: Potpourri
(Session Chair: Stijn Eyerman)
· A Mathematical Hard Disk Timing Model for Full System Simulation
Benjamin S. Parsons, Vijay S. Pai (Purdue)
· Wall Clock Based Synchronization: A Parallel Simulation Technology for Cluster Systems
Xiaodong Zhu, Junmin Wu (University of Science and Technology of China), Tao Li (University of Florida), Xianfen Cui (University of Science and Technology of China), Xiufeng Sui (ICT, Chinese Academy of Sciences)
· Performance Analysis of Broadcasting Algorithms on the Intel Single Chip Cloud
John Matienzo, Natalie Enright Jerger (University of Toronto)
· Selecting Benchmark Combinations for the Evaluation of Multicore Throughput
Ricardo A. Velasquez, Pierre Michaud, Andre Seznec (IRISA/INRIA)
12:10 - 1:45 Lunch
1:45 - 3:10 Session 5: Performance and Power Analysis Techniques
(Session Chair: Pierre Michaud)
· Pinpointing Data Locality Bottlenecks with Low Overhead
Xu Liu, John Mellor-Crummey (Rice University)
· Power Measurement Techniques on Standard Compute Nodes: A Quantitative Comparison
Daniel Hackenberg, Thomas Ilsche, Robert Schöne, Daniel Molka, Maik Schmidt, Wolfgang E. Nagel (ZIH, TU Dresden)
· Power/Performance Evaluation of Energy Efficient Ethernet (EEE) for High Performance Computing
Karthikeyan Palavedu Saravanan, Paul Carpenter, Alex Ramirez (Barcelona Supercomputing Center)
· Non-Determinism and Overcount on Modern Hardware Performance Counter Implementations
Vincent M. Weaver (University of Maine), Dan Terpstra (University of Tennessee), Shirley Moore (University of Texas El Paso)
3:40 - 5:20 Session 6: Emerging Platforms
(Session Chair: Vijay Janapa Reddi)
· Characterizing Scalar Opportunities in GPGPU Applications
Zhongliang Chen, David Kaeli (Northeastern University), Norman Rubin (NVIDIA Corporation)
· Quantifying the Energy Efficiency of FFT on Heterogeneous Platforms
Yash Ukidave, Amir Kavyan Ziabari, Perhaad Mistry, Gunar Schirner, David Kaeli (Northeastern University)
· ISA-Independent Workload Characterization and its Implications for Specialized Architectures
Yakun Sophia Shao, David Brooks (Harvard University)
· Evaluating STT-RAM Technology as an Energy Efficient Main Memory Alternative
Emre Kultursay, Mahmut Kandemir, Anand Sivasubramaniam (Pennsylvania State University), Onur Mutlu (Carnegie Mellon University)
5:20 - Concluding Remarks, Best Paper Award
Keynote I: Peta Thread Computing (Michael Shebanow, Samsung Research America)
Abstract: GPUs are playing a transformative role throughout computing, most notably in mobile applications. In present-day application processors (APs), the "brains" of a mobile device, GPUs now represent about half the silicon area. GPUs are becoming incredibly pervasive through their inclusion in mobile devices: in 2012, over 700M smartphones and roughly 120M tablets were shipped, over 800M devices, each with a GPU on board. To perform their processing, GPUs rely on streaming programmable cores, generally known as SIMT processors (SIMT = single-instruction, multiple-thread). SIMT processors are well known for their ability to tolerate very long latency memory accesses given sufficient thread-level parallelism (TLP). To exploit TLP, these multi-core GPUs are capable of running thousands to tens of thousands of threads simultaneously. As mobile GPUs increase in speed and mobile games and other applications reach the complexity of today's desktop games, these GPUs will have tremendous processing capability. Multiply each GPU's parallelism by the number of devices deployed worldwide, and the total capacity of future GPUs will approach peta-scale numbers of threads. In this talk, I outline one future possibility: can we treat the entire interconnected network of mobile GPU devices and cloud GPU servers as one big processor? What are the implications of such a system, and what are some of the hurdles? More importantly, what kinds of applications might such a system enable?
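The abstract's scale argument can be sanity-checked with a quick back-of-envelope calculation. The 2012 shipment figures below come from the abstract itself; the per-GPU thread count is an assumed illustrative value drawn from the "tens of thousands of threads" range mentioned in the talk, not a figure from the source:

```python
# Back-of-envelope sketch of the worldwide GPU thread-capacity claim.
# Shipment figures are from the abstract; threads_per_gpu is an assumption.
smartphones = 700_000_000      # smartphones shipped in 2012 (from the abstract)
tablets = 120_000_000          # tablets shipped in 2012 (from the abstract)
threads_per_gpu = 50_000       # assumed: tens of thousands of threads per future GPU

devices = smartphones + tablets
total_threads = devices * threads_per_gpu
print(f"{total_threads:.1e} concurrent threads")  # prints "4.1e+13 concurrent threads"
```

Even with a single shipment year and these conservative assumptions, the installed base lands in the tens of trillions of threads; accumulating several years of shipments and growing per-GPU parallelism is what pushes the figure toward the peta-scale (10^15) claim in the talk.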
Bio: Michael Shebanow joined Samsung Research America (SRA) in 2012, where he is Vice President of the Advanced Processor Lab (APL). Prior to Samsung, he was with NVIDIA, where he worked in the Tesla product family architecture team (G80, GeForce 68xx series), in the Fermi product family architecture team (GF100) as one of its lead architects and manager of the shader processor design team, in the NVIDIA Research group investigating next-generation graphics and unified programming models for GPUs, and in the Denver CPU design team. Prior to NVIDIA, he managed the development of a number of processors in multiple architecture families (x86-32, x86-64, SPARC v9, 68k, m88k), and in part represented Motorola on the PowerPC architecture definition committee. While a graduate student at UC Berkeley, he was one of the original developers of HPS (superscalar, dynamically scheduled processor architectures). Dr. Shebanow holds 32 patents in the graphics, processor design, and disk controller areas.
Keynote II: Advancing Computer Systems without Technology Progress (Christos Kozyrakis, Stanford University)
Abstract: Computing is now an essential tool for all aspects of human endeavor, including healthcare, education, science, commerce, government, and entertainment. We expect our computers, whether those hidden away in data-centers or those in a handheld form factor, to be capable of running sophisticated algorithms that process rapidly growing volumes of data. In other words, we expect our computers to have exponentially increasing performance at constant cost (energy and chip area). For decades, CMOS technology has been our ally, providing exponential improvements in both transistor density and energy consumption, which we turned into exponential improvements in system performance. Unfortunately, we are now in a phase where transistor cost and energy consumption are barely scaling, making it necessary to rethink the way we build scalable systems.
In this talk, we will consider how to advance computer systems without technology progress. There are several promising directions that combined can provide improvements equivalent to several decades of Moore's law. These directions include massive parallelism with locality awareness, specialization, removing the bloat from our infrastructure, increasing system utilization, and embracing approximate computing. We will review motivating results in these areas, establish that they require cross-layer optimizations across both hardware and software, and discuss the remaining challenges that systems researchers must address.
Bio: Christos Kozyrakis is an Associate Professor of Electrical Engineering & Computer Science at Stanford University. He works on architectures, runtime environments, and programming models for parallel computing systems. At Berkeley, he developed the IRAM architecture, a novel media-processor system that combined vector processing with embedded DRAM technology. At Stanford, he co-led the Transactional Coherence and Consistency (TCC) project, which developed hardware and software mechanisms for programming with transactional memory. He also led the Raksha project, which developed practical hardware support and security policies to deter high-level and low-level security attacks against deployed software. Dr. Kozyrakis is currently working on hardware and software techniques for resource-efficient cloud computing. He is also a member of the Pervasive Parallelism Lab at Stanford, a multi-faculty effort to make parallel computing practical for the masses.
Christos received a BS degree from the University of Crete (Greece) and a PhD degree from the University of California at Berkeley (USA), both in Computer Science. He is the Willard R. and Inez Kerr Bell faculty scholar at Stanford and a senior member of the ACM and the IEEE. Christos has received the NSF Career Award, an IBM Faculty Award, the Okawa Foundation Research Grant, and a Noyce Family Faculty Scholarship.