ISPASS 2021 Program

Start	End	Monday, March 29, 2021
09:00	09:10	Welcome
09:10	10:00	Keynote: Memory Architectures for Programmable Forward-Looking Systems Speaker: Dan Lustig (NVIDIA's Architecture Research Group)
10:00	10:21	Session 1: Benchmarking Session Chair: Matthew D. Sinclair (University of Wisconsin-Madison, AMD Research)
10:00	10:02	GenomicsBench: A Benchmark Suite for Genomics Arun Subramaniyan (University of Michigan-Ann Arbor); Yufeng Gu (University of Michigan-Ann Arbor); Timothy Dunn (University of Michigan-Ann Arbor); Somnath Paul (Intel Corporation); Md Vasimuddin (Intel Corporation); Sanchit Misra (Intel Corporation); David Blaauw (University of Michigan-Ann Arbor); Satish Narayanasamy (University of Michigan-Ann Arbor); Reetuparna Das (University of Michigan-Ann Arbor)
10:02	10:04	GNNMark: A Benchmark Suite to Characterize Graph Neural Network Training on GPUs Trinayan Baruah (Northeastern University); Kaustubh Shivdikar (Northeastern University); Shi Dong (Cerebras); Yifan Sun (William & Mary); Saiful A. Mojumder (Boston University); Kihoon Jung (KAIST); Jose L. Abellan (Universidad Catolica de Murcia); Yash Ukidave (Millennium Management); Ajay Joshi (Boston University); John Kim (KAIST); David Kaeli (Northeastern University)
10:04	10:06	AIBench Training: Balanced Industry-Standard AI Training Benchmarking Fei Tang (State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences and University of Chinese Academy of Sciences); Wanling Gao (State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences); Jianfeng Zhan (State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences and University of Chinese Academy of Sciences); Chuanxin Lan (State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences); Xu Wen (State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences and University of Chinese Academy of Sciences); Lei Wang (State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences); Chunjie Luo (State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences); Jiahui Dai (Beijing Academy of Frontier Sciences and Technology); Zheng Cao (Alibaba Group); Xingwang Xiong (State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences and University of Chinese Academy of Sciences); Zihan Jiang (Institute of Computing Technology, Chinese Academy of Sciences); Tianshu Hao (State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences and University of Chinese Academy of Sciences); Fanda Fan (State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences and University of Chinese Academy of Sciences); Fan Zhang (State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences); Yunyou Huang (Guangxi Normal University); Jianan Chen (State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences and University of Chinese Academy of Sciences); Mengjia Du (State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences and University of Chinese Academy of Sciences); Rui Ren (China Electronics Technology Research Institute of Cyberspace Security); Chen Zheng (Institute of Software, Chinese Academy of Sciences); Daoyi Zheng (Baidu); Haoning Tang (Tencent); Kunlin Zhan (58.com); Biao Wang (NetEase); Defei Kong (ByteDance); Minghe Yu (Zhihu); Chongkang Tan (Lenovo); Huan Li (PayPal); Xinhui Tian (Moqi); Yatao Li (Microsoft Research Asia China); Gang Lu (Huawei); Junchao Shao (JD.com); Zhenyu Wang (CloudTa); Xiaoyu Wang (Intellifusion); Hainan Ye (Beijing Academy of Frontier Sciences and Technology)
10:06	10:21	Discussion & Questions
10:30	10:51	Session 2: GPUs Session Chair: Rachata Ausavarungnirun (TGGS, King Mongkut's University of Technology North Bangkok)
10:30	10:32	CoCoPeLia: Communication-Computation Overlap Prediction for Efficient Linear Algebra on GPUs Petros Anastasiadis (National Technical University of Athens); Nikela Papadopoulou (National Technical University of Athens); Georgios Goumas (National Technical University of Athens); Nectarios Koziris (National Technical University of Athens)
10:32	10:34	Learning Sparse Matrix Row Permutations for Efficient SpMM on GPU Architectures Atefeh Mehrabi (Duke University); Donghyuk Lee (NVIDIA); Niladrish Chatterjee (NVIDIA); Daniel J. Sorin (Duke University); Benjamin C. Lee (University of Pennsylvania); Mike O'Connor (NVIDIA / UT-Austin)
10:34	10:36	Analyzing Secure Memory Architecture for GPUs Shougang Yuan (NC State University); Ardhi Yudha (University of Central Florida); Yan Solihin (University of Central Florida); Huiyang Zhou (NC State University)
10:36	10:51	Discussion & Questions
10:51	11:21	Poster Session A
		MicroGrad: A Centralized Framework for Workload Cloning and Stress Testing Gokul Subramanian Ravi (University of Chicago); Ramon Bertran and Pradip Bose (IBM Research); Mikko Lipasti (UW-Madison)
		ViStA: Video Streaming and Analytics Benchmark Navneet Raju, Hari Om, Rahul M Koushik and Subramaniam Kalambur (PES University, Bengaluru, India)
		Analysis of Factors Affecting Power Consumption and Energy Efficiency of SGEMM Workload on Low-Power 28nm Myriad-2 VPU Suyash Bakshi and Lennart Johnsson (University of Houston)
		A Defense-Inspired Benchmark Suite Pete Ehrett, Nathan Block, Bing Schaefer, Adrian Berding, John Paul Koenig, Pranav Srinivasan, Valeria Bertacco and Todd Austin (University of Michigan)
		An Automated Traffic Generation Framework for Performance Evaluation of Networks-on-Chip for Real World Use Cases Sri Harsha Gade (Arm Ltd., Bangalore); Anup Gangwar (Arm Ltd., Austin); Ambica Prasad, Nitin Kumar Agarwal and Ravishankar Sreedharan (Arm Ltd., Bangalore)
		How Do Graph Relabeling Algorithms Improve Memory Locality? Mohsen Koohi Esfahani, Peter Kilpatrick and Hans Vandierendonck (Queen's University Belfast)
		Designing GPU Architecture for Memory Bandwidth Reservation Emir C Marangoz, Kyoung-Don Kang and Seunghee Shin (The State University of New York at Binghamton)
		Reducing BERT Computation by Padding Removal and Curriculum Learning Wei Zhang, Wei Wei, Wen Wang, Lingling Jin and Zheng Cao (Alibaba Group)
		Efficient Split Counter Mode Encryption for NVM Qi Pei and Seunghee Shin (The State University of New York at Binghamton)
11:21	11:42	Session 3: Characterization Session Chair: Omer Khan (University of Connecticut)
11:21	11:23	AI Tax in Mobile SoCs: Quantifying the End-to-End AI Application Performance on Smartphones Michael Buch (Harvard University); Zahra Azad (Boston University); Ajay Joshi (Boston University); Vijay Janapa Reddi (Harvard/UT Austin/Google)
11:23	11:25	Performance Characterization of .NET Benchmarks Aniket Deshmukh (The University of Texas at Austin); Ruihao Li (The University of Texas at Austin); Rathijit Sen (Microsoft); Robert R. Henry (Microsoft); Monica Beckwith (Microsoft); Gagan Gupta (Microsoft)
11:25	11:27	Performance Analysis of Graph Neural Network Frameworks Junwei Wu (University of Science and Technology of China); Jingwei Sun (University of Science and Technology of China); Hao Sun (University of Science and Technology of China); Guangzhong Sun (University of Science and Technology of China)
11:27	11:42	Discussion & Questions
11:50	12:11	Session 4: Software Analysis Session Chair: Nikos Nikoleris (Arm Research)
11:50	11:52	Loopapalooza: Investigating Limits of Loop-Level Parallelism with a Compiler-Driven Approach Ali Zaidi (Arm Inc.); Konstantinos Iordanou (The University of Manchester); Mikel Lujan (The University of Manchester); Giacomo Gabrielli (Arm Inc.)
11:52	11:54	Real-Time Characterization of Data Access Correlations Bryan Harris (University of Louisville); Michael Marzullo (University of Louisville); Nihat Altiparmak (University of Louisville)
11:54	11:56	Comparative Code Structure Analysis using Deep Learning for Performance Prediction Nathan Pinnow (Lawrence Livermore National Laboratory); Tarek Ramadan (Texas State University); Tanzima Z. Islam (Texas State University); Chase Phelps (Texas State University); Jayaraman Thiagarajan (Lawrence Livermore National Laboratory)
11:56	12:11	Discussion & Questions

Start	End	Tuesday, March 30, 2021
09:00	09:50	Keynote: Systems for Precision Health Speaker: Reetuparna Das (University of Michigan)
09:50	10:25	Session 5: Best Paper Nominations Session Chair: Trevor E. Carlson (National University of Singapore)
09:50	09:52	Understanding Capacity-Driven Scale-Out Neural Recommendation Inference Michael Lui (Drexel University); Yavuz Yetim (Facebook); Oz Ozkan (Facebook); Zhuoran Zhao (Facebook); Shin-Yeh Tsai (Facebook); Carole-Jean Wu (Facebook); Mark Hempstead (Tufts University)
09:52	09:54	Re-establishing Fetch-Directed Instruction Prefetching: An Industry Perspective Yasuo Ishii (Arm); Jaekyu Lee (Arm Research); Krishnendra Nathella (Arm Research); Dam Sunwoo (Arm Research)
09:54	09:56	Enabling Reproducible and Agile Full-System Simulation Bobby R. Bruce (University of California, Davis); Ayaz Akram (University of California, Davis); Hoa Nguyen (University of California, Davis); Kyle Roarty (University of Wisconsin-Madison); Mahyar Samani (University of California, Davis); Marjan Fariborz (University of California, Davis); Trivikram Reddy (University of California, Davis); Matthew D. Sinclair (University of Wisconsin-Madison, AMD Research); Jason Lowe-Power (University of California, Davis)
09:56	09:58	A Case Against Hardware Managed DRAM Caches for NVRAM based Systems Mark Hildebrand (University of California, Davis); Julian T. Angeles (University of California, Davis); Jason Lowe-Power (University of California, Davis); Venkatesh Akella (University of California, Davis)
09:58	10:00	Characterizing Massively Parallel Polymorphism Mengchi Zhang (Purdue University); Ahmad Alawneh (Purdue University); Timothy G. Rogers (Purdue University)
10:00	10:25	Discussion & Questions
10:25	10:55	Poster Session B
		Pinpointing the Memory Behaviors of DNN Training Jiansong Li (Institute of Computing Technology, Chinese Academy of Sciences); Xiao Dong (Youtu Lab, Tencent); Guangli Li (Institute of Computing Technology, Chinese Academy of Sciences); Peng Zhao (2012 Labs, Huawei Technology Co., Ltd); Xueying Wang (Institute of Computing Technology, Chinese Academy of Sciences); Xianzhi Yu (Noah's Ark Lab, Huawei Technology Co., Ltd); Wei Cao, Lei Liu, and Xiaobing Feng (Institute of Computing Technology, Chinese Academy of Sciences)
		Thermal-Aware Overclocking for Smartphones Guru Prasad Srinivasa (University at Buffalo); David Werner and Mark Hempstead (Tufts University); Geoffrey Challen (University of Illinois)
		The Impact of SoC Integration and OS Deployment on the Reliability of Arm Processors Pablo Bodmann (UFRGS); George Papadimitriou (University of Athens); Dimitris Gizopoulos (University of Athens); Paolo Rech (Politecnico di Torino)
		Memory-Efficient Hardware Performance Counters with Approximate-Counting Algorithms Jingyi Xu, Sehoon Kim, Borivoje Nikolic, and Yakun Sophia Shao (University of California, Berkeley)
		Architecture-Level Energy Estimation for Heterogeneous Computing Systems Francis Wang, Yannan Nellie Wu, Matthew Woicik, Vivienne Sze and Joel S. Emer (Massachusetts Institute of Technology)
		Sparseloop: An Analytical, Energy-Focused Design Space Exploration Methodology for Sparse Tensor Accelerators Yannan Nellie Wu (MIT); Po-An Tsai and Angshuman Parashar (NVIDIA); Vivienne Sze (MIT); Joel S. Emer (MIT/NVIDIA)
		Splash-4: Improving Scalability with Lock-Free Constructs Eduardo José Gómez Hernández and Ruixiang Shao (University of Murcia); Christos Sakalis and Stefanos Kaxiras (Uppsala University); Alberto Ros (University of Murcia)
		Accelerating Fully Homomorphic Encryption Through Microarchitecture-Aware Analysis and Optimization Wonkyung Jung (Seoul National University); Eojin Lee (Samsung Electronics); Sangpyo Kim, Namhoon Kim and Keewoo Lee (Seoul National University); Chohong Min (Ewha Womans University); Jung Hee Cheon and Jung Ho Ahn (Seoul National University)
		Efficient Management of Scratch-Pad Memories in Deep Learning Accelerators Subhankar Pal (University of Michigan); Swagath Venkataramani, Viji Srinivasan and Kailash Gopalakrishnan (IBM Research)
10:55	11:23	Session 6: Datacenters and HPC Session Chair: Christina Delimitrou (Cornell University)
10:55	10:57	Hardware Acceleration for DBMS Machine Learning Scoring: Is It Worth the Overheads? Zahra Azad (Boston University); Rathijit Sen (Microsoft); Kwanghyun Park (Microsoft); Ajay Joshi (Boston University)
10:57	10:59	TPUPoint: Automatically Characterizing Hardware Accelerated Data Center Machine Learning Program Behavior Abenezer Wudenhe (University of California, Riverside); Hung-Wei Tseng (University of California, Riverside)
10:59	11:01	Pitfalls of InfiniBand with On-Demand Paging Takuya Fukuoka (The University of Tokyo); Shigeyuki Sato (The University of Tokyo); Kenjiro Taura (The University of Tokyo)
11:01	11:03	Analyzing the Interplay Between Random Shuffling and Storage Devices for Efficient Machine Learning Zhi-Lin Ke (National Taiwan University); Hsiang-Yun Cheng (Academia Sinica); Chia-Lin Yang (National Taiwan University); Han-Wei Huang (National Taiwan University)
11:03	11:23	Discussion & Questions
11:30	11:51	Session 7: HW and Co-Design Session Chair: Arrvindh Shriraman (Simon Fraser University)
11:30	11:32	E3: A HW/SW Co-design Neuroevolution Platform for Autonomous Learning in Edge Device Sheng-Chun Kao (Georgia Institute of Technology); Tushar Krishna (Georgia Institute of Technology)
11:32	11:34	FireMarshal: Making HW/SW Co-Design Reproducible and Reliable Nathan Pemberton (University of California, Berkeley); Alon Amid (University of California, Berkeley)
11:34	11:36	COBRA: A Framework for Evaluating Compositions of Hardware Branch Predictors Jerry Zhao (University of California, Berkeley); Abraham Gonzalez (University of California, Berkeley); Alon Amid (University of California, Berkeley); Sagar Karandikar (University of California, Berkeley); Krste Asanovic (University of California, Berkeley)
11:36	11:51	Discussion & Questions
11:51	12:01	Closing Remarks