Technical Program Schedule

Saturday, February 10th, 2007
All Day Events Workshop RIDMS-2: Second Workshop on Real-Time and Interactive Digital Media Supercomputing

Sunday, February 11th, 2007
All Day Events Workshop INTERACT-11: Eleventh Workshop on Interaction between Compilers and Computer Architectures
Workshop CAECW-10: Tenth Workshop on Computer Architecture Evaluation using Commercial Workloads
Morning Events Workshop RIDMS-2: Second Workshop on Real-Time and Interactive Digital Media Supercomputing
Workshop CMP-MSI: First Workshop on Chip Multiprocessor Memory Systems and Interconnects
Tutorial: Practical Cache Performance Modeling for Computer Architects
Y. Solihin (NCSU), T. Puzak, and P. Emma (IBM Research)
Afternoon Events Workshop CARD: First Workshop on Computer Architecture Research Directions
Tutorial: Microprocessor Memory Array Circuits for Architects
6:00PM - 8:00PM HPCA Conference Reception

Monday, February 12th, 2007
7:30AM - 8:30AM Breakfast
8:30AM - 8:50AM Welcome Message
8:50AM - 10:00AM Keynote I
Interconnect-Centric Computing
Bill Dally (Willard R. and Inez Kerr Bell Professor of Engineering and Chairman, Department of Computer Science, Stanford University)
Abstract: As we enter the many-core era, the interconnection networks of a computer system, rather than the processor or memory modules, will dominate its performance. Several recent developments in interconnection network architecture including global adaptive routing, high-radix routers, and technology-matched topologies offer large improvements in the performance and efficiency of this critical component. The implementation of a portion of several interconnection networks on multi-core chips also raises new opportunities and challenges for network design. This talk explores the role of interconnection networks in modern computer systems, recent developments in network architecture and design, and the challenges of on-chip interconnection networks. Examples will be drawn from several systems including the Cray BlackWidow.
10:00AM - 10:30AM Break
10:30AM - 12:00PM Session 1: Multiprocessor Architectures
An Adaptive Shared/Private NUCA Cache Partitioning Scheme for Chip Multiprocessors
H. Dybdahl (Norwegian University of Science and Technology) and P. Stenström (Chalmers)
Evaluating MapReduce for Multicore and Multiprocessor Systems
C. Ranger, R. Raghuraman, A. Penmetsa, G. Bradski and C. Kozyrakis (Stanford University)
Extending Multicore Architectures to Exploit Hybrid Parallelism in Single-Thread Applications
H. Zhong, S. Lieberman, and S. Mahlke (University of Michigan)
12:00PM - 1:30PM Lunch
1:30PM - 3:00PM Session 2: Industry
Implications of Device Timing Variability on Full Chip Timing
M. Annavaram, E. Grochowski, and P. Reed (Intel)
Optical Interconnect Opportunities for Future Server Memory Systems
Y. Katayama and A. Okazaki (IBM)
3:00PM - 3:30PM Break
3:30PM - 5:00PM Session 3: Prefetching
Feedback Directed Prefetching: Improving the Performance and Bandwidth-Efficiency of Hardware Prefetchers
S. Srinath (Microsoft and The University of Texas at Austin), O. Mutlu (Microsoft Research), H. Kim, and Y. Patt (University of Texas at Austin)
Improving Branch Prediction and Predicated Execution in Out-of-Order Processors
E. Quiñones, J.M. Parcerisa (Universitat Politècnica de Catalunya) and A. González (Intel and Universitat Politècnica de Catalunya)
Accelerating and Adapting Precomputation Threads for Efficient Prefetching
W. Zhang, B. Calder, and D. Tullsen (University of California, San Diego)
6:00PM - 7:30PM TCC Business Meeting

Tuesday, February 13th, 2007
7:30AM - 8:30AM Breakfast
8:30AM - 9:30AM Keynote II
Petascale Computing Research Challenges – A Manycore Perspective
Steve Pawlowski (Senior Fellow and Chief Technology Officer of the Digital Enterprise Group, Intel Corporation)
Abstract: Future High Performance Computing will undoubtedly reach Petascale and beyond. Today’s HPC is tomorrow’s Personal Computing. What are the evolving processor architectures towards Multi-core and Many-core for the best performance per watt; memory bandwidth solutions to feed the ever more powerful processors; intra-chip interconnect options for optimal bandwidth vs. power? With Moore’s Law continuing to prove its viability, shrinking transistor geometries mean that improving reliability is even more challenging. Intel Senior Fellow and Chief Technology Officer of Intel’s Digital Enterprise Group, Steve Pawlowski, will provide his technology vision, insight and research challenges to achieve the vision of Petascale computing and beyond.
9:30AM - 10:00AM Break
10:00AM - 12:00PM Session 4: Memory Systems I (Parallel Session)
A Scalable, Non-blocking Approach to Transactional Memory
H. Chafi, J. Casper, B. Carlstrom, A. McDonald, C. Cao Minh, W. Baek, C. Kozyrakis, and K. Olukotun (Stanford University)
Fully-Buffered DIMM Memory Architectures: Understanding Mechanisms, Overheads and Scaling
B. Ganesh, B. Jacob (University of Maryland College Park), D. Wang (Metaram), and A. Jaleel (Intel)
HARD: Hardware-Assisted Lockset-Based Race Detection
P. Zhou, R. Teodorescu, and Y. Zhou (University of Illinois at Urbana-Champaign)
Colorama: Architectural Support for Data-Centric Synchronization
L. Ceze, P. Montesinos (University of Illinois at Urbana-Champaign), C. von Praun (IBM T J Watson) and J. Torrellas (University of Illinois at Urbana-Champaign)
Session 5: Error Detection and Fault-Tolerance (Parallel Session)
Error Detection Via Online Checking of Cache Coherence with Token Coherence Signatures
A. Meixner and D. Sorin (Duke University)
A Low Overhead Fault Tolerant Coherence Protocol for CMP Architectures
R. Fernández-Pascual, J. García, M. Acacio, and J. Duato (Universidad de Murcia and Universidad Politécnica de Valencia)
Perturbation-Based Fault Screening
P. Racunas (Intel), K. Constantinides (University of Michigan), S. Manne, and S. Mukherjee (Intel)
Application-Level Correctness and its Impact on Fault Tolerance
X. Li and D. Yeung (University of Maryland at College Park)
12:00PM - 1:30PM Lunch
1:30PM - 3:00PM Session 6: Thermal Modeling and SIMD (Parallel Session)
Thermal Herding: Microarchitecture Techniques for Controlling HotSpots in High-Performance 3D-Integrated Processors
K. Puttaswamy and G. Loh (Georgia Institute of Technology)
Modeling and Managing Thermal Profiles of Rack-Mounted Servers with ThermoStat
J. Choi, Y. Kim, A. Sivasubramaniam, J. Srebric, Q. Wang (Pennsylvania State University), and J. Lee (KAIST)
Liquid SIMD: Abstracting SIMD Hardware Using Lightweight Dynamic Mapping
N. Clark, A. Hormati, S. Mahlke (University of Michigan), S. Yehia, and K. Flautner (ARM)
Session 7: Chip Multiprocessors, Simultaneous Multi-threading, and Caches (Parallel Session)
Interactions Between Compression and Prefetching in Chip Multiprocessors
A. Alameldeen (Intel) and D. Wood (University of Wisconsin-Madison)
A Memory-Level Parallelism Aware Fetch Policy for SMT Processors
S. Eyerman and L. Eeckhout (Ghent University)
Line Distillation: Increasing Cache Capacity By Filtering Unused Words in Cache Lines
M. Qureshi, M. Suleman, and Y. Patt (University of Texas at Austin)
3:00PM - 3:30PM Break
3:30PM - 5:00PM Panel
Researching Novel Systems: To Instantiate, Emulate, Simulate, or Analyticate?
Moderator: Doug Burger (University of Texas at Austin)

Panel Members:
Joel Emer (Intel)
Phil Emma (IBM)
Steve Keckler (University of Texas at Austin)
Yale Patt (University of Texas at Austin)
Dave Patterson (University of California, Berkeley)

Description: The computer architecture research community has a rich menu of methodological options, which includes building full system prototypes, measuring in simulation, emulating on FPGAs, or constructing sophisticated analytic models. However, building custom systems has become enormously expensive, especially given the current funding climate. Simulations have become enormously complex as well, often including full operating systems. Analytic models have become less popular as system complexity has grown. Finally, some argue that FPGA emulation of hardware is the right approach for the future, while others opine that it is the worst of all worlds. This panel will debate these various points of view, which are of great interest to the funding sponsors of our community.
6:00PM - 10:30PM Banquet

Wednesday, February 14th, 2007
7:00AM - 8:00AM Breakfast
8:00AM - 10:00AM Session 8: Memory Systems II
LogTM-SE: Decoupling Hardware Transactional Memory from Caches
L. Yen, J. Bobba, M. Marty, K. Moore, H. Volos, M. Hill, M. Swift, and D. Wood (University of Wisconsin-Madison)
MemTracker: Efficient and Programmable Support for Memory Access Monitoring and Debugging
G. Venkataramani, B. Roemer (Georgia Institute of Technology), Y. Solihin (North Carolina State University) and M. Prvulovic (Georgia Institute of Technology)
A Burst Scheduling Access Reordering Mechanism
J. Shao and B. Davis (Michigan Technological University)
Exploiting Postdominance for Speculative Parallelization
M. Agarwal, K. Malik, K. Woley, S. Stone, and M. Frank (University of Illinois at Urbana-Champaign)
10:00AM - 10:30AM Break
10:30AM - 12:30PM Session 9: Virtual Machines, Caches and Modeling
Concurrent Direct Network Access for Virtual Machine Monitors
P. Willmann, J. Shafer, D. Carr (Rice University), A. Menon (EPFL), S. Rixner, A. Cox (Rice University), and W. Zwaenepoel (EPFL)
A Domain-Specific On-Chip Network Design for Large Scale Cache Systems
Y. Jin, E. Kim (Texas A&M University), and K. Yum (University of Texas, San Antonio)
An Adaptive Cache Coherence Protocol Optimized for Producer-Consumer Sharing
L. Cheng, J. Carter (University of Utah), and D. Dai (SGI)
Illustrative Design Space Studies with Microarchitectural Regression Models
B. Lee and D. Brooks (Harvard University)
12:30PM Conference Program Ends