Research

Sustainable NPU Architecture

The proliferation of AI applications has made the efficient design of neural processing units (NPUs) crucial. As AI models diversify to serve a wide range of purposes, they vary substantially not only in size but also in operator types, data formats, and numeric formats. In response, both industry and academia have proposed customized solutions, designing a new NPU chip tailored to each emerging model. This trend raises a fundamental question: is this approach sustainable?

Scalable on-chip memory systems and management techniques for scale-out NPU architectures
On-chip memory management technique for embedding vector operations
Energy- and cost-efficient heterogeneous off-chip memory systems for NPUs
Processing unit architectures supporting arbitrary numeric formats
Simulation infrastructure for NPUs

Future Memory Systems

Comprehensive analysis of CXL-based server memory systems
Data object-aware memory management for CXL-based memory systems
Software-based cache coherence for CXL-based memory systems

Software Techniques for GPUs

Automatic thread allocation technique for multi-tenant inference on embedded GPUs
Memory oversubscription-aware tensor migration scheduling technique
Accelerating the K-means clustering algorithm on embedded GPUs