Research
Sustainable NPU Architecture
The proliferation of AI applications has made the efficient design of neural processing units (NPUs) crucial. As AI models diversify to serve a wide range of purposes, they exhibit substantial variation not only in size but also in operator types, data formats, and numeric formats. In response, both industry and academia have proposed customized solutions, designing new NPU chips tailored to each emerging model. This trend raises a fundamental question: is this approach sustainable?
- Scalable on-chip memory systems and management techniques for scale-out NPU architectures
- On-chip memory management technique for embedding vector operations
- Energy- and cost-efficient heterogeneous off-chip memory systems for NPUs
- Processing unit architectures supporting arbitrary numeric formats
- Simulation infrastructure for NPUs
Future Memory Systems
- Comprehensive analysis of CXL-based server memory systems
- Data object-aware memory management for CXL-based memory systems
- Software-based cache coherence for CXL-based memory systems
Software Techniques for GPUs
- Automatic thread allocation technique for multi-tenant inference on embedded GPUs
- Memory oversubscription-aware tensor migration scheduling technique
- Accelerating the K-means clustering algorithm on embedded GPUs