Sustainable NPU Architecture

The surge in AI applications has made the design of efficient neural processing units (NPUs) crucial. As AI models are tailored to their purposes, they diversify not only in size but across every dimension, including the operator types, data formats, and numeric formats they employ. Industry and academia respond by proposing customized solutions and designing new NPU chips for each emerging model with novel characteristics.
We raise the question: "Is this approach sustainable?"
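As a minimal sketch of the numeric-format diversity mentioned above, the snippet below renders one FP32 weight in three formats commonly found in inference hardware. The helper names and values are our own illustration, not part of any project listed here.

```python
import struct

# Illustrative only: the same FP32 weight approximated in three numeric
# formats an NPU may be asked to support, each losing different precision.

def to_fp16(x: float) -> float:
    # IEEE 754 half precision (10-bit mantissa); 'e' is the half format code
    return struct.unpack("<e", struct.pack("<e", x))[0]

def to_bf16(x: float) -> float:
    # bfloat16: keep only the high 16 bits of the FP32 encoding
    raw = struct.pack("<f", x)                          # 4 bytes, little-endian
    return struct.unpack("<f", b"\x00\x00" + raw[2:])[0]

def to_int8(x: float, scale: float = 1 / 127) -> float:
    # symmetric linear quantization, then dequantize back to float
    q = max(-128, min(127, round(x / scale)))
    return q * scale

w = 0.1234567
print(to_fp16(w), to_bf16(w), to_int8(w))  # three different approximations of w
```

A fixed-format datapath handles only one of these encodings natively, which is one concrete reason per-model NPU redesign keeps recurring.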

  • Shared on-chip memory architecture and management technique for scale-out NPU architectures
  • On-chip memory management technique for any embedding vector operation
  • Energy- and cost-efficient heterogeneous off-chip memory system for NPUs
  • Processing unit architecture for arbitrary numeric formats
Future Memory Systems

To be added.

  • Comprehensive analysis of CXL-based server memory systems
Software Techniques for GPUs

To be added.

  • Automatic thread allocation technique for multi-tenant inference on embedded GPUs
  • Memory oversubscription-aware tensor migration scheduling technique
  • Accelerating K-means clustering algorithm on embedded GPUs