Publications
TM-Training: An Energy-Efficient Tiered Memory System for Deep Learning Training in NPUs
ACM Transactions on Storage, 2025
TLP Balancer: Predictive Thread Allocation for Multi-Tenant Inference in Embedded GPUs
IEEE Embedded Systems Letters, 2024
SAVector: Vectored Systolic Arrays
IEEE Access, 2024
Energy-Efficient On-Chip Memory Management for Any Embedding Vector Operation
In preparation for submission to an international conference
Unified Address Translation for DNN Accelerators
In submission to an international conference
Accelerating K-Means Clustering in Mobile Platforms
In submission to an international conference
A Behavioral Analysis of Memory Management Software in CXL Memory Systems
In submission to an international conference
Memory Oversubscription-Aware Scheduling for Tensor Migration on GPU Unified Storage
In submission to IEEE Computer Architecture Letters
A DNN Accelerator Supporting Arbitrary Numeric Formats
In preparation for submission to an international conference