Publications
HALO: Hybrid Systolic Arrays via Logical Partitioning for Acceleration of Complex-Valued Neural Networks
IEEE International Symposium on Workload Characterization (IISWC 2025)
MOST: Memory Oversubscription-aware Scheduling for Tensor Migration on GPU Unified Storage
IEEE Computer Architecture Letters, 2025
Kubism: Disassembling and Reassembling K-Means Clustering for Mobile Heterogeneous Platforms
26th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, and Tools for Embedded Systems (LCTES 2025)
TM-Training: An Energy-Efficient Tiered Memory System for Deep Learning Training in NPUs
ACM Transactions on Storage, 2025
TLP Balancer: Predictive Thread Allocation for Multi-Tenant Inference in Embedded GPUs
IEEE Embedded Systems Letters, 2024
SAVector: Vectored Systolic Arrays
IEEE Access, 2024
Energy-Efficient On-Chip Memory Management for Embedding Vector Operation
In preparation for submission to an international conference
Mitigating Address Translation Overhead in Flash-Based Tiered Memory Systems
In submission to an international conference
A Behavioral Analysis of Memory Management Software in CXL Memory Systems
In submission to an international conference
A DNN Accelerator Supporting Arbitrary Numeric Formats
In preparation for submission to an international conference