DCC: Data-Centric Compilation of Machine Learning Kernels for Processing-In-Memory Architectures
Peiming Yang, Sankeerth Durvasula, Ivan Fernandez, Mohammad Sadrosadati, Onur Mutlu, Gennady Pekhimenko, Christina Giannoula
Key claim
DCC achieves up to 13.17x speedup on PIM devices over GPU-only execution.
DCC is a new ML compiler that optimizes data rearrangements and compute code for PIM devices. The paper reports up to 13.17x speedup on specific PIM architectures compared with GPU-only execution, a result that matters to builders trying to maximize the efficiency of ML applications.
DCC introduces a novel approach to jointly optimize data rearrangements and compute code for PIM systems.
The evaluation is rigorous and the compiler is open-sourced, both of which support the claims.
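To make the idea of joint optimization concrete, here is a minimal toy sketch. It is not DCC's algorithm: the layout names, schedule names, and cost numbers are all hypothetical and in arbitrary units. It only shows why choosing the data layout and the compute schedule together can beat choosing the cheapest layout first and then scheduling for it.

```python
from itertools import product

# Hypothetical costs in arbitrary units; these are NOT measurements from DCC.
REARRANGE_COST = {"row_major": 2, "tiled": 5}
COMPUTE_COST = {
    ("row_major", "serial"): 20,
    ("row_major", "bank_parallel"): 14,
    ("tiled", "serial"): 12,
    ("tiled", "bank_parallel"): 4,
}
LAYOUTS = list(REARRANGE_COST)
SCHEDULES = ["serial", "bank_parallel"]


def total_cost(layout, schedule):
    """End-to-end cost: data rearrangement plus compute for this pairing."""
    return REARRANGE_COST[layout] + COMPUTE_COST[(layout, schedule)]


def decoupled_search():
    """Pick the cheapest layout in isolation, then the best schedule for it."""
    layout = min(LAYOUTS, key=REARRANGE_COST.get)
    schedule = min(SCHEDULES, key=lambda s: COMPUTE_COST[(layout, s)])
    return layout, schedule


def joint_search():
    """Search layout and schedule together, minimizing the end-to-end cost."""
    return min(product(LAYOUTS, SCHEDULES), key=lambda pair: total_cost(*pair))


decoupled = decoupled_search()
joint = joint_search()
print("decoupled:", decoupled, total_cost(*decoupled))
print("joint:    ", joint, total_cost(*joint))
```

With these made-up numbers, the decoupled search settles on the layout that is cheapest to produce, while the joint search accepts a costlier rearrangement because it enables a much cheaper compute schedule.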
Deep reliability assessment
The methodology supports the claim that DCC can significantly optimize data rearrangement and compute scheduling for PIM architectures. However, the paper may overstate the size of the performance improvements, because it does not account for specific hardware limitations or configurations. The results are based on simulations and may not fully translate to all real-world scenarios.
Reproducibility
Yes. DCC is open-sourced at https://github.com/SPIN-Research-Group/DCC.
Discussion questions
- What assumptions about the interdependence of data rearrangement and compute scheduling might not hold for all types of ML kernels?
- How can builders effectively integrate DCC into existing ML workflows, especially when dealing with diverse hardware setups?
- What specific conditions or configurations would lead to DCC underperforming compared to traditional methods?
Key figure
Figure 1 illustrates the workflow of a near-bank PIM architecture in three steps: input data rearrangement, computation execution, and output data rearrangement.
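The three steps can be sketched with a toy matrix-vector product. This is an illustrative model only, not DCC's generated code or any real PIM API: the bank count, the row-wise partitioning, and every function name are assumptions chosen for the example.

```python
# Toy model of the three-step near-bank PIM workflow (illustrative only).
# NUM_BANKS and all function names are hypothetical, not part of DCC.
NUM_BANKS = 4


def rearrange_input(matrix, num_banks):
    """Step 1: split the matrix rows into one contiguous chunk per bank."""
    rows_per_bank = (len(matrix) + num_banks - 1) // num_banks  # ceiling division
    return [
        matrix[b * rows_per_bank:(b + 1) * rows_per_bank]
        for b in range(num_banks)
    ]


def compute_in_bank(bank_rows, vector):
    """Step 2: a bank computes dot products using only its local rows."""
    return [sum(a * x for a, x in zip(row, vector)) for row in bank_rows]


def rearrange_output(partials):
    """Step 3: concatenate per-bank results into one host-side vector."""
    return [y for bank_result in partials for y in bank_result]


def pim_gemv(matrix, vector, num_banks=NUM_BANKS):
    """Run the three steps in order for a matrix-vector product."""
    banks = rearrange_input(matrix, num_banks)
    partials = [compute_in_bank(rows, vector) for rows in banks]
    return rearrange_output(partials)


matrix = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]
vector = [1, -1]
result = pim_gemv(matrix, vector)
print(result)
```

In this sketch the per-bank computation touches only locally stored rows, so steps 1 and 3 are where data movement happens. The chosen partitioning also leaves some banks with fewer rows than others (or none), a reminder that the layout decision directly shapes how much work each bank can do.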