An Image Enhancing Pattern-Based Sparsity for Real-Time DNN Execution with Compiler
A Unified Framework for DNN Model Compression Using the Reweighted L1 Method
Automatic DNN Structured Pruning Framework for Ultra-High Compression Rates
Locality-Aware Weight Pruning for Fast DNN Inference on GPUs
Achieve Real-Time DNN Execution on Off-the-Shelf Mobile Devices with Compression-Compilation Co-Design
The PCONV acceleration framework introduces a new pruning dimension: fine-grained pruning patterns inside coarse-grained structures, guaranteeing both high accuracy and hardware friendliness. This flexibility relies on mathematically sound pattern-based sparsity, which maintains the high accuracy of the sparse DNN even at high sparsity ratios. Pattern-based sparsity naturally matches source-to-source compilation and code-optimization techniques, bridging the gap between algorithm-level compression and embedded hardware acceleration. PCONV achieves real-time execution of the most representative DNN structures on off-the-shelf smartphones.
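To make the idea of pattern-based sparsity concrete, here is a minimal sketch of pattern-based kernel pruning: each 3x3 convolution kernel keeps only the weights selected by one of a few fixed 4-entry patterns, chosen to preserve the most weight magnitude. The pattern set and function names below are illustrative assumptions, not PCONV's published pattern library.

```python
# Hypothetical sketch: pattern-based pruning of a 3x3 convolution kernel.
# Each pattern lists 4 indices into the flattened 3x3 kernel that are kept;
# all other weights are zeroed. This pattern set is an assumption for
# illustration only.
PATTERNS = [
    (0, 1, 3, 4),  # top-left 2x2 block of the 3x3 kernel
    (1, 2, 4, 5),  # top-right 2x2 block
    (3, 4, 6, 7),  # bottom-left 2x2 block
    (4, 5, 7, 8),  # bottom-right 2x2 block
]

def prune_kernel(kernel):
    """kernel: list of 9 weights (flattened 3x3).
    Keep the pattern that preserves the largest L1 magnitude; zero the rest."""
    best = max(PATTERNS, key=lambda p: sum(abs(kernel[i]) for i in p))
    return [w if i in best else 0.0 for i, w in enumerate(kernel)]
```

Because every kernel carries exactly the same number of nonzeros in one of a few fixed shapes, the compiler can generate a single specialized computation loop per pattern, which is what makes this sparsity form hardware friendly.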