oBERT: Compound Sparsification Delivers Smaller Accurate Models for NLP (Neural Magic in Deep Sparse)
GPU-Level Latency on CPUs With 10x Smaller Models using oBERT + DeepSparse. 4 min read · May 20, 2022

Sparsify Hugging Face BERT for Better CPU Performance & Smaller File Size (Neural Magic in Deep Sparse)
Get Started: Sparsify Hugging Face BERT Using Your Data. 2 min read · Oct 8, 2021

Pruning Hugging Face BERT: Using Compound Sparsification for Faster CPU Inference with Better… (Neural Magic in Deep Sparse)
Apply both pruning and layer-dropping sparsification methods to increase BERT performance anywhere from 3.3x to… 6 min read · Aug 23, 2021

YOLOv5: Tiny Footprint & GPU Results on CPUs — Neural Magic (Neural Magic in CodeX)
Prune and quantize YOLOv5 for a 10x increase in performance with 12x smaller model files. 8 min read · Aug 6, 2021

Tutorial: Real-time YOLOv3 on a Laptop Using Sparse Quantization (Neural Magic in Deep Sparse)
Sparsifying YOLOv3 (or any other model) involves removing redundant information from neural networks using algorithms such as pruning and… 1 min read · May 25, 2021

YOLOv3 on CPUs: Sparsifying to Achieve GPU-Level Performance (Neural Magic in Deep Sparse)
Use CPUs to decrease costs and increase deployment flexibility while still achieving GPU-class performance. 9 min read · Apr 1, 2021

Delivering GPU-Class Performance on CPUs: How Neural Magic’s Deep Sparse Technology Works (Neural Magic in Deep Sparse)
While mapping the neural connections in the brain at MIT, Neural Magic’s founders Nir Shavit and Alexander Matveev were frustrated with… 4 min read · Mar 22, 2021

Sparsifying for Better ResNet-50 Performance on CPUs (Neural Magic in Deep Sparse)
In this post, we elaborate on how we measured, on commodity cloud hardware, the throughput and latency of five ResNet-50 v1 models… 7 min read · Mar 11, 2021
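The common thread across these posts is sparsification: removing low-value weights so a CPU runtime can skip the resulting zeros. As a rough intuition for the core operation (not Neural Magic's actual recipe, which the posts above cover), here is a minimal sketch of unstructured magnitude pruning with NumPy; the `magnitude_prune` helper and the 90% sparsity target are illustrative choices, not anything from these articles:

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude fraction of weights (unstructured pruning)."""
    assert 0.0 <= sparsity < 1.0
    k = int(weights.size * sparsity)  # number of weights to zero out
    if k == 0:
        return weights.copy()
    flat = np.abs(weights).ravel()
    threshold = np.partition(flat, k - 1)[k - 1]  # k-th smallest magnitude
    pruned = weights.copy()
    pruned[np.abs(pruned) <= threshold] = 0.0
    return pruned

# Illustrative usage: prune a random dense layer to ~90% sparsity.
rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64))
w_sparse = magnitude_prune(w, 0.9)
achieved = 1.0 - np.count_nonzero(w_sparse) / w_sparse.size
```

In practice (as the posts describe), pruning is applied gradually during training so accuracy can recover, and is often combined with quantization and layer dropping ("compound sparsification") before handing the model to a sparsity-aware runtime such as DeepSparse.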