YOLOv3 on a Laptop Example

Sparsifying YOLOv3 (or any other model) involves removing redundant information from a neural network using algorithms such as pruning and quantization, among others. This sparsification process results in many benefits for deployment environments, including faster inference and smaller file sizes. Unfortunately, many practitioners have not realized these benefits because the process is complicated and involves a large number of hyperparameters.
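
To make the two techniques concrete, here is a minimal sketch (not the actual YOLOv3 pipeline described below) that applies unstructured magnitude pruning and dynamic INT8 quantization to a toy network using PyTorch's built-in utilities; the layer sizes and the 80% sparsity level are illustrative assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# Toy two-layer network standing in for a real detection model.
model = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 10))

# Pruning: zero out the 80% of weights with the smallest magnitude
# in each Linear layer (unstructured magnitude pruning).
for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.8)
        prune.remove(module, "weight")  # bake the zeros into the tensor

# Quantization: store weights as INT8 for smaller files and faster inference.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

sparsity = (model[0].weight == 0).float().mean().item()
print(f"Layer-0 sparsity: {sparsity:.0%}")
```

Note that zeroed weights only turn into speedups on runtimes that actually exploit sparsity, which is the gap the CPU engine discussed below is built to close.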

To simplify the process, Neural Magic’s ML team created recipes encoding the hyperparameters and instructions needed to create highly accurate pruned and pruned-quantized YOLOv3 models. …
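
As a hedged illustration of what applying a recipe looks like in code, the sketch below follows SparseML's documented recipe-driven training pattern; the stand-in model, optimizer, "recipe.yaml" path, and steps_per_epoch value are placeholder assumptions, and in the actual YOLOv3 workflow the model and training loop come from the Ultralytics framework instead:

```python
import torch
import torch.nn as nn
from sparseml.pytorch.optim import ScheduledModifierManager

# Stand-in model and optimizer; in this post's workflow both come from
# the Ultralytics YOLOv3 training framework instead.
model = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 80))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# "recipe.yaml" is a placeholder path for a downloaded sparsification
# recipe encoding the pruning/quantization hyperparameters and schedule.
manager = ScheduledModifierManager.from_yaml("recipe.yaml")
optimizer = manager.modify(model, optimizer, steps_per_epoch=100)

# ... the ordinary training loop runs here; the recipe's modifiers
# prune (and optionally quantize) the model on schedule ...

manager.finalize(model)  # remove the manager's hooks once training completes
```

The appeal of this pattern is that the training code stays untouched: all sparsification decisions live in the recipe file, so swapping a pruned recipe for a pruned-quantized one requires no code changes.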


Use CPUs to decrease costs and increase deployment flexibility while still achieving GPU-class performance.

In this post, we elaborate on how we used state-of-the-art pruning and quantization techniques to improve the performance of YOLOv3 on CPUs. We’ll show that by leveraging the robust YOLO training framework from Ultralytics with SparseML’s sparsification recipes, it is easy to create highly pruned and INT8-quantized YOLO models that deliver more than a 6x increase in performance over state-of-the-art PyTorch and ONNX Runtime CPU implementations. Lastly, we’ll show that model sparsification (pruning and quantization) doesn’t have to be a hard and daunting task when using Neural Magic’s open-source tools and recipe-driven approaches.
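
For a sense of how a batch-size-1 CPU baseline like the ONNX Runtime numbers above can be measured, here is a hedged sketch; the "yolov3.onnx" path, the 640x640 input shape (the Ultralytics default), and the iteration counts are assumptions rather than the exact harness behind Figure 1:

```python
import time
import numpy as np
import onnxruntime as ort

# "yolov3.onnx" is a placeholder for an exported (pruned and/or
# quantized) model file.
session = ort.InferenceSession("yolov3.onnx", providers=["CPUExecutionProvider"])
input_name = session.get_inputs()[0].name

# Batch-size-1 input for real-time (latency-bound) inference.
x = np.random.rand(1, 3, 640, 640).astype(np.float32)

for _ in range(10):  # warm-up runs
    session.run(None, {input_name: x})

iters = 100
start = time.perf_counter()
for _ in range(iters):
    session.run(None, {input_name: x})
elapsed = time.perf_counter() - start

print(f"Average latency: {1000 * elapsed / iters:.1f} ms "
      f"({iters / elapsed:.1f} items/sec)")
```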

Figure 1: Real-time performance of YOLOv3 (batch size 1) for different CPU implementations compared to common GPU benchmarks.
* ONNX Runtime currently has a known performance regression for quantized YOLOv3 on larger core counts.


While mapping the neural connections in the brain at MIT, Neural Magic’s founders Nir Shavit and Alexander Matveev were frustrated with the many limitations imposed by GPUs. Along the way, they stopped to ask themselves a simple question: why is a GPU, or any specialized [and expensive] hardware, required for deep learning?

They knew there had to be a better way. After all, the human brain addresses the computational needs of neural networks by extensively using sparsity to reduce them instead of adding FLOPS to match them. If our brains processed information the same way today’s hardware accelerators consume computing…


In this post, we elaborate on how we measured, on commodity cloud hardware, the throughput and latency of five ResNet-50 v1 models optimized for CPU inference. By the end of the post, you should be able to reproduce these benchmarks using tools available in the Neural Magic GitHub repo.
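
As a rough sketch of such a measurement, the snippet below times batch-64 ResNet-50 inference with the DeepSparse engine; the compile_model/run calls follow the engine's early Python API (consult the Neural Magic GitHub repo for the current interface), and the model path, input shape, and iteration counts are placeholder assumptions:

```python
import time
import numpy as np
from deepsparse import compile_model

# "resnet50.onnx" is a placeholder path to a sparsified ResNet-50 v1 file.
batch_size = 64
engine = compile_model("resnet50.onnx", batch_size=batch_size)

# ResNet-50 expects 224x224 RGB inputs; the engine takes a list of arrays.
x = [np.random.rand(batch_size, 3, 224, 224).astype(np.float32)]

for _ in range(5):  # warm-up runs
    engine.run(x)

iters = 50
start = time.perf_counter()
for _ in range(iters):
    engine.run(x)
elapsed = time.perf_counter() - start

print(f"Throughput: {iters * batch_size / elapsed:.1f} images/sec")
```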

Figure: ResNet-50 v1 throughput at batch size 64 on an AWS c5.12xlarge CPU instance.
