- Neural Magic in Deep Sparse, "oBERT: Compound Sparsification Delivers Smaller Accurate Models for NLP" (May 20, 2022): GPU-level latency on CPUs with 10x smaller models using oBERT + DeepSparse.
- Neural Magic in Deep Sparse, "Sparsify Hugging Face BERT for Better CPU Performance & Smaller File Size" (Oct 8, 2021): Get started sparsifying Hugging Face BERT using your data.
- Neural Magic in Deep Sparse, "Pruning Hugging Face BERT: Using Compound Sparsification for Faster CPU Inference with Better…" (Aug 23, 2021): Apply both pruning and layer-dropping sparsification methods to increase BERT performance anywhere from 3.3x to…
- Neural Magic in CodeX, "YOLOv5: Tiny Footprint & GPU Results on CPUs — Neural Magic" (Aug 6, 2021): Prune and quantize YOLOv5 for a 10x increase in performance with 12x smaller model files.
- Neural Magic in Deep Sparse, "Tutorial: Real-time YOLOv3 on a Laptop Using Sparse Quantization" (May 25, 2021): Sparsifying YOLOv3 (or any other model) involves removing redundant information from neural networks using algorithms such as pruning and…
- Neural Magic in Deep Sparse, "YOLOv3 on CPUs: Sparsifying to Achieve GPU-Level Performance" (Apr 1, 2021): Use CPUs to decrease costs and increase deployment flexibility while still achieving GPU-class performance.
- Neural Magic in Deep Sparse, "Delivering GPU-Class Performance on CPUs: How Neural Magic's Deep Sparse Technology Works" (Mar 22, 2021): While mapping the neural connections in the brain at MIT, Neural Magic's founders Nir Shavit and Alexander Matveev were frustrated with…
- Neural Magic in Deep Sparse, "Sparsifying for Better ResNet-50 Performance on CPUs" (Mar 11, 2021): In this post, we elaborate on how we measured, on commodity cloud hardware, the throughput and latency of five ResNet-50 v1 models…