Releases · neuralmagic/deepsparse
DeepSparse v1.0.0
New Features:
- Support added for running multiple models with the same engine when using the Elastic Scheduler.
- When using the Elastic Scheduler, the caller can now use the `num_streams` argument to tune the number of requests that are processed in parallel (see the sketch after this list).
- Pipeline and annotation support added and generalized for transformers, yolov5, and torchvision.
- Documentation additions made for transformers, yolov5, torchvision, and serving that focus on model deployment for the given integrations.
- AWS SageMaker example created.
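Below is a minimal sketch of the new `num_streams` argument together with the Elastic Scheduler, assuming the 1.0-era Python `Engine` API; the model path and parameter values are placeholders, and the exact signature may vary by release.

```python
# Minimal sketch (assumed API): compile a model with the elastic scheduler
# and tune parallelism via num_streams. "model.onnx" and the values shown
# are placeholders, not from the release notes.
from deepsparse import Engine, Scheduler

engine = Engine(
    model="model.onnx",           # path or SparseZoo stub of an ONNX model
    batch_size=1,
    num_streams=4,                # number of requests processed in parallel
    scheduler=Scheduler.elastic,  # opt in to the Elastic Scheduler
)

# inputs would be a list of numpy arrays matching the model's input shapes:
# outputs = engine.run(inputs)
```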
Changes:
- Click added as a root dependency; it is now the preferred route for CLI invocation and argument management.
Performance:
- Inference performance has been improved for unstructured sparse quantized models on AVX2 and AVX-512 systems that do not support VNNI instructions, including speedups of up to 20% on BERT and 45% on ResNet-50.
Resolved Issues:
- Potential crashes eliminated when a layer operates on a dataset larger than 2GB.
- Assertion error addressed for Reduce operations where the reduction axis is of length 1.
- Rare assertion failure addressed related to Tensor Columns.
- When running the DeepSparse Engine on a system with a non-uniform system topology, model compilation now properly terminates.
Known Issues:
- In rare cases, the engine may crash with an assertion failure during model compilation for a convolution with a 1x1 kernel with 2x2 convolution strides; hotfix forthcoming.
- The engine will crash with an assertion failure when setting the `num_streams` parameter to fewer than the number of NUMA nodes; hotfix forthcoming.
- In rare cases, the engine may enter an infinite loop when an operation has multiple inputs coming from the same source; hotfix forthcoming.
DeepSparse v0.12.2 Patch Release
This is a patch release for 0.12.0 that contains the following changes:
- Protobuf is restricted to version < 4.0 as the newer version breaks ONNX.
DeepSparse v0.12.1 Patch Release
This is a patch release for 0.12.0 that contains the following changes:
- Improper label mapping no longer causes crashes in validation flows within DeepSparse transformers.
- DeepSparse Server now exposes proper routes for SageMaker.
- Fixed a dependency issue where DeepSparse Server installed an old version of a library, causing crashes in some use cases.
DeepSparse v0.12.0
New Features:
Documentation:
- SparseServer.UI: a Streamlit app that deploys the DeepSparse Server to explore the inference performance of BERT on the question-answering task.
- DeepSparse Server README: `deepsparse.server` capabilities, including single-model and multi-model inferencing.
- Twitter NLP Inference Examples added.
Changes:
Performance:
- Speedup for large batch sizes when using sync mode on AMD EPYC processors.
- AVX2 improvements:
  - Up to 40% speedup out of the box for dense quantized models.
  - Up to 20% speedup for pruned quantized BERT, ResNet-50, and MobileNet.
  - Speedup from sparsity realized for ConvInteger operators.
- Model compilation time decreased on systems with many cores.
- Multi-stream Scheduler: certain computations that were executed during runtime are now precomputed.
- Hugging Face Transformers integration updated to latest state from upstream main branch.
Documentation:
- DeepSparse README: references to `deepsparse.server`, `deepsparse.benchmark`, and Transformer pipelines.
- DeepSparse Benchmark README: highlights of the `deepsparse.benchmark` CLI command.
- Transformers 🤗 Inference Pipelines: examples included on how to run inference via Python for several NLP tasks (see the sketch below).
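As an illustration of the pipeline examples these docs describe, here is a minimal sketch of question answering, assuming the `Pipeline.create` entry point from later releases; the model stub shown is a placeholder.

```python
# Minimal sketch (assumed API): question answering via a DeepSparse pipeline.
# The model stub is a placeholder; the entry point may differ in 0.12-era code.
from deepsparse import Pipeline

qa = Pipeline.create(
    task="question_answering",
    model_path="zoo:some/question-answering-stub",  # placeholder stub
)

print(qa(question="What runs the model?", context="DeepSparse runs the model."))
```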
Resolved Issues:
- When running quantized BERT with a sequence length not divisible by 4, the DeepSparse Engine will no longer disable optimizations and see very poor performance.
- Users executing `arch.bin` now receive a correct architecture profile of their system.
Known Issues:
- When running the DeepSparse Engine on a system with a non-uniform system topology, for example, an AMD EPYC processor where some cores per core-complex (CCX) have been disabled, model compilation will never terminate. A workaround is to set the environment variable `NM_SERIAL_UNIT_GENERATION=1`.
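A minimal sketch of applying the workaround from Python, assuming the variable must be visible before deepsparse loads (exporting it in the shell works equally well):

```python
import os

# Workaround from the release notes: force serial unit generation.
# Set before importing deepsparse so the engine sees it at load time (assumed).
os.environ["NM_SERIAL_UNIT_GENERATION"] = "1"

import deepsparse  # noqa: E402
```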
DeepSparse v0.11.2 Patch Release
This is a patch release for 0.11.0 that contains the following changes:
- Fixed an assertion error that would occur when using `deepsparse.benchmark` on AMD machines with the argument `-pin none`.
Known Issues:
- When running quantized BERT with a sequence length not divisible by 4, the DeepSparse Engine will disable optimizations and see very poor performance.
DeepSparse v0.11.1 Patch Release
This is a patch release for 0.11.0 that contains the following changes:
- When running NanoDet-Plus-m, the DeepSparse Engine will no longer fail with an assertion (See #279).
- The DeepSparse Engine now respects the CPU affinity set by the calling thread. This is essential for the new command-line (CLI) tool `multi-process-benchmark.py` to function correctly. The script allows users to measure performance using multiple separate processes in parallel.
- Fixed a performance regression on BERT batch size 1, sequence length 128 models.
DeepSparse v0.11.0
New Features:
- High-performance sparse quantized convolutional neural networks supported on AVX2 systems.
- CCX detection added to the DeepSparse Engine for AMD systems.
- `deepsparse.server` integration and CLIs added with Hugging Face transformers pipelines support.
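As a rough illustration of the server integration, the sketch below sends a request to an already-running `deepsparse.server` instance; the port, route, and payload shape are assumptions based on the server's defaults and may differ by version.

```python
# Hypothetical client sketch: assumes a deepsparse.server instance serving a
# question-answering pipeline on the default port 5543 with a /predict route.
# Route, port, and payload shape are assumptions, not from the release notes.
import requests

response = requests.post(
    "http://localhost:5543/predict",
    json={
        "question": "What does DeepSparse Server serve?",
        "context": "DeepSparse Server serves transformers pipelines.",
    },
)
print(response.json())
```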
Changes:
- Performance improvements made for:
  - FP32 sparse BERT models
  - batch size 1 networks
  - quantized sparse BERT models
  - pooling operations
Resolved Issues:
- When hyperthreads are disabled in the BIOS, core/socket information on certain systems can now be detected.
- Hugging Face transformers validation flows for QQP now give correct accuracy metrics.
- PyTorch downloads for YOLO model stubs are now supported.
Known Issues:
- When running NanoDet-Plus-m, the DeepSparse Engine will fail with an assertion (See #279). A hotfix is being pursued.
DeepSparse v0.10.0
New Features:
- Quantization support enabled on AVX2 instruction set for GEMM and elementwise operations.
- `NM_SPOOF_ARCH` environment variable added for testing different architectural configurations.
- Elastic scheduler implemented as an alternative to the single-stream or multi-stream schedulers.
- `deepsparse.benchmark` application is now usable from the command line after installing deepsparse, to simplify benchmarking.
- `deepsparse.server` CLI and API added with transformers support to make serving models like BERT with pipelines easy.
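A hedged sketch of using `NM_SPOOF_ARCH` for testing; the value shown and the `cpu_architecture` helper are assumptions rather than documented behavior:

```python
# Hypothetical sketch: spoof a different architecture before deepsparse loads,
# then inspect what the engine detects. "avx2" is a placeholder value, and
# the cpu_architecture() helper is an assumption; consult the engine docs.
import os
os.environ["NM_SPOOF_ARCH"] = "avx2"

from deepsparse.cpu import cpu_architecture  # noqa: E402
print(cpu_architecture())
```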
Changes:
- More robust architecture detection added to help resolve CPU topology, such as when running inside a virtual machine.
- Tensor columns improved, leading to significant speedups from 5 to 20% in pruned YOLO (larger batch size), BERT (smaller batch size), MobileNet, and ResNet models.
- Sparse quantized network performance improved on machines that do not support VNNI instructions.
- Performance improved for dense BERT with large batch sizes.
Resolved Issues:
- Possible crashes eliminated for:
  - Pooling operations with small image sizes
  - Rarely, networks containing convolution or GEMM operations
  - Some models with many residual connections
Known Issues:
- None
DeepSparse v0.9.1 Patch Release
This is a patch release for 0.9.0 that contains the following changes:
- YOLACT models and other models with constant outputs no longer fail with a mismatched shape error on multi-socket systems with batch sizes greater than 1. However, a corner case exists where a model with a constant output whose first dimension is equal to the (nonunit) batch size will fail.
- GEMM operations where the number of columns of the output matrix is not divisible by 16 will no longer fail with an assertion error.
- Broadcasted inputs to elementwise operators no longer fail with an assertion error.
- Int64 multiplications no longer fail with an illegal instruction on AVX2.
DeepSparse v0.9.0
New Features:
- Optimized support added for resize operators with the `pytorch_half_pixel` and `align_corners` coordinate transformations.
- Up-to-date version check implemented for DeepSparse.
- YOLACT and DeepSparse integration added in examples/dbolya-yolact.
Changes:
- The parameter for the number of sockets to use has been removed -- the Python interface now takes only the number of cores as a parameter (see the sketch after this list).
- Tensor columns have been optimized. Users will see performance improvements specifically for pruned quantized BERT models:
  - The softmax operator can now take advantage of tensor columns.
  - Inference batch sizes that are not divisible by 16 are now supported.
- Various performance improvements made to:
  - certain networks, such as YOLOv5, on AVX2 systems.
  - dense convolutions on some AVX-512 systems.
- API docs recompiled.
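A minimal sketch of the updated interface, assuming the long-standing `compile_model` entry point; the model path and core count are placeholders:

```python
# Minimal sketch (assumed entry point): the num_sockets parameter is gone;
# only the number of cores is passed. "model.onnx" and num_cores=4 are
# placeholders, not values from the release notes.
from deepsparse import compile_model

engine = compile_model("model.onnx", batch_size=1, num_cores=4)
```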
Resolved Issues:
- In rare circumstances, users could have experienced an assertion error when executing networks with depthwise convolutions.
Known Issues:
- YOLACT models fail with a mismatched shape error on multi-socket systems with batch size greater than 1. This issue applies to any model with a constant output.
- In some circumstances, GEMM operations where the number of columns of the output matrix is not divisible by 16 may fail with an assertion error.