This repository was archived by the owner on Jun 3, 2025. It is now read-only.

Releases: neuralmagic/deepsparse

DeepSparse v1.0.0

01 Jul 16:34

New Features:

  • Support added for running multiple models with the same engine when using the Elastic Scheduler.
  • When using the Elastic Scheduler, the caller can now use the num_streams argument to tune the number of requests that are processed in parallel.
  • Pipeline and annotation support added and generalized for transformers, yolov5, and torchvision.
  • Documentation additions made for transformers, yolov5, torchvision, and serving that focus on model deployment for the given integrations.
  • AWS SageMaker example created.
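As a minimal sketch, tuning parallel request handling with the Elastic Scheduler might look like the following. The num_streams argument comes from these notes; the surrounding compile_model signature and the "elastic" keyword value are assumptions, so the engine call itself is left commented.

```python
# Sketch of tuning request parallelism with the Elastic Scheduler.
# "num_streams" is the argument named in the release notes; the other
# keyword names and the "elastic" value are assumptions.
compile_kwargs = {
    "batch_size": 1,
    "scheduler": "elastic",  # select the Elastic Scheduler
    "num_streams": 4,        # number of requests processed in parallel
}

# With deepsparse installed and a local model.onnx, compilation would be
# roughly:
#   import deepsparse
#   engine = deepsparse.compile_model("model.onnx", **compile_kwargs)

print(compile_kwargs["num_streams"])
```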

Changes:

  • Click added as a root dependency; it is now the preferred route for CLI invocation and argument management.

Performance:

  • Inference performance has been improved for unstructured sparse quantized models on AVX2 and AVX-512 systems that do not support VNNI instructions, including speedups of up to 20% on BERT and 45% on ResNet-50.

Resolved Issues:

  • Potential crashes no longer occur when a layer operates on a dataset larger than 2GB.
  • Assertion error addressed for Reduce operations where the reduction axis is of length 1.
  • Rare assertion failure addressed related to Tensor Columns.
  • When running the DeepSparse Engine on a system with a non-uniform system topology, model compilation now properly terminates.

Known Issues:

  • In rare cases, the engine may crash with an assertion failure during model compilation for a convolution with a 1x1 kernel and 2x2 strides; hotfix forthcoming.
  • The engine will crash with an assertion failure when setting the num_streams parameter to fewer than the number of NUMA nodes; hotfix forthcoming.
  • In rare cases, the engine may enter an infinite loop when an operation has multiple inputs coming from the same source; hotfix forthcoming.

DeepSparse v0.12.2 Patch Release

02 Jun 14:38
73ffe09

This is a patch release for 0.12.0 that contains the following changes:

  • Protobuf is restricted to version < 4.0 as the newer version breaks ONNX.
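Expressed as a standard pip requirement specifier, the pin is:

```
protobuf<4.0
```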

DeepSparse v0.12.1 Patch Release

05 May 18:25
b030881

This is a patch release for 0.12.0 that contains the following changes:

  • Improper label mapping no longer causes crashes in validation flows within DeepSparse transformers.
  • DeepSparse Server now exposes proper routes for SageMaker.
  • A DeepSparse Server dependency issue no longer installs an old version of a library that caused crashes in some use cases.

DeepSparse v0.12.0

22 Apr 13:44
01a427a

Performance:

  • Speedup for large batch sizes when using sync mode on AMD EPYC processors.
  • AVX2 improvements:
    • Up to 40% speedup out of the box for dense quantized models.
    • Up to 20% speedup for pruned quantized BERT, ResNet-50 and MobileNet.
  • Speedup from sparsity realized for ConvInteger operators.
  • Model compilation time decreased on systems with many cores.
  • Multi-stream Scheduler: certain computations that were executed during runtime are now precomputed.
  • Hugging Face Transformers integration updated to latest state from upstream main branch.


Resolved Issues:

  • When running quantized BERT with a sequence length not divisible by 4, the DeepSparse Engine no longer disables optimizations, which previously resulted in very poor performance.
  • Users executing arch.bin now receive a correct architecture profile of their system.

Known Issues:

  • When running the DeepSparse engine on a system with a nonuniform system topology, for example, an AMD EPYC processor where some cores per core-complex (CCX) have been disabled, model compilation will never terminate. A workaround is to set the environment variable NM_SERIAL_UNIT_GENERATION=1.
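A minimal sketch of applying that workaround from Python, assuming the variable must be set before the engine is imported and compiled (exporting it in the shell before launching works equally well):

```python
import os

# Workaround from the release notes: force serial unit generation so
# model compilation terminates on non-uniform system topologies.
os.environ["NM_SERIAL_UNIT_GENERATION"] = "1"

# The engine would then be compiled as usual, e.g. (deepsparse assumed
# installed; "model.onnx" is a placeholder path):
#   import deepsparse
#   engine = deepsparse.compile_model("model.onnx", batch_size=1)

print(os.environ["NM_SERIAL_UNIT_GENERATION"])
```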

DeepSparse v0.11.2 Patch Release

23 Mar 19:57
4bfde08

This is a patch release for 0.11.0 that contains the following changes:

  • Fixed an assertion error that would occur when using deepsparse.benchmark on AMD machines with the argument -pin none.

Known Issues:

  • When running quantized BERT with a sequence length not divisible by 4, the DeepSparse Engine will disable optimizations and see very poor performance.

DeepSparse v0.11.1 Patch Release

21 Mar 13:56
d16ca23

This is a patch release for 0.11.0 that contains the following changes:

  • When running NanoDet-Plus-m, the DeepSparse Engine will no longer fail with an assertion (See #279).
  • The DeepSparse Engine now respects the CPU affinity set by the calling thread. This is essential for the new command-line (CLI) tool multi-process-benchmark.py to function correctly; the script allows users to measure performance using multiple separate processes in parallel.
  • Fixed a performance regression on BERT batch size 1 sequence length 128 models.
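Because the engine inherits the caller's affinity, a launcher process can pin itself to a core subset before creating an engine. A stdlib-only sketch (os.sched_setaffinity is Linux-only, hence the guard; core 0 is an arbitrary illustrative choice):

```python
import os

# Pin this process to core 0 before any engine is created; per the
# release notes, the engine respects the caller's CPU affinity.
if hasattr(os, "sched_setaffinity"):
    os.sched_setaffinity(0, {0})
    affinity = os.sched_getaffinity(0)
else:  # e.g. macOS/Windows: no affinity API in the stdlib
    affinity = set()

print(sorted(affinity))
```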

DeepSparse v0.11.0

11 Mar 18:31
46810d4

New Features:

  • High-performance sparse quantized convolutional neural networks supported on AVX2 systems.
  • CCX detection added to the DeepSparse Engine for AMD systems.
  • deepsparse.server integration and CLIs added with Hugging Face transformers pipelines support.

Changes:

Performance improvements made for:

  • FP32 sparse BERT models
  • batch size 1 networks
  • quantized sparse BERT models
  • pooling operations

Resolved Issues:

  • When hyperthreads are disabled in the BIOS, core/socket information on certain systems can now be detected.
  • Hugging Face transformers validation flows for QQP now give correct accuracy metrics.
  • PyTorch downloads for YOLO model stubs are now supported.

Known Issues:

  • When running NanoDet-Plus-m, the DeepSparse Engine will fail with an assertion (See #279). A hotfix is being pursued.

DeepSparse v0.10.0

03 Feb 16:40
b27fbda

New Features:

  • Quantization support enabled on AVX2 instruction set for GEMM and elementwise operations.
  • NM_SPOOF_ARCH environment variable added for testing different architectural configurations.
  • Elastic scheduler implemented as an alternative to the single-stream or multi-stream schedulers.
  • deepsparse.benchmark application is now usable from the command line after installing deepsparse, simplifying benchmarking.
  • deepsparse.server CLI and API added with transformers support to make serving models like BERT with pipelines easy.
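As an illustrative sketch, NM_SPOOF_ARCH can be set before the engine is imported to test a different architectural configuration. The value "avx2" below is an assumption; these notes do not list the accepted values.

```python
import os

# NM_SPOOF_ARCH makes the engine behave as a different architectural
# configuration for testing. "avx2" is illustrative only; consult the
# DeepSparse documentation for the actual accepted values.
os.environ["NM_SPOOF_ARCH"] = "avx2"

# Import deepsparse after this point so the spoofed value takes effect.
print(os.environ["NM_SPOOF_ARCH"])
```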

Changes:

  • More robust architecture detection added to help resolve CPU topology, such as when running inside a virtual machine.
  • Tensor columns improved, leading to significant speedups of 5-20% in pruned YOLO (larger batch sizes), BERT (smaller batch sizes), MobileNet, and ResNet models.
  • Sparse quantized network performance improved on machines that do not support VNNI instructions.
  • Performance improved for dense BERT with large batch sizes.

Resolved Issues:

  • Possible crashes eliminated for:
    • Pooling operations with small image sizes
    • Rarely, networks containing convolution or GEMM operations
    • Some models with many residual connections

Known Issues:

  • None

DeepSparse v0.9.1 Patch Release

14 Dec 22:21
ed22c2c

This is a patch release for 0.9.0 that contains the following changes:

  1. YOLACT models and other models with constant outputs no longer fail with a mismatched shape error on multi-socket systems with batch sizes greater than 1. However, a corner case exists where a model with a constant output whose first dimension is equal to the (nonunit) batch size will fail.
  2. GEMM operations where the number of columns of the output matrix is not divisible by 16 will no longer fail with an assertion error.
  3. Broadcasted inputs to elementwise operators no longer fail with an assertion error.
  4. Int64 multiplications no longer fail with an illegal instruction on AVX2.

DeepSparse v0.9.0

01 Dec 16:05
74558ca

New Features:

  • Optimized support for resize operators with the pytorch_half_pixel and align_corners coordinate transformations.
  • Up-to-date version check implemented for DeepSparse.
  • YOLACT and DeepSparse integration added in examples/dbolya-yolact.

Changes:

  • The parameter for the number of sockets to use has been removed -- the Python interface now takes only the number of cores as a parameter.
  • Tensor columns have been optimized. Users will see performance improvements specifically for pruned quantized BERT models:
    • The softmax operator can now take advantage of tensor columns.
    • Inference batch sizes that are not divisible by 16 are now supported.
  • Various performance improvements made to:
    • certain networks, such as YOLOv5, on AVX2 systems.
    • dense convolutions on some AVX-512 systems.
  • API docs recompiled.

Resolved Issues:

  • In rare circumstances, users could have experienced an assertion error when executing networks with depthwise convolutions.

Known Issues:

  • YOLACT models fail with a mismatched shape error on multi-socket systems with batch size greater than 1. This issue applies to any model with a constant output.
  • In some circumstances, GEMM operations where the number of columns of the output matrix is not divisible by 16 may fail with an assertion error.