Torch-TensorRT v2.4.0
C++ runtime support on Windows, enhanced dynamic shape support in converters, PyTorch 2.4, CUDA 12.4, TensorRT 10.1, Python 3.12
Torch-TensorRT 2.4.0 targets PyTorch 2.4, CUDA 12.4, and TensorRT 10.1. Builds for CUDA 11.8/12.1 are available via the PyTorch package index: https://download.pytorch.org/whl/cu118, https://download.pytorch.org/whl/cu121.
This version introduces official support for the C++ runtime on the Windows platform. Windows support is limited to the dynamo frontend, but covers both AOT and JIT workflows; users can now utilize both the Python and C++ runtimes on Windows. Additionally, this release expands converter coverage to all Core ATen operators except torch.nonzero, and significantly increases dynamic shape support across converters. This release is also the first to support Python 3.12.
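As a sketch, installing one of the alternate CUDA builds could look like the following (the exact package name and version pin shown here are assumptions; the index URLs are the ones listed above):

```shell
# Install Torch-TensorRT 2.4.0 against the CUDA 12.1 build of PyTorch;
# swap cu121 for cu118 to target CUDA 11.8 instead
pip install torch-tensorrt==2.4.0 --extra-index-url https://download.pytorch.org/whl/cu121
```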
Full Windows Support
In this release we introduce both C++ and Python runtime support on Windows. Users can now directly optimize PyTorch models with TensorRT on Windows, with no code changes. The C++ runtime is the default; the Python runtime can be enabled by specifying use_python_runtime=True.
import torch
import torch_tensorrt
import torchvision.models as models
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval().to("cuda")
input = torch.randn((1, 3, 224, 224)).to("cuda")
trt_mod = torch_tensorrt.compile(model, ir="dynamo", inputs=[input])  # pass use_python_runtime=True to opt into the Python runtime
trt_mod(input)

Enhanced Op support in Converters
Converter support now covers nearly 100% of Core ATen. At this point, fallback to PyTorch execution is due either to specific limitations of individual converters or to some combination of user compiler settings (e.g. torch_executed_ops, dynamic shape). This release also expands the number of operators that support dynamic shape. dryrun will provide specific information on support for your model and settings.
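For example, a minimal coverage check might look like the following (a sketch, assuming a CUDA-capable GPU and that dryrun is accepted as a compile option by the dynamo frontend, per the description above):

```python
import torch
import torch_tensorrt
import torchvision.models as models

model = models.resnet18().eval().to("cuda")
inputs = [torch.randn((1, 3, 224, 224)).to("cuda")]

# dryrun=True analyzes the graph and reports which ops would run in
# TensorRT and which would fall back to PyTorch, without building engines
torch_tensorrt.compile(model, ir="dynamo", inputs=inputs, dryrun=True)
```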
What's Changed
- fix: FakeTensors appearing in `get_attr` calls by @gs-olive in #2669
- feat: support adaptive_avg_pool1d dynamo converter by @zewenli98 in #2614
- fix: Add cmake missing source file ref for core_lowering.passes by @Arktische in #2672
- ci: Torch nightly version upgrade to `2.4.0` by @gs-olive in #2704
- Add support for `aten.pixel_unshuffle` dynamo converter by @HolyWu in #2696
- feat: support aten.atan2 converter by @chohk88 in #2689
- feat: support aten.index_select converter by @chohk88 in #2710
- feat: support aten.isnan converter by @chohk88 in #2711
- feat: support adaptive avg pool 2d and 3d dynamo converters by @zewenli98 in #2632
- feat: support aten.expm1 converter by @chohk88 in #2714
- fix: Add dependencies to Docker container for `apt` versioning TRT by @gs-olive in #2746
- fix: Missing parameters in compiler settings by @gs-olive in #2749
- fix: param bug in `test_binary_ops_aten` by @zewenli98 in #2733
- aten::empty_like by @apbose in #2654
- empty_permute decomposition by @apbose in #2698
- Removing grid lowering by @apbose in #2686
- Selectively enable different frontends by @narendasan in #2693
- chore(deps): bump transformers from 4.33.2 to 4.36.0 in /tools/perf by @dependabot in #2555
- Fix upsample converter not properly registered by @HolyWu in #2683
- feat: TS Add converter support for aten::grid_sampler by @mfeliz-cruise in #2717
- fix: Bump `torchvision` version by @gs-olive in #2770
- fix: convert_module_to_trt_engine by @zewenli98 in #2728
- chore: cherry pick of save API by @peri044 in #2719
- chore: Upgrade TensorRT version to TRT 10 EA (#2699) by @peri044 in #2774
- Fix minor grammatical corrections by @aakashapoorv in #2779
- feat: cherry-pick of Implement symbolic shape propagation, sym_size converter by @peri044 in #2751
- feat: cherry-pick of torch.compile dynamic shapes by @peri044 in #2750
- chore: bump deps for default workspace file by @narendasan in #2786
- fix: Point infra branch to main by @gs-olive in #2785
- "empty_like" decomposition test correction by @apbose in #2784
- chore: Bump versions by @narendasan in #2787
- fix: refactor layer norm converter with INormalization Layer by @zewenli98 in #2755
- TRT-10 GA Support for main branch by @zewenli98 in #2781
- chore(//tests): Update tests to use assertEqual by @narendasan in #2800
- feat: Add support for `is_causal` argument in attention by @gs-olive in #2780
- feat: Adding support for native int64 by @narendasan in #2789
- chore: small mypy issue by @narendasan in #2803
- Rand converter - evaluator by @apbose in #2580
- cherry-pick: Python Runtime Windows Builds on TRT 10 (#2764) by @gs-olive in #2776
- feat: support 1d ITensor offsets for embedding_bag converter by @zewenli98 in #2677
- chore(deps): bump transformers from 4.36.0 to 4.38.0 in /tools/perf by @dependabot in #2766
- fix: a bug in func run_test_compare_tensor_attributes_only by @zewenli98 in #2809
- Fix ModuleNotFoundError in ptq by @HolyWu in #2814
- docs: Example on how to use custom kernels in Torch-TensorRT by @narendasan in #2812
- typo fix in doc on saving models by @laikhtewari in #2818
- chore: Remove CUDNN dependencies by @zewenli98 in #2804
- fix: bug in elementwise base for static inputs by @zewenli98 in #2819
- Use environment for docgen by @atalman in #2826
- tool: Opset coverage notebook by @narendasan in #2831
- ci: Add release flag for nightly build tag by @gs-olive in #2821
- [doc] Update options documentation for torch.compile by @lanluo-nvidia in #2834
- feat(//py/torch_tensorrt/dynamo): Support for BF16 by @narendasan in #2833
- feat: data parallel inference examples by @bowang007 in #2805
- fix: bugs in TRT 10 upgrade by @zewenli98 in #2832
- feat: support aten._cdist_forward converter by @chohk88 in #2726
- chore: cherry pick of #2805 by @bowang007 in #2851
- feat: Add support for multi-device safe mode in C++ by @gs-olive in #2824
- feat: support aten.log1p converter by @chohk88 in #2823
- feat: support aten.as_strided converter by @chohk88 in #2735
- fix: Fix deconv kernel channel num_output_maps where wts are ITensor by @andi4191 in #2678
- Aten scatter converter by @apbose in #2664
- fix user_guide and tutorial docs by @yoosful in #2854
- chore: Make from and to methods use the same TRT API by @narendasan in #2858
- add aten.topk implementation by @lanluo-nvidia in #2841
- feat: support aten.atan2.out converter by @chohk88 in #2829
- chore: update docker, refactor CI TRT dep to main by @peri044 in #2793
- feat: Cherry pick of Add validators for dynamic shapes in converter registration by @peri044 in #2849
- feat: support aten.diagonal converter by @chohk88 in #2856
- Remove ops from decompositions where converters exist by @HolyWu in #2681
- slice_scatter decomposition by @apbose in #2519
- select_scatter decomp by @apbose in #2515
- manylinux wheel file build update for TensorRT-10.0.1 by @lanluo-nvidia in #2868
- replace itemset due to numpy version 2.0 removed itemset api by @lanluo-nvidia in #2879
- chore: cherry-pick of DS feature by @peri044 in #2857
- feat: TS Add converter support for aten::flip by @mfeliz-cruise in #2722
- ptq test error correction by @apbose in #2860
- feat: Add dynamic shape support for sub by @keehyuna in #2888
- feat: dynamic shapes support for sqrt and copy by @chohk88 in #2889
- add dynamic shape support for aten.ops.gt and aten.ops.ge by @lanluo-nvidia in #2883
- chore: cherry-pick FP8 by @peri044 in #2892
- add dynamic shape support for sin/cos/cat by @lanluo-nvidia in #2887
- Cancel in-progress ci build when a new commit is pushed by @lanluo-nvidia in #2903
- readme by @laikhtewari in #2864
- Only trigger doc gen if it is not a pytorchbot commit by @lanluo-nvidia in #2909
- fix: Handle dynamic shapes in where ops by @keehyuna in #2853
- chore: Dynamic support for split (#2871) into main by @peri044 in #2914
- feat: C++ runtime on Windows by @HolyWu in #2806
- chore: cherry pick of #2709 by @peri044 in #2850
- Add dynamic shape support for layer_norm/native_group_norm/group_norm by @lanluo-nvidia in #2908
- feat: dynamic shapes support for neg ops by @keehyuna in #2878
- empty_stride decomposition by @apbose in #2859
- empty_memory_format evaluator by @apbose in #2745
- gather converter by @apbose in #2905
- feat: Win/Linux Dual Compatible `WORKSPACE` + Upgrade CUDA + Upgrade PyT by @gs-olive in #2907
- chore: add dynamic shapes section in the resnet tutorial by @peri044 in #2904
- fix: Remove build artifact by @gs-olive in #2924
- feat: Use a global timing cache and add a save option by @peri044 in #2898
- chore: fix ValueRanges computation in symbolic nodes by @peri044 in #2918
- scatter CI failures by @apbose in #2925
- chore: Update layer_norm converter to use INormalizationLayer by @mfeliz-cruise in #2509
- Add dynamic shape support for leaky_relu/elu/hard_sigmoid/softplus by @lanluo-nvidia in #2927
- feat: Improve logging throughout the Dynamo path by @gs-olive in #2405
- fix unsqueeze cannot work on more than 1 dynamic_shape dimensions by @lanluo-nvidia in #2933
- feat: support `native_dropout` dynamo converter by @zewenli98 in #2931
- feat: support aten index_put converter for accumulate=False by @chohk88 in #2880
- feat: support aten.resize_ converter by @chohk88 in #2874
- fix the docker build failure on main by @lanluo-nvidia in #2942
- feat: Add Branches to Docker Build File by @gs-olive in #2935
- add dynamic shape support for amax/amin/max/min/prod/sum by @lanluo-nvidia in #2943
- fix: bug in vgg16_fp8_ptq example by @zewenli98 in #2950
- Fixed layernorm when weight and bias is None in Stable Diffusion 3 by @cehongwang in #2936
- chore: dynamic shape support for rsqrt/erf ops by @keehyuna in #2929
- feat: dynamic shape support for tan, sinh, cosh, asin and acos by @chohk88 in #2941
- fix: Repair integer inputs in dynamic shape cases by @gs-olive in #2876
- Update PYTORCH to 2.4 by @lanluo-nvidia in #2953
- Automate release artifacts build: usage pytorch cxx11 builder base image by @lanluo-nvidia in #2988
- chore: cherrypick of #2855 by @zewenli98 in #3027
- cherry pick 2740 to release2.4 branch. by @lanluo-nvidia in #3033
- cherry pick from 3008 to release/2.4 by @lanluo-nvidia in #3035
- assertEquals is deprecated in TestCase in Python 3.12 by @lanluo-nvidia in #3038
- fix the artifacts name issue by @lanluo-nvidia in #3041
New Contributors
- @Arktische made their first contribution in #2672
- @aakashapoorv made their first contribution in #2779
- @atalman made their first contribution in #2826
- @yoosful made their first contribution in #2854
Full Changelog: v2.3.0...v2.4.0