
Commit 69fd560

[ROCm] update release notes (#82)
1 parent d286a35 commit 69fd560

10 files changed (+65, -104 lines)

2.9.0/done/result_jit.md

Lines changed: 0 additions & 1 deletion
@@ -41,7 +41,6 @@ The categories below are as follows:
 - [BE][8/16] fix typos in torch/ (torch/csrc/jit/) ([#156318](https://github.com/pytorch/pytorch/pull/156318))
 - [BE][10/16] fix typos in torch/ (torch/csrc/jit/) ([#156320](https://github.com/pytorch/pytorch/pull/156320))
 - [nativert] Add OSS version of ModelRunner ([#159268](https://github.com/pytorch/pytorch/pull/159268))
-- [ROCm] Fix resource_strings.h ([#159996](https://github.com/pytorch/pytorch/pull/159996))
 - added stubs for jit tree views ([#156504](https://github.com/pytorch/pytorch/pull/156504))
 - Remove ts to export retracer ([#156857](https://github.com/pytorch/pytorch/pull/156857))
 - [BE][12/16] fix typos in torch/ ([#156602](https://github.com/pytorch/pytorch/pull/156602))

2.9.0/done/result_rocm.md

Lines changed: 65 additions & 0 deletions
@@ -0,0 +1,65 @@
+
+# Release Notes worksheet rocm
+
+The main goal of this process is to rephrase all the commit messages below to make them **clear and easy to read** by the end user. You should follow the following instructions to do so:
+
+* **Please clean up and format commit titles to be readable by the general PyTorch user.** Make sure you're [following the guidance here](https://docs.google.com/document/d/14OmgGBr1w6gl1VO47GGGdwrIaUNr92DFhQbY_NEk8mQ/edit)! Your resulting notes must be consistent and easy to read.
+* Please sort commits into the following categories (you should not rename the categories!), I tried to pre-sort these to ease your work, feel free to move commits around if the current categorization is not good.
+* Anything that is not public facing needs to be removed.
+* If anything is miscategorized/belongs to another domain, move it to `miscategorized.md`.
+* Please scan through `miscategorized.md` and handle any commits that belong within your domain according to these instructions.
+* We place a lot of emphasis on the “BC-breaking” and “deprecation” sections. Those should be where the most effort goes in. The “improvements” and “bug fixes” for Python API should be nice as well.
+* Once you are finished, move this very file from `todo/` to `done/` and submit a pull request.
+
+The categories below are as follows:
+
+* BC breaking: All commits that are BC-breaking. These are the most important commits. If any pre-sorted commit is actually BC-breaking, do move it to this section. Each commit should contain a paragraph explaining the rational behind the change as well as an example for how to update user code [BC-Guidelines](https://docs.google.com/document/d/14OmgGBr1w6gl1VO47GGGdwrIaUNr92DFhQbY_NEk8mQ/edit#heading=h.a9htwgvvec1m).
+* Deprecations: All commits introducing deprecation. Each commit should include a small example explaining what should be done to update user code.
+* new_features: All commits introducing a new feature (new functions, new submodule, new supported platform etc)
+* improvements: All commits providing improvements to existing feature should be here (new backend for a function, new argument, better numerical stability)
+* bug fixes: All commits that fix bugs and behaviors that do not match the documentation
+* performance: All commits that are added mainly for performance (we separate this from improvements above to make it easier for users to look for it)
+* documentation: All commits that add/update documentation
+* Developers: All commits that are not end-user facing but still impact people that compile from source, develop into pytorch, extend pytorch, etc
+* not user facing: All commits that are not public end-user facing and hence should be dropped from the release notes
+
+## rocm
+### bc breaking
+### deprecation
+### new features
+- OCP Micro-scaling Format (mx-fp8/mx-fp4) Support ([#151360](https://github.com/pytorch/pytorch/pull/151360))
+- Support experimental CU carveout torch._C._set_sm_carveout_experimental() ([#149466](https://github.com/pytorch/pytorch/pull/149466))
+- Add FP8 rowwise support to _scaled_grouped_mm ([#159075](https://github.com/pytorch/pytorch/pull/159075))
+### improvements
+- Additional hipify mappings ([#158056](https://github.com/pytorch/pytorch/pull/158056), [#158352](https://github.com/pytorch/pytorch/pull/158352), [#161992](https://github.com/pytorch/pytorch/pull/161992))
+- composable_kernel (CK) backend user interface refactored to improve user experience ([#152951](https://github.com/pytorch/pytorch/pull/152951))
+- Allow use of rocSOLVER for Cholesky inversion. ([#157154](https://github.com/pytorch/pytorch/pull/157154))
+- AOT Inductor enable gfx950 for max autotune using CK ([#159195](https://github.com/pytorch/pytorch/pull/159195))
+- Add flag torch.backends.miopen.immediate to toggle MIOpen Immediate Mode instead of relying on deterministic=True + benchmark=False ([#158951](https://github.com/pytorch/pytorch/pull/158951))
+- MIOpen convolutions no longer call reshape_ or unexpectedly change memory formats ([#161687](https://github.com/pytorch/pytorch/pull/161687))
+### bug fixes
+- inductor with cudagraph trees hip:0 device error is resolved ([#161221](https://github.com/pytorch/pytorch/pull/161221))
+- ROCm 7.0 BC-breaking change to amdclang compiler `warpSize` no longer constexpr ([#156979](https://github.com/pytorch/pytorch/pull/156979))
+- ROCm 7.0 BC-breaking change to hiprtc needed fix resource_strings.h and jit_utils.cpp ([#159292](https://github.com/pytorch/pytorch/pull/159292), [#159996](https://github.com/pytorch/pytorch/pull/159996))
+- On Windows fix some build failures and support some BLAS calls ([#161981](https://github.com/pytorch/pytorch/pull/161981))
+- On Windows fix undefined symbol linker error after exposing MIOpen symbols ([#156479](https://github.com/pytorch/pytorch/pull/156479))
+- On Windows fix finding ROCm/HIP version ([#156486](https://github.com/pytorch/pytorch/pull/156486))
+- On Windows fix LoadHIP handling of environment variable paths on Windows. ([#159080](https://github.com/pytorch/pytorch/pull/159080))
+- On Windows add hipcc compatibility flags to cpp_extension.py. ([#159790](https://github.com/pytorch/pytorch/pull/159790))
+- Symmetric memory set handle type for ROCm ([#161741](https://github.com/pytorch/pytorch/pull/161741))
+- In SDPA via AOTriton, logsumexp needs scaling back to natural base. ([#156903](https://github.com/pytorch/pytorch/pull/156903))
+- Check stream graph capture status in memcpy_and_sync inline function ([#158165](https://github.com/pytorch/pytorch/pull/158165))
+### performance
+- SDPA now uses AOTriton to 0.11b ([#161754](https://github.com/pytorch/pytorch/pull/161754))
+- hipblaslt is used by default on gfx908 for ROCm >= 6.3 ([#159092](https://github.com/pytorch/pytorch/pull/159092))
+- Enable miopen channels last 3d for conv and batchnorm ([#160529](https://github.com/pytorch/pytorch/pull/160529))
+- Remove extra transposes in NHWC convolutions on MIOpen ([#160435](https://github.com/pytorch/pytorch/pull/160435))
+- Remove extra sync in tensor.item() ([#158486](https://github.com/pytorch/pytorch/pull/158486))
+- Elementwise and reduction kernel perf improvements ([#159430](https://github.com/pytorch/pytorch/pull/159430), [#159652](https://github.com/pytorch/pytorch/pull/159652), [#160444](https://github.com/pytorch/pytorch/pull/160444), [#160466](https://github.com/pytorch/pytorch/pull/160466), [#161054](https://github.com/pytorch/pytorch/pull/161054), [#161180](https://github.com/pytorch/pytorch/pull/161180), [#161181](https://github.com/pytorch/pytorch/pull/161181))
+- Symmetric Memory Performance improvements for two-shot allreduce ([#156746](https://github.com/pytorch/pytorch/pull/156746))
+- Enable build of fbgemm_gpu genai sources for grouped gemm support. ([#160676](https://github.com/pytorch/pytorch/pull/160676))
+### docs
+### devs
+### Untopiced
+### not user facing
+### security
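The bug-fix entry for #156903 in the worksheet above says that logsumexp from SDPA via AOTriton "needs scaling back to natural base". As a minimal sketch of what such a conversion involves, the Python below assumes (this commit does not confirm it) that the kernel evaluates logsumexp in base 2, applying `exp2` to inputs pre-multiplied by log2(e) as many Triton attention kernels do. The function names are hypothetical and this is not AOTriton or PyTorch code. It only shows that a base-2 result must be multiplied by ln(2) to match the natural-log value, because ln(y) = log2(y) * ln(2).

```python
import math


def logsumexp_natural(xs):
    # Reference value: ln(sum_i e^{x_i}), computed stably by subtracting the max.
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def logsumexp_base2(xs):
    # Hypothetical kernel-style computation: scale inputs by log2(e) so that
    # 2**s == e**x, then reduce in base 2. The result is log2(sum_i e^{x_i}).
    scaled = [x * math.log2(math.e) for x in xs]
    m = max(scaled)
    return m + math.log2(sum(2.0 ** (s - m) for s in scaled))


def base2_to_natural(lse2):
    # "Scaling back to natural base": ln(y) = log2(y) * ln(2).
    return lse2 * math.log(2.0)


scores = [0.5, -1.25, 3.0, 2.0]
lse2 = logsumexp_base2(scores)
print(base2_to_natural(lse2), logsumexp_natural(scores))
```

If the assumption holds, skipping the final multiplication would leave the reported logsumexp off by a factor of 1/ln(2), roughly 1.44.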

2.9.0/todo/result_distributed.md

Lines changed: 0 additions & 4 deletions
@@ -52,8 +52,6 @@ The categories below are as follows:
 - [tp] improve parallelize_module API to support more cases ([#157182](https://github.com/pytorch/pytorch/pull/157182))
 - Script for consolidation of sharded safetensor files ([#154743](https://github.com/pytorch/pytorch/pull/154743))
 - HF - consolidate shards of safetensors files to full tensors in finish step ([#156705](https://github.com/pytorch/pytorch/pull/156705))
-- [ROCm][SymmetricMemory] Performance improvements for two-shot allreduce ([#156746](https://github.com/pytorch/pytorch/pull/156746))
-- [ROCm] Remove use of `warpsize` on host-side compilation ([#156979](https://github.com/pytorch/pytorch/pull/156979))
 - [SymmMem] Add NVSHMEM_CHECK macro ([#157174](https://github.com/pytorch/pytorch/pull/157174))
 - [PT] support custom all_gather and reduce_scatter comms ([#155189](https://github.com/pytorch/pytorch/pull/155189))
 - Fix typo: 'Intializes' → 'Initializes' in _distributed_c10d.pyi docst… ([#157455](https://github.com/pytorch/pytorch/pull/157455))
@@ -201,8 +199,6 @@ The categories below are as follows:
 - [SymmMem] Increase minimum nthreads to cover sync needs in NVL72 ([#161983](https://github.com/pytorch/pytorch/pull/161983))
 - [SymmMem] Use non-blocking version of getmem ([#162006](https://github.com/pytorch/pytorch/pull/162006))
 - [c10d] Lessen density of barrier warning ([#162015](https://github.com/pytorch/pytorch/pull/162015))
-- [ROCm/Windows] Fix build failures and support some BLAS calls ([#161981](https://github.com/pytorch/pytorch/pull/161981))
-- [Symmetric memory] set handle type for ROCm ([#161741](https://github.com/pytorch/pytorch/pull/161741))
 - [PP] Add profiling to schedule execution ([#160753](https://github.com/pytorch/pytorch/pull/160753))
 - [DCP][HuggingFace] Add Support for dequantization of SafeTensors checkpoints ([#160682](https://github.com/pytorch/pytorch/pull/160682))
 - Don't require FakeStore to be passed into fake backend ([#162164](https://github.com/pytorch/pytorch/pull/162164))

2.9.0/todo/result_inductor.md

Lines changed: 0 additions & 3 deletions
@@ -62,7 +62,6 @@ The categories below are as follows:
 - Add inputs and outputs in Triton Kernel FX Graph segment ([#158174](https://github.com/pytorch/pytorch/pull/158174))
 - [Optimus] Support decompose mm with dynamic shapes ([#158821](https://github.com/pytorch/pytorch/pull/158821))
 - Enable dynamic shapes for foreach operations by default ([#158985](https://github.com/pytorch/pytorch/pull/158985))
-- [ROCm][CK][Inductor] enable gfx950 for max autotune with CK ([#159195](https://github.com/pytorch/pytorch/pull/159195))
 - [cutlass] rename EVT args within kernels for code caching ([#159243](https://github.com/pytorch/pytorch/pull/159243))
 - All custom operators go through Inductor's graph.call_function ([#159174](https://github.com/pytorch/pytorch/pull/159174))
 - [AOTInductor] Add test for enabling CUDACachingAllocator for AOTInductor's Weight ([#159279](https://github.com/pytorch/pytorch/pull/159279))
@@ -161,7 +160,6 @@ The categories below are as follows:
 - Support caching if joint_custom_pre_pass/joint_custom_post_pass implement the proper interface ([#157990](https://github.com/pytorch/pytorch/pull/157990))
 - Fix is_unaligned usage of statically_known_true ([#157845](https://github.com/pytorch/pytorch/pull/157845))
 - Return false in statically_known_multiple_of if numerator has more than 20 unique symbols ([#157855](https://github.com/pytorch/pytorch/pull/157855))
-- [ROCm][Inductor][CK] update API for gemm-multiD change ([#156122](https://github.com/pytorch/pytorch/pull/156122))
 - Add size_hints to cache key ([#158026](https://github.com/pytorch/pytorch/pull/158026))
 - [Bugfix][Inductor] Fix dependency list merged incorrectly for a custom op with multiple mutated inputs and None return type. ([#157133](https://github.com/pytorch/pytorch/pull/157133))
 - [aot] add format_consts_to_cpp function for further development. ([#157608](https://github.com/pytorch/pytorch/pull/157608))
@@ -294,7 +292,6 @@ The categories below are as follows:
 - Add kernel stack traces tlparse dump (#160608) ([#160779](https://github.com/pytorch/pytorch/pull/160779))
 - [MTIA] add correct name for CFF in tlparse ([#160599](https://github.com/pytorch/pytorch/pull/160599))
 - Add cutedsl template support to compile ([#160108](https://github.com/pytorch/pytorch/pull/160108))
-- [ROCm][inductor][dashboard] Add GPT2ForSequenceClassification to use_larger_multiplier_for_smaller_tensor list ([#160001](https://github.com/pytorch/pytorch/pull/160001))
 - Add signpost to provenance tracking error ([#160755](https://github.com/pytorch/pytorch/pull/160755))
 - [cpp][inductor] Fix crash on bmm when input is used twice. ([#160087](https://github.com/pytorch/pytorch/pull/160087))
 - Fix duplicated kernel name in kernel stack trace tracking ([#160905](https://github.com/pytorch/pytorch/pull/160905))

2.9.0/todo/result_nn_frontend.md

Lines changed: 0 additions & 1 deletion
@@ -41,7 +41,6 @@ The categories below are as follows:
 - Support deterministic upsample trilinear backward ([#154239](https://github.com/pytorch/pytorch/pull/154239))
 - Add device check in `mse_loss` ([#155089](https://github.com/pytorch/pytorch/pull/155089))
 - Fused RMSNorm Housekeeping ([#159317](https://github.com/pytorch/pytorch/pull/159317))
-- [ROCm] revamp miopen integration ([#161687](https://github.com/pytorch/pytorch/pull/161687))
 - NLLLoss: validate target is 0D when input is 1D ([#161412](https://github.com/pytorch/pytorch/pull/161412))
 ### not user facing
 - add test_batchnorn_2D and 3D tests ([#156498](https://github.com/pytorch/pytorch/pull/156498))

2.9.0/todo/result_quantization.md

Lines changed: 0 additions & 1 deletion
@@ -88,6 +88,5 @@ The categories below are as follows:
 - Fix qembeddingbag_byte_prepack_meta to use sym_sizes ([#159985](https://github.com/pytorch/pytorch/pull/159985))
 - Using std::make_unique<T>() instead of unique<T>(new T()) ([#160723](https://github.com/pytorch/pytorch/pull/160723))
 - Using std::vector or c10::SmallVector instead of CArray ([#160959](https://github.com/pytorch/pytorch/pull/160959))
-- [ROCm] fix numpy version detection and adjust fudge_factors for MI355 ([#161429](https://github.com/pytorch/pytorch/pull/161429))
 - Enable more nightly tests on s390x ([#160893](https://github.com/pytorch/pytorch/pull/160893))
 ### security

2.9.0/todo/result_releng.md

Lines changed: 0 additions & 2 deletions
@@ -53,7 +53,6 @@ The categories below are as follows:
 - [BE] bump test dependency `z3-solver` to drop using deprecated `pkg_resources` ([#158905](https://github.com/pytorch/pytorch/pull/158905))
 - Enable MI355X PyTorch CI testing. ([#158889](https://github.com/pytorch/pytorch/pull/158889))
 - Setup TorchBench in Docker (d72ebefe3fa)
-- [ROCm] Update jit_utils.cpp trait modification based on HIP version. ([#159292](https://github.com/pytorch/pytorch/pull/159292))
 - Enable sample nightly PT2 benchmark on B200 ([#158011](https://github.com/pytorch/pytorch/pull/158011))
 - [Take 2] Setup TorchBench in Docker ([#159300](https://github.com/pytorch/pytorch/pull/159300))
 - [BE]: ruff PLC0207 - use maxsplit kwarg ([#160107](https://github.com/pytorch/pytorch/pull/160107))
@@ -127,7 +126,6 @@ The categories below are as follows:
 - [audio hash update] update the pinned audio hash ([#158402](https://github.com/pytorch/pytorch/pull/158402))
 - [BE] Get rid of final mentions of BUILD_SPLIT_CUDA ([#158453](https://github.com/pytorch/pytorch/pull/158453))
 - ci: Update lint workflow to only run on changed files for PRs ([#158518](https://github.com/pytorch/pytorch/pull/158518))
-- [ROCm][CI] Last known good HIP patch ([#158596](https://github.com/pytorch/pytorch/pull/158596))
 - Fix s390x CI: ensure that all python dependencies are installed when … ([#158552](https://github.com/pytorch/pytorch/pull/158552))
 - Use linux.12xlarge.memory to build for H100/sm_90 ([#158598](https://github.com/pytorch/pytorch/pull/158598))
 - setup pinned commit for vllm in pytorch ci ([#158591](https://github.com/pytorch/pytorch/pull/158591))

2.9.0/todo/result_rocm.md

Lines changed: 0 additions & 45 deletions
This file was deleted.
