[pull] main from llvm:main #5634

pull · 2025-09-12T01:14:13Z

See Commits and Changes for more details.

Created by pull[bot] (v2.0.0-alpha.3)


This PR refactors alignment validation in MLIR's MemRef and SPIRV dialects:
- Use `IntValidAlignment` for consistent type safety across MemRef and SPIRV dialects
- Eliminate duplicate validation logic in `MemRefOps.cpp`
- Adjust error messages in `invalid.mlir` to match improved validation

This is the first of two PRs addressing issue #155677.

The refactoring led to an additional data transfer. This changes the assumed transfers in the check-strings to work with that changed behavior.

Summary: Currently we have this `__tgt_device_image` indirection which just takes a reference to some pointers. This was all fine and good when the only usage of this was from a section of GPU code that came from an ELF constant section. However, we have expanded beyond that and now need to worry about managing lifetimes. We have code that references the image even after it was loaded internally. This patch changes the implementation to instead copy the memory buffer and manage it locally. This PR reworks the JIT and other image handling to directly manage its own memory. We no longer need to duplicate this behavior externally at the Offload API level. We also now actually free these images if the user unloads them. Upside: less likely to crash and burn. Downside: more latency when loading an image.
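A minimal sketch of the ownership change described above, not the Offload plugin's actual code. The names `ImageRef` and `copyImage` are hypothetical; the point is only to contrast a non-owning pair of pointers with a copy whose lifetime the loader controls.

```cpp
#include <cassert>
#include <vector>

// Hypothetical non-owning view: it only stores pointers into memory that
// someone else owns, so it dangles if that memory is released or reused.
struct ImageRef {
  const void *Start;
  const void *End;
};

// Copy the image bytes into storage owned by the caller of this function.
// The copy stays valid regardless of what happens to the original buffer,
// at the cost of one extra allocation and memcpy-like pass per load.
inline std::vector<char> copyImage(ImageRef Ref) {
  const char *Begin = static_cast<const char *>(Ref.Start);
  const char *End = static_cast<const char *>(Ref.End);
  assert(Begin <= End && "image range must not be reversed");
  return std::vector<char>(Begin, End);
}
```

The extra copy is where the additional load latency mentioned in the summary would come from; in exchange, the buffer can be freed exactly when the image is unloaded.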

Extension of #158152 for MLIR. --------- Signed-off-by: Sarnie, Nick <[email protected]>

…59103) PR #159045 made the constructor constexpr, which allows `-Wunused-variable` to trigger. However, we don't really care if a statistic is unused if `LLVM_ENABLE_STATS` is 0.
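As an illustration of the idea only, the sketch below uses hypothetical stand-ins (`DEMO_ENABLE_STATS`, `DemoStatistic`, `DEMO_STATISTIC`) rather than LLVM's real macro and class. It assumes the fix amounts to attaching `[[maybe_unused]]` to the variable when statistics are compiled out.

```cpp
// Hypothetical stand-in for LLVM_ENABLE_STATS; 0 means statistics are off.
#define DEMO_ENABLE_STATS 0

// Hypothetical counter type with a constexpr constructor.
struct DemoStatistic {
  const char *Desc;
  unsigned Value = 0;
  constexpr explicit DemoStatistic(const char *D) : Desc(D) {}
  void bump() {
#if DEMO_ENABLE_STATS
    ++Value; // Only count when statistics are enabled.
#endif
  }
};

#if DEMO_ENABLE_STATS
#define DEMO_STATISTIC(Var, Desc) static DemoStatistic Var(Desc)
#else
// With stats disabled, an untouched counter is expected, so mark it
// [[maybe_unused]] instead of making every user reference it.
#define DEMO_STATISTIC(Var, Desc)                                              \
  [[maybe_unused]] static DemoStatistic Var(Desc)
#endif

DEMO_STATISTIC(NumFolds, "Number of folds performed");
```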

…rn a string (NFC) (#159089) These functions will see more uses in a future patch. This also resolves a FIXME.

…onDAG (#155256) Based on a comment on #153600, add a helper function isTailCall for getting libcall in SelectionDAG.

)

…et (#158597) Fixes #157252. Peephole optimization tends to fold:
```
add %gpr1, %stack, 0
subs %gpr2, %gpr1, 0
```
to
```
adds %gpr2, %stack, 0
```
This patch undoes the fold in `rewriteAArch64FrameIndex` to process `adds` on the stack object.

Fixes an issue in commit 3946c50, PR #135349. The DebugSSAUpdater class performs raw pointer allocations. It frees these properly in reset(), but does not do so in its destructor - as an immediate fix, this patch adds a destructor which frees the allocations correctly. I'll be merging this immediately to fix the issue, but will be open to post-commit review and/or producing a better fix in a follow-up commit.

The S_NOP instruction has an immediate operand which is one less than the number of cycles to delay for. The maximum value that may be encoded in this field was increased in GFX8 and again in GFX12.
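The encoding rule above (immediate = cycles - 1) can be sketched as a small helper that splits a requested delay into as few `S_NOP` instructions as possible. The function name `splitNopDelay` is hypothetical, and `MaxImm` is left as a parameter because the text does not state the per-generation limits.

```cpp
#include <cassert>
#include <vector>

// Split a delay of `Cycles` cycles into S_NOP immediates. Each immediate
// encodes (cycles - 1), so one S_NOP covers at most MaxImm + 1 cycles.
inline std::vector<unsigned> splitNopDelay(unsigned Cycles, unsigned MaxImm) {
  const unsigned MaxCyclesPerNop = MaxImm + 1;
  assert(MaxCyclesPerNop != 0 && "MaxImm + 1 must not overflow");
  std::vector<unsigned> Imms;
  while (Cycles > 0) {
    // Take as many cycles as one instruction can encode, then repeat.
    unsigned Chunk = Cycles < MaxCyclesPerNop ? Cycles : MaxCyclesPerNop;
    Imms.push_back(Chunk - 1);
    Cycles -= Chunk;
  }
  return Imms;
}
```

With a larger `MaxImm`, the same delay needs fewer instructions: a 10-cycle delay takes two `S_NOP`s when `MaxImm` is 7 but only one when `MaxImm` is 15.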

…58026) IR has the `contract` to indicate that contraction is allowed. Testing shouldn't rely on global flag to perform contraction. This is a prerequisite before making backends rely only on the IR to perform contraction. See more here: https://discourse.llvm.org/t/allowfpopfusion-vs-sdnodeflags-hasallowcontract/80909/5

…oROCDLOps.cpp (NFC)

…lf (#158832) gpu.subgroup_mma_elementwise supports mulf op type. Add conversion for it.

) After replacing VGPR MFMAs with the AGPR form, we've alleviated VGPR pressure which may have triggered spills during allocation. Identify these spill slots, and try to reassign them to newly freed VGPRs, and replace the spill instructions with copies. Fixes #154260

…2D.cpp (NFC)

This PR is a part of the effort to make the VFS used in the compiler more explicit and consistent. Instead of creating the VFS deep within the compiler (in `CompilerInstance::createFileManager()`), clients are now required to explicitly call `CompilerInstance::createVirtualFileSystem()` and provide the base VFS from the outside. This PR also helps in breaking up the dependency cycle where creating a properly configured `DiagnosticsEngine` requires a properly configured VFS, but properly configuring a VFS requires the `DiagnosticsEngine`. Both `CompilerInstance::create{FileManager,Diagnostics}()` now just use the VFS already in `CompilerInstance` instead of taking one as a parameter, making the VFS consistent across the instance sub-object.

…and` inst's (#158097) Resolves #157371 We can eliminate one of the `fcmp` when we have two same `olt` or `ogt` instructions matched in `or`/`and` simplification.

This will be used to build hexagon-builtins for baremetal. Signed-off-by: Kushal Pal <[email protected]>

When the ARRAY has polymorphic type, its element type may not match the element type of BOUNDARY. Fixes #158382.

…), C)) (#155141) Hi, I compared the following LLVM IR with GCC and Clang, and there is a small difference between the two. The LLVM IR is:
```
define i64 @test_smin_neg_one(i64 %a) {
  %1 = tail call i64 @llvm.smin.i64(i64 %a, i64 -1)
  %retval.0 = xor i64 %1, -1
  ret i64 %retval.0
}
```
GCC generates:
```
cmp x0, 0
csinv x0, xzr, x0, ge
ret
```
Clang generates:
```
cmn x0, #1
csinv x8, x0, xzr, lt
mvn x0, x8
ret
```
Clang keeps flipping x0 through x8 unnecessarily. So I added the following folds to DAGCombiner:
- fold (xor (smax(x, C), C)) -> select (x > C), xor(x, C), 0
- fold (xor (smin(x, C), C)) -> select (x < C), xor(x, C), 0

alive2: https://alive2.llvm.org/ce/z/gffoir

---------
Co-authored-by: Yui5427 <[email protected]>
Co-authored-by: Matt Arsenault <[email protected]>
Co-authored-by: Simon Pilgrim <[email protected]>
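The two folds can be sanity-checked outside the compiler. The sketch below is not DAGCombiner code; it models both sides of each fold on 8-bit signed integers (helper names are made up for illustration) and compares them exhaustively. The reasoning is that `smin(x, C)` is `x` when `x < C` and `C` otherwise, and `C ^ C` is 0.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Left-hand sides: xor(smin(x, C), C) and xor(smax(x, C), C).
constexpr int8_t xorSMin(int8_t X, int8_t C) {
  return static_cast<int8_t>(std::min(X, C) ^ C);
}
constexpr int8_t xorSMax(int8_t X, int8_t C) {
  return static_cast<int8_t>(std::max(X, C) ^ C);
}

// Right-hand sides: select (x < C), xor(x, C), 0 and the smax analogue.
constexpr int8_t xorSMinFolded(int8_t X, int8_t C) {
  return X < C ? static_cast<int8_t>(X ^ C) : int8_t{0};
}
constexpr int8_t xorSMaxFolded(int8_t X, int8_t C) {
  return X > C ? static_cast<int8_t>(X ^ C) : int8_t{0};
}

// Exhaustively compare both sides of each fold over every i8 pair.
inline void checkFoldsOnI8() {
  for (int X = -128; X <= 127; ++X) {
    for (int C = -128; C <= 127; ++C) {
      const auto A = static_cast<int8_t>(X);
      const auto B = static_cast<int8_t>(C);
      assert(xorSMin(A, B) == xorSMinFolded(A, B));
      assert(xorSMax(A, B) == xorSMaxFolded(A, B));
    }
  }
}
```

This is only a small-width model; the alive2 link in the commit message is the authoritative proof for the general case.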

…#159099)

…159108) SmallSetVector is too optimistic, there are usually more than 16 unique decoders and predicates. Modernize `typedef` to `using` while here.

Elide bitcast combine to build_vector in case of i64 immediate that can be materialized through 64b mov

Ensure alias analyses mask out `errnomem` location, refining the resulting modref info, when the given access/location does not alias errno. This may occur either when TBAA proves there is no alias with errno (e.g., float TBAA for the same root would be disjoint with the int-only compatible TBAA node for errno); or if the memory access size is larger than the integer size, or when the underlying object is a potentially-escaping alloca. Previous discussion: https://discourse.llvm.org/t/rfc-modelling-errno-memory-effects/82972.

…sdot (#158310) This allows dot products with scalable 8xi16 vectors (and fixed-length vectors which are converted into a scalable vector) accumulating into a 4xi32 vector to lower into a single instruction (`udot`/`sdot`), rather than a sequence of `umlalb`s and `umlalt`s.

There are a number of attributes that are expected to be set on functions by default. This patch implements setting more such attributes on the FMV resolver functions generated by Clang. On AArch64, this makes the resolver functions use the default PAC and BTI hardening settings.

…59312) This popped up during our internal integrates of upstream changes. It started happening after ba9d1c4, which started using `TemplateSpecializationType` in this place and the code was not prepared to handle it.

…gs. NFC. (#159327) This avoids the following kind of warning with GCC: warning: control reaches end of non-void function [-Wreturn-type]
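A minimal sketch of the pattern, using a made-up enum and function. LLVM code would normally use `llvm_unreachable`; this example assumes the GCC/Clang builtin `__builtin_unreachable()` instead so that it compiles without LLVM headers.

```cpp
// Hypothetical enum whose enumerators are all handled by the switch below.
enum class Width { Byte, Word };

constexpr int bitsFor(Width W) {
  switch (W) {
  case Width::Byte:
    return 8;
  case Width::Word:
    return 16;
  }
  // Every enumerator is handled above, but GCC still assumes W may hold
  // another value and would warn under -Wreturn-type without this line.
  __builtin_unreachable();
}
```

Leaving out a `default:` label keeps Clang's `-Wswitch` able to flag enumerators added later, which is why the marker goes after the switch rather than inside it.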

#159337) This fixes the following warning when compiled with GCC: ../lib/Target/AArch64/AArch64ISelLowering.cpp: In function ‘bool shouldLowerTailCallStackArg(const llvm::MachineFunction&, const llvm::CCValAssign&, llvm::SDValue, llvm::ISD::ArgFlagsTy, int)’: ../lib/Target/AArch64/AArch64ISelLowering.cpp:9310: warning: comparison of integer expressions of different signedness: ‘uint64_t’ {aka ‘long unsigned int’} and ‘int64_t’ {aka ‘long int’} [-Wsign-compare] 9310 | if (SizeInBits / 8 != MFI.getObjectSize(FI)) |

This avoids the following warnings from Clang: ../../lldb/source/Host/windows/Host.cpp:324:3: warning: default label in switch which covers all enumeration values [-Wcovered-switch-default] 324 | default: | ^ ../../lldb/source/Host/common/File.cpp:662:26: warning: cast from 'const void *' to 'char *' drops const qualifier [-Wcast-qual] 662 | .write((char *)buf, num_bytes); | ^

…ngs. NFC. (#159330) This avoids the following warnings: ../../clang/lib/AST/ExprConstant.cpp: In member function ‘bool {anonymous}::IntExprEvaluator::VisitBuiltinCallExpr(const clang::CallExpr*, unsigned int)’: ../../clang/lib/AST/ExprConstant.cpp:14104:3: warning: this statement may fall through [-Wimplicit-fallthrough=] 14104 | } | ^ ../../clang/lib/AST/ExprConstant.cpp:14105:3: note: here 14105 | case Builtin::BIstrlen: | ^~~~ ../../clang/lib/Driver/ToolChains/CommonArgs.cpp: In function ‘std::string clang::driver::tools::complexRangeKindToStr(clang::LangOptionsBase::ComplexRangeKind ’: ../../clang/lib/Driver/ToolChains/CommonArgs.cpp:3523:1: warning: control reaches end of non-void function [-Wreturn-type] 3523 | } | ^

This avoids the following kind of warning with GCC: ../tools/llvm-lipo/llvm-lipo.cpp: In function ‘void printInfo(llvm::LLVMContext&, llvm::ArrayRef<llvm::object::OwningBinary<llvm::object::Binary> >)’: ../tools/llvm-lipo/llvm-lipo.cpp:464:34: warning: suggest parentheses around ‘& ’ within ‘||’ [-Wparentheses] 464 | Binary->isArchive() && "expected MachO binary"); | ~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~

…154410) Co-authored-by: Jay Foad <[email protected]> Co-authored-by: Jay Foad <[email protected]>

While debugging #145206 I found that a possible cause for the problem is the call to printf, which is variadic. In a musl environment VarArgs are treated like *non* VarArgs. The handling of this special case was bypassed by the commit a4f8551. The reason is that the argument `TreatAsVarArg` is only set to `true` in a *non*-musl environment. `TreatAsVarArg` determines the result of `isVarArg()`. The `CCIfArgVarArg` class only checks each individual variable, but not whether `isVarArg()` is true. Without the special case, the unnamed arguments are always passed on the stack, as in the standard calling convention. But with musl, unnamed arguments can also be passed in registers. Possibly, this fixes #145206.

…subprogram DIEs (#159104) With this change, construction of abstract subprogram DIEs is split in two stages/functions: creation of DIE (in DwarfCompileUnit::getOrCreateAbstractSubprogramDIE) and its population with children (in DwarfCompileUnit::constructAbstractSubprogramScopeDIE). With that, abstract subprograms can be created/referenced from DwarfDebug::beginModule, which should solve the issue with static local variables DIE creation of inlined functions with optimized-out definitions. It fixes #29985. LexicalScopes class now stores mapping from DISubprograms to their corresponding llvm::Function's. It is supposed to be built before processing of each function (so, now LexicalScopes class has a method for "module initialization" alongside the method for "function initialization"). It is used by DwarfCompileUnit to determine whether a DISubprogram needs an abstract DIE before DwarfDebug::beginFunction is invoked. DwarfCompileUnit::getOrCreateSubprogramDIE method is added, which can create an abstract or a concrete DIE for a subprogram. It accepts llvm::Function* argument to determine whether a concrete DIE must be created. This is a temporary fix for #29985. Ideally, it will be fixed by moving global variables and types emission to DwarfDebug::endModule (https://reviews.llvm.org/D144007, https://reviews.llvm.org/D144005). Some code proposed by Ellis Hoag <[email protected]> in #90523 was taken for this commit.

Recognize an MTE tag fault Mach exception. A tag fault is an error reported by Arm's Memory Tagging Extension (MTE) when a memory access attempts to use a pointer with a tag that doesn't match the tag stored with the memory. LLDB will print the tag and address to make the issue easier to diagnose. This was hand tested by debugging an MTE enabled binary on an iPhone 17 running iOS 26. rdar://113575216

…144744) Fix `llvm::concat_iterator` for the case of `ValueT` being a pointer to a common base class to which the result of dereferencing any iterator in `ItersT` can be cast.

…6 on aarch64. (#159417)

There is no RISCV isel for bitcast between f16 and bf16 which will trigger "cannot select" fatal error. Co-authored-by: Ying Wang <[email protected]>

…#159218) Also turn the method into a static function so it can be used without an instance of the class.

Remove XFAILs for llvm-driver. DTLTO is still incompatible with llvm-driver, but these tests now pass after #159151. Modify a missed regex to use filename.py (missed in #159151). Tighten overly greedy regexes to prevent spurious failures.

…55431) The current chip guard fails to prevent scaling_extf/truncf patterns from being applied on gfx1100 which does not have scaling support. --------- Signed-off-by: Muzammiluddin Syed <[email protected]>

…C. (#159338) This avoids the following kind of warning when built with GCC: ../../clang/lib/Sema/SemaStmtAttr.cpp: In function ‘clang::Attr* ProcessStmtAttribute(clang::Sema&, clang::Stmt*, const clang::ParsedAttr&, clang::SourceRange)’: ../../clang/lib/Sema/SemaStmtAttr.cpp:677:30: warning: enumerated mismatch in conditional expression: ‘clang::diag::<unnamed enum>’ vs ‘clang::diag::<unnamed enum>’ [-Wenum-compare] 676 | S.Diag(A.getLoc(), A.isRegularKeywordAttribute() | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 677 | ? diag::err_keyword_not_supported_on_targe | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 678 | : diag::warn_unhandled_ms_attribute_ignore ) | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ These enums are non-overlapping, but they are defined in different enum scopes due to how they are generated with tablegen.

Without this patch, we are doing a roundtrip on types. Specifically, if decltype(...) is well formed, std::is_same_v evaluates to a boolean value. We then pass the boolean value to std::enable_if_t, go through the sizeof(char)/sizeof(double) trick, and then come back to a boolean value. This patch simplifies all this by having test() return std::is_same<...>. The "caller" attaches ::value, so effectively we are using std::is_same<...>::value when decltype(...) is well formed, bypassing std::enable_if_t and the sizeof(char)/sizeof(double) trick. If we did not care about the return type of the shift operator, we could use llvm::is_detected, but the return type check doesn't allow us to simplify things that far.
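The simplified scheme can be sketched as follows. This is a reconstruction from the description, not the patched LLVM code: the trait name `HasShiftReturningStream` and the helper types are hypothetical. The preferred `test` overload returns `std::is_same<...>` directly, and the "caller" reads `::value` from its return type.

```cpp
#include <ostream>
#include <type_traits>
#include <utility>

// Hypothetical trait: true if `Stream << T` is well formed AND the
// expression has type Stream&.
template <typename Stream, typename T> class HasShiftReturningStream {
  // Viable only when decltype(...) is well formed. Returning is_same<...>
  // itself avoids enable_if_t and the sizeof(char)/sizeof(double) trick.
  template <typename S, typename U>
  static std::is_same<
      decltype(std::declval<S &>() << std::declval<const U &>()), S &>
  test(int);

  // Fallback chosen when the shift expression is ill formed.
  template <typename, typename> static std::false_type test(...);

public:
  static constexpr bool value = decltype(test<Stream, T>(0))::value;
};

struct NoShift {};  // No operator<< at all.
struct OddShift {}; // Has operator<<, but with the wrong return type.
inline void operator<<(std::ostream &, const OddShift &) {}

static_assert(HasShiftReturningStream<std::ostream, int>::value, "");
static_assert(!HasShiftReturningStream<std::ostream, NoShift>::value, "");
static_assert(!HasShiftReturningStream<std::ostream, OddShift>::value, "");
```

The `OddShift` case shows why the return-type check matters: the expression is well formed, so a pure detection idiom would report `true`, while the `is_same` comparison reports `false`.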

Summary: Turns out the new CUDA ABI now applies retroactively to all the other SMs if you upgrade to CUDA 13.0. This patch changes the scheme, keeping all the SM flags consistent but using an offset. Fixes: #159088

A common idiom is the usage of the PatternMatch match function within a functional algorithm like all_of. Introduce a match functor to shorten this idiom. Co-authored-by: Luke Lau <[email protected]>

Breaks some buildbots This reverts commit c928516.

…159448) NFC.

pull bot locked and limited conversation to collaborators Sep 12, 2025

pull bot added the ⤵️ pull label Sep 12, 2025

jiang1997 and others added 28 commits September 16, 2025 09:26

[OpenMP] Fix force-usm test after #157182 (#159095)

311d78f

The refactoring led to an additional data transfer. This changes the assumed transfers in the check-strings to work with that changed behavior.

[MLIR][OpenMP] Set default address space for OpenMPIRBuilder (#158689)

148e099

Extension of #158152 for MLIR. --------- Signed-off-by: Sarnie, Nick <[email protected]>

[NFC][TableGen] Move decoder tests to DecoderEmitter directory (#159040)

ce073a9

Mark STATISTIC variables as maybe_unused when stats are disabled. (#1…

334013b

…59103) PR #159045 made the constructor constexpr, which allows `-Wunused-variable` to trigger. However, we don't really care if a statistic is unused if `LLVM_ENABLE_STATS` is 0.

[TableGen][Decoder] Make predicate/decoder generation functions retu…

b3fa92f

…rn a string (NFC) (#159089) These functions will see more uses in a future patch. This also resolves a FIXME.

[NFC ]Add a helper function isTailCall for getting libcall in Selecti…

2771d35

…onDAG (#155256) Based on a comment on #153600, add a helper function isTailCall for getting libcall in SelectionDAG.

[ADT] Wrapper for std::accumulate accepting a range. (#158702)

7e71877

AMDGPU: Regenerate baseline test checks for some gfx12 mc tests (#159098

a4c5a74

)

[AMDGPU] Use larger immediate values in S_NOP (#158990)

eeced0d

The S_NOP instruction has an immediate operand which is one less than the number of cycles to delay for. The maximum value that may be encoded in this field was increased in GFX8 and again in GFX12.

[MLIR] Apply clang-tidy fixes for llvm-qualified-auto in LowerGpuOpsT…

beb6bab

…oROCDLOps.cpp (NFC)

[mlir][gpu][spirv] Add conversion for gpu.subgroup_mma_elementwise mu…

f017bcb

…lf (#158832) gpu.subgroup_mma_elementwise supports mulf op type. Add conversion for it.

[MLIR] Apply clang-tidy fixes for llvm-qualified-auto in WinogradConv…

9865f7e

…2D.cpp (NFC)

[InstCombine] Optimize redundant floating point comparisons in or/`…

08a58b2

…and` inst's (#158097) Resolves #157371 We can eliminate one of the `fcmp` when we have two same `olt` or `ogt` instructions matched in `or`/`and` simplification.

[cmake] Add cmake file for hexagon-builtins baremetal (#151500)

3388d40

This will be used to build hexagon-builtins for baremetal. Signed-off-by: Kushal Pal <[email protected]>

[flang] Allow polymorphic type mismatch for hlfir.eoshift. (#158718)

d2fbca8

When the ARRAY has polymorphic type, its element type may not match the element type of BOUNDARY. Fixes #158382.

[TableGen][DecoderEmitter] Inline a couple of trivial functions (NFC) (…

ee66d96

…#159099)

[TableGen][DecoderEmitter] Change SmallSetVector to SetVector (NFC) (#…

3ef066f

…159108) SmallSetVector is too optimistic, there are usually more than 16 unique decoders and predicates. Modernize `typedef` to `using` while here.

[AMDGPU] Elide bitcast fold i64 imm to build_vector (#154115)

341cdbc

Elide bitcast combine to build_vector in case of i64 immediate that can be materialized through 64b mov

atrosinenko and others added 30 commits September 17, 2025 20:23

[lldb] Add unreachable after fully covered switches, avoid GCC warnin…

4ff113f

…gs. NFC. (#159327) This avoids the following kind of warning with GCC: warning: control reaches end of non-void function [-Wreturn-type]

[AMDGPU] Fold copies of constant physical registers into their uses (#…

f0090ba

…154410) Co-authored-by: Jay Foad <[email protected]> Co-authored-by: Jay Foad <[email protected]>

[mlir][llvm] Pretty printing for trap intrinsics (#159385)

4e3aa76

[libc][math] Adjust rsqrtf16 exception checks. (#159411)

f549bb2

[ADT] Fix llvm::concat_iterator for ValueT == common_base_class * (#…

b241cc9

…144744) Fix `llvm::concat_iterator` for the case of `ValueT` being a pointer to a common base class to which the result of dereferencing any iterator in `ItersT` can be cast.

[gn build] Port 2caf4c1

3a2eb08

[libc] Temporarily disable floating point exception check for rsqrtf1…

e8fd84d

…6 on aarch64. (#159417)

[RISCV] Add isel for bitcasting between bfloat and half types (#158828)

4bac9d4

There is no RISCV isel for bitcast between f16 and bf16 which will trigger "cannot select" fatal error. Co-authored-by: Ying Wang <[email protected]>

[TableGen][DecoderEmitter] Simplify FilterChooser::getIslands() (NFC) (…

2c2fec3

…#159218) Also turn the method into a static function so it can be used without an instance of the class.

[mlir][ArithToAMDGPU] limit scaling truncf/extf support to gfx950 (#1…

aa5558d

…55431) The current chip guard fails to prevent scaling_extf/truncf patterns from being applied on gfx1100 which does not have scaling support. --------- Signed-off-by: Muzammiluddin Syed <[email protected]>

[LLVM] Fix offload and update CUDA ABI for all SM values (#159354)

dffd7f3

Summary: Turns out the new CUDA ABI now applies retroactively to all the other SMs if you upgrade to CUDA 13.0. This patch changes the scheme, keeping all the SM flags consistent but using an offset. Fixes: #159088

[profcheck] Exclude LoopVectorize tests introduced in #155301 (#159440)

7dc8753

[profcheck] exclude test introduced in #158328 (#159441)

f992b5b

[AMDGPU] Add gfx1251 subtarget (#159430)

e556dc0

[profcheck] exclude LV test introduced in #155547 (#159443)

835d6b3

[PatternMatch] Introduce match functor (NFC) (#159386)

7fb3a91

A common idiom is the usage of the PatternMatch match function within a functional algorithm like all_of. Introduce a match functor to shorten this idiom. Co-authored-by: Luke Lau <[email protected]>

Revert "[SFrames] Emit and relax FREs (#158154)" (#159436)

6c8fcd6

Breaks some buildbots This reverts commit c928516.

[RISCV][NFC] Merge some WriteRes entries in SiFive7 scheduling model (#…

96f2ab2

…159448) NFC.
