Skip to content

Conversation

pull[bot]
Copy link

@pull pull bot commented Sep 12, 2025

See Commits and Changes for more details.


Created by pull[bot] (v2.0.0-alpha.3)

Can you help keep this open source service alive? 💖 Please sponsor : )

@pull pull bot locked and limited conversation to collaborators Sep 12, 2025
@pull pull bot added the ⤵️ pull label Sep 12, 2025
jiang1997 and others added 28 commits September 16, 2025 09:26
This PR refactors alignment validation in MLIR's MemRef and SPIRV
dialects:
- Use `IntValidAlignment` for consistent type safety across MemRef and
SPIRV dialects
  - Eliminate duplicate validation logic in `MemRefOps.cpp`
  - Adjust error messages in `invalid.mlir` to match improved validation
  
This is the first of two PRs addressing issue #155677.
The refactoring lead to an additional data transfer. This changes the
assumed transfers in the check-strings to work with that changed
behavior.
Summary:
Currently we have this `__tgt_device_image` indirection which just takes
a reference to some pointers. This was all find and good when the only
usage of this was from a section of GPU code that came from an ELF
constant section. However, we have expanded beyond that and now need to
worry about managing lifetimes. We have code that references the image
even after it was loaded internally. This patch changes the
implementation to instaed copy the memory buffer and manage it locally.

This PR reworks the JIT and other image handling to directly manage its
own memory. We now don't need to duplicate this behavior externally at
the Offload API level. Also we actually free these if the user unloads
them.

Upside, less likely to crash and burn. Downside, more latency when
loading an image.
Extension of #158152 for MLIR.

---------

Signed-off-by: Sarnie, Nick <[email protected]>
…59103)

PR #159045 made the constructor constexpr, which allows
`-Wunused-variable` to trigger. However, we don't really care if a
statistic is unused if `LLVM_ENABLE_STATS` is 0.
…rn a string (NFC) (#159089)

These functions will see more uses in a future patch.
This also resolves a FIXME.
…onDAG (#155256)

Based on comment of
#153600 (comment),
Add a helper function isTailCall for getting libcall in SelectionDAG.
…et (#158597)

Fixes #157252.
Peephole optimization tends to fold:
```
add %gpr1, %stack, 0
subs %gpr2, %gpr1, 0
```
to
```
adds %gpr2, %stack, 0
```

This patch undoes the fold in `rewriteAArch64FrameIndex` to process
`adds` on the stack object.
Fixes an issue in commit 3946c50, PR #135349.

The DebugSSAUpdater class performs raw pointer allocations. It frees
these properly in reset(), but does not do so in its destructor - as an
immediate fix, this patch adds a destructor which frees the allocations
correctly.

I'll be merging this immediately to fix the issue, but will be open to
post-commit review and/or producing a better fix in a follow-up commit.
The S_NOP instruction has an immediate operand which is one less than
the number of cycles to delay for. The maximum value that may be encoded
in this field was increased in GFX8 and again in GFX12.
…58026)

IR has the `contract` to indicate that contraction is allowed. Testing
shouldn't rely on global flag to perform contraction. This is a
prerequisite before making backends rely only on the IR to perform
contraction. See more here:
https://discourse.llvm.org/t/allowfpopfusion-vs-sdnodeflags-hasallowcontract/80909/5
…lf (#158832)

gpu.subgroup_mma_elementwise supports mulf op type. Add conversion for it.
)

After replacing VGPR MFMAs with the AGPR form, we've alleviated VGPR
pressure which may have triggered spills during allocation. Identify
these spill slots, and try to reassign them to newly freed VGPRs,
and replace the spill instructions with copies.

Fixes #154260
This PR is a part of the effort to make the VFS used in the compiler
more explicit and consistent.

Instead of creating the VFS deep within the compiler (in
`CompilerInstance::createFileManager()`), clients are now required to
explicitly call `CompilerInstance::createVirtualFileSystem()` and
provide the base VFS from the outside.

This PR also helps in breaking up the dependency cycle where creating a
properly configured `DiagnosticsEngine` requires a properly configured
VFS, but creating properly configuring a VFS requires the
`DiagnosticsEngine`.

Both `CompilerInstance::create{FileManager,Diagnostics}()` now just use
the VFS already in `CompilerInstance` instead of taking one as a
parameter, making the VFS consistent across the instance sub-object.
…and` inst's (#158097)

Resolves #157371

We can eliminate one of the `fcmp` when we have two same `olt` or `ogt`
instructions matched in `or`/`and` simplification.
This will be used to build hexagon-builtins for baremetal.

Signed-off-by: Kushal Pal <[email protected]>
When the ARRAY has polymorphic type, its element type may not match
the element type of BOUNDARY.

Fixes #158382.
…), C)) (#155141)

Hi, I compared the following LLVM IR with GCC and Clang, and there is a small difference between the two. The LLVM IR is:
```
define i64 @test_smin_neg_one(i64 %a) {
  %1 = tail call i64 @llvm.smin.i64(i64 %a, i64 -1)
  %retval.0 = xor i64 %1, -1
  ret i64 %retval.0
}
```
GCC generates:
```
	cmp	x0, 0
	csinv	x0, xzr, x0, ge
	ret
```
Clang generates:
```
	cmn	x0, #1
	csinv	x8, x0, xzr, lt
	mvn	x0, x8
	ret
```
Clang keeps flipping x0 through x8 unnecessarily.
So I added the following folds to DAGCombiner:
fold (xor (smax(x, C), C)) -> select (x > C), xor(x, C), 0
fold (xor (smin(x, C), C)) -> select (x < C), xor(x, C), 0

alive2: https://alive2.llvm.org/ce/z/gffoir

---------

Co-authored-by: Yui5427 <[email protected]>
Co-authored-by: Matt Arsenault <[email protected]>
Co-authored-by: Simon Pilgrim <[email protected]>
…159108)

SmallSetVector is too optimistic, there are usually more than 16 unique
decoders and predicates. Modernize `typedef` to `using` while here.
Elide bitcast combine to build_vector in case of i64 immediate that can
be materialized through 64b mov
Ensure alias analyses mask out `errnomem` location, refining the
resulting modref info, when the given access/location does not
alias errno. This may occur either when TBAA proves there is no
alias with errno (e.g., float TBAA for the same root would be
disjoint with the int-only compatible TBAA node for errno); or
if the memory access size is larger than the integer size, or
when the underlying object is a potentially-escaping alloca.

Previous discussion: https://discourse.llvm.org/t/rfc-modelling-errno-memory-effects/82972.
…sdot (#158310)

This allows dot products with scalable 8xi16 vectors (and fixed-length
vectors which are converted into a scalable vector) accumulating into a
4xi32 vector to lower into a single instruction (`udot`/`sdot`), rather
than a sequence of `umlalb`s and `umlalt`s`.
atrosinenko and others added 30 commits September 17, 2025 20:23
There is a number of attributes that is expected to be set on functions
by default. This patch implements setting more such attributes on the
FMV resolver functions generated by Clang. On AArch64, this makes the
resolver functions use the default PAC and BTI hardening settings.
…59312)

This popped up during our internal integrates of upstream changes. It
started happening after ba9d1c4, which
started using `TemplateSpecializationType` in this place and the code
was not prepared to handle it.
…gs. NFC. (#159327)

This avoids the following kind of warning with GCC:

    warning: control reaches end of non-void function [-Wreturn-type]
#159337)

This fixes the following warning when compiled with GCC:

    ../lib/Target/AArch64/AArch64ISelLowering.cpp: In function ‘bool shouldLowerTailCallStackArg(const llvm::MachineFunction&, const llvm::CCValAssign&, llvm::SDValue, llvm::ISD::ArgFlagsTy, int)’:
    ../lib/Target/AArch64/AArch64ISelLowering.cpp:9310: warning: comparison of integer expressions of different signedness: ‘uint64_t’ {aka ‘long unsigned int’} and ‘int64_t’ {aka ‘long int’} [-Wsign-compare]
     9310 |       if (SizeInBits / 8 != MFI.getObjectSize(FI))
          |
This avoids the following warnings from Clang:

    ../../lldb/source/Host/windows/Host.cpp:324:3: warning: default label in switch which covers all enumeration values [-Wcovered-switch-default]
      324 |   default:
          |   ^
    ../../lldb/source/Host/common/File.cpp:662:26: warning: cast from 'const void *' to 'char *' drops const qualifier [-Wcast-qual]
      662 |           .write((char *)buf, num_bytes);
          |                          ^
…ngs. NFC. (#159330)

This avoids the following warnings:

    ../../clang/lib/AST/ExprConstant.cpp: In member function ‘bool {anonymous}::IntExprEvaluator::VisitBuiltinCallExpr(const clang::CallExpr*, unsigned int)’:
    ../../clang/lib/AST/ExprConstant.cpp:14104:3: warning: this statement may fall through [-Wimplicit-fallthrough=]
    14104 |   }
          |   ^
    ../../clang/lib/AST/ExprConstant.cpp:14105:3: note: here
    14105 |   case Builtin::BIstrlen:
          |   ^~~~
    ../../clang/lib/Driver/ToolChains/CommonArgs.cpp: In function ‘std::string clang::driver::tools::complexRangeKindToStr(clang::LangOptionsBase::ComplexRangeKind ’:
    ../../clang/lib/Driver/ToolChains/CommonArgs.cpp:3523:1: warning: control reaches end of non-void function [-Wreturn-type]
     3523 | }
          | ^
This avoids the following kind of warning with GCC:

    ../tools/llvm-lipo/llvm-lipo.cpp: In function ‘void printInfo(llvm::LLVMContext&, llvm::ArrayRef<llvm::object::OwningBinary<llvm::object::Binary> >)’:
    ../tools/llvm-lipo/llvm-lipo.cpp:464:34: warning: suggest parentheses around ‘& ’ within ‘||’ [-Wparentheses]
      464 |              Binary->isArchive() && "expected MachO binary");
          |              ~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~
While debugging #145206 I found that a possible cause for the problem is
the call to printf, which is variadic.

In a musl environment VarArgs are treated like *non* VarArgs.
The handling of this special case was bypassed by the commit 
a4f8551.

The reason is that the arguement `TreatAsVarArg` is only set to `true`
in
an *non* musl env. `TreatAsVarArg` determines the result of
`isVarArg()`.
The `CCIfArgVarArg` class only checks each individual
variable, but not whether `isVarArg()` is true.

Without the special case, the unnamed arguments are always passed
on the stack, as is standard calling convention. But with musl
also unnamed arguments can be passed in registers.

Possibly, this fixes #145206.
…subprogram DIEs (#159104)

With this change, construction of abstract subprogram DIEs is split in
two stages/functions:
creation of DIE (in DwarfCompileUnit::getOrCreateAbstractSubprogramDIE)
and its population with children (in
DwarfCompileUnit::constructAbstractSubprogramScopeDIE).
With that, abstract subprograms can be created/referenced from
DwarfDebug::beginModule, which should solve the issue with static local
variables DIE creation of inlined functons with optimized-out
definitions. It fixes #29985.

LexicalScopes class now stores mapping from DISubprograms to their
corresponding llvm::Function's. It is supposed to be built before
processing of each function (so, now LexicalScopes class has a method
for "module initialization" alongside the method for "function
initialization"). It is used by DwarfCompileUnit to determine whether a
DISubprogram needs an abstract DIE before DwarfDebug::beginFunction is
invoked.

DwarfCompileUnit::getOrCreateSubprogramDIE method is added, which can
create an abstract or a concrete DIE for a subprogram. It accepts
llvm::Function* argument to determine whether a concrete DIE must be
created.

This is a temporary fix for
#29985. Ideally, it will be
fixed by moving global variables and types emission to
DwarfDebug::endModule (https://reviews.llvm.org/D144007,
https://reviews.llvm.org/D144005).

Some code proposed by Ellis Hoag <[email protected]> in
#90523 was taken for this
commit.
Recognize an MTE tag fault Mach exception. A tag fault is an error
reported by Arm's Memory Tagging Extension (MTE) when a memory access
attempts to use a pointer with a tag that doesn't match the tag stored
with the memory. LLDB will print the tag and address to make the issue
easier to diagnose.

This was hand tested by debugging an MTE enabled binary on an iPhone 17
running iOS 26.

rdar://113575216
…144744)

Fix `llvm::concat_iterator` for the case of `ValueT` being a pointer to
a common base class to which the result of dereferencing any iterator in
`ItersT` can be casted to.
There is no RISCV isel for bitcast between f16 and bf16 which will
trigger "cannot select" fatal error.

Co-authored-by: Ying Wang <[email protected]>
…#159218)

Also turn the method into a static function so it can be used without
an instance of the class.
Remove XFAILs for llvm-driver. DTLTO is still incompatible with
llvm-driver, but these tests now pass after #159151.

Modify a missed regex to use filename.py (missed in #159151).

Tighten overly greedy regexes to prevent spurious failures.
…55431)

The current chip guard fails to prevent scaling_extf/truncf patterns
from being applied on gfx1100 which does not have scaling support.

---------

Signed-off-by: Muzammiluddin Syed <[email protected]>
…C. (#159338)

This avoids the following kind of warning when built with GCC:

    ../../clang/lib/Sema/SemaStmtAttr.cpp: In function ‘clang::Attr* ProcessStmtAttribute(clang::Sema&, clang::Stmt*, const clang::ParsedAttr&, clang::SourceRange)’:
    ../../clang/lib/Sema/SemaStmtAttr.cpp:677:30: warning: enumerated mismatch in conditional expression: ‘clang::diag::<unnamed enum>’ vs ‘clang::diag::<unnamed enum>’ [-Wenum-compare]
      676 |       S.Diag(A.getLoc(), A.isRegularKeywordAttribute()
          |                          ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      677 |                              ? diag::err_keyword_not_supported_on_targe
          |                              ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      678 |                              : diag::warn_unhandled_ms_attribute_ignore )
          |                              ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

These enums are non-overlapping, but due they are defined in different
enum scopes due to how they are generated with tablegen.
Without this patch, we are doing a roundtrip on types.  Specifically,
if decltype(...) is well formed, std::is_same_v evaluates to a boolean
value.  We then pass the boolean value to std::enable_if_t, go through
the sizeof(char)/sizeof(double) trick, and then come back to a boolean
value.

This patch simplifies all this by having test() return
std::is_same<...>.  The "caller" attaches ::value, so effectively we
are using std::is_same<...>::value when decltype(...) is well formed,
bypassing std::enable_if_t and the sizeof(char)/sizeof(double) trick.

If we did not care about the return type of the shift operator, we
could use llvm::is_detected, but the return type check doesn't allow
us to simplify things that far.
Summary:
Turns out the new CUDA ABI now applies retroactively to all the other
SMs if you upgrade to CUDA 13.0. This patch changes the scheme, keeping
all the SM flags consistent but using an offset.

Fixes: #159088
A common idiom is the usage of the PatternMatch match function within a
functional algorithm like all_of. Introduce a match functor to shorten
this idiom.

Co-authored-by: Luke Lau <[email protected]>
Breaks some buildbots

This reverts commit c928516.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.