@mlugg (Member) commented Sep 13, 2025

This branch started off as a simple attempt to fix #25025, and ended up as essentially a rewrite of std.debug.SelfInfo and everything it touches. Big thanks to @kcbanner and @jacobly0 for their help with various parts of this. The commit history is a mess, especially at the start, but I'll leave it intact because there's certainly some useful separation there.

The Big Problem

TL;DR: iterating the stack by assuming that stack frames form a simple linked list is not reliable in most cases, which leads to issues like #25025. Instead, it is usually necessary for correctness to use something like DWARF stack unwinding.

#24960 deleted std.debug.MemoryAccessor, which was previously used by std.debug while gathering stack traces to avoid accessing invalid memory. The reasons for deleting it were good: in particular, this abstraction punished users for collecting stack traces by decimating performance since every memory access ended up going through a system call. However, this resulted in #25025. What's going on here is simple: collecting stack traces is just really difficult.

The obvious strategy is to simply traverse "frame pointers", used by essentially every architecture. For instance, on x86_64, it is typical to use rbp to point to the base of the stack frame; at that address is the saved value of rbp for the caller, as well as the return address. So the frame pointers form a linked list which gives us access to all of the return addresses. The problem is that in general, there is no requirement to use frame pointers. Binaries can omit them (-fomit-frame-pointer) and use the register in question (rbp on x86_64) as an extra general-purpose register. That's usually not too helpful nowadays, but on architectures with few GPRs, such as x86, having an extra one can be a big deal for performance.
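To make the "linked list" picture concrete, here is a minimal sketch of a naive frame-pointer walk. It is not the std.debug implementation; the `Frame` struct and `walkFramePointers` are hypothetical names, and the layout (saved frame pointer followed by the return address) is the x86_64 convention described above. The demo walks a hand-built chain of two fake frames so that it behaves the same everywhere; passing `@frameAddress()` instead would walk the real stack, with exactly the safety problems this PR is about.

```zig
const std = @import("std");

/// Assumed layout at the frame base when frame pointers are in use:
/// the caller's saved frame pointer, followed by the return address.
/// (Hypothetical type for illustration only.)
const Frame = extern struct {
    caller_fp: ?*const Frame,
    return_address: usize,
};

/// Naive frame-pointer walk: follows the linked list starting at `first_fp`,
/// writes up to `buf.len` return addresses, and returns how many were written.
/// This dereferences whatever the saved pointers contain, so it is unsafe if
/// any frame on the stack was compiled without frame pointers.
fn walkFramePointers(first_fp: usize, buf: []usize) usize {
    var fp: ?*const Frame = @ptrFromInt(first_fp);
    var n: usize = 0;
    while (fp) |frame| {
        if (n == buf.len) break;
        buf[n] = frame.return_address;
        n += 1;
        fp = frame.caller_fp;
    }
    return n;
}

pub fn main() void {
    // A synthetic two-frame "stack"; the outermost frame has no caller.
    const outer: Frame = .{ .caller_fp = null, .return_address = 0x2000 };
    const inner: Frame = .{ .caller_fp = &outer, .return_address = 0x1000 };

    var buf: [16]usize = undefined;
    const n = walkFramePointers(@intFromPtr(&inner), &buf);
    // Prints 0x1000 then 0x2000 (to stderr).
    for (buf[0..n]) |addr| std.debug.print("0x{x}\n", .{addr});
}
```

The walk has no way to tell a genuine saved frame pointer from an arbitrary value left in the register, which is why a single frame built with -fomit-frame-pointer is enough to send it into invalid memory.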

std.debug knew about this, so it avoided walking frame pointers if builtin.omit_frame_pointer was set. However, that's not actually sufficient. That flag only tells us about the current compilation unit (the Zig code being compiled). It does not give us information about whether any other object linked into the final executable has omitted frame pointers. For instance, if we are linking libc and that libc omits frame pointers, then the first few stack frames (which are in libc's startup code) will have bogus frame pointers, leading to invalid memory accesses when we try to do a naive FP walk.

In other words, stack frame pointer walking can never be assumed to be a safe strategy. In the future, when our self-hosted linkers are mature enough to be used everywhere, we could theoretically track omit_frame_pointer information across every object which an executable uses, but still, in general, FP walking is unsafe.

Luckily, there are alternatives to FP walking. Many targets allow binaries to embed "unwind information"---often in a format specified by DWARF---which specifies how to safely iterate stack frames. As well as debugging and profiling, this information is also often used by C++ to unwind the stack when exceptions are thrown, so it is actually very often available with minimal overhead.

std.debug was aware of unwind information, and would sometimes use it instead of FP walking. Unfortunately, the implementation was rather lacklustre, and in particular, calling std.debug.captureStackTrace simply never used any unwind information (there was a TODO comment for this; the issue was that the implementation could only do it if it loaded the binary from disk, which is slow and usually unnecessary).

So, the original goal here was just to fix that issue: make captureStackTrace use proper unwind information if available. Thanks to @jacobly0 for kicking this work off. When I took it over, I ended up continuing some refactors he started, and... well, the branch snowballed a little from there, and... here we are.

The Big Fix

"Proper" unwinding (e.g. through DWARF unwind information) is now much better supported and less buggy. It is always the default path for stack unwinding. FP unwinding is still available, but it will only be used if the caller passes an option .allow_unsafe_unwind = true with the understanding that it could crash. aarch64-macos is a notable exception: on that target, Apple actually mandate that frame pointers are always used, so the FP unwinder is always used as a fallback since it is guaranteed to be safe.

Bugs notwithstanding, this means that things like std.heap.DebugAllocator are guaranteed to never crash. The downside is that if std.debug cannot use unwind information for the target, collected stack traces are empty, but std.debug has relatively good support for DWARF unwinding etc, and the information is available in the vast majority of cases (including in release builds).

Breaking API changes

The biggest user-facing change in this PR is that several std.debug APIs have changed; in particular, capturing and printing stack traces:

pub const StackUnwindOptions = struct {
    /// If not `null`, we will ignore all frames up until this return address. This is typically
    /// used to omit intermediate handling code (for instance, a panic handler and its machinery)
    /// from stack traces.
    first_address: ?usize = null,
    /// If not `null`, we will unwind from this `ThreadContext` instead of the current top of the
    /// stack. The main use case here is printing stack traces from signal handlers, where the
    /// kernel provides a `*const ThreadContext` of the state before the signal.
    context: ?ThreadContextPtr = null,
    /// If `true`, stack unwinding strategies which may cause crashes are used as a last resort.
    /// If `false`, only known-safe mechanisms will be attempted.
    allow_unsafe_unwind: bool = false,
};

pub fn captureCurrentStackTrace(options: StackUnwindOptions, addr_buf: []usize) StackTrace;
pub fn writeCurrentStackTrace(options: StackUnwindOptions, writer: *Writer, tty_config: tty.Config) Writer.Error!void;
pub fn dumpCurrentStackTrace(options: StackUnwindOptions) void;

pub fn writeStackTrace(st: *const StackTrace, writer: *Writer, tty_config: tty.Config) Writer.Error!void;
pub fn dumpStackTrace(st: *const StackTrace) void;

The functions regarding the "current" stack are all called <something>CurrentStackTrace and take StackUnwindOptions, while functions taking already-recorded traces are called <something>StackTrace and take *const StackTrace.
Here are "direct" migrations for all the old functions. Note that .{ .first_address = null } can be replaced with just .{}:

 OLD                                             NEW
dumpCurrentStackTrace(a)                        dumpCurrentStackTrace(.{ .first_address = a })
dumpCurrentStackTraceToWriter(a, w)             writeCurrentStackTrace(.{ .first_address = a }, w, tty_config)
dumpStackTraceFromBase(ctx, w)                  writeCurrentStackTrace(.{ .context = ctx }, w, tty_config)
captureStackTrace(a, &st)                       st = captureCurrentStackTrace(.{ .first_address = a }, addr_buf)
dumpStackTrace(st)                              dumpStackTrace(&st)
writeStackTrace(st, w, di, tty_config)          writeStackTrace(&st, w, tty_config)
writeCurrentStackTrace(w, di, tty_config, a)    writeCurrentStackTrace(.{ .first_address = a }, w, tty_config)

std.debug.StackIterator is no longer pub; it is an internal implementation detail. You probably just want captureCurrentStackTrace.
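As a rough usage sketch of the new API, assuming a Zig toolchain that already includes this PR and the signatures listed above (the helper `captureHere` is a hypothetical name introduced for illustration):

```zig
const std = @import("std");

/// Captures the current stack into `addr_buf` using only known-safe unwinders
/// (`.{}` leaves `first_address`, `context`, and `allow_unsafe_unwind` at
/// their defaults).
fn captureHere(addr_buf: []usize) std.builtin.StackTrace {
    return std.debug.captureCurrentStackTrace(.{}, addr_buf);
}

pub fn main() void {
    // The caller owns the buffer that receives the return addresses.
    var addr_buf: [32]usize = undefined;
    const st = captureHere(&addr_buf);

    // Already-recorded traces are passed by pointer.
    std.debug.dumpStackTrace(&st);

    // Explicitly opt in to frame-pointer walking as a last resort,
    // accepting that this may crash on some targets.
    std.debug.dumpCurrentStackTrace(.{ .allow_unsafe_unwind = true });
}
```

If no safe unwinding strategy is available for the target, the first trace may simply be empty rather than crashing, as described in "The Big Fix".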

std.debug.have_ucontext and std.debug.have_getcontext are also no longer pub (in fact, they no longer exist!). There was no good reason to inspect these, but if you do need them, you can just do feature checks on std.posix.system.

std.debug.SelfInfo has been completely reworked. Note that while this API is exposed, there is usually little reason to use it directly.

Debug info: visible enhancements

  • Crashes when collecting or printing stack traces are fixed (stack trace SEGV when linking libc and using debug_allocator within a thread #25025, as discussed above)
  • aarch64-macos DWARF unwinding has been fixed and therefore enabled (thanks Jacob for tracking this down!)
  • On ELF targets, the binary is no longer loaded from disk for unwind information if the .eh_frame_hdr section is available
  • On Windows, libraries loaded late using LoadLibrary will now appear correctly in stack traces
  • On DWARF targets (basically everything except Windows), inlined frames now use the correct function name in stack traces
  • On ELF targets, if DWARF info is unavailable, symbol names are still resolved from the symtab (work based on teach std.debug to convert addresses to ELF symbols #22077; thank you @leroycep!)

Debug info: internal changes

std.debug.SelfInfo has been completely reworked to be more modular and have a clearer separation of concerns. SelfInfo itself essentially just contains target-agnostic logic, of which there is fairly little. The target-specific logic is all in SelfInfo.Module, which is chosen at comptime to be, by default, one of ElfModule, DarwinModule, or WindowsModule (or void otherwise). The implementation is responsible for looking up the symbol and source location corresponding to a given address, as well as (optionally) providing a safe stack unwinding routine. Target-specific stack unwinding logic which was previously partly in std.debug itself is now in those Module implementations.

The type which was previously std.debug.Dwarf.ElfModule has been rewritten and turned into std.debug.ElfFile. It is a target-agnostic abstraction for loading DWARF debug and unwind info from an ELF file.

DWARF unwinding info is no longer handled by std.debug.Dwarf itself, but instead by the new std.debug.Dwarf.Unwind. The DWARF specification itself makes efforts to separate these two things by intentionally making the .debug_frame section independent (i.e. usable without loading any other debug info), and platforms typically facilitate this by loading the unwind info section (.eh_frame on Linux, __TEXT,__eh_frame on macOS) into memory without needing the binary to be loaded from disk.

Stack unwinding is generalised and does not necessarily correspond to DWARF unwinding. Windows' RtlVirtualUnwind function is simply the unwinding implementation provided by WindowsModule. On x86 (32-bit), where Windows' RtlVirtualUnwind function is unavailable, RtlCaptureStackBackTrace is no longer used, because it is just doing FP unwinding; see comments in std.debug.SelfInfo.WindowsModule for details.

std.debug.Dwarf has had many dependencies on the host target eliminated, so cross-target DWARF loading is much more possible than it previously was. Dwarf itself should have no dependencies on the host at all, though some nested namespaces which look at builtin.target currently remain.

Segfault handler

By default, std.debug installs a segfault handler which behaves similarly to the default panic handler, which we all agree is lovely. However, writing a custom segfault handler has always been a little annoying, because you had to copy all of the target-specific logic.

This PR allows you to keep std.Options.enable_segfault_handler as true, but override the OS-agnostic handler logic. The default implementation is std.debug.defaultHandleSegfault, but it can be overridden by exposing root.debug.handleSegfault. This is useful because it facilitates the use case of overriding the panic and segfault handlers to insert crash information before falling through to the normal behavior. An example of this is the compiler itself, where src/crash_report.zig has been massively simplified---its overrides now look like this:

/// We override the panic implementation to our own one, so we can print our own information before
/// calling the default panic handler. This declaration must be re-exposed from `@import("root")`.
pub const panic = if (dev.env == .bootstrap)
    std.debug.simple_panic
else
    std.debug.FullPanic(panicImpl);

/// We let std install its segfault handler, but we override the target-agnostic handler it calls,
/// so we can print our own information before calling the default segfault logic. This declaration
/// must be re-exposed from `@import("root")`.
pub const debug = struct {
    pub const handleSegfault = handleSegfaultImpl;
};

fn handleSegfaultImpl(addr: ?usize, name: []const u8, opt_ctx: ?std.debug.ThreadContextPtr) noreturn {
    @branchHint(.cold);
    dumpCrashContext() catch {};
    std.debug.defaultHandleSegfault(addr, name, opt_ctx);
}
fn panicImpl(msg: []const u8, first_trace_addr: ?usize) noreturn {
    @branchHint(.cold);
    dumpCrashContext() catch {};
    std.debug.defaultPanic(msg, first_trace_addr orelse @returnAddress());
}

Of course, it is still possible to set .enable_segfault_handler = false in your std.Options and do everything yourself.

Simpler freestanding debug info support

This blog post from Andrew in 2018 describes using Zig's standard library to implement support for nice stack traces when running an image on bare metal, just by shuffling some DWARF sections around and providing a few overrides. Unfortunately, this functionality had severely regressed since 2018.

This branch fixes that; once the target-specific bits of std.debug.SelfInfo were all factored out, it actually wasn't overly difficult. The process just consists of exposing a few declarations from root.debug. To test it, I updated the project in Andrew's blog post (ClashOS) to Zig master and implemented stack traces on it. I'm very happy with how this turned out; in my opinion, the new logic is even simpler. The implementation of the root.debug namespace is at https://zigbin.io/ff9fa0; with that, std.debug.writeCurrentStackTrace can be used. I'll upstream this to andrewrk/clashos after this PR is merged.

Here's what the output of that code looks like:

[image: output of the ClashOS stack trace code]

ELF linker fixes

This branch fixes two bugs I hit in the self-hosted ELF linker.

Firstly, we no longer incorrectly emit a DT_PLTGOT entry in the .dynamic section in static PIEs. This was causing crashes in std.dynamic_library when using our own dl_iterate_phdr implementation. Thanks to Jacob for helping to figure this one out.

Secondly, we no longer emit a bogus .eh_frame_hdr section when using self-hosted backends. It is not valid to emit an incomplete lookup table; unwinders expect that if the table is present, it is complete (the old SelfInfo implementation deviated from convention by not assuming that; the new one does make that assumption). Luckily, it is entirely compliant to simply omit the lookup table, and that could be a desirable long-term strategy in incremental binaries. For now, we generate the LUT if there is no ZigObject (i.e. if there is no Zig code or if we are using the LLVM backend with the self-hosted linker), and omit it (requiring std.debug.SelfInfo and other unwinders to build their own at runtime) if there is a ZigObject.

New test coverage

Previously, zig build test-stack-traces was really mainly testing error traces. The coverage was a bit lacklustre, and also included silly workarounds for some bugs.

I've split this into test-stack-traces and test-error-traces, and introduced a load of new tests to the former. I also rewrote their harnesses entirely. In brief:

  • test-stack-traces tests anything which does stack iteration, including panics, dumpCurrentStackTrace(.{}), and captureCurrentStackTrace(.{}). It will run through permutations of -fPIE, -lc, -fomit-frame-pointer, -fstrip, -funwind-tables; these are the things which affect (and hence could break!) stack unwinding. Everything is tested in Debug mode.
  • test-error-traces is tested in all optimize modes. It will enable error tracing for all of them by default, but in cases where LLVM's optimizations defeat the error trace, select configurations can be told to disable error tracing (in which case they merely confirm that the correct error is returned from root.main).

Follow-up work

There are some related things I couldn't get to in this branch. Where relevant, I'll file follow-up issues when this is merged.

  • The Mach-O linker currently does not emit __unwind_info entries when using self-hosted backends. This information is required for backtraces on these targets. The approach which is friendliest to incremental compilation would probably be to mark the whole address range of Zig functions as using "frame pointer unwinding". This would require the backend to always push frame base pointers (which at least the x86_64 backend already does!), and to save callee-saved registers at a known offset from the frame base; see this documentation.

  • The ELF linker also behaves incorrectly in terms of unwind information with self-hosted backends. The current behavior is to emit .eh_frame entries iff debug information is being emitted. That is not correct: unwind information is potentially necessary for correctness, and so is locked behind different flags. The correct behavior depends on whether debug info is included and on whether -fno-unwind-tables was given:

    • default: emit unwind entries into .eh_frame
    • -fstrip: emit unwind entries into .eh_frame
    • -fno-unwind-tables: emit unwind entries into .debug_frame
    • -fno-unwind-tables -fstrip: do not emit any unwind entries
  • As reported by Frame address is generally incorrect on windows and uefi #18662, stack unwinding on UEFI is unreliable, because PE environments tend to use a weird frame layout. Unwind information is required, which is what RtlVirtualUnwind is doing for us on Windows. We could have our own implementation of this for UEFI; we could also possibly make it perform better than RtlVirtualUnwind on Windows. The structure of unwind information in PE binaries is defined here.

  • Some test coverage is currently disabled; to restore it, we either need some more precise "strip" flags (related: add -fstrip=debug_info command line option #22591), or to un-regress zig objcopy (tracked by objcopy isn't sweaty #24522).

  • std.debug.Dwarf probably uses too much memory; for instance, my ClashOS thing needs a 4 megabyte FixedBufferAllocator just to load its own debug info. Most of that memory goes to the cached line number tables introduced in 1792258. I think we could probably find a nice middle ground which keeps source location resolution relatively fast whilst requiring less memory.

  • Overriding SelfInfo.Module.UnwindContext on freestanding targets currently doesn't allow you to easily use DWARF unwinding, because std.debug.Dwarf.abi.regBytes has hardcoded references to ThreadContext fields based on the target (and will just always error on freestanding). To fix this, that target-specific logic must be overridable somehow. As a part of this, I suspect that std.debug.Dwarf.abi should be combined into std.debug.SelfInfo.UnwindContext.
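
As a rough summary of the ELF unwind-entry rules listed in the follow-up items above (default, -fstrip, -fno-unwind-tables, and both together), the intended decision can be sketched as a small pure function. This is not linker code; `UnwindSection` and `chooseUnwindSection` are hypothetical names used only to restate the four cases.

```zig
const std = @import("std");

/// Where unwind entries for Zig code would be emitted (hypothetical enum).
const UnwindSection = enum { eh_frame, debug_frame, none };

/// Restates the four cases from the list: `strip` corresponds to -fstrip,
/// and `unwind_tables == false` corresponds to -fno-unwind-tables.
fn chooseUnwindSection(strip: bool, unwind_tables: bool) UnwindSection {
    // Unwind tables requested (the default): always `.eh_frame`, even with -fstrip.
    if (unwind_tables) return .eh_frame;
    // -fno-unwind-tables: the entries are debug-only, so -fstrip drops them.
    return if (strip) .none else .debug_frame;
}

pub fn main() void {
    const cases = [_]struct { strip: bool, unwind_tables: bool }{
        .{ .strip = false, .unwind_tables = true },
        .{ .strip = true, .unwind_tables = true },
        .{ .strip = false, .unwind_tables = false },
        .{ .strip = true, .unwind_tables = false },
    };
    for (cases) |c| {
        const section = chooseUnwindSection(c.strip, c.unwind_tables);
        std.debug.print("strip={} unwind_tables={} -> {s}\n", .{ c.strip, c.unwind_tables, @tagName(section) });
    }
}
```

The point of the split is that .eh_frame entries are treated as potentially required for correctness, while .debug_frame entries are debug information and therefore follow -fstrip.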

@mlugg added the breaking, standard library, and release notes labels Sep 13, 2025
@alexrp added the ci-riscv64-linux label Sep 13, 2025
@mlugg (Member Author) commented Sep 13, 2025

One merge blocker for this is making a decision on how std.debug should behave by default in compilations where debug info is stripped. On master, the default panic handler will refuse to print a stack trace in stripped builds by default, but that's not actually a particularly logical constraint: you can still get very helpful information from a stack trace even with debug information stripped! Currently, I think this branch will include all of the stack tracing logic even in -fstrip compilations. I think this is a valuable option---note #18520---but perhaps it's desirable for this logic to be omitted by default for reasons of binary size, I'm not sure.

@alexrp (Member) commented Sep 13, 2025

I think there should be an std.Options field controlling whether the stack trace code gets compiled into the default panic handler. The default initializer for that field could then be something like !builtin.strip_debug_info or builtin.unwind_tables != .none or !builtin.omit_frame_pointer.

@mlugg (Member Author) commented Sep 13, 2025

@Vexu note that unlike #20126, the "context" for CodeGen crashes added here is only the function name. If you're interested in adding the body index stuff so we can actually see the instruction which is crashing, feel free to push to this PR.

@Vexu (Member) commented Sep 13, 2025

The name of the function was the most useful part of that PR to me.

@xdBronch (Contributor) commented

> I think there should be an std.Options field controlling whether the stack trace code gets compiled into the default panic handler.

you mean like #19650 possibly? 😄
not being able to easily control whether this stuff is included sounds pretty unfortunate, it's not a trivial amount of code

@Khitiara commented

Note that one thing still missing here is support for using Dwarf-based unwinding with a custom ThreadContext. Despite the ability to override the ThreadContext, if an attempt is made to support DwarfUnwindContext etc on freestanding then Dwarf.abi will error due to a lack of posix ucontext and the only workaround as of the state of this PR is to duplicate DwarfUnwindContext completely

@mlugg (Member Author) commented Sep 14, 2025

@Khitiara, that's absolutely true---I tried to add unwinding support to ClashOS and quickly hit the same thing. I'll add that to my list of follow-up tasks. I suspect Dwarf.abi and SelfInfo.DwarfUnwindContext should just be rolled into one thing; perhaps Dwarf.SelfUnwind?

@Khitiara commented

@mlugg that makes a lot of sense to me, though honestly the abi is the only part likely to need much replaceability if I'm reading this right. either way worth adding to follow ups if not this PR

@rootbeer (Contributor) left a comment

Looks like an excellent PR to me. I like the comments you've added on the structure definitions, in particular.

@mlugg (Member Author) commented Sep 17, 2025

Stack tracing can now be enabled or disabled based on a flag in std.Options. Unlike #19650, it is a simple bool: this is because the third option in that PR (enabled, but printed stack traces do not include source snippets) can be easily implemented after this PR by overriding root.debug.printLineFromFile:

pub const debug = struct {
    pub fn printLineFromFile(w: *std.Io.Writer, sl: std.debug.SourceLocation) !void {
        _ = w;
        _ = sl;
        return error.Omitted; // returning any error from this function causes the cursor to not be printed
    }
};

I needed getcontext for some more targets, so ended up jumping down another (relatively shallow?) rabbit-hole and rewriting all of the ucontext_t and ThreadContext stuff. See the commit message for details, but in short, we now only deal with a new type std.debug.cpu_context.Native in the main stack unwinding logic, and this type is based only on the architecture (though it can be overridden).

@Khitiara, note that the above should solve the issue with DWARF unwinding on freestanding, and in fact it means you don't even need to implement your own getcontext equivalent! I've not tried this yet, but I'm going to do so momentarily.


After @kristoff-it showed me a macOS stack trace, I spent quite a while trying to improve it. There were various bugs, but there was also an issue which frankly I don't think was our fault surrounding how the dyld shared cache behaves... I don't love the way I had to solve it, but it's fine, it works at least.


I've been doing a little work on the side to port the Zig fuzzer to macOS, and as a part of that, I've realised that much of std.debug.SelfInfo.DarwinModule can be extracted into a host-agnostic abstraction similar to std.debug.ElfFile. I'm not going to do that in this branch, but it's another follow-up task.

Another follow-up task is dealing with inline calls in stack traces. This branch improves them a little so that the printed symbol name is now the inline callee, i.e. the name of the function we're seeing a source snippet from; but we still have the issue that inline callers are not visible. Making them show up is a job for the stack trace printing logic: one address might correspond to multiple frames in the printed trace. Figuring this out will require some (I suspect non-trivial) work on std.debug.Dwarf, so I'm leaving it as a follow-up task.


oh come onnn, how did windows ci fail alreadyyyy (EDIT: it failed because i am stupid. don't look at that diff it's embarrassing)

@mlugg force-pushed the capture-stack branch 3 times, most recently from f024e28 to c045a0d September 18, 2025 13:14
@rootbeer (Contributor) left a comment

Looks even better now! Thanks for removing the getcontext dependency! I think you've now also fixed #23801 and #23494.

Feel free to ignore my comments if you get CI passing, nothing here that can't be done later (by other folks even).


const regNative = std.debug.SelfInfo.DwarfUnwindContext.regNative;

const ip_reg_num = std.debug.Dwarf.ipRegNum(native_arch).?;

@rootbeer (Contributor)

I know I've seen some comments around the Dwarf code about how it should be possible in the future to decode non-native Dwarf. If so, maybe these constants should be decoded closer to where they're used so they can become target-specific in the future? Vs. building in more native-host assumptions?

All that said, I'd much rather see these changes get merged as-is, and leave the cross-target debugging for future work...

@mlugg (Member Author)

FWIW, this PR already makes most of std.debug.Dwarf not depend on the host. I think Dwarf.expression is the only remaining part of it with a target dependency.

I agree that the target dependency should probably be eliminated, but this file was already pretty deeply tied to stuff like native usize, so I just allowed myself to continue the pattern for now so I could get the branch ready ASAP. I'll open a follow-up issue after merge.

.linux => std.os.linux.ucontext_t,
.emscripten => std.os.emscripten.ucontext_t,
.freebsd => std.os.freebsd.ucontext_t,
.macos, .ios, .tvos, .watchos, .visionos => extern struct {

@rootbeer (Contributor)

These ucontext_t/siginfo_t structs seem like they should be defined over in std.os.*? Not something to hold up this change for, so maybe just add a comment to that effect? (If you even agree it's a good idea.)

Ah after reading farther through this diff... I see you've moved this from std.c. This location is better than std.c. One subtle implication, though, is that you're pulling in the C library sigset_t references, which might not be quite right? For example, on Linux, the kernel has a smaller signal state mask than glibc. I think the BSDs keep them consistent so in practice this is all probably fine as-is. But I think these definitions should eventually change to more clearly represent the OS ABI vs the C ABI.

@mlugg (Member Author)

Yeah, I haven't yet bothered to actually simplify these structures to the kernel ones, which are what we should be using here. I'll open a follow-up issue for that.

.s390x => @import("linux/s390x.zig"),
else => struct {
pub const ucontext_t = void;
pub const getcontext = {};

@rootbeer (Contributor)

You've fixed #23494

};
};

/// For stack trace tests, we only test native, because external executors are pretty unreliable at

@rootbeer (Contributor)

In my experience qemu has been pretty solid for stack traces in non-native tests. Are there particular architectures you've seen problems with? (I was running a variation on standalone/stack_iterator/unwind.zig.)

@mlugg (Member Author)

The worst offender is Wine. I thought I remembered having some issues with QEMU too, but I might be mistaken. Rosetta 2 also isn't the most reliable thing ever, though it does seem to be working okay on this branch (sans the last frame of all traces).

@mlugg enabled auto-merge September 19, 2025 00:24
@andrewrk (Member) commented Sep 19, 2025

x86_64-linux-release appears to be in an infinite loop. Attaching gdb and getting a stack trace, all threads are in a thread pool waiting for a task except main thread:

Thread 1 (LWP 3893551 "zig"):
#0  0x0000000001bcde26 in debug.Dwarf.Unwind.VirtualMachine.runTo (self=0x7ffc9bb857a8, gpa=..., pc=34007545, cie=..., fde=..., addr_size_bytes=8 '\b', endian=little)
#1  0x0000000001bd04a7 in debug.SelfInfo.DwarfUnwindContext.unwindFrameInner (context=0x7ffc9bb85708, gpa=..., unwind=0x7fe5ff653548, load_offset=0, explicit_fde_offset=...)
#2  0x0000000001bd1c32 in debug.SelfInfo.DwarfUnwindContext.unwindFrame (context=0x7ffc9bb85708, gpa=..., unwind=0x7fe5ff653548, load_offset=0, explicit_fde_offset=...)
#3  0x0000000001b83fcf in debug.SelfInfo.ElfModule.unwindFrame (module=0x7ffc9bb84d18, gpa=..., di=0x7fe5ff6532c0, context=0x7ffc9bb85708)
#4  0x0000000001b46fd5 in debug.SelfInfo.unwindFrame (self=0x6e92d00 <debug.getSelfDebugInfo.S.self_info>, gpa=..., context=0x7ffc9bb85708)
#5  0x0000000001b09422 in debug.StackIterator.next (it=0x7ffc9bb85708)
#6  0x0000000001ef3216 in debug.captureCurrentStackTrace (options=..., addr_buf=...)
#7  0x000000000207064d in heap.debug_allocator.DebugAllocator(.{ .stack_trace_frames = 4, .enable_memory_limit = false, .safety = true, .thread_safe = true, .MutexType = null, .never_unmap = false, .retain_metadata = false, .verbose_log = false, .backing_allocator_zeroes = true, .resize_stack_traces = false, .canary = 10534666094765928719, .page_size = 131072 }).collectStackTrace (first_trace_addr=43583353, addr_buf=0x7fe3c5f1dcc0)
#8  0x0000000001eefcd1 in heap.debug_allocator.DebugAllocator(.{ .stack_trace_frames = 4, .enable_memory_limit = false, .safety = true, .thread_safe = true, .MutexType = null, .never_unmap = false, .retain_metadata = false, .verbose_log = false, .backing_allocator_zeroes = true, .resize_stack_traces = false, .canary = 10534666094765928719, .page_size = 131072 }).BucketHeader.captureStackTrace (bucket=0x7fe3c5f1c498, ret_addr=43583353, slot_count=226, slot_index=85, trace_kind=alloc)
#9  0x0000000001dca77e in heap.debug_allocator.DebugAllocator(.{ .stack_trace_frames = 4, .enable_memory_limit = false, .safety = true, .thread_safe = true, .MutexType = null, .never_unmap = false, .retain_metadata = false, .verbose_log = false, .backing_allocator_zeroes = true, .resize_stack_traces = false, .canary = 10534666094765928719, .page_size = 131072 }).alloc (context=0x6e94330 <main.debug_allocator>, len=304, alignment=4, ret_addr=43583353)
#10 0x0000000001ba8d7f in mem.Allocator.allocBytesWithAlignment__anon_33050 (self=..., alignment=4, byte_count=304, return_address=43583353)
#11 0x000000000206e9fa in mem.Allocator.allocWithSizeAndAlignment__anon_184046 (self=..., size=4, alignment=4, n=76, return_address=43583353)
#12 0x0000000002fc3eae in mem.Allocator.alloc__anon_304717 (self=..., T=<optimized out>, n=76)
#13 0x0000000002990779 in Sema.InstMap.ensureSpaceForInstructions (map=0x7ffc9bb89448, allocator=..., insts=...)
#14 0x0000000004e0f087 in Sema.analyzeCall (sema=0x7ffc9bbdb1c8, block=0x7ffc9bba4a08, callee=96401, func_ty=..., func_src=..., call_src=..., modifier=auto, ensure_result_used=false, args_info=..., call_dbg_node=..., operation=call)
#15 0x00000000045b0671 in Sema.zirCall__anon_649700 (sema=0x7ffc9bbdb1c8, block=0x7ffc9bba4a08, inst=75161, kind=direct)
#16 0x0000000003ad9bb3 in Sema.analyzeBodyInner (sema=0x7ffc9bbdb1c8, block=0x7ffc9bba4a08, body=...)
#17 0x0000000003b1fa9e in Sema.analyzeInlineBody (sema=0x7ffc9bbdb1c8, block=0x7ffc9bba4a08, body=..., break_target=75173)
#18 0x0000000002fbb3a3 in Sema.resolveInlineBody (sema=0x7ffc9bbdb1c8, block=0x7ffc9bba4a08, body=..., break_target=75173)
#19 0x0000000004e12d4d in Sema.analyzeCall (sema=0x7ffc9bbdb1c8, block=0x7ffc9bbdbba0, callee=525951, func_ty=..., func_src=..., call_src=..., modifier=auto, ensure_result_used=false, args_info=..., call_dbg_node=..., operation=call)
#20 0x00000000045b0671 in Sema.zirCall__anon_649700 (sema=0x7ffc9bbdb1c8, block=0x7ffc9bbdbba0, inst=209, kind=direct)
#21 0x0000000003ad9bb3 in Sema.analyzeBodyInner (sema=0x7ffc9bbdb1c8, block=0x7ffc9bbdbba0, body=...)
#22 0x0000000003b1fa9e in Sema.analyzeInlineBody (sema=0x7ffc9bbdb1c8, block=0x7ffc9bbdbba0, body=..., break_target=205)
#23 0x0000000002fbb3a3 in Sema.resolveInlineBody (sema=0x7ffc9bbdb1c8, block=0x7ffc9bbdbba0, body=..., break_target=205)
#24 0x000000000592acb3 in Sema.CallArgsInfo.analyzeArg (cai=..., sema=0x7ffc9bbdb1c8, block=0x7ffc9bbdbba0, arg_index=1, maybe_param_ty=..., func_ty_info=..., func_inst=96409, maybe_func_src_inst=...)
#25 0x0000000004e1123d in Sema.analyzeCall (sema=0x7ffc9bbdb1c8, block=0x7ffc9bbdbba0, callee=96409, func_ty=..., func_src=..., call_src=..., modifier=auto, ensure_result_used=false, args_info=..., call_dbg_node=..., operation=call)
#26 0x00000000045b0671 in Sema.zirCall__anon_649700 (sema=0x7ffc9bbdb1c8, block=0x7ffc9bbdbba0, inst=205, kind=direct)
#27 0x0000000003ad9bb3 in Sema.analyzeBodyInner (sema=0x7ffc9bbdb1c8, block=0x7ffc9bbdbba0, body=...)
#28 0x0000000002fb1cb2 in Sema.analyzeFnBody (sema=0x7ffc9bbdb1c8, block=0x7ffc9bbdbba0, body=...)
#29 0x0000000002fb6b7c in Zcu.PerThread.analyzeFnBodyInner (pt=..., func_index=540778)
#30 0x0000000002987a17 in Zcu.PerThread.analyzeFuncBody (pt=..., func_index=540778)
#31 0x0000000002397eda in Zcu.PerThread.ensureFuncBodyUpToDate (pt=..., func_index=540778)
#32 0x0000000004741f77 in Sema.resolveInferredErrorSet (sema=0x7ffc9bbf0528, block=0x7ffc9bbf0f00, src=..., ies_index=540780)
#33 0x000000000470e36d in Sema.analyzeIsNonErrComptimeOnly (sema=0x7ffc9bbf0528, block=0x7ffc9bbf0f00, src=..., operand=2147483668)
#34 0x000000000470af8e in Sema.zirTry (sema=0x7ffc9bbf0528, parent_block=0x7ffc9bbf0f00, inst=390)
#35 0x0000000003aed138 in Sema.analyzeBodyInner (sema=0x7ffc9bbf0528, block=0x7ffc9bbf0f00, body=...)
#36 0x0000000002fb1cb2 in Sema.analyzeFnBody (sema=0x7ffc9bbf0528, block=0x7ffc9bbf0f00, body=...)
#37 0x0000000002fb6b7c in Zcu.PerThread.analyzeFnBodyInner (pt=..., func_index=540774)
#38 0x0000000002987a17 in Zcu.PerThread.analyzeFuncBody (pt=..., func_index=540774)
#39 0x0000000002397eda in Zcu.PerThread.ensureFuncBodyUpToDate (pt=..., func_index=540774)
#40 0x0000000004741f77 in Sema.resolveInferredErrorSet (sema=0x7ffc9bc05888, block=0x7ffc9bc06260, src=..., ies_index=540776)
#41 0x000000000470e36d in Sema.analyzeIsNonErrComptimeOnly (sema=0x7ffc9bc05888, block=0x7ffc9bc06260, src=..., operand=2147492092)
#42 0x000000000470af8e in Sema.zirTry (sema=0x7ffc9bc05888, parent_block=0x7ffc9bc06260, inst=18472)
#43 0x0000000003aed138 in Sema.analyzeBodyInner (sema=0x7ffc9bc05888, block=0x7ffc9bc06260, body=...)
#44 0x0000000002fb1cb2 in Sema.analyzeFnBody (sema=0x7ffc9bc05888, block=0x7ffc9bc06260, body=...)
#45 0x0000000002fb6b7c in Zcu.PerThread.analyzeFnBodyInner (pt=..., func_index=530450)
#46 0x0000000002987a17 in Zcu.PerThread.analyzeFuncBody (pt=..., func_index=530450)
#47 0x0000000002397eda in Zcu.PerThread.ensureFuncBodyUpToDate (pt=..., func_index=530450)
#48 0x0000000004741f77 in Sema.resolveInferredErrorSet (sema=0x7ffc9bc1abe8, block=0x7ffc9bc1b5c0, src=..., ies_index=530452)
#49 0x000000000470e36d in Sema.analyzeIsNonErrComptimeOnly (sema=0x7ffc9bc1abe8, block=0x7ffc9bc1b5c0, src=..., operand=2147483669)
#50 0x000000000470af8e in Sema.zirTry (sema=0x7ffc9bc1abe8, parent_block=0x7ffc9bc1b5c0, inst=75205)
#51 0x0000000003aed138 in Sema.analyzeBodyInner (sema=0x7ffc9bc1abe8, block=0x7ffc9bc1b5c0, body=...)
#52 0x0000000002fb1cb2 in Sema.analyzeFnBody (sema=0x7ffc9bc1abe8, block=0x7ffc9bc1b5c0, body=...)
#53 0x0000000002fb6b7c in Zcu.PerThread.analyzeFnBodyInner (pt=..., func_index=525950)
#54 0x0000000002987a17 in Zcu.PerThread.analyzeFuncBody (pt=..., func_index=525950)
#55 0x0000000002397eda in Zcu.PerThread.ensureFuncBodyUpToDate (pt=..., func_index=525950)
#56 0x000000000216a08a in Compilation.processOneJob (tid=0, comp=0x7fe50f020010, job=...)
#57 0x0000000001fa26cd in Compilation.performAllTheWork (comp=0x7fe50f020010, main_progress_node=...)
#58 0x0000000001e7ecaa in Compilation.update (comp=0x7fe50f020010, main_progress_node=...)
#59 0x0000000001e94131 in main.updateModule (comp=0x7fe50f020010, color=auto, prog_node=...)
#60 0x0000000001d65020 in main.buildOutputType (gpa=..., arena=..., all_args=..., arg_mode=...)
#61 0x0000000001dc7614 in main.mainArgs (gpa=..., arena=..., args=...)
#62 0x0000000001cde28b in main.main ()

This reminds me, it would be really nice if unit test timeouts (#19821) handled a timeout by sending a signal to the child process. That would give it a chance to dump a stack trace, which would be quite handy for finding out when something is looping infinitely.

I detached the debugger, allowing the CI job to run all the way to timeout.

@mlugg
Member Author

mlugg commented Sep 19, 2025

Huh, fascinating---thanks for grabbing that trace @andrewrk. I'll see if I can figure anything out today.

mlugg and others added 28 commits September 27, 2025 11:31
The input path could be cwd-relative, in which case it must be modified
before it is written into the batch script.

Also, remove usage of deprecated `GeneralPurposeAllocator` alias, rename
`allocator` to `gpa`, use unmanaged `ArrayList`.
This only matters if `callMain` is called by a user, since `std.start`
will never itself call `callMain` when `target.os.tag == .other`.
However, it *is* a valid use case for a user to call
`std.start.callMain` in their own startup logic, so this makes sense.
Our usage of `ucontext_t` in the standard library was kind of
problematic. We unnecessarily mimicked libc-specific structures, and our
`getcontext` implementation was overkill for our use case of stack
tracing.

This commit introduces a new namespace, `std.debug.cpu_context`, which
contains "context" types for various architectures (currently x86,
x86_64, ARM, and AARCH64) containing the general-purpose CPU registers;
the ones needed in practice for stack unwinding. Each implementation has
a function `current` which populates the structure using inline
assembly. The structure is user-overrideable, though that should only be
necessary if the standard library does not have an implementation for
the *architecture*: that is to say, none of this is OS-dependent.

Of course, in POSIX signal handlers, we get a `ucontext_t` from the
kernel. The function `std.debug.cpu_context.fromPosixSignalContext`
converts this to a `std.debug.cpu_context.Native` with a big ol' target
switch.

This functionality is not exposed from `std.c` or `std.posix`, and
neither are `ucontext_t`, `mcontext_t`, or `getcontext`. The rationale
is that these types and functions do not conform to a specific ABI, and
in fact tend to get updated over time based on CPU features and
extensions; in addition, different libcs use different structures which
are "partially compatible" with the kernel structure. Overall, it's a
mess, but all we need is the kernel context, so we can just define a
kernel-compatible structure as long as we don't claim C compatibility by
putting it in `std.c` or `std.posix`.

This change resulted in a few nice `std.debug` simplifications, but
nothing too noteworthy. However, the main benefit of this change is that
DWARF unwinding---sometimes necessary for collecting stack traces
reliably---now requires far less target-specific integration.

Also fix a bug I noticed in `PageAllocator` (I found this due to a bug
in my distro's QEMU distribution; thanks, broken QEMU patch!) and I
think a couple of minor bugs in `std.debug`.

Resolves: ziglang#23801
Resolves: ziglang#23802
Mostly on macOS, since Loris showed me a not-great stack trace, and I
spent 8 hours trying to make it better. The dyld shared cache is
designed in a way which makes this really hard to do right, and
documentation is non-existent, but this *seems* to work pretty well.
I'll leave the ruling on whether I did a good job to CI and our users.
This option disables both capturing and printing stack traces. The
default is to disable if debug info is stripped.
This crash exists on master, and seems to have existed since 2019; I
think it's just very rare and depends on the exact binary generated. In
theory, a stream block should always be a "data" block rather than an FPM
block; the FPMs use blocks `1, 4097, 8193, ...` and `2, 4098, 8194, ...`
respectively. However, I have observed LLVM emitting an otherwise valid
PDB which maps FPM blocks into streams. This is not a bug in
`std.debug.Pdb`, because `llvm-pdbutil` agrees with our stream indices.
I think this is arguably an LLVM bug; however, we don't really lose
anything from just weakening this check. To be fair, MSF doesn't have an
explicit specification, and LLVM's documentation (which is the closest
thing we have) does not explicitly state that FPM blocks cannot be
mapped into streams, so perhaps this is actually valid.

In the rare case that LLVM emits this, previously, stack traces would
have been completely useless; now, stack traces will work okay.
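
To make the block numbering above concrete, here is a minimal sketch of the check being weakened. It is not the `std.debug.Pdb` code: the function names are hypothetical, and the block size of 4096 is an assumption chosen because it matches the block numbers quoted above.

```python
# Assumed MSF block size; 4096 matches the block numbers quoted above.
BLOCK_SIZE = 4096


def is_fpm_block(block_index: int, block_size: int = BLOCK_SIZE) -> bool:
    """Return True if the block is reserved for one of the two free page maps."""
    # The two FPMs occupy blocks 1 and 2 of every interval of `block_size` blocks.
    return block_index % block_size in (1, 2)


def check_stream_blocks(blocks, strict=False):
    """Return the FPM blocks referenced by a stream (hypothetical helper).

    In strict mode any such block is treated as corruption; in lenient mode
    the blocks are only reported, so symbol lookup can carry on.
    """
    suspicious = [b for b in blocks if is_fpm_block(b)]
    if strict and suspicious:
        raise ValueError(f"stream maps FPM blocks: {suspicious}")
    return suspicious
```

With `strict=False`, a stream that maps block 4098 is reported rather than rejected, which is the lenient behaviour this commit describes.
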
...and just deal with signal handlers by adding 1 to create a fake
"return address". The system I tried out where the addresses returned by
`StackIterator` were pre-subtracted didn't play nicely with error
traces, which in hindsight, makes perfect sense. This definition also
removes some ugly off-by-one issues in matching `first_address`, so I do
think this is a better approach.
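
As a rough illustration of the convention described here (the iterator yields return addresses unchanged and the consumer subtracts 1 at lookup time), consider the sketch below. Both function names are hypothetical and none of this is taken from `std.debug`.

```python
def lookup_address(return_address: int) -> int:
    """Address to use for symbol/line lookup of a frame's return address."""
    # A return address points just past the call instruction, so step back
    # one byte to land inside the call itself.
    return return_address - 1


def address_for_signal_frame(faulting_pc: int) -> int:
    """Turn the exact PC from a signal context into a fake return address."""
    # Adding 1 means the uniform "subtract 1" in lookup_address recovers
    # the faulting instruction itself.
    return faulting_pc + 1
```

Under this scheme every address handed out can be treated the same way, whether it came from a real call or from a signal context.
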
Processes should reasonably be able to expect their children to abort
with typical exit codes, rather than a debugger breakpoint signal. This
flag in the PEB is what would be checked by `IsDebuggerPresent` in
kernel32, which is the function you would typically use for this
purpose.

This fixes `test-stack-trace` failures on Windows, as these tests were
expecting exit code 3 to indicate abort.
Calling `current` here causes compilation failures as the C backend
currently does not emit valid MSVC inline assembly. This change means
that when building for MSVC with the self-hosted C backend, only FP
unwinding can be used.
This has been a TODO for ages, but in the past it didn't really matter
because stack traces are typically printed to stderr for which a mutex
is held so in practice there was a mutex guarding usage of `SelfInfo`.

However, now that `SelfInfo` is also used for simply capturing traces,
thread safety is needed. Instead of just a single mutex, though, there
are a couple of different mutexes involved; this helps make critical
sections smaller, particularly when unwinding the stack as `unwindFrame`
doesn't typically need to hold any lock at all.
…cified

This logic was causing some occasional infinite looping on ARM, where
the `.debug_frame` section is often incomplete since the `.exidx`
section is used for unwind information. But the information we're
getting from the compiler is totally *valid*: it's leaving the rule as
the default, which is (as with most architectures) equivalent to
`.undefined`!
...just in case there is broken debug info and/or bad values on the
stack, either of which could cause stack unwinding to potentially loop
forever.
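
The commit text does not say which guard was added. As an assumption-laden sketch, two common guards are a hard cap on the number of frames and a requirement that each frame sit at a strictly higher address than the previous one (true of a downward-growing stack). `walk_frames`, `next_frame`, and `max_frames` are hypothetical names.

```python
def walk_frames(next_frame, start, max_frames=1024):
    """Collect frame addresses, stopping on a non-increasing step or at the cap."""
    frames = []
    current = start
    while current is not None and len(frames) < max_frames:
        frames.append(current)
        following = next_frame(current)
        # On a downward-growing stack, each caller's frame sits at a higher
        # address; anything else suggests bad debug info or stack garbage.
        if following is not None and following <= current:
            break
        current = following
    return frames
```

Either guard alone is enough to guarantee termination; the address check also stops early on obviously corrupt chains.
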
We need to parse the `.ARM.exidx` section to be able to reliably unwind
the stack on ARM.
This was causing a zig2 miscomp, which emitted slightly broken debug
information, which caused extremely slow stack unwinding. We're working
on fixing or reporting this upstream, but we can use this workaround for
now, because GCC guarantees arithmetic signed shift.
By my estimation, these changes speed up DWARF unwinding when using the
self-hosted x86_64 backend by around 7x. There are two very significant
enhancements: we no longer iterate frames which don't fit in the stack
trace buffer, and we cache register rules (in a fixed buffer) to avoid
re-parsing and evaluating CFI instructions in most cases. Alongside this
are a bunch of smaller enhancements, such as pre-caching the result of
evaluating the CIE's initial instructions, avoiding re-parsing of CIEs,
and big simplifications to the `Dwarf.Unwind.VirtualMachine` logic.
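
The register-rule cache can be pictured roughly as follows. This is only a sketch of a fixed-size, direct-mapped cache keyed by program counter. The class name, slot count, and keying scheme are assumptions, not the layout used by `Dwarf.Unwind.VirtualMachine`.

```python
class RuleCache:
    """Direct-mapped cache from program counter to already-evaluated register rules."""

    def __init__(self, slots=64):
        # Fixed size: nothing is allocated after construction.
        self.slots = [None] * slots

    def get(self, pc, compute):
        index = pc % len(self.slots)
        entry = self.slots[index]
        if entry is not None and entry[0] == pc:
            return entry[1]  # hit: skip re-running the CFI program
        rules = compute(pc)  # miss: evaluate, then overwrite the slot
        self.slots[index] = (pc, rules)
        return rules
```

Because the same call sites tend to recur across traces, repeated lookups for a given `pc` only pay for CFI evaluation once until the slot is evicted.
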
Apparently the `__eh_frame` in Mach-O binaries doesn't include the
terminator entry, but in all other respects it acts like `.eh_frame`
rather than `.debug_frame`. I have no idea.
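
A parser that tolerates the missing terminator might look like the sketch below. It assumes little-endian, 32-bit length prefixes and leaves the 64-bit extended-length form unhandled; `iter_entries` is a hypothetical name.

```python
import struct


def iter_entries(section: bytes):
    """Yield (offset, body) for each length-prefixed entry in an .eh_frame-style section."""
    offset = 0
    while offset + 4 <= len(section):
        (length,) = struct.unpack_from("<I", section, offset)
        if length == 0:
            return  # explicit terminator entry
        if length == 0xFFFFFFFF:
            raise NotImplementedError("64-bit DWARF lengths are not handled in this sketch")
        end = offset + 4 + length
        if end > len(section):
            raise ValueError("entry runs past end of section")
        yield offset, section[offset + 4:end]
        offset = end
    # Reaching the end of the section without a terminator is accepted,
    # which is the situation described for Mach-O `__eh_frame`.
```

The same loop handles both cases: a zero length ends iteration early, and running out of bytes ends it without error.
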
@mlugg
Member Author

mlugg commented Sep 27, 2025

we were so close to greatness, but there was a two character typo

Labels
breaking Implementing this issue could cause existing code to no longer compile or have different behavior. ci-riscv64-linux Pull requests with this label applied will have riscv64-linux jobs scheduled. release notes This PR should be mentioned in the release notes. standard library This issue involves writing Zig code for the standard library.
Development

Successfully merging this pull request may close these issues.

stack trace SEGV when linking libc and using debug_allocator within a thread
9 participants