Skip to content

Conversation

@zavitax
Copy link
Contributor

@zavitax zavitax commented Oct 13, 2025

Description

This PR modifies the canvases vector to be
represented as a map of obs_canvas_t* to a
shared pointer to OBS::Canvas.

This will preserve both the OBS::Canvas manager
instance, and pointer to obs_canvas_t* for
reference by obs_frontend_canvas_remove(),
ensuring that the right objects will be used
when called upon.

Motivation and Context

When the obs_frontend_remove_canvas is called,
the destructor of OBS::Canvas calls
obs_canvas_release() on an object which is
both allocated on the heap and added as reference
to the canvases vector causing a double deallocation.

This causes the canvases collection on the main
window to become corrupt, and crash the next time
you try to add another frontend canvas.

This effectively prevents the canvases collection
from being managed properly by plug-ins.

How Has This Been Tested?

Ran obs_frontend_add_canvas() followed by obs_frontend_remove_canvas() followed by obs_frontend_add_canvas() before and after the change.

Before change: crash
After change: smooth operation

Types of changes

  • Bug fix (non-breaking change which fixes an issue)

Checklist:

  • My code has been run through clang-format.
  • I have read the contributing document.
  • My code is not on the master branch.
  • The code has been tested.
  • All commit messages are properly formatted and commits squashed where appropriate.
  • I have included updates to all appropriate documentation.

@zavitax zavitax force-pushed the frontend_canvas_remove_fix branch 3 times, most recently from e1734c5 to f460a38 Compare October 13, 2025 13:31
@Warchamp7 Warchamp7 requested review from PatTheMav and derrod October 13, 2025 14:29
@Warchamp7 Warchamp7 added the Bug Fix Non-breaking change which fixes an issue label Oct 13, 2025
The `obs_frontend_remove_canvas` is called, the
destructor of OBS::Canvas calls
obs_canvas_release() on an object which is
both allocated on the heap and used as reference.

This causes the `canvases` collection on the main
window to become corrupt, and crash the next time
you try to add another frontend canvas.

This PR modifies the canvases vector to be
represented as a map of obs_canvas_t* to a
shared pointer to OBS::Canvas.

This will preserve both the OBS::Canvas manager
instance, and pointer to obs_canvas_t* for
reference by obs_frontend_canvas_remove(),
ensuring that the right objects will be used
when called upon.
@zavitax zavitax force-pushed the frontend_canvas_remove_fix branch from f460a38 to fdb9420 Compare October 14, 2025 09:13
@PatTheMav
Copy link
Member

Could you please provide a minimal reproducer for us (e.g. a small plugin that does nothing else but cause the issue)?

At first glance I don't think that entangling two different reference counting systems (canvases themselves and std::shared_ptr) is necessarily the right approach.


So the core issue (if I understand this right), is that when obs_frontend_remove_canvas is called, the canvas is removed from its owning std::vector and thus Canvas::~Canvas is invoked.

The destructor itself marks the canvas object as removed and reduces its strong reference count, which seems correct in concept, however I don't see the reference count explicitly being incremented when the Canvas instance is created (we just type-pun an obs_canvas_t pointer into a Canvas which is also unnecessarily hard to read/understand).

@derrod Is this possibly just an unbalanced retain/release pair?

@derrod
Copy link
Member

derrod commented Oct 23, 2025

@derrod Is this possibly just an unbalanced retain/release pair?

I think it is, but in a different way than you might expect.

The issue, I think, is the signal handling: canvases.erase(canvas_it); in OBSBasic::RemoveCanvas() will immediately call Canvas::~Canvas() which will in turn call obs_canvas_remove() and cause the OBSBasic::CanvasRemoved() signal handler to recursively invoke OBSBasic::RemoveCanvas() which will attempt to erase the same element again, which is probably all kinds of UB.

I think there are at least three different ways we can deal with that, but I don't particularly like any of them right now:

  1. Change to a queued connection to delay the call to OBSBasic::RemoveCanvas() until the previous one has finished, which will avoid the recursion and find the vector in a state where the object has already been removed:
	QMetaObject::invokeMethod(static_cast<OBSBasic *>(data), "RemoveCanvas", Qt::QueuedConnection,
				  Q_ARG(OBSCanvas, OBSCanvas(canvas)));

But that feels a bit brittle.

  1. Change the Canvas::~Canvas() destructor's order of operations
	obs_canvas_t *removed = canvas;
	canvas = nullptr;
	obs_canvas_remove(removed);
	obs_canvas_release(removed);

By setting canvas to nullptr in before calling obs_canvas_remove()/obs_canvas_release() any iteration over frontend canvases that may be happening in signal handlers will no longer find the object by the obs_canvas_t * it used to hold.

This seems like perhaps the safest option?

  1. Move to a temporary object before erasing

We can add one line to OBSBasic::RemoveCanvas() like so:

		OBS::Canvas tmp = std::move(*canvas_it);

This will ensure that the destructor called during .erase() has canvas == nullptr and will not fire the signal handlers, and the signal handlers will then only be called once the temporary object goes out of scope and gets destroyed. This however, also feels a bit ugly and hard to comprehend.

@PatTheMav
Copy link
Member

PatTheMav commented Oct 24, 2025

Ok, so I get this right:

  • OBSStudioAPI::obs_frontend_remove_canvas calls OBSBasic::RemoveCanvas
  • OBSBasic::RemoveCanvas effectively triggers Canvas::~Canvas
  • Canvas::~Canvas calls obs_canvas_remove
  • obs_canvas_remove signals "canvas_remove"
  • OBSBasic has subscribed to "canvas_remove" and calls OBSBasic::CanvasRemoved
  • OBSBasic::CanvasRemoved invokes OBSBasic::RemoveCanvas

First naive question is why the frontend listens to canvas removal events if it was itself the cause of the removal - if it weren't for the double destruction this would have potentially led to an endless loop, so it's obviously an architectural issue and the suggested changes just paper over the core issue.


The current design suggests that libobs is the owner of canvases, its collection is the "source of truth". And that's why the signal exists, because the collection can be mutated without any participation of the frontend, so it needs to be informed of those changes.

So in that sense it seems wrong to me that OBSStudioAPI::obs_frontend_remove_canvas calls OBSBasic::RemoveCanvas. It should call obs_canvas_remove instead, which triggers the removal at the "source of truth" and which is then bubbled up into the hierarchy:

  • obs_canvas_remove signals "canvas_remove"
  • OBSBasic has subscribed to "canvas_remove" and calls OBSBasic::CanvasRemoved
  • OBSBasic::CanvasRemoved invokes OBSBasic::RemoveCanvas
  • OBSBasic::RemoveCanvas effectively triggers Canvas::~Canvas
  • Canvas::~Canvas cleans up after itself and just releases its strong reference to the underlying canvas

That sounds like the more correct fix to me, because it triggers the canvas removal at the "source of truth".


What this also suggests to me is that having this implemented in the "frontend API" is wrong - canvases are not a frontend concern, but (as implemented) a libobs concern. Otherwise the frontend would be the owner of the collection and would not need to emit a libobs signal (which could then ouroboros back into frontend code).

@zavitax
Copy link
Contributor Author

zavitax commented Oct 24, 2025

To be completely honest, it feels to me that adding a flag which would mark an obs_canvas as a FRONTEND canvas would be the more correct solution overall:

  • It would eliminate the need for keeping a separate canvases collection in the frontend
  • Would also enable one to expose an EPHEMERAL canvas to the OBS frontend
  • Would not force having to store canvases in the scene collection to be available to the frontend (a decision I would also probably challenge)
  • The missing part for the point above is the ability to specify canvas UUIDs on creation

The solution I suggested assumes that you would want to retain as much of the existing code as possible, that more in-depth refactoring of OBS::Canvas would not be welcome, and that clarity and comprehensibility is preferred over (non-critical) optimization in this particular code path.

I was pretty challenging to wrap my head around what was going on in obs_frontend_add_canvas and why obs_frontend_remove_canvas causes corruption, which is not very healthy in a code base of any size, especially in a large one like OBS.

I usually stick to the approach "clarity first, optimization later", which works for most code out of the hot path.

@PatTheMav
Copy link
Member

The solution I suggested assumes that you would want to retain as much of the existing code as possible

While I appreciate that notion, that is precisely what we want much less of in the code base - if something is broken at a core level, it needs to be fixed/refactored at that core level. OBS has too many band-aids over core issues that rear their ugly head time and time again, requiring even more piles of workarounds to make it do what it's supposed to do.

And particularly for such a "new" feature it is important that we nip those issues in the bud early and comprehensively, particularly if it exposes an underlying design issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Bug Fix Non-breaking change which fixes an issue

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants