Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
23 changes: 13 additions & 10 deletions sycl/source/detail/device_impl.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -22,17 +22,19 @@ namespace detail {
/// Constructs a SYCL device instance using the provided
/// UR device instance.
device_impl::device_impl(ur_device_handle_t Device, platform_impl &Platform,
device_impl::private_tag)
device_impl::private_tag, bool IsSubDevice)
: MDevice(Device), MPlatform(Platform),
// No need to set MRootDevice when MAlwaysRootDevice is true
MRootDevice(Platform.MAlwaysRootDevice
? nullptr
: get_info_impl<UR_DEVICE_INFO_PARENT_DEVICE>()),
// TODO catch an exception and put it to list of asynchronous exceptions:
MCache{*this} {
// Interoperability Constructor already calls DeviceRetain in
// urDeviceCreateWithNativeHandle.
getAdapter().call<UrApiKind::urDeviceRetain>(MDevice);
if (!IsSubDevice) {
// Interoperability Constructor already calls DeviceRetain in
// urDeviceCreateWithNativeHandle.
getAdapter().call<UrApiKind::urDeviceRetain>(MDevice);
}
Comment on lines +33 to +37
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The destructor device_impl::~device_impl calls urDeviceRelease, which is meant to balance the urDeviceRetain in the constructor. However, after this change, urDeviceRetain is no longer called for subdevices in the constructor, while urDeviceRelease is still invoked for them in the destructor. This imbalance is somewhat concerning. It seems that the actual root cause of the memory leak may lie elsewhere. Could you please clarify what you believe the underlying issue is?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason is that when a device is created RefCount is equal 1 without any call to retain(), but it is destroyed when RefCount is equal 0, so there should be one more call to release() than a number of calls to retain() ... or the implementation of class RefCount is wrong ?

class RefCount {
public:
  RefCount(uint32_t count = 1) : Count(count) {}
  ...
  uint32_t retain() { return ++Count; }
  bool release() { return --Count == 0; }
  void reset(uint32_t value = 1) { Count = value; }

private:
  std::atomic_uint32_t Count;
};

...

ur_result_t urDeviceRelease(ur_device_handle_t Device) {
  // Root devices are destroyed during the piTearDown process.
  if (Device->isSubDevice()) {
    if (Device->RefCount.release()) {
      delete Device;
    }
  }

  return UR_RESULT_SUCCESS;
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

@againull againull Sep 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ldorau @pbalcer In such case, why we differentiate between root device and sub-device in this case? can we just remove retain from the constructor completely? (only sub-devices being ref counted seems like implementation detail of the adapter)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"(only sub-devices being ref counted seems like implementation detail of the adapter)"

That's the reason. I pointed it out because it's possible that the existing sequence of creates/retains/releases was always broken, and we never noticed because the leak happens in the rare case of subdevices. So in a way, I'm agreeing with your earlier comment.

}

device_impl::~device_impl() {
Expand Down Expand Up @@ -161,12 +163,13 @@ std::vector<device> device_impl::create_sub_devices(
// times with the same arguments?
//
std::vector<device> res;
std::for_each(SubDevices.begin(), SubDevices.end(),
[&res, this](const ur_device_handle_t &a_ur_device) {
device sycl_device = detail::createSyclObjFromImpl<device>(
MPlatform.getOrMakeDeviceImpl(a_ur_device));
res.push_back(sycl_device);
});
std::for_each(
SubDevices.begin(), SubDevices.end(),
[&res, this](const ur_device_handle_t &a_ur_device) {
device sycl_device = detail::createSyclObjFromImpl<device>(
MPlatform.getOrMakeDeviceImpl(a_ur_device, true /*IsSubDevice*/));
res.push_back(sycl_device);
});
return res;
}

Expand Down
2 changes: 1 addition & 1 deletion sycl/source/detail/device_impl.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -432,7 +432,7 @@ class device_impl : public std::enable_shared_from_this<device_impl> {
// Must be called through `platform_impl::getOrMakeDeviceImpl` only.
// `private_tag` ensures that is true.
explicit device_impl(ur_device_handle_t Device, platform_impl &Platform,
private_tag);
private_tag, bool IsSubDevice);

~device_impl();

Expand Down
5 changes: 3 additions & 2 deletions sycl/source/detail/platform_impl.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -298,15 +298,16 @@ device_impl *platform_impl::getDeviceImpl(ur_device_handle_t UrDevice) {
return getDeviceImplHelper(UrDevice);
}

device_impl &platform_impl::getOrMakeDeviceImpl(ur_device_handle_t UrDevice) {
device_impl &platform_impl::getOrMakeDeviceImpl(ur_device_handle_t UrDevice,
bool IsSubDevice) {
const std::lock_guard<std::mutex> Guard(MDeviceMapMutex);
// If we've already seen this device, return the impl
if (device_impl *Result = getDeviceImplHelper(UrDevice))
return *Result;

// Otherwise make the impl
MDevices.emplace_back(std::make_shared<device_impl>(
UrDevice, *this, device_impl::private_tag{}));
UrDevice, *this, device_impl::private_tag{}, IsSubDevice));

return *MDevices.back();
}
Expand Down
3 changes: 2 additions & 1 deletion sycl/source/detail/platform_impl.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -170,7 +170,8 @@ class platform_impl : public std::enable_shared_from_this<platform_impl> {
/// \param PlatormImpl is the Platform for that Device
///
/// \return a device_impl* corresponding to the device
device_impl &getOrMakeDeviceImpl(ur_device_handle_t UrDevice);
device_impl &getOrMakeDeviceImpl(ur_device_handle_t UrDevice,
bool IsSubDevice = false);

/// Queries the cache to see if the specified UR platform has been seen
/// before. If so, return the cached platform_impl, otherwise create a new
Expand Down