
Conversation

Millefeuille42

No description provided.

MarkSymsCtx and others added 30 commits March 20, 2025 11:23
… After Disabling and Enabling Multipath

Signed-off-by: Lunfan Zhang <[email protected]>
This was a patch added to the sm RPM git repo before we had this
forked git repo for sm in the xcp-ng github organisation.
Originally-by: Ronan Abhamon <[email protected]>

This version was obtained through a merge in ff1bf65:

 git restore -SW -s ydi/forks/2.30.7/xfs drivers/EXTSR.py
 mv drivers/EXTSR.py drivers/XFSSR.py
 git restore -SW drivers/EXTSR.py

Signed-off-by: Yann Dirson <[email protected]>
Some important points:

- linstor.KV must use an identifier name that starts with a letter (so it uses a "sr-" prefix).

- Encrypted VDIs are supported with the key_hash attribute (not tested, experimental).

- When a new LINSTOR volume is created on a host (via snapshot or create), the remaining diskless
devices are not necessarily created on other hosts. So if a resource definition exists without a
local device path, we request it from LINSTOR. We wait 5s for symlink creation when a new volume
is created => 5s is purely arbitrary, but this guarantees that we do not try to access the
volume before the symlink has been created by the udev rule.

- Can change the provisioning using the device config 'provisioning' param.

- We can only increase volume size (see LINBIT/linstor-server#66);
it would be great if we could shrink volumes to limit the space used by the snapshots.

- Inflate/deflate can only be executed on the master host; a linstor-manager plugin is provided
to do this from slaves. The same plugin is used to open LINSTOR ports and start the controller.

- Use a `total_allocated_volume_size` method to have a good idea of the reserved space.
Why? Because `physical_free_size` is computed using the LVM used size; in the case of thick provisioning that's fine,
but when thin provisioning is chosen LVM returns only the allocated size based on the used block count. This method
solves the problem: it sums the fixed virtual volume size of each node to compute the size required to store the
volume data.

- Call vhd-util on remote hosts through the linstor-manager plugin when necessary. When vhd-util is called to get VHD info,
the DRBD device can be in use (and unusable by external processes), so we must use the local LVM device that
contains the DRBD data, or a remote disk if the DRBD device is diskless.

- If a DRBD device is in use when vhdutil.getVHDInfo is called, we must not get
errors. So a LinstorVhdUtil wrapper is now used to bypass the DRBD layer when
VDIs are loaded.

- Refresh the PhyLink when unpause is called on DRBD devices:
We must always recreate the symlink to ensure we have
the right info. Why? Because if the volume UUID is changed in
LINSTOR the symlink is not directly updated. When live leaf
coalesce is executed we have these steps:
"A" -> "OLD_A"
"B" -> "A"
Without the symlink update the previous "A" path is reused instead of
the "B" path. Note: "A", "B" and "OLD_A" are UUIDs.

- Since the linstor python modules are not present on every XCP-ng host,
module imports are protected by try... except blocks (see the sketch after this list).

- Provide a linstor-monitor daemon to detect master changes
- Check that "create" does not succeed without the zfs packages
- Check that "scan" fails if the path is not mounted (not a ZFS mountpoint)
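
As an illustration of the guarded imports mentioned above, a minimal sketch (the helper name is hypothetical, not taken from the driver):

```
try:
    import linstor
    LINSTOR_AVAILABLE = True
except ImportError:
    linstor = None
    LINSTOR_AVAILABLE = False


def require_linstor():
    # Hypothetical helper: fail only when a LINSTOR feature is actually used,
    # not at import time, so the driver stays importable everywhere.
    if not LINSTOR_AVAILABLE:
        raise RuntimeError('linstor python modules are not installed on this host')
    return linstor
```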
Co-authored-by: Piotr Robert Konopelko <[email protected]>
Signed-off-by: Aleksander Wieliczko <[email protected]>
Signed-off-by: Ronan Abhamon <[email protected]>
`umount` should not be called when `legacy_mode` is enabled, otherwise a mounted dir
used during SR creation is unmounted at the end of the `create` call (and also
when a PBD is unplugged) in the `detach` block.

Signed-off-by: Ronan Abhamon <[email protected]>
An sm-config boolean param `subdir` is available to configure where the VHDs are stored:
- In a subdirectory named after the SR UUID (the new behavior)
- In the root directory of the MooseFS SR

By default, new SRs are created with `subdir` = True.
Existing SRs are not modified and continue to use the folder that was given at
SR creation directly, without looking for a subdirectory.
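
A minimal sketch of how such a flag could drive the path selection (the helper name is hypothetical; the real driver logic may differ):

```
import os

def vhd_root(mountpoint, sr_uuid, subdir):
    # subdir=True (new SRs): keep VHDs in a per-SR subdirectory.
    # subdir=False (existing SRs): keep using the given folder directly.
    return os.path.join(mountpoint, sr_uuid) if subdir else mountpoint

# Example: vhd_root('/var/run/sr-mount/<moosefs mount>', '<SR UUID>', True)
```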

Signed-off-by: Ronan Abhamon <[email protected]>
Ensure all shared drivers are imported in the `_is_open` definition to register
them in the driver list. Otherwise this function always fails with an SRUnknownType exception.

Also, we must add two fake mandatory parameters to make MooseFS happy: `masterhost` and `rootpath`.
Same for CephFS with `serverpath`. (The NFS driver is directly patched to ensure there is no usage of
the `serverpath` param because its value is equal to None.)

The `location` param is required to use ZFS, or more precisely by the parent class `FileSR`.

Signed-off-by: Ronan Abhamon <[email protected]>
SR_CACHING offers the capability to use IntelliCache, but this
feature is only available with NFS SRs.

For more details, see the implementation of `_setup_cache` in blktap2.py:
it only uses an instance of NFSFileVDI for the shared target.

Signed-off-by: Ronan Abhamon <[email protected]>
* `except` syntax fixes
* drop `has_key()` usage
* drop `filter()` usage (and drop the silly `list(x.keys())` wrappings)
* drop `map()` usage
* use `int` not `long`
* use `items()` not `iteritems()`
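
For illustration, a before/after sketch of the idioms listed above (not taken from the actual diff):

```
# Python 2 style:
#   if vdi_sizes.has_key(uuid): total = long(sum(map(int, vdi_sizes.itervalues())))
#   big = filter(lambda u: vdi_sizes[u] > 0, list(vdi_sizes.keys()))

def total_size(vdi_sizes):
    # long -> int, itervalues()/iteritems() -> values()/items()
    return int(sum(int(size) for size in vdi_sizes.values()))

def non_empty(vdi_sizes):
    # has_key() -> `in`, filter()/map() -> comprehensions
    return [uuid for uuid, size in vdi_sizes.items() if size > 0]
```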

Signed-off-by: Yann Dirson <[email protected]>
Guided by futurize's "old_div" use

Signed-off-by: Yann Dirson <[email protected]>
PROBE_MOUNTPOINT in some drivers is a relative path, which is resolved
using MOUNT_BASE at probe time, but in CephFS, GlusterFS and MooseFS it is
set on driver load to an absolute path, and this requires MOUNT_BASE to
look like a path component.

```
drivers/CephFSSR.py:69: in <module>
    PROBE_MOUNTPOINT = os.path.join(SR.MOUNT_BASE, "probe")
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

a = <MagicMock name='mock.MOUNT_BASE' id='140396863897728'>, p = ('probe',)

    def join(a, *p):
        """Join two or more pathname components, inserting '/' as needed.
        If any component is an absolute path, all previous path components
        will be discarded.  An empty last part will result in a path that
        ends with a separator."""
>       a = os.fspath(a)
E       TypeError: expected str, bytes or os.PathLike object, not MagicMock

/usr/lib64/python3.6/posixpath.py:80: TypeError
```

Note this same idiom is also used in upstream SMBFS, although that does not
appear to cause any problem with the tests.
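
A sketch of the deferred-join idea described above (illustrative only; the actual patch may differ):

```
import os

# Before (breaks when SR.MOUNT_BASE is a MagicMock in the unit tests):
#   PROBE_MOUNTPOINT = os.path.join(SR.MOUNT_BASE, "probe")

# After: keep the value relative and resolve it only at probe time,
# like the drivers that already use a relative PROBE_MOUNTPOINT.
PROBE_MOUNTPOINT = "probe"

def probe_path(mount_base):
    return os.path.join(mount_base, PROBE_MOUNTPOINT)
```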

Signed-off-by: Yann Dirson <[email protected]>
(coverage 7.2.5)

Without these changes many warnings/errors are emitted:
- "assertEquals" is deprecated, "assertEqual" must be used instead
- mocked objects in the "setUp" method like "cleanup.IPCFlag" cannot be repatched
  at the level of the test functions, otherwise tests are aborted;
  this is the behavior of coverage version 7.2.5

Signed-off-by: Ronan Abhamon <[email protected]>
The probe method is not implemented so we
shouldn't advertise it.

Signed-off-by: BenjiReis <[email protected]>
Impacted drivers: LINSTOR, MooseFS and ZFS.
- Ignore all linstor.* members during coverage;
  the module is not installed in the GitHub runner.
- Use mock from unittest, the old one is no longer found.
- Remove a useless return from the LinstorSR scan method.

Signed-off-by: Ronan Abhamon <[email protected]>
An SR inheriting from EXTSR that allows using a device with a 4KiB blocksize as
an SR.
It creates a 512-byte loop device on top of the 4KiB device and then hands it to
the EXTSR code.
It uses the same device-config as a normal local SR, i.e.
`device-config:device=/dev/nvme0n1`

After creation, the driver finds the device under the VG to identify the
correct disk.
This means that creating the SR with a non-stable disk identifier is
doable and it will work as EXTSR would, by ignoring the device-config
after creation and identifying the correct disk using LVM info.

The VG is created using a different prefix name from EXTSR:
`XSLocalLargeBlock-<SR UUID>`.

The SR artificially limits creation to disks whose blocksize is not 512 bytes.
It will throw an error if a disk with a 512-byte blocksize is given.
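
For illustration only, a sketch of the two steps described above using standard Linux interfaces (sysfs blocksize lookup and util-linux `losetup`); the actual driver code may differ:

```
import os
import subprocess

def logical_block_size(device):
    # e.g. /sys/block/nvme0n1/queue/logical_block_size
    name = os.path.basename(device)
    with open('/sys/block/%s/queue/logical_block_size' % name) as f:
        return int(f.read())

def create_512b_loop(device):
    if logical_block_size(device) == 512:
        raise Exception('device already has a 512-byte blocksize, use EXTSR directly')
    # Expose the 4KiB disk through a loop device with 512-byte logical sectors,
    # so the regular EXTSR code can be reused on top of it.
    return subprocess.check_output(
        ['losetup', '--find', '--show', '--sector-size', '512', device],
        universal_newlines=True
    ).strip()
```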

We currently don't support multiple devices; SR creation fails at the EXTSR level.
We added an error to explicitly say that multi-device SRs are not supported
by this driver.
Before that, it would produce a different error:
```
Error code: SR_BACKEND_FAILURE_77
Error parameters: , Logical Volume group creation failed,
```
Sometimes the pvremove from EXTSR using the loop device fails.
In this case, we need to remove the real device from the PV list ourselves in
the error handling.

Signed-off-by: Damien Thenot <[email protected]>
With this change the driver supports an "lvm-conf" param on "other-config".
For now the configuration is only used by "remove" calls from LVMCache.

Example to issue discards after a lvremove command:

> xe sr-param-set uuid=<SR_UUID> other-config:lvm-conf=issue_discards=1

And to remove the param:

> xe sr-param-remove uuid=<SR_UUID> param-name=other-config param-key=lvm-conf
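
For illustration, one way such a value could be forwarded to `lvremove` through LVM's `--config` option (this is an assumption about the mechanism, not the actual LVMCache code):

```
import subprocess

def lvremove(lv_path, lvm_conf=None):
    # lvm_conf is the raw other-config:lvm-conf value, e.g. "issue_discards=1";
    # issue_discards belongs to the "devices" section of lvm.conf.
    cmd = ['lvremove', '-f', lv_path]
    if lvm_conf:
        cmd += ['--config', 'devices { %s }' % lvm_conf]
    subprocess.check_call(cmd)
```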

Signed-off-by: Ronan Abhamon <[email protected]>
Last commit: 9207abe
"fix(linstor): check if resource is tiebreaker (#62)"

Signed-off-by: Ronan Abhamon <[email protected]>
Wescoeur and others added 16 commits April 14, 2025 13:53
Signed-off-by: Ronan Abhamon <[email protected]>
Co-authored-by: Damien Thenot <[email protected]>
Co-authored-by: Ronan Abhamon <[email protected]>
Signed-off-by: Damien Thenot <[email protected]>
Also use return type annotations on these methods.

Signed-off-by: Ronan Abhamon <[email protected]>
Due to mypy modifications, we can't build the sm RPM in Koji
without a recent pylint version. So the precheck target is now only
executed in a GitHub workflow.

Signed-off-by: Ronan Abhamon <[email protected]>
Avoid:
```
drivers/LVHDSR.py:195: error: Item "None" of "Any | None" has no attribute "get"  [union-attr]
drivers/LVHDSR.py:196: error: Value of type "Any | None" is not indexable  [index]
```
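
The usual fix for this class of error is to narrow the `Optional` value before calling `.get()` or indexing it; an illustrative sketch (not the actual LVHDSR.py change):

```
from typing import Optional

def vdi_type(sm_config: Optional[dict]) -> str:
    # mypy: sm_config may be None, so narrow it before .get()/indexing.
    if sm_config is None:
        return 'vhd'
    return sm_config.get('vdi_type', 'vhd')
```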

Signed-off-by: Ronan Abhamon <[email protected]>
During `LinstorSR` init, only create the journaler to make `should_preempt` happy.
The volume manager MUST always be created in an SR lock context. Otherwise,
we can trigger major issues.

For example, a volume can be deleted from the KV-store by `cleanup.py` during a
snapshot rollback. A very rare situation, but one that allowed this problem to be discovered.

Signed-off-by: Ronan Abhamon <[email protected]>
Until now the cleanup VHD resize commands were performed on the master.
But that doesn't always work when a VHD of a chain is opened for reading on another host.

As a reminder, this portion of code is executed only rarely:
a user must have resized a VHD that must later be coalesced.

Signed-off-by: Ronan Abhamon <[email protected]>
Previous version is no longer available.

Signed-off-by: Ronan Abhamon <[email protected]>
Check enabled hosts instead of online hosts during SR creation

Added `get_enabled_hosts` to get the enabled hosts during SR operations. In some drivers, e.g. LINSTOR, we need to ensure that hosts are enabled before performing operations, hence this function is needed. In some cases `thick` SR creation may fail with `get_online_hosts` because the metrics can take time to become available.
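
A minimal sketch of what such a helper could look like against the XAPI bindings (an assumption about the implementation, not the actual `util.get_enabled_hosts`):

```
def get_enabled_hosts(session):
    # Keep hosts whose 'enabled' flag is set, instead of relying on
    # host_metrics liveness as get_online_hosts does.
    return [
        ref
        for ref, record in session.xenapi.host.get_all_records().items()
        if record['enabled']
    ]
```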

Signed-off-by: Rushikesh Jadhav <[email protected]>
@Millefeuille42 Millefeuille42 self-assigned this Jun 30, 2025
Comment on lines 2082 to 2086
```
self._linstor.resource_auto_place(
    rsc_name=resource.name,
    place_count=1,
    diskless_type=linstor.consts.FLAG_DRBD_DISKLESS
)
```
Member

Must be recreated on the correct node, otherwise you change the behavior on the caller side.

Author

Based on what I read in the linstor docs, I'd say something like this would be appropriate. Is it?

        self._linstor.resource_create(
            rscs=[
                linstor.linstorapi.ResourceData(
                    node_name=resource.node_name,
                    rsc_name=resource.name,
                    storage_pool=volume.storage_pool_name,
                    diskless=linstor.consts.FLAG_DISKLESS in resource.flags,
                    drbd_diskless=linstor.consts.FLAG_DRBD_DISKLESS in resource.flags,
                    nvme_initiator=linstor.consts.FLAG_NVME_INITIATOR in resource.flags,
                    ebs_initiator=linstor.consts.FLAG_EBS_INITIATOR in resource.flags
                )
            ]
        )

Author
@Millefeuille42 Millefeuille42 Jul 17, 2025

Suggestion added in 6b32030

```
    place_count=1,
    diskless_type=linstor.consts.FLAG_DRBD_DISKLESS
)
self._mark_resource_cache_as_dirty()
```
Member

Are you sure there is no issue for the caller with the `for resource in self._get_resource_cache().resources` code? 😄

Author

Here in the current usage I break the parent loop with a return; however, for now it is up to the caller to handle the fact that this function implicitly modifies shared state. I see two options:

  • Document this clearly in a docstring, specifying that it mutates the resource cache
  • Move the cache invalidation responsibility to the caller

I'd prefer the first option since, as far as I know, there is little to no point in not invalidating the cache afterwards? 🤔

Author

This has been changed to a scan happening before the loop in 9a9dec1. So this is not an issue anymore.

Comment on lines 527 to 533
```
    return util.retry(
        lambda: self.allocated_volume_size,
        maxretry=1,
        period=1
    )
except LinstorVolumeManagerError as e:
    raise e
```
Member

I don't understand this code. 😅 Maybe we should talk about it in a call.

Author

See the answers below: the util.retry was unnecessary and was removed in bae7edb, while the except -> raise is there to prevent nested exceptions of the same type.

Comment on lines 2149 to 2150
```
except LinstorVolumeManagerError as e:
    raise e
```
Member

Why?

Author

Since the function will call itself if a LinstorVolumeManagerError is raised, that exception can be caught again below, unnecessarily nesting exceptions of the same type. This might be a bit of overkill though, so I can remove it if you see no use in it.
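
To make the argument concrete, a generic sketch of the pattern being discussed (placeholder names, not the sm code):

```
class VolumeManagerError(Exception):
    pass

def _query(name):
    # Placeholder for the real LINSTOR query.
    raise RuntimeError('transient failure for %s' % name)

def get_volume_info(name, _retrying=False):
    try:
        return _query(name)
    except VolumeManagerError:
        # Already the expected type: re-raise as-is so the broad handler
        # below does not wrap it into a second VolumeManagerError.
        raise
    except Exception as e:
        if not _retrying:
            # Retry once on unexpected errors (mirroring the discussion above).
            return get_volume_info(name, _retrying=True)
        raise VolumeManagerError(str(e))
```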

Comment on lines 2144 to 2148
```
return util.retry(
    lambda: self._get_volumes_info(volume_name),
    maxretry=1,
    period=1
)
```
Member

Why a retry, and with a maxretry of 1 and a return here?

Author

I might have mixed things up because I'm not sure why I did this in the first place. I have fixed this in bae7edb. However this introduces the possibility of a loop if the resource gets broken in the meantime.

Signed-off-by: Mathieu Labourier <[email protected]>
@Wescoeur
Member

Wescoeur commented Jul 8, 2025

  • Can we delete a diskless that is open in read-only mode? Does DRBD prevent deletion if it is open?
  • Can we detect if a diskless is open (without using /sys/kernel/debug/)?

@Millefeuille42
Author

Following the issues that @Wescoeur noted, we decided to put this PR on hold. We don't really know what the cause of this issue is, and it seems relatively rare.

Not being sure of what happens when trying to delete a resource open in read-only mode, nor how to know whether it is open, leaves room for further errors. Thus, manual repair remains preferred for the moment.

This is something that will be easily handled in SMAPIv3.

@stormi
Member

stormi commented Jul 17, 2025

Then I suggest converting this PR back to draft.

@Millefeuille42 Millefeuille42 marked this pull request as draft July 17, 2025 10:18
@Wescoeur Wescoeur force-pushed the 3.2.12-8.3 branch 3 times, most recently from e7801da to ca5b52d on August 28, 2025 15:32