Repair diskless resource on sub-zero volume allocated size #88
base: 3.2.12-8.3
Conversation
Signed-off-by: Mark Syms <[email protected]>
… After Disabling and Enabling Multipath Signed-off-by: Lunfan Zhang <[email protected]>
This was a patch added to the sm RPM git repo before we had this forked git repo for sm in the xcp-ng github organisation.
Originally-by: Ronan Abhamon <[email protected]>

This version obtained through merge in ff1bf65:

    git restore -SW -s ydi/forks/2.30.7/xfs drivers/EXTSR.py
    mv drivers/EXTSR.py drivers/XFSSR.py
    git restore -SW drivers/EXTSR.py

Signed-off-by: Yann Dirson <[email protected]>
Some important points:
- linstor.KV must use an identifier name that starts with a letter (so it uses a "sr-" prefix).
- Encrypted VDIs are supported with the key_hash attribute (not tested, experimental).
- When a new LINSTOR volume is created on a host (via snapshot or create), the remaining diskless devices are not necessarily created on other hosts. So if a resource definition exists without a local device path, we ask LINSTOR for it. Wait 5s for symlink creation when a new volume is created => 5s is purely arbitrary, but this guarantees that we do not try to access the volume if the symlink has not yet been created by the udev rule.
- The provisioning can be changed using the device config 'provisioning' param.
- We can only increase volume size (see: LINBIT/linstor-server#66); it would be great if we could shrink volumes to limit the space used by the snapshots.
- Inflate/Deflate can only be executed on the master host; a linstor-manager plugin is present to do this from slaves. The same plugin is used to open LINSTOR ports + start the controller.
- Use a `total_allocated_volume_size` method to have a good idea of the reserved memory. Why? Because `physical_free_size` is computed using the LVM used size; in the case of thick provisioning this is fine, but when thin provisioning is chosen LVM returns only the allocated size using the used block count. This method solves the problem: it takes the fixed virtual volume size of each node to compute the required size to store the volume data.
- Call vhd-util on remote hosts using the linstor-manager when necessary, i.e. when vhd-util is called to get VHD info, the DRBD device can be in use (and unusable by external processes), so we must use the local LVM device that contains the DRBD data, or a remote disk if the DRBD device is diskless.
- If a DRBD device is in use when vhdutil.getVHDInfo is called, we must have no errors. So a LinstorVhdUtil wrapper is now used to bypass the DRBD layer when VDIs are loaded.
- Refresh the PhyLink when unpause is called on DRBD devices: we must always recreate the symlink to ensure we have the right info. Why? Because if the volume UUID is changed in LINSTOR the symlink is not directly updated. When a live leaf coalesce is executed we have these steps: "A" -> "OLD_A", "B" -> "A". Without the symlink update the previous "A" path is reused instead of the "B" path. Note: "A", "B" and "OLD_A" are UUIDs.
- Since the linstor python modules are not present on every XCP-ng host, module imports are protected by try... except... blocks (see the sketch after this list).
- Provide a linstor-monitor daemon to check master changes.
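A minimal sketch of that guarded import; the real driver may wrap more names and log a warning, and the `LINSTOR_AVAILABLE` flag name is an assumption, not taken from the code.

```python
# Sketch only: make the driver importable on hosts without the linstor modules.
try:
    import linstor
    LINSTOR_AVAILABLE = True
except ImportError:
    # The linstor python modules are not installed on every XCP-ng host.
    linstor = None
    LINSTOR_AVAILABLE = False
```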
- Check that "create" doesn't succeed without the zfs packages.
- Check that "scan" fails if the path is not mounted (not a ZFS mountpoint).
Co-authored-by: Piotr Robert Konopelko <[email protected]> Signed-off-by: Aleksander Wieliczko <[email protected]> Signed-off-by: Ronan Abhamon <[email protected]>
`umount` should not be called when `legacy_mode` is enabled, otherwise a mounted dir used during SR creation is unmounted at the end of the `create` call (and also when a PBD is unplugged) in the `detach` block. Signed-off-by: Ronan Abhamon <[email protected]>
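A hypothetical sketch of the guard described above; the method and attribute names are assumptions, not the driver's actual code.

```python
# Sketch only: skip the unmount when legacy_mode is enabled, so a directory
# mounted by the administrator is left untouched on detach.
def detach(self, sr_uuid):
    if getattr(self, 'legacy_mode', False):
        # The mount is not managed by this driver: do not unmount it.
        return
    self._unmount(self.mountpoint)  # assumed helper
```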
A sm-config boolean param `subdir` is available to configure where to store the VHDs:
- In a subdir with the SR UUID (the new behavior)
- In the root directory of the MooseFS SR

By default, new SRs are created with `subdir` = True. Existing SRs are not modified and continue to use the folder that was given at SR creation, directly, without looking for a subdirectory. Signed-off-by: Ronan Abhamon <[email protected]>
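A small sketch of the path selection implied by the `subdir` flag; the helper name and the string encoding of the flag are assumptions.

```python
import os

# Sketch only: pick the VHD directory according to the `subdir` sm-config flag.
def get_vhd_dir(sm_config, mountpoint, sr_uuid):
    use_subdir = sm_config.get('subdir', 'true').lower() == 'true'
    return os.path.join(mountpoint, sr_uuid) if use_subdir else mountpoint
```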
Ensure all shared drivers are imported in `_is_open` definition to register them in the driver list. Otherwise this function always fails with a SRUnknownType exception. Also, we must add two fake mandatory parameters to make MooseFS happy: `masterhost` and `rootpath`. Same for CephFS with: `serverpath`. (NFS driver is directly patched to ensure there is no usage of the `serverpath` param because its value is equal to None.) `location` param is required to use ZFS, to be more precise, in the parent class: `FileSR`. Signed-off-by: Ronan Abhamon <[email protected]>
SR_CACHING offers the ability to use IntelliCache, but this feature is only available with NFS SRs. For details, see the implementation of `_setup_cache` in blktap2.py, which only uses an instance of NFSFileVDI for the shared target. Signed-off-by: Ronan Abhamon <[email protected]>
Signed-off-by: Ronan Abhamon <[email protected]>
* `except` syntax fixes
* drop `has_key()` usage
* drop `filter()` usage (but drop their silly `list(x.keys())` wrappings)
* drop `map()` usage
* use `int` not `long`
* use `items()` not `iteritems()`

Signed-off-by: Yann Dirson <[email protected]>
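For illustration, the Python 3 forms of the constructs listed above; this is a generic example, not code taken from the sm repository.

```python
d = {'a': 1, 'b': 2}

if 'a' in d:                     # replaces d.has_key('a')
    pass

evens = [k for k, v in d.items() if v % 2 == 0]   # replaces filter() + iteritems()

size = int('42')                 # replaces long('42')

try:
    raise OSError('boom')
except OSError as e:             # replaces the old `except OSError, e:` syntax
    pass
```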
…store Signed-off-by: Yann Dirson <[email protected]>
Guided by futurize's "old_div" use Signed-off-by: Yann Dirson <[email protected]>
PROBE_MOUNTPOINT in some drivers is a relative path, which is resolved using MOUNT_BASE at probe time, but in CephFS, GlusterFS and MooseFS it is set on driver load to an absolute path, and this requires MOUNT_BASE to look like a path component.

```
drivers/CephFSSR.py:69: in <module>
    PROBE_MOUNTPOINT = os.path.join(SR.MOUNT_BASE, "probe")
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
a = <MagicMock name='mock.MOUNT_BASE' id='140396863897728'>, p = ('probe',)

    def join(a, *p):
        """Join two or more pathname components, inserting '/' as needed.
        If any component is an absolute path, all previous path components
        will be discarded.  An empty last part will result in a path that
        ends with a separator."""
>       a = os.fspath(a)
E       TypeError: expected str, bytes or os.PathLike object, not MagicMock

/usr/lib64/python3.6/posixpath.py:80: TypeError
```

Note this same idiom is also used in upstream SMBFS, although that does not appear to cause any problem with the tests. Signed-off-by: Yann Dirson <[email protected]>
(coverage 7.2.5) Without these changes many warnings/errors are emitted:
- "assertEquals" is deprecated, "assertEqual" must be used instead
- mocked objects in the "setUp" method like "cleanup.IPCFlag" cannot be repatched at the level of the test functions, otherwise tests are aborted; this is the behavior of coverage version 7.2.5

Signed-off-by: Ronan Abhamon <[email protected]>
Signed-off-by: Ronan Abhamon <[email protected]>
Signed-off-by: Ronan Abhamon <[email protected]>
The probe method is not implemented so we shouldn't advertise it. Signed-off-by: BenjiReis <[email protected]>
Impacted drivers: LINSTOR, MooseFS and ZFS.
- Ignore all linstor.* members during coverage, the module is not installed in the github runner.
- Use mock from unittest, the old one is not found now.
- Remove useless return from LinstorSR scan method.

Signed-off-by: Ronan Abhamon <[email protected]>
Signed-off-by: BenjiReis <[email protected]>
) Signed-off-by: Yann Dirson <[email protected]>
An SR inheriting from EXTSR that allows using a 4KiB-blocksize device as an SR. It creates a 512-byte loop device on top of the 4KiB device, then gives it to the EXTSR code. It uses the same device-config as a normal local SR, i.e. `device-config:device=/dev/nvme0n1`.

After creation, the driver finds the device under the VG to identify the correct disk. This means that creating the SR with a non-stable disk identifier is doable, and it will work as EXTSR would by ignoring the device-config after creation; the correct disk is identified using LVM info. The VG is created using a different prefix name from EXTSR: `XSLocalLargeBlock-<SR UUID>`.

The SR artificially limits creation to disks that are not 512b: it throws an error if a disk whose blocksize is 512 is given. We currently don't support multiple devices; it fails at the EXTSR creation. We added an error to explicitly say that a multi-device SR is not supported by this driver. Before that, it would produce another error:

```
Error code: SR_BACKEND_FAILURE_77
Error parameters: , Logical Volume group creation failed,
```

Sometimes the pvremove from EXTSR using the loop device fails. In this case, we need to remove the real device from the PV list ourselves in the error handling. Signed-off-by: Damien Thenot <[email protected]>
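A hedged sketch of the core trick (not the driver's actual code): expose the 4KiB-sector disk through a loop device that reports 512-byte sectors, then hand that loop device to the existing EXTSR code path.

```python
import subprocess

# Sketch only: create a 512-byte-sector loop device on top of a 4KiB device.
# `losetup --sector-size 512 --find --show` prints the allocated loop device.
def create_512b_loop(device):
    return subprocess.check_output(
        ['losetup', '--sector-size', '512', '--find', '--show', device],
        universal_newlines=True
    ).strip()

# Example (device name taken from the device-config above):
# loop_dev = create_512b_loop('/dev/nvme0n1')
```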
With this change the driver supports an "lvm-conf" param in "other-config". For now the configuration is only used by "remove" calls from LVMCache. Example to issue discards after an lvremove command:
> xe sr-param-set uuid=<SR_UUID> other-config:lvm-conf=issue_discards=1

And to remove the param:
> xe sr-param-remove uuid=<SR_UUID> param-name=other-config param-key=lvm-conf

Signed-off-by: Ronan Abhamon <[email protected]>
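A hypothetical sketch of how such an `lvm-conf` value could be forwarded to `lvremove`; the real LVMCache code may build the command differently, and the helper name is made up.

```python
# Hypothetical sketch: turn the other-config:lvm-conf value into an extra
# --config argument for lvremove, e.g. 'issue_discards=1'.
def build_lvremove_cmd(lv_path, lvm_conf=None):
    cmd = ['lvremove', '-f', lv_path]
    if lvm_conf:
        # issue_discards lives in the `devices` section of lvm.conf.
        cmd += ['--config', 'devices { %s }' % lvm_conf]
    return cmd
```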
Signed-off-by: Benjamin Reis <[email protected]>
Last commit: 9207abe "fix(linstor): check if resource is tiebreaker (#62)" Signed-off-by: Ronan Abhamon <[email protected]>
Signed-off-by: Ronan Abhamon <[email protected]>
Signed-off-by: Ronan Abhamon <[email protected]>
Signed-off-by: Ronan Abhamon <[email protected]>
Signed-off-by: Ronan Abhamon <[email protected]>
Signed-off-by: Ronan Abhamon <[email protected]>
Co-authored-by: Damien Thenot <[email protected]> Co-authored-by: Ronan Abhamon <[email protected]> Signed-off-by: Damien Thenot <[email protected]>
Also use return type annotations on these methods. Signed-off-by: Ronan Abhamon <[email protected]>
Due to mypy modifications, we can't build the sm RPM in Koji without a recent pylint version. So the precheck target is only executed in a github workflow now. Signed-off-by: Ronan Abhamon <[email protected]>
Avoid:

```
drivers/LVHDSR.py:195: error: Item "None" of "Any | None" has no attribute "get"  [union-attr]
drivers/LVHDSR.py:196: error: Value of type "Any | None" is not indexable  [index]
```

Signed-off-by: Ronan Abhamon <[email protected]>
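A generic illustration of the kind of None guard that silences such `union-attr` and `index` errors; the names are made up, not the actual LVHDSR.py code.

```python
from typing import Any, Dict, Optional

# Illustrative only: check the Optional value before calling .get() or indexing.
def read_parent(sm_config: Optional[Dict[str, Any]]) -> Optional[str]:
    if sm_config is None:
        return None
    return sm_config.get('vhd-parent')
```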
During `LinstorSR` init, only create the journaler to make `should_preempt` happy. The volume manager MUST always be created in an SR lock context, otherwise we can trigger major issues. For example, a volume can be deleted from the KV-store by `cleanup.py` during a snapshot rollback. A very rare situation, but one that allowed this problem to be discovered. Signed-off-by: Ronan Abhamon <[email protected]>
Signed-off-by: Damien Thenot <[email protected]>
Until now the cleanup VHD resize commands were performed on the master, but this doesn't always work when a VHD of a chain is opened for reading on another host. As a reminder, this portion of code is only executed rarely: a user must have resized a VHD that must later be coalesced. Signed-off-by: Ronan Abhamon <[email protected]>
Previous version is no longer available. Signed-off-by: Ronan Abhamon <[email protected]>
Signed-off-by: Ronan Abhamon <[email protected]>
Check enabled hosts instead of online hosts during SR creation. Added `get_enabled_hosts` to get enabled hosts during SR operations. In some drivers, e.g. LINSTOR, we need to ensure that hosts are enabled before performing operations, hence this function is needed. In some cases `thick` SR creation may fail due to `get_online_hosts` because the metrics can take time to update. Signed-off-by: Rushikesh Jadhav <[email protected]>
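A hedged sketch of what such a helper could look like with XenAPI; the actual `get_enabled_hosts` may live elsewhere and differ in signature.

```python
# Sketch only: return the refs of hosts whose `enabled` field is true,
# instead of relying on host metrics as an online-hosts check does.
def get_enabled_hosts(session):
    return [
        ref
        for ref, record in session.xenapi.host.get_all_records().items()
        if record['enabled']
    ]
```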
Signed-off-by: Mathieu Labourier <[email protected]>
drivers/linstorvolumemanager.py
Outdated
self._linstor.resource_auto_place(
    rsc_name=resource.name,
    place_count=1,
    diskless_type=linstor.consts.FLAG_DRBD_DISKLESS
)
Must be recreated on the correct node, otherwise you change the behavior on the caller side.
Based on what I read in the LINSTOR docs, I'd say something like this would be appropriate. Is it?
self._linstor.resource_create(
    rscs=[
        linstor.linstorapi.ResourceData(
            node_name=resource.node_name,
            rsc_name=resource.name,
            storage_pool=volume.storage_pool_name,
            diskless=linstor.consts.FLAG_DISKLESS in resource.flags,
            drbd_diskless=linstor.consts.FLAG_DRBD_DISKLESS in resource.flags,
            nvme_initiator=linstor.consts.FLAG_NVME_INITIATOR in resource.flags,
            ebs_initiator=linstor.consts.FLAG_EBS_INITIATOR in resource.flags
        )
    ]
)
Suggestion added in 6b32030
    place_count=1,
    diskless_type=linstor.consts.FLAG_DRBD_DISKLESS
)
self._mark_resource_cache_as_dirty()
Are you sure there is no issue for the caller with the `for resource in self._get_resource_cache().resources` code? 😄
Here in the current usage I break the parent loop with a return; however, for now it is up to the caller to handle the fact that this function implicitly modifies shared state. I see two options:
- Document this clearly in a docstring, specifying that it mutates the resource cache
- Move cache invalidation responsibility to the caller
I'd prefer the first option since, as far as I know, there is little to no reason not to invalidate the cache afterwards? 🤔
This has been changed to a scan happening before the loop in 9a9dec1. So this is not an issue anymore.
drivers/linstorvolumemanager.py
Outdated
return util.retry(
    lambda: self.allocated_volume_size,
    maxretry=1,
    period=1
)
except LinstorVolumeManagerError as e:
    raise e
I don't understand this code. 😅 Maybe we should talk about it in a call.
See the answers below: the `util.retry` was unnecessary and removed in bae7edb, meanwhile the except->raise is there to prevent nested exceptions of the same type.
drivers/linstorvolumemanager.py
Outdated
except LinstorVolumeManagerError as e:
    raise e
Why?
Since the function will call itself if a `LinstorVolumeManagerError` is raised, that exception can be caught again below, unnecessarily nesting exceptions of the same type. This might be a bit of overkill though, so I can remove it if you see no use in it.
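A minimal illustration of the pattern under discussion; the surrounding function is simplified, `fetch_volume_info()` stands in for the real lookup, and `LinstorVolumeManagerError` is the exception class from drivers/linstorvolumemanager.py.

```python
# Re-raise the specific error untouched so the broad handler below does not
# wrap it in another LinstorVolumeManagerError.
def get_volume_info_safe():
    try:
        return fetch_volume_info()   # hypothetical call that may itself raise
    except LinstorVolumeManagerError:
        # Already the right type: propagate as-is instead of wrapping it again.
        raise
    except Exception as e:
        raise LinstorVolumeManagerError(
            'unable to get volume info: {}'.format(e)
        )
```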
drivers/linstorvolumemanager.py
Outdated
return util.retry(
    lambda: self._get_volumes_info(volume_name),
    maxretry=1,
    period=1
)
Why a retry, and with a maxretry of 1 and a return here?
I might have mixed things up because I'm not sure why I did this in the first place. I have fixed this in bae7edb. However, this raises the possibility of a loop if the resource gets broken in the meantime.
Signed-off-by: Mathieu Labourier <[email protected]>
Signed-off-by: Mathieu Labourier <[email protected]>
Signed-off-by: Mathieu Labourier <[email protected]>
Following the issues that @Wescoeur noted, we decided to put this PR on hold. We don't really know what the cause of this issue is, and it seems relatively rare. Not being sure what happens when trying to delete a resource open in read-only, nor how to know whether it is open, leaves room for further errors. Thus, manual repair remains preferred for the moment. This is something that will be easily handled in SMAPIv3.
Then I suggest converting this PR back to draft.
e7801da to ca5b52d (Compare)
No description provided.