Skip to content

Conversation

mirabilos
Copy link

Please be gentle, this is my first Ansible module, but do tell me how to improve it to meet the same standards as other modules ;)

SUMMARY

This module can be used to access the JSON API of a KEA DHCP4, DHCP6, DDNS or other services in a generic way, without having to manually format the JSON, with response error code checking.

It directly accesses the Unix Domain Socket API so it needs to execute on the system the server is running, with superuser privilegues, but without the hassle of wrapping it into HTTPS and password auth (or client certificates).

ISSUE TYPE
  • New Module/Plugin Pull Request
COMPONENT NAME

kea_command

ADDITIONAL INFORMATION

The integration test uses a predefined setup for convenience (i.e. so I can rely on a system setup I know and have tested); this in no way limits the usability of the module itself, it merely makes it easier to configure the “server component” of the integration test.

This module can be used to access the JSON API of a
KEA DHCP4, DHCP6, DDNS or other services in a generic
way, without having to manually format the JSON, with
response error code checking.

It directly accesses the Unix Domain Socket API so it
needs to execute on the system the server is running,
with superuser privilegues, but without the hassle of
wrapping it into HTTPS and password auth (or client
certificates).

The integration test uses a predefined setup for
convenience.
@ansibullbot ansibullbot added integration tests/integration module module new_contributor Help guide this first time contributor plugins plugin (any type) tests tests labels Aug 21, 2025
@felixfontein felixfontein added check-before-release PR will be looked at again shortly before release and merged if possible. backport-11 Automatically create a backport for the stable-10 branch labels Aug 21, 2025
@mirabilos
Copy link
Author

ERROR: Found 1 metaclass-boilerplate issue(s) which need to be resolved:
ERROR: plugins/modules/kea_command.py:0:0: missing: metaclass = type (75%)

Huh.

  1. why does the sanity test not show this when ran locally?
  2. isn’t that just for Python compatibility, but all supported releases require Python3 now anyway, and thus redundant?

interestingly not when ran locally, and which is also
useless because this is for Python compatibility, but
all supported versions already require py3k instead,
also on the target nodes, not just on the controller
I have no idea what this is doing, but the one I copied
from core fails the CI tests (but not when run locally!)
and this seems to be more obsolete Python compatibility
for versions not even supported by Ansible any more, as
if someone forgot to update the testsuite to match the
current reality… (not that I even use these but the docs
say it’s required)
@felixfontein
Copy link
Collaborator

Thanks for your contribution!

ERROR: Found 1 metaclass-boilerplate issue(s) which need to be resolved:
ERROR: plugins/modules/kea_command.py:0:0: missing: metaclass = type (75%)

Huh.

1. why does the sanity test not show this when ran locally?

2. isn’t that just for Python compatibility, but all supported releases require Python3 now anyway, and thus redundant?

The sanity test was removed from ansible-core 2.17 I think (since 2.17 dropped support for Python 2.7 on the target as well), but this collection still supports ansible-core 2.16, whose sanity tests still require it, and which also still supports Python 2.7 on the target. (We'll remove support for ansible-core 2.16 in the next major release in ~November this year. From then on it's no longer needed.)

@mirabilos
Copy link
Author

mirabilos commented Aug 23, 2025 via email

@felixfontein
Copy link
Collaborator

It’s a bit tricky as I seem to have to rely on the CI, as the sanity tests from the documentation pass when I run them locally.

You can install ansible-core 2.16 locally and use its ansible-test, but note that it will refuse to run if you don't run it with Python 3.10, 3.11, or 3.12...

In any case, usually running sanity tests with the latest (or one of the latest) ansible-core release is perfectly fine, this only changed once ansible-core 2.17 dropped support for Python 2.7, and is only a problem while we still support ansible-core versions that still support Python 2.7. Which fortunately ends (at least for main and the latest stable branch) in ~November this year.

(And yes, I've been looking forward to this for years :D )

Copy link
Collaborator

@felixfontein felixfontein left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your contribution!

@ansibullbot ansibullbot added the needs_ci This PR requires CI testing to be performed. Please close and re-open this PR to trigger CI label Aug 23, 2025
@ansibullbot ansibullbot added needs_revision This PR fails CI tests or a maintainer has requested a review/revision of the PR and removed needs_ci This PR requires CI testing to be performed. Please close and re-open this PR to trigger CI needs_revision This PR fails CI tests or a maintainer has requested a review/revision of the PR labels Aug 23, 2025
mirabilos and others added 2 commits August 23, 2025 21:13
@mirabilos
Copy link
Author

mirabilos commented Aug 23, 2025 via email

@felixfontein
Copy link
Collaborator

Hmm, the image works fine for me without privileged mode (and I would
Does the integration test work for you without it?

No, Networking setup, interface fails with Operation not permitted.

I could imagine that it would work fine in some of the test VMs though that are used in CI (various versions of RHEL, Alpine 3.22, Ubuntu 22.04 and 24.04).

@mirabilos
Copy link
Author

24.04 is rather old for this (KEA is, while not brand new, only used by people now that ISC DHCP is EOL, and has had several major changes recently), but if you think it worth, I can try making a regression test for it.

Rest ok then?

@mirabilos
Copy link
Author

Or, hm, well, if the “Networking setup, interface” step is the one that fails for you, then I don’t think it will even work there, I need an extra dummy network interface it can temporarily play DHCP server on for this.

Copy link
Collaborator

@russoz russoz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hi @mirabilos Thanks for your contribution!!!!

I got some comments myself, please take a look. Not everything is mandatory, feel free to disagree :-)

# Copyright © Thorsten Glaser <[email protected]>
# GNU General Public License v3.0+ (see LICENSES/GPL-3.0-or-later.txt or https://www.gnu.org/licenses/gpl-3.0.txt)

# note: this module has not been tested with Python, as the switch from ISC DHCPD to KEA occurs at a time where distros only ship Python3 instead.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I understood the note. What do you mean by "has not tested with Python" ? Do you mean Python 2.7?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, exactly. I have written this in py3k and had to port it to the subset between py3k and Python later so the checks pass, but KEA is so new that any distro that ships it will already have Python deprecated and removed.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then please be explicit about that. Also, in the aliases file in the integration test, there is a keyword you can use to prevent it from running the test in Python 2.7. Please add it (and remove the unsupported one as mentioned in the previous comment).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

… huh, which one, and does that affect the EOL tests?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And in this very line, please use "Python 2.7" instead of "Python". The unnumbered form is ambiguous, some people might read and imagine that "Python", in 2025, certainly means Python 3 (because Python 2 has been deprecated a long time ago). What is obvious to one person might be completely misleading to another. ;-)

arguments:
description:
- The arguments sent along with the command, if any.
- Use V({}) to send an empty JSONObject instead of omitting.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it be better to make default={}? Or, leave no default and, in the code (down there somewhere):

if arguments is None:
    argumentes = {}

Whichever makes more sense to you.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had wondered about this, but it makes a difference, and I’ve only tested the variant of not sending the JSONObject, and who knows how the KEA API will develop in the future, so let’s allow the user to set either for now.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would not call it a JSONObject in the documentation for the end user, though. Some of your users might not be too familiar with JSON and its terminology. In the Ansible/YAML world, that thing would be a dict, even though they're amount to the same thing under the wraps.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah. I was addressing the KEA user here. I guess we’ll need to fit both.

What do you two think about: - Use V({}) to send an empty arguments dict/object instead of omitting.

Comment on lines +37 to +56
rv_unchanged:
description:
- A list of C(result) codes to indicate success but unchanged system state.
- Set this to V([0]) for most acquisition commands.
- Use V([3]) for C(lease4-del) and similar which have a separate code for this.
- Any C(result) codes not listed in either O(rv_unchanged) or O(rv_changed) are interpreted as indicating an error result.
- O(rv_unchanged) has precedence over O(rv_changed) if a result code is in both lists.
type: list
elements: int
default: []
rv_changed:
description:
- A list of C(result) codes to indicate success and changed system state.
- Omit this for most acquisition commands.
- Set it to V([0]) for C(lease4-del) and similar which return changed system state that way.
- Any C(result) codes not listed in either O(rv_unchanged) or O(rv_changed) are interpreted as indicating an error result.
- O(rv_unchanged) has precedence over O(rv_changed) if a result code is in both lists.
type: list
elements: int
default: []
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, so the user has to map all the low level exist codes to get the changed status? That almost make using ansible.builtin.command more attractive an option :-)

To me that sounds like we need not one module, but a handful of them, so that the we can handle these exist codes accordingly.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, so the user has to map all the low level exist codes to get the changed status?

Yes, as they differ between commands. KEA does not have a central command registry; they have commands common to DHCP4 and DHCP6 and DDNS, and commands specific for these, and dozens of extension libraries.

If someone really has time and interest, they can machinally create a few hundred Ansible modules that map just one command, but until then, we have to do this. And even then we still need the generic one for commands from plug-ins we don’t know about (KEA has paid-for plugins as well)…

I looked into defaulting to something more user-friendly, but the status codes diverge so much… at least, if the user does not list them it will fail, so the user will know to fix it.

The ansible.builtin.command way does not look better: you pretty much have to install socat, do something like…

   ansible.builtin.command:
     argv: ["socat", "UNIX:/run/kea/kea4-ctrl-socket", "-"]
     expand_argument_vars: false
     stdin: |
       {
         "command": "lease4-del",
         "arguments": {
           "ip-address": "{{ ipaddr }}"
         }
       }

… and still check return values. Or use one of the possible clients provided by KEA that also require you to manually format the JSON, but one is not available in any version but the one released last month, and the other is deprecated…

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I see :-) Thanks for clarification.

The only other way I could think of, until you said a few hundred, would be to have a preset map of command names to RV values (both changed, unchanged), so that could be automatically selected. But then again, probably only worth doing if we were talking a handful, or tens, but certainly not hundreds.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you have some patterns that are repeated abundantly in these hundred, maybe you could create a parameter, say, rv_schema or something to that effect, make it a fixed list of choices, and hopefully cover 80% of the cases with that. That parameter would then be mutually exclusive with these existing ones, so if the user uses a fixed schema they cannot pass these specific values.

Like, one such schema might be called 0chg_3unchg (please use better names than this if you go for it) and it would be equivalent to rv_changed=[0], rv_unchanged=[3], your description for the command=lease4-del.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm.

That honestly sounds like overcomplicating things to me, and definitely reduces legibility for saving a few bytes at best.

I also don’t have an easy way to check return values for even the currently known plugins, some even are pay-to-use…

I’d prefer to keep it simple at first.

@ansibullbot
Copy link
Collaborator

@mirabilos This PR contains @ mentions in at least one commit message. Those mentions can cause cascading notifications through GitHub and need to be removed. Please squash or amend your commits to remove the mentions.

click here for bot help

@ansibullbot ansibullbot added the needs_revision This PR fails CI tests or a maintainer has requested a review/revision of the PR label Aug 27, 2025
description:
- A list of C(result) codes to indicate success but unchanged system state.
- Set this to V([0]) for most acquisition commands.
- Use V([3]) for C(lease4-del) and similar which have a separate code for this.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I have missed this one before, I believe this would be:

Suggested change
- Use V([3]) for C(lease4-del) and similar which have a separate code for this.
- Use V([3]) for O(command=lease4-del) and similar which have a separate code for this.

More occurrences around the docs, please review.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do.

phase = 'closing'
except OSError as ex:
r['msg'] = 'error %s socket (%s): %s' % (phase, sockfn, str(ex))
r['exception'] = ex
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The return value exception is dubbed as one of the common return values, and as such, they have specific semantics. In this case exception is a misnomer, probably for some lost historical reason. The value of that RV, despite the name, is not an object of the type Exception (like ex here). It is meant to be a traceback object - associated with that exception, yes, but NOT the exception itself.

One simple way of making that work is to raise a new Exception with your text message as argument, from the exception ex. To make that work, because we still have to support Python 2 in this collection, you would have to use:

from ansible.module_utils.six import raise_from
...

    except OSError as ex:
        raise_from(Exception('error %s socket ...' % (...)), ex)

Or you can obtain the traceback object yourself and pass it as RV exception.
Personally, I think since you are handling the error and passing an informative message about it, there is no need to return exception nor traceback. But that's your call.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively change r['exception'] = ex to r['exception'] = traceback.format_exc() (and add import traceback further up).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that another Python vs. py3k thing? The builtin command module certainly doesn’t do that when it does:

    try:
        os.chdir(chdir)
    except OSError as ex:
        r['msg'] = 'Unable to change directory before execution.'
        module.fail_json(**r, exception=ex)

(It’s one of the two-and-a-half I derived from.)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, this is an Ansible thing. Exceptions must not be passed as exception objects to module.fail_json, but as stack traces. The latest ansible-core version might support this, but we also support older versions, and they do not. (I did thest that.)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, okay. Thank you for all the fixes, I’ll work them in once I justifiably find time for this again (need to get paid work done first).

Comment on lines +11 to +17
- name: Ensure this is run on the right distro and version the integration test was written for
ansible.builtin.assert:
that:
- ansible_distribution_major_version | int == 13
- ansible_distribution | lower == 'debian'
fail_msg: "This test was written for a Debian 13 (trixie) container due to uses of its default configuration"
quiet: true
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Easy as that:

Suggested change
- name: Ensure this is run on the right distro and version the integration test was written for
ansible.builtin.assert:
that:
- ansible_distribution_major_version | int == 13
- ansible_distribution | lower == 'debian'
fail_msg: "This test was written for a Debian 13 (trixie) container due to uses of its default configuration"
quiet: true
- name: Ensure this is run on the right distro and version the integration test was written for
ansible.builtin.meta: end_play
when: ansible_os_family != 'Debian' and ansible_distribution_major_version | int != 13

This way we can enable the test and, whenever Debian trixie is enabled in the CI, it will already work.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We won't have Debian Trixie as a VM, so it won't work :) But there are other OSes that we do have as a VM that should work already now (with potentially little changes, like Ubuntu 24.04).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds... final. Any reason not to use Trixie in tests?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tje ansible-core testing infrastructure does not provide Debian VMs. We can run containers though (and we will run Debian Trixie containers).

The tests for this modlue do not run in containers, though.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ow. If they don’t run in containers, there’s little chance I can do the things needed to start the KEA services in there.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@felixfontein I gave the thumbs up to your last comment, but then later it hit me: why? why won't the tests run in containers? I am not familiar with KEA, so I am assuming something in this tool that prevents that from happening? Glancing over the test code I could not see anything else that would justify that - but I really just glanced quickly, I may have missed something.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@russoz the tests set up a new networking device, that isn't supported in "normal" containers (without additional capabilities, or privileged mode): #10709 (comment)

Since we want that containerized tests "just work" if you run ansible-test integration --docker <right image> <test name>, without having to supply specific additional options to give more capabilities to the container (and especially without making it a privileged container), it won't run in CI this way.

@felixfontein
Copy link
Collaborator

Or, hm, well, if the “Networking setup, interface” step is the one that fails for you, then I don’t think it will even work there, I need an extra dummy network interface it can temporarily play DHCP server on for this.

That fails for me because it's in a container. Ubuntu 24.04 also runs as a VM, there this is not a problem.

There are packages for KEA for Ubuntu 24.04, Alpine 3.22, and RHEL 10, all which are available as VMs in our CI. You should adjust the tests so they at least run with (at least) one of these VMs.

@mirabilos
Copy link
Author

There are packages for KEA for Ubuntu 24.04, Alpine 3.22, and RHEL 10, all which are available as VMs in our CI. You should adjust the tests so they at least run with (at least) one of these VMs.

Hm, okay. If it runs as VM, I can likely set up the service.

I can try to get it working under Ubuntu noble. What magic incantations do I need in the tests/ subdirectory to make it select precisely that VM, and nothing else, to run the integration test?

@felixfontein
Copy link
Collaborator

Basically add this to aliases:

azp/posix/2
azp/posix/vm
skip/aix
skip/alpine  # TODO: make this work
skip/docker
skip/fedora  # TODO: make this work (not running in CI right now)
skip/freebsd
skip/macos
skip/osx
skip/rhel  # TODO: make this work
skip/ubuntu22.04
needs/root

The TODOs are so that at some point hopefully someone extends the tests to also run on these systems.

@ansibullbot ansibullbot added the stale_ci CI is older than 7 days, rerun before merging label Sep 4, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-11 Automatically create a backport for the stable-10 branch check-before-release PR will be looked at again shortly before release and merged if possible. integration tests/integration module module needs_revision This PR fails CI tests or a maintainer has requested a review/revision of the PR new_contributor Help guide this first time contributor plugins plugin (any type) stale_ci CI is older than 7 days, rerun before merging tests tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants