
Conversation

@BioQwer commented Aug 26, 2025

I have checked this in k8s:

  1. the hub is running
  2. a lab is created
  3. the hub is rebooted
  4. the lab is still working

The review thread below is attached to these lines of the diff:
if self.working_dir:
    self.working_dir = self._expand_user_properties(self.working_dir)
if self.port == 0:
    # Prefer reading the port from the persisted server record, if available
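
The lines added by the PR itself are not visible in this excerpt; judging from the comment above and the discussion below, they presumably read roughly like the following sketch (an assumption, not the actual diff):

if self.port == 0:
    # Prefer reading the port from the persisted server record, if available
    if self.server and self.server.port:
        self.port = self.server.port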
Member
This would not usually be correct. The persisted server.port is the connect port, whereas self.port sets the bind port (should almost never have a value other than the default in kubespawner).

Plus, access to the db from Spawners is deprecated and discouraged, so I don't think we should add this.

Can you share more about what problem this aims to solve? Maybe the answer is somewhere else.

Author
@minrk

Can you share more about what problem this aims to solve?

Yes, of course.

Context

We run Hadoop and Spark on YARN.
To simplify the deployment of a Spark driver in JupyterLab we set the following
KubeSpawner options:

c.KubeSpawner.extra_pod_config = {
    # run the Lab pod in the node's network namespace so the Spark driver
    # is reachable from the YARN cluster
    "hostNetwork": True,
    # the DNS policy Kubernetes recommends when hostNetwork is enabled
    "dnsPolicy": "ClusterFirstWithHostNet",
}

Screenshot

Situation

When we use a static port, only one Lab instance can be started per node,
which is insufficient for our workload.
This issue first appeared when we had 5 nodes serving 15 Hub users.

To work around the limitation we introduced a pre‑spawn hook that selects a
random port in the range 9000‑9299:

import random

def my_pre_spawn_hook(spawner):
    """Choose a random port for the notebook server."""
    spawner.port = random.choice([9000 + i for i in range(300)])
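
The hook is registered with the standard pre_spawn_hook option; the exact wiring in jupyterhub_config.py shown here is a sketch:

c.KubeSpawner.pre_spawn_hook = my_pre_spawn_hook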

The Hub remembers the chosen port and uses it for health‑check requests, so
the solution works well—provided the Hub process stays alive.

If the Hub pod restarts, it loses the mapping of Lab instances to their
assigned ports, and the health checks start failing.
We also tried falling back to the default port 8888 for the health check,
but when no response is received the Hub deletes the Lab that is running on
the random port, which is not the behavior we want.


Key points

  • Problem: Static ports restrict us to a single Lab per node.
  • Current workaround: Randomly assign a port in a pre‑spawn hook and store
    it in the Hub for health checks.
  • Failure mode: The mapping is lost when the Hub pod restarts, causing
    orphaned Lab pods or premature deletions.

Feel free to let me know if any part needs more detail!

Member
Thanks for clarifying that this is about host networking. Do you see "url changed!" in your logs when the hub restarts?

I think you're right that it will do the wrong thing if .port is specified from anything other than static config, specifically here it assumes get_pod_url() is right (uses self.port by default) and the persisted db value in self.server is wrong, but in your case, it is the opposite.

That code is really there to deal with cluster networking changes, so maybe we should either remove or reverse the port logic, leaving only the host? Either that or persist/restore self.port in load_state/get_state. I doubt persisting self.port is right, though. I'll need to think through some cases to know which is right. Removing the port check is the simplest and usually probably the right thing to do.
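
For reference, the load_state/get_state option mentioned above would look roughly like the sketch below, using JupyterHub's standard Spawner state hooks; the class name is illustrative, and the comment above already doubts this is the right fix:

from kubespawner import KubeSpawner

class PortPersistingSpawner(KubeSpawner):
    def get_state(self):
        state = super().get_state()
        # remember the (possibly random) bind port across hub restarts
        state["port"] = self.port
        return state

    def load_state(self, state):
        super().load_state(state)
        if "port" in state:
            self.port = state["port"]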

Member
Actually, I think I know what would be simplest and most correct: add self.port to the pod manifest in the annotation hub.jupyter.org/port, then retrieve that in _get_pod_url instead of using self.port unconditionally. self.port can then be a fallback if undefined (e.g. across an upgrade).

Do you want to have a go at tackling that? If not, I can probably do it.
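
For concreteness, the annotation approach could look roughly like the helpers below. This is a sketch only, not kubespawner's actual get_pod_manifest/_get_pod_url code; the helper names and dict layout are assumptions:

PORT_ANNOTATION = "hub.jupyter.org/port"

def annotate_port(pod_manifest: dict, port: int) -> dict:
    """Record the spawner's bind port on the pod so it survives hub restarts."""
    metadata = pod_manifest.setdefault("metadata", {})
    metadata.setdefault("annotations", {})[PORT_ANNOTATION] = str(port)
    return pod_manifest

def pod_url(pod: dict, fallback_port: int) -> str:
    """Build the connect URL from the running pod, preferring the annotated port."""
    annotations = (pod.get("metadata") or {}).get("annotations") or {}
    port = int(annotations.get(PORT_ANNOTATION, fallback_port))
    return f"http://{pod['status']['podIP']}:{port}"

A pod created before the annotation existed simply has no hub.jupyter.org/port annotation, so the lookup falls back to self.port, which covers the upgrade case mentioned above.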

Author
is right (uses self.port by default) and the persisted db value in self.server is wrong, but in your case, it is the opposite.

Yes, it is persisted, but after the hub restarts, the hub clears the port value.

That code is really there to deal with cluster networking changes, so maybe we should either remove or reverse the port logic, leaving only the host?

We can't drop the port entirely.
The hub has to know the port value for the liveness probe:
http://<k8s_ip>:<port>/api/

Should I fix anything to get this PR merged?

Author
I debugged this situation.

before

  1. the hub is running
  2. user_1 starts a pod on random port 1234
  3. the hub persists port 1234 in the db
  4. user_1's pod is running
  5. the hub restarts
  6. the hub rewrites the port in the db to the default 8888, but the ip stays correct
  7. the hub sees user_1's pod
  8. the hub runs its liveness check against k8s_private_ip:8888/api
  9. the hub kills user_1's pod

after

  1. the hub is running
  2. user_1 starts a pod at random port 1234
  3. the hub persists port 1234 in the db
  4. user_1's pod is running
  5. the hub restarts
  6. the hub reads the ip and port 1234 back from the db
  7. the hub sees user_1's pod
  8. the hub runs its liveness check against k8s_private_ip:1234/api
  9. user_1's pod keeps running -> user_1 is happy

The meaning of self.port is confused here - it is not the port used to connect, it is the port used by the process to bind.

It is only used to recover the previous configuration of an already-started pod.

The issue is the connect port is not persisted properly, and _get_pod_url always retrieves the self.port config, which can change, but should not be permitted to change while a pod is running.

I resolved that by persisting it in the db, with the change here.

I suggested the fix of persisting self.port in the pod's annotations in get_pod_manifest and using the annotation in _get_pod_url, which I believe should solve the problem here. It's not that self.port can change, it's that self.port is used in _get_pod_url, since changing it is normal.

Why should I get it from get_pod_manifest if the hub is reading it from the db?

Member

@minrk commented Sep 6, 2025
The problem is in get_pod_url using self.port instead of the actual port when the pod is running. Fixing that will fix the problem. Relying on deprecated db access will eventually break, and is not the right thing to do when the pod is not running. The fix is to persist the port in the pod manifest via the annotation, so get_pod_url gets the right value, and self.port config will still have the right effect rather than being overridden.

Another, smaller fix would be to replace the netloc check with only a hostname check, so that we don't rewrite the port. I'm not sure if there are situations where the port could change, but we know there are where the ip changes.
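
For reference, the smaller fix described above (comparing only the hostname rather than the full netloc) could look roughly like this; the function name is illustrative, not kubespawner's actual code:

from urllib.parse import urlparse

def host_changed(persisted_url: str, pod_url: str) -> bool:
    # compare hostnames only, ignoring the port, so a non-default bind port
    # does not make the hub think the server's URL has changed
    return urlparse(persisted_url).hostname != urlparse(pod_url).hostname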

Author
The problem is in get_pod_url using self.port instead of the actual port when the pod is running.

Are you sure that this will fix it?
What will we do if it is not fixed?

Member
Are you sure that this will fix it?

I believe it will

What will we do if it is not fixed?

Keep working to fix it

Author
@minrk I understand, but I don't have time to refactor this.
It has been working in production for 2 months.
Many people are not investing time in JupyterHub because they hit this problem.
