Skip to content

Conversation

ilia-db
Copy link
Contributor

@ilia-db ilia-db commented Aug 25, 2025

Changes

  1. "ssh" and "ssh setup" commands PR: Add "ssh" and "ssh setup" commands #3470
  2. "ssh connect" PR: Add "ssh connect" command #3471

This PR is based on the above, it adds "ssh server" command and related logic and finishes the "ssh" implementation by adding ssh command group to the root cmd (it's still hidden from the --help output).

Server overview:

  • The server itself is started by the client running the ssh-server-bootsrap.py job
    • The client passes inputs to this file as jobs API args, and the python logic gets the values using dbutils widgets
    • One interesting part is setup_subreaper method, which ensures no child process leave the parent process group. This is necessary in order to prevent background processes being reparented to pid 1 when their parent ssh server session terminates, after which they loose access to wsfs and dbfs fuse mounts (as access is based on process groups). All major IDEs usually try to spawn their server processes as a background processes (by double forking for example).
  • Server creates one sshd config for all future sshd processes in the home folder
  • It spawns sshd with -i flag, making it run as in non-daemon mode and communicating with it through stdin and our (and so sshd doesn't listen on any ports)
  • Separate sshd process is spawned for each new websocket connection (but the sshd config is re-used), there are limit of 10 connections (for no particular reason)
  • The server start a shutdown timer when there are not active connections, after which it terminates itself, which terminates the job, and the cluster idle timeout can start ticking after that
  • The server also places jupyter-init.py file to the $HOME/.ipython/profile_default/startup. This file makes jupyter kernels a bit more similar to webapp notebooks - it provides spark and dbutils globals, plus handles %sql and %pip magic commands

WIP: unit tests

Tests

Manual and unit

@eng-dev-ecosystem-bot
Copy link
Collaborator

eng-dev-ecosystem-bot commented Aug 25, 2025

Run: 17551494331

Env ✅​pass 🙈​skip
✅​ aws linux 308 519
✅​ aws windows 309 518
✅​ aws-ucws linux 420 417
✅​ aws-ucws windows 421 416
✅​ azure linux 308 518
✅​ azure windows 309 517
✅​ azure-ucws linux 420 416
✅​ azure-ucws windows 421 415
✅​ gcp linux 307 520
✅​ gcp windows 308 519

@ilia-db ilia-db force-pushed the ssh-tunnel-server branch from 8c90ef8 to e6fb1d0 Compare August 25, 2025 13:06
@ilia-db ilia-db temporarily deployed to test-trigger-is August 25, 2025 13:06 — with GitHub Actions Inactive
@ilia-db ilia-db temporarily deployed to test-trigger-is August 25, 2025 13:08 — with GitHub Actions Inactive
@ilia-db ilia-db requested a review from fjakobs August 26, 2025 08:49
@ilia-db ilia-db force-pushed the ssh-tunnel-server branch from a0f94b7 to fbbc3db Compare August 27, 2025 10:17
@ilia-db ilia-db temporarily deployed to test-trigger-is August 27, 2025 10:17 — with GitHub Actions Inactive
@ilia-db ilia-db force-pushed the ssh-tunnel-server branch from fbbc3db to 17c3059 Compare August 27, 2025 10:22
@ilia-db ilia-db temporarily deployed to test-trigger-is August 27, 2025 10:22 — with GitHub Actions Inactive
@ilia-db ilia-db temporarily deployed to test-trigger-is August 27, 2025 10:36 — with GitHub Actions Inactive
@ilia-db ilia-db force-pushed the ssh-tunnel-connect branch from b222dae to 421c153 Compare August 27, 2025 10:46
@ilia-db ilia-db force-pushed the ssh-tunnel-server branch from 50eeb53 to 48eaf73 Compare August 27, 2025 10:46
@ilia-db ilia-db temporarily deployed to test-trigger-is August 27, 2025 10:47 — with GitHub Actions Inactive
@ilia-db ilia-db force-pushed the ssh-tunnel-connect branch from 421c153 to 7d74402 Compare August 27, 2025 11:01
@ilia-db ilia-db force-pushed the ssh-tunnel-server branch from 48eaf73 to de12be3 Compare August 27, 2025 11:02
@ilia-db ilia-db temporarily deployed to test-trigger-is August 27, 2025 11:02 — with GitHub Actions Inactive
@ilia-db ilia-db force-pushed the ssh-tunnel-connect branch from 7d74402 to a3e7567 Compare August 27, 2025 11:18
@ilia-db ilia-db force-pushed the ssh-tunnel-server branch from de12be3 to 6f8a52e Compare August 27, 2025 11:29
@ilia-db ilia-db temporarily deployed to test-trigger-is August 27, 2025 11:29 — with GitHub Actions Inactive
@ilia-db ilia-db force-pushed the ssh-tunnel-connect branch from a3e7567 to 33d5bda Compare August 27, 2025 12:42
@ilia-db ilia-db force-pushed the ssh-tunnel-server branch from 6f8a52e to febaca3 Compare August 27, 2025 12:42
@ilia-db ilia-db temporarily deployed to test-trigger-is August 27, 2025 12:42 — with GitHub Actions Inactive
github-merge-queue bot pushed a commit that referenced this pull request Sep 1, 2025
## Changes

This is the first PR that introduces the ssh-tunnel functionality into
the CLI

See README for an overview of the SSH tunnel feature and its usage. 

This PR only adds `ssh setup` command, `connect` and `server` are here:
- #3471
- #3475

The whole `ssh` command group is hidden (not available in `--help`
output, and in this PR it's not even added to the root cmd), and the
usage text has private preview disclaimer. Hidden flag will be removed
when we move to public preview phase (no timeline yet).

`setup` command updates local ssh config (`~/.ssh/config`). The config
entry relies on `ProxyCommand` option calling `databricks ssh connect`,
which is implemented in a follow up PR.


## Why
We've agreed to move the implementation form the separate repo to the
main databricks CLI

## Tests
Manually and unit tests.

---------

Co-authored-by: Pieter Noordhuis <[email protected]>
Copy link
Contributor

@pietern pietern left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stamp to unblock review of the #3544.

Long: `Run SSH tunnel server.

This command starts an SSH tunnel server that accepts WebSocket connections
and proxies them to local SSH daemon processes.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack. The failure happens somewhere deep in the stack. It would be useful to make this error more explicit. If a user runs it and gets some error about a secret missing, that's confusing. If instead they see "This command is supposed to be run on Databricks compute. Please find usage instructions at <>.", that's better.

github-merge-queue bot pushed a commit that referenced this pull request Sep 8, 2025
## Changes
See the base PR for the context: 
- #3470

This PR adds `databricks ssh connect` subcommand and all related
utilities.

The main logic here is in `libs/ssh/client.go` and `libs/ssh/proxy.go`
files.

See `proxy_test.go` for an overview of how client and server interact
with each other through the proxy.

Overview of the `ssh client` logic:
- Generate local ssh keys if necessary
(`~/.databricks/ssh-tunnel-keys/<cluster-id>`)
- Save the public key in the secret scope
(`<username>-<cluster-id>-ssh-tunnel-keys`)
- Upload databricks releases (linux arm and amd) with the marching
version to the /Workspace
- Get /Workspace/metadata.json file with the server port info
- If the metadata is not there, execute `ssh-server-bootsrap.py` file as
a job (it runs `databricks server` command that's implemented in the
follow up PR)
- Get server metadata by sending a driver-proxy request to the known
port, it will return a user name
- Spawn `ssh client` with the right User and a ProxyCommand that
executed `ssh connect --proxy --metadata`
- New instance of the `connect` command can now start a proxy over
websocket connection backed by Driver Proxy API

Overview of the Proxy logic:
- The main interesting part is that the proxy does automatic re-connects
every 30 minutes (configurable) to avoid auth token expiration problems.
In all the tricky places (mostly related to concurrency and locks) there
are comments explaining the logic.

The final follow up PR:
- #3475


## Tests
Manual and unit
Base automatically changed from ssh-tunnel-connect to main September 8, 2025 12:47
@ilia-db ilia-db enabled auto-merge September 8, 2025 13:43
@ilia-db ilia-db added this pull request to the merge queue Sep 8, 2025
Merged via the queue into main with commit a0b55e7 Sep 8, 2025
13 checks passed
@ilia-db ilia-db deleted the ssh-tunnel-server branch September 8, 2025 14:05
github-merge-queue bot pushed a commit that referenced this pull request Sep 10, 2025
## Changes

Previous PRs in the stack:
- #3471
- #3475

Move from flat structure to cmd and internal
client/server/proxy/keys/workspace sub packages.

Moved a bunch of hardcoded values to constants (some hardcoded stuff
remains in the `keys` and `workspace` sub packages, will be improved in
a follow up)

Added unit tests for servers connection manager logic and updated proxy
tests.
The coverage isn't great yet, the goal is to avoid unit tests with
excessive mocking and focus on integration tests from now on.

## Why
`/experimental` already has the databricks-bundles python project, seems
like a good place to put ssh stuff too

## Tests
Unit and manual
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants