@mpenkov
Collaborator

@mpenkov mpenkov commented Mar 5, 2024

The functional design in ssh.py was broken.

All other modules share these design characteristics:

  • module.open: accepts module-specific keyword parameters
  • module.open_uri function: accepts a URI and transport_params dict, function signature common across all modules
  • module.open_uri unpacks transport_params dict and passes it to module.open

The SSH submodule, on the other hand, violates these characteristics: ssh.open_uri passes transport_params to ssh.open as-is, without unpacking them. It looks like this snuck into the code in commit 4e67683 and was then developed further more recently in 269c3a2.
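As a rough illustration of the shared convention described above, here is a minimal sketch. The names `ssh_open` and `ssh_open_uri`, the parameter list, and the returned dict are hypothetical stand-ins, not the actual smart_open code; the point is only that the URI-level function unpacks `transport_params` into keyword arguments rather than forwarding the dict whole.

```python
from urllib.parse import urlsplit


def ssh_open(path, mode="r", host=None, user=None, password=None, port=22,
             connect_kwargs=None):
    """Hypothetical stand-in for module.open: module-specific keyword parameters."""
    # A real implementation would connect here; this sketch only echoes its inputs.
    return {"path": path, "mode": mode, "host": host, "user": user,
            "password": password, "port": port,
            "connect_kwargs": dict(connect_kwargs or {})}


def ssh_open_uri(uri, mode, transport_params):
    """Hypothetical stand-in for module.open_uri: common signature across modules."""
    parts = urlsplit(uri)
    kwargs = {"host": parts.hostname, "user": parts.username,
              "password": parts.password, "port": parts.port or 22}
    # Unpack transport_params into keyword arguments instead of passing the dict as-is.
    kwargs.update(transport_params)
    return ssh_open(parts.path, mode=mode, **kwargs)


result = ssh_open_uri("ssh://alice@example.com/data/file.txt", "rb",
                      {"connect_kwargs": {"timeout": 10}})
```

With this shape, an unknown key in `transport_params` fails loudly as an unexpected keyword argument instead of being silently carried along.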

This PR brings ssh.py back in line with the common design characteristics shared by other submodules.

@mpenkov
Collaborator Author

mpenkov commented Mar 5, 2024

@mrk-its and @wbeardall Can you please review?

Contributor

@wbeardall wbeardall left a comment

Looks good to me, except that the docstring in ssh.py (lines 261-262) should change

"""
If ``username`` or ``password`` are specified in *both* the uri and
``transport_params``, ``transport_params`` will take precedence
"""

to

"""
If ``username`` or ``password`` are specified *both* as function arguments 
and in ``connect_kwargs``, ``connect_kwargs`` will take precedence.
"""
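The precedence rule stated in the proposed docstring could be sketched as below. `build_connect_args` is a hypothetical helper invented for illustration and is not part of ssh.py; it only shows one way that "``connect_kwargs`` will take precedence" can be implemented.

```python
def build_connect_args(username=None, password=None, connect_kwargs=None):
    """Hypothetical helper: merge explicit arguments with connect_kwargs."""
    merged = {"username": username, "password": password}
    # Entries in connect_kwargs are applied last, so they win on conflict.
    merged.update(connect_kwargs or {})
    return merged


args = build_connect_args(username="alice", password="from-arg",
                          connect_kwargs={"password": "from-kwargs"})
```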

if connect_kwargs:
    connect_kwargs = copy.deepcopy(connect_kwargs)
else:
    connect_kwargs = {}
Collaborator

can there be mutable values inside connect_kwargs? if it's a simple dict[str, str] or so, a mere connect_kwargs = connect_kwargs.copy() would suffice

Contributor

There can technically be mutable values in connect_kwargs; key_filename could potentially be a list or other mutable iterable (see SSHClient). Using the regular copy should be fine, as I don't think Paramiko will ever modify the provided iterable, but I think I just left it as a deepcopy as a failsafe.
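To make the difference concrete, here is a small sketch of how the two kinds of copy behave when a value such as key_filename is a list. The dict contents are made up for illustration.

```python
import copy

original = {"key_filename": ["~/.ssh/id_a", "~/.ssh/id_b"], "timeout": 10}
shallow = original.copy()          # new dict, same list object inside
deep = copy.deepcopy(original)     # new dict and a new list

# Mutate the caller's list after the copies are taken.
original["key_filename"].append("~/.ssh/id_c")

shallow_sees_change = len(shallow["key_filename"]) == 3
deep_sees_change = len(deep["key_filename"]) == 3
```

Only the deep copy is isolated from later changes to the caller's list, which is the failsafe behaviour mentioned above.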

Contributor

I have code that sets connect_kwargs["pkey"] as an instance of a Paramiko PKey subclass, as well as connect_kwargs["sock"] to an instance of Paramiko ProxyCommand, so those are likely mutable. They definitely shouldn't change though.

Collaborator

hypothetically: if you'd want to share a sock across multiple threads (assuming sock is thread-safe), wouldn't deepcopy be unwanted here?

here is such a scenario but with s3. if smart_open would be calling deepcopy on my boto client in every call to open, that would defeat the purpose
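The sharing concern can be illustrated with a plain Python object. `SharedResource` is a hypothetical stand-in for something like a sock or a client that the caller wants reused across calls; it is not a Paramiko or boto class.

```python
import copy


class SharedResource:
    """Hypothetical stand-in for an object meant to be reused across calls."""


resource = SharedResource()
params = {"sock": resource}

shallow = params.copy()
deep = copy.deepcopy(params)

# A shallow copy keeps pointing at the caller's object; a deep copy does not.
shallow_is_shared = shallow["sock"] is resource
deep_is_shared = deep["sock"] is resource
```

So a deep copy on every call would hand each call its own private object, which is the opposite of what a caller sharing one instance intends.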

Contributor

Might be a moot point: paramiko itself isn't thread-safe. The recommended solution is to open a new connection entirely in each thread (so in smart_open implementation terms, ignore the cached connection). So the smart_open implementation already isn't thread-safe.

I would assume, though can't find any documentation, that this implies ProxyCommand is also not thread-safe. Looking at the implementation of ProxyCommand, I'm not even sure what deepcopying it would do - it's basically a wrapper around a spawned subprocess, and I don't know enough about the Python (deep)copy implementation to know if deepcopying would spawn a new instance of the subprocess to give to the deepcopy, or if it'd preserve the reference to the original spawned process and just pass the same subprocess to the deepcopy.
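On the general question of what deepcopy does with an ordinary object: for a plain Python class, `copy.deepcopy` rebuilds the instance from its state without calling `__init__` again. The sketch below uses a hypothetical `FakeProxy` class whose constructor merely counts how often it runs; it says nothing about whether the real ProxyCommand can be deep-copied at all, which is not verified here.

```python
import copy


class FakeProxy:
    """Hypothetical stand-in for a wrapper that starts something in __init__."""

    starts = 0

    def __init__(self, command):
        self.command = command
        FakeProxy.starts += 1  # stands in for spawning a subprocess


proxy = FakeProxy("nc example.com 22")
clone = copy.deepcopy(proxy)  # copies attributes; __init__ is not re-run
```

Under this assumption the copy would not trigger a second spawn through the constructor; whatever the attributes hold is copied (or fails to copy) on its own terms.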

…o fix-ssh

* 'develop' of https://github.com/piskvorky/smart_open: (66 commits)
  Optimize forward seeks within buffered data to avoid redundant GET (#892)
  Add macos to CI (#891)
  Simplify CI, use uv (#890)
  [s3] Improve handling of InvalidRange and seek on empty file (#889)
  Protect against hanging tests (#888)
  Bump the github-actions group with 2 updates (#886)
  build: fix invalid `fallback_version` when builing with `uv` (#884)
  Remove travis leftover (#881)
  Disambiguate URI examples in README.rst (#879)
  Update CHANGELOG.md
  Add .xz and increase performance of compression module (#875)
  Bump pypa/gh-action-pypi-publish in /.github/workflows (#878)
  Bump actions/checkout from 4 to 5 in the github-actions group (#877)
  Fix release.sh for the final merge back into develop (#872)
  Update CHANGELOG.md
  Drop 3.7 support in pyproject.toml (#871)
  Fix CI badge (#869)
  Bump softprops/action-gh-release in the github-actions group (#867)
  Fix release.sh merge message and final merge (#868)
  Update CHANGELOG.md
  ...
@ddelange
Collaborator

these fixes were incorporated in #849

@ddelange ddelange closed this Oct 12, 2025
@ddelange ddelange deleted the fix-ssh branch October 12, 2025 10:26