Skip to content
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
19 changes: 14 additions & 5 deletions smart_open/ssh.py
Original file line number Diff line number Diff line change
Expand Up @@ -22,6 +22,7 @@

"""

import copy
import getpass
import os
import logging
Expand Down Expand Up @@ -162,6 +163,14 @@ def _maybe_fetch_config(host, username=None, password=None, port=None, connect_k
#
actual_hostname = ""

#
# Avoid modifying the caller's copy of connect_kwargs, as that would create
# an unexpected side-effect.
#
if connect_kwargs:
connect_kwargs = copy.deepcopy(connect_kwargs)
else:
connect_kwargs = {}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can there be mutable values inside connect_kwargs? if its a simple dict[str, str] or so, a mere connect_kwargs = connect_kwargs.copy() would suffice

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There can technically be mutable values in connect_kwargs; key_filename could potentially be a list or other mutable iterable (see SSHClient). Using the regular copy should be fine, as I don't think Paramiko will ever modify the provided iterable, but I think I just left it as a deepcopy as a failsafe.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have code that sets connect_kwargs["pkey"] as an instance of a Paramiko PKey subclass, as well as connect_kwargs["sock"] to an instance of Paramiko ProxyCommand, so those are likely mutable. They definitely shouldn't change though.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hypothetically: if you'd want to share a sock across multiple threads (assuming sock is thread-safe), wouldn't deepcopy be unwanted here?

here is such a scenario but with s3. if smart_open would be calling deepcopy on my boto client in every call to open, that would defeat the purpose

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be a moot point, paramiko itself isn't thread-safe. The recommended solution is to open a new connection entirely in each thread (so in smart_open implementation terms, ignore the cached connection). So the smart_open implementation already isn't thread-safe.

I would assume, though can't find any documentation, that this implies ProxyCommand is also not thread-safe. Looking at the implementation of ProxyCommand, I'm not even sure what deepcopying it would do - it's basically a wrapper around a spawned subprocess, and I don't know enough about the Python (deep)copy implementation to know if deepcopying would spawn a new instance of the subprocess to give to the deepcopy, or if it'd preserve the reference to the original spawned process and just pass the same subprocess to the deepcopy.

for config_filename in config_files:
try:
cfg = paramiko.SSHConfig.from_path(config_filename)
Expand Down Expand Up @@ -192,14 +201,14 @@ def _maybe_fetch_config(host, username=None, password=None, port=None, connect_k
# that the identityfile list has len > 0. This should be redundant, but
# keeping it for safety.
#
if connect_params.get("key_filename") is None:
if connect_kwargs.get("key_filename") is None:
identityfile = cfg.get("identityfile", [])
if len(identityfile):
connect_params["key_filename"] = identityfile
connect_kwargs["key_filename"] = identityfile

for param_name, (sshcfg_name, from_str) in _PARAMIKO_CONFIG_MAP.items():
if connect_params.get(param_name) is None and sshcfg_name in cfg:
connect_params[param_name] = from_str(cfg[sshcfg_name])
if connect_kwargs.get(param_name) is None and sshcfg_name in cfg:
connect_kwargs[param_name] = from_str(cfg[sshcfg_name])

#
# Continue working through other config files, if there are any,
Expand Down Expand Up @@ -266,7 +275,7 @@ def open(
the local ~/.ssh/known_hosts *automatically*.

If ``username`` or ``password`` are specified in *both* the uri and
``transport_params``, ``transport_params`` will take precedence
``connect_kwargs``, ``connect_kwargs`` will take precedence
"""
host, user, password, port, connect_kwargs = _maybe_fetch_config(
host, user, password, port, connect_kwargs
Expand Down
2 changes: 1 addition & 1 deletion tests/test_smart_open.py
Original file line number Diff line number Diff line change
Expand Up @@ -474,7 +474,7 @@ def test_read_ssh(self, mock_open):
obj = smart_open.open(
"ssh://ubuntu:pass@ip_address:1022/some/path/lines.txt",
mode='rb',
transport_params=dict(hello='world'),
transport_params={'connect_kwargs': {'hello': 'world'}},
)
obj.__iter__()
mock_open.assert_called_with(
Expand Down
Loading