
Conversation

ngehrsitz

When running solarman_rtu_proxy.py continuously from the Docker container, I encountered a crash loop:

OSError: [Errno 24] No file descriptors available
ERROR:asyncio:Unhandled exception in client_connected_cb
transport: <_SelectorSocketTransport fd=7 read=polling write=<idle, bufsize=0>>
Traceback (most recent call last):
  File "/usr/local/lib/python3.13/site-packages/utils/solarman_rtu_proxy.py", line 29, in handle_client
    solarmanv5 = PySolarmanV5Async(
        logger_address, logger_serial, verbose=True, auto_reconnect=True
    )
  File "/usr/local/lib/python3.13/site-packages/pysolarmanv5/pysolarmanv5_async.py", line 66, in __init__
    self.data_wanted_ev = Event()
                          ~~~~~^^
  File "/usr/local/lib/python3.13/multiprocessing/context.py", line 93, in Event
    return Event(ctx=self.get_context())
  File "/usr/local/lib/python3.13/multiprocessing/synchronize.py", line 331, in __init__
    self._cond = ctx.Condition(ctx.Lock())
                 ~~~~~~~~~~~~~^^^^^^^^^^^^
  File "/usr/local/lib/python3.13/multiprocessing/context.py", line 78, in Condition
    return Condition(lock, ctx=self.get_context())
  File "/usr/local/lib/python3.13/multiprocessing/synchronize.py", line 221, in __init__
    self._sleeping_count = ctx.Semaphore(0)
                           ~~~~~~~~~~~~~^^^
  File "/usr/local/lib/python3.13/multiprocessing/context.py", line 83, in Semaphore
    return Semaphore(value, ctx=self.get_context())
  File "/usr/local/lib/python3.13/multiprocessing/synchronize.py", line 133, in __init__
    SemLock.__init__(self, SEMAPHORE, value, SEM_VALUE_MAX, ctx=ctx)
    ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.13/multiprocessing/synchronize.py", line 57, in __init__
    sl = self._semlock = _multiprocessing.SemLock(
                         ~~~~~~~~~~~~~~~~~~~~~~~~^
        kind, value, maxvalue, self._make_name(),
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        unlink_now)
        ^^^^^^^^^^^

This PR:

  1. Exits with a non-zero status code, causing the Docker container to be restarted instead of continuously printing the error message.
  2. Switches to a Debian-based base image, since only Alpine appears to be affected by this issue: https://stackoverflow.com/questions/77679957/python-multiprocessing-no-file-descriptors-available-error-inside-docker-alpine
    https://gitlab.alpinelinux.org/alpine/aports/-/issues/15651

@githubDante
Collaborator

Hi,

The problem is caused by the Pysolarman initialization for each new client/connection, which at some point leads to semaphore exhaustion (according to the linked posts). Modifying the proxy to use a single instance of Pysolarman should fix it.
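
For illustration, the single-instance idea looks roughly like this. This is only a sketch: the constructor arguments mirror the existing proxy, but the async send_raw_modbus_frame() call is an assumption about the library API, and real framing/error handling is omitted.

import asyncio
from functools import partial

from pysolarmanv5 import PySolarmanV5Async


async def handle_client(solarman, lock, reader, writer):
    peer = writer.get_extra_info("peername")
    print(f"{peer}: New connection")
    try:
        while frame := await reader.read(1024):
            async with lock:  # serialize access to the shared logger connection
                reply = await solarman.send_raw_modbus_frame(frame)  # assumed async API
            writer.write(reply)
            await writer.drain()
    finally:
        print(f"{peer}: Connection closed")
        writer.close()


async def run_proxy(bind, port, logger_address, logger_serial):
    # One shared instance for the lifetime of the proxy instead of one per client,
    # so no multiprocessing synchronization primitives pile up per connection.
    solarman = PySolarmanV5Async(logger_address, logger_serial, auto_reconnect=True)
    await solarman.connect()
    lock = asyncio.Lock()
    server = await asyncio.start_server(partial(handle_client, solarman, lock), bind, port)
    async with server:
        await server.serve_forever()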

@githubDante
Collaborator

Hi,

Can you try this version of the proxy on Alpine and see how it handles multiple connections / longer operating intervals?

@ngehrsitz
Author

Hi @githubDante,
unfortunately, your version does not work at all with evcc as a client. I can see that some data is sent, but I just get I/O timeouts:

('127.0.0.1', 57492): New connection
DEBUG:pysolarmanv5.pysolarmanv5:[2967017391] SENT: a5 17 00 10 45 29 00 af 17 d9 b0 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01 03 02 65 00 01 95 ad 94 15
('127.0.0.1', 57492): Connection closed
read failed: read tcp 127.0.0.1:57454->127.0.0.1:1502: i/o timeout
Power:          read failed: read tcp 127.0.0.1:57454->127.0.0.1:1502: i/o timeout
Energy:         read failed: read tcp 127.0.0.1:57456->127.0.0.1:1502: i/o timeout
Current L1..L3: read failed: read tcp 127.0.0.1:57460->127.0.0.1:1502: i/o timeout
read failed: read tcp 127.0.0.1:57463->127.0.0.1:1502: i/o timeout

I got slightly further with f889e55, but the reconnect does not work:
('127.0.0.1', 52834): New connection

DEBUG:pysolarmanv5.pysolarmanv5:[2967017391] SENT: a5 17 00 10 45 64 00 af 17 d9 b0 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01 03 02 65 00 01 95 ad cf 15
WARNING:asyncio:socket.send() raised exception.
('127.0.0.1', 52834): Connection closed

@davidrapan
Contributor

evcc supports the Solarman protocol?

@ngehrsitz
Author

evcc supports the Solarman protocol?

No, it uses Modbus RTU over TCP, which is why I am running solarman_rtu_proxy.py in the first place.

@davidrapan
Contributor

Ah, I mixed up the proxy "type", my bad. :)

@githubDante
Collaborator

Hi @githubDante, unfortunately your version does not work at all with evcc as a client. I can see that some data is sent, but I just get IO timeouts

This doesn't sound good. Maybe I missed something. I will check and update when I have more info.

@davidrapan
Contributor


Yup, can confirm:

Listening on 0.0.0.0:8899
('192.168.144.208', 46344): New connection
DEBUG:pysolarmanv5.pysolarmanv5:[????????????] SENT: a5 33 00 10 45 ........................................
('192.168.144.208', 46344): Connection closed
('192.168.144.208', 46318): New connection
DEBUG:pysolarmanv5.pysolarmanv5:[????????????] SENT: a5 33 00 10 45 ........................................
('192.168.144.208', 36698): Connection closed

Edit: And Wireshark doesn't show any communication proxy <-> logger, only client <-> proxy.

@githubDante
Collaborator

The connect call was missing. The branch has been updated. I will try to test in docker as well.

@davidrapan
Contributor


('192.168.144.208', 47340): New connection
DEBUG:pysolarmanv5.pysolarmanv5:[????????????] SENT: a5 17 00 10 45 ........................................
DEBUG:pysolarmanv5.pysolarmanv5:[????????????] RECD: a5 41 00 10 15 ........................................

👍

@githubDante
Collaborator

Also a fix for the reconnect issue has been added.

@ngehrsitz, can you try again? Thank you in advance.

    asyncio.run(run_proxy(args.bind, args.port, args.logger, args.serial))
except Exception as e:
    print(f"Exception: {e}")
    sys.exit(1)
@davidrapan
Contributor

This is a completely unnecessary call.

@ngehrsitz
Author

I also thought that an exception should cause the script to crash with a non-zero exit code. But that was not the case; instead, it kept printing the exception. I suspect this has something to do with how these scripts are made executable:

[project.scripts]
solarman-decoder = "utils.solarman_decoder:main"
solarman-rtu-proxy = "utils.solarman_rtu_proxy:main"
solarman-scan = "utils.solarman_scan:main"
solarman-uni-scan = "utils.solarman_uni_scan:main"
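
For reference, the wrapper that pip generates for a [project.scripts] entry point is roughly this (simplified):

import sys

from utils.solarman_rtu_proxy import main

if __name__ == "__main__":
    sys.exit(main())

So whatever main() does with the exception determines the exit status the container sees.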

@davidrapan
Contributor

Hmmm.. 🤔

@ngehrsitz
Author

Also a fix for the reconnect issue has been added.

@ngehrsitz can you try again. Thank you in advance.

It starts now. Let's see if it is stable. Usually, crashes occur after one or two days of runtime.

@ngehrsitz
Author

Also a fix for the reconnect issue has been added.
@ngehrsitz can you try again. Thank you in advance.

It starts now. Let's see if it is stable. Usually, crashes occur after one or two days of runtime.

@githubDante There is still some sort of locking issue. The client can only get data on the first connection. As soon as you reconnect, it stops working.
Besides, why are you using threading.Lock and not asyncio.Lock?
I also don't understand why you go to the effort of doing all of the connection setup inside handle_client. Wouldn't it be much simpler to connect once before startup, like in my variant? Then we would only need to fix the bugs in the reconnect logic.
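
On the locking point, my concern is that threading.Lock.acquire() blocks the whole event loop, while asyncio.Lock only suspends the coroutine that is waiting. A toy illustration (not the proxy code):

import asyncio


async def worker(lock: asyncio.Lock, name: str):
    async with lock:  # waiting here yields control back to the event loop
        await asyncio.sleep(1)
        print(f"{name} done")


async def main():
    lock = asyncio.Lock()
    # The three workers take turns, and other tasks keep running while each waits.
    await asyncio.gather(*(worker(lock, f"client {i}") for i in range(3)))
    # A threading.Lock used the same way would block the event loop itself inside
    # acquire(), so a second client waiting on it could stall the whole proxy.


asyncio.run(main())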

@githubDante
Collaborator

Hi,

I don't see it lock up in my test environment, but I have to admit that I'm using a simple echo server to simulate the datalogger socket. I just added a lower socket timeout during initialization and extra exception handling, to avoid the proxy being locked up by a real datalogger while it is waiting for a response (the lock can be held for too long with the default settings).
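
To sketch the idea in isolation (names and the timeout value are illustrative, not the branch's actual code):

import asyncio

CONNECT_TIMEOUT = 5  # seconds; illustrative value


async def ensure_connected(solarman, lock, state):
    # (Re)connect lazily under the lock, but never hold the lock longer than
    # CONNECT_TIMEOUT if the datalogger does not respond.
    async with lock:
        if not state["connected"]:
            try:
                await asyncio.wait_for(solarman.connect(), CONNECT_TIMEOUT)
                state["connected"] = True
            except (asyncio.TimeoutError, OSError) as e:
                print(f"Datalogger unreachable: {e}")
                raise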

Besides, why are you using threading.Lock and not asyncio.Lock?

Don't know, it's just a habit, I guess.

I also don't understand why you go to the effort of doing all of the connection setup inside handle_client.

For control over the PySolarman instance. Since the handle_client method is the main driving force of the proxy, we can check the state of PySolarman when a new client connects and prepare it to serve the requests the client will start issuing in the next moment. Otherwise, extra effort would be needed elsewhere (extra methods or tasks in the asyncio loop that check its state).

Wouldn't it be much simpler to connect once before startup, like in my variant? Then we would only need to fix the bugs in the reconnect logic.

It's basically the same at runtime: a single connect (_solarman_init acts like a singleton) and then state condition checks via _solarman_connect.
