
unclearParadigm

Hi @sigaloid,

It has been a while since I initially proposed adding support for proxies. The project was still called LibReddit at that time - see the original request: libreddit/libreddit#841.

Further Issues that relate to that

My instance is getting rate-limited frequently and people are already complaining about it. I don't want to add further VPSs (with new public IPs) just to circumvent the rate-limiting, so I finally decided to patch redlib to support HTTP and SOCKS proxies.

Changes in this MR

Redlib honors the following Environment-Variables:

  • HTTP_PROXY
  • HTTPS_PROXY
  • SOCKS_PROXY

If SOCKS_PROXY is set, it takes precedence over HTTPS_PROXY and HTTP_PROXY. Additionally, the environment variables support authentication using the format scheme://username:password@host:port (e.g. http://proxyUsername:mySuperSecureProxyPassword@localhost:8090).
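To make the URL format above concrete, here is a minimal sketch of how such a value could be split into its parts using only the standard library. It is not the code from this MR: the `ProxyConfig` type and the `parse_proxy_url` function are hypothetical names, and the sketch assumes that credentials are not percent-encoded and that the port is always given explicitly.

```rust
/// Parsed pieces of a proxy URL such as `http://user:pass@localhost:8090`.
/// (Hypothetical type for illustration; not taken from the MR.)
#[derive(Debug, PartialEq)]
struct ProxyConfig {
    scheme: String,
    credentials: Option<(String, String)>,
    host: String,
    port: u16,
}

/// Splits `scheme://[username:password@]host:port` into its parts.
/// Returns `None` if the value does not match that shape.
fn parse_proxy_url(raw: &str) -> Option<ProxyConfig> {
    let (scheme, rest) = raw.trim().split_once("://")?;
    // Ignore any trailing path, e.g. the "/" in "http://host:8080/".
    let authority = rest.split('/').next()?;
    // Split at the last '@' so passwords containing '@' stay intact.
    let (userinfo, host_port) = match authority.rsplit_once('@') {
        Some((info, hp)) => (Some(info), hp),
        None => (None, authority),
    };
    let credentials = match userinfo {
        Some(info) => {
            // Assumption: both username and password are present.
            let (user, pass) = info.split_once(':')?;
            Some((user.to_string(), pass.to_string()))
        }
        None => None,
    };
    // Split at the last ':' so the port is always the final component.
    let (host, port) = host_port.rsplit_once(':')?;
    if host.is_empty() {
        return None;
    }
    Some(ProxyConfig {
        scheme: scheme.to_ascii_lowercase(),
        credentials,
        host: host.to_string(),
        port: port.parse().ok()?,
    })
}
```

A caller would pass the raw value of the environment variable, e.g. `parse_proxy_url("socks5://127.0.0.1:9050")`, and treat `None` as a configuration error.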

For SOCKS support I introduced tokio-socks as a dependency. For HTTP proxy support, I just wrote a simple HTTP CONNECT wrapper. In all cases, consumers of the CLIENT won't notice a difference.

Tests

I have tested HTTP proxying locally with TinyProxy. I have also tested SOCKS proxying with my VPN provider's SOCKS proxy, and Tor proxying through a local tor-socks-proxy container.

I would appreciate it if you could review this MR - and maybe merge it. That'd make the world a better place - or at least get rid of some rate-limits ^^

Cheers,
ruffy

@unclearParadigm
Author

Hey @sigaloid, hope you're doing well. I don't want to rush things here, but it's been around a month without a response to this PR. Is there anything wrong or problematic with the PR? Can/should I change anything?

@uhthomas

uhthomas commented Sep 6, 2025

I've tested the SOCKS5 proxy on my own instance and it works great, thank you :)

@sigaloid
Member

sigaloid commented Sep 9, 2025

This looks great! Thanks for the PR.

Member

@sigaloid sigaloid left a comment


LGTM except a few things - what does the encryption look like for something like this? I'd rather not allow unencrypted HTTP proxies at all for privacy reasons - I don't want to send plaintext traffic under any configuration.

Comment on lines +132 to +134
if !response.starts_with("HTTP/1.1 200") {
    return Err(Box::new(ProxyError(format!("Proxy CONNECT failed: {}", response))));
}
Member


I'm not sure of the particulars regarding HTTP proxies, but I assume this is acceptable as most (all?) proxies support 1.1? Also, are there really no packages that handle the manual TCP transmission 😅 I tried to find one myself last time I tried to tackle this and similarly could not find one. It's also not my area of expertise so 😞
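For illustration only (this is not part of the PR): a slightly more lenient check could parse the status line instead of matching the fixed prefix `HTTP/1.1 200`, so that a proxy answering with `HTTP/1.0 200` is accepted as well. The function name `connect_succeeded` is hypothetical, and accepting any 2xx code is an assumption based on the general HTTP semantics of CONNECT rather than on anything the PR does.

```rust
/// Returns true if the first line of a proxy response is a 2xx status line,
/// e.g. `HTTP/1.1 200 Connection established` or `HTTP/1.0 200 OK`.
/// (Hypothetical helper; the PR itself matches the prefix "HTTP/1.1 200".)
fn connect_succeeded(response: &str) -> bool {
    // Only the status line matters; headers that follow are ignored here.
    let status_line = response.lines().next().unwrap_or("");
    let mut parts = status_line.split_whitespace();

    // First token: protocol version, accepted for HTTP/1.0 and HTTP/1.1.
    let version_ok = parts.next().map_or(false, |v| v.starts_with("HTTP/1."));

    // Second token: numeric status code, accepted if it is in the 2xx range.
    let code_ok = parts
        .next()
        .and_then(|code| code.parse::<u16>().ok())
        .map_or(false, |code| (200..300).contains(&code));

    version_ok && code_ok
}
```

The existing `starts_with` check is stricter, which is a reasonable choice too; the sketch only shows what a version-tolerant variant might look like.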

Author


As much as I am a fan of HTTP > 1.1 and all the innovation that HTTP/2 and HTTP/3 bring - a proxy implementation that supports CONNECT only over 2/3 but not over 1.1 would be a very weird one. It'd be equally weird if a webserver implementation only supported 2/3 and deliberately did not support HTTP 1.1.

I have to admit, I'm not deep enough into Rust and its ecosystem to know all of the libraries flying around. But I did a fair amount of research, and even asked some LLMs (*pilot, *gpt) for potentially existing implementations that tackle HTTP proxying. If you stumble over a lib, I'd be happy to patch it in accordingly. I can very much understand why you don't want that in Redlib ;)

One option is to extract that logic into a library block and make redlib depend on it? Would you feel more comfortable if that logic is not within redlib, but in a separate repo/crate/package/lib/submodule?

@uhthomas

uhthomas commented Sep 9, 2025

Just chiming in to say I'm using a plaintext SOCKS5 proxy. I run redlib in Kubernetes and have a user space wireguard SOCKS5 proxy as a sidecar container. Forced encryption or authentication would be a huge pain and for no benefit.

@sigaloid
Member

sigaloid commented Sep 9, 2025

How about a mandatory flag if your proxy is plaintext, like PLAINTEXT_PROXY=1? It's something I want to avoid accidentally enabling, not something I am morally opposed to. If this is common or expected behavior across similar projects, though, I'd be okay to merge it - any citations for others that do it like this?

@uhthomas

uhthomas commented Sep 9, 2025

I am pretty sure some projects use two different environment variables: HTTP_PROXY and HTTPS_PROXY. That should be pretty clear on whether encryption is intended?

This is how Go handles it: https://cs.opensource.google/go/x/net/+/internal-branch.go1.17-vendor:http/httpproxy/proxy.go;l=92

@sigaloid
Member

sigaloid commented Sep 9, 2025

That's fair - the hard-coded 80 did make me nervous but I'm happy to leave it as HTTP_PROXY, if that's standard.

@uhthomas

uhthomas commented Sep 9, 2025

There isn't really a standard for these env vars unfortunately, and their meaning can be a bit weird / deceptive. The example I linked will prefer HTTPS_PROXY for https requests, which does not mean that it will use an HTTPS proxy. Sorry if that's confusing.

Anyway, it's my understanding that using plain HTTP is actually not that bad in this case? The HTTP proxy server will essentially just proxy TCP, and so the actual connection between redlib and reddit is still using HTTPS / TLS. The HTTP proxy cannot inspect or modify the stream without breaking it.

I don't see any explicit handling for HTTPS/TLS though, so I don't think HTTP_PROXY=https://example.com will work, not that it's necessary.

@unclearParadigm
Author

Hey,

happy you @sigaloid got to review the PR - thank you very much for taking the time, also thanks to @uhthomas for testing and your (very much correct) answers so far. I'd like to add a few cents into HTTP_PROXY and HTTPS_PROXY discussion and some more details to get a common understanding here - buckle up that response might take me some hours to formulate - if it took me long to write, it should take you long to read 😄

HTTP CONNECT (tunneling/piping)

The HTTP CONNECT wrapper I have implemented here uses the de-facto way to proxy traffic through HTTP, sometimes referred to as HTTP tunneling/piping. Redlib as the client opens a TCP socket to the proxy (proxy address taken from the HTTP_PROXY or HTTPS_PROXY env vars) and sends an HTTP CONNECT request, which contains the hostname/FQDN + port of the destination (e.g. api.reddit.com:443). The Mozilla Dev Docs show a simple request example. The proxy extracts the FQDN/hostname + port from the CONNECT request and opens a TCP socket to the specified target. Then the proxy responds (let's assume the happy path here) with an HTTP 200 status code.

The catch here is that the proxy does not close the underlying TCP socket after it has responded. From this point on the proxy acts as a dumb pipe from Redlib to Reddit: whatever Redlib writes to its socket with the proxy is what the proxy writes to its open socket with Reddit. Note that this is a raw socket - the data transferred does not need to be HTTP-formatted content; it's basically a tunnel or pipe for any TCP traffic. In a nutshell, Redlib now has a TCP connection to reddit.com, just not directly, but through the proxy.

Let's continue the journey and talk about TLS. Clients that open a connection on 443/tcp typically assume HTTPS (which is HTTP through TLS) - Redlib is luckily not an exception here. That requires the client to initiate a TLS handshake. Redlib now uses the established TCP socket to the proxy (which pipes byte by byte to Reddit) and starts the TLS handshake by sending a ClientHello. Again, the proxy just pipes whatever it receives and forwards the ClientHello to Reddit. The Reddit server responds with a ServerHello, and the proxy again just pipes it back to Redlib.

After playing through the whole TLS handshake, we land at the point where we have a trusted, encrypted connection from Redlib to Reddit using TLS - one that cannot be snooped on even by the proxy. I don't want to get into the TLS details here, but we're talking about the whole "Alice (= Redlib) wants to talk to Bob (reddit.com) while Eve (the proxy) is snooping on the whole conversation" situation. For the sake of simplicity (and to prevent me from ending up explaining how electrons work), we can safely assume that TLS prevents Eve, as MITM (man in the middle - yes, Eve identifies as man...), from snooping on what Alice and Bob chat about.
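The CONNECT exchange described above can be sketched roughly as follows. This is a simplified, synchronous illustration over any `Read + Write` stream and not the async wrapper from this MR; the function name `open_tunnel` is hypothetical, proxy authentication is left out, and the 8 KiB header limit is an arbitrary assumption.

```rust
use std::io::{self, Read, Write};

/// Sends a CONNECT request for `host:port` over an already-open stream to the
/// proxy and waits for the end of the proxy's response headers.
/// On success the very same stream is a raw tunnel to the target, ready for
/// the TLS handshake. (Hypothetical helper for illustration.)
fn open_tunnel<S: Read + Write>(stream: &mut S, host: &str, port: u16) -> io::Result<()> {
    let request = format!("CONNECT {host}:{port} HTTP/1.1\r\nHost: {host}:{port}\r\n\r\n");
    stream.write_all(request.as_bytes())?;
    stream.flush()?;

    // Read byte by byte up to the blank line that ends the headers, so that
    // no bytes belonging to the tunnel (e.g. the ServerHello) are consumed.
    let mut response = Vec::new();
    let mut byte = [0u8; 1];
    while !response.ends_with(b"\r\n\r\n") {
        if stream.read(&mut byte)? == 0 {
            return Err(io::Error::new(
                io::ErrorKind::UnexpectedEof,
                "proxy closed the connection before finishing its response",
            ));
        }
        response.push(byte[0]);
        // Assumed upper bound to avoid reading forever from a broken proxy.
        if response.len() > 8192 {
            return Err(io::Error::new(io::ErrorKind::InvalidData, "proxy response too large"));
        }
    }

    // Happy path: the second token of the status line is "200".
    let head = String::from_utf8_lossy(&response);
    let status_line = head.lines().next().unwrap_or("");
    if status_line.split_whitespace().nth(1) == Some("200") {
        Ok(())
    } else {
        Err(io::Error::new(
            io::ErrorKind::Other,
            format!("Proxy CONNECT failed: {status_line}"),
        ))
    }
}
```

With a `std::net::TcpStream` connected to the proxy, a call such as `open_tunnel(&mut stream, "api.reddit.com", 443)` would leave `stream` positioned right where the TLS handshake with Reddit starts.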

HTTP_PROXY Vs. HTTPS_PROXY

There's a lot of confusion about the HTTPS_PROXY and HTTP_PROXY environment variables. Both env vars point to a proxy (in the format scheme://<optionalUsername>:<optionalPassword>@<fqdn/hostname>:<TCP-port>). The only formal difference for HTTPS_PROXY is that, instead of sending the HTTP CONNECT over a plaintext TCP socket to the proxy, the client needs to perform a TLS handshake with the proxy first, and then sends the HTTP CONNECT over the TLS-protected socket. At the end of the day it remains a dumb pipe.

Why HTTPS_PROXY is nonsensical in the case of Redlib?

Well, Redlib is a privacy frontend for Reddit. From what I have seen (and please tell me if that is incorrect), redlib only uses https:// URLs. I have not found a single reference to any http:// URLs pointing to reddit. Reddit has a TLS server certificate that is trusted by Redlib. Redlib performs a TLS handshake against Reddit, and considering that TLS does a great job at preventing MITM (especially in this setup), it is nonsensical to additionally encrypt from Redlib to the proxy (just for the sake of sending the one HTTP CONNECT). It's like wrapping up a package in 2 layers of bubble-wrap, when one layer would be enough already.

But why does Redlib then even support HTTPS_PROXY env var?

Short answer: it's complicated and not very much standardized. From what I have seen, most clients implementing the HTTP CONNECT method for proxying don't really care about the scheme prefix - all they want is the FQDN/hostname + port of the proxy (optionally also authentication credentials for the proxy). In almost all cases they attempt an HTTP CONNECT in plaintext first, which almost always works out. Some implementations don't honor HTTP_PROXY at all, as they rely on HTTPS_PROXY to be present. Others (like the one I have implemented) use HTTPS_PROXY and fall back to HTTP_PROXY if needed. Others support only HTTP_PROXY as env var. Sysadmins have also developed the following protective measure to configure their systems for maximum compatibility with all applications.

myproxy="http://yomamasof.at:80"
export HTTP_PROXY="$myproxy"
export HTTPS_PROXY="$myproxy"
# purposefully setting HTTP_PROXY to the same as HTTPS_PROXY to maximize compatibility.
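The lookup order described in this MR (SOCKS_PROXY first, then HTTPS_PROXY, then HTTP_PROXY as fallback) could be sketched like this. It is an illustration rather than the actual code: `select_proxy` is a hypothetical name, the environment lookup is passed in as a closure to keep the sketch testable, and treating an empty value as "not set" is an assumption.

```rust
/// Picks the proxy URL to use, given a lookup function for environment
/// variables. Order: SOCKS_PROXY, then HTTPS_PROXY, then HTTP_PROXY.
/// (Hypothetical helper; empty values are assumed to count as unset.)
fn select_proxy<F>(lookup: F) -> Option<String>
where
    F: Fn(&str) -> Option<String>,
{
    ["SOCKS_PROXY", "HTTPS_PROXY", "HTTP_PROXY"]
        .iter()
        // Query each variable in priority order, skipping unset ones.
        .filter_map(|name| lookup(*name))
        // Take the first value that is not just whitespace.
        .find(|value| !value.trim().is_empty())
}
```

In an application the closure would typically be `|name| std::env::var(name).ok()`. With the shell snippet above, both HTTPS_PROXY and HTTP_PROXY resolve to the same proxy, so the fallback order makes no practical difference.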

What about the big corporate proxies that terminate TLS and re-establish TLS?

Now one might argue that not all proxies operate the way I've been trying to explain above. And yes, that's a fair argument - especially the corporate world has some very evil implementations in place that allow inspection of each individual HTTP request flowing through them. They do this by "terminating" TLS. Redlib requests reddit.com through the proxy - but instead of showing Redlib the certificate of reddit.com, the proxy shows its own (which obviously is trusted on machines in the corporate environment). The client establishes a TLS-protected socket with the proxy - and the proxy can thus decrypt all the traffic with ease, and just re-encrypts (using another TLS handshake) whatever the client sent. This MR does not support this setup, because for this setup a) Redlib would need a way to get a certificate of the proxy it is allowed to trust, and b) it would not necessarily improve privacy ;)

@unclearParadigm
Author

LGTM except a few things - what does the encryption look like for something like this? I'd rather not allow unencrypted HTTP proxies at all for privacy reasons - I don't want to send plaintext traffic under any configuration.

Please refer to section: Why HTTPS_PROXY is nonsensical in the case of Redlib in this comment

@unclearParadigm
Author

Just chiming in to say I'm using a plaintext SOCKS5 proxy. I run redlib in Kubernetes and have a user space wireguard SOCKS5 proxy as a sidecar container. Forced encryption or authentication would be a huge pain and for no benefit.

I'm in a similar boat - I host Redlib as a Deployment in a Kubernetes cluster on 6 worker nodes with (atm) a total of 6 replicas, where each node has its own public IPv4 egress address, and a round-robin load balancer in front - still I'm getting rate-limited as if there is no tomorrow. I had a similar intention of spinning up a SOCKS proxy as a sidecar, in my case a Tor proxy to circumvent the rate limits. I like your socks-wireguard proxy. Great work!

@uhthomas

uhthomas commented Oct 4, 2025

Is there anything blocking this @sigaloid? It would be great to have this on the main branch so I don't have to build my own image and rebase it to get updates.


3 participants