
Implement client-side HTTP persistence (keepalive) for the browser-to-client proxy socket (persistence phase 1) and the I2P socket on the client side (persistence phase 2a)

zzz requested to merge zzz/i2p.i2p:i2ptunnel-keepalive-client into master

This MR builds on the previous MR !166 (merged), which added utilities for keepalive. It replaces draft MR !129 (closed), which was an early version of phase 1 only.

Phase 1 is the simplest; phases 2a and 2b add more complexity and risk. Keepalive for each socket will be enabled per-tunnel by separate i2ptunnel.config settings: two for the client side (browser and I2P sockets) and one for the server side (I2P socket). In this MR, the client side defaults to true, and the server side is not yet implemented. The server side (phase 2b) will be in a subsequent MR.
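For illustration only, the per-tunnel settings might look like the following i2ptunnel.config excerpt. The key names here are placeholders to show the shape of the configuration, not necessarily the actual option names:

    # client tunnel: browser-socket and I2P-socket keepalive (phases 1 and 2a)
    tunnel.0.option.keepalive.browser=true
    tunnel.0.option.keepalive.i2p=true
    # server tunnel: I2P-socket keepalive (phase 2b, not yet implemented)
    tunnel.1.option.keepalive.i2p=false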

If issues arise during the development cycle, we may change the defaults before release, to easily disable keepalive and reduce risk. While phases 1 and 2a are together in this MR, they may be disabled separately.

As implemented, I2P socket keepalive depends on browser socket keepalive: if the browser does not request keepalive, we will not request I2P socket keepalive. This is fine because all browsers support keepalive to proxies. For simplicity, requests from internal sockets will not use keepalive, and neither will EepGet.

Note that without any server support, the server will always return Connection: close, and the I2P socket will not be persisted. The browser socket will be persisted regardless. Browser socket persistence alone has very little benefit, so this MR on its own will not measurably reduce resource usage.

Testers running with this patch may test I2P socket persistence with stats.i2p and zzz.i2p which are running with the upcoming server-side patch. When testing with any other server, only browser socket persistence will result.

Note that the new keepalive do/while loop in I2PTunnelHTTPClient is inserted without reindenting about 1100 lines of code in between. The loop is clearly marked. This is for ease of review. The indenting can be fixed up after merge if desired.

Below is an overview of this project.

Overview

Standards

See RFC 2616, which defines persistence as a hop-by-hop property. What it calls "messages" are hop-by-hop, while the "entity" is end-to-end. Each hop may or may not have persistence, independent of the other hops.

HTTP/1.1 is persistent by default. To disable persistence, a hop must inject or replace the Connection: header with Connection: close.
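As a minimal sketch of that rule (not the actual i2ptunnel code), a hop could decide persistence from the request line and the Connection header roughly as follows:

    // Sketch only: decide whether this hop may persist, per RFC 2616.
    // HTTP/1.1 defaults to persistent unless "Connection: close" is present;
    // HTTP/1.0 defaults to close unless "Connection: keep-alive" is present.
    static boolean isPersistent(String httpVersion, String connectionHeader) {
        String conn = connectionHeader == null ? "" : connectionHeader.toLowerCase(java.util.Locale.US);
        if ("HTTP/1.1".equals(httpVersion))
            return !conn.contains("close");
        if ("HTTP/1.0".equals(httpVersion))
            return conn.contains("keep-alive");
        return false; // unknown or older version: do not persist
    }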

The usual setup in I2P is:

Browser -- Browser -- Client -- I2P   -- Server -- Server -- Web
           Socket     Proxy     Socket   Proxy     Socket    Server

However, the client and server proxies may be any combination of Java I2P or i2pd, of any version.

Additionally, the client proxy may be replaced by any non-HTTP-aware proxy such as a standard tunnel, a SOCKS client tunnel, or SAM. The server proxy may be replaced by any non-HTTP-aware proxy such as a standard tunnel, or SAM.

Previous Work

In 2005 I attempted to implement persistence and pipelining and created a patch. Pipelining is very difficult, and it was not enabled by default in browsers anyway. Today, pipelining has been abandoned by the browsers because support in proxies is very poor. See https://developer.mozilla.org/en-US/docs/Web/HTTP/Connection_management_in_HTTP_1.x

The 2005 patch had a fatal flaw: it attempted to manage persistence and pipelining end-to-end. The browser, client proxy, server proxy, and web server would all have to support persistence for it to work.

The current I2P code is not completely compliant with the hop-by-hop specification, and the 2005 pipelining patch made it much worse.

Development Plan

Pipelining is neither supported nor planned in this new MR series.

Implementing persistence for each hop separately is RFC-compliant and does not require simultaneous support at all points. This allows a gradual implementation for each of the three hops, and by different projects.

However, by only supporting persistence on a hop if the previous hop has persistence, this implementation can be vastly simpler. In particular, we will only persist the I2P socket (phase 2) if the browser requests persistence on the browser socket (phase 1). While we do not plan to support server socket persistence (phase 3), it would only be enabled if the I2P socket were persisted (phase 2). This is explained further below.

Planned support for persistence on each socket combination:

-------  ---   ------  -------
Browser  I2P   Server  Support
-------  ---   ------  -------
   n      n      n     now
   y      n      n     phase 1 (client side)
   n      y      n     no
   y      y      n     phase 2a (client side) and 2b (server side)
   n      n      y     no
   y      n      y     no
   n      y      y     no
   y      y      y     phase 3 (server side not planned)
-------  ---   ------  -------

There are further simplifications to reduce risk. We do not support keepalive after an error, or for CONNECT or POST, or for HTTP/1.0. Also, we will not support additional requests on the I2P socket after the browser changes the requested host:port.
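Taken together, the restrictions amount to a predicate like the following (an illustrative sketch; these names are not from the actual patch):

    // Illustrative only: the proxy persists a socket only when all of
    // these simplifying conditions hold.
    static boolean mayKeepAlive(String method, String httpVersion,
                                boolean browserRequestedKeepalive,
                                boolean errorOccurred) {
        if (errorOccurred) return false;                   // no keepalive after an error
        if ("CONNECT".equals(method)) return false;        // not for CONNECT
        if ("POST".equals(method)) return false;           // not for POST
        if (!"HTTP/1.1".equals(httpVersion)) return false; // not for HTTP/1.0
        return browserRequestedKeepalive;                  // I2P socket requires browser keepalive
    }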

In summary, this plan enables persistence for the common use cases that provide the most benefit, while limiting complexity and risk.

Phase 1: Browser Socket

This is the simplest. However, it requires significant changes in the HTTP Client Proxy. Browser-to-client-proxy sockets will be persistent. Each I2P socket will be unique for a single request, as now. No change to the server side.

There is almost no performance savings for phase 1, because localhost-to-localhost sockets are cheap. There will be some savings because we currently start two threads for every socket. However, it gives us experience with persistent connections, and it is necessary for phase 2. Without persistent browser sockets, if we tried to implement phase 2 only, we would have to maintain a "pool" of idle I2P sockets, indexed by server destination:port, and grab one to use for the next request.

So it is far easier to implement phase 2 if we do phase 1 first and require keepalive on the browser socket in order to use keepalive on the I2P socket. All browsers support keepalive to proxies.

To further reduce risk, keepalive is not supported by EepGet or by internal sockets, so vital functions such as reseeding, news fetching, and torrent announces should not be affected.

Phase 2: I2P Socket

With Phase 1 implemented, we can bind the I2P socket to the browser socket. As long as the browser continues making requests to the same destination, we will reuse the I2P socket as well.

When a request for a different destination:port comes in, we will close the I2P socket for the old destination:port, and open a new one.
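The reuse decision might look roughly like this (a sketch assuming one cached socket per browser connection; java.net.Socket stands in for the real I2P streaming socket, and open() is a hypothetical helper):

    import java.io.IOException;
    import java.net.Socket;

    // Sketch: cache one idle I2P-side socket, keyed by the destination:port
    // of the previous request, and reuse it while the destination is unchanged.
    class SocketReuse {
        private Socket socket;
        private String currentDestPort; // e.g. "example.i2p:80"

        Socket getSocketFor(String destPort) throws IOException {
            if (socket != null && destPort.equals(currentDestPort) && !socket.isClosed())
                return socket;           // same destination: reuse the idle socket
            if (socket != null)
                socket.close();          // destination changed: close the old socket
            socket = open(destPort);     // open a new connection
            currentDestPort = destPort;
            return socket;
        }

        // Hypothetical helper; real code would dial through the I2P streaming layer.
        private Socket open(String destPort) throws IOException {
            throw new UnsupportedOperationException("placeholder");
        }
    }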

Thus we maintain a 1:1 mapping of I2PTunnelHTTPClient, HTTPResponseOutputStream, I2PTunnelHTTPClientRunner, browser socket, and I2P socket.

This phase provides significant bandwidth savings and perhaps some latency savings as well. Bandwidth is reduced because the streaming SYN is very large. Latency savings are small because, unlike TCP, streaming supports 0-RTT delivery of data; we don't have to wait for a 3-way handshake.

Phase 2a is the client side, and phase 2b is the server side.

In phase 2a, we must wait for the end-of-message on the I2P socket, and then allow the client proxy to accept a new request.

End-of-message detection is possible for the following cases:

  • HEAD response (no message body, just wait for end of headers)
  • 1xx, 204, and 304 responses (no message body, just wait for end of headers)
  • Response with Content-Length (count the bytes)
  • Chunked response (wait for chunk trailer)
  • x-i2p-gzip response (wait for gzip trailer)
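A condensed sketch of that dispatch (the real logic lives in HTTPResponseOutputStream and the util/ classes; the method below is illustrative):

    // Sketch: can we find the end of this response from its headers?
    // If so, the socket may be persisted; if not, read to EOF and close.
    static boolean canDetectEnd(String method, int status,
                                Long contentLength, String transferEncoding,
                                String contentEncoding) {
        if ("HEAD".equals(method)) return true;                 // headers only
        if (status / 100 == 1 || status == 204 || status == 304)
            return true;                                        // no message body
        if (contentLength != null) return true;                 // count the bytes
        if ("chunked".equalsIgnoreCase(transferEncoding))
            return true;                                        // wait for chunk trailer
        if ("x-i2p-gzip".equalsIgnoreCase(contentEncoding))
            return true;                                        // wait for gzip trailer
        return false;
    }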

This is handled by HTTPResponseOutputStream and the various utilities in the util/ directory. When done, the utility signals I2PTunnelHTTPClientRunner to stop forwarding data, flush the data (but not close the socket), and return control back to I2PTunnelHTTPClient. All of this only happens, of course, if the server side does not send Connection: close. That will be enabled in the next MR.

The biggest change from MR !129 is that HTTPResponseOutputStream.close() now does the I2P socket close in a separate thread if keepalive is enabled on the browser socket. This prevents a possibly lengthy delay waiting for the streaming close ack.
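Conceptually, the close path looks something like this (a sketch, not the actual diff; java.net.Socket again stands in for the I2P socket):

    // Sketch: when the browser socket is being kept alive, close the I2P
    // socket on a separate thread so the streaming close ack (which can
    // take a while) does not block the next request on the browser socket.
    static void closeI2PSocket(final java.net.Socket i2pSocket, boolean browserKeepalive) {
        if (!browserKeepalive) {
            try { i2pSocket.close(); } catch (java.io.IOException ignored) {}
            return;
        }
        new Thread(() -> {
            try { i2pSocket.close(); } catch (java.io.IOException ignored) {}
        }, "I2P socket closer").start();
    }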

Phase 3: Server Socket

This would be complex and is not necessary. There is almost no performance savings for phase 3. We would have to maintain a "pool" of idle server sockets, indexed by server IP:port, and grab one to use for the next request.

The only case where there would be a performance improvement would be for the outproxy, or for people running web servers remotely (not on localhost).

Assumptions

  • All current I2P and i2pd client and server proxies, and all maintained clients such as eepget, currently inject Connection: close, as required by RFC 2616 (unless sending HTTP/1.0 in the request), to indicate that they do not support keepalive on that hop.

Current Java Behavior

  • The client proxy injects an X-Accept-Encoding header containing x-i2p-gzip into the request headers. The actual current value is: "x-i2p-gzip;q=1.0, identity;q=0.5, deflate;q=0, gzip;q=0, *;q=0"
  • The server proxy looks for an X-Accept-Encoding header containing x-i2p-gzip in the request headers (this is the current standard)
  • The server proxy looks for an Accept-Encoding header containing x-i2p-gzip in the request headers (this hasn't been done in 20 years)
  • The server proxy strips the X-Accept-Encoding header from the request
  • The server proxy passes through the Accept-Encoding header in the request
  • If the client supports x-i2p-gzip (as specified in the request headers), and the response Content-Length is either not present or above a minimum size, and the response Content-Type is either not present or is not a known compressed MIME type (image, audio, video, gzip, etc.), and there is no Content-Encoding header in the response, THEN the server proxy injects a "Content-Encoding: x-i2p-gzip" header into the response and gzips the payload, including any chunking sent by the server (as sketched after this list).
  • If the client proxy sees a "Content-Encoding: x-i2p-gzip" header in the response, it removes the header and gunzips the response.
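The injection decision reduces to a predicate like this (a rough sketch; the real checks live in the server proxy code, and the helper below is hypothetical):

    // Sketch: should the server proxy apply x-i2p-gzip to this response?
    static boolean shouldGzip(boolean clientAcceptsI2pGzip, Long contentLength,
                              String contentType, String contentEncoding,
                              int minSizeToCompress) {
        if (!clientAcceptsI2pGzip) return false;           // client didn't offer x-i2p-gzip
        if (contentLength != null && contentLength < minSizeToCompress)
            return false;                                  // too small to be worth it
        if (contentEncoding != null) return false;         // already encoded
        if (contentType != null && isCompressedType(contentType))
            return false;                                  // already-compressed MIME type
        return true;
    }

    // Hypothetical helper: known already-compressed MIME types.
    static boolean isCompressedType(String type) {
        String t = type.toLowerCase(java.util.Locale.US);
        return t.startsWith("image/") || t.startsWith("audio/") ||
               t.startsWith("video/") || t.contains("gzip") || t.contains("zip");
    }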

Problems with current specification and Java implementation

Much of this violates RFC 2616. While it works in practice, the violations aren't well documented, and that makes the code less maintainable. In some cases, we're relying on browsers to tolerate our protocol violations, which may not always work.

  • Does not dechunk/chunk per-hop; passes through chunking end-to-end
  • Passes the Transfer-Encoding header through end-to-end
  • Uses Content-Encoding, not Transfer-Encoding, to specify the per-hop encoding
  • Prohibits x-i2p-gzip compression when Content-Encoding is set (but we probably don't want to do that anyway)
  • Gzips the server-sent chunking, rather than dechunk-gzip-rechunk and dechunk-gunzip-rechunk
  • Because the chunking, if any, is inside the gzip rather than outside, there's no easy way to find the end of the data, which prohibits I2P socket keepalive on those sockets. This violates the requirement that any Transfer-Encoding other than "identity" be chunked.
  • The elaborate X-Accept-Encoding header with all the q values is pointless; we just look for x-i2p-gzip and then strip the header out.
  • The spec says Content-Length must not be sent if Transfer-Encoding is present, but we send it anyway. The spec also says to ignore Content-Length if Transfer-Encoding is present, so it works for us.

Changes to support I2P Socket Persistence

Requirements:

  • Backward compatible with all previous versions of I2P and i2pd client and server proxies, and I2P eepget, or at least for several years back.
  • Compatible with browsers and servers
  • Compatible with the current outproxies

Proposal:

While x-i2p-gzip is not chunked (violating RFC 2616), we can still use it to find end-of-message, by looking for the gzip trailer. This allows us to do persistence even if the reply is x-i2p-gzip. So we do not need to come up with a new gzip-and-chunked transfer encoding.
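For example (a sketch assuming the standard java.util.zip classes; the actual code uses the utilities added in MR !166), end-of-message on an x-i2p-gzip response can be observed by gunzipping until the stream reports EOF:

    import java.io.IOException;
    import java.io.InputStream;
    import java.io.OutputStream;
    import java.util.zip.GZIPInputStream;

    // Sketch: copy the gunzipped x-i2p-gzip body to the browser and return
    // once the gzip trailer has been consumed, leaving the sockets open.
    // Caveat: GZIPInputStream buffers reads, so a real implementation must
    // take care not to read ahead past the trailer into the next response.
    static void copyGzipBody(InputStream i2pIn, OutputStream browserOut) throws IOException {
        GZIPInputStream gz = new GZIPInputStream(i2pIn);
        byte[] buf = new byte[4096];
        int len;
        while ((len = gz.read(buf)) != -1)  // -1 after the trailer is read and verified
            browserOut.write(buf, 0, len);
        browserOut.flush();                 // end-of-message: do NOT close the sockets
    }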

Therefore, all the following responses may be persisted:

  • HEAD response (no data)
  • 1xx, 204, 304 responses (no data)
  • Responses with Content-Length
  • Chunked responses
  • x-i2p-gzip responses

Unchanged

  • Don't support keepalive after client-side errors. All error pages will continue to send Connection: close.
  • Don't support keepalive for internal sockets.
  • Don't add keepalive support to eepget. Eepget will continue to send Connection: close.
  • Continue passing through Transfer-Encoding and chunking end-to-end in most cases, even though it violates the spec.
  • Continue supporting x-i2p-gzip