Draft: Do not merge: I2PTunnel: Implement HTTP persistence for the browser-to-client proxy socket
(persistence phase 1)
Warnings:
- WIP
- Lightly tested, requires much more
- Requires further cleanup
- Not for 2.4.0
- Will not provide any measurable performance benefit
- Indenting in HTTPClient intentionally wrong to make it easier to review
- Contains some unrelated cleanup that will make this more difficult to review
Previous Work
In 2005 I attempted to implement persistence and pipelining and created a patch. Pipelining is very difficult, and it was not enabled by default in browsers anyway. Today, pipelining has been abandoned by the browsers because support in proxies is very poor. See https://developer.mozilla.org/en-US/docs/Web/HTTP/Connection_management_in_HTTP_1.x
The 2005 development had a fatal flaw. It attempted to manage persistence and pipelining end-to-end. The browser, client proxy, server proxy, and web server would all have to support persistence for keepalive to work. But RFC 2616 defines persistence as a hop-by-hop property (what it calls "messages" are hop-by-hop, while the "entity" is end-to-end).
The current code is not completely compliant with the hop-by-hop specification, and the 2005 patch made it much worse.
The good news is that by implementing persistence for each hop separately, it is RFC-compliant, and it does not require simultaneous support at all points. This allows a gradual implementation for each of the three hops.
Browser -- Browser Socket -- Client Proxy -- I2P Socket -- Server Proxy -- Server Socket -- Web Server
Proposed Development Plan
Phase 1: Browser Socket
This is the simplest. However, it requires significant changes in the HTTP Client Proxy. Browser-to-client-proxy sockets will be persistent. Each I2P socket will be unique for a single request, as now.
There are almost no performance savings in phase 1, because localhost-to-localhost sockets are cheap. There will be some savings, since we currently start two threads for every socket. However, it gives us experience with persistent connections, and it is necessary for phase 2. Without persistent browser sockets, if we tried to implement phase 2 only, we would have to maintain a "pool" of idle I2P sockets, indexed by server destination:port, and grab one to use for the next request.
So it's far easier to implement Phase 2 if we do Phase 1 first, and require keepalive on the browser socket in order to use keepalive on the I2P socket. All browsers support keepalive to proxies.
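As a rough illustration of the keepalive check on the browser socket, here is a minimal sketch that follows the RFC 2616 defaults (HTTP/1.1 is persistent unless "Connection: close" is sent; HTTP/1.0 is persistent only if "Connection: keep-alive" is sent). The class and method names are hypothetical and are not taken from this patch.

```java
import java.util.Locale;

/** Hypothetical helper: did the browser ask for a persistent connection on this hop? */
class KeepAlive {
    /**
     * @param httpVersion      version from the request line, e.g. "HTTP/1.1"
     * @param connectionHeader value of the Connection header, or null if absent
     */
    static boolean requested(String httpVersion, String connectionHeader) {
        if (hasToken(connectionHeader, "close")) {
            return false; // explicit close always wins
        }
        if ("HTTP/1.1".equals(httpVersion)) {
            return true; // persistent by default in HTTP/1.1
        }
        // HTTP/1.0 and anything else: only when explicitly requested
        return hasToken(connectionHeader, "keep-alive");
    }

    /** Case-insensitive search for one token in a comma-separated header value. */
    private static boolean hasToken(String headerValue, String token) {
        if (headerValue == null) {
            return false;
        }
        for (String part : headerValue.split(",")) {
            if (part.trim().toLowerCase(Locale.US).equals(token)) {
                return true;
            }
        }
        return false;
    }
}
```

For example, `KeepAlive.requested("HTTP/1.0", null)` is false, so such a request would get one socket per request as now.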
Phase 2: I2P Socket
With Phase 1 implemented, we can bind the I2P socket to the browser socket. As long as the browser continues making requests to the same destination, we will reuse the I2P socket as well.
When a request for a different destination:port comes in, we will close the I2P socket for the old destination:port, and open a new one.
Thus we maintain a 1:1 mapping of I2PTunnelHTTPClient, HTTPResponseOutputStream, I2PTunnelHTTPClientRunner, browser socket, and I2P socket.
This is all made possible by abandoning the attempt to support pipelining.
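The reuse rule for Phase 2 can be sketched as follows. This is only an illustration of "reuse while the destination:port is unchanged, otherwise close and reopen"; the `Conn` and `Opener` interfaces are hypothetical stand-ins, not the real I2P socket API, and the `opened` counter exists only to make the behavior observable.

```java
import java.util.Objects;

/** Hypothetical sketch: one I2P socket bound to one browser socket. */
class BoundI2PSocket {
    /** Stand-in for an open I2P streaming socket (assumption, not the real API). */
    interface Conn {
        void close();
    }

    /** Stand-in for whatever opens a socket to "destination:port". */
    interface Opener {
        Conn open(String target);
    }

    private final Opener opener;
    private String currentTarget; // destination:port of the bound socket, or null
    private Conn current;
    int opened; // number of sockets opened so far

    BoundI2PSocket(Opener opener) {
        this.opener = opener;
    }

    /** Returns the socket to use for a request to the given destination:port. */
    Conn forRequest(String target) {
        if (current != null && Objects.equals(currentTarget, target)) {
            return current; // same destination:port, reuse the socket
        }
        close(); // different target: close the old socket first
        current = opener.open(target);
        currentTarget = target;
        opened++;
        return current;
    }

    /** Called when the browser socket closes or the I2P socket cannot be reused. */
    void close() {
        if (current != null) {
            current.close();
            current = null;
            currentTarget = null;
        }
    }
}
```

Because there is never more than one outstanding request per browser socket (no pipelining), a single field is enough and no pool is needed.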
This phase provides significant bandwidth savings and, perhaps, some latency savings as well. Bandwidth is reduced because the streaming SYN is very large. Latency savings are small because, unlike TCP, streaming supports 0-RTT delivery of data, so we don't have to wait for a 3-way handshake.
Phase 3: Server Socket
This is probably not necessary. There are almost no performance savings in phase 3. We would have to maintain a "pool" of idle server sockets, indexed by server IP:port, and grab one to use for the next request.
The only cases where there would be a performance improvement are the outproxy and people running web servers remotely (not on localhost).
I2P Proxy Specification Changes
This will be required to support persistence on I2P sockets that contain x-i2p-gzip responses.
The current i2ptunnel proxy behavior is documented very briefly at http://i2p-projekt.i2p/en/docs/api/i2ptunnel
Specification changes are only required for phase 2, and only for responses currently encoded with x-i2p-gzip (i.e., responses not gzip encoded by the server).
We could split phase 2 into 2A and 2B, where we only implement x-i2p-gzchunked in 2B, and let x-i2p-gzip responses close the I2P socket.
Assumptions
- All current I2P and i2pd client and server proxies, and all maintained clients such as eepget, currently inject Connection: close, as required by RFC 2616 (unless sending HTTP/1.0 in the request), to indicate that they do not support keepalive on that hop.
Current Java Behavior
- The client proxy injects an X-Accept-Encoding header containing x-i2p-gzip to the request header. The actual current value is: "x-i2p-gzip;q=1.0, identity;q=0.5, deflate;q=0, gzip;q=0, *;q=0"
- The server proxy looks for an X-Accept-Encoding header containing "x-i2p-gzip" in the request headers (this is the current standard)
- The server proxy looks for an Accept-Encoding header containing "x-i2p-gzip" in the request headers (this hasn't been done in 20 years)
- The server proxy strips the X-Accept-Encoding header from the request
- The server proxy passes through the Accept-Encoding header in the request
- If the client supports x-i2p-gzip (as specified in the request headers), and the response Content-Length is either not present or above a minimum size, and the response Content-Type is either not present or is not a known compressed mime type (image, audio, video, gzip, etc.), and there is no Content-Encoding header in the response, THEN the server proxy injects a "Content-Encoding: x-i2p-gzip" header into the response, and gzips the payload, including any chunking sent by the server.
- If the client proxy sees a "Content-Encoding: x-i2p-gzip" header in the response, it removes the header, and gunzips the response.
- We also inject Proxy-Connection: close in some places, although that is apparently wrong: see http://jdebp.info/FGA/web-proxy-connection-header.html
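The server-proxy compression rule listed above can be restated as a small predicate. This is a sketch of the conditions as described, not the actual implementation: the class name, the `MIN_SIZE` value, and the exact list of compressed types are assumptions.

```java
import java.util.Locale;

/** Hypothetical restatement of the server-proxy x-i2p-gzip decision. */
class GzipDecision {
    /** Assumed threshold; the description only says "a minimum size". */
    static final long MIN_SIZE = 512;

    /**
     * @param xAcceptEncoding request X-Accept-Encoding value, or null if absent
     * @param contentLength   response Content-Length, or -1 if absent
     * @param contentType     response Content-Type, or null if absent
     * @param contentEncoding response Content-Encoding, or null if absent
     */
    static boolean shouldGzip(String xAcceptEncoding, long contentLength,
                              String contentType, String contentEncoding) {
        if (xAcceptEncoding == null || !xAcceptEncoding.contains("x-i2p-gzip")) {
            return false; // client proxy did not advertise support
        }
        if (contentLength >= 0 && contentLength <= MIN_SIZE) {
            return false; // known length, too small to be worth it
        }
        if (contentEncoding != null) {
            return false; // server already applied a content encoding
        }
        return contentType == null || !isCompressedType(contentType);
    }

    /** Rough check for types that are already compressed (list is an assumption). */
    static boolean isCompressedType(String contentType) {
        String t = contentType.toLowerCase(Locale.US);
        return t.startsWith("image/") || t.startsWith("audio/")
                || t.startsWith("video/") || t.contains("gzip");
    }
}
```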
Problems with current specification and Java implementation
Most of this violates RFC 2616. While it works in practice, the violations aren't well-documented and that makes the code less maintainable. In some cases, we're relying on browsers to tolerate our protocol violations, which they may not always do.
- Does not dechunk/chunk per-hop; passes through chunking end-to-end
- Passes Transfer-Encoding header through end-to-end
- Uses Content-Encoding, not Transfer-Encoding, to specify the per-hop encoding
- Prohibits x-i2p gzipping when Content-Encoding is set (but we probably don't want to do that anyway)
- Gzips the server-sent chunking, rather than dechunk-gzip-rechunk and dechunk-gunzip-rechunk
- Because the chunking, if any, is inside the gzip rather than outside, there's no easy way to find the end of the data, which prohibits I2P socket keepalive on those sockets. This violates the requirement that any Transfer-Encoding other than "identity" be chunked.
- The elaborate X-Accept-Encoding header with all the q values is pointless; we just look for x-i2p-gzip and then strip the header out.
- Spec says Content-Length must not be sent if Transfer-Encoding is present, but we do. Spec says ignore Content-Length if Transfer-Encoding is present, so it works for us.
Changes to support I2P Socket Persistence
Requirements:
- Backward compatible with all previous versions of I2P and i2pd client and server proxies, and I2P eepget, or at least for several years back.
- Any new encoding name must not contain the string "x-i2p-gzip" because current I2P proxies search for that string in the headers
- Compatible with browsers and servers
- Compatible with the current outproxies
- Do not leak any i2p-specific headers out the outproxies
- Get closer to RFC 2616 standards
- Don't gunzip if the Content-Encoding is gzip, or we screw up download of gz/tgz files (see eepget)
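The naming constraint above can be checked mechanically. The sketch below models the legacy behavior as a plain substring search, which is how the requirement describes it; the class and method names are hypothetical.

```java
/** Hypothetical check for the "must not contain x-i2p-gzip" naming rule. */
class EncodingNames {
    static final String LEGACY = "x-i2p-gzip";

    /** Models the substring search that current proxies perform on the header value. */
    static boolean legacyProxyMatches(String headerValue) {
        return headerValue != null && headerValue.contains(LEGACY);
    }

    /** A new encoding name is safe if a legacy proxy would not mistake it for x-i2p-gzip. */
    static boolean isSafeNewName(String name) {
        return !legacyProxyMatches(name);
    }
}
```

Under this model, "x-i2p-gzchunked" is safe, while a name such as "x-i2p-gzip-chunked" would be misread by older proxies. A request header listing both "x-i2p-gzip" and "x-i2p-gzchunked" still matches on older proxies, which is what backward compatibility needs.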
While the standard "Transfer-Encoding: gzip" may be used for the response, we still need a way for the client proxy to indicate support to the server proxy. We can't use "gzip" in X-Accept-Encoding because it's already in there. For symmetry, better to use x-i2p-gzchunked in the response, to match the request. We just specify that the format is identical to gzip.
Proposal:
- Client proxy sends: X-Accept-Encoding: x-i2p-gzip, x-i2p-gzchunked
- If server proxy sees any Transfer-Encoding except for chunked, pass it through as we do now, and disable x-i2p-gzchunked
- If we're going to use x-i2p-gzchunked, and the server sends Transfer-Encoding: chunked, then server proxy dechunks first (don't double-chunk)
- Server proxy sends: "Transfer-Encoding: gzip" or x-i2p-gzchunked?
- Both proxies continue to send chunking through end-to-end, except for x-i2p-gzchunked, in which case the server proxy dechunks if necessary, gzips, and chunks.
- The client proxy dechunks, gunzips, and chunks if there's no Content-Length.
- Alternatively, the client proxy could replace Transfer-Encoding: x-i2p-gzchunked with gzip and pass it through to the browser to do the gunzipping, which isn't standard, but would save in-Java CPU, where gunzipping is less efficient than in the browser.
- Phase 2 follows phase 1 by at least one release
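To make the proposed framing concrete, here is a minimal in-memory sketch of "gzip, then chunk" on the server-proxy side and "dechunk, then gunzip" on the client-proxy side. It assumes the server response has already been dechunked, ignores trailers, and buffers everything in memory, which a real proxy would not do. All names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical sketch of gzip-inside-chunked framing. */
class GzChunked {
    /** Server-proxy side: gzip an already-dechunked body, then chunk the result. */
    static byte[] encode(byte[] body, int chunkSize) throws IOException {
        ByteArrayOutputStream gz = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(gz)) {
            out.write(body);
        }
        byte[] zipped = gz.toByteArray();
        ByteArrayOutputStream framed = new ByteArrayOutputStream();
        for (int off = 0; off < zipped.length; off += chunkSize) {
            int len = Math.min(chunkSize, zipped.length - off);
            writeAscii(framed, Integer.toHexString(len) + "\r\n");
            framed.write(zipped, off, len);
            writeAscii(framed, "\r\n");
        }
        writeAscii(framed, "0\r\n\r\n"); // last chunk, no trailers
        return framed.toByteArray();
    }

    /** Client-proxy side: dechunk up to the last chunk, then gunzip. */
    static byte[] decode(InputStream in) throws IOException {
        ByteArrayOutputStream zipped = new ByteArrayOutputStream();
        while (true) {
            String sizeLine = readLine(in);
            int semi = sizeLine.indexOf(';'); // drop any chunk extension
            if (semi >= 0) {
                sizeLine = sizeLine.substring(0, semi);
            }
            int len = Integer.parseInt(sizeLine.trim(), 16);
            if (len == 0) {
                readLine(in); // empty line that ends the body
                break;
            }
            byte[] buf = new byte[len];
            int got = 0;
            while (got < len) {
                int n = in.read(buf, got, len - got);
                if (n < 0) {
                    throw new EOFException("truncated chunk");
                }
                got += n;
            }
            zipped.write(buf, 0, len);
            readLine(in); // CRLF after the chunk data
        }
        ByteArrayOutputStream plain = new ByteArrayOutputStream();
        try (GZIPInputStream gin =
                 new GZIPInputStream(new ByteArrayInputStream(zipped.toByteArray()))) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = gin.read(buf)) != -1) {
                plain.write(buf, 0, n);
            }
        }
        return plain.toByteArray();
    }

    private static void writeAscii(ByteArrayOutputStream out, String s) {
        byte[] b = s.getBytes(StandardCharsets.US_ASCII);
        out.write(b, 0, b.length);
    }

    /** Reads one line terminated by LF, dropping any CR. */
    private static String readLine(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = in.read()) != -1 && c != '\n') {
            if (c != '\r') {
                sb.append((char) c);
            }
        }
        return sb.toString();
    }
}
```

The point of putting the chunking outside the gzip is that `decode` stops reading exactly at the end of the response, so the same stream is left positioned at the start of the next one.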
Unchanged
- Don't support keepalive after client-side errors. All error pages will continue to send Connection: close.
- Don't support keepalive for internal sockets.
- Don't add keepalive support to eepget. Eepget will continue to send Connection: close.
- Continue passing through Transfer-Encoding and chunking end-to-end in most cases, even though it violates the spec.
- Continue supporting x-i2p-gzip for backward compatibility
- Probably don't ever do the server side (phase 3)
Future I2P socket keepalive:
This will allow future development on I2P socket keepalive. Dechunk at the client proxy (or just spy on the chunking if we're passing it through), so we know where the response ends and can reuse the socket for the next request.
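The "spy" variant could look roughly like the sketch below: the chunked body is copied through byte-for-byte, and the proxy only follows the framing to learn where the body ends. It reads one byte at a time for clarity, not speed, and the names are hypothetical.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

/** Hypothetical pass-through copier that tracks chunked framing. */
class ChunkSpy {
    /**
     * Copies exactly one chunked body from in to out, unchanged.
     * Returns the number of payload bytes seen (excluding framing).
     */
    static long copyOneBody(InputStream in, OutputStream out) throws IOException {
        long payload = 0;
        while (true) {
            String sizeLine = copyLine(in, out);
            int semi = sizeLine.indexOf(';'); // ignore chunk extensions
            String hex = (semi >= 0 ? sizeLine.substring(0, semi) : sizeLine).trim();
            int len = Integer.parseInt(hex, 16);
            if (len == 0) {
                // Trailer section: zero or more header lines, ended by an empty line.
                while (!copyLine(in, out).isEmpty()) {
                    // keep copying trailer lines
                }
                return payload;
            }
            for (int i = 0; i < len; i++) {
                int c = in.read();
                if (c < 0) {
                    throw new EOFException("truncated chunk");
                }
                out.write(c);
            }
            payload += len;
            copyLine(in, out); // CRLF after the chunk data
        }
    }

    /** Copies one line including its terminator; returns it without CR/LF. */
    private static String copyLine(InputStream in, OutputStream out) throws IOException {
        StringBuilder sb = new StringBuilder();
        while (true) {
            int c = in.read();
            if (c < 0) {
                throw new EOFException("unexpected end of stream");
            }
            out.write(c);
            if (c == '\n') {
                return sb.toString();
            }
            if (c != '\r') {
                sb.append((char) c);
            }
        }
    }
}
```

When `copyOneBody` returns, the input stream is positioned at the first byte after the body, which is where the next response on a persistent socket would begin.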