netdb: lookup throttler issues
Results of a few hours of research and code review: 2.3.0 added three Lookup throttlers to the existing one. Added were:
- Drop burst lookup
- Ban lookup
- Ban burst lookup
2.3.0-1 commented out the use, but not the instantiation of, the last two.
Issues:
- Resource usage issues with 2.3.0 changes
- Greatly magnified resource/lifecycle/memory leak issues with subdb changes
- Flawed use of DLM "from" field for banning
FNDF (FloodfillNetworkDatabaseFacade):
private final int BAN_LOOKUP_BASE = 75;
private final int BAN_LOOKUP_BASE_INTERVAL = 5*60*1000;
private final int BAN_LOOKUP_BURST = 10;
private final int BAN_LOOKUP_BURST_INTERVAL = 15*1000;
private final int DROP_LOOKUP_BURST = 10;
private final int DROP_LOOKUP_BURST_INTERVAL = 30*1000;
_lookupThrottler = new LookupThrottler(); // 20, 3 minutes
_lookupBanner = new LookupThrottler(BAN_LOOKUP_BASE, BAN_LOOKUP_BASE_INTERVAL);
_lookupThrottlerBurst = new LookupThrottler(DROP_LOOKUP_BURST, DROP_LOOKUP_BURST_INTERVAL);
_lookupBannerBurst = new LookupThrottler(BAN_LOOKUP_BURST, BAN_LOOKUP_BURST_INTERVAL);
-
Do we have any data showing that any of the three added throttlers is effective? I guess not for the ban throttlers, because the use of both was commented out in -1:
-
In 2.3.0-1, the shouldBanLookup() and shouldBanBurstLookup() calls in FloodfillDatabaseLookupMessageHandler were commented out. That is the only call site for these methods, but the methods and the two ban throttlers, _lookupBanner and _lookupBannerBurst, were not commented out or removed in FNDF. The comment left with the commented-out calls:
// Implementation of the banning of routers based on excessive burst DLM
// is pending a reliable way to discriminate between DLM that are sent
// and replied directly, and DLM that are forwarded by a router OBEP.
This is sort of true, but not really. The "from" field in the DLM is filled in by the sender. It names either the requestor or the reply gateway, and it must not be relied on, because it can be spoofed. You cannot ban a host based on that field, and this cannot be fixed. Ref: http://i2p-projekt.i2p/spec/i2np#databaselookup
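To make the spoofing problem concrete, here's a minimal sketch. It is not I2P code: FakeLookup, NaiveBanner, and the string "hashes" are hypothetical stand-ins. The only thing it assumes is what's described above, that the receiver keys its ban counter on a sender-supplied "from" value.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-in for a DLM: "from" is whatever the sender chose to write. */
final class FakeLookup {
    final String from; // self-declared, not authenticated
    FakeLookup(String from) { this.from = from; }
}

/** Hypothetical banner that trusts the "from" field (illustration only). */
final class NaiveBanner {
    private final int limit;
    private final Map<String, Integer> counts = new HashMap<>();
    private final Set<String> banned = new HashSet<>();

    NaiveBanner(int limit) { this.limit = limit; }

    /** Count one lookup against the claimed sender; ban once the limit is exceeded. */
    void handle(FakeLookup msg) {
        int n = counts.merge(msg.from, 1, Integer::sum);
        if (n > limit)
            banned.add(msg.from);
    }

    boolean isBanned(String routerHash) { return banned.contains(routerHash); }
}

final class SpoofDemo {
    public static void main(String[] args) {
        NaiveBanner banner = new NaiveBanner(10); // same value as BAN_LOOKUP_BURST
        // The attacker never names itself; it writes the victim's hash into "from".
        for (int i = 0; i < 11; i++)
            banner.handle(new FakeLookup("victim-hash"));
        System.out.println("victim banned:   " + banner.isBanned("victim-hash"));   // true
        System.out.println("attacker banned: " + banner.isBanned("attacker-hash")); // false
    }
}
```

So the ban lands on whoever the attacker names, never on the attacker.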
-
The "burst" intervals are very short (15 and 30 seconds), which leads to high resource usage.
-
The subdbs now each have 4 throttlers. Do subdbs need throttlers? Do subdbs get lookup requests at all? Aren't these stripped out in InboundMessageDistributor? To be researched.
-
The throttlers use timers and have no shutdown methods, so this is another subdb memory leak; see #406 (closed).
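A minimal sketch of what a shutdown method could look like, assuming a throttler that owns one periodic cleaner task. ClosableThrottler and its shutdown() are hypothetical, and the sketch uses a plain ScheduledExecutorService rather than the router's own timer classes.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical throttler whose cleaner timer can be cancelled when its db goes away. */
final class ClosableThrottler {
    private final int maxLookups;
    private final ConcurrentHashMap<String, AtomicInteger> counts = new ConcurrentHashMap<>();
    private final ScheduledFuture<?> cleaner;

    ClosableThrottler(ScheduledExecutorService timer, int maxLookups, long cleanTimeMs) {
        this.maxLookups = maxLookups;
        // Periodic reset of all counters. This is the task that leaks if never cancelled.
        this.cleaner = timer.scheduleAtFixedRate(counts::clear, cleanTimeMs, cleanTimeMs,
                                                 TimeUnit.MILLISECONDS);
    }

    /** @return true if this requester has exceeded maxLookups in the current window */
    boolean shouldThrottle(String requester) {
        AtomicInteger c = counts.computeIfAbsent(requester, k -> new AtomicInteger());
        return c.incrementAndGet() > maxLookups;
    }

    /** Cancel the cleaner and drop state; meant to be called from the owning db's shutdown. */
    void shutdown() {
        cleaner.cancel(false);
        counts.clear();
    }

    boolean isShutdown() { return cleaner.isCancelled(); }
}
```

If a ScheduledThreadPoolExecutor is shared between throttlers, setRemoveOnCancelPolicy(true) keeps cancelled cleaners from sitting in its queue until their next scheduled run.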
So, let's look at the drop burst throttler, since the ban throttlers are no longer used. The original one is 20 in 3 minutes (about 7/minute). The burst one added in 2.3.0 is 10 in 30 seconds (20/minute). Same order of magnitude; it just gets triggered faster. The idea, I guess, is to start dropping sooner? I suppose all this was added in response to an attack, but I don't have the data on that attack or the test results for the code added in 2.3.0, so I'm just guessing here.
But was the correct solution to add a second throttler? Could we have accomplished our goals by just adjusting the limit and/or time constant of the first throttler?
So, do we really need the drop burst throttler?
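As a rough sketch of the single-throttler alternative asked about above: one counter with a configurable limit and window, so "start dropping sooner" becomes a parameter change rather than a second throttler. WindowThrottler is hypothetical, not the existing LookupThrottler. It also resets lazily on the next call instead of using a timer, which is a choice made for this sketch, and the clock is injected only so it can be tested without sleeping.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

/** Hypothetical fixed-window throttler: one counter map, one limit, one window length. */
final class WindowThrottler {
    private final int maxLookups;
    private final long windowMs;
    private final LongSupplier clock; // e.g. System::currentTimeMillis
    private final Map<String, Integer> counts = new HashMap<>();
    private long windowStart;

    WindowThrottler(int maxLookups, long windowMs, LongSupplier clock) {
        this.maxLookups = maxLookups;
        this.windowMs = windowMs;
        this.clock = clock;
        this.windowStart = clock.getAsLong();
    }

    /** @return true if the lookup should be dropped */
    synchronized boolean shouldThrottle(String requester) {
        long now = clock.getAsLong();
        if (now - windowStart >= windowMs) { // window expired: reset lazily, no timer
            counts.clear();
            windowStart = now;
        }
        return counts.merge(requester, 1, Integer::sum) > maxLookups;
    }

    /** Sustained rate this configuration allows, in lookups per minute. */
    double perMinute() { return maxLookups * 60000.0 / windowMs; }
}
```

With this, new WindowThrottler(20, 3*60*1000, System::currentTimeMillis) allows about 6.7 lookups/minute, and new WindowThrottler(10, 30*1000, System::currentTimeMillis) allows 20/minute but trips on the 11th lookup within 30 seconds. Whether any single setting covers both goals is exactly what the pre- and post-2.3.0 data would have to show.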
The overall result of the changes: we used to have ONE lookup throttler that fired a timer every 3 minutes. Now EACH db fires about 32 timers every 3 minutes, and those are leaky, so after a few days of uptime we have hundreds of timers firing every 3 minutes.
Tentative recommendations, pending further research and idk review:
- Remove the two ban throttlers in FNDF and the commented-out code in the Handler
- Don't create/use throttlers in subdbs
- Consider removing the drop burst throttler and adjusting the original throttler's threshold and/or time constant after review of the data both pre- and post-2.3.0.
marking as blocker for 2.4.0