I2P Website issues: [#17 Crawling: i2p2.i2p recursive source loops](https://i2pgit.org/i2p-hackers/i2p.www/-/issues/17) (author: idk, updated 2021-04-30T03:54:10Z)
Opened [5 years ago](/timeline?from=2016-04-06T15%3A38%3A09Z&precision=second "See timeline at Apr 6, 2016 3:38:09 PM")
Last modified [5 years ago](/timeline?from=2016-05-04T16%3A25%3A58Z&precision=second "See timeline at May 4, 2016 4:25:58 PM")
## [\#1781](/ticket/1781) [assigned](/query?status=assigned) [defect](/query?status=!closed&type=defect)
# Crawling: i2p2.i2p recursive source loops
Reported by: [k1773r](/query?status=!closed&reporter=k1773r)
Owned by: [str4d](/query?status=!closed&owner=str4d)
Priority: [minor](/query?status=!closed&priority=minor)
Milestone: [undecided](/milestone/undecided "No date set")
Component: [www/i2p](/query?status=!closed&component=www%2Fi2p)
Version: [0.9.24](/query?status=!closed&version=0.9.24)
Keywords:
Cc:
Parent Tickets:
Sensitive: [no](/query?status=!closed&sensitive=0)
### Description
While crawling www.i2p2.i2p, I get recursive links that lead to a "page not found" page, but the HTTP status is 200. Those pages contain further nested links, and the whole cycle starts over. Eventually the crawl hits a real 404, as shown below.
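The ever-growing paths in the log are consistent with a "not found" page that contains relative links: each time the crawler resolves such a link against a non-existent URL, the path gets one level deeper, and because the server answers 200 the crawler keeps following it. A minimal Python sketch of that mechanism, assuming (hypothetically) that the not-found page links to the relative path `_static/styles/`; the actual links on the page are not given in this ticket.

```python
from urllib.parse import urljoin

# Hypothetical relative link assumed to appear on the "page not found" page,
# which the server returns with HTTP 200 for any requested path.
RELATIVE_LINK = "_static/styles/"


def crawl_chain(start_url: str, hops: int) -> list[str]:
    """Follow the same relative link repeatedly, as a naive crawler would
    if every resolved URL returned the same 200 "not found" page."""
    urls = [start_url]
    for _ in range(hops):
        # urljoin resolves the reference against the current URL's directory,
        # so each hop appends another copy of the link's path.
        urls.append(urljoin(urls[-1], RELATIVE_LINK))
    return urls


if __name__ == "__main__":
    for url in crawl_chain("https://geti2p.net/feeds/p/i2p/downloads/", 3):
        print(url)
```

Nothing in this chain ever terminates on its own; only a real 404, a depth limit, or an exclude rule stops it.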
Crawler log (the first URL on each line is the page crawled; the second is the page it was reached from):
```
2016-04-06T**:19:40.798Z 404 22321 https://geti2p.net/feeds/p/i2p/downloads/_static/styles/_static/styles/_static/_static/styles/_static/styles/_static/_static/index_ru.html LEEEEEEEELERR http://geti2p.net/feeds/p/i2p/downloads/_static/styles/_static/styles/_static/_static/styles/_static/styles/_static/_static/index_ru.html text/html #044 20160406**1940424+346 sha1:66374BVL4IQZ3HBJXFVOAYAZBWU6VGEQ - -
2016-04-06T**:19:39.700Z 404 22321 https://geti2p.net/feeds/p/i2p/downloads/_static/styles/_static/styles/_static/_static/styles/_static/styles/_static/_static/index_nl.html LEEEEEEEELERR http://geti2p.net/feeds/p/i2p/downloads/_static/styles/_static/styles/_static/_static/styles/_static/styles/_static/_static/index_nl.html text/html #018 20160406**1939082+603 sha1:JWWJX7KEBMZCBJSEZW6C3TQPEEA6VG32 - -
2016-04-06T**:19:38.583Z 404 22321 https://geti2p.net/feeds/p/i2p/downloads/_static/styles/_static/styles/_static/_static/styles/_static/styles/_static/_static/index_it.html LEEEEEEEELERR http://geti2p.net/feeds/p/i2p/downloads/_static/styles/_static/styles/_static/_static/styles/_static/styles/_static/_static/index_it.html text/html #047 20160406**1938203+365 sha1:TNDZLJEXSFWTE3UZ3FX4BHELNBQSAW3F - -
2016-04-06T**:19:37.853Z 404 22321 https://geti2p.net/feeds/p/i2p/downloads/_static/styles/_static/styles/_static/_static/styles/_static/styles/_static/_static/index_fr.html LEEEEEEEELERR http://geti2p.net/feeds/p/i2p/downloads/_static/styles/_static/styles/_static/_static/styles/_static/styles/_static/_static/index_fr.html text/html #029 20160406**1937490+336 sha1:UIIBTTZBEW2LHC5TIWALY33YBZPQ4Y5C - -
2016-04-06T**:19:37.081Z 404 22321 https://geti2p.net/feeds/p/i2p/downloads/_static/styles/_static/styles/_static/_static/styles/_static/styles/_static/_static/index_zh.html LEEEEEEEELERR http://geti2p.net/feeds/p/i2p/downloads/_static/styles/_static/styles/_static/_static/styles/_static/styles/_static/_static/index_zh.html text/html #018 20160406**1936671+397 sha1:P6IKCGRG77YEY3U3QGET6JQICO2M274M - -
2016-04-06T**:19:36.201Z 404 22321 https://geti2p.net/feeds/p/i2p/downloads/_static/styles/_static/styles/_static/_static/styles/_static/styles/_static/_static/index_es.html LEEEEEEEELERR http://geti2p.net/feeds/p/i2p/downloads/_static/styles/_static/styles/_static/_static/styles/_static/styles/_static/_static/index_es.html text/html #047 20160406**1935726+448 sha1:GWBZFXTRMUQZQIPJ4EKA3FW4ERRRLYHS - -
2016-04-06T**:19:35.361Z 404 22321 https://geti2p.net/feeds/p/i2p/downloads/_static/styles/_static/styles/_static/_static/styles/_static/styles/_static/_static/index_de.html LEEEEEEEELERR http://geti2p.net/feeds/p/i2p/downloads/_static/styles/_static/styles/_static/_static/styles/_static/styles/_static/_static/index_de.html text/html #040 20160406**1934995+353 sha1:M56A3Y62E7AJYUEURZ224EEEYXS3GYCP - -
2016-04-06T**:19:34.526Z 404 22318 https://geti2p.net/feeds/p/i2p/downloads/_static/styles/_static/styles/_static/_static/styles/_static/styles/_static/_static/index.html LEEEEEEEELERR http://geti2p.net/feeds/p/i2p/downloads/_static/styles/_static/styles/_static/_static/styles/_static/styles/_static/_static/index.html text/html #048 20160406**1934130+372 sha1:MAW4ZNR2RB4RFR6XG2UECOZCKQFT4TFW - -
```
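The log lines above look like the whitespace-separated crawl.log layout used by the Heritrix crawler (timestamp, status, size, URI, discovery path, via URI, MIME type, ...), though the ticket does not name the crawler, so treat that as an assumption. A small Python sketch for pulling out the fields relevant to this loop; the class and function names are hypothetical. It also checks whether an entry's via URL is the same address under a different scheme, which is what these entries show (`http://` leading to `https://`, matching the trailing `R` in the discovery path if `R` means redirect, as assumed here).

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass
class CrawlLogEntry:
    timestamp: str
    status: int
    size: int
    uri: str
    discovery_path: str  # assumed: one letter per hop (e.g. L = link, E = embed, R = redirect)
    via: str


def parse_line(line: str) -> CrawlLogEntry:
    """Parse the first six whitespace-separated fields of a crawl log line."""
    fields = line.split()
    if len(fields) < 6:
        raise ValueError(f"expected at least 6 fields, got {len(fields)}")
    return CrawlLogEntry(
        timestamp=fields[0],
        status=int(fields[1]),
        size=int(fields[2]),
        uri=fields[3],
        discovery_path=fields[4],
        via=fields[5],
    )


def differs_only_by_scheme(entry: CrawlLogEntry) -> bool:
    """True if the crawled URI and its via URI share host, path and query
    but use different schemes (e.g. an http -> https redirect)."""
    a, b = urlsplit(entry.uri), urlsplit(entry.via)
    return a.scheme != b.scheme and (a.netloc, a.path, a.query) == (b.netloc, b.path, b.query)
```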
The crawler would detect the loop after several levels of nesting, but for now I have just added an exclude regex.
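The ticket does not include the exclude regex itself. As a hedged illustration only, a Python sketch of two ways to filter such URLs: a site-specific pattern (hypothetical `EXCLUDE_NESTED_STATIC`, rejecting any path that contains `/_static/` more than once) and a generic heuristic (hypothetical `has_repeated_segments`) that flags any path segment occurring more than `max_repeats` times.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Hypothetical site-specific exclude pattern: a path containing "/_static/"
# twice or more is assumed to be part of the loop.
EXCLUDE_NESTED_STATIC = re.compile(r"/_static/.*/_static/")


def has_repeated_segments(url: str, max_repeats: int = 2) -> bool:
    """Generic heuristic: flag a URL if any single path segment occurs more
    than max_repeats times, catching loops without a site-specific regex."""
    segments = [s for s in urlsplit(url).path.split("/") if s]
    return any(n > max_repeats for n in Counter(segments).values())


def should_skip(url: str) -> bool:
    """Skip a URL if either the site-specific or the generic rule matches."""
    return bool(EXCLUDE_NESTED_STATIC.search(url)) or has_repeated_segments(url)
```

The generic rule can also reject legitimate deep URLs, so `max_repeats` would need tuning per site; the site-specific regex is narrower but has to be updated whenever the loop takes a different shape.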