I2PSnark filename conversion to builtin charset in windows may cause data loss
Opened 6 years ago
Last modified 6 years ago
#1415 new defect
Reported by: DjJeshk
Owned by: zzz
Priority: major
Milestone: 0.9.20
Component: apps/i2psnark
Version: 0.9.17
Keywords: filenames corruption i2psnark
Cc:
Parent Tickets:
Sensitive: no
Description
I2P version: 0.9.17-0
Java version: Oracle Corporation 1.7.0_71 (Java™ SE Runtime Environment 1.7.0_71-b14)
Wrapper version: 3.5.25
Server version: 8.1.16.v20140903
Servlet version: Jasper JSP 2.1 Engine
Platform: Windows XP x86 5.1
Processor: Core 2 (45nm) (core2)
Jbigi: Locally optimized native BigInteger library loaded from file
Encoding: Cp1257
Charset: windows-1257
If a torrent contains a filename with a character that does not exist in the system character set, that character is converted to _ (underscore). For example, "Sakın Bana Söyleme" is converted to "Sak_n Bana Söyleme".
If a torrent contains two different filenames that convert to the same filename, this may cause unexpected behaviour, including sending corrupted pieces to other peers or torrents that never finish.
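The collision can be illustrated with a minimal sketch. This is not I2PSnark's actual code: the class name and both helper methods are hypothetical, and the sketch only assumes that each unencodable character is replaced by an underscore, as described above. The second filename is invented for the demonstration.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FilenameCollisionDemo {

    /** Hypothetical helper: replaces every char the charset cannot encode with '_'. */
    static String toSystemCharset(String name, Charset cs) {
        CharsetEncoder enc = cs.newEncoder();
        StringBuilder sb = new StringBuilder(name.length());
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            sb.append(enc.canEncode(c) ? c : '_');
        }
        return sb.toString();
    }

    /**
     * Hypothetical helper: returns the first pair of distinct names that map
     * to the same converted name, or null if there is no such pair.
     */
    static String[] findCollision(List<String> names, Charset cs) {
        Map<String, String> seen = new HashMap<>();
        for (String name : names) {
            String converted = toSystemCharset(name, cs);
            String previous = seen.get(converted);
            if (previous != null && !previous.equals(name)) {
                return new String[] { previous, name };
            }
            seen.put(converted, name);
        }
        return null;
    }

    public static void main(String[] args) {
        // Charset from the report; assumed to be available in the running JVM.
        Charset cs = Charset.forName("windows-1257");
        // \u0131 is the dotless i from the report; \u011f (g with breve) is an
        // invented second character that windows-1257 also lacks.
        List<String> names = Arrays.asList("Sak\u0131n.mp3", "Sak\u011fn.mp3");
        String[] clash = findCollision(names, cs);
        if (clash != null) {
            System.out.println("Both names map to " + toSystemCharset(clash[0], cs));
        }
    }
}
```

Once two entries share one on-disk name, data written for one file can overwrite the other, which is consistent with the corrupted pieces and never-finishing torrents described above.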
Reason: a fixed-width, 1-byte character set is in use, which limits the usable characters to 256 - 32 = 224 (the first 32 characters are not allowed in filenames).
Solution: use the builtin 2-byte characters (wide chars, wchars, Unicode characters) and leave 1-byte character sets to the 1980s and 1990s, when Unicode was not yet implemented.