path: root/linux
Age  Commit message  Author
2014-10-15  Change 'big5' to 'Big5'  jshin@chromium.org

Some layout tests and chrome browser tests compare the encoding labels case-sensitively. Fixing them all takes a while. In the meantime, change the ICU alias table to use exactly what they expect (Big5). Also drop big5-hkscs from the alias list for Big5 for now. They'll be unified in another CL.

See the layout test failures for PS #2 at https://codereview.chromium.org/649413002

BUG=412053
TEST=browser_tests --gtest_filter=*ncoding*
TEST=Layout tests : http/tests/misc/char-encod*, fast/encoding/*
R=jsbell@chromium.org
TBR=jsbell@chromium.org

Review URL: https://codereview.chromium.org/654153002

git-svn-id: http://src.chromium.org/svn/trunk/deps/third_party/icu52@292476 4ff67af0-8c30-449e-8e8b-ad334ec8d88c
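The interaction between case-insensitive label lookup and case-sensitive name comparison can be sketched in Python. This is only an illustration: the alias table is a hypothetical three-entry subset, not the actual ICU alias file, and `canonical_name` is a made-up helper. It folds ASCII case when matching a label but returns the canonical name in one fixed spelling, which is what a case-sensitive test would compare against.

```python
# Hypothetical subset of an alias table: lowercase label -> canonical name.
# "Big5" (capital B) is the spelling the case-sensitive tests expect.
ALIASES = {
    "big5": "Big5",
    "cn-big5": "Big5",
    "csbig5": "Big5",
}

ASCII_WHITESPACE = "\t\n\x0c\r "


def ascii_lower(text):
    """Lowercase only A-Z, leaving every other character untouched."""
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in text)


def canonical_name(label):
    """Return the canonical encoding name for a label, or None if unknown."""
    key = ascii_lower(label.strip(ASCII_WHITESPACE))
    return ALIASES.get(key)
```

With this table, `canonical_name("BIG5")` and `canonical_name(" big5 ")` both yield `"Big5"`, while `canonical_name("big5-hkscs")` yields `None`, mirroring the temporary removal of that alias described in the commit.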
2014-10-13  Make all the single byte encodings compliant to the encoding spec.  jshin@chromium.org

1. Replace the current encoding alias list (heavily patched) with our own HTML5-specific alias list. It's mostly generated from encoding.json, which is in turn derived from the WHATWG Encoding living standard. The most notable difference is that UTF-32 entries are kept until bug 417850 is resolved. Two other differences are:
   a. Two aliases for iso-8859-8-i (logical and csiso88598i) are not listed. They're dealt with in Blink.
   b. Chinese (gb*, big5*) aliases are not yet aligned to the encoding spec pending our decision on the unification of Big5 / Big5-HKSCS and GBK / GB18030.

2. Replace all the single-byte mapping tables with ones generated automatically by scripts/single-byte-gen.sh, which uses index-* files downloaded from the WHATWG spec site. This will fix the decoding (ToUnicode) of windows-874 and windows-1253 while removing a lot of fallback/spurious mapping entries in the encoding direction ('FromUnicode') in a number of encodings.

3. Regenerate the ICU binary data files for Linux/Mac/Android/Windows/CrOS.

4. Remove the now-obsolete noop-*ucm files that were used to make the ISO-2022-CN* decoders return an empty string. They're not necessary any more because ISO-2022-CN* were made 'replacement' encodings in Blink and our version of ICU does not have any code for ISO-2022-CN* any more.

This cuts down the data size by 15kB. On Android, there's virtually no change in the data size because the previous data file on Android accidentally had smaller locale data for nb and ms.

BUG=412053
TEST=browser_tests --gtest_filter="*ncoding*"
TEST=net_unittest --gtest_filter="*ilenameUtil*"
TEST=base_unittests --gtest_filter="*Conv*"
TEST=Blink: fast/encoding/*
TEST=http://www.w3.org/International/tests/repository/encoding/indexes/results-indexes
TEST=http://www.w3.org/International/tests/repository/encoding/indexes/results-aliases
TEST=http://www.w3.org/International/tests/repository/run?manifest=encoding/indexes&test=windows-1253_test
TEST=http://www.w3.org/International/tests/repository/run?manifest=encoding/indexes&test=windows-874_test
R=jsbell@chromium.org

Review URL: https://codereview.chromium.org/598383002

git-svn-id: http://src.chromium.org/svn/trunk/deps/third_party/icu52@292447 4ff67af0-8c30-449e-8e8b-ad334ec8d88c
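How a single-byte decode table can be derived from a WHATWG index-* file is sketched below in Python. This is not scripts/single-byte-gen.sh (which is not shown in this log); it only assumes the published index file layout of "pointer, 0x-prefixed code point, comment" per line, with pointer 0 corresponding to byte 0x80. The function names and the two-line `SAMPLE` excerpt are illustrative assumptions.

```python
def parse_single_byte_index(text):
    """Parse index-*.txt content into a {byte value: code point} table."""
    table = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split()
        pointer = int(fields[0])          # decimal index, 0..127
        code_point = int(fields[1], 16)   # e.g. "0x20AC"
        table[0x80 + pointer] = code_point
    return table


def decode(data, table):
    """Decode bytes: ASCII passes through, unmapped bytes become U+FFFD."""
    out = []
    for byte in data:
        if byte < 0x80:
            out.append(chr(byte))
        else:
            out.append(chr(table.get(byte, 0xFFFD)))
    return "".join(out)


# Illustrative two-entry excerpt in the style of index-windows-1253.txt.
SAMPLE = (
    "# excerpt for illustration only\n"
    "  0\t0x20AC\t\u20ac (EURO SIGN)\n"
    " 65\t0x0391\t\u0391 (GREEK CAPITAL LETTER ALPHA)\n"
)
```

For example, `decode(b"A\x80\xc1", parse_single_byte_index(SAMPLE))` maps byte 0x80 to the euro sign and byte 0xC1 to Greek capital alpha; a byte missing from the table decodes to U+FFFD rather than to a fallback character.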
2014-09-25  Turn on UCONFIG_NO_NON_HTML5_CONVERTER  jshin@chromium.org

UCONFIG_NO_NON_HTML5_CONVERTER was added earlier to our copy of ICU, but it was never set to 1. It's my oversight.

1. Turn UCON..CONVERTER on in icu.gyp to drop all the encodings not required by the Encoding spec. Dropped encodings include UTF-7, BOCU, SCSU, CESU, ISCII, ISO-2022-{KR, CN*}, HZ-GB, and the ISO-2022-JP variants other than the original.

2. A lot more sections of the ICU converter code are excluded when it's set to 1, including the code for LMB (Lotus Multibyte) encodings and X11 compound text encoding (icu common).

3. The character encoding detection for the excluded encodings is also disabled. (icu i18n)

4. ISO-2022-{KR, CN*} and HZ-GB can be dropped now because Blink treats them as replacement encodings. The corresponding alias entries are also removed from convrtrs.txt.

5. ibm-874 was removed. We used to need it before Blink started, but not any more. We only need windows-874.

6. A mistake in convrtrs.txt was corrected: Big5-HKSCS was pointing to an old mapping table.

7. Per ICU upstream's suggestion, use the '-html' suffix instead of '-html5' for the encoding tables derived from the WHATWG's encoding spec (ibm866, shift_jis and euc-jp).

The static 64-bit release build of Chrome on Linux went down from 141,596,616 to 141,491,968 bytes (~ 100 kB reduction). Besides, the icu data size got smaller by ~ 19 kB (10,490,576 to 10,471,008 bytes).

See http://bugs.icu-project.org/trac/ticket/11296 for an upstream bug I've filed on the issue.

BUG=76328
TEST=browser_tests --gtest_filter="*ncoding*"
TEST=net_unittest --gtest_filter="*ilenameUtil*"
TEST=base_unittests --gtest_filter="*Conv*"
TEST=Blink: fast/encoding/*
TEST=With a shared library build, the following commands have no match.
  nm libicuuc.so | egrep -i '(bocu|scsu|utf7|2022kr|2022cn|iscii)'
  nm libicui18n.so | egrep -i '(2022kr|2022cn|ibm42)'
TEST=With a static library build, the following command has no match.
  nm chrome | egrep -i '(bocu|scsu|utf7|2022kr|2022cn|iscii|ibm42)'
R=jsbell@chromium.org, mark@chromium.org

Review URL: https://codereview.chromium.org/587833004

git-svn-id: http://src.chromium.org/svn/trunk/deps/third_party/icu52@292131 4ff67af0-8c30-449e-8e8b-ad334ec8d88c
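The `nm | egrep -i` checks in this commit amount to a case-insensitive search of the symbol listing for names of the excluded converters. A minimal Python sketch of the same filter follows, reusing the pattern from the static-build check. The function name and the sample symbol lines are hypothetical and serve only to show the matching behavior; they are not real output from any Chrome or ICU binary.

```python
import re

# Same alternatives as the egrep pattern used for the static build.
EXCLUDED = re.compile(
    r"(bocu|scsu|utf7|2022kr|2022cn|iscii|ibm42)", re.IGNORECASE
)


def leftover_symbols(nm_output):
    """Return the lines of an nm listing that mention an excluded converter."""
    return [line for line in nm_output.splitlines() if EXCLUDED.search(line)]


# Made-up symbol listing, for illustration only.
SAMPLE_NM = (
    "0000000000012340 T ucnv_open_52\n"
    "0000000000045670 t _SCSUOpen\n"
)
```

An empty result from `leftover_symbols` corresponds to the "no match" outcome the TEST lines expect; with the made-up sample above, the second line would be reported.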
2014-09-02  Update tz data to 2014f and add SJIS for the encoding spec  jshin@chromium.org

1. Update the timezone data files (4 of them) in source/data/misc to 2014f (the latest) to prepare for an upcoming Russian timezone change.

2. Add a Shift_JIS converter compliant with the WHATWG encoding spec.

3. Update convrtrs.txt and ucmlocal.mk accordingly.

4. Update the pre-built data files for Linux/Mac/Android/Windows. (icudt.dll is not updated in this CL. It's not used in the default configuration. It'll be updated in a separate CL.)

5. Fix a typo in ibm866_gen.sh. The actual table used does not need a change.

BUG=277062,404445
TEST=After rolling icu to this revision, the following tests should pass.
TEST=Blink: fast/encoding/* all pass except for fast/encoding/api/ascii-supersets.html, which should fail by *passing* the test for Shift_JIS that is expected to fail. Blink layout tests need to be updated.
TEST=browser_tests --gtest_filter="*ncoding*"
TEST=In the JS console, run the following to check that Europe/Moscow is 3 hrs ahead of UTC after Oct 26 and 4 hrs ahead before that, and that Asia/Kamchatka remains 12 hrs ahead of UTC.
  nov1_2014_1500=new Date("11/01/2014 15:00Z")
  nov1_2014_1500.toLocaleString("en", {timeZone: "Europe/Moscow"})
  nov1_2014_1500.toLocaleString("en", {timeZone: "UTC"})
  nov1_2014_1500.toLocaleString("en", {timeZone: "Asia/Kamchatka"})
  oct24_2014_1500=new Date("10/24/2014 15:00Z")
  oct24_2014_1500.toLocaleString("en", {timeZone: "Europe/Moscow"})
  oct24_2014_1500.toLocaleString("en", {timeZone: "UTC"})
  oct24_2014_1500.toLocaleString("en", {timeZone: "Asia/Kamchatka"})
TEST=net_unittest --gtest_filter="*ilenameUtil*"
TEST=base_unittests --gtest_filter="*Conv*"
R=jsbell@chromium.org

Review URL: https://codereview.chromium.org/497543003

git-svn-id: http://src.chromium.org/svn/trunk/deps/third_party/icu52@291774 4ff67af0-8c30-449e-8e8b-ad334ec8d88c
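The wall-clock times the JS console check should print can be worked out by hand from the offsets stated in the commit. The Python sketch below does that arithmetic with fixed offsets only; it does not consult any timezone database, so it says what the results should be, not what a given ICU build returns. The helper names are hypothetical, and the cutoff of midnight UTC on Oct 26 is an assumption standing in for "after Oct 26" (the exact transition instant is not given in the commit).

```python
from datetime import datetime, timedelta, timezone

# Assumed cutoff; the commit only distinguishes "before" and "after Oct 26".
MOSCOW_CHANGE_UTC = datetime(2014, 10, 26, tzinfo=timezone.utc)


def expected_offset_hours(zone, when_utc):
    """Offset from UTC, in hours, that the commit message expects."""
    if zone == "UTC":
        return 0
    if zone == "Asia/Kamchatka":
        return 12
    if zone == "Europe/Moscow":
        return 3 if when_utc >= MOSCOW_CHANGE_UTC else 4
    raise ValueError("zone not covered by this sketch: " + zone)


def expected_local(zone, when_utc):
    """Expected local wall-clock time as 'YYYY-MM-DD HH:MM'."""
    tz = timezone(timedelta(hours=expected_offset_hours(zone, when_utc)))
    return when_utc.astimezone(tz).strftime("%Y-%m-%d %H:%M")


nov1_2014_1500 = datetime(2014, 11, 1, 15, 0, tzinfo=timezone.utc)
oct24_2014_1500 = datetime(2014, 10, 24, 15, 0, tzinfo=timezone.utc)
```

Under these assumptions, 15:00 UTC corresponds to 18:00 in Moscow on Nov 1 and 19:00 on Oct 24, and to 03:00 on the following day in Kamchatka on both dates.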
2014-05-24  Add a few words to cjdict.txt and remove an unintended change in line.txt  jshin@chromium.org

When the upstream took our patches for CJ segmentation, over 10k words were dropped. Some of them are pretty common and not having them led to a Blink layout test failure. Add several of them back to cjdict.txt.

In addition, remove a patch that breaks line breaking around single/double quotation marks.

Rebuild the data for Linux/Mac/Windows/Android.

BUG=132145
TEST=Once rolled, layouttest:fast/text/international/cjk-segmentation.html and fast/hyphen-min-preferred-width.html pass.
TBR=mark

Review URL: https://codereview.chromium.org/292123005

git-svn-id: http://src.chromium.org/svn/trunk/deps/third_party/icu52@272650 4ff67af0-8c30-449e-8e8b-ad334ec8d88c
2014-05-05  Add back display names for non-UI languages in A-L list  jshin@chromium.org

I was too aggressive in trimming the data and dropped the display names for languages that Chromium needs (for non-UI languages that are in the A-L list). That was not my intention (the comment in trim_data.sh said one thing, but the code did another).

Besides, add Norwegian (nb) and Malay (ms) locale data that were left out by mistake.

Also update the trim_data.sh script NOT to drop 'ALIAS' lines, which are used to indicate that a given locale is an alias to another locale. That also required adding ro_MD.txt (a null locale which mo.txt is aliased to).

The above three changes add about 110kB to the icu data (from 10.3MB to 10.4MB).

Also update the pre-built icu data files for Linux, Mac and Windows. The Android data will be updated in a follow-up patch.

BUG=132145
TEST=When ICU is rolled, unit_tests:ExtensionL10* pass.
TBR=mark

Review URL: https://codereview.chromium.org/264973016

git-svn-id: http://src.chromium.org/svn/trunk/deps/third_party/icu52@268285 4ff67af0-8c30-449e-8e8b-ad334ec8d88c
2014-04-29  Update EUC-JP per WHATWG encoding spec  jshin@chromium.org

- Add missing half-width kana entries (omitted by mistake)
- Drop 'extra' decoding only mapping. See https://www.w3.org/Bugs/Public/show_bug.cgi?id=25266
- Regenerate icu data files (*dat and assembly source files) for Linux, Mac, Windows and Android. (They'll not be shown at codereview.chromium.org because they're too large.)

BUG=132145,78847
TEST=When ICU is rolled in, base_unittests --gtest_filter=*ICU* and layout tests
R=jsbell@chromium.org

Review URL: https://codereview.chromium.org/251203003

git-svn-id: http://src.chromium.org/svn/trunk/deps/third_party/icu52@266919 4ff67af0-8c30-449e-8e8b-ad334ec8d88c
2014-04-22  Trim unit* sections in data/locales/*  jshin@chromium.org

Add a 'filter_locale_data' function to trim_data.sh.

Chromium/Blink do not use most of the unit* sections in locale data. Keep only the duration and compound sub-sections.

Update the icudtl.dat and two assembly source files for Mac/Linux.

It saves ~200kB (uncompressed). The 7z-compressed size reduction is 34kB.

With all these changes (up to this CL) applied, the net increase of the ICU data from icu 46 to 52 is 49kB 7z-compressed (3,070,246 vs 3,021,457) and ~ 390kB uncompressed (10,370,656 vs 9,980,368).

BUG=132145
TEST=None.
TBR=mark

Review URL: https://codereview.chromium.org/247663002

git-svn-id: http://src.chromium.org/svn/trunk/deps/third_party/icu52@265354 4ff67af0-8c30-449e-8e8b-ad334ec8d88c
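The size figures quoted in this commit can be cross-checked with a few lines of arithmetic. The Python sketch below uses only the byte counts given in the message; `size_delta` is a hypothetical helper and the percentage column is an added convenience, not something the commit reports.

```python
def size_delta(new_bytes, old_bytes):
    """Return (absolute increase in bytes, relative increase in percent)."""
    diff = new_bytes - old_bytes
    return diff, 100.0 * diff / old_bytes


# ICU 52 vs ICU 46 data sizes as quoted in the commit message.
compressed_diff, compressed_pct = size_delta(3_070_246, 3_021_457)
uncompressed_diff, uncompressed_pct = size_delta(10_370_656, 9_980_368)
```

The differences come to 48,789 bytes compressed and 390,288 bytes uncompressed, which agrees with the rounded 49kB and ~390kB figures in the message.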
2014-04-18  Remove {big5,gb2312}han collation data  jshin@chromium.org

1. {big5,gb2312}han collation data is not used by anybody because they're useless as a sorting order. Add a function to trim_data.sh to remove them from zh.txt.

2. Remove remove_unihan.sh and add back unihan rules to coll/{zh,ja,ko}.txt. In ICU 52, tools/genrb does NOT include unihan collation by default, so we don't have to bother to remove it from the rule files.

3. Remove obsolete patch files (locale[23].patch)

4. Add LICENSE file (converted from license.html)

5. Update README.chromium accordingly.

6. Check in the updated data file/assembly files.

The net saving in icudtl.dat is ~ 220kB.

BUG=132145
TEST=icudtl.dat is 10576480
TBR=mark

Review URL: https://codereview.chromium.org/243763002

git-svn-id: http://src.chromium.org/svn/trunk/deps/third_party/icu52@264857 4ff67af0-8c30-449e-8e8b-ad334ec8d88c
2014-04-18  Trim ICU data to reduce the download size/memory usage  jshin@chromium.org

Add a shell script, trim_data.sh, along with locale list files, to trim the ICU data further. The script does the following:

1. Remove the display names of languages NOT listed in Chrome's Accept-Language list. (800kB)

2. Minimize the locale data for locales listed in the A-L list that are not a UI locale in Chrome. For those locales, exemplar characters, the display name in the native language and layout direction are included. (640kB)

3. Filter the region data to drop numeric region display names other than 419 (Latin-America). (50kB)

4. Filter the currency data (display name and plurals) for historic currencies. (200kB)

This CL also checks in icudtl.dat (source/data/in) and icudt_dat.S (mac and linux). Note that I dropped '52' (the version number) in the assembly source file name and icu.gyp was adjusted accordingly.

With all these changes, icudtl.dat is ~ 800kB larger than that in ICU 4.6. The 7z compression (as used by the installer) makes the size difference go down to ~ 130kB.

BUG=132145
TEST=The icudtl.dat (uncompressed) is about 10.7MB instead of 12.4MB without this CL.
R=mark@chromium.org

Review URL: https://codereview.chromium.org/239543018

git-svn-id: http://src.chromium.org/svn/trunk/deps/third_party/icu52@264811 4ff67af0-8c30-449e-8e8b-ad334ec8d88c
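The four per-step savings listed in this commit can be added up and compared with the overall before/after sizes in the TEST line. The Python sketch below is just that sanity check; the dictionary keys are shorthand labels chosen here, and treating 1MB as 1000kB is an assumption, since the commit does not say which convention it uses.

```python
# Approximate savings per trimming step, in kB, as quoted in the commit.
SAVINGS_KB = {
    "language display names outside the A-L list": 800,
    "locale data for non-UI A-L locales": 640,
    "numeric region display names": 50,
    "historic currency data": 200,
}

total_saved_kb = sum(SAVINGS_KB.values())

# Overall figures from the TEST line: 12.4MB untrimmed vs 10.7MB trimmed.
overall_saved_mb = 12.4 - 10.7
```

The steps sum to 1,690kB, which is consistent with the roughly 1.7MB difference between the untrimmed and trimmed icudtl.dat.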