Age | Commit message | Author |
|
Some layout tests and chrome browser tests compare the encoding labels
case-sensitively. Fixing them all takes a while. In the meantime,
change the ICU alias table to use exactly what they expect (Big5).
Also drop big5-hkscs from the alias for Big5 for now. They'll be
unified in another CL.
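The mismatch can be illustrated with a minimal sketch. The WHATWG Encoding
standard matches labels ASCII case-insensitively after stripping ASCII
whitespace, whereas the affected tests compare the strings exactly. The
function names below are hypothetical and are not taken from Chromium or ICU.

```python
# Map only A-Z to a-z so that non-ASCII characters are left untouched.
ASCII_LOWER = {c: c + 32 for c in range(ord("A"), ord("Z") + 1)}
ASCII_WHITESPACE = "\t\n\f\r "


def labels_equal_strict(a, b):
    """Exact comparison, as the case-sensitive tests effectively do."""
    return a == b


def labels_equal_ascii_ci(a, b):
    """ASCII case-insensitive comparison after trimming ASCII whitespace."""
    def normalize(label):
        return label.strip(ASCII_WHITESPACE).translate(ASCII_LOWER)
    return normalize(a) == normalize(b)
```

With the strict comparison, "big5" and "Big5" are different labels, which is
why the alias table has to spell the name exactly as the tests expect.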
See the layout test failures for PS #2 at https://codereview.chromium.org/649413002
BUG=412053
TEST=browser_tests --gtest_filter=*ncoding*
TEST=Layout tests : http/tests/misc/char-encod*, fast/encoding/*
R=jsbell@chromium.org
TBR=jsbell@chromium.org
Review URL: https://codereview.chromium.org/654153002
git-svn-id: http://src.chromium.org/svn/trunk/deps/third_party/icu52@292476 4ff67af0-8c30-449e-8e8b-ad334ec8d88c
|
|
1. Replace the current encoding alias list (heavily patched) with our own
HTML5-specific alias list. It's mostly generated from encoding.json, which
is in turn derived from the WHATWG Encoding living standard. The most notable
difference is that UTF-32 entries are kept until bug 417850 is resolved.
Two other differences are:
a. Two aliases for iso-8859-8-i (logical and csiso88598i) are not listed.
They're dealt with in Blink.
b. Chinese (gb*, big5*) aliases are not yet aligned to the encoding
spec pending our decision on the unification of Big5 / Big5-HKSCS and
GBK / GB18030.
2. Replace all the single-byte mapping tables with what's automatically
generated with scripts/single-byte-gen.sh that uses index-* files downloaded
from the WHATWG spec site. This will fix the decoding (ToUnicode) of
windows-874 and windows-1253 while removing a lot of fallback/spurious
mapping entries in encoding direction ('FromUnicode') in a number of encodings.
3. Regenerate the ICU binary data files for Linux/Mac/Android/Windows/CrOS.
4. Remove the now-obsolete noop-*ucm files that were used to make the
ISO-2022-CN* decoders return an empty string. They're not necessary any more because ISO-2022-CN*
were made 'replacement' encodings in Blink and our version of ICU does not
have any code for ISO-2022-CN* any more.
This cuts down the data size by 15kB. On Android, there's virtually
no change in the data size because the previous data file on Android
accidentally had smaller locale data for nb and ms.
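For item 2, the conversion from a WHATWG index-* file to ICU .ucm mapping
lines can be sketched roughly as follows. This is an illustration only and is
not the contents of scripts/single-byte-gen.sh. It assumes the published index
layout (a pointer, a 0x-prefixed code point and a name per row, with '#'
comment lines) and that a single-byte pointer maps to the byte 0x80 + pointer.
The function name and the two sample rows are assumptions.

```python
def index_to_ucm_lines(index_text):
    """Turn WHATWG single-byte index rows into .ucm round-trip mapping lines."""
    ucm_lines = []
    for raw in index_text.splitlines():
        row = raw.strip()
        if not row or row.startswith("#"):
            continue  # skip blank lines and comments
        fields = row.split()
        pointer = int(fields[0])
        code_point = int(fields[1], 16)  # int() accepts the "0x" prefix in base 16
        byte = 0x80 + pointer            # single-byte indexes cover 0x80..0xFF
        # "|0" marks a round-trip mapping in the .ucm format.
        ucm_lines.append(f"<U{code_point:04X}> \\x{byte:02X} |0")
    return ucm_lines


# Two illustrative rows in the style of index-windows-1253.txt.
SAMPLE_INDEX = "# sample rows\n  0\t0x20AC\tEURO SIGN\n  2\t0x201A\tSINGLE LOW-9 QUOTATION MARK\n"
```

Pointers that are absent from the index (pointer 1 in the sample) simply
produce no mapping line, so the corresponding byte stays unmapped.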
BUG=412053
TEST=browser_tests --gtest_filter="*ncoding*"
TEST=net_unittest --gtest_filter="*ilenameUtil*"
TEST=base_unittests --gtest_filter="*Conv*"
TEST=Blink: fast/encoding/*
TEST=http://www.w3.org/International/tests/repository/encoding/indexes/results-indexes
TEST=http://www.w3.org/International/tests/repository/encoding/indexes/results-aliases
TEST=http://www.w3.org/International/tests/repository/run?manifest=encoding/indexes&test=windows-1253_test
TEST=http://www.w3.org/International/tests/repository/run?manifest=encoding/indexes&test=windows-874_test
R=jsbell@chromium.org
Review URL: https://codereview.chromium.org/598383002
git-svn-id: http://src.chromium.org/svn/trunk/deps/third_party/icu52@292447 4ff67af0-8c30-449e-8e8b-ad334ec8d88c
|
|
UCONFIG_NO_NON_HTML5_CONVERTER was added earlier to our copy of ICU, but
it was never set to 1. That was my oversight.
1. Turns UCON..CONVERTER on in icu.gyp to drop all the encodings not
required by the Encoding spec. Dropped encodings include
UTF-7, BOCU, SCSU, CESU, ISCII, ISO-2022-{KR, CN*}, HZ-GB, ISO-2022-JP's
other than the original.
2. A lot more sections of the ICU converter code are excluded when
it's set to 1 including the code for LMB (Lotus Multibyte) encodings and
X11 compound text encoding (icu common).
3. Character encoding detection for the excluded encodings is also disabled.
(icu i18n)
4. ISO-2022-{KR, CN*} and HZ-GB can be dropped now because Blink treats them
as replacement encodings. The corresponding alias entries in convrtrs.txt
are also removed.
5. ibm-874 was removed. We used to need it before Blink started, but not any
more. We only need windows-874.
6. A mistake in convrtrs.txt was corrected: Big5-HKSCS was pointing to
an old mapping table.
7. Per ICU upstream's suggestion, use '-html' suffix instead of '-html5'
for the encoding tables derived from the WHATWG's encoding spec (ibm866,
shift_jis and euc-jp).
The static 64-bit release build of Chrome on Linux went down from
141,596,616 to 141,491,968 bytes (~ 100 kB reduction). Besides, the icu data
size got smaller by ~ 19 kB ( 10,490,576 to 10,471,008 bytes).
See http://bugs.icu-project.org/trac/ticket/11296 for an upstream bug
I've filed on the issue.
BUG=76328
TEST=browser_tests --gtest_filter="*ncoding*"
TEST=net_unittest --gtest_filter="*ilenameUtil*"
TEST=base_unittests --gtest_filter="*Conv*"
TEST=Blink: fast/encoding/*
TEST=With shared library build, the following has no match.
nm libicuuc.so | egrep -i '(bocu|scsu|utf7|2022kr|2022cn|iscii)'
nm libicui18n.so | egrep -i '(2022kr|2022cn|ibm42)'
TEST=With static library build, the following has no match.
nm chrome | egrep -i '(bocu|scsu|utf7|2022kr|2022cn|iscii|ibm42)'
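The nm checks above can also be expressed as a small script. The sketch below
applies the same case-insensitive pattern to captured nm output and returns
any lines that still mention a dropped converter. The helper name and the
sample symbol lines are made up for illustration.

```python
import re

# Same alternation as the egrep -i patterns in the TEST lines above.
DROPPED_CONVERTERS = re.compile(
    r"(bocu|scsu|utf7|2022kr|2022cn|iscii|ibm42)", re.IGNORECASE
)


def leftover_symbols(nm_output):
    """Return the nm output lines that match a dropped converter name."""
    return [line for line in nm_output.splitlines()
            if DROPPED_CONVERTERS.search(line)]


# Hypothetical nm output; a clean build should yield an empty list.
SAMPLE_NM = (
    "0000000000001000 T example_kept_symbol\n"
    "0000000000002000 T example_SCSU_symbol\n"
)
```

An empty result corresponds to the "has no match" expectation in the TEST
lines.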
R=jsbell@chromium.org, mark@chromium.org
Review URL: https://codereview.chromium.org/587833004
git-svn-id: http://src.chromium.org/svn/trunk/deps/third_party/icu52@292131 4ff67af0-8c30-449e-8e8b-ad334ec8d88c
|
|
1. Update the timezone data files (4 of them) in source/data/misc to 2014f (the latest)
to prepare for an upcoming Russian timezone change.
2. Add a Shift_JIS converter compliant with the WHATWG encoding spec.
3. Update convrtrs.txt and ucmlocal.mk accordingly.
4. Update the pre-built data files for Linux/Mac/Android/Windows.
(icudt.dll is not updated in this CL. It's not used in the default
configuration. It'll be updated in a separate CL).
5. Fix a typo in ibm866_gen.sh. The actual table used does not need a change.
BUG=277062,404445
TEST=After rolling icu to this revision, the following tests should pass.
TEST=Blink: fast/encoding/* all pass except for
fast/encoding/api/ascii-supersets.html that should fail by *passing*
the test for Shift_JIS, which is expected to fail. Blink layout tests need
to be updated.
TEST=browser_tests --gtest_filter="*ncoding*"
TEST=In JS console, run the following to check if Europe/Moscow is
3 hrs ahead of UTC after Oct 26 and 4 hrs ahead before that and
if Asia/Kamchatka remains 12 hrs ahead of UTC.
nov1_2014_1500=new Date("11/01/2014 15:00Z")
nov1_2014_1500.toLocaleString("en", {timeZone: "Europe/Moscow"})
nov1_2014_1500.toLocaleString("en", {timeZone: "UTC"})
nov1_2014_1500.toLocaleString("en", {timeZone: "Asia/Kamchatka"})
oct24_2014_1500=new Date("10/24/2014 15:00Z")
oct24_2014_1500.toLocaleString("en", {timeZone: "Europe/Moscow"})
oct24_2014_1500.toLocaleString("en", {timeZone: "UTC"})
oct24_2014_1500.toLocaleString("en", {timeZone: "Asia/Kamchatka"})
TEST=net_unittest --gtest_filter="*ilenameUtil*"
TEST=base_unittests --gtest_filter="*Conv*"
R=jsbell@chromium.org
Review URL: https://codereview.chromium.org/497543003
git-svn-id: http://src.chromium.org/svn/trunk/deps/third_party/icu52@291774 4ff67af0-8c30-449e-8e8b-ad334ec8d88c
|
|
When the upstream took our patches for CJ segmentation, over 10k words were
dropped. Some of them are pretty common and not having them led to a Blink layout test
failure. Add several of them back to cjdict.txt.
In addition, remove a patch that breaks line breaking around single/double
quotation marks.
Rebuild the data for Linux/Mac/Windows/Android.
BUG=132145
TEST=Once rolled, layouttest:fast/text/international/cjk-segmentation.html and fast/hyphen-min-preferred-width.html pass.
TBR=mark
Review URL: https://codereview.chromium.org/292123005
git-svn-id: http://src.chromium.org/svn/trunk/deps/third_party/icu52@272650 4ff67af0-8c30-449e-8e8b-ad334ec8d88c
|
|
I was too aggressive in trimming the data and dropped the display
names for languages that Chromium needs (for non-UI languages
that are in the A-L list). That was not my intention (the comment in
trim_data.sh said one thing, but the code did another).
Besides, add Norwegian (nb) and Malay (ms) locale data that were not
included by mistake.
Also update trim_data.sh script NOT to drop 'ALIAS' lines which are
used to indicate that a given locale is an alias to another locale.
That also required adding ro_MD.txt (null locale which mo.txt is
aliased to).
The three changes above add about 110kB to the icu data (from 10.3MB to 10.4MB).
Also update the pre-built icu data files for Linux, Mac and Windows.
The Android data will be updated in a follow-up patch.
BUG=132145
TEST=When ICU is rolled, unit_tests:ExtensionL10* pass.
TBR=mark
Review URL: https://codereview.chromium.org/264973016
git-svn-id: http://src.chromium.org/svn/trunk/deps/third_party/icu52@268285 4ff67af0-8c30-449e-8e8b-ad334ec8d88c
|
|
- Add missing half-width kana entries (omitted by mistake)
- Drop 'extra' decoding only mapping. See
https://www.w3.org/Bugs/Public/show_bug.cgi?id=25266
- Regenerate icu data files (*dat and assembly source files) for Linux,
Mac, Windows and Android. (they'll not be shown at
codereview.chromium.org because they're too large).
BUG=132145,78847
TEST=When ICU is rolled in, base_unittests --gtest_filter=*ICU*
and layout tests
R=jsbell@chromium.org
Review URL: https://codereview.chromium.org/251203003
git-svn-id: http://src.chromium.org/svn/trunk/deps/third_party/icu52@266919 4ff67af0-8c30-449e-8e8b-ad334ec8d88c
|
|
Add 'filter_locale_data' function to trim_data.sh
Chromium/Blink do not use most of unit* sections in locale data. Keep
only duration and compound sub-sections.
Update the icudtl.dat and two assembly source files for Mac/Linux.
It saves ~200kB (uncompressed). 7z-compressed size reduction is 34kB.
With all these changes (up to this CL) applied, the net increase of the ICU data from ICU 4.6 to 52 is 49kB when 7z-compressed
(3,070,246 vs 3,021,457) and ~ 390kB uncompressed (10,370,656 vs 9,980,368).
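The size figures can be checked with a short calculation. The sketch below
assumes that "kB" means 1000 bytes, which is what the rounded numbers in this
message suggest. The helper name is hypothetical.

```python
def size_increase(new_bytes, old_bytes):
    """Return the size difference in bytes and in kB (1 kB = 1000 bytes)."""
    delta = new_bytes - old_bytes
    return delta, delta / 1000


# Byte counts quoted in the commit message (ICU 52 vs ICU 4.6).
compressed = size_increase(3_070_246, 3_021_457)
uncompressed = size_increase(10_370_656, 9_980_368)
```

The differences are 48,789 bytes compressed and 390,288 bytes uncompressed,
which round to the 49kB and ~390kB stated above.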
BUG=132145
TEST=None.
TBR=mark
Review URL: https://codereview.chromium.org/247663002
git-svn-id: http://src.chromium.org/svn/trunk/deps/third_party/icu52@265354 4ff67af0-8c30-449e-8e8b-ad334ec8d88c
|
|
1. {big5,gb2312}han collation data is not used by anybody because it's
useless as a sorting order.
Add a function to trim_data.sh to remove them from zh.txt
2. Remove remove_unihan.sh and add back unihan rules to coll/{zh,ja,ko}.txt.
In ICU 52, tools/genrb does NOT include unihan collation by default so that
we don't have to bother to remove it from the rule files.
3. Remove obsolete patch files (locale[23].patch)
4. Add LICENSE file (converted from license.html)
5. Update README.chromium accordingly.
6. Check in the updated data file/assembly files.
The net saving in icudtl.dat is ~ 220kB.
BUG=132145
TEST=icudtl.dat is 10576480
TBR=mark
Review URL: https://codereview.chromium.org/243763002
git-svn-id: http://src.chromium.org/svn/trunk/deps/third_party/icu52@264857 4ff67af0-8c30-449e-8e8b-ad334ec8d88c
|
|
Add a shell script to trim the ICU data further : trim_data.sh along with
locale list files. The script does the following:
1. Remove the display names of languages NOT listed in Chrome's Accept-Language
list. (800kB)
2. Minimize the locale data for locales listed in the A-L list that are
not a UI locale in Chrome. For those locales, exemplar characters,
the display name in the native language and layout direction are included.
(640kB)
3. Filter the region data to drop numeric region display names other than 419
(Latin-America). (50kB)
4. Filter the currency data (display name and plurals) for historic currencies.
(200kB)
This CL also checks in icudtl.dat (source/data/in) and
icudt_dat.S (mac and linux). Note that I dropped '52' (the version number)
in the assembly source file name and icu.gyp was adjusted accordingly.
With all these changes, icudtl.dat is ~ 800kB larger than that in ICU 4.6.
The 7z compression (as used by the installer) makes the size difference
go down to ~ 130kB.
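Step 1 of the script amounts to filtering a table of language display names
against an allow-list. The sketch below shows that idea in Python rather than
shell. It is not the logic of trim_data.sh itself, and the function name, the
sample display names and the sample Accept-Language list are all assumptions.

```python
def filter_language_names(display_names, accept_languages):
    """Keep only display names whose language code is in the allow-list."""
    keep = set(accept_languages)
    return {code: name for code, name in display_names.items() if code in keep}


# Made-up sample data; the real lists live in the locale list files.
SAMPLE_NAMES = {"en": "English", "fr": "French", "xx": "Example Language"}
SAMPLE_ACCEPT_LANGUAGES = ["en", "fr"]

trimmed = filter_language_names(SAMPLE_NAMES, SAMPLE_ACCEPT_LANGUAGES)
```

Entries for codes outside the list ("xx" in the sample) are dropped, which is
where the size saving comes from.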
BUG=132145
TEST=The icudtl.dat (uncompressed) is about 10.7MB instead of 12.4MB without this CL.
R=mark@chromium.org
Review URL: https://codereview.chromium.org/239543018
git-svn-id: http://src.chromium.org/svn/trunk/deps/third_party/icu52@264811 4ff67af0-8c30-449e-8e8b-ad334ec8d88c
|