author: Victor Chang <vichang@google.com> 2022-11-10 18:19:32 +0000
committer: Victor Chang <vichang@google.com> 2022-11-15 15:01:06 +0000
commit: 5e4a54c070b83ad6a8cb0c0477f2e51811d27fbd
tree: fad4be374e0cba578ea1a86251e0615bbf861a22 /icu4j
parent: 847beadc68e1ca6cbade163d6df5b1cee13fb96b
Cherry-pick: ICU-22119 Add lw=phrase for Korean using line_*_phrase_cj

Upstream commit: https://github.com/unicode-org/icu/commit/05dc2ac924a0acd3f9a7ca65e8a6078f8f62d299

brkitr/ko.txt is created to use line_*_cj.txt for both the lw=phrase and lw != phrase cases for Korean. This is the simplest way to fix ICU-22119. It takes advantage of the fact that ICU does not have a Korean dictionary, so we don't have to worry about adding the list of Korean particles to keep them attached to the preceding word.

The downside is that it only works when the locale is ko or ja, while it should work in any locale. Another downside is that it makes ICU deviate from CSS3 by using the same CJ (conditional Japanese) rules for Korean as well. However, the CSS3 spec is wrong on that point and should be changed. See https://unicode-org.atlassian.net/browse/CLDR-4931 .

(cherry picked from commit 05dc2ac924a0acd3f9a7ca65e8a6078f8f62d299)

Bug: 244777768
Test: CtsIcuTestCases
Test: CtsIcu4cTestCases
Change-Id: Ic61256f4f30c2f1c2a5dab19384bbe726f3e8bd4
(cherry picked from commit 089645b23c69fffbbe8d2327585d5235b2df8f06)
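
For readers skimming the change, the locale check touched in BreakIteratorFactory.java can be sketched in isolation as follows. This is a minimal illustration, not the ICU4J source: the class name LwPhraseKeySketch, the method appendPhraseSuffix, and the "_loose" example suffix are hypothetical, and the real code reads the language and the "lw" keyword from the locale object rather than taking them as plain strings.

```java
// Illustrative sketch only; names here are hypothetical and not part of ICU4J.
final class LwPhraseKeySketch {

    /**
     * Returns typeKeyExt with "_phrase" appended when the language is "ja" or "ko"
     * and the lw keyword value is "phrase"; otherwise returns typeKeyExt unchanged.
     *
     * @param typeKeyExt suffix built so far (may be empty)
     * @param language   language code, may be null
     * @param lwValue    value of the "lw" locale keyword, may be null
     */
    static String appendPhraseSuffix(String typeKeyExt, String language, String lwValue) {
        // Per this change, lw=phrase is honored only for Japanese and Korean.
        boolean supported = "ja".equals(language) || "ko".equals(language);
        if (supported && "phrase".equals(lwValue)) {
            return typeKeyExt + "_" + lwValue;
        }
        return typeKeyExt;
    }

    public static void main(String[] args) {
        // Korean now gets the phrase suffix, as Japanese already did.
        System.out.println(appendPhraseSuffix("", "ko", "phrase"));       // prints "_phrase"
        // "_loose" is an assumed example of a suffix computed earlier.
        System.out.println(appendPhraseSuffix("_loose", "ja", "phrase")); // prints "_loose_phrase"
        // Other languages are left unchanged, which is the stated downside.
        System.out.println(appendPhraseSuffix("", "en", "phrase").isEmpty()); // prints "true"
    }
}
```

The null-safe "literal".equals(value) form stands in for the explicit null checks in the patch; the behavior for null inputs is the same.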
Diffstat (limited to 'icu4j')
-rw-r--r-- icu4j/main/classes/core/src/com/ibm/icu/text/BreakIteratorFactory.java | 3
-rw-r--r-- icu4j/main/tests/core/src/com/ibm/icu/dev/test/rbbi/rbbitst.txt | 52
2 files changed, 54 insertions, 1 deletion
diff --git a/icu4j/main/classes/core/src/com/ibm/icu/text/BreakIteratorFactory.java b/icu4j/main/classes/core/src/com/ibm/icu/text/BreakIteratorFactory.java
index 3de520597..c78f36ed6 100644
--- a/icu4j/main/classes/core/src/com/ibm/icu/text/BreakIteratorFactory.java
+++ b/icu4j/main/classes/core/src/com/ibm/icu/text/BreakIteratorFactory.java
@@ -136,7 +136,8 @@ final class BreakIteratorFactory extends BreakIterator.BreakIteratorServiceShim
typeKeyExt = "_" + keyValue;
}
String language = locale.getLanguage();
- if (language != null && language.equals("ja")) {
+ // lw=phrase is only supported in Japanese and Korean
+ if (language != null && (language.equals("ja") || language.equals("ko"))) {
keyValue = locale.getKeywordValue("lw");
if (keyValue != null && keyValue.equals("phrase")) {
typeKeyExt += "_" + keyValue;
diff --git a/icu4j/main/tests/core/src/com/ibm/icu/dev/test/rbbi/rbbitst.txt b/icu4j/main/tests/core/src/com/ibm/icu/dev/test/rbbi/rbbitst.txt
index 6f7555d2f..72bd15803 100644
--- a/icu4j/main/tests/core/src/com/ibm/icu/dev/test/rbbi/rbbitst.txt
+++ b/icu4j/main/tests/core/src/com/ibm/icu/dev/test/rbbi/rbbitst.txt
@@ -1981,6 +1981,58 @@ Bangkok)•</data>
#これらの連絡先はデバイスをロック解除しなくても表示され -> これらの▁連絡先は▁デバイスを▁ロック▁解除しなくても▁表示され
<data>•\u3053\u308C\u3089\u306E•\u9023\u7D61\u5148\u306F•\u30C7\u30D0\u30A4\u30B9\u3092•\u30ED\u30C3\u30AF•\u89E3\u9664\u3057\u306A\u304F\u3066\u3082•\u8868\u793A\u3055\u308C•</data>
+# Test the differences in ko with or without lw=phrase.
+<locale ko@lw=phrase>
+<line>
+#1948년 7월 12일에 제정되고 8차에 국민투표에 의하여 개정한다.
+<data>•1948년 •7월 •12일에 •제정되고 •8차에 •국민투표에 •의하여 •개정한다.•</data>
+#대한민국은 민주공화국이다.
+<data>•대한민국은 •민주공화국이다.•</data>
+#서울에서 부산까지 London까지
+<data>•서울에서 •부산까지 •London까지•</data>
+#LTE가 안 되면 WiFi를
+<data>•LTE가 •안 •되면 •WiFi를•</data>
+#<님의 침묵>을 읽고 느낀 점은?
+<data>•\u003c님의 •침묵\u003e을 •읽고 •느낀 •점은?•</data>
+# The following entry passes in ICU4C but fails in ICU4J for an unknown reason.
+#"님의 침묵"을 읽고
+#<data>•"님의 •침묵"을 •읽고•</data>
+# The following 3 lines are not handled properly, yet.
+#“님의 침묵”을 읽고
+#<data>•“님의 •침묵”을 •읽고•</data>
+#『님의 침묵』을 읽고
+#<data>•『님의 •침묵』을 •읽고•</data>
+#大韓民國은 民主共和國이다
+#<data>•大韓民國은 •民主•共和國이다•</data>
+# All the tests for ja@lw=phrase should also work in Korean.
+#[京都観光]時雨殿に行った。-> [京都•観光]•時雨•殿に•行った。•
+<data>•\uff3b\u4eac\u90fd•\u89b3\u5149\uff3d•\u6642\u96e8•\u6bbf\u306b•\u884c\u3063\u305f\u3002•</data>
+#9月に東京から友達が遊びに来た -> 9月に•東京から•友達が•遊びに•来た•
+<data>•\uff19\u6708\u306b•\u6771\u4eac\u304b\u3089•\u53cb\u9054\u304c•\u904a\u3073\u306b•\u6765\u305f•</data>
+
+<locale ko>
+<line>
+#1948년 7월 12일에 제정되고 8차에 국민투표에 의하여 개정한다.
+<data>•1948•년 •7•월 •12•일•에 •제•정•되•고 •8•차•에 •국•민•투•표•에 •의•하•여•개•정•한•다.•</data>
+#대한민국은 민주공화국이다.
+<data>•대•한•민•국•은 •민•주•공•화•국•이•다.•</data>
+#서울에서 부산까지 London까지
+<data>•서•울•에•서 •부•산•까•지 •London•까•지•</data>
+#LTE가 안 되면 WiFi를
+<data>•LTE•가 •안 •되•면 •WiFi•를•</data>
+#<님의 침묵>을 읽고 느낀 점은?
+<data>•\u003c•님•의 •침•묵•\u003e•을 •읽•고 •느•낀 •점•은?•</data>
+#"님의 침묵"을 읽고
+<data>•"님•의 •침•묵"을 •읽•고•</data>
+#“님의 침묵”을 읽고
+<data>•“님•의 •침•묵”을 •읽•고•</data>
+#『님의 침묵』을 읽고
+<data>•『님•의 •침•묵』•을 •읽•고•</data>
+#『foo bar』load
+<data>•『foo •bar』•load•</data>
+#《님의 침묵》을 읽고
+<data>•《님•의 •침•묵》•을 •읽•고•</data>
+
####################################################################################
#
# Test rule status values