-
-
Notifications
You must be signed in to change notification settings - Fork 824
ICU-23056 CLDR 48 alpha0 integration #3600
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
ICU-23056 CLDR 48 alpha0 integration #3600
Conversation
@DraganBesevic getting a crash e.g. in the clang-cpp17 tests:
Depending on what is going on here: Maybe skip this test with the logKnownIssue already covering Gannen formatting? |
That's weird. I wonder why adding a known issue would cause to crash in CPP 17. I will skip it as it is already a known issue |
I was able to reproduce it on an Ubuntu system at home. I am trying to figure out what the cause is. But it is very puzzling, I see no reason for this to fail with the CLDR update and not fail without it. Nothing in this PR can explain what is going on (just by looking at it). |
The failure happens in icu4c/source/i18n/smpdtfmt.cpp, line 1536, in this call:
The locale failing is Japanese with alternate calendar, 235 is Heisei I am not sure how to fix this, since it is not just a few entries added, the structure seems changed.
to a map:
|
Why it does not fail on non-Linux? On Linux the test results in a core-dump. For macOS that should be enabled, it does not happen by default: Also, it looks like the test suite (on Linux) recovers after a core dump and continues. |
For the ja.txt file the eras/abbreviated has 237 entries, but the eras/narrow has only 232 entries. This is very likely related to the new bugs filed by Dragan for the Japanese calendar that only happen with the new data. |
Per @pedberg-icu - Only the 5 most recent eras are the ones that differ from the abbreviated. It seems likely that those are not being copied for some reason, and if there were true that would explain why the formatting is not using the era code for those. It's not clear where in the integration this is happening. |
Yeah, it seems like inheritance is not working for the narrow Japanese eras using the new table format for the era names data. But I did run the ICU4C and ICU4J tests with data generated using the new table format, including with early eras deleted, and did not have these problems. I will try to take a look this weekend. |
The CLDR structure for eras is changing, allowing sparse eras, so that early (inaccurate) Japanese eras can be removed (and other changes endorsed by the design group). Peter's recent PR prepared the eras-loading code so that it can handle either the old or the new data structure. He generated the new structure to test with, but didn't add the new data in the same PR (because the generation pulled in a lot of unrelated stuff), and I guess he didn't see these errors with his generated data. I reviewed the new loading code, and it seemed fine. Maybe we overlooked something. I actually suspect that the formatting code goes directly into the array when it should use a getter -- there is a new minEras field, and the array omits those first entries. There are also conditions when future eras at the end are dropped. And, I don't know how loading vs. using abbreviated vs. wide vs. narrow symbols works. Maybe something expects the arrays to has the same lengths, but if there is no abbreviated symbol then the formatting code should pick the narrow one, or vice versa. |
Checklist
Alpha0 integration for CLDR 48
Known issues:
ICU-23182, (Formatting for Japanese calendars)
ICU-23183 (Time Zone round trip test)
ICU-23185 (hz and hv patterns)
ICU-23186 (Japanese calendars fails round trip unit tests)
ICU-23187 (TestPersonNames test fail for kk_Arab and shn)
- note, kk_Arab and shn are new locales not yet in ICU (until config file is updated)
ICU-23188 (Compact decimal format adds extra RLM mark for ar-EG)
ALLOW_MANY_COMMITS=true