JSR310 How to internationalize a pattern for a month-day?


I'm trying to format a date without a year (just day and month, e.g. 12.10).
DateTimeFormatter.ofLocalizedDate(FormatStyle.SHORT) still yields the year for me (12.10.20).
So I tried DateTimeFormatter.ofPattern("dd. MM"), but that obviously hardcodes the order and the dot, which won't make American users happy (they expect slashes and month first).
How can I internationalize a pattern? Is there some abstract syntax for separators etc.?

Well, as Ole pointed out there is no 100% satisfying solution using java.time only. But my library Time4J has found a solution based on the data of the CLDR repository (ICU4J also gives support) using the type AnnualDate (as replacement for MonthDay):
LocalDate yourLocalDate = ...;
MonthDay md = MonthDay.from(yourLocalDate);
AnnualDate ad = AnnualDate.from(md);
ChronoFormatter<AnnualDate> usStyle =
ChronoFormatter.ofStyle(DisplayMode.SHORT, Locale.US, AnnualDate.chronology());
ChronoFormatter<AnnualDate> germanStyle =
ChronoFormatter.ofStyle(DisplayMode.SHORT, Locale.GERMANY, AnnualDate.chronology());
System.out.println("US-format: " + usStyle.format(ad)); // US-format: 12/31
System.out.println("German: " + germanStyle.format(ad)); // German: 31.12.

I don’t think that a solution can be made that gives 100 % satisfactory results for all locales. Let’s give it a shot anyway.
Locale formattingLocale = Locale.getDefault(Locale.Category.FORMAT);
String formatPattern = DateTimeFormatterBuilder.getLocalizedDateTimePattern(
FormatStyle.SHORT, null, IsoChronology.INSTANCE, formattingLocale);
// If year comes first, remove it and all punctuation and space before and after it
formatPattern = formatPattern.replaceFirst("^\\W*[yu]+\\W*", "")
// If year comes last and is preceded by a space somewhere, break at the space
// (preserve any punctuation before the space)
.replaceFirst("\\s\\W*[yu]+\\W*$", "")
// Otherwise if year comes last, remove it and all punctuation and space before and after it
.replaceFirst("\\W*[yu]+\\W*$", "");
DateTimeFormatter monthDayFormatter
= DateTimeFormatter.ofPattern(formatPattern, formattingLocale);
For comparison I am printing a date both using the normal formatter with year from your question and using my prepared formatter.
LocalDate exampleDate = LocalDate.of(2020, Month.DECEMBER, 31);
System.out.format(formattingLocale, "%-11s %s%n",
exampleDate.format(DateTimeFormatter.ofLocalizedDate(FormatStyle.SHORT)),
exampleDate.format(monthDayFormatter));
Output in French locale (Locale.FRENCH):
31/12/2020 31/12
In Locale.GERMAN:
31.12.20 31.12
Edit: My German girl friend informs me that this is wrong. We should always write a dot after each of the two numbers because both are ordinal numbers. Meno Hochschild, the German author of the other answer, also produces 31.12. with two dots for German.
In Locale.US:
12/31/20 12/31
It might make American users happy. In Swedish (Locale.forLanguageTag("sv")):
2020-12-31 12-31
In a comment I mentioned Bulgarian (bg):
31.12.20 г. 31.12
As far as I have understood, “г.” (Cyrillic g and a dot) is an abbreviation of a word that means year, so when leaving out the year, we should probably leave this abbreviation out too. I’m in doubt whether we ought to include the dot after 12.
Finally Croatian (hr):
31. 12. 2020. 31. 12.
How the code works: We first ask DateTimeFormatterBuilder for the short date format pattern for the locale. I assume that this is the pattern that your formatter from the question is also using behind the scenes (haven’t checked). I then use different regular expressions to remove the year from different variants, see the comments in the code. Year may be represented by y or u, so I take both into account (in practice y is used). Now it’s trivial to build a new formatter from the modified pattern. For the Bulgarian: by default Java regular expressions don’t recognize Cyrillic letters as word characters (the documentation defines a word character as [a-zA-Z_0-9] unless the UNICODE_CHARACTER_CLASS flag is set), which is why г was removed too. From my point of view this is an error.
We were lucky, though, in our case it produces the result that I wanted.
If you’re happy with a 90 % solution, this would be my suggestion, and I hope you can modify it to any needs your users in some locale may have.
Link: Documentation of Java regular expressions (regex)
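To make the year-stripping step easier to try on its own, here is a minimal sketch that wraps the three replacements from the answer in a helper and applies it to a few hard-coded pattern strings. The class name MonthDayPatterns, the method name stripYear and the sample patterns are assumptions made for illustration; real patterns come from getLocalizedDateTimePattern and may differ between Java versions and locales.

```java
import java.time.LocalDate;
import java.time.Month;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

class MonthDayPatterns {

    /** Removes a leading or trailing year (pattern letter y or u) from a date pattern. */
    static String stripYear(String pattern) {
        return pattern
                // Year first: drop it with the punctuation/space around it
                .replaceFirst("^\\W*[yu]+\\W*", "")
                // Year last, preceded by a space: cut at the space, keep punctuation before it
                .replaceFirst("\\s\\W*[yu]+\\W*$", "")
                // Year last otherwise: drop it with the punctuation/space around it
                .replaceFirst("\\W*[yu]+\\W*$", "");
    }

    public static void main(String[] args) {
        // Hypothetical sample patterns, not read from any locale data
        String[] samples = { "M/d/yy", "dd.MM.yy", "y-MM-dd", "dd. MM. y." };
        LocalDate date = LocalDate.of(2020, Month.DECEMBER, 31);
        for (String sample : samples) {
            String stripped = stripYear(sample);
            DateTimeFormatter formatter = DateTimeFormatter.ofPattern(stripped, Locale.ROOT);
            System.out.println(sample + " -> " + stripped + " -> " + date.format(formatter));
        }
    }
}
```

The helper only handles a year at the start or end of the pattern, and it does not look inside quoted literal text, so a pattern with a year in the middle or with y or u inside quotes would need extra care.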

Related

When I format a LocalDate to dd.MMM.YYYY I get 01.Jan..2000 with two dots

I am trying to format LocalDate variables to dd.MMM.YYYY with:
DateTimeFormatter.ofPattern("dd.MMM.yyyy")
The problem is that more than half the time I get two dots. For example 01-01-2000 goes to 01.Jan..2000.
I know why I have this problem, because of the three Ms. When I use dd.MM.yyyy I get to 01.01.2000 without issue. The third M is the problem.
How can I fix this?

The cause of your problem is that the abbreviations for months are locale specific:
In some locales there is a dot (period) to indicate abbreviations [1]; Locale.CANADA for example. In others there isn't; Locale.ENGLISH for example.
In the locales where a dot indicates abbreviation, you may or may not find that there is a dot when the name of the month doesn't need abbreviating. For example the name of the month May is only three letters, so May. indicating that this is an abbreviation would be illogical.
There are various ways to deal with this, including:
1. Don't fix it. The output with doubled dots in some cases and not others is logically correct (by a certain logic) [2], even though it looks odd.
2. My preferred way would be to use a different output format. Don't use dot as a separator. Using dot characters as separators is ... unconventional ... and when you combine this with abbreviated month names, you get this awkward edge-case. Sure there are ways to deal with this, but consider that other people might then run into an equivalent problem if they need to parse your dates in their code-base.
3. Hard wire your DateTimeFormatter to an existing Locale where there are no dots in the abbreviated names. There is a theoretical risk that they may decide to change the abbreviations in a standard Locale. But I doubt that they would, because such a change is liable to break customer code which is implicitly locale dependent ... like yours would be.
4. Create a custom Locale and use that when creating the DateTimeFormatter.
5. Use DateTimeFormatterBuilder to create the formatter. To deal with the month, use appendText(TemporalField field, Map<Long,String> textLookup) with a lookup table that contains exactly the abbreviations that you want to use. Depending on how you "append" the other fields, your formatter can be totally or partially locale independent.
Of these, 2. and 5. are the most "correct", in my opinion. Ole's answer illustrates these options with code.
1 - See this article on American English grammar - When you need periods after abbreviations.
2 - The problem would be convincing people that "looks odd but is logical" is better than "looks nice but is illogical". I don't think you would win this argument ...
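The option of hard wiring the formatter to an existing Locale could look roughly like the sketch below. It assumes Locale.ENGLISH, which the answer names as a locale without dots in the abbreviations; the class name FixedLocaleFormatter is made up for the example.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

class FixedLocaleFormatter {

    // Locale.ENGLISH abbreviates months without a trailing dot (Jan, Feb, ...),
    // so the dots in the pattern are the only dots in the output.
    static final DateTimeFormatter FORMATTER =
            DateTimeFormatter.ofPattern("dd.MMM.yyyy", Locale.ENGLISH);

    static String format(LocalDate date) {
        return date.format(FORMATTER);
    }

    public static void main(String[] args) {
        System.out.println(format(LocalDate.of(2000, 1, 1))); // 01.Jan.2000
    }
}
```

The trade-off is the one described in the answer: the month names are now always English, whatever the user's locale is.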

Stephen C. has written an answer that covers your options really well. As a supplement, since I agree that options 2 and 5 are the most correct, I would like to spell those two out.
Option 2: Use a different format
Localized date formats for most available locales are built into Java. These are generally under-used. We can save ourselves a lot of trouble by relying on Java to know how to format dates for our audience and their locale. I am using German as an example because it’s one of those locales that consistently includes dots both between the parts of the date and for abbreviation. The following should work for your locale too even if it’s not German (if you substitute Locale.getDefault(Locale.Category.FORMAT) or your users’ locale).
private static final Locale LOCALE = Locale.GERMAN;
private static final DateTimeFormatter DATE_FORMATTER
= DateTimeFormatter.ofLocalizedDate(FormatStyle.MEDIUM)
.withLocale(LOCALE);
For demonstration I am formatting a day of each month of the current year:
LocalDate date = LocalDate.of(2021, Month.JANUARY, 16);
for (int i = 0; i < 12; i++) {
System.out.println(date.format(DATE_FORMATTER));
date = date.plusMonths(1).minusDays(1);
}
Output is:
16.01.2021
15.02.2021
14.03.2021
13.04.2021
12.05.2021
11.06.2021
10.07.2021
09.08.2021
08.09.2021
07.10.2021
06.11.2021
05.12.2021
For German locale we got numeric months here. Other locales may give other results, for example month abbreviations.
If you want a longer format that doesn’t use numeric months, specify for example FormatStyle.LONG instead of FormatStyle.MEDIUM:
private static final DateTimeFormatter DATE_FORMATTER
= DateTimeFormatter.ofLocalizedDate(FormatStyle.LONG)
.withLocale(LOCALE);
16. Januar 2021
15. Februar 2021
14. März 2021
13. April 2021
12. Mai 2021
11. Juni 2021
10. Juli 2021
9. August 2021
8. September 2021
7. Oktober 2021
6. November 2021
5. Dezember 2021
I suggest that your users would be happy with one of the above.
Option 5: DateTimeFormatterBuilder.appendText(TemporalField, Map<Long, String>)
If your users tell you that they don’t want the localized formats above and they do want your format with month abbreviations and single dots — it’s getting longer, but the result is beautiful and everyone will be happy.
private static final DateTimeFormatter DATE_FORMATTER = new DateTimeFormatterBuilder()
.appendPattern("dd.")
.appendText(ChronoField.MONTH_OF_YEAR, getMonthAbbreviations())
.appendPattern(".uuuu")
.toFormatter(LOCALE);
private static Map<Long, String> getMonthAbbreviations() {
return Arrays.stream(Month.values())
.collect(Collectors.toMap(m -> Long.valueOf(m.getValue()),
MyClass::getDisplayNameWithoutDot));
}
private static String getDisplayNameWithoutDot(Month m) {
return m.getDisplayName(TextStyle.SHORT, LOCALE)
.replaceFirst("\\.$", "");
}
Output from the same loop as above:
16.Jan.2021
15.Feb.2021
14.März.2021
13.Apr.2021
12.Mai.2021
11.Juni.2021
10.Juli.2021
09.Aug.2021
08.Sep.2021
07.Okt.2021
06.Nov.2021
05.Dez.2021
One dot each time. The central trick is to use Java’s month abbreviation and remove the dot from it if there is one (Jan. becomes Jan) and use it as-is if there is no dot (Mai stays Mai). My getDisplayNameWithoutDot method does this. I am in turn using this method to build the map that the two-arg appendText(TemporalField, Map<Long, String>) method requires and uses for formatting.
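If the formatter should not depend on locale data at all, the same two-arg appendText can be fed a hand-written map. The following is a minimal sketch under that assumption; the class name FixedAbbreviationFormatter and the English abbreviations in the array are my own choices for illustration, not taken from the answer.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.temporal.ChronoField;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

class FixedAbbreviationFormatter {

    // Hypothetical, hand-picked abbreviations; replace with whatever your users expect.
    private static final String[] ABBREVIATIONS = {
        "Jan", "Feb", "Mar", "Apr", "May", "Jun",
        "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"
    };

    static DateTimeFormatter build() {
        Map<Long, String> months = new HashMap<>();
        for (int i = 0; i < ABBREVIATIONS.length; i++) {
            months.put((long) (i + 1), ABBREVIATIONS[i]); // month numbers are 1-based
        }
        return new DateTimeFormatterBuilder()
                .appendValue(ChronoField.DAY_OF_MONTH, 2)   // two digits, zero padded
                .appendLiteral('.')
                .appendText(ChronoField.MONTH_OF_YEAR, months)
                .appendLiteral('.')
                .appendValue(ChronoField.YEAR, 4)
                .toFormatter(Locale.ROOT);
    }

    public static void main(String[] args) {
        DateTimeFormatter formatter = build();
        System.out.println(LocalDate.of(2021, 3, 14).format(formatter)); // 14.Mar.2021
        System.out.println(LocalDate.of(2021, 8, 9).format(formatter));  // 09.Aug.2021
    }
}
```

Because every field is appended with an explicit value or literal, nothing in the output changes when the default locale changes.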

Java8 equivalent of JodaTime DateTimeFormat.shortDate()

What is the Java8 java.time equivalent of
org.joda.time.format.DateTimeFormat.shortDate()
I've tried below way, but it fails to parse values such as "20/5/2016" or "20/5/16".
DateTimeFormatter.ofLocalizedDate(FormatStyle.SHORT)

You are correct: A Joda-Time DateTimeFormatter (which is the type you get from DateTimeFormat.shortDate()) parses more leniently than a java.time DateTimeFormatter. In the English/New Zealand locale (en-NZ) shortDate uses the format pattern d/MM/yy and parses both 20/5/2016 and 20/5/16 into 2016-05-20.
I frankly find it nasty that it interprets both two-digit and four-digit years into the same year. When the format specifies two-digit year, I would have expected four digits to be an error for stricter input validation. Accepting one-digit month when the format specifies two digits is lenient too, but maybe not so dangerous and more in line with what we might expect.
java.time too uses the format pattern d/MM/yy (tested on jdk-11.0.3). When parsing, it accepts one or two digits for day of month, but insists on two-digit month and two-digit year.
You may get the Joda-Time behaviour in java.time, but it requires you to specify the format pattern yourself:
Locale loc = Locale.forLanguageTag("en-NZ");
DateTimeFormatter dateFormatter
= DateTimeFormatter.ofPattern("d/M/[yyyy][yy]", loc);
System.out.println(LocalDate.parse("20/5/2016", dateFormatter));
System.out.println(LocalDate.parse("20/5/16", dateFormatter));
Output is:
2016-05-20
2016-05-20
If you want an advanced solution that works in other locales, I am sure that you can write a piece of code that gets the format pattern from DateTimeFormatterBuilder.getLocalizedDateTimePattern and modifies it by replacing dd with d, MM with M and any number of y with [yyyy][yy]. Then pass the modified format pattern string to DateTimeFormatter.ofPattern.
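A rough sketch of that idea follows. The class name LenientShortDate and its method names are hypothetical, and the rewriting is deliberately naive: it assumes a purely numeric short pattern that uses y for the year and contains no quoted literal text.

```java
import java.time.LocalDate;
import java.time.chrono.IsoChronology;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.format.FormatStyle;
import java.util.Locale;

class LenientShortDate {

    /** Rewrites a numeric date pattern so that it accepts 1-2 digit day/month and 2 or 4 digit year. */
    static String toLenientPattern(String pattern) {
        return pattern
                .replaceAll("(?<!d)dd(?!d)", "d")   // exactly dd -> d
                .replaceAll("(?<!M)MM(?!M)", "M")   // exactly MM -> M, leaves MMM text months alone
                .replaceAll("y+", "[yyyy][yy]");    // any run of y -> four-digit or two-digit year
    }

    /** Builds a lenient formatter from the locale's short date pattern. */
    static DateTimeFormatter forLocale(Locale locale) {
        String shortPattern = DateTimeFormatterBuilder.getLocalizedDateTimePattern(
                FormatStyle.SHORT, null, IsoChronology.INSTANCE, locale);
        return DateTimeFormatter.ofPattern(toLenientPattern(shortPattern), locale);
    }

    public static void main(String[] args) {
        // Pattern taken from the answer; what forLocale sees depends on the JDK's locale data.
        String lenient = toLenientPattern("d/MM/yy");
        DateTimeFormatter formatter = DateTimeFormatter.ofPattern(lenient, Locale.ROOT);
        System.out.println(lenient);                                  // d/M/[yyyy][yy]
        System.out.println(LocalDate.parse("20/5/2016", formatter));  // 2016-05-20
        System.out.println(LocalDate.parse("20/5/16", formatter));    // 2016-05-20
    }
}
```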
Edit: I’m glad that you got something to work. In your comment you said that you used:
Stream<String> shortFormPatterns = Stream.of(
"[d][dd]/[M][MM]",
"[d][dd]-[M][MM]",
"[d][dd].[M][MM]",
"[d][dd] [M][MM]",
"[d][dd]/[M][MM]/[yyyy][yy]",
"[d][dd]-[M][MM]-[yyyy][yy]",
"[d][dd].[M][MM].[yyyy][yy]",
"[d][dd] [M][MM] [yyyy][yy]");
It covers more cases than your Joda-Time formatter. Maybe that’s good. Specifically your Joda-Time formatter insists on a slash / between the numbers and rejects either hyphen, dot or space. Also I believe that Joda-Time would object to the year being left out completely.
While you do need [yyyy][yy], you don’t need [d][dd] nor [M][MM]. Just d and M suffice since they also accept two digits (what happens in your code is that for example [d] parses either one or two digits, so [dd] is never used anyway).
If you prefer only one format pattern string, I would expect d[/][-][.][ ]M[/][-][.][ ][yyyy][yy] to work (except in the cases where the year is omitted) (I haven’t tested).
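A small harness for trying that single pattern might look like this. The class name SeparatorTolerantParser is hypothetical, and Locale.ROOT is used only to keep the sketch independent of the default locale.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

class SeparatorTolerantParser {

    // Each separator is an optional section, so any one of / - . or space is accepted.
    static final DateTimeFormatter FORMATTER = DateTimeFormatter.ofPattern(
            "d[/][-][.][ ]M[/][-][.][ ][yyyy][yy]", Locale.ROOT);

    static LocalDate parse(String text) {
        return LocalDate.parse(text, FORMATTER);
    }

    public static void main(String[] args) {
        String[] inputs = { "20/5/2016", "20-5-16", "20.05.2016", "20 5 16" };
        for (String input : inputs) {
            System.out.println(input + " -> " + parse(input));
        }
    }
}
```

Since every separator section is optional on its own, the pattern is more permissive than it looks: mixed separators such as 20/5-2016 are accepted as well, which may or may not be what you want for input validation.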

FormatStyle.SHORT gives the shortest localized format, for example dd/MM/yy or d/M/yy depending on the locale, so you need to use a pattern to get a customized format.
LocalDate date = LocalDate.now();
System.out.println(date.format(DateTimeFormatter.ofLocalizedDate(FormatStyle.SHORT))); //9/29/19
You can also use DateTimeFormatter.ISO_DATE or DateTimeFormatter.ISO_LOCAL_DATE to get the iso format like yyyy-MM-dd, and also you can see the available formats in DateTimeFormatter
System.out.println(date.format(DateTimeFormatter.ISO_DATE)); //2019-09-29
System.out.println(date.format(DateTimeFormatter.ISO_LOCAL_DATE)); //2019-09-29
If you want a custom format like yyyy/MM/dd, then use ofPattern:
System.out.println(date.format(DateTimeFormatter.ofPattern("yyyy/MM/dd"))); //2019/09/29

Java regex to validate range of dates

I am having problems generating a regex for a range of dates.
For example, for the range [2015-11-17, 2017-10-05], how can I validate with a regex whether a date belongs to that range?
And a second question: is it possible to have a generic regex that I can use for several date ranges, replacing only a few values in the regex with the new range, so that it keeps validating a range of dates, but with the new bounds? Thanks in advance for your help =)

Do not use Regex
As the comments state, Regex is not appropriate for a range of dates, nor any span of time. Regex is intended to be “dumb” in the sense of looking only at the syntax of the text not the semantics (the meaning).
java.time
Use the java.time framework built into Java 8 and later.
Parse your strings into LocalDate objects.
LocalDate start = LocalDate.parse( "2015-11-17" );
Compare by calling the isEqual, isBefore, and isAfter methods.
Note that we commonly use the Half-Open approach in date-time work where the beginning is inclusive while the ending is exclusive.
These issues are covered already in many other Questions and Answers on Stack Overflow. So I have abbreviated my discussion here.
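As a minimal sketch of the comparison described above, using the range from the question: the class name DateRangeCheck and its two helper methods are made up for the example. One helper implements the Half-Open approach, the other the fully inclusive range that the question writes with square brackets.

```java
import java.time.LocalDate;

class DateRangeCheck {

    /** Half-Open: start is inclusive, end is exclusive. */
    static boolean isInHalfOpenRange(LocalDate date, LocalDate start, LocalDate end) {
        return !date.isBefore(start) && date.isBefore(end);
    }

    /** Closed: both start and end are inclusive. */
    static boolean isInClosedRange(LocalDate date, LocalDate start, LocalDate end) {
        return !date.isBefore(start) && !date.isAfter(end);
    }

    public static void main(String[] args) {
        LocalDate start = LocalDate.parse("2015-11-17");
        LocalDate end = LocalDate.parse("2017-10-05");

        System.out.println(isInClosedRange(LocalDate.parse("2016-02-29"), start, end)); // true
        System.out.println(isInClosedRange(end, start, end));                           // true
        System.out.println(isInHalfOpenRange(end, start, end));                         // false
        System.out.println(isInClosedRange(LocalDate.parse("2015-11-16"), start, end)); // false
    }
}
```

Changing the range only means passing different start and end values, which also answers the second part of the question without touching any pattern.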

Just for completeness: You can actually use regular expressions to recognize any finite set of strings, such as a specific date range, however it would be more of an academic exercise than an actual recommended usage. However, if you happen to be programming some arcane hardware it could actually be necessary.
Assuming the input is always a valid date in the given format, the regex for your example could consist of:
2015-11-1[7-9] - 2015 November 17 to 19
2015-11-[23][0-9] - 2015 November 20 to 30
2015-12.* - 2015 December
2016.* - all dates of 2016
Add analogously for 2017, make a disjunction using | (a|b|c|...), apply escaping of the regex implementation you use and then you have your date checker. If the input is not guaranteed to be a valid date it gets a bit more complicated but is still possible.
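Purely as an illustration of that academic exercise, here is one way the complete disjunction for the inclusive range [2015-11-17, 2017-10-05] could be written out in Java. The class name DateRangeRegex is hypothetical, and the sketch relies on the same assumption as the answer: the input is already a valid date in yyyy-MM-dd format.

```java
import java.util.regex.Pattern;

class DateRangeRegex {

    // One alternative per sub-range; valid only for input that is already a real yyyy-MM-dd date.
    static final Pattern IN_RANGE = Pattern.compile(
            "2015-11-1[7-9]"          // 2015 November 17 to 19
            + "|2015-11-[23][0-9]"    // 2015 November 20 to 30
            + "|2015-12-.*"           // 2015 December
            + "|2016-.*"              // all dates of 2016
            + "|2017-0[1-9]-.*"       // 2017 January to September
            + "|2017-10-0[1-5]");     // 2017 October 1 to 5

    static boolean matches(String isoDate) {
        return IN_RANGE.matcher(isoDate).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("2015-11-16")); // false
        System.out.println(matches("2015-11-17")); // true
        System.out.println(matches("2017-10-05")); // true
        System.out.println(matches("2017-10-06")); // false
    }
}
```

Every change of range requires rewriting all the alternatives by hand, which is a good reason to prefer a real date comparison.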

SimpleDateFormat: Inconsistent Pattern Letters

Recently I looked into the Documentation for SimpleDateFormat and noticed some inconsistencies (in my opinion) in how they handle the letters for parsing.
For example, look at these representations:
M: Month in year
D: Day in year
d: Day in month
"x in year" is a bigger timespan than "x in month" and has therefore uppercase letters so this makes perfect sense to me.
But then there is
w: Week in year
W: Week in month
Here, the letters are swapped, which is totally counter-intuitive in my opinion. It seems like these two should be the other way around, to conform to the "pattern" mentioned above.
Another example are the different hour-representations:
H: Hour in day (0-23)
k: Hour in day (1-24)
K: Hour in am/pm (0-11)
h: Hour in am/pm (1-12)
I kinda get the idea. Uppercase letters for hours starting with 0, lowercase letters for hours starting with 1.
Here, both lowercase letters should be swapped, because shouldn't the same letters belong to the same category? (H/h for hour in day, K/k for hour in am/pm)
So my question is this: Is there a reason behind this seemingly counter-intuitive representation?
The only reason i could think of is, that some of these pattern letters were added at a later time and they couldn't change the already existing ones, because of downwards compatibility. But other than that, it doesn't make much sense to me.

Citation:
"The only reason i could think of is, that some of these pattern
letters were added at a later time and they couldn't change the
already existing ones, because of downwards compatibility."
Your suspicion is correct. But you cannot (only) blame the Sun or Oracle designers for that. They just took over the whole thing from Taligent (now merged into IBM). And IBM itself is one of the leading companies behind the Unicode Consortium, which defined the CLDR standard. In that standard all these pattern symbols were defined (indeed in a totally inconsistent manner, only explainable by historic development).
Worse, the inconsistencies in CLDR don't stop: Recently we have got a NARROW variant in addition to SHORT, LONG etc. That means if you want the shortest possible representation of a month as a single letter, then you need to specify the pattern symbol MMMMM (5 letters because one letter M is already reserved for the numerical short form).
Another note: SimpleDateFormat does not even strictly follow CLDR. For example, Oracle defined the pattern symbol "u" as the ISO day number of the week (1 = Monday, ..., 7 = Sunday) in Java 7, although CLDR had already introduced the same symbol earlier as the proleptic ISO year. And Java 8 deviates again: it invents new symbols not known in CLDR but otherwise tries to follow CLDR more closely.
We have already remarkable differences using pattern languages (compare Java-6, Java-7, Java-8, pure CLDR and Joda-Time). And I fear this will never stop.
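To see two of these differences side by side, here is a small sketch. The class name PatternLetterDifferences and the sample date are arbitrary choices of mine; it formats the same day with the letter u in SimpleDateFormat and in java.time, and uses MMMMM for the narrow month in java.time.

```java
import java.text.SimpleDateFormat;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;

class PatternLetterDifferences {

    /** Monday, 4 January 2021, at noon in the default time zone. */
    static Date sampleDate() {
        return new GregorianCalendar(2021, Calendar.JANUARY, 4, 12, 0).getTime();
    }

    /** In SimpleDateFormat (Java 7+), u is the day number of the week, 1 = Monday. */
    static String legacyU(Date date) {
        return new SimpleDateFormat("u", Locale.ENGLISH).format(date);
    }

    /** In java.time, u is the year. */
    static String modernU(LocalDate date) {
        return DateTimeFormatter.ofPattern("u", Locale.ENGLISH).format(date);
    }

    /** In java.time, five M's select the narrow month text. */
    static String narrowMonth(LocalDate date) {
        return DateTimeFormatter.ofPattern("MMMMM", Locale.ENGLISH).format(date);
    }

    public static void main(String[] args) {
        LocalDate localDate = LocalDate.of(2021, 1, 4);
        System.out.println(legacyU(sampleDate()));   // 1
        System.out.println(modernU(localDate));      // 2021
        System.out.println(narrowMonth(localDate));  // J
    }
}
```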

Displaying HH:MM with SimpleDateFormat

I am having some problems when trying to format the time string I have created, I am trying to make it output only the time in HH:mm format by using the Date and Time conversion characters I found at this website
DateFormat time = new SimpleDateFormat("HH:mm R");
I get no problems without the "R" but then it outputs the entire date and time, defeating my goal.

Well, as R is not documented to mean anything but the documentation clearly says all other formatting characters "...from 'A' to 'Z' and from 'a' to 'z' are reserved...", all bets are off about what you'll get if you use R in the formatting string.
Remove the R. If your goal is to have an R at the end of the string, append it to the result. If not, consult the documentation to see what formatting character you should be using.

There is no "R" in the javadoc for SimpleDateFormat -- and it says that all unspecified letters are reserved. I don't know where your website got it...

You don't need 'R'. If you want 24 hour "military" time, just use capital 'HH' and be done with it. Delete the R and you should get what you want. If you don't, you're probably doing something else wrong in your code that you haven't posted.

According to the date/time formatting symbols for SimpleDateFormat, there is no "R". In fact, I get an IllegalArgumentException trying to use the "R" as you have it.
You can't use DateFormat time = new SimpleDateFormat("HH:mm R");.
However, printf recognizes a different set of symbols for formatting dates, and "R" is defined there:
'R' Time formatted for the 24-hour clock as "%tH:%tM"
The website you refer to does mention "R" -- under the printf section, not for SimpleDateFormat. It mentions to prepend t before all date/time format symbols.
Try
System.out.printf("%tR", new Date());
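For comparison, the two routes to the same HH:mm text could be sketched like this. The class name HourMinuteFormatting and the sample time are assumptions for the example; Locale.ROOT is passed only so that the digits do not depend on the default locale.

```java
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;

class HourMinuteFormatting {

    /** SimpleDateFormat: HH is hour in day (0-23), mm is minute; no R involved. */
    static String withSimpleDateFormat(Date date) {
        return new SimpleDateFormat("HH:mm", Locale.ROOT).format(date);
    }

    /** printf-style formatting: %tR is shorthand for %tH:%tM. */
    static String withPrintfStyle(Date date) {
        return String.format(Locale.ROOT, "%tR", date);
    }

    public static void main(String[] args) {
        // 13:05 on 4 January 2021 in the default time zone
        Date date = new GregorianCalendar(2021, Calendar.JANUARY, 4, 13, 5).getTime();
        System.out.println(withSimpleDateFormat(date)); // 13:05
        System.out.println(withPrintfStyle(date));      // 13:05
    }
}
```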
