DateTimeFormatter parsing - timezone names and daylight savings overlap times - java

For improving performance of some legacy code, I am considering a replacement of java.text.SimpleDateFormat by java.time.format.DateTimeFormatter.
Among the tasks performed is parsing date/time values that had been serialized using java.util.Date.toString. With SimpleDateFormat, it was possible to turn them back into the original timestamps (neglecting fractional seconds), however I am facing problems when attempting to do the same with DateTimeFormatter.
When formatting with either, my local timezone is indicated as CET or CEST, depending on whether daylight savings time is in effect for the time to be formatted. However it appears that at parsing time, both CET and CEST are treated the same by DateTimeFormatter.
This creates a problem with the overlap occurring at the end of daylight savings time. When formatting, 02:00:00 is created twice, for times one hour apart, but with CEST and CET timezone names - which is fine. But at parsing time, that difference can't be reclaimed.
Here is an example:
long msecPerHour = 3600000L;
long cet_dst_2016 = 1477778400000L;
DateTimeFormatter formatter =
    DateTimeFormatter.ofPattern("EEE MMM dd HH:mm:ss zzz yyyy", Locale.ENGLISH);
ZoneId timezone = ZoneId.of("Europe/Berlin");
for (int hours = 0; hours < 6; ++hours) {
    long time = cet_dst_2016 + msecPerHour * hours;
    String formatted = formatter.format(Instant.ofEpochMilli(time).atZone(timezone));
    long parsedTime = Instant.from(formatter.parse(formatted)).toEpochMilli();
    System.out.println(formatted + ", diff: " + (parsedTime - time));
}
which results in
Sun Oct 30 00:00:00 CEST 2016, diff: 0
Sun Oct 30 01:00:00 CEST 2016, diff: 0
Sun Oct 30 02:00:00 CEST 2016, diff: 0
Sun Oct 30 02:00:00 CET 2016, diff: -3600000
Sun Oct 30 03:00:00 CET 2016, diff: 0
Sun Oct 30 04:00:00 CET 2016, diff: 0
It shows that the second occurrence of 02:00:00, in spite of the different timezone name, was treated like the first one. So the result is effectively off by one hour.
Obviously the formatted string has all information available, and SimpleDateFormat parsing in fact honored it. Is it possible to roundtrip through formatting and parsing, using DateTimeFormatter, with the given pattern?

It is possible for a specific case:
// OFFSET_SECONDS is ChronoField.OFFSET_SECONDS (static import); ImmutableMap comes from Guava.
DateTimeFormatter formatter = new DateTimeFormatterBuilder()
    .appendPattern("EEE MMM dd HH:mm:ss ")
    .appendText(OFFSET_SECONDS, ImmutableMap.of(2L * 60 * 60, "CEST", 1L * 60 * 60, "CET"))
    .appendPattern(" yyyy")
    .toFormatter(Locale.ENGLISH);
This maps the exact offset to the expected text. Where this fails is when you need to deal with more than one time-zone.
To do the job properly requires a JDK change.
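For anyone who wants to try this without Guava, here is a minimal sketch of the same idea using only JDK classes. The class name OffsetTextRoundTrip and its helper methods are made up for illustration, and the two offsets are hard-coded for Europe/Berlin, so the limitation to a single time zone still applies.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.temporal.ChronoField;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

final class OffsetTextRoundTrip {

    // Hypothetical helper: same formatter as above, with a plain HashMap instead of Guava.
    static DateTimeFormatter buildFormatter() {
        Map<Long, String> offsetNames = new HashMap<>();
        offsetNames.put(2L * 60 * 60, "CEST"); // +02:00 in seconds
        offsetNames.put(1L * 60 * 60, "CET");  // +01:00 in seconds
        return new DateTimeFormatterBuilder()
                .appendPattern("EEE MMM dd HH:mm:ss ")
                .appendText(ChronoField.OFFSET_SECONDS, offsetNames)
                .appendPattern(" yyyy")
                .toFormatter(Locale.ENGLISH);
    }

    // Formats the instant in Europe/Berlin, parses the text back and
    // returns parsed minus original, in milliseconds.
    static long roundTripDiff(long epochMilli) {
        DateTimeFormatter formatter = buildFormatter();
        String text = formatter.format(
                Instant.ofEpochMilli(epochMilli).atZone(ZoneId.of("Europe/Berlin")));
        return Instant.from(formatter.parse(text)).toEpochMilli() - epochMilli;
    }

    public static void main(String[] args) {
        long start = 1477778400000L; // same starting value as in the question
        for (int hours = 0; hours < 6; ++hours) {
            long time = start + 3600000L * hours;
            System.out.println(time + ", diff: " + roundTripDiff(time));
        }
    }
}
```

Because the text now stands for the offset rather than for a zone, the parser has nothing to disambiguate, which is why the second 02:00:00 should come back correctly here.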

It seems like a bug. I tested in Java 17 and it's still the same behaviour. I dug into the parsing logic and I can see why this happens.
One of the first things that happens is TimeZoneNameUtility.getZoneStrings(locale) is called. This gives you a 2D array of Strings
[
    [
        "Europe/Paris",
        "Central European Standard Time", "CET",
        "Central European Summer Time", "CEST",
        "Central European Time", "CET"
    ],
    // others
]
It builds a prefix tree out of them. All items in here get mapped to the 0th item - "Europe/Paris". When it's parsing, it descends the prefix tree one character at a time e.g. C... E... T..., then returns a match if there was one. Since CEST and CET map to the same thing, they're effectively just aliases of one another.
Later on that string is passed to ZoneId.of() which means the fact of whether it's summertime or not has been thrown away.
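To make the consequence concrete: once only the zone ID is left, the local time 02:00 on that day has to be resolved against the zone rules, and java.time picks the earlier (summer) offset at an overlap. A small sketch of that resolution, independent of any formatter (the class name OverlapResolution is made up for illustration):

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;

final class OverlapResolution {

    static final ZoneId BERLIN = ZoneId.of("Europe/Berlin");

    // 02:00 on 30 October 2016 occurs twice in Berlin.
    // ZonedDateTime.of resolves the ambiguity to the earlier offset.
    static ZonedDateTime atOverlap() {
        return ZonedDateTime.of(LocalDateTime.of(2016, 10, 30, 2, 0), BERLIN);
    }

    public static void main(String[] args) {
        ZonedDateTime earlier = atOverlap();
        ZonedDateTime later = earlier.withLaterOffsetAtOverlap();
        System.out.println(earlier); // 2016-10-30T02:00+02:00[Europe/Berlin]
        System.out.println(later);   // 2016-10-30T02:00+01:00[Europe/Berlin]
    }
}
```

The two values are one hour apart on the time line, which matches the -3600000 in the question.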
It does seem in Java 18 that there have been significant changes in this code, so maybe they're addressing that.

The general workaround
JodaStephen, the main author of java.time, in his answer shows a workaround for the case of CET and CEST (Central European Time and Central European Summer Time). I present a workaround that I believe will work in all time zones having different abbreviations for standard time and summer time (DST).
public static ZonedDateTime parse(String text) {
    ZonedDateTime result = ZonedDateTime.parse(text, FORMATTER);
    if (result.format(FORMATTER).equals(text)) {
        return result;
    }
    // By default we get the earlier offset at an overlap,
    // so if it didn’t work, try the later offset
    result = result.withLaterOffsetAtOverlap();
    if (result.format(FORMATTER).equals(text)) {
        return result;
    }
    // As a last desperate attempt, try earlier offset explicitly
    result = result.withEarlierOffsetAtOverlap();
    if (result.format(FORMATTER).equals(text)) {
        return result;
    }
    // Give up
    throw new IllegalArgumentException();
}
The method could use any formatter with a time zone name or abbreviation as long as it’s supposed to give the same output from formatting as the input it parses (so optional parts are a no-no, for example). I have assumed a formatter equivalent to yours:
private static final DateTimeFormatter FORMATTER
= DateTimeFormatter.ofPattern("EEE MMM dd HH:mm:ss zzz yyyy", Locale.ROOT);
Your trouble was with a millisecond value of 1 477 789 200 000, which was formatted into Sun Oct 30 02:00:00 CET 2016 and then parsed to 1 477 785 600 000 for a difference of -3 600 000 milliseconds. So let’s try my method with that one.
private static final ZoneId TIME_ZONE = ZoneId.of("Europe/Berlin");
long trouble = 1_477_789_200_000L;
String formatted = Instant.ofEpochMilli(trouble).atZone(TIME_ZONE).format(FORMATTER);
ZonedDateTime zdt = parse(formatted);
long parsedTime = zdt.toInstant().toEpochMilli();
System.out.println(formatted + ", diff: " + (parsedTime - trouble));
Output is:
Sun Oct 30 02:00:00 CET 2016, diff: 0
But don’t parse three letter time zone abbreviations
All of the above said, even with a workaround for the fall overlap, you are on shaky ground when trying to parse time zone abbreviations. Most of the common ones are ambiguous, and you don’t know what you get from parsing. CET and CEST are abbreviations shared by very many European time zones that at present use offset +01:00 during standard time and +02:00 during summer time, but historically each had its own offset, and they may go separate ways again: the European Parliament voted in 2019 to end seasonal clock changes, although that has not been put into effect. If it ever is, one time zone may use CET all year and another CEST all year. My code above does not account for that.
Instead simply take the output from ZonedDateTime.toString and parse it back using the one-arg ZonedDateTime.parse(CharSequence).
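A minimal sketch of that round trip, using the troublesome value from the question. The class and method names are made up for illustration. The ISO text carries both the offset and the zone ID, so the parser should have what it needs at the overlap, but it is worth running this on the JDK you deploy to.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZonedDateTime;

final class IsoRoundTrip {

    // Formats with ZonedDateTime.toString, parses with the one-arg parse,
    // and returns parsed minus original, in milliseconds.
    static long roundTripDiff(long epochMilli, ZoneId zone) {
        ZonedDateTime original = Instant.ofEpochMilli(epochMilli).atZone(zone);
        String text = original.toString(); // e.g. 2016-10-30T02:00+01:00[Europe/Berlin]
        ZonedDateTime parsed = ZonedDateTime.parse(text);
        return parsed.toInstant().toEpochMilli() - epochMilli;
    }

    public static void main(String[] args) {
        long trouble = 1_477_789_200_000L;
        System.out.println("diff: " + roundTripDiff(trouble, ZoneId.of("Europe/Berlin")));
    }
}
```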

Related

Parsing a ZonedDateTime around the Daylight saving transition fails [duplicate]

Future dates in java calendar are giving a strange behaviour

I have an application in which I create dates that a user can select for an appointment. If a user starts work at 9 and an appointment takes 2 hours, I create dates at 9, 11, 13... up to a limit, of course. Then I move to the next day and start again.
This is the code for doing this:
public List<Agenda> createListOfDates(Calendar initial, Calendar end,
        int appointmentDuration, int lunchTimeDuration, int lunchTimeStart) {
    List<Agenda> agendaList = new ArrayList<Agenda>();
    Agenda agenda = new Agenda();
    agenda.setWorkingHour(initial.getTime());
    agendaList.add(agenda);
    while (true) {
        initial.add(Calendar.HOUR_OF_DAY, appointmentDuration);
        // Logger.error("" + initial.getTime());
        if (initial.getTime().after(end.getTime())) {
            break;
        } else if (initial.get(Calendar.HOUR_OF_DAY) == lunchTimeStart
                && initial.get(Calendar.DAY_OF_WEEK) != Calendar.SATURDAY
        ) {
            initial.add(Calendar.HOUR_OF_DAY, lunchTimeDuration);
            agenda = new Agenda();
            agenda.setWorkingHour(initial.getTime());
            agendaList.add(agenda);
        } else {
            agenda = new Agenda();
            agenda.setWorkingHour(initial.getTime());
            agendaList.add(agenda);
        }
    }
    for (Agenda agendaX : agendaList) {
        Logger.info("" + agendaX.getWorkingHour());
    }
    return agendaList;
}
I am working with the "America/Sao_Paulo" timezone to create these dates. I set the variables "initial" and "end" as "America/Sao_Paulo". My system timezone is "GMT", and that is OK, because I want to save these dates in GMT in the database. When I print the dates in the last "for" loop, they are magically already converted from "America/Sao_Paulo" to "GMT" and print correctly. The strange thing is that from a certain date on, the time zone changes. Example output:
Sat Mar 30 12:00:00 GMT 2019
Sat Mar 30 14:00:00 GMT 2019
Sat Mar 30 16:00:00 GMT 2019
Sat Mar 30 18:00:00 GMT 2019
Mon Apr 01 13:00:00 BST 2019
Mon Apr 01 15:00:00 BST 2019
Mon Apr 01 18:00:00 BST 2019
Mon Apr 01 20:00:00 BST 2019
Mon Apr 01 22:00:00 BST 2019
While it is in GMT, it is right, but I can't understand this BST. Can it be because it's too far in the future? It always starts in April.
Your system time zone isn’t GMT, it’s Europe/London (or something similar). In March London time coincides with GMT. Not in April. That’s why.
getWorkingHour() returns an instance of Date (another poorly designed and long outdated class, but let that be a different story for now). When you append it to the empty string, Date.toString is implicitly called and builds the string using your system time zone. During standard time it prints GMT as time zone abbreviation. Summer time (DST) begins in London on the last Sunday of March, in this case March 31. So in April Date.toString on your JVM uses British Summer Time and its abbreviation, BST for printing the time.
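To see the switch without depending on the JVM default, you can ask the Europe/London rules directly for the offset at two of the instants from the question. This is only a sketch. The class name LondonOffsets is made up, and the two instants are the UTC equivalents of the printed Mar 30 12:00 GMT and Apr 01 13:00 BST lines.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;

final class LondonOffsets {

    // Offset that Europe/London uses at the given instant (ISO-8601 text ending in Z).
    static ZoneOffset londonOffsetAt(String isoInstant) {
        return ZoneId.of("Europe/London").getRules().getOffset(Instant.parse(isoInstant));
    }

    public static void main(String[] args) {
        System.out.println(londonOffsetAt("2019-03-30T12:00:00Z")); // Z, printed as GMT by Date.toString
        System.out.println(londonOffsetAt("2019-04-01T12:00:00Z")); // +01:00, printed as BST
    }
}
```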
The good solution involves two changes:
1. Don’t rely on the JVM’s default time zone. It can be changed at any time from another part of your program or another program running in the same JVM, so is too fragile. Instead give explicit time zone to your date-time operations.
2. Skip the old date-time classes Calendar and Date and instead use java.time, the modern Java date and time API. It is so much nicer to work with and gives much clearer code, not least when it comes to conversions between time zones.
Instead of Calendar use ZonedDateTime. Depending on the capabilities of your JDBC driver, convert it to either Instant or OffsetDateTime in UTC for saving to the database.
To create a ZonedDateTime, one option is to use one of its of methods (there are several):
ZonedDateTime initial = ZonedDateTime.of(2019, 3, 10, 9, 0, 0, 0, ZoneId.of("America/Sao_Paulo"));
This creates a date-time of March 10, 2019 at 09:00 in São Paulo. To add 2 hours to it:
int appointmentDuration = 2;
ZonedDateTime current = initial.plusHours(appointmentDuration);
System.out.println(current);
Output:
2019-03-10T11:00-03:00[America/Sao_Paulo]
To convert to an Instant for your database:
Instant inst = current.toInstant();
System.out.println(inst);
Output:
2019-03-10T14:00:00Z
Instants are time zone neutral, just a point in time, but print in UTC. Some JDBC drivers accept them for UTC times. If yours doesn’t happen to, you will need to give it an OffsetDateTime instead. Convert like this:
OffsetDateTime odt = current.toOffsetDateTime().withOffsetSameInstant(ZoneOffset.UTC);
System.out.println(odt);
Output:
2019-03-10T14:00Z
Note that I give UTC explicitly rather than relying on the JVM default. So this is explicitly in UTC. You notice that the date and time agree with what was printed from the Instant.
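Putting the pieces together, the slot generation from the question could look roughly like this in java.time. It is a simplified sketch, not a drop-in replacement: the class AgendaSlots and its method are made up, the lunch break and Saturday handling are left out, and it returns Instant values instead of Agenda objects.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.util.ArrayList;
import java.util.List;

final class AgendaSlots {

    // Hypothetical simplification of createListOfDates: one slot every
    // appointmentHours (must be positive) from first up to and including last.
    static List<Instant> slots(ZonedDateTime first, ZonedDateTime last, int appointmentHours) {
        List<Instant> result = new ArrayList<>();
        for (ZonedDateTime t = first; !t.isAfter(last); t = t.plusHours(appointmentHours)) {
            result.add(t.toInstant()); // time zone neutral, ready for the database
        }
        return result;
    }

    public static void main(String[] args) {
        ZoneId zone = ZoneId.of("America/Sao_Paulo");
        ZonedDateTime first = ZonedDateTime.of(2019, 3, 10, 9, 0, 0, 0, zone);
        ZonedDateTime last = ZonedDateTime.of(2019, 3, 10, 17, 0, 0, 0, zone);
        // Instant.toString always prints in UTC, whatever the JVM default zone is.
        slots(first, last, 2).forEach(System.out::println);
    }
}
```

The loop does its arithmetic on the ZonedDateTime, so the hours are counted in São Paulo time, and the conversion to UTC happens only at the very end.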

DateFormat returning wrong hour

I am trying to parse the string "20/08/18 13:21:00:428" using the DateFormat class and a formatting pattern of "dd/MM/yy' 'HH:mm:ss:SSS". The Timezone is set to BST.
The date returned for the above is correct but the time is getting returned as 08 for the hours instead of 13 - "Mon Aug 20 08:21:00 BST 2018"
The following snippet prints the date and time just mentioned:
String toBeParsed = "20/08/18 13:21:00:428";
DateFormat format = new SimpleDateFormat("dd/MM/yy' 'HH:mm:ss:SSS");
format.setTimeZone(TimeZone.getTimeZone("BST"));
Date parsedDate = format.parse(toBeParsed);
System.out.println(parsedDate);
Is this something to do with my timezone or have I misunderstood the pattern?
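BST is Bangladesh Standard Time. The correct time zone to use is "Europe/London" if you want automatic summer time. If you want British Summer Time always, use a fixed offset: "GMT+1" with TimeZone.getTimeZone (which does not understand "UTC+1" and silently falls back to GMT), or "UTC+1" with ZoneId.of.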
See https://docs.oracle.com/javase/8/docs/api/java/time/ZoneId.html#SHORT_IDS
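A quick way to check what an ID really resolves to before using it. This is just a sketch with made-up names (BstLookup and its two methods). It uses only ZoneId.SHORT_IDS and TimeZone.getTimeZone.

```java
import java.time.ZoneId;
import java.util.TimeZone;

final class BstLookup {

    // Region ID that java.time documents for a short ID, or null if there is none.
    static String shortIdTarget(String id) {
        return ZoneId.SHORT_IDS.get(id);
    }

    // Raw (standard time) offset in milliseconds as the legacy TimeZone class sees the ID.
    // Unknown IDs do not throw, they come back as GMT with offset 0.
    static int legacyRawOffsetMillis(String id) {
        return TimeZone.getTimeZone(id).getRawOffset();
    }

    public static void main(String[] args) {
        System.out.println("BST -> " + shortIdTarget("BST"));
        System.out.println("BST raw offset ms: " + legacyRawOffsetMillis("BST"));
        System.out.println("UTC+1 raw offset ms: " + legacyRawOffsetMillis("UTC+1"));
        System.out.println("GMT+1 raw offset ms: " + legacyRawOffsetMillis("GMT+1"));
    }
}
```

The silent fallback to GMT for IDs that TimeZone does not understand is the reason to prefer ZoneId.of, which throws an exception for an unknown ID.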
java.time
DateTimeFormatter formatter = DateTimeFormatter.ofPattern("dd/MM/uu H:mm:ss:SSS");
String toBeParsed = "20/08/18 13:21:00:428";
ZonedDateTime dateTime = LocalDateTime.parse(toBeParsed, formatter)
.atZone(ZoneId.of("Europe/London"));
System.out.println(dateTime);
Output from this snippet is:
2018-08-20T13:21:00.428+01:00[Europe/London]
What went wrong in your code?
While I always recommend against the long outdated and poorly designed classes Date, TimeZone and DateFormat, in this case they are behaving particularly confusingly. Printing a Date on a JVM with Europe/London as default time zone gives time zone as BST if the date is in the summer time part of the year:
TimeZone.setDefault(TimeZone.getTimeZone("Europe/London"));
Date oldFashionedDate = new Date();
System.out.println(oldFashionedDate);
Mon Aug 20 15:45:39 BST 2018
However, when I give time zone as BST, Bangladesh time is understood, but it comes out with the non-standard abbreviation BDT:
TimeZone.setDefault(TimeZone.getTimeZone("BST"));
System.out.println(oldFashionedDate);
Mon Aug 20 20:45:39 BDT 2018
(I have observed this behaviour on Java 8 and Java 10.)
Another lesson to learn is never to rely on three and four letter time zone abbreviations. They are ambiguous and not standardized.
BST may mean Brazil Summer Time or Brazilian Summer Time, Bangladesh Standard Time, Bougainville Standard Time or British Summer Time (note that S is sometimes for Standard, sometimes for Summer, which is typically the opposite of Standard Time).
BDT may mean Brunei Darussalam Time or British Daylight Time (another name for British Summer Time (BST)), but I wasn’t aware that Bangladesh Time was also sometimes abbreviated this way.
PS Thanks to DodgyCodeException for spotting the time zone abbreviation interpretation issue.
Link
Time Zone Abbreviations — Worldwide List

Parse Date with time zone from GWT

I have read many similar SO questions on this very topic, but I am still confused about how to do it... I'll try to keep it simple:
I got this String: "Sat Jan 24 00:00:00 GMT+100 2015"
which shall not be modified in any way
My question now is: What kind of pattern shall I use to parse this String into a java.util.Date? I tried: "EEE MMM dd HH:mm:ss z yyyy" but it fails with "unparsable Date"
What I know is: If I'd have "Sat Jan 24 00:00:00 GMT+1:00 2015", which is the same (right?), then my pattern works. But I can't (want..) modify it.
--> Is there a pattern which works out of the box, yes or no?
PS: I assume this question ends as duplicate of one of all the others, but if you vote so, please answer my bold question in addition, as I could not read it out of there with certainty
regards and thanks in advance
"Hours must be between 0 to 23 and Minutes must be between 00 to 59. For example, "GMT+10" and "GMT+0010" mean ten hours and ten minutes ahead of GMT, respectively."
That is from the Java TimeZone documentation, http://docs.oracle.com/javase/7/docs/api/java/util/TimeZone.html, and this is what I found there about three-letter time zone IDs:
For compatibility with JDK 1.1.x, some other three-letter time zone IDs (such as "PST", "CTT", "AST") are also supported. However, their use is deprecated because the same abbreviation is often used for multiple time zones (for example, "CST" could be U.S. "Central Standard Time" and "China Standard Time"), and the Java platform can then only recognize one of them.
So I do not think there is any out-of-the-box pattern that works for you. Check the other answer.
I don't think there is a pattern that can match your timezone / offset GMT+100. You could amend the input before parsing:
String input = "Sat Jan 24 00:00:00 GMT+100 2015";
input = input.replaceAll("([+-])(\\d+?)(\\d{2})", "$1$2:$3");
String pattern = "EEE MMM dd HH:mm:ss z yyyy";
Date date = new SimpleDateFormat(pattern, Locale.ENGLISH).parse(input);
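One caveat if you reuse this: the lazy \\d+? splits after the first digit, so a four-digit offset such as GMT+1030 would not come out as GMT+10:30. If such inputs can occur, a slightly stricter expression may be safer. The following is a sketch under that assumption; the class GwtOffsetNormalizer and its methods are made up for illustration.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Locale;

final class GwtOffsetNormalizer {

    // Inserts a colon before the last two digits of a 3 or 4 digit offset:
    // GMT+100 -> GMT+1:00, GMT+1030 -> GMT+10:30. Other text is left alone.
    static String normalize(String input) {
        return input.replaceAll("([+-])(\\d{1,2})(\\d{2})\\b", "$1$2:$3");
    }

    // Parses the normalized text and returns milliseconds since the epoch.
    static long parseMillis(String input) {
        String pattern = "EEE MMM dd HH:mm:ss z yyyy";
        try {
            return new SimpleDateFormat(pattern, Locale.ENGLISH).parse(normalize(input)).getTime();
        } catch (ParseException e) {
            throw new IllegalArgumentException("Unparseable date: " + input, e);
        }
    }

    public static void main(String[] args) {
        String input = "Sat Jan 24 00:00:00 GMT+100 2015";
        System.out.println(normalize(input));
        System.out.println(parseMillis(input));
    }
}
```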

Awkward Java Date creation behaviour

I've just come upon a very strange behaviour of Java's Date class when I try to create two dates consecutively:
Date startDate = new Date(1282863600000L);
System.out.println(startDate);
Date endDate = new Date(1321919999000L);
System.out.println(endDate);
The output is respectively:
Fri Aug 27 00:00:00 BST 2010
Mon Nov 21 23:59:59 GMT 2011
Has anyone seen something like that? Both dates are initialized in an identical manner, but when printed the first is shown in BST and the latter in GMT?
I tried to find explanation about that but I didn't. Can someone help me?
Thanks in advance!
This is documented behaviour.
From Date.toString():
Converts this Date object to a String of the form:
dow mon dd hh:mm:ss zzz yyyy
zzz is the time zone (and may reflect daylight saving time). Standard time zone abbreviations include those recognized by the method parse. If time zone information is not available, then zzz is empty - that is, it consists of no characters at all.
Your JVM's default time zone is one that uses British Summer Time, and you are creating a date to which a daylight saving rule applies. This would be the expected form of the date at that time to a local user.
For me the output of this code is
Fri Aug 27 01:00:00 CEST 2010
Tue Nov 22 00:59:59 CET 2011
The exact result depends on the default time zone Java is using on your system.
The difference is that CEST is Central European Summer Time, while CET is Central European Time (i.e. not summer time).
You seem to be running with a British time zone (Europe/London or similar), so your output shows British Summer Time and Greenwich Mean Time respectively.
The first date you specify falls into the respective summer times and the second doesn't. So Java chooses the appropriate abbreviation for each time zone/time combination.
After a lovely session of trying different long values I got this:
Date startDate1 = new Date(1284245999999L);
Date startDate2 = new Date(1284246000000L);
System.out.println(startDate1);
System.out.println(startDate2);
Date endDate = new Date(1321919999000L);
System.out.println(endDate);
The output was:
Sun Sep 12 01:59:59 IDT 2010
Sun Sep 12 01:00:00 IST 2010 <-- Long value is greater, but due to DST changes, actual time is one hour earlier
Tue Nov 22 01:59:59 IST 2011
Note that incrementing the long by 1 from 1284245999999L to 1284246000000L takes us "back in time" on the wall clock because of the transition from daylight saving time (IDT) back to standard time (IST).
That is how Java time calculation behaves - the number of milliseconds since 1/1/1970 does not change, but the time it represents is based on the timezone.
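The same point can be shown without touching the JVM default by converting the epoch values in explicitly named zones. This is only a sketch: the class SameInstantDifferentClock is made up, and Asia/Jerusalem is my assumption for the zone behind the IDT/IST output above.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZonedDateTime;

final class SameInstantDifferentClock {

    // Wall-clock time and offset that the given zone shows for an epoch value.
    static String localTimeWithOffset(long epochMilli, String zoneId) {
        ZonedDateTime zdt = Instant.ofEpochMilli(epochMilli).atZone(ZoneId.of(zoneId));
        return zdt.toLocalTime() + " " + zdt.getOffset();
    }

    public static void main(String[] args) {
        // One millisecond apart, but on either side of the end of summer time in Israel
        System.out.println(localTimeWithOffset(1284245999999L, "Asia/Jerusalem")); // 01:59:59.999 +03:00
        System.out.println(localTimeWithOffset(1284246000000L, "Asia/Jerusalem")); // 01:00 +02:00
        // The two values from the question, as a London clock shows them
        System.out.println(localTimeWithOffset(1282863600000L, "Europe/London")); // 00:00 +01:00
        System.out.println(localTimeWithOffset(1321919999000L, "Europe/London")); // 23:59:59 Z
    }
}
```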
