SimpleDateFormat leniency leads to unexpected behavior

I have found that SimpleDateFormat::parse(String source) is, unfortunately, lenient by default: setLenient(true).
By default, parsing is lenient: If the input is not in the form used by this object's format method but can still be parsed as a date, then the parse succeeds.
If I set leniency to false, the documentation says that with strict parsing, inputs must match this object's format. I used parsing with SimpleDateFormat in non-lenient mode and, by mistake, had a typo in the date (the letter o instead of the digit 0). Here is the brief working code:
// PASSED (year 199)
SimpleDateFormat simpleDateFormat = new SimpleDateFormat("dd.mm.yyyy");
System.out.println(simpleDateFormat.parse("03.12.199o"));
simpleDateFormat.setLenient(false);
System.out.println(simpleDateFormat.parse("03.12.199o")); //WTF?
To my surprise, this passed and no ParseException was thrown. Let's go further:
// PASSED (year 1990)
String string = "just a String to mess with SimpleDateFormat";
SimpleDateFormat simpleDateFormat = new SimpleDateFormat("dd.mm.yyyy");
System.out.println(simpleDateFormat.parse("03.12.1990" + string));
simpleDateFormat.setLenient(false);
System.out.println(simpleDateFormat.parse("03.12.1990" + string));
Let's go on:
// FAILED on the 2nd line
SimpleDateFormat simpleDateFormat = new SimpleDateFormat("dd.mm.yyyy");
System.out.println(simpleDateFormat.parse("o3.12.1990"));
simpleDateFormat.setLenient(false);
System.out.println(simpleDateFormat.parse("o3.12.1990"));
Finally, the exception is thrown: Unparseable date: "o3.12.1990". I wonder what difference leniency actually makes, and why the last line of my first code snippet did not throw an exception. The documentation says:
With strict parsing, inputs must match this object's format.
My input clearly doesn't strictly match the format - I expect this parsing to be really strict. Why does this (not) happen?

Why does this (not) happen?
It’s not very well explained in the documentation.
With lenient parsing, the parser may use heuristics to interpret
inputs that do not precisely match this object's format. With strict
parsing, inputs must match this object's format.
The documentation does help a bit, though, by mentioning that it is the Calendar object that the DateFormat uses that is lenient. That Calendar object is not used for the parsing itself, but for interpreting the parsed values into a date and time (I am quoting DateFormat documentation since SimpleDateFormat is a subclass of DateFormat).
SimpleDateFormat, whether lenient or not, will accept a 3-digit year, for example 199, even though you have specified yyyy in the format pattern string. The documentation says about year:
For parsing, if the number of pattern letters is more than 2, the year
is interpreted literally, regardless of the number of digits. So using
the pattern "MM/dd/yyyy", "01/11/12" parses to Jan 11, 12 A.D.
DateFormat, no matter if lenient or not, accepts and ignores text after the parsed text, like the small letter o in your first example. It objects to unexpected text before or inside the text, as when in your last example you put the letter o in front. The documentation of DateFormat.parse says:
The method may not use the entire text of the given string.
As I indirectly said, leniency makes a difference when interpreting the parsed values into a date and time. So a lenient SimpleDateFormat will interpret 29.02.2019 as 01.03.2019 because there are only 28 days in February 2019. A strict SimpleDateFormat will refuse to do that and will throw an exception. The default lenient behaviour can lead to very surprising and downright inexplicable results. As a simple example, giving the day, month and year in the wrong order: 1990.03.12 will result in August 11 year 17 AD (2001 years ago).
The solution
VGR already mentioned LocalDate from java.time, the modern Java date and time API, in a comment. In my experience java.time is so much nicer to work with than the old date and time classes, so let’s give it a shot. Try a correct date string first:
DateTimeFormatter dateFormatter = DateTimeFormatter.ofPattern("dd.mm.yyyy");
System.out.println(LocalDate.parse("03.12.1990", dateFormatter));
We get:
java.time.format.DateTimeParseException: Text '03.12.1990' could not
be parsed: Unable to obtain LocalDate from TemporalAccessor:
{Year=1990, DayOfMonth=3, MinuteOfHour=12},ISO of type
java.time.format.Parsed
This is because I used your format pattern string of dd.mm.yyyy, where lowercase mm means minute. When we read the error message closely enough, it does state that the DateTimeFormatter interpreted 12 as minute of hour, which was not what we intended. While SimpleDateFormat tacitly accepted this (even when strict), java.time is more helpful in pointing out our mistake. What the message only indirectly says is that it is missing a month value. We need to use uppercase MM for month. At the same time I am trying your date string with the typo:
DateTimeFormatter dateFormatter = DateTimeFormatter.ofPattern("dd.MM.yyyy");
System.out.println(LocalDate.parse("03.12.199o", dateFormatter));
We get:
java.time.format.DateTimeParseException: Text '03.12.199o' could not
be parsed at index 6
Index 6 is where it says 199. It objects because we specified 4 digits and are supplying only 3. The docs say:
The count of letters determines the minimum field width …
It would also object to unparsed text after the date. In short it seems to me that it gives you everything that you had expected.
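To try the cases from the question in one go, here is a minimal sketch. The class name StrictLocalDateDemo and the helper isValid are made up for illustration. It also assumes you want impossible dates such as 29 February 2019 rejected, which is why it switches to ResolverStyle.STRICT and to uuuu instead of yyyy (with STRICT, yyyy would additionally require an era in the input).

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.format.ResolverStyle;

final class StrictLocalDateDemo {
    // uuuu (proleptic year) rather than yyyy (year of era): under STRICT
    // resolving, yyyy cannot be resolved without an era field.
    private static final DateTimeFormatter STRICT_DATE =
            DateTimeFormatter.ofPattern("dd.MM.uuuu")
                    .withResolverStyle(ResolverStyle.STRICT);

    // Hypothetical helper: true only if the whole string is an existing date.
    static boolean isValid(String text) {
        try {
            LocalDate.parse(text, STRICT_DATE);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("03.12.1990"));     // true
        System.out.println(isValid("03.12.199o"));     // false: typo in the year
        System.out.println(isValid("03.12.1990junk")); // false: trailing text
        System.out.println(isValid("o3.12.1990"));     // false: typo in the day
        System.out.println(isValid("29.02.2019"));     // false: 2019 is not a leap year
    }
}
```

Whether you need STRICT or the default resolver style depends on how forgiving you want to be about day-of-month values; the typo and trailing-text cases are rejected either way.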
Links
DateFormat.setLenient documentation
Oracle tutorial: Date Time explaining how to use java.time.

Leniency is not about whether the entire input matches but whether the format matches. Your input can still be 3.12.1990somecrap and it would work.
The actual parsing is done in parse(String, ParsePosition) which you could use as well. Basically parse(String) will pass a ParsePosition that is set up to start at index 0 and when the parsing is done the current index of that position is checked.
If it's still 0 the start of the input didn't match the format, not even in lenient mode.
However, to the parser 03.12.199 is a valid date and hence it stops at index 9 (the position of the letter o), which isn't 0, and thus the parsing succeeded. If you want to check whether everything was parsed, you'd have to pass your own ParsePosition and check whether its index matches the length of the input.
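The ParsePosition check described here could look roughly like this. It is only a sketch: the class FullInputParser and the method parseWholeInput are hypothetical names, and returning null for a rejected input is my own choice, not something the API prescribes.

```java
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

final class FullInputParser {
    // Hypothetical helper: returns the parsed Date only if the pattern
    // consumed the entire input, otherwise null.
    static Date parseWholeInput(String pattern, String input) {
        SimpleDateFormat format = new SimpleDateFormat(pattern, Locale.ROOT);
        format.setLenient(false);
        ParsePosition position = new ParsePosition(0);
        // parse(String, ParsePosition) returns null on failure instead of throwing.
        Date result = format.parse(input, position);
        if (result == null || position.getIndex() != input.length()) {
            return null;
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(parseWholeInput("dd.MM.yyyy", "03.12.1990") != null);         // true
        System.out.println(parseWholeInput("dd.MM.yyyy", "03.12.199o") != null);         // false
        System.out.println(parseWholeInput("dd.MM.yyyy", "03.12.1990somecrap") != null); // false
    }
}
```

Note that this only catches leftover text. A complete input with a short year, such as 03.12.199, is still accepted, because SimpleDateFormat does not enforce the number of year digits.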

If you use setLenient(false), it will still parse the date until the desired pattern is met. However, it will check whether the resulting date is a valid date. In your case, 03.12.199 is a valid date, so it will not throw an exception. Let's take an example to understand where setLenient(false) differs from setLenient(true), the default.
SimpleDateFormat simpleDateFormat = new SimpleDateFormat("dd.MM.yyyy");
System.out.println(simpleDateFormat.parse("31.02.2018"));
The above will give me output: Sat Mar 03 00:00:00 IST 2018
But the code below throws a ParseException, as 31.02.2018 is not a valid/possible date:
SimpleDateFormat simpleDateFormat = new SimpleDateFormat("dd.MM.yyyy");
simpleDateFormat.setLenient(false);
System.out.println(simpleDateFormat.parse("31.02.2018"));

Related

ParseException when trying to parse Date string

I have a piece of code written to parse date string -
DateFormat cal = new SimpleDateFormat("yyyy-MM-dd hh:mm");
cal.setLenient(false);
try {
    cal.parse("2018-01-01 14:42");
} catch (Exception e) {
    e.printStackTrace();
}
But I get an exception saying -
java.text.ParseException: Unparseable date: "2018-01-01 14:42"
at java.base/java.text.DateFormat.parse(DateFormat.java:388)
at MyClass.main(MyClass.java:10)
I am not sure why I am seeing this error, as the date string and the format given are right. Please help.

From the documentation https://docs.oracle.com/javase/7/docs/api/java/text/SimpleDateFormat.html:
Lowercase h in the SimpleDateFormat indicates hour in the 12 hour format, whereas 24-hour format is indicated with uppercase H. As 14 > 12, the date 14:42 fails to be parsed.

You should be using HH instead of hh for hour pattern if the hour is displayed in 24 hour format.
See the documentation below for more information.
https://docs.oracle.com/javase/6/docs/api/java/text/SimpleDateFormat.html

hh is used for the time of the day with hours going from 1 to 12. 14 is not a valid hour for this kind of hour-representation, so you have to use HH or kk. The former is used for times that are shown from 0-23, the latter for times shown between 1-24. Most likely you have to use HH or H/k if the single digit hours aren't preceded by a 0.
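A quick way to see the difference between the pattern letters is to run the same input through each of them. This is just a sketch: HourPatternDemo and accepts are names I made up, and I pin the time zone to UTC only to keep the result independent of the machine it runs on.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Locale;
import java.util.TimeZone;

final class HourPatternDemo {
    // Hypothetical helper: true if a strict SimpleDateFormat with the given
    // pattern accepts the input.
    static boolean accepts(String pattern, String input) {
        SimpleDateFormat format = new SimpleDateFormat(pattern, Locale.ROOT);
        format.setTimeZone(TimeZone.getTimeZone("UTC"));
        format.setLenient(false);
        try {
            format.parse(input);
            return true;
        } catch (ParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String input = "2018-01-01 14:42";
        System.out.println(accepts("yyyy-MM-dd HH:mm", input)); // true: hour 0-23
        System.out.println(accepts("yyyy-MM-dd kk:mm", input)); // true: hour 1-24
        System.out.println(accepts("yyyy-MM-dd hh:mm", input)); // false: hour 1-12 only
    }
}
```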

TL;DR
DateTimeFormatter cal = DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm");
LocalDateTime.parse("2018-01-01 14:42", cal);
This runs without exception or other error.
java.time
The date-time classes you use, DateFormat and SimpleDateFormat, are long outdated and furthermore notoriously troublesome. I recommend you stop using them immediately. Instead use java.time, the modern Java date and time API. It came out nearly four years ago after having been described in Java Specification Request (JSR) 310 (a name that somehow still clings to the API).
As others have correctly pointed out, your error was that you used lowercase hh in your format pattern string, where you should have used uppercase HH for hour of day. As one little example of where the modern classes try to be more helpful, try the same mistake with them. If I insert hh in the format pattern in the code above, my program crashes (because there is no try-catch construct) with a DateTimeParseException with the following message:
Text '2018-01-01 14:42' could not be parsed: Invalid value for
ClockHourOfAmPm (valid values 1 - 12): 14
While perhaps still a bit esoteric, it is very precise. And I would dare hope that in combination with the documentation it would tell you what you did wrong.
The exception is unchecked, so no try-catch is required around the parsing. On the other hand, you may use one if you like (and if you are not very certain that the format of your date-time string is correct, you should).
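If you do want to guard the parsing, one possible shape is the following sketch. SafeDateTimeParsing and tryParse are hypothetical names, and returning an Optional is just one of several reasonable ways to report a rejected string.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Optional;

final class SafeDateTimeParsing {
    private static final DateTimeFormatter FORMATTER =
            DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm");

    // Hypothetical helper: an empty Optional instead of an unchecked exception.
    static Optional<LocalDateTime> tryParse(String text) {
        try {
            return Optional.of(LocalDateTime.parse(text, FORMATTER));
        } catch (DateTimeParseException e) {
            // The message says what was wrong and, where known, at which index.
            System.err.println("Rejected input: " + e.getMessage());
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(tryParse("2018-01-01 14:42").isPresent()); // true
        System.out.println(tryParse("2018-01-01 25:42").isPresent()); // false
        System.out.println(tryParse("not a date").isPresent());       // false
    }
}
```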
Links
Oracle tutorial: Date Time, explaining how to use java.time.
Java Specification Request (JSR) 310, where the modern date and time API was first described.

Using ~ in SimpleDateFormat#parse()

The below code is executing without any problems. But logically it seems to be incorrect. Why is it so?
import java.sql.Date;
import java.text.DateFormat;
import java.text.SimpleDateFormat;
DateFormat df =new SimpleDateFormat("MM/dd/yyyy");
new Date(df.parse("09/01/3~34").getTime()); // Produces '09/01/3'
new Date(df.parse("09/01/100000").getTime()); // Produces ' 000-09-01'

First question:
new Date(df.parse("09/01/3~34").getTime()); // Produces '09/01/3'
According to DateFormat#parse() JavaDoc:
Parses text from the beginning of the given string to produce a date. The method may not use the entire text of the given string.
Because of that, once it has parsed a value from the String, it stops looking at the rest. When it found the ~ sign, it took the 3 as the year and ignored the remainder of the String.
Second question:
new Date(df.parse("09/01/100000").getTime()); // Produces ' 000-09-01'
It's not producing '000-09-01'. The following code:
DateFormat df = new SimpleDateFormat("MM/dd/yyyy");
java.util.Date parsedDate = df.parse("09/01/100000");
System.out.println(parsedDate);
System.out.println(df.format(parsedDate));
Outputs:
Fri Sep 01 00:00:00 BRT 100000
09/01/100000
However, it appears to be a bug with the java.sql.Date#toString method on JDK. To present this java.sql.Date correctly, try passing it to your DateFormat#format method:
java.sql.Date sqlDt = new java.sql.Date(df.parse("09/01/100000").getTime());
System.out.println(df.format(sqlDt));
Output: 09/01/100000

About first question:
The class SimpleDateFormat is VERY lenient and just stops when some invalid chars are hit. Even when you explicitly instruct it to parse strict via df.setLenient(false); you will notice here the same output and no exception. JodaTime, JSR-310 or my time library would correctly reject the input containing invalid chars in strict mode. So here we have a clear bug in JDK.
About second question:
Well, you use java.sql.Date instead of java.util.Date. The sql-version is not designed for years > 9999. This is even specified in its javadoc:
"date milliseconds since January 1, 1970, 00:00:00 GMT not to exceed the milliseconds representation for the year 8099"
This is not quite clear in the javadoc, but Oracle speaks about a year offset of 1900, so in the end java.sql.Date only supports the year range up to 9999. This conforms to ANSI SQL, so the year limitation itself is not a bug but is necessary for interoperability with SQL. Another aspect is bad, however, and in my opinion a bug: if you feed the constructor an invalid year, you don't get any exception; instead the internal state is silently set to something silly and unpredictable.
Conclusion:
Avoid using java.sql.Date for anything other than the JDBC layer. It is not meant for the application layer. Furthermore, java.util.Date and also SimpleDateFormat (which are the "standard" in Java before version 8) are more or less horribly broken. Good alternatives are JodaTime, the new JSR-310 API in Java 8, or my library if it reaches a non-alpha state one day.

java.text.ParseException: Unparseable date "yyyy-MM-dd'T'HH:mm:ss.SSSZ" - SimpleDateFormat

I would appreciate any help with finding bug for this exception:
java.text.ParseException: Unparseable date: "2007-09-25T15:40:51.0000000Z"
and following code:
SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSSZ");
Date date = sdf.parse(timeValue);
long mills = date.getTime();
this.point.time = String.valueOf(mills);
It throws the exception at Date date = sdf.parse(timeValue);
where timeValue = "2007-09-25T15:40:51.0000000Z", as in the exception.
Thanks.

Z represents the timezone character. It needs to be quoted:
SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'");

(Answer now extensively revised, thanks for the corrections in the comments)
In Java 7 you can use the X pattern to match an ISO8601 timezone, which includes the special Z (UTC) value.
The X pattern also supports explicit timezones, e.g. +01:00
This approach respects the timezone indicator correctly, and avoids the problem of treating it merely as a string, and thus incorrectly parsing the timestamp in the local timezone rather than UTC or whatever.
SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ssX");
Date date = sdf.parse("2007-09-25T15:40:51Z");
Date date2 = sdf.parse("2007-09-25T15:40:51+01:00");
This can also be used with milliseconds:
SimpleDateFormat sdf2 = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSSX");
Date date3 = sdf2.parse("2007-09-25T15:40:51.500Z");
However, as others have pointed out, your format has 7-digit fractional seconds, which are presumably tenth-microseconds. If so, SimpleDateFormat cannot handle this, and you will get incorrect results, because each 0.1 microsecond will be interpreted as a millisecond, giving a potential overall error of up to 10,000 seconds (several hours).
In the extreme case, if the fractional second value is 0.9999999 seconds, that will be incorrectly interpreted as 9999999 milliseconds, which is about 167 minutes, or 2.8 hours.
// Right answer, error masked for zero fractional seconds
Date date6 = sdf2.parse("2007-09-25T15:40:51.0000000Z");
// Tue Sep 25 15:40:51 GMT 2007
// Error - wrong hour
// Should be just half a second different from the previous example
Date date5 = sdf2.parse("2007-09-25T15:40:51.5000000Z");
// Tue Sep 25 17:04:11 GMT 2007
This error is hidden when the fractional seconds are zero, as in your example, but will manifest whenever they are nonzero.
This error can be detected in many cases, and its impact reduced, by turning off "lenient" parsing which by default will accept a fractional part of more than one second and carry it over to the seconds/minutes/hours parts:
sdf2.setLenient(false);
sdf2.parse("2007-09-25T15:40:51.5000000Z");
// java.text.ParseException: Unparseable date: "2007-09-25T15:40:51.5000000Z"
This will catch cases where the millis value is more than 999, but does not check the number of digits, so it is only a partial and indirect safeguard against millis/microseconds mismatches. However, in many real-world datasets this will catch a large number of errors and thus indicate the root problem, even if some values slip through.
I recommend that lenient parsing is always disabled unless you have a specific need for it, as it catches a lot of errors that would otherwise be silently hidden and propagated into downstream data.
If your fractional seconds are always zero, then you could use one of the solutions here, but with the risk that they will NOT work if the code is later used on non-zero fractional seconds. You may wish to document this and/or assert that the value is zero, to avoid later bugs.
Otherwise, you probably need to convert your fractional seconds into milliseconds, so that SimpleDateFormat can interpret them correctly. Or use one of the newer datetime APIs.
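As a rough sketch of the "convert to milliseconds first" route, you could cut the fraction down to three digits before handing the string to SimpleDateFormat. FractionTruncatingParser and parseIsoTimestamp are hypothetical names. The sketch assumes the fraction always has at least three digits, and it drops the extra digits rather than rounding them.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

final class FractionTruncatingParser {
    // Hypothetical helper. Assumes a fraction of at least three digits;
    // digits beyond the third are dropped, not rounded.
    static Date parseIsoTimestamp(String timestamp) throws ParseException {
        // ".5000000" becomes ".500", so SSS really sees milliseconds.
        String millisOnly = timestamp.replaceFirst("(\\.\\d{3})\\d*", "$1");
        SimpleDateFormat format =
                new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSSX", Locale.ROOT);
        format.setLenient(false);
        return format.parse(millisOnly);
    }

    public static void main(String[] args) throws ParseException {
        Date date = parseIsoTimestamp("2007-09-25T15:40:51.5000000Z");
        System.out.println(date.getTime()); // 1190734851500
    }
}
```

A fraction with fewer than three digits (for example .5) would still be misread as 5 milliseconds, so inputs like that need padding as well, or one of the newer date-time APIs.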

java.time
I recommend that you use java.time, the modern Java date and time API, for your date and time work. Your string is in ISO 8601 format and can be directly parsed by the java.time.Instant class without us specifying any formatter:
String timeValue = "2007-09-25T15:40:51.0000000Z";
Instant i = Instant.parse(timeValue);
long mills = i.toEpochMilli();
String time = String.valueOf(mills);
System.out.println(time);
Output:
1190734851000
May use a formatter for output if desired
If we know for a fact that the millisecond value will never be negative, java.time can format it into a string for us. This saves the explicit conversion to milliseconds first.
private static final DateTimeFormatter EPOCH_MILLI_FORMATTER
= new DateTimeFormatterBuilder().appendValue(ChronoField.INSTANT_SECONDS)
.appendValue(ChronoField.MILLI_OF_SECOND, 3)
.toFormatter(Locale.ROOT);
Now formatting is trivial:
assert ! i.isBefore(Instant.EPOCH) : i;
String time = EPOCH_MILLI_FORMATTER.format(i);
And output is still the same:
1190734851000
In particular if you need to format Instant objects to strings in more places in your program, I recommend the latter approach.
What went wrong in your code?
First of all, there is no way that SimpleDateFormat can parse 7 decimals of fraction of second correctly. As long as the fraction is zero, the result will happen to come out correct anyway, but imagine a time that is just one tenth of a second after the full second, for example, 2007-09-25T15:40:51.1000000Z. In this case SimpleDateFormat would parse the fraction into a million milliseconds, and your result would be more than a quarter of an hour off. For greater fractions the error could be several hours.
Second, as others have said, format pattern letter Z does not match the offset of Z meaning UTC or offset zero from UTC. This caused the exception that you observed. Putting Z in quotes as suggested in the accepted answer is wrong too, since it will cause you to miss this crucial information from the string, again leading to an error of several hours (in most time zones).
Link
Oracle tutorial: Date Time explaining how to use java.time.

Invalid format issue parsing string to JodaTime

String dateString = "20110706 1607";
DateTimeFormatter dateStringFormat = DateTimeFormat.forPattern("YYYYMMDD HHMM");
DateTime dateTime = dateStringFormat.parseDateTime(dateString);
Resulting stacktrace:
Exception in thread "main" java.lang.IllegalArgumentException: Invalid format: "201107206 1607" is malformed at " 1607"
at org.joda.time.format.DateTimeFormatter.parseMillis(DateTimeFormatter.java:644)
at org.joda.time.convert.StringConverter.getInstantMillis(StringConverter.java:65)
at org.joda.time.base.BaseDateTime.<init>(BaseDateTime.java:171)
at org.joda.time.DateTime.<init>(DateTime.java:168)
......
Any thoughts? If I truncate the string to 20110706 with pattern "YYYYMMDD" it works, but I need the hour and minute values as well. What's odd is that I can convert a Jodatime DateTime to a String using the same pattern "YYYYMMDD HHMM" without issue
Thanks for looking

Look at your pattern - you're specifying "MM" twice. That can't possibly be right. That would be trying to parse the same field (month in this case) twice from two different bits of the text. Which would you expect to win? You want:
DateTimeFormat.forPattern("yyyyMMdd HHmm")
Look at the documentation for DateTimeFormat to see what everything means.
Note that although calling toString with that pattern will produce a string, it won't produce the string you want it to. I wouldn't be surprised if the output even included "YYYY" and "DD" due to the casing, although I can't test it right now. At the very least you'd have the month twice instead of the minutes appearing at the end.

timezone inconsistency when parsing datetime Strings

I convert two types of Strings to an ISO format using SimpleDateFormat for parsing and org.apache.commons.lang.time.DateFormatUtils for formatting (since they provide an ISO formatter out of the box). The pattern Strings for parsing are M/d/y H:m and d.M.y H:m. A typical String to convert may look either like 4/14/2009 11:22 or 4.14.2009 11:22. I initialize the parsers as follows:
SimpleDateFormat SLASH = new SimpleDateFormat(PATTERN_S, Locale.getDefault());
SimpleDateFormat DOT = new SimpleDateFormat(PATTERN_D, Locale.getDefault());
I get the formatter:
FastDateFormat isoFormatter = DateFormatUtils.ISO_DATETIME_TIME_ZONE_FORMAT;
After creating a Date from the parsed String:
Date date = SLASH.parse(old);
it is formatted for output:
isoFormatter.format(date)
The strange thing is: when a String with slashes was converted, the output looks like 2009-04-14T11:42:00+01:00 (which is correct), but when a String with dots was converted, the output looks like 2010-02-14T11:42:00+02:00, shifting my timezone to somewhere between Finland and South Africa, the year to 2010 and the month to February.
What is going wrong here and why?
EDIT: changed the output strings to match the real output (damn you, cut-n-paste). The reason was the interchanged M and d in the pattern strings, which I failed to notice. 14 seems to be a perfectly valid month (it's next year's February), and even non-lenient settings can't force the formatter to reject it. The timeshift issue is resolved, and the reason for the TimeZone change is provided by Jim Garrison. Thanks Ahmad and Jim.

Your dot pattern is d.M.y H:m, while your example shows that you meant M.d.y H:m. I supposed this would throw a ParseException, but it doesn't, and it causes the timezone issues.
