explanation for behavior from SimpleDateFormat

Try this:
DateFormat df = new SimpleDateFormat("y");
System.out.println(df.format(new Date()));
Without reading the javadoc for SimpleDateFormat, what would you expect this to output? My expectation was "0". That is to say, the last digit of the current year, which is 2010.
Instead it pads it out to 2 digits, just as if the format string had been "yy".
Why? Seems rather bizarre. If I had wanted 2 digits then I'd have used "yy".

"Without reading the javadoc" is a dangerous attitude more often than not. Joshua Bloch co-wrote a whole book, Java Puzzlers, on these strange behaviours in Java.

That's why you need to check the documentation:
For parsing with the abbreviated year pattern ("y" or "yy"), SimpleDateFormat must interpret the abbreviated year relative to some century.
"y" is an abbreviation, so this is expected behavior.

If you want only a digit you can:
System.out.println(df.format(new Date()).substring(1));
:)
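The substring(1) trick gives a single digit only when the formatted year is exactly two characters long, as "10" was here. A sketch that sidesteps string length altogether, assuming java.time (Java 8 or later) is available, computes the digit from the year number instead. The class and method names are just for illustration:

```java
import java.time.LocalDate;

public class LastYearDigit {

    // Last decimal digit of the date's year, computed arithmetically so it does
    // not depend on how many digits a "y" or "yy" pattern happens to print.
    // Assumes a positive (AD/CE) year.
    static int lastDigitOfYear(LocalDate date) {
        return date.getYear() % 10;
    }

    public static void main(String[] args) {
        System.out.println(lastDigitOfYear(LocalDate.now()));
        System.out.println(lastDigitOfYear(LocalDate.of(2010, 6, 1))); // 0
    }
}
```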

Related

JSR310 How to internationalize a pattern for a month-day?

I'm trying to format a date without a year (just day and month, e.g. 12.10).
DateTimeFormatter.ofLocalizedDate(FormatStyle.SHORT) still yields the year for me (12.10.20).
So I tried DateTimeFormatter.ofPattern("dd. MM"), but that obviously hardcodes the order and the dot, which won't make American users happy (they expect slashes and month first).
How can I internationalize a pattern? Is there some abstract syntax for separators etc?

Well, as Ole pointed out there is no 100% satisfying solution using java.time only. But my library Time4J has found a solution based on the data of the CLDR repository (ICU4J also gives support) using the type AnnualDate (as replacement for MonthDay):
LocalDate yourLocalDate = ...;
MonthDay md = MonthDay.from(yourLocalDate);
AnnualDate ad = AnnualDate.from(md);
ChronoFormatter<AnnualDate> usStyle =
ChronoFormatter.ofStyle(DisplayMode.SHORT, Locale.US, AnnualDate.chronology());
ChronoFormatter<AnnualDate> germanStyle =
ChronoFormatter.ofStyle(DisplayMode.SHORT, Locale.GERMANY, AnnualDate.chronology());
System.out.println("US-format: " + usStyle.format(ad)); // US-format: 12/31
System.out.println("German: " + germanStyle.format(ad)); // German: 31.12.

I don’t think that a solution can be made that gives 100 % satisfactory results for all locales. Let’s give it a shot anyway.
Locale formattingLocale = Locale.getDefault(Locale.Category.FORMAT);
String formatPattern = DateTimeFormatterBuilder.getLocalizedDateTimePattern(
FormatStyle.SHORT, null, IsoChronology.INSTANCE, formattingLocale);
// If year comes first, remove it and all punctuation and space before and after it
formatPattern = formatPattern.replaceFirst("^\\W*[yu]+\\W*", "")
// If year comes last and is preceded by a space somewhere, break at the space
// (preserve any punctuation before the space)
.replaceFirst("\\s\\W*[yu]+\\W*$", "")
// Otherwise if year comes last, remove it and all punctuation and space before and after it
.replaceFirst("\\W*[yu]+\\W*$", "");
DateTimeFormatter monthDayFormatter
= DateTimeFormatter.ofPattern(formatPattern, formattingLocale);
For comparison I am printing a date both using the normal formatter with year from your question and using my prepared formatter.
LocalDate exampleDate = LocalDate.of(2020, Month.DECEMBER, 31);
System.out.format(formattingLocale, "%-11s %s%n",
exampleDate.format(DateTimeFormatter.ofLocalizedDate(FormatStyle.SHORT)),
exampleDate.format(monthDayFormatter));
Output in French locale (Locale.FRENCH):
31/12/2020 31/12
In Locale.GERMAN:
31.12.20 31.12
Edit: My German girlfriend informs me that this is wrong. We should always write a dot after each of the two numbers because both are ordinal numbers. Meno Hochschild, the German author of the other answer, also produces 31.12. with two dots for German.
In Locale.US:
12/31/20 12/31
It might make American users happy. In Swedish (Locale.forLanguageTag("sv")):
2020-12-31 12-31
In a comment I mentioned Bulgarian (bg):
31.12.20 г. 31.12
As far as I have understood, “г.” (Cyrillic g and a dot) is an abbreviation of a word that means year, so when leaving out the year, we should probably leave this abbreviation out too. I’m in doubt whether we ought to include the dot after 12.
Finally Croatian (hr):
31. 12. 2020. 31. 12.
How the code works: We first ask DateTimeFormatterBuilder for the short date format pattern for the locale. I assume that this is the pattern that your formatter from the question is also using behind the scenes (haven’t checked). I then use different regular expressions to remove the year from different variants; see the comments in the code. Year may be represented by y or u, so I take both into account (in practice y is used). Now it’s trivial to build a new formatter from the modified pattern. About the Bulgarian: from my point of view there is an error in Java regular expressions: they don’t recognize Cyrillic letters as word characters, which is why г was removed too (the error is in the documentation too; it claims that a word character is [a-zA-Z_0-9]).
We were lucky, though: in our case it produces the result that I wanted.
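For reference, java.util.regex treats \w (and therefore \W) as [a-zA-Z_0-9] by default, and switches to Unicode semantics only when the Pattern.UNICODE_CHARACTER_CLASS flag (Java 7+) or its embedded form (?U) is used. A small check of the difference on the Cyrillic letter г (written as the escape \u0433 to keep the source ASCII); the class name is made up:

```java
import java.util.regex.Pattern;

public class CyrillicWordChar {
    public static void main(String[] args) {
        String cyrillicG = "\u0433"; // Cyrillic small letter ghe
        // Default semantics: \w means [a-zA-Z_0-9], so the Cyrillic letter is not a word character
        System.out.println(Pattern.matches("\\w", cyrillicG)); // false
        // UNICODE_CHARACTER_CLASS makes \w cover alphabetic letters of all scripts
        System.out.println(Pattern.compile("\\w", Pattern.UNICODE_CHARACTER_CLASS)
                .matcher(cyrillicG).matches()); // true
        // The embedded flag (?U) has the same effect inside the pattern string
        System.out.println(cyrillicG.matches("(?U)\\w")); // true
    }
}
```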
If you’re happy with a 90 % solution, this would be my suggestion, and I hope you can modify it to any needs your users in some locale may have.
Link: Documentation of Java regular expressions (regex)

Java8 equivalent of JodaTime DateTimeFormat.shortDate()

What is the Java 8 java.time equivalent of
org.joda.time.format.DateTimeFormat.shortDate()?
I've tried the following, but it fails to parse values such as "20/5/2016" or "20/5/16":
DateTimeFormatter.ofLocalizedDate(FormatStyle.SHORT)

You are correct: A Joda-Time DateTimeFormatter (which is the type you get from DateTimeFormat.shortDate()) parses more leniently than a java.time DateTimeFormatter. In the English/New Zealand locale (en-NZ) shortDate uses the format pattern d/MM/yy and parses both 20/5/2016 and 20/5/16 into 2016-05-20.
I frankly find it nasty that it parses both two-digit and four-digit years into the same year. When the format specifies a two-digit year, I would have expected four digits to be an error for stricter input validation. Accepting a one-digit month when the format specifies two digits is lenient too, but maybe not so dangerous and more in line with what we might expect.
java.time also uses the format pattern d/MM/yy (tested on jdk-11.0.3). When parsing, it accepts one or two digits for day of month, but insists on a two-digit month and a two-digit year.
You may get the Joda-Time behaviour in java.time, but it requires you to specify the format pattern yourself:
Locale loc = Locale.forLanguageTag("en-NZ");
DateTimeFormatter dateFormatter
= DateTimeFormatter.ofPattern("d/M/[yyyy][yy]", loc);
System.out.println(LocalDate.parse("20/5/2016", dateFormatter));
System.out.println(LocalDate.parse("20/5/16", dateFormatter));
Output is:
2016-05-20
2016-05-20
If you want an advanced solution that works in other locales, I am sure that you can write a piece of code that gets the format pattern from DateTimeFormatterBuilder.getLocalizedDateTimePattern and modifies it by replacing dd with d, MM with M and any number of y with [yyyy][yy]. Then pass the modified format pattern string to DateTimeFormatter.ofPattern.
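A minimal sketch of that idea, assuming the locale's SHORT pattern is purely numeric (true for en-NZ, but not guaranteed for every locale). The class and method names are hypothetical, and the plain string replacements would also rewrite d, M or y letters inside quoted literal text, so treat it as a starting point rather than a general solution:

```java
import java.time.LocalDate;
import java.time.chrono.IsoChronology;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.format.FormatStyle;
import java.util.Locale;

public class LenientShortDate {

    // Hypothetical helper: takes the locale's SHORT date pattern and loosens it so that
    // day and month accept one or two digits and the year accepts four or two digits.
    static DateTimeFormatter lenientShortDateParser(Locale locale) {
        String pattern = DateTimeFormatterBuilder.getLocalizedDateTimePattern(
                FormatStyle.SHORT, null, IsoChronology.INSTANCE, locale);
        String loosened = pattern
                .replaceAll("d+", "d")              // single d also parses two digits
                .replaceAll("M+", "M")              // same for a numeric month
                .replaceAll("[yu]+", "[yyyy][yy]"); // try four-digit year first, then two
        return DateTimeFormatter.ofPattern(loosened, locale);
    }

    public static void main(String[] args) {
        DateTimeFormatter parser = lenientShortDateParser(Locale.forLanguageTag("en-NZ"));
        System.out.println(LocalDate.parse("20/5/2016", parser));
        System.out.println(LocalDate.parse("20/5/16", parser));
    }
}
```

For en-NZ the loosened pattern comes out as d/M/[yyyy][yy], the same as the hand-written one above.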
Edit: I’m glad that you got something to work. In your comment you said that you used:
Stream<String> shortFormPatterns = Stream.of(
"[d][dd]/[M][MM]",
"[d][dd]-[M][MM]",
"[d][dd].[M][MM]",
"[d][dd] [M][MM]",
"[d][dd]/[M][MM]/[yyyy][yy]",
"[d][dd]-[M][MM]-[yyyy][yy]",
"[d][dd].[M][MM].[yyyy][yy]",
"[d][dd] [M][MM] [yyyy][yy]");
It covers more cases than your Joda-Time formatter. Maybe that’s good. Specifically your Joda-Time formatter insists on a slash / between the numbers and rejects either hyphen, dot or space. Also I believe that Joda-Time would object to the year being left out completely.
While you do need [yyyy][yy], you don’t need [d][dd] nor [M][MM]. Just d and M suffice since they also accept two digits (what happens in your code is that for example [d] parses either one or two digits, so [dd] is never used anyway).
If you prefer only one format pattern string, I would expect d[/][-][.][ ]M[/][-][.][ ][yyyy][yy] to work (except in the cases where the year is omitted; I haven’t tested it).
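A quick check of that single pattern string (a sketch, not exhaustively tested; the class name is made up). Because each separator is optional on its own, it also accepts mixed separators such as 20/5.16, and input without a year parses but cannot be resolved into a LocalDate:

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

public class SinglePatternCheck {
    public static void main(String[] args) {
        DateTimeFormatter f = DateTimeFormatter.ofPattern("d[/][-][.][ ]M[/][-][.][ ][yyyy][yy]");
        for (String s : new String[] {"20/5/2016", "20-5-16", "20.05.2016", "20 5 16", "20/5.16"}) {
            // [yyyy] needs four digits in strict parsing, so two-digit years fall through to [yy]
            System.out.println(s + " -> " + LocalDate.parse(s, f));
        }
        try {
            LocalDate.parse("20/5", f); // parses, but there is no year to build a LocalDate from
        } catch (DateTimeParseException e) {
            System.out.println("20/5 -> " + e.getMessage());
        }
    }
}
```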

FormatStyle.SHORT gives the shortest localized format, for example dd/MM/yy or M/d/yy depending on the locale, so you need to use a pattern to get a customized format.
LocalDate date = LocalDate.now();
System.out.println(date.format(DateTimeFormatter.ofLocalizedDate(FormatStyle.SHORT))); //9/29/19
You can also use DateTimeFormatter.ISO_DATE or DateTimeFormatter.ISO_LOCAL_DATE to get the ISO format, yyyy-MM-dd; the DateTimeFormatter documentation lists the other predefined formatters.
System.out.println(date.format(DateTimeFormatter.ISO_DATE)); //2019-09-29
System.out.println(date.format(DateTimeFormatter.ISO_LOCAL_DATE)); //2019-09-29
If you want a custom format such as yyyy/MM/dd, use ofPattern:
System.out.println(date.format(DateTimeFormatter.ofPattern("yyyy/MM/dd"))); //2019/09/29

SimpleDateFormat leniency leads to unexpected behavior

I have found that SimpleDateFormat::parse(String source) is (unfortunately) lenient by default, as if setLenient(true) had been called.
By default, parsing is lenient: If the input is not in the form used by this object's format method but can still be parsed as a date, then the parse succeeds.
If I set leniency to false, the documentation says that with strict parsing, inputs must match this object's format. I used parsing with SimpleDateFormat in non-lenient mode and, by mistake, had a typo in the date (the letter o instead of the digit 0). Here is the brief working code:
// PASSED (year 199)
SimpleDateFormat simpleDateFormat = new SimpleDateFormat("dd.mm.yyyy");
System.out.println(simpleDateFormat.parse("03.12.199o"));
simpleDateFormat.setLenient(false);
System.out.println(simpleDateFormat.parse("03.12.199o")); //WTF?
To my surprise, this passed and no ParseException was thrown. I'd go further:
// PASSED (year 1990)
String string = "just a String to mess with SimpleDateFormat";
SimpleDateFormat simpleDateFormat = new SimpleDateFormat("dd.mm.yyyy");
System.out.println(simpleDateFormat.parse("03.12.1990" + string));
simpleDateFormat.setLenient(false);
System.out.println(simpleDateFormat.parse("03.12.1990" + string));
Let's go on:
// FAILED on the 2nd line
SimpleDateFormat simpleDateFormat = new SimpleDateFormat("dd.mm.yyyy");
System.out.println(simpleDateFormat.parse("o3.12.1990"));
simpleDateFormat.setLenient(false);
System.out.println(simpleDateFormat.parse("o3.12.1990"));
Finally, the exception is thrown: Unparseable date: "o3.12.1990". I wonder what difference leniency actually makes, and why the last line of my first code snippet did not throw an exception. The documentation says:
With strict parsing, inputs must match this object's format.
My input clearly doesn't strictly match the format - I expect this parsing to be really strict. Why does this (not) happen?

Why does this (not) happen?
It’s not very well explained in the documentation.
With lenient parsing, the parser may use heuristics to interpret
inputs that do not precisely match this object's format. With strict
parsing, inputs must match this object's format.
The documentation does help a bit, though, by mentioning that it is the Calendar object that the DateFormat uses that is lenient. That Calendar object is not used for the parsing itself, but for interpreting the parsed values into a date and time (I am quoting DateFormat documentation since SimpleDateFormat is a subclass of DateFormat).
SimpleDateFormat, no matter if lenient or not, will accept 3-digit year, for example 199, even though you have specified yyyy in the format pattern string. The documentation says about year:
For parsing, if the number of pattern letters is more than 2, the year
is interpreted literally, regardless of the number of digits. So using
the pattern "MM/dd/yyyy", "01/11/12" parses to Jan 11, 12 A.D.
DateFormat, no matter if lenient or not, accepts and ignores text after the parsed text, like the small letter o in your first example. It objects to unexpected text before or inside the text, as when in your last example you put the letter o in front. The documentation of DateFormat.parse says:
The method may not use the entire text of the given string.
As I indirectly said, leniency makes a difference when interpreting the parsed values into a date and time. So a lenient SimpleDateFormat will interpret 29.02.2019 as 01.03.2019 because there are only 28 days in February 2019. A strict SimpleDateFormat will refuse to do that and will throw an exception. The default lenient behaviour can lead to very surprising and downright inexplicable results. As a simple example, giving the day, month and year in the wrong order: 1990.03.12 will result in August 11 year 17 AD (2001 years ago).
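A minimal sketch of that difference, reusing the 29.02.2019 example (the class name is made up; Calendar is used only to read back the day and month in the default time zone):

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;

public class LeniencyDemo {
    public static void main(String[] args) throws ParseException {
        SimpleDateFormat lenient = new SimpleDateFormat("dd.MM.yyyy"); // lenient by default
        Date rolled = lenient.parse("29.02.2019");
        Calendar cal = Calendar.getInstance();
        cal.setTime(rolled);
        // February 2019 has 28 days, so day 29 overflows into March 1
        System.out.println(cal.get(Calendar.DAY_OF_MONTH) + "." + (cal.get(Calendar.MONTH) + 1));

        SimpleDateFormat strict = new SimpleDateFormat("dd.MM.yyyy");
        strict.setLenient(false);
        try {
            strict.parse("29.02.2019"); // the non-lenient calendar refuses day 29 in February 2019
        } catch (ParseException e) {
            System.out.println("Strict: " + e.getMessage());
        }
    }
}
```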
The solution
VGR already mentioned LocalDate from java.time, the modern Java date and time API, in a comment. In my experience java.time is so much nicer to work with than the old date and time classes, so let’s give it a shot. Try a correct date string first:
DateTimeFormatter dateFormatter = DateTimeFormatter.ofPattern("dd.mm.yyyy");
System.out.println(LocalDate.parse("03.12.1990", dateFormatter));
We get:
java.time.format.DateTimeParseException: Text '03.12.1990' could not
be parsed: Unable to obtain LocalDate from TemporalAccessor:
{Year=1990, DayOfMonth=3, MinuteOfHour=12},ISO of type
java.time.format.Parsed
This is because I used your format pattern string of dd.mm.yyyy, where lowercase mm means minute. When we read the error message closely enough, it does state that the DateTimeFormatter interpreted 12 as minute of hour, which was not what we intended. While SimpleDateFormat tacitly accepted this (even when strict), java.time is more helpful in pointing out our mistake. What the message only indirectly says is that it is missing a month value. We need to use uppercase MM for month. At the same time I am trying your date string with the typo:
DateTimeFormatter dateFormatter = DateTimeFormatter.ofPattern("dd.MM.yyyy");
System.out.println(LocalDate.parse("03.12.199o", dateFormatter));
We get:
java.time.format.DateTimeParseException: Text '03.12.199o' could not
be parsed at index 6
Index 6 is where it says 199. It objects because we specified 4 digits and supplied only 3. The docs say:
The count of letters determines the minimum field width …
It would also object to unparsed text after the date. In short it seems to me that it gives you everything that you had expected.
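To make the trailing-text point concrete, a small sketch (class name made up): LocalDate.parse rejects a valid date followed by extra text, and if prefix parsing is really wanted, it has to be requested explicitly through the DateTimeFormatter.parse(CharSequence, ParsePosition) overload:

```java
import java.text.ParsePosition;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.temporal.TemporalAccessor;

public class TrailingTextDemo {
    public static void main(String[] args) {
        DateTimeFormatter dateFormatter = DateTimeFormatter.ofPattern("dd.MM.yyyy");
        String input = "03.12.1990 and some more text";
        try {
            LocalDate.parse(input, dateFormatter); // the whole input must be consumed
        } catch (DateTimeParseException e) {
            System.out.println(e.getMessage() + " (error index " + e.getErrorIndex() + ")");
        }
        // Parsing only a prefix: the ParsePosition records where parsing stopped
        ParsePosition position = new ParsePosition(0);
        TemporalAccessor parsed = dateFormatter.parse(input, position);
        System.out.println(LocalDate.from(parsed) + " (stopped at index " + position.getIndex() + ")");
    }
}
```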
Links
DateFormat.setLenient documentation
Oracle tutorial: Date Time explaining how to use java.time.

Leniency is not about whether the entire input matches but whether the format matches. Your input can still be 3.12.1990somecrap and it would work.
The actual parsing is done in parse(String, ParsePosition) which you could use as well. Basically parse(String) will pass a ParsePosition that is set up to start at index 0 and when the parsing is done the current index of that position is checked.
If it's still 0 the start of the input didn't match the format, not even in lenient mode.
However, to the parser 03.12.199 is a valid date, so it stops at index 9 (just before the o), which isn't 0, and thus the parsing succeeded. If you want to check whether everything was parsed, you have to pass your own ParsePosition and check whether its index matches the length of the input.
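A minimal sketch of that check; parseWhole is a hypothetical helper name, not a JDK method. It only guards against leftover text: a three-digit year such as 03.12.199 still passes, as explained in the other answer.

```java
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.util.Date;

public class FullInputParse {

    // Hypothetical helper: returns null unless the format consumed the entire input.
    // parse(String) itself only complains when nothing at all could be parsed.
    static Date parseWhole(SimpleDateFormat format, String input) {
        ParsePosition position = new ParsePosition(0);
        Date result = format.parse(input, position); // returns null on a parse error
        if (result == null || position.getIndex() != input.length()) {
            return null;
        }
        return result;
    }

    public static void main(String[] args) {
        SimpleDateFormat format = new SimpleDateFormat("dd.MM.yyyy");
        format.setLenient(false);
        System.out.println(parseWhole(format, "03.12.1990") != null);    // true
        System.out.println(parseWhole(format, "03.12.199o") != null);    // false, stops before the o
        System.out.println(parseWhole(format, "03.12.1990abc") != null); // false, trailing text
    }
}
```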

If you use setLenient(false), it will still parse the date only as far as the pattern requires. However, it will check whether the resulting date is valid. In your case, 03.12.199 is a valid date, so it will not throw an exception. Let's take an example to understand where setLenient(false) differs from setLenient(true), the default.
SimpleDateFormat simpleDateFormat = new SimpleDateFormat("dd.MM.yyyy");
System.out.println(simpleDateFormat.parse("31.02.2018"));
The above will give me output: Sat Mar 03 00:00:00 IST 2018
But the code below throws a ParseException, as 31.02.2018 is not a valid date:
SimpleDateFormat simpleDateFormat = new SimpleDateFormat("dd.MM.yyyy");
simpleDateFormat.setLenient(false);
System.out.println(simpleDateFormat.parse("31.02.2018"));

Parse time with microseconds in Java

I am having problems parsing time strings in Java that are in the format of 2013-01-09 09:15:03.000000. In my data, the last three digits are always 0 (meaning the input strings have only millisecond precision), so I passed this format to SimpleDateFormat:
formatter = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSS'000'");
but formatter.parse("2013-01-09 09:15:02.500000"); throws an exception:
Unparseable date: "2013-01-09 09:15:02.500000"
at java.text.DateFormat.parse(DateFormat.java:357)
Does anyone know how to do it correctly? I can work around it by using the format yyyy-MM-dd HH:mm:ss.SSS and using substring to get rid of the last three digits, but that's really hacky.
EDIT: can anyone explain why the format string yyyy-MM-dd HH:mm:ss.SSS'000' can't be used to parse time "2013-01-09 09:15:02.500000"

Try java.sql.Timestamp:
Timestamp ts = Timestamp.valueOf("2013-01-09 09:15:03.500000");
Date date = new Date(ts.getTime());
It's also thread-safe and fast, as opposed to SimpleDateFormat.

java.time
I should like to contribute the modern answer. Use java.time, the modern Java date and time API. One option, you may use a formatter:
DateTimeFormatter formatter = DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm:ss.SSSSSS");
LocalDateTime dateTime = LocalDateTime.parse(timeString, formatter);
System.out.println(dateTime);
When using the string from your question, "2013-01-09 09:15:02.500000", this printed:
2013-01-09T09:15:02.500
If you want the value printed with six decimals on the seconds even when the last three decimals are 0, use the same formatter to format the time back into a string:
System.out.println(dateTime.format(formatter));
The other option: you may exploit the fact that your string resembles ISO 8601, the format that the modern classes parse by default, that is, without any explicit formatter. The only difference is that ISO 8601 has a T to denote the start of the time part, but we can fix that easily:
LocalDateTime dateTime = LocalDateTime.parse(timeString.replace(' ', 'T'));
It gives the same result, 2013-01-09T09:15:02.500. It’s shorter, but also more tricky.
Why bother?
The classes Date and Timestamp are long outdated, and SimpleDateFormat in particular has proven troublesome. Its surprising behaviour in your situation is just one little story out of very many. The modern API is generally so much nicer to work with.
Why didn’t your formatter work?
While the format pattern strings used by SimpleDateFormat and DateTimeFormatter are similar, there are differences. One is that SimpleDateFormat understands uppercase S as milliseconds no matter if there are one or nine of them, whereas to DateTimeFormatter they mean fraction of second. Your SimpleDateFormat furthermore grabbed all six digits after the decimal point, ignoring the fact that you had typed only three S, so there were no zeroes left to match the '000' (by the way, the apostrophes are not necessary, only letters need them).
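A small sketch of what happens, using UTC on both sides so the result does not depend on the default time zone (class name made up): with the question's pattern no digits are left for the '000' literal, and without the literal the six digits are read as 500000 milliseconds, which the lenient calendar rolls over into an extra 8 minutes 20 seconds:

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.TimeZone;

public class MicrosecondPitfall {
    public static void main(String[] args) {
        String input = "2013-01-09 09:15:02.500000";
        TimeZone utc = TimeZone.getTimeZone("UTC");

        SimpleDateFormat withLiteral = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSS'000'");
        withLiteral.setTimeZone(utc);
        try {
            withLiteral.parse(input);
        } catch (ParseException e) {
            // SSS consumed all six digits, so nothing is left to match '000'
            System.out.println(e.getMessage());
        }

        SimpleDateFormat plain = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSS");
        plain.setTimeZone(utc);
        try {
            Date date = plain.parse(input);
            Calendar cal = Calendar.getInstance(utc);
            cal.setTime(date);
            // 500000 ms = 8 min 20 s, added on top of 09:15:02
            System.out.printf("%02d:%02d:%02d.%03d%n",
                    cal.get(Calendar.HOUR_OF_DAY), cal.get(Calendar.MINUTE),
                    cal.get(Calendar.SECOND), cal.get(Calendar.MILLISECOND)); // 09:23:22.000
        } catch (ParseException e) {
            throw new IllegalStateException(e);
        }
    }
}
```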
Link
Oracle Tutorial

I've figured it out myself. Just FYI, Apache Commons' FastDateFormat seems to accept the SSS000 format and parses the time correctly.

Displaying HH:MM with SimpleDateFormat

I am having some problems when trying to format the time string I have created, I am trying to make it output only the time in HH:mm format by using the Date and Time conversion characters I found at this website
DateFormat time = new SimpleDateFormat("HH:mm R");
I get no problems without the "R" but then it outputs the entire date and time, defeating my goal.

Well, as R is not documented to mean anything but the documentation clearly says all other formatting characters "...from 'A' to 'Z' and from 'a' to 'z' are reserved...", all bets are off about what you'll get if you use R in the formatting string.
Remove the R. If your goal is to have an R at the end of the string, append it to the result. If not, consult the documentation to see what formatting character you should be using.

There is no "R" in the javadoc for SimpleDateFormat -- and it says that all unspecified letters are reserved. I don't know where your website got it...

You don't need 'R'. If you want 24 hour "military" time, just use capital 'HH' and be done with it. Delete the R and you should get what you want. If you don't, you're probably doing something else wrong in your code that you haven't posted.

According to the date/time formatting symbols for SimpleDateFormat, there is no "R". In fact, I get an IllegalArgumentException trying to use the "R" as you have it.
You can't use DateFormat time = new SimpleDateFormat("HH:mm R");.
However, printf recognizes a different set of symbols for formatting dates, and "R" is defined there:
'R' Time formatted for the 24-hour clock as "%tH:%tM"
The website you refer to does mention "R", but under the printf section, not for SimpleDateFormat. It says to prefix all date/time conversion characters with t.
Try
System.out.printf("%tR", new Date());
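A quick sketch contrasting the two (class name made up): SimpleDateFormat rejects R when the pattern is compiled, while %tR with printf or String.format produces the same text as the HH:mm pattern, both using the default time zone:

```java
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;

public class HourMinute {
    public static void main(String[] args) {
        try {
            new SimpleDateFormat("HH:mm R"); // R is not a SimpleDateFormat pattern letter
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }

        Date time = new GregorianCalendar(2020, Calendar.JANUARY, 1, 14, 5).getTime();
        System.out.println(new SimpleDateFormat("HH:mm").format(time)); // 14:05
        System.out.println(String.format("%tR", time));                 // 14:05
    }
}
```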
