I am trying to parse an incoming string that may or may not contain a time. Both of the following should be accepted:
"2022-03-03" and "2022-03-03 15:10:05".
Every DateTimeFormatter I know of fails on one of the two cases. This is one answer I found, but I don't know whether the time part can be made optional there.
ISO_DATE_TIME.format() to LocalDateTime with optional offset
The idea is that if the time part is not present, it should be set to the end of the day, i.e. 23:59:59.
Any help is appreciated. Thanks!
Well, you could utilize a DateTimeFormatterBuilder to specify defaults for missing fields:
private static LocalDateTime parse(String str) {
    DateTimeFormatter formatter = new DateTimeFormatterBuilder()
            .appendPattern("uuuu-MM-dd[ HH:mm:ss]")
            .parseDefaulting(ChronoField.HOUR_OF_DAY, 23)
            .parseDefaulting(ChronoField.MINUTE_OF_HOUR, 59)
            .parseDefaulting(ChronoField.SECOND_OF_MINUTE, 59)
            .toFormatter();
    return LocalDateTime.parse(str, formatter);
}
The pattern describes the text the formatter will try to parse. Note that square brackets ([]) delimit optional parts: everything between them is either consumed completely or discarded entirely.
With parseDefaulting you can specify default values for fields that are missing. In your case, if the user provides only the date, the hour-of-day, minute-of-hour and second-of-minute fields are missing, which is why defaults must be provided for them.
Example
System.out.println(parse("2022-03-03"));
System.out.println(parse("2022-03-03 15:10:05"));
System.out.println(parse("2025"));
Outputs the following:
2022-03-03T23:59:59
2022-03-03T15:10:05
Exception in thread "main" java.time.format.DateTimeParseException: Text '2025' could not be parsed at index 4
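For completeness, here is a minimal self-contained sketch of the approach above with the imports it needs. The class name OptionalTimeDemo and the helper isParsable are made up for this illustration. It also shows one consequence of the "completely consumed or entirely discarded" rule: a partial time such as 15:10 does not fit the optional section, so the whole input is rejected rather than being half-parsed.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.format.DateTimeParseException;
import java.time.temporal.ChronoField;

// Hypothetical demo class; the names are for illustration only.
final class OptionalTimeDemo {

    // The date is mandatory; " HH:mm:ss" is optional and defaults to 23:59:59.
    private static final DateTimeFormatter FORMATTER = new DateTimeFormatterBuilder()
            .appendPattern("uuuu-MM-dd[ HH:mm:ss]")
            .parseDefaulting(ChronoField.HOUR_OF_DAY, 23)
            .parseDefaulting(ChronoField.MINUTE_OF_HOUR, 59)
            .parseDefaulting(ChronoField.SECOND_OF_MINUTE, 59)
            .toFormatter();

    static LocalDateTime parse(String str) {
        return LocalDateTime.parse(str, FORMATTER);
    }

    // Illustrative helper: true if the input is accepted by the formatter.
    static boolean isParsable(String str) {
        try {
            parse(str);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("2022-03-03"));            // time defaulted
        System.out.println(parse("2022-03-03 15:10:05"));   // time taken from input
        System.out.println(isParsable("2022-03-03 15:10")); // partial time is rejected
    }
}
```

If partial times should be accepted as well, the optional sections would have to be nested, for example [ HH:mm[:ss]].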
Related
I'm trying to parse the date format used in PDFs. According to this page, the format looks as follows:
D:YYYYMMDDHHmmSSOHH'mm'
Where all components except the year are optional. I assume this means the string can be cut off at any point, since specifying, for example, a year and an hour without a month and a day seems kind of pointless to me. Also, it would make parsing pretty much impossible.
As far as I can tell, Java does not support zone offsets containing single quotes. Therefore, the first step would be to get rid of those:
D:YYYYMMDDHHmmSSOHHmm
The resulting Java date pattern should then look like this:
['D:']uuuu[MM[dd[HH[mm[ss[X]]]]]]
And my overall code looks like this:
DateTimeFormatter formatter = DateTimeFormatter.ofPattern("['D:']uuuu[MM[dd[HH[mm[ss[X]]]]]]");
TemporalAccessor temporalAccessor = formatter.parseBest("D:20020101",
        ZonedDateTime::from,
        LocalDateTime::from,
        LocalDate::from,
        Month::from,
        Year::from
);
I would expect that to result in a LocalDate object but what I get is java.time.format.DateTimeParseException: Text 'D:20020101' could not be parsed at index 2.
I've played around a bit with that and found out that everything works fine with the optional literal at the beginning but as soon as I add optional date components, I get an exception.
Can anybody tell me what I'm doing wrong?
Thanks in advance!
I've found a solution:
String dateString = "D:20020101120000+01'00'";
String normalized = dateString.replace("'", "");
DateTimeFormatter formatter = DateTimeFormatter.ofPattern("['D:']ppppy[ppM[ppd[ppH[ppm[pps[X]]]]]]");
TemporalAccessor temporalAccessor = formatter.parseBest(normalized,
        OffsetDateTime::from,
        LocalDateTime::from,
        LocalDate::from,
        YearMonth::from,
        Year::from
);
It seems that without separators the length of each component is ambiguous, which is why parsing the date failed.
With padding specified, the width of each component is stated explicitly and the date can therefore be parsed.
At least that's my theory.
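One way to probe that theory is to state the widths explicitly with a DateTimeFormatterBuilder instead of pad letters. The following is a minimal sketch under that assumption, not a drop-in replacement for the pattern above; the class name PdfDateDemo and the method name parse are invented for the example. Each numeric field is given a fixed width via appendValue(field, width), and every later component sits in its own nested optional section.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.OffsetDateTime;
import java.time.Year;
import java.time.YearMonth;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.temporal.ChronoField;
import java.time.temporal.TemporalAccessor;

// Hypothetical demo class; the names are for illustration only.
final class PdfDateDemo {

    // Fixed-width fields, each later component nested in an optional section.
    // toFormatter() closes any optional sections that are still open.
    private static final DateTimeFormatter FORMATTER = new DateTimeFormatterBuilder()
            .optionalStart().appendLiteral("D:").optionalEnd()
            .appendValue(ChronoField.YEAR, 4)
            .optionalStart().appendValue(ChronoField.MONTH_OF_YEAR, 2)
            .optionalStart().appendValue(ChronoField.DAY_OF_MONTH, 2)
            .optionalStart().appendValue(ChronoField.HOUR_OF_DAY, 2)
            .optionalStart().appendValue(ChronoField.MINUTE_OF_HOUR, 2)
            .optionalStart().appendValue(ChronoField.SECOND_OF_MINUTE, 2)
            .optionalStart().appendOffset("+HHmm", "Z")
            .toFormatter();

    static TemporalAccessor parse(String pdfDate) {
        // Remove the single quotes around the offset parts first.
        String normalized = pdfDate.replace("'", "");
        // The first type that can be built from the parsed fields wins.
        return FORMATTER.parseBest(normalized,
                OffsetDateTime::from,
                LocalDateTime::from,
                LocalDate::from,
                YearMonth::from,
                Year::from);
    }

    public static void main(String[] args) {
        System.out.println(parse("D:2002"));
        System.out.println(parse("D:200201"));
        System.out.println(parse("D:20020101"));
        System.out.println(parse("D:20020101120000+01'00'"));
    }
}
```

The returned type depends on how much of the string is present, so callers still need an instanceof check or a further conversion.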
I was recently trying to write a generic date and time parsing method with the Java 8 time API, mainly for interfacing with older code that uses Date.
I wanted to do something like this:
public static Date parse(String dateStr, String pattern) {
    return Date.from(Instant.parse(dateStr, DateTimeFormatter.ofPattern(pattern)));
}
The problem is that with the time API, the class to use depends on the pattern. DateTimeFormatter.parse will never fail, but it returns a TemporalAccessor, which is horrible to work with and to convert to a usable class.
LocalDateTime.parse, on the other hand, will fail if the pattern has no time information, like "dd/MM/yyyy". Other classes such as Instant, ZonedDateTime, etc. will likewise fail to parse if the pattern doesn't provide the fields the class expects.
Ideally, I'd like a way to parse leniently and return an Instant, with default values for missing fields, but I can't find a way to do that.
Any idea?
You can use DateTimeFormatterBuilder::parseDefaulting to set default values.
var now = ZonedDateTime.now();
DateTimeFormatter formatter = new DateTimeFormatterBuilder()
        .appendPattern(pattern)
        .parseDefaulting(ChronoField.OFFSET_SECONDS, now.getOffset().getTotalSeconds())
        .parseDefaulting(ChronoField.YEAR, now.getYear())
        .parseDefaulting(ChronoField.MONTH_OF_YEAR, now.getMonthValue())
        .parseDefaulting(ChronoField.DAY_OF_MONTH, now.getDayOfMonth())
        .parseDefaulting(ChronoField.HOUR_OF_DAY, now.getHour())
        .parseDefaulting(ChronoField.MINUTE_OF_HOUR, now.getMinute())
        .parseDefaulting(ChronoField.SECOND_OF_MINUTE, now.getSecond())
        .toFormatter(Locale.ROOT);
Instant dt = Instant.from(formatter.parse(str));
Note that it's important to first append the pattern using appendPattern, and then set all your defaults using parseDefaulting.
Also note that I used the current time stamp to fill the defaults. So, for example, if you left out the year, it takes the current year (2022 at the time of writing). Of course, the defaults depend on your exact use case.
Examples:
At the time of writing, it's 2022-06-09T17:18:36+02:00.
System.out.println(parse("9-6", "d-M"));
System.out.println(parse("2023", "uuuu"));
System.out.println(parse("10:13", "H:m"));
System.out.println(parse("25 Dec, 16:22", "d MMM, H:mm"));
resolves to
2022-06-09T15:18:36Z
2023-06-09T15:18:36Z
2022-06-09T08:13:36Z
2022-12-25T14:22:36Z
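To make those results reproducible, the snippet can be wrapped in a method that receives the reference time as a parameter instead of calling ZonedDateTime.now(). This is only a sketch; the class name DefaultingParseDemo and the extra now parameter are assumptions added for the example and are not part of the answer above.

```java
import java.time.Instant;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.temporal.ChronoField;
import java.util.Locale;

// Hypothetical demo class; the names are for illustration only.
final class DefaultingParseDemo {

    // 'now' supplies the default for every field the pattern does not parse.
    static Instant parse(String str, String pattern, ZonedDateTime now) {
        DateTimeFormatter formatter = new DateTimeFormatterBuilder()
                .appendPattern(pattern) // the pattern first, the defaults afterwards
                .parseDefaulting(ChronoField.OFFSET_SECONDS, now.getOffset().getTotalSeconds())
                .parseDefaulting(ChronoField.YEAR, now.getYear())
                .parseDefaulting(ChronoField.MONTH_OF_YEAR, now.getMonthValue())
                .parseDefaulting(ChronoField.DAY_OF_MONTH, now.getDayOfMonth())
                .parseDefaulting(ChronoField.HOUR_OF_DAY, now.getHour())
                .parseDefaulting(ChronoField.MINUTE_OF_HOUR, now.getMinute())
                .parseDefaulting(ChronoField.SECOND_OF_MINUTE, now.getSecond())
                .toFormatter(Locale.ROOT);
        return Instant.from(formatter.parse(str));
    }

    public static void main(String[] args) {
        // Fixed reference time, matching the time of writing quoted above.
        ZonedDateTime now = ZonedDateTime.parse("2022-06-09T17:18:36+02:00");
        System.out.println(parse("9-6", "d-M", now));
        System.out.println(parse("2023", "uuuu", now));
        System.out.println(parse("10:13", "H:m", now));
    }
}
```

Passing the reference time in also makes the method straightforward to unit-test, since the result no longer depends on the wall clock.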
I need to read a CSV file that has different time formats in a single timestamp column. Each value can be in any of the 5 formats listed below. I need to detect the format of the value in each row and parse it accordingly.
Please suggest how to validate and parse it. Thanks in advance.
public static final String DEFAULT_DATE_FORMAT_PATTERN = "yyyy-MM-dd";
public static final String DEFAULT_DATE_TIME_FORMAT_PATTERN = "yyyy-MM-dd HH:mm:ss.SSS";
public static final String DATE_TIME_MINUTES_ONLY_FORMAT_PATTERN = "yyyy-MM-dd HH:mm";
public static final String DATE_TIME_WITHOUT_MILLIS_FORMAT_PATTERN = "yyyy-MM-dd HH:mm:ss";
The fifth format is epoch time in milliseconds.
What you need is a formatter with optional parts. A pattern can contain square brackets to denote an optional part, for example HH:mm[:ss]. The formatter then is required to parse HH:mm, and tries to parse the following text as :ss, or skips it if that fails. yyyy-MM-dd[ HH:mm[:ss[.SSS]]] would then be the pattern.
There is only one issue here: when you try to parse a string that matches only yyyy-MM-dd (so without the time part) using LocalDateTime::parse, it will throw a DateTimeParseException with the message Unable to obtain LocalDateTime from TemporalAccessor. Apparently, at least one time part must be available to succeed.
Luckily, we can use a DateTimeFormatterBuilder to build a pattern, instructing the formatter to use some defaults if information is missing from the parsed text. Here it is:
DateTimeFormatter formatter = new DateTimeFormatterBuilder()
        .appendPattern("yyyy-MM-dd[ HH:mm[:ss[.SSS]]]")
        .parseDefaulting(ChronoField.HOUR_OF_DAY, 0)
        .parseDefaulting(ChronoField.MINUTE_OF_HOUR, 0)
        .parseDefaulting(ChronoField.SECOND_OF_MINUTE, 0)
        .toFormatter();
LocalDateTime dateTime = LocalDateTime.parse(input, formatter);
Tests:
String[] inputs = {
    "2020-10-22", // OK
    "2020-10-22 14:55", // OK
    "2020-10-22T14:55", // Fails: incorrect format
    "2020-10-22 14:55:23", // OK
    "2020-10-22 14:55:23.9", // Fails: incorrect fraction of second
    "2020-10-22 14:55:23.91", // Fails: incorrect fraction of second
    "2020-10-22 14:55:23.917", // OK
    "2020-10-22 14:55:23.9174", // Fails: incorrect fraction of second
    "2020-10-22 14:55:23.917428511" // Fails: incorrect fraction of second
};
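To check a list like this yourself, a small harness can run each input through the formatter and report the outcome. The sketch below is one way to do it; the class name OptionalPartsDemo, the helper tryParse and the "FAILS" marker are invented for the example.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.format.DateTimeParseException;
import java.time.temporal.ChronoField;

// Hypothetical demo class; the names are for illustration only.
final class OptionalPartsDemo {

    private static final DateTimeFormatter FORMATTER = new DateTimeFormatterBuilder()
            .appendPattern("yyyy-MM-dd[ HH:mm[:ss[.SSS]]]")
            .parseDefaulting(ChronoField.HOUR_OF_DAY, 0)
            .parseDefaulting(ChronoField.MINUTE_OF_HOUR, 0)
            .parseDefaulting(ChronoField.SECOND_OF_MINUTE, 0)
            .toFormatter();

    // Returns the parsed value as text, or "FAILS" if the input is rejected.
    static String tryParse(String input) {
        try {
            return LocalDateTime.parse(input, FORMATTER).toString();
        } catch (DateTimeParseException e) {
            return "FAILS";
        }
    }

    public static void main(String[] args) {
        String[] inputs = {
            "2020-10-22",
            "2020-10-22 14:55",
            "2020-10-22T14:55",
            "2020-10-22 14:55:23",
            "2020-10-22 14:55:23.9",
            "2020-10-22 14:55:23.917",
            "2020-10-22 14:55:23.9174"
        };
        for (String input : inputs) {
            System.out.println(input + " -> " + tryParse(input));
        }
    }
}
```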
And what about epoch in milli?
Well, this cannot be parsed directly by the DateTimeFormatter. But what's more: an epoch in milli has an implicit timezone: UTC. The other patterns lack a timezone. So an epoch is a fundamentally different piece of information. One thing you could do is assume a timezone for the inputs missing one.
However, if you nevertheless want to parse the instant, you could try to parse it as a long using Long::parseLong, and if that fails, try to parse with the formatter. Alternatively, you could use a regular expression (like -?\d+ or something) to try to match the instant; if it matches, parse the text as an instant, and if it doesn't, try to parse with the above-mentioned formatter.
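A minimal sketch of the regular-expression variant might look like this. It returns an Instant for both kinds of input, which forces a choice of time zone for the values that lack one; UTC is used here purely as an assumption for the example. The class name TimestampDemo and the constant names are likewise invented.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.temporal.ChronoField;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are for illustration only.
final class TimestampDemo {

    // An optional minus sign followed by digits only: treated as epoch milliseconds.
    private static final Pattern EPOCH_MILLIS = Pattern.compile("-?\\d+");

    // ASSUMPTION: inputs without zone information are interpreted as UTC.
    private static final ZoneId ASSUMED_ZONE = ZoneOffset.UTC;

    private static final DateTimeFormatter FORMATTER = new DateTimeFormatterBuilder()
            .appendPattern("yyyy-MM-dd[ HH:mm[:ss[.SSS]]]")
            .parseDefaulting(ChronoField.HOUR_OF_DAY, 0)
            .parseDefaulting(ChronoField.MINUTE_OF_HOUR, 0)
            .parseDefaulting(ChronoField.SECOND_OF_MINUTE, 0)
            .toFormatter();

    static Instant parse(String input) {
        if (EPOCH_MILLIS.matcher(input).matches()) {
            // Throws NumberFormatException if the digits do not fit in a long.
            return Instant.ofEpochMilli(Long.parseLong(input));
        }
        return LocalDateTime.parse(input, FORMATTER).atZone(ASSUMED_ZONE).toInstant();
    }

    public static void main(String[] args) {
        System.out.println(parse("1000"));
        System.out.println(parse("2020-10-22 14:55"));
    }
}
```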
The brute force approach:
Simply try your 4 formats, one after the other, to parse the incoming string.
If parsing throws an exception, try the next one.
If parsing passes, well, that format just matched.
Of course, if we are talking about larger tables, that is quite inefficient. Possible optimisations:
The different patterns have subtle differences, so you could use indexOf() checks first. For example, if the value to be parsed contains no ':' character, then it can only be the first pattern.
You can look at your data manually to figure out the actual distribution of the patterns in use, then order the patterns you try by how likely each one is to occur in your data.
Alternatively, you could define your own regex. The only thing that makes it slightly ugly is the fact that your input uses month names, not month numbers. But I think it shouldn't be too hard to write a single regex that covers all your cases.
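As a rough sketch of the brute force approach, the four patterns from the question can be tried in turn, falling through to the next one whenever a DateTimeParseException is thrown. The class name BruteForceDemo is invented, and mapping a date-only value to the start of the day is an assumption made for the example.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.temporal.TemporalAccessor;

// Hypothetical demo class; the names are for illustration only.
final class BruteForceDemo {

    // The four text patterns from the question; reorder by likelihood if needed.
    private static final DateTimeFormatter[] FORMATTERS = {
        DateTimeFormatter.ofPattern("yyyy-MM-dd"),
        DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSS"),
        DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm"),
        DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")
    };

    static LocalDateTime parse(String text) {
        for (DateTimeFormatter formatter : FORMATTERS) {
            try {
                TemporalAccessor parsed =
                        formatter.parseBest(text, LocalDateTime::from, LocalDate::from);
                if (parsed instanceof LocalDateTime) {
                    return (LocalDateTime) parsed;
                }
                // ASSUMPTION: a date without a time means the start of that day.
                return ((LocalDate) parsed).atStartOfDay();
            } catch (DateTimeParseException e) {
                // This format did not match; try the next one.
            }
        }
        throw new IllegalArgumentException("No known format matches: " + text);
    }

    public static void main(String[] args) {
        System.out.println(parse("2020-10-22"));
        System.out.println(parse("2020-10-22 14:55:23"));
    }
}
```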
I am receiving a timestamp in the format HHmmss followed by milliseconds and microseconds. The microseconds after the '.' are optional.
For example: "timestamp ":"152656375.489991" is 15:26:56:375.489991.
The code below throws an exception:
final DateTimeFormatter FORMATTER = new DateTimeFormatterBuilder()
        .appendPattern("HHmmssSSS")
        .appendFraction(ChronoField.MICRO_OF_SECOND, 0, 6, true)
        .toFormatter();
LocalTime.parse(dateTime, FORMATTER);
Can someone please help me with a DateTimeFormatter to get a LocalTime in Java?
Here is the stacktrace from the exception from the code above:
java.time.format.DateTimeParseException: Text '152656375.489991' could not be parsed: Conflict found: NanoOfSecond 375000000 differs from NanoOfSecond 489991000 while resolving MicroOfSecond
at java.base/java.time.format.DateTimeFormatter.createError(DateTimeFormatter.java:1959)
at java.base/java.time.format.DateTimeFormatter.parse(DateTimeFormatter.java:1894)
at java.base/java.time.LocalTime.parse(LocalTime.java:463)
at com.ajax.so.Test.main(Test.java:31)
Caused by: java.time.DateTimeException: Conflict found: NanoOfSecond 375000000 differs from NanoOfSecond 489991000 while resolving MicroOfSecond
at java.base/java.time.format.Parsed.updateCheckConflict(Parsed.java:329)
at java.base/java.time.format.Parsed.resolveTimeFields(Parsed.java:462)
at java.base/java.time.format.Parsed.resolveFields(Parsed.java:267)
at java.base/java.time.format.Parsed.resolve(Parsed.java:253)
at java.base/java.time.format.DateTimeParseContext.toResolved(DateTimeParseContext.java:331)
at java.base/java.time.format.DateTimeFormatter.parseResolved0(DateTimeFormatter.java:1994)
at java.base/java.time.format.DateTimeFormatter.parse(DateTimeFormatter.java:1890)
... 3 more
There are many options, depending on the possible variations in the strings you need to parse.
1. Modify the string so you need no formatter
String timestampString = "152656375.489991";
timestampString = timestampString.replaceFirst(
        "^(\\d{2})(\\d{2})(\\d{2})(\\d{3})(?:\\.(\\d*))?$", "$1:$2:$3.$4$5");
System.out.println(timestampString);
LocalTime time = LocalTime.parse(timestampString);
System.out.println(time);
The output from this snippet is the same text twice, first the modified string and then the parsed time:
15:26:56.375489991
15:26:56.375489991
The replaceFirst() call modifies your string into 15:26:56.375489991, the default format for LocalTime (ISO 8601) so it can be parsed without any explicit formatter. For this I am using a regular expression that may not be too readable. (…) enclose groups that I use as $1, $2, etc., in the replacement string. (?:…) denotes a non-capturing group, that is, cannot be used in the replacement string. I put a ? after it to specify that this group is optional in the original string.
This solution accepts from 1 through 6 decimals after the point and also no fractional part at all.
2. Use a simpler string modification and a formatter
I want to modify the string so I can use this formatter:
private static DateTimeFormatter fullParser
        = DateTimeFormatter.ofPattern("HHmmss.[SSSSSSSSS][SSS]");
This requires the point to be after the seconds rather than after the milliseconds. So move it three places to the left:
timestampString = timestampString.replaceFirst("(\\d{3})(?:\\.|$)", ".$1");
LocalTime time = LocalTime.parse(timestampString, fullParser);
15:26:56.375489991
Again I am using a non-capturing group, this time to say that after the (captured) group of three digits must come either a dot or the end of the string.
3. The same with a more flexible parser
The formatter above specifies that there must be either 9 or 3 digits after the decimal point, which may be too rigid. If you want to accept something in between too, a builder can build a more flexible formatter:
private static DateTimeFormatter fullParser = new DateTimeFormatterBuilder()
        .appendPattern("HHmmss")
        .appendFraction(ChronoField.NANO_OF_SECOND, 3, 9, true)
        .toFormatter();
I think that this would be my favourite approach, again depending on the exact requirements.
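Put together with the string modification from approach 2, a complete sketch could look like the following. The class name CompactTimeDemo and the method name parse are made up for the example.

```java
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.temporal.ChronoField;

// Hypothetical demo class; the names are for illustration only.
final class CompactTimeDemo {

    // HHmmss followed by a decimal point and 3 to 9 fraction digits.
    private static final DateTimeFormatter FULL_PARSER = new DateTimeFormatterBuilder()
            .appendPattern("HHmmss")
            .appendFraction(ChronoField.NANO_OF_SECOND, 3, 9, true)
            .toFormatter();

    static LocalTime parse(String timestampString) {
        // Move the point three places to the left, or insert one before the
        // last three digits if the string has no point at all.
        String moved = timestampString.replaceFirst("(\\d{3})(?:\\.|$)", ".$1");
        return LocalTime.parse(moved, FULL_PARSER);
    }

    public static void main(String[] args) {
        System.out.println(parse("152656375.489991"));
        System.out.println(parse("152656375"));
    }
}
```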
4. Parse only a part of the string
There is no problem so big and awful that it cannot simply be run away from (Linus in Peanuts, from memory)
If you can live without the microseconds, ignore them:
private static DateTimeFormatter partialParser
        = DateTimeFormatter.ofPattern("HHmmssSSS");
To parse only the part of the string up to the point using this formatter:
TemporalAccessor parsed
        = partialParser.parse(timestampString, new ParsePosition(0));
LocalTime time = LocalTime.from(parsed);
15:26:56.375
As you can see it has ignored the part from the decimal point, which I wouldn’t find too satisfactory.
What went wrong in your code?
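Your formatter reads the 6 digits after the decimal point as a fraction of a whole second, that is, as 489991 microseconds, which conflicts with the 375 milliseconds already parsed by SSS; this is the conflict reported in the exception. In your string those 6 digits are really a fraction of a millisecond and go down to nanoseconds; microseconds would have been only 3 decimals after the milliseconds. To use appendFraction() to parse these you would have needed a TemporalField of nano of millisecond. The ChronoField enum offers nano of day and nano of second, but not nano of milli. TemporalField is an interface, so in theory we could develop our own nano of milli class for the purpose. I tried to develop a custom class like that once, but gave up; I couldn’t get it to work.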
Links
Wikipedia article: ISO 8601
Regular expressions in Java - Tutorial
String dateString = "20110706 1607";
DateTimeFormatter dateStringFormat = DateTimeFormat.forPattern("YYYYMMDD HHMM");
DateTime dateTime = dateStringFormat.parseDateTime(dateString);
Resulting stacktrace:
Exception in thread "main" java.lang.IllegalArgumentException: Invalid format: "201107206 1607" is malformed at " 1607"
at org.joda.time.format.DateTimeFormatter.parseMillis(DateTimeFormatter.java:644)
at org.joda.time.convert.StringConverter.getInstantMillis(StringConverter.java:65)
at org.joda.time.base.BaseDateTime.<init>(BaseDateTime.java:171)
at org.joda.time.DateTime.<init>(DateTime.java:168)
......
Any thoughts? If I truncate the string to 20110706 with pattern "YYYYMMDD" it works, but I need the hour and minute values as well. What's odd is that I can convert a Joda-Time DateTime to a String using the same pattern "YYYYMMDD HHMM" without issue.
Thanks for looking
Look at your pattern - you're specifying "MM" twice. That can't possibly be right. That would be trying to parse the same field (month in this case) twice from two different bits of the text. Which would you expect to win? You want:
DateTimeFormat.forPattern("yyyyMMdd HHmm")
Look at the documentation for DateTimeFormat to see what everything means.
Note that although calling toString with that pattern will produce a string, it won't produce the string you want it to. I wouldn't be surprised if the output even included "YYYY" and "DD" due to the casing, although I can't test it right now. At the very least you'd have the month twice instead of the minutes appearing at the end.