Time interval localization in Java

I need to represent a time interval as a localized string, such as "10 hours 25 minutes 1 second", depending on the Locale.
It is pretty easy to implement by hand in English:
String hourStr = hours == 1 ? "hour" : "hours"; // etc.
But I need some "out-of-the-box" Java (maybe Java 8) mechanism that follows the rules of different languages.
Does Java have one, or do I need to implement it myself for each Locale used in the app?
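For comparison, the closest built-in JDK mechanism is the choice format of java.text.MessageFormat, which selects text by numeric range. The sketch below is a minimal, English-only illustration of the hand-written approach from the question. The class name IntervalText and its patterns are hypothetical; in a real application the patterns would come from a per-Locale ResourceBundle. Range-based choices cover English-style singular/plural, but they cannot express plural rules that depend on the last digits of a number (as in Polish), so this does not answer the "all languages" part of the question.

```java
import java.text.MessageFormat;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class IntervalText {
    // Hypothetical English-only patterns. ChoiceFormat picks a branch by numeric
    // range: "0#" covers values below 1, "1#" covers exactly 1, "1<" covers values above 1.
    private static final String[] PATTERNS = {
        "{0,choice,0#{0,number,integer} hours|1#{0,number,integer} hour|1<{0,number,integer} hours}",
        "{0,choice,0#{0,number,integer} minutes|1#{0,number,integer} minute|1<{0,number,integer} minutes}",
        "{0,choice,0#{0,number,integer} seconds|1#{0,number,integer} second|1<{0,number,integer} seconds}"
    };

    // Splits a non-negative number of seconds into hours, minutes and seconds,
    // skipping zero parts (but printing "0 seconds" for an empty interval).
    static String format(long totalSeconds, Locale locale) {
        long[] parts = { totalSeconds / 3600, totalSeconds % 3600 / 60, totalSeconds % 60 };
        List<String> words = new ArrayList<>();
        for (int i = 0; i < parts.length; i++) {
            boolean lastAndEmpty = i == parts.length - 1 && words.isEmpty();
            if (parts[i] != 0 || lastAndEmpty) {
                MessageFormat mf = new MessageFormat(PATTERNS[i], locale);
                words.add(mf.format(new Object[] { parts[i] }));
            }
        }
        return String.join(" ", words);
    }

    public static void main(String[] args) {
        System.out.println(format(10 * 3600 + 25 * 60 + 1, Locale.US)); // 10 hours 25 minutes 1 second
    }
}
```

Calling IntervalText.format(3600, Locale.US) gives "1 hour", and format(7202, Locale.US) gives "2 hours 2 seconds".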

Look at Joda-Time. As of version 2.5, it supports English, Danish, Dutch, French, German, Japanese, Polish, Portuguese and Spanish.
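import java.util.Locale;
import org.joda.time.LocalDate;
import org.joda.time.Period;
import org.joda.time.format.PeriodFormat;
import org.joda.time.format.PeriodFormatter;
Period period = new Period(new LocalDate(2013, 4, 11), LocalDate.now());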
PeriodFormatter formatter = PeriodFormat.wordBased(Locale.GERMANY);
System.out.println(formatter.print(period)); // output: 1 Jahr, 2 Monate und 3 Wochen
formatter = formatter.withLocale(Locale.ENGLISH);
System.out.println(formatter.print(period)); // output: 1 Jahr, 2 Monate und 3 Wochen (bug???)
formatter = PeriodFormat.wordBased(Locale.ENGLISH);
System.out.println(formatter.print(period)); // output: 1 year, 2 months and 3 weeks
You might need to adjust the punctuation characters, however. To do this, you might need to copy and edit the message resource files on your classpath, which have this format (English variant shown here):
PeriodFormat.space=\
PeriodFormat.comma=,
PeriodFormat.commandand=,and
PeriodFormat.commaspaceand=, and
PeriodFormat.commaspace=,
PeriodFormat.spaceandspace=\ and
PeriodFormat.year=\ year
PeriodFormat.years=\ years
PeriodFormat.month=\ month
PeriodFormat.months=\ months
PeriodFormat.week=\ week
PeriodFormat.weeks=\ weeks
PeriodFormat.day=\ day
PeriodFormat.days=\ days
PeriodFormat.hour=\ hour
PeriodFormat.hours=\ hours
PeriodFormat.minute=\ minute
PeriodFormat.minutes=\ minutes
PeriodFormat.second=\ second
PeriodFormat.seconds=\ seconds
PeriodFormat.millisecond=\ millisecond
PeriodFormat.milliseconds=\ milliseconds
Since version 2.5, it might also be possible to apply complex regular expressions to model more complex plural rules. Personally, I find this user-unfriendly, and my first impression is that regular expressions might not be sufficient for languages like Arabic. There are also other limitations with localization; see this pull request under debate.
Side note: Java 8 is definitely not able to do localized duration formatting.
UPDATE from 2015-08-26:
With version 4.3 of my library Time4J (available on Maven Central), the following, more powerful solution is possible, which currently supports 45 languages:
import java.util.Locale;
import net.time4j.*;
import net.time4j.format.TextWidth;
import static net.time4j.CalendarUnit.*;
import static net.time4j.ClockUnit.*;
// the input for creating the duration (in Joda-Time called Period)
IsoUnit[] units = {YEARS, MONTHS, DAYS, HOURS, MINUTES, SECONDS};
PlainTimestamp start = PlainDate.of(2013, 4, 11).atTime(13, 45, 21);
PlainTimestamp end = SystemClock.inLocalView().now();
// create the duration
Duration<?> duration = Duration.in(units).between(start, end);
// print the duration (here not abbreviated, but with full unit names)
String s = PrettyTime.of(Locale.US).print(duration, TextWidth.WIDE);
System.out.println(s);
// example output: 1 year, 5 months, 7 days, 3 hours, 25 minutes, and 49 seconds
Why is Time4J better for your problem?
It has a more expressive way to say in which units a duration should be calculated.
It supports 45 languages.
It supports the sometimes complex plural rules of languages, including right-to-left scripts such as Arabic, without any need for manual configuration.
It supports locale-dependent list patterns (usage of comma, space or words like "and")
It supports 3 different text widths: WIDE, ABBREVIATED (SHORT) and NARROW
Interoperability with Java 8 is better because Java 8 types like java.time.Period or java.time.Duration are understood by Time4J.


Parse period/duration string literal into Period/Duration instance in Java

I would like to parse any duration/period string literal (in the format shown in the examples below) into a Period/Duration instance in Java.
By duration/period, I mean an amount of time that contains both date-based and time-based quantities. In Java, Duration alone only works for time-based quantities like seconds, minutes, and hours, while Period alone only works for date-based quantities like years, months, and days.
The string to parse won't always contain all quantities.
Examples:
29 days 23 hours 59 minutes 20 seconds
4 years 10 months 10 seconds
10 days 9 minutes
1 month
How would I do that, if possible?
Convert your string to ISO 8601 format and parse into PeriodDuration
My idea is to use the PeriodDuration class of the ThreeTen Extra project. It combines the Period and Duration classes that you already mention. A limitation is that its parse method only accepts ISO 8601 format, so we need to convert your string into that format first. ISO 8601 format is like P10DT9M for 10 days 9 minutes. The P is fixed. The T marks the beginning of the time-based part (hours, minutes, seconds), if there is one, separating it from the date-based part (years, months, weeks, days). A nice consequence is that we can always tell whether M means months or minutes: if M comes before T or there is no T, it means months; if M comes after T, it means minutes. My first code snippet converts your string to ISO 8601.
String[] examples = {
"29 days 23 hours 59 minutes 20 seconds",
"4 years 10 months 10 seconds",
"10 days 9 minutes",
"1 month"
};
for (String pds : examples) {
// If hours, minutes or seconds are present, put a T before them
String temp = pds.replaceFirst("(\\d+ *(?:hour|minute|second))", " T $1");
// Abbreviate all units to 1 letter; remove spaces
temp = temp.replaceAll("([ymwdhms])[a-z]*", "$1").replace(" ", "");
// Prepend P
String iso = "P" + temp;
System.out.println(iso);
}
Output so far is:
P29dT23h59m20s
P4y10mT10s
P10dT9m
P1m
After this we should be able to parse:
PeriodDuration pd = PeriodDuration.parse(iso);
I have not installed ThreeTen Extra, so I have not tried this last line before posting it. The OP reported in a comment that it works like a charm on ThreeTen Extra 1.6.0. The only caveat is that the units have to be in descending order when parsing the string; otherwise it throws a DateTimeParseException.
Don’t worry that ISO 8601 format is usually given in upper case and your units are in lower case. The documentation of the parse method states:
The sections have suffixes in ASCII of "Y" for years, "M" for months,
"W" for weeks, "D" for days, "H" for hours, "M" for minutes, "S" for
seconds, accepted in upper or lower case.
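Edit: With thanks to Arvind Kumar Avinash for the comment: if you need to accept upper case letters too, add the (?i) flag expression to enable case-insensitive matching. It belongs in both regular expressions; otherwise a capitalized "Hours" would not get a T inserted before it:
String temp = pds.replaceFirst("(?i)(\\d+ *(?:hour|minute|second))", " T $1");
temp = temp.replaceAll("(?i)([ymwdhms])[a-z]*", "$1").replace(" ", "");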
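If adding ThreeTen Extra is not an option, the same ISO 8601 string can be split at the T and handed to the JDK's own Period.parse and Duration.parse, both of which also accept lower-case unit letters. This is a sketch under the assumption that a separate Period and Duration are acceptable as the result; the holder class PeriodDurationSketch is hypothetical and, unlike PeriodDuration, is not a single TemporalAmount.

```java
import java.time.Duration;
import java.time.Period;

class PeriodDurationSketch {
    final Period period;     // date-based part: years, months, weeks, days
    final Duration duration; // time-based part: hours, minutes, seconds

    private PeriodDurationSketch(Period period, Duration duration) {
        this.period = period;
        this.duration = duration;
    }

    static PeriodDurationSketch parse(String text) {
        // Same conversion as in the answer: insert T before the first time unit,
        // abbreviate each unit to one letter, remove spaces, prepend P.
        String temp = text.replaceFirst("(\\d+ *(?:hour|minute|second))", " T $1");
        temp = temp.replaceAll("([ymwdhms])[a-z]*", "$1").replace(" ", "");
        String iso = "P" + temp; // e.g. "P4y10mT10s"

        // Split at T: "P4y10m" goes to Period.parse, "PT10s" goes to Duration.parse.
        int t = iso.indexOf('T');
        String datePart = t < 0 ? iso : iso.substring(0, t);
        Period period = datePart.equals("P") ? Period.ZERO : Period.parse(datePart);
        Duration duration = t < 0 ? Duration.ZERO : Duration.parse("P" + iso.substring(t));
        return new PeriodDurationSketch(period, duration);
    }

    public static void main(String[] args) {
        PeriodDurationSketch pd = parse("4 years 10 months 10 seconds");
        System.out.println(pd.period + " " + pd.duration); // P4Y10M PT10S
    }
}
```

Keeping the two parts separate mirrors the distinction the question draws between Period and Duration; combining them into one object is exactly what PeriodDuration adds.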
Links
ThreeTen Extra project home
Documentation of PeriodDuration.parse()
Wikipedia article: ISO 8601

SimpleDateFormat error while parsing a date yyyyMMdd format java [duplicate]

In java.util.Calendar, January is defined as month 0, not month 1. Is there any specific reason for that?
I have seen many people getting confused about that...
It's just part of the horrendous mess which is the Java date/time API. Listing what's wrong with it would take a very long time (and I'm sure I don't know half of the problems). Admittedly working with dates and times is tricky, but aaargh anyway.
Do yourself a favour and use Joda Time instead, or possibly JSR-310.
EDIT: As for the reasons why - as noted in other answers, it could well be due to old C APIs, or just a general feeling of starting everything from 0... except that days start with 1, of course. I doubt whether anyone outside the original implementation team could really state reasons - but again, I'd urge readers not to worry so much about why bad decisions were taken, as to look at the whole gamut of nastiness in java.util.Calendar and find something better.
One point which is in favour of using 0-based indexes is that it makes things like "arrays of names" easier:
// I "know" there are 12 months
String[] monthNames = new String[12]; // and populate...
String name = monthNames[calendar.get(Calendar.MONTH)];
Of course, this fails as soon as you get a calendar with 13 months... but at least the size specified is the number of months you expect.
This isn't a good reason, but it's a reason...
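The JDK itself relies on this convention: java.text.DateFormatSymbols.getMonths() returns month names in an array indexed like Calendar.MONTH, and the array has 13 slots to leave room for Calendar.UNDECIMBER (empty for the Gregorian calendar). A small sketch; the class and method names are illustrative:

```java
import java.text.DateFormatSymbols;
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.Locale;

class MonthNames {
    // getMonths() is indexed the same way as Calendar.MONTH (0 = January).
    static String nameOf(Calendar calendar, Locale locale) {
        String[] names = new DateFormatSymbols(locale).getMonths();
        return names[calendar.get(Calendar.MONTH)];
    }

    public static void main(String[] args) {
        Calendar march = new GregorianCalendar(2024, Calendar.MARCH, 15);
        System.out.println(nameOf(march, Locale.ENGLISH)); // March
        // 12 Gregorian months plus an empty 13th slot for Calendar.UNDECIMBER.
        System.out.println(new DateFormatSymbols(Locale.ENGLISH).getMonths().length); // 13
    }
}
```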
EDIT: As a comment sort of requests some ideas about what I think is wrong with Date/Calendar:
Surprising bases (1900 as the year base in Date, admittedly for deprecated constructors; 0 as the month base in both)
Mutability - using immutable types makes it much simpler to work with what are really effectively values
An insufficient set of types: it's nice to have Date and Calendar as different things, but the separation of "local" vs "zoned" values is missing, as is date/time vs date vs time
An API which leads to ugly code with magic constants, instead of clearly named methods
An API which is very hard to reason about - all the business about when things are recomputed etc
The use of parameterless constructors to default to "now", which leads to hard-to-test code
The Date.toString() implementation which always uses the system local time zone (that's confused many Stack Overflow users before now)
Because doing math with months is much easier.
1 month after December is January, but to figure this out you would normally have to take the month number and do math:
12 + 1 = 13 // What month is 13?
I know! I can fix this quickly by using a modulus of 12.
(12 + 1) % 12 = 1
This works just fine for 11 months until November...
(11 + 1) % 12 = 0 // What month is 0?
You can make all of this work again by subtracting 1 before you add the month, then do your modulus and finally add 1 back again... aka work around an underlying problem.
((11 - 1 + 1) % 12) + 1 = 12 // Lots of magical numbers!
Now let's think about the problem with months 0 - 11.
(0 + 1) % 12 = 1 // February
(1 + 1) % 12 = 2 // March
(2 + 1) % 12 = 3 // April
(3 + 1) % 12 = 4 // May
(4 + 1) % 12 = 5 // June
(5 + 1) % 12 = 6 // July
(6 + 1) % 12 = 7 // August
(7 + 1) % 12 = 8 // September
(8 + 1) % 12 = 9 // October
(9 + 1) % 12 = 10 // November
(10 + 1) % 12 = 11 // December
(11 + 1) % 12 = 0 // January
All of the months work the same and a workaround isn't necessary.
C-based languages copy C to some degree. The tm structure (defined in time.h) has an integer field tm_mon with the (commented) range of 0-11.
C-based languages start arrays at index 0. So this was convenient for outputting a string in an array of month names, with tm_mon as the index.
There have been a lot of answers to this, but I will give my view on the subject anyway.
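The arithmetic above can be checked with a few lines of Java. The helper names are hypothetical; java.time.Month.plus is shown for comparison because it is 1-based yet wraps around without any manual modulus:

```java
import java.time.Month;

class MonthMath {
    // 0-based numbering (0 = January ... 11 = December): one modulus is enough.
    static int nextZeroBased(int month) {
        return (month + 1) % 12;
    }

    // 1-based numbering (1 = January ... 12 = December): subtract 1, add the month,
    // take the modulus, then add 1 back.
    static int nextOneBased(int month) {
        return (month - 1 + 1) % 12 + 1;
    }

    public static void main(String[] args) {
        System.out.println(nextZeroBased(11));      // 0 (December -> January)
        System.out.println(nextOneBased(12));       // 1 (December -> January)
        System.out.println(Month.DECEMBER.plus(1)); // JANUARY
    }
}
```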
The reason behind this odd behavior, as stated previously, comes from the POSIX C time.h where the months were stored in an int with the range 0-11.
To explain why, look at it like this: years and days are considered numbers in spoken language, but months have their own names. So because January is the first month it will be stored as offset 0, the first array element. monthname[JANUARY] would be "January". The first month in the year is the first month array element.
The day numbers, on the other hand, do not have names, so storing them in an int as 0-30 would be confusing, would add a lot of day+1 instructions for output and, of course, would be prone to a lot of bugs.
That being said, the inconsistency is confusing, especially in JavaScript (which has also inherited this "feature"), a scripting language where this should be abstracted far away from the language.
TL;DR: Because months have names and days of the month do not.
In Java 8, there is a new Date/Time API JSR 310 that is more sane. The spec lead is the same as the primary author of JodaTime and they share many similar concepts and patterns.
I'd say laziness. Arrays start at 0 (everyone knows that); the months of the year are an array, which leads me to believe that some engineer at Sun just didn't bother to put this one little nicety into the Java code.
Probably because C's "struct tm" does the same.
Because programmers are obsessed with 0-based indexes. OK, it's a bit more complicated than that: it makes more sense when you're working with lower-level logic to use 0-based indexing. But by and large, I'll still stick with my first sentence.
java.time.Month
Java provides you another way to use 1-based indexes for months: the java.time.Month enum. One object is predefined for each of the twelve months. Each is assigned a number, 1-12 for January-December; call getValue for the number.
Make use of Month.JULY (getValue() gives you 7)
instead of Calendar.JULY (which is 6).
(import java.time.*;)
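When legacy Calendar code and java.time meet, the offset of one has to be applied explicitly at the boundary. A minimal conversion sketch; the helper names are hypothetical:

```java
import java.time.Month;
import java.util.Calendar;
import java.util.GregorianCalendar;

class LegacyMonths {
    // Calendar.MONTH is 0-based and java.time.Month is 1-based: add 1 going in ...
    static Month toMonth(Calendar calendar) {
        return Month.of(calendar.get(Calendar.MONTH) + 1);
    }

    // ... and subtract 1 going back to a Calendar month constant.
    static int toCalendarMonth(Month month) {
        return month.getValue() - 1;
    }

    public static void main(String[] args) {
        Calendar july = new GregorianCalendar(2024, Calendar.JULY, 4);
        System.out.println(Calendar.JULY);         // 6
        System.out.println(Month.JULY.getValue()); // 7
        System.out.println(toMonth(july));         // JULY
    }
}
```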
Personally, I took the strangeness of the Java calendar API as an indication that I needed to divorce myself from the Gregorian-centric mindset and try to program more agnostically in that respect. Specifically, I learned once again to avoid hardcoded constants for things like months.
Which of the following is more likely to be correct?
if (date.getMonth() == 3) out.print("March");
if (date.getMonth() == Calendar.MARCH) out.print("March");
This illustrates one thing that irks me a little about Joda Time - it may encourage programmers to think in terms of hardcoded constants. (Only a little, though. It's not as if Joda is forcing programmers to program badly.)
For me, nobody explains it better than mindprod.com:
Gotchas
java.util.GregorianCalendar has far fewer bugs and gotchas than the old java.util.Date class but it is still no picnic.
Had there been programmers when Daylight Saving Time was first proposed, they would have vetoed it as insane and intractable. With daylight saving, there is a fundamental ambiguity. In the fall when you set your clocks back one hour at 2 AM there are two different instants in time both called 1:30 AM local time. You can tell them apart only if you record whether you intended daylight saving or standard time with the reading.
Unfortunately, there is no way to tell GregorianCalendar which you intended. You must resort to telling it the local time with the dummy UTC TimeZone to avoid the ambiguity. Programmers usually close their eyes to this problem and just hope nobody does anything during this hour.
Millennium bug. The bugs are still not out of the Calendar classes. Even in JDK (Java Development Kit) 1.3 there is a 2001 bug. Consider the following code:
GregorianCalendar gc = new GregorianCalendar();
gc.setLenient( false );
/* Bug only manifests if lenient set false */
gc.set( 2001, 1, 1, 1, 0, 0 );
int year = gc.get ( Calendar.YEAR );
/* throws exception */
The bug disappears at 7AM on 2001/01/01 for MST.
GregorianCalendar is controlled by a giant of pile of untyped int magic constants. This technique totally destroys any hope of compile-time error checking. For example to get the month you use GregorianCalendar.get(Calendar.MONTH).
GregorianCalendar has the raw GregorianCalendar.get(Calendar.ZONE_OFFSET) and the daylight savings GregorianCalendar.get(Calendar.DST_OFFSET), but no way to get the actual time zone offset being used. You must get these two separately and add them together.
GregorianCalendar.set( year, month, day, hour, minute) does not set the seconds to 0.
DateFormat and GregorianCalendar do not mesh properly. You must specify the Calendar twice, once indirectly as a Date.
If the user has not configured his time zone correctly it will default quietly to either PST or GMT.
In GregorianCalendar, Months are numbered starting at January=0, rather than 1 as everyone else on the planet does. Yet days start at 1 as do days of the week with Sunday=1, Monday=2,… Saturday=7. Yet DateFormat.parse behaves in the traditional way with January=1.
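The ZONE_OFFSET plus DST_OFFSET point from the quote can be illustrated as below. The zone and dates are arbitrary example choices and the class name is illustrative; java.time's ZonedDateTime.getOffset() is shown as a way to get the combined offset in one call:

```java
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.TimeZone;

class OffsetDemo {
    // Legacy way: add the raw zone offset and the daylight-saving offset yourself.
    static int legacyOffsetMillis(TimeZone zone, int year, int month0, int day, int hour) {
        Calendar cal = new GregorianCalendar(zone);
        cal.clear();
        cal.set(year, month0, day, hour, 0, 0);
        return cal.get(Calendar.ZONE_OFFSET) + cal.get(Calendar.DST_OFFSET);
    }

    public static void main(String[] args) {
        TimeZone zone = TimeZone.getTimeZone("Europe/Copenhagen");
        System.out.println(legacyOffsetMillis(zone, 2021, Calendar.JULY, 1, 12)); // 7200000
        // java.time reports the combined offset directly.
        ZonedDateTime zdt = ZonedDateTime.of(2021, 7, 1, 12, 0, 0, 0, ZoneId.of("Europe/Copenhagen"));
        System.out.println(zdt.getOffset()); // +02:00
    }
}
```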
The true reason why
You would think that when we deprecated most of Date and added the new Calendar class, we would have fixed Date's biggest annoyance: the fact that January is month 0. We certainly should have, but unfortunately we didn't. We were afraid that programmers would be confused if Date used zero-based months and Calendar used one-based months. And a few programmers probably would have been. But in hindsight, the fact that Calendar is still zero-based has caused an enormous amount of confusion, and it was probably the biggest single mistake in the Java international API's.
Quoted from International Calendars in Java by Laura Werner, link at the bottom.
The better alternative: java.time
This may just repeat what others have said: throw the old and poorly designed Calendar class overboard and use java.time, the modern Java date and time API. There, months are consistently and sanely numbered from 1 for January through 12 for December.
If you are getting a Calendar from a legacy API not yet upgraded to java.time, the first thing to do is to convert to a modern ZonedDateTime. Depending on your needs you may do further conversions from there. In most of the world the Calendar object you get will virtually always be an instance of the GregorianCalendar subclass (since the Calendar class itself is abstract). To demonstrate:
Calendar oldfashionedCalendarObject = Calendar.getInstance();
ZonedDateTime zdt
= ((GregorianCalendar) oldfashionedCalendarObject).toZonedDateTime();
System.out.println(zdt);
System.out.format("Month is %d or %s%n", zdt.getMonthValue(), zdt.getMonth());
Output when I ran just now in my time zone:
2021-03-17T23:18:47.761+01:00[Europe/Copenhagen]
Month is 3 or MARCH
Links
International Calendars in Java by Laura Werner
Oracle tutorial: Date Time explaining how to use java.time.
tl;dr
Month.FEBRUARY.getValue() // February → 2.
2
Details
The Answer by Jon Skeet is correct.
Now we have a modern replacement for those troublesome old legacy date-time classes: the java.time classes.
java.time.Month
Among those classes is the Month enum. An enum carries one or more predefined objects, objects that are automatically instantiated when the class loads. On Month we have a dozen such objects, each given a name: JANUARY, FEBRUARY, MARCH, and so on. Each of those is a static final public class constant. You can use and pass these objects anywhere in your code. Example: someMethod( Month.AUGUST )
Fortunately, they have sane numbering, 1-12 where 1 is January and 12 is December.
Get a Month object for a particular month number (1-12).
Month month = Month.of( 2 ); // 2 → February.
Going the other direction, ask a Month object for its month number.
int monthNumber = Month.FEBRUARY.getValue(); // February → 2.
There are many other handy methods on this class, such as knowing the number of days in each month. The class can even generate a localized name of the month.
You can get the localized name of the month, in various lengths or abbreviations.
String output =
Month.FEBRUARY.getDisplayName(
TextStyle.FULL ,
Locale.CANADA_FRENCH
);
février
Also, you should pass objects of this enum around your code base rather than mere integer numbers. Doing so provides type-safety, ensures a valid range of values, and makes your code more self-documenting. See Oracle Tutorial if unfamiliar with the surprisingly powerful enum facility in Java.
You also may find useful the Year and YearMonth classes.
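A short sketch of the month-length helpers mentioned above; the wrapper class and method names are illustrative. Month on its own has to be told whether the year is a leap year, while YearMonth carries the year and can answer directly:

```java
import java.time.Month;
import java.time.Year;
import java.time.YearMonth;

class MonthLengths {
    // YearMonth knows the year, so it accounts for leap years itself.
    static int daysIn(int year, Month month) {
        return YearMonth.of(year, month).lengthOfMonth();
    }

    public static void main(String[] args) {
        System.out.println(Month.FEBRUARY.length(false)); // 28
        System.out.println(Month.FEBRUARY.maxLength());   // 29
        System.out.println(daysIn(2024, Month.FEBRUARY)); // 29
        System.out.println(Year.of(2023).isLeap());       // false
    }
}
```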
About java.time
The java.time framework is built into Java 8 and later. These classes supplant the troublesome old legacy date-time classes such as java.util.Date, .Calendar, & java.text.SimpleDateFormat.
The Joda-Time project, now in maintenance mode, advises migration to java.time.
To learn more, see the Oracle Tutorial. And search Stack Overflow for many examples and explanations. Specification is JSR 310.
Where to obtain the java.time classes?
Java SE 8 and SE 9 and later
Built-in.
Part of the standard Java API with a bundled implementation.
Java 9 adds some minor features and fixes.
Java SE 6 and SE 7
Much of the java.time functionality is back-ported to Java 6 & 7 in ThreeTen-Backport.
Android
The ThreeTenABP project adapts ThreeTen-Backport (mentioned above) for Android specifically.
See How to use….
The ThreeTen-Extra project extends java.time with additional classes. This project is a proving ground for possible future additions to java.time. You may find some useful classes here such as Interval, YearWeek, YearQuarter, and more.
Set the month to Calendar.MARCH, or compare to see if it == Calendar.JUNE, for example.
The Date and Calendar classes date back to the very early days of Java, when folks were still figuring things out, and they are widely regarded as not very well designed.
If Calendar were created today with the same design, rather than ints for Calendar.JUNE, etc., they'd use enums.
It isn't exactly defined as zero per se; it's defined as Calendar.JANUARY. The problem is using ints as constants instead of enums: Calendar.JANUARY == 0.
Because language writing is harder than it looks, and handling time in particular is a lot harder than most people think. For a small part of the problem (in reality, not Java), see the YouTube video "The Problem with Time & Timezones - Computerphile" at https://www.youtube.com/watch?v=-5wpm-gesOY. Don't be surprised if your head falls off from laughing in confusion.
In addition to DannySmurf's answer of laziness, I'll add that it's to encourage you to use the constants, such as Calendar.JANUARY.
Because everything starts with 0. This is a basic fact of programming in Java. If one thing were to deviate from that, it would lead to a whole slew of confusion. Let's not argue about how they were formed, and just code with them.

SimpleDateFormat: Inconsistent Pattern Letters

Recently I looked into the Documentation for SimpleDateFormat and noticed some inconsistencies (in my opinion) in how they handle the letters for parsing.
For example, look at these representations:
M: Month in year
D: Day in year
d: Day in month
"x in year" is a bigger timespan than "x in month" and therefore uses uppercase letters, so this makes perfect sense to me.
But then there is
w: Week in year
W: Week in month
Here, the letters are swapped, which is totally counter-intuitive in my opinion. It seems like these two should be the other way around, to conform to the "pattern" mentioned above.
Another example is the different hour representations:
H: Hour in day (0-23)
k: Hour in day (1-24)
K: Hour in am/pm (0-11)
h: Hour in am/pm (1-12)
I kinda get the idea. Uppercase letters for hours starting with 0, lowercase letters for hours starting with 1.
Here, both lowercase letters should be swapped, because shouldn't the same letters belong to the same category? (H/h for hour in day, K/k for hour in am/pm)
So my question is this: Is there a reason behind this seemingly counter-intuitive representation?
The only reason I could think of is that some of these pattern letters were added at a later time and they couldn't change the already existing ones because of backward compatibility. But other than that, it doesn't make much sense to me.
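The four hour letters listed in the question are easiest to compare by formatting a few times of day with all of them at once. This sketch (class name illustrative) uses a fixed UTC time zone and Locale.ROOT so the output does not depend on the machine's defaults:

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

class HourLetters {
    // Formats a UTC time of day on 1970-01-01 with H, k, K and h side by side.
    static String hours(int hour, int minute) {
        SimpleDateFormat f = new SimpleDateFormat("H k K h", Locale.ROOT);
        f.setTimeZone(TimeZone.getTimeZone("UTC"));
        long millis = (hour * 60L + minute) * 60_000L;
        return f.format(new Date(millis));
    }

    public static void main(String[] args) {
        System.out.println(hours(0, 30));  // 0 24 0 12
        System.out.println(hours(12, 30)); // 12 12 0 12
        System.out.println(hours(13, 0));  // 13 13 1 1
    }
}
```

Midnight and noon are where the 0-based letters (H, K) and the 1-based letters (k, h) part ways.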
Citation:
"The only reason i could think of is, that some of these pattern letters were added at a later time and they couldn't change the already existing ones, because of downwards compatibility."
Your suspicion is correct. But you cannot (only) blame the Sun or Oracle designers for that. They simply took over the whole thing from Taligent (now merged into IBM). And IBM itself is one of the leading companies behind the Unicode Consortium, which defined the CLDR standard. In that standard all these pattern symbols were defined (indeed in a totally inconsistent manner, only explainable by historical development).
Worse, the inconsistencies in CLDR don't stop: recently we got a NARROW variant in addition to SHORT, LONG, etc. That means if you want the shortest possible representation of a month as a single letter, you need to specify the pattern symbol MMMMM (5 letters, because one letter M is already reserved for the numerical short form).
Another note: SimpleDateFormat does not even strictly follow CLDR. For example, Oracle defined the pattern symbol "u" as the ISO day number of week (1 = Monday, ..., 7 = Sunday) in Java 7, although CLDR had already introduced the same symbol earlier for the proleptic ISO year. And Java 8 deviates again: it invents new symbols not known in CLDR but otherwise tries to follow CLDR more closely.
We already have remarkable differences between the pattern languages (compare Java 6, Java 7, Java 8, pure CLDR and Joda-Time). And I fear this will never stop.
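The difference shows up when the same five-letter pattern is given to both Java APIs. In this sketch (class and method names illustrative), java.time's DateTimeFormatter treats MMMMM as the narrow form, while SimpleDateFormat treats four or more letters as the full name. Narrow text comes from locale data, so the exact letter could differ between JDK versions or locale providers:

```java
import java.text.SimpleDateFormat;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.Locale;

class NarrowMonth {
    // java.time: 5 pattern letters select the NARROW text style.
    static String javaTime(String pattern) {
        return DateTimeFormatter.ofPattern(pattern, Locale.ENGLISH).format(LocalDate.of(2021, 2, 1));
    }

    // SimpleDateFormat: 4 or more pattern letters always mean the full name.
    static String legacy(String pattern) {
        Calendar cal = new GregorianCalendar(2021, Calendar.FEBRUARY, 1);
        return new SimpleDateFormat(pattern, Locale.ENGLISH).format(cal.getTime());
    }

    public static void main(String[] args) {
        System.out.println(javaTime("MMMMM")); // F
        System.out.println(legacy("MMMMM"));   // February
    }
}
```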

Parsing String to Dates

This is the problem:
I have some .csv files with travels info, and the dates appear like strings (each line for one travel):
"All Mondays from January-May and October-December. All days from June To September"
"All Fridays from February to June"
"Monday, Friday and Saturday and Sunday from 10 January to 30 April"
"from 01 of November to 30 April. All days except fridays from 2 to 24 of november and sunday from 2 to 30 of december"
"All sundays from 02 december to 28 april"
"5, 12, 20 of march, 11, 18 of april, 2, 16, 30 of may, 6, 13, 27 june"
"All saturdays from February to June, and from September to December"
"1 to 17 of december, 1 to 31 of january"
"All mondays from February to november"
I must parse the strings to Dates, and keep them into an array for each travel.
The problem is that I don't know how to do it. Even my university teachers told me that they don't know how to do so :S. I can't find/create a pattern using http://docs.oracle.com/javase/6/docs/api/java/text/SimpleDateFormat.html
After parsing them, I have to search for all travels between two dates.
But how? How do I parse them? Is it possible?
This requires Natural Language Processing (NLP) , see Wikipedia for an account:
http://en.wikipedia.org/wiki/Natural_language_processing.
Your problem as stated is very hard. There are many ways of representing a single date, and your examples include ranges of dates and formulae for generating dates. It sounds as if you have a limited subset of language - frequent use of "all", "from", etc.
If you are in control of the language (i.e. these are being generated by humans who comply with your documentation) then you have a chance of formalising it (although it will take a lot of work - months). If you are not in charge of it, then every time a new phrase appears you will have to add it to the specs.
I suggest you go through the file and look for stock phrases such as "All [weekdayname]s [from | between | until | before]" or "in [January | February ...]". Then substitute these into phrases. If you find this covers all the cases, you may be able to extract particular phrases. But if you have anaphora like "next Tuesday", it will be much harder.
You're in the domain of NLP (Natural Language Processing), what is possible or impossible is fuzzy in this domain. From a fast Google search, I've found that the Natty Date Parser might be useful for you.
For more theoretical background on NLP, you might be interested in the Natural Language Processing course from Stanford University on Coursera (at the moment the course is not open for enrolment, but the lectures are available for free).
You can also use a set of strict regular expressions that would match only one of your possible cases and apply them from the most restrictive to the most relaxed.
The first thing I would define to attack your problem is what you expect as an output of your method, since in some cases it's a single date, in some cases an interval, in some others multiple intervals.
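As an illustration of the strict-regular-expression approach, here is a sketch for just one of the phrase shapes in the question, "All <weekday>s from <month> to <month>". It assumes the year is known from elsewhere and that the range does not wrap past December; the class and method names are hypothetical. Returning Optional.empty() lets a caller fall through to the next, more relaxed pattern, and the output is a plain list of dates:

```java
import java.time.DayOfWeek;
import java.time.LocalDate;
import java.time.Month;
import java.time.YearMonth;
import java.time.temporal.TemporalAdjusters;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TravelDates {
    private static final String MONTHS =
        "(january|february|march|april|may|june|july|august|september|october|november|december)";
    // One strict, case-insensitive pattern, e.g. "All Fridays from February to June".
    private static final Pattern ALL_WEEKDAY_FROM_TO = Pattern.compile(
        "(?i)all (monday|tuesday|wednesday|thursday|friday|saturday|sunday)s from "
            + MONTHS + " to " + MONTHS);

    // Every matching weekday from the 1st of the first month to the end of the last month.
    static Optional<List<LocalDate>> expand(String text, int year) {
        Matcher m = ALL_WEEKDAY_FROM_TO.matcher(text.trim());
        if (!m.matches()) {
            return Optional.empty(); // not this phrase shape; try another pattern
        }
        DayOfWeek day = DayOfWeek.valueOf(m.group(1).toUpperCase(Locale.ROOT));
        Month from = Month.valueOf(m.group(2).toUpperCase(Locale.ROOT));
        Month to = Month.valueOf(m.group(3).toUpperCase(Locale.ROOT));
        LocalDate first = LocalDate.of(year, from, 1).with(TemporalAdjusters.nextOrSame(day));
        LocalDate last = YearMonth.of(year, to).atEndOfMonth();
        List<LocalDate> dates = new ArrayList<>();
        for (LocalDate d = first; !d.isAfter(last); d = d.plusWeeks(1)) {
            dates.add(d);
        }
        return Optional.of(dates);
    }

    public static void main(String[] args) {
        List<LocalDate> fridays = expand("All Fridays from February to June", 2013).get();
        System.out.println(fridays.size() + " Fridays, " + fridays.get(0) + " .. " + fridays.get(fridays.size() - 1));
        // 22 Fridays, 2013-02-01 .. 2013-06-28
    }
}
```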

ISO 8601 Time Interval Parsing in Java

ISO 8601 defines a syntax for representing a time interval.
There are four ways to express a time interval:
Start and end, such as "2007-03-01T13:00:00Z/2008-05-11T15:30:00Z"
Start and duration, such as "2007-03-01T13:00:00Z/P1Y2M10DT2H30M"
Duration and end, such as "P1Y2M10DT2H30M/2008-05-11T15:30:00Z"
Duration only, such as "P1Y2M10DT2H30M", with additional context information
If any elements are missing from the end value, they are assumed to be the same as for the start value including the time zone. This feature of the standard allows for concise representations of time intervals. For example, the date of a two-hour meeting including the start and finish times could be simply shown as "2007-12-14T13:30/15:30", where "/15:30" implies "/2007-12-14T15:30" (the same date as the start), or the beginning and end dates of a monthly billing period as "2008-02-15/03-14", where "/03-14" implies "/2008-03-14" (the same year as the start).
In addition, repeating intervals are formed by adding "R[n]/" to the beginning of an interval expression, where R is used as the letter itself and [n] is replaced by the number of repetitions. Leaving out the value for [n] means an unbounded number of repetitions. So, to repeat the interval of "P1Y2M10DT2H30M" five times starting at "2008-03-01T13:00:00Z", use "R5/2008-03-01T13:00:00Z/P1Y2M10DT2H30M".
I am looking for a good Java parser (if possible compatible with the Joda-Time library) to parse this syntax. Any pointers to a good library?
java.time
The java.time framework built into Java 8 and later has a Duration.parse method for parsing an ISO 8601 formatted duration:
java.time.Duration d = java.time.Duration.parse("PT1H2M34S");
System.out.println("Duration in seconds: " + d.get(java.time.temporal.ChronoUnit.SECONDS));
Prints Duration in seconds: 3754
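Duration.parse only covers days and time-based units, so strings with years or months (like those in the question) need Period.parse for the date-based part. A small check of both; the class and method names are illustrative:

```java
import java.time.Duration;
import java.time.Period;
import java.time.format.DateTimeParseException;

class IsoDurationParts {
    // True if java.time.Duration can parse the string on its own.
    static boolean durationAccepts(String iso) {
        try {
            Duration.parse(iso);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Days are accepted and treated as exactly 24 hours each.
        System.out.println(Duration.parse("P2DT1H45M").toMinutes()); // 2985
        // Years and months are rejected by Duration ...
        System.out.println(durationAccepts("P1Y2M10DT2H30M"));        // false
        // ... but the date-based part parses as a Period.
        System.out.println(Period.parse("P1Y2M10D"));                // P1Y2M10D
    }
}
```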
For anyone on a project that might be restricted from using 3rd-party libraries (licensing reasons, or whatever), Java itself has provided at least a portion of this capability since Java 5, via the javax.xml.datatype.DatatypeFactory.newDuration(String) method and the Duration class. The DatatypeFactory.newDuration(String) method will parse a string in "PnYnMnDTnHnMnS" format. These classes are intended for XML manipulation, but since XML uses ISO 8601 time notation, they also serve as convenient duration parsing utilities.
Example:
import javax.xml.datatype.*;
Duration dur = DatatypeFactory.newInstance().newDuration("PT5H12M36S");
int hours = dur.getHours(); // Should return 5
I haven't personally used any duration format except the 4th one you list, so I can't vouch for whether it successfully parses them or not.
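For the duration-only form, a quick sketch (wrapper names illustrative) shows that the XML Duration keeps years and months as separate fields, which java.time.Duration cannot do. DatatypeFactory.newInstance() throws a checked DatatypeConfigurationException, wrapped here for convenience:

```java
import javax.xml.datatype.DatatypeConfigurationException;
import javax.xml.datatype.DatatypeFactory;
import javax.xml.datatype.Duration;

class XmlDurationDemo {
    static Duration parse(String lexical) {
        try {
            return DatatypeFactory.newInstance().newDuration(lexical);
        } catch (DatatypeConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Duration d = parse("P1Y2M10DT2H30M");
        // Each component is kept as written, not normalized into days or seconds.
        System.out.println(d.getYears() + "y " + d.getMonths() + "m " + d.getDays() + "d "
                + d.getHours() + "h " + d.getMinutes() + "m"); // 1y 2m 10d 2h 30m
    }
}
```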
I take it you have already tried Joda-Time? Feeding the example strings from your question through Interval.parse(Object) reveals that it can handle "start and end", "start and duration" and "duration and end", but not implied fields nor repetition.
2007-03-01T13:00:00Z/2008-05-11T15:30:00Z => from 2007-03-01T13:00:00.000Z to 2008-05-11T15:30:00.000Z
2007-03-01T13:00:00Z/P1Y2M10DT2H30M => from 2007-03-01T13:00:00.000Z to 2008-05-11T15:30:00.000Z
P1Y2M10DT2H30M/2008-05-11T15:30:00Z => from 2007-03-01T13:00:00.000Z to 2008-05-11T15:30:00.000Z
2007-12-14T13:30/15:30 => java.lang.IllegalArgumentException: Invalid format: "15:30" is malformed at ":30"
R5/2008-03-01T13:00:00Z/P1Y2M10DT2H30M => java.lang.IllegalArgumentException: Invalid format: "R5"
The only other comprehensive date/time library that I know of is JSR-310, which does not appear to handle intervals like these.
At this point, building your own improvements on top of Joda-Time is probably your best choice, sorry. Are there any specific ISO interval formats that you need to handle beyond those already supported by Joda-Time?
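As a rough idea of what building your own could look like on java.time instead of Joda-Time, this sketch handles only the first three forms with complete UTC timestamps (no implied end fields, no repetition). It splits the duration at T into a Period and a Duration; the class name IsoIntervalSketch is hypothetical:

```java
import java.time.Duration;
import java.time.Instant;
import java.time.OffsetDateTime;
import java.time.Period;
import java.time.ZoneOffset;

class IsoIntervalSketch {
    // "P1Y2M10DT2H30M" -> Period P1Y2M10D (the part before T)
    static Period periodPart(String iso) {
        int t = iso.indexOf('T');
        String date = t < 0 ? iso : iso.substring(0, t);
        return date.equals("P") ? Period.ZERO : Period.parse(date);
    }

    // "P1Y2M10DT2H30M" -> Duration PT2H30M (the part from T on)
    static Duration durationPart(String iso) {
        int t = iso.indexOf('T');
        return t < 0 ? Duration.ZERO : Duration.parse("P" + iso.substring(t));
    }

    // Returns {start, end} for start/end, start/duration or duration/end.
    static Instant[] parse(String text) {
        String[] parts = text.split("/");
        if (parts[0].startsWith("P")) {
            Instant end = Instant.parse(parts[1]);
            OffsetDateTime start = end.atOffset(ZoneOffset.UTC)
                    .minus(periodPart(parts[0]))
                    .minus(durationPart(parts[0]));
            return new Instant[] { start.toInstant(), end };
        }
        Instant start = Instant.parse(parts[0]);
        if (parts[1].startsWith("P")) {
            OffsetDateTime end = start.atOffset(ZoneOffset.UTC)
                    .plus(periodPart(parts[1]))
                    .plus(durationPart(parts[1]));
            return new Instant[] { start, end.toInstant() };
        }
        return new Instant[] { start, Instant.parse(parts[1]) };
    }

    public static void main(String[] args) {
        Instant[] i = parse("2007-03-01T13:00:00Z/P1Y2M10DT2H30M");
        System.out.println(i[0] + " to " + i[1]); // 2007-03-01T13:00:00Z to 2008-05-11T15:30:00Z
    }
}
```

These results agree with the Joda-Time output shown above for the same three strings.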
The only library capable of modeling all the interval-parsing features you want is actually my library Time4J (range module). Examples:
// case 1 (start/end)
System.out.println(MomentInterval.parseISO("2012-01-01T14:15Z/2014-06-20T16:00Z"));
// output: [2012-01-01T14:15:00Z/2014-06-20T16:00:00Z)
// case 1 (with some elements missing at end component and different offset)
System.out.println(MomentInterval.parseISO("2012-01-01T14:15Z/08-11T16:00+00:01"));
// output: [2012-01-01T14:15:00Z/2012-08-11T15:59:00Z)
// case 1 (with missing date and offset at end component)
System.out.println(MomentInterval.parseISO("2012-01-01T14:15Z/16:00"));
// output: [2012-01-01T14:15:00Z/2012-01-01T16:00:00Z)
// case 2 (start/duration)
System.out.println(MomentInterval.parseISO("2012-01-01T14:15Z/P2DT1H45M"));
// output: [2012-01-01T14:15:00Z/2012-01-03T16:00:00Z)
// case 3 (duration/end)
System.out.println(MomentInterval.parseISO("P2DT1H45M/2012-01-01T14:15Z"));
// output: [2011-12-30T12:30:00Z/2012-01-01T14:15:00Z)
// case 4 (duration only, in standard ISO-format)
Duration<IsoUnit> isoDuration = Duration.parsePeriod("P2DT1H45M");
// case 4 (duration only, in alternative representation)
Duration<IsoUnit> altDuration = Duration.parsePeriod("P0000-01-01T15:00");
System.out.println(altDuration); // output: P1M1DT15H
Some remarks:
Other interval classes exist with similar parsing capabilities, for example DateInterval or TimestampInterval in the package net.time4j.range.
For handling durations only (which can span both calendar and clock units as well), see also the javadoc. There are also formatting features, see nested class Duration.Formatter or the localized version net.time4j.PrettyTime (actually in 86 languages).
Interoperability is offered with Java-8 (java.time-package) but not with Joda-Time. For example: The start or end component of a MomentInterval can easily be queried by getStartAsInstant() or getEndAsInstant().
Repeating intervals are supported by the class IsoRecurrence. Example:
IsoRecurrence<MomentInterval> ir =
IsoRecurrence.parseMomentIntervals("R5/2008-03-01T13:00:00Z/P1Y2M10DT2H30M");
ir.intervalStream().forEach(System.out::println);
Output:
[2008-03-01T13:00:00Z/2009-05-11T15:30:00Z)
[2009-05-11T15:30:00Z/2010-07-21T18:00:00Z)
[2010-07-21T18:00:00Z/2011-10-01T20:30:00Z)
[2011-10-01T20:30:00Z/2012-12-11T23:00:00Z)
[2012-12-11T23:00:00Z/2014-02-22T01:30:00Z)
