SimpleDateFormat: Inconsistent Pattern Letters - java

Recently I looked into the Documentation for SimpleDateFormat and noticed some inconsistencies (in my opinion) in how they handle the letters for parsing.
For example, look at these representations:
M: Month in year
D: Day in year
d: Day in month
"x in year" is a bigger timespan than "x in month" and has therefore uppercase letters so this makes perfect sense to me.
But then there is
w: Week in year
W: Week in month
Here, the letters are swapped, which is totally counter-intuitive in my opinion. It seems like these two should be the other way around, to conform to the "pattern" mentioned above.
Another example is the set of hour representations:
H: Hour in day (0-23)
k: Hour in day (1-24)
K: Hour in am/pm (0-11)
h: Hour in am/pm (1-12)
I kinda get the idea. Uppercase letters for hours starting with 0, lowercase letters for hours starting with 1.
Here, both lowercase letters should be swapped, because shouldn't the same letters belong to the same category? (H/h for hour in day, K/k for hour in am/pm)
So my question is this: Is there a reason behind this seemingly counter-intuitive representation?
The only reason I could think of is that some of these pattern letters were added at a later time and the existing ones couldn't be changed because of backward compatibility. But other than that, it doesn't make much sense to me.
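To make the letters from the question concrete, here is a minimal sketch that prints them for midnight and noon. The class name PatternLetterDemo and the helper methods are made up for illustration; the time zone is pinned to UTC and the locale to Locale.ROOT so the result does not depend on the machine.

```java
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;
import java.util.TimeZone;

class PatternLetterDemo {
    private static final TimeZone UTC = TimeZone.getTimeZone("UTC");

    // Builds a Date for a UTC wall-clock time; month is a zero-based Calendar constant.
    static Date utc(int year, int month, int day, int hour) {
        Calendar cal = new GregorianCalendar(UTC, Locale.ROOT);
        cal.clear();
        cal.set(year, month, day, hour, 0, 0);
        return cal.getTime();
    }

    // Formats the date with the given SimpleDateFormat pattern in UTC.
    static String format(String pattern, Date date) {
        SimpleDateFormat sdf = new SimpleDateFormat(pattern, Locale.ROOT);
        sdf.setTimeZone(UTC);
        return sdf.format(date);
    }

    public static void main(String[] args) {
        Date midnight = utc(2020, Calendar.FEBRUARY, 1, 0);
        Date noon = utc(2020, Calendar.FEBRUARY, 1, 12);
        System.out.println(format("'H='H 'k='k 'K='K 'h='h", midnight)); // H=0 k=24 K=0 h=12
        System.out.println(format("'H='H 'k='k 'K='K 'h='h", noon));     // H=12 k=12 K=0 h=12
        System.out.println(format("'D='D 'd='d", midnight));             // D=32 d=1
    }
}
```

The week letters w and W are left out on purpose, because their values depend on the locale's week definition.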

Citation:
"The only reason i could think of is, that some of these pattern
letters were added at a later time and they couldn't change the
already existing ones, because of downwards compatibility."
Your suspicion is correct. But you cannot blame (only) the Sun or Oracle designers for that. They simply took over the whole design from Taligent (now merged into IBM). IBM in turn is one of the leading companies behind the Unicode Consortium, which defined the CLDR standard. All these pattern symbols were defined in that standard (indeed in a totally inconsistent manner, explainable only by historical development).
Worse, the inconsistencies in CLDR don't stop: recently we got a NARROW variant in addition to SHORT, LONG etc. That means if you want the shortest possible representation of a month as a single letter, you need to specify the pattern symbol MMMMM (5 letters, because the single letter M is already reserved for the short numerical form).
Another note: SimpleDateFormat does not even strictly follow CLDR. For example, in Java 7 Oracle defined the pattern symbol "u" as the ISO day number of the week (1 = Monday, ..., 7 = Sunday), although CLDR had already introduced the same symbol earlier for the proleptic ISO year. And Java 8 deviates again: it invents new symbols not known in CLDR but otherwise tries to follow CLDR more closely.
We already have remarkable differences between pattern languages (compare Java 6, Java 7, Java 8, pure CLDR and Joda-Time). And I fear this will never stop.
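The clash over the symbol "u" mentioned above can be observed directly by feeding the same one-letter pattern to both formatters. This is only a small sketch; the class name PatternDialects and the sample date are chosen for illustration, and the SimpleDateFormat part assumes Java 7 or later.

```java
import java.text.SimpleDateFormat;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

class PatternDialects {
    // 2020-12-28 was a Monday.
    static final LocalDate MONDAY = LocalDate.of(2020, 12, 28);

    // SimpleDateFormat reads "u" as the day number of the week (1 = Monday).
    static String legacyU() {
        SimpleDateFormat sdf = new SimpleDateFormat("u", Locale.ROOT);
        sdf.setTimeZone(TimeZone.getTimeZone("UTC"));
        Date date = Date.from(MONDAY.atStartOfDay(ZoneOffset.UTC).toInstant());
        return sdf.format(date);
    }

    // DateTimeFormatter reads "u" as the (proleptic) year.
    static String modernU() {
        return DateTimeFormatter.ofPattern("u", Locale.ROOT).format(MONDAY);
    }

    public static void main(String[] args) {
        System.out.println("SimpleDateFormat  u = " + legacyU()); // 1
        System.out.println("DateTimeFormatter u = " + modernU()); // 2020
        // Five letters request the narrow month form in java.time.
        System.out.println(DateTimeFormatter.ofPattern("MMMMM", Locale.ENGLISH).format(MONDAY));
    }
}
```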

Related

JSR310 How to internationalize a pattern for a month-day?

I'm trying to format a date without a year (just day and month, e.g. 12.10).
DateTimeFormatter.ofLocalizedDate(FormatStyle.SHORT) still yields the year for me (12.10.20).
So I tried DateTimeFormatter.ofPattern("dd. MM"), but that obviously hardcodes the order and the dot, which won't make American users happy (they expect slashes and the month first).
How can I internationalize a pattern? Is there some abstract syntax for separators etc.?
Well, as Ole pointed out there is no 100% satisfying solution using java.time only. But my library Time4J has found a solution based on the data of the CLDR repository (ICU4J also gives support) using the type AnnualDate (as replacement for MonthDay):
LocalDate yourLocalDate = ...;
MonthDay md = MonthDay.from(yourLocalDate);
AnnualDate ad = AnnualDate.from(md);
ChronoFormatter<AnnualDate> usStyle =
    ChronoFormatter.ofStyle(DisplayMode.SHORT, Locale.US, AnnualDate.chronology());
ChronoFormatter<AnnualDate> germanStyle =
    ChronoFormatter.ofStyle(DisplayMode.SHORT, Locale.GERMANY, AnnualDate.chronology());
System.out.println("US-format: " + usStyle.format(ad)); // US-format: 12/31
System.out.println("German: " + germanStyle.format(ad)); // German: 31.12.
I don’t think that a solution can be made that gives 100 % satisfactory results for all locales. Let’s give it a shot anyway.
Locale formattingLocale = Locale.getDefault(Locale.Category.FORMAT);
String formatPattern = DateTimeFormatterBuilder.getLocalizedDateTimePattern(
        FormatStyle.SHORT, null, IsoChronology.INSTANCE, formattingLocale);
// If year comes first, remove it and all punctuation and space before and after it
formatPattern = formatPattern.replaceFirst("^\\W*[yu]+\\W*", "")
        // If year comes last and is preceded by a space somewhere, break at the space
        // (preserve any punctuation before the space)
        .replaceFirst("\\s\\W*[yu]+\\W*$", "")
        // Otherwise if year comes last, remove it and all punctuation and space before and after it
        .replaceFirst("\\W*[yu]+\\W*$", "");
DateTimeFormatter monthDayFormatter
        = DateTimeFormatter.ofPattern(formatPattern, formattingLocale);
For comparison I am printing a date both using the normal formatter with year from your question and using my prepared formatter.
LocalDate exampleDate = LocalDate.of(2020, Month.DECEMBER, 31);
System.out.format(formattingLocale, "%-11s %s%n",
        exampleDate.format(DateTimeFormatter.ofLocalizedDate(FormatStyle.SHORT)),
        exampleDate.format(monthDayFormatter));
Output in French locale (Locale.FRENCH):
31/12/2020 31/12
In Locale.GERMAN:
31.12.20 31.12
Edit: My German girlfriend informs me that this is wrong. We should always write a dot after each of the two numbers because both are ordinal numbers. Meno Hochschild, the German author of the other answer, also produces 31.12. with two dots for German.
In Locale.US:
12/31/20 12/31
It might make American users happy. In Swedish (Locale.forLanguageTag("sv")):
2020-12-31 12-31
In a comment I mentioned Bulgarian (bg):
31.12.20 г. 31.12
As far as I have understood, “г.” (Cyrillic g and a dot) is an abbreviation of a word that means year, so when leaving out the year, we should probably leave this abbreviation out too. I’m in doubt whether we ought to include the dot after 12.
Finally Croatian (hr):
31. 12. 2020. 31. 12.
How the code works: We first ask DateTimeFormatterBuilder for the short date format pattern for the locale. I assume that this is the pattern that your formatter from the question also uses behind the scenes (haven’t checked). I then use different regular expressions to remove the year from different variants; see the comments in the code. Year may be represented by y or u, so I take both into account (in practice y is used). Now it’s trivial to build a new formatter from the modified pattern. For the Bulgarian: from my point of view there is an error in Java regular expressions: they don’t recognize Cyrillic letters as word characters, which is why г was removed too (the error is in the documentation too; it claims that a word character is [a-zA-Z_0-9]).
We were lucky, though, in our case it produces the result that I wanted.
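The handling of the Cyrillic г can be checked in isolation. By default the predefined classes \w and \W are ASCII-only, as the documentation states; compiling with Pattern.UNICODE_CHARACTER_CLASS (or the embedded flag (?U)) makes them Unicode-aware. The sketch below applies the last of the three regular expressions to a sample pattern string. Both the sample string and the class name WordCharacterDemo are assumptions for illustration, not output taken from a JDK.

```java
import java.util.regex.Pattern;

class WordCharacterDemo {
    // True if the whole string is a single "non-word" character under the chosen mode.
    static boolean isNonWord(String s, boolean unicodeAware) {
        int flags = unicodeAware ? Pattern.UNICODE_CHARACTER_CLASS : 0;
        return Pattern.compile("\\W", flags).matcher(s).matches();
    }

    // Applies the "year comes last" regex from the answer under the chosen mode.
    static String stripTrailingYear(String datePattern, boolean unicodeAware) {
        int flags = unicodeAware ? Pattern.UNICODE_CHARACTER_CLASS : 0;
        return Pattern.compile("\\W*[yu]+\\W*$", flags).matcher(datePattern).replaceFirst("");
    }

    public static void main(String[] args) {
        String cyrillicGhe = "\u0433"; // the letter г
        // Assumed sample of a Bulgarian short date pattern, with г as a quoted literal.
        String bulgarian = "d.MM.yy '\u0433'.";

        System.out.println(isNonWord(cyrillicGhe, false));        // true: ASCII-only \W matches г
        System.out.println(isNonWord(cyrillicGhe, true));         // false: г counts as a word character
        System.out.println(stripTrailingYear(bulgarian, false));  // d.MM
        System.out.println(stripTrailingYear(bulgarian, true));   // pattern is left unchanged
    }
}
```

So the ASCII-only default is what removes the abbreviation together with the year; with the Unicode flag the regex would no longer match this pattern at all.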
If you’re happy with a 90 % solution, this would be my suggestion, and I hope you can modify it to any needs your users in some locale may have.
Link: Documentation of Java regular expressions (regex)

SimpleDateFormat error while parsing a date yyyyMMdd format java [duplicate]

In java.util.Calendar, January is defined as month 0, not month 1. Is there any specific reason for that?
I have seen many people getting confused about that...
It's just part of the horrendous mess which is the Java date/time API. Listing what's wrong with it would take a very long time (and I'm sure I don't know half of the problems). Admittedly working with dates and times is tricky, but aaargh anyway.
Do yourself a favour and use Joda Time instead, or possibly JSR-310.
EDIT: As for the reasons why - as noted in other answers, it could well be due to old C APIs, or just a general feeling of starting everything from 0... except that days start with 1, of course. I doubt whether anyone outside the original implementation team could really state reasons - but again, I'd urge readers not to worry so much about why bad decisions were taken, as to look at the whole gamut of nastiness in java.util.Calendar and find something better.
One point which is in favour of using 0-based indexes is that it makes things like "arrays of names" easier:
// I "know" there are 12 months
String[] monthNames = new String[12]; // and populate...
String name = monthNames[calendar.get(Calendar.MONTH)];
Of course, this fails as soon as you get a calendar with 13 months... but at least the size specified is the number of months you expect.
This isn't a good reason, but it's a reason...
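A runnable variant of the "array of names" idea, as a sketch: instead of populating the array by hand it takes the names from java.text.DateFormatSymbols, whose getMonths() array is documented to be indexed with the Calendar month constants. The class and method names are invented for this example.

```java
import java.text.DateFormatSymbols;
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.Locale;

class MonthNameLookup {
    // Looks up the month name by using the zero-based Calendar.MONTH value as array index.
    static String monthName(Calendar calendar, Locale locale) {
        String[] monthNames = DateFormatSymbols.getInstance(locale).getMonths();
        return monthNames[calendar.get(Calendar.MONTH)];
    }

    public static void main(String[] args) {
        Calendar january = new GregorianCalendar(2020, Calendar.JANUARY, 15);
        Calendar december = new GregorianCalendar(2020, Calendar.DECEMBER, 15);
        System.out.println(monthName(january, Locale.ENGLISH));  // January
        System.out.println(monthName(december, Locale.ENGLISH)); // December
    }
}
```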
EDIT: As a comment sort of requests some ideas about what I think is wrong with Date/Calendar:
Surprising bases (1900 as the year base in Date, admittedly for deprecated constructors; 0 as the month base in both)
Mutability - using immutable types makes it much simpler to work with what are really effectively values
An insufficient set of types: it's nice to have Date and Calendar as different things, but the separation of "local" vs "zoned" values is missing, as is date/time vs date vs time
An API which leads to ugly code with magic constants, instead of clearly named methods
An API which is very hard to reason about - all the business about when things are recomputed etc
The use of parameterless constructors to default to "now", which leads to hard-to-test code
The Date.toString() implementation which always uses the system local time zone (that's confused many Stack Overflow users before now)
Because doing math with months is much easier.
1 month after December is January, but to figure this out normally you would have to take the month number and do math
12 + 1 = 13 // What month is 13?
I know! I can fix this quickly by using a modulus of 12.
(12 + 1) % 12 = 1
This works just fine for 11 months until November...
(11 + 1) % 12 = 0 // What month is 0?
You can make all of this work again by subtracting 1 before you add the month, then do your modulus and finally add 1 back again... aka work around an underlying problem.
((11 - 1 + 1) % 12) + 1 = 12 // Lots of magical numbers!
Now let's think about the problem with months 0 - 11.
(0 + 1) % 12 = 1 // February
(1 + 1) % 12 = 2 // March
(2 + 1) % 12 = 3 // April
(3 + 1) % 12 = 4 // May
(4 + 1) % 12 = 5 // June
(5 + 1) % 12 = 6 // July
(6 + 1) % 12 = 7 // August
(7 + 1) % 12 = 8 // September
(8 + 1) % 12 = 9 // October
(9 + 1) % 12 = 10 // November
(10 + 1) % 12 = 11 // December
(11 + 1) % 12 = 0 // January
All of the months work the same and a work around isn't necessary.
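The arithmetic above translates to Java roughly as follows. This is a sketch with invented method names; it uses Math.floorMod so that negative offsets (going back in time) also wrap correctly, which the plain % operator would not do. For comparison it also shows that java.time.Month does the wrap-around for you with one-based numbering.

```java
import java.time.Month;

class MonthArithmetic {
    // Zero-based months (0 = January ... 11 = December): a plain modulus is enough.
    static int plusMonthsZeroBased(int month, int monthsToAdd) {
        return Math.floorMod(month + monthsToAdd, 12);
    }

    // One-based months (1 = January ... 12 = December): shift down, wrap, shift back up.
    static int plusMonthsOneBased(int month, int monthsToAdd) {
        return Math.floorMod(month - 1 + monthsToAdd, 12) + 1;
    }

    public static void main(String[] args) {
        System.out.println(plusMonthsZeroBased(11, 1)); // 0  (December -> January)
        System.out.println(plusMonthsOneBased(12, 1));  // 1  (December -> January)
        System.out.println(plusMonthsOneBased(11, 1));  // 12 (November -> December)
        System.out.println(Month.DECEMBER.plus(1));     // JANUARY
    }
}
```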
C based languages copy C to some degree. The tm structure (defined in time.h) has an integer field tm_mon with the (commented) range of 0-11.
C based languages start arrays at index 0. So this was convenient for outputting a string in an array of month names, with tm_mon as the index.
There have been a lot of answers to this, but I will give my view on the subject anyway.
The reason behind this odd behavior, as stated previously, comes from the POSIX C time.h where the months were stored in an int with the range 0-11.
To explain why, look at it like this: years and days are considered numbers in spoken language, but months have their own names. So because January is the first month it will be stored as offset 0, the first array element. monthname[JANUARY] would be "January". The first month in the year is the first month array element.
The day numbers, on the other hand, do not have names, so storing them in an int as 0-30 would be confusing, add a lot of day+1 instructions for outputting and, of course, be prone to a lot of bugs.
That being said, the inconsistency is confusing, especially in JavaScript (which has also inherited this "feature"), a scripting language where this should be abstracted far away from the language.
TL;DR: Because months have names and days of the month do not.
In Java 8, there is a new Date/Time API JSR 310 that is more sane. The spec lead is the same as the primary author of JodaTime and they share many similar concepts and patterns.
I'd say laziness. Arrays start at 0 (everyone knows that); the months of the year are an array, which leads me to believe that some engineer at Sun just didn't bother to put this one little nicety into the Java code.
Probably because C's "struct tm" does the same.
Because programmers are obsessed with 0-based indexes. OK, it's a bit more complicated than that: it makes more sense when you're working with lower-level logic to use 0-based indexing. But by and large, I'll still stick with my first sentence.
java.time.Month
Java provides you another way to use 1-based indexes for months: the java.time.Month enum. One object is predefined for each of the twelve months. Each has a number assigned, 1-12 for January-December; call getValue for the number.
Make use of Month.JULY (getValue() gives you 7)
instead of Calendar.JULY (gives you 6).
(import java.time.*;)
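If you have to live with both numbering schemes for a while, a tiny pair of conversion helpers keeps the off-by-one in a single place. A minimal sketch; the class and method names are made up for this example.

```java
import java.time.Month;
import java.util.Calendar;

class MonthNumbering {
    // Converts a zero-based Calendar month constant to the java.time.Month enum.
    static Month toMonth(int calendarMonth) {
        return Month.of(calendarMonth + 1);
    }

    // Converts a java.time.Month back to the zero-based Calendar month constant.
    static int toCalendarMonth(Month month) {
        return month.getValue() - 1;
    }

    public static void main(String[] args) {
        System.out.println(Month.JULY.getValue());          // 7
        System.out.println(Calendar.JULY);                  // 6
        System.out.println(toMonth(Calendar.JULY));         // JULY
        System.out.println(toCalendarMonth(Month.JULY));    // 6
    }
}
```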
Personally, I took the strangeness of the Java calendar API as an indication that I needed to divorce myself from the Gregorian-centric mindset and try to program more agnostically in that respect. Specifically, I learned once again to avoid hardcoded constants for things like months.
Which of the following is more likely to be correct?
if (date.getMonth() == 3) out.print("March");
if (date.getMonth() == Calendar.MARCH) out.print("March");
This illustrates one thing that irks me a little about Joda Time - it may encourage programmers to think in terms of hardcoded constants. (Only a little, though. It's not as if Joda is forcing programmers to program badly.)
For me, nobody explains it better than mindpro.com:
Gotchas
java.util.GregorianCalendar has far fewer bugs and gotchas than the
old java.util.Date class but it is still no picnic.
Had there been programmers when Daylight Saving Time was first
proposed, they would have vetoed it as insane and intractable. With
daylight saving, there is a fundamental ambiguity. In the fall when
you set your clocks back one hour at 2 AM there are two different
instants in time both called 1:30 AM local time. You can tell them
apart only if you record whether you intended daylight saving or
standard time with the reading.
Unfortunately, there is no way to tell GregorianCalendar which you
intended. You must resort to telling it the local time with the dummy
UTC TimeZone to avoid the ambiguity. Programmers usually close their
eyes to this problem and just hope nobody does anything during this
hour.
Millennium bug. The bugs are still not out of the Calendar classes.
Even in JDK (Java Development Kit) 1.3 there is a 2001 bug. Consider
the following code:
GregorianCalendar gc = new GregorianCalendar();
gc.setLenient( false );
/* Bug only manifests if lenient set false */
gc.set( 2001, 1, 1, 1, 0, 0 );
int year = gc.get ( Calendar.YEAR );
/* throws exception */
The bug disappears at 7AM on 2001/01/01 for MST.
GregorianCalendar is controlled by a giant of pile of untyped int
magic constants. This technique totally destroys any hope of
compile-time error checking. For example to get the month you use
GregorianCalendar. get(Calendar.MONTH));
GregorianCalendar has the raw
GregorianCalendar.get(Calendar.ZONE_OFFSET) and the daylight savings
GregorianCalendar. get( Calendar. DST_OFFSET), but no way to get the
actual time zone offset being used. You must get these two separately
and add them together.
GregorianCalendar.set( year, month, day, hour, minute) does not set
the seconds to 0.
DateFormat and GregorianCalendar do not mesh properly. You must
specify the Calendar twice, once indirectly as a Date.
If the user has not configured his time zone correctly it will default
quietly to either PST or GMT.
In GregorianCalendar, Months are numbered starting at January=0,
rather than 1 as everyone else on the planet does. Yet days start at 1
as do days of the week with Sunday=1, Monday=2,… Saturday=7. Yet
DateFormat. parse behaves in the traditional way with January=1.
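Two of the gotchas quoted above can be sketched in code. The first method adds ZONE_OFFSET and DST_OFFSET to obtain the offset actually in effect, as the quote says you must. The second part shows, for comparison only, how java.time lets you choose between the two instants that share the local time 1:30 AM on a fall-back day. The class name, the zone and the dates are assumptions picked for the example.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.TimeZone;

class OffsetGotchas {
    // Total UTC offset in milliseconds: raw zone offset plus daylight saving offset.
    static int totalOffsetMillis(Calendar calendar) {
        return calendar.get(Calendar.ZONE_OFFSET) + calendar.get(Calendar.DST_OFFSET);
    }

    // Local time 01:30 on 2020-11-01 occurs twice in New York; this picks the earlier instant.
    static ZonedDateTime earlierHalfPastOne() {
        LocalDateTime ambiguous = LocalDateTime.of(2020, 11, 1, 1, 30);
        return ZonedDateTime.of(ambiguous, ZoneId.of("America/New_York"));
    }

    public static void main(String[] args) {
        Calendar summer = new GregorianCalendar(TimeZone.getTimeZone("America/New_York"));
        summer.clear();
        summer.set(2020, Calendar.JULY, 1, 12, 0, 0);
        System.out.println(totalOffsetMillis(summer) / 3_600_000 + " hours"); // -4 hours

        ZonedDateTime first = earlierHalfPastOne();
        ZonedDateTime second = first.withLaterOffsetAtOverlap();
        System.out.println(first);  // offset -04:00 (daylight time)
        System.out.println(second); // offset -05:00 (standard time)
    }
}
```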
The true reason why
You would think that when we deprecated most of Date and added the new
Calendar class, we would have fixed Date's biggest annoyance: the fact
that January is month 0. We certainly should have, but unfortunately
we didn't. We were afraid that programmers would be confused if Date
used zero-based months and Calendar used one-based months. And a few
programmers probably would have been. But in hindsight, the fact that
Calendar is still zero-based has caused an enormous amount of
confusion, and it was probably the biggest single mistake in the Java
international API's.
Quoted from International Calendars in Java by Laura Werner, link at the bottom.
The better alternative: java.time
This may just be repeating what others have said: throw the old and poorly designed Calendar class overboard and use java.time, the modern Java date and time API. There, months are consistently and sanely numbered from 1 for January through 12 for December.
If you are getting a Calendar from a legacy API not yet upgraded to java.time, the first thing to do is to convert to a modern ZonedDateTime. Depending on your needs you may do further conversions from there. In most of the world the Calendar object you get will virtually always be an instance of the GregorianCalendar subclass (since the Calendar class itself is abstract). To demonstrate:
Calendar oldfashionedCalendarObject = Calendar.getInstance();
ZonedDateTime zdt
= ((GregorianCalendar) oldfashionedCalendarObject).toZonedDateTime();
System.out.println(zdt);
System.out.format("Month is %d or %s%n", zdt.getMonthValue(), zdt.getMonth());
Output when I ran just now in my time zone:
2021-03-17T23:18:47.761+01:00[Europe/Copenhagen]
Month is 3 or MARCH
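If you would rather not rely on the cast succeeding, a defensive variant could look like the sketch below. It falls back to converting through Instant and the calendar's time zone when the object is not a GregorianCalendar. The helper name is invented here, and the fallback is only one reasonable choice.

```java
import java.time.ZonedDateTime;
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.Locale;
import java.util.TimeZone;

class CalendarConversion {
    // Converts any Calendar to ZonedDateTime without assuming the concrete subclass.
    static ZonedDateTime toZonedDateTime(Calendar calendar) {
        if (calendar instanceof GregorianCalendar) {
            return ((GregorianCalendar) calendar).toZonedDateTime();
        }
        // Fallback for other calendar systems: same instant, same time zone, ISO calendar.
        return ZonedDateTime.ofInstant(calendar.toInstant(), calendar.getTimeZone().toZoneId());
    }

    public static void main(String[] args) {
        Calendar legacy = new GregorianCalendar(TimeZone.getTimeZone("UTC"), Locale.ROOT);
        legacy.clear();
        legacy.set(2021, Calendar.MARCH, 17, 23, 18, 47);
        ZonedDateTime zdt = toZonedDateTime(legacy);
        System.out.format("Month is %d or %s%n", zdt.getMonthValue(), zdt.getMonth());
    }
}
```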
Links
International Calendars in Java by Laura Werner
Oracle tutorial: Date Time explaining how to use java.time.
tl;dr
Month.FEBRUARY.getValue() // February → 2.
2
Details
The Answer by Jon Skeet is correct.
Now we have a modern replacement for those troublesome old legacy date-time classes: the java.time classes.
java.time.Month
Among those classes is the Month enum. An enum carries one or more predefined objects, objects that are automatically instantiated when the class loads. On Month we have a dozen such objects, each given a name: JANUARY, FEBRUARY, MARCH, and so on. Each of those is a static final public class constant. You can use and pass these objects anywhere in your code. Example: someMethod( Month.AUGUST )
Fortunately, they have sane numbering, 1-12 where 1 is January and 12 is December.
Get a Month object for a particular month number (1-12).
Month month = Month.of( 2 ); // 2 → February.
Going the other direction, ask a Month object for its month number.
int monthNumber = Month.FEBRUARY.getValue(); // February → 2.
Many other handy methods on this class, such as knowing the number of days in each month. The class can even generate a localized name of the month.
You can get the localized name of the month, in various lengths or abbreviations.
String output =
Month.FEBRUARY.getDisplayName(
TextStyle.FULL ,
Locale.CANADA_FRENCH
);
février
Also, you should pass objects of this enum around your code base rather than mere integer numbers. Doing so provides type-safety, ensures a valid range of values, and makes your code more self-documenting. See Oracle Tutorial if unfamiliar with the surprisingly powerful enum facility in Java.
You also may find useful the Year and YearMonth classes.
About java.time
The java.time framework is built into Java 8 and later. These classes supplant the troublesome old legacy date-time classes such as java.util.Date, .Calendar, & java.text.SimpleDateFormat.
The Joda-Time project, now in maintenance mode, advises migration to java.time.
To learn more, see the Oracle Tutorial. And search Stack Overflow for many examples and explanations. Specification is JSR 310.
Where to obtain the java.time classes?
Java SE 8 and SE 9 and later
Built-in.
Part of the standard Java API with a bundled implementation.
Java 9 adds some minor features and fixes.
Java SE 6 and SE 7
Much of the java.time functionality is back-ported to Java 6 & 7 in ThreeTen-Backport.
Android
The ThreeTenABP project adapts ThreeTen-Backport (mentioned above) for Android specifically.
See How to use….
The ThreeTen-Extra project extends java.time with additional classes. This project is a proving ground for possible future additions to java.time. You may find some useful classes here such as Interval, YearWeek, YearQuarter, and more.
Set the month to Calendar.MARCH, or compare to see if it == Calendar.JUNE, for example.
The Date and Calendar classes date back to the very early days of Java, when folks were still figuring things out, and they are widely regarded as not very well designed.
If Calendar were created today with the same design, rather than ints for Calendar.JUNE, etc., they'd use enums.
It isn't exactly defined as zero per se, it's defined as Calendar.JANUARY. It is the problem of using ints as constants instead of enums. Calendar.JANUARY == 0.
Because language writing is harder than it looks, and handling time in particular is a lot harder than most people think. For a small part of the problem (in reality, not Java), see the YouTube video "The Problem with Time & Timezones - Computerphile" at https://www.youtube.com/watch?v=-5wpm-gesOY. Don't be surprised if your head falls off from laughing in confusion.
In addition to DannySmurf's answer of laziness, I'll add that it's to encourage you to use the constants, such as Calendar.JANUARY.
Because everything starts with 0. This is a basic fact of programming in Java. If one thing were to deviate from that, it would lead to a whole slew of confusion. Let's not argue about how they came to be and just code with them.

Joda api not return proper islamic date

I am converting Gregorian dates to Islamic dates. I am setting the leap year pattern to the Indian leap year pattern, but it is not working.
I loop over the days of the current month, build a Gregorian date for each day and convert it to an Islamic date.
Here is my code:
for(int i=0;i<maxDay;i++)
{
    eng.add(String.valueOf(i+1));
    DateTime dtISO=new DateTime(currentY,currentMonth+1,i+1,0,0);
    DateTimeZone asia= DateTimeZone.forID("Asia/Riyadh");
    DateTime dtIslamic = dtISO.withChronology(
        IslamicChronology.getInstance(
            asia,
            IslamicChronology.LEAP_YEAR_INDIAN));
    String islamicDateArr="";
    split=dtIslamic.toString().split("T");
    split=split[0].split("-");
    if(i==0 || Integer.parseInt(split[2])==1)
    {
        isl.add(String.valueOf(split[2]+" "+islamicMonths[Integer.parseInt(split[1])-1]));
        continue;
    }
    isl.add(String.valueOf(split[2]));
}
Your code seems to be correct.
Since you told me you have tried each of the four leap year patterns of Joda-Time without success, I get the feeling there might be a bug or just a missing feature, because among all supported leap year patterns there should be a pair of patterns that differ by one day (and you observe a one-day difference).
Other people have already submitted bug issues. See here:
issue 163
issue 107
As you can see, it will be hard for you to convince the project leader to solve your problem for you. Maybe he is right when saying that Joda-Time does not have a bug but is just not complete.
Keep in mind that, according to R.H. van Gent, the calculated Islamic calendar algorithm has at least 8 variants instead of 4, because there are 4 intercalary schemes and (for each scheme) two variations depending on the precise start of the Islamic epoch (Thursday versus Friday epoch). So Joda-Time simply does not support all variants.
What are the alternatives to Joda-Time?
a) Java-8 and its backport Threeten-BP (for Java-6+7) support the table-driven Umalqura-calendar of Saudi-Arabia (sighting-based). I am not sure if this solves your problem however (if not then you might supply a hand-written file containing the table data relevant for you - a lot of work). Note that both libraries don't support algorithm-based islamic calendars.
b) Some people have written their own home-grown workarounds. I have found this hijri converter via Google. No idea if this works for you.
c) IBM offers a Hijri calendar in its ICU-project. It offers different leap year patterns than Joda-Time. Maybe it helps.
Side note: As you can see the current Java-support for Hijri calendars is not really satisfying. That is why I decided to set up a new implementation in my own library Time4J. It is scheduled for maybe 2-3 months later in autumn 2015.
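Alternative a) above can be tried out with a few lines of plain Java 8. The sketch below converts every day of a Gregorian month with java.time.chrono.HijrahDate, which uses the table-driven Umm al-Qura data, so the results may well differ from any of the algorithmic leap year patterns. The class name and the sample month are assumptions for illustration.

```java
import java.time.LocalDate;
import java.time.YearMonth;
import java.time.chrono.HijrahDate;
import java.time.temporal.ChronoField;

class UmalquraSketch {
    // Converts an ISO (Gregorian) date to the Umm al-Qura based Hijrah date of java.time.
    static HijrahDate toHijrah(LocalDate iso) {
        return HijrahDate.from(iso);
    }

    // Renders the Hijrah date as day/month/year using numeric fields only.
    static String describe(HijrahDate date) {
        return date.get(ChronoField.DAY_OF_MONTH) + "/"
                + date.get(ChronoField.MONTH_OF_YEAR) + "/"
                + date.get(ChronoField.YEAR);
    }

    public static void main(String[] args) {
        YearMonth month = YearMonth.of(2015, 6); // sample month, chosen arbitrarily
        for (int day = 1; day <= month.lengthOfMonth(); day++) {
            LocalDate iso = month.atDay(day);
            System.out.println(iso + " -> " + describe(toHijrah(iso)));
        }
    }
}
```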

Time interval localization in Java

I need to represent a time interval as a localized string like this: 10 hours 25 minutes 1 second, depending on the Locale.
It is pretty easy to implement by hand in English:
String hourStr = hours == 1 ? "hour" : "hours" etc.
But I need some "out-of-the-box" Java (maybe Java 8) mechanism that follows the rules of different languages.
Does Java have one, or do I need to implement it myself for each Locale used in the app?
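For reference, the hand-written English-only approach from the question could look like the sketch below. It is deliberately not localized: the "add an s" plural rule only holds for these English unit names, which is exactly the limitation the question is about. All names are invented for the example, and a non-negative duration is assumed.

```java
import java.time.Duration;

class EnglishIntervalFormatter {
    // English-only plural rule: append "s" unless the amount is exactly 1.
    static String unit(long amount, String singular) {
        return amount + " " + (amount == 1 ? singular : singular + "s");
    }

    // Formats a non-negative duration as "<h> hours <m> minutes <s> seconds".
    static String format(Duration duration) {
        long totalSeconds = duration.getSeconds();
        long hours = totalSeconds / 3600;
        long minutes = (totalSeconds % 3600) / 60;
        long seconds = totalSeconds % 60;
        return unit(hours, "hour") + " " + unit(minutes, "minute") + " " + unit(seconds, "second");
    }

    public static void main(String[] args) {
        Duration interval = Duration.ofHours(10).plusMinutes(25).plusSeconds(1);
        System.out.println(format(interval)); // 10 hours 25 minutes 1 second
    }
}
```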
Look at Joda-Time. It supports the languages English, Danish, Dutch, French, German, Japanese, Polish, Portuguese and Spanish with version 2.5.
Period period = new Period(new LocalDate(2013, 4, 11), LocalDate.now());
PeriodFormatter formatter = PeriodFormat.wordBased(Locale.GERMANY);
System.out.println(formatter.print(period)); // output: 1 Jahr, 2 Monate und 3 Wochen
formatter = formatter.withLocale(Locale.ENGLISH);
System.out.println(formatter.print(period)); // output: 1 Jahr, 2 Monate und 3 Wochen (bug???)
formatter = PeriodFormat.wordBased(Locale.ENGLISH);
System.out.println(formatter.print(period)); // output: 1 year, 2 months and 3 weeks
You might want to adjust the punctuation characters, however. To do this you might need to copy and edit the messages resource files in your classpath, which have this format (here the English variant):
PeriodFormat.space=\
PeriodFormat.comma=,
PeriodFormat.commandand=,and
PeriodFormat.commaspaceand=, and
PeriodFormat.commaspace=,
PeriodFormat.spaceandspace=\ and
PeriodFormat.year=\ year
PeriodFormat.years=\ years
PeriodFormat.month=\ month
PeriodFormat.months=\ months
PeriodFormat.week=\ week
PeriodFormat.weeks=\ weeks
PeriodFormat.day=\ day
PeriodFormat.days=\ days
PeriodFormat.hour=\ hour
PeriodFormat.hours=\ hours
PeriodFormat.minute=\ minute
PeriodFormat.minutes=\ minutes
PeriodFormat.second=\ second
PeriodFormat.seconds=\ seconds
PeriodFormat.millisecond=\ millisecond
PeriodFormat.milliseconds=\ milliseconds
Since version 2.5 it might be also possible to apply complex regular expressions to model more complex plural rules. Personally I see it as user-unfriendly, and regular expressions might not be sufficient for languages like Arabic (my first impression). There are also other limitations with localization, see this pull request in debate.
Side note: Java 8 is definitely not able to do localized duration formatting.
UPDATE from 2015-08-26:
With version 4.3 of my library Time4J (available in Maven Central), the following more powerful solution is possible, which currently supports 45 languages:
import static net.time4j.CalendarUnit.*;
import static net.time4j.ClockUnit.*;
// the input for creating the duration (in Joda-Time called Period)
IsoUnit[] units = {YEARS, MONTHS, DAYS, HOURS, MINUTES, SECONDS};
PlainTimestamp start = PlainDate.of(2013, 4, 11).atTime(13, 45, 21);
PlainTimestamp end = SystemClock.inLocalView().now();
// create the duration
Duration<?> duration = Duration.in(units).between(start, end);
// print the duration (here not abbreviated, but with full unit names)
String s = PrettyTime.of(Locale.US).print(duration, TextWidth.WIDE);
System.out.println(s);
// example output: 1 year, 5 months, 7 days, 3 hours, 25 minutes, and 49 seconds
Why is Time4J better for your problem?
It has a more expressive way to say in which units a duration should be calculated.
It supports 45 languages.
It supports the sometimes complex plural rules of languages, including right-to-left scripts like Arabic, without any need for manual configuration.
It supports locale-dependent list patterns (usage of comma, space or words like "and")
It supports 3 different text widths: WIDE, ABBREVIATED (SHORT) and NARROW
The interoperability with Java-8 is better because Java-8-types like java.time.Period or java.time.Duration are understood by Time4J.

Parsing String to Dates - Java

This is the problem:
I have some .csv files with travels info, and the dates appear like strings (each line for one travel):
"All Mondays from January-May and October-December. All days from June To September"
"All Fridays from February to June"
"Monday, Friday and Saturday and Sunday from 10 January to 30 April"
"from 01 of November to 30 April. All days except fridays from 2 to 24 of november and sunday from 2 to 30 of december"
"All sundays from 02 december to 28 april"
"5, 12, 20 of march, 11, 18 of april, 2, 16, 30 of may, 6, 13, 27 june"
"All saturdays from February to June, and from September to December"
"1 to 17 of december, 1 to 31 of january"
"All mondays from February to november"
I must parse the strings to Dates, and keep them in an array for each travel.
The problem is that I don't know how to do it. Even my university teachers told me that they don't know how to do so :S. I can't find/create a pattern using http://docs.oracle.com/javase/6/docs/api/java/text/SimpleDateFormat.html
After parsing them I have to search all travels between two dates.
But how? How to parse them? Is it possible?
This requires Natural Language Processing (NLP); see Wikipedia for an account:
http://en.wikipedia.org/wiki/Natural_language_processing.
Your problem as stated is very hard. There are many ways of representing a single date, and your examples include ranges of dates and formulae for generating dates. It sounds as if you have a limited subset of language - frequent use of "all", "from", etc.
If you are in control of the language (i.e. these are being generated by humans who comply with your documentation) then you have a chance of formalising it (although it will take a lot of work - months). If you are not in charge of it, then every time a new phrase appears you will have to add it to the specs.
I suggest you go through the file and look for stock phrases such as "All [weekdayname]s [from | between | until | before]" or "in [January | February ...]". Then substitute these in the phrases. If you find this covers all the cases, you may be able to extract particular phrases. But if you have anaphora like "next Tuesday" it will be much harder.
You're in the domain of NLP (Natural Language Processing), what is possible or impossible is fuzzy in this domain. From a fast Google search, I've found that the Natty Date Parser might be useful for you.
For more theoretical background on NLP, you might be interested in the Natural Language Processing course of Stanford University on Coursera (at the moment the course is not open for enrolment, but the lectures are available for free).
You can also use a set of strict regular expressions that would match only one of your possible cases and apply them from the most restrictive to the most relaxed.
The first thing I would define to attack your problem is what you expect as an output of your method, since in some cases it's a single date, in some cases an interval, in some others multiple intervals.
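To give an idea of the strict-regex approach, here is a minimal sketch that handles exactly one of the phrase shapes from the question, "All <weekday>s from <month> to <month>", and expands it into concrete dates. Everything in it is an assumption made for illustration: the class name, the choice of a list of LocalDate as output, and above all the year, which the input strings do not contain and which therefore has to be supplied by the caller. Anything that does not match this one shape is reported as "not understood" so that a more relaxed rule can be tried next.

```java
import java.time.DayOfWeek;
import java.time.LocalDate;
import java.time.Month;
import java.time.YearMonth;
import java.time.temporal.TemporalAdjusters;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WeekdayRangeRule {
    // Strict rule: the whole input must have this shape, ignoring case.
    private static final Pattern RULE = Pattern.compile(
            "all (\\p{Alpha}+?)s from (\\p{Alpha}+) to (\\p{Alpha}+)",
            Pattern.CASE_INSENSITIVE);

    // Returns all matching dates in the given year, or empty if the rule does not apply.
    static Optional<List<LocalDate>> expand(String text, int year) {
        Matcher m = RULE.matcher(text.trim());
        if (!m.matches()) {
            return Optional.empty();
        }
        DayOfWeek day;
        Month from;
        Month to;
        try {
            day = DayOfWeek.valueOf(m.group(1).toUpperCase(Locale.ROOT));
            from = Month.valueOf(m.group(2).toUpperCase(Locale.ROOT));
            to = Month.valueOf(m.group(3).toUpperCase(Locale.ROOT));
        } catch (IllegalArgumentException unknownName) {
            return Optional.empty(); // not an English weekday or month name
        }
        // First matching weekday on or after the 1st of the start month.
        LocalDate date = LocalDate.of(year, from, 1).with(TemporalAdjusters.nextOrSame(day));
        LocalDate end = YearMonth.of(year, to).atEndOfMonth();
        List<LocalDate> dates = new ArrayList<>();
        // Ranges that wrap into the next year yield an empty list in this sketch.
        for (; !date.isAfter(end); date = date.plusWeeks(1)) {
            dates.add(date);
        }
        return Optional.of(dates);
    }

    public static void main(String[] args) {
        System.out.println(expand("All Fridays from February to June", 2021));
        System.out.println(expand("All sundays from 02 december to 28 april", 2021)); // Optional.empty
    }
}
```

Once every rule produces the same output type, searching for travels between two dates becomes a simple filter over the generated dates.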
