Here's my code:
String t1="postby <span title=\"2011-4-5 17:22\">yesterday 17:22</span>";
String t2="postby 2010-11-12 10:02";
I want to get 2011-4-5 17:22 and 2010-11-12 10:02 from t1 or t2, using one regular expression
(input t1 or t2, output the date).
How can I do this? (Please give me some example code, thanks.)
\d{4}-\d{1,2}-\d{1,2} \d{2}:\d{2}
A few notes:
You will have to escape the backslashes in a Java string literal: String pattern = "\\d{4}-\\d{1,2}....."
\d means "digit" (0-9)
{x} means "x times"
{x,y} means "at least x, but not more than y times"
Reference: java.util.regex.Pattern
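A minimal runnable sketch of the pattern above applied to t1 and t2, assuming you only want the first date-time found in the input. Matcher.find() is used rather than matches() because the date sits inside a longer string. The class name PostDateExtractor is hypothetical.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PostDateExtractor {
    // Backslashes are doubled because this is a Java string literal.
    private static final Pattern DATE_TIME =
            Pattern.compile("\\d{4}-\\d{1,2}-\\d{1,2} \\d{2}:\\d{2}");

    // Returns the first substring that looks like "yyyy-M-d HH:mm", if any.
    // find() searches anywhere in the input, while matches() would require
    // the whole string to match.
    static Optional<String> extract(String text) {
        Matcher m = DATE_TIME.matcher(text);
        return m.find() ? Optional.of(m.group()) : Optional.empty();
    }

    public static void main(String[] args) {
        String t1 = "postby <span title=\"2011-4-5 17:22\">yesterday 17:22</span>";
        String t2 = "postby 2010-11-12 10:02";
        System.out.println(extract(t1).orElse("no match")); // 2011-4-5 17:22
        System.out.println(extract(t2).orElse("no match")); // 2010-11-12 10:02
    }
}
```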
How many false matches will you allow? Bozho already suggested the pattern
\d{4}-\d{1,2}-\d{1,2} \d{2}:\d{2}
But that matches the following questionable cases: 0000-1-1 00:00 (there is no year zero), 2011-0-1 00:00 (there is no month zero), 2011-13-1 00:00 (there is no month 13), 2011-1-32 00:00 (there is no month-day 32), 2011-12-31 24:00 (hours only run from 00 to 23) and 2011-12-31 23:61 (minutes only run from 00 to 59; even a leap second is written 23:59:60).
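If you want to reject such values without making the regex much more complicated, one option (a swapped-in technique, not something either answer proposes) is to let the loose pattern find a candidate and then hand it to java.time with ResolverStyle.STRICT. This is a sketch; the class and method names are hypothetical. Note that it still accepts 0000-1-1 00:00, because java.time's proleptic ISO calendar does have a year 0.

```java
import java.time.DateTimeException;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.ResolverStyle;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ValidatedDateExtractor {
    private static final Pattern CANDIDATE =
            Pattern.compile("\\d{4}-\\d{1,2}-\\d{1,2} \\d{2}:\\d{2}");

    // "uuuu" (proleptic year) instead of "yyyy": with STRICT resolving,
    // "yyyy" (year-of-era) would also require an era field.
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("uuuu-M-d HH:mm")
                    .withResolverStyle(ResolverStyle.STRICT);

    // The regex finds a candidate; java.time then rejects impossible values
    // such as month 13, day 32, hour 24, minute 61 or 29 February 2011.
    static Optional<LocalDateTime> extract(String text) {
        Matcher m = CANDIDATE.matcher(text);
        if (!m.find()) {
            return Optional.empty();
        }
        try {
            return Optional.of(LocalDateTime.parse(m.group(), FORMAT));
        } catch (DateTimeException e) { // DateTimeParseException is a subclass
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        String[] samples = {"2011-4-5 17:22", "2011-13-1 00:00", "2011-1-32 00:00",
                "2011-12-31 24:00", "2011-12-31 23:61", "2011-2-29 10:00"};
        for (String s : samples) {
            System.out.println(s + " -> " + extract(s));
        }
    }
}
```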
You are wanting to parse date-times that are almost, but not quite, in ISO-8601 format. If you can, please use that international standard format.
In one of my programs (a shell script using grep), I've used the following regular expression:
^20[0-9][0-9]-[01][0-9]-[0-3][0-9]T[0-9][0-9]:[0-9][0-9]:[0-9][0-9]UTC$
I had an extra T and UTC to deal with, was interested only in dates in this century, and parsed with seconds precision. I see I was not so restrictive on hour and minute values, probably because traditional C/C++ conversions can handle them.
I guess you therefore could use something like the following:
\d{4}-[01]\d-[0-3]\d [0-2]\d:[0-6]\d
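One thing to be aware of, shown in the sketch below: this stricter pattern requires a two-digit month and day, so it does not find 2011-4-5 17:22 in t1 from the question, while it still accepts impossible values such as 2011-19-39 29:69. The class name is hypothetical.

```java
import java.util.regex.Pattern;

class StricterPatternCheck {
    // Two-digit month and day, with restricted first digits for
    // month, day, hour and minute.
    static final Pattern STRICTER =
            Pattern.compile("\\d{4}-[01]\\d-[0-3]\\d [0-2]\\d:[0-6]\\d");

    static boolean containsMatch(String text) {
        return STRICTER.matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(containsMatch("postby 2010-11-12 10:02")); // true
        System.out.println(containsMatch(
                "postby <span title=\"2011-4-5 17:22\">yesterday 17:22</span>")); // false
        System.out.println(containsMatch("2011-19-39 29:69")); // true
    }
}
```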
I'm trying to format a date without a year (just day and month, e.g. 12.10).
DateTimeFormatter.ofLocalizedDate(FormatStyle.SHORT) still yields the year for me (12.10.20).
So I tried DateTimeFormatter.ofPattern("dd. MM"), but that obviously hardcodes the order and the dot, which won't make American users happy (they expect slashes and month first).
How can I internationalize a pattern? Is there some abstract syntax for separators etc.?
Well, as Ole pointed out there is no 100% satisfying solution using java.time only. But my library Time4J has found a solution based on the data of the CLDR repository (ICU4J also gives support) using the type AnnualDate (as replacement for MonthDay):
LocalDate yourLocalDate = ...;
MonthDay md = MonthDay.from(yourLocalDate);
AnnualDate ad = AnnualDate.from(md);
ChronoFormatter<AnnualDate> usStyle =
ChronoFormatter.ofStyle(DisplayMode.SHORT, Locale.US, AnnualDate.chronology());
ChronoFormatter<AnnualDate> germanStyle =
ChronoFormatter.ofStyle(DisplayMode.SHORT, Locale.GERMANY, AnnualDate.chronology());
System.out.println("US-format: " + usStyle.format(ad)); // US-format: 12/31
System.out.println("German: " + germanStyle.format(ad)); // German: 31.12.
I don’t think that a solution can be made that gives 100 % satisfactory results for all locales. Let’s give it a shot anyway.
Locale formattingLocale = Locale.getDefault(Locale.Category.FORMAT);
String formatPattern = DateTimeFormatterBuilder.getLocalizedDateTimePattern(
FormatStyle.SHORT, null, IsoChronology.INSTANCE, formattingLocale);
// If year comes first, remove it and all punctuation and space before and after it
formatPattern = formatPattern.replaceFirst("^\\W*[yu]+\\W*", "")
// If year comes last and is preceded by a space somewhere, break at the space
// (preserve any punctuation before the space)
.replaceFirst("\\s\\W*[yu]+\\W*$", "")
// Otherwise if year comes last, remove it and all punctuation and space before and after it
.replaceFirst("\\W*[yu]+\\W*$", "");
DateTimeFormatter monthDayFormatter
= DateTimeFormatter.ofPattern(formatPattern, formattingLocale);
For comparison I am printing a date both using the normal formatter with year from your question and using my prepared formatter.
LocalDate exampleDate = LocalDate.of(2020, Month.DECEMBER, 31);
System.out.format(formattingLocale, "%-11s %s%n",
exampleDate.format(DateTimeFormatter.ofLocalizedDate(FormatStyle.SHORT)),
exampleDate.format(monthDayFormatter));
Output in French locale (Locale.FRENCH):
31/12/2020 31/12
In Locale.GERMAN:
31.12.20 31.12
Edit: My German girlfriend informs me that this is wrong. We should always write a dot after each of the two numbers because both are ordinal numbers. Meno Hochschild, the German author of the other answer, also produces 31.12. with two dots for German.
In Locale.US:
12/31/20 12/31
It might make American users happy. In Swedish (Locale.forLanguageTag("sv")):
2020-12-31 12-31
In a comment I mentioned Bulgarian (bg):
31.12.20 г. 31.12
As far as I have understood, “г.” (Cyrillic g and a dot) is an abbreviation of a word that means year, so when leaving out the year, we should probably leave this abbreviation out too. I’m in doubt whether we ought to include the dot after 12.
Finally Croatian (hr):
31. 12. 2020. 31. 12.
How the code works: We first ask DateTimeFormatterBuilder for the short date format pattern for the locale. I assume that this is the pattern that your formatter from the question is also using behind the scenes (haven’t checked). I then use different regular expressions to remove the year from different variants; see the comments in the code. Year may be represented by y or u, so I take both into account (in practice y is used). Now it’s trivial to build a new formatter from the modified pattern. For the Bulgarian: Java regular expressions by default don’t recognize Cyrillic letters as word characters, which is why г was removed too. This is documented behaviour (unless the UNICODE_CHARACTER_CLASS flag is used, a word character is [a-zA-Z_0-9]), though from my point of view it is surprising.
We were lucky, though: in our case it produces the result that I wanted.
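A small sketch of the \w behaviour discussed above, assuming Java 7 or later: by default \w is ASCII-only, and the Pattern.UNICODE_CHARACTER_CLASS flag (or the embedded (?U) flag) makes it Unicode-aware. The helper name isWordChar is hypothetical. Turning the flag on in the year-removal regexes would change the Bulgarian result, so test it against the locales you care about.

```java
import java.util.regex.Pattern;

class WordCharDemo {
    // Checks whether a single character counts as \w, with or without
    // Unicode-aware character classes.
    static boolean isWordChar(String s, boolean unicode) {
        Pattern p = unicode
                ? Pattern.compile("\\w", Pattern.UNICODE_CHARACTER_CLASS)
                : Pattern.compile("\\w");
        return p.matcher(s).matches();
    }

    public static void main(String[] args) {
        String cyrillicGhe = "\u0433"; // the Cyrillic letter from the Bulgarian pattern
        System.out.println(isWordChar(cyrillicGhe, false)); // false: default \w is [a-zA-Z_0-9]
        System.out.println(isWordChar(cyrillicGhe, true));  // true with UNICODE_CHARACTER_CLASS
    }
}
```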
If you’re happy with a 90 % solution, this would be my suggestion, and I hope you can modify it to any needs your users in some locale may have.
Link: Documentation of Java regular expressions (regex)
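The same approach wrapped in a self-contained helper with all imports, as a sketch only; monthDayFormatter is a hypothetical name. The localized patterns come from the JDK's locale data, so the output for a given locale may differ between Java versions.

```java
import java.time.LocalDate;
import java.time.Month;
import java.time.chrono.IsoChronology;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.format.FormatStyle;
import java.util.Locale;

class MonthDayFormats {
    // Takes the locale's SHORT date pattern and strips the year
    // with the same three regexes as in the answer.
    static DateTimeFormatter monthDayFormatter(Locale locale) {
        String pattern = DateTimeFormatterBuilder.getLocalizedDateTimePattern(
                FormatStyle.SHORT, null, IsoChronology.INSTANCE, locale);
        pattern = pattern
                .replaceFirst("^\\W*[yu]+\\W*", "")    // year first
                .replaceFirst("\\s\\W*[yu]+\\W*$", "") // year last, after a space
                .replaceFirst("\\W*[yu]+\\W*$", "");   // year last otherwise
        return DateTimeFormatter.ofPattern(pattern, locale);
    }

    public static void main(String[] args) {
        LocalDate date = LocalDate.of(2020, Month.DECEMBER, 31);
        Locale[] locales = {Locale.US, Locale.FRENCH, Locale.GERMAN, Locale.forLanguageTag("sv")};
        for (Locale locale : locales) {
            System.out.println(locale + ": " + date.format(monthDayFormatter(locale)));
        }
    }
}
```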
Is this a bug or a feature?
The DateTimeFormatter JavaDoc explicitly states that when I use the OOOO pattern in my formatter, the full form of localized timezone should be used (emphasis mine):
Four letters outputs the full form, which is localized offset text, such as 'GMT, with 2-digit hour and minute field, optional second field if non-zero, and colon, for example 'GMT+08:00'.
But in case the time is in GMT+0:
DateTimeFormatter formatter = DateTimeFormatter.ofPattern("EEE yyyy.MM.dd HH:mm:ss.SSS OOOO");
String timestamp = OffsetDateTime.ofInstant(Instant.now(), ZoneOffset.UTC).format(formatter);
System.out.println(timestamp);
This is the output:
Mon 2019.02.25 22:30:00.586 GMT
Expected:
Mon 2019.02.25 22:30:00.586 GMT+00:00
A bug? We seem to agree that the observed behaviour does not agree with the documentation (or at least you will have to do a very creative reading of the documentation to make it match).
A feature? As far as I can tell the observed behaviour is a conscious decision at some point. The source code for the private inner class LocalizedOffsetIdPrinterParser inside DateTimeFormatterBuilder contains if (totalSecs != 0) { before printing hours, minutes and seconds. It doesn’t look like a copy-paste error since the exact same code line is nowhere else in the file (offset 0 is treated specially in a number of places, but I am not aware of anywhere else it is left out completely).
On Java 8, format pattern OOOO parses neither GMT alone nor GMT+00:00, which must be a bug. It’s fixed in Java 11: there OOOO parses GMT alone just fine, so they must have considered this acceptable (though it parses GMT+00:00 and GMT-00:00 too).
You may consider filing a bug with Oracle and/or OpenJDK (I’m unsure about the right place these days). Whether they will reject it, fix the documentation or fix the code — I dare not try to guess.
Workaround: 'GMT'xxx
Anyway, I want my +00:00 somehow.
DateTimeFormatter formatter = DateTimeFormatter.ofPattern("EEE yyyy.MM.dd HH:mm:ss.SSS 'GMT'xxx");
Wed 2019.02.27 08:46:43.226 GMT+00:00
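A sketch comparing OOOO with the 'GMT'xxx workaround on a fixed timestamp, so the output is reproducible. Locale.ENGLISH is an assumption added here so that EEE and the localized GMT text do not depend on the default locale. Lowercase xxx is used because, unlike XXX, it prints +00:00 rather than Z for a zero offset.

```java
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

class GmtOffsetFormats {
    // Localized offset: prints just "GMT" when the offset is zero.
    static final DateTimeFormatter LOCALIZED_OFFSET =
            DateTimeFormatter.ofPattern("EEE yyyy.MM.dd HH:mm:ss.SSS OOOO", Locale.ENGLISH);
    // Literal "GMT" followed by the offset as +HH:mm, never "Z".
    static final DateTimeFormatter LITERAL_GMT =
            DateTimeFormatter.ofPattern("EEE yyyy.MM.dd HH:mm:ss.SSS 'GMT'xxx", Locale.ENGLISH);

    public static void main(String[] args) {
        OffsetDateTime utc = OffsetDateTime.of(2019, 2, 25, 22, 30, 0, 586_000_000, ZoneOffset.UTC);
        System.out.println(utc.format(LOCALIZED_OFFSET)); // Mon 2019.02.25 22:30:00.586 GMT
        System.out.println(utc.format(LITERAL_GMT));      // Mon 2019.02.25 22:30:00.586 GMT+00:00
    }
}
```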
This is not a bug.
The Java specification follows LDML (CLDR). See here for the definition of "OOOO" and here for the definition of "localized GMT format":
Localized GMT format: A constant, specific offset from GMT (or UTC), which may be in a translated form. There are two styles for this. The first is used when there is an explicit non-zero offset from GMT; this style is specified by the <gmtFormat> element and <hourFormat> element. The long format always uses 2-digit hours field and minutes field, with optional 2-digit seconds field. The short format is intended for the shortest representation and uses hour fields without leading zero, with optional 2-digit minutes and seconds fields. The digits used for hours, minutes and seconds fields in this format are the locale's default decimal digits:
"GMT+03:30" (long)
"GMT+3:30" (short)
"UTC-03.00" (long)
"UTC-3" (short)
"Гриинуич+03:30" (long)
Otherwise (when the offset from GMT is zero, referring to GMT itself) the style specified by the <gmtZeroFormat> element is used:
"GMT"
"UTC"
"Гриинуич"
Since the offset from GMT is zero, the bottom clause applies, and the output is just "GMT" (or whatever is the correct localized text for your locale).
Hopefully the Javadoc can be clarified in a future release.
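To see the two CLDR styles quoted above in java.time, a sketch: one O gives the short localized GMT format and OOOO the long one, and both print only the zero-offset text for UTC. Formatting a bare ZoneOffset works because it supports the OFFSET_SECONDS field; Locale.ENGLISH and the class name are assumptions.

```java
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

class LocalizedGmtStyles {
    // "O": short style, hour without leading zero, minutes only if non-zero.
    static final DateTimeFormatter SHORT = DateTimeFormatter.ofPattern("O", Locale.ENGLISH);
    // "OOOO": long style, 2-digit hour and minute.
    static final DateTimeFormatter LONG = DateTimeFormatter.ofPattern("OOOO", Locale.ENGLISH);

    public static void main(String[] args) {
        ZoneOffset[] offsets = {ZoneOffset.ofHoursMinutes(3, 30), ZoneOffset.ofHours(-3), ZoneOffset.UTC};
        for (ZoneOffset offset : offsets) {
            System.out.println(SHORT.format(offset) + "  " + LONG.format(offset));
        }
    }
}
```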
I try this code:
import java.time.*;
...
LocalDateTime now = LocalDateTime.now();
DateTimeFormatter formatter = DateTimeFormatter.ofPattern(
"dd-MMM-yyyy HH:mm:ss.n");
System.out.format("Now = %s %n", now.format(formatter));
in order to get an output with subsecond information
Now = 12-Apr-2018 14:47:38.039578300
Unfortunately, in the first 100 ms of every second, the leading zero of the subsecond information is omitted and I get a very misleading output Now = 12-Apr-2018 14:47:38.39578300 , which can be easily misinterpreted as about 38.4 sec, or 396 ms after the full second, instead of the real 38.04 sec.
The only workaround I found is a format of ss.nnnnnnnnn with exactly 9 n, to get my desired output.
Edit:
There is something nicer, which I missed in this area when posting this question.
I'm not really interested in nanoseconds; a fractional part of seconds (about ms resolution) is what I'm really looking for.
Then, this one is much more suitable
DateTimeFormatter formatter = DateTimeFormatter.ofPattern("HH:mm:ss.SSS");
The capital S indicates the number of subsecond digits, including leading zeros of course.
To get better control of fractions you can use the builder, not just pattern letters. Specifically appendFraction.
DateTimeFormatter formatter = new DateTimeFormatterBuilder()
.appendPattern("dd-MMM-yyyy HH:mm:ss")
.appendFraction(ChronoField.NANO_OF_SECOND, 1, 9, true)
.toFormatter();
Pattern letter "n" is rarely what you want.
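A runnable sketch of the builder approach, using the timestamp from the question (the nano value 39_578_300 is chosen to reproduce 38.0395783). With minWidth 1 and maxWidth 9, appendFraction keeps the leading zero, drops trailing zeros, and always prints at least one digit. Locale.ENGLISH and the class name are assumptions.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.temporal.ChronoField;
import java.util.Locale;

class FractionFormatting {
    // appendFraction(field, minWidth, maxWidth, decimalPoint)
    static final DateTimeFormatter FORMATTER = new DateTimeFormatterBuilder()
            .appendPattern("dd-MMM-yyyy HH:mm:ss")
            .appendFraction(ChronoField.NANO_OF_SECOND, 1, 9, true)
            .toFormatter(Locale.ENGLISH); // fixed locale so MMM gives "Apr"

    public static void main(String[] args) {
        LocalDateTime t = LocalDateTime.of(2018, 4, 12, 14, 47, 38, 39_578_300);
        System.out.println(t.format(FORMATTER));             // 12-Apr-2018 14:47:38.0395783
        System.out.println(t.withNano(0).format(FORMATTER)); // 12-Apr-2018 14:47:38.0
    }
}
```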
If you want just ms resolution, you can use S instead of n:
DateTimeFormatter formatter = DateTimeFormatter
.ofPattern("dd-MMM-yyyy HH:mm:ss.SSS", Locale.US);
This will print just the first 3 fractional digits (which is ms resolution):
12-Apr-2018 14:47:38.039
Note that I used a java.util.Locale to define the language to be used for the month name. That's because the JVM might not always be set to English, and then the results might not be what you expect. Example: my JVM is set to Portuguese and the month name is "abr". Setting a specific locale eliminates this problem.
To print all the 9 digits, using either nnnnnnnnn or SSSSSSSSS will work.
We can see why it behaves like this when we check the javadoc. S and n have different presentations:
Symbol Meaning Presentation Examples
------ ------- ------------ -------
S fraction-of-second fraction 978
n nano-of-second number 987654321
S is a fraction, while n is a number. The docs tell you the difference:
Number: If the count of letters is one, then the value is output using the minimum number of digits and without padding.
Fraction: Outputs the nano-of-second field as a fraction-of-second. The nano-of-second value has nine digits, thus the count of pattern letters is from 1 to 9. If it is less than 9, then the nano-of-second value is truncated, with only the most significant digits being output.
So, just 1 n will print the value without padding (without the 0 in the beginning), leading to the wrong output you've got, while SSS will give you the correct output.
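A sketch that puts the patterns discussed above side by side on the timestamp from the question; the class and method names are hypothetical. A single n prints the nano-of-second as a plain number, nine n letters zero-pad it to nine digits, and S letters print it as a truncated fraction.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

class NanoVersusFraction {
    static String format(LocalDateTime t, String pattern) {
        return t.format(DateTimeFormatter.ofPattern(pattern));
    }

    public static void main(String[] args) {
        LocalDateTime t = LocalDateTime.of(2018, 4, 12, 14, 47, 38, 39_578_300);
        System.out.println(format(t, "HH:mm:ss.n"));         // 14:47:38.39578300 (number, no padding)
        System.out.println(format(t, "HH:mm:ss.nnnnnnnnn")); // 14:47:38.039578300 (zero-padded to 9)
        System.out.println(format(t, "HH:mm:ss.SSS"));       // 14:47:38.039 (fraction, truncated)
        System.out.println(format(t, "HH:mm:ss.SSSSSSSSS")); // 14:47:38.039578300
    }
}
```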
I don't believe there is anything better than nnnnnnnnn. Fewer than 9 n pattern letters will not give you a zero-padded nine-digit value; the DateTimeFormatter docs only describe that kind of truncation for the fraction presentation used by S:
Fraction: Outputs the nano-of-second field as a fraction-of-second. The nano-of-second value has nine digits, thus the count of pattern letters is from 1 to 9. If it is less than 9, then the nano-of-second value is truncated, with only the most significant digits being output.
n (nano-of-second) and N (nano-of-day) are the only nano pattern letters supported by DateTimeFormatter.
I'm having problems generating a regex for a range of dates.
For example, for the range [2015-11-17, 2017-10-05], how can I use a regex to validate whether a date belongs to that range?
And a second question: is it possible to have a generic regex that I can reuse for several date ranges, only replacing a few values in the regex with the new range, so that it keeps validating a range of dates but with the new bounds? Thanks in advance for your help =)
Do not use Regex
As the comments state, Regex is not appropriate for a range of dates, nor any span of time. Regex is intended to be “dumb” in the sense of looking only at the syntax of the text not the semantics (the meaning).
java.time
Use the java.time framework built into Java 8 and later.
Parse your strings into LocalDate objects.
LocalDate start = LocalDate.parse( "2015-11-17" );
Compare by calling the isEqual, isBefore, and isAfter methods.
Note that we commonly use the Half-Open approach in date-time work where the beginning is inclusive while the ending is exclusive.
These issues are covered already in many other Questions and Answers on Stack Overflow. So I have abbreviated my discussion here.
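A minimal sketch of the comparison, with both an inclusive end (as in the question's [2015-11-17, 2017-10-05]) and the Half-Open convention; the class and method names are hypothetical.

```java
import java.time.LocalDate;

class DateRange {
    // Closed range [start, end]: both ends inclusive, as in the question.
    static boolean inClosedRange(LocalDate date, LocalDate start, LocalDate end) {
        return !date.isBefore(start) && !date.isAfter(end);
    }

    // Half-Open range [start, endExclusive): beginning inclusive, ending exclusive.
    static boolean inHalfOpenRange(LocalDate date, LocalDate start, LocalDate endExclusive) {
        return !date.isBefore(start) && date.isBefore(endExclusive);
    }

    public static void main(String[] args) {
        LocalDate start = LocalDate.parse("2015-11-17");
        LocalDate end = LocalDate.parse("2017-10-05");
        LocalDate candidate = LocalDate.parse("2016-02-29");
        System.out.println(inClosedRange(candidate, start, end));               // true
        System.out.println(inHalfOpenRange(candidate, start, end.plusDays(1))); // true
    }
}
```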
Just for completeness: you can actually use regular expressions to recognize any finite set of strings, such as a specific date range; however, it would be more of an academic exercise than an actual recommended usage. Still, if you happen to be programming some arcane hardware, it could actually be necessary.
Assuming the input is always a valid date in the given format, the regex for your example could consist of:
2015-11-1[7-9] - 2015 November 17 to 19
2015-11-[23][0-9] - 2015 November 20 to 30
2015-12.* - 2015 December
2016.* - all dates of 2016
2017-0[1-9].* - 2017 January to September
2017-10-0[1-5] - 2017 October 1 to 5
Make a disjunction of these using | (a|b|c|...), apply the escaping of the regex implementation you use, and then you have your date checker. If the input is not guaranteed to be a valid date, it gets a bit more complicated but is still possible.
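A sketch of the whole disjunction in Java for the range from the question (2015-11-17 to 2017-10-05, both inclusive), under the same assumption that the input is already a valid date in yyyy-MM-dd form. Matcher.matches() requires the whole input to match one alternative, so no ^ or $ anchors are needed. The class name is hypothetical.

```java
import java.util.regex.Pattern;

class RegexDateRange {
    // Only correct for input that is already a real date in yyyy-MM-dd form.
    static final Pattern IN_RANGE = Pattern.compile(
            "2015-11-1[7-9]"        // 2015 November 17 to 19
            + "|2015-11-[23][0-9]"  // 2015 November 20 to 30
            + "|2015-12.*"          // 2015 December
            + "|2016.*"             // all dates of 2016
            + "|2017-0[1-9].*"      // 2017 January to September
            + "|2017-10-0[1-5]");   // 2017 October 1 to 5

    static boolean inRange(String isoDate) {
        return IN_RANGE.matcher(isoDate).matches();
    }

    public static void main(String[] args) {
        for (String d : new String[] {"2015-11-16", "2015-11-17", "2016-02-29", "2017-10-05", "2017-10-06"}) {
            System.out.println(d + " -> " + inRange(d));
        }
    }
}
```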
By default, the toString method of Instant uses the DateTimeFormatter.ISO_INSTANT formatter. That formatter won’t print the digits for fraction-of-second if they happen to be 0.
java.time examples:
2015-10-08T17:13:07.589Z
2015-10-08T17:13:07Z
Joda-Time examples (and what I'd expect from java.time):
2015-10-08T17:13:07.589Z
2015-10-08T17:13:07.000Z
This is really frustrating to parse in some systems. Elasticsearch was the first problem I encountered, there's no pre-defined format that supports optional millis, but I can probably work around that with a custom format. The default just seems wrong.
It appears that you can’t really build your own format string for Instants anyway. Is the only option implementing my own java.time.format.DateTimeFormatterBuilder.InstantPrinterParser?
Just create a DateTimeFormatter that keeps three fractional digits.
DateTimeFormatter formatter = new DateTimeFormatterBuilder().appendInstant(3).toFormatter();
Then use it. For example:
System.out.println(formatter.format(Instant.now()));
System.out.println(formatter.format(Instant.now().truncatedTo(ChronoUnit.SECONDS)));
…prints (at the time I run it):
2015-10-08T21:26:16.571Z
2015-10-08T21:26:16.000Z
Excerpt of the class doc:
… The fractionalDigits parameter allows the output of the fractional second to be controlled. Specifying zero will cause no fractional digits to be output. From 1 to 9 will output an increasing number of digits, using zero right-padding if necessary. The special value -1 is used to output as many digits as necessary to avoid any trailing zeroes. …
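A reproducible version of the example above, using fixed instants instead of Instant.now(). It also shows that Instant.toString (which uses ISO_INSTANT) drops a zero fraction, and that appendInstant(3) truncates rather than rounds extra digits. The class name is hypothetical.

```java
import java.time.Instant;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;

class InstantMillis {
    // Always prints exactly three fractional digits, right-padding with zeros.
    static final DateTimeFormatter MILLIS =
            new DateTimeFormatterBuilder().appendInstant(3).toFormatter();

    public static void main(String[] args) {
        Instant withMillis = Instant.parse("2015-10-08T17:13:07.589Z");
        Instant wholeSecond = Instant.parse("2015-10-08T17:13:07Z");
        System.out.println(wholeSecond);                // 2015-10-08T17:13:07Z
        System.out.println(MILLIS.format(withMillis));  // 2015-10-08T17:13:07.589Z
        System.out.println(MILLIS.format(wholeSecond)); // 2015-10-08T17:13:07.000Z
    }
}
```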