Can't get regex to work - java

I am trying to figure out how to write a regex that will match a time range. The time can look like this: 11:15-12:15 or 11-12:15 or 11-12 and so on. What I currently have is this:
\\d{2}:?\\d{0,2}-{1}\\d{2}:?\\d{0,2}
which works until a date comes along: it also matches part of a string like 2013-11-05. I don't want it to find dates. I know I should use lookbehind, but I can't get it to work.
I am using Jsoup's Element.getElementsMatchingOwnText method, if that is relevant.
The time string is embedded in HTML source like this (with more text above and below):
<td class="text">2013-11-04</td>

Try this. Start with the base regex:
\d{1,2}(:\d\d)?-\d{1,2}(:\d\d)?
That is:
one-to-two digits, optionally followed by : and two more digits
followed by a hyphen
followed by one-to-two digits, optionally followed by : and two more digits
This matches all your core cases:
11-12
1-2
1:15-2
10-3:45
2:15-11:30
etc. Now mix in negative lookbehind and negative lookahead to invalidate matches that appear within undesired contexts. Let's invalidate the match when a digit or dash or colon appears directly to the left or right of the match:
The negative lookbehind: (?<!\d|-|:)
The negative lookahead: (?!\d|-|:)
Slap the neg-lookbehind at the beginning, and the neg-lookahead at the end, you get:
(?<!\d|-|:)(\d{1,2}(:\d\d)?-\d{1,2}(:\d\d)?)(?!\d|-|:)
or as a Java String (by request)
Pattern p = Pattern.compile("(?<!\\d|-|:)(\\d{1,2}(:\\d\\d)?-\\d{1,2}(:\\d\\d)?)(?!\\d|-|:)");
Now while the lookaround has eliminated matches within dates, you're still matching some silly things like 99:99-88:88 because \d matches any digit 0-9. You can mix more restrictive character classes into this regex to address that issue. For example, with a 12-hour clock:
For the hour part, use
(1[0-2]|0?[1-9])
instead of
\d{1,2}
For the minute part use
(0[0-9]|[1-5][0-9])
instead of
\d\d
Mixing the more restrictive character classes into the regex yields this nearly impossible to grok and maintain beast:
(?<!\d|-|:)(((1[0-2]|0?[1-9]))(:((0[0-9]|[1-5][0-9])))?-(1[0-2]|0?[1-9])(:((0[0-9]|[1-5][0-9])))?)(?!\d|-|:)
As Java code:
Pattern p = Pattern.compile("(?<!\\d|-|:)(((1[0-2]|0?[1-9]))(:((0[0-9]|[1-5][0-9])))?-(1[0-2]|0?[1-9])(:((0[0-9]|[1-5][0-9])))?)(?!\\d|-|:)");
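As a quick check of the lookaround version, here is a minimal sketch that runs the simpler \d{1,2} pattern over a line containing both time ranges and a date. The class name TimeRangeDemo, the helper findRanges and the sample sentence are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TimeRangeDemo {
    // The lookaround pattern from above (the \d{1,2} version, not the 12-hour one).
    private static final Pattern TIME_RANGE = Pattern.compile(
            "(?<!\\d|-|:)(\\d{1,2}(:\\d\\d)?-\\d{1,2}(:\\d\\d)?)(?!\\d|-|:)");

    // Hypothetical helper: collects every time range found in the text.
    static List<String> findRanges(String text) {
        List<String> ranges = new ArrayList<>();
        Matcher m = TIME_RANGE.matcher(text);
        while (m.find()) {
            ranges.add(m.group(1));
        }
        return ranges;
    }

    public static void main(String[] args) {
        String text = "On 2013-11-05 we meet 11:15-12:15, 11-12:15 or 11-12.";
        // The date is skipped because each candidate inside it touches a digit or a dash.
        System.out.println(findRanges(text)); // prints [11:15-12:15, 11-12:15, 11-12]
    }
}
```

Every candidate inside the date has a digit or a hyphen directly to its left, so the negative lookbehind rejects it before the rest of the pattern is tried.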

Simple method:
((\d{2}(:\d{2})?)-?){2}
A safer, more verbose regular expression:
([0-1]?[0-9]|[2][0-3])(:([0-5][0-9]))?-([0-1]?[0-9]|[2][0-3])(:([0-5][0-9]))?
Example in action:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class App {
    private static final String TIME_FORMAT = "%02d:%02d";
    private static final String TIME_RANGE = "([0-1]?[0-9]|[2][0-3])(:([0-5][0-9]))?-([0-1]?[0-9]|[2][0-3])(:([0-5][0-9]))?";

    public static void main(String[] args) {
        String passage = "The time can look like this: 11:15-12:15 or 11-12:15 or 11-12 and so on.";
        Pattern pattern = Pattern.compile(TIME_RANGE);
        Matcher matcher = pattern.matcher(passage);
        int count = 0;
        while (matcher.find()) {
            // Groups 1 and 4 hold the hours; groups 3 and 6 hold the optional minutes.
            String time1 = formattedTime(matcher.group(1), matcher.group(3));
            String time2 = formattedTime(matcher.group(4), matcher.group(6));
            System.out.printf("Time #%d: %s - %s\n", count, time1, time2);
            count++;
        }
    }

    private static String formattedTime(String strHour, String strMinute) {
        int intHour = parseInt(strHour);
        int intMinute = parseInt(strMinute);
        return String.format(TIME_FORMAT, intHour, intMinute);
    }

    // A missing (null) group is treated as 0, so "11" becomes 11:00.
    private static int parseInt(String str) {
        return str != null ? Integer.parseInt(str) : 0;
    }
}
Output:
Time #0: 11:15 - 12:15
Time #1: 11:00 - 12:15
Time #2: 11:00 - 12:00

Related

Java regular expressions starts and ends with and contains

I have a file that I need to use regex to replace a specific character.
I have strings of the following format:
1234 4215 "aaa.bbb" 5215 1524
and I need to replace the periods with colons.
I know that these periods are always contained within quotation marks, so I need a regex that finds a substring that starts with '"', ends with '"', and contains "." and replace the "." with ":". Could someone shed some light?
You can use:
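str = str.replaceAll("\\.(?!(([^\"]*\"){2})*[^\"]*$)", ":");
This regex matches a dot only when it is inside double quotes: the negative lookahead rejects any dot that is followed by an even number of quotes, so every dot it keeps is followed by an odd number of quotes and must sit inside a quoted string (assuming the quotes are balanced).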
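A minimal sketch of that lookahead in use, assuming balanced quotes on each line. The class name QuotedDotDemo, the helper dotsToColons and the second sample line are made up for illustration; note that the double quotes inside the pattern have to be escaped in a Java string literal.

```java
public class QuotedDotDemo {
    // Same regex as above, with the inner double quotes escaped for a Java string literal.
    private static final String DOT_INSIDE_QUOTES = "\\.(?!(([^\"]*\"){2})*[^\"]*$)";

    // Hypothetical helper: replaces only the dots that sit inside double quotes.
    static String dotsToColons(String line) {
        return line.replaceAll(DOT_INSIDE_QUOTES, ":");
    }

    public static void main(String[] args) {
        System.out.println(dotsToColons("1234 4215 \"aaa.bbb\" 5215 1524")); // 1234 4215 "aaa:bbb" 5215 1524
        System.out.println(dotsToColons("1.5 \"a.b.c\" 2.5"));               // 1.5 "a:b:c" 2.5
    }
}
```

Dots outside the quotes are followed by zero or two quotes, so the lookahead leaves them alone.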
Update
After thinking about it: your question says "periods", so there may be more than one period inside the double quotes.
Here's a way to cover that scenario:
public static void main(String[] args) throws Exception {
    String str = "1234 \"aaa.bbb\" \"a.aa.b.bb\" 5215 1524 \"12.345.123\" \".sage.\" \".afwe\" \"....\"";
    // Find all substrings in double quotes
    Matcher matcher = Pattern.compile("\"(.*?)\"").matcher(str);
    while (matcher.find()) {
        // Extract the match
        String match = matcher.group(1);
        // Replace all the periods with colons
        match = match.replaceAll("\\.", ":");
        // Replace the original matched group with the new string
        str = str.replace(matcher.group(1), match);
    }
    System.out.println(str);
}
Results:
1234 "aaa:bbb" "a:aa:b:bb" 5215 1524 "12:345:123" ":sage:" ":afwe" "::::"
After testing @anubhava's pattern: it produces the same results, so more credit to him for simplicity (+1).
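One caveat with the loop above: str.replace(...) substitutes every occurrence of the matched text, so an identical substring outside the quotes gets changed as well. A possible variant, sketched here with Matcher.appendReplacement and appendTail, rewrites only the matched regions. The class name QuotedSpanDemo and the helper colonize are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuotedSpanDemo {
    // A double-quoted span: a quote, any non-quote characters, a closing quote.
    private static final Pattern QUOTED = Pattern.compile("\"[^\"]*\"");

    // Hypothetical helper: turns '.' into ':' inside quoted spans only.
    static String colonize(String input) {
        Matcher m = QUOTED.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String replaced = m.group().replace('.', ':');
            // quoteReplacement keeps any '$' or '\' in the text from being read as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(replaced));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // The unquoted aaa.bbb and 1.5 are left untouched.
        System.out.println(colonize("aaa.bbb \"aaa.bbb\" 1.5")); // aaa.bbb "aaa:bbb" 1.5
    }
}
```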
OLD ANSWER
You can try this pattern in a String.replaceAll()
"\"([^\\.]*?)(\\.)([^\\.]*?)\""
With a replacement of
"\"$1:$3\""
This essentially captures the contents between the double quotes into three groups:
Group 1 ($1) - zero or more characters (*?) that are not a period
Group 2 ($2) - the period
Group 3 ($3) - zero or more characters (*?) that are not a period
and replaces the match with "{Group 1}:{Group 3}". Because neither outer group may contain a period, this handles only one period per quoted string.
public static void main(String[] args) throws Exception {
    String str = "1234 4215 \"aaa.bbb\" 5215 1524 \"12345.123\" \"sage.\" \".afwe\" \".\"";
    System.out.println(str.replaceAll("\"([^\\.]*?)(\\.)([^\\.]*?)\"", "\"$1:$3\""));
}
Results:
1234 4215 "aaa:bbb" 5215 1524 "12345:123" "sage:" ":afwe" ":"

Find a number of a given number of digits between given separators

What regex/pattern can I use to find the following pattern in a string?
#nnnn:
nnnn can be any 4-digit number, as long as it is surrounded by a hash sign and a colon.
I have tried the code below:
String string = "#8226:";
if (string.matches(".*\\d:.*")) {
    System.out.println("Yes");
}
It DOES work, but it also matches other strings like these:
"This is a string 1234: Hahaha!" // Outputs "Yes"
"Hello 1834: World!!!" // Outputs "Yes"
I want it to match only the pattern at the top of the question.
Can anybody tell me where I went wrong?
It can be done with a regular expression:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FindPattern {
    public static void main(String[] args) {
        Pattern pattern = Pattern.compile("#[0-9]{4}:");
        String text = "#1233:#3433:abc#3993: #a343:___#8888:ki";
        Matcher matcher = pattern.matcher(text);
        // Print every "#nnnn:" token found in the text
        while (matcher.find()) {
            System.out.println(matcher.group());
        }
    }
}
output is:
#1233:
#3433:
#3993:
#8888:
You already have a pattern: #nnnn:. The only problem is that this is not a Java-compatible regular expression. Let's convert it.
# and : are valid character literals, so leave these untouched.
As you probably know (judging by your attempt), a digit is denoted by the \d sequence (there are alternatives, e.g. [0-9] or \p{Digit}). Just replace each n with \d:
#\d\d\d\d:
There are four equal subpatterns here, so we can shorten it with a fixed quantifier:
#\d{4}:
You can now write string.matches("#\\d{4}:"). Note that this is slow because it compiles the regex pattern on every call. If this code is called frequently, consider using a precompiled Pattern:
Pattern HASH_NUMBER_COLON_PATTERN = Pattern.compile("#\\d{4}:");
// ...
if (HASH_NUMBER_COLON_PATTERN.matcher(yourString).matches()) {
    // ...
}
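To make the difference between a whole-string match and a search concrete, here is a small sketch built on the precompiled pattern. The class name HashNumberDemo and the two helper methods are made up for illustration.

```java
import java.util.regex.Pattern;

public class HashNumberDemo {
    private static final Pattern HASH_NUMBER_COLON = Pattern.compile("#\\d{4}:");

    // True only when the whole string is "#nnnn:" (matches() anchors at both ends).
    static boolean isExactly(String s) {
        return HASH_NUMBER_COLON.matcher(s).matches();
    }

    // True when "#nnnn:" occurs anywhere inside the string (find() searches).
    static boolean containsToken(String s) {
        return HASH_NUMBER_COLON.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(isExactly("#8226:"));                    // true
        System.out.println(isExactly("Hello 1834: World!!!"));      // false
        System.out.println(containsToken("abc#3993: and more"));    // true
        System.out.println(containsToken("This is a string 1234:")); // false, no '#'
    }
}
```

Which of the two is appropriate depends on whether the input is expected to be the token alone or a longer text containing it.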
It can be even better to use a regular expression builder library, such as regex-builder, JavaVerbalExpressions or RegexBee. These tools can make your intention very clear. RegexBee example:
Pattern HASH_NUMBER_COLON_PATTERN = Bee
        .then(Bee.fixedChar('#'))
        .then(Bee.intBetween(1000, 9999))
        .then(Bee.fixedChar(':'))
        .toPattern();

java regular expression

Can anyone please help me do the following in a java regular expression?
I need to read 3 characters from the 5th position from a given String ignoring whatever is found before and after.
Example : testXXXtest
Expected result : XXX
You don't need regex at all.
Just use substring: yourString.substring(4,7)
Since you do need to use regex, you can do it like this:
Pattern pattern = Pattern.compile(".{4}(.{3}).*");
Matcher matcher = pattern.matcher("testXXXtest");
matcher.matches();
String whatYouNeed = matcher.group(1);
What does it mean, step by step:
.{4} - any four characters
( - start capturing group, i.e. what you need
.{3} - any three characters
) - end capturing group, you got it now
.* - followed by 0 or more arbitrary characters
matcher.group(1) - get the 1st (only) capturing group.
You should be able to use the substring() method to accomplish this:
String example = "testXXXtest";
String result = example.substring(4, 7);
This might help: Groups and capturing in java.util.regex.Pattern.
Here is an example:
import java.util.regex.Pattern;
import java.util.regex.Matcher;

public class Example {
    public static void main(String[] args) {
        String text = "This is a testWithSomeDataInBetweentest.";
        Pattern p = Pattern.compile("test([A-Za-z0-9]*)test");
        Matcher m = p.matcher(text);
        if (m.find()) {
            // Group 1 is whatever sits between the two "test" strings
            System.out.println("Matched: " + m.group(1));
        } else {
            System.out.println("No match.");
        }
    }
}
This prints:
Matched: WithSomeDataInBetween
If you want to match the entire input string against the pattern (rather than seek a substring that matches), use matches() instead of find(). You can continue searching for further matching substrings with subsequent calls to find().
Also, your question did not specify which characters are admissible or how long the string between the two "test" strings may be. I assumed any length is OK, including zero, and that we seek a substring composed of lowercase and uppercase letters as well as digits.
You can use substring for this, you don't need a regex.
yourString.substring(4,7);
I'm sure you could use a regex too, but why, if you don't need it? Of course, you should protect this code against null and against strings that are too short.
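A minimal sketch of such a guard, assuming "too short" means fewer than 7 characters. The class name MiddleChars and the helper charsFiveToSeven are made up for illustration, and returning an Optional is just one way to signal "no result".

```java
import java.util.Optional;

public class MiddleChars {
    // Hypothetical helper: returns the 5th to 7th characters (indices 4..6),
    // or an empty Optional when the input is null or shorter than 7 characters.
    static Optional<String> charsFiveToSeven(String s) {
        if (s == null || s.length() < 7) {
            return Optional.empty();
        }
        return Optional.of(s.substring(4, 7));
    }

    public static void main(String[] args) {
        System.out.println(charsFiveToSeven("testXXXtest").orElse("(none)")); // XXX
        System.out.println(charsFiveToSeven("test").orElse("(none)"));        // (none)
    }
}
```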
Use the String.replaceAll() method
If performance is not critical, you can try the String.replaceAll() method for a cleaner option:
String sDataLine = "testXXXtest";
String sWhatYouNeed = sDataLine.replaceAll( ".{4}(.{3}).*", "$1" );
References
https://docs.oracle.com/javase/1.5.0/docs/api/java/lang/String.html
http://www.vogella.com/tutorials/JavaRegularExpressions/article.html#using-regular-expressions-with-string-methods

Regex to get first number in string with other characters

I'm new to regular expressions, and was wondering how I could get only the first number in a string like 100 2011-10-20 14:28:55. In this case, I'd want it to return 100, but the number could also be shorter or longer.
I was thinking about something like [0-9]+, but it takes every single number separately (100,2001,10,...)
Thank you.
/^[^\d]*(\d+)/
This will start at the beginning, skip any non-digits, and match the first sequence of digits it finds
EDIT:
this Regex will match the first group of numbers, but, as pointed out in other answers, parseInt is a better solution if you know the number is at the beginning of the string
Try this to match the first standalone number in the string (it need not be at the beginning of the string):
String s = "2011-10-20 525 14:28:55 10";
Pattern p = Pattern.compile("(^|\\s)([0-9]+)($|\\s)");
Matcher m = p.matcher(s);
if (m.find()) {
    // Group 2 is the number itself, without the surrounding whitespace
    System.out.println(m.group(2));
}
Just
([0-9]+) .*
If you always have the space after the first number, this will work
Assuming there's always a space between the first two numbers, then
preg_match('/^(\d+)/', $number_string, $matches);
$number = $matches[1]; // 100
But for something like this, you'd be better off using simple string operations:
$space_pos = strpos($number_string, ' ');
$number = substr($number_string, 0, $space_pos);
Regexes are comparatively expensive and are best avoided when simple string operations will do.
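The code below would do the trick in Java. Integer.parseInt throws a NumberFormatException when handed the whole string, so take the first whitespace-separated token first:
Integer num = Integer.parseInt("100 2011-10-20 14:28:55".split("\\s+")[0]);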
[0-9] means any digit from 0 to 9, and + means one or more times. [0-9]{3} would match exactly three digits.
Try ^(?'num'[0-9]+).*$ which forces it to start at the beginning, read a number, store it to 'num' and consume the remainder without binding.
This string extension works even when the string does not start with a number.
It returns 1234 in each of these cases: "1234asdfwewf", "%sdfsr1234", "## # 1234"
public static string GetFirstNumber(this string source)
{
    if (string.IsNullOrEmpty(source) == false)
    {
        // take non digits from string start
        string notNumber = new string(source.TakeWhile(c => Char.IsDigit(c) == false).ToArray());
        if (string.IsNullOrEmpty(notNumber) == false)
        {
            //replace non digit chars from string start
            source = source.Replace(notNumber, string.Empty);
        }
        //take digits from string start
        source = new string(source.TakeWhile(char.IsDigit).ToArray());
    }
    return source;
}
NOTE: In Java, when you define the patterns as string literals, do not forget to use double backslashes to define a regex escaping backslash (\. = "\\.").
To get a number that appears at the start of a string, consider using
^[0-9]*\.?[0-9]+ # Float or integer, leading digit may be missing (e.g, .35)
^-?[0-9]*\.?[0-9]+ # Optional - before number (e.g. -.55, -100)
^[-+]?[0-9]*\.?[0-9]+ # Optional + or - before number (e.g. -3.5, +30)
If you want to also match numbers with scientific notation at the start of the string, use
^[0-9]*\.?[0-9]+([eE][+-]?[0-9]+)? # Just number
^-?[0-9]*\.?[0-9]+([eE][+-]?[0-9]+)? # Number with an optional -
^[-+]?[0-9]*\.?[0-9]+([eE][+-]?[0-9]+)? # Number with an optional - or +
To make sure there is no other digit on the right, add a \b word boundary, or a (?!\d) or (?!\.?\d) negative lookahead that will fail the match if there is any digit (or . and a digit) on the right.
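For Java specifically, a minimal sketch of the signed float-or-integer pattern from above, written as a string literal with doubled backslashes. The class name LeadingNumberDemo and the helper leadingNumber are made up for illustration.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LeadingNumberDemo {
    // ^[-+]?[0-9]*\.?[0-9]+ as a Java string literal.
    private static final Pattern LEADING_NUMBER = Pattern.compile("^[-+]?[0-9]*\\.?[0-9]+");

    // Hypothetical helper: the number at the very start of the string, if there is one.
    static Optional<String> leadingNumber(String s) {
        Matcher m = LEADING_NUMBER.matcher(s);
        return m.find() ? Optional.of(m.group()) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(leadingNumber("100 2011-10-20 14:28:55").orElse("(none)")); // 100
        System.out.println(leadingNumber("-3.5 degrees").orElse("(none)"));            // -3.5
        System.out.println(leadingNumber("abc 12").orElse("(none)"));                  // (none)
    }
}
```

Because of the ^ anchor, find() can only succeed at position 0, so a number later in the string is not returned.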
public static void main(String[] args) {
    Scanner s = new Scanner(System.in);
    String str = s.nextLine();
    Pattern p = Pattern.compile("[0-9]+");
    Matcher m = p.matcher(str);
    // Prints every run of digits; use a single if (m.find()) to get only the first one.
    while (m.find()) {
        System.out.println(m.group() + " ");
    }
}
\d+
\d stands for any decimal digit, and + extends the match to all digits that follow directly, until a non-digit character such as a space or a letter.

How do I make a regex match for measurement units?

I'm building a small Java library which has to match units in strings. For example, if I have "300000000 m/s^2", I want it to match against "m" and "s^2".
So far, I have tried most imaginable (by me) configurations resembling (I hope it's a good start)
"[[a-zA-Z]+[\\^[\\-]?[0-9]+]?]+"
To clarify, I need something that will match letters[^[-]numbers] (where [ ] denotes non obligatory parts). That means: letters, possibly followed by an exponent which is possibly negative.
I have studied regex a little bit, but I'm really not fluent, so any help will be greatly appreciated!
Thank you very much,
EDIT:
I have just tried the first 3 replies
String regex1 = "([a-zA-Z]+)(?:\\^(-?\\d+))?";
String regex2 = "[a-zA-Z]+(\\^-?[0-9]+)?";
String regex3 = "[a-zA-Z]+(?:\\^-?[0-9]+)?";
and it doesn't work... I know the code which tests the patterns works, because if I try something simple, like matching "[0-9]+" in "12345", it will match the whole string. So I don't get what's still wrong. At the moment I'm trying to change my brackets to parentheses where needed...
CODE USED TO TEST:
public static void main(String[] args) {
    String input = "30000 m/s^2";
    // String input = "35345";
    String regex1 = "([a-zA-Z]+)(?:\\^(-?\\d+))?";
    String regex2 = "[a-zA-Z]+(\\^-?[0-9]+)?";
    String regex3 = "[a-zA-Z]+(?:\\^-?[0-9]+)?";
    String regex10 = "[0-9]+";
    String regex = "([a-zA-Z]+)(?:\\^\\-?[0-9]+)?";
    Pattern pattern = Pattern.compile(regex3);
    Matcher matcher = pattern.matcher(input);
    if (matcher.matches()) {
        System.out.println("MATCHES");
        do {
            int start = matcher.start();
            int end = matcher.end();
            // System.out.println(start + " " + end);
            System.out.println(input.substring(start, end));
        } while (matcher.find());
    }
}
([a-zA-Z]+)(?:\^(-?\d+))?
You don't need to use the character class [...] if you're matching a single character. (...) here is a capturing bracket for you to extract the unit and exponent later. (?:...) is non-capturing grouping.
You're mixing up square brackets, which denote character classes, with parentheses, which group. Try this instead:
[a-zA-Z]+(\^-?[0-9]+)?
In many regular expression dialects you can use \d to mean any digit instead of [0-9].
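A likely reason the test code in the question prints nothing is that it calls matcher.matches(), which succeeds only when the entire input matches the pattern, and "30000 m/s^2" as a whole is not a single unit. A minimal sketch that searches with find() in a loop instead; the class name UnitDemo and the helper findUnits are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UnitDemo {
    // Letters, optionally followed by ^ and a possibly negative integer exponent.
    private static final Pattern UNIT = Pattern.compile("([a-zA-Z]+)(?:\\^(-?\\d+))?");

    // Hypothetical helper: every unit token found in the input, exponent included.
    static List<String> findUnits(String input) {
        List<String> units = new ArrayList<>();
        Matcher m = UNIT.matcher(input);
        while (m.find()) {
            units.add(m.group());
        }
        return units;
    }

    public static void main(String[] args) {
        System.out.println(UNIT.matcher("30000 m/s^2").matches()); // false: the whole input is not one unit
        System.out.println(findUnits("300000000 m/s^2"));          // [m, s^2]
    }
}
```

Inside the loop, m.group(1) gives the letters and m.group(2) gives the exponent, or null when the unit has none.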
Try
"[a-zA-Z]+(?:\\^-?[0-9]+)?"
