Java Regex "-[0-9]{0,}" seems to match "-abc" - java

Regex:
"-[0-9]{0,}"
String:
"-abc"
According to the test here, that should not happen. I assume I'm doing something wrong in my code.
Code:
public static void main(String[] args) {
    String s = "-abc";
    String regex = "-[0-9]{0,}";
    Pattern pattern = Pattern.compile(regex);
    Matcher matcher = pattern.matcher(s);
    while (matcher.find()) {
        if (matcher.group().length() == 0)
            break;
        // get the number less the dash
        int beginIndex = matcher.start();
        int endIndex = matcher.end();
        String number = s.substring(beginIndex + 1, endIndex);
        s = s.replaceFirst(regex, "negative " + number);
    }
    System.out.println(s);
}
Some context: The speech synthesis program I use cannot pronounce numbers with a leading negative sign, so it must be replaced with the word "negative".

-[0-9]{0,}
means your string must contain a -, followed by zero or more digits.
So -abc is the zero-digit case: the - on its own is a match.
You didn't anchor the pattern with ^ and $, so with find() your regex also matches inside foo-bar, lll-0, and even abc-.

{0,} has exactly the same meaning as *. Your regexp thus means "a dash that can be followed by digits". -abc contains a dash, so the pattern is found.
-\d+ should suit your needs better (don't forget to escape the backslash for java: -\\d+).
If you want the whole string to match the pattern, anchor your regexp with ^ and $: ^-\d+$.
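As a rough illustration of the -\d+ suggestion, the sketch below replaces every dash that is directly followed by digits with the word "negative". The class name NegativeNumberDemo and the method spellOutNegatives are made up for this example; the capture group and the "$1" back-reference are one possible way to keep the digits, not something the original code used.

```java
import java.util.regex.Pattern;

public class NegativeNumberDemo {
    // A dash followed by one or more digits; the digits are captured in group 1.
    private static final Pattern NEGATIVE = Pattern.compile("-(\\d+)");

    // Hypothetical helper: replaces every "-<digits>" with "negative <digits>".
    static String spellOutNegatives(String s) {
        return NEGATIVE.matcher(s).replaceAll("negative $1");
    }

    public static void main(String[] args) {
        // No digit after the dash, so nothing is replaced.
        System.out.println(spellOutNegatives("-abc"));
        // The dash before 12 is replaced.
        System.out.println(spellOutNegatives("it is -12 degrees"));
    }
}
```

Because the pattern is not anchored, a dash inside a word such as foo-12 would also be replaced; whether that is acceptable depends on the input text.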

Related

Extracting a substring from String with regex in Java (with condition)

I need to extract a substring from a string using regex. The tricky (for me) part is that the string may be in one of two formats:
either LLDDDDLDDDDDDD/DDD (eg. AB1000G242424/001) or just between 1 and 7 digits (eg. 242424).
The substring I need to extract needs to be:
If string is 7 digits or longer, then extract substring consisting of 7 digits.
Else (if string is shorter than 7 digits), then extract substring consisting of 1-6 digits.
Below is one of my tries.
String regex = ("([0-9]{7}|[0-9]{0,6})");
Pattern pattern = Pattern.compile(regex);
Matcher matcher;
matcher = pattern.matcher("242424");
String extractedNr1 = "";
while (matcher.find()) {
    extractedNr1 += matcher.group();
}
matcher = pattern.matcher("AB1000G242424/001");
String extractedNr2 = "";
while (matcher.find()) {
    extractedNr2 += matcher.group();
}
System.out.println("ExtractedNr1 = " + extractedNr1);
System.out.println("ExtractedNr2 = " + extractedNr2);
Output:
ExtractedNr1 = 242424
ExtractedNr2 = 1000242424001
I understand the second one is a concatenation of all the matches, but I don't understand why the matches are arranged like that. Can I make a regex that will stop immediately after finding a match (with priority for the first option, that is 7 digits)?
I thought about using some conditional statement, but apparently these are not supported in java.util.regex, and I cannot use third party library.
I can do this in java obviously, but the whole point is in using regex.
Regex is a secondary concern here; the runs of digits must be compared by length. Since in regex \d stands for a digit and \D for a non-digit, you can use Pattern.splitAsStream as follows:
Optional<String> takeDigits(String s) {
    return Pattern.compile("\\D+").splitAsStream(s)
            .filter(w -> !w.isEmpty() && w.length() <= 7)
            .max(Comparator.comparingInt(String::length));
}
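To see what this approach returns for the two inputs from the question, here is a small runnable sketch. It assumes Java 8 or later; the class name LongestDigitRun and the third sample input are only for illustration.

```java
import java.util.Comparator;
import java.util.Optional;
import java.util.regex.Pattern;

public class LongestDigitRun {
    // One or more non-digit characters act as the separator between digit runs.
    private static final Pattern NON_DIGITS = Pattern.compile("\\D+");

    // Returns the longest run of digits that is at most 7 characters long, if any.
    static Optional<String> takeDigits(String s) {
        return NON_DIGITS.splitAsStream(s)
                .filter(w -> !w.isEmpty() && w.length() <= 7)
                .max(Comparator.comparingInt(String::length));
    }

    public static void main(String[] args) {
        System.out.println(takeDigits("242424"));            // Optional[242424]
        System.out.println(takeDigits("AB1000G242424/001")); // Optional[242424]
        System.out.println(takeDigits("no digits"));         // Optional.empty
    }
}
```

Note that a run longer than 7 digits is skipped rather than truncated by this filter, which may or may not be what the requirement intends.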
You can use String.replaceAll to remove the non-digit characters:
String extracted = "AB1000G242424/001".replaceAll("[^0-9]", "");
if (extracted.length() > 7)
    extracted = extracted.substring(0, 7);
Output:
1000242

How to get all integers before hyphen from java String

I want to parse the integers that precede each hyphen; the answer should be 0 0 1 (as integers). What is the best way to do this in Java?
public static String str ="[0-S1|0-S2|1-S3, 1-S1|0-S2|0-S3, 0-S1|1-S2|0-S3]";
Please help me out.
Use the regex below with the Pattern and Matcher classes.
Pattern.compile("\\d+(?=-)");
\\d+ - Matches one or more digits. + repeats the previous token \\d (which matches a digit character) one or more times.
(?=-) - Only if it's followed by a hyphen. (?=-) is called a positive lookahead assertion; it asserts that the match must be followed by a - symbol without consuming it.
String str = "[0-S1|0-S2|1-S3, 1-S1|0-S2|0-S3, 0-S1|1-S2|0-S3]";
Matcher m = Pattern.compile("\\d+(?=-)").matcher(str);
while (m.find()) {
    System.out.println(m.group());
}
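If the values are needed as integers rather than printed strings, the same lookahead pattern can feed a list. This is only a sketch; the class name DigitsBeforeHyphen and the method parse are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DigitsBeforeHyphen {
    // Digits that are immediately followed by a hyphen (the hyphen is not consumed).
    private static final Pattern BEFORE_HYPHEN = Pattern.compile("\\d+(?=-)");

    // Hypothetical helper: collects every such number as an Integer.
    static List<Integer> parse(String input) {
        List<Integer> result = new ArrayList<>();
        Matcher m = BEFORE_HYPHEN.matcher(input);
        while (m.find()) {
            result.add(Integer.parseInt(m.group()));
        }
        return result;
    }

    public static void main(String[] args) {
        String str = "[0-S1|0-S2|1-S3, 1-S1|0-S2|0-S3, 0-S1|1-S2|0-S3]";
        System.out.println(parse(str)); // [0, 0, 1, 1, 0, 0, 0, 1, 0]
    }
}
```

The digits in S1, S2 and S3 are not picked up because they are followed by | or , rather than by a hyphen.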
One lazy way: if you already know the layout of the string, use substring and indexOf to locate each number.
String str = "[0-S1|0-S2|1-S3, 1-S1|0-S2|0-S3, 0-S1|1-S2|0-S3]";
int int1 = Integer.parseInt(str.substring(str.indexOf("[") + 1, str.indexOf("-S1")));
and so on.

Get the last index of a letter followed by numeric

I'm trying to parse a URL and I'd like to test for the last index of a couple characters followed by a numeric value.
Example
used-cell-phone-albany-m3359_l12201
I'm trying to determine if the last "-m" is followed by a numeric value.
So something like this, "used-cell-phone-albany-m3359_l12201".contains("m" followed by numeric)
I'm assuming it needs to be done with regular expressions, but I'm not sure.
You could use a pattern like [a-z]\\d, which matches any letter between a and z that is immediately followed by a digit. You can specify other characters within the character class if you wish...
Pattern pattern = Pattern.compile("[a-z]\\d", Pattern.CASE_INSENSITIVE);
Matcher matcher = pattern.matcher("used-cell-phone-albany-m3359_l12201");
while (matcher.find()) {
    int startIndex = matcher.start();
    int endIndex = matcher.end();
    String match = matcher.group();
    System.out.println(startIndex + "-" + endIndex + " = " + match);
}
The problem is, your test String actually contains two matches m3 and l1
The above example will display
23-25 = m3
29-31 = l1
Updated with feedback
If you can guarantee the marker (i.e. -m), then it becomes a lot simpler...
Pattern pattern = Pattern.compile("-m\\d", Pattern.CASE_INSENSITIVE);
Matcher matcher = pattern.matcher("used-cell-phone-albany-m3359_l12201");
if (matcher.find()) {
    int startIndex = matcher.start();
    int endIndex = matcher.end();
    String match = matcher.group();
    System.out.println(startIndex + "-" + endIndex + " = " + match);
}
In Java, convert the URL to a String if necessary and then run
URLString.matches("^.*m[0-9]+$").
This returns true only if the URL ends with "m" followed by a number. That can be refined with a more precise ending pattern. The pattern is tested at the end of the string because $ in a regex matches the end of the input; "[0-9]+" matches a sequence of one or more digits; "^" matches the beginning of the string; and ".*" matches zero or more characters of any kind except line terminators. Since String.matches already requires the whole string to match, ^ and $ are redundant here but harmless.
To determine whether the last "m" is followed by a digit, use
URLString.matches("^.*m[0-9][^m]*$")
Here the greedy ".*" consumes everything up to an "m" that is followed by a digit, and "[^m]*$" ensures that no other "m" appears after it.
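The same check can also be done without a regex. The sketch below looks up the last occurrence of a marker with lastIndexOf and tests the character right after it; the class name LastMarkerCheck and the method lastMarkerFollowedByDigit are hypothetical, and treating "-m" as the marker is an assumption taken from the question.

```java
public class LastMarkerCheck {
    // Hypothetical helper: true if the last occurrence of marker in s
    // is immediately followed by an ASCII digit.
    static boolean lastMarkerFollowedByDigit(String s, String marker) {
        int idx = s.lastIndexOf(marker);
        if (idx < 0) {
            return false; // marker not present at all
        }
        int next = idx + marker.length();
        if (next >= s.length()) {
            return false; // marker is at the very end, nothing follows it
        }
        char c = s.charAt(next);
        return c >= '0' && c <= '9';
    }

    public static void main(String[] args) {
        System.out.println(lastMarkerFollowedByDigit("used-cell-phone-albany-m3359_l12201", "-m")); // true
        System.out.println(lastMarkerFollowedByDigit("used-m12-phone-mx", "-m"));                   // false
    }
}
```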

Sequences of characters in java

I was given an exercise: "Sequences of characters (passwords) which, from left to right, consist of 3 consecutive digits, 4 consecutive letters (the English alphabet), and one or more characters from the set {*, ^, %, #, ~, !, &, |, #, $}." I tried it, but it doesn't work:
public class regex {
    public static void main(String[] args) {
        String regex = "[\\d]{3}[a-aZ-Z]{4}[,#!%]+";
        String txt = "394aZbr#";
        Pattern p = Pattern.compile(regex);
        Matcher m = p.matcher(txt);
        while (m.find()) {
            String s = m.group();
            System.out.println("pass : " + s);
        }
    }
}
Result of my exercise:
pass: 493ahTz#
Could you help me ?
[\\d]{3}[a-aZ-Z]{4}[,#!%]+
Do not use [] if it only contains a single \d; you can apply {3} to \d directly.
[a-aZ-Z] is exactly the same as [aZ]; you must use [a-zA-Z].
The last part seems good but you may want to add all the chars that you mentioned before.
Result: \\d{3}[a-zA-Z]{4}[,#!%]+
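A possible complete version with every symbol from the exercise in the last character class is sketched below. It assumes the whole string must be a password (hence matches() rather than find()); the class name PasswordPattern and the method isValid are made up for the example.

```java
import java.util.regex.Pattern;

public class PasswordPattern {
    // 3 digits, then 4 English letters, then one or more of * ^ % # ~ ! & | $
    // Inside a character class, ^ is literal when it is not the first character,
    // and a single & is literal (only && has a special meaning in Java).
    private static final Pattern PASSWORD =
            Pattern.compile("\\d{3}[a-zA-Z]{4}[*^%#~!&|$]+");

    // Hypothetical helper: true if the entire string has the required shape.
    static boolean isValid(String candidate) {
        return PASSWORD.matcher(candidate).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("394aZbr#"));   // true
        System.out.println(isValid("394aZbr#$|")); // true
        System.out.println(isValid("394aZb#"));    // false, only 3 letters
        System.out.println(isValid("394aZbr"));    // false, no trailing symbol
    }
}
```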

Punctuation Regex in Java

First, I read the documentation at
http://download.oracle.com/javase/1.4.2/docs/api/java/util/regex/Pattern.html
I want to find any punctuation character EXCEPT #',& but I don't quite understand how.
Here is my code:
public static void main( String[] args )
{
    // String to be scanned to find the pattern.
    String value = "#`~!#$%^";
    String pattern = "\\p{Punct}[^#',&]";
    // Create a Pattern object
    Pattern r = Pattern.compile(pattern, Pattern.CASE_INSENSITIVE);
    // Now create matcher object.
    Matcher m = r.matcher(value);
    if (m.find()) {
        System.out.println("Found value: " + m.groupCount());
    } else {
        System.out.println("NO MATCH");
    }
}
Result is NO MATCH.
Is there any mismatch ?
Thanks
MRizq
You're matching two characters, not one. Using a (negative) lookahead should solve the task:
(?![#',&])\\p{Punct}
You may use character subtraction here:
String pat = "[\\p{Punct}&&[^#',&]]";
The whole pattern represents a character class, [...], that contains a \p{Punct} POSIX character class, the && intersection operator and [^...] negated character class.
A Unicode modifier might be necessary if you plan to also match all Unicode punctuation:
String pat = "(?U)[\\p{Punct}&&[^#',&]]";
              ^^^^
The pattern matches any punctuation (with \p{Punct}) except #, ', the comma, and &.
If you need to exclude more characters, add them to the negated character class. Just remember to always escape -, \, ^, [ and ] inside a Java regex character class/set. E.g. adding a backslash and - might look like "[\\p{Punct}&&[^#',&\\\\-]]" or "[\\p{Punct}&&[^#',&\\-\\\\]]".
Java demo:
String value = "#`~!#$%^,";
String pattern = "(?U)[\\p{Punct}&&[^#',&]]";
Pattern r = Pattern.compile(pattern); // Create a Pattern object
Matcher m = r.matcher(value); // Now create matcher object.
while (m.find()) {
    System.out.println("Found value: " + m.group());
}
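Output:
Found value: !
Found value: %
With (?U), \p{Punct} follows the Unicode punctuation property, so symbol characters such as `, ~, $ and ^ are not matched, while # and the comma are excluded by the negated class.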
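For comparison, here is a sketch of the same two techniques without the (?U) flag, where \p{Punct} covers the ASCII punctuation set (including symbols such as $ and ^). The class name PunctuationFilter and the method find are hypothetical; the two patterns are the intersection and lookahead variants discussed above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PunctuationFilter {
    // Intersection: punctuation AND not one of # ' , &
    static final Pattern INTERSECTION = Pattern.compile("[\\p{Punct}&&[^#',&]]");
    // Negative lookahead: the next character is not # ' , & and is punctuation
    static final Pattern LOOKAHEAD = Pattern.compile("(?![#',&])\\p{Punct}");

    // Hypothetical helper: returns every single-character match in order.
    static List<String> find(Pattern pattern, String value) {
        List<String> found = new ArrayList<>();
        Matcher m = pattern.matcher(value);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String value = "#`~!#$%^,";
        System.out.println(find(INTERSECTION, value)); // [`, ~, !, $, %, ^]
        System.out.println(find(LOOKAHEAD, value));    // [`, ~, !, $, %, ^]
    }
}
```

Both variants yield the same matches for this input; the intersection form keeps everything in a single character class, which some readers find easier to extend.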
