Trouble implementing partial matches with regular expression on Android

I am creating a regular expression to evaluate whether an IP address is a valid multicast address. The validation happens in real time while you type (an invalid or out-of-range character is not accepted), so I cannot simply evaluate the end result against the regex. The problem is that it allows a double period after each group of numbers (224.. , 224.0.., 224.0.0.. all show as valid).
The code below is a static representation of what's happening. Somehow 224.. is showing as a legal value. I've tested this regex online (non-java'ized: ^2(2[4-9]|3\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]\d|\d)){3}$ ) and it works perfectly and does not accept the invalid input I'm describing.
Pattern p = Pattern.compile("^2(2[4-9]|3\\d)(\\.(25[0-5]|2[0-4]\\d|[0-1]?\\d?\\d)){3}$");
Matcher m = p.matcher("224..");
if (!m.matches() && !m.hitEnd()) {
    System.out.println("Invalid");
} else {
    System.out.println("Valid");
}
It seems that the method m.hitEnd() is evaluating to true whenever I input 224.. which does not make sense to me.
If someone could please look this over and make sure I'm not making any obvious mistake and maybe explain why hitEnd() is returning true in this case I'd appreciate it.
Thanks everyone.

After doing some evaluating myself (after discovering this was on Android), I realized that the same code responds differently on Dalvik than it does on a regular JVM.
The code is:
Pattern p = Pattern.compile("^2(2[4-9]|3\\d)(\\.(25[0-5]|2[0-4]\\d|[0-1]?\\d?\\d)){3}$");
Matcher m = p.matcher("224..");
if (!m.matches() && !m.hitEnd()) {
    System.out.println("Invalid");
} else {
    System.out.println("Valid");
}
This code (albeit modified a bit), prints Valid on Android and Invalid on the JVM.
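For reference, here is a minimal sketch of the partial-match check as it is expected to behave on a standard JVM. The class name MulticastPrefixCheck and the method isAcceptableSoFar are made up for illustration and are not part of the original code; the pattern is the one from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; the names are not from the original post.
class MulticastPrefixCheck {
    private static final Pattern MULTICAST = Pattern.compile(
            "^2(2[4-9]|3\\d)(\\.(25[0-5]|2[0-4]\\d|[0-1]?\\d?\\d)){3}$");

    /** True if the text is a full match, or could still become one with more input. */
    static boolean isAcceptableSoFar(String text) {
        Matcher m = MULTICAST.matcher(text);
        // hitEnd() describes the most recent match attempt, so matches() runs first.
        // It is true when the matcher ran out of input while it was still matching.
        return m.matches() || m.hitEnd();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"224", "224.", "224.0.0.1", "224..", "3"}) {
            System.out.println(s + " -> " + isAcceptableSoFar(s));
        }
    }
}
```

On a regular JVM this should reject 224.. (the second period fails before the end of the input is reached), which is consistent with the Invalid result described above.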

I don't know how you tested your regex, but it does not look correct according to your description.
Your regex requires all four groups of digits, so there is no chance it will match 224..
Only [0-1] and \d are marked with a question mark and are therefore optional.
So, without going into the details of which specific digits are permitted, I'd suggest something like this:
^\\d{1,3}\\.(\\d{0,3}\\.)?(\\d{0,3}\\.)?(\\d{0,3}\\.)?$
You also do not have to use hitEnd(); the $ at the end is enough. And do not use matches(); use find() instead. matches() is like find(), but it behaves as if ^ and $ were added automatically.

I just tested out your code and m.hitEnd() evaluates to false for me, and I am receiving invalid...
So I'm not really sure what the problem here is?

I reported bug 20625 in Dalvik. In the interim, you don't need to use hitEnd(); having the $ suffix should be sufficient.
public void testHitEnd() {
    String text = "b";
    String pattern = "^aa$";
    Matcher matcher = Pattern.compile(pattern).matcher(text);
    assertFalse(matcher.matches());
    assertFalse(matcher.hitEnd());
}

Related

Java Truth OR assertion

I would like to check with Java Truth assertion library if any of the following statements is satisfied:
assertThat(strToCheck).startsWith("a");
assertThat(strToCheck).contains("123");
assertThat(strToCheck).endsWith("#");
In other words, I am checking whether strToCheck starts with a, OR contains the substring 123, OR ends with #, i.e. whether any of the 3 conditions applies. The assertions above are just an example.
Is there a way to do the logical OR assertion with Truth?
I know with Hamcrest, we could do something like:
assertThat(strToCheck, anyOf(startsWith("a"), new StringContains("123"), endsWith("#")));

assertTrue(strToCheck.startsWith("a") || strToCheck.contains("123") || strToCheck.endsWith("#"));
You can do what you asked for with this single line.

Why not use a regular expression to solve this:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Main {
    public static void main(String[] args) {
        String strToCheck = "afoobar123barfoo#";
        // Alternation: starts with "a", OR contains "123", OR ends with "#".
        Pattern pattern = Pattern.compile("^a|123|#$");
        Matcher matcher = pattern.matcher(strToCheck);
        boolean matchFound = matcher.find();
        // matchFound is now true if any of the three conditions holds.
    }
}

All the ways of doing this with Truth currently either are very clumsy or don't produce as informative a failure message as we'd aim for. See this comment on issue 991, which mentions some possible future enhancements, but this is never going to be something that Truth is as good at as Hamcrest is.
If I were writing a test that needed this, I would probably write something like:
boolean valid =
    strToCheck.startsWith("a")
        || strToCheck.contains("123")
        || strToCheck.endsWith("#");
if (!valid) {
    assertWithMessage(
            "expected to be a valid <some description of what kind of string you expect>"
                + "\nbut was: %s", strToCheck)
        .fail();
}
And then I'd extract that to a method if it's going to be commonly needed.
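A rough sketch of what such an extracted method could look like. To keep it self-contained it throws a plain AssertionError instead of going through Truth; in a real test the throw could be swapped for the assertWithMessage(...).fail() call shown above. The class and method names here are invented for the example.

```java
// Hypothetical helper; the names are illustrative and not part of Truth.
class StringChecks {

    /** The OR of the three conditions from the question. */
    static boolean isValid(String s) {
        return s.startsWith("a") || s.contains("123") || s.endsWith("#");
    }

    /** Fails with a descriptive message if none of the conditions holds. */
    static void assertValid(String s) {
        if (!isValid(s)) {
            throw new AssertionError(
                    "expected a string that starts with \"a\", contains \"123\", or ends with \"#\""
                            + "\nbut was: " + s);
        }
    }

    public static void main(String[] args) {
        assertValid("abc");     // starts with "a"
        assertValid("x123y");   // contains "123"
        assertValid("xyz#");    // ends with "#"
        System.out.println("all checks passed");
    }
}
```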

Going to flip this on its head, since you're talking about testing.
You should be explicit about what you're asserting, and not so wide-open about it.
For instance, it sounds like you're expecting something like:
a...123#
a123#
a
#
123
...but you may only actually care about one of those cases.
So I would encourage you to explicitly validate only one of each. Even though Hamcrest allows you to find any match, this too feels like an antipattern; you should be more explicit about what it is you're expecting given a set of strings.

Regex to match if string only contains all characters from a character set, plus an optional one

I ran into a wee problem with Java regex. (I must say in advance, I'm not very experienced in either Java or regex.)
I have a string, and a set of three characters. I want to find out if the string is built from only these characters. Additionally (just to make it even more complicated), two of the characters must be in the string, while the third one is optional.
I do have a solution, my question is rather if anyone can offer anything better/nicer/more elegant, because this makes me cry blood when I look at it...
The set-up
The mandatory characters are: | (pipe) and - (dash).
The string in question should be built from a combination of these. They can be in any order, but both have to be in it.
The optional character is: : (colon).
The string can contain colons, but it does not have to. This is the only other character allowed, apart from the above two.
Any other characters are forbidden.
Expected results
Following strings should work/not work:
"------" = false
"||||" = false
"---|---" = true
"|||-|||" = true
"--|-|--|---|||-" = true
...and...
"----:|--|:::|---::|" = true
":::------:::---:---" = false
"|||:|:::::|" = false
"--:::---|:|---G---n" = false
...etc.
The "ugly" solution
Now, I have a solution that seems to work, based on this stackoverflow answer. The reason I'd like a better one will become obvious when you've recovered from seeing this:
if (string.matches("^[(?\\:)?\\|\\-]*(([\\|\\-][(?:\\:)?])|([(?:\\:)?][\\|\\-]))[(?\\:)?\\|\\-]*$") || string.matches("^[(?\\|)?\\-]*(([\\-][(?:\\|)?])|([(?:\\|)?][\\-]))[(?\\|)?\\-]*$")) {
    //do funny stuff with a meaningless string
} else {
    //don't do funny stuff with a meaningless string
}
Breaking it down
The first regex
"^[(?\\:)?\\|\\-]*(([\\|\\-][(?:\\:)?])|([(?:\\:)?][\\|\\-]))[(?\\:)?\\|\\-]*$"
checks for all three characters
The next one
"^[(?\\|)?\\-]*(([\\-][(?:\\|)?])|([(?:\\|)?][\\-]))[(?\\|)?\\-]*$"
check for the two mandatory ones only.
...Yea, I know...
But believe me I tried. Nothing else gave the desired result, but allowed through strings without the mandatory characters, etc.
The question is...
Does anyone know how to do it a simpler / more elegant way?
Bonus question: There is one thing I don't quite get in the regexes above (more than one, but this one bugs me the most):
As far as I understand(?) regular expressions, (?\\|)? should mean that the character | is either contained or not (unless I'm very much mistaken), still in the above setup it seems to enforce that character. This of course suits my purpose, but I cannot understand why it works that way.
So if anyone can explain what I'm missing there, that'd be really great. Besides, I suspect this holds the key to a simpler solution (checking for both mandatory and optional characters in one regex would be ideal).
Thank you all for reading (and suffering) through my question, and even bigger thanks to those who reply. :)
PS
I did try stuff like ^[\\|\\-(?:\\:)?)]$, but that would not enforce all mandatory characters.

Use a lookahead based regex.
^(?=.*\\|)(?=.*-)[-:|]+$
or
^(?=.*\\|)[-:|]*-[-:|]*$
or
^[-:|]*(?:-:*\\||\\|:*-)[-:|]*$
(?=.*\\|) expects at least one pipe.
(?=.*-) expects at least one hyphen.
[-:|]+ matches any character from the list, one or more times.
$ matches the end of the line.
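As a quick check, the first lookahead regex can be run against the sample strings from the question. This is only a sketch; the class name PipeDashCheck and the method isValid are made up for the example.

```java
import java.util.regex.Pattern;

// Hypothetical wrapper around the first lookahead regex from this answer.
class PipeDashCheck {
    // At least one pipe, at least one dash, and only '-', ':' or '|' overall.
    private static final Pattern ALLOWED = Pattern.compile("^(?=.*\\|)(?=.*-)[-:|]+$");

    static boolean isValid(String s) {
        return ALLOWED.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "------", "||||", "---|---", "|||-|||", "--|-|--|---|||-",
            "----:|--|:::|---::|", ":::------:::---:---", "|||:|:::::|",
            "--:::---|:|---G---n"
        };
        for (String s : samples) {
            System.out.println("\"" + s + "\" = " + isValid(s));
        }
    }
}
```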

Here is a simple answer:
(?=.*\|.*-|.*-.*\|)^([-|:]+)$
This says that the string needs to have a '-' followed by '|', or a '|' followed by a '-', via the look-ahead. Then the string only matches the allowed characters.
Demo: http://fiddle.re/1hnu96

Here is one without lookahead or lookbehind.
^(?:[-:|]*\\|[-:|]*-[-:|]*|[-:|]*-[-:|]*\\|[-:|]*)$
This doesn't scale, so Avinash's solution is to be preferred, if your regex engine supports lookaround.

regex see if lowercase exists [duplicate]

I'm still learning with regex.
I'm trying to check if the string contains ANY lowercase values. If it does, I just want to return false.
I've read answers on here, but they don't seem to work in my program, YET, they work in a regex emulator online.
if (str.matches("[a-z]+")) {
    System.out.println("removed");
    return false;
}
This seems to highlight the lowercase letters in regexr but not in my program. Any help please?

If you're just looking for the correct regular expression, .*[a-z].* as provided by @Adrian Leonhard in a comment above is indeed correct. However, I think it's important to mention that compiling a regular expression is relatively expensive, and if this if statement is nested in a loop it might be a good idea to use the full regular expression implementation in java.util.regex rather than the convenience methods provided in String. To do so, first compile a Pattern object from your regex string.
Pattern p = Pattern.compile(".*[a-z].*");
This way the regex only has to be compiled once instead of every time String#matches(String regex) is called. Then, create a matcher using your input string.
Matcher m = p.matcher(str);
Now, call Matcher#find()
if (m.find()) {
    // Your code here
}
However, you could also just test to see if
str.toUpperCase().equals(str)
It's up to you. I would only use regex if absolutely necessary as it can slow down your program, and isn't very elegant in this case. At least you know how to use them properly in the future now.
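To make the difference concrete, here is a small sketch (the class and method names are invented for the example). String#matches has to match the whole string, so "[a-z]+" only succeeds when every character is a lowercase letter, whereas find() with a precompiled [a-z] succeeds as soon as one lowercase letter appears anywhere.

```java
import java.util.regex.Pattern;

// Hypothetical helper; the names are not from the original posts.
class LowercaseCheck {
    // Compiled once and reused for every call.
    private static final Pattern LOWERCASE = Pattern.compile("[a-z]");

    /** True if the string contains at least one ASCII lowercase letter. */
    static boolean containsLowercase(String s) {
        return LOWERCASE.matcher(s).find();
    }

    public static void main(String[] args) {
        String mixed = "ABc";
        // Whole-string match: false, because 'A' and 'B' are not in [a-z].
        System.out.println(mixed.matches("[a-z]+"));
        // Substring search: true, because 'c' is found.
        System.out.println(containsLowercase(mixed));
    }
}
```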

How to retrieve portion of number that's within parenthesis in Java?

For part of my Java assignment I'm required to select all records that have a certain area code. I have custom objects within an ArrayList, like ArrayList<Foo>.
Each object has a String phoneNumber variable. They are formatted like "(555) 555-5555"
My goal is to search through each custom object in the ArrayList<Foo> (call it listOfFoos) and place the objects with area code "616" in a temporaryListOfFoos ArrayList<Foo>.
I have looked into tokenizers, but was unable to get the syntax correct. I feel like what I need to do is similar to the post "Ignore parentheses with string tokenizer?", but since I'm only trying to retrieve the first 3 digits (and I don't care about the remaining 7), it really didn't give me exactly what I was looking for.
What I did as a temporary work-around, was...
for (int i = 0; i < listOfFoos.size(); i++) {
    if (listOfFoos.get(i).getPhoneNumber().contains("616")) {
        tempListOfFoos.add(listOfFoos.get(i));
    }
}
This worked for our current dataset, however, if there was a 616 anywhere else in the phone numbers [like "(555) 616-5555"] it obviously wouldn't work properly.
If anyone could give me advice on how to retrieve only the first 3 digits, while ignoring the parentheses, I would greatly appreciate it.

You have two options:
Use value.startsWith("(616)") or,
Use regular expressions with this pattern: "^\\(616\\).*"
The first option will be a lot quicker.

areaCode = number.substring(number.indexOf('(') + 1, number.indexOf(')')).trim() should do the job for you, given the formatting of phone numbers you have.
Or if you don't have any extraneous spaces, just use areaCode = number.substring(1, 4).

I think what you need is a capturing group. Have a look at the Groups and capturing section in this document.
Once you are done matching the input with a pattern (for example "\\((\\d+)\\) \\d+-\\d+"), you can get the number in the parentheses using a matcher (an object of java.util.regex.Matcher) with matcher.group(1).
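A minimal sketch of that idea, assuming phone numbers are always formatted like "(555) 555-5555". It works on plain strings rather than the Foo objects from the question, and the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; plain strings stand in for the Foo objects.
class AreaCodeFilter {
    // Group 1 captures the three digits inside the parentheses.
    private static final Pattern PHONE = Pattern.compile("\\((\\d{3})\\) \\d{3}-\\d{4}");

    /** Returns the area code, or null if the number is not in the expected format. */
    static String areaCodeOf(String phoneNumber) {
        Matcher m = PHONE.matcher(phoneNumber);
        return m.matches() ? m.group(1) : null;
    }

    /** Keeps only the numbers whose area code equals the given one. */
    static List<String> withAreaCode(List<String> phoneNumbers, String areaCode) {
        List<String> result = new ArrayList<>();
        for (String number : phoneNumbers) {
            if (areaCode.equals(areaCodeOf(number))) {
                result.add(number);
            }
        }
        return result;
    }
}
```

With Foo objects, the same check could be applied to foo.getPhoneNumber() inside the loop from the question.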

You could use a regular expression as shown below. The pattern will ensure the entire phone number conforms to your pattern ((XXX) XXX-XXXX) plus grabs the number within the parentheses.
int areaCodeToSearch = 555;
String pattern = String.format("\\((%d)\\) \\d{3}-\\d{4}", areaCodeToSearch);
Pattern p = Pattern.compile(pattern);
Matcher m = p.matcher(phoneNumber);
if (m.matches()) {
    String areaCode = m.group(1);
    // ...
}
Whether you choose to use a regular expression versus a simple String lookup (as mentioned in other answers) will depend on how bothered you are about the format of the entire string.

Regex to extract valid Http or Https

I'm currently having some issues with a regex to extract a URL.
I want my regex to take URLs such as:
http://stackoverflow.com/questions/ask
https://stackoverflow.com
http://local:1000
https://local:1000
Through some tutorials, I've learned that this regex will find all of the above: ^(http|https)\://.*$ However, it will also accept http://local:1000;http://invalid http://khttp:// as a single string, when it shouldn't accept it at all.
I understand that my expression isn't written to exclude this, but my issue is I cannot think of how to write it so it checks for this scenario.
Any help is greatly appreciated!
Edit:
Looking at my issue, it seems that I could eliminate my issue as long as I can implement a check to make sure '//' doesn't occur in my string after the initial http:// or https://, any ideas on how to implement?
Sorry this will be done with Java
I also need to add the following constraint: a string such as http://local:80/test:90 fails because of the duplicate of port...aka I need to have a constraint that only allows two total : symbols in a valid string (one after http/s) and one before port.

This will only produce a match if there is no :// after its first appearance in the string.
^https?:\/\/(?!.*:\/\/)\S+
Note that trying to parse a valid URL from within a string is very complex (see In search of the perfect URL validation regex), so the above does not attempt to do that.
It will just match the protocol and the following non-space characters.
In Java
Pattern reg = Pattern.compile("^https?:\\/\\/(?!.*:\\/\\/)\\S+");
Matcher m = reg.matcher("http://somesite.com");
if (m.find()) {
    System.out.println(m.group());
} else {
    System.out.println("No match");
}

Check your programming language to see if it already has a parser. E.g. php has parse_url()
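In Java, one candidate is java.net.URI. The following is only a sketch of how it might be used here; the class name HttpUrlCheck and the method isHttpUrl are made up for the example. Note that it does not enforce the asker's extra constraint of allowing a colon only before the port.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper built on java.net.URI; the names are illustrative.
class HttpUrlCheck {

    /** True if the text parses as a URI with an http or https scheme and a host. */
    static boolean isHttpUrl(String text) {
        try {
            URI uri = new URI(text);
            String scheme = uri.getScheme();
            boolean httpScheme =
                    "http".equalsIgnoreCase(scheme) || "https".equalsIgnoreCase(scheme);
            // getHost() is null when the authority is not a plain host[:port].
            return httpScheme && uri.getHost() != null;
        } catch (URISyntaxException e) {
            // Illegal characters such as spaces end up here.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isHttpUrl("http://stackoverflow.com/questions/ask"));
        System.out.println(isHttpUrl("http://local:1000;http://invalid http://khttp://"));
    }
}
```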

From http://net.tutsplus.com/tutorials/other/8-regular-expressions-you-should-know/
/^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/
This may change based on the programming language/tool

/[A-Za-z]+://[A-Za-z0-9-]+.[A-Za-z0-9-:%&;?#/.=]+/g

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.
