I must confess, I'm pretty useless when it comes to writing regular expressions, but I've currently got a problem that's really confusing me.
I have written a function that takes a string as input (22K in size) and performs a single regex on it, looking for long values. Once a long value has been found, it is replaced with a String value from a HashMap.
However, it keeps missing values within the string. The regex I have written is:
Pattern.compile("[*]{3}[0-9]{1,}[*]{3}");
The long values I'm searching for in the file are formatted as such:
***nnnnnnnnnnnnnnnn***
Now the regex seems to work, but like I said, it misses some values, for example:
***1407374883553285*** - FOUND
***281474976720057*** - NOT FOUND
I'm really quite confused as to why it's missing values, I'm using a simple while loop to do the search, and matcher.find() for when it does match.
I'm assuming that either my regex isn't strict enough, or it's missing values due to the way the data is structured in the input string.
If anyone can offer any advice, I'd greatly appreciate it.
Thanks
A cleaner regex is [*]{3}\d+[*]{3}. Check it against the following to see how it goes:
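final Pattern pattern = Pattern.compile("[*]{3}\\d+[*]{3}");
// "inputfile" is a placeholder: pass the actual text you read from the file.
final Matcher matcher = pattern.matcher("inputfile");
while (matcher.find())
{
    // Print every token the pattern finds, so you can see which ones are missed.
    System.out.println(matcher.group());
}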
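You can use java.util.regex.Pattern.matches(String regex, CharSequence input) with the regular expression "[*]{3}[0-9]*[*]{3}". Note that matches() tests the entire input, so it is only useful for checking a single token, and that [0-9]* also accepts zero digits; use [0-9]+ to require at least one.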
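For the replacement step itself, here is a minimal sketch of a single pass that finds each token and substitutes the value from a map. It assumes the map is keyed by Long and that a token with no entry should be left as it is; the class and method names are made up for illustration and are not from the question.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TokenReplacer {
    // The capturing group isolates the digits so they can be parsed as the map key.
    private static final Pattern TOKEN = Pattern.compile("[*]{3}(\\d+)[*]{3}");

    // Replaces every ***digits*** token that has an entry in the map (hypothetical helper).
    static String replaceTokens(String input, Map<Long, String> values) {
        Matcher matcher = TOKEN.matcher(input);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            // Assumption: the digits fit in a long; parseLong throws NumberFormatException otherwise.
            String replacement = values.get(Long.parseLong(matcher.group(1)));
            // Leave the token untouched when the map has no entry for it.
            String text = (replacement != null) ? replacement : matcher.group();
            // quoteReplacement stops '$' and '\' in the value from being read as group references.
            matcher.appendReplacement(out, Matcher.quoteReplacement(text));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<Long, String> values = new HashMap<>();
        values.put(1407374883553285L, "first");
        values.put(281474976720057L, "second");
        System.out.println(
            replaceTokens("a ***1407374883553285*** b ***281474976720057*** c", values));
    }
}
```

Because the matcher walks the original input once and the output is built separately, replacing one token cannot shift or hide the tokens that follow it.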
Sorry for the basic question, but I am wondering why the expression below returns true:
Pattern.compile("([0-9]{15})").asPredicate().test("ababx300000055773908")
Please let me know if I am missing something here.
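You have to anchor the pattern to the start and the end of the string.
Your pattern matches any run of 15 digits anywhere in the string, regardless of where it occurs, because asPredicate() tests whether the pattern can be found, not whether the whole string matches.
Use the regex ^[0-9]{15}$ instead: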
Pattern.compile("(^[0-9]{15}$)").asPredicate().test("ababx300000055773908");
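To make the difference concrete, here is a small sketch that compares a find-style test with a whole-string test on the string from the question. The class and method names are only for illustration.

```java
import java.util.regex.Pattern;

final class FindVersusMatches {
    private static final Pattern FIFTEEN_DIGITS = Pattern.compile("[0-9]{15}");

    // True if a run of 15 digits occurs anywhere in the input (what asPredicate() tests).
    static boolean containsFifteenDigits(String s) {
        return FIFTEEN_DIGITS.matcher(s).find();
    }

    // True only if the entire input is exactly 15 digits.
    static boolean isExactlyFifteenDigits(String s) {
        return FIFTEEN_DIGITS.matcher(s).matches();
    }

    public static void main(String[] args) {
        String input = "ababx300000055773908";
        System.out.println(containsFifteenDigits(input));   // found inside the string
        System.out.println(isExactlyFifteenDigits(input));  // whole string is not 15 digits
        // Anchoring the pattern makes the find-based predicate behave like matches().
        System.out.println(Pattern.compile("^[0-9]{15}$").asPredicate().test(input));
    }
}
```

Calling matcher(...).matches() avoids the anchors altogether, since it always requires the whole input to match.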
Is it possible to detect the pattern of a String and store it in a variable? So, if I have a String test1234 and highlight 1234, I expect something like \d{4}.
That would require finding a regular expression that both your highlighted substring and the desired replacement match, and such an expression is in no way unique. For example, "1234" matches .{4}, \d{4}, or even .+, which does not fix the length at all. So even if you could generate a regular expression from a string, it might turn out to be the literal string itself or something you didn't want. Maybe you should rethink the desired outcome of your program and try to come up with a different way of solving the issue at hand.
Hope that helped. Good luck!
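As a quick illustration of why the answer is not unique, the sketch below checks the highlighted text against several candidate patterns, all of which match it in full. The candidate list and the class name are only examples.

```java
import java.util.regex.Pattern;

final class AmbiguousPatterns {
    // Every one of these matches "1234" completely, yet they describe very different sets of strings.
    static final String[] CANDIDATES = {".{4}", "\\d{4}", ".+", "1234"};

    // True if every candidate pattern matches the whole sample.
    static boolean allMatch(String sample) {
        for (String regex : CANDIDATES) {
            if (!Pattern.matches(regex, sample)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(allMatch("1234"));
        // The candidates disagree on other inputs, so one sample cannot pick between them.
        System.out.println(Pattern.matches(".{4}", "abcd"));
        System.out.println(Pattern.matches("\\d{4}", "abcd"));
    }
}
```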
Before y'all jump on me for posting something similar to previous questions asked, yes, there seem to be a number of regex related questions but nothing which seems to help me, or at least that I can see.
I am trying to parse strings in Java using Pattern and Matcher and am really having no joy. My regular expression seems to match my input string when I use a few of the online regular expression testing websites, but Java simply does not match my expression.
My input string is:
"Big apple" title="Little Apple" type="Container" url="http://malcolm.com/testing"
The regular expression I am using to match is ".*" title="(.*)" type="Container" url="(.*)"
Essentially I want to pull out the text within the second and the fourth set of quotes. There will always be 4 sets of quotes with text within and around.
I am coding as follows:
Variable XMLSubstring contains the string above (including the quotes) and is as stated, even when I print it out.
Pattern p = Pattern.compile(".* title=\"(.*)\" type=\"Container\" url=\"(.*)\"");
Matcher m = p.matcher(XMLSubstring);
It doesn't appear to be rocket science I'm attempting but I'm pulling my hair out trying to debug the bloody thing.
Is there something wrong with my regex pattern?
Is there something wrong with the code I am using?
Am I simply a moron and should stop coding with immediate effect?
EDIT & UPDATE: I have found the problem. My string had a space at the end of it which was breaking the parser! How silly, and I think based on that, I need to accept the third suggestion of mine and give up programming. Thanks all for your assistance.
Try this,
String str="\"Big apple\" title=\"Little Apple\" type=\"Container\" url=\"http://malcolm.com/testing\"";
Pattern p=Pattern.compile(".* title=\\\".*\\\" type=\\\"Container\\\" url=\\\".*\\\"");
Matcher m=p.matcher(str);
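The trailing space mentioned in the update is enough to make matches() fail, because matches() requires the pattern to cover the entire input. Below is a minimal sketch, assuming it is acceptable to trim the input before matching; the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AttributeExtractor {
    private static final Pattern ATTRS =
        Pattern.compile(".* title=\"(.*)\" type=\"Container\" url=\"(.*)\"");

    // Whole-string match on the raw input; fails if anything follows the last quote.
    static boolean matchesUntrimmed(String raw) {
        return ATTRS.matcher(raw).matches();
    }

    // Trims first, then returns {title, url}, or null when the input does not match.
    static String[] extract(String raw) {
        Matcher m = ATTRS.matcher(raw.trim());
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        // Note the space after the closing quote.
        String raw = "\"Big apple\" title=\"Little Apple\" type=\"Container\" "
                   + "url=\"http://malcolm.com/testing\" ";
        System.out.println(matchesUntrimmed(raw));
        String[] parts = extract(raw);
        System.out.println(parts[0] + " | " + parts[1]);
    }
}
```

Using find() instead of matches() would also tolerate the stray whitespace, at the cost of accepting other trailing text as well.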
First of all, here is a chunk of affected code:
// (somewhere above, data is initialized as a String with a value)
Pattern detailsPattern = Pattern.compile("**this is a valid regex, omitted due to length**", Pattern.DOTALL | Pattern.CASE_INSENSITIVE);
Matcher detailsMatcher = detailsPattern.matcher(data);
Log.i("Scraper", "Initialized pattern and matcher, data length "+data.length());
boolean found = detailsMatcher.find();
Log.i("Scraper", "Found? "+((found)?"yep":"nope"));
I omitted the regex inside Pattern.compile because it's very long, but I know it works with the given data set; or if it doesn't, it shouldn't break anything anyway.
The trouble is, I do get the feedback I/Scraper(23773): Initialized pattern and matcher, data length 18861 but I never see the "Found?" line. It is just stuck on the find() call.
Is this a known Android bug? I've tried it over and over and just can't get it to work. Somehow, I think something over the past few days broke this because my app was working fine before, and I have in the past couple days received several comments of the app not working so it is clearly affecting other users as well.
How can I further debug this?
Some regexes can take a very, very long time to evaluate. In particular, regexes that have lots of quantifiers can cause the regex engine to do a huge amount of backtracking to explore all of the possible ways that the input string might match. And if it is going to fail, it has to explore all of those possibilities.
(Here is an example:
regex = "a*a*a*a*a*a*b"; // 6 quantifiers
input = "aaaaaaaaaaaaaaaaaaaa"; // 20 characters
A typical regex engine will do in the region of 20^6 character comparisons before deciding that the input string does not match.)
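Here is a runnable sketch of that example next to an equivalent pattern with a single quantifier. Both accept exactly the same strings, but the second leaves the engine far fewer ways to divide up the run of a characters. The timing printout is only indicative and will vary by machine and JVM; the class name is made up.

```java
import java.util.regex.Pattern;

final class BacktrackingDemo {
    // Six adjacent a* groups can split a run of 'a' characters in many different ways.
    static final Pattern REDUNDANT = Pattern.compile("a*a*a*a*a*a*b");
    // Accepts the same strings with one quantifier, so there is only one way to split.
    static final Pattern SIMPLIFIED = Pattern.compile("a*b");

    static boolean matchesRedundant(String s) {
        return REDUNDANT.matcher(s).matches();
    }

    static boolean matchesSimplified(String s) {
        return SIMPLIFIED.matcher(s).matches();
    }

    public static void main(String[] args) {
        String input = "aaaaaaaaaaaaaaaaaaaa"; // 20 characters, no 'b', so the match must fail
        long t0 = System.nanoTime();
        boolean redundant = matchesRedundant(input);
        long t1 = System.nanoTime();
        boolean simplified = matchesSimplified(input);
        long t2 = System.nanoTime();
        System.out.println("redundant:  " + redundant + " in " + (t1 - t0) + " ns");
        System.out.println("simplified: " + simplified + " in " + (t2 - t1) + " ns");
    }
}
```

With longer inputs or more nested quantifiers the gap grows quickly, which is one way a find() call can appear to hang.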
If you showed us the regex and the string you are trying to match, we could give a better diagnosis, and possibly offer some alternatives. But if you are trying to extract information from HTML, then the best solution is to not use regexes at all. There are HTML parsers that are specifically designed to deal with real-world HTML.
How long is the string you are trying to parse?
How long and how complicated is the regex you are trying to match?
Have you tried to break your regex down into simpler bits? Adding the bits back one after another will let you see when it breaks and maybe why.
Write a regex such as [a-zA-Z]* and pass it as the argument to Pattern.compile(). This example allows only lowercase and uppercase letters.
Read my blogpost on android validation for more info.
I had the same issue and solved it by replacing every wildcard . with [\s\S]. I really don't know why it worked for me, but it did. I come from the JavaScript world, and I know that there this expression is faster to evaluate.
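One difference between the two forms that is easy to check: by default . does not match a line terminator in Java, while [\s\S] matches any character, and Pattern.DOTALL makes . behave the same way. The sketch below shows only that difference in what matches. It does not demonstrate the speed claim, and the class name is made up.

```java
import java.util.regex.Pattern;

final class DotVersusAnyChar {
    // Default mode: '.' excludes line terminators.
    static boolean dotMatches(String s) {
        return Pattern.compile("a.b").matcher(s).matches();
    }

    // DOTALL mode: '.' matches any character, including line terminators.
    static boolean dotAllMatches(String s) {
        return Pattern.compile("a.b", Pattern.DOTALL).matcher(s).matches();
    }

    // Character class covering whitespace and non-whitespace, i.e. every character.
    static boolean anyCharClassMatches(String s) {
        return Pattern.compile("a[\\s\\S]b").matcher(s).matches();
    }

    public static void main(String[] args) {
        String input = "a\nb";
        System.out.println(dotMatches(input));
        System.out.println(dotAllMatches(input));
        System.out.println(anyCharClassMatches(input));
    }
}
```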
Hey, I've been trying to figure out why this regular expression isn't matching correctly.
List l_operators = Arrays.asList(Pattern.compile("(\\d+)").split(rtString.trim()));
The input string is "12+22+3"
The output I get is -- [,+,+]
There's a match at the beginning of the list which shouldn't be there? I really can't see it and I could use some insight. Thanks.
Well, technically, there is an empty string in front of the first delimiter (first sequence of digits). If you had, say a line of CSV, such as abc,def,ghi and another one ,jkl,mno you would clearly want to know that the first value in the second string was the empty string. Thus the behaviour is desirable in most cases.
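The following sketch shows that behaviour with String.split on the CSV lines mentioned above and on the expression from the question. The class and method names are only for illustration.

```java
import java.util.Arrays;

final class SplitLeadingEmpty {
    // Thin wrapper so the three cases below read the same way.
    static String[] split(String input, String regex) {
        return input.split(regex);
    }

    public static void main(String[] args) {
        // No delimiter at the start: three values.
        System.out.println(Arrays.toString(split("abc,def,ghi", ",")));
        // Delimiter at the start: the first value is the empty string.
        System.out.println(Arrays.toString(split(",jkl,mno", ",")));
        // Digits are the delimiter here, and the input starts with one,
        // so the result again begins with an empty string.
        System.out.println(Arrays.toString(split("12+22+3", "\\d+")));
    }
}
```

Trailing empty strings are dropped by default, which is why nothing appears after the final delimiter in the last case.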
For your particular case, you need to deal with it manually, or refine your regular expression somehow. Like this for instance:
Pattern p = Pattern.compile("\\d+");
Matcher m = p.matcher(rtString);
// Skip past the first number so the split no longer starts with a delimiter.
if (m.find()) {
    List<String> l_operators = Arrays.asList(p.split(rtString.substring(m.end()).trim()));
    // ...
}
Ideally, however, you should be using a parser for this type of string. You can't, for instance, deal with parentheses in expressions using just regular expressions.
That's the behavior of split in Java. You just have to take it (and deal with it) or use another library to split the string. I personally try to avoid Java's split.
An example of one alternative is to look at Splitter from Google Guava.
Try Guava's Splitter.
Splitter.onPattern("\\d+").omitEmptyStrings().split(rtString)
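If adding Guava is not an option, a similar result can be sketched with the standard library alone (Java 8 or later) by splitting into a stream and filtering out the empty strings. The class and method names are hypothetical.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

final class OperatorExtractor {
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Splits on runs of digits and drops the empty strings that the split produces.
    static List<String> operators(String expression) {
        return DIGITS.splitAsStream(expression.trim())
                .filter(token -> !token.isEmpty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(operators("12+22+3"));
    }
}
```

Like the Splitter version, this only drops empty tokens; it does not validate the expression in any way.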