I'm trying to write a function to count specific Strings.
The Strings to count look like the following:
first any character except comma at least once -
the comma -
any chracter but at least once
example string:
test, test, test,
should count to 3
I've tried do that by doing the following:
int countSubstrings = 0;
final Pattern pattern = Pattern.compile("[^,]*,.+");
final Matcher matcher = pattern.matcher(commaString);
while (matcher.find()) {
countSubstrings++;
}
Though my solution doesn't work. It always ends up counting to one and no further.
Try this pattern instead: [^,]+
As you can see in the API, find() will give you the next subsequence that matches the pattern. So this will find your sequences of "non-commas" one after the other.
Your regex, especially the .+ part will match any char sequence of at least length 1. You want the match to be reluctant/lazy so add a ?: [^,]*,.+?
Note that .+? will still match a comma that directly follows a comma so you might want to replace .+? with [^,]+ instead (since commas can't match with this lazyness is not needed).
Besides that an easier solution might be to split the string and get the length of the array (or loop and check the elements if you don't want to allow for empty strings):
countSubstrings = commaString.split(",").length;
Edit:
Since you added an example that clarifies your expectations, you need to adjust your regex. You seem to want to count the number of strings followed by a comma so your regex can be simplified to [^,]+,. This matches any char sequence consisting of non-comma chars which is followed by a comma.
Note that this wouldn't match multiple commas or text at the end of the input, e.g. test,,test would result in a count of 1. If you have that requirement you need to adjust your regex.
So, quite good answers are already given. Very readable. Something like this should work, beware, it's not clean and probably not the fastest way to do this. But is is quite readable. :)
public int countComma(String lots_of_words) {
int count = 0;
for (int x = 0; x < lots_of_words.length(); x++) {
if (lots_of_words.charAt(x) == ',') {
count++;
}
}
return count;
}
Or even better:
public int countChar(String lots_of_words, char the_chosen_char) {
int count = 0;
for (int x = 0; x < lots_of_words.length(); x++) {
if (lots_of_words.charAt(x) == the_chosen_char) {
count++;
}
}
return count;
}
Related
I'm looking for the regex expression that will detect repeating symbols in a String. And currently I didn't found solution that fits all my requirements.
Requirements are pretty simple:
detect any repeating symbol in a String;
to be able to setup repeating count (eg. more than twice)
Examples of required detection (of symbol 'a', more than 2 times, true if detects, false otherwise)
"Abcdefg" - false
"AbcdaBCD" - false
"abcd_ab_ab" - true (symbol 'a' used three times)
"aabbaabb" - true (symbols 'a' used four times)
Since I'm not a pro in regex and usage of them - code snippet and explanation would be appreciated!
Thanks!
I think that
(.).*\1
would work:
(.) match a single character and capture
.* match any intervening characters
\1 match the captured group again.
(You'd need to compile with the DOTALL flag, or replace . with [\s\S] or similar if the string contains characters not ordinarily matched by .)
and if you want to require that it is found at least 3 times, just change the quantifier of the second two bullets:
(.)(.*\1){2}
etc.
This is going to be pretty inefficient, though, because it's going to have to do the "search for the next matching character" between every character in the string and the end of the string, making it at least quadratic.
You might be as well off not using regular expressions, e.g.
char[] cs = str.toCharArray();
Arrays.sort(cs);
int n = numOccurrencesRequired - 1;
for (int i = n; i < cs.length; ++i) {
boolean allSame = true;
for (int j = 1; j <= n && allSame; ++j) {
allSame = cs[i] == cs[i - j];
}
if (allSame) return true;
}
return false;
This sorts all of the same characters together, allowing you just to pass over the string once looking for adjacent equal characters.
Note that this doesn't quite work for any symbol: it will split up multi-char codepoints like 🍕. You can adapt the code above to work with codepoints, rather than chars.
Try this regex: (.)(?:.*\1)
It basically matches any character (.) is followed by anything .* and itself \1. If you want to check for 2 or more repeats only add {n,} at the end with n being the number of repeats you want to check for.
Yea, such regex exists but just because the set of characters is finite.
regex: .*(a.*a|b.*b|c.*c|...|y.*y|z.*z).*
It makes no sense. Use another approach:
String string = "something";
int[] count = new int[256];
for (int i = 0; i < string.length; i++) {
int temp = int(string.charAt(i));
count[temp]++;
}
Now you have all characters counted and you can use them as you wish.
I have a small string like: "OT-0*02"
and I have a "database" where I have the full strings "OT-0502" and "OT-0602".
I need to locate these based on the user input, which looks like the first string above, the star marks the unknown character, and it could be anywhere.
How do I do this? I've tried fooling around with the Pattern.matches...stuff and the regex thingy but it doesn't seem to give me any solution.
This is what I've got so far
rsz="OT-0*02";
int cv = 0;
while (cv < jarmu.size()-1){
if (Pattern.matches(jarmu.get(cv).substring(9), rsz)) {
System.out.println("fasz");
}
cv++;
}
It sounds like you just want a simple wildcard search where * matches any char. To do that, change all * in the user input to ., and escape everything else. Then you can use it as a pattern.
String userPattern = "O*-0*02";
String[] parts = userPattern.split("\\*", -1);
for (int i = 0; i < parts.length(); i++) parts[i] = "\\Q" + parts[i] + "\\E";
userPattern = String.join(".", parts);
Then use ****.matches(userPattern) to check if a string matches the pattern.
Basically, you want * to mean "match any single char" but in regex, . performs this function, so you replace *s with .s. However, you don't want anything else to be interpreted as a special character, so the string is broken up into parts at the *s, and each part is quoted using \Q and \E to remove any special meaning, and then put together, with .s where the *s used to be. The -1 means *s at the ends of the string won't be lost.
String abc = "||:::|:|::";
It should return true if there's two | and three : appearances.
I'm not sure how to use "regex" or if it's the right method to use. There's no specific pattern in the abc String.
Using a regex would be a bad idea, especially if there's no specific order to them. Make a function that counts the number of times a character sppears in a string, and use that:
public int count(String base, char toFind)
{
int count = 0;
char[] haystack = base.toCharArray();
for (int i = 0; i < haystack.length; i++)
if (haystack[i] == toFind)
count++;
return count;
}
String abc = "||:::|:|::";
if (count(abc,"|") >= 2 && count(abc,":") >= 3)
{
//Do some code here
}
My favorite method for searching for the number of characters in a string is int num = s.length() - s.replaceAll("|","").length(); you can do that for both and test those ints.
If you want to test all conditions in one regex you can use look-ahead (?=condition).
Your regex can look like
String regex =
"(?=(.*[|]){2})"//contains two |
+ "(?=(.*:){3})"//contains three :
+ "[|:]+";//is build only from : and | characters
Now you can use it with matches like
String abc = "||:::|:|::";
System.out.println(abc.matches(regex));//true
abc = "|::::::";
System.out.println(abc.matches(regex));//false
Anyway I you can avoid regex and write your own method which will calculate number of | and : in your string and check if this numbers are greater or equal to 2 and 3. You can use StringUtils.countMatches from apache-commons so your test code could look like
public static boolean testString(String s){
int pipes = StringUtils.countMatches(s, "|");
int colons = StringUtils.countMatches(s, ":");
return pipes>=2 && colons>=3;
}
or
public static boolean testString(String s){
return StringUtils.countMatches(s, "|")>=2
&& StringUtils.countMatches(s, ":")>=3;
}
This is assuming you are looking for two '|' to be one after the other and the same for the three ':'
and one follows the other .Do it using the following single regular expressions.
".*||.*:::.*"
If you are looking to just check the presence of characters and their irrespective of their order then use String.matches method using the two regular expressions with a logical AND
".*|.*|.*"
".*:.*:.*:.*"
Here is a cheat sheet for regular expressions. Its fairly simple to learn. Look at groups and quantifiers in the document to understand the above expression.
Haven't tested it, but this should work
Pattern.compile("^(?=.*[|]{2,})(?=.*[:]{3,})$");
The entire string is read by ?=.* and checked wether the allowed characters (|) occurs at least twice. The same is then done for :, only that this has to match at least three times.
I want to get the number of substrings out of a string.
The inputs are excel formulas like IF(....IF(...))+IF(...)+SUM(..) as a string. I want to count all IF( substrings. It's important that SUMIF(...) and COUNTIF(...) will not be counted.
I thought to check that there is no capital letter before the "IF", but this is giving (certainly) index out of bound. Can someone give me a suggestion?
My code:
for(int i = input.indexOf("IF(",input.length());
i != -1;
i= input.indexOf("IF(,i- 1)){
if(!isCapitalLetter(tmpFormulaString, i-1)){
ifStatementCounter++;
}
}
Although you can do the parsing by yourself as you were doing (that's possibly better for you to learn debugging so you know what your problem is)
However it can be easily done by regular expression:
String s = "FOO()FOOL()SOMEFOO()FOO";
Pattern p = Pattern.compile("\\bFOO\\b");
Matcher m = p.matcher(s);
int count = 0;
while (m.find()) {
count++;
}
// count= 2
The main trick here is \b in the regex. \b means word boundary. In short, if there is a alphanumeric character at the position of \b, it will not match.
http://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html
I think you can solve your problem by finding String IF(.
Try to do same thing in another way .
For example:
inputStrin = IF(hello)IF(hello)....IF(helloIF(hello))....
inputString.getIndexOf("IF(");
That solves your problem?
Click Here Or You can use regular expression also.
I need to find a word in a HTML source code. Also I need to count occurrence. I am trying to use regular expression. But it says 0 match found.
I am using regular expression as I thought its the best way. In case of any better way, please let me know.
I need to find the occurrence of the word "hsw.ads" in HTML source code.
I have taken following steps.
int count = 0;
{
Pattern p = Pattern.compile(".*(hsw.ads).*");
Matcher m = p.matcher(SourceCode);
while(m.find())count++;
}
But the count is 0;
Please let me know your solutions.
Thank you.
Help Seeker
You are not matching any "expression", so probably a simple string search would be better. commons-lang has StringUtils.countMatches(source, "yourword").
If you don't want to include commons-lang, you can write that manually. Simply use source.indexOf("yourword", x) multiple times, each time supplying a greater value of x (which is the offset), until it gets -1
You should try this.
private int getWordCount(String word,String source){
int count = 0;
{
Pattern p = Pattern.compile(word);
Matcher m = p.matcher(source);
while(m.find()) count++;
}
return count;
}
Pass the word (Not pattern) you want to search in a string.
To find a string in Java you can use String methods indexOf which tells you the index of the first character of the string you searched for. To find all of them and count them you can do this (there might be a faster way but this should work). I would recommend using StringUtils CountMatches method.
String temp = string; //Copy to save the string
int count = 0;
String a = "hsw.ads";
int i = 0;
while(temp.indexOf(a, i) != -1) {
count++;
i = temp.indexof(a, i) + a.length() + 1;
}
StringUtils.countMatches(SourceCode, "hsw.ads") ought to work, however sticking with the approach you have above (which is valid), I'd recommend a few things:
1. As John Haager mentioned, remove the opening/closing .* will help, becuase you're looking for that exact substring
2. You want to escape the '.' because you're searching for a literal '.' and not a wildcard
3. I would make this Pattern a constant and re-use it rather than re-creating it each time.
That said, I'd still suggest using the approaches above, but I thought I'd just point out your current approach isn't conceptually flawed; just a few implementation details missing.
Your code and regular expression is valid. You don't need to include the .* at the beginning and the end of your regex. For example:
String t = "hsw.ads hsw.ads hsw.ads";
int count = 0;
Matcher m = Pattern.compile("hsw\\.ads").matcher(t);
while (m.find()){ count++; }
In this case, count is 3. And another thing, if you're going to use a regex, if you REALLY want to specifically look for a '.' period between hsw and ads, you need to escape it.