Get number of exact substrings - java

I want to get the number of substrings out of a string.
The inputs are excel formulas like IF(....IF(...))+IF(...)+SUM(..) as a string. I want to count all IF( substrings. It's important that SUMIF(...) and COUNTIF(...) will not be counted.
I thought to check that there is no capital letter before the "IF", but this is giving (certainly) index out of bound. Can someone give me a suggestion?
My code:
for(int i = input.indexOf("IF(",input.length());
i != -1;
i= input.indexOf("IF(,i- 1)){
if(!isCapitalLetter(tmpFormulaString, i-1)){
ifStatementCounter++;
}
}

Although you can do the parsing by yourself as you were doing (that's possibly better for you to learn debugging so you know what your problem is)
However it can be easily done by regular expression:
String s = "FOO()FOOL()SOMEFOO()FOO";
Pattern p = Pattern.compile("\\bFOO\\b");
Matcher m = p.matcher(s);
int count = 0;
while (m.find()) {
count++;
}
// count= 2
The main trick here is \b in the regex. \b means word boundary. In short, if there is a alphanumeric character at the position of \b, it will not match.
http://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html

I think you can solve your problem by finding String IF(.
Try to do same thing in another way .
For example:
inputStrin = IF(hello)IF(hello)....IF(helloIF(hello))....
inputString.getIndexOf("IF(");
That solves your problem?
Click Here Or You can use regular expression also.

Related

Counting comma and any text in java String

I'm trying to write a function to count specific Strings.
The Strings to count look like the following:
first any character except comma at least once -
the comma -
any chracter but at least once
example string:
test, test, test,
should count to 3
I've tried do that by doing the following:
int countSubstrings = 0;
final Pattern pattern = Pattern.compile("[^,]*,.+");
final Matcher matcher = pattern.matcher(commaString);
while (matcher.find()) {
countSubstrings++;
}
Though my solution doesn't work. It always ends up counting to one and no further.
Try this pattern instead: [^,]+
As you can see in the API, find() will give you the next subsequence that matches the pattern. So this will find your sequences of "non-commas" one after the other.
Your regex, especially the .+ part will match any char sequence of at least length 1. You want the match to be reluctant/lazy so add a ?: [^,]*,.+?
Note that .+? will still match a comma that directly follows a comma so you might want to replace .+? with [^,]+ instead (since commas can't match with this lazyness is not needed).
Besides that an easier solution might be to split the string and get the length of the array (or loop and check the elements if you don't want to allow for empty strings):
countSubstrings = commaString.split(",").length;
Edit:
Since you added an example that clarifies your expectations, you need to adjust your regex. You seem to want to count the number of strings followed by a comma so your regex can be simplified to [^,]+,. This matches any char sequence consisting of non-comma chars which is followed by a comma.
Note that this wouldn't match multiple commas or text at the end of the input, e.g. test,,test would result in a count of 1. If you have that requirement you need to adjust your regex.
So, quite good answers are already given. Very readable. Something like this should work, beware, it's not clean and probably not the fastest way to do this. But is is quite readable. :)
public int countComma(String lots_of_words) {
int count = 0;
for (int x = 0; x < lots_of_words.length(); x++) {
if (lots_of_words.charAt(x) == ',') {
count++;
}
}
return count;
}
Or even better:
public int countChar(String lots_of_words, char the_chosen_char) {
int count = 0;
for (int x = 0; x < lots_of_words.length(); x++) {
if (lots_of_words.charAt(x) == the_chosen_char) {
count++;
}
}
return count;
}

Regex to match only letters and numbers

Can you help with this code?
It seems easy, but always fails.
#Test
public void normalizeString(){
StringBuilder ret = new StringBuilder();
//Matcher matches = Pattern.compile( "([A-Z0-9])" ).matcher("P-12345678-P");
Matcher matches = Pattern.compile( "([\\w])" ).matcher("P-12345678-P");
for (int i = 1; i < matches.groupCount(); i++)
ret.append(matches.group(i));
assertEquals("P12345678P", ret.toString());
}
Constructing a Matcher does not automatically perform any matching. That's in part because Matcher supports two distinct matching behaviors, differing in whether the match is implicitly anchored to the beginning of the Matcher's region. It appears that you could achieve your desired result like so:
#Test
public void normalizeString(){
StringBuilder ret = new StringBuilder();
Matcher matches = Pattern.compile( "[A-Z0-9]+" ).matcher("P-12345678-P");
while (matches.find()) {
ret.append(matches.group());
}
assertEquals("P12345678P", ret.toString());
}
Note in particular the invocation of Matcher.find(), which was a key omission from your version. Also, the nullary Matcher.group() returns the substring matched by the last find().
Furthermore, although your use of Matcher.groupCount() isn't exactly wrong, it does lead me suspect that you have the wrong idea about what it does. In particular, in your code it will always return 1 -- it inquires about the pattern, not about matches to it.
First of all you don't need to add any group because entire match can be always accessed by group 0, so instead of
(regex) and group(1)
you can use
regex and group(0)
Next thing is that \\w is already character class so you don't need to surround it with another [ ], because it will be similar to [[a-z]] which is same as [a-z].
Now in your
for (int i = 1; i < matches.groupCount(); i++)
ret.append(matches.group(i));
you will iterate over all groups from 1 but you will exclude last group, because they are indexed from 1 so n so i<n will not include n. You would need to use i <= matches.groupCount() instead.
Also it looks like you are confusing something. This loop will not find all matches of regex in input. Such loop is used to iterate over groups in used regex after match for regex was found.
So if regex would be something like (\w(\w))c and your match would be like abc then
for (int i = 1; i < matches.groupCount(); i++)
System.out.println(matches.group(i));
would print
ab
b
because
first group contains two characters (\w(\w)) before c
second group is the one inside first one, right after first character.
But to print them you actually would need to first let regex engine iterate over your input and find() match, or check if entire input matches() regex, otherwise you would get IllegalStateException because regex engine can't know from which match you want to get your groups (there can be many matches of regex in input).
So what you may want to use is something like
StringBuilder ret = new StringBuilder();
Matcher matches = Pattern.compile( "[A-Z0-9]" ).matcher("P-12345678-P");
while (matches.find()){//find next match
ret.append(matches.group(0));
}
assertEquals("P12345678P", ret.toString());
Other way around (and probably simpler solution) would be actually removing all characters you don't want from your input. So you could just use replaceAll and negated character class [^...] like
String input = "P-12345678-P";
String result = input.replaceAll("[^A-Z0-9]+", "");
which will produce new string in which all characters which are not A-Z0-9 will be removed (replaced with "").

java check the first and last char are uppercase

I am trying to achieve this.
I have a string of 9char (always the same). But i also know that the first and last char is always a aplhabetic, it must be. the rest in the middle are numbers. How to check for that.
I got this logic so far, syntax is my problem
string samplestring;
samplestring = a1234567B
If(samplestring.length() == 9 && samplestring.substring(0,1).uppercase && samplestring.substring(8,9) && samplestring.THE REST OF THE CHAR IN THE MIDDLE ARE DIGITS)
{
println("yes this is correct");
}
else
{
println("retype");
}
Please dont mind about the simple english just want to know the syntax but the logic is there i hope..
Also can please show me those lowercase ones how to convert to uppercase?
A regular expression would be suitable:
String s = new String("A2345678Z");
if (s.matches("[A-Z][0-9]{7}[A-Z]")))
{
}
Regular expression explained:
[A-Z] means any uppercase letter
[0-9]{7} means 7 digits
Pattern p = Pattern.compile("^[A-Za-z]\\d+[A-Za-z]$");
Matcher m = p.match("A1234567B");
if (m.matches()) {
//
}
Edit:
If there are always seven digits, you can replace the \\d+ with \\d{7}
String str="A12345678B";
char first = str.charAt(0);
char second = str.charAt(str.length()-1);
if(Character.isUpperCase(first)&& Character.isUpperCase(second)){
//do something
}

Java Find word in a String

I need to find a word in a HTML source code. Also I need to count occurrence. I am trying to use regular expression. But it says 0 match found.
I am using regular expression as I thought its the best way. In case of any better way, please let me know.
I need to find the occurrence of the word "hsw.ads" in HTML source code.
I have taken following steps.
int count = 0;
{
Pattern p = Pattern.compile(".*(hsw.ads).*");
Matcher m = p.matcher(SourceCode);
while(m.find())count++;
}
But the count is 0;
Please let me know your solutions.
Thank you.
Help Seeker
You are not matching any "expression", so probably a simple string search would be better. commons-lang has StringUtils.countMatches(source, "yourword").
If you don't want to include commons-lang, you can write that manually. Simply use source.indexOf("yourword", x) multiple times, each time supplying a greater value of x (which is the offset), until it gets -1
You should try this.
private int getWordCount(String word,String source){
int count = 0;
{
Pattern p = Pattern.compile(word);
Matcher m = p.matcher(source);
while(m.find()) count++;
}
return count;
}
Pass the word (Not pattern) you want to search in a string.
To find a string in Java you can use String methods indexOf which tells you the index of the first character of the string you searched for. To find all of them and count them you can do this (there might be a faster way but this should work). I would recommend using StringUtils CountMatches method.
String temp = string; //Copy to save the string
int count = 0;
String a = "hsw.ads";
int i = 0;
while(temp.indexOf(a, i) != -1) {
count++;
i = temp.indexof(a, i) + a.length() + 1;
}
StringUtils.countMatches(SourceCode, "hsw.ads") ought to work, however sticking with the approach you have above (which is valid), I'd recommend a few things:
1. As John Haager mentioned, remove the opening/closing .* will help, becuase you're looking for that exact substring
2. You want to escape the '.' because you're searching for a literal '.' and not a wildcard
3. I would make this Pattern a constant and re-use it rather than re-creating it each time.
That said, I'd still suggest using the approaches above, but I thought I'd just point out your current approach isn't conceptually flawed; just a few implementation details missing.
Your code and regular expression is valid. You don't need to include the .* at the beginning and the end of your regex. For example:
String t = "hsw.ads hsw.ads hsw.ads";
int count = 0;
Matcher m = Pattern.compile("hsw\\.ads").matcher(t);
while (m.find()){ count++; }
In this case, count is 3. And another thing, if you're going to use a regex, if you REALLY want to specifically look for a '.' period between hsw and ads, you need to escape it.

Java regex: check if word has non alphanumeric characters

This is my code to determine if a word contains any non-alphanumeric characters:
String term = "Hello-World";
boolean found = false;
Pattern p = Pattern.Compile("\\W*");
Matcher m = p.Matcher(term);
if(matcher.find())
found = true;
I am wondering if the regex expression is wrong. I know "\W" would matches any non-word characters. Any idea on what I am missing ??
Change your regex to:
.*\\W+.*
This is the expresion you are looking for:
"^[a-zA-Z0-9]+$"
When it evaluates to false that means does not match so that mean you found what you wanted.
It's 2016 or later and you should think about international strings from other alphabets than just Latin. The frequently cited [^a-zA-Z] will not match in that case. There are better ways in Java now:
[^\\p{IsAlphabetic}^\\p{IsDigit}]
See the reference (section "Classes for Unicode scripts, blocks, categories and binary properties"). There's also this answer that I found helpful.
Methods are in the wrong case.
The matcher was declared as m but used as matcher.
The repetition should be "one or many" + instead of "zero or many " *
This works correctly:
String term = "Hello-World";
boolean found = false;
Pattern p = Pattern.compile("\\W+");//<-- compile( not Compile(
Matcher m = p.matcher(term); //<-- matcher( not Matcher
if(m.find()) { //<-- m not matcher
found = true;
}
Btw, it would be enough if you just :
boolean found = m.find();
:)
The problem is the '*'. '*' matches ZERO or more characters. You want to match at least one non word character, so you must use '+' as the quantity modifier. Hence match \W+ (Capital W there for NON word)
Your expression does not take account of possible non-English letters. It's also more complicated than it needs to be. Unless you are using regexs for some reason other than need (such as your professor having told you to) you are much better off with:
boolean found = false;
for (int i=0;i<mystring.length();++i) {
if (!Character.isLetterOrDigit(mystring.charAt(i))) {
found=true;
break;
}
}
When I had to do this same thing the regex I use is "(\w)*" Thats what I use. Not sure if capitol w is the same but I also used parenthesis.
If you are okay to use Apache StringUtils, then it's as simple as following
StringUtils.isAlphanumeric(inp)
if (value.matches(".*[^a-zA-Z0-9].*")) { // tested, seems to work.
System.out.println("match");
} else {
System.out.println("no match");
}

Categories