I need to find a word in HTML source code and count its occurrences. I am trying to use a regular expression, but it finds 0 matches.
I am using a regular expression because I thought it was the best way. If there is a better way, please let me know.
Specifically, I need to count the occurrences of the word "hsw.ads" in the HTML source code.
I have taken the following steps:
int count = 0;
{
Pattern p = Pattern.compile(".*(hsw.ads).*");
Matcher m = p.matcher(SourceCode);
while(m.find())count++;
}
But the count is 0.
Please let me know your solutions.
You are not matching any real pattern, so a plain string search is probably a better fit. Apache Commons Lang has StringUtils.countMatches(source, "yourword").
If you don't want to depend on commons-lang, you can write it yourself: call source.indexOf("yourword", x) repeatedly, each time passing a larger offset x (the position just past the previous match), until it returns -1.
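A minimal sketch of that manual indexOf loop, with no commons-lang dependency. The class and method names (SubstringCounter, countOccurrences) are hypothetical. Because the offset jumps past each match, overlapping occurrences are not counted, which as far as I know mirrors what StringUtils.countMatches does.

```java
final class SubstringCounter {

    private SubstringCounter() {
    }

    // Counts non-overlapping occurrences of word in source using String.indexOf.
    static int countOccurrences(String source, String word) {
        if (source == null || word == null || word.isEmpty()) {
            return 0; // nothing sensible to count
        }
        int count = 0;
        int offset = 0;
        // indexOf returns -1 once there are no more occurrences at or after offset
        while ((offset = source.indexOf(word, offset)) != -1) {
            count++;
            offset += word.length(); // resume the search just past this match
        }
        return count;
    }
}
```

For example, countOccurrences(SourceCode, "hsw.ads") counts the literal text; the '.' needs no escaping because no regex is involved.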
You should try this.
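import java.util.regex.Matcher;
import java.util.regex.Pattern;
private int getWordCount(String word, String source) {
    int count = 0;
    // Pattern.quote treats the word literally, so '.' is not a wildcard
    Pattern p = Pattern.compile(Pattern.quote(word));
    Matcher m = p.matcher(source);
    while (m.find()) {
        count++;
    }
    return count;
}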
Pass the literal word (not a regex pattern) that you want to search for in the string.
To find a string in Java you can use the String method indexOf, which returns the index of the first character of the string you searched for, or -1 if it isn't found. To find and count all occurrences you can do this (there might be a faster way, but this works). I would still recommend the StringUtils.countMatches method.
String temp = string; // the text to search
int count = 0;
String a = "hsw.ads";
int i = 0;
// indexOf returns -1 once there are no further occurrences at or after i
while ((i = temp.indexOf(a, i)) != -1) {
    count++;
    i += a.length(); // continue searching right after this match
}
StringUtils.countMatches(SourceCode, "hsw.ads") should work. However, if you stick with the approach you have above (which is valid), I'd recommend a few things:
1. As John Haager mentioned, remove the leading and trailing .*, because you're looking for that exact substring. With them, each match consumes the rest of the line, so several occurrences on the same line are counted only once.
2. Escape the '.' (as "hsw\\.ads" in a Java string literal), because you're searching for a literal '.' and not a wildcard.
3. I would make this Pattern a constant and re-use it rather than re-creating it each time.
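Putting the three suggestions together, a sketch might look like this. The class name HswAdsCounter and the constant HSW_ADS are hypothetical names chosen for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class HswAdsCounter {

    // Compiled once and reused; "\\." matches a literal dot instead of any character.
    // No leading/trailing .* is needed because find() searches anywhere in the input.
    private static final Pattern HSW_ADS = Pattern.compile("hsw\\.ads");

    private HswAdsCounter() {
    }

    static int count(String sourceCode) {
        Matcher m = HSW_ADS.matcher(sourceCode);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }
}
```

With the escaped dot, text such as "hswXads" is no longer counted.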
That said, I'd still suggest the approaches above; I just wanted to point out that your current approach isn't conceptually flawed, it's only missing a few implementation details.
Your code and regular expression are valid, but you don't need the .* at the beginning and end of your regex. For example:
String t = "hsw.ads hsw.ads hsw.ads";
int count = 0;
Matcher m = Pattern.compile("hsw\\.ads").matcher(t);
while (m.find()){ count++; }
In this case, count is 3. Also, if you use a regex and really want to match a literal '.' (period) between hsw and ads, you need to escape it, as in the pattern above.
Related
I'm trying to write a function to count specific Strings.
The strings to count look like this: first at least one character that is not a comma, then a comma, then at least one arbitrary character.
Example string:
test, test, test,
This should count to 3.
I've tried to do it like this:
int countSubstrings = 0;
final Pattern pattern = Pattern.compile("[^,]*,.+");
final Matcher matcher = pattern.matcher(commaString);
while (matcher.find()) {
countSubstrings++;
}
However, my solution doesn't work: it always counts to one and no further.
Try this pattern instead: [^,]+
As the API documentation explains, find() returns the next subsequence that matches the pattern, so this finds your runs of non-comma characters one after the other.
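A small sketch of that suggestion (the class name NonCommaRunCounter is hypothetical). Each find() resumes where the previous match ended, so every maximal run of non-comma characters is counted once.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class NonCommaRunCounter {

    // One match per maximal run of non-comma characters.
    private static final Pattern NON_COMMA_RUN = Pattern.compile("[^,]+");

    private NonCommaRunCounter() {
    }

    static int count(String commaString) {
        Matcher matcher = NON_COMMA_RUN.matcher(commaString);
        int count = 0;
        // each find() continues after the previous match
        while (matcher.find()) {
            count++;
        }
        return count;
    }
}
```

For "test, test, test," the matches are "test", " test" and " test"; the leading spaces are part of the matches, which doesn't matter when you only count them.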
Your regex, especially the .+ part, matches any character sequence of length 1 or more; because it is greedy, it consumes the rest of the line, so find() succeeds only once. You want the match to be reluctant (lazy), so add a ?: [^,]*,.+?
Note that .+? will still match a comma that directly follows another comma, so you might want to replace .+? with [^,]+ instead (since it cannot match commas, laziness is not needed).
Besides that, an easier solution might be to split the string and take the length of the resulting array (or loop over the elements and check them if you don't want to count empty strings):
countSubstrings = commaString.split(",").length;
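A sketch of the "loop and check the elements" variant. Treating whitespace-only parts as empty is an assumption, and the names SplitCounter and countNonBlankParts are hypothetical. split(",") already drops trailing empty strings, which is why the trailing comma in "test, test, test," does not add an element.

```java
final class SplitCounter {

    private SplitCounter() {
    }

    // Splits on commas and counts only parts containing something other than whitespace.
    static int countNonBlankParts(String commaString) {
        int count = 0;
        for (String part : commaString.split(",")) {
            if (!part.trim().isEmpty()) {
                count++;
            }
        }
        return count;
    }
}
```

With this check, "test,,test" counts as 2, whereas split(",").length would report 3.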
Edit:
Since you added an example that clarifies your expectations, you need to adjust your regex. You seem to want to count the strings that are followed by a comma, so your regex can be simplified to [^,]+, (note the trailing comma). This matches any sequence of non-comma characters that is followed by a comma.
Note that this wouldn't match multiple commas or text at the end of the input, e.g. test,,test would result in a count of 1. If you have that requirement you need to adjust your regex.
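A sketch of the simplified regex in use (the class name CommaTerminatedCounter is hypothetical), including the test,,test case mentioned above:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CommaTerminatedCounter {

    // A run of non-comma characters immediately followed by a comma.
    private static final Pattern TERMINATED_RUN = Pattern.compile("[^,]+,");

    private CommaTerminatedCounter() {
    }

    static int count(String commaString) {
        Matcher matcher = TERMINATED_RUN.matcher(commaString);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }
}
```

"test, test, test," gives 3, while "test,,test" gives 1 because the final "test" has no comma after it.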
Quite good answers have already been given. Something like this should also work; beware, it's not clean and probably not the fastest way to do this, but it is quite readable. :)
public int countComma(String lots_of_words) {
int count = 0;
for (int x = 0; x < lots_of_words.length(); x++) {
if (lots_of_words.charAt(x) == ',') {
count++;
}
}
return count;
}
Or even better:
public int countChar(String lots_of_words, char the_chosen_char) {
int count = 0;
for (int x = 0; x < lots_of_words.length(); x++) {
if (lots_of_words.charAt(x) == the_chosen_char) {
count++;
}
}
return count;
}
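Since Java 8 the same per-character check can also be written with String.chars(). This is an alternative to the loop above, not necessarily a faster one, and the names CharCounter and countChar are hypothetical. Like the loop, it compares individual UTF-16 chars.

```java
final class CharCounter {

    private CharCounter() {
    }

    // String.chars() streams the chars as ints; keep those equal to target and count them.
    static long countChar(String text, char target) {
        return text.chars().filter(c -> c == target).count();
    }
}
```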
I am trying to search for a string inside file content that I have read into a String.
I've tried to use Pattern and Matcher, which worked for this case:
Pattern p = Pattern.compile("(</machine>)");
Matcher m = p.matcher(text);
while(m.find()) //if the text "(</machine>)" was found, enter
{
Counter++;
}
return Counter;
Then, I tried to use the same code to find how many tags I have:
Pattern tagsP = Pattern.compile("(</");
Matcher tagsM = tagsP.matcher(text);
while(tagsM.find()) //if the text "(</" was found, enter
{
CounterTags++;
}
return CounterTags;
In this case, the return value was always 0.
Try the code below, which avoids the Matcher loop and uses String.split instead:
import java.util.regex.Pattern;
String actualString = "hello hi how(</machine>) are you doing. Again hi (</machine>) friend (</machine>) hope you are (</machine>)doing good.";
// actualString is the text you read from the file
String toMatch = Pattern.quote("(</machine>)"); // converts the text to a regex literal
// split around every match; limit -1 keeps trailing empty strings,
// so the number of matches is the array length minus 1
int count = actualString.split(toMatch, -1).length - 1;
System.out.println(count);
Output: 4
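For comparison, the Matcher loop from the question also works once the search text is quoted. Unquoted, Pattern.compile("(</") throws a PatternSyntaxException, because '(' opens a capturing group that is never closed. A sketch with hypothetical class and method names:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LiteralMatchCounter {

    private LiteralMatchCounter() {
    }

    // Counts non-overlapping occurrences of the literal text, whatever characters it contains.
    static int countLiteral(String text, String literal) {
        if (literal.isEmpty()) {
            return 0; // an empty pattern would match at every position
        }
        // Pattern.quote makes '(' and other metacharacters match literally
        Matcher matcher = Pattern.compile(Pattern.quote(literal)).matcher(text);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }
}
```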
You can use the Apache Commons Lang library, which has a countMatches method for exactly this:
int count = StringUtils.countMatches(text, "substring");
This method is also null-safe: it returns 0 if either argument is null or empty.
I recommend exploring the Apache Commons libraries; they provide a lot of useful utility methods.
It's basically about getting the string value between two characters. Stack Overflow has many questions related to this, such as:
How to get a part of a string in java?
How to get a string between two characters?
Extract string between two strings in java
and more.
But I found it quite confusing when the string contains multiple dots and I need the value between two particular dots.
I have the package name:
au.com.newline.myact
I need to get the value between "com." and the next dot (.), in this case "newline". I tried:
Pattern pattern = Pattern.compile("com.(.*).");
Matcher matcher = pattern.matcher(beforeTask);
while (matcher.find()) {
int ct = matcher.group();
I also tried substring and indexOf, but couldn't get the intended result. Because Android package names vary in the number of dots and characters, I cannot use a fixed index. Any ideas?
As you probably know (based on the .* part of your regex), the dot . is a special character in regular expressions that represents any character (except line separators). So to make it match only a literal dot, you need to escape it: place \ before it (written "\\." in a Java string literal), or put it inside the character class [.].
Also, to get only the part captured by the parentheses (.*), you need to select it with the proper group index, which in your case is 1.
So try:
String beforeTask = "au.com.newline.myact";
Pattern pattern = Pattern.compile("com[.](.*)[.]");
Matcher matcher = pattern.matcher(beforeTask);
while (matcher.find()) {
String ct = matcher.group(1);//remember that regex finds Strings, not int
System.out.println(ct);
}
Output: newline
If you want only the element before the next ., you need to change the greedy behaviour of the * quantifier in .* to reluctant by adding ? after it, like:
Pattern pattern = Pattern.compile("com[.](.*?)[.]");
// ^
Another approach is to accept only non-dot characters instead of using .*. They can be represented by the negated character class [^.]*:
Pattern pattern = Pattern.compile("com[.]([^.]*)[.]");
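To see the difference between the three patterns, here is a small sketch run against a package name with more than two dots after "com.". The helper name firstGroup is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SegmentAfterCom {

    private SegmentAfterCom() {
    }

    // Returns group 1 of the first match, or null if the pattern does not match.
    static String firstGroup(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        return matcher.find() ? matcher.group(1) : null;
    }
}
```

On "au.com.newline.myact.modelact", the greedy "com[.](.*)[.]" returns "newline.myact" (it backtracks only to the last dot), while both "com[.](.*?)[.]" and "com[.]([^.]*)[.]" return "newline".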
If you don't want to use regex, you can simply use the indexOf method to locate the position of "com." and of the next "." after it, and then take the substring you want.
String beforeTask = "au.com.newline.myact.modelact";
int start = beforeTask.indexOf("com.") + 4; // +4 to skip the 'com.' part (assumes "com." is present)
int end = beforeTask.indexOf(".", start); // find the next `.` after the start index
String result = beforeTask.substring(start, end);
System.out.println(result);
You can use reflection to get the fully qualified name of any class. For example:
If I have a class Runner in the package com.some.pkg, I can run
Runner.class.getName() // returns "com.some.pkg.Runner"; toString() would prepend "class "
to get the full name of the class, which includes the package name.
To get the segment after "com", you can use Runner.class.getName().split("\\.") (split takes a regex, so the dot has to be escaped) and then iterate over the returned array with a boolean flag.
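A sketch of the boolean-flag loop. The helper name segmentAfter is hypothetical, and since no com.* class is at hand here, the check below uses a JDK class (java.util.ArrayList) with "java" as the marker segment.

```java
final class PackageSegments {

    private PackageSegments() {
    }

    // Returns the segment right after `marker` in the class's fully qualified name,
    // or null if the marker is absent or is the last segment.
    static String segmentAfter(Class<?> type, String marker) {
        // getName() gives e.g. "java.util.ArrayList"; split takes a regex, so the dot is escaped
        String[] parts = type.getName().split("\\.");
        boolean found = false;
        for (String part : parts) {
            if (found) {
                return part; // first segment after the marker
            }
            found = part.equals(marker);
        }
        return null;
    }
}
```

For example, segmentAfter(java.util.ArrayList.class, "java") returns "util"; with your own class it would be segmentAfter(Runner.class, "com").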
All you have to do is split the string by "." and then iterate through the parts until you find one that equals "com". The next string in the array is the one you want.
So your code would look something like:
String[] parts = packageName.split("\\.");
int i = 0;
for (String part : parts) {
    if (part.equals("com")) {
        break;
    }
    ++i;
}
// i is now the index of "com" (or parts.length if it is missing); guard the lookup
String result = (i + 1 < parts.length) ? parts[i + 1] : null;
private String getStringAfterComDot(String packageName) {
    String[] strArr = packageName.split("\\.");
    // stop one short of the end so strArr[i + 1] is always in bounds
    for (int i = 0; i < strArr.length - 1; i++) {
        if (strArr[i].equals("com")) {
            return strArr[i + 1];
        }
    }
    return "";
}
I have done heaps of website-scraping projects before, and I just create my own functions/utils to get the job done. Regex can be overkill if you just want to extract a substring from a given string like yours. Below is the function I normally use for this kind of task.
private String GetValueFromText(String sText, String sBefore, String sAfter)
{
    String sRetValue = "";
    int nPos = sText.indexOf(sBefore);
    if (nPos > -1)
    {
        // search for sAfter starting right after the end of sBefore
        int nStart = nPos + sBefore.length();
        int nLast = sText.indexOf(sAfter, nStart);
        if (nLast > -1)
        {
            sRetValue = sText.substring(nStart, nLast);
        }
    }
    return sRetValue;
}
To use it just do the following:
String sValue = GetValueFromText("au.com.newline.myact", ".com.", ".");
I want to count the occurrences of a substring in a string.
The inputs are Excel formulas such as IF(....IF(...))+IF(...)+SUM(..), given as strings. I want to count all IF( substrings. It's important that SUMIF(...) and COUNTIF(...) are not counted.
I thought of checking that there is no capital letter before the "IF", but this (unsurprisingly) throws an index-out-of-bounds exception. Can someone give me a suggestion?
My code:
for(int i = input.indexOf("IF(",input.length());
i != -1;
i= input.indexOf("IF(,i- 1)){
if(!isCapitalLetter(tmpFormulaString, i-1)){
ifStatementCounter++;
}
}
You can do the parsing yourself, as you were doing (which may be better for you, since it teaches you to debug and find out what your problem is).
However, it can easily be done with a regular expression:
String s = "FOO()FOOL()SOMEFOO()FOO";
Pattern p = Pattern.compile("\\bFOO\\b");
Matcher m = p.matcher(s);
int count = 0;
while (m.find()) {
count++;
}
// count= 2
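The main trick here is \b in the regex. \b matches a word boundary: a position between a word character (letter, digit or underscore) and a non-word character, or the start or end of the input. So \bFOO\b does not match FOO when it is directly preceded or followed by another word character, as in FOOL or SOMEFOO.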
http://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html
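Applied to the original question, a sketch of the same idea for IF( (the class name IfCounter is hypothetical). It assumes that any letter, digit or underscore directly before IF means IF is part of a longer name such as SUMIF or COUNTIF.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class IfCounter {

    // \b before IF rejects SUMIF( and COUNTIF(: the preceding letter leaves no word boundary.
    // "\\(" matches the literal opening parenthesis.
    private static final Pattern IF_CALL = Pattern.compile("\\bIF\\(");

    private IfCounter() {
    }

    static int countIfCalls(String formula) {
        Matcher m = IF_CALL.matcher(formula);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }
}
```

For "IF(A1>0,IF(B1>0,1,2),3)+IF(C1,4,5)+SUMIF(D1:D5,1)+COUNTIF(E1:E5,1)" this counts 3.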
I think you can solve your problem by searching for the String IF(.
Try doing the same thing in another way.
For example:
String inputString = "IF(hello)IF(hello)....IF(helloIF(hello))....";
int index = inputString.indexOf("IF(");
Does that solve your problem?
Or you can also use a regular expression.
I've been implementing an application that retrieves a word from an incoming String parameter. This parameter can vary, since it is a URL, but almost all incoming URLs follow the same pattern. For instance, I could have:
GET /com.myapplication.v4.ws.monitoring.ModuleSystemMonitor HTTP/1.1
or
GET /com.myapplication.filesystem.ws.ModuleFileSystem/getIdFolders/jsonp?idFolder=idFis1&callback=__gwt_jsonp__.P0.onSuccess&failureCallback=__gwt_jsonp__.P0.onFailure HTTP/1.1
In either case, I want to extract the word that starts with Module: for the first incoming parameter I want to get ModuleSystemMonitor, and for the second one I want to get ModuleFileSystem.
This is the whole requirement; I'm not allowed to do anything else: just a method that receives a line and tries to extract the words I mentioned, ModuleSystemMonitor and ModuleFileSystem.
I've been thinking of using the StringTokenizer class or the String#split method, but I'm not sure they are the best option. Using indexOf, it is easy to find the word that begins with Module, but how do I cut the word off when in some cases it is followed by a white space (like the first sample) and in others by a "/" (slash, like the second)? I know I can write an "if" statement and cut the word when I hit a white space or a slash, but I wonder whether there is a more dynamic way.
I'm not sure this is the best solution but you could try this:
String[] tmp = yourString.split("\\.|/| ");
for (int i = 0; i < tmp.length; i++) {
    if (tmp[i].startsWith("Module")) {
        return tmp[i];
    }
}
return null;
You can just use String.indexOf and String.substring like this:
int startIndex = url.indexOf("Module");
if (startIndex == -1) {
    return null; // no word starting with "Module"
}
int index = startIndex + "Module".length();
// advance to the first non-letter character (or the end of the string)
while (index < url.length() && Character.isLetter(url.charAt(index))) {
    index++;
}
return url.substring(startIndex, index);
This assumes that the first non-letter character marks the end of the word.
String stringToSearch = "GET /com.myapplication.v4.ws.monitoring.ModuleSystemMonitor HTTP/1.1";
Pattern pattern = Pattern.compile("(Module[a-zA-Z]*)");
Matcher matcher = pattern.matcher(stringToSearch);
if (matcher.find()){
System.out.println(matcher.group(1));
}
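The same regex wrapped in the single method the question asks for, as a sketch; ModuleNameExtractor and extract are hypothetical names, and returning null when nothing matches is an assumption about how a miss should be reported.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ModuleNameExtractor {

    // "Module" followed by any run of ASCII letters; the match stops at the first
    // non-letter, e.g. the space before "HTTP/1.1" or the '/' before "getIdFolders".
    private static final Pattern MODULE_WORD = Pattern.compile("Module[a-zA-Z]*");

    private ModuleNameExtractor() {
    }

    // Returns the first word starting with "Module", or null if there is none.
    static String extract(String requestLine) {
        Matcher matcher = MODULE_WORD.matcher(requestLine);
        return matcher.find() ? matcher.group() : null;
    }
}
```

Both sample request lines work without any special handling of spaces or slashes, because the character class simply stops at whichever non-letter comes first.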