Pattern Matching inside brackets with % symbol - java

I am new to Java and have been trying to pattern match some data inside a TD tag, in brackets followed by a percentage symbol, but for the life of me I cannot get it to work.
I am sure it is very simple; I just want to extract the number before the % symbol here:
<td>0 items (0%)</td>
I have tried quite a number of suggestions but none seem to work.
linecache = readercache.readLine();
System.out.println(linecache);
Pattern patterncf1 = Pattern.compile("\\((.*?)\\)");
tried
Pattern patterncf1 = Pattern.compile("<td>\\d+ \\w+ \\((\\d+)?%\\)</td>");
tried
Pattern patterncf1 = Pattern.compile("<td>\\((\\d+)?%\\)</td>");
tried
Pattern patterncf1 = Pattern.compile("\\((\\d+)?%\\)");
but am always getting
<td>0 items (0%)</td>
Exception in thread "Thread-0" java.lang.IllegalStateException: No match found
I also tried the suggestion below, but it still errors out, and I would assume that this is the right group in this case.
linecache = readercache.readLine();
System.out.println(linecache);
String pattern = "\\d+(?=%)";
Pattern patterncf1 = Pattern.compile(pattern)
Matcher matchercf1 = patterncf1.matcher(linecache);
String passedvalue = matchercf1.group(1);
System.out.println(passedvalue);
This part in a different section of code works fine.
Pattern patternmb1 = Pattern.compile("<td>(.+?) GB</td>");
Matcher matchermb1 = patternmb1.matcher(line);
if (matchermb1.find()) {
String passedvalue = matchermb1.group(1);
String[] tmpStr = passedvalue.split("\\.") ;
String withoutDecStr = tmpStr[0];
Float passedvalue2 = Float.valueOf(withoutDecStr);
System.out.println("MIU: " + passedvalue2);
JVMinusearray.add(passedvalue2);
I would appreciate if someone could offer some advice please.
Thanks

You can use the following:
Pattern pattern = Pattern.compile("<td>.*\\((\\d+)%\\)</td>");
Matcher matcher = pattern.matcher("<td>0 items (2000%)</td>");
if(matcher.matches()) {
System.out.println(matcher.group(1));
}
This prints the number that immediately precedes the % sign.

If you want to extract the number before %, the following will match it:
(\\d+(?=%))
demo
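Edit:
From your comment, I understand that the problem is identifying the correct group to pick. For group(1) to work, the regex must contain a capturing group, so wrap the expression in parentheses. You also have to call find() before group(); without a successful match attempt the matcher throws IllegalStateException: No match found.
linecache = readercache.readLine();
System.out.println(linecache);
String pattern = "(\\d+(?=%))"; // the outer () creates capturing group 1
Pattern patterncf1 = Pattern.compile(pattern);
Matcher matchercf1 = patterncf1.matcher(linecache);
if (matchercf1.find()) { // group() is only valid after a successful find()
    String passedvalue = matchercf1.group(1);
    System.out.println(passedvalue);
}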
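For reference, here is a minimal self-contained sketch of the same idea. The class and method names (PercentExtractor, extractPercent) are made up for illustration and are not from the question; it assumes the line contains something like "(0%)" and returns an empty result instead of throwing when it does not.

```java
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original code.
class PercentExtractor {
    // One or more digits followed by '%', all inside literal parentheses.
    private static final Pattern PERCENT = Pattern.compile("\\((\\d+)%\\)");

    /** Returns the number before '%', or empty if the line has no "(N%)" part. */
    static OptionalInt extractPercent(String line) {
        Matcher m = PERCENT.matcher(line);
        // group() may only be called after find() or matches() returned true;
        // calling it earlier is what raises "IllegalStateException: No match found".
        if (m.find()) {
            return OptionalInt.of(Integer.parseInt(m.group(1)));
        }
        return OptionalInt.empty();
    }

    public static void main(String[] args) {
        System.out.println(extractPercent("<td>0 items (0%)</td>"));
        System.out.println(extractPercent("<td>12.5 GB</td>"));
    }
}
```

find() is used rather than matches() so that the pattern does not have to describe the whole line, only the part in brackets.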

Thanks for your help. It seems to work with a static string of text but not from the reading in of the data from the html file, so I will take this offline and see what's going on, but both suggestions have worked fine.
Thank you for your time. I appreciate it.
Regards,
Paul

Related

why pattern/matcher find one match twice

I want to find < a > tags in a StringBuilder (result) and insert a word (INSERTED-WORD/) before their href attribute.
code:
Pattern pattern = Pattern.compile("<a [a-zA-Z0-9=\":.;\\s&%_#/\\\\()\\-']*href=['\"]");
Matcher matcher = pattern.matcher(result);
while (matcher.find()) {
int index2 = result.indexOf(matcher.group(0))+ matcher.group(0).length();
result.insert(index2, "INSERTED-WORD/");
}
But some of the tags are found twice (or more), and INSERTED-WORD/ is inserted before their href attribute twice or more.
For example, I want to find this tag:
< a class="link" href="www.example.com">link< /a>
and then change it to
< a class="link" href="INSERTED-WORD/www.example.com">link< /a>
but this code changes it to
< a class="link" href="INSERTED-WORD/INSERTED-WORD/INSERTED-WORD/www.example.com">link< /a>
How can I fix it?
The behavior you see is caused by the use of indexOf. indexOf always returns the index of the first occurrence of the string it is given, so when the same matched text appears more than once in result, every insertion lands after the first occurrence.
This is not the only problem with your code. You also modify result while the matcher is using it. Java's Matcher was not designed to deal with that and will not work correctly. An obvious problem is that it will think result is shorter than it actually is, and there might be other problems.
The following will fix your code:
Pattern pattern = Pattern.compile("<a [a-zA-Z0-9=\":.;\\s&%_#/\\\\()\\-']*href=['\"]");
Matcher matcher = pattern.matcher(result.toString()); // Create new String instead of using result
int found = 0;
while (matcher.find()) {
int index2 = matcher.end();
result.insert(index2 + found++ * "INSERTED-WORD/".length(), "INSERTED-WORD/");
}
I will leave it to you to figure out why found is required: run the code without it and see what happens.
Notes
This is not a good way to solve your problem. anubhava offered a much simpler solution in his comment: result = new StringBuilder(result.toString().replaceAll("<a [^>]*?href=\"(?!INSERTED-WORD/)", "$0INSERTED-WORD/"));
The recommended way to parse HTML is with an HTML parser; https://jsoup.org/ is a good one.
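If you want to stay with a Matcher loop, one way to avoid both indexOf and the running offset is to read from an immutable String and build the output separately. This is only a sketch under assumptions: the class and method names (HrefPrefixer, prefixHrefs) are invented here, and the pattern is a simplified version of the one in the question that does not handle every form of HTML.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original code.
class HrefPrefixer {
    // An opening <a ...> tag up to and including the quote that starts the href value.
    private static final Pattern A_HREF = Pattern.compile("<a\\s[^>]*?href=['\"]");

    /** Inserts prefix right after href=" (or href=') in every matched <a> tag. */
    static String prefixHrefs(String html, String prefix) {
        Matcher m = A_HREF.matcher(html);   // the matcher reads html, which never changes
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            // Copy the input up to the end of this match, then add the prefix.
            out.append(html, last, m.end()).append(prefix);
            last = m.end();
        }
        out.append(html, last, html.length()); // copy the remaining tail
        return out.toString();
    }

    public static void main(String[] args) {
        String html = "<a class=\"link\" href=\"www.example.com\">link</a>";
        System.out.println(prefixHrefs(html, "INSERTED-WORD/"));
    }
}
```

Because the input is never modified, matcher.end() always refers to the right position and no offset correction is needed.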

regex command(remove everything but specified txt)

Does anyone out there know of a regex command that will take the following string
url = http://184.154.145.114:8013/wlraac name = wlr samplerate = 44100 channels = 2 format = S16le
and remove everything but the following
wlr
This line will come up multiple times; everything after the = signs changes, and each time all I want to keep is what's after name =
Any help is appreciated.
You could do something like
.*name =\s*(\w+).*
and replace with the content of group 1
See it here on Regexr
I search for "name =" and anything before it. The \s* matches the whitespace that follows.
Then comes the \w+ inside brackets. \w matches any letter, digit, or underscore (ASCII only, unless you use the option Pattern.UNICODE_CHARACTER_CLASS). Because of the brackets, the match is stored in the first group.
String in = " url = http://184.154.145.114:8013/wlraac name = wlr samplerate = 44100 channels = 2 format = S16le";
Pattern r = Pattern.compile(".*name =\\s*(\\w+).*");
Matcher m = r.matcher(in);
String result = m.replaceAll("$1");
System.out.println(result);
Or, with your code:
String str = line2.replaceAll(".*name =\\s*(\\w+).*", "$1");
From your description it's a little hard to understand what you need.
But regex is overkill. You could use something like this, which returns everything after "name =":
String s = myString.substring(myString.indexOf("name =")+6);
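To keep only the word itself rather than the rest of the line, the substring approach can be extended a little. This is a rough sketch, assuming the fields are separated by single spaces as in the sample line; the names NameField and extractName are made up for illustration.

```java
import java.util.Optional;

// Hypothetical helper, not part of the original code.
class NameField {
    /** Returns the first word after "name =", or empty if the key or value is missing. */
    static Optional<String> extractName(String line) {
        String key = "name =";
        int start = line.indexOf(key);
        if (start < 0) {
            return Optional.empty();        // no "name =" in this line
        }
        // Everything after the key, without surrounding whitespace.
        String rest = line.substring(start + key.length()).trim();
        if (rest.isEmpty()) {
            return Optional.empty();        // key present but no value
        }
        int end = rest.indexOf(' ');        // the value ends at the next space, if any
        return Optional.of(end < 0 ? rest : rest.substring(0, end));
    }

    public static void main(String[] args) {
        String line = "url = http://184.154.145.114:8013/wlraac name = wlr "
                + "samplerate = 44100 channels = 2 format = S16le";
        System.out.println(extractName(line).orElse("(none)"));
    }
}
```

Note that a plain indexOf would also hit a key that merely ends in "name =" (for example a hypothetical "hostname ="), so the regex version is safer if such fields can occur.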
I'd recommend extracting the word that appears after name =, i.e.
Pattern p = Pattern.compile("name\\s*=\\s*(\\S+)");
Matcher m = p.matcher(str);
if (m.find()) {
String value = m.group(1); // contains your wlr
...............
}

Getting specific portion of string from a Matcher

I'd like to get a portion of a matched string coming from a Matcher, like this:
Pattern pat = Pattern.compile("a.*l.*z");
Matcher match = pat.matcher("abcdlmnoz"); // I'd want to get bcd AND mno
ArrayList<String> values = match.magic(); //here is where your magic happens =)
ArrayList<String> is only for this example; I would be happy to receive either a List or individual String items. The best would be what .htaccess files and RewriteRules do:
RewriteRule (.*)/path?(.*) $1/$2/modified-path/
Well, putting those (.*) into $arguments would be as cool as an ArrayList or accessing the Strings separately. I've been looking for something in the Java Matcher API, but I didn't happen to see anything useful there.
Thanks in advance, guys.
You can capture groups in a regexp match using parentheses:
Pattern pat = Pattern.compile("a(.*)l(.*)z");
Matcher match = pat.matcher("abcdlmnoz");
boolean b = match.matches(); // don't forget to attempt the match
Then use match.group(n) to get the text captured by group n. The groups are stored in the match object.
Capturing Groups (Oracle)
Look at the matcher's "group" method and peruse the doc you linked to for references to groups, which is what the parentheses in the regex do :)
...
String testStr = "abcdlmnoz";
String myRE = "a(.*)l(.*)z";
Pattern myRECompiled = Pattern.compile (myRE, Pattern.DOTALL);
Matcher myMatcher = myRECompiled.matcher (testStr);
myMatcher.find ();
System.out.println (myMatcher.group (1));
System.out.println (myMatcher.group (2));
...
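If you really want the captured parts as a List, as in the question's match.magic() line, you can loop over the groups yourself. A small sketch; the names GroupCollector and captureGroups are invented for this example and are not part of the Matcher API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original code.
class GroupCollector {
    /** Returns the text of every capturing group, or an empty list if input does not match. */
    static List<String> captureGroups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> groups = new ArrayList<>();
        if (m.matches()) {
            // Group 0 is the whole match; the capturing groups start at index 1.
            for (int i = 1; i <= m.groupCount(); i++) {
                groups.add(m.group(i)); // null if an optional group did not participate
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        System.out.println(captureGroups("a(.*)l(.*)z", "abcdlmnoz"));
    }
}
```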

Java regular expression with hyphen

I need to match and parse data in a file that looks like:
4801-1-21-652-1-282098
4801-1-21-652-2-282098
4801-1-21-652-3-282098
4801-1-21-652-4-282098
4801-1-21-652-5-282098
but the pattern I wrote below does not seem to work. Can someone help me understand why?
final String patternStr = "(\\d+)-(\\d+)-(\\d+)-(\\d+)-(\\d+)-(\\d+)";
final Pattern p = Pattern.compile(patternStr);
while ((this.currentLine = this.reader.readLine()) != null) {
final Matcher m = p.matcher(this.currentLine);
if (m.matches()) {
System.out.println("SUCCESS");
}
}
It looks correct.
Something odd is probably contained in your lines. Look for extra spaces and line breaks.
Try this:
final Matcher m = p.matcher(this.currentLine.trim());
Have you tried escaping the - as \\-?
It should work. Make sure there are no invisible characters; you can trim each line. You can refine the pattern as:
final String patternStr = "(\\d{4})-(\\d{1})-(\\d{2})-(\\d{3})-(\\d{1})-(\\d{6})";
There is leading white space in the data:
4801-1-21-652-1-282098
4801-1-21-652-2-282098
4801-1-21-652-3-282098
4801-1-21-652-4-282098
4801-1-21-652-5-282098
final String patternStr = "\\s*(\\d+)-(\\d+)-(\\d+)-(\\d+)-(\\d+)-(\\d+)";
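Putting the suggestions above together, here is a small sketch that trims each line before calling matches() and then reads the six numbers from the groups. The names RecordParser and parse are made up for illustration, and it assumes every field fits into an int.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original code.
class RecordParser {
    // Six runs of digits separated by hyphens; '-' needs no escaping outside a character class.
    private static final Pattern RECORD =
            Pattern.compile("(\\d+)-(\\d+)-(\\d+)-(\\d+)-(\\d+)-(\\d+)");

    /** Returns the six numeric fields, or null if the line does not have that shape. */
    static int[] parse(String line) {
        // matches() must cover the whole input, so strip surrounding spaces, tabs and CR/LF first.
        Matcher m = RECORD.matcher(line.trim());
        if (!m.matches()) {
            return null;
        }
        int[] fields = new int[m.groupCount()];
        for (int i = 0; i < fields.length; i++) {
            fields[i] = Integer.parseInt(m.group(i + 1)); // groups are 1-based
        }
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(parse("  4801-1-21-652-1-282098 ")));
    }
}
```

Keep in mind that trim() only removes characters up to U+0020, so something like a non-breaking space or a byte order mark at the start of the file would still make matches() fail.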

Parsing text from the end (using regular expressions)

I have a seemingly simple problem, though I am unable to get my head around it.
Let's say I have the following string: 'abcabcabcabc' and I want to get the last occurrence of 'ab'. Is there a way I can do this without looping through all the other 'ab's from the beginning of the string?
I read about anchoring the end of the string and then parsing the string with the required regular expression. I am unsure how to do this in Java (is it supported?).
Update: I guess I have caused a lot of confusion with my (over)simplified example. Let me try another one. Say I have a string like this: '12/08/2008 some_text 21/10/2008 some_more_text 15/12/2008 and_finally_some_more'. Here, I want the last date, and hence I need to use regular expressions. I hope this is a better example.
Thanks,
Anirudh
Firstly, thanks for all the answers.
Here is what I tried, and this worked for me:
Pattern pattern = Pattern.compile("(ab)(?!.*ab)");
Matcher matcher = pattern.matcher("abcabcabcd");
if(matcher.find()) {
System.out.println(matcher.start() + ", " + matcher.end());
}
This displays the following:
6, 8
So, to generalize: <reg_ex>(?!.*<reg_ex>) should solve this problem, where '(?!...)' is a negative lookahead asserting that the pattern inside it does not occur anywhere after the text that precedes it.
Update: This page provides more information on 'not followed by' using regex.
This will give you the last date in group 1 of the match object.
.*(\d{2}/\d{2}/\d{4})
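In Java this could look roughly like the sketch below. The greedy .* first consumes the whole input and then backtracks just far enough for the date to match, which is why group 1 ends up holding the last date. The names LastDate and lastDate are invented for this example, and the dd/mm/yyyy shape is taken from the sample string in the question.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original code.
class LastDate {
    // Greedy ".*" pushes the captured date as far to the right as possible.
    // DOTALL lets '.' cross line breaks in multi-line input.
    private static final Pattern LAST_DATE =
            Pattern.compile(".*(\\d{2}/\\d{2}/\\d{4})", Pattern.DOTALL);

    /** Returns the last dd/mm/yyyy-shaped token in text, or empty if there is none. */
    static Optional<String> lastDate(String text) {
        Matcher m = LAST_DATE.matcher(text);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String sample = "12/08/2008 some_text 21/10/2008 some_more_text "
                + "15/12/2008 and_finally_some_more";
        System.out.println(lastDate(sample).orElse("(no date)"));
    }
}
```

This only checks the shape of the date, not whether the day and month values are valid.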
Pattern p = Pattern.compile("ab.*?$");
Matcher m = p.matcher("abcabcabcabc");
boolean b = m.matches();
I do not understand what you are trying to do. Why only the last if they are all the same? Why a regular expression and why not int pos = s.lastIndexOf(String str) ?
For the date example, you could do this with the Pattern API and not in the regex itself. The basic idea is to get all the matches, then return the last one.
public static void main(String[] args) {
// this may be over-kill, you can replace with a much simpler but more lenient version
final String dateRegex = "\\b(0?[1-9]|[12][0-9]|3[01])[- /.](0?[1-9]|1[012])[- /.](19|20)?[0-9]{2}\\b";
final String sample = "12/08/2008 some_text 21/10/2008 some_more_text 15/12/2008 and_finally_some_more";
List<String> allMatches = getAllMatches(dateRegex, sample);
System.out.println(allMatches.get(allMatches.size() - 1));
}
private static List<String> getAllMatches(final String regex, final String input) {
final Matcher matcher = Pattern.compile(regex).matcher(input);
return new ArrayList<String>() {{
while (matcher.find())
add(input.substring(matcher.start(), matcher.end()));
}};
}
