<img[^>]+src\\s*=\\s*['\"]([^'\"]+)['\"][^>]*>
I know this regular expression is used to retrieve the value of the src attribute. Can anyone teach me how I should interpret it? I'm stuck on it.
Explaining:
<img matches exactly the string "<img"
[^>]+ matches one or more characters other than >, so the match cannot run past the end of the tag
src matches exactly the string "src"
\\s* matches any number of whitespace characters
= matches exactly the string "="
\\s* matches any number of whitespace characters
['\"] matches either a single or a double quote. The double quote is escaped because otherwise it would terminate the Java string literal that holds the regex
([^'\"]+) matches one or more characters that are not quotes. The contents are wrapped in parentheses, so they form a capturing group that can be retrieved later
['\"] matches the closing quote, again either single or double (escaped for the same reason)
[^>]* matches the remaining non ">" characters
> matches exactly the string ">", the closing bracket of the tag.
I would not agree that this expression is crap; it's just a bit complex.
EDIT: Here is some example code:
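import java.util.regex.Matcher;
import java.util.regex.Pattern;
String str = "<img[^>]+src\\s*=\\s*['\"]([^'\"]+)['\"][^>]*>";
String text = "<img alt=\"booo\" src=\"image.jpg\"/>";
Pattern pattern = Pattern.compile(str);
Matcher matcher = pattern.matcher(text);
// matches() succeeds only if the whole text matches the pattern
if (matcher.matches())
{
    int n = matcher.groupCount();
    // group 0 is the entire match; group 1 is the captured src value
    for (int i = 0; i <= n; ++i)
        System.out.println(matcher.group(i));
}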
The output is:
<img alt="booo" src="image.jpg"/>
image.jpg
So matcher.group(1) returns what you want. Experiment a bit with this code.
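The example above uses matches(), which only succeeds when the whole input is a single tag. A minimal sketch, assuming you want every src value from a larger HTML fragment, swaps in find() with the same regex. The class and method names are hypothetical, and a regex like this is still no substitute for a real HTML parser on arbitrary markup.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ImgSrcExtractor {
    // The regex from the question; group 1 captures the src value.
    private static final Pattern IMG_SRC =
            Pattern.compile("<img[^>]+src\\s*=\\s*['\"]([^'\"]+)['\"][^>]*>");

    // find() looks for the next occurrence anywhere in the input, so it works
    // on a larger fragment; matches() needs the whole input to be one tag.
    static List<String> extractSources(String html) {
        List<String> sources = new ArrayList<>();
        Matcher matcher = IMG_SRC.matcher(html);
        while (matcher.find()) {
            sources.add(matcher.group(1));
        }
        return sources;
    }

    public static void main(String[] args) {
        String html = "<p>Logo:</p><img alt=\"booo\" src=\"image.jpg\"/>"
                + "<img class='thumb' src = 'thumb.png'>";
        System.out.println(extractSources(html)); // [image.jpg, thumb.png]
    }
}
```

The second tag also shows the \s* parts at work: the spaces around = are accepted.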
Hi, check one of the tutorials available on the net, e.g. http://www.vogella.com/articles/JavaRegularExpressions/article.html. Sections 3.1 and 3.2 (common matching symbols and metacharacters) briefly explain each symbol and what it matches. Break what you have here into smaller chunks to make it easier to understand. For example, you have \s in two places; it is the metacharacter for a whitespace character. Backslash is also the escape character in Java string literals, so you write \\s instead of \s. After each of them you have a *. Section 3.3 explains the quantifiers; this particular one means the preceding element occurs 0 or more times. Thus \s* means "match a whitespace character 0 or more times". You can do the same with the other chunks.
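To see the escaping and the * quantifier in isolation, here is a small sketch (the class and method names are made up for illustration). It shows that the Java literal "\\s*" is really the three-character regex \s*, and that this regex accepts zero or more whitespace characters.

```java
import java.util.regex.Pattern;

final class WhitespaceEscapeDemo {
    // The Java literal "\\s*" holds the three characters \ s * (the regex \s*)
    static final String REGEX = "\\s*";

    static boolean isAllWhitespace(String input) {
        // Pattern.matches requires the entire input to match the regex
        return Pattern.matches(REGEX, input);
    }

    public static void main(String[] args) {
        System.out.println(REGEX);                   // \s*
        System.out.println(REGEX.length());          // 3
        System.out.println(isAllWhitespace(""));     // true: zero occurrences is allowed
        System.out.println(isAllWhitespace(" \t ")); // true: several whitespace characters
        System.out.println(isAllWhitespace(" x "));  // false: x is not whitespace
    }
}
```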
Hope it helps.
Here is my code:
String a = "X^5+2X^2+3X^3+4X^4";
String exp[]=a.split("(|\\+\\d)[xX]\\^");
for(int i=0;i<exp.length;i++) {
System.out.println("exp: "+exp[i]+" ");
}
I'm trying to get the output 5, 2, 3, 4,
but instead I got this:
exp:
exp:5
exp:2
exp:3
exp:4
I don't know where the empty first line comes from, and I cannot find a way to get rid of it. I tried other regexes and also Pattern.compile, but still couldn't get rid of the first line. When I tried the new string "X+X^5+2X^2+3X^3+4X^4", the first line showed exp:X.
I also tried an online regex tester, and it gave 5, 2, 3, 4, but Eclipse gives an empty line first and then 5, 2, 3, 4. I need help figuring this out.
Try matching the exponents with Pattern and Matcher instead, e.g.:
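import java.util.regex.Matcher;
import java.util.regex.Pattern;
String input = "X^5+2X^2+3X^3+4X^4";
Pattern pattern = Pattern.compile("\\^([0-9]+)");
Matcher matcher = pattern.matcher(input);
// find() advances to the next match each time it is called
while (matcher.find()) {
    System.out.println("exp: " + matcher.group(1));
}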
It gives output:
exp: 5
exp: 2
exp: 3
exp: 4
How does it work:
Pattern used: \^([0-9]+)
It matches a ^ followed by one or more digits (note the + sign). The caret (^) is prefixed with a backslash (\) because it has a special meaning in regular expressions (the beginning of the input), but in your case you just want an exact match of the ^ character.
We want to wrap the digits in a group so we can refer to them later in the matching process, which means we need to mark them with parentheses ( and ).
Then we want to put our pattern into a Java String. In a string literal, the \ character has a special meaning: it starts an escape sequence, e.g. "\n" represents a newline. That means that when we put our pattern into a string literal, we need to escape the \, so our pattern becomes "\\^([0-9]+)". Note the double \.
Next we iterate through all matches, getting group 1, which is our number. Note that the ^ character is not included in group 1 even though it is part of our pattern. That is because we used parentheses to mark the group we are looking for, which in our case contains only the digits.
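The difference between the whole match and group 1 can be made visible with a short sketch; CaretGroupDemo and groups are hypothetical names. It collects group(0) and group(1) for each match of the same pattern, so you can see the caret appear only in the former.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CaretGroupDemo {
    // Regex \^([0-9]+): a literal caret, then one or more digits captured as group 1.
    private static final Pattern EXPONENT = Pattern.compile("\\^([0-9]+)");

    // Returns group(index) of every match in the input.
    static List<String> groups(String input, int index) {
        List<String> result = new ArrayList<>();
        Matcher matcher = EXPONENT.matcher(input);
        while (matcher.find()) {
            result.add(matcher.group(index));
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "X^5+2X^2+3X^3+4X^4";
        System.out.println(groups(input, 0)); // [^5, ^2, ^3, ^4]  whole match keeps the caret
        System.out.println(groups(input, 1)); // [5, 2, 3, 4]      group 1 holds only the digits
    }
}
```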
Because you are using the split method, which looks for occurrences of the regex and, well, splits the string at each of those positions. Your string starts with X^, which very much matches your regex, so split() returns the empty string that precedes that first match as element 0.
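A sketch of what split() does with the question's delimiter, plus one possible workaround (filtering out empty strings); the class and method names are hypothetical. Per the String.split documentation, a positive-width match at the beginning produces a leading empty string, which is the case here because the delimiter consumes "X^" at index 0.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

final class SplitLeadingEmpty {
    // The delimiter from the question: an optional "+digit", then X^ or x^.
    static final String DELIMITER = "(|\\+\\d)[xX]\\^";

    // The delimiter matches "X^" at index 0, so element 0 is the empty
    // string that comes before that first match.
    static String[] rawSplit(String input) {
        return input.split(DELIMITER);
    }

    // One possible workaround: discard the empty strings split() produced.
    static List<String> exponents(String input) {
        return Arrays.stream(input.split(DELIMITER))
                .filter(part -> !part.isEmpty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String a = "X^5+2X^2+3X^3+4X^4";
        System.out.println(Arrays.toString(rawSplit(a))); // [, 5, 2, 3, 4]
        System.out.println(exponents(a));                 // [5, 2, 3, 4]
    }
}
```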
I saw a code example and didn't understand how it prints only "Print".
I'd appreciate your help on this.
String str = "<a href=/utility/ReportResult.jsp?reportId=5>Print</a>";
System.out.println(str.replaceAll("\\<.*?\\>", ""));
Output: Print
How do I modify my regex to print Print<>Report instead of PrintReport? Below are my regex and statement.
String str = "Print<>Report";
System.out.println(str.replaceAll("<.*?>", ""));
In order to print Print<>Report instead of PrintReport, change the * to a +:
System.out.println(str.replaceAll("<.+?>", ""));
// .+? here instead of .*?
* means zero or more of the preceding element
+ means one or more of the preceding element
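A side-by-side sketch of the two quantifiers on the strings from this thread (StarVersusPlus and its methods are illustrative names only):

```java
final class StarVersusPlus {
    // .*? allows zero characters between the brackets, so "<>" is removed too.
    static String stripWithStar(String s) {
        return s.replaceAll("<.*?>", "");
    }

    // .+? requires at least one character, so a bare "<>" is left alone.
    static String stripWithPlus(String s) {
        return s.replaceAll("<.+?>", "");
    }

    public static void main(String[] args) {
        System.out.println(stripWithStar("Print<>Report")); // PrintReport
        System.out.println(stripWithPlus("Print<>Report")); // Print<>Report
        System.out.println(stripWithPlus("<a href=/utility/ReportResult.jsp?reportId=5>Print</a>")); // Print
    }
}
```

One caveat: .+? may still take a > as its first character, so "a<>b>c" becomes "ac". A negated class such as <[^>]+> leaves that string untouched.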
You don't have to escape < and > (angle brackets); they have no special meaning in a Java regex. So in Java, str.replaceAll("<.*?>", "") is sufficient.
How it works:
<.*?> --> Search for the first <, then match everything up to the next >. Note that .*? is called a lazy (reluctant) quantifier.
It's a regex that says anything between "<" and ">", including the brackets themselves, must be replaced by "" (an empty string).
So
<a href=/utility/ReportResult.jsp?reportId=5> ==> "" (blank)
</a> ==> "" (blank)
and only "Print" is left.
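To check exactly which pieces get blanked, a small sketch can collect each match with find(), which walks the input the same way replaceAll does. RemovedTags and tags are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RemovedTags {
    private static final Pattern TAG = Pattern.compile("<.*?>");

    // Collects each substring that replaceAll("<.*?>", "") would delete.
    static List<String> tags(String s) {
        List<String> found = new ArrayList<>();
        Matcher m = TAG.matcher(s);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String str = "<a href=/utility/ReportResult.jsp?reportId=5>Print</a>";
        for (String tag : tags(str)) {
            System.out.println(tag);
        }
        // <a href=/utility/ReportResult.jsp?reportId=5>
        // </a>
        System.out.println(str.replaceAll("<.*?>", "")); // Print
    }
}
```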
First, each \\ in the Java string literal is an escape sequence that produces a single backslash, so the actual regular expression is \<.*?\>.
The \< matches the < character (the backslash is a regex escape here, indicating that the following character should be interpreted literally and not as a regex operator). This is the beginning of an HTML tag.
The . token matches any character (except line terminators).
The *? is a reluctant quantifier: it matches the preceding token (any character in this case) zero or more times, but as few times as possible.
The \> matches the end of a tag. Because *? is reluctant, .*? stops at the first > it reaches instead of running on to the last > in the string.
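The effect of the reluctant quantifier is easiest to see next to its greedy counterpart. This sketch (illustrative class and method names) runs both on the string from the question:

```java
final class GreedyVersusReluctant {
    static String greedy(String s) {
        // .* grabs as much as possible: from the first < to the LAST >
        return s.replaceAll("\\<.*\\>", "");
    }

    static String reluctant(String s) {
        // .*? grabs as little as possible: each < up to the NEXT >
        return s.replaceAll("\\<.*?\\>", "");
    }

    public static void main(String[] args) {
        String str = "<a href=/utility/ReportResult.jsp?reportId=5>Print</a>";
        System.out.println("[" + greedy(str) + "]");    // []  the whole string is one match
        System.out.println("[" + reluctant(str) + "]"); // [Print]
    }
}
```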
I have a string with data separated by commas like this:
$d4kjvdf,78953626,10.0,103007,0,132103.8945F,
I tried the following regex but it doesn't match the strings I want:
[a-zA-Z0-9]+\\,[a-zA-Z0-9]+\\,[a-zA-Z0-9]+\\,[a-zA-Z0-9]+\\,[a-zA-Z0-9]+\\,[a-zA-Z0-9]+\\,
The $ at the beginning of your data string doesn't match the regex. Change the first character class to [$a-zA-Z0-9]. A couple of the comma-separated values also contain a literal dot; [$.a-zA-Z0-9] would cover both cases. Also, it's probably a good idea to anchor the regex at the start and end by adding ^ and $ to the beginning and end of the regex, respectively. How about this for the full regex:
^[$.a-zA-Z0-9]+\\,[$.a-zA-Z0-9]+\\,[$.a-zA-Z0-9]+\\,[$.a-zA-Z0-9]+\\,[$.a-zA-Z0-9]+\\,[$.a-zA-Z0-9]+\\,$
Update:
You said the number of commas is your primary matching criterion. If there should be 6 commas, this would work:
^([^,]+,){6}$
That means: match one or more characters that are anything but a comma, followed by a comma, and repeat that match 6 times consecutively. Note: your data must end with a trailing comma, consistent with your sample data.
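A quick sketch comparing the two regexes from this answer on the sample data and a couple of made-up inputs (CsvShapeCheck and its constants are hypothetical names). The character-class version restricts which characters a field may contain; the comma-count version only checks the shape.

```java
final class CsvShapeCheck {
    // Six fields drawn from letters, digits, '$' and '.', each followed by a comma.
    static final String BY_CHARACTERS =
            "^[$.a-zA-Z0-9]+\\,[$.a-zA-Z0-9]+\\,[$.a-zA-Z0-9]+\\,"
            + "[$.a-zA-Z0-9]+\\,[$.a-zA-Z0-9]+\\,[$.a-zA-Z0-9]+\\,$";

    // Six runs of "one or more non-commas, then a comma"; field content is unrestricted.
    static final String BY_COMMA_COUNT = "^([^,]+,){6}$";

    public static void main(String[] args) {
        String data = "$d4kjvdf,78953626,10.0,103007,0,132103.8945F,";
        System.out.println(data.matches(BY_CHARACTERS));              // true
        System.out.println(data.matches(BY_COMMA_COUNT));             // true
        System.out.println("a b,c,d,e,f,g,".matches(BY_CHARACTERS));  // false: space not in the class
        System.out.println("a b,c,d,e,f,g,".matches(BY_COMMA_COUNT)); // true: any non-comma is fine
        System.out.println("a,b,c,d,e,".matches(BY_COMMA_COUNT));     // false: only five commas
    }
}
```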
Well, your regular expression is certainly jumbled: there are clearly characters (like $ and .) that your expression won't match, and you don't need to escape commas with \\. Let's first describe our requirements. You seem to be saying a valid string is defined as:
A string consisting of 6 commas, with one or more characters before each one
We can represent that with the following pattern:
(?:[^,]+,){6}
This says: match one or more non-commas followed by a comma ([^,]+,) six times ({6}). The (?:...) notation is a non-capturing group, which lets us repeat the whole sub-expression six times; without it, the {6} would apply only to the preceding character.
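The point about grouping can be checked directly. In this sketch (GroupRepetition, GROUPED and UNGROUPED are illustrative names), the ungrouped version attaches {6} to the comma alone:

```java
final class GroupRepetition {
    // Grouped: the whole "[^,]+," unit is repeated six times.
    static final String GROUPED = "(?:[^,]+,){6}";
    // Ungrouped: {6} binds only to the comma, so this reads as
    // "one or more non-commas, then exactly six commas".
    static final String UNGROUPED = "[^,]+,{6}";

    public static void main(String[] args) {
        String sixFields = "a,b,c,d,e,f,";
        // String.matches() must match the entire string, so no ^ or $ is needed here
        System.out.println(sixFields.matches(GROUPED));   // true
        System.out.println(sixFields.matches(UNGROUPED)); // false
        System.out.println("a,,,,,,".matches(UNGROUPED)); // true
    }
}
```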
Alternatively, we could use normal, capturing groups to let us select each individual section of the matching string:
([^,]+),([^,]+),([^,]+),([^,]+),([^,]+),([^,]+),?
Now we can not only match the string, but extract its contents at the same time, e.g.:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
String str = "$d4kjvdf,78953626,10.0,103007,0,132103.8945F,";
Pattern regex = Pattern.compile(
        "([^,]+),([^,]+),([^,]+),([^,]+),([^,]+),([^,]+),?");
Matcher m = regex.matcher(str);
if (m.matches()) {
    // groups 1 to 6 hold the six fields
    for (int i = 1; i <= m.groupCount(); i++) {
        System.out.println(m.group(i));
    }
}
This prints:
$d4kjvdf
78953626
10.0
103007
0
132103.8945F
I'm trying to get quoted strings using a regular expression.
String regexp = "('([^\\\\']+|\\\\([btnfr\"'\\\\]|[0-3]?[0-7]{1,2}|u[0-9a-fA-F]{4}))*'|\"([^\\\\\"]+|\\\\([btnfr\"'\\\\]|[0-3]?[0-7]{1,2}|u[0-9a-fA-F]{4}))*\")";
Pattern p = Pattern.compile(regexp);
Matcher m = p.matcher(source);
while (m.find()) {
String newElement = m.group(1);
//...
}
It works well, but if the source text contains
' onkeyup="this.value = this.value.replace (/\D/, \'\')">'
the program goes into an endless loop.
How can I correctly get this string?
For example, I have this text (PHP code):
'qty'=>'<input type="text" maxlength="3" class="qty_text" id='.$key.' value ='
The result should be
'qty'
'<input type="text" maxlength="3" class="qty_text" id='
' value ='
Your regex seems to work okay when presented with a string it matches; it's when it can't match that it goes into the endless loop. (In this case it's the \D that's causing it to choke.) But that regex is much more complicated than it needs to be; you're trying to match them, not validate them. Here's the quintessential regex for a string literal in C-style languages:
"[^"\\\r\n]*(?:\\.[^"\\\r\n]*)*"
...and the single-quoted version, for languages that support that style:
'[^'\\\r\n]*(?:\\.[^'\\\r\n]*)*'
It uses Friedl's "unrolled loop" technique for maximum efficiency. Here's the Java code for it, as generated by RegexBuddy 4:
Pattern regex = Pattern.compile(
"\"[^\"\\\\\r\n]*(?:\\\\.[^\"\\\\\r\n]*)*\"|'[^'\\\\\r\n]*(?:\\\\.[^'\\\\\r\n]*)*'"
);
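A sketch of that generated pattern applied to the input that made the original regex hang (the class and method names are hypothetical). Because every backslash pair, including \D and \', is consumed by \\., the whole input comes back as a single literal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class UnrolledLoopLiterals {
    // The RegexBuddy-generated pattern from above: a double- or single-quoted
    // literal in which a backslash escapes whatever character follows it.
    private static final Pattern LITERAL = Pattern.compile(
            "\"[^\"\\\\\r\n]*(?:\\\\.[^\"\\\\\r\n]*)*\"|'[^'\\\\\r\n]*(?:\\\\.[^'\\\\\r\n]*)*'");

    // Collects every quoted literal found in the source text.
    static List<String> literals(String source) {
        List<String> found = new ArrayList<>();
        Matcher m = LITERAL.matcher(source);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // The input from the question, written as a Java literal:
        // ' onkeyup="this.value = this.value.replace (/\D/, \'\')">'
        String source = "' onkeyup=\"this.value = this.value.replace (/\\D/, \\'\\')\">'";
        // Prints a list with one element: the whole input
        System.out.println(literals(source));
    }
}
```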
Maybe I misunderstand the principle, but that looks rather trivial now that you added the example.
Consider this for instance:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
String input = "'qty'=>'<input type=\"text\" maxlength=\"3\" class=\"qty_text\" id='.$key.' value ='";
String otherInput = "' onkeyup=\"this.value = this.value.replace (/\\D/, \'\')\">'";
// matching anything starting with single quote and ending with single quote
// included, reluctant quantified
Pattern p = Pattern.compile("'.+?'");
Matcher m = p.matcher(input);
while (m.find()) {
    System.out.println(m.group());
}
m = p.matcher(otherInput);
System.out.println();
while (m.find()) {
    System.out.println(m.group());
}
Output:
'qty'
'<input type="text" maxlength="3" class="qty_text" id='
' value ='
' onkeyup="this.value = this.value.replace (/\D/, '
')">'
See the Java Pattern documentation for more detailed explanations.
The character classes that match neither backslashes nor quotes shouldn't be followed by a +. Remove those + signs to fix the hang (which was due to catastrophic backtracking).
Also, your original regex didn't recognize \D as a valid backslash escape, so the string constant in your test input containing \D wasn't matched. If you make the regex more liberal, so that any character immediately following a backslash is treated as part of the string constant, it will behave the way you expect.
"('([^\\\\']|\\\\.)*'|\"([^\\\\\"]|\\\\.)*\")"
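A sketch running this more liberal regex over the PHP sample from the question, reading group(1) the way the question's loop does. LiberalEscapeLiterals and literals are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LiberalEscapeLiterals {
    // The regex from this answer: inside quotes, accept any character that is
    // neither a backslash nor the quote, or a backslash followed by any character.
    private static final Pattern LITERAL =
            Pattern.compile("('([^\\\\']|\\\\.)*'|\"([^\\\\\"]|\\\\.)*\")");

    // Mirrors the question's loop: group 1 wraps the whole literal.
    static List<String> literals(String source) {
        List<String> found = new ArrayList<>();
        Matcher m = LITERAL.matcher(source);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        String php = "'qty'=>'<input type=\"text\" maxlength=\"3\" class=\"qty_text\" id='.$key.' value ='";
        for (String literal : literals(php)) {
            System.out.println(literal);
        }
        // 'qty'
        // '<input type="text" maxlength="3" class="qty_text" id='
        // ' value ='
    }
}
```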
You can do it all in one line using split() with the right regex:
String[] array = source.replaceAll("^[^']+", "").split("(?<!\\G.)(?<=').*?(?='|$)");
There's a reasonable amount of regex kung fu going on here, so I'll break it down:
The delimiter is wrapped by even/odd quotes, but cannot contain the quotes because split() consumes the delimiter, so a lookbehind (?<=') and a lookahead (?=') (which are non-consuming) are used to match the quotes instead of a literal quote in the regex
A reluctant match .*? for the characters between the quotes ensures that it stops at the next quote (instead of matching through to the last quote)
I added an alternate match for end of input to the lookahead, (?='|$), in case there's no trailing close quote
And saving the best for last, the regex that is key to making this all work is the negative lookbehind (?<!\\G.), which means "don't match one character after the end of the previous match". It ensures the next match advances past the end of the previous delimiter; without it, you would end up with just the quote characters in your array. \G matches the end of the previous match, but also matches the start of input for the first match, so it rather neatly handles not matching on the first quote, thus making the delimiter wrapped in even/odd quotes instead of odd/even as it would be otherwise.
To cater for input whose first character is not a quote, you need to strip off the leading characters before splitting; that's why the replaceAll() is needed.
Here's some test code using your sample input:
import java.util.Arrays;
String source = "'qty'=>'<input type=\"text\" maxlength=\"3\" class=\"qty_text\" id='.$key.' value ='";
String[] array = source.replaceAll("^[^']+", "").split("(?<!\\G.)(?<=').*?(?='|$)");
System.out.println(Arrays.toString(array));
Output:
['qty', '<input type="text" maxlength="3" class="qty_text" id=', ' value =']
I want to remove all the leading and trailing punctuation in a string. How can I do this?
Basically, I want to preserve punctuation in between words, and I need to remove all leading and trailing punctuation.
., #, _, &, /, - are allowed if surrounded by letters or digits
' is allowed if preceded by a letter or digit
I tried
Pattern p = Pattern.compile("(^\\p{Punct})|(\\p{Punct}$)");
Matcher m = p.matcher(term);
boolean a = m.find();
if(a)
term=term.replaceAll("(^\\p{Punct})", "");
but it didn't work!!
OK. So basically you want to find a pattern in your string and act if the pattern is matched.
Doing this the naive way would be tedious. The naive solution could involve something like
while (myString.startsWith(".") || myString.startsWith(",") || myString.startsWith(";") /* || ... */)
    myString = myString.substring(1);
For a somewhat more complex task, it could even be impossible to do it the way I described.
That's why we use regular expressions. It's a "language" in which you can define a pattern, and the computer can tell you whether a string matches that pattern. To learn about regular expressions, just search for them on Google. One of the first links: http://www.codeproject.com/Articles/9099/The-30-Minute-Regex-Tutorial
As for your problem, you could try this:
// Strings are immutable, so assign the result back
myString = myString.replaceFirst("^[^a-zA-Z]+", "");
The meaning of the regex:
The first ^ means that what comes next in the pattern has to be at the start of the string.
The [] define a character class. In this case, it contains the characters that are NOT (the second ^) letters (a-zA-Z).
The + sign means one or more occurrences of the thing before it.
You can use a similar regex to remove trailing characters:
myString = myString.replaceAll("[^a-zA-Z]+$", "");
The $ means "at the end of the string".
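Putting the two steps together gives a sketch like the one below (TrimNonLetters and its methods are hypothetical names). Note that [^a-zA-Z] also strips digits at the edges; since the question allows digits, the second method is an assumed variant that keeps them.

```java
final class TrimNonLetters {
    // The answer's two steps; String is immutable, so each call returns a new string.
    static String trimNonLetters(String s) {
        String withoutLeading = s.replaceFirst("^[^a-zA-Z]+", "");
        return withoutLeading.replaceAll("[^a-zA-Z]+$", "");
    }

    // Assumed variant: the question also allows digits, so keep them at the edges.
    static String trimNonAlphanumerics(String s) {
        String withoutLeading = s.replaceFirst("^[^a-zA-Z0-9]+", "");
        return withoutLeading.replaceAll("[^a-zA-Z0-9]+$", "");
    }

    public static void main(String[] args) {
        System.out.println(trimNonLetters("...hello, world!!!")); // hello, world
        System.out.println(trimNonLetters("#42 apples."));        // apples
        System.out.println(trimNonAlphanumerics("#42 apples."));  // 42 apples
    }
}
```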
You could use a regular expression:
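import java.util.regex.Matcher;
import java.util.regex.Pattern;
// DOTALL lets . match line terminators too, so multi-line input also matches
private static final Pattern PATTERN =
        Pattern.compile("^\\p{Punct}*(.*?)\\p{Punct}*$", Pattern.DOTALL);
public static String trimPunctuation(String s) {
    Matcher m = PATTERN.matcher(s);
    // group() may only be called after a successful find()
    if (!m.find()) {
        return s;
    }
    return m.group(1);
}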
The boundary matchers ^ and $ ensure the whole input is matched.
A dot . matches any single character.
A star * means "match the preceding thing zero or more times".
The parentheses () define a capturing group whose value is retrieved by calling Matcher.group(1).
The ? in (.*?) means you want the match to be non-greedy, otherwise the trailing punctuation would be included in the group.
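To see why the ? matters, this sketch compares the reluctant group with a greedy one on the same input (LazyTrimDemo and its methods are illustrative names; the patterns omit DOTALL for brevity, which only matters for multi-line input):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LazyTrimDemo {
    // Reluctant middle group, as in trimPunctuation above.
    private static final Pattern LAZY = Pattern.compile("^\\p{Punct}*(.*?)\\p{Punct}*$");
    // Greedy middle group, for comparison only.
    private static final Pattern GREEDY = Pattern.compile("^\\p{Punct}*(.*)\\p{Punct}*$");

    // Returns capturing group 1, or the input unchanged if there is no match.
    private static String group1(Pattern pattern, String s) {
        Matcher m = pattern.matcher(s);
        return m.find() ? m.group(1) : s;
    }

    static String lazyTrim(String s) {
        return group1(LAZY, s);
    }

    static String greedyTrim(String s) {
        return group1(GREEDY, s);
    }

    public static void main(String[] args) {
        System.out.println(lazyTrim("...hello, world!!!"));   // hello, world
        System.out.println(greedyTrim("...hello, world!!!")); // hello, world!!!
        System.out.println(lazyTrim("#don't!"));              // don't
    }
}
```

With the greedy (.*), the group swallows the trailing !!! and the final \p{Punct}* is left to match nothing; the reluctant version stops as soon as the rest of the input is all punctuation.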
Use this tutorial on patterns. You have to create a regex that matches a string starting with a letter or digit and ending with a letter or digit, then call inputString.matches("regex").
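One possible regex for that suggestion, as a sketch only: the pattern below is an assumption, not taken from the answer, and AlnumEdges and hasCleanEdges are hypothetical names. It requires a letter or digit first and, for strings longer than one character, a letter or digit last; (?s) lets . cross line breaks.

```java
final class AlnumEdges {
    // Assumed regex: alphanumeric first character and (if any) alphanumeric last character.
    static final String ALNUM_EDGES = "(?s)[A-Za-z0-9](?:.*[A-Za-z0-9])?";

    static boolean hasCleanEdges(String input) {
        // String.matches() must match the whole input
        return input.matches(ALNUM_EDGES);
    }

    public static void main(String[] args) {
        System.out.println(hasCleanEdges("hello, world")); // true
        System.out.println(hasCleanEdges("...hello"));     // false
        System.out.println(hasCleanEdges("a"));            // true
        System.out.println(hasCleanEdges("a!"));           // false
        System.out.println(hasCleanEdges(""));             // false
    }
}
```

This only validates; to actually strip the punctuation, one of the replace-based approaches above is still needed.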