Java regex pattern overmatching (pattern matches one sequence instead of two)

When I write: "integralfrom1to10ofx^2)+integralfrom1to10ofx^3)",
I expect my regex:
// INTEGRAL BASIC
Pattern integralFinalPattern = Pattern.compile("integralfrom(.*)to(.*)of(.*)\\)");
Matcher integralFinalMatcher = integralFinalPattern.matcher(regexDedicatedString);
if(integralFinalMatcher.find()){
String integral_user_input = integralFinalMatcher.group(0);
String integral_lower_index = integralFinalMatcher.group(1);
String integral_upper_index = integralFinalMatcher.group(2);
String integral_formula = integralFinalMatcher.group(3);
String ultimateLatexIntegral = "(\\int_{"+ integral_lower_index
+"}^{"+ integral_upper_index +"} " + integral_formula + ")";
mathFormula = mathFormula.replace(integral_user_input, ultimateLatexIntegral);
}
to match these two integrals separately, but currently it treats the whole input as a single match.
As a result, the rendered LaTeX (SVG) shows a single integral.
I would like the output to contain two separate integrals.
How can I achieve this with regex?
Ideally, the solution should also work for more than two integrals.

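You're doing a lot of work that the Matcher class can do for you. The overmatching comes from the greedy .* quantifiers; the reluctant form .*? stops at the first possible "to", "of", and ")" instead of the last. Named groups and Matcher.replaceAll then handle every occurrence:
Pattern p = Pattern.compile("integralfrom(?<lower>.*?)to(?<upper>.*?)of(?<formula>.*?)\\)");
Matcher m = p.matcher(subject);
String result = m.replaceAll("\\\\int_{${lower}}^{${upper}} (${formula})");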
With an input of "integralfrom1to10ofx^2)+integralfrom1to10ofx^3)", the result is:
\int_{1}^{10} (x^2)+\int_{1}^{10} (x^3)
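To see why the original pattern overmatched, here is a minimal sketch that runs the greedy and the reluctant pattern on the same input. The class name IntegralToLatex and the helpers countMatches and convert are illustrative names, not part of the original code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class IntegralToLatex {
    // Greedy: each .* grabs as much as possible, so one match spans both integrals.
    static final Pattern GREEDY =
            Pattern.compile("integralfrom(.*)to(.*)of(.*)\\)");
    // Reluctant: each .*? stops at the first place the rest of the pattern can match.
    static final Pattern RELUCTANT =
            Pattern.compile("integralfrom(?<lower>.*?)to(?<upper>.*?)of(?<formula>.*?)\\)");

    // Counts how many separate matches find() reports for the given pattern.
    static int countMatches(Pattern pattern, String input) {
        Matcher m = pattern.matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    // Rewrites every integral in the input as a LaTeX \int expression.
    static String convert(String input) {
        return RELUCTANT.matcher(input)
                .replaceAll("\\\\int_{${lower}}^{${upper}} (${formula})");
    }

    public static void main(String[] args) {
        String input = "integralfrom1to10ofx^2)+integralfrom1to10ofx^3)";
        System.out.println("greedy matches:    " + countMatches(GREEDY, input));
        System.out.println("reluctant matches: " + countMatches(RELUCTANT, input));
        System.out.println(convert(input));
    }
}
```

Note that the reluctant pattern still ends each formula at the first ")", so a formula that itself contains parentheses would be cut short.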

Related

Masking Email address along with domain using regex

The requirement is as below:
Input: rajani#gmail.com
Output: r****i#*****.com
I tried the two regexes below, but neither of them masks the domain name (gmail). Kindly help me with this.
String masked_email_Address2=email_Address.replaceAll("(?<=.{1}).(?=[^#]*?.#)", "*");
Output received as r****i#gmail.com
I searched Stack Overflow and found the regex below, but it does not produce the correct result either:
String masked_email_Address1=email_Address.replaceAll("\\b(\\w)[^#]+#\\S+(\\.[^\\s.]+)", "$1***#****$2");
Output received as: r***#****.com (the part between r and # is not masked as required).
I started out trying to do this with a one-liner using String#replaceAll, as you were doing, but gave up: Java does not support unbounded variable-length lookbehinds, and I could not come up with a pattern that avoided them.
Instead, try using a Pattern and Matcher directly:
String email = "rajani#gmail.com";
String pattern = "([^#]+)#(.*)\\.(.*)";
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(email);
if (m.find( )) {
StringBuilder sb = new StringBuilder("");
sb.append(m.group(1).charAt(0));
sb.append(m.group(1).substring(1).replaceAll(".", "*"));
sb.append("#");
sb.append(m.group(2).replaceAll(".", "*"));
sb.append(".").append(m.group(3));
System.out.println(sb);
}
This may look like a lot of code for a relatively small formatting job on an email address. If you like, you can put this code into a utility method and still get the masking effect with a single line of code when you call the method.
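A minimal sketch of such a utility method follows. The class name EmailMasker and the method name mask are hypothetical. Unlike the snippet above, this version also keeps the last character of the local part, because the requested output (r****i#*****.com) shows it, and it returns the input unchanged when it does not have the expected shape.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EmailMasker {
    // Groups: local part, then '#' (the separator used in the question),
    // then the domain name, a dot, and the top-level domain.
    private static final Pattern EMAIL = Pattern.compile("([^#]+)#(.*)\\.(.*)");

    static String mask(String email) {
        Matcher m = EMAIL.matcher(email);
        if (!m.matches()) {
            return email; // not in the expected shape: leave untouched
        }
        String local = m.group(1);
        StringBuilder sb = new StringBuilder();
        if (local.length() <= 2) {
            sb.append(local); // nothing between first and last character to mask
        } else {
            sb.append(local.charAt(0));
            sb.append(local.substring(1, local.length() - 1).replaceAll(".", "*"));
            sb.append(local.charAt(local.length() - 1));
        }
        sb.append('#');
        sb.append(m.group(2).replaceAll(".", "*")); // mask the whole domain name
        sb.append('.').append(m.group(3));          // keep the top-level domain
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(mask("rajani#gmail.com"));
    }
}
```

Callers then need only one line, for example String masked = EmailMasker.mask(email_Address);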
How about:
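String masked_email_Address2=email_Address.replaceAll("(.)?[^#]*([^#])#\\S+(\\.[^\\s.]+)", "$1****$2#****$3");
This will work as long as your address is more than one character long. Note that it always emits a fixed number of asterisks, regardless of how many characters are masked.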
Try this:
int idx = email_Address.indexOf('#');
String part1 = email_Address.substring(1, idx-1).replaceAll(".", "\\*");
String part2 = email_Address.substring(idx + 1, email_Address.lastIndexOf('.')).replaceAll(".", "\\*");
String masked_email_Address1=email_Address.replaceAll("^(\\S)[^#]+(\\S)#.*(\\..*)", "$1"+ part1 + "$2#" + part2 + "$3");

Regex to get value between two colon excluding the colons

I have a string like this:
something:POST:/some/path
Now I want to take the POST alone from the string. I did this by using this regex
:([a-zA-Z]+):
But this gives me a value along with colons. ie I get this:
:POST:
but I need this
POST
My code to match the same and replace it is as follows:
String ss = "something:POST:/some/path/";
Pattern pattern = Pattern.compile(":([a-zA-Z]+):");
Matcher matcher = pattern.matcher(ss);
if (matcher.find()) {
System.out.println(matcher.group());
ss = ss.replaceFirst(":([a-zA-Z]+):", "*");
}
System.out.println(ss);
EDIT:
I've decided to use the lookahead/lookbehind regex since I did not want to use replace with colons such as :*:. This is my final solution.
String s = "something:POST:/some/path/";
String regex = "(?<=:)[a-zA-Z]+(?=:)";
Matcher matcher = Pattern.compile(regex).matcher(s);
if (matcher.find()) {
s = s.replaceFirst(matcher.group(), "*");
System.out.println("replaced: " + s);
}
else {
System.out.println("not replaced: " + s);
}
There are two approaches:
Keep your Java code, and use lookahead/lookbehind (?<=:)[a-zA-Z]+(?=:), or
Change your Java code to replace the result with ":*:"
Note: You may want to define a String constant for your regex, since you use it in different calls.
As pointed out, the regex capturing group can be used for the replacement.
The following code did it:
String ss = "something:POST:/some/path/";
Pattern pattern = Pattern.compile(":([a-zA-Z]+):");
Matcher matcher = pattern.matcher(ss);
if (matcher.find()) {
ss = ss.replaceFirst(matcher.group(1), "*");
}
System.out.println(ss);
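One caveat with replaceFirst(matcher.group(1), "*"): the captured text is searched for again from the start of the string (and is interpreted as a regex), so an earlier occurrence of the same word would be replaced instead. A minimal sketch that replaces exactly the captured span by using the group's start and end offsets is shown below; GroupReplacer and maskFirst are illustrative names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupReplacer {
    private static final Pattern METHOD = Pattern.compile(":([a-zA-Z]+):");

    // Replaces only the text captured by group 1 of the first match with "*".
    static String maskFirst(String input) {
        Matcher m = METHOD.matcher(input);
        if (!m.find()) {
            return input; // nothing between two colons
        }
        // start(1)/end(1) are the offsets of the captured group, colons excluded.
        return input.substring(0, m.start(1)) + "*" + input.substring(m.end(1));
    }

    public static void main(String[] args) {
        System.out.println(maskFirst("something:POST:/some/path/"));
        System.out.println(maskFirst("POST:POST:/x"));
    }
}
```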
UPDATE
Looking at your update, you only need replaceFirst:
String result = s.replaceFirst(":[a-zA-Z]+:", ":*:");
When you use (?<=:)[a-zA-Z]+(?=:), the regex engine checks each location inside the string for a : before it and, once found, tries to match one or more ASCII letters and then asserts that there is a : after them. With :[A-Za-z]+:, matching only starts once the regex engine has found a : character. Then, after matching :POST:, the replacement pattern replaces the whole match. It is totally OK to hardcode colons in the replacement pattern since they are hardcoded in the regex pattern.
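The difference between the two patterns is visible in what the match actually covers. The sketch below prints the matched text and its offsets for both; LookaroundDemo and firstMatch are hypothetical names used only for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaroundDemo {
    // Returns the first match and its half-open offset range, or "no match".
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find()
                ? m.group() + " [" + m.start() + "," + m.end() + ")"
                : "no match";
    }

    public static void main(String[] args) {
        String s = "something:POST:/some/path/";
        // Lookarounds assert the colons without consuming them.
        System.out.println(firstMatch("(?<=:)[a-zA-Z]+(?=:)", s));
        // Here the colons are part of the match itself.
        System.out.println(firstMatch(":[a-zA-Z]+:", s));
    }
}
```

With the lookaround version the colons stay outside the match, which is why replacing the match with "*" leaves them in place; with the consuming version the replacement has to put them back.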
Original answer
You just need to access Group 1:
if (matcher.find()) {
System.out.println(matcher.group(1));
}
Your :([a-zA-Z]+): regex contains a capturing group (see (....) subpattern). These groups are numbered automatically: the first one has an index of 1, the second has the index of 2, etc.
To replace it, use Matcher#appendReplacement():
String s = "something:POST:/some/path/";
StringBuffer result = new StringBuffer();
Matcher m = Pattern.compile(":([a-zA-Z]+):").matcher(s);
while (m.find()) {
m.appendReplacement(result, ":*:");
}
m.appendTail(result);
System.out.println(result.toString());
This is your solution:
regex = (:)([a-zA-Z]+)(:)
And code is:
String ss = "something:POST:/some/path/";
ss = ss.replaceFirst("(:)([a-zA-Z]+)(:)", "$1*$3");
ss now contains:
something:*:/some/path/
Which I believe is what you are looking for...

How to ignore characters before and after my pattern?

I need some help creating a regex (in java) to find a pattern like:
(30.6284, -27.3493)
It's roughly a latitude longitude pair. Building from smaller pieces, I've come up with this:
String def = "\\((\\-?\\d+\\.\\d+),\\s*(\\-?\\d+\\.\\d+)\\)";
which works ok if I don't have any characters before or after the parenthesis. So this fails:
"hello (30.6284, -27.3493) "
but it'll work if I remove the "hello " before and the trailing whitespace. How can I ignore any other sequence of characters before and after the expression?
Thanks
You can use the following piece of code to find and extract multiple instances of the pattern in your text.
String def = "\\((\\-?\\d+\\.\\d+),\\s*(\\-?\\d+\\.\\d+)\\)";
String text = "hello (30.6284, -27.3493) (30.6284, -27.3493) ";
Pattern p = Pattern.compile(def);
Matcher m = p.matcher(text);
while (m.find()) {
System.out.println(text.substring(m.start(), m.end()));
}
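The question does not show the matching call. If it used String.matches or Matcher.matches, that would explain the failure, because those require the entire input to match, whereas find() searches for the pattern anywhere in the text. The sketch below contrasts the two and parses the captured groups into numbers; CoordinateFinder and findAll are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CoordinateFinder {
    static final Pattern PAIR =
            Pattern.compile("\\((-?\\d+\\.\\d+),\\s*(-?\\d+\\.\\d+)\\)");

    // Returns every (latitude, longitude) pair found anywhere in the text.
    static List<double[]> findAll(String text) {
        List<double[]> pairs = new ArrayList<>();
        Matcher m = PAIR.matcher(text);
        while (m.find()) {
            pairs.add(new double[] {
                Double.parseDouble(m.group(1)),
                Double.parseDouble(m.group(2))
            });
        }
        return pairs;
    }

    public static void main(String[] args) {
        String text = "hello (30.6284, -27.3493) ";
        // matches() must cover the whole input, so the surrounding text makes it fail.
        System.out.println("matches(): " + PAIR.matcher(text).matches());
        for (double[] pair : findAll(text)) {
            System.out.println(pair[0] + ", " + pair[1]);
        }
    }
}
```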
I came up with this using this website: http://regexpal.com/ and http://www.regextester.com/
\(-?\d+\.?\d+, -?\d+\.?\d+\)
This will match but not capture, and it probably isn't in your language-specific format (though it should be easy to adapt). To support capturing, you could use this one:
\((-?\d+\.?\d+), (-?\d+\.?\d+)\)
String s = "hello (30.6284, -27.3493) ";
System.out.println(s.replaceAll(".*(\\((\\-?\\d+\\.\\d+),\\s*(\\-?\\d+\\.\\d+)\\)).*","$1"));
output:
(30.6284, -27.3493)
Note that if you're going to loop to find every occurrence, drop the leading and trailing .* (they would consume the whole input and leave only the last pair to match) and use something like this:
Matcher m = Pattern.compile("(\\((\\-?\\d+\\.\\d+),\\s*(\\-?\\d+\\.\\d+)\\))").matcher(s);
while(m.find()){
System.out.println(m.start()+ " " + m.group(1));
}

regex command(remove everything but specified txt)

Does anyone out there know of a regex command that will take the following string
url = http://184.154.145.114:8013/wlraac name = wlr samplerate = 44100 channels = 2 format = S16le
and remove everything but the following
wlr
This line will come up multiple times, where everything changes after the = sign and each time all I want to keep is whats after name =
any help is appreciated
You could do something like
.*name =\s*(\w+).*
and replace with the content of group 1
I search for "name =" and anything before. The \s* matches the following whitespace.
Then comes the \w+ inside parentheses. \w matches any letter, digit, or underscore (ASCII only by default; with the option Pattern.UNICODE_CHARACTER_CLASS it also covers Unicode letters and digits). Because of the parentheses, the match is stored in the first group.
String in = " url = http://184.154.145.114:8013/wlraac name = wlr samplerate = 44100 channels = 2 format = S16le";
Pattern r = Pattern.compile(".*name =\\s*(\\w+).*");
Matcher m = r.matcher(in);
String result = m.replaceAll("$1");
System.out.println(result);
Or, in your code:
String str = line2.replaceAll(".*name =\\s*(\\w+).*", "$1");
From your description it's a little hard to understand what you need.
But regex is overkill. You should use something like:
String s = myString.substring(myString.indexOf("name =")+6);
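The one-liner above returns everything after "name =", including the remaining fields. A minimal regex-free sketch that also stops at the next whitespace is shown below; NameExtractor and extractName are hypothetical names, and returning null when the key is absent is a choice made for this example.

```java
class NameExtractor {
    // Returns the token that follows "name =", or null if the key is absent.
    static String extractName(String line) {
        String key = "name =";
        int start = line.indexOf(key);
        if (start < 0) {
            return null;
        }
        start += key.length();
        // Skip the whitespace between '=' and the value.
        while (start < line.length() && Character.isWhitespace(line.charAt(start))) {
            start++;
        }
        // The value ends at the next whitespace or at the end of the line.
        int end = start;
        while (end < line.length() && !Character.isWhitespace(line.charAt(end))) {
            end++;
        }
        return line.substring(start, end);
    }

    public static void main(String[] args) {
        String line = "url = http://184.154.145.114:8013/wlraac name = wlr "
                + "samplerate = 44100 channels = 2 format = S16le";
        System.out.println(extractName(line));
    }
}
```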
I'd recommend extracting the word that appears after name =, i.e.
Pattern p = Pattern.compile("name\\s*=\\s*(\\S+)");
Matcher m = p.matcher(str);
if (m.find()) {
String value = m.group(1); // contains your wlr
...............
}

Java Regex: how to capture multiple matches in the same line

I am trying to match a regex pattern in Java, and I have two questions:
Inside the pattern I'm looking for there is a known beginning and then an unknown string that I want to get up until the first occurrence of an &.
there are multiple occurrences of these patterns in the line and I would like to get each occurrence separately.
For example I have this input line:
1234567 100,110,116,129,139,140,144,146 http://www.gold.com/shc/s/c_10153_12605_Computers+%26+Electronics_Televisions?filter=Screen+Refresh+Rate%7C120HZ%5EScreen+Size%7C37+in.+to+42+in.&sName=View+All&viewItems=25&subCatView=true ISx20070515x00001a http://www.gold.com/shc/s/c_10153_12605_Computers+%26+Electronics_Televisions?filter=Screen+Refresh+Rate%7C120HZ&sName=View+All&subCatView=true 0 2819357575609397706
And I am interested in these strings:
Screen+Refresh+Rate%7C120HZ%5EScreen+Size%7C37+in.+to+42+in.
Screen+Refresh+Rate%7C120HZ
Assuming the known beginning is filter=**, the regular expression pattern (?:filter=\\*\\*)(.*?)(?:&) should get you what you need. Use Matcher.find() to get all occurrences of the pattern in a given string. Using the test string you provided, the following:
final Pattern p = Pattern.compile("(?:filter=\\*\\*)(.*?)(?:&)");
final Matcher m = p.matcher(testString);
int cnt = 0;
while (m.find()) {
System.out.println(++cnt + ": G1: " + m.group(1));
}
Will output:
1: G1: Screen+Refresh+Rate%7C120HZ%5EScreen+Size%7C37+in.+to+42+in.
2: G1: Screen+Refresh+Rate%7C120HZ**
If I know that I might need other query parameters in the future, I think it would be more prudent to decode and parse the URL.
String url = URLDecoder.decode("http://www.gold.com/shc/s/c_10153_12605_" +
"Computers+%26+Electronics_Televisions?filter=Screen+Refresh+Rate" +
"%7C120HZ%5EScreen+Size%7C37+in.+to+42+in.&sName=View+All&viewItems=25&subCatView=true"
,"utf-8");
Pattern amp = Pattern.compile("&");
Pattern eq = Pattern.compile("=");
Map<String, String> params = new HashMap<String, String>();
String queryString = url.substring(url.indexOf('?') + 1);
for(String param : amp.split(queryString)) {
String[] pair = eq.split(param);
params.put(pair[0], pair[1]);
}
for(Entry<String, String> param : params.entrySet()) {
System.out.format("%s = %s\n", param.getKey(), param.getValue());
}
Output
subCatView = true
viewItems = 25
sName = View All
filter = Screen Refresh Rate|120HZ^Screen Size|37 in. to 42 in.
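One thing to watch with this approach: decoding the whole URL before splitting can mis-split a value that contains an encoded & (%26) or = (%3D). A hedged variant that splits the raw query string first and decodes each key and value afterwards is sketched below. It assumes Java 10 or later for URLDecoder.decode(String, Charset); QueryParser and parse are illustrative names, and mapping a parameter without "=" to an empty string is a choice made for this example.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

class QueryParser {
    // Parses the query string of a URL into decoded key/value pairs,
    // preserving the order in which the parameters appear.
    static Map<String, String> parse(String url) {
        Map<String, String> params = new LinkedHashMap<>();
        int q = url.indexOf('?');
        if (q < 0 || q == url.length() - 1) {
            return params; // no query string
        }
        for (String param : url.substring(q + 1).split("&")) {
            if (param.isEmpty()) {
                continue;
            }
            int eq = param.indexOf('=');
            String key = eq < 0 ? param : param.substring(0, eq);
            String value = eq < 0 ? "" : param.substring(eq + 1);
            params.put(decode(key), decode(value)); // decode only after splitting
        }
        return params;
    }

    private static String decode(String s) {
        return URLDecoder.decode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String url = "http://www.gold.com/shc/s/c_10153_12605_Computers+%26+Electronics_Televisions"
                + "?filter=Screen+Refresh+Rate%7C120HZ&sName=View+All&subCatView=true";
        parse(url).forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```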
In your example, there is sometimes a "**" at the end before the "&". But basically (assuming "filter=" is the start pattern you are looking for), you want something like:
"filter=([^&]+)&"
Using the regular expression (?<=filter=\*{0,2})[^&]*[^&*]+ in java:
Pattern p = Pattern.compile("(?<=filter=\\*{0,2})[^&]*[^&*]+");
String s = "1234567 100,110,116,129,139,140,144,146 http://www.gold.com/shc/s/c_10153_12605_Computers+%26+Electronics_Televisions?filter=**Screen+Refresh+Rate%7C120HZ%5EScreen+Size%7C37+in.+to+42+in.&sName=View+All**&viewItems=25&subCatView=true ISx20070515x00001a http://www.gold.com/shc/s/c_10153_12605_Computers+%26+Electronics_Televisions?filter=**Screen+Refresh+Rate%7C120HZ**&sName=View+All&subCatView=true 0 2819357575609397706";
Matcher m = p.matcher(s);
while (m.find()) {
System.out.println(m.group());
}
EDIT:
Added [^&*]+ to the end of the regex to prevent the ** from being included in the second match.
EDIT2:
Changed regular expression to use lookbehind.
The regex you're looking for is
Screen\+Refresh\+Rate[^&]*
You could use Matcher.find() to find all matches.
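A minimal sketch of that loop, collecting every match into a list, could look like this. MatchCollector and findAll are hypothetical names, and the shortened input in main is made up for the example (it has no "**" markers).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchCollector {
    // Returns the text of every non-overlapping match of regex in input.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String line = "http://a/x?filter=Screen+Refresh+Rate%7C120HZ%5EScreen+Size&sName=View+All "
                + "http://a/y?filter=Screen+Refresh+Rate%7C120HZ&sName=View+All";
        // [^&]* runs up to, but does not include, the next '&'.
        for (String match : findAll("Screen\\+Refresh\\+Rate[^&]*", line)) {
            System.out.println(match);
        }
    }
}
```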
Are you looking for the string that follows "filter=", ignoring any leading "*", and ends at the first "&"?
You can try the following:
String str = "1234567 100,110,116,129,139,140,144,146 http://www.gold.com/shc/s/c_10153_12605_Computers+%26+Electronics_Televisions?filter=**Screen+Refresh+Rate%7C120HZ%5EScreen+Size%7C37+in.+to+42+in.&sName=View+All**&viewItems=25&subCatView=true ISx20070515x00001a http://www.gold.com/shc/s/c_10153_12605_Computers+%26+Electronics_Televisions?filter=**Screen+Refresh+Rate%7C120HZ**&sName=View+All&subCatView=true 0 2819357575609397706";
Pattern p = Pattern.compile("filter=(?:\\**)([^&]+?)(?:\\**)&");
Matcher matcher = p.matcher(str);
while(matcher.find()){
System.out.println(matcher.group(1));
}
