Java Regex Replace with Capturing Group - java

Is there any way to replace a regexp match with the modified content of a capture group?
Example:
Pattern regex = Pattern.compile("(\\d{1,2})");
Matcher regexMatcher = regex.matcher(text);
resultString = regexMatcher.replaceAll("$1"); // *3 ??
And I'd like to replace every occurrence with $1 multiplied by 3.
edit:
Looks like, something's wrong :(
If I use
Pattern regex = Pattern.compile("(\\d{1,2})");
Matcher regexMatcher = regex.matcher("12 54 1 65");
try {
String resultString = regexMatcher.replaceAll(regexMatcher.group(1));
} catch (Exception e) {
e.printStackTrace();
}
It throws an IllegalStateException: No match found
But
Pattern regex = Pattern.compile("(\\d{1,2})");
Matcher regexMatcher = regex.matcher("12 54 1 65");
try {
String resultString = regexMatcher.replaceAll("$1");
} catch (Exception e) {
e.printStackTrace();
}
works fine, but I can't change the $1 :(
edit:
Now, it's working :)

How about:
if (regexMatcher.find()) {
resultString = regexMatcher.replaceAll(
String.valueOf(3 * Integer.parseInt(regexMatcher.group(1))));
}
To get the first match, use #find(). After that, you can use #group(1) to refer to this first match, and replace all matches with the first match's value multiplied by 3.
And in case you want to replace each match with that match's value multiplied by 3:
Pattern p = Pattern.compile("(\\d{1,2})");
Matcher m = p.matcher("12 54 1 65");
StringBuffer s = new StringBuffer();
while (m.find())
m.appendReplacement(s, String.valueOf(3 * Integer.parseInt(m.group(1))));
m.appendTail(s); // copy any text that follows the last match
System.out.println(s.toString());
You may want to look through Matcher's documentation, where this and a lot more stuff is covered in detail.

earl's answer gives you the solution, but I thought I'd add what the problem is that's causing your IllegalStateException. You're calling group(1) without having first called a matching operation (such as find()). This isn't needed if you're just using $1 since the replaceAll() is the matching operation.
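To make that concrete, here is a minimal sketch (the class and method names are made up for illustration): group(1) throws on a fresh Matcher, and works once find() has succeeded.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupBeforeMatch {
    // Returns true if group(1) throws IllegalStateException on a fresh matcher.
    static boolean throwsBeforeFind(String text) {
        Matcher m = Pattern.compile("(\\d{1,2})").matcher(text);
        try {
            m.group(1); // no find()/matches() has been attempted yet
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    // Returns group(1) of the first match, or null if nothing matches.
    static String firstGroup(String text) {
        Matcher m = Pattern.compile("(\\d{1,2})").matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(throwsBeforeFind("12 54 1 65")); // true
        System.out.println(firstGroup("12 54 1 65"));       // 12
    }
}
```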

Java 9 offers a Matcher.replaceAll() that accepts a replacement function:
resultString = regexMatcher.replaceAll(
m -> String.valueOf(Integer.parseInt(m.group()) * 3));
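A complete version of that snippet might look like the sketch below; the class name TripleNumbers is only for illustration. Note that the string returned by the function is still treated as a replacement string, so a result containing $ or \ would need Matcher.quoteReplacement(); digits alone are safe.

```java
import java.util.regex.Pattern;

class TripleNumbers {
    private static final Pattern NUMBER = Pattern.compile("\\d{1,2}");

    static String triple(String text) {
        // Matcher.replaceAll(Function<MatchResult, String>) exists since Java 9.
        return NUMBER.matcher(text)
                .replaceAll(m -> String.valueOf(Integer.parseInt(m.group()) * 3));
    }

    public static void main(String[] args) {
        System.out.println(triple("12 54 1 65")); // 36 162 3 195
    }
}
```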

Source: java-implementation-of-rubys-gsub
Usage:
// Rewrite an ancient unit of length in SI units.
String result = new Rewriter("([0-9]+(\\.[0-9]+)?)[- ]?(inch(es)?)") {
public String replacement() {
float inches = Float.parseFloat(group(1));
return Float.toString(2.54f * inches) + " cm";
}
}.rewrite("a 17 inch display");
System.out.println(result);
// The "Searching and Replacing with Non-Constant Values Using a
// Regular Expression" example from the Java Almanac.
result = new Rewriter("([a-zA-Z]+[0-9]+)") {
public String replacement() {
return group(1).toUpperCase();
}
}.rewrite("ab12 cd efg34");
System.out.println(result);
Implementation (redesigned):
import static java.lang.String.format;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public abstract class Rewriter {
private Pattern pattern;
private Matcher matcher;
public Rewriter(String regularExpression) {
this.pattern = Pattern.compile(regularExpression);
}
public String group(int i) {
return matcher.group(i);
}
public abstract String replacement() throws Exception;
public String rewrite(CharSequence original) {
return rewrite(original, new StringBuffer(original.length())).toString();
}
public StringBuffer rewrite(CharSequence original, StringBuffer destination) {
try {
this.matcher = pattern.matcher(original);
while (matcher.find()) {
matcher.appendReplacement(destination, "");
destination.append(replacement());
}
matcher.appendTail(destination);
return destination;
} catch (Exception e) {
throw new RuntimeException("Cannot rewrite " + toString(), e);
}
}
@Override
public String toString() {
StringBuilder sb = new StringBuilder();
sb.append(pattern.pattern());
for (int i = 0; i <= matcher.groupCount(); i++)
sb.append(format("\n\t(%s) - %s", i, group(i)));
return sb.toString();
}
}

Related

Finding patterns using Regex and replacing them with processed data

There are input strings of the format
${ENC}:107ec5141234742beec5cb5b1917e2e6:{ENC}$${ENC}:d0b2ddf0b9e7b397558c20c623237c4f:{ENC}$${ENC}:85d6f3cd7dcc5c67cad68ae45a0d5afc:{ENC}$${ENC}:5c0dfb55a843f830024df0d74993b668:{ENC}$
As you can see, the data items (the hex strings) are prefixed with ${ENC}: and suffixed with :{ENC}$, and I want to replace every string between them with processed data.
I am using the Regular Expression:
\$\{ENC\}\:(.*?)\:\{ENC\}\$
which after escaping for java:
\\$\\{ENC\\}\\:(.*?)\\:\\{ENC\\}\\$
to find the matches and replace the Strings.
My code sample is below:
String THE_REGEX = "\\$\\{ENC\\}\\:(.*?)\\:\\{ENC\\}\\$";
Pattern THE_PATTERN = Pattern.compile(THE_REGEX);
public static boolean isProcessingRequired(String data){
if(data == null){
return false;
}
return data.matches(THE_REGEX);
}
public String getProcessedString(String dataString){
Matcher matcher = THE_PATTERN.matcher(dataString);
if(matcher.matches()){
String processedData = null;
String dataItem = matcher.group(1);
String procItem = doSomeProcessing(dataItem);
processedData = dataString.replaceAll("\\$\\{ENC\\}:" + dataItem + ":\\{ENC\\}\\$", procItem);
if(isProcessingRequired(processedData)){
processedData = getProcessedString(processedData);
}
return processedData;
} else {
return dataString;
}
}
public String doSomeProcessing(String str){
// do some processing on the string
// for now:
str = "DONE PROCESSING!!";
return str;
}
But at matcher.group(1), I'm getting
107ec5141234742beec5cb5b1917e2e6:{ENC}$${ENC}:d0b2ddf0b9e7b397558c20c623237c4f:{ENC}$${ENC}:85d6f3cd7dcc5c67cad68ae45a0d5afc:{ENC}$${ENC}:5c0dfb55a843f830024df0d74993b668
instead of
107ec5141234742beec5cb5b1917e2e6
which I was expecting.
I'm using the ? in regex to avoid this problem.
And when I tried it at www.regexe.com, the regex appeared to be fine.
What am I doing wrong here?
The problem is that you are using Matcher.matches() instead of Matcher.find().
From the javadoc:
public boolean matches()
Attempts to match the entire region against the pattern.
public boolean find()
Attempts to find the next subsequence of the input sequence that matches the pattern.
Here is a simple example showing the difference:
Matcher matcher = Pattern.compile("\\Q${ENC}\\E(.*?)\\Q{ENC}$\\E").matcher("${ENC}1{ENC}$${ENC}2{ENC}$");
if (matcher.matches()) {
System.out.println(matcher.group(1)); // Will print "1{ENC}$${ENC}2"
}
matcher.reset();
if (matcher.find()) {
System.out.println(matcher.group(1)); // Will print "1"
}
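Applied to the question's code, a find()-based loop could replace every token in one pass, with no recursion. This is only a sketch: the class name EncReplacer and the process() placeholder are invented here, standing in for the asker's doSomeProcessing().

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EncReplacer {
    private static final Pattern ENC =
            Pattern.compile("\\$\\{ENC\\}:(.*?):\\{ENC\\}\\$");

    // Hypothetical stand-in for the real processing step.
    static String process(String item) {
        return "[" + item.toUpperCase() + "]";
    }

    static String replaceAllTokens(String data) {
        Matcher m = ENC.matcher(data);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // quoteReplacement() keeps '$' and '\' in the result literal.
            m.appendReplacement(out, Matcher.quoteReplacement(process(m.group(1))));
        }
        m.appendTail(out); // copy the text after the last token
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(replaceAllTokens("${ENC}:ab:{ENC}$${ENC}:cd:{ENC}$")); // [AB][CD]
    }
}
```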

Match String with Regex and print in java

I am trying to read a JSON feed (this is not JSON parsing) and detect the numbers for further manipulation. This is the code I have tried:
try {
String regex = "^-?\\d+$";
Pattern myPattern = Pattern.compile(regex);
Matcher regexMatcher = myPattern.matcher(jsonString);
while (regexMatcher.find()) {
for (int i = 0; i < regexMatcher.groupCount(); i++) {
System.out.println(regexMatcher.group(i));
}
}
} catch (PatternSyntaxException ex) {
// Syntax error in the regular expression
}
And this is the JSON string link:
http://uhunt.felix-halim.net/api/cpbook/3
I want to print only the numbers like:
100
-345
785
What is wrong with my code? I am new to regex and can't figure out the solution.
For your particular JSON you can use this:
String regex = ", (-?\\d+)";
And regexMatcher.group(1) will give you the number
First, you have to provide "(" and ")" in order to create the matching groups you're looking for.
In addition to that, remove "^" and "$" since there may be more than one hit per line.
This should work:
try {
String regex = "(-?\\d+)";
Pattern myPattern = Pattern.compile(regex);
Matcher regexMatcher = myPattern.matcher(jsonString);
while (regexMatcher.find()) {
for (int i = 0; i < regexMatcher.groupCount(); i++) {
System.out.println(regexMatcher.group(i));
}
}
} catch (PatternSyntaxException ex) {
// Syntax error in the regular expression
}
You can use the following regex:
"(?<=, )(-?\\d+)"
or
"(?<=,\\p{Space})(-?\\d+)"
In this case regexMatcher.group(0) returns appropriate results.
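As a small sketch of the lookbehind variant: the input string below is made up (the real feed is not reproduced here), and NumberExtractor is an illustrative name. Only numbers preceded by ", " are picked up, so a number at the very start of an array would be skipped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NumberExtractor {
    // The lookbehind (?<=, ) requires ", " before the number without consuming it.
    private static final Pattern AFTER_COMMA = Pattern.compile("(?<=, )-?\\d+");

    static List<String> numbers(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = AFTER_COMMA.matcher(text);
        while (m.find()) {
            result.add(m.group()); // group() is the same as group(0)
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical sample input, not the actual feed.
        System.out.println(numbers("[\"Intro\", 100, -345, 785]")); // [100, -345, 785]
    }
}
```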

Matcher can't match

I have the following code. I need to check whether the text contains any word from a list of banned words. But even when such a word exists in the text, the matcher doesn't see it. Here is the code:
final ArrayList<String> regexps = config.getProperty(property);
for (String regexp: regexps){
Pattern pt = Pattern.compile("(" + regexp + ")", Pattern.CASE_INSENSITIVE);
Matcher mt = pt.matcher(plainText);
if (mt.find()){
result = result + "message can't be processed because it doesn't satisfy the rule " + property;
reason = false;
System.out.println("reason" + mt.group() + regexp);
}
}
What is wrong? This code can't find the regexp в[ыy][шs]лит[еe] in plainText = "Вышлите пожалуйста новый счет на оплату на Санг, пока согласовывали, уже прошли его сроки. Лиценз...". I also tried other variants of the regexp, but nothing works.
The trouble is elsewhere.
import java.util.regex.*;
public class HelloWorld {
public static void main(String []args) {
Pattern pt = Pattern.compile("(qwer)");
Matcher mt = pt.matcher("asdf qwer zxcv");
System.out.println(mt.find());
}
}
This prints out true. You may want to use word boundary as delimiter, though:
import java.util.regex.*;
public class HelloWorld {
public static void main(String []args) {
Pattern pt = Pattern.compile("\\bqwer\\b");
Matcher mt = pt.matcher("asdf qwer zxcv");
System.out.println(mt.find());
mt = pt.matcher("asdfqwer zxcv");
System.out.println(mt.find());
}
}
The parentheses are useless unless you need to capture the keyword in a group. But you already have it to begin with.
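One possible cause worth checking, assuming the text really starts with an upper-case Cyrillic letter as in the question: Pattern.CASE_INSENSITIVE on its own only folds case for US-ASCII characters, and Unicode-aware folding needs Pattern.UNICODE_CASE as well. The sketch below uses an illustrative class name and writes the question's pattern and first word with Unicode escapes so that it compiles regardless of source encoding.

```java
import java.util.regex.Pattern;

class CyrillicCaseDemo {
    // The banned-word pattern from the question: lower-case Cyrillic letters,
    // with the same character classes that also allow Latin y, s and e.
    static final String REGEX =
            "\u0432[\u044By][\u0448s]\u043B\u0438\u0442[\u0435e]";
    // The first word of the question's text; its first letter is upper case.
    static final String TEXT =
            "\u0412\u044B\u0448\u043B\u0438\u0442\u0435";

    static boolean found(int flags) {
        return Pattern.compile(REGEX, flags).matcher(TEXT).find();
    }

    public static void main(String[] args) {
        System.out.println(found(Pattern.CASE_INSENSITIVE));                        // false
        System.out.println(found(Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE)); // true
    }
}
```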
Use ArrayList's built-in methods indexOf(Object o) and contains(Object o) to check whether a String exists in the list, and where.
e.g.
ArrayList<String> keywords = new ArrayList<String>();
keywords.add("hello");
System.out.println(keywords.contains("hello"));
System.out.println(keywords.indexOf("hello"));
outputs:
true
0
Try this to filter out messages which contain banned words using the following regex which uses OR operator.
private static void findBannedWords() {
final ArrayList<String> keywords = new ArrayList<String>();
keywords.add("f$%k");
keywords.add("s!#t");
keywords.add("a$s");
String input = "what the f$%k";
String bannedRegex = "";
for (String keyword: keywords){
// Pattern.quote() escapes regex metacharacters such as '$' in the keyword
bannedRegex = bannedRegex + ".*" + Pattern.quote(keyword) + ".*" + "|";
}
Pattern pt = Pattern.compile(bannedRegex.substring(0, bannedRegex.length()-1));
Matcher mt = pt.matcher(input);
if (mt.matches()) {
System.out.println("message can't be processed because it doesn't satisfy the rule ");
}
}

regex pattern to match particular uri from list of urls

I have a list of URLs (lMapValues) with wildcards, as shown in the code below.
I need to match a URI against this list to find the matching URL.
In the code below, I should get the value of d in the map m as the matching URL.
That is, if part of the URI matches an entry in the list of URLs, that entry should be picked.
I tried splitting the URI into tokens and then checking each token against the list lMapValues. However, it's not giving me the correct result. Below is the code for that.
public class Matcher
{
public static void main( String[] args )
{
Map m = new HashMap();
m.put("a","https:/abc/eRControl/*");
m.put("b","https://abc/xyz/*");
m.put("c","https://work/Mypage/*");
m.put("d","https://cr/eRControl/*");
m.put("e","https://custom/MyApp/*");
List lMapValues = new ArrayList(m.values());
List tokens = new ArrayList();
String uri = "cr/eRControl/work/custom.jsp";
StringTokenizer st = new StringTokenizer(uri,"/");
while(st.hasMoreTokens()) {
String token = st.nextToken();
tokens.add(token);
}
for(int i=0;i<lMapValues.size();i++) {
String value = (String)lMapValues.get(i);
String patternString = "\\b(" + StringUtils.join(tokens, "|") + ")\\b";
Pattern pattern = Pattern.compile(patternString);
java.util.regex.Matcher matcher = pattern.matcher(value);
while (matcher.find()) {
System.out.println(matcher.group(1));
System.out.println(value);
}
}
}
}
Please help me with regex pattern to achieve above objective.
Any help will be appreciated.
It's much simpler to check if a string starts with a certain value with String.indexOf().
String[] urls = {
"abc/eRControl",
"abc/xyz",
"work/Mypage",
"cr/eRControl",
"custom/MyApp"
};
String uri = "cr/eRControl/work/custom.jsp";
for (String url : urls) {
if (uri.indexOf(url) == 0) {
System.out.println("Matched: " + url);
}else{
System.out.println("Not matched: " + url);
}
}
Also, there is no need to store the scheme in the map if you are never going to match against it.
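If the map from the question has to stay as it is, the same prefix idea can be applied to its values. The sketch below is an assumption-laden illustration: UrlPrefixMatcher, toPrefix(), findKey() and sampleMap() are invented names, and it assumes each pattern is an optional scheme followed by a path ending in "*".

```java
import java.util.LinkedHashMap;
import java.util.Map;

class UrlPrefixMatcher {
    // Strips an optional scheme ("https://") and a trailing "*" from a stored pattern.
    static String toPrefix(String pattern) {
        String p = pattern.replaceFirst("^[a-zA-Z]+:/+", "");
        return p.endsWith("*") ? p.substring(0, p.length() - 1) : p;
    }

    // Returns the key of the first pattern whose prefix starts the uri, or null.
    static String findKey(Map<String, String> patterns, String uri) {
        for (Map.Entry<String, String> e : patterns.entrySet()) {
            if (uri.startsWith(toPrefix(e.getValue()))) {
                return e.getKey();
            }
        }
        return null;
    }

    // The map from the question, kept in insertion order.
    static Map<String, String> sampleMap() {
        Map<String, String> m = new LinkedHashMap<>();
        m.put("a", "https:/abc/eRControl/*");
        m.put("b", "https://abc/xyz/*");
        m.put("c", "https://work/Mypage/*");
        m.put("d", "https://cr/eRControl/*");
        m.put("e", "https://custom/MyApp/*");
        return m;
    }

    public static void main(String[] args) {
        System.out.println(findKey(sampleMap(), "cr/eRControl/work/custom.jsp")); // d
    }
}
```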
If I understand your goal correctly, you might not even need regular expressions here.
Try this...
package test;
import java.util.HashSet;
import java.util.Set;
public class PartialURLMapper {
private static final Set<String> PARTIAL_URLS = new HashSet<String>();
static {
PARTIAL_URLS.add("cr/eRControl");
// TODO add more partial Strings to check against input
}
public static String getPartialStringIfMatching(final String input) {
if (input != null && !input.isEmpty()) {
for (String partial: PARTIAL_URLS) {
// this will be case-sensitive
if (input.contains(partial)) {
return partial;
}
}
}
// no partial match found, we return an empty String
return "";
}
// main method just to add example
public static void main(String[] args) {
System.out.println(PartialURLMapper.getPartialStringIfMatching("cr/eRControl/work/custom.jsp"));
}
}
... it will return:
cr/eRControl
The problem is that i is acting as a key not as an index on
String value = (String)lMapValues.get(i);
You will be better served by exchanging the map for a list and using a for-each loop.
List<String> patterns = new ArrayList<String>();
...
for (String pattern : patterns) {
....
}

Tokenize a string with a space in java

I want to tokenize a string like this
String line = "a=b c='123 456' d=777 e='uij yyy'";
I cannot simply split on spaces like this:
String [] words = line.split(" ");
Any idea how I can split so that I get tokens like
a=b
c='123 456'
d=777
e='uij yyy';
The simplest way to do this is by hand implementing a simple finite state machine. In other words, process the string a character at a time:
When you hit a space, break off a token;
When you hit a quote keep getting characters until you hit another quote.
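A minimal sketch of such a state machine, assuming single quotes never nest and there is no escape character (QuoteAwareTokenizer is just an illustrative name):

```java
import java.util.ArrayList;
import java.util.List;

class QuoteAwareTokenizer {
    static List<String> tokenize(String line) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false; // the only state: inside or outside '...'
        for (char c : line.toCharArray()) {
            if (c == '\'') {
                inQuotes = !inQuotes; // toggle the state, keep the quote in the token
                current.append(c);
            } else if (c == ' ' && !inQuotes) {
                // A space outside quotes ends the current token (if any).
                if (current.length() > 0) {
                    tokens.add(current.toString());
                    current.setLength(0);
                }
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString()); // last token has no trailing space
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (String token : tokenize("a=b c='123 456' d=777 e='uij yyy'")) {
            System.out.println(token);
        }
    }
}
```

For the line from the question this prints a=b, c='123 456', d=777 and e='uij yyy', one per line.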
Depending on the formatting of your original string, you should be able to use a regular expression as a parameter to the java "split" method: Click here for an example.
The example doesn't use the regular expression that you would need for this task though.
You can also use this SO thread as a guideline (although it's in PHP) which does something very close to what you need. Manipulating that slightly might do the trick (although having quotes be part of the output or not may cause some issues). Keep in mind that regex is very similar in most languages.
Edit: going too much further into this type of task may be ahead of the capabilities of regex, so you may need to create a simple parser.
line.split(" (?=[a-z]+=)")
correctly gives:
a=b
c='123 456'
d=777
e='uij yyy'
Make sure you adapt the [a-z]+ part in case your key structure changes.
Edit: this solution can fail miserably if there is a "=" character in the value part of the pair.
StreamTokenizer can help, although it is easiest to set up to break on '=', as it will always break at the start of a quoted string:
String s = "Ta=b c='123 456' d=777 e='uij yyy'";
StreamTokenizer st = new StreamTokenizer(new StringReader(s));
st.ordinaryChars('0', '9');
st.wordChars('0', '9');
while (st.nextToken() != StreamTokenizer.TT_EOF) {
switch (st.ttype) {
case StreamTokenizer.TT_NUMBER:
System.out.println(st.nval);
break;
case StreamTokenizer.TT_WORD:
System.out.println(st.sval);
break;
case '=':
System.out.println("=");
break;
default:
System.out.println(st.sval);
}
}
outputs
Ta
=
b
c
=
123 456
d
=
777
e
=
uij yyy
If you leave out the two lines that convert numeric characters to alpha, then you get d=777.0, which might be useful to you.
Assumptions:
Your variable name ('a' in the assignment 'a=b') can be of length 1 or more
Your variable name ('a' in the assignment 'a=b') can not contain the space character, anything else is fine.
Validation of your input is not required (input assumed to be in valid a=b format)
This works fine for me.
Input:
a=b abc='123 456' &=777 #='uij yyy' ABC='slk slk' 123sdkljhSDFjflsakd#*#&=456sldSLKD)#(
Output:
a=b
abc='123 456'
&=777
#='uij yyy'
ABC='slk slk'
123sdkljhSDFjflsakd#*#&=456sldSLKD)#(
Code:
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class RegexTest {
// SPACE CHARACTER followed by
// sequence of non-space characters of 1 or more followed by
// first occurring EQUALS CHARACTER
final static String regex = " [^ ]+?=";
// static pattern defined outside so that you don't have to compile it
// for each method call
static final Pattern p = Pattern.compile(regex);
public static List<String> tokenize(String input, Pattern p){
input = input.trim(); // this is important for "last token case"
// see end of method
Matcher m = p.matcher(input);
ArrayList<String> tokens = new ArrayList<String>();
int beginIndex=0;
while(m.find()){
int endIndex = m.start();
tokens.add(input.substring(beginIndex, endIndex));
beginIndex = endIndex+1;
}
// LAST TOKEN CASE
//add last token
tokens.add(input.substring(beginIndex));
return tokens;
}
private static void println(List<String> tokens) {
for(String token:tokens){
System.out.println(token);
}
}
public static void main(String args[]){
String test = "a=b " +
"abc='123 456' " +
"&=777 " +
"#='uij yyy' " +
"ABC='slk slk' " +
"123sdkljhSDFjflsakd#*#&=456sldSLKD)#(";
List<String> tokens = RegexTest.tokenize(test, p);
println(tokens);
}
}
Or, with a regex for tokenizing, and a little state machine that just adds the key/val to a map:
String line = "a = b c='123 456' d=777 e = 'uij yyy'";
Map<String,String> keyval = new HashMap<String,String>();
String state = "key";
Matcher m = Pattern.compile("(=|'[^']*?'|[^\\s=]+)").matcher(line);
String key = null;
while (m.find()) {
String found = m.group();
if (state.equals("key")) {
if (found.equals("=") || found.startsWith("'"))
{ System.err.println ("ERROR"); }
else { key = found; state = "equals"; }
} else if (state.equals("equals")) {
if (! found.equals("=")) { System.err.println ("ERROR"); }
else { state = "value"; }
} else if (state.equals("value")) {
if (key == null) { System.err.println ("ERROR"); }
else {
if (found.startsWith("'"))
found = found.substring(1,found.length()-1);
keyval.put (key, found);
key = null;
state = "key";
}
}
}
if (! state.equals("key")) { System.err.println ("ERROR"); }
System.out.println ("map: " + keyval);
prints out
map: {d=777, e=uij yyy, c=123 456, a=b}
It does some basic error checking, and takes the quotes off the values.
This solution is both general and compact (it is effectively the regex version of cletus' answer):
String line = "a=b c='123 456' d=777 e='uij yyy'";
Matcher m = Pattern.compile("('[^']*?'|\\S)+").matcher(line);
while (m.find()) {
System.out.println(m.group()); // or whatever you want to do
}
In other words, find all runs of characters that are combinations of quoted strings or non-space characters; nested quotes are not supported (there is no escape character).
public static void main(String[] args) {
String token;
String value="";
HashMap<String, String> attributes = new HashMap<String, String>();
String line = "a=b c='123 456' d=777 e='uij yyy'";
StringTokenizer tokenizer = new StringTokenizer(line," ");
while(tokenizer.hasMoreTokens()){
token = tokenizer.nextToken();
value = token.contains("'") ? value + " " + token : token ;
if(!value.contains("'") || value.endsWith("'")) {
//Split the strings and get variables into hashmap
attributes.put(value.split("=")[0].trim(),value.split("=")[1]);
value ="";
}
}
System.out.println(attributes);
}
output:
{d=777, a=b, e='uij yyy', c='123 456'}
In this case, consecutive spaces inside a value are collapsed to a single space.
Here the attributes HashMap contains the values.
import java.io.*;
import java.util.Scanner;
public class ScanXan {
public static void main(String[] args) throws IOException {
Scanner s = null;
try {
s = new Scanner(new BufferedReader(new FileReader("<file name>")));
while (s.hasNext()) {
System.out.println(s.next());
// <write to the output file here>
}
} finally {
if (s != null) {
s.close();
}
}
}
}
java.util.StringTokenizer tokenizer = new java.util.StringTokenizer(line, " ");
while (tokenizer.hasMoreTokens()) {
String token = tokenizer.nextToken();
int index = token.indexOf('=');
String key = token.substring(0, index);
String value = token.substring(index + 1);
}
Have you tried splitting by '=' and creating a token out of each pair of the resulting array?
