I want to mask prohibited words in a text.
Here is my code:
public static String filterText(String sText) {
Pattern p = Pattern.compile("test", Pattern.CASE_INSENSITIVE);
Matcher m = p.matcher(sText);
StringBuffer buf = new StringBuffer();
while (m.find()){
m.appendReplacement(buf, maskWord(m.group()));
}
m.appendTail(buf);
return buf.toString();
}
public static String maskWord(String str) {
StringBuffer buf = new StringBuffer();
char[] ch = str.toCharArray();
for (int i = 0; i < ch.length; i++) {
buf.append("*");
}
return buf.toString();
}
Given the sentence "test is test", the above code produces "**** is ****".
But I need to filter anywhere from a few dozen to a few hundred words.
The words are stored in a database (DB type: Oracle).
So how do I check for multiple words?
Assuming you are using Java 9 or later, you could use Matcher.replaceAll, which accepts a replacement function, to replace the words in one statement. You can also use String.replaceAll to replace every character of a match with '*'.
A pattern can contain many alternatives in it. You could construct a pattern with all the words required.
Pattern pattern = Pattern.compile("(word1|word2|word3)");
String result = pattern.matcher(input)
.replaceAll(w -> w.group(1).replaceAll(".", "*"));
Alternatively, you could have a list of patterns and then replace each in turn:
String result = input;
for (Pattern pattern : patternList) {
result = pattern.matcher(result)
.replaceAll(w -> w.group(1).replaceAll(".", "*"));
}
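Since the words come from a database, the alternation can be built at run time. The following is only a sketch under assumptions: the class name WordMasker and the sample word list are hypothetical, and loading the words from Oracle is left out. Pattern.quote keeps any regex metacharacters in a word from being interpreted, and the loop uses appendReplacement so it also runs on Java 8.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class WordMasker {
    // Builds one case-insensitive pattern out of all words.
    // Pattern.quote makes each word match literally (e.g. "c++").
    static Pattern buildPattern(List<String> words) {
        String alternation = words.stream()
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));
        return Pattern.compile("(" + alternation + ")", Pattern.CASE_INSENSITIVE);
    }

    // Replaces every match with the same number of '*' characters.
    static String mask(Pattern pattern, String input) {
        Matcher m = pattern.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            StringBuilder stars = new StringBuilder();
            for (int i = 0; i < m.group().length(); i++) {
                stars.append('*');
            }
            m.appendReplacement(out, stars.toString());
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Hypothetical word list; in the question it would be read from the DB.
        List<String> words = Arrays.asList("test", "c++");
        Pattern p = buildPattern(words);
        System.out.println(mask(p, "Test is test, and C++ too"));
    }
}
```

The word list should not be empty, and if one word is a prefix of another it is probably worth sorting the list by length (longest first) before joining, because the alternation takes the first alternative that matches.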
CharSequence content = new StringBuffer("aaabbbccaaa");
String pattern = "([a-zA-Z])\\1\\1+";
String replace = "-";
Pattern patt = Pattern.compile(pattern, Pattern.CASE_INSENSITIVE);
Matcher matcher = patt.matcher(content);
boolean isMatch = matcher.find();
StringBuffer buffer = new StringBuffer();
for (int i = 0; i < content.length(); i++) {
while (matcher.find()) {
matcher.appendReplacement(buffer, replace);
}
}
matcher.appendTail(buffer);
System.out.println(buffer.toString());
In the above code, content is the input string.
I am trying to find repeated characters in a string and cut each run down to the maximum allowed number of occurrences.
For example:
input - ("abaaadccc", 2)
output - "abaadcc"
Here aaa and ccc are replaced by aa and cc, as the maximum allowed repetition is 2.
In the above code I found such occurrences and tried replacing them with -, and that works. But can someone help me get the current character and replace the run with the allowed number of occurrences,
i.e. if aaa is found, it is replaced by aa?
Or is there an alternative method without using regex?
You can wrap the allowed run in an outer capturing group, use a second (inner) group for the character itself, and use the outer group as the replacement:
String result = "aaabbbccaaa".replaceAll("(([a-zA-Z])\\2)\\2+", "$1");
Here's how it works:
( first group - a character repeated two times
([a-zA-Z]) second group - a character
\2 a character repeated once
)
\2+ a character repeated at least once more
Thus, the first group captures a replacement string.
It isn't hard to extrapolate this solution for a different maximum value of allowed repeats:
String input = "aaaaabbcccccaaa";
int maxRepeats = 4;
String pattern = String.format("(([a-zA-Z])\\2{%s})\\2+", maxRepeats-1);
String result = input.replaceAll(pattern, "$1");
System.out.println(result); //aaaabbccccaaa
Since you defined a group in your regex, you can get the matching characters of this group by calling matcher.group(1). In your case it contains the first character from the repeating group so by appending it twice you get your expected result.
CharSequence content = new StringBuffer("aaabbbccaaa");
String pattern = "([a-zA-Z])\\1\\1+";
Pattern patt = Pattern.compile(pattern, Pattern.CASE_INSENSITIVE);
Matcher matcher = patt.matcher(content);
StringBuffer buffer = new StringBuffer();
while (matcher.find()) {
System.out.println("found : "+matcher.start()+","+matcher.end()+":"+matcher.group(1));
matcher.appendReplacement(buffer, matcher.group(1)+matcher.group(1));
}
matcher.appendTail(buffer);
System.out.println(buffer.toString());
Output:
found : 0,3:a
found : 3,6:b
found : 8,11:a
aabbccaa
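As for the "without regex" part of the question, a plain loop that counts the current run is one option. This is a minimal sketch, not taken from the answers above; the names RepeatLimiter and limitRepeats are made up for illustration. Unlike the regex versions it treats every character (not just letters) the same way and compares case-sensitively.

```java
class RepeatLimiter {
    // Keeps at most maxRepeats consecutive copies of any character.
    static String limitRepeats(String input, int maxRepeats) {
        StringBuilder out = new StringBuilder();
        int run = 0; // length of the current run of identical characters
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            run = (i > 0 && c == input.charAt(i - 1)) ? run + 1 : 1;
            if (run <= maxRepeats) {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(limitRepeats("abaaadccc", 2));   // abaadcc
        System.out.println(limitRepeats("aaabbbccaaa", 2)); // aabbccaa
    }
}
```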
I want to check for pattern matches and, where the pattern matches, replace the matched text with the element of the test array at the given index.
public class test {
public static void main(String[] args) {
String[] test={"one","two","three","four"};
Pattern pattern = Pattern.compile("\\$(\\d)+");
String text="{\"test1\":\"$1\",\"test2\":\"$5\",\"test3\":\"$3\",\"test4\":\"$4\"}";
Matcher matcher = pattern.matcher(text);
while(matcher.find()) {
System.out.println(matcher.groupCount());
System.out.println(matcher.replaceAll("test"));
}
System.out.println(text);
}
}
I want the end result text string to be in this format:
{\"test1\":\"one\",\"test2\":\"$two\",\"test3\":\"three\",\"test4\":\"four\"}
but the while loop is exiting after one match and "test" is replaced everywhere like this:
{"test1":"test","test2":"test","test3":"test","test4":"test"}
Using the below code I got the result:
public class test {
public static void main(String[] args) {
String[] test={"one","two","three","four"};
Pattern pattern = Pattern.compile("\\$(\\d)+");
String text="{\"test1\":\"$1\",\"test2\":\"$2\",\"test3\":\"$3\",\"test4\":\"$4\"}";
Matcher m = pattern.matcher(text);
StringBuffer sb = new StringBuffer();
while (m.find()) {
m.appendReplacement(sb, test[Integer.parseInt(m.group(1)) - 1]);
}
m.appendTail(sb);
System.out.println(sb.toString());
}
}
But, if I have a replacement text array like this,
String[] test={"$$one","two","three","four"};
then, because of the $$, I am getting an exception in thread "main":
java.lang.IllegalArgumentException: Illegal group reference
at java.util.regex.Matcher.appendReplacement(Matcher.java:857)
The following line is your problem:
System.out.println(matcher.replaceAll("test"));
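Matcher.replaceAll resets the matcher and scans the whole input, so the next call to find() finds nothing and the loop ends after one pass. If you remove that line, the loop will walk through all matches.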
As a solution for your problem, you could replace the loop with something like this:
For Java 8:
StringBuffer out = new StringBuffer();
while (matcher.find()) {
String r = test[Integer.parseInt(matcher.group(1)) - 1];
matcher.appendReplacement(out, r);
}
matcher.appendTail(out);
System.out.println(out.toString());
For Java 9 and above:
String x = matcher.replaceAll(match -> test[Integer.parseInt(match.group(1)) - 1]);
System.out.println(x);
This only works if you replace the $5 with $2 (the array has only four elements), which I assume is what you intended.
Concerning the $ signs in the replacement string, the documentation states:
A dollar sign ($) may be included as a literal in the replacement string by preceding it with a backslash (\$).
In other words, you must write your replacement array as String[] test = { "\\$\\$one", "two", "three", "four" };
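If the replacement values cannot be edited by hand (for example because they come from user input), Matcher.quoteReplacement can do the escaping instead. A small sketch under assumptions: the class and method names are made up, the pattern is written as \$(\d+) so that multi-digit indexes are captured whole, and placeholders without a matching array element are simply left as they are.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DollarSafeReplace {
    static String substitute(String text, String[] values) {
        Matcher m = Pattern.compile("\\$(\\d+)").matcher(text);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            int index = Integer.parseInt(m.group(1)) - 1;
            // Out-of-range placeholders (like $5 with four values) stay unchanged.
            String r = (index >= 0 && index < values.length) ? values[index] : m.group();
            // quoteReplacement escapes '$' and '\' so they are taken literally.
            m.appendReplacement(sb, Matcher.quoteReplacement(r));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] test = {"$$one", "two", "three", "four"};
        String text = "{\"test1\":\"$1\",\"test2\":\"$5\",\"test3\":\"$3\"}";
        System.out.println(substitute(text, test));
    }
}
```

With the sample input this should print {"test1":"$$one","test2":"$5","test3":"three"}.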
I can do a regex solution if you like, but this is much easier (assuming this is the desired output).
int count = 1;
for (String s : test) {
text = text.replace("$" + count++, s);
}
System.out.println(text);
It prints:
{"test1":"one","test2":"two","test3":"three","test4":"four"}
I need a regular expression to remove certain characters but preserve what was removed into a new string. I'm hoping to avoid using two separate expressions.
Example:
Let's say I want to remove all numbers from a string but preserve them and place them in a different string.
"a1b23c" would become "abc", plus a new string "123".
Thanks for any help!
You can do what you describe with a find / replace loop using Matcher.appendReplacement() and Matcher.appendTail(). For your example:
Matcher matcher = Pattern.compile("\\d+").matcher("a1b23c");
StringBuffer nonDigits = new StringBuffer();
StringBuffer digits = new StringBuffer();
while (matcher.find()) {
digits.append(matcher.group());
matcher.appendReplacement(nonDigits, "");
}
matcher.appendTail(nonDigits);
System.out.println(nonDigits);
System.out.println(digits);
Output:
abc
123
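On Java 8 or earlier you do have to use StringBuffer instead of StringBuilder for this approach, because that is all Matcher supports there; Java 9 added StringBuilder overloads of appendReplacement() and appendTail().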
If you are doing simple things like removing digits, it would be easier to use a pair of StringBuilders:
String str = "a1b23c";
StringBuilder digits = new StringBuilder();
StringBuilder nonDigits = new StringBuilder();
for (int i = 0; i < str.length(); ++i) {
char ch = str.charAt(i);
if (Character.isDigit(ch)) {
digits.append(ch);
} else {
nonDigits.append(ch);
}
}
System.out.println(nonDigits);
System.out.println(digits);
I have string with multiple {!XXX} phrases. For example:
Kumar gaurav {!str1} is just {!str2}, adasdas {!str3}
I need to replace every {!str} placeholder with its corresponding value. How can I replace all {!str} occurrences in my string?
You can use a Pattern and Matcher, which let you query the string for an unknown number of elements, in combination with a regular expression such as \{!str\d\}, which lets you break the text down based on the tags.
For example...
String text = "All that {!str1} is {!str2}";
Map<String, String> values = new HashMap<>(25);
values.put("{!str1}", "glitters");
values.put("{!str2}", "gold");
Pattern p = Pattern.compile("\\{!str\\d\\}");
Matcher matcher = p.matcher(text);
while (matcher.find()) {
String match = matcher.group();
text = text.replaceAll("\\" + match, values.get(match));
}
System.out.println(text);
Which outputs
All that glitters is gold
You could also use something like...
int previousStart = 0;
StringBuilder sb = new StringBuilder();
while (matcher.find()) {
String match = matcher.group();
int start = matcher.start();
int end = matcher.end();
sb.append(text.substring(previousStart, start));
sb.append(values.get(match));
previousStart = end;
}
if (previousStart < text.length()) {
sb.append(text.substring(previousStart));
}
Which does away with the String creation in a loop and relies more on the position of the match to cut the original text around the tokens, which makes me happier ;)
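The same single-pass idea can also be written with appendReplacement. This is a sketch under assumptions rather than a drop-in: the class name PlaceholderFiller is made up, the map is keyed by the bare name (str1) instead of the full tag, the pattern \{!(\w+)\} accepts any word-like name, and unknown placeholders are kept as they are.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PlaceholderFiller {
    private static final Pattern TAG = Pattern.compile("\\{!(\\w+)\\}");

    static String fill(String text, Map<String, String> values) {
        Matcher m = TAG.matcher(text);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String value = values.get(m.group(1));
            if (value == null) {
                value = m.group(); // unknown key: keep the tag unchanged
            }
            // quoteReplacement guards against '$' or '\' inside the value.
            m.appendReplacement(sb, Matcher.quoteReplacement(value));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> values = new HashMap<>();
        values.put("str1", "glitters");
        values.put("str2", "gold");
        System.out.println(fill("All that {!str1} is {!str2}", values));
    }
}
```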
Use this simple regex:
String string="hello world {!hello}";
string=string.replaceAll("\\{!(.*?)\\}", "replace");
System.out.println(string); //this will print (hello world replace)
I am currently creating a Java program to rewrite some outdated Java classes in our software. Part of the conversion includes changing variable names from containing underscores to using camelCase instead. The problem is, I cannot simply replace all underscores in the code. We have some classes with constants and for those, the underscore should remain.
How can I replace instances like string_label with stringLabel, but DO NOT replace underscores that occur after the prefix "Parameters."?
I am currently using the following which obviously does not handle excluding certain prefixes:
public String stripUnderscores(String line) {
Pattern p = Pattern.compile("_(.)");
Matcher m = p.matcher(line);
StringBuffer sb = new StringBuffer();
while(m.find()) {
m.appendReplacement(sb, m.group(1).toUpperCase());
}
m.appendTail(sb);
return sb.toString();
}
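You could possibly try something like:
Pattern.compile("(?<!(class\\s+Parameters.+|Parameters\\.[\\w_]+))_(.)")
which uses a negative lookbehind. Two caveats: Java's lookbehind needs a bounded maximum length, so the unbounded quantifiers (+) inside it may be rejected with a PatternSyntaxException, depending on the Java version; replacing them with bounded ones such as {1,50} avoids that. Also, because the lookbehind contains a capturing group, the character after the underscore is group 2 here, not group 1.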
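To make the lookbehind idea concrete, here is a reduced sketch that only covers the qualified-name case (Parameters.some_name) and uses a bounded quantifier inside the lookbehind. The limit of 50 characters and the class name CamelCaser are arbitrary choices for illustration, not something from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CamelCaser {
    // Matches "_x" unless the text right before the underscore is
    // "Parameters." followed by up to 50 word characters.
    private static final Pattern UNDERSCORE =
            Pattern.compile("(?<!Parameters\\.\\w{0,50})_(.)");

    static String stripUnderscores(String line) {
        Matcher m = UNDERSCORE.matcher(line);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String upper = m.group(1).toUpperCase();
            m.appendReplacement(sb, Matcher.quoteReplacement(upper));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(stripUnderscores("Parameters.is_module_installed = string_label;"));
    }
}
```

For the sample line this should leave Parameters.is_module_installed alone and turn string_label into stringLabel. It does nothing about fields declared inside the Parameters class itself.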
You would probably be better served using some kind of refactoring tool that understood scoping semantics.
If all you check for is a qualified name like Parameters.is_module_installed then you will replace
class Parameters {
static boolean is_module_installed;
}
by mistake. And there are more corner cases like this. (import static Parameters.*;, etc., etc.)
Using regular expressions alone seems troublesome to me. One way you can make the routine smarter is to use regex just to capture an expression of identifiers and then you can examine it separately:
static List<String> exclude = Arrays.asList("Parameters");
static String getReplacement(String in) {
for(String ex : exclude) {
if(in.startsWith(ex + "."))
return in;
}
StringBuffer b = new StringBuffer();
Matcher m = Pattern.compile("_(.)").matcher(in);
while(m.find()) {
m.appendReplacement(b, m.group(1).toUpperCase());
}
m.appendTail(b);
return b.toString();
}
static String stripUnderscores(String line) {
Pattern p = Pattern.compile("([_$\\w][_$\\w\\d]+\\.?)+");
Matcher m = p.matcher(line);
StringBuffer sb = new StringBuffer();
while(m.find()) {
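// quoteReplacement: identifiers may contain '$', which appendReplacement would otherwise read as a group reference
m.appendReplacement(sb, Matcher.quoteReplacement(getReplacement(m.group())));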
}
m.appendTail(sb);
return sb.toString();
}
But that will still fail for e.g. class Parameters { is_module_installed; }.
It could be made more robust by further breaking down each expression:
static String getReplacement(String in) {
if(in.contains(".")) {
StringBuilder result = new StringBuilder();
String[] parts = in.split("\\.");
for(int i = 0; i < parts.length; ++i) {
if(i > 0) {
result.append(".");
}
String part = parts[i];
if(i == 0 || !exclude.contains(parts[i - 1])) {
part = getReplacement(part);
}
result.append(part);
}
return result.toString();
}
StringBuffer b = new StringBuffer();
Matcher m = Pattern.compile("_(.)").matcher(in);
while(m.find()) {
m.appendReplacement(b, m.group(1).toUpperCase());
}
m.appendTail(b);
return b.toString();
}
That would handle a situation like
Parameters.a_b.Parameters.a_b.c_d
and output
Parameters.a_b.Parameters.a_b.cD
That's impossible Java syntax but I hope you see what I mean. Doing a little parsing yourself goes a long way.
Maybe you can have another Pattern:
Pattern p = Pattern.compile("^Parameters.*"); // ^ means the beginning of the input (or of a line with Pattern.MULTILINE)
If this matches, don't replace anything.