I am currently creating a Java program to rewrite some outdated Java classes in our software. Part of the conversion includes changing variable names from containing underscores to using camelCase instead. The problem is, I cannot simply replace all underscores in the code. We have some classes with constants and for those, the underscore should remain.
How can I replace instances like string_label with stringLabel, but DO NOT replace underscores that occur after the prefix "Parameters."?
I am currently using the following which obviously does not handle excluding certain prefixes:
public String stripUnderscores(String line) {
Pattern p = Pattern.compile("_(.)");
Matcher m = p.matcher(line);
StringBuffer sb = new StringBuffer();
while(m.find()) {
m.appendReplacement(sb, m.group(1).toUpperCase());
}
m.appendTail(sb);
return sb.toString();
}
You could possibly try something like:
Pattern.compile("(?<!(class\\s+Parameters.+|Parameters\\.[\\w_]+))_(.)")
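which uses a negative lookbehind. Note that Java wants a lookbehind to have a bounded maximum length, so depending on your Java version the + quantifiers may be rejected and have to be replaced with bounded ones such as {1,100}.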
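As a rough sketch of the lookbehind idea, assuming the only excluded prefix is "Parameters." and that no identifier following it is longer than 100 characters (an arbitrary bound, chosen only so the lookbehind has a known maximum length), it might look like this. The class name CamelCaseLookbehind is made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CamelCaseLookbehind {
    // Skip an underscore when the word characters before it are directly
    // preceded by "Parameters."; {0,100} is an assumed upper bound.
    private static final Pattern UNDERSCORE =
            Pattern.compile("(?<!\\bParameters\\.\\w{0,100})_(.)");

    static String stripUnderscores(String line) {
        Matcher m = UNDERSCORE.matcher(line);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            // quoteReplacement guards against '$' or '\' in the captured character
            m.appendReplacement(sb, Matcher.quoteReplacement(m.group(1).toUpperCase()));
        }
        m.appendTail(sb);
        return sb.toString();
    }
}
```

With this, "int string_label = Parameters.is_module_installed;" should come out as "int stringLabel = Parameters.is_module_installed;". It only looks at qualified names, so unqualified declarations inside class Parameters would still be renamed.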
You would probably be better served using some kind of refactoring tool that understood scoping semantics.
If all you check for is a qualified name like Parameters.is_module_installed then you will replace
class Parameters {
static boolean is_module_installed;
}
by mistake. And there are more corner cases like this. (import static Parameters.*;, etc., etc.)
Using regular expressions alone seems troublesome to me. One way you can make the routine smarter is to use regex just to capture an expression of identifiers and then you can examine it separately:
static List<String> exclude = Arrays.asList("Parameters");
static String getReplacement(String in) {
for(String ex : exclude) {
if(in.startsWith(ex + "."))
return in;
}
StringBuffer b = new StringBuffer();
Matcher m = Pattern.compile("_(.)").matcher(in);
while(m.find()) {
m.appendReplacement(b, m.group(1).toUpperCase());
}
m.appendTail(b);
return b.toString();
}
static String stripUnderscores(String line) {
Pattern p = Pattern.compile("([_$\\w][_$\\w\\d]+\\.?)+");
Matcher m = p.matcher(line);
StringBuffer sb = new StringBuffer();
while(m.find()) {
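// quoteReplacement: identifiers may contain '$', which appendReplacement would otherwise read as a group reference
m.appendReplacement(sb, Matcher.quoteReplacement(getReplacement(m.group())));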
}
m.appendTail(sb);
return sb.toString();
}
But that will still fail for e.g. class Parameters { is_module_installed; }.
It could be made more robust by further breaking down each expression:
static String getReplacement(String in) {
if(in.contains(".")) {
StringBuilder result = new StringBuilder();
String[] parts = in.split("\\.");
for(int i = 0; i < parts.length; ++i) {
if(i > 0) {
result.append(".");
}
String part = parts[i];
if(i == 0 || !exclude.contains(parts[i - 1])) {
part = getReplacement(part);
}
result.append(part);
}
return result.toString();
}
StringBuffer b = new StringBuffer();
Matcher m = Pattern.compile("_(.)").matcher(in);
while(m.find()) {
m.appendReplacement(b, m.group(1).toUpperCase());
}
m.appendTail(b);
return b.toString();
}
That would handle a situation like
Parameters.a_b.Parameters.a_b.c_d
and output
Parameters.a_b.Parameters.a_b.cD
That's impossible Java syntax but I hope you see what I mean. Doing a little parsing yourself goes a long way.
Maybe you can have another Pattern:
Pattern p = Pattern.compile("^Parameters.*"); //^ means the beginning of a line
If this matches, don't replace anything.
Related
I have a java project and the following regex pattern with named capture groups:
(?<department>\w+(-\w)??)\s{1,5}(?<number>\w+(-\w+)?)-(?<section>\w+)\s(?<term>\d+)\s(?<campus>\w{2})
I want to replace the value of one of the named groups with a wildcard character (*). All of the replace methods in the Matcher class appear to be tied to replacing a specific regex value. Since the string is not guaranteed to be unique, I want to replace by the group name.
Is there a way to leverage the Matcher class to provide this substitution capability?
I realized that I can use the start and end methods of the matcher to determine the range of characters that need to be replaced. I can then use a StringBuilder to delete the range and insert the specified replacement value. I wrote the following method to handle this situation.
public static String replaceNamedGroup(String source, Pattern pattern, String groupName, String replaceValue) {
if (source == null || pattern == null) {
return null;
}
Matcher m = pattern.matcher(source);
if (m.find()) {
int start = m.start(groupName);
int end = m.end(groupName);
StringBuilder sb = new StringBuilder(source);
sb = sb.delete(start, end);
if (replaceValue != null) {
sb = sb.insert(start, replaceValue);
}
return sb.toString();
} else {
return source;
}
}
Below is some code to show how it is used
String str = "ABC 123-123 1234 AB";
// backslashes must be doubled inside a Java string literal
Pattern pattern = Pattern.compile("(?<department>\\w+(-\\w)??)\\s{1,5}(?<number>\\w+(-\\w+)?)-(?<section>\\w+)\\s(?<term>\\d+)\\s(?<campus>\\w{2})");
String output = replaceNamedGroup(str, pattern, "term", "*");
// prints "Output: ABC 123-123 * AB"
System.out.println("Output: " + output);
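The method above only touches the first match. If the group should be replaced in every match, a sketch along the same lines could copy the pieces in a loop. The names NamedGroupReplacer and replaceNamedGroupInAllMatches are hypothetical, and null handling is left out for brevity.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NamedGroupReplacer {
    static String replaceNamedGroupInAllMatches(String source, Pattern pattern,
                                                String groupName, String replaceValue) {
        Matcher m = pattern.matcher(source);
        StringBuilder sb = new StringBuilder(source.length());
        int last = 0; // end of the region already copied into sb
        while (m.find()) {
            int start = m.start(groupName);
            if (start < 0) {
                continue; // the group did not take part in this match
            }
            // copy everything up to the group, then the replacement
            sb.append(source, last, start).append(replaceValue);
            last = m.end(groupName);
        }
        sb.append(source, last, source.length()); // copy the remainder
        return sb.toString();
    }
}
```

Because the text is copied with StringBuilder rather than appendReplacement, characters such as '$' in the replacement value are inserted literally.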
I want to filter prohibited words.
In my code,
public static String filterText(String sText) {
Pattern p = Pattern.compile("test", Pattern.CASE_INSENSITIVE);
Matcher m = p.matcher(sText);
StringBuffer buf = new StringBuffer();
while (m.find()){
m.appendReplacement(buf, maskWord(m.group()));
}
m.appendTail(buf);
return buf.toString();
}
public static String maskWord(String str) {
StringBuffer buf = new StringBuffer();
char[] ch = str.toCharArray();
for (int i = 0; i < ch.length; i++) {
buf.append("*");
}
return buf.toString();
}
If you receive the sentence "test is test", it will be expressed as "**** is ****" using the above code.
But I want to filter out at least a few tens to a few hundred words.
The words are stored in a database (DB type: Oracle).
So how do I check multiple words?
Assuming you are using Java 9 or later, you could use Matcher.replaceAll with a replacement function to replace the words in one statement. You can also use String.replaceAll to replace every character of a match with '*'.
A pattern can contain many alternatives in it. You could construct a pattern with all the words required.
Pattern pattern = Pattern.compile("(word1|word2|word3)");
String result = pattern.matcher(input)
.replaceAll(w -> w.group(1).replaceAll(".", "*"));
Alternatively, you could have a list of patterns and then replace each in turn:
String result = input;
for (Pattern pattern : patternList) {
result = pattern.matcher(result)
.replaceAll(w -> w.group(1).replaceAll(".", "*"));
}
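To make the single-pattern idea concrete, here is a minimal sketch that builds the alternation from a list of words. How the words are loaded from the database is outside the sketch, so the list is simply passed in; WordFilter, buildPattern and the two-argument filterText are illustrative names, not part of the original code.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class WordFilter {
    // Builds one case-insensitive alternation from the given words.
    // Assumes the list is non-empty; an empty list would give a pattern
    // that matches the empty string everywhere.
    static Pattern buildPattern(List<String> words) {
        String alternation = words.stream()
                .map(Pattern::quote)               // treat each word literally
                .collect(Collectors.joining("|"));
        return Pattern.compile(alternation, Pattern.CASE_INSENSITIVE);
    }

    static String filterText(String text, Pattern pattern) {
        // Requires Java 9+: Matcher.replaceAll with a replacement function
        return pattern.matcher(text)
                .replaceAll(m -> m.group().replaceAll(".", "*"));
    }
}
```

Compiling the pattern once and reusing it avoids rebuilding it for every sentence. Like the original filterText, this also masks a word when it occurs inside a longer word.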
I am trying to censor specific strings, and patterns within my application but my matcher doesn't seem to be finding any results when searching for the Pattern.
public String censorString(String s) {
System.out.println("Censoring... "+ s);
if (findPatterns(s)) {
System.out.println("Found pattern");
for (String censor : foundPatterns) {
for (int i = 0; i < censor.length(); i++)
s.replace(censor.charAt(i), (char)42);
}
}
return s;
}
public boolean findPatterns(String s) {
for (String censor : censoredWords) {
Pattern p = Pattern.compile("(.*)["+censor+"](.*)");//regex
Matcher m = p.matcher(s);
while (m.find()) {
foundPatterns.add(censor);
return true;
}
}
return false;
}
At the moment I'm focusing on just the one pattern, if the censor is found in the string. I've tried many combinations and none of them seem to return "true".
"(.*)["+censor+"](.*)"
"(.*)["+censor+"]"
"["+censor+"]"
"["+censor+"]+"
Any help would be appreciated.
Usage: My censored words are "hello", "goodbye"
String s = "hello there, today is a fine day.";
System.out.println(censorString(s));
is supposed to print " ***** today is a fine day. "
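Your regex does find a match (although note that [...] is a character class, so it matches any single one of those letters rather than the whole word). The actual problem is here: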
s.replace(censor.charAt(i), (char)42);
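If you expect this line to rewrite the censored parts of your string, it will not. Strings in Java are immutable: replace returns a new String and leaves s unchanged, so the result has to be assigned back. Please check the Javadoc for String.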
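A tiny illustration of the difference, using made-up method names:

```java
class ReplaceDemo {
    // The return value of replace is thrown away, so s is returned unchanged.
    static String discardResult(String s) {
        s.replace('l', '*');
        return s;
    }

    // Assigning the result back keeps the replacement.
    static String keepResult(String s) {
        s = s.replace('l', '*');
        return s;
    }
}
```

Here discardResult("hello") still returns "hello", while keepResult("hello") returns "he**o".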
Please find below a program which does what you intend. I removed your findPatterns method and just used replaceAll with a regex from the String API. Hope this helps.
public class Regex_SO {
private String[] censoredWords = new String[]{"hello"};
/**
* @param args the command line arguments
*/
public static void main(String[] args) {
Regex_SO regex_SO = new Regex_SO();
regex_SO.censorString("hello there, today is a fine day. hello again");
}
public String censorString(String s) {
System.out.println("Censoring... "+ s);
for(String censoredWord : censoredWords){
String replaceStr = "";
for(int index = 0; index < censoredWord.length();index++){
replaceStr = replaceStr + "*";
}
s = s.replaceAll(censoredWord, replaceStr);
}
System.out.println("Censored String is .. " + s);
return s;
}
}
Since this seems like homework I can't give you working code, but here are a few pointers:
consider using the regex \\b(word1|word2|word3)\\b to find specific words
to create a char representing * you can write '*'. Don't use (char)42, to avoid magic numbers
to create a new string with the same length as the old one but filled with one specific character, you can use String newString = oldString.replaceAll(".","*")
to replace each match on the fly with a new value you can use the appendReplacement and appendTail methods of the Matcher class. Here is how code using them should look:
StringBuffer sb = new StringBuffer();//buffer for string with replaced values
Pattern p = Pattern.compile(yourRegex);
Matcher m = p.matcher(yourText);
while (m.find()){
String match = m.group(); //this will represent current match
String newValue = ...; //here you need to decide how to replace it
m.appendReplacement(sb, newValue);
}
m.appendTail(sb);
String censoredString = sb.toString();
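Put together, the pointers above could be sketched as follows. The word list is hard-coded from the question's example, and the class name Censor is an assumption made for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Censor {
    // Assumed word list; \\b keeps "hello" from matching inside "othello".
    private static final Pattern CENSORED =
            Pattern.compile("\\b(hello|goodbye)\\b");

    static String censorString(String text) {
        StringBuffer sb = new StringBuffer(); // collects the text with replaced values
        Matcher m = CENSORED.matcher(text);
        while (m.find()) {
            // same length as the match, but made of '*' only
            String masked = m.group().replaceAll(".", "*");
            m.appendReplacement(sb, masked);
        }
        m.appendTail(sb);
        return sb.toString();
    }
}
```

For the sentence in the question, censorString("hello there, today is a fine day.") gives "***** there, today is a fine day.".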
I am trying to perform multiple string replacements using Java's Pattern and Matcher, where the regex pattern may include metacharacters (e.g. \b, (), etc.). For example, for the input string fit i am, I would like to apply the replacements:
\bi\b --> EYE
i --> I
I then followed the coding pattern from two questions (Java Replacing multiple different substring in a string at once, Replacing multiple substrings in Java when replacement text overlaps search text). In both, they create an or'ed search pattern (e.g foo|bar) and a Map of (pattern, replacement), and inside the matcher.find() loop, they look up and apply the replacement.
The problem I am having is that the matcher.group() function does not contain information on matching metacharacters, so I cannot distinguish between i and \bi\b. Please see the code below. What can I do to fix the problem?
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.*;
public class ReplacementExample
{
public static void main(String argv[])
{
Map<String, String> replacements = new HashMap<String, String>();
replacements.put("\\bi\\b", "EYE");
replacements.put("i", "I");
String input = "fit i am";
String result = doit(input, replacements);
System.out.printf("%s\n", result);
}
public static String doit(String input, Map<String, String> replacements)
{
String patternString = join(replacements.keySet(), "|");
Pattern pattern = Pattern.compile(patternString);
Matcher matcher = pattern.matcher(input);
StringBuffer resultStringBuffer = new StringBuffer();
while (matcher.find())
{
System.out.printf("match found: %s at start: %d, end: %d\n",
matcher.group(), matcher.start(), matcher.end());
String matchedPattern = matcher.group();
String replaceWith = replacements.get(matchedPattern);
// Do the replacement here.
matcher.appendReplacement(resultStringBuffer, replaceWith);
}
matcher.appendTail(resultStringBuffer);
return resultStringBuffer.toString();
}
private static String join(Set<String> set, String delimiter)
{
StringBuilder sb = new StringBuilder();
int numElements = set.size();
int i = 0;
for (String s : set)
{
sb.append(Pattern.quote(s));
if (i++ < numElements-1) { sb.append(delimiter); }
}
return sb.toString();
}
}
This prints out:
match found: i at start: 1, end: 2
match found: i at start: 4, end: 5
fIt I am
Ideally, it should be fIt EYE am.
You mistyped one of your regexes:
replacements.put("\\bi\\", "EYE"); //Should be \\bi\\b
replacements.put("i", "I");
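You may also want to make your regexes unique. HashMap.keySet() makes no guarantee about iteration order, so the combined pattern may try i before \\bi\\b and replace it with I first. A LinkedHashMap keeps insertion order if you need the patterns tried in a fixed sequence.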
You could use capture groups, without straying too far from your existing design. So instead of using the matched pattern as the key, you look up based on the order within a List.
You would need to change the join method to put parentheses around each of the patterns, something like this:
private static String join(Set<String> set, String delimiter) {
StringBuilder sb = new StringBuilder();
sb.append("(");
int numElements = set.size();
int i = 0;
for (String s : set) {
sb.append(s);
if (i++ < numElements - 1) {
sb.append(")");
sb.append(delimiter);
sb.append("(");
}
}
sb.append(")");
return sb.toString();
}
As a side note, the use of Pattern.quote in the original code listing would have caused the match to fail where those metacharacters were present.
Having done this, you would now need to determine which of the capture groups was responsible for the match. For simplicity I'm going to assume that none of the match patterns will themselves contain capture groups, in which case something like this would work, within the matcher while loop:
int index = -1;
for (int j=1;j<=replacements.size();j++){
if (matcher.group(j) != null) {
index = j;
break;
}
}
if (index >= 0) {
System.out.printf("Match on index %d = %s %d %d\n", index, matcher.group(index), matcher.start(index), matcher.end(index));
}
Next, we would like to use the resulting index value to index straight back into the replacements. The original code uses a HashMap, which is not suitable for this; you're going to have to refactor that to use a pair of Lists in some form, one containing the list of match patterns and the other the corresponding list of replacement strings. I won't do that here, but I hope that provides enough detail to create a working solution.
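One possible shape for that refactoring, sketched with two parallel lists. It keeps the assumption stated above that none of the patterns contain capture groups of their own; OrderedReplacer and its method name are hypothetical.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OrderedReplacer {
    // patterns.get(k) is replaced by replacements.get(k); earlier patterns win.
    static String replaceAll(String input, List<String> patterns, List<String> replacements) {
        StringBuilder alternation = new StringBuilder();
        for (String p : patterns) {
            if (alternation.length() > 0) {
                alternation.append('|');
            }
            alternation.append('(').append(p).append(')'); // one capture group per pattern
        }
        Matcher matcher = Pattern.compile(alternation.toString()).matcher(input);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            // the non-null group tells us which pattern produced the match
            for (int j = 1; j <= patterns.size(); j++) {
                if (matcher.group(j) != null) {
                    matcher.appendReplacement(out,
                            Matcher.quoteReplacement(replacements.get(j - 1)));
                    break;
                }
            }
        }
        matcher.appendTail(out);
        return out.toString();
    }
}
```

Called with the patterns "\\bi\\b" and "i" (in that order) and the replacements "EYE" and "I", the input "fit i am" becomes "fIt EYE am". The order of the list matters, since the first alternative that matches at a position wins.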
I have some HTML strings which contain images. I need to remove spaces from the image names because some tablets do not accept them. (I have already renamed all the image resources.) I think the only part that needs fixing is ...
src="file:///android_asset/images/ ?? ?? .???"
because those links are valid links.
I spent half a day on it and am still struggling with a performance issue. The following code works but is really slow...
public static void main(String[] args) {
String str = "<IMG height=286 alt=\"eye_anatomy 1.jpg\" src=\"file:///android_asset/images/eye_anatomy 1 .jpg\" width=350 border=0></P> fd ssda f \r\n"
+ "fd <P align=center><IMG height=286 alt=\"eye_anatomy 1.jpg\" src=\"file:///android_asset/images/ eye_anato my 1 .bmp\" width=350 border=0></P>\r\n"
+ "\r\n<IMG height=286 alt=\"eye_anatomy 1.jpg\" src=\"file:///android_asset/images/eye_anatomy1.png\" width=350 border=0>\r\n";
Pattern p = Pattern.compile("(.*?)(src=\"file:///android_asset/images/)(.*?\\s+.*?)(\")", Pattern.DOTALL);
Matcher m = p.matcher(str);
StringBuilder sb = new StringBuilder("");
int i = 0;
while (m.find()) {
sb.append(m.group(1)).append(m.group(2)).append(m.group(3).replaceAll("\\s+", "")).append(m.group(4));
i = m.end();
}
sb.append(str.substring(i, str.length()));
System.out.println(sb.toString());
}
So the real question is: how can I remove spaces from the image names efficiently using regex?
Thank you.
Regex is as regex does. :-) Seriously, the regex stuff is great for really particular cases, but for stuff like this I find myself writing lower-level code. So the following isn't a regex; it's a function. But it does what you want and does it much faster than your regex. (That said, if someone does come up with a regex that fits the bill and performs well I'd love to see it.)
The following function segments the source string using spaces as delimiters, then recognizes and cleans up your alt and src attributes by not appending spaces while assembling the result. I did the alt attribute only because you were putting file names there too. One side effect is that this will collapse multiple spaces into one space in the rest of the markup, but browsers do that anyway. You can optimize the code a bit by re-using a StringBuilder. It presumes double-quotes around attributes.
I hope this helps.
private String removeAttrSpaces(final String str) {
final StringBuilder sb = new StringBuilder(str.length());
boolean inAttribute = false;
for (final String segment : str.split(" ")) {
if (segment.startsWith("alt=\"") || segment.startsWith("src=\"")) {
inAttribute = true;
}
if (inAttribute && segment.endsWith("\"")) {
inAttribute = false;
}
sb.append(segment);
if (!inAttribute) {
sb.append(' ');
}
}
return sb.toString();
}
Here's a function that should be faster http://ideone.com/vlspF:
private static String removeSpacesFromImages(String aText){
Pattern p = Pattern.compile("(?<=src=\"file:///android_asset/images/)[^\"]*");
StringBuffer result = new StringBuffer();
Matcher matcher = p.matcher(aText);
while ( matcher.find() ) {
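// quoteReplacement keeps a '$' or '\' in a file name from being read as a group reference
matcher.appendReplacement(result, Matcher.quoteReplacement(matcher.group(0).replaceAll("\\s+", "")));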
}
matcher.appendTail(result);
return result.toString();
}