Using String's ReplaceAll with regex - java

How to replace the following String combination:
word1="word2"
With the following String combination:
word1="word3"
Using word boundaries \b.
I used the following, but it didn't work:
String word2 = "word2";
String word3 = "word3";
String oldLine = "word1=\"" + word2 + "\"";
String newLine = "word1=\"" + word3 + "\"";
String lineToReplace = "\\b" + oldLine + "\\b";
String changedCont = cont.replaceAll(lineToReplace, newLine);
Where cont is a String that contains a lot of characters including word1="word2" String combinations.

Remove the last \b. It will not do what you think, " is not a word character.

String input = "alma word1=\"word2\"";
String replacement = "word1=\"word3\"";
String output = input.replaceAll("\\bword1=\\\"word2\\\"", replacement);

If you replace your lineToReplace line by this:
String lineToReplace = "\\b" + oldLine + "(?!\\w)";
It should work the way you want.

Your string already ends in a non-word character (the "), so the trailing \b in your regexp only matches when a word character immediately follows the quote. Remove that last \b.

The only word boundary you need is at the front - the rest of your match already has word boundaries built in (the quotes etc).
This will work:
cont.replaceAll("\\bword1=\"word2\"", "word1=\"word3\"");
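To see why the trailing \b gets in the way, here is a minimal sketch. The class name BoundaryDemo, the method replaceValue, and the sample input are made up for illustration. It also wraps the literal parts in Pattern.quote and Matcher.quoteReplacement, which the answers above do not use, on the assumption that word2 and word3 might contain regex metacharacters.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BoundaryDemo {
    // Replaces word1="oldValue" with word1="newValue"; only a leading \b is used.
    static String replaceValue(String cont, String oldValue, String newValue) {
        String regex = "\\b" + Pattern.quote("word1=\"" + oldValue + "\"");
        String replacement = Matcher.quoteReplacement("word1=\"" + newValue + "\"");
        return cont.replaceAll(regex, replacement);
    }

    public static void main(String[] args) {
        String cont = "alma word1=\"word2\" xword1=\"word2\"";

        // Trailing \b: each closing quote is followed by a space or the end of
        // input, so there is no word boundary there and nothing is replaced.
        System.out.println(cont.replaceAll("\\bword1=\"word2\"\\b", "word1=\"word3\""));

        // Leading \b only: the first occurrence is replaced; the second is
        // preceded by the word character 'x', so it is left alone.
        System.out.println(replaceValue(cont, "word2", "word3"));
    }
}
```

The leading \b is still useful here because it stops xword1="word2" from being rewritten.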

Why matcher.find() for input parameter always return 'false'?

I have a strange situation which I find difficult to understand regarding regex matcher.
When I pass the input parameter issueBody to the matcher, matcher.find() always returns false, whereas passing a hard-coded String with the same value as issueBody works as expected.
The regex function:
private Map<String, String> extractCodeSnippet(Set<String> resolvedIssueCodeLines, String issueBody) {
String codeSnippetForCodeLinePattern = "\\(Line #%s\\).*\\W\\`{3}\\W+(.*)(?=\\W+\\`{3})";
Map<String, String> resolvedIssuesMap = new HashMap<>();
for (String currentResolvedIssue : resolvedIssueCodeLines) {
String currentCodeLinePattern = String.format(codeSnippetForCodeLinePattern, currentResolvedIssue);
Pattern pattern = Pattern.compile(currentCodeLinePattern, Pattern.MULTILINE);
Matcher matcher = pattern.matcher(issueBody);
while (matcher.find()) {
resolvedIssuesMap.put(currentResolvedIssue, matcher.group());
}
}
return resolvedIssuesMap;
}
The following always returns false:
Pattern pattern = Pattern.compile(currentCodeLinePattern, Pattern.MULTILINE);
Matcher matcher = pattern.matcher(issueBody);
While the following always returns true:
Pattern pattern = Pattern.compile(currentCodeLinePattern, Pattern.MULTILINE);
Matcher matcher = pattern.matcher("**SQL_Injection** issue exists # **VB_3845_112_lines/encode.frm** in branch **master**\n" +
"\n" +
"Severity: High\n" +
"\n" +
"CWE:89\n" +
"\n" +
"[Vulnerability details and guidance](https://cwe.mitre.org/data/definitions/89.html)\n" +
"\n" +
"[Internal Guidance](https://checkmarx.atlassian.net/wiki/spaces/AS/pages/79462432/Remediation+Guidance)\n" +
"\n" +
"[ppttx](http://WIN2K12-TEMP/bbcl/ViewerMain.aspx?planid=1010013&projectid=10005&pathid=1)\n" +
"\n" +
"Lines: 41 42 \n" +
"\n" +
"---\n" +
"[Code (Line #41):](null#L41)\n" +
"```\n" +
" user_name = txtUserName.Text\n" +
"```\n" +
"---\n" +
"[Code (Line #42):](null#L42)\n" +
"```\n" +
" password = txtPassword.Text\n" +
"```\n" +
"---\n");
My question is: why? What is the difference between the two statements?
TL;DR:
By using Pattern.UNIX_LINES, you tell Java regex engine to match with . any char but a newline, LF. Use
Pattern pattern = Pattern.compile(currentCodeLinePattern, Pattern.UNIX_LINES);
In your hard-coded string, you have only newline (LF) endings, while your issueBody most likely contains \r\n (CRLF) endings. Your pattern matches only a single non-word char with \W (see the \\W\\`{3} pattern part), but CRLF consists of two non-word chars. By default, . does not match line break chars, so it matches neither \r (CR) nor \n (LF). The \(Line #%s\).*\W\`{3} part fails precisely because of this:
\(Line #%s\) - matches (Line #<NUMBER>)
.* - matches 0 or more chars other than any line break char (up to CR or CRLF)
\W - matches a char other than a letter/digit/_ (so, only \r or \n)
\`{3} - 3 backticks - these are only matched if there was a \n ending, not \r\n (CRLF).
Again, by using Pattern.UNIX_LINES, you tell Java regex engine to match with . any char but a newline, LF.
BTW, Pattern.MULTILINE only makes ^ match at the start of each line, and $ to match at the end of each line, and since there are neither ^, nor $ in your pattern, you may safely discard this option.
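The difference can be reproduced with a shortened body. This is only a sketch: the class LineEndingDemo and its helpers are hypothetical, the line number 41 is substituted into the pattern by hand, and the body is cut down to a single code block.

```java
import java.util.regex.Pattern;

class LineEndingDemo {
    // The pattern from the question, with line number 41 already substituted in.
    static final String REGEX = "\\(Line #41\\).*\\W\\`{3}\\W+(.*)(?=\\W+\\`{3})";

    // Builds a shortened issue body using the given line terminator.
    static String body(String eol) {
        return "[Code (Line #41):](null#L41)" + eol
                + "```" + eol
                + " user_name = txtUserName.Text" + eol
                + "```" + eol;
    }

    // Reports whether the pattern is found in the text under the given flags.
    static boolean found(String text, int flags) {
        return Pattern.compile(REGEX, flags).matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(found(body("\n"), 0));                     // LF, default flags: true
        System.out.println(found(body("\r\n"), 0));                   // CRLF, default flags: false
        System.out.println(found(body("\r\n"), Pattern.UNIX_LINES));  // CRLF, UNIX_LINES: true
    }
}
```

With UNIX_LINES and CRLF input, . also matches \r, so the captured group can end with a carriage return; trimming the result may be worthwhile.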

Java regular expression: new line with forward slash in split

I want to match the text below with a regular expression.
/
--//
or it can be
--//
I tried:
public static final String DELIMITER = "/\\n--//|-//";
ddl.addAll(..).split(DELIMITER)));
and various combinations, but nothing works.
I am on Windows.
You don't need to escape so much. You are missing a second newline in your delimiter:
String newline = System.getProperty("line.separator");
String text = "before" + "/" + newline + newline + "--//" + "after";
System.out.println(text);
String delimiter = "/" + newline + newline + "--//";
String[] parts = text.split(delimiter);
System.out.println(parts[0]); // prints "before"
System.out.println(parts[1]); // prints "after"
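If the text may have been written on a different platform than the one the JVM runs on, the platform line separator might not match what is in the file. A possible variant, assuming Java 8 or later, is the \R linebreak matcher, which accepts \r\n, \n, or \r. The class and method names below are illustrative, and the delimiter is assumed to be "/", two line breaks, then "--//", as in the answer above.

```java
import java.util.regex.Pattern;

class DelimiterSplitDemo {
    // "/" followed by two line breaks of any kind, followed by "--//".
    static final Pattern DELIMITER = Pattern.compile("/\\R\\R--//");

    static String[] splitOnDelimiter(String text) {
        return DELIMITER.split(text);
    }

    public static void main(String[] args) {
        String windowsText = "before/\r\n\r\n--//after";
        String unixText = "before/\n\n--//after";
        for (String text : new String[] {windowsText, unixText}) {
            String[] parts = splitOnDelimiter(text);
            System.out.println(parts[0] + " | " + parts[1]);
        }
    }
}
```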

How to replace multiple spaces and newlines with one blank line

How to remove multiple spaces and newlines in a string, but preserve at least one blank line for each group of blank lines.
For example, change:
"This is
a string.
Something."
to
"This is
a string.
Something."
I'm using .trim() to strip whitespace from the beginning and end of a string, but I couldn't find anything for removing multiple spaces and newlines in a string.
I would like to keep just one whitespace and one newline.
The one-line solution to remove multiple spaces/newlines, but preserve at least one blank line from multiple blank lines:
str = str.replaceAll("(?m)(^ *| +(?= |$))", "").replaceAll("(?m)^$([\r\n]+?)(^$[\r\n]+?^)+", "$1");
Each individual line is trimmed too.
Here's some test code:
String str = " This is\r\n " +
"\r\n" +
" \r\n " +
" \r \n \n " +
"\r\n" +
" a string. ";
str = str.trim().replaceAll("(?m)(^ *| +(?= |$))", "").replaceAll("(?m)^$([\r\n]+?)(^$[\r\n]+?^)+", "$1");
System.out.println(str);
Output:
This is
a string.
The previous advice will trim all whitespace, including the linefeeds and replace them with a single space.
text = text.replaceAll("\\n\\s*\\n", "\n").replaceAll("[ \\t\\x0B\\f]+", " ").trim();
First it replaces any instances of linefeeds with only whitespace between them with a single linefeed, then it trims down any other whitespace to a single space ignoring linefeeds.
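A small sketch of this two-step approach follows; the class name, method name, and sample input are made up. Note two assumptions: the replacement is written as the Java newline "\n" (in a replaceAll replacement string a backslash escapes the next character, so a regex-style \n there would insert the letter n), and, as described, runs of blank lines collapse to a single linefeed rather than to one blank line.

```java
class CollapseWhitespaceDemo {
    static String collapse(String text) {
        return text
                // newline, optional whitespace, newline -> a single newline
                .replaceAll("\\n\\s*\\n", "\n")
                // runs of spaces, tabs, vertical tabs, form feeds -> one space
                .replaceAll("[ \\t\\x0B\\f]+", " ")
                .trim();
    }

    public static void main(String[] args) {
        String input = "This   is\n \n\n a   string.  ";
        System.out.println("'" + collapse(input) + "'");
    }
}
```

A single space that directly follows a kept newline survives, because the second step only shrinks runs of horizontal whitespace; it does not trim each line.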
Here is what I came up with after a bit of testing...
public String keepOneWS(String str) {
Pattern p = Pattern.compile("(\\s+)");
Matcher m = p.matcher(str);
Pattern pBlank = Pattern.compile("[ \t]+");
String newLineReplacement = System.getProperty("line.separator") +
System.getProperty("line.separator");
StringBuffer sb = new StringBuffer();
while (m.find()) {
if(pBlank.matcher(m.group(1)).matches()) {
m.appendReplacement(sb, " ");
} else {
m.appendReplacement(sb, newLineReplacement);
}
}
m.appendTail(sb);
return sb.toString().trim();
}
public void testKeepOneWS() {
String str = " This \t is\r\n " +
"\r\n" +
" \r\n " +
" \r \n \t \n " +
"\r\n" +
" a \t string. \t ";
String expected = "This is" + System.getProperty("line.separator")+
System.getProperty("line.separator") + "a string.";
String actual = keepOneWS(str);
System.out.println("'" + actual + "'");
assertEquals(expected, actual);
}
After a group of whitespace is captured, it is checked whether it consists only of spaces and tabs. If so, that group is replaced by a single space; otherwise the group contains line terminators, and it is replaced by two line terminators, which leaves exactly one blank line.
The output is:
'This is
a string.'

How to take a substring using pattern match

I have
String content= "<a data-hovercard=\"/ajax/hovercard/group.php?id=180552688740185\">
<a data-hovercard=\"/ajax/hovercard/group.php?id=21392174\">"
I want to get all the ids between "group.php?id=" and "\""
Ex:180552688740185
Here is my code:
String content1 = "";
Pattern script1 = Pattern.compile("group.php?id=.*?\"");
Matcher mscript1 = script1.matcher(content);
while (mscript1.find()) {
content1 += mscript1.group() + "\n";
}
But for some reason it does not work.
Can you give me some advice?
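Why are you using .*? to match the id? .*? matches any character. You only need to match digits, so use \\d. Note also that . and ? are regex metacharacters, so the literal text group.php?id= has to be quoted or escaped before it will match.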
Also, you need to capture the id and then print it.
// To consider special characters as literals
String str = Pattern.quote("group.php?id=") + "(\\d*)";
Pattern script1 = Pattern.compile(str);
Matcher mscript1 = script1.matcher(content); // Your matcher line
while (mscript1.find()) {
content1 += mscript1.group(1) + "\n"; // Capture group 1 contains your id
}
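Put together as a self-contained sketch, this could look as follows. The class GroupIdExtractor and the method extractIds are hypothetical names, the ids are collected into a list instead of a concatenated string, and \\d+ is used in place of \\d* on the assumption that an empty id should not count as a match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupIdExtractor {
    // Pattern.quote treats "group.php?id=" literally; group 1 captures the digits.
    static final Pattern GROUP_ID =
            Pattern.compile(Pattern.quote("group.php?id=") + "(\\d+)");

    static List<String> extractIds(String content) {
        List<String> ids = new ArrayList<>();
        Matcher matcher = GROUP_ID.matcher(content);
        while (matcher.find()) {
            ids.add(matcher.group(1));
        }
        return ids;
    }

    public static void main(String[] args) {
        String content = "<a data-hovercard=\"/ajax/hovercard/group.php?id=180552688740185\">\n"
                + "<a data-hovercard=\"/ajax/hovercard/group.php?id=21392174\">";
        System.out.println(extractIds(content));
    }
}
```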

removing space before new line in java

I have a space before a new line in a string and can't remove it (in Java).
I have tried the following but nothing works:
strToFix = strToFix.trim();
strToFix = strToFix.replace(" \n", "");
strToFix = strToFix.replaceAll("\\s\\n", "");
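myString = myString.replaceAll("[ \t]+(\r\n?|\n)", "$1");
replaceAll takes a regular expression as its first argument. The [ \t]+ matches one or more spaces or tabs. The (\r\n?|\n) matches a line break and captures it as $1, which the replacement writes back. Strings are immutable, so the result has to be assigned.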
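As a quick check of that expression, here is a minimal sketch; the class name, method name, and sample strings are invented for the example.

```java
class TrailingSpaceDemo {
    // Removes spaces and tabs that sit directly before a line break,
    // keeping the line break itself (\r\n, \r, or \n) via group 1.
    static String stripBeforeNewline(String s) {
        return s.replaceAll("[ \t]+(\r\n?|\n)", "$1");
    }

    public static void main(String[] args) {
        String input = "hi \nthere\t \r\nend";
        String fixed = stripBeforeNewline(input);
        // Show the line breaks explicitly so the result is easy to inspect.
        System.out.println(fixed.replace("\r", "\\r").replace("\n", "\\n"));
    }
}
```

Whitespace at the very end of the string with no line break after it is left in place; trim() or an extra pattern would be needed for that case.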
try this:
strToFix = strToFix.replaceAll(" \\n", "\n");
'\' is a special character in regex; you need to escape it with another '\'.
I believe with this one you should try this instead:
strToFix = strToFix.replace(" \\n", "\n");
Edit:
I forgot the escape in my original answer. James.Xu in his answer reminded me.
Are you sure?
String s1 = "hi ";
System.out.println("|" + s1.trim() + "|");
String s2 = "hi \n";
System.out.println("|" + s2.trim() + "|");
prints
|hi|
|hi|
Are you sure it is a space that you're trying to remove? You should print the string's bytes and check whether the byte in question actually has the value 32 (decimal), i.e. 20 (hexadecimal).
trim() seems to do what you're asking on my system. Here's the code I used; maybe you want to try it on your system:
public class so5488527 {
public static void main(String [] args)
{
String testString1 = "abc \n";
String testString2 = "def \n";
String testString3 = "ghi \n";
String testString4 = "jkl \n";
testString3 = testString3.trim();
System.out.println(testString1);
System.out.println(testString2.trim());
System.out.println(testString3);
System.out.println(testString4.trim());
}
}
