I'm receiving messages from another program in which some characters have been changed:
\n (enter) -> #
(hash) # -> \#
\ -> \\\\
When I try to reverse these changes with my code, it doesn't work, probably because of this note in the documentation:
Note that backslashes (\) and dollar signs ($) in the replacement string may cause the results to be different than if it were being treated as a literal replacement string. Dollar signs may be treated as references to captured subsequences as described above, and backslashes are used to escape literal characters in the replacement string.
This is my code:
public String changeChars(final String message) {
String changedMessage = message;
changedMessage = changePattern(changedMessage, "[^\\\\][#]", "\n");
changedMessage = changePattern(changedMessage, "([^\\\\][\\\\#])", "#");
changedMessage = changePattern(changedMessage, "[\\\\\\\\\\\\\\\\]", "\\\\");
return changedMessage;
}
private String changePattern(final String message, String patternString, String replaceString) {
Pattern pattern = Pattern.compile(patternString);
Matcher matcher = pattern.matcher(message);
return matcher.replaceAll(replaceString);
}
I assume that your encoding method works like this.
replace all \ with \\\\
mark originally placed # as \#
now since we know that all originally placed # have \ before it we can use it to mark new lines \n with #.
Code for that could be something like
data = data.replace("\\", "\\\\\\\\");
data = data.replace("#", "\\#");
data = data.replace("\n", "#");
To reverse this operation we need to start from the end (from the last replacement).
First we replace every # that doesn't have \ before it with a new line \n (if we started with the 2nd replacement, \# -> #, we wouldn't know later which # were replacements of \n).
After that we can safely replace \# with # (this gets rid of the additional \ that wasn't in the original String, so it won't interfere with the last replacement step).
Lastly we replace \\\\ with \.
Here is how we can do it.
//your previous regex [^\\\\][#] describes "any character that is not \, followed by #"
//but since we don't want to include that additional non-`\` character while replacing
//we should use the negative look-behind mechanism "(?<!prefix)"
data = data.replaceAll("(?<!\\\\)#", "\n");
//now that we got rid of the bare "#", it's time to replace `\#` with `#`
data = data.replace("\\#", "#");
//and lastly `\\\\` to `\`
data = data.replace("\\\\\\\\", "\\");
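One caveat with the chained replacements: an original \ that is immediately followed by a line break encodes to \\\\#, and the look-behind then treats that # as an escaped hash rather than a newline. A minimal sketch of a single left-to-right pass that avoids this, assuming the encoding is exactly the one described above (the names EscapeCodec, encode and decode are made up for illustration):

```java
class EscapeCodec {

    // Encoding as assumed in the answer: \ -> \\\\, # -> \#, newline -> #
    static String encode(String data) {
        return data.replace("\\", "\\\\\\\\")
                   .replace("#", "\\#")
                   .replace("\n", "#");
    }

    // Decode in one pass so that every character is interpreted exactly once.
    static String decode(String data) {
        StringBuilder out = new StringBuilder(data.length());
        int i = 0;
        while (i < data.length()) {
            char c = data.charAt(i);
            if (c == '\\' && data.startsWith("\\\\\\\\", i)) {
                out.append('\\');   // four backslashes -> one original backslash
                i += 4;
            } else if (c == '\\' && i + 1 < data.length() && data.charAt(i + 1) == '#') {
                out.append('#');    // \# -> original hash
                i += 2;
            } else if (c == '#') {
                out.append('\n');   // bare # -> original newline
                i++;
            } else {
                out.append(c);      // anything else is copied unchanged
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Contains a backslash, a hash, an escaped-looking hash and a backslash before a newline
        String original = "path\\dir\n# comment \\# end\\\nnext";
        String encoded = encode(original);
        System.out.println(encoded);
        System.out.println(decode(encoded).equals(original)); // true
    }
}
```

Because the decoder consumes whole escape sequences from the left, it never has to guess whether a # was preceded by an escaping backslash or by an escaped one.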
I can replace dollar signs by using Matcher.quoteReplacement. I can replace words by adding boundary characters:
from = "\\b" + from + "\\b";
outString = line.replaceAll(from, to);
But I can't seem to combine them to replace words with dollar signs.
Here's an example. I am trying to replace "$temp4" (NOT $temp40) with "register1".
String line = "add, $temp4, $temp40, 42";
String to = "register1";
String from = "$temp4";
String outString;
from = Matcher.quoteReplacement(from);
from = "\\b" + from + "\\b"; //do whole word replacement
outString = line.replaceAll(from, to);
System.out.println(outString);
Outputs
"add, $temp4, $temp40, 42"
How do I get it to replace $temp4 and only $temp4?
Use unambiguous word boundaries, (?<!\w) and (?!\w), instead of \b, which is context dependent:
from = "(?<!\\w)" + Pattern.quote(from) + "(?!\\w)";
The (?<!\w) is a negative lookbehind that fails the match if there is a word char immediately to the left of the current location, and (?!\w) is a negative lookahead that fails the match if there is a word char immediately to the right of the current location. The Pattern.quote(from) is necessary to escape any special chars in the from variable.
See the Java demo:
String line = "add, $temp4, $temp40, 42";
String to = "register1";
String from = "$temp4";
String outString;
from = "(?<!\\w)" + Pattern.quote(from) + "(?!\\w)";
outString = line.replaceAll(from, to);
System.out.println(outString);
// => add, register1, $temp40, 42
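The same idea can be wrapped in a small helper. This is only a sketch: the class and method names (TokenReplacer, replaceWholeToken) are hypothetical, and it additionally passes the replacement through Matcher.quoteReplacement so that a $ or \ in the replacement is taken literally as well.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TokenReplacer {

    // Replace `from` only where it is not glued to a word character on either side.
    static String replaceWholeToken(String line, String from, String to) {
        // Pattern.quote makes `from` literal; the lookarounds act as word boundaries
        String regex = "(?<!\\w)" + Pattern.quote(from) + "(?!\\w)";
        // quoteReplacement keeps '$' and '\' in `to` from being read as group references or escapes
        return line.replaceAll(regex, Matcher.quoteReplacement(to));
    }

    public static void main(String[] args) {
        String line = "add, $temp4, $temp40, 42";
        System.out.println(replaceWholeToken(line, "$temp4", "register1")); // add, register1, $temp40, 42
        System.out.println(replaceWholeToken(line, "$temp4", "$r1"));       // add, $r1, $temp40, 42
    }
}
```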
Matcher.quoteReplacement() is for the replacement string (to), not the regex (from). To include a string literal in the regex, use Pattern.quote():
from = Pattern.quote(from);
$ has special meaning in regex (it means "end of input"). To remove any special meaning from characters in your target, wrap it in regex quote/unquote expressions \Q...\E. Also, because $ is not a "word" character, the word boundary won't work, so use lookarounds instead:
line = line.replaceAll("(?<!\\S)\\Q" + from + "\\E(?![^ ,])", to);
Normally, Pattern.quote is the way to go to escape characters that may be specially interpreted by the regex engine.
However, the regular expression is still incorrect, because there is no word boundary before the $ in line; space and $ are both non-word characters. You need to place the word boundary after the $ character. There is no need for Pattern.quote here, because you're escaping things yourself.
String from = "\\$\\btemp4\\b";
Or more simply, because you know there is a word boundary between $ and temp4 already:
String from = "\\$temp4\\b";
The from variable can be constructed from the expression to replace. If from has "$temp4", then you can escape the dollar sign and add a word boundary.
from = "\\" + from + "\\b";
Output:
add, register1, $temp40, 42
I am writing code to detect bad keywords in a file. Here are the steps that I follow:
Tokenize using StreamTokenizer
Use pattern matcher to find the matches
while(streamTokenizer.nextToken() != StreamTokenizer.TT_EOF){
if(streamTokenizer.ttype == StreamTokenizer.TT_WORD) {
String token = streamTokenizer.sval.trim().replaceAll("\\\\n", "");
final Matcher matcher = badKeywordPattern.matcher(token);
if(matcher.find()) { // bad tokens found
return true;
}
}
}
String token = streamTokenizer.sval.trim().replaceAll("\\\\n", "") is done to match token spanning multiple lines with \. Example:
bad\
token
However the replace is not working. Any suggestions? Any other ways to do this?
Assuming you want to remove all \ placed at end of the line, along with line separator you could use replaceAll("\\\\\\R","").
To represent \ in regex (which is what replaceAll uses) we need to escape it with another \, which leaves us with \\. But since \ is also special in String literals we need to escape each of them again with another backslash which leaves us with "\\\\"
Since Java 8 we can use \R (which needs to be written as "\\R" since \ requires escaping) to represent line separators like \r \n or \r\n pair.
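To make the escaping levels concrete, here is a small sketch (the class and method names are invented for the example, and it needs Java 8 or later for \R). It also shows what the pattern from the question actually matches: a backslash followed by the letter n, not a backslash followed by a line break.

```java
class LineContinuation {

    // Remove a backslash that is immediately followed by a line break, joining the two lines.
    static String joinContinuedLines(String text) {
        // The Java literal "\\\\\\R" is the regex \\\R: a literal backslash, then any line break
        return text.replaceAll("\\\\\\R", "");
    }

    public static void main(String[] args) {
        System.out.println(joinContinuedLines("bad\\\ntoken"));    // badtoken
        System.out.println(joinContinuedLines("bad\\\r\ntoken"));  // badtoken
        // "\\\\n" is the regex \\n: it only matches a backslash followed by the letter n
        System.out.println("bad\\ntoken".replaceAll("\\\\n", "")); // badtoken
        System.out.println("bad\\\ntoken".replaceAll("\\\\n", "").equals("bad\\\ntoken")); // true (unchanged)
    }
}
```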
If I understand correctly, you do not want to use regex (which is what String.replaceAll does), just do literal string replacement with String.replace, and use one fewer backslash:
String token = streamTokenizer.sval.trim().replace("\\\n", "");
Based on @Pshemo's answer, which shows how \ and \n are represented in regex, you could do it like this:
String[] tkns = streamTokenizer.sval.trim().split("\\\\\\R"); // yourString = "bad\\\ntoken"
StringBuffer token= new StringBuffer();
for (String tkn : tkns)
{
token.append(tkn);
//System.out.println(tkn);
}
//final Matcher matcher = badKeywordPattern.matcher(token)
I want to remove \\\" from a string using java. I have tried with the code mentioned below but I could not get the expected result.
str.replaceAll("\\\"","");
input string:
{\"name\":\"keyword\",\"value\":\"\\\"duck''s\\\"\",\"compareVal\":\"contains\"}
expected string:
{\"name\":\"keyword\",\"value\":\"duck''s\",\"compareVal\":\"contains\"}
Use replace():
str = str.replace("\\\\\"", "");
replaceAll() uses regex for its search term (which would require a more complex string literal), but you don't need regex - your search term is plain text.
Note also that java string literals require each of your search characters to be escaped (by a leading backslash).
str = str.replace("\\\\\"", "");
Explanation:
First \\ => an escaped '\'
Second \\ => an escaped '\'
\" => an escaped '"'
Because \ and " are reserved symbols you have to indicate you want to treat them as the symbol they are by escaping with \ before.
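A short sketch comparing the literal and the regex form, assuming the text to remove is two backslashes followed by a double quote (which is what the literal "\\\\\"" in the answers denotes). The class and method names are made up for the example.

```java
import java.util.regex.Pattern;

class RemoveEscapedQuote {

    // Literal search: the Java literal "\\\\\"" is the three characters \ \ "
    static String viaReplace(String s) {
        return s.replace("\\\\\"", "");
    }

    // Regex search: every backslash has to be escaped once more for the regex engine
    static String viaRegex(String s) {
        return s.replaceAll("\\\\\\\\\"", "");
    }

    // Or let Pattern.quote do the regex escaping of the literal text
    static String viaQuote(String s) {
        return s.replaceAll(Pattern.quote("\\\\\""), "");
    }

    public static void main(String[] args) {
        String s = "x\\\\\"y"; // runtime content: x \ \ " y
        System.out.println(viaReplace(s) + " " + viaRegex(s) + " " + viaQuote(s)); // xy xy xy
    }
}
```

Strings are immutable, so in each case the result has to be assigned or returned; calling replace without using the return value changes nothing.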
public static void main(String s[])
{
String inputString = "\\\"name\\\"";
String outputString = inputString.replace("\\", "").replace("\"","");
System.out.println("Output string is as following :" + outputString);
}
I'm trying to replace some text in a file, and the string contains a file path which requires some backslashes. Normally using "\\" works fine and produces a single \ in the output, but my current code is not outputting any backslashes:
String newConfig = readOld().replaceAll(readOld(),"[HKEY_CURRENT_USER\\Software\\xxxx\\xxxx\\Config]");
The "\" starts an escape sequence,
A character preceded by a backslash (\) is an escape sequence and has special meaning to the compiler.
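replaceAll then treats each backslash in the replacement string as an escape as well. So, (ludicrously perhaps), write four backslashes for each one you want: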
String old = readOld();
String newConfig = old.replaceAll(old,
"[HKEY_CURRENT_USER\\\\Software\\\\xxxx\\\\xxxx\\\\Config]");
Or,
String old = readOld();
char backSlash = '\\';
String newConfig = old.replaceAll(old,
"[HKEY_CURRENT_USER" + backSlash + backSlash + "Software"
+ backSlash + backSlash + "xxxx"
+ backSlash + backSlash + "xxxx"
+ backSlash + backSlash + "Config]");
You should use replace here, because the string returned by readOld() may contain special characters (e.g. +, *, .) that are reserved in regular expressions (replaceAll may throw an exception for an invalid regular expression).
String newConfig = readOld().replace(readOld(),"replacement");
Since it seems you are replacing the whole String, why not just assign the new String directly to newConfig?
From JavaDoc for replaceAll
Backslashes (\) and dollar signs ($) in the replacement
string may cause the results to be different than if it were being
treated as a literal replacement String
So either use \\\\ in the String (as suggested by Elliott Frinch) or use replace.
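A third option is Matcher.quoteReplacement, which escapes \ and $ in the replacement for you. The sketch below uses a made-up placeholder string instead of readOld(), and the class and method names are hypothetical.

```java
import java.util.regex.Matcher;

class ReplacementEscaping {

    // Two backslashes in the literal are one backslash at runtime
    static final String KEY = "[HKEY_CURRENT_USER\\Software\\xxxx\\xxxx\\Config]";

    // replaceAll treats '\' in the replacement as an escape, so the backslashes are dropped
    static String naive(String text, String regex) {
        return text.replaceAll(regex, KEY);
    }

    // quoteReplacement escapes '\' and '$' so the replacement is inserted literally
    static String quoted(String text, String regex) {
        return text.replaceAll(regex, Matcher.quoteReplacement(KEY));
    }

    public static void main(String[] args) {
        String old = "PLACEHOLDER"; // stands in for the text being replaced
        System.out.println(naive(old, "PLACEHOLDER"));        // backslashes are missing
        System.out.println(quoted(old, "PLACEHOLDER"));       // [HKEY_CURRENT_USER\Software\xxxx\xxxx\Config]
        System.out.println(old.replace("PLACEHOLDER", KEY));  // same result, no regex involved
    }
}
```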
I have a string that contains some text followed by a blank line. What's the best way to keep the part with text, but remove the whitespace newline from the end?
Use String.trim() method to get rid of whitespaces (spaces, new lines etc.) from the beginning and end of the string.
String trimmedString = myString.trim();
String.replaceAll("[\n\r]", "");
This Java code does exactly what is asked in the title of the question, that is "remove newlines from beginning and end of a string-java":
String.replaceAll("^[\n\r]", "").replaceAll("[\n\r]$", "")
Remove newlines only from the end of the line:
String.replaceAll("[\n\r]$", "")
Remove newlines only from the beginning of the line:
String.replaceAll("^[\n\r]", "")
tl;dr
String cleanString = dirtyString.strip() ; // Call new `String::strip` method.
String::strip…
The old String::trim method has a strange definition of whitespace.
As discussed here, Java 11 adds new strip… methods to the String class. These use a more Unicode-savvy definition of whitespace. See the rules of this definition in the class JavaDoc for Character::isWhitespace.
Example code.
String input = " some Thing ";
System.out.println("before->>"+input+"<<-");
input = input.strip();
System.out.println("after->>"+input+"<<-");
Or you can strip just the leading or just the trailing whitespace.
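A brief sketch of the one-sided variants, assuming Java 11 or later; the class name and sample strings are invented for the example. It also shows one case where trim and strip disagree.

```java
class StripDemo {
    public static void main(String[] args) {
        String input = "\n  text\n\n";

        // Only the leading whitespace (including the newline) is removed
        System.out.println("[" + input.stripLeading() + "]");
        // Only the trailing newlines are removed
        System.out.println("[" + input.stripTrailing() + "]");

        // EM SPACE (U+2003) counts as whitespace for strip(), but trim() only
        // removes code points up to U+0020 and leaves it in place
        String emSpaced = "\u2003text\u2003";
        System.out.println(emSpaced.trim().length());   // 6
        System.out.println(emSpaced.strip().length());  // 4
    }
}
```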
You do not mention exactly what code point(s) make up your newlines. I imagine your newline is likely included in this list of code points targeted by strip:
It is a Unicode space character (SPACE_SEPARATOR, LINE_SEPARATOR, or PARAGRAPH_SEPARATOR) but is not also a non-breaking space ('\u00A0', '\u2007', '\u202F').
It is '\t', U+0009 HORIZONTAL TABULATION.
It is '\n', U+000A LINE FEED.
It is '\u000B', U+000B VERTICAL TABULATION.
It is '\f', U+000C FORM FEED.
It is '\r', U+000D CARRIAGE RETURN.
It is '\u001C', U+001C FILE SEPARATOR.
It is '\u001D', U+001D GROUP SEPARATOR.
It is '\u001E', U+001E RECORD SEPARATOR.
It is '\u001F', U+001F UNIT SEPARATOR.
If your string is potentially null, consider using StringUtils.trim() - the null-safe version of String.trim().
If you only want to remove line breaks (not spaces or tabs) at the beginning and end of a String (not in between), then you can use this approach:
Use a regular expression to remove carriage returns (\\r) and line feeds (\\n) from the beginning (^) and end ($) of a string:
s = s.replaceAll("(^[\\r\\n]+|[\\r\\n]+$)", "")
Complete Example:
public class RemoveLineBreaks {
public static void main(String[] args) {
var s = "\nHello world\nHello everyone\n";
System.out.println("before: >"+s+"<");
s = s.replaceAll("(^[\\r\\n]+|[\\r\\n]+$)", "");
System.out.println("after: >"+s+"<");
}
}
It outputs:
before: >
Hello world
Hello everyone
<
after: >Hello world
Hello everyone<
I'm going to add an answer to this as well because, while I had the same question, the provided answer did not suffice. Given some thought, I realized that this can be done very easily with a regular expression.
To remove newlines from the beginning:
// Trim left
String[] a = "\n\nfrom the beginning\n\n".split("^\\n+", 2);
System.out.println("-" + (a.length > 1 ? a[1] : a[0]) + "-");
and end of a string:
// Trim right
String z = "\n\nfrom the end\n\n";
System.out.println("-" + z.split("\\n+$", 2)[0] + "-");
I'm certain that this is not the most performance efficient way of trimming a string. But it does appear to be the cleanest and simplest way to inline such an operation.
Note that the same method can be done to trim any variation and combination of characters from either end as it's a simple regex.
Try this
function replaceNewLine(str) {
return str.replace(/[\n\r]/g, "");
}
String trimStartEnd = "\n TestString1 linebreak1\nlinebreak2\nlinebreak3\n TestString2 \n";
System.out.println("Original String : [" + trimStartEnd + "]");
System.out.println("-----------------------------");
System.out.println("Result String : [" + trimStartEnd.replaceAll("^(\\r\\n|[\\n\\x0B\\x0C\\r\\u0085\\u2028\\u2029])|(\\r\\n|[\\n\\x0B\\x0C\\r\\u0085\\u2028\\u2029])$", "") + "]");
Start of a string = ^ ,
End of a string = $ ,
regex combination = | ,
Linebreak = \r\n|[\n\x0B\x0C\r\u0085\u2028\u2029]
Another elegant solution.
String myString = "\nLogbasex\n";
myString = org.apache.commons.lang3.StringUtils.strip(myString, "\n");
For anyone else looking for answer to the question when dealing with different linebreaks:
string.replaceAll("(\n|\r|\r\n)$", ""); // Java 7
string.replaceAll("\\R$", ""); // Java 8
This should remove exactly the last line break, preserve all other whitespace in the string, and work with Unix (\n), Windows (\r\n) and old Mac (\r) line breaks: https://stackoverflow.com/a/20056634, https://stackoverflow.com/a/49791415. "\\R" is the linebreak matcher introduced in Java 8 in the Pattern class: https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html
This passes these tests:
// Windows:
value = "\r\n test \r\n value \r\n";
assertEquals("\r\n test \r\n value ", value.replaceAll("\\R$", ""));
// Unix:
value = "\n test \n value \n";
assertEquals("\n test \n value ", value.replaceAll("\\R$", ""));
// Old Mac:
value = "\r test \r value \r";
assertEquals("\r test \r value ", value.replaceAll("\\R$", ""));
String text = readFileAsString("textfile.txt");
text = text.replace("\n", "").replace("\r", "");