I have a string that contains some text followed by a blank line. What's the best way to keep the part with text, but remove the whitespace newline from the end?
Use the String.trim() method to remove whitespace (spaces, newlines, etc.) from the beginning and end of the string.
String trimmedString = myString.trim();
String.replaceAll("[\n\r]", "");
This Java code does exactly what is asked in the title of the question, that is "remove newlines from beginning and end of a string-java":
String.replaceAll("^[\n\r]", "").replaceAll("[\n\r]$", "")
Remove newlines only from the end of the line:
String.replaceAll("[\n\r]$", "")
Remove newlines only from the beginning of the line:
String.replaceAll("^[\n\r]", "")
tl;dr
String cleanString = dirtyString.strip() ; // Call new `String::strip` method.
String::strip…
The old String::trim method has a strange definition of whitespace.
As discussed here, Java 11 adds new strip… methods to the String class. These use a more Unicode-savvy definition of whitespace. See the rules of this definition in the class JavaDoc for Character::isWhitespace.
Example code.
String input = " some Thing ";
System.out.println("before->>"+input+"<<-");
input = input.strip();
System.out.println("after->>"+input+"<<-");
Or you can strip just the leading or just the trailing whitespace.
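A minimal sketch of the two one-sided variants, assuming Java 11 or later. The class and method names (StripEnds, leading, trailing) are made up for illustration; only stripLeading() and stripTrailing() come from the String class.

```java
class StripEnds {
    // Removes leading whitespace only (Java 11+).
    static String leading(String s) {
        return s.stripLeading();
    }

    // Removes trailing whitespace only (Java 11+), e.g. a final blank line.
    static String trailing(String s) {
        return s.stripTrailing();
    }

    public static void main(String[] args) {
        String input = "\n  some Thing \n";
        System.out.println(">" + leading(input) + "<");
        System.out.println(">" + trailing(input) + "<");
    }
}
```

For the original question (text followed by a blank line), stripTrailing() is probably the closer fit, since it leaves any leading indentation alone.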
You do not mention exactly what code point(s) make up your newlines. I imagine your newline is likely included in this list of code points targeted by strip:
It is a Unicode space character (SPACE_SEPARATOR, LINE_SEPARATOR, or PARAGRAPH_SEPARATOR) but is not also a non-breaking space ('\u00A0', '\u2007', '\u202F').
It is '\t', U+0009 HORIZONTAL TABULATION.
It is '\n', U+000A LINE FEED.
It is '\u000B', U+000B VERTICAL TABULATION.
It is '\f', U+000C FORM FEED.
It is '\r', U+000D CARRIAGE RETURN.
It is '\u001C', U+001C FILE SEPARATOR.
It is '\u001D', U+001D GROUP SEPARATOR.
It is '\u001E', U+001E RECORD SEPARATOR.
It is '\u001F', U+0
If your string is potentially null, consider using StringUtils.trim() - the null-safe version of String.trim().
If you only want to remove line breaks (not spaces or tabs) at the beginning and end of a String (not in between), then you can use this approach:
Use a regular expression to remove carriage returns (\\r) and line feeds (\\n) from the beginning (^) and end ($) of a string:
s = s.replaceAll("(^[\\r\\n]+|[\\r\\n]+$)", "")
Complete Example:
public class RemoveLineBreaks {
public static void main(String[] args) {
var s = "\nHello world\nHello everyone\n";
System.out.println("before: >"+s+"<");
s = s.replaceAll("(^[\\r\\n]+|[\\r\\n]+$)", "");
System.out.println("after: >"+s+"<");
}
}
It outputs:
before: >
Hello world
Hello everyone
<
after: >Hello world
Hello everyone<
I'm going to add an answer to this as well because, while I had the same question, the provided answer did not suffice. Given some thought, I realized that this can be done very easily with a regular expression.
To remove newlines from the beginning:
// Trim left
String[] a = "\n\nfrom the beginning\n\n".split("^\\n+", 2);
System.out.println("-" + (a.length > 1 ? a[1] : a[0]) + "-");
and end of a string:
// Trim right
String z = "\n\nfrom the end\n\n";
System.out.println("-" + z.split("\\n+$", 2)[0] + "-");
I'm certain that this is not the most performance efficient way of trimming a string. But it does appear to be the cleanest and simplest way to inline such an operation.
Note that the same approach can be used to trim any combination of characters from either end, since it's just a simple regex.
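For comparison, here is a rough sketch of a loop-based version that avoids regex altogether, which may matter if performance is a concern. It is not from the answer above; NewlineTrimmer, isLineBreak and trimNewlines are hypothetical names, and it only treats \n and \r as line breaks.

```java
class NewlineTrimmer {
    // Assumption: only '\n' and '\r' count as line breaks here.
    static boolean isLineBreak(char c) {
        return c == '\n' || c == '\r';
    }

    // Returns s without leading and trailing line breaks; inner ones are kept.
    static String trimNewlines(String s) {
        int start = 0;
        int end = s.length();
        while (start < end && isLineBreak(s.charAt(start))) {
            start++;
        }
        while (end > start && isLineBreak(s.charAt(end - 1))) {
            end--;
        }
        return s.substring(start, end);
    }

    public static void main(String[] args) {
        System.out.println("-" + trimNewlines("\n\nfrom both ends\n\n") + "-");
    }
}
```

To trim a different set of characters, only isLineBreak would need to change.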
Try this
function replaceNewLine(str) {
return str.replace(/[\n\r]/g, "");
}
String trimStartEnd = "\n TestString1 linebreak1\nlinebreak2\nlinebreak3\n TestString2 \n";
System.out.println("Original String : [" + trimStartEnd + "]");
System.out.println("-----------------------------");
System.out.println("Result String : [" + trimStartEnd.replaceAll("^(\\r\\n|[\\n\\x0B\\x0C\\r\\u0085\\u2028\\u2029])|(\\r\\n|[\\n\\x0B\\x0C\\r\\u0085\\u2028\\u2029])$", "") + "]");
Start of a string = ^ ,
End of a string = $ ,
regex combination = | ,
Linebreak = \r\n|[\n\x0B\x0C\r\u0085\u2028\u2029]
Another elegant solution.
String myString = "\nLogbasex\n";
myString = org.apache.commons.lang3.StringUtils.strip(myString, "\n");
For anyone else looking for answer to the question when dealing with different linebreaks:
string.replaceAll("(\n|\r|\r\n)$", ""); // Java 7
string.replaceAll("\\R$", ""); // Java 8
This should remove exactly the last line break, preserve all other whitespace in the string, and work with Unix (\n), Windows (\r\n) and old Mac (\r) line breaks: https://stackoverflow.com/a/20056634, https://stackoverflow.com/a/49791415. "\\R" is the linebreak matcher introduced in Java 8 in the Pattern class: https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html
This passes these tests:
// Windows:
value = "\r\n test \r\n value \r\n";
assertEquals("\r\n test \r\n value ", value.replaceAll("\\R$", ""));
// Unix:
value = "\n test \n value \n";
assertEquals("\n test \n value ", value.replaceAll("\\R$", ""));
// Old Mac:
value = "\r test \r value \r";
assertEquals("\r test \r value ", value.replaceAll("\\R$", ""));
String text = readFileAsString("textfile.txt");
text = text.replace("\n", "").replace("\r", "");
Related
I have a string with \r\n, \r, \n or \" characters in it. How can I replace them faster?
What I already have is:
String s = "Kerner\\r\\n kyky\\r hihi\\n \\\"";
System.out.println(s.replace("\\r\\n", "\n").replace("\\r", "").replace("\\n", "").replace("\\", ""));
But my code does not look beautiful enough.
I found on the Internet something like:
replace("\\r\\n|\\r|\\n|\\", "")
I tried that, but it didn't work.
You can wrap it in a method: put \r\n, \n and \r in a list, iterate over the list, replace every such sequence, and return the modified string.
public String replaceMultipleSubstrings(String original, List<String> mylist){
String tmp = original;
for(String str: mylist){
tmp = tmp.replace(str, "");
}
return tmp;
}
Test:
mylist.add("\\r");
mylist.add("\\r\\n");
mylist.add("\\n");
mylist.add("\\"); // add back slash
System.out.println("original:" + s);
String x = new Main().replaceMultipleSubstrings(s, mylist);
System.out.println("modified:" + x);
Output:
original:Kerner\r\n kyky\r hihi\n \"
modified:Kerner kyky hihi "
I don't know whether your current replacement logic is correct, but as I read it, each of \r, \n, or \r\n is replaced with an empty string, and so is the backslash. If so, you can try the following regex with replaceAll(). Because your string contains literal backslash sequences rather than real control characters, each backslash has to be escaped for the regex as well:
String s = "Kerner\\r\\n kyky\\r hihi\\n \\\"";
System.out.println(s.replaceAll("\\\\r|\\\\n|\\\\", ""));
One problem I saw with your attempt is that you are calling replace(), not replaceAll(). replace() does replace every occurrence, but it treats its first argument as a literal string rather than a regex, so the | alternation is never interpreted.
String.replaceAll() can be used. In your question you tried String.replace(), which does not interpret regular expressions, only literal target strings.
You also need to escape the \\ again, i.e. \\\\ instead of \\
String s = "Kerner\\r\\n kyky\\r hihi\\n \\\"";
System.out.println(s.replaceAll("\\\\r|\\\\n|\\\\\"", ""));
Output
Kerner kyky hihi
Note the differences between String.replaceAll() and String.replace()
String.replaceAll()
Replaces each substring of this string that matches the given regular
expression with the given replacement.
String.replace()
Replaces each substring of this string that matches the literal target
sequence with the specified literal replacement sequence.
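A small sketch of that difference, using a dot as the target because it is a regex metacharacter. The class and method names are invented for the example; the String and Pattern calls are standard.

```java
import java.util.regex.Pattern;

class ReplaceVsReplaceAll {
    // replace(): the target is a literal, so only real dots are replaced.
    static String literal(String s) {
        return s.replace(".", "-");
    }

    // replaceAll(): the target is a regex, and "." matches any character.
    static String regex(String s) {
        return s.replaceAll(".", "-");
    }

    // replaceAll() with a quoted pattern behaves like the literal version.
    static String quotedRegex(String s) {
        return s.replaceAll(Pattern.quote("."), "-");
    }

    public static void main(String[] args) {
        System.out.println(literal("a.b.c"));      // a-b-c
        System.out.println(regex("a.b.c"));        // -----
        System.out.println(quotedRegex("a.b.c"));  // a-b-c
    }
}
```

Both methods replace every occurrence; the only difference is how the first argument is interpreted.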
Use a regular expression if you want to do all the replaces in one go.
http://www.javamex.com/tutorials/regular_expressions/search_replace.shtml
I have seen "," replaced with "." by using ".$"|",$", but this logic does not work with letters.
I need to replace the last letter of every word in the string EXAMPLE_TEST with another letter, using Java.
This is my code:
Pattern replace = Pattern.compile("n$");//here got the real problem
matcher2 = replace.matcher(EXAMPLE_TEST);
EXAMPLE_TEST=matcher2.replaceAll("k");
I also tried "//n$", "\n$", etc.
Please help me find a solution.
input text=>njan ayman
output text=> njak aymak
Instead of the end of string $ anchor, use a word boundary \b
String s = "njan ayman";
s = s.replaceAll("n\\b", "k");
System.out.println(s); //=> "njak aymak"
You can use lookahead and group matching:
String EXAMPLE_TEST = "njan ayman";
String s = EXAMPLE_TEST.replaceAll("(n)(?=\\s|$)", "k");
System.out.println("s = " + s); // prints: s = njak aymak
Explanation:
(n) - the matched word character
(?=\\s|$) - which is followed by a space or at the end of the line (lookahead)
The above is only an example. If you want to switch every comma with a period, the middle line should be changed to:
s = s.replaceAll("(,)(?=\\s|$)", "\\.");
Here's how I would set it up:
(?=.\b)\w
Which in Java would need to be escaped as following:
(?=.\\b)\\w
It translates to something like "a word character (\w) that the lookahead (?=...) confirms is a single character (.) directly followed by a word boundary (\b)", that is, the last character of a word.
String s = "njan ayman aowkdwo wdonwan. wadawd,.. wadwdawd;";
s = s.replaceAll("(?=.\\b)\\w", "");
System.out.println(s); //nja ayma aowkdw wdonwa. wadaw,.. wadwdaw;
This removes the last character of all words, but leaves following non-alphanumeric characters. You can specify only specific characters to remove/replace by changing the . to something else.
However, the other answers are perfectly good and might achieve exactly what you are looking for.
// "n" and "k" stand in for the old and new letters
if (word.endsWith("n")) {
word = word.substring(0, word.length() - 1) + "k";
}
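The endsWith idea can be applied to each word in turn. The following is only a sketch: it assumes words are separated by single spaces, and LastLetterSwap and swapLast are hypothetical names.

```java
class LastLetterSwap {
    // Replaces the last letter of every space-separated word when it equals oldLetter.
    static String swapLast(String text, char oldLetter, char newLetter) {
        // The -1 limit keeps trailing empty strings, so trailing spaces survive.
        String[] words = text.split(" ", -1);
        for (int i = 0; i < words.length; i++) {
            String w = words[i];
            if (!w.isEmpty() && w.charAt(w.length() - 1) == oldLetter) {
                words[i] = w.substring(0, w.length() - 1) + newLetter;
            }
        }
        return String.join(" ", words);
    }

    public static void main(String[] args) {
        System.out.println(swapLast("njan ayman", 'n', 'k')); // njak aymak
    }
}
```

Unlike the \b regex, this does not treat punctuation as a word end, so "ayman." would be left unchanged.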
I had a look at other stackoverflow questions and couldn't find one that asked the same question, so here it is:
How do you match the first and last characters of a string (can be multi-line or empty).
So for example:
String = "this is a simple sentence"
Note that the string includes the beginning and ending quotation marks.
How do I match the first and last characters when the string begins and ends with a quotation mark (")?
I tried:
^"|$" and \A"\Z"
but these do not produce the desired result.
Thanks for your help in advance :)
Is this what you are looking for?
String input = "\"this is a simple sentence\"";
String result = input.replaceFirst("(?s)^\"(.*)\"$", " $1 ");
This will replace the first and last character of the input string with spaces if it starts and ends with ". It will also work across multiple lines since the DOTALL flag is specified by (?s).
The regex that matches the whole input is ".*". In Java, it looks like this:
String regex = "\".*\"";
System.out.println("\"this is a simple sentence\"".matches(regex)); // true
System.out.println("this is a simple sentence".matches(regex)); // false
System.out.println("this is a simple sentence\"".matches(regex)); // false
If you want to remove the quotes, use this:
String input = "\"this is a simple sentence\"";
input = input.replaceAll("(^\"|\"$)", ""); // this is a simple sentence (without any quotes)
If you want this to work over multiple lines, use this:
String input = "\"this is a simple sentence\"\n\"and another sentence\"";
System.out.println(input + "\n");
input = input.replaceAll("(?m)(^\"|\"$)", "");
System.out.println(input);
which produces output:
"this is a simple sentence"
"and another sentence"
this is a simple sentence
and another sentence
Explanation of regex (?m)(^"|"$):
(?m) means "Caret and dollar match after and before newlines for the remainder of the regular expression"
(^"|"$) means ^" OR "$, which means "start of line then a double quote" OR "double quote then end of line"
Why not use the simple logic of getting the first and last characters based on charAt method of String? Place a few checks for empty/incomplete strings and you should be done.
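A sketch of what that charAt approach could look like, with the checks for null, empty and one-character strings. QuoteCheck, isQuoted and unquote are names made up for this example.

```java
class QuoteCheck {
    // True only if s has at least two characters and both ends are double quotes.
    static boolean isQuoted(String s) {
        return s != null
                && s.length() >= 2
                && s.charAt(0) == '"'
                && s.charAt(s.length() - 1) == '"';
    }

    // Strips the surrounding quotes if present; otherwise returns s unchanged.
    static String unquote(String s) {
        return isQuoted(s) ? s.substring(1, s.length() - 1) : s;
    }

    public static void main(String[] args) {
        System.out.println(unquote("\"this is a simple sentence\""));
    }
}
```

Since no regex is involved, multi-line strings need no special flag.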
String regexp = "(?s)\".*\"";
String data = "\"This is some\n\ndata\"";
Matcher m = Pattern.compile(regexp).matcher(data);
if (m.find()) {
System.out.println("Match starts at " + m.start() + " and ends at " + m.end());
}
How to remove duplicate white spaces (including tabs, newlines, spaces, etc...) in a string using Java?
Like this:
yourString = yourString.replaceAll("\\s+", " ");
For example
System.out.println("lorem ipsum dolor \n sit.".replaceAll("\\s+", " "));
outputs
lorem ipsum dolor sit.
What does that \s+ mean?
\s+ is a regular expression. \s matches a space, tab, new line, carriage return, form feed or vertical tab, and + says "one or more of those". Thus the above code will collapse all "whitespace substrings" longer than one character, with a single space character.
Source: Java: Removing duplicate white spaces in strings
You can use the regex
(\s)\1
and
replace it with $1.
Java code:
str = str.replaceAll("(\\s)\\1","$1");
If the input is "foo\t\tbar " you'll get "foo\tbar " as output. But if the input is "foo\t bar" it will remain unchanged because it does not have any consecutive identical whitespace characters.
If you treat all the whitespace characters(space, vertical tab, horizontal tab, carriage return, form feed, new line) as space then you can use the following regex to replace any number of consecutive white space with a single space:
str = str.replaceAll("\\s+"," ");
But if you want to replace two consecutive white space with a single space you should do:
str = str.replaceAll("\\s{2}"," ");
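To make the difference between the three patterns concrete, here is a small sketch that runs each of them on the same input (three tabs in a row). The class and method names are hypothetical.

```java
class WhitespaceVariants {
    // (\s)\1 : two identical whitespace characters become one of that kind.
    static String identicalPairs(String s) {
        return s.replaceAll("(\\s)\\1", "$1");
    }

    // \s+ : any run of whitespace becomes a single space.
    static String anyRun(String s) {
        return s.replaceAll("\\s+", " ");
    }

    // \s{2} : each block of exactly two whitespace characters becomes a space.
    static String exactlyTwo(String s) {
        return s.replaceAll("\\s{2}", " ");
    }

    public static void main(String[] args) {
        String in = "foo\t\t\tbar";
        System.out.println(identicalPairs(in).equals("foo\t\tbar")); // true
        System.out.println(anyRun(in).equals("foo bar"));            // true
        System.out.println(exactlyTwo(in).equals("foo \tbar"));      // true
    }
}
```

Only the \s+ version fully collapses a run of odd length in a single pass.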
String str = " Text with multiple spaces ";
str = org.apache.commons.lang3.StringUtils.normalizeSpace(str);
// str = "Text with multiple spaces"
Try this - You have to import java.util.regex.*;
Pattern pattern = Pattern.compile("\\s+");
Matcher matcher = pattern.matcher(string);
boolean check = matcher.find();
String str = matcher.replaceAll(" ");
Where string is your string on which you need to remove duplicate white spaces
Hi, the fastest (but not the prettiest) way I found is:
while (cleantext.indexOf("  ") != -1)
cleantext = StringUtils.replace(cleantext, "  ", " ");
This runs pretty fast on Android, in contrast to a regex.
Though it is too late, I have found a better solution (that works for me) that will replace all consecutive same type white spaces with one white space of its type. That is:
Hello!\n\n\nMy World
will be
Hello!\nMy World
Notice there are still leading and trailing white spaces. So my complete solution is:
str = str.trim().replaceAll("(\\s)+", "$1");
Here, trim() removes all leading and trailing whitespace. (\\s) captures a single whitespace character (such as ' ', '\n', or '\t') in group #1, and the + sign matches one or more repetitions of it, so (\\s)+ matches any run of one or more whitespace characters. $1 replaces the whole run with the contents of group #1, which is the last whitespace character matched in that run. The above solution will change text like this:
Hello!\n\n\nMy World
will be
Hello!\nMy World
I have not found my above solution here so I have posted it.
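If the goal is strictly "same type" collapsing, one possible variant uses a back-reference so that only repeats of the identical whitespace character are merged. This is a sketch and not the answer's own code; SameTypeCollapse and both method names are assumptions.

```java
class SameTypeCollapse {
    // (\s)\1+ : a whitespace character followed by one or more copies of itself.
    static String sameType(String s) {
        return s.replaceAll("(\\s)\\1+", "$1");
    }

    // (\s)+ : any mixed run; group 1 ends up holding the last character of the run.
    static String mixedRun(String s) {
        return s.replaceAll("(\\s)+", "$1");
    }

    public static void main(String[] args) {
        System.out.println(sameType("Hello!\n\n\nMy  World"));
    }
}
```

For a mixed run such as a space followed by a newline, sameType leaves both characters in place, while mixedRun keeps only the newline.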
If you want to get rid of all leading and trailing extraneous whitespace then you want to do something like this:
// \\A = Start of input boundary
// \\z = End of input boundary
string = string.replaceAll("\\A\\s+(.*?)\\s+\\z", "$1");
Then you can remove the duplicates using the other strategies listed here:
string = string.replaceAll("\\s+"," ");
You can also try using StringTokenizer, for any space, tab, newline, and so on. A simple way is:
String s = "Your Text Here";
StringTokenizer st = new StringTokenizer( s, " " );
while(st.hasMoreTokens())
{
System.out.print(st.nextToken());
}
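A variation on the tokenizer idea, sketched under the assumption that the words should be rejoined with single spaces. It uses the no-delimiter constructor, whose default delimiters are space, tab, newline, carriage return and form feed. TokenizerCollapse and collapse are hypothetical names.

```java
import java.util.StringJoiner;
import java.util.StringTokenizer;

class TokenizerCollapse {
    // Splits on the default whitespace delimiters and rejoins with one space.
    static String collapse(String s) {
        StringTokenizer st = new StringTokenizer(s);
        StringJoiner out = new StringJoiner(" ");
        while (st.hasMoreTokens()) {
            out.add(st.nextToken());
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(collapse("  Your \t Text\n\nHere "));
    }
}
```

Note that this also drops leading and trailing whitespace, since empty tokens are never returned.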
This can be done in three steps:
Convert the string into a character array (toCharArray)
Apply a for loop to the character array
Then apply the string replace function (replace("string you want to replace", "original string"));
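One way those steps might look in Java. This sketch swaps the final replace call for a StringBuilder that is filled while looping, which is an assumption on my part rather than what the steps literally say; CharArrayCollapse and collapse are made-up names.

```java
class CharArrayCollapse {
    // Walks the characters once and emits a single space per whitespace run.
    static String collapse(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        boolean previousWasWhitespace = false;
        for (char c : s.toCharArray()) {
            boolean isWhitespace = Character.isWhitespace(c);
            if (!isWhitespace) {
                sb.append(c);
            } else if (!previousWasWhitespace) {
                sb.append(' '); // first whitespace of a run
            }
            previousWasWhitespace = isWhitespace;
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(collapse("a \t\n b  c"));
    }
}
```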
For example I'm extracting a text String from a text file and I need those words to form an array. However, when I do all that some words end with comma (,) or a full stop (.) or even have brackets attached to them (which is all perfectly normal).
What I want to do is to get rid of those characters. I've been trying to do that using those predefined String methods in Java but I just can't get around it.
Reassign the variable to a substring:
s = s.substring(0, s.length() - 1)
Also an alternative way of solving your problem: you might also want to consider using a StringTokenizer to read the file and set the delimiters to be the characters you don't want to be part of words.
Use:
String str = "whatever";
str = str.replaceAll("[,.]", "");
replaceAll takes a regular expression. This:
[,.]
...looks for each comma and/or period.
To remove the last character do as Mark Byers said
s = s.substring(0, s.length() - 1);
Additionally, another way to remove the characters you don't want would be to use the .replace(oldCharacter, newCharacter) method.
as in:
s = s.replace(",","");
and
s = s.replace(".","");
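If only the punctuation attached to the start or end of a word should go (commas, full stops, brackets), a regex anchored at both ends is one option. This is a sketch; PunctuationTrim and trim are hypothetical names, and \p{Punct} covers ASCII punctuation only.

```java
class PunctuationTrim {
    // Removes ASCII punctuation from both ends of a word, keeping inner characters.
    static String trim(String word) {
        return word.replaceAll("^\\p{Punct}+|\\p{Punct}+$", "");
    }

    public static void main(String[] args) {
        System.out.println(trim("(hello),"));
        System.out.println(trim("don't"));
    }
}
```

Unlike replace(",", ""), this leaves punctuation inside a word, such as the apostrophe in "don't", untouched.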
You can't modify a String in Java. They are immutable. All you can do is create a new string that is substring of the old string, minus the last character.
In some cases a StringBuffer might help you instead.
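A minimal sketch of the mutable alternative, using StringBuilder (the unsynchronized sibling of StringBuffer). DropLastChar and dropLast are names invented for the example.

```java
class DropLastChar {
    // Removes the final character via a mutable buffer; empty input is returned as-is.
    static String dropLast(String s) {
        if (s.isEmpty()) {
            return s;
        }
        StringBuilder sb = new StringBuilder(s);
        sb.deleteCharAt(sb.length() - 1);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(dropLast("word,"));
    }
}
```

For a single removal, substring is simpler; a builder tends to pay off when many edits are made to the same text.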
The best method is what Mark Byers explains:
s = s.substring(0, s.length() - 1)
For example, if we want to replace a backslash (\) with a space (" ") using replaceAll, the following does not work:
String.replaceAll("\\", "");
or
String.replaceAll("\\$", ""); //if it is a path
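The backslash is an escape character both in Java string literals and in regular expressions, so a single literal backslash needs four backslashes in a replaceAll pattern. A sketch of two working forms follows; the class and method names are assumptions.

```java
class BackslashReplace {
    // replace() takes a literal target, so "\\" (one backslash) is enough.
    static String viaReplace(String s) {
        return s.replace("\\", " ");
    }

    // replaceAll() takes a regex, so the backslash must be escaped twice.
    static String viaReplaceAll(String s) {
        return s.replaceAll("\\\\", " ");
    }

    public static void main(String[] args) {
        String path = "C:\\temp\\file"; // the text C:\temp\file
        System.out.println(viaReplace(path));
        System.out.println(viaReplaceAll(path));
    }
}
```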
Note that word boundaries also depend on the Locale. I think the best way to do it is to use the standard java.text.BreakIterator. Here is an example from the java.sun.com tutorial.
import java.text.BreakIterator;
import java.util.Locale;
public static void main(String[] args) {
String text = "\n" +
"\n" +
"For example I'm extracting a text String from a text file and I need those words to form an array. However, when I do all that some words end with comma (,) or a full stop (.) or even have brackets attached to them (which is all perfectly normal).\n" +
"\n" +
"What I want to do is to get rid of those characters. I've been trying to do that using those predefined String methods in Java but I just can't get around it.\n" +
"\n" +
"Every help appreciated. Thanx";
BreakIterator wordIterator = BreakIterator.getWordInstance(Locale.getDefault());
extractWords(text, wordIterator);
}
static void extractWords(String target, BreakIterator wordIterator) {
wordIterator.setText(target);
int start = wordIterator.first();
int end = wordIterator.next();
while (end != BreakIterator.DONE) {
String word = target.substring(start, end);
if (Character.isLetterOrDigit(word.charAt(0))) {
System.out.println(word);
}
start = end;
end = wordIterator.next();
}
}
Source: http://java.sun.com/docs/books/tutorial/i18n/text/word.html
You can use replaceAll() method :
String.replaceAll(",", "");
String.replaceAll("\\.", "");
String.replaceAll("\\(", "");
etc..