Regex add space between all punctuation

I need to add spaces between all punctuation in a string.
// "Hello: World." -> "Hello : World ."
// "It's 9:00?" -> "It ' s 9 : 00 ?"
// "1.B,3.D!" -> "1 . B , 3 . D !"
I think a regex is the way to go: match all non-punctuation [a-zA-Z\\d]+, add a space before and/or after it, then extract the remainder by matching all punctuation [^a-zA-Z\\d]+.
But I don't know how to (recursively?) call this regex. Looking at the first example, the regex will only match the "Hello". I was thinking of just building a new string by continuously removing and appending the first instance of the matched regex, while the original string is not empty.
private String addSpacesBeforePunctuation(String s) {
StringBuilder builder = new StringBuilder();
final String nonpunctuation = "[a-zA-Z\\d]+";
final String punctuation = "[^a-zA-Z\\d]+";
String found;
while (!s.isEmpty()) {
// regex stuff goes here
found = ???; // found group from respective regex goes here
builder.append(found);
builder.append(" ");
s = s.replaceFirst(found, "");
}
return builder.toString().trim();
}
However, this doesn't feel like the right way to go... I think I'm overcomplicating things...

You can use lookarounds based regex using punctuation property \p{Punct} in Java:
str = str.replaceAll("(?<=\\S)(?:(?<=\\p{Punct})|(?=\\p{Punct}))(?=\\S)", " ");
(?<=\\S) asserts that the previous char is not whitespace
(?<=\\p{Punct}) asserts that the previous char is a punctuation char
(?=\\p{Punct}) asserts that the next char is a punctuation char
(?=\\S) asserts that the next char is not whitespace
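A minimal runnable sketch of the single-replaceAll approach, applied to the three examples from the question. The class and method names (PunctuationSpacing, addSpaces) are illustrative only. Keep in mind that \p{Punct} in Java matches ASCII punctuation only by default.

```java
class PunctuationSpacing {
    // Insert " " at each position between two non-whitespace characters
    // where the previous or the next character is ASCII punctuation.
    static String addSpaces(String str) {
        return str.replaceAll("(?<=\\S)(?:(?<=\\p{Punct})|(?=\\p{Punct}))(?=\\S)", " ");
    }

    public static void main(String[] args) {
        System.out.println(addSpaces("Hello: World.")); // Hello : World .
        System.out.println(addSpaces("It's 9:00?"));    // It ' s 9 : 00 ?
        System.out.println(addSpaces("1.B,3.D!"));      // 1 . B , 3 . D !
    }
}
```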

When you see a punctuation mark, you have four possibilities:
1. Punctuation is surrounded by spaces
2. Punctuation is preceded by a space
3. Punctuation is followed by a space
4. Punctuation is neither preceded nor followed by a space.
Here is code that does the replacement properly:
String ss = s
.replaceAll("(?<=\\S)\\p{Punct}", " $0")
.replaceAll("\\p{Punct}(?=\\S)", "$0 ");
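It uses two expressions: the first adds a space before any punctuation that is not already preceded by whitespace (case 3), and the second adds a space after any punctuation that is not already followed by whitespace (case 2). Since the expressions are applied one after the other, together they also take care of case 4. Case 1 requires no change.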
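The same two-pass idea as a small runnable class (names are illustrative), checked against the examples from the question:

```java
class TwoPassSpacing {
    static String addSpaces(String s) {
        return s
            // pass 1: put a space before punctuation that follows a non-whitespace char
            .replaceAll("(?<=\\S)\\p{Punct}", " $0")
            // pass 2: put a space after punctuation that precedes a non-whitespace char
            .replaceAll("\\p{Punct}(?=\\S)", "$0 ");
    }

    public static void main(String[] args) {
        System.out.println(addSpaces("Hello: World.")); // Hello : World .
        System.out.println(addSpaces("It's 9:00?"));    // It ' s 9 : 00 ?
        System.out.println(addSpaces("1.B,3.D!"));      // 1 . B , 3 . D !
    }
}
```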

How to remove spaces from string only if it occurs once between two words but not if it occurs thrice?

I am a beginner working on a diff-and-regenerate algorithm for Strings. I store the patch in a file, and I use that file to regenerate the new string from the old one. Although the code works, I have a problem with spaces.
I use replaceAll(" ", ""); to remove spaces. This is fine when the string is [char][space][char], but it causes a problem when it is like [space][space][space]: there I want the space to be kept (only one).
I thought of doing replaceAll("   ", " ");, but that would still leave the space in [char][space][char]. I am using a Scanner to scan through the string.
Is there a way to achieve this?
Input => Output
c => c
cc => cc
c c => cc
c  c => not possible, since each character is padded with one space
c   c => c c

We can also split the string wherever there is more than one whitespace character, then join the resulting array into a string using the Stream and Collector API.
Single spaces inside each part are removed with replaceAll() in a Stream#map operation:
import java.util.Arrays;
import java.util.stream.Collectors;
String test = " this   is a test of   space in string ";
// split wherever there are 2 or more whitespace characters (\\s{2,})
String[] arr = test.split("\\s{2,}");
String s = Arrays.stream(arr)
.map(str -> str.replaceAll(" ", ""))
.collect(Collectors.joining(" "));
System.out.println(s);
Output:
this isatestof spaceinstring

You could use lookarounds to do your replacement:
String newText = text
.replaceAll("(?<! ) (?! )", "")
.replaceAll(" +", " ");
The first replaceAll removes any space not surrounded by spaces; the second one replaces the remaining sequences of spaces by a single one.
Sequences of two or more spaces become a single space, and single spaces are removed.
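A quick way to check the two replaceAll calls against the table in the question. The class name is illustrative; the inputs are the question's "c c", "c   c" and "cc" cases, printed in brackets so spaces are visible.

```java
class SingleSpaceRemover {
    static String collapse(String text) {
        return text
            .replaceAll("(?<! ) (?! )", "") // remove a space with no space on either side
            .replaceAll(" +", " ");         // shrink each remaining run of spaces to one
    }

    public static void main(String[] args) {
        System.out.println("[" + collapse("c c") + "]");   // [cc]
        System.out.println("[" + collapse("c   c") + "]"); // [c c]
        System.out.println("[" + collapse("cc") + "]");    // [cc]
    }
}
```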
Lookarounds
A lookaround, in the context of regular expressions, is a collective term for lookbehinds and lookaheads. These are so-called zero-width assertions, which means they match a certain pattern but do not actually consume characters. There are positive and negative lookarounds.
A short example: the pattern Ira(?!q) matches the substring Ira, but only if it's not followed by a q. So if the input string is Iraq, it won't match, but if the input string is Iran, then the match is Ira.
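The Ira(?!q) example can be checked with java.util.regex. This sketch (class name illustrative) uses Matcher.find(), which looks for the pattern anywhere in the input:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaheadDemo {
    // Negative lookahead: match "Ira" only if the next character is not 'q'.
    private static final Pattern IRA = Pattern.compile("Ira(?!q)");

    static boolean containsIra(String input) {
        return IRA.matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(containsIra("Iraq")); // false
        System.out.println(containsIra("Iran")); // true
        Matcher m = IRA.matcher("Iran");
        if (m.find()) {
            System.out.println(m.group()); // Ira (the lookahead does not consume the 'n')
        }
    }
}
```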
More info:
https://www.regular-expressions.info/lookaround.html

If you want to replace any group of space by one you could use:
value.replaceAll("\\s+", " ")

I had to use two replacements:
String e = "a b   c";
e = e.replaceAll("([A-Za-z])\\s([A-Za-z])", "$1$2");
e = e.replaceAll("   ", " ");
System.out.println(e);
Which prints
ab c
The first one replaces any letter-space-letter combo with just the two letters, and then the second replaces any triple-space with a single space.
The first replacement uses backreferences: $1 refers to the part inside the first set of parentheses, which matches the first letter, and $2 refers to the part inside the second set of parentheses.
If you have leading/trailing spaces on the input, you can call trim() before doing the replacements:
e = e.trim();
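One thing to watch, based on tracing the pattern: because ([A-Za-z])\s([A-Za-z]) consumes both letters, in "a b c" the b is used up by the first match, so the second single space survives and the result is "ab c". A lookaround variant (a swapped-in technique, not part of this answer) checks the neighbouring letters without consuming them. Class and method names are illustrative.

```java
class LetterSpaceLetter {
    // Backreference version from the answer: each match consumes both letters.
    static String withGroups(String e) {
        return e.replaceAll("([A-Za-z])\\s([A-Za-z])", "$1$2");
    }

    // Lookaround version: only the whitespace character itself is consumed.
    static String withLookarounds(String e) {
        return e.replaceAll("(?<=[A-Za-z])\\s(?=[A-Za-z])", "");
    }

    public static void main(String[] args) {
        System.out.println(withGroups("a b c"));      // ab c
        System.out.println(withLookarounds("a b c")); // abc
    }
}
```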

Java replace strings between two commas

String str = "9,3,5,*****,1,2,3";
I'd like to access the "5", which is between two commas and right before "*****", and replace only this "5" with another value.
How could I do this in Java?

You can try using the following regex replacement:
String input = "9,3,5,*****,1,2,3";
input = input.replaceAll("[^,]*,\\*{5}", "X,*****");
Here is an explanation of the regex:
[^,]*, match any number of non-comma characters, followed by one comma
\\*{5} followed by five asterisks
This means to match whatever CSV term plus a comma comes before the five asterisks in your string. We then replace this with what you want, along with the five stars in the original pattern.
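A runnable version of the replacement (class and method names are illustrative). Passing the new value through Matcher.quoteReplacement is an addition, so that a replacement containing $ or \ is inserted literally.

```java
import java.util.regex.Matcher;

class ReplaceBeforeStars {
    // [^,]* is the field before the comma; ,\*{5} is the comma plus five asterisks.
    static String replaceBeforeStars(String input, String newValue) {
        return input.replaceAll("[^,]*,\\*{5}", Matcher.quoteReplacement(newValue) + ",*****");
    }

    public static void main(String[] args) {
        System.out.println(replaceBeforeStars("9,3,5,*****,1,2,3", "X")); // 9,3,X,*****,1,2,3
    }
}
```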

I'd use a regular expression with a lookahead, to find a string of digits that precedes ",*****", and replace it with the new value. The regular expression you're looking for would be \d+(?=,\*{5}) - that is, one or more digits, with a lookahead consisting of a comma and five asterisks. So you'd write
newString = oldString.replaceAll("\\d+(?=,\\*{5})", "replacement");
Here is an explanation of the regex pattern used in the replacement:
\\d+ match one or more digits, but only when
(?=,\\*{5}) we can look ahead and assert that what follows immediately
is a single comma followed by five asterisks
It is important to note that the lookahead (?=,\\*{5}) asserts but does not consume. Hence, we can ignore it with regards to the replacement.
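A small sketch of the lookahead version (names are illustrative; Matcher.quoteReplacement is an added safeguard for replacements containing $ or \). Since \d+ is greedy and the lookahead only checks what follows, a multi-digit field such as 57 is replaced as a whole.

```java
import java.util.regex.Matcher;

class DigitsBeforeStars {
    // \d+ is replaced; the lookahead (?=,\*{5}) is checked but not consumed,
    // so ",*****" stays in the string untouched.
    static String replaceDigits(String oldString, String replacement) {
        return oldString.replaceAll("\\d+(?=,\\*{5})", Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        System.out.println(replaceDigits("9,3,5,*****,1,2,3", "X"));  // 9,3,X,*****,1,2,3
        System.out.println(replaceDigits("9,3,57,*****,1,2,3", "X")); // 9,3,X,*****,1,2,3
    }
}
```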

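Assuming the new value is the single character '6':
String str = "9,3,5,*****,1,2,3";
char newChar = '6';
// index of the character just before ",*"
int pos = str.indexOf(",*") - 1;
if (pos >= 0) {
    str = str.substring(0, pos) + newChar + str.substring(pos + 1);
}
Rebuilding the string around pos changes only that one character, whereas String.replace(char, char) would replace every occurrence of '5' in the string. The pos >= 0 check covers the case where ",*" is absent: indexOf then returns -1, and charAt or substring with a negative index would throw StringIndexOutOfBoundsException.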

You could split on "," and then join with "," again (after replacing 5 with the desired value, say X), like this:
String[] arr = "9,3,5,*****,1,2,3".split(",");
arr[2] = "X";
System.out.println(String.join(",", arr));
Which outputs
9,3,X,*****,1,2,3
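The answer hard-codes index 2. If the position of "*****" can vary, a sketch like the following (hypothetical helper name) locates it first. It uses split(",", -1) so that trailing empty fields are not dropped when joining back.

```java
class SplitJoinReplace {
    // Hypothetical helper: replace the field just before the first "*****" field.
    static String replaceBeforeStars(String csv, String replacement) {
        String[] arr = csv.split(",", -1); // -1 keeps trailing empty fields
        for (int i = 1; i < arr.length; i++) {
            if (arr[i].equals("*****")) {
                arr[i - 1] = replacement;
                break;
            }
        }
        return String.join(",", arr);
    }

    public static void main(String[] args) {
        System.out.println(replaceBeforeStars("9,3,5,*****,1,2,3", "X")); // 9,3,X,*****,1,2,3
    }
}
```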

You can use split() to access the value:
String str = "9,3,5,*****,1,2,3";
String[] myStrings = str.split(",");
String str1 = myStrings[2]; // "5", the field just before "*****"

Splitting a string on whitespaces

I'm currently trying to break a string into a multi-line string.
The regex should select each whitespace that has at least 13 characters before it, counted from the previously selected whitespace.
The problem is that the 13-character count does not reset after the previously selected whitespace. So, after the first 13 characters, the regex selects every whitespace.
I'm using the following regex with a positive look-behind of 13 characters:
(?<=.{13})
(there is a whitespace at the end)
You can test the regex with the following code:
public class HelloWorld {
    public static void main(String[] args) {
        String str = "This is a test. The app should break this string in substring on whitespaces after 13 characters";
        for (String string : str.split("(?<=.{13}) ")) {
            System.out.println(string);
        }
    }
}
The output of this code is as follows:
This is a test.
The
app
should
break
this
string
in
substring
on
whitespaces
after
13
characters
But it should be:
This is a test.
The app should
break this string
in substring on
whitespaces after
13 characters

You may actually use a lazy limiting quantifier to match the lines and then replace with $0\n:
.{13,}?[ ]
Java code:
String str = "This is a test. The app should break this string in substring on whitespaces after 13 characters";
System.out.println(str.replaceAll(".{13,}?[ ]", "$0\n"));
Note that the pattern matches:
.{13,}? - any character other than a newline (if you need to match any character, use the DOTALL modifier, though I doubt you need it in the current scenario), at least 13 times, then as few more as needed to reach the first space
[ ] - a literal space (a character class is redundant, but it helps visualize the pattern).
The replacement pattern - "$0\n" - is re-inserting the whole matched value (it is stored in Group 0) and appends a newline after it.
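A sketch that turns the result into a list of lines (names are illustrative). Each match ends with the single space matched by [ ], so splitting on " \n" drops that trailing space; the last chunk, "13 characters", has no following space and is left as is.

```java
import java.util.Arrays;
import java.util.List;

class WrapLines {
    // Add "\n" after each lazy run of 13+ chars that ends in a space,
    // then split on " \n" so the matched space is not kept at the end of a line.
    static List<String> wrap(String str) {
        String wrapped = str.replaceAll(".{13,}?[ ]", "$0\n");
        return Arrays.asList(wrapped.split(" \n"));
    }

    public static void main(String[] args) {
        String str = "This is a test. The app should break this string in substring on whitespaces after 13 characters";
        for (String line : wrap(str)) {
            System.out.println(line);
        }
    }
}
```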

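You can match and capture the lines directly instead of splitting: take at least 13 characters, as few as possible, up to the next run of spaces or the end of the input (text is the input string).
Java code:
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
Pattern p = Pattern.compile("(.{13,}?)(?: +|$)");
Matcher m = p.matcher(text);
List<String> matches = new ArrayList<>();
while (m.find()) {
    matches.add(m.group(1));
}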
It will produce:
This is a test.
The app should
break this string
in substring on
whitespaces after
13 characters

You can do this with .split() and a regular expression, like this:
line.split("\\s+");
This splits the string at every run of one or more whitespace characters, so each word becomes a separate element.

How to replace last letter to another letter in java using regular expression

I have seen "," replaced with "." by using ".$"|",$", but this approach does not work with letters.
I need to replace the last letter of every word in the string EXAMPLE_TEST with another letter, using Java.
This is my code:
Pattern replace = Pattern.compile("n$");//here got the real problem
matcher2 = replace.matcher(EXAMPLE_TEST);
EXAMPLE_TEST=matcher2.replaceAll("k");
I also tried "//n$", "\n$", etc.
Please help me find a solution.
input text=>njan ayman
output text=> njak aymak

Instead of the end of string $ anchor, use a word boundary \b
String s = "njan ayman";
s = s.replaceAll("n\\b", "k");
System.out.println(s); //=> "njak aymak"
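To see why the question's n$ failed: $ anchors at the end of the whole input, so only the last word changes, while \b matches at the end of every word. A short comparison (class and method names are illustrative):

```java
class LastLetterAnchors {
    static String withDollar(String s)   { return s.replaceAll("n$", "k"); }
    static String withBoundary(String s) { return s.replaceAll("n\\b", "k"); }

    public static void main(String[] args) {
        String s = "njan ayman";
        System.out.println(withDollar(s));   // njan aymak  ($ = end of input only)
        System.out.println(withBoundary(s)); // njak aymak  (\b = end of every word)
    }
}
```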

You can use lookahead and group matching:
String EXAMPLE_TEST = "njan ayman";
String s = EXAMPLE_TEST.replaceAll("(n)(?=\\s|$)", "k");
System.out.println("s = " + s); // prints: s = njak aymak
Explanation:
(n) - the letter n (the capturing group is not actually needed)
(?=\\s|$) - a lookahead requiring that it be followed by whitespace or the end of the input
The above is only an example! If you want to replace every comma at the end of a word with a period, change the middle line to:
String s = EXAMPLE_TEST.replaceAll("(,)(?=\\s|$)", "\\.");

Here's how I would set it up:
(?=.\b)\w
Which in Java would need to be escaped as following:
(?=.\\b)\\w
It translates to something like "a word character (\w) that, per the lookahead (?=), is the single character (.) immediately before a word boundary (\b)", i.e. the last character of a word.
String s = "njan ayman aowkdwo wdonwan. wadawd,.. wadwdawd;";
s = s.replaceAll("(?=.\\b)\\w", "");
System.out.println(s); //nja ayma aowkdw wdonwa. wadaw,.. wadwdaw;
This removes the last character of every word but leaves any following non-alphanumeric characters. You can restrict which characters are removed or replaced by changing the . to something else.
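Following that suggestion, narrowing the . to n and replacing with k gives the output the question asks for. A sketch (class and method names are illustrative):

```java
class LastLetterLookahead {
    // A word character that is 'n' and is immediately followed by a word boundary.
    static String replaceFinalN(String s) {
        return s.replaceAll("(?=n\\b)\\w", "k");
    }

    public static void main(String[] args) {
        System.out.println(replaceFinalN("njan ayman")); // njak aymak
    }
}
```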
However, the other answers are perfectly good and might achieve exactly what you are looking for.

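Without a regex, you can check and rebuild each word (word, oldLetter and newLetter are String variables here):
if (word.endsWith(oldLetter)) {
    word = word.substring(0, word.length() - oldLetter.length()) + newLetter;
}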
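The same check applied to every word of the question's input, as a self-contained sketch. replaceLast is a hypothetical helper; it splits on single spaces, so a word followed by punctuation (e.g. "ayman.") is not treated as ending in n.

```java
class LastLetterLoop {
    // Hypothetical helper: replace oldLetter with newLetter at the end of each space-separated word.
    static String replaceLast(String text, String oldLetter, String newLetter) {
        String[] words = text.split(" ");
        for (int i = 0; i < words.length; i++) {
            if (words[i].endsWith(oldLetter)) {
                words[i] = words[i].substring(0, words[i].length() - oldLetter.length()) + newLetter;
            }
        }
        return String.join(" ", words);
    }

    public static void main(String[] args) {
        System.out.println(replaceLast("njan ayman", "n", "k")); // njak aymak
    }
}
```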

How to remove duplicate white spaces in string using Java?

How to remove duplicate white spaces (including tabs, newlines, spaces, etc...) in a string using Java?

Like this:
yourString = yourString.replaceAll("\\s+", " ");
For example
System.out.println("lorem ipsum dolor \n sit.".replaceAll("\\s+", " "));
outputs
lorem ipsum dolor sit.
What does that \s+ mean?
\s+ is a regular expression. \s matches a space, tab, new line, carriage return, form feed or vertical tab, and + says "one or more of those". Thus the above code replaces every run of one or more whitespace characters (including a single tab or newline) with a single space character.
Source: Java: Removing duplicate white spaces in strings

You can use the regex (\s)\1+ and replace it with $1 (the + lets it collapse runs longer than two).
Java code:
str = str.replaceAll("(\\s)\\1+", "$1");
If the input is "foo\t\tbar " you'll get "foo\tbar " as output. But if the input is "foo\t bar" it will remain unchanged, because it does not contain consecutive identical whitespace characters.
If you treat all the whitespace characters(space, vertical tab, horizontal tab, carriage return, form feed, new line) as space then you can use the following regex to replace any number of consecutive white space with a single space:
str = str.replaceAll("\\s+"," ");
But if you want to replace two consecutive white space with a single space you should do:
str = str.replaceAll("\\s{2}"," ");
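A side-by-side comparison of the three patterns on one input containing a run of three tabs and a run of two spaces (class and method names are illustrative). Note that (\s)\1 on its own only collapses pairs; with \1+ it collapses a whole run of the same character. Tabs are printed as \t so they are visible.

```java
class WhitespaceVariants {
    // Collapse a run of the *same* whitespace char to one of that char.
    static String sameRun(String s) { return s.replaceAll("(\\s)\\1+", "$1"); }
    // Collapse any whitespace run to a single space.
    static String anyRun(String s)  { return s.replaceAll("\\s+", " "); }
    // Replace each non-overlapping pair of whitespace chars with one space.
    static String pairs(String s)   { return s.replaceAll("\\s{2}", " "); }

    public static void main(String[] args) {
        String in = "a\t\t\tb  c";
        System.out.println(sameRun(in).replace("\t", "\\t")); // a\tb c
        System.out.println(anyRun(in));                       // a b c
        System.out.println(pairs(in).replace("\t", "\\t"));   // a \tb c
    }
}
```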

String str = " Text with multiple spaces ";
str = org.apache.commons.lang3.StringUtils.normalizeSpace(str);
// str = "Text with multiple spaces"

Try this (you need to import java.util.regex.*):
Pattern pattern = Pattern.compile("\\s+");
Matcher matcher = pattern.matcher(string);
// replace every run of whitespace with a single space
String str = matcher.replaceAll(" ");
Where string is your string on which you need to remove duplicate white spaces

The fastest (but not prettiest) way I found is:
while (cleantext.indexOf("  ") != -1)
cleantext = StringUtils.replace(cleantext, "  ", " ");
In my experience this runs much faster on Android than a regex.

This is late, but I found a solution (that works for me) that replaces each run of consecutive whitespace with a single whitespace character. That is:
Hello!\n\n\nMy World
will be
Hello!\nMy World
That regex alone would still leave leading and trailing whitespace, so my complete solution also calls trim():
str = str.trim().replaceAll("(\\s)+", "$1");
Here, trim() removes leading and trailing whitespace. (\\s) captures a single whitespace character (such as ' ', '\n' or '\t') in group #1, and + repeats it one or more times, so (\\s)+ matches a whole run of whitespace. When a capturing group is repeated, Java keeps only its last repetition, so $1 replaces the run with the last whitespace character in it. For a run of one type, like \n\n\n, that is the same character (\n); for a mixed run such as " \t\n" it is the last one (\n).
I have not found my above solution here so I have posted it.
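A sketch to check this behaviour (class and method names are illustrative). Because Java keeps only the last iteration of a repeated group, a mixed run such as " \t\n" collapses to its last character.

```java
class CollapseKeepLast {
    // trim() drops leading/trailing whitespace; (\s)+ matches a whole whitespace run,
    // and $1 is the last character group 1 captured in that run.
    static String collapse(String str) {
        return str.trim().replaceAll("(\\s)+", "$1");
    }

    // Print \n and \t escaped so the result is visible on one line.
    static String show(String s) {
        return s.replace("\n", "\\n").replace("\t", "\\t");
    }

    public static void main(String[] args) {
        System.out.println(show(collapse("Hello!\n\n\nMy World"))); // Hello!\nMy World
        System.out.println(show(collapse(" a \t\nb ")));            // a\nb
    }
}
```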

If you want to get rid of all leading and trailing extraneous whitespace, you can do something like this:
// (?s) = let . match newlines too
// \\A = Start of input boundary
// \\z = End of input boundary
// \\s* (not \\s+) so it also works when only one side has whitespace
string = string.replaceAll("(?s)\\A\\s*(.*?)\\s*\\z", "$1");
Then you can remove the duplicates using the other strategies listed here:
string = string.replaceAll("\\s+"," ");
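Both steps together as a sketch (class and method names are illustrative). The trimming regex here is written as (?s)\A\s*(.*?)\s*\z: \s* instead of \s+ so it still matches when only one side has whitespace, and (?s) so that . can match newlines inside the text.

```java
class TrimAndCollapse {
    static String clean(String string) {
        // (?s): . also matches newlines; \s* on each side may match nothing
        string = string.replaceAll("(?s)\\A\\s*(.*?)\\s*\\z", "$1");
        // then collapse each remaining whitespace run to a single space
        return string.replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        System.out.println("[" + clean("  lorem \n\t ipsum  ") + "]"); // [lorem ipsum]
        System.out.println("[" + clean("dolor   sit") + "]");          // [dolor sit]
    }
}
```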

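You can also use StringTokenizer, which by default splits on spaces, tabs, newlines, carriage returns and form feeds. A simple way is:
import java.util.StringTokenizer;
String s = "Your   Text\tHere";
// the one-argument constructor uses the default delimiters " \t\n\r\f"
StringTokenizer st = new StringTokenizer(s);
StringBuilder sb = new StringBuilder();
while (st.hasMoreTokens()) {
    if (sb.length() > 0) sb.append(' '); // single space between tokens
    sb.append(st.nextToken());
}
System.out.println(sb); // Your Text Here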

This can be done in three steps:
Convert the string into a character array (toCharArray()).
Loop over the character array.
Then apply the string replace function (replace("string you want to replace", "replacement string")).
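Step 3 is loosely specified, so this is only one possible reading of the char-array approach: instead of calling replace, the sketch (class and method names are illustrative) copies characters into a StringBuilder and skips any whitespace character that directly follows another one. It uses Character.isWhitespace, whose definition differs slightly from the regex \s.

```java
class CharLoopCollapse {
    // Walk the char array; keep the first whitespace char of each run (as a space)
    // and skip any whitespace char that directly follows another one.
    static String collapse(String s) {
        StringBuilder sb = new StringBuilder();
        char[] chars = s.toCharArray();
        for (int i = 0; i < chars.length; i++) {
            boolean ws = Character.isWhitespace(chars[i]);
            if (ws && i > 0 && Character.isWhitespace(chars[i - 1])) {
                continue; // duplicate whitespace: skip it
            }
            sb.append(ws ? ' ' : chars[i]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("[" + collapse("a \t\n b") + "]"); // [a b]
    }
}
```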
