Regex required to update a character - java

I have a String: testing<b>s<b>tringwit<b>h</b>nomean<b>s</b>ing
I want to replace the character s with some other character sequence, say <b>X</b>, but I want an s that is followed by "<" (as in <b>s<b> and <b>s</b>) to remain intact, i.e. the regex should not replace an s whose next character is "<".
I used this Java code:
String str = "testing<b>s<b>tringwit<b>h</b>nomean<b>s</b>ing";
str = str.replace("s[^<]", "<b>X</b>");
The problem is that the regex matches 2 characters: the s and the following character if it is not "<". String.replace would then replace both characters. I want only the s to be replaced, not the following character.
Any help would be appreciated. Since I have lots of such replacements, I don't want to use a loop that matches each character and updates it sequentially.

There are other ways, but you could, for example, capture the second character and put it back with the $1 back reference. Java replacement strings use $1, not \1:
str = str.replaceAll("s([^<])", "<b>X</b>$1");

Looks like you want a negative lookahead:
s(?!<)
String str = "testing<b>s<b>tringwit<b>h</b>nomean<b>s</b>ing;";
System.out.println(str.replaceAll("s(?!<)", "<b>X</b>"));
output:
te<b>X</b>ting<b>s<b>tringwit<b>h</b>nomean<b>s</b>ing;
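The lookahead answer and the capture-group answer do not behave identically. (?!<) is also satisfied at the end of the input, whereas [^<] must consume a real character. A consumed character cannot start the next match either. Here is a minimal sketch comparing the two; the class and method names are made up for illustration:

```java
class LookaheadVsCapture {
    // Negative lookahead: replaces every s not followed by '<',
    // including an s at the very end of the string.
    static String withLookahead(String s) {
        return s.replaceAll("s(?!<)", "<b>X</b>");
    }

    // Capture group: [^<] consumes one character, which $1 puts back.
    // A trailing s has nothing to consume, so it is left alone.
    static String withCapture(String s) {
        return s.replaceAll("s([^<])", "<b>X</b>$1");
    }

    public static void main(String[] args) {
        String str = "testing<b>s<b>tringwit<b>h</b>nomean<b>s</b>ing";
        System.out.println(withLookahead(str)); // te<b>X</b>ting<b>s<b>tringwit<b>h</b>nomean<b>s</b>ing
        System.out.println(withCapture(str));   // same as the line above
        System.out.println(withLookahead("gas")); // ga<b>X</b>
        System.out.println(withCapture("gas"));   // gas
        System.out.println(withLookahead("mess")); // me<b>X</b><b>X</b>
        System.out.println(withCapture("mess"));   // me<b>X</b>s
    }
}
```

On the question's string both produce the same result. They differ only for a trailing s or for consecutive s characters.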

Use lookarounds to assert, but not capture, the surrounding text:
str = str.replaceAll("s(?=[^<])", "whatever");
Or, capture and put back using a back reference $1:
str = str.replaceAll("s([^<])", "whatever$1");
Note that you need to use replaceAll() (which uses a regex), rather than replace() (which uses plain text).
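To make the replace()/replaceAll() difference concrete, here is a small sketch; the class and method names are hypothetical. With replace(), the pattern from the question is searched for as literal text. That text never occurs, so nothing changes:

```java
class ReplaceVsReplaceAll {
    // replace() treats "s[^<]" as the literal five characters s, [, ^, <, ].
    static String literal(String s) {
        return s.replace("s[^<]", "<b>X</b>");
    }

    // replaceAll() compiles its first argument as a regex.
    static String regex(String s) {
        return s.replaceAll("s(?!<)", "<b>X</b>");
    }

    public static void main(String[] args) {
        String str = "testing<b>s<b>tringwit<b>h</b>nomean<b>s</b>ing";
        System.out.println(literal(str).equals(str)); // true: nothing was replaced
        System.out.println(regex(str));               // te<b>X</b>ting<b>s<b>tringwit<b>h</b>nomean<b>s</b>ing
    }
}
```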

How to use string.replaceAll to change everything after a certain word

I have the following string: http://localhost:somePort/abc/soap/1.0
I want the string to just look like this: http://localhost:somePort/abc.
I want to use string.replaceAll but can't seem to get the regex right. My code looks like this: someString.replaceAll(".*\\babc\\b.*", "abc");
What am I missing? I don't want to split the string or use .replaceFirst, as many solutions suggest.
It would seem to make more sense to use substring, but if you must use replaceAll, here's a way to do it.
You want to replace /abc and everything after it with just /abc.
string = string.replaceAll("/abc.*", "/abc");
If you want to be more discriminating, you can include a word boundary after abc:
string = string.replaceAll("/abc\\b.*", "/abc");
To explain why the given regex won't work:
\b ... \b: word boundaries are not needed here. More importantly, because the regex starts with .*, it matches the whole string. replaceAll then replaces the entire string with "abc", which is why you get the wrong answer. Instead, match only the part that needs to change; whatever is matched is then replaced with the replacement string.
someString.replaceAll("/abc.*", "/abc");
/abc.* - Looks specifically for /abc followed by 0 or more characters
/abc - Replaces the above match with /abc
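To see the difference on the URL from the question, here is a small sketch that runs the question's regex next to the suggested ones. The class and method names are made up:

```java
class TrimAfterAbc {
    // The question's regex: the leading .* makes the match span the whole URL,
    // so the entire string is replaced by "abc".
    static String original(String url) {
        return url.replaceAll(".*\\babc\\b.*", "abc");
    }

    // Match "/abc" and everything after it, and put "/abc" back.
    static String fixed(String url) {
        return url.replaceAll("/abc.*", "/abc");
    }

    // Same, but the word boundary stops "/abcdef" from being matched.
    static String withBoundary(String url) {
        return url.replaceAll("/abc\\b.*", "/abc");
    }

    public static void main(String[] args) {
        String url = "http://localhost:somePort/abc/soap/1.0";
        System.out.println(original(url));     // abc
        System.out.println(fixed(url));        // http://localhost:somePort/abc
        System.out.println(withBoundary(url)); // http://localhost:somePort/abc
    }
}
```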
You could also use replaceFirst, since everything after the first match is removed anyway:
text = text.replaceFirst("/abc.*", "/abc");
Or
you can use indexOf to get the index of a certain word and then take the substring:
String findWord = "abc";
text = text.substring(0, text.indexOf(findWord) + findWord.length());
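One caveat with the indexOf approach: if the word is absent, indexOf returns -1. The substring call then silently cuts the text at the wrong place, or throws for an empty word. Here is a minimal sketch with a guard; truncateAfter is a hypothetical helper, not a JDK method. Searching for "/abc" rather than "abc" would also avoid matching abc inside a host name:

```java
class SubstringAfterWord {
    // Keeps everything up to and including the first occurrence of word.
    // If word does not occur, the text is returned unchanged.
    static String truncateAfter(String text, String word) {
        int idx = text.indexOf(word);
        if (idx < 0) {
            return text; // word not found: nothing to cut
        }
        return text.substring(0, idx + word.length());
    }

    public static void main(String[] args) {
        System.out.println(truncateAfter("http://localhost:somePort/abc/soap/1.0", "abc"));
        // http://localhost:somePort/abc
        System.out.println(truncateAfter("http://localhost:somePort/xyz", "abc"));
        // http://localhost:somePort/xyz
    }
}
```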

Unable to Split by ":[" in Java

I have 2 nested HashMaps as a String which I am trying to parse.
My String is as follows :
"20:[cost:431.14, Count:19, Tax:86.228"
Therefore I need to split by ":[" in order to get my key, 20. For some reason I'm not able to do this.
I have tried:
myString.split(":[") and myString.split("\\:[") but neither seems to work.
Can anyone detect what I have wrong here?
Thanks in Advance
You have to escape the [ character, but not the : character, like this:
String str = "20:[cost:431.14, Count:19, Tax:86.228";
String[] spl = str.split(":\\[");
String.split uses a regex:
Splits this string around matches of the given regular expression.
You need to escape [ since it is a "reserved" character in regular expressions; : is not:
myString.split(":\\[")
Note that you could set a limit if you only want the first cell:
myString.split(":\\[", 2);
This returns an array of at most 2 cells, so after the first occurrence the rest of the String is not split any further. (This is not really necessary, but good to know.)
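Here is a quick sketch of what the limit changes. It uses a variant of the question's string that contains a second ":["; the class and method names are made up:

```java
class SplitWithLimit {
    // No limit: every ":[" is a split point.
    static String[] splitAll(String s) {
        return s.split(":\\[");
    }

    // Limit 2: the pattern is applied at most once, so the array has at most 2 cells.
    static String[] splitFirst(String s) {
        return s.split(":\\[", 2);
    }

    public static void main(String[] args) {
        String str = "20:[cost:431.14, :[Count:19, Tax:86.228";
        System.out.println(splitAll(str).length);   // 3
        System.out.println(splitFirst(str).length); // 2
        System.out.println(splitFirst(str)[1]);     // cost:431.14, :[Count:19, Tax:86.228
    }
}
```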
Use Pattern.quote to automatically escape your string
String string = "20:[cost:431.14, Count:19, Tax:86.228";
String[] split = string.split(Pattern.quote(":["));
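The question's attempts fail because an unescaped "[" opens a character class that is never closed, so Pattern throws a PatternSyntaxException. Here is a sketch showing both the failure and the Pattern.quote fix; the helper names are hypothetical:

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class SplitQuoted {
    // Pattern.quote turns the delimiter into a literal pattern, so "[" loses its meaning.
    static String[] splitLiteral(String s, String delimiter) {
        return s.split(Pattern.quote(delimiter));
    }

    // Returns true if splitting on the raw ":[" is rejected as an invalid regex.
    static boolean unescapedFails(String s) {
        try {
            s.split(":[");
            return false;
        } catch (PatternSyntaxException e) {
            return true; // "Unclosed character class"
        }
    }

    public static void main(String[] args) {
        String string = "20:[cost:431.14, Count:19, Tax:86.228";
        System.out.println(unescapedFails(string));         // true
        System.out.println(splitLiteral(string, ":[")[0]);  // 20
    }
}
```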
Another solution:
Therefore I need to Split by ":[" in order to get my key, 20. For some reason I'm not able to do this.
In this case you can use replaceAll with a regex that keeps only the part before the first ":[", for example:
String str = "20:[cost:431.14, :[Count:19, Tax:86.228";
String result = str.replaceAll("(.*?):\\[.*", "$1");// output 20
If the key is always an integer, you can use (\d+):\[ instead.
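Here is a sketch of the integer-key variant using Matcher.find, assuming the key is a run of digits directly followed by ":[". The class and method names are hypothetical:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ExtractKey {
    // Group 1 captures the digits in front of the first ":[".
    private static final Pattern KEY = Pattern.compile("(\\d+):\\[");

    // Returns the key, or null if the string contains no "digits:[" sequence.
    static String key(String s) {
        Matcher m = KEY.matcher(s);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(key("20:[cost:431.14, Count:19, Tax:86.228")); // 20
        System.out.println(key("no key here"));                          // null
    }
}
```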
Note that [ is a special character in regular expressions, so you have to escape it with \\, as in str.split(":\\["). Also remember that strings are immutable: split does not change str but returns a new array, so assign the result if you want to use it later: String[] spl = str.split(":\\[");
Another solution if you just need the key "20" in your String is to substring it to get the part before the delimiter.
String key = myString.substring(0, myString.indexOf(":["));

Java Regex - Remove Non-Alphanumeric characters except line breaks

I'm trying to remove all the non-alphanumeric characters from a String in Java but keep the carriage returns. I have the following regular expression, but it keeps joining words before and after a line break.
[^\\p{Alnum}\\s]
How would I be able to preserve the line breaks or convert them into spaces so that I don't have words joining?
An example of this issue is shown below:
Original Text
and refreshingly direct
when compared with the hand-waving of Swinburne.
After Replacement:
and refreshingly directwhen compared with the hand-waving of Swinburne.
Add the line break characters themselves to the negated class instead of \s, since \s matches any whitespace:
String reg = "[^\\p{Alnum}\n\r]";
Or, you may use character class subtraction:
String reg = "[\\P{Alnum}&&[^\n\r]]";
Here, \P{Alnum} matches any non-alphanumeric and &&[^\n\r] prevents a LF and CR from matching.
A Java test:
String s = "&&& Text\r\nNew line".replaceAll("[^\\p{Alnum}\n\r]+", "");
System.out.println(s);
// => Text
//    Newline
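The two regexes in this answer, the negated class and the class subtraction, should behave the same. Here is a small sketch that checks this on the test string above; the class and method names are made up:

```java
class KeepLineBreaks {
    // Negated class: remove anything that is not alphanumeric, LF, or CR.
    static String negatedClass(String s) {
        return s.replaceAll("[^\\p{Alnum}\n\r]+", "");
    }

    // Class subtraction: non-alphanumerics, minus LF and CR.
    static String subtraction(String s) {
        return s.replaceAll("[\\P{Alnum}&&[^\n\r]]+", "");
    }

    public static void main(String[] args) {
        String input = "&&& Text\r\nNew line";
        System.out.println(negatedClass(input).equals(subtraction(input))); // true
        System.out.println(negatedClass(input).equals("Text\r\nNewline"));  // true
    }
}
```

Note that both forms remove spaces as well, because a space is neither alphanumeric nor a line break.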
Note that there are more line break characters than LF and CR. In Java 8 and later, the \R construct matches any Unicode line break sequence; it is equivalent to \u000D\u000A|[\u000A\u000B\u000C\u000D\u0085\u2028\u2029].
So, to exclude matching any line breaks, you may use
String reg = "[^\\p{Alnum}\\u000A\\u000B\\u000C\\u000D\\u0085\\u2028\\u2029]+";
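Here is a sketch showing why the longer class matters. The U+2028 LINE SEPARATOR is removed by the LF/CR-only class but kept by the full list. The class and method names are hypothetical:

```java
class KeepAllLineBreaks {
    static final char LS = (char) 0x2028; // LINE SEPARATOR

    // Keeps letters, digits, LF and CR only.
    static String lfCrOnly(String s) {
        return s.replaceAll("[^\\p{Alnum}\n\r]+", "");
    }

    // Keeps letters, digits and every single character that \R can match.
    static String allBreaks(String s) {
        return s.replaceAll("[^\\p{Alnum}\\u000A\\u000B\\u000C\\u000D\\u0085\\u2028\\u2029]+", "");
    }

    public static void main(String[] args) {
        String input = "a-b" + LS + "c d";
        System.out.println(lfCrOnly(input).equals("abcd"));              // true: separator dropped
        System.out.println(allBreaks(input).equals("ab" + LS + "cd"));   // true: separator kept
    }
}
```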
You can use the regex [^A-Za-z0-9\\n\\r], for example:
String result = str.replaceAll("[^a-zA-Z0-9\\n\\r]", "");
Example
Input
aaze03.aze1654aze987 */-a*azeaze\n hello *-*/zeaze+64\nqsdoi
Output
aaze03aze1654aze987aazeaze
hellozeaze64
qsdoi
I made a mistake with my code. I was reading in a file line by line and building the String, but didn't add a space at the end of each line. Therefore there were no actual line breaks to replace.
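Here is a sketch of that root cause. It assumes the lines were collected into a list and then joined, with and without a separator, before applying the question's original regex. The class and method names are made up:

```java
import java.util.Arrays;
import java.util.List;

class JoinLines {
    // The question's regex: keep alphanumerics and any whitespace.
    static String cleanJoined(List<String> lines, String separator) {
        return String.join(separator, lines).replaceAll("[^\\p{Alnum}\\s]", "");
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "and refreshingly direct",
                "when compared with the hand-waving of Swinburne.");
        // No separator: there is no whitespace between "direct" and "when".
        System.out.println(cleanJoined(lines, ""));
        // With "\n" between lines, the regex keeps the line break.
        System.out.println(cleanJoined(lines, "\n"));
    }
}
```

The regex itself was fine. The words were already joined before it ran.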
That's a perfect case for Guava's CharMatcher:
String input = "and refreshingly direct\n\rwhen compared with the hand-waving of Swinburne.";
String output = CharMatcher.javaLetterOrDigit().or(CharMatcher.whitespace()).retainFrom(input);
Output will be:
and refreshingly direct
when compared with the handwaving of Swinburne

Java Replace Unicode Characters in a String

I have a string which contains multiple Unicode characters. I want to identify all of them (e.g. \uF06C) and replace each one with a backslash followed by its four hex digits, without the "u".
Example:
Source String: "add \uF06Cd1 Clause"
Result String: "add \F06Cd1 Clause"
How can I achieve this in Java?
Edit:
The question linked as Java Regex - How to replace a pattern or how to is different from this one, as my question deals with Unicode characters. Although \uF06C is written with several characters, it is treated as one single character, so a regex like that won't work.
The correct way to do this is to use a regex that matches the entire Unicode escape, combined with group replacement.
The regex to match the unicode-string:
A Unicode escape looks like \uABCD: \u followed by a 4-digit hex number. Matching these can be done using
\\u[A-Fa-f\d]{4}
But there's a problem with this:
In a String like "just some \\uabcd arbitrary text", the \u would still get matched, even though its backslash is itself escaped. So we need to make sure the \u is preceded by an even number of backslashes:
(?<!\\)(\\\\)*\\u[A-Fa-f\d]{4}
Now as output, we want a backslash followed by the hex part. This can be done with group replacement, so let's start by grouping characters. Group 1 is written as ((?:\\\\)*) so that it captures the whole run of preceding backslashes, not just the last pair:
(?<!\\)((?:\\\\)*)(\\u)([A-Fa-f\d]{4})
As a replacement, we want the backslashes from group 1, followed by a single backslash and the hex part of the Unicode escape:
$1\\$3
Now for the actual code (imports: java.util.regex.Matcher, java.util.regex.Pattern):
String test = "add \\uF06Cd1 Clause"; // a literal backslash followed by uF06C
String pattern = "(?<!\\\\)((?:\\\\\\\\)*)(\\\\u)([A-Fa-f\\d]{4})";
String replace = "$1\\\\$3";
Matcher match = Pattern.compile(pattern).matcher(test);
String result = match.replaceAll(replace); // "add \F06Cd1 Clause"
That's a lot of backslashes! Backslashes have to be escaped twice: once for the Java string literal and once for the regex. So "\\\\" in a Java pattern string matches a single \ character.
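Here is the same group-replacement approach wrapped in a reusable method, with a few edge cases. Group 1 is written as ((?:\\\\)*) so that it captures the whole run of preceding backslashes. The class and method names are hypothetical, and the sketch assumes the string contains the escape as literal text, not the actual character:

```java
import java.util.regex.Pattern;

class StripUnicodeU {
    // An unescaped backslash-u plus four hex digits. Group 1 is the (even) run of
    // backslashes in front of it, group 3 the hex digits.
    private static final Pattern ESCAPE =
            Pattern.compile("(?<!\\\\)((?:\\\\\\\\)*)(\\\\u)([A-Fa-f\\d]{4})");

    // Replacement: the backslash run, one literal backslash, then the hex digits.
    static String stripU(String text) {
        return ESCAPE.matcher(text).replaceAll("$1\\\\$3");
    }

    public static void main(String[] args) {
        System.out.println(stripU("add \\uF06Cd1 Clause"));               // add \F06Cd1 Clause
        System.out.println(stripU("just some \\\\uabcd arbitrary text")); // unchanged: escaped backslash
    }
}
```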
EDIT:
If the string contains actual Unicode characters rather than the escape text, those characters need to be filtered out and replaced by their hex representation:
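String in = "add \uF06Cd1 Clause"; // here the escape is a single char
StringBuilder sb = new StringBuilder();
for (char c : in.toCharArray()) {
    if (c > 127) {
        // non-ASCII: a backslash followed by the four uppercase hex digits
        sb.append('\\').append(String.format("%04X", (int) c));
    } else {
        sb.append(c);
    }
}
String result = sb.toString(); // "add \F06Cd1 Clause"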
This assumes that by "Unicode character" you mean non-ASCII characters. The code copies every ASCII character as-is and outputs every other character as a backslash followed by its four-digit hex code. The term "Unicode character" is rather vague, though, since a char in Java always represents a Unicode (UTF-16) code unit. This approach preserves control characters such as "\n" and "\r", which is why I chose it over other definitions.
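If the string contains the escape as literal text (a backslash followed by u), try String.replaceAll(), escaping the backslash for both Java and the regex:
s = s.replaceAll("\\\\u", "\\\\");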

Java: Regex to extract all non numerals AND leading +, if any

Here's a string method I'm using in Java to remove all non-numerals from a given string:
replaceAll("[^\\d.]", "")
Here's an example of what this would return:
Original string: $%^&*+89896 89#6
New string: 89896896
However, now I need to also retain the leading '+' sign if one exists such as in the case of the above illustration (thus, the new string should be +89896896). If this were PHP, I could have simply used a preg function with (^\+)|([\d]+) to get precisely the results I want. I am not sure how to implement it in Java (Android) though.
I came up with this
replaceAll("([^\\+])([\\D]+)", "")
But the results seem to be slightly distorted. Here's one test result:
Original string: +u +00786uy769+&jh6ghj765765
New string: +007876765765
Desired result: +007867696765765
What am I doing wrong with my expression?
P.S. I would like to avoid using the Pattern and Matcher classes unless that were the only way out.
Use a negative lookbehind based regex in string.replaceAll function.
string.replaceAll("(?<!^)\\+|[^\\d+]", "");
If you want to keep dots, add a dot inside the character class:
string.replaceAll("(?<!^)\\+|[^\\d+.]", "");
(?<!^)\\+ would match all the plus symbols except the one at the start.
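Here is a sketch that runs the lookbehind answer on both sample strings from the question; the class and method names are made up. On the second sample it gives the desired result. On the first, the + is not at index 0 (it comes after "$%^&*"), so it is removed as well:

```java
class LeadingPlusLookbehind {
    // First alternative: a '+' that is not at index 0.
    // Second alternative: any character that is neither a digit nor '+'.
    static String clean(String s) {
        return s.replaceAll("(?<!^)\\+|[^\\d+]", "");
    }

    public static void main(String[] args) {
        System.out.println(clean("+u +00786uy769+&jh6ghj765765")); // +007867696765765
        System.out.println(clean("$%^&*+89896 89#6"));             // 89896896
    }
}
```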
You can use:
str = str.replaceAll("\\+(?!\\d)|[^\\d.+]", "");
\\+(?!\\d) matches a + only if it is not followed by a digit, so a + that directly precedes a digit is kept.
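Here is a sketch of this lookahead answer on the question's samples, plus one input where its behavior is worth knowing. The class and method names are made up. Because it keeps any + followed by a digit, a + in the middle of the digits also survives; the lookbehind answer above would turn "12+34" into "1234":

```java
class LeadingPlusLookahead {
    // First alternative: a '+' that is NOT followed by a digit.
    // Second alternative: any character other than a digit, '.', or '+'.
    static String clean(String s) {
        return s.replaceAll("\\+(?!\\d)|[^\\d.+]", "");
    }

    public static void main(String[] args) {
        System.out.println(clean("$%^&*+89896 89#6"));             // +89896896
        System.out.println(clean("+u +00786uy769+&jh6ghj765765")); // +007867696765765
        System.out.println(clean("12+34"));                        // 12+34
    }
}
```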
