Replace everything except ONE single char - java

I am dealing with some cells from which I have to extract certain letters. I want to replace a whole String with " " except for one stand-alone character. The biggest challenge for me has been telling my regex to look only for a single char and remove everything else.
To elaborate and simplify: I need my regex to replace everything with "" except for a single character that stands ALONE (i.e. with whitespace to its left and right, or attached to a number).
public class Main {
    public static void main(String[] args) {
        String test = "22A 302 abc";
        String works = test.replaceAll("^\\w[\\s\\S]*", " ");
        System.out.println(works);
        // Desired result: A
    }
}

You could match a digit or space before and after capturing a char [A-Z].
In the replacement use group 1.
^.*[\d ]([A-Z])[ \d].*$
If there can be only a single uppercase char in the string:
^[^A-Z]*[\d ]([A-Z])[ \d][^A-Z]*$
Example code
String test = "22A 302 abc";
String works = test.replaceAll("^.*[\\d ]([A-Z])[ \\d].*$", "$1");
System.out.println(works);
Output
A
To match between digits 0-9, horizontal whitespace chars or punctuations:
String works = test.replaceAll("^.*[\\p{P}0-9\\h]([A-Z])[\\p{P}0-9\\h].*$", "$1");
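A side note on the replaceAll approach above: when the input contains no lone letter, replaceAll returns the string unchanged. Here is a minimal sketch of an alternative that extracts the letter with a Matcher instead, assuming an empty string is the wanted fallback (LoneLetterDemo and extractLoneLetter are made-up names). Unlike the greedy ^.* version, find() returns the first lone letter rather than the last.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LoneLetterDemo {
    // Same idea as the pattern above: an uppercase letter with a digit or space on both sides.
    private static final Pattern LONE = Pattern.compile("[\\d ]([A-Z])[\\d ]");

    // Returns the first lone letter, or "" when there is none (assumed fallback).
    static String extractLoneLetter(String input) {
        Matcher m = LONE.matcher(input);
        return m.find() ? m.group(1) : "";
    }

    public static void main(String[] args) {
        System.out.println(extractLoneLetter("22A 302 abc"));          // A
        System.out.println(extractLoneLetter("22 302 abc").isEmpty()); // true
    }
}
```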

You can do it as follows:
public class Test {
    public static void main(String[] args) {
        String test = "22A 302 abc";
        String works = test.replaceAll("\\d+([A-Z]).*", "$1");
        System.out.println(works);
    }
}
Output:
A
Explanation: Everything is replaced with capturing group 1 ($1 in the code above), which holds a single letter A-Z that is preceded by an integer (\\d+) and may be followed by anything (.*).


Replace .(dot) inside number Java between two elements

I have this String:
String str = "<p>23.5</p>";
And I want to replace the dot with a comma only inside elements. The output I need is:
<p>23,5</p>
I can't figure it out. I have this:
str = str.replaceAll("(?<=<p>)\\.(?=</p>)", ",");
But it doesn't work. I need to replace the dot only in elements with a particular tag (it is XML in a String), in this case <p>.
Thank you
You may use capturing groups (escaping the / is optional in Java):
str = str.replaceAll("(?<=<p>)(\\d*)\\.(\\d+)(?=<\\/p>)", "$1,$2");
If you want to replace dot in all numbers, you may just as well use
str = str.replaceAll("(\\d*)\\.(\\d+)", "$1,$2");
The following regex matches a dot that sits between two digits:
(?<=\d)\.(?=\d)
Regex Explanation:
\d - match any digit between 0-9
(?<=\d)\. - positive look-behind to match any . character that has a digit just before it
\.(?=\d) - positive look-ahead to match any . character that has a digit just after it
Demo:
https://regex101.com/r/WMEjPl/1
Java Code Example:
public class Main {
    public static void main(String[] args) {
        String regex = "(?<=\\d)\\.(?=\\d)";
        String str = "<p>23.5</p>";
        String str2 = "Mr. John <p>23.5</p> Hello";
        String str3 = "Mr. John <p>23.5</p> Hello 12.2324";
        System.out.println(str.replaceAll(regex, ","));  // <p>23,5</p>
        System.out.println(str2.replaceAll(regex, ",")); // Mr. John <p>23,5</p> Hello
        System.out.println(str3.replaceAll(regex, ",")); // Mr. John <p>23,5</p> Hello 12,2324
    }
}
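Note that this pattern also rewrites numbers outside the element (12.2324 in str3). If only the content of <p> should change, one possible approach is two passes: find each <p>...</p> body, then replace the dots inside it. A rough sketch, assuming the <p> elements have no attributes and no nested tags (TagDotDemo and commasInsideP are illustrative names):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagDotDemo {
    // Text directly between <p> and </p>; assumes no attributes and no nested tags.
    private static final Pattern P_CONTENT = Pattern.compile("(?<=<p>)[^<]*(?=</p>)");
    // A dot with a digit on both sides.
    private static final Pattern DECIMAL_DOT = Pattern.compile("(?<=\\d)\\.(?=\\d)");

    static String commasInsideP(String xml) {
        Matcher m = P_CONTENT.matcher(xml);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Second pass: only the element body is rewritten.
            String fixed = DECIMAL_DOT.matcher(m.group()).replaceAll(",");
            m.appendReplacement(out, Matcher.quoteReplacement(fixed));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(commasInsideP("<p>23.5</p>"));
        // <p>23,5</p>
        System.out.println(commasInsideP("Mr. John <p>23.5</p> Hello 12.2324"));
        // Mr. John <p>23,5</p> Hello 12.2324
    }
}
```

For anything beyond simple, flat markup, a real XML parser is likely the safer tool.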

Java: Weirdness in replaceAll RegEx

I'm trying to manipulate a String in Java to recognize the markdown options in Facebook Messenger.
I tested the RegEx in a couple of online testers and it worked, but when I tried to implement in Java, it's only recognizing text surrounded by underscores. I have an example that shows the problem here:
private String process(String input) {
    String processed = input.replaceAll("(\\b|^)\\_(.*)\\_(\\b|$)", "underscore")
            .replaceAll("(\\b|^)\\*(.*)\\*(\\b|$)", "star")
            .replaceAll("(\\b|^)```(.*)```(\b|$)", "backticks")
            .replaceAll("(\\b|^)\\~(.*)\\~(\\b|$)", "tilde")
            .replaceAll("(\\b|^)\\`(.*)\\`(\\b|$)", "tick")
            .replaceAll("(\\b|^)\\\\\\((.*)\\\\\\)(\\b|$)", "backslashparen")
            .replaceAll("\\*", "%"); // am I matching stars wrong?
    return processed;
}
public void test() {
    String example = "_Text_\n" +
            "*text*\n" +
            "~Text~\n" +
            "`Text`\n" +
            "_Text_\n" + // is it only matching the first one?
            "``` Text ```\n" +
            "\\(Text\\)\n" +
            "~Text~\n";
    System.out.println(process(example));
}
I expected all the lines to match and be replaced, but only the first line was matched. I wondered if it was because it was the first line, so I copied it into the middle and it matched both. Then I figured I might have missed something when matching the special characters, so I added the snippet that matches the asterisks and replaces them with a percent sign, and it worked. The output I'm getting is like so:
underscore
%text%
~Text~
`Text`
underscore
``` Text ```
\(Text\)
~Text~
Any ideas what I might be missing?
Thanks.
If you're using word boundaries, there is no need to match anchors in an alternation, because a word boundary also matches at the start and end of the string when a word character is adjacent. So these are actually redundant:
(?:^|\b)
(?:\b|$)
and both can just be replaced by \b.
However, looking at your regex, note that only the underscore is considered a word character; *, ~ and ` are not. Hence \b cannot be used around those characters, and \B, which is the inverse of \b, should be used instead.
Besides this, some more improvements can be made, such as using a negated character class instead of the greedy .* and removing the unnecessary groups.
Code:
class MyRegex {
    public static void main (String[] args) {
        String example = "_Text_\n" +
                "*text*\n" +
                "~Text~\n" +
                "`Text`\n" +
                "_Text_\n" + // is it only matching the first one?
                "``` Text ```\n" +
                "\\(Text\\)\n" +
                "~Text~\n";
        System.out.println(process(example));
    }
    private static String process(String input) {
        String processed = input.replaceAll("\\b_[^_]+_\\b", "underscore")
                .replaceAll("\\B\\*[^*]+\\*\\B", "star")
                .replaceAll("\\B```.+?```\\B", "backticks")
                .replaceAll("\\B~[^~]+~\\B", "tilde")
                .replaceAll("\\B`[^`]+`\\B", "tick")
                .replaceAll("\\B\\\\\\(.*?\\\\\\)\\B", "backslashparen");
        return processed;
    }
}
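To see the difference between \b and \B in isolation, here is a small sketch (BoundaryDemo and its method names are only for illustration):

```java
class BoundaryDemo {
    // \b needs a word character on exactly one side of the position,
    // so it cannot match between the start of the string and '*'.
    static String starWithWordBoundary(String s) {
        return s.replaceAll("\\b\\*[^*]+\\*\\b", "star");
    }

    // \B matches at every position where \b does not.
    static String starWithNonWordBoundary(String s) {
        return s.replaceAll("\\B\\*[^*]+\\*\\B", "star");
    }

    // '_' is a word character, so \b works around underscores.
    static String underscoreWithWordBoundary(String s) {
        return s.replaceAll("\\b_[^_]+_\\b", "underscore");
    }

    public static void main(String[] args) {
        System.out.println(starWithWordBoundary("*text*"));       // *text*
        System.out.println(starWithNonWordBoundary("*text*"));    // star
        System.out.println(underscoreWithWordBoundary("_Text_")); // underscore
    }
}
```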

Using a regex to match a word ending in a comma but not within another word

I want to use a regex to achieve two objectives: match a string only when it is a complete word (don't match "on" inside of "contact"), and match strings that end with a comma or period.
This is an example. It is meant to find the string (str2) in str and replace it with the same string surrounded by parentheses.
while(scan2.hasNext()) {
    String str2 = scan2.next();
    str = str.replaceAll("\\b" + str2 + "\\b", "(" + str2 + ")");
}
It does avoid matching strings within words, but it ignores strings that end in a comma or period.
How would I do this?
public class Main {
    public static void main(String[] args) {
        System.out.println(replace("upon contact", "on"));
        System.out.println(replace("upon contact,", "contact"));
        System.out.println(replace("upon contact", "contact"));
    }
    private static String replace(String s1, String s2) {
        return s1.replaceAll(String.format("\\b(%s)\\b(?=[.,])", s2), "\\($1\\)");
    }
}
upon contact // matches only complete words
upon (contact), // replaces match with (match)
upon contact // only matches if ends with , or .
The following regex matches string ending with comma/period or string composed by a single complete word:
(?s)(^(?<A>\b\w+\b)$)|((?s)^(?<B>.+(?<=[,.]))$)
See also https://regex101.com/r/E78rQV/1/ for more explanations.
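To check which branch of that pattern fires for a given input, the named groups A and B can be read back from the Matcher. A small sketch (NamedGroupDemo and classify are hypothetical names, not part of the answer):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NamedGroupDemo {
    private static final Pattern P =
            Pattern.compile("(?s)(^(?<A>\\b\\w+\\b)$)|((?s)^(?<B>.+(?<=[,.]))$)");

    // Returns "A" for a single complete word, "B" for text ending in ',' or '.',
    // and "none" when the pattern does not match at all.
    static String classify(String s) {
        Matcher m = P.matcher(s);
        if (!m.matches()) {
            return "none";
        }
        // A group that did not take part in the match yields null.
        return m.group("A") != null ? "A" : "B";
    }

    public static void main(String[] args) {
        System.out.println(classify("contact"));       // A
        System.out.println(classify("upon contact,")); // B
        System.out.println(classify("upon contact"));  // none
    }
}
```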
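I took the liberty of adding the exclamation point and question mark.
Brackets match any one of the characters inside them. Java does not accept \b inside brackets, so the boundary is combined with the bracket in a lookahead, which also leaves the punctuation in place:
str = str.replaceAll("\\b" + str2 + "(?=[.,!?]|\\b)", "(" + str2 + ")");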
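Since str2 comes from a Scanner in the question, it may contain regex metacharacters or trailing punctuation of its own (e.g. the token "contact,"). A hedged sketch that quotes the term and uses lookarounds instead of \b, so the term is matched only when no word character touches it on either side (WholeWordDemo and parenthesize are names invented for this example):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WholeWordDemo {
    // Wraps each stand-alone occurrence of term in parentheses.
    static String parenthesize(String text, String term) {
        // Pattern.quote treats the term literally; the lookarounds reject
        // matches that have a word character directly before or after.
        String regex = "(?<!\\w)" + Pattern.quote(term) + "(?!\\w)";
        // quoteReplacement protects '$' and '\' in the replacement text.
        return text.replaceAll(regex, Matcher.quoteReplacement("(" + term + ")"));
    }

    public static void main(String[] args) {
        String text = "upon contact, on time.";
        System.out.println(parenthesize(text, "on"));       // upon contact, (on) time.
        System.out.println(parenthesize(text, "contact"));  // upon (contact), on time.
        System.out.println(parenthesize(text, "contact,")); // upon (contact,) on time.
    }
}
```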

How can I replace everything between 2nd "." and ":" in java?

Been researching online but haven't been able to find a solution.
I've got the following string '555.8.0.i5:790.2.0.i19:904.1.0:8233.2:' in Java.
What's the best way to remove everything from the second dot (inclusive) up to the colon?
I want the string to end up looking like this: 555.8:790.2:904.1:8233.2:
I saw on another post someone had referenced the second dot with java regex (\d+.\d.) but I'm not sure how to do the trim.
EDIT:
I have tried the following java regex .replaceAll("\\.(.*?):", ":"); but it seems to remove everything from the first dot. Not sure how to get it to trim from the second dot.
In your case, you may use
.replaceAll("(\\.[^:.]+)\\.[^:]+", "$1")
Details:
(\\.[^:.]+) - Capture group 1 capturing a dot and 1+ chars other than a literal dot and colon
\\. - a literal dot
[^:]+ - 1+ chars other than a colon.
In the replacement pattern, only a $1 backreference to the value captured in Group 1 is used.
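As a quick check, the call can be wrapped in a small runnable class (SecondDotDemo and trimAfterSecondField are illustrative names only):

```java
class SecondDotDemo {
    // Keeps ".<field>" after the first dot and drops the rest of the segment up to the colon.
    static String trimAfterSecondField(String s) {
        return s.replaceAll("(\\.[^:.]+)\\.[^:]+", "$1");
    }

    public static void main(String[] args) {
        String input = "555.8.0.i5:790.2.0.i19:904.1.0:8233.2:";
        System.out.println(trimAfterSecondField(input));
        // 555.8:790.2:904.1:8233.2:
    }
}
```

Segments that have fewer than two dots, such as 8233.2, are left untouched because the second \\. has nothing to match.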
Do you have to use regex? Here is a solution using Java:
public class Main {
    public static void main(String[] args) {
        String myString = "555.8.0.i5:790.2.0.i19:904.1.0:8233.2:";
        StringBuilder sb = new StringBuilder();
        // Split the string into an array of strings at each colon
        String[] stringParts = myString.split(":");
        // Loop over each substring
        for (String stringPart : stringParts) {
            // Find the index of the second dot
            int secondDotIndex = stringPart.indexOf('.', 1 + stringPart.indexOf('.', 1));
            // If a second dot exists then remove everything after and including the dot
            if (secondDotIndex != -1) {
                stringPart = stringPart.substring(0, secondDotIndex);
            }
            // Append each string part and colon back to the final string
            sb.append(stringPart);
            sb.append(":");
        }
        System.out.println(sb.toString());
    }
}
The final println prints 555.8:790.2:904.1:8233.2:

java regex match string containing words with no digits and optionally separated by comma

Inspired by a previous question, I'm trying to find a regex that matches a string containing at least one word formed by only characters, not digits. So \w is not applicable. Comma separated words are ok only if there are not two commas in a row.
The best I've found is:
(.*\s+,?)*([a-zA-Z]+)+(,?\s+.*)*
but it doesn't match the following strings:
aaaaa,11111
11111,aaaaa
11111,aaaaa,
,aaaaa
aaaaa,
,aaaaa,
aaaaa,11111,,
,,aaaaa,bbbbb
aaaaa,,bbbbb,ccccc
aaaaa,bbbbb,,ccccc
aaaaa,bbbbb,ccccc
aaaaa,11111
Here's a test program to determine if a regex is correct:
import java.util.*;
import java.lang.*;
import java.io.*;
class Ideone
{
    public static void main (String[] args) throws java.lang.Exception
    {
        String regex = "(.*\\s+,?)*([a-zA-Z]+)+(,?\\s+.*)*";
        String shouldMatch[] = new String[] {
            "aaaaa",
            "aaaaa bbbbb",
            "aaaaa 11111",
            "11111 aaaaa",
            "aaaaa,11111",
            "aaaaa, 11111",
            "aaaaa, 11111",
            "11111,aaaaa",
            "11111, aaaaa",
            "11111, aaaaa",
            "11111,aaaaa,",
            ",aaaaa",
            "aaaaa,",
            ",aaaaa,",
            "aaaaa,11111,,",
            ",,aaaaa,bbbbb",
            "aaaaa1111 bbbbb",
            "aaaaa1111 bbbbb ccccc",
            "aaaaa1111bbbbb ccccc",
            "aaaaa11111bbbbb ccccc 22222",
            ",,aaaaa bbbbb",
            "aaaaa,,bbbbb ccccc",
            "aaaaa,,bbbbb,ccccc",
            "aaaaa,bbbbb,,ccccc",
            "aaaaa,bbbbb,ccccc",
            "aaaaa,11111"
        };
        String shouldNotMatch[] = new String[] {
            "aaaaa11111",
            "11111bbbbb",
            "aaaaa11111bbbbb",
            "aaaaa11111bbbbb 11111ccccc",
            "aaaaa11111bbbbb ccccc11111",
            "aaaaa,,bbbbb",
            "aaaaa,,11111",
            ",,aaaaa",
            "aaaaa,,",
            "11111",
            "11111,22222",
            "11111 22222",
            ""
        };
        boolean result = true;
        for(String stringToTest : shouldMatch){
            if (!(stringToTest.matches(regex))){
                System.out.println(stringToTest + " doesn't match. WRONG.");
                result = false;
            }
        }
        for(String stringToTest : shouldNotMatch){
            if (stringToTest.matches(regex)){
                System.out.println(stringToTest + " matches. WRONG.");
                result = false;
            }
        }
        if (result){
            System.out.println("Congratulations, your regex is right.");
        }
        else {
            System.out.println("The result of one or more tests is wrong.");
        }
    }
}
Edit: Added some more strings that should not match the regex: the empty string and numbers only (plus commas or spaces).
This works, I checked with your test program:
String regex = "^.*?(?<=\\s|^|,)(?<!,,)[A-Za-z]+(?!,,)(?=\\s|,|$).*$";
^ "begins with"
.*? non-greedy for any non-newline character
(?<=\\s|^|,) Positive look-behind for whitespace, the beginning of the string, or a comma, since these are the only things that can validly come before our definition of a word
(?<!,,) Negative look-behind for ,, since two commas in a row are not allowed before a word
[A-Za-z]+ 1 or more letters
(?!,,) Negative look-ahead for ,, since two commas in a row are not allowed after a word
(?=\\s|,|$) Positive look-ahead for whitespace, a comma, or the end of the string, since these are the only things that can validly come after our definition of a word
$ "ends with"
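The pieces listed above can be tried on a handful of inputs from the question's test program. A short sketch (LookaroundWordDemo and hasPlainWord are names chosen here, not from the question):

```java
class LookaroundWordDemo {
    private static final String REGEX =
            "^.*?(?<=\\s|^|,)(?<!,,)[A-Za-z]+(?!,,)(?=\\s|,|$).*$";

    // True when the string contains at least one letters-only word
    // that is not directly next to a double comma.
    static boolean hasPlainWord(String s) {
        return s.matches(REGEX);
    }

    public static void main(String[] args) {
        System.out.println(hasPlainWord("aaaaa"));              // true
        System.out.println(hasPlainWord("11111,aaaaa"));        // true
        System.out.println(hasPlainWord("aaaaa,bbbbb,,ccccc")); // true
        System.out.println(hasPlainWord("aaaaa11111"));         // false
        System.out.println(hasPlainWord("aaaaa,,bbbbb"));       // false
        System.out.println(hasPlainWord("11111"));              // false
    }
}
```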
Based on your examples, the following should work:
String regex = "(?i)(?=.*?(?<!,,)\\b[a-z]+\\b(?!,,))[, \\w]+";
