Been researching online but haven't been able to find a solution.
I've got the following string '555.8.0.i5:790.2.0.i19:904.1.0:8233.2:' in Java.
What's the best way to remove everything from the second dot (inclusive) up to the colon?
I want the string to end up looking like this: 555.8:790.2:904.1:8233.2:
I saw on another post someone had referenced the second dot with java regex (\d+.\d.) but I'm not sure how to do the trim.
EDIT:
I have tried the following java regex .replaceAll("\\.(.*?):", ":"); but it seems to remove everything from the first dot. Not sure how to get it to trim from the second dot.
In your case, you may use
.replaceAll("(\\.[^:.]+)\\.[^:]+", "$1")
Details:
(\\.[^:.]+) - Capture group 1 capturing a dot and 1+ chars other than a literal dot and colon
\\. - a literal dot
[^:]+ - 1+ chars other than a colon.
In the replacement pattern, only a $1 backreference to the value captured in Group 1 is used.
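A minimal, self-contained sketch of the replacement described above, applied to the string from the question. The class name SecondDotTrim and the method name trimAfterSecondDot are made up for illustration only.

```java
final class SecondDotTrim {
    // Keeps the first ".xxx" of each colon-delimited part (group 1) and drops
    // everything from the second dot up to, but not including, the colon.
    static String trimAfterSecondDot(String input) {
        return input.replaceAll("(\\.[^:.]+)\\.[^:]+", "$1");
    }

    public static void main(String[] args) {
        String s = "555.8.0.i5:790.2.0.i19:904.1.0:8233.2:";
        System.out.println(trimAfterSecondDot(s)); // 555.8:790.2:904.1:8233.2:
    }
}
```

Parts that contain only one dot, such as 8233.2, are left untouched because the pattern requires a second literal dot before it can match.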
Do you have to use regex? Here is a solution using Java:
public static void main(String[] args) {
    String myString = "555.8.0.i5:790.2.0.i19:904.1.0:8233.2:";
    StringBuilder sb = new StringBuilder();
    // Split the string into an array of strings at each colon
    String[] stringParts = myString.split(":");
    // Loop over each substring
    for (String stringPart : stringParts) {
        // Find the index of the second dot (-1 if there is no first or second dot)
        int firstDotIndex = stringPart.indexOf('.');
        int secondDotIndex = firstDotIndex == -1 ? -1 : stringPart.indexOf('.', firstDotIndex + 1);
        // If a second dot exists then remove everything after and including the dot
        if (secondDotIndex != -1) {
            stringPart = stringPart.substring(0, secondDotIndex);
        }
        // Append each string part and colon back to the final string
        sb.append(stringPart);
        sb.append(":");
    }
    System.out.println(sb.toString());
}
The final println prints 555.8:790.2:904.1:8233.2:
I'm trying to manipulate a String in Java to recognize the markdown options in Facebook Messenger.
I tested the RegEx in a couple of online testers and it worked, but when I tried to implement in Java, it's only recognizing text surrounded by underscores. I have an example that shows the problem here:
private String process(String input) {
    String processed = input.replaceAll("(\\b|^)\\_(.*)\\_(\\b|$)", "underscore")
            .replaceAll("(\\b|^)\\*(.*)\\*(\\b|$)", "star")
            .replaceAll("(\\b|^)```(.*)```(\b|$)", "backticks")
            .replaceAll("(\\b|^)\\~(.*)\\~(\\b|$)", "tilde")
            .replaceAll("(\\b|^)\\`(.*)\\`(\\b|$)", "tick")
            .replaceAll("(\\b|^)\\\\\\((.*)\\\\\\)(\\b|$)", "backslashparen")
            .replaceAll("\\*", "%"); // am I matching stars wrong?
    return processed;
}
public void test() {
    String example = "_Text_\n" +
            "*text*\n" +
            "~Text~\n" +
            "`Text`\n" +
            "_Text_\n" + // is it only matching the first one?
            "``` Text ```\n" +
            "\\(Text\\)\n" +
            "~Text~\n";
    System.out.println(process(example));
}
I expected all the lines to match and be replaced, but only the first line was matched. I wondered if that was because it was the first line, so I copied it into the middle and both copies matched. Then I figured I might have missed something in matching the special characters, so I added the snippet that matches the asterisks and replaces them with a percent sign, and that worked. The output I'm getting is:
underscore
%text%
~Text~
`Text`
underscore
``` Text ```
\(Text\)
~Text~
Any ideas what I might be missing?
Thanks.
If you're using word boundaries, there is no need to match anchors in alternation, because a word boundary also matches at the start and end of the input (when the adjacent character is a word character). So these are redundant:
(?:^|\b)
(?:\b|$)
and both can simply be replaced by \b.
However, looking at your regex, note that of your delimiters only the underscore is a word character; *, ~ and ` are not. Hence \b cannot be used around those characters, and \B, the inverse of \b, should be used instead.
Besides this, a few more improvements are possible, such as using a negated character class instead of a greedy .* and removing the unnecessary groups.
Code:
class MyRegex {
    public static void main (String[] args) {
        String example = "_Text_\n" +
                "*text*\n" +
                "~Text~\n" +
                "`Text`\n" +
                "_Text_\n" + // is it only matching the first one?
                "``` Text ```\n" +
                "\\(Text\\)\n" +
                "~Text~\n";
        System.out.println(process(example));
    }
    private static String process(String input) {
        String processed = input.replaceAll("\\b_[^_]+_\\b", "underscore")
                .replaceAll("\\B\\*[^*]+\\*\\B", "star")
                .replaceAll("\\B```.+?```\\B", "backticks")
                .replaceAll("\\B~[^~]+~\\B", "tilde")
                .replaceAll("\\B`[^`]+`\\B", "tick")
                .replaceAll("\\B\\\\\\(.*?\\\\\\)\\B", "backslashparen");
        return processed;
    }
}
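To see the \b versus \B point in isolation, here is a small sketch that applies both variants to a single starred word. The class and method names (BoundaryDemo, withWordBoundary, withNonWordBoundary) are invented for this illustration.

```java
final class BoundaryDemo {
    // \b needs a word character on exactly one side, so it cannot match
    // between the start of the input and a '*'.
    static String withWordBoundary(String s) {
        return s.replaceAll("\\b\\*[^*]+\\*\\b", "star");
    }

    // \B matches where both sides are non-word (or both are word) characters,
    // which is the situation around '*' at the edges of the input.
    static String withNonWordBoundary(String s) {
        return s.replaceAll("\\B\\*[^*]+\\*\\B", "star");
    }

    public static void main(String[] args) {
        System.out.println(withWordBoundary("*text*"));    // *text*
        System.out.println(withNonWordBoundary("*text*")); // star
    }
}
```

The underscore case behaves the other way round: since _ is a word character, \b_[^_]+_\b matches _Text_ at the edges of a line.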
I'm trying to use the String.replaceAll() method with regex to only keep letter characters and ['-_]. I'm trying to do this by replacing every character that is neither a letter nor one of the characters above by an empty string.
So far I have tried something like this (in different variations) which correctly keeps letters but replaces the special characters I want to keep:
current = current.replaceAll("(?=\\P{L})(?=[^\\'-_])", "");
Make it simpler:
current = current.replaceAll("[^a-zA-Z'_-]", "");
Explanation:
The pattern matches any character other than a-z, A-Z, ', _ and -, and replaceAll() replaces each matched character with nothing.
Tested input : "a_zE'R-z4r#m"
Output : a_zE'R-zrm
You don't need lookahead, just use negated regex:
current = current.replaceAll("[^\\p{L}'_-]+", "");
[^\\p{L}'_-] will match anything that is not a letter (unicode) or single quote or underscore or hyphen.
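The difference between the ASCII-only class and the \p{L} class only shows up with non-ASCII letters. The sketch below compares the two; the class and method names are hypothetical, and the accented letter is written as a Unicode escape so the source stays ASCII.

```java
final class LetterFilter {
    // Keeps only a-z, A-Z, apostrophe, underscore and hyphen.
    static String keepAsciiLetters(String s) {
        return s.replaceAll("[^a-zA-Z'_-]", "");
    }

    // Keeps any Unicode letter plus apostrophe, underscore and hyphen.
    static String keepUnicodeLetters(String s) {
        return s.replaceAll("[^\\p{L}'_-]+", "");
    }

    public static void main(String[] args) {
        System.out.println(keepAsciiLetters("a_zE'R-z4r#m"));   // a_zE'R-zrm
        System.out.println(keepUnicodeLetters("a_zE'R-z4r#m")); // a_zE'R-zrm
        // "caf\u00e9-1" is "cafe" with an acute accent on the e, then "-1"
        System.out.println(keepAsciiLetters("caf\u00e9-1"));    // caf-
        System.out.println(keepUnicodeLetters("caf\u00e9-1"));  // the accented word followed by '-'
    }
}
```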
Your regex is too complicated. Just specify the characters you want to keep, and use ^ to negate, so [^a-z'_-] means "anything but these".
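public class Replacer {
    public static void main(String[] args) {
        // Note: \\w also covers digits and underscore, so this prints "with1234-chars";
        // use [^a-zA-Z'_-] instead to keep letters only.
        System.out.println("with 1234 &*()) -/.,>>?chars".replaceAll("[^\\w'_-]", ""));
    }
}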
You can try this:
String str = "Se#rbi323a'and_Eur$ope#-t42he-[A%merica]";
str = str.replaceAll("[\\d+\\p{Punct}&&[^-'_\\[\\]]]+", "");
System.out.println("str = " + str);
And this is the result:
str = Serbia'and_Europe-the-[America]
I'm working on a text to HTML parser. I'm using the "##" notation to mark a Bold character. Ex.
Example ##Bold text in a paragraph
Turns to:
Example <strong>Bold</strong> text in a paragraph
The following code works, however I've found out that it works just on the last Bold notation found:
private static String escapeBold(String sCurrentLine) {
    if (sCurrentLine.indexOf("##") < 0) {
        return sCurrentLine;
    }
    String newString = null;
    String oldString = null;
    String chars[] = sCurrentLine.split(" ");
    for (String s : chars) {
        if (s.startsWith("##")) {
            newString = "<strong>" + s.replaceAll("##", "") + "</strong>";
            oldString = s;
        }
    }
    return (sCurrentLine.replaceAll(oldString, newString));
}
Is there a simpler way to do it, maybe with a RegExpr ?
Thanks!
Your method could look like this:
private static String escapeBold(String sCurrentLine) {
    return sCurrentLine.replaceAll("##(\\w+)", "<strong>$1</strong>");
}
It finds each ##someWord and places the someWord part in group 1. In the replacement we refer to the match stored in group 1 via $1 and simply surround it with <strong> tags.
To understand this code you need to know that replaceAll(regex, replacement) uses a regular expression (regex) to find the part we want to modify, and replacement describes how we want to modify it.
In regex, \\w represents characters in the ranges a-z, A-Z, 0-9 and _. If you want to include other characters you can create your own character class, or use \\S, which represents any non-whitespace character.
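As a rough illustration of the \\w versus \\S choice, the sketch below runs both variants on a line with two markers, one of them hyphenated. The class name BoldMarkup and both method names are made up for this example.

```java
final class BoldMarkup {
    // Bolds the run of word characters (letters, digits, underscore) after "##".
    static String boldWordChars(String line) {
        return line.replaceAll("##(\\w+)", "<strong>$1</strong>");
    }

    // Bolds everything after "##" up to the next whitespace character.
    static String boldNonWhitespace(String line) {
        return line.replaceAll("##(\\S+)", "<strong>$1</strong>");
    }

    public static void main(String[] args) {
        String line = "Example ##Bold text, ##semi-bold too";
        System.out.println(boldWordChars(line));
        // Example <strong>Bold</strong> text, <strong>semi</strong>-bold too
        System.out.println(boldNonWhitespace(line));
        // Example <strong>Bold</strong> text, <strong>semi-bold</strong> too
    }
}
```

Both variants handle every marker on the line, not just the last one. Note that \\S would also pull trailing punctuation such as a comma into the bold span when it directly follows the word.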
I have seen "," replaced with "." by using ".$"|",$", but this logic does not work with letters.
I need to replace the last letter of every word in the string EXAMPLE_TEST with another letter, using Java.
This is my code:
Pattern replace = Pattern.compile("n$");//here got the real problem
matcher2 = replace.matcher(EXAMPLE_TEST);
EXAMPLE_TEST=matcher2.replaceAll("k");
I also tried "//n$", "\n$" etc.
Please help me find a solution.
Input text => njan ayman
Output text => njak aymak
Instead of the end of string $ anchor, use a word boundary \b
String s = "njan ayman";
s = s.replaceAll("n\\b", "k");
System.out.println(s); //=> "njak aymak"
You can use lookahead and group matching:
String EXAMPLE_TEST = "njan ayman";
String s = EXAMPLE_TEST.replaceAll("(n)(?=\\s|$)", "k");
System.out.println("s = " + s); // prints: s = njak aymak
Explanation:
(n) - the matched word character
(?=\\s|$) - which is followed by a space or at the end of the line (lookahead)
The above is only an example! If you want to switch every comma with a period, the middle line should be changed to:
s = s.replaceAll("(,)(?=\\s|$)", "\\.");
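The word-boundary and lookahead approaches agree on the sample input but differ once punctuation follows a word. A small sketch comparing them (the class and method names are hypothetical):

```java
final class LastLetterDemo {
    // Replaces an 'n' that sits at the end of a word.
    static String viaWordBoundary(String s) {
        return s.replaceAll("n\\b", "k");
    }

    // Replaces an 'n' only when whitespace or the end of the input follows.
    static String viaLookahead(String s) {
        return s.replaceAll("n(?=\\s|$)", "k");
    }

    public static void main(String[] args) {
        System.out.println(viaWordBoundary("njan ayman"));  // njak aymak
        System.out.println(viaLookahead("njan ayman"));     // njak aymak
        System.out.println(viaWordBoundary("njan, ayman")); // njak, aymak
        System.out.println(viaLookahead("njan, ayman"));    // njan, aymak
    }
}
```

Which behavior is wanted depends on whether a word followed by a comma or period should still count as ending in that letter.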
Here's how I would set it up:
(?=.\b)\w
Which in Java would need to be escaped as following:
(?=.\\b)\\w
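It translates to something like "a word character (\w) that is immediately followed by a word boundary (\b)": the lookahead (?=.\b) checks that the next single character (.) sits at the end of a word, and \w then matches that character.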
String s = "njan ayman aowkdwo wdonwan. wadawd,.. wadwdawd;";
s = s.replaceAll("(?=.\\b)\\w", "");
System.out.println(s); //nja ayma aowkdw wdonwa. wadaw,.. wadwdaw;
This removes the last character of all words, but leaves following non-alphanumeric characters. You can specify only specific characters to remove/replace by changing the . to something else.
However, the other answers are perfectly good and might achieve exactly what you are looking for.
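As a sketch of the "change the . to something else" remark, restricting the lookahead to n and supplying a replacement letter gives the result asked for in the question. The class and method names here are invented for illustration.

```java
final class LookaheadLastLetter {
    // The lookahead requires the next character to be 'n' followed by a word
    // boundary; \w then consumes that 'n', which is replaced by 'k'.
    static String replaceFinalN(String s) {
        return s.replaceAll("(?=n\\b)\\w", "k");
    }

    public static void main(String[] args) {
        System.out.println(replaceFinalN("njan ayman")); // njak aymak
        System.out.println(replaceFinalN("nun noon"));   // nuk nook
    }
}
```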
// "n" and "k" stand in for the old and new last letters
if (word.endsWith("n")) {
    word = word.substring(0, word.length() - 1) + "k";
}
For the string value "ABC_12" (including the quotes), I would like to extract only the content, excluding the double quotes, i.e. ABC_12. My code is:
private static void checkRegex()
{
    final Pattern stringPattern = Pattern.compile("\"([a-zA-Z_0-9])+\"");
    Matcher findMatches = stringPattern.matcher("\"ABC_12\"");
    if (findMatches.matches())
        System.out.println("Match found" + findMatches.group(0));
}
Now I have tried doing findMatches.group(1);, but that only returns the last character in the string (I did not understand why !).
How can I extract only the content leaving out the double quotes?
Try this regex:
Pattern.compile("\"([a-zA-Z_0-9]+)\"");
OR
Pattern.compile("\"([^\"]+)\"");
The problem in your code is a misplaced + outside the right parenthesis. It makes the capturing group capture only one character per repetition (since the + is outside); each repetition overwrites the previous capture, and that's why you end up with only the last character.
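The effect of moving the + can be checked directly. This is a minimal sketch; the class GroupRepetitionDemo and the helper firstGroup are hypothetical names used only here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class GroupRepetitionDemo {
    // Returns group 1 if the whole input matches the regex, otherwise null.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String input = "\"ABC_12\"";
        // + outside the group: the group is repeated and keeps only its last iteration
        System.out.println(firstGroup("\"([a-zA-Z_0-9])+\"", input)); // 2
        // + inside the group: the group captures the whole run of characters
        System.out.println(firstGroup("\"([a-zA-Z_0-9]+)\"", input)); // ABC_12
    }
}
```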
A nice simple (read: non-regex) way to do this is:
String myString = "\"ABC_12\"";
String myFilteredString = myString.replace("\"", ""); // replace() treats its argument literally; replaceAll() would compile it as a regex
System.out.println(myFilteredString);
gets you
ABC_12
You should change your pattern to this:
final Pattern stringPattern = Pattern.compile("\"([a-zA-Z_0-9]+)\"");
Note that the + sign was moved inside the group, since you want the character repetition to be part of the group. In the code you posted, what you were actually searching for was a repetition of the group, which consisted of a single occurrence of a single character in [a-zA-Z_0-9].
If your pattern is strictly any text in between double quotes, then you may be better off using substring:
String str = "\"ABC_12\"";
System.out.println(str.substring(1, str.lastIndexOf('\"')));
Assuming it is a bit more complex (double quotes in between a larger string), you can use the split() function in the Pattern class and use \" as your regex - this will split the string around the \" so you can easily extract the content you want
Pattern p = Pattern.compile("\"");
// Split input with the pattern
String[] result = p.split(str);
for (int i = 0; i < result.length; i++) {
    System.out.println(result[i]);
}
http://docs.oracle.com/javase/1.4.2/docs/api/java/util/regex/Pattern.html#split%28java.lang.CharSequence%29