I have strings "ABC DE", "ABC FE", "ABC RE".
How to replace the characters between ABC and E using regex?
Trying to do this with a regex and replace
str.replace((ABC )[^*](E), 'G');
If you want to replace any characters that appear between "ABC " and "E", you could accomplish this via lookarounds (a lookbehind and a lookahead) and the replaceAll() method:
String[] strings = { "ABC DE", "ABC FE", "ABC RE" };
for(int s = 0; s < strings.length; s++){
// Update each string, replacing these characters with a G
strings[s] = strings[s].replaceAll("(?<=ABC ).*(?=E)", "G");
}
Likewise, if you don't explicitly want the space after "ABC", simply remove it from the lookbehind by using (?<=ABC).*(?=E).
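You probably want to use replaceAll with "ABC(.*?)E" instead. Note that replaceAll substitutes the whole match, including ABC and E, so they have to be written back in the replacement:
str = str.replaceAll("ABC(.*?)E", "ABC GE");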
Explanation:
ABC matches the characters ABC literally (case sensitive)
1st Capturing group (.*?)
.*? matches any character (except newline)
Quantifier: *? Between zero and unlimited times, as few times as possible, expanding as needed [lazy]
E matches the character E literally (case sensitive)
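Because replaceAll substitutes the entire match, one way to keep the delimiters is to capture them and refer back to them in the replacement. The following is only a minimal sketch; the class and the replaceBetween helper are hypothetical names, not part of the answers above.

```java
import java.util.regex.Matcher;

class ReplaceBetween {
    // Hypothetical helper: replaces whatever sits between "ABC " and the next "E".
    static String replaceBetween(String input, String replacement) {
        // $1 and $2 re-insert the captured delimiters; quoteReplacement escapes
        // any '$' or '\' in the caller-supplied replacement text.
        return input.replaceAll("(ABC ).*?(E)",
                "$1" + Matcher.quoteReplacement(replacement) + "$2");
    }

    public static void main(String[] args) {
        for (String s : new String[] { "ABC DE", "ABC FE", "ABC RE" }) {
            System.out.println(replaceBetween(s, "G")); // prints "ABC GE" each time
        }
    }
}
```

The lazy .*? stops at the first E after "ABC ", which matters if the input can contain more than one E.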
I want to replace the substring src="/slm/attachment/63338424306/Note.jpg" with the substring jira/rally/images using a regular expression in Java.
Below is the query that fetches the list of strings containing the substring src="/slm/attachment/63338424306/Note.jpg"
criteria.add(Restrictions.like("comment", "%<img%"));
criteria.setMaxResults(1);
List<Comments> list = criteria.list();
How can I do the replacement using regex? Please help me here.
Let's say xxxxxxxxsrc="/slm/attachment/63338424306/Note.jpgxxxxxxxx is the string then after the replacement I am expecting xxxxxxxxsrc="jira/rally/images/Note.jpgxxxxxxxx
The number 63338424306 can be any random number.
The image name and format 'Note.jpg' can change, e.g. 'abc.png'.
Basically, I want to replace /slm/attachment/63338424306/ with jira/rally/images/
Thanks to all of you for your answers. I have updated the question a little bit; please help me with that.
yourString.replaceAll("src=\"/slm/attachment", "src=\"/jira/rally/images");
You could use a capturing group for the src=" part and match the part that you want to replace.
(src\s*=\s*")/slm/attachment/\d+
( Capture group
src\s*=\s*" Match src, 0+ whitespace chars, =, 0+ whitespace chars and "
) Close group
/slm/attachment/ Match literally
\d+ Match 1+ digits
Note that if you want to match 0+ spaces only and no newlines, you could use a space only or [ \t]* to match a space and tab instead of \s*
In Java
String regex = "(src\\s*=\\s*\")/slm/attachment/\\d+";
And use the first capturing group in the replacement:
$1jira/rally/images
Result:
src="jira/rally/images/Note.jpg
For example:
String string = "src = \"/slm/attachment/63338424306/Note.jpg";
System.out.println(string.replaceAll("(src\\s*=\\s*\")/slm/attachment/\\d+", "$1jira/rally/images"));
// src = "jira/rally/images/Note.jpg
You can use the following replacement sequences:
String a = "abc 123 src=\"/slm/attachment/63338424306/Note.jpg abc 132";
String b = "abc 123 src=\"/slm/attachment/61118424306/Note.jpg xyz";
String c = "123xxsrc=\"/slm/attachment/51238424306/Note.jpgxx324";
System.out.println(a.replaceAll("(?<=src=\")/slm/attachment/\\d+","jira/rally/images"));
System.out.println(b.replaceAll("(?<=src=\")/slm/attachment/\\d+","jira/rally/images"));
System.out.println(c.replaceAll("(?<=src=\")/slm/attachment/\\d+","jira/rally/images"));
output:
abc 123 src="jira/rally/images/Note.jpg abc 132
abc 123 src="jira/rally/images/Note.jpg xyz
123xxsrc="jira/rally/images/Note.jpgxx324
regex demo: https://regex101.com/r/ZtRg49/7/
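When the same rewrite has to run over many strings (for example the comment texts returned by the query in the question), the pattern can be compiled once and reused. This is a minimal sketch that assumes the texts are already available as plain strings; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class AttachmentRewriter {
    // Compiled once; the lookbehind keeps src=" out of the replaced text.
    private static final Pattern ATTACHMENT =
            Pattern.compile("(?<=src=\")/slm/attachment/\\d+");

    // Hypothetical helper: rewrites the attachment prefix in each input string.
    static List<String> rewriteAll(List<String> texts) {
        List<String> result = new ArrayList<>();
        for (String text : texts) {
            result.add(ATTACHMENT.matcher(text).replaceAll("jira/rally/images"));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> input = new ArrayList<>();
        input.add("xxsrc=\"/slm/attachment/63338424306/Note.jpgxx");
        input.add("xxsrc=\"/slm/attachment/42/abc.pngxx");
        for (String line : rewriteAll(input)) {
            System.out.println(line);
        }
    }
}
```

The file name and extension are left untouched because the pattern stops after the digits.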
I have string inside brackets like following format:
[space string space]
I want to extract the string if the string is in UUID format.
example : [ d6a413f4-059c-11e8-ba89-0ed5f89f718b ]
With java regular expression how can I get d6a413f4-059c-11e8-ba89-0ed5f89f718b ?
For your given example, you could use a lookaround to match what is between the [ and the ]:
(?<=\[ ).*?(?= \])
Explanation
(?<=\[ ) positive lookbehind to assert that what comes before is [ followed by a space
.*? match any character zero or more times non greedy
(?= \]) positive lookahead to assert that what follows is ]
For example:
String regex = "(?<=\\[ ).*?(?= \\])";
String string = "[ d6a413f4-059c-11e8-ba89-0ed5f89f718b ]";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(string);
while (matcher.find()) {
System.out.println(matcher.group(0));
}
Using regex
\[ ([a-f0-9]{8}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{12}) ]
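In Java the backslash has to be doubled inside the string literal, and the UUID is read from capture group 1. A minimal sketch follows; the class and method names are hypothetical, and the CASE_INSENSITIVE flag is an added assumption so that upper-case hex digits also match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BracketUuid {
    private static final Pattern BRACKETED_UUID = Pattern.compile(
            "\\[ ([a-f0-9]{8}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{12}) ]",
            Pattern.CASE_INSENSITIVE);

    // Returns the UUID text, or null when no "[ uuid ]" occurs in the input.
    static String extract(String input) {
        Matcher m = BRACKETED_UUID.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(extract("[ d6a413f4-059c-11e8-ba89-0ed5f89f718b ]"));
        System.out.println(extract("[ not-a-uuid ]")); // prints "null"
    }
}
```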
Why you don't want to do this
If you know that your string will definitely have the right format then you can just use substring to get the UUID
class Main {
public static void main(String... args) {
String s = "[ d6a413f4-059c-11e8-ba89-0ed5f89f718b ]";
System.out.println(s.substring(2, s.length()-2));
}
}
This will be faster than using the regex option.
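If the brackets are guaranteed but the content is not, the substring approach can be combined with a validity check. The sketch below uses java.util.UUID.fromString, which throws IllegalArgumentException for input it cannot parse. Be aware that, depending on the JDK version, fromString may accept some non-canonical forms, so this check is looser than a strict regex. The class and method names are hypothetical.

```java
import java.util.UUID;

class SubstringUuid {
    // Strips the leading "[ " and trailing " ]", then lets UUID.fromString validate.
    static String extractOrNull(String s) {
        if (s.length() < 4) {
            return null; // too short to hold "[ " and " ]"
        }
        String candidate = s.substring(2, s.length() - 2);
        try {
            return UUID.fromString(candidate).toString();
        } catch (IllegalArgumentException e) {
            return null; // not parseable as a UUID
        }
    }

    public static void main(String[] args) {
        System.out.println(extractOrNull("[ d6a413f4-059c-11e8-ba89-0ed5f89f718b ]"));
        System.out.println(extractOrNull("[ xyz ]")); // prints "null"
    }
}
```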
Regex to check if given String contains valid UUID:
"\\[ ([a-f0-9]{8}\\-(?:[a-f0-9]{4}\\-){3}[a-f0-9]{12}) \\]"
So, what is going on in this regex:
\\[ - the '[' character and a space after it
[a-f0-9]{8} - characters from 'a' to 'f' and from '0' to '9' exactly eight times (123e5670 part)
\\- - the '-' character
(?:[a-f0-9]{4}\\-){3} - non-capturing group that must be present exactly three times (the group contains exactly 4 characters in the range from 'a' to 'f' or from '0' to '9', followed by a '-' character) (a234-b234-c234- part)
[a-f0-9]{12} - characters from 'a' to 'f' and from '0' to '9' exactly twelve times (d23456789012 part)
\\] - a space and the ']' character
After searching the String for a match with the find() method, you print only capturing group #1 with the group(1) method (capturing group #1 is the part enclosed in the plain parentheses).
Your UUID is in capture group 1. Here is a simple example how you can get UUID from source String:
String source = "[ 123e5670-a234-b234-c234-d23456789012 ]";
Pattern p = Pattern.compile("\\[ ([a-f0-9]{8}\\-(?:[a-f0-9]{4}\\-){3}[a-f0-9]{12}) \\]");
Matcher m = p.matcher(source);
if(m.find()) {
System.out.println( m.group(1));
}
I want to split a string like this: "1.2 5" to be tokenized to {"1", ".", "2", "5"} (order matters), I was trying to do this with String.split() using the following regex: ([0-9])\w*|\. but this is what I want to match, not the delimiters.
Is there maybe another method that does this? Is it even possible to split two words that are connected while keeping both intact? (e.g split "1.2" like the above example)
More examples:
"1 2 8" => {"1", "2", "8"}
"1 122 .8" => {"1", "122", ".", "8"}
"1 2.800" => {"1", "2", ".", "800"}
This regex should work:
s.split("(?=\\.)(?<! )|(?<=\\.)| +")
It works by splitting at places in the string where:
the next character is a literal . (lookahead) and the preceding character is not a space (negative lookbehind)
the preceding character is a literal . (lookbehind)
there are one or more space characters
The Java split function removes any matching part of the string. The lookahead/lookbehind matches are zero-width, so split doesn't actually consume any of the string when splitting on them. A zero-width match basically just marks a position in the string to split at.
This solution works for all your given examples, and it also works for multiple spaces.
In response to your comment about the (?<! ) part of the regex: without that part, the pattern matches every space character, the position before every dot, and the position after every dot. One of your examples had a space followed by a . (e.g. "2 .8"), which would split like this:
["2", "", ".", "8"]
Note the empty string in the 2nd position. This is because it has split on the space, then found a position before a ., and split there too. The (?<! ) prevents this by saying "only split before a . if it's not preceded by a space character".
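To check the behaviour described above against the examples from the question, a small harness like the following can be used. It is only a sketch; the class and method names are hypothetical.

```java
import java.util.Arrays;

class DotSplit {
    // Splits before a dot (unless a space precedes it), after a dot, and on runs of spaces.
    static String[] tokenize(String s) {
        return s.split("(?=\\.)(?<! )|(?<=\\.)| +");
    }

    public static void main(String[] args) {
        String[] examples = { "1.2 5", "1 2 8", "1 122 .8", "1 2.800" };
        for (String example : examples) {
            System.out.println(example + " => " + Arrays.toString(tokenize(example)));
        }
    }
}
```

For "1 122 .8" this yields the four tokens 1, 122, . and 8, with no empty string between 122 and the dot.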
You don't need regex matching, java has a built-in StringTokenizer that is just for this.
Try this:
StringTokenizer st = new StringTokenizer("1.2 5", ". ");
while(st.hasMoreTokens()) {
System.out.println(st.nextToken());
}
Output:
1
2
5
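EDIT: if you want to include the delimiters, use new StringTokenizer(string, delimiters, true), where the third argument is returnDelims. In that case every delimiter is returned as its own token, so the output is as follows, plus one token holding the single space between 2 and 5: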
1
.
2
5
If you just want to return the dot, but not the space, skip it in the loop.
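Skipping the space in the loop could look like the sketch below. The class and method names are hypothetical; the three-argument StringTokenizer constructor is the standard one, with the last argument being returnDelims.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class DelimTokenizer {
    static List<String> tokenize(String input) {
        // true => the delimiters "." and " " are returned as one-character tokens
        StringTokenizer st = new StringTokenizer(input, ". ", true);
        List<String> tokens = new ArrayList<>();
        while (st.hasMoreTokens()) {
            String token = st.nextToken();
            if (!token.equals(" ")) { // keep the dot, drop the space
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("1.2 5"));    // prints [1, ., 2, 5]
        System.out.println(tokenize("1 122 .8")); // prints [1, 122, ., 8]
    }
}
```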
I'd rather collect all the non-digit and non-whitespace symbols with [^\d\s] and digits with a \d:
String s = "1.2 5";
Pattern pattern = Pattern.compile("\\d+|[^\\d\\s]+");
Matcher matcher = pattern.matcher(s);
List<String> lst = new ArrayList<>();
while (matcher.find()){
lst.add(matcher.group(0));
}
System.out.println(lst); // => [1, ., 2, 5]
Pattern details:
\d+ - 1 or more digits
| - or
[^\d\s]+ - one or more chars other than a whitespace or digit
If I have this String:
String line = "This, is Stack; Overflow.";
And want to split it into the following array of strings:
String[] array = ...
so the array contains this output:
["This",",","is","Stack",";","Overflow","."]
What regex should I pass to the split() method?
Just split your input on whitespace or on the boundaries between a word character and a non-word character (and vice versa, in the second variant below).
String s = "This, is Stack; Overflow.";
String parts[] = s.split("\\s|(?<=\\w)(?=\\W)");
System.out.println(Arrays.toString(parts));
\s matches any kind of whitespace character (not only a space), \w matches a word character and \W matches a non-word character.
(?<=\\w) Positive look-behind which asserts that the match must be preceded by a word character (a-z, A-Z, 0-9, _).
(?=\\W) Positive look-ahead which asserts that the match must be followed by a non-word character (any character other than a word character). So (?<=\\w)(?=\\W) matches only the boundary position, not a character.
Thus splitting the input on the matched whitespace and boundaries will give you the desired output.
OR
String s = "This, is Stack; Overflow.";
String parts[] = s.split("\\s|(?<=\\w)(?=\\W)|(?<=[^\\w\\s])(?=\\w)");
System.out.println(Arrays.toString(parts));
Output:
[This, ,, is, Stack, ;, Overflow, .]
You can do that with this pattern:
\\s+|(?<=\\S)(?=[^\\w\\s])|(?<=[^\\w\\s])\\b
It trims whitespace and deals with consecutive special characters. For example:
With ;This, is Stack; ;; Overflow.
you obtain: [";", "This", ",", "is", "Stack", ";", ";", ";", "Overflow", "."]
But obviously, the more efficient way is to not use the split method but the find method with this pattern:
\\w+|[^\\w\\s]
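A minimal sketch of the find-based approach with that pattern is shown below. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindTokenizer {
    // Either a run of word characters, or a single non-word, non-space character.
    private static final Pattern TOKEN = Pattern.compile("\\w+|[^\\w\\s]");

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("This, is Stack; Overflow."));
        // prints [This, ,, is, Stack, ;, Overflow, .]
    }
}
```

Matching the tokens directly avoids the lookarounds entirely, and whitespace never ends up in the result because neither alternative can match it.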
I have a string
String s="[[Identity (philosophy)|unique identity]]";
I need to parse it into:
s1 = Identity_philosophy
s2 = unique identity
I have tried following code
Pattern p = Pattern.compile("(\\[\\[)(\\w*?\\s\\(\\w*?\\))(\\s*[|])\\w*(\\]\\])");
Matcher m = p.matcher(s);
while(m.find())
{
....
}
But the pattern is not matching..
Please Help
Thanks
Use
String s="[[Identity (philosophy)|unique identity]]";
String[] results = s.replaceAll("^\\Q[[\\E|]]$", "") // Delete double brackets at start/end
.replaceAll("\\s+\\(([^()]*)\\)","_$1") // Replace spaces and parens with _
.split("\\Q|\\E"); // Split with pipe
System.out.println(results[0]);
System.out.println(results[1]);
Output:
Identity_philosophy
unique identity
You may use
String s="[[Identity (philosophy)|unique identity]]";
Matcher m = Pattern.compile("\\[{2}(.*)\\|(.*)]]").matcher(s);
if (m.matches()) {
System.out.println(m.group(1).replaceAll("\\W+", " ").trim().replace(" ", "_")); // => Identity_philosophy
System.out.println(m.group(2).trim()); // => unique identity
}
Details
The "\\[{2}(.*)\\|(.*)]]" pattern used with matches() must match the entire string, so it behaves like ^\[{2}(.*)\|(.*)]]\z: it matches a string that starts with [[, then captures any 0 or more chars other than line break chars, as many as possible, into Group 1, then matches a |, then captures any 0 or more chars other than line break chars, as many as possible, into Group 2, and then matches ]] at the end.
The contents of Group 2 can be trimmed of whitespace and used as is, but Group 1 should be preprocessed by replacing every chunk of 1+ non-word characters with a space (.replaceAll("\\W+", " ")), then trimming the result (.trim()) and replacing all spaces with _ (.replace(" ", "_")) as the final touch.