Regex for a string with a condition before and after it - Java

I want to create a regex that matches a string only when it is preceded by " " or ".", and use it in replaceAll in Java.
Example:
LLS.LLS kLLS LLS
I use the regex \bLLS\b.
The match I want is LLS.
But if I want to find the string "/LLS", the regex fails.
Example: \b/LLS\b
Searched string: BBA /LLS CCA
Can you help me find a regex?
Thanks

You could use a positive lookbehind and lookahead to assert that what is on the left and on the right is either a space or a dot, using a character class:
(?<=[ .])/?LLS(?=[ .])
In Java:
String regex = "(?<=[ .])/?LLS(?=[ .])";
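A minimal sketch of how this pattern behaves with replaceAll. The replacement text "X", the class name and the second pattern are assumptions for illustration, not part of the answer. Note that a lookaround such as (?<=[ .]) needs an actual character to look at, so an LLS at the very start or end of the input is not matched; the double-negative form (?<![^ .]) is one common way to also accept the string edges.

```java
final class LookaroundDemo {
    // Pattern from the answer: optional "/", then LLS, with a space or dot on both sides.
    static final String REGEX = "(?<=[ .])/?LLS(?=[ .])";

    // Assumed variant: "not preceded/followed by anything other than space or dot",
    // which also succeeds at the start and end of the input.
    static final String REGEX_WITH_EDGES = "(?<![^ .])/?LLS(?![^ .])";

    static String replace(String input) {
        return input.replaceAll(REGEX, "X");
    }

    static String replaceAtEdges(String input) {
        return input.replaceAll(REGEX_WITH_EDGES, "X");
    }

    public static void main(String[] args) {
        System.out.println(replace("BBA /LLS CCA"));            // BBA X CCA
        System.out.println(replace("LLS.LLS kLLS LLS"));        // LLS.X kLLS LLS
        System.out.println(replaceAtEdges("LLS.LLS kLLS LLS")); // X.X kLLS X
    }
}
```

In both variants kLLS is left alone, because the character before LLS is neither a space nor a dot.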

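You need to remove the first \b. A word boundary only matches between a word character and a non-word character, and in "BBA /LLS CCA" both the space and the / are non-word characters, so \b can never match in front of /LLS.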
\/LLS\b
You can also make the / optional with ? to match both LLS or /LLS like so
\/?LLS\b
Here is an example https://regexr.com/48p5e
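The difference can be checked with a short sketch. The helper name found and the replacement text "X" are assumptions for illustration. In a Java regex the / does not need a backslash, so it is written unescaped here.

```java
import java.util.regex.Pattern;

final class WordBoundaryDemo {
    // Hypothetical helper: true if the regex matches anywhere in the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String input = "BBA /LLS CCA";
        // No boundary exists between ' ' and '/', so the leading \b makes the match fail.
        System.out.println(found("\\b/LLS\\b", input)); // false
        System.out.println(found("/LLS\\b", input));    // true
        // Without a condition on the left, the LLS inside kLLS is replaced as well.
        System.out.println("LLS.LLS kLLS LLS".replaceAll("/?LLS\\b", "X")); // X.X kX X
    }
}
```

The last line shows the trade-off: dropping the leading \b fixes /LLS but no longer protects kLLS, which is where the lookbehind approach differs.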

Related

Erase any string that doesn't match a pattern using replaceall()

I need to replace ALL characters that don't follow a pattern with "".
I have strings like:
MCC-QX-1081
TEF-CO-QX-4949
SPARE-QX-4500
So far the closest I have come is the following regex:
String regex = "[^QX,-,\\d]";
Using the String replaceAll method I get QX1081, but the expected result is QX-1081.
You're using a character class which matches single characters, not patterns.
You want something like
String resultString = subjectString.replaceAll("^.*?(QX-\\d+)?$", "$1");
which works as long as nothing follows the QX-digits part in your strings.
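A small sketch applying that replacement to the three sample strings from the question. The class and method names are assumptions for illustration.

```java
final class KeepQxDemo {
    // Lazily skip everything up to an optional trailing QX-<digits>, then keep only that group.
    static String keepQx(String s) {
        return s.replaceAll("^.*?(QX-\\d+)?$", "$1");
    }

    public static void main(String[] args) {
        System.out.println(keepQx("MCC-QX-1081"));    // QX-1081
        System.out.println(keepQx("TEF-CO-QX-4949")); // QX-4949
        System.out.println(keepQx("SPARE-QX-4500"));  // QX-4500
    }
}
```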
Put the dash at the end of the character class: [^QX,\d-]
Next you just have to use substring to strip the leading dash(es).
I don't know exactly what you expect for all strings, but if you want to match a literal dash in a character class, it must be escaped or placed first or last.
You are using a character class, where you have to either escape the hyphen or put it at the start or at the end, like [^QX,\d-]; otherwise you are matching a range from a comma to a comma. But changing that will give you -QX-1081, which is not the desired result.
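To illustrate the hyphen placement, here is a sketch comparing the original class with two corrected forms. The helper name strip is an assumption for illustration.

```java
final class HyphenInClassDemo {
    // Hypothetical helper: delete every character matched by the given character class.
    static String strip(String regex, String s) {
        return s.replaceAll(regex, "");
    }

    public static void main(String[] args) {
        // ",-," is read as a range from ',' to ',', so '-' itself is not kept.
        System.out.println(strip("[^QX,-,\\d]", "MCC-QX-1081"));   // QX1081
        // Hyphen last, or escaped: it is now a literal and survives.
        System.out.println(strip("[^QX,\\d-]", "MCC-QX-1081"));    // -QX-1081
        System.out.println(strip("[^QX\\-\\d]", "MCC-QX-1081"));   // -QX-1081
        // With more prefixes there is more than one leading dash to clean up.
        System.out.println(strip("[^QX,\\d-]", "TEF-CO-QX-4949")); // --QX-4949
    }
}
```

The last call shows why removing unwanted characters one by one is fragile here: the number of leftover dashes depends on the input.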
You could match your pattern and then replace with the first capturing group $1:
^(?:[A-Z]+-)+(QX-\d+)$
In a Java string literal you have to double the backslash, so a digit is matched with \\d
That will match:
^ Start of the string
(?:[A-Z]+-)+ Repeat 1+ times one or more uppercase characters followed by a hyphen
(QX-\d+) Capture in a group QX- followed by 1+ digits
$ End of the string
For example:
String result = "MCC-QX-1081".replaceAll("^(?:[A-Z]+-)+(QX-\\d+)$", "$1");
System.out.println(result); // QX-1081
Note that if you are doing just 1 replacement, you could also use replaceFirst

regex for '(number)'

I need to remove the single quotes around numbers. For example:
'1' to 1
'100' to 100
What is the best way to do this? Is there a regex for this that I can use in the replace() function of the String class?
You can use the replaceAll method, which supports regex:
str = str.replaceAll("'(\\d+)'", "$1");
(\\d+) will match and group digits surrounded by single quotes on either side and then we use $1 in replacement which is the back-reference to captured value in regex.
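A short sketch of that replacement on a mixed input. The sample sentence and the names are assumptions for illustration. Quoted text that is not purely digits is left untouched.

```java
final class UnquoteNumbersDemo {
    // Replace '<digits>' with just the digits captured in group 1.
    static String unquote(String s) {
        return s.replaceAll("'(\\d+)'", "$1");
    }

    public static void main(String[] args) {
        System.out.println(unquote("'1' and '100' but 'abc'")); // 1 and 100 but 'abc'
    }
}
```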
If it's in a String and you want the integer, why don't you just parse it?
int a = Integer.parseInt("100");
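Note that Integer.parseInt does not accept the surrounding quotes, so they have to be removed first. A minimal sketch combining both steps, assuming the whole string is one quoted number that fits in an int (the method name parseQuoted is hypothetical):

```java
final class ParseQuotedIntDemo {
    // Strip the quotes with the regex from the previous answer, then parse the digits.
    static int parseQuoted(String s) {
        return Integer.parseInt(s.replaceAll("'(\\d+)'", "$1"));
    }

    public static void main(String[] args) {
        System.out.println(parseQuoted("'100'")); // 100
        try {
            Integer.parseInt("'100'"); // the quotes are not valid digits
        } catch (NumberFormatException e) {
            System.out.println("quotes must be removed before parsing");
        }
    }
}
```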

Removing repeated characters in String

I have strings like "aaaabbbccccaaddddcfggghhhh" and I want to remove repeated characters to get a string like "abcadcfgh".
A simplistic implementation for this would be:
StringBuilder str2 = new StringBuilder();
char prevChar = 0; // sentinel; assumes the input does not start with '\0'
for (char c : str.toCharArray()) {
    if (c != prevChar) { // append only when the character changes
        str2.append(c);
        prevChar = c;
    }
}
return str2.toString();
Is there a better implementation, maybe using regex?
You can do this:
"aaaabbbccccaaddddcfggghhhh".replaceAll("(.)\\1+","$1");
The regex uses backreference and capturing groups.
The plain regex is (.)\1+, but in a Java string literal you have to escape the backslash with another backslash.
If you want the number of repeated characters:
String test = "aaaabbbccccaaddddcfggghhhh";
System.out.println(test.length() - test.replaceAll("(.)\\1+","$1").length());
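Both snippets can be wrapped into a small sketch. The class and method names are assumptions for illustration.

```java
final class RepeatedCharsDemo {
    // (.) captures one character; \1+ matches one or more further copies of it.
    static String collapse(String s) {
        return s.replaceAll("(.)\\1+", "$1");
    }

    // Number of characters dropped by collapsing the runs.
    static int removedCount(String s) {
        return s.length() - collapse(s).length();
    }

    public static void main(String[] args) {
        String test = "aaaabbbccccaaddddcfggghhhh";
        System.out.println(collapse(test));     // abcadcfgh
        System.out.println(removedCount(test)); // 17
    }
}
```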
With regex, you can replace (.)\1+ with the replacement string $1.
You can use Java's String.replaceAll() method to simply do this with a regular expression.
String s = "aaaabbbccccaaddddcfggghhhh";
System.out.println(s.replaceAll("(.)\\1{1,}", "$1")); //=> "abcadcfgh"
Regular expression
( group and capture to \1:
. any character except \n
) end of \1
\1{1,} what was matched by capture \1 (at least 1 time)
Use the pattern /(.)(?=\1)/g and replace each match with nothing; in Java that is replaceAll("(.)(?=\\1)", "").
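A sketch of the lookahead variant in Java. The names are assumptions for illustration. Instead of matching a whole run and keeping one character, it deletes every character that is immediately followed by a copy of itself, so the last character of each run survives.

```java
final class LookaheadDedupDemo {
    // Remove any character whose next character is the same one.
    static String collapse(String s) {
        return s.replaceAll("(.)(?=\\1)", "");
    }

    public static void main(String[] args) {
        System.out.println(collapse("aaaabbbccccaaddddcfggghhhh")); // abcadcfgh
    }
}
```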

Regex negative lookbehind to escape period/dot

I have been struggling to find a regex pattern that skips "." when an escape character is found before it. Negative lookbehind looked promising, but it doesn't seem to work for "." with the syntax below.
String test = "hostname.domain.com/abc/def/v1.8/ghi";
In the above example, the string needs to be split by ".", but I need to escape the v1.8 so that v1 and 8 are not treated as different array elements in the URI part.
String test = "hostname.domain.com/abc/def/v1\\.8/ghi";
test.split("(?!\\\\).");
The expected output is {"hostname","domain","com/abc/def/v1.8/ghi"}. The URI context path should not be split by "."; if it carries any ".", that is just there to represent a version.
The above syntax works for other characters like -, but doesn't work for ".". I assume the escape character needs to be different, but other escape characters might cause issues in further processing of the string, since the input is a URI and I don't want to prepend any reserved/special URI characters. Any thoughts or help are appreciated.
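Why use regex? Use the URL class (the string needs a protocol such as http://, otherwise the constructor throws MalformedURLException):
URL url = new URL(yourURL); // e.g. "http://hostname.domain.com/abc/def/v1.8/ghi"
url.getPath(); // /abc/def/v1.8/ghi
url.getPort(); // -1 in your case
url.getHost(); // hostname.domain.com
You can now split the hostname on ".".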
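A minimal sketch of this idea using java.net.URI, which offers the same getHost() and getPath() accessors. Prepending "http://" and the method name splitHost are assumptions made only so that the scheme-less input from the question can be parsed.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class HostSplitDemo {
    static List<String> splitHost(String hostAndPath) {
        // Assumption: the input has no scheme, so one is added purely for parsing.
        URI uri = URI.create("http://" + hostAndPath);
        // Only the host is split on dots; the path is never touched.
        List<String> parts = new ArrayList<>(Arrays.asList(uri.getHost().split("\\.")));
        int last = parts.size() - 1;
        // Re-attach the path to the last host label to get the shape asked for.
        parts.set(last, parts.get(last) + uri.getPath());
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(splitHost("hostname.domain.com/abc/def/v1.8/ghi"));
        // [hostname, domain, com/abc/def/v1.8/ghi]
    }
}
```

With this approach no escape character is needed in the input at all.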
You can use this negative lookahead regex:
(?!\\\\)(?:^|.)\\.
OR Using negative lookbehind:
(?<!\\\\)\\.
Online Demo: http://www.rubular.com/r/Sqa2P7A6dR and http://www.rubular.com/r/xgE7onrwzX
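The attempt in the question fails for two reasons: (?! starts a lookahead rather than a lookbehind, and the unescaped . matches any character instead of a literal dot. A sketch of the lookbehind version applied to the escaped input follows. The final unescaping step and the names are assumptions for illustration.

```java
import java.util.Arrays;

final class EscapedDotSplitDemo {
    static String[] splitAndUnescape(String s) {
        // Split on a literal dot that is NOT preceded by a backslash.
        String[] parts = s.split("(?<!\\\\)\\.");
        for (int i = 0; i < parts.length; i++) {
            // String.replace works on literal text: turn backslash-dot back into a plain dot.
            parts[i] = parts[i].replace("\\.", ".");
        }
        return parts;
    }

    public static void main(String[] args) {
        String test = "hostname.domain.com/abc/def/v1\\.8/ghi";
        System.out.println(Arrays.toString(splitAndUnescape(test)));
        // [hostname, domain, com/abc/def/v1.8/ghi]
    }
}
```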
To avoid stacking escape characters in the regex string (one level of escaping is removed by the Java compiler; the other level is removed by the regex engine), it is possible to "escape" characters by enclosing them in square brackets. For example, \\. would become a more readable [.].
In your case, you could tell Java not to split on a dot that has a digit directly before or after it, because it is a decimal separator:
String test = "hostname.domain.com/abc/def/v1.8/ghi";
for (String s : test.split("(?<!\\d)[.](?!\\d)")) {
System.out.println(s);
}
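One thing worth checking with this pattern: the lookbehind and the lookahead are independent, so a dot is skipped as soon as there is a digit on either side, not only when it sits between two digits. The sketch below compares it with a variant that skips a dot only when digits are on both sides. The variant, the sample host name host1 and all names are assumptions, not part of the answer.

```java
import java.util.Arrays;

final class DigitDotSplitDemo {
    // From the answer: no digit before the dot AND no digit after it.
    static String[] strict(String s) {
        return s.split("(?<!\\d)[.](?!\\d)");
    }

    // Assumed variant: split unless the dot has a digit on BOTH sides.
    static String[] betweenDigitsOnly(String s) {
        return s.split("(?<!\\d)[.]|[.](?!\\d)");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(strict("hostname.domain.com/abc/def/v1.8/ghi")));
        // [hostname, domain, com/abc/def/v1.8/ghi]
        System.out.println(Arrays.toString(strict("host1.domain.com/v1.8")));
        // [host1.domain, com/v1.8]
        System.out.println(Arrays.toString(betweenDigitsOnly("host1.domain.com/v1.8")));
        // [host1, domain, com/v1.8]
    }
}
```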
Try this expression, which skips any dot that has a / somewhere in the 100 characters before it:
String[] s = "hostname.domain.com/abc/def/v1.8/ghi".split("(?<!/.{0,99})\\.");
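A sketch of this expression next to a regex-free alternative. Java requires lookbehinds to have a bounded length, which is why the pattern uses {0,99} rather than *; a dot that is more than 100 characters past the nearest slash would therefore still be split. The alternative method splitHostOnly is an assumption for illustration and has no such limit.

```java
import java.util.Arrays;

final class SlashLookbehindDemo {
    // From the answer: a dot with no '/' in the preceding 100 characters.
    static String[] splitWithLookbehind(String s) {
        return s.split("(?<!/.{0,99})\\.");
    }

    // Assumed alternative: split only the part before the first '/', then re-attach the rest.
    static String[] splitHostOnly(String s) {
        int slash = s.indexOf('/');
        if (slash < 0) {
            return s.split("\\.");
        }
        String[] parts = s.substring(0, slash).split("\\.");
        parts[parts.length - 1] += s.substring(slash);
        return parts;
    }

    public static void main(String[] args) {
        String test = "hostname.domain.com/abc/def/v1.8/ghi";
        System.out.println(Arrays.toString(splitWithLookbehind(test)));
        System.out.println(Arrays.toString(splitHostOnly(test)));
        // both print [hostname, domain, com/abc/def/v1.8/ghi]
    }
}
```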

regex to find substring between special characters

I am running into this problem in Java.
I have data strings that contain entities enclosed between & and ; e.g.
&Text.ABC;, &Links.InsertSomething;
These entities can be anything from the ini file we have.
I need to find these strings in the input string and remove them. There can be none, one or more occurrences of these entities in the input string.
I am trying to use regex to pattern match and failing.
Can anyone suggest the regex for this problem?
Thanks!
Here is the regex:
"&[A-Za-z]+(\\.[A-Za-z]+)*;"
It starts by matching the character &, followed by one or more letters (both uppercase and lower case) ([A-Za-z]+). Then it matches a dot followed by one or more letters (\\.[A-Za-z]+). There can be any number of this, including zero. Finally, it matches the ; character.
You can use this regex in java like this:
Pattern p = Pattern.compile("&[A-Za-z]+(\\.[A-Za-z]+)*;"); // java.util.regex.Pattern
String subject = "foo &Bar; baz\n";
String result = p.matcher(subject).replaceAll("");
Or just
"foo &Bar; baz\n".replaceAll("&[A-Za-z]+(\\.[A-Za-z]+)*;", "");
If you also want to remove whitespace after the matched tokens, you can use this regex:
"&[A-Za-z]+(\\.[A-Za-z]+)*;\\s*" // the "\\s*" matches any number of whitespace
And there is a nice online regular expression tester which uses the java regexp library.
http://www.regexplanet.com/simple/index.html
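Here is a sketch of both variants of the letters-and-dots regex on the entity names from the question. The sample sentence and the names are assumptions for illustration. An ampersand that is not followed by letters and a closing semicolon is left alone.

```java
final class EntityRemovalDemo {
    // &, letters, optional ".letters" groups, then ;
    static final String ENTITY = "&[A-Za-z]+(\\.[A-Za-z]+)*;";

    static String remove(String s) {
        return s.replaceAll(ENTITY, "");
    }

    // Same pattern, but also swallowing whitespace that follows the entity.
    static String removeWithSpace(String s) {
        return s.replaceAll(ENTITY + "\\s*", "");
    }

    public static void main(String[] args) {
        String input = "Hello &Text.ABC; world &Links.InsertSomething;!";
        System.out.println(remove(input));          // Hello  world !
        System.out.println(removeWithSpace(input)); // Hello world !
        System.out.println(remove("a & b; c"));     // a & b; c
    }
}
```

The first output keeps both spaces that surrounded the removed entity, which is what the \\s* variant is meant to tidy up.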
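You can try the following, which assumes every entity contains a dot (as in &Text.ABC;) and also removes comma-separated runs of entities: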
input=input.replaceAll("&[^.]+\\.[^;]+;(,\\s*&[^.]+\\.[^;]+;)*","");
