Find a substring in a string using a regular expression - JAVA - java

Suppose I have the string " kk a.b.cjkmkc jjkocc a.b.c. jjj 'a.b.ckkkkkkkkkkkkkkkk ' "
I want to replace only the occurrences of the substring a.b.c that are outside the single quotes, but it is not working.
Here is my code:
`
String str = " kk a.b.cjkmkc jjkocc a.b.c. jjj 'a.b.ckkkkkkkkkkkkkkkk ' ";
Pattern p = Pattern.compile("a\\.b\\.c");
Matcher m = p.matcher(str);
boolean found = m.find(); // find() returns a boolean, not an int
`

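Use this pattern: a\.b\.c(?=(([^']*'){2})*[^']*$)
The lookahead requires an even number of single quotes between the match and the end of the string, so a match can only occur outside a quoted section.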
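As a minimal sketch of how the lookahead pattern above could be plugged into a replacement (the class name, method name, and the replacement text "X" are made up for illustration, and the quotes in the input are assumed to be balanced):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OutsideQuotesLookahead {
    // a.b.c followed by an even number of single quotes up to the end of the input,
    // which means the match itself is not inside a quoted section.
    private static final Pattern OUTSIDE =
            Pattern.compile("a\\.b\\.c(?=(([^']*'){2})*[^']*$)");

    // Hypothetical helper: replaces every a.b.c outside quotes with the given text.
    static String replaceOutsideQuotes(String input, String replacement) {
        // quoteReplacement keeps '$' and '\' in the replacement literal
        return OUTSIDE.matcher(input).replaceAll(Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        String str = " kk a.b.cjkmkc jjkocc a.b.c. jjj 'a.b.ckkkkkkkkkkkkkkkk ' ";
        System.out.println(replaceOutsideQuotes(str, "X"));
    }
}
```

Because the lookahead rescans the rest of the string for every candidate match, this is probably best kept to short inputs.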

To search for a substring outside quotes, you can do something like this:
Pattern pat = Pattern.compile("^(?:[^']|'[^']*')*?a\\.b\\.c");
The first part will skip over:
every character that isn't a quote mark ([^']), or
every sequence of non-quote-mark characters enclosed in quotes ('[^']*').
Once those are skipped, then if it sees the pattern you want, it will know that it isn't inside quote marks.
This will handle a simple case. If things start getting more complicated, e.g. you want to allow \' to quote a quote mark in your input string the way C or Java does in a string literal, the regex starts getting more complicated, and you can quickly reach a point where either your regex is unreadable or regexes aren't a suitable solution.
EDIT: fixed to put "reluctant" qualifier after second *, so that the first a.b.c will be found.
EDIT 2: If you want to replace the substring you find, it gets trickier. The above pattern matches the entire beginning of the string up through a.b.c, and I couldn't get a look-behind to work so that the match would be only the a.b.c part. I think you'll need to put the beginning of the string in a group, and then use $1 in the replacement string to copy the beginning:
Pattern pat = Pattern.compile("^((?:[^']|'[^']*')*?)a\\.b\\.c");
Matcher m = pat.matcher(source);
if (m.find()) {
result = m.replaceFirst("$1replacement");
}
I'm not sure replaceAll works with this, so if you want to replace all of them, you may need to loop.
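One way to replace every occurrence without a hand-written loop might be to swap the ^ anchor for \G, which matches where the previous match ended. This is a sketch rather than part of the answer above: the class name, method name, and the replacement text "X" are made up, and balanced quotes are assumed.

```java
import java.util.regex.Pattern;

class OutsideQuotesReplaceAll {
    // \G anchors each match at the end of the previous one (or at the start of the
    // input for the first match), so the skipped prefix always begins outside quotes.
    private static final Pattern PAT =
            Pattern.compile("\\G((?:[^']|'[^']*')*?)a\\.b\\.c");

    // Hypothetical helper: $1 copies the skipped text back, "X" is a placeholder.
    static String replaceAllOutsideQuotes(String source) {
        return PAT.matcher(source).replaceAll("$1X");
    }

    public static void main(String[] args) {
        String source = " kk a.b.cjkmkc jjkocc a.b.c. jjj 'a.b.ckkkkkkkkkkkkkkkk ' ";
        System.out.println(replaceAllOutsideQuotes(source));
    }
}
```

With ^ instead of \G, replaceAll would stop after the first occurrence, because ^ can only match at the start of the input.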

I wouldn't mess with REGEX.
public static void main(String[] args) {
String str = " kk a.b.cjkmkc jjkocc a.b.c. jjj 'a.b.ckkkkkkkkkkkkkkkk ' ";
String[] s = str.split("'");
str = s[0].replaceAll("[abc]", "") + "'"+ s[1]+"'"
+ s[2].replaceAll("[abc]", "");
System.out.println(str);
}
Output:
kk ..jkmk jjko ... jjj 'a.b.ckkkkkkkkkkkkkkkk '
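Inefficient, but it works for this input, which has exactly one quoted section. Note that replaceAll("[abc]", "") removes every individual a, b, and c outside the quotes, not the substring a.b.c as a whole.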
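The same split idea can be sketched so that it replaces the literal substring a.b.c and copes with any number of quoted sections. The class and method names below are made up for illustration, and the sketch assumes the quotes are balanced, so that even-indexed segments lie outside quotes.

```java
class SplitOutsideQuotes {
    // Hypothetical helper: replaces target only in the segments outside single quotes.
    static String replaceOutsideQuotes(String input, String target, String replacement) {
        // limit -1 keeps trailing empty segments, so the quotes can be restored exactly
        String[] parts = input.split("'", -1);
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                out.append('\''); // put back the quote that split() consumed
            }
            // even-indexed segments are outside quotes; String.replace is literal (no regex)
            out.append(i % 2 == 0 ? parts[i].replace(target, replacement) : parts[i]);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String str = " kk a.b.cjkmkc jjkocc a.b.c. jjj 'a.b.ckkkkkkkkkkkkkkkk ' ";
        System.out.println(replaceOutsideQuotes(str, "a.b.c", "X"));
    }
}
```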

Related

java regex replaceAll with negated groups

I'm trying to use the String.replaceAll() method with regex to only keep letter characters and ['-_]. I'm trying to do this by replacing every character that is neither a letter nor one of the characters above by an empty string.
So far I have tried something like this (in different variations) which correctly keeps letters but replaces the special characters I want to keep:
current = current.replaceAll("(?=\\P{L})(?=[^\\'-_])", "");
Make it simpler:
current = current.replaceAll("[^a-zA-Z'_-]", "");
Explanation :
Match any character that is not in a to z, A to Z, ', _, or -; the replaceAll() method then replaces every matched character with nothing.
Tested input : "a_zE'R-z4r#m"
Output : a_zE'R-zrm
You don't need lookahead, just use negated regex:
current = current.replaceAll("[^\\p{L}'_-]+", "");
[^\\p{L}'_-] will match anything that is not a letter (unicode) or single quote or underscore or hyphen.
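To illustrate the difference between the ASCII-only class and the Unicode-aware \p{L} class, here is a small sketch. The class name, method names, and sample input are invented for this example; the accented letters are written as Unicode escapes so the source file encoding does not matter.

```java
class KeepLettersDemo {
    // Keeps only ASCII letters plus ' _ -
    static String keepAscii(String s) {
        return s.replaceAll("[^a-zA-Z'_-]", "");
    }

    // Keeps any Unicode letter plus ' _ -
    static String keepUnicode(String s) {
        return s.replaceAll("[^\\p{L}'_-]+", "");
    }

    public static void main(String[] args) {
        String input = "Zo\u00eb's_caf\u00e9-42!"; // contains e-diaeresis and e-acute
        System.out.println(keepAscii(input));   // accented letters are dropped
        System.out.println(keepUnicode(input)); // accented letters are kept
    }
}
```

In both classes the hyphen is placed last so that it is read as a literal character rather than as a range operator.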
Your regex is too complicated. Just specify the characters you want to keep, and use ^ to negate, so [^a-z'_-] means "anything but these".
public class Replacer {
public static void main(String[] args) {
// \w would also keep digits, so list the letters explicitly
System.out.println("with 1234 &*()) -/.,>>?chars".replaceAll("[^a-zA-Z'_-]", ""));
}
}
You can try this:
String str = "Se#rbi323a'and_Eur$ope#-t42he-[A%merica]";
str = str.replaceAll("[\\d+\\p{Punct}&&[^-'_\\[\\]]]+", "");
System.out.println("str = " + str);
And it is the result:
str = Serbia'and_Europe-the-[America]

java regex escape all reserved characters

I understand that you can use Pattern.quote to escape characters within a string that are reserved by regex. But I do not understand why the following is not working:
String s="and this)";
String ps = "\\b("+Pattern.quote(s)+")\\b";
//String pp = Pattern.quote(pat);
Pattern p=Pattern.compile(ps);
Matcher mm = p.matcher("oh and this) is");
System.out.println(mm.find()); // prints false, but I expected true
When String s = "and this)" is changed to String s = "and this" (i.e., without the ")"), it works. How should I change the code so that it also works as expected with ")"?
Thanks
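\b only matches between a word character and a non-word character. Your keyword ends in ) and is followed by a space, so there is no word boundary after it and the trailing \b fails. Use negative look-arounds instead, to make sure there is no word character directly before or after the keyword:
String ps = "(?<!\\w)"+Pattern.quote(s)+"(?!\\w)";
This way you will still match s as a whole word, and it won't be a problem if the keyword has non-word characters at the beginning or end.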
IDEONE demo:
String s="and this)";
String ps = "(?<!\\w)"+Pattern.quote(s)+"(?!\\w)";
Pattern p=Pattern.compile(ps);
Matcher mm = p.matcher("oh and this) is");
System.out.println(mm.find());
Result: true
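To make the contrast concrete, the following sketch runs both variants side by side. The class and method names are hypothetical and the extra inputs are made up for illustration.

```java
import java.util.regex.Pattern;

class WholeWordQuoteDemo {
    // Variant from the question: word boundaries around the quoted keyword
    static boolean findWithWordBoundary(String keyword, String text) {
        return Pattern.compile("\\b" + Pattern.quote(keyword) + "\\b").matcher(text).find();
    }

    // Variant from the answer: no word character directly before or after the keyword
    static boolean findWithLookarounds(String keyword, String text) {
        return Pattern.compile("(?<!\\w)" + Pattern.quote(keyword) + "(?!\\w)")
                .matcher(text).find();
    }

    public static void main(String[] args) {
        String text = "oh and this) is";
        // ')' followed by ' ' is not a word boundary, so the \b variant fails
        System.out.println(findWithWordBoundary("and this)", text));
        System.out.println(findWithLookarounds("and this)", text));
        // the look-arounds still reject a keyword glued to a preceding word character
        System.out.println(findWithLookarounds("and this)", "band this) is"));
    }
}
```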

Replacing only the first space in a string

I want to replace the first space character in a string with another string listed below. The word may contain many spaces but only the first space needs to be replaced. I tried the regex below but it didn't work ...
Pattern inputSpace = Pattern.compile("^\\s", Pattern.MULTILINE);
String spaceText = "This split ";
System.out.println(inputSpace.matcher(spaceText).replaceAll("&emsp;"));
EDIT:: It is an external API that I am using and I have the constraint that I can only use "replaceAll" ..
Your code doesn't work because it doesn't account for the characters between the start of the string and the white-space.
Change your code to:
Pattern inputSpace = Pattern.compile("^([^\\s]*)\\s", Pattern.MULTILINE);
String spaceText = "This split ";
System.out.println(inputSpace.matcher(spaceText).replaceAll("$1&emsp;"));
Explanation:
[^...] is to match characters that don't match the supplied characters or character classes (\\s is a character class).
So, [^\\s]* is zero-or-more non-white-space characters. It's surrounded by () for the below.
$1 is the first thing that appears in ().
The preferred way, however, would be to use replaceFirst: (although this doesn't seem to conform to your requirements)
String spaceText = "This split ";
spaceText = spaceText.replaceFirst("\\s", "&emsp;");
You can use the String.replaceFirst() method to replace the first occurrence of the pattern:
System.out.println(" all test".replaceFirst("\\s", "test"));
And String.replaceFirst() internally calls Matcher.replaceFirst(), so it's equivalent to
Pattern inputSpace = Pattern.compile("\\s", Pattern.MULTILINE);
String spaceText = "This split ";
System.out.println(inputSpace.matcher(spaceText).replaceFirst("&emsp;"));
Do it in two steps:
index = str.indexOf(" ") gives you the position of the first space.
result = str.substring(0, index) + replacement + str.substring(index + 1)
That is the idea; you also need to handle the case where indexOf returns -1 because there is no space.
It should be faster than a regex, because it only needs two array copies and no pattern compilation or matching.
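The two steps above can be sketched as follows. The class and method names are hypothetical, and the sketch only looks for the plain space character, not for other whitespace.

```java
class FirstSpaceReplacer {
    // Hypothetical helper: replaces only the first ' ' in str with the given text.
    static String replaceFirstSpace(String str, String replacement) {
        int index = str.indexOf(' ');
        if (index < 0) {
            return str; // no space found, nothing to replace
        }
        // text before the space + replacement + text after the space
        return str.substring(0, index) + replacement + str.substring(index + 1);
    }

    public static void main(String[] args) {
        System.out.println(replaceFirstSpace("This split ", "&emsp;"));
    }
}
```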
You can use StringUtils from Apache Commons Lang:
import org.apache.commons.lang.StringUtils;
public class substituteFirstOccurrence{
public static void main(String[] args){
String text = "Word1 Word2 Word3";
System.out.println(StringUtils.replaceOnce(text, " ", "-"));
// output: "Word1-Word2 Word3"
}
}
We can simply use yourString.replaceFirst(" ", ""); in Kotlin.

Remove occurrences of a given character sequence at the beginning of a string using Java Regex

I have a string that begins with one or more occurrences of the sequence "Re:". This "Re:" can be of any combinations, for ex. Re<any number of spaces>:, re:, re<any number of spaces>:, RE:, RE<any number of spaces>:, etc.
Sample sequence of string : Re: Re : Re : re : RE: This is a Re: sample string.
I want to define a java regular expression that will identify and strip off all occurrences of Re:, but only the ones at the beginning of the string and not the ones occurring within the string.
So the output should look like This is a Re: sample string.
Here is what I have tried:
String REGEX = "^(Re*\\p{Z}*:?|re*\\p{Z}*:?|\\p{Z}Re*\\p{Z}*:?)";
String INPUT = title;
String REPLACE = "";
Pattern p = Pattern.compile(REGEX);
Matcher m = p.matcher(INPUT);
while(m.find()){
m.appendReplacement(sb,REPLACE);
}
m.appendTail(sb);
I am using \p{Z} to match whitespace (I found this somewhere in this forum, as Java regex does not identify \s).
The problem I am facing with this code is that the search stops at the first match, and escapes the while loop.
Try something like this replace statement:
yourString = yourString.replaceAll("(?i)^(\\s*re\\s*:\\s*)+", "");
Explanation of the regex:
(?i) make it case insensitive
^ anchor to start of string
( start a group (this is the "re:")
\\s* any amount of optional whitespace
re "re"
\\s* optional whitespace
: ":"
\\s* optional whitespace
) end the group (the "re:" string)
+ one or more times
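As a quick check of the pattern explained above, here is a small sketch that applies it to the sample title from the question. The class and method names are made up for illustration.

```java
import java.util.regex.Pattern;

class ReplyPrefixStripper {
    // One or more "re:" prefixes at the start, case-insensitive, with optional whitespace
    private static final Pattern PREFIX = Pattern.compile("(?i)^(\\s*re\\s*:\\s*)+");

    // Hypothetical helper: removes the leading run of "Re:" prefixes only.
    static String strip(String title) {
        // the pattern is anchored with ^, so at most one match exists
        return PREFIX.matcher(title).replaceFirst("");
    }

    public static void main(String[] args) {
        String title = "Re: Re : Re : re : RE: This is a Re: sample string.";
        System.out.println(strip(title));
    }
}
```

The "Re:" in the middle of the sentence is left alone because the pattern is anchored to the start of the string.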
In your regex:
String regex = "^(Re*\\p{Z}*:?|re*\\p{Z}*:?|\\p{Z}Re*\\p{Z}*:?)";
Re* means an R followed by zero or more e characters, and the colon is optional, so it matches strings like:
\p{Z}Reee\p{Z}: or
R\p{Z}\p{Z}\p{Z}
which makes no sense for what you are trying to do.
You'd be better off using a regex like the following:
yourString.replaceAll("(?i)^(\\s*re\\s*:\\s*)+", "");
or to make #Doorknob happy, here's another way to achieve this, using a Matcher:
Pattern p = Pattern.compile("(?i)^(\\s*re\\s*:\\s*)+");
Matcher m = p.matcher(yourString);
if (m.find())
yourString = m.replaceAll("");
(which is as the doc says the exact same thing as yourString.replaceAll())
(I had the same regex as #Doorknob, but thanks to #jlordo for the replaceAll and #Doorknob for thinking about the (?i) case insensitivity part ;-) )

Find words in string surrounded by "[" and "]":

I need help with a simple task in java. I have the following sentence:
Where Are You [Employee Name]?
your have a [Shift] shift..
I need to extract the strings that are surrounded by [ and ] signs.
I was thinking of using the split method with " " as the parameter and then finding the single words, but that is a problem if the phrase I'm looking for contains " ". Using indexOf might be an option as well, only I don't know how to tell that I have reached the end of the String.
What is the best way to perform this task?
Any help would be appreciated.
Try with regex \[(.*?)\] to match the words.
\[: escaped [ for literal match as it is a meta char.
(.*?) : match everything in a non-greedy way.
Sample code:
Pattern p = Pattern.compile("\\[(.*?)\\]");
Matcher m = p.matcher("Where Are You [Employee Name]? your have a [Shift] shift.");
while(m.find()) {
System.out.println(m.group(1)); // group(1) is the text between the brackets; group() would include [ and ]
}
Here is a Java regular expression that extracts the text between two brackets, including whitespace:
import java.util.regex.*;
class Main
{
public static void main(String[] args)
{
String txt="[ Employee Name ]";
String re1=".*?";
String re2="( )";
String re3="((?:[a-z][a-z]+))"; // Word 1
String re4="( )";
String re5="((?:[a-z][a-z]+))"; // Word 2
String re6="( )";
Pattern p = Pattern.compile(re1+re2+re3+re4+re5+re6,Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
Matcher m = p.matcher(txt);
if (m.find())
{
String ws1=m.group(1);
String word1=m.group(2);
String ws2=m.group(3);
String word2=m.group(4);
String ws3=m.group(5);
System.out.print("("+ws1.toString()+")"+"("+word1.toString()+")"+"("+ws2.toString()+")"+"("+word2.toString()+")"+"("+ws3.toString()+")"+"\n");
}
}
}
If you want to ignore whitespace, remove the "( )" groups.
This is a Scanner-based solution:
Scanner sc = new Scanner("Where Are You [Employee Name]? your have a [Shift] shift..");
for (String s; (s = sc.findWithinHorizon("(?<=\\[).*?(?=\\])", 0)) != null;) {
System.out.println(s);
}
output
Employee Name
Shift
Use a StringBuilder (I assume you don't need synchronization).
As you suggested, indexOf() with your square bracket delimiters will give you a starting index and an ending index. Use substring(startIndex + 1, endIndex) to get exactly the string you want; the end index of substring is exclusive, so the closing bracket is not included.
I'm not sure what you meant by the end of the String, but indexOf("[") is the start and indexOf("]") is the end.
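A minimal sketch of the indexOf approach, extended to collect every bracketed phrase, might look like this. The class and method names are hypothetical, and nested brackets are not handled.

```java
import java.util.ArrayList;
import java.util.List;

class BracketExtractor {
    // Hypothetical helper: returns the text found between each [ and the next ].
    static List<String> extract(String text) {
        List<String> found = new ArrayList<>();
        int start = text.indexOf('[');
        while (start >= 0) {
            int end = text.indexOf(']', start + 1);
            if (end < 0) {
                break; // unmatched '[': nothing more to extract
            }
            // the end index is exclusive, so the closing bracket is left out
            found.add(text.substring(start + 1, end));
            start = text.indexOf('[', end + 1);
        }
        return found;
    }

    public static void main(String[] args) {
        String text = "Where Are You [Employee Name]? your have a [Shift] shift..";
        System.out.println(extract(text));
    }
}
```

indexOf returns -1 when no further bracket exists, which is what ends the loop at the end of the String.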
That's pretty much the use case for a regular expression.
Try "(\\[[\\w ]*\\])" as your expression.
Pattern p = Pattern.compile("(\\[[\\w ]*\\])");
Matcher m = p.matcher("Where Are You [Employee Name]? your have a [Shift] shift..");
if (m.find()) {
String found = m.group();
}
What does this expression do?
First it defines a group (...)
Then it defines the starting point for that group: \[ matches a literal [. Since [ is a metacharacter in regular expressions, it has to be escaped with \, which in turn is reserved in Java string literals and has to be escaped with another \.
Then it defines the body of the group, [\w ]*. Here the regex brackets [] define a character class containing \w (any letter, digit, or underscore) and a blank. The * means zero or more of the preceding character class.
Then it defines the endpoint of the group \]
and closes the group )
