Java regular expression lookahead


I have strings in which I need to replace a specific character using a regex. The strings are in the following format:
"abc.edf" : "abc.abc", "ghi.ghk" : "bbb.bbb" , "qwq.tyt" : "ddd.ddd"
I need to replace the periods, '.', that are between the strings in quotes before the colon but not the strings in quotes after the colon and before the comma. Could someone shed some light?

This pattern will match the entire part that you want to touch: "\w{3}\.\w{3}" : "\w{3}\.\w{3}". Since it includes the colon and the values on both sides, it won't match ones where there is a comma between the values. Depending on your needs, you may need to change \w to some other character class.
But, as I'm sure you are aware, you don't want to replace the entire string. You only want to replace the one character. There are two ways to do that. You can either use look-aheads and look-behinds to exclude everything else except the period from the resulting match:
Pattern: (?<="\w{3})\.(?=\w{3}" : "\w{3}\.\w{3}")
Replacement: :
Or, if the look-aheads and look-behinds confuse you, you could just capture the whole thing and include the original values from the captured groups in the replacement value:
Pattern: ("\w{3})\.(\w{3}" : "\w{3}\.\w{3}")
Replacement: $1:$2
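To compare the two options in Java, here is a minimal sketch. It assumes the sample input from the question and ':' as the replacement character; the class and method names are made up for illustration.

```java
import java.util.regex.Pattern;

class KeyDotReplacer {
    // Option 1: lookbehind and lookahead, so only the period itself is matched.
    static final Pattern LOOKAROUND =
        Pattern.compile("(?<=\"\\w{3})\\.(?=\\w{3}\" : \"\\w{3}\\.\\w{3}\")");

    // Option 2: capture the text around the period and put it back via $1 and $2.
    static final Pattern CAPTURING =
        Pattern.compile("(\"\\w{3})\\.(\\w{3}\" : \"\\w{3}\\.\\w{3}\")");

    static String withLookaround(String s) {
        return LOOKAROUND.matcher(s).replaceAll(":");
    }

    static String withGroups(String s) {
        return CAPTURING.matcher(s).replaceAll("$1:$2");
    }

    public static void main(String[] args) {
        String input = "\"abc.edf\" : \"abc.abc\", \"ghi.ghk\" : \"bbb.bbb\" , \"qwq.tyt\" : \"ddd.ddd\"";
        System.out.println(withLookaround(input));
        System.out.println(withGroups(input));
    }
}
```

Both calls should leave the values after the colon untouched and change only the period inside each key.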

Try the following pattern: /\.(?=[a-z]+)/g
Java Working Demo:
public class StackOverFlow31520446 {
    public static String text;
    public static String pattern;
    public static String replacement;

    static {
        text = "\"abc.edf\" : \"123.231\", \"ghi.ghk\" : \"456.678\" , \"qwq.tyt\" : \"141.242\"";
        pattern = "\\.(?=[a-z]+)";
        replacement = ";";
    }

    public static String replaceMatches(String text, String pattern, String replacement) {
        return text.replaceAll(pattern, replacement);
    }

    public static void main(String[] args) {
        System.out.println(replaceMatches(text, pattern, replacement));
    }
}

Not sure what you intend to do with the string, but this is a way to match the contents of the quotes.
The contents are in capture buffer 1.
You could use a callback to replace the dots within the contents, passing that back within the main replacement function.
Find: "([^"]*\.[^"]*)"(?=\s*:)
Replace: " + func( call to replace dots from capt buff 1 ) + "
Formatted:
" # Open quote
( [^"]* \. [^"]* ) # (1), group 1 - contents
" # Close quote
(?= # Lookahead, must be a colon
\s*
:
)
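In Java, one way to get the callback behaviour described above is Matcher.appendReplacement. This is only a sketch under that assumption; the class and method names are hypothetical, and it relies on the text between quoted strings not containing a dot.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuotedKeyRewriter {
    // A quoted string containing a dot, followed by optional whitespace and a colon.
    static final Pattern KEY = Pattern.compile("\"([^\"]*\\.[^\"]*)\"(?=\\s*:)");

    static String replaceDotsInKeys(String input, String replacement) {
        Matcher m = KEY.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // The "callback": rewrite the dots inside capture group 1 only.
            String contents = m.group(1).replace(".", replacement);
            // quoteReplacement keeps '$' and '\' in the result from being interpreted.
            m.appendReplacement(out, Matcher.quoteReplacement("\"" + contents + "\""));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String input = "\"abc.edf\" : \"abc.abc\", \"ghi.ghk\" : \"bbb.bbb\" , \"qwq.tyt\" : \"ddd.ddd\"";
        System.out.println(replaceDotsInKeys(input, ":"));
    }
}
```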

I would go for a different approach (maybe it is even faster). In your loop over all strings, first check whether the string matches a number, \d*\.?\d*, and if not, replace . with : (without any regexp).
Would that solve your problem?
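A rough sketch of that idea, assuming the individual strings have already been separated and that the values to leave alone are purely numeric (which is not the case for the sample in the question). The class and method names are hypothetical.

```java
class NumericCheck {
    static String replaceUnlessNumeric(String s) {
        // String.matches() tests the whole string against the pattern.
        // Note that this pattern also accepts "" and ".".
        if (s.matches("\\d*\\.?\\d*")) {
            return s;
        }
        // Plain character replacement, no regex involved.
        return s.replace('.', ':');
    }

    public static void main(String[] args) {
        System.out.println(replaceUnlessNumeric("abc.edf"));
        System.out.println(replaceUnlessNumeric("123.231"));
    }
}
```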

You can do it without look arounds:
str = str.replaceAll("(\\D)\\.(\\D)", "$1:$2");
should be sufficient for the task.

Related

Java: Weirdness in replaceAll RegEx

I'm trying to manipulate a String in Java to recognize the markdown options in Facebook Messenger.
I tested the RegEx in a couple of online testers and it worked, but when I tried to implement in Java, it's only recognizing text surrounded by underscores. I have an example that shows the problem here:
private String process(String input) {
String processed = input.replaceAll("(\\b|^)\\_(.*)\\_(\\b|$)", "underscore")
.replaceAll("(\\b|^)\\*(.*)\\*(\\b|$)", "star")
.replaceAll("(\\b|^)```(.*)```(\b|$)", "backticks")
.replaceAll("(\\b|^)\\~(.*)\\~(\\b|$)", "tilde")
.replaceAll("(\\b|^)\\`(.*)\\`(\\b|$)", "tick")
.replaceAll("(\\b|^)\\\\\\((.*)\\\\\\)(\\b|$)", "backslashparen")
.replaceAll("\\*", "%"); // am I matching stars wrong?
return processed;
}
public void test() {
String example = "_Text_\n" +
"*text*\n" +
"~Text~\n" +
"`Text`\n" +
"_Text_\n" + // is it only matching the first one?
"``` Text ```\n" +
"\\(Text\\)\n" +
"~Text~\n";
System.out.println(process(example));
}
I expected all the lines to match and be replaced, but only the first line was matched. I wondered if it was because it was the first line, so I copied it into the middle and it matched both. Then I figured I might have missed something when matching the special characters, so I added the snippet to match the asterisks and replace them with a percent sign, and it worked. The output I'm getting is like so:
underscore
%text%
~Text~
`Text`
underscore
``` Text ```
\(Text\)
~Text~
Any ideas what I might be missing?
Thanks.

If you're using word boundaries, there is no need to match anchors in alternation, because a word boundary also matches at the start and end positions. So these are actually redundant matches:
(?:^|\b)
(?:\b|$)
and both can just be replaced by \b.
However, looking at your regex, please note that only the underscore is considered a word character; *, ~ and ` are not word characters. Hence \b cannot be used around those characters; \B, which is the inverse of \b, should be used instead.
Besides this, some more improvements can be made, like using a negated character class instead of the greedy .* and removing the unnecessary groups.
Code:
class MyRegex {
    public static void main (String[] args) {
        String example = "_Text_\n" +
            "*text*\n" +
            "~Text~\n" +
            "`Text`\n" +
            "_Text_\n" + // is it only matching the first one?
            "``` Text ```\n" +
            "\\(Text\\)\n" +
            "~Text~\n";
        System.out.println(process(example));
    }

    private static String process(String input) {
        String processed = input.replaceAll("\\b_[^_]+_\\b", "underscore")
            .replaceAll("\\B\\*[^*]+\\*\\B", "star")
            .replaceAll("\\B```.+?```\\B", "backticks")
            .replaceAll("\\B~[^~]+~\\B", "tilde")
            .replaceAll("\\B`[^`]+`\\B", "tick")
            .replaceAll("\\B\\\\\\(.*?\\\\\\)\\B", "backslashparen");
        return processed;
    }
}
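To see the difference between \b and \B in isolation, here is a small sketch using only the star pattern. The class and method names are hypothetical.

```java
class BoundaryDemo {
    // Wraps the star pattern in the given boundary assertion ("\\b" or "\\B").
    static String star(String s, String boundary) {
        return s.replaceAll(boundary + "\\*[^*]+\\*" + boundary, "star");
    }

    public static void main(String[] args) {
        // '*' is not a word character, so there is no word boundary before it
        // at the start of the input and \b cannot match there.
        System.out.println(star("*text*", "\\b")); // prints "*text*"
        // \B matches exactly where \b does not.
        System.out.println(star("*text*", "\\B")); // prints "star"
    }
}
```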

java regex replaceAll with negated groups

I'm trying to use the String.replaceAll() method with regex to only keep letter characters and ['-_]. I'm trying to do this by replacing every character that is neither a letter nor one of the characters above by an empty string.
So far I have tried something like this (in different variations) which correctly keeps letters but replaces the special characters I want to keep:
current = current.replaceAll("(?=\\P{L})(?=[^\\'-_])", "");

Make it simpler:
current = current.replaceAll("[^a-zA-Z'_-]", "");
Explanation :
Match any char not in a to z, A to Z, ', _, - and replaceAll() method will replace any matched char with nothing.
Tested input : "a_zE'R-z4r#m"
Output : a_zE'R-zrm

You don't need lookahead, just use negated regex:
current = current.replaceAll("[^\\p{L}'_-]+", "");
[^\\p{L}'_-] will match anything that is not a letter (unicode) or single quote or underscore or hyphen.
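The practical difference between the ASCII class and \p{L} shows up with accented letters. A small sketch, with a made-up input and hypothetical class and method names:

```java
class LetterFilter {
    // Keeps ASCII letters plus ' _ -
    static String asciiOnly(String s) {
        return s.replaceAll("[^a-zA-Z'_-]", "");
    }

    // Keeps any Unicode letter plus ' _ -
    static String unicodeLetters(String s) {
        return s.replaceAll("[^\\p{L}'_-]+", "");
    }

    public static void main(String[] args) {
        String input = "caf\u00e9_42-o'clock!"; // \u00e9 is e with an acute accent
        System.out.println(asciiOnly(input));      // the accented letter is dropped
        System.out.println(unicodeLetters(input)); // the accented letter is kept
    }
}
```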

Your regex is too complicated. Just specify the characters you want to keep, and use ^ to negate, so [^a-zA-Z'_-] means "anything but these".
public class Replacer {
    public static void main(String[] args) {
        // \w would also keep digits, so the letters are listed explicitly.
        System.out.println("with 1234 &*()) -/.,>>?chars".replaceAll("[^a-zA-Z'_-]", ""));
    }
}

You can try this:
String str = "Se#rbi323a`and_Eur$ope#-t42he-[A%merica]";
str = str.replaceAll("[\\d+\\p{Punct}&&[^-'_\\[\\]]]+", "");
System.out.println("str = " + str);
And this is the result (the backtick is punctuation outside the excluded set, so it is removed too):
str = Serbiaand_Europe-the-[America]

Regular expression for string with apostrophes

I'm trying to build a regex that filters all non-alphabetical characters out of a string, except that if the string contains single quotes I want to keep them as an exception to the rule.
So for example when I enter
car's34
as a result I want to get
car's
when I enter
*&* Lisa's car 0)*
I want to get
Lisa's
At the moment I use this:
string.replaceAll("[^A-Za-z]", "")
However, it leaves only letters and removes the single quotes I want to keep.

This will also remove apostrophes that are not "part of words":
string = string.replaceAll("[^A-Za-z' ]+|(?<=^|\\W)'|'(?=\\W|$)", "")
.replaceAll(" +", " ").trim();
This first simply adds an apostrophe to the list of chars you want to keep, but uses look arounds to find apostrophes not within words, so
I'm a ' 123 & 'test'
would become
I'm a test
Note how the solitary apostrophe was removed, as well as the apostrophes wrapping test, but I'm was preserved.
The subsequent replaceAll() is to replace multiple spaces with a single space, which will result if there's a solitary apostrophe in the input. A further call to trim() was added in case it occurs at the end of the input.
Here's a test:
String string = "I'm a ' 123 & 'test'";
string = string.replaceAll("[^A-Za-z' ]+|(?<=^|\\W)'|'(?=\\W|$)", "").replaceAll(" +", " ").trim();
System.out.println(string);
Output:
I'm a test

Doesn't this work?
[^A-Za-z']

The obvious solution would be:
string.replaceAll("[^A-Za-z']", "")
I suspect you want something more.

You can try the regular expression:
[^\p{L}' ]
\p{L} denote the category of Unicode letters.
On the other hand, you should keep the Pattern in a constant to avoid recompiling the expression every time, something like this:
private static final Pattern REGEX_PATTERN =
    Pattern.compile("[^\\p{L}' ]");

public static void main(String[] args) {
    String input = "*&* Lisa's car 0)*";
    System.out.println(
        REGEX_PATTERN.matcher(input).replaceAll("")
    ); // prints " Lisa's car "
}

@Bohemian has a good idea, but word boundaries are called for instead of lookarounds:
string.replaceAll("([^A-Za-z']|\\B'|'\\B)+", " ");

Regex to match only commas not in parentheses?

I have a string that looks something like the following:
12,44,foo,bar,(23,45,200),6
I'd like to create a regex that matches the commas, but only the commas that are not inside of parentheses (in the example above, all of the commas except for the two after 23 and 45). How would I do this (Java regular expressions, if that makes a difference)?

Assuming that there can be no nested parens (otherwise, you can't use a Java Regex for this task because recursive matching is not supported):
Pattern regex = Pattern.compile(
", # Match a comma\n" +
"(?! # only if it's not followed by...\n" +
" [^(]* # any number of characters except opening parens\n" +
" \\) # followed by a closing parens\n" +
") # End of lookahead",
Pattern.COMMENTS);
This regex uses a negative lookahead assertion to ensure that the next parenthesis (if any) is not a closing parenthesis. Only then is the comma allowed to match.
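As a usage sketch, the same lookahead (written without the comments) can be passed to String.split() to split on the outer commas only. The class and method names are hypothetical, and nested parentheses are still assumed not to occur.

```java
import java.util.Arrays;

class CommaSplit {
    static String[] splitOutsideParens(String s) {
        // A comma that is not followed by a ')' before the next '('.
        return s.split(",(?![^(]*\\))");
    }

    public static void main(String[] args) {
        String subject = "12,44,foo,bar,(23,45,200),6";
        // prints [12, 44, foo, bar, (23,45,200), 6]
        System.out.println(Arrays.toString(splitOutsideParens(subject)));
    }
}
```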

Paul, resurrecting this question because it had a simple solution that wasn't mentioned. (Found your question while doing some research for a regex bounty quest.)
Also the existing solution checks that the comma is not followed by a parenthesis, but that does not guarantee that it is embedded in parentheses.
The regex is very simple:
\(.*?\)|(,)
The left side of the alternation matches a complete set of parentheses. We will ignore these matches. The right side matches and captures commas to Group 1, and we know they are the right commas because they were not matched by the expression on the left.
In this demo, you can see the Group 1 captures in the lower right pane.
You said you want to match the commas, but you can use the same general idea to split or replace.
To match the commas, you need to inspect Group 1. This full program's only goal in life is to do just that.
import java.util.*;
import java.util.regex.*;

class Program {
    public static void main (String[] args) throws java.lang.Exception {
        String subject = "12,44,foo,bar,(23,45,200),6";
        Pattern regex = Pattern.compile("\\(.*?\\)|(,)");
        Matcher regexMatcher = regex.matcher(subject);
        List<String> group1Caps = new ArrayList<String>();

        // put Group 1 captures in a list
        while (regexMatcher.find()) {
            if (regexMatcher.group(1) != null) {
                group1Caps.add(regexMatcher.group(1));
            }
        } // end of building the list

        // What are all the matches?
        System.out.println("\n" + "*** Matches ***");
        if (group1Caps.size() > 0) {
            for (String match : group1Caps) System.out.println(match);
        }
    } // end main
} // end Program
To use the same technique for splitting or replacing, see the code samples in the article in the reference.
Reference
How to match pattern except in situations s1, s2, s3
How to match a pattern unless...
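For replacing rather than matching, the same alternation can be driven through Matcher.appendReplacement. This is a sketch of one possible approach, not code taken from the referenced article; the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CommaReplacer {
    static final Pattern REGEX = Pattern.compile("\\(.*?\\)|(,)");

    static String replaceOuterCommas(String subject, String replacement) {
        Matcher m = REGEX.matcher(subject);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Group 1 is set only for commas outside parentheses;
            // a parenthesized match is put back unchanged.
            String r = (m.group(1) != null) ? replacement : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(r));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // prints 12;44;foo;bar;(23,45,200);6
        System.out.println(replaceOuterCommas("12,44,foo,bar,(23,45,200),6", ";"));
    }
}
```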

I don't understand this obsession with regular expressions, given that they are unsuited to most tasks they are used for.
// Remove the parenthesized part, keeping what comes before '(' and after ')'.
String beforeParen = longString.substring(0, longString.indexOf('(')) + longString.substring(longString.indexOf(')') + 1);
int firstComma = beforeParen.indexOf(',');
while (firstComma != -1) {
    /* do something. */
    firstComma = beforeParen.indexOf(',', firstComma + 1);
}
(Of course this assumes that there is always exactly one opening parenthesis and one matching closing parenthesis somewhere after it.)

java regular expression

Can anyone please help me do the following in a java regular expression?
I need to read 3 characters from the 5th position from a given String ignoring whatever is found before and after.
Example : testXXXtest
Expected result : XXX

You don't need regex at all.
Just use substring: yourString.substring(4,7)
If you do need to use regex, you can do it like this:
Pattern pattern = Pattern.compile(".{4}(.{3}).*");
Matcher matcher = pattern.matcher("testXXXtest");
matcher.matches();
String whatYouNeed = matcher.group(1);
What does it mean, step by step:
.{4} - any four characters
( - start capturing group, i.e. what you need
.{3} - any three characters
) - end capturing group, you got it now
.* - followed by 0 or more arbitrary characters.
matcher.group(1) - get the 1st (only) capturing group.
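Note that group(1) throws an IllegalStateException when matches() returned false, for example for an input shorter than seven characters. A minimal sketch that checks the result first; the class and method names are hypothetical, and returning null for "no match" is just an assumption for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FifthPosition {
    static final Pattern PATTERN = Pattern.compile(".{4}(.{3}).*");

    static String extract(String input) {
        Matcher matcher = PATTERN.matcher(input);
        // Only read the group if the whole input matched.
        return matcher.matches() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(extract("testXXXtest")); // prints XXX
        System.out.println(extract("test"));        // prints null
    }
}
```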

You should be able to use the substring() method to accomplish this:
String example = "testXXXtest";
String result = example.substring(4,7);

This might help: Groups and capturing in java.util.regex.Pattern.
Here is an example:
import java.util.regex.Pattern;
import java.util.regex.Matcher;
public class Example {
public static void main(String[] args) {
String text = "This is a testWithSomeDataInBetweentest.";
Pattern p = Pattern.compile("test([A-Za-z0-9]*)test");
Matcher m = p.matcher(text);
if (m.find()) {
System.out.println("Matched: " + m.group(1));
} else {
System.out.println("No match.");
}
}
}
This prints:
Matched: WithSomeDataInBetween
If you want to match the pattern against the entire input string (rather than seek a substring that matches), you can use matches() instead of find(). You can continue searching for more matching substrings with subsequent calls to find().
Also, your question did not specify which characters are admissible or how long the string between the two "test" strings may be. I assumed any length is OK, including zero, and that we seek a substring composed of lowercase and uppercase letters as well as digits.

You can use substring for this, you don't need a regex.
yourString.substring(4,7);
I'm sure you could use a regex too, but why if you don't need it. Of course you should protect this code against null and strings that are too short.
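A sketch of what that protection could look like. The helper name is hypothetical, and returning null for unusable input is only one possible convention.

```java
class SafeSlice {
    // Returns `length` characters starting at index `start`,
    // or null if the input is null or too short.
    static String slice(String s, int start, int length) {
        if (s == null || start < 0 || length < 0 || s.length() < start + length) {
            return null;
        }
        return s.substring(start, start + length);
    }

    public static void main(String[] args) {
        System.out.println(slice("testXXXtest", 4, 3)); // prints XXX
        System.out.println(slice("test", 4, 3));        // prints null
    }
}
```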

Use the String.replaceAll() method
If performance is not critical, you can use the String.replaceAll() method for a more compact option:
String sDataLine = "testXXXtest";
String sWhatYouNeed = sDataLine.replaceAll( ".{4}(.{3}).*", "$1" );
References
https://docs.oracle.com/javase/1.5.0/docs/api/java/lang/String.html
http://www.vogella.com/tutorials/JavaRegularExpressions/article.html#using-regular-expressions-with-string-methods

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.
