Java regex to find delimiters with release character - java

I'm trying to get behavior similar to what \ does in Java String literals: two of them make a literal \, otherwise it "escapes" whatever follows. So a delimiter that follows a single release character doesn't count, but two release characters resolve to a literal release character, so a delimiter following them should be treated as a delimiter. In other words, if an odd number of release characters precedes a delimiter, it is ignored; for zero or an even number, it is a delimiter. So, in the examples below (release character ?, delimiter :):
?: <- : is not a delimiter
??: <- : is a delimiter
???: <- : is not a delimiter
????: <- : is a delimiter
Here's sample code showing what doesn't work.
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class TestPattern
{
public static void main(final String[] args)
{
final Matcher m = Pattern.compile("(\\?\\?)*[^\\?]\\:").matcher("a??:b:c");
m.find(0);
System.out.println(m.end());
}
}

The following should work
\b(\?{2})*:

The * means there can be zero of that group. So that capturing group can be the empty string. [^\\?] can be any character that isn't a ?, as ? is not a special character inside a character class. The \ is ignored.
Therefore, b: (with an empty string preceding it) matches, and the second colon is your last (and, in this case, first) match.
I think you simply want "(\\?\\?)*\\?:".

Your regex means:
Zero or more '??'
(\\?\\?)*
Followed by not '?'
[^\\?]
Ending in ':'
\\:
So the first match is b: (ending at the last colon in the input), and that is why m.end() returns 6.
You could change for:
final Matcher m = Pattern.compile("((\\?){2})+").matcher("a??:b:????:c");
while (m.find()){
//outputs 1 and 6, the places
//where you would have to start
//escaping...
System.out.println(m.start());
}

It appears that just by reversing the regex it works. Putting the "don't match a ?" first, and then the "any even number of ?'s", seems to do the trick:
[^?](\\?\\?)*:
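The reversed pattern still needs some non-? character in front of the release characters, so it would miss a delimiter at the very start of the input. A possible variation, sketched below, replaces the leading [^?] with a negative lookbehind. It assumes ? is the release character and : the delimiter, as in the question; the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ReleaseCharDelimiters {
    // (?<!\?)   : the run of release characters must begin here (no '?' just before)
    // (?:\?\?)* : zero or more PAIRS of release characters, i.e. an even count
    // :         : the delimiter itself
    private static final Pattern DELIMITER = Pattern.compile("(?<!\\?)(?:\\?\\?)*:");

    // Returns the index of every ':' preceded by an even number of '?'.
    static List<Integer> delimiterPositions(String input) {
        List<Integer> positions = new ArrayList<>();
        Matcher m = DELIMITER.matcher(input);
        while (m.find()) {
            positions.add(m.end() - 1); // the match always ends with the delimiter
        }
        return positions;
    }

    public static void main(String[] args) {
        // prints [6, 17, 19]
        System.out.println(delimiterPositions("a?:b??:c???:d????:e:f"));
    }
}
```

Because the lookbehind consumes nothing, a delimiter at index 0 is also reported.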

Related

Java regex (java.util.regex). Search for dollar sign

I have a search string.
When it contains a dollar symbol, I want to capture all characters thereafter, but not include the dot or a subsequent dollar symbol. The latter would constitute a subsequent match.
So for either of these search strings...:
"/bla/$V_N.$XYZ.bla";
"/bla/$V_N.$XYZ";
I would want to return:
V_N
XYZ
If the search string contains percent symbols, I also want to return what's between the pair of % symbols.
The following regex seems to do the trick for that.
"%([^%]*?)%";
Meaning:
Start and end with a %,
have a capture group - the (),
have a character class containing anything except a % symbol (the caret negates the class),
repeated, but not greedily: *?
Where some languages allow %1, %2, for capture groups, Java uses backslash\number syntax instead. So, this string compiles and generates output.
I suspect the dollar symbol and dot need escaping, as they are special symbols:
$ is usually end of string
. is a meta sequence for any character.
I have tried escaping them with double backslashes (\\),
both in character classes, e.g. [^\\.\\$%],
and using OR'd notation, %|\\$,
in attempts to combine this logic and can't seem to get anything to play ball.
I wonder if another pair of eyes can see how to solve this conundrum!
My attempts so far:
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
class Main {
public static void main(String[] args) {
String search = "/bla/$V_N.$XYZ.bla";
String pattern = "([%\\$])([^%\\.\\$]*?)\\1?";
/* Either % or $ in first capture group ([%\\$])
* Second capture group - anything except %, dot or dollar sign
* non greedy group ( *?)
* then a backreference to an optional first capture group \\1?
* Have to use two \, since you escape \ in a Java string.
*/
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(search);
List<String> results = new ArrayList<String>();
while (m.find())
{
for (int i = 0; i<= m.groupCount(); i++) {
results.add(m.group(i));
}
}
for (String result : results) {
System.out.println(result);
}
}
}
The following links may be helpful:
An interactive Java playground where you can experiment and copy/paste code.
Regex101
Java RegexTester
Java backreferences (The optional backreference \\1 in the Regex).
Link that summarises Regex special characters often found in languages
Java Regex book EPub link
Regex Info Website
Matcher class in the Javadocs
You may use
String search = "/bla/$V_N.$XYZ.bla";
String pattern = "[%$]([^%.$]*)";
Matcher matcher = Pattern.compile(pattern).matcher(search);
while (matcher.find()){
System.out.println(matcher.group(1));
} // => V_N, XYZ
See the Java demo and the regex demo.
NOTE
You do not need an optional \1? at the end of the pattern. As it is optional, it does not restrict the match context and is redundant (the negated character class already cannot match either $ or %).
[%$]([^%.$]*) matches % or $, then captures into Group 1 any zero or more
chars other than %, . and $. You only need Group 1 value, hence, matcher.group(1) is used.
In a character class, neither . nor $ are special, thus, they do not need escaping in [%.$] or [%$].
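If the paired %...% form from the question also has to be handled, one option is an alternation with one capture group per form. The following is only a sketch under that assumption; the class name, method name and the combined pattern are illustrative and not taken from the answer above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TokenExtractor {
    // %([^%]*)%      : text between a pair of % symbols (group 1)
    // \$([^%.$]*)    : text after a $ up to the next %, dot or $ (group 2)
    // Inside the character class, '.' and '$' need no escaping.
    private static final Pattern TOKEN = Pattern.compile("%([^%]*)%|\\$([^%.$]*)");

    static List<String> extract(String search) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(search);
        while (m.find()) {
            // Exactly one of the two groups takes part in each match.
            tokens.add(m.group(1) != null ? m.group(1) : m.group(2));
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(extract("/bla/$V_N.$XYZ.bla")); // prints [V_N, XYZ]
    }
}
```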

Splitting a string on whitespaces

I'm currently trying to split a string into a multi-line string.
The regex should select whitespace characters that have 13 characters before them.
The problem is that the 13-character count does not reset after the previously selected whitespace. So, after the first 13 characters, the regex selects every whitespace.
I'm using the following regex with a positive look-behind of 13 characters:
(?<=.{13})
(there is a whitespace at the end)
You can test the regex here and the following code:
import java.util.ArrayList;
public class HelloWorld{
public static void main(String []args){
String str = "This is a test. The app should break this string in substring on whitespaces after 13 characters";
for (String string : str.split("(?<=.{13}) ")) {
System.out.println(string);
}
}
}
The output of this code is as follows:
This is a test.
The
app
should
break
this
string
in
substring
on
whitespaces
after
13
characters
But it should be:
This is a test.
The app should
break this string
in substring on
whitespaces after
13 characters
You may actually use a lazy limiting quantifier to match the lines and then replace with $0\n:
.{13,}?[ ]
See the regex demo
IDEONE demo:
String str = "This is a test. The app should break this string in substring on whitespaces after 13 characters";
System.out.println(str.replaceAll(".{13,}?[ ]", "$0\n"));
Note that the pattern matches:
.{13,}? - any character that is not a newline (if you need to match any character, use DOTALL modifier, though I doubt you need it in the current scenario), 13 times at least, and it can match more characters but up to the first space encountered
[ ] - a literal space (a character class is redundant, but it helps visualize the pattern).
The replacement pattern - "$0\n" - is re-inserting the whole matched value (it is stored in Group 0) and appends a newline after it.
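With "$0\n" the matched space stays at the end of each line. If that trailing space is unwanted, a small variation is to capture the text and replace only with the captured group. This is a sketch; the class name, method name and the minWidth parameter are illustrative.

```java
final class LineWrapper {
    // Breaks the text at the first run of spaces found after at least minWidth characters.
    static String wrap(String text, int minWidth) {
        // (.{n,}?) lazily captures at least n characters; " +" consumes the spaces
        // that follow, so they are not copied into the output.
        String regex = "(.{" + minWidth + ",}?) +";
        return text.replaceAll(regex, "$1\n");
    }

    public static void main(String[] args) {
        String str = "This is a test. The app should break this string in substring on whitespaces after 13 characters";
        System.out.println(wrap(str, 13));
    }
}
```

The last chunk has no space after it, so it is left untouched and simply ends the output.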
You can just match and capture at least 13 characters before white spaces (or the end of the string) rather than splitting.
Java code:
Pattern p = Pattern.compile( "(.{13,}?)(?: +|$)" );
Matcher m = p.matcher( text );
List<String> matches = new ArrayList<>();
while(m.find()) {
matches.add(m.group(1));
}
It will produce:
This is a test.
The app should
break this string
in substring on
whitespaces after
13 characters
RegEx Demo
You can do this with .split and a regular expression. It would look like this:
line.split("\\s+");
This will split the line into words at every run of one or more whitespace characters.

Match String ending with (regex) java

I am following the suggestions on the page, check if string ends with certain pattern
I am trying to display a string that is
Starts with anything
Has the letters ".mp4" in it
Ends explicitly with ', (apostrophe followed by comma)
Here is my Java code:
import java.util.*;
import java.lang.*;
import java.io.*;
import java.util.regex.*;
class Ideone
{
public static void main (String[] args) throws java.lang.Exception
{
// your code goes here
String str = " _file='ANyTypEofSTR1ngHere_133444556_266545797_10798866.mp4',";
Pattern p = Pattern.compile(".*.mp4[',]$");
Matcher m = p.matcher(str);
if(m.find())
System.out.println("yes");
else
System.out.println("no");
}
}
It prints "no". How should I declare my RegEx?
There are several issues in your regex:
"Has the letters .mp4 in it" means somewhere, not necessarily just in front of ',, so another .* should be inserted.
. matches any character. Use \. to match .
[,'] is a character group, i.e. exactly one of the characters in the brackets has to occur.
You can use the following regex instead:
Pattern p = Pattern.compile(".*\\.mp4.*',$");
Your character set [',] is checking whether the string ends with ' or , a single time.
If you want to match those character one or more times, use [',]+. However, you probably don't want to use a character set in this case since you said order is important.
To match an apostrophe followed by comma, just use:
.*\\.mp4',$
Also, since . has special meaning, you need to escape it in '.mp4'.
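To see the difference between the patterns side by side, here is a small sketch that runs the original and the corrected expressions against the string from the question. The class and helper names are made up for illustration.

```java
import java.util.regex.Pattern;

final class Mp4SuffixCheck {
    // True if the regex finds a match anywhere in the input (same as m.find() in the question).
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String str = " _file='ANyTypEofSTR1ngHere_133444556_266545797_10798866.mp4',";

        // Original: [',] allows only ONE character after "mp4", but the string has two.
        System.out.println(found(".*.mp4[',]$", str));    // prints false

        // Corrected: escaped dot, then apostrophe followed by comma at the end.
        System.out.println(found(".*\\.mp4',$", str));    // prints true

        // Variant that allows arbitrary text between ".mp4" and the final "',".
        System.out.println(found(".*\\.mp4.*',$", str));  // prints true
    }
}
```

The escaped dot matters for inputs such as xmp4', where an unescaped . would let x stand in for the dot.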

Java regular expression lookahead

I have strings that I need to use regex to replace a specific character. The strings are in the following format:
"abc.edf" : "abc.abc", "ghi.ghk" : "bbb.bbb" , "qwq.tyt" : "ddd.ddd"
I need to replace the periods, '.', that are between the strings in quotes before the colon but not the strings in quotes after the colon and before the comma. Could someone shed some light?
This pattern will match the entire part that you want to touch: "\w{3}\.\w{3}" : "\w{3}\.\w{3}". Since it includes the colon and the values on both side, it won't match ones where there is a comma between the values. Depending on your needs, you may need to change \w to some other character class.
But, as I'm sure you are aware, you don't want to replace the entire string. You only want to replace the one character. There are two ways to do that. You can either use look-aheads and look-behinds to exclude everything else except the period from the resulting match:
Pattern: (?<="\w{3})\.(?=\w{3}" : "\w{3}\.\w{3}")
Replacement: :
Or, if the look-aheads and look-behinds confuse you, you could just capture the whole thing and include the original values from the captured groups in the replacement value:
Pattern: ("\w{3})\.(\w{3}" : "\w{3}\.\w{3}")
Replacement: $1:$2
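In Java source the look-around pattern needs its quotes and backslashes escaped. A minimal sketch of how it might be applied, assuming three-character keys and values as in the sample input (class and method names are illustrative):

```java
import java.util.regex.Pattern;

final class KeyDotLookaround {
    // Lookbehind: a quote and three word characters before the dot.
    // Lookahead: the rest of the key, " : ", and a complete quoted value.
    private static final Pattern KEY_DOT = Pattern.compile(
            "(?<=\"\\w{3})\\.(?=\\w{3}\" : \"\\w{3}\\.\\w{3}\")");

    static String replaceKeyDots(String input) {
        // Only the dot itself is matched, so only the dot is replaced.
        return KEY_DOT.matcher(input).replaceAll(":");
    }

    public static void main(String[] args) {
        String text = "\"abc.edf\" : \"abc.abc\", \"ghi.ghk\" : \"bbb.bbb\" , \"qwq.tyt\" : \"ddd.ddd\"";
        System.out.println(replaceKeyDots(text));
    }
}
```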
Try with the following pattern: /\.(?=[a-z]+)/g
Working regex-demo for substitution # regex101
Java Working Demo:
public class StackOverFlow31520446 {
public static String text;
public static String pattern;
public static String replacement;
static {
text = "\"abc.edf\" : \"123.231\", \"ghi.ghk\" : \"456.678\" , \"qwq.tyt\" : \"141.242\"";
pattern = "\\.(?=[a-z]+)";
replacement = ";";
}
public static String replaceMatches(String text, String pattern, String replacement) {
return text.replaceAll(pattern, replacement);
}
public static void main(String[] args) {
System.out.println(replaceMatches(text, pattern, replacement));
}
}
Not sure what you intend to do with the string but this is a way to
match the contents of the quotes.
The contents are in capture buffer 1.
You could use a callback to replace the dots within the
contents, passing that back within the main replacement function.
Find: "([^"]*\.[^"]*)"(?=\s*:)
Replace: " + func( call to replace dots from capt buff 1 ) + "
Formatted:
" # Open quote
( [^"]* \. [^"]* ) # (1), group 1 - contents
" # Close quote
(?= # Lookahead, must be a colon
\s*
:
)
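In Java, the "callback" can be approximated with Matcher.appendReplacement and Matcher.appendTail. The sketch below is one way to do it; the class and method names are hypothetical, and : is used as the replacement character as in the other answers.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class QuotedKeyRewriter {
    // A quoted string containing a dot, followed (after optional whitespace) by a colon.
    private static final Pattern KEY = Pattern.compile("\"([^\"]*\\.[^\"]*)\"(?=\\s*:)");

    static String replaceDotsInKeys(String input) {
        Matcher m = KEY.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // The "callback": rewrite the captured contents, then re-add the quotes.
            String fixed = "\"" + m.group(1).replace('.', ':') + "\"";
            // quoteReplacement guards against '$' or '\' in the rewritten text.
            m.appendReplacement(out, Matcher.quoteReplacement(fixed));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "\"abc.edf\" : \"abc.abc\", \"ghi.ghk\" : \"bbb.bbb\" , \"qwq.tyt\" : \"ddd.ddd\"";
        System.out.println(replaceDotsInKeys(text));
    }
}
```

Unlike the fixed-width look-behind version, this does not depend on the keys being exactly three characters on each side of the dot.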
I would go for a different approach (maybe it is even faster). In your loop over all strings, first check whether the string matches a number, \d*\.?\d*; if not, replace . with : (without any regexp).
Would that solve your problem?
You can do it without look arounds:
str = str.replaceAll("(\\D)\\.(\\D)", "$1:$2");
should be sufficient for the task.

Regex to match only commas not in parentheses?

I have a string that looks something like the following:
12,44,foo,bar,(23,45,200),6
I'd like to create a regex that matches the commas, but only the commas that are not inside of parentheses (in the example above, all of the commas except for the two after 23 and 45). How would I do this (Java regular expressions, if that makes a difference)?
Assuming that there can be no nested parens (otherwise, you can't use a Java Regex for this task because recursive matching is not supported):
Pattern regex = Pattern.compile(
", # Match a comma\n" +
"(?! # only if it's not followed by...\n" +
" [^(]* # any number of characters except opening parens\n" +
" \\) # followed by a closing parens\n" +
") # End of lookahead",
Pattern.COMMENTS);
This regex uses a negative lookahead assertion to ensure that the next following parenthesis (if any) is not a closing parenthesis. Only then the comma is allowed to match.
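As a usage sketch, the same lookahead (written without the COMMENTS flag) can be used to split the sample string. The class and method names here are illustrative, and the no-nested-parentheses assumption still applies.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

final class LookaheadCommaSplit {
    // A comma that is not followed by ")" before any "(" appears.
    private static final Pattern TOP_LEVEL_COMMA = Pattern.compile(",(?![^(]*\\))");

    static List<String> split(String subject) {
        return Arrays.asList(TOP_LEVEL_COMMA.split(subject));
    }

    public static void main(String[] args) {
        // prints [12, 44, foo, bar, (23,45,200), 6]
        System.out.println(split("12,44,foo,bar,(23,45,200),6"));
    }
}
```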
Paul, resurrecting this question because it had a simple solution that wasn't mentioned. (Found your question while doing some research for a regex bounty quest.)
Also the existing solution checks that the comma is not followed by a parenthesis, but that does not guarantee that it is embedded in parentheses.
The regex is very simple:
\(.*?\)|(,)
The left side of the alternation matches a complete set of parentheses. We will ignore these matches. The right side matches and captures commas to Group 1, and we know they are the right commas because they were not matched by the expression on the left.
In this demo, you can see the Group 1 captures in the lower right pane.
You said you want to match the commas, but you can use the same general idea to split or replace.
To match the commas, you need to inspect Group 1. This full program's only goal in life is to do just that.
import java.util.*;
import java.io.*;
import java.util.regex.*;
import java.util.List;
class Program {
public static void main (String[] args) throws java.lang.Exception {
String subject = "12,44,foo,bar,(23,45,200),6";
Pattern regex = Pattern.compile("\\(.*?\\)|(,)");
Matcher regexMatcher = regex.matcher(subject);
List<String> group1Caps = new ArrayList<String>();
// put Group 1 captures in a list
while (regexMatcher.find()) {
if(regexMatcher.group(1) != null) {
group1Caps.add(regexMatcher.group(1));
}
} // end of building the list
// What are all the matches?
System.out.println("\n" + "*** Matches ***");
if(group1Caps.size()>0) {
for (String match : group1Caps) System.out.println(match);
}
} // end main
} // end Program
Here is a live demo
To use the same technique for splitting or replacing, see the code samples in the article in the reference.
Reference
How to match pattern except in situations s1, s2, s3
How to match a pattern unless...
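For splitting with the same regex, one possible approach (a sketch, not the code from the referenced article) is to cut the subject only at matches where Group 1 is set. The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class GroupOneSplitter {
    private static final Pattern REGEX = Pattern.compile("\\(.*?\\)|(,)");

    static List<String> splitOutsideParens(String subject) {
        List<String> parts = new ArrayList<>();
        Matcher m = REGEX.matcher(subject);
        int last = 0; // start of the piece currently being collected
        while (m.find()) {
            if (m.group(1) != null) {       // a comma outside parentheses
                parts.add(subject.substring(last, m.start()));
                last = m.end();
            }                               // parenthesized matches are skipped
        }
        parts.add(subject.substring(last)); // remainder after the last comma
        return parts;
    }

    public static void main(String[] args) {
        // prints [12, 44, foo, bar, (23,45,200), 6]
        System.out.println(splitOutsideParens("12,44,foo,bar,(23,45,200),6"));
    }
}
```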
I don’t understand this obsession with regular expressions, given that they are unsuited to most tasks they are used for.
String beforeParen = longString.substring(0, longString.indexOf('(')) + longString.substring(longString.indexOf(')') + 1);
int firstComma = beforeParen.indexOf(',');
while (firstComma != -1) {
/* do something. */
firstComma = beforeParen.indexOf(',', firstComma + 1);
}
(Of course this assumes that there is always exactly one opening parenthesis and one matching closing parenthesis somewhere after it.)
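If that assumption is too strict, a plain character scan with a depth counter is another regex-free option. The following is a sketch with illustrative names; it reports comma positions in the original string and, unlike the regex answers, also copes with nested parentheses.

```java
import java.util.ArrayList;
import java.util.List;

final class TopLevelCommaScanner {
    // Returns the indexes of all commas that are not inside parentheses.
    static List<Integer> topLevelCommas(String s) {
        List<Integer> positions = new ArrayList<>();
        int depth = 0; // number of currently open parentheses
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '(') {
                depth++;
            } else if (c == ')') {
                if (depth > 0) {
                    depth--; // ignore stray closing parentheses
                }
            } else if (c == ',' && depth == 0) {
                positions.add(i);
            }
        }
        return positions;
    }

    public static void main(String[] args) {
        // prints [2, 5, 9, 13, 25]
        System.out.println(topLevelCommas("12,44,foo,bar,(23,45,200),6"));
    }
}
```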
