Regular expression to split between pipes except in brackets - java

I have the following text line:
|random|[abc|www.abc.org]|1024|
I would like to split it into 3 parts with a regular expression:
random
[abc|www.abc.org]
1024
Currently, splitting on the expression \| gives the following result:
random
[abc
www.abc.org]
1024
My problem is that I cannot exclude the pipe symbol in the middle column surrounded by the brackets [].

If you have to use split, you can use the regex
\|(?=$|[^]]+\||\[[^]]+\]\|)
https://regex101.com/r/7OxmiY/1
It matches a pipe, then looks ahead for one of:
$, the end of the string, so that the final | is split on, or
[^]]+\|, non-] characters until a pipe is reached, ensuring that pipes inside []s will not be split upon, or
\[[^]]+\]\|, the same as above, except with literal [ and ] surrounding the pattern, so that a bracketed column without an inner pipe is still split off
In Java:
String input = "|random|[abc|www.abc.org]|[test]|1024|";
// The leading pipe yields an empty first element; trailing empty strings are dropped by split.
String[] output = input.split("\\|(?=$|[^]]+\\||\\[[^]]+\\]\\|)");
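To see what the split produces, here is a minimal sketch. The class name PipeSplit and the helper splitFields are hypothetical, the ] inside each character class is escaped for readability, and the empty leading element is filtered out. It assumes at most one pipe inside each bracketed column, as in the question.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class PipeSplit {
    // Same lookahead as in the answer, with the ] in each character class escaped.
    private static final String SPLIT_REGEX =
            "\\|(?=$|[^\\]]+\\||\\[[^\\]]+\\]\\|)";

    // Hypothetical helper: splits a line and drops the empty element
    // that the leading pipe produces.
    static List<String> splitFields(String line) {
        return Arrays.stream(line.split(SPLIT_REGEX))
                .filter(field -> !field.isEmpty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String input = "|random|[abc|www.abc.org]|[test]|1024|";
        // Prints [random, [abc|www.abc.org], [test], 1024]
        System.out.println(splitFields(input));
    }
}
```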

You can use the following code:
final String regex = "(?<=\\|)\\[?[\\w.]+\\|?[\\w.]+\\]?(?=\\|)";
final String string = "|random|[abc|www.abc.org]|[test]|1024|";
final Pattern pattern = Pattern.compile(regex);
final Matcher matcher = pattern.matcher(string);
while (matcher.find()) {
    System.out.println("Full match: " + matcher.group(0));
}
Output:
Full match: random
Full match: [abc|www.abc.org]
Full match: [test]
Full match: 1024
See here at regex101: https://regex101.com/r/Fcb3Wx/1

Related

Java regex for preserving currency symbols along with comma and dot if they are surrounded by numbers

This is my input string:
String inputString = "fff.fre def $fff$ £45112,662 $0.33445533 abc,def 12,34";
I tried the regex below to split it:
String[] tokens = inputString.split("(?![$£](?=(\\d)*[.,]?(\\d)*))[\\p{Punct}\\s]");
but it does not preserve the comma and dot when they are surrounded by digits. Basically, I don't want to split on a comma or dot that is part of a price value.
Output I get is
token==>fff
token==>fre
token==>def
token==>$fff$
token==>£45112
token==>662
token==>$0
token==>33445533
token==>abc
token==>def
token==>12
token==>34
Expected output
token==>fff
token==>fre
token==>def
token==>$fff$
token==>£45112,662
token==>$0.33445533
token==>abc
token==>def
token==>12
token==>34
Instead of split, you may use this simpler regex to get all the desired matches:
[$£]\w+[$£]?|[^\p{Punct}\h]+
RegEx Breakup:
[$£]: Match $ or £
\w+: Match 1+ word chars
[$£]?: Match optional $ or £
|: OR
[^\p{Punct}\h]+: Match 1+ characters that are neither punctuation nor horizontal whitespace
Code:
final String regex = "[$£]\\w+[$£]?|[^\\p{Punct}\\h]+";
final String string = "fff.fre def $fff$ £45112,662 $0.33445533 abc,def 12,34";
final Pattern pattern = Pattern.compile(regex);
final Matcher matcher = pattern.matcher(string);
while (matcher.find()) {
System.out.println("token==>" + matcher.group());
}
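Note that [$£]\w+ stops at the first comma or dot, so on its own it still yields £45112 and 662 as separate tokens. A possible variant, offered only as a sketch, tries a dedicated price alternative first: a currency symbol, digits, then any number of comma- or dot-separated digit groups. The class name PriceTokenizer and the method tokenize are hypothetical, and £ is written as \u00A3 so the source does not depend on file encoding.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PriceTokenizer {
    // Assumed extension of the answer's pattern: the first alternative keeps
    // "," and "." inside a price; the other two are the original alternatives.
    private static final Pattern TOKEN = Pattern.compile(
            "[$\u00A3]\\d+(?:[.,]\\d+)*|[$\u00A3]\\w+[$\u00A3]?|[^\\p{Punct}\\h]+");

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher matcher = TOKEN.matcher(input);
        while (matcher.find()) {
            tokens.add(matcher.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String input = "fff.fre def $fff$ \u00A345112,662 $0.33445533 abc,def 12,34";
        for (String token : tokenize(input)) {
            System.out.println("token==>" + token);
        }
    }
}
```

Because 12,34 carries no currency symbol, it is still split into 12 and 34, which matches the expected output in the question.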

RegEx for capturing special chars

I am trying to replace a string using a regular expression. What I need, basically, is to convert a compound assignment like:
k*=i
into
k=k+i
In my example:
jregex.Pattern p=new jregex.Pattern("([a-z]|[A-Z])([a-z]|[A-Z]|\\d)*[\\+|\\*|\\-|\\/][=]([a-z]|[A-Z])*([a-z]|[A-Z]|\\d)");
Replacer r= new Replacer(p,"1=$1,2=$2,3=$3,4=$4,5=$5,6=$6,7=$7,8=$8");
String result=r.replace("k*=i");
The regex seems to not extract the special chars.
(in this example: +, -, *, /, =)
So what I get as result is:
1=k,2=,3=,4=i,5=,6=,7=,8=
(I can extract only the k & i)
How do I solve this problem?
Here, we can design an expression similar to:
(.+)[*+/-]=(.+)
where we capture our k and i using the two capturing groups at the start and end:
(.+)
The hyphen is placed last in the character class so that it is read as a literal; written as [*+-/], the +-/ part would be a range that also matches , and .
We can add more boundaries, if we wish, such as start and end anchors:
^(.+)[*+/-]=(.+)$
Test
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CompoundAssignment {
    public static void main(String[] args) {
        final String regex = "(.+)[*+/-]=(.+)";
        final String string = "k*=i\n"
                + "apple*=orange";
        // $1 is the left-hand side, $2 the right-hand side
        final String subst = "$1=$1+$2";
        final Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
        final Matcher matcher = pattern.matcher(string);
        // The substituted value will be contained in the result variable
        final String result = matcher.replaceAll(subst);
        System.out.println("Substitution result: " + result);
    }
}
You could use 3 capturing groups and capture the operator (* + / -) in a character class.
([a-zA-Z])([*+/-])=([a-zA-Z])
That will match:
([a-zA-Z]) Capture group 1, match a-z A-Z
([*+/-]) Capture group 2, match * + / -
= Match literally
([a-zA-Z]) Capture group 3, match a-z A-Z
And replace with:
$1=$1$2$3
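The three-group replacement can be sketched in plain java.util.regex as follows. The class name CompoundAssignmentRewriter and the method expand are hypothetical, and the operand groups are widened from single letters to whole identifiers or numbers, which is an assumption beyond the pattern shown above. Spaces around the operator are not handled.

```java
public class CompoundAssignmentRewriter {
    // Group 1: target identifier, group 2: operator, group 3: right-hand operand.
    // The hyphen is last in the character class so it is a literal, not a range.
    private static final String COMPOUND = "([A-Za-z]\\w*)([*+/-])=(\\w+)";

    // Hypothetical helper: rewrites "k*=i" as "k=k*i", keeping the operator.
    static String expand(String statement) {
        return statement.replaceAll(COMPOUND, "$1=$1$2$3");
    }

    public static void main(String[] args) {
        System.out.println(expand("k*=i"));       // k=k*i
        System.out.println(expand("total-=x1"));  // total=total-x1
    }
}
```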

regex find string between 2 characters, separated by comma

I am new to regular expressions and I want to find the strings between two characters.
I tried the code below, but it always returns false. What is wrong with it?
public static void main(String[] args) {
String input = "myFunction(hello ,world, test)";
String patternString = "\\(([^]]+)\\)";
Pattern pattern = Pattern.compile(patternString);
Matcher matcher = pattern.matcher(input);
while (matcher.find()) {
System.out.println(matcher.group());
}
}
Input:
myFunction(hello,world,test), where myFunction can be any characters; any characters may also precede the opening (.
Output:
hello
world
test
You could make use of the \G anchor, which asserts the position at the end of the previous match, and capture your values in a group:
(?:\bmyFunction\(|\G(?!^))([^,]+)(?:\h*,\h*)?(?=[^)]*\))
In Java:
String regex = "(?:\\bmyFunction\\(|\\G(?!^))([^,]+)(?:\\h*,\\h*)?(?=[^)]*\\))";
Explanation
(?: Non capturing group
\bmyFunction\( Word boundary to prevent the match being part of a larger word, then match myFunction and an opening parenthesis (
| Or
\G(?!^) Assert the position at the end of the previous match, not at the start of the string
) Close non capturing group
([^,]+) Capture in group 1 one or more characters other than a comma
(?:\h*,\h*)? Optionally match a comma surrounded by 0+ horizontal whitespace chars
(?=[^)]*\)) Positive lookahead, assert that a closing parenthesis ) follows further to the right
For example:
String patternString = "(?:\\bmyFunction\\(|\\G(?!^))([^,]+)(?:\\h*,\\h*)?(?=[^)]*\\))";
String input = "myFunction(hello ,world, test)";
Pattern pattern = Pattern.compile(patternString);
Matcher matcher = pattern.matcher(input);
while (matcher.find()) {
System.out.println(matcher.group(1));
}
Result
hello
world
test
I'd suggest achieving this in a two-step process:
Step 1: Capture all the content between ( and )
Use the regex: ^\S+\((.*)\)$
The first and the only capturing group will contain the required text.
Step 2: Split the captured string above on ,, thus yielding all the comma-separated parameters independently.
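The two steps could look roughly like this in Java. The class name ArgumentExtractor and the method arguments are hypothetical, and splitting on \s*,\s* (rather than a bare comma) is an assumption added so that the whitespace around each comma is dropped.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ArgumentExtractor {
    // Step 1: capture everything between the first ( and the final ).
    private static final Pattern CALL = Pattern.compile("^\\S+\\((.*)\\)$");

    static List<String> arguments(String call) {
        Matcher matcher = CALL.matcher(call);
        if (!matcher.find()) {
            return Collections.emptyList();
        }
        // Step 2: split the captured text on commas, absorbing the
        // whitespace around each comma.
        return Arrays.asList(matcher.group(1).trim().split("\\s*,\\s*"));
    }

    public static void main(String[] args) {
        for (String argument : arguments("myFunction(hello ,world, test)")) {
            System.out.println(argument);
        }
    }
}
```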
See this pattern; it may give you an idea:
([\w]+),([\w]+),([\w]+)
DEMO: https://rubular.com/r/9HDIwBTacxTy2O

Regex to match words of a certain length

I would like to know the regex to match words such that the words have a maximum length.
For example, if a word is at most 10 characters long, I would like the regex to match, but if the length exceeds 10, then the regex should not match.
I tried
^(\w{10})$
but that brings me matches only if the minimum length of the word is 10 characters. If the word is more than 10 characters, it still matches, but matches only first 10 characters.
I think you want \b\w{1,10}\b. The \b matches a word boundary.
Of course, you could also replace the \b and do ^\w{1,10}$. This will match a word of at most 10 characters as long as it's the only contents of the string. I think this is what you were doing before.
Since it's Java, you'll actually have to escape the backslashes: "\\b\\w{1,10}\\b". You probably knew this already, but it's gotten me before.
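As a rough illustration of the difference between the two forms, the sketch below uses find() with word boundaries to pick short words out of a longer text, and String.matches (which already requires the whole string to match) for the anchored case. The class name WordLength and its method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordLength {
    private static final Pattern SHORT_WORD = Pattern.compile("\\b\\w{1,10}\\b");

    // Collects every word of at most 10 characters; longer words are skipped
    // entirely because no boundary falls inside them.
    static List<String> shortWords(String text) {
        List<String> words = new ArrayList<>();
        Matcher matcher = SHORT_WORD.matcher(text);
        while (matcher.find()) {
            words.add(matcher.group());
        }
        return words;
    }

    // Whole-string check, equivalent to ^\w{1,10}$ with find().
    static boolean isShortWord(String candidate) {
        return candidate.matches("\\w{1,10}");
    }

    public static void main(String[] args) {
        System.out.println(shortWords("a supercalifragilistic word")); // [a, word]
        System.out.println(isShortWord("hello"));                      // true
        System.out.println(isShortWord("supercalifragilistic"));       // false
    }
}
```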
^\w{0,10}$ # allows words of up to 10 characters.
^\w{5,}$ # allows words of more than 4 characters.
^\w{5,10}$ # allows words of between 5 and 10 characters.
The quantifier sets the number of characters to be matched:
{n,m} n <= length <= m
{n} length == n
{n,} length >= n
By default, quantifiers are greedy. For example, if the input is 123456789, \d{2,5} will match 12345, which has length 5.
If you want the engine to stop as soon as 2 digits have matched, use the lazy form \d{2,5}?
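A small sketch of the greedy and lazy forms side by side; the class name GreedyVsLazy and the helper firstMatch are hypothetical names used only for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyVsLazy {
    // Returns the first match of the regex in the input, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        return matcher.find() ? matcher.group() : null;
    }

    public static void main(String[] args) {
        String input = "123456789";
        // Greedy: takes as many digits as the upper bound allows.
        System.out.println(firstMatch("\\d{2,5}", input));   // 12345
        // Lazy: stops as soon as the lower bound is satisfied.
        System.out.println(firstMatch("\\d{2,5}?", input));  // 12
    }
}
```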
Method 1
Word boundaries would work perfectly here, such as with:
\b\w{3,8}\b
\b\w{2,}
\b\w{,10}\b
\b\w{5}\b
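Java
Some languages, such as Java and C++, require the backslashes to be doubled inside string literals:
\\b\\w{3,8}\\b
\\b\\w{2,}
\\b\\w{1,10}\\b
\\b\\w{5}\\b
PS: \b\w{,10}\b, with the lower bound omitted, does not work in all languages or flavors. Java rejects it as an illegal repetition, so the lower bound is written out above.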
Test 1
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegularExpression {
    public static void main(String[] args) {
        final String regex = "\\b\\w{3,8}\\b";
        final String string = "words with length three to eight";
        final Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
        final Matcher matcher = pattern.matcher(string);
        while (matcher.find()) {
            System.out.println("Full match: " + matcher.group(0));
        }
    }
}
Output 1
Full match: words
Full match: with
Full match: length
Full match: three
Full match: eight
Method 2
Another good-to-know method is to use negative lookarounds:
(?<!\w)\w{3,8}(?!\w)
(?<!\w)\w{2,}
(?<!\w)\w{,10}(?!\w)
(?<!\w)\w{5}(?!\w)
Java
(?<!\\w)\\w{3,8}(?!\\w)
(?<!\\w)\\w{2,}
(?<!\\w)\\w{1,10}(?!\\w)
(?<!\\w)\\w{5}(?!\\w)
Test 2
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegularExpression {
    public static void main(String[] args) {
        final String regex = "(?<!\\w)\\w{1,10}(?!\\w)";
        final String string = "words with length three to eight";
        final Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
        final Matcher matcher = pattern.matcher(string);
        while (matcher.find()) {
            System.out.println("Full match: " + matcher.group(0));
        }
    }
}
Output 2
Full match: words
Full match: with
Full match: length
Full match: three
Full match: to
Full match: eight
If you wish to simplify, modify, or explore the expression, regex101.com explains it in its top-right panel, where you can also watch how it matches against sample inputs.
I was looking for the same regex too, but I wanted to include all the special characters and blank spaces as well. So here is the regex for that:
^[A-Za-z0-9\s$&+,:;=?##|'<>.^*()%!-]{0,10}$
Simple, complete, and tested Java code for finding words of a certain length n:
import java.util.ArrayList;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

int n = 10;
String regex = "\\b\\w{" + n + "}\\b";
String str = "Hello, this is a test 1234567890";
ArrayList<String> words = new ArrayList<>();
final Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
final Matcher matcher = pattern.matcher(str);
// Collect every word that is exactly n characters long
while (matcher.find()) {
    words.add(matcher.group(0));
}
System.out.println(words); // [1234567890]
For more explanations and different options - see other answers.
I liked Pardeep's answer, but I needed whole-word bounds in a string/title that can be any messed-up string an advertising department can think up.
\b(\w[A-Za-z0-9\s$&+,:;=?##|'<>.^*()%!-]{1,22})\b
This should iterate through a string (tested in Notepad++) and get the largest group of words that fits the range (the {1,22} quantifier here) without splitting mid-word.
Here was the final command for me in Python to add some LFs:
name = re.sub(r"\b(\w[A-Za-z0-9\s$&+,:;=?##|'<>.^*()%!-]{1,22})\b", r"\1\n", name)

RegEX: how to match string which is not surrounded

I have a String "REC/LESS FEES/CODE/AU013423".
What regular expression would match "REC" and "AU013423" (anything that is not surrounded by slashes /)?
I am using /^>*/, which matches the text within the slashes, i.e. with it I am able to find "/LESS FEES/CODE/", but I want to negate this to find the reverse, i.e. REC and AU013423.
I need help with this. Thanks.
If you know that you're only looking for alphanumeric data, you can use the regex ([A-Z0-9]+)/.*/([A-Z0-9]+). If this matches, you will have two groups, which contain the first and final text strings.
This code prints RECAU013423
final String s = "REC/LESS FEES/CODE/AU013423";
final Pattern regex = Pattern.compile("([A-Z0-9]+)/.*/([A-Z0-9]+)", Pattern.CASE_INSENSITIVE);
final Matcher matcher = regex.matcher(s);
if (matcher.matches()) {
System.out.println(matcher.group(1) + matcher.group(2));
}
You can tweak the regex groups as necessary to cover valid characters
Here's another option:
String s = "REC/LESS FEES/CODE/AU013423";
String[] results = s.split("/.*/");
System.out.println(Arrays.toString(results));
// [REC, AU013423]
^[^/]+|[^/]+$
matches anything that occurs before the first or after the last slash in the string (or the entire string if there is no slash present).
To iterate over all matches in a string in Java:
Pattern regex = Pattern.compile("^[^/]+|[^/]+$");
Matcher regexMatcher = regex.matcher(subjectString);
while (regexMatcher.find()) {
// matched text: regexMatcher.group()
// match start: regexMatcher.start()
// match end: regexMatcher.end()
}
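A self-contained version of that loop might look like the sketch below, which collects the matches into a list. The class name OuterSegments and the method outerSegments are hypothetical; the sample string is the one from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OuterSegments {
    // Text before the first slash, or text after the last slash.
    private static final Pattern OUTER = Pattern.compile("^[^/]+|[^/]+$");

    static List<String> outerSegments(String subject) {
        List<String> segments = new ArrayList<>();
        Matcher matcher = OUTER.matcher(subject);
        while (matcher.find()) {
            segments.add(matcher.group());
        }
        return segments;
    }

    public static void main(String[] args) {
        System.out.println(outerSegments("REC/LESS FEES/CODE/AU013423")); // [REC, AU013423]
        System.out.println(outerSegments("ABC"));                         // [ABC]
    }
}
```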
