Regular expression to get characters before brackets or comma - java

I'm pulling my hair out a bit with this.
Say I have a string 7f8hd::;;8843fdj fls "": ] fjisla;vofje]]} fd)fds,f,f
I want to extract 7f8hd::;;8843fdj fls "": from the string, on the premise that the part I want ends at the first }, ], , or ). All of those characters could be present, but I only care about the first one.
I have tried without success to create a regular expression with a Matcher and Pattern class but I just can't seem to get it right.
The best I could come up with is below but my reg exp just doesn't seem to work like I think it should.
String line = "7f8hd::;;8843fdj fls \"\": ] fjisla;vofje]]} fd)fds,f,f";
Matcher matcher = Pattern.compile("(.*?)\\}|(.*?)\\]|(.*?)\\)|(.*?),").matcher(line);
while (matcher.find()) {
    System.out.println(matcher.group());
}
I'm clearly not understanding reg exp correctly. Any help would be great.

^[^\]}),]*
matches from the start of the string until (but excluding) the first ], }, ) or ,.
In Java:
Pattern regex = Pattern.compile("^[^\\]}),]*");
Matcher regexMatcher = regex.matcher(line);
if (regexMatcher.find()) {
System.out.println(regexMatcher.group());
}
(You can actually remove the backslashes ([^]}),]), but I like to keep them there for clarity and for compatibility since not all regex engines recognize that idiom.)
Explanation:
^ # Match the start of the string
[^\]}),]* # Match zero or more characters except ], }, ) or ,
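To make the explanation above concrete, here is a minimal self-contained sketch. The class name PrefixBeforeDelimiter and the method prefixOf are made up for illustration; only the pattern comes from the answer. The output is wrapped in square brackets so the trailing space in the result is visible.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original answer.
class PrefixBeforeDelimiter {
    // From the start of the input up to, but excluding, the first ], }, ) or ,
    private static final Pattern PREFIX = Pattern.compile("^[^\\]}),]*");

    static String prefixOf(String input) {
        Matcher m = PREFIX.matcher(input);
        // The pattern can match the empty string, so find() succeeds for any input.
        return m.find() ? m.group() : "";
    }

    public static void main(String[] args) {
        String line = "7f8hd::;;8843fdj fls \"\": ] fjisla;vofje]]} fd)fds,f,f";
        System.out.println("[" + prefixOf(line) + "]");
        // With no delimiter present, the whole string is returned.
        System.out.println("[" + prefixOf("no delimiter here") + "]");
    }
}
```

If the input starts with one of the delimiters, the result is simply the empty string.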

You could also just cut off the rest with replaceAll:
String newStr = yourStr.replaceAll("[\\])},].*", "");
Or use split() and take the first element:
String newStr = yourStr.split("[\\])},]")[0];
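One caveat with the split() variant: with the default limit, String.split drops trailing empty strings, so an input consisting only of a delimiter (for example ",") yields an empty array and [0] throws ArrayIndexOutOfBoundsException. A small sketch of one way around that, passing a limit of 2; the class and method names are invented for the example:

```java
// Hypothetical helper class, not part of the original answer.
class FirstTokenBySplit {
    static String firstToken(String input) {
        // Limit 2: at most one split is performed and empty strings are kept,
        // so the returned array always has at least one element.
        return input.split("[\\])},]", 2)[0];
    }

    public static void main(String[] args) {
        System.out.println(firstToken("abc]def,ghi"));
        // Without the limit, this input would produce an empty array.
        System.out.println("[" + firstToken(",") + "]");
        System.out.println(",".split("[\\])},]").length);
    }
}
```

The limit also means the rest of the string is not split any further, which avoids some unnecessary work on long inputs.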

You can use this (as java string):
"(.+?)[\\]},)].*"
here is a fiddle

Could you try the regular expression (.*?)[}\]),](.*?) and see? I tested it on Rubular and it worked against your example.

Related

${} - Regex Expression

I have really tried to learn regular expressions properly, but it blows my mind whenever I need to build one. It's painful, and I lose several hours each time.
So, I need the community's help. I have an XML string, and I want to build a regex pattern to identify any occurrence of:
${Variable1}
${VARIABLE_TEST}
etc. So, anything that starts with ${ and ends with }.
Could anyone help me?
Try with following regex:
\${([^}]+)}
Explanation:
\${ - starts with ${ (we have to escape special character $)
([^}]+) - match everything that is not }
} - ending character
demo
Regex with escaped { and }:
\$\{([^}]+)\}
try this
Matcher m = Pattern.compile("\\$\\{(.*?)}").matcher(s);
while(m.find()) {
System.out.println(m.group(1));
}
In JavaScript
const regex = /\${([^}]*)}/g;
const str = " Hello ${Variable1}, Well How are you ${VARIABLE TEST} It empty ${} It's numeric ${123\$\$\$}, numeric plus $pecial char ${#123123} ${\$%!##}";
console.log(str.match(regex));
In Java Demo
final String regex = "\\$\\{([^\\}]*)\\}";
Matcher m = Pattern.compile(regex).matcher(text);
while(m.find()) {
System.out.println(m.group(1));
}
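The answers above differ in one detail: some use [^}]+ and some use [^}]*. A short sketch of what that changes, using an invented class PlaceholderNames and a sample string of my own. With + an empty ${} is skipped, with * it is reported with an empty name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class for illustration only.
class PlaceholderNames {
    static List<String> names(String text, boolean allowEmpty) {
        // Both { and } are escaped; Java rejects an unescaped { in this position.
        String regex = allowEmpty ? "\\$\\{([^}]*)\\}" : "\\$\\{([^}]+)\\}";
        List<String> result = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            result.add(m.group(1)); // group 1 is the text between ${ and }
        }
        return result;
    }

    public static void main(String[] args) {
        String xml = "<a>${Variable1}</a><b>${VARIABLE_TEST}</b><c>${}</c>";
        System.out.println(names(xml, false)); // two names, ${} is ignored
        System.out.println(names(xml, true));  // three entries, the last one empty
    }
}
```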

Splitting strings delimited by [[ ]] in java?

I have the input string of the following form "[[Animal rights]] [[Anthropocentrism]] [[Anthropology]]" and I need to extract the tokens "Animal rights" , "Anthropocentrism" and so on etc.
I tried using the split method in the String library but I am not able to find the appropriate regular expression to get the tokens, it would be great if someone could help.
I am basically trying to parse the internal links in a Wikipedia XML file you can check out the format here.
You probably shouldn't be using split() here but instead a Matcher:
String input = "[[Animal rights]] [[Anthropocentrism]] [[Anthropology]]";
Matcher m = Pattern.compile("\\[\\[(.*?)\\]\\]").matcher(input);
while (m.find()) {
System.out.println(m.group(1));
}
Animal rights
Anthropocentrism
Anthropology
A pattern like this should work:
\[\[(.*?)\]\]
This will match a literal [[ followed by zero or more of any character, non-greedily, captured in group 1, followed by a literal ]].
Don't forget to escape the \ in the Java string literal:
Pattern.compile("\\[\\[(.*?)\\]\\]");
It's pretty easy with regex.
\[\[(.+?)\]\]
I recommend doing a .+ to make sure there is something actually in the brackets and you won't get a null if something doesn't exist when you're trying to put it in your array.
String[] output = new String[10];
String pattern = "\\[\\[(.+?)\\]\\]";
string input = "[[Animal rights]] [[Anthropocentrism]] [[Anthropology]]";
Matcher m = Pattern.compile(pattern).matcher(input);
int increment= 0;
while (m.find()) {
output[increment] = m.group(1);
increment++;
}
Since you said you wanted to learn regex as well, I'll break it down.
\[ twice matches the two [ brackets; you need the \ because [ is a special character in regex
. matches any character except newlines
+ means one or more of the preceding item
? after the + makes it lazy: the engine first matches the preceding item only once, then extends the match one character at a time until the rest of the pattern can match
\] twice matches the two ] brackets
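To see why the lazy ? matters, here is a small sketch that runs the lazy and the greedy pattern against the sample input. The class WikiLinks and its extract method are invented for this example, and it collects into a List instead of a fixed-size array so the number of links does not need to be known in advance.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class for illustration only.
class WikiLinks {
    static List<String> extract(String input, String regex) {
        List<String> links = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            links.add(m.group(1)); // text between [[ and ]]
        }
        return links;
    }

    public static void main(String[] args) {
        String input = "[[Animal rights]] [[Anthropocentrism]] [[Anthropology]]";
        // Lazy: stops at the first ]] after each [[, giving three separate links.
        System.out.println(extract(input, "\\[\\[(.*?)\\]\\]"));
        // Greedy: runs to the last ]] in the input, giving one long match.
        System.out.println(extract(input, "\\[\\[(.*)\\]\\]"));
    }
}
```

Note that (.*)? is not the lazy form: there the ? makes the whole group optional, and the .* inside is still greedy.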
Try the following:
String str = "[[Animal rights]] [[Anthropocentrism]] [[Anthropology]]";
str = str.replaceAll("(^\\[\\[|\\]\\]$)", "");
String[] array = str.split("\\]\\] \\[\\[");
System.out.println(Arrays.toString(array));
// prints "[Animal rights, Anthropocentrism, Anthropology]"

Remove occurrences of a given character sequence at the beginning of a string using Java Regex

I have a string that begins with one or more occurrences of the sequence "Re:". This "Re:" can be of any combinations, for ex. Re<any number of spaces>:, re:, re<any number of spaces>:, RE:, RE<any number of spaces>:, etc.
Sample sequence of string : Re: Re : Re : re : RE: This is a Re: sample string.
I want to define a java regular expression that will identify and strip off all occurrences of Re:, but only the ones at the beginning of the string and not the ones occurring within the string.
So the output should look like This is a Re: sample string.
Here is what I have tried:
String REGEX = "^(Re*\\p{Z}*:?|re*\\p{Z}*:?|\\p{Z}Re*\\p{Z}*:?)";
String INPUT = title;
String REPLACE = "";
Pattern p = Pattern.compile(REGEX);
Matcher m = p.matcher(INPUT);
while(m.find()){
m.appendReplacement(sb,REPLACE);
}
m.appendTail(sb);
I am using \p{Z} to match whitespace (I found this somewhere on this forum; I had read that Java regex does not recognize \s).
The problem I am facing with this code is that the search stops at the first match, and escapes the while loop.
Try something like this replace statement:
yourString = yourString.replaceAll("(?i)^(\\s*re\\s*:\\s*)+", "");
Explanation of the regex:
(?i) make it case insensitive
^ anchor to start of string
( start a group (this is the "re:")
\\s* any amount of optional whitespace
re "re"
\\s* optional whitespace
: ":"
\\s* optional whitespace
) end the group (the "re:" string)
+ one or more times
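Put together as a runnable sketch, using the sample subject from the question. The class name ReplyPrefix and the method strip are made up for this example. Since the pattern is anchored with ^, it can match at most once, so replaceFirst is enough here.

```java
import java.util.regex.Pattern;

// Hypothetical helper class for illustration only.
class ReplyPrefix {
    // One or more "re:" prefixes at the start, any case, optional whitespace around each part.
    private static final Pattern LEADING_RE = Pattern.compile("(?i)^(\\s*re\\s*:\\s*)+");

    static String strip(String subject) {
        return LEADING_RE.matcher(subject).replaceFirst("");
    }

    public static void main(String[] args) {
        System.out.println(strip("Re: Re : Re : re : RE: This is a Re: sample string."));
        // A word that merely starts with "Re" is left alone, because a colon must follow "re".
        System.out.println(strip("Regarding: the meeting"));
    }
}
```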
As for your regex:
String regex = "^(Re*\\p{Z}*:?|re*\\p{Z}*:?|\\p{Z}Re*\\p{Z}*:?)"
here is what it does:
see it live here
It matches strings like:
\p{Z}Reee\p{Z}: or
R\p{Z}\p{Z}\p{Z}
(where \p{Z} stands for one separator character), which makes no sense for what you are trying to do.
You'd better use a regex like the following:
yourString.replaceAll("(?i)^(\\s*re\\s*:\\s*)+", "");
or to make @Doorknob happy, here's another way to achieve this, using a Matcher:
Pattern p = Pattern.compile("(?i)^(\\s*re\\s*:\\s*)+");
Matcher m = p.matcher(yourString);
if (m.find())
yourString = m.replaceAll("");
(which is as the doc says the exact same thing as yourString.replaceAll())
Look it up here
(I had the same regex as @Doorknob, but thanks to @jlordo for the replaceAll and @Doorknob for thinking about the (?i) case insensitivity part ;-) )

regex pattern - extract a string only if separated by a hyphen

I've looked at other questions, but they didn't lead me to an answer.
I've got this code:
Pattern p = Pattern.compile("exp_(\\d{1}-\\d)-(\\d+)");
The string I want to be matched is: exp_5-22-718
I would like to extract 5-22 and 718. I'm not sure why it's not working. What am I missing? Many thanks.
Try this one:
Pattern p = Pattern.compile("exp_(\\d-\\d+)-(\\d+)");
In your original pattern you specified that the second number should contain exactly one digit, so I used \d+ to match as many digits as possible.
I also removed {1} from the first number, since it adds nothing to the regexp.
If the string is always prefixed with exp_ I wouldn't use a regular expression.
I would:
replaceFirst() exp_
split() the resulting string on -
Note: This answer is based on that assumption. I offer it as a more robust option if you have multiple hyphens. However, if you need to validate the format of the digits, a regular expression may be better.
In your regexp you missed the required quantifier for the second \\d. This quantifier is + or {2}.
String yourString = "exp_5-22-718";
Matcher matcher = Pattern.compile("exp_(\\d-\\d+)-(\\d+)").matcher(yourString);
if (matcher.find()) {
System.out.println(matcher.group(1)); //prints 5-22
System.out.println(matcher.group(2)); //prints 718
}
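If the goal is also to validate the whole string, a possible variation is to use matches() instead of find(), which requires the entire input to fit the pattern. The sketch below uses an invented class ExpParser and returns an empty array when the input does not fit; both are my own choices, not something from the answers. It also shows that the original pattern from the question finds nothing in the sample.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class for illustration only.
class ExpParser {
    private static final Pattern EXP = Pattern.compile("exp_(\\d-\\d+)-(\\d+)");

    // Returns {first part, second part}, or an empty array if the input does not fit.
    static String[] parse(String s) {
        Matcher m = EXP.matcher(s);
        // matches() succeeds only if the whole string fits the pattern.
        return m.matches() ? new String[] { m.group(1), m.group(2) } : new String[0];
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parse("exp_5-22-718")));
        // Original pattern: after "5-2" it expects "-", but the next character is "2".
        boolean original = Pattern.compile("exp_(\\d{1}-\\d)-(\\d+)").matcher("exp_5-22-718").find();
        System.out.println(original);
    }
}
```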
You can use the string.split methods to do this. Check the following code.
I assume that your string starts with "exp_".
String str = "exp_5-22-718";
if (str.contains("-")){
String newStr = str.substring(4, str.length());
String[] strings = newStr.split("-");
for (String string : strings) {
System.out.println(string);
}
}

How to find and replace a substring?

For example I have such a string, in which I must find and replace multiple substrings, all of which start with #, contains 6 symbols, end with ' and should not contain ) ... what do you think would be the best way of achieving that?
Thanks!
Edit:
just one more thing I forgot, to make the replacement, I need that substring, i.e. it gets replaces by a string generated from the substring being replaced.
yourNewText=yourOldText.replaceAll("#[^)]{6}'", "");
Or programmatically:
Matcher matcher = Pattern.compile("#[^)]{6}'").matcher(yourOldText);
StringBuffer sb = new StringBuffer();
while(matcher.find()){
    matcher.appendReplacement(sb,
        // implement your custom logic here, matcher.group() is the found String
        someReplacement(matcher.group()));
}
matcher.appendTail(sb);
String yourNewString = sb.toString();
Assuming you just know the substrings are formatted like you explained above, but not exactly which 6 characters, try the following:
String result = input.replaceAll("#[^\\)]{6}'", "replacement"); //pattern to replace is #+6 characters not being ) + '
You must use replaceAll with the right regular expression:
myString.replaceAll("#[^)]{6}'", "something")
If you need to replace with an extract of the matched string, use a capturing group, like this:
myString.replaceAll("#([^)]{6})'", "blah $1 blah")
The $1 in the second string refers to whatever the first parenthesized expression in the first string matched.
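When the replacement has to be computed from the matched substring (as the edit to the question asks), $1 alone is not enough. Here is a minimal sketch of the appendReplacement approach. The class HashTokenReplacer and the "upper-case it and wrap it in brackets" rule are placeholders for whatever generation logic you actually need. Matcher.quoteReplacement is used so that a $ or \ in the generated text is not interpreted as a group reference.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class for illustration only.
class HashTokenReplacer {
    // '#', then exactly six characters that are not ')', then a single quote.
    private static final Pattern TOKEN = Pattern.compile("#([^)]{6})'");

    static String replaceTokens(String text) {
        Matcher m = TOKEN.matcher(text);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            // Placeholder logic: derive the new text from the six captured characters.
            String generated = "[" + m.group(1).toUpperCase(Locale.ROOT) + "]";
            m.appendReplacement(sb, Matcher.quoteReplacement(generated));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        // The second token contains ')', so it is left untouched.
        System.out.println(replaceTokens("a #abcdef' b #12)456' c"));
    }
}
```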
this might not be the best way to do it but...
youstring = youstring.replace("#something'", "new stringx");
youstring = youstring.replace("#something2'", "new stringy");
youstring = youstring.replace("#something3'", "new stringz");
//edited after reading comments, thanks
