Java Regular Expression for finding specific string - java

I have a file with a long string, and I would like to split it at a specific item, e.g.:
String line = "{{[Metadata{\"this, is my first, string\"}]},{[Metadata{\"this, is my second, string\"}]},{[Metadata{\"this, is my third string\"}]}}";
String[] tab = line.split("(?=\\bMetadata\\b)");
Now when I iterate over tab, I get lines starting with the word "Metadata", but I would like lines starting with:
"{[Metadata"
I've tried something like:
String[] tab = line.split("(?=\\b{[Metadata\\b)");
but it doesn't work.
Can anyone help me with this, please?

You may use
(?=\{\[Metadata\b)
See a demo on regex101.com.
Note that the backslashes need to be escaped in Java so that it becomes
(?=\\{\\[Metadata\\b)
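As a minimal sketch of that split in context (the class and method names are made up for illustration), the lookahead is zero-width, so each piece keeps its leading {[Metadata. Note that whatever precedes the first match, here a lone {, comes back as the first element:

```java
import java.util.regex.Pattern;

public class MetadataSplit {
    // Zero-width lookahead: split just before each literal "{[Metadata"
    private static final Pattern BEFORE_ITEM =
            Pattern.compile("(?=\\{\\[Metadata\\b)");

    // Hypothetical helper, not part of the original answer
    static String[] splitItems(String line) {
        return BEFORE_ITEM.split(line);
    }

    public static void main(String[] args) {
        String line = "{{[Metadata{\"this, is my first, string\"}]},"
                + "{[Metadata{\"this, is my second, string\"}]},"
                + "{[Metadata{\"this, is my third string\"}]}}";
        for (String part : splitItems(line)) {
            System.out.println(part);
        }
    }
}
```

If the leading { is unwanted, it can simply be skipped when iterating over the result.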

Here is a solution using a formal pattern matcher. We can try matching your content using the following regex (written as it appears inside a Java string literal):
(?<=Metadata\\{\")[^\"]+
This uses a lookbehind to check for the Metadata marker followed by { and an opening double quote. Then it matches any content up to the closing double quote.
String line = "{{[Metadata{\"this, is my first, string\"}]},{[Metadata{\"this, is my second, string\"}]},{[Metadata{\"this, is my third string\"}]}}";
String pattern = "(?<=Metadata\\{\")[^\"]+";
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(line);
while (m.find( )) {
System.out.println(m.group(0));
}
this, is my first, string
this, is my second, string
this, is my third string

Related

Need Regex to replace all characters between first set of parenthesis from a string

I've been able to generate a regex to pull everything that is between parentheses in a string, but I'm unclear on how to make it happen only once, and only for the first set. In Java:
My current pattern = "\\(([^)]+)\\)"
Any help would be greatly appreciated.
Use replaceFirst instead of replaceAll
Or, if you must use replaceAll, let the pattern consume the rest of your string and put it back again, like this:
replaceAll("yourRegex(.*)","yourReplacement$1");
where $1 represents the match from the first group, (.*). This assumes yourRegex contains no capturing groups of its own.
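A small sketch comparing the two options, assuming single-line input (by default . does not match line terminators). The class and method names are hypothetical, and the capturing group from the question's pattern is dropped so that $1 refers to the trailing (.*):

```java
import java.util.regex.Matcher;

public class FirstParens {
    // Option 1: replaceFirst stops after the first match
    static String withReplaceFirst(String s, String replacement) {
        return s.replaceFirst("\\([^)]+\\)", Matcher.quoteReplacement(replacement));
    }

    // Option 2: replaceAll, where the trailing (.*) swallows the rest of the
    // string and $1 puts it back, so no second match can occur
    static String withReplaceAll(String s, String replacement) {
        return s.replaceAll("\\([^)]+\\)(.*)",
                Matcher.quoteReplacement(replacement) + "$1");
    }

    public static void main(String[] args) {
        System.out.println(withReplaceFirst("a(b)c(d)", "X"));
        System.out.println(withReplaceAll("a(b)c(d)", "X"));
    }
}
```

Matcher.quoteReplacement is used so that a replacement containing $ or \ is treated literally.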
try:
String x= "Hie(Java)";
Matcher m = Pattern.compile("\\((.*?)\\)").matcher(x);
while(m.find()) {
System.out.println(m.group(1));
}
or
String str = "Hie(Java)";
String answer = str.substring(str.indexOf("(")+1,str.indexOf(")"));
To cut at the last ")" instead, update it to:
String answer = str.substring(str.indexOf("(")+1,str.lastIndexOf(")"));

regular expression text between two sign

I have a text in which I want to replace variables with their proper values; the variables are located between two # signs. When I use [/(?m)#.*?#/] to get these texts, it also returns the text before the first # and after the last #. How can I get only the text between the two # signs? Thanks in advance.
I use the String.split("") method in Java.
For example, I want to use it on the following string:
this is #the best# possible way #t#o do result!!!
and I want to get these two results:
the best
t
In Java you can use this regex to grab the value between the first and second #:
String repl = input.replaceFirst("(?m)^[^#]*#([^#]*)#.*$", "$1");
To grab the value between the first and last #:
String repl = input.replaceFirst("(?m)^[^#]*#(.*?)#[^#]*$", "$1");
To find multiple matches, use Pattern and Matcher:
Pattern p = Pattern.compile("#([^#]*)#");
Matcher m = p.matcher(input);
while (m.find()) {
System.out.println(m.group(1));
}
RegEx Demo
split() is the wrong tool to use here; use a Matcher to do this instead.
String s = "this is #the best# possible way #t#o do result!!!";
Pattern p = Pattern.compile("#([^#]*)#");
Matcher m = p.matcher(s);
while (m.find()) {
System.out.println(m.group(1));
}
Output
the best
t
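Since the stated goal is to replace the variables with values, the same pattern can drive a substitution. This is only a sketch under assumptions: the values live in a Map, unknown names are left untouched, and the class and method names are invented for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HashTemplate {
    private static final Pattern VAR = Pattern.compile("#([^#]*)#");

    // Replaces each #name# with values.get(name); unknown names stay as-is
    static String fill(String template, Map<String, String> values) {
        Matcher m = VAR.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = values.getOrDefault(m.group(1), m.group());
            // quoteReplacement keeps any $ or \ in the value literal
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> values = new HashMap<>();
        values.put("the best", "a good");
        System.out.println(fill("this is #the best# possible way", values));
    }
}
```

StringBuffer is used because the StringBuilder overload of appendReplacement only exists from Java 9 onward.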

Splitting strings delimited by [[ ]] in java?

I have an input string of the following form: "[[Animal rights]] [[Anthropocentrism]] [[Anthropology]]", and I need to extract the tokens "Animal rights", "Anthropocentrism", and so on.
I tried using the split method in the String library, but I am not able to find the appropriate regular expression to get the tokens. It would be great if someone could help.
I am basically trying to parse the internal links in a Wikipedia XML file; you can check out the format here.
You probably shouldn't be using split() here but instead a Matcher:
String input = "[[Animal rights]] [[Anthropocentrism]] [[Anthropology]]";
Matcher m = Pattern.compile("\\[\\[(.*?)\\]\\]").matcher(input);
while (m.find()) {
System.out.println(m.group(1));
}
Animal rights
Anthropocentrism
Anthropology
A pattern like this should work:
\[\[(.*?)\]\]
This will match a literal [[ followed by zero or more of any character, non-greedily, captured in group 1, followed by a literal ]].
Don't forget to escape the \ in the Java string literal:
Pattern.compile("\\[\\[(.*?)\\]\\]");
It's pretty easy with regex.
\[\[(.+?)\]\]
I recommend using .+ to make sure there is actually something inside the brackets, so you won't store an empty string when you put the matches in your array.
String[] output = new String[10];
String pattern = "\\[\\[(.+?)\\]\\]";
String input = "[[Animal rights]] [[Anthropocentrism]] [[Anthropology]]";
Matcher m = Pattern.compile(pattern).matcher(input);
int increment= 0;
while (m.find()) {
output[increment] = m.group(1);
increment++;
}
Since you said you also wanted to learn regex, I'll break it down.
\[ twice matches the two literal [ brackets; you need the \ because [ is a special character in regex
. matches any character except newlines
+ means one or more of the preceding item
? after + makes the quantifier lazy: the engine first matches the preceding item only once, then extends the match one character at a time until the rest of the pattern matches
\] matches a literal ]
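If the number of links is not known in advance, a List avoids the fixed-size array. A minimal sketch using the same pattern (the class and method names are hypothetical):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WikiLinks {
    // Literal [[, lazily captured content, literal ]]
    private static final Pattern LINK = Pattern.compile("\\[\\[(.+?)\\]\\]");

    // Collects the text of every [[...]] link, in order of appearance
    static List<String> extract(String input) {
        List<String> links = new ArrayList<>();
        Matcher m = LINK.matcher(input);
        while (m.find()) {
            links.add(m.group(1));
        }
        return links;
    }

    public static void main(String[] args) {
        String input = "[[Animal rights]] [[Anthropocentrism]] [[Anthropology]]";
        System.out.println(extract(input));
    }
}
```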
Try the following:
String str = "[[Animal rights]] [[Anthropocentrism]] [[Anthropology]]";
str = str.replaceAll("(^\\[\\[|\\]\\]$)", "");
String[] array = str.split("\\]\\] \\[\\[");
System.out.println(Arrays.toString(array));
// prints "[Animal rights, Anthropocentrism, Anthropology]"

Optionally using String.split(), split a string at the last occurrence of a delimiter

I have a string that matches this regular expression: ^.+:[0-9]+(\.[0-9]+)*/[0-9]+$ which can easily be visualized as (Text):(Double)/(Int). I need to split this string into the three parts. Normally this would be easy, except that the (Text) may contain colons, so I cannot split on any colon - but rather the last colon.
The .* is greedy, so it already does a pretty neat job of doing this, but this won't work as a regular expression passed to String.split() because it will eat my (Text) as part of the delimiter. Ideally I'd like to have something that would return a String[] with three strings. I'm 100% fine with not using String.split() for this.
I don't like regex (just kidding I do but I'm not very good at it).
String s = "asdf:1.0/1";
// The text may contain ':' or '/', so search from the end of the full string
String text = s.substring(0, s.lastIndexOf(":"));
String doub = s.substring(s.lastIndexOf(":") + 1, s.lastIndexOf("/"));
String inte = s.substring(s.lastIndexOf("/") + 1);
Why don't you just use a straight up regular expression?
Pattern p = Pattern.compile("^(.*):([\\d\\.]+)/(\\d+)$");
Matcher m = p.matcher( someString );
if (m.find()) {
m.group(1); // returns the text before the colon
m.group(2); // returns the double between the colon and the slash
m.group(3); // returns the integer after the slash
}
Or similar. The pattern ^(.*):([\d\.]+)/(\d+)$ assumes that you actually have values in all three positions, and will allow just a period/fullstop in the double position, so you may want to tweak it to your specifications.
String.split() is typically used in simpler scenarios where the delimiter and formatting are more consistent and when you don't know how many elements you are going to be splitting.
Your use case calls for a plain old regular expression. You know the formatting of the string, and you know you want to collect three values. Try something like the following.
Pattern p = Pattern.compile("(.+):([0-9\\.]+)/([0-9]+)$");
Matcher m = p.matcher(myString);
if (m.find()) {
String myText = m.group(1);
String myFloat = m.group(2);
String myInteger = m.group(3);
}
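To see that the greedy first group really does stop at the last colon, here is a small sketch that follows the number format from the question's own regex. The class and method names are assumptions for illustration, and throwing on bad input is just one possible choice:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LastColonSplit {
    // (Text):(Double)/(Int); the greedy (.+) backtracks to the last colon
    // that still lets the rest of the pattern match
    private static final Pattern FORMAT =
            Pattern.compile("(.+):([0-9]+(?:\\.[0-9]+)*)/([0-9]+)");

    // Returns {text, double, int} as strings, or throws if the format is wrong
    static String[] parts(String s) {
        Matcher m = FORMAT.matcher(s);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unexpected format: " + s);
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        for (String p : parts("a:b:1.5/2")) {
            System.out.println(p);
        }
    }
}
```

matches() is used instead of find() so the whole string has to fit the format, which makes the ^ and $ anchors unnecessary.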

java regular expression get substring

I can't find any good resources for parsing with regular expressions. Could someone please show me the way?
How can I parse this statement?
"Breakpoint 10, main () at file.c:10"
I want to get the substring "main ()", or the 3rd word of the statement.
This works:
public void test1() {
String text = "Breakpoint 10, main () at file.c:10";
String regex = ",(.*) at";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(text);
while (matcher.find()) {
System.out.println(matcher.group(1));
}
}
Basically, the regular expression ,(.*) at with group(1) returns the value main () (with a leading space, because the space after the comma is captured too).
Assuming you want the 3rd word of your string (as said in your comments), first break it using a StringTokenizer. That will allow you to specify separator (space is by default)
List<String> words = new ArrayList<String>();
String str = "Breakpoint 10, main () at file.c:10";
StringTokenizer st = new StringTokenizer(str); // space by default
while(st.hasMoreElements()){
words.add(st.nextToken());
}
String result = words.get(2);
That returns main
If you also want the (), as you defined spaces as separator, you also need to take the next word words.get(3)
Good website regular-expressions.info
Good online tester regexpal.com
Java http://download.oracle.com/javase/tutorial/essential/regex/
I turn to these when I want to play with Regex
Have you seen the standard Sun tutorial on regular expressions? In particular, the section on matching groups would be of use.
Try: .*Breakpoint \d+, (.*) at
Well, the regular expression main \(\) does parse this. However, I suspect that you would like everything after the first comma and before the last "at": ,(.*) at gives you that in group(1), the group opened by the parenthesis in the expression.
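Putting these suggestions together, a sketch that captures the function part without the leading space might look like this. It assumes every line has the shape "Breakpoint N, something at file:line", inferred from the single sample; the class and method names are hypothetical:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BreakpointLine {
    // The literal ", " before the group keeps the leading space out of the capture
    private static final Pattern LINE =
            Pattern.compile("^Breakpoint \\d+, (.*) at ");

    // Returns the text between "Breakpoint N, " and " at ", or null if absent
    static String location(String line) {
        Matcher m = LINE.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(location("Breakpoint 10, main () at file.c:10"));
    }
}
```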
