Split a string in Java using regex

I am trying to split a string using the regex "A.*B", which works fine for retrieving strings between 'A' and 'B'. But the dot '.' does not match newline characters such as \n and \r.
Can you please guide me on how to achieve this?
Thanks
Thanks all. Pattern.DOTALL worked like a charm.
I had another question related to this. What should be done if I need to extract all the strings between 'A' and 'B' (which basically match the above regex).
I tried using find() and group() of the Matcher class, but with the pattern below it seems to return the whole string.
Pattern p = Pattern.compile("A.*B",Pattern.DOTALL);
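For the follow-up, one common approach is a reluctant quantifier: .*? stops at the first B after each A instead of running to the last B, and looping over find() collects every match. This is a minimal sketch, assuming 'A' and 'B' are single literal characters as in the question; the class and method names are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BetweenAandB {
    // .*? is reluctant: each match ends at the first B after an A, not the last B.
    // DOTALL lets . also match \n and \r.
    static final Pattern BETWEEN = Pattern.compile("A(.*?)B", Pattern.DOTALL);

    static List<String> extract(String input) {
        List<String> results = new ArrayList<>();
        Matcher m = BETWEEN.matcher(input);
        while (m.find()) {           // each find() resumes after the previous match
            results.add(m.group(1)); // group 1 is the text between A and B
        }
        return results;
    }

    public static void main(String[] args) {
        // Collects the two pieces "1\n2" and "3"
        System.out.println(extract("xA1\n2ByA3Bz"));
    }
}
```

With the greedy A.*B, the same loop yields a single match running from the first A to the last B, which is why it appears to return the whole string.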

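Use a java.util.regex.Pattern with the DOTALL flag (MULTILINE only changes how ^ and $ match at line boundaries; it does not make . match line terminators):
import java.util.regex.Pattern;
Pattern pattern = Pattern.compile("A.*B", Pattern.DOTALL);
String[] parts = pattern.split(string);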
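A quick way to see which flag controls the dot is to compile the same pattern both ways. This small sketch (class and method names are illustrative) shows that MULTILINE leaves . unable to cross a newline, while DOTALL lets it, including when used with split().

```java
import java.util.regex.Pattern;

public class FlagDemo {
    // True if "A.*B" finds a match in the input when compiled with the given flags.
    static boolean findsAB(String input, int flags) {
        return Pattern.compile("A.*B", flags).matcher(input).find();
    }

    public static void main(String[] args) {
        String s = "A1\n2B";
        // MULTILINE only affects ^ and $, so '.' still stops at the '\n'.
        System.out.println(findsAB(s, Pattern.MULTILINE));
        // DOTALL lets '.' match the '\n', so A...B is found.
        System.out.println(findsAB(s, Pattern.DOTALL));
        // split() with DOTALL removes everything from A through B, newline included.
        String[] parts = Pattern.compile("A.*B", Pattern.DOTALL).split("xA1\n2By");
        System.out.println(String.join(",", parts));
    }
}
```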

Compile the regex with this option: Pattern regex = Pattern.compile("A.*B", Pattern.DOTALL);

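Try "A[\\s\\S]*B". (Inside a character class . is a literal dot, so "A[.\\s]*B" would only match dots and whitespace between A and B.)
Or you can specify the DOTALL flag so that "." also matches line terminators. Take a look at the documentation of the Pattern class.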
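Inside square brackets the dot loses its special meaning, so a class meant to cover "any character" has to combine complementary classes such as [\s\S]. A small comparison sketch; the class and method names are illustrative.

```java
import java.util.regex.Pattern;

public class CharClassDemo {
    // True if the whole input matches the regex.
    static boolean matches(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        String s = "A x\ny B";
        // [.\s] means "a literal dot or whitespace", so the letters x and y break the match.
        System.out.println(matches("A[.\\s]*B", s));
        // [\s\S] means "whitespace or non-whitespace", i.e. any character including \n.
        System.out.println(matches("A[\\s\\S]*B", s));
    }
}
```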

Have a look at java.util.regex.Pattern.compile(String regex, int flags), esp. the DOTALL flag

I assume you use the Pattern, Matcher classes for this.
Have you tried passing DOTALL to your Pattern.compile() method? (MULTILINE only affects ^ and $, not the dot.)
Pattern.compile(regex, Pattern.DOTALL)
'.' = Any character (may or may not match line terminators)
http://docs.oracle.com/javase/1.4.2/docs/api/java/util/regex/Pattern.html#lt

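Try changing your regex to "A(.|\\s)*B". This means A, followed by any character (.) or any whitespace character (\s) any number of times, followed by B (the \s has to be double-escaped in Java code). Be aware that on very long inputs a repeated alternation like (.|\\s)* can make Java's regex engine throw a StackOverflowError, so the DOTALL flag is usually the safer choice.
Reference for regular expressions (constructs, special characters, etc.) in Java: http://docs.oracle.com/javase/6/docs/api/java/util/regex/Pattern.html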

Related

Java regex for matching #<string>vs<string>

I have a string "Waiting for match #indvspak and #indvsaus" and want to match the strings "#indvspak" and "#indvsaus" separately.
I am using the regex (^|)#.*vs.+?\s\b, but it matches the entire string starting from the hash sign. How can I achieve this?
I assume you want to match a string that starts with #, contains vs, and is not preceded by a non-space character.
"(?<!\\S)#\\S*vs\\S+"
(?<!\\S) is a negative lookbehind asserting that the match is not preceded by a non-space character.
Code:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
String s = "Waiting for match #indvspak and #indvsaus";
Matcher m = Pattern.compile("(?<!\\S)#\\S*vs\\S+").matcher(s);
while (m.find()) {
    System.out.println(m.group());
}
Output:
#indvspak
#indvsaus
You need this regex:
#[^\\s]+
It matches # followed by one or more non-space characters.
Edit:
As @AvinashRaj suggested, if you want to ensure "vs" appears in the hashtag, use his pattern with the negative lookbehind: (?<!\\S)#\\S*vs\\S+
I highly recommend going through the String API; it has many methods that can help with this kind of problem.
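The two patterns behave differently when the text contains other hashtags: #[^\s]+ accepts any hashtag, while the lookbehind pattern also requires "vs". A small comparison sketch; the second hashtag #cricket is an invented input and the helper name is illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HashtagDemo {
    // Collects every match of the regex in the input, in order.
    static List<String> findAll(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String s = "Waiting for match #indvspak and #cricket";
        // Any hashtag: both #indvspak and #cricket
        System.out.println(findAll("#[^\\s]+", s));
        // Only hashtags containing "vs": #indvspak
        System.out.println(findAll("(?<!\\S)#\\S*vs\\S+", s));
    }
}
```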
EDITED
(copied from the other answer's comments)
Use this:
"(?<!\\S)#\\S*vs\\S+"

Regex to match \a574322 in Java

I have a long string like this: \c53\e59\c9\e28\c20140326\a4095\c8\c15\a546\c11 and I need to find the expressions that start with \a followed by digits, for example \a574322.
I have no idea how to build the pattern. I can't use:
Pattern p = Pattern.compile("\\a\\d*");
because \a is a special character in regex (the alert/bell character).
When I try to group it like this:
Pattern p = Pattern.compile("(\\)(a)(\\d)*");
I get an "Unclosed group" error even though the parentheses are balanced.
Can you help me with this?
Thank you all very much for the solutions.
You can use this regex:
\\\\a\\d+
In Java you need to escape the backslash twice: once for the String literal and a second time for the regex engine.
You have to change your regex to:
Pattern p = Pattern.compile("(\\\\a\\d+)");
The regex is:
(\\a\d+)
The idea is to escape the backslash so that it matches a literal \, then match a literal a followed by one or more digits.
You need four backslashes.
Two tell the regex engine that this is not a special character but a plain \, and each of those two must in turn be escaped so the Java String does not treat it as a special character either. So you need to write it in code this way:
"\\\\a\\d*"
Which is actually the regex \\a\d*
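A self-contained sketch applying the four-backslash pattern to the sample string from the question and collecting the digits after each \a with find() in a loop. Pattern.quote is shown as an alternative that avoids counting backslashes; the class, field, and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BackslashA {
    // Java literal "\\\\a(\\d+)" is the regex \\a(\d+): a literal backslash,
    // the letter a, then one or more digits captured in group 1.
    static final Pattern ESCAPED = Pattern.compile("\\\\a(\\d+)");

    // Same match built with Pattern.quote, which wraps \a in \Q...\E so it is taken literally.
    static final Pattern QUOTED = Pattern.compile(Pattern.quote("\\a") + "(\\d+)");

    static List<String> numbersAfterBackslashA(Pattern p, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = p.matcher(input);
        while (m.find()) {
            out.add(m.group(1)); // digits only, without the leading \a
        }
        return out;
    }

    public static void main(String[] args) {
        // The sample string from the question; every \ is doubled in the Java literal.
        String s = "\\c53\\e59\\c9\\e28\\c20140326\\a4095\\c8\\c15\\a546\\c11";
        System.out.println(numbersAfterBackslashA(ESCAPED, s));
        System.out.println(numbersAfterBackslashA(QUOTED, s));
    }
}
```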
The raw regex \\(a)[0-9]+ should also work (as a Java string literal: "\\\\(a)[0-9]+").
You can try your regexes on this page or a similar one:
http://regex101.com/

java Regex - split but ignore text inside quotes?

Using only regular expression methods, the String.replaceAll method and ArrayList, how can I split a String into tokens but ignore delimiters that appear inside quotes?
The delimiter is any character that is not alphanumeric and not part of quoted text.
for example:
The string :
hello^world'this*has two tokens'
should output:
hello
worldthis*has two tokens
I know there is a damn good and accepted answer already, but I would like to add another regex-based (and, may I say, simpler) approach that splits the text on any non-alphanumeric delimiter that is not inside single quotes.
Regex:
/(?=(([^']+'){2})*[^']*$)[^a-zA-Z\d]+/
This basically means: match non-alphanumeric text only if it is followed by an even number of single quotes; in other words, match it only if it is outside single quotes.
Code:
import java.util.Arrays;
String string = "hello^world'this*has two tokens'#2ndToken";
System.out.println(Arrays.toString(
        string.split("(?=(([^']+'){2})*[^']*$)[^a-zA-Z\\d]+")));
Output:
[hello, world'this*has two tokens', 2ndToken]
Use a Matcher to identify the parts you want to keep, rather than the parts you want to split on:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
String s = "hello^world'this*has two tokens'";
Pattern pattern = Pattern.compile("([a-zA-Z0-9]+|'[^']*')+");
Matcher matcher = pattern.matcher(s);
while (matcher.find()) {
    System.out.println(matcher.group(0));
}
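The Matcher approach keeps the quote characters, while the expected output in the question drops them. Since String.replaceAll is allowed, one option is to strip the quotes from each token after matching. A sketch assuming quotes are always balanced (an unterminated quote would split that section into separate tokens); the class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuoteAwareTokens {
    // A token is one or more runs of alphanumerics or complete '...' sections.
    static final Pattern TOKEN = Pattern.compile("([a-zA-Z0-9]+|'[^']*')+");

    static List<String> tokenize(String s) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(s);
        while (m.find()) {
            // Remove the quote characters to match the expected output in the question.
            tokens.add(m.group().replaceAll("'", ""));
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Two tokens: hello and worldthis*has two tokens
        System.out.println(tokenize("hello^world'this*has two tokens'"));
    }
}
```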
You cannot in any reasonable way. You are posing a problem that regular expressions aren't good at.
Do not use a regular expression for this. It won't work. Use / write a parser instead.
You should use the right tool for the right task.

How to include symbol of the end of the line ("\n") in regexp

I have a regular expression that needs to span multiple lines (to delete comments from a Pascal file):
\(\*.*?\*\)|\{.*?\}|\/\/(.*$)
This works almost fine, but
\(\*.*?\*\)
and
\{.*?\}
are supposed to work across multiple lines, yet they only match within a single line. How can I make them work correctly (without making
//(.*$)
span multiple lines)?
Thanks in advance
You are looking for the Pattern.DOTALL flag. Pass it to Pattern.compile like so:
Pattern p = Pattern.compile("regex", Pattern.DOTALL);
You can also set it in the regex with (?s), like: "(?s)regex"
This will match literally everything, including newlines:
Pattern regex = Pattern.compile("[\\s\\S]*");
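Because DOTALL applies to the whole pattern, it would also let //(.*$) run past the end of the line. Java's embedded flag syntax can scope it instead: (?s:X) turns DOTALL on only inside that group. A sketch under the assumption that comment markers never appear inside Pascal string literals (the regex knows nothing about strings); the class and method names are illustrative. The // branch uses //.* without $, since . already stops at the line end when DOTALL is off.

```java
import java.util.regex.Pattern;

public class StripPascalComments {
    // (?s:...) enables DOTALL only inside the group, so (* *) and { } comments
    // may span lines, while the // branch stops at the end of its line.
    static final Pattern COMMENT =
            Pattern.compile("\\(\\*(?s:.*?)\\*\\)|\\{(?s:.*?)\\}|//.*");

    static String strip(String source) {
        return COMMENT.matcher(source).replaceAll("");
    }

    public static void main(String[] args) {
        String src = "begin (* a\nmulti-line comment *) x := 1; { another\none } // line\ny := 2;";
        System.out.println(strip(src));
    }
}
```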

How to match a set of string in regexp

How do I combine ch..+ and ch..- in regexp effectively without having to scan separately?
And are we using matcher in the pattern?
My output looks like this:
ch01+
ch01-
ch02+
ch02-
...
How do I combine ch..+ and ch..- in regexp effectively without having to scan separately?
Use | (pipe) for alternation:
ch..(\+|-)
And are we using matcher in the pattern?
Depends on how you're using the regexp and the pattern. To get a concrete answer, you'll have to show some actual code, or ask a much more specific question.
N.B. If you want to restrict the two characters after ch to 0-9, you can use \d, which is a shorthand character class for [0-9]:
ch\d{2}(\+|-)
You can use a character class containing just "+" and "-" like so "[+-]".
import java.util.regex.Matcher;
import java.util.regex.Pattern;
Pattern p = Pattern.compile("ch..[+-]");
Matcher m = p.matcher("ch01+");
if (m.find()) {
    // found it...
}

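To scan all of the output in one pass rather than one line at a time, the combined pattern can be looped with find(). A minimal sketch, assuming the channel numbers are always two digits as in the sample output; the class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ChannelScan {
    // [+-] accepts either sign in one pass; \d{2} restricts the middle to two digits.
    static final Pattern CHANNEL = Pattern.compile("ch\\d{2}[+-]");

    static List<String> scan(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = CHANNEL.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(scan("ch01+\nch01-\nch02+\nch02-"));
    }
}
```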