Java Regex match between string and a space - java

I have the following string in Java.
ActivityRecord{7615a77 u0 com.example.grano.example_project/.MainActivity t20}
I need to get the string MainActivity, i.e. the part between the /. and the space after the word.
So basically I'm looking for a regular expression that captures whatever sits between given characters and a whitespace.

You can use the expression:
(?<=\/\.)\w+?(?=\s)
Broken down:
(?<=\/\.) - lookbehind for a literal / followed by a literal .
\w+? - one or more word characters (non-greedy)
(?=\s) - lookahead for a whitespace character
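As a minimal sketch, the expression can be dropped into Pattern and Matcher as follows. The class and method names (LookaroundExtract, extractActivity) are made up for illustration, and the forward slash is left unescaped because Java regex syntax does not require escaping it.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaroundExtract {
    // Same expression as above, written as a Java string literal.
    private static final Pattern ACTIVITY = Pattern.compile("(?<=/\\.)\\w+?(?=\\s)");

    // Returns the word between "/." and the next whitespace, or null if there is none.
    static String extractActivity(String input) {
        Matcher m = ACTIVITY.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String s = "ActivityRecord{7615a77 u0 com.example.grano.example_project/.MainActivity t20}";
        System.out.println(extractActivity(s)); // MainActivity
    }
}
```

Because the lookbehind and lookahead consume nothing, m.group() is already the bare word and no capturing group is needed.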

Assuming your preceding text doesn't have / in it and the text you want to isolate doesn't have a space in it, you can use this:
str.replaceAll("^[^/]*/\\.([^ ]*).*$", "$1");
This matches from the start up to the first /, then the literal /., captures everything up to the first space from that point, matches the rest of the string, and replaces it all with the captured group.
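A small runnable sketch of this replaceAll approach, under the same assumptions (no / before the marker, no space inside the wanted text). The class and method names are hypothetical. Note that when the pattern does not match, replaceAll returns the input unchanged, so a caller may want to check for that.

```java
class ReplaceAllExtract {
    // Replaces the whole string with the text captured between "/." and the first space.
    static String extractAfterSlashDot(String input) {
        return input.replaceAll("^[^/]*/\\.([^ ]*).*$", "$1");
    }

    public static void main(String[] args) {
        String s = "ActivityRecord{7615a77 u0 com.example.grano.example_project/.MainActivity t20}";
        System.out.println(extractAfterSlashDot(s)); // MainActivity
    }
}
```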

You can use a regex like /\.(.*?)\s with Pattern and Matcher, like this:
String str = ...;
String regex = "/\\.(.*?)\\s";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(str);
while (matcher.find()) {
    // group(1) is the (.*?) part between "/." and the whitespace
    System.out.println(matcher.group(1));
}
Output
MainActivity

Related

How to split by non-escaped dot while ignoring double backslash?

I need to split data on dots. An escaped dot (\.) must not act as a separator, and an escaped backslash (\\) must not be mistaken for the escape of a following dot.
For example,
data1\\.d\\\\\.ata2\\\\.da\.ta3.data4
This string should be split into four substrings:
data1\\
d\\\\\.ata2\\\\
da\.ta3
data4
I cannot come up with a regex for that. Is it possible?
I tried the following:
(?<!\\((\\\\){2,}))\\. - not working
I can build the following regex if the escaped backslash occurs only once:
"((?<!\\\\)\\.)|((?=([^\\\\]*((\\\\\\\\)+[^\\\\]*)))\\.)";
For example, data1\\.d\.ata2.da\.ta3.data4 is split correctly:
data1\\
d\.ata2
da\.ta3
data4
But I cannot detect a backslash repeated an even number of times.
Can you help me, please?
Thank you!
You may extract these strings using
(?s)(?:[^\\.]|\\.)+
Details:
(?s) - enable the Pattern.DOTALL flag so that . could match across lines
(?:[^\\.]|\\.)+ - one or more occurrences of any char other than \ and ., or a \ followed with any char.
See a Java demo:
String line = "data1\\\\.d\\.ata2.da\\.ta3.data4";
Pattern p = Pattern.compile("(?s)(?:[^\\\\.]|\\\\.)+");
Matcher m = p.matcher(line);
List<String> res = new ArrayList<>();
while (m.find()) {
    res.add(m.group());
}
System.out.println(res);
// => [data1\\, d\.ata2, da\.ta3, data4]
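The demo above uses the shorter input. As a sketch, the same pattern can be run against the longer string from the question, which contains runs of several backslashes. The class name, the SAMPLE constant, and the tokens method are made up for illustration; the input is assembled from pieces only to keep the backslash counts readable.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EscapedDotTokens {
    // A token is a run of: any char other than \ and ., or a \ followed by any char.
    private static final Pattern TOKEN = Pattern.compile("(?s)(?:[^\\\\.]|\\\\.)+");

    // The question's input: data1\\.d\\\\\.ata2\\\\.da\.ta3.data4
    static final String SAMPLE =
            "data1" + "\\\\" + ".d" + "\\\\\\\\\\" + ".ata2" + "\\\\\\\\" + ".da" + "\\" + ".ta3.data4";

    static List<String> tokens(String line) {
        List<String> res = new ArrayList<>();
        Matcher m = TOKEN.matcher(line);
        while (m.find()) {
            res.add(m.group());
        }
        return res;
    }

    public static void main(String[] args) {
        for (String t : tokens(SAMPLE)) {
            System.out.println(t);
        }
        // data1\\
        // d\\\\\.ata2\\\\
        // da\.ta3
        // data4
    }
}
```

Since every backslash consumes the character after it, an even run of backslashes pairs up with itself and leaves the following dot as a separator, while an odd run escapes the dot.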
You may use this regex to get your matches:
(?=[^.])[^.\\]*(?:\\.[^.\\]*)*(?=\.|$)
RegEx Details:
(?=[^.]): Make sure there is a non-dot character ahead
[^.\\]*: Match 0 or more characters that are neither . nor \
(?:\\.[^.\\]*)*: A non-capturing group that matches a backslash followed by the escaped character, then 0 or more characters that are neither . nor \. The group itself is matched 0 or more times
(?=\.|$): Make sure we have a dot or end of line ahead
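A possible Java rendering of this regex, as a sketch: each backslash in the pattern is doubled again for the string literal. The class and method names are hypothetical, and the input is the shorter sample string from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaheadDotTokens {
    // Regex: (?=[^.])[^.\\]*(?:\\.[^.\\]*)*(?=\.|$)
    private static final Pattern TOKEN =
            Pattern.compile("(?=[^.])[^.\\\\]*(?:\\\\.[^.\\\\]*)*(?=\\.|$)");

    static List<String> tokens(String line) {
        List<String> res = new ArrayList<>();
        Matcher m = TOKEN.matcher(line);
        while (m.find()) {
            res.add(m.group());
        }
        return res;
    }

    public static void main(String[] args) {
        // Actual string content: data1\\.d\.ata2.da\.ta3.data4
        String line = "data1\\\\.d\\.ata2.da\\.ta3.data4";
        System.out.println(tokens(line)); // [data1\\, d\.ata2, da\.ta3, data4]
    }
}
```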

Match starting and ending character using Java Matcher class

I want to get the words in a string that start with # and end with a space. I've tried using Pattern.compile("#\\s*(\\w+)"), but it doesn't include characters like ' or :.
I want a solution that uses pattern matching only.
We can try matching using the pattern (?<=\\s|^)#\\S+, which would match any word starting with #, followed by one or more non-whitespace characters.
String line = "Here is a #hashtag and here is #another has tag.";
String pattern = "(?<=\\s|^)#\\S+";
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(line);
while (m.find()) {
    System.out.println(m.group(0));
}
#hashtag
#another
Note: The above solution has an edge case: it also pulls in punctuation that appears at the end of a hashtag. If you don't want this, the regex can be restricted to match only certain characters, e.g. letters and numbers. But maybe this is not a concern for you.
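One way the restricted variant mentioned in the note could look, as a sketch only: the character class [A-Za-z0-9] is an assumption about what "letters and numbers" should mean, and it would also cut off characters such as ' or : that the question wants to keep. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HashtagWords {
    // "#" at the start of the input or after whitespace, then ASCII letters/digits only.
    private static final Pattern TAG = Pattern.compile("(?<=\\s|^)#[A-Za-z0-9]+");

    static List<String> find(String line) {
        List<String> tags = new ArrayList<>();
        Matcher m = TAG.matcher(line);
        while (m.find()) {
            tags.add(m.group());
        }
        return tags;
    }

    public static void main(String[] args) {
        // The trailing comma and full stop are not part of the matches.
        System.out.println(find("Here is a #hashtag, and here is #another."));
        // [#hashtag, #another]
    }
}
```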
The opposite of \s is \S, so you can use a regex like this:
#\s*(\S+)
Or for Java:
Pattern.compile("#\\s*(\\S+)")
It will capture anything that is not whitespace.
If you want to stop at the space character only, rather than at any whitespace, change the \S to [^ ].
The ^ inside the brackets negates whatever comes after it.
Pattern.compile("#\\s*([^ ]+)")
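The difference between the two variants only shows up when the text after the # contains whitespace other than a plain space. A short sketch with a tab in the input (the class and helper names are made up for illustration):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HashtagCapture {
    // Returns group 1 of the first match of the given regex, or null if nothing matches.
    static String firstCapture(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String input = "#tag\tmore text";
        // \S+ stops at the tab, so this prints: tag
        System.out.println(firstCapture("#\\s*(\\S+)", input));
        // [^ ]+ stops only at a space, so the tab and "more" are kept in the capture
        System.out.println(firstCapture("#\\s*([^ ]+)", input));
    }
}
```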

Regex + Java - how to capture trailing numbers and everything else

I'm trying to capture 2 things in a String "T3st12345".
I want to capture the trailing numbers ("12345") and also the name of the test "T3st".
This is what I have right now to match the trailing numbers with Java's Matcher class:
Pattern pattern = Pattern.compile("([0-9]*$)");
Matcher matcher = pattern.matcher("T3st12345");
but it returns "no match found".
How can I make this work for the trailing numbers and how do I capture the name of the test as well?
You can use this regex with 2 captured groups:
^(.*?)(\d+)$
RegEx Breakup:
^: Start
(.*?): Captured group #1 that matches zero or more of any character (lazy)
(\d+): Captured group #2 that matches one or more digits before End
$: End
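In Java, this regex could be used roughly as below. It is a sketch: the class name and the split method are hypothetical, and returning null for an input without trailing digits is just one possible choice.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TrailingDigits {
    private static final Pattern NAME_AND_DIGITS = Pattern.compile("^(.*?)(\\d+)$");

    // Returns {name, trailingDigits}, or null when the input does not end in digits.
    static String[] split(String s) {
        Matcher m = NAME_AND_DIGITS.matcher(s);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] parts = split("T3st12345");
        System.out.println(parts[0]); // T3st
        System.out.println(parts[1]); // 12345
    }
}
```

The lazy (.*?) gives up characters to (\d+) only as far as needed for the digits to reach the end, which is why the 3 inside T3st stays in group 1.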
You may use the following regex:
Pattern pattern = Pattern.compile("(\\p{Alnum}+?)([0-9]*)");
Matcher matcher = pattern.matcher("T3st12345");
if (matcher.matches()) {
    System.out.println(matcher.group(1));
    System.out.println(matcher.group(2));
}
The (\\p{Alnum}+?)([0-9]*) pattern is used in the .matches() method (to require a full string match) and matches and captures into Group 1 one or more alphanumeric chars, as few as possible (+? is a lazy quantifier), and captures into Group 2 any zero or more digits.
Note that \\p{Alnum} can be replaced with a more explicit [a-zA-Z0-9].
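On the "no match found" message from the question: Matcher.group() throws an IllegalStateException with the message "No match found" when it is called before find() or matches() has succeeded. The question does not show that call, so the following is a sketch under the assumption that this is what happened. With find() called first, the original pattern does return the trailing digits (the class and method names are made up for illustration).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TrailingDigitsFind {
    // Uses the pattern from the question unchanged, but calls find() before group().
    static String trailingDigits(String s) {
        Matcher matcher = Pattern.compile("([0-9]*$)").matcher(s);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(trailingDigits("T3st12345")); // 12345
    }
}
```

Because [0-9]* can match nothing, an input without trailing digits yields an empty string rather than a failed match.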

Java Regex finding a substring without using space character in the pattern string?

I have a string with the value AB-CD>AY-ZV (FG). Out of this, I would like to get only the value AB-CD>AY-ZV using regex.
In this, I do not want to use space \s as part of my pattern.
Any help is appreciated.
You could use the below regex to match the first part.
String s = "AB-CD>AY-ZV (FG)";
Matcher m = Pattern.compile("^\\S+").matcher(s);
if (m.find()) {
    System.out.println(m.group());
}
Here \\S+ matches one or more non-whitespace characters and ^ asserts that we are at the start.

How can I remove all leading and trailing punctuation?

I want to remove all the leading and trailing punctuation in a string. How can I do this?
Basically, I want to preserve punctuation in between words, and I need to remove all leading and trailing punctuation.
., #, _, &, /, - are allowed if surrounded by letters or digits
' is allowed if preceded by a letter or digit
I tried
Pattern p = Pattern.compile("(^\\p{Punct})|(\\p{Punct}$)");
Matcher m = p.matcher(term);
boolean a = m.find();
if(a)
term=term.replaceAll("(^\\p{Punct})", "");
but it didn't work!!
OK. So basically you want to find some pattern in your string and act if the pattern is matched.
Doing this the naive way would be tedious. The naive solution could involve something like
while (myString.startsWith(".") || myString.startsWith(",") || myString.startsWith(";") /* || ... */)
    myString = myString.substring(1);
If you wanted to do a slightly more complex task, it could even be impossible to do it the way I mentioned.
That's why we use regular expressions. A regular expression is a "language" with which you can define a pattern; the computer can then tell whether a string matches that pattern. To learn about regular expressions, just search for the term online. One of the first links: http://www.codeproject.com/Articles/9099/The-30-Minute-Regex-Tutorial
As for your problem, you could try this:
myString.replaceFirst("^[^a-zA-Z]+", "")
The meaning of the regex:
The first ^ means that what comes next has to be at the start of the string.
The [] define a set of characters. In this case, those are characters that are NOT (the second ^) letters (a-zA-Z).
The + sign means that the thing before it can be repeated one or more times and still match the regex.
You can use a similar regex to remove trailing chars:
myString.replaceAll("[^a-zA-Z]+$", "");
The $ means "at the end of the string".
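Putting the two calls together, as a minimal sketch: Java strings are immutable, so the result of each call has to be used or assigned. The class and method names are hypothetical. Note that [^a-zA-Z] also strips leading and trailing digits, which may or may not be what the question wants.

```java
class StripNonLetters {
    // Removes leading, then trailing, runs of characters that are not ASCII letters.
    static String strip(String s) {
        return s.replaceFirst("^[^a-zA-Z]+", "")
                .replaceAll("[^a-zA-Z]+$", "");
    }

    public static void main(String[] args) {
        System.out.println(strip("..hello-world!!")); // hello-world
        System.out.println(strip("123abc456"));       // abc
    }
}
```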
You could use a regular expression:
private static final Pattern PATTERN =
    Pattern.compile("^\\p{Punct}*(.*?)\\p{Punct}*$");

public static String trimPunctuation(String s) {
    Matcher m = PATTERN.matcher(s);
    m.find();
    return m.group(1);
}
The boundary matchers ^ and $ ensure the whole input is matched.
A dot . matches any single character.
A star * means "match the preceding thing zero or more times".
The parentheses () define a capturing group whose value is retrieved by calling Matcher.group(1).
The ? in (.*?) means you want the match to be non-greedy, otherwise the trailing punctuation would be included in the group.
See a tutorial on patterns. You have to create a regex that matches a string starting with a letter or digit and ending with a letter or digit, and then call inputString.matches("regex").
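The answer does not give the regex itself, so the following is only one hypothetical way to write such a check. It tells you whether a string already starts and ends with an ASCII letter or digit; it does not trim anything. The class and method names are made up for illustration.

```java
class EdgeCheck {
    // True when the string starts and ends with an ASCII letter or digit.
    // String.matches() requires the whole input to match, so no ^ or $ is needed.
    static boolean hasCleanEdges(String s) {
        return s.matches("[A-Za-z0-9](?:.*[A-Za-z0-9])?");
    }

    public static void main(String[] args) {
        System.out.println(hasCleanEdges("rock&roll")); // true
        System.out.println(hasCleanEdges("-hello"));    // false
        System.out.println(hasCleanEdges("hello!"));    // false
    }
}
```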
