Issue in writing a regular expression - Java

How do I read a string from a given position (for example, 5) to the end of the string in Java?
QRegExp StripType(re, Qt::CaseInsensitive);
int p = StripType.indexIn(line, 0);
int len = StripType.matchedLength();
QString tmp;
tmp += line.mid(len);
How can I convert this Qt code into Java?
Here re is a regular expression. I want to convert the code above into Java, and this is what I have tried:
String s = pattern.toString();
int pos = s.indexOf(line);
Matcher matcher = Pattern.compile(re).matcher(line);
if (matcher.find()) {
    System.out.println(matcher.group());
} else {
    System.out.println("String contains no character other than that");
}
len = matcher.start();
But it's not working correctly.
Thanks in advance.

To begin with, you should add the Pattern.CASE_INSENSITIVE flag, which is the Java counterpart of Qt::CaseInsensitive:
Matcher matcher = Pattern.compile(re, Pattern.CASE_INSENSITIVE).matcher(line);
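Putting the pieces together, a direct translation of the Qt snippet might look like the sketch below. It assumes re is a plain Java regex string, and the class and method names are hypothetical. The mapping used is QRegExp::indexIn to Matcher.find()/start(), matchedLength() to end() - start(), and QString::mid(pos) to String.substring(pos). The last line also answers the first question: substring(5) reads from position 5 to the end of the string.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StripTypeDemo {
    // Hypothetical helper mirroring the Qt snippet; `re` is assumed to be a Java regex string.
    static String stripType(String re, String line) {
        Matcher matcher = Pattern.compile(re, Pattern.CASE_INSENSITIVE).matcher(line);
        String tmp = "";
        if (matcher.find()) {                          // QRegExp::indexIn(line, 0) != -1
            int p = matcher.start();                   // indexIn's return value (unused, as in the Qt code)
            int len = matcher.end() - matcher.start(); // QRegExp::matchedLength()
            tmp += line.substring(len);                // QString::mid(len): from index len to the end
        }
        return tmp;
    }

    public static void main(String[] args) {
        System.out.println(stripType("abc", "ABCdef")); // def
        System.out.println("hello world".substring(5)); // " world" (from position 5 to the end)
    }
}
```

Like the Qt code, this reads from index len rather than from p + len. Use p + len instead if you want the text after the match.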

Related

regex doesn't find last word in my string

I have a regular expression [a-z]\d to unpack text that is compressed by a simple rule:
hellowoooorld -> hel2owo4rld
Now I have to unpack my text, but it doesn't work correctly: it can't find the last word in my string.
It always skips gu4ys.
StringBuilder text = new StringBuilder("Hel2o peo7ple it is ou6r wo3rld gu4ys");
Pattern pattern = Pattern.compile("[a-z]\\d");
Matcher matcher = pattern.matcher(text);
while (matcher.find()) {
    System.out.println(matcher.group());
    int startWord = matcher.start();
    int numLetters = Integer.parseInt(text.substring(startWord + 1, startWord + 2));
    text.deleteCharAt(startWord + 1);
    for (int i = 0; i < numLetters - 1; ++i) {
        text.insert(startWord + 1, text.charAt(startWord));
    }
}
System.out.println(text);
The result is: Hello peooooooople it is ouuuuuur wooorld gu4ys
I expect this: Hello peooooooople it is ouuuuuur wooorld guuuuys
I can't understand why it doesn't work; it all seems so simple.
It seems that Java's Matcher records the length of the input when it is created and does not search past that point. You are inserting into the StringBuilder, which makes it longer, so the matcher never reaches the end of the text.
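The sketch below shows the effect in isolation (the class name is hypothetical). It relies on behavior observed in the OpenJDK implementation rather than on a documented guarantee: the region end captured when the matcher is created does not follow later changes to the StringBuilder. Matcher.reset() is documented to set the region back to the entire character sequence, which is now longer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegionEndDemo {
    public static void main(String[] args) {
        StringBuilder text = new StringBuilder("ab");
        Matcher matcher = Pattern.compile("x").matcher(text);
        text.append("x");                        // the input grows after the matcher was created
        System.out.println(matcher.regionEnd()); // 2: the region end was fixed at creation
        System.out.println(matcher.find());      // false: the new "x" at index 2 is outside the region
        matcher.reset();                         // region becomes the entire (now longer) sequence
        System.out.println(matcher.regionEnd()); // 3
        System.out.println(matcher.find());      // true
    }
}
```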
A quick, though slow, fix is to re-create the matcher after every modification:
StringBuilder text = new StringBuilder("Hel2o peo7ple it is ou6r wo3rld gu4ys");
Pattern pattern = Pattern.compile("[a-z]\\d");
Matcher matcher = pattern.matcher(text);
while (matcher.find()) {
    int startWord = matcher.start();
    int numLetters = Integer.parseInt(text.substring(startWord + 1, startWord + 2));
    text.deleteCharAt(startWord + 1);
    for (int i = 0; i < numLetters - 1; ++i) {
        text.insert(startWord + 1, text.charAt(startWord));
    }
    matcher = pattern.matcher(text); // re-create the matcher so it sees the longer text
}
System.out.println(text);
A faster approach would be to find the numbers, calculate the final string length, and then build the string manually from the numbers found.
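Here is a sketch along those lines. It skips the up-front length calculation and appends to a new StringBuilder in a single pass, so the input is never edited in place. The class and method names are hypothetical, and it assumes each count is a single digit, as in the question. Because the input String never changes, the matcher sees all of it, including gu4ys. One difference from the loop above: a count of 0 drops the letter here instead of keeping it once.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Unpack {
    // One letter followed by one digit; group 1 is the letter, group 2 the count.
    private static final Pattern RUN = Pattern.compile("([a-z])(\\d)");

    static String unpack(String packed) {
        Matcher m = RUN.matcher(packed); // the String never changes, so the matcher sees all of it
        StringBuilder out = new StringBuilder();
        int last = 0;                    // end of the previous match
        while (m.find()) {
            out.append(packed, last, m.start());       // copy the text before this match unchanged
            char letter = m.group(1).charAt(0);
            int count = Integer.parseInt(m.group(2));
            for (int i = 0; i < count; i++) {
                out.append(letter);                    // e.g. "u4" becomes "uuuu"
            }
            last = m.end();
        }
        out.append(packed, last, packed.length());     // copy whatever follows the last match
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(unpack("Hel2o peo7ple it is ou6r wo3rld gu4ys"));
        // Hello peooooooople it is ouuuuuur wooorld guuuuys
    }
}
```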
The issue is probably that the matcher is only finding the pattern [a-z]\d, which matches a single letter followed by a digit, but it is not finding the last word "gu4ys" because it doesn't match that pattern.
To fix this, you can modify the regular expression to add an alternative that matches a run of letters.
Try this regex and please let me know if it worked :)
"[a-z]\\d|[a-z]+"

BREAK using Regex in JAVA

I'm new to using regex.
I want to get the macapp value from the URL, i.e. just the number 12 as a String. How do I do that?
String url = "stackoverflow.com/questions/ask:macapp=12";
Pattern pattern = ;// ?
Matcher matcher =;// ?
if (matcher.find()) {
    // result = only 12
}
Thanks for your time.
Pattern p = Pattern.compile("^.*:macapp=(\\d+)$");
Matcher m = p.matcher(url);
if (m.find()) {
    String result = m.group(1);       // "12"
    int n = Integer.parseInt(result); // 12, if you need it as a number
    // ...
}
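The pattern above is anchored with ^ and $, so it only works when macapp is the last thing in the URL. If other parameters might follow, the sketch below drops the anchors and lets find() locate the value anywhere (the class and method names are hypothetical). Be aware that it would also match a longer name ending in macapp, such as xmacapp=5.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MacAppValue {
    // No ^ or $ anchors: find() locates "macapp=" followed by digits anywhere in the URL.
    private static final Pattern MACAPP = Pattern.compile("macapp=(\\d+)");

    static String macApp(String url) {
        Matcher m = MACAPP.matcher(url);
        return m.find() ? m.group(1) : null; // null when the URL has no macapp value
    }

    public static void main(String[] args) {
        System.out.println(macApp("stackoverflow.com/questions/ask:macapp=12"));     // 12
        System.out.println(macApp("stackoverflow.com/questions/ask:macapp=12&x=1")); // 12
    }
}
```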

How to get String between last two underscore

I have a string "abcde-abc-db-tada_x12.12_999ZZZ_121121.333"
The result I want should be 999ZZZ
I have tried using:
private static String getValue(String myString) {
    Pattern p = Pattern.compile("_(\\d+)_1");
    Matcher m = p.matcher(myString);
    if (m.matches()) {
        System.out.println(m.group(1)); // Should print 999ZZZ
    } else {
        System.out.println("not found");
    }
}
If you want to continue with a regex based approach, then use the following pattern:
.*_([^_]+)_.*
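The leading .* is greedy, so it consumes as much as possible and backs off only as far as needed to stop just before the second-to-last underscore. Then _ matches that underscore, ([^_]+) captures 999ZZZ, and _.* matches the last underscore and the rest of the string.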
Code sample:
String name = "abcde-abc-db-tada_x12.12_999ZZZ_121121.333";
Pattern p = Pattern.compile(".*_([^_]+)_.*");
Matcher m = p.matcher(name);
if (m.matches()) {
    System.out.println(m.group(1)); // Should print 999ZZZ
} else {
    System.out.println("not found");
}
Using String.split?
String given = "abcde-abc-db-tada_x12.12_999ZZZ_121121.333";
String [] splitted = given.split("_");
String result = splitted[splitted.length-2];
System.out.println(result);
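There is one caveat with split("_"). Trailing empty strings are removed, so for an input such as "a_b_" the expression splitted[splitted.length-2] returns "a" instead of "b". An input with no underscore at all throws ArrayIndexOutOfBoundsException. The sketch below guards against both cases by passing a negative limit, which keeps trailing empty strings (the class and method names are hypothetical):

```java
public class BetweenLastTwo {
    // Returns the text between the last two underscores, or null if there are fewer than two.
    static String between(String given) {
        // A negative limit keeps trailing empty strings: "a_b_" -> ["a", "b", ""]
        String[] parts = given.split("_", -1);
        return parts.length >= 3 ? parts[parts.length - 2] : null;
    }

    public static void main(String[] args) {
        System.out.println(between("abcde-abc-db-tada_x12.12_999ZZZ_121121.333")); // 999ZZZ
        System.out.println(between("a_b_")); // b
        System.out.println(between("a_b"));  // null
    }
}
```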
Apart from split, you can also use substring:
String s = "abcde-abc-db-tada_x12.12_999ZZZ_121121.333";
String beforeLast = s.substring(0, s.lastIndexOf("_"));            // drop the last "_" and what follows it
String ss = beforeLast.substring(beforeLast.lastIndexOf("_") + 1); // keep what follows the new last "_"
System.out.println(ss);
Or:
String s = "abcde-abc-db-tada_x12.12_999ZZZ_121121.333";
String[] arr = s.split("_");
System.out.println(arr[arr.length - 2]);
To get the text between the last two underscore characters, you first need to find their indices, which is easy using lastIndexOf:
String s = "abcde-abc-db-tada_x12.12_999ZZZ_121121.333";
String r = null;
int idx1 = s.lastIndexOf('_');
if (idx1 != -1) {
    int idx2 = s.lastIndexOf('_', idx1 - 1);
    if (idx2 != -1)
        r = s.substring(idx2 + 1, idx1);
}
System.out.println(r); // prints: 999ZZZ
This avoids regex entirely and does not allocate an array, so it is likely faster than the regex-based and split-based solutions.
I misunderstood the logic of the code in the question on my first read, and in the meantime some great regex-based answers appeared. So here is my attempt using only methods of the String class. It introduces some variables just to make it easier to read; of course, it could be written more concisely:
String s = "abcde-abc-db-ta__dax12.12_999ZZZ_121121.333";
int indexOfLastUnderscore = s.lastIndexOf("_");
int indexOfOneBeforeLastUnderscore = s.lastIndexOf("_", indexOfLastUnderscore - 1);
if (indexOfLastUnderscore != -1 && indexOfOneBeforeLastUnderscore != -1) {
    String sub = s.substring(indexOfOneBeforeLastUnderscore + 1, indexOfLastUnderscore);
    System.out.println(sub);
}

Finding longest regex match in Java?

I have this:
import java.util.regex.*;
String regex = "(?<m1>(hello|universe))|(?<m2>(hello world))";
String s = "hello world";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(s);
while (matcher.find()) {
    MatchResult matchResult = matcher.toMatchResult();
    String substring = s.substring(matchResult.start(), matchResult.end());
    System.out.println(substring);
}
The above only prints hello whereas I want it to print hello world.
One way to fix this is to re-order the groups in String regex = "(?<m2>(hello world))|(?<m1>(hello|universe))" but I don't have control over the regex I get in my case...
So what is the best way to find the longest match? An obvious way would be to check all possible substrings of s ordered by length, as mentioned here (Efficiently finding all overlapping matches for a regular expression), and pick the first match, but that is O(n^2). Can we do better?
Here is a way of doing it using matcher regions, but with a single loop over the string index:
public static String findLongestMatch(String regex, String s) {
    Pattern pattern = Pattern.compile("(" + regex + ")$");
    Matcher matcher = pattern.matcher(s);
    String longest = null;
    int longestLength = -1;
    for (int i = s.length(); i > longestLength; i--) {
        matcher.region(0, i);
        if (matcher.find() && longestLength < matcher.end() - matcher.start()) {
            longest = matcher.group();
            longestLength = longest.length();
        }
    }
    return longest;
}
I'm forcing the pattern to match until the region's end, and then I move the region's end from the rightmost string index towards the left. For each region's end tried, Java will match the leftmost starting substring that finishes at that region's end, i.e. the longest substring that ends at that place. Finally, it's just a matter of keeping track of the longest match found so far.
As an optimization, since I go from longer regions to shorter ones, I stop the loop as soon as none of the remaining regions is longer than the longest match already found.
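The approach relies on $ matching at the end of the region rather than at the end of the whole input. That is Matcher's default, because anchoring bounds stay on unless useAnchoringBounds(false) is called. The sketch below (class and method names hypothetical) shows the difference on the question's input, with the region restricted to its first five characters:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AnchoringBoundsDemo {
    // Searches "hello world" restricted to the region [0, 5), i.e. "hello".
    static String findInRegion(boolean anchoringBounds) {
        Matcher m = Pattern.compile("(hello|hello world)$").matcher("hello world");
        m.useAnchoringBounds(anchoringBounds);
        m.region(0, 5);
        return m.find() ? m.group() : "no match";
    }

    public static void main(String[] args) {
        System.out.println(findInRegion(true));  // hello: $ matches at the region's end
        System.out.println(findInRegion(false)); // no match: $ now needs the real end of input
    }
}
```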
An advantage of this approach is that it can deal with arbitrary regular expressions and no specific pattern structure is required:
findLongestMatch("(?<m1>(hello|universe))|(?<m2>(hello world))", "hello world")
==> "hello world"
findLongestMatch("hello( universe)?", "hello world")
==> "hello"
findLongestMatch("hello( world)?", "hello world")
==> "hello world"
findLongestMatch("\\w+|\\d+", "12345 abc")
==> "12345"
If you are dealing with just this specific pattern:
There is one or more named group on the highest level connected by |.
The regex for each group is wrapped in superfluous parentheses.
Inside those braces is one or more literal connected by |.
Literals never contain |, ( or ).
Then it is possible to write a solution by extracting the literals, sorting them by their length and then returning the first match:
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

private static final Pattern g = Pattern.compile("\\(\\?\\<[^>]+\\>\\(([^)]+)\\)\\)");
public static String findLongestMatch(String s, Pattern p) {
    // Extract the literals inside each named group, e.g. "hello|universe"
    Matcher m = g.matcher(p.pattern());
    List<String> literals = new ArrayList<>();
    while (m.find()) {
        Collections.addAll(literals, m.group(1).split("\\|"));
    }
    // Try the longest literals first
    literals.sort((a, b) -> Integer.compare(b.length(), a.length()));
    for (String literal : literals) {
        if (s.contains(literal)) {
            return literal;
        }
    }
    return null;
}
Test:
System.out.println(findLongestMatch(
        "hello world",
        Pattern.compile("(?<m1>(hello|universe))|(?<m2>(hello world))")
));
// output: hello world
System.out.println(findLongestMatch(
        "hello universe",
        Pattern.compile("(?<m1>(hello|universe))|(?<m2>(hello world))")
));
// output: universe
Just add $ (end of string) before the | separator, i.e. after each alternative except the last.
Then that alternative only matches if it reaches the end of the string; otherwise the engine skips it and tries the next alternative.
The code below gives what you want:
import java.util.regex.*;
public class RegTest {
    public static void main(String[] arg) {
        String regex = "(?<m1>(hello|universe))$|(?<m2>(hello world))";
        String s = "hello world";
        Pattern pattern = Pattern.compile(regex);
        Matcher matcher = pattern.matcher(s);
        while (matcher.find()) {
            MatchResult matchResult = matcher.toMatchResult();
            String substring = s.substring(matchResult.start(), matchResult.end());
            System.out.println(substring);
        }
    }
}
Likewise, the code below skips hello and hello world and matches hello world there.
Note the $ after the first two alternatives:
import java.util.regex.*;
public class RegTest {
    public static void main(String[] arg) {
        String regex = "(?<m1>(hello|universe))$|(?<m2>(hello world))$|(?<m3>(hello world there))";
        String s = "hello world there";
        Pattern pattern = Pattern.compile(regex);
        Matcher matcher = pattern.matcher(s);
        while (matcher.find()) {
            MatchResult matchResult = matcher.toMatchResult();
            String substring = s.substring(matchResult.start(), matchResult.end());
            System.out.println(substring);
        }
    }
}
If the structure of the regex is always the same, this should work:
String regex = "(?<m1>(hello|universe))|(?<m2>(hello world))";
String s = "hello world";
//split the regex into the different groups
String[] allParts = regex.split("\\|\\(\\?\\<");
for (int i = 1; i < allParts.length; i++) {
    allParts[i] = "(?<" + allParts[i];
}
//find the longest string
int longestSize = -1;
String longestString = null;
for (int i = 0; i < allParts.length; i++) {
    Pattern pattern = Pattern.compile(allParts[i]);
    Matcher matcher = pattern.matcher(s);
    while (matcher.find()) {
        MatchResult matchResult = matcher.toMatchResult();
        String substring = s.substring(matchResult.start(), matchResult.end());
        if (substring.length() > longestSize) {
            longestSize = substring.length();
            longestString = substring;
        }
    }
}
System.out.println("Longest: " + longestString);

extracting a specific part of a url using regex

I want to use regex in Java to extract a part of a URL that sits in the middle of it.
This is what I tried. The main problem in detecting regex+java is that it is in the middle of the last part of the URL, and I have no idea how to ignore the characters after it; my regex only ignores the characters before it:
String regex = "https://www\\.google\\.com/(search)?q=([^/]+)/";
String url = "https://www.google.com/search?q=regex+java&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:en-US:official&client=firefox-a";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(url);
if (matcher.matches()) {
    int n = matcher.groupCount();
    for (int i = 0; i <= n; ++i)
        System.out.println(matcher.group(i));
}
The result should be regex+java, or even regex java, but my code didn't work.
Try:
String regex = "https://www\\.google\\.com/search\\?q=([^&]+).*";
String url = "https://www.google.com/search?q=regex+java&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:en-US:official&client=firefox-a";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(url);
if (matcher.matches()) {
    int n = matcher.groupCount();
    for (int i = 0; i <= n; ++i)
        System.out.println(matcher.group(i));
}
The result is:
https://www.google.com/search?q=regex+java&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:en-US:official&client=firefox-a
regex+java
EDIT
Replacing all pluses before printing:
for (int i = 0; i <= n; ++i) {
    String str = matcher.group(i).replace('+', ' ');
    System.out.println(str);
}
String regex = "https://www\\.google\\.com/?(search)\\?q=([^&]+)?";
String url = "https://www.google.com/search?q=regex+java&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:en-US:official&client=firefox-a";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(url);
while (matcher.find()) {
    System.out.println(matcher.group(2)); // the q value: regex+java
}
This should do the job.
