Find the highest number between two literal strings using a Regular Expression - java

What is the best way to use a regex to get the highest number in a group of strings that match a certain pattern?
For example:
Suppose I wanted to find the next integer to suffix an Untitled file.
Here is an example of already existing file names:
Untitled1.java -> should match
untitled2.java -> should match (case insensitive)
MyFile.java -> should not match (does not contain Untitled#.java)
NotUntitled3.java -> should not match (does not exactly match Untitled#.java)
In this example the function below should return: 3
public int getNextUntitledFileSuffix(String[] fileNames){
int nextSuffix = 1;
int maxSuffix = 1;
for (int i = 0; i < fileNames.length; i++){
//use regex to set nextSuffix
}
return nextSuffix;
}

You can use this code to extract the numeric portion from the untitledNNN.java file name:
Pattern p = Pattern.compile("^untitled(\\d+)[.]java$", Pattern.CASE_INSENSITIVE);
for (String fileName : fileNames) {
Matcher m = p.matcher(fileName);
if (!m.find()) {
continue;
}
String digits = m.group(1);
... // Parse and find the max
}
Since you are OK with throwing an exception when the number does not fit in an int, you could mark your method with throws NumberFormatException, and use Integer.parseInt(digits) to get the value. After that you could compare the number with maxSuffix, a running max value of the sequence. You should start maxSuffix at zero, not at one, because you will increment it at the end.
To avoid an overflow, check if maxSuffix is equal to Integer.MAX_VALUE before returning maxSuffix+1.

I added the rest of the logic based on dasblinkenlight's answer:
public int getNextUntitledFileSuffix(List<String> fileNames) throws NumberFormatException
{
int maxSuffix = 0;
final Pattern pattern = Pattern.compile("^untitled(\\d+)[.]java$", Pattern.CASE_INSENSITIVE);
for (String fileName : fileNames)
{
Matcher matcher = pattern.matcher(fileName);
if (matcher.find())
{
int suffix = Integer.parseInt(matcher.group(1));
if (suffix > maxSuffix)
{
maxSuffix = suffix;
}
}
}
return maxSuffix + 1;
}
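The version above returns maxSuffix + 1 without the overflow check suggested in the earlier answer. Here is a minimal sketch of one way to add it, assuming it is acceptable to throw an ArithmeticException when the largest existing suffix is already Integer.MAX_VALUE. The class name UntitledSuffix and the method name nextSuffix are illustrative only, not part of the original answers.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class UntitledSuffix {
    // Compiled once; same pattern as in the answers above.
    private static final Pattern UNTITLED =
            Pattern.compile("^untitled(\\d+)[.]java$", Pattern.CASE_INSENSITIVE);

    // Hypothetical helper: returns the next free suffix, starting at 1.
    static int nextSuffix(List<String> fileNames) {
        int maxSuffix = 0;
        for (String fileName : fileNames) {
            Matcher m = UNTITLED.matcher(fileName);
            if (m.matches()) {
                // parseInt throws NumberFormatException if the digits do not fit in an int
                maxSuffix = Math.max(maxSuffix, Integer.parseInt(m.group(1)));
            }
        }
        // addExact throws ArithmeticException instead of silently wrapping around
        return Math.addExact(maxSuffix, 1);
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList(
                "Untitled1.java", "untitled2.java", "MyFile.java", "NotUntitled3.java");
        System.out.println(nextSuffix(names)); // prints 3
    }
}
```

Using Math.addExact is a design choice; an explicit comparison against Integer.MAX_VALUE before the addition would work equally well.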

Related

capturing group of consecutive digits using regex

I'm trying to match only occurrences of exactly two adjacent 6's and count how many times that happens. For example, for 794234879669786694326666976 the answer should be 2, and for 66666 it should be 0. I'm using the following code with the pattern (66)* and matcher.groupCount() to get the number of occurrences, but it's not working:
package me;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class blah {
public static void main(String[] args) {
// Define regex intended to match a pair of sixes
String regex = "(66)*";
String text = "6678793346666786784966";
// Obtain the required matcher
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(text);
int match=0;
int groupCount = matcher.groupCount();
System.out.println("Number of group = " + groupCount);
// Find every match and print it
while (matcher.find()) {
match++;
}
System.out.println("count is "+match);
}
}
One approach here would be to use lookarounds to ensure that you match only islands of exactly two sixes:
String regex = "(?<!6)66(?!6)";
String text = "6678793346666786784966";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(text);
This finds a count of two for the input string you provided (the two matches being the 66 at the very start and the 66 at the very end of the string).
The regex pattern uses two lookarounds to assert that what comes before the first 6 and after the second 6 are not other sixes:
(?<!6) assert that what precedes is NOT 6
66 match and consume two 6's
(?!6) assert that what follows is NOT 6
You need to use
String regex = "(?<!6)66(?!6)";
Details
(?<!6) - no 6 right before the current location
66 - 66 substring
(?!6) - no 6 right after the current location.
See the Java demo:
String regex = "(?<!6)66(?!6)";
String text = "6678793346666786784966";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(text);
int match=0;
while (matcher.find()) {
match++;
}
System.out.println("count is "+match); // => count is 2
This didn't take long to come up with. I like regular expressions but I don't use them unless really necessary. Here is one loop method that appears to work.
char TARGET = '6';
int GROUPSIZE = 2;
// String with random termination character that's not a TARGET
String s = "6678793346666786784966" + "z";
int consecutiveCount = 0;
int groupCount = 0;
for (char c : s.toCharArray()) {
if (c == TARGET) {
consecutiveCount++;
}
else {
// if current character is not a TARGET, update group count if
// consecutive count equals GROUPSIZE
if (consecutiveCount == GROUPSIZE) {
groupCount++;
}
// in any event, reset consecutive count
consecutiveCount = 0;
}
}
System.out.println(groupCount);
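The lookaround pattern from the earlier answers can be parameterized in the same way as TARGET and GROUPSIZE in the loop above. The following is only a sketch: the class name ExactRuns and the method name countRuns are made up for illustration, and it assumes the target is a single char.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ExactRuns {
    // Counts runs of exactly groupSize consecutive target characters.
    static int countRuns(String text, char target, int groupSize) {
        // Quote the character so regex metacharacters such as '.' are treated literally
        String c = Pattern.quote(String.valueOf(target));
        Pattern p = Pattern.compile(
                "(?<!" + c + ")(?:" + c + "){" + groupSize + "}(?!" + c + ")");
        Matcher m = p.matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countRuns("6678793346666786784966", '6', 2));      // prints 2
        System.out.println(countRuns("794234879669786694326666976", '6', 2)); // prints 2
        System.out.println(countRuns("66666", '6', 2));                       // prints 0
    }
}
```

Unlike the loop, this needs no sentinel character appended to the input, because the negative lookahead also succeeds at the end of the string.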

How to return the first chunk of either numerics or letters from a string?

For example, if I had (-> means return):
aBc123afa5 -> aBc
168dgFF9g -> 168
1GGGGG -> 1
How can I do this in Java? I assume it's something regex related but I'm not great with regex and so not too sure how to implement it (I could with some thought but I have a feeling it would be 5-10 lines long, and I think this could be done in a one-liner).
Thanks
String myString = "aBc123afa5";
String extracted = myString.replaceAll("^([A-Za-z]+|\\d+).*$", "$1");
To use Matcher.group() and reuse a Pattern for efficiency:
// Class
private static final Pattern pattern = Pattern.compile("^([A-Za-z]+|\\d+).*$");
// Your method
{
String myString = "aBc123afa5";
Matcher matcher = pattern.matcher(myString);
if(matcher.matches())
System.out.println(matcher.group(1));
}
Note: /^([A-Za-z]+|\d+).*$/ and /^([A-Za-z]+|\d+)/ perform similarly. On regex101 you can compare the matcher debug logs to confirm this.
Without using regex, you can do this:
String string = "168dgFF9g";
String chunk = "" + string.charAt(0);
boolean searchDigit = Character.isDigit(string.charAt(0));
for (int i = 1; i < string.length(); i++) {
boolean isDigit = Character.isDigit(string.charAt(i));
if (isDigit == searchDigit) {
chunk += string.charAt(i);
} else {
break;
}
}
System.out.println(chunk);
public static String prefix(String s) {
return s.replaceFirst("^(\\d+|\\pL+|).*$", "$1");
}
where
\\d = digit
\\pL = letter
postfix + = one or more
| = or
^ = begin of string
$ = end of string
$1 = first group `( ... )`
The empty alternative (after the last |) ensures that the group (...) always matches, so a replacement always happens. Without it, an input that starts with neither a digit nor a letter would be returned unchanged.
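As an alternative to replacing the rest of the string, the leading chunk can be read directly with Matcher.find(). This is a sketch under the same assumptions as the answers above (a chunk is a run of digits or a run of letters); the names FirstChunk and firstChunk are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FirstChunk {
    // Anchored at the start: either a run of digits or a run of letters
    private static final Pattern LEADING = Pattern.compile("^(?:\\d+|\\p{L}+)");

    // Returns the leading chunk, or "" if the string starts with neither.
    static String firstChunk(String s) {
        Matcher m = LEADING.matcher(s);
        return m.find() ? m.group() : "";
    }

    public static void main(String[] args) {
        System.out.println(firstChunk("aBc123afa5")); // prints aBc
        System.out.println(firstChunk("168dgFF9g"));  // prints 168
        System.out.println(firstChunk("1GGGGG"));     // prints 1
    }
}
```

Because the pattern does not need to consume the whole input, no trailing .*$ and no empty alternative are required; a failed find() simply yields the empty string.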

how to use one string to match many rules?

Conditions:
There are many rules, maybe hundreds, which look like:
{aab*, aabc*,
aabcdd*, dtctddds*,
*ddt*,
*cddt*,
*bcddt*,
*t,
*ttt,
*ccddttt}
Each time I get one string, and I need to find the longest rule that matches it.
Examples:
Example 1: the string is aabcddttt; the matched rule should be aabcdd*
Example 2: the string is accddttt; the matched rule should be *ccddttt
Question:
I don't want to test the string against every rule in a long array one by one; that is inefficient. Maybe I should use the string itself as a regex to match against the hundreds of rules, but I can't find an elegant way to solve this problem.
Can I use some regexes to get the result?
What is the best/fastest way to match?
Java, plain C, or shell are preferred; please don't use the C++ STL.
Perhaps the longest common substring algorithm is what you are looking for =).
Why not do it simply?
String[] rules = {"^aab", "bcd", "aabcdd$", "dtctddds$", "^ddt$", "^cddt$", "^bcddt$", "^t", "^ttt", "^ccddttt"};
String testCase = "aabcddttt";
for (int i = 0; i < rules.length; i++) {
Pattern p = Pattern.compile(rules[i]);
Matcher m = p.matcher(testCase);
if (m.find()) {
System.out.println("String: " + testCase + " has matched the pattern " + rules[i]);
}
}
So in this case rules[0], which is ^aab, is found because the caret (^) means the string must begin with aab. Conversely, a pattern such as bba$ means the string must end with bba. And rules[1] is found because an unanchored pattern (e.g. bcd) can match anywhere in testCase.
You could try matching them all at once, with brackets around each sub-rule, and use the capturing groups to determine which one matched.
public static void main(String... ignored) {
for (String test : "aabaa,wwwaabcdddd,abcddtxyz".split(",")) {
System.out.println(test + " matches " + longestMatch(test, "aab*", "aabc*", "aabcdd*", "dtctddds*", "ddt"));
}
}
public static String longestMatch(String text, String... regex) {
String[] sortedRegex = regex.clone();
Arrays.sort(sortedRegex, new Comparator<String>() {
@Override
public int compare(String o1, String o2) {
return o2.length() - o1.length();
}
});
StringBuilder sb = new StringBuilder();
String sep = "(";
for (String s : sortedRegex) {
sb.append(sep).append('(').append(s).append(')');
sep = "|";
}
sb.append(")");
Matcher matcher = Pattern.compile(sb.toString()).matcher(text);
if (matcher.find()) {
for (int i = 2; i <= matcher.groupCount(); i++) {
String group = matcher.group(i);
if (group != null)
return sortedRegex[i - 2];
}
}
return "";
}
prints
aabaa matches aabc*
wwwaabcdddd matches aabcdd*
abcddtxyz matches ddt
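Note that the code above treats each rule as a regex, where aab* means "aa followed by zero or more b". If the * in the question's rules is instead meant as a wildcard for any sequence of characters, the rules can be translated before matching. The sketch below rests on two assumptions that are not stated in the question: * matches any run of characters (including none), and "longest" means the rule with the most non-wildcard characters. The names LongestRule, toPattern, and longestMatch are hypothetical. It still checks the rules one at a time, but each pattern is compiled only once.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

final class LongestRule {
    // Rule text mapped to its precompiled pattern, in insertion order
    private final Map<String, Pattern> compiled = new LinkedHashMap<>();

    LongestRule(String... rules) {
        for (String rule : rules) {
            compiled.put(rule, toPattern(rule));
        }
    }

    // Assumption: '*' is a wildcard; every other character is a literal.
    static Pattern toPattern(String rule) {
        StringBuilder sb = new StringBuilder();
        for (char c : rule.toCharArray()) {
            sb.append(c == '*' ? ".*" : Pattern.quote(String.valueOf(c)));
        }
        return Pattern.compile(sb.toString(), Pattern.DOTALL);
    }

    // Returns the matching rule with the most literal characters, or null if none match.
    String longestMatch(String text) {
        String best = null;
        int bestLen = -1;
        for (Map.Entry<String, Pattern> e : compiled.entrySet()) {
            int len = e.getKey().replace("*", "").length();
            // Only run the matcher when the rule could beat the current best
            if (len > bestLen && e.getValue().matcher(text).matches()) {
                best = e.getKey();
                bestLen = len;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        LongestRule rules = new LongestRule("aab*", "aabc*", "aabcdd*", "dtctddds*",
                "*ddt*", "*cddt*", "*bcddt*", "*t", "*ttt", "*ccddttt");
        System.out.println(rules.longestMatch("aabcddttt")); // prints aabcdd*
        System.out.println(rules.longestMatch("accddttt"));  // prints *ccddttt
    }
}
```

Counting literal characters rather than the raw rule length is what makes the first example resolve to aabcdd* instead of tying with *bcddt*.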

Issue with finding indices of multiple matches in String with regex

I'm attempting to find the indices of multiple matches in a String using Regex (test code below), for use with external libraries.
static String content = "a {non} b {1} c {1}";
static String inline = "\\{[0-9]\\}";
public static void getMatchIndices()
{
Pattern pattern = Pattern.compile(inline);
Matcher matcher = pattern.matcher(content);
while (matcher.find())
{
System.out.println(matcher.group());
Integer i = content.indexOf(matcher.group());
System.out.println(i);
}
}
OUTPUT:
{1}
10
{1}
10
It finds both groups, but returns an index of 10 for both. Any ideas?
From http://docs.oracle.com/javase/1.5.0/docs/api/java/lang/String.html#indexOf(java.lang.String):
Returns the index within this string of the first occurrence of the specified substring.
Since both match the same thing ('{1}') the first occurrence is returned in both cases.
You probably want to use Matcher#start() to determine the start of your match.
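A minimal sketch of that suggestion, using the content and pattern from the question; the class name MatchIndices and the method name matchStarts are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class MatchIndices {
    // Collects the start index of every match, in order of occurrence.
    static List<Integer> matchStarts(String regex, String content) {
        List<Integer> starts = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(content);
        while (matcher.find()) {
            // start() reports where this particular match begins,
            // unlike indexOf, which always finds the first occurrence
            starts.add(matcher.start());
        }
        return starts;
    }

    public static void main(String[] args) {
        System.out.println(matchStarts("\\{[0-9]\\}", "a {non} b {1} c {1}")); // prints [10, 16]
    }
}
```

If the end of each match is also needed, Matcher#end() returns the index just past the last matched character.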
You can do this with regexp. The following will find the locations in the string.
static String content = "a {non} b {1} c {1}";
static String inline = "\\{[0-9]\\}";
public static void getMatchIndices()
{
Pattern pattern = Pattern.compile(inline);
Matcher matcher = pattern.matcher(content);
int pos = 0;
while (matcher.find(pos)) {
int found = matcher.start();
System.out.println(found);
pos = found +1;
}
}

How to determine where a regex failed to match using Java APIs

I have tests where I validate the output with a regex. When it fails it reports that output X did not match regex Y.
I would like to add some indication of where in the string the match failed. E.g. what is the farthest the matcher got in the string before backtracking. Matcher.hitEnd() is one case of what I'm looking for, but I want something more general.
Is this possible to do?
If a match fails, then Matcher.hitEnd() tells you whether a longer string could have matched. In addition, you can specify a region in the input sequence that will be searched to find a match. So if you have a string that cannot be matched, you can test its prefixes to see where the match fails:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class LastMatch {
private static int indexOfLastMatch(Pattern pattern, String input) {
Matcher matcher = pattern.matcher(input);
for (int i = input.length(); i > 0; --i) {
Matcher region = matcher.region(0, i);
if (region.matches() || region.hitEnd()) {
return i;
}
}
return 0;
}
public static void main(String[] args) {
Pattern pattern = Pattern.compile("[A-Z]+[0-9]+[a-z]+");
String[] samples = {
"*ABC",
"A1b*",
"AB12uv",
"AB12uv*",
"ABCDabc",
"ABC123X"
};
for (String sample : samples) {
int lastMatch = indexOfLastMatch(pattern, sample);
System.out.println(sample + ": last match at " + lastMatch);
}
}
}
The output of this class is:
*ABC: last match at 0
A1b*: last match at 3
AB12uv: last match at 6
AB12uv*: last match at 6
ABCDabc: last match at 4
ABC123X: last match at 6
You can take the string, and iterate over it, removing one more char from its end at every iteration, and then check for hitEnd():
int farthestPoint(Pattern pattern, String input) {
for (int i = input.length() - 1; i > 0; i--) {
Matcher matcher = pattern.matcher(input.substring(0, i));
if (!matcher.matches() && matcher.hitEnd()) {
return i;
}
}
return 0;
}
You could use a pair of replaceAll() calls to indicate the positive and negative matches of the input string. Let's say, for example, you want to validate a hex string; the following will indicate the valid and invalid characters of the input string.
String regex = "[0-9A-F]";
String input = "J900ZZAAFZ99X";
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(input);
// Mark every valid character with '+', then every remaining character with '-'
String mask = m.replaceAll("+").replaceAll("[^+]", "-");
System.out.println(input);
System.out.println(mask);
This would print the following, with a + under valid characters and a - under invalid characters.
J900ZZAAFZ99X
-+++--+++-++-
If you want to do it outside of the code, I use Rubular to test regular expressions before putting them in the code.
