regular expression for file name - java

I have files in the format *C:\Temp\myfile_124.txt*
I need a regular expression that gives me just the number "124", that is, whatever comes after the underscore and before the extension.
I tried a number of ways; the latest is
(.+[0-9]{18,})(_[0-9]+)?\\.txt$
I am not getting the desired output. Can someone tell me what is wrong?
Matcher matcher = FILE_NAME_PATTERN.matcher(filename);
if (matcher.matches() && matcher.groupCount() == 2) {
    try {
        String index = matcher.group(2);
        if (index != null) {
            // group 2 includes the leading underscore, so skip it
            return Integer.parseInt(index.substring(1));
        }
    } catch (NumberFormatException e) {
        // ignored: the index is not a valid int
    }
}

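The first part, [0-9]{18,}, requires at least 18 digits, which your file name does not have.
With regex it's usually a good idea to keep the expression as simple as possible. I suggest trying
_([0-9]+)?\\.txt$
Note: matches() only succeeds if the whole string matches, so with this shorter pattern you have to call find() to search within the string. Calling group() without a successful find() or matches() throws an IllegalStateException saying "No match found".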
This example
String s = "C:\\Temp\\myfile_124.txt";
Pattern p = Pattern.compile("_(\\d+)\\.txt$");
Matcher matcher = p.matcher(s);
if (matcher.find()) {
    for (int i = 0; i <= matcher.groupCount(); i++) {
        System.out.println(i + ": " + matcher.group(i));
    }
}
prints
0: _124.txt
1: 124
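If you need the number as an int rather than a printed group, the same pattern can be wrapped in a small helper. This is only a sketch: the class name FileIndexExtractor and the method extractIndex are made up for illustration, and returning an empty OptionalInt for names that do not fit is an assumption about what you want.

```java
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FileIndexExtractor {
    // Same pattern as in the answer: underscore, digits, ".txt" at the end.
    private static final Pattern INDEX = Pattern.compile("_(\\d+)\\.txt$");

    /** Returns the number after the underscore, or empty if the name does not fit. */
    public static OptionalInt extractIndex(String filename) {
        Matcher m = INDEX.matcher(filename);
        if (!m.find()) {            // find() searches inside the string
            return OptionalInt.empty();
        }
        try {
            return OptionalInt.of(Integer.parseInt(m.group(1)));
        } catch (NumberFormatException e) {
            // The digit run is too long to fit into an int.
            return OptionalInt.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(extractIndex("C:\\Temp\\myfile_124.txt")); // OptionalInt[124]
        System.out.println(extractIndex("C:\\Temp\\myfile.txt"));     // OptionalInt.empty
    }
}
```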

This may work for you: (?:.*_)(\d+)\.txt
The result is in capture group 1.
This one uses a positive lookahead, so the match itself is only the number: \d+(?=\.txt)
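A rough sketch of the lookahead variant in Java, where group() with no argument returns the whole match. Note that the plain lookahead pattern does not check for the underscore at all; the stricter pattern with a lookbehind is my own addition, not part of the answer, and the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaheadDemo {
    // Pattern from the answer: digits that are followed by ".txt".
    private static final Pattern LOOSE = Pattern.compile("\\d+(?=\\.txt)");
    // Assumed stricter variant: digits must follow '_' and ".txt" must end the string.
    private static final Pattern STRICT = Pattern.compile("(?<=_)\\d+(?=\\.txt$)");

    /** Returns the first match of the pattern, or null if there is none. */
    public static String firstMatch(Pattern p, String s) {
        Matcher m = p.matcher(s);
        return m.find() ? m.group() : null;
    }

    public static String loose(String s)  { return firstMatch(LOOSE, s); }
    public static String strict(String s) { return firstMatch(STRICT, s); }

    public static void main(String[] args) {
        System.out.println(loose("C:\\Temp\\myfile_124.txt"));  // 124
        System.out.println(loose("C:\\Temp\\file2.txt"));       // 2 (no underscore needed)
        System.out.println(strict("C:\\Temp\\file2.txt"));      // null
    }
}
```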

.*_([0-9]+)\.[a-zA-Z0-9]+
The group 1 will contain the desired output.

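You can do this (group 1 holds the text between the underscore and the dot; in Java the group parentheses are written without backslashes):
^.*_([^.]*)\..*$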

.*_([0-9]+)\.txt
This should work too. Of course, in a Java string literal you have to double the backslashes.

How to know which part of regex matched?

regex= (i.*d.*n.*t.*)|(p.*r.*o.*f.*)|(u.*s.*r.*)
string to be matched= profile
Now the regex will match with the string. But I want to know which part matched.
Meaning, I want (p.*r.*o.*f.*) as the output.
How can I do this in Java?
You can check which group matched:
Pattern p = Pattern.compile("(i.*d.*n.*t.*)|(p.*r.*o.*f.*)|(u.*s.*r.*)");
Matcher m = p.matcher("profile");
m.find();
for (int i = 1; i <= m.groupCount(); i++) {
System.out.println(i + ": " + m.group(i));
}
Will output:
1: null
2: profile
3: null
Because the second group is not null, it's (p.*r.*o.*f.*) that matched the string.
In your case, it seems you can also distinguish the subpatterns by their first letter: if the match starts with 'p', it came from your desired pattern. You could write a simple function to tell them apart.
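The group check can be turned into a small function that reports which alternative took part in the match. This is a minimal sketch, and the names AlternativeFinder and matchedAlternative are hypothetical. It relies on the fact that a capture group that did not participate in the match returns null from group(i).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AlternativeFinder {
    private static final Pattern P =
            Pattern.compile("(i.*d.*n.*t.*)|(p.*r.*o.*f.*)|(u.*s.*r.*)");

    /** Returns the 1-based number of the group that matched, or 0 if nothing matched. */
    public static int matchedAlternative(String input) {
        Matcher m = P.matcher(input);
        if (!m.find()) {
            return 0;
        }
        for (int i = 1; i <= m.groupCount(); i++) {
            if (m.group(i) != null) {   // only the participating group is non-null
                return i;
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        System.out.println(matchedAlternative("profile"));  // 2
        System.out.println(matchedAlternative("user"));     // 3
        System.out.println(matchedAlternative("xyz"));      // 0
    }
}
```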

Regular expressions (denominator of a fraction)

Update: I think I have already solved it. What I did:
String pattern = "(<=|>=)\\s{0,2}(([+]\\s{0,2})?(\\d+\\s{0,2}[/]\\s{0,2}(\\d{2,}|[1-9])\\s{0,2}|\\d+[.]\\d{1,2}|\\d+))\\s{0,2}";
The pattern had a mistake in it; I have corrected it above and now it works :)
I have an inequation that may contain >= or <=, some whitespace and a number. That number might be an integer, a decimal number with up to 2 decimal places, or a fraction, and I want to retrieve the number on the right-hand side of the inequation with a Matcher. Example:
4x1 + 6x2 <= 40/3
I tried to construct such a pattern and got it to work. But then I remembered that the denominator of a fraction cannot be zero, so I want to check that as well. For that I used the following code:
String inequation = "4x1 + 6x2 <= 40/3";
String pattern = "(<=|>=)\\s{0,2}(([+]\\s{0,2})?(\\d+\\s{0,2}[/]\\s{0,2}(\\d{2,}|[1-9])\\s{0,2}\\d+|\\d+[.]\\d{1,2}|\\d+))\\s{0,2}";
Pattern ptrn = Pattern.compile(pattern);
Matcher match = ptrn.matcher(inequation);
if(match.find()){
String fraction = match.group(2);
System.out.println(fraction);
} else {
System.out.println("NO MATCH");
}
But it's not working as expected. If it has at least 2 digits on the denominator it returns correctly (e.g. 40/32). But if it only has 1 digit it only returns the integer part (e.g. 40).
Any way to solve this?
Which expression should I use?
Do you just want the number after the inequality sign? Then do:
Matcher m = Pattern.compile("[<>]=?\\s*(.+?)\\s*$").matcher(string);
String number = m.find() ? m.group(1) : null;
You could try using debuggex to build regular expressions. It shows you a nice diagram of your expression and you can test your inputs as well.
Java implementation (validates that the denominator does not start with a zero):
Matcher m = Pattern.compile("[<>]=?\\s{0,2}([0-9]*(/[1-9][0-9]*)?)$").matcher("4x1 + 6x2 <= 40/3");
if (m.find()) {
System.out.println(m.group(1));
}
You need a '$' at the end of your expression, so that the match has to extend to the end of the inequality.
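Another way to handle the zero denominator is to capture numerator and denominator in separate groups and do the zero check in plain Java instead of in the regex. The following is only a sketch under a few assumptions: the class and method names are made up, the pattern is a simplified version of the one in the question, and returning null for "no match" or "zero denominator" is just one possible convention.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RightHandSide {
    // Group 1: operator, group 2: numerator, group 3: denominator,
    // group 4: integer or decimal with one or two decimal places.
    private static final Pattern RHS = Pattern.compile(
            "(<=|>=)\\s{0,2}[+]?\\s{0,2}"
            + "(?:(\\d+)\\s{0,2}/\\s{0,2}(\\d+)|(\\d+(?:\\.\\d{1,2})?))"
            + "\\s{0,2}$");

    /** Returns the right-hand side number as text, or null if invalid. */
    public static String rightHandSide(String inequation) {
        Matcher m = RHS.matcher(inequation);
        if (!m.find()) {
            return null;
        }
        if (m.group(3) != null) {                 // the fraction branch matched
            if (m.group(3).matches("0+")) {       // denominator is zero
                return null;
            }
            return m.group(2) + "/" + m.group(3);
        }
        return m.group(4);                        // integer or decimal
    }

    public static void main(String[] args) {
        System.out.println(rightHandSide("4x1 + 6x2 <= 40/3"));  // 40/3
        System.out.println(rightHandSide("4x1 + 6x2 <= 40/0"));  // null
        System.out.println(rightHandSide("4x1 + 6x2 >= 2.5"));   // 2.5
    }
}
```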

How to Determine if a String starts with exact number of zeros?

How can I know if my string exactly starts with {n} number of leading zeros?
For example below, the conditions would return true but my real intention is to check if the string actually starts with only 2 zeros.
String str = "00063350449370";
if (str.startsWith("00")) { // true
...
}
You can do something like:
if ( str.startsWith("00") && ! str.startsWith("000") ) {
// ..
}
This will make sure that the string starts with "00", but not a longer string of zeros.
You can try this regex (the negative lookahead rejects a third zero but still allows zeros later in the string):
boolean res = s.matches("00(?!0).*");
How about?
final String zeroes = "00";
final int zeroesLength = zeroes.length();
boolean exact = str.startsWith(zeroes) && (str.length() == zeroesLength || str.charAt(zeroesLength) != '0');
Slow but:
if (str.matches("(?s)0{3}([^0].*)?")) {
This uses the (?s) DOTALL option to let . also match line breaks.
0{3} matches exactly three zeros; use 0{2} for two.
How about using a regular expression?
0{n}([^0].*)?
where n is the number of leading '0's you want. You can use the Java regex API to check if the input matches the expression:
Pattern pattern = Pattern.compile("0{2}([^0].*)?"); // n = 2 here
Matcher matcher = pattern.matcher(input);
if (matcher.matches()) {
// code
}
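If n is only known at runtime, the pattern can be built from it. Here is a minimal sketch; the names LeadingZeros and startsWithExactlyZeros are hypothetical, and I use a negative lookahead (?!0) so that zeros later in the string are still accepted.

```java
import java.util.regex.Pattern;

class LeadingZeros {
    /** True if s starts with exactly n zeros, i.e. the next character (if any) is not '0'. */
    public static boolean startsWithExactlyZeros(String s, int n) {
        // 0{n} consumes n zeros, (?!0) forbids one more, .* accepts the rest.
        Pattern p = Pattern.compile("0{" + n + "}(?!0).*", Pattern.DOTALL);
        return p.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(startsWithExactlyZeros("00063350449370", 2)); // false
        System.out.println(startsWithExactlyZeros("00063350449370", 3)); // true
        System.out.println(startsWithExactlyZeros("0012300", 2));        // true
    }
}
```

Compiling the pattern on every call is fine for a sketch; if this runs in a hot loop you would probably want to cache the compiled Pattern per n.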
You can use a regular expression to evaluate the String value:
String str = "00063350449370";
String pattern = "[0]{2}[1-9]{1}[0-9]*"; // [0]{2}[1-9]{1} starts with 2 zeros, followed by a non-zero value, and maybe some other numbers: [0-9]*
if (Pattern.matches(pattern, str))
{
// DO SOMETHING
}
There might be a better regular expression to resolve this, but this should give you a general idea how to proceed if you choose the regular expression path.
The long way
String TestString = "0000123";
Pattern p = Pattern.compile("\\A0+(?=\\d)");
Matcher matcher = p.matcher(TestString);
while (matcher.find()) {
System.out.print("Start index: " + matcher.start());
System.out.print(" End index: " + matcher.end() + " ");
System.out.println(" Group: " + matcher.group());
}
You're probably better off with a small for loop, though:
int leadZeroes;
for (leadZeroes=0; leadZeroes<TestString.length(); leadZeroes++)
if (TestString.charAt(leadZeroes) != '0')
break;
System.out.println("Count of Leading Zeroes: " + leadZeroes);
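To answer the original question with this loop, compare the count with the number you expect. A small sketch without any regex (the class and method names are made up for illustration):

```java
class LeadingZeroCount {
    /** Counts how many '0' characters the string starts with. */
    public static int countLeadingZeros(String s) {
        int i = 0;
        while (i < s.length() && s.charAt(i) == '0') {
            i++;
        }
        return i;
    }

    /** True if the string starts with exactly n zeros. */
    public static boolean hasExactlyLeadingZeros(String s, int n) {
        return countLeadingZeros(s) == n;
    }

    public static void main(String[] args) {
        System.out.println(countLeadingZeros("0000123"));                  // 4
        System.out.println(hasExactlyLeadingZeros("00063350449370", 2));  // false
        System.out.println(hasExactlyLeadingZeros("00063350449370", 3));  // true
    }
}
```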

java Matcher need to know if subsequence equals whole sequence

I need to figure out a way to determine when the thing being matched isn't a subsequence but the whole sequence, e.g. "this", not "is".
while (in.hasNextLine()) {
count++;
String patternInLine = in.nextLine().toString();
m = p.matcher(patternInLine);
if (m.find() && searchPattern.equals(m.group())) {
System.out.println("matches group: " + m.group());
m.toString();
System.out.println(patternInLine);
foundLinePattern.get(file).add(patternInLine);
}
}
in.close();
}
Use m.matches() instead of m.find().
find looks for any substring that matches your regex.
matches tries to match only the whole string against the regex. It will not look for substrings.
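A quick way to see the difference, using the "this" / "is" example from the question. This is just an illustrative sketch with made-up method names:

```java
import java.util.regex.Pattern;

class FindVsMatches {
    /** True if the regex matches somewhere inside the line. */
    public static boolean occursIn(String regex, String line) {
        return Pattern.compile(regex).matcher(line).find();
    }

    /** True only if the regex matches the line from start to end. */
    public static boolean isWholeLine(String regex, String line) {
        return Pattern.compile(regex).matcher(line).matches();
    }

    public static void main(String[] args) {
        System.out.println(occursIn("is", "this"));      // true
        System.out.println(isWholeLine("is", "this"));   // false
        System.out.println(isWholeLine("this", "this")); // true
    }
}
```

If the search text is meant literally rather than as a regex, wrapping it in Pattern.quote(...) before compiling avoids surprises with special characters.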

Discard the leading and trailing series of a character, but retain the same character otherwise

I have to process a string with the following rules:
It may or may not start with a series of '.
It may or may not end with a series of '.
Whatever is enclosed between the above should be extracted. However, the enclosed string also may or may not contain a series of '.
For example, I can get following strings as input:
''''aa''''
''''aa
aa''''
''''aa''bb''cc''''
For the above examples, I would like to extract the following from them (respectively):
aa
aa
aa
aa''bb''cc
I tried the following code in Java:
Pattern p = Pattern.compile("[^']+(.+'*.+)[^']*");
Matcher m = p.matcher("''''aa''bb''cc''''");
while (m.find()) {
    int count = m.groupCount();
    System.out.println("count = " + count);
    for (int i = 0; i <= count; i++) {
        System.out.println("-> " + m.group(i));
    }
}
But I get the following output:
count = 1
-> aa''bb''cc''''
-> ''bb''cc''''
Any pointers?
EDIT: Never mind, I was using a * at the end of my regex, instead of +. Doing this change gives me the desired output. But I would still welcome any improvements for the regex.
This one works for me.
String str = "''''aa''bb''cc''''";
Pattern p = Pattern.compile("^'*(.*?)'*$");
Matcher m = p.matcher(str);
if (m.find()) {
System.out.println(m.group(1));
}
Have a look at the boundary matchers of Java's Pattern class (http://docs.oracle.com/javase/6/docs/api/java/util/regex/Pattern.html). $ (end of a line) in particular might be interesting. I also recommend the following Eclipse plugin for regex testing: http://sourceforge.net/projects/quickrex/. It lets you see exactly what the match and the groups of your regex will be for a given test string.
E.g. try the following pattern: [^']+(.+'*.+)+[^'$]
I'm not that good at Java, so I hope the regex is sufficient. It works well for your examples.
s/^'*(.+?)'*$/$1/gm
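In Java, a similar effect can be had with String.replaceAll by deleting the quote runs at both ends instead of capturing the middle. This is a sketch of that alternative, not a translation of the substitution above, and the class and method names are hypothetical:

```java
class QuoteTrimmer {
    /** Removes any run of ' at the start and at the end, keeping quotes in the middle. */
    public static String stripOuterQuotes(String s) {
        // ^'+ matches a leading run of quotes, '+$ matches a trailing run.
        return s.replaceAll("^'+|'+$", "");
    }

    public static void main(String[] args) {
        System.out.println(stripOuterQuotes("''''aa''''"));          // aa
        System.out.println(stripOuterQuotes("''''aa"));              // aa
        System.out.println(stripOuterQuotes("aa''''"));              // aa
        System.out.println(stripOuterQuotes("''''aa''bb''cc''''"));  // aa''bb''cc
    }
}
```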
