Extract substring from end till first alphabet in java - java

I have a string of format: A-2-Q4567
More examples: AB-456-T12, A24-5-M12345, etc.
I want to extract the last numerical values out of these strings, which are: 4567, 12, 12345 respectively (which is the numerical value of the substring from the end till first non-numeric character is encountered)
I can split the string, take the last element of the resulting array, and then call parseInt after removing the non-numerical characters from it.
But is there a more elegant way of doing this?

You can use this regex: (\d+$). It returns the last sequence of digits in the string.
EDIT - some explanation:
The \d means any digit.
The + means one or more of the previous symbols. Since the previous symbol is a digit, then \d+ means "one or more digits".
The $ means the end of the string, so \d+$ is the last sequence of digits in the string.

You can do this:
String getLastNumeric(String input)
{
    String str="";
    char c;
    for(int i=input.length()-1;i>=0 && Character.isDigit(c=input.charAt(i));i--)
        str=c+str;
    return str;
}
The regex solutions might be more elegant, but performance-wise I think the above is the best, because a regex match can be more expensive than a simple for loop with a simple condition to evaluate.
Of course, the regex is more flexible. What if your requirements change and a dash "-" must now precede the numbers? With a regex it should just be a matter of changing one expression.
I put the regex version here, but remember: if you're sure your requirements won't change, I think the above solution is easier on the CPU:
Matcher matcher= Pattern.compile("(\\d+$)").matcher(input);
if(matcher.find())
return matcher.group();
return "";
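For comparison, here is a self-contained sketch that puts the two approaches from the answers above side by side. The class and method names (LastNumber, lastDigitsRegex, lastDigitsLoop) are made up for illustration, and the loop variant uses substring instead of repeated string concatenation, which is a small variation on the answer's code rather than the original.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LastNumber {
    // Compiled once; matches the run of digits at the very end of the input.
    private static final Pattern TRAILING_DIGITS = Pattern.compile("\\d+$");

    // Regex variant: returns the trailing digits, or "" if there are none.
    static String lastDigitsRegex(String input) {
        Matcher m = TRAILING_DIGITS.matcher(input);
        return m.find() ? m.group() : "";
    }

    // Loop variant: walk backwards while the characters are digits.
    static String lastDigitsLoop(String input) {
        int start = input.length();
        while (start > 0 && Character.isDigit(input.charAt(start - 1))) {
            start--;
        }
        return input.substring(start);
    }

    public static void main(String[] args) {
        for (String s : new String[] {"A-2-Q4567", "AB-456-T12", "A24-5-M12345"}) {
            System.out.println(lastDigitsRegex(s) + " " + lastDigitsLoop(s));
        }
    }
}
```

One difference to keep in mind: Character.isDigit accepts any Unicode digit, while \d matches only 0-9 unless the pattern is compiled with UNICODE_CHARACTER_CLASS. For the plain ASCII inputs in the question both variants agree.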

Related

String split method returning first element as empty using regex

I'm trying to get the digits from the expression [1..1], using Java's split method. I'm using the regex ^\\[|\\.{2}|\\]$ inside split. But the split method returns a String array whose first value is empty, followed by "1" at index 1 and index 2. Could anyone please tell me what I'm doing wrong in this regex, so that I only get the digits in the String array returned by split?
You should use matching. Change your expression to:
`^\[(.*?)\.\.(.*)\]$`
And get your results from the two captured groups.
As for why split acts this way, it's simple: you asked it to split on the [ character, but there's still an "empty string" between the start of the string and the first [ character.
Your regex is matching [ and .. and ]. Thus it will split at these occurrences.
You should not use a split but match each number in your string using regex.
You've set it up such that [, ] and .. are delimiters. Split will return an empty first index because the first character in your string [1..1] is a delimiter. I would strip delimiters from the front and end of your string, as suggested here.
So, something like
input.replaceFirst("^\\[", "").split("^\\[|\\.{2}|\\]$");
Or, use regex and regex groups (such as the other answers in this question) more directly rather than through split.
Why not use a regex to capture the numbers? This will be more effective and less error-prone. In that case the regex looks like:
^\[(\d+)\.{2}(\d+)\]$
And you can capture them with:
Pattern pattern = Pattern.compile("^\\[(\\d+)\\.{2}(\\d+)\\]$");
Matcher matcher = pattern.matcher(text);
if(matcher.find()) { //we've found a match
int range_from = Integer.parseInt(matcher.group(1));
int range_to = Integer.parseInt(matcher.group(2));
}
with range_from and range_to being the integers you can now work with.
The advantage is that the pattern will fail on strings that do not make much sense, like ..3[4, etc.
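A complete version of that idea might look like the sketch below. The class name RangeParser and the choice to return null for malformed input are assumptions made for the example, not part of the answers above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RangeParser {
    // Two captured groups: the number before ".." and the number after it.
    private static final Pattern RANGE = Pattern.compile("^\\[(\\d+)\\.{2}(\\d+)\\]$");

    // Returns {from, to}, or null when the text is not of the form [n..m].
    static int[] parseRange(String text) {
        Matcher matcher = RANGE.matcher(text);
        if (!matcher.matches()) {
            return null;
        }
        return new int[] {
            Integer.parseInt(matcher.group(1)),
            Integer.parseInt(matcher.group(2))
        };
    }

    public static void main(String[] args) {
        int[] range = parseRange("[1..1]");
        System.out.println(range[0] + " to " + range[1]); // 1 to 1
        System.out.println(parseRange("..3[4") == null);  // true
    }
}
```

Because the pattern is anchored with ^ and $, matches() and find() behave the same here. Very long digit runs would still make Integer.parseInt throw, so real code may want to bound the group lengths.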

Java-flavoured Regex : Match whole string if group is n chars

I'm trying to create a Regex for a String validator. My String must be exactly 8 characters long, and begin with a letter (lowercase or uppercase) or a number. It can only contain letters (lowercase and uppercase), numbers or whitespaces right after that first character. If a whitespace is found, there can only be whitespaces after it.
For now, I have the match group for the second part : [a-zA-Z0-9]{1,}\s*
I can't find a way to specify that this group is matched only if it has exactly 8 characters. I tried ^([a-zA-Z0-9]{1,}\s*){8}$ but this is not the expected result.
Here are some test cases (with trailing whitespaces).
Valid :
9013
20130
89B
A5000000
Invalid :
9013
20130
90 90
123456789
There probably is a smart regex way to do it but you could also first check the length of the string:
input.length() == 8 && input.matches("[a-zA-Z0-9]+\\s*")
This is also probably more efficient than a complex regex.
You can use this lookahead based regex:
^[a-zA-Z0-9](?!.* [a-zA-Z0-9])[a-zA-Z0-9 ]{7}$
^[a-zA-Z0-9] matches an alpha-num char at start
(?!.* [a-zA-Z0-9]) is a negative lookahead to make sure that there is no instance of a space followed by an alpha-num char.
[a-zA-Z0-9 ]{7}$ matches 7 chars containing alpha-num char or space.
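To check the two suggestions against the test cases, here is a small sketch. The class and method names are hypothetical, and the padded strings in main are an assumption about what the question's valid cases look like once the trailing whitespace (not visible in the post) is restored to a total length of 8.

```java
public class EightCharValidator {
    // Lookahead variant: after the first char, no space may be followed by an alpha-num char.
    static boolean isValidLookahead(String input) {
        return input.matches("^[a-zA-Z0-9](?!.* [a-zA-Z0-9])[a-zA-Z0-9 ]{7}$");
    }

    // Length-check variant: exactly 8 chars, alpha-num chars first, whitespace only at the end.
    static boolean isValidLengthCheck(String input) {
        return input.length() == 8 && input.matches("[a-zA-Z0-9]+\\s*");
    }

    public static void main(String[] args) {
        String[] samples = {"9013    ", "A5000000", "9013", "90 90   ", "123456789"};
        for (String s : samples) {
            System.out.println("'" + s + "' " + isValidLookahead(s) + " " + isValidLengthCheck(s));
        }
    }
}
```

The two variants agree on these inputs. They differ on other whitespace: \s also accepts tabs and newlines, while the lookahead version only allows the space character.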

Java split by alphabetic char creates an empty value in array

I want to split my string on every occurrence of an alphabetic character.
for example:
"s1l1e13" to an array of: ["s1","l1","e13"]
when trying to use this simple split by regex i get some weird results:
testStr = "s1l1e13"
Arrays.toString(testStr.split("(?=[a-z])"))
gives me the array of:
["","s1","l1","e13"]
how can i create the split without the empty array element?
I tried a couple more things:
testStr = "s1"
Arrays.toString(testStr.split("(?=[a-z])"))
does return the correct array: ["s1"]
but when trying to use substring
testStr = "s1l1e13"
Arrays.toString(testStr.substring(1).split("(?=[a-z])"))
i get in return ["1","l1","e13"]
what am i missing?
Your Lookahead marks each position before any character of a to z; marking the following positions:
s1 l1 e13
^ ^ ^
So by splitting using just the Lookahead, it returns ["", "s1", "l1", "e13"]
You can use a Negative Lookbehind here. This looks behind to see if there is not the beginning of the string.
String s = "s1l1e13";
String[] parts = s.split("(?<!\\A)(?=[a-z])");
System.out.println(Arrays.toString(parts)); //=> [s1, l1, e13]
Your problem is that (?=[a-z]) means "place before [a-z]" and in your text
s1l1e13
you have 3 such places. I will mark them with |
|s1|l1|e13
so split (unfortunately, correctly) produces "", "s1", "l1", "e13" and doesn't automatically remove the leading empty element for you.
To solve this problem you have at least two options:
make sure that there is something before the place you want to split on (so that it is not at the start of your string). For instance, you can use (?<=\\d)(?=[a-z]) if you want to split after a digit but before a letter
(PREFERRED SOLUTION) start using Java 8, which automatically removes the empty string at the start of the result array if the regex used in split matches with zero length (look-arounds are zero-length).
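The first option can be tried out with a few lines. This is only a sketch, and the class and method names are invented for the example.

```java
import java.util.Arrays;

public class SplitAfterDigit {
    // Split only where a digit is immediately followed by a lowercase letter,
    // so the position at the start of the string is never a split point.
    static String[] splitTokens(String input) {
        return input.split("(?<=\\d)(?=[a-z])");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitTokens("s1l1e13"))); // [s1, l1, e13]
        System.out.println(Arrays.toString(splitTokens("s1")));      // [s1]
    }
}
```

Since the lookbehind requires a digit before the split point, the result does not depend on how a given Java version treats a zero-width match at index 0.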
The first match finds "" to be okay because it's looking ahead for any alpha character, which is called a zero-width lookahead, so it doesn't need to actually consume anything. The "s" at the beginning is a letter, so the lookahead matches at the position just before it.
If you want the regex to match something always, use ".+(?=[a-z])"
The problem is that the initial "s" counts as an alphabetic character. So, the regex is trying to split at s.
The issue is that there is nothing before the s, so the regex machine instead decides to show that there is nothing by adding the null element. It'll do the same thing at the end if you ended with "s" (or any other letter).
If this is the only string you're splitting, or if every array you had starts with a letter but does not end with one, just truncate the array to omit the first element. Otherwise, you'll probably need to loop through each array as you make it so that you can drop empty elements.
So it seems your matches have the pattern x###, where x is a letter and # is a digit.
I'd make the following Regex:
([a-z][0-9]+)
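Used with Matcher.find() instead of split, that pattern collects the tokens directly. A minimal sketch, with a made-up class name and helper:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TokenFinder {
    private static final Pattern TOKEN = Pattern.compile("[a-z][0-9]+");

    // Collect every "letter followed by digits" token, in order of appearance.
    static List<String> tokens(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(tokens("s1l1e13")); // [s1, l1, e13]
    }
}
```

Unlike split, this silently skips anything that does not fit the pattern, so it never yields empty elements.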

Java regex split double from string

I am having a problem splitting something like the following string:
43.80USD
What I want is to be able to split the expression into an array that has "43.80" as the first element and "USD" as the second. So the result would be something like:
["43.80", "USD"]
I am sure there is some way to do this with regex, but I am not proficient enough with it to figure it out on my own. Any help would be much appreciated.
If the format of your string is fixed you can split it as follows
String[] currency = "48.50USD".split("(?<=\\d)(?=[a-zA-Z])");
System.out.println("Amount='"+currency[0]+"'; Denomination='"+currency[1]+"'");
// prints: Amount='48.50'; Denomination='USD'
The regex above uses a positive look-behind (?<=) and a positive lookahead (?=) to find a separator (which is of zero-length here) that's preceded with a number and followed by a letter.
If your data really looks like "43.80USD" then you can use
"43.80USD".split("(?i)(?=[a-z])",2)
(?=[a-z]) will split before any of a-z characters
(?i) will make used regex case-insensitive so it will also work for uSd
the second argument is the max size of the result array; since you don't want ["43.80", "U", "S", "D"] but ["43.80", "USD"], we need to use 2.
This regex works: (\d*\.\d*)([a-zA-Z]*). Group 1 will be the amount, including the decimal point. Group 2 will be the USD or other monetary name. Note that this regex only requires a decimal point; everything else is optional. So this also matches "45123.15542ABCDEFG": group 1 will be 45123.15542 and group 2 will be ABCDEFG. If you want stricter requirements, tell me what they are and I'll put them in. Otherwise your code will look something like:
Pattern p = Pattern.compile("(\\d*\\.\\d*)([a-zA-Z]*)");//Note the double \\ to escape twice.
Matcher m = p.matcher("43.80USD");
String amount, type;
if(m.matches()){
amount = m.group(1);
type = m.group(2);
}
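Wrapped up as a runnable class, the matcher approach could look like this sketch. CurrencyParser and parse are names chosen for the example, and returning null on a failed match is an assumption about how you might want to signal bad input.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CurrencyParser {
    // Group 1: the amount (a decimal point is required); group 2: the letters after it.
    private static final Pattern AMOUNT = Pattern.compile("(\\d*\\.\\d*)([a-zA-Z]*)");

    // Returns {amount, denomination}, or null if the whole input does not match.
    static String[] parse(String input) {
        Matcher m = AMOUNT.matcher(input);
        if (!m.matches()) {
            return null;
        }
        return new String[] {m.group(1), m.group(2)};
    }

    public static void main(String[] args) {
        String[] parts = parse("43.80USD");
        System.out.println(parts[0] + " / " + parts[1]); // 43.80 / USD
    }
}
```

As the answer points out, the pattern is loose: an input such as "." also matches, with both groups effectively empty of digits and letters.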

Removing every other character in a string using Java regex

I have this homework problem where I need to use regex to remove every other character in a string.
In one part, I have to delete characters at index 1,3,5,... I have done this as follows:
String s = "1a2b3c4d5";
System.out.println(s.replaceAll("(.).", "$1"));
This prints 12345 which is what I want. Essentially I match two characters at a time, and replacing with the first character. I used group capturing to do this.
The problem is, I'm having trouble with the second part of the homework, where I need to delete characters at index 0,2,4,...
I have done the following:
String s = "1a2b3c4d5";
System.out.println(s.replaceAll(".(.)", "$1"));
This prints abcd5, but the correct answer must be abcd. My regex is only incorrect if the input string length is odd. If it's even, then my regex works fine.
I think I'm really close to the answer, but I'm not sure how to fix it.
You are indeed very close to the answer: just make matching the second char optional.
String s = "1a2b3c4d5";
System.out.println(s.replaceAll(".(.)?", "$1"));
// prints "abcd"
This works because:
Regex is greedy by default, it will take the second character if it's there
When the input is of odd length, the second char won't be there at the last replacement, but you'd still match one char (i.e. last char in input)
You can still use backreferences in substitution even if the group fails to match
It will substitute in the empty string, not "null"
This is different from Matcher.group(int), which returns null for failed groups
References
regular-expressions.info/Optional
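The difference between the replacement string and Matcher.group(int) for a group that did not take part in the match can be seen in a short sketch. The class and method names are made up for the demonstration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OptionalGroupDemo {
    // Removes the chars at index 0,2,4,... by keeping only the optional second char.
    static String dropEvenIndexed(String s) {
        return s.replaceAll(".(.)?", "$1");
    }

    // Returns group 1 of the last match; null if that group did not participate.
    static String lastGroupOne(String s) {
        Matcher m = Pattern.compile(".(.)?").matcher(s);
        String last = "no match";
        while (m.find()) {
            last = m.group(1);
        }
        return last;
    }

    public static void main(String[] args) {
        System.out.println(dropEvenIndexed("1a2b3c4d5")); // abcd
        System.out.println(lastGroupOne("1a2b3c4d5"));    // null
        System.out.println(lastGroupOne("1a2b"));         // b
    }
}
```

For the odd-length input the final match consumes only "5", so $1 contributes nothing to the replacement while group(1) reports null.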
A closer look at the first part
Let's take a closer look at the first part of the homework:
String s = "1a2b3c4d5";
System.out.println(s.replaceAll("(.).", "$1"));
// prints "12345"
Here you didn't have to use ? for the second char, but it "works" because even though you didn't match the last char, you didn't have to! The last char can remain unmatched, unreplaced, due to the problem specification.
Now suppose that we want to delete chars at index 1,3,5..., and put the chars at index 0,2,4... in brackets.
String s = "1a2b3c4d5";
System.out.println(s.replaceAll("(.).", "($1)"));
// prints "(1)(2)(3)(4)5"
A-ha!! Now you're experiencing the exact same problem with odd-length input! You couldn't match the last char with your regex, because your regex needs two chars, but there's only one char at the end for odd-length input!
The solution, again, is to make matching the second char optional:
String s = "1a2b3c4d5";
System.out.println(s.replaceAll("(.).?", "($1)"));
// prints "(1)(2)(3)(4)(5)"
my regex is only incorrect if the input string length is odd. if it's even, then my regex works fine.
Change your expression to .(.)? - the question mark makes the second character optional, which means it doesn't matter whether the input length is odd or even.
Your regex needs 2 chars to match, so fails on the final char.
This regex:
".(.{0,1})"
Will make the second char optional, so it will match with your final '5' as well
