Java check for int range in String - java

I am trying to check if a range of numbers exist in a string
Any more elegant way than this?
if (ccnumeric.contains("51")
|| ccnumeric.contains("52")
|| ccnumeric.contains("53")
|| ccnumeric.contains("54")
|| ccnumeric.contains("55"))
I can't think of any method that satisfies this as I am checking for an int range in a string.

You could use a simple regex: 5[12345]{1} to look for a 5 followed by exactly one 1,2,3,4 or 5:
String s = "55";
Pattern p = Pattern.compile("5[12345]{1}");
if (p.matcher(s).find()) {
System.out.println("Found");
} else {
System.out.println("Not Found");
}

Maybe you can try regular expressions.
For instance
Pattern.matches( ".*5[1-5].*", ccnumeric );
Edit:
To find all numbers in a string:
List<Integer> allMatches = new ArrayList<Integer>();
Matcher m = Pattern.compile("(\\d+)").matcher(ccnumeric);
while (m.find()) {
allMatches.add(Integer.parse(m.group()));
}

Your question is a bit vague, for instance how many numbers are in the string "541", 1(541), 3(5,4,1), 6(5,4,1,54,41,541)? Depending on how you define that your regexes will change considerably. If your answer is 1, the solution is pretty straightforward, a greedy search for all numbers whose length is the same as your min and less than or equal to your max(you can then parse the matches and filter based on the min and max)...If its one of the others then your solution is going to be a bit more complicated, you are basically going to have to create a sliding window to evaluate all the possible combinations in the string.

I think your question is phrased ambigously... Are you looking to check if the input strings contains a number within the range 51-55? (Your written text seems to imply this) In this case, the string "512 is a power of two" would not be a match, since it does not contain the number 51.
Or are you looking to check that the string contains the character sequence 51, 52, 53, 54 or 55? (Your code seems to imply this) In this case, the string "512 is a power of two" would be a match, since it contains the character sequence 51.
If the first option is the case, then you'd need something like:
Pattern p = Pattern.compile("(^|[^\\d])5[1-5]($|[^\\d])");
if (p.matcher(inputString).find()) {
//The inputString contains the number you seek
}
Essentially, what this is doing is looking for
a character sequence 51-55
preceeded by either the beginning of the string, or by a non-digit character
followed by either the end of the string, or by a non digit character

Related

How to add a space after certain characters using regex Java

I have a string consisting of 18 digits Eg. 'abcdefghijklmnopqr'. I need to add a blank space after 5th character and then after 9th character and after 15th character making it look like 'abcde fghi jklmno pqr'. Can I achieve this using regular expression?
As regular expressions are not my cup of tea hence need help from regex gurus out here. Any help is appreciated.
Thanks in advance
Regex finds a match in a string and can't preform a replacement. You could however use regex to find a certain matching substring and replace that, but you would still need a separate method for replacement (making it a two step algorithm).
Since you're not looking for a pattern in your string, but rather just the n-th char, regex wouldn't be of much use, it would make it unnecessary complex.
Here are some ideas on how you could implement a solution:
Use an array of characters to avoid creating redundant strings: create a character array and copy characters from the string before
the given position, put the character at the position, copy the rest
of the characters from the String,... continue until you reach the end
of the string. After that construct the final string from that
array.
Use Substring() method: concatenate substring of the string before
the position, new character, substring of the string after the
position and before the next position,... and so on, until reaching the end of the original string.
Use a StringBuilder and its insert() method.
Note that:
First idea listed might not be a suitable solution for very large strings. It needs an auxiliary array, using additional space.
Second idea creates redundant strings. Strings are immutable and final in Java, and are stored in a pool. Creating
temporary strings should be avoided.
Yes you can use regex groups to achieve that. Something like that:
final Pattern pattern = Pattern.compile("([a-z]{5})([a-z]{4})([a-z]{6})([a-z]{3})");
final Matcher matcher = pattern.matcher("abcdefghijklmnopqr");
if (matcher.matches()) {
String first = matcher.group(0);
String second = matcher.group(1);
String third = matcher.group(2);
String fourth = matcher.group(3);
return first + " " + second + " " + third + " " + fourth;
} else {
throw new SomeException();
}
Note that pattern should be a constant, I used a local variable here to make it easier to read.
Compared to substrings, which would also work to achieve the desired result, regex also allow you to validate the format of your input data. In the provided example you check that it's a 18 characters long string composed of only lowercase letters.
If you had a more interesting examples, with for example a mix of letters and digits, you could check that each group contains the correct type of data with the regex.
You can also do a simpler version where you just replace with:
"abcdefghijklmnopqr".replaceAll("([a-z]{5})([a-z]{4})([a-z]{6})([a-z]{3})", "$1 $2 $3 $4")
But you don't have the benefit of checking because if the string doesn't match the format it will just not replaced and this is less efficient than substrings.
Here is an example solution using substrings which would be more efficient if you don't care about checking:
final Set<Integer> breaks = Set.of(5, 9, 15);
final String str = "abcdefghijklmnopqr";
final StringBuilder stringBuilder = new StringBuilder();
for (int i = 0; i < str.length(); i++) {
if (breaks.contains(i)) {
stringBuilder.append(' ');
}
stringBuilder.append(str.charAt(i));
}
return stringBuilder.toString();

String MUST contain a hexadecimal value - Regex for this? [duplicate]

I have never done regex before, and I have seen they are very useful for working with strings. I saw a few tutorials (for example) but I still cannot understand how to make a simple Java regex check for hexadecimal characters in a string.
The user will input in the text box something like: 0123456789ABCDEF and I would like to know that the input was correct otherwise if something like XTYSPG456789ABCDEF when return false.
Is it possible to do that with a regex or did I misunderstand how they work?
Yes, you can do that with a regular expression:
^[0-9A-F]+$
Explanation:
^ Start of line.
[0-9A-F] Character class: Any character in 0 to 9, or in A to F.
+ Quantifier: One or more of the above.
$ End of line.
To use this regular expression in Java you can for example call the matches method on a String:
boolean isHex = s.matches("[0-9A-F]+");
Note that matches finds only an exact match so you don't need the start and end of line anchors in this case. See it working online: ideone
You may also want to allow both upper and lowercase A-F, in which case you can use this regular expression:
^[0-9A-Fa-f]+$
May be you want to use the POSIX character class \p{XDigit}, so:
^\p{XDigit}+$
Additionally, if you plan to use the regular expression very often, it is recommended to use a constant in order to avoid recompile it each time, e.g.:
private static final Pattern REGEX_PATTERN =
Pattern.compile("^\\p{XDigit}+$");
public static void main(String[] args) {
String input = "0123456789ABCDEF";
System.out.println(
REGEX_PATTERN.matcher(input).matches()
); // prints "true"
}
Actually, the given answer is not totally correct. The problem arises because the numbers 0-9 are also decimal values. PART of what you have to do is to test for 00-99 instead of just 0-9 to ensure that the lower values are not decimal numbers. Like so:
^([0-9A-Fa-f]{2})+$
To say these have to come in pairs! Otherwise - the string is something else! :-)
Example:
(Pick one)
var a = "1e5";
var a = "10";
var a = "314159265";
If I used the accepted answer in a regular expression it would return TRUE.
var re1 = new RegExp( /^[0-9A-Fa-f]+$/ );
var re2 = new RegExp( /^([0-9A-Fa-f]{2})+$/ );
if( re1.test(a) ){ alert("#1 = This is a hex value!"); }
if( re2.test(a) ){ alert("#2 = This IS a hex string!"); }
else { alert("#2 = This is NOT a hex string!"); }
Note that the "10" returns TRUE in both cases. If an incoming string only has 0-9 you can NOT tell, easily if it is a hex value or a decimal value UNLESS there is a missing zero in front of off length strings (hex values always come in pairs - ie - Low byte/high byte). But values like "34" are both perfectly valid decimal OR hexadecimal numbers. They just mean two different things.
Also note that "3.14159265" is not a hex value no matter which test you do because of the period. But with the addition of the "{2}" you at least ensure it really is a hex string rather than something that LOOKS like a hex string.

Java split a CSV ignoring HTML characteres

I need to split a string by semicolon ignoring the semicolons that may come as HTML characters.
For instance, given the string:
id=com.google.android;keywords=Android;Operating System;Phone;versions=Gingerbread;ICS;JB
I need to split it into:
id = com.google.android
keywords=Android;Operating System;Phone
versions=Gingerbread;ICS;JB
any ideia how to do this?
A regex like (?<!&#?[0-9a-zA-Z]+); would probably do it. This would prevent matching a semicolon that terminates an entity reference or character reference, though it also catches a few cases that are not technically either by the specs (e.g. it wouldn't match the semicolon at the end of &#foo; or &123;).
(?<!...) is a "negative lookbehind", so you can read this regex as matching a semicolon that is not preceded by a substring that matches &#?[0-9a-zA-Z]+ (i.e. ampersand, optional hash, and one or more alphanumerics). However lookbehinds must have an upper bound on the number of characters they can match, which + doesn't, so you'll have to use a bounded repetition count, like {1,5} rather than the unbounded +. The upper bound needs to be at least as long as the longest entity reference you might see, and if your data might contain arbitrary entity references then you'll have to use something like the length of the string as the upper bound.
String[] keyValuePairs = theString.split(
"(?<!&#?[0-9a-zA-Z]{1," + theString.length() + "});");
If you can specify a smaller bound then that would probably be more efficient.
Edit: Android apparently doesn't like this lookbehind, even with bounded repetition, so you probably won't be able to use a single regex with String.split to do what you're after, you'll have to do the looping yourself, e.g.
Pattern p = Pattern.compile("(?:&#?[0-9a-zA-Z]+)?;");
Matcher m = p.matcher(theString);
List<String> splits = new ArrayList<String>();
int lastEltStart = 0;
while(m.find()) {
if(m.end() - m.start() > 1) {
// this match was an entity/character reference so don't split here
continue;
}
if(m.start() > lastEltStart) {
// non-empty part
splits.add(theString.substring(lastEltStart, m.start()));
}
lastEltStart = m.end();
}
if(lastEltStart < theString.length()) {
// non-empty final part
splits.add(theString.substring(lastEltStart));
}
Since the HTML entites have only two or three numbers between the '&#' and ';' I used the following regex:
(?<!&#\d{2,3});

Regex to get first number in string with other characters

I'm new to regular expressions, and was wondering how I could get only the first number in a string like 100 2011-10-20 14:28:55. In this case, I'd want it to return 100, but the number could also be shorter or longer.
I was thinking about something like [0-9]+, but it takes every single number separately (100,2001,10,...)
Thank you.
/^[^\d]*(\d+)/
This will start at the beginning, skip any non-digits, and match the first sequence of digits it finds
EDIT:
this Regex will match the first group of numbers, but, as pointed out in other answers, parseInt is a better solution if you know the number is at the beginning of the string
Try this to match for first number in string (which can be not at the beginning of the string):
String s = "2011-10-20 525 14:28:55 10";
Pattern p = Pattern.compile("(^|\\s)([0-9]+)($|\\s)");
Matcher m = p.matcher(s);
if (m.find()) {
System.out.println(m.group(2));
}
Just
([0-9]+) .*
If you always have the space after the first number, this will work
Assuming there's always a space between the first two numbers, then
preg_match('/^(\d+)/', $number_string, $matches);
$number = $matches[1]; // 100
But for something like this, you'd be better off using simple string operations:
$space_pos = strpos($number_string, ' ');
$number = substr($number_string, 0, $space_pos);
Regexs are computationally expensive, and should be avoided if possible.
the below code would do the trick.
Integer num = Integer.parseInt("100 2011-10-20 14:28:55");
[0-9] means the numbers 0-9 can be used the + means 1 or more times. if you use [0-9]{3} will get you 3 numbers
Try ^(?'num'[0-9]+).*$ which forces it to start at the beginning, read a number, store it to 'num' and consume the remainder without binding.
This string extension works perfectly, even when string not starts with number.
return 1234 in each case - "1234asdfwewf", "%sdfsr1234" "## # 1234"
public static string GetFirstNumber(this string source)
{
if (string.IsNullOrEmpty(source) == false)
{
// take non digits from string start
string notNumber = new string(source.TakeWhile(c => Char.IsDigit(c) == false).ToArray());
if (string.IsNullOrEmpty(notNumber) == false)
{
//replace non digit chars from string start
source = source.Replace(notNumber, string.Empty);
}
//take digits from string start
source = new string(source.TakeWhile(char.IsDigit).ToArray());
}
return source;
}
NOTE: In Java, when you define the patterns as string literals, do not forget to use double backslashes to define a regex escaping backslash (\. = "\\.").
To get the number that appears at the start or beginning of a string you may consider using
^[0-9]*\.?[0-9]+ # Float or integer, leading digit may be missing (e.g, .35)
^-?[0-9]*\.?[0-9]+ # Optional - before number (e.g. -.55, -100)
^[-+]?[0-9]*\.?[0-9]+ # Optional + or - before number (e.g. -3.5, +30)
See this regex demo.
If you want to also match numbers with scientific notation at the start of the string, use
^[0-9]*\.?[0-9]+([eE][+-]?[0-9]+)? # Just number
^-?[0-9]*\.?[0-9]+([eE][+-]?[0-9]+)? # Number with an optional -
^[-+]?[0-9]*\.?[0-9]+([eE][+-]?[0-9]+)? # Number with an optional - or +
See this regex demo.
To make sure there is no other digit on the right, add a \b word boundary, or a (?!\d)
or (?!\.?\d) negative lookahead that will fail the match if there is any digit (or . and a digit) on the right.
public static void main(String []args){
Scanner s=new Scanner(System.in);
String str=s.nextLine();
Pattern p=Pattern.compile("[0-9]+");
Matcher m=p.matcher(str);
while(m.find()){
System.out.println(m.group()+" ");
}
\d+
\d stands for any decimal while + extends it to any other decimal coming directly after, until there is a non number character like a space or letter

Regex to check string contains only Hex characters

I have never done regex before, and I have seen they are very useful for working with strings. I saw a few tutorials (for example) but I still cannot understand how to make a simple Java regex check for hexadecimal characters in a string.
The user will input in the text box something like: 0123456789ABCDEF and I would like to know that the input was correct otherwise if something like XTYSPG456789ABCDEF when return false.
Is it possible to do that with a regex or did I misunderstand how they work?
Yes, you can do that with a regular expression:
^[0-9A-F]+$
Explanation:
^ Start of line.
[0-9A-F] Character class: Any character in 0 to 9, or in A to F.
+ Quantifier: One or more of the above.
$ End of line.
To use this regular expression in Java you can for example call the matches method on a String:
boolean isHex = s.matches("[0-9A-F]+");
Note that matches finds only an exact match so you don't need the start and end of line anchors in this case. See it working online: ideone
You may also want to allow both upper and lowercase A-F, in which case you can use this regular expression:
^[0-9A-Fa-f]+$
May be you want to use the POSIX character class \p{XDigit}, so:
^\p{XDigit}+$
Additionally, if you plan to use the regular expression very often, it is recommended to use a constant in order to avoid recompile it each time, e.g.:
private static final Pattern REGEX_PATTERN =
Pattern.compile("^\\p{XDigit}+$");
public static void main(String[] args) {
String input = "0123456789ABCDEF";
System.out.println(
REGEX_PATTERN.matcher(input).matches()
); // prints "true"
}
Actually, the given answer is not totally correct. The problem arises because the numbers 0-9 are also decimal values. PART of what you have to do is to test for 00-99 instead of just 0-9 to ensure that the lower values are not decimal numbers. Like so:
^([0-9A-Fa-f]{2})+$
To say these have to come in pairs! Otherwise - the string is something else! :-)
Example:
(Pick one)
var a = "1e5";
var a = "10";
var a = "314159265";
If I used the accepted answer in a regular expression it would return TRUE.
var re1 = new RegExp( /^[0-9A-Fa-f]+$/ );
var re2 = new RegExp( /^([0-9A-Fa-f]{2})+$/ );
if( re1.test(a) ){ alert("#1 = This is a hex value!"); }
if( re2.test(a) ){ alert("#2 = This IS a hex string!"); }
else { alert("#2 = This is NOT a hex string!"); }
Note that the "10" returns TRUE in both cases. If an incoming string only has 0-9 you can NOT tell, easily if it is a hex value or a decimal value UNLESS there is a missing zero in front of off length strings (hex values always come in pairs - ie - Low byte/high byte). But values like "34" are both perfectly valid decimal OR hexadecimal numbers. They just mean two different things.
Also note that "3.14159265" is not a hex value no matter which test you do because of the period. But with the addition of the "{2}" you at least ensure it really is a hex string rather than something that LOOKS like a hex string.

Categories