Java Regex to Match a Number Within a Range - java

I'm trying to craft a Java regular expression to identify if a number (which I don't know until execution time) is within a range (and that range I also don't know until execution time).
Specifically, I'm trying to use Cisco's PRIME compliance module to validate my switch has no inactive VLANs (for this question, a VLAN is just a number), but PRIME uses Java regular expression syntax.
I know that the specific switch command I'm evaluating uses a syntax like:
switchport trunk allowed vlan 1,20,37,45,90-101,300-502,904-2044
How, then, can I tell if VLAN "x" is in any of those ranges?
If x = 20, it should match.
If x = 90, it should match.
If x = 900, it should fail.
If x = 1043, it should match.
Any ideas?
Edit: Unfortunately, the RegEx listed here is for ranges that are known; the examples are all hard-coded ranges. I need something that takes an unknown x, y, and z, where all x, y, and z might possibly be 1, 2, 3, or 4 digits, and matches if z is between x and y when written as "x-y".
Is there a way to take the string "x-y", parse it into \1 and \2 that are understood to be numbers, and match if (z >= \1 && z <= \2)?
I've tried looking at things like lookahead and lookbehind and crazy/obscure Java-compatible regex structures, but my head quickly got spun into the 4th dimension.

I don't think this should be done with a regular expression. Personally I'd use a regex to check if it's the right format, i.e. check if the string matches "VLAN ([0-9]+(-[0-9]+)?)(,([0-9]+(-[0-9]+)?))*", then split the latter part on the commas and use integer parsing from there, depending on if there is a '-' in there or not you can check the ranges.
For instance like this: https://jsfiddle.net/gcb9pm7f/15/
function testRanges()
{
var str = document.getElementById("textField").value;
var test = parseInt(document.getElementById("numberField").value);
str = str.toUpperCase(); // VLAN big
var regex = /^VLAN ([0-9]+(-[0-9]+)?)(,([0-9]+(-[0-9]+)?))*$/g;
if (regex.test(str))
{
str = str.substring(5, str.length); // remove 'VLAN'
var splitArray = str.split(',');
for (var idx = 0; idx < splitArray.length; idx++)
{
var rangeStr = splitArray[idx];
if (rangeStr.includes('-'))
{
// range, check both values.
var a = parseInt(rangeStr.split('-')[0]);
var b = parseInt(rangeStr.split('-')[1]);
if (a > b)
{
if (test >= b && test <= a) return true; // range is inclusive
}
else // a <= b
{
if (test <= b && test >= a) return true; // range is inclusive
}
}
else // not a range, single value
{
if (parseInt(rangeStr) === test) return true;
}
}
}
return false; // no match or regex not matching.
}
Adjust to your programming language as needed. Should be fairly straight forward.

Related

How to check if an integer matches given expression of "5+4n"?

What can be an efficient way to determine if a given integer matches the result of an expression 5+4n. Say, I have a number 54 and I want to check if expression given above can result in this number. One way I can think of is to evaluate 5+4n up to 54 and check if the result matches the number of interest. But it will be an inefficient way to go when I'll have to check big numbers.
If we are assuming that x will always be greater than or equal to 0, x=5+4n can be rewritten as x=1+4(n+1).
This means that when x is divided by 4, there will be a remainder of 1. So you can simply do the below:
private boolean doesMatch(x number) {
return x%4==1;
}
if you want to evaluate the expression A+Bn, with A and B configurable, you should extract A and B first:
Pattern p = Pattern.compile("(\\-?\\d+)\\s*\\+([\\-\\+]?\\d+)\\s*n");
Matcher m = p.matcher(expression);
if (m.matches()) {
int A = Integer.valueOf(m.group(1));
int B = Integer.valueOf(m.group(2));
// Evaluate expression
...
}
and then simply check if x is a solution:
// Evaluate expression
if ((x-A) % B == 0) {
// Number is a solution
} else {
// Number is not a solution for the expression
}

How do I check if a char is a vowel?

This Java code is giving me trouble:
String word = <Uses an input>
int y = 3;
char z;
do {
z = word.charAt(y);
if (z!='a' || z!='e' || z!='i' || z!='o' || z!='u')) {
for (int i = 0; i==y; i++) {
wordT = wordT + word.charAt(i);
} break;
}
} while(true);
I want to check if the third letter of word is a non-vowel, and if it is I want it to return the non-vowel and any characters preceding it. If it is a vowel, it checks the next letter in the string, if it's also a vowel then it checks the next one until it finds a non-vowel.
Example:
word = Jaemeas then wordT must = Jaem
Example 2:
word=Jaeoimus then wordT must =Jaeoim
The problem is with my if statement, I can't figure out how to make it check all the vowels in that one line.
Clean method to check for vowels:
public static boolean isVowel(char c) {
return "AEIOUaeiou".indexOf(c) != -1;
}
Your condition is flawed. Think about the simpler version
z != 'a' || z != 'e'
If z is 'a' then the second half will be true since z is not 'e' (i.e. the whole condition is true), and if z is 'e' then the first half will be true since z is not 'a' (again, whole condition true). Of course, if z is neither 'a' nor 'e' then both parts will be true. In other words, your condition will never be false!
You likely want &&s there instead:
z != 'a' && z != 'e' && ...
Or perhaps:
"aeiou".indexOf(z) < 0
How about an approach using regular expressions? If you use the proper pattern you can get the results from the Matcher object using groups. In the code sample below the call to m.group(1) should return you the string you're looking for as long as there's a pattern match.
String wordT = null;
Pattern patternOne = Pattern.compile("^([\\w]{2}[AEIOUaeiou]*[^AEIOUaeiou]{1}).*");
Matcher m = patternOne.matcher("Jaemeas");
if (m.matches()) {
wordT = m.group(1);
}
Just a little different approach that accomplishes the same goal.
Actually there are much more efficient ways to check it but since you've asked what is the problem with yours, I can tell that the problem is you have to change those OR operators with AND operators. With your if statement, it will always be true.
So in event anyone ever comes across this and wants a easy compare method that can be used in many scenarios.
Doesn't matter if it is UPPERCASE or lowercase. A-Z and a-z.
bool vowel = ((1 << letter) & 2130466) != 0;
This is the easiest way I could think of. I tested this in C++ and on a 64bit PC so results may differ but basically there's only 32 bits available in a "32 bit integer" as such bit 64 and bit 32 get removed and you are left with a value from 1 - 26 when performing the "<< letter".
If you don't understand how bits work sorry i'm not going go super in depth but the technique of
1 << N is the same thing as 2^N power or creating a power of two.
So when we do 1 << N & X we checking if X contains the power of two that creates our vowel is located in this value 2130466. If the result doesn't equal 0 then it was successfully a vowel.
This situation can apply to anything you use bits for and even values larger then 32 for an index will work in this case so long as the range of values is 0 to 31. So like the letters as mentioned before might be 65-90 or 97-122 but since but we keep remove 32 until we are left with a remainder ranging from 1-26. The remainder isn't how it actually works, but it gives you an idea of the process.
Something to keep in mind if you have no guarantee on the incoming letters it to check if the letter is below 'A' or above 'u'. As the results will always be false anyways.
For example teh following will return a false vowel positive. "!" exclamation point is value 33 and it will provide the same bit value as 'A' or 'a' would.
For starters, you are checking if the letter is "not a" OR "not e" OR "not i" etc.
Lets say that the letter is i. Then the letter is not a, so that returns "True". Then the entire statement is True because i != a. I think what you are looking for is to AND the statements together, not OR them.
Once you do this, you need to look at how to increment y and check this again. If the first time you get a vowel, you want to see if the next character is a vowel too, or not. This only checks the character at location y=3.
String word="Jaemeas";
String wordT="";
int y=3;
char z;
do{
z=word.charAt(y);
if(z!='a'&&z!='e'&&z!='i'&&z!='o'&&z!='u'&&y<word.length()){
for(int i = 0; i<=y;i++){
wordT=wordT+word.charAt(i);
}
break;
}
else{
y++;
}
}while(true);
here is my answer.
I have declared a char[] constant for the VOWELS, then implemented a method that checks whether a char is a vowel or not (returning a boolean value). In my main method, I am declaring a string and converting it to an array of chars, so that I can pass the index of the char array as the parameter of my isVowel method:
public class FindVowelsInString {
static final char[] VOWELS = {'a', 'e', 'i', 'o', 'u'};
public static void main(String[] args) {
String str = "hello";
char[] array = str.toCharArray();
//Check with a consonant
boolean vowelChecker = FindVowelsInString.isVowel(array[0]);
System.out.println("Is this a character a vowel?" + vowelChecker);
//Check with a vowel
boolean vowelChecker2 = FindVowelsInString.isVowel(array[1]);
System.out.println("Is this a character a vowel?" + vowelChecker2);
}
private static boolean isVowel(char vowel) {
boolean isVowel = false;
for (int i = 0; i < FindVowelsInString.getVowel().length; i++) {
if (FindVowelsInString.getVowel()[i] == vowel) {
isVowel = true;
}
}
return isVowel;
}
public static char[] getVowel() {
return FindVowelsInString.VOWELS;
}
}

Tokenizing an algebraic expression in string format

I"m trying to take a string that represents a full algebraic excpression, such as x = 15 * 6 / 3 which is a string, and tokenize it into its individual components. So the first would be x, then =, then 15, then *, 6, / and finally 3.
The problem I am having is actually parsing through the string and looking at the individual characters. I can't think of a way to do this without a massive amount of if statements. Surely there has to be a better way tan specifically defining each individual case and testing for it.
For each type of token, you'll want to figure out how to identify:
when you're starting to read a particular token
if you're continuing to read the same token, or if you've started a different one
Let's take your example: x=15*6/3. Let's assume that you cannot rely on the fact that there are spaces in between each token. In that case, it's trivial: your new token starts when you reach a space.
You can break down the character types into letters, digits, and symbols. Let's call the token types Variable, Operator, and Number.
A letter indicates a Variable token has started. It continues until you read a non-letter.
A symbol indicates the start of an Operator token. I only see single symbols, but you can have groups of symbols correspond to different Operator tokens.
A digit indicates the start of a Number token. (Let's assume integers for now.) The Number token continues until you read a non-digit.
Basically, that's how a simple symbolic parser works. Now, if you add in negative numbers (where the '-' symbol can have multiple meanings), or parentheses, or function names (like sin(x)) then things get more complicated, but it amounts to the same set of rules, now just with more choices.
create regular expression for each possible element: integer, variable, operator, parentheses.
combine them using the | regular expression operator into one big regular expression with capture groups to identify which one matched.
in a loop match the head of the remaining string and break off the matched part as a token. the type of the token depends on which sub-expression matched as described in 2.
or
use a lexer library, such as the one in antlr or javacc
This is from my early expression evaluator that takes an infix expression like yours and turns it into postfix to evaluate. There are methods that help the parser but I think they're pretty self documenting. Mine uses symbol tables to check tokens against. It also allows for user defined symbols and nested assignments and other things you may not need/want. But it shows how I handled your issue without using niceties like regex which would simplify this task tremendously. In addition everything shown is of my own implementation - stack and queue as well - everything. So if anything looks abnormal (unlike Java imps) that's because it is.
This section of code is important not to answer your immediate question but to show the necessary work to determine the type of token you're dealing with. In my case I had three different types of operators and two different types of operands. Based on either the known rules or rules I chose to enforce (when appropriate) it was easy to know when something was a number (starts with a number), variable/user symbol/math function (starts with a letter), or math operator (is: /,*,-,+) . Note that it only takes seeing the first char to know the correct extraction rules. From your example, if all your cases are as simple, you'd only have to handle two types, operator or operand. Nonetheless the same logic will apply.
protected Queue<Token> inToPostParse(String exp) {
// local vars
inputExp = exp;
offset = 0;
strLength = exp.length();
String tempHolder = "";
char c;
// the program runs in a loop so make sure you're dealing
// with an empty queue
q1.reset();
for (int i = offset; tempHolder != null && i < strLength; ++i) {
c = exp.charAt(i);
// Spaces are useless so skip them
if (c == ' ') { continue; }
// If c is a letter
if ((c >= 'A' && c <= 'Z')
|| (c >= 'a' && c <= 'z')) {
// Here we know it must be a user symbol possibly undefined
// at this point or an function like SIN, ABS, etc
// We extract, based on obvious rules, the op
tempHolder = extractPhrase(i); // Used to be append sequence
if (ut.isTrigOp(tempHolder) || ut.isAdditionalOp(tempHolder)) {
s1.push(new Operator(tempHolder, "Function"));
} else {
// If not some math function it is a user defined symbol
q1.insert(new Token(tempHolder, "User"));
}
i += tempHolder.length() - 1;
tempHolder = "";
// if c begins with a number
} else if (c >= '0' && c <= '9') {
try {
// Here we know that it must be a number
// so we extract until we reach a non number
tempHolder = extractNumber(i);
q1.insert(new Token(tempHolder, "Number"));
i += tempHolder.length() - 1;
tempHolder = "";
}
catch (NumberFormatException nfe) {
return null;
}
// if c is in the math symbol table
} else if (ut.isMathOp(String.valueOf(c))) {
String C = String.valueOf(c);
try {
// This is where the magic happens
// Here we determine the "intersection" of the
// current C and the top of the stack
// Based on the intersection we take action
// i.e., in math do you want to * or + first?
// Depending on the state you may have to move
// some tokens to the queue before pushing onto the stack
takeParseAction(C, ut.findIntersection
(C, s1.showTop().getSymbol()));
}
catch (NullPointerException npe) {
s1(C);
}
// it must be an invalid expression
} else {
return null;
}
}
u2();
s1.reset();
return q1;
}
Basically I have a stack (s1) and a queue (q1). All variables or numbers go into the queue. Any operators trig, math, parens, etc.. go on the stack. If the current token is to be put on the stack you have to check the state (top) to determine what parsing action to take (i.e., what to do based on math precedence). Sorry if this seems like useless information. I imagine if you're parsing a math expression it's because at some point you plan to evaluate it. IMHO, postfix is the easiest so I, regardless of input format, change it to post and evaluate with one method. If your O is different - do what you like.
Edit: Implementations
The extract phrase and number methods, which you may be most interested in, are as follows:
protected String extractPhrase(int it) {
String phrase = new String();
char c;
for ( ; it < inputExp.length(); ++it) {
c = inputExp.charAt(it);
if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
|| (c >= '0' && c <= '9')) {
phrase += String.valueOf(c);
} else {
break;
}
}
return phrase;
}
protected String extractNumber(int it) throws NumberFormatException {
String number = new String();
int decimals = 0;
char c;
for ( ; it < strLength; ++it) {
c = inputExp.charAt(it);
if (c >= '0' && c <= '9') {
number += String.valueOf(c);
} else if (c == '.') {
++decimals;
if (decimals < 2) {
number += ".";
} else {
throw new NumberFormatException();
}
} else {
break;
}
}
return number;
}
Remember - By the time they enter these methods I've already been able to deduce what type it is. This allows you to avoid the seemingly endless while-if-else chain.
Are components always separated by space character like in your question? if so, use algebricExpression.split(" ") to get a String[] of components.
If no such restrictions can be assumed, a possible solution can be to iterate over the input, and switch the Character.getType() of the current index, somthing like that:
ArrayList<String> getExpressionComponents(String exp) {
ArrayList<String> components = new ArrayList<String>();
String current = "";
int currentSequenceType = Character.UNASSIGNED;
for (int i = 0 ; i < exp.length() ; i++) {
if (currentSequenceType != Character.getType(exp.charAt(i))) {
if (current.length() > 0) components.add(current);
current = "";
currentSequenceType = Character.getType(exp.charAt(i));
}
switch (Character.getType(exp.charAt(i))) {
case Character.DECIMAL_DIGIT_NUMBER:
case Character.MATH_SYMBOL:
case Character.START_PUNCTUATION:
case Character.END_PUNCTUATION:
case Character.LOWERCASE_LETTER:
case Character.UPPERCASE_LETTER:
// add other required types
current = current.concat(new String(new char[] {exp.charAt(i)}));
currentSequenceType = Character.getType(exp.charAt(i));
break;
default:
current = "";
currentSequenceType = Character.UNASSIGNED;
break;
}
}
return components;
}
You can easily change the cases to meet with other requirements, such as split non-digit chars to separate components etc.

How can I change my regex to correctly match float literals?

I am attempting to create a parser for java expressions, but for some reason I am unable to match floating point values. I am using a java.util.Matcher obtained from
Matcher token = Pattern.compile(
"(\\w[\\w\\d]*+)|" + //identifiers as group 1
"((?:(?>[1-9][0-9]*+\\.?[0-9]*+)|(?>\\.[0-9]++))(?:[Ee][+-]?[0-9]++)?)|" + //literal numbers
"([^\\w\\d\\s]*+)" //operators as group 3
).matcher();
This is intended to match an identifier, a floating point value, or an operator (I still need to refine that part of the match though will refine that part of the match later). However, I am having an issue with it in that
Below is the code that is using that expression, which is intended to take all the identifiers, numbers, and operators, register all the numbers in vars, and put all the identifiers, each number's corresponding value, and all the operators in tokens in same order as in the original string.
It does not succeed in doing so, however, because for an input string like foo 34.78e5 bar -2.7 the resulting list is '[34, A, , bar, , -, 2, B, ]' with A=-78000.0 and B=-0.7. It is supposed to return '[foo, A, bar, B]` with A=3478000 and B=-2.7. I beleive it may be just that it is failing to include both parts of the number as the match of the regex, however that may not be the case.
I have tried removing the atomic grouping and possesives from the regex, however that did not change anything.
LinkedList<String> tokens = new LinkedList<String>();
HashMap<String, Double> vars = new HashMap<String, Double>();
VariableNamer varNamer = new VariableNamer();
for(Matcher token = Pattern.compile(
"(\\w[\\w\\d]*+)|" + //variable names as group 1
"((?:(?:[1-9][0-9]*+\\.?[0-9]*+)|(?:\\.[0-9]++))(?:[Ee][+-]?[0-9]++)?)|" +
//literal numbers as group 2
"([^\\w\\d\\s]*+)" //operators as group 3
).matcher(expression); token.find();){
if(token.group(2) != null) { //if its a literal number, register it in vars and substitute a string for it
String name = varNamer.next();
if (
tokens.size()>0 &&
tokens.get(tokens.size()-1).matches("[+-]") &&
tokens.size()>1?tokens.get(tokens.size()-2).matches("[^\\w\\d\\s]"):true
)
vars.put(name, tokens.pop().equals("+")?Double.parseDouble(token.group()):-Double.parseDouble(token.group()));
else
vars.put(name, Double.parseDouble((token.group())));
tokens.addLast(name);
} else {
tokens.addLast(token.group());
}
}
and here is VariableNamer:
import java.util.Iterator;
public class VariableNamer implements Iterator<String>{
StringBuffer next = new StringBuffer("A");
#Override
public boolean hasNext() {
return true;
}
#Override
public String next() {
try{
return next.toString();
}finally{
next.setCharAt(next.length()-1, (char) (next.charAt(next.length()-1) + 1));
for(int idx = next.length()-1; next.charAt(idx) + 1 > 'Z' && idx > 0; idx--){
next.setCharAt(idx, 'A');
next.setCharAt(idx - 1, (char) (next.charAt(idx - 1) + 1));
}
if (next.charAt(0) > 'Z'){
next.setCharAt(0, 'A');
next.insert(0, 'A');
}
}
}
#Override
public void remove() {
throw new UnsupportedOperationException();
}
}
Depending on details of your expression mini-language, it is either close to the limit on what is possible using regexes ... or beyond it. And even if you do succeed in "parsing", you will be left with the problem of mapping the "group" substrings into a meaningful expression.
My advice would be to take an entirely different approach. Either find / use an existing expression library, or implement expression parsing using a parser generator like ANTLR or Javacc.

Java: looking for the fastest way to check String for presence of Unicode chars in certain range

I need to implement a very crude language identification algorithm. In my world, there are only two languages: English and not-English. I have ArrayList and I need to determine if each String is likely in English or the other language which has its Unicode chars in a certain range. So what I want to do is to check each String against this range using some type of "presence" test. If it passes the test, I say the String is not English, otherwise it's English. I want to try two type of tests:
TEST-ANY: If any char in the string falls within the range, the string passes the test
TEST-ALL: If all chars in the string fall within the range, the string passes the test
Since the array might be very long, I need to implement this very efficiently. What would be the fastest way of doing this in Java?
Thx
UPDATE: I am specifically checking for non-English by looking at a specific range of Unicodes rather then checking for whether the characters are ASCII, in part to take care of the "resume" problem mentioned below. What I am trying to figure out is whether Java provides any classes/methods that essentially implement TEST-ANY or TEST-ALL (or another similar test) as efficiently as possible. In other words, I am trying to avoid reinventing the wheel especially if the wheel invented before me is better anyway.
Here's how I ended up implementing TEST-ANY:
// TEST-ANY
String str = "wordToTest";
int UrangeLow = 1234; // can get range from e.g. http://www.utf8-chartable.de/unicode-utf8-table.pl
int UrangeHigh = 2345;
for(int iLetter = 0; iLetter < str.length() ; iLetter++) {
int cp = str.codePointAt(iLetter);
if (cp >= UrangeLow && cp <= UrangeHigh) {
// word is NOT English
return;
}
}
// word is English
return;
I really don't think that this solution is ideal for determining language, but if you want to check to see if a string is all ascii, you could do something like this:
public static boolean isASCII(String s){
boolean ret = true;
for(int i = 0; i < s.length() ; i++) {
if(s.charAt(i)>=128){
ret = false;
break;
}
}
return ret;
}
So then if you try this:
boolean r = isASCII("Hello");
r would equal true. But if you try:
boolean r = isASCII("Grüß dich");
then r would equal false. I haven't tested performance, but this would work reasonably fast, because all it does is compare a character to the number 128.
But as #AlexanderPogrebnyak mentioned in the comments above, this will return false if you give it "résumé". Be aware of that.
Update:
I am specifically checking for non-English by looking at a specific range of Unicodes rather then checking for whether the characters are ASCII
But ASCII is a range in Unicode (well at least in UTF-8). Unicode is just an extension of ASCII. What the code #mP. and I provided does is it checks to see whether each character is in a certain range. I chose that range to be ASCII, which is any Unicode character that has a decimal value of less than 128. You can just as well choose any other range. But the reason I chose ASCII is because it's the one with the Latin alphabet, the Arabic numbers, and some other common characters that would normally be in an 'English' string.
public static boolean isAscii( String s ){
int length = s.length;
for( int i = 0; i < length; i++){
final char c = s.charAt( i );
if( c > 'z' ){
return false;
}
}
return true;
}
#Hassan thanks for picking the typo replaced test against big Z with little z.

Categories