Most elegant isNumeric() solution for java - java

I'm porting a small snippet of PHP code to java right now, and I was relying on the function is_numeric($x) to determine if $x is a number or not. There doesn't seem to be an equivalent function in java, and I'm not satisfied with the current solutions I've found so far.
I'm leaning toward the regular expression solution found here: http://rosettacode.org/wiki/Determine_if_a_string_is_numeric
Which method should I use and why?

Note that the PHP isNumeric() function will correctly determine that hex and scientific notation are numbers, which the regex approach you link to will not.
One option, especially if you are already using Apache Commons libraries, is to use NumberUtils.isNumber(), from Commons-Lang. It will handle the same cases that the PHP function will handle.

Have you looked into using StringUtils library?
There's a isNumeric() function which might be what you're looking for.
(Note that "" would be evaluated to true)

It's usually a bad idea to have a number in a String. If you want to use this number then parse it and use it as a numeric. You shouldn't need to "check" if it's a numeric, either you want to use it as a numeric or not.
If you need to convert it, then you can use every parser from Integer.parseInt(String) to BigDecimal(String)
If you just need to check that the content can be seen as a numeric then you can get away with regular expressions.
And don't use the parseInt if your string can contain a float.

Optionally you can use a regular expression as well.
if (theString.matches("((-|\\+)?[0-9]+(\\.[0-9]+)?)+")))
return true;
return false;

Did you try Integer.parseInt()? (I'm not sure of the method name, but the Integer class has a method that creates an Integer object from strings). Or if you need to handle non-integer numbers, similar methods are available for Double objects as well. If these fail, an exception is thrown.
If you need to parse very large numbers (larger than int/double), and don't need the exact value, then a simple regex based method might be sufficient.

In a strongly typed language, a generic isNumeric(String num) method is not very useful. 13214384348934918434441 is numeric, but won't fit in most types. Many of those where is does fit won't return the same value.
As Colin has noted, carrying numbers in Strings withing the application is not recommended. The isNumberic function should only be applicable for input data on interface methods. These should have a more precise definition than isNumeric. Others have provided various solutions. Regular expressions can be used to test a number of conditions at once, including String length.

Just use
if((x instanceof Number)
//if checking for parsable number also
|| (x instanceof String && x.matches("((-|\+)?[0-9]+(\.[0-9]+)?)+"))
){
...
}
//---All numeric types including BigDecimal extend Number

Related

Is there a way to set the value of an Aspose Cell to a string representation while keeping its type/style?

We have a program that does pattern replacement in a variety of files (including Excel) and are using Aspose. So for example we might have a cell that is PATTERN_TO_REPLACE and then we replace that with "6" or "6.0" or "1,000.00" (in the US) or "1.000,00" (in the UK).
(PATTERN_TO_REPLACE could also be in a formula).
The problem is that we have to use a Java matcher to find (and replace) the string, which means we are calling cell.setValue(String). This led Aspose to change the CellValueType to string, even when it was previously numeric.
Our initial solution was to look and see if we thought the string was a number, then cast it to a BigDecimal and pass it in as such, but that leads to a lot of work (particularly for localization of types of currencies...)
What I'd like to do is call cell.getStyle(), cell.getType(), save them, set the value, then return them, but there is no cell.setType() method. Is there an easy way to do what I want?
When you use cell.setValue(String), surely, it will set string type. I think you may try to use the overloads like, cell.putValue(String, true) instead. This way, if you are inserting numbers (as string), it will be automatically converted to numbers and set it to numeric type.
PS. I am working as Support developer/ Evangelist at Aspose.

Why using default trash value for string is wrong?

tl;dr;
Why using
string myVariable = "someInitialValueDifferentThanUserValue99999999999";
as default value is wrong?
explanation of situation:
I had a discussion with a colleague at my workplace.
He proposed to use some trash value as default in order to differentiate it from user value.
An easy example it would be like this:
string myVariable = "someInitialValueDifferentThanUserValue99999999999";
...
if(myVariable == "someInitialValueDifferentThanUserValue99999999999")
{
...
}
This is quite obvious and intuitive for me that this is wrong.
But I could not give a nice argument for this, beyond that:
this is not professional.
there is a slight chance that someone would input the same value.
Once I read that if you have such a situation your architecture or programming habits are wrong.
edit:
Thank you for the answers. I found a solution that satisfied me, so I share with the others:
It is good to make a bool guard value that indicates if the initialization of a specific object has been accomplished.
And based on this private bool variable I can deduce if I play with a string that is default empty value "" from my mechanism (that is during initialization) or empty value from the user.
For me, this is a more elegant way.
Optional
Optional can be used.
Returns an empty Optional instance. No value is present for this Optional.
API Note:
Though it may be tempting to do so, avoid testing if an object is empty by comparing with == against instances returned by Option.empty(). There is no guarantee that it is a singleton. Instead, use isPresent().
Ref: Optional
Custom escape sequence shared by server and client
Define default value
When the user enter's the default value, escape the user value
Use a marker character
Always define the first character as the marker character
Take decision based on this character and strip this character for any actual comparison
Define clear boundaries for the check as propagating this character across multiple abstractions can lead to code maintenance issues.
Small elaboration on "It's not professional":
It's often a bad idea, because
it wastes memory when not a constant (at least in Java - of course, unless you're working with very limited space that's negligible).
Even as constant it may introduce ambiguity once you have more classes, packages or projects ("Was it NO_INPUT, INPUT_NOT_PROVIDED, INPUT_NONE?")
usually it's a sign that there will be no standardized scope-bound Defined_Marker_Character in the Project Documentation like suggested in the other answers
it introduces ambiguity for how to deal with deciding if an input has been provided or not
In the end you will either have a lot of varying NO_INPUT constants in different classes or end up with a self-made SthUtility class that defines one constant SthUtility.NO_INPUT and a static method boolean SthUtility.isInputEmpty(...) that compares a given input against that constant, which basically is reinventing Optional. And you will be copy-pasting that one class into every of your projects.
There is really no need as you can do the following as of Java 11 which was four releases ago.
String value = "";
// true only if length == 0
if (value.isEmpty()) {
System.out.println("Value is empty");
}
String value = " ";
// true if empty or contains only white space
if (value.isBlank()) {
System.out.println("Value is blank");
}
And I prefer to limit uses of such strings that can be searched in the class file that might possibly lead to exploitation of the code.

How to store mathematical formula in MS SQL Server DB and interpret it using JAVA?

I have to give the user the option to enter in a text field a mathematical formula and then save it in the DB as a String. That is easy enough, but I also need to retrieve it and use it to do calculations.
For example, assume I allow someone to specify the formula of employee salary calculation which I must save in String format in the DB.
GROSS_PAY = BASIC_SALARY - NO_PAY + TOTAL_OT + ALLOWANCE_TOTAL
Assume that terms such as GROSS_PAY, BASIC_SALARY are known to us and we can make out what they evaluate to. The real issue is we can't predict which combinations of such terms (e.g. GROSS_PAY etc.) and other mathematical operators the user may choose to enter (not just the +, -, ×, / but also the radical sigh - indicating roots - and powers etc. etc.). So how do we interpret this formula in string format once where have retrieved it from DB, so we can do calculations based on the composition of the formula.
Building an expression evaluator is actually fairly easy.
See my SO answer on how to write a parser. With a BNF for the range of expression operators and operands you exactly want, you can follow this process to build a parser for exactly those expressions, directly in Java.
The answer links to a second answer that discusses how to evaluate the expression as you parse it.
So, you read the string from the database, collect the set of possible variables that can occur in the expression, and then parse/evaluate the string. If you don't know the variables in advance (seems like you must), you can parse the expression twice, the first time just to get the variable names.
as of Evaluating a math expression given in string form there is a JavaScript Engine in Java which can execute a String functionality with operators.
Hope this helps.
You could build a string representation of a class that effectively wraps your expression and compile it using the system JavaCompiler — it requires a file system. You can evaluate strings directly using javaScript or groovy. In each case, you need to figure out a way to bind variables. One approach would be to use regex to find and replace known variable names with a call to a binding function:
getValue("BASIC_SALARY") - getValue("NO_PAY") + getValue("TOTAL_OT") + getValue("ALLOWANCE_TOTAL")
or
getBASIC_SALARY() - getNO_PAY() + getTOTAL_OT() + getALLOWANCE_TOTAL()
This approach, however, exposes you to all kinds of injection type security bugs; so, it would not be appropriate if security was required. The approach is also weak when it comes to error diagnostics. How will you tell the user why their expression is broken?
An alternative is to use something like ANTLR to generate a parser in java. It's not too hard and there are a lot of examples. This approach will provide both security (users can't inject malicious code because it won't parse) and diagnostics.

How to use or implement arrays in XQuery?

Is there any built in support for array in XQuery? For example, if we want to implement
the simple java program in xquery how we would do it:
(I am not asking to translate the entire program into xquery, but just asking
how to implement the array in line number 2 of the below code to xquery? I am
using marklogic / xdmp functions also).
java.lang.String test = new String("Hello XQuery");
char[] characters = test.toCharArray();
for(int i = 0; i<characters.length; i++) {
if(character[i] == (char)13) {
character[i] = (char) 0x00;
}
}
Legend:
hex 0x00 dec 0 : null
hex 0x0d dec 13: carriage return
hex 0x0a dec 10: line feed
hex 0x20 dec 22: dquote
The problem with converting your sample code to XQuery is not the absence of support for arrays, but the fact that x00 is not a valid character in XML. If it weren't for this problem, you could express your query with the simple function call:
translate($input, '', '')
Now, you could argue that's cheating, it just happens so that there's a function that does exactly what you are trying to do by hand. But if this function didn't exist, you could program it in XQuery: there are sufficient primitives available for strings to allow you to manipulate them any way you want. If you need to (and it's rarely necessary) you can convert a string to a sequence of integers using the function string-to-codepoints(), and then take advantage of all the XQuery facilities for manipulating sequences.
The lesson is, when you use a declarative language like XQuery or XSLT, don't try to use the same low-level programming techniques you were forced to use in more primitive languages. There's usually a much more direct way of expressing the problem.
XQuery has built-in support for sequences. The function tokenize() (as suggested by #harish.ray) returns a sequence. You can also construct one yourself using braces and commas:
let $mysequence = (1, 2, 3, 4)
Sequences are ordered lists, so you can rely on that. That is slightly different from a node-set returned from an XPath, those usually are document-ordered.
On a side mark: actually, everything in XQuery is either a node-set or a sequence. Even if a function is declared to return one string or int, you can treat that returned value as if it is a sequence of one item. No explicit casting is necessary, for which there are no constructs in XQuery anyhow. Functions like fn:exists() and fn:empty() always work.
HTH!
Just for fun, here's how I would do this in XQuery if fn:translate did not exist. I think Michael Kay's suggestion would end up looking similar.
let $test := "Hello XQuery"
return codepoints-to-string(
for $c in string-to-codepoints($test)
return if ($c eq 32) then 44 else $c)
Note that I changed the transformation because of the problem he pointed: 0 is not a legal codepoint. So instead I translated spaces to commas.
With MarkLogic, another option is to use http://docs.marklogic.com/json:array and its associated functions. The json:set-item-at function would allow coding in a vaguely imperative style. Coding both variations might be a good learning exercise.
There are two ways to do this.
Firstly you can create an XmlResults object using
XmlManager.createResults(), and use XmlResults.add() to add your
strings to this. You can then use the XmlResults object to set the
value of a variable in XmlQueryContext, which can be used in your
query.
Example:
XmlResults values = XMLManager.createResults();
values.add(new XmlValue("value1"));
values.add(new XmlValue("value2"));
XmlQueryContext.setVariableValue("files", values);
The alternative is to split the string in XQuery. You
can do this using the tokenize() function, which works using a
regular expression to match the string separator.
http://www.w3.org/TR/xpath-functions/#func-tokenize
Thanks.
A little outlook: XQuery 3.1 will provide native support for arrays. See http://www.w3.org/TR/xquery-31/ for more details.
You can construct an array like this:
$myArray = tokenize('a b c d e f g', '\s')
// $myArray[3] -> c
Please note that the first index of this pseudo-array is 1 not 0!
Since the question "How to use or implement arrays in XQuery?" is being held generic (and thus shows up in search results on this topic), I would like to add a generic answer for future reference (making it a Community Wiki, so others may expand):
As Christian Grün has already hinted at, with XQuery 3.1 XQuery got a native array datatype, which is a subtype of the function datatype.
Since an array is a 'ordered list of values' and an XPath/XQuery sequence is as well, the first question, which may arise, is: "What's the difference?" The answer is simple: a sequence can not contain another sequence. All sequences are automatically flattened. Not so an array, which can be an array of arrays. Just like sequences, arrays in XQuery can also have any mix of any other datatype.
The native XQuery array datatype can be expressed in either of two ways: As [] or via array {}. The difference being, that, when using the former constructor, a comma is being considered a 'hard' comma, meaning that the following array consists of two members:
[ ("apples", "oranges"), "plums" ]
while the following will consist of three members:
array { ("apples", "oranges"), "plums" }
which means, that the array expression within curly braces is resolved to a flat sequence first, and then memberized into an array.
Since Array is a subtype of function, an array can be thought of as an anonymous function, that takes a single parameter, the numeric index. To get the third member of an array, named $foo, we thus can write:
$foo(3)
If an array contains another array as a member you can chain the function calls together, as in:
$foo(3)(5)
Along with the array datatype, special operators have been added, which make it easy to look up the values of an array. One such operator (also used by the new Map datatype) is the question mark followed by an integer (or an expression that evaluates to zero or more integers).
$foo?(3)
would, again, return the third member within the array, while
$foo?(3, 6)
would return the members 3 and 6.
The parenthesis can be left out, when working with literal integers. However, the parens are needed, to form the lookup index from a dynamic expression, like in:
$foo?(3 to 6)
here, the expression in the parens gets evaluated to a sequence of integers and thus the expression would return a sequence of all members from index position 3 to index position 6.
The asterisk * is used as wildcard operator. The expression
$foo?*
will return a sequence of all items in the array. Again, chaining is possible:
$foo?3?5
matches the previos example of $foo(3)(5).
More in-depth information can be found in the official spec: XML Path Language (XPath) 3.1 / 3.11.2 Arrays
Also, a new set of functions, specific to arrays, has been implemented. These functions resinde in the namespace http://www.w3.org/2005/xpath-functions/array, which, conventionally, is being prefixed with array and can be found referenced in here: XPath and XQuery Functions and Operators 3.1 / 17.3 Functions that Operate on Arrays

How to match a long with Java regex?

I know i can match numbers with Pattern.compile("\\d*");
But it doesn't handle the long min/max values.
For performence issues related to exceptions i do not want to try to parse the long unless it is really a long.
if ( LONG_PATTERN.matcher(timestampStr).matches() ) {
long timeStamp = Long.parseLong(timestampStr);
return new Date(timeStamp);
} else {
LOGGER.error("Can't convert " + timestampStr + " to a Date because it is not a timestamp! -> ");
return null;
}
I mean i do not want any try/catch block and i do not want to get exceptions raised for a long like "564654954654464654654567879865132154778" which is out of the size of a regular Java long.
Does someone has a pattern to handle this kind of need for the primitive java types?
Does the JDK provide something to handle it automatically?
Is there a fail-safe parsing mecanism in Java?
Thanks
Edit: Please assume that the "bad long string" is not an exceptionnal case.
I'm not asking for a benchmark, i'm here for a regex representing a long and nothing more.
I'm aware of the additionnal time required by the regex check, but at least my long parsing will always be constant and never be dependent of the % of "bad long strings"
I can't find the link again but there is a nice parsing benchmark on StackOverflow which clearly shows that reusing the sams compiled regex is really fast, a LOT faster than throwing an exception, thus only a small threshold of exceptions whould make the system slower than with the additionnal regex check.
The minimum avlue of a long is -9,223,372,036,854,775,808, and the maximum value is 9,223,372,036,854,775,807. So, a maximum of 19 digits. So, \d{1,19} should get you there, perhaps with an optional -, and with ^ and $ to match the ends of the string.
So roughly:
Pattern LONG_PATTERN = Pattern.compile("^-?\\d{1,19}$");
...or something along those lines, and assuming you don't allow commas (or have already removed them).
As gexicide points out in the comments, the above allows a small (in comparison) range of invalid values, such as 9,999,999,999,999,999,999. You can get more complex with your regex, or just accept that the above will weed out the vast majority of invalid numbers and so you reduce the number of parsing exceptions you get.
This regular expression should do what you need:
^(-9223372036854775808|0)$|^((-?)((?!0)\d{1,18}|[1-8]\d{18}|9[0-1]\d{17}|92[0-1]\d{16}|922[0-2]\d{15}|9223[0-2]\d{14}|92233[0-6]\d{13}|922337[0-1]\d{12}|92233720[0-2]\d{10}|922337203[0-5]\d{9}|9223372036[0-7]\d{8}|92233720368[0-4]\d{7}|922337203685[0-3]\d{6}|9223372036854[0-6]\d{5}|92233720368547[0-6]\d{4}|922337203685477[0-4]\d{3}|9223372036854775[0-7]\d{2}|922337203685477580[0-7]))$
But this regexp doesn't validate additional symbols like +, L, _ and etc. And if you need to validate all possible Long values you need to upgrade this regexp.
Simply catch the NumberFormatException, unless this case happens very often.
Another way would be to use a pattern which only allows long literals. Such pattern might be quite complex.
A third way would be to parse the number as BigInt first. Then you can compare it to Long.MAX_VALUE and Long.MIN_VALUE to check whether it is in the bounds of long. However, this might be costly as well.
Also note:
Parsing the long is quite fast, it is a very optimized method (that, for example, tries to parse two digits in one step). Applying pattern matching might be even more costly than performing the parsing. The only thing which is slow about the parsing is throwing the NumberFormatException. Thus, simply catching the exception is the best way to go if the exceptional case does not happen too often

Categories