How to match a long with Java regex?


I know I can match numbers with Pattern.compile("\\d*");
But it doesn't handle the long min/max values.
For performance reasons related to exceptions, I do not want to try to parse the long unless it really is a long.
if (LONG_PATTERN.matcher(timestampStr).matches()) {
    long timeStamp = Long.parseLong(timestampStr);
    return new Date(timeStamp);
} else {
    LOGGER.error("Can't convert " + timestampStr + " to a Date because it is not a timestamp! -> ");
    return null;
}
I mean I do not want any try/catch block, and I do not want exceptions raised for a long like "564654954654464654654567879865132154778", which is out of the range of a regular Java long.
Does someone have a pattern to handle this kind of need for the primitive Java types?
Does the JDK provide something to handle it automatically?
Is there a fail-safe parsing mechanism in Java?
Thanks
Edit: Please assume that the "bad long string" is not an exceptional case.
I'm not asking for a benchmark; I'm here for a regex representing a long and nothing more.
I'm aware of the additional time required by the regex check, but at least the cost of my long parsing will be constant and never depend on the percentage of "bad long strings".
I can't find the link again, but there is a nice parsing benchmark on StackOverflow which clearly shows that reusing the same compiled regex is really fast, a LOT faster than throwing an exception, so even a small share of exceptions would make the system slower than with the additional regex check.

The minimum value of a long is -9,223,372,036,854,775,808, and the maximum value is 9,223,372,036,854,775,807. So, a maximum of 19 digits. So, \d{1,19} should get you there, perhaps with an optional -, and with ^ and $ to match the ends of the string.
So roughly:
Pattern LONG_PATTERN = Pattern.compile("^-?\\d{1,19}$");
...or something along those lines, and assuming you don't allow commas (or have already removed them).
As gexicide points out in the comments, the above allows a small (in comparison) range of invalid values, such as 9,999,999,999,999,999,999. You can get more complex with your regex, or just accept that the above will weed out the vast majority of invalid numbers and so you reduce the number of parsing exceptions you get.
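A minimal sketch of how the 19-digit pre-check might be combined with parsing. The class name LongPrefilter and the method tryParse are made up for illustration; the catch block is kept only for the narrow band of 19-digit values just beyond the long range that this pattern still lets through.

```java
import java.util.OptionalLong;
import java.util.regex.Pattern;

final class LongPrefilter {
    // Compiled once and reused: optional minus sign followed by 1 to 19 digits.
    private static final Pattern LONG_PATTERN = Pattern.compile("^-?\\d{1,19}$");

    // Hypothetical helper: empty result means "not a long".
    static OptionalLong tryParse(String s) {
        if (s == null || !LONG_PATTERN.matcher(s).matches()) {
            return OptionalLong.empty(); // rejected cheaply, no exception raised
        }
        try {
            return OptionalLong.of(Long.parseLong(s));
        } catch (NumberFormatException e) {
            // Only reachable for 19-digit values just outside the long range.
            return OptionalLong.empty();
        }
    }
}
```

With this shape, a string such as "564654954654464654654567879865132154778" never reaches Long.parseLong at all.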

This regular expression should do what you need:
^(-9223372036854775808|0)$|^((-?)((?!0)\d{1,18}|[1-8]\d{18}|9[0-1]\d{17}|92[0-1]\d{16}|922[0-2]\d{15}|9223[0-2]\d{14}|92233[0-6]\d{13}|922337[0-1]\d{12}|92233720[0-2]\d{10}|922337203[0-5]\d{9}|9223372036[0-7]\d{8}|92233720368[0-4]\d{7}|922337203685[0-3]\d{6}|9223372036854[0-6]\d{5}|92233720368547[0-6]\d{4}|922337203685477[0-4]\d{3}|9223372036854775[0-7]\d{2}|922337203685477580[0-7]))$
But this regexp doesn't accept additional symbols such as +, L or _. If you need to accept every notation in which a long value can be written, you will need to extend it.
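For reference, the expression above could be embedded in Java roughly as follows. This is only a sketch: the class name StrictLongPattern and the helper isLong are illustrative, and the pattern is the one from the answer with backslashes doubled for a Java string literal.

```java
import java.util.regex.Pattern;

final class StrictLongPattern {
    // Same expression as above, split across lines for readability.
    static final Pattern STRICT_LONG = Pattern.compile(
        "^(-9223372036854775808|0)$|^((-?)((?!0)\\d{1,18}|[1-8]\\d{18}|9[0-1]\\d{17}"
        + "|92[0-1]\\d{16}|922[0-2]\\d{15}|9223[0-2]\\d{14}|92233[0-6]\\d{13}"
        + "|922337[0-1]\\d{12}|92233720[0-2]\\d{10}|922337203[0-5]\\d{9}"
        + "|9223372036[0-7]\\d{8}|92233720368[0-4]\\d{7}|922337203685[0-3]\\d{6}"
        + "|9223372036854[0-6]\\d{5}|92233720368547[0-6]\\d{4}|922337203685477[0-4]\\d{3}"
        + "|9223372036854775[0-7]\\d{2}|922337203685477580[0-7]))$");

    // Hypothetical helper: true only for strings the pattern accepts.
    static boolean isLong(String s) {
        return s != null && STRICT_LONG.matcher(s).matches();
    }
}
```

Because of the (?!0) lookahead, inputs with leading zeros such as "007" are rejected even though Long.parseLong would accept them.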

Simply catch the NumberFormatException, unless this case happens very often.
Another way would be to use a pattern which only allows long literals. Such a pattern might be quite complex.
A third way would be to parse the number as a BigInteger first. Then you can compare it to Long.MAX_VALUE and Long.MIN_VALUE to check whether it is within the bounds of long. However, this might be costly as well.
Also note:
Parsing the long is quite fast; it is a very optimized method (that, for example, tries to parse two digits in one step). Applying pattern matching might be even more costly than performing the parsing. The only thing which is slow about the parsing is throwing the NumberFormatException. Thus, simply catching the exception is the best way to go if the exceptional case does not happen too often.
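The third way mentioned above (parse into an arbitrary-precision integer, then range-check) might look like the sketch below. The class and method names are invented for illustration, and the digit check in front is an added assumption so that the BigInteger constructor itself never throws.

```java
import java.math.BigInteger;
import java.util.OptionalLong;
import java.util.regex.Pattern;

final class BigIntegerLongCheck {
    private static final Pattern DIGITS = Pattern.compile("-?\\d+");
    private static final BigInteger MIN = BigInteger.valueOf(Long.MIN_VALUE);
    private static final BigInteger MAX = BigInteger.valueOf(Long.MAX_VALUE);

    // Hypothetical helper: empty result means "not a long".
    static OptionalLong tryParse(String s) {
        if (s == null || !DIGITS.matcher(s).matches()) {
            return OptionalLong.empty(); // not an integer literal at all
        }
        BigInteger value = new BigInteger(s); // shape was checked above
        if (value.compareTo(MIN) < 0 || value.compareTo(MAX) > 0) {
            return OptionalLong.empty(); // outside the range of long
        }
        return OptionalLong.of(value.longValue());
    }
}
```

This avoids both the long regex and the exception, at the price of allocating a BigInteger for every candidate string.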

Related

How to cast from String to BigDecimal with the separator as a comma? [duplicate]

This question already has answers here:
Java BigDecimal can have comma instead dot?
I have a method to set a BigDecimal number that is given as String:
private Client mapClient(Client client){
    ClientRequest clientRequest = new ClientRequest();
    // Code
    clientRequest.setCashAmount(castStringToBigDecimal(client.getCashAmount()));
    // More Code
}
My castStringToBigDecimal method is the following:
public BigDecimal castStringToBigDecimal(String value){
    BigDecimal response = null;
    if(value != null && !value.equals("")){
        value = value.replaceAll("[.]", ",");
        response = new BigDecimal(value);
    }
    return response;
}
An example of the input value is "1554.21"
I need the BigDecimal separator to be a comma, not a dot. But this is giving me an exception.
EDIT
The value is the following:
And the exception is:
java.lang.NumberFormatException: Character , is neither a decimal digit number, decimal point, nor "e" notation exponential mark.

BigDecimal doesn't represent a rendering. In other words, whether to use a comma or a dot as separator is not part of the properties a BigDecimal object has.
Hence, you do not want to call .replaceAll. (And separately, you'd want .replace(".", ",") - replace replaces all, and replaceAll also replaces all and interprets the first arg as a regex, and is therefore needlessly confusing here). Just pass it with the dot.
To render a BigDecimal, don't just sysout it, that will always show a dot and there is nothing you can do about that. toString() is almost never the appropriate tool for the job of rendering data to a user - it's a debugging aid, nothing more. Use e.g. String.format("%f"), specifying the appropriate locale. Or use NumberFormat. The javadoc of BigDecimal explicitly spells this out.
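As a hedged illustration of the rendering advice above: the sketch below keeps the BigDecimal itself dot-based and only chooses the separator when producing text. The class and method names are made up, and Locale.GERMANY is just one example of a locale that uses a decimal comma.

```java
import java.math.BigDecimal;
import java.text.NumberFormat;
import java.util.Locale;

final class RenderBigDecimal {
    // Two decimals, decimal separator taken from the given locale.
    static String render(BigDecimal amount, Locale locale) {
        return String.format(locale, "%.2f", amount);
    }

    // NumberFormat additionally applies the locale's grouping separator.
    static String renderGrouped(BigDecimal amount, Locale locale) {
        return NumberFormat.getNumberInstance(locale).format(amount);
    }
}
```

For example, render(new BigDecimal("1554.21"), Locale.GERMANY) yields "1554,21", while the BigDecimal was still constructed from the dotted string.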
There are various other issues with your code:
"cast" is the technical name for the syntactic construct: (Type) expr; - and this construct does 3 utterly different things, hence using it to describe a task, i.e. use it in a method name, is a very bad idea. In particular, only one of the 3 things it does converts anything, and you clearly use it here in the 'convert something' meaning. This is misleading; only if it's all primitives does the cast operator convert, and BigDecimal isn't primitive. Call it convertTo or whatever you please, not "cast".
BigDecimal is an extremely complicated tool for the job and usually not the right tool if you want to represent financial data. Instead, represent the atomic unit in a long and call the appropriate rendering method whenever you need to show it to a user. For example, for euros, the atomic unit is the eurocent. If something costs €1,50, you'd store "150", in a long. Before you think: But, wait, I want to divide, and then I'd lose half a cent! - yes, well, you can't exactly send your bank a request to transfer half a cent, either. Also, try to divide 4 cents by 3 with a BigDecimal and see what happens. Dividing financial amounts is tricky no matter what you use; BigDecimal isn't a catch-all solution to this problem.

I looked up the source code for Java 8's implementation of BigDecimal (https://github.com/frohoff/jdk8u-dev-jdk/blob/master/src/share/classes/java/math/BigDecimal.java), and the period character is hard-coded in that source as the decimal point. I would not have thought this of a language for which internationalization has been so thoroughly designed in, but there it is, line 466.
Given that the author(s) of BigDecimal failed to take locale into account in such a basic way -- the use of comma instead of period as the decimal separator in Europe is well-known -- I'd have to say you cannot use that BigDecimal constructor on unaltered Strings that are otherwise formatted correctly but which (might) have a comma separator. There are other options -- the previous SO post referred to in one of the comments has one -- but it appears you cannot convert your String this way.
(One minor point -- you are not "casting" anything. That word has a specific meaning in OO programming, and a more specific one in Java, and has very little to do with your question. It is incorrect to refer to conversion as casting.)
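If the incoming text really does use a comma, one possible option is a locale-aware DecimalFormat rather than string replacement. The following is a sketch under assumptions: the class name, the "#,##0.###" pattern and the choice to return null on failure are all illustrative, not taken from the question.

```java
import java.math.BigDecimal;
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.text.ParsePosition;
import java.util.Locale;

final class LocaleBigDecimalParser {
    // Returns null when the text is not a complete number in the given locale.
    static BigDecimal parse(String text, Locale locale) {
        DecimalFormat format =
            new DecimalFormat("#,##0.###", DecimalFormatSymbols.getInstance(locale));
        format.setParseBigDecimal(true); // parse() then yields BigDecimal, not Double/Long
        ParsePosition pos = new ParsePosition(0);
        Number parsed = format.parse(text, pos);
        if (!(parsed instanceof BigDecimal) || pos.getIndex() != text.length()) {
            return null; // nothing parsed, or trailing characters left over
        }
        return (BigDecimal) parsed;
    }
}
```

Note that under Locale.GERMANY the dot is the grouping separator, so input written with a decimal dot should not be sent through this path.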

Why is using a default trash value for a string wrong?

tl;dr;
Why is using
string myVariable = "someInitialValueDifferentThanUserValue99999999999";
as a default value wrong?
Explanation of the situation:
I had a discussion with a colleague at my workplace.
He proposed using some trash value as the default in order to differentiate it from a user value.
A simple example would look like this:
string myVariable = "someInitialValueDifferentThanUserValue99999999999";
...
if(myVariable == "someInitialValueDifferentThanUserValue99999999999")
{
    ...
}
It is quite obvious and intuitive to me that this is wrong.
But I could not give a good argument for this, beyond:
this is not professional.
there is a slight chance that someone would input the same value.
I once read that if you end up in such a situation, your architecture or programming habits are wrong.
Edit:
Thank you for the answers. I found a solution that satisfies me, so I am sharing it with others:
It is good to add a bool guard value that indicates whether the initialization of a specific object has been completed.
Based on this private bool variable I can deduce whether I am dealing with the default empty string "" from my own mechanism (that is, during initialization) or an empty value from the user.
For me, this is a more elegant way.
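A rough Java sketch of the guard-flag idea described in the edit above. The class UserSetting and its members are hypothetical names chosen for illustration.

```java
final class UserSetting {
    private String value = "";   // default contents, never compared against a sentinel
    private boolean initialized; // stays false until the user supplies a value

    void set(String userValue) {
        this.value = userValue;
        this.initialized = true;
    }

    boolean isInitialized() {
        return initialized;
    }

    String get() {
        return value;
    }
}
```

An empty string entered by the user and the untouched default are now distinguishable through isInitialized(), without reserving any magic string.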

Optional
Optional can be used. From the Javadoc of Optional.empty():
Returns an empty Optional instance. No value is present for this Optional.
API Note:
Though it may be tempting to do so, avoid testing if an object is empty by comparing with == against instances returned by Optional.empty(). There is no guarantee that it is a singleton. Instead, use isPresent().
Ref: Optional
Custom escape sequence shared by server and client
Define a default value
When the user enters the default value, escape the user value
Use a marker character
Always define the first character as the marker character
Take decisions based on this character and strip it for any actual comparison
Define clear boundaries for the check, as propagating this character across multiple abstractions can lead to code maintenance issues.
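The Optional suggestion above could be sketched as follows. The class OptionalInput and its methods are illustrative only; the point is that "no input yet" is modelled by the empty Optional rather than by a reserved string.

```java
import java.util.Optional;

final class OptionalInput {
    // An empty Optional means the user has not supplied anything yet.
    private Optional<String> input = Optional.empty();

    void setInput(String userValue) {
        input = Optional.of(userValue);
    }

    boolean hasInput() {
        return input.isPresent(); // use isPresent(), not == against Optional.empty()
    }

    String describe() {
        return input.map(v -> "user entered: '" + v + "'").orElse("no input yet");
    }
}
```

Any string the user types, including the empty one, remains a legal value.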

A small elaboration on "It's not professional":
It's often a bad idea, because
it wastes memory when not a constant (at least in Java - of course, unless you're working with very limited space that's negligible).
Even as a constant it may introduce ambiguity once you have more classes, packages or projects ("Was it NO_INPUT, INPUT_NOT_PROVIDED, INPUT_NONE?")
usually it's a sign that there will be no standardized scope-bound Defined_Marker_Character in the Project Documentation, as suggested in the other answers
it introduces ambiguity about how to decide whether an input has been provided or not
In the end you will either have a lot of varying NO_INPUT constants in different classes or end up with a self-made SthUtility class that defines one constant SthUtility.NO_INPUT and a static method boolean SthUtility.isInputEmpty(...) that compares a given input against that constant, which basically is reinventing Optional. And you will be copy-pasting that one class into every one of your projects.

There is really no need, as you can do the following as of Java 11, which was four releases ago (isEmpty() itself has existed since Java 6; isBlank() is the Java 11 addition).
String value = "";
// true only if length == 0
if (value.isEmpty()) {
    System.out.println("Value is empty");
}
String blankValue = " ";
// true if empty or contains only white space
if (blankValue.isBlank()) {
    System.out.println("Value is blank");
}
I also prefer to limit the use of such marker strings, since they can be found by searching the class file, which might possibly lead to exploitation of the code.

How to store mathematical formula in MS SQL Server DB and interpret it using JAVA?

I have to give the user the option to enter in a text field a mathematical formula and then save it in the DB as a String. That is easy enough, but I also need to retrieve it and use it to do calculations.
For example, assume I allow someone to specify the formula of employee salary calculation which I must save in String format in the DB.
GROSS_PAY = BASIC_SALARY - NO_PAY + TOTAL_OT + ALLOWANCE_TOTAL
Assume that terms such as GROSS_PAY, BASIC_SALARY are known to us and we can make out what they evaluate to. The real issue is we can't predict which combinations of such terms (e.g. GROSS_PAY etc.) and other mathematical operators the user may choose to enter (not just +, -, ×, / but also the radical sign, indicating roots, and powers etc.). So how do we interpret this formula in string format once we have retrieved it from the DB, so that we can do calculations based on the composition of the formula?

Building an expression evaluator is actually fairly easy.
See my SO answer on how to write a parser. With a BNF for the range of expression operators and operands you exactly want, you can follow this process to build a parser for exactly those expressions, directly in Java.
The answer links to a second answer that discusses how to evaluate the expression as you parse it.
So, you read the string from the database, collect the set of possible variables that can occur in the expression, and then parse/evaluate the string. If you don't know the variables in advance (seems like you must), you can parse the expression twice, the first time just to get the variable names.
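To make the parse-and-evaluate idea above concrete, here is a small recursive-descent sketch. It is not the parser from the linked answers: the grammar (+, -, *, /, ^ and parentheses), the class name FormulaEvaluator and the use of a Map for variable values are all assumptions chosen for illustration, and it evaluates only the right-hand side of a formula.

```java
import java.util.Map;

final class FormulaEvaluator {
    private final String src;
    private final Map<String, Double> vars;
    private int pos;

    private FormulaEvaluator(String src, Map<String, Double> vars) {
        this.src = src;
        this.vars = vars;
    }

    static double evaluate(String formula, Map<String, Double> vars) {
        FormulaEvaluator p = new FormulaEvaluator(formula, vars);
        double result = p.expr();
        p.skipSpaces();
        if (p.pos < p.src.length()) {
            throw new IllegalArgumentException("Unexpected '" + p.src.charAt(p.pos) + "' at " + p.pos);
        }
        return result;
    }

    // expr := term (('+' | '-') term)*
    private double expr() {
        double v = term();
        while (true) {
            if (eat('+')) v += term();
            else if (eat('-')) v -= term();
            else return v;
        }
    }

    // term := factor (('*' | '/') factor)*
    private double term() {
        double v = factor();
        while (true) {
            if (eat('*')) v *= factor();
            else if (eat('/')) v /= factor();
            else return v;
        }
    }

    // factor := '-' factor | primary ('^' factor)?   (power is right-associative)
    private double factor() {
        if (eat('-')) return -factor();
        double base = primary();
        return eat('^') ? Math.pow(base, factor()) : base;
    }

    // primary := number | variable | '(' expr ')'
    private double primary() {
        skipSpaces();
        int start = pos;
        if (eat('(')) {
            double v = expr();
            if (!eat(')')) throw new IllegalArgumentException("Missing ')' at " + pos);
            return v;
        }
        while (pos < src.length() && (Character.isLetterOrDigit(src.charAt(pos))
                || src.charAt(pos) == '_' || src.charAt(pos) == '.')) {
            pos++;
        }
        if (start == pos) throw new IllegalArgumentException("Expected a value at " + pos);
        String token = src.substring(start, pos);
        if (Character.isDigit(token.charAt(0))) return Double.parseDouble(token);
        Double value = vars.get(token);
        if (value == null) throw new IllegalArgumentException("Unknown variable " + token);
        return value;
    }

    private void skipSpaces() {
        while (pos < src.length() && Character.isWhitespace(src.charAt(pos))) pos++;
    }

    private boolean eat(char c) {
        skipSpaces();
        if (pos < src.length() && src.charAt(pos) == c) { pos++; return true; }
        return false;
    }
}
```

Roots can be written as fractional powers in this grammar, for example 4 ^ 0.5, and anything that does not parse is reported with its position instead of being executed.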

As described in "Evaluating a math expression given in string form", there is a JavaScript engine in Java which can evaluate a String containing operators.
Hope this helps.

You could build a string representation of a class that effectively wraps your expression and compile it using the system JavaCompiler, although that requires a file system. You can evaluate strings directly using JavaScript or Groovy. In each case, you need to figure out a way to bind variables. One approach would be to use regex to find and replace known variable names with a call to a binding function:
getValue("BASIC_SALARY") - getValue("NO_PAY") + getValue("TOTAL_OT") + getValue("ALLOWANCE_TOTAL")
or
getBASIC_SALARY() - getNO_PAY() + getTOTAL_OT() + getALLOWANCE_TOTAL()
This approach, however, exposes you to all kinds of injection type security bugs; so, it would not be appropriate if security was required. The approach is also weak when it comes to error diagnostics. How will you tell the user why their expression is broken?
An alternative is to use something like ANTLR to generate a parser in Java. It's not too hard and there are a lot of examples. This approach will provide both security (users can't inject malicious code because it won't parse) and diagnostics.
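The regex-based binding step described above might be sketched like this. VariableBinder and bind are made-up names; the rewrite only produces the getValue("...") text and does not evaluate anything, so the injection concerns mentioned above still apply to whatever executes the result.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

final class VariableBinder {
    // Rewrites each known variable name into a getValue("NAME") call.
    static String bind(String formula, List<String> knownNames) {
        if (knownNames.isEmpty()) {
            return formula; // nothing to bind
        }
        String alternatives = knownNames.stream()
                .map(Pattern::quote) // treat names literally, not as regex syntax
                .collect(Collectors.joining("|"));
        Pattern names = Pattern.compile("\\b(" + alternatives + ")\\b");
        return names.matcher(formula).replaceAll("getValue(\"$1\")");
    }
}
```

The word boundaries keep a short name from matching inside a longer identifier, since the underscore counts as a word character.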

Performance of HashMap

I have to process 450 unique strings about 500 million times. Each string has a unique integer identifier. There are two options for me to use.
I can append the identifier to the string, and on arrival of the string I can split it to get the identifier and use it.
I can store the 450 strings in a HashMap<String, Integer>, and on arrival of the string I can query the HashMap to get the identifier.
Can someone suggest which option will be more efficient in terms of processing?

It all depends on the sizes of the strings, etc.
You can do all sorts of things.
You can use a binary search to get the index in a list, and at that index is the identifier.
You can hash just the first 2 characters, rather than the entire string, that would likely be faster than the binary search, assuming the strings have an OK distribution.
You can use the first character, or first two characters, if they're unique, as a "perfect index" into a 256-entry or 64K-entry array that points to the identifier.
Also, if your identifier is numeric, it's better to pre-calculate that, rather than convert it on the fly all the time. Text -> Binary is actually rather expensive (Binary -> Text is worse). So it's probably nice to avoid that if possible.
But it behooves you to work the problem. 1 million of anything at 1 ms each is about 17 minutes of processing. At 500m, every microsecond wasted adds up to 8+ minutes extra of processing. You may well not care, but this just demonstrates that at these scales "every little bit helps".
So, don't take our words for it, test different things to find what gives you the best result for your work set, and then go with that. Also consider excessive object creation, and avoiding that. Normally, I don't give it a second thought. Object creation is fast, but a nanosecond is a nanosecond.
If you're working in Java, and you don't REALLY need Unicode (i.e. you're working with single characters of the 0-255 range), I wouldn't use strings at all. I'd work with raw bytes. Strings are based on Java characters, which are UTF-16. Java Readers convert UTF-8 into UTF-16 every. single. time. 500 million times. Yup! Another few nanoseconds each time. At this scale, 8 microseconds per item adds over an hour to your processing.
So, again, look in all the corners.
Or don't: write it the easy way, fire it up, run it over the weekend and be done with it.

If each String has a unique identifier, then retrieval is O(1) only with a hash map.
I wouldn't suggest the first method, because you are splitting every string, 450*500m times in total, unless your order is one string 500m times and then on to the next. As Will said, appending a numeric value to strings and then retrieving it might seem straightforward, but it is not recommended.
So if your data is static (just the 450 strings), put them in a HashMap and experiment with it. Good luck.

Use HashMap<String, Integer>. Splitting a string to get the identifier is an expensive operation because it involves creating new Strings.

I don't think anyone is going to be able to give you a convincing "right" answer, especially since you haven't provided all of the background / properties of the computation. (For example, the average length of the strings could make a lot of difference.)
So I think your best bet would be to write a benchmark ... using the actual strings that you are going to be processing.
I'd also look for a way to extract and test the "unique integer identifier" that doesn't entail splitting the string.
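A sketch of both options side by side, so they can be dropped into a benchmark on the real data. Everything here is hypothetical: the class name, the '#' separator and the "<text>#<id>" wire format are assumptions, and trailingId ignores integer overflow for brevity.

```java
import java.util.HashMap;
import java.util.Map;

final class IdentifierLookup {
    private final Map<String, Integer> ids = new HashMap<>();

    void register(String s, int id) {
        ids.put(s, id);
    }

    // Option 2: look the string up; returns -1 if it is unknown.
    int lookup(String s) {
        Integer id = ids.get(s);
        return id == null ? -1 : id;
    }

    // Option 1 without split(): read the digits after the last separator in place,
    // so no intermediate String objects are created. Returns -1 on malformed input.
    static int trailingId(String tagged) {
        int sep = tagged.lastIndexOf('#');
        if (sep < 0 || sep == tagged.length() - 1) return -1;
        int id = 0;
        for (int i = sep + 1; i < tagged.length(); i++) {
            int digit = tagged.charAt(i) - '0';
            if (digit < 0 || digit > 9) return -1;
            id = id * 10 + digit;
        }
        return id;
    }
}
```

Which of the two is faster depends on string length and hashing cost, which is why measuring with the actual strings is the safer route.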

Splitting the string should work faster if you write your code well enough. In fact if you already have the int-id, I see no reason to send only the string and maintain a mapping.
Putting into HashMap would need hashing the incoming string every time. So you are basically comparing the performance of the hashing function vs the code you write to append (prepending might be a bit more tricky) on sending end and to parse on receiving end.
OTOH, only 450 strings aren't a big deal, and if you're into it, writing your own hashing algo/function would actually be the most elegant and performant.

Most elegant isNumeric() solution for java

I'm porting a small snippet of PHP code to Java right now, and I was relying on the function is_numeric($x) to determine if $x is a number or not. There doesn't seem to be an equivalent function in Java, and I'm not satisfied with the solutions I've found so far.
I'm leaning toward the regular expression solution found here: http://rosettacode.org/wiki/Determine_if_a_string_is_numeric
Which method should I use and why?

Note that the PHP is_numeric() function will correctly determine that hex and scientific notation are numbers, which the regex approach you link to will not.
One option, especially if you are already using Apache Commons libraries, is to use NumberUtils.isNumber(), from Commons-Lang. It will handle the same cases that the PHP function will handle.

Have you looked into using the StringUtils library?
There's an isNumeric() function which might be what you're looking for.
(Note that "" would be evaluated to true)

It's usually a bad idea to have a number in a String. If you want to use this number then parse it and use it as a numeric. You shouldn't need to "check" if it's a numeric, either you want to use it as a numeric or not.
If you need to convert it, then you can use any parser from Integer.parseInt(String) to new BigDecimal(String)
If you just need to check that the content can be seen as a numeric then you can get away with regular expressions.
And don't use the parseInt if your string can contain a float.

Optionally you can use a regular expression as well.
if (theString.matches("((-|\\+)?[0-9]+(\\.[0-9]+)?)+"))
    return true;
return false;
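A possible variant of the check above, written as a reusable helper with the pattern compiled once. The names are illustrative. It deliberately drops the outer repetition, because the pattern as posted would also accept concatenations such as "1+2"; whether that stricter behaviour is wanted is an assumption.

```java
import java.util.regex.Pattern;

final class Numeric {
    // Optional sign, digits, optional fractional part; compiled once and reused.
    private static final Pattern DECIMAL = Pattern.compile("[-+]?[0-9]+(\\.[0-9]+)?");

    static boolean isNumeric(String s) {
        return s != null && DECIMAL.matcher(s).matches();
    }
}
```

Like the original, this does not recognise hex or scientific notation.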

Did you try Integer.parseInt()? (I'm not sure of the method name, but the Integer class has a method that creates an Integer object from strings). Or if you need to handle non-integer numbers, similar methods are available for Double objects as well. If these fail, an exception is thrown.
If you need to parse very large numbers (larger than int/double), and don't need the exact value, then a simple regex based method might be sufficient.

In a strongly typed language, a generic isNumeric(String num) method is not very useful. 13214384348934918434441 is numeric, but won't fit in most types. Many of those where it does fit won't return the same value.
As Colin has noted, carrying numbers in Strings within the application is not recommended. The isNumeric function should only be applicable to input data on interface methods. These should have a more precise definition than isNumeric. Others have provided various solutions. Regular expressions can be used to test a number of conditions at once, including String length.

Just use
if ((x instanceof Number)
        // if checking for a parsable number as well
        || (x instanceof String && ((String) x).matches("((-|\\+)?[0-9]+(\\.[0-9]+)?)+"))
) {
    ...
}
// All numeric types, including BigDecimal, extend Number
