BigDecimal.floatValue versus Float.valueOf - java

Has anybody ever come across this:
System.out.println("value of: " + Float.valueOf("3.0f")); // ok, prints 3.0
System.out.println(new BigDecimal("3.0f").floatValue()); // NumberFormatException
I would argue that the lack of consistency here is a bug, where BigDecimal(String) doesn't follow the same spec as Float.valueOf() (I checked the JDK doc).
I'm using a library that forces me to go through BigDecimal, but it can happen that I have to send "3.0f" there. Is there a known workaround? (The BigDecimal call sits inside the library, so I can't change it.)

The second example would never work since the documentation doesn't mention anything concerning an f in the String:
The String representation consists of an optional sign, '+' ('\u002B') or '-' ('\u002D'), followed by a sequence of zero or more decimal digits ("the integer"), optionally followed by a fraction, optionally followed by an exponent...
A workaround could be simply stripping the f off of the String. It should be valid then.
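A minimal sketch of that workaround, assuming the string can be cleaned up before it is handed to the library. The class and method names are hypothetical, and only a single trailing f, F, d or D is handled:

```java
import java.math.BigDecimal;

class FloatSuffix {
    // Hypothetical helper: drops one trailing f/F/d/D type suffix, which
    // Float.valueOf accepts but the BigDecimal(String) constructor rejects.
    static String stripTypeSuffix(String s) {
        String t = s.trim();
        if (!t.isEmpty()) {
            char last = t.charAt(t.length() - 1);
            if (last == 'f' || last == 'F' || last == 'd' || last == 'D') {
                return t.substring(0, t.length() - 1);
            }
        }
        return t;
    }

    public static void main(String[] args) {
        String cleaned = stripTypeSuffix("3.0f");
        // Stands in for the BigDecimal call that happens inside the library.
        System.out.println(new BigDecimal(cleaned).floatValue()); // prints 3.0
    }
}
```

Float.valueOf accepts other forms that BigDecimal does not (for example "NaN" or hexadecimal floating-point literals). This sketch does not try to cover those.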

BigDecimal has its own documentation. As per the BigDecimal javadoc:
this constructor is compatible with the values returned by
Float.toString(float) and Double.toString(double).
It doesn't mention anything about Float.valueOf().

Related

Stuck with Reverse Polish Notation

I have a homework assignment: when we input a math expression like -(2 + 3) * 1/5, the output should be -1. After doing some research, I found that the RPN algorithm is the way to solve this problem. So I converted the expression from infix to postfix. But the problem is that I can't identify the operands in some situations, like this:
Input: 11+((10-2)*6)+7
Infix-to-Postfix-----------
Output: 11102-6*+7+
There is no whitespace between "11" and "10", or between "10" and "2", so I can't determine each operand correctly.
Because my output (postfix) is a string, I have no idea how to solve this problem. Does anyone have any ideas?
You described the problem -- and obvious solution -- in your posting: the postfix output you chose destroys critical information from the original expression. The obvious solution is that you have to change your postfix routine to preserve that information.
The particular problem is that you can no longer parse a string of digits into the original integers. The obvious solution is to retain or insert a unique separator. When you emit (output) an integer, add some sort of punctuation. Since RPN uses only numbers and a handful of operators, choose something easy for you to detect and to read yourself: a space, comma, or anything else that would work for you.
For instance, if you use a simple space, then you'd have RPN format as
11 10 2 -6 *+7 +
When you read this in your RPN evaluator, use the separator as a "push integer" signal (or operator).
Note that I have used this separator as a terminal character on every integer, not merely a separator between consecutive integers. Making it a terminal simplifies both output processing and input parsing. Deciding whether or not to add that symbol depends on only one token (the integer), rather than making it conditional upon two adjacent tokens (requiring a small amount of context status).
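As a rough illustration of reading that format back, here is a minimal evaluator sketch. It assumes non-negative integer literals, the four binary operators + - * / with integer division, and no unary minus. The class and method names are made up for the example:

```java
import java.util.ArrayDeque;
import java.util.Deque;

class RpnSketch {
    private static boolean isAsciiDigit(char c) {
        return c >= '0' && c <= '9';
    }

    // Evaluates postfix text in which every integer is terminated by a space.
    static long evaluate(String postfix) {
        Deque<Long> stack = new ArrayDeque<>();
        int i = 0;
        while (i < postfix.length()) {
            char c = postfix.charAt(i);
            if (isAsciiDigit(c)) {
                // Consume the whole run of digits as one operand.
                long value = 0;
                while (i < postfix.length() && isAsciiDigit(postfix.charAt(i))) {
                    value = value * 10 + (postfix.charAt(i) - '0');
                    i++;
                }
                stack.push(value);
                continue; // the terminating space is skipped on the next pass
            }
            if (c != ' ') {
                long right = stack.pop();
                long left = stack.pop();
                switch (c) {
                    case '+': stack.push(left + right); break;
                    case '-': stack.push(left - right); break;
                    case '*': stack.push(left * right); break;
                    case '/': stack.push(left / right); break;
                    default: throw new IllegalArgumentException("Unexpected token: " + c);
                }
            }
            i++;
        }
        return stack.pop();
    }

    public static void main(String[] args) {
        System.out.println(evaluate("11 10 2 -6 *+7 +")); // prints 66
    }
}
```

Because every integer ends with a space, the evaluator never has to guess where one operand stops and the next begins.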

Java Character: Is there a value for "Not A Character"?

In Java, for Double, we have a value for NaN (Not A Number).
Now, for Character, do we have a similar equivalent for "Not A Character"?
If the answer is no, then I think a safe substitute may be Character.MIN_VALUE (which is of type char and has value \u0000). Do you think this substitute is safe enough? Or do you have another suggestion?
In mathematics, there is a concept of "not a number" - 0 divided by 0 has no defined value. Since this concept exists, there is NaN for the double type (0.0 / 0.0 evaluates to NaN, whereas 5.0 / 0.0 evaluates to Infinity).
A character set is an abstract mapping from numbers to characters. The idea of "not a character" doesn't really exist, since the charset in use can vary (UTF-8, UTF-16, etc.).
Think of it this way. If I ask you, "what is 0 divided by 0?", you would say it's "not a number". But we do have a defined way to represent the value, even though it's not a number. If I draw a random squiggle and ask you, "what letter is this?", you would say "it's not a letter". But we don't have a way to actually represent that squiggle outside of what I just drew. There's no real way to communicate the "non-character" I've just drawn, but there is a way to communicate the "non-number" of 0 divided by 0.
\u0000 is the null character, which is still a character. What exactly are you trying to achieve? Depending on your goal \u0000 may suffice.
The "not-a-number" concept does not really belong to Java; rather, Java defines double as being IEEE 754 double precision floating-point numbers, which have that concept. (That said, if I recall correctly, Java does specify some details about NaN that IEEE 754 leaves open to implementations.)
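A small sketch of how those IEEE 754 rules show up in Java double arithmetic. The classify helper is hypothetical and exists only to label the results:

```java
class NaNSketch {
    // Hypothetical helper that labels a double as NaN, infinite, or finite.
    static String classify(double x) {
        if (Double.isNaN(x)) {
            return "NaN";
        }
        if (Double.isInfinite(x)) {
            return "infinite";
        }
        return "finite";
    }

    public static void main(String[] args) {
        double zeroByZero = 0.0 / 0.0;
        double fiveByZero = 5.0 / 0.0;
        System.out.println(classify(zeroByZero));     // prints NaN
        System.out.println(classify(fiveByZero));     // prints infinite
        // NaN compares unequal to everything, including itself.
        System.out.println(zeroByZero == zeroByZero); // prints false
    }
}
```

Note that this only applies to floating-point types; dividing an int by zero throws an ArithmeticException instead.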
The analogous standard for Java char is Unicode: Java defines char as being UTF-16 code units.
Unicode does have various reserved-undefined characters that you could use; for example, U+FFFF ('\uFFFF') will never be a character. Alternatively, you could use U+FFFD ('\uFFFD'), which is a character, but is specifically the "replacement character" suitable for replacing garbage or invalid characters.
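One way this could look in practice, as a sketch only: the NO_CHAR constant and the firstCharOrSentinel method are hypothetical names, and the approach assumes '\uFFFF' never needs to appear as real data in your input.

```java
class NoCharSketch {
    // Hypothetical sentinel: U+FFFF is a Unicode noncharacter.
    static final char NO_CHAR = '\uFFFF';

    // Returns the first char of s, or the sentinel when there is none.
    static char firstCharOrSentinel(String s) {
        return (s == null || s.isEmpty()) ? NO_CHAR : s.charAt(0);
    }

    public static void main(String[] args) {
        System.out.println(firstCharOrSentinel("") == NO_CHAR); // prints true
        System.out.println(firstCharOrSentinel("abc"));         // prints a
        // U+FFFF is unassigned, while U+0000 is a defined control character.
        System.out.println(Character.isDefined(NO_CHAR));       // prints false
        System.out.println(Character.isDefined('\u0000'));      // prints true
    }
}
```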
Depends what you're trying to do. If you're trying to represent the lack of a character you could do
Optional<Character> noCharacter = Optional.empty();
You could check whether the character's code lies between 'a' and 'z' or between 'A' and 'Z'. Anything outside those ranges would qualify as "not a character", if by character you mean a letter of the alphabet. You could extend it to symbols like question mark, full stop, comma etc., but if you want to go further than ASCII territory, I think it gets out of hand.
One other approach would be to check if something is a number. If it's not, you could check if it's a whitespace character; if it's not that either, everything else qualifies as a character, and you get your answer.
It's a long discussion IMO, because answers vary, depending on your view on what's a character.

Why does DecimalFormat ".#" and "0.#" have different results on 23.0?

Why does java.text.DecimalFormat evaluate the following results:
new DecimalFormat("0.#").format(23.0) // result: "23"
new DecimalFormat(".#").format(23.0) // result: "23.0"
I would have expected the result to be 23 in both cases, because special character # omits zeros. How does the leading special character 0 affect the fraction part? (Tried to match/understand it with the BNF given in javadoc, but failed to do so.)
The second format seems to be invalid according to the JavaDoc, but somehow it parses without error anyway.
Pattern:
    PositivePattern
    PositivePattern ; NegativePattern
PositivePattern:
    Prefix(opt) Number Suffix(opt)
NegativePattern:
    Prefix(opt) Number Suffix(opt)
Prefix:
    any Unicode characters except \uFFFE, \uFFFF, and special characters
Suffix:
    any Unicode characters except \uFFFE, \uFFFF, and special characters
Number:
    Integer Exponent(opt)
    Integer . Fraction Exponent(opt)
Integer:
    MinimumInteger
    #
    # Integer
    # , Integer
MinimumInteger:
    0
    0 MinimumInteger
    0 , MinimumInteger
Fraction:
    MinimumFraction(opt) OptionalFraction(opt)
MinimumFraction:
    0 MinimumFraction(opt)
OptionalFraction:
    # OptionalFraction(opt)
Exponent:
    E MinimumExponent
MinimumExponent:
    0 MinimumExponent(opt)
In this case I'd expect the behaviour of the formatter to be undefined. That is, it may produce any old thing and we can't rely on that being consistent or meaningful in any way. So, I don't know why you're getting the 23.0, but you can assume that it's nonsense that you should avoid in your code.
Update:
I've just run a debugger through Java 7's DecimalFormat library. The code not only explicitly says that '.#' is allowed, there is a comment in there (java.text.DecimalFormat:2582-2593) that says it's allowed, and an implementation that allows it (line 2597). This seems to be in violation of the documented BNF for the pattern.
Given that this is not documented behaviour, you really shouldn't rely on it as it's liable to change between versions of Java or even library implementations.
The following source comment explains the rather unintuitive handling of ".#". Lines 3383-3385 in my DecimalFormat.java file (JDK 8) have the following comment:
// Handle patterns with no '0' pattern character. These patterns
// are legal, but must be interpreted. "##.###" -> "#0.###".
// ".###" -> ".0##".
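It seems the developers chose to interpret ".#" as ".0" (following the same rule that turns ".###" into ".0##"), instead of what you expected ("0.#"). That forces one mandatory fraction digit, which is why 23.0 is printed as "23.0".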
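One way to observe this without reading the JDK source is to ask each formatter for its digit limits. The sketch below assumes the JDK behaviour reported in this thread; since the handling of ".#" is not part of the documented grammar, other versions or implementations could differ. The class name and the of helper are made up, and the symbols are pinned to Locale.ROOT so the decimal separator is always '.':

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

class PatternSketch {
    // Hypothetical helper: builds a formatter with locale-independent symbols.
    static DecimalFormat of(String pattern) {
        return new DecimalFormat(pattern, DecimalFormatSymbols.getInstance(Locale.ROOT));
    }

    public static void main(String[] args) {
        for (String pattern : new String[] {"0.#", ".#"}) {
            DecimalFormat f = of(pattern);
            System.out.println(pattern
                    + " minFraction=" + f.getMinimumFractionDigits()
                    + " maxFraction=" + f.getMaximumFractionDigits()
                    + " -> " + f.format(23.0));
        }
        // Reported behaviour:
        // 0.# minFraction=0 maxFraction=1 -> 23
        // .#  minFraction=1 maxFraction=1 -> 23.0
    }
}
```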

Given a string in Java, just take the first X letters

Is there something like a C# Substring for Java? I'm creating a mobile application for Blackberry device and due to screen constraints I can only afford to show 13 letters plus three dots for an ellipsis.
Any suggestion on how to accomplish this?
I need bare bones Java and not some fancy trick because I doubt a mobile device has access to a complete framework. At least in my experience working with Java ME a year ago.
You can do exactly what you want with String.substring().
String str = "please truncate me after 13 characters!";
if (str.length() > 16) {
    str = str.substring(0, 13) + "...";
}
String foo = someString.substring(0, Math.min(13, someString.length()));
Edit: Just for general reference, as of Guava 16.0 you can do:
String truncated = Ascii.truncate(string, 16, "...");
to truncate at a max length of 16 characters with an ellipsis.
Aside
Note, though, that truncating a string for display by character isn't a good system for anything where i18n might need to be considered. There are (at least) a few different issues with it:
You may want to take word boundaries and/or whitespace into account to avoid truncating at an awkward place.
Splitting surrogate pairs (though this can be avoided just by checking if the character you want to truncate at is the first of a surrogate pair).
Splitting a character and a combining character that follows it (e.g. an e followed by a combining character that puts an accent on that e.)
The appearance of a character may change depending on the character that follows it in certain languages, so just truncating at that character will produce something that doesn't even look like the original.
For these reasons (and others), my understanding is that best practice for truncation for display in a UI is to actually fade out the rendering of the text at the correct point on the screen rather than truncating the underlying string.
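If you do truncate the string itself, the surrogate-pair issue at least is cheap to guard against. A minimal sketch, assuming a hypothetical ellipsize helper that keeps at most `keep` chars before the three dots (it does nothing about combining characters or word boundaries):

```java
class TruncateSketch {
    // Hypothetical helper: keeps at most `keep` chars, then appends "...".
    // Strings that already fit in keep + 3 chars are returned unchanged.
    static String ellipsize(String s, int keep) {
        if (keep < 1 || s.length() <= keep + 3) {
            return s;
        }
        int end = keep;
        // Do not cut between the two halves of a surrogate pair.
        if (Character.isHighSurrogate(s.charAt(end - 1))) {
            end--;
        }
        return s.substring(0, end) + "...";
    }

    public static void main(String[] args) {
        // prints please trunca...
        System.out.println(ellipsize("please truncate me after 13 characters!", 13));
        // prints short text (unchanged)
        System.out.println(ellipsize("short text", 13));
    }
}
```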
Whenever there is some operation that you would think is a very common thing to do, yet the Java API requires you to check bounds, catch exceptions, use Math.min(), etc. (i.e. requires more work than you would expect), check Apache's commons-lang. It's almost always there in a more concise format. In this case, you would use StringUtils#substring, which does the error case handling for you. Here's what its javadoc says:
Gets a substring from the specified String avoiding exceptions.
A negative start position can be used to start n characters from the end of the String.
A null String will return null. An empty ("") String will return "".
StringUtils.substring(null, *) = null
StringUtils.substring("", *) = ""
StringUtils.substring("abc", 0) = "abc"
StringUtils.substring("abc", 2) = "c"
StringUtils.substring("abc", 4) = ""
StringUtils.substring("abc", -2) = "bc"
StringUtils.substring("abc", -4) = "abc"
String str = "This is Mobile application.";
System.out.println(str.subSequence(0, 13)+"...");

What is the effect of "*" in regular expressions?

My Java source code:
String result = "B123".replaceAll("B*","e");
System.out.println(result);
The output is: ee1e2e3e.
Why?
'*' means zero or more matches of the previous character. So each empty string will be replaced with an "e".
You probably want to use '+' instead:
replaceAll("B+", "e")
You want this for your pattern:
B+
And your code would be:
String result = "B123".replaceAll("B+","e");
System.out.println(result);
The "*" matches "zero or more" - and "zero" includes the nothing that's before the B, as well as between all the other characters.
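To see where those empty matches fall, you can list every match the regex engine finds. A small sketch; the class and method names are made up for the example:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StarVsPlus {
    // Collects each match as a half-open span "[start,end)".
    static List<String> matchSpans(String regex, String input) {
        List<String> spans = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            spans.add("[" + m.start() + "," + m.end() + ")");
        }
        return spans;
    }

    public static void main(String[] args) {
        // "B*" matches the B, then an empty string at every later position.
        System.out.println(matchSpans("B*", "B123")); // prints [[0,1), [1,1), [2,2), [3,3), [4,4)]
        // "B+" needs at least one B, so only the first span remains.
        System.out.println(matchSpans("B+", "B123")); // prints [[0,1)]
        System.out.println("B123".replaceAll("B+", "e")); // prints e123
    }
}
```

Five matches for "B*" means five replacements, which is exactly the five e's in ee1e2e3e.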
I spent over a month working at a big tech company fixing a bug with * (splat!) in regular expressions. We maintained a little-known UNIX OS. My head nearly exploded because * also matches ZERO occurrences of a character. Talk about a hard bug to understand through your own reproductions. We were double substituting in some cases. I couldn't figure out why the code was wrong, but I was able to add code that caught the special (wrong) case, prevented double subbing, and didn't break any of the utilities that included it (including sed and awk). I was proud to have fixed this bug, but as already mentioned:
For god's sake, just use +!
