Efficient Parsing of Byte Array to Number


Rather than parsing a byte array to an ASCII string then converting the string to say an integer, it should be more efficient to parse the byte array directly to an integer.
byte[] token = "24000".getBytes(Charset.forName("US-ASCII"));
The following code can do this:
int n = 0;
for (byte b : token)
n = 10*n + (b-'0');
Versus the common approach:
int n = Integer.parseInt(new String(token));
Ref: Dave's answer here >> Converting US-ASCII encoded byte to integer and back
Is there a comprehensive solution that skips String creation and goes straight to result?
Please stop marking the question for closing due to this question:
Convert a byte array to integer in java and vice versa
It deals with non-encoded bytes.
It does not answer my question.

The Java library doesn't seem to have a dedicated tool for the job, but it sure has enough tools to write one yourself.
In my opinion, if you're worried about performance because turning byte arrays into ints is a bottleneck in your code, then I suggest writing your own solution based on the piece of code you provided. If it isn't a bottleneck, just use parseInt for readability.
In any case, if Java had a tool for doing this, it would use pretty much the same code under the hood. That's pretty much what Integer.parseInt() does (except it covers other bases, negative numbers, and is safer):
public static int parseInt(String s, int radix)
throws NumberFormatException
{
/*
* WARNING: This method may be invoked early during VM initialization
* before IntegerCache is initialized. Care must be taken to not use
* the valueOf method.
*/
if (s == null) {
throw new NumberFormatException("null");
}
if (radix < Character.MIN_RADIX) {
throw new NumberFormatException("radix " + radix +
" less than Character.MIN_RADIX");
}
if (radix > Character.MAX_RADIX) {
throw new NumberFormatException("radix " + radix +
" greater than Character.MAX_RADIX");
}
int result = 0;
boolean negative = false;
int i = 0, len = s.length();
int limit = -Integer.MAX_VALUE;
int multmin;
int digit;
if (len > 0) {
char firstChar = s.charAt(0);
if (firstChar < '0') { // Possible leading "+" or "-"
if (firstChar == '-') {
negative = true;
limit = Integer.MIN_VALUE;
} else if (firstChar != '+')
throw NumberFormatException.forInputString(s);
if (len == 1) // Cannot have lone "+" or "-"
throw NumberFormatException.forInputString(s);
i++;
}
multmin = limit / radix;
while (i < len) {
// Accumulating negatively avoids surprises near MAX_VALUE
digit = Character.digit(s.charAt(i++),radix);
if (digit < 0) {
throw NumberFormatException.forInputString(s);
}
if (result < multmin) {
throw NumberFormatException.forInputString(s);
}
result *= radix;
if (result < limit + digit) {
throw NumberFormatException.forInputString(s);
}
result -= digit;
}
} else {
throw NumberFormatException.forInputString(s);
}
return negative ? result : -result;
}
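As a minimal sketch of what such a hand-written tool could look like, the following applies the same negative-accumulation and overflow checks as the parseInt source above directly to US-ASCII bytes, with no intermediate String. The class name AsciiIntParser and the exception messages are made up for illustration, and only radix 10 is handled.

```java
import java.nio.charset.StandardCharsets;

final class AsciiIntParser {
    /** Parses an optionally signed decimal int from US-ASCII bytes (hypothetical helper). */
    static int parseInt(byte[] token) {
        if (token == null || token.length == 0) {
            throw new NumberFormatException("empty input");
        }
        int i = 0;
        boolean negative = false;
        int limit = -Integer.MAX_VALUE;
        if (token[0] == '-' || token[0] == '+') {
            if (token[0] == '-') {
                negative = true;
                limit = Integer.MIN_VALUE;
            }
            if (token.length == 1) {
                throw new NumberFormatException("lone sign");
            }
            i = 1;
        }
        int multmin = limit / 10;
        int result = 0; // accumulated as a non-positive number, as in Integer.parseInt
        for (; i < token.length; i++) {
            int digit = token[i] - '0';
            if (digit < 0 || digit > 9) {
                throw new NumberFormatException("not a digit at index " + i);
            }
            if (result < multmin) {
                throw new NumberFormatException("out of int range");
            }
            result *= 10;
            if (result < limit + digit) {
                throw new NumberFormatException("out of int range");
            }
            result -= digit;
        }
        return negative ? result : -result;
    }

    public static void main(String[] args) {
        byte[] token = "24000".getBytes(StandardCharsets.US_ASCII);
        System.out.println(parseInt(token)); // prints 24000
    }
}
```

Whether this is measurably faster than Integer.parseInt(new String(token)) depends on the workload, so it is worth benchmarking before adopting it.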

Related

How can I loop through all the characters that are 0 in a given string?

I'm trying to remove trailing zeroes from an integer and here is my code so far.
import java.math.BigInteger;
public class newuhu {
public static int numTrailingZeros(int s) {
BigInteger J = BigInteger.valueOf(s);
String sb = J.toString();
String Y = "";
while (sb.length() > 0 && sb.charAt(sb.length() - 1) == '0') {
sb.replaceAll("0"," ");
}
return Integer.parseInt(Y);
}
Note: I turned my int into a Biginteger because I've been warned that some inputs may look like 20!, which is 2.432902e+18
However, my IntelliJ debugging tool tells me that variable sb isn't in the loop. So, I'm trying to understand what must be done to make sure sb is in the loop.
Please understand that I'm a beginner in Java so, I'm trying to learn something new.

replaceAll replaces every occurrence of the pattern with the replacement you give it (here, a space), so you don't need a loop at all. Also, since you're concerned about overflow, you should take a BigInteger as the parameter, not an int (an int won't fit anything close to 20!). There's another issue with your code: you said you want to remove trailing zeros, but right now you would replace every 0 with a blank character. You could use something like https://commons.apache.org/proper/commons-lang/apidocs/org/apache/commons/lang3/StringUtils.html, or anchor the regular expression to the end of the string:
import java.math.BigInteger;
public class newuhu {
    public static int numTrailingZeros(BigInteger s) {
        String sb = s.toString();
        // "0+$" matches only the run of zeros at the very end of the string
        return Integer.parseInt(sb.replaceAll("0+$", "")); // consider returning something else if you're working with BigInteger
    }
}

Keep in mind that calling BigInteger.valueOf(int) has no effect here, since a number too big for an int can never be stored in an int in the first place. Also note that 20! does not fit in an int; it does fit in a long.
public static String trimTrailingZeros(String source) {
for (int i = source.length() - 1; i >= 0; --i) { // walk backwards from the last character
char c = source.charAt(i);
if (c != '0') {
return source.substring(0, i + 1);
}
}
return ""; // or return "0";
}
Or, if you prefer BigInteger:
public static BigInteger trimTrailingZeros(BigInteger num) {
while (num.signum() != 0 && num.remainder(BigInteger.TEN).signum() == 0) { // stop at zero to avoid looping forever
num = num.divide(BigInteger.TEN);
}
return num;
}
This should be fast as you only create one string (via substring).

(First: by convention, the names of variables and fields should start with a lowercase letter, unless they are constants.)
It should be
sb = sb.replaceAll(...)
as sb is not changed by replaceAll; only the replaced value is returned. String values are immutable and always remain the same, so you can assign one variable to another, and changing the value of either variable will never influence the other.
Then it should be:
sb = sb.replaceFirst("0$", "");
replaceAll would replace every 0, like "23043500" to "23435".
replaceFirst takes a regular expression: here the char '0' followed by $, which matches the end of the string.
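The difference between these calls can be seen in a short sketch. The class and method names below are made up for illustration; only String.replaceAll and String.replaceFirst come from the JDK.

```java
final class TrailingZeroRegexDemo {
    /** Removes a single trailing '0', if there is one. */
    static String stripOneTrailingZero(String s) {
        return s.replaceFirst("0$", "");
    }

    /** Removes the whole run of trailing zeros in one call. */
    static String stripAllTrailingZeros(String s) {
        return s.replaceAll("0+$", "");
    }

    public static void main(String[] args) {
        String sb = "23043500";
        sb.replaceAll("0", "");                         // return value discarded, sb is unchanged
        System.out.println(sb);                         // prints 23043500
        System.out.println(sb.replaceAll("0", ""));     // prints 23435 (every zero removed)
        System.out.println(stripOneTrailingZero(sb));   // prints 2304350
        System.out.println(stripAllTrailingZeros(sb));  // prints 230435
    }
}
```

Note that stripAllTrailingZeros("0") returns an empty string, so a caller that passes the result to Integer.parseInt needs to handle that case.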
Overflow on the input number is not possible, as you are already passing an int.
public static int numTrailingZeros(int n) {
while (n != 0 && n % 10 == 0) {
n /= 10;
}
return n;
}
public static int numTrailingZeros(String n) {
n = n.replaceAll("0+$", ""); // strip only the trailing run of zeros
if (n.isEmpty() || n.equals("-")) { // "-0" for pessimists?
n = "0";
}
return Integer.parseInt(n);
}
% is the modulo operator, the remainder of an integer division 147 % 10 == 7.
The name is misleading, or you are calculating something different.
public static int numTrailingZeros(int n) {
int trailingZeros = 0;
while (n != 0 && n % 10 == 0) {
n /= 10;
++trailingZeros ;
}
return trailingZeros ;
}

The problem here is that sb.replaceAll("0","") won't do anything. You're throwing away the return value that contains your replaced string. See here.
What you probably want is something like this:
while (sb.length() > 0 && sb.charAt(sb.length() - 1) == '0') {
sb = sb.replaceAll("0"," ");
}
I'm not sure you need a while loop, though. replaceAll will replace all of the zeros with spaces in a single call.

Understanding Java Integer.parseInt(String s, int radix) source code

I was looking at the source code for java.lang.Integer's parseInt method.
public static int parseInt(String s, int radix)
throws NumberFormatException
{
/*
* WARNING: This method may be invoked early during VM initialization
* before IntegerCache is initialized. Care must be taken to not use
* the valueOf method.
*/
if (s == null) {
throw new NumberFormatException("s == null");
}
if (radix < Character.MIN_RADIX) {
throw new NumberFormatException("radix " + radix +
" less than Character.MIN_RADIX");
}
if (radix > Character.MAX_RADIX) {
throw new NumberFormatException("radix " + radix +
" greater than Character.MAX_RADIX");
}
int result = 0;
boolean negative = false;
int i = 0, len = s.length();
int limit = -Integer.MAX_VALUE;
int multmin;
int digit;
if (len > 0) {
char firstChar = s.charAt(0);
if (firstChar < '0') { // Possible leading "+" or "-"
if (firstChar == '-') {
negative = true;
limit = Integer.MIN_VALUE;
} else if (firstChar != '+')
throw NumberFormatException.forInputString(s);
if (len == 1) // Cannot have lone "+" or "-"
throw NumberFormatException.forInputString(s);
i++;
}
multmin = limit / radix;
while (i < len) {
// Accumulating negatively avoids surprises near MAX_VALUE
digit = Character.digit(s.charAt(i++),radix);
if (digit < 0) {
throw NumberFormatException.forInputString(s);
}
if (result < multmin) {
throw NumberFormatException.forInputString(s);
}
result *= radix;
if (result < limit + digit) {
throw NumberFormatException.forInputString(s);
}
result -= digit;
}
} else {
throw NumberFormatException.forInputString(s);
}
return negative ? result : -result;
}
I can see that the multmin is somehow being used to detect Integer overflow on both the negative and positive sides. But I am having a hard time understanding how.
I also do not understand why we are keeping the result negative while calculating and then making it positive at the end if it was not detected as a negative number.

This method is designed to throw an exception if s represents an integer that is outside the [Integer.MIN_VALUE, Integer.MAX_VALUE] i.e. [-2147483648, 2147483647] range.
The algorithm performs repeated multiplication and addition which could eventually lead to overflow. The algorithm avoids overflow by checking the operands in advance.
Checking for overflow
The simplest way of checking if result + digit will cause an overflow without actually adding them is to check:
if (result > limit - digit) // result, limit and digit are positive
The simplest way of checking if result * radix will cause an overflow without actually multiplying them is to check:
if (result > limit / radix) // result, limit and radix are positive
So this explains what limit = Integer.MAX... and multmin = limit / radix do.
Why "accumulating negatively"?
The algorithm separates out the sign and operates on remaining digits (it is easier to deal with one case). One special case it must handle is that of -2147483648; in which case the limit must be set to 2147483648 which is outside the range of Integer.
With negative accumulation, the limit could be set to -2147483648. Note that "if" conditions described above must be adjusted for negative numbers as follows:
if (result < limit + digit) // result and limit are negative
if (result < limit / radix) // result and limit are negative
Here is a rough outline of what happens inside the algorithm at each step:
// parseInt("123", 10)
limit: -2147483647 (-Integer.MAX_VALUE)
multmin: -214748364
result: -1
result: -12
result: -123
// parseInt("2147483648", 10)
limit: -2147483647 (-Integer.MAX_VALUE)
multmin: -214748364
result: -2
result: -21
result: -214
result: -2147
result: -21474
result: -214748
result: -2147483
result: -21474836
result: -214748364
result: Overflow (after multiplication, before subtraction)
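The two positive-side checks described above can be written as small helper methods. This is only an illustrative sketch: the class and method names are invented, and both helpers assume non-negative arguments, which is exactly the restriction that negative accumulation removes.

```java
final class OverflowChecks {
    /** True if a * radix would exceed Integer.MAX_VALUE (assumes a >= 0 and radix > 0). */
    static boolean multiplyWouldOverflow(int a, int radix) {
        return a > Integer.MAX_VALUE / radix;
    }

    /** True if a + digit would exceed Integer.MAX_VALUE (assumes a >= 0 and digit >= 0). */
    static boolean addWouldOverflow(int a, int digit) {
        return a > Integer.MAX_VALUE - digit;
    }

    public static void main(String[] args) {
        System.out.println(multiplyWouldOverflow(214748364, 10)); // false: 2147483640 fits
        System.out.println(multiplyWouldOverflow(214748365, 10)); // true
        System.out.println(addWouldOverflow(2147483640, 7));      // false: 2147483647 fits
        System.out.println(addWouldOverflow(2147483640, 8));      // true
    }
}
```

Since Java 8, Math.multiplyExact and Math.addExact offer a similar guarantee by throwing ArithmeticException on overflow instead of returning a boolean.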

How does multmin work?
multmin is used in below code:
if (result < multmin) {
throw NumberFormatException.forInputString(s);
}
If the current result is less than multmin, the next result must overflow, so an exception is thrown:
if result < multmin,
------> result < limit / radix (because multmin = limit / radix)
------> result * radix < limit
------> result * radix - digit < limit (overflow).
If the current result is greater than or equal to multmin, then result * radix >= limit, so the multiplication does not overflow. The code then checks whether result * radix - digit would overflow:
if (result < limit + digit) {
throw NumberFormatException.forInputString(s);
}
Why use negative?
Because the absolute value of Integer.MIN_VALUE (-2147483648) is greater than Integer.MAX_VALUE (2147483647).
Suppose we had a positive version: when the input number starts with '+', we could set limit to Integer.MAX_VALUE.
But when the input number starts with '-', we could not set limit to 2147483648, because that value overflows an int.
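The asymmetry of the int range is easy to observe directly. The tiny sketch below (class and method names are illustrative only) shows that negating Integer.MIN_VALUE silently wraps back to itself, while every positive int has a representable negative counterpart.

```java
final class MinValueNegation {
    /** Plain int negation; wraps around for Integer.MIN_VALUE. */
    static int negate(int x) {
        return -x;
    }

    public static void main(String[] args) {
        System.out.println(negate(Integer.MAX_VALUE));                      // prints -2147483647
        System.out.println(negate(Integer.MIN_VALUE) == Integer.MIN_VALUE); // prints true
        System.out.println(Math.abs(Integer.MIN_VALUE));                    // prints -2147483648
    }
}
```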

Java integer overflow

I'm totally new to Java and I'm implementing a simple function to convert string to integer on Leetcode.
public int myAtoi(String str) {
if(str.length() == 0){
return 0;
}
str = str.trim();
int n = str.length();
int signal = 0;
if(n == 1 && str.equals("+") || str.equals("-")){
return 0;
}
if(str.charAt(0) == '+'){
signal = 1;
}else if(str.charAt(0) == '-'){
signal = -1;
}
int i = (signal != 0)? 1 : 0;
if(signal == 0){
signal = 1;//default
}
int res = 0;
while(i < n){
char c = str.charAt(i);
if(!Character.isDigit(c)){
return res * signal;
}
//res = res * 10 + c - '0';
if(signal * res > Integer.MAX_VALUE){
return Integer.MAX_VALUE;
}
if(signal * res < Integer.MIN_VALUE){
return Integer.MIN_VALUE;
}
res = res * 10 + c - '0';
++i;
}
return res * signal;
}
I know a Java int has a MAX_VALUE of 2147483647. When my input is 2147483648 the output should be 2147483647, but it is actually -2147483648. I really have no idea what's wrong here. Can anybody help me understand this?

Consider this example
public static void main(String args[]) {
int i=2147483647;
System.out.println("i="+i);
int j=++i;
System.out.println("Now i is="+i);
System.out.println("j="+j);
}
What happens?
output will be :
i=2147483647
Now i is=-2147483648
j=-2147483648
The maximum value of an int is 2,147,483,647 and the minimum value is -2,147,483,648. Here, in j (with the pre-increment of i), we have crossed the maximum limit of an int.
This is exactly what is happening in your case too.
The integer overflows, and when it overflows, the next value is Integer.MIN_VALUE.
Why?
Integer values are represented in binary form, and Java performs binary addition on them. It uses a representation called two's complement, in which the first bit of the number represents its sign. Whenever you add 1 to the largest int (Integer.MAX_VALUE), whose sign bit is 0, the sign bit becomes 1 and the number becomes negative.
So either don't pass input greater than Integer.MAX_VALUE, or add a condition in your code that checks for it before the value overflows.
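To make the sign-bit flip visible, here is a small sketch that prints the 32-bit pattern of a value before and after the overflow. The class name and the bits helper are invented for this example; Integer.toBinaryString is the only JDK call it relies on.

```java
final class TwosComplementDemo {
    /** Returns the 32-bit two's complement pattern of x, padded with leading zeros. */
    static String bits(int x) {
        return String.format("%32s", Integer.toBinaryString(x)).replace(' ', '0');
    }

    public static void main(String[] args) {
        int max = Integer.MAX_VALUE;
        int wrapped = max + 1; // overflows silently
        System.out.println(bits(max));     // sign bit 0, followed by 31 ones
        System.out.println(bits(wrapped)); // sign bit 1, followed by 31 zeros
        System.out.println(wrapped == Integer.MIN_VALUE); // prints true
    }
}
```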

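res can never hold +2147483648, since that value can't be represented as a Java int.
The computation wraps around to the negative number you observe, which accounts for that result. For the same reason, an int expression such as signal * res can never be greater than Integer.MAX_VALUE or less than Integer.MIN_VALUE, so those two checks never fire.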
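One way to get the clamping behaviour the question asks for is to test for overflow before the multiplication, rather than after it. The following is a sketch under that assumption, not the only possible fix; the class name ClampingAtoi is made up, and it only accepts the ASCII digits '0' to '9'.

```java
final class ClampingAtoi {
    /** Parses a leading integer, clamping to the int range instead of overflowing. */
    static int myAtoi(String str) {
        String s = str.trim();
        if (s.isEmpty()) {
            return 0;
        }
        int i = 0;
        boolean negative = false;
        char first = s.charAt(0);
        if (first == '+' || first == '-') {
            negative = (first == '-');
            i = 1;
        }
        int res = 0; // magnitude accumulated so far, always <= Integer.MAX_VALUE
        for (; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < '0' || c > '9') {
                break; // stop at the first non-digit, like the original code
            }
            int digit = c - '0';
            // Would res * 10 + digit exceed Integer.MAX_VALUE? Check before computing it.
            if (res > (Integer.MAX_VALUE - digit) / 10) {
                return negative ? Integer.MIN_VALUE : Integer.MAX_VALUE;
            }
            res = res * 10 + digit;
        }
        return negative ? -res : res;
    }

    public static void main(String[] args) {
        System.out.println(myAtoi("2147483648"));  // prints 2147483647
        System.out.println(myAtoi("-2147483649")); // prints -2147483648
        System.out.println(myAtoi("  -42abc"));    // prints -42
    }
}
```

An alternative with the same effect is to accumulate in a long and clamp once the value leaves the int range.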

How does Integer.toString() works internally?

I found that a similar question has been asked before here : how does Float.toString() and Integer.toString() works?
But that doesn't explain how the function works internally. When I opened the source code of Integer.toString(), it was not understandable for an ordinary junior Java programmer.
Can somebody please explain what happens internally in short description ?
NOTE : This was one of the interview questions that I was asked recently. I had no idea about how to answer such question !

The no-arg call of integer.toString() simply calls the static method Integer.toString(int i) (using the Integer variable's own primitive value), which is implemented as below:
public static String toString(int i) {
if (i == Integer.MIN_VALUE)
return "-2147483648";
int size = (i < 0) ? stringSize(-i) + 1 : stringSize(i);
char[] buf = new char[size];
getChars(i, size, buf);
return new String(0, size, buf);
}
First it checks whether its value is equal to the lowest possible integer, and returns that literal string if so. If not, it uses the stringSize() method of Integer to work out how long the String needs to be, and uses that as the size of an array of characters.
stringSize() implementation below;
static int stringSize(int x) {
for (int i=0; ; i++)
if (x <= sizeTable[i])
return i+1;
}
Once it has a char[] of the correct size, it then populates that array using the getChars() method, implemented below;
static void getChars(int i, int index, char[] buf) {
int q, r;
int charPos = index;
char sign = 0;
if (i < 0) {
sign = '-';
i = -i;
}
// Generate two digits per iteration
while (i >= 65536) {
q = i / 100;
// really: r = i - (q * 100);
r = i - ((q << 6) + (q << 5) + (q << 2));
i = q;
buf [--charPos] = DigitOnes[r];
buf [--charPos] = DigitTens[r];
}
// Fall thru to fast mode for smaller numbers
// assert(i <= 65536, i);
for (;;) {
q = (i * 52429) >>> (16+3);
r = i - ((q << 3) + (q << 1)); // r = i-(q*10) ...
buf [--charPos] = digits [r];
i = q;
if (i == 0) break;
}
if (sign != 0) {
buf [--charPos] = sign;
}
}
Explaining each individual step would take far too long for a Stack Overflow answer. The most pertinent section, however (as pointed out in the comments), is the getChars() method which, complicated bit shifting aside, essentially peels off the decimal digits one at a time from the least significant end and writes them into the buffer from right to left. I am afraid I can't go into any greater detail than that without going beyond my own understanding.
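Stripped of the optimizations, the idea behind getChars() can be sketched as follows. This is a simplified illustration, not the JDK implementation: the class and method names are invented, and it uses plain % and / where the JDK uses lookup tables and shifts. The divideBy10 helper shows the multiply-and-shift trick from the listing above, which matches i / 10 only for the small non-negative values the JDK applies it to.

```java
final class IntToStringSketch {
    /** Converts an int to its decimal string by filling a buffer from the right. */
    static String toDecimalString(int i) {
        if (i == Integer.MIN_VALUE) {
            return "-2147483648"; // -i would overflow, so this case is special
        }
        boolean negative = i < 0;
        if (negative) {
            i = -i;
        }
        char[] buf = new char[11]; // room for a sign and up to 10 digits
        int pos = buf.length;
        do {
            buf[--pos] = (char) ('0' + i % 10); // least significant digit first
            i /= 10;
        } while (i != 0);
        if (negative) {
            buf[--pos] = '-';
        }
        return new String(buf, pos, buf.length - pos);
    }

    /** Division by 10 via multiply and unsigned shift; intended for 0 <= i <= 65536. */
    static int divideBy10(int i) {
        return (i * 52429) >>> (16 + 3);
    }

    public static void main(String[] args) {
        System.out.println(toDecimalString(-12345)); // prints -12345
        System.out.println(divideBy10(65536));       // prints 6553
    }
}
```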

Atoi in Java for negative values

I am writing an Atoi function in Java. It runs fine for +ve integers. But what I want is when I enter a negative integer it should give me an error. So I tried including continue statement in my class Atoi. The class implemented is:
class Atoi {
int atoi(String tmp) {
int result = 0;
for (int i = 0; i < tmp.length(); i++) {
char digit = (char)(tmp.charAt(i) - '0');
if(digit == '-')
continue;
}
else {
result += (digit * Math.pow(10, (tmp.length() - i - 1)));
}
return result;
}
}
But unfortunately it gives me the negative equivalent of the character i.e for -12 it gives me 655312! Help.
EDIT: Suppose I need to check for floating point numbers, what should I do? If I enter 12.1 or 123.2 it should return 12.1 and 123.2 respectively!

Instead of continue you should give an error (throw an exception, return -1 or whatever you mean by "give an error").
If you want to ignore the - you can change the else clause to:
result = digit + result * 10;

Quick fix for the obvious problem: the order of the logic was wrong...
Instead of
char digit = (char)(tmp.charAt(i) - '0');
if(digit=='-')
continue;
try
char origChar=tmp.charAt(i);
if(origChar=='-')
continue;
char digit = (char)(origChar - '0');
But there are two more problems:
it does not negate the value when a '-' character is present!
what if this is the input string: -1-2-3-4-5? The result will be interesting! EDIT: try this input also: 'répa'... Even more interesting result!
Don't forget to test with incorrect inputs too, and as @Klaus suggested, don't hesitate to throw an exception (preferably IllegalArgumentException) with a helpful error message if an incorrect input is given to the function.
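A version that rejects anything other than plain digits could look like the sketch below. It is one possible reading of "give an error" (throwing IllegalArgumentException); the class name StrictAtoi and the messages are made up, and it also rejects values that do not fit in an int.

```java
final class StrictAtoi {
    /** Parses a non-negative decimal integer, rejecting signs and other characters. */
    static int atoi(String tmp) {
        if (tmp == null || tmp.isEmpty()) {
            throw new IllegalArgumentException("empty input");
        }
        int result = 0;
        for (int i = 0; i < tmp.length(); i++) {
            char c = tmp.charAt(i);
            if (c < '0' || c > '9') { // check the character before subtracting '0'
                throw new IllegalArgumentException(
                        "Not a non-negative decimal integer: \"" + tmp + "\"");
            }
            int digit = c - '0';
            if (result > (Integer.MAX_VALUE - digit) / 10) {
                throw new IllegalArgumentException("Out of int range: \"" + tmp + "\"");
            }
            result = result * 10 + digit; // no Math.pow needed
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(atoi("12")); // prints 12
        try {
            atoi("-12");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```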

If this is not being done as a programming exercise, there is a simpler solution:
static int atoi(String tmp)
{
int result = Integer.parseInt(tmp);
if(result >= 0) {
return result;
} else {
throw new IllegalArgumentException("Negative string "+"\"" + tmp + "\"");
}
}
Substitute the appropriate exception or other action in the negative result case. If you want to just ignore '-', as in the posted code, replace the if-then-else with:
return Math.abs(result);
This code also throws an exception for strings like "abc".
More generally, if a library method does not do exactly what you want, it is often easy to use it in a method that modifies its behavior, rather than re-writing it.

You can write code like this, of course, but you need to check that tmp is a valid number.
int atoi(String tmp) {
int result = 0;
int factor = tmp.charAt(0) == '-' ? -1 : 1; // compare char with a char literal, not a String
for (int i = 0; i < tmp.length(); i++) {
if (tmp.charAt(i) < '0' || tmp.charAt(i) > '9')
continue;
char digit = (char)(tmp.charAt(i) - '0');
result += (digit * Math.pow(10, (tmp.length() - i - 1)));
}
return result * factor;
}

if(digit=='-')
With
(char)(tmp.charAt(i)
Your code is assuming there are no -'s
(char)(tmp.charAt(i) - '0');
is an optimization that blindly converts the 'digit' variable to a number.
You need to step through what your code is actually doing: look up an ASCII chart and work through what subtracting '0' does ('0' == 48), so '1' (49) - '0' (48) = 1, etc.

If you don't want to convert negative numbers then simply return 0 whenever you encounter - sign instead of looping further. Put this code before the if-else block.
if(tmp.charAt(i)=='-')
return 0;
