After reading the base64 wiki ...
I'm trying to figure out how's the formula working :
Given a string with length of n , the base64 length will be
Which is : 4*Math.Ceiling(((double)s.Length/3)))
I already know that base64 length must be %4==0 to allow the decoder know what was the original text length.
The max number of padding for a sequence can be = or ==.
wiki :The number of output bytes per input byte is approximately 4 / 3 (33%
overhead)
Question:
How does the information above settle with the output length ?
Each character is used to represent 6 bits (log2(64) = 6).
Therefore 4 chars are used to represent 4 * 6 = 24 bits = 3 bytes.
So you need 4*(n/3) chars to represent n bytes, and this needs to be rounded up to a multiple of 4.
The number of unused padding chars resulting from the rounding up to a multiple of 4 will obviously be 0, 1, 2 or 3.
4 * n / 3 gives unpadded length.
And round up to the nearest multiple of 4 for padding, and as 4 is a power of 2 can use bitwise logical operations.
((4 * n / 3) + 3) & ~3
For reference, the Base64 encoder's length formula is as follows:
As you said, a Base64 encoder given n bytes of data will produce a string of 4n/3 Base64 characters. Put another way, every 3 bytes of data will result in 4 Base64 characters. EDIT: A comment correctly points out that my previous graphic did not account for padding; the correct formula for padding is 4(Ceiling(n/3)).
The Wikipedia article shows exactly how the ASCII string Man encoded into the Base64 string TWFu in its example. The input string is 3 bytes, or 24 bits, in size, so the formula correctly predicts the output will be 4 bytes (or 32 bits) long: TWFu. The process encodes every 6 bits of data into one of the 64 Base64 characters, so the 24-bit input divided by 6 results in 4 Base64 characters.
You ask in a comment what the size of encoding 123456 would be. Keeping in mind that every every character of that string is 1 byte, or 8 bits, in size (assuming ASCII/UTF8 encoding), we are encoding 6 bytes, or 48 bits, of data. According to the equation, we expect the output length to be (6 bytes / 3 bytes) * 4 characters = 8 characters.
Putting 123456 into a Base64 encoder creates MTIzNDU2, which is 8 characters long, just as we expected.
Integers
Generally we don't want to use doubles because we don't want to use the floating point ops, rounding errors etc. They are just not necessary.
For this it is a good idea to remember how to perform the ceiling division: ceil(x / y) in doubles can be written as (x + y - 1) / y (while avoiding negative numbers, but beware of overflow).
Readable
If you go for readability you can of course also program it like this (example in Java, for C you could use macro's, of course):
public static int ceilDiv(int x, int y) {
return (x + y - 1) / y;
}
public static int paddedBase64(int n) {
int blocks = ceilDiv(n, 3);
return blocks * 4;
}
public static int unpaddedBase64(int n) {
int bits = 8 * n;
return ceilDiv(bits, 6);
}
// test only
public static void main(String[] args) {
for (int n = 0; n < 21; n++) {
System.out.println("Base 64 padded: " + paddedBase64(n));
System.out.println("Base 64 unpadded: " + unpaddedBase64(n));
}
}
Inlined
Padded
We know that we need 4 characters blocks at the time for each 3 bytes (or less). So then the formula becomes (for x = n and y = 3):
blocks = (bytes + 3 - 1) / 3
chars = blocks * 4
or combined:
chars = ((bytes + 3 - 1) / 3) * 4
your compiler will optimize out the 3 - 1, so just leave it like this to maintain readability.
Unpadded
Less common is the unpadded variant, for this we remember that each we need a character for each 6 bits, rounded up:
bits = bytes * 8
chars = (bits + 6 - 1) / 6
or combined:
chars = (bytes * 8 + 6 - 1) / 6
we can however still divide by two (if we want to):
chars = (bytes * 4 + 3 - 1) / 3
Unreadable
In case you don't trust your compiler to do the final optimizations for you (or if you want to confuse your colleagues):
Padded
((n + 2) / 3) << 2
Unpadded
((n << 2) | 2) / 3
So there we are, two logical ways of calculation, and we don't need any branches, bit-ops or modulo ops - unless we really want to.
Notes:
Obviously you may need to add 1 to the calculations to include a null termination byte.
For Mime you may need to take care of possible line termination characters and such (look for other answers for that).
(In an attempt to give a succinct yet complete derivation.)
Every input byte has 8 bits, so for n input bytes we get:
n × 8 input bits
Every 6 bits is an output byte, so:
ceil(n × 8 / 6) = ceil(n × 4 / 3) output bytes
This is without padding.
With padding, we round that up to multiple-of-four output bytes:
ceil(ceil(n × 4 / 3) / 4) × 4 = ceil(n × 4 / 3 / 4) × 4 = ceil(n / 3) × 4 output bytes
See Nested Divisions (Wikipedia) for the first equivalence.
Using integer arithmetics, ceil(n / m) can be calculated as (n + m – 1) div m,
hence we get:
(n * 4 + 2) div 3 without padding
(n + 2) div 3 * 4 with padding
For illustration:
n with padding (n + 2) div 3 * 4 without padding (n * 4 + 2) div 3
------------------------------------------------------------------------------
0 0 0
1 AA== 4 AA 2
2 AAA= 4 AAA 3
3 AAAA 4 AAAA 4
4 AAAAAA== 8 AAAAAA 6
5 AAAAAAA= 8 AAAAAAA 7
6 AAAAAAAA 8 AAAAAAAA 8
7 AAAAAAAAAA== 12 AAAAAAAAAA 10
8 AAAAAAAAAAA= 12 AAAAAAAAAAA 11
9 AAAAAAAAAAAA 12 AAAAAAAAAAAA 12
10 AAAAAAAAAAAAAA== 16 AAAAAAAAAAAAAA 14
11 AAAAAAAAAAAAAAA= 16 AAAAAAAAAAAAAAA 15
12 AAAAAAAAAAAAAAAA 16 AAAAAAAAAAAAAAAA 16
Finally, in the case of MIME Base64 encoding, two additional bytes (CR LF) are needed per every 76 output bytes, rounded up or down depending on whether a terminating newline is required.
Here is a function to calculate the original size of an encoded Base 64 file as a String in KB:
private Double calcBase64SizeInKBytes(String base64String) {
Double result = -1.0;
if(StringUtils.isNotEmpty(base64String)) {
Integer padding = 0;
if(base64String.endsWith("==")) {
padding = 2;
}
else {
if (base64String.endsWith("=")) padding = 1;
}
result = (Math.ceil(base64String.length() / 4) * 3 ) - padding;
}
return result / 1000;
}
I think the given answers miss the point of the original question, which is how much space needs to be allocated to fit the base64 encoding for a given binary string of length n bytes.
The answer is (floor(n / 3) + 1) * 4 + 1
This includes padding and a terminating null character. You may not need the floor call if you are doing integer arithmetic.
Including padding, a base64 string requires four bytes for every three-byte chunk of the original string, including any partial chunks. One or two bytes extra at the end of the string will still get converted to four bytes in the base64 string when padding is added. Unless you have a very specific use, it is best to add the padding, usually an equals character. I added an extra byte for a null character in C, because ASCII strings without this are a little dangerous and you'd need to carry the string length separately.
For all people who speak C, take a look at these two macros:
// calculate the size of 'output' buffer required for a 'input' buffer of length x during Base64 encoding operation
#define B64ENCODE_OUT_SAFESIZE(x) ((((x) + 3 - 1)/3) * 4 + 1)
// calculate the size of 'output' buffer required for a 'input' buffer of length x during Base64 decoding operation
#define B64DECODE_OUT_SAFESIZE(x) (((x)*3)/4)
Taken from here.
While everyone else is debating algebraic formulas, I'd rather just use BASE64 itself to tell me:
$ echo "Including padding, a base64 string requires four bytes for every three-byte chunk of the original string, including any partial chunks. One or two bytes extra at the end of the string will still get converted to four bytes in the base64 string when padding is added. Unless you have a very specific use, it is best to add the padding, usually an equals character. I added an extra byte for a null character in C, because ASCII strings without this are a little dangerous and you'd need to carry the string length separately."| wc -c
525
$ echo "Including padding, a base64 string requires four bytes for every three-byte chunk of the original string, including any partial chunks. One or two bytes extra at the end of the string will still get converted to four bytes in the base64 string when padding is added. Unless you have a very specific use, it is best to add the padding, usually an equals character. I added an extra byte for a null character in C, because ASCII strings without this are a little dangerous and you'd need to carry the string length separately." | base64 | wc -c
710
So it seems the formula of 3 bytes being represented by 4 base64 characters seems correct.
I don't see the simplified formula in other responses. The logic is covered but I wanted a most basic form for my embedded use:
Unpadded = ((4 * n) + 2) / 3
Padded = 4 * ((n + 2) / 3)
NOTE: When calculating the unpadded count we round up the integer division i.e. add Divisor-1 which is +2 in this case
Seems to me that the right formula should be:
n64 = 4 * (n / 3) + (n % 3 != 0 ? 4 : 0)
I believe that this one is an exact answer if n%3 not zero, no ?
(n + 3-n%3)
4 * ---------
3
Mathematica version :
SizeB64[n_] := If[Mod[n, 3] == 0, 4 n/3, 4 (n + 3 - Mod[n, 3])/3]
Have fun
GI
Simple implementantion in javascript
function sizeOfBase64String(base64String) {
if (!base64String) return 0;
const padding = (base64String.match(/(=*)$/) || [])[1].length;
return 4 * Math.ceil((base64String.length / 3)) - padding;
}
If there is someone interested in achieve the #Pedro Silva solution in JS, I just ported this same solution for it:
const getBase64Size = (base64) => {
let padding = base64.length
? getBase64Padding(base64)
: 0
return ((Math.ceil(base64.length / 4) * 3 ) - padding) / 1000
}
const getBase64Padding = (base64) => {
return endsWith(base64, '==')
? 2
: 1
}
const endsWith = (str, end) => {
let charsFromEnd = end.length
let extractedEnd = str.slice(-charsFromEnd)
return extractedEnd === end
}
In windows - I wanted to estimate size of mime64 sized buffer, but all precise calculation formula's did not work for me - finally I've ended up with approximate formula like this:
Mine64 string allocation size (approximate)
= (((4 * ((binary buffer size) + 1)) / 3) + 1)
So last +1 - it's used for ascii-zero - last character needs to allocated to store zero ending - but why "binary buffer size" is + 1 - I suspect that there is some mime64 termination character ? Or may be this is some alignment issue.
In my Java application I need to interpret a 32 Bit Fixed Point value. The number format is as follows: The first 15 bits describe the places before the comma/point, the 16th bit represents the sign of the value and the following 16 bits describe the decimal places (1/2,1/4,1/8,1/16,...).
The input is a byte array with four values. The order of the bits in the byte array is little endian.
How can I convert such a number into Java float ?
Just do exactly what it says. Assume x is the 32bit fixed point number as int.
So, put the bits after the point, after the point, and don't use the sign here:
float f = (float)(x & 0x7fff_ffff) / (float)(1 << 16);
Put back the sign:
return x < 0 ? -f : f;
You will lose some precision. A float does not have 31 bits of precision, but your input does. It's easily adapted to doubles though.
Since the sign bit is apparently really in the middle, first get it out:
int sign = x & (1 << 16);
Join the two runs of non-sign bits:
x = (x & 0xFFFF) | ((x >> 1) & 0x7fff0000);
Then do more or less the old thing:
float f = (float)x / (float)(1 << 16);
return sign == 0 ? f : -f;
In case your input is little endian format, use the following approach to generate x:
int x = ByteBuffer.wrap(weirdFixedPoint).order(ByteOrder.LITTLE_ENDIAN).getInt();
where weirdFixedPoint is the byte array containing the 32 bit binary representation.
I'm trying to make a new character/integer. All I know about that char/int is only the first 6 bits. I have a variable called
number
It's quite a large number and it is formed from 24 bits. With this number I want to make use of the toBinaryString method
bits = Integer.toBinaryString(number);
So now I have a variable bits containing the bits from my variable number. At this moment I want to divide this string into 4 so I am left with 4, 6 characters strings which will represent my bits for my integer/character. How would I go about creating a number or a character knowing these bits?
Just to make sure I explain it in every detail let me give you an example:
I have
number = "abc" // 011000010110001001100011 as binary representation
Now I want to create a new integer with the first 6 bits (011000). Another integer with the following 6 bits (010110) and so on...
Why do you want it as a string? It sounds like you really just want bit-shifting operations:
number = ...;
int bottomBits = number & 0x3f;
int middleBits = (number >>> 6) & 0x3f;
int upperBits = (number >>> 12) & 0x3f;
So bottomBits is the least-significant 6 bits, then middleBits, then upperBits (the most-significant bits, so the first 6 bits in your binary string).
I know there are N threads for this question but some people are using different and different methods to convert a byte to int. Is this correct what am I writing? Hex to int or hex to decimal? Which one is the correct?
Anyway, why I'm getting 4864 instead of 19 ?
byte[] buffer = ....
buffer[51] = 0x13;
System.out.println( buffer[51] << 8 );
Is this correct what am I writing?
The code you've posted does implicit conversion of int to String, but that will display it in decimal. It's important to understand that a number isn't in either hex or decimal - it's just a number. The same number can be converted to different textual representations, and that's when the base matters. Likewise you can express the same number with different literals, so these two statements are exactly equivalent:
int x = 16;
int x = 0x10;
Anyway, why I'm getting 4864 instead of 19
Because you're explicitly shifting the value left 8 bits:
buffer[51] << 8
That's basically multiplying by 256, and 19 * 256 is 4864.
you are getting 4864 as a result because 4864 is 0x1300 in hex.
if you are expecting 19(0x13) as result then I guess you are trying to do circular shifting.
you can do that using writing like that,
/*hex 0x13 (19 in decimal) is assigned to buffer[51] as int*/
buffer[51] = 0x13;
System.out.println( Integer.rotateRight(buffer[51], 8));
I am unable to understand the following:
In java,
long l = 130L;
byte b = (byte)l;
If I print the value of b, why do I get -126? What is the bit representation of long l?
A byte is a sequence of 8 bits, which makes 2^8 cases = 256. Half of them represent negative numbers, which is -128 to -1. Then there is the 0, and about the half, 1 to 127 represent the positive numbers.
130 as Int looks like 128 + 2 which is:
0000:0000 1000:0000 (128)
0000:0000 0000:0010 (2)
0000:0000 1000:0010 (130)
However, the Byte has just 8 digits, and the assignment takes just the bits as they are, but just the last ones:
1000:0010
The first bit indicates, it is a negative number. Now how much do you need to add to get to zero? Let's do it stepwise:
1000:0010 x +
0000:0001 1 =
----------------
1000:0011 (x+1)
1000:0011 (x+1) +
0000:0001 1 =
----------------
1000:0100 (x+2)
Lets do bigger steps. Just add 1s where we have zeros, but first we go back to x:
1000:0010 x +
0111:1101 y =
--------------
1111:1111
Now there is the turning point: we add another 1, and get zero (plus overflow)
1111:1111 (x + y) +
0000:0001 1
---------
0000:0000 0
If (x+y) + 1 = 0, x+y = -1. A minus 1 is, interestingly, not just the same as 1 (0000:0001) with a 'negative-flag' set ('1000:0001'), but looks completely different. However, the first position always tells you the sign: 1 always indicates negative.
But what did we add before?
0111:1101 y = ?
It doesn't have a 1 at the first position, so it is a positive value. We know how to deconstruct that?
..f:8421 Position of value (1, 2, 4, 8, 16=f, 32, 64 in opposite direction)
0111:1101 y = ?
..f 84 1 = 1+4+8+16+32+64= 125
And now it's clear: x+125 = -1 => x = -126
You may imagine the values, organized in a circle, with the 0 at the top (high noon) and positive values arranged like on a clock from 0 to 5 (but to 127), and the turning point at the bottom (127 + 1 => -128 [sic!].) Now you can go on clockwise, adding 1 leads to -127, -126, -125, ... -3, -2, -1 (at 11 o'clock) and finally 0 at the top again.
For bigger numbers (small, int, long) take bigger clocks, with the zero always on top, the maximum and minimum always on bottom. But even a byte is much too big, to make a picture, so I made one of a nibble, a half-byte:
You can easily fill the holes in the picture, it's trivial!
Btw.: the whole thing isn't called casting. Casting is only used between Objects. If you have something, which is in real a subtype:
Object o = new String ("casting or not?");
this is just an assignment, since a String is (always) an Object. No casting involved.
String s = (String) o;
This is a casting. To the more specific type. Not every object is a String. There is a small relationship to integer promotion, since every byte can be lossless transformed to long, but not every long to byte. However, even Byte and Long, the Object-types, aren't inherited from each other.
You just don't get a warning, for
byte s = (byte) 42;
long o = s; // no problem, no warning
byte b = (byte) o; // written like casting
Bytes are signed in Java - so the range of values is -128 to 127 inclusive.
The bit pattern for 130 as a long, when simply truncated to 8 bits, is the bit pattern for -126 as a byte.
As another example:
int x = 255;
byte b = (byte) x; // b is now -1
You mean byte b = (byte)l?
Java's types are signed, so bytes allow numbers between -128 and +127.
For beginners to understand:
1 byte = 8 bits
Range (derived from 2's complement no. system) = [-2^(n-1) to 2^(n-1)-1], where n is no. of bits
So range is -128 to 127
Whenever value is incremented more than highest possible +ve value, the flow goes to the lowest possible -ve value.
So after value reaches 127 , flow continues from -128 to -127 to -126
to cover a total space of 130 and the thus o/p is -126