Big Endian Hex Number conversion - java

I have the following 4 bytes as a hex string (38 01 02 00) and the expected output is the decimal 201.38, as if the input bytes were reversed.
If (89 00 00 00) is given, the expected outcome is 0.89.
I don't know the mathematical name of this conversion.
I have tried converting big-endian to little-endian, but the result (00020138) was not what I expected.
I have also tried writing a simple method, but the outcome is still wrong (1492992770):
int htonl(final int value) {
return ByteBuffer
.allocate(4)
.putInt(value).order(ByteOrder.nativeOrder())
.getInt(0);
}

It seems to be BCD encoding (Binary Coded Decimal), where each 4-bit group (nibble) represents one decimal digit.
And the byte order is little-endian: it begins with the least significant byte.
All I see suggests that the decimal point is in a fixed position, but that's just a guess.
So, the result of your first try comes close. Just divide it by 100, and you're done.

Solved it with the help of @Ralf Kleberhoff:
I reversed the hex input first,
then divided the output by 100.
public static String reverse(final String originalHex) {
final int lengthInBytes = originalHex.length() / 2;
final char[] chars = new char[lengthInBytes * 2];
for (int index = 0; index < lengthInBytes; index++) {
final int reversedIndex = lengthInBytes - 1 - index;
chars[reversedIndex * 2] = originalHex.charAt(index * 2);
chars[reversedIndex * 2 + 1] = originalHex.charAt(index * 2 + 1);
}
return new String(chars);
}
final float f = Float.parseFloat(reverse("89000000"));
System.out.println(f / 100);
Given input is 89000000 and result was 0.89 which was expected.
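The same idea (read the bytes from last to first, treat every nibble as one decimal digit, then shift the decimal point two places) can also be done on the raw bytes without going through strings. This is only a sketch: the class and method names are made up for illustration, and the fixed scale of 2 is the same guess as in the answer above.

```java
import java.math.BigDecimal;

// Hypothetical helper, not part of any library.
class LittleEndianBcd {

    // Decodes packed BCD stored least significant byte first.
    // scale = number of digits after the decimal point (assumed to be 2 here).
    static BigDecimal decode(byte[] bytes, int scale) {
        long digits = 0;
        for (int i = bytes.length - 1; i >= 0; i--) {
            int high = (bytes[i] >> 4) & 0x0F;
            int low = bytes[i] & 0x0F;
            if (high > 9 || low > 9) {
                throw new IllegalArgumentException("not a BCD byte at index " + i);
            }
            digits = digits * 100 + high * 10 + low;
        }
        return BigDecimal.valueOf(digits, scale);
    }

    public static void main(String[] args) {
        System.out.println(decode(new byte[] {0x38, 0x01, 0x02, 0x00}, 2)); // 201.38
        System.out.println(decode(new byte[] {(byte) 0x89, 0x00, 0x00, 0x00}, 2)); // 0.89
    }
}
```

Unlike the string-based version, this rejects nibbles above 9, so input that is not actually BCD fails loudly instead of producing a plausible-looking number.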


What is the Base64 size in bytes of a byte array in Java? [duplicate]

After reading the base64 wiki ...
I'm trying to figure out how's the formula working :
Given a string with length n, the base64 length will be 4 * ceil(n / 3),
which is: 4*Math.Ceiling(((double)s.Length/3))
I already know that the base64 length must satisfy %4==0 so the decoder can tell what the original text length was.
The padding for a sequence can be at most = or ==.
wiki: The number of output bytes per input byte is approximately 4 / 3 (33% overhead)
Question:
How does the information above fit with the output length?
Each character is used to represent 6 bits (log2(64) = 6).
Therefore 4 chars are used to represent 4 * 6 = 24 bits = 3 bytes.
So you need 4*(n/3) chars to represent n bytes, and this needs to be rounded up to a multiple of 4.
The number of padding chars resulting from the rounding up to a multiple of 4 will be 0, 1 or 2 (3 never occurs, because a single leftover character would carry only 6 bits, less than one byte).
4 * n / 3, rounded up, gives the unpadded length.
Then round up to the nearest multiple of 4 for padding; as 4 is a power of 2, bitwise operations can be used:
((4 * n / 3) + 3) & ~3
For reference, the Base64 encoder's length formula is as follows:
As you said, a Base64 encoder given n bytes of data will produce a string of 4n/3 Base64 characters. Put another way, every 3 bytes of data will result in 4 Base64 characters. EDIT: A comment correctly points out that my previous graphic did not account for padding; the correct formula for padding is 4(Ceiling(n/3)).
The Wikipedia article shows exactly how the ASCII string Man encoded into the Base64 string TWFu in its example. The input string is 3 bytes, or 24 bits, in size, so the formula correctly predicts the output will be 4 bytes (or 32 bits) long: TWFu. The process encodes every 6 bits of data into one of the 64 Base64 characters, so the 24-bit input divided by 6 results in 4 Base64 characters.
You ask in a comment what the size of encoding 123456 would be. Keeping in mind that every character of that string is 1 byte, or 8 bits, in size (assuming ASCII/UTF-8 encoding), we are encoding 6 bytes, or 48 bits, of data. According to the equation, we expect the output length to be (6 bytes / 3 bytes) * 4 characters = 8 characters.
Putting 123456 into a Base64 encoder creates MTIzNDU2, which is 8 characters long, just as we expected.
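The two worked examples can be reproduced with the JDK's own encoder. The sketch below uses java.util.Base64; the class name and the extra 5-byte input are only there for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Illustrative class name; only java.util.Base64 is real API here.
class Base64LengthCheck {

    static String encode(String text) {
        return Base64.getEncoder().encodeToString(text.getBytes(StandardCharsets.US_ASCII));
    }

    public static void main(String[] args) {
        for (String text : new String[] {"Man", "123456", "12345"}) {
            String encoded = encode(text);
            // integer form of 4 * ceil(n / 3)
            int predicted = 4 * ((text.length() + 2) / 3);
            System.out.println(text + " -> " + encoded
                    + " (" + encoded.length() + " chars, formula predicts " + predicted + ")");
        }
    }
}
```

The 5-byte input shows the padded case: it still produces 8 characters, the last of which is a = sign.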
Integers
Generally we don't want to use doubles because we don't want to use the floating point ops, rounding errors etc. They are just not necessary.
For this it is a good idea to remember how to perform the ceiling division: ceil(x / y) in doubles can be written as (x + y - 1) / y in integer arithmetic (for non-negative numbers only, and beware of overflow).
Readable
If you go for readability you can of course also program it like this (example in Java; for C you could use macros, of course):
public static int ceilDiv(int x, int y) {
return (x + y - 1) / y;
}
public static int paddedBase64(int n) {
int blocks = ceilDiv(n, 3);
return blocks * 4;
}
public static int unpaddedBase64(int n) {
int bits = 8 * n;
return ceilDiv(bits, 6);
}
// test only
public static void main(String[] args) {
for (int n = 0; n < 21; n++) {
System.out.println("Base 64 padded: " + paddedBase64(n));
System.out.println("Base 64 unpadded: " + unpaddedBase64(n));
}
}
Inlined
Padded
We know that we need one block of 4 characters for each 3 bytes (or fewer). So the formula becomes (for x = n and y = 3):
blocks = (bytes + 3 - 1) / 3
chars = blocks * 4
or combined:
chars = ((bytes + 3 - 1) / 3) * 4
Your compiler will optimize out the 3 - 1, so just leave it like this to maintain readability.
Unpadded
Less common is the unpadded variant. For this we remember that we need one character for each 6 bits, rounded up:
bits = bytes * 8
chars = (bits + 6 - 1) / 6
or combined:
chars = (bytes * 8 + 6 - 1) / 6
we can however still divide by two (if we want to):
chars = (bytes * 4 + 3 - 1) / 3
Unreadable
In case you don't trust your compiler to do the final optimizations for you (or if you want to confuse your colleagues):
Padded
((n + 2) / 3) << 2
Unpadded
((n << 2) | 2) / 3
So there we are, two logical ways of calculation, and we don't need any branches, bit-ops or modulo ops - unless we really want to.
Notes:
Obviously you may need to add 1 to the calculations to include a null termination byte.
For Mime you may need to take care of possible line termination characters and such (look for other answers for that).
(In an attempt to give a succinct yet complete derivation.)
Every input byte has 8 bits, so for n input bytes we get:
n × 8      input bits
Every 6 bits is an output byte, so:
ceil(n × 8 / 6)  =  ceil(n × 4 / 3)      output bytes
This is without padding.
With padding, we round that up to multiple-of-four output bytes:
ceil(ceil(n × 4 / 3) / 4) × 4  =  ceil(n × 4 / 3 / 4) × 4  =  ceil(n / 3) × 4      output bytes
See Nested Divisions (Wikipedia) for the first equivalence.
Using integer arithmetic, ceil(n / m) can be calculated as (n + m - 1) div m,
hence we get:
(n * 4 + 2) div 3      without padding
(n + 2) div 3 * 4      with padding
For illustration:
n   with padding      (n + 2) div 3 * 4  without padding   (n * 4 + 2) div 3
----------------------------------------------------------------------------
0                     0                                    0
1   AA==              4                  AA                2
2   AAA=              4                  AAA               3
3   AAAA              4                  AAAA              4
4   AAAAAA==          8                  AAAAAA            6
5   AAAAAAA=          8                  AAAAAAA           7
6   AAAAAAAA          8                  AAAAAAAA          8
7   AAAAAAAAAA==      12                 AAAAAAAAAA        10
8   AAAAAAAAAAA=      12                 AAAAAAAAAAA       11
9   AAAAAAAAAAAA      12                 AAAAAAAAAAAA      12
10  AAAAAAAAAAAAAA==  16                 AAAAAAAAAAAAAA    14
11  AAAAAAAAAAAAAAA=  16                 AAAAAAAAAAAAAAA   15
12  AAAAAAAAAAAAAAAA  16                 AAAAAAAAAAAAAAAA  16
Finally, in the case of MIME Base64 encoding, two additional bytes (CR LF) are needed per every 76 output bytes, rounded up or down depending on whether a terminating newline is required.
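As a sanity check, the integer formulas above can be compared against what java.util.Base64 actually produces. This is a sketch rather than a proof: the class and method names are invented for the example, and the MIME formula assumes the behaviour of Java's getMimeEncoder() (CR LF after every full 76-character line, no terminating newline).

```java
import java.util.Base64;

// Illustrative names; the formulas are the ones derived above.
class Base64Sizes {

    static int padded(int n) {
        return (n + 2) / 3 * 4;
    }

    static int unpadded(int n) {
        return (n * 4 + 2) / 3;
    }

    // Padded length plus two bytes (CR LF) between lines of 76 characters,
    // assuming no line break after the final line.
    static int mime(int n) {
        int chars = padded(n);
        return chars == 0 ? 0 : chars + 2 * ((chars - 1) / 76);
    }

    static boolean matchesJdk(int n) {
        byte[] data = new byte[n];
        return Base64.getEncoder().encode(data).length == padded(n)
                && Base64.getEncoder().withoutPadding().encode(data).length == unpadded(n)
                && Base64.getMimeEncoder().encode(data).length == mime(n);
    }

    public static void main(String[] args) {
        for (int n = 0; n <= 300; n++) {
            if (!matchesJdk(n)) {
                throw new IllegalStateException("formula mismatch for n = " + n);
            }
        }
        System.out.println("formulas agree with java.util.Base64 for n = 0..300");
    }
}
```

If a terminating newline is required, as mentioned above, the MIME count grows by one more CR LF pair.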
Here is a function to calculate the original (decoded) size in KB of a file given as a Base64 String:
private Double calcBase64SizeInKBytes(String base64String) {
    Double result = -1.0;
    // StringUtils is presumably Apache Commons Lang
    if (StringUtils.isNotEmpty(base64String)) {
        int padding = 0;
        if (base64String.endsWith("==")) {
            padding = 2;
        } else if (base64String.endsWith("=")) {
            padding = 1;
        }
        // divide by 4.0 so Math.ceil sees a fraction instead of an already truncated int
        result = (Math.ceil(base64String.length() / 4.0) * 3) - padding;
    }
    return result / 1000;
}
I think the given answers miss the point of the original question, which is how much space needs to be allocated to fit the base64 encoding for a given binary string of length n bytes.
The answer is (floor(n / 3) + 1) * 4 + 1
This includes padding and a terminating null character. It over-allocates by 4 bytes when n is a multiple of 3, which is harmless for a buffer size. You may not need the floor call if you are doing integer arithmetic.
Including padding, a base64 string requires four bytes for every three-byte chunk of the original string, including any partial chunks. One or two bytes extra at the end of the string will still get converted to four bytes in the base64 string when padding is added. Unless you have a very specific use, it is best to add the padding, usually an equals character. I added an extra byte for a null character in C, because ASCII strings without this are a little dangerous and you'd need to carry the string length separately.
For all people who speak C, take a look at these two macros:
// calculate the size of 'output' buffer required for a 'input' buffer of length x during Base64 encoding operation
#define B64ENCODE_OUT_SAFESIZE(x) ((((x) + 3 - 1)/3) * 4 + 1)
// calculate the size of 'output' buffer required for a 'input' buffer of length x during Base64 decoding operation
#define B64DECODE_OUT_SAFESIZE(x) (((x)*3)/4)
Taken from here.
While everyone else is debating algebraic formulas, I'd rather just use BASE64 itself to tell me:
$ echo "Including padding, a base64 string requires four bytes for every three-byte chunk of the original string, including any partial chunks. One or two bytes extra at the end of the string will still get converted to four bytes in the base64 string when padding is added. Unless you have a very specific use, it is best to add the padding, usually an equals character. I added an extra byte for a null character in C, because ASCII strings without this are a little dangerous and you'd need to carry the string length separately."| wc -c
525
$ echo "Including padding, a base64 string requires four bytes for every three-byte chunk of the original string, including any partial chunks. One or two bytes extra at the end of the string will still get converted to four bytes in the base64 string when padding is added. Unless you have a very specific use, it is best to add the padding, usually an equals character. I added an extra byte for a null character in C, because ASCII strings without this are a little dangerous and you'd need to carry the string length separately." | base64 | wc -c
710
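So the formula of 3 bytes being represented by 4 base64 characters seems correct: 525 input bytes give 4 * 175 = 700 characters, and the remaining 10 bytes are the line breaks that base64 inserts after every 76 characters.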
I don't see the simplified formula in other responses. The logic is covered but I wanted a most basic form for my embedded use:
Unpadded = ((4 * n) + 2) / 3
Padded = 4 * ((n + 2) / 3)
NOTE: When calculating the unpadded count we round up the integer division i.e. add Divisor-1 which is +2 in this case
Seems to me that the right formula should be:
n64 = 4 * (n / 3) + (n % 3 != 0 ? 4 : 0)
I believe that this one is an exact answer if n%3 is not zero, no?
4 * (n + 3 - n%3) / 3
Mathematica version :
SizeB64[n_] := If[Mod[n, 3] == 0, 4 n/3, 4 (n + 3 - Mod[n, 3])/3]
Have fun
GI
Simple implementation in JavaScript
function sizeOfBase64String(base64String) {
if (!base64String) return 0;
const padding = (base64String.match(/(=*)$/) || [])[1].length;
return 4 * Math.ceil((base64String.length / 3)) - padding;
}
If anyone is interested in using @Pedro Silva's solution in JS, I ported it:
const getBase64Size = (base64) => {
let padding = base64.length
? getBase64Padding(base64)
: 0
return ((Math.ceil(base64.length / 4) * 3 ) - padding) / 1000
}
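const getBase64Padding = (base64) => {
  if (endsWith(base64, '==')) return 2
  // a string without any trailing '=' has 0 padding characters, not 1
  return endsWith(base64, '=') ? 1 : 0
}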
const endsWith = (str, end) => {
let charsFromEnd = end.length
let extractedEnd = str.slice(-charsFromEnd)
return extractedEnd === end
}
On Windows I wanted to estimate the size of a MIME Base64 buffer, but none of the precise formulas worked for me, so I ended up with an approximate formula like this:
Mime64 string allocation size (approximate)
= (((4 * ((binary buffer size) + 1)) / 3) + 1)
The last +1 is for the ASCII zero: one extra character must be allocated to store the terminating zero. Why "binary buffer size" needs the + 1 I don't know; I suspect there is some Mime64 termination character, or maybe it is an alignment issue.

java.nio.BufferOverflowException constructing byte[]

I'm building a byte array to identify an M-Bus Master and I need to do it using the secondary address.
To do that I need to build a byte[] with the identification of the address:
Identification Number (4 bytes) – A number ranging from 00000000 to 99999999 to identify the meter.
Manufacturer ID (2 bytes) – Three letters that identify the manufacturer.
Version (1 byte) – Specifies the version of the device. The version is manufacturer specific.
Device type (1 byte) – This field codes the device type (e.g. electricity meter, cold water meter).
If my math is not failing me, this has a total of 8 bytes.
So here is my code to do it:
public static void main(String[] args) {
// TODO code application logic here
MBusSerialBuilder builder = MBusConnection.newSerialBuilder("COM4").setBaudrate(2400);
try (MBusConnection mBusConnection = builder.build()) {
// read/write
int primaryAddress = 253;
byte[] idNum = ByteBuffer.allocate(4).putInt(46152604).array();
byte version = 0xFF & 88; //only need a byte not a 4 byte int
byte deviceType = 0xFF & 13; //only need a byte not a 4 byte int
short manuID = createManuID("ZRI");
//builds the message without overflow now
byte[] data = ByteBuffer.allocate(8).put(idNum).putShort(manuID).put(version).put(deviceType).array();
mBusConnection.write(primaryAddress, data);
VariableDataStructure vds = mBusConnection.read(primaryAddress);
System.out.println(vds);
} catch (IOException ex) {
System.out.println(ex.getLocalizedMessage());
Logger.getLogger(Main.class.getName()).log(Level.SEVERE, null, ex);
}
}
Note: previously I had
byte[] manId = ByteBuffer.allocate(2).putChar('Z').putChar('R').putChar('I').array();
and it threw java.nio.BufferOverflowException.
With the recent changes the error is now on the data declaration,
even if I allocate 50 bytes:
byte[] data = ByteBuffer.allocate(50).put(idNum).put(manId).putInt(88).putInt(13).array();
Different error:
java.lang.IndexOutOfBoundsException
Here is some info i extracted from a log file of the seller's program.
MBus Tx_raw-><11><68><b><b><68><53><fd><52><4><26><15><46><ff><ff><ff><ff><23><16>
MBus Rx_raw-><0><aa><1><e5><0><0><0><0><0><0><0><0><0><0><0><0><0><0><0><0><0><0><0><0><0><0><0><0><0><0><0><0><0><0><0><0><0><0><0><0><0><0><0><0><0><0><0><0><0><0><0><0><0><0><0><0><0><0><0><0><0><0><0><0>
MBus Tx_raw-><5><10><7b><fd><78><16>
MBus Rx_raw-><0><aa><b7><68><b1><b1><68><8><0><72><4><26><15><46><49><6a><88><d><29><0><0><0><4><6d><34><a><27><2c><82><4><6c><21><21><82><a><6c><21><2c><4><6><0><0><0><0><84><4><6><0><0><0><0><84><a><6><0><0><0><0><4><13><4a><30><0><0>
MBus consecutive Frame [183]-><68><b1><b1><68><8><0><72><4><26><15><46><49><6a><88><d><29><0><0><0><4><6d><34><a><27><2c><82><4><6c><21><21><82><a><6c><21><2c><4><6><0><0><0><0><84><4><6><0><0><0><0><84><a><6><0><0><0><0><4><13><4a><30><0><0><2><59><8a><7><2><5d><bc><7><2><61><ce><ff><4><3b><bf><2><0><0><4><2d><4><0><0><0><4><26><b><8><0><0><84><10><6><2><0><0><0><84><14><6><0><0><0><0><84><1a><6><0><0><0><0><84><40><14><c1><6><0><0><84><44><14><0><0><0><0><84><4a><14><a9><0><0><0><84><80><40><14><10><0><0><0><84><84><40><14><0><0><0><0><84><8a><40><14><0><0><0><0><84><c0><40><14><e3><0><0><0><84><c4><40><14><0><0><0><0><84><ca><40><14><0><0><0><0><1b><16>
Readout insert->INSERT INTO LETTURE_CONTATORI_TEMP VALUES(NULL,'OK','1','1','510','07/12/2017 10:16:23','','','1512641783','0','07/12/2017 10:52','01/01/2017','01/12/2017','0','0','0','12362','1930','1980','-50','703','4','2059','2','0','0','1729','0','169','16','0','0','227','0','0','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','');
Ok, so with an offset of 65 (A) to every character you can create a mapping so-to-speak of the letters to convert them to smaller values which will fit in the 2 bytes (16 bits) whereby the values range from 0 to 25 (0=A, 1=B..., 25=Z). Since this range requires at most 5 bits, and you have a maximum of 3 characters to convert, you only need 15 bits and can squeeze these into the 2 bytes (16 bits) required for the manufacturer id. All you have to do is apply a bit shift of 5 (size of the values) * the index of the character in your manufacturer id string.
Here is the method
public static short createManuID(String id)
{
int bitMashedManuID = 0;
id = id.toUpperCase(); //force the chars to be within 65-90
if(id.length() == 3)
{
short offset = 65; //A = 0, B = 1 ... Z = 25
//number bits needed to fit 0-25 and so values won't overlap during the bit mashing
short bitShift = 5;
for(int i = 0; i < id.length(); i++)
{
short valueOfChar = (short)id.charAt(i);
valueOfChar -= offset; //apply the offset
bitMashedManuID += valueOfChar << bitShift * i; //pack the bits
}
}
return (short)bitMashedManuID;
}
Example
Z = 90, apply the offset of 65 and we get 25 (11001)
So a manufacturer id of ZZZ should look like (11001|11001|11001) which equals 26425.
System.out.println(createManuID("ZZZ")); //outputs 26425
Your manufacturer id
Z = 90 - 65 = 25 = 11001
R = 82 - 65 = 17 = 10001
I = 73 - 65 = 8 = 01000
ZRI = |01000|10001|11001| = 8761
System.out.println(createManuID("ZRI")); //8761
Therefore when all is said and done you can create your byte array like this without overflow and satisfying the 8 byte array length requirement.
public static void main(String[] args)
{
byte[] idNum = ByteBuffer.allocate(4).putInt(46152604).array();
byte version = 0xFF & 88; //only need a byte not a 4 byte int
byte deviceType = 0xFF & 13; //only need a byte not a 4 byte int
short manuID = createManuID("ZRI");
//builds the message without overflow now
byte[] data = ByteBuffer.allocate(8).put(idNum).putShort(manuID).put(version).put(deviceType).array();
}
All that's left is to determine the order in which the letters are packed. Currently I pack them from right to left, but depending on the device you are talking to it may require left to right, which means the loop would start at for(int i = id.length() - 1; i >= 0; i--)
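On the question of ordering, the seller's log in the question gives a hint: the response frame contains <4><26><15><46><49><6a><88><d>, and the two bytes 49 6a follow directly after the identification number. Those bytes match the manufacturer-ID convention commonly documented for M-Bus, where A = 1 (offset 64 rather than 65), the first letter occupies the most significant bits, and the 16-bit value is sent low byte first. The sketch below assumes that convention; the class and method names are hypothetical, and whether this particular device expects it should be verified against its documentation.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Locale;

// Hypothetical helper; assumes A = 1 ... Z = 26 and first letter in the top bits.
class MBusManufacturer {

    static int encode(String id) {
        if (id.length() != 3) {
            throw new IllegalArgumentException("expected exactly 3 letters");
        }
        int value = 0;
        for (char c : id.toUpperCase(Locale.ROOT).toCharArray()) {
            if (c < 'A' || c > 'Z') {
                throw new IllegalArgumentException("not a letter: " + c);
            }
            value = (value << 5) | (c - 64); // 5 bits per letter
        }
        return value;
    }

    // Low byte first, as the bytes appear in the logged frame (assumption).
    static byte[] toWireBytes(int id) {
        return ByteBuffer.allocate(2).order(ByteOrder.LITTLE_ENDIAN).putShort((short) id).array();
    }

    public static void main(String[] args) {
        int id = encode("ZRI");
        byte[] wire = toWireBytes(id);
        System.out.printf("%d -> %02x %02x%n", id, wire[0], wire[1]); // 27209 -> 49 6a
    }
}
```

With this packing "ZRI" becomes 27209 (0x6A49) instead of 8761, and the wire bytes agree with the 49 6a seen in the log.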
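The ByteBuffer isn't large enough: each putChar writes 2 bytes, so the three calls need 6 bytes, not the 2 that were allocated.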
You don't need all this. Allocate one ByteBuffer large enough for all the data, and then call all the puts you need. Or use a DataOutputStream.

How to Develop a Hash function for traffic license numbers?

Develop a hash function to generate an index value between 0-4999 inclusive for a given traffic license number. Your hash function should generate as few as possible collisions. Hash function should use the properties of license numbers. Hash method should take the license number as a single String and return an index value. We assume that the license numbers to be in the following format: City code is a number between 10 and 99 inclusive. Three letters are any letter combination from English alphabet with 26 chars. Two digits number is a number between 10 and 99 inclusive.
I wrote something for this, but there are a lot of collisions (1800 for 5k):
static long printValue(String s) {
long result = 0;
for (int i = 0; i < s.length(); i++) {
result += Math.pow(27, MAX_LENGTH - i - 1) * (1 + s.charAt(i) - 'A');
}
result = result % 5009;
return (int) result;
}
public int hashF(String str) {
String a = str.substring(0, 2);
String b = str.substring(5, 7);
String middle = str.substring(2, 5);
int q = (int) printValue(middle);
String last = a + q + b;
int index = Integer.parseInt(last);
index = index % 5009;
return index;
}
Link to the original file of licence numbers.
These are some examples from the file of traffic licence numbers. There must be at most 300 collisions.
65HNM25
93DTV23
94WPX23
31RKK46
15YXX90
31MDV74
45BOG99
65JRM50
77VXR55
39TKY41
80MJU73
63QYE57
38FCO80
45ORI16
17CHN73
70SXR63
87CVM74
27EEE85
32PFJ91
50PBA66
70TVK72
15YLS20
80MPM74
21ZRN20
36VVE84
58IDW24
77VDC89
19BVK93
28SUF63
Your problem is not your code, but mathematics. Even a (perfect for you, but not very useful) hash code that produces consecutive hashes that are then mod 5000, ie
10AAA10 -> 0
10AAA11 -> 1
... etc
99ZZZ99 -> 599 ((90 * 26 * 26 * 26 * 90 - 1) % 5000)
will statistically produce over 1800 collisions and is no better than the simplest implementation, which is to use String's hashCode:
int hash = Math.abs(number.hashCode() % 5000);
It's a silly exercise, as it has no real world use.
Your split of the license plate into 3 parts is fine. But converting the middle to a number, hashing it, then adding the two outside strings, converting that all to an integer, and then finally executing a modulo on that is ... awkward.
I would start off with converting the prefix (10-99) to an integer, and then subtracting 10 to get the range 0-89.
Then, for each letter, I'd multiply the result by 26, and add the index of the letter (0-25).
Third, I'd multiply the whole result by 90 (the range of the final part), convert the final 2 characters to an integer, subtract 10 to convert the 10-99 range to 0-89, and add to the result from earlier.
Finally, mod the result with 5000 to get to required 0-4999 range.
Pseudo code:
result = toInt(prefix) - 10
foreach letter in middle:
result = result * 26 + ( letter - 'A' )
result = result * 90 + ( toInt(suffix) - 10)
result = result % 5000
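The pseudo code translates almost line for line into Java. The sketch below assumes plates are always exactly 7 characters in the form shown in the examples (two digits, three upper-case letters, two digits); the class and method names are placeholders, and no claim is made about the resulting number of collisions.

```java
// Placeholder names; mirrors the pseudo code above.
class LicenseHash {

    static int hash(String plate) {
        // prefix 10-99 mapped to 0-89
        int result = Integer.parseInt(plate.substring(0, 2)) - 10;
        // three letters, each mapped to 0-25
        for (int i = 2; i < 5; i++) {
            result = result * 26 + (plate.charAt(i) - 'A');
        }
        // suffix 10-99 mapped to 0-89
        result = result * 90 + (Integer.parseInt(plate.substring(5, 7)) - 10);
        return result % 5000;
    }

    public static void main(String[] args) {
        for (String plate : new String[] {"10AAA10", "10AAA11", "65HNM25", "99ZZZ99"}) {
            System.out.println(plate + " -> " + hash(plate));
        }
    }
}
```

The largest intermediate value is 90 * 26 * 26 * 26 * 90 - 1 = 142365599, which fits comfortably in an int, so no long arithmetic is needed.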

First Byte's Bit Off-By-One after Steganography

Currently working on a Steganography project where, given a message in bytes and the number of bits to modify per byte, hide a message in an arbitrary byte array.
In the first decoded byte of the resulting message, the value has its first (leftmost) bit set to '1' instead of '0'. For example, when using message "Foo".getBytes() and maxBits = 1 the result is "Æoo", not "Foo" (0b01000110 gets changed to 0b11000110). With message "Æoo".getBytes() and maxBits = 1 the result is "Æoo", meaning the bit is not getting flipped as far as I can tell.
Only certain values of maxBits for certain message bytes cause this error, for example "Foo" encounters this problem at maxBits equal to 1, 5, and 6, whereas "Test" encounters this problem at maxBits equal to 1, 3, and 5. Only the resulting first character ends up with its first bit set, and this problem only occurs at the specified values of this.maxBits related to the initial data.
Why, for certain values of maxBits, is the first bit of the
resulting decoded message always 1?
Why do different inputs have different values for maxBits that
work fine, and others that do not?
What is the pattern with the value of maxBits and the
resulting erroneous results in relation to the original data?
Encode and Decode Methods:
public byte[] encodeMessage(byte[] data, byte[] message) {
byte[] encoded = data;
boolean[] messageBits = byteArrToBoolArr(message);
int index = 0;
for (int x = 0; x < messageBits.length; x++) {
encoded[index] = messageBits[x] ? setBit(encoded[index], x % this.maxBits) : unsetBit(encoded[index], x % this.maxBits);
if (x % this.maxBits == 0 && x != 0)
index++;
}
return encoded;
}
public byte[] decodeMessage(byte[] data) {
boolean[] messageBits = new boolean[data.length * this.maxBits];
int index = 0;
for (int x = 0; x < messageBits.length; x++) {
messageBits[x] = getBit(data[index], x % this.maxBits);
if (x % this.maxBits == 0 && x != 0)
index++;
}
return boolArrToByteArr(messageBits);
}
Unset, Set, and Get Methods:
public byte unsetBit(byte data, int pos) {
return (byte) (data & ~((1 << pos)));
}
public byte setBit(byte data, int pos) {
return (byte) (data | ((1 << pos)));
}
public boolean getBit(byte data, int pos) {
return ((data >>> pos) & 0x01) == 1;
}
Conversion Methods:
public boolean[] byteArrToBoolArr(byte[] b) {
boolean bool[] = new boolean[b.length * 8];
for (int x = 0; x < bool.length; x++) {
bool[x] = false;
if ((b[x / 8] & (1 << (7 - (x % 8)))) > 0)
bool[x] = true;
}
return bool;
}
public byte[] boolArrToByteArr(boolean[] bool) {
byte[] b = new byte[bool.length / 8];
for (int x = 0; x < b.length; x++) {
for (int y = 0; y < 8; y++) {
if (bool[x * 8 + y]) {
b[x] |= (128 >>> y);
}
}
}
return b;
}
Sample Code and Output:
test("Foo", 1);//Æoo
test("Foo", 2);//Foo
test("Foo", 3);//Foo
test("Foo", 4);//Foo
test("Foo", 5);//Æoo
test("Foo", 6);//Æoo
test("Foo", 7);//Foo
test("Foo", 8);//Foo
test("Test", 1);//Ôest
test("Test", 2);//Test
test("Test", 3);//Ôest
test("Test", 4);//Test
test("Test", 5);//Ôest
test("Test", 6);//Test
test("Test", 7);//Test
test("Test", 8);//Test
private static void test(String s, int x) {
BinaryModifier bm = null;
try {
bm = new BinaryModifier(x);//Takes maxBits as constructor param
} catch (BinaryException e) {
e.printStackTrace();
}
System.out.println(new String(bm.decodeMessage(bm.encodeMessage(new byte[1024], s.getBytes()))));
return;
}
Your logic of incrementing index has two flaws, which overwrite the first bit of the first letter. Obviously, the bug is expressed when the overwriting bit is different to the first bit.
if (x % this.maxBits == 0 && x != 0)
index++;
The first problem has to do with embedding only one bit per byte, i.e. maxBits = 1. After you have embedded the very first bit and reached the above conditional, x is still 0, since it will be incremented at the end of the loop. You should be incrementing index at this point, but x != 0 prevents you from doing so. Therefore, the second bit will also be embedded in the first byte, effectively overwriting the first bit. Since this logic also exists in the decode method, you read the first two bits from the first byte.
More specifically, if you embed a 00 or 11, it will be fine. But a 01 will be read as 11 and a 10 will be read as 00, i.e., both bits take the value of the second bit. If the first letter has an ASCII code less than or equal to 63 (00xxxxxx), or greater than or equal to 192 (11xxxxxx), it will come out fine. For example:
# -> # : 00100011 (35) -> 00100011 (35)
F -> Æ : 01000110 (70) -> 11000110 (198)
The second problem has to do with the x % this.maxBits == 0 part. Consider the case where we embed 3 bits per byte. After the 3rd bit, when we reach the conditional we still have x = 2, so the condition is false. After we have embedded a 4th bit, we do have x = 3 and we're allowed to move on to the next byte. However, this extra 4th bit will be written at the 0th position of the first byte, since x % this.maxBits will be 3 % 3. So again, we have a bit overwriting our very first bit. However, after the first cycle the modulo operation will correctly write only 3 bits per byte, so the rest of our message will be unaffected.
Consider the binary for "F", which is 01000110. By embedding N bits per byte, we effectively embed the following groups in the first few bytes.
1 bit    01         0  0  0  1  1  0
2 bits   010        00  11  0x
3 bits   0100       011  0xx
4 bits   01000      110x
5 bits   010001     10xxx
6 bits   0100011    0xxxxx
7 bits   01000110
8 bits   01000110x
As you can see, for groups of 5 and 6 bits, the last bit of the first group is 1, which will overwrite our initial 0 bit. For all other cases the overwrite doesn't affect anything. Note that for 8 bits, we end up using the first bit of the second letter. If that happened to have an ASCII code greater than or equal to 128, it would again overwrite the very first 0 bit.
To address all problems, use either
for (int x = 0; x < messageBits.length; x++) {
// code in the between
if ((x + 1) % this.maxBits == 0)
index++;
}
or
for (int x = 0; x < messageBits.length; ) {
// code in the between
x++;
if (x % this.maxBits == 0)
index++;
}
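Put together, a round trip with the corrected increment could look like the sketch below. It is not the asker's BinaryModifier class: the class name, the static methods and the explicit messageLength parameter are assumptions made to keep the example self-contained, and it works on the message bytes directly instead of going through boolean arrays.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Self-contained illustration of the (x + 1) % maxBits fix; names are hypothetical.
class StegoRoundTrip {

    static byte[] encode(byte[] data, byte[] message, int maxBits) {
        byte[] encoded = data.clone(); // leave the caller's array untouched
        int index = 0;
        for (int x = 0; x < message.length * 8; x++) {
            boolean bit = (message[x / 8] & (0x80 >> (x % 8))) != 0;
            int pos = x % maxBits;
            encoded[index] = bit
                    ? (byte) (encoded[index] | (1 << pos))
                    : (byte) (encoded[index] & ~(1 << pos));
            if ((x + 1) % maxBits == 0) { // move on after exactly maxBits bits
                index++;
            }
        }
        return encoded;
    }

    static byte[] decode(byte[] data, int messageLength, int maxBits) {
        byte[] message = new byte[messageLength];
        int index = 0;
        for (int x = 0; x < messageLength * 8; x++) {
            if (((data[index] >> (x % maxBits)) & 1) == 1) {
                message[x / 8] |= (0x80 >> (x % 8));
            }
            if ((x + 1) % maxBits == 0) {
                index++;
            }
        }
        return message;
    }

    public static void main(String[] args) {
        byte[] message = "Foo".getBytes(StandardCharsets.US_ASCII);
        for (int maxBits = 1; maxBits <= 8; maxBits++) {
            byte[] decoded = decode(encode(new byte[1024], message, maxBits), message.length, maxBits);
            System.out.println(maxBits + " -> " + Arrays.equals(message, decoded));
        }
    }
}
```

Passing the message length to decode sidesteps the question of where the embedded data ends; a real implementation would need a header or an end marker for that.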
Your code has another potential problem which hasn't been expressed. If your data array has a size of 1024, but you only embed 3 letters, you will affect only the first few bytes, depending on the value of maxBits. However, for the extraction, you define your array to have a size of data.length * this.maxBits. So you end up reading bits from all of the bytes of the data array. This is currently no problem, because your array is populated by 0s, which are converted to empty strings. However, if your array had actual numbers, you'd end up reading a lot of garbage past the point of your embedded data.
There are two general ways of addressing this. You either
append a unique sequence of bits at the end of your message (marker), such that when you encounter that sequence you terminate the extraction, e.g. eight 0s, or
you add a few bits before embedding your actual data (header), which will tell you how to extract your data, e.g., how many bytes and how many bits per byte to read.
One thing you're probably going to run afoul of is the nature of character encoding.
When you call s.getBytes() you are turning the string to bytes using your JVM's default encoding. Then you modify the bytes and you create a new String from the modified bytes again using the default encoding.
So the question is what is that encoding and precisely how does it work. For example, the encoding may well in some cases only be looking at the lower 7 bits of a byte relating to the character, then your setting of the top bit won't have any effect on the string created from the modified bytes.
If you really want to tell if your code is working right, do your testing by directly examining the byte[] being produced by your encode and decode methods, not by turning the modified bytes into strings and looking at the strings.
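One way to follow that advice is to print the bytes as bit strings before and after the round trip. The helper below is a small sketch with made-up names, independent of the code in the question.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical debugging helper for inspecting byte arrays bit by bit.
class BitDump {

    static String toBits(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            // mask with 0xFF so negative bytes do not expand to 32 bits
            String bits = Integer.toBinaryString(b & 0xFF);
            sb.append(String.format("%8s", bits).replace(' ', '0'));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toBits("Foo".getBytes(StandardCharsets.US_ASCII)));
        // prints 01000110 01101111 01101111
        System.out.println(toBits(new byte[] {(byte) 0xC6}));
        // prints 11000110
    }
}
```

Comparing these dumps makes a flipped top bit visible regardless of how the default charset would render the byte.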

String to binary?

I have a very odd situation,
I'm writing a filter engine for another program, and that program has what are called "save areas". Each of those save areas is numbered 0 through 32 (why there are 33 of them, I don't know). They are turned on or off via a binary string,
1 = save area 0 on
10 = save area 1 on, save area 0 off
100 = save area 2 on, save areas 1 and 0 off.
and so on.
I have another program passing in what save areas it needs, but it does so with decimal representations and underscores - 1_2_3 for save areas 1, 2, and 3 for instance.
I would need to convert that example to 1110.
What I came up with is that I can build a string as follows:
I break it up (using split) into savePart[i]. I then iterate through savePart[i] and build strings:
String saveString = padRight("0b1",Integer.parseInt(savePart[i]));
That'll give me a string that reads "0b1000000" in the case of save area 6, for instance.
Is there a way to read that string as if it was a binary number instead. Because if I were to say:
long saveBinary = 0b1000000L
that would totally work.
or, is there a smarter way to be doing this?
long saveBinary = Long.parseLong(saveString, 2);
Note that you'll have to leave off the 0b prefix.
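A minimal sketch of that suggestion, with the prefix stripped before parsing. The class and method names are invented for the example; Long.parseLong(String, int) is the only library call involved.

```java
// Illustrative wrapper around Long.parseLong with radix 2.
class SaveAreaParser {

    static long parseBinary(String text) {
        // Long.parseLong does not understand the 0b literal prefix, so drop it
        String digits = (text.startsWith("0b") || text.startsWith("0B")) ? text.substring(2) : text;
        return Long.parseLong(digits, 2);
    }

    public static void main(String[] args) {
        System.out.println(parseBinary("0b1000000")); // 64
        System.out.println(parseBinary("1110"));      // 14
    }
}
```

A long has room for all 33 save areas, since the highest one only needs bit 32.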
This will do it:
String input = "1_2_3";
long areaBits = 0;
for (String numTxt : input.split("_")) {
areaBits |= 1L << Integer.parseInt(numTxt);
}
System.out.printf("\"%s\" -> %d (decimal) = %<x (hex) = %s (binary)%n",
input, areaBits, Long.toBinaryString(areaBits));
Output:
"1_2_3" -> 14 (decimal) = e (hex) = 1110 (binary)
Just take each number in the string and treat it as an exponent. Accumulate the total for each exponent found and you will get your answer w/o the need to remove prefixes or suffixes.
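// will contain our answer
long answer = 0;
String[] buckets = givenData.split("_"); // array of each bucket wanted, exponents
for (int x = 0; x < buckets.length; x++) { // iterate through all exponents found
    int exponent = Integer.parseInt(buckets[x]); // get the exponent
    long power = 1; // ^ is XOR in Java, not exponentiation, so build 10^exponent by hand
    for (int k = 0; k < exponent; k++) {
        power *= 10;
    }
    answer = power + answer; // add 10^exponent to our running total
}
answer will now contain our answer in the format 1011010101 (what have you). Note that this is a decimal number whose digits merely look like binary, and a long overflows beyond 10^18, so this approach cannot cover save areas 19 through 32.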
In your example, the given data was 1_2_3. The array will contain {"1", "2", "3"}
We iterate through that array...
10^1 + 10^2 + 10^3 = 10 + 100 + 1000 = 1110
I believe this is also why your numbers are 0 - 32. x^0 = 1, so you can dump into the 0 bucket when 0 is in the input.
