Writing a BitSet to a file in Java

I have a BitSet and want to write it to a file. I came across a solution that uses an ObjectOutputStream and its writeObject method.
I looked at ObjectOutputStream in the Java API and saw that you can write other things too (byte, int, short, etc.).
To try the class out, I wrote a single byte to a file using the code below, but the resulting file has 7 bytes instead of 1.
My question is: what are the first 6 bytes in the file, and why are they there?
This matters for the BitSet because I don't want to start writing lots of data to a file and then discover that unknown bytes have been inserted into it.
Here is the code:
byte[] bt = new byte[]{'A'};
File outFile = new File("testOut.txt");
FileOutputStream fos = new FileOutputStream(outFile);
ObjectOutputStream oos = new ObjectOutputStream(fos);
oos.write(bt);
oos.close();
thanks for any help
Avner

The other bytes will be type information.
Basically, ObjectOutputStream is a class used to write Serializable objects to some destination (usually a file). It makes more sense if you think about ObjectInputStream, which has a readObject() method. How does Java know what object to instantiate? Easy: there is type information in there.

You could be writing any objects out to an ObjectOutputStream, so the stream holds information about the types written as well as the data needed to reconstitute the object.
If you know that the stream will always contain a BitSet, don't use an ObjectOutputStream. If space is at a premium, convert the BitSet to an array of bytes where each bit corresponds to a bit in the BitSet, then write that directly to the underlying stream (e.g. a FileOutputStream as in your example).
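As a minimal sketch of that suggestion, assuming Java 7 or later (where BitSet.toByteArray() and BitSet.valueOf(byte[]) are available): the class name RawBitSetFile and its method names are made up for illustration. The file then contains only the bits themselves, with no stream header.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.BitSet;

// Hypothetical helper: stores a BitSet as raw bytes, without Java serialization.
final class RawBitSetFile {

    // Writes the bits directly to the file; toByteArray() is little-endian
    // (bit 0 of the set is the lowest bit of the first byte).
    static void write(BitSet bits, String fileName) throws IOException {
        try (FileOutputStream out = new FileOutputStream(fileName)) {
            out.write(bits.toByteArray());
        }
    }

    // Reads the file back into a BitSet with the same bits set.
    static BitSet read(String fileName) throws IOException {
        return BitSet.valueOf(Files.readAllBytes(Paths.get(fileName)));
    }
}
```

Note that trailing zero bits are not stored, so the restored BitSet has the same length() as the original but not necessarily the same size().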

The serialisation format, like many others, includes a header with a magic number and version information. When you use the DataOutput/OutputStream methods on ObjectOutputStream, the data is placed in the middle of the serialised data (with no type information). This is typically only done in writeObject implementations after a call to defaultWriteObject or use of putFields.
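To see those bytes, the question's experiment can be repeated against an in-memory stream and dumped as hex. This is a small sketch; the class name StreamHeaderDemo is made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;

// Hypothetical demo: shows what ObjectOutputStream emits around a single raw byte.
final class StreamHeaderDemo {

    static String hexDump() throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(sink)) {
            oos.write(new byte[] {'A'}); // same call as in the question
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : sink.toByteArray()) {
            hex.append(String.format("%02X ", b & 0xFF));
        }
        return hex.toString().trim();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(hexDump()); // prints "AC ED 00 05 77 01 41"
    }
}
```

The first four bytes are the stream header (magic number AC ED, version 00 05). The next two are a block-data marker (77) and the block length (01), and the final byte (41) is the 'A' itself.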

If you only use the saved BitSet in Java, serialization works fine. However, it's kind of annoying if you want to share the bitset across multiple platforms. Besides the overhead of Java serialization, the BitSet is stored in units of 8 bytes. This can generate too much overhead if your bitset is small.
We wrote this small class so we can extract byte arrays from a BitSet. Depending on your use case, it might work better than Java serialization for you.
import java.util.BitSet;

// Note: on Java 7+, toByteArray() below overrides BitSet.toByteArray(),
// which uses a different bit order (least significant bit first).
public class ExportableBitSet extends BitSet {
    private static final long serialVersionUID = 1L;

    // Masks for addressing the bits of a byte, most significant bit first
    protected static final int[] BIT_MASK =
            {0x80, 0x40, 0x20, 0x10, 0x08, 0x04, 0x02, 0x01};

    public ExportableBitSet() {
        super();
    }

    public ExportableBitSet(int nbits) {
        super(nbits);
    }

    // Builds the set from bytes; bit 0 is the most significant bit of bytes[0]
    public ExportableBitSet(byte[] bytes) {
        this(bytes == null ? 0 : bytes.length * 8);
        for (int i = 0; i < size(); i++) {
            if (isBitOn(i, bytes))
                set(i);
        }
    }

    @Override
    public byte[] toByteArray() {
        // length() is the index of the highest set bit plus one (0 if empty)
        int n = (length() + 7) / 8;
        byte[] bytes = new byte[n]; // new arrays are already zero-filled
        for (int i = 0; i < n * 8; i++) {
            if (get(i))
                setBit(i, bytes);
        }
        return bytes;
    }

    protected static boolean isBitOn(int bit, byte[] bytes) {
        int size = bytes == null ? 0 : bytes.length * 8;
        if (bit >= size)
            return false;
        return (bytes[bit / 8] & BIT_MASK[bit % 8]) != 0;
    }

    protected static void setBit(int bit, byte[] bytes) {
        int size = bytes == null ? 0 : bytes.length * 8;
        if (bit >= size)
            throw new ArrayIndexOutOfBoundsException("Byte array too small");
        bytes[bit / 8] |= BIT_MASK[bit % 8];
    }
}

Related

Get bits from a ByteBuffer

I am working on a BitBuffer that will take x bits from a ByteBuffer as an int, long, etc, but I seem to be having a whole lot of problems.
I've tried loading a long at a time and using bit shifting, but the difficulty comes from rolling from one long into the next. I am wondering if there's just a better way. Anyone have any suggestions?
public class BitBuffer
{
final private ByteBuffer bb;
public BitBuffer(byte[] bytes)
{
this.bb = ByteBuffer.wrap(bytes);
}
public int takeInt(int bits)
{
int bytes = toBytes(bits);
if (bytes > 4) throw new RuntimeException("Too many bits requested");
int i=0;
// take bits from bb and fill it into an int
return i;
}
}
More specifically, I am trying to take x bits from the buffer and return them as an int (the minimal case). I can access bytes from the buffer, but let's say I only want to take just the first 4 bits instead.
Example:
If my buffer is filled with "101100001111", if I run these in order:
takeInt(4) // should return 11 (1011)
takeInt(2) // should return 0 (00)
takeInt(2) // should return 0 (00)
takeInt(1) // should return 1 (1)
takeInt(3) // should return 7 (111)
I would like to use something like this for bit packed encoded data where an integer can be stored in just a few bits of a byte.
The BitSet and ByteBuffer ideas were a bit too difficult to control so instead, I went with a binary string approach that basically takes a whole lot of headache out of managing an intermediate buffer of bits.
public class BitBuffer
{
final private String bin;
private int start;
public BitBuffer(byte[] bytes)
{
this.bin = toBinaryString(bytes); // TODO: create this function
this.start = 0;
}
public int takeInt(int nbits)
{
// TODO: handle edge cases
String bits = bin.substring(start, start+=nbits);
return Integer.parseInt(bits, 2);
}
}
Out of everything I've tried this was the cleanest and easiest approach, but I am open to suggestions!
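For comparison, here is a sketch that reads the bits straight from the byte array with shifts, so no intermediate string is needed. The class name BitReader is made up, and it assumes bits are consumed most significant bit first, as in the example in the question.

```java
// Hypothetical alternative to the string-based BitBuffer above.
final class BitReader {
    private final byte[] data;
    private long bitPos; // next unread bit, counted from the MSB of data[0]

    BitReader(byte[] data) {
        this.data = data.clone();
    }

    // Returns the next nbits bits as an int, most significant bit first.
    // With nbits == 32 the result may be negative, since int is signed.
    int takeInt(int nbits) {
        if (nbits < 0 || nbits > 32) {
            throw new IllegalArgumentException("nbits must be between 0 and 32");
        }
        if (bitPos + nbits > (long) data.length * 8) {
            throw new IllegalStateException("Not enough bits left");
        }
        int value = 0;
        for (int i = 0; i < nbits; i++, bitPos++) {
            int byteIndex = (int) (bitPos >>> 3);   // which byte
            int shift = 7 - (int) (bitPos & 7);     // which bit inside it
            value = (value << 1) | ((data[byteIndex] >> shift) & 1);
        }
        return value;
    }
}
```

With the buffer from the question padded to two bytes (0xB0, 0xF0), successive calls takeInt(4), takeInt(2), takeInt(2), takeInt(1), takeInt(3) give 11, 0, 0, 1, 7.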
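You can convert the ByteBuffer into a BitSet, and then you'll have continuous access to the bits. Note that BitSet.valueOf(bytes) numbers the bits of each byte starting from the least significant bit, so the indexes have to be mapped if you want to read most significant bit first.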
public class BitBuffer
{
final private BitSet bs;
public BitBuffer(byte[] bytes)
{
this.bs = BitSet.valueOf(bytes);
}
public int takeInt(int bits)
{
int bytes = toBytes(bits);
if (bytes > 4) throw new RuntimeException("Too many bits requested");
int i=0;
// take bits from bs and fill it into an int
return i;
}
}
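One way the missing body of takeInt could be filled in is sketched below. The class name BitSetReader is made up, and the index mapping assumes the most-significant-bit-first order used in the question's example. Since BitSet.valueOf(byte[]) places the lowest bit of each byte first, the position inside each byte is mirrored.

```java
import java.util.BitSet;

// Hypothetical completion of the BitSet-based idea.
final class BitSetReader {
    private final BitSet bits;
    private final long totalBits;
    private int pos; // next unread bit in stream order (MSB of byte 0 first)

    BitSetReader(byte[] bytes) {
        this.bits = BitSet.valueOf(bytes);
        this.totalBits = (long) bytes.length * 8;
    }

    int takeInt(int nbits) {
        if (nbits < 0 || nbits > 32) {
            throw new IllegalArgumentException("nbits must be between 0 and 32");
        }
        if ((long) pos + nbits > totalBits) {
            throw new IllegalStateException("Not enough bits left");
        }
        int value = 0;
        for (int i = 0; i < nbits; i++, pos++) {
            // Same byte, mirrored bit: stream bit 0 is BitSet index 7, and so on.
            int index = (pos & ~7) + (7 - (pos & 7));
            value = (value << 1) | (bits.get(index) ? 1 : 0);
        }
        return value;
    }
}
```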

Java store boolean array in file and read fast

I need to store a boolean array with 80,000 items in a file. I don't care how long saving takes; I'm interested only in the loading time of the array.
I didn't try to store it with DataOutputStream because it requires a separate access for each value.
I tried three approaches:
serialize the boolean array
use a BitSet instead of the boolean array and serialize it
convert the boolean array into a byte array, where 1 is true and 0 is false, and write it through a FileChannel using a ByteBuffer
To test reading from the files, I ran each approach 1,000 times in a loop and got these results:
deserialization of the boolean array: 574 ms
deserialization of the BitSet: 379 ms
getting the byte array from the FileChannel via a MappedByteBuffer: 170 ms
The first and second approaches take too long, and the third is perhaps not a real approach at all.
Perhaps there is a better way to accomplish this, so I need your advice.
EDIT
With each method run once, the times are 13.8 ms, 8.71 ms, and 6.46 ms respectively.
What about writing a byte for each boolean and developing a custom parser? This will probably be one of the fastest methods.
If you want to save space, you could also put 8 booleans into one byte, but this would require some bit-shifting operations.
Here is a short example:
// Needs: import java.io.*;
public void save() throws IOException
{
    boolean[] testData = new boolean[80000];
    for (int x = 0; x < testData.length; x++)
    {
        testData[x] = Math.random() > 0.5;
    }
    // try-with-resources closes the stream even if a write fails
    try (OutputStream stream = new BufferedOutputStream(new FileOutputStream(new File("test.bin"))))
    {
        for (boolean item : testData)
        {
            stream.write(item ? 1 : 0);
        }
    }
}

public boolean[] load() throws IOException
{
    long start = System.nanoTime();
    File file = new File("test.bin");
    byte[] data = new byte[(int) file.length()];
    // readFully keeps reading until the array is full; a bare read() may return early
    try (DataInputStream inputStream = new DataInputStream(new FileInputStream(file)))
    {
        inputStream.readFully(data);
    }
    boolean[] output = new boolean[data.length];
    for (int x = 0; x < data.length; x++)
    {
        output[x] = data[x] != 0;
    }
    long elapsed = System.nanoTime() - start;
    System.out.println("Time: " + elapsed + " ns");
    return output;
}
It takes about 2 ms to load 80,000 booleans.
Tested with JDK 1.8.0_45
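The space-saving variant mentioned above (8 booleans per byte) could look roughly like this. The class and method names are made up, and the bit order (index 0 in the lowest bit of the first byte) is an arbitrary choice. The packed array can be written and read with the same stream code as in the example, and 80,000 booleans then take 10,000 bytes.

```java
// Hypothetical helper for packing booleans into bytes, 8 per byte.
final class BooleanPacker {

    // Packs values so that values[i] lands in bit (i % 8) of byte (i / 8).
    static byte[] pack(boolean[] values) {
        byte[] packed = new byte[(values.length + 7) / 8];
        for (int i = 0; i < values.length; i++) {
            if (values[i]) {
                packed[i >> 3] |= 1 << (i & 7);
            }
        }
        return packed;
    }

    // Reverses pack(); count is needed because the last byte may be padded.
    static boolean[] unpack(byte[] packed, int count) {
        boolean[] values = new boolean[count];
        for (int i = 0; i < count; i++) {
            values[i] = (packed[i >> 3] & (1 << (i & 7))) != 0;
        }
        return values;
    }
}
```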
So I had a very similar use case where I wanted to serialise/deserialise a very large boolean array.
I implemented something like this.
First, I converted the boolean array to an integer array to pack multiple boolean values into each int (this makes storage more efficient and avoids issues with bit padding).
This means we have to build wrapper methods that return true/false:
// container is the int[] holding the packed flags;
// buckets is assumed to be the number of flags stored per int (at most 32)
private boolean get(int index) {
    int holderIndex = index / buckets; // integer division already rounds down
    int internalIndex = index % buckets;
    return 0 != (container[holderIndex] & (1 << internalIndex));
}
and
private void set(int index) {
    int holderIndex = index / buckets;
    int internalIndex = index % buckets;
    container[holderIndex] |= (1 << internalIndex);
}
Now, to serialise and deserialise, you can convert this directly to a byte stream and write it to a file.
my source code, for reference
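Since the linked source is not reproduced here, the following is only a sketch of how such a container might be turned into bytes and back, assuming 32 flags per int and big-endian byte order (the ByteBuffer default). The class name PackedBooleans and its method names are made up for illustration.

```java
import java.nio.ByteBuffer;

// Hypothetical int-backed boolean container with byte conversion.
final class PackedBooleans {
    private static final int BUCKETS = Integer.SIZE; // 32 flags per int
    private final int[] container;

    PackedBooleans(int size) {
        container = new int[(size + BUCKETS - 1) / BUCKETS];
    }

    boolean get(int index) {
        return (container[index / BUCKETS] & (1 << (index % BUCKETS))) != 0;
    }

    void set(int index) {
        container[index / BUCKETS] |= 1 << (index % BUCKETS);
    }

    // Copies the ints into a byte array that can be written to a file as-is.
    byte[] toBytes() {
        ByteBuffer buf = ByteBuffer.allocate(container.length * Integer.BYTES);
        buf.asIntBuffer().put(container);
        return buf.array();
    }

    // Rebuilds the container from bytes produced by toBytes().
    static PackedBooleans fromBytes(byte[] bytes) {
        PackedBooleans p = new PackedBooleans(bytes.length / Integer.BYTES * BUCKETS);
        ByteBuffer.wrap(bytes).asIntBuffer().get(p.container);
        return p;
    }
}
```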

Converting byte to a single int Android

I am implementing a Bluetooth Android application, where Image is sent from one device to another. The bitmap's byte array is sent and successfully reconstructed at the receiver end. However, I need to send a single integer value together with the bitmap as an index(so the receiver knows what to do with the received bitmap). So basically I want to send this in a byte stream:
int|bitmap
Since I need to transfer an int up to 27, that means it fits into a single byte, right? My current code looks like this:
ba[0] = Integer.valueOf(drawableNumber).byteValue(); // drawableNumber value is between 1 and 27
ByteArrayOutputStream bs = new ByteArrayOutputStream(); // create new output stream
try {
    bs.write(ba); // byte of the integer
    bs.write(bitmapdata); // bytes of the bitmap
} catch (IOException e) {
    // not expected for an in-memory stream
}
mChatService.write(bs.toByteArray()); // this is where the bytes are sent to the other device
And at the receiver end:
case MESSAGE_READ:
readBuf = (byte[]) msg.obj; // readBuf contains ALL the received bytes using .read method
So my question is, how can I reconstruct the integer and the image I have sent(basically a single byte to a single integer)? I manage to reconstruct the bitmap alone, but I need this additional integer value to know what to do with the received image. The integer value will always be between 0 and 27. I have checked all other answers, but could not find a proper solution..
EDIT: Main question is how to separate the integer bytes from the bitmap bytes in the byte array. Because at the receiving end I want to reconstruct the sent integer AND the bitmap separately
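Since converting a byte to an int is a widening conversion, you can just assign the byte to an int variable. (A byte is signed, so values above 127 would need a mask with & 0xFF to avoid a negative result; values up to 27 are unaffected.)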
int myInt = ba[0];
When I try this in Java, it confirms what I had in mind when I commented. The integer is written as a single byte (so in your case it is simply ba[0]), or at least it should be, based on your code. A value that does not fit in one byte would need more than one byte. That means it is also the first byte that will be read out of your buffer (or should be).
import java.io.ByteArrayOutputStream;
public class TestClass {
public static void main(String args[]){
byte[] ba = new byte[10];
int myInt = 13;
ByteArrayOutputStream bs = new ByteArrayOutputStream(); //create new output stream
try {
bs.write(myInt); //bytes of the integer
ba = bs.toByteArray(); // put everything into byte array
} finally{};
for(int i = 0; i < ba.length; i++){
System.out.println(i);
System.out.println(ba[i]);
}
}
}
Again I realize this isn't exactly an answer, just too much for a comment.
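For the EDIT in the question (separating the index byte from the bitmap bytes), a plain-Java sketch could look like this. It assumes the whole message has arrived in one array, as the question states for readBuf; the class and method names are made up for illustration.

```java
import java.util.Arrays;

// Hypothetical helper: one index byte followed by the payload bytes.
final class IndexedPayload {

    // Builds the message: first byte is the index, the rest is the payload.
    static byte[] encode(int index, byte[] payload) {
        if (index < 0 || index > 255) {
            throw new IllegalArgumentException("index must fit in one byte");
        }
        byte[] message = new byte[payload.length + 1];
        message[0] = (byte) index;
        System.arraycopy(payload, 0, message, 1, payload.length);
        return message;
    }

    // Recovers the index; the mask keeps values above 127 non-negative.
    static int index(byte[] message) {
        return message[0] & 0xFF;
    }

    // Recovers the payload, i.e. everything after the first byte.
    static byte[] payload(byte[] message) {
        return Arrays.copyOfRange(message, 1, message.length);
    }
}
```

On the receiving side, IndexedPayload.index(readBuf) would give the integer, and the array from IndexedPayload.payload(readBuf) could then be decoded the same way the bitmap is decoded today.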

Using assertArrayEquals() with wildcards?

I want to test code that produces byte arrays used to send as UDP packets.
Although I'm not able to reproduce every byte in my test (e.g. random bytes, timestamps), I'd like to test the bytes that I can predetermine.
Is something like the following possible using JUnit 4.8 (and Mockito 1.8)?
Packet packet = new RandomPacket();
byte[] bytes = new byte[] {
0x00, 0x02, 0x05, 0x00, anyByte(), anyByte(), anyByte(), anyByte(), 0x00
};
assertArrayEquals(packet.getBytes(), bytes);
The sample above is of course not working, I'm just searching for a way to use some sort of wildcard in assertArrayEquals().
PS: My only alternative right now is to check each byte individually (and omit random ones). But this is quite tedious and not really reusable.
Thanks to the answer from JB Nizet I have the following code in place now, working just fine:
private static int any() {
return -1;
}
private static void assertArrayEquals(int[] expected, byte[] actual) {
if(actual.length != expected.length) {
fail(String.format("Arrays differ in size: expected <%d> but was <%d>", expected.length, actual.length));
}
for(int i = 0; i < expected.length; i ++) {
if(expected[i] == -1) {
continue;
}
if((byte) expected[i] != actual[i]) {
fail(String.format("Arrays differ at element %d: expected <%d> but was <%d>", i, expected[i], actual[i]));
}
}
}
You could simply write your expected array as an array of integers, and use a special value (such as -1) to represent the wildcard. It's the same trick as the read methods of the input streams. You would just have to write your custom assertEqualsWithWildCard(int[] expected, byte[] actual).
If you are going to be writing a lot of code like this, I would write a separate class to "decode" the packet into meaningful fields. Then (of course, after testing that the class itself works) you can write sensible tests like
assertEquals(42, packet.length());
assertEquals(0xDEADBEEF, packet.checksum());
etc.
That way, you are not "omitting random bytes", and your code will be much more readable (if a tad more verbose).
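A sketch of such a decoder follows. The packet layout here is entirely hypothetical (a 2-byte unsigned length followed by a 4-byte checksum, big-endian), since the question does not describe the real format.

```java
import java.nio.ByteBuffer;

// Hypothetical decoder for an assumed layout: [length: 2 bytes][checksum: 4 bytes]
final class DecodedPacket {
    private final int length;
    private final int checksum;

    DecodedPacket(byte[] raw) {
        ByteBuffer buf = ByteBuffer.wrap(raw); // big-endian by default
        this.length = buf.getShort() & 0xFFFF; // unsigned 16-bit value
        this.checksum = buf.getInt();
    }

    int length() {
        return length;
    }

    int checksum() {
        return checksum;
    }
}
```

Tests can then assert on length() and checksum() and simply never look at the fields that hold random data.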

Attempt at parsing packets: Is there a Java equivalent to Python's "unpack"?

Is there an equivalent function to this Python function in Java?
struct.unpack(fmt, string)
I'm trying to port a parser written in Python to Java, and I'm looking for a way to implement the following line of code:
handle, msgVer, source, startTime, dataFormat, sampleCount, sampleInterval, physDim, digMin, digMax, physMin, physMax, freq, = unpack(self.headerFormat,self.unprocessed[pos:pos+calcsize(self.headerFormat)])
I'm using this in the context of a project where I receive bytes from the network and need to extract a specific part of the bytes to display them.
[EDIT 2]
The conclusion I had posted as an update was wrong. I deleted it to avoid misleading others.
I am not aware of any real equivalent to Python's unpack in Java.
The conventional approach would be to read the data from a stream (originating either from a socket, or from a byte array read from the socket, via a ByteArrayInputStream) using a DataInputStream. That class has a set of methods for reading various primitives.
In your case, you would do something like:
DataInputStream in; // wraps the socket stream or a ByteArrayInputStream
byte[] handle = new byte[6]; in.readFully(handle); // readFully takes a byte[], not a char[]
byte messageVersion = in.readByte();
byte source = in.readByte();
int startTime = in.readInt();
byte dataFormat = in.readByte();
byte sampleCount = in.readByte();
int sampleInterval = in.readInt();
short physDim = in.readShort();
int digMin = in.readInt();
int digMax = in.readInt();
float physMin = in.readFloat();
float physMax = in.readFloat();
int freq = in.readInt();
And then turn those variables into a suitable object.
Note that I've opted to pack each field into the smallest primitive which will hold it; that means putting unsigned values into signed types of the same size. You might prefer to put them in bigger types, so that large values do not turn negative (e.g. putting an unsigned short into an int); DataInputStream has readUnsignedByte() and readUnsignedShort() methods that you can use for that.
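DataInputStream always reads big-endian. If the Python format string specifies little-endian (a leading "<"), a ByteBuffer with an explicit byte order is one alternative. The sketch below parses only the first few fields and assumes a layout of a 6-byte ASCII handle, two unsigned bytes, and an unsigned 32-bit start time. The real headerFormat is not shown in the question, so both the layout and the class name HeaderPrefix are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Hypothetical parser for the start of a header, with an assumed layout.
final class HeaderPrefix {
    final String handle;
    final int msgVer;
    final int source;
    final long startTime;

    private HeaderPrefix(String handle, int msgVer, int source, long startTime) {
        this.handle = handle;
        this.msgVer = msgVer;
        this.source = source;
        this.startTime = startTime;
    }

    static HeaderPrefix parse(byte[] raw, ByteOrder order) {
        ByteBuffer buf = ByteBuffer.wrap(raw).order(order);
        byte[] handleBytes = new byte[6];
        buf.get(handleBytes);                         // 6 raw bytes
        String handle = new String(handleBytes, StandardCharsets.US_ASCII);
        int msgVer = buf.get() & 0xFF;                // unsigned byte into an int
        int source = buf.get() & 0xFF;
        long startTime = buf.getInt() & 0xFFFFFFFFL;  // unsigned int into a long
        return new HeaderPrefix(handle, msgVer, source, startTime);
    }
}
```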
