Write a bit-by-bit file in Java [duplicate]

To read/write binary files, I am using DataInputStream/DataOutputStream. They have writeByte()/readByte() methods, but what I want to do is read/write bits. Is that possible?
I want to use it for a compression algorithm: when compressing I want to write 3 bits for one number (and there are millions of such numbers in a file), and if I write a whole byte every time I need to write 3 bits, I will write loads of redundant data...

It's not possible to read/write individual bits directly; the smallest unit you can read/write is a byte.
You can use the standard bitwise operators to manipulate a byte though, so e.g. to get the lowest 2 bits of a byte, you'd do
byte b = in.readByte();
byte lowBits = (byte) (b & 0x3);
To set the low 4 bits to 1 and write the byte:
b |= 0xf;
out.writeByte(b);
(Note, for the sake of efficiency you might want to read/write byte arrays and not single bytes)
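Since the question is about 3-bit values specifically, here is a minimal sketch (not part of the answer above) of how the same bitwise operators can pack 3-bit numbers into whole bytes before writing; the ThreeBitPacker name and the accumulator layout are illustrative assumptions.
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical sketch: packs 3-bit values (0-7) into bytes using an int accumulator.
class ThreeBitPacker {
    private int accumulator = 0; // pending bits, least significant first
    private int bitCount = 0;    // how many bits are currently pending

    void write(int value, OutputStream out) throws IOException {
        accumulator |= (value & 0x7) << bitCount; // append the 3 low bits
        bitCount += 3;
        while (bitCount >= 8) {                   // flush whole bytes
            out.write(accumulator & 0xFF);
            accumulator >>>= 8;
            bitCount -= 8;
        }
    }

    void flush(OutputStream out) throws IOException {
        if (bitCount > 0) {
            out.write(accumulator & 0xFF);        // last partial byte, zero-padded
            accumulator = 0;
            bitCount = 0;
        }
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        ThreeBitPacker packer = new ThreeBitPacker();
        for (int v : new int[] {5, 1, 7, 2}) {    // four 3-bit values -> 12 bits -> 2 bytes
            packer.write(v, out);
        }
        packer.flush(out);
        System.out.println(out.size());           // 2
    }
}
Decoding reverses the process: read a byte, hand out 3 bits at a time, and refill the accumulator when fewer than 3 bits remain.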

There's no way to do it directly. The smallest unit computers can handle is a byte (even booleans take up a byte). However, you can create a custom stream class that packs a byte with the bits you want and then writes it. You can then make a wrapper for this class whose write function takes some integral type, checks that it's between 0 and 7 (or -4 and 3 ... or whatever), extracts the bits the same way the BitInputStream class (below) does, and makes the corresponding calls to the BitOutputStream's write method. You might be thinking that you could just make one set of IO stream classes, but 3 doesn't go into 8 evenly. So if you want optimum storage efficiency and you don't want to work really hard, you're kind of stuck with two layers of abstraction. Below is a BitOutputStream class, a corresponding BitInputStream class, and a program that makes sure they work.
import java.io.IOException;
import java.io.OutputStream;

class BitOutputStream {
    private OutputStream out;
    private boolean[] buffer = new boolean[8];
    private int count = 0;

    public BitOutputStream(OutputStream out) {
        this.out = out;
    }

    public void write(boolean x) throws IOException {
        this.count++;
        this.buffer[8 - this.count] = x;
        if (this.count == 8) {
            int num = 0;
            for (int index = 0; index < 8; index++) {
                num = 2 * num + (this.buffer[index] ? 1 : 0);
            }
            this.out.write(num - 128);
            this.count = 0;
        }
    }

    public void close() throws IOException {
        int num = 0;
        for (int index = 0; index < 8; index++) {
            num = 2 * num + (this.buffer[index] ? 1 : 0);
        }
        this.out.write(num - 128);
        this.out.close();
    }
}
I'm sure there's a way to pack the int with bitwise operators and thus avoid having to reverse the input, but I don't want to think that hard.
Also, you probably noticed that there is no local way to detect that the last bit has been read in this implementation, but I really don't want to think that hard.
import java.io.IOException;
import java.io.InputStream;

class BitInputStream {
    private InputStream in;
    private int num = 0;
    private int count = 8;

    public BitInputStream(InputStream in) {
        this.in = in;
    }

    public boolean read() throws IOException {
        if (this.count == 8) {
            this.num = this.in.read() + 128;
            this.count = 0;
        }
        boolean x = (num % 2 == 1);
        num /= 2;
        this.count++;
        return x;
    }

    public void close() throws IOException {
        this.in.close();
    }
}
You probably know this, but you should put a BufferedStream in between your BitStream and FileStream or it'll take forever.
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.Random;

class Test {
    private static final int n = 1000000;

    public static void main(String[] args) throws IOException {
        Random random = new Random();

        // Generate array
        long startTime = System.nanoTime();
        boolean[] outputArray = new boolean[n];
        for (int index = 0; index < n; index++) {
            outputArray[index] = random.nextBoolean();
        }
        System.out.println("Array generated in " + (double) (System.nanoTime() - startTime) / 1000 / 1000 / 1000 + " seconds.");

        // Write to file
        startTime = System.nanoTime();
        BitOutputStream fout = new BitOutputStream(new BufferedOutputStream(new FileOutputStream("booleans.bin")));
        for (int index = 0; index < n; index++) {
            fout.write(outputArray[index]);
        }
        fout.close();
        System.out.println("Array written to file in " + (double) (System.nanoTime() - startTime) / 1000 / 1000 / 1000 + " seconds.");

        // Read from file
        startTime = System.nanoTime();
        BitInputStream fin = new BitInputStream(new BufferedInputStream(new FileInputStream("booleans.bin")));
        boolean[] inputArray = new boolean[n];
        for (int index = 0; index < n; index++) {
            inputArray[index] = fin.read();
        }
        fin.close();
        System.out.println("Array read from file in " + (double) (System.nanoTime() - startTime) / 1000 / 1000 / 1000 + " seconds.");

        // Delete file
        new File("booleans.bin").delete();

        // Check equality
        boolean equal = true;
        for (int index = 0; index < n; index++) {
            if (outputArray[index] != inputArray[index]) {
                equal = false;
                break;
            }
        }
        System.out.println("Input " + (equal ? "equals " : "doesn't equal ") + "output.");
    }
}
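The answer mentions a wrapper whose write method takes a small integer and feeds its bits to the BitOutputStream, but doesn't show one. Here is a minimal sketch of such a 3-bit wrapper, assuming the BitOutputStream/BitInputStream classes above; the class and method names are illustrative only.
import java.io.IOException;

// Hypothetical wrapper: writes/reads values in the range 0-7 as three bits,
// most significant bit first, on top of the streams defined above.
class ThreeBitWriter {
    private final BitOutputStream out;

    ThreeBitWriter(BitOutputStream out) {
        this.out = out;
    }

    void write(int value) throws IOException {
        if (value < 0 || value > 7) {
            throw new IllegalArgumentException("value must be between 0 and 7");
        }
        out.write((value & 0x4) != 0); // bit 2
        out.write((value & 0x2) != 0); // bit 1
        out.write((value & 0x1) != 0); // bit 0
    }
}

class ThreeBitReader {
    private final BitInputStream in;

    ThreeBitReader(BitInputStream in) {
        this.in = in;
    }

    int read() throws IOException {
        int value = 0;
        for (int i = 0; i < 3; i++) {
            value = (value << 1) | (in.read() ? 1 : 0); // rebuild MSB first
        }
        return value;
    }
}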

Please take a look at my bit-io library https://github.com/jinahya/bit-io, which can read and write non-octet-aligned values such as a 1-bit boolean or 17-bit unsigned integer.
<dependency>
    <!-- resides in central repo -->
    <groupId>com.googlecode.jinahya</groupId>
    <artifactId>bit-io</artifactId>
    <version>1.0-alpha-13</version>
</dependency>
This library reads and writes arbitrary-length bits.
final InputStream stream;
final BitInput input = new BitInput(new BitInput.StreamInput(stream));
final boolean b = input.readBoolean(); // reads a 1-bit boolean value
final int i = input.readUnsignedInt(3); // reads a 3-bit unsigned int
final long l = input.readLong(47); // reads a 47-bit signed long
input.align(1); // 8-bit byte align; padding
final WritableByteChannel channel;
final BitOutput output = new BitOutput(new BitOutput.ChannelOutput(channel));
output.writeBoolean(true); // writes a 1-bit boolean value
output.writeInt(17, 0x00); // writes a 17-bit signed int
output.writeUnsignedLong(54, 0x00L); // writes a 54-bit unsigned long
output.align(4); // 32-bit byte align; discarding

InputStreams and OutputStreams are streams of bytes.
To read a bit you'll need to read a byte and then use bit manipulation to inspect the bits you care about. Likewise, to write bits you'll need to write bytes containing the bits you want.

Yes and no. On most modern computers, a byte is the smallest addressable unit of memory, so you can only read/write entire bytes at a time. However, you can always use bitwise operators to manipulate the bits within a byte.

Bits are packaged in bytes and apart from VHDL/Verilog I have seen no language that allows you to append individual bits to a stream. Cache up your bits and pack them into a byte for a write using a buffer and bitmasking. Do the reverse for read, i.e. keep a pointer in your buffer and increment it as you return individually masked bits.
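A rough sketch of the read side described here (the class and field names are mine, not from the answer): keep one cached byte and a bit pointer, and return individually masked bits until the byte is exhausted.
import java.io.IOException;
import java.io.InputStream;

// Hypothetical sketch of reading individually masked bits out of a cached byte.
class BitReader {
    private final InputStream in;
    private int current;        // the buffered byte
    private int bitPointer = 8; // 8 means "buffer empty, fetch the next byte"

    BitReader(InputStream in) {
        this.in = in;
    }

    // Returns 0 or 1, or -1 at end of stream.
    int readBit() throws IOException {
        if (bitPointer == 8) {
            current = in.read();
            if (current < 0) {
                return -1;
            }
            bitPointer = 0;
        }
        int bit = (current >> (7 - bitPointer)) & 1; // mask off the next bit, MSB first
        bitPointer++;
        return bit;
    }
}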

Afaik there is no function for doing this in the Java API. However you can of course read a byte and then use bit manipulation functions. Same goes for writing.

If you are just writing bits to a file, Java's BitSet class might be worth a look. From the javadoc:
This class implements a vector of bits that grows as needed. Each component of the bit set has a boolean value. The bits of a BitSet are indexed by nonnegative integers. Individual indexed bits can be examined, set, or cleared. One BitSet may be used to modify the contents of another BitSet through logical AND, logical inclusive OR, and logical exclusive OR operations.
You are able to convert BitSets to long[] and byte[] to save data to a file.
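For example, a BitSet can be dumped to and restored from a byte[] with toByteArray() and BitSet.valueOf(); the small round-trip sketch below is illustrative and not part of the answer.
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.BitSet;

public class BitSetFileDemo {
    public static void main(String[] args) throws IOException {
        BitSet bits = new BitSet();
        bits.set(0);
        bits.set(2);
        bits.set(5);

        Path file = Paths.get("bits.bin");
        Files.write(file, bits.toByteArray());                      // packed bytes, bit 0 first
        BitSet restored = BitSet.valueOf(Files.readAllBytes(file)); // read them back

        System.out.println(restored.equals(bits));                  // true
    }
}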

The below code should work
int[] mynumbers = {3, 4};
BitSet compressedNumbers = new BitSet(mynumbers.length * 3);
// let's say you encoded 3 as 101 and 4 as 010
String myNumbersAsBinaryString = "101010";
for (int i = 0; i < myNumbersAsBinaryString.length(); i++) {
    if (myNumbersAsBinaryString.charAt(i) == '1')
        compressedNumbers.set(i);
}
String path = Resources.getResource("myfile.out").getPath();
ObjectOutputStream outputStream = null;
try {
    outputStream = new ObjectOutputStream(new FileOutputStream(path));
    outputStream.writeObject(compressedNumbers);
} catch (IOException e) {
    e.printStackTrace();
}

Related

How do I improve the runtime of my algorithm?

The aim is: given a file whose first line gives the number of lines that follow, find how many pairs of lines are permutations of each other. For example, AABA is a permutation of BAAA. The code is written in Java. This is my current code:
import java.io.BufferedReader;
import java.io.FileReader;
import java.util.Arrays;

public class SpeedDemon {
    public class Data {
        byte[] dataValues;
        byte duplicate = 1;
        int hashcode;

        public Data(byte[] input) {
            dataValues = new byte[128];
            for (byte x : input) {
                if (x == 10) {
                    break;
                }
                dataValues[x]++;
            }
            hashcode = Arrays.hashCode(dataValues);
        }

        public boolean equal(Data o) {
            return this.hashcode == o.hashcode && Arrays.equals(o.dataValues, this.dataValues);
        }
    }

    public int processData(String fileName) {
        try {
            BufferedReader reader = new BufferedReader(new FileReader(fileName));
            int size = Integer.parseInt(reader.readLine());
            int arr_size = 2;
            while (arr_size < size) {
                arr_size *= 2;
            }
            Data[] map = new Data[arr_size];
            int z = 0;
            Data data;
            int j;
            for (int i = 0; i < size; i++) {
                data = new Data(reader.readLine().getBytes());
                j = data.hashcode;
                j ^= (j >>> 16);
                j &= (arr_size - 1);
                while (true) {
                    if (map[j] == null) {
                        map[j] = data;
                        break;
                    } else {
                        if (map[j].equal(data)) {
                            z += map[j].duplicate++;
                            break;
                        } else {
                            j = j == arr_size - 1 ? 0 : j + 1;
                        }
                    }
                }
            }
            return z;
        } catch (Exception ex) { }
        return 0;
    }

    public static void main(String[] args) {
        System.out.println(new SpeedDemon().processData(args[0]));
    }
}
I would like to know if there is any way to improve the time efficiency of the program. It is part of my class contest and some people have managed runtimes around 25% faster. I tried different array sizes and this seems to work the best.
Multiply arr_size by 4. You need a lot of free slots to make open addressing efficient, and depending on what size is, you may not be getting very many right now.
Specify a larger buffer size on your buffered reader to reduce the I/O count. 32768 would be reasonable.
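A minimal sketch of those first two suggestions applied to the original setup (the helper class and method names are mine, not part of the answer):
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

class Tweaks {
    // Second constructor argument is the buffer size; 32768 cuts down the I/O count.
    static BufferedReader openReader(String fileName) throws IOException {
        return new BufferedReader(new FileReader(fileName), 32768);
    }

    // Same power-of-two sizing as the original, then multiplied by 4 for extra free slots.
    static int tableSize(int size) {
        int arr_size = 2;
        while (arr_size < size) {
            arr_size *= 2;
        }
        return arr_size * 4;
    }
}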
Then work on efficiency in Data. Both the hashing and comparison operations need to iterate through all 128 possible byte values, which is unnecessary.
Are you sure your code even gets the correct answer? It doesn't seem likely.
The easiest way to determine if two strings are permutations of each other is to sort the strings and compare them. With that in mind, an easier and faster way to code this up would be to use a Map. Something like this:
Create a new Map where the key and value are both strings
for each line of the file
s = read string from file
sortedString = sort(s) // sort characters in the string
if (map.contains(sortedString))
you found a duplicate
else
map.insert(sortedString, string) // the key is the sorted string
end for
There are other ways to do this, but that's the easiest way I know of, and probably the fastest.
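A minimal Java sketch of that sort-and-map approach (the class name and counting logic are mine; it counts a pair for every line whose sorted form has been seen before):
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class PermutationPairs {
    public static void main(String[] args) throws IOException {
        try (BufferedReader reader = new BufferedReader(new FileReader(args[0]), 32768)) {
            int size = Integer.parseInt(reader.readLine());
            Map<String, Integer> seen = new HashMap<>(); // sorted form -> occurrences so far
            long pairs = 0;
            for (int i = 0; i < size; i++) {
                char[] chars = reader.readLine().toCharArray();
                Arrays.sort(chars);                      // permutations share the same sorted form
                String key = new String(chars);
                int previous = seen.getOrDefault(key, 0);
                pairs += previous;                       // this line pairs with every earlier match
                seen.put(key, previous + 1);
            }
            System.out.println(pairs);
        }
    }
}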

Is there a way to pass a Java class to new file within Eclipse?

I'm a new programmer, so please be patient with me!
I have two files created: tableBlocks.java and buildBuffer.java.
Within the file tableBlocks, I read in a file, perform some mathematical operations (not shown below), and insert the results into a byte array.
Within the file buildBuffer, I build an empty buffer (another array).
I want to be able to insert the built byte array from tableBlocks into the buffer array located in buildBuffer. However, I haven't been able to pass information between files.
I tried extends and implements, neither with any luck. Inserting into an array is difficult enough in Java (apparently).
Table7Block2.java
public final class Table7Block2 {
    private static final String INPUT_FILE_NAME = "C:\\Users\\CASPERLS\\BinaryData.bin";
    private static final String OUTPUT_FILE_NAME = "C:\\Users\\CASPERLS\\BinaryWritten.bin";

    public static void main(String... args) {
        Table7Block2 test = new Table7Block2();
        byte[] fileContents = test.read(INPUT_FILE_NAME); // create empty array
        test.write(fileContents, OUTPUT_FILE_NAME);
    }
}
buildBuffer.java
public class buildBuffer {
    public static void main(String[] args) {
        byte[] bytebuf = new byte[236];
        int position, n, value;
        n = 235;
        position = 11;
        value = 1;
        int c;
        for (c = n - 1; c >= position - 1; c--) {
            bytebuf[c + 1] = bytebuf[c];
        }
        bytebuf[position] = (byte) value;
        System.out.println("Resultant array is");
        for (c = 0; c <= n; c++) {
            System.out.print(bytebuf[c]);
        }
    }
}
I want to be able to call "bytebuf" in my TableBlock code, and insert the data I read in from the random file into my bytebuf array.
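As a hedged sketch of one common way to share the array between two classes: have one class expose the buffer through a method and let the other fill it. All class and method names below are hypothetical, not from the question.
// Hypothetical sketch: BufferBuilder owns the byte array, TableBlocks fills it.
class BufferBuilder {
    private final byte[] bytebuf = new byte[236];

    byte[] getBuffer() {
        return bytebuf;
    }
}

class TableBlocks {
    // Copies file contents into the shared buffer starting at the given position.
    static void insertInto(byte[] buffer, byte[] fileContents, int position) {
        int length = Math.min(fileContents.length, buffer.length - position);
        System.arraycopy(fileContents, 0, buffer, position, length);
    }

    public static void main(String[] args) {
        BufferBuilder builder = new BufferBuilder();
        byte[] fileContents = {1, 2, 3};   // stand-in for the bytes read from the file
        insertInto(builder.getBuffer(), fileContents, 11);
        System.out.println(java.util.Arrays.toString(builder.getBuffer()));
    }
}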

Pack header and data layout in one byte array using ByteBuffer in an efficient way?

I have a header and data which I need to represent in one byte array. I have a particular format for packing the header into a byte array and a different format for packing the data into a byte array. Once I have these two, I need to make one final byte array out of them.
Below is the layout as defined in C++, which I have to reproduce in Java.
// below is my header offsets layout
// addressedCenter must be the first byte
static constexpr uint32_t addressedCenter = 0;
static constexpr uint32_t version = addressedCenter + 1;
static constexpr uint32_t numberOfRecords = version + 1;
static constexpr uint32_t bufferUsed = numberOfRecords + sizeof(uint32_t);
static constexpr uint32_t location = bufferUsed + sizeof(uint32_t);
static constexpr uint32_t locationFrom = location + sizeof(CustomerAddress);
static constexpr uint32_t locationOrigin = locationFrom + sizeof(CustomerAddress);
static constexpr uint32_t partition = locationOrigin + sizeof(CustomerAddress);
static constexpr uint32_t copy = partition + 1;
// this is the full size of the header
static constexpr uint32_t headerOffset = copy + 1;
CustomerAddress is a typedef for uint64_t, and it is made up like this:
typedef uint64_t CustomerAddress;
void client_data(uint8_t datacenter,
                 uint16_t clientId,
                 uint8_t dataId,
                 uint32_t dataCounter,
                 CustomerAddress& customerAddress)
{
    customerAddress = (uint64_t(datacenter) << 56)
                    + (uint64_t(clientId) << 40)
                    + (uint64_t(dataId) << 32)
                    + dataCounter;
}
And below is my data layout -
// below is my data layout -
//
// key type - 1 byte
// key len - 1 byte
// key (variable size = key_len)
// timestamp (sizeof uint64_t)
// data size (sizeof uint16_t)
// data (variable size = data size)
Problem statement:
Now, as part of the project, I am trying to represent all of this in one Java class so that I can just pass the necessary fields and it can give me one final byte array which will have the header first and then the data:
Below is my DataFrame class:
public final class DataFrame {
    private final byte addressedCenter;
    private final byte version;
    private final Map<byte[], byte[]> keyDataHolder;
    private final long location;
    private final long locationFrom;
    private final long locationOrigin;
    private final byte partition;
    private final byte copy;

    public DataFrame(byte addressedCenter, byte version,
            Map<byte[], byte[]> keyDataHolder, long location, long locationFrom,
            long locationOrigin, byte partition, byte copy) {
        this.addressedCenter = addressedCenter;
        this.version = version;
        this.keyDataHolder = keyDataHolder;
        this.location = location;
        this.locationFrom = locationFrom;
        this.locationOrigin = locationOrigin;
        this.partition = partition;
        this.copy = copy;
    }

    public byte[] serialize() {
        // All of the data is embedded in a binary array with fixed maximum size 70000
        ByteBuffer byteBuffer = ByteBuffer.allocate(70000);
        byteBuffer.order(ByteOrder.BIG_ENDIAN);
        int numOfRecords = keyDataHolder.size();
        int bufferUsed = getBufferUsed(keyDataHolder); // 36 + dataSize + 1 + 1 + keyLength + 8 + 2;

        // header layout
        byteBuffer.put(addressedCenter); // byte
        byteBuffer.put(version); // byte
        byteBuffer.putInt(numOfRecords); // int
        byteBuffer.putInt(bufferUsed); // int
        byteBuffer.putLong(location); // long
        byteBuffer.putLong(locationFrom); // long
        byteBuffer.putLong(locationOrigin); // long
        byteBuffer.put(partition); // byte
        byteBuffer.put(copy); // byte

        // now the data layout
        for (Map.Entry<byte[], byte[]> entry : keyDataHolder.entrySet()) {
            byte keyType = 0;
            byte keyLength = (byte) entry.getKey().length;
            byte[] key = entry.getKey();
            byte[] data = entry.getValue();
            short dataSize = (short) data.length;
            ByteBuffer dataBuffer = ByteBuffer.wrap(data);
            long timestamp = 0;
            if (dataSize > 10) {
                timestamp = dataBuffer.getLong(2);
            }
            byteBuffer.put(keyType);
            byteBuffer.put(keyLength);
            byteBuffer.put(key);
            byteBuffer.putLong(timestamp);
            byteBuffer.putShort(dataSize);
            byteBuffer.put(data);
        }
        return byteBuffer.array();
    }

    private int getBufferUsed(final Map<byte[], byte[]> keyDataHolder) {
        int size = 36;
        for (Map.Entry<byte[], byte[]> entry : keyDataHolder.entrySet()) {
            size += 1 + 1 + 8 + 2;
            size += entry.getKey().length;
            size += entry.getValue().length;
        }
        return size;
    }
}
And below is how I am using my above DataFrame class:
public static void main(String[] args) throws IOException {
    // header layout
    byte addressedCenter = 0;
    byte version = 1;
    long location = packCustomerAddress((byte) 12, (short) 13, (byte) 32, (int) 120);
    long locationFrom = packCustomerAddress((byte) 21, (short) 23, (byte) 41, (int) 130);
    long locationOrigin = packCustomerAddress((byte) 21, (short) 24, (byte) 41, (int) 140);
    byte partition = 3;
    byte copy = 0;

    // this map will have key as the actual key and value as the actual data, both in byte array
    // for now I am storing only two entries in this map
    Map<byte[], byte[]> keyDataHolder = new HashMap<byte[], byte[]>();
    for (int i = 1; i <= 2; i++) {
        keyDataHolder.put(generateKey(), getMyData());
    }

    DataFrame records =
        new DataFrame(addressedCenter, version, keyDataHolder, location, locationFrom,
            locationOrigin, partition, copy);

    // this will give me final packed byte array
    // which will have header and data in it.
    byte[] packedArray = records.serialize();
}

private static long packCustomerAddress(byte datacenter, short clientId, byte dataId,
        int dataCounter) {
    return ((long) (datacenter) << 56) | ((long) clientId << 40) | ((long) dataId << 32)
        | ((long) dataCounter);
}
As you can see in my DataFrame class, I am allocating a ByteBuffer with a predefined size of 70000. Is there a better way to size the ByteBuffer instead of using a hardcoded 70000?
Also, is there any better way than what I am doing to pack my header and data into one byte array? I also need to make sure it is thread-safe, since it can be called by multiple threads.
Is there a better way to size the ByteBuffer instead of using a hardcoded 70000?
There are at least two non-overlapping approaches. You may use both.
One is buffer pooling. You should find out how many buffers you need during peak periods, and use a maximum above it, e.g. max + max / 2, max + average, max + mode, 2 * max.
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.concurrent.CompletionStage;
import java.util.concurrent.LinkedBlockingDeque;
import java.util.function.Consumer;
import java.util.function.Function;

public class ByteBufferPool {
    private final int bufferCapacity;
    private final LinkedBlockingDeque<ByteBuffer> queue;

    public ByteBufferPool(int limit, int bufferCapacity) {
        if (limit < 0) throw new IllegalArgumentException("limit must not be negative.");
        if (bufferCapacity < 0) throw new IllegalArgumentException("bufferCapacity must not be negative.");
        this.bufferCapacity = bufferCapacity;
        this.queue = (limit == 0) ? null : new LinkedBlockingDeque<>(limit);
    }

    public ByteBuffer acquire() {
        ByteBuffer buffer = (queue == null) ? null : queue.pollFirst();
        if (buffer == null) {
            buffer = ByteBuffer.allocate(bufferCapacity);
        } else {
            buffer.clear();
            buffer.order(ByteOrder.BIG_ENDIAN);
        }
        return buffer;
    }

    public boolean release(ByteBuffer buffer) {
        if (buffer == null) throw new IllegalArgumentException("buffer must not be null.");
        if (buffer.capacity() != bufferCapacity) throw new IllegalArgumentException("buffer has unsupported capacity.");
        if (buffer.isDirect()) throw new IllegalArgumentException("buffer must not be direct.");
        if (buffer.isReadOnly()) throw new IllegalArgumentException("buffer must not be read-only.");
        return (queue == null) ? false : queue.offerFirst(buffer);
    }

    public void withBuffer(Consumer<ByteBuffer> action) {
        if (action == null) throw new IllegalArgumentException("action must not be null.");
        ByteBuffer buffer = acquire();
        try {
            action.accept(buffer);
        } finally {
            release(buffer);
        }
    }

    public <T> T withBuffer(Function<ByteBuffer, T> function) {
        if (function == null) throw new IllegalArgumentException("function must not be null.");
        ByteBuffer buffer = acquire();
        try {
            return function.apply(buffer);
        } finally {
            release(buffer);
        }
    }

    public <T> CompletionStage<T> withBufferAsync(Function<ByteBuffer, CompletionStage<T>> asyncFunction) {
        if (asyncFunction == null) throw new IllegalArgumentException("asyncFunction must not be null.");
        ByteBuffer buffer = acquire();
        CompletionStage<T> future = null;
        try {
            future = asyncFunction.apply(buffer);
        } finally {
            if (future == null) {
                release(buffer);
            } else {
                future = future.whenComplete((result, throwable) -> release(buffer));
            }
        }
        return future;
    }
}
The withBuffer methods allow straightforward usage of the pool, while acquire and release allow separating the acquisition and release points.
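For example, serializing into a pooled buffer could look like the following; the pool sizes and the stand-in writes are illustrative assumptions, not part of the answer.
import java.nio.ByteBuffer;
import java.util.function.Function;

class PoolUsageDemo {
    public static void main(String[] args) {
        // Hypothetical usage of the pool above: up to 16 pooled buffers of 70000 bytes each.
        ByteBufferPool pool = new ByteBufferPool(16, 70000);

        Function<ByteBuffer, byte[]> copyOut = buffer -> {
            buffer.put((byte) 0);   // stand-in for the real header/data writes
            buffer.putInt(42);
            buffer.flip();
            byte[] copy = new byte[buffer.remaining()];
            buffer.get(copy);
            return copy;            // the pooled buffer is released by withBuffer
        };

        byte[] packed = pool.withBuffer(copyOut);
        System.out.println(packed.length); // 5
    }
}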
Another one is segregating the serialization interface, e.g. the put, putInt and putLong, where you can then implement a byte counting class and an actual byte buffering class. You should add a method to such interface to know if the serializer is counting bytes or buffering, in order to avoid unnecessary byte generation, and another method to increment byte usage directly, useful when calculating the size of a string in some encoding without actually serializing.
public interface ByteSerializer {
    ByteSerializer put(byte value);

    ByteSerializer putInt(int value);

    ByteSerializer putLong(long value);

    boolean isSerializing();

    ByteSerializer add(int bytes);

    int position();
}
public class ByteCountSerializer implements ByteSerializer {
    private int count = 0;

    @Override
    public ByteSerializer put(byte value) {
        count += 1;
        return this;
    }

    @Override
    public ByteSerializer putInt(int value) {
        count += 4;
        return this;
    }

    @Override
    public ByteSerializer putLong(long value) {
        count += 8;
        return this;
    }

    @Override
    public boolean isSerializing() {
        return false;
    }

    @Override
    public ByteSerializer add(int bytes) {
        if (bytes < 0) throw new IllegalArgumentException("bytes must not be negative.");
        count += bytes;
        return this;
    }

    @Override
    public int position() {
        return count;
    }
}
import java.nio.ByteBuffer;

public class ByteBufferSerializer implements ByteSerializer {
    private final ByteBuffer buffer;

    public ByteBufferSerializer(int bufferCapacity) {
        if (bufferCapacity < 0) throw new IllegalArgumentException("bufferCapacity must not be negative.");
        this.buffer = ByteBuffer.allocate(bufferCapacity);
    }

    @Override
    public ByteSerializer put(byte value) {
        buffer.put(value);
        return this;
    }

    @Override
    public ByteSerializer putInt(int value) {
        buffer.putInt(value);
        return this;
    }

    @Override
    public ByteSerializer putLong(long value) {
        buffer.putLong(value);
        return this;
    }

    @Override
    public boolean isSerializing() {
        return true;
    }

    @Override
    public ByteSerializer add(int bytes) {
        if (bytes < 0) throw new IllegalArgumentException("bytes must not be negative.");
        for (int b = 0; b < bytes; b++) {
            buffer.put((byte) 0);
        }
        return this;
        // or throw new UnsupportedOperationException();
    }

    @Override
    public int position() {
        return buffer.position();
    }

    public ByteBuffer buffer() {
        return buffer;
    }
}
In your code, you'd do something along these lines (not tested):
ByteCountSerializer counter = new ByteCountSerializer();
dataFrame.serialize(counter);
ByteBufferSerializer serializer = new ByteBufferSerializer(counter.position());
dataFrame.serialize(serializer);
ByteBuffer buffer = serializer.buffer();
// ... write buffer, ?, profit ...
Your DataFrame.serialize method should be refactored to accept a ByteSerializer, and in cases where it would generate data, it should check isSerializing to know if it should only calculate the size or actually write bytes.
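As a rough sketch of that refactor (not the answer's code): a single method writes through the ByteSerializer interface above and consults isSerializing to skip expensive work during the counting pass. The HeaderWriter class and computeExpensiveField are illustrative stand-ins for the question's DataFrame and getBufferUsed.
// Hypothetical sketch: one serialize method used both for counting and for writing.
class HeaderWriter {
    private final byte addressedCenter;
    private final byte version;
    private final int numOfRecords;

    HeaderWriter(byte addressedCenter, byte version, int numOfRecords) {
        this.addressedCenter = addressedCenter;
        this.version = version;
        this.numOfRecords = numOfRecords;
    }

    void serializeTo(ByteSerializer out) {
        out.put(addressedCenter)
           .put(version)
           .putInt(numOfRecords);
        if (out.isSerializing()) {
            out.putInt(computeExpensiveField()); // only compute when actually writing
        } else {
            out.add(4);                          // just account for those 4 bytes when counting
        }
    }

    private int computeExpensiveField() {
        return 36; // stand-in for getBufferUsed(...)
    }

    public static void main(String[] args) {
        HeaderWriter header = new HeaderWriter((byte) 0, (byte) 1, 2);

        ByteCountSerializer counter = new ByteCountSerializer();
        header.serializeTo(counter);                      // first pass: count bytes

        ByteBufferSerializer writer = new ByteBufferSerializer(counter.position());
        header.serializeTo(writer);                       // second pass: actually write

        System.out.println(writer.buffer().position());   // same as counter.position()
    }
}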
I leave combining both approaches as an exercise, mainly because it depends a lot on how you decide to do it.
For instance, you may make ByteBufferSerializer use the pool directly and keep an arbitrary capacity (e.g. your 70000), you may pool ByteBuffers by capacity (but instead of the needed capacity, use the least power of 2 greater than the needed capacity, and set the buffer's limit before returning from acquire), or you may pool ByteBufferSerializers directly as long as you add a reset() method.
Also, is there any better way than what I am doing to pack my header and data into one byte array?
Yes. Pass around the byte buffering instance instead of having certain methods return byte arrays which are discarded the moment after their length is checked or their contents are copied.
I also need to make sure it is thread-safe, since it can be called by multiple threads.
As long as each buffer is being used by only one thread, with proper synchronization, you don't have to worry.
Proper synchronization means your pool manager has acquire and release semantics in its methods, and that if a buffer is used by multiple threads between fetching it from and returning it to the pool, you are adding release semantics in the thread that stops using the buffer and adding acquire semantics in the thread that starts using the buffer. For instance, if you're passing the buffer through CompletableFutures, you shouldn't have to worry about this, or if you're communicating explicitly between threads with an Exchanger or a proper implementation of BlockingQueue.
From java.util.concurrent's package description:
The methods of all classes in java.util.concurrent and its subpackages extend these guarantees to higher-level synchronization. In particular:
Actions in a thread prior to placing an object into any concurrent collection happen-before actions subsequent to the access or removal of that element from the collection in another thread.
Actions in a thread prior to the submission of a Runnable to an Executor happen-before its execution begins. Similarly for Callables submitted to an ExecutorService.
Actions taken by the asynchronous computation represented by a Future happen-before actions subsequent to the retrieval of the result via Future.get() in another thread.
Actions prior to "releasing" synchronizer methods such as Lock.unlock, Semaphore.release, and CountDownLatch.countDown happen-before actions subsequent to a successful "acquiring" method such as Lock.lock, Semaphore.acquire, Condition.await, and CountDownLatch.await on the same synchronizer object in another thread.
For each pair of threads that successfully exchange objects via an Exchanger, actions prior to the exchange() in each thread happen-before those subsequent to the corresponding exchange() in another thread.
Actions prior to calling CyclicBarrier.await and Phaser.awaitAdvance (as well as its variants) happen-before actions performed by the barrier action, and actions performed by the barrier action happen-before actions subsequent to a successful return from the corresponding await in other threads.
Another way of doing it would be via a DataOutputStream around a ByteArrayOutputStream, but you should concentrate your performance tuning around the places it's needed, and this isn't one of them. Efficiency isn't any kind of an issue here. The network I/O will dominate by orders of magnitude.
Another reason to use a ByteArrayOutputStream is that you don't have to guess the buffer size in advance: it will grow as necessary.
To keep it thread-safe, use only local variables.
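A minimal sketch of that DataOutputStream/ByteArrayOutputStream alternative, with placeholder values standing in for the real header fields:
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

class GrowingSerializeDemo {
    static byte[] serialize() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream(); // grows as needed
        DataOutputStream out = new DataOutputStream(bytes);        // writes big-endian primitives

        // header (placeholder values)
        out.writeByte(0);    // addressedCenter
        out.writeByte(1);    // version
        out.writeInt(2);     // numberOfRecords
        out.writeLong(0L);   // location
        // ... remaining header fields and the per-entry data layout go here ...

        out.flush();
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(serialize().length);
    }
}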

Conversion from parsing a String to double is returning 0.0:

This is a follow up on the last question I made regarding this topic. It's a different issue though.
My code is working, except it's copying some sort of address when I use copyOfRange. It always returns 0.0 due to an address of some sort, instead of the section of the getBits array.
Can someone please scan this and make a suggestion? I am going crazy over this (it's not an assignment).
package runTests;

import java.util.Arrays;

public class runTestGetBinaryStrands {
    protected static int getBits[] = {1, 0, 1, 1, 0, 1, 0, 0, 0, 1, 1, 0, 1, 0, 1, 0};
    double numerator, denominator, x, y;

    public static void main(String[] args) {
        runTestGetBinaryStrands test = new runTestGetBinaryStrands();
        test.getNumber(null, getBits);
    }

    /* NOTE ON THIS FOR LOOP:
     * Divided the bits array in half & converted two different binary values to a string.
     * I parsed the string to an int value, which can be saved to a double and be treated like a decimal value.
     * I got the first 8 elements and stashed them into numerator, and did the same for denominator with the remaining array bits.
     *
     * The chromosome has one binary string, made up of a bunch of smaller parts.
     * You use getNumber in the chromosome to get out the values of the parts.
     */
    public void getNumber(String convert, int[] tempBinary) {
        for (int i = 0; i < getBits.length; i++) {
            for (int j = 0; j < getBits.length; j++) { // start at index 0 to 7 = 8.
                tempBinary = Arrays.copyOfRange(getBits, 0, 7); // Get first set of 8 elements.
                convert = tempBinary.toString();
                System.out.println(convert);
                try {
                    numerator = Integer.parseInt(convert); // converts string to one whole section
                } catch (NumberFormatException ex) {
                }
                System.out.println("See Numerator's value: " + numerator);

                tempBinary = Arrays.copyOfRange(getBits, 8, 15); // Get second set of 8 elements.
                convert = tempBinary.toString();
                try {
                    denominator = Integer.parseInt(convert); // converts string to one whole section
                } catch (NumberFormatException ex) {
                }
                System.out.println("See Denominator's value: " + denominator);
            }
        }
    }
}
Replace the lines convert = tempBinary.toString(); with:
convert = "";
for(int bin : tempBinary){
convert += bin;
}
That should get your conversion working.

Get size of String w/ encoding in bytes without converting to byte[]

I have a situation where I need to know the size of a String/encoding pair, in bytes, but cannot use the getBytes() method because 1) the String is very large and duplicating the String in a byte[] array would use a large amount of memory, but more to the point 2) getBytes() allocates a byte[] array based on the length of the String times the maximum possible bytes per character. So if I have a String with 1.5B characters and UTF-16 encoding, getBytes() will try to allocate a 3GB array and fail, since arrays are limited to 2^31 - X elements (X is Java version specific).
So - is there some way to calculate the byte size of a String/encoding pair directly from the String object?
UPDATE:
Here's a working implementation of jtahlborn's answer:
private class CountingOutputStream extends OutputStream {
    int total;

    @Override
    public void write(int i) {
        throw new RuntimeException("don't use");
    }

    @Override
    public void write(byte[] b) {
        total += b.length;
    }

    @Override
    public void write(byte[] b, int offset, int len) {
        total += len;
    }
}
Simple, just write it to a dummy output stream:
class CountingOutputStream extends OutputStream {
    private int _total;

    @Override
    public void write(int b) {
        ++_total;
    }

    @Override
    public void write(byte[] b) {
        _total += b.length;
    }

    @Override
    public void write(byte[] b, int offset, int len) {
        _total += len;
    }

    public int getTotalSize() {
        return _total;
    }
}
CountingOutputStream cos = new CountingOutputStream();
Writer writer = new OutputStreamWriter(cos, "my_encoding");
//writer.write(myString);
// UPDATE: OutputStreamWriter does a simple copy of the _entire_ input string, to avoid that use:
for (int i = 0; i < myString.length(); i += 8096) {
    int end = Math.min(myString.length(), i + 8096);
    writer.write(myString, i, end - i);
}
writer.flush();
System.out.println("Total bytes: " + cos.getTotalSize());
It's not only simple, but probably just as fast as the other "complex" answers.
The same using apache-commons libraries:
public static long stringLength(String string, Charset charset) {
    try (NullOutputStream nul = new NullOutputStream();
         CountingOutputStream count = new CountingOutputStream(nul)) {
        IOUtils.write(string, count, charset.name());
        count.flush();
        return count.getCount();
    } catch (IOException e) {
        throw new IllegalStateException("Unexpected I/O.", e);
    }
}
Guava has an implementation according to this post:
Utf8.encodedLength()
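If UTF-8 is the target encoding, that Guava method takes a CharSequence directly, so no byte[] copy is made; a small usage sketch (assuming Guava is on the classpath, not part of the linked post):
import com.google.common.base.Utf8;

class Utf8LengthDemo {
    public static void main(String[] args) {
        String s = "héllo";                 // 'é' needs two bytes in UTF-8
        int bytes = Utf8.encodedLength(s);  // counts without allocating a byte[]
        System.out.println(bytes);          // 6
    }
}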
Here's an apparently working implementation:
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class TestUnicode {
    private final static int ENCODE_CHUNK = 100;

    public static long bytesRequiredToEncode(final String s,
            final Charset encoding) {
        long count = 0;
        for (int i = 0; i < s.length(); ) {
            int end = i + ENCODE_CHUNK;
            if (end >= s.length()) {
                end = s.length();
            } else if (Character.isHighSurrogate(s.charAt(end))) {
                end++;
            }
            count += encoding.encode(s.substring(i, end)).remaining() + 1;
            i = end;
        }
        return count;
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 100; i++) {
            sb.appendCodePoint(11614);
            sb.appendCodePoint(1061122);
            sb.appendCodePoint(2065);
            sb.appendCodePoint(1064124);
        }
        Charset cs = StandardCharsets.UTF_8;
        System.out.println(bytesRequiredToEncode(new String(sb), cs));
        System.out.println(new String(sb).getBytes(cs).length);
    }
}
The output is:
1400
1400
In practice I'd increase ENCODE_CHUNK to 10MChars or so.
Probably slightly less efficient than brettw's answer, but simpler to implement.
Ok, this is extremely gross. I admit that, but this stuff is hidden by the JVM, so we have to dig a little. And sweat a little.
First, we want the actual char[] that backs a String without making a copy. To do this we have to use reflection to get at the 'value' field:
char[] chars = null;
for (Field field : String.class.getDeclaredFields()) {
    if ("value".equals(field.getName())) {
        field.setAccessible(true);
        chars = (char[]) field.get(string); // <--- got it!
        break;
    }
}
Next you need to implement a subclass of java.nio.ByteBuffer. Something like:
class MyByteBuffer extends ByteBuffer {
    int length;
    // Your implementation here
};
Ignore all of the getters, implement all of the put methods like put(byte) and putChar(char) etc. Inside something like put(byte), increment length by 1, inside of put(byte[]) increment length by the array length. Get it? Everything that is put, you add the size of whatever it is to length. But you're not storing anything in your ByteBuffer, you're just counting and throwing away, so no space is taken. If you breakpoint the put methods, you can probably figure out which ones you actually need to implement. putFloat(float) is probably not used, for example.
Now for the grand finale, putting it all together:
MyByteBuffer bbuf = new MyByteBuffer(); // your "counting" buffer
CharBuffer cbuf = CharBuffer.wrap(chars); // wrap your char array
Charset charset = Charset.forName("UTF-8"); // your charset goes here
CharsetEncoder encoder = charset.newEncoder(); // make a new encoder
encoder.encode(cbuf, bbuf, true); // do it!
System.out.printf("Length: %d\n", bbuf.length); // pay me US$1,000,000
