I have a Socket that I am both reading and writing to, via BufferedReaders and BufferedWriters. I'm not sure which operations are okay to do from separate threads. I would guess that writing to the socket from two different threads at the same time is a bad idea. Same with reading off the socket from two different threads at the same time. What about reading on one thread while writing on another?
I ask because I want to have one thread blocked for a long time on a read as it waits for more data, but during this wait I also have occasional data to send on the socket. I'm not clear on whether this is thread-safe, or if I should cancel the read before I write (which would be annoying).
Sockets are not thread-safe at the stream level; you have to provide synchronization. The only guarantee is that you won't get copies of the exact same bytes in different read invocations, regardless of concurrency.
But at the Reader and, especially, the Writer level, you might have some locking problems.
Anyway, you can handle read and write operations on the Socket's streams as if they were completely independent objects (they are; the only thing they share is their lifecycle).
Once you have provided correct synchronization among reader threads on one hand, and writer threads on the other hand, any number of readers and writers will be okay. This means that, yes, you can read on one thread and write on another (in fact that's very frequent), and you don't have to stop reading while writing.
One last piece of advice: all of these blocking operations have associated timeouts; make sure you handle the timeouts correctly.
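To make this concrete, here is a minimal sketch of that pattern (the class and method names are my own, not from the question): one thread blocks in a read loop while any other thread can send, with writes serialized on a private lock.

import java.io.*;
import java.net.Socket;

public class SocketSession {
    private final Object writeLock = new Object();
    private final BufferedReader in;
    private final BufferedWriter out;

    public SocketSession(Socket socket) throws IOException {
        in = new BufferedReader(new InputStreamReader(socket.getInputStream()));
        out = new BufferedWriter(new OutputStreamWriter(socket.getOutputStream()));
    }

    // Runs on its own thread; blocking here does not affect writers.
    public void readLoop() throws IOException {
        String line;
        while ((line = in.readLine()) != null) {
            System.out.println("received: " + line);
        }
    }

    // May be called from any thread; the lock serializes concurrent writers.
    public void send(String message) throws IOException {
        synchronized (writeLock) {
            out.write(message);
            out.newLine();
            out.flush();
        }
    }
}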
You actually read from an InputStream and write to an OutputStream. They are fairly independent, and as long as you serialize access to each of them you are OK.
You have to correlate, however, the data that you send with the data that you receive. That's different from thread safety.
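For example, one common way to correlate them is to frame every message, say with a length prefix, so the reader always knows where one message ends and the next begins (a sketch; the 4-byte length prefix is an assumption of mine, not something the question mandates):

import java.io.*;

public class Framing {
    // Writer side: a length prefix followed by the payload.
    public static void writeMessage(DataOutputStream out, byte[] payload) throws IOException {
        out.writeInt(payload.length); // 4-byte length prefix
        out.write(payload);
        out.flush();
    }

    // Reader side: readFully blocks until the whole frame has arrived.
    public static byte[] readMessage(DataInputStream in) throws IOException {
        int len = in.readInt();
        byte[] payload = new byte[len];
        in.readFully(payload);
        return payload;
    }
}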
Java's java.net.Socket is not actually thread-safe: open the Socket source and look at the (say) connected member field and how it is used. You will see that it is not volatile, and is read and updated without synchronization. This indicates that the Socket class is not designed to be used by multiple threads. Though there are some locks and synchronization in there, it is not consistent.
I recommend not doing it. Instead, use NIO buffers, and do the socket reads/writes in one thread.
For details, see the discussion.
You can have one thread reading the socket and another thread writing to it. If you want a number of threads to write to the socket, you have to serialize their access with synchronization, or you can have a single writing thread that gets the data to write from a queue. (I prefer the former.)
You can use non-blocking IO and do the reading and writing work in a single thread. However, this is actually more complex and tricky to get right. If you want to do this, I suggest you use a library such as Netty or Mina to help you.
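A sketch of the queue variant mentioned above (the names are illustrative): writer threads enqueue byte arrays, and a single dedicated thread is the only one that ever touches the output stream.

import java.io.IOException;
import java.io.OutputStream;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class SingleWriter implements Runnable {
    private final BlockingQueue<byte[]> queue = new LinkedBlockingQueue<>();
    private final OutputStream out;

    public SingleWriter(OutputStream out) {
        this.out = out;
    }

    // Any thread may call this; the queue serializes access to the stream.
    public void send(byte[] data) throws InterruptedException {
        queue.put(data);
    }

    // The single thread that actually writes to the socket.
    @Override
    public void run() {
        try {
            while (!Thread.currentThread().isInterrupted()) {
                byte[] data = queue.take();
                out.write(data);
                out.flush();
            }
        } catch (IOException | InterruptedException e) {
            // socket closed or thread interrupted; let the loop end
        }
    }
}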
Very interesting: the NIO SocketChannel writes are synchronized.
http://www.docjar.com/html/api/sun/nio/ch/SocketChannelImpl.java.html
The old java.io Socket stuff depends on the OS, so you would have to look at the OS native code to know for sure (and that may vary from OS to OS)...
Just look at java.net.SocketOutputStream, which is what Socket.getOutputStream returns.
(Unless of course I missed something.)
Oh, one more thing: they could have put synchronization in the native code in every JVM on each OS, but who knows for sure. Only with NIO is it obvious that synchronization exists.
This is how socketWrite looks in native code; there is no synchronization here, so it's not thread-safe at this level:
JNIEXPORT void JNICALL
Java_java_net_SocketOutputStream_socketWrite0(JNIEnv *env, jobject this,
                                              jobject fdObj,
                                              jbyteArray data,
                                              jint off, jint len) {
    char *bufP;
    char BUF[MAX_BUFFER_LEN];
    int buflen;
    int fd;

    if (IS_NULL(fdObj)) {
        JNU_ThrowByName(env, "java/net/SocketException", "Socket closed");
        return;
    } else {
        fd = (*env)->GetIntField(env, fdObj, IO_fd_fdID);
        /* Bug 4086704 - If the Socket associated with this file descriptor
         * was closed (sysCloseFD), then the file descriptor is set to -1.
         */
        if (fd == -1) {
            JNU_ThrowByName(env, "java/net/SocketException", "Socket closed");
            return;
        }
    }

    if (len <= MAX_BUFFER_LEN) {
        bufP = BUF;
        buflen = MAX_BUFFER_LEN;
    } else {
        buflen = min(MAX_HEAP_BUFFER_LEN, len);
        bufP = (char *)malloc((size_t)buflen);
        /* if heap exhausted resort to stack buffer */
        if (bufP == NULL) {
            bufP = BUF;
            buflen = MAX_BUFFER_LEN;
        }
    }

    while (len > 0) {
        int loff = 0;
        int chunkLen = min(buflen, len);
        int llen = chunkLen;
        (*env)->GetByteArrayRegion(env, data, off, chunkLen, (jbyte *)bufP);
        while (llen > 0) {
            int n = NET_Send(fd, bufP + loff, llen, 0);
            if (n > 0) {
                llen -= n;
                loff += n;
                continue;
            }
            if (n == JVM_IO_INTR) {
                JNU_ThrowByName(env, "java/io/InterruptedIOException", 0);
            } else {
                if (errno == ECONNRESET) {
                    JNU_ThrowByName(env, "sun/net/ConnectionResetException",
                                    "Connection reset");
                } else {
                    NET_ThrowByNameWithLastError(env, "java/net/SocketException",
                                                 "Write failed");
                }
            }
            if (bufP != BUF) {
                free(bufP);
            }
            return;
        }
        len -= chunkLen;
        off += chunkLen;
    }
    if (bufP != BUF) {
        free(bufP);
    }
}
I developed an application using Java sockets. I exchange messages with this application using byte arrays. I have a message, M1, that is 1979 bytes long. My socket buffer length is 512 bytes, so I read this message in 4 parts: three of 512 bytes each, and the last one of 443 bytes. I'll call these parts A, B, C, and D, so ABCD, in that order, is a valid message.
I have a thread with a read loop like the one below.
BlockingQueue<Chunk> queue = new LinkedBlockingQueue<>();
InputStream in = socket.getInputStream();
byte[] buffer = new byte[512];

while (true) {
    int readResult = in.read(buffer);
    if (readResult != -1) {
        byte[] arr = Arrays.copyOf(buffer, readResult);
        Chunk c = new Chunk(arr);
        queue.put(c);
    }
}
I fill the queue with the code above. When the message is sent, I see the queue fill up as ABCD, but sometimes the data ends up in the queue as BACD. I know this should be impossible, because TCP guarantees ordering.
I looked at dumps with Wireshark. The message arrives correctly in a single TCP packet, so there is no problem on the sender side. I am 100% sure the message arrived correctly, but the read method does not seem to deliver it in the correct order, and it does not happen every time. I could not find a valid reason for it.
When I tried the same code on two different computers, I noticed the problem occurred on only one of them. The JDK versions on these computers differ, so I looked at the differences between the two versions. With JDK 8u202 I get the incorrect behavior; with JDK 8u271 there is no problem. Maybe it is related to that, but I am not sure, because I have no solid evidence.
I am open to all kinds of ideas and suggestions. It's really on its way to being the most interesting problem I've ever encountered.
Thank you for your help.
EDIT: I found a similar question:
Blocking Queue Take out of Order
EDIT:
Ok, I have read all the answers given below. Thank you for providing different perspectives. Let me fill in some missing information.
I actually have 2 threads. Thread 1 (SocketReader) is responsible for reading the socket. It wraps the data it reads in a Chunk class and puts it on the queue owned by Thread 2 (MessageDecoder), which consumes the blocking queue. There are no threads other than these. It is a simple example of the producer-consumer design pattern.
And yes, other messages are sent, but they take up less than 512 bytes, so I can read each of them in one go and see no ordering problem with them.
MessageDecoder.java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class MessageDecoder implements Runnable {
    private BlockingQueue<Chunk> queue = new LinkedBlockingQueue<>();

    public MessageDecoder() {
    }

    public void run() {
        while (true) {
            try {
                Chunk c = queue.take();
                System.out.println(c.toString());
                decodeMessageChunk(c);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }
    }

    public void put(Chunk c) {
        try {
            queue.put(c);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    }
}
SocketReader.java
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

public class SocketReader implements Runnable {
    private final MessageDecoder msgDec;
    private final InputStream in;
    byte[] buffer = new byte[512];

    public SocketReader(InputStream in, MessageDecoder msgDec) {
        this.in = in;
        this.msgDec = msgDec;
    }

    public void run() {
        try {
            while (true) {
                int readResult = in.read(buffer);
                if (readResult != -1) {
                    byte[] arr = Arrays.copyOf(buffer, readResult);
                    Chunk c = new Chunk(arr);
                    msgDec.put(c);
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
Even though it's a FIFO queue, the locking of LinkedBlockingQueue is unfair, so you can't guarantee the ordering of elements. More info regarding this here.
I'd suggest using an ArrayBlockingQueue instead. Like LinkedBlockingQueue, the order is not guaranteed by default, but it offers a slightly different locking mechanism:
This class supports an optional fairness policy for ordering waiting
producer and consumer threads. By default, this ordering is not
guaranteed. However, a queue constructed with fairness set to true
grants threads access in FIFO order. Fairness generally decreases
throughput but reduces variability and avoids starvation.
In order to set fairness, you must initialize the queue using the constructor that takes a fairness flag.
So, for example:
ArrayBlockingQueue<Chunk> fairQueue = new ArrayBlockingQueue<>(1000, true);
/*.....*/
Chunk c = new Chunk(arr);
fairQueue.add(c);
As the docs state, this should grant thread access in FIFO order, making retrieval of the elements consistent while avoiding the lock barging that can happen with LinkedBlockingQueue's locking mechanism.
Consider the following code snippet, which simply writes the contents of someByteBuffer to the standard output:
// returns an instance of "java.nio.channels.Channels$WritableByteChannelImpl"
WritableByteChannel w = Channels.newChannel(System.out);
w.write(someByteBuffer);
Java specifies that channels are, in general, intended to be safe for multithreaded access, while buffers are not safe for use by multiple concurrent threads.
So, I was wondering whether the above snippet requires synchronization, as it is invoking the write method of a channel (which is supposed to be thread-safe) on some buffer (which is not thread-safe).
I took a look at the implementation of the write method:
public int write(ByteBuffer src) throws IOException {
    int len = src.remaining();
    int totalWritten = 0;
    synchronized (writeLock) {
        while (totalWritten < len) {
            int bytesToWrite = Math.min((len - totalWritten),
                                        TRANSFER_SIZE);
            if (buf.length < bytesToWrite)
                buf = new byte[bytesToWrite];
            src.get(buf, 0, bytesToWrite);
            try {
                begin();
                out.write(buf, 0, bytesToWrite);
            } finally {
                end(bytesToWrite > 0);
            }
            totalWritten += bytesToWrite;
        }
        return totalWritten;
    }
}
Notice that everything is synchronized on writeLock, except the first two lines of the method. Now, as the ByteBuffer src is not thread-safe, calling src.remaining() without proper synchronization is risky, as another thread might change it.
Should I synchronize the line w.write(someByteBuffer) in the above snippet, or am I missing something and the Java implementation of the write() method has already taken care of that?
Edit: Here's a sample program that often throws a BufferUnderflowException, because I commented out the synchronized block near the end. Restoring those lines makes the code exception-free.
import java.nio.*;
import java.nio.channels.*;

public class Test {
    public static void main(String[] args) throws Exception {
        ByteBuffer b = ByteBuffer.allocate(10);
        b.put(new byte[]{'A', 'B', 'C', 'D', 'E', 'F', 'G', '\n'});

        // returns an instance of "java.nio.channels.Channels$WritableByteChannelImpl"
        WritableByteChannel w = Channels.newChannel(System.out);

        int c = 10;
        Thread[] r = new Thread[c];
        for (int i = 0; i < c; i++) {
            r[i] = new Thread(new MyRunnable(b, w));
            r[i].start();
        }
    }
}

class MyRunnable implements Runnable {
    private final ByteBuffer b;
    private final WritableByteChannel w;

    MyRunnable(ByteBuffer b, WritableByteChannel w) {
        this.b = b;
        this.w = w;
    }

    @Override
    public void run() {
        try {
            // synchronized (b) {
            b.flip();
            w.write(b);
            // }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
The point is: if your setup allows more than one thread to tamper with that buffer object, then you are subject to threading issues. It is that simple!
The question is not if channel.write() is thread-safe. That is good to know, but not at the core of the problem!
The real question is: what is your code doing with that buffer?
What does it help that the channel implementation locks internally on something, when the data it is operating on comes from the outside?
You know, all kinds of things could happen to that src object while the channel is busy writing the buffer!
In other words: whether this code is "safe" depends entirely on what your code is doing with that very src buffer object in parallel.
Given the OP's comment, the core point is: you have to ensure that any activity that makes use of that byte buffer is thread-safe. In the example given, we have two operations:
b.flip();
w.write(b);
Those are the only operations each thread performs; thus, when you make sure that only one thread at a time can make those two calls (as shown, by locking on the shared buffer object), you are good.
It is really simple: if you have "shared data"; then you have to ensure that the threads reading/writing that "shared data" are synchronized somehow to avoid race conditions.
You only need to lock the Channel if you are making multiple writes from multiple threads and want to ensure the atomicity of those writes, in which case you need to lock on the System.out object.
If you share a mutable data structure that is not thread-safe across threads, you need to add locking. I would avoid using a ByteBuffer from multiple threads if you can.
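One way to follow that advice while still letting every thread traverse the same bytes is to give each thread its own view: ByteBuffer.duplicate() shares the content but has an independent position, limit, and mark. A sketch (assuming, as in the test program above, that the shared buffer is filled once and never mutated afterwards):

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.WritableByteChannel;

public class DuplicatingRunnable implements Runnable {
    private final ByteBuffer shared; // filled once, then treated as read-only
    private final WritableByteChannel w;

    DuplicatingRunnable(ByteBuffer shared, WritableByteChannel w) {
        this.shared = shared;
        this.w = w;
    }

    @Override
    public void run() {
        try {
            // duplicate() shares the bytes but not position/limit/mark,
            // so flip() here no longer races with the other threads.
            ByteBuffer view = shared.duplicate();
            view.flip();
            w.write(view);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}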
I have a class that implements 'Runnable' to read data from a data stream. The data comes from a Channel which is stored as a member variable in another of my classes, and I can get an instance of this channel by simply calling the getter getInputChannel(). Now, for my Runnable to read the data from the channel, it needs to know what type of channel it is so that it can use the channel's read method. The channel type may be one of either FileChannel or SocketChannel, and is decided at run time, i.e.,
private class ReadInputStream implements Runnable {
    Thread thread;
    boolean running = true;
    ByteBuffer buffer = ByteBuffer.allocate(1024);
    FileChannel or SocketChannel channel;

    public ReadInputStream() {
        // Need to cast type channel at run time
        Channel ch = getInputChannel();
        this.channel = (FileChannel or SocketChannel) ch;
    }

    public void run() {
        while (running) {
            channel.read(buffer);
            // etc.
        }
    }
}
What is the best way to get the right type of channel so that I can implement its read method in the runnable's run() method?
Both FileChannel and SocketChannel implement ByteChannel which is what declares their read(ByteBuffer) method, so that's the type your getInputChannel() should return.
Edit: Or, if you only ever read from the channel, return a ReadableByteChannel, as Darkhogg says. Since this is an input channel, this is most likely the case anyway.
There is no way in Java to express union types. Your best bet is to use some common interface that applies to both.
If you're using the channel only for reading, define it to be a ReadableByteChannel.
If you're using it for writing, use a WritableByteChannel.
If you need both, use ByteChannel.
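For instance, declaring everything against ReadableByteChannel removes the need for any cast (a sketch reusing names from the question; it assumes getInputChannel() is changed to return a ReadableByteChannel, so the channel is passed in already typed):

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ReadableByteChannel;

public class ReadInput implements Runnable {
    private volatile boolean running = true;
    private final ByteBuffer buffer = ByteBuffer.allocate(1024);
    private final ReadableByteChannel channel; // FileChannel or SocketChannel, we no longer care

    // No cast needed: both channel types are ReadableByteChannels.
    public ReadInput(ReadableByteChannel channel) {
        this.channel = channel;
    }

    public void run() {
        try {
            while (running && channel.read(buffer) != -1) {
                buffer.flip();
                // ... consume the buffer ...
                buffer.clear();
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    public void stop() {
        running = false;
    }
}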
You could use a simple if/else clause and cycle through the available instance types, for instance (no pun intended..):
if (channel instanceof FileChannel) {
    ((FileChannel) channel).read(buffer);
} else if (channel instanceof SocketChannel) {
    ((SocketChannel) channel).read(buffer);
}
etc.
There are multiple threads, say B, C and D, each writing small packets of data to a buffer at a high frequency. They own their buffer and nobody else ever writes to it. Writing must be as fast as possible, and I've determined that using synchronized makes it unacceptably slow.
The buffers are simply byte arrays, along with the index of the first free element:
byte[] buffer;
int index;

public void write(byte[] data) {
    // some checking that the buffer won't overflow... not important now
    System.arraycopy(data, 0, buffer, index, data.length);
    index += data.length;
}
Every once in a while, thread A comes along to flush everybody's buffer to a file. It's okay if this part has some overhead, so using synchronized here is no problem.
Now the trouble is that some other thread might be writing to a buffer while thread A is flushing it. This means that two threads attempt to write to index at around the same time. That would lead to data corruption, which I would like to prevent, but without using synchronized in the write() method.
I've got the feeling that, using the right order of operations and probably some volatile fields, this must be possible. Any bright ideas?
Have you tried a solution which uses synchronization, and found it doesn't perform well enough? You say you've determined that it's unacceptably slow - how slow was it, and do you already have a performance budget? Normally, obtaining an uncontested lock is extremely cheap, so I wouldn't expect it to be a problem.
There may well be some clever lock-free solution - but it's likely to be significantly more complicated than just synchronizing whenever you need to access shared data. I understand that lock-free coding is all the rage, and scales beautifully when you can do it - but if you've got one thread interfering with another's data, it's very hard to do it safely. Just to be clear, I like using lock-free code when I can use high-level abstractions created by experts - things like the Parallel Extensions in .NET 4. I just don't like working with low-level abstractions like volatile variables if I can help it.
Try locking, and benchmark it. Work out what performance is acceptable, and compare the performance of a simple solution with that goal.
Of course, one option is redesigning... does the flushing have to happen actively in a different thread? Could the individual writer threads not just hand off the buffer to the flushing thread (and start a different buffer) periodically? That would make things a lot simpler.
EDIT: Regarding your "flush signal" idea - I'd been thinking along similar lines. But you need to be careful about how you do it so that the signal can't get lost even if one thread takes a long time to process whatever it's doing. I suggest you make thread A publish a "flush counter"... and each thread keeps its own counter of when it last flushed.
EDIT: Just realized this is Java, not C# - updated :)
Use AtomicLong.incrementAndGet() to increment from thread A, and AtomicLong.get() to read from the other threads. Then in each thread, compare whether you're "up to date", and flush if necessary:
private long lastFlush;  // Last counter for our flush
private Flusher flusher; // The single flusher used by all threads

public void write(...)
{
    long latestFlush = flusher.getCount(); // Will use AtomicLong.get() internally
    if (latestFlush > lastFlush)
    {
        flusher.flush(data);
        // Do whatever else you need
        lastFlush = latestFlush; // Don't use flusher.getCount() here!
    }
    // Now do the normal write
}
Note that this assumes you only ever need to check for flushing in the Write method. Obviously that may not be the case, but hopefully you can adapt the idea.
You can use volatile alone to safely read/write a buffer (if you have only one writer); however, only one thread can safely flush the data. To do this you can use a ring buffer.
I would add to @Jon's comment that this is significantly more complicated to test. E.g. I had one "solution" which worked consistently for 1 billion messages one day but kept breaking the next because the box was more loaded.
With synchronized your latency should be below 2 microseconds. With a Lock, you could get this down to 1 microsecond. With busy waiting on a volatile you can get this down to 3-6 ns per byte (the time it takes to transfer data between threads becomes important).
Note: as the volume of data increases the relative cost of the lock becomes less important. e.g. if you are typically writing 200 bytes or more I wouldn't worry about the difference.
One approach I take is to use an Exchanger with two direct ByteBuffers and avoid writing any data in the critical path (i.e. I only write the data after I have processed everything, when it doesn't matter so much).
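A sketch of that Exchanger idea (the buffer size and the "full enough" threshold are arbitrary choices of mine): the writer fills one direct buffer while the flusher drains the other, and the two threads swap buffers at the exchange point.

import java.nio.ByteBuffer;
import java.util.concurrent.Exchanger;

public class ExchangerBuffers {
    private final Exchanger<ByteBuffer> exchanger = new Exchanger<>();

    // Writer thread: fill the current buffer, swap it for an empty one when full.
    void writerLoop() throws InterruptedException {
        ByteBuffer current = ByteBuffer.allocateDirect(64 * 1024);
        while (true) {
            if (current.remaining() < 512) {           // "full enough"
                current = exchanger.exchange(current); // blocks until the flusher arrives
                current.clear();
            }
            current.put((byte) 0); // ... put real data here ...
        }
    }

    // Flusher thread: swap an empty buffer for the full one, then drain it.
    void flusherLoop() throws InterruptedException {
        ByteBuffer spare = ByteBuffer.allocateDirect(64 * 1024);
        while (true) {
            ByteBuffer full = exchanger.exchange(spare);
            full.flip();
            // ... write 'full' to the file here ...
            full.clear();
            spare = full;
        }
    }
}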
Invert control. Rather than having A poll the other threads, let them push.
I suppose LinkedBlockingQueue might be the most simple thing to go with.
Pseudocode:
LinkedBlockingQueue<byte[]> jobs; // buffers intended to be flushed are pushed into here
LinkedBlockingQueue<byte[]> pool; // flushed buffers are pushed into here for reuse
Writing thread:
while (someCondition) {
    job = jobs.take();
    actualOutput(job);
    pool.offer(job);
}
Other threads:
void flush() {
    jobs.offer(this.buffer);
    this.index = 0;
    this.buffer = pool.poll();
    if (this.buffer == null)
        this.buffer = createNewBuffer();
}

void write(byte[] data) {
    // some checking that the buffer won't overflow... not important now
    System.arraycopy(data, 0, buffer, index, data.length);
    if ((index += data.length) > threshold)
        this.flush();
}
LinkedBlockingQueue basically encapsulates the technical means to pass messages safely between threads.
Not only is it simpler this way round, but it clearly separates concerns, because the threads that actually generate the output determine when they want to flush their buffers, and they are the only ones that maintain their own state.
The buffers in both queues present a memory overhead, but that should be acceptable. The pool is unlikely to grow significantly bigger than the total number of threads, and unless the actual output presents a bottleneck, the jobs queue should be empty most of the time.
Volatile Variables And A Circular Buffer
Use a circular buffer, and make the flushing thread "chase" the writes around the buffer instead of resetting the index to zero after each flush. This allows writes to occur during a flush without any locking.
Use two volatile variables - writeIndex for where the writing thread is up to, and flushIndex for where the flushing thread is up to. These variables are each updated by only one thread, and can be read atomically by the other thread. Use these variables to keep the threads constrained to separate sections of the buffer. Do not allow the flushing thread to go past where the writing thread is up to (i.e. flush an unwritten part of the buffer). Do not allow the writing thread to go past where the flushing thread is up to (i.e. overwrite an unflushed part of the buffer).
Writing thread loop:
Read writeIndex (atomic)
Read flushIndex (atomic)
Check that this write will not overwrite unflushed data
Write to the buffer
Calculate the new value for writeIndex
Set writeIndex (atomic)
Flushing thread loop:
Read writeIndex (atomic)
Read flushIndex (atomic)
Flush the buffer from flushIndex to writeIndex - 1
Set flushIndex (atomic) to the value that was read for writeIndex
But, WARNING: for this to work, the buffer array elements might also need to be volatile, which you can't do in Java (yet). See http://jeremymanson.blogspot.com/2009/06/volatile-arrays-in-java.html
Nevertheless, here's my implementation (changes are welcome):
volatile int writeIndex = 0;
volatile int flushIndex = 0;
byte[] buffer = new byte[268435456];

public void write(byte[] data) throws Exception {
    int localWriteIndex = writeIndex; // volatile read
    int localFlushIndex = flushIndex; // volatile read
    int freeBuffer = buffer.length - (localWriteIndex - localFlushIndex +
            buffer.length) % buffer.length;
    if (data.length > freeBuffer)
        throw new Exception("Buffer overflow");
    if (localWriteIndex + data.length <= buffer.length) {
        System.arraycopy(data, 0, buffer, localWriteIndex, data.length);
        writeIndex = localWriteIndex + data.length;
    } else {
        int firstPartLength = buffer.length - localWriteIndex;
        int secondPartLength = data.length - firstPartLength;
        System.arraycopy(data, 0, buffer, localWriteIndex, firstPartLength);
        System.arraycopy(data, firstPartLength, buffer, 0, secondPartLength);
        writeIndex = secondPartLength;
    }
}

public byte[] flush() {
    int localWriteIndex = writeIndex; // volatile read
    int localFlushIndex = flushIndex; // volatile read
    int usedBuffer = (localWriteIndex - localFlushIndex + buffer.length) %
            buffer.length;
    byte[] output = new byte[usedBuffer];
    if (localFlushIndex + usedBuffer <= buffer.length) {
        System.arraycopy(buffer, localFlushIndex, output, 0, usedBuffer);
        flushIndex = localFlushIndex + usedBuffer;
    } else {
        int firstPartLength = buffer.length - localFlushIndex;
        int secondPartLength = usedBuffer - firstPartLength;
        System.arraycopy(buffer, localFlushIndex, output, 0, firstPartLength);
        System.arraycopy(buffer, 0, output, firstPartLength, secondPartLength);
        flushIndex = secondPartLength;
    }
    return output;
}
Perhaps:
import java.util.concurrent.atomic.AtomicInteger;

byte[] buffer;
AtomicInteger index = new AtomicInteger();

public void write(byte[] data) {
    // some checking that the buffer won't overflow... not important now
    System.arraycopy(data, 0, buffer, index.get(), data.length);
    index.addAndGet(data.length);
}

public int getIndex() {
    return index.get();
}
Otherwise, the lock classes in the java.util.concurrent.locks package are more lightweight than the synchronized keyword...
so:
byte[] buffer;
int index;
ReentrantReadWriteLock lock;

public void write(byte[] data) {
    lock.writeLock().lock();
    // some checking that the buffer won't overflow... not important now
    System.arraycopy(data, 0, buffer, index, data.length);
    index += data.length;
    lock.writeLock().unlock();
}
and in the flushing thread:
object.lock.readLock().lock();
// flush the buffer
object.index = 0;
object.lock.readLock().unlock();
UPDATE:
The pattern you describe for reading and writing to the buffer will not benefit from using a ReadWriteLock implementation, so just use a plain ReentrantLock:
final int SIZE = 99;
byte[] buffer = new byte[SIZE];
int index;
// Use default non-fair lock to maximise throughput (although some writer threads may wait longer)
ReentrantLock lock = new ReentrantLock();
// called by many threads
public void write(byte[] data) {
    lock.lock();
    try {
        // some checking that the buffer won't overflow... not important now
        System.arraycopy(data, 0, buffer, index, data.length);
        index += data.length;
    } finally {
        lock.unlock();
    }
}

// Only called by 1 thread - or implemented in only 1 thread:
public byte[] flush() {
    lock.lock();
    try {
        byte[] rval = new byte[index];
        System.arraycopy(buffer, 0, rval, 0, index);
        index = 0;
        return rval;
    } finally {
        lock.unlock();
    }
}
As you describe usage as many writer threads with a single reader/flusher thread, a ReadWriteLock is not necessary; in fact I believe it is more heavyweight than a simple ReentrantLock. ReadWriteLocks are useful for many reader threads with few writer threads - the opposite of the situation you describe.
You can try implementing semaphores.
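If you go that route, a binary java.util.concurrent.Semaphore can stand in for the lock (a minimal sketch; whether it actually beats synchronized is something you would have to measure):

import java.util.concurrent.Semaphore;

public class SemaphoreBuffer {
    private final Semaphore mutex = new Semaphore(1); // one permit: acts as a lock
    private final byte[] buffer = new byte[1 << 16];
    private int index;

    public void write(byte[] data) throws InterruptedException {
        mutex.acquire();
        try {
            System.arraycopy(data, 0, buffer, index, data.length);
            index += data.length;
        } finally {
            mutex.release();
        }
    }

    public byte[] flush() throws InterruptedException {
        mutex.acquire();
        try {
            byte[] out = new byte[index];
            System.arraycopy(buffer, 0, out, 0, index);
            index = 0;
            return out;
        } finally {
            mutex.release();
        }
    }
}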
I like the lock-free stuff; it's addictive :). And rest assured: it removes a lot of locking shortcomings, though it comes with a steep learning curve. Lock-free structures are still error-prone.
Read a few articles, perhaps a book, and try it at home first.
How to handle your case? You can't atomically copy data (and update the size), but you can atomically update a reference to that data.
A simple way to do it follows. Note: you can ALWAYS read from the buffer w/o holding a lock, which is the entire point.
final AtomicReference<byte[]> buffer = new AtomicReference<byte[]>(new byte[0]);

void write(byte[] b) {
    for (;;) {
        final byte[] cur = buffer.get();
        final byte[] copy = Arrays.copyOf(cur, cur.length + b.length);
        System.arraycopy(b, 0, copy, cur.length, b.length);
        if (buffer.compareAndSet(cur, copy)) {
            break;
        }
        // there was a concurrent write;
        // need to handle it: either loop to add at the end (but then you can get out of order)
        // just as with sync
    }
}
Actually you can still use a larger byte[] and append to it, but I leave that as an exercise for you.
Continued
I had to write the code in a pinch. A short description follows:
The code is lock-free but not obstruction-free due to the use of the CLQ. As you see, the code always continues regardless of which branches are taken, and practically doesn't loop (busy wait) anywhere besides the CLQ itself.
Many lock-free algorithms rely on the help of all the threads to properly finish the task(s).
There might be some mistakes, but I hope the main idea is sort of clear:
The algorithm allows many writers and many readers.
If the main state cannot be acquired (so there is only a single writer at a time), the byte[] is appended to a queue.
Any writer that succeeded in the CAS must attempt to flush the queue prior to writing its own data.
A reader must check for pending writes and flush them before using the main buffer.
If the buffer must be enlarged (the current byte[] is not big enough), the buffer and the size are thrown away and a new generation of Buffer+Size is used; otherwise only the size is increased. Either operation again requires holding the "lock" (i.e. having succeeded in the CAS).
Please, any feedback is welcome.
Cheers, and hopefully people can warm up to lock-free structures and algorithms.
package bestsss.util;

import java.util.Arrays;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

// the code uses ConcurrentLinkedQueue to simplify the implementation;
// the class is well-known and the main point is to demonstrate the lock-free stuff
public class TheBuffer {
    // buffer generation; if the room is exhausted we need to switch to a new reference
    private static class BufGen {
        final byte[] data;
        volatile int size;

        BufGen(int capacity, int size, byte[] src) {
            this.data = Arrays.copyOf(src, capacity);
            this.size = size;
        }

        BufGen append(byte[] b) {
            int s = this.size;
            int newSize = b.length + s;
            BufGen target;
            if (newSize > data.length) {
                int cap = Integer.highestOneBit(newSize) << 1;
                if (cap < 0) {
                    cap = Integer.MAX_VALUE;
                }
                target = new BufGen(cap, this.size, this.data);
            } else if (newSize < 0) { // overflow
                throw new IllegalStateException("Buffer overflow - over int size");
            } else {
                target = this; // if there is enough room(-service), reuse the buffer
            }
            System.arraycopy(b, 0, target.data, s, b.length);
            target.size = newSize; // 'commit' the changes and update the size after the copy, so both are visible at the same time
            // that's the volatile write I was talking about
            return target;
        }
    }

    private volatile BufGen buffer = new BufGen(16, 0, new byte[0]);

    // read consists of 3 volatile reads most of the time, can be 2 if BufGen is recreated each time
    public byte[] read(int[] targetSize) { // ala AtomicStampedReference
        if (!pendingWrites.isEmpty()) { // optimistic check: do not grab the lock, just do a volatile read
            // that will serve 99%++ of the cases
            doWrite(null, READ); // yet something is in the queue, help the writers
        }
        BufGen buffer = this.buffer;
        targetSize[0] = buffer.size;
        return buffer.data;
    }

    public void write(byte[] b) {
        doWrite(b, WRITE);
    }

    private static final int FREE = 0;
    private static final int WRITE = 1;
    private static final int READ = 2;
    private final AtomicInteger state = new AtomicInteger(FREE);
    private final ConcurrentLinkedQueue<byte[]> pendingWrites = new ConcurrentLinkedQueue<byte[]>();

    private void doWrite(byte[] b, int operation) {
        if (state.compareAndSet(FREE, operation)) { // won the CAS, hurray!
            // now the state is held "exclusive"
            try {
                // 1st be nice and poll the queue, which gives a fast track to the loser
                // we're too nice
                BufGen buffer = this.buffer;
                for (byte[] pending; null != (pending = pendingWrites.poll()); ) {
                    buffer = buffer.append(pending); // do not update the global buffer yet
                }
                if (b != null) {
                    buffer = buffer.append(b);
                }
                this.buffer = buffer; // volatile write, makes sure all the data is published
            } finally {
                state.set(FREE);
            }
        } else { // we lost the CAS, so someone must take care of the pending operation
            if (b == null)
                return;
            pendingWrites.add(b);
        }
    }

    public static void main(String[] args) {
        // usage only, not a test for concurrency correctness
        TheBuffer buf = new TheBuffer();
        buf.write("X0X\n".getBytes());
        buf.write("XXXXXXXXXXAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAXXXXXXXXXXXXXXXXXXX\n".getBytes());
        buf.write("Hello world\n".getBytes());
        int[] size = {0};
        byte[] bytes = buf.read(size);
        System.out.println(new String(bytes, 0, size[0]));
    }
}
Simplistic case
Another, far simpler solution that allows many writers but a single reader. It postpones the writes into a CLQ and the reader just reconstructs them. The construction code is omitted this time.
package bestsss.util;

import java.util.ArrayList;
import java.util.concurrent.ConcurrentLinkedQueue;

public class TheSimpleBuffer {
    private final ConcurrentLinkedQueue<byte[]> writes = new ConcurrentLinkedQueue<byte[]>();

    public void write(byte[] b) {
        writes.add(b);
    }

    private byte[] buffer;

    public byte[] read(int[] targetSize) {
        ArrayList<byte[]> copy = new ArrayList<byte[]>(12);
        int len = 0;
        for (byte[] b; null != (b = writes.poll()); ) {
            copy.add(b);
            len += b.length;
            if (len < 0) { // can't return this big, overflow
                len -= b.length; // fix back
                break;
            }
        }
        // copy to the buffer, create new etc....
        // ...
        targetSize[0] = len;
        return buffer;
    }
}
So I have this small client-side code:
public class Client {
    private static Socket socket;
    private static ObjectOutputStream out;

    public static void main(String[] args) {
        while (true) {
            try {
                if (socket != null) {
                    out.writeObject("Hello...");
                    Thread.sleep(1500);
                } else {
                    socket = new Socket("myhost", 1234);
                    out = new ObjectOutputStream(socket.getOutputStream());
                    System.out.println("connected to server");
                }
            } catch (final Exception e) {
                // set socket to null for reconnecting
            }
        }
    }
}
What bugs me is that when I run the code with javaw.exe, I see that Java is eating ~10 KB more memory every 2-3 seconds. So memory usage keeps growing and growing...
Is Java really that bad, or is there something else wrong?
I ran this code in a while loop for a while, and memory usage increased by 1000 KB.
Doesn't Java garbage-collect the 'tmp' variable after it's used?
try {
    if (socket == null) {
        final Socket tmp = new Socket("localhost", 1234);
        if (tmp != null) {
            socket = tmp;
        }
        Thread.sleep(100);
    }
} catch (final Exception e) {
}
So, I've written a simple test server for your client and I'm now running both, and there seems to be no increase in memory usage.
import java.net.*;
import java.io.*;

/**
 * example class adapted from
 * http://stackoverflow.com/questions/5122569/why-is-java-constantly-eating-more-memory
 */
public class Client {
    private static Socket socket;
    private static ObjectOutputStream out;

    private static void runClient() {
        while (true) {
            try {
                if (socket != null) {
                    out.writeObject("Hello...");
                    Thread.sleep(100);
                    System.out.print(",");
                } else {
                    socket = new Socket("localhost", 1234);
                    out = new ObjectOutputStream(socket.getOutputStream());
                    System.out.println("connected to server");
                }
            } catch (final Exception e) {
                // set socket to null for reconnecting
                e.printStackTrace();
                return;
            }
        }
    }

    private static void runServer() throws IOException {
        ServerSocket ss = new ServerSocket(1234);
        Socket s = ss.accept();
        InputStream in = s.getInputStream();
        byte[] buffer = new byte[500];
        while (in.read(buffer) > 0) {
            System.out.print(".");
        }
    }

    public static void main(String[] args)
            throws IOException
    {
        if (args.length > 0) {
            runServer();
        } else {
            runClient();
        }
    }
}
What are you doing differently?
So, I've looked a bit more closely at the memory usage of this program, and found a useful tool for this: the "Java Monitoring and Management Console" hidden in the development menu of my system :-)
Here is a screenshot of the memory usage while running the client program for some time (every 100 ms I send an object, remember) ...
We can see that the memory usage has a sawtooth curve: it increases linearly, then garbage collection kicks in and it falls back to the base usage. After some initial period the VM does the GC more often (and thus more quickly). For now, no problem.
Here is a variant program where I did not send the same string always, but a different one each time:
private static void runClient() {
    int i = 0;
    while (true) {
        try {
            i++;
            if (socket != null) {
                out.writeObject("Hello " + i + " ...");
                Thread.sleep(100);
                System.out.print(",");
(The rest is like above.) I thought this would need more memory, since the ObjectOutputStream has to remember which objects were already sent, to be able to reuse their identifiers in case they come again.
But no, it looks quite similar:
The little irregularity between 39 and 40 is a manual full GC triggered by the "Perform GC" button here - it did not change much, though.
I let the last program run a bit longer, and now we see that the ObjectOutputStream is still holding references to our strings ...
In half an hour our program ate about 2 MB of memory (on a 64-bit VM). In this time, it sent 18000 strings. So each string used on average about 100 bytes of memory.
Each of those strings was between 11 and 17 chars long. The latter ones (about half) in fact use 32-char arrays, the former ones 16-char arrays, because of the allocation strategy of StringBuilder. These take 64 or 32 bytes + array overhead (at least 12 more bytes, likely more). Additionally the String objects themselves take some memory overhead (at least 8+8+4+4+4 = 28 bytes for the class pointer and the fields I remember, likely more), so we have on average (at least) 88 bytes per string. In addition, there is likely some overhead in the ObjectOutputStream to maintain these objects in some data structure.
So, not much more lost than in fact needed.
Ah, one tip on how to stop the ObjectOutputStream (and the corresponding ObjectInputStream, too) from storing the objects, if you don't plan on sending any of them again: invoke its reset method every few thousand strings or so.
Here is a last screenshot before I kill the program, after a bit more than an hour:
For comparison, I added the reset call just mentioned and let the program run two more hours (and a bit):
It still collects memory as before, but now when I click on "Perform GC" it cleans everything up and goes back to the state before (just a bit over 1 MB). (It would do the same when reaching the end of the heap, but I didn't want to wait that long.)
Well, the garbage collector doesn't run whenever a variable goes out of scope, or you'd spend most of your time in GC code.
What it does instead (and this is quite a simplification) is wait until the memory used reaches a threshold, and only then start releasing memory.
This is what you're seeing: your memory consumption is increasing so slowly that it takes a long time to reach the next threshold and actually free memory.
I don't think the missing close is your problem, because from what I understand you are trying to keep writing to the stream. Have you tried out.flush()? This flushes the content so that it's not in memory anymore.
It looks like you never close the Socket or flush the ObjectOutputStream. Also note that Java garbage collection basically happens not when you want it to but when the garbage collector sees fit.
IO is cached by the Socket implementation until it is flushed. So either you really read the input/output from the socket (or call flush() on your streams), or you close the socket.
To me the logic itself is the culprit: there is no condition to come out of the while loop.
And again, no flush.
ObjectOutputStream caches every object you send, in case you send it again. To clear this you need to call the reset() method:
Reset will disregard the state of any objects already written to the stream. The state is reset to be the same as a new ObjectOutputStream. The current point in the stream is marked as reset, so the corresponding ObjectInputStream will be reset at the same point. Objects previously written to the stream will not be referred to as already being in the stream. They will be written to the stream again.
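Putting flush() and reset() together, the client's send loop might look like this (a sketch; the reset interval of 1000 is arbitrary):

import java.io.IOException;
import java.io.ObjectOutputStream;

public class ResettingSender {
    static void sendLoop(ObjectOutputStream out)
            throws IOException, InterruptedException {
        int sent = 0;
        while (true) {
            out.writeObject("Hello " + sent + " ...");
            out.flush();     // push the bytes out instead of buffering them
            if (++sent % 1000 == 0) {
                out.reset(); // drop cached references to already-written objects
            }
            Thread.sleep(100);
        }
    }
}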
BTW: 10 KB is worth about 0.1 cents of memory. One minute of your time at minimum wage is worth 100 times that. I suggest you consider what the best use of your time is.