BlockingQueue works improperly with StringBuilder - java

I was trying to enqueue a set of strings with a thread (say thread1) into a BlockingQueue and write these queued items to a file with a different thread (thread2). A simple producer - consumer problem.
Thread1 :
while(condition) { queue.add(data); }
Thread2 :
while(true) { data = queue.take(); /* write the taken data */ }
This whole operation works fine with the data being String. When I try to do the same operation with a StringBuilder, the results are random.
If the enqueued data is "This is my data", the output is "y data" or "is my data" or some other random fragment of the data (and sometimes the entire expected data).
Is it in the nature of BlockingQueue to behave this way with StringBuilders, or am I doing it wrong?
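The question does not show how the StringBuilder is filled, so the following is only a guess at the cause: if the producer keeps reusing (clearing and refilling) the same StringBuilder after enqueuing it, the consumer may see it half-written, because the queue holds a reference to the builder and not a copy. A minimal sketch of one way around that, assuming the consumer only needs the text, is to enqueue an immutable snapshot with toString(). The class and method names below are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class SnapshotQueueDemo {
    // Producer reuses one StringBuilder but enqueues an immutable String copy.
    static List<String> produceAndConsume(List<String> lines) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        List<String> written = new ArrayList<>();

        // Consumer: takes exactly as many items as the producer will add.
        Thread consumer = new Thread(() -> {
            try {
                for (int i = 0; i < lines.size(); i++) {
                    written.add(queue.take()); // stand-in for writing to a file
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();

        StringBuilder sb = new StringBuilder();
        for (String line : lines) {
            sb.setLength(0);          // reuse the builder for the next item
            sb.append(line);
            queue.add(sb.toString()); // snapshot: later edits to sb cannot change it
        }

        try {
            consumer.join();          // join() makes the consumer's writes visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return written;
    }

    public static void main(String[] args) {
        System.out.println(produceAndConsume(Arrays.asList("This is my data", "second item")));
    }
}
```

If the consumer really needs a StringBuilder, creating a new one per item instead of reusing a single instance should have the same effect.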

Related

Concurrent in-order processing of work items from a Java BlockingQueue

I have part of a system that processes a BlockingQueue of input items within a worker thread, and puts the results on a BlockingQueue of output items, where the relevant code (simplified) looks something like this:
while (running()) {
InputObject a=inputQueue.take(); // Get from input BlockingQueue
OutputObject b=doProcessing(a); // Process the item
outputQueue.put(b); // Place on output BlockingQueue
}
doProcessing is the main performance bottleneck in this code, but the processing of queue items could be parallelised since the processing steps are all independent of each other.
I would therefore like to improve this so that items can be processed concurrently by multiple threads, with the constraint that this must not change the order of outputs (e.g. I can't simply have 10 threads running the loop above, because that might result in outputs being ordered differently depending on processing times).
What is the best way to achieve this in pure, idiomatic Java?
Parallel streams from List preserve ordering:
List<T> input = ...
List<T> output = input.parallelStream()
.filter(this::running)
.map(this::doProcessing)
.collect(Collectors.toList());
PriorityBlockingQueue can be used if your work items can be compared to one another, and you will wait until running() is false before reading from the output queue:
outputQueue = new PriorityBlockingQueue<>();
Or you could order them after they have all been processed (if they can be compared to one another):
outputQueue.drainTo(outputList);
outputList.sort(null);
A simple way to make the items comparable would be to assign an increasing ID to each element as it is put into the input queue.
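As a rough sketch of the ID idea combined with the PriorityBlockingQueue suggestion: the Sequenced wrapper, the processInOrder method and the placeholder doProcessing below are all invented for illustration, and the sketch assumes it is acceptable to wait until all items are processed before reading the output.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.PriorityBlockingQueue;
import java.util.concurrent.TimeUnit;

class SequencedOutputDemo {
    // Hypothetical wrapper: pairs a result with the position of its input.
    static final class Sequenced implements Comparable<Sequenced> {
        final long id;
        final String value;
        Sequenced(long id, String value) { this.id = id; this.value = value; }
        @Override public int compareTo(Sequenced other) { return Long.compare(id, other.id); }
    }

    // Placeholder for the real (expensive) doProcessing step.
    static String doProcessing(String input) { return "processed:" + input; }

    static List<String> processInOrder(List<String> inputs, int threads) {
        PriorityBlockingQueue<Sequenced> outputQueue = new PriorityBlockingQueue<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        long nextId = 0;
        for (String input : inputs) {
            final long id = nextId++; // ID assigned in input order
            pool.execute(() -> outputQueue.put(new Sequenced(id, doProcessing(input))));
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES); // wait for all work to finish
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }

        List<Sequenced> sorted = new ArrayList<>();
        outputQueue.drainTo(sorted);
        sorted.sort(null); // natural ordering, i.e. by ID
        List<String> result = new ArrayList<>();
        for (Sequenced s : sorted) {
            result.add(s.value);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(processInOrder(Arrays.asList("a", "b", "c", "d"), 4));
    }
}
```

The tasks may finish in any order; only the ID decides the position in the final list.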
Create X event-loop threads, where X is the number of steps that can be processed in parallel.
The steps run in parallel, but never on the same item: while one step is being carried out on one item, the previous step is being carried out on the item that follows it, and so on.
To further optimize it, you can use concurrent queues provided by JCTools, which are optimized for Single-Producer Single-Consumer scenarios (JDK's BlockingQueue implementations support Multiple-Producer Multiple-Consumer).
// Thread 1
while (running()) {
InputObject a = inputQueue.take();
OutputObject b = doProcessingStep1(a);
queue1.put(b);
}
// Thread 2
while (running()) {
InputObject a = queue1.take();
OutputObject b = doProcessingStep2(a);
queue2.put(b);
}
// Thread 3
while (running()) {
InputObject a = queue2.take();
OutputObject b = doProcessingStep3(a);
outputQueue.put(b);
}

Threads and stringbuilder

I have this code, which should output String length = 10000, but I keep getting different outputs and I am confused about how exactly that happens. Is it because, for example, thread 1 appends about 95 times and is then interrupted by another thread (e.g. thread 2), which appends up to 98 before being interrupted by t3, and so on?
StringBuilder is not thread-safe. You can't use one from concurrent threads.
Replace it with a thread-safe StringBuffer, and you'll get the result you expect.
Since it's not thread-safe, you can't expect a deterministic result when using it from different threads. For example, the code of StringBuilder might contain something like
int newIndex = size();
buffer[newIndex] = appendedCharacter;
If two threads execute these two lines concurrently, then both might execute the first instruction and get the same value for newIndex, and then both would insert the new character at the same index. That's called a data race. And such data races are the primary reason why non-thread-safe classes shouldn't be used from multiple threads.
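The question's code isn't reproduced here, so the following is only a sketch of the StringBuffer variant under an assumed setup of 100 threads appending 100 characters each; the class and method names are invented.

```java
class AppendDemo {
    // Appends from several threads to one shared StringBuffer and returns its final length.
    static int appendConcurrently(int threads, int appendsPerThread) {
        StringBuffer shared = new StringBuffer(); // append() is synchronized
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < appendsPerThread; i++) {
                    shared.append('x');
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join(); // wait until every worker has finished appending
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return shared.length();
    }

    public static void main(String[] args) {
        System.out.println("String length = " + appendConcurrently(100, 100));
    }
}
```

With StringBuffer no append is lost, so the length equals threads * appendsPerThread. Swapping in StringBuilder would bring back the varying lengths (or even an exception).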

Strange behaviour arrayBlockingQueue with array elements

I am having some strange behavior with the use of an ArrayBlockingQueue which I use in order to communicate between certain threads in a Java application.
I am using 1 static ArrayBlockingQueue as initialised like this:
protected static BlockingQueue<long[]> commandQueue;
Followed by the constructor which has this as one of its lines:
commandQueue = new ArrayBlockingQueue<long[]>(amountOfThreads*4);
Where amountOfThreads is given as a constructor argument.
I then have a producer that creates a long[2] array, gives it some values and offers it to the queue. Directly afterwards I change one of the values in the array and offer it to the queue once again:
long[] temp = new long[2];
temp[0] = currentThread().getId();
temp[1] = gyrAddress;//Address of an i2c sensor
CommunicationThread.commandQueue.offer(temp);//CommunicationThread is where the commandqueue is located
temp[1] = axlAddress;//Change the address to a different sensor
CommunicationThread.commandQueue.offer(temp);
The consumer will then take this data and open up an i2c connection to a specific sensor, get some data from said sensor and communicate the data back using another queue.
For now however I have set the consumer to just consume the head and print the data.
long[] command = commandQueue.take();//This will hold the program until there is at least 1 command in the queue
if (command.length!=2){
throw new ArrayIndexOutOfBoundsException("The command given is of incorrect format");
}else{
System.out.println("The thread with thread id " + command[0] + " has given the command to get data from address " +Long.toHexString(command[1]));
}
Now for testing I have a producer thread with these addresses (byte) 0x34, (byte)0x44
If things are going correctly my output should be:
The thread with thread id 14 has given the command to get data from address 44
The thread with thread id 14 has given the command to get data from address 34
However I get:
The thread with thread id 14 has given the command to get data from address 34
The thread with thread id 14 has given the command to get data from address 34
Which would mean that it is sending the temp array after it has changed it.
Things that I did to try and fix it:
I tried a sleep, if I added a 150 ms sleep then the response is correct.
However this method will quite obviously affect performance...
Since the offer method returns a true I tried the following piece of code
boolean tempBool = false;
while(!tempBool){
tempBool = CommunicationThread.commandQueue.offer(temp);
System.out.println(tempBool);
}
Which prints out true. This did not have an effect.
I tried printing temp[1] after this while loop and at that moment it has the correct value. (It prints out 44, but the consumer receives 34.)
Most likely this is a synchronisation issue, but I thought the point of a BlockingQueue-based object was to solve exactly that.
Any help or suggestion on the workings of this BlockingQueue would be greatly appreciated. Let me end on a note that this is my first time working with queues in between threads in java and that the final program will be running on a raspberry pi using the pi4j library to communicate with the sensors
Since you asked about how BlockingQueue works exactly, let's start with that:
A blocking queue is a queue that blocks when you try to dequeue from it while the queue is empty, or when you try to enqueue items to it while the queue is already full. A thread trying to dequeue from an empty queue is blocked until some other thread inserts an item into the queue.
So these blocking queues prevent different threads from reading from or writing to a queue while that is not yet possible because it is either empty or full.
As Andy Turner and JB Nizet already explained, the queue does not copy your array; it stores a reference (a.k.a. a pointer) to it, and both threads share the same array in memory. When the thread that reads the queue takes an entry, it gets that reference and uses it in its following code. However, before it manages to read the data, you have already changed the array. In a non-threaded application this normally wouldn't be an issue, since only one thread reads from memory and everything is executed in chronological order. A way to circumvent this is to create a new array (which is allocated in new memory) with the data every time you add an entry to the queue. This way you make sure you do not overwrite data in memory before it has been processed by the other thread. A simple way to do this is:
long[] tempGyr = new long[2];
tempGyr[0] = currentThread().getId();
tempGyr[1] = gyrAddress;
CommunicationThread.commandQueue.offer(tempGyr);//CommunicationThread is where the commandqueue is located
long[] tempAxl = new long[2];
tempAxl[0] = currentThread().getId();
tempAxl[1] = axlAddress;
CommunicationThread.commandQueue.offer(tempAxl);
Hope this explains the subject, if not: feel free to ask for additional questions :)
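If you would rather keep a single temp array, another option is to offer a copy of it each time. This is only a sketch: the class and method names are invented, and the consumer is replaced by two poll() calls so the example runs on a single thread.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class CopyBeforeOfferDemo {
    // Offers two commands built from one reused array and returns what the queue holds.
    static long[][] enqueueTwoCommands(long threadId, long gyrAddress, long axlAddress) {
        BlockingQueue<long[]> commandQueue = new ArrayBlockingQueue<>(4);

        long[] temp = new long[2];
        temp[0] = threadId;
        temp[1] = gyrAddress;
        commandQueue.offer(temp.clone()); // the queue gets its own copy

        temp[1] = axlAddress;             // changing temp no longer affects the first entry
        commandQueue.offer(temp.clone());

        // Stand-in for the consumer: read both entries back in FIFO order.
        return new long[][] { commandQueue.poll(), commandQueue.poll() };
    }

    public static void main(String[] args) {
        for (long[] command : enqueueTwoCommands(14, 0x44, 0x34)) {
            System.out.println("thread " + command[0] + " address " + Long.toHexString(command[1]));
        }
    }
}
```

clone() on a long[] produces a new array with the same values, so each queue entry is independent of later writes to temp.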

Extra bytes appearing when building file data using multiple threads

I am working on a large scale dataset and after building a model, I use multithreading (whole project in Java) as follows:
OutputStream out = new BufferedOutputStream(new FileOutputStream(outFile));
int i=0;
Collection<Track1Callable> callables = new ArrayList<Track1Callable>();
// For each entry in the test file, do whatever needs to be done.
// Track1Callable actually processes that entry and returns a double value.
for (Pair<PreferenceArray, long[]> tests : new DataFileIterable(
KDDCupDataModel.getTestFile(dataFileDirectory))) {
PreferenceArray userTest = tests.getFirst();
callables.add(new Track1Callable(recommender, userTest));
i++;
}
ExecutorService executor = Executors.newFixedThreadPool(cores); //24 cores
List<Future<byte[]>> results = executor.invokeAll(callables);
executor.shutdown();
for (Future<byte[]> result : results) {
for (byte estimate : result.get()) {
out.write(estimate);
}
}
out.flush();
out.close();
When I receive the result from each callable, I output it to a file. Is this output in exactly the order in which the initial list of Callables was built, even though some complete before others? It seems it should be, but I am not sure.
Also, I expect a total of 6.2 million bytes to be written to the outfile. But I get an additional 2000 bytes (Yeah for free). That messes up my submission and I think it is because of some concurrency issues. I tested this on small dataset and it seems to work fine there (264 bytes expected and received).
Am I doing anything wrong with the Executor framework or Futures?
Q: Is the order the same as the one specified for the tasks? Yes.
From the API:
Returns: A list of Futures representing the tasks, in the same sequential order as produced by the iterator for the given task list. If the operation did not time out, each task will have completed. If it did time out, some of these tasks will not have completed.
As for the "extra" bytes: have you tried doing all of this in sequential order (i.e., without using an executor) and checking if you obtain different results? It seems that your problem is outside the code provided (and probably is not due to concurrency).
The order in which the callables are executed doesn't matter for the code you have here. You write the results in the order in which the futures are stored in the list. Even if they were executed in reverse order, the file should come out the same, as your file writing is single-threaded.
I suspect your callables are interacting with each other and you get different results depending on the number of cores you use, e.g. you might be using SimpleDateFormat, which is not thread-safe.
I suggest you run this twice in the same program with a dataset which completes in a short time. Run it first with only one thread in the thread pool and a second time with 24 threads. You should be able to compare the results from both runs with Arrays.equals(byte[], byte[]) and see that you get exactly the same results.
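A sketch of that comparison could look like the following. Track1Callable is not shown in the question, so a trivial made-up task stands in for it, and ByteArrayOutputStream replaces the output file so the two runs can be compared in memory.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class DeterminismCheck {
    // Hypothetical stand-in for Track1Callable: a task with no shared state.
    static Callable<byte[]> task(int seed) {
        return () -> new byte[] { (byte) seed, (byte) (seed * 31) };
    }

    // Runs all tasks on a pool of the given size and concatenates the results in list order.
    static byte[] run(int threads, int taskCount) {
        List<Callable<byte[]>> callables = new ArrayList<>();
        for (int i = 0; i < taskCount; i++) {
            callables.add(task(i));
        }
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            for (Future<byte[]> result : executor.invokeAll(callables)) {
                byte[] bytes = result.get();
                out.write(bytes, 0, bytes.length);
            }
            return out.toByteArray();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        byte[] single = run(1, 1000);
        byte[] parallel = run(24, 1000);
        System.out.println("same output: " + Arrays.equals(single, parallel));
    }
}
```

If the two arrays differ with your real callables, the callables share some mutable state. If they are equal but the byte count is still off, the problem is more likely in how each callable builds its byte[].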

Returning the value from the loop

I am doing a POC with the rabbitMQ and writing a program to add two numbers and getting the response.
The code that we wrote to retrieve the value from the queue runs forever (in a while loop), and a line inside the while loop waits for some data to be retrieved from the queue; until it gets something from the queue, it does not go on to the next round of the while loop.
This means we get the value inside an infinite loop.
And I want to use this value for my next processing step.
while (true)
{
QueueingConsumer.Delivery delivery1;
try
{
delivery1 = consumer.nextDelivery();
//The above line waits until delivery get some value
String result1 = new String(delivery1.getBody());
System.out.println("Result received-"+ result1);
}
catch (InterruptedException ie)
{
continue;
}
} // end while
The problem is that I am not able to return the value from the while loop (I want it to run forever).
How can I do that so that the loop continues and I also get the processed data outside the loop?
If 'processing the result' is an operation that completes quickly, then just do it inline, e.g. by calling a separate function that does the actual processing:
void mainLoop()
{
while (true)
{
QueueingConsumer.Delivery delivery1;
try
{
delivery1 = consumer.nextDelivery();
//The above line waits until delivery get some value
String result1 = new String(delivery1.getBody());
System.out.println("Result received-"+ result1);
processResult(result1);
}
catch (InterruptedException ie)
{
continue;
}
} // end while
}
void processResult(String result)
{
// Do whatever needs to be done with 'result'
}
If you need processing to happen concurrently with the loop, then you will need to work with multiple threads and the problem gets a bit more complicated.
What do you mean by that exactly?
If you want to stay in the same thread, just call a function (work on the one message received and then read the next).
If you need concurrency (always read, regardless of whether you are processing a message or not), use a producer/consumer pattern.
Create one thread that
reads from the mq
posts into a (thread-safe) collection
signals that
goes back to read from the mq
Create at least one more thread that
waits for the signal
reads (and removes) message from the (thread-safe) collection
process the message
goes back to wait for the signal
hth
Mario
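The two-thread outline above could be sketched roughly like this. A BlockingQueue serves as the thread-safe collection, and its take() doubles as the "wait for the signal" step. The RabbitMQ consumer is replaced by a plain list of messages, and the STOP marker, class name and method names are made up so that the demo can terminate; a real loop would keep running.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class HandOffDemo {
    // Unique marker object, compared by identity, used only to end the demo.
    private static final String STOP = new String("STOP");

    // Placeholder for whatever needs to be done with a result.
    static String processResult(String result) {
        return "processed " + result;
    }

    static List<String> relay(List<String> messages) {
        BlockingQueue<String> handOff = new LinkedBlockingQueue<>();
        List<String> processed = Collections.synchronizedList(new ArrayList<>());

        // Thread 1: stand-in for the loop around consumer.nextDelivery().
        Thread reader = new Thread(() -> {
            try {
                for (String message : messages) {
                    handOff.put(message);
                }
                handOff.put(STOP);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        // Thread 2: blocks in take() until a message is available, then processes it.
        Thread worker = new Thread(() -> {
            try {
                while (true) {
                    String message = handOff.take();
                    if (message == STOP) {
                        break;
                    }
                    processed.add(processResult(message));
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        reader.start();
        worker.start();
        try {
            reader.join();
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new ArrayList<>(processed);
    }

    public static void main(String[] args) {
        System.out.println(relay(Arrays.asList("1+2", "3+4")));
    }
}
```

The reading thread never waits for processing to finish, which is the point of the pattern: reading and processing proceed independently, connected only by the queue.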
Give the variable that holds the result a wider scope (for example, declare it outside the loop, or as a field).
That way you gain access to its value outside the loop.
It sounds like you're referring to the yield feature which allows your function to return multiple values. As far as I know this is not supported out-of-the-box in Java but there are some projects available that implement this feature.
