Java - Most efficient random-access multi-threaded list


Chosen List Structure:
Synchronised LinkedList.
Scenario:
My program requires rendering some (rather computational) generated images in a grid. These images must update whenever some data value changes (on another thread), hence, I have a rendering queue to manage this.
The rendering queue is a synchronised LinkedList, where on a low-priority thread, it is constantly being iterated over to check if some render work needs doing. Since the images are based on all kinds of data, each of which could change independently, I needed some form of queue to combine changes.
Data tends to change in chunks, and so when a large batch comes through I see an imaginary line run down the area where it's re-rendering the tiles. To pretty this up a bit, I decided rather than rendering in standard order, I'd render them in a random order (to give a 'dissolve in/out' effect).
It looks lovely, but the only problem is that there is a notable difference in the amount of time it takes to complete with this effect running.
Problem:
I've theorised a couple of reasons accessing this list randomly instead of iteratively would cause such a notable delay. Firstly, the Random number generator's nextInt method might take up a significant enough amount of time. Secondly, since it's a LinkedList, getting the nth item might also be significant when the size of the list is in the 4000s range.
Is there any other reason for this delay that I might have overlooked? Rather than using a random number generator, or even a linked list, how else might I efficiently achieve a random access & remove from a list? If you've read the scenario, perhaps you can think of another way I could go about this entirely?
Requirements:
Multi-threaded addition to & modification of list.
Random access & removal of items from list.
Efficient operation, with large data sets & number of runs

You can use an ArrayList along with a couple of simple operations to implement this very efficiently.
To insert, always insert new work at the end of the list (an amortized constant time operation).
To extract a random piece of work, pick a random number i, swap the element at i with the element at the end of the list, and then extract and return that new last element.
Here's code (untested, uncompiled):
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

class RandomizedQueue<T> {
    private final List<T> workItems = new ArrayList<>();
    private final Random random;

    RandomizedQueue(Random random) {
        this.random = random;
    }

    // Amortized O(1): new work always goes at the end.
    public synchronized void insert(T item) {
        workItems.add(item);
    }

    // O(1): move the last element into the chosen slot, then drop the tail.
    public synchronized T extract() {
        if (workItems.isEmpty()) {
            return null; // or throw an exception
        }
        int pos = random.nextInt(workItems.size());
        int lastPos = workItems.size() - 1;
        T item = workItems.get(pos);
        workItems.set(pos, workItems.get(lastPos));
        workItems.remove(lastPos);
        return item; // the randomly chosen element, not the old last one
    }
}

You could perhaps use a PriorityQueue, and when adding things to this queue give each item a random priority. The rendering can just always take the top element on the queue since it is randomized already. Inserting at a "random" position in a PriorityQueue (or better put, with a random priority) is really fast.
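A minimal sketch of the random-priority idea, assuming a hypothetical wrapper class named RandomPriorityQueue (the name and the Entry helper are illustrative, not from any library). Each item gets a random int key on insertion, so polling the head returns items in shuffled order. Note that both operations are O(log n) on a binary heap, compared with O(1) for the swap-with-last approach.

```java
import java.util.PriorityQueue;
import java.util.Random;

// Hypothetical wrapper: items are ordered by a random key assigned on insert.
class RandomPriorityQueue<T> {
    // Pairs an item with the random key it received.
    private static final class Entry<T> {
        final int key;
        final T item;

        Entry(int key, T item) {
            this.key = key;
            this.item = item;
        }
    }

    private final PriorityQueue<Entry<T>> queue =
            new PriorityQueue<>((a, b) -> Integer.compare(a.key, b.key));
    private final Random random = new Random();

    // O(log n): the random key decides where the item lands in the heap.
    public synchronized void insert(T item) {
        queue.add(new Entry<>(random.nextInt(), item));
    }

    // O(log n): returns null when there is no work left.
    public synchronized T extract() {
        Entry<T> head = queue.poll();
        return head == null ? null : head.item;
    }

    public synchronized int size() {
        return queue.size();
    }
}
```

The methods are synchronized here only to mirror the other answer; java.util.concurrent.PriorityBlockingQueue would be another option if blocking behaviour is wanted.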

Related

Efficient way for checking if a string is present in an array of strings [duplicate]

I'm working on a little project in java, and I want to make my algorithm more efficient.
What I'm trying to do is check if a given string is present in an array of strings.
The thing is, I know a few ways to check if a string is present in an array of strings, but the array I am working with is pretty big (around 90,000 strings) and I am looking for a way to make the search more efficient, and the only ways I know are linear search based, which is not good for an array of this magnitude.
Edit: So I tried implementing the advice I was given, but the code I wrote accordingly is not working properly. I would love to hear your thoughts.
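// Assumes strArr is already sorted in String.compareTo order
// (for example via Arrays.sort); on an unsorted array the result is unreliable.
public static int binaryStringSearch(String[] strArr, String str) {
    int low = 0;
    int high = strArr.length - 1;
    while (low <= high) {
        int mid = low + (high - low) / 2; // avoids int overflow of low + high
        int cmp = strArr[mid].compareTo(str);
        if (cmp == 0) {
            return mid; // found
        } else if (cmp < 0) {
            low = mid + 1; // target sorts after strArr[mid]
        } else {
            high = mid - 1; // target sorts before strArr[mid]
        }
    }
    return -1; // not present
}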
Basically what it's supposed to do is return the index at which the string is present in the array, and if it is not in the array then return -1.

So you have a more or less fixed array of strings and then you throw a string at the code and it should tell you if the string you gave it is in the array, do I get that right?
So if your array pretty much never changes, it should be possible to sort the strings alphabetically once and then use binary search. Tom Scott did a good video on that (if you don't want to read a long, messy text written by someone who isn't a native English speaker, just watch this, that's all you need).
You look right in the middle and check: is the string you want before or after the string you just read? If it is already precisely the right one, you can stop. If the middle string comes after the one you want, you can eliminate every string after it; otherwise you eliminate every string before it. Of course, you also eliminate the middle string itself if it's not equal. Then you do it all over again: check the string in the middle of the ones that are left and eliminate based on the result. (You don't have to actually delete array items; it's enough to keep a variable for the lower and upper boundary, because you never delete elements in the middle.) You repeat that until there isn't a single string left, and then you can be sure that your input isn't in the array.
So by checking and comparing one string you don't eliminate just 1 item, like you would when checking one after the other; you remove at least half of what is left. With a list of 256 it should only take 8 compares (or 9, not quite sure, but I think it takes one more if you only want to know whether the item exists rather than find it), and for 65k (which almost matches your number) it takes 16. That's a lot more optimised.
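The sort-once-then-binary-search idea above can be sketched with the JDK's own Arrays.sort and Arrays.binarySearch, so there is no need to hand-roll the loop. The class name SortedLookup and the sample words are made up for illustration.

```java
import java.util.Arrays;

class SortedLookup {
    // Returns true if target is in sortedWords, which must already be
    // sorted in natural (String.compareTo) order.
    static boolean contains(String[] sortedWords, String target) {
        // binarySearch returns a non-negative index only when the key is found.
        return Arrays.binarySearch(sortedWords, target) >= 0;
    }

    public static void main(String[] args) {
        String[] words = {"pear", "apple", "fig", "cherry"}; // illustrative data
        Arrays.sort(words); // one-time O(n log n) cost
        System.out.println(contains(words, "fig"));  // prints true
        System.out.println(contains(words, "kiwi")); // prints false
    }
}
```

Each lookup after the one-time sort is O(log n), which for about 90,000 strings is roughly 17 comparisons.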
If it's not already sorted and you can't because that would take way too long or for some reason I don't get, then I don't quite know and I think there would be no way to make it faster if it's not ordered, then you have to check them one by one.
Hope that helped!
Edit: If you don't want to really sort all the items and just want to make it a bit faster (up to 26 times, if first letters were evenly distributed), just make 26 arrays, one per letter (in case you only use normal letters; otherwise make more and the speed boost will increase too), and then loop through all strings and put each into the array matching its first letter. That is much faster than sorting them normally, but it's a trade-off, since it's not as neat as binary search. You pretty much still use linear search (= looping through all of them and checking if they match), but you have already kinda ordered the items.
You can imagine it like two ways to sort a bunch of cards on a table so you can find them quicker, the lazy one and the not so lazy one. Say the cards are numbered 1-100, but not continuously; some are missing. One way would be to sort all the cards by number, but nicely sorting them takes some time. What you can do instead is make 10 rows of cards, one per tens digit, and within each row put the cards in any order. When someone wants card 38, you go to the row for the thirties and search linearly through it. That is much faster than having the cards randomly on your table, because you only have to search through a tenth of the cards, but you can't take shortcuts once you're in that row.
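A rough sketch of the first-letter bucketing described above. It uses a HashMap keyed by first character instead of 26 fixed arrays, which is a substitution on my part so that non-letter first characters also work; the class name FirstLetterBuckets is hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class FirstLetterBuckets {
    // One unsorted bucket per first character.
    private final Map<Character, List<String>> buckets = new HashMap<>();

    FirstLetterBuckets(String[] words) {
        for (String w : words) {
            if (w.isEmpty()) {
                continue; // assumption: empty strings are simply ignored
            }
            buckets.computeIfAbsent(w.charAt(0), k -> new ArrayList<>()).add(w);
        }
    }

    // Linear search, but only within the bucket for the target's first character.
    boolean contains(String target) {
        if (target.isEmpty()) {
            return false;
        }
        List<String> bucket = buckets.get(target.charAt(0));
        return bucket != null && bucket.contains(target);
    }
}
```

Building the buckets is a single O(n) pass; each lookup still scans one bucket linearly, so the speed-up depends on how evenly the first characters are spread.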

Depending on the requirements, there can be so many ways to deal with it. It's better to use a collection class for the rich API available OOTB.
Are the strings supposed to be unique i.e. the duplicate strings need to be discarded automatically and the insertion order does not matter: Use Set<String> set = new HashSet<>() and then you can use Set#contains to check the presence of a particular string.
Are the strings supposed to be unique i.e. the duplicate strings need to be discarded automatically and also the insertion order needs to be preserved: Use Set<String> set = new LinkedHashSet<>() and then you can use Set#contains to check the presence of a particular string.
Can the list contain duplicate strings? If yes, you can use a List<String> list = new ArrayList<>() to benefit from its rich API and to avoid having to fix the size beforehand (Note: the maximum number of elements is Integer.MAX_VALUE). However, List#contains always scans the list sequentially. Despite this limitation (or feature), you can gain some efficiency by sorting the list (again, it's subject to your requirement). Check Why is processing a sorted array faster than processing an unsorted array? to learn more about it.
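As a small illustration of the Set-based options above, here is a sketch that copies an array into a HashSet and checks membership with Set#contains. The class name SetLookup and the sample data are assumptions for the example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class SetLookup {
    // Copies the array into a HashSet; duplicates are dropped automatically.
    static Set<String> toSet(String[] words) {
        return new HashSet<>(Arrays.asList(words));
    }

    public static void main(String[] args) {
        String[] words = {"pear", "apple", "fig", "apple"}; // illustrative data
        Set<String> set = toSet(words);
        System.out.println(set.contains("fig"));  // prints true
        System.out.println(set.contains("kiwi")); // prints false
        System.out.println(set.size());           // prints 3
    }
}
```

Building the set is a one-time O(n) pass, and each contains call is expected O(1). Swapping HashSet for LinkedHashSet keeps insertion order with the same lookup API.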

You could use a HashMap which stores all the strings if
Contains query is very frequent and lookup strings do not change frequently.
Memory is not a problem (:D) .

Impose order in Jsprit with HardActivityConstraint

In a scenario of re-solving a previously solved problem (with some new data, of course), it's typically impossible to re-assign a vehicle's very-first assignment once it was given. The driver is already on its way, and any new solution has to take into account that:
the job must remain his (can't be assigned to another vehicle)
the activity that's been assigned to him as the very-first, must remain so in future solutions
For the sake of simplicity, I'm using a single vehicle scenario, and only trying to impose the second bullet (i.e. ensure that a certain activity will be the first in the solution).
This is how I defined the constraint:
new HardActivityConstraint()
{
@Override
public ConstraintsStatus fulfilled(JobInsertionContext iFacts, TourActivity prevAct, TourActivity newAct, TourActivity nextAct,
double prevActDepTime)
{
String locationId = newAct.getLocation().getId();
// we want to make sure that any solution will have "C1" as its first activity
boolean activityShouldBeFirst = locationId.equals("C1");
boolean attemptingToInsertFirst = (prevAct instanceof Start);
if (activityShouldBeFirst && !attemptingToInsertFirst)
return ConstraintsStatus.NOT_FULFILLED_BREAK;
if (!activityShouldBeFirst && attemptingToInsertFirst)
return ConstraintsStatus.NOT_FULFILLED;
return ConstraintsStatus.FULFILLED;
}
}
This is how I build the algorithm:
VehicleRoutingAlgorithmBuilder vraBuilder;
vraBuilder = new VehicleRoutingAlgorithmBuilder(vrpProblem, "schrimpf.xml");
vraBuilder.addCoreConstraints();
vraBuilder.addDefaultCostCalculators();
StateManager stateManager = new StateManager(vrpProblem);
ConstraintManager constraintManager = new ConstraintManager(vrpProblem, stateManager);
constraintManager.addConstraint(new HardActivityConstraint() { ... }, Priority.HIGH);
vraBuilder.setStateAndConstraintManager(stateManager, constraintManager);
VehicleRoutingAlgorithm algorithm = vraBuilder.build();
The results are not good. I'm only getting solutions with a single job assigned (the one with the required activity). In debug it's clear that the job insertion iterations consider many viable options that appear to solve the problem entirely, but at the bottom line, the best solution returned by the algorithm doesn't include the other jobs.
UPDATE: even more surprising, is that when I use the constraint in scenarios with over 5 vehicles, it works fine (worst results are with 1 vehicle).
I'll gladly attach more information if needed.
Thanks
Zach

First, you can use initial routes to ensure that certain jobs need to be assigned to specific vehicles right from the beginning (see example).
Second, to ensure that no activity will be inserted between start and your initial job(location) (e.g. "C1" in your example), you need to prohibit it the way you defined your HardActConstraint, just modify it so that a newAct can never be between prevAct=Start and nextAct=act(C1).
Third, with regard to your update, just keep in mind that the essence of the algorithm is to ruin part of the solution (remove a number of jobs) and recreate the solution again (insert the unassigned jobs). Currently, the schrimpf algorithm ruins a number of jobs relative to the total number of jobs, i.e. noJobs = 0.5 * totalNoJobs for the random ruin and 0.3 * totalNoJobs for the radial ruin. If your problem is very small, the share of jobs to be removed might not be sufficient. This is going to change with the next release, where you can use an algorithm out of the box which defines an absolute minimum of jobs that need to be removed. For the time being, modify the shares in your algorithmConfig.xml.

is there any faster way to generate List of N integer

Well, I know it is a very novice question, but nothing is coming to mind. Currently I am trying this, but it is the least efficient way for such a big number. Can anyone help?
int count = 66000000;
LinkedList<Integer> list = new LinkedList<Integer>();
for (int i=1;i<=count;i++){
list.add(i);
//System.out.println(i);
}
EDIT:
Actually I have to perform an operation on the whole list (queue) repeatedly (say, on a condition, remove some elements and add them again), so having to iterate the whole list became very slow; with such a number it took more than 10 min.

The size of your output is O(n), therefore it's literally impossible to have an algorithm that populates your list with better than O(n) time complexity.
You're spending a whole lot more time just printing your numbers to the screen than you actually are spending generating the list. If you really want to speed this code up, remove the
System.out.println(i);
On a separate note, I've noticed that you're using a LinkedList. If you used an array (or array-based list) it should be faster.

You could implement a List where the get(int index) method simply returns the index (or some value based on the index). The creation of the list would then be constant time (O(1)). The list would have to be immutable.
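A minimal sketch of such a computed list, built on java.util.AbstractList. The class name CountingList is made up; get returns index + 1 only to match the 1..count values in the question.

```java
import java.util.AbstractList;

// Read-only list whose elements are computed from the index, never stored.
class CountingList extends AbstractList<Integer> {
    private final int size;

    CountingList(int size) {
        this.size = size;
    }

    @Override
    public Integer get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return index + 1; // values 1..size, as in the question's loop
    }

    @Override
    public int size() {
        return size;
    }
}
```

Creating new CountingList(66000000) allocates nothing per element. Because AbstractList does not implement add, set or remove, calling them throws UnsupportedOperationException, which is why this only fits if the list never needs to change.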

Your question isn't just about building the list, it includes deletion and re-insertion. I suspect you should be using a HashSet, maybe even a BitSet instead of a List of any kind.
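A sketch of the BitSet suggestion, assuming the list only needs to record which integers in 1..count are currently present (order and duplicates are not representable this way). The class name BitSetNumbers and the divisible-by-3 condition are placeholders.

```java
import java.util.BitSet;

class BitSetNumbers {
    // Bit i is set when the integer i is "in the collection".
    static BitSet allUpTo(int count) {
        BitSet present = new BitSet(count + 1);
        present.set(1, count + 1); // sets bits 1..count; the end index is exclusive
        return present;
    }

    // Example pass: remove every present number that meets some condition.
    static void removeMultiplesOfThree(BitSet present) {
        for (int i = present.nextSetBit(0); i >= 0; i = present.nextSetBit(i + 1)) {
            if (i % 3 == 0) {
                present.clear(i);
            }
        }
    }

    public static void main(String[] args) {
        BitSet present = allUpTo(66000000);
        removeMultiplesOfThree(present);
        present.set(9); // adding a number back is a single bit write
        System.out.println(present.cardinality());
    }
}
```

At one bit per number, 66,000,000 entries need roughly 8 MB, far less than a LinkedList of boxed Integer nodes.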

I need to implement an array hashtable that works without initializing the array to null at the start. Any clue how to do that?

So, here is the actual question (it's for a homework):
A hashtable is a data structure that allows access and manipulation of the data in constant time (O(1)). The hashtable array must be initialized to null during the creation of the hashtable in order to identify the empty cells. In most cases, the time penalty is enormous, especially considering that most cells will never be read. We ask that you implement a hashtable that bypasses this problem at the price of a heavier insertion, but still in constant time. For the purpose of this homework and to simplify your work, we suppose that you can't delete elements in this hashtable.
In the archive of this homework you will find the interface of a hashtable that you need to fill in. You can use the function hashCode() from Java as a hash function. You will have to use the Vector data structure from Java in order to bypass the initialization and you have to find by yourself how to do so. You can only insert elements at the end of the vector so that the complexity is still O(1).
Here are some facts to consider:
In a hashtable containing integers, the table contains numeric values (but they don't make any sense).
In a stack, you cannot access elements over the highest element, but you know for sure that all the values are valid. Furthermore, you know the index of the highest element.
Use those facts to bypass the initialization of the hashtable. The table must use linear probing to resolve collisions.
Also, here is the interface that I need to implement for this homework:
public interface NoInitHashTable<E>
{
public void insert(E e);
public boolean contains(E e);
public void rehash();
public int nextPrime(int n);
public boolean isPrime(int n);
}
I have already implemented nextPrime and isPrime (I don't think they are different from a normal hashtable). The three other I need to figure out.
I thought a lot about it and discussed it with my teammate but I really can't find anything. I only need to know the basic principle of how to implement it, I can handle the coding.
tl;dr I need to implement an array hashtable that works without initializing the array to null at the start. The insertion must be done in constant time. I only need to know the basic principle of how to do that.

I think I have seen this in a book as exercise with answer at the back, but I can't remember which book or where. It is generally relevant to the question of why we usually concentrate on the time a program takes rather than the space - a program that runs efficiently in time shouldn't need huge amounts of space.
Here is some pseudo-code that checks if a cell in the hash table is valid. I will leave the job of altering the data structures it defines to make another cell in the hash table valid as a remaining exercise for the reader.
// each cell here is for a cell at the same offset in the
// hash table
int numValidWhenFirstSetValid[SIZE];
int numValidSoFar = 0; // initialise only this
// Only cells 0..numValidSoFar-1 here are valid.
int validOffsetsInOrderSeen[SIZE];

boolean isValid(int offsetInArray)
{
    int supposedWhenFirstValid =
        numValidWhenFirstSetValid[offsetInArray];
    if (supposedWhenFirstValid >= numValidSoFar)
    {
        return false;
    }
    if (supposedWhenFirstValid < 0)
    {
        return false;
    }
    if (validOffsetsInOrderSeen[supposedWhenFirstValid] !=
        offsetInArray)
    {
        return false;
    }
    return true;
}
Edit - this is exercise 24 in section 2.2.6 of Knuth Vol 1. The provided answer references exercise 2.12 of "The Design and Analysis of Computer Algorithms" by Aho, Hopcroft, and Ullman. You can avoid any accusation of plagiarism in your answer by referencing the source of the question you were asked :-)
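For readers who want to see the two arrays working together, here is a hedged Java sketch of the validity check from the pseudo-code plus one possible markValid step. Java always zero-fills new arrays, so the constructor deliberately overwrites them with junk to simulate uninitialized memory; that junk-filling, the class name ValidityTracker and the seed parameter exist only for the demonstration. In the homework setting, validOffsets would presumably be the Vector that only ever grows at the end.

```java
import java.util.Random;

class ValidityTracker {
    private final int[] whenFirstValid; // per hash-table cell; may hold garbage
    private final int[] validOffsets;   // only entries 0..numValid-1 are meaningful
    private int numValid = 0;           // the only value that must be initialised

    ValidityTracker(int size, long seed) {
        whenFirstValid = new int[size];
        validOffsets = new int[size];
        // Demo only: fill with junk to mimic memory that was never initialised.
        Random junk = new Random(seed);
        for (int i = 0; i < size; i++) {
            whenFirstValid[i] = junk.nextInt();
            validOffsets[i] = junk.nextInt();
        }
    }

    // A cell is valid only if both arrays agree about it.
    boolean isValid(int offset) {
        int k = whenFirstValid[offset];
        return k >= 0 && k < numValid && validOffsets[k] == offset;
    }

    // O(1): record the cell at the end of the "seen" list and point back at it.
    void markValid(int offset) {
        if (isValid(offset)) {
            return;
        }
        whenFirstValid[offset] = numValid;
        validOffsets[numValid] = offset;
        numValid++;
    }
}
```

Garbage in whenFirstValid can never fake validity: it either points outside 0..numValid-1, or it points at a validOffsets entry that was written for a different cell.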

Mark each element in the hashtable with some color (1, 2, ...).
For example:
Current color:
int curColor = 0;
When you put an element into the hash table, associate the current color (curColor) with it.
When you search, skip elements that don't have the current color (keep only those with element.color == curColor).
If you need to clear the hash table, just increment the current color (curColor++).
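A small sketch of the color-stamp idea, under two assumptions that are mine rather than the answer's: the stamp array is zero-filled once (which Java does on allocation), and curColor starts at 1 so that zero-filled stamps read as empty. The class name ColorStampedSlots is hypothetical, and counter overflow is ignored.

```java
class ColorStampedSlots<T> {
    private final Object[] values;
    private final int[] colors; // zero-filled by Java when allocated
    private int curColor = 1;   // assumption: 0 means "never written"

    ColorStampedSlots(int capacity) {
        values = new Object[capacity];
        colors = new int[capacity];
    }

    // Store a value and stamp the slot with the current color.
    void put(int slot, T value) {
        values[slot] = value;
        colors[slot] = curColor;
    }

    // A slot counts as occupied only if its stamp matches the current color.
    @SuppressWarnings("unchecked")
    T get(int slot) {
        return colors[slot] == curColor ? (T) values[slot] : null;
    }

    // O(1) clear: every existing stamp becomes stale at once.
    void clear() {
        curColor++;
    }
}
```

As written, this makes clearing O(1) but still relies on the stamp array having been initialised once, so on its own it does not remove the first initialisation the homework asks about.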

What is a data structure that has O(1) for append, prepend, and retrieve element at any location?

I'm looking for Java solution but any general answer is also OK.
Vector/ArrayList is O(1) for append and retrieve, but O(n) for prepend.
LinkedList (in Java implemented as doubly-linked-list) is O(1) for append and prepend, but O(n) for retrieval.
Deque (ArrayDeque) is O(1) for everything above but cannot retrieve element at arbitrary index.
In my mind, a data structure that satisfies the requirements above has 2 growable lists inside (one for prepend and one for append) and also stores an offset to determine where to get the element during retrieval.

You're looking for a double-ended queue. The C++ STL implements this the way you want, in that you can index into it, but Java's does not, as you noted. You could conceivably roll your own from standard components by using two arrays and storing where "zero" is. This could be wasteful of memory if you end up moving a long way from zero, but if you get too far you can rebase and allow the deque to crawl into a new array.
A more elegant solution that doesn't really require so much fanciness in managing two arrays is to impose a circular array onto a pre-allocated array. This would require implementing push_front, push_back, and the resizing of the array behind it, but the conditions for resizing and such would be much cleaner.

A deque (double-ended queue) may be implemented to provide all these operations in O(1) time, although not all implementations do. I've never used Java's ArrayDeque, so I thought you were joking about it not supporting random access, but you're absolutely right — as a "pure" deque, it only allows for easy access at the ends. I can see why, but that sure is annoying...
To me, the ideal way to implement an exceedingly fast deque is to use a circular buffer, especially since you are only interested in adding and removing at the front and back. I'm not immediately aware of one in Java, but I've written one in Objective-C as part of an open-source framework. You're welcome to use the code, either as-is or as a pattern for implementing your own.
Here is a WebSVN portal to the code and the related documentation. The real meat is in the CHAbstractCircularBufferCollection.m file — look for the appendObject: and prependObject: methods. There is even a custom enumerator ("iterator" in Java) defined as well. The essential circular buffer logic is fairly trivial, and is captured in these 3 centralized #define macros:
#define transformIndex(index) ((headIndex + index) % arrayCapacity)
#define incrementIndex(index) (index = (index + 1) % arrayCapacity)
#define decrementIndex(index) (index = ((index) ? index : arrayCapacity) - 1)
As you can see in the objectAtIndex: method, all you do to access the Nth element in a deque is array[transformIndex(N)]. Note that I make tailIndex always point to one slot beyond the last stored element, so if headIndex == tailIndex, the array is full, or empty if the size is 0.
Hope that helps. My apologies for posting non-Java code, but the question author did say general answers were acceptable.
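For a Java flavour of the same circular-buffer logic, here is a minimal sketch. It is not a port of the Objective-C framework: the class name CircularDeque, the initial capacity of 8 and the doubling growth policy are my own assumptions, and it tracks head plus size instead of a tailIndex.

```java
class CircularDeque<T> {
    private Object[] buffer = new Object[8]; // assumed initial capacity
    private int head = 0;                    // physical index of logical element 0
    private int size = 0;

    // Amortized O(1): write just past the last element, wrapping around.
    public void append(T item) {
        growIfFull();
        buffer[(head + size) % buffer.length] = item;
        size++;
    }

    // Amortized O(1): step head back one slot, wrapping around.
    public void prepend(T item) {
        growIfFull();
        head = (head == 0 ? buffer.length : head) - 1;
        buffer[head] = item;
        size++;
    }

    // O(1) random access: same idea as transformIndex in the macros above.
    @SuppressWarnings("unchecked")
    public T get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return (T) buffer[(head + index) % buffer.length];
    }

    public int size() {
        return size;
    }

    // Doubles the array and unrolls the contents so that head is 0 again.
    private void growIfFull() {
        if (size < buffer.length) {
            return;
        }
        Object[] bigger = new Object[buffer.length * 2];
        for (int i = 0; i < size; i++) {
            bigger[i] = buffer[(head + i) % buffer.length];
        }
        buffer = bigger;
        head = 0;
    }
}
```

Append and prepend are O(1) except when the buffer has to grow, so they are amortized rather than strict constant time; get is always O(1).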

If you treat append to a Vector/ArrayList as O(1) - which it really isn't, but might be close enough in practice -
(EDIT - to clarify - append may be amortized constant time, that is - on average, the addition would be O(1), but might be quite a bit worse on spikes. Depending on context and the exact constants involved, this behavior can be deadly).
(This isn't Java, but some made-up language...).
One vector that will be called "Forward".
A second vector that will be called "Backwards".
When asked to append -
Forward.Append().
When asked to prepend -
Backwards.Append().
When asked to query -
if ( Index < Backwards.Size() )
{
return Backwards[ Backwards.Size() - Index - 1 ]
}
else
{
return Forward[ Index - Backwards.Size() ]
}
(and also check for the index being out of bounds).

Your idea might work. If those are the only operations you need to support, then two Vectors are all you need (call them Head and Tail). To prepend, you append to head, and to append, you append to tail. To access an element, if the index is less than head.Length, then return head[head.Length-1-index], otherwise return tail[index-head.Length]. All of these operations are clearly O(1).

Here is a data structure that supports O(1) append, prepend, first, last and size. We can easily add other methods from AbstractList<A> such as delete and update.
import java.util.ArrayList;

public class FastAppendArrayList<A> {
    private final ArrayList<A> appends = new ArrayList<A>();
    // Stored in reverse: the most recently prepended element is last.
    private final ArrayList<A> prepends = new ArrayList<A>();

    public void append(A element) {
        appends.add(element);
    }

    public void prepend(A element) {
        prepends.add(element);
    }

    public A get(int index) {
        int p = prepends.size();
        // Logical indices 0..p-1 live in prepends, read back to front.
        return index < p ? prepends.get(p - 1 - index) : appends.get(index - p);
    }

    public int size() {
        return prepends.size() + appends.size();
    }

    public A first() {
        return prepends.isEmpty() ? appends.get(0) : prepends.get(prepends.size() - 1);
    }

    public A last() {
        return appends.isEmpty() ? prepends.get(0) : appends.get(appends.size() - 1);
    }
}

What you want is a double-ended queue (deque) like the STL has, since Java's ArrayDeque lacks get() for some reason. There were some good suggestions and links to implementations here:
Java equivalent of std::deque?
