Drop part of a List<> when encountering OutOfMemoryException - java

I'm writing a program that is supposed to continually push generated data into a List sensorQueue. The side effect is that I will eventually run out of memory. When that happens, I'd like to drop part of the list, in this example the first, or older, half. I imagine that if I encounter an OutOfMemoryException, I won't be able to just use sensorQueue = sensorQueue.subList((sensorQueue.size() / 2), sensorQueue.size());, so I came here looking for an answer.
My code:
public static void pushSensorData(String sensorData) {
    try {
        sensorQueue.add(parsePacket(sensorData));
    } catch (OutOfMemoryError e) {
        System.out.println("Backlog full");
        //TODO: Cut the sensorQueue in half to make room
    }
    System.out.println(sensorQueue.size());
}

Is there an easy way to detect an impending OutOfMemoryException then?
You can use something like the code below to determine the MAX memory and USED memory. Using that information you can decide on the next action in your program, e.g. reduce the list's size or drop some elements.
// requires java.lang.management.ManagementFactory, MemoryMXBean and MemoryUsage
final int MEGABYTE = (1024 * 1024);
MemoryMXBean memoryBean = ManagementFactory.getMemoryMXBean();
MemoryUsage heapUsage = memoryBean.getHeapMemoryUsage();
long maxMemory = heapUsage.getMax() / MEGABYTE;
long usedMemory = heapUsage.getUsed() / MEGABYTE;
Hope this helps!
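As a rough illustration of how those figures might drive the trimming decision (the 90% threshold and the trimming call are illustrative choices, not part of the answer above):
// hypothetical threshold: trim the queue once more than 90% of the heap is used
if (usedMemory > maxMemory * 9 / 10) {
    // drop the older half, as in the question's TODO
    sensorQueue.subList(0, sensorQueue.size() / 2).clear();
}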

The problem with subList is that it only creates a view, keeping the original list in memory. However, ArrayList and other extensions of AbstractList have removeRange(int fromIndex, int toIndex), which removes elements from the current list itself and so doesn't require additional memory.
For the other List implementations there is a similar remove(int index), which you can call repeatedly for the same purpose.
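Note that removeRange is protected, so calling it directly requires a small ArrayList subclass; a minimal sketch (TrimmableList is an illustrative name, not from the answer):
class TrimmableList<E> extends ArrayList<E> {
    // expose the protected removeRange(fromIndex, toIndex) inherited from ArrayList
    public void dropRange(int fromIndex, int toIndex) {
        removeRange(fromIndex, toIndex);
    }
}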

I think your idea is severely flawed (sorry).
There is no OutOfMemoryException; there is only OutOfMemoryError. Why is that important? Because errors leave the app in an unstable state. I'm not that sure about that claim in general, but it definitely holds for OutOfMemoryError, because there is no guarantee that you will be able to catch it. You can consume all of the memory within your try-catch block, and the OutOfMemoryError will be thrown somewhere in JDK code instead. So your catching is pointless.
And what is the reason for this anyway? How many messages do you want in the list? Say your message is 1 MB and your heap is 1000 MB. If we stop considering other classes, your heap size dictates that your list will contain up to 1000 messages, right? Wouldn't it be easier to set the heap sufficiently big for your desired number of messages, and specify the message count in a simpler, integral form? And if your answer is "no", then you still cannot catch OutOfMemoryError reliably, so I'd advise that your answer should rather be "yes".
If you really need to consume everything possible, then checking memory usage as a percentage, as fabsas recommended, could be the way. But I'd go with the integral definition, which is easier to manage: your list will contain up to N messages.
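A minimal sketch of that integral approach, assuming sensorQueue is an ArrayList and MAX_MESSAGES is an illustrative constant sized to fit the heap:
private static final int MAX_MESSAGES = 1000; // pick to match your heap and message size

public static void pushSensorData(String sensorData) {
    if (sensorQueue.size() >= MAX_MESSAGES) {
        // drop the older half to make room, no OutOfMemoryError involved
        sensorQueue.subList(0, sensorQueue.size() / 2).clear();
    }
    sensorQueue.add(parsePacket(sensorData));
}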

You can drop a range of elements from an ArrayList using subList:
list.subList(from, to).clear();
Where from is the first index of the range to be removed and to is the index just past the last. In your case, you can do something like:
list.subList(0, sensorQueue.size() / 2).clear();
Note that subList returns a view of the original list, so clearing the view removes that range from the underlying list.


Is LinkedList.toString().replace() O(2*k)?

I know that converting a suitable object (e.g., a linked list) to a string using the toString() method is an O(n) operation, where n is the length of the linked list. However, if you then wanted to replace something in that string using the replace() method, is that also an O(k) operation, where k is the length of the string?
For example, for the line String str = path.toString().replace("[", "").replace("]", "").replace(",", "");, does this run through the length of the linked list 1 time, and then the length of the string an additional 3 times? If so, is there a more efficient way to do what that line of code does?
Yes, it would. replace has no idea that [ and ] are only found at the start and end. In fact, it's worse - you get another loop for copying the string over (the string has an underlying array and that needs to be cloned in its entirety to lop a character out of it).
If your intent is to replace every [ in the string then, no, there is no faster way. However, if your actual intent is simply not to have the opening and closing brace, then write your own loop to toString the contents. Something like:
LinkedList<Foo> foos = ...;
StringBuilder out = new StringBuilder();
for (Foo f : foos) out.append(out.length() == 0 ? "" : ", ").append(f);
return out.toString();
Or even (this works directly only if Foo is a CharSequence, such as String):
String.join(", ", foos);
Or even:
foos.stream().map(Object::toString).collect(Collectors.joining(", "));
None of this is the same thing as .replace("[", "") - after all, if a [ symbol is part of the toString() of any Foo object, it would be stripped out as well with .replace("[", "") - though you probably didn't want that to happen.
Note that the way modern CPUs work, unless that list has over a few thousand elements in it, looping it 4 times is essentially free and takes no measurable time. The concept of O(n) 'kicks in' after a certain number of loops; on modern hardware, it tends to be a lot of loops before it matters. Often other concerns are much more important. As a simple example, LinkedList in general has horrible performance relative to something like ArrayList, even in cases where, O(k)-wise, it should be faster. That's due to the way linked lists create extra objects, and to how these tend to be non-contiguous (not near each other in memory). Modern CPUs can't read main memory directly; they can only ask the memory controller to replace one of the on-die cache pages with the contents of another memory page, which takes 500 to 1000 cycles. The CPU will ask the memory controller to do that and then go to sleep for those 1000 cycles. You can see how reducing the number of times it does this can have a rather marked effect on performance, and yet the O(k) analysis doesn't and cannot take it into account.
Do not worry about performance unless you have a real-life scenario where the program appears to run slower than you think it should. Then use a profiler to figure out which 1% of the code is eating 99% of the resources (because it's virtually always a 1% 'hot path' that is responsible) and optimize just that 1%. It's pretty much impossible to predict what the 1% is going to be, so don't bother trying to do so while writing code; it just leads you to write harder-to-maintain, less flexible code, which ironically enough tends to lead to situations where adjusting the hot path is harder. Worrying about performance, in essence, slows down the code. Hence it's very important not to worry about that, and to worry instead about code that is easy to read and easy to modify.

Java - Most efficient random-access multi-threaded list

Chosen List Structure:
Synchronised LinkedList.
Scenario:
My program requires rendering some (rather computationally expensive) generated images in a grid. These images must update whenever some data value changes (on another thread); hence, I have a rendering queue to manage this.
The rendering queue is a synchronised LinkedList, where on a low-priority thread, it is constantly being iterated over to check if some render work needs doing. Since the images are based on all kinds of data, each of which could change independently, I needed some form of queue to combine changes.
Data tends to change in chunks, and so when a large batch comes through I see an imaginary line run down the area where it's re-rendering the tiles. To pretty this up a bit, I decided rather than rendering in standard order, I'd render them in a random order (to give a 'dissolve in/out' effect).
It looks lovely, but the only problem is that there is a notable difference in the amount of time it takes to complete with this effect running.
Problem:
I've theorised a couple of reasons accessing this list randomly instead of iteratively would cause such a notable delay. Firstly, the Random number generator's nextInt method might take up a significant enough amount of time. Secondly, since it's a LinkedList, getting the nth item might also be significant when the size of the list is in the 4000s range.
Is there any other reason for this delay that I might have overlooked? Rather than using a random number generator, or even a linked list, how else might I efficiently achieve a random access & remove from a list? If you've read the scenario, perhaps you can think of another way I could go about this entirely?
Requirements:
Multi-threaded addition to & modification of list.
Random access & removal of items from list.
Efficient operation, with large data sets & number of runs
You can use an ArrayList along with a couple of simple operations to implement this very efficiently.
To insert, always insert new work at the end of the list (an amortized constant time operation).
To extract a random piece of work, pick a random number i, swap the element at i with the element at the end of the list, and then extract and return that new last element.
Here's code (untested, uncompiled):
class RandomizedQueue<T> {
    private final List<T> workItems = new ArrayList<>();
    private final Random random;

    RandomizedQueue(Random random) {
        this.random = random;
    }

    public synchronized void insert(T item) {
        workItems.add(item);
    }

    public synchronized T extract() {
        if (workItems.isEmpty()) {
            return null; // or throw an exception
        }
        int pos = random.nextInt(workItems.size());
        int lastPos = workItems.size() - 1;
        T item = workItems.get(pos);
        // move the last element into the chosen slot, then drop the now-redundant
        // last slot, so removal stays O(1); return the randomly chosen element
        workItems.set(pos, workItems.get(lastPos));
        workItems.remove(lastPos);
        return item;
    }
}
You could perhaps use a PriorityQueue, and when adding things to this queue give each item a random priority. The rendering can just always take the top element on the queue since it is randomized already. Inserting at a "random" position in a PriorityQueue (or better put, with a random priority) is really fast.
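A minimal sketch of that idea, assuming a PriorityBlockingQueue for the multi-threaded access and an illustrative wrapper type holding the random priority:
import java.util.Random;
import java.util.concurrent.PriorityBlockingQueue;

class RandomOrderQueue<T> {
    // pairs a work item with a random priority, so the heap hands items back shuffled
    private static final class Entry<E> implements Comparable<Entry<E>> {
        final E item;
        final int priority;
        Entry(E item, int priority) { this.item = item; this.priority = priority; }
        public int compareTo(Entry<E> other) {
            return Integer.compare(priority, other.priority);
        }
    }

    private final PriorityBlockingQueue<Entry<T>> queue = new PriorityBlockingQueue<>();
    private final Random random = new Random();

    public void insert(T item) {
        queue.offer(new Entry<>(item, random.nextInt()));
    }

    public T extract() {
        Entry<T> e = queue.poll(); // null if the queue is empty
        return e == null ? null : e.item;
    }
}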

Existing solution to "smart" initial capacity for StringBuilder

I have a piece of logging- and tracing-related code which is called often throughout the code base, especially when tracing is switched on. A StringBuilder is used to build a String. The strings have a reasonable maximum length, I suppose on the order of hundreds of chars.
Question: Is there existing library to do something like this:
// in reality, StringBuilder is final,
// would have to create a delegating version instead,
// which is quite a big class because of all the append() overloads
public class SmarterBuilder extends StringBuilder {
    private final AtomicInteger capRef;

    SmarterBuilder(AtomicInteger capRef) {
        // optionally save memory at the expense of worst-case resizes:
        // super(capRef.get() * 3 / 4);
        super(capRef.get());
        this.capRef = capRef;
    }

    public void syncCap() {
        // call when the string is fully built
        int cap;
        do {
            cap = capRef.get();
            if (cap >= length()) break;
        } while (!capRef.compareAndSet(cap, length()));
    }
}
To take advantage of this, my logging-related class would have a shared capRef variable with suitable scope.
(Bonus Question: I'm curious, is it possible to do syncCap() without looping?)
Motivation: I know the default capacity of StringBuilder is almost always too little. I could (and currently do) throw in an ad-hoc initial capacity value of 100, which still results in a resize in some number of cases, but not always. However, I do not like magic numbers in the source code, and this feature is a case of "optimize once, use in every project".
Make sure you do the performance measurements to make sure you really are getting some benefit for the extra work.
As an alternative to a StringBuilder-like class, consider a StringBuilderFactory. It could provide two static methods, one to get a StringBuilder, and the other to be called when you finish building a string. You could pass it a StringBuilder as argument, and it would record the length. The getStringBuilder method would use statistics recorded by the other method to choose the initial size.
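A minimal sketch of that factory idea, reusing the CAS-based recording scheme from the question (all names here are illustrative):
import java.util.concurrent.atomic.AtomicInteger;

final class StringBuilderFactory {
    private static final AtomicInteger observedCap = new AtomicInteger(64); // arbitrary seed

    static StringBuilder getStringBuilder() {
        return new StringBuilder(observedCap.get());
    }

    // call when a string is fully built; records the largest length seen so far
    static void recordLength(StringBuilder sb) {
        int len = sb.length();
        int cap;
        do {
            cap = observedCap.get();
            if (cap >= len) break;
        } while (!observedCap.compareAndSet(cap, len));
    }
}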
There are two ways you could avoid looping in syncCap:
Synchronize.
Ignore failures.
The argument for ignoring failures in this situation is that you only need a random sampling of the actual lengths. If another thread is updating at the same time you are getting an up-to-date view of the string lengths anyway.
You could store the length of each string in a statistics array, run your app, and at shutdown take the 90th percentile of your string lengths (sort all string length values, and take the value at array position sortedLengths.size() * 0.9).
That way you create an initial StringBuilder size into which 90% of your strings will fit.
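A minimal sketch of that calculation, assuming the lengths were collected into a list during the run (names are illustrative; uses java.util.ArrayList, Collections and List):
static int percentile90(List<Integer> recordedLengths) {
    List<Integer> sorted = new ArrayList<>(recordedLengths);
    Collections.sort(sorted);
    // 90% of the recorded strings are no longer than this value
    return sorted.get((int) (sorted.size() * 0.9));
}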
Update
The value could be hard coded (like Java does for the value 10 in ArrayList), read from a config file, or calculated automatically in a test phase. But the percentile calculation is not free, so it's best to run your project for some time, measure the 90th percentile on the fly inside the SmartBuilder, output that percentile from time to time, and later change the property file to use the value.
That way you would get optimal results for each project.
Or, if you go one step further: let your smart builder update that value in the config file from time to time.
But all this is not worth the effort; you would do it only for data with some millions of entries, like digital road maps, etc.

Java: Filling in-memory sorted batches

So I'm using Java to do multi-way external merge sorts of large on-disk files of line-delimited tuples. Batches of tuples are read into a TreeSet, which are then dumped into on-disk sorted batches. Once all of the data have been exhausted, these batches are then merge-sorted to the output.
Currently I'm using magic numbers to figure out how many tuples we can fit into memory. This is based on a static figure indicating roughly how many tuples fit per MB of heap space, and how much heap space is available, using:
long max = Runtime.getRuntime().maxMemory();
long used = Runtime.getRuntime().totalMemory();
long free = Runtime.getRuntime().freeMemory();
long space = free + (max - used);
However, this does not always work so well since we may be sorting different length tuples (for which the static tuple-per-MB figure might be too conservative) and I now want to use flyweight patterns to jam more in there, which may make the figure even more variable.
So I'm looking for a better way to fill the heap-space to the brim. Ideally the solution should be:
reliable (no risk of heap-space exceptions)
flexible (not based on static numbers)
efficient (e.g., not polling runtime memory estimates after every tuple)
Any ideas?
Filling the heap to the brim might be a bad idea due to garbage collector thrashing. (As the memory gets nearly full, the efficiency of garbage collection approaches 0, because the effort for collection depends on heap size, but the amount of memory freed depends on the size of the objects identified as unreachable.)
However, if you must, can't you simply do it as follows?
for (;;) {
    long freeSpace = getFreeSpace();
    if (freeSpace < 1000000) break;
    while (freeSpace > 0) {
        treeSet.add(readRecord());
        freeSpace -= MAX_RECORD_SIZE;
    }
}
The calls to discover the free memory will be rare, so shouldn't tax performance much. For instance, if you have 1 GB heap space, and leave 1MB empty, and MAX_RECORD_SIZE is ten times average record size, getFreeSpace() will be invoked a mere log(1000) / -log(0.9) ~= 66 times.
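For completeness, getFreeSpace() is not a JDK method; a minimal sketch of it using the same Runtime-based estimate as the question:
static long getFreeSpace() {
    Runtime rt = Runtime.getRuntime();
    // free space in the current heap plus the room the heap can still grow into
    return rt.freeMemory() + (rt.maxMemory() - rt.totalMemory());
}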
Why bother with calculating how many items you can hold? How about letting Java tell you when you've used up all your memory, catching the error and continuing? For example,
// prepare output medium now so we don't need to worry about having enough
// memory once the treeset has been filled.
BufferedWriter writer = new BufferedWriter(new FileWriter("output"));

SortedSet<Tuple> set = new TreeSet<>(); // Tuple: whatever type parseTuple returns
int linesRead = 0;
{
    BufferedReader reader = new BufferedReader(new FileReader("input"));
    try {
        String line = reader.readLine();
        while (line != null) {
            set.add(parseTuple(line));
            linesRead += 1;
            line = reader.readLine();
        }
        // end of file reached
        linesRead = -1;
    } catch (OutOfMemoryError e) {
        // while loop broken
    } finally {
        reader.close();
    }
    // since reader and line were declared in a block their resources will
    // now be released
}
// output treeset to file
for (Tuple t : set) {
    writer.write(t.toString());
}
writer.close();
// use linesRead to find position in file for next pass
// or continue on to next file, depending on value of linesRead
If you still have trouble with memory, just make the reader's buffer extra large so as to reserve more memory.
The default size of the buffer in a BufferedReader is 8192 chars (16 KB). So when you finish reading you will release that memory. After this your additional memory needs will be minimal. You need enough memory to create an iterator for the set; let's be generous and assume 200 bytes. You will also need memory to store the string output of your tuples (but only temporarily). You say the tuples contain about 200 characters. Let's double that to take account of separators: 400 characters, which is 800 bytes. So all you really need is an additional 1k bytes, and you're fine, as you've just released far more than that.
The reason you don't need to worry about the memory used to store the string output of your tuples is because they are short lived and only referred to within the output for loop. Note that the Writer will copy the contents into its buffer and then discard the string. Thus, the next time the garbage collector runs the memory can be reclaimed.
I've checked, and an OOME in add will not leave a TreeSet in an inconsistent state: the memory allocation for a new Entry (the internal implementation for storing a key/value pair) happens before the internal representation is modified.
You can really fill the heap to the brim using direct memory writing (it does exist in Java!). It's in sun.misc.Unsafe, but isn't really recommended for use. See here for more details. I'd probably advise writing some JNI code instead, and using existing C++ algorithms.
I'll add this as an idea I was playing around with, involving using a SoftReference as a "sniffer" for low memory.
SoftReference<byte[]> sniffer = new SoftReference<>(new byte[8192]);
while (iter.hasNext()) {
    tuple = iter.next();
    treeset.add(tuple);
    if (sniffer.get() == null) { // the JVM cleared the soft reference: memory is getting low
        dump(treeset);
        treeset.clear();
        sniffer = new SoftReference<>(new byte[8192]);
    }
}
This might work well in theory, but I don't know the exact behaviour of SoftReference.
All soft references to softly-reachable objects are guaranteed to have been cleared before the virtual machine throws an OutOfMemoryError. Otherwise no constraints are placed upon the time at which a soft reference will be cleared or the order in which a set of such references to different objects will be cleared. Virtual machine implementations are, however, encouraged to bias against clearing recently-created or recently-used soft references.
Would like to hear feedback as it seems to me like an elegant solution, although behaviour might vary between VMs?
Testing on my laptop, I found that the soft reference is cleared infrequently, but sometimes it is cleared too early, so I'm thinking of combining it with meriton's answer:
SoftReference<byte[]> sniffer = new SoftReference<>(new byte[8192]);
while (iter.hasNext()) {
    tuple = iter.next();
    treeset.add(tuple);
    if (sniffer.get() == null) {
        free = MemoryManager.estimateFreeSpace();
        if (free < MIN_SAFE_MEMORY) {
            dump(treeset);
            treeset.clear();
            sniffer = new SoftReference<>(new byte[8192]);
        }
    }
}
Again, thoughts welcome!

Starting Size for an ArrayList

I want to use an ArrayList (or some other collection) like how I would use a standard array.
Specifically, I want it to start with an initial size (say, SIZE), and be able to set elements explicitly right off the bat,
e.g.
array[4] = "stuff";
could be written
array.set(4, "stuff");
However, the following code throws an IndexOutOfBoundsException:
ArrayList<Object> array = new ArrayList<Object>(SIZE);
array.set(4, "stuff"); //wah wahhh
I know there are a couple of ways to do this, but I was wondering if there was one that people like, or perhaps a better collection to use. Currently, I'm using code like the following:
ArrayList<Object> array = new ArrayList<Object>(SIZE);
for (int i = 0; i < SIZE; i++) {
    array.add(null);
}
array.set(4, "stuff"); //hooray...
The only reason I even ask is because I am doing this in a loop that could potentially run a bunch of times (tens of thousands). Given that the ArrayList resizing behavior is "not specified," I'd rather it not waste any time resizing itself, or memory on extra, unused spots in the Array that backs it. This may be a moot point, though, since I will be filling the array (almost always every cell in the array) entirely with calls to array.set(), and will never exceed the capacity?
I'd rather just use a normal array, but my specs are requiring me to use a Collection.
The initial capacity is how big the backing array is; it does not mean there are elements in the list. So size != capacity.
In fact, you can use an array, and then use Arrays.asList(array) to get a collection.
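A minimal sketch of that approach; Arrays.asList returns a fixed-size List view backed by the array, so set works right away while add and remove are unsupported:
String[] backing = new String[SIZE];        // SIZE slots, all initially null
List<String> list = Arrays.asList(backing);
list.set(4, "stuff");                       // fine: writes through to backing[4]
// list.add("more");                        // would throw UnsupportedOperationException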
I recommend a HashMap:
HashMap<Integer, String> hash = new HashMap<>();
hash.put(4, "Hi");
Considering that your main point is memory, you could manually do what the Java ArrayList does internally, since ArrayList itself doesn't let you control by how much it resizes. So you can do the following:
1) Create an array.
2) If the array is full, create a new array with the old array's size plus as much extra as you want.
3) Copy all items from the old array to the new array.
This way, you will not waste memory.
Or you can implement a List (not a vector) structure yourself; I think Java already has one.
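A minimal sketch of that manual-growth idea (GrowableStore and the growth step of 16 are illustrative choices, not from the answer):
class GrowableStore {
    private Object[] data = new Object[16];

    Object get(int index) {
        return data[index];
    }

    void set(int index, Object value) {
        if (index >= data.length) {
            // grow just enough (plus a fixed step) instead of relying on ArrayList's policy
            data = java.util.Arrays.copyOf(data, index + 16);
        }
        data[index] = value;
    }
}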
Yes, a HashMap would be a great idea.
Alternatively, you could just start the ArrayList with a capacity big enough for your purpose.
