We have a multi-page browser-based application in XPages. The main window contains 5 frames with different pages; a second window (and there may be more) contains documents, each with a different page layout. We used to run the application with "all pages in memory" (coming from R8.5.1 originally); it was lightning fast but hogged memory. We're now in the process of moving it to "one page in memory, the rest on disk".
I think the situation we have right now is this:
all pages share the same sessionScoped bean
page A is refreshed automatically: every minute an Ajax request is sent to fetch data
page B happens to be serializing a HashMap at the same time
the refresh of the first page changes the HashMap being serialized
The HashMap is an object inside the bean. Why is the bean serialized? I might be mistaken; it might just be a different HashMap that's being serialized...
Anyway, my question: how can I synchronize these actions? Is there some easy way?
PS I already tried with a ConcurrentHashMap, but I got some very weird results...
Thanks for anything helpful!
"Why is the bean serialized?" A sessionScoped bean would not be serialized by default. It can happen if you use a load-time binding that evaluates to the bean, like ${someBean}; or, if it's the HashMap that's being serialized, you may have referenced it in a load-time binding, like ${someBean.someHashMap} (where ${ denotes a load-time binding and #{ a runtime binding). The results of load-time bindings are saved in the control tree, which is serialized when you're saving server-side pages on disk. The solution there is to change those references to runtime bindings.
"how can I synchronize these actions"
There's a synchronized keyword in SSJS, see:
http://mattwhite.me/blog/2009/9/14/on-synchronization-in-xpages.html
but that can only protect the object from concurrent access in SSJS; the page-state serialization won't synchronize on the same object, so you'd still have to fix the page so the bean and its HashMap aren't serialized.
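To see why synchronizing only the SSJS side isn't enough, here's a minimal standalone Java sketch (the sizes and names are made up) of the underlying race: serializing a HashMap iterates over its entries, so a concurrent put from another thread can blow up or corrupt the written state.

import java.io.ByteArrayOutputStream;
import java.io.ObjectOutputStream;
import java.util.HashMap;

public class SerializationRaceDemo {
    public static void main(String[] args) throws Exception {
        HashMap<Integer, String> map = new HashMap<>();
        for (int i = 0; i < 100_000; i++) {
            map.put(i, "v" + i);
        }

        // Stands in for page A's periodic Ajax refresh mutating the shared map
        Thread refresher = new Thread(() -> {
            for (int i = 1; i <= 100_000; i++) {
                map.put(-i, "w" + i);
            }
        });
        refresher.start();

        // Stands in for page B's state being saved to disk: HashMap's
        // writeObject iterates the entries, so this is a race and can throw
        // a ConcurrentModificationException (though not on every run).
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(map);
        }
        refresher.join();
    }
}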
As always when encountering an error like this, you should ask yourself why, and read the documentation for the exception. A ConcurrentModificationException occurs because you modify a collection while iterating over it (e.g. while reading from it). For this exact reason Java ships with a collection implementation called CopyOnWriteArrayList, which allows writes while the list is being read. It does this by copying the underlying array whenever the list is written to. This is great when writes are less frequent than reads. Unfortunately, no such thing for Maps is built into the JDK.
My suggestion would be to encapsulate the map and implement a similar feature, so that a new map is created whenever new data arrives. This makes the map immutable for readers and hence removes the ConcurrentModificationException. Readers that arrive while new data is being written get the "old" data, but it performs much better than synchronizing all access.
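A rough sketch of that idea in plain Java (the class is mine, not from any library): writers build and publish a fresh map, so readers and serializers always see an immutable snapshot.

import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Copy-on-write map wrapper: every write replaces the whole map, so
// readers iterate a snapshot that can never change under them.
public class CopyOnWriteMap<K, V> {
    private volatile Map<K, V> snapshot = Collections.emptyMap();

    public V get(K key) {
        return snapshot.get(key); // lock-free read of the current snapshot
    }

    public Map<K, V> view() {
        return snapshot; // safe to iterate or serialize; it never mutates
    }

    public synchronized void put(K key, V value) {
        Map<K, V> copy = new HashMap<>(snapshot); // copy on every write
        copy.put(key, value);
        snapshot = Collections.unmodifiableMap(copy); // publish atomically
    }
}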
Hope this helps.
I am implementing an HTTP API using the Spring MVC framework.
I want to store some data between requests and between sessions. The data needs to be readable and modifiable by multiple requests in completely independent sessions, but it only needs to exist in memory while the application is running; it does not need to be persisted to a database, and it does not need to be shared across a scaled-up, multi-node, multi-process server backend. One store per Tomcat instance, say, is completely fine. Think of a cache, or something logging short-lived metrics about the application-specific data coming in through the requests.
I am assuming the usual way would be to use an in-memory database or something like Redis.
However, this being my first venture into web stuff, and coming from a C++ parallel-computing background, that seems like an extremely over-engineered and inefficient solution to me.
Can I not just create a singleton bean containing a ConcurrentHashMap of my required types, inject it as a dependency into my Controller, and be done with it? I never see anyone talk about this anywhere, even though it seems to be the simplest solution by far to me. Is there something about how Spring MVC or Tomcat works that makes this impossible?
Basically, yes. "A singleton ConcurrentHashMap" can be used as a cache.
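As a minimal sketch of that approach (the bean and controller names are invented for illustration):

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import org.springframework.stereotype.Component;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RestController;

// Spring beans are singletons by default, so this is one map per Tomcat instance.
@Component
public class MetricsStore {
    private final ConcurrentMap<String, Long> counts = new ConcurrentHashMap<>();

    public void increment(String key) {
        counts.merge(key, 1L, Long::sum); // atomic read-modify-write
    }

    public Long get(String key) {
        return counts.get(key);
    }
}

@RestController
class MetricsController {
    private final MetricsStore store;

    MetricsController(MetricsStore store) { // injected as a dependency
        this.store = store;
    }

    @PostMapping("/metrics/{key}")
    public void record(@PathVariable String key) {
        store.increment(key);
    }

    @GetMapping("/metrics/{key}")
    public Long read(@PathVariable String key) {
        return store.get(key);
    }
}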
But, I'd go with something that works like a map but has an API that is specifically tailored to caches. Fortunately, such a thing exists.
Guava is a 'general utilities' project (just a bunch of useful utility classes; lots of them now seem a bit pointless, in the sense that java.util and co. have them too, but Guava is over 10 years old, and everything it has didn't exist back then), and one of the most useful things it has is a 'Cache' class. It's a Map with bonus features.
I strongly suggest you use it and follow its API designs. It's got a few things that map doesn't have:
You can set up an eviction system; various strategies are available. You can allow k/v pairs to expire X milliseconds after being created, or optionally X milliseconds after the last time they were read. Or simply guarantee that the cache will never exceed some set size, removing the least recently accessed (or written - again, your choice) k/v pair if needed.
The obvious 'get a value' API call isn't .get() like with Map; it's a variant where you provide the key as well as a computation function that would calculate the value. The Cache object will just return the cached value if it exists; if not, it will run the computation, store the result in the cache, and return that. This makes your life a lot easier: you just call the get method, pass in the key and the computation, and continue, not having to care about whether the computation function was actually used or not (see the sketch after this list).
You get some control over concurrent calculations too - if 2 threads simultaneously end up wanting the value for key K which isn't in the cache, should both threads just go compute it, or should one thread be paused to wait for the other's calculation? That's also not entirely trivial to write in a ConcurrentHashMap.
Some fairly fancy footwork - weak keying/valuing: You can set things up such that if the key is garbage collected, the k/v pair gets evicted (eventually) too. This is tricky (string keys don't really work here, for example, and sometimes your value refers to your key in which case the existence of the value would mean your key can't be GCed, making this principle worthless - so you need to design your key and value classes carefully), but can be very powerful.
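A short sketch of the API described above (the key/value types and tuning values here are arbitrary):

import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;
import java.util.concurrent.TimeUnit;

public class CacheDemo {
    public static void main(String[] args) throws Exception {
        Cache<String, String> cache = CacheBuilder.newBuilder()
                .maximumSize(10_000)                     // cap the cache size
                .expireAfterAccess(10, TimeUnit.MINUTES) // evict idle entries
                .build();

        // get() takes the key plus a loader; the loader runs only on a miss,
        // and concurrent callers for the same key wait for one computation.
        String value = cache.get("some-key", () -> expensiveLookup("some-key"));
        System.out.println(value);
    }

    private static String expensiveLookup(String key) {
        return "value-for-" + key; // stand-in for the real computation
    }
}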
I believe you can also get just the guava cache stuff on its own, but if not - you know where to look: Add guava as a dependency to your project, fire up an instance of CacheBuilder, read the javadocs, and you're off :)
I am looking at the implementation of LogbackMDCAdapter, and it keeps track of lastOperation. I don't understand the reason for doing this; does anyone have an idea why it's done?
And why is duplicateAndInsertNewMap required?
Based on the comment here, the map copying is required for serialization purposes:
"Each time a value is added, a new instance of the map is created. This is to be certain that the serialization process will operate on the updated map and not send a reference to the old map, thus not allowing the remote logback component to see the latest changes."
This refers to the behaviour of ObjectOutputStream sending references to previously written objects instead of the full object, unless using the writeUnshared method.
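You can see that back-reference behaviour in a few lines of plain Java (nothing logback-specific here): the second writeObject of the same, since-modified map only emits a handle to the first write, so the receiver never sees the later change.

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.HashMap;
import java.util.Map;

public class SharedReferenceDemo {
    public static void main(String[] args) throws Exception {
        Map<String, String> map = new HashMap<>();
        map.put("a", "1");

        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(map); // full object written
            map.put("b", "2");    // mutate after the first write
            out.writeObject(map); // only a back-reference is written
        }

        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            System.out.println(in.readObject()); // {a=1}
            System.out.println(in.readObject()); // {a=1} -- "b" is missing
        }
    }
}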
It is not immediately obvious why it's safe to skip the copy unless there's a get/put combination; but apparently, even with multiple put operations in a row, serialization still works properly as long as the map is copied whenever a put/remove is performed right after a get. So tracking lastOperation is a performance optimization that avoids copying the map unnecessarily when putting several items into it in a row.
My web application is running in apache tomcat.
The classloader/component org.apache.catalina.loader.WebappClassLoader # 0x7a199fae8 occupies 1,708,632,104 (88.08%) bytes.
The memory is accumulated in one instance of java.util.concurrent.ConcurrentHashMap$Segment[] loaded by <system class loader>.
I got this while analyzing a heap dump. How can I analyze it further?
You provide very little information, so I can only provide very little advice… ;-)
First you need to find out who is using the largest objects (the HashMap in your case). Try to look at the contents of the HashMap so you can find out what it is used for. You should also look at where these objects are referenced.
Then you can try to limit its size. Depending on whether it is used by a framework or by your own code, this can be easy (e.g. a configuration change for a framework's cache), medium (e.g. you need to refactor your own code) or difficult (e.g. it is deeply buried in a library you have no control over).
Often the culprit is not the one you expect: just because an object instance (in your case the HashMap) accumulates a lot of memory does not mean the "owner" of that object is the root cause of the problem. You may well have to look some levels above or below in the object tree, or even in a completely different location. In most cases it is crucial that you know your application very well.
Update: You can inspect the contents of a HashMap by right-clicking it and selecting "Java Collections" > "Hash Entries". For general objects you can use "List objects" > "with incoming references" (to list all objects that reference the selected object) or "with outgoing references" (to list all objects that are referenced by the selected object).
Memory analysis is not an easy task and can require a lot of time, at least if you are not used to it…
If you need further assistance, you'll need to provide more details about your application, the frameworks you use, and what the heap looks like in MAT.
I'm running a multi-threaded Java application which gets requests to classify instances. To be able to run many threads concurrently, my application shares one Classifier object and one Instances object among the threads. The Instances object contains only attribute-related data and does not have any instances associated with it.
When my application gets a classification request, I create an Instance object with the request's attributes data and set the pre-generated Instances object as the dataset using Instance.setDataset(), e.g.:
myNewInstance.setDataset(sharedInstances);
Then myNewInstance is sent to the shared Classifier.
It seems to work well in most cases. However, sometimes when 2 concurrent requests occur, an exception is thrown from Classifier.distributionForInstance(). Unfortunately the error messages are not clear; these are the 2 different exceptions I see:
Caused by: java.lang.RuntimeException: Queue is empty
at weka.core.Queue.pop(Queue.java:194)
at weka.filters.Filter.output(Filter.java:563)
at weka.filters.unsupervised.attribute.PrincipalComponents.convertInstance(PrincipalComponents.java:626)
at weka.filters.unsupervised.attribute.PrincipalComponents.input(PrincipalComponents.java:812)
at weka.classifiers.meta.RotationForest.convertInstance(RotationForest.java:1114)
at weka.classifiers.meta.RotationForest.distributionForInstance(RotationForest.java:1147)
Caused by: java.lang.NullPointerException
at weka.filters.unsupervised.attribute.Standardize.convertInstance(Standardize.java:238)
at weka.filters.unsupervised.attribute.Standardize.input(Standardize.java:142)
at weka.filters.unsupervised.attribute.PrincipalComponents.convertInstance(PrincipalComponents.java:635)
at weka.filters.unsupervised.attribute.PrincipalComponents.input(PrincipalComponents.java:812)
at weka.classifiers.meta.RotationForest.convertInstance(RotationForest.java:1114)
at weka.classifiers.meta.RotationForest.distributionForInstance(RotationForest.java:1147)
As you can see, when the latter happens it comes with an empty message string.
To my understanding I can't make the objects immutable, and I'd rather not wrap this part in a critical section, so as to get the most out of the concurrency.
I've tried creating a different Instances object per classification request using the constructor Instances(Instances dataset); however, it did not yield different results. Using a different Classifier per thread is not an option, since it takes too much time to construct the object and the application needs to respond fast (10 to 20 milliseconds at most), and to my understanding the problem does not lie there anyway.
I assume that the problem comes from using the same Instances object. Based on the documentation of Instances, that constructor only copies references to the header information, which explains why the problem was not solved by creating another object. Is there a way to create a completely independent Instances object based on a previous one, without iterating over all attributes at request time?
Any other performance-oriented solution will also be highly appreciated.
thanks!
Probably you have solved this issue by now. This is just for those who face a similar issue. I was testing instances in a multi-threaded Java application, and I faced the same exception. To break down the problem, there are two issues in your case:
The first one is that you are using the same Instances object on which you set the data for each request. With this, you will most probably run into concurrency issues that might not break the code but will yield wrong results, because data from different requests could get mixed up. Your best bet is to create a new Instances object for each request. However, this is not what is producing the exception you are facing; that's the second issue.
The second issue is that you are using the same Classifier, and this is what is producing the exception. In my case I had built the classifiers in advance, serialized them and written them to a file. Whenever I needed to classify a test set, I deserialized the object in each thread, giving me a new instance. A cleaner way to achieve the same thing is the static method weka.classifiers.Classifier.makeCopy(model), which makes a copy for each request (internally it uses serialization).
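A sketch of that approach, using the Weka 3.6-era API (sharedModel, sharedHeader and attrValues stand in for the question's pre-built objects):

import weka.classifiers.Classifier;
import weka.core.Instance;
import weka.core.Instances;

public class ClassificationTask implements Runnable {
    private final Classifier sharedModel; // built once, never mutated here
    private final Instances sharedHeader; // attribute info only, no rows
    private final double[] attrValues;    // this request's attribute data

    public ClassificationTask(Classifier model, Instances header, double[] values) {
        this.sharedModel = model;
        this.sharedHeader = header;
        this.attrValues = values;
    }

    @Override
    public void run() {
        try {
            // Deep copy via serialization, so concurrent requests never
            // share the classifier's internal filter state.
            Classifier copy = Classifier.makeCopy(sharedModel);

            // A fresh header per request avoids sharing the Instances object.
            Instance instance = new Instance(1.0, attrValues);
            instance.setDataset(new Instances(sharedHeader, 0));

            double[] distribution = copy.distributionForInstance(instance);
            System.out.println(java.util.Arrays.toString(distribution));
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}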
I'm running Coldfusion8/MySQL 5.0.88.
My application's main feature is a search function which, on submit, triggers an AJAX request calling a cfc method. The method assembles the HTML, gzips it, and returns the gzipped HTML as the Ajax response.
This is the gzip part:
<cfscript>
    var result = "";
    // wrap the incoming string so we can get at its raw bytes
    var text = createObject("java", "java.lang.String").init(arguments[1]);
    // in-memory buffer that receives the compressed bytes
    var dataStream = createObject("java", "java.io.ByteArrayOutputStream").init();
    // GZIP stream writing into the buffer
    var compressDataStream = createObject("java", "java.util.zip.GZIPOutputStream").init(dataStream);
    compressDataStream.write(text.getBytes());
    compressDataStream.finish();
    compressDataStream.close();
    // grab the gzipped bytes from the buffer
    result = dataStream.toByteArray();
</cfscript>
I am a little reluctant about the use of createObject here, especially since this script will be called over and over again by every user.
Question:
Would it increase performance if I created the objects at application or session level, or at least checked for the existence of the objects before re-creating them? What's the best way to handle this?
If your use of objects is like what's in the code snippet in the question, I'd not put anything into any scope longer-lived than request. The reasons being:
The objects you are instantiating are not re-usable (Strings are immutable, and the output streams don't look re-usable either)
Even if they were re-usable, the objects in question aren't thread-safe. They can't be shared between concurrent requests, so the application scope isn't appropriate, and the session scope probably isn't safe either, as concurrent requests for the same session can easily occur.
The objects you're using there are probably very cheap to create, so there'd be little benefit in trying to cache them even if you could.
If you have objects that are really resource intensive, then caching and pooling them can make sense (e.g. Database Connections), but it's considerable effort to get right, so you need to be sure that you need it first.