Why does LogbackMDC keep track of the last operation? - java

I am looking at the implementation of LogbackMDCAdapter, and it keeps track of lastOperation. I don't understand the reason for doing this; does anyone have an idea why?
Also, why is duplicateAndInsertNewMap required?

Based on the comment here, the map copying is required for serialization purposes:
Each time a value is added, a new instance of the map is created. This
is to be certain that the serialization process will operate on the
updated map and not send a reference to the old map, thus not allowing
the remote logback component to see the latest changes.
This refers to the behaviour of ObjectOutputStream, which sends references to previously written objects instead of the full object unless the writeUnshared method is used.
It is not immediately obvious why copying can be skipped unless there is a get/put combination, but even multiple consecutive put operations serialize correctly, as long as the map is copied whenever a put/remove is performed right after a get. So this is a performance optimization: it avoids copying the map unnecessarily when several items are put in a row.
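The optimization can be sketched roughly like this (a simplified illustration of the idea, not the actual Logback source; the class and constant names here are abbreviated):

```java
import java.util.HashMap;
import java.util.Map;

// Simplified sketch: the map is copied only when a write follows a read,
// so consecutive writes can safely mutate the same freshly copied map.
public class CopyOnReadWriteMdc {
    private static final int WRITE_OPERATION = 1;
    private static final int READ_OPERATION = 2;

    private Map<String, String> map = new HashMap<>();
    private int lastOperation = READ_OPERATION;

    private boolean wasLastOpRead() {
        return lastOperation == READ_OPERATION;
    }

    public void put(String key, String val) {
        if (wasLastOpRead()) {
            // A reader (e.g. the serializer) may still hold the old map,
            // so create a fresh copy instead of mutating it in place.
            Map<String, String> newMap = new HashMap<>(map);
            newMap.put(key, val);
            map = newMap; // publish the new map
        } else {
            // The last operation was also a write: no reader can have
            // obtained this map since the previous copy, so mutate it.
            map.put(key, val);
        }
        lastOperation = WRITE_OPERATION;
    }

    public String get(String key) {
        lastOperation = READ_OPERATION;
        return map.get(key);
    }
}
```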

Related

Implementing shared data structure (e.g. cache) as a concurrentHashMap in a singleton bean

I am implementing an HTTP API using the Spring MVC framework.
I want to store some data between requests and between sessions. The data needs to be readable and modifiable by multiple requests in completely independent sessions, but it only needs to exist in-memory while the application is running, it does not need to be persisted to a database, and it does not need to be shared between any scaled-up, multi-node, multi-process server backend design, just one per (e.g.) Tomcat instance is completely fine. Consider for example a cache or something logging some short-lived metrics about the application-specific data coming in through the requests.
I am assuming the usual way would be to use an in-memory database or something like Redis.
However, this being my first venture into web development, and coming from a C++ parallel-computing background, this seems like an extremely over-engineered and inefficient solution to me.
Can I not just create a singleton bean containing a ConcurrentHashMap of my required types, inject it as a dependency into my Controller, and be done with it? I never see anyone talk about this anywhere, even though it seems to be the simplest solution by far to me. Is there something about how Spring MVC or Tomcat works that makes this impossible?
Basically, yes. "A singleton ConcurrentHashMap" can be used as a cache.
But, I'd go with something that works like a map but has an API that is specifically tailored to caches. Fortunately, such a thing exists.
Guava is a 'general utilities' project (a bunch of useful utility classes; many of them now seem a bit pointless because java.util and co. have equivalents, but Guava is over 10 years old and predates them), and one of the most useful things it has is a 'Cache' class. It's a Map with bonus features.
I strongly suggest you use it and follow its API designs. It's got a few things that map doesn't have:
You can set up an eviction system; various strategies are available. You can allow k/v pairs to expire X milliseconds after being created, or optionally X milliseconds after the last time they were read. Or simply guarantee that the cache will never exceed some set size, removing the least recently accessed (or written - again, your choice) k/v pair if needed.
The obvious 'get a value' API call isn't .get() like with Map; it's a variant where you provide the key together with a computation function that calculates the value. The Cache object returns the cached value if it exists; if not, it runs the computation, stores the result in the cache, and returns it. This makes your life a lot easier: you just call the get method with the key and the computation function, and you don't have to care about whether the computation function was actually invoked.
You get some control over concurrent calculations too - if 2 threads simultaneously end up wanting the value for key K which isn't in the cache, should both threads just go compute it, or should one thread be paused to wait for the other's calculation? That's also not entirely trivial to write in a ConcurrentHashMap.
Some fairly fancy footwork - weak keying/valuing: You can set things up such that if the key is garbage collected, the k/v pair gets evicted (eventually) too. This is tricky (string keys don't really work here, for example, and sometimes your value refers to your key in which case the existence of the value would mean your key can't be GCed, making this principle worthless - so you need to design your key and value classes carefully), but can be very powerful.
I believe you can also get just the Guava cache classes on their own, but if not, you know where to look: add Guava as a dependency to your project, fire up an instance of CacheBuilder, read the javadocs, and you're off :)
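If you'd rather stay on the plain JDK for now, the compute-if-absent behaviour described above (minus the eviction policies) can be approximated with ConcurrentHashMap.computeIfAbsent; a minimal sketch, with an illustrative placeholder computation:

```java
import java.util.concurrent.ConcurrentHashMap;

// Plain-JDK approximation of the "get with a loader" pattern:
// computeIfAbsent runs the mapping function at most once per absent key,
// and concurrent callers for the same key block until it completes.
// (Guava's Cache adds eviction and expiry policies on top of this.)
public class SimpleCache {
    private final ConcurrentHashMap<String, String> map = new ConcurrentHashMap<>();

    public String get(String key) {
        // Returns the cached value, or computes, stores and returns it.
        return map.computeIfAbsent(key, k -> expensiveComputation(k));
    }

    private String expensiveComputation(String key) {
        return key.toUpperCase(); // placeholder for a real computation
    }
}
```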

java.util.ConcurrentModificationException in XPages

We have this multi-page browser-based application in XPages. The main window contains 5 frames with different pages, a 2nd (and there may be more) contain documents, all with a different page layout. We used to run the application with "all pages in memory" (coming from R8.5.1 originally), it was lightning fast but hogging memory. We're now in the process of moving it to "one page in memory, the rest on disk".
I think the situation we have right now is this:
all pages share the same sessionScoped bean
page A is refreshed automatically: every minute an Ajax request is sent to fetch data
page B happens to be serializing a HashMap at the same time
the refresh of the first page changes the HashMap being serialized
The HashMap is an object inside the bean. Why is the bean serialized? I might be mistaken, it might just be a different HashMap that's being serialized...
Anyway, my question: how can I synchronize these actions, is there some easy way?
PS I already tried with a ConcurrentHashMap, but I got some very weird results...
Thanks for anything helpful!
"Why is the bean serialized?" A sessionScoped bean would not be serialized by default. It can happen if you use a load-time binding that evaluates to the bean, like ${someBean}; if it's the HashMap that is being serialized, you might have referenced it in a load-time binding like ${someBean.someHashMap} (where ${ denotes load-time bindings and #{ denotes runtime bindings). The results of load-time bindings are saved in the control tree, which is serialized when server-side pages are saved to disk. The solution is to change those references to runtime bindings.
"how can I synchronize these actions"
There's a synchronized keyword in SSJS, see:
http://mattwhite.me/blog/2009/9/14/on-synchronization-in-xpages.html
but that only protects the object from concurrent access in SSJS; the page-state serialization won't synchronize on the same object, so you'd still have to fix it so the bean and HashMap are not serialized.
As always when encountering an error like this, you should ask yourself why, and read the documentation for the exception. This exception occurs because you modify a collection while iterating over it (e.g. while reading from it). For this exact reason, Java ships with a collection implementation called CopyOnWriteArrayList, which allows writes while reading. It does this by making a fresh copy of the underlying array whenever the list is written. This is great when writes are less frequent than reads. Unfortunately, there is no such map implementation built into the JDK.
My suggestion would be to encapsulate the map and implement a similar feature, so that a new map is created whenever new data arrives. This makes the map immutable for readers and hence removes the ConcurrentModificationException. Readers that access the map while new data arrives get the "old" data, but this performs much better than synchronizing all access.
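A minimal sketch of that encapsulation (names are illustrative): writers replace the whole map with a fresh copy, so readers and the page-state serializer always iterate an immutable snapshot and can never hit a ConcurrentModificationException:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Copy-on-write map wrapper: every write publishes a brand-new
// immutable snapshot, so readers never observe a mid-flight mutation.
public class CopyOnWriteMap<K, V> {
    private volatile Map<K, V> snapshot = Collections.emptyMap();

    public synchronized void put(K key, V value) {
        Map<K, V> copy = new HashMap<>(snapshot); // copy current contents
        copy.put(key, value);
        snapshot = Collections.unmodifiableMap(copy); // publish new snapshot
    }

    public V get(K key) {
        return snapshot.get(key); // lock-free read of the current snapshot
    }

    public Map<K, V> view() {
        return snapshot; // safe to iterate/serialize: it never changes
    }
}
```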
Hope this helps.

Can someone explain to me when it is useful to use MapMaker or WeakHashMaps?

I have read many people really like the MapMaker of Google Guava (Collections), however I cannot see any good uses of it.
I have read the javadoc, and it says that it behaves like ConcurrentHashMap. It also says new MapMaker().weakKeys().makeMap() can almost always be used as a drop-in replacement for WeakHashMap.
However, reading the javadocs of ConcurrentHashMap and WeakHashMap makes me wonder when it is useful to use it? It seems to me that you cannot have a guarantee that whatever you put in the map will be there, or have I misunderstood?
The thing about MapMaker is that there are many options for the kind of map you build, which enables those maps to serve many purposes.
Dirk gives a good example of a use for weak keys.
Soft values are useful for caching, as you can cache values in the map without worrying about running out of memory since the system is free to evict entries from the cache if it needs memory.
You can choose to have entries expire after a certain amount of time. This is also useful for caching, since you may want certain data cached for a specific period of time before doing an expensive operation to update it.
One of my favorite things is making a computing map. A computing map uses a Function<K, V> to automatically retrieve the value associated with a given key if it isn't already in the map. This combines well with soft values and/or expiration times. After an entry is evicted by the map (due to memory demand or expiration), the next time the value associated with that key is requested it will automatically be retrieved and cached in the map once more.
...and that's somewhat the point of it. Weak references are useful if you don't want to (or cannot afford to) retain an object indefinitely in memory. Consider the following use case: you need to associate information with classes. Since you are running in an environment where classes might get reloaded (say, a Tomcat or OSGi environment), you want the garbage collector to be able to reclaim old versions of a class as soon as it deems it safe to do so.
An initial attempt to implement this, might look like
class ClassAssoc {
    private final IdentityHashMap<Class<?>, MyMetaData> cache = new IdentityHashMap<>();
}
The problem here is that this would keep all classes in the cache member forever (or at least until they are manually removed), forcing the garbage collector to retain them indefinitely, including everything referenced from the class (static member values, class-loader information, ...).
By using weak references, the garbage collector can reclaim old versions of the class as soon as no other references to them (usually instances) exist. On the other hand, as long as such references exist, the key is guaranteed to still be reachable through the weak reference object, and thus remains a valid key in the cache table.
Add concurrency and other atrocities to the picture, and you are at what MapMaker optionally also provides...
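A plain-JDK sketch of the weak-keyed variant of the ClassAssoc example above, using WeakHashMap (MyMetaData is a placeholder type; the real metadata class would hold whatever you need to associate):

```java
import java.util.Collections;
import java.util.Map;
import java.util.WeakHashMap;

// With a WeakHashMap, the Class key does not prevent class unloading:
// once a class becomes otherwise unreachable, its entry is eligible
// for removal on a later GC cycle.
class MyMetaData {
    final String info;
    MyMetaData(String info) { this.info = info; }
}

class ClassAssoc {
    // synchronizedMap gives basic thread safety; Guava's MapMaker
    // (weakKeys + concurrency level) covers the same ground and more.
    private final Map<Class<?>, MyMetaData> cache =
            Collections.synchronizedMap(new WeakHashMap<>());

    MyMetaData metaFor(Class<?> clazz) {
        // Compute the metadata lazily, at most once per live class.
        return cache.computeIfAbsent(clazz, c -> new MyMetaData(c.getName()));
    }
}
```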
A WeakHashMap entry will be kept in the map while someone (other than the map) references the key. If nobody other than the map holds a reference to the key, then the entry can be removed on a subsequent GC run.
ConcurrentHashMap is a Map that may safely be used in a multi-threaded environment. It is better than a synchronized wrapper around a regular Map because concurrency means that different threads can often access the map without blocking.

Multiple threads modifying a collection in Java?

The project I am working on requires a whole bunch of queries towards a database. In principle there are two types of queries I am using:
read from an Excel file, check a couple of parameters, and query the database for hits. These hits are then registered as a series of custom classes. Any hit may (and most likely will) occur more than once, so this part of the code checks and updates the occurrence count in a custom list implementation that extends ArrayList.
for each hit found, do a detail query and parse the output, so that the classes created in (I) get detailed info.
I figured I would use multiple threads to save time. However, I can't really come up with a good way to solve the problem that occurs with the collection these items are stored in. To elaborate a little: throughout the execution, objects are supposed to be modified by both (I) and (II).
I deliberately didn't copy/paste any code, as it would take big chunks of code to make any sense. I hope it makes some sense with the description above.
Thanks,
In Java 5 and above, you may either use CopyOnWriteArrayList or a synchronized wrapper around your list. In earlier Java versions, only the latter choice is available. The same is true if you absolutely want to stick to the custom ArrayList implementation you mention.
CopyOnWriteArrayList is feasible if the container is read much more often than written (changed), which seems to be true based on your explanation. Its atomic addIfAbsent() method may even help simplify your code.
[Update] On second thought, a map sounds more fitting to the use case you describe. So if changing from a list to e.g. a map is an option, you should consider ConcurrentHashMap. [/Update]
Changing the objects within the container does not affect the container itself, however you need to ensure that the objects themselves are thread-safe.
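Assuming the hit registry can switch to CopyOnWriteArrayList, a minimal sketch of the addIfAbsent() suggestion (Hit is a stand-in for the custom class in the question):

```java
import java.util.concurrent.CopyOnWriteArrayList;

// addIfAbsent atomically adds an element only if no equal element is
// already present, so two threads registering the same hit cannot
// create a duplicate entry.
class Hit {
    final String id;
    Hit(String id) { this.id = id; }

    @Override public boolean equals(Object o) {
        return o instanceof Hit && ((Hit) o).id.equals(id);
    }
    @Override public int hashCode() { return id.hashCode(); }
}

class HitRegistry {
    private final CopyOnWriteArrayList<Hit> hits = new CopyOnWriteArrayList<>();

    boolean register(Hit hit) {
        // true only for the first thread to add this hit;
        // concurrent duplicates are rejected atomically.
        return hits.addIfAbsent(hit);
    }

    int size() { return hits.size(); }
}
```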
Just use the new java.util.concurrent packages.
Classes like ConcurrentLinkedQueue and ConcurrentHashMap are already there for you to use and are all thread-safe.

Optimal diff between object lists in Java

I have a List of Java objects on my server which is sent to the client through some serialization mechanism. Once in a while the List of objects gets updated on the server, that is, some objects get added, some get deleted and others just change their place in the List. I want to update the List on the client side as well, but send the least possible data. Especially, I don't want to resend Objects which are already available on the client.
Is there a library available which will produce some sort of diff from the two lists, so that I can only send the difference and the new Objects accross the wire?
I have found several Java implementations of the Unix diff command, but this algorithm is impractical for order changes, i.e. [A,B,C] -> [C,B,A] could be sent as just place changes [1->3] [3->1], while diff would want to resend the whole A and C objects (as far as I understand).
I would do this by making the public interface of the objects, wherever they are modified, silently keep a log of changes made; that is, add an object representing each modification to a list of modifications.
That way you have a minimal list of the exact changes to send to the other machine, rather than needing to infer them using fallible guesswork by comparing old versus new.
To create the object model so that it automatically records changes to itself, you will likely benefit from some code generation or AOP to avoid a lot of repetitive patterns. Methods that set the value of a property, or add/remove from lists, all need to call into a central log shared by the object hierarchy.
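A minimal sketch of the change-log idea (all names here are hypothetical; real code would likely generate this plumbing rather than write it by hand):

```java
import java.util.ArrayList;
import java.util.List;

// Each mutating method records a small ChangeEvent, so the server can
// ship the exact operations instead of diffing old vs. new lists.
class ChangeEvent {
    final String kind;    // "add", "remove", ...
    final int index;      // position the change applies to
    final Object payload; // e.g. the new object for an "add"
    ChangeEvent(String kind, int index, Object payload) {
        this.kind = kind; this.index = index; this.payload = payload;
    }
}

class TrackedList<T> {
    private final List<T> items = new ArrayList<>();
    private final List<ChangeEvent> log = new ArrayList<>();

    void add(T item) {
        items.add(item);
        log.add(new ChangeEvent("add", items.size() - 1, item));
    }

    T remove(int index) {
        T removed = items.remove(index);
        log.add(new ChangeEvent("remove", index, null));
        return removed;
    }

    // Drain the accumulated changes to send over the wire.
    List<ChangeEvent> drainLog() {
        List<ChangeEvent> out = new ArrayList<>(log);
        log.clear();
        return out;
    }
}
```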
You can "pretend" that your list is a string, and use Damerau–Levenshtein distance to find the minimum operations necessary to transform one to another, allowing insertion, deletion, substitution, and transposition (which is what your example suggests).
I'm not aware of a mature and/or stable implementation, and even if one exists, it's likely targeted for strings, so adapting to a list of abstract value types would be a challenge. Implementing your own is also likely to be a challenging task, but it's certainly doable.
The JaVers library (http://javers.org) does the job:
Diff diff = javers.compare(list1, list2);
The Diff contains a list of changes such as object-added, object-removed, and index-changed.
For now I'll just send the complete List over the wire but instead of the objects, I use only a unique ID. If the client does not have the object locally, it requests it using the ID.
This is certainly less beautiful than an optimal algorithm but has the expected result: expensive objects are only sent once over the wire.
