Multiple threads modifying a collection in Java?

The project I am working on requires a whole bunch of queries against a database. In principle there are two types of queries I am using:
(I) read from an Excel file, check a couple of parameters and query the database for hits. These hits are then registered as a series of custom classes. Any hit may (and most likely will) occur more than once, so this part of the code checks and updates the occurrence count in a custom list implementation that extends ArrayList.
(II) for each hit found, do a detail query and parse the output, so that the classes created in (I) get detailed info.
I figured I would use multiple threads to optimize time-wise. However I can't really come up with a good way to solve the problem that occurs with the collection these items are stored in. To elaborate a little bit: throughout the execution, objects are supposed to be modified by both (I) and (II).
I deliberately didn't copy/paste any code, as it would take big chunks of code to make any sense. I hope the description above makes some sense.
Thanks,

In Java 5 and above, you may either use CopyOnWriteArrayList or a synchronized wrapper around your list. In earlier Java versions, only the latter choice is available. The same is true if you absolutely want to stick to the custom ArrayList implementation you mention.
CopyOnWriteArrayList is feasible if the container is read much more often than written (changed), which seems to be true based on your explanation. Its atomic addIfAbsent() method may even help simplify your code.
[Update] On second thought, a map sounds more fitting to the use case you describe. So if changing from a list to e.g. a map is an option, you should consider ConcurrentHashMap. [/Update]
Changing the objects within the container does not affect the container itself; however, you need to ensure that the objects themselves are thread-safe.
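For illustration, a minimal sketch of both options; hits are represented as plain Strings here as a stand-in for your custom classes:

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public class HitRegistry {
    // Option 1: copy-on-write list - cheap reads, each write copies the backing array
    private final CopyOnWriteArrayList<String> hits = new CopyOnWriteArrayList<>();

    // Option 2: synchronized wrapper around a plain (or custom) ArrayList
    private final List<String> syncHits = Collections.synchronizedList(new ArrayList<>());

    public void register(String hit) {
        // addIfAbsent() atomically adds the hit only if it is not present yet
        hits.addIfAbsent(hit);
    }

    public void registerSynchronized(String hit) {
        // the compound check-then-act must be guarded by the list's own lock
        synchronized (syncHits) {
            if (!syncHits.contains(hit)) {
                syncHits.add(hit);
            }
        }
    }
}

Note that iteration over the synchronized wrapper also has to be done inside a synchronized (syncHits) block, whereas CopyOnWriteArrayList iterators work on a snapshot and need no locking.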

Just use the new java.util.concurrent packages.
Classes like ConcurrentLinkedQueue and ConcurrentHashMap are already there for you to use and are all thread-safe.
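For example, if switching to a map is acceptable, a ConcurrentHashMap could track the occurrence counts from the question without any external locking. A minimal sketch (hitId is a hypothetical key; use whatever identifies a hit in your model):

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class HitCounter {
    // key: some unique identifier of the hit, value: how often it occurred
    private final ConcurrentMap<String, Integer> occurrences = new ConcurrentHashMap<>();

    public void recordHit(String hitId) {
        // merge() performs the read-modify-write atomically
        occurrences.merge(hitId, 1, Integer::sum);
    }

    public int occurrencesOf(String hitId) {
        return occurrences.getOrDefault(hitId, 0);
    }
}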

Duplicate planning entities in the solution

I'm new to OptaPlanner, and I'm trying to solve a quite simple problem (for now; I will add more constraints eventually).
My model is the following: I have tasks (MarkerNesting) that must run one at a time on a VirtualMachine; the goal is to assign a list of MarkerNestings to VirtualMachines, using all machines (we can assume that we have more tasks than machines as a first approximation). As a result, I expect each task to have a start and an end date (as shadow variables - not implemented yet).
I think I must use a chained variable, with the VirtualMachine being the anchor (chained through time pattern) - am I right?
So I wrote a program inspired by some of the examples (tsp, and coach and shuttle) with 4 machines and 4 tasks, and I expect each machine to have one task when it is solved. When running it, though, I get some strange results: not all machines are used, but worse, I have duplicate MarkerNesting instances (output example):
[VM 1/56861999]~~~>[Nesting(155/2143571436)/[Marker m4/60s]]~~~>[Nesting(816/767511741)/[Marker m2/300s]]~~~>[Nesting(816/418304857)/[Marker m2/300s]]~~~>[Nesting(980/1292472219)/[Marker m1/300s]]~~~>[Nesting(980/1926764753)/[Marker m1/300s]]
[VM 2/1376400422]~~~>[Nesting(155/1815546035)/[Marker m4/60s]]
[VM 3/1619356001]
[VM 4/802771878]~~~>[Nesting(111/548795052)/[Marker m3/180s]]
The instances are different (to read the log: [Nesting(id/hashcode)]), but they have the same id, so they are the same entity in the end. If I understand correctly, OptaPlanner clones the solution whenever it finds a new best one, but I don't know why it mixes instances like that.
Is there anything wrong in my code? Is it a normal behavior?
Thank you in advance!
Duplicate MarkerNesting instances that you didn't create, which have the same content but a different memory address (and so are != from each other): that means something went wrong in the default solution cloner, which is based on reflection. It's been a while since anyone ran into an issue there. See the docs section on "planning clone". The complex model of chained variables (which will be improved) doesn't help here at all.
Sometimes a well placed @DeepPlanningClone fixes it, but in this case it might as well be due to the @InverseRelationShadowVariable not being picked up.
In any case, those System.out calls in the setter method are misleading - they can be triggered both by the solution cloner and by the moves, so without the solution hash (= memory address) they tell you nothing. Try a similar System.out in either your best-solution-changed events, or in the BestSolutionRecaller call to cloneWorkingSolution(), for both the original and the clone.
As expected, I was doing something wrong: in Schedule (the PlanningSolution), I had a getter for a collection of VirtualMachine which was calculated from another field (pools: each Pool holds VirtualMachines). As a result, there was no setter, and the solution cloner was probably not able to clone the solution properly (maybe because pools was not annotated as a problem fact or a planning entity?).
To fix the problem, I removed the Pool class (not really needed), leaving a collection of VirtualMachines in Schedule.
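For illustration, a minimal sketch of the shape the solution class ended up with - the class and field names are guesses based on the description above, and the exact annotation placement depends on the OptaPlanner version and the rest of the domain model:

import java.util.List;
import org.optaplanner.core.api.domain.solution.PlanningEntityCollectionProperty;
import org.optaplanner.core.api.domain.solution.PlanningScore;
import org.optaplanner.core.api.domain.solution.PlanningSolution;
import org.optaplanner.core.api.domain.solution.ProblemFactCollectionProperty;
import org.optaplanner.core.api.domain.valuerange.ValueRangeProvider;
import org.optaplanner.core.api.score.buildin.hardsoft.HardSoftScore;

@PlanningSolution
public class Schedule {

    // A real field with getter AND setter, so the default solution cloner can copy it.
    @ProblemFactCollectionProperty
    @ValueRangeProvider(id = "virtualMachineRange")
    private List<VirtualMachine> virtualMachines;

    @PlanningEntityCollectionProperty
    @ValueRangeProvider(id = "nestingRange")
    private List<MarkerNesting> markerNestings;

    @PlanningScore
    private HardSoftScore score;

    public List<VirtualMachine> getVirtualMachines() { return virtualMachines; }
    public void setVirtualMachines(List<VirtualMachine> virtualMachines) { this.virtualMachines = virtualMachines; }

    public List<MarkerNesting> getMarkerNestings() { return markerNestings; }
    public void setMarkerNestings(List<MarkerNesting> markerNestings) { this.markerNestings = markerNestings; }

    public HardSoftScore getScore() { return score; }
    public void setScore(HardSoftScore score) { this.score = score; }
}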
To sum up, never introduce too many classes before you need them ^_^'
I pushed the correct version of my code on github.

combined vs. separate backend calls

I'm trying to figure out the best solution for a use case I'm working on, and I'd appreciate getting some architectural advice from you guys.
I have a use case where the frontend should display a list of users assigned to a task and a list of users who are not assigned but able to be assigned to the same task.
I don't know which is the better solution:
1. Have one backend call which collects both lists of users and sends them back to the frontend within a new data class containing both lists.
2. Have two backend calls, each of which collects one of the two lists and sends it back separately.
The first solution's pro is the single backend call whereas the second solution's pro is the reusability of the separate methods in the backend.
Any advice on which solution to prefer and why?
Is there any pattern or standard I should get familiar with?
When I come across a requirement to get data from a server, I start with a single call for, more or less (it depends on the problem domain), a single feature (which I would call your task-user list).
This approach saves implementation complexity on the client's side and saves protocol overhead for transactions (TCP header, etc.).
If performance analysis shows that the call is too slow because it requests too much data (user experience suffers) then I would go with your 2nd solution.
To sum up, I would start with the 1st approach and optimize (go with the more complex solution) only when it becomes necessary.
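For illustration, a minimal sketch of what the combined response could look like, with hypothetical names (TaskUsersResponse, User) standing in for your real classes:

import java.util.List;

// Hypothetical wrapper returned by the single combined endpoint:
// one round trip delivers both lists the frontend needs for the task view.
public class TaskUsersResponse {

    private final List<User> assignedUsers;
    private final List<User> assignableUsers;

    public TaskUsersResponse(List<User> assignedUsers, List<User> assignableUsers) {
        this.assignedUsers = assignedUsers;
        this.assignableUsers = assignableUsers;
    }

    public List<User> getAssignedUsers() { return assignedUsers; }
    public List<User> getAssignableUsers() { return assignableUsers; }
}

The two list-building methods can still live as separate, reusable methods in the backend; the wrapper only bundles their results for this particular view.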
I'd prefer the two calls because of the reusability. Maybe one day you need to add a third list of users for one use case; if there were only a single method, you would have to change it, and other use cases that only need the two lists would be affected as well. You would also need to change all of your test methods. As your project gets bigger, this makes it hard to update or fix, and all of those modifications increase the chance of introducing new bugs.
It helps to see the backend methods that are callable by the frontend as an interface.
In general, an interface should be open for extension but closed with respect to what its methods return and require, because otherwise a slight modification leads to many more modifications.

How can I compare 2 large objects running on separate JVMs?

I am looking at changing the way some large objects which maintain the data for a large website are reloaded, they contain data relating to catalogue structure, products etc and get reloaded daily.
After changing how they are reloaded I need to be able to see whether there is any difference in the resulting data so the intention is to reload both and compare the content.
There may be some issues (i.e. lists used where ordering is not important) that make the comparison harder, so I would need to be able to alter the structure before comparison. I have tried to serialise to JSON using Gson, but I run out of memory. I'm thinking of trying other serialisation methods or writing my own simple one.
I imagine this is something that other people will have wanted to do when changing critical things like this, but I haven't managed to find anything about it.
In this special case (separate VMs) I suggest adding something like a dump method to each class which writes the relevant content into a file (human readable text). This method calls dump on each aggregated object as well.
In the end you have two files, one from each VM, and then you can compare them, using an MD5 checksum for example.
This is probably a lot of work, but if you encounter any differences, you can use diff on both files, and this will be a great help.
You can start with a simple version, and refine it step-by-step by adding more output.
Adding (complete) serialization to a class later is cumbersome. There might be tools which simplify this (using reflection, etc.), but in my experience you have to tweak your classes: exclude fields which are not relevant, define a sort order for lists, handle cyclic relations, etc.
Actually I use a similar approach for the same reasons (to check whether a new version still returns the same result): The application contains multiple services (for each version), the results are always data transfer objects, serialization is added immediately to the DTOs, and DTOs must provide a comparison method dedicated for this purpose.
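A minimal sketch of that idea, assuming a hypothetical Catalogue/Product model standing in for the real classes - each dump is written in a deterministic order, and the resulting file is reduced to an MD5 checksum for a quick equality check:

import java.io.IOException;
import java.io.PrintWriter;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Comparator;
import java.util.List;

public class CatalogueDumper {

    // Catalogue and Product are placeholders for the real domain classes.
    // Writes a human-readable, deterministically ordered dump of the catalogue.
    public static void dump(Catalogue catalogue, Path target) throws IOException {
        try (PrintWriter out = new PrintWriter(Files.newBufferedWriter(target))) {
            out.println("catalogue.name=" + catalogue.getName());
            // Sort lists whose order is irrelevant so both dumps line up.
            List<Product> products = catalogue.getProducts();
            products.stream()
                    .sorted(Comparator.comparing(Product::getId))
                    .forEach(p -> out.println("product=" + p.getId() + ";" + p.getName()));
        }
    }

    // Computes an MD5 checksum of a dump file; if the checksums differ,
    // run a textual diff on the two files to find the actual differences.
    public static String md5Of(Path file) throws IOException, NoSuchAlgorithmException {
        byte[] content = Files.readAllBytes(file);
        byte[] digest = MessageDigest.getInstance("MD5").digest(content);
        StringBuilder hex = new StringBuilder();
        for (byte b : digest) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }
}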
Looking at the complications and memory issues, and also since you have mentioned you don't want to maintain versions, I would look at using a database for the comparison.
It will need some effort in terms of mapping your data in the JVM to DB tables, but once you have done that, it will be straightforward. You can dump the data from one large object into DB tables and then simply run a check against the 2nd object in the DB.
Creating a stored procedure can simplify things. This solution can support data checks from any number of JVMs.

Will I have a big performance penalty when I use Reflection api in Android application?

I want to implement a map of functions. I have several functions for inserting different data into an SQLite database (InsertUSER_Data(), InsertMessages(), and so on), and I want to create a Map and call a specific function by a key command. As I have seen while searching, there are two approaches for this: anonymous classes and reflection (e.g. here: How to call a method stored in a HashMap? (Java)). I really like the approach based on the Reflection API (via the Method type), i.e. Map<String, Method>, instead of just using anonymous classes which implement an interface. But I have doubts: is the performance overhead of this solution really big, especially if I use it on Android, or is it not really significant? A detailed explanation would be very helpful.
I cannot answer your exact question, but some basics could help:
The art of predicting performance beforehand is subtle, difficult, and dangerous. The best way is to measure the difference between the two solutions (which can be as simple as using System.currentTimeMillis()).
As a rule of thumb, the time spent "in memory" is often of little importance in terms of performance compared with other things like IO (file or database accesses), remote calls, or even UI operations.
Unless you intend to do millions (or billions) of calls, I sincerely doubt that such a small Map would be of any consequence. But again, don't take my word for it - test it.
Finally, couldn't your problem be solved simply by using interfaces? That would remove the issue entirely, and interfaces are often easier to write, read, and debug. Something like:
interface Actionable {
    Result doStuff(Param p);
}
Your various function classes could implement this interface, and you could then call them afterwards without knowing exactly what is behind each one.
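A minimal sketch of both variants, assuming a hypothetical DatabaseHelper with the insert methods from the question (variant 1 uses Runnable method references instead of the custom Actionable interface, purely to keep the sketch self-contained):

import java.lang.reflect.Method;
import java.util.HashMap;
import java.util.Map;

public class CommandRegistry {

    // DatabaseHelper is a placeholder for the class holding the real insert methods.
    public static class DatabaseHelper {
        public void insertUserData() { /* write user rows to SQLite */ }
        public void insertMessages() { /* write message rows to SQLite */ }
    }

    // Variant 1: interface-based dispatch - resolved at compile time, no reflection.
    private final Map<String, Runnable> commands = new HashMap<>();

    public CommandRegistry(DatabaseHelper db) {
        commands.put("insertUser", db::insertUserData);
        commands.put("insertMessages", db::insertMessages);
    }

    public void run(String key) {
        Runnable command = commands.get(key);
        if (command != null) {
            command.run();
        }
    }

    // Variant 2: reflection-based dispatch (Map<String, Method> style) -
    // the method is looked up by name and invoked at runtime.
    public static void runReflectively(DatabaseHelper db, String methodName) throws Exception {
        Method method = DatabaseHelper.class.getMethod(methodName);
        method.invoke(db);
    }
}

As suggested above, the honest way to settle the performance question is to time both variants with System.currentTimeMillis() over a realistic number of calls on an actual device.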

Differences between 2 objects with Java reflection: how?

For an audit log, I need to know the differences between 2 objects.
Those objects may contain other objects, lists, or sets of objects, so the comparison may need to be recursive if desired.
Is there already an API (using reflection or otherwise) for that?
Thanks in advance.
Regards
It's a pretty daunting problem to try and solve generically. You might consider pairing the Visitor pattern, which allows you to add functionality to a graph of objects, with the Chain of Responsibility pattern, which allows you to break the responsibility for executing a task out into multiple objects and then dynamically route requests to the right handler.
If you did this, you would be able to generate simple, specific differentiation logic on a per-type basis without having a single, massive class that handles all of your differentiation tasks. It would also be easy to add handlers to the tree.
The best part is that you can still have a link in your Chain of Responsibility for "flat" objects (objects that are not collections and basically only have properties), which is where reflection would help you the most anyway. If your "catch-all" case uses simple reflection-based comparison and your "special" cases handle things like lists, dictionaries, and sets, then you will have a flexible, maintainable, inexpensive solution.
For more info:
http://www.netobjectives.com/PatternRepository/index.php?title=TheChainOfResponsibilityPattern
http://www.netobjectives.com/PatternRepository/index.php?title=TheVisitorPattern
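A minimal sketch of that chain, with a reflection-based catch-all handler at the end (class names are hypothetical; real code would also need cycle detection and nicer reporting):

import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.Objects;

public class DiffChain {

    // A link in the chain: either handles the pair or lets the next handler try.
    public interface DiffHandler {
        boolean canHandle(Object left, Object right);
        List<String> diff(String path, Object left, Object right, DiffChain chain);
    }

    private final List<DiffHandler> handlers = new ArrayList<>();

    public DiffChain add(DiffHandler handler) {
        handlers.add(handler);
        return this;
    }

    public List<String> diff(String path, Object left, Object right) {
        for (DiffHandler handler : handlers) {
            if (handler.canHandle(left, right)) {
                return handler.diff(path, left, right, this);
            }
        }
        return Collections.emptyList();
    }

    // "Special" case: collections are compared element by element through the chain.
    public static class CollectionHandler implements DiffHandler {
        public boolean canHandle(Object left, Object right) {
            return left instanceof Collection && right instanceof Collection;
        }
        public List<String> diff(String path, Object left, Object right, DiffChain chain) {
            List<String> differences = new ArrayList<>();
            List<?> l = new ArrayList<>((Collection<?>) left);
            List<?> r = new ArrayList<>((Collection<?>) right);
            if (l.size() != r.size()) {
                differences.add(path + ": size " + l.size() + " vs " + r.size());
            }
            for (int i = 0; i < Math.min(l.size(), r.size()); i++) {
                differences.addAll(chain.diff(path + "[" + i + "]", l.get(i), r.get(i)));
            }
            return differences;
        }
    }

    // "Catch-all" case: flat objects are compared field by field via reflection.
    // Field values are compared shallowly here; a real implementation could
    // recurse through the chain for nested objects.
    public static class ReflectionHandler implements DiffHandler {
        public boolean canHandle(Object left, Object right) {
            return true;
        }
        public List<String> diff(String path, Object left, Object right, DiffChain chain) {
            List<String> differences = new ArrayList<>();
            if (left == null || right == null || left.getClass() != right.getClass()) {
                if (!Objects.equals(left, right)) {
                    differences.add(path + ": " + left + " vs " + right);
                }
                return differences;
            }
            for (Field field : left.getClass().getDeclaredFields()) {
                field.setAccessible(true);
                try {
                    Object lv = field.get(left);
                    Object rv = field.get(right);
                    if (!Objects.equals(lv, rv)) {
                        differences.add(path + "." + field.getName() + ": " + lv + " vs " + rv);
                    }
                } catch (IllegalAccessException e) {
                    differences.add(path + "." + field.getName() + ": <inaccessible>");
                }
            }
            return differences;
        }
    }
}

Usage would then look like new DiffChain().add(new CollectionHandler()).add(new ReflectionHandler()).diff("root", oldObject, newObject); adding a handler for maps or sets is just another link in the chain, placed before the catch-all.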
I have written a framework that does exactly what you were looking for. It generates a graph from any kind of object, no matter how deeply nested it is and allows you to traverse the changes with visitors. I have already done things like change logs generation, automatic merging and change visualization with it and so far it hasn't let me down.
I guess I'm a few years too late to help in your specific case, but for the sake of completion, here's the link to the project: https://github.com/SQiShER/java-object-diff
