It seems more efficient to store events on the dispatcher if they're never going to change. Isn't creating them costly? Is there some kind of information I'm losing if I dispatch an already-stored event?
Essentially what you're asking about is a specific use case of object pooling, and the performance of object pooling vs. object creation. Languages like Java don't get much benefit from object pooling because the JVM effectively does this for you, and does it better than you would on your own. In fact, Java engineers have made it very clear you shouldn't do this, because the JVM allocates, scales, and handles thousands of objects better than you can; that's the whole point of GC. Object allocation in Java already amounts to pooling, just at a lower level, which is why memory allocation in Java and other managed languages is so much faster than in C. Every object allocated in Java comes from a ready-made memory pool maintained by the JVM.
http://www.ibm.com/developerworks/java/library/j-jtp09275.html?ca=dgr-jw22JavaUrbanLegends
Another reason events are created at dispatch time, instead of caching them, is they carry parameters in them that change between each dispatch. For example, what character was typed, where the mouse was clicked, etc. Recycling events might sound like a good idea until you do it, and all of a sudden you're sending old information with the wrong event. Better to just new that event up each time and have it properly initialize itself with the right data. It's simpler for the author and less error prone.
There are also technical problems you might encounter if you reused events. Event cancellation schemes usually store that information in the event itself, where it is modified after dispatch by some listener. In ActionScript you can call event.preventDefault() to affect the chaining of event listeners. What happens if you start reusing that event after preventDefault() has been called? And when is it safe to say the event is no longer in use by a listener that has yet to fire (callLater/invokeLater makes this hard)? There's no callback in either Java or ActionScript that says an event is OK to reuse (no reclamation semantics to return the object to the pool).
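To make the pitfall concrete, here is a minimal hypothetical sketch (the classes and fields are invented for illustration, not taken from any real event API) of how a recycled event leaks stale state between dispatches:

class KeyEvent {
    char keyChar;             // per-dispatch payload
    boolean defaultPrevented; // cancellation flag set by listeners

    void preventDefault() { defaultPrevented = true; }
}

class Dispatcher {
    private final KeyEvent cached = new KeyEvent(); // the "optimization"

    KeyEvent nextKeyEvent(char typed) {
        cached.keyChar = typed;
        // Bug: defaultPrevented still carries whatever a listener set during
        // the previous dispatch; a fresh `new KeyEvent()` would start clean.
        return cached;
    }
}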
That's not to say you couldn't find a case where holding onto the event and reusing it performs better. But just because it's faster in those high-performance cases doesn't mean it's a good idea for everything, every time. Remember: don't optimize until you know there's a problem.
I have an Actor that - in its very essence - maintains a list of objects. It has three basic operations, an add, update and a remove (where sometimes the remove is called from the add method, but that aside), and works with a single collection. Obviously, that backing list is accessed concurrently, with add and remove calls interleaving each other constantly.
My first version used a ListBuffer, but I read somewhere it's not meant for concurrent access. I haven't gotten concurrent access exceptions, but I did note that finding & removing objects from it does not always work, possibly due to concurrency.
I was halfway rewriting it to use a var List, but removing items from Scala's default immutable List is a bit of a pain - and I doubt it's suitable for concurrent access.
So, basic question: What collection type should I use in a concurrent access situation, and how is it used?
(Perhaps secondary: Is an Actor actually a multithreaded entity, or is that a misconception on my part, and does it process messages one at a time on a single thread?)
(Tertiary: In Scala, what collection type is best for inserts and random access (delete / update)?)
Edit: To the kind responders: Excuse my late reply, I'm making a nasty habit out of dumping a question on SO or mailing lists, then moving on to the next problem, forgetting the original one for the moment.
Take a look at the scala.collection.mutable.Synchronized* traits/classes.
The idea is that you mixin the Synchronized traits into regular mutable collections to get synchronized versions of them.
For example:
import scala.collection.mutable._

// Mixing the Synchronized* traits into a regular mutable collection
// wraps its methods in synchronized blocks:
val syncSet = new HashSet[Int] with SynchronizedSet[Int]
val syncArray = new ArrayBuffer[Int] with SynchronizedBuffer[Int]
You don't need to synchronize the actor's state yourself. The whole point of actors is to avoid tricky, error-prone, and hard-to-debug concurrent programming.
The actor model ensures that an actor consumes its messages one by one, and that you will never have two threads consuming messages for the same Actor.
Scala's immutable collections are suitable for concurrent usage.
As for actors, a couple of things are guaranteed, as explained in the Akka documentation:
the actor send rule: the send of a message to an actor happens-before the receive of that message by the same actor.
the actor subsequent processing rule: the processing of one message happens-before the processing of the next message by the same actor.
You are not guaranteed that the same thread processes the next message, but you are guaranteed that the current message will finish processing before the next one starts, and also that at any given time, only one thread is executing the receive method.
So that takes care of a given Actor's persistent state. With regard to shared data, the best approach as I understand it is to use immutable data structures and lean on the Actor model as much as possible. That is, "do not communicate by sharing memory; share memory by communicating."
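If it helps to see that guarantee in plain Java terms, here is a rough analogy (a sketch only, not how any actor library is actually implemented): a single-thread executor serializes all "messages", so the state needs no synchronization even though sends come from many threads.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class CounterActor {
    // the "mailbox": tasks run one at a time, in submission order
    private final ExecutorService mailbox = Executors.newSingleThreadExecutor();
    private int count = 0; // only ever touched by the mailbox's single thread

    void send(int delta) {
        mailbox.submit(() -> { count += delta; }); // one "message" at a time
    }
}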
What collection type should I use in a concurrent access situation, and how is it used?
See #hbatista's answer.
Is an Actor actually a multithreaded entity, or is that just my wrong conception and does it process messages one at a time in a single thread?
The second (though the thread on which messages are processed may change, so don't store anything in thread-local data). That's how the actor can maintain invariants on its state.
In this amazing book, the author Josh Bloch mentions:
"Oh, and one more thing: there is a severe performance penalty for using finalizers. On my machine, the time to create and destroy a simple object is about 5.6 ns. Adding a finalizer increases the time to 2,400 ns. In other words, it is about 430 times slower to create and destroy objects with finalizers."
Is there a way we can delete an object in Java?
I thought we could simply let objects fall out of scope or reset their references to null. I intend to experiment with this on my machine; it seems like a fun idea, but I am not sure how to actually delete an object.
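For the experiment, a naive sketch along these lines would show the effect (illustrative only: the numbers vary wildly by JVM, and a real measurement should use a harness such as JMH):

class Plain { }

class Finalized {
    @Override
    protected void finalize() { } // the mere presence of this method is the cost
}

class FinalizerBench {
    public static void main(String[] args) {
        final int n = 1_000_000;
        long t0 = System.nanoTime();
        for (int i = 0; i < n; i++) new Plain();
        long t1 = System.nanoTime();
        for (int i = 0; i < n; i++) new Finalized();
        long t2 = System.nanoTime();
        System.out.printf("plain: ~%d ns/object, finalized: ~%d ns/object%n",
                (t1 - t0) / n, (t2 - t1) / n);
    }
}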
Once you set the reference variable to null (assuming it was the last reference), or that variable goes out of scope, the object becomes eligible to be garbage-collected at the next garbage-collection cycle.
An object will cease to exist when there are no longer any strong rooted references to it; in most cases that's exactly what should happen. In some cases, however, an object will ask an outside entity to do something on its behalf, possibly to the detriment of other entities, in exchange for a promise to let that other entity know when its services are no longer required. For example, a "File" object might ask the OS for exclusive access to a file; until the OS is told that such access is no longer required, it will block everyone else's ability to use that file.
If an object which had made such a promise were abandoned and simply ceased to exist, the outside entity would keep on doing whatever it had been asked to do, to the detriment of everyone else, even though its actions were no longer of any benefit to anyone. To avoid this situation, Java allows objects to request notification when the GC notices that they seem to have been abandoned. Such notifications will be given (i.e. Finalize will be called on such objects) before the objects cease to exist, but there's no real guarantee of timeliness beyond that. An object which is finalized can then notify any and all entities acting on its behalf that they should stop doing so.
The creators of Java may have expected finalizers to be the primary mechanism by which objects could notify outside entities that their services are no longer required, but finalization really doesn't work very well. Other mechanisms such as AutoCloseable or PhantomReference are better in many cases.
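For example, here is a minimal sketch of the AutoCloseable alternative (the LockedFile class is invented for illustration): the outside entity is released deterministically when the block exits, instead of whenever the GC gets around to finalization.

class LockedFile implements AutoCloseable {
    LockedFile(String path) {
        // ask the OS for exclusive access to `path` here
    }

    @Override
    public void close() {
        // tell the OS the exclusive access is no longer required
    }
}

class Demo {
    public static void main(String[] args) {
        // try-with-resources guarantees close() runs when the block exits,
        // even if an exception is thrown -- no finalizer involved
        try (LockedFile f = new LockedFile("data.log")) {
            // use the file
        }
    }
}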
I would like some opinions on this to settle a small dispute. Any help would be greatly appreciated.
I have written my own file handler that is attached to the logger. This being a file handler and being accessed by multiple threads, I am using synchronization in order to ensure that there is no collision during the writing process. Additionally it is a rolling log, so I also close and open files, and do not want any problems there either.
His response to it was (as pasted from email)
I strongly believe that Synchronization is very bad in the Handler. It
is too complex for such easy task. So, I would say why do not use one
instance per Thread?
What would you say is better from a performance and memory-management perspective?
Thank you very much for any response. Whenever reading and writing are involved in multithreaded Java applications I have used synchronization all my life, and I have not heard of any severe performance issues.
So please I would like to know if there are any issues and I really should switch to one instance per thread.
And in general, what would be the downfall of using synchronization?
EDIT: the reason why I wrote a custom file handler (yes I do love slf4j), is because my custom handler is dealing with two files at once, and additionally I have few other functions I perform on top of writing to files.
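For illustration, here is a simplified sketch of the synchronized approach in question (names invented, not the actual handler): both the write and the rollover are serialized on the handler's monitor, so a roll can never interleave with a write.

import java.io.PrintWriter;

class RollingFileHandler {
    private PrintWriter out; // the current log segment

    RollingFileHandler(PrintWriter initial) { this.out = initial; }

    synchronized void publish(String record) {
        if (shouldRoll()) {
            out.close();
            out = openNextSegment(); // safe: no other thread can interleave a write
        }
        out.println(record);
    }

    private boolean shouldRoll() { return false; }        // size/time check goes here
    private PrintWriter openNextSegment() { return out; } // open the next file here
}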
Another solution would be to use a separate thread to do the (costly on its own) writing, and use a concurrent queue to pass the log messages from the domain threads.
The key part here is that pushing to a queue is much less costly than writing to a file, which means there is less interference from concurrent log calls.
The call to log would then look like:
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// an unbounded LinkedBlockingQueue is one reasonable choice here
private static final BlockingQueue<String> logQueue = new LinkedBlockingQueue<>();

public static void log(String message) {
    // construct & filter the message here
    logQueue.add(message);
}
then in the logger thread it would look like:
try {
    while (true) {
        String message = logQueue.take(); // blocks until a message arrives
        logFile.println(message); // or whatever you are doing
    }
} catch (InterruptedException e) {
    Thread.currentThread().interrupt(); // let the logger thread shut down
}
As with all I/O, you have little choice but mutual exclusion. You may theoretically build up a complex scheme with a lock-free queue which accumulates logging entries, but its utility, and especially its reliability, would be very questionable: without careful design you could get a logging-caused OOME, have the application hang on exit due to threads which you didn't clean up, etc.
Keep in mind that, assuming you are using buffered I/O, you already have an equivalent of a queue, minimizing the time spent occupying the lock.
The downfall to synchronisation is the fact that only one thread can access that part of the code at any one time, meaning your code will see little benefit from multithreading; i.e. the synchronised part of your application will only be as fast as a single thread (plus a small overhead for handling the synchronised status, so a little slower, perhaps).
However, in subjects where you don't want the threads to interfere with one another, such as writing to files, the security gained from the synchronisation is paramount, and the performance loss should just be accepted.
I guess JAXB calls the zero-arg constructor and then starts filling in the non-volatile fields and adding items to the lists.
In my own code: immediately after the unmarshalling, the generated beans get handed off to some worker threads via an add method, but not through a constructor or any other mechanism that would cause the memory model to flush and refetch the data to and from the shared area.
Is this safe? Or does JAXB do some magic trick behind the scenes? I can't think of any way in the Java programming language that could enforce everything being visible to all threads. Does the user of JAXB-generated beans have to worry about fields possibly not being visibly set in a concurrent setup?
Edit: Why are there so many downvotes? Nobody has yet been able to explain how JAXB ensures this seemingly impossible task.
I won't bother to investigate the various "facts" in your question, I'll just paraphrase:
"Without references it ain't true!"
That said, anyone dealing with threads in Java these days would have to actively try in order to avoid establishing happens-before and happens-after relationships. Any use of a volatile variable, a synchronized block, a Lock object or an atomic variable is bound to establish such a relationship. That immediately pulls in blocking queues, synchronized hash maps and a whole lot of other bits and pieces.
How are you so certain that the JAXB implementation actually manages to do the wrong thing?
That said, while objects obtained from JAXB are about as safe as any Java object once JAXB is done with them, the marshalling/unmarshalling methods themselves are not thread-safe. I believe that you do not have to worry unless:
Your threads share JAXB handler objects.
You are passing objects between your threads without synchronization: A decidedly unhealthy practice, regardless of where those objects came from...
EDIT:
Now that you have edited your question we can provide a more concrete answer:
JAXB-generated objects are as thread-safe as any other Java object, which is to say not at all. A direct constructor call offers no thread-safety on its own either: without an established happens-before relationship, another thread is free to observe a partially initialized object even after new has returned.
There are ways, namely via the use of final fields and immutable objects, to avoid this pitfall, but it is quite hard to get right, especially with JAXB, and it does not actually solve the issue of propagating the correct object reference so that all threads are looking at the same object.
Bottom line: it is up to you to move data among your threads safely, via the use of proper synchronization methods. Do not assume anything about the underlying implementation, except for what is clearly documented. Even then, it's better to play it safe and code defensively - it usually results in more clear interactions between the threads anyway. If at a later stage a profiler indicates a performance issue, then you should start thinking about fine-tuning your synchronization code.
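To make the "move data safely" advice concrete, here is a hedged sketch (the class names, including Customer, are invented; Customer stands in for any JAXB-generated bean): handing the bean over through a BlockingQueue establishes the happens-before edge, so the worker sees every field the unmarshaller set.

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class Customer { /* stands in for a JAXB-generated bean */ }

class Pipeline {
    private final BlockingQueue<Customer> work = new LinkedBlockingQueue<>();

    // producer thread, right after unmarshalling:
    void publish(Customer bean) throws InterruptedException {
        work.put(bean); // the put happens-before the corresponding take
    }

    // worker thread:
    void consume() throws InterruptedException {
        Customer bean = work.take(); // all writes made before put() are visible
        // ... safe to read the bean's fields here
    }
}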
I am kicking off my final year project right now. I am going to be investigating the concurrency approaches from the Java and Scala perspectives. Having just come out of a Java concurrency module, I can see why people say that the shared-state threading approach is difficult to reason about: you have critical sections to worry about, and you run the risk of race conditions, deadlocks, etc. due to the non-deterministic way in which Java threads operate. With Java 1.5 this reasoning was given some clarity, but it is still far from crystal clear.
At first view, Scala appears to remove this complex reasoning through the Actors class. This gives the programmer the ability to develop concurrent systems from a more sequential viewpoint that is easier to conceptualize. But for this positive, am I right in saying that there are some drawbacks? For instance, say we want to sort a large list in both scenarios: with Java you create two threads, split the list in two, worry about the critical sections, atomic actions, etc. and go code. With Scala, because it is "share nothing", you actually have to pass half of the list to each of two actors to perform the sort operation, right?
I guess my question is: is the price you pay for simpler reasoning, in Scala, the performance overhead of having to pass the collection to your actors?
I was thinking of doing some benchmark tests to this effect (selection sort, quick sort etc;) but because one is functional and one is imperative - I will not be comparing apples with apples from an algorithm viewpoint.
I would really appreciate any views you guys have on the above to give me some ideas to get me started.
Many thanks.
The nice thing about Scala is that you can do concurrency the Java way if you want. All the Java classes are available.
So it really boils down to the difference between a model where you have threads with concurrent access to mutable variables, and a model where you have stateful actors which send messages to each other but do not peek into each others' internals. And you're absolutely right that in some scenarios you have to trade off performance against ease of getting the code correct.
As a rough rule of thumb, I generally find that if, in the Java model, your threads would spend a significant amount of time waiting for a lock to open up, there is no clean way to separate the work to avoid having everyone waiting for that resource, and execution switches between threads quickly, then the Java model is far superior to an actor model in which the actor sends an "I'm done" message back to a supervisor, which then sends a "Here's new work!" message to an existing non-busy actor. Sorting algorithms, depending on how you envision them, can very much fall into this category.
For most everything else, the performance penalty associated with actors doesn't amount to much as far as I've seen. If you can conceive of your problem as lots and lots of reactive elements (i.e. they only need time when they've received a message), then actors can scale particularly well (millions available, though only a handful are working at any given instant); with threads, you'd need to have some sort of extra internal state to keep track of who should be doing what work, since you couldn't handle that many active threads.
I'm just going to point out here that Scala does not copy arguments passed to actors, so actors can share whatever is passed to them.
Unlike in Erlang, it is the programmer's responsibility to avoid sharing mutable state. However, there is no penalty for sharing immutable data, since there is no need to lock it: all accesses to it are read-only. And Scala has strong support for immutable data structures.
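The same point in plain Java terms (a sketch; List.of, available since Java 9, returns an unmodifiable list): an immutable collection can be handed to several workers by reference, with no copying and no locking.

import java.util.List;

class ShareImmutable {
    public static void main(String[] args) {
        List<Integer> data = List.of(3, 1, 2); // immutable: reads need no lock

        Runnable worker = () -> {
            int sum = data.stream().mapToInt(Integer::intValue).sum();
            System.out.println(Thread.currentThread().getName() + " sees sum " + sum);
        };

        // both threads share the very same reference; nothing is copied
        new Thread(worker).start();
        new Thread(worker).start();
    }
}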