There is a specification for the Java memory model.
I want to dive into the source code to investigate how those mechanisms (e.g., synchronized, volatile, etc.) are actually implemented.
But the codebase is so huge that I have no idea where to start.
(http://www.java2s.com/Open-Source/Java-Document/CatalogJava-Document.htm)
Could anyone give me some clues?
Thanks a lot!
You might start by looking at the synchronizer.cpp file in the HotSpot sources of the current version of the JDK. Prepare yourself a strong pot of coffee: you've picked one of the most complex areas of the JVM to start delving into the source code.
If you haven't already done so, I would also suggest that you take a look at Bill Pugh's page on the Java Memory Model and Doug Lea's recommendations for compiler writers on implementing the Java memory model.
You may also glean something from running the debug JVM with the option to print the JIT-compiled assembly (-XX:+PrintAssembly) turned on, and then inspecting the output. (This won't tell you everything, but it might give you some pointers: if nothing else, some of what it prints will give you things to search for in the JDK source code.)
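As a starting point for that kind of inspection, here is a minimal sketch of a program that exercises both volatile and synchronized. The class and method names are made up for illustration. On HotSpot, running it with -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly should print the JIT-compiled code, provided the hsdis disassembler plugin is installed; what you actually see depends on your JVM build and CPU.

```java
// Hypothetical probe class: gives the JIT something to compile that
// involves a volatile read/write and a synchronized method.
public class MemoryModelProbe {
    private static volatile boolean ready; // volatile: writes become visible to other threads
    private static int counter;            // guarded by the class lock via increment()

    static synchronized void increment() {
        counter++;
    }

    // Runs two threads that each call increment() 'iterations' times.
    static int run(int iterations) throws InterruptedException {
        counter = 0;
        ready = false;
        Thread worker = new Thread(() -> {
            while (!ready) {
                // spin until the main thread publishes the flag
            }
            for (int i = 0; i < iterations; i++) {
                increment();
            }
        });
        worker.start();
        ready = true; // volatile write
        for (int i = 0; i < iterations; i++) {
            increment();
        }
        worker.join(); // join() makes the worker's updates visible here
        return counter;
    }

    public static void main(String[] args) throws InterruptedException {
        // A large iteration count makes it likely that the hot loop gets JIT-compiled.
        System.out.println(run(1_000_000));
    }
}
```

Searching the printed assembly for the increment method and for the loop on the ready flag is one way to find names and patterns to look up in the JDK sources.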
Related
So up until about 6 months ago, most of my work (big graph processing) consisted of Python and C++. Up to that point, and even now, I had not written any Java whatsoever. I had seen the language and was familiar with the syntax (having come from a C/C++ background), and liked the idea of the JVM, but had never actually written any substantial amount of Java.
When I picked up Scala, I loved it: OOP and functional programming features all in one, and it being on the JVM was great. I've been constantly striving to improve my Scala and have been playing with Akka, and am still loving it. However, at times (perhaps it is just me overthinking it) I feel I should learn some more about Java and/or the JVM.
I've heard from many that Scala should be considered a separate language from Java, much like C++ to C. Perhaps you feel the same way, and perhaps learning Java is more or less disjoint from learning Scala, but I feel that learning more about the JVM (e.g. JIT compilation, type erasure) would be helpful?
Thoughts?
The JVM executes bytecode, and it is definitely helpful to know how this works, just as it is sometimes helpful to know how C/C++ method invocations work or how classes are initialized, because sometimes it matters and cannot be abstracted away.
Java is the prime language for the JVM, and it is helpful to be able to read Java to some extent if you need to use Java classes directly. And this may happen quite often; only a few examples:
you need to use some third party Java library (and there are tons)
working with Properties
you need to do something special in Swing which is not supported by the Scala-Swing wrapper
also, sources explaining how to use those third-party libraries will most probably use Java examples
But my advice is not to study it in advance - you'll pick it up when you need it.
Buy this book now: Java Performance. It was only released last October and is a treasure trove of information for anyone who wants to understand the JVM. If you're going to be a Scala developer, you must understand garbage collection and JVM runtime parameters at the very least.
Off the top of my head:
primitives, autoboxing, and how Java arrays are special;
erasure and manifests
how logically tail-recursive calls in Scala source are compiled
installing -client, -server on your platform and when you want to try 32 bit: e.g. JAVA_HOME and "Java Preferences" in OS X. I think OpenJDK should work anywhere you use the same version of the Oracle JDK, but IntelliJ warns you to use only the official Oracle JDK. I've seen very isolated reports that 3D graphics libs have had problems with OpenJDK, and also that parts of OpenJDK, like the fonts, have licensing issues.
setting classpaths in REPL, as compiler option, and in SBT
HotSpot switches, -Xmx, -Xms (heap settings), most common garbage collectors, inlining method calls
java.util.concurrent
possible binary compatibility issues with Java and Scala code compiled in JDK 6 and 7.
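The first two items in the list above (boxing, arrays, erasure) can be poked at directly from Java. The following is a small sketch with made-up class and method names, not a complete tour; it only shows behaviour that the language specification guarantees.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class for boxing, arrays, and erasure.
public class BoxingAndErasure {

    // Erasure: both lists have the same runtime class, the element type is gone.
    static boolean sameRuntimeClass() {
        List<String> strings = new ArrayList<>();
        List<Integer> ints = new ArrayList<>();
        return strings.getClass() == ints.getClass();
    }

    // Autoboxing: values in -128..127 are guaranteed to box to cached objects,
    // so reference comparison happens to succeed for them.
    static boolean smallBoxesShareIdentity() {
        Integer a = 127;
        Integer b = 127;
        return a == b;
    }

    // Outside that range identity is not guaranteed, so compare with equals().
    static boolean boxedEquals(int value) {
        Integer a = value;
        Integer b = value;
        return a.equals(b);
    }

    // Arrays are special: primitive arrays have their own runtime classes.
    static String intArrayClassName() {
        return new int[0].getClass().getName();
    }

    public static void main(String[] args) {
        System.out.println(sameRuntimeClass());
        System.out.println(smallBoxesShareIdentity());
        System.out.println(boxedEquals(1000));
        System.out.println(intArrayClassName());
    }
}
```

Scala's Int and Array[Int] sit on top of the same JVM primitives and arrays, which is why these details tend to leak through.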
I don't know exactly what you need, but there are several similar questions on SO; take a look at them:
Understanding JVM Better
Understanding the Sun JVM
Also, here are some good articles on the Java Memory Model:
Java theory and practice: Fixing the Java Memory Model, Part 1
Java theory and practice: Fixing the Java Memory Model, Part 2
My thought is that it's better to dig into the Java language, write some code, and read Java-specific books to better understand how everything works.
There are a lot of tools surrounding the JVM. If you want to understand how your programs are running (for performance or other reasons) then it's worth being au fait with these. Two useful tools are:
jstack
visualvm
Both are particularly useful for monitoring and interrogating long-running processes.
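jstack attaches to a running process from the outside, but a rough idea of what it reports can be had from inside a program. The sketch below is only an approximation under that assumption: the class name is made up, and the real tool prints considerably more (lock information, for instance).

```java
import java.util.Map;

// Hypothetical in-process approximation of a thread dump.
public class ThreadDumpSketch {

    // Collects the name, state, and stack frames of every live thread.
    static String dump() {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> entry : Thread.getAllStackTraces().entrySet()) {
            Thread thread = entry.getKey();
            out.append('"').append(thread.getName()).append('"')
               .append(" state=").append(thread.getState()).append('\n');
            for (StackTraceElement frame : entry.getValue()) {
                out.append("    at ").append(frame).append('\n');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(dump());
    }
}
```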
I think you will learn more about the JVM as you program in Scala. I mean, you will have more questions like "Why is this solution slow and that one fast?", and one way to answer such a question is to check the bytecode.
I'd like to learn how, or if it's possible at all, to programmatically interact with a black-box Java application (by reading its data). Has there been any previous research/work on doing this sort of thing?
I'd imagine that running on a JVM significantly complicates things.
@anon: Doing this with any JVM is relevant. Do you have to know or control the specifics of how the JVM allocates memory to extract data from a Java application?
You could look into java.lang.instrument. As long as you understand the class structure of the application, it will let you modify the methods in an already-running JVM and you may be able to concoct a way that allows you to extract or insert data enough to communicate (depends on the methods available, of course).
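A minimal sketch of such an agent follows. It does not rewrite any methods; it only records the names of classes as they are loaded, which is about the smallest useful thing a transformer can do. The class name LoggingAgent and the SEEN set are made up for illustration.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical agent: records which classes pass through the transformer.
public class LoggingAgent {

    // Internal class names (e.g. "java/lang/String") seen so far.
    static final Set<String> SEEN = ConcurrentHashMap.newKeySet();

    static final class Recorder implements ClassFileTransformer {
        @Override
        public byte[] transform(ClassLoader loader, String className,
                                Class<?> classBeingRedefined,
                                ProtectionDomain protectionDomain,
                                byte[] classfileBuffer) {
            if (className != null) {
                SEEN.add(className);
            }
            return null; // null tells the JVM to keep the original bytes
        }
    }

    static final Recorder RECORDER = new Recorder();

    // Called by the JVM before main() when started with -javaagent:<jar>.
    public static void premain(String agentArgs, Instrumentation inst) {
        inst.addTransformer(RECORDER);
    }
}
```

To be loaded, the class has to be packaged in a jar whose manifest names it in a Premain-Class attribute. Attaching to a JVM that is already running uses an agentmain method with the same parameters and an Agent-Class manifest attribute instead. Actually changing method bodies means returning modified class bytes, which in practice is done with a bytecode library.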
The Sable group at McGill University has contributed a lot of research to the Java world.
Much of the work is getting rather dated, but you might find some help in their EVolve project, which has the goal of visualizing object-oriented programs. Some of their projects appear to be actively maintained (such as Soot, their Java optimization framework), so you might have luck contacting them directly.
It is easily possible with, for example, StackTrace. It can attach to a Java process and let you inspect and change almost everything with BeanShell.
I believe what you're looking for is what the Eclipse MAT does. You might want to take a look at the source code...
The HotSpot JVM allows you to hook up an agentlib from a profiler (see Open Source Java Profilers, or commercial ones like YourKit). In the profiler you can then inspect the memory, CPU, threads, etc.
If you want very specific information, you might want to write your own agentlib that sends you the information about the JVM that you need.
I've used Concurrent Pascal, a tool which helps debug concurrent algorithms because when it runs your code, it randomizes which thread to swap to at every possible step, trying out as many paths as possible.
Is there a JVM that can do this?
Take a look at Java Pathfinder (from NASA, no less, and it's free). I think it should do what you need almost out of the box, that is, trying different interleavings (some assembly may be required).
Of course, you still need to specify the verification property on your data that you're interested in, like an invariant. Otherwise, by default it would probably only tell you if there was a deadlock. Take a look at the section "Explore Execution Alternatives".
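To make the idea of a verification property concrete, here is a small sketch of the kind of program an interleaving explorer is meant for. It is plain Java with made-up names and does not use any Java Pathfinder API; how the tool is configured and run is not shown. The increment is deliberately unsynchronized, so some interleavings lose an update and break the invariant checked in main.

```java
// Hypothetical example of a data race with an explicit invariant.
public class RacyCounter {
    private int count; // deliberately unsynchronized

    void increment() {
        count++; // read-modify-write: not atomic
    }

    // Two threads each increment 'perThread' times; returns the final count.
    static int runTwoThreads(int perThread) throws InterruptedException {
        RacyCounter counter = new RacyCounter();
        Runnable work = () -> {
            for (int i = 0; i < perThread; i++) {
                counter.increment();
            }
        };
        Thread a = new Thread(work);
        Thread b = new Thread(work);
        a.start();
        b.start();
        a.join();
        b.join();
        return counter.count;
    }

    public static void main(String[] args) throws InterruptedException {
        int total = runTwoThreads(2);
        // The invariant: holds on most ordinary runs, but not on every interleaving.
        if (total != 4) {
            throw new AssertionError("lost update: " + total);
        }
        System.out.println(total);
    }
}
```

On a normal JVM this will usually pass, which is exactly why a tool that systematically tries the alternative schedules is useful.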
There are no commercial JVMs I'm aware of that do this, but I suggest you look at tools like ConTest that try to help you in your problem domain:
ConTest on developerWorks
ConTest on research site
In general, because most commercial JVMs rely on the OS to do thread scheduling, it's not a natural thing for JVMs themselves to do. There might be something out there for the green-threads versions of Jikes RVM (which might be the older ones).
This may be a bit off topic of "right answer, not discussion."
However, I am trying to debug my thought process, so maybe someone can help me:
I use compilers all the time, and the fact that I'm giving up control over machine code generation (the layout of my caches, and the flow of electrons) does not bother me.
However, giving up control of memory layout (being able to place stuff in memory) and memory management (garbage collection) still bothers me these days.
Have others dealt with this? If so, how did you get past it? (In particular, how I often feel "safer" in C++ than in Java.)
Thanks!
Your feeling is, naturally, very subjective.
You might feel comfortable managing your own memory space in C++.
Others might appreciate the ease of letting Java manage the heap for them, reducing memory management overhead to a minimum.
The programming domain has an influence as well. For example, in an embedded environment you most likely will not have the luxury of a garbage collection mechanism, leaving you to manage your own memory whether you like it or not.
Bottom line - subjective and domain-dependent.
Confront your nightmare! Profile a busy application in NetBeans and watch the garbage collector do its job.
If you trust the JVM with code generation, why not trust it with data generation too?
Please note that things like cache sizes on the CPU may influence the optimal placement of your objects, and that the JIT basically knows better than you because it can measure and take action in the process.
If you've ever used COM under C++, it's really no different from using "Release()". The memory may or may not be freed right then, or it may be freed somewhere down the line when the thing using it has finished with it.
Best thing to do is just assume it works and stop worrying about it.
The original poster asked about (a) memory layout and (b) memory management. The previous answers only talk about memory management.
Regarding memory layout, the keyword to search for seems to be "struct".
C and C++ both have memory layout control. D should as well.
It appears (based on a quick search) Java does not.
C# grants memory layout control via structs. See:
Stack Overflow: incorrect members order in a C# structure
http://www.developerfusion.com/article/84519/mastering-structs-in-c/
Go's data structures are called "structs", but I cannot tell if they grant control over memory layout. (I suspect they do, but have not been able to confirm this.)
I welcome any corrections/additions to the above.
(And regarding memory management, I'm quite happy to let the language/platform do it.)
In a recent question, my simple-minded answer highlighted many of my misconceptions about Java, the JVM, and how the code gets compiled and run. This has created a desire in me to take my understanding to a lower level. I have no problems with low-level concepts like assembly; however, bytecode and the JVM confound me. How object-oriented code gets broken down at a low level is lost on me. I was wondering if anyone had any suggestions on how to learn about the JVM, bytecode and the lower-level functioning of Java. Are there any utilities out there that allow you to write and run bytecode directly? I believe hands-on experience with something is the best way to grow in understanding of it. Additionally, any reading suggestions on this topic would be appreciated.
Edit: Secondary question. The answers gave me an interesting idea for learning about the JVM: how plausible would it be to write a really simple language, like brainf**k or Ook but with a readable syntax (maybe I could even develop it to support OO eventually), that compiles into bytecode? Would that be a good learning experience?
Suggested reading: the JVM spec.
You might also want to play with BCEL - there are other libraries around for manipulating bytecode, but that's probably the best known one.
The Apache BCEL will allow you to analyse and hand craft .class files from bytecode.
javap will allow you to disassemble existing .class files. It's particularly useful for knocking up quick test classes to understand what is really going on underneath the covers.
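For example, a quick test class like the following sketch (the names are made up) can be compiled with javac and then disassembled with javap -c -p DispatchDemo to see how different kinds of method calls are encoded. The comments name the invoke instructions that javac emits for each call.

```java
// Hypothetical test class for looking at method dispatch in bytecode.
public class DispatchDemo {

    interface Shape {
        double area();
    }

    static final class Square implements Shape {
        private final double side;

        Square(double side) {
            this.side = side;
        }

        @Override
        public double area() {
            return side * side;
        }
    }

    // The call through the interface type compiles to invokeinterface.
    static double describe(Shape shape) {
        return shape.area();
    }

    public static void main(String[] args) {
        Square square = new Square(3.0);        // new, then invokespecial on the constructor
        double direct = square.area();          // invokevirtual on the class type
        double viaInterface = describe(square); // invokestatic
        System.out.println(direct + " " + viaInterface);
    }
}
```

The nested Square class ends up in its own file, DispatchDemo$Square.class, which can be disassembled the same way.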
I learned by reading the ASM tutorial and mucking about with the library itself.
IMHO, ASM is better than BCEL.
"BCEL is already being used successfully in several projects such as compilers, optimizers, obfuscators, code generators and analysis tools. Unfortunately there hasn't been much development going on over the past few years. Feel free to help out or you might want to have a look into the ASM project at objectweb."
- http://jakarta.apache.org/bcel/
There is only one reliable source for understanding the JVM:
The Java® Virtual Machine Specification Java SE 7 Edition
http://docs.oracle.com/javase/specs/jvms/se7/html/index.html
Programming for the Java Virtual Machine is a good book for this topic. (Disclosure: I work with the author.)
For understanding Java/the JVM's architecture: read Wikipedia, the specs and the source code.
For understanding how object-orientated code is done on a low level: try and emulate features like inheritance/polymorphism/encapsulation in a lower-level language like C.
In C you can achieve the above through, for example, a combination of function pointers and nested structures.