I am wondering how unsafe the use of sun.misc.Unsafe actually is. I want to create a proxy of an object where I intercept every method call (except the one to Object.finalize, for performance considerations). For this purpose, I googled a little bit and came up with the following code snippet:
class MyClass {

    private final String value;

    MyClass() {
        this.value = "called";
    }

    public String getValue() {
        return value;
    }

    public void print() {
        System.out.println(value);
    }
}
@org.junit.Test
public void testConstructorTrespassing() throws Exception {
    // ReflectionFactory is sun.reflect.ReflectionFactory; its raw return type needs a cast
    @SuppressWarnings("unchecked")
    Constructor<MyClass> constructor = (Constructor<MyClass>) ReflectionFactory.getReflectionFactory()
            .newConstructorForSerialization(MyClass.class, Object.class.getConstructor());
    constructor.setAccessible(true);
    // the constructor body never ran, so 'value' is still null instead of "called"
    assertNull(constructor.newInstance().getValue());
}
My considerations are:
Even though Java is advertised as Write once, run anywhere, my reality as a developer looks more like Write once, run once in a controllable customer's runtime environment
sun.misc.Unsafe is expected to become part of the public API in Java 9
Many non-Oracle VMs also offer sun.misc.Unsafe since - I guess - quite a few libraries already use it. This also makes the class unlikely to disappear
I am never going to run the application on Android, so this does not matter for me.
How many people are actually using non-Oracle VMs anyways?
I am still wondering: are there other reasons why I should not use sun.misc.Unsafe that I did not think of? If you google this question, people tend to answer with an unspecific because it's not safe, but I do not really feel it is, besides the (very unlikely) possibility that the method will one day disappear from the Oracle VM.
I actually need to create an object without calling a constructor to overcome Java's type system. I am not considering sun.misc.Unsafe for performance reasons.
Additional information: I am using ReflectionFactory in the example for convenience; it eventually delegates to Unsafe. I know about libraries like Objenesis, but looking at their code I found that they basically do something similar, plus checks for other ways on Java versions that would not work for me anyway, so I guess writing four lines is worth saving a dependency.
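For reference, here is roughly what the direct route through sun.misc.Unsafe would look like (a sketch, not what I actually run; the reflective theUnsafe lookup is the usual workaround because Unsafe.getUnsafe() rejects callers that are not on the bootstrap class path):

import java.lang.reflect.Field;
import sun.misc.Unsafe;

class UnsafeAllocation {
    static MyClass allocateWithoutConstructor() throws Exception {
        // Unsafe.getUnsafe() is caller-restricted, so grab the
        // singleton instance reflectively instead
        Field theUnsafe = Unsafe.class.getDeclaredField("theUnsafe");
        theUnsafe.setAccessible(true);
        Unsafe unsafe = (Unsafe) theUnsafe.get(null);
        // allocates the object without running any constructor;
        // 'value' will be null, not "called"
        return (MyClass) unsafe.allocateInstance(MyClass.class);
    }
}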
There are three significant (IMO) issues:
The methods in the Unsafe class have the ability to violate runtime type safety, and do other things that can lead to your JVM "hard crashing" (see the sketch after this list).
Virtually anything that you do using Unsafe could in theory be dependent on internal details of the JVM; i.e. details of how the JVM does things and represents things. These may be platform dependent, and may change from one version of Java to the next.
The methods you are using ... or even the class name itself ... may not be the same across different releases, platforms and vendors.
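To make the first point concrete, here is a deliberately broken sketch (using the same reflective theUnsafe lookup shown above). Nothing in it is rejected by the compiler or the verifier, yet running it will typically kill the JVM with a native segmentation fault instead of throwing a catchable exception:

import java.lang.reflect.Field;
import sun.misc.Unsafe;

class HardCrash {
    public static void main(String[] args) throws Exception {
        Field f = Unsafe.class.getDeclaredField("theUnsafe");
        f.setAccessible(true);
        Unsafe unsafe = (Unsafe) f.get(null);
        // reads one byte from raw memory address 0 (a null pointer);
        // there is no bounds or null check, so the usual result is a
        // SIGSEGV that takes down the whole process
        unsafe.getByte(0L);
    }
}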
IMO, these amount to strong reasons not to do it ... but that is a matter of opinion.
Now if Unsafe becomes standardised / part of the standard Java API (e.g. in Java 9), then some of the above issues would be moot. But I think the risk of hard crashes if you make a mistake will always remain.
During one JavaOne 2013 session, Mark Reinhold (the JDK architect) got the question: "How safe is it to use the Unsafe class?". He replied with a somewhat surprising answer: "I believe it should become a stable API. Of course properly guarded with security checks, etc..."
So it looks like there may be something like java.util.Unsafe for JDK9. Meanwhile using the existing class is relatively safe (as safe as doing something unsafe can be).
Related
I understand what public, private, and protected do. I know that you are supposed to use them to comply with the concept of Object Oriented Programming, and I know how to implement them in a program using multiple classes.
My question is: Why do we do this? Why shouldn't I have one class modifying the global variables of another class directly? And even if you shouldn't why are the protected, private, and public modifiers even necessary? It's as if programmers don't trust themselves not to do it, even though they are the ones writing the program.
Thanks in advance.
You're right, it's because we can't trust ourselves. Mutable state is a major factor in complexity of computer programs, it's too easy to build something that seems ok at first and later grows out of control as the system gets bigger. Restricting access helps to reduce the opportunities for objects' states to change in unpredictable ways. The idea is for objects to communicate with each other through well-defined channels, as opposed to tweaking each others' data directly. That way we have some hope of testing the individual objects and having some confidence in how they'll behave as part of a larger system.
Let me give a basic example (this is only for illustration):
class Foo {
    void processBar() {
        Bar bar = new Bar();
        bar.value = 10;
        bar.process();
    }
}

class Bar {
    public int value;

    public void process() {
        // divides by 'value': if a caller set it to 0, this throws an
        // ArithmeticException, so here you have to write some code to
        // handle the exception
        int compute = 10 / value;
    }
}
Everything looks good and you are happy. Later you realize that other developers, or other APIs you are using, are setting value to 0, and this leads to an exception in your Bar.process() method.
With the implementation above there is no way you can restrain users from setting it to 0. Now look at the implementation below.
class Foo {
    void processBar() {
        Bar bar = new Bar();
        bar.setValue(0); // now fails fast with an informative exception
        bar.process();
    }
}

class Bar {
    private int value;

    public void setValue(int value) {
        if (value == 0) {
            throw new IllegalArgumentException("value = 0 is not allowed");
        }
        this.value = value;
    }

    public void process() {
        // 'value' can never be 0 here, so no exception-handling code is
        // needed, which in theory can give you better performance too
        int compute = 10 / value;
    }
}
Not only can you now enforce the check, you can also throw an informative exception, which helps in figuring out errors quickly and at an early stage.
This is just one example; the basics of OOP (encapsulation, abstraction, etc.) help you standardize interfaces and hide the underlying implementation.
Keep in mind that the developer coding a given class may not be the only one using it. Teams of developers write software libraries, which in Java are commonly distributed as JARs, used by completely different teams of developers. If these standards weren't in place, it would be very difficult for others to know, at a minimum, what the intent was of any available variables / methods.
If I have a private/protected instance variable, for example, I may have a public "setter" method that checks for validity, preconditions, and performs other activities - all which would be bypassed if anyone was freely able to modify the instance variable directly.
Another good point is in the Java documentation / tutorial on access control: http://docs.oracle.com/javase/tutorial/java/javaOO/accesscontrol.html
Public fields tend to link you to a particular implementation and limit your flexibility in changing your code.
Non-local behavior is difficult to reason about.
Minimizing surface area increases comprehension.
I don't trust myself to remember all behavioral side-effects.
I definitely don't trust "you" to understand all behavioral side-effects.
The less I expose the more flexibility I have to modify and extend.
My question is: Why do we do this?
Basically, because by restricting ourselves in this way, we make it easier ... for ourselves, and others who may need to read / modify the code in the future ... to understand the code, and the way that the various parts interact.
Most developers understand code by a mental process of abstraction; i.e. mentally drawing boundaries around bits of code, understanding each bit in isolation, and then understanding how each bit interacts with other bits. If any part of the code could potentially mess around with the "innards" of any other part of the code, then it makes it hard for the typical developer to understand what is going on.
This may not be a problem for you while you are writing the code, because you may be able to keep all of the complex interactions in your head while you create the code. But in a year or two's time, you will have forgotten a lot of the details. And other people never had the details in their heads to start with.
Why shouldn't I have one class modifying the global variables of another class directly?
Because it makes your code harder to understand; see above. The larger your codebase is, the more pronounced the problem will be.
Another point is that if you over-use globals (statics, actually), then you create problems if your code needs to be multi-threaded / reentrant, for unit testing, and if you need to reuse your code in other contexts.
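As a minimal illustration of the multi-threading point (a contrived sketch, not code from anywhere in particular): a mutable static field silently becomes shared state the moment a second thread appears.

class Counter {
    static int count = 0; // global, shared by every thread

    public static void main(String[] args) throws InterruptedException {
        Runnable task = new Runnable() {
            public void run() {
                for (int i = 0; i < 100000; i++) {
                    count++; // read-modify-write: not atomic
                }
            }
        };
        Thread a = new Thread(task);
        Thread b = new Thread(task);
        a.start(); b.start();
        a.join(); b.join();
        // almost always prints less than 200000, because concurrent
        // increments overwrite each other
        System.out.println(count);
    }
}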
And even if you shouldn't why are the protected, private, and public modifiers even necessary? It's as if programmers don't trust themselves not to do it, even though they are the ones writing the program.
It is not about trust. It is about expressing in the source code where the boundaries are.
If I write a class and declare a method or a field private, I know that I don't have to consider the problem of what happens if some other class calls it / accesses it / modifies it. If I'm reading someone else's code, I know that I can (initially) ignore the private parts when mapping the interactions and boundaries. The private, protected, and package-private modifiers just provide different granularities of boundary.
(Or maybe it is about trust; i.e. not trusting ourselves to remember where the abstraction boundaries in our design are / were.)
There are basically two reasons:
1) There are interfaces in Java where security is required. For instance, when running a Java applet on your box you want to assure that the applet can't access parts of the file system to which it's not authorized. Without enforceable security, an applet could reach into the Java security layer and modify its own authorities.
2) Even when everyone is "trusted", sometimes expediency trumps common sense, and programmers bypass APIs to access internal interfaces rather than getting the APIs enhanced (which admittedly can often take longer than practical). This creates problems both for stability and for upgrade compatibility.
(There's a legend, deep in the ancient history of computing, of an OS that was treated this way by the application programmers, to the extent that the programmers maintaining the OS were forced to make sure that certain code sections (not entry points, but actual internal code sequences) didn't change physical addresses when the OS was revised.)
Note that these problems existed before the OOP paradigm became common, and are some of the motivation for OOP. OOP isn't an arbitrary religious doctrine invented out of thin air, but is a set of principles that have been filtered through about 6 decades of programming experience.
For a bunch of reasons that (believe it or not) are not as unsound as you may think, we are still (sigh) using Java 1.4 to build and run our code (though we plan to finally move to Java 7 by the end of the year).
Our existing code that uses Collection classes doesn't do a very good job of making it clear what is expected to be in the Collection. Obviously, you can read the code, see what downcasts end up being done, and infer from that, but you can't just look at a method declaration and know what the Collection object that is a method argument or return value actually holds.
In new code that I'm writing and when I am in older code that uses Collections, I've been adding in-line comments to Collections declarations to show what would have been declared if generics were being used. For example:
Map/*<String, Set<Integer>>*/ theMap = new HashMap/*<String, Set<Integer>>*/();
or
List/*<Actions>*/ someMethod(List/*<Job>*/ jobs);
In keeping with the frowning at subjectivity here at SO, rather than asking what you think of this (though admittedly I'd like to know -- I do find it a bit ugly but still like having the type info there) I'd instead just ask what, if anything, you do to make it clear what is being held by pre-generics Collection objects.
What we recommended back in the old days -- and I was a Java Architect at Sun when Java 1.1 was the New Thing -- was to write a class around the structure (I don't think 1.1 even had Collection as a base interface) so that the typecasts happened in code you control instead of in user code. So, for example, something like
public class ArrayOfFoo {
    Object[] ary; // ctor left as exercise

    public void set(int index, Foo value) {
        ary[index] = value; // cast strictly not needed, any Foo is an Object
    }

    public Foo get(int index) {
        return (Foo) ary[index]; // cast needed, not every Object is a Foo
    }
}
Sounds like the code base you have isn't built to this convention; if you're writing new code, there's no reason you can't start. Failing that, your convention isn't bad, but it's easy to forget the cast and then have to search to find out why you're getting a ClassCastException. It's mildly better to resort to some variant of Hungarian notation, or the Smalltalk 'aVariable' convention, by encoding the type in the names, so that you use
Object[] fooAry = new Object[aZillion];
fooAry[42] = new Foo();
Foo aFoo = (Foo) fooAry[42];
Use clear variable identifiers such as jobList, actionList, or dictionaryMap. If you're concerned with the type of objects they contain, you could even make it a convention to always let the identifier of a Collection hint about which type of objects it holds.
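For example (Job and Definition are hypothetical types here), the convention would look like:

List jobList = new ArrayList();       // the name says: holds Job instances
jobList.add(new Job());
Job first = (Job) jobList.get(0);     // the identifier reminds you what to cast to
Map dictionaryMap = new HashMap();    // say, String words mapped to Definition objects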
The inline comments aren't a bad idea, actually. When I ported a 1.5 project back to 1.4, I did just that (instead of removing the type parameters). It worked out quite well.
I'd recommend writing tests. For various reasons:
You should be writing tests anyway!
You can assert the type of a collection member very easily to ensure that all your code paths are adding the right types to the collection (see the sketch after this list)
You can use the test to write code that serves as an "example" of how to use the collection correctly
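A sketch of the second point, with JUnit 3-style asserts and a hypothetical scheduler under test (1.4-compatible, so raw types and an explicit Iterator):

public void testPendingJobsHoldOnlyJobs() {
    List jobs = scheduler.getPendingJobs(); // hypothetical method under test
    for (Iterator it = jobs.iterator(); it.hasNext(); ) {
        assertTrue("job list contains a non-Job element", it.next() instanceof Job);
    }
}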
If you just need binary compatibility with 1.4, you could consider using a tool to downgrade the class files to 1.4 and thus start developing in 1.6 or 1.7 right now. You would of course need to avoid any API that wasn't there in 1.4 (unfortunately you can't compile code with generics against the 1.4 jars directly, as they don't declare any generic types). The bytecode is still the same (at least with 1.6; I don't know for sure about 1.7). One free tool that can do the trick is ProGuard. It can do much more sophisticated things and can also remove all traces of generics from the class files. Just turn off the obfuscation and optimization if you don't need them. It will also warn you if some missing API is used in the processed code if you feed it the 1.4 libraries.
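I no longer have the exact configuration, but a minimal ProGuard setup for this use would look something like the sketch below (option names are from the ProGuard manual; the paths are placeholders):

-injars  build/classes-1.6
-outjars build/classes-1.4
-libraryjars /opt/jdk1.4/jre/lib/rt.jar   # feed it the 1.4 libraries to get warnings
-target 1.4       # rewrite the class file version number
-dontshrink       # we only want the downgrade,
-dontoptimize     # not shrinking,
-dontobfuscate    # optimization or obfuscation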
I'm aware that this is considered a hack by many, but we had a similar requirement where some code needed to still run on a PersonalJava VM (essentially Java 1.1) and several other exotic VMs, and this approach worked quite well. We started with ProGuard and then built our own tool for the task, to be able to implement a few workarounds for bugs in the diverse VMs.
Before I ask my question can I please ask not to get a lecture about optimising for no reason.
Consider the following questions purely academic.
I've been thinking about the efficiency of accesses between root (i.e. often used and often accessing each other) classes in Java, but this applies to most OO languages/compilers. The fastest way (I'm guessing) that you could access something in Java would be a static final reference. Theoretically, since that reference is available during loading, a good JIT compiler would remove the need to do any reference lookup to access the variable and point any accesses to that variable straight to a constant address. Perhaps for security reasons it doesn't work that way anyway, but bear with me...
Say I've decided that there are some order of operations problems or some arguments to pass at startup that means I can't have a static final reference, even if I were to go to the trouble of having each class construct the other as is recommended to get Java classes to have static final references to each other. Another reason I might not want to do this would be... oh, say, just for example, that I was providing platform specific implementations of some of these classes. ;-)
Now I'm left with two obvious choices. I can have my classes know about each other with a static reference (on some system hub class), which is set after constructing all classes (during which I mandate that they cannot access each other yet, thus doing away with order of operations problems at least during construction). On the other hand, the classes could have instance final references to each other, were I now to decide that sorting out the order of operations was important or could be made the responsibility of the person passing the args - or more to the point, providing platform specific implementations of these classes we want to have referencing each other.
A static variable means you don't have to look up the location of the variable with respect to the class it belongs to, saving you one operation. A final variable means you don't have to look up the value at all, but it does have to belong to your class, so you save 'one operation'. OK, I know I'm really handwaving now!
Then something else occurred to me: I could have static final stub classes, kind of like a wacky interface where each call was relegated to an 'impl' which can just extend the stub. The performance hit then would be the double function call required to run the functions and possibly I guess you can't declare your methods final anymore. I hypothesised that perhaps those could be inlined if they were appropriately declared, then gave up as I realised I would then have to think about whether or not the references to the 'impl's could be made static, or final, or...
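To sketch what I mean (all the names here are made up):

// the 'stub': held in a static final field, so the JIT can in principle
// treat the reference as a constant...
abstract class SystemHub {
    static final SystemHub INSTANCE = createPlatformHub();

    private static SystemHub createPlatformHub() {
        return new DesktopHub(); // imagine platform detection here
    }

    abstract void beep();
}

// ...while the platform-specific 'impl' extends it, at the price of a
// virtual call that can no longer be declared final
class DesktopHub extends SystemHub {
    void beep() { java.awt.Toolkit.getDefaultToolkit().beep(); }
}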
So which of the three would turn out fastest? :-)
Any other thoughts on lowering frequent-access overheads or even other ways of hinting performance to the JIT compiler?
UPDATE: After running several hours of test of various things and reading http://www.ibm.com/developerworks/java/library/j-jtp02225.html I've found that most things you would normally look at when tuning e.g. C++ go out the window completely with the JIT compiler. I've seen it run 30 seconds of calculations once, twice, and on the third (and subsequent) runs decide "Hey, you aren't reading the result of that calculation, so I'm not running it!".
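For anyone trying to reproduce this: the trap is dead-code elimination. A sketch of the pitfall (numbers arbitrary):

long start = System.nanoTime();
long sum = 0;
for (int i = 0; i < 100000000; i++) {
    sum += i * 31; // if 'sum' were never read, the JIT may drop the whole loop
}
long elapsed = System.nanoTime() - start;
// printing the result keeps the computation observable, so it cannot
// be optimised away
System.out.println(sum + " computed in " + elapsed + " ns");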
FWIW you can test data structures and I was able to develop an arraylist implementation that was more performant for my needs using a microbenchmark. The access patterns must have been random enough to keep the compiler guessing, but it still worked out how to better implement a generic-ified growing array with my simpler and more tuned code.
As far as the test here was concerned, I simply could not get a benchmark result! My simple test of calling a function and reading a variable from a final vs non-final object reference revealed more about the JIT than the JVM's access patterns. Unbelievably, calling the same function on the same object at different places in the method changes the time taken by a factor of FOUR!
As the guy in the IBM article says, the only way to test an optimisation is in-situ.
Thanks to everyone who pointed me along the way.
It's worth noting that static fields are stored in a special per-class object which contains the static fields for that class. Using static fields instead of object fields is unlikely to be any faster.
See the update: I answered my own question by doing some benchmarking, and found that there are far greater gains in unexpected areas, and that performance for simple operations like referencing members is comparable on most modern systems, where performance is limited more by memory bandwidth than CPU cycles.
Assuming you found a way to reliably profile your application, keep in mind that it will all go out the window should you switch to another jdk impl (IBM to Sun to OpenJDK etc), or even upgrade version on your existing JVM.
The reason you are having trouble, and would likely see different results with different JVM implementations, lies in the Java spec: it explicitly states that it does not define optimizations and leaves it to each implementation to optimize (or not) in any way, so long as execution behavior is unchanged by the optimization.
Suppose I have an interface with lots of methods that I want to mock for a test, and suppose that I don't need it to do anything, I just need the object under test to have an instance of it. For example, I want to run some performance testing/benchmarking over a certain bit of code and don't want the methods on this interface to contribute.
There are plenty of tools to do that easily, for example
Interface mock = Mockito.mock(Interface.class);
ObjectUnderTest obj = ...
obj.setItem(mock);
or whatever.
However, they all come with some runtime overhead that I would rather avoid:
Mockito records all calls, stashing the arguments for verification later
JMock and others (I believe) require you to define what they are going to do (not such a big deal), and then execution goes through a proxy of various sorts to actually invoke the method.
Good old java.lang.reflect.Proxy and friends all go through at least a few more method calls on the stack before getting to the method to be invoked, often reflectively.
(I'm willing to be corrected on any of the details of those examples, but I believe the principle holds.)
What I'm aiming for is a "real" no-op implementation of the interface, such as I could write by hand with everything returning null, false or 0. But that doesn't help if I'm feeling lazy and the interface has loads of methods. So, how can I generate and instantiate such a no-op implementation of an arbitrary interface at runtime?
There are tools available such as Powermock, CGLib that use bytecode generation, but only as part of the larger mocking/proxying context and I haven't yet figured out what to pick out of the internals.
OK, so the example may be a little contrived and I doubt that proxying will have too substantial an impact on the timings, but I'm curious now as to how to generate such a class. Is it easy in CGLib, ASM?
EDIT: Yes, this is premature optimisation and there's no real need to do it. After writing this question I think the last sentence didn't quite make my point that I'm more interested in how to do it in principle, and easy ways into dynamic class-generation than the actual use-case I gave. Perhaps poorly worded from the start.
Not sure if this is what you're looking for, but the "new class" wizard in Eclipse lets you build a new class and specify superclass and/or interface(s). If you let it, it will auto-code up dummy implementations of all interface/abstract methods (returning null unless void). It's pretty painless to do.
I suspect the other "big name" IDEs, such as NetBeans and Idea, have similar facilities.
EDIT:
Looking at your question again, I wonder why you'd be concerned about performance of auto proxies when dealing with test classes. It seems to me that if performance is an issue, you should be testing "real" functionality, and if you're dealing with mostly-unimplemented classes anyway then you shouldn't be in a testing situation where performance matters.
It would take a little work to build the utility, but probably not too hard for a basic vanilla Java interface without "edge cases" (annotations, etc.), to use Javassist code generation to textually create a class at runtime that implements null versions of every method defined on the interface. This would be different from Javassist ProxyFactory (or CGLib Enhancer) proxy objects, which would still have a few layers of indirection. I think there would be no proxying overhead in the class that results from direct bytecode generation. If you are brave you could also dive into ASM to do the same thing.
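A sketch of the Javassist route (NoOpFactory and noOpInstance are names I just made up; superinterfaces and other edge cases are ignored). The convenient part is that CtNewMethod.make accepts a null body and then generates a method that simply returns zero, false or null:

import javassist.ClassPool;
import javassist.CtClass;
import javassist.CtMethod;
import javassist.CtNewMethod;

public final class NoOpFactory {

    @SuppressWarnings("unchecked")
    public static <T> T noOpInstance(Class<T> iface) throws Exception {
        ClassPool pool = ClassPool.getDefault();
        CtClass ctIface = pool.get(iface.getName());
        CtClass impl = pool.makeClass(iface.getName() + "NoOp");
        impl.addInterface(ctIface);
        for (CtMethod m : ctIface.getDeclaredMethods()) {
            // null body: Javassist emits "return 0 / false / null" as needed
            impl.addMethod(CtNewMethod.make(
                    m.getReturnType(), m.getName(),
                    m.getParameterTypes(), m.getExceptionTypes(),
                    null, impl));
        }
        return (T) impl.toClass().newInstance();
    }
}

Usage would then be Interface noOp = NoOpFactory.noOpInstance(Interface.class); and calls go straight into empty method bodies, with no interception layer at all.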
C# has syntax for declaring and using properties. For example, one can declare a simple property, like this:
public int Size { get; set; }
One can also put a bit of logic into the property, like this:
public string SizeHex
{
get
{
return String.Format("{0:X}", Size);
}
set
{
Size = int.Parse(value, NumberStyles.HexNumber);
}
}
Regardless of whether it has logic or not, a property is used in the same way as a field:
int fileSize = myFile.Size;
I'm no stranger to either Java or C# -- I've used both quite a lot and I've always missed having property syntax in Java. I've read in this question that "it's highly unlikely that property support will be added in Java 7 or perhaps ever", but frankly I find it too much work to dig around in discussions, forums, blogs, comments and JSRs to find out why.
So my question is: can anyone sum up why Java isn't likely to get property syntax?
Is it because it's not deemed important enough when compared to other possible improvements?
Are there technical (e.g. JVM-related) limitations?
Is it a matter of politics? (e.g. "I've been coding in Java for 50 years now and I say we don't need no steenkin' properties!")
Is it a case of bikeshedding?
I think it's just Java's general philosophy towards things. Properties are somewhat "magical", and Java's philosophy is to keep the core language as simple as possible and avoid magic like the plague. This enables Java to be a lingua franca that can be understood by just about any programmer. It also makes it very easy to reason about what an arbitrary isolated piece of code is doing, and enables better tool support. The downside is that it makes the language more verbose and less expressive. This is not necessarily the right way or the wrong way to design a language, it's just a tradeoff.
For 10 years or so, Sun resisted any significant changes to the language as hard as they could. In the same period, C# has been through riveting development, adding a host of cool new features with every release.
I think the train left on properties in Java a long time ago. They would have been nice, but we have the JavaBeans specification. Adding properties now would just make the language even more confusing. While the JavaBeans specification is IMO nowhere near as good, it'll have to do. And in the grander scheme of things, I think properties are not really that relevant; the bloat in Java code is caused by other things than getters and setters.
There are far more important things to focus on, such as getting a decent closure standard.
Property syntax in C# is nothing more than syntactic sugar. You don't need it, it's only there as a convenience. The Java people don't like syntactic sugar. That seems to be reason enough for its absence.
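For comparison, a property like the Size example above desugars into ordinary accessor methods (the C# compiler emits get_Size/set_Size under the hood), which is just what we spell out by hand in Java:

public class MyFile {
    private int size;

    // what C#'s 'public int Size { get; set; }' amounts to in Java
    public int getSize() { return size; }
    public void setSize(int size) { this.size = size; }
}

// usage: int fileSize = myFile.getSize();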
Possible arguments, based on nothing more than my uninformed opinion:
The property syntax in C# is an ugly hack, in that it mixes an implementation pattern with the language syntax.
It's not really necessary, as it's fairly trivial.
It would adversely affect anyone paid based on lines of code.
I'd actually like there to be some sort of syntactical sugar for properties, as the whole syntax tends to clutter up code that's conceptually extremely simple. Ruby for one seems to do this without much fuss.
On a side note, I've actually tried to write some medium-sized systems (a few dozen classes) without property access, just because of the reduction in clutter and the size of the codebase. Aside from the unsafe design issues (which I was willing to fudge in that case), this is nearly impossible, as every framework, every library, every everything in Java auto-discovers properties by get and set methods. They are with us until the very end of time, sort of like little syntactical training wheels.
I would say that it reflects the slowness of change in the language. As a previous commenter mentioned, with most IDEs now, it really is not that big of a deal. But there are no JVM specific reasons for it not to be there.
Might be useful to add to Java, but it's probably not as high on the list as closures.
Personally, I find that a decent IDE makes this a moot point. IntelliJ can generate all the getters/setters for me; all I have to do is embed the behavior that you did into the methods. I don't find it to be a deal breaker.
I'll admit that I'm not knowledgeable about C#, so perhaps those who are will overrule me. This is just my opinion.
If I had to guess, I'd say it has less to do with a philosophical objection to syntactic sugar (they added autoboxing, enhanced for loops, static import, etc - all sugar) than with an issue with backwards compatibility. So far at least, the Java folks have tried very hard to design the new language features in such a way that source-level backwards compatibility is preserved (i.e. code written for 1.4 will still compile, and function, without modification in 5 or 6 or beyond).
Suppose they introduce the properties syntax. What, then does the following mean:
myObj.attr = 5;
It would depend on whether you're talking about code written before or after the addition of the properties feature, and possibly on the definition of the class itself.
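A hypothetical sketch of the ambiguity: the statement compiles to a field write against the first class today, but would have to become a silent method call against the second.

class MyObj {
    public int attr; // today: 'myObj.attr = 5' is a plain field write
}

class MyObjWithProperty {
    private int attr;

    // with a properties feature, 'myObj.attr = 5' on this class would
    // have to compile into a call to something like this setter
    public void setAttr(int value) {
        if (value < 0) throw new IllegalArgumentException("attr must be >= 0");
        this.attr = value;
    }
}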
I'm not saying these issues couldn't be resolved, but I'm skeptical they could be resolved in a way that led to a clean, unambiguous syntax, while preserving source compatibility with previous versions.
The python folks may be able to get away with breaking old code, but that's not Java's way...
According to Volume 2 of Core Java (Forgotten the authors, but it's a very popular book), the language designers thought it was a poor idea to hide a method call behind field access syntax, and so left it out.
It's the same reason that they don't change anything else in Java - backwards-compatibility.
- Is it because it's not deemed important enough when compared to other possible improvements?
That's my guess.
- Are there technical (e.g. JVM-related) limitations?
No
- Is it a matter of politics? (e.g. "I've been coding in Java for 50 years now and I say: we don't need no steenkin' properties!")
Most likely.
- Is it a case of bikeshedding?
Uh?
One of the main goals of Java was to keep the language simple.
From Wikipedia:
Java suppresses several features [...] for classes in order to simplify the language and to prevent possible errors and anti-pattern design.
Here are a few little bits of logic that, for me, lead up to not liking properties in a language:
Some programming structures get used because they are there, even if they support bad programming practices.
Setters imply mutable objects. Something to use sparingly.
In good OO design, you ask an object to perform some business logic. Properties imply that you are asking it for data and manipulating the data yourself.
Although you CAN override the methods in setters and getters, few ever do; also a final public variable is EXACTLY the same as a getter. So if you don't have mutable objects, it's kind of a moot point.
If your variable has business logic associated with it, the logic should GENERALLY be in the class with the variable. If it does not, why in the world is it a variable??? It should be "data" and live in a data structure so it can be manipulated by generic code.
I believe Jon Skeet pointed out that C# has a new method for handling this kind of data, Data that should be compile-time typed but should not really be variables, but being that my world has very little interaction with the C# world, I'll just take his word that it's pretty cool.
Also, I fully accept that depending on your style and the code you interact with, you just HAVE to have a set/get situation every now and then. I still average one setter/getter every class or two, but not enough to make me feel that a new programming structure is justified.
And note that I have very different requirements for work and for home programming. For work where my code must interact with the code of 20 other people I believe the more structured and explicit, the better. At home Groovy/Ruby is fine, and properties would be great, etc.
You don't need the "get" and "set" prefixes; to make it look more like properties, you can do it like this:
public class Person {
    private String firstName = "";
    private Integer age = 0;

    public String firstName() { return firstName; }          // getter
    public void firstName(String val) { firstName = val; }   // setter

    public Integer age() { return age; }          // getter
    public void age(Integer val) { age = val; }   // setter

    public static void main(String[] args) {
        Person p = new Person();
        // set
        p.firstName("Lemuel");
        p.age(40);
        // get
        System.out.println(String.format("I'm %s, %d years old",
                p.firstName(),
                p.age()));
    }
}