This may be a stupid question, but well, I love to learn :)
Say I got the following code :
public String method(<T> a)
{
String dataA = a.getB().getC().getD();
}
At what point does it become worthwhile to define a map that caches our requests and holds this:
Map<<T>, String> m;
m.put(a, dataA);
and then of course,
[SNIP the usual tests of nullity and adding the object if it is missing and so forth plus the refreshing issues]
return m.get(a);
Let me stress that the successive gets are NOT costly (no things such as DB calls or JNDI lookups).
It's just that it's clearer if we define a dictionary rather than read the whole string of "gets".
I consider that making a get call is NEARLY "free" in CPU time. Again, I suppose that retrieving the data from a HashMap is NOT exactly free but nearly (at least, in my case, it is :) ).
My question is really in terms of readability, not performance.
Thanks !
To increase readability (and decrease dependencies), you should define an accessor in A, such as
public String getDataA() {
return getB().getC().getD();
}
Then your calling code becomes
String dataA = a.getDataA();
You may say that you would need too many such shortcut methods in A, cluttering its interface. That is actually a sign of a class design issue. Either A has grown too big and complex (in which case it may be better to partition it into more than one class), or the code needing all these far-away pieces of data actually belongs somewhere else, say in B or C, rather than in A's client.
A couple of things to consider:
Apache Beanutils has a lot of utilities for this sort of thing: http://commons.apache.org/beanutils/
java.util.Properties, if the values are all strings
If you really want to access things like this you can also look at using Groovy instead. All lookups on maps in Groovy can be done with '.' notation and it also supports a "safe" accessor which will check for nulls.
MVEL is another option:
String d = (String) MVEL.eval("b.?c.?d", a);
I will say that data dictionaries lead to type-safety issues. There's no guarantee that everyone puts the right types in the right data elements, or even defines all the required elements.
When using an expression language as above, there are also type-safety issues, as there's no compile-time check on the actual expression to make sure that a) it makes sense, and b) it returns the right type.
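For comparison, a null-safe chain can also be written in plain Java (8 or later) with java.util.Optional, which keeps the compile-time checks that an expression string gives up. This is only a sketch and not something the answers above propose; the classes A, B and C are hypothetical stand-ins for the asker's types.

```java
import java.util.Optional;

// Hypothetical stand-ins for the asker's object graph.
class C {
    private final String d;
    C(String d) { this.d = d; }
    String getD() { return d; }
}

class B {
    private final C c;
    B(C c) { this.c = c; }
    C getC() { return c; }
}

class A {
    private final B b;
    A(B b) { this.b = b; }
    B getB() { return b; }
}

class SafeAccess {
    // Returns d, or null if any link in the chain (including a itself) is null.
    // Each step is checked by the compiler, unlike an expression string.
    static String dataOf(A a) {
        return Optional.ofNullable(a)
                .map(A::getB)
                .map(B::getC)
                .map(C::getD)
                .orElse(null);
    }
}
```

A typo such as C::getE would fail at compile time here, whereas a misspelled property in an expression string only fails when it is evaluated.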
I'm currently moving from C++ to Java for work and am having difficulty without const and pointers to make sure intent is always clear. One of the largest problems I'm having is with returning a modified object of the same type.
Take for example a filter function. It's used to filter out values.
public List<Integer> filter(List<Integer> values) {
...
}
Here everything is Serializable so we could copy the whole list first then modify the contents and return it. Seems a little pointlessly inefficient though. Especially if that list is large. Also copying your inputs every time looks quite clumsy.
We could pass it in normally, modify it and make it clear that we are doing that from the name:
public void modifyInputListWithFilter(List<Integer> values) {
...
}
This is the cleanest approach I can think of - you can copy it beforehand if you need to, otherwise just pass it in. However, I would still rather not modify input parameters.
We could make the List a member variable of the class we are in and add the filter method to the current class. Chances are though that now our class is doing more than one thing.
We could consider moving the List to its own class, which filter is a function of. It seems a little excessive for one variable though; we'll quickly have more classes than we can keep track of. Also, if we only use this strategy and more than just filtering happens to the List, the class will unavoidably start doing more than one thing.
So what is the best way of writing this and why?
The short answer is that there is not a single best way. Different scenarios will call for different approaches. Different design patterns will call for different approaches.
You've suggested two approaches, and either one of them might be valid depending on the scenario.
I will say that there is nothing inherently wrong with modifying a List that you pass into a function: take a look at the Collections.sort() function, for example. There is also nothing wrong with returning a copy of the List instead.
The only "rule" here is a Liskov rule (not the Liskov rule): your function should do whatever its documentation says it will do. If you're going to modify the List, make sure your documentation says so. If you aren't, make sure it says that instead.
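As a rough sketch of what "document it either way" can look like, here are both variants side by side. The class name Filters and the "keep even values" rule are made up for illustration; the question does not say what the filter actually does.

```java
import java.util.ArrayList;
import java.util.List;

class Filters {
    /**
     * Returns a new list holding only the even values.
     * The input list is not modified.
     */
    static List<Integer> evens(List<Integer> values) {
        List<Integer> result = new ArrayList<>();
        for (Integer v : values) {
            if (v % 2 == 0) {
                result.add(v);
            }
        }
        return result;
    }

    /**
     * Removes every odd value from the given list, in place.
     * The list must support removal (e.g. an ArrayList).
     */
    static void retainEvens(List<Integer> values) {
        values.removeIf(v -> v % 2 != 0);
    }
}
```

The caller picks the copying variant when the original list is still needed, and the in-place one when it is not; in both cases the Javadoc states what happens to the argument.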
My understanding of string constants in Java is that one should define a string constant when the same string is used in multiple places. This helps reduce typos, reduces the effort of future changes to the string, etc.
But what about strings that are used in a single place? Should we declare a string constant even in that case?
For example, logging some counter (random example):
CounterLogger.addCounter("Method.Requested" , 1)
Is there an advantage to declaring a constant rather than using a raw string?
Does the compiler do any optimization?
Declaring constants can improve your code because they can be more descriptive. In your example
CounterLogger.addCounter("Method.Requested" , 1)
The method parameter "Method.Requested" is quite self-describing, but the 1 is not; making it a constant would make this example more readable.
CounterLogger.addCounter("Method.Requested" , INITIAL_VALUE)
The way I see it, Strings can be used in one of two ways:
As properties / keys / enumerations - or in other words, as an internal representation of other objects/states of your application, where one code component writes them, and another one reads them.
In UI - for GUI / console / logging display purposes.
I think it's easy to see how in both cases it's important to avoid hard-coding.
The first kind of strings must (if possible) be stored as constants and exposed to whichever program component that might use them for input/output.
Displayed Strings (like in your Logger case) are strings that you might change somewhere in the future. Having them all stored as static final fields in a constants-dedicated class can make later modifications much easier, and help avoid duplicates of similar messages.
Regarding the optimization question - as others have already answered, I believe there's no significant difference.
Presumably, you'll want to write a unit test for whichever method contains that line of code. That unit test will need access to that String value. If you don't use a constant, you'll have the String repeated twice, and if you have to change it in the future, you'll have to change it in both places.
So best to use a constant, even though the compiler is not going to do any helpful optimisations.
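A minimal sketch of that idea follows. The CounterLogger here is a made-up in-memory stand-in (the question does not show the real one), and Service and METHOD_REQUESTED are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the asker's CounterLogger.
class CounterLogger {
    private static final Map<String, Integer> COUNTERS = new HashMap<>();

    static void addCounter(String name, int delta) {
        COUNTERS.merge(name, delta, Integer::sum);
    }

    static int get(String name) {
        return COUNTERS.getOrDefault(name, 0);
    }
}

class Service {
    // Package-visible so a test in the same package can refer to it.
    static final String METHOD_REQUESTED = "Method.Requested";

    void handle() {
        CounterLogger.addCounter(METHOD_REQUESTED, 1);
    }
}
```

A test can then check CounterLogger.get(Service.METHOD_REQUESTED) after calling handle(), so the literal exists in exactly one place.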
In my view, your case is fine as it is. If you can't see any advantage in declaring it as a constant, don't do it. To support this point, take a look at Spring's JdbcTemplate (I have no doubt that Spring code is a good example to follow): it is full of String literals like these
Assert.notNull(psc, "PreparedStatementCreator must not be null");
Assert.notNull(action, "Callback object must not be null");
throw getExceptionTranslator().translate("StatementCallback", getSql(action), ex);
but only two constants
private static final String RETURN_RESULT_SET_PREFIX = "#result-set-";
private static final String RETURN_UPDATE_COUNT_PREFIX = "#update-count-";
Interestingly, this line
Assert.notNull(sql, "SQL must not be null");
is repeated 5 times in the code; nevertheless, the authors chose not to make it a constant
Consider the two following situations:
int Foo1(int a, int b, int c, int d) {
return a + b + c + d;
}
versus
int Foo2(MyArgs args) {
return args.getA() + args.getB() + args.getC() + args.getD();
}
Is there a speed advantage to either case? I have read that the JIT will inline getters. So is there more overhead in passing multiple objects versus a single object to a function, or is this optimized away too?
I am specifically looking for answers about speed only. I am writing some code to recursively search a very large tree, so this type of call will be used many, many times. My function must return as quickly as possible before a timeout occurs. I would like to search as far as possible into the tree.
If the speed is essentially the same (i.e. JIT makes both functions essentially equivalent), then I can choose based on readability and maintainability.
BTW, is there a magic number of parameters in Java such that staying below it is okay but going over it is bad? For instance, I have worked on machines where, if you have 4 or fewer parameters, they are stored in registers, whereas more than 4 get pushed onto the stack.
BTW, I am still a Java novice...
Also, before you answer "Premature", I understand premature optimization. I am now in the optimization phase. I have written my recursive function that I am trying to optimize, TIA.
You seem to be in the design phase, not in the optimization one. A method signature is the last thing you would ever check when optimizing.
The method signature is decided entirely by what it needs to process. If those numbers are related, it does make sense to have a wrapper object for them and find a meaningful name for that wrapper class in your domain. The main advantage is that when you'll need to add some extra fields to that object (or remove some), you won't need to touch your method signatures. Moreover, you will most likely find other usages for that wrapper object, since it's something that has a meaning in your domain.
In terms of performance, there's an overhead of creating a holder object, so if those numbers don't naturally belong together, it's a bad idea to define a container for them. But if you know they don't belong together, and yet your method needs them all, then you will sort this out in the design phase (by simply listing them individually in the method signature).
Long story short, if you end up modifying method signatures in the "optimization phase", that's a strong signal that you've skipped the design phase.
I've been looking at a lot of code recently (for my own benefit, as I'm still learning to program), and I've noticed a number of Java projects (from what appear to be well respected programmers) wherein they use some sort of immediate down-casting.
I actually have multiple examples, but here's one that I pulled straight from the code:
public Set<Coordinates> neighboringCoordinates() {
HashSet<Coordinates> neighbors = new HashSet<Coordinates>();
neighbors.add(getNorthWest());
neighbors.add(getNorth());
neighbors.add(getNorthEast());
neighbors.add(getWest());
neighbors.add(getEast());
neighbors.add(getSouthEast());
neighbors.add(getSouth());
neighbors.add(getSouthWest());
return neighbors;
}
And from the same project, here's another (perhaps more concise) example:
private Set<Coordinates> liveCellCoordinates = new HashSet<Coordinates>();
In the first example, you can see that the method has a return type of Set<Coordinates> - however, that specific method will always only return a HashSet - and no other type of Set.
In the second example, liveCellCoordinates is initially defined as a Set<Coordinates>, but is immediately turned into a HashSet.
And it's not just this single, specific project - I've found this to be the case in multiple projects.
I am curious as to what the logic behind this is. Are there code conventions that consider this good practice? Does it make the program faster or more efficient somehow? What benefit would it have?
When you are designing a method signature, it is usually better to only pin down what needs to be pinned down. In the first example, by specifying only that the method returns a Set (instead of a HashSet specifically), the implementer is free to change the implementation if it turns out that a HashSet is not the right data structure. If the method had been declared to return a HashSet, then all code that depended on the object being specifically a HashSet instead of the more general Set type would also need to be revised.
A realistic example would be if it was decided that neighboringCoordinates() needed to return a thread-safe Set object. As written, this would be very simple to do—replace the last line of the method with:
return Collections.synchronizedSet(neighbors);
As it turns out, the Set object returned by synchronizedSet() is not assignment-compatible with HashSet. Good thing the method was declared to return a Set!
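To make that concrete, here is a small self-contained sketch. It uses String elements and a made-up class name (Neighbors) instead of the project's Coordinates type, which is not shown in the question.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

class Neighbors {
    // Declared to return Set, so the concrete type can change freely.
    static Set<String> neighboringNames() {
        HashSet<String> neighbors = new HashSet<>();
        neighbors.add("north");
        neighbors.add("south");
        // The wrapper implements Set but is not a HashSet.
        return Collections.synchronizedSet(neighbors);
    }
}
```

Here Neighbors.neighboringNames() instanceof HashSet evaluates to false, so a caller that had stored the result in a HashSet variable would have broken; callers that only rely on Set keep working.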
A similar consideration applies to the second case. Code in the class that uses liveCellCoordinates shouldn't need to know anything more than that it is a Set. (In fact, in the first example, I would have expected to see:
Set<Coordinates> neighbors = new HashSet<Coordinates>();
at the top of the method.)
Because now if they change the type in the future, any code depending on neighboringCoordinates does not have to be updated.
Let's say you had:
HashSet<Coordinates> c = neighboringCoordinates();
Now, let's say they change their code to use a different implementation of set. Guess what, you have to change your code too.
But, if you have:
Set<Coordinates> c = neighboringCoordinates()
As long as their collection still implements Set, they can change whatever they want internally without affecting your code.
Basically, it's just being the least specific possible (within reason) for the sake of hiding internal details. Your code only cares that it can access the collection as a set. It doesn't care what specific type of set it is, if that makes sense. Thus, why couple your code to a HashSet?
In the first example, that the method will always only return a HashSet is an implementation detail that users of the class should not have to know. This frees the developer to use a different implementation if it is desirable.
The design principle in play here is "always prefer specifying abstract types".
Set is abstract; there is no such concrete class Set - it's an interface, which is by definition abstract. The method's contract is to return a Set - it's up to the developer to choose what kind of Set to return.
You should do this with fields as well, e.g.:
private List<String> names = new ArrayList<String>();
not
private ArrayList<String> names = new ArrayList<String>();
Later, you may want to change to using a LinkedList - specifying the abstract type allows you to do this with no code changes (except for the initialization of course).
The question is how you want to use the variable. e.g. is it in your context important that it is a HashSet? If not, you should say what you need, and this is just a Set.
Things would be different if you used e.g. TreeSet here. Then you would lose the information that the Set is sorted, and if your algorithm relies on this property, changing the implementation to HashSet would be a disaster. In this case the best solution would be to write SortedSet<Coordinates> set = new TreeSet<Coordinates>();. Or imagine you wrote List<String> list = new LinkedList<String>();: That's ok if you want to use list just as a list, but you wouldn't be able to use the LinkedList as a deque any longer, as methods like offerFirst or peekLast are not on the List interface.
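A short sketch of "as specific as needed": the parameter types below state exactly which guarantees the code relies on. The class and method names are invented for illustration.

```java
import java.util.Deque;
import java.util.SortedSet;

class SpecificEnough {
    // Needs ordering, so it asks for SortedSet rather than plain Set.
    static int smallest(SortedSet<Integer> values) {
        return values.first();
    }

    // Needs double-ended access, so it asks for Deque rather than List.
    static String pushFrontThenPeekLast(Deque<String> deque, String head) {
        deque.offerFirst(head);
        return deque.peekLast();
    }
}
```

A TreeSet can be passed to smallest and a LinkedList to pushFrontThenPeekLast, but a HashSet or a List-typed reference would be rejected by the compiler, which is the point.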
So the general rule is: Be as general as possible, but as specific as needed. Ask yourself what you really need. Does a certain interface provide all functionality and promises you need? If yes, then use it. Else be more specific, use another interface or the class itself as type.
Here is another reason. It's because more general (abstract) types have fewer behaviors which is good because there is less room to mess up.
For example, let's say you implemented a method like this: List<User> users = getUsers(); when in fact you could have used a more abstract type like this: Collection<User> users = getUsers();. Now Bob might assume wrongly that your method returns users in alphabetic order and create a bug. Had you used Collection, there wouldn't have been such confusion.
It's quite simple.
In your example, the method returns Set. From an API designer's point of view this has one significant advantage, compared to returning HashSet.
If at some point the programmer decides to use SuperPerformantSetForDirections, then he can do it without changing the public API, provided the new class implements Set.
The trick is "code to the interface".
The reason for this is that in 99.9% of the cases you just want the behavior from HashSet/TreeSet/WhateverSet that conforms to the Set-interface implemented by all of them. It keeps your code simpler, and the only reason you actually need to say HashSet is to specify what behavior the Set you need has.
As you may know, HashSet is relatively fast but returns items in seemingly random order. TreeSet is a bit slower, but returns items in sorted order (alphabetical for strings). Your code does not care, as long as it behaves like a Set.
This gives simpler code, easier to work with.
Note that the typical choice for a Set is HashSet, for a Map HashMap, and for a List ArrayList. If you use a non-typical (for you) implementation, there should be a good reason for it (like needing the sorted order) and that reason should be put in a comment next to the new statement. That makes life easier for future maintainers.
I have two Java classes that are very similar in semantics but differ in syntax. The differences are minor, like -
Changes in variable names,
Changes in position of some statements (with no dependent lines in between),
Extra imports, etc.
I need to compare these two classes to prove that they are indeed semantically identical. The same needs to be done for a large number of java file pairs.
The first approach of reading from the two files and comparing the lines, with logic to deal with the differences mentioned above seems inefficient. Is there some other way that I can achieve this task? Any helpful APIs out there?
Compile both of the classes without debug information and then decompile them back to source files. The decompiled files should be a lot more similar than the original source files.
You can improve this further by running some optimizations on the compiled files. For example, you can use ProGuard with just shrinking enabled to remove unused code.
Changes in position of some statements can be hard to detect though.
If you want to examine the changes in the code try Araxis Merge or WinMerge.
But if you want logical differences, I am afraid you might have to do it manually.
I would advise to use one of these tools to look for textual changes and then look for logical differences.
There are a lot of similarity checkers out there, and so far there is no perfect tool for this. Each has its own advantages and disadvantages. The approaches generally fall into two categories: token-based or tree-based.
Token-based similarity checking is usually done with regular expressions, but other approaches are possible. In one of my projects at university, we developed one using an alignment strategy from the field of bioinformatics. The main disadvantage of this technique shows up when the two sources are not roughly equal in size.
The tree-based approach works more like a compiler, so with some compilation techniques it's possible (well, more or less) to check for this. Its disadvantage is that the comparison complexity is exponential.
Comparing line by line won't work. I think you may need to use a parser. I would suggest that you take a look at ANTLR. It should have a Java grammar where you could put your actions which will do the comparison.
As far as I know there's no way to compare the semantics of two Java classes. Take for example the following two methods:
public String m1(String a, int b) { ... }
and
public String m2(String x, int y) { ... }
Apart from changes in variable and method names, their signature is the same: same return type, and same input types. However, this is no guarantee that the two methods are semantically equivalent. For example, m1 could return a string consisting of the first b characters of a, while m2 could return a string consisting of y repetitions of x. As you can see, although only variables and names change, the semantics of the two methods is totally different.
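Filling in the two bodies exactly as described above gives a quick illustration. The wrapper class name is made up, and the methods are static here only to keep the sketch short.

```java
class SameSignature {
    // Returns the first b characters of a.
    static String m1(String a, int b) {
        return a.substring(0, b);
    }

    // Returns y repetitions of x.
    static String m2(String x, int y) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < y; i++) {
            sb.append(x);
        }
        return sb.toString();
    }
}
```

With the same arguments ("abc", 2), m1 yields "ab" while m2 yields "abcabc", even though a signature-level comparison sees no difference.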
I don't see an easy way out for your problem. You can perhaps make some assumption and try the following approach:
assume that the methods names in the two classes are the same
write test cases (for example with JUnit) for all the methods in the first class
run the test cases on the second class
ensure that the second class does not have other (untested) methods (for example using reflection)
This approach gives you an idea about equivalent semantics, but it makes strong assumptions.
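The reflection step in the list above could be sketched roughly as follows. This only compares the public methods each class declares, which is one possible reading of "no other (untested) methods"; the class and method names are hypothetical.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

class SignatureCheck {
    // Collects "returnType name[paramTypes]" for every public method
    // declared directly by the given class (inherited ones are skipped).
    static Set<String> publicSignatures(Class<?> type) {
        Set<String> signatures = new TreeSet<>();
        for (Method m : type.getDeclaredMethods()) {
            if (Modifier.isPublic(m.getModifiers())) {
                signatures.add(m.getReturnType().getName() + " " + m.getName()
                        + Arrays.toString(m.getParameterTypes()));
            }
        }
        return signatures;
    }

    // True if both classes declare the same set of public method signatures.
    static boolean sameApi(Class<?> first, Class<?> second) {
        return publicSignatures(first).equals(publicSignatures(second));
    }
}
```

If sameApi returns false, the second class has methods the shared test suite cannot cover (or lacks some), so passing tests alone would say little about equivalence.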
As a final remark, let me add that specifying the semantics of programs is an interesting and open research topic. Some interesting developments in this area include research on Semantic Web Services. A widely adopted approach to give machine-processable semantics to programs is that of specifying their IOPE: Input and Output types (as in the Java methods above), and their Preconditions and Effects. Preconditions are essentially logical conditions that must hold true for successfully invoking the program, and Effects are formal descriptions of the changes (in the state of the world) caused by the successful execution of the program. Even with IOPE there are a lot of problems ... which I skip in this short description.