Can introducing an intermediate list cause performance overhead?

List<UserData> dataList = new ArrayList<>();
List<UserData> dataList1 = dataRepository.findAllByProcessType(ProcessType.OUT);
List<UserData> dataList2 = dataRepository.findAllByProcessType(ProcessType.CORPORATE_OUT);
dataList.addAll(dataList1);
dataList.addAll(dataList2);
return dataList;
vs
List<UserData> dataList = new ArrayList<>();
dataList.addAll(dataRepository.findAllByProcessType(ProcessType.OUT));
dataList.addAll(dataRepository.findAllByProcessType(ProcessType.CORPORATE_OUT));
return dataList;
Will the first implementation cause any performance overhead (i.e. more garbage or memory allocation than the second one)?
P.S. Yes, it can be optimised to use one round trip to the database, as mentioned by @Tim. But that's not the answer I am looking for. In general, I want to know whether this type of implementation causes overhead, because it helps with debugging.

I'm going to say no, on the basis that I would be very surprised if the two code blocks produce different bytecode.
The first code does not "introduce an intermediate list". All it does is create new variables to reference lists that were created by the dataRepository call. I would expect the compiler to simply optimise those variables out.
Those lists are also created in the second code example, so there's no real difference.
Knowing that the compiler performs these sorts of optimisations frees us as programmers to write code that is well laid-out, clear, and maintainable, whilst still remaining confident that it will perform well.
The other consideration is debugging. In the first code block, it is easy to set breakpoints on the variable declaration lines and inspect the values of the variables. Those simple operations become a pain when the code is written as in the second code block.
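As a minimal sketch of the point that the extra variables only hold references to lists that exist anyway: the `fetch` method below is a hypothetical stand-in for the repository call and is not part of the original code. Both styles build the same merged list, and assigning a list to a variable does not copy it.

```java
import java.util.ArrayList;
import java.util.List;

class ReferenceDemo {
    // Hypothetical stand-in for dataRepository.findAllByProcessType(...).
    static List<String> fetch(String type) {
        List<String> result = new ArrayList<>();
        result.add(type + "-1");
        result.add(type + "-2");
        return result;
    }

    // Style 1: name each result before merging.
    static List<String> withLocals() {
        List<String> out = fetch("OUT");
        List<String> corporate = fetch("CORPORATE_OUT");
        List<String> merged = new ArrayList<>();
        merged.addAll(out);
        merged.addAll(corporate);
        return merged;
    }

    // Style 2: pass each result straight to addAll.
    static List<String> inline() {
        List<String> merged = new ArrayList<>();
        merged.addAll(fetch("OUT"));
        merged.addAll(fetch("CORPORATE_OUT"));
        return merged;
    }

    public static void main(String[] args) {
        List<String> original = fetch("OUT");
        List<String> alias = original;          // copies the reference, not the list
        System.out.println(alias == original);  // true
        System.out.println(withLocals().equals(inline())); // true
    }
}
```

In both styles exactly three lists are allocated: the two fetched lists and the merged one.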

As the addAll() method should just be referencing the same data, both of your versions should perform about the same. But, the best thing to do here is to avoid the two unnecessary roundtrips to your database, and just use a single query:
List<ProcessType> types = Arrays.asList(ProcessType.OUT, ProcessType.CORPORATE_OUT);
List<UserData> dataList = dataRepository.findAllByProcessTypeIn(types);

Related

Creating a variable instead of multiple getter usage - which is better for overall performance? [duplicate]

In the following piece of code we make a call listType.getDescription() twice:
for (ListType listType : this.listTypeManager.getSelectableListTypes())
{
    if (listType.getDescription() != null)
    {
        children.add(new SelectItem(listType.getId(), listType.getDescription()));
    }
}
I would tend to refactor the code to use a single variable:
for (ListType listType : this.listTypeManager.getSelectableListTypes())
{
    String description = listType.getDescription();
    if (description != null)
    {
        children.add(new SelectItem(listType.getId(), description));
    }
}
My understanding is the JVM is somehow optimized for the original code and especially nesting calls like children.add(new SelectItem(listType.getId(), listType.getDescription()));.
Comparing the two options, which one is the preferred method and why? That is in terms of memory footprint, performance, readability/ease, and others that don't come to my mind right now.
When does the latter code snippet become more advantageous over the former, that is, is there any (approximate) number of listType.getDescription() calls when using a temp local variable becomes more desirable, as listType.getDescription() always requires some stack operations to store the this object?

I'd nearly always prefer the local variable solution.
Memory footprint
A single local variable costs 4 or 8 bytes. It's a reference and there's no recursion, so let's ignore it.
Performance
If this is a simple getter, the JVM can memoize it itself, so there's no difference. If it's an expensive call which can't be optimized, memoizing manually makes it faster.
Readability
Follow the DRY principle. In your case it hardly matters, as the local variable name is about as long as the method call, but for anything more complicated it helps readability, as you don't have to spot the 10 differences between the two expressions. If you know they're the same, make it clear by using the local variable.
Correctness
Imagine your SelectItem does not accept nulls and your program is multithreaded. The value of listType.getDescription() can change in the meantime and you're toast.
Debugging
Having a local variable containing an interesting value is an advantage.
The only thing to win by omitting the local variable is saving one line. So I'd do it only in cases when it really doesn't matter:
very short expression
no possible concurrent modification
simple private final getter
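The correctness point can be sketched deterministically. The `FlakyItem` class below is hypothetical: its getter returns a different value on the second call, which simulates another thread changing the field between the null check and the use. Reading the value once into a local variable avoids the problem.

```java
class SnapshotDemo {
    // Hypothetical type whose getter result changes between calls,
    // simulating another thread updating the field in the meantime.
    static class FlakyItem {
        private int calls = 0;

        String getDescription() {
            calls++;
            return calls == 1 ? "first" : null;
        }
    }

    // Calls the getter twice: the null check passes, but the second call
    // may return a different value.
    static String readTwice(FlakyItem item) {
        if (item.getDescription() != null) {
            return item.getDescription();
        }
        return "none";
    }

    // Calls the getter once and reuses the snapshot.
    static String readOnce(FlakyItem item) {
        String description = item.getDescription();
        if (description != null) {
            return description;
        }
        return "none";
    }

    public static void main(String[] args) {
        System.out.println(readTwice(new FlakyItem())); // null
        System.out.println(readOnce(new FlakyItem()));  // first
    }
}
```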

I think option two is definitely better, because it improves the readability and maintainability of your code, which is the most important thing here. This kind of micro-optimization won't really help you unless you are writing an application where every millisecond is important.

I'm not sure either is preferred. What I would prefer is clearly readable code over performant code, especially when the performance gain is negligible. In this case I suspect there's next to no noticeable difference (especially given the JVM's optimisations and code-rewriting capabilities).

In the context of imperative languages, the value returned by a function call cannot be memoized (See http://en.m.wikipedia.org/wiki/Memoization) because there is no guarantee that the function has no side effect. Accordingly, your strategy does indeed avoid a function call at the expense of allocating a temporary variable to store a reference to the value returned by the function call.
In addition to being slightly more efficient (which does not really matter unless the function is called many times in a loop), I would opt for your style due to better code readability.

I agree on everything. About the readability I'd like to add something:
I see lots of programmers doing things like:
if (item.getFirst().getSecond().getThird().getForth() == 1 ||
item.getFirst().getSecond().getThird().getForth() == 2 ||
item.getFirst().getSecond().getThird().getForth() == 3)
Or even worse:
item.getFirst().getSecond().getThird().setForth(item2.getFirst().getSecond().getThird().getForth())
If you are calling the same chain of 10 getters several times, please use an intermediate variable. It's just much easier to read and debug.

I would agree with the local variable approach for readability only if the local variable's name is self-documenting. Calling it "description" wouldn't be enough (which description?). Calling it "selectableListTypeDescription" would make it clear. I would throw in that the incremented variable in the for loop should be named "selectableListType" (especially if the "listTypeManager" has accessors for other ListTypes).
The other reason would be if there's no guarantee that this is single-threaded or that your list is immutable.

Java re-initializing java object with new , performance and memory

I want to know the drawback of writing the code below, which uses new to reinitialize the variable each time in order to create an object with a different value.
List<Value> valueList = new ArrayList<>();
Value value = new Value();
value.setData("1");
valueList.add(value);
value = new Value();
value.setData("2");
valueList.add(value);
value = new Value();
value.setData("3");
valueList.add(value);
or a method could be added to return a value object similar to:
private Value getData(String input){
    Value value = new Value();
    value.setData(input);
    return value;
}
List<Value> valueList = new ArrayList<>();
valueList.add(getData("1"));
valueList.add(getData("2"));
valueList.add(getData("3"));
Code-wise, the second approach looks better to me.
Please suggest the best approaches based on memory and performance.

Both options create 3 objects and add them to a list. There is no difference in memory. Performance doesn't matter. If this code is executed often enough to "matter", the JIT will inline those method calls anyway. If the JIT decides it is not important enough to inline, then we are talking about nanoseconds anyway.
Thus: focus on writing clean code that gets the job done in a straightforward way.
From that perspective, I would suggest that you rather have a constructor that takes that data; and then you can write:
List<Value> values = Arrays.asList(new Value("1"), new Value("2"), new Value("3"));
Long story short: performance is a luxury problem. Meaning: you only worry about performance when your tests/customers complain about "things taking too long".
Before that, you worry about creating a good, sound OO design and writing down a correct implementation. It is much easier to fix a certain performance problem within a well-built application than to get "quality" into a code base that was driven by thoughts like those that we find in your question.
Please note: that of course implies that you are aware of typical "performance pitfalls" which should be avoided. So: an experienced Java programmer knows how to implement things in an efficient way.
But you as a newbie: you only focus on writing correct, human readable programs. Keep in mind that your CPU does billions of cycles per second - thus performance is simply OK by default. You only have to worry when you are doing things on very large scale.
Finally: option 2 is in fact "better" - because it reduces the amount of code duplication.
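A minimal sketch of the constructor-based suggestion above. The question never shows the `Value` class, so the definition here is an assumption made only for illustration.

```java
import java.util.Arrays;
import java.util.List;

class ValueDemo {
    // Hypothetical Value class; the original question does not show its definition.
    static final class Value {
        private final String data;

        Value(String data) {
            this.data = data;
        }

        String getData() {
            return data;
        }
    }

    // Three objects are created either way; the constructor just removes the setter calls.
    static List<Value> buildValues() {
        return Arrays.asList(new Value("1"), new Value("2"), new Value("3"));
    }

    public static void main(String[] args) {
        for (Value v : buildValues()) {
            System.out.println(v.getData());
        }
    }
}
```

Note that Arrays.asList returns a fixed-size list; wrap it in a new ArrayList if elements need to be added later.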

In both cases, you create 3 instances of Value that are stored in a List.
There is no significant difference in terms of memory consumed.
The second one nevertheless produces cleaner code: you don't reuse the same variable, and the variable has a limited scope.
You have indeed a factory method that does the job and returns the object for you.
So client code has just to "consume" it.
An alternative is a method with a varargs parameter :
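private List<Value> getData(String... input){
    // check not null before
    List<Value> values = new ArrayList<>();
    for (String s : input){
        Value value = new Value();
        value.setData(s);      // use the current element, not the whole array
        values.add(value);     // collect the new object in the result list
    }
    return values;
}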
List<Value> values = getData("1","2","3");

There is no difference in memory footprint, and there's little difference in performance, because method invocations are very inexpensive.
The second form of your code is a better-looking version of the first form of your code, with less code repetition. Other than that, the two are equivalent.
You can shorten your code by using streams:
List<Value> values = Arrays.asList("1", "2", "3").stream()
    .map(Value::new)
    .collect(Collectors.toList());

Every time you call the new operator to create an object, it allocates space for that object on the heap. It doesn't matter whether you use the 1st approach or the 2nd approach; these objects are allocated on the heap in the same way.
What you need to understand, though, is the life cycle of each object you create, and terms like Dependency, Aggregation, Association and Full Composition.

Java, optimal calling of objects and methods

Let's say I have the following code:
private Rule getRuleFromResult(Fact result){
    Rule output=null;
    for (int i = 0; i < rules.size(); i++) {
        if(rules.get(i).getRuleSize()==1){output=rules.get(i);return output;}
        if(rules.get(i).getResultFact().getFactName().equals(result.getFactName())) output=rules.get(i);
    }
    return output;
}
Is it better to leave it as it is or to change it as follows:
private Rule getRuleFromResult(Fact result){
    Rule output=null;
    Rule current=null;
    for (int i = 0; i < rules.size(); i++) {
        current=rules.get(i);
        if(current.getRuleSize()==1){return current;}
        if(current.getResultFact().getFactName().equals(result.getFactName())) output=current;
    }
    return output;
}
When executing, the program evaluates rules.get(i) each time as if it were the first time, and I think that in a much more advanced example (say, as in the second if) this takes more time and slows execution. Am I right?
Edit: To answer a few comments at once: I know that in this particular example the time gain will be super tiny, but it was just to get the general idea. I noticed I tend to have very long lines like object.get.set.change.compareTo... etc., and many of them repeat. In the scope of the whole code, that time gain can be significant.

Your instinct is correct--saving intermediate results in a variable rather than re-invoking a method multiple times is faster. Often the performance difference will be too small to measure, but there's an even better reason to do this--clarity. By saving the value into a variable, you make it clear that you are intending to use the same value everywhere; if you re-invoke the method multiple times, it's unclear if you are doing so because you are expecting it to return different results on different invocations. (For instance, list.size() will return a different result if you've added items to list in between calls.) Additionally, using an intermediate variable gives you an opportunity to name the value, which can make the intention of the code clearer.

The only difference between the two versions is that the first may call rules.get(i) more than once per iteration when the rule size is different from one.
So the second version is a little bit faster in general, but you will not notice any difference if the list is not big.

It depends on what kind of data structure the "rules" object is. If it is a list with slow indexed access (such as a LinkedList), then yes, the second one is faster, as it does not need to search for element i again on every rules.get(i). If it is a data type that gives you rules.get(i) immediately (like an array or an ArrayList), then the two are about the same.

In general, yes, it's probably a tiny bit faster (nanoseconds, I guess) the first few times it is called. Later on, the JIT compiler will probably optimise it either way.
But what you are doing is so-called premature optimization. You usually should not think about things that only provide an insignificant performance improvement.
What is more important is readability, so that the code can be maintained later on.
You could do even more premature optimization, like saving the length in a local variable, which is done by the for-each loop internally. But again, in 99% of cases it doesn't make sense to do it.
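For comparison, here is a sketch of the same lookup written with a for-each loop, which removes the index and the repeated rules.get(i) calls altogether. The `Rule` and `Fact` classes are hypothetical minimal versions of the question's types, and the sketch assumes getFactName() has no side effects, so it can be read once before the loop.

```java
import java.util.ArrayList;
import java.util.List;

class RuleLookup {
    // Hypothetical minimal versions of the question's types.
    static class Fact {
        private final String name;

        Fact(String name) {
            this.name = name;
        }

        String getFactName() {
            return name;
        }
    }

    static class Rule {
        private final int size;
        private final Fact result;

        Rule(int size, Fact result) {
            this.size = size;
            this.result = result;
        }

        int getRuleSize() {
            return size;
        }

        Fact getResultFact() {
            return result;
        }
    }

    private final List<Rule> rules = new ArrayList<>();

    void add(Rule rule) {
        rules.add(rule);
    }

    Rule getRuleFromResult(Fact result) {
        String wanted = result.getFactName();   // read once, outside the loop
        Rule output = null;
        for (Rule current : rules) {            // no index, no repeated rules.get(i)
            if (current.getRuleSize() == 1) {
                return current;
            }
            if (current.getResultFact().getFactName().equals(wanted)) {
                output = current;
            }
        }
        return output;
    }
}
```

The main gain is readability: each iteration has exactly one name for the current rule.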

Java software design - Looping, object creation VS modifying variables. Memory, performance & reliability comparison

Let's say we are trying to build a document scanner class in java that takes 1 input argument, the log path(eg. C:\document\text1.txt). Which of the following implementations would you prefer based on performance/memory/modularity?
ArrayList<String> fileListArray = new ArrayList<String>();
fileListArray.add("C:\\document\\text1.txt");
fileListArray.add("C:\\document\\text2.txt");
.
.
.
//Implementation A
for(int i = 0, j = fileListArray.size(); i < j; i++){
    MyDocumentScanner ds = new MyDocumentScanner(fileListArray.get(i));
    ds.scanDocument();
    ds.resultOutput();
}
//Implementation B
MyDocumentScanner ds = new MyDocumentScanner();
for(int i = 0, j = fileListArray.size(); i < j; i++){
    ds.setDocPath(fileListArray.get(i));
    ds.scanDocument();
    ds.resultOutput();
}
Personally I would prefer A due to its encapsulation, but it seems like more memory usage due to creation of multiple instances. I'm curious if there is an answer to this, or it is another "that depends on the situation/circumstances" dilemma?

Although this is obviously opinion-based, I will try to answer with my opinion.
Your approach A is far better. Your document scanner obviously handles a file. That should be set at construction time and be saved in an instance field, so every method can refer to this field. Moreover, the constructor can do some checks on the file reference (null check, existence, ...).
Your approach B has two very serious disadvantages:
After constructing a document scanner, clients could easily call all of the methods. If no file was set before, you must handle that "illegal state" with maybe an IllegalStateException. Thus, this approach increases code and complexity of that class.
There seems to be a series of method calls that a client should or can perform. It's easy to call the file setting method again in the middle of such a series with a completely other file, breaking the whole scan facility. To avoid this, your setter (for the file) should remember whether a file was already set. And that nearly automatically leads to approach A.
Regarding the creation of objects: Modern JVMs are really very fast at creating objects. Usually, there is no measurable performance overhead for that. The processing time (here: the scan) usually is much higher.
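A minimal sketch of approach A as described above: the path is set once in the constructor, validated there, and stored in a final field. The class name and the placeholder scan logic are assumptions for illustration only; the question's MyDocumentScanner is not shown.

```java
import java.util.Objects;

class DocumentScannerSketch {
    // The path is fixed at construction time, so no method can ever see an unset path.
    private final String docPath;

    DocumentScannerSketch(String docPath) {
        this.docPath = Objects.requireNonNull(docPath, "docPath must not be null");
        if (docPath.isEmpty()) {
            throw new IllegalArgumentException("docPath must not be empty");
        }
    }

    String getDocPath() {
        return docPath;
    }

    // Placeholder for the real scan; here it only reports which file it would read.
    String scanDocument() {
        return "scanned " + docPath;
    }
}
```

Because there is no setter, the "illegal state" checks that approach B needs in every method disappear.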

If you don't need multiple instances of DocumentScanner to co-exist, I see no point in creating a new instance in each iteration of the loop. It just creates work to the garbage collector, which has to free each of those instances.
If the length of the array is small, it doesn't make much difference which implementation you choose, but for large arrays, implementation B is more efficient, both in terms of memory (fewer instances created that the GC hasn't freed yet) and CPU (less work for the GC).

Are you implementing DocumentScanner or using an existing class?
If the latter, and it was designed for being able to parse multiple documents in a row, you can just reuse the object as in variant B.
However, if you are designing DocumentScanner, I would recommend to design it such that it handles a single document and does not even have a setDocPath method. This leads to less mutable state in that class and thus makes its design much easier. Also using an instance of the class becomes less error-prone.
As for performance, there won't be a measurable difference unless instantiating a DocumentScanner is doing a lot of work (like instantiating many other objects, too). Instantiating and freeing objects in Java is pretty cheap if they are used only for a short time due to the generational garbage collector.

Someone breaks my sorting of a list - now which approach to choose: return unmodifiable List or a new List altogether?

I'm using lists on a jsf web server to e.g. access the data model from web pages. The access to these lists is done from various other places as well though (web services, tools).
There is a piece of code which gets broken by someone resorting the list I return. With someone I am talking about someone in my developer team - we are the only ones using this code. I have roughly 300 references on this function and it could be performance relevant to do the fix nicely:
The list can be anywhere from 1 - 10'000 entries and commonly I will have maybe 10-100 such lists. In reality I will probably have very often around 20 lists with each having 8 entries - so not such a big deal. But I can have more sometimes
I am btw talking about a function something like this:
public List<MyObject> getMyObjectList() {
    if (this.myObjects == null) {
        myObjects = new ArrayList<MyObject>(myObjectsMap.values());
    }
    return myObjects;
}
Now I could of course do the return like this:
public List<MyObject> getMyObjectList() {
    if (this.myObjects == null) {
        myObjects = new ArrayList<MyObject>(myObjectsMap.values());
    }
    return Collections.unmodifiableList(myObjects);
}
But this will break, eventually at several places in different projects / applications.
It would imho be the cleanest to return unmodifiable, add javadoc - and fix everything which breaks. But :-D this is work. I'd probably have to test roughly 10 applications.
On the other hand I could just return a new list, e.g.
public List<MyObject> getMyObjectList() {
    return new ArrayList<MyObject>(myObjectsMap.values());
}
Which is little to no work, but what about the performance issues with this? Other than that, if someone is deleting stuff from the list I returned, it will silently break the application.
So:
What is the performance issue? Is it an issue?
What would you do?

What would you do?
If I understand you correctly, this is a production library used in several applications. And, like it or not, the de-facto contract for getMyObjectList() is that the user can sort the list without getting an error or exception.
I would immediately change this method and return a defensive copy:
// good idea
public List<MyObject> getMyObjectList() {
    return new ArrayList<MyObject>(myObjectsMap.values());
}
You have now fixed the problem where someone is sorting your internal collection and you have not broken your contract. In fact you can even update the Javadoc and tell the user that they can do whatever they want with the copy.
This may or may not cause a performance problem. Remember, the objects in the collection are not being copied - they are still being shared. You are just creating a new array list and whatever internal objects it needs to keep track of the objects.
If it turns out that these copies are causing a performance problem, then you can consider enhancing your class to include a read-only cache of your internal collection. To access this you must give the method a new name, e.g. getMySharedObjectList, and you can gradually update client code to use this new method as performance needs require.
But don't do it this way. I think this method is particularly bad:
// bad idea
public List<MyObject> getMyObjectList() {
    if (this.myObjects == null) {
        myObjects = new ArrayList<MyObject>(myObjectsMap.values());
    }
    return Collections.unmodifiableList(myObjects);
}
You have created a situation where it is very easy for myObjects to get out of sync with myObjectsMap. (What happens when an item is added to myObjectsMap after someone called getMyObjectList?) At the same time, you are making a copy of the list every time someone calls the method. So you just gave up whatever theoretical performance gains you had in the first place.
Anyway, good luck. Hope this helps.
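The defensive-copy behaviour described in this answer can be sketched as follows. The `MyObject` class and the map contents are hypothetical stand-ins. The sketch shows that a caller sorting its copy leaves the internal order alone, and that the elements themselves are shared, not duplicated.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class DefensiveCopyDemo {
    // Hypothetical stand-in for the question's MyObject.
    static class MyObject {
        final String name;

        MyObject(String name) {
            this.name = name;
        }
    }

    private final Map<String, MyObject> myObjectsMap = new LinkedHashMap<>();

    DefensiveCopyDemo() {
        myObjectsMap.put("b", new MyObject("b"));
        myObjectsMap.put("a", new MyObject("a"));
    }

    // A fresh list on every call; only references are copied, not the objects.
    List<MyObject> getMyObjectList() {
        return new ArrayList<>(myObjectsMap.values());
    }

    public static void main(String[] args) {
        DefensiveCopyDemo holder = new DefensiveCopyDemo();
        List<MyObject> copy = holder.getMyObjectList();
        copy.sort(Comparator.comparing((MyObject o) -> o.name)); // caller sorts its own copy

        List<MyObject> fresh = holder.getMyObjectList();
        System.out.println(fresh.get(0).name);            // b (internal order unchanged)
        System.out.println(copy.get(0) == fresh.get(1));  // true (same element object)
    }
}
```

The cost per call is one new ArrayList plus its backing array of references, which for lists of the sizes mentioned in the question is small.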

If you can afford to test the applications, I'd go with the unmodifiableList. It will save you from other related issues in the future.
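As a sketch of what callers would see with the unmodifiable variant: any attempt to sort the returned list fails immediately with an UnsupportedOperationException instead of silently reordering the internal data. The class and its contents below are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class UnmodifiableDemo {
    private final List<String> myObjects = new ArrayList<>();

    UnmodifiableDemo() {
        myObjects.add("b");
        myObjects.add("a");
    }

    // Read-only view over the internal list; no copy is made.
    List<String> getMyObjectList() {
        return Collections.unmodifiableList(myObjects);
    }

    // Returns true if sorting the given list is rejected.
    static boolean sortIsRejected(List<String> list) {
        try {
            Collections.sort(list);
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        UnmodifiableDemo holder = new UnmodifiableDemo();
        System.out.println(sortIsRejected(holder.getMyObjectList())); // true
    }
}
```

This is why the change has to be tested in every calling application: code that used to sort the list in place will now throw at runtime.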
