Fetching customer orders: Set<Order> getAllOrders() vs. Set<Integer> getAllOrders() - java

I haven’t done much Java programming, and hence a lot of unanswered ORM questions revolve in my head that probably seem fairly straightforward to more seasoned folks.
Let's say we have two classes: Customer and Order. The Customer class implements a method for retrieving all of the customer's orders; what should that method’s signature be?
Set<Order> getAllOrders(); // the OOP way
Set<Integer> getAllOrders(); // db-friendly way, based on the assumption that each order is assigned a unique int id
int[] getAllOrders(); // db-friendly way, optimised
Set<OrderID> getAllOrders(); // the OOP way, optimised
Other? A combination of the above through method overloading?
My thinking is to dismiss the 3rd option right away, since the optimisation is premature and uncalled for, and is likely to cause more trouble than good.
Choosing between the 1st and the 2nd options, the main argument seems to be that in many scenarios returning a collection of orders instead of ids is going to be overkill.
The 1st option would always require pre-loading of orders into memory, and although some order properties can be handled through lazy loading, the Order objects themselves would still require significantly more memory than a set of bare ids. The other reason seems to be that lazy loading won’t give me the advantage of using the final modifier to enforce immutability of Order fields.
However, option 2 doesn’t really encapsulate order numbering, i.e. if it were decided to start using a String UUID or a long as the order identifier instead of an integer, a serious rewrite would become necessary. Obviously this could be mitigated by introducing a new lightweight object, OrderId (the 4th approach), as sketched below.
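For illustration, one possible shape for such an OrderId wrapper (my sketch, not part of the original option list):

public final class OrderId {
    private final int value;

    public OrderId(int value) { this.value = value; }

    public int intValue() { return value; }

    @Override
    public boolean equals(Object o) {
        return o instanceof OrderId && ((OrderId) o).value == value;
    }

    @Override
    public int hashCode() { return Integer.hashCode(value); }
}

Switching the underlying identifier to a long or a UUID later would then touch only this class instead of every call site.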
Well, at this point I'd really appreciate some help from ORM and Java gurus!

It always helps to know what the caller wants. It isn't an optimization to return an order id if, every time, the caller is then going to have to look up every single order. On the other hand, if most of the time they don't need all that information, then there isn't any point in returning it.
I would go for option 1 most of the time. But if it's something memory-constrained like a smartphone, I think I'd go for the lazy expansion.

One of the hardest things for people that are new to ORM tools is that they try to apply the knowledge they have of the DB and the optimizations that were used in that context (storing IDs, etc.). I would suggest that you work with objects, and address performance problems if they come up.
Most ORMs handle this type of concern for you; Hibernate for example will by default lazy-load a collection (or Orders in your case). Internally it will store a collection of the ID values for you, but this is really not your concern as you will only interact with the object and collection of associated objects. If you focus on your domain objects and think "OO", your ORM tool should support you and allow you to further optimize later if it becomes necessary.
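As a rough sketch of what such a mapping looks like with JPA/Hibernate annotations (the names are illustrative, and Order is assumed to be an entity with a customer field):

import java.util.HashSet;
import java.util.Set;
import javax.persistence.*;

@Entity
public class Customer {
    @Id
    private Integer id;

    // LAZY is the default fetch type for @OneToMany: no Order rows are
    // loaded until the collection is first touched.
    @OneToMany(mappedBy = "customer", fetch = FetchType.LAZY)
    private Set<Order> orders = new HashSet<>();

    public Set<Order> getAllOrders() {
        return orders;
    }
}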

I suggest using option one, because you are dealing with business objects rather than messing with numbers. If you need the id, you can still obtain it from an order. And if you use a smart O/R mapper, only the primary key of an order will be loaded until you access some real data - so there is no point worrying about database or memory performance.

I have to agree with Paul Tomblin: if the function is called getAllOrders(), I would expect it to return Order objects (assuming such a class exists). The function should be called getAllOrderIds() if it's going to return something more specific.

Option #2 is, like option #3, a premature optimization. Your client wants to know what the orders are; it may very well not even care what the orders' IDs are. There are circumstances where a client might want the IDs, but if you see that happening, you should probably ask Why? and find a way to operate directly on the Orders themselves.

I'll echo Paul Tomblin in saying that the question is not, "What's the best thing to do?" but "What's the best way to accomplish this objective?" Before you can say what the return value of a function should be, you have to decide what the caller is going to do with the information. If what the caller wants to do is, say, display a list of Order IDs for the user, then return Order IDs. If the caller wants to display a list of order dates, total cost, and shipment statuses, then I'd return a set of objects containing that data. Etc.


List iteration, which is the most efficient?

I am working on a game and have several customers stored in an ArrayList, and each customer has their own unique ID which is saved as a variable inside the object.
If I want to retrieve a customer from the list using their ID, which of these would be better practice?
Iterate through all customers in the list until I find a match.
Convert the list to a HashMap of keys (the ID) and values (the customers), and then simply use the .get() method. Maybe this does exactly the same as option one?
HashMap will be more efficient (O(1) in most cases, O(n) in the worst case); iterating over a list is O(n).
Of course it depends on the size of your data: if you have lots of it, then HashMap is the obvious choice; if you have only a few items (e.g. 5, maybe 10), a List might well be more efficient. Constant factors have to be taken into consideration here, which Big-Oh notation ignores.
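A minimal sketch of the map-based lookup (customers, someId, and the Customer.getId() accessor are assumed):

import java.util.HashMap;
import java.util.Map;

// Build the index once, O(n)...
Map<Integer, Customer> byId = new HashMap<>();
for (Customer c : customers) {
    byId.put(c.getId(), c);
}

// ...then each lookup is O(1) on average.
Customer found = byId.get(someId);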
As written by others, collections having "hash" in their name typically provide improved performance properties for certain kind of actions.
But please keep in mind the very old rule about not doing premature optimization. Seriously: unless we are talking about a "real production setup" with 10,000+ paying customers, you should focus much more on good design and on creating readable, maintainable code than on potential performance issues.
Yes, one should avoid outright stupid designs; but the thing is: if you focus too much and too early on performance, you risk two things:
a) missing the real bottleneck. If you really encounter performance issues, you have to profile your application to understand where the time is spent. Far too often people assume that their problem is X; then they spend a lot of time fixing X, only to find out later that the Just-in-Time compiler actually handles X well enough, and that their real problem is some other Y they never thought of.
b) painting yourself into a corner. The thing is: good designs need to be able to evolve and change. If you assume you must do this or that to preserve some holy performance grail... chances are that this impacts your overall application in a negative way.
I know your question was specific, but I'm going to propose another solution.
Build your own set, backed by an array of Customers. When a Customer is inserted, the Customer's ID is set to the index it receives when placed into the array.
That way you have direct access to all your customers, using their ID.
A HashMap, or any sorted list, can find the right item faster than a linear scan. A sorted list can pick an index in the middle; if the value it's looking for is higher or lower, it uses that information to pick a new index in the lower or upper half and repeats the same step as before. This reduces the time needed to find the item; it's called binary search (thx fabian).
You can try it and compare the results!
EDIT:
I am with Jägermeister on this one: you should wait to tune performance, and treat it as a final step.

OOP: Which class should own a method? [closed]

I’m having trouble understanding how classes relate to their methods. Is a method something that the object does, or something that’s done to it? Or is this a different concept entirely?
Specifically, in a library’s software system, should the borrow() method belong to the class representing the library patron, or the class representing the item that the patron is borrowing? My intuition is that it should read like patron.borrow(copy), like English sentence structure, subject.verb(object); but my instructor says that’s Wrong, and I don’t understand why he would have borrow() belong to the Copy class (and he doesn’t really explain things too well). I’m not looking for justification, but can someone just explain the proper relationship?
Edit: This question was closed as “off topic”. I don’t understand. Are software design questions not appropriate for this site?
subjective :) but honestly, I'd go with the Information Expert Pattern and say something like
library.lend(item, patron)
The library contains the information about the items it has (perhaps in its catalog).
The library lends the item to the patron (which it knows because it registers them)
Not sure how your instructor sees this, but this is the level of 'abstraction' (software objects mimicking real world entities) that would make sense for your scenario.
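A rough Java sketch of that shape (Item and Patron are assumed classes; the bookkeeping is deliberately minimal):

import java.util.HashMap;
import java.util.Map;

public class Library {
    // The library registers its items and patrons, so it owns the
    // record of who has borrowed what.
    private final Map<Item, Patron> loans = new HashMap<>();

    public void lend(Item item, Patron patron) {
        loans.put(item, patron);
    }
}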
You should not confuse the idea of OOP with one specific incarnation like Java or C++.
This limitation ("methods are a property of the object") is not part of the OOP idea, but just of some implementations, and as you discovered it doesn't scale well.
How many methods should an "integer number" object have? What is more logical... myfile.write(myint) or myint.write(myfile)? There is really no good general answer to this. The idea of a method being part of a single object is a special case, and sometimes the bending needed to fit the problem to this solution can become noticeable, or even close to a showstopper. The answer is really acceptable only when a method has no parameters except the object being processed: single dispatch is a perfect answer only when there is a single type involved.
In other languages you have a separation between objects and methods, so for example you have the file object, the integer object and a method write(myfile, myint) that describes what to do when the operation is needed... and this method is neither part of the file nor of the integer.
Some generic words first.
Software construction is not something which should be governed by English language rules or "beauty" or whatever, it's engineering discipline. Think of whether your design solves the problem, whether it will be maintainable, whether it will be testable, whether it will be possible to parallelize development and so on. If you want something more formalized take a look at the "On the Criteria To Be Used in Decomposing Systems into Modules" by D. L. Parnas.
As for your library example: imagine you have a Copy outside of the library; should it have a borrow method then? How is the borrowing registered? Are you OK with either the Copy or Patron classes being responsible for data storage? It looks more appropriate to put borrow into a Library class. Responsibilities will be clearly divided: you wouldn't need to know much about borrowing to implement Copy and Patron, and you wouldn't need many details about them to implement Library.
Public methods exposed from a class are the tasks that can be performed on the entity.
That way the class would only encapsulate its behavior.
For example:
if I say
Computer.TurnOn()
the method will only work on the computer system.
If instead I say
SomeOne.TurnonComputer()
then SomeOne now has the responsibility to turn on the computer (set the related properties of the computer), which means we are not respecting encapsulation and are scattering the class's properties all over the place.
As @Ryan Fernandes said, the lend/borrow operation cannot live with either the patron or the book. It has to be with some class that knows about the status of all the books and patrons of the library. For example: are there pending reservations against a book? How many copies are available? Has this patron paid all the fees? Is he eligible for this book? So typically this should be in a Library or a LibraryService class.
The point of OOP is to create polymorphic functions that, in each implementation, deal with a defined set of data which obey specific invariants.
It follows that a method which alters an object should be defined in the class of that object. It matters less where code that is purely functional lives, but it should probably live on the type of its input (if it takes a single input) or on its output.
In your example, if borrow alters data in copy, then it should live there. If, however, you model the loan status of a book by it being held in a particular collection (either in a patron, or in a collection for the library), it would make more sense to put borrow on the holder classes. That latter design, however, runs the risk that a copy could be in more than one collection, so you would want to put some information (and a corresponding method) on the copy as well.
I'm not sure of the exact justification, but you can think of it this way: if multiple patients go to visit a doctor, it is the doctor who knows when to call in the next patient, so a next() method would be part of the Doctor's responsibility, even though it's tempting to think it belongs to the Patient, since the patient is the one who goes in next. In the same way, when a library book is to be issued, it should be the responsibility of the book (the resource) rather than the patron, because the book knows when it will be free.
Is a method something that the object does, or something that’s done to it? Or is this a different concept entirely?
Let me clear up something about classes and objects first. A class generally denotes a category:
Cars, not Ferrari or Porsche
Fruits, not Banana or Apple
So it's a Ferrari that is driven, and a banana that is eaten - not their class.
It's always an object that has properties and behavior.
Even going to your case specifically.
The borrow() method is an action/behavior performed by a person object on a book object, whose records are kept by yet another object, the library system itself.
A good way to represent this in an OO way, for me, would be something like:
library.borrow(new Book("book title"), new Person("starx"));
Just for fun, what do you think about this:
Person starx = new Person("starx");
Book title1 = new Book("title1");
Library library = new Library("libraryname");
library.addBook(title1);
if (starx.request(title1, library)) {
    starx.take(library.lend(title1, starx));
}
I guess it can go either way. There is no hard and fast rule for it. The idea is to group functions logically, in a way that makes sense. To me, Patron#borrow(BookCopy) makes as much sense as BookCopy#borrow(Patron). Or you may have a class LibManager with borrow(BookCopy, Patron).
Your instructor's right. Well, actually, he's wrong. I don't know.
My point is, for questions such as this, there are often no firm general answers one way or another. It largely comes down to what works best in your particular case. Go with whatever's easiest to code - it'll be the easiest to maintain. And, by "easiest to code", I suggest also taking into account the intended users of the classes (beyond just your Library, Copy and Person classes).
I was thinking about precisely that today. I came to this conclusion:
Whichever makes more sense in the appropriate context.

Empirical data on the effects of immutability?

In class today, my professor was discussing how to structure a class. The course primarily uses Java and I have more Java experience than the teacher (he comes from a C++ background), so I mentioned that in Java one should favor immutability. My professor asked me to justify my answer, and I gave the reasons that I've heard from the Java community:
Safety (especially with threading)
Reduced object count
Allows certain optimizations (especially for garbage collector)
The professor challenged my statement by saying that he'd like to see some statistical measurement of these benefits. I cited a wealth of anecdotal evidence, but even as I did so, I realized he was right: as far as I know, there hasn't been an empirical study of whether immutability actually provides the benefits it promises in real-world code. I know it does from experience, but others' experiences may differ.
So, my question is, have there been any statistical studies done on the effects of immutability in real-world code?
I would point to Item 15 in Effective Java. The value of immutability is in the design (and it isn't always appropriate - it is just a good first approximation) and design preferences are rarely argued from a statistical point of view, but we have seen mutable objects (Calendar, Date) that have gone really bad, and serious replacements (JodaTime, JSR-310) have opted for immutability.
The biggest advantage of immutability in Java, in my opinion, is simplicity. It becomes much simpler to reason about the state of an object, if that state cannot change. This is of course even more important in a multi-threaded environment, but even in simple, linear single-threaded programs it can make things far easier to understand.
See this page for more examples.
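As a small illustration of that simplicity, here is a typical immutable value class (a generic sketch, not taken from the linked page):

public final class Point {
    private final int x;
    private final int y;

    public Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    public int x() { return x; }
    public int y() { return y; }

    // "Modification" returns a new instance; the original never changes,
    // so it can be shared freely, even across threads.
    public Point translate(int dx, int dy) {
        return new Point(x + dx, y + dy);
    }
}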
So, my question is, have there been any statistical studies done on the effects of immutability in real-world code?
I'd argue that your professor is just being obtuse -- not necessarily intentionally, or even as a bad thing. It's just that the question is too vague. Two real problems with the question:
"Statistical studies on the effect of [x]" doesn't really mean anything if you don't specify what kind of measurements you're looking for.
"Real-world code" doesn't really mean anything unless you state a specific domain. Real world code includes scientific computing, game development, blog engines, automated proof generators, stored procedures, operating system kernals, etc
For what it's worth, the ability of compilers to optimize immutable objects is well-documented. Off the top of my head:
The Haskell compiler performs deforestation (also called short-cut fusion), where Haskell will transform the expression map f . map g into map (f . g). Since Haskell functions are pure, these expressions are guaranteed to produce equivalent output, but the second one runs roughly twice as fast since we don't need to create an intermediate list.
Common subexpression elimination where we could convert x = foo(12); y = foo(12) to temp = foo(12); x = temp; y = temp; is only possible if the compiler can guarantee foo is a pure function. To my knowledge, the D compiler can perform substitutions like this using the pure and immutable keywords. If I remember correctly, some C and C++ compilers will aggressively optimize calls to these functions marked "pure" (or whatever the equivalent keyword is).
As long as we don't have mutable state, a sufficiently smart compiler can execute linear blocks of code on multiple threads with a guarantee that we won't corrupt the state of variables in another thread.
Regarding concurrency, the pitfalls of concurrency using mutable state are well-documented and don't need to be restated.
Sure, this is all anecdotal evidence, but that's pretty much the best you'll get. The immutable vs mutable debate is largely a pissing match, and you are not going to find a paper making a sweeping generalization like "functional programming is superior to imperative programming".
At most, you'll probably find that you can summarize the benefits of immutable vs mutable in a set of best practices rather than as codified studies and statistics. For example, mutable state is the enemy of multithreaded programming; on the other hand, mutable queues and arrays are often easier to write and more efficient in practice than their immutable variants.
It takes practice, but eventually you learn to use the right tool for the job, rather than shoehorning your favorite pet paradigm into a project.
I think your professor is being overly stubborn (probably deliberately, to push you to a fuller understanding). Really, the benefits of immutability are not so much in what the compiler can do with optimisations, but in that it's much easier for us humans to read and understand. A variable that is guaranteed to be set when the object is created and guaranteed never to change afterwards is much easier to grok and reason about than one which holds this value now but might be set to some other value later.
This is especially true with threading, in that you don't need to worry about processor caches and monitors and all that boilerplate that comes with avoiding concurrent modifications, when the language guarantees that no such modification can possibly occur.
And once you express the benefits of immutability as "the code is easier to follow", it feels a bit sillier to ask for empirical measurements of productivity increases vis-a-vis "easier-to-followness".
On the other hand, the compiler and HotSpot can probably perform certain optimisations based on knowing that a value can never change - like you, I have a feeling that this takes place and is a good thing, but I'm not sure of the details. It's a lot more likely that there will be empirical data for the types of optimisation that can occur, and how much faster the resulting code is.
Don't argue with the prof. You have nothing to gain.
These are open questions, like dynamic vs static typing. We sometimes think functional techniques involving immutable data are better for various reasons, but it's mostly a matter of style so far.
What would you objectively measure? GC and object count could be measured with mutable/immutable versions of the same program (although how typical that would be would be subjective, so this is a pretty weak argument). I can't imagine how you could measure the removal of threading bugs, except maybe anecdotally by comparison with a real world example of a production application plagued by intermittent issues fixed by adding immutability.
Immutability is a good thing for value objects. But how about other things? Imagine an object that creates a statistic:
Stats s = new Stats();
... some loop ...
s.count();
s.end();
s.print();
which should print "Processed 536.21 rows/s". How do you plan to implement count() with an immutable object? Even if you use an immutable value object for the counter itself, s can't be immutable, since it would have to replace the counter object inside itself. The only way out would be:
s = s.count();
which means copying the state of s for every round of the loop. While this can be done, it surely isn't as efficient as incrementing an internal counter.
Moreover, most people would fail to use this API correctly, because they would expect count() to modify the state of the object instead of returning a new one. So in this case, immutability would create more bugs.
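For concreteness, the immutable variant being criticized would look something like this (my sketch):

final class Stats {
    private final long count;
    private final long startNanos;

    Stats(long count, long startNanos) {
        this.count = count;
        this.startNanos = startNanos;
    }

    // Returns a new Stats instead of incrementing in place; callers must
    // remember to write s = s.count(), which is easy to get wrong.
    Stats count() {
        return new Stats(count + 1, startNanos);
    }
}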
As other comments have claimed, it would be very, very hard to collect statistics on the merits of immutable objects, because it would be virtually impossible to find control cases - pairs of software applications which are alike in every way, except that one uses immutable objects and the other does not. (In nearly every case, I would claim that one version of that software was written some time after the other, and learned numerous lessons from the first, and so improvements in performance will have many causes.) Any experienced programmer who thinks about this for a moment ought to realize this. I think your professor is trying to deflect your suggestion.
Meanwhile, it is very easy to make cogent arguments in favor of immutability, at least in Java, and probably in C# and other OO languages. As Yishai states, Effective Java makes this argument well. So does the copy of Java Concurrency in Practice sitting on my bookshelf.
Immutable objects allow code which wants to share an object's value to do so by sharing a reference. Mutable objects, likewise, allow code which wants to share an object's identity to do so by sharing a reference. Both kinds of sharing are essential in most applications. If one doesn't have immutable objects available, it's possible to share values by copying them into either new objects or objects supplied by the intended recipient of those values. Getting by without mutable objects is much harder. One could somewhat "fake" mutable objects by saying stateOfUniverse = stateOfUniverse.withSomeChange(...), but that requires that nothing else modify stateOfUniverse while its withSomeChange method is running [precluding any sort of multi-threading]. Further, if one were e.g. trying to track a fleet of trucks, and part of the code was interested in one particular truck, it would be necessary for that code to look that truck up in a table of trucks any time it might have changed.
A better approach is to subdivide the universe into entities and values. Entities would have changeable characteristics but an immutable identity, so a storage location of e.g. type Truck could continue to identify the same truck even as the truck itself changes position, loads and unloads cargo, etc. Values would generally not have a particular identity, but would have immutable characteristics. A Truck might store its location as type WorldCoordinate. A WorldCoordinate that represents 45.6789012N 98.7654321W would continue to do so as long as any reference to it exists; if a truck at that location moved north slightly, it would create a new WorldCoordinate to represent 45.6789013N 98.7654321W, abandon the old one, and store a reference to the new one.
It is generally easiest to reason about code when everything encapsulates either an immutable value or an immutable identity, and when the things which are supposed to have an immutable identity are mutable. If one didn't want to use any mutable objects outside a variable stateOfUniverse, updating a truck's position would require something like:
ImmutableMapping<Integer,Truck> trucks = stateOfUniverse.getTrucks();
Truck myTruck = trucks.get(myTruckId);
myTruck = myTruck.withLocation(newLocation);
trucks = trucks.withItem(myTruckId,myTruck);
stateOfUniverse = stateOfUniverse.withTrucks(trucks);
but reasoning about that code would be more difficult than would be:
myTruck.setLocation(newLocation);

array of structures, or structure of arrays?

Hmmm. I have a table which is an array of structures I need to store in Java. The naive don't-worry-about-memory approach says do this:
public class Record {
    private final int field1;
    private final int field2;
    private final long field3;
    /* constructor & accessors here */
}
List<Record> records = new ArrayList<Record>();
If I end up using a large number (> 10^6) of records, where individual records are accessed occasionally, one at a time, how would I figure out how the preceding approach (an ArrayList) compares with an optimized approach for storage costs:
public class OptimizedRecordStore {
    private final int[] field1;
    private final int[] field2;
    private final long[] field3;

    Record getRecord(int i) {
        return new Record(field1[i], field2[i], field3[i]);
    }
    /* constructor and other accessors & methods */
}
edit:
assume the # of records is something that is changed infrequently or never
I'm probably not going to use the OptimizedRecordStore approach, but I want to understand the storage cost issue so I can make that decision with confidence.
obviously if I add/change the # of records in the OptimizedRecordStore approach above, I either have to replace the whole object with a new one, or remove the "final" keyword.
kd304 brings up a good point that was in the back of my mind. In other situations similar to this, I need column access on the records, e.g. if field1 and field2 are "time" and "position", and it's important for me to get those values as an array for use with MATLAB, so I can graph/analyze them efficiently.
The answers that give the general "optimise when you have to" advice are unhelpful in this case because, IMHO, programmers should always be aware of the performance implications of different design choices when a choice leads to an order-of-magnitude performance penalty, particularly API writers.
The original question is quite valid and I would tend to agree that the second approach is better, given his particular situation. I've written image processing code where each pixel requires a data structure, a situation not too dissimilar to this, except I needed frequent random access to each pixel. The overhead of creating one object for each pixel was enormous.
The second version is much, much worse. Instead of resizing one array, you're resizing three arrays when you do an insert or delete. What's more, the second version will lead to the creation of many more temporary objects and it will do so on accesses. That could lead to a lot of garbage (from a GC point of view). Not good.
Generally speaking, you should worry about how you use the objects long before you think about performance. So you have a record with three fields, or three arrays. Which one more accurately depicts what you're modeling? By this I mean: when you insert or delete an item, are you operating on just one of the three arrays, or on all three as a block?
I suspect it's the latter in which case the former makes far more sense.
If you're really concerned about insertion/deletion performance then perhaps a different data structure is appropriate, perhaps a SortedSet or a Map or SortedMap.
If you have millions of records, the second approach has several advantages:
Memory usage: the first approach uses more memory because a) every Java object in heap has a header (containing class id, lock state etc.); b) objects are aligned in memory; c) each reference to an object costs 4 bytes (on 64-bit JVMs with Compressed OOPs or 32-bit JVMs) or 8 bytes (64-bit JVMs without Compressed OOPs). See e. g. CompressedOops for more details. So the first approach takes about two times more memory (more precisely: according to my benchmark, an object with 16 bytes of payload + a reference to it took 28 bytes on 32-bit Java 7, 36 bytes on 64-bit Java 7 with compressed OOPs, and 40 bytes on 64-bit Java 7 w/o compressed OOPs).
Garbage collection: although the second approach seems to create many objects (one on each call of getRecord), it might not be so, as modern server JVMs (e. g. Oracle's Java 7) can apply escape analysis and stack allocation to avoid heap allocation of temporary objects in some cases; anyway, GCing short-lived objects is cheap. On the other hand, it is probably easier for the garbage collector if there are not millions of long-lived objects (as there are in the first approach) whose reachability to check (or at least, such objects may make your application need more careful tuning of GC generation sizes). Thus the second approach may be better for GC performance. However, to see whether it makes a difference in the real situation, one should make a benchmark oneself.
Serialization speed: the speed of (de)serializing a large array of primitives on disk is only limited by HDD speed; serializing many small objects is inevitably slower (especially if you use Java's default serialization).
Therefore I have used the second approach quite often for very large collections. But of course, if you have enough memory and don't care about serialization, the first approach is simpler.
How are you going to access the data? If the accesses over the fields are always coupled, then use the first option; if you are going to process the fields on their own, then the second option is better.
See this article in wikipedia: Parallel Array
A good example of when it's more convenient to have separate arrays is a simulation where the numerical data is packed together in the same array, while other attributes, like name, colour, etc., which are accessed only for presentation of the data, are kept in another array.
I was curious so I actually ran a benchmark. If you don't re-create the object like you are[1], then SoA beats AoS by 5-100% depending on workload[2]. See my code here:
https://gist.github.com/twolfe18/8168262c5420c7a62d39
[1] I didn't add that because if you are concerned enough about speed to consider this refactor, it would be silly to do that.
[2] This also doesn't account for re-allocation, but again, this is often something you can either amortize away or know statically. This is a reasonable assumption for a pure-speed benchmark.
Notice that the second approach might have negative impact on caching behaviour. If you want to access a single record at a time, you'd better have that record not scattered all across the place.
Also, the only memory you win in the second approach is (possibly) due to member alignment and to not having to allocate a separate object per record.
Otherwise, they have exactly the same memory use, asymptotically. The first option is much better due to locality, IMO.
Whenever I have tried doing number crunching in Java, I have always had to revert to C-style coding (i.e. close to your option 2). It minimised the number of objects floating around in your system, as instead of 1,000,000 objects, you only have 3. I was able to do a bit of FFT analysis of real-time sound data using the C-style, and it was far too slow using objects.
I'd choose the first method (array of structures) unless you access the store relatively infrequently and are running into serious memory pressure issues.
First version basically stores the objects in their "natural" form (+1 BTW for using immutable records). This uses a little more memory because of the per-object overhead (probably around 8-16 bytes depending on your JVM) but is very good for accessing and returning objects in a convenient and human-understandable form in one simple step.
Second version uses less memory overall, but the allocation of a new object on every "get" is a pretty ugly solution that will not perform well if accesses are frequent.
Some other possibilities to consider:
An interesting "extreme" variant would be to take the second version but write your algorithms / access methods to interact with the underlying arrays directly. This is clearly going to result in complex inter-dependencies and some ugly code, but would probably give you the absolute best performance if you really needed it. It's quite common to use this approach for intensive graphics applications such as manipulating a large array of 3D coordinates.
A "hybrid" option would be to store the underlying data in a structure of arrays as in the second version, but cache the accessed objects in a HashMap so that you only generate the object the first time a particular index is accessed. Might make sense if only a small fraction of objects are ever likely to accessed, but all data is needed "just in case".
(Not a direct answer, but one that I think should be given)
From your comment,
"cletus -- I greatly respect your thoughts and opinions, but you gave me the high-level programming & software design viewpoint which is not what I'm looking for. I cannot learn to ignore optimization until I can get an intuitive sense for the cost of different implementation styles, and/or the ability to estimate those costs. – Jason S Jul 14 '09 at 14:27"
You should always ignore optimization until it presents itself as a problem. Most important is to have the system be usable by a developer (so they can make it usable by a user). There are very few times that you should concern yourself with optimization, in fact in ~20 years of professional coding I have cared about optimization a total of two times:
Writing a program that had its primary purpose to be faster than another product
Writing a smartphone app with the intention of reducing the amount of data sent between the client and server
In the first case I wrote some code, then ran it through a profiler; when I wanted to do something and I was not sure which approach was best (for speed/memory), I would code it one way and see the result in the profiler, then code it the other way and see the result. Then I would choose the faster of the two. This works, and you learn a lot about low-level decisions. I did not, however, allow it to impact the higher-level classes.
In the second case, there was no programming involved, but I did the same basic thing of looking at the data being sent and figuring out how to reduce the number of messages being sent as well as the number of bytes being sent.
If your code is clear then it will be easier to speed up once you find out it is slow. As Cletus said in his answer, you are resizing one time -vs- three times... one time will be faster than three. From a higher point of view the one time is simpler to understand than the three times, thus it is more likely to be correct.
Personally I'd rather get the right answer slowly than the wrong answer quickly. Once I know how to get the right answer, then I can find out where the system is slow and replace those parts of it with faster implementations.
Because you are making the int[] fields final, you are stuck with just the one initialization of the array, and that is it. Thus, if you wanted 10^6 field1 values, Java would need to reserve that much memory for each of those int[] up front, because you cannot resize those arrays. With an ArrayList, if you do not know the number of records beforehand and will potentially be removing records, you save a lot of space upfront, and later on as well when you remove records.
I would go for the ArrayList version too, so I don't need to worry about growing it. Do you need column-like access to the values? What is the scenario behind your question?
Edit You could also use a common long[][] matrix.
I don't know how you pass the columns to MATLAB, but I guess you don't gain much speed with column-based storage; more likely you lose speed in the Java computation.

Downsides to immutable objects in Java? [closed]

The advantages of immutable objects in Java seem clear:
consistent state
automatic thread safety
simplicity
You can favour immutability by using private final fields and constructor injection.
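For example, a class built that way might look like this (a generic sketch; the name and fields are illustrative):

import java.math.BigDecimal;

public final class Account {
    private final String owner;
    private final BigDecimal balance;

    // All state is injected up front through the constructor and can
    // never change afterwards.
    public Account(String owner, BigDecimal balance) {
        this.owner = owner;
        this.balance = balance;
    }

    public String owner() { return owner; }
    public BigDecimal balance() { return balance; }
}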
But, what are the downsides to favouring immutable objects in Java?
i.e.
incompatibility with ORM or web presentation tools?
Inflexible design?
Implementation complexities?
Is it possible to design a large-scale system (deep object graph) that predominately uses immutable objects?
But, what are the downsides to favouring immutable objects in Java? Incompatibility with ORM or web presentation tools?
Reflection-based frameworks are complicated by immutable objects, since they require constructor injection:
there are no default arguments in Java, which forces us to ALWAYS provide all of the necessary dependencies
constructor overloading can be messy
constructor argument names are not usually available through reflection, which forces us to depend on argument order for dependency resolution
Implementation complexities?
Creating immutable objects is still a boring task; the compiler should take care of the implementation details, as in Groovy.
Is it possible to design a large-scale system (deep object graph) that predominately uses immutable objects?
definitely yes; immutable objects make great building blocks for other objects (they favor composition), since it's much easier to maintain the invariants of a complex object when you can rely on its immutable components. The only true downside to me is the creation of many temporary objects (e.g. String concatenation was a problem in the past).
With immutability, any time you need to modify data, you need to create a new object. This can be expensive.
Imagine needing to modify one bit in an object that consumes several megabytes of memory: you would need to instantiate a whole new object, allocate memory, etc. If you need to do this many times, mutability becomes very attractive.
If you go for mutability, then you will find that whenever you need to call a method that you don't want the object changed by, or you need to return an object that is part of the internal state, you need to make a defensive copy (a sketch follows the list below).
If you really look at programs that make use of mutable objects, you will find that they are prone to "attack" by modifying:
objects passed to constructors
objects passed to methods
objects returned from methods.
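A minimal sketch of such a defensive copy (the Customer/Order names are illustrative):

import java.util.ArrayList;
import java.util.List;

class Customer {
    private final List<Order> orders = new ArrayList<>();

    public List<Order> getOrders() {
        // Copy on the way out, so callers cannot mutate our internal state.
        return new ArrayList<>(orders);
    }
}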
The issue doesn't show up very often because most programs don't change the data (they are in reality immutable by virtue of them never changing).
I personally make everything I possibly can final. I probably have 90%-95% of all variables (parameters, locals, instance, static, exceptions, etc...) marked as final. There are some cases where something has to be mutable, but in the vast majority of cases it does not.
I think it might depend on your focus. If you are writing libraries for 3rd parties to use you think about this much more than if you are writing an application that only you (or your team) will maintain.
I find that you can write large scale applications using immutable objects for the majority of the system without too much pain.
Fundamentally, in the real world, the state associated with many particular identities will change. If I ask what is "the present position of Joe's Buick", today it might be a location in Seattle, and tomorrow it might be a location in Los Alamos. It would be possible to define and create a GeographicLocation object whose value will always represent the location where Joe's Buick was at some particular moment in time and will never change--if today it represents a spot in Seattle, then it always will. Such an object, however, would have no continuing identity as "the present location of Joe's Buick".
It may also be possible to define things so that there is a VehicleLocation object which is connected to Joe's Buick such that the object always represents "the present location of Joe's Buick". Such an object could retain its identity as "the present location of Joe's Buick", even as the car moves around, but would not represent a constant geographical location. Defining "identity" may be tricky if one considers the scenario where Joe sells his Buick to Bob and buys a Ford--should the object track "the present location of Joe's Ford" or "the present location of Bob's Buick"--but in many cases such issues may be avoided by using a data model that guarantees that some aspects of object identity will never change.
It isn't possible for everything about an object to be immutable. If an object is immutable, then it cannot have an immutable identity that encapsulates anything beyond its current state. If an object is mutable, however, it can have an immutable identity whose meaning transcends its present state. In many situations, having an immutable identity is more useful than having an immutable state, and in such situations mutable objects are nearly essential. While it is possible in some cases to "simulate" mutable objects by searching through the most recent version of an immutable structure to find information that may "change" from one version to the next, such approaches are often extremely inefficient. Even if one could magically receive once per minute a bound book that gave the location of every vehicle everywhere, looking up "Joe's Buick" in the book would take a lot longer than merely asking a "present location of Joe's Buick" object which would always know where the car was.
You pretty much answered your own question. The JavaBean specification, I don't believe, mentions anything about immutability, yet JavaBeans are the bread and butter of many Java frameworks.
The concept of immutable types is somewhat uncommon for people used to imperative programming styles. However, for many situations immutability has serious advantages, you named the most important ones already.
There are good ways to implement immutable balanced trees, queues, stacks, dequeues and other data structures. And in fact many modern programming languages / frameworks only support immutable strings because of their advantages and sometimes also other objects.
With an immutable object, if the value needs to be changed, then it must be replaced with a new instance. Depending on the lifecycle of the object, replacing it with a different instance can potentially increase the tenured (long) garbage collection time. This becomes more critical if the object is kept around in memory long enough to be placed in the tenured generation.
The problem in Java is that one has to live with all those objects where the class looks like:
class Mutable {
    State1 f1;
    MoreState f2;

    void doSomething() {
        // mutate the state, but don't document it
    }

    void doSomethingElse() {
        // mutate the state heavily, do not mention it in the doc
    }
}
(Note the missing Cloneable interface).
The problem with the garbage collector is not such a big one nowadays. The VMs are happy with short-lived objects.
Advances in Compiler/JIT technology will make it possible, sooner or later, to optimize intermediate temporary object creation away. For example:
BigInteger three = ..., two = ..., i1 = ...;
BigInteger i2 = i1.multiply(three).divide(two);
The JIT could notice that the intermediate object i1.multiply(three) can be used for the end result and call a variant of the divide method that works on a mutable accumulator.
See Functional Java to attain a comprehensive answer to your question.
Immutability, like every other design pattern, should only be used when you need it. You give the example of thread safety: in a highly threaded application, you could favor immutability over the added expense of making things thread-safe yourself.
However, if your design requires objects to be mutable, don't go out of your way to make them immutable, just because "it's a design pattern".
As for your graph, you could choose to make your nodes immutable and let another class take care of the connections between them, or you could make a mutable node that takes care of its own children and has an immutable value class.
Probably the biggest cost of using immutable objects in Java is that future developers won't be expecting it or used to that style. Expect to either document heavily or watch a lot of your objects spawn mutable peers over time.
That being said, the only real technical reason I can think of to avoid immutable objects is GC churn. For most applications, I don't think this is a compelling reason to avoid them.
The biggest thing I've ever done with ~90% immutable objects was a toy Scheme-esque interpreter, so it's certainly possible to do complex Java projects this way.
With immutable data you don't set things twice... see Haskell and Scala vals (and Clojure, of course).
For example, for a data structure like a tree: when you perform a write operation on the tree, you are in fact adding elements outside of the immutable tree; after you are done, the tree and the new branch are recombined into a new tree. This way you can perform concurrent reads and writes very safely.
In the traditional model, you must lock a value because it could be reset at any time, so you end up with a very hot zone for threads, since they act sequentially there anyway.
With immutable data, you don't set things more than once; it's a whole new way of programming. You may end up using a little more memory, but parallelizing is natural and painless.
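A tiny Java sketch of that idea (mine, for illustration): "adding" builds a new structure that shares the old one instead of modifying it:

final class Node {
    final int value;
    final Node next;   // shared with older versions, never modified

    Node(int value, Node next) {
        this.value = value;
        this.next = next;
    }

    // Prepending creates exactly one new node; every existing reader
    // still sees its old, unchanged list, so no locking is needed.
    Node prepend(int v) {
        return new Node(v, this);
    }
}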
As with any tool, you have to know when to use it and when not to.
As Tehblanx points out, if you want to change the state of a variable that holds an immutable object, you have to create a new object, which can be expensive, especially if the object is big and complex. That's absolutely true, but it simply means that you have to decide intelligently which objects should be mutable and which should be immutable. If someone is saying that ALL objects should be immutable, well, that's just crazy talk.
I'd tend to say that objects that represent a single logical "fact" should be immutable, while objects that represent multiple facts should be mutable. Like, an Integer or a String should be immutable. A "Customer" object that contains name, address, current amount, date of last purchase, etc should be mutable. Of course I can immediately think of a hundred exceptions to such a general rule. An exception I make all the time is when I have a class that just exists as a wrapper to hold a primitive in some case where a primitive is not legal, like in a collection, but I need to update it constantly.
In Java, a method can't return multiple objects, like return a, b, c. Returning an array of objects makes the code look ugly. In this situation, I have to pass mutable objects to the method and let it change the states of these objects. However, I don't know whether returning multiple objects is a code smell or not.
The answer is none. There are not any good reasons to be mutable.
You do run into problems with lots of frameworks (or framework versions) that require mutable objects in order to work with them (Spring, I am glaring in your direction). As you work with them and fish through the code, you will shake your fist in anger that you need to introduce dirty mutability into an otherwise glorious block of code when it could easily have been avoided.
I'm sure there are limited corner cases (probably more hypothetical than anything) where the overhead of object creation and collection is unacceptable. But I urge the people who would make this argument to look at languages like Scala, where the included collections are immutable by default, and then look at the bevy of performance-critical apps built on top of that concept.
This is of course hyperbole. In reality, you should go with immutability first, see if it causes you any measurable problems, if it does then introduce mutability, but make sure you can prove it solves your problem. Otherwise you've just created liability for no benefit. In doing this I think you'll find objective cases for "Implementation Complexity" and "Inflexibility" very hard to make.
Some implementations of immutable objects have transactional means to update an immutable object, similar to how databases provide safe commits and rollbacks. But, in apparent contrast with many of the answers here, immutable objects are never changed. A typical operation would be:
B = append(A,C)
B is a new object, just like A and C. No modification was made to A or C. Internally, a red-black tree implementation makes such semantics fast enough to be usable.
The downside is that it is not as fast as making the operations in place. But that compares only a single part of the system. When evaluating possible downsides, we need to look at the system as a whole, and I personally don't have a clear picture of the entire impact. Although I suspect immutability wins out in the end.
I know some experts contend there is contention at the top level of the red-black tree, and that this has a negative effect on throughput.
My biggest worry with immutable data structures is how to save/reconstitute them. That is, if a class has final fields, I can't instantiate it and then set its fields.
