There's something I really don't understand: a lot of people complain that Java isn't really OO because you still have access to primitives and primitive arrays. Some people go as far as saying that these should go away...
However, I don't get it. Could you efficiently do things like signal processing (say, writing an FFT for starters), efficient encryption algorithms, or fast image-manipulation libraries in Java if you didn't have access to, say, int[] and long[]?
Should I start writing my Java software by using List<Long> instead of long[]?
If the answer is "simply use higher-level libraries doing what you need" (for example, say, signal processing), then how are these libraries supposed to be written?
I personally use List most of the time, because it gives you a lot of convenience. You can also have concurrent collections, but not concurrent raw arrays.
Almost the only situation in which I use raw arrays is when I'm reading a large chunk of binary data, as in image processing. I'm concerned about instantiating, e.g., Byte objects 100M times, though I have to confess I've never tried working with a Byte list that huge. I did notice that with something like a 100 KB file, List<Byte> works fine.
Most image-processing examples use arrays as well, so in that field it's actually more convenient to use raw arrays.
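To make the concern concrete, here is a minimal sketch (the file name is hypothetical) that reads the same data both ways; the boxed list stores an object reference per element, plus ArrayList overhead, instead of a single byte per element:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class BinaryReadSketch {
    public static void main(String[] args) throws IOException {
        // Raw primitive array: one contiguous block, one byte per element.
        byte[] raw = Files.readAllBytes(Paths.get("image.bin")); // hypothetical file

        // Boxed alternative: the list holds an object reference per element
        // (backed by an Object[]), which is the overhead that worries me at ~100M elements.
        List<Byte> boxed = new ArrayList<>(raw.length);
        for (byte b : raw) {
            boxed.add(b); // auto-boxing byte -> Byte
        }
        System.out.println(raw.length + " bytes vs " + boxed.size() + " boxed elements");
    }
}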
So in conclusion, my practical answer to this is: use wrappers unless
- you are working with a very large array, say length > 10M (I'm too lazy to write a benchmark!);
- you are working in a field where many examples or people prefer raw arrays (e.g. network programming, image processing);
- you have found, by doing experiments, that there is a significant performance gain from switching to raw arrays; or
- for whatever reason it's simply easier for you to work with raw arrays on that problem.
In high-performance computing, raw arrays (of primitives as well as objects) are essential because they map more directly onto the underlying CPU architecture and behave more predictably for things such as cache access and garbage collection. With such techniques, Java is being used very successfully in areas where the received wisdom is that the language is not suitable.
However, if your goal is solely to write code that is highly maintainable and provably self consistent, then the higher level constructs are the obvious way to go. In your direct comparison, the List object hides the issue of memory allocation, growing your list and so on, as well as providing (in various implementations) additional facilities such as particular access patterns like stacks or queues. The use of generics also allows you to carry out refactoring with a far greater level of confidence and the full support of your IDE and toolchain.
An enlightened developer should make the appropriate choice for the use case they are approaching. Suggesting that a language is not "OO enough" because it allows such choices would lead me to suspect that the person either doesn't trust that other developers are as smart as they are or has not had a particularly wide experience of different application domains.
It's a judgment call, really. Lists tend to play better with generic libraries and have stuff like add, contains, etc, while arrays generally are faster and have built-in language support and can be used as varargs. Select whatever you find serves your purpose better.
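A tiny illustration of that trade-off (the names here are made up for the example):

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ListVsArray {
    // Arrays get built-in language support, e.g. as varargs.
    static int sum(int... values) {
        int total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        // Lists give you convenience methods and play nicely with generic libraries.
        List<String> names = new ArrayList<>(Arrays.asList("alice", "bob"));
        names.add("carol");
        System.out.println(names.contains("bob")); // true

        // Arrays are compact and fast, and work directly as varargs.
        System.out.println(sum(1, 2, 3)); // 6
    }
}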
Okay. You need to know the size of an array at the time it is created, and you cannot change its size afterwards. A list, on the other hand, can grow dynamically after it has been created, and it has the add() method to do that.
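For example (a minimal sketch):

import java.util.ArrayList;
import java.util.List;

public class FixedVsGrowable {
    public static void main(String[] args) {
        int[] fixed = new int[3];                   // size must be chosen up front...
        fixed[0] = 42;                              // ...and can never change afterwards

        List<Integer> growable = new ArrayList<>(); // no size needed up front
        growable.add(42);                           // grows as elements are added
        growable.add(7);
        System.out.println(fixed.length + " vs " + growable.size());
    }
}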
Have you gone through these links? They give a nice comparison of arrays vs. lists: "Array or List in Java. Which is faster?" and "List vs. Array".
This may be a silly question, but it's been bugging me for a while. Most programming languages have arrays (e.g. Java, C/C++, C#... OK, Python has lists!), but in much of the literature I see, some data structures (such as stacks and queues) are treated as more basic than arrays. Since so many languages have such great support for arrays, why would anyone use a stack or queue? I realize that conceptually a data structure other than an array may better fit the model, but given that you may have to implement your own stack or queue, it seems like a lot of extra work considering how fundamental arrays are.
One example I'm thinking of is from these notes on the Gale-Shapley algorithm: "Maintain a list of free men (in a stack or queue)."
Wouldn't it be easier to just use an array and trust yourself to only look at the front/end of it?
I guess I'm asking, why would anyone bother using something like a stack or queue when most are implemented using arrays, and arrays are more powerful anyways?
There are several reasons. First of all, it's helpful if your implementation more closely matches the algorithm. That is, if your algorithm uses a stack it's useful if you can say, for example, stack.push(thing) rather than writing:
if (stack_index >= stack_array.length)
{
    // have to extend the stack, e.g. by copying into a larger array
    stack_array = Arrays.copyOf(stack_array, stack_array.length * 2);
}
stack_array[stack_index++] = thing;
And you'd have to write that code every place you push something onto the stack. And you'd have to write multiple lines of code every time you wanted to pop from the stack.
I don't know about you, but I find it incredibly easy to make mistakes when I'm writing code like that.
It seems easier, much clearer, and far more reliable to encapsulate that functionality in a Stack object that you can then use as intended: with calls to push and pop methods.
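A minimal sketch of what such a wrapper might look like (this is an illustrative class, not java.util.Stack):

import java.util.Arrays;

// Illustrative int stack that hides the index bookkeeping and array growth.
public class IntStack {
    private int[] elements = new int[16];
    private int size = 0;

    public void push(int value) {
        if (size == elements.length) {
            // grow the backing array transparently to the caller
            elements = Arrays.copyOf(elements, elements.length * 2);
        }
        elements[size++] = value;
    }

    public int pop() {
        if (size == 0) {
            throw new IllegalStateException("stack is empty");
        }
        return elements[--size];
    }

    public boolean isEmpty() {
        return size == 0;
    }
}

Callers just write stack.push(thing) and stack.pop() and never touch the underlying array directly.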
Another benefit is that when you find yourself having to do a quick thread-safe stack, you can modify your Stack class to put locks around any code that changes the internal structure (i.e. the array), and any callers automatically have a thread-safe stack. If you were to address the array directly in your code, then you'd have to go to every place that you access the underlying array and add your lock statements. I'd give better than even odds that you'd make a mistake in there somewhere and then you'd have an interesting time tracking down that intermittent failure.
Arrays themselves aren't particularly powerful. They're flexible, but they have no smarts. We wrap behavior around arrays to limit what they can do so that we don't have to "trust ourselves" not to do stupid things, and also so that we can do the right things more easily. If I'm implementing an algorithm that uses a queue, then I'm going to be thinking in terms of a queue that has Enqueue and Dequeue operations rather than in terms of a linked list or an array that has head and tail indexes that I have to manage, etc.
Finally, if you write a Stack or Queue or similar data structure implementation, test it, and prove it correct, you can use it over and over again in multiple projects without ever having to look at the code again. It just works. Contrast that with rolling your own with an array: you have to debug it in every project you use a stack in, and you have the potential to screw up every single push or pop.
To sum up, we create data structures rather than use raw arrays:
Because it's easier to think in terms of data structure operations rather than the mechanics of working with an array.
Code re-use: write once, debug, and then use it in multiple projects.
Simplifies code (stack.push(item) rather than multiple lines of array indexing).
Reduces potential for error.
Easier for the next guy to come by and understand what you did. "Oh, he's pushing items onto a stack."
Internally I'd say most of those classes are implemented with the help of arrays. But it would be tedious to use Arrays as Stacks or Queues. Arrays are fixed length things where you cannot insert stuff at arbitrary places. You would have to do much copying around of array elements, enlarging and shrinking the array or keep in mind what your head and tail positions are etc. The Stack and Queue classes do all this for you and you can just use the much more convenient push, pop, etc. methods.
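To see how much bookkeeping those classes hide, here is a minimal sketch of a fixed-capacity queue backed by an array with head and tail indexes (illustrative only; in practice you would just use ArrayDeque):

// Illustrative fixed-capacity queue: the head/tail wrap-around logic is
// exactly the kind of detail the Queue implementations handle for you.
public class IntRingQueue {
    private final int[] elements;
    private int head = 0;   // index of the next element to dequeue
    private int tail = 0;   // index where the next element is enqueued
    private int size = 0;

    public IntRingQueue(int capacity) {
        elements = new int[capacity];
    }

    public void enqueue(int value) {
        if (size == elements.length) {
            throw new IllegalStateException("queue is full");
        }
        elements[tail] = value;
        tail = (tail + 1) % elements.length; // wrap around
        size++;
    }

    public int dequeue() {
        if (size == 0) {
            throw new IllegalStateException("queue is empty");
        }
        int value = elements[head];
        head = (head + 1) % elements.length; // wrap around
        size--;
        return value;
    }
}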
Arrays are just one more type of data structure. They have specific use cases, just like any other.
All data structures have particular properties, e.g.
- fixed vs. variable size
- ordered vs. unordered
- allows duplicates vs. prohibits duplicates
- covariant or not
- can contain primitives or not
- specific time complexity for insertion/removal/retrieval operations
- iteration order
- ...
Whether you choose to use an array or any other data structure depends upon what you are trying to do, and whether that data structure possesses the properties you require.
And it is better to have simple data structures which do one thing well, than to attempt to have an uber data structure which does everything.
You can change the size of stacks and queues much more easily than the size of an array, which is very difficult to resize. If you know how big your array should be, use an array; if you don't, stacks and queues are the better choice.
Admirable that you would trust yourself so much, but when you're creating a project with several other developers (or even by yourself), you can't rely on trust.
Different data structures make it more obvious what the code is doing, it prevents you from doing wrong things (no matter how much you trust yourself), they can provide performance, concurrency or content guarantees and dozens of other things that simple arrays can't do or just aren't the best fit for.
Why would you have pockets when backpacks are invented?
I agree with @Vampire. Arrays provide instant access to any of their elements if you have the index, whereas stacks and queues give you the convenience of LIFO and FIFO orderings, which let you implement many algorithms much more easily. Also, with stacks and queues it's easier to add or remove elements: since an array occupies a fixed amount of consecutive memory, you would have to move a lot of data around and allocate/deallocate memory accordingly.
I'm working on an application that requires lots of objects in memory. One of the largest structures is of the type
Map<String,Set<OwnObject>> (with Set as HashSet)
with OwnObject being a heavyweight object representing records in a database. The application works, but has a rather large memory footprint. Reading this Java Specialists newsletter from 2001, I've analyzed the memory usage of my large structure above. The HashSet uses a HashMap in the back, which in turn is quite a heavyweight object, and I guess this is where most of my additional memory goes.
Trying to optimize the memory usage of the structure, I tried around with multiple versions:
Map<String,List<OwnObject>> (with List as ArrayList)
Map<String,OwnObject[]>
Both work, and both are much leaner than the version using Set<>. However, I'd like to keep the Set contract (uniqueness of entries) in place.
One way would be to implement the logic myself. I could extend ArrayList and ensure the contract in add().
Are there frameworks implementing lightweight collections that honor the Set contract? Or am I missing something in the Java collections that I could use without having to enforce uniqueness myself?
The solution I implemented is the following:
Map<String,OwnObject[]>
Adding to and removing from the array was done using Arrays.binarySearch() and two slice-style System.arraycopy() calls, which gives sorting and uniqueness as a side effect.
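Roughly, the insertion side of that approach looks like the sketch below (illustrative only; it assumes the element type is Comparable, which OwnObject may or may not be):

import java.util.Arrays;

public final class SortedUniqueArrays {
    // Insert 'value' into the sorted array, keeping it sorted and duplicate-free.
    static <T extends Comparable<T>> T[] insert(T[] sorted, T value) {
        int pos = Arrays.binarySearch(sorted, value);
        if (pos >= 0) {
            return sorted;                    // already present: Set contract holds
        }
        int insertAt = -pos - 1;              // binarySearch encodes the insertion point
        T[] result = Arrays.copyOf(sorted, sorted.length + 1);
        // shift the tail one slot to the right to make room
        System.arraycopy(sorted, insertAt, result, insertAt + 1, sorted.length - insertAt);
        result[insertAt] = value;
        return result;
    }
}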
Is there a way to iterate over a Collection and retrieve just a subset of attributes without loading/unloading each full object to the cache? It seems like a waste to load/unload a WHOLE (possibly big) object when I need only some of its attributes, especially if the objects are big. It might cause unnecessary cache conflicts when loading such unnecessary data, right?
By 'load to cache' I mean 'process' that object on the processor. Say there are objects with 10 attributes, and in the iterating loop I only use one of them. In such a scenario, I think it's a waste to load the other 9 attributes from memory into the processor. Isn't there a solution for extracting only the attributes I need without loading the full object?
Also, does something like Google's Guava solve the problem internally?
THANK YOU!
It's not usually the first place to look, but it's certainly not impossible that you're running into cache-sharing problems. If you're really convinced (from realistic profiling or analysis of hardware counters) that this is a bottleneck worth addressing, you might consider altering your data structures to use parallel arrays of primitives (akin to column-based storage in some DB architectures), e.g. one 'column' as a float[], another as a short[], a third as a String[], all indexed by the same identifier. This structure lets you 'query' individual columns without loading into cache any columns that aren't currently needed.
I have some low-level algorithmic code that would really benefit from C's struct. I ran some microbenchmarks on various alternatives and found that parallel arrays were the most effective option for my algorithms (that may or may not apply to your own).
Note that a parallel-array structure will be considerably more complex to maintain and mutate than using Objects in java.util collections. So I'll reiterate - I'd only take this approach after you've convinced yourself that the benefit will be worth the pain.
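For concreteness, here is a minimal sketch of such a parallel-array ("structure of arrays") layout; the column names are made up:

// Structure-of-arrays layout: each 'column' is its own primitive array,
// all indexed by the same record index, so a scan over one column never
// pulls the other columns' data into the cache.
public class RecordColumns {
    final float[] price;
    final short[] quantity;
    final String[] label;

    RecordColumns(int capacity) {
        price = new float[capacity];
        quantity = new short[capacity];
        label = new String[capacity];
    }

    // A query that touches only the price column.
    double totalPrice() {
        double sum = 0;
        for (float p : price) {
            sum += p;
        }
        return sum;
    }
}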
There is no way in Java to manage loading to processor caches, and there is no way to change how the JVM works with objects, so the answer is no.
Java is not a low-level language and hides such details from the programmer.
The JVM will decide how much of the object it loads. It might load the whole object as some kind of read-ahead optimization, or load only the fields you actually access, or analyze the code during JIT compilation and do a combination of both.
Also, how large are the objects you're worried about? I have rarely seen classes with more than a few fields, and I would not consider that big.
Are there any Java libraries for maps and sets that alter their representation strategy based upon the capacity? I have an application where we have many many maps and sets, but most of the time they are small, usually 6 elements or less.
As such we've been able to extract some good memory improvements by writing some specialized maps and sets that just use arrays for small sizes and then default to standard Java Sets and Maps for larger capacities.
However, rolling our own specialized versions of set and maps seems kind of silly if there is already something off the shelf. I've looked at guava and the Apache collections and they do not seem to offer anything like this. Trove sounds like it is more memory efficient than the JDK's collections in general, but it isn't clear if it will attempt to minimize memory usage like this.
You may want to look at Clojure's persistent data structures. Although the "persistent" part may be overkill for you, they do exactly what you are looking for and are still really fast. There is a PersistentArrayMap that is promoted to a PersistentHashMap once the collection exceeds 16 entries.
I'm not aware of any such library.
The problem is that the representations that use the least amount of memory tend to:
be incompatible with the Java Collections APIs which makes integration hard, and
break down the abstraction boundaries; e.g. by adding link fields to element types.
These make it difficult to create a general purpose library along these lines. Then we add the problem that a representation that adapts to minimize heap space usage as the collection grows and shrinks will inevitably create a lot more garbage ... and that will have CPU performance implications.
Your approach is kind of interesting, though it doesn't get you anywhere near minimal memory usage. I assume that your classes are effectively wrappers for the standard implementation classes once the collections get big. If it works for you, I suggest that you stick with it.
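For what it's worth, here is a stripped-down sketch of the kind of adaptive wrapper being discussed; the threshold and names are made up, and a real version would implement the full Map interface:

import java.util.HashMap;
import java.util.Map;

// Stores up to THRESHOLD entries in parallel arrays with linear search,
// then promotes itself to a regular HashMap once it grows past that.
public class SmallMap<K, V> {
    private static final int THRESHOLD = 8;

    private Object[] keys = new Object[THRESHOLD];
    private Object[] values = new Object[THRESHOLD];
    private int size = 0;
    private Map<K, V> big;          // non-null once promoted

    public void put(K key, V value) {
        if (big != null) {
            big.put(key, value);
            return;
        }
        for (int i = 0; i < size; i++) {
            if (keys[i].equals(key)) {
                values[i] = value;  // overwrite existing key
                return;
            }
        }
        if (size == THRESHOLD) {
            promote();
            big.put(key, value);
            return;
        }
        keys[size] = key;
        values[size] = value;
        size++;
    }

    @SuppressWarnings("unchecked")
    public V get(Object key) {
        if (big != null) {
            return big.get(key);
        }
        for (int i = 0; i < size; i++) {
            if (keys[i].equals(key)) {
                return (V) values[i];
            }
        }
        return null;
    }

    @SuppressWarnings("unchecked")
    private void promote() {
        big = new HashMap<>();
        for (int i = 0; i < size; i++) {
            big.put((K) keys[i], (V) values[i]);
        }
        keys = null;   // let the small arrays be collected
        values = null;
    }
}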
I have to write a simple vector/matrix library for a small geometry-related project I'm working on. Here's what I'm wondering:
When doing mathematical operations on vectors in a Java environment, is it better practice to return a new instance of a vector or to modify the state of the original?
I've seen it go back and forth and would just like to get a majority input.
Certain people say that vectors should be immutable and that static methods should be used to create new ones; others say they should be mutable and that normal methods should be used to modify their state. I've also seen cases where the object is immutable and normal methods return a new vector from the object without changing its state, which seems a little off to me.
I would just like to get a feel for whether there is a best practice here. I imagine it's something that's been done a million times, and I'm really just wondering if there's a standard way to do it.
I noticed that the Apache Commons Math library returns a new vector every time from the original.
How important is performance going to be? Is vector arithmetic a large enough component that it affects the performance of the overall system?
If it is not, and there is going to be a lot of concurrency, then immutable vectors will be useful because they reduce concurrency issues.
If there are a lot of mutations on vectors, then the overhead of the new objects that immutable vectors require will become significant, and it may be better to have mutable vectors and handle the concurrency the hard way.
It depends. Generally speaking, immutability is better.
First and foremost, it is automatically threadsafe. It is easier to maintain and test.
That said, sometimes you need speed where creating new instances will take too much time.
(Note: if you're not 100% positive you need that amount of speed, you don't need it. Think high-frequency trading and real-time, math-intensive applications. And even then, you should go simple first and optimize later.)
As for static vs normal methods, following good OOP principles, you shouldn't have static methods. To create new vectors/matrices you can use the constructor.
Next, what's your backing structure? Your best bet is probably a single-dimensional array of doubles for vectors and a multi-dimensional array of doubles for matrices. This at least lets you stay relatively fast by using primitives.
If you get to the point that you need even more performance, you can add modifiers on your Vector/Matrix that can change the backing data. You could even decide that the dimensions are immutable but the contents are mutable which would give you some other safeties as well.
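As a sketch of that kind of design, here is a minimal immutable vector backed by a double[] whose operations return new instances (illustrative only, not a drop-in for a real library class):

import java.util.Arrays;

// Immutable vector: operations return new instances instead of mutating state.
public final class Vec {
    private final double[] components;

    public Vec(double... components) {
        // defensive copy so callers cannot mutate our internal array
        this.components = components.clone();
    }

    public Vec add(Vec other) {
        if (other.components.length != components.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double[] result = new double[components.length];
        for (int i = 0; i < result.length; i++) {
            result[i] = components[i] + other.components[i];
        }
        return new Vec(result);
    }

    public Vec scale(double factor) {
        double[] result = new double[components.length];
        for (int i = 0; i < result.length; i++) {
            result[i] = components[i] * factor;
        }
        return new Vec(result);
    }

    @Override
    public String toString() {
        return Arrays.toString(components);
    }
}

A call like new Vec(1, 2).add(new Vec(3, 4)) leaves both operands untouched, which is what makes the class trivially safe to share between threads.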