Can anyone point me to a website that summarizes the main Java data structures and their time complexity for common operations (add, find, remove)? For example, Hashtables are O(1) for lookup, while LinkedLists are O(n). Some details like memory usage would be nice too.
This would be really helpful when choosing data structures for algorithms.
Is there a reason to think that Java's implementation is different (in terms of complexity) than a generic, language agnostic implementation? In other words, why not just refer to a general reference on the complexity of various data structures:
NIST Dictionary of Algorithms and Data Structures
But, if you insist on Java-specific:
Java standard data structures Big O notation
Java Collections cheatsheet V2 (dead link, but this is the first version of the cheatsheet)
The most comprehensive Java Collections overview is here
http://en.wikiversity.org/wiki/Java_Collections_Overview
I found The Collections Framework page very useful, especially the Outline of the Collections Framework, where every interface/class is briefly described. Unfortunately there's no big-O information.
I couldn't see this particular resource mentioned here, and I've found it of great use in the past. Know Thy Complexities!
http://bigocheatsheet.com/
Time and space complexities for the main collection classes should correspond to the known complexities of the underlying data structures. I don't think there's anything Java-specific about it, e.g. (as you say) hash lookup should be O(1). You could look here or here.
I don't believe there is any single website outlining this (sounds like a good idea for a project though). I think part of the problem is that understanding how each of the algorithms runs is very important. For the most part, it sounds like you understand Big-O, so I would use that for your best guesses. Follow it up with some benchmarking/profiling to see what runs faster/slower.
And, yes, the Java docs should have much of this information in java.util.
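As a rough illustration of the "benchmark it" advice, here is a minimal sketch that times a membership test on three standard collections. The class and method names (LookupTiming, timeContains) are made up for this example, and a single nanoTime measurement without JIT warm-up is only a crude indicator; a harness such as JMH would give more trustworthy numbers.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.LinkedList;
import java.util.List;
import java.util.Set;

class LookupTiming {
    // Times one contains() call; the target is expected to be present.
    static long timeContains(Collection<Integer> collection, int target) {
        long start = System.nanoTime();
        boolean found = collection.contains(target);
        long elapsed = System.nanoTime() - start;
        if (!found) {
            throw new IllegalStateException("target should be present");
        }
        return elapsed;
    }

    public static void main(String[] args) {
        int n = 200_000;
        List<Integer> arrayList = new ArrayList<>();
        List<Integer> linkedList = new LinkedList<>();
        Set<Integer> hashSet = new HashSet<>();
        for (int i = 0; i < n; i++) {
            arrayList.add(i);
            linkedList.add(i);
            hashSet.add(i);
        }
        int target = n - 1; // last element: worst case for a linear scan

        System.out.println("ArrayList  contains (O(n) scan): " + timeContains(arrayList, target) + " ns");
        System.out.println("LinkedList contains (O(n) scan): " + timeContains(linkedList, target) + " ns");
        System.out.println("HashSet    contains (expected O(1)): " + timeContains(hashSet, target) + " ns");
    }
}
```

The absolute numbers vary from run to run and machine to machine; only the relative order of magnitude is meaningful.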
I am very new to Java. I have recently come across fastutil and found ObjectArrayList class.
Is there any difference in performance if ObjectArrayList is used instead of ArrayList? What are the use cases for using ObjectArrayList?
According to fastutil documentation
A type-specific array-based list; provides some additional methods that use polymorphism to avoid (un)boxing.
There is a performance benefit to fastutil's implementation in cases where (un)boxing takes place.
The ObjectArrayList is backed by a generic type array, whereas an ArrayList is backed by Object[]. The performance difference between the two would really be nominal. However, FWIW, it looks like this library also provides primitive-backed lists (IntArrayList, DoubleArrayList), where the boxing claims would actually show visible benefits on large datasets.
However, if you're new to Java, I'd highly recommend getting familiar with java.util.ArrayList before seeking out other variants. In most cases, the advantage of sticking with the standard outweighs the performance benefit.
You should not be concerned with performance when you start learning a new language. Focus on the basics. Write code that runs, and then write code that is useful. Speed and efficiency only matter when the code you write affects the world around you (such as writing code for work). Do not worry about fastutil and ObjectArrayList. Finish your application with ArrayList, and only if your app is too slow for you should you look for something faster.
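To make the boxing point concrete, here is a minimal sketch that uses only the standard library. IntList is a hypothetical stand-in written for this example; it is not fastutil's IntArrayList and does not reproduce its API. It only shows the idea of storing ints in an int[] instead of storing Integer objects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class BoxingSketch {
    /** Hypothetical minimal int list backed by an int[] (illustration only). */
    static final class IntList {
        private int[] data = new int[16];
        private int size;

        void add(int value) {
            if (size == data.length) {
                data = Arrays.copyOf(data, size * 2); // grow the backing array
            }
            data[size++] = value;
        }

        int get(int index) {
            if (index < 0 || index >= size) {
                throw new IndexOutOfBoundsException("index " + index);
            }
            return data[index];
        }

        int size() {
            return size;
        }
    }

    static long sumBoxed(List<Integer> values) {
        long sum = 0;
        for (Integer v : values) {
            sum += v; // each element is unboxed from an Integer object here
        }
        return sum;
    }

    static long sumPrimitive(IntList values) {
        long sum = 0;
        for (int i = 0; i < values.size(); i++) {
            sum += values.get(i); // plain int read, no boxing involved
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Integer> boxed = new ArrayList<>();
        IntList primitive = new IntList();
        for (int i = 0; i < 1000; i++) {
            boxed.add(i);     // autoboxes i into an Integer
            primitive.add(i); // stores the int directly
        }
        System.out.println(sumBoxed(boxed) + " " + sumPrimitive(primitive));
    }
}
```

Both sums are identical; the difference is that the boxed list holds a reference per element (plus an Integer object for most values), while the primitive list holds four bytes per element.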
Both ArrayLists and Vectors make use of typical arrays internally. However, that leaves me thinking... why would I use ArrayLists when I can technically do the same thing using Arrays? Is convenience the only reason? Do performance-critical applications ever make use of an ArrayList?
Any tips would be appreciated.
I believe there are multiple reasons to prefer Lists over "implementing lists over arrays" or over "using arrays", but here are the two that I think are most important:
Lists have better support for generics than arrays (you can, and should, read about it in "Effective Java" by Bloch - see Item 25)
If you ask about using ArrayList vs. implementing it yourself - I find it hard to believe that you'll do a better job than the guys that developed it in openjdk (Josh Bloch and Neal Gafter).
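A minimal sketch of the generics point: arrays are covariant and only checked at runtime, while generic lists are invariant and checked at compile time. The class and method names here are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;

class CovarianceSketch {
    // Returns true if storing a String into a Long[] (viewed as Object[]) fails at runtime.
    static boolean arrayStoreFails() {
        Object[] objects = new Long[1]; // legal: arrays are covariant
        try {
            objects[0] = "not a Long";  // compiles, but throws ArrayStoreException
            return false;
        } catch (ArrayStoreException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("array store failed at runtime: " + arrayStoreFails());

        List<Long> longs = new ArrayList<>();
        // List<Object> objects = longs; // does not compile: generic types are invariant
        longs.add(42L);
        System.out.println(longs);
    }
}
```

With the list, the mistake is rejected by the compiler; with the array, it surfaces only when the offending line runs.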
Yes, performance critical applications use ArrayList all the time. It's very unlikely that array access is the dominant factor in the vast majority of programs written in Java.
The ArrayList Collection interface is much richer than the functionality provided by built-in primitive arrays. This extra functionality will save you development time as well as debugging time by not having to write those algorithms yourself.
Additionally, many programmers are already familiar with the ArrayList Collection interface and thus by utilizing the existing standard libraries it will make your code easier to read and maintain for the long term.
One reason is that ArrayList sizes are dynamic; array sizes aren't.
Internally, ArrayList is just an array, but ArrayList is a wrapper class that adds more capabilities on top of it. These capabilities are not available when you deal with an array directly.
For example,
Deleting an element: if you are using an array, you will have to implement the logic yourself. But if you are using ArrayList, it will do the deletion for you.
Adding an element:
If you are using an array, you will have to implement the logic yourself. But with an ArrayList, it is pretty easy.
You will find lot of methods in this ArrayList class that are handy for day to day use.
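A minimal sketch of the difference described above. The removeAt helper is a made-up name for the kind of logic you have to write yourself with a plain array; the ArrayList calls are standard.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class RemoveSketch {
    // Manual removal: allocate a smaller array and copy around the removed slot.
    static String[] removeAt(String[] source, int index) {
        String[] result = new String[source.length - 1];
        System.arraycopy(source, 0, result, 0, index);
        System.arraycopy(source, index + 1, result, index, source.length - index - 1);
        return result;
    }

    public static void main(String[] args) {
        String[] array = {"a", "b", "c"};
        array = removeAt(array, 1);
        System.out.println(Arrays.toString(array)); // prints [a, c]

        List<String> list = new ArrayList<>(Arrays.asList("a", "b", "c"));
        list.remove(1); // shifts later elements left internally
        list.add("d");  // grows the backing array when needed
        System.out.println(list); // prints [a, c, d]
    }
}
```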
Hope this will help you.
I am currently writing some code in Java meant to be a little framework for a project which revolves around a database with some billions of entries. I want to keep it high-level, and the data retrieved from the database should be easily usable for statistical inference. I have decided to use the Map interface in this project.
A core concept is mapping the attributes ("columns in the database") to values ("cells") when handling single datasets (by which I mean rows in the database). For readable code, I use enum objects (named "Attribute") for the attribute types, which means mapping <Attribute, String>, because the data elements are all Strings (and not very large, 40 characters at most).
There are 15 columns, so there are 15 enum constants, and each map will have at most that many entries.
So it appears, I will be having a very large number of Map objects floating around, at times, but with comparatively little payload (15-). My goal is to not make the memory explode due to the implementation memory overhead, compared to the actual payload. (Stretch goal: do the same with cpu usage ;] )
I was not really familiar with all the different implementations of Java Collections until now, and when the problem dawned on me today, I looked into my all-time favorite, HashMap, and was not happy with how much memory overhead it declares. I am sure that, in addition to the standard implementations, there are a number of implementations not shipped with Java. Googling my case did not bring up much, so I am asking you:
Do you know a good implementation of Map for my use case (low entry count, low value size, enumerable keys, ...)?
I hope I made my use case clear, and am anxious for your input =)
Thanks a lot!
Stretch answer goal, absolutely optional and only if you got the time and knowledge:
What other implementations of collections are suitable for:
handling attribute (the String things) vectors, and matrices for inference data (counts/probabilities) (Matrices: here I am really clueless for now; I have done no serious math work with Java to date)
math libraries for statistical inference, see above
Use EnumMap, this is the best map implementation if you have enums as key, for both performance and memory usage.
The trick is that this map implementation is the only one that does not store the keys; it only needs a single array with the values (similar to an ArrayList of the values). There is only a little bit of overhead if there are keys that are not mapped to a value, but in most cases this won't be a problem because enums usually do not have too many instances.
Compared to HashMap, you additionally get a predictable iteration order for free.
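A minimal sketch of how this could look for the use case in the question. The Attribute constants (ID, NAME, CITY) and the row helper are made up for the example; the question only says there are 15 attributes.

```java
import java.util.EnumMap;
import java.util.Map;

class EnumMapSketch {
    // Hypothetical attribute names standing in for the 15 database columns.
    enum Attribute { ID, NAME, CITY }

    // Builds one dataset as an EnumMap; CITY is deliberately left unmapped.
    static Map<Attribute, String> row(String id, String name) {
        Map<Attribute, String> row = new EnumMap<>(Attribute.class);
        row.put(Attribute.NAME, name);
        row.put(Attribute.ID, id);
        return row;
    }

    public static void main(String[] args) {
        Map<Attribute, String> r = row("42", "Ada");
        // Iteration follows the enum declaration order, not insertion order.
        for (Map.Entry<Attribute, String> e : r.entrySet()) {
            System.out.println(e.getKey() + " = " + e.getValue());
        }
        System.out.println(r.get(Attribute.CITY)); // unmapped key yields null
    }
}
```

Because the code is written against the Map interface, swapping in another implementation later only touches the one line that constructs the map.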
Since you start off saying you want to store lots of data, eventually, you'll also want to access/modify that data. There are many high performance libraries out there.
Look at
Trove4j : https://bitbucket.org/robeden/trove/
HPPC: http://labs.carrotsearch.com/hppc.html
FastUtil: http://fastutil.di.unimi.it/
When you find a bottleneck, you can switch to using a lower level API (more efficient)
You'll find many more choices if you look a bit more: What is the most efficient Java Collections library?
EDIT: if your strings are not unique, you could save significant amounts of memory using String.intern() : Is it good practice to use java.lang.String.intern()?
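A minimal sketch of what interning does, with made-up names. Two equal strings read from different places are normally separate objects; intern() returns one canonical instance per distinct value, so duplicates can be garbage collected. Whether this pays off depends on how many duplicates there are.

```java
class InternSketch {
    // True if both strings resolve to the same canonical instance after interning.
    static boolean sameInstanceAfterIntern(String a, String b) {
        return a.intern() == b.intern();
    }

    public static void main(String[] args) {
        // new String(...) forces two distinct objects with equal content,
        // similar to values read separately from a database.
        String a = new String("Berlin");
        String b = new String("Berlin");

        System.out.println(a == b);                        // false: two objects
        System.out.println(a.equals(b));                   // true: same content
        System.out.println(sameInstanceAfterIntern(a, b)); // true: one shared instance
    }
}
```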
You can squeeze out a bit of memory with a simple map implementation that uses two array lists (keys and values). For larger maps, that is going to mean insertion and look up speeds become much slower because you have to scan the entire list. However, for small maps it is actually faster this way since you don't have to calculate any hashcodes and only have to look at a small number of entries.
If you need an implementation, take a look at my SimpleMap in my jsonj project: https://github.com/jillesvangurp/jsonj/blob/master/src/main/java/com/github/jsonj/SimpleMap.java
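For illustration only, here is a minimal sketch of the two-list idea. TinyMap is a made-up name; this is not the SimpleMap linked above and it does not implement the full java.util.Map interface.

```java
import java.util.ArrayList;
import java.util.List;

class TinyMap<K, V> {
    // Parallel lists: values.get(i) belongs to keys.get(i).
    private final List<K> keys = new ArrayList<>();
    private final List<V> values = new ArrayList<>();

    // Stores the value and returns the previous one, or null if the key was absent.
    V put(K key, V value) {
        int i = keys.indexOf(key); // linear scan using equals(), no hashing
        if (i >= 0) {
            return values.set(i, value);
        }
        keys.add(key);
        values.add(value);
        return null;
    }

    // Returns the mapped value, or null if the key is absent.
    V get(K key) {
        int i = keys.indexOf(key);
        return i >= 0 ? values.get(i) : null;
    }

    int size() {
        return keys.size();
    }

    public static void main(String[] args) {
        TinyMap<String, String> map = new TinyMap<>();
        map.put("name", "Ada");
        map.put("city", "London");
        map.put("city", "Paris"); // overwrites the earlier value
        System.out.println(map.get("city") + " " + map.size()); // prints Paris 2
    }
}
```

Every lookup is O(n), which is fine for a dozen entries and increasingly costly beyond that.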
Hi, I want to know the time complexity of the "replaceAll" function of the String class, but I can't find any information on it (http://docs.oracle.com/javase/6/docs/api/java/lang/String.html).
Wouldn't it be better for Java to include the complexities in the Javadoc? I believe it's a very important thing for someone to know.
Most functions have fairly straightforward time complexities. AFAIK, replaceAll is O(n).
IMHO, nothing beats testing this yourself empirically, e.g. with a profiler, because it's highly likely that 99% of the methods you use have little to no impact on the performance of your application.
The complexity may be documented if guaranteed. For example, some of the collections classes document complexity guarantees. For example, from HashMap:
This implementation provides constant-time performance for the basic operations (get and put) ...
However, sometimes the complexity is:
Not guaranteed, and free to change with modifications to the implementation.
Obviously O(1).
The javadocs of the Java API specify a general contract of what must be done by each method, not how. Each implementor of the API (say, OpenJDK, Oracle's JDK, etc.) has a certain freedom on how to implement each contract, and that freedom may include making optimizations, even sacrifices in performance. So the javadocs in general don't specify details such as time/complexity of functions, unless it's absolutely necessary for a method to meet certain performance requirements.
If you're using the space / time complexity of basic operations to drive design decisions, then you're almost certainly doing it wrong.
First build a correct application, then profile it. Then optimize what the profiling process reveals as the bottlenecks.
The general answer is that the complexity typically depends on factors that are too difficult to analyse. This certainly applies for String.replaceAll, where the effective complexity depends critically on the regex string. (A poorly designed regex can make matching veerryyy slow.)
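A minimal sketch of why the regex matters: String.replaceAll treats its first argument as a regular expression, while String.replace does a literal replacement. The class name, the maskDigits helper and the DIGITS pattern are made up for the example.

```java
import java.util.regex.Pattern;

class ReplaceSketch {
    // Compiling once avoids recompiling the regex on every call.
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    static String maskDigits(String input) {
        return DIGITS.matcher(input).replaceAll("#");
    }

    public static void main(String[] args) {
        String s = "a.b.c";
        System.out.println(s.replace(".", "-"));                   // literal: a-b-c
        System.out.println(s.replaceAll(".", "-"));                // regex '.' matches any char: -----
        System.out.println(s.replaceAll(Pattern.quote("."), "-")); // quoted regex: a-b-c
        System.out.println(maskDigits("order 12, item 345"));      // order #, item #
    }
}
```

For a fixed literal, replace sidesteps the regex engine's semantics entirely; for a real pattern, the cost depends on the pattern itself, which is the point made above.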
I'm learning PHP5 (last time I checked, PHP was in its PHP4 days) and I'm glad to see that PHP5 OO is more Java-like than the PHP4 one, but there's still an issue that makes me feel quite uncomfortable because of my Java background: ARRAYS.
I'm reading "Professional PHP6" (Wrox) and it shows its own Collection implementation.
I've found other classes like the one in http://aheimlich.dreamhosters.com/generic-collections/Collection.phps based on SPL.
I've also found that there's some kind of Collection in SPL (ArrayObject)
However, I'm surprised because I don't really see people using Collections in PHP, they seem to prefer arrays.
So, isn't it a good idea to use Collections in PHP just like people use ArrayList instead of basic arrays in Java? After all, PHP arrays aren't really like Java arrays.
Collections in Java make a lot of sense since it's a strongly typed language. It makes sense to have a collection of say "Cars" and another of "Motorbikes".
However, in PHP, due to the dynamically typed nature, it is quite common to sacrifice the formality of Collections. Arrays are sufficient to be used as generic containers of various object types (Cars, Motorbikes, etc.). Also, the added benefit comes from the fact that arrays can be mutated very easily (which sometimes can be a big disadvantage when proper error checking is absent).
I come from a Java background, and I've found that using a Collections design pattern in PHP does not buy much in the way of advantages (no multi-threading, no optimization of memory allocation, no iterators, etc.).
If you're looking for any of those advantages, it's probably better to construct a wrapper class around the array, implementing each feature (iterators, etc.) a la carte.
I am very pro collection objects in PHP. They can be used to add type safety, implement easy-to-use search, sort and manipulation functionality, and represent the correct OO approach, rather than using arrays and the multitude of useful but procedural functions that operate on them in differing patterns all over the source.
We have various collections that we use for various purposes, all neatly inherited, promoting type safety, consistent coding standards and a high level of code reuse.
But ultimately, they are all arrays internally!
I suppose really it comes down to choice, but in my object oriented world I like to keep easily repeatable segments of code such as sort and search algorithms in base classes, and I find the object notation more self documenting.
PHP arrays are associative... They're far more powerful than Java's arrays, and include much of the functionality of List<> and Map<>.
What do you mean by "good idea"? They're different tools, using one language in the way you used another usually results in frustration.
I, too, was somewhat dismayed to find no Collection type classes in PHP. Arrays have a couple of real disadvantages in my experience.
First, the number of functions available to manipulate them is somewhat limited. For example, I need to be able to arbitrarily insert and remove items to/from a Collection at a given index position. Doing that with the built-in language functions for arrays in PHP is painful at best.
Second, as a sort of offshoot of the first point, writing clean, readable code that manipulates arrays at any level of complexity beyond simple push/pop and iterator stuff is difficult at best. I often find that I have to use one array to index and keep track of another array in data-intensive apps I create.
I prefer working in a framework (my personal choice is NOLOH). There, I have a real Collection class called ArrayList that has functions such as Add, Insert, RemoveAt, RemoveRange and Toggle. I imagine other PHP frameworks address this issue as well.
A nice implementation of collections in PHP is provided by Varien Lib; this library is part of the Magento code, under the OSL license. (More info about the Magento license and code reuse here.)
I cannot find any standalone source code for the library, so the best way is to download Magento and then look in /lib/Varien/
Yii has an implementation of a full Java-like collections stack
http://www.yiiframework.com/doc/api/1.1/CList
I sometimes use this really simple implementation to give me a rough and ready collection.
Normally the main requirement of a collection is enforcing a group of one type of object; you just have to set up a basic class with a constructor to implement it.
class SomeObjectCollection {
    /**
     * @var SomeObject[]
     */
    private $collection = array();

    /**
     * @param SomeObject $object1
     * @param SomeObject $_ [optional]
     */
    function __construct(SomeObject $object1 = null, SomeObject $_ = null)
    {
        foreach (func_get_args() as $index => $arg) {
            if (! $arg instanceof SomeObject) {
                throw new \RuntimeException('All arguments must be of type SomeObject');
            }
            $this->collection[] = $arg;
        }
    }

    /**
     * @return SomeObject[]
     */
    public function getAll()
    {
        return $this->collection;
    }
}