I have a list of objects, each of which contains a statusEnum. Now, I want to return all objects whose status falls within a specific list of provided statuses.
A simple solution is to loop over the list of objects and then, for each object, loop over the provided list of statusEnums. This would work; however, it would make the time complexity O(n²). Is there a way I could reduce it to O(n)?
I can't change the map. The only other solution I could think of is maintaining another map keyed by statusEnum, but that would increase the space complexity a lot.
EDIT
I actually had a HashMap of objects (which I referred to as a list above).
Here is the code I came up with, for others:
public List<MyObject> getObjectsBasedOnCriteria(List<ObjectStatus> statuses, String secondCriteria) {
    EnumSet<ObjectStatus> enumSet = EnumSet.copyOf(statuses);
    List<MyObject> result = new ArrayList<>();
    for (Map.Entry<Long, MyObject> objEntry : myObjs.entrySet()) {
        MyObject obj = objEntry.getValue();
        // O(1) status check; the getter for the second criteria is assumed here,
        // since comparing obj itself to a String would always be false
        if (enumSet.contains(obj.getStatus()) && secondCriteria.equals(obj.getSecondCriteria())) {
            result.add(obj);
        }
    }
    return result;
}
Use a Set to hold the statusEnums (probably an EnumSet), and check whether each instance's status is in that set with set.contains(object.getStatus()).
Lookups in an EnumSet or HashSet are O(1), so the solution is linear (assuming just one status per object). EnumSet.contains is more efficient than HashSet.contains for enum values; however, the choice is irrelevant to the overall time complexity.
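For illustration, a minimal sketch of that approach, assuming a hypothetical Status enum and a MyObject record with a status() accessor (names are not from the original post):

import java.util.Collection;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

enum Status { NEW, ACTIVE, CLOSED }

record MyObject(long id, Status status) {}

class StatusFilter {
    static List<MyObject> filterByStatus(Collection<MyObject> objects, Collection<Status> wanted) {
        // Note: EnumSet.copyOf throws if the collection is empty and not itself an EnumSet
        Set<Status> statusSet = EnumSet.copyOf(wanted);          // built once, O(m)
        return objects.stream()
                .filter(o -> statusSet.contains(o.status()))     // O(1) per element
                .collect(Collectors.toList());
    }
}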
Assuming you have a sane number of statuses, and especially if you have an enum of statuses, you can use an EnumSet (or a HashSet) to match the status.
Even if you don't do this, the time complexity is O(n * m), where n is the number of entries and m is the number of statuses you are looking for. In general it is assumed that you will have many more records than statuses you are checking for.
The number of possible enum values is limited to a few thousand due to a limitation in the way Java is compiled, so this is always an upper bound for enums.
Related
For one of my school assignments, I have to parse GenBank files using Java. I have to store and retrieve the content of the files, together with the extracted information, with the smallest time complexity possible. Is there a difference between using HashMaps and storing the data as records? I know that a HashMap lookup would be O(1), but the readability and immutability of records leads me to prefer them instead. The objects will be stored in an array.
This is my approach so far:
public static GenBankRecord parseGenBankFile(File gbFile) throws IOException {
    try (var fileReader = new FileReader(gbFile); var reader = new BufferedReader(fileReader)) {
        String organism = null;
        List<String> contentList = new ArrayList<>();
        while (true) {
            String line = reader.readLine();
            if (line == null) break; // Break out if the end of the file has been reached
            contentList.add(line);
            if (line.startsWith("  ORGANISM  ")) {
                // Organism line found
                organism = line.substring(12); // Select the correct part of the line
            }
        }
        // Loop ended
        var content = String.join("\n", contentList);
        return new GenBankRecord(gbFile.getName(), organism, content);
    }
}
with GenBankRecord being the following:
record GenBankRecord(String fileName, String organism, String content) {
    @Override
    public String toString() {
        return organism;
    }
}
Is there a difference between using a record and a HashMap, assuming the key-value pairs are the same as the fields of the record?
String current_organism = gbRecordInstance.organism();
and
String current_organism = gbHashMap.get("organism");
I have to store and retrieve the content of the files together with the extracted information maintaining the smallest time complexity possible.
Firstly, I am somewhat doubtful that your teachers actually stated the requirements like that. It doesn't make a lot of sense to optimize just for time complexity.
Complexity is not efficiency.
Big O complexity is not about the value of the measure (e.g. time taken) itself. It is actually about how the measure (e.g. time taken) changes as some variable gets very large.
For example, HashMap.get(nameStr) and someRecord.name are both O(1) complexity.
But they are not equivalent in terms of efficiency. Using Java 17 record types or regular Java classes with named fields will be orders of magnitude faster than using a HashMap. (And it will use orders of magnitude less memory.)
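As a hedged illustration (a sketch, not a benchmark): both lookups below are O(1), but the record access compiles to a plain field read, while the map access has to hash the key, probe the table, and cast the result. The Person record here is a hypothetical example:

import java.util.HashMap;
import java.util.Map;

record Person(String name, int age) {}

class AccessDemo {
    public static void main(String[] args) {
        Person p = new Person("cat", 3);

        Map<String, Object> m = new HashMap<>();
        m.put("name", "cat");
        m.put("age", 3);

        String viaRecord = p.name();            // plain field read, type-checked at compile time
        String viaMap = (String) m.get("name"); // hash the key, probe the table, cast the result
        System.out.println(viaRecord + " " + viaMap);
    }
}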
Assuming that your objects have a fixed number of named fields, the complexity (i.e. how the performance changes as the number of fields grows) is not even relevant.
Performance is not everything.
The main differences between a HashMap and a record class are actually in the functionality that they provide:
A Map<String, SomeType> provides a set of name / value pairs where:
the number of pairs in the set is not fixed
the names are not fixed
the types of the values are all instances of SomeType or a subtype.
A record (or classic class) can be viewed as set of fieldname / value pairs where:
the number of pairs is fixed at compile time
the field names are fixed at compile time
the field types don't have to be subtypes of any single given type.
As @Louis Wasserman commented:
Records and HashMap are apples and oranges -- it doesn't really make sense to compare them.
So really, you should be choosing between records and hashmaps by comparing the functionality / constraints that they provide versus what your application actually needs.
(The problem description in your question is not clear enough for us to make that judgement.)
Efficiency concerns may be relevant, but it is a secondary concern. (If the code doesn't meet functional requirements, efficiency is moot.)
Is Complexity relevant to your assignment?
Well ... maybe yes. But not in the area that you are looking at.
My reading of the requirements is that one of them is that you be able to retrieve information from your in-memory data structures efficiently.
But so far you have been thinking about storing individual records. Retrieval implies that you have a collection of records and you have to (efficiently) retrieve a specific record, or maybe a set of records matching some criteria. So that implies you need to consider the data structure to represent the collection.
Suppose you have a collection of N records (or whatever) representing (say) N organisms:
If the collection is a List<SomeRecord>, you need to iterate the list to find the record for (say) "cat". That is O(N).
If the collection is a HashMap<String, SomeRecord> keyed by the organism name, you can find the "cat" record in O(1).
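For instance, a minimal sketch of that keyed-collection idea, reusing the GenBankRecord from the question (file contents elided):

import java.util.HashMap;
import java.util.List;
import java.util.Map;

class RecordIndex {
    public static void main(String[] args) {
        List<GenBankRecord> records = List.of(
                new GenBankRecord("cat.gb", "cat", "..."),
                new GenBankRecord("dog.gb", "dog", "..."));

        // Build the index once: O(N)
        Map<String, GenBankRecord> byOrganism = new HashMap<>();
        for (GenBankRecord r : records) {
            byOrganism.put(r.organism(), r);
        }

        // Each subsequent lookup is O(1)
        System.out.println(byOrganism.get("cat"));
    }
}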
I need some structure in which to store N enums, some of them repeated, and I need to be able to easily extract them. So far I've tried to use an EnumSet like this:
cards = EnumSet.of(
        BEST_OF_THREE,
        BEST_OF_THREE,
        SIMPLE_QUESTION,
        SIMPLE_QUESTION,
        STAR);
But now I see it can only hold one of each. Conceptually, which would be the best structure to use for this problem?
Regards
jose
You can use a Map from the enum type to Integer, where the integer indicates how many of each value there are. The Google Guava Multiset does this for you, and handles the edge cases of adding an enum when there is no entry yet, and of removing an enum when none are left.
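A minimal sketch of the Guava Multiset approach, assuming a hypothetical CardEnum declaration for the card values mentioned in the question:

import com.google.common.collect.HashMultiset;
import com.google.common.collect.Multiset;

enum CardEnum { BEST_OF_THREE, SIMPLE_QUESTION, STAR }

class CardCounts {
    public static void main(String[] args) {
        Multiset<CardEnum> cards = HashMultiset.create();
        cards.add(CardEnum.BEST_OF_THREE);
        cards.add(CardEnum.BEST_OF_THREE);
        cards.add(CardEnum.STAR);

        System.out.println(cards.count(CardEnum.BEST_OF_THREE)); // 2
        cards.remove(CardEnum.BEST_OF_THREE);                    // removes one occurrence
        System.out.println(cards.count(CardEnum.BEST_OF_THREE)); // 1
    }
}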
Another strategy is to use the enum's ordinal index. Because this index is unique, you can use it to index into an int array sized to the number of enum constants, where the count in each array slot indicates how many of each enum value you have. Like this:
// initialize the array for counting each enum constant
// (int arrays in Java are zero-initialized, so all counts start at 0)
int[] cardCount = new int[CardEnum.values().length];
...
// incrementing the count for an enum value (when we add)
cardCount[BEST_OF_THREE.ordinal()]++;
...
// decrementing the count for an enum value (when we remove)
cardCount[BEST_OF_THREE.ordinal()]--;
// sanity check: assert cardCount[BEST_OF_THREE.ordinal()] >= 0;
...
// getting the count for an enum value
int count = cardCount[BEST_OF_THREE.ordinal()];
... Some time later
Having read the clarifying comments underneath the original post that explain what the OP was asking, it is clear that you're best off with a linear structure with one entry per element. I didn't realize that you didn't need detailed counts of each value. Storing them in a Multiset or an equivalent counting structure makes it hard to pick randomly, because you need to map an index picked at random from [0, size) to a particular container, which takes log time.
Sets don't allow duplicates, so if you want repeats you'll need either a List or a Map.
If you just need the number of duplicates, an EnumMap with Integer values is probably your best bet (a short sketch follows after the list below).
If the order is important, and you need quick access to the number of each type, you'll probably need to roll your own data structure.
If the order is important (but the count of each is not), then a List is the way to go, which implementation depends on how you will use it.
LinkedList - Best when there will be many inserts/removals from the beginning of the List. Indexing into a LinkedList is very expensive, and should be avoided whenever possible. If a List is built by shifting data onto the front of the list, but any later additions are at the end, conversion to an ArrayList once the initial List is built is a good idea - especially if indexing into the List is anticipated at any point.
ArrayList - When in doubt, this is a good place to start. Inserting or removing items requires shifting, so if this is a common operation look elsewhere.
TreeList - This is a good all-around option, and insertions and removals anywhere in the List are inexpensive. This does require the Apache commons library, and uses a bit more memory than the others.
Benchmarks, and the code used to generate them, can be found in this gist.
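As mentioned above, a minimal sketch of the EnumMap counting option, reusing the CardEnum from the question:

import java.util.EnumMap;
import java.util.Map;

class EnumMapCounts {
    public static void main(String[] args) {
        Map<CardEnum, Integer> counts = new EnumMap<>(CardEnum.class);

        // add: bump the count, starting from 0 when the key is absent
        counts.merge(CardEnum.BEST_OF_THREE, 1, Integer::sum);
        counts.merge(CardEnum.BEST_OF_THREE, 1, Integer::sum);

        // remove one occurrence (dropping the entry at zero keeps the map clean)
        counts.computeIfPresent(CardEnum.BEST_OF_THREE, (k, v) -> v == 1 ? null : v - 1);

        System.out.println(counts.getOrDefault(CardEnum.BEST_OF_THREE, 0)); // 1
    }
}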
Consider: if I have to search for a particular row in a table, then as per ORM each row is an object, I believe. I haven't worked intensively with JDBC, so as a general best practice, where are these POJO objects collected or held, in a Set or a List?
I am trying to find the complexity of searching for an element in a List vs. a Set.
What I have done:
private void searchSet() {
    Set<String> names = new HashSet<>();
    names.add("srk");
    names.add("lastminute");
    names.add("monkey");
    for (String x : names) {
        if (x.equals("monkey")) {
            System.out.println("caught the name " + x);
        }
    }
}
private void searchList() {
    List<String> names = new ArrayList<>();
    names.add("srk");
    names.add("lastminute");
    names.add("monkey");
    for (String x : names) {
        if (x.equals("monkey")) {
            System.out.println("caught the name " + x);
        }
    }
}
I am calculating the time taken to search for the element in the set and in the list using the following approach:
long startTime, endTime, totalTime;
startTime = System.nanoTime();
searchList(); // or searchSet(), timed the same way
endTime = System.nanoTime();
totalTime = endTime - startTime;
Now, I have the statistics which are hereunder
System.out.println("Time taken to search an element in list : "+totalTime);//for list - 614324
System.out.println("Time taken to search an element in set : "+totalTime);//for set - 757359
Based on these stats, can I conclude that it is faster to search for an element in a List than in a Set?
Which is the better collection for storing database record objects for searching, and what is the complexity of searching for an element in a List vs. a Set, in the generic sense?
First, data structures don't have complexities; algorithms do. (Note that data structures usually come with the complexities of their basic operations, which are tiny algorithms themselves.) In your case, you implemented the find algorithm yourself for both containers, and you did it as a linear search, which is O(n). The speed difference you observed is the result of an ArrayList being simpler and faster to traverse than a HashSet: the algorithm has the same complexity, but the constant factor is smaller.
Second, you have I/O within the functions you want to time. This will usually completely dominate any actual operations you perform and make your benchmark useless.
Third, you're looking for complexity and you wrote a benchmark. That's just wrong. You can get a hint for complexity by having a benchmark and plotting the results for different input sizes in a graph, but to really learn the complexity, you have to analyze the algorithm, not run it.
Fourth, List and Set in Java aren't data structures, they are interfaces. The data structures you have chosen are ArrayList (a version of the contiguous array data structure implementing the List interface) and HashSet (a version of the hash table data structure implementing the Set interface). So you need to look at those.
For an array, unless it's sorted, the find algorithm takes linear time, because you have no option other than traversing the whole thing.
For a hash table, which is optimized for lookup, the find algorithm is still technically O(n) in the worst case, but in the common case will be O(1). However, you have to actually use the optimized find algorithm (offered by Set.contains) in order to exploit this - a linear search over HashSet is no better (and actually worse) than a linear search over an ArrayList.
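To illustrate, a minimal sketch of the optimized lookup, using the same data as the question:

import java.util.HashSet;
import java.util.Set;

class ContainsDemo {
    public static void main(String[] args) {
        Set<String> names = new HashSet<>();
        names.add("srk");
        names.add("lastminute");
        names.add("monkey");

        // Uses the hash table: O(1) on average, instead of the O(n) manual loop
        if (names.contains("monkey")) {
            System.out.println("caught the name monkey");
        }
    }
}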
Both collections already provide a contains() method, so why are you traversing manually? The complexity for a list is O(n), and for a (hash-based) set it is O(1), which is constant.
List Implementation code:
https://referencesource.microsoft.com/#PresentationFramework/src/Framework/System/Windows/Documents/List.cs,eabc7101897ec6e6
Set Implementation Code:
https://referencesource.microsoft.com/#System.Core/System/Collections/Generic/HashSet.cs,50c894a3f7ad7bd0
Data Structure Time Complexity:
https://www.bigocheatsheet.com/
Useful book: Introduction to the design and analysis of algorithms by Anany Levitin
The first two links show the inner implementation of the Set and List classes; basically, both of them are built on top of the array data structure.
The third link shows the complexity of each data structure's operations.
If you wish to measure the complexity of the two different pieces of code (Set, List), we could:
1. Use time complexity for algorithm analysis, identifying the basic operation that accounts for most of the time the algorithm takes to solve the problem.
2. Set up a sum expressing the number of times the algorithm's basic operation is executed.
3. Using standard formulas and rules of sum manipulation, either find a closed-form formula for the count or, at the very least, establish its order of growth.
I have a simple collection of String objects, maybe around 10 elements, but I use this collection in a production environment where we search for a given string in it millions of times.
What is the best collection or data structure to use so that the search operation can be performed in O(1) time?
We can use a HashMap here, but is its constant-time search really O(1)? I want to make sure that the search is O(1).
Our data structure must return true if the string is present, and false if it is not.
Use a HashSet<String> structure. The contains() operation has a complexity of O(1).
Constant time is O(1). HashMap is fine. (Or HashSet, depending on whether you need a Set or a Map.)
If your set is immutable, Guava's ImmutableSet will reduce memory footprint by a factor of ~3 (and probably give you a small constant factor of improved speed).
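A minimal sketch of both options; the Guava ImmutableSet part assumes the library suggested above is on the classpath:

import java.util.HashSet;
import java.util.Set;
import com.google.common.collect.ImmutableSet;

class LookupDemo {
    public static void main(String[] args) {
        // Plain JDK HashSet: average O(1) membership test
        Set<String> names = new HashSet<>(Set.of("alpha", "beta", "gamma"));
        System.out.println(names.contains("beta")); // true

        // Guava ImmutableSet: same O(1) contains, smaller memory footprint
        Set<String> frozen = ImmutableSet.of("alpha", "beta", "gamma");
        System.out.println(frozen.contains("delta")); // false
    }
}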
If you can't use HashSet/HashMap as previously suggested, you could write a Radix Tree implementation.
I have a class along the lines of:
public class Observation {
    private String time;
    private double x;
    private double y;
    // Constructors + Setters + Getters
}
I can choose to store these objects in any type of collection (standard library or 3rd party, like Guava). I have stored some example data in an ArrayList below, but as I said, I am open to any other type of collection that will do the trick. So, some example data:
ArrayList<Observation> ol = new ArrayList<Observation>();
ol.add(new Observation("08:01:23",2.87,3.23));
ol.add(new Observation("08:01:27",2.96,3.17));
ol.add(new Observation("08:01:27",2.93,3.20));
ol.add(new Observation("08:01:28",2.93,3.21));
ol.add(new Observation("08:01:30",2.91,3.23));
The example assumes a matching constructor in Observation. The timestamps are stored as String objects as I receive them as such from an external source but I am happy to convert them into something else. I receive the observations in chronological order so I can create and rely on a sorted collection of observations. The timestamps are NOT unique (as can be seen in the example data) so I cannot create a unique key based on time.
Now to the problem. I frequently need to find one (1) observation with a time equal or nearest to a certain time; e.g. if my time was 08:01:29, I would like to fetch the 4th observation in the example data, and if the time is 08:01:27, I want the 3rd observation.
I can obviously iterate through the collection until I find the time that I am looking for, but I need to do this frequently and at the end of the day I may have millions of observations so I need to find a solution where I can locate the relevant observations in an efficient manner.
I have looked at various collection-types including ones where I can filter the collections with Predicates but I have failed to find a solution that would return one value, as opposed to a subset of the collection that fulfills the "<="-condition. I am essentially looking for the SQL equivalent of SELECT * FROM ol WHERE time <= t LIMIT 1.
I am sure there is a smart and easy way to solve my problem so I am hoping to be enlightened. Thank you in advance.
Try a TreeSet, providing a comparator that compares the time. It maintains an ordered set, and you can call TreeSet.floor(E) to find the greatest element less than or equal to the given one (you should pass a dummy Observation with the time you are looking for). You also have headSet and tailSet for ordered subsets.
It has O(log n) time for adding and retrieving. I think it is very suitable for your needs.
If you prefer a Map, you can use a TreeMap, which has similar methods.
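A minimal sketch of the TreeMap variant, keyed by the time string (which sorts correctly because "HH:mm:ss" is lexicographically ordered). Note that duplicate timestamps would overwrite each other here, so real code would need a List value or a tie-breaking key; the observation values are stand-in strings:

import java.util.Map;
import java.util.TreeMap;

class NearestLookup {
    public static void main(String[] args) {
        TreeMap<String, String> byTime = new TreeMap<>();
        byTime.put("08:01:23", "obs1");
        byTime.put("08:01:28", "obs4");
        byTime.put("08:01:30", "obs5");

        // Nearest neighbours of the query time, each O(log n)
        Map.Entry<String, String> below = byTime.floorEntry("08:01:29");   // 08:01:28 -> obs4
        Map.Entry<String, String> above = byTime.ceilingEntry("08:01:29"); // 08:01:30 -> obs5
        System.out.println(below + " / " + above);
    }
}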
Sort your collection (an ArrayList will probably work best here) and use Collections.binarySearch, which returns an integer index of either a match or the "closest" possible match, i.e. it returns the...
index of the search key, if it is contained in the list; otherwise, (-(insertion point) - 1). The insertion point is defined as the point at which the key would be inserted into the list: the index of the first element greater than the key, or list.size().
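A minimal sketch of decoding that return value to find the nearest element, using plain time strings for brevity:

import java.util.Collections;
import java.util.List;

class NearestBinarySearch {
    public static void main(String[] args) {
        List<String> times = List.of("08:01:23", "08:01:27", "08:01:28", "08:01:30");

        int i = Collections.binarySearch(times, "08:01:29");
        if (i < 0) {
            int insertion = -i - 1;    // index of the first element > key
            int floor = insertion - 1; // index of the last element <= key (may be -1)
            System.out.println(floor >= 0 ? times.get(floor) : "no element <= key"); // 08:01:28
        } else {
            System.out.println(times.get(i)); // exact match
        }
    }
}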
Have the Observation class implement Comparable and use a TreeSet to store the objects, which will keep the elements sorted. TreeSet implements SortedSet, so you can use headSet or tailSet to get a view of the set before or after the element you're searching for. Use the first or last method on the returned set to get the element you're seeking.
If you are stuck with an ArrayList but can keep the elements sorted yourself, use Collections.binarySearch to search for the element. It returns a non-negative index if the exact element is found, or a negative number that can be used to determine the closest element. http://download.oracle.com/javase/1.4.2/docs/api/java/util/Collections.html#binarySearch(java.util.List,%20java.lang.Object)
If you are lucky enough to be using Java 6, and the performance overhead of keeping a SortedSet is not a big deal for you, take a look at TreeSet's ceiling, floor, higher and lower methods.