I have the following homework question:
Suppose you are given two sequences S1 and S2 of n elements, possibly containing duplicates, on which a total order relation is defined. Describe an efficient algorithm for determining if S1 and S2 contain the same set of elements. Analyze the running time of this method.
To solve this question I have compared the elements of the two arrays using retainAll and a HashSet.
Set1.retainAll(new HashSet<Integer>(Set2));
This would solve the problem in constant time.
Do I need to sort the two arrays before the retainAll step to increase efficiency?
I suspect from the code you've posted that you are missing the point of the assignment. The idea is not to use a Java library to check whether two collections are equal (for that you could use collection1.equals(collection2)). Rather, the point is to come up with an algorithm for comparing the collections. The Java API does not specify an algorithm: it's hidden away in the implementation.
Without providing an answer, let me give you an example of an algorithm that would work, but is not necessarily efficient:
for each element in coll1
    if element not in coll2
        return false
    remove element from coll2
return coll2 is empty
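In Java, a minimal sketch of that same inefficient algorithm might look like this (assuming the sequences are given as Lists of Integers; the method name sameElements is just for illustration):

import java.util.ArrayList;
import java.util.List;

// Sketch of the quadratic algorithm above. contains()/remove() on a plain
// list each scan linearly, so the total cost is O(n^2).
static boolean sameElements(List<Integer> s1, List<Integer> s2) {
    List<Integer> remaining = new ArrayList<Integer>(s2); // copy so the caller's list is untouched
    for (Integer element : s1) {
        if (!remaining.remove(element)) {  // remove(Object) returns false if the element is absent
            return false;
        }
    }
    return remaining.isEmpty();
}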
The problem specifies that a total order relation is defined on the elements, i.e. they can be sorted, which means you can do much better than the algorithm above.
In general, if you are asked to demonstrate an algorithm, it's best to stick with primitive types and arrays; otherwise the implementation of a library class can significantly affect efficiency and obscure the behaviour of the algorithm itself.
Performance-wise, is it better to use ArrayLists to store a list of values, or a String (using concat/+)? Intuitively, I think Strings would perform better since they'd likely have less overhead than ArrayLists, but I haven't been able to find anything online.
Also, the entries wouldn't be anything too large (~10).
ArrayList operations
You can get a value from an ArrayList in O(1) and add a value in amortized O(1).
Furthermore, ArrayList already has built-in operations that help you retrieve and add elements.
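For instance, a trivial sketch (variable names are just examples):

import java.util.ArrayList;
import java.util.List;

List<String> values = new ArrayList<String>();
values.add("hello");                          // amortized O(1) append
String first = values.get(0);                 // O(1) index access
boolean present = values.contains("hello");   // one of the built-in helpers (this one is O(n))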
String operations
Concatenation: with concat and slice operations it is worse. A String is, roughly speaking, an array of characters. For example, "Hello" + "Stack" can be represented as the arrays ['H', 'e', 'l', 'l', 'o'] and ['S', 't', 'a', 'c', 'k'].
Now, if you want to concatenate these two Strings, you have to combine all elements of both arrays into a new array of length 10. Therefore the concatenation, i.e. creating the new char array, is an O(n + m) operation.
Worse, if you concatenate n Strings one after another, you end up with O(n^2) complexity.
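A small sketch of that behaviour (the sample strings are made up):

String[] parts = { "Hello", "Stack", "Overflow" };

// Repeated concatenation: each += copies everything built so far into a brand-new
// String, so building n parts this way costs O(n^2) character copies overall.
String slow = "";
for (String part : parts) {
    slow += part;
}

// A StringBuilder avoids the repeated copying: each append is amortized O(1).
StringBuilder sb = new StringBuilder();
for (String part : parts) {
    sb.append(part);
}
String fast = sb.toString();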
Splitting: the complexity of splitting a String is usually O(n) or more; it depends on the regex you pass to the split operation.
Operations on Strings are often not that readable and can be tricky to debug.
Long story short
An ArrayList is usually better than working with a String, but it all depends on your use case.
Just use ArrayList. It stores references to your values, and a reference is not big at all; that's the point of using references. I keep wondering why you would want to store the values inside a String... that's just odd. Storing values in an ArrayList and getting them back is fast enough, and the String implementation uses arrays internally as well... so use ArrayList.
Java performance doesn't come out of "clever" Java source code.
It comes out of the JIT doing a good job at runtime.
You know what the JIT needs to be able to do a good job?
code that looks like code everybody else is writing (it is optimised to produce optimal results for the sort of code everybody else writes)
many thousands of method invocations.
Meaning: the JIT decides whether it makes sense to re-compile code, and when it decides to do so, you want to make sure it can do that well.
"Custom" clever Java source code ideas, such as what you are proposing here, might achieve the opposite.
Thus: don't invent a clever strategy to mangle values into a String. Write simple, human understandable code that uses a List. Because Lists are the concept that Java offers here.
The only exception would be: if you experience a real performance bottleneck, and you did good profiling, and then you figure: the list isn't good enough, then you would start making experiments using other approaches. But I guess that didn't happen yet. You assume you have a performance problem, and you assume that you should fix it that way. Simply wrong, a waste of time and energy.
ArrayList is the better choice because a String is, underneath, an array of chars. So every concatenation just copies the whole old string to a new place with the new value appended, which takes O(n) per operation.
An ArrayList has an initial capacity, and until it is filled every add operation takes O(1). Adding a new String to the ArrayList is only slower when the backing array is full, because it then has to be copied into a new, larger array. But only the references need to be moved, not the Strings themselves, which is much faster than moving whole Strings.
You can make ArrayList performance even better by setting the initial capacity when you know how many elements you have:
List<String> list = new ArrayList<String>(elementsCount);
A TreeMap gives O(log n) performance (best case); however, I need the following operations to be efficient:
get the highest element
get XY highest elements
insert
Another possibility would be to use a PriorityQueue with the following:
use "index" element as order for PriorityQueue
equals implementation to check only "index" element equality
but this would be a hack, since the "equals" method would be error-prone if used outside of the PriorityQueue.
Any better structure for this?
More details below, which you might skip since the first answer already covers these specifics; I'm keeping the question active for the theoretical discussion.
NOTE: I could use non-standard data structures; in this project I'm already using an UnrolledLinkedList, since it is most likely the most efficient structure for another use case.
THIS IS THE USE CASE (in case you are interested): I'm building an AI for a computer game, where
OffensiveNessHistory myOffensiveNess = battle.pl[orderNumber].calculateOffensivenessHistory();
With possible implementations:
public class OffensiveNessHistory {
PriorityQueue<OffensiveNessHistoryEntry> offensivenessEntries = new PriorityQueue<OffensiveNessHistoryEntry>();
..
or
public class OffensiveNessHistory {
TreeMap<Integer, OffensiveNessHistoryEntry> offensivenessEntries = new TreeMap<Integer, OffensiveNessHistoryEntry>();
..
I want to check the first player's offensiveness and defensiveness history to predict whether I should play the most offensive or the most defensive move.
First, you should think about the size of the structure (optimizing for just a few entries might not be worth it) and the frequency of the operations.
If reads are more frequent than writes (which I assume is the case), I'd use a structure that optimizes for reads on the cost of inserts, e.g. a sorted ArrayList where you insert at a position found using a binary search. This would be O(log n) for the search + the cost of moving other entries to the right but would mean good cache coherence and O(1) lookups.
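A rough sketch of that idea, using Integer entries and an illustrative helper name (insertSorted):

import java.util.Collections;
import java.util.List;

// Keeps the list sorted; binarySearch is O(log n), the add() shifts later entries (O(n)).
static void insertSorted(List<Integer> sorted, Integer value) {
    int pos = Collections.binarySearch(sorted, value);
    if (pos < 0) {
        pos = -pos - 1;   // binarySearch returns (-(insertion point) - 1) when the value is absent
    }
    sorted.add(pos, value);
}

// The highest entry is then sorted.get(sorted.size() - 1),
// and the k highest are sorted.subList(sorted.size() - k, sorted.size()).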
A standard PriorityQueue internally also uses an array, but would require you to use an iterator to get element n (e.g. if you'd at some point need the median or the lowest entry).
There might be structures that optimize writes even more while keeping O(1) reads, but unless those writes are very frequent you might not even notice any performance gains.
Last but not least, you should try not to optimize based on guesses but profile first. There might be other parts of your code that eat up performance and render optimization of the data structures rather useless.
I am writing a program which will add a growing number of unique strings to a data structure. Once this is done, I later need to constantly check for the existence of a string in it.
If I were to use an ArrayList I believe checking for the existence of some specified string would iterate through all items until a matching string is found (or reach the end and return false).
However, with a HashMap I know that in constant time I can simply use the key as a String and return any non-null object, making this operation faster. However, I am not keen on filling a HashMap where the value is completely arbitrary. Is there a readily available data structure that uses hash functions, but doesn't require a value to be placed?
If I were to use an ArrayList I believe checking for the existence of some specified string would iterate through all items until a matching string is found
Correct, checking a list for an item is linear in the number of entries of the list.
However, I am not keen on filling a HashMap where the value is completely arbitrary
You don't have to: Java provides a HashSet<T> class, which is very much like a HashMap without the value part.
You can put all your strings there, and then check for presence or absence of other strings in constant time:
Set<String> knownStrings = new HashSet<String>();
... // Fill the set with strings
if (knownStrings.contains(myString)) {
...
}
It depends on many factors, including the number of strings you have to feed into that data structure (do you know the number in advance, or have a basic idea?), and what you expect the hit/miss ratio to be.
A very efficient data structure to use is a trie or a radix tree; they are basically made for that. For an explanation of how they work, see the Wikipedia entry (a followup to the radix tree definition is in this page). There are Java implementations (one of them is here; however, I have a fixed set of strings to inject, which is why I use a builder).
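To give a feel for how a trie works, here is a small hand-rolled sketch (not the library implementation linked above); a lookup costs O(length of the query string), independent of how many strings are stored:

import java.util.HashMap;
import java.util.Map;

class Trie {
    private static final class Node {
        final Map<Character, Node> children = new HashMap<Character, Node>();
        boolean endOfWord;
    }

    private final Node root = new Node();

    void add(String word) {
        Node node = root;
        for (char c : word.toCharArray()) {
            Node next = node.children.get(c);
            if (next == null) {
                next = new Node();
                node.children.put(c, next);
            }
            node = next;
        }
        node.endOfWord = true;     // mark that a whole word ends here
    }

    boolean contains(String word) {
        Node node = root;
        for (char c : word.toCharArray()) {
            node = node.children.get(c);
            if (node == null) {
                return false;      // no path for this prefix
            }
        }
        return node.endOfWord;
    }
}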
If your number of strings is really huge and you expect a fair number of misses, then you might also consider using a Bloom filter; it is probabilistic, but it gives you very quick, definite answers to "not there". Here also, there are implementations in Java (Guava has one, for instance).
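A rough sketch of the Guava BloomFilter mentioned above; the sizing numbers below are made-up examples you would tune to your expected load:

import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnels;
import java.nio.charset.StandardCharsets;

BloomFilter<String> filter = BloomFilter.create(
        Funnels.stringFunnel(StandardCharsets.UTF_8),
        1000000,   // expected number of strings (assumption)
        0.01);     // acceptable false-positive probability (assumption)

filter.put("alice");
filter.put("bob");

filter.mightContain("alice"); // true: inserted strings are always reported as present
filter.mightContain("zoe");   // usually false; a true here would be one of the rare false positives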
Otherwise, well, a HashSet...
A HashSet is probably the right answer, but if you choose (for simplicity, say) to search a list, it's probably more efficient to concatenate your words into a single String with separators:
String wordList = "$word1$word2$word3$word4$...";
Then create a search argument with your word between the separators:
String searchArg = "$" + searchWord + "$";
Then search with, say, contains:
boolean wordFound = wordList.contains(searchArg);
You can maybe make this a tiny bit more efficient by using StringBuilder to build the searchArg.
As others mentioned, a HashSet is the way to go. But if the size is going to be large and you are fine with false positives (e.g. checking whether a username exists), you can use a Bloom filter (a probabilistic data structure) as well.
Say I have an ArrayList of strings like [a, b, c, d, ...]. Can anybody help me with sample code showing how I can produce all possible subsets of this list that include a particular string from the list (excluding the empty subset and the single-element subset)?
For example, if I want to get all the subsets including a from the example list, then the output will be:
[a,b], [a,c], [a,d], [a,b,c], [a,b,d], [a,c,d] without the empty and single subset([a])
Similarly, if I want them for b, then the output will be:
[b,a], [b,c], [b,d], [b,a,c], [b,a,d], [b,c,d] without the empty and single subset([b])
As all of the items in the example list are strings, there might be a memory problem when the number of subsets gets too large, because I need to keep the subsets for a single string in memory at a time. So I also need help with what an optimized solution for this scenario would be.
I need the help in Java. As I am not that good at Java, please pardon me if I made any mistakes!
Thanks!
If your initial ArrayList of strings has 30 or fewer items, you can use Guava's Sets.powerSet method (http://docs.guava-libraries.googlecode.com/git/javadoc/com/google/common/collect/Sets.html#powerSet%28java.util.Set%29 - thanks, Jochen). The documentation claims the memory usage is only O(n) for the set-of-sets returned by that method. You can then iterate over it, with an if condition to only consider sets which contain "a" and are of size 2 or greater.
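A sketch of that approach (variable names are just examples; the subsets are produced lazily as you iterate, so they are not all materialized at once):

import com.google.common.collect.Sets;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

List<String> items = Arrays.asList("a", "b", "c", "d");
Set<Set<String>> powerSet = Sets.powerSet(new LinkedHashSet<String>(items));

for (Set<String> subset : powerSet) {
    if (subset.contains("a") && subset.size() >= 2) {
        System.out.println(subset);   // e.g. [a, b], [a, c], [a, b, c], ...
    }
}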
I recommend you try the above or a similar simple solution first and see if you run into memory problems.
If you do run into memory issues, you can try to optimize by minimizing the number of copies of the strings you hold in memory. For example, you can use lists of bytes, shorts, or ints (depending on how long your ArrayList is) where each entry is an index into your ArrayList of strings.
The ultimate way to reduce memory usage, however, would be to only hold one subset in memory at a time (if possible). I.e. generate (A, B), process it, discard it, then generate (A, C), process it, discard it, etc.
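One way to do that is to enumerate subsets with a bitmask, so only a single subset lives in memory at a time. A sketch (method and variable names are just examples; it assumes the list is small enough for a long mask, i.e. at most ~62 entries):

import java.util.ArrayList;
import java.util.List;

static void processSubsetsContaining(List<String> items, int pinnedIndex) {
    int n = items.size();
    for (long mask = 0; mask < (1L << n); mask++) {
        if ((mask & (1L << pinnedIndex)) == 0) continue; // must contain the chosen string
        if (Long.bitCount(mask) < 2) continue;           // skip the empty and single-element subsets
        List<String> subset = new ArrayList<String>();
        for (int i = 0; i < n; i++) {
            if ((mask & (1L << i)) != 0) {
                subset.add(items.get(i));
            }
        }
        // process the subset here, then let it be garbage collected
        System.out.println(subset);
    }
}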
I have a requirement to present highly structured information picked from a highly unstructured web service. In order to display the info correctly, I have to do a lot of String matching and duplicate removal to ensure I'm picking the right combination of elements.
One of my challenges involves determining if a String is in an Array of Strings.
My dream is to do "searchString.isIn(stringArray);" but I realize the String class doesn't provide for that.
Is there a more efficient way of doing this beyond this stub?:
private boolean isIn(String searchString, String[] searchArray)
{
    for (String singleString : searchArray)
    {
        if (singleString.equals(searchString))
            return true;
    }
    return false;
}
Thanks!
You may want to look into HashMap or HashSet, both of which give constant time retrieval, and it's as easy as going:
hashSet.contains(searchString)
Additionally, HashSet (and HashMap for its keys) prevents duplicate elements.
If you need to keep them in order of insertion, you can look into their Linked counterparts, and if you need to keep them sorted, TreeSet and TreeMap can help (note, however, that the TreeSet and TreeMap do not provide constant time retrieval).
Everybody else seems to be viewing this question in a broader scope (which is certainly valid). I am only answering this bit:
One of my challenges involves determining if a String is in an Array of Strings.
That's simple:
return Arrays.asList(arr).contains(str);
Reference:
Arrays.asList(array)
If you are doing this a lot, you can initially sort the array and do a binary search for your strings.
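A quick sketch of that (the example data is made up):

import java.util.Arrays;

String[] searchArray = { "pear", "apple", "orange" };
Arrays.sort(searchArray);   // O(n log n), done once after the array is filled

// Each lookup is then O(log n); a negative result means "not present".
boolean found = Arrays.binarySearch(searchArray, "apple") >= 0;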
As mentioned a HashMap or HashSet can provide reasonable performance above what you've mentioned. It depends greatly on how well distributed your hash algorithm is and how many buckets are in the Map.
You could also keep a sorted list and perform a binary search on that list which could perform slightly better, though you pay the cost of sorting. If it's a one time sort, then that's not a big deal. If the list is constantly changing, you may pay a larger cost.
Lastly, you could consider a Trie structure. I think this would be the fastest way to search, but that's a gut reaction. I don't have the numbers to support that.
As explained before, you can use a Set (see http://download.oracle.com/javase/1.5.0/docs/api/java/util/Set.html and especially the boolean contains(Object o) method) for that purpose. Here is a quick 'n' dirty example that demonstrates this:
String[] a = {"a", "2"};
Set<String> hashSet = new HashSet<String>();
Collections.addAll(hashSet, a);
System.out.println(hashSet.contains("a")); // Returns true
System.out.println(hashSet.contains("2")); // Returns true
System.out.println(hashSet.contains("e")); // Returns false
Hope this helps ;)
As Zach has pointed out, you can use a HashSet to prevent duplicates and use the contains method to search for a string, which returns true when a match is found. You also need to override equals in your class.
public boolean equals(Object other) {
    return other != null && other instanceof L && this.l == ((L) other).l;
}
If the search space (your collection of strings) is limited, then I agree with the answers already posted. If, however, you have a large collection of strings and need to perform a sufficient number of searches on it (to outweigh the setup overhead), you might also consider encoding the search strings in a trie data structure. Again, this would only be advantageous if there are enough strings and you search enough times to justify the setup overhead.