Searching a HashSet for any element in a string array

I have a HashSet of strings and an array of strings. I want to find out whether any of the elements in the array exists in the HashSet. I have the following code, which works, but I feel that it could be done faster.
public static boolean check(HashSet<String> group, String[] elements) {
    for (int i = 0; i < elements.length; i++) {
        if (group.contains(elements[i]))
            return true;
    }
    return false;
}
Thanks.

It's O(n) in the length of the array, since every element may have to be checked; it cannot be faster.
If you just want to make the code cleaner:
return !Collections.disjoint(group, Arrays.asList(elements));
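A minimal sketch of the one-liner above wrapped in a method, so it can be compared with the loop in the question. The class name DisjointCheck and the method name containsAny are made up for illustration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

class DisjointCheck {
    // Returns true if at least one element of the array is in the set.
    // Collections.disjoint returns true when the two collections share no element.
    static boolean containsAny(Set<String> group, String[] elements) {
        return !Collections.disjoint(group, Arrays.asList(elements));
    }

    public static void main(String[] args) {
        Set<String> group = new HashSet<>(Arrays.asList("a", "b", "c"));
        System.out.println(containsAny(group, new String[] {"x", "b"})); // true
        System.out.println(containsAny(group, new String[] {"x", "y"})); // false
    }
}
```

This is still a linear pass over the array; it is only shorter to write.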

That seems somewhat reasonable. HashSet has an O(1) (usually) contains() since it simply has to hash the string you give it to find an index, and there is either something there or there isn't.
If you need to check each element in your array, there simply isn't any faster way to do it (sequentially, of course).

... but I feel that it could be done faster.
I don't think there is a faster way. Your code is O(N) on average, where N is the number of strings in the array. I don't think that you can improve on that.

As others have said, the slowest part of the algorithm is iterating over every element of the array. The only way you could make it faster would be if you knew some information about the contents of the array beforehand which allowed you to skip over certain elements, like if the array was sorted and had duplicates in known positions or something. If the input is essentially random, then there's not a lot you can do.

If you know that the set is a sorted set and that the array is sorted, you can restrict the set to the interval between the array's first and last elements. That may do better than O(|array| * access-time(set)), and in particular it allows some negative results in less than O(|array|) time. If you're hashing, you can't.
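One way the sorted-set idea might look, assuming the set is a TreeSet (or any NavigableSet using natural ordering) and the array is sorted ascending. SortedOverlap and containsAny are illustrative names, not part of any library.

```java
import java.util.NavigableSet;

class SortedOverlap {
    // Assumes: group uses natural ordering and elements is sorted ascending.
    static boolean containsAny(NavigableSet<String> group, String[] elements) {
        if (group.isEmpty() || elements.length == 0) {
            return false;
        }
        String lo = elements[0];
        String hi = elements[elements.length - 1];
        // Only set members within [lo, hi] can possibly match an array element.
        NavigableSet<String> window = group.subSet(lo, true, hi, true);
        if (window.isEmpty()) {
            return false; // negative result without scanning the array
        }
        for (String e : elements) {
            if (window.contains(e)) {
                return true;
            }
        }
        return false;
    }
}
```

When the two ranges do not overlap, the answer comes from a couple of tree lookups instead of a scan. When they do overlap, this falls back to the same loop with O(log n) lookups.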

Related

Best way to remove one arraylist elements from another arraylist

What is the best-performing way in Java (7, 8) to remove the integer elements of one ArrayList from another? All the elements are unique in the first and second lists.
At the moment I know the API method removeAll and use it this way:
tempList.removeAll(tempList2);
The problem appears when I operate on ArrayLists that have more than 10000 elements. For example, when I remove 65000 elements, the delay is about 2 seconds. But I need to operate on even larger lists with more than 1000000 elements.
What is the strategy for this issue?
Maybe something in the new Stream API could solve it?

tl;dr:
Keep it simple. Use
list.removeAll(new HashSet<T>(listOfElementsToRemove));
instead.
As Eran already mentioned in his answer: The low performance stems from the fact that the pseudocode of a generic removeAll implementation is
public boolean removeAll(Collection<?> c) {
    for (each element e of this) {
        if (c.contains(e)) {
            this.remove(e);
        }
    }
}
So the contains call that is done on the list of elements to remove will cause the O(n*k) performance (where n is the number of elements to remove, and k is the number of elements in the list that the method is called on).
Naively, one could imagine that the this.remove(e) call on a List might also have O(k), and this implementation would also have quadratic complexity. But this is not the case: You mentioned that the lists are specifically ArrayList instances. And the ArrayList#removeAll method is implemented to delegate to a method called batchRemove that directly operates on the underlying array, and does not remove the elements individually.
So all you have to do is to make sure that the lookup in the collection that contains the elements to remove is fast - preferably O(1). This can be achieved by putting these elements into a Set. In the end, it can just be written as
list.removeAll(new HashSet<T>(listOfElementsToRemove));
Side notes:
The answer by Eran has IMHO two major drawbacks: First of all, it requires sorting the lists, which is O(n*logn) - and it's simply not necessary. But more importantly (and obviously) : Sorting will likely change the order of the elements! What if this is simply not desired?
Remotely related: There are some other subtleties involved in the removeAll implementations. For example, the HashSet removeAll method is surprisingly slow in some cases. Although this also boils down to O(n*n) behavior when the elements to be removed are stored in a list, the exact behavior may indeed be surprising in this particular case.
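A small sketch of the tl;dr recommendation from this answer, with sizes in the range the question mentions. FastRemoveAll and removeAllFast are hypothetical names used only here.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;

class FastRemoveAll {
    // Removes every element of toRemove from list, in place, keeping the order.
    // Wrapping toRemove in a HashSet makes each contains() call O(1) expected.
    static <T> void removeAllFast(List<T> list, List<T> toRemove) {
        list.removeAll(new HashSet<T>(toRemove));
    }

    public static void main(String[] args) {
        List<Integer> list = new ArrayList<>();
        List<Integer> toRemove = new ArrayList<>();
        for (int i = 0; i < 1_000_000; i++) {
            list.add(i);
            if (i % 2 == 0) {
                toRemove.add(i);
            }
        }
        removeAllFast(list, toRemove);
        System.out.println(list.size()); // 500000
    }
}
```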

Well, since removeAll checks for each element of tempList whether it appears in tempList2, the running time is proportional to the size of the first list multiplied by the size of the second list, which means O(N^2) unless one of the two lists is very small and can be considered as "constant size".
If, on the other hand, you pre-sort the lists, and then iterate over both lists with a single iteration (similar to the merge step in merge sort), the sorting will take O(NlogN) and the iteration O(N), giving you a total running time of O(NlogN). Here N is the size of the larger of the two lists.
If you can replace the lists by a sorted structure (perhaps a TreeSet, since you said the elements are unique), you can implement removeAll in linear time, since you won't have to do any sorting.
I haven't tested it, but something like this can work (assuming both tempList and tempList2 are sorted) :
Iterator<Integer> iter1 = tempList.iterator();
Iterator<Integer> iter2 = tempList2.iterator();
if (iter2.hasNext()) {
    Integer current2 = iter2.next();
    while (iter1.hasNext()) {
        Integer current = iter1.next();
        // Skip elements of tempList2 that are smaller than the current element.
        while (current2 < current && iter2.hasNext()) {
            current2 = iter2.next();
        }
        // Compare with equals: == on Integer objects compares references.
        if (current.equals(current2)) {
            iter1.remove();
        }
    }
}

I suspect removing from an ArrayList is a performance hit, since the list may have to be split or compacted after an element in the middle is removed. It may be faster to do this:
Create a Set of the elements to be removed.
Create a new result ArrayList, call it R. You can give it enough capacity at construction.
Iterate through the original list: if an element is found in the Set, don't add it to R; otherwise add it.
This should be O(N), assuming that building the Set is linear and a lookup in it takes constant time.
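The three steps above could be sketched roughly as follows. FilterCopy and subtract are names chosen for this example.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class FilterCopy {
    // Returns a new list with the elements of source that are not in toRemove.
    // The source list is left untouched and the original order is preserved.
    static List<Integer> subtract(List<Integer> source, Collection<Integer> toRemove) {
        Set<Integer> removeSet = new HashSet<>(toRemove);          // set of elements to drop
        List<Integer> result = new ArrayList<>(source.size());     // R, pre-sized
        for (Integer value : source) {
            if (!removeSet.contains(value)) {
                result.add(value);
            }
        }
        return result;
    }
}
```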

Is there any faster way to generate a List of N integers?

Well, I know it is a very novice question, but nothing is coming to mind. Currently I am trying this, but it is inefficient for such a big number. Can anyone help?
int count = 66000000;
LinkedList<Integer> list = new LinkedList<Integer>();
for (int i = 1; i <= count; i++) {
    list.add(i);
    //System.out.println(i);
}
EDIT:
Actually I have to perform an operation on the whole list (queue) repeatedly (say, remove some elements on a condition and add them again), so iterating over the whole list became very slow; with this many elements it took more than 10 minutes.

The size of your output is O(n), so it's impossible for any algorithm to populate your list in better than O(n) time.
You're spending a whole lot more time just printing your numbers to the screen than you actually are spending generating the list. If you really want to speed this code up, remove the
System.out.println(i);
On a separate note, I've noticed that you're using a LinkedList. If you used an array (or an array-based list), it should be faster.

You could implement a List where the get(int index) method simply returns the index (or some value based on the index). The creation of the list would then be constant time (O(1)). The list would have to be immutable.
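A possible shape for such a list, built on java.util.AbstractList, which is unmodifiable unless set, add and remove are overridden. RangeList is a made-up name, and the values 1..size mirror the loop in the question.

```java
import java.util.AbstractList;

class RangeList extends AbstractList<Integer> {
    private final int size;

    RangeList(int size) {
        if (size < 0) {
            throw new IllegalArgumentException("size must be non-negative");
        }
        this.size = size;
    }

    // Computes the value from the index instead of storing it.
    @Override
    public Integer get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index: " + index);
        }
        return index + 1;
    }

    @Override
    public int size() {
        return size;
    }
}
```

Constructing new RangeList(66000000) allocates no backing storage, so creation is O(1). Any attempt to add or remove throws UnsupportedOperationException.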

Your question isn't just about building the list, it includes deletion and re-insertion. I suspect you should be using a HashSet, maybe even a BitSet instead of a List of any kind.

How can I Subtract these lists faster?

I want to subtract two ArrayLists so that I get the elements that are not in the other list.
I do it this way:
removeIDs=(ArrayList<Integer>) storedIDs.clone();
removeIDs.removeAll(downloadedIDs);
downloadIDs=(ArrayList<Integer>) downloadedIDs.clone();
downloadIDs.removeAll(storedIDs);
The problem is that both lists contain 5000 elements and it takes almost 4 seconds on my Android phone.
Is there a faster way to do this?
Is using sets faster? (I don't have duplicate values in the lists.)
I am developing an Android app.

Use HashSet instead of ArrayList unless you need to keep the order.
Removing an element from a List requires a scan of the full list, whereas for a HashSet it is just the calculation of a hash code followed by identification of the target bucket.

Sets should be much faster. Right now, it's basically doing an n^2 loop. It loops over every element in removeIDs and checks to see if that id is in downloadedIDs, which requires searching the whole list. If downloadedIDs were stored in something faster for searching, like a HashSet, this would be much faster and become O(n) instead of O(n^2). There might also be something faster in the Collections API, but I don't know it.
If you need to preserve ordering, you can use a LinkedHashSet instead of a normal HashSet, but it will add some memory overhead and a bit of a performance hit for inserting/removing elements.
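A rough sketch of the set-based version, using LinkedHashSet so the original order survives. IdDiff and difference are illustrative names. The argument to removeAll is wrapped in a HashSet on purpose, because passing a List there can bring the linear contains() scans back.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;

class IdDiff {
    // Returns the elements of keep that are not in exclude, in keep's order.
    static Set<Integer> difference(Collection<Integer> keep, Collection<Integer> exclude) {
        Set<Integer> result = new LinkedHashSet<>(keep);
        result.removeAll(new HashSet<>(exclude)); // Set argument keeps lookups O(1) expected
        return result;
    }
}
```

With the question's variables this would be called as difference(storedIDs, downloadedIDs) for removeIDs and difference(downloadedIDs, storedIDs) for downloadIDs.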

I agree with the HashSet recommendation unless the Integer IDs fit in a relatively small range. In that case, I would benchmark using each of HashSet and BitSet, and actually use whichever is faster for your data in your environment.
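For the BitSet variant, a sketch under the assumption that all IDs are non-negative and fall in a reasonably small range (BitSet.set rejects negative indices). BitSetDiff is a hypothetical helper class. Note that the result comes back in ascending order, not in the original list order.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.Collection;
import java.util.List;

class BitSetDiff {
    // Assumes every id is >= 0.
    static BitSet toBitSet(Collection<Integer> ids) {
        BitSet bits = new BitSet();
        for (int id : ids) {
            bits.set(id);
        }
        return bits;
    }

    // Returns the ids in keep that are not in exclude, sorted ascending.
    static List<Integer> difference(Collection<Integer> keep, Collection<Integer> exclude) {
        BitSet bits = toBitSet(keep);
        bits.andNot(toBitSet(exclude)); // clears every bit that is set in exclude
        List<Integer> result = new ArrayList<>(bits.cardinality());
        for (int id = bits.nextSetBit(0); id >= 0; id = bits.nextSetBit(id + 1)) {
            result.add(id);
        }
        return result;
    }
}
```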

First of all, I apologize for the long answer. If I am wrong at any point, you are always welcome to correct me. Here I compare some options for solving the problem.
OPTION 1 < ArrayList >:
In your code you used the ArrayList.removeAll method, so let's look at its source code:
public boolean removeAll(Collection<?> c) {
return batchRemove(c, false);
}
So we need to know what is in the batchRemove method. The key part is this loop:
for (; r < size; r++)
    if (c.contains(elementData[r]) == complement)
        elementData[w++] = elementData[r];
Now let's look at the contains method, which is just a wrapper around the indexOf method. indexOf performs an O(n) scan (showing just a part here):
for (int i = 0; i < size; i++)
    if (elementData[i]==null)
        return i;
So overall, removeAll performs O(n^2) operations.
OPTION 2 < HashSet >:
Previously I wrote something here, but it seems I was wrong at some point, so I am removing it. Better to take suggestions from an expert about HashSet. I am not sure whether a HashMap would be a better solution in your case, so I am proposing another solution.
OPTION 3 < My suggestion, which you can try >:
step 1: if the list you will subtract (the second list) is already sorted, skip this step; otherwise sort it
step 2: for every element of the unsorted (first) list, run a binary search in the second list
step 3: if no match is found, store the element in a separate result list; if a match is found, don't add it
step 4: the result list is your final answer
Cost of option 3:
step 1: O(nlogn) time if not sorted
step 2: O(nlogn) time
step 3: O(n) space
So overall O(nlogn) time and O(n) space.
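Option 3 might be sketched like this. BinarySearchSubtract is an invented name, and the second list is copied before sorting so that the caller's list keeps its order (an extra choice that the steps above do not require).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class BinarySearchSubtract {
    // Returns the elements of first that do not occur in second, in first's order.
    static List<Integer> subtract(List<Integer> first, List<Integer> second) {
        List<Integer> sorted = new ArrayList<>(second);
        Collections.sort(sorted);                                  // step 1
        List<Integer> result = new ArrayList<>();
        for (Integer value : first) {                              // step 2
            // binarySearch returns a negative number when the key is absent
            if (Collections.binarySearch(sorted, value) < 0) {
                result.add(value);                                 // step 3
            }
        }
        return result;                                             // step 4
    }
}
```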

If a list is required, you can choose a LinkedList. In your case, as @Chris said, the ArrayList implementation will move all the elements on each removal.
With a LinkedList you would get much better performance for random adding/removing. See this post.

Efficient way to implement a String Array "is in" method using Java

I have a requirement to present highly structured information picked from a highly un-structured web service. In order to display the info correctly, I have to do a lot of String matches and duplicate removals to ensure I'm picking the right combination of elements.
One of my challenges involves determining if a String is in an Array of Strings.
My dream is to do "searchString.isIn(stringArray);" but I realize the String class doesn't provide for that.
Is there a more efficient way of doing this beyond this stub?:
private boolean isIn(String searchString, String[] searchArray)
{
    for (String singleString : searchArray)
    {
        if (singleString.equals(searchString))
            return true;
    }
    return false;
}
Thanks!

You may want to look into HashMap or HashSet, both of which give constant time retrieval, and it's as easy as going:
hashSet.contains(searchString)
Additionally, HashSet (and HashMap for its keys) prevents duplicate elements.
If you need to keep them in order of insertion, you can look into their Linked counterparts, and if you need to keep them sorted, TreeSet and TreeMap can help (note, however, that the TreeSet and TreeMap do not provide constant time retrieval).

Everybody else seems to be viewing this question in a broader scope (which is certainly valid). I am only answering this bit:
"One of my challenges involves determining if a String is in an Array of Strings."
That's simple:
return Arrays.asList(arr).contains(str);
Reference:
Arrays.asList(array)

If you are doing this a lot, you can initially sort the array and do a binary search for your strings.
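A sketch of the sort-once, search-many approach, assuming the array contains no null entries. SortedLookup is a name made up for the example.

```java
import java.util.Arrays;

class SortedLookup {
    private final String[] sorted;

    // Copies and sorts once: O(n log n). Assumes no null elements.
    SortedLookup(String[] values) {
        sorted = values.clone();
        Arrays.sort(sorted);
    }

    // Each lookup is O(log n) string comparisons.
    boolean isIn(String searchString) {
        return Arrays.binarySearch(sorted, searchString) >= 0;
    }
}
```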

As mentioned, a HashMap or HashSet can provide better performance than what you've shown. It depends greatly on how well distributed your hash algorithm is and how many buckets are in the Map.
You could also keep a sorted list and perform a binary search on that list which could perform slightly better, though you pay the cost of sorting. If it's a one time sort, then that's not a big deal. If the list is constantly changing, you may pay a larger cost.
Lastly, you could consider a Trie structure. I think this would be the fastest way to search, but that's a gut reaction. I don't have the numbers to support that.

As explained before, you can use a Set (see http://download.oracle.com/javase/1.5.0/docs/api/java/util/Set.html and especially the boolean contains(Object o) method) for that purpose. Here is a quick 'n dirty example that demonstrates this:
String[] a = {"a", "2"};
Set<String> hashSet = new HashSet<String>();
Collections.addAll(hashSet, a);
System.out.println(hashSet.contains("a")); // Prints true
System.out.println(hashSet.contains("2")); // Prints true
System.out.println(hashSet.contains("e")); // Prints false
Hope this helps ;)

As Zach has pointed out, you can use a HashSet to prevent duplicates and use its contains method to search for a string; it returns true when a match is found. If you store instances of your own class (L in the snippet below) rather than String, you also need to override equals in that class, and hashCode consistently with it.
public boolean equals(Object other) {
    return other != null && other instanceof L && this.l == ((L)other).l;
}

If the search space (your collection of strings) is limited, then I agree with the answers already posted. If, however, you have a large collection of strings and need to perform a sufficient number of searches on it (to outweigh the setup overhead), you might also consider encoding the search strings in a trie data structure. Again, this would only be advantageous if there are enough strings and you search enough times to justify the setup overhead.
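For reference, a bare-bones trie could look like the following. It is only a sketch, using a HashMap per node for simplicity; StringTrie and its methods are illustrative names, and whether it beats a HashSet in practice would need measuring.

```java
import java.util.HashMap;
import java.util.Map;

class StringTrie {
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        boolean terminal; // true if a stored string ends at this node
    }

    private final Node root = new Node();

    // Inserts s, creating one node per character as needed.
    void add(String s) {
        Node node = root;
        for (int i = 0; i < s.length(); i++) {
            node = node.children.computeIfAbsent(s.charAt(i), c -> new Node());
        }
        node.terminal = true;
    }

    // Walks the characters of s; a lookup costs O(length of s).
    boolean contains(String s) {
        Node node = root;
        for (int i = 0; i < s.length(); i++) {
            node = node.children.get(s.charAt(i));
            if (node == null) {
                return false;
            }
        }
        return node.terminal;
    }
}
```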

What is the fastest way to find an array within another array in Java?

Is there any equivalent of String.indexOf() for arrays? If not, is there any faster way to find an array within another other than a linear search?

Regardless of the elements of your arrays, I believe this is not much different than the string search problem.
This article provides a general intro to the various known algorithms.
Rabin-Karp and KMP might be your best options.
You should be able to find Java implementations of these algorithms and adapt them to your problem.
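As an illustration of adapting one of these algorithms, here is a sketch of KMP over int arrays. ArraySearch is a hypothetical class, and int[] is an assumption made to keep the comparison simple; for object arrays the == comparisons would become equals calls.

```java
class ArraySearch {
    // Returns the first index at which needle occurs in haystack, or -1.
    // An empty needle matches at index 0, mirroring String.indexOf("").
    static int indexOf(int[] haystack, int[] needle) {
        if (needle.length == 0) {
            return 0;
        }
        // failure[i] = length of the longest proper prefix of needle[0..i]
        // that is also a suffix of needle[0..i]
        int[] failure = new int[needle.length];
        for (int i = 1, k = 0; i < needle.length; i++) {
            while (k > 0 && needle[i] != needle[k]) {
                k = failure[k - 1];
            }
            if (needle[i] == needle[k]) {
                k++;
            }
            failure[i] = k;
        }
        // Scan the haystack once; k is the number of needle elements matched so far.
        for (int i = 0, k = 0; i < haystack.length; i++) {
            while (k > 0 && haystack[i] != needle[k]) {
                k = failure[k - 1];
            }
            if (haystack[i] == needle[k]) {
                k++;
            }
            if (k == needle.length) {
                return i - k + 1;
            }
        }
        return -1;
    }
}
```

The total work is O(haystack.length + needle.length), even on inputs with long repeated runs.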

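// Assumes myArray is a String[] (any mutually Comparable element type works).
// Note: sorting reorders myArray itself, so index refers to the sorted order.
List<String> list = Arrays.asList(myArray);
Collections.sort(list);
int index = Collections.binarySearch(list, find);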
OR
public static int indexOf(Object[][] array, Object[] find) {
    // length is a field on arrays, not a method
    for (int i = 0; i < array.length; i++) {
        if (Arrays.equals(array[i], find)) {
            return i;
        }
    }
    return -1;
}
OR
public static int indexOf(Object[] array, Object find) {
    // length is a field on arrays, not a method
    for (int i = 0; i < array.length; i++) {
        if (array[i].equals(find)) {
            return i;
        }
    }
    return -1;
}
OR
Object[] array = ...
int index = Arrays.asList(array).indexOf(find);

As far as I know, there is NO way to find an array within another without a linear search. String.indexOf uses a linear search, just inside a library.
You should write a little library called indexOf that takes two arrays, then you will have code that looks just like indexOf.
But no matter how you do it, it's a linear search under the covers.
edit:
After looking at @ahmadabolkader's answer I kind of take this back. Although it's still a linear search, it's not as simple as just "implement it" unless you are restricted to fairly small test sets/results.
The problem comes when you want to see if ...aaaaaaaaaaaaaaaaaab fits into a string of (x1000000)...aaaaaaaaab (in other words, strings that tend to match most places in the search string).
My thought was that as soon as you found a first character match you'd just check all subsequent characters one by one, but that performance would degrade terrifyingly when most of the characters matched most of the time. There was a rolling hash method in @a12r's answer that sounded much better if this is a real-world problem and not just an assignment.
I'm just going to vote for @a12r's answer because of those awesome Wikipedia references.

The short answer is no - there is no faster way to find an array within an array by using some existing construct in Java. Based on what you described, consider creating a HashSet of arrays instead of an array of arrays.

Normally the way you find things in collections in Java is one of:
(1) put them in a hashmap (dictionary) and look them up by their hash.
(2) loop through each object and test its equality.
(1) won't work for you because an array object's hash won't tell you that the contents are the same. You could write some sort of wrapper that would create a hashcode based on the contents (you'd also have to make sure equals returned values consistent with that).
(2) also will require a bit of work because object equality for arrays will only test that the objects are the same. You'd need to wrap the arrays with a test of the contents.
So basically, not unless you write it yourself.
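The wrapper idea could be sketched as below, with hashCode and equals both based on the array contents via java.util.Arrays. ArrayKey is a made-up name, and the defensive copy in the constructor is an added choice so that later changes to the caller's array cannot alter the key's hash.

```java
import java.util.Arrays;

final class ArrayKey {
    private final Object[] values;

    ArrayKey(Object[] values) {
        this.values = values.clone(); // copy so the key stays stable
    }

    // Two keys are equal when their arrays have equal elements in the same order.
    @Override
    public boolean equals(Object other) {
        return other instanceof ArrayKey
                && Arrays.equals(values, ((ArrayKey) other).values);
    }

    // Content-based hash, consistent with equals.
    @Override
    public int hashCode() {
        return Arrays.hashCode(values);
    }
}
```

Instances can then be stored in a HashSet<ArrayKey>, and contains(new ArrayKey(someArray)) finds arrays with equal contents.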

Do you mean you have an array whose elements are themselves arrays? If that is the case and the elements are sorted, you might be able to use binarySearch from java.util.Arrays.
