The Question:
Write a function that, given an array A of N integers, returns the smallest positive integer (greater than 0) that does not occur in A.
For example, given A = [1, 3, 6, 4, 1, 2], the function should return 5.
Given A = [1, 2, 3], the function should return 4.
Given A = [−1, −3], the function should return 1.
Write an efficient algorithm for the following assumptions:
N is an integer within the range [1..100,000];
each element of array A is an integer within the range [−1,000,000..1,000,000].
What I don't understand is how I'm failing these two cases while also having 100% correctness.
Also, I'm not sure which cases I'm not covering: I handle the case of 1 not existing in the first if statement, a missing element in my for-each loop, and finally, the case of no missing element by returning the largest int + 1.
Any help clearing up the confusion (and improvements for my time complexity) would be greatly appreciated.
import java.lang.*;
import java.util.*;

class Solution {
    public int solution(int[] A) {
        // write your code in Java SE 8
        //int counter = 0;
        Set<Integer> set = new HashSet<Integer>();
        Arrays.sort(A);
        for (int n : A) {
            if (n > 0) {
                // counter++;
                set.add(n);
            }
        }
        // returns if set does not contain 1
        if (!set.contains(1)) {
            return 1;
        }
        // returns missing int if it does not exist in set
        for (int n : set) {
            if (!set.contains(n + 1)) {
                return n + 1;
            }
        }
        // if no missing ints, returns end of array + 1
        return A[A.length - 1];
    }
}
Answers:
I think you over-killed. First, using a HashSet can give you an O(N) algorithm, but only if you avoid the sort (O(N log N)); doing both buys you nothing. And given the HashSet's larger memory footprint, I'd rather go with the sort alone, which is practically faster. Second, once A is sorted, a single for-loop can detect the first gap in the positive part of the array, and that gap is your result. So drop the HashSet and the extra loops in your code.
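A sketch of the sort-and-scan approach this answer describes (class and method names are mine, not from the original): sort once, then walk the array tracking the smallest positive value not yet seen.

```java
import java.util.Arrays;

class SortSolution {
    // Sort, then scan the positive part of the array for the first gap.
    static int solution(int[] A) {
        Arrays.sort(A);          // O(N log N)
        int expected = 1;        // smallest positive candidate so far
        for (int n : A) {
            if (n == expected) {
                expected++;      // candidate found, look for the next one
            } else if (n > expected) {
                break;           // gap found: expected is missing
            }
            // n < expected: negative, zero, or duplicate -- skip
        }
        return expected;
    }

    public static void main(String[] args) {
        System.out.println(solution(new int[]{1, 3, 6, 4, 1, 2})); // 5
        System.out.println(solution(new int[]{1, 2, 3}));          // 4
        System.out.println(solution(new int[]{-1, -3}));           // 1
    }
}
```

This also fixes the "no gap" case for free: if no gap is found, `expected` ends up as the largest value plus one.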
I would use a BitSet.
Advantage: Fast scan for missing value. Performance: O(n)
public static int solution(int... values) {
    BitSet bitSet = new BitSet();
    for (int v : values)
        if (v > 0)
            bitSet.set(v);
    return bitSet.nextClearBit(1);
}
Test
System.out.println(solution(1, 3, 6, 4, 1, 2));
System.out.println(solution(1, 2, 3));
System.out.println(solution(-1, -3));
Output
5
4
1
I'm trying to take a String, i.e. $!#, convert each individual symbol to its corresponding ASCII value, it here being 36, 33, 35 and then ending up with an array of integers with each single number stored in a separate index, meaning 3, 6, 3, 3, 3, 5.
In short: From $!# to $,!,# to 36, 33, 35 to 3, 6, 3, 3, 3, 5
Since I am using Processing (a wrapper for Java), my SO research got me this far:
String str = "$!#";
byte[] b = str.getBytes();
for (int i = 0; i < b.length; i++) {
    println(b[i]);
}
I'm ending up with the ASCII values. The byte array b contains [36, 33, 35].
Now if those were to be Strings instead of bytes, I would use String.valueOf(b) to get 363335 and then split the whole thing again into a single digit integer array.
I wonder if this approach is unnecessarily complicated and can be done with less conversion steps. Any suggestions?
Honestly, if I were you, I wouldn't worry too much about "efficiency" until you have an actual problem. Write code that you understand. If you have a problem with efficiency, you'll have to define exactly what you mean: is it taking too many steps? Is it taking too long? What is the size of your input?
Note that the approach you outlined and the approach below are both O(N), which is probably the best you're going to get.
If you have a specific bottleneck inside that O(N), then you should do some profiling to figure out where it is. But you shouldn't bother with micro-optimizations (or, shudder, premature optimizations) just because you "feel" like something is "inefficient".
To quote from this answer:
...in the absence of measured performance issues you shouldn't optimize because you think you will get a performance gain.
If I were you, I would just use the charAt() function to get each char in the String, and then use the int() function to convert that char into an int. Then you could add those int values to a String (or better yet, a StringBuilder):
String str = "$!#";
StringBuilder digits = new StringBuilder();
for (int i = 0; i < str.length(); i++) {
    int c = int(str.charAt(i));
    digits.append(c);
}
Then you could use the split() function to split them into individual digits:
String[] digitArray = digits.toString().split("");
for (String d : digitArray) {
    println(d);
}
Prints:
3
6
3
3
3
5
There are a ton of different ways to do this: you could use the toCharArray() function instead of charAt(), for example. But none of this is going to be any more or less "efficient" until you define exactly what you mean by efficiency.
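For instance, a toCharArray() variant in plain Java might look like this (class and method names are mine; it uses a cast instead of Processing's int() conversion):

```java
import java.util.Arrays;

class AsciiDigits {
    static int[] digits(String str) {
        StringBuilder sb = new StringBuilder();
        for (char c : str.toCharArray()) {
            sb.append((int) c);             // append the decimal ASCII value, e.g. '$' -> 36
        }
        int[] result = new int[sb.length()];
        for (int i = 0; i < sb.length(); i++) {
            result[i] = sb.charAt(i) - '0'; // split "363335" into single digits
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(digits("$!#"))); // [3, 6, 3, 3, 3, 5]
    }
}
```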
Java 8 stream solution:
int[] digits = "$!#".chars()
        .mapToObj(Integer::toString)
        .flatMapToInt(CharSequence::chars)
        .map(c -> c - '0')
        .toArray();
System.out.println(Arrays.toString(digits)); // prints [3, 6, 3, 3, 3, 5]
One solution could be doing something like that
// Create an array of int of size 3 * str.length()
for (int i = 0; i < str.length(); i++)
{
    int n = (int) str.charAt(i);
    int a = n / 100;
    int b = (n - (a * 100)) / 10;
    int c = n - (a * 100) - (b * 10);
    // push a, b and c into your array (choose the order and whether
    // you want to keep the potential 0's or not)
}
But as others said before, I'm not sure it's worth playing like that. It depends on your application, I guess.
Is there a way of using OR in java without repeating the variable in the condition:
For example
if(x != 5 || x != 10 || x != 15)
instead use something like
if x not in [5,10,15]
You can take advantage of short-circuit terminal operations in Java8 streams (See the Stream Operations section in the Stream Package Description for details.) For example:
int x = 2;

// OR operator
boolean bool1 = Stream.of(5, 10, 15).anyMatch(i -> i == x);
System.out.println("Bool1: " + bool1);

// AND operator
boolean bool2 = Stream.of(5, 10, 15).allMatch(i -> i != x);
System.out.println("Bool2: " + bool2);
It produces the following output:
Bool1: false
Bool2: true
Your second example is just storing elements in an array and checking for existence, no need for OR in that case.
If you want to go that route, store your elements in a List<Integer> and use contains(Object o)
if(!myList.contains(x))
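For example, a minimal sketch (variable names are mine):

```java
import java.util.Arrays;
import java.util.List;

class ContainsDemo {
    public static void main(String[] args) {
        // The excluded values live in one place instead of a chain of ORs.
        List<Integer> excluded = Arrays.asList(5, 10, 15);
        int x = 2;
        if (!excluded.contains(x)) {
            System.out.println(x + " is not in the list");
        }
    }
}
```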
If you want to exclude all multiples of 5, you can try this:
if(x % 5 != 0)
Here's a solution that comes with much less performance overhead than the ones using Streams or a List:
Store the values you want to check against in an array, and write a small function to check whether x is among these values.
private static boolean isNotIn(int x, int[] values) {
    for (int i : values) {
        if (i == x) {
            return false;
        }
    }
    return true;
}

if (isNotIn(x, new int[] {5, 10, 15})) {
    ...
}
The overhead is minimal compared to the original code and becomes negligible if you can pull out the array as a constant.
I also find that it is nicer to read than the other solutions but that is a very subjective opinion ;-)
I am trying to make my hashCode() implementation return the same hash code for all permutations of an int array.
Requirements:
If array A is a permutation of B, then A.equals(B) must yield true.
Two arrays that are permutations of one another must return the same hashCode.
I have already written the functions to provide me with all the permutations (rotations and reflections) of the arrays, but I don't really understand how I can make them all return the same code, since there are no object properties to base the code upon.
What I've tried so far is gathering the Arrays.hashCode() of all the permutations, summing them up into a long, dividing by the number of permutations, and returning the result as an int.
While this doesn't feel like a good solution (using the mean is quite arbitrary and might well produce collisions), it's not working anyway: I've found objects that are not among the valid permutations yet return the same hashCode.
Example 1: Reflection
These two are equal because arr2 is a reflection of arr1.
int[] arr1 = {0,2,4,1,3} int[] arr2 = {4,2,0,3,1}
[X 0 0 0 0] [0 0 0 0 X]
[0 0 X 0 0] [0 0 X 0 0]
[0 0 0 0 X] [X 0 0 0 0]
[0 X 0 0 0] [0 0 0 X 0]
[0 0 0 X 0] [0 X 0 0 0]
Example 2: Rotation
These two are permutations of each other because arr2 is a rotated arr1.
int[] arr1 = {0,2,4,1,3} int[] arr2 = {4,1,3,0,2}
[X 0 0 0 0] [0 0 0 0 X]
[0 0 X 0 0] [0 X 0 0 0]
[0 0 0 0 X] [0 0 0 X 0]
[0 X 0 0 0] [X 0 0 0 0]
[0 0 0 X 0] [0 0 X 0 0]
Q: How can I implement a hashCode() function that returns the same hash for every array that is a permutation of another, one that would return the same hashCode for all of the above examples?
Update:
The reason I cannot sort and compare the arrays is that all arrays that will ever be compared will all contain the values 0..n-1. The reason for this is that the index represents the chessboard row, while the value represents the column at which the queen is placed. (See n queens puzzle if you're interested).
Hence, I am unable to calculate the hashcode by sorting first. Any other ideas?
You could simply sort the array, and then use Arrays.hashCode() to compute the hashCode.
Your collection looks like a Bag, or MultiSet. Several libraries have implementations for such a data structure. Guava for example.
The simplest way to do it would be to sum all the values in the array and then use a bit mixer to spread the bits in the result. The sum of all the values will be the same regardless of the order, so you're guaranteed that any permutation of the array will result in the same value.
For example:
int hash = 0;
for (int i = 0; i < array.length; ++i)
{
    hash += array[i];
}
// See link below for reference
hash ^= (hash >>> 20) ^ (hash >>> 12);
return hash ^ (hash >>> 7) ^ (hash >>> 4);
I got the bit mixer code from http://burtleburtle.net/bob/hash/integer.html. That page is full of good information that you probably want to know.
You might also consider working in the array length if it's different among the arrays you'll be comparing. You could also multiply the result by the highest (or lowest) value in the array, etc. Anything that would help you differentiate.
Create a class that wraps your array.
The hashCode method needs to perform an operation that is commutative, so that different permutations have the same hash code. Compute a hash code that is the sum of the elements in the array. The sum will not change if the order changes.
You should override equals also.
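A minimal sketch of such a wrapper (the class name is mine). The sum-based hashCode is order-independent; the equals shown here treats any reordering as equal, so for the rotation/reflection-only equality in the question (where every board holds 0..n-1) it would have to be replaced with a stricter check:

```java
import java.util.Arrays;

class PermutationKey {
    private final int[] values;

    PermutationKey(int[] values) {
        this.values = values;
    }

    @Override
    public int hashCode() {
        int sum = 0;
        for (int v : values) {
            sum += v;   // addition is commutative, so element order doesn't matter
        }
        return sum;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof PermutationKey)) return false;
        // Compare sorted copies, so any permutation of the same values is equal.
        int[] a = values.clone();
        int[] b = ((PermutationKey) o).values.clone();
        Arrays.sort(a);
        Arrays.sort(b);
        return Arrays.equals(a, b);
    }

    public static void main(String[] args) {
        PermutationKey a = new PermutationKey(new int[]{0, 2, 4, 1, 3});
        PermutationKey b = new PermutationKey(new int[]{4, 2, 0, 3, 1});
        System.out.println(a.hashCode() == b.hashCode()); // true
        System.out.println(a.equals(b));                  // true
    }
}
```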
Sort the arrays before you calculate the hash or compare them within equals.
Based on your description, it sounds like you're doing a brute-force solution to the N-queens problem, whereby you generate every possible position of the queens on the board, eliminate reflections/rotations so you're left with all unique board layouts, and then search for acceptable layouts. As mentioned in the other answers, you cannot rely on hashCode alone for duplicate elimination, because even a well-written hash function has the possibility of collisions.
Instead, I'd suggest defining a canonical layout for a given equivalence set of rotations/reflections. One likely approach would be to define a sort ordering for your layouts, doing a pairwise comparison of elements, until you find an unequal position. The canonical representation for a given layout would be the layout with the lowest ordering.
Then, as you generate your layouts, the first thing you do is get the canonical representation of that layout, and only proceed if you've not yet seen the canonical version. For instance:
public class Chessboard implements Comparable<Chessboard> {
    private int[] rows;

    public boolean equals(Object other) {
        return other instanceof Chessboard &&
               Arrays.equals(rows, ((Chessboard) other).rows);
    }

    public int hashCode() {
        return Arrays.hashCode(rows);
    }

    public int compareTo(Chessboard other) {
        if (rows.length != other.rows.length) {
            return rows.length - other.rows.length;
        }
        for (int i = 0; i < rows.length; i++) {
            int c = rows[i] - other.rows[i];
            if (c != 0) return c;
        }
        return 0;
    }

    public List<Chessboard> getPermutations() {
        /* Your permutations code here. */
    }

    public Chessboard getCanonicalLayout() {
        List<Chessboard> permutations = getPermutations();
        Collections.sort(permutations);
        return permutations.get(0);
    }

    public static void main(String[] args) {
        Set<Chessboard> checked = new HashSet<Chessboard>();
        for (Chessboard b : getAllLayouts()) {
            Chessboard c = b.getCanonicalLayout();
            if (checked.contains(c)) {
                continue;
            }
            checked.add(c);
            if (isSolution(c)) {
                System.out.println("Found a solution: " + c);
            }
        }
    }
}
There is a list L. It contains elements of arbitrary type.
How do you delete all duplicate elements in such a list efficiently? ORDER must be preserved.
Just an algorithm is required, so importing any external library is not allowed.
Related questions
In Python, what is the fastest algorithm for removing duplicates from a list so that all elements are unique while preserving order?
How do you remove duplicates from a list in Python whilst preserving order?
Removing duplicates from list of lists in Python
How do you remove duplicates from a list in Python?
Assuming order matters:
Create an empty set S and an empty list M.
Scan the list L one element at a time.
If the element is in the set S, skip it.
Otherwise, add it to M and to S.
Repeat for all elements in L.
Return M.
In Python:
>>> L = [2, 1, 4, 3, 5, 1, 2, 1, 1, 6, 5]
>>> S = set()
>>> M = []
>>> for e in L:
...     if e in S:
...         continue
...     S.add(e)
...     M.append(e)
...
>>> M
[2, 1, 4, 3, 5, 6]
If order does not matter:
M = list(set(L))
Special Case: Hashing and Equality
Firstly, we need to pin down the assumptions, namely the existence of an equals and hash function relationship. What do I mean by this? I mean that for the set of source objects S, given any two objects x1 and x2 that are elements of S, there exists a (hash) function F such that:
if (x1.equals(x2)) then F(x1) == F(x2)
Java has such a relationship. That allows you to check for duplicates as a near-O(1) operation and thus reduces the algorithm to a simple O(n) problem. If order is unimportant, it's a simple one-liner:
List result = new ArrayList(new HashSet(inputList));
If order is important:
List outputList = new ArrayList();
Set set = new HashSet();
for (Object item : inputList) {
    if (!set.contains(item)) {
        outputList.add(item);
        set.add(item);
    }
}
You will note that I said "near O(1)". That's because such data structures (as a Java HashMap or HashSet) rely on a method where a portion of the hash code is used to find an element (often called a bucket) in the backing storage. The number of buckets is a power-of-2. That way the index into that list is easy to calculate. hashCode() returns an int. If you have 16 buckets you can find which one to use by ANDing the hashCode with 15, giving you a number from 0 to 15.
When you try and put something in that bucket it may already be occupied. If so, a linear comparison of all entries in that bucket will occur. If the collision rate gets too high, or you put too many elements in, the structure will be grown, typically doubled (but always to a power of 2), and all the items are placed in their new buckets (based on the new mask). Thus resizing such structures is relatively expensive.
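As a toy illustration of the bucket-index computation described above (the class name is mine):

```java
class BucketDemo {
    public static void main(String[] args) {
        int buckets = 16;                      // always a power of two
        int hash = "example".hashCode();
        // ANDing with (buckets - 1) keeps the low bits, i.e. a value in 0..15.
        int index = hash & (buckets - 1);
        System.out.println("bucket " + index);
    }
}
```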
Lookup may also be expensive. Consider this class:
public class A {
    private final int a;

    A(int a) { this.a = a; }

    public boolean equals(Object ob) {
        if (ob == null || ob.getClass() != getClass()) return false;
        A other = (A) ob;
        return other.a == a;
    }

    public int hashCode() { return 7; }
}
This code is perfectly legal and it fulfills the equals-hashCode contract.
Assuming your set contains nothing but A instances, your insertion/search now turns into an O(n) operation, turning the entire insertion into O(n^2).
Obviously this is an extreme example but it's useful to point out that such mechanisms also rely on a relatively good distribution of hashes within the value space the map or set uses.
Finally, it must be said that this is a special case. If you're using a language without this kind of "hashing shortcut" then it's a different story.
General Case: No Ordering
If no ordering function exists for the list, then you're stuck with an O(n^2) brute-force comparison of every object to every other object. So in Java:
List result = new ArrayList();
for (Object item : inputList) {
    boolean duplicate = false;
    for (Object ob : result) {
        if (ob.equals(item)) {
            duplicate = true;
            break;
        }
    }
    if (!duplicate) {
        result.add(item);
    }
}
General Case: Ordering
If an ordering function exists (as it does with, say, a list of integers or strings) then you sort the list (which is O(n log n)) and then compare each element in the list to the next (O(n)) so the total algorithm is O(n log n). In Java:
Collections.sort(inputList);
List result = new ArrayList();
Object prev = null;
for (Object item : inputList) {
    if (!item.equals(prev)) {
        result.add(item);
    }
    prev = item;
}
Note: the above examples assume no nulls are in the list.
If the order does not matter, you might want to try this algorithm written in Python:
>>> array = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6]
>>> unique = set(array)
>>> list(unique)
[1, 2, 3, 4, 5, 6]
In Haskell this is covered by the nub and nubBy functions:
nub :: Eq a => [a] -> [a]
nub [] = []
nub (x:xs) = x : nub (filter (/= x) xs)

nubBy :: (a -> a -> Bool) -> [a] -> [a]
nubBy f [] = []
nubBy f (x:xs) = x : nubBy f (filter (not . f x) xs)
nubBy relaxes the dependence on the Eq typeclass, instead allowing you to define your own equality function to filter duplicates.
These functions work over a list of a consistent arbitrary type (e.g. [1,2,"three"] is not allowed in Haskell), and they are both order-preserving.
In order to make this more efficient, using Data.Map (or implementing a balanced tree) could be used to gather the data into a set (key being the element, and value being the index into the original list in order to be able to get the original ordering back), then gathering the results back into a list and sorting by index. I will try and implement this later.
import qualified Data.Map as Map
undup x = go x Map.empty
  where
    go [] _ = []
    go (x:xs) m = case Map.lookup x m of
        Just _  -> go xs m
        Nothing -> x : go xs (Map.insert x True m)
This is a direct translation of @FogleBird's solution. Unfortunately it doesn't work without the import.
A very basic attempt at replacing the Data.Map import would be to implement a tree, something like this:
data Tree a = Empty
            | Node a (Tree a) (Tree a)
            deriving (Eq, Show, Read)

insert x Empty = Node x Empty Empty
insert x (Node a left right)
    | x < a     = Node a (insert x left) right
    | otherwise = Node a left (insert x right)

lookup x Empty = Nothing -- returning Maybe type to maintain compatibility with Data.Map
lookup x (Node a left right)
    | x == a    = Just x
    | x < a     = lookup x left
    | otherwise = lookup x right
An improvement would be to make it auto-balancing on insert by maintaining a depth attribute (this keeps the tree from degrading into a linked list). The nice thing about this over a hash table is that it only requires your type to be in the typeclass Ord, which is easily derivable for most types.
I take requests, it seems. In response to @Jonno_FTW's inquiry, here is a solution which completely removes duplicates from the result. It's not entirely dissimilar to the original, simply adding an extra case. However, the runtime performance will be much slower, since you go through each sub-list twice: once for the elem, and a second time for the recursion. Also note that it will no longer work on infinite lists.
nub [] = []
nub (x:xs) | elem x xs = nub (filter (/= x) xs)
           | otherwise = x : nub xs
Interestingly enough you don't need to filter on the second recursive case because elem has already detected that there are no duplicates.
In Python
>>> L = [2, 1, 4, 3, 5, 1, 2, 1, 1, 6, 5]
>>> a = []
>>> for i in L:
...     if i not in a:
...         a.append(i)
...
>>> print a
[2, 1, 4, 3, 5, 6]
>>>
In Java, it's a one-liner.
Set set = new LinkedHashSet(list);
will give you a collection with duplicate items removed.
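Wrapped back into a List, a complete (hypothetical) example might look like:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

class DedupDemo {
    public static void main(String[] args) {
        List<Integer> list = Arrays.asList(2, 1, 4, 3, 5, 1, 2, 1, 1, 6, 5);
        // LinkedHashSet drops duplicates but keeps first-seen order.
        List<Integer> unique = new ArrayList<>(new LinkedHashSet<>(list));
        System.out.println(unique); // [2, 1, 4, 3, 5, 6]
    }
}
```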
For Java, you could go with this:
private static <T> void removeDuplicates(final List<T> list)
{
    final LinkedHashSet<T> set;

    set = new LinkedHashSet<T>(list);
    list.clear();
    list.addAll(set);
}
Delete duplicates in a list in place in Python.
Case: Items in the list are not hashable or comparable
That is, we can't use set (dict) or sort.
from itertools import islice

def del_dups2(lst):
    """O(n**2) algorithm, O(1) in memory"""
    pos = 0
    for item in lst:
        if all(item != e for e in islice(lst, pos)):
            # we haven't seen `item` yet
            lst[pos] = item
            pos += 1
    del lst[pos:]
Case: Items are hashable
Solution is taken from here:
def del_dups(seq):
    """O(n) algorithm, O(log(n)) in memory (in theory)."""
    seen = {}
    pos = 0
    for item in seq:
        if item not in seen:
            seen[item] = True
            seq[pos] = item
            pos += 1
    del seq[pos:]
Case: Items are comparable, but not hashable
That is, we can use sort. This solution doesn't preserve the original order.
def del_dups3(lst):
    """O(n*log(n)) algorithm, O(1) memory"""
    lst.sort()
    it = iter(lst)
    for prev in it:  # get the first element
        break
    pos = 1  # start from the second element
    for item in it:
        if item != prev:  # we haven't seen `item` yet
            lst[pos] = prev = item
            pos += 1
    del lst[pos:]
go through the list and assign a sequential index to each item
sort the list based on some comparison function for the elements
remove duplicates
sort the list based on the assigned indices
For simplicity, the indices for the items may be stored in something like std::map.
Looks like O(n*log n) if I haven't missed anything.
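A sketch of those steps in Java (class and method names are mine), assuming comparable elements and using a TreeMap in place of std::map to remember the first index of each value:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class IndexPairDedup {
    static List<Integer> dedup(List<Integer> input) {
        // Steps 1-3: the sorted map keeps one entry per distinct value,
        // remembering the first index at which that value occurred.
        Map<Integer, Integer> firstIndex = new TreeMap<>();
        for (int i = 0; i < input.size(); i++) {
            firstIndex.putIfAbsent(input.get(i), i);
        }
        // Step 4: sort the surviving values back into original order by index.
        List<Map.Entry<Integer, Integer>> entries = new ArrayList<>(firstIndex.entrySet());
        entries.sort(Map.Entry.comparingByValue());
        List<Integer> result = new ArrayList<>();
        for (Map.Entry<Integer, Integer> e : entries) {
            result.add(e.getKey());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(dedup(Arrays.asList(2, 1, 4, 3, 5, 1, 2, 1, 1, 6, 5)));
        // [2, 1, 4, 3, 5, 6]
    }
}
```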
It depends on what you mean by "efficiently". The naive algorithm is O(n^2), and I assume what you actually mean is that you want something of lower order than that.
As Maxim100 says, you can preserve the order by pairing the list with a series of numbers, use any algorithm you like, and then resort the remainder back into their original order. In Haskell it would look like this:
superNub :: (Ord a) => [a] -> [a]
superNub xs = map snd
            . sortBy (comparing fst)
            . map head
            . groupBy ((==) `on` snd)
            . sortBy (comparing snd)
            . zip [1..] $ xs
Of course, you need to import sortBy and groupBy from Data.List, on from Data.Function, and comparing from Data.Ord. I could just recite the definitions of those functions, but what would be the point?
I've written an algorithm for strings; the same idea works for any element type.
static string removeDuplicates(string str)
{
    if (String.IsNullOrEmpty(str) || str.Length < 2) {
        return str;
    }

    char[] arr = str.ToCharArray();
    int len = arr.Length;
    int pos = 1;

    for (int i = 1; i < len; ++i) {
        int j;
        for (j = 0; j < pos; ++j) {
            if (arr[i] == arr[j]) {
                break;
            }
        }
        if (j == pos) {
            arr[pos] = arr[i];
            ++pos;
        }
    }

    return new String(arr, 0, pos);
}
One-line solution in Python (Python 2, since zip returns a list there).
Using a list comprehension:

>>> L = [2, 1, 4, 3, 5, 1, 2, 1, 1, 6, 5]
>>> M = []
>>> zip(*[(e, M.append(e)) for e in L if e not in M])[0]
(2, 1, 4, 3, 5, 6)
Maybe you should look into using associative arrays (aka dict in Python) to avoid having duplicate elements in the first place.
My code in Java:
ArrayList<Integer> list = new ArrayList<Integer>();
list.addAll(Arrays.asList(1, 2, 1, 3, 4, 5, 2, 3, 4, 3));

for (int i = 0; i < list.size(); i++)
{
    for (int j = i + 1; j < list.size(); j++)
    {
        if (list.get(i).equals(list.get(j)))
        {
            list.remove(j);
            j--;
        }
    }
}

or simply do this:

Set<Integer> unique = new LinkedHashSet<Integer>();
unique.addAll(list);
Both ways have Time = nk ~ O(n^2),
where n is the size of the input list,
and k is the number of unique members of the input list.
Algorithm delete_duplicates(a[1..n])
// Remove duplicates from the given array
// input parameters: a[1:n], an array of n elements
{
    temp[1:n]; // an array of n elements
    for i = 1 to n:
        temp[i].value = a[i]
        temp[i].key = i
    // based on 'value', sort the array temp
    // based on 'value', delete duplicate elements from temp
    // based on 'key', sort the array temp
    // construct an array p using temp
    p[i] = temp[i].value
    return p
}
The order of the elements is maintained in the output array using the 'key'. The key is of length O(n), and sorting on the key and on the value each takes O(n log n), so the time taken to delete all duplicates from the array is O(n log n).
A generic solution close to the accepted answer:

def remove_duplicates(k):
    m = []  # indices of duplicate occurrences
    for i in range(len(k)):
        for j in range(i, len(k) - 1):
            if k[i] == k[j + 1]:
                m.append(j + 1)
    l = list(dict.fromkeys(m))  # dedupe the collected indices
    l.sort(reverse=True)        # pop from the back so indices stay valid
    for i in l:
        k.pop(i)
    return k

k = ['apple', 'orange', 'orange', 'grapes', 'apple', 'apple', 'apple']
print(remove_duplicates(k))