Data Structure to cache most frequent elements

Suppose I read a stream of integers. The same integer may appear more than once in the stream. Now I would like to keep a cache of N integers that appeared most frequently. The cache is sorted by the frequency of the stream elements.
How would you implement it in Java?

You want to use a binary indexed tree. The code in the link is for C++ and should be fairly straightforward to convert into Java (AFAICT the code would be the same):
Paper Peter Fenwick
Implementation in C++

Use a Guava Multiset and sort it by frequency

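public class MyData implements Comparable<MyData> {
    public int frequency = 0;
    public Integer data;
    @Override
    public int compareTo(MyData that) {
        // orders by ascending frequency
        return Integer.compare(this.frequency, that.frequency);
    }
}
Store it in a PriorityQueue. Note that the queue does not reorder an element whose frequency changes while it is queued, so remove the element, update it, and add it again.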
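A minimal sketch of the counting-plus-PriorityQueue idea, assuming the counts for all seen values fit in memory. The class name FrequencyCache and its methods add and top are made up for illustration. Instead of mutating objects inside the queue, it keeps the counts in a HashMap and builds a small min-heap only when the top N is requested:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical helper: counts stream elements and reports the n most frequent.
class FrequencyCache {
    private final Map<Integer, Integer> counts = new HashMap<>();

    // Record one occurrence of value.
    void add(int value) {
        counts.merge(value, 1, Integer::sum);
    }

    // Returns up to n values, most frequent first (order among ties is unspecified).
    List<Integer> top(int n) {
        // Min-heap on the count: the least frequent candidate sits at the head.
        PriorityQueue<Map.Entry<Integer, Integer>> heap =
                new PriorityQueue<>((a, b) -> Integer.compare(a.getValue(), b.getValue()));
        for (Map.Entry<Integer, Integer> entry : counts.entrySet()) {
            heap.offer(entry);
            if (heap.size() > n) {
                heap.poll(); // drop the least frequent candidate
            }
        }
        List<Integer> result = new ArrayList<>();
        while (!heap.isEmpty()) {
            result.add(heap.poll().getKey()); // ascending frequency
        }
        Collections.reverse(result);          // most frequent first
        return result;
    }
}
```

Each add is O(1) on average and each top(n) call is O(m log n) for m distinct values, so this suits streams where the top list is read much less often than values arrive.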

Create an object model for the int, inside create a Count property. Create a SortedVector collection extending the Vector collection. Each time an integer occurs, add it to the vector if it doesn't exist. Else, find it, update the count property += 1, then call Collections.sort(this) within your Vector.

Do you know the range of the numbers? If so, it might make sense to use an array. For example, if I knew that the numbers were in the range 0 to 9, I would make an array of size 10. Each element in this array would count the number of times I've seen a given number. Then, you just have to remember the most frequently seen number.
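e.g.
int[] array = new int[10]; // array[n] counts how often n has been seen
int freqIndex = -1;        // most frequently seen number so far
int freqCount = -1;        // how often that number has been seen
void readVal(int n) {
    array[n] += 1;
    if (array[n] > freqCount) {
        freqIndex = n;
        freqCount = array[n];
    }
}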
Of course, this approach is bad if the distribution of numbers is sparse.
I'd try a priority queue.

Related

How to efficiently store a set of tuples/pairs in Java

I need to perform a check if the combination of a long value and an integer value were already seen before in a very performance-critical part of an application. Both values can become quite large, at least the long will use more than MAX_INT values in some cases.
Currently I have a very simple implementation using a Set<Pair<Integer, Long>>, however this will require too many allocations, because even when the object is already in the set, something like seen.add(Pair.of(i, l)) to add/check existence would allocate the Pair for each call.
Is there a better way in Java (without libraries like Guava, Trove or Apache Commons), to do this check with minimal allocations and in good O(?)?
Two ints would be easy because I could combine them into one long in the Set, but the long cannot be avoided here.
Any suggestions?

Here are two possibilities.
One thing in both of the following suggestions is to store a bunch of pairs together as triple ints in an int[]. The first int would be the int and the next two ints would be the upper and lower half of the long.
If you didn't mind a 33% extra space disadvantage in exchange for an addressing speed advantage, you could use a long[] instead and store the int and long in separate indexes.
You'd never call an equals method. You'd just compare the three ints with three other ints, which would be very fast. You'd never call a compareTo method. You'd just do a custom lexicographic comparison of the three ints, which would be very fast.
B* tree
If memory usage is the ultimate concern, you can make a B* tree using an int[][] or an ArrayList<int[]>. B* trees are relatively quick and fairly compact.
There are also other types of B-trees that might be more appropriate to your particular use case.
Custom hash set
You can also implement a custom hash set with a custom, fast-calculated hash function (perhaps XOR the int and the upper and lower halves of the long together, which will be very fast) rather than relying on the hashCode method.
You'd have to figure out how to implement the int[] buckets to best suit the performance of your application. For example, how do you want to convert your custom hash code into a bucket number? Do you want to rebucket everything when the buckets start getting too many elements? And so on.
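A rough sketch of such a custom hash set, assuming open addressing with linear probing and parallel primitive arrays rather than the packed int[] layout described above. The class name IntLongSet, the hash mixing, and the 50% load limit are illustrative choices, not tuned values:

```java
// Hypothetical allocation-free set of (int, long) pairs.
class IntLongSet {
    private int[] ints = new int[16];       // capacity is always a power of two
    private long[] longs = new long[16];
    private boolean[] used = new boolean[16];
    private int size;

    // Adds the pair; returns true if it had not been seen before.
    boolean add(int i, long l) {
        if ((size + 1) * 2 > used.length) {
            grow();                         // keep the table at most half full
        }
        int slot = find(i, l);
        if (used[slot]) {
            return false;                   // already present
        }
        ints[slot] = i;
        longs[slot] = l;
        used[slot] = true;
        size++;
        return true;
    }

    boolean contains(int i, long l) {
        return used[find(i, l)];
    }

    int size() {
        return size;
    }

    // Returns the slot holding the pair, or the free slot where it would go.
    private int find(int i, long l) {
        int mask = used.length - 1;
        int slot = hash(i, l) & mask;
        while (used[slot] && (ints[slot] != i || longs[slot] != l)) {
            slot = (slot + 1) & mask;       // linear probing
        }
        return slot;
    }

    private static int hash(int i, long l) {
        int h = 31 * i + (int) (l ^ (l >>> 32));
        return h ^ (h >>> 16);              // spread the high bits into the low bits
    }

    // Doubles the capacity and re-inserts every stored pair.
    private void grow() {
        int[] oldInts = ints;
        long[] oldLongs = longs;
        boolean[] oldUsed = used;
        ints = new int[oldUsed.length * 2];
        longs = new long[oldUsed.length * 2];
        used = new boolean[oldUsed.length * 2];
        for (int k = 0; k < oldUsed.length; k++) {
            if (oldUsed[k]) {
                int slot = find(oldInts[k], oldLongs[k]);
                ints[slot] = oldInts[k];
                longs[slot] = oldLongs[k];
                used[slot] = true;
            }
        }
    }
}
```

A call such as seen.add(i, l) then does the check and the insert in one step without allocating anything, except when the table grows.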

How about creating a class that holds two primitives instead? You would drop at least 24 bytes just for the headers of Integer and Long in a 64 bit JVM.
Under these conditions you are looking for a pairing function, that is, a way to generate a unique number from two numbers. The Wikipedia page has a very good (and simple) example of one such possibility.

How about
class Pair {
    int v1;
    long v2;
    @Override
    public boolean equals(Object o) {
        if (!(o instanceof Pair)) return false; // also covers null
        Pair other = (Pair) o;
        return v1 == other.v1 && v2 == other.v2;
    }
    @Override
    public int hashCode() {
        return 31 * (31 + Integer.hashCode(v1)) + Long.hashCode(v2);
    }
}
class Store {
// initial capacity should be tweaked
private static final Set<Pair> store = new HashSet<>(100*1024);
private static final ThreadLocal<Pair> threadPairUsedForContains = new ThreadLocal<>();
void init() { // each thread has to call init() first
threadPairUsedForContains.set(new Pair());
}
boolean contains(int v1, long v2) { // zero allocation contains()
Pair pair = threadPairUsedForContains.get();
pair.v1 = v1;
pair.v2 = v2;
return store.contains(pair);
}
void add(int v1, long v2) {
Pair pair = new Pair();
pair.v1 = v1;
pair.v2 = v2;
store.add(pair);
}
}

Need help using java.lang.reflect.Array to sort arrays

An interview question was to write this method to remove duplicate element in an array.
public static Array removeDuplicates(Array a) {
...
The return type is java.lang.reflect.Array and the parameter is also of type java.lang.reflect.Array.
How would this method be called for any array?
I am also not sure about my implementation:
public static Array removeDuplicates(Array a)
{
int end=Array.getLength(a)-1;
for(int i=0;i<=end-1;i++)
{
for(int j=i+1;j<=end;j++)
{
if(Array.get(a, i)==Array.get(a, j))
{
Array.set(a, j, Array.get(a, end));
end--;
j--;
}
}
}
Array b=(Array) Array.newInstance(a.getClass(), end+1);
for(int i=0;i<=end;i++)
Array.set(a, i, Array.get(a, i));
return b;
}

You may want to consider using a different data structure such as a hashmap to detect the duplicate (O(1)) instead of looping with nested for loops (O(n^2)). It should give you much better time complexity.
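A sketch of the hash-based approach, under one assumption that differs from the interview signature: java.lang.reflect.Array is a final class with a private constructor, so no array is ever an instance of it, and the method here takes and returns Object instead. The class name ArrayDedup is made up for illustration:

```java
import java.lang.reflect.Array;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper: removes duplicates from any array, primitive or not.
class ArrayDedup {
    static Object removeDuplicates(Object array) {
        int length = Array.getLength(array);
        // LinkedHashSet keeps first-seen order and compares with equals()/hashCode().
        Set<Object> seen = new LinkedHashSet<>();
        for (int i = 0; i < length; i++) {
            seen.add(Array.get(array, i)); // primitives are boxed automatically
        }
        // New array with the same component type as the input.
        Object result = Array.newInstance(array.getClass().getComponentType(), seen.size());
        int i = 0;
        for (Object element : seen) {
            Array.set(result, i++, element); // unboxed again for primitive arrays
        }
        return result;
    }
}
```

It would be called as int[] unique = (int[]) ArrayDedup.removeDuplicates(new int[] {1, 2, 1, 3}); and runs in O(n) on average instead of O(n^2).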

There are various problems with this code, starting here:
if(Array.get(a, i)==Array.get(a, j))
Keep in mind that those get() calls return Object. So when you pass in an array of strings, comparing with == will most likely give wrong results: == compares references, and many objects that are in fact equal still have different references, so your check can return false for equal elements.
So, the first thing to change: use equals() instead of ==.
The other problem is:
end--;
Seriously: never change the variables that control your for loop from inside the loop body.
Instead: have another counter, like
int numberOfOutgoingItems = end;
and then decrease that counter!
For your final question - check the javadoc; for example for get(). That reads get(Object array, int index)
So you should be able to do something like:
int a[] = ...;
Object oneValue = Array.get(a, 0);
for example.
Note: per the Javadoc, Array.get() automatically wraps a primitive element in the corresponding wrapper object, so reading from an int[] gives you an Integer.
If you want to avoid that boxing, you have to write code first to detect the exact type of the array (an array of int, for example) and then call getInt() instead of get().
Beyond that, some further reading how to use reflection/Array can be found here

Array object and maximum

I know how to find minimum and maximum in an array. If a method lets say was called fMax():
public static double fMax(Object[] stuff)
The parameter is an array of Object. How would I go about finding the max of this array? I cannot just use the loop shown below. The method should return a double: if no memory has been allocated for the parameter named stuff, it should return the value NEGATIVE_INFINITY from the Double class; otherwise the return value should be the maximum value of the elements in the stuff array.
Object max = stuff[0];
for (int i = 0; i < stuff.length; i++) {
if (data[i] > max) {
max = stuff[i];
}
}

To find the maximum of something, either
a) that something needs to implement the Comparable interface
b) you need to have some sort of explicit criteria for determining what maximum is, so you can put that in an instance of Comparator
Object itself isn't going to have anything useful for sorting. If you subclass object, you could sort based on the components of that object.
public class Example implements Comparable<Example>
{
int sortableValue = 0;
public Example (int value)
{
this.sortableValue = value;
}
public int compareTo(Example other)
{
return Integer.compare(this.sortableValue, other.sortableValue);
}
}
That's an object definition that has a natural sorting order. Java can look at that with any of the built in sorting algorithms and know the order they belong in.
If you don't provide java with a means of determining how an object has greater or lesser relative value compared to another object of the same type, it won't figure it out on its own.
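For the fMax method from the question, one possible sketch looks like this. It rests on two assumptions that the question does not confirm: "memory has not been allocated" means stuff is null, and the elements are Number instances such as Integer or Double (anything else, including null elements, is skipped). The class name MaxFinder is made up:

```java
// Hypothetical holder class for the fMax method from the question.
class MaxFinder {
    static double fMax(Object[] stuff) {
        double max = Double.NEGATIVE_INFINITY;
        if (stuff == null) {
            return max; // assumption: "not allocated" means null
        }
        for (Object element : stuff) {
            // Object has no ordering, so narrow to a type that has one.
            if (element instanceof Number) {
                max = Math.max(max, ((Number) element).doubleValue());
            }
        }
        return max;
    }
}
```

With these assumptions an empty array, or one without any numbers, also yields NEGATIVE_INFINITY.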

Object is not comparable; you need a definite type if you want to compare values, sort, or find something.
Streams are a versatile tool for the job. This will solve your problem if you want to find the min/max of an array of Double:
Double[] arr = {1d, 2d, 3d, 4d};
Double min = Arrays.stream(arr).min(Double::compare).get();
Double max = Arrays.stream(arr).max(Double::compare).get();

String[] stringArray = Arrays.copyOf(objectArray, objectArray.length, String[].class);
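Now, just compare the elements of the new typed array that we made from the Object array. Note that Arrays.copyOf throws an ArrayStoreException if an element is not actually a String. If you don't need the original array after this, and you aren't planning on returning it, you can set the original reference to null so that the array becomes eligible for garbage collection.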
Check this:
How to compare two object arrays in Java?

How to create and fill an Unlimited Array

Earlier last week I was in an android/Java class, and our lecturer likes to throw little challenges at us every now and then, just as fun little programs for us to think about.
The topic I'm studying is OOP and OOD in c# and Java environments, so this really doesn't have any huge leverage on my actual final project, and I'd like to stress this was an optional task set for fun.
The Task was asking for the programmer to:
Create a program that could hold an "unlimited" array of integers (based on how many the user required) and find the max value in the array.
The issue wasn't the max method (easy), or the variables in the array (basic), but the array itself. We weren't allowed to use linked lists; it had to be an "unlimited" 1D array that could take user input.
I've been playing around with the array for a while now. I was going to make a circular array at first, but that still doesn't solve many of the issues, and I can't really work out how to solve the problem in a way that could be ported over and used in C#.
Any ideas as to how this could be achieved?

If only LinkedList is off limits, you can use any other implementation of java.util.List.
If you can't use java.util.List at all, you can use an array with as many slots as you need and keep a pointer to the last value.
Something like this
public class MyArray {
private int[] myArray = new int[10000];
private int index = -1;
public void add(int obj) {
index++;
myArray[index] = obj;
}
public Integer removeLast() {
if (index >= 0) {
return myArray[index--];
}
return null;
}
public Integer get(int i) {
if (i >= 0 && i <= index) { // index points at the last stored value
return myArray[i];
}
return null;
}
}
Note: this is very similar to the internal representation of ArrayList; take a look at the ArrayList source to learn more. The biggest difference is that this implementation is limited to a maximum of 10000 ints, whereas ArrayList can grow when necessary, but I think implementing the growth is outside the scope of your exercise.
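For what it's worth, the growth step is short. Here is a minimal sketch, assuming that doubling the capacity is acceptable (ArrayList uses a different growth factor). The class name GrowableIntArray and the initial capacity of 4 are arbitrary:

```java
import java.util.Arrays;

// Hypothetical "unlimited" int array: a plain array that is replaced by a bigger one when full.
class GrowableIntArray {
    private int[] data = new int[4];
    private int size;

    void add(int value) {
        if (size == data.length) {
            // Full: allocate an array twice as large and copy the old values over.
            data = Arrays.copyOf(data, data.length * 2);
        }
        data[size++] = value;
    }

    int get(int i) {
        if (i < 0 || i >= size) {
            throw new IndexOutOfBoundsException("index " + i + ", size " + size);
        }
        return data[i];
    }

    int size() {
        return size;
    }

    // Largest stored value; the array must not be empty.
    int max() {
        if (size == 0) {
            throw new IllegalStateException("array is empty");
        }
        int max = data[0];
        for (int i = 1; i < size; i++) {
            max = Math.max(max, data[i]);
        }
        return max;
    }
}
```

The same idea ports directly to C#, where Array.Resize plays the role of Arrays.copyOf.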

Select subarray without copying into new buffer?

I have float[] array of length 100. Is there a way I can select (pseudocode):
x = array[10:19];
To get elements 10,11,12,...,19 without copying over into another buffer? I'm in a mobile application where I don't want to waste space or time doing this. I'd rather just reference the pointers the system uses for array.

The most efficient way to copy would be System.arraycopy(), which is much faster than copying manually in a loop. It requires another array, but so does any approach other than passing the original array around together with a couple of ints for the offset and length. The copy is relatively cheap: for an array of objects only the references are copied, not the objects themselves, and for a float[] ten elements take just 40 bytes of data.

No, there is no API to do that. The closest solution to this would be building your own class that wraps an existing array, and does the re-indexing:
class SubArray {
private final float[] data;
private final int offset;
private final int length;
public SubArray(float[] data, int offset, int length) {
this.data = data;
this.offset = offset;
this.length = length;
}
public float get(int index) {
if (index >= length) throw ...
return data[index + offset];
}
public void set(int index, float value) {
if (index >= length) throw ...
data[index + offset] = value;
}
}
If the result that you need is a new object that behaves like an array in all respects, including the indexing operator, you would need to make a copy.
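If a wrapper object rather than a real array is acceptable, java.nio.FloatBuffer can play the role of such a class without any hand-written indexing. A small sketch follows; the class name SubArrayView and the method name of are made up, while wrap, slice, get and put are the standard FloatBuffer methods:

```java
import java.nio.FloatBuffer;

// Hypothetical helper: exposes part of a float[] as a view that shares its storage.
class SubArrayView {
    static FloatBuffer of(float[] data, int offset, int length) {
        // wrap() sets position = offset and limit = offset + length;
        // slice() then yields a buffer whose index 0 is data[offset].
        return FloatBuffer.wrap(data, offset, length).slice();
    }
}
```

With FloatBuffer x = SubArrayView.of(array, 10, 10), x.get(0) reads array[10] and x.put(0, v) writes through to it. No element is copied, although the buffer object itself is a small allocation.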

(Update) Precondition: You should store the data in a Float[] instead of a float[], the performance-hit should be minimal.
You can use: Arrays.asList(array).subList(10, 20).
The Arrays.asList(array) does the following:
Returns a fixed-size list backed by the specified array. (Changes to the returned list "write through" to the array.) This method acts as bridge between array-based and collection-based APIs, in combination with Collection.toArray(). The returned list is serializable and implements RandomAccess.
Source
And then .subList(10, 20) returns you a List.
Then if you really want to work with arrays in the end, you could use the following lines (array must be declared as Float[], per the precondition above; a float[] cannot be cast to Float[]):
List<Float> subList = Arrays.asList(array).subList(10, 20);
Float[] subArray = subList.toArray(new Float[subList.size()]);
Note that toArray() allocates a new array and copies the ten references into it; only the List returned by subList is a view of the original array.
From documentation:
Returns an array containing all of the elements in this list in proper sequence (from first to last element); the runtime type of the returned array is that of the specified array. If the list fits in the specified array, it is returned therein. Otherwise, a new array is allocated with the runtime type of the specified array and the size of this list.
If the list fits in the specified array with room to spare (i.e., the array has more elements than the list), the element in the array immediately following the end of the list is set to null. (This is useful in determining the length of the list only if the caller knows that the list does not contain any null elements.)
Like the toArray() method, this method acts as bridge between array-based and collection-based APIs. Further, this method allows precise control over the runtime type of the output array, and may, under certain circumstances, be used to save allocation costs.
Suppose x is a list known to contain only strings. The following code can be used to dump the list into a newly allocated array of String:
String[] y = x.toArray(new String[0]);
Source
This should ensure that no data is wasted, the only thing to be careful about could be autoboxing.
UPDATE: Changed my answer such that it now is correct under a precondition.

What is the problem with using a simple for loop? Objects in Java are handled by reference.
So copying an array of objects does not copy the objects themselves.
float[] subarray = new float[10];
for (int i = 10, j = 0; i < 20; i++, j++) { // elements 10 to 19 inclusive
    subarray[j] = array[i];
}
For an array of objects, subarray[0] would refer to the same object as array[10].
edit: This only applies to objects. A float is a primitive, so here the values themselves are copied.
