I have a file of Integer[]s that is too large to put in memory. I would like to search for all arrays with a last member of x and use them in other code. Is there a way to use Guava's multimap to do this, where x is the key and stored in memory and the Integer[] is the value and that is stored on disk? In this scenario, the keys are not unique, but key-value pairs are unique. Reading of this multimap (assuming that it's possible) will be concurrent. I'm also open to suggestions of other ways to approach this.
Thanks
You could create a class representing an array on disk (based on its index in the file of arrays), let's call it FileBackedIntArray, and put instances of that as the values of a HashMultimap<Integer, FileBackedIntArray>:
public class FileBackedIntArray {
// Index of the array in the file of arrays
private final int index;
private final int lastElement;
public FileBackedIntArray(int index, int lastElement) {
this.index = index;
this.lastElement = lastElement;
}
public int getIndex() {
return index;
}
public int[] readArray() {
// Read the file and deserialize the array at the associated index
return smth;
}
public int getLastElement() {
return lastElement;
}
@Override
public int hashCode() {
return index;
}
@Override
public boolean equals(Object o) {
if (this == o) {
return true;
} else if (o == null || o.getClass() != getClass()) {
return false;
}
return index == ((FileBackedIntArray) o).index;
}
}
Do you actually need an Integer[] and not an int[], by the way (i.e. can you have null values)? As you've said in the comments, you don't really need an Integer[], so using ints everywhere will avoid boxing/unboxing and will save a lot of space, since you appear to have lots of them. Hopefully you don't have a huge number of possible values for the last element (x), because every distinct value becomes a key held in memory.
You then create an instance for each array and read the last element to put it in the Multimap without keeping the array around. Populating the Multimap needs to be either sequential or protected with a lock if concurrent, but reading can be concurrent without any protection. You could even create an ImmutableMultimap once the HashMultimap has been populated, to guard against any modification, which is a safe practice in a concurrent environment.
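The indexing idea above can be sketched with only the standard library. This is a minimal sketch under stated assumptions, not the answer's actual implementation: a plain HashMap of lists stands in for Guava's HashMultimap, the on-disk record layout (an int length n >= 1 followed by n ints) is assumed, and the names LastElementIndex, arraysEndingWith and writeSample are hypothetical.

```java
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: assumes each record is an int length n >= 1 followed by n ints.
class LastElementIndex {
    private final Path file;
    // last element -> byte offsets of the records that end with it
    private final Map<Integer, List<Long>> offsetsByLast = new HashMap<>();

    LastElementIndex(Path file) {
        this.file = file;
        // Populate sequentially; only the length and last element of each record are read.
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            long offset = 0;
            long end = raf.length();
            while (offset < end) {
                raf.seek(offset);
                int n = raf.readInt();
                raf.seek(offset + 4L * n);      // position of the last element
                int last = raf.readInt();
                offsetsByLast.computeIfAbsent(last, k -> new ArrayList<>()).add(offset);
                offset += 4L * (n + 1);         // skip the length field and the n ints
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Can be called from several threads once construction has finished:
    // the map is no longer modified and each call opens its own file handle.
    List<int[]> arraysEndingWith(int x) {
        List<int[]> result = new ArrayList<>();
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            for (long offset : offsetsByLast.getOrDefault(x, Collections.emptyList())) {
                raf.seek(offset);
                int[] array = new int[raf.readInt()];
                for (int i = 0; i < array.length; i++) {
                    array[i] = raf.readInt();
                }
                result.add(array);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    // Helper for trying the sketch out: writes arrays in the assumed layout to a temp file.
    static Path writeSample(int[][] arrays) {
        try {
            Path p = Files.createTempFile("arrays", ".bin");
            try (DataOutputStream out = new DataOutputStream(Files.newOutputStream(p))) {
                for (int[] a : arrays) {
                    out.writeInt(a.length);
                    for (int v : a) {
                        out.writeInt(v);
                    }
                }
            }
            return p;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Only the offsets and the distinct last elements stay in memory; the arrays themselves are read from disk on demand.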
I'm trying to iterate over the Integer objects of a HashSet and I want to count the number of times an element occurs. This is my method so far:
public int freq(int element) {
int numElements = 0;
for (int atPos : mySet){
if (mySet.atPos == element){ //says atPos cannot be resolved to a field
numElements++;
}
}
return numElements;
}
Would it be better to use an iterator to iterate over the elements? How do I fix my
mySet.atPos
line?
This is where I initialize my HashSet
private HashSet <Integer> mySet = new HashSet<Integer>();
A Set cannot contain duplicate elements. Therefore you will always get a count of 0 or 1 for your element.
For any collection, you can get the frequency of the elements with:
public int freq(int element) {
return Collections.frequency(mySet, element);
}
Not sure you'd want to make a method out of it ...
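To see the difference between a Set and a collection that allows duplicates, here is a small sketch; the class name FrequencyDemo and the method count are made up for illustration.

```java
import java.util.Collection;
import java.util.Collections;

class FrequencyDemo {
    // Counts how many elements of the collection are equal to the given value.
    static int count(Collection<Integer> values, int element) {
        return Collections.frequency(values, element);
    }
}
```

Called with a List holding 1, 2, 2, 3 and the element 2 this returns 2; called with a HashSet built from that same list it returns 1, because the set has already dropped the duplicate.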
Your issue is a simple misunderstanding of how you can use variables. int atPos and mySet.atPos do not refer to the same thing. The former refers to a local variable; the latter looks for a public field named atPos on the set instance.
You are trying to access this field:
public class HashSet
{
public int atPos; //<<<
}
but, when we think of it this way, obviously that field does not exist in HashSet!
All you need to do is get rid of mySet. and your code will work.
if (atPos == element){
numElements++;
}
Would it be better to use an iterator to iterate over the elements?
No, there's no benefit to using an iterator in this situation. A for each is more readable.
As others have noted, because sets will never contain duplicates, your numElements will actually only ever be one or zero. As such, you could actually write your function very compactly as:
public int freq(int element) {
if (mySet.contains(element)) {
return 1;
}
else {
return 0;
}
}
Or even better using the ternary operator:
public int freq(int element) {
return mySet.contains(element) ? 1 : 0;
}
I am trying to implement a hash cons in Java, comparable to what String.intern does for strings. I.e., I want a class to store all distinct values of a data type T in a set and provide a T intern(T t) method that checks whether t is already in the set. If so, the instance in the set is returned, otherwise t is added to the set and returned. The reason is that the resulting values can be compared using reference equality, since two equal values returned from intern will for sure also be the same instance.
Of course, the most obvious candidate data structure for a hash cons is java.util.HashSet<T>. However, it seems that its interface is flawed and does not allow efficient insertion, because there is no method to retrieve an element that is already in the set or insert one if it is not in there.
An algorithm using HashSet would look like this:
class HashCons<T>{
HashSet<T> set = new HashSet<>();
public T intern(T t){
if(set.contains(t)) {
return ???; // <----- PROBLEM
} else {
set.add(t); // <--- Inefficient, second hash lookup
return t;
}
}
As you see, the problem is twofold:
This solution would be inefficient since I would access the hash table twice, once for contains and once for add. But okay, this may not be a too big performance hit since the correct bucket will be in the cache after the contains, so add will not trigger a cache miss and thus be quite fast.
I cannot retrieve an element already in the set (see line flagged PROBLEM). There is just no method to retrieve the element in the set. So it is just not possible to implement this.
Am I missing something here? Or is it really impossible to build a usual hash cons with java.util.HashSet?
I don't think it's possible using HashSet. You could use some kind of Map instead and use your value as both key and value. The java.util.concurrent.ConcurrentMap also happens to possess the quite convenient method
putIfAbsent(K key, V value)
that returns the value if it is already existent. However, I don't know about the performance of this method (compared to checking "manually" on non-concurrent implementations of Map).
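A minimal sketch of the putIfAbsent idea, assuming a ConcurrentHashMap as the backing map (which does not accept null keys); the class name ConcurrentHashCons is hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class ConcurrentHashCons<T> {
    private final ConcurrentMap<T, T> pool = new ConcurrentHashMap<>();

    T intern(T t) {
        // putIfAbsent returns the previously mapped value, or null if t was just inserted
        T existing = pool.putIfAbsent(t, t);
        return existing != null ? existing : t;
    }
}
```

Interning two distinct but equal instances returns the first one both times, so the results can be compared with ==.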
Here is how you would do it using a HashMap:
class HashCons<T>{
Map<T,T> map = new HashMap<T,T>();
public T intern(T t){
if (!map.containsKey(t))
map.put(t,t);
return map.get(t);
}
}
I think the reason why it is not possible with HashSet is quite simple: to the set, if contains(t) is fulfilled, it means that the given t also equals one of the t' in the set. There is no reason for being able to return it (as you already have it).
Well, HashSet is implemented as a wrapper around HashMap in OpenJDK, so you won't save any memory compared to the solution suggested by aRestless.
10-min sketch
class HashCons<T> {
T[] table;
int size;
int sizeLimit;
HashCons(int expectedSize) {
init(Math.max(Integer.highestOneBit(expectedSize * 2) * 2, 16));
}
private void init(int capacity) {
table = (T[]) new Object[capacity];
size = 0;
sizeLimit = (int) (capacity * 2L / 3);
}
T cons(@Nonnull T key) {
int mask = table.length - 1;
int i = key.hashCode() & mask;
do {
if (table[i] == null) break;
if (key.equals(table[i])) return table[i];
i = (i + 1) & mask;
} while (true);
table[i] = key;
if (++size > sizeLimit) rehash();
return key;
}
private void rehash() {
T[] table = this.table;
if (table.length == (1 << 30))
throw new IllegalStateException("HashCons is full");
init(table.length << 1);
for (T key : table) {
if (key != null) cons(key);
}
}
}
I was wondering if it was better to have a method for this and pass the Array to that method or to write it out every time I want to check if a number is in the array.
For example:
public static boolean inArray(int[] array, int check) {
for (int i = 0; i < array.length; i++) {
if (array[i] == check)
return true;
}
return false;
}
Thanks for the help in advance!
Since at least Java 1.5.0 (Java 5) the code can be cleaned up a bit. Arrays and anything that implements Iterable (e.g. Collections) can be looped over like this:
public static boolean inArray(int[] array, int check) {
for (int o : array){
if (o == check) {
return true;
}
}
return false;
}
In Java 8 you can also do something like:
// import java.util.stream.IntStream;
public static boolean inArray(int[] array, int check) {
return IntStream.of(array).anyMatch(val -> val == check);
}
Although converting to a stream for this is probably overkill.
You should definitely encapsulate this logic into a method.
There is no benefit to repeating identical code multiple times.
Also, if you place the logic in a method and it changes, you only need to modify your code in one place.
Whether or not you want to use a 3rd party library is an entirely different decision.
If you are using an array (and purely an array), the lookup of "contains" is O(N), because in the worst case you must iterate over the entire array. If the array is sorted, you can use a binary search, which reduces the search time to O(log N) at the cost of the initial sort.
If this is something that is invoked repeatedly, place it in a function:
private boolean inArray(int[] array, int value)
{
for (int i = 0; i < array.length; i++)
{
if (array[i] == value)
{
return true;
}
}
return false;
}
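The sorted-array variant mentioned in this answer could look roughly like the following sketch. It uses Arrays.sort and Arrays.binarySearch from the standard library; the wrapper class SortedLookup is a hypothetical name, and copying the input is a design choice made here so the caller's array is left untouched.

```java
import java.util.Arrays;

class SortedLookup {
    private final int[] sorted;

    SortedLookup(int[] values) {
        // Sort a private copy once; the cost is O(N log N)
        sorted = values.clone();
        Arrays.sort(sorted);
    }

    boolean contains(int value) {
        // binarySearch returns a non-negative index only when the value is present
        return Arrays.binarySearch(sorted, value) >= 0;
    }
}
```

This only pays off when the same array is queried many times; for a single lookup the linear scan is cheaper than sorting first.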
You can import the lib org.apache.commons.lang.ArrayUtils
There is a static method where you can pass in an int array and a value to check for.
contains(int[] array, int valueToFind)
Checks if the value is in the given array.
ArrayUtils.contains(intArray, valueToFind);
ArrayUtils API
Using the Java 8 Stream API could simplify your job.
// import java.util.stream.IntStream;
public static boolean inArray(int[] array, int check) {
    // IntStream.of streams the ints themselves; Stream.of(array) would yield a single int[] element
    return IntStream.of(array).anyMatch(i -> i == check);
}
You do pay the overhead of creating a new stream from the array, but in return you get access to the rest of the Stream API. In your case you may not want to create a new method for a one-line operation, unless you wish to use it as a utility.
Hope this helps!
Why cannot I retrieve an element from a HashSet?
Consider my HashSet containing a list of MyHashObjects with their hashCode() and equals() methods overridden correctly. I was hoping to construct a MyHashObject myself, and set the relevant hash code properties to certain values.
I can query the HashSet to see if there are "equivalent" objects in the set using the contains() method. So even though contains() returns true for the two objects, they may not be == true.
How come then there isn’t any get() method similar to how the contains() works?
What is the thinking behind this API decision?
If you know what element you want to retrieve, then you already have the element. The only question for a Set to answer, given an element, is whether it contains() it or not.
If you want to iterate over the elements, just use a Set.iterator().
It sounds like what you're trying to do is designate a canonical element for an equivalence class of elements. You can use a Map<MyObject,MyObject> to do this. See this Stack Overflow question or this one for a discussion.
If you are really determined to find an element that .equals() your original element with the constraint that you must use the HashSet, I think you're stuck with iterating over it and checking equals() yourself. The API doesn't let you grab something by its hash code. So you could do:
MyObject findIfPresent(MyObject source, HashSet<MyObject> set)
{
if (set.contains(source)) {
for (MyObject obj : set) {
if (obj.equals(source))
return obj;
}
}
return null;
}
It is brute-force and O(n) ugly, but if that's what you need to do...
You can use HashMap<MyHashObject, MyHashObject> instead of HashSet<MyHashObject>.
Calling containsKey() with your "reconstructed" MyHashObject first uses hashCode() to find the right bucket and then, if an entry with the same hash code is found, uses equals() to compare your "reconstructed" object against the original. If they match, you can retrieve the original using get().
Complexity is O(1) but the downside is you will likely have to override both equals() and hashCode() methods.
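As a rough illustration of the HashMap<MyHashObject, MyHashObject> approach, here is a sketch with a made-up element type. Item, its id and payload fields, and the class CanonicalLookup are all hypothetical; equality is assumed to depend on id only.

```java
import java.util.HashMap;
import java.util.Map;

class CanonicalLookup {
    // Hypothetical element type: equals() and hashCode() look at id only
    static final class Item {
        final int id;
        final String payload;

        Item(int id, String payload) {
            this.id = id;
            this.payload = payload;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Item && ((Item) o).id == id;
        }

        @Override
        public int hashCode() {
            return Integer.hashCode(id);
        }
    }

    // Each element is stored as both key and value
    private final Map<Item, Item> items = new HashMap<>();

    void add(Item item) {
        items.putIfAbsent(item, item);
    }

    // Returns the stored instance equal to the probe, or null if there is none
    Item get(Item probe) {
        return items.get(probe);
    }
}
```

A probe constructed with just the id is enough to get back the stored instance together with its payload.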
It sounds like you're essentially trying to use the hash code as a key in a map (which is what HashSets do behind the scenes). You could just do it explicitly, by declaring HashMap<Integer, MyHashObject>.
There is no get for HashSets because typically the object you would supply to the get method as a parameter is the same object you would get back.
If you know the order of elements in your Set, you can retrieve them by converting the Set to an Array. Something like this:
Set mySet = MyStorageObject.getMyStringSet();
Object[] myArr = mySet.toArray();
String value1 = myArr[0].toString();
String value2 = myArr[1].toString();
The need to get a reference to the object that is contained inside a Set is common. It can be achieved in two ways:
Use HashSet as you wanted, then:
public Object getObjectReference(HashSet<Xobject> set, Xobject obj) {
if (set.contains(obj)) {
for (Xobject o : set) {
if (obj.equals(o))
return o;
}
}
return null;
}
For this approach to work, you need to override both hashCode() and equals(Object o) methods
In the worst scenario we have O(n)
Second approach is to use TreeSet
public Object getObjectReference(TreeSet<Xobject> set, Xobject obj) {
if (set.contains(obj)) {
return set.floor(obj);
}
return null;
}
This approach gives O(log(n)), more efficient.
You don't need to override hashCode for this approach but you have to implement Comparable interface. ( define function compareTo(Object o)).
One of the easiest ways is to convert to Array:
for(int i = 0; i < set.size(); i++) {
System.out.println(set.toArray()[i]);
}
Suppose I know for sure that, in my application, the object is never searched for in any list or hash data structure, and that its equals method is not used anywhere except indirectly by the hash data structure while adding. Is it advisable to update the existing object in the set from within the equals method? Refer to the code below. If I add this bean to a HashSet, I can do group aggregation on the matching object by key (id). This way I am able to achieve aggregation functions such as sum, max, min, ... as well. If this is not advisable, please feel free to share your thoughts.
public class MyBean {
String id,
name;
double amountSpent;
@Override
public int hashCode() {
return id.hashCode();
}
@Override
public boolean equals(Object obj) {
if(obj!=null && obj instanceof MyBean ) {
MyBean tmpObj = (MyBean) obj;
if(tmpObj.id!=null && tmpObj.id.equals(this.id)) {
tmpObj.amountSpent += this.amountSpent;
return true;
}
}
return false;
}
}
First of all, convert your set to an array. Then, get the item by indexing the array.
Set uniqueItem = new HashSet();
uniqueItem.add("0");
uniqueItem.add("1");
uniqueItem.add("0");
Object[] arrayItem = uniqueItem.toArray();
for(int i = 0; i < uniqueItem.size(); i++) {
System.out.println("Item " + i + " " + arrayItem[i].toString());
}
If you could use List as a data structure to store your data, instead of using Map to store the result in the value of the Map, you can use following snippet and store the result in the same object.
Here is a Node class:
private class Node {
public int row, col, distance;
public Node(int row, int col, int distance) {
this.row = row;
this.col = col;
this.distance = distance;
}
public boolean equals(Object o) {
return (o instanceof Node &&
row == ((Node) o).row &&
col == ((Node) o).col);
}
}
If you store your result in distance variable and the items in the list are checked based on their coordinates, you can use the following to change the distance to a new one with the help of lastIndexOf method as long as you only need to store one element for each data:
List<Node> nodeList;
nodeList = new ArrayList<>(Arrays.asList(new Node(1, 2, 1), new Node(3, 4, 5)));
Node tempNode = new Node(1, 2, 10);
if(nodeList.contains(tempNode))
nodeList.get(nodeList.lastIndexOf(tempNode)).distance += tempNode.distance;
It is basically reimplementing Set whose items can be accessed and changed.
If you want to have a reference to the real object using the same performance as HashSet, I think the best way is to use HashMap.
Example (in Kotlin, but similar in Java) of finding an object, changing some field in it if it exists, or adding it in case it doesn't exist:
val map = HashMap<DbData, DbData>()
val dbData = map[objectToFind]
if(dbData!=null){
++dbData.someIntField
}
else {
map[objectToFind] = objectToFind // dbData is null in this branch, so store the object we looked up
}
I've a Vector of objects, and have to search inside for a random attribute of those objects (For example, a Plane class, a Vector containing Plane; and I've to search sometimes for destination, and others to pilotName).
I know I can traverse the Vector using an Iterator, but I've got stuck at how to change the comparison made between a String and the attribute on the object. I thought of using switch, but another opinion would be cool.
Update 1:
The code I've written is something like this (Java n00b alert!):
public int search(String whatSearch, String query){
int place = -1;
boolean found = false;
for ( Iterator<Plane> iteraPlane = this.planes.iterator(); iteraPlane.hasNext() && found == false; ) {
Plane temp = (Plane) iteraPlane.next();
/* Here is where I have to search for one of many attributes (delimited by whatSearch */
}
return place;
}
Seems I have to stick to linear search (and that's a price I'm able to pay). Anyway, I was wondering whether Java has something like variable variable names (ouch!)
I assume that your problem is that you want to have a method that searches for a result based on some property of the collection type. Java is weak on this because it is best expressed in a language which has closures. What you need is something like:
public interface Predicate<T> {
public boolean evaluate(T t);
}
And then your search method looks like:
public static <T> T findFirst(List<T> l, Predicate<T> p) { //use List, not Vector
for (T t : l) { if (p.evaluate(t)) return t; }
return null;
}
Then anyone can use this general-purpose search method. For example, to search for an even number in a list of Integers:
List<Integer> is = ...
findFirst(is, new Predicate<Integer>() {
public boolean evaluate(Integer i) { return i % 2 == 0; }
});
But you could implement the predicate in any way you want; for any arbitrary search
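Since Java 8 the same idea can be written with lambdas. The sketch below swaps the hand-written interface for java.util.function.Predicate from the standard library; the class name Finder is hypothetical.

```java
import java.util.List;
import java.util.function.Predicate;

class Finder {
    // Returns the first element accepted by the predicate, or null if none is
    static <T> T findFirst(List<T> list, Predicate<T> p) {
        for (T t : list) {
            if (p.test(t)) {
                return t;
            }
        }
        return null;
    }
}
```

With this in place, Finder.findFirst(Arrays.asList(3, 5, 8, 10), i -> i % 2 == 0) yields 8, the first even number, and a search on an attribute becomes something like plane -> "JFK".equals(plane.getDestination()).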
Use Collections.binarySearch and provide a Comparator.
EDIT: This assumes that the Vector is sorted. Otherwise, one has to do a linear search.
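A sketch of what the binarySearch suggestion might look like when searching by destination. The Plane class here is a stripped-down stand-in for the asker's class, and PlaneSearch, BY_DESTINATION and indexOfDestination are hypothetical names.

```java
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

class PlaneSearch {
    // Minimal stand-in for the asker's Plane class
    static final class Plane {
        final String destination;
        final String pilotName;

        Plane(String destination, String pilotName) {
            this.destination = destination;
            this.pilotName = pilotName;
        }
    }

    static final Comparator<Plane> BY_DESTINATION =
            Comparator.comparing((Plane p) -> p.destination);

    // The list must already be sorted with BY_DESTINATION.
    // Returns the index of a plane with that destination, or -1 if there is none.
    static int indexOfDestination(List<Plane> sortedPlanes, String destination) {
        Plane probe = new Plane(destination, null);
        int i = Collections.binarySearch(sortedPlanes, probe, BY_DESTINATION);
        return i >= 0 ? i : -1;
    }
}
```

Searching by a different attribute needs a different comparator and a list sorted by that comparator, which is why this only helps when one attribute dominates the queries.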
The equals() method is the best option. For these iterations you could do something like this:
for (Plane plane: planes) {
if ("JFK".equals(plane.getDestination())) {
// do your work in here;
}
}
Or you could override the equals() method within Plane to see if the String passed in matches your destination (or pilot). This will allow you to use the indexOf(Object) and indexOf(Object, index) methods on Vector to return the index(es) of the object(s). Once you have that, you could use Vector.get(index) to return the object for you.
in Plane.java:
public boolean equals(Object o) {
return o.equals(getDestination()) ||
o.equals(getPilot()) ||
super.equals(o);
}
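There is more work to be done with this option, as you will need to override hashCode() as well (see documentation). Note also that an equals() that accepts a String is not symmetric, which the equals() contract requires, so collection methods may not behave as expected.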
See @oxbow_lakes above -- I think what you want isn't to pass a String as whatSearch, it's to pass a little snippet of code that knows how to get the property you're interested in. For a less general version:
public static interface PlaneMatcher {
boolean matches(Plane plane, String query);
}
public int search(PlaneMatcher matcher, String query){
    int place = -1;
    int index = 0;
    for (Iterator<Plane> iteraPlane = this.planes.iterator(); iteraPlane.hasNext() && place == -1; index++) {
        Plane temp = iteraPlane.next();
        if (matcher.matches(temp, query)) {
            place = index; // position of the first match
        }
    }
    return place; // -1 if nothing matched
}
...
// example
int pilotNameIndex = search(new PlaneMatcher() {
    public boolean matches(Plane plane, String query) {
        // note: assumes query non-null; you probably want to check that earlier
        return query.equals(plane.getPilotName());
    }
}, "Orville Wright");
(By the way, if it's the index you're interested in rather than the Plane itself, I wouldn't bother with an Iterator -- just use an old-fashioned for (int i = 0; i < planes.size(); i++) loop, and when you have a match, return i.)
Now, the tricky bit here is if what you have to search for is really identified by arbitrary strings at run-time. If that's the case, I can suggest two alternatives:
Don't store these values as object fields -- plane.pilotName, plane.destination -- at all. Just have a Map<String, String> (or better yet, a Map<Field, String> where Field is an Enum of all the valid fields) called something like plane.metadata.
Store them as object fields, but prepopulate a map from the field names to PlaneMatcher instances as described above.
For instance:
private static final Map<String, PlaneMatcher> MATCHERS = Collections.unmodifiableMap(new HashMap<String, PlaneMatcher>() {{
    put("pilotName", new PlaneMatcher() {
        public boolean matches(Plane plane, String query) {
            return query.equals(plane.getPilotName());
        }
    });
    ...
    put("destination", new PlaneMatcher() {
        public boolean matches(Plane plane, String query) {
            return query.equals(plane.getDestination());
        }
    });
}});
...
public int search(String whatSearch, String query){
    PlaneMatcher matcher = MATCHERS.get(whatSearch);
    if (matcher == null) {
        throw new IllegalArgumentException("Unknown attribute: " + whatSearch);
    }
    int place = -1;
    int index = 0;
    for (Iterator<Plane> iteraPlane = this.planes.iterator(); iteraPlane.hasNext() && place == -1; index++) {
        Plane temp = iteraPlane.next();
        if (matcher.matches(temp, query)) {
            place = index; // position of the first match
        }
    }
    return place; // -1 if nothing matched
}
Oh, and you might be tempted to use reflection. Don't. :)
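On Java 8 or later, the prepopulated map can hold method references instead of anonymous classes. The following is only a sketch under assumptions: the Plane class is a minimal stand-in for the asker's class, and PlaneRegistry, GETTERS and the static search signature are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class PlaneRegistry {
    // Minimal stand-in for the asker's Plane class
    static final class Plane {
        private final String destination;
        private final String pilotName;

        Plane(String destination, String pilotName) {
            this.destination = destination;
            this.pilotName = pilotName;
        }

        String getDestination() { return destination; }
        String getPilotName() { return pilotName; }
    }

    // Attribute name -> getter for that attribute
    private static final Map<String, Function<Plane, String>> GETTERS = new HashMap<>();
    static {
        GETTERS.put("pilotName", Plane::getPilotName);
        GETTERS.put("destination", Plane::getDestination);
    }

    // Returns the index of the first plane whose attribute equals query, or -1
    static int search(List<Plane> planes, String whatSearch, String query) {
        Function<Plane, String> getter = GETTERS.get(whatSearch);
        if (getter == null) {
            throw new IllegalArgumentException("Unknown attribute: " + whatSearch);
        }
        for (int i = 0; i < planes.size(); i++) {
            if (query.equals(getter.apply(planes.get(i)))) {
                return i;
            }
        }
        return -1;
    }
}
```

Adding a searchable attribute then means adding one line to the static initializer, with no reflection involved.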
A simple way is to pass a comparison function to your search routine. Or, if you need more speed, use generics.