Mocking contains() with a String[][] - java

I have two SQL tables. After grabbing both tables in ResultSets, I've stored them in String[][]s, ordered by a common id column. These tables should contain the same data, however one may have duplicates of the same row from the other. In order to check if every String[] in table A is present at least once in table B, I need to construct a somewhat efficient contains()-esque method for String[].
This is what I have so far, but am stumped (also not sure if there's a much more efficient solution). Give it the source table and target table. It takes each String[] in the source table and (should) go through each String[] in the target table and find an instance of the source String[] somewhere in the target String[][] by checking if there's at least one String[] that matches the original String[], element by element. Can anyone point me in the right direction and/or fill in the blanks? This isn't homework or any assignment, I'm refactoring some code and am having a major brain fart. Thanks!
public boolean targetContainsSource(String[][] s, String[][] t) {
boolean result = true;
//For each String[] in String[][] s
for (int i = 0; i < s.length; i++) {
//For each String[] in String[][] t
for (int j = 0; j < t.length; j++) {
//For each String in t's String[]
for (int k = 0; k < t[0].length; k++) {
if (!s[i][k].equals(t[j][k])) {
}
}
}
}
return result;
}

Your innermost loop could be removed by using Arrays.equals().
For each element of the first array, you should define a found boolean variable, that would only be set to true once the element is found in the second array. Once the second loop is finished, if this variable is still false, you have found an element of the first array that is not in the second, and you can return immediately.
And of course, as soon as this variable is set to true, you can break out of the second loop.
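Putting those three points together, a minimal sketch could look like the following. The class name TableContains and the parameter names are only illustrative; the method name is taken from the question.

```java
import java.util.Arrays;

class TableContains {
    // Returns true if every row of source appears at least once in target.
    static boolean targetContainsSource(String[][] source, String[][] target) {
        for (String[] sourceRow : source) {
            boolean found = false;
            for (String[] targetRow : target) {
                // Arrays.equals compares length and then element by element
                if (Arrays.equals(sourceRow, targetRow)) {
                    found = true;
                    break; // no need to scan the rest of target
                }
            }
            if (!found) {
                return false; // this source row is missing from target
            }
        }
        return true;
    }
}
```

This is still O(rows in source × rows in target) in the worst case, so it is only a reasonable fit for small tables.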

Essentially, you generally need to do the following:
use a strong hash function to take a hash of each row: this gives you a single integer (probably a long to be strong enough) or single string/byte array representing the entire row
then proceed as though you were comparing two "lists" of rows. At least one of these "lists" should actually be stored in a HashSet/HashMap, whose contains() method is efficient.
For the hash function you could use MD5 (e.g. you can use this code, but use "MD5" instead of "SHA-1"). You can use MessageDigest.isEqual() to compare two byte arrays representing hash codes.
If you only have a small number (say, a few tens of thousands) of rows, then you could use a 64-bit hash code-- this just has the advantage that each hash is stored in a long so they're a bit easier to shufty about and compare. But 64-bit hash codes are only strong enough for guaranteeing uniqueness of hashes of tens to hundreds of thousands of objects (=different rows in your case).
P.S. If you're prepared to store all of the data in memory, then you could also just use as the "hash" of each row all of the columns concatenated together into a single string. The trick to making the check efficient is to have one of the tables' row representations stored in a HashSet/HashMap.
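As a rough sketch of the "keep one table in a HashSet" idea, the variant below skips MD5 and string concatenation and instead wraps each row in a List, relying on List's own equals() and hashCode(). That substitution is an assumption of this sketch, not something the answer above prescribes, and the class name HashedTableContains is made up.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class HashedTableContains {
    // Returns true if every row of source appears at least once in target.
    static boolean targetContainsSource(String[][] source, String[][] target) {
        // Arrays.asList gives a List view whose equals/hashCode depend on the
        // elements, unlike String[] itself, which uses identity.
        Set<List<String>> targetRows = new HashSet<>();
        for (String[] row : target) {
            targetRows.add(Arrays.asList(row));
        }
        for (String[] row : source) {
            if (!targetRows.contains(Arrays.asList(row))) {
                return false;
            }
        }
        return true;
    }
}
```

Building the set is linear in the size of the target table, and each lookup is expected constant time, at the cost of holding the target rows in memory. The rows must not be modified while the set is in use, because the List views are backed by the original arrays.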

Related

How to add key & element to the existing hash table in Java?

Here is my code:
for (int j = 0; j < modulePass1AL.get(i).modSize; j++) {
System.out.printf("for module %d: put into entirememoryMapHashtable:%d,%d\n", i, arrayPerModule[j][0], arrayPerModule[j][1]);
entirememoryMapHashtable.put(arrayPerModule[j][0], arrayPerModule[j][1]);
}
I want to add arrayPerModule[j][0] and arrayPerModule[j][1] to a hash table called entirememoryMapHashtable, which is a big hash table that is supposed to keep the information from each arrayPerModule (I have 4 arrayPerModule arrays in total -- they all have different array length).
However, I think my code is instead updating the contents of entirememoryMapHashtable, by adding each arrayPerModule starting again from the zeroth index of entirememoryMapHashtable.
Please help me fix this.
Thank you!
The put(...) method from the Map interface expects a key and a value. Remember, the key must be unique...
Check that arrayPerModule[j][0] has no repeated elements; otherwise, you will overwrite the value stored for that key. If you don't need to use arrayPerModule[j][0] as the key in this context, you could use the index.
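To see the overwriting behaviour in isolation, here is a small sketch. The class PutOverwriteDemo and its fill method are hypothetical; the only library behaviour relied on is that Map.put returns the previous value for the key, or null if there was none.

```java
import java.util.HashMap;
import java.util.Map;

class PutOverwriteDemo {
    // Copies {key, value} pairs into a map and reports every overwritten key.
    static Map<Integer, Integer> fill(int[][] pairs) {
        Map<Integer, Integer> map = new HashMap<>();
        for (int[] pair : pairs) {
            // put returns the value previously stored under this key, if any
            Integer previous = map.put(pair[0], pair[1]);
            if (previous != null) {
                System.out.printf("key %d held %d, replaced by %d%n",
                        pair[0], previous, pair[1]);
            }
        }
        return map;
    }
}
```

If the same key shows up in more than one arrayPerModule, only the last value written survives, which would look like the table being "updated" rather than extended.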

Which data structures to use when storing multiple entities with multiple query criteria?

There is a storage unit, which has a capacity for N items. Initially this unit is empty.
The space is arranged in a linear manner, i.e. one beside the other in a line.
Each storage space has a number, increasing till N.
When someone drops their package, it is assigned the first available space. The packages could also be picked up, in this case the space becomes vacant.
Example: If the total capacity was 4. and 1 and 2 are full the third person to come in will be assigned the space 3. If 1, 2 and 3 were full and the 2nd space becomes vacant, the next person to come will be assigned the space 2.
The packages they drop have 2 unique properties, assigned for immediate identification. First they are color coded based on their content and second they are assigned a unique identification number(UIN).
What we want is to query the system:
When the input is color, show all the UIN associated with this color.
When the input is color, show all the numbers where these packages are placed(storage space number).
Show where an item with a given UIN is placed, i.e. storage space number.
I would like to know which data structures to use for this case, so that the system works as efficiently as possible.
And I am not told which of these operations is most frequent, which means I will have to optimise for all the cases.
Please note: even though the queries do not directly ask by storage space number, an item is removed from the store by its storage space number.
You have mentioned three queries that you want to make. Let's handle them one by one.
I cannot think of a single Data Structure that can help you with all three queries at the same time. So I'm going to give an answer that has three Data Structures and you will have to maintain all the three DS's state to keep the application running properly. Consider that as the cost of getting a respectably fast performance from your application for the desired functionality.
When the input is color, show all the UIN associated with this color.
Use a HashMap that maps Color to a Set of UIN. Whenever an item:
is added - See if the color is present in the HashMap. If yes, add this UIN to the set else create a new entry with a new set and add the UIN then.
is removed - Find the set for this color and remove this UIN from the set. If the set is now empty, you may remove this entry altogether.
When the input is color, show all the numbers where these packages are placed.
Maintain a HashMap that maps UIN to the number where an incoming package is placed. From the HashMap that we created in the previous case, you can get the list of all UINs associated with the given Color. Then using this HashMap you can get the number for each UIN which is present in the set for that Color.
So now, when a package is to be added, you will have to add the entry to the previous HashMap in the specific Color bucket and to this HashMap as well. On removing, you will have to remove() the entry from here.
Finally,
Show where an item with a given UIN is placed.
If you have done the previous, you already have the HashMap mapping UINs to numbers. This problem is only a sub-problem of the previous one.
The third DS, as I mentioned at the top, will be a Min-Heap of ints. The heap will be initialized with the first N integers at the start. Then, as packages arrive, the heap will be polled. The number returned will represent the storage space where this package is to be put. If the storage unit is full, the heap will be empty. Whenever a package is removed, its number will be added back to the heap. Since it is a min-heap, the minimum number will bubble up to the top, satisfying your case that when 4 and 2 are empty, the next space to be filled will be 2.
Let's do a Big O analysis of this solution for completion.
Time for initialization: this setup will be O(N) because we will have to initialize a heap of N. The other two HashMaps will be empty to begin with and therefore will incur no time cost.
Time for adding a package: will include time to get a number and then make appropriate entries in the HashMaps. To get a number from heap will take O(Log N) time at max. Addition of entries in HashMaps will be O(1). Hence a worst case overall time of O(Log N).
Time for removing a package: will also be O(Log N) at worst because the time to remove from the HashMaps will be O(1) only while, the time to add the freed number back to min-heap will be upper bounded by O(Log N).
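A compact sketch of the three structures described above might look like this. The class PackageStore and its method names are invented for illustration. The two per-slot arrays are an extra assumption of the sketch, added so that a package can be removed by slot number as the question requires; slots are numbered 1 to N.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.Set;

class PackageStore {
    private final PriorityQueue<Integer> freeSlots = new PriorityQueue<>(); // min-heap
    private final Map<String, Set<Long>> uinsByColor = new HashMap<>();
    private final Map<Long, Integer> slotByUin = new HashMap<>();
    // Assumption beyond the answer: remember what sits in each slot.
    private final long[] uinAtSlot;
    private final String[] colorAtSlot;

    PackageStore(int capacity) {
        uinAtSlot = new long[capacity + 1];
        colorAtSlot = new String[capacity + 1];
        for (int slot = 1; slot <= capacity; slot++) freeSlots.add(slot);
    }

    // Returns the assigned slot, or -1 if the unit is full.
    int add(long uin, String color) {
        Integer slot = freeSlots.poll(); // smallest free slot, O(log N)
        if (slot == null) return -1;
        uinsByColor.computeIfAbsent(color, c -> new HashSet<>()).add(uin);
        slotByUin.put(uin, slot);
        uinAtSlot[slot] = uin;
        colorAtSlot[slot] = color;
        return slot;
    }

    // Frees a slot; returns false if it was already empty.
    boolean remove(int slot) {
        String color = colorAtSlot[slot];
        if (color == null) return false;
        long uin = uinAtSlot[slot];
        Set<Long> uins = uinsByColor.get(color);
        uins.remove(uin);
        if (uins.isEmpty()) uinsByColor.remove(color);
        slotByUin.remove(uin);
        colorAtSlot[slot] = null;
        freeSlots.add(slot); // O(log N)
        return true;
    }

    Set<Long> uinsForColor(String color) {
        return uinsByColor.getOrDefault(color, Collections.emptySet());
    }

    List<Integer> slotsForColor(String color) {
        List<Integer> slots = new ArrayList<>();
        for (long uin : uinsForColor(color)) slots.add(slotByUin.get(uin));
        return slots;
    }

    // Returns null if the UIN is not stored.
    Integer slotForUin(long uin) {
        return slotByUin.get(uin);
    }
}
```

The order of slotsForColor follows the HashSet's iteration order, so callers that need sorted output would have to sort the result themselves.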
This smells of homework or really bad management.
Either way, I have decided to do a version of this where you care most about query speed but don't care about memory or a little extra overhead to inserts and deletes. That's not to say that I think that I'm going to be burning memory like crazy or taking forever to insert and delete, just that I'm focusing most on queries.
Tl;DR - to solve your problem, I use a PriorityQueue, an Array, a HashMap, and an ArrayListMultimap (from guava, a common external library), each one to solve a different problem.
The following section is working code that walks through a few simple inserts, queries, and deletes. This next bit isn't actually Java, since I chopped out most of the imports, class declaration, etc. Also, it references another class called 'Packg'. That's just a simple data structure which you should be able to figure out just from the calls made to it.
Explanation is below the code
import com.google.common.collect.ArrayListMultimap;
private PriorityQueue<Integer> openSlots;
private Packg[] currentPackages;
Map<Long, Packg> currentPackageMap;
private ArrayListMultimap<String, Packg> currentColorMap;
private Object $outsideCall;
public CrazyDataStructure(int howManyPackagesPossible) {
$outsideCall = new Object();
this.currentPackages = new Packg[howManyPackagesPossible];
openSlots = new PriorityQueue<>();
IntStream.range(0, howManyPackagesPossible).forEach(i -> openSlots.add(i));//populate the open slots priority queue
currentPackageMap = new HashMap<>();
currentColorMap = ArrayListMultimap.create();
}
/*
* args[0] = integer, maximum # of packages
*/
public static void main(String[] args)
{
int howManyPackagesPossible = Integer.parseInt(args[0]);
CrazyDataStructure cds = new CrazyDataStructure(howManyPackagesPossible);
cds.addPackage(new Packg(12345, "blue"));
cds.addPackage(new Packg(12346, "yellow"));
cds.addPackage(new Packg(12347, "orange"));
cds.addPackage(new Packg(12348, "blue"));
System.out.println(cds.getSlotsForColor("blue"));//should be a list of {0,3}
System.out.println(cds.getSlotForUIN(12346));//should be 1 (0-indexed, remember)
System.out.println(cds.getSlotsForColor("orange"));//should be a list of {2}
System.out.println(cds.removePackage(2));//should be the orange one
cds.addPackage(new Packg(12349, "green"));
System.out.println(cds.getSlotForUIN(12349));//should be 2, since that's open
}
public int addPackage(Packg packg)
{
synchronized($outsideCall)
{
int result = openSlots.poll();
packg.setSlot(result);
currentPackages[result] = packg;
currentPackageMap.put(packg.getUIN(), packg);
currentColorMap.put(packg.getColor(), packg);
return result;
}
}
public Packg removePackage(int slot)
{
synchronized($outsideCall)
{
if(currentPackages[slot] == null)
return null;
else
{
Packg packg = currentPackages[slot];
currentColorMap.remove(packg.getColor(), packg);
currentPackageMap.remove(packg.getUIN());
currentPackages[slot] = null;
openSlots.add(slot);//return slot to priority queue
return packg;
}
}
}
public List<Packg> getUINsForColor(String color)
{
synchronized($outsideCall)
{
return currentColorMap.get(color);
}
}
public List<Integer> getSlotsForColor(String color)
{
synchronized($outsideCall)
{
return currentColorMap.get(color).stream().map(packg -> packg.getSlot()).collect(Collectors.toList());
}
}
public int getSlotForUIN(long uin)
{
synchronized($outsideCall)
{
if(currentPackageMap.containsKey(uin))
return currentPackageMap.get(uin).getSlot();
else
return -1;
}
}
I use 4 different data structures in my class.
PriorityQueue I use the priority queue to keep track of all the open slots. Adding a slot back and polling the smallest open slot are both O(log n) (only peeking is constant), so that shouldn't be too bad. Memory-wise, it's not particularly efficient, but it's also linear, so that won't be too bad.
Array I use a regular Array to track by slot #. This is linear for memory, and constant for insert and delete. If you needed more flexibility in the number of slots you could have, you might have to switch this out for an ArrayList or something, but then you'd have to find a better way to keep track of 'empty' slots.
HashMap Ah, the HashMap, the golden child of Big-O complexity. In return for some memory overhead and an annoying capital letter 'M', it's an awesome data structure. Insertions are reasonable, and queries are constant. I use it to map between the UINs and the slot for a Packg.
ArrayListMultimap The only data structure I use that's not plain Java. This one comes from Guava (Google, basically), and it's just a nice little shortcut to writing your own Map of Lists. Also, it plays nicely with nulls, and that's a bonus to me. This one is probably the least efficient of all the data structures, but it's also the one that handles the hardest task, so... can't blame it. This one allows us to grab the list of Packg's by color, in constant time relative to the number of slots and in linear time relative to the number of Packg objects it returns.
When you have this many data structures, it makes inserts and deletes a little cumbersome, but those methods should still be pretty straight-forward. If some parts of the code don't make sense, I'll be happy to explain more (by adding comments in the code), but I think it should be mostly fine as-is.
Query 3: Use a hash map whose key is the UIN and whose value is an object (storage space number, color) (plus any other information about the package). Cost is O(1) to query, insert or delete. Space is O(k), where k is the current number of UINs.
Query 1 and 2: Use a hash map + multiple linked lists
Hash map: the key is the color, the value is a pointer (or reference in Java) to the linked list of UINs for that color.
Each linked list contains UINs.
For query 1: ask the hash map, then return the corresponding linked list. Cost is O(k1), where k1 is the number of UINs for the queried color. Space is O(m+k1), where m is the number of unique colors.
For query 2: do query 1, then apply query 3 to each UIN. Cost is O(k1), where k1 is the number of UINs for the queried color. Space is O(m+k1), where m is the number of unique colors.
To insert: given color, number and UIN, insert an object (num, color) into the hash map of query 3; then hash(color) to reach the corresponding linked list and insert the UIN.
To delete: given a UIN, ask query 3 for the color, then use query 1 to delete the UIN from the linked list. Then delete the UIN from the hash map of query 3.
Bonus: To manage the storage space, the situation is the same as memory management in an OS: read more
This is very simple to do with a segment tree.
Store each position's own number in its place and query the minimum: it will match the first vacant place. When you capture a place, overwrite it with a marker for "taken" (shown as 0 below; the code uses int.MaxValue so that a taken place never wins the min query).
Package information can be stored in a separate array.
Initially it has the following values:
1 2 3 4
After capturing one place it looks like this:
0 2 3 4
After capturing one more it looks like this:
0 0 3 4
After capturing one more it looks like this:
0 0 0 4
After freeing place 2 it looks like this:
0 2 0 4
After capturing one more it looks like this:
0 0 0 4
and so on.
With a segment tree that fetches the minimum over a range, each operation can be done in O(log N).
Here is my implementation in C#; it is easy to translate to C++ or Java.
public class SegmentTree
{
private int Mid;
private int[] t;
public SegmentTree(int capacity)
{
this.Mid = 1;
while (Mid <= capacity) Mid *= 2;
this.t = new int[Mid + Mid];
for (int i = Mid; i < this.t.Length; i++) this.t[i] = int.MaxValue;
for (int i = 1; i <= capacity; i++) this.t[Mid + i] = i;
for (int i = Mid - 1; i > 0; i--) t[i] = Math.Min(t[i + i], t[i + i + 1]);
}
public int Capture()
{
int answer = this.t[1];
if (answer == int.MaxValue)
{
throw new Exception("Empty space not found.");
}
this.Update(answer, int.MaxValue);
return answer;
}
public void Erase(int index)
{
this.Update(index, index);
}
private void Update(int i, int value)
{
t[i + Mid] = value;
for (i = (i + Mid) >> 1; i >= 1; i = (i >> 1))
t[i] = Math.Min(t[i + i], t[i + i + 1]);
}
}
Here is an example of usage:
int n = 4;
var st = new SegmentTree(n);
Console.WriteLine(st.Capture());
Console.WriteLine(st.Capture());
Console.WriteLine(st.Capture());
st.Erase(2);
Console.WriteLine(st.Capture());
Console.WriteLine(st.Capture());
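For readers working in Java, a direct translation of the C# class above could look roughly like this. The names SlotSegmentTree, capture and release are this sketch's own, and IllegalStateException stands in for the generic Exception used in the C# version.

```java
import java.util.Arrays;

class SlotSegmentTree {
    private final int mid; // index offset of the leaves
    private final int[] t; // t[node] = smallest free slot in that subtree

    SlotSegmentTree(int capacity) {
        int m = 1;
        while (m <= capacity) m *= 2;
        mid = m;
        t = new int[2 * mid];
        Arrays.fill(t, Integer.MAX_VALUE);            // MAX_VALUE marks "not free"
        for (int i = 1; i <= capacity; i++) t[mid + i] = i; // slots 1..capacity are free
        for (int i = mid - 1; i > 0; i--) t[i] = Math.min(t[2 * i], t[2 * i + 1]);
    }

    // Takes and returns the smallest free slot.
    int capture() {
        int answer = t[1];
        if (answer == Integer.MAX_VALUE) {
            throw new IllegalStateException("Empty space not found.");
        }
        update(answer, Integer.MAX_VALUE);
        return answer;
    }

    // Marks a slot as free again.
    void release(int index) {
        update(index, index);
    }

    private void update(int i, int value) {
        t[i + mid] = value;
        // Recompute the minimum on the path from the leaf up to the root
        for (int node = (i + mid) >> 1; node >= 1; node >>= 1) {
            t[node] = Math.min(t[2 * node], t[2 * node + 1]);
        }
    }
}
```

Running the same sequence as the C# usage example (three captures, freeing slot 2, then two more captures) should yield 1, 2, 3, 2, 4.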
For getting the storage space number I used a min heap approach, PriorityQueue. This works in O(log n) time, removal and insertion both.
I used 2 BiMaps, self-created data structures, for storing the mapping between UIN, color and storage space number. These BiMaps used internally a HashMap and an array of size N.
In the first BiMap (BiMap1), a HashMap<color, Set<StorageSpace>> stores the mapping from color to the set of storage spaces, and a String array String[] colorSpace stores the color at each storage space index.
In the second BiMap (BiMap2), a HashMap<UIN, storageSpace> stores the mapping between UIN and storage space, and a String array String[] uinSpace stores the UIN at each storage space index.
Querying is straight forward with this approach:
When the input is color, show all the UIN associated with this color.
Get the List of storage spaces from BiMap1, for these spaces use the array in BiMap2 to get the corresponding UIN's.
When the input is color, show all the numbers where these packages are placed(storage space number). Use BiMap1's HashMap to get the list.
Show where an item with a given UIN is placed, i.e. storage space number. Use BiMap2 to get the values from the HashMap.
Now when we are given a storage space to remove, both the BiMaps have to be updated. In BiMap1 get the entry from the array, get the corresponding Set, and remove the space number from this set. From BiMap2 get the UIN from the array, remove it and also remove it from the HashMap.
For both the BiMaps the removal and the insert operations are O(1). And the Min heap works in O(Log n), hence the total time complexity is O(Log N)

2 dimensional array & method calls - beginner

I'm currently working on a homework assignment for a beginner-level class and I need help building a program that tests if a sudoku solution presented as an int[][] is valid. I do this by creating helper methods that check rows, columns and grids.
To check the column I call a method called getColumn that returns a column[]. When I test it out it works fine. I then pass it out on a method called uniqueEntries that makes sure that there are no duplicates.
Problem is, when I call my getColumn method, it returns an array consisting of only one number (for example 11111111, 22222222, 33333333). I have no idea why it does that. Here is my code:
int[][] sodokuColumns = new int[length][length];
for(int k = 0 ; k < sodokuPuzzle.length ; k++) {
sodokuColumns[k] = getColumn(sodokuPuzzle, k);
}
for (int l = 0; l < sodokuPuzzle.length; l++) {
if(uniqueEntries(sodokuColumns[l]) == false) {
columnStatus = false;
}
}
my helper is as follows
public static int[] getColumn(int[][] intArray, int index) {
int[] column = new int[intArray.length];
for(int i = 0 ; i < intArray.length ; i++) {
column[i] = intArray[i][index];
}
return column;
}
Thanks !
You said:
when I call my getColumn method, it returns an array consisting of only one number (for example 11111111, 22222222, 33333333).
I don't see any issue with your getColumn method other than the fact it's not even needed because getColumn(sodokuPuzzle, k) is the same as sodokuPuzzle[k]. If you're going to conceptualize your 2D array in such a way that your first index is the column then for your purpose of checking uniqueness you only need to write a method to get rows.
The issue you're having would seem to be with another part of your code that you did not share. I suspect there's a bug in the logic that accepts user input and that it's populating the puzzle incorrectly.
Lastly a tip for checking uniqueness (if you're allowed to use it) would be to create a Set of some kind (e.g. HashSet) and add all of your items (in your case integers) to that set. If the set has the same size as your original array of items then the items are all unique, if the size differs there are duplicates.
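A minimal sketch of that Set-based check follows. The method name uniqueEntries is borrowed from the question; the class name UniqueCheck is made up. Instead of comparing sizes at the end, it uses the fact that Set.add returns false for an element that is already present, which amounts to the same test but stops at the first duplicate.

```java
import java.util.HashSet;
import java.util.Set;

class UniqueCheck {
    // Returns true if no value occurs more than once in the array.
    static boolean uniqueEntries(int[] values) {
        Set<Integer> seen = new HashSet<>();
        for (int value : values) {
            if (!seen.add(value)) {
                return false; // add() returned false: value was already in the set
            }
        }
        return true;
    }
}
```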

In for loops, does the length of the array get evaluated each iteration?

if I have a for loop like...
for (int i = 0; i < myArray.length; i++) { ... }
...does myArray.length get evaluated every iteration? So would something like...
int len = myArray.length;
for (int i = 0; i < len; i++) { ... }
... be a small performance increase?
Regardless, myArray.length is just a field, so there is nothing to evaluate.
A Java array has length as a public final int, so it is initialized once, and referring to it executes no code the way a method call would.
The public final field length, which contains the number of components of the array. length may be positive or zero.
The first form will probably incur some performance penalty, since evaluating it will require, before the conditional branch, an aload, an arraylength and an iload; whereas the second is only two iloads.
@ajp rightly mentions that myArray may change; so it is highly unlikely that the compiler will optimize the first form into the second for you (unless, maybe, myArray is final).
However, the JIT, when it kicks in, is probably smart enough so that, if myArray doesn't change, it will turn the first form into the second.
Just in case, anyway, use the second form (this is what I always do, but that's just out of habit). Note that you can always javap the generated class file to see the generated byte code and compare.
By the way, Wikipedia has a very handy page listing all of a JVM's bytecodes. As you may see, quite a lot of them are dedicated to arrays!
Yes, the termination expression gets evaluated every time. So you're right that storing the length once could be a small performance increase. But more importantly, it changes the logic, which could make a difference if myArray gets reassigned.
for (int i = 0; i < myArray.length; i++) {
if (something-something-something) {
myArray = appendToMyArray(myArray, value); // sets myArray to a new, larger array
}
}
Now it makes a big difference whether you store the array length in a variable first.
You wouldn't normally see code like this with an array. But with an ArrayList or other collection, whose size could increase (or decrease) in the body of the loop, it makes a big difference whether you compute the size once or every time. This idiom shows up in algorithms where you keep a "To-Do list". For example, here's a partial algorithm to find everyone who's connected directly or indirectly to some person:
ArrayList<Person> listToCheck = new ArrayList<>(Arrays.asList(KevinBacon));
for (int i = 0; i < listToCheck.size(); i++) {
List<Person> connections = allConnections(listToCheck.get(i));
for (Person p : connections) {
if ([p has not already been checked]) {
listToCheck.add(p); // increases listToCheck.size()!!!
}
}
}
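For a version of that to-do-list loop that actually compiles, here is a sketch using plain strings for people and a Map for the connection lookup. The class Reachability, the method reachableFrom and the map-based representation are all assumptions of this example; the "has not already been checked" test is filled in with a HashSet.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class Reachability {
    // Returns start plus everyone connected to it directly or indirectly.
    static List<String> reachableFrom(String start, Map<String, List<String>> connections) {
        List<String> listToCheck = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        listToCheck.add(start);
        seen.add(start);
        // size() is re-evaluated on every iteration, so entries appended
        // inside the loop body are visited later by the same loop.
        for (int i = 0; i < listToCheck.size(); i++) {
            List<String> direct =
                    connections.getOrDefault(listToCheck.get(i), Collections.emptyList());
            for (String p : direct) {
                if (seen.add(p)) { // true only the first time p is encountered
                    listToCheck.add(p);
                }
            }
        }
        return listToCheck;
    }
}
```

Had the size been cached in a local variable before the loop, only the start person would ever be processed.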
Not really. Both cases are comparing the value at two memory addresses with every iteration, except you are doing unnecessary assigning when you use a len variable. The performance difference is probably very small, and the first form is more readable, so I would use the first form. If you want to be even more readable and efficient, use a for-each loop if you are just going to do a linear iteration through your array. For-each loops work like this:
int [] myArray = {1,2,3};
for(int i:myArray){
System.out.println(i);
}
will print:
1
2
3
as i is set to each element of the array. The for each loop can be used for many objects, and is a nice feature to learn.
Here is a guide explaining it.
https://www.udemy.com/blog/for-each-loop-java/

Iterable/Multidimensional array next method issues

Disclaimer: This is for a homework assignment.
I am currently working on an assignment where I need to implement an iterable interface in order to pass each array from a square two-dimensional array. This array is supposed to represent a grid of numbers (so I will be referring to them as such [row][col]). My problem is that I want to use the same next method to iterate through the rows and the columns. First, is this possible? Second, any suggestions/hints?
My next method currently looks like this:
public Data[] next(){
Data [] holder = new Data[ray.length];
for (int i = 0; i <ray.length; i++)
holder[i]=ray[counter][i];
counter++;
return holder;}
EDIT: I am aware of being able to switch counter and i in ray[counter][i], but I'm not sure how to have it do both if that's possible.
ray is the multidimensional array and counter is an attribute of the Iterator I've created (it's initialized to 0 and this is the only method that changes it). I know I cannot return the "column" of ray this way, so how would I go about having next return columns and rows? Thanks for any help. I'll be standing by if you have further questions.
My problem is that I want to use the same next method to iterate through the rows and the columns. First, is this possible?
Yes it is possible, assuming you mean what I think you mean. (The phrase "iterate through the rows and the columns" is horribly ambiguous.)
Since this is a homework exercise here are a couple of hints:
You need two counters not one.
When you get to the end of one row you need to go to the start of the next row. (Obviously!) Think about what that means if you've got two counters.
This should be enough to get you on the right track.
I want a row by row iteration and a column by column iteration.
This is also a horribly ambiguous description, but I'm going to interpret it as meaning that sometimes you want to iterate left to right and top to bottom, and other times you want to iterate top to bottom and left to right.
That is also possible:
One possibility is to use an extra state variable to tell the iterator which direction you are iterating; i.e. row within column, or column within row.
Another possibility is to implement two distinct Iterator classes for the two directions.
The problem is that the iterator class is only supposed to have one counter and returns an single-dimension array.
You've (finally) told us unambiguously that the iterator is supposed to return an array. (A good dentist could pull out a tooth quicker than that!)
So here's a hint:
Returning the ith row is easy, but returning the jth column requires you to create a new array to hold the values in that column.
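As one possible reading of the "extra state variable" hint, the sketch below keeps a single counter plus a direction flag and copies either a row or a column into a new array on each call to next(). It uses int instead of the assignment's Data type, assumes a square grid as stated in the question, and the names LineIterator and byColumn are invented.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

class LineIterator implements Iterator<int[]> {
    private final int[][] grid;     // assumed square, as in the question
    private final boolean byColumn; // false: yield rows, true: yield columns
    private int counter = 0;

    LineIterator(int[][] grid, boolean byColumn) {
        this.grid = grid;
        this.byColumn = byColumn;
    }

    @Override
    public boolean hasNext() {
        return counter < grid.length;
    }

    @Override
    public int[] next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        int[] holder = new int[grid.length];
        for (int i = 0; i < grid.length; i++) {
            // Swapping the two indices is what turns a row into a column
            holder[i] = byColumn ? grid[i][counter] : grid[counter][i];
        }
        counter++;
        return holder;
    }
}
```

Creating one iterator with byColumn set to false and another with it set to true gives the row-by-row and column-by-column passes from the same next() method.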
My advice is: transform the 2D array into a list and iterate over that.
When you initialize the iterator, build the list from the array. Then you can iterate over the list easily.
The following is pseudocode; you can flesh out the implementation in your homework. Hope it helps you!
class TwoDimeIterator implements Iterator<Data> {
    // Flattened copy of the 2D array ray (a field of the enclosing class)
    List<Data> transformedList = new ArrayList<>();
    int cursor = 0;
    /** Transform to a list row by row.
    This is where you define your iteration order. **/
    TwoDimeIterator() {
        for (int i = 0; i < ray.length; i++)
            for (int j = 0; j < ray[i].length; j++)
                transformedList.add(ray[i][j]);
    }
    public Data next() {
        return transformedList.get(cursor++);
    }
    public boolean hasNext() {
        return cursor != transformedList.size();
    }
    //...
}
