Store all combinations of String array and search - java

I have a big static dataset in memory that stores the following attributes of people:
[sex, age, race, marital-status, education, native-country, workclass, occupation]
Each attribute takes values from a predefined set, and the sets differ in size from one attribute to the next. This is the dictionary:
[[Male, Female], [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100], [White, Asian-Pac-Islander, Amer-Indian-Eskimo, Other, Black], [Married-civ-spouse, Divorced, Never-married, Separated, Widowed, Married-spouse-absent, Married-AF-spouse], [Bachelors, Some-college, 11th, HS-grad, Prof-school, Assoc-acdm, Assoc-voc, 9th, 7th-8th, 12th, Masters, 1st-4th, 10th, Doctorate, 5th-6th, Preschool], [United-States, Cambodia, England, Puerto-Rico, Canada, Germany, Outlying-US(Guam-USVI-etc), India, Japan, Greece, South, China, Cuba, Iran, Honduras, Philippines, Italy, Poland, Jamaica, Vietnam, Mexico, Portugal, Ireland, France, Dominican-Republic, Laos, Ecuador, Taiwan, Haiti, Columbia, Hungary, Guatemala, Nicaragua, Scotland, Thailand, Yugoslavia, El-Salvador, Trinadad&Tobago, Peru, Hong, Holand-Netherlands], [Private, Self-emp-not-inc, Self-emp-inc, Federal-gov, Local-gov, State-gov, Without-pay, Never-worked], [Tech-support, Craft-repair, Other-service, Sales, Exec-managerial, Prof-specialty, Handlers-cleaners, Machine-op-inspct, Adm-clerical, Farming-fishing, Transport-moving, Priv-house-serv, Protective-serv, Armed-Forces]]
I would like to have a structure that keeps all possible combinations, so that for each combination in my dataset I can store some statistics (e.g. how many times a specific combination exists in the dataset), but store some information also for combinations that don't exist in the dataset. So all combinations should be represented.
I tried producing all possible combinations using an ArrayList of String[], but it takes several seconds, and searching for a specific combination with indexOf(x), where x is a String[], doesn't seem to work.
import java.util.ArrayList;
import java.util.NoSuchElementException;

public class Grid {
    // Immutable fields
    private final int combinationLength;
    private final String[][] values;
    private final int[] maxIndexes;
    private final ArrayList<String[]> GridValues = new ArrayList<String[]>();
    // Mutable fields
    private final int[] currentIndexes;
    private boolean hasNext;

    public Grid(final String[][] array) {
        combinationLength = array.length;
        values = array;
        maxIndexes = new int[combinationLength];
        currentIndexes = new int[combinationLength];
        if (combinationLength == 0) {
            hasNext = false;
            return;
        }
        hasNext = true;
        // Fill in the arrays of max indexes and current indexes.
        for (int i = 0; i < combinationLength; ++i) {
            if (values[i].length == 0) {
                // Set hasNext to false if at least one of the value-arrays is empty.
                // Stop the loop as the behavior of the iterator is already defined in this case:
                // the iterator will just return no combinations.
                hasNext = false;
                return;
            }
            maxIndexes[i] = values[i].length - 1;
            currentIndexes[i] = 0;
        }
        while (hasNext()) {
            String[] nextCombination = next();
            GridValues.add(nextCombination);
        }
    }

    private boolean hasNext() {
        return hasNext;
    }

    public String[] next() {
        if (!hasNext) {
            throw new NoSuchElementException("No more combinations are available");
        }
        final String[] combination = getCombinationByCurrentIndexes();
        nextIndexesCombination();
        return combination;
    }

    private String[] getCombinationByCurrentIndexes() {
        final String[] combination = new String[combinationLength];
        for (int i = 0; i < combinationLength; ++i) {
            combination[i] = values[i][currentIndexes[i]];
        }
        return combination;
    }

    private void nextIndexesCombination() {
        for (int i = combinationLength - 1; i >= 0; --i) {
            if (currentIndexes[i] < maxIndexes[i]) {
                // Increment the current index
                ++currentIndexes[i];
                return;
            } else {
                // Current index at max:
                // reset it to zero and "carry" to the next index
                currentIndexes[i] = 0;
            }
        }
        // If we are here, then all current indexes are at max, and there are no more combinations
        hasNext = false;
    }
}
Does anyone have an idea for a faster and better way to do this?
Thanks a lot!

I am making an assumption here: that the data does not keep changing (from the look of it, the data is not dynamic).
I would use a local file-based HSQLDB database to store the data (I chose this for speed; feel free to swap it out for a full database server like MySQL).
The trick to getting counts across the various dimensions is in the schema.
For data mining, a "star schema" is the preferred approach. This schema lets you group by and count on any dimension you want. In your case the schema would probably look like:
table people - columns(id (primary key), name, age, sex_id, country_id, highest_education_id, income)
table sex - columns(id (primary key), name)
table country - columns(id (primary key), name)
table education - columns(id (primary key), name)
This way, if you want the count of all people who are from Columbia, the query would look like:
select count(*) from people where country_id = <columbia country id>
You can also run higher-order queries, such as finding the total income of everyone from Japan:
select country.name, sum(people.income)
from people inner join country on people.country_id = country.id
where country.name = 'Japan'
group by country.name
It's highly flexible and extensible.
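If you would rather stay in memory, one alternative (not what this answer proposes; a minimal sketch, assuming combinations should be compared by value) is to key a HashMap on a List<String> instead of a String[]. Arrays inherit identity-based equals from Object, which is why indexOf on an ArrayList<String[]> does not find a freshly built array, whereas a List compares element by element. Combinations that never occur need no entry at all; getOrDefault reports 0 for them. The class and method names below are invented for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: counts how often each combination occurs in the dataset.
final class CombinationCounts {
    private final Map<List<String>, Integer> counts = new HashMap<>();

    // Record one row of the dataset.
    void add(String... combination) {
        // Clone before wrapping so later changes to the caller's array cannot alter the key.
        counts.merge(Arrays.asList(combination.clone()), 1, Integer::sum);
    }

    // Returns how many times the combination was recorded, or 0 if it never was.
    int countOf(String... combination) {
        return counts.getOrDefault(Arrays.asList(combination), 0);
    }
}
```

Only combinations that actually appear take up memory, which matters here because the full cross product of the dictionary above runs to hundreds of millions of entries.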


ArrayOutOfBounds with minimum/maximum values

I am trying to solve Project Euler problem 18. I have created an array for each row (starting from the bottom), and then an array of those arrays. I created a recursive method that starts from the bottom row, and looks ahead three rows to find the best path.
I created minimum and maximum methods to make sure that the index of my arrays could not go below zero, or above the length minus one.
/**
 * A method that sets a minimum limit for an integer
 * @param a The number
 * @param b The lowest value it can go
 * @return a
 */
public static int min(int a, int b) {
    if (a < b) {
        a = b;
    }
    return a;
}

/**
 * Sets the maximum limit for an int
 * @param a the number
 * @param b The highest a number can go
 * @return a
 */
public static int max(int a, int b) {
    if (a > b) {
        a = b;
    }
    return a;
}
Then I used these methods when calculating all possible paths in the next three rows.
I got the Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 11 error on this line of code:
eighthPoss = array[x+2][max(i, array[x].length-1)] + array[x+1][max(i, array[x].length-1)] + array[x][max(i, array[x].length-1)];
Where x is the current row, and i is the current position on the row.
I have if statements for when x+2 and x+1 are more than the amount of rows (When we are on the second to last, or last row of the triangle). I am genuinely confused how anything on this line of code is out of bounds as I have minimum and maximum values on each of them to make sure they don't go out of the range. I ran print statements and the last numbers the loops ran through before the error were x=1, i=10.
Below are my arrays. (I did not include the top row, since it is only one number.)
int[] row1 = {04, 62, 98, 27, 23, 9, 70, 98, 73, 93, 38, 53, 60, 04, 23};
int[] row2 = {63, 66, 04, 68, 89, 53, 67, 30, 73, 16, 69, 87, 40, 31};
int[] row3 = {91, 71, 52, 38, 17, 14, 91, 43, 58, 50, 27, 29, 48};
int[] row4 = {70, 11, 33 ,28, 77, 73, 17, 78, 39, 68, 17, 57};
int[] row5 = {53, 71, 44, 65, 25, 43, 91, 52, 97, 51, 14};
int[] row6 = {41, 48, 72, 33, 47, 32, 37, 16, 94, 29};
int[] row7 = {41, 41, 26, 56, 83, 40, 80, 70, 33};
int[] row8 = {99, 65, 4, 28, 6, 16, 70, 92};
int[] row9 = {88, 2, 77, 73, 7, 63, 67};
int[] row10 = {19, 1, 23, 75, 3, 34};
int[] row11 = {20, 4, 82, 47, 65};
int[] row12 = {18, 35, 87, 10};
int[] row13 = {17, 47, 82};
int[] row14 = {95, 64};
int[][] rows = {row1, row2, row3, row4, row5, row6, row7, row8, row9, row10, row11, row12,
row13, row14};
Any help you could give me would be greatly appreciated.
Make sure that x+2 isn't going out of bounds; it looks like you're only checking whether i goes out of bounds. (You may be checking that somewhere else, but you didn't provide that code.)
As a bonus, here is a clamp method that I personally use instead of separate hand-written max and min methods (it relies on Math.max and Math.min, which are built into Java).
public static int clamp(int a, int min, int max) {
    return Math.max(min, Math.min(max, a));
}
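One more thing worth checking, stated as an assumption since the full method isn't shown: in the failing line every column index is limited by array[x].length-1, yet with the rows stored bottom-up, rows x+1 and x+2 are shorter than row x, so an index that is legal for row x can still overrun them. A minimal sketch that clamps against the row actually being read (TriangleAccess and valueAt are names invented here):

```java
final class TriangleAccess {
    static int clamp(int a, int min, int max) {
        return Math.max(min, Math.min(max, a));
    }

    // Reads rows[row][col], clamping the row to the array and the column
    // to the length of the row that is actually being indexed.
    static int valueAt(int[][] rows, int row, int col) {
        int r = clamp(row, 0, rows.length - 1);
        int c = clamp(col, 0, rows[r].length - 1);
        return rows[r][c];
    }
}
```

Clamping only prevents the exception; whether a clamped cell is a valid step on a path through the triangle is a separate question.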

Generic Selection sort that can check object fields

I need to read a file in which each line has a name followed by data values, and sort the lines by the value at a given position in that data. Here is an example text file; think of it as cities and their average temperature for each month (not actually accurate).
NewYork: 35, 28, 99, 39, 3, 15, 52, 5, 6, 97, 36, 32
Baltimore: 1, 59, 55, 0, 92, 82, 23, 60, 23, 16, 75, 75
Seattle: 19, 18, 10, 36, 50, 2, 8, 36, 56, 86, 14, 91
Atlanta: 57, 75, 52, 66, 28, 58, 53, 5, 21, 30, 81, 58
I want to be able to sort these based on a given data point, so if 2 was selected it should be sorted by the second point in ascending order so it would look like this. The actual data for each city is not being sorted, the whole data set is being either moved up or down based on the data point that is being used.
Seattle: 19, 18, 10, 36, 50, 2, 8, 36, 56, 86, 14, 91
NewYork: 35, 28, 99, 39, 3, 15, 52, 5, 6, 97, 36, 32
Baltimore: 1, 59, 55, 0, 92, 82, 23, 60, 23, 16, 75, 75
Atlanta: 57, 75, 52, 66, 28, 58, 53, 5, 21, 30, 81, 58
The second data point of each row (18, 28, 59, 75) is the column that was sorted on.
Here is what the selectionsort class is looking like, and I could get it to work without generics but I want to be able to use generics.
import java.util.ArrayList;

public class SelectionSort {
    private SelectionSort() {
    }

    public static <T extends Comparable<T>> void sort(ArrayList<T> arrayList, int key) {
        for (int i = 0; i < arrayList.size() - 1; i++) {
            int smallestIndex = i;
            for (int j = i + 1; j < arrayList.size(); j++) {
                if (arrayList.get(smallestIndex).compareTo(arrayList.get(j)) > 0) {
                    smallestIndex = j;
                }
            }
            // Swap with set(); add(index, element) would insert and grow the list.
            T temp = arrayList.get(i);
            arrayList.set(i, arrayList.get(smallestIndex));
            arrayList.set(smallestIndex, temp);
        }
    }
}
Then for the actual objects I have this where the input to the constructor is simply one whole line from the text file. Then the data handle and name handle help to split the String up and set name to the name from the text file, and set the arraylist to be all the numbers that the name had with it.
import java.util.ArrayList;
import java.util.Arrays;

public class ObjectData<T> {
    private ArrayList<Integer> list;
    private String name;

    public ObjectData(String objectInfo) {
        String array[] = allData(objectInfo);
        this.name = nameHandle(array[0]);
        list = new ArrayList<Integer>(Arrays.asList(dataHandle(array[1])));
    }

    // Splits "name: v1, v2, ..." into the name part and the data part.
    private static String[] allData(String string) {
        String array[] = string.split(":");
        return array;
    }

    private static String nameHandle(String string) {
        String name = string.trim();
        return name;
    }

    // Parses the comma-separated data part into Integers.
    private Integer[] dataHandle(String string) {
        String array[] = string.split(",");
        String trimmedArray[] = new String[array.length];
        Integer integerArray[] = new Integer[trimmedArray.length];
        for (int i = 0; i < array.length; i++) {
            trimmedArray[i] = array[i].trim();
        }
        for (int i = 0; i < trimmedArray.length; i++) {
            integerArray[i] = Integer.valueOf(trimmedArray[i]);
        }
        return integerArray;
    }

    public String getName() {
        return this.name;
    }

    public Integer getIndex(int index) {
        return list.get(index);
    }
}
In main, all the objects are put into an ArrayList, and that list is then passed to the SelectionSort class. The problem I am having is figuring out how to use the generic selection sort with an object like this. For example, if the key was 2 as above, I was planning to call
if(arrayList.get(smallestIndex).getIndex(2).compareTo((arrayList.get(j).getIndex(2))) > 0 )
instead of if(arrayList.get(smallestIndex).compareTo((arrayList.get(j))) > 0 )
The only other way I was thinking about being able to do this was instead of creating objects I would just put all the information into a 2d array and then be able to compare them, but then I lose the name for the data. If anyone can suggest how I should change this to make it work, or a totally different way to accomplish this, that would be great.
One way you might consider is creating an object that implements the Comparator interface and takes the index to sort on as a parameter:
import java.util.Comparator;

public class DataComparator implements Comparator<ObjectData> {
    private final int sortIndex;

    public DataComparator(int sortIndex) {
        this.sortIndex = sortIndex;
    }

    @Override
    public int compare(ObjectData t1, ObjectData t2) {
        // Integer.compareTo returns 0 for equal values, as the Comparator contract requires.
        return t1.getIndex(sortIndex).compareTo(t2.getIndex(sortIndex));
    }
}
If you then have a List<ObjectData> of your cities, let's call it cityList for the sake of example, you can sort it using Collections.sort(cityList, new DataComparator(2)). Note that getIndex is zero-based, so the second data point is index 1.
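If the selection sort itself has to stay, the same idea carries over: let the sort take a Comparator rather than requiring T to be Comparable. The following is a minimal sketch under that assumption; City is a hypothetical stand-in for ObjectData, and ComparatorSelectionSort and namesSortedBy are names invented for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for ObjectData: a name plus its data points.
final class City {
    final String name;
    final int[] data;

    City(String name, int... data) {
        this.name = name;
        this.data = data;
    }
}

final class ComparatorSelectionSort {
    // Selection sort that leaves the ordering rule to the caller.
    static <T> void sort(List<T> list, Comparator<? super T> order) {
        for (int i = 0; i < list.size() - 1; i++) {
            int smallest = i;
            for (int j = i + 1; j < list.size(); j++) {
                if (order.compare(list.get(j), list.get(smallest)) < 0) {
                    smallest = j;
                }
            }
            Collections.swap(list, i, smallest);
        }
    }

    // Sorts a copy of the cities by the data point at a zero-based index
    // and returns the names in the resulting order.
    static List<String> namesSortedBy(List<City> cities, int index) {
        List<City> copy = new ArrayList<>(cities);
        sort(copy, Comparator.comparingInt((City c) -> c.data[index]));
        List<String> names = new ArrayList<>();
        for (City c : copy) {
            names.add(c.name);
        }
        return names;
    }
}
```

With the four cities from the question and index 1 (the second data point), this should give the order Seattle, NewYork, Baltimore, Atlanta.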

Java: Selection Sort My implementation vs. another

The following is my implementation of Selection Sort:
package algorithm.selectionsort;

public class SelectionSort {
    public static void main(String[] args) {
        int[] myArray = selectionSort(new int[] { 9, 9, 9, 8, 7, 73, 32, 109, 1100, 432, 321, 0 });
        for (int element : myArray) {
            System.out.print("" + element + " ");
        }
    }

    public static int[] selectionSort(int[] a) {
        int min;
        for (int i = 0; i < a.length - 1; i++) {
            min = i;
            for (int j = i + 1; j < a.length; j++) {
                if (a[j] < a[min]) {
                    min = j;
                    int temp = a[i];
                    a[i] = a[min];
                    a[min] = temp;
                }
            }
        }
        return a;
    }
}
I noticed that my instructor codes it slightly differently:
public static int[] selectionSort(int[] a) {
    int min;
    for (int i = 0; i < a.length - 1; i++) {
        min = i;
        for (int j = i + 1; j < a.length; j++) {
            if (a[j] < a[min]) {
                min = j;
            }
        }
        int temp = a[i];
        a[i] = a[min];
        a[min] = temp;
    }
    return a;
}
Both implementations work. I'm curious as to what the difference here is. Is it efficiency?
The difference between your instructor's version and yours is that his iterates through the array and, for each element, first searches for the minimum and only then performs a single swap with the element after the wall index.
Yours iterates through the array and, for each element, performs a swap with the element after the wall index every time the current value is less than the current tentative min, while the search is still in progress.
So instead of swapping about n times in total, you could possibly swap on the order of n*n times in the worst case:
Your swaps for just one pass (worst case):
100, 90, 88, 70, 55, 43, 32, 28, 19, 10
90, 100, 88, 70, 55, 43, 32, 28, 19, 10
88, 100, 90, 70, 55, 43, 32, 28, 19, 10
70, 100, 90, 88, 55, 43, 32, 28, 19, 10
55, 100, 90, 88, 70, 43, 32, 28, 19, 10
43, 100, 90, 88, 70, 55, 32, 28, 19, 10
32, 100, 90, 88, 70, 55, 43, 28, 19, 10
28, 100, 90, 88, 70, 55, 43, 32, 19, 10
19, 100, 90, 88, 70, 55, 43, 32, 28, 10
10, 100, 90, 88, 70, 55, 43, 32, 28, 19
Your instructor's swap for just one pass (worst case):
100, 90, 88, 70, 55, 43, 32, 28, 19, 10
10, 90, 88, 70, 55, 43, 32, 28, 19, 100
In essence, you swap the values while in the midst of searching for the min. The "min" you swapped may not be the lowest value in the array.
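A quick check suggests the difference may be more than efficiency. After a mid-search swap, min still points at the slot that now holds the old a[i], so later comparisons are made against that larger value rather than against the smallest value seen so far. The sketch below (class and method names are invented here) copies both inner loops unchanged and runs them on {3, 1, 2}; it is one counterexample traced by hand, not a general analysis.

```java
import java.util.Arrays;

final class SwapPlacementDemo {
    // The question's variant: swaps inside the inner loop.
    static int[] swapWhileSearching(int[] input) {
        int[] a = input.clone();
        for (int i = 0; i < a.length - 1; i++) {
            int min = i;
            for (int j = i + 1; j < a.length; j++) {
                if (a[j] < a[min]) {
                    min = j;
                    int temp = a[i];
                    a[i] = a[min];
                    a[min] = temp;
                }
            }
        }
        return a;
    }

    // The instructor's variant: one swap after the search has finished.
    static int[] swapAfterSearching(int[] input) {
        int[] a = input.clone();
        for (int i = 0; i < a.length - 1; i++) {
            int min = i;
            for (int j = i + 1; j < a.length; j++) {
                if (a[j] < a[min]) {
                    min = j;
                }
            }
            int temp = a[i];
            a[i] = a[min];
            a[min] = temp;
        }
        return a;
    }

    public static void main(String[] args) {
        int[] input = {3, 1, 2};
        System.out.println(Arrays.toString(swapWhileSearching(input))); // [2, 1, 3]
        System.out.println(Arrays.toString(swapAfterSearching(input))); // [1, 2, 3]
    }
}
```

The array in the question happens to come out sorted with both variants, so it is worth testing a few more inputs before relying on the swap-while-searching version.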
Of course, your instructor's code is more efficient and more elegant.
What is Selection Sort?
The algorithm divides the input list into two parts: the sublist of items already sorted, which is built up from left to right at the front (left) of the list, and the sublist of items remaining to be sorted that occupy the rest of the list. Initially, the sorted sublist is empty and the unsorted sublist is the entire input list. The algorithm proceeds by finding the smallest (or largest, depending on sorting order) element in the unsorted sublist, exchanging (swapping) it with the leftmost unsorted element (putting it in sorted order), and moving the sublist boundaries one element to the right.
If the length of the list to be sorted is n, then at most n-1 exchanges are needed, but your code can perform up to (n-1) + (n-2) + ... + 1 = n(n-1)/2 of them.

How do I sum two ArrayLists?

I got the error below when I ran the code:
Exception in thread "main" java.lang.IndexOutOfBoundsException: Index: 1, Size: 0
at java.util.LinkedList.checkPositionIndex(Unknown Source)
at java.util.LinkedList.addAll(Unknown Source)
at Collection.Dynamycmaasiv.Collecktionaddlist.main(Collecktionaddlist.java:36)
code
    public static void main(String[] args) {
        LinkedList<Integer> num = new LinkedList<Integer>();
        LinkedList<Integer> numodd = new LinkedList<Integer>();
        LinkedList<Integer> numeven = new LinkedList<Integer>();
        LinkedList<Integer> sumoffevenandodd = new LinkedList<Integer>(); // help me to solve
        for (double i = 0; i < 50; i++) {
            num.add((int) i);
            if (i % 2 == 0) {
                numeven.add((int) i);
            } else {
                numodd.add((int) i);
            }
        }
        System.out.println(num);
        System.out.println("-----------------");
        System.out.println(numodd);
        System.out.println("-----------------");
        System.out.println(numeven);
        for (int i = 0; i < numeven.size(); i++) {
            sumoffevenandodd.addAll(numeven.get(i) + numodd.get(i), null);
        }
        System.out.println(sumoffevenandodd);
    }
}
addAll() is not about adding up numbers. It adds all the elements of the collection passed as the argument to the collection itself.
So, you need to loop, like
int sum = 0;
for (Integer numberFromList : numeven) {
    sum = sum + numberFromList;
}
Or, if you have Java 8, you can use streams:
int sumEven = numeven.stream().mapToInt(Integer::intValue).sum();
Sum, done.
And for the record, the real lesson to be learned here: read the javadoc. Don't assume that a method called addAll() does what you suppose it does. Turn to the javadoc and find out what it actually does.
But just to be clear, as I got carried away with your question too:
In your code, if you change
sumoffevenandodd.addAll(numeven.get(i)+ numodd.get(i), null);
to
sumoffevenandodd.add(numeven.get(i)+ numodd.get(i));
it should work, too.
Long story short: if you really intended to build a list of 25 pairwise sums, then my first paragraphs do not really help with your problem.
But it isn't exactly clear what you wanted to do; so I leave my answer as is - to address both possible explanations what is "wrong" in your logic.
If the intention of the question is to get
num odd
[1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49]
num even
[0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48]
sum of odd and even
[1, 5, 9, 13, 17, 21, 25, 29, 33, 37, 41, 45, 49, 53, 57, 61, 65, 69, 73, 77, 81, 85, 89, 93, 97]
then use
for (int i = 0; i < numeven.size(); i++) {
    sumoffevenandodd.add(numeven.get(i) + numodd.get(i));
}
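Both readings of the question can also be written with streams on Java 8 or later. This is a minimal sketch; ListSums, total and pairwise are names made up for the example, and it assumes the shorter list decides how many pairs are summed.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

final class ListSums {
    // Total of all elements. Stream<Integer> has no sum(), so map to an IntStream first.
    static int total(List<Integer> numbers) {
        return numbers.stream().mapToInt(Integer::intValue).sum();
    }

    // Element-wise sums of two lists; stops at the end of the shorter list.
    static List<Integer> pairwise(List<Integer> a, List<Integer> b) {
        int n = Math.min(a.size(), b.size());
        return IntStream.range(0, n)
                .mapToObj(i -> a.get(i) + b.get(i))
                .collect(Collectors.toList());
    }
}
```

Because get(i) on a LinkedList walks the list on every call, an ArrayList is the better fit when elements are accessed by index like this.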

HashTable: in case of collision

Hashtable ht = new Hashtable();
for (int i = 0; i < 100; i++) {
    ht.put(i % 10, i);
}
Enumeration<Integer> eles = ht.elements();
while (eles.hasMoreElements())
    System.out.println(eles.nextElement());
The above code snippet prints 99, 98, ..., 90.
But I want to print all 100 elements.
How to get a list of numbers like ...
99,89,79,69,...19,9
98,88,78,68....18,8
97,87,77,67....17,7
..
..
91,81,71,61....11,1
Basically, the whole collision list for each key.
You are currently using i % 10 as your hash map key, which only has ten values (0-9). Hence only the last ten values are stored in your map; all the others are overwritten.
If you need to store more than one item in each bucket, use a list type as your value. For example:
Hashtable<Integer, List<Integer>> ht = new Hashtable<>();
for (int i = 0; i < 100; i++) {
    int key = i % 10;
    List<Integer> list = ht.get(key);
    if (list == null) {
        list = new ArrayList<>();
        ht.put(key, list);
    }
    list.add(i);
}
Enumeration<List<Integer>> eles = ht.elements();
while (eles.hasMoreElements()) {
    System.out.println(Arrays.toString(eles.nextElement().toArray()));
}
Output:
[9, 19, 29, 39, 49, 59, 69, 79, 89, 99]
[8, 18, 28, 38, 48, 58, 68, 78, 88, 98]
[7, 17, 27, 37, 47, 57, 67, 77, 87, 97]
[6, 16, 26, 36, 46, 56, 66, 76, 86, 96]
[5, 15, 25, 35, 45, 55, 65, 75, 85, 95]
[4, 14, 24, 34, 44, 54, 64, 74, 84, 94]
[3, 13, 23, 33, 43, 53, 63, 73, 83, 93]
[2, 12, 22, 32, 42, 52, 62, 72, 82, 92]
[1, 11, 21, 31, 41, 51, 61, 71, 81, 91]
[0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
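On Java 8 or later, the get / null check / put sequence can be folded into a single Map.computeIfAbsent call. A minimal sketch of the same grouping follows; GroupByLastDigit and group are names invented for the example, and TreeMap is an assumption made here only so that the keys iterate in ascending order (the call works on a Hashtable as well).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class GroupByLastDigit {
    // Groups the numbers 0..limit-1 by i % 10.
    static Map<Integer, List<Integer>> group(int limit) {
        Map<Integer, List<Integer>> byKey = new TreeMap<>();
        for (int i = 0; i < limit; i++) {
            // Creates the list the first time a key is seen, then appends to it.
            byKey.computeIfAbsent(i % 10, k -> new ArrayList<>()).add(i);
        }
        return byKey;
    }
}
```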
What you observe in your example is not a collision effect. It is normal element replacement.
After your 100 iterations there are only 10 elements in the Hashtable.
You use the numbers i%10 (0, 1, ..., 9) as keys, so you have only 10 different keys.
For example, in your for-loop you put 10 values for key=5 (i=5, i=15, ..., i=95), and each put(5, val) replaces the old value associated with key=5.
A collision list is a different concept.
For each key, the hashtable computes a hash value and uses this hash to select an index in its inner bucket table. It then places the {key, value} entry under that index.
A collision is the situation where 2 different keys map to the same bucket index.
For example:
table index | map.entry
0           | {0, "A"}
1           | {3, "B"}
2           | {2, "A"} -> {4, "C"}
3           | {1, "D"} -> {5, "A"} -> {6, "F"}
In this example you have a hashtable with a 4-element inner table.
This hashtable contains 7 elements (7 different keys), but:
keys 2 and 4 were placed in the same bucket (they have the same index computed from the hash value)
keys 1, 5 and 6 were placed in the same bucket.
So we can say there is a collision between key=2 and key=4, and among keys 1, 5 and 6.
In other words, keys 2 and 4 are on the same collision list. The same goes for keys 1, 5 and 6.
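The distinction between replacement and collision can be observed from the outside, even though the bucket lists themselves are not accessible. In this minimal sketch, CollidingKey is a hypothetical key class whose hashCode is deliberately constant, so all of its instances land in the same bucket while remaining different keys:

```java
import java.util.Hashtable;
import java.util.Map;

// Hypothetical key: every instance has the same hash, but equals() still tells them apart.
final class CollidingKey {
    final int id;

    CollidingKey(int id) {
        this.id = id;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof CollidingKey && ((CollidingKey) o).id == id;
    }

    @Override
    public int hashCode() {
        return 42;
    }
}

final class CollisionDemo {
    // Three different keys that collide: all three entries are kept.
    static int sizeWithCollidingKeys() {
        Map<CollidingKey, String> map = new Hashtable<>();
        map.put(new CollidingKey(1), "D");
        map.put(new CollidingKey(5), "A");
        map.put(new CollidingKey(6), "F");
        return map.size();
    }

    // The same key put ten times: each put replaces the previous value.
    static int sizeWithRepeatedKey() {
        Map<Integer, Integer> map = new Hashtable<>();
        for (int i = 5; i < 100; i += 10) {
            map.put(5, i);
        }
        return map.size();
    }
}
```

The first method should report 3 entries and the second 1, which mirrors the difference described above.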
You cannot get such a collision list from a Hashtable, because it is part of Hashtable's internal implementation and is marked as private:
/**
 * Hashtable bucket collision list entry
 */
private static class Entry<K,V> implements Map.Entry<K,V> {
    int hash;
    final K key;
    V value;
    Entry<K,V> next;

    protected Entry(int hash, K key, V value, Entry<K,V> next) {
        this.hash = hash;
        this.key = key;
        this.value = value;
        this.next = next;
    }
    ...
    public V setValue(V value) {
        if (value == null)
            throw new NullPointerException();
        V oldValue = this.value;
        this.value = value;
        return oldValue;
    }
    ...
    public int hashCode() {
        return hash ^ value.hashCode();
    }
    ...
}
And Hashtable has its internal bucket table defined as:
/**
* The hash table data.
*/
private transient Entry<K,V>[] table;
Hope this helps you figure out Hashtable's behavior.
