Creating groups based on previous data

I am trying to get my app to determine the best solution for grouping 20 golfers into foursomes.
I have data that shows when a golfer played, what date and the others in the group.
I would like the groups to be made up of golfers who haven't played together or, once everyone has played together, of those for whom the longest time has passed since they last did. In other words, I want groups made up of players who haven't played together in a while, as opposed to last time out.
Creating a permutation list of 20! to determine the lowest combinations didn't work well.
Is there another solution that I am not thinking of?

@Salix-alba's answer is spot on to get you started. Basically, you need a way to figure out how much time has already been spent together by members of your golfing group. I'll assume for illustration that you have a method to determine how much time two golfers have spent together. The algorithm can then be summed up as:
Compute total time spent together of every group of 4 golfers (see Salix-alba's answer), storing the results in an ordered fashion.
Pick the group of 4 golfers with the least time together as your first group.
Continue to pick groups from your ordered list of possible groups such that no member of the next group picked is a member of any prior group picked
Halt when all golfers have a group, which will always happen before you run out of possible combinations.
By way of quick, not promised to compile example (I wrote it in the answer window directly):
Let's assume you have a method time(a,b) where a and b are the golfer identities, and the result is how much time the two golfers have spent together.
Let's also assume that we will use a TreeMap<Integer, Collection<Integer>> to keep track of "weights" associated with groups, in a sorted manner.
Now, let's construct the weights of our groups using the above assumptions:
TreeMap<Integer,Collection<Integer>> options = new TreeMap<Integer, Collection<Integer>>();
for(int i=0;i<17;++i) {
for(int j=i+1;j<18;++j) {
for(int k=j+1;k<19;++k) {
for(int l=k+1;l<20;++l) {
Integer timeTogether = time(i,j) + time(i,k) + time(i,l) + time(j,k) + time(j,l)+time(k,l);
Collection<Integer> group = new HashSet<Integer>();
group.add(i);
group.add(j);
group.add(k);
group.add(l);
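options.put(timeTogether, group); // note: a TreeMap holds one value per key, so a group with an equal total replaces the earlier one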
}
}
}
}
final int maxGolfers = 20; // matches the bounds of the nested loops above
Collection<Integer> golferLeft = new HashSet<Integer>(); // to quickly determine if we should consider a group.
for (int a = 0; a < maxGolfers; a++) {
    golferLeft.add(a);
}
Collection<Collection<Integer>> finalPicks = new ArrayList<Collection<Integer>>();
Map.Entry<Integer, Collection<Integer>> least; // declared outside the loop so the while condition can see it
do {
    least = options.pollFirstEntry(); // removes and returns the lowest-weight group, or null when none are left
    if (least != null && golferLeft.containsAll(least.getValue())) {
        finalPicks.add(least.getValue());
        golferLeft.removeAll(least.getValue());
    }
} while (!golferLeft.isEmpty() && least != null);
And at the end of the final loop, finalPicks will have a number of collections, with each collection representing a play-group.
Obviously, you can tweak the weight function to get different results -- say you would rather be concerned with minimizing the time since members of the group played together. In that case, instead of using play time, sum up time since last game for each member of the group with some arbitrarily large but reasonable value to indicate if they have never played, and instead of finding the least group, find the largest. And so on.
I hope this has been a helpful primer!
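As a rough sketch of the same greedy idea, the version below keeps the candidate groups in a sorted list rather than a map keyed by weight, so groups with equal totals are all retained. The class name GreedyFoursomes, the Candidate holder and the timeTogether table are assumptions made up for illustration; the table stands in for the time(a,b) method, and the number of golfers is assumed to be a multiple of four.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class GreedyFoursomes {

    /** One candidate foursome together with its summed pairwise time. */
    static final class Candidate {
        final int[] members;
        final int weight;

        Candidate(int[] members, int weight) {
            this.members = members;
            this.weight = weight;
        }
    }

    /** timeTogether[a][b] is a hypothetical symmetric table of time shared by golfers a and b. */
    static List<int[]> pickGroups(int[][] timeTogether) {
        int n = timeTogether.length;
        List<Candidate> candidates = new ArrayList<>();
        for (int i = 0; i < n - 3; i++)
            for (int j = i + 1; j < n - 2; j++)
                for (int k = j + 1; k < n - 1; k++)
                    for (int l = k + 1; l < n; l++) {
                        int w = timeTogether[i][j] + timeTogether[i][k] + timeTogether[i][l]
                              + timeTogether[j][k] + timeTogether[j][l] + timeTogether[k][l];
                        candidates.add(new Candidate(new int[] {i, j, k, l}, w));
                    }
        // Sorting a list keeps every candidate, including those with equal weights.
        candidates.sort(Comparator.comparingInt(c -> c.weight));

        Set<Integer> left = new HashSet<>();
        for (int a = 0; a < n; a++) left.add(a);

        List<int[]> picks = new ArrayList<>();
        for (Candidate c : candidates) {
            if (left.isEmpty()) break;
            boolean allFree = true;
            for (int m : c.members) allFree &= left.contains(m);
            if (allFree) {
                picks.add(c.members);
                for (int m : c.members) left.remove(m);
            }
        }
        return picks;
    }

    public static void main(String[] args) {
        int[][] t = new int[20][20]; // all zeros: nobody has played together yet
        System.out.println(pickGroups(t).size() + " groups picked");
    }
}
```

Greedy selection like this is quick but is not guaranteed to minimise the total over all five groups; it only takes the cheapest group still available at each step.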

There should be 20 C 4 possible groupings which is 4845. It should be possible to generate these combinations quite easily with four nested for loops.
int count = 0;
for(int i=0;i<17;++i) {
for(int j=i+1;j<18;++j) {
for(int k=j+1;k<19;++k) {
for(int l=k+1;l<20;++l) {
System.out.println(""+i+"\t"+j+"\t"+k+"\t"+l);
++count;
}
}
}
}
System.out.println("Count "+count);
You can quickly loop through all of these and use some objective function to work out which grouping is best. Your problem definition is a little fuzzy, so I'm not sure how to tell which is the best combination.
That's just the number of ways of picking four golfers out of 20. You really need 5 groups of 4, which I think is 20C4 * 16C4 * 12C4 * 8C4, which is 305,540,235,000. That count treats the five groups as ordered; dividing by 5! = 120 leaves 2,546,168,625 distinct partitions. This is still in the realm of exhaustive computation, though you might need to wait a few minutes.
Another approach might be a probabilistic one. Just pick the groups at random, rejecting illegal combinations and those which don't meet your criteria. Keep picking random groups until you have found one which is good enough.
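The random approach could look roughly like the sketch below: shuffle the golfers, read the shuffled order as consecutive blocks of four, score the result, and keep the best arrangement seen. The names RandomFoursomes, score and search, and the timeTogether table, are hypothetical; the sketch assumes non-negative times and a golfer count that is a multiple of four.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

class RandomFoursomes {

    /** Sums pairwise time for every pair that shares a block of four in the given order. */
    static int score(List<Integer> order, int[][] timeTogether) {
        int total = 0;
        for (int start = 0; start < order.size(); start += 4)
            for (int a = start; a < start + 4; a++)
                for (int b = a + 1; b < start + 4; b++)
                    total += timeTogether[order.get(a)][order.get(b)];
        return total;
    }

    /** Tries up to `attempts` random arrangements and returns the lowest-scoring one found. */
    static List<Integer> search(int[][] timeTogether, int attempts, Random rng) {
        List<Integer> order = new ArrayList<>();
        for (int g = 0; g < timeTogether.length; g++) order.add(g);
        List<Integer> best = new ArrayList<>(order);
        int bestScore = score(best, timeTogether);
        // Stop early once a score of zero is reached, since times are assumed non-negative.
        for (int i = 0; i < attempts && bestScore > 0; i++) {
            Collections.shuffle(order, rng);
            int s = score(order, timeTogether);
            if (s < bestScore) {
                bestScore = s;
                best = new ArrayList<>(order);
            }
        }
        return best;
    }

    public static void main(String[] args) {
        int[][] t = new int[20][20];
        for (int a = 0; a < 4; a++)
            for (int b = 0; b < 4; b++)
                if (a != b) t[a][b] = 10; // golfers 0-3 played together recently
        List<Integer> best = search(t, 1000, new Random());
        System.out.println(best + " scores " + score(best, t));
    }
}
```

Because every shuffle is a full arrangement of all golfers, no "illegal" combination (a golfer in two groups) can occur, so nothing needs to be rejected on that account.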

Related

Efficient way for checking if a string is present in an array of strings [duplicate]

This question already has answers here:
How do I determine whether an array contains a particular value in Java?
(30 answers)
Closed 2 years ago.
I'm working on a little project in java, and I want to make my algorithm more efficient.
What I'm trying to do is check if a given string is present in an array of strings.
The thing is, I know a few ways to check if a string is present in an array of strings, but the array I am working with is pretty big (around 90,000 strings) and I am looking for a way to make the search more efficient, and the only ways I know are linear search based, which is not good for an array of this magnitude.
Edit: So I tried implementing the advice I was given, but the code I wrote accordingly is not working properly. I would love to hear your thoughts.
public static int binaryStringSearch(String[] strArr, String str) {
int low = 0;
int high = strArr.length -1;
int result = -1;
while (low <= high) {
int mid = (low + high) / 2;
if (strArr[mid].equals(str)) {
result = mid;
return result;
}else if (strArr[mid].compareTo(str) < 0) {
low = mid + 1;
}else {
high = mid - 1;
}
}
return result;
}
Basically what it's supposed to do is return the index at which the string is present in the array, and if it is not in the array then return -1.

So you have a more or less fixed array of strings and then you throw a string at the code and it should tell you if the string you gave it is in the array, do I get that right?
So if your array pretty much never changes, it should be possible to just sort the strings alphabetically and then use binary search. Tom Scott did a good video on that (if you don't want to read a long, messy text written by someone who isn't a native English speaker, just watch that; it's all you need). You look right in the middle and then check: is the string you are looking for before or after the string in the middle you just read? If it is already precisely the right one, you can just stop. If it isn't, you can eliminate every string after the middle one when the middle one comes after the string you want to find, and otherwise every string before it. Of course, you also eliminate the middle string itself, since it's not equal. And then you do it all over again: check the string in the middle of the ones which are left and eliminate based on the result (by the way, you don't have to actually delete the array items; it's enough to keep a variable for the lower and the upper boundary, because you never remove elements from the middle). You do that until there isn't a single string left in the range. Then you can be sure that your input isn't in the array. So by checking and comparing one string you don't just eliminate 1 item, as you would when checking one after the other; you remove more than half of the remaining array. With a list of 256 it should only take 8 compares (or 9, I'm not quite sure, but I think it takes one more if you don't need to find the item but only to know whether it exists), and for 65k (which almost matches your number) it takes 16. That's a lot more optimised.
If it's not already sorted and you can't because that would take way too long or for some reason I don't get, then I don't quite know and I think there would be no way to make it faster if it's not ordered, then you have to check them one by one.
Hope that helped!
Edit: If you don't want to really sort all the items and just want to make it a bit faster (26 times, if first letters were evenly distributed), just make 26 arrays, one per letter (in case you only use normal letters; otherwise make more and the speed boost will increase too), and then loop through all the strings and put each into the array matching its first letter. That is much faster than sorting them normally, but it's a trade-off, since it's not as neat as binary search. You pretty much still use linear search (= looping through all of them and checking if they match), but you have already kind of ordered the items. You can imagine it like two ways of sorting a bunch of cards on a table so you can find them quicker, the lazy one and the not so lazy one. Let's say the cards are numbered from 1-100, but not continuously; some cards are missing. One way would be to sort all the cards by number. But nicely sorting them so you can find any card really quickly takes some time, so what you can do instead is make 10 rows of cards, one per block of ten numbers. In each row you just put the cards in some random order, so when someone wants card 38, you go to the row holding the thirties and then search linearly through it. That is much faster than having the cards randomly on your table, because you only have to search through a tenth of them, but you can't take shortcuts once you're in that row.
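For the sort-then-binary-search route, the JDK already ships both steps, so a minimal sketch (with a made-up class name SortedLookup and made-up sample words) might be:

```java
import java.util.Arrays;

class SortedLookup {

    /** Returns true if key is present; sortedWords must already be in natural String order. */
    static boolean contains(String[] sortedWords, String key) {
        // Arrays.binarySearch returns a non-negative index only when the key is found.
        return Arrays.binarySearch(sortedWords, key) >= 0;
    }

    public static void main(String[] args) {
        String[] words = {"pear", "apple", "fig", "banana"};
        Arrays.sort(words); // sort once up front, then reuse for every lookup
        System.out.println(contains(words, "fig"));
        System.out.println(contains(words, "kiwi"));
    }
}
```

The sort is paid once; each lookup afterwards needs only about log2(n) comparisons, which is roughly 17 for 90,000 strings.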

Depending on the requirements, there can be so many ways to deal with it. It's better to use a collection class for the rich API available OOTB.
Are the strings supposed to be unique i.e. the duplicate strings need to be discarded automatically and the insertion order does not matter: Use Set<String> set = new HashSet<>() and then you can use Set#contains to check the presence of a particular string.
Are the strings supposed to be unique i.e. the duplicate strings need to be discarded automatically and also the insertion order needs to be preserved: Use Set<String> set = new LinkedHashSet<>() and then you can use Set#contains to check the presence of a particular string.
Can the list contain duplicate strings? If yes, you can use a List<String> list = new ArrayList<>() to benefit from its rich API as well as get rid of the limitation of fixing the size beforehand (Note: the maximum number of elements can be Integer.MAX_VALUE). However, a List is always navigated sequentially. Despite this limitation (or feature), you can gain some efficiency by sorting the list (again, it's subject to your requirement). Check Why is processing a sorted array faster than processing an unsorted array? to learn more about it.
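To illustrate the Set option, here is a small sketch that copies an array into a HashSet once and then answers membership queries with Set#contains. The class name SetLookup and the sample data are invented for the example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class SetLookup {

    /** Builds a hash-based index of the words; duplicates collapse automatically. */
    static Set<String> index(String[] words) {
        return new HashSet<>(Arrays.asList(words));
    }

    public static void main(String[] args) {
        String[] words = {"pear", "apple", "fig", "apple"};
        Set<String> set = index(words);
        System.out.println(set.contains("fig"));
        System.out.println(set.contains("kiwi"));
    }
}
```

Building the set costs one pass over the array; after that each contains call takes expected constant time, at the price of the extra memory the set occupies.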

You could use a HashMap which stores all the strings if
Contains query is very frequent and lookup strings do not change frequently.
Memory is not a problem (:D) .

Java8 - Mapping with Streams without Collecting for Performance

Exponentially Growing Stream
I have a Stream that grows exponentially for creating permutations. So each call to addWeeks increases the number of elements in the Stream.
Stream<SeasonBuilder> sbStream = sbSet.stream();
for (int i = 1; i <= someCutOff; i++) {
sbStream = sbStream.map(sb -> sb.addWeeks(possibleWeeks))
.flatMap(Collection::stream);
}
// Collect SeasonBuilders into a Set
return sbStream.collect(Collectors.toSet()); // size > 750 000
Problems
Each call to addWeeks returns a Set<SeasonBuilder> and collecting everything into a Set takes a while.
addWeeks is not static and needs to be called on each SeasonBuilder in the stream, each time through the loop
public Set<SeasonBuilder> addWeeks(
        final Set<Set<ImmutablePair<Integer, Integer>>> possibleWeeks) {
    return possibleWeeks.stream()
            .filter(containsMatchup()) // Finds the weeks to add
            .map(this::addWeek) // Create new SeasonBuilders with the new week
            .collect(Collectors.toSet());
}
Out of memory error..... when possible weeks has size = 15
Questions
Should I be using a method chain other than map followed by flatmap?
How can I modify addWeeks so that I don't have to collect everything into a Set?
Should I return a Stream<SeasonBuilder>? Can I flatmap a Stream?
Update:
Thanks for the help everyone!
I have put the code for the methods in a gist
Thanks to @Holger and @lexicore for suggesting returning a Stream<SeasonBuilder> in addWeeks. Minor performance increase, as was predicted by @lexicore
I tried using parallelStream() and there was no significant change in performance
Context
I am creating all possible permutations of a Fantasy Football season, which will be used elsewhere for stats analysis. In a 4-team, 14-week season, for any given week, there could be three different possibilities
(1 vs 2) , (3 vs 4)
(1 vs 3) , (2 vs 4)
(1 vs 4) , (2 vs 3)
To solve the problem, plug in the permutations, and we have all our possible seasons. Done! But wait... what if Team 1 only ever plays Team 2. Then the other teams would be sad. So there are some constraints on the permutations.
Every team must play each other roughly the same amount of times (i.e. Team 1 cannot play against Team 3 ten times in a single season). In this example - 4-teams, 14 weeks - each team is capped at playing another team 5 times. So some sort of filtering has to happen when creating permutations, to avoid non-valid seasons.
Where this gets more interesting is:
6 Team League -- 15 possible weeks
8 Team League -- 105 possible weeks
10 Team League -- 945 possible weeks
I am trying to optimize performance where possible, because there are a lot of permutations to create. A 4-team, 14-week season creates 756 756 (= 3 × 14!/(5!5!4!), since any one of the three weekly pairings can be the one used four times) possible seasons, given the constraints. 6-team or 8-team seasons just get crazier.

Your whole construction is very suspicious to begin with. If you're interested in performance it is unlikely that generating all permutations explicitly is a good approach.
I also don't believe that collecting to set and streaming again is the performance problem.
But nevertheless, to answer your question: why don't you return Stream<SeasonBuilder> from addWeeks directly? Why do you collect it to a set first? Return the stream directly, without collecting:
public Stream<SeasonBuilder> addWeeks(
final Set<Set<ImmutablePair<Integer, Integer>>> possibleWeeks) {
return possibleWeeks.stream()
.filter(containsMatchup()) // Finds the weeks to add
.map(this::addWeek); // Create new SeasonBuilders with the new week
}
You won't need map/flatMap then, just one flatMap:
sbStream = sbStream.flatMap(sb -> sb.addWeeks(possibleWeeks));
But this won't help your performance much anyway.
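As a self-contained illustration of that single-flatMap shape, the sketch below uses plain strings in place of SeasonBuilder. The class FlatMapExpansion and its extend method are hypothetical stand-ins for the real classes, which are not shown in full here.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class FlatMapExpansion {

    /** Hypothetical stand-in for addWeeks: every extension of a prefix by one option. */
    static Stream<String> extend(String prefix, List<String> options) {
        return options.stream().map(o -> prefix + o);
    }

    /** Applies extend `rounds` times, flattening as it goes, and collects only at the end. */
    static List<String> expand(List<String> seeds, List<String> options, int rounds) {
        Stream<String> s = seeds.stream();
        for (int i = 0; i < rounds; i++) {
            s = s.flatMap(p -> extend(p, options)); // no intermediate Set is materialised
        }
        return s.collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(expand(Arrays.asList("x"), Arrays.asList("a", "b"), 2));
    }
}
```

The pipeline stays lazy until the final collect, so intermediate generations are never held in memory as whole collections, although the final result still is.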

Shuffle an array so every value will have a different index [duplicate]

This question already has answers here:
How to test randomness (case in point - Shuffling)
(11 answers)
Closed 8 years ago.
I have written a method to shuffle a String array.
The task is to implement the White Elephant concept for a given String array of names. The method should generate an assignment for each of the original elements.
I have written the method to pick a random number and used a map to store the values so that each value will have a different index. But this prints out only 5 values, and I am confused now.
public static String[] generateAssignments(final String[] participants) {
Random r = new Random();
int size = participants.length;
HashMap val = new HashMap();
int change = 0;
String[] assignments = new String[6];
System.out.println("Using arrays");
for(int i=0; i<size;i++){
for(int j =0; j<size; j++){
change = r.nextInt(size);
if(val.containsValue(change) || change==i){
continue;
}
else val.put(i, change);
assignments[i] = participants[change];
System.out.println(assignments[i]);
break;
}
}
return assignments;
}
I appreciate your inputs.
Thanks,
Lucky

If your shuffle method is random (or pseudorandom) it will be near impossible to unit test since the output is non deterministic. If you allow for seeding a random number generator then you could ensure the output is consistent given the same seeds, though this doesn't show randomness.
You could also run the shuffle method a large number of times and check that each card shows up at each index an approximately equal number of times. Over a large enough number of simulations this should help illustrate randomness.
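A minimal sketch of that counting idea follows. It uses Collections.shuffle purely as a stand-in for the shuffle method under test, and the class and method names are invented for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

class ShuffleFrequency {

    /** counts[value][index] = how often `value` landed at `index` over `runs` shuffles. */
    static int[][] positionCounts(int size, int runs, Random rng) {
        int[][] counts = new int[size][size];
        for (int r = 0; r < runs; r++) {
            List<Integer> deck = new ArrayList<>();
            for (int v = 0; v < size; v++) deck.add(v);
            Collections.shuffle(deck, rng); // stand-in for the shuffle being tested
            for (int idx = 0; idx < size; idx++) counts[deck.get(idx)][idx]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // With a fair shuffle, every cell should be close to runs / size.
        for (int[] row : positionCounts(4, 40000, new Random()))
            System.out.println(Arrays.toString(row));
    }
}
```

A test built on this would assert that every cell lies within some tolerance of runs / size; how tight a tolerance is safe depends on the number of runs.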

FYI - There are some logical errors in both your shuffle() code and the test. I won't address those here; hopefully having a good test will allow you to figure out the problems!
Writing tests around Random data is hard.
The best option is to pass in an instance of Random to your shuffle() method or its containing class. Then in test usages, you can pass in an instance of Random which has been seeded with a known value. Given that the Random code will behave the same every time and you control the input array, your test will be deterministic; you can confidently assert on each object in the shuffled collection.
The only downside of this approach is that you won't have a test preventing you from re-writing your shuffle() method to simply re-order the elements every time into this specified order. But that might be over-thinking it; usually we can trust our future selves.
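The seeded-Random idea can be sketched as follows. SeededShuffle and its shuffled method are hypothetical, and Collections.shuffle again stands in for whatever shuffle logic is actually under test.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

class SeededShuffle {

    /** Returns a shuffled copy; the caller supplies the Random so tests can seed it. */
    static List<String> shuffled(List<String> items, Random rng) {
        List<String> copy = new ArrayList<>(items);
        Collections.shuffle(copy, rng);
        return copy;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("Ann", "Bob", "Cat", "Dan", "Eve");
        // Two generators with the same seed yield the same sequence, hence the same order.
        System.out.println(shuffled(names, new Random(42)));
        System.out.println(shuffled(names, new Random(42)));
    }
}
```

In a test you would record the order produced for a chosen seed once and assert against it, while production code passes in an unseeded Random.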
An alternative approach is to assume that in a Random world, given enough time, every data possibility will be realized.
I used this approach when testing a 6-sided die's roll() method. I needed to ensure that getting all values from 1-6 was possible. I didn't want to complicate the method signature or the Die constructor to take in an instance of Random. I also didn't feel confident in a test that used a known seed and simply always asserted one fixed value (3, for example).
Instead, I made the assumption that given enough rolls, all values from 1-6 would eventually be rolled. I wrote a test that infinitely called roll until all values from 1-6 had been returned. Then I added a timeout to the test so that it would fail after 1 second if the above condition hadn't been met.
@Test(timeout = 1000)
public void roll_shouldEventuallyReturnAllNumbersBetweenOneAndSixInclusively() {
Die die = new Die();
Set<Integer> rolledValues = new HashSet<Integer>();
int totalOfUniqueRolls = 0;
while (rolledValues.size() < Die.NUM_SIDES) {
if (rolledValues.add(die.roll())) {
totalOfUniqueRolls += die.getFaceValue();
}
}
assertEquals(summation(1, Die.NUM_SIDES), totalOfUniqueRolls);
}
Worst case scenario it fails after 1 second (which hasn't happened yet) but it usually passes in about 20 milliseconds.

The test must be reproducible: it is not useful if it depends on something random.
I suggest you use mocking so that the CUT (code under test) doesn't use the real Random class instantiated in production, but a different class written by you with predictable behavior, which lets you make meaningful assertions on two or three items.

It appears your shuffle() method will always return the same result. So given an input test array of however many elements in your test, just specify the exact output array you'd expect.
It looks like you are trying to write a very general test. Instead, your tests should be very specific: given a specific input A then you expect a specific output B.

Split an array of common English words into separate lists/arrays based on word length in Java

I'm trying to search an array of common English words to see if a specific word is contained in it, based on a text file. Since this array has >700,000 words and around 1000 words need to be checked if in the array multiple times, I thought it would be more efficient to separate the words into separate arrays or lists based on length. Is there an easy way to do this without using a switch or lots of if statements? Like so:
for(int i = 0; i < commonWordArray.length; i++) {
    if(commonWordArray[i].length() == 2) {
        twoLetterList.add(commonWordArray[i]);
    } else if(commonWordArray[i].length() == 3) {
        threeLetterList.add(commonWordArray[i]);
    } else if(commonWordArray[i].length() == 4) {
        fourLetterList.add(commonWordArray[i]);
    }
    ...etc
}
Then doing the same thing when checking the words:
for(int i = 0; i < checkWords.length; i++) {
    if(checkWords[i].length() == 2) {
        if(twoLetterList.contains(checkWords[i])) {
            ...etc
        }
    }
}

Step 1
Create word buckets.
ArrayList<ArrayList<String>> buckets = new ArrayList<>();
for(int i = 0; i <= maxWordLength; i++) { // one bucket per length, 0 through maxWordLength inclusive
buckets.add(new ArrayList<String>());
}
Step 2
Add words to your buckets.
buckets.get(word.length()).add(word);
This approach has the downside that some of your buckets may go unused. This is not an issue if you are only filtering common English words, as they do not exceed 30 characters in length. Creating 10-15 extra lists is a trivial overhead for a computer. The longest uncommon but non-technical word is 183 characters. Technical words exceed 180,000 characters, by which point this approach is clearly not practical.
The upside of this approach is that ArrayList.get() and ArrayList.add() both run in constant (O(1)) time.

Use a List<Set<String>> sets. That is, given a String word, first find the proper set (Set<String> set = sets.get(word.length())), creating the set if needed and extending the list if needed. Then just do a set.add(word). Done!
Edit/Hint: a (good) programmer should be lazy - if you need to do/write the same thing twice, you're doing something wrong.
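One way the list-of-sets idea might be written is sketched below; LengthBuckets and its methods are made-up names, and the list is grown on demand so no maximum word length has to be known in advance.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class LengthBuckets {

    // sets.get(n) holds every stored word of length n
    private final List<Set<String>> sets = new ArrayList<>();

    void add(String word) {
        // Extend the list until an entry exists for this word's length.
        while (sets.size() <= word.length()) sets.add(new HashSet<>());
        sets.get(word.length()).add(word);
    }

    boolean contains(String word) {
        return word.length() < sets.size() && sets.get(word.length()).contains(word);
    }

    public static void main(String[] args) {
        LengthBuckets buckets = new LengthBuckets();
        for (String w : new String[] {"to", "be", "cat", "tree"}) buckets.add(w);
        System.out.println(buckets.contains("cat"));
        System.out.println(buckets.contains("dog"));
    }
}
```

Since each bucket is already a hash set, the split by length mostly serves to organise the data; a lookup in one large HashSet is expected constant time as well.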

Assuming you've got memory for it (which your current approach relies on), why not just a single Set<String>? Simpler, faster.

If you want to use multiple strings to search, you may want to try something like the Aho Corasick algorithm.
Alternatively, you may want to turn the problem around and check whether each string from the 700k array is in the 1k array. That way you won't have memory issues (imho), and you can do it with a simple dictionary (balanced tree), so you'd have about 700k * log2(1000) comparisons.

Use a Trie, which is a memory-efficient storage mechanism which excels at storing words and checking for whether they exist or not.
Implementing one on your own is a fun exercise, or look at existing implementations.
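For a feel of what such an exercise involves, here is a bare-bones trie sketch using a HashMap per node. The class WordTrie is invented for illustration and makes no attempt at the memory optimisations a production implementation would use.

```java
import java.util.HashMap;
import java.util.Map;

class WordTrie {

    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        boolean isWord; // true if a stored word ends at this node
    }

    private final Node root = new Node();

    void insert(String word) {
        Node node = root;
        for (char c : word.toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new Node());
        }
        node.isWord = true;
    }

    boolean contains(String word) {
        Node node = root;
        for (char c : word.toCharArray()) {
            node = node.children.get(c);
            if (node == null) return false; // no stored word continues with this character
        }
        return node.isWord; // a bare prefix of a stored word does not count
    }

    public static void main(String[] args) {
        WordTrie trie = new WordTrie();
        trie.insert("cat");
        trie.insert("car");
        System.out.println(trie.contains("cat"));
        System.out.println(trie.contains("ca"));
    }
}
```

Lookup time depends on the length of the word rather than on how many words are stored.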

Algorithm for finding selected matching symptoms from a set of conditions

I have a database which has a table that stores medical conditions and another table which stores symptoms. Each condition has many symptoms. The user will select a number of symptoms from the database and the algorithm will find out how many symptoms match for each condition.
I want to return each matching condition and the number of matching symptoms e.g Cold 4/8
It's quite a simple idea although I'm having difficulty working out the pseudocode/algorithm.
Thanks

If you have to code it from scratch (for example, in a homework assignment) then you might want to look at the Rete algorithm. It's going to try and help you make the minimal number of tests to get to a given conclusion. If you just take the brute force solution of looking at a bunch of different medical conditions and a set of symptoms for each and roll through testing each symptom for each condition to assign it a score, you'll end up testing the same symptoms many many times within different conditions. Runny nose, cough, etc. might show up in the symptoms list for hundreds of things. Rete attacks that and only tests each symptom once and then works its way to conclusions.
However, if you're not having to build this from scratch then you probably want to look at an off-the-shelf solution like Drools or Jess which give you a rules engine to make building the kind of database you want easy. They also build in a Rete algorithm (or something like it) to optimize their performance in the face of potentially huge numbers of rules.

For each symptom, store a list of conditions. When you see a symptom, increment the count of all the corresponding conditions.
Python Example: Let 'A', 'B' and 'C' be the conditions and 'X','Y' and 'Z' the symptoms.
symptoms = {'X':['A','B'], 'Y':['A','B','C'], 'Z':['A','C'] }
def condCount(userSymptoms):
    condCnt = {}
    for sym in userSymptoms:
        # every condition listed under this symptom gets one more match
        for cond in symptoms[sym]:
            condCnt[cond] = condCnt.get(cond, 0) + 1
    return condCnt
condCount(['X','Y'])
Answer: {'A':2,'B':2,'C':1}

pseudo code in Java
enum Condition {
CONDITON_1, CONDITION_2, CONDITION_N;
}
enum Symptom {
SYMPTOM_1, SYMPTOM_2, SYMPTOM_N;
}
public static final int SYMPTOM_COUNT = Symptom.values().length;
static final Map<Condition, Set<Symptom>> MAP = new EnumMap<Condition, Set<Symptom>>(Condition.class);
static {
MAP.put(Condition.CONDITON_1, EnumSet.of(Symptom.SYMPTOM_1));
MAP.put(Condition.CONDITION_2, EnumSet.of(Symptom.SYMPTOM_1, Symptom.SYMPTOM_2));
MAP.put(Condition.CONDITION_N, EnumSet.of(Symptom.SYMPTOM_2, Symptom.SYMPTOM_N));
}
public static void findMatches(Set<Symptom> symptoms) {
for (Map.Entry<Condition, Set<Symptom>> entry : MAP.entrySet()) {
Set<Symptom> matches = EnumSet.copyOf(entry.getValue());
matches.retainAll(symptoms);
System.out.println(entry.getKey() + ": " + matches.size() + " / " + SYMPTOM_COUNT);
}
}
public static void main(String... args) { // "_" is no longer a legal identifier since Java 9
findMatches(EnumSet.of(Symptom.SYMPTOM_2, Symptom.SYMPTOM_N));
}
prints
CONDITON_1: 0 / 3
CONDITION_2: 1 / 3
CONDITION_N: 2 / 3
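If the figure wanted is "matches out of that condition's own symptoms" (as in the question's "Cold 4/8"), a variant along the following lines may be closer. SymptomMatcher, its score method and the sample data are assumptions made up for illustration; in practice the map would be filled from the database.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

class SymptomMatcher {

    /** Maps each condition with at least one match to "matched/total" for that condition. */
    static Map<String, String> score(Map<String, Set<String>> conditions, Set<String> selected) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, Set<String>> e : conditions.entrySet()) {
            Set<String> matches = new HashSet<>(e.getValue());
            matches.retainAll(selected); // keep only the symptoms the user selected
            if (!matches.isEmpty()) {
                result.put(e.getKey(), matches.size() + "/" + e.getValue().size());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> conditions = new LinkedHashMap<>();
        conditions.put("Cold", new HashSet<>(Arrays.asList("cough", "runny nose", "fever", "sneezing")));
        conditions.put("Flu", new HashSet<>(Arrays.asList("fever", "aches")));
        conditions.put("Sprain", new HashSet<>(Arrays.asList("swelling")));
        System.out.println(score(conditions, new HashSet<>(Arrays.asList("cough", "fever"))));
    }
}
```

Conditions with no matching symptom are left out of the result, which fits the request to return only matching conditions.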
