Java: testing positive and negative values

I need to test whether a late goal in a Soccer game has changed the result.
The project is centered around one Team, and it is regarding their results against various opponents. It is an investigation regarding the importance of late goals to a winning team.
In my Goals database I have a list of the times that goals were scored and whether or not they were scored by the Team or the Opposition.
There is a Game Object that stores things like teamScore, oppScore, a result (String that contains either W or L or D) etc.
The Season Object is used to collect and return the results.
When a goal is scored by the team, it increments the minutes int[] in the Game at the correct minute:
minutes[GoalTime]++
When a goal is scored by the opposition, it decrements the minutes array at the correct minute:
minutes[GoalTime]--
Therefore, to find the score (the goal difference) at any minute, we add up all the entries up to and including that minute:
int score85 = 0;
for (int g = 0; g <= 85; g++) {
    score85 += minutes[g];
}
If I have score90 and score85 how do I compare them so that it only returns results where a late goal has changed the result? I wish to avoid logging games where a winning team has scored again post 85, as this makes no difference to the result. I’m only interested in goals that have had a direct impact on the outcome.
Here is what I have:
int difference = score90 - score85;
if (difference > 0 && score85 <= 0) {
    if (result.equals("W") || result.equals("D")) {
        season.gamesWDByLateGoal++;
    }
}
if (difference < 0 && score85 >= 0) {
    if (result.equals("D") || result.equals("L")) {
        season.gamesLDByLateGoal++;
        println(gameNumber);
    }
}
How can I be sure that I am getting the right result? I am testing 1500+ games and I have been getting different answers.

Counting goals per minute is not a good solution:
minutes[GoalTime]++
It requires an array of 90 integers (a regular soccer game lasts 90 minutes). Since you are doing statistics on this data, you should be familiar with typical values, and I hope you agree that 9 goals in a game is a lot. Even in such a game, at least 90% of those integers stay at zero.
Not only is it a waste of memory, but also a waste of time, as you have to step through all those integers (in the for loop) to find the game result.
In addition to that, not every game lasts 90 minutes.
Since soccer players are paid like Hollywood actresses, they act as such, and allowances of a few minutes are added to every period in order to compensate for their stage time.
There can be an overtime of 2 additional 15-minute periods.
There were rules that could shorten the overtime; see golden and silver goal.
Considering that, the idea of using the time as an index doesn't look that good.
On top of that, there can be a penalty shoot-out, which causes goals that do not even have a time associated to them.
Further down in the article, there is this demonstration of how different rating systems handle penalty goals:
In the calculation of UEFA coefficients, shoot-outs are ignored for club coefficients,[53] but not national team coefficients, where the shoot-out winner gets 20,000 points: more than the shoot-out loser, who gets 10,000 (the same as for a draw) but less than the 30,000 points for winning a match outright.[56] In the FIFA World Rankings, the base value of a win is three points; a win on penalties is two; a draw and a loss on penalties are one; a loss is zero.[54] The more complicated ranking system FIFA used from 1999 to 2006 gave a shoot-out winner the same points as for a normal win and a shoot-out loser the same points as for a draw; goals in the match proper, but not the shoot-out, were factored into the calculation.[57]
I highly recommend looking at the different systems to see which one makes most sense to you. After all, you cannot compare to another system if you have a different definition of winning and losing a game.
To complicate things further, goals can have different "values". Goals that are scored away are more valuable than those at home.
If two matches between A and B are as follows
#A's home: A:1 B:1
#B's home: A:2 B:2
A wins on away goals (2 against 1, with the aggregate level at 3:3), even though both games are draws.
Did I mention indoor soccer yet? =)
Conclusion
Use an official system as a reference. If your system concludes that the game was a draw while the official result is a win, then your system is not comparable to any official results and is of very little use. Make sure your system matches the official results.
Store the goals as their time in a dynamic list associated with the team.
Finally, in a separate list, store results with a time. Whenever a goal is scored that changes the result, add another result object with the new result to this list. You now know all the times when the result changed, without discretising the entire timeline. To get a result at an arbitrary point in time, get the result from that list with the next smaller time stamp.
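A minimal sketch of that idea, assuming a hypothetical GoalTimeline class (the name and its methods are illustrative and not part of the question's code). It keeps only the goals as times and recomputes the goal difference on demand instead of maintaining the separate list of result changes, which is a simplification. The late-goal check then reduces to comparing the result at the cutoff with the result at full time:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: stores each goal as {minute, +1 for the team / -1 for the opposition}.
class GoalTimeline {
    private final List<int[]> goals = new ArrayList<>();

    void addGoal(int minute, boolean scoredByTeam) {
        goals.add(new int[] {minute, scoredByTeam ? 1 : -1});
    }

    // Goal difference (team minus opposition), counting goals up to and including `minute`.
    int differenceAt(int minute) {
        int diff = 0;
        for (int[] g : goals) {
            if (g[0] <= minute) {
                diff += g[1];
            }
        }
        return diff;
    }

    // "W", "D" or "L" from the team's point of view at the given minute.
    String resultAt(int minute) {
        int sign = Integer.signum(differenceAt(minute));
        return sign > 0 ? "W" : (sign < 0 ? "L" : "D");
    }

    // True when goals scored after `cutoff` changed the outcome.
    // `fullTime` is a parameter because not every game ends at minute 90.
    boolean lateGoalChangedResult(int cutoff, int fullTime) {
        return !resultAt(cutoff).equals(resultAt(fullTime));
    }
}
```

With this, a further goal by a team that is already winning leaves the result at "W" and is not counted, while an equaliser or a winner after the cutoff is.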

Related

How to calculate percentages of poker hands when the program knows all hands?

I am looking to calculate, in my Java program, the percentages that you see on TV when watching poker. The difference between this question and many others, if unclear, is that the program DOES know all players' hands and can therefore determine an accurate percentage. There are many websites such as this one: https://www.pokerlistings.com/online-poker-odds-calculator
where you can input the players' cards and it will give you the percentages. I was wondering if there are any existing algorithms for this, or any Java API that I could use in my program. I understand that my question may be a bit unrealistic. So if there are no algorithms or APIs, perhaps someone knows how it is calculated, so that I can try to construct my own algorithm?
In case there is still confusion:
If there are 3 players and their hands are
player 1: As Kh
player 2: 2d 3c
player 3: Ah Ad
I would like to know the percentage that each player has to win preflop, pre-turn and pre-river
Thanks in advance.
The numbers are much smaller than you might expect, so brute-forcing is quite possible. The trick is to disregard the order the cards come out as much as possible. For example, if you're considering preflop probabilities, you only care about the entire board by the river, and not the specific flop, turn and river.
Given 3 hands (6 cards), there are 46 cards remaining in the deck. That means there are choose(46, 5) = 1,370,754 different boards by the river. Enumerate all boards, and count how many times each hand wins on the river. (Or more accurately, compute the equity for each hand, since sometimes 2 or 3 hands will tie on the river.) This gives you preflop probabilities, and this is the most expensive thing to compute.
Given the flop, there are only choose(43, 2) = 903 boards possible by the end, so for the flop probabilities (which you call pre-turn) it's very cheap to enumerate all the runouts and compute the average equity for the three hands.
Given the flop and turn, there are only 42 possible river cards. So on the turn (pre-river), it's even cheaper to compute the hand equities.
You'll ideally need a fast hand evaluator, but that should be easy to find online if you don't want to write it yourself.
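As a rough illustration of the counts above, here is a small sketch. The class and method names are made up for this example, and the hand evaluator itself is deliberately left out: the loop body only marks the place where each runout would be evaluated.

```java
// Sketch only: counts the boards that would have to be enumerated at each street.
class RunoutCounts {
    // Binomial coefficient C(n, k), computed with exact integer arithmetic.
    static long choose(int n, int k) {
        long result = 1;
        for (int i = 1; i <= k; i++) {
            result = result * (n - k + i) / i;
        }
        return result;
    }

    // Enumerates every unordered pair of remaining cards (turn plus river) and counts them.
    static int countTwoCardRunouts(int remainingCards) {
        int count = 0;
        for (int a = 0; a < remainingCards; a++) {
            for (int b = a + 1; b < remainingCards; b++) {
                count++; // a real calculator would evaluate all hands on this runout here
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println("preflop boards: " + choose(46, 5));
        System.out.println("runouts after the flop: " + countTwoCardRunouts(43));
        System.out.println("river cards after the turn: " + choose(42, 1));
    }
}
```

Because the order in which the cards come out is ignored, the inner index always starts one past the outer index, which is what keeps the number of cases this small.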
It's inefficient, but you can loop through all of the cards left to be played and see how many times each player wins in each scenario. Various speed-ups can be made by eliminating possibilities (e.g. no flushes if the suits haven't lined up).

Calculating winning odds - Poker bot

I am trying to build a poker bot in java. I have written the hand evaluation class and I am about to start feeding a neural network but I face a problem. I need the winning odds of every hand for every step: preflop, flop, turn, river.
My problem is that, there are 52 cards and the combinations of 5 cards are 2,598,960. So I need to store 2,598,960 odds for each possible hand. The number is huge and these are only the odds I need for the river.
So I have two options:
Find the odds for every possible hand and every possible deck and every time I start my application load them and kill my memory.
Calculate the odds on the fly and lack processing power.
Is there a 3rd better option to deal with this problem?
3rd option is use the disk... but my first choice would be to calculate odds as you need them.
Why do you need to calculate all combinations of 5 cards? A lot of these hands are worth the same: as there are 4 suits, there is repetition between hands.
Personally I would rank your hand based on how many hands beat your hand and how many hands your hand beats. From this you can compute your probability of winning the table by multiplying by number of active hands.
What about ignoring the colors (suits)? From 52 possible values, you drop to 13 ranks. You only have 6175 options remaining. Of course, suits are important for a flush, but there it is pretty much binary: are all the suits the same or not? So we are at 12350 (including some impossible combinations; in fact it is 7462, because in all the other cases a rank appears more than once, so the suits must differ).
If the order is important (e.g. starting hand, flop, turn, river), it will be a lot more, but it is still less than your two million. Try simplifying your problems and you'll realize they can be solved.
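The figures quoted above can be checked with a short brute-force count. This is only a sketch with illustrative names: it counts the rank combinations that remain once suits are ignored, then adds the combinations of five distinct ranks, which are the only ones where "all suits the same" is possible.

```java
// Sketch: counts 5-card rank patterns over 13 ranks, ignoring suits.
class RankPatterns {
    // Non-decreasing rank sequences a <= b <= c <= d <= e, excluding five of a kind.
    static int countRankMultisets() {
        int count = 0;
        for (int a = 0; a < 13; a++)
            for (int b = a; b < 13; b++)
                for (int c = b; c < 13; c++)
                    for (int d = c; d < 13; d++)
                        for (int e = d; e < 13; e++)
                            if (a != e) count++; // a == e would need five cards of one rank
        return count;
    }

    // Strictly increasing rank sequences: five distinct ranks, so a flush is possible.
    static int countDistinctRankSets() {
        int count = 0;
        for (int a = 0; a < 13; a++)
            for (int b = a + 1; b < 13; b++)
                for (int c = b + 1; c < 13; c++)
                    for (int d = c + 1; d < 13; d++)
                        for (int e = d + 1; e < 13; e++)
                            count++;
        return count;
    }

    public static void main(String[] args) {
        int unsuited = countRankMultisets();
        int flushCapable = countDistinctRankSets();
        System.out.println(unsuited + " + " + flushCapable + " = " + (unsuited + flushCapable));
    }
}
```

The sum of the two counts is the number of distinct hand values, so a lookup table indexed by rank pattern plus a flush flag stays small.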

Traveling Salesman Variation: Building a tour of baseball stadiums

I'm trying to write a Java program to find the best itinerary to do a driving tour to see a game at each of the 30 Major League Baseball stadiums. I define the "best" itinerary using the metric Miles Driven + (200 * Number of Days on the Road); this eliminates tours that are 20,000 miles over 30 days, or 11,000 miles over 90 days, neither of which would be a trip that I'd want to take. Each team plays 81 home games over the course of a 183-day season, so when a team is at home needs to be taken into consideration.
Also, I'm not just looking for one best tour for the entire baseball season. I'm looking to find the best tour that starts/ends at any given MLB city, on any given date (Detroit on June 15, Atlanta on August 3, etc.).
I've got the program producing results that I'm pretty happy with, but it would take a few months to run to completion on my laptop, and I'm wondering if anyone has any ideas for how to make it more efficient.
The program runs iteratively. It starts with a single game; say, Chicago on April 5. It figures out which games you could get to next within the next day or two after the Chicago game; let's say there are two such games, in Cincinnati and Detroit. It creates a data structure containing all the stops on each prospective tour (one for Chicago-Cincinnati, one for Chicago-Detroit). Then, it does the same thing to find prospective 3rd stops for both of the 2-stop tours, and so on, until it gets to the 30th and last stop, at which point it ascertains the best tour.
It uses a couple of methods to prune inefficient tours as it goes. The main one is employed using a HashMap. The key is a character sequence that denotes (1) which ballparks have already been visited, and (2) which was the last one visited. So it would run into a duplicate on, say, A-B-C-D-E and A-D-B-C-E. It would then keep the shorter route and eliminate the longer one. For most starting points, this keeps the maximum number of tours on the list at any given time at around 20 million, but for some starting points, it gets up to around 90 million.
So ... any ideas?
Your algorithm is not actually greedy -- it enumerates all possible tours (pruning the obviously bad ones as you go). A greedy algorithm looks only one step ahead, makes the best decision for that step, then moves the next step ahead, etc. They are not exact but they are very fast. I would suggest adapting a standard greedy-type TSP heuristic for your problem. There are many common ones -- nearest neighbor, cheapest insertion, etc. You'll find various online sources describing them if you aren't already familiar with them.
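For reference, a bare-bones nearest-neighbor sketch is shown here. It is an illustration under strong assumptions: it works on a plain distance matrix, ignores the game schedule and the 200-per-day term of your metric, and assumes at least one stop. The class name and the matrix layout are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the nearest-neighbor heuristic on a symmetric distance matrix.
class NearestNeighborTour {
    // Returns the visiting order, starting at `start` and always moving to the closest unvisited stop.
    static List<Integer> build(double[][] dist, int start) {
        int n = dist.length;
        boolean[] visited = new boolean[n];
        List<Integer> tour = new ArrayList<>();
        int current = start;
        visited[current] = true;
        tour.add(current);
        for (int step = 1; step < n; step++) {
            int best = -1;
            for (int j = 0; j < n; j++) {
                if (!visited[j] && (best == -1 || dist[current][j] < dist[current][best])) {
                    best = j;
                }
            }
            visited[best] = true;
            tour.add(best);
            current = best;
        }
        return tour;
    }
}
```

To adapt it to your problem, the "closest" test would have to use your own cost (miles plus 200 per day) and skip stadiums with no reachable home game; the result is fast but not guaranteed to be optimal.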
You could also create duplicate stadium nodes, one for each home game, and then model this as a generalized TSP (GTSP) in which each "cluster" consists of the nodes for a given stadium. The distance from node (i_1,j_1) to (i_2,j_2) (where i = stadium and j = date) is defined by your metric.
Technically this is a TSP with time windows, but it's more complicated than the usual definition of the TSPTW because usually each node has a single contiguous time window (e.g., arrive between 8 am and 2 pm) whereas here you have a disjoint set of windows and you must choose one of them.
Hope this helps.

Blackjack help: advice for user about whether to hit or stand

OK, so I am making a blackjack program that uses the output box. My problem here is trying to provide a sort of help for the user.
I need help figuring out what to do at this point:
if (y.equalsIgnoreCase("Y")) {
    if (userHand.getBlackjackValue() + 10 < 21) {
        System.out.println("You should hit.");
    }
    if (userHand.getBlackjackValue() + 10 > 21) {
    }
}
The problem is at the second inner if statement. How should it be determined whether the player should continue hitting or should stand? I'll include the class as well as other classes in the package pertaining to the program. I am thinking that I might have to add more methods to the project in order to make it work.
https://sites.google.com/site/np2701/
If you can, please point out any convoluted code that I can fix up. Thanks.
If card counting is out of scope, use a basic strategy table for the rules you are using (number of decks, etc.): http://wizardofodds.com/games/blackjack/strategy/calculator/ - you should index into the table based on your hand's point value and the dealer's card, and return the option stored in the table. You might choose to store it in the code as a two-dimensional array, or load it from a file. You might store it as characters and interpret what the characters mean, or as an enum; for example, you might call the enum Hints with members Hit, Stand, Split, etc.
A basic strategy table is guaranteed to provide the best odds of success if card counting is ignored, because we take all of the relevant state and choose the statistically best option.
If we wish to account for card counting too, then we must keep track of the True Count (the running high-low count divided by the number of decks left), and for certain states (player hand score vs dealer revealed card) instead of always doing the same action, we do one action if the True Count is above x and another if it is below x. In addition, you should bet next to nothing if the true count is low (below 1) and bet more and more as it increases past 1, but not so much more you run the risk of bankruptcy. Read more here http://wizardofodds.com/games/blackjack/card-counting/high-low/
To represent such an index programmatically, I would make an object with three fields: the below-index action, the above-index action and the index value.
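A possible shape for that object, as a sketch. The class name IndexPlay, the Action enum and the choice to treat a true count exactly equal to the index as "above" are all assumptions made for this example, and any index values you put in should come from a published table rather than from here.

```java
// Sketch of a count-dependent play: one action below the index, another at or above it.
class IndexPlay {
    enum Action { HIT, STAND, DOUBLE, SPLIT }

    final Action belowIndex;      // action when the true count is below the index
    final Action atOrAboveIndex;  // action when the true count reaches the index (assumption)
    final double index;           // the threshold true count

    IndexPlay(Action belowIndex, Action atOrAboveIndex, double index) {
        this.belowIndex = belowIndex;
        this.atOrAboveIndex = atOrAboveIndex;
        this.index = index;
    }

    // True count = running high-low count divided by the number of decks left.
    static double trueCount(int runningCount, double decksRemaining) {
        return runningCount / decksRemaining;
    }

    Action choose(double trueCount) {
        return trueCount >= index ? atOrAboveIndex : belowIndex;
    }
}
```

The strategy table can then store either a plain action or an IndexPlay per (player total, dealer card) cell, and the hint code calls choose with the current true count.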
If you really want to suggest the proper play to the user, you need to look up the basic strategy for the game you're simulating. These tables are based on the player's total (and you have to know whether it's soft or hard), and the dealer's upcard.
If all you want to know is "what are my chances of busting on the next hit", that's just (the number of remaining cards that will bust you) / (total remaining cards). This requires not only the player total, but the actual cards. For example, in single deck, if a player has two sevens against a dealer 5, there are 24 bust cards out of the 49 remaining, so you'll bust 24/49 (about 49%) of the time. But if you have a 10 and a 4 (also 14) against a dealer 10, there are only 22 bust cards remaining, for a 45% chance of busting.
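The bust calculation can be sketched as follows. The names are illustrative, the hand is assumed to be hard (no ace counted as 11), and a single deck is assumed, as in the two examples.

```java
// Sketch: chance of busting on the next hit, given the cards already seen.
class BustOdds {
    // remaining[v] = number of unseen cards with blackjack value v
    // (1 = ace counted as 1, 10 = ten, jack, queen or king).
    static double bustProbability(int hardTotal, int[] remaining) {
        int bust = 0;
        int total = 0;
        for (int v = 1; v <= 10; v++) {
            total += remaining[v];
            if (hardTotal + v > 21) {
                bust += remaining[v];
            }
        }
        return (double) bust / total;
    }

    // Builds the unseen-card counts for one deck after removing the given card values.
    static int[] singleDeckMinus(int... seenValues) {
        int[] remaining = new int[11];
        for (int v = 1; v <= 9; v++) {
            remaining[v] = 4;
        }
        remaining[10] = 16;
        for (int v : seenValues) {
            remaining[v]--;
        }
        return remaining;
    }

    public static void main(String[] args) {
        // Two sevens against a dealer 5, then a ten and a four against a dealer ten.
        System.out.println(bustProbability(14, singleDeckMinus(7, 7, 5)));
        System.out.println(bustProbability(14, singleDeckMinus(10, 4, 10)));
    }
}
```

Both calls use the same total of 14, and the difference between the two results comes only from which cards have already left the deck.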

Best fit curve for trend line

Problem Constraints
Size of the data set, but not the data itself, is known.
Data set grows by one data point at a time.
Trend line is graphed one data point at a time (using a spline/Bezier curve).
Graphs
The collage below shows data sets with reasonably accurate trend lines:
The graphs are:
Upper-left. By hour, with ~24 data points.
Upper-right. By day for one year, with ~365 data points.
Lower-left. By week for one year, with ~52 data points.
Lower-right. By month for one year, with ~12 data points.
User Inputs
The user can select:
the type of time series (hourly, daily, monthly, quarterly, annual); and
the start and end dates for the time series.
For example, the user could select a daily report for 30 days in June.
Trend Weight
To calculate the window size (i.e., the number of data points to average when calculating the trend line), the following expression is used:
data points / trend weight
Where data points is derived from user inputs and trend weight is 6.4. Even though a trend weight of 6.4 produces good fits, it is rather arbitrary, and might not be appropriate for different user inputs.
Question
How should trend weight be calculated given the constraints of this problem?
Based on the looks of the graphs I would say you have too many points for your 12 point graph (it is just a spline of the points given... which is visually pleasing, but actually does more harm than good when trying to understand the trend) and too few points for your 365 point graph. Perhaps try doing something a little exponential like:
(Data points)^1.2/14.1
I do realize this is even more arbitrary than what you already have, but arbitrary isn't the worst thing in the world.
(I got 14.1 by trying to keep the 52-point graph fixed, since that one looks nice, by taking (52^(1.2)/52)*6.4=14.1. Using this technique, you could try other powers besides 1.2 to see what you get visually.)
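To see what the two formulas give for the four graphs, here is a small comparison sketch. The class and method names are made up; the constants 6.4, 1.2 and 14.1 are the ones discussed above.

```java
// Sketch: window size under the original linear rule and under the power rule.
class TrendWindow {
    // Original rule: data points / trend weight.
    static double linearWindow(int dataPoints, double trendWeight) {
        return dataPoints / trendWeight;
    }

    // Suggested rule: (data points)^exponent / divisor.
    static double powerWindow(int dataPoints, double exponent, double divisor) {
        return Math.pow(dataPoints, exponent) / divisor;
    }

    public static void main(String[] args) {
        int[] sizes = {12, 24, 52, 365};
        for (int n : sizes) {
            System.out.printf("%d points: linear %.2f, power %.2f%n",
                    n, linearWindow(n, 6.4), powerWindow(n, 1.2, 14.1));
        }
    }
}
```

The two rules agree at 52 points by construction; the power rule gives a smaller window than the linear one for the 12-point graph and a larger one for the 365-point graph.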
Dan
I voted this up for the quality of your results and the clarity of your write-up. I wish I could offer an answer that could improve on your already excellent work.
I fear that it might be a matter of trial and error with the trend weight until you see an improved fit.
It could be that you could make this an input from users as well: allow them to fiddle with the value, given realistic constraints, until they get satisfactory values.
I also wondered if the weight would be different for each graph, since the number of points in each is different. Are you trying to get a single weighting that works for all graphs?
Excellent work; a nice question. Well done. I wish I was more helpful. Perhaps someone else will have more wisdom to impart than I do.
It might look like the trend lines are accurate in those 4 graphs, but they are really quite off. (This is best seen at the beginning of the lower-left one and the beginning of the upper-right one.) I would think that you would want to use no less than half of your points when finding the trend line (though really you should use much more than half). I would suggest a trend weight of 2 at a maximum, though really you ought to stick closer to the 1-1.5 range. Since it is arbitrary, I would suggest you give your user an "accuracy of trend line" slider, where the most accurate setting uses a trend weight of 1 and the least accurate uses a weight of (number of data points + 1). The latter would use 0 points (assuming you always round down) and, I would assume, though your statistics software might be different, will generate a straight horizontal line.
