Java - statistics automation using arrays (creating a table of classes)

I know this is kind of hard to understand, but bear with me. I'm taking a statistics class and I want to automate the process. We take some data, find the minimum and maximum, and get the range; all easy and already done.
But here is the tricky part: we need to split the data into classes. For example, if the user enters 10 data points ranging between 10 and 40, you create classes to fit each number: class 1 is the numbers between 10 and 15, class 2 between 15 and 20, and so on (this was just an example). My first thought was just to write a huge pile of if conditions, but that doesn't seem intuitive, and besides, the data, class width, and number of classes are all the user's choice. How do you suggest I approach this problem?
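Since the class boundaries are evenly spaced, you can compute the class index of each value with a single division instead of a chain of if conditions. A minimal sketch under that assumption (names like buildTable are illustrative, not from the question):

```java
public class FrequencyTable {
    // Count how many data points fall into each class, given the user's
    // choice of class width and number of classes.
    public static int[] buildTable(double[] data, double min, double classWidth, int numClasses) {
        int[] counts = new int[numClasses];
        for (double value : data) {
            int index = (int) ((value - min) / classWidth);
            if (index >= numClasses) index = numClasses - 1; // the maximum lands in the last class
            counts[index]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        double[] data = {10, 12, 15, 19, 22, 28, 31, 35, 38, 40};
        // classes [10,15), [15,20), [20,25), [25,30), [30,35), [35,40]
        int[] counts = buildTable(data, 10, 5, 6);
        for (int i = 0; i < counts.length; i++) {
            System.out.println("class " + (i + 1) + ": " + counts[i]);
        }
    }
}
```

The same index formula works for any class width and count the user picks, so no per-class conditionals are needed.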


Sampling numerical arrays in java

I have a data set of time series data I would like to display on a line graph. The data is currently stored in an Oracle table and is sampled at 1 point per second. The question is: how do I plot the data over a 6-month period? Is there a way to downsample the data once it has been returned from Oracle (this can be done in various charts, but I don't want to move the data over the network)? For example, if a query returns 10K points, how can I downsample this to 1K points and still keep the visual characteristics (peaks/valleys) of the 10K points on the line graph?
I looked at Apache Commons, but without knowing exactly what the statistical name for this is, I'm a bit at a loss.
The data I am sampling is indeed time series data such as page hits.
It sounds like what you want is to segment the 10K data points into 1K buckets. The value of each bucket may be any statistic that makes sense for your data (sorry, without actual context it's hard to say). For example, if you want to spot the trend of the data, you might use the median to summarize the points in each bucket; Apache Commons Math has helper functions for that. Then, with the 1K downsampled points, you can plot the chart.
For example, if I have 10K data points of page load times, I might map that to 1K data points by taking the median of every 10 points, which tells me the most common load time within that range, and plot that. Or I can use the max to find the maximum load time in each period.
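A minimal sketch of this bucketing idea, assuming the series is already in a plain array rather than the Oracle result set (Apache Commons Math's Percentile class could replace the hand-rolled median here):

```java
import java.util.Arrays;

public class Downsample {
    // Reduce a series to roughly `buckets` points by taking the median
    // of each fixed-size bucket.
    public static double[] medianBuckets(double[] series, int buckets) {
        int size = (int) Math.ceil((double) series.length / buckets);
        double[] out = new double[(series.length + size - 1) / size];
        for (int b = 0; b < out.length; b++) {
            int from = b * size;
            int to = Math.min(from + size, series.length);
            double[] bucket = Arrays.copyOfRange(series, from, to);
            Arrays.sort(bucket); // sort the copy to find the median
            int n = bucket.length;
            out[b] = (n % 2 == 1) ? bucket[n / 2]
                                  : (bucket[n / 2 - 1] + bucket[n / 2]) / 2.0;
        }
        return out;
    }
}
```

Swapping the median line for a max (or mean) gives the other summaries mentioned above without changing the bucketing.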
There are two options: you can do as @Adrian Pang suggests and use time bins, which means you have bins with hard boundaries between them. This is perfectly fine, and it's called downsampling when you're working with a time series.
You can also use a smooth bin definition by applying a sliding-window average or convolution to the points. This gives you a time series at the same sampling rate as your original, but much smoother. Prominent examples are the sliding-window average (equally weighted mean or median of all points in the window) and Gaussian convolution (a weighted average where the weights come from a Gaussian density curve).
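A sketch of the sliding-window mean under the simplest assumptions (equal weights, odd window width, windows truncated at the edges):

```java
public class SlidingMean {
    // Smooth a series with a centered sliding-window mean of odd width.
    // The output keeps the original sampling rate, as described above;
    // a Gaussian kernel would only change the per-point weights.
    public static double[] smooth(double[] series, int window) {
        int half = window / 2;
        double[] out = new double[series.length];
        for (int i = 0; i < series.length; i++) {
            int from = Math.max(0, i - half);            // truncate at the left edge
            int to = Math.min(series.length - 1, i + half); // truncate at the right edge
            double sum = 0;
            for (int j = from; j <= to; j++) sum += series[j];
            out[i] = sum / (to - from + 1);
        }
        return out;
    }
}
```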
My advice is to average the values over shorter time intervals. Make the length of the shorter interval dependent on the overall time range. If the overall time range is short enough, just display the raw data. E.g.:
overall = 1 year: let subinterval = 1 day
overall = 1 month: let subinterval = 1 hour
overall = 1 day: let subinterval = 1 minute
overall = 1 hour: no averaging, just use raw data
You will have to make some choices about where to shift from one subinterval to another, e.g., for overall = 5 months, is subinterval = 1 day or 1 hour?
My advice is to make a simple scheme so that it is easy for others to comprehend. Remember that the purpose of the plot is to help someone else (not you) understand the data. A simple averaging scheme will help get you to that goal.
If all you need is to reduce the points of your visualization without losing any visual information, I suggest using the code here. The tricky part of this approach is finding the correct threshold, where the threshold is the number of data points you want to have after the downsampling. The lower the threshold, the more visual information you lose. However, going from 10K to 1K is feasible, since I have tried it with a similar amount of data.
As a side note, you should keep in mind:
The quality of your visualization depends on the number of points and the size (in pixels) of your charts, meaning that bigger charts need more data.
Any further analysis may not return correct results if it is applied to the downsampled data. Or at least I haven't seen anyone prove the opposite.

Finding as many pairs as possible

I'm trying to solve a problem but unfortunately my solution is not really the best for this task.
Task:
At a party there are N guests ( 0 < N < 30000 ). All guests tell when they get to the party and when they leave (for example [10;12]). The task is to take photos of as many people as possible at the party. On a photo there can only be 2 people (a pair) and each person can only be on exactly one photo. Of course, a photo can only be taken when the two persons are at the party at the same time. This is the case when their attendance intervals overlap.
My idea: I wrote a program which creates a graph of connections from the intervals. From the graph I search for the person who has the fewest connections. Among the people connected to them, I again select the one with the fewest connections. These two are chosen as a pair for a photo, and both are removed from the graph. The algorithm runs until no connections are left.
This approach works, however there is a 10-second limit for the program. With 1000 entries it runs in 2 seconds, but even with 4000 it takes a long time. Furthermore, when I tried it with 25000 entries, the program stopped with an out-of-memory error, so I cannot even store the connections properly.
I think a new approach is needed here, but I couldn't find another way to make this work.
Can anyone help me to figure out the proper algorithm for this task?
Thank you very much!
Sample Data:
10
1 100
2 92
3 83
4 74
5 65
6 55
7 44
8 33
9 22
10 11
The first line is the number of guests; the following lines are the attendance intervals of the people at the party.
No need to create a graph here; this problem can be solved directly on the interval structure. Sort people in ascending order of their leaving time (the ending point of the interval). Then iterate over them in that sorted order: if the current person does not intersect with anyone, they should be removed. If they intersect with more than one person, pair them with the one who has the earliest leaving time. During iteration you only need to compare each person with the following ones.
Proving this approach is not so difficult, so I hope you can prove it yourself.
Regarding running time, the simple solution is O(N^2), though I think it can be reduced to O(N log N). Either way, O(N^2) will fit in 10 seconds on a normal PC.
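A sketch of this greedy pairing, assuming intervals are given as {arrive, leave} pairs as in the sample data (in leaving-time order, the first later unpaired guest who overlaps is exactly the overlapping partner with the earliest leaving time):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class PartyPairs {
    // Greedy pairing: sort guests by leaving time, then pair each unpaired
    // guest with the next unpaired guest whose interval overlaps.
    // O(N^2) worst case, as discussed above.
    public static List<int[]> pair(int[][] intervals) { // intervals[i] = {arrive, leave}
        Integer[] order = new Integer[intervals.length];
        for (int i = 0; i < order.length; i++) order[i] = i;
        Arrays.sort(order, Comparator.comparingInt(i -> intervals[i][1]));
        boolean[] used = new boolean[intervals.length];
        List<int[]> photos = new ArrayList<>();
        for (int a = 0; a < order.length; a++) {
            int i = order[a];
            if (used[i]) continue;
            for (int b = a + 1; b < order.length; b++) {
                int j = order[b];
                // j leaves no earlier than i, so they overlap iff j arrives before i leaves
                if (!used[j] && intervals[j][0] <= intervals[i][1]) {
                    used[i] = used[j] = true;
                    photos.add(new int[]{i, j});
                    break;
                }
            }
        }
        return photos;
    }
}
```

On the sample data above, every interval overlaps every other, so all 10 guests end up in 5 photos.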
Seems like a classical maximum matching problem to me.
You build a graph where people who can possibly be pictured together (their time intervals intersect) are connected with an edge, and then find a maximum matching, for example with Edmonds' blossom algorithm.
I wouldn't say it's easy to implement. However, you can get quite a good approximation with Kuhn's algorithm for maximum matching in bipartite graphs. That one is really easy to implement, but won't give you the exact solution.
I have some really simple idea:
Assume the party takes X hours; make X sets, one for each hour, and add the appropriate people to them. Of course, people who stay longer than an hour will appear in several sets. Now if two adjacent sets each hold an even number of people, you can just take n/2 photos within each set. If two sets hold an odd number of people, look for someone who appears in both sets and move them to one of them (so you get two even-sized sets of people who are at the party at the same time).
Remember to remove all used people (consider some class, a person with a list of all his/her hours).
This idea should probably be expanded into a more advanced "moving people" algorithm, across more than one neighboring set.
I think the following can do:
First, read all the guests' data and sort them into an array by leaving time, ascending. Then take the first element of the array and iterate through the next elements until the very first time match is found (the next guest's entry time is not later than this guest's leaving time); if found, remove both from the array as a pair and record it elsewhere. If not, remove the guest, as they can't be paired at all. Repeat until the array is empty.
The worst case of this is also O(N^2), as a party can be like [1,2],[3,4],... where no guests can be paired with each other, and the algorithm will search through all 30000 guests in vain every time. So I don't think this is the optimal algorithm, but it should give an exact answer.
You say you already have a graph structure representation. I assume your vertices represent the guests and the intervals of their stay at the party, and the edges represent overlaps between the respective intervals. What you then have to solve is the graph-theoretical maximum matching problem, which has been solved before.
However, as indicated in my comments above, I think you can exploit the properties of the problem, especially the transitivity-like "if A leaves before B leaves and B leaves before C arrives, then A and C will not meet, either" like this:
Wait until the next yet unphotographed guest is about to leave, then take a photo of this one with the one who leaves next among those present.
You might start by thinking about the earliest time a photo can be taken: it is the time when the second person arrives at the party.
So, as the photographer, go to the party as the first person and wait. Whenever a person arrives, take a photo with him/her and the other persons at the party. As each person appears only once, you will not have any duplicates.
While taking a photo (i.e., iterating over the list of guests), remove those guests who have actually left the party.

Choose 1 from each of 11 ArrayLists is too slow

This is basically what my program is doing:
if you have 5 different shirts and 4 different pants to choose from, there are 20 different combinations of shirts and pants you can wear, and my program would iterate through all 20 combinations to determine which is the "best" to wear.
Except, in my situation, there are 11 different types of clothing (such as headgear, gloves, pants, earrings, shoes, cloak, etc.) and as many as 10 items in each category. So there could be as many as 10^11 combinations, and when I tried running my program with only 4 in each category, i.e. 4^11, it took about 5 seconds to complete. 10^11 would take DAYS.
Currently, what I have going on is 11 for loops nested inside each other to go through every combination. This obviously isn't the best way to do this because it is so slow. How can I make this faster? For context, I have 1 "outer" ArrayList containing 11 ArrayLists, and each of those 11 ArrayLists is a list of objects (the clothing).
There really is no way of doing it without a brute-force search in general; however, you can narrow the search space by generating a range of possible matches and only iterating through the items in each list that fall within that range.
Edit
Since your score is additive, the best item of clothing from each category is the one that gives the best sum of dex+str+int. This means that you don't need to consider combinations, just pick the best shirt, then the best pants, etc. So 11 loops, not 11 nested loops.
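A sketch under the additive-score assumption; the Item fields (dex, str, intel) are illustrative stand-ins for whatever the real scoring uses:

```java
import java.util.ArrayList;
import java.util.List;

public class BestOutfit {
    // Hypothetical item with an additive score.
    static class Item {
        final String name;
        final int dex, str, intel;
        Item(String name, int dex, int str, int intel) {
            this.name = name; this.dex = dex; this.str = str; this.intel = intel;
        }
        int score() { return dex + str + intel; }
    }

    // With an additive objective, the best combination is just the best item
    // from each category: 11 independent loops instead of 11 nested ones.
    // Assumes every category list is non-empty.
    public static List<Item> best(List<List<Item>> categories) {
        List<Item> outfit = new ArrayList<>();
        for (List<Item> category : categories) {
            Item best = category.get(0);
            for (Item item : category) {
                if (item.score() > best.score()) best = item;
            }
            outfit.add(best);
        }
        return outfit;
    }
}
```

This turns the exponential 10^11 search into roughly 11 * 10 score evaluations, but only because the score of an outfit decomposes into a sum over its items.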
Old answer
In the general case, brute force (what you're doing right now) is the only way to guarantee a correct answer.
Under some conditions, however, you can do better: for example, if I know that one shirt-pants combination is "better" than another shirt-pants combination no matter what other items of clothing I choose, I can find the best shirt-pants combination first and then consider which item of clothing is best to add to it. It's essentially a greedy search.

What is a good Java data structure to store RPG Game items?

I'm building an RPG dungeon game in Java and I'm stuck on creating a data structure.
I have a lot of Thing objects that I can copy to populate a dungeon with. For instance, there is a bread Thing object, a sword Thing object, a chain mail Thing object, monster Thing(s), etc. I want to store them in a central Library and then be able to retrieve an object using certain queries. I want to store them using the following fields:
int minLevel
int maxLevel
double probability
int[] types
So a rusty sword would have a minLevel of 1, a maxLevel of 3, a rarity probability of 3%, and types [type.SWORD, type.WEAPON, type.ITEM, type.EQUIP]. A better sword would have minLevel 2, maxLevel 10, and rarity 1%.
Then I want to retrieve a random type.SWORD from the library; say I'm at level 3. I should get the rusty sword more often than the better sword, based on their probabilities. If I retrieved a type.SWORD from the library requesting level 10, I would only get back the better sword.
I hope this makes sense.
EDIT
At the initialization stage, all the basic objects will be created: the available weapons, armor, foods, potions, wands, all the basic possible Things that have a unique graphic tile in the game. Then when I want to place an object somewhere, I just make a copy of one of the available Things, adjust its stats a little, and plunk it down in the world. The actual items are all subclasses of the root Thing class, such as Creature, Item, Equip (extends Item), Weapon (extends Equip), Armor (extends Equip), Food (extends Item), etc. But I want to tag them differently in the Library database with extra tags, such as type.RARE, type.ARTIFACT, type.CURSED; so I want extra tags beyond the class.
The game uses libGDX to be available on Android and as an applet. I use the free Rltile set, which has thousands of good tiles. I will use PubNub or Google App Engine to provide multiplayer support.
I can think of a few answers:
1. Write your own Library that stores these things in Maps with custom methods. You might have a Map<Type, List<Thing>> that stores lists of things by type, and then a method that takes a type, retrieves the list from the map, and selects something by probability. That's easy to do: you just sum up the probabilities, generate a random number between 0 and the sum, and then walk through the list, subtracting each item's probability from your random value until it goes negative; you return the item that made it negative[*]. You can also filter the list first, for example by level.
2. If you have a real mix of different things and don't want to base this on types, another option (slower, but more flexible) is to place everything in one list and then filter by your needs. A nice way to do that is with Guava; see Iterables.filter and Predicate at https://code.google.com/p/guava-libraries/. You could provide an interface that takes a predicate and returns a random selection from whatever is left after filtering. Predicates are easy to construct "inline" with anonymous classes; see the examples at https://code.google.com/p/guava-libraries/wiki/FunctionalExplained#Functions_and_Predicates
3. Stick all this in a database. Maybe I am too enterprisey, and games people would never do this, but it seems to me that a small, embedded database like SQLite or H2 would be perfect for this. You can then select things with SQL queries (this is already a long answer, so I won't give more details here).
4. Change your design. What you describe is not very OO; instead of having types, your Things could implement interfaces, so the Weapon interface would have a getMinLevel() method, for example. Then, with a design like this, use a database with an ORM (Hibernate).
What you're doing is kind of ambitious, and I suspect it's more about learning than anything else (no criticism intended; this is how I learn, by making things, so I'm just assuming you are like me). So choose whichever you feel most comfortable with.
[*] This assumes that you always want to return something. If the probabilities are normalized and you want to be able to return nothing, select the initial value from 0-1 (or 0-100 if using percentages), and if nothing turns the value negative when you run through the list, return nothing.
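The probability walk from the first option might look like this (a generic sketch, not tied to the question's Thing class):

```java
import java.util.List;
import java.util.Random;

public class WeightedPick {
    // Weighted random selection: sum the weights, draw a random value in
    // [0, sum), and subtract each item's weight until the value goes
    // negative; return the item that made it negative.
    public static <T> T pick(List<T> items, List<Double> weights, Random rnd) {
        double sum = 0;
        for (double w : weights) sum += w;
        double r = rnd.nextDouble() * sum;
        for (int i = 0; i < items.size(); i++) {
            r -= weights.get(i);
            if (r < 0) return items.get(i);
        }
        return items.get(items.size() - 1); // guard against floating-point rounding
    }
}
```

Each item is returned with probability proportional to its weight, so a 3% rusty sword comes up three times as often as a 1% better sword once both pass the level filter.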
The easiest approach is to put all your objects in a single large ArrayList and use repeated sampling to select an object.
The procedure to select a random item is very simple:
Select a random number from 0 up to the size of the ArrayList
Get the object at that index from the library
If the object fails to meet any criteria you specify (e.g. "is it of type.SWORD or type.MACE?"), go back to the start
If the object is outside the minimum or maximum level, go back to start
If the object has a rarity of less than 100%, create a random number from 0-100%. If the random number exceeds the object's rarity, loop back to start. Most objects should have a rarity of say 10-100%, if you want extremely common objects then you can add them multiple times to the library.
This procedure will produce an object that meets the criteria sooner or later (if it exists) and will do so according to the rarity percentage.
The one slight trickiness is that it will loop infinitely if no such object exists. Suppose there is no level-17 weapon in the library, for example? To get around this, I would propose widening minLevel and maxLevel after every 100 tries to ensure that eventually something is found. Ensure you always have a level 1 object of each type available.
For safety, you might also want a bailout after, say, 100,000 tries (but remember to throw an exception; this is a problem if you are asking for things that don't exist in the library!).
P.S. I implemented a similar library system in a roguelike game called Tyrant that I created many years ago. Source is here if you are interested:
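A sketch of this rejection-sampling loop; the Thing class, its fields, and the string type tags are simplified stand-ins for the question's classes (and the level-widening fallback is omitted for brevity):

```java
import java.util.List;
import java.util.Random;
import java.util.Set;

public class LibrarySampler {
    // Hypothetical stand-in for the question's Thing hierarchy.
    static class Thing {
        final String name;
        final int minLevel, maxLevel;
        final double rarity; // 0.0 - 1.0
        final Set<String> types;
        Thing(String name, int minLevel, int maxLevel, double rarity, Set<String> types) {
            this.name = name; this.minLevel = minLevel; this.maxLevel = maxLevel;
            this.rarity = rarity; this.types = types;
        }
    }

    public static Thing pick(List<Thing> library, String type, int level, Random rnd) {
        for (int tries = 0; tries < 100_000; tries++) {
            Thing t = library.get(rnd.nextInt(library.size()));
            if (!t.types.contains(type)) continue;                  // wrong type: back to start
            if (level < t.minLevel || level > t.maxLevel) continue; // outside level range
            if (rnd.nextDouble() > t.rarity) continue;              // failed the rarity roll
            return t;
        }
        throw new IllegalStateException("no matching Thing in library"); // bailout
    }
}
```

Rarer objects fail the rarity roll more often, so over many draws the 3% sword naturally outnumbers the 1% sword exactly as described.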
https://github.com/mikera/tyrant/blob/master/src/main/java/mikera/engine/Lib.java

Java Program using 4D array [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 3 years ago.
I'm a first year computer engineering student and I'm quite new here. I have been learning Java for the past three and a half months, and C++ for six months before that. My knowledge of Java is limited to defining and using own methods, absolute basics of object-oriented programming like use of static data members and member visibility.
This afternoon, my computer programming prof taught us about multi-dimensional arrays in Java: that multi-dimensional arrays are simply arrays of arrays, and so on. He mentioned that in normal, educational programming, arrays beyond 2 dimensions are almost never used. Even 3D arrays are used only where absolutely essential, like carrying out scientific computations. This leaves next to zero use for 4D arrays, as using them shows that "you're using the wrong datatype", in my prof's words.
However, I'd like to write a program in which the use of a 4D array, of any data type, primitive or otherwise, is justified. The program must not be as trivial as printing the elements of the array.
I have no idea where to begin, this is why I am posting this here. I'd like your suggestions. Relevant problem statements, algorithms, and bits and pieces of code are also welcome.
Thank you.
Edit: Forgot to mention, I have absolutely no idea about working with GUIs in Java, so please do not post ideas that implement GUIs.
Ideas:
- Matrix multiplication and its applications, like finding shortest paths in graphs
- Solving systems of equations
- Cryptography -- many cryptographic protocols represent data, keys, or their internal structures in the form of matrices
- Any algorithm on graphs represented as matrices
I must have been having some kind of fixation on matrices, sorry :)
For 4D arrays, one obvious use I can think of is the representation of a 3D environment changing in time, so the 4th dimension represents the time scale. Or any representation of 3D data that has an additional associated property placed in the 4th dimension of the array.
You could create a Sudoku hypercube with 4 dimensions and store the numbers the user enters in a 4-dimensional int array.
One use could be applying dynamic programming to a function that takes 4 integer parameters, f(int x, int y, int z, int w). To avoid calling this expensive function over and over again, you can cache the results in a 4D array: results[x][y][z][w] = f(x, y, z, w).
Now you just have to find an expensive integer function with arity of 4, oh, and a need for calculating it often...
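A sketch of that memoization scheme; expensive() here is just a placeholder for the real computation, and the argument bounds are assumed to be known in advance:

```java
public class Memo4D {
    static final int N = 20; // assumed upper bound on each argument
    static final long[][][][] cache = new long[N][N][N][N];
    static final boolean[][][][] computed = new boolean[N][N][N][N];

    // Returns f(x, y, z, w), computing it at most once per argument tuple.
    static long f(int x, int y, int z, int w) {
        if (!computed[x][y][z][w]) {
            cache[x][y][z][w] = expensive(x, y, z, w);
            computed[x][y][z][w] = true;
        }
        return cache[x][y][z][w];
    }

    // Placeholder for the expensive 4-argument function.
    static long expensive(int x, int y, int z, int w) {
        return (long) x * y + (long) z * w;
    }
}
```

The trade-off is the usual one for dense memo tables: N^4 cells of storage in exchange for at most one evaluation per distinct argument tuple.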
Just to back him up: your prof is quite right. I'm afraid I might get physically violent with anyone using a 4D+ array in production code.
It's kinda cool to be able to go beyond 3 dimensions as an educational exercise, but for real work it makes things way too complicated, because we don't really have much comprehension of structures with more than 3 dimensions.
The reason it's difficult to come up with a practical use for 4D+ arrays is because there is (almost) nothing that complicated in the real world to model.
You could look into modelling something like a tesseract, which is (in layman's terms) a 4D cube, or, as Victor suggests, use the 4th dimension to model time.
HTH
There are many possible uses. As others have said, you can model a hypercube or something that makes use of a hypercube as well as modeling a change over time. However, there are many other possible uses.
For example, one of the theoretical simulation models of our universe uses 11-dimensional physics. You could write a program to model what this assumed physics would look like. The user would only be able to see a 3-dimensional space, which definitely limits usability, but the 4th-dimensional coordinate could act like a changing channel, allowing the user to change their perspective. If a 4th-dimensional explosion occurs, for example, you might even want a 5th-dimensional array so that you can model what it looks like in each connected 3-dimensional space as well as how it looks in each frame of time.
To take a step away from the scientific, think about an MMORPG. Today many of those games use "instanced" locations, which means that a copy of a given zone is created exclusively for the use of a given group of players to prevent lag. If this "instanced" concept were given a 4th-dimensional coordinate and players could shift their position across instances, it could effectively merge all server worlds together while giving players a great deal of control over where they go and decreasing cost.
Of course, your question asks for ideas without a GUI. That's a bit more difficult because you are working in a 2D environment. One real application would be calculus. We have 3D graphing calculators, but for higher dimensions you pretty much have to do it by hand. A program that aims to solve these calculations for you might not be able to display the information properly, but you can certainly compute it. Also, when holographic interfaces become a widespread reality, it may be possible to represent a hypercube graph in 3D, making such a program useful.
You might be able to write a text based board game where the position of pieces is represented with text. You can add dimensions and game rules to use them.
The simplest idea I could give you is a save-state system. At each interval the program's memory is copied and stored in a file. Its coordinate is its position in time. At face value you may not need a 4D array to handle this, but suppose the program you were saving states of used a 3D array: you could represent each saved state as a position in time and then view the change over time.
I'm not sure what specifically you could do with this, because I just started thinking about it. But you could possibly use a 4D array for some sort of basic physics simulation, like modeling a projectile flight involving some wind values and what not. That just came to mind because the term 4D always brings to mind that the "position" of any object is 4 values, with time as the 4th.
Being a physics student, I know we have only 3 dimensions of space, but we also have a 4th dimension, which is time. Thinking that way, we can imagine an array of any dimension (1D, 2D or 3D) whose values differ with time, or an array which keeps a record of every array whose values changed with time.
This is actually quite familiar to us: for example, the "ATTENDANCE REGISTER" we usually have in our classroom.
That's my view of it.
Enjoy :-)
To give a concrete example for Ishtar's answer: four-string alignment. To compute an optimal two-string alignment, you write one string along one (say, horizontal) axis of a 2D array (a matrix!) and the other string along the other axis. The array is filled with edit costs, and the optimal alignment of the two strings is the one which produces the lowest cost, associated with a path through the matrix. A common way of finding such a path is the above-mentioned dynamic programming. You can look up 'Levenshtein distance' or 'edit distance' for the technical details.
The basic idea can be expanded to any number of strings. For four strings you'd need a four-dimensional array, to write each string along one of the dimensions.
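For reference, the two-string base case described above is the classic Levenshtein dynamic program over a 2D cost table:

```java
public class EditDistance {
    // d[i][j] holds the edit distance between the first i characters of a
    // and the first j characters of b; the answer is the bottom-right cell,
    // and an optimal alignment is a lowest-cost path through this table.
    public static int distance(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i; // delete all of a's prefix
        for (int j = 0; j <= b.length(); j++) d[0][j] = j; // insert all of b's prefix
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int sub = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(d[i - 1][j - 1] + sub,          // match or substitute
                          Math.min(d[i - 1][j] + 1,                // delete from a
                                   d[i][j - 1] + 1));              // insert from b
            }
        }
        return d[a.length()][b.length()];
    }
}
```

Aligning four strings would replace `d[i][j]` with a 4D table `d[i][j][k][l]`, which is exactly where the exponential space cost discussed below comes from.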
In practice, however, multiple string alignment is not done this way, for at least two reasons:
Lack of flexibility: why would you need to align exactly four strings? In computational molecular biology, for example, you might wish to align many strings (think of DNA sequences), and their number is not known in advance, but it is seldom four. Your program would be useful for a very limited class of problems.
Computational complexity, in space and time: the requirements are exponential in the number of dimensions, making the approach impractical for most real-world purposes. Besides, most of the entries in such a multi-dimensional array would lie on suboptimal paths that are never even touched, so storing them would simply be a waste of space.
So, for all practical purposes, I believe your professor was right.
