Efficient Fisher's Exact Test in Java

I need a library/function/method to perform Fisher's exact test in Java that provides the right-tailed, left-tailed and two-tailed probabilities.
A simple Google search turns up a solution within the Tassel packages, but that method simply applies the test steps with no optimization, and is therefore extremely slow. Moreover, it uses int types everywhere, which is not really efficient for big contingency tables.
If you know of an existing implementation, please help me :-)

See if this helps: http://www.users.zetnet.co.uk/hopwood/tools/StatTests.java
The formula is quite simple. There's a very simple (two-tailed) implementation here: http://javanus.com/blogs/?p=51 (see the comment by Discretoboy for a much cleaner implementation)
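As an illustration of how the test can be made fast without any of the libraries above, here is a minimal sketch for a 2x2 table [[a, b], [c, d]]. It assumes you know an upper bound on the table total: log-factorials are cached once, and each call sums hypergeometric probabilities over the feasible values of a. The class name FisherExact and its methods are hypothetical, not taken from Tassel, jsc or WordHoard. The two-tailed value follows one common convention: the sum of all tables no more probable than the observed one.

```java
import java.util.Arrays;

/**
 * Sketch of Fisher's exact test for a 2x2 contingency table
 *
 *     | a  b |
 *     | c  d |
 *
 * Log-factorials are cached once, so each test is linear in the number of
 * feasible values of cell a, and large counts never overflow an int.
 */
final class FisherExact {
    private final double[] logFact;

    /** maxN must be at least the grand total a+b+c+d of any table passed to test(). */
    FisherExact(int maxN) {
        logFact = new double[maxN + 1];
        for (int i = 1; i <= maxN; i++) {
            logFact[i] = logFact[i - 1] + Math.log(i);  // log(i!) = log((i-1)!) + log(i)
        }
    }

    /** Hypergeometric probability of exactly this table given its row and column margins. */
    private double prob(int a, int b, int c, int d) {
        int n = a + b + c + d;
        return Math.exp(logFact[a + b] + logFact[c + d] + logFact[a + c] + logFact[b + d]
                - logFact[n] - logFact[a] - logFact[b] - logFact[c] - logFact[d]);
    }

    /** Returns {left, right, twoTailed} p-values for the observed table. */
    double[] test(int a, int b, int c, int d) {
        if (a < 0 || b < 0 || c < 0 || d < 0) throw new IllegalArgumentException("negative count");
        int n = a + b + c + d;
        if (n >= logFact.length) throw new IllegalArgumentException("table total exceeds maxN");
        int row1 = a + b, col1 = a + c;
        int minA = Math.max(0, row1 + col1 - n);  // smallest value cell a can take with these margins
        int maxA = Math.min(row1, col1);          // largest value cell a can take
        double pObs = prob(a, b, c, d);
        double left = 0, right = 0, two = 0;
        for (int x = minA; x <= maxA; x++) {
            double p = prob(x, row1 - x, col1 - x, n - row1 - col1 + x);
            if (x <= a) left += p;   // tables at least as extreme towards small a
            if (x >= a) right += p;  // tables at least as extreme towards large a
            // Two-tailed: every table no more probable than the observed one,
            // with a small relative tolerance so exact ties are not lost to rounding.
            if (p <= pObs * (1 + 1e-7)) two += p;
        }
        return new double[] { Math.min(1, left), Math.min(1, right), Math.min(1, two) };
    }

    public static void main(String[] args) {
        FisherExact fisher = new FisherExact(10_000);
        System.out.println(Arrays.toString(fisher.test(1, 9, 11, 3)));
    }
}
```

For the table [[1, 9], [11, 3]] this gives a left-tailed p of about 0.00138 and a two-tailed p of about 0.00276. Zero cells are handled naturally, since log(0!) = 0.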
You can also take a look at the test implementation in Java Statistical Classes.

I use http://wordhoard.northwestern.edu/userman/javadoc/edu/northwestern/at/utils/math/statistics/FishersExactTest.html
A (very) brief test showed it to be similar in speed to the Java Statistical Classes (jsc) test mentioned above, but it had the additional advantage of not throwing an IllegalArgumentException when my table included a zero, which I believe is a legitimate case.

Related

How is util.Collections used?

I don't understand how Collections is generally used. The confusion started when I found out what binary search was and looked up an implementation in Java. The first one I found was this https://www.javatpoint.com/binary-search-in-java, but I also found this on GeeksforGeeks: https://www.geeksforgeeks.org/collections-binarysearch-java-examples/.
They pretty much have the same output and obviously the second is simpler, but I don't really understand what the point of the first link is. To generalize to all of Collections, are there situations where using Collections is disadvantageous?
I'm sorry my question can't be more specific or if the question doesn't make sense, but I don't understand enough to make it more specific.
java.util.Collections is a library class containing utility methods for dealing with Collection types. That is, it has helpful methods which solve common problems or do useful things, so that you don't have to write your own code to do them.
Your first link shows an implementation of the binary search algorithm from scratch, while your second link shows how to use the utility method Collections.binarySearch, which saves writing your own implementation.
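A minimal sketch of the library route (the list values are arbitrary). Note that when the key is absent, Collections.binarySearch does not return -1; it returns -(insertion point) - 1, which encodes where the key would go.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class BinarySearchDemo {
    public static void main(String[] args) {
        // Collections.binarySearch requires a list sorted in ascending order.
        List<Integer> sorted = Arrays.asList(2, 5, 8, 12, 16, 23);

        int found = Collections.binarySearch(sorted, 12);    // 3: index of 12
        int missing = Collections.binarySearch(sorted, 10);  // -4, i.e. -(insertion point) - 1

        System.out.println(found);
        System.out.println(missing);
        System.out.println(-missing - 1);  // 3: where 10 would be inserted to keep the list sorted
    }
}
```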
The first link may be useful for educational purposes (since students often have to learn about the binary search algorithm), or for people who need to adapt binary search to a different problem. For example, a variation of binary search can find the first occurrence of the target number, or of the smallest number greater than or equal to the target. Collections.binarySearch does not guarantee this: if the list contains several elements equal to the target, it may return any of them, so you would have to write an implementation yourself.
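A sketch of that "first occurrence" variation, often called lower bound. It assumes an ascending List<Integer> with fast random access such as ArrayList (on a LinkedList, each get(mid) is O(n)); the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

class LowerBound {
    /**
     * Returns the index of the first element >= target in an ascending list,
     * or list.size() if every element is smaller. Unlike Collections.binarySearch,
     * the result is well defined when the list contains duplicates.
     */
    static int lowerBound(List<Integer> list, int target) {
        int lo = 0, hi = list.size();      // search the half-open range [lo, hi)
        while (lo < hi) {
            int mid = lo + (hi - lo) / 2;  // avoids int overflow of (lo + hi)
            if (list.get(mid) < target) {
                lo = mid + 1;              // the answer lies to the right of mid
            } else {
                hi = mid;                  // mid may be the answer, so keep it in range
            }
        }
        return lo;
    }

    public static void main(String[] args) {
        List<Integer> xs = Arrays.asList(1, 3, 3, 3, 7, 9);
        System.out.println(lowerBound(xs, 3));   // 1: first occurrence of 3
        System.out.println(lowerBound(xs, 4));   // 4: index of 7, the smallest value >= 4
        System.out.println(lowerBound(xs, 10));  // 6: past the end
    }
}
```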
The first link you posted actually explains how a binary search works, giving the theory behind it and showing how to implement it on your own. This is useful for understanding how and why binary search works.
However, Java's standard library has a utility class, java.util.Collections, for collections (Lists, Maps, etc.) that already implements some of these simple methods. The second link explains how to use that class.

Why is there no AndAttribute in SpecFlow?

When I generate a step definition for an And/But step, it takes the previous step's attribute (Given/When/Then). I feel the implementation in Cucumber is better, as we can use And/But as they are, and it's more aligned with BDD.
Why is there no separate 'And'/'But' attribute? Is there a significant reason for not having one in SpecFlow?
Conceptually, And and But are just syntactic sugar for the actual step type of Given, When or Then. All steps are really of one of those types. In the SpecFlow implementation it would be possible to have different implementations for And steps that come after a Given and those that come after a When, whereas in a language where And is a native step type this wouldn't be possible. Conversely, you have to give an And step's definition two attributes to use it both after a Given and after a When.
This might be a good thing or a bad thing. In the end I think this is just an implementation decision which is swings and roundabouts, and makes little difference.

As a Java programmer learning Python, what should I look out for? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 9 years ago.
Much of my programming background is in Java, and I'm still doing most of my programming in Java. However, I'm starting to learn Python for some side projects at work, and I'd like to learn it as independent of my Java background as possible - i.e. I don't want to just program Java in Python. What are some things I should look out for?
A quick example: when looking through the Python tutorial, I came across the fact that mutable default parameter values of a function (such as a list) persist, i.e. they are remembered from call to call, because the default is evaluated only once, when the function is defined. This was counter-intuitive to me as a Java programmer and hard to get my head around. (See here and here if you don't understand the example.)
Someone also provided me with this list, which I found helpful, but short. Does anyone have any other examples of how a Java programmer might tend to misuse Python? Or things a Java programmer would falsely assume or have trouble understanding?
Edit: OK, here is a brief overview of the points addressed by the article I linked to, to prevent duplicates in the answers (as suggested by Bill the Lizard). (Please let me know if I make a mistake in phrasing; I've only just started with Python, so I may not understand all the concepts fully. And a disclaimer: these are going to be very brief, so if you don't understand what a point is getting at, check out the link.)
A static method in Java does not translate to a Python classmethod
A switch statement in Java translates to a hash table in Python
Don't use XML
Getters and setters are evil (hey, I'm just quoting :) )
Code duplication is often a necessary evil in Java (e.g. method overloading), but not in Python
(And if you find this question at all interesting, check out the link anyway. :) It's quite good.)
Don't put everything into classes. Python's built-in list and dictionaries will take you far.
Don't worry about keeping one class per module. Divide modules by purpose, not by class.
Use inheritance for behavior, not interfaces. Don't create an "Animal" class for "Dog" and "Cat" to inherit from, just so you can have a generic "make_sound" method.
Just do this:
class Dog(object):
    def make_sound(self):
        return "woof!"
class Cat(object):
    def make_sound(self):
        return "meow!"
class LolCat(object):
    def make_sound(self):
        return "i can has cheezburger?"
The referenced article has some good advice that can easily be misquoted and misunderstood. And some bad advice.
Leave Java behind. Start fresh. "do not trust your [Java-based] instincts". Saying things are "counter-intuitive" is a bad habit in any programming discipline. When learning a new language, start fresh, and drop your habits. Your intuition must be wrong.
Languages are different. Otherwise, they'd be the same language with different syntax, and there'd be simple translators. Because there are not simple translators, there's no simple mapping. That means that intuition is unhelpful and dangerous.
"A static method in Java does not translate to a Python classmethod." This kind of thing is really limited and unhelpful. Python has a staticmethod decorator. It also has a classmethod decorator, for which Java has no equivalent.
This point, BTW, also included the much more helpful advice on not needlessly wrapping everything in a class. "The idiomatic translation of a Java static method is usually a module-level function".
The Java switch statement can be implemented in Python several ways. First and foremost, it's usually an if elif elif elif construct. The article is unhelpful in this respect. If you're absolutely sure this is too slow (and can prove it), you can use a Python dictionary as a slightly faster mapping from value to block of code. Blindly translating switch to a dictionary (without thinking) is really bad advice.
"Don't use XML." This doesn't make sense when taken out of context. In context it means don't rely on XML to add flexibility. Java relies on describing stuff in XML; WSDL files, for example, repeat information that's obvious from inspecting the code. Python relies on introspection instead of restating everything in XML.
But Python has excellent XML processing libraries. Several.
Getters and setters are not required in Python the way they're required in Java. First, you have better introspection in Python, so you don't need getters and setters to help make dynamic bean objects. (For that, you use collections.namedtuple.)
However, you have the property decorator which will bundle getters (and setters) into an attribute-like construct. The point is that Python prefers naked attributes; when necessary, we can bundle getters and setters to appear as if there's a simple attribute.
Also, Python has descriptor classes if properties aren't sophisticated enough.
Code duplication is often a necessary evil in Java (e.g. method overloading), but not in Python. Correct. Python uses optional arguments instead of method overloading.
The bullet point went on to talk about closures; that isn't as helpful as the simple advice to use default argument values wisely.
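For comparison, a sketch of the Java side of that point. The Greeter class and greet method are hypothetical. Because Java has no default argument values, each optional parameter tends to cost another overload that delegates to the fullest one; in Python the three overloads collapse into a single def greet(name, greeting="Hello", punctuation="!").

```java
class Greeter {
    // Java has no default argument values, so each optional parameter
    // typically needs one more overload that fills in the default and delegates.
    static String greet(String name) {
        return greet(name, "Hello");
    }

    static String greet(String name, String greeting) {
        return greet(name, greeting, "!");
    }

    // The one overload that does the actual work.
    static String greet(String name, String greeting, String punctuation) {
        return greeting + ", " + name + punctuation;
    }
}
```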
One thing you might be used to in Java that you won't find in Python is strict privacy. This is not so much something to look out for as it is something not to look for (I am embarrassed by how long I searched for a Python equivalent to 'private' when I started out!). Instead, Python has much more transparency and easier introspection than Java. This falls under what is sometimes described as the "we're all consenting adults here" philosophy. There are a few conventions and language mechanisms to help prevent accidental use of "unpublic" methods and so forth, but the whole mindset of information hiding is virtually absent in Python.
The biggest one I can think of is not understanding or not fully utilizing duck typing. In Java you're required to specify very explicit and detailed type information upfront. In Python typing is both dynamic and largely implicit. The philosophy is that you should be thinking about your program at a higher level than nominal types. For example, in Python, you don't use inheritance to model substitutability. Substitutability comes by default as a result of duck typing. Inheritance is only a programmer convenience for reusing implementation.
Similarly, the Pythonic idiom is "beg forgiveness, don't ask permission". Explicit typing is considered evil. Don't check whether a parameter is a certain type upfront. Just try to do whatever you need to do with the parameter. If it doesn't conform to the proper interface, it will throw a very clear exception and you will be able to find the problem very quickly. If someone passes a parameter of a type that was nominally unexpected but has the same interface as what you expected, then you've gained flexibility for free.
The most important thing, from a Java POV, is that it's perfectly ok to not make classes for everything. There are many situations where a procedural approach is simpler and shorter.
The next most important thing is that you will have to get over the notion that the type of an object controls what it may do; rather, the code controls what objects must be able to support at runtime (this is by virtue of duck-typing).
Oh, and use native lists and dicts (not customized descendants) as far as possible.
The way exceptions are treated in Python is different from how they are treated in Java. While in Java the advice is to use exceptions only for exceptional conditions, this is not so with Python.
In Python, things like iterators make use of the exception mechanism (StopIteration) to signal that there are no more items. Such a design is not considered good practice in Java.
As Alex Martelli puts it in his book Python in a Nutshell, the approach to exceptions in other languages (applicable to Java) is LBYL (Look Before You Leap):
to check in advance, before attempting an operation, for all circumstances that might make the operation invalid.
With Python, by contrast, the approach is EAFP (it's Easier to Ask for Forgiveness than Permission).
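To make the contrast concrete on the Java side, here is a sketch of both styles applied to a Java Iterator (the class and method names are hypothetical). The first is idiomatic Java LBYL; the second imitates Python's StopIteration protocol by relying on the NoSuchElementException that Iterator.next() throws when the elements are exhausted.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

class IterationStyles {
    // LBYL, idiomatic Java: ask hasNext() before every next().
    static int sumLbyl(List<Integer> xs) {
        int sum = 0;
        Iterator<Integer> it = xs.iterator();
        while (it.hasNext()) {
            sum += it.next();
        }
        return sum;
    }

    // EAFP, mimicking Python's StopIteration: call next() until it throws.
    // It works, but goes against the Java advice to use exceptions
    // only for exceptional conditions.
    static int sumEafp(List<Integer> xs) {
        int sum = 0;
        Iterator<Integer> it = xs.iterator();
        try {
            while (true) {
                sum += it.next();
            }
        } catch (NoSuchElementException end) {
            // the iterator is exhausted; this is the normal way out of the loop
        }
        return sum;
    }
}
```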
A corrollary to "Don't use classes for everything": callbacks.
The Java way for doing callbacks relies on passing objects that implement the callback interface (for example ActionListener with its actionPerformed() method). Nothing of this sort is necessary in Python, you can directly pass methods or even locally defined functions:
def handler():
    print("click!")
button.onclick(handler)
Or even lambdas:
button.onclick(lambda: print("click!"))
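For contrast, a sketch of the Java side of this comparison. Button and ClickListener are hypothetical stand-ins for something like Swing's JButton and ActionListener, kept minimal so the example is self-contained. Since Java 8 a lambda or method reference can implement a single-method interface, but the interface type itself still has to exist.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical callback interface, playing the role of ActionListener.
interface ClickListener {
    void onClick();
}

// Hypothetical minimal button: stores listeners and notifies them on click().
class Button {
    private final List<ClickListener> listeners = new ArrayList<>();

    void addClickListener(ClickListener listener) {
        listeners.add(listener);
    }

    void click() {
        for (ClickListener l : listeners) {
            l.onClick();
        }
    }
}

class CallbackDemo {
    static void handler() {
        System.out.println("click!");
    }

    public static void main(String[] args) {
        Button button = new Button();
        // Classic style: an anonymous class implementing the callback interface.
        button.addClickListener(new ClickListener() {
            @Override
            public void onClick() {
                System.out.println("click!");
            }
        });
        // Java 8+: a lambda or method reference, still typed as ClickListener.
        button.addClickListener(() -> System.out.println("click!"));
        button.addClickListener(CallbackDemo::handler);
        button.click();  // runs all three listeners
    }
}
```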

Java: Best practices for turning foreign horror-code into clean API...?

I have a project (related to graph algorithms). It was written by someone else.
The code is horrible:
public fields, no getters/setters
huge methods, all public
some classes have over 20 fields
some classes have over 5 constructors (which are also huge)
some of those constructors just leave many fields null
(so I can't make some fields final, because then every second constructor signals errors)
methods and classes rely on each other in both directions
I have to rewrite this into a clean and understandable API.
Problem is: I myself don't understand anything in this code.
Please give me hints on analyzing and understanding such code.
I was thinking that perhaps there are tools which perform static code analysis and give me call graphs and things like that.
Oh dear :-) I envy you and I don't, at the same time... OK, let's take one thing at a time. You can tackle some of these things yourself before you set a code analysis tool loose on it. This way you will gain a better understanding and be able to get much further than with a tool alone.
public fields, no getters/setters
Make everything private. Your rule should be to limit access as much as possible.
huge methods, all public
Split them up, and make them private where it makes sense to do so.
some classes have over 20 fields
Ugh... the Builder pattern from Effective Java (2nd ed.) is a prime candidate for this.
some classes have over 5 constructors (which are also huge)
Sounds like telescoping constructors; the same pattern as above will help.
some of those constructors just leave many fields null
Yep, it's telescoping constructors :)
methods and classes rely on each other in both directions
This will be the least fun. Try to remove inheritance unless you're certain it is required, and use composition via interfaces instead where applicable.
Best of luck, we are here to help.
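A minimal sketch of the Builder pattern applied to a class with many optional fields. GraphConfig and its fields are hypothetical stand-ins for one of the legacy classes. Required values go into the builder's constructor, optional ones get explicit defaults instead of being left null, and every field of the built object can be final.

```java
import java.util.Objects;

final class GraphConfig {
    private final int vertexCount;  // required
    private final boolean directed; // optional
    private final boolean weighted; // optional
    private final String name;      // optional

    // Only the builder can construct instances, so there is a single constructor.
    private GraphConfig(Builder b) {
        this.vertexCount = b.vertexCount;
        this.directed = b.directed;
        this.weighted = b.weighted;
        this.name = b.name;
    }

    int vertexCount() { return vertexCount; }
    boolean isDirected() { return directed; }
    boolean isWeighted() { return weighted; }
    String name() { return name; }

    static final class Builder {
        private final int vertexCount;
        // Explicit defaults replace the constructors that "just leave fields null".
        private boolean directed = false;
        private boolean weighted = false;
        private String name = "unnamed";

        Builder(int vertexCount) {
            if (vertexCount < 0) throw new IllegalArgumentException("vertexCount < 0");
            this.vertexCount = vertexCount;
        }

        Builder directed(boolean value) { this.directed = value; return this; }
        Builder weighted(boolean value) { this.weighted = value; return this; }
        Builder name(String value) { this.name = Objects.requireNonNull(value); return this; }

        GraphConfig build() { return new GraphConfig(this); }
    }
}
```

Usage reads like named arguments: new GraphConfig.Builder(10).directed(true).name("roads").build().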
WOW!
I would recommend: write unit tests, then start refactoring.
* public fields, no getters/setters
start by making them private, and use the resulting compiler errors as a metric of the 'resistance' you feel.
* huge methods, all public
understand their semantics, and try to introduce interfaces
* some classes have over 20 fields
very common in complex applications, nothing to worry about
* some classes have over 5 constructors (which are also huge)
replace them with the builder/creator pattern
* some of those constructors just leave many fields null
see the answer above
* methods and classes rely on each other in both directions
decide whether to rewrite everything (honestly, I have faced cases where only 10% of the code was needed)
Well, the clean-up wizard in Eclipse will scrape off a noticeable percentage of the sludge.
Then you could point Sonar at it and fix everything it complains about, if you live long enough.
For static analysis and call graphs (no graphics, but graph structures), you can use Dependency Finder.
Use an IDE that knows something about refactoring, like IntelliJ. You won't have situations where you move one method and five other classes complain, because IntelliJ is smart enough to make all the required changes.
Unit tests are a must. Someone refactoring without unit tests is like a high-wire performer without a safety net. Get one before you start the long, hard climb.
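One way to get that safety net before touching anything is a characterization test: pin down what the existing code currently does, then check the refactored code against it on the same inputs. This is a minimal sketch without JUnit (in a real project you would use a test framework); legacyDegrees and degrees are hypothetical stand-ins for a legacy method and its rewrite.

```java
import java.util.Arrays;

class CharacterizationTestSketch {
    // Hypothetical legacy method, left untouched: counts non-zero entries per adjacency-matrix row.
    static int[] legacyDegrees(int[][] adjacency) {
        int[] deg = new int[adjacency.length];
        for (int i = 0; i < adjacency.length; i++) {
            for (int j = 0; j < adjacency[i].length; j++) {
                if (adjacency[i][j] != 0) deg[i]++;
            }
        }
        return deg;
    }

    // Hypothetical refactored version that must behave identically.
    static int[] degrees(int[][] adjacency) {
        return Arrays.stream(adjacency)
                .mapToInt(row -> (int) Arrays.stream(row).filter(x -> x != 0).count())
                .toArray();
    }

    public static void main(String[] args) {
        // Inputs chosen to cover a normal case and edge cases (no vertices, one vertex).
        int[][][] inputs = { {{0, 1, 1}, {1, 0, 0}, {1, 0, 0}}, {}, {{0}} };
        for (int[][] in : inputs) {
            if (!Arrays.equals(legacyDegrees(in), degrees(in))) {
                throw new AssertionError("behaviour changed for " + Arrays.deepToString(in));
            }
        }
        System.out.println("refactored code matches legacy behaviour");
    }
}
```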
The answer may be: patience & coffee.
This is the way I would do it:
Start using the code, e.g. from within a main method, as if it were being used by the other classes: same arguments, same invocation order. Do that inside a debugger, so you see each step that this class takes.
Start writing unit tests for that functionality. Once you have reached a reasonable coverage, you will start to notice that this class probably has too many responsibilities.
while ( responsibilities != 1 ) {
Extract an interface which expresses one responsibility of that class.
Make all callers use that interface instead of the concrete type;
Extract the implementation to a separate class;
Pass the new class to all callers using the new interface.
}
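One pass through that loop might end up looking like the sketch below. All names are hypothetical: a neighbour-lookup responsibility is pulled out of an imagined legacy graph class into an interface, its implementation moves into its own class, and a caller receives the implementation through the interface.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// The extracted interface: one responsibility of the legacy class.
interface NeighbourLookup {
    List<Integer> neighbours(int vertex);
}

// The implementation, moved into a separate class.
final class AdjacencyListGraph implements NeighbourLookup {
    private final List<List<Integer>> adjacency = new ArrayList<>();

    AdjacencyListGraph(int vertexCount) {
        for (int i = 0; i < vertexCount; i++) {
            adjacency.add(new ArrayList<>());
        }
    }

    /** Adds an undirected edge between u and v. */
    void addEdge(int u, int v) {
        adjacency.get(u).add(v);
        adjacency.get(v).add(u);
    }

    @Override
    public List<Integer> neighbours(int vertex) {
        return Collections.unmodifiableList(adjacency.get(vertex));
    }
}

// A caller that depends only on the interface and is handed the implementation,
// so a test can pass a stub instead of a real graph.
final class DegreeCounter {
    private final NeighbourLookup graph;

    DegreeCounter(NeighbourLookup graph) {
        this.graph = graph;
    }

    int degree(int vertex) {
        return graph.neighbours(vertex).size();
    }
}
```

Because NeighbourLookup has a single method, a test can mock it with a lambda such as v -> Arrays.asList(1, 2) instead of building a real graph.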
I'm not saying that tools like Sonar, FindBugs etc., which others have already mentioned, don't help, but there are no magic tricks. Start from something you do understand, create a unit test for it, and once it runs green, start refactoring piece by piece. Remember to mock dependencies as you go along.
Sometimes it is easier to rewrite something from scratch. Is this 'horrible code' working as intended, or is it full of bugs? Is it documented?
In my current project, deleting my predecessor's work nearly in its entirety and rewriting it from scratch was the most efficient approach. Granted, this was an extreme case of code obfuscation, utter lack of meaningful comments, and utter incompetence, so your mileage may vary.
Though some legacy code might be barely comprehensible, still it can be refactored and improved to legibility in a stepwise fashion. Have you seen Joshua Kerievsky's Refactoring To Patterns book? -- it's good on this.

How can I write code without "needing" comments for readability? [duplicate]

Closed 13 years ago.
Possible Duplicate:
Is it possible to write good and understandable code without any comments?
When coding, I often hear that if comments are needed, it means the code is too hard to understand. I agree that code should be readable, but often the language itself makes the code hard to follow because of "plumbing" and strange syntax. The languages I use most often are:
Java
Mootools
Ruby
Erlang
Any tips would be appreciated.
Thanks
Recommended reading: Clean Code by Robert C. Martin.
In brief, you should
use meaningful variable/method/class names,
keep your functions/methods short,
have each class and method do only one thing,
have the code in each method be on the same level of abstraction.
Don't be afraid to extract even moderately complex expressions from if statements. Which one is clearer to read, this:
if (i >= 0 && (v.size() < u || d == e)) ...
or
if (foundNewLocalMaximum()) ...
(Don't try to find any meaning in the first code snippet, I just made it up :-)
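Here is the same idea with a condition that does mean something. The shipping rule, class and method names are all hypothetical. Both methods compute the same result; only the second lets the if statement read like the requirement, so no comment is needed to explain it.

```java
class Shipping {
    // Hypothetical business rule, invented only to illustrate the refactoring.
    static final double FREE_SHIPPING_THRESHOLD = 50.0;
    static final double STANDARD_FEE = 4.99;

    // Before: the reader has to decode the condition to learn what it is for.
    static double costBefore(double orderTotal, boolean isMember, String country) {
        if (orderTotal >= 50.0 || (isMember && country.equals("US"))) {
            return 0.0;
        }
        return 4.99;
    }

    // After: the same logic, with the condition extracted into a well-named method.
    static double costAfter(double orderTotal, boolean isMember, String country) {
        if (qualifiesForFreeShipping(orderTotal, isMember, country)) {
            return 0.0;
        }
        return STANDARD_FEE;
    }

    private static boolean qualifiesForFreeShipping(double orderTotal, boolean isMember, String country) {
        boolean largeOrder = orderTotal >= FREE_SHIPPING_THRESHOLD;
        boolean domesticMember = isMember && country.equals("US");
        return largeOrder || domesticMember;
    }
}
```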
Comments in clean code are almost never needed. The only exceptions I can think of are when you are using some obscure language feature (e.g. C++ template metaprogramming) or algorithm, in which case you give a reference to the source of the method/algorithm and its implementation details in a comment.
The main reason why any other kind of comments is not very useful in the long run is that code changes, and comments tend to not be updated alongside the changes in the corresponding code. So after a while the comment is not simply useless, but it is misleading: it tells you something (implementation notes, reasoning about design choices, bug fixes etc.) which refers to a version of the code which is long gone, and you have no idea whether it is relevant anymore for the current version of the code.
Another reason why I think that "why I chose this solution" is most often not worth documenting in the code, is that the brief version of such a comment would almost always be like either "because I think this is the best way", or a reference to e.g. "The C++ Programming Language, ch. 5.2.1", and the long version would be a three-page essay. I think that an experienced programmer most often sees and understands why the code is written like this without much explanation, and a beginner may not understand even the explanation itself - it's not worth trying to cover everyone.
Last but not least, IMO unit tests are almost always a better way of documentation than code comments: your unit tests do document your understanding, assumptions and reasoning about the code quite efficiently, moreover you are automatically reminded to keep them in sync with the code whenever you break them (well, provided you actually run them with your build...).
I don't think you can normally write code without comments.
Briefly, the code documents how. The comments document why.
I would expect the comments to indicate the conditions why the code has been written like that, limitations imposed by requirements or externalities, the impact that would result from changing the code, and other gotchas. The comments contain information that isn't contained within the code itself.
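A small sketch of that division of labour. The rounding rule and all names are hypothetical. The Javadoc tells users what the method does, the arithmetic shows how, and the inline comment records why, which is the one piece of information the code itself cannot carry.

```java
class InvoiceRounding {
    /**
     * Rounds an amount in cents to the nearest multiple of 5 cents.
     *
     * @param cents the unrounded amount, in cents
     * @return the rounded amount, in cents
     */
    static long roundToNickel(long cents) {
        // Why (hypothetical requirement): cash payments in this market have no
        // 1- or 2-cent coins, so totals must end in 0 or 5. The code below only
        // shows how we round; without this comment a maintainer could mistake
        // the rounding for a bug and "fix" it.
        return Math.round(cents / 5.0) * 5;
    }
}
```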
Comments along the code are supposed to tell you why you initially did something a certain way. It shouldn't mean the code is too hard to understand.
The most important things to follow are:
give your variables, methods, classes... meaningful names
write classes/modules with a clear responsibility
don't mix up different levels of code (don't do bit shifting and high level logic inside of one method)
I think it is useful to write comments for USERS of your code: what the classes/methods/functions do, when and how to call them, etc. In other words, document the API.
If you need to comment how a method works for the benefit of maintainers then I think the code is probably too complex. In that case refactor it into simpler functions, as others have said.
I personally feel that having no comments at all is about as bad as having excessive commenting. You just need to find the right balance. On using long descriptive names for things, this about sums it up for me: read this. Also read Kernighan & Pike on long names.
You need to follow certain rules.
Give the entities (variable, classes, etc) readable and meaningful names.
Use design patterns extensively and name them accordingly, e.g. if it is a Factory name it FooFactory.
Have the code formatted properly, etc.
