General Purpose Decision Tree Algorithm Code Implementations

General Purpose Decision Tree Algorithm Code Implementations - java

Are there any well-designed, general purpose decision tree implementations for iPhone or Java? I know with LINQ it would be quite trivial, but with Objective C and Java, it would be much more complex.
Basically, I want to drill down a set of objects based off any number of qualifications or attributes in my apps.

You could try Weka. The API is somewhat obtuse and makes simple things complicated, but it's a very good machine learning library and it even comes with a GUI front end if you want to play around with the classes interactively before writing code that uses them programmatically.

Related

Approach to learning algorithms using a specific language

So for the summer I decided that I may as well start learning algorithms before school starts. I've been told that the class is fairly fast paced, and that algorithms isn't something you should take lightly (I have a tendency to do this with all the course work during the semester lol).
The book we're going to use is this Algorithms (4th Edition).
Anyway, this is my problem.
I'm almost third way through the book, but I just realized what I was doing. For example, I would read and re-read the sections I don't quite understand. Then if I feel confident enough, I would try to reproduce the same algorithm in java from my head. But by doing this, my code looks almost exactly like the ones in the book..in java.
I can't say I'm just memorizing code after code--I do understand the concepts and they help me code these algorithms--but I feel like I'll only be able to implement these algorithms in java. I should note that I only know java at the moment.
tldr: I'm learning algorithms as if I'm learning to play the guitar--repetition after repetition. But by doing so I feel like I'm being more fixated that I'll only able to implement these in java. How exactly would you learn algorithms if the book you're using is language-specific?
Thanks in advance.

Don't Confuse Yourself
You're studying Java, so write them in Java. Especially if Java is your first language. Don't confuse yourself for now, as you are trying to learn 2 things at once: how to progam in Java, and how to progam. You're learning both a new language and a way of thinking. Don't do too much but adding another language to the sauce for now.
Diversify
Later on, or if you feel confident enough that you can take on another language simultaneously, then it would obviously be beneficial to learn another one and try to replicate the algorithms without looking at the book.
Reproduce and Extend
What we could recommend you is to look for derivates of the algorithms. Known variants, that have been documented, and where you could just read the description of the variant so you can try to implement it from the "base" version, without needing to read the book.
For instance, if your book introduced you to a linked list, you should be able to come up with the algorithm for a doubly-linked list or a circular linked list without reading more than a description of the desired outcome. Or there's something about the original concepts that you clearly misunderstood.
Try First, Read-On Later
I'd recommend you actually even try to implement the algorithms described in your book before they show them to you. The point of seeing Sedgewick's algorithm is to see a canonical implementation, which is considered a standard blueprint. If you just read the section leading up to the implementation (which hopefully is displayed first), then just sit down with the book, and try to figure out how you could do that. If you can't do that at all, then you're too far ahead in your book and should backtrack and start again from scratch.

Thing about algorithms, they're essentially language-agnostic. There's really nothing stopping you from doing Sedgewick's examples in C, Python or some other language.
If you really don't know any other languages, concentrate on Java. Sure, its a bit repetitious, but those bits will stick in your head in a good way and come test time, you'll be glad for the information.
You're in an interesting position right now, since the kind of thinking required to write programs is very different from normal thinking. Add to that the fact you're learning a whole new language with a different syntax, punctuation and the like. Practice really does make perfect, since there are many bits and pieces to remember.
Oh, if you want practice with algorithms, try out project euler, code kata and other challenge sites. These little challenges can help you familiarize yourself with the language as well as get comfortable with the type of thinking required.

First, congrats on taking your first steps on learning how to code. I would say that you are already ahead of your peers by starting to look ahead during the summer.
As far as your fears on only being able to implement algorithms in Java, you have already demonstrated that it will not be a problem for you. It sounds like you are passionate enough to get started early so you should have no problem implementing a solution in multiple languages. Additionally most of the languages with C/C++ (Java and C# to name a few) like syntax will be similar enough that you will be able to translate your knowledge seamlessly.
The best advice that I can give is to CODE, CODE, CODE!! Don't just read about the algorithms actually implement them.

You don't say how well you know the mathematics behind the algorithms. That will be key in determining your facility with the code.
Sedgewick's books are very good. I'd feel free to pick some and check out other books as well, like "Numerical Recipes" and "Numerical Methods That Work". See if another point of view can clarify for you.
If you don't feel like you're getting enough out of copying Java, see if you can translate them into another language, maybe Python or purely functional alternative. If you can do that, you'll know you've got it.

I would either try to learn another language to verify that you can actually port it to another language (javascript would be my vote because it is simple and useful on the front and backend) or write the algorithms out in pseudocode since that is more language agnostic. Most languages will have the code look pretty similar. The only thing to be very careful about is when you are relying on some aspect of the language (such as generics or iterators in java) which you may not be able to use in another language and that could leave a gap in your understanding.
Another way to verify that you actually understand the algorithm is to make slight changes in the problem and make sure that you can adjust the algorithm to still work. For example if it is a sorting algorithm then try to sort by several different attributes rather than just one, if it is a graph algorithm make the graph a digraph and see how things should change.

I'm learning algorithms as if I'm learning to play the
guitar--repetition after repetition.
Then you are not learning algorithms. You are learning repetition. Two different things. The usage of a programming language by an algorithms book is a secondary factor. It is just a vehicle of instruction, an implementation detail.
What you should be focusing is on understanding the structure, logic and mathematical characteristics of an algorithm (and possibly the data structure(s) associated with it.)
That's what your focus should be.
But by doing so I feel like I'm being more fixated that I'll only able
to implement these in java.
But that is because you are focusing on just how the algorithm is being coded (in Java in this particular case.) You are focusing on an implementation detail.
When you learn to drive, you don't focus on how you learn to drive a Honda Civic or a Nissan Maxima. You learn the essence of what driving is, the rules of thumbs, the necessary precautions and the laws governing driving a vehicle.
Same with learning algorithms. You don't learn "Algorithms in Java" no more than "Algorithms in Haskell". You learn Algorithms first and foremost, the vehicle (sans very specialized cases) is secondary.
You should be focusing on what the algorithm does, how and why. Questions like "how/why does it work?" and most importantly *"what are the performance characteristics?", those are the things you should be focusing on.
Every good algorithms book (Sedgewick's included) carry that message. That's what you should focus on. How you get to that re-focusing, that's a function of one's personal learning strategies.
How exactly would you learn algorithms if the book you're using is language-specific?
By not focusing on the language. Focus on the structure, focus on the data structures involved, the invariants, pre-conditions and post-conditions. Understand asymptotic behavior described in Big-O (or Big-Omicron), Little-O/Little-Omicron and Omega notations.
You are learning algorithms, not programming in Java via coding algorithms.
If you can't do this mental leap, it means you do not have sufficient practice or abstract analysis. It is not an insult, but an observation and an advice. Coding, the usage of a programming language is typically secondary to the mathematical analysis of computing, the focus of Computer Science (of which Algorithms is a part thereof.)
NOTE I've done Java for over 10 years, and though I like it for work, I strongly believe it is a poor tool for learning programming or CS topics.
One is better served by learning Algorithms with either A) a procedural, systems-level programming language like C or Ada, or a high-level pseudo-assembler simulator, or B) a functional language like Lisp or Haskell.
Object-Oriented features in pure/pseudo-pure OO languages simply get in the way.
Algorithms are mathematical structures with a nature descriptive of the how (operationally) and/or the what (mathematically). The former is perfectly suited for procedural programming, the later for functional programming.

Design for defining graphs or flowing structures

I'm trying to create a system for representing and designing graphs in an easy way. That means it should be easy to create some graphical representation from the data structure but it should also be easy to store the structure and do easy calculation on it. Easy calulations in this sence are questions like which nodes are the next nodes from a given node in the graph.
Is there some nice way to define stuff like this in xml or database structures? Later would be easier to edit.
Is there maybe already some good java library abstract enough to support my problems?
I'm trying to define a production process which can also have cycles (these cylces are not so important and could be modeled differently), but it feels kind of weird having to make these fundamental design decisions when this problem is so generic.

JUNG - http://jung.sourceforge.net/, may be a good solution for you. It's pretty extensible and has visualization, graph algorithm support etc

neo4j is the "standard" graph database (see also). you can abstract away from a particular implementation (so that you can change the database without changing you code) using blueprints.
alternatively, if the database part is not so important, a library like jgrapht (i wasn't aware of jung, from chris's answer, but it looks similar) gives you access to the usual algorithms for in-memory structures.
[neo4j licencing]

Is it compulsory to learn about Data Structures if you want to be a Java/C++ programmer? [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 10 years ago.
So do I really need to learn about them ? Isn't there an interesting way to learn about stacks, linked lists, heaps, etc ? I found it a boring subject.
**While posting this question it showed some warning. Am I not allowed to post such a question ? Admins please clarify and I will delete it :/
Warning :: The question you're asking appears subjective and is likely to be closed.
okay..I get it
So what is THE best way to learn them ? What book do I refer ? What website ?

It's compulsory to learn about data structures if you want to be a programmer. Data structures are your bread-and-butter - if you don't understand things like the behavior, uses, and run-time complexity ('big-O') of at least the basic structures (arrays, linked lists, stacks, queues, trees (binary / n-ary, self-balancing varietes), hash-tables, heaps, graphs) and the algorithms that run on them (insert / locate / delete), you won't know which is appropriate to use under what circumstances.
Every trade has its tools; these are ours. Data structures are the most basic underpinnings of almost any algorithm that you're going to learn. Unless you want to be a cargo cult programmer, you need to understand how they work.
Whether or not there are interesting ways to learn about them is a separate question entirely... :)

I would go even so far as to say that most of programming revolves around manipulating data structures, it is the foundation of computing after all: you get some data, you process it, you possibly give output. All the data usually reside in data structures and choosing inappropriate structures will have the bigger impact the bigger the project.

As you get more experience you will find that algorithms and datastructures are invaluable to your day-to-day development, and actually pretty interesting.
By learning about them now you will learn:
Which data structure is appropriate for which context, i.e. when to use a single-linked list, when to use a stack, when to use a queue, when to use a tree
Which algorithms are appropriate for which purpose, such as tree depth-first search or breadth-first search.
Space and time complexity of algorithms, for example why is a quicksort sometimes the best solution, and sometimes heapsort.
Overall it'll teach you the beginnings and fundamentals of computer science, even if you never have to implement a stack again you'll know the kind of thought and consideration that goes into it. If you then ever have to implement your OWN data structure (and chances are you will, quite often), you will know what to do and what not to do.

If you want to be a successful programmer, data structure is a must. How will you program if you don't know data structures and algorithms?

Whether you like it or not, all programming is built around data structures. You may never have to write one, but you will have to choose which one to use many times. It is not really a requirement to programming in general, but if you want to excel in the field, understanding of the basics is a must.
Anyone can build a shed without knowledge of materials or construction techniques. You can even work in a house placing bricks and mortar under the orders of someone else, but if you want to build a house yourself, you do need to understand materials and techniques.
Data structures are the programming materials. Algorithms are techniques. Will you use data structures? You will use the simplest ones in a daily basis, each so often you will need to solve a problem where an specific data structure is required, and while you may manage not to build your own bricks, you will need to understand whether you need bricks or a concrete wall for your purposes.

If you want some evidence for the importance of data structures, take a look at the Google hiring process. Whatever you think about Google as a company, there is no denying that they have some very good people working for them. Their interview process is setup to determine the candidates knowledge of data structures and algorithms. Because when it comes down to it, that is what is at the core of programming, no matter what language you are working in or what domain you are programming for.
If you are planning a career as a professional programmer, you need to know the fundamentals, not just how to crank out code that "works". Otherwise, your just playing.

Is it compulsory to learn about arithmetic to be an engineer?

If you take the attitude of "is it compulsory" with regards to any of the building blocks of programming languages, you're probably not cut out to be a coder. Regardless of "compulsory" or not, you should always be looking for new concepts to learn and seeing if it'll improve your coding style/standard.
But in answer to your question: yes.

I'd say it's compulsory at some point in your development to have a firm grasp. I'm not necessarily sure the standard Data Structures course is the best way to learn. Sometimes, the best way to learn them is "I have problem X. For some reason, it's taking my algorithm a long time to solve X. How can I make this faster?"
One book I'd highly recommend is Programming Pearls. It has some really good analyses, backed by a lot of examples of where the real-world motivations for the solutions came from. It presents the problems in an interesting manner, and never teaches by giving you a laundry list of data structures.

Yes 99% of books on Data Structures are boring and exercises contrived. They feel like they are just making up problems that server no practical purpose :(
This book is the one exception to the rule I've come across. You will have a naive but working RPG game by the end of the book:
Data Structures for Game Programmers
Read the above book and you will solve your chicken and egg problem and see you really can't do much without data structures after all.

Well, this may sound a little awkward but I wouldn't say it is COMPULSORY to learn Data Structures to be a - regular- developer. Seriously! Of courser if you study then hard it will give you a lot of insights and knowledge on several programming aspects and that's always good. But compulsory ... well, I think it is just too much. VERY GOOD would be enough.
Let me explain why. It is not that often, for today, to write Data Structures code because - let's face it - it would be RE-writing, RE-inventing what we already know for so many years! What I would say it is COMPULSORY is to study just the general theory of them and the APIs/libraries that are already commonly in use (and tested and optimized) like the Collections API in Java. You must know by heart the differences between a List and a Set (in Java for instance) and their capabilities and proper usage but you don't need to know exactly HOW they are implemented - checking every private method and attribute - to deal with most common, day to day, coding problems. You will do just fine without all the "guts" of Data Structures for the general stuff. We face different challenges now.
But don't get me wrong - neither think I am crazy or naive! Off course there are situations that you will need to implement yourself some sort of custom data structure (maybe your own BalancedBinaryTreeMap!). You got to be prepared for everything.
I am just arguing about being compulsory or not. Again, I don't think is compulsory but it is indeed very good.
Cheers.

I suppose you could learn programming without learning a whole lot about data structures or algorithms. To make an equivalent example, think of it like if a carpenter knew how to build things, but didn't know about measurements and such. Would he be able to get a career in carpentry? Possibly, but let's say he needed to know the exact material he would need to complete a project. He'd probably get fired because he doesn't know what sort of material or measurements to use.
So with data structures and algorithms, you can say it's the ability to give exact measurements of an application and knowing what sort of performance you'll get out it.

Like a musician learning scales, data structures are part of the tools of the software trade. Sure you can work as a programmer without the knowledge but you're handicapping yourself. If I'm interviewing two people for a position and one of them understands and uses structures and the other can't even explain what a stack is, my choice is pretty clear.
If you want to be judged to be a competent, employable programmer, you need to learn your craft.

Is it compulsory to learn them?
No, you can program without them, just as it's not compulsory to break your code into functions.
That being said, if you want to be an effective programmer that can write at least decent code without getting your car egged by your coworkers, you want to at least be able to make a decent selection of library classes.
Every programmer should understand the tradeoff between a LinkedList and an Array, or why binary searches and binary trees are useful for sorted data. This isn't just about performance - it's about correctness too since you can't just put anything into a tree set.
Does it mean that you need to know how to implement your own AVL tree, build super-smart data structures, etc.? Not necessarily. It's a matter of how much you want to know what's going on "beneath the hood" and whether your tasks necessitate it.
I'm not a big fan of deep data structure and algorithms questions in interviews because the vast majority of developers don't need to implement these things, just to use library stuff. I prefer to ask job related questions in interviews. However, accept that if you don't learn those things you would face a tougher battle to get other jobs.

It should be, yes...

No body should force you to learn anything you don't want to learn.
If you're the type of person that is compelled to be the best that he/she can be at what he/she does, and you love what you do for a living, you'll learn everything there is to know on your own accord.
#happysoul: You should ask yourself WHY learning data structures bore you. Also, it would help if you also identify what DOESN'T bore you.
If you at least love to learn about algorithms, I'm sure we can all suggest a perfect marriage of the two that would be exciting to learn!
My recommendation for best algorithm/data structure combo for the most fun learning experience: graphs.

How easily customizable are SAP industry-specific solutions?

First of all, I have a very superficial knowledge of SAP. According to my understanding, they provide a number of industry specific solutions. The concept seems very interesting and I work on something similar for banking industry. The biggest challenge we face is how to adapt our products for different clients. Many concepts are quite similar across enterprises, but there are always some client-specific requirements that have to be resolved through configuration and customization. Often this requires reimplementing and developing customer specific features.
I wonder how efficient in this sense SAP products are. How much effort has to be spent in order to adapt the product so it satisfies specific customer needs? What are the mechanisms used (configuration, programming etc)? How would this compare to developing custom solution from scratch? Are they capable of leveraging and promoting best practices?

Disclaimer: I'm talking about the ABAP-based part of SAP software only.
Disclaimer 2, ref PATRYs response: HR is quite a bit different from the rest of the SAP/ABAP world. I do feel rather competent as a general-purpose ABAP developer, but HR programming is so far off my personal beacon that I've never even tried to understand what they're doing there. %-|
According to my understanding, they provide a number of industry specific solutions.
They do - but be careful when comparing your own programs to these solutions. For example, IS-H (SAP for Healthcare) started off as an extension of the SD (Sales & Distribution) system, but has become very much more since then. While you could technically use all of the techniques they use for their IS, you really should ask a competent technical consultant before you do - there are an awful lot of pits to avoid.
The concept seems very interesting and I work on something similar for banking industry.
Note that a SAP for Banking IS already exists. See here for the documentation.
The biggest challenge we face is how to adapt our products for different clients.
I'd rather rephrase this as "The biggest challenge is to know where the product is likely to be adapted and to structurally prepare the product for adaption." The adaption techniques are well researched and easily employed once you know where the customer is likely to deviate from your idea of the perfect solution.
How much effort has to be spent in
order to adapt the product so it
satisfies specific customer needs?
That obviously depends on the deviation of the customer's needs from the standard path - but that won't help you. With a SAP-based system, you always have three choices. You can try to customize the system within its limits. Customizing basically means tweaking settings (think configuration tables, tens of thousands of them) and adding stuff (program fragments, forms, ...) in places that are intended to do so. Technology - see below.
Sometimes customizing isn't enough - you can develop things additionally. A very frequent requirement is some additional reporting tool. With the SAP system, you get the entire development environment delivered - the very same tools that all the standard applications were written with. Your programs can peacefully coexist with the standard programs and even use common routines and data. Of course you can really screw things up, but show me a real programming environment where you can't.
The third option is to modify the standard implementations. Modifications are like a really sharp two-edged kitchen knife - you might be able to cook really cool things in half of the time required by others, but you might hurt yourself really badly if you don't know what you're doing. Even if you don't really intend to modify the standard programs, it's very comforting to know that you could and that you have full access to the coding.
(Note that this is about the application programs only - you have no chance whatsoever to tweak the kernel, but fortunately, that's rarely necessary.)
What are the mechanisms used (configuration, programming etc)?
Configurations is mostly about configuration tables with more or less sophisticated dialog applications. For the programming part of customizing, there's the extension framework - see http://help.sap.com/saphelp_nw70ehp1/helpdata/en/35/f9934257a5c86ae10000000a155106/frameset.htm for details. It's basically a controlled version of dependency injection. As a solution developer, you have to anticipate the extension points, define the interface that has to be implemented by the customer code and then embed the call in your code. As a project developer, you have to create an implementation that adheres to the interface and activate it. The basic runtime system takes care of glueing the two programs together, you don't have to worry about that.
How would this compare to developing custom solution from scratch?
IMHO this depends on how much of the solution is the same for all customers and how much of it has to be adapted. It's really hard to be more specific without knowing more about what you want to do.

I can only speak for the Human Resource component, but this is a component where there is a lot of difference between customers, based on a common need.
First, most of the time you set the value for a group, and then associate the object (person, location...) with a group depending on one or two values. This is akin to an indirection, and allow for great flexibility, as you can change the association for a given location without changing the others. in a few case, there is a 3 level indirection...
Second, there is a lot of customization that is nearly programming. Payroll or administrative operations are first class example of this. In the later cas, you get a table with the operation (hiring for example), the event (creation, modification...) a code for the action (I for test, F to call a function, O for a standard operation) and a text field describing the parameters of a function ("C P0001, begda, endda" to create a structure P001 with default values).
Third, you can also use such a table to indicate a function or class (ABAP-OO), that will be dynamically called. You get a developer to create this function or class, and then indicate this in the table. This is a method to replace a functionality by another one, or extend it. This is used extensively in the ESS/MSS.
Last, there is also extension point or file that you can modify. this is nearly the same as the previous one, except that you don't need to indicate the change : the file is always used (ZXPADU01/02 for HR modification of infotype)
hope this help
Guillaume PATRY

What ways are there of drawing 3D trees using Java and OpenGL?

I know how to draw basic objects using JOGL or LWJGL to connect to OpenGL. What I would like is something that can generate some kind of geometry for trees, similar to what SpeedTree is famous for. Obviously I don't expect the same quality as SpeedTree.
I want the trees to not look repetitive. Speed is not a concern, I do not expect to need more than 100 trees on screen at one time.
Are there free tree-drawing libraries available in Java? Or sample code or demos?
Is there anything in other languages which I could port or learn from?

http://arbaro.sourceforge.net/
http://www.propro.ru/go/Wshop/povtree/povtree.html
Non java: http://www.aust-manufaktur.de/austt.html

There are thousands of methods. A better question would define 'best' in a more confined way. Are we talking 'best' as in speed of drawing (suitable for thousands or millions of trees)? Best as in best-looking? etc.

2D or 3D?
In 2D, a common way is to use L-systems.
I also tried an OO approach, defining objects for trunk, branches, leaves, all extending an abstract class and implementing a Genotype interface (to vary the kind of trees).
Not sure if it is efficient (lot of objects created, particularly if I animate the tree) but interesting to do.

Here are a couple resources that may be helpful:
gamedev thread on the subject - contains some useful advice/suggestions
ngPlant - an open-source procedural plant generation tool. It is not written in Java, but you may be able to find ideas in its algorithms.

If you are using eclipse/SWT, try Draw 2D.

If you're serious about getting good-looking, fast trees, there's a commercial C++ library SpeedTree. Lots of big-time games use it (e.g., GTA, Elder Scrolls).

A combination of OpenSceneGraph and SpeedTree has worked for me.

I know of two libraries that enable the usage of OpenGl with java.
LWJGL (Light Weight Java Gaming Library), which imo is the better one due to its simplicity and its similarity to using opengl with c/c++.
JOGL If you want to mix swing components with opengl this may be the better choice, I've never used it but several years ago it was known to be pretty buggy, I don't know if its matured since then.
As for drawing trees, there are many ways to do so like the other poster said, you might want to be more specific.
edit: I guess I misunderstood the question a bit, oh well : / You can load in a 3d model of a tree and display that.

http://www.codeplex.com/LTrees has some source code on that. it's c++ though.

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.