I’m currently learning the Java Collections API and feel I have a good understanding of the basics, but I’ve never understood why this standard API doesn’t include a Graph implementation. The three base interfaces (List, Set, and Map) are easy to understand, and their implementations in the API are mostly straightforward and consistent.
Considering how often graphs come up as a potential way to model a given problem, this just doesn’t make sense to me (it’s possible it does exist in the API and I’m not looking in the right place of course). Steve Yegge suggests in one of his blog posts that a programmer should consider graphs first when attacking a problem, and if the problem domain doesn’t fit naturally into this data structure, only then consider the alternative structures.
My first guess is that there is no universal way to represent graphs, or that their interfaces may not be generic enough for an API implementation to be useful? But if you strip down a graph to its basic components (vertices and a set of edges that connect some or all of the vertices) and consider the ways that graphs are commonly constructed (methods like addVertex(v) and insertEdge(v1, v2)) it seems that a generic Graph implementation would be possible and useful.
Thanks for helping me understand this better.
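For illustration, the stripped-down graph described in the question could be sketched roughly as follows. This is only a minimal sketch: `SimpleGraph`, `vertices` and `neighbors` are hypothetical names, nothing like this exists in `java.util`, and only `addVertex` and `insertEdge` come from the question itself.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: an undirected, unweighted graph backed by adjacency sets.
class SimpleGraph<V> {
    private final Map<V, Set<V>> adjacency = new LinkedHashMap<>();

    // Adds a vertex; returns false if it was already present.
    boolean addVertex(V v) {
        if (adjacency.containsKey(v)) {
            return false;
        }
        adjacency.put(v, new LinkedHashSet<>());
        return true;
    }

    // Connects two vertices in both directions, adding them first if necessary.
    void insertEdge(V v1, V v2) {
        addVertex(v1);
        addVertex(v2);
        adjacency.get(v1).add(v2);
        adjacency.get(v2).add(v1);
    }

    // Read-only view of all vertices.
    Set<V> vertices() {
        return Collections.unmodifiableSet(adjacency.keySet());
    }

    // Read-only view of the neighbors of v; empty if v is unknown.
    Set<V> neighbors(V v) {
        Set<V> n = adjacency.get(v);
        return n == null ? Collections.emptySet() : Collections.unmodifiableSet(n);
    }

    public static void main(String[] args) {
        SimpleGraph<String> g = new SimpleGraph<>();
        g.insertEdge("a", "b");
        g.insertEdge("a", "c");
        System.out.println(g.neighbors("a")); // prints [b, c]
    }
}
```

Even this small sketch already fixes several choices (undirected, unweighted, no parallel edges, no edge objects) that other uses of graphs would need to make differently.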
Note that some special kinds of graphs are included in the Collections Framework, notably linked lists and trees (e.g. LinkedList, TreeSet, TreeMap).
This also points to a possible reason why no general Graph implementation is present: as graphs can have so many different forms and flavours with wildly different characteristics, a general Graph might not turn out to be very useful.
Also, at least in my practice so far, I haven't felt the need for graphs most of the time. Some domains surely do need them, but many simply don't. (Out of more than a dozen projects in various domains I have been involved in so far, I recall two which actually needed graphs.) So I guess there was no really big pressure from the Java community in general to have a Graph in the Collections Framework. It contains only the basic stuff, which is needed "almost always", by "almost everyone". And one of its strengths is indeed its (relative) simplicity and clarity, which, I believe, its designers see as an asset to be preserved.
I'm trying to create a system for representing and designing graphs in an easy way. That means it should be easy to create a graphical representation from the data structure, but it should also be easy to store the structure and to do simple calculations on it. Simple calculations in this sense are questions like: which nodes are the successors of a given node in the graph?
Is there some nice way to define something like this in XML or in database structures? The latter would be easier to edit.
Is there perhaps already a good Java library that is abstract enough to cover my needs?
I'm trying to define a production process that can also have cycles (these cycles are not so important and could be modeled differently), but it feels kind of weird to have to make these fundamental design decisions when the problem is so generic.
JUNG (http://jung.sourceforge.net/) may be a good solution for you. It's pretty extensible and has visualization, graph algorithm support, etc.
Neo4j is the "standard" graph database. You can abstract away from a particular implementation (so that you can change the database without changing your code) using Blueprints.
Alternatively, if the database part is not so important, a library like JGraphT (I wasn't aware of JUNG, from Chris's answer, but it looks similar) gives you access to the usual algorithms for in-memory structures.
In Bloch’s presentation, he said that an API designer should look for a good power-to-weight ratio. He also stressed that ‘Conceptual weight more important than bulk’. I guess ‘weight’ refers to conceptual weight, and ‘bulk’ to the number of methods in a class.
But I couldn’t understand what ‘conceptual weight’ is, or what ‘power-to-weight ratio’ means. Any explanation is welcome!
Bloch gave an example: List.subList() has a good 'power-to-weight ratio'. If a client wants the index of an element within a sub-range of a list, there is no need for a separate, low 'p2w ratio' method like indexOfSubList(a, b, e); the client can simply call List.subList(a, b).indexOf(e). Bloch considered this a good 'power-to-weight ratio'.
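A small sketch of the subList() example, using only the standard `java.util.List` API. The helper name `indexInRange` and the sample data are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;

class SubListDemo {
    // Index of e within the range [from, to) of the list, counted from 'from',
    // or -1 if e does not occur in that range. No dedicated range method is
    // needed: subList() returns a view, and every List operation works on it.
    static <T> int indexInRange(List<T> list, int from, int to, T e) {
        return list.subList(from, to).indexOf(e);
    }

    public static void main(String[] args) {
        List<String> letters = Arrays.asList("a", "b", "c", "d", "e");
        System.out.println(indexInRange(letters, 1, 4, "c")); // prints 1
        System.out.println(indexInRange(letters, 1, 4, "e")); // prints -1
    }
}
```

The one subList() method gives every existing List operation a range variant, which is the sense in which it adds a lot of power for very little conceptual weight.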
Origin:
API Should Be As Small As Possible But No Smaller
API should satisfy its requirements
When in doubt leave it out
Functionality, classes, methods, parameters, etc.
You can always add, but you can never remove
Conceptual weight more important than bulk
Look for a good power-to-weight ratio
I'd interpret "conceptual weight" as the number of abstract concepts you have to learn and understand to use the API. Concepts usually map to public classes, while classes that are not public add to the bulk but not to the conceptual weight.
So if you put it technically, an API has a high conceptual weight if a typical client of the API has to explicitly use a lot of classes belonging to the API to work with it.
"good power-to-weight ratio" then means the API should use as few public classes as possible to offer as much functionality as possible. That means an API should:
Not add concepts or abstractions of its own that are not present in the domain
For complex domains, offer shortcuts to the most commonly needed functionality that allow a typical user to bypass the more complex parts of the domain
I'd say that
power = the amount of functionality provided by the API
weight = the effort required to learn the API
I guess that an API with a good power-to-weight ratio is an API that offers a lot of functionality (power) while requiring little effort (weight) to work with it properly.
This could be done via, for example, "Convention over Configuration". (Note this is just an example, and you can achieve this in many ways.)
A link to Bloch's presentation would be helpful; he might be referring to something else :-)
He's referring to all the stuff you get from using an API. His example is the Collections API, where each time you access it you only get specific functionality. On the other hand, some APIs will load much more stuff just to give you some functionality.
Another essential aspect of the power-to-weight ratio of an API or even a language as a whole is its verbosity. That is, how much you have to type to get a task done and the readability of the resulting code. In Java, Iterator has a better power-to-weight ratio than Enumeration, the interface it was designed to replace, simply because Iterator has a shorter name and shorter method names, which are no longer than they need to be and do the same job with no loss of clarity (as well as the additional remove method).
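To make the comparison concrete, here is a short sketch that walks a collection once with each interface. It uses only standard `java.util` types; the class and method names (`IterationDemo`, `countWithEnumeration`, `removeEmpty`) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Enumeration;
import java.util.Iterator;
import java.util.List;
import java.util.Vector;

class IterationDemo {
    // Enumeration: longer method names, and no way to remove during traversal.
    static int countWithEnumeration(Vector<String> items) {
        int count = 0;
        Enumeration<String> e = items.elements();
        while (e.hasMoreElements()) {
            e.nextElement();
            count++;
        }
        return count;
    }

    // Iterator: shorter names, plus remove() for the element just returned.
    static void removeEmpty(List<String> items) {
        Iterator<String> it = items.iterator();
        while (it.hasNext()) {
            if (it.next().isEmpty()) {
                it.remove();
            }
        }
    }

    public static void main(String[] args) {
        Vector<String> v = new Vector<>(Arrays.asList("x", "", "y"));
        System.out.println(countWithEnumeration(v)); // prints 3
        List<String> list = new ArrayList<>(v);
        removeEmpty(list);
        System.out.println(list); // prints [x, y]
    }
}
```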
When writing code I often see requirements to change data models (e.g. adding, changing, or removing data members of a class). When these data models are part of an interface, it seems difficult to change them without breaking existing client code. So I am wondering whether there is any best practice for designing interfaces and data models in a way that minimizes the impact of such evolution.
The closest thing I could find on Google is data contract versioning, but that seems to be a .NET-specific topic. I am wondering whether the same practice applies to the Java world, or whether there is a different or more generic way to deal with data model evolution.
Thanks
There are some tools which can help; have a look at Liquibase.
This article on developerWorks gives a good overview.
There are no easy answers to this in either the Java or data modeling domains.
Some changes are upwards compatible; e.g. addition of new methods, optional fields, subclasses and so on.
Some changes are not compatible, but can be handled using a simple transformation; e.g. addition of a mandatory field could be supported by a transformation that adds an extra constructor argument.
Some changes unavoidably require major programmer intervention.
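As a rough illustration of the first two kinds of change, here is a minimal sketch in which a mandatory field is added to a class while the old constructor is kept. The `Customer` class, its fields, and the "unknown" default are all hypothetical and only serve the example.

```java
import java.util.Objects;

class Customer {
    // Hypothetical default used when old client code does not supply the new field.
    static final String DEFAULT_COUNTRY = "unknown";

    private final String name;
    private final String country; // mandatory field added in a later version

    // Original constructor, kept so that existing client code still compiles.
    Customer(String name) {
        this(name, DEFAULT_COUNTRY);
    }

    // New constructor with the extra argument for the added field.
    Customer(String name, String country) {
        this.name = Objects.requireNonNull(name, "name");
        this.country = Objects.requireNonNull(country, "country");
    }

    String getName() {
        return name;
    }

    // New accessor: adding a method is an upward-compatible change for callers.
    String getCountry() {
        return country;
    }

    public static void main(String[] args) {
        Customer legacy = new Customer("Ada");
        Customer current = new Customer("Bob", "NZ");
        System.out.println(legacy.getName() + " / " + legacy.getCountry());   // prints Ada / unknown
        System.out.println(current.getName() + " / " + current.getCountry()); // prints Bob / NZ
    }
}
```

Whether a default like this is acceptable depends on the domain; where no sensible default exists, the change falls into the third category and client code has to be touched.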
Another point to note is that the problem gets a lot harder when the data corresponding to the data models is persistent, and cannot be thrown away when the data model changes. This is referred to as the "schema evolution" problem, and I believe that it has been proven that there is no general solution.
For an audit log, I need to know the differences between two objects.
Those objects may contain other objects, lists, or sets of objects, so the comparison may need to be recursive.
Is there already an API for that, using reflection or some other approach?
Thanks in advance.
Regards
It's a pretty daunting problem to try to solve generically. You might consider pairing a Visitor pattern, which allows you to add functionality to a graph of objects, with a Chain of Responsibility pattern, which allows you to split the responsibility for executing a task across multiple objects and then dynamically route requests to the right handler.
If you did this, you would be able to generate simple, specific differentiation logic on a per-type basis without having a single, massive class that handles all of your differentiation tasks. It would also be easy to add handlers to the tree.
The best part is that you can still have a link in your Chain of Responsibility for "flat" objects (objects that are not collections and basically only have properties), which is where reflection would help you the most anyway. If your "catch-all" case uses simple reflection-based comparison and your "special" cases handle things like lists, dictionaries, and sets, then you will have a flexible, maintainable, inexpensive solution.
For more info:
http://www.netobjectives.com/PatternRepository/index.php?title=TheChainOfResponsibilityPattern
http://www.netobjectives.com/PatternRepository/index.php?title=TheVisitorPattern
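A rough sketch of the handler chain described in this answer might look like the following. It covers only the Chain of Responsibility part, with one "special" handler for lists and a reflection-based "catch-all" for flat objects; all names (`ObjectDiffSketch`, `DiffHandler`, `Person`, `Address`) are hypothetical, and treating every class in a java.* package as a plain value is a simplifying assumption.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

class ObjectDiffSketch {
    // One link in the chain: it either accepts a pair of values or declines.
    interface DiffHandler {
        boolean canHandle(Object a, Object b);
        void diff(String path, Object a, Object b, List<String> out);
    }

    // "Special" case: walk two lists index by index.
    static final DiffHandler LISTS = new DiffHandler() {
        public boolean canHandle(Object a, Object b) {
            return a instanceof List && b instanceof List;
        }
        public void diff(String path, Object a, Object b, List<String> out) {
            List<?> la = (List<?>) a;
            List<?> lb = (List<?>) b;
            for (int i = 0; i < Math.max(la.size(), lb.size()); i++) {
                Object x = i < la.size() ? la.get(i) : null;
                Object y = i < lb.size() ? lb.get(i) : null;
                compare(path + "[" + i + "]", x, y, out);
            }
        }
    };

    // "Catch-all" case: reflect over the declared fields of two objects of the
    // same class. Assumption: anything in a java.* package is a plain value.
    static final DiffHandler FLAT = new DiffHandler() {
        public boolean canHandle(Object a, Object b) {
            return a != null && b != null && a.getClass() == b.getClass()
                    && !a.getClass().getName().startsWith("java.");
        }
        public void diff(String path, Object a, Object b, List<String> out) {
            for (Field f : a.getClass().getDeclaredFields()) {
                if (Modifier.isStatic(f.getModifiers())) continue;
                f.setAccessible(true);
                try {
                    compare(path + "." + f.getName(), f.get(a), f.get(b), out);
                } catch (IllegalAccessException e) {
                    throw new IllegalStateException(e);
                }
            }
        }
    };

    static final List<DiffHandler> CHAIN = Arrays.asList(LISTS, FLAT);

    // Routes a pair to the first handler that accepts it.
    // Limitations: no cycle detection, and superclass fields are ignored.
    static void compare(String path, Object a, Object b, List<String> out) {
        if (Objects.equals(a, b)) return;
        for (DiffHandler h : CHAIN) {
            if (h.canHandle(a, b)) {
                h.diff(path, a, b, out);
                return;
            }
        }
        out.add(path + ": " + a + " -> " + b); // no handler: report the values
    }

    static class Address {
        String city;
        Address(String city) { this.city = city; }
    }

    static class Person {
        String name;
        Address address;
        List<String> tags;
        Person(String name, Address address, List<String> tags) {
            this.name = name;
            this.address = address;
            this.tags = tags;
        }
    }

    public static void main(String[] args) {
        Person before = new Person("Ann", new Address("Oslo"), Arrays.asList("x", "y"));
        Person after = new Person("Ann", new Address("Rome"), Arrays.asList("x", "z"));
        List<String> changes = new ArrayList<>();
        compare("person", before, after, changes);
        changes.forEach(System.out::println);
    }
}
```

Handlers for sets and maps would be further links in `CHAIN`, placed before the catch-all. A real audit log would also need cycle detection and a policy for inherited fields.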
I have written a framework that does exactly what you were looking for. It generates a graph from any kind of object, no matter how deeply nested it is, and allows you to traverse the changes with visitors. I have already done things like change log generation, automatic merging, and change visualization with it, and so far it hasn't let me down.
I guess I'm a few years too late to help in your specific case, but for the sake of completeness, here's the link to the project: https://github.com/SQiShER/java-object-diff
First of all, I have a very superficial knowledge of SAP. According to my understanding, they provide a number of industry specific solutions. The concept seems very interesting and I work on something similar for banking industry. The biggest challenge we face is how to adapt our products for different clients. Many concepts are quite similar across enterprises, but there are always some client-specific requirements that have to be resolved through configuration and customization. Often this requires reimplementing and developing customer specific features.
I wonder how efficient in this sense SAP products are. How much effort has to be spent in order to adapt the product so it satisfies specific customer needs? What are the mechanisms used (configuration, programming etc)? How would this compare to developing custom solution from scratch? Are they capable of leveraging and promoting best practices?
Disclaimer: I'm talking about the ABAP-based part of SAP software only.
Disclaimer 2, re PATRY's response: HR is quite a bit different from the rest of the SAP/ABAP world. I do feel rather competent as a general-purpose ABAP developer, but HR programming is so far off my personal beacon that I've never even tried to understand what they're doing there. %-|
According to my understanding, they provide a number of industry specific solutions.
They do - but be careful when comparing your own programs to these solutions. For example, IS-H (SAP for Healthcare) started off as an extension of the SD (Sales & Distribution) system, but has become very much more since then. While you could technically use all of the techniques they use for their IS, you really should ask a competent technical consultant before you do - there are an awful lot of pits to avoid.
The concept seems very interesting and I work on something similar for banking industry.
Note that a SAP for Banking IS already exists. See here for the documentation.
The biggest challenge we face is how to adapt our products for different clients.
I'd rather rephrase this as "The biggest challenge is to know where the product is likely to be adapted and to structurally prepare the product for adaption." The adaption techniques are well researched and easily employed once you know where the customer is likely to deviate from your idea of the perfect solution.
How much effort has to be spent in order to adapt the product so it satisfies specific customer needs?
That obviously depends on how far the customer's needs deviate from the standard path - but that won't help you. With a SAP-based system, you always have three choices. You can try to customize the system within its limits. Customizing basically means tweaking settings (think configuration tables, tens of thousands of them) and adding stuff (program fragments, forms, ...) in places that are intended for it. Technology - see below.
Sometimes customizing isn't enough - you can develop things additionally. A very frequent requirement is some additional reporting tool. With the SAP system, you get the entire development environment delivered - the very same tools that all the standard applications were written with. Your programs can peacefully coexist with the standard programs and even use common routines and data. Of course you can really screw things up, but show me a real programming environment where you can't.
The third option is to modify the standard implementations. Modifications are like a really sharp two-edged kitchen knife - you might be able to cook really cool things in half of the time required by others, but you might hurt yourself really badly if you don't know what you're doing. Even if you don't really intend to modify the standard programs, it's very comforting to know that you could and that you have full access to the coding.
(Note that this is about the application programs only - you have no chance whatsoever to tweak the kernel, but fortunately, that's rarely necessary.)
What are the mechanisms used (configuration, programming etc)?
Configuration is mostly about configuration tables with more or less sophisticated dialog applications. For the programming part of customizing, there's the extension framework - see http://help.sap.com/saphelp_nw70ehp1/helpdata/en/35/f9934257a5c86ae10000000a155106/frameset.htm for details. It's basically a controlled version of dependency injection. As a solution developer, you have to anticipate the extension points, define the interface that has to be implemented by the customer code, and then embed the call in your code. As a project developer, you have to create an implementation that adheres to the interface and activate it. The basic runtime system takes care of gluing the two programs together; you don't have to worry about that.
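Since the question comes from outside the ABAP world, here is a rough Java analogy of that extension-point mechanism. It is not how the SAP framework is implemented; `ExtensionRegistry`, `InvoiceCheck`, and `postInvoice` are hypothetical names, and the registry merely stands in for the runtime that connects the standard code to the activated customer implementation.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

class ExtensionPointDemo {
    // Defined by the solution developer: the contract customer code must fulfil.
    interface InvoiceCheck {
        // Returns null if the invoice is acceptable, otherwise a reason.
        String check(double amount);
    }

    // Stand-in for the runtime that glues standard and customer code together.
    static final class ExtensionRegistry {
        private final Map<Class<?>, Object> active = new HashMap<>();

        // Called on the project side to activate an implementation.
        <T> void activate(Class<T> point, T implementation) {
            active.put(point, implementation);
        }

        // Called by the standard code at the extension point.
        <T> Optional<T> lookup(Class<T> point) {
            return Optional.ofNullable(point.cast(active.get(point)));
        }
    }

    // Standard code with the call embedded at the anticipated extension point.
    static String postInvoice(ExtensionRegistry registry, double amount) {
        Optional<InvoiceCheck> extension = registry.lookup(InvoiceCheck.class);
        if (extension.isPresent()) {
            String problem = extension.get().check(amount);
            if (problem != null) {
                return "rejected: " + problem;
            }
        }
        return "posted";
    }

    public static void main(String[] args) {
        ExtensionRegistry registry = new ExtensionRegistry();
        System.out.println(postInvoice(registry, 5000)); // prints posted

        // Project developer: implement the interface and activate it.
        registry.activate(InvoiceCheck.class,
                amount -> amount > 1000 ? "amount above limit" : null);
        System.out.println(postInvoice(registry, 5000)); // prints rejected: amount above limit
        System.out.println(postInvoice(registry, 10));   // prints posted
    }
}
```

The standard code never references the customer class directly; it only knows the interface and asks the registry, which is what makes the injection "controlled".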
How would this compare to developing custom solution from scratch?
IMHO this depends on how much of the solution is the same for all customers and how much of it has to be adapted. It's really hard to be more specific without knowing more about what you want to do.
I can only speak for the Human Resource component, but this is a component where there is a lot of difference between customers, based on a common need.
First, most of the time you set the value for a group, and then associate the object (person, location...) with a group depending on one or two values. This is akin to an indirection, and allows for great flexibility, as you can change the association for a given location without changing the others. In a few cases, there is a three-level indirection.
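Expressed as a Java analogy (not actual SAP customizing), the two-level indirection could be sketched like this. The class name, the group names, and the "notice period" value are all made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

class GroupIndirectionDemo {
    // Level 1: each object (here a location) is assigned to a group.
    private final Map<String, String> groupOfLocation = new HashMap<>();
    // Level 2: the configured value is stored once per group.
    private final Map<String, Integer> noticePeriodDaysOfGroup = new HashMap<>();

    void assign(String location, String group) {
        groupOfLocation.put(location, group);
    }

    void configure(String group, int days) {
        noticePeriodDaysOfGroup.put(group, days);
    }

    // Resolves the value for a location by going through its group.
    int noticePeriodDays(String location) {
        String group = groupOfLocation.get(location);
        Integer days = noticePeriodDaysOfGroup.get(group);
        if (days == null) {
            throw new IllegalArgumentException("no value configured for " + location);
        }
        return days;
    }

    public static void main(String[] args) {
        GroupIndirectionDemo config = new GroupIndirectionDemo();
        config.assign("Paris", "FR");
        config.assign("Lyon", "FR");
        config.configure("FR", 30);
        System.out.println(config.noticePeriodDays("Lyon")); // prints 30

        // Re-associate one location without touching the others.
        config.configure("FR-SOUTH", 45);
        config.assign("Lyon", "FR-SOUTH");
        System.out.println(config.noticePeriodDays("Lyon"));  // prints 45
        System.out.println(config.noticePeriodDays("Paris")); // prints 30
    }
}
```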
Second, there is a lot of customization that is nearly programming. Payroll and administrative operations are prime examples of this. In the latter case, you get a table with the operation (hiring, for example), the event (creation, modification...), a code for the action (I for test, F to call a function, O for a standard operation), and a text field describing the parameters of a function ("C P0001, begda, endda" to create a structure P0001 with default values).
Third, you can also use such a table to indicate a function or class (ABAP-OO), that will be dynamically called. You get a developer to create this function or class, and then indicate this in the table. This is a method to replace a functionality by another one, or extend it. This is used extensively in the ESS/MSS.
Last, there are also extension points or files that you can modify. This is nearly the same as the previous one, except that you don't need to indicate the change: the file is always used (ZXPADU01/02 for HR modification of infotypes).
Hope this helps
Guillaume PATRY