I have some doubts while comparing C++ and Java multiple inheritance.
Even Java uses multiple, multi-level inheritance through interfaces - but why doesn't it need anything like C++'s virtual base classes? Is it because the fields of a Java interface are guaranteed a single copy in memory (they are public static final), and the methods are only declared, not defined?
Apart from saving memory, is there any other use of virtual base classes in C++? Are there any caveats if I forget to use this feature in my multiple-inheritance programs?
This one is a bit philosophical - but why didn't the C++ designers make every base class virtual by default? What was the need for providing flexibility?
Examples will be appreciated. Thanks !!
1) Java interfaces don't have attributes. One reason for virtual base classes in C++ is to prevent duplicate attributes, and all the difficulties associated with that (see the sketch below).
2) There is at least a slight performance penalty for using virtual base classes in C++. Also, the constructors become complicated enough that it is often advised that virtual base classes have only no-argument constructors.
3) Exactly because of the C++ philosophy: you should not pay a penalty for something you may not need.
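To make 1) concrete, here is a minimal C++ sketch of the diamond that virtual bases resolve (class names invented for illustration):

#include <iostream>

struct Animal { int age = 0; };
struct Mammal : virtual Animal { };
struct Bird   : virtual Animal { };
struct Bat    : Mammal, Bird { };

int main() {
    Bat bat;
    bat.age = 3;                  // unambiguous: exactly one shared Animal subobject
    std::cout << bat.age << "\n"; // prints 3
    // Had Mammal and Bird inherited non-virtually, Bat would contain two
    // Animal subobjects and "bat.age" would be an ambiguity error.
}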
Sorry - not a Java programmer, so short on details. Still, virtual bases are a refinement of multiple inheritance, whose omission Java's designers always defended on the basis that multiple inheritance is overly complicated and arguably error-prone.
Virtual bases aren't just for saving memory - the data is shared by the different objects inheriting from them, so those derived types can use it to coordinate their behaviour in some way. They're not useful all that often, but as an example: object identifiers, where you want one id per most-derived object rather than one per subobject (sketched after this answer). Another example: ensuring that a multiply-derived type can unambiguously be converted to a pointer-to-base, keeping it easy to use in functions operating on the base type, or to store in containers of Base*.
As C++ is currently standardised, a type deriving from two classes can typically expect them to operate independently, just as objects of those types do when created separately on the stack or heap. If everything were virtual, that independence would suddenly depend heavily on the types from which they happen to be derived - all sorts of interactions become the default, and derivation itself becomes less useful. So, to your question of why virtual isn't the default: because it's the less intuitive, more dangerous and error-prone of the two modes.
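A hedged sketch of the object-identifier example (the counter scheme here is invented for illustration; the point is that the virtual base, and hence the id, exists exactly once per most-derived object):

#include <iostream>

struct Identified {
    static int next_id;
    int id = next_id++;   // runs once per most-derived object, because the
                          // virtual base is constructed exactly once
};
int Identified::next_id = 0;

struct Persistable : virtual Identified { };
struct Printable   : virtual Identified { };
struct Document    : Persistable, Printable { };

int main() {
    Document d;
    std::cout << static_cast<Persistable&>(d).id << " == "
              << static_cast<Printable&>(d).id << "\n";   // same shared id
}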
1. Java's multiple inheritance of interfaces behaves most like virtual inheritance in C++.
More precisely, to implement a Java-like inheritance model in C++ you need to use C++ virtual base classes.
However, one of the disadvantages of C++ virtual inheritance (besides a small memory and performance penalty) is that it is impossible to static_cast<> from base to derived, so RTTI (dynamic_cast) needs to be used instead (or one may provide "hand-made" virtual casting functions for child classes, if the list of such child classes is known in advance). See the sketch after this answer.
2. If you forget the "virtual" qualifier in the inheritance list, it usually leads to a compiler error, since any cast from the derived class to the duplicated base class becomes ambiguous.
3. Philosophical questions are usually quite difficult to answer... C++ is a multi-paradigm (and multi-philosophical) language and doesn't impose philosophical decisions on you. You may use virtual inheritance whenever possible in your own projects, and (you are right) there is good reason to. But such a maxim may be unacceptable to others, so universal C++ tools (the standard library and other widely used libraries) should be, if possible, free of any particular philosophical conventions.
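Here is the sketch promised in point 1 (class names invented); the commented-out line is the one the compiler rejects:

#include <iostream>

struct Base    { virtual ~Base() = default; };
struct Derived : virtual Base { };

int main() {
    Derived d;
    Base* b = &d;
    // Derived* bad = static_cast<Derived*>(b);   // ill-formed: cannot
    //                                            // static_cast from a virtual base
    Derived* ok = dynamic_cast<Derived*>(b);      // RTTI does the work at run time
    std::cout << (ok == &d) << "\n";              // prints 1
}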
I'm working on an open source project which basically is translating a large C++ library to Java. The object model of the original creature in C++ can be pretty complicated sometimes. More than necessary, I'd say... which was more or less the motto of Java designers... well... this is another subject.
The point is that I've written an article which shows how you can circumvent type erasure in Java. The article explains how it can be done and, in the end, how your source code can resemble C++ very closely.
http://www.jquantlib.org/index.php/Using_TypeTokens_to_retrieve_generic_parameters
An immediate implication of the study I've done is that it would be possible to implement virtual base classes in your application, I mean: not in Java, not in the language, but in your application, via some tricks, or a lot of tricks to be more precise.
In case you are interested in this kind of black magic, the lines below may be useful to you somehow. Otherwise, certainly not.
Ok. Let's go ahead.
There are several difficulties in Java:
1. Type erasure (solved in the article)
2. javac was not designed to understand what a virtual base class would be;
3. Even using tricks you will not be able to circumvent difficulty #2, because this difficulty appears at compilation time.
If you'd like to use virtual base classes, you can have them with Scala, which basically solved difficulty #2 by creating another compiler, one which fully understands some more sophisticated object models, I'd say.
If you'd like to explore my article and try to "circumvent" virtual base classes in pure Java (not Scala), you could do something like I explain below.
Suppose that you have something like this in C++:
template <typename Base>
class Extended : public Base { ... };
It could be translated to something like this in Java:
public interface Virtual<T> { ... }
public class Extended<B> implements Virtual<B> { ... }
OK. What happens when you instantiate Extended like below?
Extended<Base> extended = new Extended<Base>() { /* required anonymous block here */ };
Well... basically you will be able to get rid of type erasure and obtain the type information of Base inside your class Extended. See my article for a comprehensive explanation of the black magic.
OK. Once you have the type of Base inside Extended, you can instantiate a concrete implementation of Virtual.
Notice that, at compile time, javac can verify types for you, like in the example below:
public interface Virtual<Base> {
    public List<Base> getList();
}

public class Extended<Base> implements Virtual<Base> {
    @Override
    public List<Base> getList() {
        // TODO Auto-generated method stub
        return null;
    }
}
Well... despite all the effort to implement it, in the end we are doing badly what an excellent compiler like scalac does much better than us - in particular, it does its job at compile time.
I hope it helps... if it hasn't confused you already!
In golang, interfaces are extremely important for decoupling and composing code, and thus an advanced Go program might easily define thousands of interfaces.
How do we evolve these interfaces over time, to ensure that they remain minimal?
Are there commonly used Go tools which check for unused functions?
Are there best practices for annotating Go functions with something similar to Java's @Override, which ensures that a declared function properly implements an expected contract?
Typically in the Java language, it is easy to keep code tightly bound to an interface specification, because the advanced tooling allows us to find and remove functions which aren't referenced at all (usually this is highlighted automatically for you in any common IDE).
Are there commonly used Go tools which check for unused functions?
Sort of, but it is really hard to be sure for exported interfaces. oracle can be used to find references to types or methods, but only if you have all of the code that references you available on your gopath.
Can you ensure a type implements a contract?
If you attempt to use a type as an interface, the compiler will complain if it does not have all of the methods. I generally do this by exporting interfaces but not implementations, and making a constructor:
type MyInterface interface {
    Foo()
}

type impl struct{}

func (i *impl) Foo() {}

func NewImpl() MyInterface {
    return &impl{}
}
This will not compile if impl does not implement all of the required functions.
In go, it is not needed to declare that you implement an interface. This allows you to implement an interface without even referencing the package it is defined in. This is pretty much exactly the opposite of "tightly binding to an interface specification", but it does allow for some interesting usage patterns.
What you're asking for isn't really a part of Go. There are no best practices for annotating that a function satisfies an interface. I would personally say the only clear best practice is to document which interfaces your types implement so that people can know. If you want to test explicitly (at compile time) whether a type implements an interface, you can do so using assignment; check out my answer on the topic here: How to check if an object has a particular method?
If you're just looking to take inventory of your code base to do some cleanup, I would recommend using that assignment method for all your types to generate compile-time errors about what they don't implement, then scale down the declarations until it compiles. In doing so you should become aware of the disparity between what might be implemented and what actually is.
Go is also lacking in IDE options. As a result some of those friendly features like "find all references" aren't there. You can use text searching tricks to get around this, like searching func TheName to get only the declaration and .TheName( to get all invocations. I'm sure you'll get used to it pretty quickly if you continue to use this tooling.
What is the function of the class Object in Java? All the "objects" of any user-defined class have the same function as the aforementioned class. So why did the creators of Java create this class?
In which situations should one use the class 'Object'?
Since all classes in Java are obligated to derive (directly or indirectly) from Object, it allows for a default implementation for a number of behaviours that are needed or useful for all objects (e.g. conversion to a string, or a hash generation function).
Furthermore, having all objects in the system with a common lineage allows one to work with objects in a general sense. This is very useful for developing all sorts of general applications and utilities. For example, you can build a general purpose cache utility that works with any possible object, without requiring users to implement a special interface.
Pretty much the only time that Object is used raw is when it's used as a lock object (as in Object foo = new Object(); synchronized(foo){...}). The ability to use an object as the subject of a synchronized block is built in to Object, and there's no point in using anything more heavyweight there.
Object provides an interface with functionality that the Java language designers felt all Java objects should provide. You can use Object when you don't know the subtype of a class, and just want to treat it in a generic manner. This was especially important before the Java language had generics support.
There's an interesting post on programmers.stackexchange.com about why this choice was made for .NET, and those decisions most likely hold relevance for the Java language.
What Java implements is sometimes called a "cosmic hierarchy". It means that all classes in Java share a common root.
This has merit by itself, for use in "generic" containers. Without templates or language supported generics these would be harder to implement.
It also provides some basic behaviour that all classes automatically share, like the toString method.
Having this common superclass was, back in 1996, seen as a bit of a novelty and a cool thing that helped Java get popular (although there were proponents of this cosmic hierarchy as well).
C++ has multiple inheritance. The implementation of multiple inheritance at the assembly level can be quite complicated, but there are good descriptions online on how this is normally done (vtables, pointer fixups, thunks, etc).
Java doesn't have multiple implementation inheritance, but it does have multiple interface inheritance, so I don't think a straightforward implementation with a single vtable per class can handle that. How does Java implement interfaces internally?
I realize that, contrary to C++, Java is JIT-compiled, so different pieces of code might be optimized differently, and different JVMs might do things differently. So, is there some general strategy that many JVMs follow on this, or does anyone know the implementation in a specific JVM?
Also JVMs often devirtualize and inline method calls in which case there are no vtables or equivalent involved at all, so it might not make sense to ask about actual assembly sequences that implement virtual/interface method calls, but I assume that most JVMs still keep some kind of general representation of classes around to use if they haven't been able to devirtualize everything. Is this assumption wrong? Does this representation look in any way like a C++ vtable? If so do interfaces have separate vtables and how are these linked with class vtables? If so can object instances have multiple vtable pointers (to class/interface vtables) like object instances in C++ can? Do references of a class type and an interface type to the same object always have the same binary value or can these differ like in C++ where they require pointer fixups?
(for reference: this question asks something similar about the CLR, and there appears to be a good explanation in this msdn article though that may be outdated by now. I haven't been able to find anything similar for Java.)
Edit:
I mean 'implements' in the sense of "How does the GCC compiler implement integer addition / function calls / etc", not in the sense of "Java class ArrayList implements the List interface".
I am aware of how this works at the JVM bytecode level; what I want to know is what kind of code and data structures are generated by the JVM after it is done loading the class files and compiling the bytecode.
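For reference, here is a small C++ sketch of the pointer fixups I mean (the printed addresses and offsets are implementation-specific):

#include <iostream>

struct A { virtual void f() { } long a = 0; };
struct B { virtual void g() { } long b = 0; };
struct C : A, B { };

int main() {
    C c;
    A* pa = &c;   // usually the same address as &c
    B* pb = &c;   // usually offset past the A subobject: a "pointer fixup"
    std::cout << static_cast<void*>(&c) << " "
              << static_cast<void*>(pa) << " "
              << static_cast<void*>(pb) << "\n";
}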
The key feature of the HotSpot JVM is inline caching. This doesn't actually mean that the target method is inlined, but that an assumption is put into the JIT-compiled code that every future call to the virtual or interface method will target the very same implementation (i.e. that the call site is monomorphic). In this case, a check is compiled into the machine code to test whether the assumption actually holds (i.e. whether the type of the target object is the same as it was last time), and control is then transferred directly to the target method - with no virtual tables involved at all. If the assertion fails, an attempt may be made to convert the call site to a megamorphic one (i.e. with multiple possible types); if this also fails (or if it is the first call), a regular long-winded lookup is performed, using vtables (for virtual methods) and itables (for interfaces).
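Schematically, a monomorphic inline cache boils down to something like the following toy C++ model (this is an illustration of the idea only, not HotSpot's actual generated code):

#include <cstdio>

struct Klass { };                       // stands in for an object's class
struct Obj { const Klass* klass; };
using Target = void (*)(Obj*);

void direct_call(Obj*) { std::puts("fast path: direct call, no table lookup"); }
void full_lookup(Obj*) { std::puts("slow path: vtable/itable lookup, repatch cache"); }

const Klass* cached_klass = nullptr;    // type this call site saw last time
Target cached_target = direct_call;     // implementation it resolved to

void call_site(Obj* o) {
    if (o->klass == cached_klass)       // cheap pointer comparison
        cached_target(o);               // monomorphic fast path
    else
        full_lookup(o);                 // assumption failed: fall back
}

int main() {
    Klass k1, k2;
    Obj a{&k1}, b{&k2};
    cached_klass = &k1;                 // cache primed for k1
    call_site(&a);                      // fast path
    call_site(&b);                      // slow path
}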
Edit: The Hotspot Wiki has more details on the vtable and itable stubs. In the polymorphic case, it still puts an inline cache version into the call site. However, the code actually is a stub that performs a lookup in a vtable, or an itable. There is one vtable stub for each vtable offset (0, 1, 2, ...). Interface calls add a linear search over an array of itables before looking into the itable (if found) at the given offset.
I have started learning Java and picked up some books and collected materials online too. I have programmed in C++. What I do not understand is that even the main method is packed inside a class in Java. Why do we have everything packed inside some class in Java? Why does it not have independent functions?
This is the main concept of object oriented programming languages: everything is an object which is an instance of a class.
So because there's nothing but classes in Java (except the few Java primitive types, like int, float, ...) we have to define the main method, the starting point for a java application, inside a class.
The main method is a normal static method that behaves just like any other static method. Only that the virtual machine uses this one method (only) to start the main thread of the application.
Basically, it works like this:
You start the application with java MyClass
The JVM loads this class ("classloading")
The JVM starts a new thread (the main thread)
The JVM invokes the method with the signature public static void main(String[])
That's it (in brief).
Because that's how the designers of the language wanted it to be.
Java enforces the Object Oriented paradigm very very heavily. That said, there are plenty of ways to work around it if you find it cumbersome.
For one, the class that contains main can easily have lots of other methods in it. Also, it's not uncommon to make a class called 'Utility' or something similar to store all your generic methods that aren't associated with any particular class or objects.
If you work in a smart IDE such as eclipse, you'll eventually find that Java's seemingly over-abundant restrictions, exceptions, and rigorous structure are actually a blessing in disguise. The compiler can understand and work through your code much better when the language isn't cluttered with syntactic junk. It will give you information about unused variables and methods, dead code, etc. and the IDE will give you suggested fixes and code completion (and of course auto-format). I never actually type out import statements anymore, I just mouse over the code and select the class I want. With rigorous types, generic types, casting restrictions etc. the compiler can catch a lot of code which might otherwise result in all kinds of crazy undetectable behavior at runtime. Java is the strictest language in the sense that most of what you type will not compile or else very quickly throw an Exception of one kind or another.
So, if you ask a question about the structure of Java, Java programmers will generally just answer "because that's the rule", while Python programmers are trying to get their indentation right (no auto-format to help), Ruby programmers are writing unit tests to make sure all their arguments are of the correct type, and C programmers are trying to figure out where the segfault is occurring. As I understand it, C++ has everything Java has, but too many other capabilities as well (including casting anything to anything, and oh-so-dangerous pointer arithmetic).
And you can have multiple entry points in a jar, depending on how many classes inside the package have a main method.
I just read that we should not use virtual functions excessively. People felt that fewer virtual functions tend to mean fewer bugs and less maintenance.
What kind of bugs and disadvantages can appear due to virtual functions?
I'm interested in context of C++ or Java.
One reason I can think of is that virtual functions may be slower than normal functions due to the v-table lookup.
You've posted some blanket statements that I would think most pragmatic programmers would shrug off as being misinformed or misinterpreted. But, there do exist anti-virtual zealots, and their code can be just as bad for performance and maintenance.
In Java, everything is virtual by default. Saying you shouldn't use virtual functions excessively is pretty strong.
In C++, you must declare a function virtual, but it's perfectly acceptable to use them when appropriate.
I just read that we should not use virtual function excessively.
It's hard to define "excessively"... certainly "use virtual functions when appropriate" is good advice.
People felt that less virtual functions tends to have fewer bugs and reduces maintenance.
I'm not able to get what kind of bugs and disadvantages can appear due to virtual functions.
Poorly designed code is hard to maintain. Period.
If you're a library maintainer debugging code buried in a tall class hierarchy, it can be difficult to trace where code is actually being executed. Without the benefit of a powerful IDE, it's often hard to tell just which class overrides the behavior, and it can lead to a lot of jumping around between files tracing inheritance trees.
So, there are some rules of thumb, all with exceptions:
Keep your hierarchies shallow. Tall trees make for confusing classes.
In C++, if your class has virtual functions, use a virtual destructor (if not, it's probably a bug); see the sketch after this list.
As with any hierarchy, keep to a 'is-a' relationship between derived and base classes.
You have to be aware that a virtual function may not be called at all... so don't add implicit expectations.
There's a case to be made that virtual functions are slower - they're dynamically bound, so it's often true - but whether it matters in most of the cases where it's cited is certainly debatable. Profile and optimize instead :)
In C++, don't use virtual when it's not needed. There's semantic meaning involved in marking a function virtual - don't abuse it. Let the reader know that "yes, this may be overridden!".
Prefer pure virtual interfaces to a hierarchy that mixes implementation. It's cleaner and much easier to understand.
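To illustrate the virtual-destructor rule from the list above, a minimal sketch (names invented):

#include <cstdio>

struct Base {
    virtual ~Base() { std::puts("~Base"); }   // virtual: deleting via Base* is safe
};
struct Derived : Base {
    ~Derived() override { std::puts("~Derived"); }
};

int main() {
    Base* p = new Derived;
    delete p;   // prints ~Derived then ~Base; without the virtual destructor
}               // in Base this delete would be undefined behaviour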
The reality of the situation is that virtual functions are incredibly useful, and these shades of doubt are unlikely to be coming from balanced sources - virtual functions have been widely used for a very long time, and newer languages are more often adopting them as the default than not.
Virtual functions are slightly slower than regular functions. But that difference is so small as to not make a difference in all but the most extreme circumstances.
I think the best reason to eschew virtual functions is to protect against interface misuse.
It's a good idea to write classes to be open for extension, but there's such a thing as too open. By carefully planning which functions are virtual, you can control (and protect) how a class can be extended.
The bugs and maintenance problems appear when a class is extended such that it breaks the contract of the base class. Here's an example:
class Widget
{
    private WidgetThing _thing;

    public virtual void Initialize()
    {
        _thing = new WidgetThing();
    }
}

class DoubleWidget : Widget
{
    private WidgetThing _double;

    public override void Initialize()
    {
        // Whoops! Forgot to call base.Initialize()
        _double = new WidgetThing();
    }
}
Here, DoubleWidget broke the parent class because Widget._thing is null. There's a fairly standard way to fix this:
class Widget
{
    private WidgetThing _thing;

    public void Initialize()
    {
        _thing = new WidgetThing();
        OnInitialize();
    }

    protected virtual void OnInitialize() { }
}

class DoubleWidget : Widget
{
    private WidgetThing _double;

    protected override void OnInitialize()
    {
        _double = new WidgetThing();
    }
}
Now Widget won't run into a NullReferenceException later.
Every dependency increases the complexity of the code and makes it more difficult to maintain. When you declare a function virtual, you create a dependency of your class on some other code that might not even exist at the moment.
For example, in C, you can easily find out what foo() does - there's just one foo(). In C++ without virtual functions, it's slightly more complicated: you need to explore the class and its base classes to find which foo() applies. But at least you can do it deterministically in advance, not at runtime. With virtual functions, we can't tell which foo() is executed, since it can be defined in one of the subclasses.
(Another thing is the performance issue that you mentioned, due to v-table).
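A tiny sketch of that foo() point (names invented): at the call site below, nothing in the source tells you which foo() will run.

#include <cstdio>
#include <cstdlib>

struct Base        { virtual void foo() { std::puts("Base::foo"); } virtual ~Base() = default; };
struct SubclassOne : Base { void foo() override { std::puts("SubclassOne::foo"); } };
struct SubclassTwo : Base { void foo() override { std::puts("SubclassTwo::foo"); } };

int main() {
    Base* p = std::rand() % 2 ? static_cast<Base*>(new SubclassOne)
                              : static_cast<Base*>(new SubclassTwo);
    p->foo();   // which foo()? only the runtime type of *p decides
    delete p;
}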
I suspect you misunderstood the statement.
Excessively is a very subjective term, I think that in this case it meant "when you don't need it", not that you should avoid it when it can be useful.
In my experience, some students, when they learn about virtual functions and get burned the first time by forgetting to make a function virtual, think that it is prudent to simply make every function virtual.
Since virtual functions do incur a cost on every method invocation (which in C++ cannot usually be avoided because of separate compilation), you are essentially paying now for every method call and also preventing inlining. Many instructors discourage students from doing this, though the term "excessive" is a very poor choice.
In Java, "virtual" behavior (dynamic dispatch) is the default. However, the JVM can optimize things on the fly, and can theoretically eliminate some virtual calls when the target's identity is clear. In addition, final methods, or methods in final classes, can often be resolved to a single target at compile time as well.
In C++:
Virtual functions have a slight performance penalty. Normally it is too small to make any difference but in a tight loop it might be significant.
A virtual function increases the size of each object by one pointer. Again this is typically insignificant, but if you create millions of small objects it could be a factor (see the sketch below).
Classes with virtual functions are generally meant to be inherited from. The derived classes may replace some, all, or none of the virtual functions. This can create additional complexity, and complexity is the programmer's mortal enemy. For example, a derived class may poorly implement a virtual function, which may break the part of the base class that relies on it.
Now let me be clear: I am not saying "don't use virtual functions". They are a vital and important part of C++. Just be aware of the potential for complexity.
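A quick sketch of the size point above (the exact numbers are implementation-defined; on a typical 64-bit ABI the hidden vtable pointer adds 8 bytes, plus padding):

#include <iostream>

struct Plain    { int x; };                       // no hidden vtable pointer
struct WithVptr { int x; virtual void f() { } };  // gains a hidden vptr

int main() {
    // Commonly prints "4 16" on 64-bit platforms: 8 bytes of vptr plus padding.
    std::cout << sizeof(Plain) << " " << sizeof(WithVptr) << "\n";
}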
We recently had a perfect example of how misuse of virtual functions introduces bugs.
There is a shared library that features a message handler:
class CMessageHandler {
public:
    virtual void OnException( std::exception& e );
    /// other irrelevant stuff
};
the intent is that you can inherit from that class and use it for custom error handling:
class YourMessageHandler : public CMessageHandler {
public:
    virtual void OnException( std::exception& e ) { /* custom reaction here */ }
};
The error handling mechanism uses a CMessageHandler* pointer, so it doesn't care about the actual type of the object. The function is virtual, so whenever an overriding version exists, the latter is called.
Cool, right? Yes, it was until the developers of the shared library changed the base class:
class CMessageHandler {
public:
    virtual void OnException( const std::exception& e ); //<-- notice const here
    /// other irrelevant stuff
};
... and the overrides just stopped working.
You see what happened? After the base class was changed, the derived functions stopped being overrides from the C++ point of view - they became new, separate, unrelated functions.
The base class's default implementation was not marked as pure virtual, so the derived classes were not forced to override it. And finally, the function was only called during error handling, which isn't exercised all that often. So the bug was silently introduced and lived unnoticed for quite a long time.
The only way to eliminate it once and for all was to search the whole codebase and edit all the relevant pieces of code.
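Incidentally, this is exactly the class of bug that C++11's override specifier exists to catch. Had the derived class been written as below (assuming a C++11 compiler, which the project may not have had), the library change would have produced a compile error instead of silent misbehaviour:

class YourMessageHandler : public CMessageHandler {
public:
    // "override" asks the compiler to verify this really overrides a virtual
    // function of the base. After the base gained const, this declaration no
    // longer overrides anything, so it now fails to compile - loudly - instead
    // of silently becoming an unrelated function.
    void OnException( std::exception& e ) override { /* custom reaction here */ }
};

(Java's @Override annotation provides the same kind of protection.)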
I don't know where you read that, but IMHO this is not about performance at all.
Maybe it's more about "prefer composition over inheritance" and the problems which can occur if your classes/methods are not final (I'm talking mostly Java here) yet not really designed for reuse. There are many things which can go really wrong:
Maybe you use virtual methods in your constructor - once they're overridden, your base class calls the overridden method, which may use resources initialized in the subclass constructor - which runs later (an NPE arises).

Imagine an add and an addAll method in a list class. addAll calls add many times, and both are virtual. Somebody may override them to count how many items have been added in total. If you don't document that addAll calls add, the developer may (and will) override both add and addAll (and add some counter++ stuff to them). But now, if you use addAll, each item is counted twice (once in add and once in addAll), which leads to incorrect results and hard-to-find bugs (sketched in C++ below).
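Here is that add/addAll trap sketched in C++ (the discussion above is Java-flavoured, but the trap is the same; class names invented):

#include <cstdio>

struct List {
    virtual void add(int) { /* store the item */ }
    virtual void addAll(const int* xs, int n) {
        for (int i = 0; i < n; ++i) add(xs[i]);  // undocumented: calls add()
    }
    virtual ~List() = default;
};

struct CountingList : List {
    int count = 0;
    void add(int x) override { ++count; List::add(x); }
    void addAll(const int* xs, int n) override {
        count += n;              // counted here...
        List::addAll(xs, n);     // ...and again inside add(): double-counted
    }
};

int main() {
    CountingList c;
    int xs[] = {1, 2, 3};
    c.addAll(xs, 3);
    std::printf("%d\n", c.count);   // prints 6, not 3
}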
To sum this up: if you don't design your class for being extended (provide hooks, document some of the important implementation details), you shouldn't allow inheritance at all, because it can lead to mean bugs. Also, it's easy to remove a final modifier from one of your classes if needed (and maybe redesign it for reusability), but it's impossible to make a non-final class (where subclassing leads to errors) final, because others may have subclassed it already.
Maybe it really was about performance; then I'm at least off topic. But if it wasn't, there you have some good reasons not to make your classes extendable if you don't really need it.
More information about stuff like that is in Bloch's Effective Java (this particular post was written a few days after I read item 16, "prefer composition over inheritance", and item 17, "design and document for inheritance or else prohibit it") - amazing book.
I worked sporadically as a consultant on the same C++ system over a period of about 7 years, checking on the work of about 4-5 programmers. Every time I went back the system had gotten worse and worse. At some point somebody decided to remove all the virtual functions and replace them with a very obtuse factory/RTTI-based system that essentially did everything the virtual functions were already doing but worse, more expensively, thousands of lines more code, lots of work, lots of testing, ... Completely and utterly pointless, and clearly fear-of-the-unknown-driven.
They had also hand-written dozens of copy constructors, with errors, when the compiler would have produced them automatically, error-free, with about three exceptions where a hand-written version was required.
Moral: don't fight the language. It gives you things: use them.
A virtual table gets created for each class that has virtual functions or derives from a class containing virtual functions. This consumes more space than usual.
The compiler needs to silently insert extra code to ensure that late binding takes place instead of early binding. This consumes more time than usual.
In Java, there is no virtual keyword, but all methods (functions) are virtual except the ones marked final, static methods, and private instance methods. Using virtual functions is not a bad practice at all, but because they generally cannot be resolved at compile time, the compiler can't perform certain optimizations on them, and they tend to be a little slower. The JVM has to figure out at run time which exact method needs to be called. Note that this is not a big problem by any means, and you should consider it only if your goal is to create a very high-performance application.
For example, one of the biggest optimizations in Apache Spark 2 (which runs on the JVM) was to reduce the number of virtual function dispatches, to gain better performance.