Java programmers and APIs seem to favor explicit set/get methods.
However, I get the impression the C++ community frowns upon such practice.
If that is so, is there a particular reason (besides more lines of code) why?
On the other hand, why does the Java community choose to use methods rather than direct access?
Thank you
A well designed class should ideally not have too many gets and sets. In my opinion, too many gets and sets are basically an indication of the fact that someone else (and potentially many of them) need my data to achieve their purpose. In that case, why does that data belong to me in the first place? This violates the basic principle of encapsulation (data + operations in one logical unit).
So, while there is no technical restriction on 'set' and 'get' methods (and in fact they are abundant), I would say that you should pause and re-inspect your design if too many of those 'get's and 'set's in your class interface are used by too many other entities in your system.
There are occasions when getters/setters are appropriate, but an abundance of getters/setters typically indicates that your design fails to achieve any higher level of abstraction.
Typically it's better (in regards to encapsulation) to expose higher-level operations on your objects that do not make the implementation obvious to the user.
Some other possible reasons why it's not as common in C++ as in Java:
The Standard Library does not use it.
Bjarne Stroustrup expresses his dislike towards it (last paragraph):
I particularly dislike classes with a lot of get and set functions. That is often an indication that it shouldn't have been a class in the first place. It's just a data structure. And if it really is a data structure, make it a data structure.
The usual argument against get/set methods is that if you have both and they're just trivial return x; and x = y; then you haven't actually encapsulated anything at all; you may as well just make the member public which saves a whole lot of boilerplate code.
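To make the argument concrete, here is a minimal Java sketch (class names are made up for illustration): the accessor pair adds boilerplate but no real encapsulation over the public field.

```java
// Hypothetical illustration: trivial accessors versus a public member.
public class AccessorDemo {
    static class WithAccessors {
        private int x;
        public int getX() { return x; }
        public void setX(int x) { this.x = x; }
    }

    static class PlainData {
        public int x; // equivalent in practice while the accessors stay trivial
    }

    public static void main(String[] args) {
        WithAccessors w = new WithAccessors();
        w.setX(5);
        PlainData p = new PlainData();
        p.x = 5;
        System.out.println(w.getX() == p.x); // prints true
    }
}
```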
Obviously there are cases where they still make sense; if you need to do something special in them, or you need to use inheritance or, particularly, interfaces.
There is the advantage that if you implement getters/setters you can change their implementation later without having to alter code that uses them. I suppose the frowning on it you refer to is kind of a YAGNI thing that if there's no expectation of ever altering the functions that way, then there's little benefit to having them. In many cases you can just deal with the case of altering the implementation later anyway.
I wasn't aware that the C++ community frowned on them any more or less than the Java community; my impression is that they're rather less common in languages like Python, for example.
I think the reason the C++ community frowns on getters and setters is that C++ offers far better alternatives. For example:
template <class T>
class DefaultPredicate
{
public:
  static bool CheckSetter (T value)
  {
    return true;
  }

  static void CheckGetter (T value)
  {
  }
};

template <class T, class Predicate = DefaultPredicate <T>>
class Property
{
public:
  operator T ()
  {
    Predicate::CheckGetter (m_storage);
    return m_storage;
  }

  Property <T, Predicate> &operator = (T rhs)
  {
    if (Predicate::CheckSetter (rhs))
    {
      m_storage = rhs;
    }
    return *this;
  }

private:
  T m_storage;
};
which can then be used like this:
class Test
{
public:
  Property <int> TestData;
  Property <int> MoreTestData;
};

int main ()
{
  Test test;

  test.TestData = 42;
  test.MoreTestData = 24;

  int value = test.TestData;
  bool check = test.TestData == test.MoreTestData;
}
Notice that I added a predicate parameter to the property class. With this, we can get creative, for example, a property to hold an integer colour channel value:
class NoErrorHandler
{
public:
  static void SignalError (const char *const error)
  {
  }
};

class LogError
{
public:
  static void SignalError (const char *const error)
  {
    std::cout << error << std::endl;
  }
};

class Exception
{
public:
  Exception (const char *const message) :
    m_message (message)
  {
  }

  operator const char * ()
  {
    return m_message;
  }

private:
  const char *const m_message;
};

class ThrowError
{
public:
  static void SignalError (const char *const error)
  {
    throw new Exception (error);
  }
};

template <class ErrorHandler = NoErrorHandler>
class RGBValuePredicate : public DefaultPredicate <int>
{
public:
  static bool CheckSetter (int rhs)
  {
    bool setter_ok = true;

    if (rhs < 0 || rhs > 255)
    {
      ErrorHandler::SignalError ("RGB value out of range.");
      setter_ok = false;
    }

    return setter_ok;
  }
};
and it can be used like this:
class Test
{
public:
  Property <int, RGBValuePredicate <> > RGBValue1;
  Property <int, RGBValuePredicate <LogError> > RGBValue2;
  Property <int, RGBValuePredicate <ThrowError> > RGBValue3;
};

int main ()
{
  Test test;

  try
  {
    test.RGBValue1 = 4;
    test.RGBValue2 = 5;
    test.RGBValue3 = 6;
    test.RGBValue1 = 400;
    test.RGBValue2 = 500;
    test.RGBValue3 = -6;
  }
  catch (Exception *error)
  {
    std::cout << "Exception: " << *error << std::endl;
  }
}
Notice that I made the handling of bad values a template parameter as well.
Using this as a starting point, it can be extended in many different ways.
For example, allow the storage type of the property to differ from the public type of the value - so the RGBValue above could use an unsigned char for storage but expose an int interface.
Another example is to change the predicate so that it can alter the setter value. In the RGBValue above this could be used to clamp values to the range 0 to 255 rather than generate an error.
Properties as a general language concept technically predate C++, e.g. in Smalltalk, but they were never part of the standard. Getters and setters were a concept used in C++ when it was used for UI development, but truth be told, it's an expensive proposition to develop UIs in what is effectively a systems language. The general problem with getters and setters in C++ was that, since they weren't a standard, everybody had a different convention.
And in systems languages, where efficiency concerns are high, it's just easier to make the variable itself public, although there's a lot of literature that frowns mightily on that practice. Often you simply see richer exchanges of information between C++ object instances than simple data items.
You'll probably get a lot of viewpoints in response to this question, but in general, C++ was meant to be C that did objects, making OOP accessible to developers that didn't know objects. It was hard enough to get virtuals and templates into the language, and I think that it's been kind of stagnant for a while.
Java differs because in the beginning, with what Java brought in areas like garbage collection, it was easier to promote the philosophy of robust encapsulation, i.e. external entities should keep their grubby little paws off of internal elements of a class.
I admit this is pretty much opinion - these days I use C++ for highly optimized work like 3D graphics pipelines, where I already have to manage all my object memory, so I take a dim view of fundamentally useless code that just wraps storage access in additional functions. That said, the basic performance capabilities of runtimes like MSFT .NET make that a position that can be difficult to defend at times.
Purely my 2c
There's nothing unusual about having explicit set/get methods in C++. I've seen it in plenty of C++ code; it can be very useful to not allow direct access to data members.
Check out this question for an explanation of why Java tends to prefer them and the reasons for C++ are the same. In short: it allows you to change the way data members are accessed without forcing client code (code that uses your code) to recompile. It also allows you to enforce a specific policy for how to access data and what to do when that data is accessed.
By mandating the use of set/get methods, one can implement useful side-effects in the getter/setter (for example, when the argument to get/set is an object).
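For instance, a setter can enforce an invariant as a side effect (a hypothetical sketch, not from the original post; class and field names are mine):

```java
// Hypothetical example: the setter validates its argument as a side effect,
// which direct field access could not do.
public class Percentage {
    private int value;

    public void setValue(int value) {
        if (value < 0 || value > 100)
            throw new IllegalArgumentException("out of range: " + value);
        this.value = value;
    }

    public int getValue() { return value; }

    public static void main(String[] args) {
        Percentage p = new Percentage();
        p.setValue(42);
        System.out.println(p.getValue()); // prints 42
    }
}
```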
I am surprised nobody has mentioned Java introspection and beans yet.
Using get.../set... naming convention combined with introspection allows all sorts of clever trickery with utility classes.
I personally feel that the "public" keyword should have been enough to trigger the bean magic, but I am not James Gosling.
My take on this is that in C++ it is a rather pointless exercise. You are adding at least six lines of code to test and maintain which serve no purpose and will for the most part be ignored by the compiler. It doesn't really protect your class from misuse and abuse unless you add a lot more coding.
I don't think the C++ community frowned on using getters and setters. They are almost always a good idea.
It has to do with the basics of object oriented programming - hiding the internals of an object from its users. The users of an object should not need to know (nor should they care) about the internals of an object.
It also gives you control over what is done whenever a user of your object tries to read/write to it. In effect, you expose an interface to the object's users. They have to use that interface and you control what happens when methods in that interface are called - the getters and setters would be part of the interface.
It just makes things easier when debugging. A typical scenario is when your object ends up in a weird state and you're debugging to find out how it got there. All you do is set breakpoints in your getters and setters and, assuming all else is fine, you're able to see how your object gets to the weird state. If your object's users are all directly accessing its members, figuring out when your object's state changes becomes a lot harder (though not impossible).
I would argue that C++ needs getters/setters more than Java.
In Java, if you start with naked field access and later change your mind and want a getter/setter instead, it is extremely easy to find all the usages of the field and refactor them into getter/setter calls.
In C++, this is not that easy. The language is too complex; IDEs simply can't do that reliably.
So in C++, you'd better get it right the first time. In Java, you can be more adventurous.
There were gets/sets long before Java. There are many reasons to use them, especially if you have to recalculate something when a value changes. So the first big advantage is that you can watch for value changes. But IMHO it's bad to ALWAYS implement get and set - often a get is enough. Another point is that class changes will directly affect your clients. With public members, you can't change member names without forcing your clients to refactor their code. Say you have an object with a length and you change that member's name... ouch. With a getter, you just change your side of the code and the client can sleep well. Adding gets/sets for members that should be hidden is of course nonsense.
Related
Say I have a List of objects which were defined using lambda expressions (closures). Is there a way to inspect them so they can be compared?
The code I am most interested in is
List<Strategy> strategies = getStrategies();
Strategy a = (Strategy) this::a;
if (strategies.contains(a)) { // ...
The full code is
import java.util.Arrays;
import java.util.List;

public class ClosureEqualsMain {
    interface Strategy {
        void invoke(/*args*/);

        default boolean equals(Object o) { // doesn't compile
            return Closures.equals(this, o);
        }
    }

    public void a() { }
    public void b() { }
    public void c() { }

    public List<Strategy> getStrategies() {
        return Arrays.asList(this::a, this::b, this::c);
    }

    private void testStrategies() {
        List<Strategy> strategies = getStrategies();
        System.out.println(strategies);
        Strategy a = (Strategy) this::a;
        // prints false
        System.out.println("strategies.contains(this::a) is " + strategies.contains(a));
    }

    public static void main(String... ignored) {
        new ClosureEqualsMain().testStrategies();
    }

    enum Closures {;
        public static <Closure> boolean equals(Closure c1, Closure c2) {
            // This doesn't compare the contents
            // like other immutables, e.g. String
            return c1.equals(c2);
        }

        public static <Closure> int hashCode(Closure c) {
            return // a hashCode which can detect duplicates for a Set<Strategy>
        }

        public static <Closure> String asString(Closure c) {
            return // something better than Object.toString();
        }
    }

    public String toString() {
        return "my-ClosureEqualsMain";
    }
}
It would appear the only solution is to define each lambda as a field and only use those fields. If you want to print out the method called, you are better off using Method. Is there a better way with lambda expressions?
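That field-based workaround can be sketched as follows (names are mine): each lambda is evaluated exactly once into a field, so afterwards plain reference equality, which is what the default equals gives you, is enough for contains.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the "define each lambda as a field" workaround.
public class FieldStrategies {
    interface Strategy { void invoke(); }

    // Evaluated once; only these fields are ever used afterwards,
    // so identity-based equals() is sufficient.
    final Strategy a = () -> {};
    final Strategy b = () -> {};

    List<Strategy> getStrategies() { return Arrays.asList(a, b); }

    public static void main(String[] args) {
        FieldStrategies fs = new FieldStrategies();
        System.out.println(fs.getStrategies().contains(fs.a)); // prints true
    }
}
```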
Also, is it possible to print a lambda and get something human readable? If you print this::a, instead of
ClosureEqualsMain$$Lambda$1/821270929@3f99bd52
could you get something like
ClosureEqualsMain.a()
or even, using this.toString and the method name,
my-ClosureEqualsMain.a();
This question could be interpreted relative to the specification or the implementation. Obviously, implementations could change, but you might be willing to rewrite your code when that happens, so I'll answer from both perspectives.
It also depends on what you want to do. Are you looking to optimize, or are you looking for ironclad guarantees that two instances are (or are not) the same function? (If the latter, you're going to find yourself at odds with computability theory, in that even problems as simple as asking whether two functions compute the same thing are undecidable.)
From a specification perspective, the language spec promises only that the result of evaluating (not invoking) a lambda expression is an instance of a class implementing the target functional interface. It makes no promises about the identity, or degree of aliasing, of the result. This is by design, to give implementations maximal flexibility to offer better performance (this is how lambdas can be faster than inner classes; we're not tied to the "must create unique instance" constraint that inner classes are.)
So basically, the spec doesn't give you much, except obviously that two lambdas that are reference-equal (==) are going to compute the same function.
From an implementation perspective, you can conclude a little more. There is (currently, may change) a 1:1 relationship between the synthetic classes that implement lambdas, and the capture sites in the program. So two separate bits of code that capture "x -> x + 1" may well be mapped to different classes. But if you evaluate the same lambda at the same capture site, and that lambda is non-capturing, you get the same instance, which can be compared with reference equality.
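That implementation detail can be observed with a small sketch (behavior seen on current HotSpot; the class name is mine and none of this is guaranteed by the spec):

```java
import java.util.function.IntSupplier;

public class CaptureSite {
    // One capture site, and the lambda captures nothing.
    static IntSupplier make() {
        return () -> 42;
    }

    public static void main(String[] args) {
        // On current HotSpot the non-capturing lambda from a single
        // capture site is cached and reused, so this prints true.
        // It is an implementation detail, not a spec guarantee.
        System.out.println(make() == make());
    }
}
```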
If your lambdas are serializable, they'll give up their state more easily, in exchange for sacrificing some performance and security (no free lunch.)
One area where it might be practical to tweak the definition of equality is with method references because this would enable them to be used as listeners and be properly unregistered. This is under consideration.
I think what you're trying to get to is: if two lambdas are converted to the same functional interface, are represented by the same behavior function, and have identical captured args, they're the same.
Unfortunately, this is both hard to do (for non-serializable lambdas, you can't get at all the components of that) and not enough (because two separately compiled files could convert the same lambda to the same functional interface type, and you wouldn't be able to tell.)
The EG discussed whether to expose enough information to be able to make these judgments, as well as discussing whether lambdas should implement more selective equals/hashCode or more descriptive toString. The conclusion was that we were not willing to pay anything in performance cost to make this information available to the caller (bad tradeoff, punishing 99.99% of users for something that benefits .01%).
A definitive conclusion on toString was not reached but left open to be revisited in the future. However, there were some good arguments made on both sides on this issue; this is not a slam-dunk.
To compare lambdas I usually let the interface extend Serializable and then compare the serialized bytes. Not very nice, but it works in most cases.
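A sketch of that approach (class and helper names are mine): the functional interface extends Serializable, and the serialized form of a lambda is compared byte for byte.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Arrays;

public class LambdaBytes {
    // The functional interface must extend Serializable for this to work.
    interface SerializableRunnable extends Runnable, Serializable {}

    static byte[] toBytes(Object o) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(o);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        SerializableRunnable r1 = () -> {};
        SerializableRunnable r2 = () -> {};

        // The same lambda serialized twice yields identical bytes...
        System.out.println(Arrays.equals(toBytes(r1), toBytes(r1)));
        // ...while lambdas from different capture sites differ
        // (their synthetic implementation methods have different names).
        System.out.println(Arrays.equals(toBytes(r1), toBytes(r2)));
    }
}
```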
I don't see a way to get that information from the closure itself.
The closure doesn't expose its state.
But you can use Java reflection if you want to inspect and compare the methods.
Of course that is not a very beautiful solution, because of the performance cost and the exceptions that must be caught, but this way you get that meta-information.
I'm just beginning to learn OOP programming in Java. I have already programmed a little in C++, and one of the things I miss the most in Java is the ability to return multiple values. It's true that C++ functions strictly return only one value, but we can use by-reference parameters to return many more. Conversely, in Java we can't do such a thing, at least not for primitive types.
The solution I thought of was to create a class grouping the variables I wanted to return and return an instance of that class. For example, I needed to look for an object in an array and I wanted to return a boolean (found or not) and an index. I know I could do this by just setting the index to -1 if nothing was found, but I think it's clearer the other way.
The thing is, I was told by someone who knows much more about Java than I do that I shouldn't create classes for the purpose of returning multiple values (even if they are related). He said classes should never be used as C++ structs, just to group elements. He also said methods shouldn't return non-primitive objects; they should receive the object from the outside and only modify it. Which of these things are true?
I shouldn't create classes for the purpose of returning multiple values
classes should never be used as C++ structs, just to group elements.
methods shouldn't return non-primitive objects, they should receive the object from the outside and only modify it
For any of the above statements this is definitely not the case. Data objects are useful, and in fact, it is good practice to separate pure data from classes containing heavy logic.
In Java the closest thing we have to a struct is a POJO (plain old Java object), commonly known as a data class in other languages. These classes are simply a grouping of data. A rule of thumb for a POJO is that it should only contain primitives, simple types (String, boxed primitives, etc.), simple containers (Map, array, List, etc.), or other POJO classes. Basically, classes which can easily be serialized.
It's common to want to pair two, three, or n objects together. Sometimes the data is significant enough to warrant an entirely new class, and sometimes not. In these cases programmers often use Pair or Tuple classes. Here is a quick example of a two-element generic tuple.
public class Tuple2<T, U> {
    private final T first;
    private final U second;

    public Tuple2(T first, U second) {
        this.first = first;
        this.second = second;
    }

    public T getFirst() { return first; }
    public U getSecond() { return second; }
}
A class which uses a tuple as part of a method signature may look like:
public interface Container<T> {
    ...
    public Tuple2<Boolean, Integer> search(T key);
}
A downside to creating data classes like this is that, for quality of life, we have to implement things like toString, hashCode, equals, getters, setters, constructors, etc. For each different-sized tuple you have to make a new class (Tuple2, Tuple3, Tuple4, etc.). Hand-writing all of these methods can introduce subtle bugs into our applications. For these reasons developers will often avoid creating data classes.
Libraries like Lombok can be very helpful for overcoming these challenges. Our definition of Tuple2, with all of the methods listed above, can be written as:
@Data
public class Tuple2<T, U> {
    private final T first;
    private final U second;
}
This also makes it extremely easy to create custom response classes. Using custom classes can avoid autoboxing with generics and greatly increase readability, e.g.:
@Data
public class SearchResult {
    private final boolean found;
    private final int index;
}

...

public interface Container<T> {
    ...
    public SearchResult search(T key);
}
methods should receive the object from the outside and only modify it
This is bad advice. It's much nicer to design data around immutability. From Effective Java 2nd Edition, p75
Immutable objects are simple. An immutable object can be in exactly one state, the state in which it was created. If you make sure that all constructors establish class invariants, then it is guaranteed that these invariants will remain true for all time, with no further effort on your part or on the part of the programmer who uses the class. Mutable objects, on the other hand, can have arbitrarily complex state spaces. If the documentation does not provide a precise description of the state transitions performed by mutator methods, it can be difficult or impossible to use a mutable class reliably.
Immutable objects are inherently thread-safe; they require no synchronization. They cannot be corrupted by multiple threads accessing them concurrently. This is far and away the easiest approach to achieving thread safety. In fact, no thread can ever observe any effect of another thread on an immutable object. Therefore, immutable objects can be shared freely.
As to your specific example ("how to return both error status and result?")
I needed to look for an object in a an array and I wanted to return a boolean(found or not) and an index. I know I could make this just setting the index to -1 if nothing was found, but I think it's more clear the other way.
Returning special invalid result values such as -1 for "not found" is indeed very common, and I agree with you that it is not too pretty.
However, returning a tuple of (statusCode, resultValue) is not the only alternative.
The most idiomatic way to report exceptional conditions in Java is to, you guessed it, use exceptions. So return a result, or if no result can be produced, throw an exception (NoSuchElementException in this case). Whether this is appropriate depends on the application: you don't want to throw exceptions for "correct" input; they should be reserved for irregular cases.
In functional languages, they often have built-in data structures for this (such as Try, Option or Either) which essentially also do statusCode + resultValue internally, but make sure that you actually check that status code before trying to access the result value. Java now has Optional as well. If I want to go this route, I'd pull in these wrapper types from a library and not make up my own ad-hoc "structs" (because that would only confuse people).
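A minimal sketch of the Optional route (method and class names are mine): the search returns the index wrapped in an Optional instead of a (found, index) pair or a -1 sentinel.

```java
import java.util.Optional;

public class OptionalSearch {
    // Hypothetical helper: the index of key if present, empty otherwise.
    static Optional<Integer> indexOf(int[] array, int key) {
        for (int i = 0; i < array.length; i++) {
            if (array[i] == key) {
                return Optional.of(i);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        int[] data = {5, 8, 13};
        // The caller must unwrap the value, so the "did we find it?"
        // check cannot be forgotten.
        System.out.println(indexOf(data, 8).orElse(-1));   // prints 1
        System.out.println(indexOf(data, 99).isPresent()); // prints false
    }
}
```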
"methods shouldn't return non-primitive objects , they should receive the object from the outside and only modify it"
That may be very traditional OOP thinking, but even within OOP the use of immutable data absolutely has its value (the only sane way to do thread-safe programming in my book), so the guideline to modify stuff in-place is pretty terrible. If something is considered a "data object" (as opposed to "an entity") you should prefer to return modified copies instead of mutating the input.
For static information you can use static final constants; variables declared static final can be accessed from everywhere.
Otherwise, it is usual and good practice to use the getter/setter concept to read and set fields in your classes.
Strictly speaking, it is a language limitation that Java does not natively support tuples as return values (see related discussion here). This was done to keep the language cleaner. However, the same decision was made in most other languages. Of course, in case of necessity, such behaviour can be implemented by available means. So here are the options (all of them except the second allow combining arbitrary types of return components, not necessarily primitive):
Use classes (usually static, self-made or predefined) specifically designed to contain a group of related values being returned. This option is well covered in other answers.
Combine, if possible, two or more primitive values into one return value. Two ints can be combined into a single long, four bytes can be combined into a single int, boolean and unsigned int less than Integer.MAX_VALUE can be combined into a signed int (look, for example, at how Arrays.binarySearch(...) methods return their results), positive double and boolean can be combined into a single signed double, etc. On return, extract the components via comparisons (if boolean is among them) and bit operations (for shifted integer components).
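Packing two ints into one long can be sketched like this (class and helper names are mine); the low half must be masked to avoid sign extension:

```java
public class PackedPair {
    // Pack two 32-bit ints into one 64-bit long.
    static long pack(int high, int low) {
        return ((long) high << 32) | (low & 0xFFFFFFFFL);
    }

    // Extract the components again: arithmetic shift for the high half,
    // plain narrowing cast for the low half.
    static int high(long packed) { return (int) (packed >> 32); }
    static int low(long packed)  { return (int) packed; }

    public static void main(String[] args) {
        long p = pack(-7, 42);
        System.out.println(high(p) + " " + low(p)); // prints -7 42
    }
}
```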
2a. One particular case is worth noting separately. It is a common (and widely used) convention to return null to indicate that the returned value is invalid. Strictly speaking, this convention substitutes for a two-field result: one implicit boolean field that you check with
if (returnValue != null)
and one non-primitive field (which can be just a wrapper of a primitive field) containing the result itself, which you use after the above check:
ResultClass result = returnValue;
If you don't want to mess with data classes, you can always return an array of Objects:
public Object[] returnTuple() {
    return new Object[]{1234, "Text", true};
}
and then typecast its components to desired types:
public void useTuple() {
    Object[] t = returnTuple();
    int x = (int) t[0];
    String s = (String) t[1];
    boolean b = (boolean) t[2];
    System.out.println(x + ", " + s + ", " + b);
}
You can introduce field(s) into your class to hold auxiliary return component(s) and return only the main component explicitly (you decide which one is the main component):
public class LastResultAware {
    public static boolean found;
    public static int errorCode;

    public static int findLetter(String src, char letter) {
        int i = src.toLowerCase().indexOf(Character.toLowerCase(letter));
        found = i >= 0;
        return i;
    }

    public static int findUniqueLetter(String src, char letter) {
        src = src.toLowerCase();
        letter = Character.toLowerCase(letter);
        int i = src.indexOf(letter);
        if (i < 0)
            errorCode = -1; // not found
        else {
            int j = src.indexOf(letter, i + 1);
            if (j >= 0)
                errorCode = -2; // ambiguous result
            else
                errorCode = 0; // success
        }
        return i;
    }

    public static void main(String[] args) {
        int charIndex = findLetter("ABC", 'b');
        if (found)
            System.out.println("Letter is at position " + charIndex);
        charIndex = findUniqueLetter("aBCbD", 'b');
        if (errorCode == 0)
            System.out.println("Letter is only at position " + charIndex);
    }
}
Note that in some cases it is better to throw an exception indicating an error than to return an error code which the caller may just forget to check.
Depending on usage, these return-extending fields may be either static or instance fields. When static, they can even be shared by multiple classes to serve a common purpose and avoid unnecessary field creation. For example, one public static int errorCode may be enough. Be warned, however, that this approach is not thread-safe.
What is the C/C++ equivalent of java.io.Serializable?
There are references to serialization libraries on:
Serialize Data Structures in C
And there are:
http://troydhanson.github.io/tpl/index.html
http://www.boost.org/doc/libs/1_41_0/libs/serialization/doc/index.html
https://developers.google.com/protocol-buffers/docs/cpptutorial#optimization-tips
But does such an equivalent even exist?
So if I have an interface as follows in Java, what would a serializable class look like in C/C++?
import java.io.Serializable;

public interface SuperMan extends Serializable {
    /**
     * Count the number of abilities.
     * @return
     */
    public int countAbility();

    /**
     * Get the ability with index k.
     * @param k
     * @return
     */
    public long getAbility(int k);

    /**
     * Get the array of ability from his hand.
     * @param k
     * @return
     */
    public int[] getAbilityFromHand(int k);

    /**
     * Get the finger of the hand.
     * @param k
     * @return
     */
    public int[][] getAbilityFromFinger(int k);

    // check whether the finger with index k is removed.
    public boolean hasFingerRemoved(int k);

    /**
     * Remove the finger with index k.
     * @param k
     */
    public void removeFinger(int k);
}
Could any serializable C/C++ object just be inherited from, like in Java?
There are no standard library classes that implement serialization the same way Java does. There are some libraries that facilitate serialization but for basic needs you typically make your class serializable by overloading the insertion and extraction operators like this:
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

class MyType
{
    int value;
    double factor;
    std::string type;
public:
    MyType()
        : value(0), factor(0.0), type("none") {}
    MyType(int value, double factor, const std::string& type)
        : value(value), factor(factor), type(type) {}

    // Serialized output
    friend std::ostream& operator<<(std::ostream& os, const MyType& m)
    {
        return os << m.value << ' ' << m.factor << ' ' << m.type;
    }

    // Serialized input
    friend std::istream& operator>>(std::istream& is, MyType& m)
    {
        return is >> m.value >> m.factor >> m.type;
    }
};

int main()
{
    std::vector<MyType> v {{1, 2.7, "one"}, {4, 5.1, "two"}, {3, 0.6, "three"}};

    std::cout << "Serialize to standard output." << '\n';
    for(auto const& m: v)
        std::cout << m << '\n';

    std::cout << "\nSerialize to a string." << '\n';
    std::stringstream ss;
    for(auto const& m: v)
        ss << m << '\n';
    std::cout << ss.str() << '\n';

    std::cout << "Deserialize from a string." << '\n';
    std::vector<MyType> v2;
    MyType m;
    while(ss >> m)
        v2.push_back(m);
    for(auto const& m: v2)
        std::cout << m << '\n';
}
Output:
Serialize to standard output.
1 2.7 one
4 5.1 two
3 0.6 three
Serialize to a string.
1 2.7 one
4 5.1 two
3 0.6 three
Deserialize from a string.
1 2.7 one
4 5.1 two
3 0.6 three
The serialization format is entirely up to the programmer, and you are responsible for making sure that each member of the class that you want to serialize is itself serializable (has insertion/extraction operators defined). You also have to decide how fields are delimited (spaces, new-lines, or zero terminators?).
All the basic types have serialization (insertion/extraction) operators pre-defined, but you still need to be careful with things like std::string, which can contain (for example) spaces or new-lines if those are your field delimiters.
There is no single standard for this. In fact, every library can implement it in a different way. Here are some approaches which can be used:
the class has to be derived from a common base class and implement read() and write() virtual methods:
class SuperMan : public BaseObj
{
public:
    virtual void read(Stream& stream);
    virtual void write(Stream& stream);
};
the class should implement a special interface - in C++ this is done by deriving the class from a special abstract class. This is a variation of the previous method:
class Serializable
{
public:
    virtual ~Serializable() {}
    virtual void read(Stream& stream) = 0;
    virtual void write(Stream& stream) = 0;
};

class SuperMan : public Man, public Serializable
{
public:
    virtual void read(Stream& stream);
    virtual void write(Stream& stream);
};
the library may allow (or require) registering "serializers" for a given type. They can be implemented by deriving from a special base class or interface, and then registering them for the given type:
#define SUPERMAN_CLASS_ID 111

class SuperMan
{
public:
    virtual int getClassId()
    {
        return SUPERMAN_CLASS_ID;
    }
};

class SuperManSerializer : public Serializer
{
    virtual void* read(Stream& stream);
    virtual void write(Stream& stream, void* object);
};

int main()
{
    register_class_serializer(SUPERMAN_CLASS_ID, new SuperManSerializer());
}
serializers can also be implemented using functors, e.g. lambdas:
int main()
{
    register_class_serializer(SUPERMAN_CLASS_ID,
        [](Stream&, const SuperMan&) {},
        [](Stream&) -> SuperMan { return SuperMan(); });
}
instead of passing a serializer object to some function, it may be enough to pass its type to a special template function:
int main()
{
    register_class_serializer<SuperManSerializer>();
}
the class should provide overloaded operators like '<<' and '>>'. The first argument is some stream class, and the second is our class instance. The stream could be a std::stream, but this conflicts with the default use of these operators - converting to and from a user-friendly text format. Because of this the stream class is usually a dedicated one (it can wrap a std::stream though), or the library supports an alternative method if << also has to be supported.
class SuperMan
{
public:
friend Stream& operator>>(Stream& stream, SuperMan& man);
friend Stream& operator<<(Stream& stream, const SuperMan& man);
};
There should be a specialization of some class template for our class type. This solution can be used together with the << and >> operators: the library first tries to use the template and falls back to the operators if it has not been specialized (this can be implemented as a default template version, or using SFINAE).
// default implementation
template<class T>
class Serializable
{
public:
void read(Stream& stream, T& val)
{
stream >> val;
}
void write(Stream& stream, const T& val)
{
stream << val;
}
};
// specialization for a given class
template<>
class Serializable<SuperMan>
{
public:
void read(Stream& stream, SuperMan& val);
void write(Stream& stream, const SuperMan& val);
};
Instead of a class template, the library may also use a C-style interface with overloaded global functions:
template<class T>
void read(Stream& stream, T& val);
template<class T>
void write(Stream& stream, const T& val);
template<>
void read(Stream& stream, SuperMan& val);
template<>
void write(Stream& stream, const SuperMan& val);
The C++ language is flexible, so the above list is certainly not complete. I am convinced it would be possible to invent other solutions.
As other answers have mentioned, C++ does not have nearly the sort of built-in serialization/deserialization capabilities that Java (or other managed languages) have. This is in part due to the minimal run-time type information (RTTI) available in C++. C++ by itself does not have reflection, so each serializable object must be completely responsible for serialization. In managed languages like Java and C#, the language includes enough RTTI for an external class to be able to enumerate the public fields on an object in order to perform the serialization.
Luckily, C++ does not impose a default mechanism for serialization of a class hierarchy. (I wouldn't mind an optional mechanism supplied by a special base type in the standard library or something, but overall this could put limits on existing ABIs.)
Yes, serialization is incredibly important and powerful in modern software engineering. I use it any time I need to translate a class hierarchy to and from some form of runtime-consumable data. The mechanism I always choose is based on some form of reflection. More on this below.
You may also want to look here for an idea of the complexities to consider and if you really wanted to verify against the standard you could purchase a copy here. It looks like the working draft for the next standard is on github.
Application specific systems
C and C++ give the author of the application the freedom to select the mechanics behind many of the technologies people take for granted in newer, often higher-level languages: reflection (RTTI), exceptions, resource/memory management (garbage collection, RAII, etc.). These systems can all potentially impact the overall quality of a particular product.
I have worked on everything from real time games, embedded devices, mobile apps, to web applications and the overall goals of the particular project vary between them all.
Often, for real-time high-performance games, you will explicitly disable RTTI (it isn't very useful in C++ anyway, to be honest) and possibly even exceptions (many people don't want the overhead they introduce either, and if you were really crazy you could implement your own form from long jumps and such). For me, exceptions create an invisible interface that often produces bugs people wouldn't even expect to be possible, so I often avoid them in favor of more explicit logic.
Garbage collection isn't included in C++ by default either, and in real-time games this is a blessing. Sure, you can have incremental GC and other optimized approaches, which I have seen many games use (often a modification of an existing GC, like the one used in Mono for C#). Many games use pooling, and for C++ often RAII driven by smart pointers. It isn't unusual to have different systems with different patterns of memory usage, which can be optimized in different ways. The point is that some applications care more than others about the nitty-gritty details.
General idea of automatic serialization of type hierarchy
The general idea of an automatic serialization system for type hierarchies is to use a reflection system that can query type information at runtime through a generic interface. My solution below relies on building that generic interface by extending some base type interfaces with the help of macros. In the end you basically get a dynamic vtable of sorts that you can iterate by index or query by the string names of members/types.
I also use a base reflection reader/writer type that exposes some iostream-like interfaces for derived formatters to override. I currently have a BinaryObjectIO, JSONObjectIO, and ASTObjectIO, but it is trivial to add others. The point of this is to remove the responsibility for a particular data format from the hierarchy and put it into the serializer.
Reflection at the language level
In many situations the application knows what data it would like to serialize, and there is no reason to build it into every object in the language. Many modern languages include RTTI even for the basic types of the system (common intrinsics would be int, float, double, etc.). This requires extra data to be stored for everything in the system, regardless of how the application uses it. I'm sure many modern compilers can at times optimize some of it away with tree shaking and the like, but you can't guarantee that either.
A Declarative approach
The methods already mentioned are all valid use cases, although they lack some flexibility by having the hierarchy handle the actual serialization task. This can also bloat your code with boilerplate stream manipulation on the hierarchy.
I personally prefer a more declarative approach via reflection. What I have done in the past, and continue to do in some situations, is create a base Reflectable type in my system. I end up using template metaprogramming to help with some boilerplate logic, as well as the preprocessor for string-concatenation macros. The end result is a base type that I derive from, a reflectable macro declaration to expose the interface, and a reflectable macro definition to implement the guts (tasks like adding the registered member to the type's lookup table).
So I normally end up with something that looks like this in the header:
class ASTNode : public Reflectable
{
...
public:
DECLARE_CLASS
DECLARE_MEMBER(mLine,int)
DECLARE_MEMBER(mColumn,int)
...
};
Then something like this in the cpp:
BEGIN_REGISTER_CLASS(ASTNode,Reflectable);
REGISTER_MEMBER(ASTNode,mLine);
REGISTER_MEMBER(ASTNode,mColumn);
END_REGISTER_CLASS(ASTNode);
ASTNode::ASTNode()
: mLine( 0 )
, mColumn( 0 )
{
}
I can then use the reflection interface directly with some methods such as:
int id = myreflectedObject.Get<int>("mID");
myreflectedObject.Set( "mID", 6 );
But much more commonly I just iterate some "Traits" data that I have exposed with another interface:
ReflectionInfo::RefTraitsList::const_iterator it = info->getReflectionTraits().begin();
Currently the traits object looks something like this:
class ReflectionTraits
{
public:
ReflectionTraits( const uint8_t& type, const uint8_t& arrayType, const char* name, const ptrType_t& offset );
std::string getName() const{ return mName; }
ptrType_t getOffset() const{ return mOffset; }
uint8_t getType() const{ return mType; }
uint8_t getArrayType() const{ return mArrayType; }
private:
std::string mName;
ptrType_t mOffset;
uint8_t mType;
uint8_t mArrayType; // if mType == TYPE_ARRAY this will give the type of the underlying data in the array
};
I have actually come up with improvements to my macros that allow me to simplify this a bit, but those are taken from an actual project I'm working on currently. I'm developing a programming language using Flex, Bison, and LLVM that compiles to the C ABI and WebAssembly. I'm hoping to open-source it soon enough, so if you are interested in the details let me know.
The thing to note here is that "Traits" information is metadata that is accessible at runtime and describes the member and is often much larger for general language level reflection. The information I have included here was all I needed for my reflectable types.
The other important aspect to keep in mind when serializing any data is version information. The above approach will deserialize data just fine until you start changing the internal data structure. You could, however, include a post- (and possibly pre-) serialization hook mechanism in your serialization system so you can fix up data to comply with newer versions of types. I have done this a few times with setups like this and it works really well.
One final note about this technique is that you are explicitly controlling what is serialized here. You can pick and choose the data you want to serialize and the data that may just be keeping track of some transient object state.
C++ Lax guarantees
One thing to note: since C++ is very lax about what data actually looks like, you often have to make some platform-specific choices (this is probably one of the main reasons a standard system isn't provided). You can actually do a great deal at compile time with template metaprogramming, but sometimes it is easier to just assume your char is 8 bits long. Even this simple assumption isn't 100% universal in C++, though luckily in most situations it holds.
The approach I use also does some non-standard casting of NULL pointers to determine memory layout (again, for my purposes this is the nature of the beast). The following snippet from one of the macro implementations calculates a member's offset within the type, where CLASS is provided by the macro:
(ptrType_t)&reinterpret_cast<ptrType_t&>((reinterpret_cast<CLASS*>(0))->member)
A general warning about reflection
The biggest issue with reflection is how powerful it can be. You can quickly turn an easily maintainable codebase into a huge mess with too much inconsistent usage of reflection.
I personally reserve reflection for lower level systems (primarily serialization) and avoid using it for runtime type checking for business logic. Dynamic dispatching with language constructs such as virtual functions should be preferred to reflection type check conditional jumps.
Issues are even harder to track down if the language has inherent all-or-nothing support for reflection as well. In C#, for example, you cannot guarantee, given a random codebase, that a function isn't being used simply by letting the compiler alert you to every usage. Not only can you invoke the method via a string from within the codebase, or say from a network packet, you could also break the ABI compatibility of some other, unrelated assembly that reflects on the target assembly. So again: use reflection consistently and sparingly.
Conclusion
There is currently no standard equivalent to the common paradigm of a serializable class hierarchy in C++, but it can be added much like any other system you see in newer languages. After all everything eventually translates down to simplistic machine code that can be represented by the binary state of the incredible array of transistors included in your CPU die.
I'm not saying that everyone should roll their own here by any means; it is complicated and error-prone work. I just really like the idea and have been interested in this sort of thing for a while now anyway. I'm sure there are some standard fallbacks people use for this sort of work. The first place to look for C++ would be Boost, as you mentioned above.
If you do a search for "C++ Reflection" you will see several examples of how others achieve a similar result.
A quick search pulled up this as one example.
Say I have a List of objects which were defined using lambda expressions (closures). Is there a way to inspect them so they can be compared?
The code I am most interested in is
List<Strategy> strategies = getStrategies();
Strategy a = (Strategy) this::a;
if (strategies.contains(a)) { // ...
The full code is
import java.util.Arrays;
import java.util.List;
public class ClosureEqualsMain {
interface Strategy {
void invoke(/*args*/);
default boolean equals(Object o) { // doesn't compile
return Closures.equals(this, o);
}
}
public void a() { }
public void b() { }
public void c() { }
public List<Strategy> getStrategies() {
return Arrays.asList(this::a, this::b, this::c);
}
private void testStrategies() {
List<Strategy> strategies = getStrategies();
System.out.println(strategies);
Strategy a = (Strategy) this::a;
// prints false
System.out.println("strategies.contains(this::a) is " + strategies.contains(a));
}
public static void main(String... ignored) {
new ClosureEqualsMain().testStrategies();
}
enum Closures {;
public static <Closure> boolean equals(Closure c1, Closure c2) {
// This doesn't compare the contents
// like others immutables e.g. String
return c1.equals(c2);
}
public static <Closure> int hashCode(Closure c) {
return // a hashCode which can detect duplicates for a Set<Strategy>
}
public static <Closure> String asString(Closure c) {
return // something better than Object.toString();
}
}
public String toString() {
return "my-ClosureEqualsMain";
}
}
It would appear the only solution is to define each lambda as a field and only use those fields. If you want to print out the method called, you are better off using Method. Is there a better way with lambda expressions?
Also, is it possible to print a lambda and get something human readable? If you print this::a instead of
ClosureEqualsMain$$Lambda$1/821270929#3f99bd52
you get something like
ClosureEqualsMain.a()
or even use this.toString() plus the method name:
my-ClosureEqualsMain.a();
This question could be interpreted relative to the specification or the implementation. Obviously implementations can change, but you might be willing to rewrite your code when that happens, so I'll answer on both levels.
It also depends on what you want to do. Are you looking to optimize, or are you looking for ironclad guarantees that two instances are (or are not) the same function? (If the latter, you're going to find yourself at odds with computational physics, in that even problems as simple as asking whether two functions compute the same thing are undecidable.)
From a specification perspective, the language spec promises only that the result of evaluating (not invoking) a lambda expression is an instance of a class implementing the target functional interface. It makes no promises about the identity, or degree of aliasing, of the result. This is by design, to give implementations maximal flexibility to offer better performance (this is how lambdas can be faster than inner classes; we're not tied to the "must create unique instance" constraint that inner classes are.)
So basically, the spec doesn't give you much, except obviously that two lambdas that are reference-equal (==) are going to compute the same function.
From an implementation perspective, you can conclude a little more. There is (currently, may change) a 1:1 relationship between the synthetic classes that implement lambdas, and the capture sites in the program. So two separate bits of code that capture "x -> x + 1" may well be mapped to different classes. But if you evaluate the same lambda at the same capture site, and that lambda is non-capturing, you get the same instance, which can be compared with reference equality.
If your lambdas are serializable, they'll give up their state more easily, in exchange for sacrificing some performance and security (no free lunch.)
One area where it might be practical to tweak the definition of equality is with method references because this would enable them to be used as listeners and be properly unregistered. This is under consideration.
I think what you're trying to get to is: if two lambdas are converted to the same functional interface, are represented by the same behavior function, and have identical captured args, they're the same
Unfortunately, this is both hard to do (for non-serializable lambdas, you can't get at all the components of that) and not enough (because two separately compiled files could convert the same lambda to the same functional interface type, and you wouldn't be able to tell.)
The EG discussed whether to expose enough information to be able to make these judgments, as well as discussing whether lambdas should implement more selective equals/hashCode or more descriptive toString. The conclusion was that we were not willing to pay anything in performance cost to make this information available to the caller (bad tradeoff, punishing 99.99% of users for something that benefits .01%).
A definitive conclusion on toString was not reached but left open to be revisited in the future. However, there were some good arguments made on both sides on this issue; this is not a slam-dunk.
To compare lambdas I usually let the interface extend Serializable and then compare the serialized bytes. Not very nice, but it works in most cases.
I don't see a way to get this information from the closure itself; closures don't expose their state.
But you can use Java reflection if you want to inspect and compare the methods.
Of course that is not a very beautiful solution, because of the performance cost and the exceptions that have to be caught, but this way you get the meta-information.
Suppose you're maintaining an API that was originally released years ago (before Java gained enum support) and it defines a class with enumeration values as ints:
public class VitaminType {
public static final int RETINOL = 0;
public static final int THIAMIN = 1;
public static final int RIBOFLAVIN = 2;
}
Over the years the API has evolved and gained Java 5-specific features (generified interfaces, etc). Now you're about to add a new enumeration:
public enum NutrientType {
AMINO_ACID, SATURATED_FAT, UNSATURATED_FAT, CARBOHYDRATE;
}
The 'old style' int-enum pattern has no type safety, no possibility of adding behaviour or data, etc, but it's published and in use. I'm concerned that mixing two styles of enumeration is inconsistent for users of the API.
I see three possible approaches:
Give up and define the new enum (NutrientType in my fictitious example) as a series of ints like the VitaminType class. You get consistency but you're not taking advantage of type safety and other modern features.
Decide to live with an inconsistency in a published API: keep VitaminType around as is, and add NutrientType as an enum. Methods that take a VitaminType are still declared as taking an int, methods that take a NutrientType are declared as taking such.
Deprecate the VitaminType class and introduce a new VitaminType2 enum. Define the new NutrientType as an enum. Congratulations: for the next 2-3 years, until you can kill the deprecated type, you're going to maintain deprecated versions of every single method that took a VitaminType as an int while adding a new foo(VitaminType2 v) version of each. You also need to write tests for each deprecated foo(int v) method as well as its corresponding foo(VitaminType2 v) method, so you just multiplied your QA effort.
What is the best approach?
How likely is it that the API consumers are going to confuse VitaminType with NutrientType? If it is unlikely, then maybe it is better to maintain API design consistency, especially if the user base is established and you want to minimize the delta of work/learning required by customers. If confusion is likely, then NutrientType should probably become an enum.
This needn't be a wholesale overnight change; for example, you could expose the old int values via the enum:
public enum Vitamin {
RETINOL(0), THIAMIN(1), RIBOFLAVIN(2);
private final int intValue;
Vitamin(int n) {
intValue = n;
}
public int getVitaminType() {
return intValue;
}
public static Vitamin asVitamin(int intValue) {
for (Vitamin vitamin : Vitamin.values()) {
if (intValue == vitamin.getVitaminType()) {
return vitamin;
}
}
throw new IllegalArgumentException();
}
}
/** Use foo.Vitamin instead */
@Deprecated
public class VitaminType {
public static final int RETINOL = Vitamin.RETINOL.getVitaminType();
public static final int THIAMIN = Vitamin.THIAMIN.getVitaminType();
public static final int RIBOFLAVIN = Vitamin.RIBOFLAVIN.getVitaminType();
}
This allows you to update the API and gives you some control over when to deprecate the old type and scheduling the switch-over in any code that relies on the old type internally.
Some care is required to keep the literal values in sync with those that may have been in-lined with old consumer code.
My personal opinion is that it's probably not worth the effort of trying to convert. For one thing, the "public static final int" idiom isn't going away any time soon, given that it's sprinkled liberally all over the JDK. For another, tracking down usages of the original ints is likely to be really unpleasant, given that the compiler will inline the constant references, so you're likely not to know you've broken anything until it's too late
(by which I mean
class A
{
public static final int MY_CONSTANT = 1;
}
class B
{
....
i+=A.MY_CONSTANT;
}
gets compiled into
i+=1
So if you rewrite A, you may not ever realize that B is broken until you recompile B later.)
It's a pretty well known idiom, probably not so terrible to leave it in, certainly better than the alternative.
There is a rumor that the creator of "make" realized that the syntax of Makefiles was bad, but felt that he couldn't change it because he already had 10 users.
Backwards compatibility at all costs, even if it hurts your customers, is a bad thing. SO can't really give you a definitive answer on what to do in your case, but be sure and consider the cost to your users over the long term.
Also think about ways you can refactor the core of your code while keeping the old integer-based enums only at the outer layer.
Wait for the next major revision, change everything to enum and provide a script (sed, perl, Java, Groovy, ...) to convert existing source code to use the new syntax.
Obviously this has two drawbacks:
No binary compatibility. How important this is depends on the use cases, but it can be acceptable in the case of a new major release.
Users have to do some work. If the work is simple enough, then this too may be acceptable.
In the meantime, add new types as enums and keep old types as ints.
The best outcome would be if you could just fix the published versions, if possible. In my opinion consistency would be the best solution, so you would need to do some refactoring. I personally don't like deprecated things, because they get in the way. You might be able to wait until a bigger version release and use those ints until then, then refactor everything in one big project. If that is not possible, you might consider yourself stuck with the ints, unless you create some kind of wrapper or something.
If nothing helps but you still evolve the code, you end up losing consistency or living with the deprecated versions. In any case, usually at some point people become fed up with old stuff that has lost its consistency and rewrite it from scratch, so you would face the refactoring in the future no matter what.
The customer might scrap the project and buy another product if something goes wrong. Usually it is not the customer's problem whether you can afford refactoring or not; they just buy what is appropriate and usable to them. So in the end it is a tricky problem, and care needs to be taken.