Java toString for debugging or actual logical use

This might be a very basic question, apologies if this was already asked.
Should toString() in Java be used for actual program logic, or is it only for debugging and human reading? My basic question is: should I be using toString(), or should I write a different method called asString(), when I need the string representation in the actual program flow?
The reason I ask is that I have a bunch of classes in a web service that rely on toString() to work correctly. In my opinion something like asString() would have been safer.
Thanks

Except for a few specific cases, toString() should be used for debugging, not for the production flow of data.
The method has several limitations which make it less suitable for use in production data flow:
Taking no parameters, the method does not let you easily alter the string representation in response to the environment. In particular, it is difficult to format the string in a way that is sensitive to the current locale.
Being part of the java.lang.Object class, this method is commonly overridden by subclasses. This may be harmful in situations where you depend on a particular representation, because the writers of the subclass may have no idea of your restrictions.
The obvious exceptions to this rule are toString methods of the StringBuilder and the StringBuffer classes, because these two methods simply make an immutable string from the mutable content of the corresponding object.
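The first limitation can be made concrete with a minimal sketch. The Temperature class and its format(Locale) method are hypothetical names invented for this illustration. toString() stays a debugging aid, while a separate method that takes a parameter produces the string used in program flow:

```java
import java.util.Locale;

// Hypothetical value class, used only for illustration.
final class Temperature {
    private final double celsius;

    Temperature(double celsius) {
        this.celsius = celsius;
    }

    // Debug-oriented representation; its format may change at any time.
    @Override
    public String toString() {
        return "Temperature[celsius=" + celsius + "]";
    }

    // Representation intended for program flow: the caller supplies the
    // locale, which the parameterless toString() cannot accept.
    String format(Locale locale) {
        return String.format(locale, "%.1f", celsius);
    }
}

class TemperatureDemo {
    public static void main(String[] args) {
        Temperature t = new Temperature(21.5);
        System.out.println(t);                        // debug output
        System.out.println(t.format(Locale.US));      // decimal point
        System.out.println(t.format(Locale.GERMANY)); // decimal comma
    }
}
```

Because format(Locale) is a separate, documented method, a subclass author who overrides toString() for logging does not silently change what the rest of the program receives.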

It is not just for debugging or human reading; it really depends on the context in which the object is being used. For example, if you have a table which displays some object X, then you may want the table to show a readable textual representation of X, in which case you would usually implement the toString() method. This is of course a basic example, but there are many situations in which implementing toString() is a good idea.

Related

Represent email, telephonenumber and id's as POJO's instead of Strings

I have a typical business web application where the domain contains entities like accounts and users. My backend is Java, so they're represented by POJO's. During an early iteration, every attribute of those POJO's was just a string. This made sense because the HTML input was a string, and the way the data is persisted in the DB is also similar to a string.
Recently, we've been working on validating this kind of input, and I found it helps if I switch over to an object notation for this kind of attribute. For example, a TelephoneNumber class consists of:
(int) country calling code
(string) rest of number
(static char) the character to prefix the country calling code (in our case this is a +)
(static pattern) regular expression to match if phonenumber is sensical
methods to compare and validate telephone numbers.
This class has advantages and disadvantages:
not good: Additional object creation and conversion between string/object
good: OOP and all logic regarding telephone numbers is bundled in one class (high cohesion),
good: whenever a telephone number is needed as an argument for a method or constructor, java's strict typing makes it very clear we're not just dealing with a random string.
Compare the possible confusing double strings:
public User(String name, String telephoneNumber)
vs the clean OOP way:
public User(String name, TelephoneNumber telephoneNumber)
I think in this case the advantages outweigh the disadvantages. My concern is now for the following two attributes:
-id's (like b3e99627-9754-4276-a527-0e9fb49d15bb)
-e-mail addresses
These "objects" are really just a single string. It seems overkill to turn them into objects. Especially the user.getMail().getMailString() kind of methods really bother me, because I know the mailString is the only attribute of mail. However, if I don't turn them into objects, I lose some of the advantages.
So my question is: How do you deal with these concepts in a web application? Are there best practices or other concerns to take into account?
If you use Strings for everything you are essentially giving up type safety, and you have to "type check" with validation in any class or method where the string is used. Inevitably this validation code gets duplicated and makes other classes bloated, confusing, and potentially inconsistent because the validation isn't the same in all places. You can never really be sure what the string holds, so debugging becomes more difficult, maintenance gets ugly, and ultimately it wastes lots of developer time. Given the power of modern processors, you shouldn't worry about the performance cost of using lots of objects because it's not worth sacrificing programmer productivity (in most cases).
One other thing that I have found is that string variables tend to be more easily abused by future programmers who need to make a "quick fix", so they'll set new values for convenience just where they need them, instead of extending a type and making it clear what's going on.
My recommendation is to use meaningful types everywhere possible.
Maximizing the benefit of typing leads to the idea of "tiny types", which you can read about here: http://darrenhobbs.com/2007/04/11/tiny-types/
Essentially it means you make classes to represent everything. In your example with the User class, that would mean you would also make a Name class to represent the name. Inside that class you might also have two more classes, FirstName and LastName. This adds clarity to your code, and maximizes the number of logical errors the compiler stops you from making. In most cases you would never use a first name where you want a last name and vice versa.
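As a rough sketch of the tiny-types idea (the class names FirstName, LastName and PersonName are made up for this illustration and are not taken from the linked article), each concept gets its own small wrapper class so that the compiler can tell them apart:

```java
// Hypothetical tiny types; each one wraps a single String.
final class FirstName {
    private final String value;

    FirstName(String value) {
        this.value = value;
    }

    String value() {
        return value;
    }
}

final class LastName {
    private final String value;

    LastName(String value) {
        this.value = value;
    }

    String value() {
        return value;
    }
}

// A name composed of the two tiny types above.
final class PersonName {
    private final FirstName first;
    private final LastName last;

    PersonName(FirstName first, LastName last) {
        this.first = first;
        this.last = last;
    }

    String display() {
        return first.value() + " " + last.value();
    }
}

class TinyTypesDemo {
    public static void main(String[] args) {
        PersonName name = new PersonName(new FirstName("Ada"), new LastName("Lovelace"));
        System.out.println(name.display());
        // Swapping the arguments is rejected at compile time:
        // new PersonName(new LastName("Lovelace"), new FirstName("Ada"));
    }
}
```

With two plain String parameters the swapped call would compile and fail only at runtime, if it were noticed at all.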
One of the biggest advantages of objects is the fact that they can have methods. For example, all your data objects (phone number, address, email etc.) can implement the same interface IValidatable with a validate method, which does the obvious. In this case, it would make sense to wrap email in an object as well, since we do want to validate emails. Regarding the ID: assuming it's assigned internally by your app, you probably don't need to wrap it.
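A minimal sketch of that shared interface might look as follows. The interface name IValidatable comes from the answer above; the field layout of TelephoneNumber follows the question, while the regular expressions, the asString() method and the Validation helper are simplifying assumptions made for this example:

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

interface IValidatable {
    boolean validate();
}

final class EmailAddress implements IValidatable {
    // Deliberately simplistic pattern, for illustration only.
    private static final Pattern SHAPE = Pattern.compile("[^@\\s]+@[^@\\s]+");
    private final String value;

    EmailAddress(String value) {
        this.value = value;
    }

    @Override
    public boolean validate() {
        return value != null && SHAPE.matcher(value).matches();
    }
}

final class TelephoneNumber implements IValidatable {
    private static final char PREFIX = '+';
    // Assumed rule: the rest of the number is 4 to 14 digits.
    private static final Pattern DIGITS = Pattern.compile("\\d{4,14}");
    private final int countryCode;
    private final String rest;

    TelephoneNumber(int countryCode, String rest) {
        this.countryCode = countryCode;
        this.rest = rest;
    }

    @Override
    public boolean validate() {
        return countryCode > 0 && rest != null && DIGITS.matcher(rest).matches();
    }

    // String form for program flow, kept separate from toString().
    String asString() {
        return String.valueOf(PREFIX) + countryCode + rest;
    }
}

final class Validation {
    // Works for any mix of validatable values.
    static boolean allValid(List<? extends IValidatable> items) {
        for (IValidatable item : items) {
            if (!item.validate()) {
                return false;
            }
        }
        return true;
    }
}

class ValidationDemo {
    public static void main(String[] args) {
        List<IValidatable> input = Arrays.asList(
                new EmailAddress("alice@example.com"),
                new TelephoneNumber(32, "471234567"));
        System.out.println(Validation.allValid(input));
    }
}
```

The validation rules now live in one place per type, so callers only ask whether a value is valid and never repeat the checks themselves.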

Should you avoid Guavas Ordering.usingToString()?

This question was prompted after reading Joshua Bloch's "Effective Java". Specifically in Item #10, he argues that it is bad practice to parse an object's string representation and use it for anything except a friendlier printout/debug. The reason is that such a use "is error-prone, results in fragile systems that break if you change the format".
To me it looks like Guava's Ordering.usingToString() is a spot on example of this. So is it bad practice to use it?
Well, if the sorting is only used for deciding in which order to display things to a user, I'd argue it's part of "friendlier printout/debug".
If, however, your code's correctness depends on the ordering, then I'd argue that it's indeed a bad idea to depend on toString.
As the author of that method, I would agree: it's really just a crutch for those "look, I just need an Ordering<Object>, dammit" cases. It should probably be removed, since you can get its behavior with Ordering.natural().onResultOf(Functions.toStringFunction()) anyway.
If your program ever uses toString() for lexical sorting in such a way that program execution depends on it, then it would be wise to override the default toString() in the class being sorted. In that case you should make the toString() method final and clearly document that it is used for ordering.
It would however be much better to create another method returning a String and create an ordering depending on that result, possibly by creating a specific Comparator to do the sorting. See for instance the final method name() used for enumerations in Java. In general it creates the same String as toString() but it is still possible to perform ordering with it even if toString() has been overridden.
If you use the last method, then the Ordering.usingToString() would not be of much use of course.
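The alternative described in this answer, a dedicated method plus a specific Comparator, could be sketched like this using only the JDK. The Product class, its sortKey() method and the BY_SORT_KEY constant are hypothetical names chosen for the example:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical class whose ordering must not depend on toString().
final class Product {
    private final String code;
    private final String label;

    Product(String code, String label) {
        this.code = code;
        this.label = label;
    }

    // Free to change for display or debugging purposes.
    @Override
    public String toString() {
        return "Product(" + label + ")";
    }

    // Stable key, documented as the basis for ordering.
    String sortKey() {
        return code;
    }
}

final class ProductOrdering {
    static final Comparator<Product> BY_SORT_KEY = Comparator.comparing(Product::sortKey);
}

class ProductOrderingDemo {
    public static void main(String[] args) {
        List<Product> products = new ArrayList<>(Arrays.asList(
                new Product("B-2", "Apple"),
                new Product("A-1", "Zebra")));
        products.sort(ProductOrdering.BY_SORT_KEY);
        System.out.println(products);
    }
}
```

Sorting by toString() would put "Apple" first here; sorting by the dedicated key puts code A-1 first, and that result stays the same if the display format is later changed.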
There are some obvious cases where it actually makes sense like StringBuffer etc. Obviously it doesn't make sense for most "business" classes to depend on toString().

Why is String.length() a method?

If a String object is immutable (and thus obviously cannot change its length), why is length() a method, as opposed to simply being public final int length such as there is in an array?
Is it simply a getter method, or does it make some sort of calculation?
Just trying to see the logic behind this.
Java is a standard, not just an implementation. Different vendors can license and implement Java differently, as long as they adhere to the standard. By making the standard call for a field, that limits the implementation quite severely, for no good reason.
Also a method is much more flexible in terms of the future of a class. It is almost never done, except in some very early Java classes, to expose a final constant as a field that can have a different value with each instance of the class, rather than as a method.
The length() method well predates the CharSequence interface; it probably dates from the first version of Java. Look how well that worked out. Years later, without any loss of backwards compatibility, the CharSequence interface was introduced and fit in nicely. This would not have been possible with a field.
So let's really invert the question (which is what you should do when you design a class intended to remain unchanged for decades): What does a field gain here, why not simply make it a method?
This is a fundamental tenet of encapsulation.
Part of encapsulation is that the class should hide its implementation from its interface (in the "design by contract" sense of an interface, not in the Java keyword sense).
What you want is the String's length -- you shouldn't care if this is cached, calculated, delegates to some other field, etc. If the JDK people want to change the implementation down the road, they should be able to do so without you having to recompile.
Perhaps a .length() method was considered more consistent with the corresponding method for a StringBuffer, which would obviously need more than a final member variable.
The String class was probably one of the very first classes defined for Java, ever. It's possible (and this is just speculation) that the implementation used a .length() method before final member variables even existed. It wouldn't take very long before the use of the method was well-embedded into the body of Java code existing at the time.
Perhaps because length() comes from the CharSequence interface. A method is a more sensible abstraction than a variable if it's going to have multiple implementations.
You should always use accessor methods in public classes rather than public fields, regardless of whether they are final or not (see Item 14 in Effective Java).
When you allow a field to be accessed directly (i.e. it is public), you lose the benefit of encapsulation, which means you can't change the representation without changing the API (you break people's code if you do) and you can't perform any action when the field is accessed.
Effective Java provides a really good rule of thumb:
If a class is accessible outside its package, provide accessor methods, to preserve the flexibility to change the class's internal representation. If a public class exposes its data fields, all hope of changing its representation is lost, as client code can be distributed far and wide.
Basically, it is done this way because it is good design practice to do so. It leaves room to change the implementation of String at a later stage without breaking code for everyone.
String is using encapsulation to hide its internal details from you. An immutable object is still free to have mutable internal values as long as its externally visible state doesn't change. Length could be lazily computed. I encourage you to take a look at String's source code.
Checking the source code of String in OpenJDK, it's only a getter.
But as @SteveKuo points out, this could differ depending on the implementation.
In most current JVM implementations, a substring references the char array of the original String for its content, so each String needs start and length fields to define its own content, and the length() method acts as a getter. However, this is not the only possible way to implement String.
In a different possible implementation, each String could have its own char array. Since char arrays already have a length field with the correct value, a separate length field on the String object would be redundant. Because String.length() is a method, we don't need that field and can just return the internal array.length.
These are two possible implementations of String, each with its own good and bad parts, and they can replace each other because the length() method hides where the length is stored (in the internal array or in its own field).
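The two layouts can be sketched with two small CharSequence implementations. SharedText and OwnedText are hypothetical classes written for this illustration; they are not how any particular JDK implements String:

```java
import java.util.Arrays;

// Layout 1: a view onto a shared array, so the length needs its own field.
final class SharedText implements CharSequence {
    private final char[] shared;
    private final int offset;
    private final int count;

    SharedText(char[] shared, int offset, int count) {
        if (offset < 0 || count < 0 || offset + count > shared.length) {
            throw new IndexOutOfBoundsException();
        }
        this.shared = shared;
        this.offset = offset;
        this.count = count;
    }

    @Override
    public int length() {
        return count; // plain getter for the stored field
    }

    @Override
    public char charAt(int index) {
        if (index < 0 || index >= count) {
            throw new IndexOutOfBoundsException();
        }
        return shared[offset + index];
    }

    @Override
    public CharSequence subSequence(int start, int end) {
        if (start < 0 || end > count || start > end) {
            throw new IndexOutOfBoundsException();
        }
        return new SharedText(shared, offset + start, end - start);
    }

    @Override
    public String toString() {
        return new String(shared, offset, count);
    }
}

// Layout 2: every instance owns its array, so no length field is needed.
final class OwnedText implements CharSequence {
    private final char[] chars;

    OwnedText(char[] chars) {
        this.chars = chars.clone();
    }

    @Override
    public int length() {
        return chars.length; // delegates to the array
    }

    @Override
    public char charAt(int index) {
        return chars[index];
    }

    @Override
    public CharSequence subSequence(int start, int end) {
        if (start < 0 || end > chars.length || start > end) {
            throw new IndexOutOfBoundsException();
        }
        return new OwnedText(Arrays.copyOfRange(chars, start, end));
    }

    @Override
    public String toString() {
        return new String(chars);
    }
}
```

Code that only calls length() cannot tell the two apart, which is exactly the freedom a public final field would have taken away.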

Provide Programmatic Access to All Data Available in String Form: toString()

Bloch said: Provide Programmatic Access to All Data Available in String Form.
I am wondering if he means that we should override toString() so that it includes 'all data available'?
I think 'in string form' means that the string is for human reading, so overriding toString() is enough to follow the advice. Am I correct?
No, apparently he meant quite the opposite of that. If a data member is available as part of the toString() output (or other string methods of the class), Bloch's fear is that developers using the API will rely on that and parse the strings to get at the underlying data values. His advice is to provide specific accessors for those data elements, to prevent developers from relying on the format of toString()'s output.
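A small sketch of that advice, using a hypothetical Coordinate class invented for this example: every value that shows up in the toString() output is also reachable through an accessor, so client code never has to parse the string.

```java
// Hypothetical class; both values printed by toString() have accessors.
final class Coordinate {
    private final double latitude;
    private final double longitude;

    Coordinate(double latitude, double longitude) {
        this.latitude = latitude;
        this.longitude = longitude;
    }

    double latitude() {
        return latitude;
    }

    double longitude() {
        return longitude;
    }

    // Human-readable only; the format is not part of the contract.
    @Override
    public String toString() {
        return "Coordinate(" + latitude + ", " + longitude + ")";
    }
}

final class CoordinateClient {
    // Uses the accessor rather than splitting toString() on commas.
    static boolean isNorthern(Coordinate c) {
        return c.latitude() > 0;
    }
}

class CoordinateDemo {
    public static void main(String[] args) {
        Coordinate amsterdam = new Coordinate(52.37, 4.89);
        System.out.println(amsterdam);
        System.out.println(CoordinateClient.isNorthern(amsterdam));
    }
}
```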

What methods and interfaces do you (almost) always implement in classes?

Which methods and interfaces do you always implement in your classes?
Do you always override equals()? If you do, do you also override hashCode()? toString()? Do you make it a habit to implement the Comparable interface?
I've just written some code where I needed to implement compareTo() and override equals() to get my program to work in a sane manner; I now start seeing ways of using these everywhere...
What do y'all think?
I usually don't implement things in advance unless I need them.
If my class contains data members and I plan to store it somewhere, I will usually implement equals, hashCode, and Comparable.
However, I found that most of my classes do not have this issue, so there's no point in doing it. For example, if your class revolves around functionality on other objects rather than data, why bother? If there is only one instance, or the instances are organized hierarchically (e.g., a GUI widget or window), why bother?
Don't implement things you don't need, but always make sure to check whether they are needed or not because Java will generally not warn you.
Also, make sure to use your IDE or something like Apache commons to generate these functions. There is rarely a need to hand-code them.
As for toString, I rarely implement it until I find myself debugging and needing a better presentation in the Eclipse debugger (e.g., instead of the object ID). I am afraid of implicit conversions and never use toString when generating output.
(Almost) Always toString().
It is usually helpful for debugging purposes.
If you override equals, you (almost always) have to override hashCode. hashCode's contract is that two objects that are equal must have the same hash code. If you override equals such that equality is based on something other than object identity, then unless you also override hashCode it's possible for two objects to be equal to each other but have different hash codes.
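A minimal sketch of the contract, assuming a hypothetical AccountId class made up for this example. Both methods are derived from the same field, so equal objects always hash alike and hash-based collections can find them:

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical value class: equality is based on the wrapped string.
final class AccountId {
    private final String value;

    AccountId(String value) {
        this.value = value;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof AccountId)) {
            return false;
        }
        return value.equals(((AccountId) other).value);
    }

    // Derived from the same field as equals().
    @Override
    public int hashCode() {
        return value.hashCode();
    }
}

class AccountIdDemo {
    public static void main(String[] args) {
        Set<AccountId> ids = new HashSet<>();
        ids.add(new AccountId("b3e99627"));
        // A distinct but equal instance is found in the set.
        System.out.println(ids.contains(new AccountId("b3e99627")));
    }
}
```

If hashCode() were left out, the second instance would most likely land in a different bucket and the lookup would report false even though equals() says the two objects are equal.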
I think you should never implement things you don't need, or are not sure you are going to need them or not. If it doesn't add value to your code, don't put it in. If you like to keep your (unit) tests in synch with your code, and use them to show use cases of your code, then you shouldn't have anything that is not covered by those tests. This includes equals(), hashCode(), compareTo() etc.
The problem I see, other than a possible waste of time, is that it would confuse someone who reads the code. "Why does this class have equals implemented? Is it some data value? Can it be a part of a collection? Does it even make sense to compare instances of this class?"
So I'd say only implement these when you actually need them. Therefore I can't say that I always implement this or that method. Perhaps toString() would be the method that I write the most, because its usefulness shows up a lot in debugging.
Almost always toString(), it's a pain to be debugging and read something about object Class#123456
equals() and hashCode() when needed, but always both or neither.
The Iterable interface is useful on collection-like classes, and will usually just return something like innerCollection.iterator(). Comparable can be useful too.
Also, our company created some interfaces I use a lot, like Displayable (like toString, but it gives more or another type of info, e.g. for logging) and ParseLocatable (for things that come from a file we parse, where we want to see in which file and on which line, for example, a specific rule was defined; a little like stack traces).
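The Iterable suggestion from this answer can be sketched as follows. Playlist is a hypothetical collection-like class invented for the example; its iterator() simply hands out the inner collection's iterator:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical collection-like class that delegates iteration.
final class Playlist implements Iterable<String> {
    private final List<String> tracks = new ArrayList<>();

    void add(String track) {
        tracks.add(track);
    }

    @Override
    public Iterator<String> iterator() {
        return tracks.iterator();
    }
}

class PlaylistDemo {
    public static void main(String[] args) {
        Playlist playlist = new Playlist();
        playlist.add("Intro");
        playlist.add("Outro");
        // Implementing Iterable enables the enhanced for loop.
        for (String track : playlist) {
            System.out.println(track);
        }
    }
}
```

One thing to keep in mind with this direct delegation is that callers can also use the returned iterator's remove() on the inner list; wrapping the list in an unmodifiable view is a possible variation if that is unwanted.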
Effective Java has a chapter on how and when to implement toString, equals, hashCode, Comparable, etc. Highly recommended reading.
toString() is sometimes really helpful for testing purposes when you're too lazy to write Unit tests, also comes in handy for watches while debugging.
But I wouldn't recommend implementing Comparable in every object. It's nice sometimes, but use it wisely or you'll end up with loads of code that you don't actually need.
Ditto for toString() and its variants in different languages and runtimes, but I'd also like to point you towards Ned Batchelder's article on stringification, which is a good read and is close to my reasoning for doing so.
For business CRUD applications, I always override ToString. This helps when binding a List(Of T) to a WinForm control. For example, overriding ToString in a Customer object to return _name will then automatically show the customer name value when binding a List(Of Customer) to a ListBox control. Comes in handy.
I usually implement the compareTo method as well as the toString method. It's generally good to know how one instance of the class compares to another instance for sorts and searches. Also, an overridden toString method is great for debugging. You can see the content of the class (not just the memory location) presented in a way that makes sense for the class you have written.
On objects that are used primarily for holding data ("rocks"), I find toString and the equals/hashCode contract to be invaluable. This is because rocks are typically passed into and extracted out of collections all the time, most notably the Hash(Set/Map) collections, which require the equals and hashCode contract, and it is very easy to see these objects in a debugger if toString is implemented. When implementing toString, I always use Apache Commons' ToStringBuilder class to display all of my properties; that way it is very easy to read the output. I am never concerned about "implicit conversion": toString is not meant to be used as anything but a human-readable string, and the fact that toString can be used on Number subclasses to convert to and from strings is really just a quirk. Production code should never rely on the toString method to convert the object to a string representation, because that's not what it's for. It's for a human-readable string representation, so a different method should be defined if a string representation that is meant for code rather than for humans is desired.
For data value classes I have an AbstractPojo class which uses reflection to implement equals, hashCode, toString and asMap()
I extend this class for all my data value objects so I don't implement this each time.
I don't override ToString but I apply sometimes the DebuggerDisplay attribute which does the same for the debugging purposes and does not put overhead on the release version.
I also found myself overriding the ToString() method a lot. Especially during development. Although code generators help, it becomes quite annoying to have to change it every time you rename a class member.
Actually, I got so annoyed that I tried to find a remedy. This is what I came up with:
Creates a string of this format: MemberType MemberName=MemberValue
Usage:
string testMember = "testing";
Console.WriteLine(Member.State(() => testMember));
Writes ' string testMember="testing" ' to the Console.
Here it is:
using System;
using System.Reflection;

public static class Member
{
    // Builds "MemberType MemberName=MemberValue" for the member read by the lambda.
    public static string State<T>(Func<T> expr)
    {
        var member = ExtractMemberFromLambdaExpression(expr);
        Type memberType = GetTypeOfMember(member);
        string contents = ExtractContentsFromLambdaExpression(expr);
        return string.Format("{0} {1}={2}", memberType.Name, member.Name, contents);
    }

    static string ExtractContentsFromLambdaExpression<T>(Func<T> expr)
    {
        if (expr() == null) {
            return "NULL";
        }
        string contents = string.Empty;
        if (expr().GetType().IsArray) {
            foreach (var item in (expr() as Array)) {
                // ToStringNullSafe() is not defined in this snippet;
                // presumably it is a custom extension method.
                contents += item.ToStringNullSafe() + ", ";
            }
            contents = contents.Trim().TrimEnd(',');
        } else {
            contents = expr().ToString();
        }
        return contents;
    }

    static MemberInfo ExtractMemberFromLambdaExpression<T>(Func<T> expr)
    {
        // get IL code behind the delegate
        var il = expr.Method.GetMethodBody().GetILAsByteArray();
        // bytes 2-5 represent the member handle
        var memberHandle = BitConverter.ToInt32(il, 2);
        // resolve the handle
        return expr.Target.GetType().Module.ResolveMember(memberHandle);
    }

    static Type GetTypeOfMember(MemberInfo member)
    {
        Type memberType;
        if (member.MemberType == MemberTypes.Field) {
            memberType = GetFieldType(member as FieldInfo);
        }
        else if (member.MemberType == MemberTypes.Property) {
            memberType = GetPropertyType(member as PropertyInfo);
        }
        else {
            memberType = typeof(object);
        }
        return memberType;
    }

    static Type GetFieldType(FieldInfo fieldInfo)
    {
        return fieldInfo.FieldType;
    }

    static Type GetPropertyType(PropertyInfo propertyInfo)
    {
        return propertyInfo.PropertyType;
    }
}
A more thorough explanation and how to use it can be found on my blog, in the post about the Generic ToString() Method.
