Is making an empty string constant worth it? - java

I have a co-worker that swears by
//in a singleton "Constants" class
public static final String EMPTY_STRING = "";
in a constants class available throughout the project. That way, we can write something like
if (Constants.EMPTY_STRING.equals(otherString)) {
...
}
instead of
if ("".equals(otherString)) {
...
}
I say it's
not worth it--it doesn't save any space in the heap/stack/string pool,
ugly
abuse of a constants class.
Who is the idiot here?

String literals are interned by default, so no matter how many times you refer to "" in code, there will only be one empty String object. I don't see any benefit in declaring EMPTY_STRING. Otherwise, you might as well declare ONE, TWO, THREE, FOUR, etc. for integer literals.
Of course, if you want to change the value of EMPTY_STRING later, it's handy to have it in one place ;)
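The interning claim above can be checked with a few lines of Java. This is only a minimal sketch: the class name EmptyStringIdentityDemo and its method are made up for illustration, and EMPTY_STRING mirrors the constant from the question.

```java
// Minimal sketch: "" literals and a constant initialized from "" all refer to one pooled object.
class EmptyStringIdentityDemo {
    // Hypothetical constant, mirroring the one from the question.
    static final String EMPTY_STRING = "";

    static boolean literalsShareOneInstance() {
        String a = "";
        String b = "";
        // == compares references here, not contents.
        return a == b && a == EMPTY_STRING;
    }

    public static void main(String[] args) {
        System.out.println(literalsShareOneInstance()); // prints "true"
    }
}
```

So the constant buys no memory savings; whatever benefit it has is purely about readability.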

Why on earth would you want a global variable in Java? James Gosling really tried to get rid of them; don't bring them back, please.
Either
0 == possiblyEmptyString.length()
or
possiblyEmptyString.isEmpty() // Java 6 and later
are just as clear.

I much prefer seeing EMPTY_STRING.
It reads like English. "".equals 'reads' differently than EMPTY_STRING.equals.

Ironically the whole point of constants is to make them easily changeable. So unless your co-worker plans to redefine EMPTY_STRING to be something other than an empty string - which would be a really stupid thing to do - casting a genuine fixed construct such as "" to a constant is a bad thing.
As Dan Dyer says, it's like defining the constant ONE to be 1: it is completely pointless and would be utterly confusing - potentially risky - if someone redefined it.

Well, I could guess too, but I did a quick test... Almost like cheating...
An arbitrary string is checked using various methods. (several iterations)
The results suggest that isEmpty() is both faster and indeed more readable.
If isEmpty() is not available, length() is a good alternative.
Using a constant is probably not worth it.
"".equals(someString()) :24735 ms
t != null && t.equals("") :23363 ms
t != null && t.equals(EMPTY) :22561 ms
EMPTY.equals(someString()) :22159 ms
t != null && t.length() == 0 :18388 ms
t != null && t.isEmpty() :18375 ms
someString().length() == 0 :18171 ms
In this scenario;
"IAmNotHardCoded".equals(someString())
I would suggest defining a constant in a r e l e v a n t place, since a global class
for all constants really sucks. If there is no relevant place, you are probably doing something else wrong...
Customer.FIELD_SHOE_SIZE //"SHOE_SIZE"
Might be considered a relevant place where as;
CommonConstants.I__AM__A__LAZY__PROGRAMMER // true
is not.
For BigIntegers and similar things, I tend to end up defining a static final locally, like:
private final static BigDecimal ZERO = new BigDecimal(0);
private final static BigDecimal B100 = new BigDecimal("100.00");
That bugs me; wouldn't it be nice to have some sugar for BigIntegers and BigDecimals...

I'm with your coworker. While the empty string is hard to mistype, you can accidentally put a space in there and it may be difficult to notice when scanning the code. More to the point it is a good practice to do this with all of your string constants that get used in more than one place -- although, I tend to do this at the class level rather than as global constants.
FWIW, C# has a static property string.Empty for just this purpose and I find that it improves the readability of the code immensely.

As a tangent to the question, I generally recommend using a utility function when what you're really checking for is "no useful value" rather than, specifically, the empty string. In general, I tend to use:
import org.apache.commons.lang.StringUtils;
// Check if a String is whitespace, empty ("") or null.
StringUtils.isBlank(mystr);
// Check if a String is empty ("") or null.
StringUtils.isEmpty(mystr);
The concept being that the above two:
Check the various other cases, including being null safe, and (more importantly)
Conveys what you are trying to test, rather than how to test it.
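For readers without Commons Lang on the classpath, the difference between the two checks can be sketched with plain JDK calls. This is an illustrative approximation, not the library's actual implementation; the class name BlankChecks is hypothetical.

```java
// Minimal sketch of null-safe "empty" and "blank" checks (names are hypothetical).
class BlankChecks {
    // True for null or "".
    static boolean isEmpty(CharSequence cs) {
        return cs == null || cs.length() == 0;
    }

    // True for null, "", or a string made up only of whitespace characters.
    static boolean isBlank(CharSequence cs) {
        if (cs == null) {
            return true;
        }
        for (int i = 0; i < cs.length(); i++) {
            if (!Character.isWhitespace(cs.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isEmpty(" ")); // prints "false"
        System.out.println(isBlank(" ")); // prints "true"
    }
}
```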

David Arno states:
Ironically the whole point of constants is to make them easily changeable
This is simply not true. The whole point of constants is reuse of the same value and for greater readability.
It is very rare that constant values are changed (hence the name). It is more often that configuration values are changed, but persisted as data somewhere (like a config file or registry entry)
Since early programming, constants have been used to turn things like cryptic hex values such as 0xff6d8da412 into something humanly readable without ever intending to change the values.
const int MODE_READ = 0x000000FF;
const int MODE_EXECUTE = 0x00FF0000;
const int MODE_WRITE = 0x0000FF00;
const int MODE_READ_WRITE = 0x0000FFFF;

I don't like either choice. Why not if (otherString.length() == 0)
Edit: I actually always code
if (otherString == null || otherString.length() == 0)

yes--it offers no benefit.
depends on what you're used to, I'm sure.
No, it's just a constant--not an abuse.

The same argument comes up in .NET from time to time (where there's already a readonly static field string.Empty). It's a matter of taste - but personally I find "" less obtrusive.

Hehe, funny thing is:
Once it compiles, you won't see a difference (in the bytecode) between the "static final" thing and the string literal, as the Java compiler inlines a "static final String" that is initialized with a constant expression into the target class. Just change your empty string into something recognizable (like the LGPL text) and look at the resulting *.class file of code that references that constant. You will find your text copied into that class file.
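One way to observe the inlining without opening a class file is to compare references, since a concatenation of compile-time constants is itself folded into a single pooled literal. The sketch below assumes nothing beyond the JDK; the class and field names are hypothetical.

```java
// Minimal sketch: a static final String initialized from a literal is a compile-time constant;
// one initialized via a constructor call is not.
class ConstantInliningDemo {
    static final String GREETING = "hello";                     // constant expression
    static final String RUNTIME_GREETING = new String("hello"); // final, but not a constant expression

    static boolean constantIsFoldedAtCompileTime() {
        // The compiler folds GREETING + "!" into the literal "hello!", so the references match.
        return (GREETING + "!") == "hello!";
    }

    static boolean runtimeValueIsConcatenatedAtRunTime() {
        // This concatenation happens at run time and yields a new String object.
        return (RUNTIME_GREETING + "!") != "hello!";
    }

    public static void main(String[] args) {
        System.out.println(constantIsFoldedAtCompileTime());       // prints "true"
        System.out.println(runtimeValueIsConcatenatedAtRunTime()); // prints "true"
    }
}
```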

One case where it does make sense to have a constant with the value of an empty string is when the name captures the semantics of the value. For example:
if (Constants.FORM_FIELD_NOT_SET.equals(form.getField("foobar"))) {
...
}
This makes the code more self documenting (apart from the argument that a better design is to add the method checking whether a field is set to the form itself).

We just do the following for situations like this:
public class StaticUtils
{
public static boolean empty(CharSequence cs)
{
return cs == null || cs.length() == 0;
}
public static boolean has(CharSequence cs)
{
return !empty(cs);
}
}
Then just import static StaticUtils.*

Hmm, the rules are right but are being taken in a different sense! Let's look at the cause. Firstly, object references in Java are compared for equality with equals(). Earlier on, in some languages, this was done with the '==' operator, and if someone accidentally typed '=' instead of '==', the result was a catastrophe. Now to the question of magic numbers/constants: to a computer, all constants/numbers are alike. Instead of 'int ONE=1' one can surely use 1, but does that hold true for double PI = 3.141...? What happens if someone wants to change the precision later?
If we were to come up with a checklist, wouldn't the rule be there to address the general guideline? All I mean to say is that rules are supposed to aid us; we can bend them only when we know them very well. Common sense prevails. As suggested by a friend of mine, program constants like 0/1 that denote exit conditions can be hard-coded, so the magic number principle doesn't apply to them. But those that participate in logical checks/rules are better kept as configurable constants.

Why it is preferable to use String.Empty in C# and therefore a public constant in other languages, is that constants are static, therefore only take up one instance in memory.
Every time you do something like this: -
stringVariable = "";
you are creating a new instance of a null string, and pointing to it with stringVariable.
So every time you make an assignment of "" to a variable (pointer), that "" null string is a new string instance until it no longer has any pointer assignments to it.
initializing strings by pointing them all to the same constant, means only one "" is ever created and every initialized variable points to the same null string.
It may sound trivial, but creating and destroying strings is much more resource intensive than creating pointers (variables) and pointing them to an existing string.
As string initialization is common, it is good practice to do: -
const String EMPTY_STRING = "";
String variable1 = EMPTY_STRING;
String variable2 = EMPTY_STRING;
String variable3 = EMPTY_STRING;
String variable4 = EMPTY_STRING;
String variable5 = EMPTY_STRING;
You have created 5 string pointers but only 1 string
rather than: -
String variable1 = "";
String variable2 = "";
String variable3 = "";
String variable4 = "";
String variable5 = "";
You have created 5 string pointers and 5 separate null strings.
Not a major issue in this case, but in thousands of lines of code in dozens of classes, it is unnecessary memory waste and processor use, creating another null string variable, when they can all point to the same one, making applications much more efficient.
Of course, compilers should be clever enough to determine several static strings and reuse duplicates, but why assume?
Also, it's less prone to introducing errors as "" and " " will both compile, yet you may miss the space you accidentally added which could produce spurious run time errors, for example conditional logic such as: -
myVariable = " ";
while (myVariable == ""){
...
}
Code inside the while block is unreachable because myVariable will not satisfy the condition on the first iteration. The error of initializing with " " instead of "" is easy to miss, whereas: -
myVariable = EMPTY_STRING;
while (myVariable == EMPTY_STRING){
...
}
... is less likely to cause runtime errors, especially as misspelling EMPTY_STRING would generate a compile error instead of having to catch the error at run time.
The cleanest solution, would be to create a static class that contains members of all kinds of string constants you need, should you require more than just an empty string.
public static class StringConstants{
public static String Empty = "";
public static String EMail = "mailto:%s";
public static String http = "http://%s";
public static String https = "https://%s";
public static String LogEntry = "TimeStamp:%tYmdHMSL | LogLevel:%s| Type:%s | Message: '%s'";
}
String myVariable = StringConstants.Empty;
You may even be able to extend the native String object, depending on your language.

If you ever wish to store "empty" strings in a nullable string column in Oracle, you will have to change the definition of EMPTY_STRING to be something other than ""! (I recall from the last time I was forced to use Oracle that it does not know the difference between an empty string and a null string).
However this should be done in your data access layer so the rest of the app does not know about it, and/or sort out your data model so you don’t need to store empty string AND null strings in the same column.

Or simply just have it as string.IsNullOrEmpty(otherString)

Related

Check list size with a magic number (1) or global constant?

I'm in an argument with a co-worker about the following code:
private static final byte ONE_ELEMENT = 1;
private boolean isListSizeEqualsOne(List<MyClass> myList) {
return myList.size() == ONE_ELEMENT;
}
I'm arguing that this kind of code admittedly reduces a warning about a magic number but unnecessarily increases clutter at the same time. I'm suggesting to inline the global variable instead:
private boolean isListSizeEqualsOne(List<MyClass> myList) {
return myList.size() == 1;
}
Is there any literature for / against this example?
I think the problem with the code is already in the method itself. Just like comments, a method name should not indicate what the code does, but why. In other words, it should indicate the functionality it provides, not its implementation.
That is, it should express the role that this method plays in the system. so instead of the name isListSizeEqualsOne, use a name that indicates the "why". For example resultIsUnique, or errorReturned (if you use an API where a list with a single element indicates an error).
Then the naming of the constant follows:
resultIsUnique: constant UNIQUE_RESULT_COUNT=1
errorReturned: constant ERROR_RESULT_COUNT=1
Finally, I don't think it is a good idea to enable warnings for inline constants. Using named constants for numbers only makes sense if either
the value must be the same everywhere (e.g. magic number for a file format), or
the value needs a name to be obvious, such as mathematical constants
If you need constants whose meaning is obvious (such as checking for an empty list by comparing the size to zero), then I think a plain inline value is perfectly ok.
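The naming advice above might look like this in practice. This is only a sketch under the answer's assumptions: the class name ResultChecks and the method hasNoResults are invented for illustration, while resultIsUnique and UNIQUE_RESULT_COUNT come from the answer.

```java
import java.util.List;

// Minimal sketch: name the method after its role, and keep obvious literals inline.
class ResultChecks {
    // The name explains why 1 matters here.
    private static final int UNIQUE_RESULT_COUNT = 1;

    static boolean resultIsUnique(List<?> results) {
        return results.size() == UNIQUE_RESULT_COUNT;
    }

    // The meaning of "empty" is obvious, so no named constant is needed.
    static boolean hasNoResults(List<?> results) {
        return results.isEmpty();
    }
}
```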

Is it wise to declare a String as final if I use it many times?

I have a method repeatedMethod like this:
public static void repeatedMethod() {
// something else
anotherMethod("myString");
// something else
}
public static void anotherMethod(String str) {
//something that doesn't change the value of str
}
and I call the repeatedMethod many times.
I would like to ask if it is wise to declare myString as static final outside that method like this:
public static final String s = "myString";
public void repeatedMethod() {
anotherMethod(s);
}
I think that when I do anotherMethod("myString"), a new instance of String is created. And since I do that many times, many instances of String are created.
Therefore, it might be better to create only one instance of String outside the repeatedMethod and use only that one every time.
What you are doing is right but for the wrong reason.
When you do anotherMethod("myString"), no new instance of String is actually going to be created: it might reuse a String from the String constant pool (refer to this question).
However, factoring common String values as constants (i.e. as private static final) is a good practice: if that constant ever needs to change, you only need to change it in one place of the source code (for example, if tomorrow, "myString" needs to become "myString2", you only have one modification to make)
String literals of the same text are identical, there won't be excessive object creation as you fear.
But it's good to put that string literal in a static final variable (a constant) with a descriptive name that documents the purpose of that string. It's generally a recommended practice to extract string literals to constants with good names.
This is especially true when the same string literal appears in more than one place in the code (a "magic string"), in which case it's strongly recommended to extract to a constant.
No, you don't need to do that. There is a "constant pool" in the JVM: every inline string (e.g. "myString") is implicitly treated as a constant, and identical inline strings are put in the constant pool just once.
for example ,for
String i="test",j="test";
there will be just one instance of constant variable "test" in the constant pool.
also refer to
http://www.thejavageek.com/2013/06/19/the-string-constant-pool/
Optimize for clarity before worrying about performance. String literals are only created once, ever (unless the literal is unloaded). Performance is usually less important, and in this case it is irrelevant.
You should define a constant instead of repeating the same String, to make it clear that these strings don't just happen to be the same but must be the same. Otherwise, if someone maintaining the code later has to modify one String, should they all be changed or not?
BTW When you optimize for clarity you are also often optimizing for performance. The JIT looks for common patterns and if you try to outsmart the optimizer you are more likely to confuse it, resulting in less optimal code.

The "Why" behind PMD's StringInstantiation rule

Along the lines of an existing thread, The “Why” behind PMD's rules, I'm trying to figure out the meaning of one particular PMD rule : String and StringBuffer Rules.StringInstantiation.
This rule states that you shouldn't explicitly instantiate String objects. As per their manual page :
Avoid instantiating String objects; this is usually unnecessary since
they are immutable and can be safely shared.
This rule is defined by the following Java class: net.sourceforge.pmd.lang.java.rule.strings.StringInstantiationRule
Example(s):
private String bar = new String("bar"); // just do a String bar = "bar";
http://pmd.sourceforge.net/pmd-5.0.1/rules/java/strings.html
I don't see how this syntax is a problem, other than it being pointless. Does it affect overall performance?
Thanks for any thought.
With String foo = "foo" there will be one instance of "foo" in PermGen space (this is referred to as string interning). If you were to later type String bar = "foo" there would still only be one "foo" in the PermGen space.
Writing String foo = new String( "foo" ) will also create a String object to count against the heap.
Thus, the rule is there to prevent wasting memory.
Cheers,
It shouldn't usually affect performance in any measurable way, but:
private String bar = new String("bar"); // just do a String bar = "bar";
If you execute this line a million times you will have created a million objects
private String bar = "bar"; // just do a String bar = "bar";
If you execute this line a million times you will have created one Object.
There are scenarios where that actually makes a difference.
Does it affect overall performance?
Well, performance and maintenance. Doing something which is pointless makes the reader wonder why the code is there in the first place. When that pointless operation also involves creating new objects (two in this case - a new char[] and a new String) that's another reason to avoid it...
In the past, there has been a reason to call new String(existingString) if the existing string was originally obtained as a small substring of a longer string - or other ways of obtaining a string backed by a large character array. I believe that this is not the case with more recent implementations of Java, but obviously you can still be using an old one. This shouldn't be a problem for constant strings anyway, mind you.
(You could argue that creating a new object allows you to synchronize on it. I would avoid synchronizing on strings to start with, though.)
One difference is the memory footprint:
String a = "abc"; //one object
String b = "abc"; //same object (i.e. a == b) => still one object in memory
String c = new String("abc"); // This is a new object - now 2 objects in memory
To be honest, the only reason I can think of, why one would use the String constructor is in combination with substring, which is a view on the original string. Using the String constructor in that case helps getting rid of the original string if it is not needed any longer.
However, since Java 7u6, this is no longer the case, so I don't see any reason to use it any more.
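The object counts described in the answers above can be verified with reference comparisons. A minimal sketch, with a hypothetical class name and method names:

```java
// Minimal sketch: literals share one pooled object, new String(...) always creates another.
class StringInstantiationDemo {
    static boolean literalsAreSameObject() {
        String a = "abc";
        String b = "abc";
        return a == b;
    }

    static boolean constructorCreatesDistinctObject() {
        String a = "abc";
        String c = new String("abc");
        // Equal contents, different identity.
        return a != c && a.equals(c);
    }

    static boolean internReturnsPooledObject() {
        // intern() maps the new object back to the pooled literal.
        return new String("abc").intern() == "abc";
    }

    public static void main(String[] args) {
        System.out.println(literalsAreSameObject());            // prints "true"
        System.out.println(constructorCreatesDistinctObject()); // prints "true"
        System.out.println(internReturnsPooledObject());        // prints "true"
    }
}
```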
It can be useful, because it creates a new identity, and sometimes object identities are important/crucial to an application. For example, it can be used as an internal sentinel value. There are other valid use cases too, e.g. to avoid constant expression.
If a beginner writes such code, it's very likely a mistake. But that is a very short learning period. It is highly unlikely that any moderately experienced Java programmer would write that by mistake; it must be for a specific purpose. File it under "it looks like a stupid mistake, but it takes efforts to make, so it's probably intended".
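The sentinel use case mentioned above could look roughly like the following. This is a hypothetical sketch, not code from the answer: FieldLookup, NOT_SET, and the method names are all invented for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch: a sentinel whose identity cannot collide with any literal or caller-supplied value.
class FieldLookup {
    // new String("") is deliberately used so that NOT_SET != "" by reference.
    static final String NOT_SET = new String("");

    private final Map<String, String> values = new HashMap<>();

    void put(String name, String value) {
        values.put(name, value);
    }

    // Returns the stored value, or the sentinel if the field was never set.
    String get(String name) {
        return values.getOrDefault(name, NOT_SET);
    }

    // Identity comparison is intentional: an empty value that was really stored is not "not set".
    static boolean isNotSet(String value) {
        return value == NOT_SET;
    }
}
```

A field explicitly set to "" is then distinguishable from a missing one, even though both values are empty strings according to equals().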
It is
pointless
confusing
slightly slower
You should try to write the simplest, clearest code you can. Adding pointless code is bad all round.

What is the better approach to convert primitive data type into String

I can convert an integer into string using
String s = "" + 4; // correct, but poor style
or
String u = Integer.toString(4); // this is good
I can convert a double into string using
String s = "" + 4.5; // correct, but poor style
or
String u = Double.toString(4.5); // this is good
I can use the String s = "" + data approach to convert either an int or a double into a String, while if I want to use the other approach with toString(), I have to use the wrapper class of each data type. Why, then, do some books say that the first approach is poor and the second one is better? Which one is the better approach, and why?
I would use
String.valueOf(...)
You can use the same code for all types, but without the hideous and pointless string concatenation.
Note that it also says exactly what you want - the string value corresponding to the given primitive value. Compare that with the "" + x approach, where you're applying string concatenation even though you have no intention of concatenating anything, and the empty string is irrelevant to you. (It's probably more expensive, but it's the readability hit that I mind more than performance.)
How about String.valueOf()? It's overloaded for all primitive types and delegates to toString() for reference types.
String s = "" + 4;
Is compiled to this code:
StringBuffer _$_helper = new StringBuffer("");
_$_helper.append(Integer.toString(4));
String s = _$_helper.toString();
Obviously that is pretty wasteful.
Keep in mind that behind the scenes the compiler always uses StringBuffers if you use + in association with Strings.
There's a third one - String.valueOf(..) (which calls Wrapper.toString(..))
Actually the compiler adds Wrapper.toString(..) in these places, so it's the same in terms of bytecode. But ""+x is uglier.
The string-concatenation way creates an extra object (that then gets GCed), that is one reason why it's considered "poorer". Plus it is trickier and less readable, which as Jon Skeet points out is usually a bigger consideration.
I would also use String.valueOf() method, which, in effect, uses the primitive type's Wrapper object and calls the toString() method for you:
Example, in the String class:
public static String valueOf(int i) {
return Integer.toString(i, 10);
}
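A short sketch of the String.valueOf overloads in use, including the null-safe Object variant. The class ValueOfDemo and its method are hypothetical and exist only for illustration.

```java
// Minimal sketch: one method name covers primitives and references alike.
class ValueOfDemo {
    static String describe(int i, double d, boolean b, Object o) {
        // Each call resolves to the overload matching the argument's static type.
        return String.valueOf(i) + "|" + String.valueOf(d) + "|"
                + String.valueOf(b) + "|" + String.valueOf(o);
    }

    public static void main(String[] args) {
        // valueOf(Object) turns a null reference into the text "null" instead of throwing.
        System.out.println(describe(4, 4.5, true, null)); // prints "4|4.5|true|null"
    }
}
```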
Appending to an empty string is a bad way of doing it, especially for readability. I would consider using some of the Apache util classes for conversion, or writing your own utility methods for this type of stuff.
It is always better that you're aware of the type of argument you are trying to convert to string and also make compiler aware of the type. That simplifies the operation as well as the cycles. When you follow the append method, you are leaving the type decision to the compiler and also increasing the code lines for the compiler to do the same.
I think the answer really depends on what you're trying to convert, and for what purpose, but in general, I'm not a big fan of doing naked conversions, because in most instances, conversions to a string are for logging, or other human readability purposes.
MessageFormat.format("The value of XYZ object is {0}", object);
This gives good readability, fine grained control over the formatting of the output, and importantly, it can be internationalized by replacing the string with a message bundle reference.
Need I mention this also avoids the possible NPE problem of calling object.toString()?
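A minimal sketch of the MessageFormat approach, including the null case mentioned above. The class and method names are hypothetical; string arguments are used because number formatting depends on the default locale.

```java
import java.text.MessageFormat;

// Minimal sketch: formatting a value for humans without calling toString() on it directly.
class MessageFormatDemo {
    static String describe(Object value) {
        // A null argument is rendered as the text "null" rather than causing an exception.
        return MessageFormat.format("The value of XYZ object is {0}", value);
    }

    public static void main(String[] args) {
        System.out.println(describe("abc")); // prints "The value of XYZ object is abc"
        System.out.println(describe(null));  // prints "The value of XYZ object is null"
    }
}
```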

Should I set the initial java String values from null to ""?

Often I have a class as such:
public class Foo
{
private String field1;
private String field2;
// etc etc etc
}
This makes the initial values of field1 and field2 equal to null. Would it be better to have all my String class fields as follows?
public class Foo
{
private String field1 = "";
private String field2 = "";
// etc etc etc
}
Then, if I'm consistent with class definition I'd avoid a lot of null pointer problems. What are the problems with this approach?
That way lies madness (usually). If you're running into a lot of null pointer problems, that's because you're trying to use them before actually populating them. Those null pointer problems are loud obnoxious warning sirens telling you where that use is, allowing you to then go in and fix the problem. If you just initially set them to empty, then you'll be risking using them instead of what you were actually expecting there.
Absolutely not. An empty string and a null string are entirely different things and you should not confuse them.
To explain further:
"null" means "I haven't initialized this variable, or it has no value"
"empty string" means "I know what the value is, it's empty".
As Yuliy already mentioned, if you're seeing a lot of null pointer exceptions, it's because you are expecting things to have values when they don't, or you're being sloppy about initializing things before you use them. In either case, you should take the time to program properly - make sure things that should have values have those values, and make sure that if you're accessing the values of things that might not have value, that you take that into account.
Does it actually make sense in a specific case for the value to be used before it is set somewhere else, and to behave as an empty String in that case? i.e. is an empty string actually a correct default value, and does it make sense to have a default value at all?
If the answer is yes, setting it to "" in the declaration is the right thing to do. If not, it's a recipe for making bugs harder to find and diagnose.
I disagree with the other posters. Using the empty string is acceptable. I prefer to use it whenever possible.
In the great majority of cases, a null String and an empty String represent the exact same thing - unknown data. Whether you represent that with a null or an empty String is a matter of choice.
Generally it would be best to avoid this. A couple of reasons:
Getting a NullPointerException is generally a good warning that you are using a variable before you should be, or that you forgot to set it. Setting it to an empty string would get rid of the NullPointerException, but would likely cause a different (and harder to track down) bug further down the line in your program.
There can be a valid difference between null and "". A null value usually indicates that no value was set or the value is unknown. An empty string indicates that it was deliberately set to be empty. Depending on your program, that subtle difference could be important.
I know this is an old question but I wanted to point out the following:
String s = null;
s += "hello";
System.out.println(s); // prints nullhello
whereas
String s = "";
s += "hello";
System.out.println(s); // prints hello
Obviously, the real answer to this is that one should use StringBuffer rather than just concatenating strings, but as we all know, for some code it is simply easier to concatenate.
I would suggest neither.
Instead you should give your fields sensible values. If they don't have to change, I would make them final.
public class Foo {
private final String field1;
private final String field2;
public Foo(String field1, String field2) {
this.field1 = field1;
this.field2 = field2;
}
// etc.
}
No need to assign, i'm-not-initialised-yet values. Just give it the initial values.
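If the fields must never be null, the constructor approach above can be combined with the fail-fast argument made in the earlier answers. The following is a sketch under that assumption; StrictFoo and its accessors are hypothetical names.

```java
import java.util.Objects;

// Minimal sketch: reject null at construction time instead of masking it with "".
class StrictFoo {
    private final String field1;
    private final String field2;

    StrictFoo(String field1, String field2) {
        // Throws NullPointerException immediately, naming the offending field.
        this.field1 = Objects.requireNonNull(field1, "field1");
        this.field2 = Objects.requireNonNull(field2, "field2");
    }

    String field1() {
        return field1;
    }

    String field2() {
        return field2;
    }
}
```

An empty string remains a legal value here; only a missing value is rejected, and it is rejected at the point where the mistake is made.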
I would avoid doing this, you need to know if your instances aren't getting populated with data correctly.
Null is better, that is why they are called unchecked exceptions {Null pointer exception}. When the exception is thrown, it tells you that you have to initialize it to some non null value before calling any methods on it.
If you do
private String field1 = "";
You are trying to suppress the error. It is hard to find the bug later.
I think that when you use String s = null, only the variable "s" is created on the stack and no object exists on the heap, but as soon as you declare String s = ""; a "" object is created on the heap. As we know, Strings are immutable, so every time you assign a new value to a String variable, a new object is created on the heap. So I think String s = null is more efficient than String s = "";
Suggestions are welcome!!!!!
No way. Why do you want to do that? That will give incorrect results. null and "" are not the same.
