Programming practice for defining string constants in Java


My perception for defining string constants in Java is that one should define a string constant when the same string is used in multiple places. This helps reduce typos, reduces the effort of future changes to the string, and so on.
But what about strings that are used in only one place? Should we declare a string constant even in that case?
For example, logging some counter (random example):
CounterLogger.addCounter("Method.Requested" , 1)
Is there an advantage to declaring a constant rather than using a raw string?
Does the compiler do any optimization?

Declaring constants can improve your code because they can be more descriptive. In your example
CounterLogger.addCounter("Method.Requested" , 1)
The method parameter "Method.Requested" is quite self-describing, but the 1 is not; making it a constant would make this example more readable.
CounterLogger.addCounter("Method.Requested" , INITIAL_VALUE)
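A minimal sketch of the same idea with both arguments named. CounterLogger here is a hypothetical stand-in written for this example (the question does not show its implementation), and RequestHandler, METHOD_REQUESTED and get are made-up names.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the CounterLogger in the question; not a real library class.
final class CounterLogger {
    private static final Map<String, Integer> COUNTERS = new HashMap<>();

    private CounterLogger() { }

    // Adds delta to the named counter, creating it on first use.
    static void addCounter(String name, int delta) {
        COUNTERS.merge(name, delta, Integer::sum);
    }

    static int get(String name) {
        return COUNTERS.getOrDefault(name, 0);
    }
}

final class RequestHandler {
    // Named constants say what the values mean at the call site.
    static final String METHOD_REQUESTED = "Method.Requested";
    static final int INITIAL_VALUE = 1;

    void handle() {
        CounterLogger.addCounter(METHOD_REQUESTED, INITIAL_VALUE);
    }
}
```

Whether the string itself needs a name is a judgment call; the numeric argument is usually the one that benefits most.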

The way I see it, Strings can be used in one of two ways:
As properties / keys / enumerations - in other words, as an internal representation of other objects or states of your application, where one code component writes them and another one reads them.
In the UI - for GUI / console / logging display purposes.
I think it's easy to see how in both cases it's important to avoid hard-coding.
The first kind of string must (if possible) be stored as constants and exposed to whichever program components might use them for input/output.
Displayed strings (like in your Logger case) are strings that you might change at some point in the future. Having them all stored as static final fields in a dedicated constants class can make later modifications much easier, and help avoid duplicates of similar messages.
Regarding the optimization question - as others have already answered, I believe there's no significant difference.

Presumably, you'll want to write a unit test for whichever method contains that line of code. That unit test will need access to that String value. If you don't use a constant, you'll have the String repeated twice, and if you have to change it in the future, you'll have to change it in both places.
So best to use a constant, even though the compiler is not going to do any helpful optimisations.
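As a rough illustration of that argument, the sketch below lets the check refer to the same constant as the production code. GreetingService and its check are hypothetical names, and a plain method stands in for a real unit test so the example needs no test framework.

```java
final class GreetingService {
    // Package-private so a test in the same package can refer to it.
    static final String DEFAULT_GREETING = "Hello, guest";

    String greet(String name) {
        return (name == null || name.isEmpty()) ? DEFAULT_GREETING : "Hello, " + name;
    }
}

final class GreetingServiceCheck {
    // Plain-Java check standing in for a unit test method.
    static boolean returnsDefaultForMissingName() {
        return new GreetingService().greet(null).equals(GreetingService.DEFAULT_GREETING);
    }
}
```

If the greeting text changes, only the constant is edited and the check keeps passing.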

In my view, your case is fine as it is. If you can't see any advantage in declaring it as a constant, don't do it. To support this point, take a look at Spring's JdbcTemplate (I have no doubt that Spring code is a good example to follow); it is full of String literals like these:
Assert.notNull(psc, "PreparedStatementCreator must not be null");
Assert.notNull(action, "Callback object must not be null");
throw getExceptionTranslator().translate("StatementCallback", getSql(action), ex);
but only two constants
private static final String RETURN_RESULT_SET_PREFIX = "#result-set-";
private static final String RETURN_UPDATE_COUNT_PREFIX = "#update-count-";
Interestingly, this line
Assert.notNull(sql, "SQL must not be null");
is repeated 5 times in the code; nevertheless, the authors chose not to make it a constant.

Related

Is it a good idea to declare all text getting used for logging as public static final

Is it a good idea to declare all text used for logging as public static final, from a performance point of view or otherwise?
Does it have any advantage other than readability when a string is used only once?

First, the objective part of your question: is there a performance benefit from declaring a log message as static final, i.e.:
private static final String SUCCESS = "Success!";
//[...]
log.info(SUCCESS);
log.info(SUCCESS);
// versus:
log.info("Success!");
log.info("Success!");
The JLS states in section 3.10.5:
[A] string literal always refers to the same instance of class String. This is because string literals - or, more generally, strings that are the values of constant expressions (§15.28) - are "interned" so as to share unique instances, using the method String.intern.
So whether your string literal is declared once as a static final or appears multiple times in the source code, it will always be the same String instance, wherever it is used, and thus take up the same amount of memory, and will be accessed in exactly the same way. There will be no performance difference.
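The interning rule can be checked directly with reference comparison. This is only a sketch to make the JLS statement concrete (InternDemo is a made-up class); using == on strings is done here on purpose and is not how strings should normally be compared.

```java
final class InternDemo {
    private static final String SUCCESS = "Success!";

    static boolean literalIsSameInstanceAsConstant() {
        // Both are compile-time constant expressions, so they refer to one interned String.
        return SUCCESS == "Success!";
    }

    static boolean concatenatedConstantIsInterned() {
        // "Suc" + "cess!" is a constant expression, folded by the compiler into the same constant.
        return SUCCESS == "Suc" + "cess!";
    }

    static boolean runtimeStringIsDistinctUntilInterned() {
        // A string built at run time is a different object until intern() is called.
        String built = new StringBuilder("Suc").append("cess!").toString();
        return built != SUCCESS && built.intern() == SUCCESS;
    }
}
```

All three methods should return true, which is why naming the literal makes no difference to memory use or access cost.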
Now the other part of the question: is it a good idea? That is inherently subjective, but my opinion is that you should avoid declaring log messages as static final. Log messages add to the readability of the code, which is especially valuable when the code is being maintained by people who did not write it. For example:
log.warn(LOGIN_ERROR_OCCURRED, userId, attempt);
// compared to:
log.warn("Login failed for user {}; attempt {} of 5.", userId, attempt);
It's much quicker and easier to read the log message in the context of the code, rather than having to jump somewhere else in the code to see the full log message.

Easier internationalization and localization are possible advantages of using identifiers for string constants.
ResourceBundle bundle = ...
private final static String LOGIN_ERROR_OCCURRED = bundle.getString("Login failed for user {}; attempt {} of 5");
But the benefits of i18n/L10n for log messages may be questionable.
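A self-contained sketch of the resource-bundle approach, under a few assumptions: the bundle is an in-memory ListResourceBundle rather than a .properties file, the key "login.failed" is made up, and the message uses java.text.MessageFormat placeholders ({0}, {1}) instead of the logger-style {} shown above.

```java
import java.text.MessageFormat;
import java.util.ListResourceBundle;
import java.util.ResourceBundle;

// Hypothetical in-memory bundle; a real project would more likely use a .properties file.
final class LogMessages extends ListResourceBundle {
    @Override
    protected Object[][] getContents() {
        return new Object[][] {
            {"login.failed", "Login failed for user {0}; attempt {1} of 5."}
        };
    }
}

final class LoginAudit {
    private static final ResourceBundle BUNDLE = new LogMessages();

    // The constant holds the looked-up pattern, not the key.
    static final String LOGIN_ERROR_OCCURRED = BUNDLE.getString("login.failed");

    static String describeFailure(String userId, int attempt) {
        return MessageFormat.format(LOGIN_ERROR_OCCURRED, userId, attempt);
    }
}
```

Swapping in a translated bundle would change the message text without touching LoginAudit.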

Logging strings almost certainly should not be declared public, at least not usually. In most cases, it's better to declare constant variables for them, but sometimes you can be loose about that. Constants should appear together near the top of the class source, so for logging strings this gives a good overview of what happens in the class. It also makes it easier to find them for maintenance, like to edit out silly extraneous exclamation points. (Don't laugh; I've seen them.) I disagree that they obscure the point of log messages, unless you suck at naming variables. Which far too many people do.

what's wrong with this approach?

A new Code Review process has been put in place and now my team must not ever declare a string as a local variable, or the commit won't pass the code review. We are now to use constants instead.
So this is absolutely not allowed, even if we're dead sure the string will never be used in any other place
String operationId = "create";
This is what should be used instead:
private static final String OPERATION_ID = "create";
While I totally agree with using constants for strings that appear two or more times in the code, I just find it overkill to completely lose the ability to declare a string in place if it's used only once.
Just to make sure it's clear, all the following are NOT ALLOWED under any circumstances:
String div = "div1";
catch (Exception ex) { LOGGER.log("csv file is corrupt"); }
String concatenation: String str = "something ...." + someVar + "something" ... we are to replace someVar with %s, declare the whole thing as a global string, and then later use String.format(....)
if (name.equals("Audi")) {....}
String value = map.get("key");
Any ideas guys ? I want some strong arguments. I'm ready to embrace any stand that's backed by a good argument.
Thanks.

First, let's throw out your assumption: There's nothing inherently wrong with the approach described.
It's not about strings being used in more than one place, it's about constants being easy to find and documented, and your code being consistent.
private static final String OPERATION_ID = "create";
Really, this isn't used anywhere else? Nothing would break if I changed this to the string "beetlejuice"? If something would break, then something else is using this constant... If the "something else" happens to be a codebase in a different language, and that's why they don't share string constants-- that's the exception, not the rule. Consistency!
That said, there are a few things I would standardize in a slightly different manner, but I would still standardize them nonetheless:
I would suggest allowing string literals in the constructors of enums:
public enum Operation {
CREATE("create"),
...
}
because here, the enum is the constant that is being referenced in the code, not the string literal. Declaring the constant as an enum or as a private static final String are equivalent to me, and there's no need to do both.
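A fuller sketch of that enum, assuming a second, hypothetical DELETE value and a made-up fromId lookup for code that receives the raw string from outside:

```java
enum Operation {
    CREATE("create"),
    DELETE("delete"); // hypothetical second value, added only for illustration

    private final String id;

    Operation(String id) {
        this.id = id;
    }

    String id() {
        return id;
    }

    // Maps an external string back to the enum constant.
    static Operation fromId(String id) {
        for (Operation op : values()) {
            if (op.id.equals(id)) {
                return op;
            }
        }
        throw new IllegalArgumentException("Unknown operation id: " + id);
    }
}
```

Callers then write Operation.CREATE.id() and the literal "create" appears exactly once, in the enum declaration.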
Additionally, I would not use this pattern anywhere that it breaks your IDE's ability to warn you about missing strings-- For example, looking up strings from .properties files. Many IDEs will give you proper warnings when you look up a key in a .properties file that doesn't exist, but the extra level of indirection might break that depending upon how smart your IDE is.
catch (Exception ex) { LOGGER.log("csv file is corrupt"); }
This to me is a bit of a gray area. Is this an internal-only message? Are the logs only ever seen by you, the developer, or are they for a user's benefit too?
If it's only for developers of the application, these messages probably don't need to be localized.
If you do expect the user to view the logs, then they should be externalized into a .properties file.

It is good coding style to define a constant for a value/literal when the value/literal is used multiple times.
The imposed coding style forces you to use a constant for every string literal.
The good effect of that coding style is: All string literals which really should be declared as constants are now declared as constants.
The bad implication of that coding style is: You - the developers - are not able to decide if a string literal should be defined as constant or not. This is a heavy punch.
Therefore you should raise your concern that the good intention of the coding style does not compensate for the mistrust in your qualities as developers.

How do I ensure the format for saving and parsing string representations of Objects correlate properly

I am making a small boardgame program which needs to persist the state of the board to a file, and later read from the file and re-create the board.
I am delegating this functionality to the class shown below. I would like to implement this such that the save format of a square of the board, along with its coordinates, is captured in the SQUARE_SAVE_FORMAT constant, and the regex for reading that same information is captured in the LOAD_REGEX constant. The two should correlate in code and also be visually decipherable (by that I mean that a person should be able to clearly see that they correlate to the same data).
Is there an idiom or pattern for doing this in Java code ?
public class BoardPersistenceUtility {
private final String SQUARE_SAVE_FORMAT = "";
private final String LOAD_REGEX = "";
public void save(PrintWriter writer, Board board) {
}
public Board load(BufferedReader reader) {
// Implement
return null;
}
}
Update 1:
On reading my question again, I guess it might be a bit confusing about what exactly I am looking for. I am specifically looking for the right way to represent SQUARE_SAVE_FORMAT so that it clearly correlates with the regex LOAD_REGEX.
SQUARE_SAVE_FORMAT would ideally be a String which uses special characters/variables that will be replaced with actual values and the result will be saved to a file. LOAD_REGEX is the corresponding regex that will be used to read contents from the file. The regex will use capturing groups so I can re-create the original object from the values I get from the capturing groups.
My question is, what are the idioms around creating such pairs of Strings - one of them a format string to be used for saving data, and the other a regex to be used while reading that data.
Update 2:
On thinking a bit more, I think I have been able to clarify my question a bit better.
If you look at both the Strings, SQUARE_SAVE_FORMAT is a format string which will be used in String.format() to create the text for a square on the board, which will be saved in the file. The constant SQUARE_LOAD_REGEX is a regex which will be used to read the line and capture relevant parts into named groups, so I can re-create the original object. (sorry if my regex is slightly incorrect... I quickly wrote something, but I need to refresh some regex principles to ensure that this is indeed what I need)
If you look at both these Strings visually, it is difficult to correlate them. Perhaps it is because we do not have named variables in a Java format String. The best we can do is to specify an explicit argument index, such as %1$d for the first argument.
I would like to understand if there is any idiom or pattern to represent such pairs of Strings, where one is used for formatting some data to text and the other is used to read the same text and parse its parts.
public class BoardPersistenceUtility {
private final String SQUARE_SAVE_FORMAT =
"%d,%d:%b-%s";
private final String SQUARE_LOAD_REGEX =
"^(?<row>\\d*),(?<col>\\d*):(?<mine>true|false)-(?<status>\\w)$";
public void save(PrintWriter writer, Board board) {
}
public Board load(BufferedReader reader) {
// Implement
return null;
}
}

Note: you call SQUARE_SAVE_FORMAT and LOAD_REGEX "constants" which they are not, as you haven't declared them static final. It's better to keep terminology clear :-)
The simplest way to link these two is to define a class which encloses both as (final) fields. If you plan to define multiple such pairs of information, you can define multiple instances of the class, one for each type of format.
If you really want to keep these as constants, it may be best to define the enclosing class as an enum. Note that Java enums may contain methods too, so you may choose to implement the save/load logic as Strategies in the enum instances themselves, and call these polymorphically, which may help simplify your code.
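One way this suggestion might look, as a sketch rather than a definitive design. SquareData and SquareFormat are hypothetical names (the question shows neither Board nor a square type), and the regex uses \\d+ and \\w+ instead of the question's \\d* and \\w so that empty numbers are rejected and multi-character statuses are accepted.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical value type for one square of the board.
final class SquareData {
    final int row;
    final int col;
    final boolean mine;
    final String status;

    SquareData(int row, int col, boolean mine, String status) {
        this.row = row;
        this.col = col;
        this.mine = mine;
        this.status = status;
    }
}

enum SquareFormat {
    // One enum instance per on-disk format; the format string and its regex sit side by side.
    V1("%d,%d:%b-%s",
       "^(?<row>\\d+),(?<col>\\d+):(?<mine>true|false)-(?<status>\\w+)$");

    private final String saveFormat;
    private final Pattern loadPattern;

    SquareFormat(String saveFormat, String loadRegex) {
        this.saveFormat = saveFormat;
        this.loadPattern = Pattern.compile(loadRegex);
    }

    String save(SquareData s) {
        // Locale.ROOT keeps the digits ASCII regardless of the default locale.
        return String.format(Locale.ROOT, saveFormat, s.row, s.col, s.mine, s.status);
    }

    SquareData load(String line) {
        Matcher m = loadPattern.matcher(line);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a valid square: " + line);
        }
        return new SquareData(
            Integer.parseInt(m.group("row")),
            Integer.parseInt(m.group("col")),
            Boolean.parseBoolean(m.group("mine")),
            m.group("status"));
    }
}
```

Keeping both strings as adjacent constructor arguments makes it harder to change one without noticing the other.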

I'm still not sure what you mean, but this needs formatting, so it's an answer instead of a comment.
First of all, the names are almost completely unrelated; relate them somehow.
SQUARE_DATA_STORE
SQUARE_DATA_REGEX
Second, there's no point in differentiating the "style" of the saved data if there's only a single BoardPersistenceUtility--if there were multiple formats then that information would be captured in a persistence utility subclass, like SquareFormatPersister or something.
Third, according to your text, one string is where the data will actually be stored. The other is a regular expression. The two will, in this case, never be "visually similar"--regular expressions of any complexity will never (much) look like the strings they can represent. (In this case, we have no clue, because we don't know what the board data can look like, of course.)
If your code is so non-self-explanatory that the reader can't figure out that the two fields are related via your comments and your code, something has gone horribly wrong. I'm having a hard time imagining this code is so overwhelmingly complex that their relationship cannot be trivially communicated.
Edit after update
The answer is still no.
You could use a templating mechanism to provide names for the fields, similar to those used in your regex. This might also make the code a bit more self-explanatory as you'd fill the template context with named values (like "row" or "col").
You could use a real parser/generator, but the complexity there is a bit too much.
You could use a DSL (internal using Groovy, JRuby, JavaScript, etc. or external, which brings us back to parsing) and write chunks of the code that way.
IMO you're over-thinking, and over-estimating perceived complexity: except possibly for the templating solution, which IMO is likely over-engineering for the level of difficulty, you'd be far better off writing one or two sentences, which should be more than enough to relate the "fields" of the load and save formats.
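For reference, the templating option could be sketched as below: a single template with named placeholders is used both to write a line and to derive the regex that reads it back. NamedTemplate and the {name} placeholder syntax are inventions for this example, not an existing library, and as the answer says this may well be more machinery than the problem deserves.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class NamedTemplate {
    // Placeholders look like {row}; names follow Java's rules for named capturing groups.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\{([a-zA-Z][a-zA-Z0-9]*)\\}");

    private final String template;
    private final List<String> names = new ArrayList<>();
    private final Pattern parser;

    NamedTemplate(String template) {
        this.template = template;
        StringBuilder regex = new StringBuilder("^");
        Matcher m = PLACEHOLDER.matcher(template);
        int last = 0;
        while (m.find()) {
            // Literal text between placeholders is quoted so it matches itself.
            regex.append(Pattern.quote(template.substring(last, m.start())));
            regex.append("(?<").append(m.group(1)).append(">.+?)");
            names.add(m.group(1));
            last = m.end();
        }
        regex.append(Pattern.quote(template.substring(last))).append("$");
        this.parser = Pattern.compile(regex.toString());
    }

    // Fills each placeholder with the string form of the matching map entry.
    String format(Map<String, ?> values) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = String.valueOf(values.get(m.group(1)));
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Reads a line back into name/value pairs, in placeholder order.
    Map<String, String> parse(String line) {
        Matcher m = parser.matcher(line);
        if (!m.matches()) {
            throw new IllegalArgumentException("Line does not match template: " + line);
        }
        Map<String, String> result = new LinkedHashMap<>();
        for (String name : names) {
            result.put(name, m.group(name));
        }
        return result;
    }
}
```

With a template such as "{row},{col}:{mine}-{status}", the save and load sides cannot drift apart because there is only one string to edit. The sketch does no type checking of the captured values and assumes values never contain the separator characters.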

Put comments in your code to explain that they're related, how they're related, what they're used for, and that if one is changed, the other should be modified accordingly.
Implement a unit test to make sure that a saved board can be loaded.
Make sure that your build and release process runs the unit tests, and fails if one of them doesn't pass.
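A round-trip check along those lines might look like the sketch below. It is written as a plain method so it needs no test framework; RoundTripCheck is a made-up name, and the regex is a slightly tightened variant of the one in the question (\\d+ and \\w+), which is an assumption about the intended format.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RoundTripCheck {
    // If one of these is changed, the other should be modified accordingly.
    static final String SQUARE_SAVE_FORMAT = "%d,%d:%b-%s";
    static final Pattern SQUARE_LOAD_REGEX =
        Pattern.compile("^(?<row>\\d+),(?<col>\\d+):(?<mine>true|false)-(?<status>\\w+)$");

    // Returns true only if the formatted line parses back to exactly the same values.
    static boolean survivesRoundTrip(int row, int col, boolean mine, String status) {
        String line = String.format(Locale.ROOT, SQUARE_SAVE_FORMAT, row, col, mine, status);
        Matcher m = SQUARE_LOAD_REGEX.matcher(line);
        return m.matches()
            && Integer.parseInt(m.group("row")) == row
            && Integer.parseInt(m.group("col")) == col
            && Boolean.parseBoolean(m.group("mine")) == mine
            && m.group("status").equals(status);
    }
}
```

A status containing a space fails the check, which is the kind of mismatch between the two strings such a test is meant to catch before release.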

java shortcuts to Pojos?

This may be a stupid question, but well, I love to learn :)
Say I've got the following code:
public String method(A a)
{
String dataA = a.getB().getC().getD();
return dataA;
}
At what point does it become worthwhile to define a map which caches our requests and holds this:
Map<A, String> m;
m.put(a, dataA);
and then of course,
[SNIP the usual tests of nullity and adding the object if it is missing and so forth plus the refreshing issues]
return m.get(a);
Let me stress that the successive gets are NOT costly (no things such as DB calls or JNDI lookups).
It's just that it's clearer if we define a dictionary rather than read the whole string of "gets".
I consider that making a get call is NEARLY "free" in CPU time. Again, I suppose that retrieving the data from a HashMap is NOT exactly free but nearly (at least, in my case, it is :) ).
My question is really in terms of readability, not performance.
Thanks !

To increase readability (and decrease dependencies), you should define an accessor in A, such as
public String getDataA() {
return getB().getC().getD();
}
Then your calling code becomes
String dataA = a.getDataA();
You may say that you would need too many such shortcut methods in A, cluttering its interface. That is actually a sign of a class design issue. Either A has grown too big and complex (in which case it may be better to partition it into more than one class), or the code needing all these far away pieces of data actually belongs to somewhere else - say into B or C - rather than to A's client.
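A concrete version of the accessor idea, using hypothetical domain names (Order, Customer, Address) in place of A, B and C:

```java
final class Address {
    private final String city;

    Address(String city) {
        this.city = city;
    }

    String getCity() {
        return city;
    }
}

final class Customer {
    private final Address address;

    Customer(Address address) {
        this.address = address;
    }

    Address getAddress() {
        return address;
    }
}

final class Order {
    private final Customer customer;

    Order(Customer customer) {
        this.customer = customer;
    }

    Customer getCustomer() {
        return customer;
    }

    // Shortcut accessor: callers no longer need to know the path through Customer and Address.
    String getShippingCity() {
        return getCustomer().getAddress().getCity();
    }
}
```

Client code calls order.getShippingCity(), so a later change to how the city is stored only touches Order.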

A couple of things to consider:
Apache Beanutils has a lot of utilities for this sort of thing: http://commons.apache.org/beanutils/
java.util.Properties, if the values are all strings
If you really want to access things like this, you can also look at using Groovy instead. All lookups on maps in Groovy can be done with '.' notation, and it also supports a "safe" accessor which will check for nulls.

MVEL is another option:
String d = (String) MVEL.eval("b.?c.?d", a);
I will say that data dictionaries lead to type-safety issues. There's no guarantee that everyone puts the right types in the right data elements, or even defines all the required elements.
Also, when using an expression language as above, there are type-safety issues too, as there's no compile-time check on the actual expression to make sure that a) it makes sense, and b) it returns the right type.

Which toString technique is more efficient?

I have a class called Zebra (not her actual name). Zebra overrides the toString method to provide her own convoluted obfuscated stringification.
Which is more efficient to stringify an instance of Zebra? Presuming that I have to do this stringification millions of times per session.
1. zebra.toString()
2. ""+zebra
3. static String BLANK (singleton); BLANK+zebra (multiple executions).
Where the value of zebra is not assured to be the same.
I am conjecturing that the answer could be - no concern: the compiler makes them all equivalent. If that is not the answer, please describe the instantiation process that makes them different. (2) and (3) could be the same, since the compiler would group all similar strings and assign them to a single reference.
Normally, I do ""+zebra because I am too lazy to type zebra.toString().
ATTN: To clarify.
I have seen questions having been criticised like "why do you want to do this, it's impractical" If every programmer refrains from asking questions because it has no practical value, or every mathematician does the same - that would be the end of the human race.
If I wrote an iteration routine, the differences might be too small. I am less interested in an experimental result than I am interested in the difference in processes:
For example, zebra.toString() would invoke only one toString, while ""+zebra would involve an extra string instantiation and an extra string concatenation, which would make it less efficient. Or would it? Or does the compiler nullify that?
Please do not answer if your answer is focused on writing an iterative routine, whose results will not explain the compiler or machine process.
Virtue of a good programmer = lazy to write code but not lazy to think.

Number 1 is more efficient.
The other options create an instance of StringBuilder, append an empty string to it, call zebra.toString, append the result of this to the StringBuilder, and then convert the StringBuilder to a String. This is a lot of unnecessary overhead. Just call toString yourself.
This is also true, by the way, if you want to convert a standard type, like Integer, to a String. DON'T write
String s=""+n; // where "n" is an Integer
DO write
String s=n.toString();
or
String s=String.valueOf(n);
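The three spellings give the same text for a non-null value and differ only in how they treat null, which is worth knowing before switching between them. A small sketch (StringifyDemo is a made-up name):

```java
final class StringifyDemo {
    // Explicit call: must guard against null or it throws NullPointerException.
    static String explicit(Object o) {
        return o == null ? null : o.toString();
    }

    // Concatenation: a null operand is rendered as the text "null".
    static String concatenated(Object o) {
        return "" + o;
    }

    // String.valueOf(Object): also renders null as the text "null".
    static String viaValueOf(Object o) {
        return String.valueOf(o);
    }
}
```

For an Integer 42 all three return "42"; for null, the first returns null while the other two return the four-character string "null".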

As a general rule, I would never use the + operator unless it is on very small final/hard-coded strings. Using this operator usually results in several extra objects in memory being created before your resulting string is returned (this is bad, especially if it happens "millions of times per session").
If you ever do need to concatenate strings, such as when building a unique statement dynamically (for SQL or an output message, for example), use a StringBuilder! It is significantly more efficient for concatenating strings.
In the case of your specific question, just use the toString() method. If you don't like typing, use an IDE (like Eclipse or NetBeans) and then use code completion to save you the keystrokes. Just type the first letter or two of the method and then hit "CTRL+SPACE".

zebra.toString() is the best option. Keep in mind zebra might be null, in which case you'll get a NullPointerException. So you might have to do something like
String s = zebra==null ? null : zebra.toString();
""+zebra results in a StringBuilder being created, then "" and zebra.toString() are appended separately, so this is less efficient. Another big difference is that if zebra is null, the resulting string will be "null".

If Zebra is a singleton class, or the same instance of zebra is being used, then you can store the result of toString in Zebra and reuse it for all future calls to toString.
If that's not the case, then in the implementation of toString, cache the part that stays unchanged between calls in one place; this way you can avoid creating some String instances every time.
Otherwise I do not see any escape from the problem you have :(
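A sketch of the caching idea, assuming the fields that feed toString are immutable; the name and stripes fields and the output format are made up, since the real class is not shown.

```java
final class Zebra {
    private final String name;
    private final int stripes;

    // Computed on first use; safe to cache because the fields above never change.
    private String cached;

    Zebra(String name, int stripes) {
        this.name = name;
        this.stripes = stripes;
    }

    @Override
    public String toString() {
        String s = cached;
        if (s == null) {
            // Racy but benign: at worst two threads build equal strings.
            s = "Zebra[" + name + ", stripes=" + stripes + "]";
            cached = s;
        }
        return s;
    }
}
```

Repeated calls on the same instance then return the same String object instead of building a new one each time. If the fields were mutable, the cache would have to be cleared in every setter.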

Option 1 is the best option since every option calls the toString() method of zebra, but options 2 and 3 also do other (value free) work.
zebra.toString() - Note that this calls the toString() method of zebra.
""+zebra - This also calls the toString() method of zebra.
static String BLANK; BLANK+zebra; - This also calls the toString() method of zebra.
You admit "I'm lazy so I do stupid stuff". If you are unwilling to stop being lazy, then I suggest you not concern yourself with "which is better", since lazy is likely to trump knowledge.

Since the object's toString method will be invoked implicitly in cases where it is not invoked explicitly, a more "efficient" way doesn't exist unless the "stringification" is happening to the same object. In that case, it's best to cache and reuse instead of creating millions of String instances.
Anyway, this question seems more focused on aesthetics/verbosity than efficiency/performance.

If you want to know things like this you can code small example routines and look at the generated bytecode using the javap utility.
I am conjecturing that the answer could be - no concern: the compiler makes them all equivalent. [...] Normally, I do ""+zebra because I am too lazy to type zebra.toString().
Two things:
First: The two options are different. Think about zebra being null.
Second: I'm too lazy to do this javap stuff for you.
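For anyone who does want to look, a small class like the following is enough (ConcatDemo is a made-up name). After compiling it with javac, running javap -c ConcatDemo prints the bytecode of both methods side by side. What the second method compiles to depends on the JDK version: older compilers emit StringBuilder calls, while newer ones typically emit an invokedynamic instruction.

```java
final class ConcatDemo {
    // Direct call: a single invocation of toString().
    static String viaToString(Object zebra) {
        return zebra.toString();
    }

    // Concatenation with an empty string: the compiler decides how to implement it.
    static String viaConcat(Object zebra) {
        return "" + zebra;
    }
}
```

Both methods return equal strings for a non-null argument; the bytecode listing shows how they get there.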
