I'm designing a public interface (API) for a package. Should I generally use CharSequence instead of String? (I'm mainly talking about the public interfaces.)
Are there any drawbacks of doing so? Is it considered a good practice?
What about using it for identifier-like purposes (when the value is matched against a set in a hash-based container)?
CharSequence is rarely used in general purpose libraries. It should usually be used when your main use case is string handling (manipulation, parsing, ...).
Generally speaking you can do anything with a CharSequence that you could do with a String (trivially, since you can convert every CharSequence into a String). But there's one important difference: A CharSequence is not guaranteed to be immutable! Whenever you handle a String and inspect it at two different points in time, you can be sure that it will have the same value every time.
But for a CharSequence that's not necessarily true. For example someone could pass a StringBuilder into your method and modify it while you do something with it, which can break a lot of sane code.
Consider this pseudo-code:
public Object frobnicate(CharSequence something) {
    Object o = getFromCache(something);
    if (o == null) {
        o = computeValue(something);
        putIntoCache(o, something);
    }
    return o;
}
This looks harmless enough, and if you had used String here it would mostly work (except that the value might be calculated twice). But if something is a CharSequence, its content could change between the getFromCache call and the computeValue call. Or worse: between the computeValue call and the putIntoCache call!
Therefore: only accept CharSequence if there are big advantages and you know the drawbacks.
If you accept CharSequence you should document how your API handles mutable CharSequence objects. For example: "Modifying an argument while the method executes results in undefined behaviour."
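One way to sidestep the mutation problem in a single-threaded caller is to snapshot the argument once on entry and work only with the resulting String. The following is a minimal sketch, not a definitive implementation: the names SnapshotCache and isCached are made up for illustration, the cache is assumed to be a plain HashMap, and computeValue is a stand-in that just returns the length.

```java
import java.util.HashMap;
import java.util.Map;

class SnapshotCache {
    private final Map<String, Object> cache = new HashMap<>();

    // Hypothetical stand-in for an expensive computation.
    private Object computeValue(String key) {
        return key.length();
    }

    public Object frobnicate(CharSequence something) {
        // Copy once; every later step sees the same immutable String,
        // even if the caller mutates the original sequence afterwards.
        String key = something.toString();
        return cache.computeIfAbsent(key, this::computeValue);
    }

    // Helper for the demo only: reports whether a key has been cached.
    public boolean isCached(String key) {
        return cache.containsKey(key);
    }
}

class SnapshotCacheDemo {
    public static void main(String[] args) {
        SnapshotCache c = new SnapshotCache();
        StringBuilder sb = new StringBuilder("abc");
        Object first = c.frobnicate(sb);
        sb.append("def"); // mutating the builder does not disturb the cached entry
        System.out.println(first + " " + c.isCached("abc") + " " + c.isCached("abcdef"));
    }
}
```

The copy costs an allocation per call, and it does not help if another thread mutates the sequence while toString() itself is running, so the documentation advice above still applies.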
This depends on what you need. However, I'd like to state two advantages of String.
From CharSequence's documentation:
Each object may be implemented by a different class, and there is no
guarantee that each class will be capable of testing its instances for
equality with those of the other. It is therefore inappropriate to use
arbitrary CharSequence instances as elements in a set or as keys in a
map.
Thus, whenever you need a Map or reliable equals/hashCode, you need to copy instances into a String (or whatever).
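A small sketch of what that warning means in practice: a StringBuilder and a String with the same characters are not equal to each other, so a hash lookup with the raw CharSequence misses, while a lookup after converting to String succeeds. The class and method names below are made up for illustration.

```java
import java.util.HashSet;
import java.util.Set;

class CharSequenceKeys {
    // Looks up the raw CharSequence; this relies on equals/hashCode
    // of whatever implementation the caller happened to pass in.
    static boolean containsRaw(Set<CharSequence> set, CharSequence key) {
        return set.contains(key);
    }

    // Normalises to String first, so the lookup compares by content.
    static boolean containsAsString(Set<String> set, CharSequence key) {
        return set.contains(key.toString());
    }

    public static void main(String[] args) {
        Set<CharSequence> raw = new HashSet<>();
        raw.add("id-1");
        Set<String> normalised = new HashSet<>();
        normalised.add("id-1");

        CharSequence key = new StringBuilder("id-1");
        System.out.println(containsRaw(raw, key));             // false
        System.out.println(containsAsString(normalised, key)); // true
    }
}
```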
Moreover, I think CharSequence does not explicitly mention that implementations must be immutable. You may need to do defensive copying which may slow down your implementations.
Java's CharSequence is an interface. As the API documentation says, it is implemented by the CharBuffer, Segment, String, StringBuffer, and StringBuilder classes. So if you want your API to accept all of these classes, then CharSequence is your choice. If not, String is very good for a public API because it is easy to use and everybody knows it. Remember that CharSequence only gives you four core methods (length, charAt, subSequence, and toString), so if you accept a CharSequence through a method, your ability to manipulate the input will be limited.
If a parameter is conceptually a sequence of chars, use CharSequence.
A string is technically a sequence of chars, but most often we don't think of it like that; a string is more atomic / holistic, we don't usually care about individual chars.
Think about int - though an int is technically a sequence of bits, we don't usually care about individual bits. We manipulate ints as atomic things.
So if the main work you are going to do on a parameter is to iterate through its chars, use CharSequence. If you are going to manipulate the parameter as an atomic thing, use String.
You can implement CharSequence to hold your passwords, because the usage of String is discouraged for that purpose. The implementation should have a dispose method that wipes out the plain text data.
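As a rough sketch of that idea (the class name SecretChars is hypothetical, and this is not a vetted security component), the implementation can keep its own char[] copy and zero it on dispose. Note one deliberate deviation: toString() here does not return the characters, which departs from the CharSequence contract, because a String copy could never be wiped.

```java
import java.util.Arrays;

final class SecretChars implements CharSequence {
    private final char[] data;

    SecretChars(char[] source) {
        // Keep a private copy so the caller can wipe its own array independently.
        this.data = Arrays.copyOf(source, source.length);
    }

    @Override
    public int length() {
        return data.length;
    }

    @Override
    public char charAt(int index) {
        return data[index];
    }

    @Override
    public CharSequence subSequence(int start, int end) {
        if (start < 0 || end > data.length || start > end) {
            throw new IndexOutOfBoundsException();
        }
        char[] slice = Arrays.copyOfRange(data, start, end);
        SecretChars result = new SecretChars(slice);
        Arrays.fill(slice, '\0'); // wipe the temporary copy
        return result;
    }

    // Deliberately hides the content (an assumption of this sketch).
    @Override
    public String toString() {
        return "SecretChars[length=" + data.length + "]";
    }

    // Overwrites the plain text; the object should not be used afterwards.
    void dispose() {
        Arrays.fill(data, '\0');
    }
}
```

Any API that calls toString() on its CharSequence arguments will not see the password with this sketch, so it only suits code that reads the characters through charAt.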
I can see that contentEquals is useful for comparing char sequences but I can't find anywhere specifying which method is the best to use when comparing two strings.
Here mentions the differences between both methods but it doesn't explicitly say what to do with two strings.
I can see one advantage of using contentEquals: if the variable passed in has its type changed, a compilation error will be raised. A disadvantage could be the speed of execution.
Should I always use contentEquals when comparing strings or only use it if there are different objects extending CharSequence?
You should use String#equals when comparing the content of two Strings. Only use contentEquals if one of the objects is not of type String.
1) it is less confusing. Every Java developer should know what the method is doing, but contentEquals is a more specialised method and therefore less known.
2) It is faster. As you can see in the implementation of contentEquals, it calls equals only after checking whether the sequence is of type AbstractStringBuilder, so you save the execution time of that check. But even if equals were slower, speed should not be the first thing you base your decision on. Go for readability first.
The advantage of contentEquals() is support for objects that implement a CharSequence. When you have a StringBuilder it would be wasteful to call StringBuilder.toString() just so you can use equals() a moment later. In this case contentEquals() helps to avoid allocating a new String to do the comparison.
When comparing two String objects just use equals().
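A short sketch of the difference, with a made-up class name: equals() is only true for another String, while contentEquals() compares the characters of any CharSequence without an intermediate toString().

```java
class ContentEqualsDemo {
    // Compares a String against any CharSequence by content.
    static boolean sameText(String s, CharSequence other) {
        return s.contentEquals(other);
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder("abc");

        System.out.println("abc".equals(sb));            // false: sb is not a String
        System.out.println(sameText("abc", sb));         // true: same characters
        System.out.println("abc".equals(sb.toString())); // true, but allocates a String
    }
}
```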
In an interview, I want to build up a new String with some substrings. I argued that ArrayList<String> is almost the same as StringBuilder, but the interviewer said I should always use StringBuilder if I need to deal with String. I think the time complexity of adding/removing functions between them are the same.
They aren't the same thing at all. StringBuilder builds a single string, while ArrayList<String> is just that--an array of separate strings. Of course, you can concatenate all of the array's strings with String.join("", list), where the first argument is the separator that you want to use, but why would you go that route instead of just using the class that was designed to do exactly what you're trying to do in the first place?
It all comes down to memory consumption. Every String is an object: an ArrayList<String> holds many separate objects, while a StringBuilder holds only one buffer.
StringBuilder has a member function to return the whole built string, whereas in ArrayList, you have to concatenate the strings yourself.
Unless you continue to need the separate elements you are adding to the list, you should use a StringBuilder.
After all, you can't directly get a concatenated string from the contents of the list: you have to put it in, say, a StringBuilder.
But in the specific case of building up a string of substrings, StringBuilder provides methods to allow you to append portions of Strings without using substring: the append(CharSequence, int, int) method is an optimization to avoid creating that extra string.
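For illustration, here is a minimal sketch of that overload in use. The helper name joinRanges and the int[][] range format are assumptions of this example, not part of any library.

```java
class AppendRangeDemo {
    // Appends each [start, end) range of source directly into the builder,
    // without creating an intermediate substring for every piece.
    static String joinRanges(String source, int[][] ranges) {
        StringBuilder sb = new StringBuilder();
        for (int[] r : ranges) {
            sb.append(source, r[0], r[1]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "hello world";
        // Takes "hello" (0..5) and "world" (6..11), skipping the space.
        System.out.println(joinRanges(text, new int[][] {{0, 5}, {6, 11}})); // helloworld
    }
}
```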
It should be mentioned that, at least when I have written python, it has been considered better to build up a list, and then use ''.join(theList) at the end, which is basically the analog of ArrayList<String>.
I don't know enough about python to know why this is considered particularly better.
You can "build" strings using both. However, StringBuilder is a class that specializes in building strings, with its append, insert, delete, charAt, etc. methods. An ArrayList is a general-purpose collection which lacks most of this functionality. Consider implementing the following (contrived example) with an ArrayList:
StringBuilder sb = new StringBuilder().append("time: ")
        .append(System.currentTimeMillis())
        .deleteCharAt(4)
        .reverse();
System.err.println(sb); // 3153067310451 emit
Ergonomics and readability aside, there are performance considerations but those are largely irrelevant on trivially sized examples.
If you need a single String at the end, performance and memory consumption are some differences for sure. Whenever you build a String from parts, in the good case you end up using StringBuilder, or in a slightly worse case StringBuffer, and in the worst case you end up concatenating two strings, then throw them away, and repeat - lots of allocations and garbage collection in this case.
JLS12 still mentions StringBuffer by name for optimization (but hopefully StringBuilder is used internally, as similar technique):
An implementation may choose to perform conversion and concatenation in one step to avoid creating and then discarding an intermediate String object. To increase the performance of repeated string concatenation, a Java compiler may use the StringBuffer class or a similar technique to reduce the number of intermediate String objects that are created by evaluation of an expression.
In the particular case of having a List<String> and later using String.join() on it, StringJoiner contains the particular StringBuilder object which is going to be used.
So there will be a builder anyway, and then it may be more efficient to use it from the beginning.
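To make the comparison concrete, here is a small sketch of both routes to the same string. JoinVersusBuilder and its method names are made up for illustration, and the sketch says nothing about which route is faster on a given JDK.

```java
import java.util.Arrays;
import java.util.List;

class JoinVersusBuilder {
    // Collect the parts first, then join them with an empty separator.
    static String viaJoin(List<String> parts) {
        return String.join("", parts);
    }

    // Append each part to one builder as it becomes available.
    static String viaBuilder(List<String> parts) {
        StringBuilder sb = new StringBuilder();
        for (String p : parts) {
            sb.append(p);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> parts = Arrays.asList("ab", "cd", "ef");
        System.out.println(viaJoin(parts));    // abcdef
        System.out.println(viaBuilder(parts)); // abcdef
    }
}
```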
I'm writing a program with a bunch of classes that will be serialized to save in a database and to be sent through a network.
To make it easier to access the class properties via a command line interface, I'm considering storing the properties in a Map, instead of giving each property its own variable.
Basically, instead of using something like this:
String id = account.getUserId();
I would do this
String id = account.properties.get("userId");
Is this an advisable way to do things?
Yes, it's a pretty sensible model. It's sometimes called the "prototype object model" and is very similar to how you would work in JavaScript where every object is effectively a Map. This in turn has led to the very popular JSON serialisation format.
Nice features:
You don't have to worry about messy inheritance hierarchies - you can just alter the properties at will.
You can create a new object just by copying from another object (the prototype)
Code to manipulate the data can do so in a uniform way, without having to explicitly name all the variables.
It's more "dynamic" compared to a static class definition - it's easy to extend and modify your objects
Potential risks / downsides:
You need to keep track of your property names if you use Strings - the compiler won't do it for you! This issue can be alleviated by using Enums as keys, but then you lose some flexibility...
You don't get the benefits of static type checking, so you may find that you need to write more JUnit tests as a result to ensure things are working properly
There is a slight performance overhead (though probably not enough to worry about, as map lookups are very fast)
I actually wrote an entire game in the 90s using a variant of this object model (Tyrant) and it worked very well.
Rather than having a Map object exposed however, you may want to consider encapsulating this functionality so that you can use an accessor method on the object itself, e.g.
String id = account.getProperty("userId");
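A minimal sketch of such an encapsulated class might look like the following. The setProperty method and the typed getUserId shortcut are assumptions added for illustration; only getProperty appears in the suggestion above.

```java
import java.util.HashMap;
import java.util.Map;

class Account {
    // The map stays private, so callers cannot bypass the accessors.
    private final Map<String, String> properties = new HashMap<>();

    public String getProperty(String name) {
        return properties.get(name);
    }

    public void setProperty(String name, String value) {
        properties.put(name, value);
    }

    // A typed accessor can still sit on top of the generic one.
    public String getUserId() {
        return getProperty("userId");
    }
}

class AccountDemo {
    public static void main(String[] args) {
        Account account = new Account();
        account.setProperty("userId", "u-42");
        System.out.println(account.getProperty("userId")); // u-42
        System.out.println(account.getProperty("email"));  // null: never set
    }
}
```

Keeping the map private leaves room to add validation or change the storage later without touching callers.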
I often prefer to do it like this:
import java.util.HashMap;
import java.util.Map;

enum StringPropertyType {
    USERID, FIRSTNAME, LASTNAME
}

interface StringAttributes {
    String get(StringPropertyType s);
    void put(StringPropertyType s, String value);
}

class MapBasedStringAttributes implements StringAttributes {
    private final Map<StringPropertyType, String> map = new HashMap<>();

    // Interface methods are implicitly public, so the implementations must be too.
    @Override
    public String get(StringPropertyType s) { return map.get(s); }

    @Override
    public void put(StringPropertyType s, String value) { map.put(s, value); }
}
This gives you compile-time safety, refactoring support, etc.
You could also use stringPropertyType.name() to get the string representation of the enum value and use
Map<String,String>
instead.
Why is there no reverse method in the String class in Java? Instead, the reverse() method is provided in StringBuilder. Is there a reason for this? After all, String has split(), regionMatches(), etc., which are more complex than reverse().
When they added these methods, why not add reverse()?
Since you have it in StringBuilder, there's no need for it in String, right? :-)
Seriously, when designing an API there's lots of things you could include. The interfaces are however intentionally kept small for simplicity and clarity. Google on "API design" and you'll find tons of pages agreeing on this.
Here's how you do it if you actually need it:
str = new StringBuilder(str).reverse().toString();
Theoretically, String could offer it and just return the correct result as a new String. It's just a design choice, when you get down to it, on the part of the Java base libraries.
If you want a historical reason: Strings are immutable in Java, that is, you cannot change a given String without creating another String.
While this is not bad per se, initial versions of Java lacked classes like StringBuilder. Instead, String itself contained (and still contains) a lot of methods to "alter" the String, but since String is immutable, each of these methods actually creates and returns a NEW String object.
This caused simple expressions like :
String s = "a" + anotherString.substring(5, 10).trim().toLowerCase();
To actually create something like 5 strings in RAM, 4 of which are absolutely useless, with obvious performance problems (even though some optimizations of the underlying char[] arrays were made later).
To solve this, Sun introduced StringBuilder and other classes that ARE NOT immutable. These classes freely modify a single char[] array, so that calling methods does not need to produce many intermediate String instances.
They added "reverse" relatively late, so they added it to StringBuilder instead of String, because that's now the preferred way to manipulate strings.
As a side-note, in Scala you use the same java.lang.String class and you do get a reverse method (along with all kinds of other handy stuff). The way it does it is with implicit conversions, so that your String gets automatically converted into a class that does have a reverse method. It's really quite clever, and removes the need to bloat the base class with hundreds of methods.
String is immutable, meaning it can't be changed.
When you reverse a String, each letter is moved on its own, which means a new object is created each time.
Let us look at an example:
For instance, Hello goes through the following steps:
elloH lloeH loleH olleH
and you end up with 4 new String objects on the heap.
So imagine a string with thousands of letters or more, and how many objects would be created. That would be really expensive, and a lot of memory would be occupied.
This is why the String class does not have a reverse() method.
Well I think it could be because it is an immutable class so if we had a reverse method it would actually create a new object.
reverse() acts on this, modifying the current object, and String objects are immutable - they can't be modified.
It's peculiarly efficient to do reverse() in situ - the size is known to be the same, so no allocation is necessary, there are half as many loop iterations as there would be in a copy, and, for large strings, memory locality is optimal. From looking at the code, one can see that a lot of care was taken to make it fast. I suspect the author(s) had a particular use case in mind that demanded high performance.
I'm using enumerations to replace String constants in my java app (JRE 1.5).
Is there a performance hit when I treat the enum as a static array of names in a method that is called constantly (e.g. when rendering the UI)?
My code looks a bit like this:
public String getValue(int col) {
    return ColumnValues.values()[col].toString();
}
Clarifications:
I'm concerned with a hidden cost related to enumerating values() repeatedly (e.g. inside paint() methods).
I can now see that all my scenarios include some int => enum conversion - which is not Java's way.
What is the actual price of extracting the values() array? Is it even an issue?
Android developers
Read Simon Langhoff's answer below, which was pointed out earlier by Geeks On Hugs in the accepted answer's comments: Enum.values() must do a defensive copy.
For enums, in order to maintain immutability, the backing array is cloned every time you call the values() method. This means that it will have a performance impact. How much depends on your specific scenario.
I have been monitoring my own Android app and found out that this simple call used 13.4% of CPU time in my specific case!
In order to avoid cloning the values array, I decided to simply cache the values as a private field and then loop through those values whenever needed:
private final static Protocol[] values = Protocol.values();
After this small optimisation, my method call only hogged a negligible 0.0% of CPU time.
In my use case, this was a welcome optimisation. However, it is important to note that this approach trades away the immutability of your enum's value list. Who knows what people might put into your values array once you give them a reference to it!?
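One way to keep the cache without handing out a mutable array is to wrap it in an unmodifiable list. This is a sketch under assumptions: the constants HTTP, FTP and SSH and the methods cachedValues and fromOrdinal are invented for the example, since the original Protocol enum is not shown.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

enum Protocol {
    HTTP, FTP, SSH; // hypothetical constants

    // Cloned once at class initialisation, then wrapped so callers cannot modify it.
    private static final List<Protocol> VALUES =
            Collections.unmodifiableList(Arrays.asList(values()));

    static List<Protocol> cachedValues() {
        return VALUES;
    }

    static Protocol fromOrdinal(int i) {
        return VALUES.get(i);
    }
}

class ProtocolDemo {
    public static void main(String[] args) {
        // values() hands out a fresh copy on every call ...
        System.out.println(Protocol.values() == Protocol.values());             // false
        // ... while the cached list is always the same object.
        System.out.println(Protocol.cachedValues() == Protocol.cachedValues()); // true
        System.out.println(Protocol.fromOrdinal(1));                            // FTP
    }
}
```

Attempts to modify the returned list throw UnsupportedOperationException, so the cache can be shared freely.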
Enum.values() gives you a reference to an array, and iterating over an array of enums costs the same as iterating over an array of strings. Meanwhile, comparing enum values to other enum values can actually be faster than comparing strings to strings.
Meanwhile, if you're worried about the cost of invoking the values() method versus already having a reference to the array, don't worry. Method invocation in Java is (now) blazingly fast, and any time it actually matters to performance, the method invocation will be inlined by the compiler anyway.
So, seriously, don't worry about it. Concentrate on code readability instead, and use Enum so that the compiler will catch it if you ever try to use a constant value that your code wasn't expecting to handle.
If you're curious about why enum comparisons might be faster than string comparisons, here are the details:
It depends on whether the strings have been interned or not. For Enum objects, there is always only one instance of each enum value in the system, and so each call to Enum.equals() can be done very quickly, just as if you were using the == operator instead of the equals() method. In fact, with Enum objects, it's safe to use == instead of equals(), whereas that's not safe to do with strings.
For strings, if the strings have been interned, then the comparison is just as fast as with an Enum. However, if the strings have not been interned, then the String.equals() method actually needs to walk the list of characters in both strings until either one of the strings ends or it discovers a character that is different between the two strings.
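The following sketch shows those identity rules side by side. The Color enum and the sameInstance helper are made up for the example.

```java
class ComparisonDemo {
    enum Color { RED, GREEN } // hypothetical enum

    // Reference comparison, spelled out as a method for the demo.
    static boolean sameInstance(Object a, Object b) {
        return a == b;
    }

    public static void main(String[] args) {
        // Each enum constant exists exactly once, so == is safe.
        Color parsed = Color.valueOf("RED");
        System.out.println(sameInstance(parsed, Color.RED)); // true

        String literal = "red";
        String copy = new String("red"); // a distinct object with equal content
        System.out.println(sameInstance(literal, copy));          // false
        System.out.println(literal.equals(copy));                 // true
        System.out.println(sameInstance(literal, copy.intern())); // true
    }
}
```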
But again, this likely doesn't matter, even in Swing rendering code that must execute quickly. :-)
@Ben Lings points out that Enum.values() must do a defensive copy, since arrays are mutable and it's possible you could replace a value in the array that is returned by Enum.values(). This means that you do have to consider the cost of that defensive copy. However, copying a single contiguous array is generally a fast operation, assuming that it is implemented "under the hood" using some kind of memory-copy call, rather than naively iterating over the elements in the array. So, I don't think that changes the final answer here.
As a rule of thumb: before thinking about optimizing, do you have any evidence that this code could slow down your application?
Now, the facts.
Enums are, for the most part, syntactic sugar scattered across the compilation process. As a consequence, the values method, defined for an enum class, returns a static collection (that is to say, one loaded at class initialization) with performance that can be considered roughly equivalent to that of an array.
If you're concerned about performance, then measure.
From the code, I wouldn't expect any surprises, but 90% of all performance guesswork is wrong. If you want to be safe, consider moving the enums up into the calling code (i.e. public String getValue(ColumnValues value) {return value.toString();}).
Use this:
private enum ModelObject { NODE, SCENE, INSTANCE, URL_TO_FILE, URL_TO_MODEL,
    ANIMATION_INTERPOLATION, ANIMATION_EVENT, ANIMATION_CLIP, SAMPLER, IMAGE_EMPTY,
    BATCH, COMMAND, SHADER, PARAM, SKIN }
private static final ModelObject int2ModelObject[] = ModelObject.values();
If you're iterating through your enum values just to look for a specific value, you can statically map the enum values to integers. This pushes the performance impact on class load, and makes it easy/low impact to get specific enum values based on a mapped parameter.
import java.util.HashMap;
import java.util.Map;

public enum ExampleEnum {
    value1(1),
    value2(2),
    valueUndefined(Integer.MAX_VALUE);

    private final int enumValue;
    // Typed map, built once at class load, from the int code to its constant.
    private static final Map<Integer, ExampleEnum> enumMap = new HashMap<>();

    ExampleEnum(int value) {
        enumValue = value;
    }

    static {
        for (ExampleEnum exampleEnum : ExampleEnum.values()) {
            enumMap.put(exampleEnum.enumValue, exampleEnum);
        }
    }

    public static ExampleEnum getExampleEnum(int value) {
        // Fall back to valueUndefined for codes that were never mapped.
        return enumMap.containsKey(value) ? enumMap.get(value) : valueUndefined;
    }
}
I think yes. And it is more convenient to use Constants.