Weird Java String comparison - java

I'm having a minor issue with Java String comparisons.
I've written a class which takes in a String and parses it into a custom tree type. I've written a toString method which then converts this tree back to a String again. As part of my unit tests I'm just checking that the String generated by the toString method is the same as the String that was parsed in the first place.
Here is my simple test with a few printouts so that we can see what's going on.
final String exp1 = "(a|b)";
final String exp2 = "((a|b)|c)";
final Node tree1 = Reader.parseExpression2(exp1);
final Node tree2 = Reader.parseExpression2(exp2);
final String t1 = tree1.toString();
final String t2 = tree2.toString();
System.out.println(":" + exp1 + ":" + t1 + ":");
System.out.println(":" + exp2 + ":" + t2 + ":");
System.out.println(exp1.compareToIgnoreCase(t1));
System.out.println(exp2.compareToIgnoreCase(t2));
System.out.println(exp1.equals(t1));
System.out.println(exp2.equals(t2));
This produces the following output (NB: the ":" characters are used as delimiters so I can make sure there's no extra whitespace):
:(a|b):(a|b):
:((a|b)|c):((a|b)|c):
-1
-1
false
false
Based on manually comparing the strings exp1 and exp2 to t1 and t2 respectively, they are exactly the same. But for some reason Java is insisting they are different.
This isn't the obvious mistake of using == instead of .equals() but I'm stumped as to why two seemingly identical strings are different. Any help would be much appreciated :)

Does one of your strings have a null character within it? These might not be visible when you use System.out.println(...).
For example, consider this class:
public class StringComparison {
public static void main(String[] args) {
String s = "a|b";
String t = "a|b\0";
System.out.println(":" + s + ":" + t + ":");
System.out.println(s.equals(t));
}
}
When I ran this on Linux it gave me the following output:
:a|b:a|b:
false
(I also ran it on Windows, but the null character showed up as a space.)

Well, it certainly looks okay. What I would do would be to iterate over both strings using charAt to compare every single character with the equivalent in the other string. This will, at a minimum, hopefully tell you the offending character.
Also output everything else you can find out about both strings, such as the length.
It could be that one of the characters, while looking the same, may be some other Unicode doppelganger :-)
You may also want to capture that output and do a detailed binary dump on it, such as loading it up into gvim and using the hex conversion tool, or executing od -xcb (if available) on the captured output. There may be an obvious difference when you get down to the binary examination level.
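As a rough sketch of the charAt approach described above: the class and method names here are made up for illustration, and the trailing null character is only an assumed cause, not something known about the asker's parser.

```java
class StringDiff {
    /** Returns the index of the first differing char, or -1 if the strings are identical. */
    static int firstDifference(String a, String b) {
        int common = Math.min(a.length(), b.length());
        for (int i = 0; i < common; i++) {
            if (a.charAt(i) != b.charAt(i)) {
                return i;
            }
        }
        // One string is a prefix of the other: they differ at the shorter length.
        return a.length() == b.length() ? -1 : common;
    }

    public static void main(String[] args) {
        String expected = "(a|b)";
        String actual = "(a|b)\0"; // hypothetical: an invisible trailing null character
        System.out.println("lengths: " + expected.length() + " vs " + actual.length());
        int i = firstDifference(expected, actual);
        System.out.println("first difference at index " + i);
        if (i >= 0 && i < actual.length()) {
            // Printing the numeric code exposes characters that println hides.
            System.out.println("actual char code: " + (int) actual.charAt(i));
        }
    }
}
```

Comparing the lengths first is usually the quickest hint: if they differ, something invisible is in one of the strings.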

I have some suggestions
Copy each output and paste it into Notepad (or any similar editor), then copy them again and do something like this:
System.out.println("(a|b)".compareToIgnoreCase("(a|b)"));
Print out the integer representation of each character. If one of them is an unusual Unicode character, its int representation will be different.
Also what version of JDK are you using?

Related

What is the difference between a = a.trim() and a.trim()?

I've run into a bit of confusion.
I know that String objects are immutable. This means that if I call a method from the String class, like replace() then the original contents of the String are not altered. Instead, a new String is returned based on the original. However the same variable can be assigned new values.
Based on this theory, I always write a = a.trim() where a is a String. Everything was fine until my teacher told me that simply a.trim() can also be used. This messed up my theory.
I tested my theory along with my teacher's. I used the following code:
String a = " example ";
System.out.println(a);
a.trim(); //my teacher's code.
System.out.println(a);
a = " example ";
a = a.trim(); //my code.
System.out.println(a);
I got the following output:
example
example
example
When I pointed it out to my teacher, she said it's because I'm using a newer version of Java (jdk1.7) and a.trim() works in the previous versions of Java.
Please tell me who has the correct theory, because I've absolutely no idea!
String is immutable in Java, and trim() returns a new string, so you have to capture the result by assigning it.
String a = " example ";
System.out.println(a);
a.trim(); // returns a trimmed copy, which is discarded here
System.out.println(a); // still the old string, because a was never reassigned
a = " example ";
a = a.trim(); // a now refers to the new String returned by trim()
System.out.println(a); // new string
Edit:
she said that it's because I'm using a newer version of java (jdk1.7) and a.trim() works in the previous versions of java.
Please find a new Java teacher. That statement is completely false.
Calling "a.trim()" on its own may produce the trimmed string in memory (or a smart compiler may discard the expression entirely), but the result isn't stored unless you assign it to a variable, as in your "a=a.trim();"
Strings are immutable, so any change creates a new string. You need the assignment if you want to update the reference with the string returned from the trim method. So this should be used:
a = a.trim()
You have to store the returned string in the same or a different variable if you want to keep the result of an operation (e.g. trim) on a string.
String a = " example ";
System.out.println(a);
a.trim(); //output new String is not stored in any variable
System.out.println(a); //This is not trimmed
a = " example ";
a = a.trim(); //output new String is stored in a variable
System.out.println(a); //As trimmed value stored in same a variable it will print "example"
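Since leading and trailing spaces are hard to see in console output, here is a small sketch that wraps each value in brackets and also compares object identity. The class name and the bracket helper are just for illustration.

```java
class TrimDemo {
    /** Wraps a string in brackets so leading/trailing whitespace becomes visible. */
    static String bracket(String s) {
        return "[" + s + "]";
    }

    public static void main(String[] args) {
        String a = " example ";
        String trimmed = a.trim();

        // The original string is untouched; trim() handed back a different object.
        System.out.println(bracket(a));        // [ example ]
        System.out.println(bracket(trimmed));  // [example]
        System.out.println(a == trimmed);      // false

        // With nothing to trim, trim() is documented to return the same string.
        String b = "example";
        System.out.println(b == b.trim());     // true
    }
}
```

The behaviour is the same in every Java version: trim() never modifies the String it is called on.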

Java printing a String containing an integer

I have the following question.
public static void main(String[] args) throws IOException{
int number=1;
System.out.println("M"+number+1);
}
Output: M11
But I want it to print M2 instead of M11. I can't use number++ because the variable is used in a for loop, and incrementing it would change the result. I also can't print it with a separate print statement, because that would change the output format.
Please help me print it properly.
Try this:
System.out.printf("M%d%n", number+1);
where %n is the platform's line separator.
Put parentheses around your sum to force the addition to happen first. The parenthesized expression is evaluated first, and then the concatenation takes place.
System.out.println("M"+(number+1));
It has to do with the left-to-right order in which Java evaluates the + operators.
Basically, Java is saying:
"M"+number = "M1"
"M1"+1 = "M11"
You can override that order with parentheses, just like you do in maths:
"M"+(number+1)
This now reads
"M"+(number+1) = "M"+(1+1) = "M"+2 = "M2"
Try
System.out.println("M"+(number+1));
Try this:
System.out.println("M"+(number+1));
A cleaner way to separate data from invariants:
int number=1;
System.out.printf("M%d%n",number+1);
System.out.println("M"+number+1);
Here you are using + as a concatenation operator, because its left operand is a String.
To make + perform addition, you need to give it higher precedence, which you can do by wrapping the sum in parentheses as shown below:
System.out.println("M"+(number+1));
System.out.println("M"+number+1);
String concatenation in Java works this way:
if the first operand is of type String and you use the + operator, the next operand is concatenated to it and the result is a String.
Try
System.out.println("M"+(number+1));
In this case, since parentheses have the highest precedence, the expression inside them is evaluated first. The resulting int value is then concatenated with the String literal, producing the string "M2".
If you apply + after a string, it is treated as concatenation:
"d" + 1 + 1 // = d11
Whereas if the numbers come first, + is treated as addition until a string is reached:
1 + 1 + "d" // = 2d
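To see the left-to-right evaluation in one place, here is a small sketch (the class and method names are mine, not from the question):

```java
class ConcatOrder {
    /** Evaluated as ("M" + number) + 1, so both + operators concatenate. */
    static String leftToRight(int number) {
        return "M" + number + 1;
    }

    /** The parentheses make the int addition happen before the concatenation. */
    static String sumFirst(int number) {
        return "M" + (number + 1);
    }

    public static void main(String[] args) {
        System.out.println(leftToRight(1));       // M11
        System.out.println(sumFirst(1));          // M2
        // Addition until the first String operand, concatenation afterwards.
        System.out.println(1 + 1 + "d" + 1 + 1);  // 2d11
    }
}
```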

Does concatenating strings in Java always lead to new strings being created in memory?

I have a long string that doesn't fit the width of the screen. For example:
String longString = "This string is very long. It does not fit the width of the screen. So you have to scroll horizontally to read the whole string. This is very inconvenient indeed.";
To make it easier to read, I thought of writing it this way -
String longString = "This string is very long." +
"It does not fit the width of the screen." +
"So you have to scroll horizontally" +
"to read the whole string." +
"This is very inconvenient indeed.";
However, I realized that the second way uses string concatenation and will create 5 new strings in memory and this might lead to a performance hit. Is this the case? Or would the compiler be smart enough to figure out that all I need is really a single string? How could I avoid doing this?
I realized that the second way uses string concatenation and will create 5 new strings in memory and this might lead to a performance hit.
No, it won't. Since these are string literals, they will be evaluated at compile time and only one string will be created. This is defined in the Java Language Specification, §3.10.5:
A long string literal can always be broken up into shorter pieces and written as a (possibly parenthesized) expression using the string concatenation operator +
[...]
Moreover, a string literal always refers to the same instance of class String.
Strings computed by constant expressions (§15.28) are computed at compile time and then treated as if they were literals.
Strings computed by concatenation at run-time are newly created and therefore distinct.
Test:
public static void main(String[] args) throws Exception {
String longString = "This string is very long.";
String other = "This string" + " is " + "very long.";
System.out.println(longString == other); //prints true
}
However, the situation below is different because it uses a variable. Now the concatenation happens at run time and a new string is created:
public static void main(String[] args) throws Exception {
String longString = "This string is very long.";
String is = " is ";
String other = "This string" + is + "very long.";
System.out.println(longString == other); //prints false
}
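As a side note, the variable only forces a run-time concatenation because it is not a constant. A minimal sketch, with class and method names of my own choosing: declaring the variable final and initializing it with a literal makes the whole expression a constant expression again, and intern() recovers the pooled instance from a string built at run time.

```java
class ConstantConcat {
    static final String LONG_STRING = "This string is very long.";

    /** Folded at compile time: 'is' is a final variable initialized with a literal. */
    static String folded() {
        final String is = " is ";
        return "This string" + is + "very long.";
    }

    /** Built at run time: the parameter is not a compile-time constant. */
    static String atRuntime(String is) {
        return "This string" + is + "very long.";
    }

    public static void main(String[] args) {
        System.out.println(LONG_STRING == folded());                  // true
        System.out.println(LONG_STRING == atRuntime(" is "));         // false
        System.out.println(LONG_STRING == atRuntime(" is ").intern()); // true
        System.out.println(LONG_STRING.equals(atRuntime(" is ")));    // true
    }
}
```

In normal code you would still compare with equals(); the == checks here are only a way of observing which strings share an instance.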
Does concatenating strings in Java always lead to new strings being created in memory?
No, it does not always do that.
If the concatenation is a compile-time constant expression, then it is performed by the compiler, and the resulting String is added to the compiled class's constant pool. At runtime, the value of the expression is the interned String that corresponds to the constant pool entry.
This will happen in the example in your question.
Please check the snippet below, based on your input:
String longString = "This string is very long. It does not fit the width of the screen. So you have to scroll horizontally to read the whole string. This is very inconvenient indeed.";
String longStringOther = "This string is very long. " +
"It does not fit the width of the screen. " +
"So you have to scroll horizontally " +
"to read the whole string. " +
"This is very inconvenient indeed.";
System.out.println(" longString.equals(longStringOther) :"+ longString.equals(longStringOther));
System.out.println(" longString == longStringother : " + (longString == longStringOther ));
Output:
longString.equals(longStringOther) :true
longString == longStringother : true
1st Case : Both Strings are equal ( have same content)
2nd Case : Shows that there is only one String after concatenation. So only one String is created.

Trim String in Java while preserve full word

I need to trim a String in java so that:
The quick brown fox jumps over the lazy dog.
becomes
The quick brown...
In the example above, I'm trimming to 12 characters. If I just use substring I would get:
The quick br...
I already have a method for doing this using substring, but I wanted to know what is the fastest (most efficient) way to do this because a page may have many trim operations.
The only way I can think of is to split the string on spaces and put it back together until its length passes the given length. Is there another way? Perhaps a more efficient way in which I can use the same method to do a "soft" trim where I preserve the last word (as shown in the example above) and a hard trim which is pretty much a substring.
Thanks,
Below is a method I use to trim long strings in my webapps.
The "soft" boolean as you put it, if set to true will preserve the last word.
This is the most concise way of doing it that I could come up with. It uses a StringBuffer, which is a lot more efficient than repeatedly recreating an immutable String.
public static String trimString(String string, int length, boolean soft) {
if(string == null || string.trim().isEmpty()){
return string;
}
StringBuffer sb = new StringBuffer(string);
// -3 because we add 3 dots at the end. Returned string length has to be length including the dots.
int actualLength = length - 3;
if(actualLength >= 0 && sb.length() > actualLength){
if(!soft)
return escapeHtml(sb.insert(actualLength, "...").substring(0, actualLength+3));
else {
int endIndex = sb.indexOf(" ",actualLength);
// No space after the cut-off point: the cut would fall in the last word, so return the string unchanged.
if(endIndex < 0)
return string;
return escapeHtml(sb.insert(endIndex,"...").substring(0, endIndex+3));
}
}
return string;
}
Update
I've changed the code so that the ... is inserted into the StringBuffer. This avoids needlessly creating Strings implicitly, which is slow and wasteful.
Note: escapeHtml is a static import from apache commons:
import static org.apache.commons.lang.StringEscapeUtils.escapeHtml;
You can remove it and the code should work the same.
Here is a simple, regex-based, 1-line solution:
str.replaceAll("(?<=.{12})\\b.*", "..."); // How easy was that!? :)
Explanation:
(?<=.{12}) is a positive look-behind, which asserts that there are at least 12 characters to the left of the match, but it is a zero-width match (it consumes no characters)
\b.* matches the first word boundary (after at least 12 characters - above) to the end
This is replaced with "..."
Here's a test:
public static void main(String[] args) {
String input = "The quick brown fox jumps over the lazy dog.";
String trimmed = input.replaceAll("(?<=.{12})\\b.*", "...");
System.out.println(trimmed);
}
Output:
The quick brown...
If performance is an issue, pre-compile the regex for an approximately 5x speed up (YMMV) by compiling it once:
static Pattern pattern = Pattern.compile("(?<=.{12})\\b.*");
and reusing it:
String trimmed = pattern.matcher(input).replaceAll("...");
Please try following code:
private String trim(String src, int size) {
if (src.length() <= size) return src;
int pos = src.lastIndexOf(" ", size - 3);
if (pos < 0) return src.substring(0, size);
return src.substring(0, pos) + "...";
}
Try searching for the nearest space before or after position 11 and cut the string there, appending "...".
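A minimal sketch of that idea, searching forward from the cut-off so the word in progress is preserved. The softTrim name and its minLength parameter are my own; searching backwards with lastIndexOf would give the variant that never exceeds the limit.

```java
class SoftTrim {
    /** Cuts at the first space at or after minLength and appends "...". */
    static String softTrim(String src, int minLength) {
        if (src.length() <= minLength) {
            return src; // already short enough
        }
        int cut = src.indexOf(' ', minLength);
        // No later space: the cut-off falls inside the last word, so keep the text whole.
        return cut < 0 ? src : src.substring(0, cut) + "...";
    }

    public static void main(String[] args) {
        String input = "The quick brown fox jumps over the lazy dog.";
        System.out.println(softTrim(input, 12)); // The quick brown...
    }
}
```

This does a single indexOf scan and one substring, so it avoids both the split/rejoin and the regex engine.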
Your requirements aren't clear. If you have trouble articulating them in a natural language, it's no surprise that they'll be difficult to translate into a computer language like Java.
"preserve the last word" implies that the algorithm will know what a "word" is, so you'll have to tell it that first. The split is a way to do it. A scanner/parser with a grammar is another.
I'd worry about making it work before I concerned myself with efficiency. Make it work, measure it, then see what you can do about performance. Everything else is speculation without data.
How about:
mystring = mystring.replaceAll("^(.{12}.*?)\\b.*$", "$1...");
I use this hack. Suppose the trimmed string should be about 120 characters long:
String textToDisplay = textToTrim.substring(0, Math.min(120, textToTrim.length()));
if (textToDisplay.length() != textToTrim.length()) {
// Extend to the end of the word that was cut; if there is no later space, keep the whole text.
int nextSpace = textToTrim.indexOf(' ', textToDisplay.length() - 1);
textToDisplay = (nextSpace < 0) ? textToTrim : textToTrim.substring(0, nextSpace) + " ...";
}

Beginners Java Question (string output)

So I'm reading input from a file, which has say these lines:
NEO
You're the Oracle?
NEO
Yeah.
So I want to output his actual lines only, not where it says NEO. So I tried this:
if(line.trim()=="NEO")
output=false;
if (output)
TextIO.putln(name + ":" + "\"" + line.trim() + "\""); // Only print the line if 'output' is true
But that's not working out. It still prints NEO. How can I do this?
When comparing strings in Java you have to use the equals() method.
if ("NEO".equals(line.trim()))
I think you're looking for line.trim().equals("NEO") instead of line.trim() == "NEO"
That said, you can get rid of the output variable by instead doing
if(!line.trim().equals("NEO"))
{
TextIO.putln(name + ":" + "\"" + line.trim() + "\""); // Only print the line if it isn't "NEO"
}
Strings are objects in Java. This means you can't just use the == operator to compare them, since the two objects may be different even if they both represent the same string. That's why the String class implements an equals() method, which compares the contents of the objects instead of just their memory addresses.
Reference
String.equals() docs
In Java, Strings are objects. And the == operator checks for exact equality.
In other terms
final String ans = line.trim();
final String neo = "NEO";
if (ans == neo) ...
implies you want to check that the ans and the neo objects are the same. They are not, since Java allocated (instantiated) two objects.
As others said, you have to test for equality using a method defined on the String class, which internally checks that the values are the same.
if (ans.equals(neo)) ...
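A small self-contained sketch of the difference. The class name and the isSpeakerLine helper are made up for this example, and the input line is an assumed sample rather than the asker's actual file contents.

```java
class EqualsDemo {
    /** True if the line, ignoring surrounding whitespace, is exactly "NEO". */
    static boolean isSpeakerLine(String line) {
        // Calling equals on the literal also avoids a NullPointerException for a null line.
        return line != null && "NEO".equals(line.trim());
    }

    public static void main(String[] args) {
        String line = "  NEO  ";
        String trimmed = line.trim();

        System.out.println(trimmed == "NEO");      // false: two different objects
        System.out.println(trimmed.equals("NEO")); // true: same characters
        System.out.println(isSpeakerLine(line));   // true
        System.out.println(isSpeakerLine("Yeah.")); // false
    }
}
```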
Try the following:
if(line.trim().equals("NEO"))
