I'm reading a CSV file using Java. Inside the file, each row is in this format:
operation, start, end.
I need to perform a different operation depending on the input, but something weird happens when I compare two strings.
I use equals to compare the strings. One of the operations is "add", but the first element I fetch from the file always gives me the wrong answer. I know it's an "add", and when I print it out it looks like "add", but operation.equals("add") returns false. It's correct for all the other strings, just not the first one. Is there anything special about the first row in a CSV file?
Here is my code:
while ((line = br.readLine()) != null) {
    String[] data = line.split(",");
    String operation = data[0];
    int start = Integer.parseInt(data[1]);
    int end = Integer.parseInt(data[2]);
    System.out.println(operation + " " + start + " " + end);
    System.out.println(operation.equals("add"));
}
For example, it printed out
add 1 3
false
add 4 6
true
And I really don't know why. These two "add" strings look exactly the same.
And here is what my CSV file looks like:
[screenshot of the CSV file; image not available]
There are (at least) 4 reasons why two strings that "look" the same when you display or print them could turn out to be non-equal:
1. If you compare Strings using == rather than equals(Object), you will often get the wrong answer. (This is not the problem here, since you are using the equals method. However, it is a common problem.)
2. Unexpected leading or trailing whitespace characters on one string. These can be removed using trim().
3. Other leading, trailing or embedded control characters or "funky" Unicode characters, for example a stray Unicode BOM (byte order mark).
4. Homoglyphs. There are a number of examples where two or more distinct Unicode code points are rendered on the screen using the same or virtually the same glyphs.
Cases 3 and 4 can only be reliably detected by using traceprints or a debugger to examine the lengths and the char values in the two strings.
(Screen shots of the CSV file won't help us to diagnose this! A cut-and-paste of the CSV file might help.)
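As a rough sketch of the kind of traceprint meant above, the following prints the length and the char values of two strings. The class name HiddenCharDump, the method describe and the sample strings are made up for illustration; the second string simply assumes that the first CSV field starts with a BOM.

```java
class HiddenCharDump {
    // Returns each UTF-16 code unit of s as a 4-digit hex value, separated by spaces.
    static String describe(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%04x", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String clean = "add";
        String suspicious = "\uFEFFadd"; // assumed example: "add" preceded by a BOM
        // Both strings look like "add" on most consoles, but length and codes differ.
        System.out.println(clean.length() + ": " + describe(clean));
        System.out.println(suspicious.length() + ": " + describe(suspicious));
    }
}
```

If the two dumps differ in length or in any code, the strings are not equal, no matter how they look when printed.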
You should remove the double quotes from the first element and then check with the equals method.
Try this:
operation = operation.substring(1, operation.length() - 1);
operation.equals("add")
Hope it works for you.
The line in your image looks fine. I suspect that the document is being read with the wrong encoding. For example, if the file is saved as UTF with a byte order mark and you do not account for it, there is a special header at the beginning of the file. That could be why the first word is read incorrectly.
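If a byte order mark really is the culprit, one possible workaround is to drop it from the first line before splitting. This is only a sketch under that assumption: BomStripper and stripBom are hypothetical names, and the sample line stands in for whatever br.readLine() returns for the first row.

```java
class BomStripper {
    // Removes a single leading U+FEFF (byte order mark) if present.
    static String stripBom(String line) {
        if (line != null && !line.isEmpty() && line.charAt(0) == '\uFEFF') {
            return line.substring(1);
        }
        return line;
    }

    public static void main(String[] args) {
        String firstLine = "\uFEFFadd,1,3"; // assumed content of the first row
        String operation = stripBom(firstLine).split(",")[0];
        System.out.println(operation.equals("add"));
    }
}
```

Calling stripBom on every line is harmless, since lines without a leading U+FEFF are returned unchanged.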
In Java, when I replace characters in a String with escaped-characters, the characters show up in the return value, although they were not there according to System.out.println.
String[][][] proCategorization(String[] pros, String[][] preferences) {
String str = "wehnquflkwe,wefwefw,wefwefw,wefwef";
String strReplaced = str.replace(",","\",\""); //replace , with ","
System.out.println(strReplaced);
// The console output is: wehnquflkwe","wefwefw","wefwefw","wefwef
String[][][] array3d = new String[1][1][1]; // initialize 3d array
array3d[0][0][0] = strReplaced;
System.out.println(array3d[0][0][0]);
return array3d;
}
The console output is:
wehnquflkwe","wefwefw","wefwefw","wefwef
Now the return value is:
[[["wehnquflkwe\",\"wefwefw\",\"wefwefw\",\"wefwef"]]]
I don't understand why the \ shows up in the return value but not in the System.out.println output.
Characters in memory can be represented in different ways.
Your integrated development environment (IDE) has a debugger that chooses to represent a String[][][] with a single element that contains the characters
wehnquflkwe","wefwefw","wefwefw","wefwef
as a java-quoted string
"wehnquflkwe\",\"wefwefw\",\"wefwefw\",\"wefwef"
This makes a lot of sense, because you can then copy and paste this string into Java code without any loss.
On the other hand, your system's console, and the IDE's built-in terminal emulator, will output the characters in their normal representation, that is, without any java string-escape-characters:
wehnquflkwe","wefwefw","wefwefw","wefwef
As an experiment, you may want to check what happens with other "special" characters, such as \t (a tab) or \b (backspace). This is just the tip of the iceberg: characters in Java generally translate into Unicode code points, which may or may not be supported by the fonts available in your system or terminal. The IDE's way of representing characters as Java-quoted strings allows it to losslessly represent pretty much anything; System.out.println's output is a lot more variable.
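To make the difference between the two representations concrete, here is a small sketch. The helper asJavaLiteral is hypothetical: it only imitates the way a debugger might quote a string and handles just a few escapes, so it is not what any particular IDE actually does.

```java
class EscapeDemo {
    // Hypothetical helper: renders a string the way it would be typed as a Java literal.
    static String asJavaLiteral(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\t': sb.append("\\t");  break;
                case '\n': sb.append("\\n");  break;
                default:   sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    public static void main(String[] args) {
        String strReplaced = "wehnquflkwe,wefwefw".replace(",", "\",\"");
        System.out.println(strReplaced);                // the characters as they are
        System.out.println(asJavaLiteral(strReplaced)); // the quoted, escaped view
        System.out.println(strReplaced.length());       // counts only the real characters
    }
}
```

The backslashes appear only in the quoted view; they are not part of the string and do not count towards its length.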
System.out.println prints the String exactly as it is stored in memory.
On the other hand, when you stop the application flow using a breakpoint you are able to look up the values.
Most IDEs display special characters in escaped form with \ to indicate that it's just one String (not a String[] in this case), or to avoid splitting the String into two lines if it contains \n in the middle.
Just in case you still have doubts, I suggest printing strReplaced.length(). This should allow you to count the characters one by one.
Possible experiments:
String s = "my cute \n two line String";
System.out.println(s + " length is: " + s.length());
I am trying to replace all occurrences of a substring from a String.
I want to replace "\t\t\t" with "<3tabs>"
I want to replace "\t\t\t\t\t\t" with "<6tabs>"
I want to replace "\t\t\t\t" with "< >"
I am using
s = s.replace("\t\t\t\t", "< >");
s = s.replace("\t\t\t", "<3tabs>");
s = s.replace("\t\t\t\t\t\t", "<6tabs>");
But no use, it does not replace anything. Then I tried using
s = s.replaceAll("\t\t\t\t", "< >");
s = s.replaceAll("\t\t\t", "<3tabs>");
s = s.replaceAll("\t\t\t\t\t\t", "<6tabs>");
Again, no use, it does not replace anything. After trying these two methods, I tried StringBuilder.
I was able to replace the items through StringBuilder. My question is, why am I unable to replace the items directly through String with the above two commands? Is there any method with which I can directly replace items in a String?
Try it in this order:
String s = "This\t\t\t\t\t\tis\t\t\texample\t\t\t\t";
s = s.replace("\t\t\t\t\t\t", "<6tabs>");
s = s.replace("\t\t\t\t", "< >");
s = s.replace("\t\t\t", "<3tabs>");
System.out.print(s);
output:
This<6tabs>is<3tabs>example< >
The 6-tab pattern is never going to find a match, because the shorter replacements before it will already have consumed part of every run of six tabs.
You need to start with the largest match first.
Strings are immutable so you can't directly modify them, s.replace() returns a new String with the modifications present in it. You then assign that back to s though so it should work fine.
Put things in the correct order and step through it with a debugger to see what is happening.
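The effect of the ordering can be checked with a short experiment. In this sketch the class and method names are invented, and "<4tabs>" is an assumed marker for the four-tab case (the question's own marker is kept as-is above).

```java
class ReplaceOrderDemo {
    // Longest pattern first: each run of tabs is matched by the largest pattern that fits.
    static String longestFirst(String s) {
        s = s.replace("\t\t\t\t\t\t", "<6tabs>");
        s = s.replace("\t\t\t\t", "<4tabs>");
        s = s.replace("\t\t\t", "<3tabs>");
        return s;
    }

    // Shortest pattern first: a run of six tabs is consumed as two runs of three.
    static String shortestFirst(String s) {
        s = s.replace("\t\t\t", "<3tabs>");
        s = s.replace("\t\t\t\t", "<4tabs>");
        s = s.replace("\t\t\t\t\t\t", "<6tabs>");
        return s;
    }

    public static void main(String[] args) {
        String s = "This\t\t\t\t\t\tis\t\t\texample\t\t\t\t";
        System.out.println(longestFirst(s));
        System.out.println(shortestFirst(s));
    }
}
```

With the shortest pattern first, the later calls have nothing left to match, which is why the longer markers never appear.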
Take a look at this
Go through your text, divide it into a char[] array, then use a for loop to go through the individual characters.
Don't print them out straight, but print them using a %x tag (or %d if you like decimal numbers).
char[] characters = myString.toCharArray();
for (char c : characters)
{
    // print the numeric code of each character in hexadecimal
    System.out.printf("%x%n", (int) c);
}
Get an ASCII table and look up all the numbers for the characters, and see whether there are any \n or \f or \r. Do this before or after.
Different operating systems use different line terminating characters; this is the first reference I found from Google with "line terminator Linux Windows." It says Windows uses \r\n and Linux \n. You should find that out from your example. Obviously if you strip \n and leave \r you will still have the text break into separate lines.
You might be more successful if you write a regular expression (see this part of the Java Tutorial, etc) which includes whitespace and line terminators, and use it as a delimiter with the String.split() method, then print the individual tokens in order.
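A minimal sketch of that idea, assuming the tokens are separated by any mix of spaces, tabs and line terminators. The class name, the method tokens and the sample text are invented for the example; \s+ is the standard regex for one or more whitespace characters, which covers \r, \n, \f and \t.

```java
class WhitespaceSplitDemo {
    // Splits on runs of whitespace, so \r\n, \n, tabs and spaces all act as delimiters.
    static String[] tokens(String text) {
        return text.trim().split("\\s+");
    }

    public static void main(String[] args) {
        String text = "first line\r\nsecond\tline\n"; // assumed sample input
        for (String token : tokens(text)) {
            System.out.println(token);
        }
    }
}
```

The trim() call is there because a leading delimiter would otherwise produce an empty first token.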
I'm working on my second bigger programming project at the moment and I got stuck. I'm using Processing for this project.
What I'm trying to do is retrieve information (used to assign a certain color palette to the individual 'lines' of a horizontal bar chart) from an external text file that contains the following line, using an instance of the java.util.Properties class:
formating = p;p;n;n
My code snippet for importing it looks like this (using a class named 'Import' that handles the BufferedInputStream, etc.):
Import imp = new Import();
Properties properties = imp.importSettings();
The next step reads the 'formating' line from the text file and puts it into a four-element String array, using the semicolon as a delimiter.
String[] formating = properties.getProperty("formating").split(";");
I was expecting this String array to be identical to the one I would get by creating it in my source code using:
String[] formating2 = {"p", "p", "n", "n"};
But it isn't. I have tried a number of things already, including checking for unwanted characters (blanks, for example) in each element of my String array, converting my text file or the characters I use for comparison to Unicode, and converting the elements of the String array to chars.
What I can't seem to get working is the following comparison:
for(int i=0;i < formating.length;i++){
println(formating[i]==formating2[i]);
}
which returns 'false' for each iteration of the for-loop.
I'm sure it's just some rookie mistake but it would be nice if someone could point me in the right direction. Thanks in advance!
Nick
Comparing strings using == is not safe: two Strings may be different objects, and == compares the objects rather than the text they contain, even if that text is the same. So you should try it like this:
println(formating[i].equals(formating2[i]));
or, if you want to ignore surrounding spaces and tabs altogether, you can also do:
println(formating[i].trim().equals(formating2[i].trim()));
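The difference can be seen in plain Java outside Processing. This is a sketch only: the class StringCompareDemo and the helper allEqual are made-up names, and the property value is hard-coded instead of being read from a file.

```java
import java.util.Arrays;

class StringCompareDemo {
    // Compares two String arrays element by element using equals(), not ==.
    static boolean allEqual(String[] a, String[] b) {
        return Arrays.equals(a, b);
    }

    public static void main(String[] args) {
        String[] formating = "p;p;n;n".split(";"); // strings created at run time
        String[] formating2 = {"p", "p", "n", "n"}; // string literals
        for (int i = 0; i < formating.length; i++) {
            boolean sameObject = formating[i] == formating2[i];    // identity comparison
            boolean sameText = formating[i].equals(formating2[i]); // content comparison
            System.out.println(sameObject + " " + sameText);
        }
        System.out.println(allEqual(formating, formating2));
    }
}
```

Arrays.equals is a convenient way to compare the whole array at once, since it applies equals() to each pair of elements.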
This is the relevant part of the code:
java.util.List<Element> elems = src.getAllElements();
Iterator it = elems.iterator();
Element el;
String key,value,date="",place="";
String [] data;
int k=0;
Segment content;
String contentstr;
String classname;
while(it.hasNext()){
    el = (Element)it.next();
    if(el.getName().equals("span"))
    {
        classname=el.getAttributeValue("class");
        if(classname.equals("edit_body"))
        {
            //java.util.List<Element> elemsinner = el.getChildElements();
            //Iterator itinner = elemsinner.iterator();
            content=el.getContent();
            contentstr=content.toString();
            if(true)
            {
                System.out.println("Done!");
                System.out.println(classname);
                System.out.println(contentstr);
            }
        }
    }
}
No output. But if I remove the if(classname.equals("edit_body")) condition it does print (in one of the iterations):
Done!
edit_body
"I honestly think it is better to be a failure at something you love than to be a success at something you hate."
I can't find the bug... help!
BTW, I am using an external Java library for HTML parsing.
There are also two errors at the start of the output, which appear in both cases, with or without the if condition:
Dec 20, 2012 11:53:11 AM net.htmlparser.jericho.LoggerProviderJava$JavaLogger error SEVERE: EndTag br at (r1992,c60,p94048) not recognised as type '/normal' because its name and closing delimiter are separated by characters other than white space
Dec 20, 2012 11:53:11 AM net.htmlparser.jericho.LoggerProviderJava$JavaLogger error SEVERE: Encountered possible EndTag at (r1992,c60,p94048) whose content does not match a registered EndTagType
I hope those aren't what causes the problem.
OK guys, somebody please explain this to me! "edit_body".equals(el.getAttributeValue("class")) worked!!
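One possible explanation, offered as a guess rather than a diagnosis: if the attribute lookup returns null for a span that has no class attribute (an assumption about the library, not something verified here), then calling equals on the result throws a NullPointerException, whereas calling equals on the literal simply returns false. The sketch below uses a plain null in place of the library call; hasClass is a hypothetical helper.

```java
class NullSafeEqualsDemo {
    // Literal-first comparison: safe even when classAttribute is null.
    static boolean hasClass(String classAttribute, String expected) {
        return expected.equals(classAttribute);
    }

    public static void main(String[] args) {
        String missing = null; // stands in for a lookup that found no class attribute
        System.out.println(hasClass(missing, "edit_body"));
        System.out.println(hasClass("edit_body", "edit_body"));
        try {
            missing.equals("edit_body"); // variable-first comparison on a null value
        } catch (NullPointerException e) {
            System.out.println("NullPointerException");
        }
    }
}
```

If an exception like this ended the loop on an earlier span, the matching span would never be reached, which would be consistent with the missing output.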
I just had exactly the same problem.
I managed to solve it by using: SomeStringVar.replaceAll("\\P{Print}","");.
This command removes all the non-printable characters from the variable (characters that you can't see: the strings look equal even though they are not really equal).
I applied this command to each variable I needed in the comparison, and it worked for me as well.
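A small sketch of that approach follows; the class and method names and the sample strings are invented. One caveat worth knowing: by default, Java's \p{Print} class covers printable ASCII only, so this pattern also strips visible non-ASCII characters such as accented letters.

```java
class PrintableFilterDemo {
    // Removes every character that is not in the printable ASCII range.
    static String keepPrintableAscii(String s) {
        return s.replaceAll("\\P{Print}", "");
    }

    public static void main(String[] args) {
        // assumed sample: a BOM in front and a zero-width space (U+200B) at the end
        String fromFile = "\uFEFFedit_body\u200B";
        System.out.println(fromFile.equals("edit_body"));
        System.out.println(keepPrintableAscii(fromFile).equals("edit_body"));
        // visible but non-ASCII characters are removed as well
        System.out.println(keepPrintableAscii("caf\u00E9"));
    }
}
```

If the data can legitimately contain non-ASCII text, a narrower pattern would be needed instead.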
It looks like you have leading or trailing whitespace in your classname.
Try using this:
if(classname.trim().equals("edit_body"))
This will trim any whitespace at the ends.
Firstly, String.equals() is NOT broken. It works for millions of other programs / programmers. This is NOT the cause of your problems (unless you or someone has deliberately modified ... and broken your Java installation ...)
So why can two apparently equal strings compare as unequal?
There could be leading or trailing whitespace characters on the String.
There could be embedded non-printing characters.
There could be pairs of Unicode characters that look the same when you display them with a typical font, but in fact are not the same. For instance, the Greek code page contains characters that look like Latin vowels but are in fact different codes, and hence are not equal.
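As an illustration of the last case, the Latin letter o (U+006F) and the Greek letter omicron (U+03BF) are drawn almost identically in most fonts. The class and method names in this sketch are made up.

```java
class HomoglyphDemo {
    // Returns the first code point of s in hexadecimal.
    static String hex(String s) {
        return Integer.toHexString(s.codePointAt(0));
    }

    public static void main(String[] args) {
        String latin = "o";      // Latin small letter o
        String greek = "\u03BF"; // Greek small letter omicron
        System.out.println(latin + " " + greek);
        System.out.println(latin.equals(greek));
        System.out.println(hex(latin) + " vs " + hex(greek));
    }
}
```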
change the code to:
classname="edit_body"; //<- hardcode
if(classname.equals("edit_body"))
If the code enters the if statement now, then there must be some difference in the string content when you use the original classname=el.getAttributeValue("class");.
In that case, loop over the individual characters and compare them to find the difference.
If the code still doesn't enter the if statement, either your code is not compiling and you are running old code, or your Java installation is broken ;-)
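The character-by-character comparison could look roughly like this. It is a sketch: firstDifference is a hypothetical helper, not part of any library, and the sample strings are assumed.

```java
class FirstDifferenceDemo {
    // Returns the index of the first position where a and b differ, or -1 if they are equal.
    static int firstDifference(String a, String b) {
        int n = Math.min(a.length(), b.length());
        for (int i = 0; i < n; i++) {
            if (a.charAt(i) != b.charAt(i)) {
                return i;
            }
        }
        // Same common prefix: the strings differ only if one is longer than the other.
        return a.length() == b.length() ? -1 : n;
    }

    public static void main(String[] args) {
        String expected = "edit_body";
        String actual = "edit_body "; // assumed example with a trailing space
        int index = firstDifference(expected, actual);
        System.out.println(index);
        if (index >= 0 && index < actual.length()) {
            System.out.printf("unexpected char code: %x%n", (int) actual.charAt(index));
        }
    }
}
```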
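OR:
If Java is anything like .NET (I don't know Java):
Is el.getAttributeValue typed as String?
If it were typed as Object, a reference comparison with == would fail, since those could be two different instances of the same string. (In Java, though, a call to equals() is dispatched to String.equals() regardless of the declared type, so this would not affect the code above.)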
equals() is a method of the String class, so the argument must be a String literal in double quotes; single quotes are for char literals.
if(someString.equals("something")) ✓
if(someString.equals('something')) ×
Maybe it's because it's end of day on a Friday, and I have already found a work-around, but this is killing me.
I am using Java but am a .NET developer.
I have a string and I need to split it on a comma. Let's say it's a row in a CSV file that has 210 columns. line.split(",").length will sometimes be 199, where the count of ',' is 208 or 209. I get the count in 2 different ways just to be sure (using a regex, then manually looping through and checking each character after losing my sanity).
What's the super-obvious-hit-face-on-desk thing I'm missing here? Why isn't foo.split(delim).length == CountOfOccurences(foo,delim) all the time, only sometimes?
thanks much
First, there's an obvious difference of one. If there are 200 columns, all with text, there are 199 commas. Second, Java drops trailing empty strings by default. You can change this by passing a negative number as the second argument.
"foo,,bar,baz,,".split(",")
is:
{foo,,bar,baz}
an array of 4 elements. But
"foo,,bar,baz,,".split(",", -1)
is:
{foo,,bar,baz,,}
with all 6.
Note that only trailing empty strings are dropped by default.
Finally, don't forget that the String is compiled into a regex. This is not applicable here, since , is not a special character, but you should keep it in mind.
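Both points can be combined in a small helper. This is a sketch with invented names (SplitLimitDemo, fields); it uses the negative limit to keep trailing empty strings and Pattern.quote so that a delimiter such as | is treated literally rather than as a regex.

```java
import java.util.regex.Pattern;

class SplitLimitDemo {
    // Counts fields, keeping trailing empty ones and treating the delimiter literally.
    static int fields(String row, String delimiter) {
        return row.split(Pattern.quote(delimiter), -1).length;
    }

    public static void main(String[] args) {
        String row = "foo,,bar,baz,,";
        System.out.println(row.split(",").length); // default: trailing empty strings dropped
        System.out.println(fields(row, ","));      // negative limit: all fields kept
        System.out.println(fields("a|b|c", "|"));  // | is a regex metacharacter, so it is quoted
    }
}
```

With the negative limit, the number of fields is always one more than the number of delimiters in the row.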
There are a couple of things happening. First, if you have three items like a,b,c and split on comma, you'll have three entries, one more than the number of commas.
But what you're dealing with probably comes from consecutive delimiters, for example: a,,,,b,c,,,,,
The ones at the end get dropped. Check the Java documentation for the split function.
http://download.java.net/jdk7/docs/api/java/lang/String.html
As others have pointed out, String.split has some very non-intuitive behaviour.
If you're using Google's Guava open-source Java library, there's a Splitter class which gives a much nicer (in my opinion) API for this, with more flexibility:
String input = "foo, bar,";
Splitter.on(',').split(input);
// returns "foo", " bar", ""
Splitter.on(',').omitEmptyStrings().split(input);
// returns "foo", " bar"
Splitter.on(',').omitEmptyStrings().trimResults().split(input);
// returns "foo", "bar"
Is it omitting blanks?
Do you have something like "a,b,c,,d,e" or trailing delimiters like "a,b,c,,,,"?
Are there extra delimiters in the cell data?
Short example: foo = "1,2" and
foo.split(",").length = 2
count(foo, ",") = 1
Probably you have a mistake in your code. Here is an example in Java:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
String row = "1,2,3,4,,5"; // second example: "1,2,3,5,,"
// prints 6 for the first example and 4 for the second (trailing empty strings are dropped)
System.out.println(row.split(",").length);
// count how many commas the row contains
Pattern pattern = Pattern.compile(",");
Matcher m = pattern.matcher(row);
int nr = 0;
while (m.find())
{
    nr++;
}
System.out.println(nr); // prints 5 for both examples