Java String created with byte[] bug [duplicate] - java

I have a string that I am creating, and I need to add multiple "\0" (null) characters to the string. Between each null character, is other text data (Just ASCII alphanumeric characters).
My problem is that in J2SE when you add the first null (\0), java then seems to determine that it's a string terminator, (similar to C++), and ignores all other data being appended. No error is raised, the trailing data is just ignored. I need to force the additional trailing data after a null in the string. I have to do this for a legacy database that I am supporting.
I have tried to encode/decode the string in the hope that something like %00 would fool the interpretation of the string behaviour, but when I re-encode the string, Java sees the null character again and removes all data after the first null.
Update: Here is the relevant code snippet. Yes, I am trying to use Strings. I intend to try chars, but I still have to save it into the database as a string, so I suspect that I will end up with the same problem.
Some background. I am receiving data via HTTP post that has "\n". I need to remove the newlines and replace them with "\0". The "debug" method is just a simple method that does System.out.println.
String[] arrLines = sValue.split("\n");
for (int k = 0; k < arrLines.length; k++) {
    if (0 < k) {
        sNewValue += "\0";
    }
    sNewValue += arrLines[k];
    debug("New value =" + sNewValue);
}
sNewValue, a String, is committed to the database, and that needs to be done as a String. What I am observing when I display the current value of sNewValue after each iteration in the console is something like this:
input is value1\nValue2\nValue3
Output in the console is giving me from this code
value1
value1
value1
I am expecting
value1
value1 value2
value1 value2 value3
with non-printable null between value1, value2 and value3 respectively. Note that the value actually getting saved back into the database is also just "value1". So, it's not just a console display problem. The data after \0 is getting ignored.

I strongly suspect this is nothing to do with the text in the string itself - I suspect it's just how it's being displayed. For example, try this:
public class Test {
    public static void main(String[] args) {
        String first = "first";
        String second = "second";
        String third = "third";
        String text = first + "\0" + second + "\0" + third;
        System.out.println(text.length()); // Prints 18
    }
}
This prints 18, showing that all the characters are present. However, if you try to display text in a UI label, I wouldn't be surprised to see only first. (The same may be true in fairly weak debuggers.)
Likewise you should be able to use:
char c = text.charAt(7);
And now c should be 'e' which is the second letter of "second".
Basically, I'd expect the core of Java not to care at all about the fact that it contains U+0000. It's just another character as far as Java is concerned. It's only at boundaries with native code (e.g. display) that it's likely to cause a problem.
If this doesn't help, please explain exactly what you've observed - what it is that makes you think the rest of the data isn't being appended.
EDIT: Another diagnostic approach is to print out the Unicode value of each character in the string:
for (int i = 0; i < text.length(); i++) {
System.out.println((int) text.charAt(i));
}
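To check that the data really survives, one option is to build the value with the NUL separator and then inspect it without relying on how the console renders U+0000. The following is only a sketch: the class name NulJoinDemo and the helpers joinWithNul and escapeControls are made up for illustration, and the input mirrors the value1/Value2/Value3 sample from the question.

```java
class NulJoinDemo {

    // Replace each "\n" separator with a single NUL (U+0000) character.
    static String joinWithNul(String value) {
        return String.join("\0", value.split("\n"));
    }

    // Render control characters as backslash-u escapes so they stay visible
    // in any console, debugger, or log viewer.
    static String escapeControls(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isISOControl(c)) {
                sb.append(String.format("\\u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String joined = joinWithNul("value1\nValue2\nValue3");
        System.out.println(joined.length());        // all characters are counted
        System.out.println(escapeControls(joined)); // NULs shown as escapes
    }
}
```

If the length is 20 and the escaped form shows both separators, the String itself is intact, and any truncation is happening later (console, driver, or database).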

I suggest you use a char[] or List<Character> instead, since it sounds like you're not really using a String as such (a real String doesn't normally contain nulls or other unprintable characters).

Same behavior for the StringBuffer class?
Since "\0" causes some trouble, I would recommend not using it inside your program.
I would use some better delimiter internally and replace it with "\0" only when actually writing the string to your DB.

This is because \ is an escape character in Java (as in many C-related languages) and you need to escape it using additional \ as follows.
String str="\\0Java language";
System.out.println(str);
and you should be able to display \0Java language on the console.

Escape character '\' doesn't show in System.out.println() but in return value

In Java, when I replace characters in a String with escaped-characters, the characters show up in the return value, although they were not there according to System.out.println.
String[][][] proCategorization(String[] pros, String[][] preferences) {
String str = "wehnquflkwe,wefwefw,wefwefw,wefwef";
String strReplaced = str.replace(",","\",\""); //replace , with ","
System.out.println(strReplaced);
The console output is: wehnquflkwe","wefwefw","wefwefw","wefwef
String[][][] array3d = new String[1][1][1]; // initialize 3d array
array3d[0][0][0] = strReplaced;
System.out.println(array3d[0][0][0]);
return array3d;
}
The console output is:
wehnquflkwe","wefwefw","wefwefw","wefwef
Now the return value is:
[[["wehnquflkwe\",\"wefwefw\",\"wefwefw\",\"wefwef"]]]
I don't understand why the \ show up in the return value but not in the System.out.println.
Characters in memory can be represented in different ways.
Your integrated development environment (IDE) has a debugger that chooses to represent a String[][][] with a single element that contains the characters
wehnquflkwe","wefwefw","wefwefw","wefwef
as a java-quoted string
"wehnquflkwe\",\"wefwefw\",\"wefwefw\",\"wefwef"
this makes a lot of sense, because you can then copy and paste this string into java code without any loss.
On the other hand, your system's console, and the IDE's built-in terminal emulator, will output the characters in their normal representation, that is, without any java string-escape-characters:
wehnquflkwe","wefwefw","wefwefw","wefwef
As an experiment, you may want to check what happens with other "special" characters, such as \t (a tab break) or \b (backspace). This is just the tip of the iceberg - characters in Java generally translate into unicode points, which may or may not be supported by the fonts available in your system or terminal. The IDE's way of representing characters as java-quoted strings allows it to losslessly represent pretty much anything; System.out.println's output is a lot more variable.
System.out.println prints the String exactly as it is stored in memory.
On the other hand, when you stop the application flow using a breakpoint you are able to look up the values.
Most of the IDEs display escape characters with \ to indicate that it's just one String, not String[] in this case, or not to split the String into two lines if it contains \n in the middle.
Just in case, you still have doubts, I suggest printing strReplaced.length(). This should allow you to count characters one by one.
Possible experiments:
String s = "my cute \n two line String";
System.out.println(s + " length is: " + s.length());
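The difference between the two views can be reproduced by hand. The sketch below uses a hypothetical helper, javaQuote, that mimics roughly what a debugger might do when it shows a String as a Java literal; it is not the code any particular IDE uses, and it only handles a few escapes.

```java
class QuoteDemo {

    // Wrap the text in double quotes and escape a few special characters,
    // similar in spirit to a debugger's "Java literal" view of a String.
    static String javaQuote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n");  break;
                case '\t': sb.append("\\t");  break;
                default:   sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    public static void main(String[] args) {
        String replaced = "wehnquflkwe,wefwefw".replace(",", "\",\"");
        System.out.println(replaced);            // raw characters, no backslashes
        System.out.println(javaQuote(replaced)); // quoted view, with backslashes
        System.out.println(replaced.length());   // counts only the raw characters
    }
}
```

The backslashes appear only in the quoted view; the length confirms they are not stored in the String.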

Removing items from String

I am trying to replace all occurrences of a substring from a String.
I want to replace "\t\t\t" with "<3tabs>"
I want to replace "\t\t\t\t\t\t" with "<6tabs>"
I want to replace "\t\t\t\t" with "< >"
I am using
s = s.replace("\t\t\t\t", "< >");
s = s.replace("\t\t\t", "<3tabs>");
s = s.replace("\t\t\t\t\t\t", "<6tabs>");
But it was no use; it did not replace anything. Then I tried using
s = s.replaceAll("\t\t\t\t", "< >");
s = s.replaceAll("\t\t\t", "<3tabs>");
s = s.replaceAll("\t\t\t\t\t\t", "<6tabs>");
Again, no use; it did not replace anything. After trying these two methods I tried StringBuilder.
I was able to replace the items through StringBuilder. My question is: why am I unable to replace the items directly through String with the above two commands? Is there any method with which I can directly replace items in a String?
try in this order
String s = "This\t\t\t\t\t\tis\t\t\texample\t\t\t\t";
s = s.replace("\t\t\t\t\t\t", "<6tabs>");
s = s.replace("\t\t\t\t", "< >");
s = s.replace("\t\t\t", "<3tabs>");
System.out.print(s);
output:
This<6tabs>is<3tabs>example< >
The 6-tab replacement is never going to find a match, because the checks before it will already have consumed part of every six-tab run (the 4-tab replacement runs first and leaves only two tabs behind).
You need to start with largest match first.
Strings are immutable so you can't directly modify them, s.replace() returns a new String with the modifications present in it. You then assign that back to s though so it should work fine.
Put things in the correct order and step through it with a debugger to see what is happening.
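As an alternative to ordering three separate replace() calls, the runs can be labelled in a single pass. This is only a sketch, not what either answer above proposes: the class TabRunDemo and the method labelTabRuns are hypothetical names, and it assumes the three run lengths from the question (6, 4 and 3 tabs).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TabRunDemo {

    // Alternatives are tried left to right, so the longest run is listed first.
    private static final Pattern TAB_RUNS = Pattern.compile("\t{6}|\t{4}|\t{3}");

    static String labelTabRuns(String s) {
        Matcher m = TAB_RUNS.matcher(s);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int n = m.group().length();
            // The question maps four tabs to "< >" and the others to "<Ntabs>".
            String label = (n == 4) ? "< >" : "<" + n + "tabs>";
            m.appendReplacement(out, Matcher.quoteReplacement(label));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String s = "This\t\t\t\t\t\tis\t\t\texample\t\t\t\t";
        System.out.println(labelTabRuns(s));
    }
}
```

The same longest-first rule applies here: putting \t{3} first in the pattern would split a six-tab run into two three-tab matches.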
Take a look at this
Go through your text, divide it into a char[] array, then use a for loop to go through the individual characters.
Don't print them out straight, but print them using a %x tag (or %d if you like decimal numbers).
char[] characters = myString.toCharArray();
for (char c : characters)
{
    // Cast to int: %x and %d do not accept a char argument
    System.out.printf("%x%n", (int) c);
}
Get an ASCII table and look up all the numbers for the characters, and see whether there are any \n or \f or \r. Do this before or after.
Different operating systems use different line terminating characters; this is the first reference I found from Google with "line terminator Linux Windows." It says Windows uses \r\n and Linux \n. You should find out which one applies from your example. Obviously, if you strip \n and leave \r, you may still have the text break into separate lines.
You might be more successful if you write a regular expression (see this part of the Java Tutorial, etc) which includes whitespace and line terminators, and use it as a delimiter with the String.split() method, then print the individual tokens in order.
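A minimal sketch of that approach, assuming Java 8 or later, where the regex \R matches any line break (\r\n, \n, or \r). The class and method names are made up for illustration.

```java
import java.util.Arrays;

class LineSplitDemo {

    // Split on any line terminator; "\r\n" is treated as one break, not two.
    static String[] splitLines(String text) {
        return text.split("\\R");
    }

    public static void main(String[] args) {
        String mixed = "one\r\ntwo\nthree\rfour";
        System.out.println(Arrays.toString(splitLines(mixed)));
    }
}
```

Splitting this way avoids leaving a stray \r behind when the input came from a Windows source.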

Search java string for 'special' characters before inserting into derbyDB varchar field

I am trying to convert from MS Access to DerbyDB. However some varchar fields have 'special' characters, such as newlines, tabs, percent marks, foreign characters etc.
I created a quick method...
public String charCheck(String s)
{
errLog.add(1, "converting string from " + s);
s.replaceAll("'", "''");//an apostrophe is escaped with an apostrophy in Derby...
s.replaceAll("%", "\\%");//a percent sign
s.replaceAll("\\s+n", " ");//whitespace characters (newlines and tabs etc)
s.replaceAll("/", "\\/");//the 'divide' \ character,
s.replaceAll("<", "\\<");//mathematical symbol less than
s.replaceAll(">", "\\>");//mathematical symbol greater than
errLog.add(1, "to " + s);
return s;
}//end method
Which I run whenever I determine that I need a varchar (or long varchar) data type. The strange thing is that my error log prints out the messages, but in the output whitespace characters do not appear to change (ie tabs and new lines, do not get converted to a simple space) and any apostrophe in the string is not replaced.
a sample of the output of this method produces the following.
converting string from 2. FIN DE L’ESSAI
to 2. FIN DE L’ESSAI
So the string remains obviously unchanged, which upsets derbyDB when I run the insert statement, also I am not finding any obvious documentation on the escape sequence for inserting multiple records into a table, I would like to use a statement then add the escape keyword after it, ie
stmt.execute("{call "+ sqlInsertStatement +"}{escape '" + escapeCharacter +"'" );
I also read from the docs that the escape keyword may not be usefull in the above statement, if so how can I te
I need to know where to go to sort the insert error that I get.
If I copy and paste the insert statement directly into ij, then remove the special characters, the record will insert fine; I just don't understand why it isn't being converted in the first instance.
I have also tried surrounding varchar and longvarchar fields with double quotes, but again derby kicks out an error saying that a double quote was found!
I want to get this sorted as I feel like I am so close...
Thanks in advance
Strings are immutable; every operation you perform on them returns a new String. You need to assign the result back to your reference.
Example:
s= s.replaceAll("'", "''");
If it is just a literal replacement, then replace() may be a better option than replaceAll(), which treats its first argument as a regular expression.
Instead of working through the String, turn the String into a char array, then use a do...while loop to work through each char in turn using if tests to replace each char. Then turn the array back to a String then return it
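A sketch of the reassignment fix, reduced to two of the replacements from the question. It assumes the whitespace pattern was meant to be \s+ (the original "\\s+n" would only match whitespace followed by a literal n), and the class name CharCheckDemo is hypothetical.

```java
class CharCheckDemo {

    static String charCheck(String s) {
        // Each call returns a new String, so the result must be kept.
        String result = s.replace("'", "''");     // double every ASCII apostrophe
        result = result.replaceAll("\\s+", " ");  // collapse runs of whitespace
        return result;
    }

    public static void main(String[] args) {
        System.out.println(charCheck("it's\ta\n\nb"));
        // U+2019 is a different character from ', so it is left untouched.
        System.out.println(charCheck("L\u2019ESSAI"));
    }
}
```

Note that the sample output in the question appears to contain a typographic apostrophe (U+2019) rather than an ASCII one; if that is what the data really holds, a replacement targeting ' would not match it even after the reassignment is fixed.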

Java string replace and the NUL (NULL, ASCII 0) character?

Testing out someone elses code, I noticed a few JSP pages printing funky non-ASCII characters. Taking a dip into the source I found this tidbit:
// remove any periods from first name e.g. Mr. John --> Mr John
firstName = firstName.trim().replace('.','\0');
Does replacing a character in a String with a null character even work in Java? I know that '\0' will terminate a C-string. Would this be the culprit to the funky characters?
Does replacing a character in a String with a null character even work in Java? I know that '\0' will terminate a c-string.
That depends on how you define what is working. Does it replace all occurrences of the target character with '\0'? Absolutely!
String s = "food".replace('o', '\0');
System.out.println(s.indexOf('\0')); // "1"
System.out.println(s.indexOf('d')); // "3"
System.out.println(s.length()); // "4"
System.out.println(s.hashCode() == 'f'*31*31*31 + 'd'); // "true"
Everything seems to work fine to me! indexOf can find it, it counts as part of the length, and its value for hash code calculation is 0; everything is as specified by the JLS/API.
It DOESN'T work if you expect replacing a character with the null character would somehow remove that character from the string. Of course it doesn't work like that. A null character is still a character!
String s = Character.toString('\0');
System.out.println(s.length()); // "1"
assert s.charAt(0) == 0;
It also DOESN'T work if you expect the null character to terminate a string. It's evident from the snippets above, but it's also clearly specified in JLS (10.9. An Array of Characters is Not a String):
In the Java programming language, unlike C, an array of char is not a String, and neither a String nor an array of char is terminated by '\u0000' (the NUL character).
Would this be the culprit to the funky characters?
Now we're talking about an entirely different thing, i.e. how the string is rendered on screen. Truth is, even "Hello world!" will look funky if you use dingbats font. A unicode string may look funky in one locale but not the other. Even a properly rendered unicode string containing, say, Chinese characters, may still look funky to someone from, say, Greenland.
That said, the null character will probably look funky regardless; usually it's not a character that you want to display. Still, since the null character is not the string terminator, Java is more than capable of handling it one way or another.
Now to address what we assume is the intended effect, i.e. remove all period from a string, the simplest solution is to use the replace(CharSequence, CharSequence) overload.
System.out.println("A.E.I.O.U".replace(".", "")); // AEIOU
The replaceAll solution is mentioned here too, but that works with regular expression, which is why you need to escape the dot meta character, and is likely to be slower.
Should be probably changed to
firstName = firstName.trim().replaceAll("\\.", "");
I think it should be the case. To erase the character, you should use replace(".", "") instead.
Does replacing a character in a String with a null character even work in Java?
No.
Would this be the culprit to the funky characters?
Quite likely.
This does cause "funky characters":
System.out.println( "Mr. Foo".trim().replace('.','\0'));
produces:
Mr[] Foo
in my Eclipse console, where the [] is shown as a square box. As others have posted, use String.replace().
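The two overloads can be compared side by side. This is a small illustrative sketch; the class and method names are made up, and the input is the "Mr. John" example from the comment in the question.

```java
class PeriodDemo {

    // char overload: swaps each '.' for a NUL character, so the length is unchanged.
    static String replaceWithNul(String name) {
        return name.trim().replace('.', '\0');
    }

    // CharSequence overload: replacing with "" actually removes the periods.
    static String removePeriods(String name) {
        return name.trim().replace(".", "");
    }

    public static void main(String[] args) {
        String name = "Mr. John";
        System.out.println(replaceWithNul(name).length()); // same length as the input
        System.out.println(removePeriods(name));           // periods are gone
    }
}
```

The first call leaves a U+0000 in the string, which is the likely source of the box character in the rendered page.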
