Do you know why I am not getting "hello" as output?
byte f [] ="hello".getBytes();
System.out.println(f.toString());
Because byte[]#toString() is not implemented as new String(byteArray), which is what would give the result you expect (e.g. System.out.println(new String(byteArray));).
You may want to give this page an eye...
Because the toString method of a byte array does not print its contents (at all). And bytes are not characters anyway. Why do you expect to see "hello"? Why not call System.out.println("hello") directly?
The reason you are getting "strange" output from System.out.println(f.toString()) is that you are printing an array, not a String. Java's array classes do not override the toString() method. Therefore the toString() method that is being called is the one from java.lang.Object, which is defined to output the object's class name and its identity hash code. (In this case, the class name of the byte[] class is "[B".)
I think your confusion arises from the fact that you are mentally equating a String and a byte array. There are two reasons why this is conceptually wrong:
In Java, Strings are not arrays of anything. The String class is a fully encapsulated class that cannot be cast to anything else ... apart from Object and the interfaces it implements.
In Java, a String models a sequence of characters, not a sequence of bytes.
The latter is a key difference because there are many possible conversions between character sequences and bytes, many of which are lossy in one or both directions. When you call "hello".getBytes() you get the conversion implied by your platform's default character encoding, but you could have supplied a parameter to getBytes to use a different encoding in the conversion.
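To make the point about encodings concrete, here is a minimal sketch showing that the same String yields different bytes under different charsets, and that decoding only restores the text when the matching charset is used. The class name EncodingDemo and the sample text are illustrative assumptions, not part of the question.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class EncodingDemo {
    // 'h', e-with-acute-accent (U+00E9), 'l', 'l', 'o'; the escape keeps the source ASCII-only.
    static final String TEXT = "h\u00e9llo";

    static byte[] utf8() {
        return TEXT.getBytes(StandardCharsets.UTF_8);
    }

    static byte[] latin1() {
        return TEXT.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // U+00E9 takes two bytes in UTF-8 but one byte in ISO-8859-1.
        System.out.println(Arrays.toString(utf8()));   // [104, -61, -87, 108, 108, 111]
        System.out.println(Arrays.toString(latin1())); // [104, -23, 108, 108, 111]

        // Decoding with the charset that was used for encoding restores the text.
        System.out.println(new String(utf8(), StandardCharsets.UTF_8).equals(TEXT)); // true

        // Decoding with a different charset does not.
        System.out.println(new String(utf8(), StandardCharsets.ISO_8859_1).equals(TEXT)); // false
    }
}
```

Passing the charset explicitly on both sides avoids depending on the platform default.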
Since f is not a String, the toString() method of the Object class is called, not the one of the String class.
toString() of the String class returns the string itself, while toString() of the Object class returns:
getClass().getName() + '@' + Integer.toHexString(hashCode())
In other words, it is the same as:
classname@hexadecimal representation of the hash code
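As a small sketch of the difference (the class and method names here are made up for illustration), the following compares the inherited Object.toString() output of a byte array with two ways of actually showing its contents:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class ArrayToStringDemo {
    // Rebuilds what Object.toString() is documented to return.
    static String objectStyle(Object o) {
        return o.getClass().getName() + '@' + Integer.toHexString(o.hashCode());
    }

    public static void main(String[] args) {
        byte[] f = "hello".getBytes(StandardCharsets.UTF_8);

        // Arrays inherit toString() from Object: class name "[B", '@', hex hash code.
        // The hash part varies from run to run.
        System.out.println(f.toString());
        System.out.println(f.toString().equals(objectStyle(f))); // true

        // Decode the bytes back into text.
        System.out.println(new String(f, StandardCharsets.UTF_8)); // hello

        // Or print the numeric byte values.
        System.out.println(Arrays.toString(f)); // [104, 101, 108, 108, 111]
    }
}
```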
You can't convert between a byte array and a String without an encoding; if you don't name one, the platform default is used.
Try System.out.println(new String(f, "UTF-8"));
Currently, I need to work with the bytes of a String in Java, and it has raised so many questions about encodings and implementation details of the JVM. I would like to know if what I'm doing makes sense, or it is redundant.
To begin with, I understand that at runtime a Java char in a String will always represent a symbol in Unicode.
Secondly, the UTF-8 encoding is always able to successfully encode any symbol in Unicode. In turn, the following snippet will always return a byte[] without doing any replacement. getBytes documentation is here.
byte[] stringBytes = myString.getBytes(StandardCharsets.UTF_8);
Then, if stringBytes is used in a different JVM instance in the following way, it will always yield a string equivalent to myString.
new String(stringBytes, StandardCharsets.UTF_8);
Do you think that my understanding of getBytes is correct? If that is the case, how would you justify it? Am I missing a pathological case which could lead me not to get an equivalent version of myString?
Thanks in advance.
EDIT:
Would you agree that, by doing the following, any non-exceptional flow leads to a handled case, which allows us to successfully reconstruct the string?
EDIT:
Based on the answers, here goes the solution which allows you to safely reconstruct strings when no exception is thrown. You still need to handle the exception somehow.
First, get the bytes using the encoder:
final CharsetEncoder encoder =
        StandardCharsets.UTF_8
                .newEncoder()
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .onMalformedInput(CodingErrorAction.REPORT);
// encode() throws a CharacterCodingException if the input is malformed or contains an unmappable character, instead of silently substituting a replacement.
// The ByteBuffer's backing array may be larger than the encoded data, so copy only the first limit() bytes. Read its doc.
final ByteBuffer buffer = encoder.encode(CharBuffer.wrap(string));
byte[] stringBytes = Arrays.copyOf(buffer.array(), buffer.limit());
Second, construct the string using the bytes given by the encoder (non-exceptional path):
new String(stringBytes, StandardCharsets.UTF_8);
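A self-contained sketch of this strict encode-then-decode path follows. The class name StrictUtf8 and its helper methods are hypothetical; the sketch assumes that reading exactly the bytes between the buffer's position and limit is the intended way to avoid picking up unused space in the backing array.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class StrictUtf8 {
    // Encodes s as UTF-8, failing instead of substituting a replacement.
    static byte[] encodeStrict(String s) throws CharacterCodingException {
        CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder()
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .onMalformedInput(CodingErrorAction.REPORT);
        ByteBuffer buffer = encoder.encode(CharBuffer.wrap(s));
        // Copy only the encoded bytes, not the whole backing array.
        byte[] exact = new byte[buffer.remaining()];
        buffer.get(exact);
        return exact;
    }

    // True if s survives a strict encode followed by a decode.
    static boolean roundTrips(String s) {
        try {
            return new String(encodeStrict(s), StandardCharsets.UTF_8).equals(s);
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // True if strict encoding refuses the input.
    static boolean rejects(String s) {
        try {
            encodeStrict(s);
            return false;
        } catch (CharacterCodingException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrips("h\u00e9llo")); // well-formed text
        System.out.println(rejects("\uD83D"));        // unpaired high surrogate
    }
}
```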
it will always yield a string equivalent to myString.
Well, not always. Not a lot of things in this world happen always.
One edge case I can think of is that myString could be an "invalid" string when you call getBytes. For example, it could contain a lone (unpaired) surrogate:
String myString = "\uD83D";
How often this will happen heavily depends on what you are doing with myString, so I'll let you think about that on your own.
If myString has a lone surrogate, getBytes would encode a question mark character for it:
// prints "?"
System.out.println(
new String(myString.getBytes(StandardCharsets.UTF_8), StandardCharsets.UTF_8)
);
I wouldn't say a ? is "equivalent" to a malformed string.
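If you want to detect this case before encoding, one possible approach is to scan the string for unpaired surrogates. This is only a sketch; the class and method names are invented for illustration.

```java
import java.nio.charset.StandardCharsets;

class SurrogateCheck {
    // Returns true if every high surrogate is immediately followed by a low
    // surrogate and no low surrogate appears on its own.
    static boolean isWellFormed(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isHighSurrogate(c)) {
                if (i + 1 < s.length() && Character.isLowSurrogate(s.charAt(i + 1))) {
                    i++; // skip the low half of a valid pair
                } else {
                    return false;
                }
            } else if (Character.isLowSurrogate(c)) {
                return false;
            }
        }
        return true;
    }

    // Encodes to UTF-8 with String.getBytes and decodes again.
    static String roundTrip(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("hello"));        // true
        System.out.println(isWellFormed("\uD83D\uDE00")); // true
        System.out.println(isWellFormed("\uD83D"));       // false
        System.out.println(roundTrip("\uD83D"));          // ?
    }
}
```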
See also: Is an instance of a Java string always valid UTF-16?
I'm using CXF to implement a web services server.
Since I'm low on memory, I don't want the web service call parameters to be translated into Java strings, which are UTF-16; I would rather access the original UTF-8 buffers, which are usually half the size in my case.
So if I have a web method:
void addBook(String bookText)
How can I get the bookText without CXF translating it to java string?
The XML parsers used in Java (StAX parsers for CXF) only allow getting the XML contents as either a String or char[]. Thus, it wouldn't be possible to get the raw bytes.
If you have a String object in java, there is no such thing as whether it is UTF-8 or UTF-16 string. The encoding comes in when you convert a String to or from a byte array.
A String in Java is a character array. If you already have a String object in Java (for example, passed as a parameter to your addBook() method), it has already been interpreted properly and converted to a character array.
If you want to avoid character encoding conversions, the only way to do that is to define your method to receive a byte array instead of a String:
void addBook(byte[] bookTextUtf16);
But keep in mind that in this way you have to "remember" the encoding in which the byte array is valid (adding it to the name is one way).
If you need a java.lang.String object, then there is nothing you can do. A String is a character array in which each character is a 16-bit value. This is internal to String, and there is no way to change the internal representation. Either accept this or don't use java.lang.String to represent your strings.
An alternative could be to create your own class, Text for example, which holds the UTF-8 encoded byte array. As long as you don't need the String representation, keep it as a byte array and store it if you want to. Only create the java.lang.String instance when you do need the String.
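A minimal sketch of such a wrapper is shown below. The name Utf8Text and all of its methods are hypothetical, and nothing here is specific to CXF; it only illustrates keeping the UTF-8 bytes and decoding on demand.

```java
import java.nio.charset.StandardCharsets;

final class Utf8Text {
    private final byte[] utf8; // always UTF-8 encoded

    private Utf8Text(byte[] utf8) {
        this.utf8 = utf8;
    }

    // Wraps bytes that the caller asserts are valid UTF-8 (defensive copy).
    static Utf8Text fromUtf8Bytes(byte[] bytes) {
        return new Utf8Text(bytes.clone());
    }

    static Utf8Text fromString(String s) {
        return new Utf8Text(s.getBytes(StandardCharsets.UTF_8));
    }

    int byteLength() {
        return utf8.length;
    }

    byte[] toUtf8Bytes() {
        return utf8.clone();
    }

    // Decodes only when a String is actually needed.
    @Override
    public String toString() {
        return new String(utf8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        Utf8Text text = Utf8Text.fromString("book text");
        System.out.println(text.byteLength()); // 9
        System.out.println(text);              // book text
    }
}
```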
I want to know, if a String is a collection. I have read around but am still quite confused.
Strings are immutable objects representing a character sequence (CharSequence is one of the interfaces implemented by String).
Main difference to char arrays and collections of chars: String cannot be modified, it's not possible (ignoring reflection) to add/remove/replace characters.
In older JDKs (before Java 7 update 6) they were represented internally by a char array with an offset and length (this allowed lightweight substrings that shared the same char array).
Example: ['F','o','o','H','e','l','l','o',' ','W','o','r','l','d'], offset=3, count=5 = "Hello".
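The offset/count arithmetic in the example can be reproduced with the String(char[], int, int) constructor. Note that this constructor copies the selected characters, so the sketch only illustrates the indexing, not shared storage; the class name is made up.

```java
class OffsetCountDemo {
    static final char[] BACKING =
            {'F', 'o', 'o', 'H', 'e', 'l', 'l', 'o', ' ', 'W', 'o', 'r', 'l', 'd'};

    // Builds a String from count chars of BACKING starting at offset.
    static String view(int offset, int count) {
        return new String(BACKING, offset, count);
    }

    public static void main(String[] args) {
        System.out.println(view(3, 5)); // Hello
        System.out.println(view(0, 3)); // Foo
    }
}
```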
Strings are objects of class String, which is not a Collection. It is implemented on top of a char array, as you can see in the code snippet below:
public final class String
        implements java.io.Serializable, Comparable<String>, CharSequence {
    private final char value[];
    // ...
}
No, it's not a collection in that it does not implement the Collection<E> interface.
Conceptually, it is an immutable sequence of characters (and implements the CharSequence interface).
Internally, the String class is likely to use an array of chars, although I am pretty sure the spec does not mandate that.
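A quick way to check this, and to get an actual collection of characters when one is needed, might look like the following sketch (class and method names are illustrative):

```java
import java.util.Collection;
import java.util.List;
import java.util.stream.Collectors;

class StringIsNotCollection {
    // Copies the chars of s into a real List<Character>.
    static List<Character> toCharList(String s) {
        return s.chars()
                .mapToObj(c -> (char) c)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Object s = "abc"; // typed as Object so both instanceof checks compile

        System.out.println(s instanceof CharSequence);  // true
        System.out.println(s instanceof Collection<?>); // false

        System.out.println(toCharList("abc")); // [a, b, c]
    }
}
```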
No, a string is an object of class String. From the documentation:
The String class represents character strings. All string literals in
Java programs, such as "abc", are implemented as instances of this
class....
A String represents a string in the UTF-16 format in which
supplementary characters are represented by surrogate pairs (see the
section Unicode Character Representations in the Character class for
more information). Index values refer to char code units, so a
supplementary character uses two positions in a String.
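The last sentence of the quoted documentation can be seen in a short sketch (class name invented; U+1F600 is used as an example of a supplementary character):

```java
class CodeUnitsDemo {
    // 'A' followed by U+1F600, written as its surrogate pair.
    static final String TEXT = "A\uD83D\uDE00";

    public static void main(String[] args) {
        // length() counts char code units, so the supplementary character counts twice.
        System.out.println(TEXT.length()); // 3

        // codePointCount() counts Unicode code points.
        System.out.println(TEXT.codePointCount(0, TEXT.length())); // 2

        // Index 1 holds the high surrogate; codePointAt joins it with the low one.
        System.out.println(Character.isHighSurrogate(TEXT.charAt(1)));  // true
        System.out.println(Integer.toHexString(TEXT.codePointAt(1)));   // 1f600
    }
}
```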
No it's not an Array or a Collection. However there is a convenient method if you need a char array and you have a String - namely String.toCharArray(); you could use it like this
// Prints the String "Hello, World!" as a char[].
System.out.println(Arrays.toString("Hello, World!".toCharArray()));
No, a String is an immutable string of characters, and the class extends Object directly and does not implement any of the Collection interfaces. Really, that's the basic answer.
However, there's a lot going on under the covers in the runtime: the JVM keeps a pool of interned strings, and in its most primitive representation, yes, a String is basically an array of characters (a contiguous block of memory holding the character values). Still, once you go below the definition of String as a class, you can keep going until you are just talking about organized combinations of ones and zeroes.
I'm guessing here, but I imagine you posed the question because you're studying this and an instructor said something about a String being an array of characters. While technically correct, that's really confusing because the two concepts exist at completely different levels of abstraction.
The String class is just a class that lets you create string objects. Like any class, it makes those objects easy to use through its own methods.
An array of char can hold the same characters as a string, but the operations available on it are those of arrays and their elements.
So, for example, if you want to find the position of a character in a String, you use a method called indexOf.
But if you want to find a character in an array of char, you have to do it manually with a loop.
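The contrast can be sketched as follows; the helper method for char arrays is hand-written for illustration and the class name is invented.

```java
class FindCharDemo {
    // Manual search: returns the first index of target in chars, or -1 if absent.
    static int indexOf(char[] chars, char target) {
        for (int i = 0; i < chars.length; i++) {
            if (chars[i] == target) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String s = "Hello";
        char[] chars = s.toCharArray();

        System.out.println(s.indexOf('l'));      // 2, using the String method
        System.out.println(indexOf(chars, 'l')); // 2, using the manual loop
        System.out.println(indexOf(chars, 'z')); // -1
    }
}
```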
I run this method:
String str = Character.valueOf(c).toString();
The output appears as a small square on the console and like a bar code in the file. What is the actual format of the output? Also, the output cannot be copied.
Character.valueOf(char) simply takes a char and returns a Character wrapper instance representing the same value. Using toString() on that returns a String containing only that single character.
So if your initial char value represents a printable character, then the result should be just as printable.
If, however, you use an arbitrary numeric value (especially, if you use 0, 1 or very high values) as your starting point, then the result will be a String containing a single Unicode character with that codepoint. That's unlikely to be a useful result.
In other words: crap in, crap out.
What is your actual input value and what do you expect to happen?
Try
String.valueOf(c);
.toString() offers a string representation of an object and that can be anything. What you want is not the string representation of the Character object. You want to convert/create a string from a char.
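To see what a char actually contains when it prints as a square, one option is to show its numeric code instead of the character itself. The sketch below uses a hypothetical describe helper; it also shows that, for a char, both conversions produce the same one-character String.

```java
class CharToStringDemo {
    // Describes c by its code point, flagging control characters
    // that a console cannot display.
    static String describe(char c) {
        String shown = Character.isISOControl(c) ? "(control character)" : "'" + c + "'";
        return String.format("U+%04X %s", (int) c, shown);
    }

    public static void main(String[] args) {
        char c = 'A';
        // Both conversions yield the same single-character String.
        System.out.println(String.valueOf(c).equals(Character.valueOf(c).toString())); // true

        System.out.println(describe('A'));      // U+0041 'A'
        System.out.println(describe((char) 1)); // U+0001 (control character)
    }
}
```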
Currently, when I make a signature using java.security.Signature, it passes back a string.
I can't seem to use this string since it contains special characters that can only be seen when I copy the string into Notepad++; if I remove these special characters there, I can use the rest of the string in my program.
In Notepad++ they look like black boxes with the words ACK GS STX SI SUB ETB BS VT
I don't really understand what they are, so it's hard to tell how to get rid of them.
Is there a function that I can run to remove these and potentially similar characters?
When I use the Base64 class supplied in the posts, I can't get back to a signature:
System.out.println(signature);
String base64 = Base64.encodeBytes(sig);
System.out.println(base64);
String sig2 = new String (Base64.decode(base64));
System.out.println(sig2);
gives the output
”zÌý¥y]žd”xKmËY³ÕN´Ìå}ÏBÊNÈ›`Αrp~jÖüñ0…Rõ…•éh?ÞÀ_û_¥ÂçªsÂk{6H7œÉ/”âtTK±Ï…Ã/Ùê²
lHrM/aV5XZ5klHhLbctZs9VOtMzlfc9Cyk7Im2DOkXJwfmoG1vzxMIVS9YWV6Wg/HQLewF/7X6XC56pzwmt7DzZIN5zJL5TidFRLsc+Fwy/Z6rIaNA2uVlCh3XYkWcu882tKt2RySSkn1heWhG0IeNNfopAvbmHDlgszaWaXYzY=
[B@15356d5
The odd characters are there because cryptographic signatures produce bytes rather than strings. Consequently if you want a printable representation you should Base64 encode it (here's a public domain implementation for Java).
Stripping the non-printing characters from a cryptographic signature will render it useless as you will be unable to use it for verification.
Update:
[B@15356d5
This is the result of toString called on a byte array. "[" means array, "B" means byte and "15356d5" is the array's identity hash code in hexadecimal (not necessarily its memory address). You should be passing the array you get out of decode to [Signature.verify](http://java.sun.com/j2se/1.4.2/docs/api/java/security/Signature.html#verify(byte[])).
Something like:
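Signature sig = Signature.getInstance("SHA1withDSA"); // or whichever algorithm produced the signature
sig.initVerify(key);
sig.update(data); // data: the original bytes that were signed (name assumed here)
sig.verify(Base64.decode(base64)); // <-- bytes go here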
How are you "making" the signature? If you use the sign method, you get back a byte array, not a string. That's not a binary representation of some text, it's just arbitrary binary data. That's what you should use, and if you need to convert it into a string you should use a base64 conversion to avoid data corruption.
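An end-to-end sketch of this advice is shown below. It swaps in java.util.Base64 from the standard library (Java 8+) in place of the Base64 class used in the question, and it assumes an RSA key pair with SHA256withRSA purely for illustration; the class and method names are invented.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.Signature;
import java.util.Base64;

class SignatureBase64Demo {
    // Signs message, moves the signature through Base64 text, and verifies it.
    static boolean signEncodeDecodeVerify(String message) {
        try {
            byte[] data = message.getBytes(StandardCharsets.UTF_8);

            KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
            generator.initialize(2048);
            KeyPair pair = generator.generateKeyPair();

            Signature signer = Signature.getInstance("SHA256withRSA");
            signer.initSign(pair.getPrivate());
            signer.update(data);
            byte[] sig = signer.sign(); // arbitrary binary data, not text

            // Base64 gives a printable form that can be turned back into the same bytes.
            String base64 = Base64.getEncoder().encodeToString(sig);
            byte[] decoded = Base64.getDecoder().decode(base64);

            Signature verifier = Signature.getInstance("SHA256withRSA");
            verifier.initVerify(pair.getPublic());
            verifier.update(data);
            return verifier.verify(decoded); // pass the bytes, never new String(bytes)
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(signEncodeDecodeVerify("hello")); // true
    }
}
```

The decoded value stays a byte[] the whole time; converting it to a String, as in the question, is what produced the "[B@..." output.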
If I understand your problem correctly, you need to get rid of characters with code below 32, except maybe char 9 (tab), char 10 (new line) and char 13 (return).
Edit: I agree with the others that handling crypto output like this is not what you usually want.