I have a big String that was once converted to a ByteBuffer & then while reading later several times, only a portion of the String(overview of the text) needs to be presented, so I want to convert only a part of the ByteBuffer to String.
Is it possible to convert only a part of bytebuffer to string rather than [converting entire Bytebuffer to String & then using substring()]
try {
ByteBuffer bbuf = encoder.encode(CharBuffer.wrap(yourstr));
bbuf.position(0);
bbuf.limit(200);
CharBuffer cbuf = decoder.decode(bbuf);
String s = cbuf.toString();
System.out.println(s);
} catch (CharacterCodingException e) {
}
Which should return chars from the byte buffer starting at 0. byte and ending in 200.
Or rather:
ByteBuffer bbuf = ByteBuffer.wrap(yourstr.getBytes());
bbuf.position(0);
bbuf.limit(200);
byte[] bytearr = new byte[bbuf.remaining()];
bbuf.get(bytearr);
String s = new String(bytearr);
Which does the same but without explicit character decoding/encoding.
Decoding of course does happen in constructor of String s and it is platform dependent, so watch out.
// convert all byteBuffer to string
String fullByteBuffer = new String(byteBuffer.array());
// convert part of byteBuffer to string
byte[] partOfByteBuffer = new byte[PART_LENGTH];
System.arraycopy(fullByteBuffer.array(), 0, partOfByteBuffer, 0, partOfByteBuffer.length);
String partOfByteBufferString = new String(partOfByteBuffer.array());
Related
I am trying the following:
C# Client:
string stringToSend = "Hello man";
BinaryWriter writer = new BinaryWriter(mClientSocket.GetStream(),Encoding.UTF8);
//write number of bytes:
byte[] headerBytes = BitConverter.GetBytes(stringToSend.Length);
mClientSocket.GetStream().Write(headerBytes, 0, headerBytes.Length);
//write text:
byte[] textBytes = System.Text.Encoding.UTF8.GetBytes(stringToSend);
writer.Write(textBytes, 0, textBytes.Length);
Java Server:
Charset utf8 = Charset.forName("UTF-8");
BufferedReader in = new BufferedReader(new InputStreamReader(clientSocket.getInputStream(), utf8));
while (true) {
//we read header first
int headerSize = in.read();
int bytesRead = 0;
char[] input = new char[headerSize];
while (bytesRead < headerSize)
{
bytesRead += in.read(input, bytesRead, headerSize - bytesRead);
}
String resString = new String(input);
System.out.println(resString);
if (resString.equals("!$$$")) {
break;
}
}
The string size equals 9.That's correct on both sides.But, when I am reading the string iteself on the Java side, the data looks wrong.The char buffer ('input' variable)content looks like this:
",",",'H','e','l','l','o',''
I tried to change endianness with reversing the byte array.Also tried changing string encoding format between ASCII and UTF-8.I still feel like it relates to the endianness problem,but can not figure out how to solve it.I know I can use other types of writers in order to write text data to the steam,but I am trying using raw byte arrays for the sake of learning.
These
byte[] headerBytes = BitConverter.GetBytes(stringToSend.Length);
are 4 bytes. And they aren't character data so it makes no sense to read them with a BufferedReader. Just read the bytes directly.
byte[] headerBytes = new byte[4];
// shortcut, make sure 4 bytes were actually read
in.read(headerBytes);
Now extract your text's length and allocate enough space for it
int length = ByteBuffer.wrap(headerBytes).getInt();
byte[] textBytes = new byte[length];
Then read the text
int remaining = length;
int offset = 0;
while (remaining > 0) {
int count = in.read(textBytes, offset, remaining);
if (-1 == count) {
// deal with it
break;
}
remaining -= count;
offset += count;
}
Now decode it as UTF-8
String text = new String(textBytes, StandardCharsets.UTF_8);
and you are done.
Endianness will have to match for those first 4 bytes. One way of ensuring that is to use "network order" (big-endian). So:
C# Client
byte[] headerBytes = BitConverter.GetBytes(IPAddress.HostToNetworkOrder(stringToSend.Length));
Java Server
int length = ByteBuffer.wrap(headerBytes).order(ByteOrder.BIG_ENDIAN).getInt();
At first glance it appears you have a problem with your indexes.
You C# code is sending an integer converted to 4 bytes.
But you Java Code is only reading a single byte as the length of the string.
The next 3 bytes sent from C# are going to the three zero bytes from your string length.
You Java code is reading those 3 zero bytes and converting them to empty characters which represent the first 3 empty characters of your input[] array.
C# Client:
string stringToSend = "Hello man";
BinaryWriter writer = new BinaryWriter(mClientSocket.GetStream(),Encoding.UTF8);
//write number of bytes: Original line was sending the entire string here. Optionally if you string is longer than 255 characters, you'll need to send another data type, perhaps an integer converted to 4 bytes.
byte[] textBytes = System.Text.Encoding.UTF8.GetBytes(stringToSend);
mClientSocket.GetStream().Write((byte)textBytes.Length);
//write text the entire buffer
writer.Write(textBytes, 0, textBytes.Length);
Java Server:
Charset utf8 = Charset.forName("UTF-8");
BufferedReader in = new BufferedReader(new InputStreamReader(clientSocket.getInputStream(), utf8));
while (true) {
//we read header first
// original code was sending an integer as 4 bytes but was only reading a single char here.
int headerSize = in.read();// read a single byte from the input
int bytesRead = 0;
char[] input = new char[headerSize];
// no need foe a while statement here:
bytesRead = in.read(input, 0, headerSize);
// if you are going to use a while statement, then in each loop
// you should be processing the input but because it will get overwritten on the next read.
String resString = new String(input, utf8);
System.out.println(resString);
if (resString.equals("!$$$")) {
break;
}
}
I have a problem in code:
private static String compress(String str)
{
String str1 = null;
ByteArrayOutputStream bos = null;
try
{
bos = new ByteArrayOutputStream();
BufferedOutputStream dest = null;
byte b[] = str.getBytes();
GZIPOutputStream gz = new GZIPOutputStream(bos,b.length);
gz.write(b,0,b.length);
bos.close();
gz.close();
}
catch(Exception e) {
System.out.println(e);
e.printStackTrace();
}
byte b1[] = bos.toByteArray();
return new String(b1);
}
private static String deCompress(String str)
{
String s1 = null;
try
{
byte b[] = str.getBytes();
InputStream bais = new ByteArrayInputStream(b);
GZIPInputStream gs = new GZIPInputStream(bais);
ByteArrayOutputStream baos = new ByteArrayOutputStream();
int numBytesRead = 0;
byte [] tempBytes = new byte[6000];
try
{
while ((numBytesRead = gs.read(tempBytes, 0, tempBytes.length)) != -1)
{
baos.write(tempBytes, 0, numBytesRead);
}
s1 = new String(baos.toByteArray());
s1= baos.toString();
}
catch(ZipException e)
{
e.printStackTrace();
}
}
catch(Exception e) {
e.printStackTrace();
}
return s1;
}
public String test() throws Exception
{
String str = "teststring";
String cmpr = compress(str);
String dcmpr = deCompress(cmpr);
}
This code throw java.io.IOException: unknown format (magic number ef1f)
GZIPInputStream gs = new GZIPInputStream(bais);
It turns out that when converting byte new String (b1) and the byte b [] = str.getBytes () bytes are "spoiled." At the output of the line we have already more bytes. If you avoid the conversion to a string and work on the line with bytes - everything works. Sorry for my English.
public String unZip(String zipped) throws DataFormatException, IOException {
byte[] bytes = zipped.getBytes("WINDOWS-1251");
Inflater decompressed = new Inflater();
decompressed.setInput(bytes);
byte[] result = new byte[100];
ByteArrayOutputStream buffer = new ByteArrayOutputStream();
while (decompressed.inflate(result) != 0)
buffer.write(result);
decompressed.end();
return new String(buffer.toByteArray(), charset);
}
I'm use this function to decompress server responce. Thanks for help.
You have two problems:
You're using the default character encoding to convert the original string into bytes. That will vary by platform. It's better to specify an encoding - UTF-8 is usually a good idea.
You're trying to represent the opaque binary data of the result of the compression as a string by just calling the String(byte[]) constructor. That constructor is only meant for data which is encoded text... which this isn't. You should use base64 for this. There's a public domain base64 library which makes this easy. (Alternatively, don't convert the compressed data to text at all - just return a byte array.)
Fundamentally, you need to understand how different text and binary data are - when you want to convert between the two, you should do so carefully. If you want to represent "non text" binary data (i.e. bytes which aren't the direct result of encoding text) in a string you should use something like base64 or hex. When you want to encode a string as binary data (e.g. to write some text to disk) you should carefully consider which encoding to use. If another program is going to read your data, you need to work out what encoding it expects - if you have full control over it yourself, I'd usually go for UTF-8.
Additionally, the exception handling in your code is poor:
You should almost never catch Exception; catch more specific exceptions
You shouldn't just catch an exception and continue as if it had never happened. If you can't really handle the exception and still complete your method successfully, you should let the exception bubble up the stack (or possibly catch it and wrap it in a more appropriate exception type for your abstraction)
When you GZIP compress data, you always get binary data. This data cannot be converted into string as it is no valid character data (in any encoding).
So your compress method should return a byte array and your decompress method should take a byte array as its parameter.
Futhermore, I recommend you use an explicit encoding when you convert the string into a byte array before compression and when you turn the decompressed data into a string again.
When you GZIP compress data, you always get binary data. This data
cannot be converted into string as it is no valid character data (in
any encoding).
Codo is right, thanks a lot for enlightening me. I was trying to decompress a string (converted from the binary data). What I amended was using InflaterInputStream directly on the input stream returned by my http connection. (My app was retrieving a large JSON of strings)
I am converting a String to byte[] and then again byte[] to String in java using the getbytes() and String constructor,
String message = "Hello World";
byte[] message1 = message.getbytes();
using PipedInput/OutputStream I send this to another thread, where,
byte[] getit = new byte[1000];
pipedinputstream.read(getit);
print(new String(getit));
This last print result in 1000 to be printed... I want the actual string length. How can i do that?
When reading the String, you need to get the number of bytes read, and give the length to your String:
byte[] getit = new byte[1000];
int readed = pipedinputstream.read(getit);
print(new String(getit, 0, readed));
Note that if your String is longer than 1000 bytes, it will be truncated.
You are ignoring the number of bytes read. Do it as below:
byte[] getit = new byte[1000];
int bytesRead = pipedinputstream.read(getit);
print(new String(getit, 0, bytesRead).length());
public String getText (byte[] arr)
{
StringBuilder sb = new StringBuilder (arr.length);
for (byte b: arr)
if (b != 32)
sb.append ((char) b);
return sb.toString ();
}
not so clean, but should work.
I am converting a String to byte[] and then again byte[] to String
Why? The round trip is guaranteed not to work. String is not a container for binary data. Don't do this. Stop beating your head against the wall: the pain will stop after a while.
I have a string cityName which I decoded into bytes as follows:
byte[] cityBytes = cityName.getBytes("UTF-8");
...and stored the bytes somewhere. When I retrieve those bytes, how can I decode them back into a string?
Use the String(byte[], Charset) or String(byte[], String) constructor.
byte[] rawBytes = /* whatevs */
try
{
String decoded = new String(rawBytes, Charset.forName("UTF-8"));
// or
String decoded = new String(rawBytes, "UTF-8");
// best, if you're using Java 7 (thanks to #ColinD):
String decoded = new String(rawBytes, StandardCharsets.UTF_8);
}
catch (UnsupportedEncodingException e)
{
// see http://stackoverflow.com/a/6030187/139010
throw new AssertionError("UTF-8 not supported");
}
The String class has a few constructors that accept an array of bytes, including one that takes an array of bytes and a String representation of a charset and another that takes a Charset object. There are also constructors that take the offset and length of the String as arguments, if the String is only a small section of the byte array.
Like this:
String cityName = new String(cityByte,"UTF-8");
String s = new String(cityByte, "UTF-8");
Try this: http://docs.oracle.com/javase/6/docs/api/java/lang/String.html
String(byte[] bytes, String charsetName)
I have a ByteBuffer containing bytes that were derived by String.getBytes(charsetName), where "containing" means that the string comprises the entire sequence of bytes between the ByteBuffer's position() and limit().
What's the best way for me to get the string back? (assuming I know the encoding charset) Is there anything better than the following (which seems a little clunky)
byte[] ba = new byte[bbuf.remaining()];
bbuf.get(ba);
try {
String s = new String(ba, charsetName);
}
catch (UnsupportedEncodingException e) {
/* take appropriate action */
}
String s = Charset.forName(charsetName).decode(bbuf).toString();