How To Convert InputStream To String To Byte Array In Java?

On my java server I get from an iOS client an InputStream, which looks like this:
--0xKhTmLbOuNdArY
Content-Disposition: form-data; filename="Image001"
Content-Type: image/png
âPNG
IHDR���#���#���™iqfi���gAMA��Ø»7äÈ���tEXtSoftware�Adobe ImageReadyq…e<��IDATx⁄‰;iê]Uôflπ˜Ω◊Ø;ΩB::õY
ê6LÄ“Õ¿
... etc. ...
≠Yy<‘_˜øüYmc˚æØ…ægflóÏK$å±çe0ˆΩleIë¢êH¢Tñê–Üd
≠≤§àä6D¸>˙÷˜˚<øÁ˘˝˜˚º^sÁ=Áû{ÓπÁ‹œπ˜úÄÎ:!44¡#
--0xKhTmLbOuNdArY--
The first and last lines are my HTTP boundary. Lines 2 and 3 contain information about the image file. And from line 5 until the penultimate line is the image file, which I need as a byte array.
So how do I get the image information as a String and the image file as a byte array from the InputStream?
The solution should be fast and efficient (the file size can be several megabytes, up to 10 MB).
My approach:
I convert the InputStream to a String, then split it and convert the second String to a byte array...
String str = org.apache.commons.io.IOUtils.toString( inputStream );
String[] strArray1 = str.split( "\r\n\r\n", 2 );
byte[] bytes = strArray1[1].getBytes();
That way is very fast, but the byte array seems to be damaged: I cannot create an image file from it, because some bytes are converted incorrectly.
Perhaps someone can help?

The reason why your code breaks is the first line:
String str = org.apache.commons.io.IOUtils.toString( inputStream );
Trying to convert random bytes into Unicode characters, and then back to the same random bytes, isn't going to work.
The only way you can make this work is by reading the input in stages, rather than reading it all into a String.
Read from the InputStream until you're convinced you're past the HTTP boundary line.
Read the rest of the stream into a byte array (you can use IOUtils for that, too).
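A minimal sketch of that staged approach, assuming CRLF line endings and a blank line between the headers and the body as in the example above (class and method names are made up):

import org.apache.commons.io.IOUtils;

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class MultipartSplit {

    // Consume bytes up to and including the blank line (\r\n\r\n) that ends
    // the boundary/header block, and return that block as text.
    static String readHeaders(InputStream in) throws IOException {
        ByteArrayOutputStream header = new ByteArrayOutputStream();
        int matched = 0;   // how many bytes of the \r\n\r\n separator we have seen
        int b;
        while (matched < 4 && (b = in.read()) != -1) {
            header.write(b);
            if (b == '\r') {
                matched = (matched == 2) ? 3 : 1;
            } else if (b == '\n' && (matched == 1 || matched == 3)) {
                matched++;
            } else {
                matched = 0;
            }
        }
        // Headers are plain ASCII, so ISO-8859-1 decodes them without loss.
        return header.toString("ISO-8859-1");
    }

    // Everything after the blank line is the raw image data.
    // Note: the closing boundary line, if present, still has to be trimmed
    // from the end of this array.
    static byte[] readBody(InputStream in) throws IOException {
        return IOUtils.toByteArray(in);
    }
}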

You probably don't want to convert your bytes to char and back; that would corrupt your bytes, because the byte stream doesn't correspond to any character encoding.
I would read the whole thing in as a byte[] using IOUtils.toByteArray, then look for the byte sequence "\r\n\r\n".getBytes() in that array.
Note that IOUtils.toByteArray doesn't stop until end-of-stream. This should be fine for HTTP 1.0, but will break for HTTP 1.1 which can send multiple requests on the same stream. In that case, you'll have to read incrementally to find the Content-Length field so you know how much of the InputStream to read.
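A sketch of that byte-level split (the indexOf helper is hand-rolled here, not an IOUtils method, and the class name is made up):

import org.apache.commons.io.IOUtils;

import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

public class SplitOnSeparator {

    // Naive search for the first occurrence of 'pattern' inside 'data'.
    static int indexOf(byte[] data, byte[] pattern) {
        outer:
        for (int i = 0; i <= data.length - pattern.length; i++) {
            for (int j = 0; j < pattern.length; j++) {
                if (data[i + j] != pattern[j]) continue outer;
            }
            return i;
        }
        return -1;
    }

    static void split(InputStream inputStream) throws IOException {
        byte[] all = IOUtils.toByteArray(inputStream);
        byte[] separator = "\r\n\r\n".getBytes("US-ASCII");

        int pos = indexOf(all, separator);
        if (pos < 0) {
            throw new IOException("header/body separator not found");
        }
        // Only the header part is ever decoded to text; the image bytes stay bytes.
        String headers = new String(all, 0, pos, "US-ASCII");
        byte[] image = Arrays.copyOfRange(all, pos + separator.length, all.length);
    }
}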

Related

FileInputStream available method returns 2 when there is only 1 byte in the file

As said in the title, I am trying to read a file byte by byte using a FileInputStream. My code reads:
FileInputStream input = new FileInputStream(inFileName);
System.out.println(input.available());
My file inFileName contains only the character "±", which should amount to only one byte; however, when I run the program, the output is 2.
Any help is greatly appreciated.
That is a Unicode character, which in this case is encoded as 2 bytes.
http://www.fileformat.info/info/unicode/char/b1/index.htm
Scroll down to the UTF-8 part and you can see the value of each byte.
If your ultimate goal is to get a string from a byte array that is UTF-8, then you can generate a String from bytes using new String(bytes, "UTF-8");
It's also possible that this is UTF-16 (which would also be 2 bytes), but that is less common.
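Assuming the file was indeed saved as UTF-8, a small sketch of why available() reports 2:

import java.io.UnsupportedEncodingException;

public class PlusMinusBytes {
    public static void main(String[] args) throws UnsupportedEncodingException {
        byte[] utf8 = "±".getBytes("UTF-8");
        System.out.println(utf8.length);          // prints 2 (0xC2 0xB1)
        for (byte b : utf8) {
            System.out.printf("0x%02X%n", b);
        }
        // Decoding the same bytes back yields the original character.
        System.out.println(new String(utf8, "UTF-8"));
    }
}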

How to convert chunks of UTF-8 bytes to characters?

I have a large UTF-8 input that is divided into 1-kB chunks. I need to process it using a method that accepts a String. Something like:
for (File file : inputs) {
    byte[] b = FileUtils.readFileToByteArray(file);
    String str = new String(b, "UTF-8");
    processor.process(str);
}
My problem is that I have no guarantee that a multi-byte UTF-8 character is not split between two chunks. The result of running my code is that some lines end with '?', which corrupts my input.
What would be a good approach to solve this?
If I understand correctly, you had a large text, which was encoded with UTF-8, then split into 1-kilobyte files. Now you want to read the text back, but you are concerned that an encoded character might be split across file boundaries, and cause a UTF-8 decoding error.
The API is a bit dusty, but there is a SequenceInputStream that will create what appears to be a single InputStream from a series of sub-streams. Create one of these with a collection of FileInputStream instances, then create an InputStreamReader that decodes the stream of UTF-8 bytes to text for your application.
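A sketch of that combination, assuming the chunk files are listed in their original order (file names and the line-by-line hand-off to the processor are placeholders):

import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.SequenceInputStream;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class JoinChunks {
    public static void main(String[] args) throws IOException {
        // The chunk files, in the order they were split (names are placeholders).
        List<File> inputs = new ArrayList<File>();
        inputs.add(new File("chunk-000"));
        inputs.add(new File("chunk-001"));

        List<InputStream> streams = new ArrayList<InputStream>();
        for (File file : inputs) {
            streams.add(new FileInputStream(file));
        }

        // SequenceInputStream presents the chunks as one continuous byte stream,
        // so a multi-byte character split across two files decodes correctly.
        InputStream joined = new SequenceInputStream(Collections.enumeration(streams));
        BufferedReader reader = new BufferedReader(new InputStreamReader(joined, "UTF-8"));
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                // processor.process(line);   // hand text to the processing method
                System.out.println(line);
            }
        } finally {
            reader.close();   // closes the SequenceInputStream and all sub-streams
        }
    }
}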

Dividing an InputStream into two streams

I needed to combine two InputStreams, one of which contains only a string and the other an AVI file. I can easily combine the two like below:
InputStream is = new SequenceInputStream(stringStream, aviStream);
However, when it comes to taking the string and the AVI back out separately, I am failing. The AVI file does not match its original size; somehow I am losing some bytes.
I can read is with a Scanner by using a delimiter, but in that case the AVI data would become a String and I would lose information.
Any idea how I can separate the InputStream? I want to read back my string and have the rest of the stream as an InputStream.
EDIT: I am thinking of converting the stream to a byte array, then dividing the array in two according to the delimiter. The second part, which is the AVI, I could then convert back to an InputStream. But that seems too troublesome.
A Scanner is only for text data, not binary. Binary data such as AVI can contain every byte value, which means there is no delimiter you can use that cannot also appear in the data.
You can combine text with raw data in the following manner.
DataOutputStream dos = new DataOutputStream(...);   // wrap the destination stream
dos.writeUTF(string);          // length-prefixed, modified-UTF-8 string
dos.writeInt(bytes.length);    // number of AVI bytes to follow
dos.write(bytes);              // the AVI data itself
To reverse the process, use a DataInputStream with readUTF(), readInt() and readFully().
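A sketch of that read side, mirroring the write order above:

import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ReadCombined {
    // Reverse of the write above: the string first, then the length-prefixed bytes.
    static void read(InputStream in) throws IOException {
        DataInputStream dis = new DataInputStream(in);
        String text = dis.readUTF();       // the string written by writeUTF
        int length = dis.readInt();        // the byte count written by writeInt
        byte[] avi = new byte[length];
        dis.readFully(avi);                // reads exactly 'length' bytes of the AVI
        // 'text' and 'avi' now hold the two original parts
    }
}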

RandomAccessFile adding ETX M to start of file

This is on CentOS 6.2. I am writing to a text file, and it is adding an ETX character followed by an M to the beginning (ETX is the name of the control character).
file.setLength(0);
file.seek(0);
file.writeUTF(somestring);
To quote from the documentation for RandomAccessFile.writeUTF():
First, two bytes are written to the file, starting at the current file pointer, as if by the writeShort method giving the number of bytes to follow. This value is the number of bytes actually written out, not the length of the string. Following the length, each character of the string is output, in sequence, using the modified UTF-8 encoding for each character.
If you don't want this, convert the string to bytes manually and write those bytes with the basic write() method (nb: writeBytes() is not what you want). However, you're going to need some way to keep track of the size of the string in order to read it again (unless you're using fixed-width fields).
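A sketch of that manual approach (file name, placeholder string and charset are assumptions):

import java.io.IOException;
import java.io.RandomAccessFile;

public class PlainStringWrite {
    public static void main(String[] args) throws IOException {
        String somestring = "some text";                 // placeholder content
        RandomAccessFile file = new RandomAccessFile("out.txt", "rw");
        try {
            file.setLength(0);
            file.seek(0);
            // Only the text bytes are written; no hidden 2-byte length prefix
            // like writeUTF adds, so nothing unexpected appears at the start.
            file.write(somestring.getBytes("UTF-8"));
        } finally {
            file.close();
        }
        // To read the string back, read the whole file (or a separately stored
        // length) and decode it: new String(allBytes, "UTF-8").
    }
}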

Periodically incrementally reading text file as byte array ignoring partially appended lines

I am trying to read into byte array a text file which is appended by another process. I'd like to "poll" the file periodically to extract only new "full" lines of text (the lines ended with a new line). What's the best way to do this in Java 6 using standard libraries?
I am not interested in storing and creating Strings, so probably all the "readLine()" methods from readers are not the ones I should look into. I am thinking of using RandomAccessFile, but I am wondering how to truncate the read byte array so it would end at the last new line character.
You can have one thread which polls the file's length. When the length increases, read exactly the newly added bytes as a byte[] (and no more, so don't use a BufferedInputStream), which lets you continue reading the file from that point later. Take the byte[] and write it to a PipedOutputStream.
In your main thread you can use BufferedReader + InputStreamReader + PipedInputStream.
With that setup, a plain readLine() will do fine: it only returns once a complete line has arrived through the pipe.
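A minimal sketch of that two-thread setup (file name, poll interval and error handling are assumptions; RandomAccessFile is used here for the incremental read):

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.io.RandomAccessFile;

public class TailCompleteLines {
    public static void main(String[] args) throws IOException {
        final PipedOutputStream pipeOut = new PipedOutputStream();
        PipedInputStream pipeIn = new PipedInputStream(pipeOut);

        // Polling thread: whenever the file grows, copy just the new bytes
        // into the pipe and remember how far we have read.
        Thread poller = new Thread(new Runnable() {
            public void run() {
                long readSoFar = 0;
                try {
                    RandomAccessFile raf = new RandomAccessFile("appended.log", "r");
                    while (true) {
                        long length = raf.length();
                        if (length > readSoFar) {
                            raf.seek(readSoFar);
                            byte[] chunk = new byte[(int) (length - readSoFar)];
                            raf.readFully(chunk);
                            pipeOut.write(chunk);
                            pipeOut.flush();
                            readSoFar = length;
                        }
                        Thread.sleep(1000);   // poll interval is an assumption
                    }
                } catch (Exception e) {
                    e.printStackTrace();
                }
            }
        });
        poller.setDaemon(true);
        poller.start();

        // Main thread: readLine() only returns once a line terminator has
        // arrived, so partially appended lines are never handed out.
        BufferedReader reader = new BufferedReader(new InputStreamReader(pipeIn, "UTF-8"));
        String line;
        while ((line = reader.readLine()) != null) {
            System.out.println(line);
        }
    }
}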
