Dividing an InputStream into two streams - Java

I needed to combine two InputStreams: one contains only a string and the other is an AVI file. I can easily combine them like this:
InputStream is = new SequenceInputStream(stringStream, aviStream);
However, when it comes to reading the string and the AVI back separately, I fail. The recovered AVI file does not match its original size; somehow I am losing some bytes.
I can read the combined stream with a Scanner using a delimiter, but then the AVI data is read as a String and I lose information.
Any idea how I can separate the InputStream? I want to read back my string and have the rest of the stream available as an InputStream.
EDIT: I am thinking of converting the stream to a byte array, then splitting the array in two at the delimiter. I could then convert the second part, the AVI one, back to an InputStream. But that seems too troublesome.

A Scanner is only for text data, not binary. Binary data such as an AVI file can contain every possible byte value, which means there is no delimiter you can choose that is guaranteed not to appear in the data.
You can combine text with raw data in the following manner.
DataOutputStream dos = new DataOutputStream(...);
dos.writeUTF(string);
dos.writeInt(bytes.length);
dos.write(bytes);
To reverse the process, use a DataInputStream with readUTF(), readInt() and readFully().
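A minimal round-trip sketch of that framing, with hypothetical class and method names. It also shows a streaming variant closer to the question: only the string is framed with writeUTF, the AVI stream is appended unchanged with SequenceInputStream, and after readUTF() the same DataInputStream serves as the remaining binary stream. Keep in mind that writeUTF uses modified UTF-8 with a 2-byte length, so the string is limited to 65535 encoded bytes.

```java
import java.io.*;
import java.util.Arrays;

// Sketch of length-prefixed framing; TextPlusBinary, encode, decode,
// Decoded and prefixWithText are hypothetical names for this example.
class TextPlusBinary {

    // Holder for the two decoded parts.
    static final class Decoded {
        final String text;
        final byte[] payload;
        Decoded(String text, byte[] payload) {
            this.text = text;
            this.payload = payload;
        }
    }

    // Writes the string with writeUTF (2-byte length + modified UTF-8),
    // then the payload length, then the raw payload bytes.
    static byte[] encode(String text, byte[] payload) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream dos = new DataOutputStream(bos);
        dos.writeUTF(text);
        dos.writeInt(payload.length);
        dos.write(payload);
        dos.flush();
        return bos.toByteArray();
    }

    // Reverses encode(); no delimiter is needed because each part carries its length.
    static Decoded decode(InputStream in) throws IOException {
        DataInputStream dis = new DataInputStream(in);
        String text = dis.readUTF();
        byte[] payload = new byte[dis.readInt()];
        dis.readFully(payload); // loops until the array is full, or throws EOFException
        return new Decoded(text, payload);
    }

    // Streaming variant: frame only the string and append the binary stream as-is.
    static InputStream prefixWithText(String text, InputStream body) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream dos = new DataOutputStream(bos);
        dos.writeUTF(text);
        dos.flush();
        return new SequenceInputStream(new ByteArrayInputStream(bos.toByteArray()), body);
    }

    public static void main(String[] args) throws IOException {
        // Payload includes bytes a text delimiter might use (\n, \r, 0xFF).
        byte[] avi = {0, 10, 13, (byte) 0xFF, 65};

        Decoded d = decode(new ByteArrayInputStream(encode("title", avi)));
        System.out.println(d.text + " " + Arrays.equals(d.payload, avi)); // title true

        DataInputStream combined = new DataInputStream(
                prefixWithText("title", new ByteArrayInputStream(avi)));
        String header = combined.readUTF();
        // 'combined' is now positioned at the first AVI byte and can be
        // handed on as the remaining InputStream.
        byte[] rest = new byte[avi.length];
        combined.readFully(rest);
        System.out.println(header + " " + Arrays.equals(rest, avi)); // title true
    }
}
```

The streaming variant avoids buffering the whole AVI in memory; the fully length-prefixed form is handier when several binary parts follow one another.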

Related

Java: reading lines of text from a binary file, with DataInputStream.readLine() deprecated

I am making a Java program that reads a binary file, and this binary file contains lines of text (strings that end with \n, not \0) among other things (binary data, and even images). I am using a DataInputStream to read the binary data, and its readLine() method looks exactly like what I need to read the lines of text. However, it is deprecated, and the official documentation suggests replacing the DataInputStream itself with a BufferedReader if you need to read strings from a file. That is not suitable for my file at all, since BufferedReader doesn't read binary data.
So my question is: what do I do? I can see some "solutions", like using a DataInputStream and a BufferedReader concurrently, or creating a temporary BufferedReader each time I need to read a string, but they seem very messy and probably unsafe. Is there a simpler way to read both binary data and lines of text?
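The excerpt includes no answer to this question. One possible approach, sketched under the assumption that each text line is UTF-8 and terminated by a single '\n' byte, is to keep the one DataInputStream and collect the bytes up to the newline yourself. A BufferedReader has its own read-ahead buffer, which is why mixing it with a DataInputStream is unsafe; here every read goes through the same stream, so no binary data is swallowed. The class and method names are hypothetical.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;

// Sketch only: assumes UTF-8 text lines terminated by a single '\n' byte.
// BinaryLineReader and readLine are hypothetical names.
class BinaryLineReader {

    // Reads bytes up to (not including) the next '\n' and decodes them as UTF-8.
    // Returns null at end of stream if no bytes were read.
    static String readLine(DataInputStream in) throws IOException {
        ByteArrayOutputStream line = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) != -1 && b != '\n') {
            line.write(b);
        }
        if (b == -1 && line.size() == 0) {
            return null;
        }
        return new String(line.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bos);
        out.write("header line\n".getBytes(StandardCharsets.UTF_8));
        out.writeInt(42); // binary data between two text lines
        out.write("second line\n".getBytes(StandardCharsets.UTF_8));
        out.flush();

        DataInputStream in = new DataInputStream(new ByteArrayInputStream(bos.toByteArray()));
        System.out.println(readLine(in)); // header line
        System.out.println(in.readInt()); // 42
        System.out.println(readLine(in)); // second line
    }
}
```

When reading from a file, putting a BufferedInputStream underneath (new DataInputStream(new BufferedInputStream(fileStream))) keeps the byte-at-a-time reads cheap, and since all reads still pass through that single chain, nothing is lost between text and binary reads.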

How to strip the first few characters of an input stream in java?

I have a FileInputStream object that may or may not contain an XML declaration <?xml version='1.0'?>. I need to strip the XML declaration if it exists. How do I do that without converting the InputStream into a String, stripping the XML PI, and then converting it back to an InputStream?
InputStream inputStream = new FileInputStream(importFilePath);
Wrap the FileInputStream in a PushbackInputStream and check whether it starts with a PI. If it does, read the PI out. Otherwise, push back (unread) the bytes you read to test it.
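A minimal sketch of the PushbackInputStream approach, assuming the declaration (if any) starts at byte 0 with no byte-order mark and is plain ASCII. The class and method names are hypothetical. It peeks one byte past "<?xml" so that a PI such as <?xml-stylesheet ...?> is not mistaken for the declaration.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;

// Sketch only: assumes the declaration, if present, is ASCII "<?xml ...?>"
// at byte 0 (no BOM). XmlDeclStripper and its methods are hypothetical names.
class XmlDeclStripper {

    private static final byte[] PREFIX = "<?xml".getBytes(StandardCharsets.US_ASCII);
    // The declaration requires whitespace after "<?xml", so peek one more byte.
    private static final int PEEK = PREFIX.length + 1;

    static InputStream stripDeclaration(InputStream raw) throws IOException {
        PushbackInputStream in = new PushbackInputStream(raw, PEEK);
        byte[] head = new byte[PEEK];
        int n = 0;
        while (n < PEEK) { // read() may return fewer bytes than requested
            int r = in.read(head, n, PEEK - n);
            if (r == -1) break;
            n += r;
        }
        if (!(n == PEEK && startsWithPrefix(head) && isXmlSpace(head[PREFIX.length]))) {
            in.unread(head, 0, n); // not a declaration: push the bytes back
            return in;
        }
        int prev = -1;
        int b;
        while ((b = in.read()) != -1) { // consume up to and including "?>"
            if (prev == '?' && b == '>') break;
            prev = b;
        }
        return in;
    }

    private static boolean startsWithPrefix(byte[] head) {
        for (int i = 0; i < PREFIX.length; i++) {
            if (head[i] != PREFIX[i]) return false;
        }
        return true;
    }

    private static boolean isXmlSpace(byte b) {
        return b == ' ' || b == '\t' || b == '\r' || b == '\n';
    }

    // Demo helper: read the rest of a stream as UTF-8 text.
    static String readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int r;
        while ((r = in.read(buf)) != -1) out.write(buf, 0, r);
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        String xml = "<?xml version='1.0'?><root/>";
        InputStream s = stripDeclaration(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        System.out.println(readAll(s)); // <root/>
    }
}
```

With a file, the call would be stripDeclaration(new FileInputStream(importFilePath)); the returned stream can be passed straight to the XML parser.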
I can think of two options:
Use FileInputStream.getChannel(). After reading from the channel, you can invoke position(0) if needed to reset it to the beginning.
Use a custom FilterInputStream subclass to wrap the InputStream. It can be written so that the first line is buffered in advance to determine whether it should be passed through or discarded.
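The getChannel() option might look like the sketch below. It relies on a FileInputStream sharing its file position with its channel: after peeking through the channel, you set the position (0, or just past the declaration) and keep using the same stream. The 256-byte peek size and the names are assumptions.

```java
import java.io.*;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;

// Sketch only: assumes the declaration, if any, is at byte 0 and fits in
// the first 256 bytes. ChannelDeclSkipper is a hypothetical name.
class ChannelDeclSkipper {

    static FileInputStream openWithoutDeclaration(String path) throws IOException {
        FileInputStream fis = new FileInputStream(path);
        FileChannel ch = fis.getChannel();
        ByteBuffer buf = ByteBuffer.allocate(256);
        while (buf.hasRemaining() && ch.read(buf) != -1) {
            // keep filling until the buffer is full or the file ends
        }
        // ISO-8859-1 maps each byte to exactly one char, so string
        // indexes are byte offsets into the file.
        String head = new String(buf.array(), 0, buf.position(), StandardCharsets.ISO_8859_1);
        long start = 0;
        if (head.length() > 5 && head.startsWith("<?xml")
                && Character.isWhitespace(head.charAt(5))) {
            int end = head.indexOf("?>");
            if (end >= 0) start = end + 2;
        }
        // The stream and its channel share one file position, so this
        // decides where the next fis.read() starts.
        ch.position(start);
        return fis;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java ChannelDeclSkipper import.xml
        try (FileInputStream in = openWithoutDeclaration(args[0])) {
            int b;
            while ((b = in.read()) != -1) System.out.write(b);
            System.out.flush();
        }
    }
}
```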

How to convert chunks of UTF-8 bytes to characters?

I have a large UTF-8 input that is divided into 1 kB chunks. I need to process it using a method that accepts a String. Something like:
for (File file : inputs) {
    byte[] b = FileUtils.readFileToByteArray(file);
    String str = new String(b, "UTF-8");
    processor.process(str);
}
My problem is that I have no guarantee that a UTF-8 character is not split between two chunks. When I run my code, some lines end with '?', which corrupts my input.
What would be a good approach to solve this?
If I understand correctly, you had a large text, which was encoded with UTF-8, then split into 1-kilobyte files. Now you want to read the text back, but you are concerned that an encoded character might be split across file boundaries, and cause a UTF-8 decoding error.
The API is a bit dusty, but SequenceInputStream creates what appears to be a single InputStream from a series of sub-streams. Create one from an Enumeration of FileInputStream instances (for example via Collections.enumeration(list)), then wrap it in an InputStreamReader that decodes the UTF-8 byte stream to text for your application. Because decoding happens after the chunks are joined, a character split across two files is decoded correctly.
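A sketch of that approach, with hypothetical names; it assumes the chunks are supplied in their original order. The demo splits a two-byte character across two chunks to show the difference between decoding per chunk and decoding the joined stream.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.*;

// Sketch only: ChunkedUtf8, open and openFiles are hypothetical names.
class ChunkedUtf8 {

    // Joins the chunks into one byte stream before decoding, so a multi-byte
    // character split across two chunks is decoded as a whole.
    static BufferedReader open(List<InputStream> chunks) {
        InputStream joined = new SequenceInputStream(Collections.enumeration(chunks));
        return new BufferedReader(new InputStreamReader(joined, StandardCharsets.UTF_8));
    }

    // Convenience overload for chunk files. Note: this opens every file up
    // front; a lazy Enumeration would avoid holding many handles at once.
    static BufferedReader openFiles(List<File> files) throws FileNotFoundException {
        List<InputStream> streams = new ArrayList<>();
        for (File f : files) streams.add(new FileInputStream(f));
        return open(streams);
    }

    public static void main(String[] args) throws IOException {
        byte[] all = "caf\u00e9 ok".getBytes(StandardCharsets.UTF_8);
        // Split between the two bytes (0xC3 0xA9) that encode U+00E9.
        byte[] first = Arrays.copyOfRange(all, 0, 4);
        byte[] second = Arrays.copyOfRange(all, 4, all.length);

        // Decoding the first chunk on its own yields a replacement character.
        System.out.println(new String(first, StandardCharsets.UTF_8).endsWith("\uFFFD")); // true

        List<InputStream> chunks = new ArrayList<>();
        chunks.add(new ByteArrayInputStream(first));
        chunks.add(new ByteArrayInputStream(second));
        try (BufferedReader r = open(chunks)) {
            System.out.println(r.readLine().equals("caf\u00e9 ok")); // true
        }
    }
}
```

Each line from readLine() can then be passed to processor.process(...) instead of one String per file.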

Testing for unseen characters in java

I'm writing a program in Java that tests the validity of several FTP commands. These commands must end in a carriage return and line feed (the sequence "\r\n"). I'm using a BufferedReader to read lines, but I cannot come up with a way to check whether a line ends in this sequence. Any ideas?
Don't use BufferedReader; its abstraction level is too high for this task, because readLine() strips the terminator and treats "\n", "\r" and "\r\n" alike. Use a plain InputStream and read into a byte array; an InputStream delivers the bytes exactly as they are. You can process them and produce strings yourself later using new String(array, offset, length). This also lets you detect other invalid characters in the input, such as 0x0C.
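A sketch of reading one command byte by byte from an InputStream and checking its terminator. The class name, the method name, the choice to throw IOException on a bad terminator and the ISO-8859-1 decoding (one char per byte) are all assumptions.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;

// Sketch only: CrlfLineReader and readCommand are hypothetical names.
class CrlfLineReader {

    // Returns the next command without its "\r\n", or null at a clean end of stream.
    // Throws if a '\n' is not preceded by '\r' or the stream ends mid-line.
    static String readCommand(InputStream in) throws IOException {
        ByteArrayOutputStream line = new ByteArrayOutputStream();
        int prev = -1;
        int b;
        while ((b = in.read()) != -1) {
            if (b == '\n') {
                if (prev != '\r') {
                    throw new IOException("line terminated by bare LF");
                }
                byte[] bytes = line.toByteArray();
                // The '\r' is the last buffered byte; leave it out.
                return new String(bytes, 0, bytes.length - 1, StandardCharsets.ISO_8859_1);
            }
            line.write(b);
            prev = b;
        }
        if (line.size() == 0) return null;
        throw new IOException("stream ended before \"\\r\\n\"");
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "USER anonymous\r\nNOOP\n".getBytes(StandardCharsets.ISO_8859_1);
        InputStream in = new BufferedInputStream(new ByteArrayInputStream(data));
        System.out.println(readCommand(in)); // USER anonymous
        try {
            readCommand(in);
        } catch (IOException e) {
            System.out.println(e.getMessage()); // line terminated by bare LF
        }
    }
}
```

Wrapping a socket stream in BufferedInputStream is fine here: unlike BufferedReader, it does not interpret line endings, so the check still sees every byte.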

Periodically incrementally reading text file as byte array ignoring partially appended lines

I am trying to read a text file, which another process is appending to, into a byte array. I'd like to poll the file periodically and extract only the new full lines of text (lines terminated by a newline). What's the best way to do this in Java 6 using the standard libraries?
I am not interested in creating and storing Strings, so the readLine() methods of the readers are probably not what I should look at. I am thinking of using RandomAccessFile, but I am wondering how to truncate the byte array I read so that it ends at the last newline character.
You can have one thread that polls the file's length. When the length increases, read exactly the newly appended bytes as a byte[] (and no more, so don't use BufferedInputStream) so that you can continue reading from the same position later. Write that byte[] to a PipedOutputStream.
In your main thread, wrap the PipedInputStream in an InputStreamReader and a BufferedReader.
A plain readLine() then works: it blocks until a complete line (or the end of the stream) is available, so partially appended lines are never returned early.
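If you would rather avoid the second thread, the RandomAccessFile idea from the question can also work on its own: remember the offset just after the last newline, read whatever has been appended since, and return only the bytes up to the last '\n'. This is a swapped-in single-threaded technique, not the piped design described in the answer; the names are hypothetical, and try-with-resources needs Java 7 or later (on Java 6, use try/finally).

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Sketch only: AppendedLinePoller and poll are hypothetical names.
class AppendedLinePoller {
    private final File file;
    private long offset; // file position just after the last consumed '\n'

    AppendedLinePoller(File file) {
        this.file = file;
    }

    // Returns the bytes of every complete line appended since the previous call,
    // including their '\n' terminators; a trailing partial line is left for later.
    byte[] poll() throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
            long length = raf.length();
            if (length <= offset) {
                return new byte[0]; // nothing new (truncation is not handled here)
            }
            byte[] chunk = new byte[(int) (length - offset)]; // assumes < 2 GB per poll
            raf.seek(offset);
            raf.readFully(chunk);
            int last = chunk.length - 1;
            while (last >= 0 && chunk[last] != '\n') {
                last--;
            }
            if (last < 0) {
                return new byte[0]; // only a partial line so far
            }
            offset += last + 1;
            return Arrays.copyOf(chunk, last + 1);
        }
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("poll", ".log");
        f.deleteOnExit();
        AppendedLinePoller poller = new AppendedLinePoller(f);
        append(f, "first\nsec");
        System.out.print(new String(poller.poll(), StandardCharsets.US_ASCII)); // first
        append(f, "ond\n");
        System.out.print(new String(poller.poll(), StandardCharsets.US_ASCII)); // second
    }

    private static void append(File f, String s) throws IOException {
        try (FileOutputStream out = new FileOutputStream(f, true)) {
            out.write(s.getBytes(StandardCharsets.US_ASCII));
        }
    }
}
```

The partial tail is simply re-read on the next poll, which keeps the state down to a single offset at the cost of reading a few bytes twice.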
