estimating size of file on disk when using ObjectOutputStream - java

I am trying to write my spatial data from a table to a file. But I need to know the exact size of the data on disk before writing to disk. As an example, let's say that I am writing to disk using the following code:
FileOutputStream fos = new FileOutputStream("t.tmp",false);
ObjectOutputStream oos = new ObjectOutputStream(fos);
oos.writeInt(gid);
oos.writeUTF(fullname);
oos.writeInt(d.shape.length);
oos.write(d.shape);
oos.close();
fos.close();
I was thinking that file size on disk is equal to:
size = 4B {for gid, int} + fullname.getBytes().length {string} + 4B {d.shape.length, int} + d.shape.length
but in fact, this is very different from the real file size on disk.
I also noticed that even creating an empty file using ObjectOutputStream takes up 4B on disk.
Any help on how to calculate the file size on disk?
(I can't write the data to disk and then read the real size. This will lower the performance. Instead, I need to calculate the size of data on disk based on data values stored in memory.)

I am trying to write my spatial data from a table to a file. But I need to know the exact size of the data on disk before writing to disk.
You shouldn't use an ObjectOutputStream. An ObjectOutputStream can automatically serialise a complex graph of objects for you, but this doesn't appear to be one of your requirements. As part of this serialisation, the ObjectOutputStream writes some stream header information (this is the 4 bytes you discovered at the beginning), and also keeps track of objects written previously so that it can write special marker values rather than writing out the whole object again. It also wraps primitive data (what you write with writeInt, writeUTF and write) in block-data records that carry their own small headers, which is why your arithmetic doesn't match the real file size.
Instead, just use a DataOutputStream. It provides the same functionality you want:
A data output stream lets an application write primitive Java data types to an output stream in a portable way. An application can then use a data input stream to read the data back in.
FileOutputStream fos = new FileOutputStream("t.tmp",false);
DataOutputStream dos = new DataOutputStream(fos);
dos.writeInt(gid); // write 4 bytes
dos.writeUTF(fullname); // write 2 bytes of length, then the string in modified UTF-8 (variable length)
dos.writeInt(d.shape.length); // write 4 bytes
dos.write(d.shape); // write a variable length byte array
dos.close();
fos.close();
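There won't be any surprises here, provided you know how many bytes your encoded String will occupy, and you can do the arithmetic to calculate the exact file size: 4 + 2 + (encoded length of fullname) + 4 + d.shape.length.
(If your strings don't map one character to one byte, you can render the string to a byte array first using a charset encoder. Note that writeUTF uses Java's modified UTF-8, which differs from standard UTF-8 for the NUL character and for characters outside the Basic Multilingual Plane, and it rejects strings whose encoded form exceeds 65535 bytes.)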
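The arithmetic for the DataOutputStream layout can be sketched as follows. This is a minimal illustration, not the answerer's code: the class name RecordSize and its helper methods are made up for the example, and the length rule mirrors the modified UTF-8 encoding documented for DataOutputStream.writeUTF. The in-memory write is only there to check the prediction.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical helper class, named for this example only.
class RecordSize {
    // Number of bytes writeUTF emits for the string body (modified UTF-8).
    static int modifiedUtf8Length(String s) {
        int n = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= 0x0001 && c <= 0x007F) {
                n += 1;
            } else if (c <= 0x07FF) {
                n += 2; // also covers '\u0000', which takes two bytes
            } else {
                n += 3; // each half of a surrogate pair takes three bytes
            }
        }
        return n;
    }

    // int gid + 2-byte length prefix + string body + int length + shape bytes.
    static long predictedSize(String fullname, byte[] shape) {
        return 4 + 2 + modifiedUtf8Length(fullname) + 4 + shape.length;
    }

    // Writes the same record to memory so the prediction can be verified.
    static long actualSize(int gid, String fullname, byte[] shape) {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (DataOutputStream dos = new DataOutputStream(baos)) {
            dos.writeInt(gid);
            dos.writeUTF(fullname);
            dos.writeInt(shape.length);
            dos.write(shape);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return baos.size();
    }

    public static void main(String[] args) {
        byte[] shape = new byte[10];
        String name = "Zo\u00eb"; // one non-ASCII character
        System.out.println("predicted: " + predictedSize(name, shape));
        System.out.println("actual:    " + actualSize(1, name, shape));
    }
}
```

In real use only predictedSize would be called; actualSize defeats the purpose of predicting and is included purely as a check.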

Assuming you don't mind wasting some memory, you can write it all out to a ByteArrayOutputStream first, then get the size.
ByteArrayOutputStream baos = new ByteArrayOutputStream();
ObjectOutputStream oos = new ObjectOutputStream(baos);
oos.writeInt(gid);
oos.writeUTF(fullname);
oos.writeInt(d.shape.length);
oos.write(d.shape);
oos.close(); // flushes buffered data and closes baos as well
int size = baos.size(); // still valid after close
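Measuring in memory like this also makes the ObjectOutputStream overhead visible. The sketch below is an illustration under assumptions: the class StreamOverhead and its method names are invented for the example, and the sample values are arbitrary.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;

// Hypothetical helper class, named for this example only.
class StreamOverhead {
    // Size of a stream that was opened and closed without writing anything.
    static int emptyObjectStreamSize() {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(baos)) {
            // Nothing written: only the stream header ends up in baos.
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return baos.size();
    }

    // Size of the record from the question when written via ObjectOutputStream.
    static int objectStreamSize(int gid, String fullname, byte[] shape) {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(baos)) {
            oos.writeInt(gid);
            oos.writeUTF(fullname);
            oos.writeInt(shape.length);
            oos.write(shape);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return baos.size();
    }

    public static void main(String[] args) {
        System.out.println("empty stream: " + emptyObjectStreamSize());
        // The raw payload here is 4 + (2 + 3) + 4 + 3 = 16 bytes.
        System.out.println("with record:  " + objectStreamSize(1, "abc", new byte[3]));
    }
}
```

The second number comes out larger than the 16 payload bytes because of the stream header and the block-data framing.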

Related

Reading large file in bytes by chunks with dynamic buffer size

I'm trying to read a large file by chunks and save them in an ArrayList of bytes.
My code, in short, looks like this:
public ArrayList<byte[]> packets = new ArrayList<>();
FileInputStream fis = new FileInputStream("random_text.txt");
byte[] buffer = new byte[512];
while (fis.read(buffer) > 0){
packets.add(buffer);
}
fis.close();
But it has a behavior that I don't know how to solve, for example: If a file has only the words "hello world", this chunk does not necessarily need to be 512 bytes long. In fact, I want each chunk to be a maximum of 512 bytes not that they all necessarily have that size.
First of all, what you are doing is probably a bad idea. Storing a file's contents in memory like this is liable to be a waste of heap space ... and can lead to OutOfMemoryError exceptions and / or a requirement for an excessively large heap if you process large (enough) input files.
The second problem is that your code is wrong. You are repeatedly reading the data into the same byte array. Each time you do, it overwrites what was there before. So you will end up with a list containing lots of references to a single byte array ... containing just the last chunk of data that you read.
To solve the problem that you asked about [1], you will need to copy the chunk that you read to a new (smaller) byte array.
Something like this:
public ArrayList<byte[]> packets = new ArrayList<>();
try (FileInputStream fis = new FileInputStream("random_text.txt")) {
byte[] buffer = new byte[512];
int len;
while ((len = fis.read(buffer)) > 0) {
packets.add(Arrays.copyOf(buffer, len));
}
}
Note that this also deals with the second problem I mentioned, and fixes a potential resource leak by using try-with-resources syntax to manage the closure of the input stream.
A final issue: If this is really a text file that you are reading, you probably should be using a Reader to read it, and char[] or String to hold it.
But even if you do that there are some awkward edge cases if your text contains Unicode codepoints that are not in code plane 0. For example, emojis. The edge cases will involve code points that are represented as a surrogate pair AND the pair being split on a chunk boundary. Reading and storing the text as lines would avoid that.
1 - The issue here is not the "wasted" space. Unless you are reading and caching a large number of small files, any space wastage due to "short" chunks will be unimportant. The important issue is knowing which bytes in each byte[] are actually valid data.
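For the text-file case mentioned in this answer, reading by lines can be sketched as below. This is only an illustration: the class LineChunks and the method readLines are made-up names, and a StringReader stands in for a real file reader so the example runs on its own.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, named for this example only.
class LineChunks {
    // Reads all lines from the given reader; line terminators are dropped.
    static List<String> readLines(Reader source) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line); // each String is a fresh object, nothing is overwritten
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        // \uD83D\uDE00 is one emoji stored as a surrogate pair; it stays intact within its line.
        List<String> lines = readLines(new StringReader("hello \uD83D\uDE00\nworld"));
        System.out.println(lines.size() + " lines");
    }
}
```

For a real file, a reader obtained from java.nio.file.Files.newBufferedReader with an explicit charset could be passed in instead of the StringReader.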

how to write long (4byte) value to binary file in android

I'm writing a binary file header from java, and I had been using fixed values for the file size in the header. That was easy:
OutputStream os = new FileOutputStream(filename);
os.write(0x36);//LSB
os.write(0x10);
os.write(0x0E);
os.write(0x00);//MSB
But now I want to be more dynamic and write whatever size buffer I have to a file. So I might get the size of my array as say 4054; I want to take that and either break it apart and do four os.writes, or maybe there's a way to write it all at once.
OutputStream seems to only take one byte at a time, but I'd like to still use it as all the rest of my header code is already using it.
Use a ByteBuffer, so you can control whether it writes LSB or MSB first.
ByteBuffer buf = ByteBuffer.allocate(4).order(ByteOrder.LITTLE_ENDIAN);
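buf.putInt((int) value); // a 4-byte buffer holds an int; putLong needs 8 bytes and would throw BufferOverflowException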
os.write(buf.array());
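A self-contained sketch of the 4-byte little-endian case follows. The class LittleEndianHeader and the method int32LittleEndian are names invented for this example. Note that a 4-byte field needs putInt; putLong always writes 8 bytes.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper class, named for this example only.
class LittleEndianHeader {
    // Encodes value as 4 bytes, least significant byte first.
    static byte[] int32LittleEndian(int value) {
        return ByteBuffer.allocate(4)
                .order(ByteOrder.LITTLE_ENDIAN)
                .putInt(value)
                .array();
    }

    public static void main(String[] args) {
        StringBuilder hex = new StringBuilder();
        for (byte b : int32LittleEndian(4054)) {
            hex.append(String.format("%02x ", b & 0xFF)); // mask to print the byte as unsigned
        }
        System.out.println(hex.toString().trim());
    }
}
```

Since 4054 is 0x0FD6, the bytes come out as D6 0F 00 00, and the result can be passed straight to os.write(byte[]) alongside the existing header code.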

Practical usage of ByteArrayInputStream/ByteArrayOutputStream

What are some practical areas where ByteArrayInputStream and/or ByteArrayOutputStream are used? Examples are also welcome.
If one searches for examples, one finds usually something like:
byte[] buf = { 16, 47, 12 };
ByteArrayInputStream byt = new ByteArrayInputStream(buf);
It does not help where or why should one use it. I know that they are used when working with images, ZIP files, or writing to ServletOutputStream.
ByteArrayInputStream: every time you need an InputStream (typically because an API takes that as argument), and you have all the data in memory already, as a byte array (or anything that can be converted to a byte array).
ByteArrayOutputStream: every time you need an OutputStream (typically because an API writes its output to an OutputStream) and you want to store the output in memory, and not in a file or on the network.
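Both uses can be sketched in one round trip. This is an illustration only: the class InMemoryStreams and its methods are made-up names, and DataOutputStream/DataInputStream stand in for "an API that wants a stream".

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical helper class, named for this example only.
class InMemoryStreams {
    // An API that writes to an OutputStream, captured in memory as a byte[].
    static byte[] encode(int id, String name) {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(baos)) {
            out.writeInt(id);
            out.writeUTF(name);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return baos.toByteArray();
    }

    // An API that reads from an InputStream, fed from a byte[] already in memory.
    static String decodeName(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            in.readInt(); // skip the id
            return in.readUTF();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] bytes = encode(7, "abc");
        System.out.println(bytes.length + " bytes, name = " + decodeName(bytes));
    }
}
```

The same pattern is handy in unit tests, where code that normally reads or writes files can be exercised without touching the disk.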

Data loss when writing bytes to a file

I'm working on a string compressor for a school assignment, and there's one bug that I can't seem to work out. The compressed data, represented by a byte array, is being written to a file using a FileWriter. The compression algorithm returns an input stream, so the data flows as follows:
piped input stream
-> input stream reader
-> data stored in char buffer
-> data written to file with file writer.
Now, the bug is that with some very specific strings, the second-to-last byte in the byte array is written wrong, and it always has the same bit pattern, "11111100".
It is always this value and always the second-to-last byte.
Here are some samples from the code:
InputStream compress(InputStream){
//...
//...
PipedInputStream pin = new PipedInputStream();
PipedOutputStream pout = new PipedOutputStream(pin);
ObjectOutputStream oos = new ObjectOutputStream(pout);
oos.writeObject(someobject);
oos.flush();
DataOutputStream dos = new DataOutputStream(pout);
dos.writeFloat(//);
dos.writeShort(//);
dos.write(SomeBytes); // ---Here
dos.flush();
dos.close();
return pin;
}
void write(char[] cbuf, int off, int len){
//....
//....
InputStreamReader s = new InputStreamReader(
c.compress(new ByteArrayInputStream(str.getBytes())));
s.read(charbuffer);
out.write(charbuffer);
}
A string which triggers it is "hello and good evenin" for example.
I have tried to iterate over the byte array and write them one by one, it didn't help.
It's also worth noting that when I tried to write to a file using the output stream in the algorithm itself it worked fine. This design was not my choice btw.
So I'm not really sure what i'm doing wrong here.
Considering that you're saying:
Now, the bug is, that with some very specific strings, the second to
last byte in the byte array is written wrong. and it's always the same
bit values "11111100".
You are taking a
binary stream (the compressed data)
-> reading it as chars
-> then writing it as chars.
And you are converting bytes to chars without clearly defining the encoding.
I'd say that the problem is that your InputStreamReader is translating some byte sequences in a way that you're not expecting.
Remember that in encodings like UTF-8, two or three bytes may become one single char.
It can't be a coincidence that the very byte pattern you pointed out (11111100) matches one of the UTF-8 multi-byte lead patterns (1111110x). Check the UTF-8 table on Wikipedia and you'll see that decoding as UTF-8 is destructive here: if a byte starts with 1111110x, the next must start with 10xxxxxx, and a sequence that breaks this rule is replaced during decoding.
Meaning that if using utf-8 to convert
bytes1[] -> chars[] -> bytes2[]
in some cases bytes2 will be different from bytes1.
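I recommend changing your code to remove those readers. Or specify a single-byte encoding such as ISO-8859-1, which maps every byte to exactly one char, to see if that prevents the translation (plain ASCII would still mangle bytes above 0x7F).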
I solved this by encoding and decoding the bytes with Base64.
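The lossy round trip is easy to reproduce in isolation. The sketch below is illustrative only: the class RoundTrip and its methods are made-up names, and the three sample bytes are arbitrary apart from containing 0xFC (binary 11111100).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

// Hypothetical helper class, named for this example only.
class RoundTrip {
    // bytes -> chars -> bytes using the given charset.
    static byte[] viaCharset(byte[] in, Charset cs) {
        return new String(in, cs).getBytes(cs);
    }

    // bytes -> Base64 text -> bytes; safe for arbitrary binary data.
    static byte[] viaBase64(byte[] in) {
        String text = Base64.getEncoder().encodeToString(in);
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        byte[] data = {0x01, (byte) 0xFC, 0x02}; // 0xFC is not valid UTF-8 here
        System.out.println("UTF-8 intact:      "
                + Arrays.equals(data, viaCharset(data, StandardCharsets.UTF_8)));
        System.out.println("ISO-8859-1 intact: "
                + Arrays.equals(data, viaCharset(data, StandardCharsets.ISO_8859_1)));
        System.out.println("Base64 intact:     "
                + Arrays.equals(data, viaBase64(data)));
    }
}
```

Only the UTF-8 path alters the data, because the decoder substitutes a replacement character for the invalid byte. Keeping binary data in byte streams end to end avoids the question entirely.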

store fixed bytes into byte array from input stream

I'm trying to learn Java and I came across this practice problem in which I have to create a URL extractor. I am able to stream data and print it. However I'm not really familiar with the buffered reader therefore I need help with creating a buffer of 100 bytes, copying 100 bytes of data from the stream to this byte array, then process this part, then take the next chunk of 100 bytes from the stream and so on....
The following is my code and any help would greatly be appreciated.
I know that what i want needs to be done inside the while loop. I think I need to create a byte array and then store the data into it. It is the how I'm more interested in.
EDIT: I do not need the code sample for anything because I'm trying to learn. Only the description of how I can do this would suffice . Thanks a lot in advance.
Create a byte array (of the size you want) outside your while-loop (you can re-use it that way, so it's faster).
You can use a BufferedInputStream wrapped around your original InputStream instead of a Reader (Readers decode bytes into characters, but we don't need that here).
Then you can use the read(byte[]) method of BufferedInputStream to copy the next series of bytes into the array. You can then process the retrieved bytes the way you want.
See the API documentation as a reference of what read(byte[]) does.
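One detail worth knowing: read(byte[]) may return fewer bytes than the array holds even before the end of the stream. If every chunk except the last must be exactly 100 bytes, readNBytes (available from Java 9) keeps reading until the array is full or the stream ends. The sketch below is an illustration; the class FixedChunks and the method chunkSizes are made-up names, and a ByteArrayInputStream stands in for the socket stream.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, named for this example only.
class FixedChunks {
    // Reads the stream in chunks of chunkSize bytes and records each chunk's length.
    static List<Integer> chunkSizes(InputStream in, int chunkSize) {
        byte[] buffer = new byte[chunkSize]; // created once, reused for every chunk
        List<Integer> sizes = new ArrayList<>();
        try (BufferedInputStream input = new BufferedInputStream(in)) {
            int n;
            // readNBytes returns 0 (not -1) once the stream is exhausted.
            while ((n = input.readNBytes(buffer, 0, buffer.length)) > 0) {
                sizes.add(n); // process buffer[0..n) here before the next read overwrites it
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sizes;
    }

    public static void main(String[] args) {
        System.out.println(chunkSizes(new ByteArrayInputStream(new byte[250]), 100));
    }
}
```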
As mentioned in the comments, a Reader (and its subclass BufferedReader) is used to read characters not bytes. You should instead use a BufferedInputStream to read into a byte array of the specified size:
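import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.PrintWriter;
import java.net.InetAddress;
import java.net.Socket;

public static void main(String[] args) throws IOException {
    String website = "thecakestory.com";
    Socket client = new Socket(InetAddress.getByName(website), 80);
    PrintWriter pw = new PrintWriter(client.getOutputStream());
    // Each header line ends with CRLF; an empty line terminates the request.
    pw.print("GET /index.php HTTP/1.1\r\n");
    pw.print("Host: " + website + "\r\n");
    pw.print("Connection: close\r\n\r\n"); // ask the server to close the socket so read() eventually returns -1
    pw.flush();
    BufferedInputStream input = new BufferedInputStream(client.getInputStream());
    String x;
    int bytesRead;
    byte[] contents = new byte[100]; // reused for every chunk
    while ((bytesRead = input.read(contents)) != -1) {
        // Only the first bytesRead bytes are valid; read() may return fewer than 100.
        x = new String(contents, 0, bytesRead);
        System.out.print(x);
    }
    pw.close();
    client.close();
}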
Some useful links:
For an introduction to Java IO related stuff, see the Java tutorial page http://docs.oracle.com/javase/tutorial/essential/io/. This should be the starting point for learning about streams, readers, etc.
For the documentation of BufferedInputStream and BufferedReader, see their API reference:
http://docs.oracle.com/javase/7/docs/api/java/io/BufferedInputStream.html
http://docs.oracle.com/javase/7/docs/api/java/io/BufferedReader.html
