According to the Java specification, the size of the char data type is 16 bits, or two bytes.
So I have written the following code:
private static final int BUFFER_SIZE = 1024;
char[] buffer = new char[BUFFER_SIZE];
BufferedReader br = new BufferedReader(new InputStreamReader(conn.getInputStream()));
int byteFromStream;
long totalBytesLoaded = 0;
while (true) {
    byteFromStream = br.read(buffer);
    if (byteFromStream == -1) break;
    totalBytesLoaded = totalBytesLoaded + byteFromStream * 2;
}
But for some strange reason I am reading more bytes than are available on the stream, even though, according to the specification, read() returns the number of characters actually read from the stream.
Oh, and I am getting the total stream size with
bytesTotal = conn.getContentLength();
which works fine, as I uploaded the files to the server myself and I know their sizes.
The method returns the number of characters read. That value must not be multiplied by 2, especially since you cannot make that general assumption about the byte size of a character coming from a stream.
The number of bytes per character depends on the character encoding (it can be 1 byte, for example). The Reader knows that and only tells you the number of characters it read.
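If the goal is to track download progress against getContentLength(), one option is to count at the byte layer, before any character decoding happens. A minimal sketch, reusing the question's conn:

InputStream in = conn.getInputStream();
byte[] buffer = new byte[1024];
long totalBytesLoaded = 0;
int bytesFromStream;
while ((bytesFromStream = in.read(buffer)) != -1) {
    // bytesFromStream is an actual byte count here, so no guessing at char sizes
    totalBytesLoaded += bytesFromStream;
}

If you still need characters, put the InputStreamReader on top of a small wrapper stream that counts bytes as they pass through.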
I would like to read the bytes into a direct ByteBuffer and then decode them without rewrapping the original buffer into a byte[] array, to minimize memory allocations.
Hence I'd like to avoid using StandardCharsets.UTF_8.decode(), as it allocates a new array on the heap.
I'm stuck on how to decode the bytes. Consider the following code that writes a string into the buffer and then reads it again.
ByteBuffer byteBuffer = ByteBuffer.allocateDirect(2 << 16);
byteBuffer.put("Hello Dávid".getBytes(StandardCharsets.UTF_8));
byteBuffer.flip();
CharBuffer charBuffer = byteBuffer.asCharBuffer();
for (int i = charBuffer.position(); i < charBuffer.length(); i++) {
System.out.println(charBuffer.get());
}
The code outputs:
䡥汬漠
How can I decode the buffer?
I would like to read the bytes into a direct ByteBuffer and then decode them without rewrapping the original buffer into a byte[] array, to minimize memory allocations.
ByteBuffer.asCharBuffer() fits your need, indeed, since both wrappers share the same underlying buffer.
This method's javadoc says:
The new buffer's position will be zero, its capacity and its limit will be the number of bytes remaining in this buffer divided by two
Although it's not explicitly stated, this is a strong hint that the CharBuffer view interprets the underlying bytes as UTF-16 code units (in the buffer's byte order, big-endian by default). Since you have no control over which encoding the CharBuffer uses, you don't have much choice but to write the character bytes in that encoding:
byteBuffer.put("Hello Dávid".getBytes(StandardCharsets.UTF_16));
Note that StandardCharsets.UTF_16 prepends a byte order mark, which will surface as an extra '\uFEFF' char; UTF_16BE matches the buffer's default big-endian order without one.
One more thing about your printing loop: be careful that CharBuffer.length() is actually the number of remaining chars between the buffer's position and its limit, so it decreases as you call CharBuffer.get(). You should either use the absolute get(int) or change the loop's termination condition to limit().
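Putting both fixes together, a sketch of the corrected loop (using UTF_16BE, as noted above, and limit() as the bound):

byteBuffer.put("Hello Dávid".getBytes(StandardCharsets.UTF_16BE));
byteBuffer.flip();
CharBuffer charBuffer = byteBuffer.asCharBuffer();
// limit() stays fixed, and the absolute get(int) never moves the position
for (int i = charBuffer.position(); i < charBuffer.limit(); i++) {
    System.out.println(charBuffer.get(i));
}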
You can't specify the encoding of a CharBuffer. See here: What Charset does ByteBuffer.asCharBuffer() use?
Also, since buffers are mutable and Strings are always immutable, I don't see how you could ever create a String from one without a memory re-allocation...
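As an aside beyond what either answer proposes: if the data really is UTF-8, the CharsetDecoder API can decode straight from the direct ByteBuffer into a pre-allocated CharBuffer, with no intermediate byte[]. A hedged sketch, reusing the question's byteBuffer (already flipped and holding the UTF-8 bytes); the output buffer size is an assumption:

CharBuffer out = CharBuffer.allocate(1024);   // reusable output buffer
CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
decoder.decode(byteBuffer, out, true);        // decodes in place, no byte[] copy
out.flip();
System.out.println(out);                      // prints: Hello Dávid

The CharBuffer itself is still a one-time allocation, but it can be cleared and reused across reads, which avoids the per-call allocation inside StandardCharsets.UTF_8.decode().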
I am trying to read a 512MB file into java memory. Here is my code:
String url_part = "/homes/t1.csv";
File f = new File(url_part);
BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream(f)));
ArrayList<String> mem = new ArrayList<String>();
System.out.println("Start loading.....");
System.gc();
long start = System.currentTimeMillis();
String line;
int count = 0;
while ((line = br.readLine()) != null) {
    mem.add(line);
    count++;
    if (count % 500000 == 0) {
        System.out.println(count);
    }
}
The file contains 40,000,000 lines. Performance is totally fine up to about 18,500,000 lines, but it gets stuck somewhere after reading about 20,000,000 lines (it freezes there, but continues after a long wait of about 10 seconds).
I kept track of the memory use and found that even though the file size is just 512 MB, memory grows to about 2 GB while the program runs. Also, the 8-core CPU keeps working at 100% utilization.
I just want to read the file into memory so that I can later access the data faster. Am I doing it the right way? Thanks!
First, Java stores strings in UTF-16, so if your input file contains mostly Latin-1 symbols, you will need twice as much memory to store them; thus about 1 GB is used just for the chars. Second, there's an overhead per line, which we can roughly estimate:
Reference from ArrayList to String - 4 bytes (assuming compressed oops)
Reference from String to char[] array - 4 bytes
String object header - at least 8 bytes
String hash field (caches the hashCode) - 4 bytes
char[] object header - at least 8 bytes
char[] array length - 4 bytes
So in total at least 32 bytes (4 + 4 + 8 + 4 + 8 + 4) are wasted per line. Usually it's more, as objects must be padded. So for 20_000_000 lines you have at least 640_000_000 bytes of overhead.
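A rough way to confirm an estimate like this empirically is to sample heap usage around the load. A sketch (approximate, since GC timing makes these numbers noisy):

Runtime rt = Runtime.getRuntime();
System.gc();
long before = rt.totalMemory() - rt.freeMemory();
// ... load the file into mem as above ...
System.gc();
long after = rt.totalMemory() - rt.freeMemory();
System.out.println("Approx. heap used: " + (after - before) / (1024 * 1024) + " MB");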
Consider a socket connection between two devices (A and B). Now suppose I write only 16 bytes (the size doesn't matter here) to the output stream (not a BufferedOutputStream) of the socket on side A three times, or in general more than once, like this:
OutputStream outputStream = socket.getOutputStream();
byte[] buffer = new byte[16];
outputStream.write(buffer);
outputStream.write(buffer);
outputStream.write(buffer);
Then I read the data on side B using the socket's input stream (not a BufferedInputStream) with a buffer larger than the sending buffer, for example 1024:
InputStream inputStream = socket.getInputStream();
byte[] buffer = new byte[1024];
int read = inputStream.read(buffer);
Now I wonder how the data is received on side B. Might it get accumulated, or is it read exactly as A sends it? In other words, can the read variable be more than 16?
InputStream makes very few guarantees about how much data will be read by any invocation of the multi-byte read() methods. There is a whole class of common programming errors that revolve around misunderstandings and wrong assumptions about that. For example,
if InputStream.read(byte[]) reads fewer bytes than the provided array can hold, that does not imply that the end of the stream has been reached, or even that another read will necessarily block.
the number of bytes read by any one invocation of InputStream.read(byte[]) does not necessarily correlate to any characteristic of the byte source on which the stream draws, except that it will always read at least one byte when not at the end of the stream, and that it will not read more bytes than are actually available by the time it returns.
the number of available bytes indicated by the available() method does not reliably indicate how many bytes a subsequent read should or will read. A nonzero return value reliably indicates only that the next read will not block; a zero return value tells you nothing at all.
Subclasses may make guarantees about some of those behaviors, but most do not, and anyway you often do not know which subclass you actually have.
In order to use InputStreams properly, you generally must be prepared to perform repeated reads until you get sufficient data to process. That can mean reading until you have accumulated a specific number of bytes, or reading until a delimiter is encountered. In some cases you can handle any number of bytes from any given read; generally these are cases where you are looping anyway, and feeding everything you read to a consumer that can accept variable length chunks (many compression and encryption interfaces are like that, for example).
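A minimal sketch of the "read until you have accumulated a specific number of bytes" pattern described above (DataInputStream.readFully does essentially this for you):

static void readFully(InputStream in, byte[] buffer) throws IOException {
    int offset = 0;
    while (offset < buffer.length) {
        int read = in.read(buffer, offset, buffer.length - offset);
        if (read == -1) {
            throw new EOFException("stream ended after " + offset + " bytes");
        }
        offset += read;
    }
}

With this, receiving three 16-byte messages on side B means three readFully calls on a 16-byte buffer, regardless of how TCP coalesced or split the sender's writes.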
Per the docs:
public int read(byte[] b) throws IOException
Reads some number of bytes from the input stream and stores them into the buffer array b. The number of bytes actually read is returned as an integer. This method blocks until input data is available, end of file is detected, or an exception is thrown. If the length of b is zero, then no bytes are read and 0 is returned; otherwise, there is an attempt to read at least one byte. If no byte is available because the stream is at the end of the file, the value -1 is returned; otherwise, at least one byte is read and stored into b.
The first byte read is stored into element b[0], the next one into b[1], and so on. The number of bytes read is, at most, equal to the length of b. Let k be the number of bytes actually read; these bytes will be stored in elements b[0] through b[k-1], leaving elements b[k] through b[b.length-1] unaffected.
read(...) tells you how many bytes it put into the array, and yes, you can look further into the array; you'll just get whatever was already there.
Why does the read() method differ in the total number of bytes reported?
For example,
int n = System.in.read();
System.out.println("The total bytes are:"+System.in.available());
And in another place we use
byte[] in = new byte[30];
int n = System.in.read(in);
System.out.println("The total bytes are:"+System.in.available());
The word "Java" was entered in both cases.
The output of the first method is :
the total bytes are 5
Whereas the second method prints:
the total bytes are 6
What is the difference in the returned byte counts between these two methods?
As the Javadoc says of the available() method, it: "Returns an estimate of the number of bytes that can be read (or skipped over) from this input stream without blocking by the next invocation of a method for this input stream."
The exact way a stream determines this count is not strictly defined. In the case of System.in it may use the number of bytes currently available in its internal buffer, or it may delegate the call to the underlying input stream which may be implementation dependent (e.g. by operating system). The only thing you can really determine from the returned value is the number of bytes you should be able to safely read without blocking.
System.in.read() reads a single byte from the standard input stream.
byte[] in = new byte[30];
int n = System.in.read(in);
This reads up to in.length bytes from standard input into your in array and returns how many were actually read.
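A sketch combining both forms in one run (an assumption; the question may have run them separately, typing "Java" plus Enter each time):

import java.io.IOException;

public class AvailableDemo {
    public static void main(String[] args) throws IOException {
        int n1 = System.in.read();      // consumes 1 byte ('J')
        System.out.println("after single-byte read: " + System.in.available());

        byte[] in = new byte[30];
        int n2 = System.in.read(in);    // consumes up to 30 of the remaining bytes
        System.out.println("read " + n2 + " bytes; available: " + System.in.available());
    }
}

The differing totals in the question likely come down to how many bytes each read consumed before available() was called, and whether the line terminator ("\n" vs "\r\n") counts toward the estimate.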
Does the read method check the size of the buffer when filling it with data, or is there a chance that data is lost because the buffer isn't big enough? In other words, if there are ten bytes of data available to be read, will the server continue to store the remaining 2 bytes of data until the next read?
I'm just using 8 as an example here to over dramatise the situation.
InputStream stdout;
...
while (condition)
{
    ...
    byte[] buffer = new byte[8];
    int len = stdout.read(buffer);
}
No, read() won't lose any data just because you haven't given it enough space for all the available bytes.
It's not clear what you mean by "the server" here, but the final two bytes of a 10-byte message would still be available after the first read. (Or possibly the first read() would only read the first six bytes, leaving four still to read, for example.)
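So in the question's loop, leftover bytes simply wait for the next iteration. A sketch of that (process is a hypothetical consumer, not from the question):

byte[] buffer = new byte[8];
int len;
while ((len = stdout.read(buffer)) != -1) {
    process(buffer, 0, len);   // handle exactly len bytes; nothing is dropped
}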