I'm trying to learn Java and I came across this practice problem in which I have to create a URL extractor. I am able to stream data and print it. However, I'm not really familiar with BufferedReader, so I need help with creating a buffer of 100 bytes, copying 100 bytes of data from the stream into this byte array, processing that part, then taking the next chunk of 100 bytes from the stream, and so on.
The following is my code; any help would be greatly appreciated.
I know that what I want needs to be done inside the while loop. I think I need to create a byte array and then store the data in it. It's the how that I'm more interested in.
EDIT: I do not need a code sample, because I'm trying to learn. A description of how I can do this would suffice. Thanks a lot in advance.
Create a byte array (of the size you want) outside your while loop (you can reuse it that way, so it's faster).
You can use a BufferedInputStream wrapped around your original InputStream instead of a Reader (Readers convert bytes to Strings, but we don't need that here).
Then you can use the read(byte[]) method of BufferedInputStream to copy the next series of bytes into the array, and process the retrieved bytes however you want.
See the API documentation for a reference on what read(byte[]) does.
As mentioned in the comments, a Reader (and its subclass BufferedReader) is used to read characters, not bytes. You should instead use a BufferedInputStream to read into a byte array of the specified size:
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.PrintWriter;
import java.net.InetAddress;
import java.net.Socket;

public static void main(String[] args) throws IOException {
    String website = "thecakestory.com";
    Socket client = new Socket(InetAddress.getByName(website), 80);
    PrintWriter pw = new PrintWriter(client.getOutputStream());
    // Send a well-formed HTTP request: request line, headers, then a blank
    // line; Connection: close makes the server end the stream when done,
    // so the read loop below terminates
    pw.print("GET /index.php HTTP/1.1\r\n");
    pw.print("Host: " + website + "\r\n");
    pw.print("Connection: close\r\n");
    pw.print("\r\n");
    pw.flush();
    BufferedInputStream input = new BufferedInputStream(client.getInputStream());
    int bytesRead;
    byte[] contents = new byte[100]; // reused for every 100-byte chunk
    while ((bytesRead = input.read(contents)) != -1) {
        // read() may return fewer than 100 bytes, so only use the first bytesRead
        String x = new String(contents, 0, bytesRead);
        System.out.print(x);
    }
    client.close();
    pw.close();
}
Some useful links:
For an introduction to Java IO related stuff, see the Java tutorial page http://docs.oracle.com/javase/tutorial/essential/io/. This should be the starting point for learning about streams, readers, etc.
For the documentation of BufferedInputStream and BufferedReader, see their API reference:
http://docs.oracle.com/javase/7/docs/api/java/io/BufferedInputStream.html
http://docs.oracle.com/javase/7/docs/api/java/io/BufferedReader.html
Related
I am new to Java IO. Currently, I have these lines of code, which generate an input stream based on a string.
StringBuilder sb = new StringBuilder();
for(...){
sb.append(...);
}
String finalString = sb.toString();
byte[] objectBytes = finalString.getBytes(StandardCharsets.UTF_8);
InputStream inputStream = new ByteArrayInputStream(objectBytes);
Maybe I am misunderstanding something, but is there a better way to generate an InputStream from a String other than using getBytes()?
For instance, if the String is really large, say 50 MB, and there is no way to create another copy (another 50 MB via getBytes()) due to resource constraints, it could potentially throw an out-of-memory error.
I just wanted to know if the above lines of code are an efficient way to generate an InputStream from a String. For instance, is there a way I can "stream" the String into an input stream without using additional memory? Like a Reader-like abstraction on top of the String?
I think what you're looking for is a StringReader which is defined as:
A character stream whose source is a string.
To use this efficiently, you would need to know exactly where the characters you wish to read are located. It supports both random access (via skip and mark/reset) and sequential access, so you can read the entire String, char by char, if you prefer.
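A minimal sketch of reading a String through a StringReader in chunks, so no full byte[] copy of the String is ever made (the chunk size is arbitrary):

import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class StringReaderDemo {
    public static void main(String[] args) throws IOException {
        String large = "pretend this String is 50MB";
        try (Reader reader = new StringReader(large)) {
            char[] chunk = new char[4096]; // process a small window at a time
            int read;
            while ((read = reader.read(chunk)) != -1) {
                // process chunk[0..read) here instead of materializing bytes
            }
        }
    }
}

Note that this avoids the extra copy only if the consuming code can accept a Reader; producing an actual InputStream of bytes still requires an encoding step somewhere.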
You are producing data (actually writing) and you want to consume it (reading) almost immediately.
The Unix technique is to pipe the output of one process to the input of another process. In Java, one also needs at least two threads; they synchronize on producing and consuming.
PipedInputStream in = new PipedInputStream();
PipedOutputStream out = new PipedOutputStream(in);
new Thread(() -> writeAllYouveGot(out)).start();
readAllYouveGot(in);
Here I started a Thread for writing, with a Runnable that calls some self-defined method on out. Instead of using new Thread, you might prefer an ExecutorService.
Piped I/O is used rather seldom, though its asynchronous behaviour is optimal. One can even set the pipe's size on the PipedInputStream. The reason for the rare usage is the need for a second thread.
To complete things, one would probably wrap the binary input/output streams in new InputStreamReader(in, "UTF-8") and new OutputStreamWriter(out, "UTF-8").
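writeAllYouveGot and readAllYouveGot above are placeholder names for your own methods. A complete, minimal sketch might look like this:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class PipeDemo {
    public static void main(String[] args) throws IOException {
        PipedInputStream in = new PipedInputStream(8192); // optional pipe size
        PipedOutputStream out = new PipedOutputStream(in);

        // Producer thread: writes into the pipe, then closes it so the
        // consumer sees end-of-stream
        new Thread(() -> {
            try (Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
                writer.write("produced on one thread, consumed on another\n");
            } catch (IOException e) {
                e.printStackTrace();
            }
        }).start();

        // Consumer: reads from the pipe on the main thread
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}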
Try something like this (no promises about typos):
BufferedReader reader = new BufferedReader(
        new InputStreamReader(yourInputStream, Charset.defaultCharset()));
final char[] buffer = new char[8000];
int charsRead;
while ((charsRead = reader.read(buffer, 0, buffer.length)) != -1) {
    // Do something with the first charsRead chars of buffer
}
The InputStreamReader converts from byte to char, using the Charset. BufferedReader allows you to read blocks of char.
For really large input streams, you may want to process the input in chunks, rather than reading the entire stream into memory and then processing.
I'm working on a string compressor for a school assignment.
There's one bug that I can't seem to work out: the compressed data is being written to a file using a FileWriter, represented by a byte array. The compression algorithm returns an input stream, so the data flows as such:
piped input stream
-> input stream reader
-> data stored in char buffer
-> data written to file with file writer.
Now, the bug is that with some very specific strings, the second-to-last byte in the byte array is written incorrectly, and it always has the same bit pattern: "11111100".
Every time it's this bit pattern, and always the second-to-last byte.
Here are some samples from the code:
InputStream compress(InputStream in) {
    //...
    //...
    PipedInputStream pin = new PipedInputStream();
    PipedOutputStream pout = new PipedOutputStream(pin);
    ObjectOutputStream oos = new ObjectOutputStream(pout);
    oos.writeObject(someobject);
    oos.flush();
    DataOutputStream dos = new DataOutputStream(pout);
    dos.writeFloat(//);
    dos.writeShort(//);
    dos.write(SomeBytes); // ---Here
    dos.flush();
    dos.close();
    return pin;
}
void write(char[] cbuf, int off, int len) {
    //....
    //....
    InputStreamReader s = new InputStreamReader(
            c.compress(new ByteArrayInputStream(str.getBytes())));
    s.read(charbuffer);
    out.write(charbuffer);
}
A string which triggers it is "hello and good evenin", for example.
I have tried to iterate over the byte array and write the bytes one by one; it didn't help.
It's also worth noting that when I tried to write to a file using the output stream in the algorithm itself, it worked fine. This design was not my choice, btw.
So I'm not really sure what I'm doing wrong here.
Considering that you're saying:
Now, the bug is, that with some very specific strings, the second to
last byte in the byte array is written wrong. and it's always the same
bit values "11111100".
You are taking a
binary stream (the compressed data)
-> reading it as chars
-> then writing it as chars.
And you are converting bytes to chars without clearly defining the encoding.
I'd say that the problem is that your InputStreamReader is translating some byte sequences in a way that you're not expecting.
Remember that in encodings like UTF-8, two or three bytes may become one single char.
It can't be a coincidence that the very byte pattern you pointed out (11111100) is one of the UTF-8 lead-byte patterns (1111110x). Check the Wikipedia table on UTF-8 and you'll see why the conversion is destructive: if a byte starts with 1111110x, the next must start with 10xxxxxx.
Meaning that if you use UTF-8 to convert
bytes1[] -> chars[] -> bytes2[]
in some cases bytes2 will be different from bytes1.
I recommend changing your code to remove those readers, or specifying an ASCII encoding to see if that prevents the translations.
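To see this destructive round-trip concretely, here is a minimal, self-contained sketch (the byte values are chosen purely to demonstrate the effect):

import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class LossyRoundTrip {
    public static void main(String[] args) {
        // 0xFC is the 11111100 pattern from the question; on its own it is
        // not a valid UTF-8 sequence, so decoding replaces it with U+FFFD
        byte[] original = { (byte) 0xFC, 0x00 };
        String asChars = new String(original, StandardCharsets.UTF_8);
        byte[] roundTripped = asChars.getBytes(StandardCharsets.UTF_8);
        System.out.println(Arrays.toString(original));     // [-4, 0]
        System.out.println(Arrays.toString(roundTripped)); // different bytes
    }
}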
I solved this by encoding and decoding the bytes with Base64.
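A minimal sketch of that approach with java.util.Base64 (the sample bytes are arbitrary): the binary data is encoded to ASCII-safe text before it goes anywhere near a Reader or Writer, and decoded back afterwards.

import java.util.Base64;

public class Base64RoundTrip {
    public static void main(String[] args) {
        byte[] compressed = { (byte) 0xFC, 0x05, (byte) 0x80 }; // arbitrary binary data
        // Encode to ASCII-safe text before it passes through any Reader/Writer
        String text = Base64.getEncoder().encodeToString(compressed);
        // Decode back to the exact original bytes on the other side
        byte[] restored = Base64.getDecoder().decode(text);
        System.out.println(text);                 // safe to write with a FileWriter
        System.out.println(restored.length == 3); // true, bytes are identical
    }
}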
Using java.net and java.io, what is the fastest way to parse HTML from online and load it to a file or the console? Is BufferedWriter/BufferedReader faster than InputStreamReader/OutputStreamWriter? Are Writers and Readers faster than OutputStreams and InputStreams?
I am experiencing serious lag with the following output writer/stream:
URLConnection ii = i.openConnection();
BufferedReader iik = new BufferedReader(new InputStreamReader(ii.getInputStream()));
String op;
while (iik.readLine() != null) {
    op = iik.readLine();
    System.out.println(op);
}
But curiously, I am experiencing close to no lag time with the following code:
URLConnection ii = i.openConnection();
Reader xh = new InputStreamReader(ii.getInputStream());
int r;
Writer xy = new PrintWriter(System.out);
while ((r = xh.read()) != -1) {
    xy.write(r);
}
xh.close();
xy.close();
What is going on here?
Your first snippet is wrong: it reads a line, tests whether it's null, throws that line away, then reads the next line without testing it for null, and prints it.
The second snippet writes each char read from the reader to the PrintWriter, one at a time.
Both snippets use the same underlying streams and readers, and, if coded correctly, the first one should probably be a bit faster thanks to buffering. But of course, something will be printed to the screen only when a line is complete: if the server sends a single 10 MB line of text, you'll have to read the whole 10 MB before anything is printed.
Make sure to close the readers in finally blocks.
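For reference, a corrected version of the first loop (a sketch reusing the question's iik reader) would be:

String op;
while ((op = iik.readLine()) != null) {
    System.out.println(op); // each line is read exactly once and null-checked
}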
Readers/Writers shouldn't be inherently faster than Input/OutputStreams.
That said, going through readLine() and println() probably isn't the optimal way of transferring bytes. In your case, if the file you're loading doesn't contain many newline characters, BufferedReader will have to buffer a lot of data before readLine() will return.
The canonical non-terrible way of transferring data between streams is doing it in chunks by using a buffer:
byte[] buf = new byte[1 << 12]; // 4096-byte chunk buffer
InputStream in = urlConnection.getInputStream();
int read;
while ((read = in.read(buf)) != -1) {
    System.out.write(buf, 0, read);
}
System.out.flush(); // System.out is buffered, so flush at the end
It might be faster yet to use NIO, the code for it is a little less straightforward and I just use the one found in this blog post.
If you're writing to/from a file, the best method is to use a zero-copy approach, which Java makes available with FileChannel.transferFrom() and transferTo(). Sample code is available in a DeveloperWorks article.
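A minimal sketch of the zero-copy file-to-file case (the file names are illustrative):

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.channels.FileChannel;

public class ZeroCopyDemo {
    public static void main(String[] args) throws IOException {
        try (FileChannel source = new FileInputStream("in.bin").getChannel();
             FileChannel sink = new FileOutputStream("out.bin").getChannel()) {
            long position = 0;
            long size = source.size();
            // transferTo may move fewer bytes than requested, so loop until done
            while (position < size) {
                position += source.transferTo(position, size - position, sink);
            }
        }
    }
}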
I am working on a project and have a question about Java sockets. The source file can be found here.
After successfully transmitting the file size in plain text, I need to transfer binary data (DVD .VOB files).
I have a loop such as
// Read this file's size
long fileSize = Integer.parseInt(in.readLine());
// Read the block size they are going to use
int blockSize = Integer.parseInt(in.readLine());
byte[] buffer = new byte[blockSize];
// Bytes read so far
long bytesRead = 0;
int read = 0;
while (bytesRead < fileSize) {
    System.out.println("received " + bytesRead + " bytes of " + fileSize + " bytes in file " + fileName);
    read = socket.getInputStream().read(buffer);
    if (read < 0) {
        // Should never get here since we know how many bytes there are
        System.out.println("DANGER WILL ROBINSON");
        break;
    }
    binWriter.write(buffer, 0, read);
    bytesRead += read;
}
I receive a random number of bytes, close to 99% of the file. I am using Socket, which is TCP-based,
so I shouldn't have to worry about lower-layer transmission errors.
The received number changes, but it is always very near the end:
received 7258144 bytes of 7266304 bytes in file GLADIATOR/VIDEO_TS/VTS_07_1.VOB
The app then hangs there in a blocking read. I am confounded. The server is sending the correct file size and has a successful implementation in Ruby, but I can't get the Java version to work.
Why would I read fewer bytes than were sent over a TCP socket?
EDIT: The above is because of a bug many of you pointed out below: BufferedReader ate 8 KB of my socket's input. The correct implementation can be found here.
If your in is a BufferedReader, then you've run into the common problem of buffering more than needed. The default buffer size of BufferedReader is 8192 characters, which is approximately the difference between what you expected and what you got. So the data you are missing is inside BufferedReader's internal buffer, converted to characters (I wonder why it didn't break with some kind of conversion error).
The only workaround is to read the first lines byte by byte without using any buffered reader classes. Java doesn't provide an unbuffered InputStreamReader with readLine() capability as far as I know (with the exception of the deprecated DataInputStream.readLine(), as indicated in the comments below), so you have to do it yourself. I would do it by reading single bytes, putting them into a ByteArrayOutputStream until I encounter an EOL, then converting the resulting byte array into a String using the String constructor with the appropriate encoding.
Note that while you can't use a BufferedReader, nothing stops you from using a BufferedInputStream from the very beginning, which will make byte-by-byte reads more efficient.
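That byte-by-byte line reader could look something like this sketch (the method name and the ASCII encoding are illustrative; it works on any InputStream, including a BufferedInputStream used for the whole connection):

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

static String readLineUnbuffered(InputStream in) throws IOException {
    ByteArrayOutputStream line = new ByteArrayOutputStream();
    int b;
    while ((b = in.read()) != -1) {
        if (b == '\n') break;          // end of line reached
        if (b != '\r') line.write(b);  // drop the CR of a CRLF ending
    }
    // convert the collected bytes with an explicit encoding
    return new String(line.toByteArray(), StandardCharsets.US_ASCII);
}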
Update
In fact, I am doing something like this right now, only a bit more complicated. It is an application protocol that involves exchanging some data structures that are nicely represented in XML, but sometimes binary data is attached to them. We implemented this by having two attributes in the root XML element: fragmentLength and isLastFragment. The first one indicates how many bytes of binary data follow the XML part, and isLastFragment is a boolean attribute indicating the last fragment, so the reading side knows that there will be no more binary data. The XML is null-terminated, so we don't have to deal with readLine(). The code for reading looks like this:
InputStream ins = new BufferedInputStream(socket.getInputStream());
while (!finished) {
    ByteArrayOutputStream buf = new ByteArrayOutputStream();
    int b;
    while ((b = ins.read()) > 0) {
        buf.write(b);
    }
    if (b == -1)
        throw new EOFException("EOF while reading from socket");
    // b == 0, i.e. the null terminator of the XML part
    Document xml = readXML(new ByteArrayInputStream(buf.toByteArray()));
    processAnswers(xml);
    Element root = xml.getDocumentElement();
    if (root.hasAttribute("fragmentLength")) {
        int length = DatatypeConverter.parseInt(
                root.getAttribute("fragmentLength"));
        boolean last = DatatypeConverter.parseBoolean(
                root.getAttribute("isLastFragment"));
        int read = 0;
        while (read < length) {
            // split incoming fragment into 4Kb blocks so we don't run
            // out of memory if the client sent a really large fragment
            int l = Math.min(length - read, 4096);
            byte[] fragment = new byte[l];
            int pos = 0;
            while (pos < l) {
                int c = ins.read(fragment, pos, l - pos);
                if (c == -1)
                    throw new EOFException(
                            "Preliminary EOF while reading fragment");
                pos += c;
                read += c;
            }
            // process fragment
        }
    }
}
Using null-terminated XML for this turned out to be a really great thing, as we can add additional attributes and elements without changing the transport protocol. At the transport level, we also don't have to worry about handling UTF-8, because the XML parser will do it for us. In your case you're probably fine with those two lines, but if you need to add more metadata later, you may wish to consider null-terminated XML too.
Here is your problem. In the first few lines of the program, you are using in.readLine(), where in is probably some sort of BufferedReader. BufferedReaders read data off the socket in 8 KB chunks. So when you did the first readLine(), it read the first 8 KB into the buffer. The first 8 KB contains your two numbers followed by newlines, then some portion of the head of the VOB file (that's the missing chunk). When you then switch to using getInputStream() off the socket, you are already 8 KB into the transmission, assuming you started at zero.
socket.getInputStream().read(buffer); // you can't do this without losing data.
While BufferedReader is nice for reading character data, switching between binary and character data in a stream is not possible with it. You'll have to switch to using an InputStream instead of a Reader and convert the first few portions to character data by hand. If you read the file using a buffered byte array, you can read the first chunk, look for your newlines, and convert everything to the left of them to character data; then write everything to the right to your file, and continue reading the rest of the file.
This used to be easier with DataInputStream, but it doesn't do a good job of handling character conversion for you (readLine() is deprecated, with BufferedReader as the only suggested replacement - doh). You would probably have to write a DataInputStream replacement that uses Charset under the covers to handle string conversion properly; then switching between characters and binary would be easier.
Your basic problem is that BufferedReader will read as much data as is available and place it in its buffer, then give you the data as you ask for it. This is the whole point of buffering: to reduce the number of calls to the OS. The only safe way to use buffered input is to use the same buffer over the life of the connection.
In your case, you only use the buffer to read two lines, but it is highly likely that 8192 bytes (the default size of the buffer) have been read into it. Say the first two lines consist of 32 bytes; this leaves 8160 bytes waiting for you to read. However, you bypass the buffer to perform read() on the socket directly, leaving 8160 bytes in the buffer that you end up discarding (the amount you are missing).
BTW: You should be able to see this in a debugger if you inspect the contents of your buffered reader.
Sergei may have been right about data being lost inside the buffer, but I'm not sure about his explanation. (BufferedReaders don't usually hold onto data inside their buffers. He may be thinking of a problem with BufferedWriters, which can lose data if the underlying stream is shut down prematurely.) [Never mind; I had misread Sergei's answer. The rest of this is valid AFAIK.]
I think you have a problem that's specific to your application. In your client code, you start reading as follows:
public static void recv(Socket socket){
try {
BufferedReader in = new BufferedReader(new InputStreamReader(socket.getInputStream()));
//...
int numFiles = Integer.parseInt(in.readLine());
... and you proceed to use in for the start of the exchange. But then you switch to using the raw socket stream:
while (bytesRead < fileSize) {
    read = socket.getInputStream().read(buffer);
Because in is a BufferedReader, it's already going to have filled its buffer with up to 8192 bytes from the socket input stream. Any bytes that are in that buffer, and which you don't read from in, will be lost. Your app is hanging because it believes that the server is holding onto some bytes, but the server doesn't have them.
The solution is not to do byte-by-byte reads from the socket (ouch! your poor CPU!), but to use the BufferedReader consistently. Or, to use buffering with binary data, change the BufferedReader to a BufferedInputStream that wraps the socket's InputStream.
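A minimal sketch of that consistent approach (assuming the question's socket, a byte-level line helper like the readLineUnbuffered one sketched in an earlier answer, and a hypothetical BufferedOutputStream out for the destination file):

BufferedInputStream in = new BufferedInputStream(socket.getInputStream());
// Both the text header lines and the binary payload are read from the
// same buffered stream, so no bytes get stranded in a second buffer
long fileSize = Long.parseLong(readLineUnbuffered(in));
int blockSize = Integer.parseInt(readLineUnbuffered(in));
byte[] buffer = new byte[blockSize];
long bytesRead = 0;
while (bytesRead < fileSize) {
    int n = in.read(buffer, 0, (int) Math.min(buffer.length, fileSize - bytesRead));
    if (n == -1) break; // premature EOF
    out.write(buffer, 0, n);
    bytesRead += n;
}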
By the way, TCP is not as reliable as many people assume it to be. For example, when the server socket closes, it's possible for it to have written data into the socket which then gets lost as the socket connection is shutdown. Calling Socket.setSoLinger can help to prevent this problem.
EDIT: Also BTW, you're playing with fire by treating byte and character data as if they're interchangeable, as you do below. If the data really is binary, then the conversion to String risks corrupting the data. Perhaps you want to be writing into a BufferedOutputStream?
// Java is retarded and reading and writing operate with
// fundamentally different types. So we write a String of
// binary data.
fileWriter.write(new String(buffer));
bytesRead += read;
EDIT 2: Clarified (or attempted to clarify :-}) the handling of binary vs. String data.
How do I obtain the number of bytes before allocating the byte array 'handsize' shown below? The incoming ByteArray data are sent in 3 different sizes. Thanks.
BufferedInputStream bais = new BufferedInputStream(requestSocket.getInputStream());
DataInputStream datainput = new DataInputStream(bais);
// need to read the number of bytes here before proceeding
byte[] handsize = new byte[bytesize];
datainput.readFully(handsize);
You could use a ByteArrayOutputStream, then you wouldn't have to worry about it.
ByteArrayOutputStream out = new ByteArrayOutputStream();
//write data to output stream
byte[] bytes = out.toByteArray();
There's no way to know how many bytes of data are yet to be received on a socket--knowing this would be tantamount to clairvoyance. If you're using your own protocol for client/server communication, you could send the number of bytes of data as an integer, before sending the actual bytes themselves. Then the receiving side would know how many bytes to expect.
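A minimal sketch of that length-prefix idea (the sender-side names are illustrative; both sides must agree on the format):

// Sender: write the length first, then the payload
DataOutputStream dout = new DataOutputStream(socket.getOutputStream());
dout.writeInt(payload.length);
dout.write(payload);
dout.flush();

// Receiver: read the length, then exactly that many bytes
DataInputStream datainput = new DataInputStream(
        new BufferedInputStream(requestSocket.getInputStream()));
int bytesize = datainput.readInt();
byte[] handsize = new byte[bytesize];
datainput.readFully(handsize);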
As has been pointed out, the problem as stated is impossible. But why do you need to know? Why not store the data in a variable size structure, like an ArrayList, as you read it? Or maybe even process it as you read?
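A ByteArrayOutputStream works as exactly such a growable structure. A sketch reusing the question's datainput stream (note that on a socket, read() returns -1 only once the sender closes the connection):

ByteArrayOutputStream out = new ByteArrayOutputStream();
byte[] chunk = new byte[4096];
int n;
while ((n = datainput.read(chunk)) != -1) {
    out.write(chunk, 0, n); // append only the bytes actually read
}
byte[] bytes = out.toByteArray(); // everything received, correctly sized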