Whenever I have to work with topics like file handling or socket programming, I have to look for sample code on the internet to see how the xxStreamxx, xxReader, and xxWriter classes are used.
I want to be able to use them on my own and know how they work.
How do I go about learning that?
The main idea is simple.
Streams are for binary read/write. Readers/Writers are for character read/write (in Java a byte is not a char, since a char is a 16-bit Unicode code unit). If a binary stream can be decoded into a character sequence, there is an appropriate reader to wrap around it.
For example, FileInputStream (which extends InputStream) reads a file as raw bytes. If the file is text, you wrap this object in an InputStreamReader (which extends Reader), supplying the character set. Now you can read characters.
If you want to call readLine(), you need to wrap this reader in a BufferedReader.
Similarly with writers.
So, the idea is wrapping to get new abilities.
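To make the wrapping idea concrete, here is a minimal sketch of the three layers described above. The class name WrappingDemo, the method readLines, and the choice of UTF-8 are assumptions made for illustration; use whatever character set your file is actually written in.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

class WrappingDemo {
    // Reads every line of a text file by stacking three layers:
    // bytes (FileInputStream) -> chars (InputStreamReader) -> lines (BufferedReader).
    static List<String> readLines(String path) throws IOException {
        List<String> lines = new ArrayList<>();
        try (InputStream bytes = new FileInputStream(path);
             Reader chars = new InputStreamReader(bytes, StandardCharsets.UTF_8); // UTF-8 is an assumption
             BufferedReader buffered = new BufferedReader(chars)) {
            String line;
            while ((line = buffered.readLine()) != null) { // null signals end of stream
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Write a small temporary file, then read it back through the wrappers.
        Path tmp = Files.createTempFile("wrapping-demo", ".txt");
        Files.write(tmp, "first\nsecond\n".getBytes(StandardCharsets.UTF_8));
        System.out.println(readLines(tmp.toString()));
        Files.delete(tmp);
    }
}
```

Each wrapper adds exactly one ability: the InputStreamReader decodes bytes into characters, and the BufferedReader adds buffering plus readLine(). Closing the outermost wrapper closes the ones inside it.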
First of all, you have to learn and understand what streams are. If you don't understand the concepts behind them, you will always be copying code. So read the "Basic I/O" lesson of the Java tutorial: http://docs.oracle.com/javase/tutorial/essential/io/streams.html. A comprehensive presentation about this topic is this one from javapassion.com: http://www.javapassion.com/javase/javaiostream.pdf.
While reading, as I usually told my students: "write code and make mistakes" :-)
On this website you can find a variety of examples of how to write your own streams in Java: http://java.sun.com/developer/technicalArticles/Streams/WritingIOSC/
Just looking at the examples sometimes helps you much more than the explanations...
Please scroll to the middle and bottom of the page.
I have a class (high performance file reader) that implements java.lang.Readable.
How can I make use of it?
I have found zero classes in the JDK that take a Readable as input (e.g., to convert it into something more generally useful). Does such a thing exist?
Background:
I wrote a simple CSV reader whose performance I would like to improve. Then I found a class from another project that maintains two buffers: it fills one of them in a background thread, flips to the other once the first is full, and repeats. From the outside it looks like any other Readable, but the wonderful, magical thing about it is that the background thread makes sure you are almost always reading from memory when you access this Readable. I want that. (At present my CSV reader uses a BufferedReader, and hence the Reader interface.)
thanks in advance!
There is one class, java.util.Scanner, that takes a Readable as the parameter of one of its constructors.
Check this out
java.util.Scanner
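As a rough sketch of how that constructor could be used: the StringReadable class below is a hypothetical stand-in for the asker's high-performance reader (it simply serves characters from a String), and the helper name lines is made up for illustration. Only Scanner(Readable), hasNextLine() and nextLine() come from the JDK.

```java
import java.io.IOException;
import java.nio.CharBuffer;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class ReadableDemo {
    // Hypothetical stand-in for a custom Readable: serves characters from a String.
    static final class StringReadable implements Readable {
        private final String data;
        private int pos = 0;

        StringReadable(String data) {
            this.data = data;
        }

        @Override
        public int read(CharBuffer cb) throws IOException {
            if (pos >= data.length()) {
                return -1; // end of input
            }
            // Copy as many characters as fit into the caller's buffer.
            int n = Math.min(cb.remaining(), data.length() - pos);
            cb.put(data, pos, pos + n);
            pos += n;
            return n;
        }
    }

    // Pulls whole lines out of any Readable via java.util.Scanner.
    static List<String> lines(Readable source) {
        List<String> out = new ArrayList<>();
        try (Scanner scanner = new Scanner(source)) {
            while (scanner.hasNextLine()) {
                out.add(scanner.nextLine());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(lines(new StringReadable("a,b\nc,d")));
    }
}
```

Whether Scanner is fast enough for a performance-sensitive CSV reader is a separate question that this sketch does not answer; it only shows that a Readable can be consumed line by line this way.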
My understanding is that this is a common scenario, but Java doesn't have a baked in solution and I've been searching on and off for more than a day now. I have tried the CircularCharBuffer from the Ostermiller library, but that uses some sort of reader that constantly waits for new input, so I couldn't get readline() to detect the end of the content (it would just hang).
So could someone tell me how I could do a conversion? For what it's worth, I'm converting multiple (potentially many) PDF files to raw text using the PDFBox lib. The PDFBox API puts the content onto a Writer, after which I need to get at the content for further processing (so BufferedReader/Writer is not actually essential, but some kind of Reader/Writer is). I know that this is possible using StringReader/Writer, but I'm not sure that this is efficient, plus I lose the readLine() method.
This is a bit like asking how to convert a pig into an elephant ... :-)
OK, there are two ways to address this problem (using the Java libraries):
1. You can capture the data written to a buffered writer so that it can then be read using a buffered reader. Basically, you do this by:
   - using your BufferedWriter to write to a StringWriter or CharArrayWriter,
   - closing it,
   - extracting the resulting stuff from the SW / CAW as a String,
   - wrapping the String in a StringReader, and
   - wrapping the StringReader in a BufferedReader.
2. You can create a PipedReader / PipedWriter pair and wrap them with BufferedReader and BufferedWriter respectively.
The two approaches both have disadvantages:
The first one requires you to complete the writing before constructing the read side. That means you need space to hold the entire stream content in memory, and you can't do producer-side and consumer-side processing in parallel.
The second one requires you to produce and consume in separate threads ... or risk having the pipeline block permanently.
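The first approach can be sketched roughly as follows. The class and method names (CaptureDemo, roundTrip) are invented for the example, and the writer here is filled by a simple loop rather than by PDFBox.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;

class CaptureDemo {
    // Writes the given lines through a BufferedWriter, then reads them back
    // through a BufferedReader once writing has completely finished.
    static List<String> roundTrip(String... linesToWrite) throws IOException {
        StringWriter sink = new StringWriter();
        try (BufferedWriter writer = new BufferedWriter(sink)) {
            for (String line : linesToWrite) {
                writer.write(line);
                writer.newLine();
            }
        } // closing the BufferedWriter flushes everything into the StringWriter

        List<String> result = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new StringReader(sink.toString()))) {
            String line;
            while ((line = reader.readLine()) != null) {
                result.add(line);
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(roundTrip("page one", "page two"));
    }
}
```

Note that readLine() is still available because the StringReader is wrapped in a BufferedReader. The cost is the one already mentioned: the whole content sits in memory before reading starts.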
Conceptually speaking, the Ostermiller library is really a reimplementation of PipedReader / PipedWriter. (And some of the advantages of his reimplementation were rendered moot by Java 1.6 ... which allows you to specify the pipeline's buffer size. Mark support is interesting, but I can imagine some problems, depending on how you used it.)
You might also be able to find a PipedReader / PipedWriter replacement that uses a flexible buffer that grows and contracts as required. (At least ... this is conceptually possible.)
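For the piped approach with the standard JDK classes, a minimal sketch might look like this. The names PipeDemo and pump and the 8192-character pipe size are assumptions for illustration; the producer runs in its own thread, as the answer requires.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.PipedReader;
import java.io.PipedWriter;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class PipeDemo {
    // Writes the input lines in a producer thread and reads them back in the
    // calling thread through a PipedWriter / PipedReader pair.
    static List<String> pump(List<String> input) throws IOException, InterruptedException {
        PipedWriter pipeOut = new PipedWriter();
        PipedReader pipeIn = new PipedReader(pipeOut, 8192); // pipe size is an arbitrary choice

        Thread producer = new Thread(() -> {
            // try-with-resources closes the writer, which is what lets the
            // reader's final readLine() return null instead of blocking forever.
            try (BufferedWriter writer = new BufferedWriter(pipeOut)) {
                for (String line : input) {
                    writer.write(line);
                    writer.newLine();
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
        producer.start();

        List<String> received = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(pipeIn)) {
            String line;
            while ((line = reader.readLine()) != null) {
                received.add(line);
            }
        }
        producer.join();
        return received;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(pump(List.of("alpha", "beta", "gamma")));
    }
}
```

If the writer is never closed, the consumer's last readLine() waits for data that will never arrive, so closing the write side is the part worth double-checking in any variant of this.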
The CircularCharBuffer from the Ostermiller lib has two methods getWriter() and getReader() to get a reader on the content of a writer, and vice versa. The reason the Reader was hanging at the final readLine() was because I wasn't calling close() on the writer after I had finished writing to it. So the final readLine() was waiting for new content on the writer that was never going to arrive.
The Ostermiller library can be found here.
I'm working with a library to which I have to provide an InputStream and a PrintStream. It uses the InputStream to gather data for processing and the PrintStream to provide results. I'm stuck using this library and its API cannot be altered.
There are two issues with this that I think have related solutions.
First, the data that needs to be read via the InputStream is not available upfront. Instead, the data is dynamically created by a different part of the application and given to my code as a String via method call. My code's job is to somehow allow the library to read this data through the InputStream provided as I get it.
Second, I need to somehow get the result that is written to the PrintStream and send it to another part of the application as a String. This needs to happen as soon as possible after the data is put into the PrintStream.
What it looks like I need are two stream objects that behave more or less like buffers. I need an InputStream that I can shove data into whenever I have it, and a PrintStream whose contents I can grab whenever it has some. This seems a little awkward to me, but I'm not sure how else to do it.
I'm wondering if anything already exists that allows this kind of behavior or if there is a different (better) solution that will work in the situation I've described. The only thing I can come up with is to try to implement streams with this behavior, but that can become complicated fast (especially since the InputStream needs to block until data is available).
Any ideas?
Edit: To be clear, I'm not writing the library. I'm writing code that is supposed to provide the library with an InputStream to read data from and a PrintStream to write data to.
It looks like both streams need to be constantly reading/writing, so you'll need two threads independent of each other. The pattern resembles JMS a little bit, where you feed information to a "queue" or "topic", wait for it to be processed, and then have it put on an "output" queue/topic. This may introduce additional moving parts, but you could write a simple client to place info onto a JMS queue, then have a listener that just grabs messages and constantly feeds them to the input stream. Then have another piece of code read from the output stream and do what you need with it.
Hope this helps.
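If a message queue feels heavier than needed, a purely in-process alternative (not the JMS approach suggested above) can be sketched with the JDK's piped streams. Everything named here is hypothetical: StreamBridge, send, finish, input, output, and the callback parameter onOutput. UTF-8 is assumed for both directions. One known caveat of PipedInputStream: it tracks the thread that last wrote to it, so calls to send should come from a thread that stays alive while the library is reading.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;
import java.util.function.Consumer;

class StreamBridge {
    private final PipedOutputStream feed = new PipedOutputStream();
    private final PipedInputStream libraryInput;
    private final PrintStream libraryOutput;

    // onOutput is called with each chunk of text the library prints.
    StreamBridge(Consumer<String> onOutput) throws IOException {
        libraryInput = new PipedInputStream(feed);
        // Collect bytes in memory and hand them to the callback on every flush.
        OutputStream forwarding = new ByteArrayOutputStream() {
            @Override
            public synchronized void flush() {
                if (count > 0) {
                    onOutput.accept(new String(buf, 0, count, StandardCharsets.UTF_8));
                    reset();
                }
            }
        };
        // autoFlush = true so that printed text is forwarded promptly.
        libraryOutput = new PrintStream(forwarding, true, StandardCharsets.UTF_8.name());
    }

    InputStream input() { return libraryInput; }   // give this to the library

    PrintStream output() { return libraryOutput; } // and this

    // Makes a String available to whoever is reading from input().
    void send(String data) throws IOException {
        feed.write(data.getBytes(StandardCharsets.UTF_8));
        feed.flush();
    }

    // Signals end of input; the library's next read past the data returns -1.
    void finish() throws IOException {
        feed.close();
    }
}
```

The library's read() blocks until send is called, which covers the "block until data is available" requirement, and the callback may receive a printed line in more than one chunk, so the receiving side should be prepared to concatenate.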
CopyBytes seems like a normal program, but it actually represents a kind of low-level I/O that you should avoid. It has been mentioned that there are streams for characters, objects, etc. that should be preferred, although all of them are built on the byte stream itself. What is the reason behind this? Does it have anything to do with the security manager or with performance?
source : oracle docs
What Oracle is actually saying, is "Please do not reimplement the wheel!".
You should almost never need regular Byte streams:
Are you parsing text? Use a character stream, which understands text encoding issues.
Are you parsing XML? Use SAX or some other library.
Are you parsing images? Use the ImageIO class.
Are you copying things from A to B? Use FileUtils from Apache Commons IO.
There are very few situations where you will actually need to use the bytestream.
From the text you quoted:
CopyBytes seems like a normal program, but it actually represents a kind of low-level I/O that you should avoid. Since xanadu.txt contains character data, the best approach is to use character streams, as discussed in the next section. There are also streams for more complicated data types. Byte streams should only be used for the most primitive I/O.
Usually, you don't want to work with bytes directly. There are higher-level APIs, for example to read text (i.e. character data that has to be decoded from bytes).
CopyBytes works, but it is very inefficient: it needs two method calls (one read and one write) for every single byte it copies.
Instead, you should use a buffer (of several thousand bytes, the best size varies by what exactly you read and other conditions) and read/write the entire buffer (or as much as possible) with every method call.
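A buffered copy loop along those lines could look like the sketch below. The class name BufferedCopy and the 8192-byte buffer are assumptions for illustration; as noted, the best size depends on what you read.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

class BufferedCopy {
    // Copies everything from in to out, one buffer-full per read/write call,
    // and returns the number of bytes copied.
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192]; // assumed size, tune for your workload
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) { // read may return fewer bytes than requested
            out.write(buffer, 0, n);          // so write only the n bytes actually read
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] source = new byte[20_000];
        ByteArrayOutputStream target = new ByteArrayOutputStream();
        System.out.println(copy(new ByteArrayInputStream(source), target) + " bytes copied");
    }
}
```

The detail that is easy to get wrong is writing the whole buffer instead of only the n bytes that were read, which corrupts the tail of the output.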
I'm writing arbitrary byte arrays (mock virus signatures of 32 bytes) into arbitrary files, and I need code to overwrite a specific file given an offset into the file. My specific question is: is there source code/libraries that I can use to perform this particular task?
I've had this problem with Python file manipulation as well. I'm looking for a set of functions that can kill a line, cut/copy/paste, etc. My assumptions are that these are extremely common tasks, and I couldn't find it in the Java API nor my google searches.
Sorry for not RTFM well; I haven't come across any information, and I've been looking for a while now.
Maybe you are looking for something like the RandomAccessFile class in the standard Java JDK. It supports reads and writes at some offset, as well as byte arrays.
Java's RandomAccessFile is exactly what you want.
It includes methods like seek(long) that allow you to move wherever you need in the file. It also allows for reading and writing at the same time.
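Here is a small sketch of seek(long) combined with reading and writing on the same handle. The class and method names (OverwriteAtOffset, replaceAt) are made up for the example; the method returns the bytes that were previously stored at the offset.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

class OverwriteAtOffset {
    // Overwrites replacement.length bytes starting at offset and returns the
    // bytes that were there before. Assumes the file is long enough;
    // readFully throws EOFException otherwise.
    static byte[] replaceAt(File file, long offset, byte[] replacement) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file, "rw")) { // "rw" = read and write
            byte[] previous = new byte[replacement.length];
            raf.seek(offset);
            raf.readFully(previous); // reading advances the file pointer ...
            raf.seek(offset);        // ... so move back before writing
            raf.write(replacement);  // overwrites in place, does not insert
            return previous;
        }
    }
}
```

Writing with RandomAccessFile overwrites existing bytes; it never shifts the rest of the file, so the file length only changes if you write past the current end.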
As far as I know, Java has primarily lower level functions for manipulating files directly. Here is the best I've come up with
The actions you describe are standard in the Swing world, and for text they come down to manipulating a Document object. These act on data in memory. The class java.nio.channels.FileChannel has similar methods that act directly on a file. Neither finds the ends of lines automatically, but other classes in java.io and java.nio do.
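For the FileChannel route, a positional write might be sketched like this. ChannelOverwrite and writeAt are hypothetical names; FileChannel.open and write(ByteBuffer, long) are the JDK calls being illustrated.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class ChannelOverwrite {
    // Overwrites the bytes of an existing file starting at the given offset.
    static void writeAt(Path file, long offset, byte[] data) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.WRITE)) {
            ByteBuffer buffer = ByteBuffer.wrap(data);
            long position = offset;
            // A single write call is not guaranteed to consume the whole
            // buffer, so loop until nothing remains.
            while (buffer.hasRemaining()) {
                position += channel.write(buffer, position);
            }
        }
    }
}
```

The positional write does not move the channel's own position, which makes it convenient when several offsets have to be patched through one open channel.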
Apache Commons has a sandbox library called Flatfile which looks like it does what you want. The problem is that no code has been released yet. You may, however, want to talk to people working on it to get some more ideas. I didn't do a general check on libraries.
Have you looked into File/FileReader/FileWriter/BufferedReader? You can get the contents of the files and manipulate it as you like, you can search the data in the files, you can overwrite files, create new, append to an existing....
I am not sure this is exactly what you are asking for but I use these APIs all the time for logging, RTF editors, text file creation for email, and many other things.
As far as cut/copy/paste goes, I have not come across the ability to do that directly. However, you can read the contents of the file, "copy" the part of it you want, and "paste" it into a new file or append it to an existing one.
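As an illustration of that copy-into-a-new-file idea, here is a rough sketch of "killing" one line. The names LineEditor and deleteLine are invented, UTF-8 is assumed, and line endings are rewritten with the platform's line separator.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

class LineEditor {
    // Removes the line with the given 1-based number by copying every other
    // line to a temporary file and then replacing the original with it.
    static void deleteLine(Path file, int lineNumber) throws IOException {
        Path temp = Files.createTempFile(file.toAbsolutePath().getParent(), "edit", ".tmp");
        try (BufferedReader in = Files.newBufferedReader(file, StandardCharsets.UTF_8);
             BufferedWriter out = Files.newBufferedWriter(temp, StandardCharsets.UTF_8)) {
            String line;
            int current = 0;
            while ((line = in.readLine()) != null) {
                current++;
                if (current != lineNumber) { // skip only the line being deleted
                    out.write(line);
                    out.newLine();
                }
            }
        }
        Files.move(temp, file, StandardCopyOption.REPLACE_EXISTING);
    }
}
```

Copying to a temporary file and swapping it in keeps the original intact until the new content has been written completely.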
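While writing a byte array to a file is a common task, writing a 32-byte array to a given file just once is just not something you are going to find in java.io :)
To get started, would the method and comments below look reasonable to you? It is only a quick sketch, here using RandomAccessFile, and I bet someone here could polish it quickly.
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

public static void writeFauxVirusSignature(File file, byte[] bytes, long offset) throws IOException {
    // open file for reading and writing; try-with-resources closes it, even on error
    try (RandomAccessFile raf = new RandomAccessFile(file, "rw")) {
        raf.seek(offset); // move to offset
        raf.write(bytes); // write bytes, overwriting whatever was stored there
    }
}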
Questions:
How big could the potential target files be?
Do you need performance?
I ask because clean, easy-to-read code would use the Apache Commons libraries, but large file writes in a performance-sensitive environment will necessitate using the java.nio libraries.