Reading in from text file character by character - java

In Java, is there a way to read a text file one character at a time, rather than String by String? This is for an extremely basic lexical analyzer, so you can understand why I'd want such a method. Thank you.

Here's sample code for reading and writing one character at a time:
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;

public class CopyCharacters {
public static void main(String[] args) throws IOException {
FileReader inputStream = null;
FileWriter outputStream = null;
try {
inputStream = new FileReader("xanadu.txt");
outputStream = new FileWriter("characteroutput.txt");
int c;
while ((c = inputStream.read()) != -1) {
outputStream.write(c);
}
} finally {
if (inputStream != null) {
inputStream.close();
}
if (outputStream != null) {
outputStream.close();
}
}
}
}
Note, this answer was updated to copy the sample code from the Ref link, but I see this is essentially the same answer given below.
ref:
http://download.oracle.com/javase/tutorial/essential/io/charstreams.html

You can use the read method of the InputStreamReader class, which reads one character from the stream and returns -1 when it reaches the end of the stream:
public static void processFile(File file) throws IOException {
try (InputStream in = new FileInputStream(file);
Reader reader = new InputStreamReader(in)) {
int c;
while ((c = reader.read()) != -1) {
processChar((char) c); // this method will do whatever you want
}
}
}
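One caveat with the snippet above: an InputStreamReader created without a charset argument decodes with the platform default encoding. Here is a minimal sketch that names the charset explicitly, assuming the file is UTF-8. The class name CharByChar and the IntConsumer handler are made up for illustration and are not part of the original answer.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.IntConsumer;

class CharByChar {
    // Feeds every char from the reader to the handler; read() returns -1 at end of input.
    static void forEachChar(Reader reader, IntConsumer handler) throws IOException {
        int c;
        while ((c = reader.read()) != -1) {
            handler.accept(c);
        }
    }

    // Opens the file as UTF-8 text (an assumption about the input) and walks it char by char.
    static void processFile(Path file, IntConsumer handler) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            forEachChar(reader, handler);
        }
    }

    public static void main(String[] args) throws IOException {
        StringBuilder seen = new StringBuilder();
        forEachChar(new StringReader("a+b"), c -> seen.append((char) c).append(' '));
        System.out.println(seen.toString().trim()); // prints: a + b
    }
}
```

Because forEachChar takes any Reader, the same loop works for files, strings and sockets alike.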

You can read the whole file into memory as a String (if it is not too big) and iterate over the String character by character.
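A rough sketch of that idea, assuming the file is small and UTF-8 encoded. The names WholeFileChars, slurp and countLetters are invented for this example, and the letter counting only stands in for whatever per-character work you need.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class WholeFileChars {
    // Loads the entire file into one String; only sensible for files that fit in memory.
    static String slurp(Path file) throws IOException {
        return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
    }

    // Walks the String by index and counts the chars that are letters.
    static int countLetters(String text) {
        int letters = 0;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetter(c)) {
                letters++;
            }
        }
        return letters;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("sample", ".txt");
        Files.write(tmp, "ab 12".getBytes(StandardCharsets.UTF_8));
        System.out.println(countLetters(slurp(tmp))); // prints: 2
        Files.delete(tmp);
    }
}
```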

There are several possible solutions. Generally you can use any Reader from java.io package for reading characters, e.g.:
// Read from file
BufferedReader reader = new BufferedReader(new FileReader("file.txt"));
// Read from string
BufferedReader reader = new BufferedReader(new StringReader("Some text"));
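Since the question is about a basic lexical analyzer, here is one possible sketch of a tokenizer built on a Reader. It wraps the source in a PushbackReader so that the character that ends a word can be put back for the next token. The class name TinyLexer and the token rules (runs of letters or digits, single-character symbols, whitespace skipped) are assumptions made for illustration.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

class TinyLexer {
    // Splits input into runs of letters/digits and single-character symbols, skipping whitespace.
    static List<String> tokenize(Reader source) throws IOException {
        List<String> tokens = new ArrayList<>();
        PushbackReader in = new PushbackReader(source); // one char of pushback by default
        int c;
        while ((c = in.read()) != -1) {
            if (Character.isWhitespace(c)) {
                continue;
            }
            if (Character.isLetterOrDigit(c)) {
                StringBuilder word = new StringBuilder();
                do {
                    word.append((char) c);
                    c = in.read();
                } while (c != -1 && Character.isLetterOrDigit(c));
                if (c != -1) {
                    in.unread(c); // the char that ended the word belongs to the next token
                }
                tokens.add(word.toString());
            } else {
                tokens.add(String.valueOf((char) c));
            }
        }
        return tokens;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(tokenize(new StringReader("x1 = foo+42;")));
        // prints: [x1, =, foo, +, 42, ;]
    }
}
```

To lex a file instead of a string, pass a BufferedReader around a FileReader as in the lines above.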

Related

Filter an InputStream line-by-line

I am retrieving large gzipped files from Amazon S3. I would like to be able to transform each line of these files on-the-fly and upload the output to another S3 bucket.
The upload API takes an InputStream as input.
S3Object s3object = s3.fetch(bucket, key);
InputStream is = new GZIPInputStream(s3object.getObjectContent());
// . . . ?
s3.putObject(new PutObjectRequest(bucket, key, is, metadata));
I believe that the most efficient way of doing this is to create my own custom input stream which transforms the original input stream into another input stream. I am not very familiar with this approach and curious to find out more.
The basic idea is as follows.
It's not terribly efficient but should get the job done.
public class MyInputStream extends InputStream {
private final BufferedReader input;
private final Charset encoding = StandardCharsets.UTF_8;
private ByteArrayInputStream buffer;
public MyInputStream(InputStream is) throws IOException {
input = new BufferedReader(new InputStreamReader(is, this.encoding));
nextLine();
}
@Override
public int read() throws IOException {
if (buffer == null) {
return -1;
}
int ch = buffer.read();
if (ch == -1) {
if (!nextLine()) {
return -1;
}
return read();
}
return ch;
}
private boolean nextLine() throws IOException {
String line;
while ((line = input.readLine()) != null) {
line = filterLine(line);
if (line != null) {
line += '\n';
buffer = new ByteArrayInputStream(line.getBytes(encoding));
return true;
}
}
return false;
}
@Override
public void close() throws IOException {
input.close();
}
private String filterLine(String line) {
// Filter the line here ... return null to skip the line
// For example:
return line.replace("ABC", "XYZ");
}
}
nextLine() pre-fills the line buffer with a (filtered) line. Then read() (called by the upload job) fetches bytes from the buffer one-by-one and calls nextLine() again to load the next line.
Use as:
s3.putObject(new PutObjectRequest(bucket, key, new MyInputStream(is), metadata));
A performance improvement could be to also implement the int read(byte[] b, int off, int len) method (if cpu use is high) and use a BufferedInputStream in case the S3 client doesn't internally use a buffer (I don't know).
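The bulk-read improvement mentioned above could look roughly like this. It is a self-contained variant, not the answer's exact class: the name LineFilterInputStream and the UnaryOperator-based filter are assumptions, and readAllBytes in the demo needs Java 9 or later. As in the original, every emitted line ends in "\n", regardless of the source's line terminators.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.function.UnaryOperator;

class LineFilterInputStream extends InputStream {
    private final BufferedReader input;
    private final UnaryOperator<String> filter; // returns null to drop a line
    private byte[] current = new byte[0];       // bytes of the current filtered line
    private int pos = 0;

    LineFilterInputStream(InputStream source, UnaryOperator<String> filter) {
        this.input = new BufferedReader(new InputStreamReader(source, StandardCharsets.UTF_8));
        this.filter = filter;
    }

    // Loads the next surviving line into 'current'; returns false at end of input.
    private boolean fill() throws IOException {
        String line;
        while ((line = input.readLine()) != null) {
            String out = filter.apply(line);
            if (out != null) {
                current = (out + "\n").getBytes(StandardCharsets.UTF_8);
                pos = 0;
                return true;
            }
        }
        return false;
    }

    @Override
    public int read() throws IOException {
        while (pos >= current.length) {
            if (!fill()) return -1;
        }
        return current[pos++] & 0xFF;
    }

    // Bulk read: hands out as much of the current line as fits in one arraycopy.
    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        if (len == 0) return 0;
        while (pos >= current.length) {
            if (!fill()) return -1;
        }
        int n = Math.min(len, current.length - pos);
        System.arraycopy(current, pos, b, off, n);
        pos += n;
        return n;
    }

    @Override
    public void close() throws IOException {
        input.close();
    }

    public static void main(String[] args) throws IOException {
        InputStream raw = new ByteArrayInputStream(
                "ABC one\nskip me\nABC two\n".getBytes(StandardCharsets.UTF_8));
        try (InputStream in = new LineFilterInputStream(raw,
                line -> line.startsWith("skip") ? null : line.replace("ABC", "XYZ"))) {
            System.out.print(new String(in.readAllBytes(), StandardCharsets.UTF_8));
        }
    }
}
```

A caller that reads into a large buffer gets one line per call here; returning fewer bytes than requested is allowed by the InputStream contract.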
Alternatively, new BufferedReader(new InputStreamReader(is)).lines() gives you the lines as a Stream<String>.

InputStream returns unexpected -1/empty

I seem to be hitting a constant unexpected end of my file. My file contains first a couple of strings, then byte data.
The file contains a few separated strings, which my code reads correctly.
However when I begin to read the bytes, it returns nothing. I am pretty sure it has to do with me using the Readers. Does the BufferedReader read the entire stream? If so, how can I solve this?
I have checked the file, and it does contain plenty of data after the strings.
InputStreamReader is = new InputStreamReader(in);
BufferedReader br = new BufferedReader(is);
String line;
{
line = br.readLine();
String split[] = line.split(" ");
if (!split[0].equals("#binvox")) {
ErrorHandler.log("Not a binvox file");
return false;
}
ErrorHandler.log("Binvox version: " + split[1]);
}
ByteArrayOutputStream buffer = new ByteArrayOutputStream();
int nRead, cnt = 0;
byte[] data = new byte[16384];
while ((nRead = in.read(data, 0, data.length)) != -1) {
buffer.write(data, 0, nRead);
cnt += nRead;
}
buffer.flush();
// cnt is always 0
The binvox format is as follows:
#binvox 1
dim 64 40 32
translate -3 0 -2
scale 6.434
data
[byte data]
I'm basically trying to convert the following C code to Java:
http://www.cs.princeton.edu/~min/binvox/read_binvox.html
For reading the whole String you should do this:
ArrayList<String> lines = new ArrayList<String>();
while ((line = br.readLine()) != null) {
lines.add(line);
}
and then you may do a cycle to split each line, or just do what you have to do during the cycle.
As icza has already written, you can't wrap an InputStream in a BufferedReader and then use both. The BufferedReader reads as much from the InputStream as it wants, and that data is then no longer available from the InputStream.
You have several ways to fix it:
Don't use any Reader. Read the bytes yourself from an InputStream and call new String(bytes) on it.
Store your data encoded (e.g. Base64). Encoded data can be read from a Reader. I would recommend this solution. That'll look like that:
public byte[] readBytes (BufferedReader in) throws IOException
{
String base64 = in.readLine(); // Note that the basic Base64 representation never contains \n
byte[] data = Base64.getDecoder().decode(base64);
return data;
}
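The first option above (no Reader at all) can be sketched as follows: read the header lines byte by byte from the same InputStream that later delivers the binary data, so nothing is buffered away behind your back. The class and method names are hypothetical, and the sketch assumes the header is plain ASCII terminated by '\n'.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class HeaderThenBytes {
    // Reads bytes up to and including the next '\n' and returns them as ASCII text
    // without the terminator (any '\r' is dropped). Returns null at end of stream.
    static String readAsciiLine(InputStream in) throws IOException {
        int b = in.read();
        if (b == -1) {
            return null;
        }
        ByteArrayOutputStream line = new ByteArrayOutputStream();
        while (b != -1 && b != '\n') {
            if (b != '\r') {
                line.write(b);
            }
            b = in.read();
        }
        return new String(line.toByteArray(), StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) throws IOException {
        byte[] file = {'#', 'b', 'i', 'n', 'v', 'o', 'x', ' ', '1', '\n',
                       'd', 'a', 't', 'a', '\n', 1, 2, (byte) 0xFF};
        InputStream in = new ByteArrayInputStream(file);
        System.out.println(readAsciiLine(in)); // prints: #binvox 1
        System.out.println(readAsciiLine(in)); // prints: data
        System.out.println(in.available());    // prints: 3 (the payload is still unread)
    }
}
```

For a real file, wrap the FileInputStream in a single BufferedInputStream and use that one object for both the header and the payload; single-byte reads on an unbuffered file stream are slow.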
You can't wrap an InputStream in a BufferedReader and use both.
As its name hints, BufferedReader might read ahead and buffer data from the underlying InputStream which then will not be available when reading from the underlying InputStream directly.
Suggested solution is not to mix text and binary data in one file. They should be stored in 2 separate files and then they can be read separately. If the remaining data is not binary, then you should not read them via InputStream but via your wrapper BufferedReader just as you read the first lines.
I recommend creating a BinvoxDetectorStream that pre-reads some bytes:
public class BinvoxDetectorStream extends InputStream {
private InputStream orig;
private byte[] buffer = new byte[4096];
private int buflen;
private int bufpos = 0;
public BinvoxDetectorStream(InputStream in) throws IOException {
this.orig = new BufferedInputStream(in);
this.buflen = orig.read(this.buffer, 0, this.buffer.length);
}
public BinvoxInfo getBinvoxInfo() throws IOException {
// creating a reader for the buffered bytes, to read a line, and compare the header
ByteArrayInputStream bais = new ByteArrayInputStream(buffer);
BufferedReader rdr = new BufferedReader(new InputStreamReader(bais));
String line = rdr.readLine();
String split[] = line.split(" ");
if (split[0].equals("#binvox")) {
BinvoxInfo info = new BinvoxInfo();
info.version = split[1];
split = rdr.readLine().split(" ");
[... parse all properties ...]
// seek for "data\r\n" in the buffered data
while(bufpos < buflen && !(bufpos>=6 &&
buffer[bufpos-6] == 'd' &&
buffer[bufpos-5] == 'a' &&
buffer[bufpos-4] == 't' &&
buffer[bufpos-3] == 'a' &&
buffer[bufpos-2] == '\r' &&
buffer[bufpos-1] == '\n') ) {
bufpos++;
}
return info;
}
return null;
}
@Override
public int read() throws IOException {
if(bufpos < buflen) {
return buffer[bufpos++] & 0xFF; // mask so bytes >= 0x80 are not returned as negative values
}
return orig.read();
}
}
Then, you can detect the Binvox version without touching the original stream:
BinvoxDetectorStream bds = new BinvoxDetectorStream(in);
BinvoxInfo info = bds.getBinvoxInfo();
if (info == null) {
return false;
}
...
[moving bytes in the usual way, but using bds!!! ]
This way we preserve the original bytes in bds, so we'll be able to copy it later.
I saw someone else's code that solved exactly this.
He/she used DataInputStream, which can do a readLine (although deprecated) and readByte.

Best way to read an input stream to a buffer

There are two methods for reading an input stream into a buffer. Can someone help me understand which is the better method and why? And in which situations should we use each method?
Reading line by line and appending it to the buffer.
Eg:
public String fileToBuffer(InputStream is, StringBuffer strBuffer) throws IOException{
StringBuffer buffer = strBuffer;
InputStreamReader isr = null;
try {
isr = new InputStreamReader(is);
BufferedReader br = new BufferedReader(isr);
String line = null;
while ((line = br.readLine()) != null) {
buffer.append(line + "\n");
}
} finally {
if (is != null) {
is.close();
}
if (isr != null) {
isr.close();
}
}
return buffer.toString();
}
Reading up to the buffer size, i.e. 1024 chars at a time, into a char array.
Eg:
InputStreamReader isr = new InputStreamReader(is);
final int bufferSize = 1024;
char[] buffer = new char[bufferSize];
StringBuffer strBuffer = new StringBuffer();
/* read the base script into string buffer */
try {
while (true) {
int read = isr.read(buffer, 0, bufferSize);
if (read == -1) {
break;
}
strBuffer.append(buffer, 0, read);
}
} catch (IOException e) {
}
Consider
public String fileToBuffer(InputStream is, StringBuffer strBuffer) throws IOException {
StringBuilder sb = new StringBuilder(strBuffer);
try (BufferedReader rdr = new BufferedReader(new InputStreamReader(is))) {
for (int c; (c = rdr.read()) != -1;) {
sb.append((char) c);
}
}
return sb.toString();
}
Depends on the purpose.
To work with text files, read lines (if you need them).
To work with raw binary data, use chunks of bytes.
In your examples, chunks of bytes are more robust.
What if a line is too long and breaks some of the intermediate objects?
If your file is binary, do you know how big a line will be?
It may be the size of the whole file.
Trying to "swallow" too big a String may cause an OutOfMemoryError.
With 1024 bytes it (ok, almost) never happens.
Chunking by 1024 bytes may take longer, but it's more reliable.
Using 'readLine' isn't so neat. The asker's method 2 is quite standard, but the below method is unique (and likely better):
//read the whole inputstream and put into a string
public String inputstream2str(InputStream stream) {
Scanner s = new Scanner(stream).useDelimiter("\\A");
return s.hasNext()? s.next():"";
}
From a String you can convert to byte array or whatever buffer you want.

How to clone an InputStream?

I have an InputStream that I pass to a method to do some processing. I want to use the same InputStream in another method, but after the first processing the InputStream appears to have been closed inside the method.
How can I clone the InputStream to send to the method that closes it? Is there another solution?
EDIT: the method that closes the InputStream is an external method from a library. I don't have control over whether it closes the stream or not.
private String getContent(HttpURLConnection con) {
InputStream content = null;
String charset = "";
try {
content = con.getInputStream();
CloseShieldInputStream csContent = new CloseShieldInputStream(content);
charset = getCharset(csContent);
return IOUtils.toString(content,charset);
} catch (Exception e) {
System.out.println("Error downloading page: " + e);
return null;
}
}
private String getCharset(InputStream content) {
try {
Source parser = new Source(content);
return parser.getEncoding();
} catch (Exception e) {
System.out.println("Error determining charset: " + e);
return "UTF-8";
}
}
If all you want to do is read the same information more than once, and the input data is small enough to fit into memory, you can copy the data from your InputStream to a ByteArrayOutputStream.
Then you can obtain the associated array of bytes and open as many "cloned" ByteArrayInputStreams as you like.
ByteArrayOutputStream baos = new ByteArrayOutputStream();
// Code simulating the copy
// You could alternatively use NIO
// And please, unlike me, do something about the Exceptions :D
byte[] buffer = new byte[1024];
int len;
while ((len = input.read(buffer)) > -1 ) {
baos.write(buffer, 0, len);
}
baos.flush();
// Open new InputStreams using recorded bytes
// Can be repeated as many times as you wish
InputStream is1 = new ByteArrayInputStream(baos.toByteArray());
InputStream is2 = new ByteArrayInputStream(baos.toByteArray());
But if you really need to keep the original stream open to receive new data, then you will need to track the external call to close(). You will need to prevent close() from being called somehow.
UPDATE (2019):
Since Java 9 the middle bits can be replaced with InputStream.transferTo:
ByteArrayOutputStream baos = new ByteArrayOutputStream();
input.transferTo(baos);
InputStream firstClone = new ByteArrayInputStream(baos.toByteArray());
InputStream secondClone = new ByteArrayInputStream(baos.toByteArray());
You want to use Apache's CloseShieldInputStream:
This is a wrapper that will prevent the stream from being closed. You'd do something like this.
InputStream is = null;
is = getStream(); //obtain the stream
CloseShieldInputStream csis = new CloseShieldInputStream(is);
// call the bad function that does things it shouldn't
badFunction(csis);
// happiness follows: do something with the original input stream
is.read();
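If you would rather not pull in Commons IO for this, the core idea can be approximated with the JDK's FilterInputStream by overriding close() to do nothing. This is only a sketch of the idea, not Apache's implementation; the class name NonClosingInputStream is made up.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

class NonClosingInputStream extends FilterInputStream {
    NonClosingInputStream(InputStream in) {
        super(in);
    }

    // Deliberately a no-op: the wrapped stream stays open for its real owner.
    @Override
    public void close() {
    }

    public static void main(String[] args) throws IOException {
        final boolean[] closed = {false};
        // A source that records whether anyone actually closed it.
        InputStream original = new ByteArrayInputStream(new byte[] {10, 20, 30}) {
            @Override
            public void close() throws IOException {
                closed[0] = true;
                super.close();
            }
        };
        try (InputStream shielded = new NonClosingInputStream(original)) {
            System.out.println(shielded.read()); // prints: 10 (the callee reads, then "closes")
        }
        System.out.println(original.read());     // prints: 20
        System.out.println(closed[0]);           // prints: false
    }
}
```

Note that shielding only prevents the close. Whatever the callee has already read is consumed, so this does not give you a second copy of the data.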
You can't clone it, and how you are going to solve your problem depends on what the source of the data is.
One solution is to read all data from the InputStream into a byte array, and then create a ByteArrayInputStream around that byte array, and pass that input stream into your method.
Edit 1:
That is, if the other method also needs to read the same data. I.e you want to "reset" the stream.
If the data read from the stream is large, I would recommend using a TeeInputStream from Apache Commons IO. That way you can essentially replicate the input and pass a t'd pipe as your clone.
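To show what "tee" means here without depending on Commons IO, this is a minimal JDK-only sketch of the same idea: every byte the consumer reads is also written to a second OutputStream. It is not the Apache class. The name CopyingInputStream is hypothetical, and the sketch ignores skip() and disables mark/reset to keep the copy consistent.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

class CopyingInputStream extends FilterInputStream {
    private final OutputStream branch; // receives a copy of everything that is read

    CopyingInputStream(InputStream in, OutputStream branch) {
        super(in);
        this.branch = branch;
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b != -1) {
            branch.write(b);
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        if (n > 0) {
            branch.write(buf, off, n);
        }
        return n;
    }

    // Re-reading after reset() would write the same bytes to the branch twice.
    @Override
    public boolean markSupported() {
        return false;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream kept = new ByteArrayOutputStream();
        InputStream source = new ByteArrayInputStream("payload".getBytes(StandardCharsets.UTF_8));
        try (InputStream in = new CopyingInputStream(source, kept)) {
            while (in.read() != -1) {
                // the first consumer does its own processing here
            }
        }
        System.out.println(new String(kept.toByteArray(), StandardCharsets.UTF_8)); // prints: payload
    }
}
```

For large data, the branch would typically be a file or a piped stream rather than an in-memory buffer.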
This might not work in all situations, but here is what I did: I extended the FilterInputStream class and do the required processing of the bytes as the external lib reads the data.
public class StreamBytesWithExtraProcessingInputStream extends FilterInputStream {
protected StreamBytesWithExtraProcessingInputStream(InputStream in) {
super(in);
}
@Override
public int read() throws IOException {
int readByte = super.read();
processByte(readByte);
return readByte;
}
@Override
public int read(byte[] buffer, int offset, int count) throws IOException {
int readBytes = super.read(buffer, offset, count);
processBytes(buffer, offset, readBytes);
return readBytes;
}
private void processBytes(byte[] buffer, int offset, int readBytes) {
for (int i = 0; i < readBytes; i++) {
processByte(buffer[i + offset]);
}
}
private void processByte(int readByte) {
// TODO do processing here
}
}
Then you simply pass an instance of StreamBytesWithExtraProcessingInputStream where you would have passed in the input stream, with the original input stream as the constructor parameter.
It should be noted that this works byte for byte, so don't use this if high performance is a requirement.
UPD.
Check the comment before. It isn't exactly what was asked.
If you are using Apache Commons IO, you can copy streams using IOUtils.
You can use the following code:
InputStream copy = IOUtils.toBufferedInputStream(toCopy);
Here is the full example suitable for your situation:
public void cloneStream() throws IOException{
InputStream toCopy=IOUtils.toInputStream("aaa");
InputStream dest= null;
dest=IOUtils.toBufferedInputStream(toCopy);
toCopy.close();
String result = new String(IOUtils.toByteArray(dest));
System.out.println(result);
}
This code requires some dependencies:
MAVEN
<dependency>
<groupId>commons-io</groupId>
<artifactId>commons-io</artifactId>
<version>2.4</version>
</dependency>
GRADLE
'commons-io:commons-io:2.4'
Here is the DOC reference for this method:
Fetches entire contents of an InputStream and represent same data as
result InputStream. This method is useful where,
Source InputStream is slow. It has network resources associated, so we
cannot keep it open for long time. It has network timeout associated.
You can find more about IOUtils here:
http://commons.apache.org/proper/commons-io/javadocs/api-2.4/org/apache/commons/io/IOUtils.html#toBufferedInputStream(java.io.InputStream)
Below is the solution with Kotlin.
You can copy your InputStream into ByteArray
val inputStream = ...
val byteOutputStream = ByteArrayOutputStream()
inputStream.use { input ->
byteOutputStream.use { output ->
input.copyTo(output)
}
}
val byteInputStream = ByteArrayInputStream(byteOutputStream.toByteArray())
If you need to read the byteInputStream multiple times, call byteInputStream.reset() before reading again.
https://code.luasoftware.com/tutorials/kotlin/how-to-clone-inputstream/
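The same reset() behavior holds in Java, since Kotlin is using java.io.ByteArrayInputStream here. A small sketch (the class name RereadBytes and the helper drain are invented for this example):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class RereadBytes {
    // Reads everything that is left in the stream and decodes it as UTF-8.
    static String drain(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[1024];
        int n;
        while ((n = in.read(chunk)) != -1) {
            out.write(chunk, 0, n);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream("hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(drain(in));           // prints: hello
        System.out.println(drain(in).isEmpty()); // prints: true (nothing left)
        in.reset(); // no mark was set, so this rewinds to the start of the array
        System.out.println(drain(in));           // prints: hello
    }
}
```

This works because the data lives in memory. Most other streams either do not support reset() or only support it after an explicit mark().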
Cloning an input stream might not be a good idea, because this requires deep knowledge about the details of the input stream being cloned. A workaround for this is to create a new input stream that reads from the same source again.
So using some Java 8 features this would look like this:
public class Foo {
private Supplier<InputStream> inputStreamSupplier;
public void bar() {
procesDataThisWay(inputStreamSupplier.get());
procesDataTheOtherWay(inputStreamSupplier.get());
}
private void procesDataThisWay(InputStream in) {
// ...
}
private void procesDataTheOtherWay(InputStream in) {
// ...
}
}
This method has the positive effect that it will reuse code that is already in place - the creation of the input stream encapsulated in inputStreamSupplier. And there is no need to maintain a second code path for the cloning of the stream.
On the other hand, if reading from the stream is expensive (because it's done over a low-bandwidth connection), then this method will double the cost. This could be circumvented by using a specific supplier that stores the stream content locally first and provides an InputStream for that now-local resource.
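One way such a caching supplier might look, assuming the content fits in memory (a temp file would be the analogous choice for larger data). The class name CachedStreams and the method cache are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.function.Supplier;

class CachedStreams {
    // Drains the expensive source once and returns a supplier of independent in-memory streams.
    static Supplier<InputStream> cache(InputStream source) throws IOException {
        ByteArrayOutputStream copy = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = source.read(chunk)) != -1) {
            copy.write(chunk, 0, n);
        }
        byte[] bytes = copy.toByteArray();
        return () -> new ByteArrayInputStream(bytes);
    }

    public static void main(String[] args) throws IOException {
        InputStream once = new ByteArrayInputStream("abc".getBytes(StandardCharsets.UTF_8));
        Supplier<InputStream> streams = cache(once);
        System.out.println(streams.get().read()); // prints: 97
        System.out.println(streams.get().read()); // prints: 97 (a fresh stream each time)
    }
}
```

The expensive read happens exactly once, inside cache(). Each get() afterwards only wraps the same byte array, so the streams are cheap and independent of each other.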
The class below should do the trick. Just create an instance, call the "multiply" method, and provide the source input stream and the amount of duplicates you need.
Important: you must consume all cloned streams simultaneously in separate threads.
package foo.bar;
import java.io.IOException;
import java.io.InputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
public class InputStreamMultiplier {
protected static final int BUFFER_SIZE = 1024;
private ExecutorService executorService = Executors.newCachedThreadPool();
public InputStream[] multiply(final InputStream source, int count) throws IOException {
PipedInputStream[] ins = new PipedInputStream[count];
final PipedOutputStream[] outs = new PipedOutputStream[count];
for (int i = 0; i < count; i++)
{
ins[i] = new PipedInputStream();
outs[i] = new PipedOutputStream(ins[i]);
}
executorService.execute(new Runnable() {
public void run() {
try {
copy(source, outs);
} catch (IOException e) {
e.printStackTrace();
}
}
});
return ins;
}
protected void copy(final InputStream source, final PipedOutputStream[] outs) throws IOException {
byte[] buffer = new byte[BUFFER_SIZE];
int n = 0;
try {
while (-1 != (n = source.read(buffer))) {
//write each chunk to all output streams
for (PipedOutputStream out : outs) {
out.write(buffer, 0, n);
}
}
} finally {
//close all output streams
for (PipedOutputStream out : outs) {
try {
out.close();
} catch (IOException e) {
e.printStackTrace();
}
}
}
}
}
Extending @Anthony Accioly's answer with an example.
InputStream: Clones the bytes-Stream and provides number of copies as a List Collection.
public static List<InputStream> multiplyBytes(InputStream input, int cloneCount) throws IOException {
List<InputStream> copies = new ArrayList<InputStream>();
ByteArrayOutputStream baos = new ByteArrayOutputStream();
copy(input, baos);
for (int i = 0; i < cloneCount; i++) {
copies.add(new ByteArrayInputStream(baos.toByteArray()));
}
return copies;
}
// IOException - If reading the Reader or Writing into the Writer goes wrong.
public static void copy(Reader in, Writer out) throws IOException {
try {
char[] buffer = new char[1024];
int nrOfBytes = -1;
while ((nrOfBytes = in.read(buffer)) != -1) {
out.write(buffer, 0, nrOfBytes);
}
out.flush();
} finally {
close(in);
close(out);
}
}
Reader: Clones the chars-Stream and provides number of copies as a List Collection.
public static List<Reader> multiplyChars(Reader reader, int cloneCOunt) throws IOException {
List<Reader> copies = new ArrayList<Reader>();
BufferedReader bufferedInput = new BufferedReader(reader);
StringBuffer buffer = new StringBuffer();
String delimiter = System.getProperty("line.separator");
String line;
while ((line = bufferedInput.readLine()) != null) {
if (!buffer.toString().equals(""))
buffer.append(delimiter);
buffer.append(line);
}
close(bufferedInput);
for (int i = 0; i < cloneCOunt; i++) {
copies.add(new StringReader(buffer.toString()));
}
return copies;
}
public static void copy(InputStream in, OutputStream out) throws IOException {
try {
byte[] buffer = new byte[1024];
int nrOfBytes = -1;
while ((nrOfBytes = in.read(buffer)) != -1) {
out.write(buffer, 0, nrOfBytes);
}
out.flush();
} finally {
close(in);
close(out);
}
}
Full Example:
public class SampleTest {
public static void main(String[] args) throws IOException {
String filePath = "C:/Yash/StackoverflowSSL.cer";
InputStream fileStream = new FileInputStream(new File(filePath) );
List<InputStream> bytesCopy = multiplyBytes(fileStream, 3);
for (Iterator<InputStream> iterator = bytesCopy.iterator(); iterator.hasNext();) {
InputStream inputStream = (InputStream) iterator.next();
System.out.println("Byte Stream:"+ inputStream.available()); // Byte Stream:1784
}
printInputStream(bytesCopy.get(0));
//java.sql.Clob clob = ((Clob) getValue(sql)); - clob.getCharacterStream();
Reader stringReader = new StringReader("StringReader that reads Characters from the specified string.");
List<Reader> charsCopy = multiplyChars(stringReader, 3);
for (Iterator<Reader> iterator = charsCopy.iterator(); iterator.hasNext();) {
Reader reader = (Reader) iterator.next();
System.out.println("Chars Stream:"+reader.read()); // Chars Stream:83
}
printReader(charsCopy.get(0));
}
// Reader, InputStream - Prints the contents of the reader to System.out.
public static void printReader(Reader reader) throws IOException {
BufferedReader br = new BufferedReader(reader);
String s;
while ((s = br.readLine()) != null) {
System.out.println(s);
}
}
public static void printInputStream(InputStream inputStream) throws IOException {
printReader(new InputStreamReader(inputStream));
}
// Closes an opened resource, catching any exceptions.
public static void close(Closeable resource) {
if (resource != null) {
try {
resource.close();
} catch (IOException e) {
System.err.println(e);
}
}
}
}

How do I convert an InputStream to a String in Java?

Suppose I have an InputStream that contains text data, and I want to convert this to a String (for example, so I can write the contents of the stream to a log file).
What is the easiest way to take the InputStream and convert it to a String?
public String convertStreamToString(InputStream is) {
// ???
}
If you want to do it simply and reliably, I suggest using the Apache Jakarta Commons IO library IOUtils.toString(java.io.InputStream, java.lang.String) method.
This is my version,
public static String readString(InputStream inputStream) throws IOException {
ByteArrayOutputStream into = new ByteArrayOutputStream();
byte[] buf = new byte[4096];
for (int n; 0 < (n = inputStream.read(buf));) {
into.write(buf, 0, n);
}
into.close();
return new String(into.toByteArray(), "UTF-8"); // Or whatever encoding
}
String text = new Scanner(inputStream).useDelimiter("\\A").next();
The only tricky is to remember the regex \A, which matches the beginning of input. This effectively tells Scanner to tokenize the entire stream, from beginning to (illogical) next beginning...
- from the Oracle Blog
Since Java 9, InputStream.readAllBytes() makes this even shorter:
String toString(InputStream inputStream) throws IOException {
return new String(inputStream.readAllBytes(), StandardCharsets.UTF_8); // Or whatever encoding
}
Note: InputStream is not closed in this example.
You can use a BufferedReader to read the stream into a StringBuilder in a loop, and then get the full contents from the StringBuilder:
public String convertStreamToString(InputStream is) {
BufferedReader reader = new BufferedReader(new InputStreamReader(is));
StringBuilder sb = new StringBuilder();
String line = null;
try {
while ((line = reader.readLine()) != null) {
sb.append(line + "\n");
}
} catch (IOException e) {
e.printStackTrace();
} finally {
try {
is.close();
} catch (IOException e) {
e.printStackTrace();
}
}
return sb.toString();
}
Full disclosure: This is a solution I found on KodeJava.org. I am posting it here for comments and critique.
A nice way to do this is using Apache commons IOUtils
String text = IOUtils.toString(inputStream, StandardCharsets.UTF_8); // second argument is the encoding
