Identify the current character being read in the read() method of FileInputStream? - java

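FileInputStream in = new FileInputStream("filetoreadfrom.txt");
int c; // each byte read, as an int from 0 to 255 (or -1 at end of file)
while ((c = in.read()) != -1) {
    Integer cobj = new Integer(c);
    System.out.println("The Current data being read is :" + cobj.byteValue());
    out.write(c); // out is an OutputStream declared elsewhere
}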
The println calls print an int value representing the byte being read. But I want to print the exact character being read. Is there a way to do it?

An InputStream contains bytes, not characters. What does it even mean to talk about the "character" when you're in the middle of an MP3 file, for example?
If you want to read text data, you need a Reader, e.g. an InputStreamReader wrapped around an InputStream with a specific encoding.
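A minimal sketch of that advice, assuming the file is UTF-8 (CharReadDemo and readChars are illustrative names, and main uses in-memory bytes instead of filetoreadfrom.txt so it runs anywhere). The Reader turns the two UTF-8 bytes of "é" into one char, which a byte-level read() cannot do:

```java
import java.io.*;
import java.nio.charset.StandardCharsets;

public class CharReadDemo {
    // Decodes the byte stream as UTF-8 and prints each character as it is read.
    static String readChars(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            int c;
            while ((c = reader.read()) != -1) {
                char ch = (char) c; // Reader.read() returns one UTF-16 code unit as an int
                System.out.println("The current character being read is: " + ch);
                sb.append(ch);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // "é" is two bytes in UTF-8 but comes back as a single character
        byte[] utf8 = "h\u00e9llo".getBytes(StandardCharsets.UTF_8);
        System.out.println(readChars(new ByteArrayInputStream(utf8)).length()); // 5
    }
}
```

For the real file, pass new FileInputStream("filetoreadfrom.txt") to readChars, using whatever charset the file was actually written in.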

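Try a type conversion: (char) c. Casting cobj.byteValue() instead sign-extends bytes above 127, so they come out as characters in the \uFF80-\uFFFF range. Either way, this only reproduces the file's characters for single-byte encodings whose byte values match the Unicode code points (ASCII, ISO-8859-1).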
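The difference between the two casts shows up as soon as a byte is above 127. A small sketch (ByteToCharDemo and its methods are hypothetical names) using 0xE9, which is "é" in ISO-8859-1:

```java
public class ByteToCharDemo {
    // c is the int returned by InputStream.read(): 0..255, or -1 at end of stream.
    // Casting it straight to char maps byte value N to code point N,
    // i.e. the ISO-8859-1 interpretation of that byte.
    static char fromReadResult(int c) {
        return (char) c;
    }

    // Going through byteValue() first yields a negative byte for values above 127,
    // which is sign-extended when widened to char.
    static char viaByteValue(int c) {
        return (char) Integer.valueOf(c).byteValue();
    }

    public static void main(String[] args) {
        int c = 0xE9;
        System.out.println((int) fromReadResult(c)); // 233
        System.out.println((int) viaByteValue(c));   // 65513
    }
}
```

Neither cast is correct for UTF-8 or other multi-byte encodings; there you need a Reader, as the previous answer says.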

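It's better to use BufferedReader and InputStreamReader, but you can also use code like this (note that new String(buffer, 0, len) decodes with the platform default charset, and a multi-byte character may be split across two chunks):
BufferedInputStream bis = new BufferedInputStream(new FileInputStream(inputFile));
byte[] buffer = new byte[4096];
int len;
while ((len = bis.read(buffer)) >= 0) {
    String line = new String(buffer, 0, len);
    System.out.print(line); // process the chunk; it is not necessarily a whole line
}
bis.close();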
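For comparison, a sketch of the BufferedReader + InputStreamReader route the answer prefers, assuming UTF-8 input (LineReaderDemo and readLines are illustrative names). Because the InputStreamReader does the decoding, a multi-byte character is never cut in half at a buffer boundary:

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class LineReaderDemo {
    // Wraps the byte stream in an InputStreamReader with an explicit charset,
    // then a BufferedReader for efficient line-by-line reading.
    static List<String> readLines(InputStream in) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line); // readLine() strips the line terminator
            }
        }
        return lines;
    }
}
```

With a file you would call readLines(new FileInputStream(inputFile)).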

Related

Inflate output to file performance improvement

I am using code similar to the code below. Looking around at different implementations, it seems that most people perform the same operations by doing the byte copy. Is there possibly a faster way to inflate data from a file and write the result back out to a file?
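// Compilable version of the same logic. Assumptions: the length prefix is an unsigned
// 16-bit value read with DataInputStream (the original called a custom readUBits(16)),
// and this.file is a File field of the enclosing class.
public boolean unzipToFile(DataInputStream source) {
    try {
        int length = source.readUnsignedShort();
        // Add extra byte to array when Inflater is set to true
        byte[] data = new byte[length];
        source.readFully(data);
        ByteArrayInputStream bin = new ByteArrayInputStream(data);
        InflaterInputStream in = new InflaterInputStream(bin);
        FileOutputStream bout = new FileOutputStream(this.file);
        int b;
        while ((b = in.read()) != -1) { // one byte per call: this is the slow part
            bout.write(b);
        }
        bout.close();
        return true;
    } catch (IOException io) {
        return false;
    }
}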
Copying one byte at a time is always going to be a very slow way to process a file. I suggest you use a buffer of, say, 8 KB instead.
try (FileOutputStream fout = new FileOutputStream(this.file)) {
    byte[] bytes = new byte[8192];
    for (int len; (len = in.read(bytes)) != -1; )
        fout.write(bytes, 0, len); // write only the bytes filled by this read
}
BTW, to make it faster you could avoid copying the data into a byte[] in the first place by using an InputStream that wraps in but reads exactly length bytes.
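One way to do what that last remark suggests: a small wrapper (LimitedInputStream is a hypothetical name) that reports end of stream after length bytes, fed straight into InflaterInputStream and copied with the 8 KB buffer. The sketch assumes the compressed block is zlib-wrapped, which is what the default Inflater expects; raw deflate data would need new Inflater(true) instead.

```java
import java.io.*;
import java.util.zip.Inflater;
import java.util.zip.InflaterInputStream;

// Exposes at most `remaining` bytes of the wrapped stream, so the compressed block
// can be inflated in place without first copying it into a byte[].
public class LimitedInputStream extends FilterInputStream {
    private long remaining;

    public LimitedInputStream(InputStream in, long limit) {
        super(in);
        this.remaining = limit;
    }

    @Override
    public int read() throws IOException {
        if (remaining <= 0) return -1;
        int b = in.read();
        if (b != -1) remaining--;
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        if (remaining <= 0) return -1;
        int n = in.read(buf, off, (int) Math.min(len, remaining));
        if (n > 0) remaining -= n;
        return n;
    }

    // Inflates `length` compressed bytes from `in` into `out` using an 8 KB buffer.
    // The underlying stream is left open and positioned right after the block.
    public static void inflateTo(InputStream in, int length, OutputStream out) throws IOException {
        Inflater inf = new Inflater();
        try {
            InputStream inflater = new InflaterInputStream(new LimitedInputStream(in, length), inf);
            byte[] bytes = new byte[8192];
            for (int len; (len = inflater.read(bytes)) != -1; ) {
                out.write(bytes, 0, len);
            }
        } finally {
            inf.end(); // release the native zlib state without closing `in`
        }
    }
}
```

In the question's setting, in would be the source stream right after the 16-bit length, and out a FileOutputStream opened on this.file.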

InputStream returns unexpected -1/empty

I keep hitting an unexpected end of my file. My file contains first a couple of strings, then byte data.
The file contains a few separated strings, which my code reads correctly.
However, when I begin to read the bytes, it returns nothing. I am pretty sure it has to do with my use of Readers. Does the BufferedReader read the entire stream? If so, how can I solve this?
I have checked the file, and it does contain plenty of data after the strings.
InputStreamReader is = new InputStreamReader(in);
BufferedReader br = new BufferedReader(is);
String line;
{
    line = br.readLine();
    String split[] = line.split(" ");
    if (!split[0].equals("#binvox")) {
        ErrorHandler.log("Not a binvox file");
        return false;
    }
    ErrorHandler.log("Binvox version: " + split[1]);
}
ByteArrayOutputStream buffer = new ByteArrayOutputStream();
int nRead, cnt = 0;
byte[] data = new byte[16384];
while ((nRead = in.read(data, 0, data.length)) != -1) {
    buffer.write(data, 0, nRead);
    cnt += nRead;
}
buffer.flush();
// cnt is always 0
The binvox format is as follows:
#binvox 1
dim 64 40 32
translate -3 0 -2
scale 6.434
data
[byte data]
I'm basically trying to convert the following C code to Java:
http://www.cs.princeton.edu/~min/binvox/read_binvox.html
To read all the lines, do this:
ArrayList<String> lines = new ArrayList<String>();
while ((line = br.readLine()) != null) {
    lines.add(line);
}
Then you can loop over the lines and split each one, or just do what you have to do inside the loop itself.
As icza has already written, you can't create an InputStream and a BufferedReader on top of it and use both. The BufferedReader will read as many bytes from the InputStream as it wants, and then you can no longer get at that data through the InputStream.
You have several ways to fix it:
Don't use any Reader. Read the bytes yourself from an InputStream and call new String(bytes) on it.
Store your data encoded (e.g. Base64). Encoded data can be read from a Reader. I would recommend this solution. It would look like this:
public byte[] readBytes(BufferedReader in) throws IOException // readLine() is defined on BufferedReader, not Reader
{
    String base64 = in.readLine(); // Note that Base64.getEncoder() output never contains \n
    byte[] data = Base64.getDecoder().decode(base64);
    return data;
}
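To check the idea end to end, a sketch of both sides, assuming the file is changed to hold a text header line followed by one Base64 line for the binary data (this is a format change, not the real binvox layout). Base64Payload, write and readPayload are hypothetical names:

```java
import java.io.*;
import java.util.Base64;

public class Base64Payload {
    // Writes a text header followed by the binary payload as a single Base64 line.
    // The basic encoder (Base64.getEncoder()) never inserts line breaks.
    static String write(String header, byte[] payload) {
        return header + "\n" + Base64.getEncoder().encodeToString(payload) + "\n";
    }

    // Reads the header line, then decodes the payload line, all through one BufferedReader,
    // so there is no second stream whose bytes the reader could steal.
    static byte[] readPayload(BufferedReader in) throws IOException {
        String header = in.readLine(); // e.g. "#binvox 1"
        if (header == null) throw new EOFException("missing header");
        String base64 = in.readLine();
        if (base64 == null) throw new EOFException("missing payload");
        return Base64.getDecoder().decode(base64);
    }
}
```

The cost of this design is size: Base64 makes the binary part about a third larger.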
You can't wrap an InputStream in a BufferedReader and use both.
As its name hints, BufferedReader might read ahead and buffer data from the underlying InputStream which then will not be available when reading from the underlying InputStream directly.
The suggested solution is not to mix text and binary data in one file. They should be stored in two separate files, which can then be read separately. If the remaining data is not binary, you should not read it via the InputStream but via your wrapping BufferedReader, just as you read the first lines.
I recommend creating a BinvoxDetectorStream that pre-reads some bytes:
public class BinvoxDetectorStream extends InputStream {
    private final InputStream orig;
    private final byte[] buffer = new byte[4096];
    private final int buflen;
    private int bufpos = 0;
    public BinvoxDetectorStream(InputStream in) throws IOException {
        this.orig = new BufferedInputStream(in);
        // pre-read up to 4096 bytes; read() returns -1 for an empty stream
        this.buflen = Math.max(0, orig.read(this.buffer, 0, this.buffer.length));
    }
    public BinvoxInfo getBinvoxInfo() throws IOException {
        // creating a reader for the buffered bytes, to read a line, and compare the header
        ByteArrayInputStream bais = new ByteArrayInputStream(buffer, 0, buflen);
        BufferedReader rdr = new BufferedReader(new InputStreamReader(bais));
        String line = rdr.readLine();
        if (line == null) {
            return null;
        }
        String split[] = line.split(" ");
        if (split[0].equals("#binvox")) {
            BinvoxInfo info = new BinvoxInfo();
            info.version = split[1];
            split = rdr.readLine().split(" ");
            // [... parse all properties ...]
            // seek for "data\r\n" in the buffered data (assumes CRLF line endings)
            for (int p = 6; p <= buflen; p++) {
                if (buffer[p - 6] == 'd' &&
                        buffer[p - 5] == 'a' &&
                        buffer[p - 4] == 't' &&
                        buffer[p - 3] == 'a' &&
                        buffer[p - 2] == '\r' &&
                        buffer[p - 1] == '\n') {
                    bufpos = p; // the binary data starts right after the marker
                    return info;
                }
            }
        }
        return null;
    }
    @Override
    public int read() throws IOException {
        if (bufpos < buflen) {
            return buffer[bufpos++] & 0xFF; // read() must return 0..255, not a negative byte
        }
        return orig.read();
    }
}
Then, you can detect the Binvox version without touching the original stream:
BinvoxDetectorStream bds = new BinvoxDetectorStream(in);
BinvoxInfo info = bds.getBinvoxInfo();
if (info == null) {
    return false;
}
...
// [moving bytes in the usual way, but using bds!!!]
This way we preserve the original bytes in bds, so we'll be able to copy them later.
I saw someone else's code that solved exactly this.
He/she used DataInputStream, which provides readLine() (deprecated) and readByte().
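In the same spirit, but avoiding the deprecated DataInputStream.readLine(), here is a sketch that reads the ASCII header lines one byte at a time from the very stream that then supplies the voxel bytes, so no data is lost to read-ahead. It assumes the header is ASCII and ends with a line containing just data; BinvoxHeaderReader, readAsciiLine and readHeader are hypothetical names. Wrap the FileInputStream in a single BufferedInputStream and use that one object for both parts, since single-byte reads on an unbuffered file are slow.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class BinvoxHeaderReader {
    // Reads one '\n'-terminated ASCII line directly from the stream, byte by byte,
    // so nothing after the newline is consumed (unlike a BufferedReader).
    static String readAsciiLine(InputStream in) throws IOException {
        ByteArrayOutputStream line = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) != -1 && b != '\n') {
            if (b != '\r') line.write(b); // tolerate CRLF line endings
        }
        if (b == -1 && line.size() == 0) return null; // end of stream, no more lines
        return new String(line.toByteArray(), StandardCharsets.US_ASCII);
    }

    // Collects header lines up to and including "data"; afterwards the stream
    // is positioned at the first byte of the binary voxel data.
    static List<String> readHeader(InputStream in) throws IOException {
        List<String> header = new ArrayList<>();
        String line;
        while ((line = readAsciiLine(in)) != null) {
            header.add(line);
            if (line.equals("data")) return header;
        }
        throw new EOFException("no 'data' line found");
    }
}
```

After readHeader returns, the original loop (in.read(data, 0, data.length)) works on the same stream and sees the voxel bytes.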

Counting bytes consumed by char streams

I have a large text file (csv) on disk that I'm splitting into lines. Something like this:
BufferedReader reader = new BufferedReader(new FileReader(file));
String line;
while ((line = reader.readLine()) != null) {
    ...
}
What I want to do is compute the byte offset from the start of the file for every 1,000th line, say, so that if in the future I want to read the 10,001st line, I can jump straight to offset X and start iterating from there.
The file could be encoded in any way, so there is no strong relationship between bytes and chars.
Does anyone know of any "counting readers", or an alternative approach? I'm very happy to implement a Reader myself, but don't want to write a very complex class if I can avoid it.
When you need random access, BufferedReader is not suited. Instead, you need to look into Channel and its subclasses like FileChannel and so on.
Simple example of reading using a channel:
RandomAccessFile aFile = new RandomAccessFile("data/nio-data.txt", "rw");
FileChannel inChannel = aFile.getChannel();
ByteBuffer buf = ByteBuffer.allocate(48);
int bytesRead = inChannel.read(buf);
while (bytesRead != -1) {
    System.out.println("Read " + bytesRead);
    buf.flip(); // switch the buffer from being filled to being drained
    while (buf.hasRemaining()) {
        System.out.print((char) buf.get()); // byte-to-char cast: only correct for single-byte text
    }
    buf.clear(); // make the buffer ready to be filled again
    bytesRead = inChannel.read(buf);
}
aFile.close();
Source: http://tutorials.jenkov.com/java-nio/channels.html
As for your question of reading from where you left off, FileChannel defines a method read(ByteBuffer dst, long position), where position is the file position, in bytes, at which to start reading. You can also call position(long) on the channel and then read normally.
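Neither answer shows the counting part itself. A sketch of one way to do it, under the assumption that the file's encoding always represents '\n' as the single byte 0x0A (true for ASCII, ISO-8859-x and UTF-8, but not for UTF-16): count newlines over the raw bytes, so the byte offsets are exact no matter how the characters in between are encoded, then jump to an offset with FileChannel.position(long). LineIndex, buildIndex and readLineAt are hypothetical names, and readLineAt additionally assumes UTF-8.

```java
import java.io.*;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.file.*;
import java.util.ArrayList;
import java.util.List;

public class LineIndex {
    // Records the byte offset at which lines 0, every, 2*every, ... start.
    static List<Long> buildIndex(Path file, int every) throws IOException {
        List<Long> offsets = new ArrayList<>();
        offsets.add(0L);
        try (InputStream in = new BufferedInputStream(Files.newInputStream(file))) {
            long pos = 0;  // bytes consumed so far
            long line = 0; // newlines seen so far
            int b;
            while ((b = in.read()) != -1) {
                pos++;
                if (b == '\n') {
                    line++;
                    // the next line starts at `pos`; record it every `every` lines
                    if (line % every == 0) offsets.add(pos);
                }
            }
        }
        return offsets;
    }

    // Jumps to a recorded byte offset and returns the line that starts there.
    static String readLineAt(Path file, long offset) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            ch.position(offset);
            BufferedReader reader = new BufferedReader(Channels.newReader(ch, "UTF-8"));
            return reader.readLine();
        }
    }
}
```

If the file ends with a newline and the line count is a multiple of every, the last recorded offset equals the file size, and readLineAt returns null for it.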

Java: File to String, problems with using a buffer, byte array not clean

Take the following static method:
public static String fileToString(String filename) throws Exception {
    FileInputStream fis = new FileInputStream(filename);
    byte[] buffer = new byte[8192];
    StringBuffer sb = new StringBuffer();
    int bytesRead; // unused? weird compiler messages...
    while ((bytesRead = fis.read(buffer)) != -1) { // InputStream.read() returns -1 at EOF
        sb.append(new String(buffer));
    }
    return new String(sb);
}
As you can see, everything looks okay, and it works perfectly for small text files. But once you get to big files with thousands of lines, you start seeing repeated text. Based on my intuition, I thought byte[] buffer was "unclean", so to speak. So I added the following line to the method:
buffer = new byte[8192];
So that it is now:
public static String fileToString(String filename) throws Exception {
    FileInputStream fis = new FileInputStream(filename);
    byte[] buffer = new byte[8192];
    StringBuffer sb = new StringBuffer();
    int bytesRead; // unused? weird compiler messages...
    while ((bytesRead = fis.read(buffer)) != -1) { // InputStream.read() returns -1 at EOF
        sb.append(new String(buffer));
        buffer = new byte[8192]; // added new line here
    }
    return new String(sb);
}
And it's perfect, except for the fact that at the end of the String that the static method returns, I get a lot of null characters (depends on the buffer size). What's going on here?
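Actually, the comment // unused? weird compiler messages... points at something that is not weird at all: you never read bytesRead.
How could sb.append(new String(buffer)) know how many bytes were written into the buffer? It can't, so it decodes all 8192 bytes every time. When a read only partly fills the buffer (the last read nearly always does), the rest of the buffer still holds bytes from the previous read, hence the repeated text; once you allocate a fresh buffer, those leftover positions are zeros, hence the null characters.
This is exactly where bytesRead comes into play. So you need new String(bytes, offset, length):
public static String fileToString(String filename) throws Exception {
    FileInputStream fis = new FileInputStream(filename);
    byte[] buffer = new byte[8192];
    StringBuffer sb = new StringBuffer();
    int bytesRead; // number of bytes actually read into buffer
    while ((bytesRead = fis.read(buffer)) != -1) { // InputStream.read() returns -1 at EOF
        sb.append(new String(buffer, 0, bytesRead)); // decode only the bytes that were read
    }
    fis.close();
    return new String(sb);
}
This might work, at least for single-byte text.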
You really shouldn't be reading bytes and creating a String from the raw bytes. This is wrong because it completely ignores the encoding of the text: new String(bytes) silently uses the platform default charset. You might be lucky and be reading ASCII, in which case things will just work out. In all other cases this is asking for trouble.
You really should use a BufferedReader which wraps an InputStreamReader which wraps your source InputStream.
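A sketch of that layering for the fileToString use case, assuming the caller knows the file's charset (FileToString is an illustrative name). It reads into a char[] rather than using readLine(), so line terminators are kept, and it appends only the chars actually read, which is the same fix as passing bytesRead in the byte version:

```java
import java.io.*;
import java.nio.charset.Charset;
import java.nio.file.*;

public class FileToString {
    // Decodes the file with an explicit charset; the InputStreamReader handles
    // multi-byte characters that straddle buffer boundaries.
    static String fileToString(Path file, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new BufferedReader(
                new InputStreamReader(Files.newInputStream(file), charset))) {
            char[] buffer = new char[8192];
            int charsRead;
            while ((charsRead = reader.read(buffer)) != -1) {
                sb.append(buffer, 0, charsRead); // never the stale tail of the buffer
            }
        }
        return sb.toString();
    }
}
```

Usage: FileToString.fileToString(Paths.get(filename), StandardCharsets.UTF_8).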
Don't reinvent the wheel. Unless you are doing school homework, use an existing library like Apache Commons IO.
http://commons.apache.org/io/apidocs/org/apache/commons/io/IOUtils.html#toString%28java.io.InputStream,%20java.nio.charset.Charset%29
For example, you can read the file into a String in just a few lines:
public static String fileToString(String filepath) throws Exception {
    try (InputStream in = new FileInputStream(filepath)) { // closes the stream when done
        return IOUtils.toString(in, "utf-8");
    }
}
This saves you a lot of hand-written custom code and will likely have far fewer bugs.

Java IO classes - troubles with file IO

I initialise a BufferedReader like this:
Reader reader = new BufferedReader(new InputStreamReader(new FileInputStream(filename), "UTF-8"));
where filename is any given string.
When I process the input in a loop like this:
int k;
while ((k = reader.read()) != -1) {
    String entry;
    if (dict.containsKey(k))
        entry = dict.get(k);
    else if (k == mapSize)
        entry = w + w.charAt(0);
    else
        throw new IllegalArgumentException("Bad compressed k: " + k);
    this.fos.write(entry);
    result += entry;
    // Add w+entry[0] to the dictionary.
    dict.put(mapSize++, w + entry.charAt(0));
    w = entry;
}
It only reads 65536 characters before hitting EOF. Does anyone know what's going on here?
You don't need to call ready(). Just read the data or the lines:
String line;
while ((line = reader.readLine()) != null) { // requires reader to be declared as a BufferedReader
    // process, LZW algorithm
}
or
// a BufferedReader is redundant if you are reading large blocks
Reader reader = new InputStreamReader(new FileInputStream(filename), "UTF-8");
char[] buffer = new char[8*1024];
int len;
while ((len = reader.read(buffer)) != -1) {
    // process text: only buffer[0..len) is valid
}
You are attempting to read binary data as character data; that's going to go badly. UTF-8 is a multi-byte character encoding, which means the number of characters you read from the file may not equal the number of bytes in the file. If you are trying to implement a decompression algorithm, you should be using an InputStream and reading bytes, not chars.
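To make that concrete, here is a sketch that assumes (hypothetically, since the question does not show its compressor) that each LZW code is written as a 4-byte big-endian int with DataOutputStream.writeInt. The decompressor then reads codes with DataInputStream.readInt, so no charset decoding is involved and codes above 65535 survive. LzwBinary, compress and decompress are illustrative names; the compressor only handles chars 0 to 255 because the initial dictionary has 256 entries.

```java
import java.io.*;
import java.util.HashMap;
import java.util.Map;

public class LzwBinary {
    // Compresses text whose chars are all in 0..255, writing every code as a 4-byte int.
    static byte[] compress(String text) throws IOException {
        Map<String, Integer> dict = new HashMap<>();
        for (int i = 0; i < 256; i++) dict.put(String.valueOf((char) i), i);
        int mapSize = 256;
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        String w = "";
        for (char c : text.toCharArray()) {
            String wc = w + c;
            if (dict.containsKey(wc)) {
                w = wc;
            } else {
                out.writeInt(dict.get(w));
                dict.put(wc, mapSize++);
                w = String.valueOf(c);
            }
        }
        if (!w.isEmpty()) out.writeInt(dict.get(w));
        out.flush();
        return bytes.toByteArray();
    }

    // Reads the codes back as binary ints, mirroring the question's loop.
    static String decompress(InputStream source) throws IOException {
        DataInputStream in = new DataInputStream(source);
        Map<Integer, String> dict = new HashMap<>();
        for (int i = 0; i < 256; i++) dict.put(i, String.valueOf((char) i));
        int mapSize = 256;
        StringBuilder result = new StringBuilder();
        String w = null;
        while (true) {
            int k;
            try {
                k = in.readInt();
            } catch (EOFException e) {
                break; // end of the code stream (a truncated final int also ends up here)
            }
            String entry;
            if (dict.containsKey(k))
                entry = dict.get(k);
            else if (k == mapSize && w != null)
                entry = w + w.charAt(0);
            else
                throw new IllegalArgumentException("Bad compressed k: " + k);
            result.append(entry);
            // Add w+entry[0] to the dictionary (skipped for the very first code).
            if (w != null) dict.put(mapSize++, w + entry.charAt(0));
            w = entry;
        }
        return result.toString();
    }
}
```

Fixed 4-byte codes waste space compared with variable-width codes, but they keep the reading side trivial and make the byte/char mismatch impossible.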
