I initialise a BufferedReader like this:
Reader reader = new BufferedReader(new InputStreamReader(new FileInputStream(filename), "UTF-8"));
where filename is any given string.
When I process the output through a loop as such:
int k;
while((k = reader.read()) != -1){
String entry;
if (dict.containsKey(k))
entry = dict.get(k);
else if (k == mapSize)
entry = w + w.charAt(0);
else
throw new IllegalArgumentException("Bad compressed k: " + k);
this.fos.write(entry);
result += entry;
// Add w+entry[0] to the dictionary.
dict.put(mapSize++, w + entry.charAt(0));
w = entry;
}
it reads only 65536 characters before hitting EOF. Does anyone know what's going on here?
You don't need to call ready(). Just read the data or lines
String line;
while((line = reader.readLine()) != null) {
//process, LZW algorithm
}
or
// buffer is redundant if you are reading large blocks.
Reader reader = new InputStreamReader(new FileInputStream(filename), "UTF-8");
char[] buffer = new char[8*1024];
int len;
while((len = reader.read(buffer)) > 0) {
// process text
}
You are attempting to read binary data as character data, and that is going to go badly. UTF-8 is a multi-byte character encoding, which means the number of characters you read from the file may not equal the number of bytes in the file. If you are trying to implement a decompression algorithm, you should be using an InputStream and reading bytes, not chars.
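To illustrate reading compressed codes as bytes rather than chars, here is a minimal sketch. It assumes (the question does not say) that the compressor wrote each code as a 16-bit big-endian unsigned value; the class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

final class CodeReader {
    // Reads every 16-bit big-endian code from the given bytes.
    // ASSUMPTION: the 2-bytes-per-code layout is illustrative, not taken from the question.
    static List<Integer> readCodes(byte[] compressed) {
        List<Integer> codes = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(compressed))) {
            while (true) {
                // Throws EOFException once fewer than two bytes remain.
                codes.add(in.readUnsignedShort());
            }
        } catch (EOFException endOfData) {
            // Normal termination; a trailing odd byte, if any, is ignored.
            return codes;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For a real file, the same loop would wrap a BufferedInputStream over a FileInputStream instead of a ByteArrayInputStream. No charset is involved at any point, so no byte sequence can be rejected or rewritten by a decoder.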
I am using zip4j to extract zip files in Android. I want to read a file from the zip file without saving it somewhere. I have got it working but it adds extra characters towards the end of the file. The extra characters are part of the file earlier.
is = zipFile.getInputStream(fileHeader);
InputStreamReader isr = new InputStreamReader(is, "UTF-8");
ArrayList<String> list = new ArrayList<String>();
char[] buffer = new char[BUFF_SIZE];
while (isr.read(buffer, 0, buffer.length) != -1) {
String ans = new String(buffer);
//strUnzipped += new String(buffer);
strUnzipped += ans;
list.add(ans);
}
I used the list to see where the extra characters are inserted: they appear in the last entry, and the extra text comes from the previous one. It looks as if the buffer was not cleared and only part of it was overwritten.
The buffer variable has no notion of how many characters the previous read placed in it. You need to record the result of the read operation and use it in your string construction:
int charsRead;
while ((charsRead = isr.read(buffer, 0, buffer.length)) != -1) {
String ans = new String(buffer, 0, charsRead);
...
}
That, however, is a poor way to read what is presumably text content. If you're trying to build a giant string containing the file content, you could:
StringBuilder sb = new StringBuilder();
while ((charsRead = isr.read(buffer, 0, buffer.length)) != -1) {
sb.append(buffer, 0, charsRead);
}
strUnzipped = sb.toString();
or, if you wanted a List<String> with each entry being a single line from the file then:
LineNumberReader lnr = new LineNumberReader(isr);
String inputLine;
while((inputLine = lnr.readLine()) != null) {
list.add(inputLine);
}
Usual problem. You're not using the value returned by read() correctly.
while (isr.read(buffer, 0, buffer.length) != -1) {
String ans = new String(buffer);
should be
int count;
while ((count = isr.read(buffer)) != -1) {
String ans = new String(buffer, 0, count);
"it's more like the buffer did not get cleared and it replaced only part of the buffer."
It's more that the buffer kept its old contents beyond the count that read() returned. Buffers don't get 'cleared'.
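The effect can be reproduced without a zip file. The following is a minimal sketch (class and method names are made up for illustration) that reads a string through a small buffer, once ignoring the count returned by read() and once honouring it.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

final class ReadCountDemo {
    // Splits text into chunks of at most bufferSize chars.
    // useCount = false reproduces the bug: the whole buffer is converted,
    // including stale chars left over from the previous read.
    static List<String> chunks(String text, int bufferSize, boolean useCount) {
        List<String> out = new ArrayList<>();
        char[] buffer = new char[bufferSize];
        try (Reader reader = new StringReader(text)) {
            int count;
            while ((count = reader.read(buffer)) != -1) {
                out.add(useCount ? new String(buffer, 0, count) : new String(buffer));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }
}
```

With the input "abcdefghij" and a buffer of 8 chars, the buggy variant yields "abcdefgh" and "ijcdefgh": the second read overwrote only the first two chars. The corrected variant yields "abcdefgh" and "ij".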
I seem to be hitting a constant unexpected end of my file. My file contains first a couple of strings, then byte data.
The file contains a few separated strings, which my code reads correctly.
However when I begin to read the bytes, it returns nothing. I am pretty sure it has to do with me using the Readers. Does the BufferedReader read the entire stream? If so, how can I solve this?
I have checked the file, and it does contain plenty of data after the strings.
InputStreamReader is = new InputStreamReader(in);
BufferedReader br = new BufferedReader(is);
String line;
{
line = br.readLine();
String split[] = line.split(" ");
if (!split[0].equals("#binvox")) {
ErrorHandler.log("Not a binvox file");
return false;
}
ErrorHandler.log("Binvox version: " + split[1]);
}
ByteArrayOutputStream buffer = new ByteArrayOutputStream();
int nRead, cnt = 0;
byte[] data = new byte[16384];
while ((nRead = in.read(data, 0, data.length)) != -1) {
buffer.write(data, 0, nRead);
cnt += nRead;
}
buffer.flush();
// cnt is always 0
The binvox format is as follows:
#binvox 1
dim 64 40 32
translate -3 0 -2
scale 6.434
data
[byte data]
I'm basically trying to convert the following C code to Java:
http://www.cs.princeton.edu/~min/binvox/read_binvox.html
To read all the lines, do this:
ArrayList<String> lines = new ArrayList<String>();
while ((line = br.readLine()) != null) {
lines.add(line);
}
Then you can loop over the lines to split each one, or do the splitting directly inside the read loop.
As icza has already written, you can't wrap an InputStream in a BufferedReader and then use both. The BufferedReader will read as much from the InputStream as it wants, and that data is then no longer available from the InputStream.
You have several ways to fix it:
Don't use any Reader. Read the bytes yourself from an InputStream and call new String(bytes) on the text part.
Store your data encoded (e.g. Base64). Encoded data can be read from a Reader. I would recommend this solution. It would look like this:
// Takes a BufferedReader because plain Reader has no readLine(); Base64 requires Java 8+.
public byte[] readBytes(BufferedReader in) throws IOException
{
String base64 = in.readLine(); // Note that a Base64-representation never contains \n
byte[] data = Base64.getDecoder().decode(base64);
return data;
}
You can't wrap an InputStream in a BufferedReader and use both.
As its name hints, BufferedReader might read ahead and buffer data from the underlying InputStream which then will not be available when reading from the underlying InputStream directly.
The suggested solution is not to mix text and binary data in one file. Store them in two separate files so they can be read separately. If the remaining data is not binary, then you should not read it via the InputStream but via your wrapper BufferedReader, just as you read the first lines.
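If the file format cannot be changed, one way to avoid the read-ahead problem is to never create a Reader at all and to parse the header lines byte by byte. The following is a minimal sketch under that assumption; the class and method names are hypothetical, and the header is assumed to be ASCII with '\n' line endings followed by a "data" line, as in the format listing from the question.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class HeaderThenBytes {
    // Reads one '\n'-terminated ASCII line directly from the stream, one byte
    // at a time, so nothing beyond the line is consumed. Returns null at EOF.
    static String readAsciiLine(InputStream in) throws IOException {
        ByteArrayOutputStream line = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) != -1) {
            if (b == '\n') {
                return new String(line.toByteArray(), StandardCharsets.US_ASCII);
            }
            if (b != '\r') {
                line.write(b);
            }
        }
        return line.size() == 0 ? null : new String(line.toByteArray(), StandardCharsets.US_ASCII);
    }

    // Skips header lines up to and including "data", then returns the remaining bytes.
    static byte[] readPayload(InputStream in) throws IOException {
        String line;
        while ((line = readAsciiLine(in)) != null && !line.equals("data")) {
            // A header line such as "dim 64 40 32"; parsing is omitted here.
        }
        ByteArrayOutputStream payload = new ByteArrayOutputStream();
        byte[] chunk = new byte[16384];
        int n;
        while ((n = in.read(chunk)) != -1) {
            payload.write(chunk, 0, n);
        }
        return payload.toByteArray();
    }

    // Convenience wrapper for data that is already in memory.
    static byte[] readPayload(byte[] file) {
        try {
            return readPayload(new ByteArrayInputStream(file));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For a file, pass a BufferedInputStream wrapped around the FileInputStream so the single-byte reads stay cheap. That buffering is harmless here because the header and the payload are both read from the same stream object.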
I recommend creating a BinvoxDetectorStream that pre-reads some bytes:
public class BinvoxDetectorStream extends InputStream {
private InputStream orig;
private byte[] buffer = new byte[4096];
private int buflen;
private int bufpos = 0;
public BinvoxDetectorStream(InputStream in) throws IOException {
this.orig = new BufferedInputStream(in);
this.buflen = orig.read(this.buffer, 0, this.buffer.length);
}
public BinvoxInfo getBinvoxInfo() throws IOException {
// creating a reader for the buffered bytes, to read a line, and compare the header
ByteArrayInputStream bais = new ByteArrayInputStream(buffer, 0, Math.max(buflen, 0));
BufferedReader rdr = new BufferedReader(new InputStreamReader(bais));
String line = rdr.readLine();
String split[] = line.split(" ");
if (split[0].equals("#binvox")) {
BinvoxInfo info = new BinvoxInfo();
info.version = split[1];
split = rdr.readLine().split(" ");
[... parse all properties ...]
// seek for "data\r\n" in the buffered data, without running past the bytes actually read
while(bufpos < buflen && !(bufpos>=6 &&
buffer[bufpos-6] == 'd' &&
buffer[bufpos-5] == 'a' &&
buffer[bufpos-4] == 't' &&
buffer[bufpos-3] == 'a' &&
buffer[bufpos-2] == '\r' &&
buffer[bufpos-1] == '\n') ) {
bufpos++;
}
return info;
}
return null;
}
@Override
public int read() throws IOException {
if(bufpos < buflen) {
// mask to 0..255 so that bytes >= 0x80 are not mistaken for EOF (-1)
return buffer[bufpos++] & 0xFF;
}
return orig.read();
}
}
Then, you can detect the Binvox version without touching the original stream:
BinvoxDetectorStream bds = new BinvoxDetectorStream(in);
BinvoxInfo info = bds.getBinvoxInfo();
if (info == null) {
return false;
}
...
[moving bytes in the usual way, but using bds!!! ]
This way we preserve the original bytes in bds, so we'll be able to copy it later.
I saw someone else's code that solved exactly this.
He/she used DataInputStream, which can do a readLine (although deprecated) and readByte.
I have a file which is split in two parts by "\n\n": the first part is a fairly short String and the second is a byte array, which can be quite long.
I am trying to read the file as follows:
byte[] result;
try (final FileInputStream fis = new FileInputStream(file)) {
final InputStreamReader isr = new InputStreamReader(fis);
final BufferedReader reader = new BufferedReader(isr);
String line;
// reading until \n\n
while (!(line = reader.readLine()).trim().isEmpty()){
// processing the line
}
// copying the rest of the byte array
result = IOUtils.toByteArray(reader);
reader.close();
}
Even though the resulting array is the size it should be, its contents are broken. If I try to use toByteArray directly on fis or isr, the contents of result are empty.
How can I read the rest of the file correctly and efficiently?
Thanks!
Your contents are broken because IOUtils.toByteArray(...) reads your data as text in the default character encoding, i.e. it converts the 8-bit binary values into characters using whatever rules that encoding prescribes. This usually corrupts many of the binary values.
Depending on how exactly the charset is implemented, there is a slight chance that this might work:
result = IOUtils.toByteArray(reader, "ISO-8859-1");
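ISO-8859-1 uses exactly one byte per character, and Java's implementation maps every byte value from 0 to 255 to the char with the same code, so decoding and re-encoding loses nothing. Note that this only helps if the InputStreamReader was also created with ISO-8859-1, because that is where the bytes are decoded.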
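The difference between charsets can be checked with a few lines. This is a small sketch (the class and method names are made up for illustration) that decodes a byte array to a String and encodes it back with the same charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class CharsetRoundTrip {
    // Decodes the bytes to text and immediately encodes the text back to bytes.
    static byte[] roundTrip(byte[] data, Charset charset) {
        return new String(data, charset).getBytes(charset);
    }

    // True if the bytes come back unchanged after the round trip.
    static boolean survives(byte[] data, Charset charset) {
        return Arrays.equals(data, roundTrip(data, charset));
    }

    public static void main(String[] args) {
        byte[] binary = {0x00, 0x7F, (byte) 0x80, (byte) 0xFF};
        System.out.println("ISO-8859-1: " + survives(binary, StandardCharsets.ISO_8859_1));
        System.out.println("UTF-8:      " + survives(binary, StandardCharsets.UTF_8));
    }
}
```

The bytes 0x80 and 0xFF are not valid UTF-8 on their own, so the decoder replaces them with the replacement character and the original values are lost. With ISO-8859-1 every byte has a char of its own and comes back unchanged.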
But a much cleaner solution would be to read the String at the beginning as binary data and then convert it to text via new String(bytes), rather than reading the binary data at the end as a String and converting it back.
This might mean, though, that you need to implement your own version of a BufferedReader for performance purposes.
You can find the source code of the standard BufferedReader via the obvious Google search, which will (for example) lead you here:
http://www.docjar.com/html/api/java/io/BufferedReader.java.html
It's a bit long, but conceptually not too difficult to understand, so hopefully it will be useful as a reference.
Alternatively, you could read the file into byte array, find \n\n position and split the array into the line and bytes
byte[] a = Files.readAllBytes(Paths.get("file"));
String line = "";
byte[] result = a;
for (int i = 0; i < a.length - 1; i++) {
if (a[i] == '\n' && a[i + 1] == '\n') {
line = new String(a, 0, i);
// skip both newline bytes so that result starts right after "\n\n"
int len = a.length - i - 2;
result = new byte[len];
System.arraycopy(a, i + 2, result, 0, len);
break;
}
}
Thanks for all the comments - the final implementation was done in this way:
try (final FileInputStream fis = new FileInputStream(file)) {
ByteBuffer buffer = ByteBuffer.allocate(64);
boolean wasLast = false;
String headerValue = null, headerKey = null;
byte[] result = null;
while (true) {
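int b = fis.read();
if (b == -1) {
// end of file reached before the "\n\n" separator
throw new EOFException("Header separator not found");
}
byte current = (byte) b;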
if (current == '\n') {
if (wasLast) {
// this is \n\n
break;
} else {
// just a new line in header
wasLast = true;
headerValue = new String(buffer.array(), 0, buffer.position());
buffer.clear();
}
} else if (current == '\t') {
// headerKey\theaderValue\n
headerKey = new String(buffer.array(), 0, buffer.position());
buffer.clear();
} else {
buffer.put(current);
wasLast = false;
}
}
// reading the rest
result = IOUtils.toByteArray(fis);
}
For reading any input stream to a buffer there are two methods. Can someone help me understand which is the better method and why? And in which situation we should use each method?
Reading line by line and appending it to the buffer.
Eg:
public String fileToBuffer(InputStream is, StringBuffer strBuffer) throws IOException{
StringBuffer buffer = strBuffer;
InputStreamReader isr = null;
try {
isr = new InputStreamReader(is);
BufferedReader br = new BufferedReader(isr);
String line = null;
while ((line = br.readLine()) != null) {
buffer.append(line + "\n");
}
} finally {
if (is != null) {
is.close();
}
if (isr != null) {
isr.close();
}
}
return buffer.toString();
}
Reading up to the buffer size, i.e. 1024 chars, into a char array.
Eg:
InputStreamReader isr = new InputStreamReader(is);
final int bufferSize = 1024;
char[] buffer = new char[bufferSize];
StringBuffer strBuffer = new StringBuffer();
/* read the base script into string buffer */
try {
while (true) {
int read = isr.read(buffer, 0, bufferSize);
if (read == -1) {
break;
}
strBuffer.append(buffer, 0, read);
}
} catch (IOException e) {
}
Consider
public String fileToBuffer(InputStream is, StringBuffer strBuffer) throws IOException {
StringBuilder sb = new StringBuilder(strBuffer);
try (BufferedReader rdr = new BufferedReader(new InputStreamReader(is))) {
for (int c; (c = rdr.read()) != -1;) {
sb.append((char) c);
}
}
return sb.toString();
}
It depends on the purpose.
For text files, read lines (if you need them).
For raw binary data, use chunks of bytes.
In your examples, reading in chunks is more robust.
What if a line is too long and breaks some intermediate object?
If your file is binary, do you know how big a "line" will be?
It may be the size of the whole file.
Trying to "swallow" too big a String may cause an OutOfMemoryError.
With 1024-char chunks that (almost) never happens.
Chunking by 1024 may take longer, but it's more reliable.
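Apart from memory use, the two approaches also differ in what they return. The following sketch (names are hypothetical) runs both over the same text: the line-based version loses the original line terminators, while the chunked version returns the characters unchanged.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

final class LineVsChunk {
    // Method 1: readLine() strips the terminator, so "\n" is re-added by hand.
    static String byLines(Reader source) throws IOException {
        StringBuilder sb = new StringBuilder();
        BufferedReader br = new BufferedReader(source);
        String line;
        while ((line = br.readLine()) != null) {
            sb.append(line).append('\n');
        }
        return sb.toString();
    }

    // Method 2: copies chars in fixed-size chunks, exactly as they were read.
    static String byChunks(Reader source) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buffer = new char[1024];
        int read;
        while ((read = source.read(buffer)) != -1) {
            sb.append(buffer, 0, read);
        }
        return sb.toString();
    }

    // Runs both methods on the same text; index 0 is by lines, index 1 by chunks.
    static String[] compare(String text) {
        try {
            return new String[] {byLines(new StringReader(text)), byChunks(new StringReader(text))};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For the input "a\r\nb", the line-based method returns "a\nb\n" (the "\r\n" became "\n" and a trailing newline was added), whereas the chunked method returns "a\r\nb".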
Using 'readLine' isn't so neat. The asker's method 2 is quite standard, but the below method is unique (and likely better):
//read the whole inputstream and put into a string
public String inputstream2str(InputStream stream) {
Scanner s = new Scanner(stream).useDelimiter("\\A");
return s.hasNext()? s.next():"";
}
From a String you can convert to byte array or whatever buffer you want.
I need to read a char[] of size COUNT from a text file, starting at OFFSET, with a specified Charset. COUNT and OFFSET are in characters, not in bytes.
Here is my code:
raf = new RandomAccessFile(filePath, "r");
if ((mBuffer == null) || (mBuffer.length < count)) {
mBuffer = new byte[(int)(count/mDecoder.averageCharsPerByte())];
mByteWrap = ByteBuffer.wrap(mBuffer);
mCharBuffer = new char[count];
mCharWrap = CharBuffer.wrap(mCharBuffer);
}
try {
offset = (int)(offset/mDecoder.averageCharsPerByte());
count = (int)(count/mDecoder.averageCharsPerByte());
raf.seek(offset);
raf.read(mBuffer,0,count);
mByteWrap.position(0);
mCharWrap.position(0);
mDecoder.decode(mByteWrap, mCharWrap, true);
} catch (IOException e) {
return null;
}
return mCharBuffer;
Is there an easier way (without manually mapping chars to bytes)?
I looked at java.util.Scanner, but it's iterator-style, and I need random-access style.
PS: the data shouldn't be copied many times.
Use BufferedReader's skip() method.
In your case:
BufferedReader reader = new BufferedReader(new FileReader(filePath));
reader.skip(n); // chars to skip
// .. and here you can start reading
And if you want to specify a particular encoding, you can use:
InputStream is = new FileInputStream(filePath);
BufferedReader reader = new BufferedReader(new InputStreamReader(is,"UTF-8"));
reader.skip(n); // chars to skip
// .. and here you can start reading
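Putting skip() and a counted read together, a helper for "COUNT chars starting at char OFFSET" could look roughly like this. It is a sketch, not a drop-in replacement: the class and method names are hypothetical, and offsets are counted in Java chars (UTF-16 code units).

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.util.Arrays;

final class CharWindow {
    // Skips offset chars, then reads up to count chars.
    // The result is shorter than count if the source ends early.
    static char[] readChars(Reader source, long offset, int count) throws IOException {
        BufferedReader reader = new BufferedReader(source);
        long toSkip = offset;
        while (toSkip > 0) {
            long skipped = reader.skip(toSkip);
            if (skipped <= 0) {
                break; // end of input reached before the offset
            }
            toSkip -= skipped;
        }
        char[] out = new char[count];
        int filled = 0;
        while (filled < count) {
            int n = reader.read(out, filled, count - filled);
            if (n == -1) {
                break;
            }
            filled += n;
        }
        return filled == count ? out : Arrays.copyOf(out, filled);
    }

    // Convenience wrapper for encoded text that is already in memory.
    static char[] readChars(byte[] encoded, Charset charset, long offset, int count) {
        try (Reader r = new InputStreamReader(new ByteArrayInputStream(encoded), charset)) {
            return readChars(r, offset, count);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Note that skip() still has to decode everything before the offset, so the cost grows with OFFSET. With a variable-width encoding such as UTF-8 there is no way to jump straight to the n-th character without such a scan or a prebuilt index.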
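You can use read(byte[] b, int off, int len) of BufferedInputStream.
Note that off is the offset into the destination array b at which the data is stored, not a position in the stream. To start reading from a given byte position in the stream, call skip() first. Both work in bytes, not characters.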
http://docs.oracle.com/javase/7/docs/api/java/io/BufferedInputStream.html#read%28byte[],%20int,%20int%29
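Because the meaning of off in read(byte[] b, int off, int len) is easy to misread, here is a small sketch that separates the two positions involved: the position in the stream (advanced with skip()) and the index in the destination array (the off argument). The class and method names are made up for illustration.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

final class OffsetDemo {
    // Skips streamPos bytes of source, then stores up to len bytes
    // into a new array of arraySize bytes, starting at index arrayOff.
    static byte[] readAt(byte[] source, long streamPos, int arrayOff, int len, int arraySize) {
        byte[] dest = new byte[arraySize];
        try (InputStream in = new BufferedInputStream(new ByteArrayInputStream(source))) {
            long remaining = streamPos;
            while (remaining > 0) {
                long skipped = in.skip(remaining);
                if (skipped <= 0) {
                    break; // stream ended before streamPos
                }
                remaining -= skipped;
            }
            int filled = 0;
            while (filled < len) {
                int n = in.read(dest, arrayOff + filled, len - filled);
                if (n == -1) {
                    break;
                }
                filled += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return dest;
    }
}
```

Calling readAt(new byte[] {1, 2, 3, 4, 5}, 2, 1, 3, 5) returns {0, 3, 4, 5, 0}: two bytes of the stream were skipped, and the next three were written into the array starting at index 1.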