Read file segment via Random Access File - java

I'm using RandomAccessFile for writing segments, and now I want to read some file segments back, but I'm having problems at the end of the read.
For example, I want to read one file page (each page contains 512 bytes).
var totalRead = 0;
var readingByte = 0;
var bytesToRead = 512; // Each file page - 512 bytes
var randomAccessFile = new RandomAccessFile(dbmsFile, "rw");
randomAccessFile.seek(pageId * PAGE_SIZE); // Start reading from chosen page (by pageId)
var stringRepresentation = new StringBuilder();
while (totalRead < bytesToRead) {
    readingByte = randomAccessFile.read();
    totalRead += readingByte;
    stringRepresentation.append((char) readingByte);
}
But this approach is not right, because it actually reads only a small part of each page, not the full page. 512 bytes is around 41 file records, and since I'm parsing it symbol by symbol, it cannot be correct. How can I do it better?

Your code is adding the value of each byte to totalRead rather than incrementing it by 1, so it will count to 512 much faster than expected and miss a section of data. The loop should also check for and exit when randomAccessFile.read() returns -1 (EOF):
while (totalRead < bytesToRead && (readingByte = randomAccessFile.read()) != -1) {
    totalRead++;
    stringRepresentation.append((char) readingByte);
}
Note that this code may not handle all byte-to-char conversions correctly, as it simply casts each byte to a char.
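If you always read whole pages, a simpler alternative is to read the full 512 bytes in one call and decode them with an explicit charset. A minimal sketch, reusing dbmsFile, pageId and PAGE_SIZE from the question (the US_ASCII charset from java.nio.charset.StandardCharsets is an assumption about the record format):
// Read one full page in a single call instead of byte by byte.
try (RandomAccessFile raf = new RandomAccessFile(dbmsFile, "r")) {
    raf.seek(pageId * PAGE_SIZE);
    byte[] page = new byte[PAGE_SIZE];
    raf.readFully(page); // throws EOFException if the page is truncated
    String stringRepresentation = new String(page, StandardCharsets.US_ASCII);
}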

Related

Base64 encode file by chunks

I want to split a file into multiple chunks (in this case, trying lengths of 300) and base64 encode it, since loading the entire file into memory gives a negative array size exception when base64 encoding it. I tried using the following code:
int offset = 0;
bis = new BufferedInputStream(new FileInputStream(f));
while (offset + 300 <= f.length()) {
    byte[] temp = new byte[300];
    bis.skip(offset);
    bis.read(temp, 0, 300);
    offset += 300;
    System.out.println(Base64.encode(temp));
}
if (offset < f.length()) {
    byte[] temp = new byte[(int) f.length() - offset];
    bis.skip(offset);
    bis.read(temp, 0, temp.length);
    System.out.println(Base64.encode(temp));
}
At first it appears to work; however, at some point it switches to just printing out "AAAAAAAAA", fills the entire console with it, and the new file is corrupted when decoded. What could be causing this error?
skip() "Skips over and discards n bytes of data from the input stream", and read() returns "the number of bytes read".
So, you read some bytes, skip some bytes, read some more, skip, ... eventually reaching EOF, at which point read() returns -1; but you ignore that and use the content of temp, which contains all 0's, which are then encoded to all A's.
Your code should be:
try (InputStream in = new BufferedInputStream(new FileInputStream(f))) {
    int len;
    byte[] temp = new byte[300];
    while ((len = in.read(temp)) > 0)
        System.out.println(Base64.encode(temp, 0, len));
}
This code reuses the single buffer allocated before the loop, so it will also cause much less garbage collection than your code.
If Base64.encode doesn't have a 3 parameter version, do this:
try (InputStream in = new BufferedInputStream(new FileInputStream(f))) {
    int len;
    byte[] temp = new byte[300];
    while ((len = in.read(temp)) > 0) {
        byte[] data;
        if (len == temp.length)
            data = temp;
        else {
            data = new byte[len];
            System.arraycopy(temp, 0, data, 0, len);
        }
        System.out.println(Base64.encode(data));
    }
}
Be sure to use a buffer size that is a multiple of 3 for encoding and a multiple of 4 for decoding when using chunks of data.
300 fulfills both, so that is already OK. Just a note for anyone trying different buffer sizes.
Keep in mind that reading from a stream into a buffer can in some circumstances result in the buffer not being fully filled even though the end of the stream has not yet been reached. This might happen when reading from a network stream and a timeout occurs.
You can handle that too, but taking it into account would lead to much more complex code that would no longer be educational.
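As a side note, if you are on Java 8 or later, the standard java.util.Base64 API covers the "no 3-parameter version" case directly. A minimal sketch under that assumption (the wrapping method name is mine):
import java.io.*;
import java.util.Arrays;
import java.util.Base64;

// Chunked Base64 encoding with the standard JDK encoder (Java 8+).
static void encodeInChunks(File f) throws IOException {
    try (InputStream in = new BufferedInputStream(new FileInputStream(f))) {
        int len;
        byte[] temp = new byte[300]; // multiple of 3, so chunks concatenate cleanly
        while ((len = in.read(temp)) > 0) {
            // encodeToString() needs an exactly-sized array, hence copyOf for the last chunk
            byte[] data = (len == temp.length) ? temp : Arrays.copyOf(temp, len);
            System.out.println(Base64.getEncoder().encodeToString(data));
        }
    }
}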

Java Read File Larger than 2 GB (Using Chunking)

I'm implementing a file transfer server, and I've run into an issue with sending a file larger than 2 GB over the network. The issue starts when I get the File I want to work with and try to read its contents into a byte[]. I have a for loop:
for (long i = 0; i < fileToSend.length(); i += PACKET_SIZE) {
    fileBytes = getBytesFromFile(fileToSend, i);
where getBytesFromFile() reads a PACKET_SIZE amount of bytes from fileToSend which is then sent to the client in the for loop. getBytesFromFile() uses i as an offset; however, the offset variable in FileInputStream.read() has to be an int. I'm sure there is a better way to read this file into the array, I just haven't found it yet.
I would prefer to not use NIO yet, although I will switch to using that in the future. Indulge my madness :-)
It doesn't look like you're reading data from the file properly. When reading data from a stream in Java, it's standard practice to read data into a buffer. The size of the buffer can be your packet size.
File fileToSend = //...
InputStream in = new FileInputStream(fileToSend);
OutputStream out = //...
byte[] buffer = new byte[PACKET_SIZE];
int read;
while ((read = in.read(buffer)) != -1) {
    out.write(buffer, 0, read);
}
in.close();
out.close();
Note that the size of the buffer array remains constant. But if the buffer cannot be filled (such as when it reaches the end of the file), the remaining elements of the array will still contain data from the last packet, so you must ignore those elements (this is what the out.write() line in my code sample does).
Also, realize that your handling of the variable i is not correct:
Iteration 0: i=0
Iteration 1: i=PACKET_SIZE
...
...
Iteration n: i=PACKET_SIZE*n
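If you do want to keep the getBytesFromFile(fileToSend, i) shape from the question for files over 2 GB, RandomAccessFile accepts a long offset via seek(). A possible sketch (the method body is my assumption, not the asker's original; PACKET_SIZE is the question's constant):
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

// Reads up to PACKET_SIZE bytes starting at a long offset.
static byte[] getBytesFromFile(File file, long offset) throws IOException {
    try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
        raf.seek(offset); // seek() takes a long, unlike FileInputStream.read()'s int offset
        byte[] chunk = new byte[(int) Math.min(PACKET_SIZE, file.length() - offset)];
        raf.readFully(chunk);
        return chunk;
    }
}
Opening the file once and reusing one RandomAccessFile across packets would be more efficient than reopening it per chunk as this sketch does.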

Load text file to memory in Java

I have a wiki.txt file and its size is 50 MB.
I need to do several things with the file, so I thought that the best way in terms of performance is to load the file into memory. Is that correct?
This is the code that I have written:
File file = new File("wiki.txt");
FileInputStream fileInputStream = new FileInputStream(file);
FileChannel fileChannel = fileInputStream.getChannel();
MappedByteBuffer mapByteBuffer = fileChannel.map(FileChannel.MapMode.READ_ONLY, 0, file.length());
System.out.println((char)mapByteBuffer.get());
I get an error on this code: mapByteBuffer.get().
I tried the get() function a few ways, but all of them gave an error, and e.getMessage() just returned null.
Another important thing to note: my text file contains English words, and what I need to do is search whether an expression exists in this text file.
Thank you.
I would suggest using a memory-mapped file (MappedByteBuffer), to read the file directly from disk instead of loading it all into heap memory.
RandomAccessFile file = new RandomAccessFile("wiki.txt", "r");
FileChannel channel = file.getChannel();
// A channel opened from a "r" file must be mapped READ_ONLY; map the whole file.
MappedByteBuffer buf = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
And then you can read the buffer as usual.
My answer for point (1):
It depends on what you want to do with the file. If your processing doesn't involve rewinding (looking back at what was already read), it's best to just read it as a stream and process it in one go, instead of loading it all into memory.
Even if you need random access across the file, you may also be interested in doing block file operations, because your solution may not scale well when the file grows bigger.
Use RandomAccessFile if you are on Java 1.4 or above.
For random access, the operating system usually handles file buffer caching quite well, so you don't have to handle it yourself.
It is important to read the whole error, not just the message. Often the real information is in the exception's name, not the text associated with it.
You will get an error if the file is empty as there is no first byte.
Note: the approach you are using assumes ASCII 7-bit characters. If you want to assume ISO-8859-1 characters you can use (char) (byteBuffer.get() & 0xFF)
However, if you have plain text you may find that using strings is simpler and not much slower. E.g. you can read a 50 MB file as text in less than a second. I would only use a memory-mapped file if that is far too slow.
I would suggest using BufferedReader. It is much faster and requires relatively fewer resources.
First read number of lines:
InputStream is = new BufferedInputStream(new FileInputStream(filename));
byte[] chars = new byte[1024];
int numberOfChars = 0;
int count = 0;
while ((numberOfChars = is.read(chars)) != -1) {
    for (int i = 0; i < numberOfChars; ++i) {
        if (chars[i] == '\n' && numberOfChars - i != 1) {
            ++count;
        }
    }
}
count++;
return count; // number of lines
Then read the lines:
BufferedReader in = new BufferedReader(new FileReader(fileName));
for (int i = 0; i < endLine; i++) {
    String oneLine = in.readLine();
}
In these strings you can even search for what you need.
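For what it's worth, on Java 8+ the standard library can do both steps (load into memory and search) in a couple of lines. A minimal sketch, where "expression" is a placeholder for whatever you are searching:
// java.nio.file.Files loads the whole 50 MB file as a List of lines.
List<String> lines = Files.readAllLines(Paths.get("wiki.txt"), StandardCharsets.UTF_8);
boolean found = lines.stream().anyMatch(line -> line.contains("expression"));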

java: error checking php output

Hi, I have a problem I'm not able to solve.
In my Android/Java application I call a script, download.php. Basically it outputs a file that I download and save on my device. I had to add a control to all my PHP scripts that basically consists in sending a token to the script and checking whether it's valid. If it's a valid token I get the output (in this case a file; in the other scripts, a JSON file); if it's not, I get back the string "false".
To check this condition in my other Java files I used the IOUtils method to turn the input stream into a String, check it, and then
InputStream newInputStream = new ByteArrayInputStream(mystring.getBytes("UTF-8"));
to get a valid input stream again and read it. It works with my JSON files, but not in this case. I get this error:
11-04 16:50:31.074: ERROR/AndroidRuntime(32363):
java.lang.OutOfMemoryError
when I try IOUtils.toString(inputStream, "UTF-8");
I think it's because in this case I'm trying to download a really large file.
fileOutput = new BufferedOutputStream(new FileOutputStream(file, false));
inputStream = new BufferedInputStream(conn.getInputStream());
String result = IOUtils.toString(inputStream, "UTF-8");
if (result.equals("false")) {
    return false;
} else {
    Reader r = new InputStreamReader(MyMethods.stringToInputStream(result));
    int totalSize = conn.getContentLength();
    int downloadedSize = 0;
    byte[] buffer = new byte[1024];
    int bufferLength = 0;
    while ((bufferLength = inputStream.read(buffer)) > 0) {
        fileOutput.write(buffer, 0, bufferLength);
        downloadedSize += bufferLength;
    }
    fileOutput.flush();
    fileOutput.close();
}
Don't read the stream as a string to start with. Keep it as binary data, and start off by just reading the first 5 bytes. You can then check whether those 5 bytes are the 5 bytes used to encode "false" in UTF-8, and act accordingly if so. Otherwise, write those 5 bytes to the output file and then do the same looping/reading/writing as before. Note that to read those 5 bytes you may need to loop (however unlikely that seems). Perhaps your IOUtils class has something to say "read at least 5 bytes"? Will the real content ever be smaller than 5 bytes?
To be honest, it would be better if you could use a header in the response to indicate the different result, instead of just a body with "false" - are you in control of the PHP script?
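A rough sketch of that idea, reusing the variable names from the question (the short-response handling is an assumption; Arrays.equals comes from java.util.Arrays):
byte[] falseBytes = "false".getBytes("UTF-8"); // the 5 sentinel bytes
byte[] head = new byte[falseBytes.length];
int headLen = 0, n;
// read() may return fewer bytes than requested, so loop until 5 bytes or EOF
while (headLen < head.length
        && (n = inputStream.read(head, headLen, head.length - headLen)) != -1) {
    headLen += n;
}
// Caveat: a real file that merely starts with these 5 bytes would be misdetected.
if (headLen == head.length && Arrays.equals(head, falseBytes)) {
    return false; // server rejected the token
}
// Not "false": write the bytes already consumed, then stream the rest to disk.
fileOutput.write(head, 0, headLen);
byte[] buffer = new byte[1024];
while ((n = inputStream.read(buffer)) > 0) {
    fileOutput.write(buffer, 0, n);
}
fileOutput.flush();
fileOutput.close();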

Java - Comparing bytes in files that have weird contents

I have a database dump program that writes out flat files of a table in a very specific format. I now need to test this against our old program and confirm the produced files are identical. Doing this manually is painful, so I need to write some unit tests.
I need to compare the contents of the two files byte by byte and find the first difference. The issue is they have all manner of crazy bytes, with CR/LF/nulls etc. littered throughout.
Here is a screenshot of the two files from SciTE to give you an idea:
http://imageshack.us/photo/my-images/840/screenshot1xvt.png/
What is the best strategy for confirming each byte corresponds?
Apache Commons IO has a FileUtils.contentEquals(File file1, File file2) method that seems to do what you want. Pros:
Looks efficient: it reads the file contents using a buffered stream, and doesn't even open the files if the lengths differ.
Convenient.
Con:
Won't give you details about where the differences are. It sounds from your comment like you want this.
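Usage is a one-liner, assuming Commons IO is on the classpath (the file names here are placeholders):
// org.apache.commons.io.FileUtils; throws IOException on read errors.
boolean identical = FileUtils.contentEquals(new File("dump_old.dat"), new File("dump_new.dat"));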
I would say your best bet is to just download the source code, see what they're doing, and then enhance it to print out the line numbers. The hard part will be figuring out which line you're on. By reading at the byte level, you will have to explicitly check for \r, \n, or \r\n and then increment your own "line number" counter. I also don't know what kind of i18n issues (if any) you'll run into.
class DominicFile {
    static boolean equalfiles(File f1, File f2) throws IOException {
        byte[] b1 = getBytesFromFile(f1);
        byte[] b2 = getBytesFromFile(f2);
        if (b1.length != b2.length) return false;
        for (int i = 0; i < b1.length; i++) {
            if (b1[i] != b2[i]) return false;
        }
        return true;
    }

    // returns the index (0 indexed) of the first difference, or -1 if identical
    // fails for files 2G or more due to limitations of "int"... use long if needed
    static int firstDiffBetween(File f1, File f2) throws IOException {
        byte[] b1 = getBytesFromFile(f1);
        byte[] b2 = getBytesFromFile(f2);
        int shortest = b1.length;
        if (b2.length < shortest) shortest = b2.length;
        for (int i = 0; i < shortest; i++) {
            if (b1[i] != b2[i]) return i;
        }
        return -1;
    }

    // Returns the contents of the file in a byte array.
    // shamelessly stolen from http://www.exampledepot.com/egs/java.io/file2bytearray.html
    public static byte[] getBytesFromFile(File file) throws IOException {
        InputStream is = new FileInputStream(file);
        // Get the size of the file
        long length = file.length();
        // You cannot create an array using a long type.
        // It needs to be an int type.
        // Before converting to an int type, check
        // to ensure that file is not larger than Integer.MAX_VALUE.
        if (length > Integer.MAX_VALUE) {
            is.close();
            throw new IOException("File is too large: " + file.getName());
        }
        // Create the byte array to hold the data
        byte[] bytes = new byte[(int) length];
        // Read in the bytes
        int offset = 0;
        int numRead = 0;
        while (offset < bytes.length
                && (numRead = is.read(bytes, offset, bytes.length - offset)) >= 0) {
            offset += numRead;
        }
        // Ensure all the bytes have been read in
        if (offset < bytes.length) {
            is.close();
            throw new IOException("Could not completely read file " + file.getName());
        }
        // Close the input stream and return bytes
        is.close();
        return bytes;
    }
}
Why not do an MD5 checksum, like the one described here?
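A checksum only tells you whether the files differ, not where the first difference is, but it is a cheap first pass. A minimal sketch using only the JDK (it reads whole files into memory, so it suits files that fit in the heap; the method name is mine):
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

// Compare two files by the MD5 digest of their full contents.
static boolean sameMd5(String path1, String path2) throws IOException, NoSuchAlgorithmException {
    byte[] d1 = MessageDigest.getInstance("MD5").digest(Files.readAllBytes(Paths.get(path1)));
    byte[] d2 = MessageDigest.getInstance("MD5").digest(Files.readAllBytes(Paths.get(path2)));
    return Arrays.equals(d1, d2);
}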
