Load text file to memory in Java - java

I have a file, wiki.txt, that is 50 MB in size.
I need to perform several operations on the file, so I thought the best approach in terms of performance would be to load it into memory. Is that correct?
This is the code I have written:
File file = new File("wiki.txt");
FileInputStream fileInputStream = new FileInputStream(file);
FileChannel fileChannel = fileInputStream.getChannel();
MappedByteBuffer mapByteBuffer = fileChannel.map(FileChannel.MapMode.READ_ONLY, 0, file.length());
System.out.println((char)mapByteBuffer.get());
I get an error at mapByteBuffer.get().
I tried several variants of get(), but all of them fail, and e.getMessage() does not give me an error message; it just returns null.
Another important point: my text file contains English words, and what I need to do is search whether a given expression exists in the file.
Thank you.

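I would suggest using a memory-mapped file, which reads the file directly from disk instead of loading it all into memory.
RandomAccessFile file = new RandomAccessFile("wiki.txt", "r");
FileChannel channel = file.getChannel();
// The file was opened read-only, so the mapping must be READ_ONLY as well;
// READ_WRITE would throw NonWritableChannelException.
// Map the whole file (channel.size()) rather than only the first 50 KB.
MappedByteBuffer buf = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
And then you can read the buffer as usual.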
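As a rough illustration of reading a mapped buffer to answer the "does this expression exist" part of the question, here is a minimal sketch. The class name MappedSearch and the method indexOf are made up for this example, and it assumes single-byte (ISO-8859-1 or ASCII) text and a file smaller than 2 GB, since one mapping cannot exceed Integer.MAX_VALUE bytes.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, not part of the original answer.
final class MappedSearch {

    // Returns the byte offset of the first occurrence of phrase, or -1 if it is absent.
    // Assumption: the text uses a single-byte encoding and the file is under 2 GB.
    static long indexOf(Path file, String phrase) throws IOException {
        byte[] needle = phrase.getBytes(StandardCharsets.ISO_8859_1);
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer buf = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            int last = buf.limit() - needle.length;
            // Naive scan using absolute get(index), which never moves the buffer position.
            for (int i = 0; i <= last; i++) {
                int j = 0;
                while (j < needle.length && buf.get(i + j) == needle[j]) {
                    j++;
                }
                if (j == needle.length) {
                    return i;
                }
            }
            return -1;
        }
    }
}
```

The scan is deliberately naive; for a 50 MB file and a short phrase that is usually acceptable, but a smarter string-search algorithm could be swapped in.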

My answers for point (1):
It depends on what you want to do with the file. If your processing never needs to rewind (look back at what was already read), it is best to read the file as a stream and process it in one pass instead of loading it all into memory.
Even if you need random access across the file, you may want to work on it block by block, because loading everything may not scale well when the file grows.
Use RandomAccessFile for that, or a FileChannel if you are on Java 1.4 or above.
For random access, the operating system usually handles file buffer caching quite well, so you don't have to handle it yourself.
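A minimal sketch of the one-pass streaming approach, assuming the search is line-oriented (a phrase that spans a line break would be missed) and the text is ISO-8859-1. The class and method names are invented for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of the original answer.
final class StreamSearch {

    // Counts the lines that contain phrase while holding only one line in memory at a time.
    static long countMatchingLines(Path file, String phrase) throws IOException {
        long matches = 0;
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.ISO_8859_1)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.contains(phrase)) {
                    matches++;
                }
            }
        }
        return matches;
    }
}
```

If several different searches are needed, this rereads the file each time, which is where loading it once (or relying on the OS file cache) starts to pay off.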

It is important to read the whole error, not just the message. Often the real information is in the exception's class name, not the text associated with it (for example, BufferUnderflowException carries no message, so getMessage() returns null).
You will get an error if the file is empty, as there is no first byte.
Note: the approach you are using assumes 7-bit ASCII characters. If you want to assume ISO-8859-1 characters, you can use (char) (byteBuffer.get() & 0xFF)
However, if you have plain text, you may find that strings are simpler to use and not much slower; e.g. you can read a 50 MB file as text in less than a second. I would only use a memory-mapped file if that is far too long.
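To make the "just use strings" option concrete, here is a small sketch that loads the whole file into a String and searches it. It assumes ISO-8859-1 text and enough heap for roughly twice the file size; the class name InMemorySearch is invented for this example.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of the original answer.
final class InMemorySearch {

    // Reads the entire file and decodes it as ISO-8859-1 (one char per byte).
    static String load(Path file) throws IOException {
        return new String(Files.readAllBytes(file), StandardCharsets.ISO_8859_1);
    }

    // True if the phrase occurs anywhere in the loaded text.
    static boolean containsPhrase(String text, String phrase) {
        return text.contains(phrase);
    }
}
```

Loading once and calling containsPhrase repeatedly keeps every later search in memory, which matches the "several things on the file" use case from the question.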

I would suggest using BufferedReader. It is fast and needs relatively few resources.
First count the number of lines:
InputStream is = new BufferedInputStream(new FileInputStream(filename));
byte[] chars = new byte[1024];
int numberOfChars = 0;
int count = 0;
int last = '\n'; // last byte seen so far; '\n' means "no unterminated line pending"
while ((numberOfChars = is.read(chars)) != -1)
{
    for (int i = 0; i < numberOfChars; ++i)
    {
        if (chars[i] == '\n')
        {
            ++count;
        }
    }
    if (numberOfChars > 0)
    {
        last = chars[numberOfChars - 1];
    }
}
is.close();
if (last != '\n')
{
    count++; // the final line has no trailing newline
}
return count; // number of lines
Then read the lines:
BufferedReader in = new BufferedReader(new FileReader(fileName));
for (int i = 0; i < endLine; i++)
{
String oneLine = in.readLine();
}
You can then search these strings for what you need.

Related

Read faster a file & convert it into HEX

I need to read an ASCII file and convert it to hex before applying some functions (searching for a specific character).
To do this, I read the file, convert it to hex and write the result into a new file. Then I open the new hex file and apply my functions.
My issue is that reading and converting takes far too long (approx. 8 s for a 9 MB file).
My reading method is:
public static void convertToHex2(PrintStream out, File file) throws IOException {
BufferedInputStream bis = new BufferedInputStream(new FileInputStream(file));
int value = 0;
StringBuilder sbHex = new StringBuilder();
StringBuilder sbResult = new StringBuilder();
while ((value = bis.read()) != -1) {
sbHex.append(String.format("%02X ", value));
}
sbResult.append(sbHex);
out.print(sbResult);
bis.close();
}
Do you have any suggestions to make it faster?
Did you measure what your actual bottleneck is? You read only a single byte per loop iteration and process it each time. You might as well read larger chunks of data and process those, e.g. by reading into a byte array. That way you would benefit more from the optimized reads of your OS, the file system, their caches etc.
Additionally, you fill sbHex and then append it to sbResult just to print it. That looks like an unnecessary copy to me, because sbResult is always empty at that point and sbHex is already a StringBuilder you can pass to your PrintStream.
Try this:
static String[] xx = new String[256];
static {
for( int i = 0; i < 256; ++i ){
xx[i] = String.format("%02X ", i);
}
}
and use it:
sbHex.append(xx[value]);
Formatting is a heavy operation: it does not only do the conversion, it also has to parse the format string.
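Putting the two suggestions together (chunked reads plus a precomputed table), a sketch could look like the following. The class name HexDump, the 8192-byte chunk size and the InputStream-based signature are choices made for this example, not something from the question.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.PrintStream;

// Hypothetical helper combining both answers above.
final class HexDump {

    // Precomputed "XX " strings so String.format runs 256 times in total, not once per byte.
    private static final String[] HEX = new String[256];
    static {
        for (int i = 0; i < 256; i++) {
            HEX[i] = String.format("%02X ", i);
        }
    }

    // Reads the stream in chunks and prints each chunk as space-separated hex pairs.
    static void convertToHex(InputStream in, PrintStream out) throws IOException {
        byte[] chunk = new byte[8192];
        StringBuilder sb = new StringBuilder(chunk.length * 3);
        int n;
        while ((n = in.read(chunk)) != -1) {
            sb.setLength(0); // reuse the builder instead of growing it to the whole file
            for (int i = 0; i < n; i++) {
                sb.append(HEX[chunk[i] & 0xFF]); // mask to get an unsigned table index
            }
            out.print(sb);
        }
    }
}
```

Printing per chunk also avoids holding a StringBuilder three times the size of the input, which may matter for larger files.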

How to read a large json from a text file in android app?

I have a text file in my Android app which contains JSON. I need to read and parse that JSON. The file size is 21 MB. I am using the following code to read the file:
StringBuilder stringBuilder = new StringBuilder();
InputStream input = getAssets().open(filename);
int size = input.available();
byte[] buffer = new byte[size];
byte[] tempBuffer = new byte[1024];
int tempBufferIndex = 0;
for(int i=0; i<size; i++){
if(i == 0){
tempBuffer[tempBufferIndex] = buffer[i];
}else{
int mod = 1024 % i;
if(mod == 0){
input.read(tempBuffer);
stringBuilder.append(new String(tempBuffer));
tempBufferIndex = 0;
}
tempBuffer[tempBufferIndex] = buffer[i];
}
}
input.close();
In the real case, size is 20949874. After the loop finishes, the stringBuilder length is always 11264, even if I change the range of the for loop. I tried to build one String from the InputStream without a loop, but that always gives me an OutOfMemoryError. I also get "Grow heap (frag case) to 26.668MB for 20949890-byte allocation" in my logs. I searched here and tried different solutions but could not make it work. Any idea how I should solve this issue? Thanks in advance.
For big JSON files you should use a streaming (SAX-style) parser rather than a DOM-style one, for example JsonReader.
DOM (“Document Object Model”) loads the entire content into memory and lets the developer query the data as they wish. SAX presents the data as a stream: the developer waits for the desired pieces of data to appear and saves only the parts they need. DOM is considered easier to use, but SAX uses much less memory.
You can also try to split the file into several parts, so that the app hopefully does not run out of memory during processing.
You should also consider using the "largeHeap" flag in your manifest
(See http://developer.android.com/guide/topics/manifest/application-element.html)
I don't know your file, but if you use shorter JSON keys you may be able to reduce its size as well.
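Independently of the parser choice, the constant length of 11264 seems to follow from the loop in the question itself: the array named buffer is never filled from the stream, and 1024 % i is 0 only when i divides 1024, which happens for 11 values (1, 2, 4, ..., 1024). Each of those iterations reads and appends one 1024-byte block, giving 11 * 1024 = 11264. A conventional read loop looks roughly like the sketch below; the class name and the UTF-8 assumption are mine. It still holds the whole document in memory, so for a 21 MB file on a small heap the streaming parser remains the safer route.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper showing a conventional chunked read loop.
final class StreamText {

    // Reads the stream to its end and decodes the bytes once, as UTF-8 (an assumption).
    static String readAll(InputStream input) throws IOException {
        ByteArrayOutputStream collected = new ByteArrayOutputStream();
        byte[] chunk = new byte[1024];
        int n;
        // read() reports how many bytes it actually delivered; only those are copied.
        while ((n = input.read(chunk)) != -1) {
            collected.write(chunk, 0, n);
        }
        return new String(collected.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

Decoding once at the end, rather than calling new String(...) per chunk, avoids splitting a multi-byte character across two chunks.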

What method is more efficient for concatenating large files in Java using FileChannels

I want to find out what method is better of two that I have come up with for concatenating my text files in Java. If someone has some insight they can share about what goes on at the kernel level that explains the difference between these methods of writing to a FileChannel, I would greatly appreciate it.
From what I understand from documentation and other Stack Overflow conversations, the allocateDirect allocates space right on the drive, and mostly avoids using RAM. I have a concern that the ByteBuffer created with allocateDirect might have a potential to overflow or not be allocated if the File infile is large, say 1GB. I am guaranteed at this point in the development of our software that the File will be no larger than 2 GB; but there is potential in the future that it might be as big as 10 or 20GB.
I have observed that the transferFrom loop never goes through the loop more than once... so it seems to succeed in writing the entire infile at once; but I haven't tested it with files bigger than 60MB. I looped though, because the documentation specifies that there is no guarantee of how much will be written at once. With transferFrom only able to accept, on my system, an int32 as its count parameter, I won't be able to specify more than 2GB at a time be transferred... Again, kernel expertise would help me understand.
Thanks in advance for your help!!
Using a ByteBuffer:
boolean concatFiles(StringBuffer sb, File infile, File outfile) {
FileChannel inChan = null, outChan = null;
try {
ByteBuffer buff = ByteBuffer.allocateDirect((int)(infile.length() + sb.length()));
//write the stringBuffer so it goes in the output file first:
buff.put(sb.toString().getBytes());
//create the FileChannels:
inChan = new RandomAccessFile(infile, "r" ).getChannel();
outChan = new RandomAccessFile(outfile, "rw").getChannel();
//read the infile in to the buffer:
inChan.read(buff);
// prep the buffer:
buff.flip();
// write the buffer out to the file via the FileChannel:
outChan.write(buff);
inChan.close();
outChan.close();
} catch...etc
}
Using transferTo (or transferFrom):
boolean concatFiles(StringBuffer sb, File infile, File outfile) {
FileChannel inChan = null, outChan = null;
try {
//write the stringBuffer so it goes in the output file first:
PrintWriter fw = new PrintWriter(outfile);
fw.write(sb.toString());
fw.flush();
fw.close();
// create the channels appropriate for appending:
outChan = new FileOutputStream(outfile, true).getChannel();
inChan = new RandomAccessFile(infile, "r").getChannel();
long startSize = outfile.length();
long inFileSize = infile.length();
long bytesWritten = 0;
//set the position where we should start appending the data:
outChan.position(startSize);
long startByte = outChan.position();
while(bytesWritten < inFileSize){
// transfer only what is still missing, continuing right after the bytes already written
bytesWritten += outChan.transferFrom(inChan, startByte + bytesWritten, inFileSize - bytesWritten);
}
inChan.close();
outChan.close();
} catch ... etc
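transferTo() can be far more efficient, as there is less data copying, or none if it can all be done in the kernel. And if that isn't possible on your platform, it will still use highly tuned code.
You do need the loop; one day it will iterate and your code will keep working.
Note that allocateDirect() allocates native memory outside the Java heap, not space on the drive, so a buffer sized to the whole file still needs that much RAM, and its capacity is an int, so it cannot hold more than 2 GB. The count parameter of transferTo() and transferFrom() is a long.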
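For reference, a transferTo() version with the loop in place might look like this sketch. The class name ChannelConcat and the byte[] header parameter (standing in for the StringBuffer in the question) are assumptions made for this example.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, loosely following the question's concatFiles.
final class ChannelConcat {

    // Writes header to outfile, then appends the full contents of infile.
    static void concat(byte[] header, Path infile, Path outfile) throws IOException {
        try (FileChannel inChan = FileChannel.open(infile, StandardOpenOption.READ);
             FileChannel outChan = FileChannel.open(outfile,
                     StandardOpenOption.CREATE, StandardOpenOption.WRITE,
                     StandardOpenOption.TRUNCATE_EXISTING)) {
            ByteBuffer head = ByteBuffer.wrap(header);
            while (head.hasRemaining()) {
                outChan.write(head); // a single write may be partial, so loop
            }
            long size = inChan.size();
            long transferred = 0;
            // transferTo may move fewer bytes than requested; continue from where it stopped.
            while (transferred < size) {
                transferred += inChan.transferTo(transferred, size - transferred, outChan);
            }
        }
    }
}
```

Because positions and counts are long values, the same loop works unchanged for inputs larger than 2 GB.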

Reading and writing binary file in Java (seeing half of the file being corrupted)

I have some working code in python that I need to convert to Java.
I have read quite a few threads on this forum but could not find an answer. I am reading in a JPG image and converting it into a byte array. I then write this buffer to a different file. When I compare the written files from the Java and Python code, the bytes at the end do not match. Please let me know if you have a suggestion. I need to use the byte array to pack the image into a message that needs to be sent to a remote server.
Java code (Running on Android)
Reading the file:
File queryImg = new File(ImagePath);
int imageLen = (int)queryImg.length();
byte [] imgData = new byte[imageLen];
FileInputStream fis = new FileInputStream(queryImg);
fis.read(imgData);
Writing the file:
FileOutputStream f = new FileOutputStream(new File("/sdcard/output.raw"));
f.write(imgData);
f.flush();
f.close();
Thanks!
InputStream.read is not guaranteed to read any particular number of bytes and may read less than you asked it to. It returns the actual number read so you can have a loop that keeps track of progress:
public void pump(InputStream in, OutputStream out, int size) throws IOException {
byte[] buffer = new byte[4096]; // Or whatever constant you feel like using
int done = 0;
while (done < size) {
int read = in.read(buffer);
if (read == -1) {
throw new IOException("Something went horribly wrong");
}
out.write(buffer, 0, read);
done += read;
}
// Maybe put cleanup code in here if you like, e.g. in.close, out.flush, out.close
}
I believe Apache Commons IO has classes for doing this kind of stuff so you don't need to write it yourself.
Your file length might be more than an int can hold, and then you end up with the wrong array length and do not read the entire file into the buffer.
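If the goal is to keep the whole image in a byte array, as in the question, the same idea can be applied to the array itself: keep calling read until the array is full. A minimal sketch, with the class name StreamUtil invented for this example:

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper, not taken from any library.
final class StreamUtil {

    // Reads exactly length bytes, looping because a single read() may deliver fewer.
    static byte[] readFully(InputStream in, int length) throws IOException {
        byte[] data = new byte[length];
        int offset = 0;
        while (offset < length) {
            int n = in.read(data, offset, length - offset);
            if (n == -1) {
                throw new EOFException("Stream ended after " + offset + " of " + length + " bytes");
            }
            offset += n;
        }
        return data;
    }
}
```

In the question's code this would replace the single fis.read(imgData) call, e.g. byte[] imgData = StreamUtil.readFully(fis, imageLen).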

Inserting text into an existing file via Java

I would like to create a simple program (in Java) which edits text files - particularly one which performs inserting arbitrary pieces of text at random positions in a text file. This feature is part of a larger program I am currently writing.
Reading the description of java.io.RandomAccessFile, it appears that any write operation performed in the middle of a file would actually overwrite the existing content. This is a side effect which I would like to avoid (if possible).
Is there a simple way to achieve this?
Thanks in advance.
Okay, this question is pretty old, but FileChannels have existed since Java 1.4 and I don't know why they aren't mentioned anywhere when dealing with the problem of replacing or inserting content in files. FileChannels are fast, use them.
Here's an example (ignoring exceptions and some other stuff):
public void insert(String filename, long offset, byte[] content) {
RandomAccessFile r = new RandomAccessFile(new File(filename), "rw");
RandomAccessFile rtemp = new RandomAccessFile(new File(filename + "~"), "rw");
long fileSize = r.length();
FileChannel sourceChannel = r.getChannel();
FileChannel targetChannel = rtemp.getChannel();
sourceChannel.transferTo(offset, (fileSize - offset), targetChannel);
sourceChannel.truncate(offset);
r.seek(offset);
r.write(content);
long newOffset = r.getFilePointer();
targetChannel.position(0L);
sourceChannel.transferFrom(targetChannel, newOffset, (fileSize - offset));
sourceChannel.close();
targetChannel.close();
}
Well, no, I don't believe there is a way to avoid overwriting existing content with a single, standard Java IO API call.
If the files are not too large, just read the entire file into an ArrayList (an entry per line) and either rewrite entries or insert new entries for new lines.
Then overwrite the existing file with new content, or move the existing file to a backup and write a new file.
Depending on how sophisticated the edits need to be, your data structure may need to change.
Another method would be to read characters from the existing file while writing to the edited file and edit the stream as it is read.
If Java has a way to memory map files, then what you can do is extend the file to its new length, map the file, memmove all the bytes down to the end to make a hole and write the new data into the hole.
This works in C. Never tried it in Java.
Another way I just thought of to do the same but with random file access.
Seek to the end - 1 MB
Read 1 MB
Write that to original position + gap size.
Repeat for each previous 1 MB working toward the beginning of the file.
Stop when you reach the desired gap position.
Use a larger buffer size for faster performance.
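The steps above could be sketched in Java with RandomAccessFile roughly as follows. The class name GapInsert and the blockSize parameter are made up for this example; the answer itself suggests 1 MB blocks or larger.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Path;

// Hypothetical helper implementing the back-to-front block shift described above.
final class GapInsert {

    // Inserts content at offset by moving the tail of the file toward the end, last block first.
    static void insert(Path file, long offset, byte[] content, int blockSize) throws IOException {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "rw")) {
            long oldLength = raf.length();
            if (offset < 0 || offset > oldLength) {
                throw new IllegalArgumentException("offset out of range");
            }
            int gap = content.length;
            raf.setLength(oldLength + gap); // extend the file to its new size
            byte[] block = new byte[blockSize];
            long end = oldLength;           // exclusive end of the region still to be moved
            while (end > offset) {
                int len = (int) Math.min(blockSize, end - offset);
                long start = end - len;
                raf.seek(start);
                raf.readFully(block, 0, len);
                raf.seek(start + gap);      // original position + gap size
                raf.write(block, 0, len);
                end = start;                // continue with the previous block
            }
            raf.seek(offset);
            raf.write(content);             // fill the hole with the new data
        }
    }
}
```

Working from the end backwards matters: each block is written only over bytes that have already been moved, so nothing is lost. Unlike the temporary-file approaches, an interruption halfway through leaves the file in an inconsistent state.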
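You can use the following code (OUtil appears to be the answerer's own helper class for closing streams; it is not shown):
BufferedReader reader = null;
BufferedWriter writer = null;
ArrayList<String> list = new ArrayList<String>();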
try {
reader = new BufferedReader(new FileReader(fileName));
String tmp;
while ((tmp = reader.readLine()) != null)
list.add(tmp);
OUtil.closeReader(reader);
list.add(0, "Start Text");
list.add("End Text");
writer = new BufferedWriter(new FileWriter(fileName));
for (int i = 0; i < list.size(); i++)
writer.write(list.get(i) + "\r\n");
} catch (Exception e) {
e.printStackTrace();
} finally {
OUtil.closeReader(reader);
OUtil.closeWriter(writer);
}
I don't know if there's a handy way to do it straight otherwise than
read the beginning of the file and write it to target
write your new text to target
read the rest of the file and write it to target.
About the target: if the files are not too big, you can build the new contents in memory and then overwrite the old file. Otherwise you can write the result to a temporary file.
This is probably easiest to do with streams; RandomAccessFile doesn't seem to be meant for inserting in the middle (afaik). Check the tutorial if you need to.
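The three steps above (copy the beginning, write the new text, copy the rest), with a temporary file as the target, can be sketched with plain streams like this. The class name TempFileInsert is invented for this example, and it assumes the temporary file may be created in the same directory as the original.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper for the stream-based approach.
final class TempFileInsert {

    // Rewrites file so that content appears at byte position offset.
    static void insert(Path file, long offset, byte[] content) throws IOException {
        Path temp = Files.createTempFile(file.toAbsolutePath().getParent(), "insert", ".tmp");
        try (InputStream in = new BufferedInputStream(Files.newInputStream(file));
             OutputStream out = new BufferedOutputStream(Files.newOutputStream(temp))) {
            copy(in, out, offset);         // 1. beginning of the file
            out.write(content);            // 2. the new text
            copy(in, out, Long.MAX_VALUE); // 3. the rest of the file
        }
        // Replace the original only after the new version is completely written.
        Files.move(temp, file, StandardCopyOption.REPLACE_EXISTING);
    }

    // Copies up to limit bytes, stopping early at end of stream.
    private static void copy(InputStream in, OutputStream out, long limit) throws IOException {
        byte[] buf = new byte[8192];
        long remaining = limit;
        while (remaining > 0) {
            int n = in.read(buf, 0, (int) Math.min(buf.length, remaining));
            if (n == -1) {
                break;
            }
            out.write(buf, 0, n);
            remaining -= n;
        }
    }
}
```

If offset lies beyond the end of the file, the new content simply ends up appended at the end.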
I believe the only way to insert text into an existing text file is to read the original file and write the content in a temporary file with the new text inserted. Then erase the original file and rename the temporary file to the original name.
This example is focused on inserting a single line into an existing file, but may still be of use to you.
If it is a text file, read the existing file into a StringBuffer and append the new content to the same StringBuffer. You can then write the StringBuffer back to the file, so the file contains both the existing and the new text.
As the edit queue for @xor_eq's answer is full, here is a new answer with a more documented and slightly improved version of it:
public static void insert(String filename, long offset, byte[] content) throws IOException {
File temp = Files.createTempFile("insertTempFile", ".temp").toFile(); // Create a temporary file to save content to
try (RandomAccessFile r = new RandomAccessFile(new File(filename), "rw"); // Open file for read & write
RandomAccessFile rtemp = new RandomAccessFile(temp, "rw"); // Open temporary file for read & write
FileChannel sourceChannel = r.getChannel(); // Channel of file
FileChannel targetChannel = rtemp.getChannel()) { // Channel of temporary file
long fileSize = r.length();
sourceChannel.transferTo(offset, (fileSize - offset), targetChannel); // Copy content after insert index to
// temporary file
sourceChannel.truncate(offset); // Remove content past insert index from file
r.seek(offset); // Goto back of file (now insert index)
r.write(content); // Write new content
long newOffset = r.getFilePointer(); // The current offset
targetChannel.position(0L); // Goto start of temporary file
sourceChannel.transferFrom(targetChannel, newOffset, (fileSize - offset)); // Copy all content of temporary
// to end of file
}
Files.delete(temp.toPath()); // Delete the temporary file as not needed anymore
}
