I have a text file with thousands of lines of data like the following:
38.48,88.25
48.20,98.11
100.24,181.39
83.01,97.33
... and the list keeps going (thousands of lines just like that).
I figured out how to separate this data into usable tokens using FileReader and Scanner, but this method is far too slow. I created the following delimiter:
src.useDelimiter(",|\n");
and then used the scanner class nextDouble() to get each piece of data.
I have done a lot of research and it looks like the solution is to use a MappedByteBuffer to place the data into memory and access it there. The problem is I don't know how to use MappedByteBuffer to separate this data into usable tokens.
I found this site: http://javarevisited.blogspot.com/2012/01/memorymapped-file-and-io-in-java.html - which helps me map the file into memory and explains how to read it, but it looks like the data is returned as bytes, or perhaps in binary form? The file I am trying to access is ASCII and I need to be able to read the data as ASCII as well. Can anyone explain how to do that? Is there a way to scan a memory-mapped file in the same way that I scanned with the previous FileReader method? Or is there another method that would be faster? My current method takes nearly 800x as long as it should.
I know some may say I am trying to reinvent the wheel but this is for academic purposes and thus, I am not allowed to use external libraries.
Thank you!
To get the data loaded into memory you can use the Scanner in the same way you did earlier, then store each row in a list like the following.
List<Pair> data = new ArrayList<Pair>();
Where Pair is defined as
class Pair {
    private final double first;
    private final double second;

    public Pair(double first, double second) {
        this.first = first;
        this.second = second;
    }
    // ...
}
MappedByteBuffer is a subclass of ByteBuffer on which you can call asCharBuffer. That returns a CharBuffer which implements Readable, which can then be supplied to Scanner.
That way you can use Scanner on the file via MappedByteBuffer. Whether that makes it perform any faster I don't know.
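A minimal sketch of wiring a Scanner to a memory-mapped file, assuming the file is called data.txt (an assumption). One caveat: asCharBuffer() would interpret the raw bytes as two-byte UTF-16 chars, so for an ASCII file decoding the buffer with a Charset is the safer way to obtain a Readable for Scanner:
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.Scanner;

public class MappedScan {
    public static void main(String[] args) throws Exception {
        try (RandomAccessFile raf = new RandomAccessFile("data.txt", "r");
             FileChannel channel = raf.getChannel()) {
            MappedByteBuffer buffer = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            // CharBuffer implements Readable, so Scanner can consume it directly.
            Scanner src = new Scanner(StandardCharsets.US_ASCII.decode(buffer));
            src.useLocale(Locale.US);    // make sure '.' is parsed as the decimal separator
            src.useDelimiter(",|\\s+");  // the question's ",|\n", widened to cover \r\n
            while (src.hasNextDouble()) {
                double value = src.nextDouble();
                // ... build a Pair from consecutive values and add it to the data list
            }
        }
    }
}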
Related
For a school assignment, I need to create a simulation of memory accesses. First I need to read 1 or more trace files. Each contains memory addresses for each access. Example:
0 F001CBAD
2 EEECA89F
0 EBC17910
...
Where the first integer indicates a read/write etc., and the hex memory address follows. With this data, I am supposed to run a simulation. So the idea I had was to parse this data into an ArrayList<Trace> (for now I am using Java), with Trace being a simple class containing the memory address and the access type (just a String and an integer). After that I plan to loop through these array lists to process them.
The problem is that even at the parsing stage it runs out of heap space. Each trace file is ~200 MB, and I have up to 8, meaning a minimum of ~1.6 GB of data I am trying to "cache". What baffles me is that I am only parsing 1 file and Java is already using 2 GB according to my task manager...
What is a better way of doing this?
A code snippet can be found at Code Review
The answer I gave on Code Review is the same one you should use here. But, because duplication appears to be OK, I'll duplicate the answer here.
The issue is almost certainly in the structure of your Trace class and its memory efficiency. You should ensure that the instrType and hexAddress are stored as memory-efficient structures. The instrType appears to be an int, which is good; just make sure that it is declared as an int in the Trace class.
The more likely problem is the size of the hexAddress String. You may not realise it, but Strings are notorious for 'leaking' memory. In this case, you have a line and you think you are just getting the hex token from it... but in reality, that token holds on to the entire line. Yeah, really. For example, look at the following code:
import java.util.StringTokenizer;

public class SToken {
    public static void main(String[] args) {
        StringTokenizer tokenizer = new StringTokenizer("99 bottles of beer");
        int instrType = Integer.parseInt(tokenizer.nextToken());
        String hexAddr = tokenizer.nextToken();
        System.out.println(instrType + hexAddr);
    }
}
Now, set a breakpoint in your IDE (I use Eclipse) and run it, and you will see that hexAddr contains a char[] array for the entire line, with an offset of 3 and a count of 7.
Because of the way that String.substring and other constructs work (on JDKs prior to 7u6, a substring shares the backing char[] of the original string), short strings can pin huge amounts of memory. As a consequence, you are essentially storing the entire file in memory!
At a minimum, you should change your code to:
hexAddr = new String(tokenizer.nextToken().toCharArray());
But even better would be:
long hexAddr = parseHexAddress(tokenizer.nextToken());
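parseHexAddress is left to the reader; a minimal version, assuming the addresses fit in a long, might look like:
static long parseHexAddress(String hex) {
    // Long.parseLong handles values up to 0x7FFFFFFFFFFFFFFF;
    // Long.parseUnsignedLong (Java 8+) covers the full 64-bit range.
    return Long.parseLong(hex, 16);
}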
Like rolfl, I answered your question on Code Review. The biggest issue, to me, is reading everything into memory first and then processing it. You need to read a fixed amount, process that, and repeat until finished.
Try using the class java.nio.ByteBuffer instead of java.util.ArrayList<Trace>. It should also reduce the memory usage.
import java.nio.ByteBuffer;

class TraceList {
    // 1 byte for the operation type + 4 bytes for the address per record
    private static final int RECORD_SIZE = 1 + 4;
    private final ByteBuffer buffer;

    public TraceList(int capacity) {
        // allocate the byte buffer up front
        buffer = ByteBuffer.allocate(capacity * RECORD_SIZE);
    }

    public void put(byte operationType, int address) {
        // append one record to the byte buffer
        buffer.put(operationType);
        buffer.putInt(address);
    }

    public Trace get(int index) {
        // read one record back from the byte buffer by index
        int offset = index * RECORD_SIZE;
        byte type = buffer.get(offset);
        int address = buffer.getInt(offset + 1);
        return new Trace(type, address);
    }
}
I have a problem: in a loop, each time I need to write a large string into one file (or a temporary file); the file is then taken as an argument for the next step.
Something along:
for (int i = 0; i < n; i++) {
    File f = File.createTempFile("xxx", "xxx");
    // write into f etc.
    String result = func(f);
}
Since creating a File and writing a string into it on every iteration seems costly, is there an alternative method?
If these Strings do not need to be immediately persisted to a file, you could store them in memory in some sort of Collection, e.g. an ArrayList. When the list gets "large", say every tenth time, write all ten at once to a file. This cuts file creation by 10x.
The danger is that if there is a crash you may lose up to 9 values.
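A rough sketch of that batching idea; produceString and func are hypothetical stand-ins for whatever generates the strings and consumes the file:
import java.io.File;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;

public class BatchedWrites {
    public static void main(String[] args) throws Exception {
        int n = 100; // illustrative
        List<String> pending = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            pending.add(produceString(i)); // hypothetical producer of the large strings
            if (pending.size() == 10 || i == n - 1) {
                File f = File.createTempFile("batch", ".txt");
                Files.write(f.toPath(), pending, StandardCharsets.UTF_8); // one write per batch
                pending.clear();
                String result = func(f); // hand the batch file to the next step
            }
        }
    }

    private static String produceString(int i) { return "data-" + i; } // stand-in
    private static String func(File f) { return f.getAbsolutePath(); } // stand-in
}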
Having some issues with a program I am trying to write. Basically, I need to take an ArrayList that I have and export the information to a text file. I have tried several different solutions from Google, but none have given me any success.
Basically I have two ArrayLists: one of graduates and one of undergraduates. I simply need to take the information associated with these ArrayLists and put it into one text file.
I'll later need to do the opposite, importing the .txt file back into ArrayLists, but I can figure that out later.
Any suggestions?
If you need to write the data in a specific format, you could use a PrintWriter to write the data to a file in whatever manner you wish. The problem with this is that you will then have to figure out how to re-read the text file and repopulate the data.
On the other hand, you could use XStream to write your files as XML. This will provide you with a human-readable text file (as above); however, it will be much easier to re-read the text file when repopulating the data.
Lastly, you could use an ObjectOutputStream to write the data and an ObjectInputStream to re-read it back, as sketched below. Note, however, that this method does not yield a human-readable text file. Also, your classes will need to implement the Serializable interface.
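A minimal sketch of that last option, assuming a Serializable element type called Student and a file name of lists.bin (both illustrative):
import java.io.*;
import java.util.List;

public class SaveLists {
    // Write both lists to one file, in a fixed order.
    public static void save(List<Student> grads, List<Student> undergrads) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream("lists.bin"))) {
            out.writeObject(grads);
            out.writeObject(undergrads);
        }
    }

    // Read them back in the same order they were written.
    @SuppressWarnings("unchecked")
    public static List<Student>[] load() throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream("lists.bin"))) {
            List<Student> grads = (List<Student>) in.readObject();
            List<Student> undergrads = (List<Student>) in.readObject();
            return new List[] { grads, undergrads };
        }
    }
}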
Here's a solution using Apache commons-io library:
// Put all data into one big list, prepended with the size of the first list
List<String> allData = new ArrayList<String>(1 + grads.size() + undergrads.size());
allData.add(String.valueOf(grads.size()));
allData.addAll(grads);
allData.addAll(undergrads);
FileUtils.writeLines(new File("list.txt"), allData);
To read the data back:
List<String> allData = FileUtils.readLines(new File("list.txt"));
int gradsSize = Integer.parseInt(allData.get(0));
List<String> grads = allData.subList(1, gradsSize+1);
List<String> undergrads = allData.subList(1+gradsSize, allData.size());
I was creating an OpenGL Android application and trying to render an OpenGL object with more than 50,000 vertices.
float itemVerts[] = {
    // f 231/242/231 132/142/132 131/141/131
    0.172233487787643f, -0.0717437751698985f, 0.228589675538813f,
    0.176742968653347f, -0.0680393472738536f, 0.2284149434494f,
    0.167979223684599f, -0.0670168837233226f, 0.24286384937854f,
    // f 131/141/131 230/240/230 231/242/231
    0.167979223684599f, -0.0670168837233226f, 0.24286384937854f,
    0.166391290343292f, -0.0686544011752973f, 0.241920432968569f, ......
and many more... But when I do this in a function or constructor, I get an error while compiling that the code of method () is exceeding the 65535 bytes limit. So I was wondering if there is a different way to do this.
I tried storing the values in a file and reading them back, but the I/O operation, with string parsing of such a huge record, is very slow: it takes more than 60 seconds, which is not good.
Please let me know if there is any other way to do this. Thank you for your time and help.
But when I do this in a function or constructor, I get an error while compiling that the code of method () is exceeding the 65535 bytes limit. So I was wondering if there is a different way to do this.
Put it outside the constructor (as a class variable or field)? If this doesn't change, just make it a constant. If it does change, make it a constant anyway and copy it in the constructor.
I tried storing the values in a file and reading them back, but the I/O operation, with string parsing of such a huge record, is very slow: it takes more than 60 seconds, which is not good.
If you do decide to keep it in an external file and read it in, don't read it as a string, just serialize it somehow (Java serialization, Protocol Buffers, etc.).
The program doesn't have to parse the floats if we preprocess the data.
Write another program that writes all the floats to a binary file using DataOutputStream.
In your program, read them back using DataInputStream. You might want to chain it with a BufferedInputStream.
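A sketch of both halves, assuming a file named verts.bin (an illustrative name). DataOutputStream and DataInputStream both use big-endian order, so the round trip is consistent:
import java.io.*;

public class VertsBinary {
    // Preprocessing step (a separate program): write every float once, in binary.
    static void write(float[] itemVerts) throws IOException {
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(new FileOutputStream("verts.bin")))) {
            for (float v : itemVerts) {
                out.writeFloat(v);
            }
        }
    }

    // In the application: read them back with no string parsing at all.
    static float[] read() throws IOException {
        float[] verts = new float[(int) (new File("verts.bin").length() / 4)]; // 4 bytes per float
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(new FileInputStream("verts.bin")))) {
            for (int i = 0; i < verts.length; i++) {
                verts[i] = in.readFloat();
            }
        }
        return verts;
    }
}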
For cases like this I normally use the assets folder to store files in binary format (you can even define some kind of file format to include the vertices, normals, etc.) and load it at application initialization, as wannik explains.
I would preprocess and store the floats in binary form, then mmap the file as a byte buffer and create a float view of it. This way you get your float data without any parsing or allocation of space.
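Something along these lines, again assuming the preprocessed verts.bin from above. ByteBuffer's default big-endian order matches what DataOutputStream wrote, and Android's GLES calls (e.g. GLES20.glVertexAttribPointer) accept a FloatBuffer directly:
import java.io.RandomAccessFile;
import java.nio.FloatBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class MappedVerts {
    static FloatBuffer map() throws Exception {
        try (RandomAccessFile raf = new RandomAccessFile("verts.bin", "r");
             FileChannel channel = raf.getChannel()) {
            MappedByteBuffer mapped = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            // The mapping stays valid after the channel is closed;
            // asFloatBuffer() gives a float view with no parsing and no copying.
            return mapped.asFloatBuffer();
        }
    }
}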
I have a very big file (maybe even 1 GB) from which I want to create a new file with the lines in reversed order (in Java).
For example:
Original file:
This is the first line
This is the 2nd line
This is the 3rd line
The reversed file:
This is the 3rd line
This is the 2nd line
This is the first line
Since the file is very big, loading the entire file to memory at once and reversing the order there might be problematic (there is a limit to the memory I can use).
How can I achieve this in Java?
Thanks
Nothing very direct, I'm afraid. But you can easily create a (say) ReverseBufferedRead class wrapping a RandomAccessFile.
Read the file in chunks of a few hundred lines, reverse the order of the lines within each chunk, and write the chunks to temporary files, as sketched below. Then join the temporary files in reverse order and clean up.
In other words, use disk instead of memory.
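A rough, self-contained sketch along those lines (chunk size and file names are illustrative):
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.*;

public class ReverseByChunks {
    public static void main(String[] args) throws IOException {
        List<File> chunks = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(new FileReader("big.txt"))) {
            List<String> lines = new ArrayList<>();
            String line;
            while ((line = in.readLine()) != null) {
                lines.add(line);
                if (lines.size() == 10_000) { // a few thousand lines per chunk
                    chunks.add(writeReversedChunk(lines));
                    lines.clear();
                }
            }
            if (!lines.isEmpty()) {
                chunks.add(writeReversedChunk(lines));
            }
        }
        // Append the chunk files in reverse order to the output, then delete them.
        try (PrintWriter out = new PrintWriter("reversed.txt", "UTF-8")) {
            for (int i = chunks.size() - 1; i >= 0; i--) {
                for (String line : Files.readAllLines(chunks.get(i).toPath(), StandardCharsets.UTF_8)) {
                    out.println(line);
                }
                chunks.get(i).delete();
            }
        }
    }

    // Reverses the chunk in place and writes it to a temporary file.
    private static File writeReversedChunk(List<String> lines) throws IOException {
        Collections.reverse(lines);
        File f = File.createTempFile("rev", ".txt");
        Files.write(f.toPath(), lines, StandardCharsets.UTF_8);
        return f;
    }
}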
I would propose making a RandomAccessFile for the output and using setLength() to make it appropriately sized.
Then start scanning the original file and write it out in chunks, starting at the end of the RandomAccessFile and working backwards.
Java-ish Pseudo:
RandomAccessFile out = new RandomAccessFile("out_fname", "rw");
out.setLength(size_of_file_to_be_reversed);
out.seek(out.length()); // seek to end
File in = new File("in_fname");
while (hasMoreData(in)) {
    String chunk = in.readsize();
    out.seekBackwardsBy(chunk.length());
    out.write(chunk.reverse());
    out.seekBackwardsBy(chunk.length());
}
Reading a file line-by-line in reverse order is fundamentally tricky.
It's not too bad if you've got a fixed-width encoding. It's feasible if you've got a variable-width encoding in which you can detect the first byte of a character (e.g. UTF-8). It's virtually impossible to do efficiently if the encoding is variable-width with no sensible way of determining character boundaries (or if it uses "shifting", for example).
I have an implementation in C# in another question, but it would take a fair amount of effort to port that to Java.
If you use a RandomAccessFile as leonbloy suggested, you can use a FileChannel to skip to the end of the file; you can then read each line and write it to another file.
There is a simple example of this in the Java tutorials.
I would assume you know how to read a file. One way I would advise is to use an ArrayList of generic type String: read each line of the file and store it in that list. After reading, you can print the list out in reverse order or do whatever you want with it.
Just wrote something that might be of help here : http://pastebin.com/iWTVrAvm
Read using RandomAccessFile: position within the file using randomAccessFile.length() and write using a BufferedWriter.
A better solution is to use the ReversedLinesFileReader provided in the Apache Commons IO package, as sketched below. See the API here: https://commons.apache.org/proper/commons-io/apidocs/org/apache/commons/io/input/ReversedLinesFileReader.html
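A minimal sketch, assuming commons-io 2.x on the classpath and illustrative file names:
import java.io.File;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import org.apache.commons.io.input.ReversedLinesFileReader;

public class ReverseWithCommonsIo {
    public static void main(String[] args) throws Exception {
        try (ReversedLinesFileReader reader =
                 new ReversedLinesFileReader(new File("big.txt"), StandardCharsets.UTF_8);
             PrintWriter out = new PrintWriter("reversed.txt", "UTF-8")) {
            String line;
            while ((line = reader.readLine()) != null) {
                out.println(line); // lines come back last-to-first
            }
        }
    }
}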