Read/write file contents into/out of ArrayList<Byte> [closed] - java

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about programming within the scope defined in the help center.
Closed 7 years ago.
I need to read a very large file (1.11 GB) into memory and process it byte by byte. The only way I can see to do this is to use an ArrayList (I can't use a byte[] because the file would exceed the array size limit).
There is no way to make the file smaller (I'm using it to measure how long my program takes to process data).
I then need to write the ArrayList back to the hard drive as a file (still 1.11 GB).
I'm not as worried about writing as I am about reading.
Speed is also of the essence, so splitting the data into segments is to be avoided unless anyone has a quick way of doing so.

You are trying to solve this problem the wrong way (and it won't work [1]).
The possible ways to solve this are:
Redesign the algorithm so that it doesn't need to read the entire file into memory in one go.
Read the data into multiple byte[] objects to get around the array size limit (just under 2^31 elements).
Map the file using multiple ByteBuffer objects [2]; see Java MemoryMapping big files.
[1] It won't work because ArrayList has an Object[] inside, and is therefore subject to the same limitation you have with byte arrays. In addition, an ArrayList<Byte> will take 4 to 8 times as much memory as a byte[] representing the same number of bytes, or more if you populate the ArrayList<Byte> with Byte objects instantiated the wrong way.
[2] The Buffer APIs all use int sizes and offsets, and (AFAIK) do not support mapping of files >= 2^31 bytes into a single Buffer.
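The second option (multiple byte[] objects) could look roughly like the sketch below. It is not from the original answer: the class name ChunkedFileReader, the method names, and the idea of passing the chunk size as a parameter are all assumptions made for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: holds a file as a list of byte[] chunks instead of one huge array.
final class ChunkedFileReader {

    /** Reads the whole file into chunks of chunkSize bytes; only the last chunk may be shorter. */
    static List<byte[]> readChunks(Path file, int chunkSize) throws IOException {
        List<byte[]> chunks = new ArrayList<>();
        try (InputStream in = Files.newInputStream(file)) {
            while (true) {
                byte[] chunk = new byte[chunkSize];
                int filled = 0;
                // read() may return fewer bytes than requested, so loop until full or end of file.
                while (filled < chunkSize) {
                    int n = in.read(chunk, filled, chunkSize - filled);
                    if (n < 0) {
                        break;
                    }
                    filled += n;
                }
                if (filled == 0) {
                    break; // nothing left to read
                }
                chunks.add(filled == chunkSize ? chunk : Arrays.copyOf(chunk, filled));
                if (filled < chunkSize) {
                    break; // a short chunk means end of file was reached
                }
            }
        }
        return chunks;
    }

    /** Returns the byte at a logical offset (a long) across all chunks. */
    static byte byteAt(List<byte[]> chunks, int chunkSize, long offset) {
        return chunks.get((int) (offset / chunkSize))[(int) (offset % chunkSize)];
    }
}
```

With a chunk size such as 64 * 1024 * 1024, a 1.11 GB file ends up in about 18 arrays, and the JVM heap still has to be large enough to hold all of them at once.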


When to use which Data Structure? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 7 years ago.
I am studying Data Structures in a Fundamentals of Software Development course. I have encountered the following data structures:
Structs
Arrays
Lists
Queues
Hash Tables
... among others. I have a good understanding of how they work, but I'm struggling to understand when and where to use them.
I can identify the use of the Queue Data structure, as this would be helpful in printer and/or thread queuing and prioritizing.
Knowing the strengths and weaknesses of a data structure and implementing it in code are different things, and I am finding the former difficult.
What is a simple example of the use of each of the data structures listed above?
For example:
Queue: first-in, first-out → used for printer queue to queue docs
I had trouble understanding them when I first started programming, so I decided to give you a head start.
I am trying to keep this as simple as possible. See the Oracle docs for further details.
Struct: Whenever you need an object-like structure in which you can group related data, use a struct. Structs are very rarely used in Java, though (objects are created in their place).
Arrays: Arrays are contiguous memory. Whenever you want constant-time access by index, use arrays; unlike a linked list, they are very fast for this.
The drawback of arrays is that you need to know the size at the time of initialization. Arrays also do not support higher-level methods such as add(), remove(), clear(), contains(), indexOf(), etc.
List: an interface that can be implemented using arrays (ArrayList)
or linked lists (LinkedList). Both support all the higher-level methods mentioned earlier.
Lists also grow as needed. For an ArrayList you can specify the initial capacity of the underlying array; whenever that capacity is reached, it creates a bigger array and copies the contents of the old one into it. A LinkedList simply allocates a new node for each added element.
Queue or Stack: these are implementation techniques and not really data structures. If you want FIFO behavior, you can implement a queue on top of either an array or a linked list (yes, this technique works on both of these data structures).
https://en.wikibooks.org/wiki/Data_Structures/Stacks_and_Queues
HashMap: A HashMap is used whenever you want to store key-value pairs. Note that arrays, linked lists, and the other data structures mentioned here are not suited to this purpose. A key can be anything from a String to an arbitrary Object (but note that it has to be an object and cannot be a primitive), and a value can also be any object.
Search for each data structure online for more details.
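To make the descriptions above concrete, here is a small Java sketch with one use per structure. It is only an illustration: the class DataStructureExamples, the Book fields, and the sample values are invented for this example and are not part of the original answer.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

// Hypothetical examples, one per data structure discussed above.
final class DataStructureExamples {

    // "Struct": in Java, a small class that only groups related data.
    static final class Book {
        final String title;
        final int pages;

        Book(String title, int pages) {
            this.title = title;
            this.pages = pages;
        }
    }

    // Array: the size is fixed when the array is created; access by index is constant time.
    static int[] weeklyTemperatures() {
        int[] temps = new int[7];
        temps[0] = 21;
        return temps;
    }

    // List: grows on demand and offers add(), remove(), contains(), and so on.
    static List<String> shoppingList() {
        List<String> items = new ArrayList<>();
        items.add("milk");
        items.add("eggs");
        items.remove("milk");
        return items;
    }

    // Queue: first in, first out, like documents waiting for a printer.
    static String nextPrintJob() {
        Queue<String> jobs = new ArrayDeque<>();
        jobs.add("report.pdf");
        jobs.add("photo.png");
        return jobs.poll(); // removes and returns the oldest job
    }

    // HashMap: key-value lookup; the int value is boxed to an Integer automatically.
    static int stockOf(String product) {
        Map<String, Integer> stock = new HashMap<>();
        stock.put("apples", 12);
        return stock.getOrDefault(product, 0);
    }
}
```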
It depends on what you need. If you read and learn more about these data structures you will find convenient ways for their implementation.
Maybe read this book? http://www.amazon.com/Data-Structures-Abstraction-Design-Using/dp/0470128704
All these data structures are used according to what the program needs. Try to find the advantages of one data structure over another; that should make things clearer for you. What I say may not be very clear, but I'll give it a shot.
For example:
Structs are used to create a data type. Say you want a data type for Book that holds the name of the book: that is a Book structure.
Linked lists are easier to traverse in both directions (if doubly linked) and are sometimes better than arrays. Queues, well, you can imagine them as real-life queues: first in will be first out. So you can use them when you need to enforce that ordering.
Like I said, looking for advantages of one over the other should get things clear for you.

Java serialization alternative with better performance [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 9 years ago.
Suppose I use the standard Java object serialization to write/read small (< 1K) Java objects to/from a memory buffer. The most critical part is the deserialization, i.e. reading Java objects from a memory buffer (byte array).
Is there any faster alternative to the standard Java serialization for this case?
You might also want to have a look at FST.
It also provides tools for off-heap reading/writing.
Have a look at Kryo.
It's much, much faster than the built-in serialization mechanism (which writes out a lot of strings and relies heavily on reflection), but a bit harder to use.
Edit: R.Moeller below suggested FST, which I'd never heard of until now but which looks to be both faster than Kryo and compatible with Java's built-in serialization (which should make it even easier to use), so I'd look at that first.
Try Google protobuf or Thrift.
The standard serialization adds a lot of type information which is then verified when the object is deserialized. When you know the type of the object you are deserializing, this is usually not necessary.
What you could do, is create your own serialization method for each class, which just writes all the values of the object to a byte buffer, and a constructor (or factory method, when you swing that way) which takes such a byte buffer and reads all the variables from it.
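A minimal sketch of that idea, using java.nio.ByteBuffer, might look like this. The class Sample and its four fields are made up for the example, and the field layout (int, double, one byte for the boolean, then a length-prefixed UTF-8 string) is just one possible choice.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical value class with hand-written (de)serialization.
final class Sample {
    final int id;
    final double value;
    final boolean active;
    final String label;

    Sample(int id, double value, boolean active, String label) {
        this.id = id;
        this.value = value;
        this.active = active;
        this.label = label;
    }

    /** Writes all fields to a byte array in a fixed order, with no type information. */
    byte[] toBytes() {
        byte[] labelBytes = label.getBytes(StandardCharsets.UTF_8);
        // 4 (int) + 8 (double) + 1 (boolean as byte) + 4 (label length) + label bytes
        ByteBuffer buf = ByteBuffer.allocate(4 + 8 + 1 + 4 + labelBytes.length);
        buf.putInt(id);
        buf.putDouble(value);
        buf.put((byte) (active ? 1 : 0));
        buf.putInt(labelBytes.length);
        buf.put(labelBytes);
        return buf.array();
    }

    /** Factory method that reads the fields back in exactly the order toBytes() wrote them. */
    static Sample fromBytes(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data);
        int id = buf.getInt();
        double value = buf.getDouble();
        boolean active = buf.get() != 0;
        byte[] labelBytes = new byte[buf.getInt()];
        buf.get(labelBytes);
        return new Sample(id, value, active, new String(labelBytes, StandardCharsets.UTF_8));
    }
}
```

Because reader and writer must agree on the field order by hand, any change to the class means updating both methods together; that is the price paid for skipping the type metadata.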
But just like AlexR I wonder if you really need that. Serialization is usually only needed when the data leaves the program (like getting stored on disk or sent over the network to another program).
Java's standard serialisation is known to be slow and to use a huge number of bytes on disk. It is very simple to do your own custom serialisation.
Java's standard serialisation is nice for a demo project, but for the above reasons it is not well suited to professional projects. Furthermore, versioning is not well under your control.
Java provides all you need for custom serialisation; see the demo code in my post at
Java partial (de)serialization of objects
With that approach you can even specify the binary file format, so that it can also be read from C or C#.
Another advantage: custom-serialized objects can need less space than they do in main memory (a boolean can take up to 4 bytes in main memory but only 1 byte when custom serialized as a byte).
If different project partners have to read your serialized data, Google's Protobuf is an alternative to look at.

Update all occurrences of String in File using Java [duplicate]

This question already has answers here:
Replace string in file
(2 answers)
Closed 8 years ago.
I have one file that contains some strings that need to be updated.
MY REPORT
REPORT RUN DATE : 27/08/2012 12:35:11 PAGE 1 of #TOTAL#
SUCCESSFUL AND UNSUCCESSFUL DAILY TRANSACTIONS REPORT
---record of data here----
MY REPORT
REPORT RUN DATE : 27/08/2012 12:35:11 PAGE 2 of #TOTAL#
SUCCESSFUL AND UNSUCCESSFUL DAILY TRANSACTIONS REPORT
---record of data here----
If I just want to update all occurrences of #TOTAL# to some number, is there a quick and efficient way to do this?
I understand that I can also use BufferedReader + BufferedWriter to print to another file and call String.replace along the way, but I wonder if there is a better and more elegant way to solve this...
The file won't exceed 10 MB, so there is no need to worry about whether the file might be too big (exceeding 1 GB, etc.).
If you don't care about the file being too large, and think calling replace() on every line is inelegant, I guess you can just read the entire file into a single String, call replace() once, then write it to the file.
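A rough sketch of that read-replace-write approach follows. The class name PlaceholderReplacer and the explicit charset parameter are assumptions added for the example; the approach is only sensible while the file comfortably fits in memory.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: loads the whole file, replaces a placeholder, and writes it back.
final class PlaceholderReplacer {

    static void replaceAll(Path file, String placeholder, String value, Charset charset)
            throws IOException {
        // Read the entire file into one String (fine for small files, as in the question).
        String content = new String(Files.readAllBytes(file), charset);
        // String.replace substitutes every literal occurrence; no regex is involved.
        String updated = content.replace(placeholder, value);
        // Overwrite the original file with the updated text.
        Files.write(file, updated.getBytes(charset));
    }
}
```

A call such as PlaceholderReplacer.replaceAll(path, "#TOTAL#", "2", StandardCharsets.UTF_8) would update every page header in one pass. If the program can be interrupted mid-write, writing to a temporary file and moving it over the original is the safer variant.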
... I wonder if there is a better and elegant way to solve this
It depends on what you mean by "better and elegant", but IMO the answer is no.
The file won't exceed 10 MB, so there is no need to worry about whether the file might be too big (exceeding 1 GB, etc.).
You are unlikely to exceed 1 GB. However:
You probably cannot be 100% sure that the file won't be bigger than 10 MB. For any program that has a significant lifetime, you can rarely know that the requirements and usage patterns won't change over time.
In fact, a 10 MB text file may occupy up to 60 MB of memory if you load the entire lot into a StringBuilder. Firstly, the bytes are inflated into characters (2 bytes each). Secondly, the algorithm used by StringBuilder to manage its backing array involves allocating a new array of double the size of the original one, and both arrays exist while the contents are copied. So peak memory usage could be up to 6 times the number of bytes in the file you are reading.
Note that 60 MB is greater than the default maximum heap size for some JVMs on some platforms.

java: how to search a string in a big file? [duplicate]

This question already has answers here:
Closed 11 years ago.
Possible Duplicate:
exception while Read very large file > 300 MB
I want to search for a string in a big file (>= 300 MB). Because the file is so big, I can't load it into memory.
What approaches are available to handle this problem?
Thanks
There are a few options:
1. Depending on your target OS, you might be able to hand off this task to a system utility such as grep (which is already optimized for this sort of work) and simply parse the output.
2. Even if the file were small enough to be contained in memory, you'd have to read it from disk either way. So, you can simply read it in, one line at a time, and compare your string to the contents as they are read. If your app only needs to find the first occurrence of a string in a target file, this has the benefit that, if the target string appears early in the file, you save having to read the entire file just to find something that's in the first half of the file.
3. Unless you have an upper limit on your app's memory usage (i.e. it must absolutely fit within 128 MB of RAM, etc.) then you can also increase the amount of RAM that the JVM will take up when you launch your app. But, because of the inefficiency of this (in terms of time, and disk I/O, as pointed out in #2), this is unlikely to be the course that you'll want to take, regardless of file size.
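The line-at-a-time option could be sketched as follows. The class name LineSearch and the choice to return a 1-based line number (or -1 when nothing is found) are assumptions made for this example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: streams the file line by line so only one line is in memory at a time.
final class LineSearch {

    /** Returns the 1-based number of the first line containing needle, or -1 if none does. */
    static long firstLineContaining(Path file, String needle, Charset charset)
            throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(file, charset)) {
            String line;
            long lineNumber = 0;
            while ((line = reader.readLine()) != null) {
                lineNumber++;
                if (line.contains(needle)) {
                    return lineNumber; // stop early; the rest of the file is never read
                }
            }
        }
        return -1;
    }
}
```

Note that this sketch cannot find a string that spans a line break, and a file with extremely long lines would still pull each full line into memory.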
I would memory map the file. This doesn't use much heap (< 1 KB), regardless of the file size (up to 2 GB) and takes about 10 ms on most systems.
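// Requires java.io.FileInputStream, java.nio.MappedByteBuffer and java.nio.channels.FileChannel.
FileChannel ch = new FileInputStream(fileName).getChannel();
// MapMode is a nested class of FileChannel, so it is referenced through the class name.
MappedByteBuffer mbb = ch.map(FileChannel.MapMode.READ_ONLY, 0L, ch.size());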
This works provided you have a minimum of 4 KB free (and your file is less than 2 GB long)
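Once the file is mapped, the search itself still has to be written. The sketch below is one naive way to do it and is not part of the original answer: the class MappedSearch is hypothetical, it compares raw bytes (so the search string must be encoded with the file's charset first), and it uses a simple O(n*m) scan rather than a faster algorithm such as Boyer-Moore.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper: naive byte-pattern search over a memory-mapped file (< 2 GB).
final class MappedSearch {

    /** Returns the offset of the first occurrence of pattern in the file, or -1 if absent. */
    static long indexOf(Path file, byte[] pattern) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            // map() rejects sizes above Integer.MAX_VALUE, hence the 2 GB limit.
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0L, ch.size());
            int lastStart = buf.limit() - pattern.length;
            for (int start = 0; start <= lastStart; start++) {
                int matched = 0;
                // Absolute get(index) reads a byte without moving the buffer position.
                while (matched < pattern.length && buf.get(start + matched) == pattern[matched]) {
                    matched++;
                }
                if (matched == pattern.length) {
                    return start;
                }
            }
        }
        return -1;
    }
}
```

For example, MappedSearch.indexOf(path, "needle".getBytes(StandardCharsets.UTF_8)) returns the byte offset of the first match. Unlike the two-line snippet above, this version closes the channel with try-with-resources.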

How to determine Java object size in memory efficiently? [duplicate]

This question already has answers here:
How to determine the size of an object in Java
(28 answers)
Closed 9 years ago.
I am using an LRU cache that has a limit on its memory usage. The LRU cache includes two data structures: a hash map and a linked list. The hash map holds the cached objects and the linked list records the order in which cached objects were accessed. To determine the memory usage of a Java object, I use an open source tool, the Classmexer agent,
which is a simple Java instrumentation agent at http://www.javamex.com/classmexer/.
I also tried another tool named SizeOf at http://sizeof.sourceforge.net/.
The problem is that the performance cost is very high. Measuring the size of one object takes around 1.3 seconds, which is very slow and can cancel out the benefits of caching.
Is there any other way to measure the memory size of a live Java object?
Thanks.
Getting a value that is accurate to the byte will be expensive, as it needs to use reflection to compute the size of nested objects, but it can be done.
In Java, what is the best way to determine the size of an object?
http://docs.oracle.com/javase/7/docs/api/java/lang/instrument/Instrumentation.html
Look at this: How to calculate the memory usage of a Java array. Maybe it will help you in the analysis.
Since you already have a tool that does it...the reality is that there's no fast way to do this in Java.
