I have a Java Map<String, List<String>>.
Is there a way to improve this to make it use less memory without having too much impact on performance?
Three ideas:
An encoded byte array could provide a less memory-intensive representation than a String, especially if the string data actually uses an 8-bit (or smaller) character set.
A list of strings could be represented as a single string, with a distinguished separator character between the list components.
String data is often compressible.
Depending on the nature of your data, these could easily give a two-fold reduction in space for the lists.
The downside is that you may need to fully or partially reconstruct the original List<String> objects, which would be a performance hit.
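To make the first two ideas concrete, here is a minimal sketch (the class name and the NUL separator are assumptions; pick any character that can never appear in your data):

import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch: store each List<String> as one separator-joined,
// UTF-8 encoded byte array, and rebuild the list only when it is needed.
class CompactStringList {
    private static final String SEPARATOR = "\u0000"; // assumes values never contain NUL

    private final byte[] data;

    CompactStringList(List<String> values) {
        this.data = String.join(SEPARATOR, values).getBytes(StandardCharsets.UTF_8);
    }

    List<String> toList() { // full reconstruction: this is the performance hit
        return Arrays.asList(new String(data, StandardCharsets.UTF_8).split(SEPARATOR));
    }
}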
You should also consider using a non-memory-resident representation; e.g. a conventional database, a NoSQL database or an "object cache" framework. JVMs with really large heaps tend to run into performance problems when a "full" garbage collection is needed, or when there is competition for physical memory with other applications.
One would really need to know a lot more about your specific application to recommend a specific solution with confidence, but as a wild guess, if it is a really, really large table (e.g. hundreds of thousands or millions of records), I would suggest you consider storing the data in a database and accessing it through a data-access-layer abstraction, such as a DataSet.
Databases are already optimized to store, search and access data efficiently, so without further information on your application, I would go with this option.
I have come across the following question multiple times:
What data structures are used in garbage collection?
I haven't found many resources about the data structures used in GC algorithms.
Edit: I understand that the question seems too broad, since there are different kinds of garbage collection techniques. We could go with the commonly used garbage collection algorithms, like the ones found in the most popular JVMs.
Your question is rather like asking "how does an operating system work?" There are many different algorithms for GC and they use different internal data structures depending on how the algorithm works.
Many algorithms use a root set as a starting point. This is a list of all the objects directly accessible from your application threads. It is created by scanning the thread stacks, registers, static variables, etc. The GC will typically process the root set to follow links to other objects (that are therefore accessible) and build a graph of all accessible objects.
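As a toy sketch of that tracing idea (HeapObject and trace are illustrative names, not JVM internals; real collectors operate on raw heap memory, not on Java objects like these):

import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class HeapObject {
    final List<HeapObject> references = new ArrayList<>(); // outgoing pointers
}

class MarkPhase {
    // Follows links from the root set and returns every reachable object;
    // anything not in the returned set is garbage.
    static Set<HeapObject> trace(List<HeapObject> rootSet) {
        Set<HeapObject> reachable = new HashSet<>();
        Deque<HeapObject> worklist = new ArrayDeque<>(rootSet);
        while (!worklist.isEmpty()) {
            HeapObject obj = worklist.pop();
            if (reachable.add(obj)) { // not seen before
                worklist.addAll(obj.references);
            }
        }
        return reachable;
    }
}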
There are other data structures like card tables but these are not used in all algorithms.
You might want to pick a particular GC algorithm and study that.
Are there any disadvantages (in terms of memory) to using Maps in Java? As far as I know, if you are using a HashMap or any of the other collection classes, some capacity is reserved up front. For example:
Map<String , String> map = new HashMap<String, String>();
map.put("id",1);
map.put("name","test123");
So let's assume I used 2 bytes for each of those entries.
And suppose the Map (or any other collection) reserves 100 bytes up front, so the remaining 98 bytes are wasted.
So, for that scenario, can I use anything else?
For a description of initial capacity and load factor, see What is the significance of load factor in HashMap?
If you use arrays, you will probably use less memory than when you use a map. But in most cases, the ease of use and readability is far more important than memory usage.
See also Hash Map Memory Overhead for a description of HashMap's memory usage.
First of all, yes, creating a map has some memory overhead for very small amounts of data. It creates arrays with Entry wrapper objects for the given initial capacity/load factor. So, you might be wasting a few bytes, but in the age of gigabytes of memory, that only becomes an issue when you are creating millions or even billions of maps, depending on how much memory you actually give your application and what else it has to manage.
If I know that a collection will remain really small and I'm not using the keys anyway, I sometimes just use a list instead, because checking 2 or 4 elements is quite fast. In the end, do not bother worrying about such minor things until they take up a major slice of the available memory.
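As a hedged sketch of that list-instead-of-map idea (TinyMap is a made-up name, not a standard class, and it is only sensible for a handful of entries):

import java.util.ArrayList;
import java.util.List;

// For a few entries, a linear scan over parallel key/value lists can be
// leaner than a HashMap, because there are no Entry wrappers or hash buckets.
class TinyMap {
    private final List<String> keys = new ArrayList<>(4);
    private final List<String> values = new ArrayList<>(4);

    void put(String key, String value) {
        int i = keys.indexOf(key);
        if (i >= 0) {
            values.set(i, value); // overwrite existing entry
        } else {
            keys.add(key);
            values.add(value);
        }
    }

    String get(String key) {
        int i = keys.indexOf(key); // linear scan: fine for 2-4 elements
        return i >= 0 ? values.get(i) : null;
    }
}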
Our application is a GigaSpaces-based solution, which basically reads multiple flat files and stores the data in objects. The flat files contain shipment details, so we have multiple files:
Dockyard Details
Container Details
Shipment Details
etc.
Now we have Dockyard as a parent object, under which there can be many shipment detail objects. We currently use an ArrayList to maintain the shipment details for almost 50k dockyard detail objects. The current volume of data suggests that for each Dockyard object we will have to maintain around 1500 shipment detail objects, and there will be almost 50k dockyard objects on the heap. Our current heap is 8GB.
So I wanted to know if ArrayList is the best way to keep so many objects. I have looked at other APIs as well, like Trove and HPPC, but they mostly offer benefits for primitive collections, and ours will be a collection of objects. So, other than increasing the heap size, can someone suggest any better alternatives?
You don't need to keep all your objects on the heap. With Chronicle Map, for example, you can keep all the objects off heap, and since they are memory-mapped files, they don't even have to be in memory. You might find you can reduce your heap size if the bulk of your data is off heap.
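A minimal sketch of what that might look like, assuming Chronicle Map 3's builder API; the key/value formats, the sizes and the file name here are all placeholders, and a real shipment object would need proper marshalling rather than a flat string:

import net.openhft.chronicle.map.ChronicleMap;
import java.io.File;
import java.io.IOException;

public class OffHeapShipments {
    public static void main(String[] args) throws IOException {
        // Memory-mapped, off-heap map persisted to a file (placeholder path).
        try (ChronicleMap<CharSequence, CharSequence> shipments = ChronicleMap
                .of(CharSequence.class, CharSequence.class)
                .name("shipment-details")
                .entries(50_000L * 1_500L) // ~50k dockyards x ~1500 shipments each
                .averageKey("DOCKYARD-00042/SHIP-0001")
                .averageValue("container=C123;weight=4.2t")
                .createPersistedTo(new File("shipments.dat"))) {
            shipments.put("DOCKYARD-00042/SHIP-0001", "container=C123;weight=4.2t");
            System.out.println(shipments.get("DOCKYARD-00042/SHIP-0001"));
        }
    }
}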
there will be almost 50k dockyard objects on the heap.
This is not a lot of objects. Even if each object uses 1 KB, you are only using 50 MB. If your objects are much bigger than this, it is highly likely you should look at ways to reduce the size of the individual objects.
When we use primitive-based collections, it is mostly to avoid the object header for each element. This saves 8 - 16 bytes per entry, or up to 800 KB in your case.
However, if your objects are 1 KB to 100 KB, as you suggest, you might be able to halve the size they use in memory by restructuring them or using different data types.
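For example, one hedged way to restructure (the field names are invented for illustration): store the ~1500 shipment details of a dockyard column-wise in primitive arrays, which avoids one object header and one reference per shipment.

// Illustrative only; replace the fields with whatever your shipment
// details actually contain.
class ShipmentColumns {
    final long[] shipmentIds;
    final int[] containerCounts;
    final double[] weightsInTonnes;

    ShipmentColumns(int shipmentCount) {
        shipmentIds = new long[shipmentCount];
        containerCounts = new int[shipmentCount];
        weightsInTonnes = new double[shipmentCount];
    }
}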
BTW, 1 GB of memory is worth about an hour of your time. I would explore doubling the memory size before spending too much time on this.
I am going to write a game in which I often have to check whether a string of letters is actually a word or not. My question is about how to do this fastest and with as little computational power as possible (for instance on an old smartphone), and, if possible, with little start-up time, so that the app stays quick and responsive.
I once did this look-up by first reading a word file containing almost all words into an appropriately sized hash map of around 650,000 words*. (* Might be more; I am not sure if this is the exhaustive list yet.)
Would a SQL database be appropriate here? I am thinking of buying a book about it so I can learn and implement one. Also, I have no idea how you could create a hash map, save it for later and then load it again. Is that too much of a hack, or is that technique used more often? So would it make sense for me to learn SQL, or to do it by saving a hash map and later restoring it?
A SQL database could be appropriate if you plan to query it every time you need to check a word, but this is not the fastest solution: querying for every single word slows down the response time. It should use less memory if the number of words is high, though (you must measure the memory consumed by the database vs. the memory consumed by the map). Checking whether a word is in a map is not very computationally expensive: it must calculate the hash and iterate over the items with the same hash.
Personally, I would choose a map if the memory requirements of keeping all the words in memory can be satisfied. You can store the dictionary as a plain text file (one line -> one word) and read it in a background thread when the application starts.
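A minimal sketch of that approach ("words.txt" is a placeholder path; class and method names are illustrative):

import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Loads one word per line into a HashSet on a background thread,
// then answers lookups in O(1) on average.
public class Dictionary {
    private volatile Set<String> words; // null until loading has finished

    public void loadInBackground() {
        new Thread(() -> {
            Set<String> loaded = new HashSet<>(1 << 20); // headroom for ~650k words
            try (BufferedReader reader = Files.newBufferedReader(
                    Paths.get("words.txt"), StandardCharsets.UTF_8)) {
                String line;
                while ((line = reader.readLine()) != null) {
                    loaded.add(line.trim().toLowerCase(Locale.ROOT));
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            words = loaded;
        }).start();
    }

    public boolean isWord(String candidate) {
        Set<String> current = words; // may still be null while loading
        return current != null && current.contains(candidate.toLowerCase(Locale.ROOT));
    }
}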
If memory is an issue, this seems like a good use for a B-Tree. This allows O(log n) search time while searching a large number of records with minimal memory usage. For this sort of application, it sounds like loading the entire thing into memory is not going to be a good idea.
I'm working on an academic project: writing a library for finding the shortest path on large, weighted, directed graphs.
Specifications are:
The example data set is a graph of 1500 vertices, with an average of 5.68 edges per node. The specification may vary up to 20,000 nodes.
Moreover, I'm working in a CPU- and memory-bound environment: Android.
Edge weight is not trivial, nor constant. It depends on the variable state of the graph.
We must work offline.
I face several difficulties:
I need an efficient way to store, retrieve and update the data of the graph. Should I use a SQLite database queried from the Java classes, a large custom Java object on the heap, or something else? I think this is the most performance-critical aspect.
I need an efficient way to implement some kind of shortest-path algorithm. Since all the weights are positive, should I apply Dijkstra's algorithm, with an ArrayList as the container of the visited nodes?
Is this a good case for using the NDK? The task is CPU-intensive, but it also makes frequent accesses to memory, so I don't think so; still, I'm open to suggestions.
Always remember that resources are scarce: RAM is insufficient, disk is slow, and CPU is precious (battery-wise).
Any advice is welcome, cheers :)
For this many nodes, I would suggest acquiring a cloud-computing service and letting the Android app communicate with it.
How about Hadoop MapReduce on Amazon's cloud? There are many graph frameworks, such as Mahout, and it is really fast, and at least very scalable if there are ever more nodes and edges.
A linked-list representation (an adjacency list, i.e. one edge list per vertex) is the best data structure for storing big, sparse graphs.
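A minimal sketch of that idea, combined with the Dijkstra question above; it assumes simple int weights, whereas in this project the weights depend on the graph's state, so the weight lookup would have to become a function rather than a stored constant, and a PriorityQueue is used for the frontier instead of an ArrayList:

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

class SparseGraph {
    // adjacency.get(v) holds {target, weight} pairs for edges leaving v
    private final List<List<int[]>> adjacency;

    SparseGraph(int vertexCount) {
        adjacency = new ArrayList<>(vertexCount);
        for (int v = 0; v < vertexCount; v++) adjacency.add(new ArrayList<>());
    }

    void addEdge(int from, int to, int weight) {
        adjacency.get(from).add(new int[] { to, weight });
    }

    // Dijkstra's algorithm: returns the shortest distance from source
    // to every vertex (Integer.MAX_VALUE means unreachable).
    int[] shortestDistances(int source) {
        int n = adjacency.size();
        int[] dist = new int[n];
        Arrays.fill(dist, Integer.MAX_VALUE);
        dist[source] = 0;
        // queue entries are {vertex, distance}, ordered by distance
        PriorityQueue<int[]> queue =
                new PriorityQueue<>((a, b) -> Integer.compare(a[1], b[1]));
        queue.add(new int[] { source, 0 });
        while (!queue.isEmpty()) {
            int[] head = queue.poll();
            int v = head[0], d = head[1];
            if (d > dist[v]) continue; // stale entry, already improved
            for (int[] edge : adjacency.get(v)) {
                int to = edge[0], newDist = d + edge[1];
                if (newDist < dist[to]) {
                    dist[to] = newDist;
                    queue.add(new int[] { to, newDist });
                }
            }
        }
        return dist;
    }
}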