DLFileEntry size in memory - Java

Can someone tell me what the size of a DLFileEntry record in memory is? I'm holding a List of DLFileEntries and I just want to be sure that my portlet won't have memory issues after it is deployed on a server operating with a large number of records. Or can someone give me a guide on how to obtain this information? Thank you.

You could run a quick test, or look at the source and identify the members. You're probably referring to the binary data, which is not stored directly in an object of this class. However, it will most likely be cached, so yes, there is some memory overhead. Do you actually need the binary data, or will you just hold placeholders without accessing the binary data for all the documents you're holding in memory?
(Note: the source code I'm linking is the current master branch - check the version you're actually using and figure out whether something has changed. As you don't give the version, I'll leave that task to you. Also, you might want to check the superclasses; I didn't find anything suspicious in master.)
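As a starting point for such a quick test, here is a rough heap-based sketch. It uses plain strings as stand-ins for DLFileEntry objects (substitute populated entries in a real test), and the numbers are only a crude estimate since System.gc() is merely a hint to the JVM:

import java.util.ArrayList;
import java.util.List;

public class SizeEstimate {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        System.gc(); // only a hint; treat the result as a rough estimate
        long before = rt.totalMemory() - rt.freeMemory();

        int count = 100_000;
        List<String> entries = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            // substitute a populated DLFileEntry here in a real test
            entries.add("placeholder-" + i);
        }

        System.gc();
        long after = rt.totalMemory() - rt.freeMemory();
        System.out.printf("~%d bytes per entry%n", (after - before) / count);
        System.out.println(entries.size()); // keep the list reachable
    }
}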

Related

How to handle a large XML file in Java (around 5 GB)

My application needs to use data in an XML file that is up to 5 GB in size. I load the data into Image classes from the XML. The Image class has many attributes, like path, name, MD5 hash, and other information like that.
The 5 GB file holds around 50 million image records. When I parse the XML, the data is loaded into the app and the same number of Image objects is created, and I perform different operations and calculations on them.
My problem is that when I parse such a huge file, my memory gets eaten up; I guess all the data is being loaded into RAM. Due to the complexity of the code, I'm unable to provide the whole code. Is there an efficient way to handle such a huge number of objects? I have done research all night without success. Can someone point me in the right direction?
Thanks
You need some sort of pipeline to pass the data on to its actual destination without ever storing it all in memory at once.
I don't know how your code does the parsing, but you don't need to keep all the data in memory.
Here is a very good answer on how to implement reading of large XML files.
If you're using SAX but are still eating up memory, then you are doing something wrong, and there is no way we can tell you what you are doing wrong without seeing your code.
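For illustration, here is a minimal SAX sketch that processes one record at a time instead of building the whole document in memory. The element name "image" and its attributes are hypothetical, since the actual schema wasn't shown:

import java.io.File;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class ImageSaxParser {
    public static void main(String[] args) throws Exception {
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attrs) {
                if ("image".equals(qName)) { // hypothetical element name
                    String path = attrs.getValue("path"); // hypothetical attributes
                    String md5 = attrs.getValue("md5");
                    process(path, md5); // work on one record, then let it go
                }
            }

            private void process(String path, String md5) {
                // perform the per-image operations and calculations here
            }
        };
        // The parser streams the file, so memory use stays flat regardless of size.
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new File("images.xml"), handler);
    }
}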
I suggest using JVisualVM to get a heap dump and see what objects are using up the memory, and then investigating the part of your application that creates those objects.

Java HashMap: how to reduce its memory usage

I'm using a HashMap in Java and I have noticed that it consumes too much memory, but I need to be able to look elements up as quickly as possible.
Is there a way to reduce the HashMap's memory footprint if I know beforehand how many elements I will put inside?
I know roughly how much information I will store, but not exactly.
My problem is that I read a file in which the information is divided into two sets, and I have to connect these two sets of information in the same structure.
I know that the HashMap, in order to work well, wastes more than 25% of the memory it allocates.
Thank you for your help.
use:
new HashMap<>(capacity);
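A minimal sketch of what that looks like in practice: to avoid rehashing, the initial capacity should account for the default load factor of 0.75.

import java.util.HashMap;
import java.util.Map;

public class PresizedMap {
    public static void main(String[] args) {
        int expected = 1_000_000; // known element count
        // capacity = expected / loadFactor, so the table never resizes
        int capacity = (int) (expected / 0.75f) + 1;
        Map<String, String> map = new HashMap<>(capacity, 0.75f);
        // ... fill the map; no rehashing occurs up to 'expected' entries ...
        System.out.println("initial capacity hint: " + capacity);
    }
}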

Creating a very, very large Map in Java

Using Java I would like to create a Map that can grow and grow and potentially become larger than the available memory. Now obviously using a standard POJO HashMap we're going to run out of memory and the JVM will die with an OutOfMemoryError. So I was thinking along the lines of a Map that, if it becomes aware of memory running low, can write its current contents to disk.
Has anyone implemented anything like this or knows of any existing solutions out there?
What I'm trying to do is read a very large ASCII file (say 50 GB) a line at a time. Each line contains a key and a value. Keys can be duplicated in the file. I'll then store each line in a Map that maps each key to a list of values. This Map is the object that will just grow and grow.
Any advice greatly appreciated.
Phil
Update:
Thanks for all the comments and advice, everyone. For the problem as I described it, a database is the correct, scalable solution. I should have stated that this is a temporary Map that needs to be created and used for a short period of time to aid in the parsing of a file. In this case, Michael's suggestion to "store only the line number instead of the actual value" is the most appropriate. Marking Michael's answer(s) as the recommended solution.
I think you are looking for a database.
A NoSQL database will probably be easy to set up, and it is more akin to a map.
Check out BerkeleyDB Java Edition, now from Oracle.
It has a map-like interface and can be embedded, so no complex setup is needed.
This sounds like a case for dumping your huge file into a database.
Well, I had a similar situation. In my case everything was in TXT format and every line in the file had the same format. So what I did was split the file into several pieces (the maximum size my JVM was able to process), then I processed the files one by one.
Alternatively, you can load your data directly into a database.
Seriously, choose a simple database as advised. It's not overhead: you don't have to use JPA or whatnot, just plain JDBC with native SQL. Derby or HSQL, for example, can run in embedded mode, with no need to define users or access rights, or to start the server separately.
The "overhead" will stab you in the back when you've plodded far into the hash-map solution and it turns out that you need yet another optimization to avoid the OutOfMemoryError, or the file is not 50 GB but 75... Really, don't go there.
If you're just wanting to build up the map for data processing (rather than random access in response to requests), then MapReduce may be what you want, with no need to work with a database.
Edit: Note that although many MapReduce introductions focus on the ability to run many nodes, you should still get benefit from sidestepping the requirement to hold all the data in memory on one machine.
How much memory do you have? Unless you have enough memory to keep most of the data in memory, it's going to be so slow that it may as well have failed. A program which is heavily paging can be 1000x slower or more. Some PCs have 16-24 GB, and you might consider getting more memory.
Let's assume there are enough duplicates that you can keep most of the data in memory. I suggest you use a byte-based String class of your own making, since you have ASCII data, and store your values as another of these "String" types (with a separator). You may find you can keep the working data set in memory.
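A minimal sketch of such a byte-based string, usable as a map key; the class name is made up. (On Java 9+, compact strings already store Latin-1 text in one byte per character, so the gain mainly applies to older JVMs.)

import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// A byte-backed ASCII string: roughly half the memory of a char-based
// String on pre-Java-9 JVMs, with equals/hashCode so it works as a key.
public final class AsciiKey {
    private final byte[] bytes;

    public AsciiKey(String s) {
        this.bytes = s.getBytes(StandardCharsets.US_ASCII);
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof AsciiKey && Arrays.equals(bytes, ((AsciiKey) o).bytes);
    }

    @Override
    public int hashCode() {
        return Arrays.hashCode(bytes);
    }

    @Override
    public String toString() {
        return new String(bytes, StandardCharsets.US_ASCII);
    }
}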
I use BerkeleyDB for this, though it is more complicated than a Map (they do have a Map wrapper, which I don't really recommend for anything but simple applications).
http://www.oracle.com/technetwork/database/berkeleydb/overview/index.html
It is also available in Maven http://www.oracle.com/technetwork/database/berkeleydb/downloads/maven-087630.html
<dependencies>
    <dependency>
        <groupId>com.sleepycat</groupId>
        <artifactId>je</artifactId>
        <version>3.3.75</version>
    </dependency>
</dependencies>
<repositories>
    <repository>
        <id>oracleReleases</id>
        <name>Oracle Released Java Packages</name>
        <url>http://download.oracle.com/maven</url>
        <layout>default</layout>
    </repository>
</repositories>
It also has the disadvantage of vendor lock-in (i.e. you are tied to this particular tool, though there may be Map wrappers for other databases as well).
So just choose according to your needs.
Most cache APIs work like maps and support overflow to disk. Ehcache, for example, supports that. Or follow this tutorial for Guava.

Java: which of these two methods is more efficient?

I have a huge data file and I only need specific data from it; later on, I will be using this data frequently.
So which of these two methods would be more efficient:
save this data in global variables (maybe a LinkedList) and use it every time I need it
save it in a file, and read the file every time I need the data
I should mention that the data could be a huge amount of integers.
Which of the two ways would give better performance with respect to speed and memory?
If the file I/O overhead is not an issue for you: Save them in a file and create an index file mapping keys to file positions so you do not have to read your huge file.
If the data fits in your RAM and you want to be able to access it quickly - go by the first approach (but maybe without an index file) but read the data into memory at startup or when needed the first time.
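To illustrate the index-file idea from the first option, here is a minimal sketch. It assumes a plain-text data file (data.txt, a made-up name) with the key as the first space-separated token on each line:

import java.io.RandomAccessFile;
import java.util.HashMap;
import java.util.Map;

public class IndexedFile {
    public static void main(String[] args) throws Exception {
        // Builds an in-memory index of key -> byte offset, then seeks
        // into the data file on demand instead of scanning it.
        Map<String, Long> index = new HashMap<>();
        try (RandomAccessFile raf = new RandomAccessFile("data.txt", "r")) {
            long pos = raf.getFilePointer();
            String line;
            while ((line = raf.readLine()) != null) {
                String key = line.split(" ", 2)[0]; // assumed line format
                index.put(key, pos);
                pos = raf.getFilePointer();
            }
            // Later: jump straight to a record without reading the whole file.
            Long offset = index.get("someKey");
            if (offset != null) {
                raf.seek(offset);
                System.out.println(raf.readLine());
            }
        }
    }
}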
As long as it fits in memory, working with memory is surely orders of magnitude faster. But do not use LinkedList - it has a huge per-element overhead. And do not use any standard Collection at all, since that means boxing and blows up the memory overhead by a factor of 3 at least.
You could use an int[] or a specialized collection library for primitive types.
I'd recommend using a file via java.nio.IntBuffer. This way the data resides primarily on disk but gets mapped into memory too.
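A minimal sketch of that approach, assuming the ints were written as a plain binary file (named ints.bin here for illustration) in big-endian order, and that the file fits in a single mapping (under 2 GB):

import java.io.RandomAccessFile;
import java.nio.IntBuffer;
import java.nio.channels.FileChannel;

public class MappedInts {
    public static void main(String[] args) throws Exception {
        try (RandomAccessFile raf = new RandomAccessFile("ints.bin", "r");
             FileChannel ch = raf.getChannel()) {
            // The OS pages data in and out as needed; the heap stays small.
            IntBuffer ints = ch.map(FileChannel.MapMode.READ_ONLY,
                                    0, ch.size()).asIntBuffer();
            int first = ints.get(0); // random access by index
            System.out.println("first int: " + first
                    + ", count: " + ints.limit());
        }
    }
}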
Probably the first one.
But there really isn't enough information there to answer you properly.
Firstly a linked list is fine if you only ever traverse it in order. However, if you need random access to it (5th element, then 100th, then 12th, then 45th...), it's lousy, and you'd be better with an ArrayList or something. Secondly, if you're storing lots of ints, if you use one of the standard Java collections, each int will be boxed, which may present a performance overhead.
Then you haven't said what 'huge' means. Thousands? Millions?
So, yeah, you need to say what kind of numbers you're dealing with, and what the access patterns are likely to be. And is the 'filtering' step a one-off, or is it done quite frequently?
It depends on the system spec. If you are designing your app for one machine, the task is simple; otherwise you should take into account the memory and/or disk-space limits on the client's computer.
I don't think you can directly compare the performance of these two approaches, as each one has its own benefits and drawbacks. I'm certain there are algorithms available that you could investigate further, connected with reading part of a file into memory, or creating a cache (when you read a number from the file, store it in memory, so the next time you need it, it will already be there).

Are all .class files in my Java application loaded into memory after application start?

I am making an app for Android. In one Activity I need to load an array of about 10000 strings. Loading it from the database was slow, so I decided to put it directly into a .java file (as a private field). I have about 20 of these classes containing string arrays, and my question is: are all these classes loaded into memory after my application starts? If so, the Activity in which I need the strings would load quickly, but the application as a whole would have a slow start...
Is there another way to very quickly load a 10000-string array from a file?
UPDATE:
Why do I need these strings? My Android app allows you to find "journeys" in Prague's public transit - you choose a departure stop and an arrival stop, and it finds your journey (have a look here). My app has a suggestions feature - you enter the letter "c" as your departure stop, and a suggestions ListView appears with stops starting with "c". I need the strings for these suggestions. Fetching the suggestions from the database is slow (about 400 ms on a G1).
First, 400 ms to perform a simple database query is really slow. So slow that I'd suspect there is some problem with your database schema (e.g. indices) or your database connection configuration.
But if you are serious about not using a database, there are a couple of alternatives to what you are currently doing:
Arrange for the classes containing the arrays to be lazily loaded as required, using Class.forName(...). If you implement it right, it should be possible for the garbage collector to reclaim the classes after they have been loaded and the strings have been added to your primary data structure.
Turn the 10000 strings into a flat file, put the file into your app's JAR file, and then use Class.getResourceAsStream(...) to open the file and read it into the in-memory array (see the sketch after this list).
As above, but using an indexed file and replacing the array with a data structure that lets you read strings from the file lazily. (This will be a bit complicated, but if you are worried about the memory consumed by the 10000 strings, it will help address that.)
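A minimal sketch of the second option, assuming a one-string-per-line resource named stops.txt bundled in the JAR (the name is illustrative):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class StopLoader {
    // Reads a flat resource (one stop name per line) from the classpath.
    public static List<String> loadStops() throws Exception {
        List<String> stops = new ArrayList<>(10_000);
        try (BufferedReader in = new BufferedReader(new InputStreamReader(
                StopLoader.class.getResourceAsStream("/stops.txt"),
                StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                stops.add(line);
            }
        }
        return stops;
    }
}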
A class is loaded only when it is first referenced.
Though you need an array of 10000, you may not need all of it at once. This is where the concept of paging comes in. This link indicates that paging is often done in Android. Initially keep only a small part of the array in memory, and as you need more, load it into memory and unload any previous data that is no longer wanted.
For example, in any table the user sees at best 50 records at a time, then he will have to scroll (assuming his screen is not the size of an IMAX movie theatre). When he scrolls, load the next chunk of data and unload any data that is now invisible to the user.
When is a Type Loaded? This is a surprisingly tricky question to answer. This is due in large part to the significant flexibility afforded, by the JVM spec, to JVM implementations. Loading must be performed before linking and linking must be performed before initialization. The VM spec does stipulate the timing of initialization. It strictly requires that a type be initialized on its first active use (see Appendix A for a list of what constitutes an "active use"). This means that loading (and linking) of a type MUST be performed at or before that type's first active use.
From http://www.developer.com/java/other/article.php/2248831/Java-Class-Loading-The-Basics.htm
I don't think you will be happy maintaining 10K strings hardcoded in Java files.
Rather, check whether you are using the right database for your problem and whether your indices are set correctly. A wrong index can cause really poor performance.
Additionally, you should limit the number of results returned by the query, but make sure you don't fetch the entries one by one.
If nothing fits, you can still preload the strings from the database at startup.
You could preload, say, 10 entries for each starting character. When a character is keyed in, you can preload the entries starting with that character followed by the next one, and so on.
