Derby DB - File size leak

Every now and then one of our remotely deployed Java apps with a local Derby DB locks up while waiting for a transaction to complete on a table. Every 24 hours the app syncs this Derby DB with our juggernaut Oracle server, which means that after delivering its contents, it clears all of its tables and then completely repopulates them.
I opened the hood and found that the .dat file for one of the tables has blown out to 100 MB. I'm not sure exactly how long it takes to get to this point, because the users generally aren't around when the thing syncs. The gradual slowdown wasn't reported until the app appeared to have completely stopped. I'm also not sure how this could logically happen, since although autocommit is off, the code seems pretty tight.
After some googling, I think what may have happened is that an old transaction was somehow forgotten and neither committed nor rolled back. Future syncs, which clear the table and repopulate it, essentially grow the file by its original size every sync, since Derby is tracking the table's entire history back to that old point.
Is there some way to confirm this and/or am I thinking completely down the wrong track? Is it possible to list old or long running transactions? If any are found, how can I clear them and will clearing them reclaim the disk space?
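One way to check the stuck-transaction theory is through Derby's diagnostic tables. The following is a minimal sketch rather than a tested fix: it assumes an already open JDBC `Connection` to the Derby database, and the names `DerbyDiagnostics`, `listTransactions` and `compressTable` are hypothetical. `SYSCS_DIAG.TRANSACTION_TABLE` lists the transactions Derby currently holds in memory, and `SYSCS_UTIL.SYSCS_COMPRESS_TABLE` rebuilds a table so that unused space can be returned to the operating system.

```java
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper class; only the two SQL strings are Derby built-ins. */
final class DerbyDiagnostics {
    // Diagnostic table of the transactions Derby currently has in memory.
    static final String LIST_TRANSACTIONS = "SELECT * FROM SYSCS_DIAG.TRANSACTION_TABLE";

    // System procedure: schema name, table name, sequential flag (SMALLINT).
    static final String COMPRESS_TABLE = "CALL SYSCS_UTIL.SYSCS_COMPRESS_TABLE(?, ?, ?)";

    /** Returns one "COLUMN=value, ..." line per open transaction. */
    static List<String> listTransactions(Connection conn) throws SQLException {
        List<String> rows = new ArrayList<>();
        try (Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery(LIST_TRANSACTIONS)) {
            ResultSetMetaData md = rs.getMetaData();
            while (rs.next()) {
                StringBuilder row = new StringBuilder();
                for (int i = 1; i <= md.getColumnCount(); i++) {
                    if (i > 1) {
                        row.append(", ");
                    }
                    row.append(md.getColumnName(i)).append('=').append(rs.getString(i));
                }
                rows.add(row.toString());
            }
        }
        return rows;
    }

    /** Compresses one table; names are case-sensitive (e.g. "APP", "MYTABLE"). */
    static void compressTable(Connection conn, String schema, String table) throws SQLException {
        try (CallableStatement cs = conn.prepareCall(COMPRESS_TABLE)) {
            cs.setString(1, schema);
            cs.setString(2, table);
            cs.setShort(3, (short) 1); // non-zero = sequential mode: slower, less memory
            cs.execute();
        }
    }
}
```

A row that is still listed from one sync to the next would point at a transaction that was never committed or rolled back. Ending it is up to the connection that owns it (a commit, a rollback, or closing that connection), and the compress call needs an exclusive lock on the table, so it is best run while nothing else is using it.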

Related

How can I completely save an entire application and all of its memory, and re-load later?

Before I begin, let me say that this may be the most absurd question ever asked here, because I've never even heard of anything like this to know if it's remotely possible. That said, if it is possible, I'd like to find out how I can do it.
I am running a Java program and I want to be able to save it and restore from the save later, to reduce startup time.
The application state at the time of saving would not yet have been burdened down by multiple processes and users, but it would be ready to listen for new users as soon as it is restored.
How can I
Save the application and all of its used memory
Load the entire application later, right where it left off
Or is this even possible? From what I know of computer hibernation, something similar is done; the data saved in RAM is written to the disk and loaded later, but I don't know if it's possible on a server or even a PC.

Stale Lucene index when using multiple machines

I've got a Java/Hibernate/MySQL application up and running, and it works very nicely.
Recently I've been using Lucene (Hibernate Search) to speed up the searching and avoid round trips to the database by using projection. That works great too, except that the index gets stale when the application gets used on multiple machines. Lucene does a good job of updating the local index when changes are made locally, but it can't see changes from other machines.
Currently, I am:
reindexing in full once a week
updating a "last modified" time on all records, and updating the local index at start time based on anything modified since last indexing
But this doesn't work for deletions. If something gets deleted on one machine, it still turns up in searches on other machines.
Is there a 'standard' way to deal with this? I can think of a few options, none of which excite me:
reindex in full every night (still stale during the day, though)
maintain a table of deleted records so that I can use it to update locally
perform a round trip to the db at startup time to find all entries in the index but not in the db
add some sort of trigger to the db to record something somewhere when something gets deleted (this would work for updates as well as deletions)
Hard to believe this is a new problem, but I couldn't find any convincing answers.
Any help much appreciated.
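The third option listed above (a startup check for entries that are in the index but no longer in the database) comes down to a set difference. A minimal sketch, assuming the entity IDs held in the local index and the IDs currently in the database can each be loaded into a set; `StaleIndexCheck` and `findDeletedIds` are hypothetical names, and actually purging the matching documents from the Lucene index is left out:

```java
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper: works on plain ID sets, not on Lucene itself. */
final class StaleIndexCheck {
    /** Returns the IDs that are in the local index but no longer in the database. */
    static Set<Long> findDeletedIds(Set<Long> indexedIds, Set<Long> databaseIds) {
        Set<Long> deleted = new TreeSet<>(indexedIds); // copy, so the inputs stay untouched
        deleted.removeAll(databaseIds);
        return deleted;
    }
}
```

For example, with indexed IDs {1, 2, 3, 4} and database IDs {2, 4, 5}, the result is {1, 3}: those two documents should be removed locally. ID 5 is new rather than deleted and would be picked up by the existing "last modified" pass.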

Downloading A Large SQLite Database From Server in Binary vs. Creating It On The Device

I have an application that requires the creation and download of a significantly large SQLite database. Depending on the user's data, creation of the db and the syncing of data from the server can take upwards of 20 to 25 minutes (some customers have a LOT of data). The data is downloaded as JSON and processed with Android's built in JSON classes.
To account for OutOfMemory issues I was having with some devices, I needed to limit the per-call download from the server to 500 records at a time. But, as of now, all of the above is working successfully - although slow.
Recently, there has been talk on my team of creating the complete SQLite DB on the server side and then just downloading it to the device as a binary file in an effort to speed things up. I've never done this before. Is this indeed a viable option, or should I just be looking into speeding up the JSON processing with a third-party library such as Gson or Jackson?
Thanks in advance for your input.

From my experience with mobile devices, reinventing synchronization is overkill most of the time. It obviously depends on the hardware, the software and the amount of data you're working with. But most of the time, long execution times on mobile devices are caused by faulty design, careless coding, or failure to take the specifics of embedded systems into consideration.
Unfortunately, I can only give you some hints to consider, given the pretty vague description of the issues you're facing. I mean, "a LOT" doesn't mean much to me: I've seen mobile apps with DBs containing millions of records running pretty smoothly, and ones with around 1K records running horribly slowly and causing the UI to freeze. You also didn't mention which OS version and device (or at least its capabilities) you're using, what the server configuration is, what software is installed, or which libraries and frameworks are used and in what modes. It all matters when you really want to speed things up.
Apart from making sure gzip encoding is enabled (I believe you left it at the default, which is on), you should give these ideas a try:
Streaming! Make sure both the client and the server use a streaming version of the JSON API and use buffered streams. If either doesn't, replace it with a library that does. Jackson has one of the fastest streaming APIs. Sure, it's more cumbersome to write a (de)serializer, but it pays off. When done properly, neither side has to create a buffer large enough to (de)serialize all the data, fill it with contents, and then parse or write it. Instead, a much smaller buffer is allocated and filled gradually as successive fields are serialized. When this buffer fills up, its contents are immediately sent to the other end of the data channel, where they can be deserialized right away. The process continues until all the data has been transmitted in small chunks. This makes the data interchange much smoother and less resource-intensive.
For large batch inserts or updates, use prepared statements. It also sometimes helps to insert your data without constraints and create them afterwards; that way, for example, an index can be computed in one run instead of being updated for each insert. Either don't use transactions (they require maintaining extra database logs), or commit every 300 rows to minimize the overhead. If you're updating an existing database and atomic modifications are necessary, load the new data into a temporary database and, if everything is OK, replace the old database with the new one on the fly.
Almost always, some data can be precomputed and stored on an SD card, for example. Or it can be loaded directly onto an SD card as a prepared SQLite DB at the company. If a task requires so much data that an import takes more than 10 minutes, you probably shouldn't do that task on a mobile device in the first place.
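The batch-insert advice above can be sketched with plain JDBC. This is an illustration under assumptions, not the asker's code: the `records` table, its two columns and the `BatchInserter` class are hypothetical, and on Android the same pattern would go through the platform's SQLite classes rather than `java.sql`. One prepared statement is reused for every row, and the work is committed every `batchSize` rows.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

/** Hypothetical example class; table and column names are made up. */
final class BatchInserter {
    static final String INSERT_SQL = "INSERT INTO records (id, payload) VALUES (?, ?)";

    /** Inserts all payloads, committing every batchSize rows; returns the number of commits. */
    static int insertAll(Connection conn, List<String> payloads, int batchSize) throws SQLException {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        boolean oldAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false); // commit explicitly, once per batch
        int commits = 0;
        try (PreparedStatement ps = conn.prepareStatement(INSERT_SQL)) {
            int pending = 0;
            long id = 0;
            for (String payload : payloads) {
                ps.setLong(1, id++);
                ps.setString(2, payload);
                ps.addBatch();
                if (++pending == batchSize) {
                    ps.executeBatch();
                    conn.commit();
                    commits++;
                    pending = 0;
                }
            }
            if (pending > 0) { // flush the last, partially filled batch
                ps.executeBatch();
                conn.commit();
                commits++;
            }
        } catch (SQLException e) {
            conn.rollback(); // discard the uncommitted rows of the failed batch
            throw e;
        } finally {
            conn.setAutoCommit(oldAutoCommit);
        }
        return commits;
    }
}
```

With 7 rows and a batch size of 3, this issues three commits (3 + 3 + 1 rows). With SQLite in particular, statements issued outside an explicit transaction are each committed on their own, so grouping a few hundred rows per commit is usually the faster of the two options mentioned above.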

Java Memory Leak Due to Massive Data Processing

I am currently developing an application that processes several files, each containing around 75,000 records (stored in binary format). When this app is run (manually, about once a month), about 1 million records are contained within the files. The files are put in a folder, the user clicks Process, and the app stores the records in a MySQL database (table_1).
The records contain information that needs to be compared to another table (table_2) containing over 700k records.
I have gone about this a few ways:
METHOD 1: Import Now, Process Later
In this method, I would import the data into the database without any processing against the other table. However, when I wanted to run a report on the collected data, it would crash, presumably because of a memory leak (1 GB used in total before the crash).
METHOD 2: Import Now, Use MySQL to Process
This is what I would like to do, but in practice it didn't turn out so well. Here I would write the logic for finding the correlations between table_1 and table_2 on the MySQL side. However, the MySQL result is massive and I couldn't get consistent output; sometimes MySQL simply gave up.
METHOD 3: Import Now, Process Now
I am currently trying this method, and although the memory leak is subtle, it still only gets to about 200,000 records before crashing. I have tried numerous forced garbage collections along the way, properly destroying classes, etc. It seems something is fighting me.
I am at my wits' end trying to solve the issue with the memory leak / the app crashing. I am no expert in Java and have yet to really deal with very large amounts of data in MySQL. Any guidance would be extremely helpful. I have put thought into these methods:
Break the processing of each line into an individual class, hopefully releasing any memory used for each line
Some sort of stored routine where once a line is stored into the database, MySQL does the table_1 <=> table_2 computation and stores the result
But I would like to pose the question to the many skilled Stack Overflow members to learn properly how this should be handled.

I concur with the answers that say "use a profiler".
But I'd just like to point out a couple of misconceptions in your question:
The storage leak is not due to massive data processing. It is due to a bug. The "massiveness" simply makes the symptoms more apparent.
Running the garbage collector won't cure a storage leak. The JVM always runs a full garbage collection immediately before it decides to give up and throw an OOME.
It is difficult to give advice on what might actually be causing the storage leak without more information on what you are trying to do and how you are doing it.

The learning curve for a profiler like VisualVM is pretty gentle. With luck, you'll have an answer - or at least a very big clue - within an hour or so.

You properly handle this situation by either:
generating a heap dump when the app crashes and analyzing it in a good memory profiler
hooking up the running app to a good memory profiler and looking at the heap
I personally prefer YourKit (yjp), but there are some decent free tools as well (e.g. jvisualvm and NetBeans).

Without knowing too much about what you're doing: if you're running out of memory, there's likely some point where you're storing everything in the JVM, but you should be able to do a data processing task like this without the severe memory problems you're experiencing. In the past, I've seen data processing pipelines run out of memory because one class reads stuff out of the DB, wraps it all up in a nice collection, and then passes it off to another, which of course requires all of the data to be in memory simultaneously. Frameworks are good at hiding this sort of thing.
Heap dumps and digging with VisualVM haven't been terribly helpful for me, as the details I'm looking for are often hidden. For example, if you've got a ton of memory filled with maps of strings, it doesn't really help to be told that Strings are the largest component of your memory usage; you need to know who owns them.
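The collect-everything pattern described above can usually be replaced by pulling records through an iterator and handing them on in small batches, so that only one batch is reachable at a time. A minimal sketch; `BatchProcessor` and `processInBatches` are hypothetical names, and the iterator stands in for whatever reads the binary files or walks the database cursor:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical helper that never holds more than one batch of records. */
final class BatchProcessor {
    /** Feeds the source to the handler in batches; returns the number of records seen. */
    static <T> long processInBatches(Iterator<T> source, int batchSize, Consumer<List<T>> handler) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        long total = 0;
        List<T> batch = new ArrayList<>(batchSize);
        while (source.hasNext()) {
            batch.add(source.next());
            if (batch.size() == batchSize) {
                handler.accept(batch);
                total += batch.size();
                batch = new ArrayList<>(batchSize); // drop the reference to the old batch
            }
        }
        if (!batch.isEmpty()) { // last, partially filled batch
            handler.accept(batch);
            total += batch.size();
        }
        return total;
    }
}
```

This only keeps memory flat if the handler does not hold on to the lists it receives; if it appends them to a long-lived collection, the leak simply moves there.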
Can you post more detail about the actual problem you're trying to solve?

DB2 jdbc performance

While profiling a Java application running on WebSphere 7 and DB2, we can see that we spend most of our time in the com.ibm.ws.rsadapter.jdbc package, handling connections to and from the database.
How can we tune our jdbc performance?
What other strategies exist when database performance is a bottleneck?
Thanks

You should check your WebSphere manual for how to configure a connection pool.
Update 2021
Here is an introduction including code samples

One cause of slow connect times is a deactivated database, which does not open its files and allocate its memory buffers and heaps until the first application attempts to connect to it. Ask your DBA to confirm that the database is active before running your tests. The LIST ACTIVE DATABASES command (run from the local DB2 server or over a remote attachment) should show your database in its output. If the database is not activated, have your DBA activate it explicitly with ACTIVATE DATABASE yourDBname. That will ensure that the database files and memory structures remain available even when the last user disconnects from the database.
Use GET MONITOR SWITCHES to ensure all your monitor switches are enabled for your database, otherwise you'll miss out on some potentially revealing performance details. The additional overhead of tracking the data associated with those monitor switches is minimal, while the value of the performance data is significant.
If the database is always active and things still seem slow, there are detailed DB2 traces called event monitors that log everything they encounter to a file, pipe, or DB2 table. The statement event monitor is one I turn to fairly often to analyze SQL statement efficiency and UOW hygiene. I also prefer taking the extra hit to log the event monitor records to a table rather than a file, so I can use SQL to search the data for all sorts of patterns. The db2evtbl utility makes it fairly easy to define the event monitor you want and create the tables to store its output. The SET EVENT MONITOR STATE command is how you start and stop the event monitor you've created.

In my experience what you are seeing is pretty common. The question to ask is what exactly is the DB2 connection doing...
The first thing to do is to try to isolate the performance issue to one section of the website, i.e. is there one part of the application that sees poor performance? Once you find it, you can increase the trace logging to see whether you can spot the query causing issues.
Additionally, if you chat to your DBAs, they may be able to run some analysis on the database to tell you which queries are taking the longest to return values. This may also help in your troubleshooting.
Good luck!

Connection pooling
Caching
DBAs
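As an illustration of the caching item, here is a minimal sketch of a small in-memory LRU cache placed in front of a database lookup. `QueryCache` and the loader function are hypothetical; a real application would also need an invalidation strategy for changed rows and, under concurrent access, synchronization.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical read-through cache; not thread-safe as written. */
final class QueryCache<K, V> {
    private final Map<K, V> entries;
    private final Function<K, V> loader;

    QueryCache(final int capacity, Function<K, V> loader) {
        this.loader = loader;
        // accessOrder=true keeps the least recently used entry first in iteration order.
        this.entries = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > capacity; // evict once the cache grows past capacity
            }
        };
    }

    /** Returns the cached value, calling the loader (e.g. a database query) only on a miss. */
    V get(K key) {
        V value = entries.get(key);
        if (value == null) {
            value = loader.apply(key);
            entries.put(key, value);
        }
        return value;
    }

    int size() {
        return entries.size();
    }
}
```

A cache like this only pays off for data that is read far more often than it changes; for anything else, the connection pool settings and the DBA's query analysis are the better places to start.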
