MySQL memory exhausted error - java

Today I was using a simple Java application to load a large amount of data into a MySQL DB, and got the error below:
java.sql.SQLException: Syntax error or access violation message from server: "memory exhausted near ''Q1',2.34652631E10,'000','000',5.0519608E9,5.8128358E9,'000','000',8.2756818E9,2' at line 5332"
I've tried modifying the my.ini file to increase some settings, but it doesn't help at all, and the file actually isn't that large - it's just a 14 MB XLS file. I'm almost out of ideas and waiting for any suggestion. Appreciate your help!

(Without the relevant parts of your code I can only guess, but here we go...)
From the error message, I will take a shot in the dark and guess that you are trying to load all 300,000 rows in a single query, probably produced by concatenating a whole bunch of INSERT statements into a single string. A 14 MB XLS file can become a lot bigger when translated into SQL statements, and your server runs out of memory trying to parse the query.
To resolve this (in order of preference):
Convert your file to CSV and use mysqlimport.
Convert your file to CSV and use LOAD DATA INFILE.
Use multiple transactions of moderate size, with only a few thousand INSERT statements each. This is the recommended option if you cannot simply import the file (see the sketch after this list).
Use a single transaction - an InnoDB MySQL database should be able to handle a transaction of this size.
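A minimal sketch of option 3, assuming a hypothetical table my_table(col_a, col_b) and that the XLS rows have already been parsed into memory; it uses JDBC batching and commits every few thousand rows so no single query or transaction gets too large:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.util.List;

public class ChunkedLoader {
    private static final int CHUNK_SIZE = 5000; // a few thousand rows per transaction

    // rows is assumed to already hold the values parsed from the XLS file
    public static void load(List<String[]> rows) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/mydb", "user", "password")) {
            conn.setAutoCommit(false); // one transaction per chunk
            try (PreparedStatement ps = conn.prepareStatement(
                    "INSERT INTO my_table (col_a, col_b) VALUES (?, ?)")) {
                int count = 0;
                for (String[] row : rows) {
                    ps.setString(1, row[0]);
                    ps.setString(2, row[1]);
                    ps.addBatch();
                    if (++count % CHUNK_SIZE == 0) {
                        ps.executeBatch();
                        conn.commit(); // checkpoint after every chunk
                    }
                }
                ps.executeBatch(); // flush the last partial chunk
                conn.commit();
            }
        }
    }
}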

Related

How to efficiently export/import database data with JDBC

I have a Java application that can use a SQL database from any vendor. Right now we have tested Vertica and PostgreSQL. I want to export all the data from one table in the DB and import it later into a different instance of the application. The DB is pretty big, so there are many rows in there. The export and import process has to be done from inside the Java code.
What we've tried so far is:
Export: we read the whole table (select * from) through JDBC and then dump it to an SQL file with all the INSERTS needed.
Import: The file containing those thousands of INSERTS is executed in the target database through JDBC.
This is not an efficient process. Firstly, the select * from part is giving us problems because of its size, and secondly, executing a lot of inserts one after another gives us problems in Vertica (https://forum.vertica.com/discussion/235201/vjdbc-5065-error-too-many-ros-containers-exist-for-the-following-projections)
What would be a more efficient way of doing this? Are there any tools that can help with the process or there is no "elegant" solution?
Why not do the export/import in a single step, with batching (for performance) and chunking (to avoid errors and to provide a checkpoint from which to resume after a failure)?
In most cases, databases support INSERT queries with many values, e.g.:
INSERT INTO table_a (col_a, col_b, ...) VALUES
(val_a, val_b, ...),
(val_a, val_b, ...),
(val_a, val_b, ...),
...
The number of rows you generate into a single such INSERT statement is then your chunk size, which might need tuning for the specific target database (big enough to speed things up, but small enough that the chunk does not exceed some database limit and cause failures).
As already proposed, each of these chunks should then be executed in a transaction, and your application should remember which chunk it executed successfully last, so that it can continue from there on the next run if an error occurs.
To select the chunks themselves, you should use LIMIT ... OFFSET.
This way you can repeat any chunk at any time, each chunk is atomic by itself, and it should perform much better than single-row statements.
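A rough sketch of one such chunk, assuming a hypothetical table table_a(col_a, col_b) and that both databases accept LIMIT/OFFSET and multi-value INSERT syntax; the chunk size is illustrative and would need tuning:

import java.sql.*;
import java.util.ArrayList;
import java.util.List;

public class ChunkedCopy {
    static final int CHUNK_SIZE = 1000; // tune for the target database

    // Copies one chunk starting at the given offset; returns the number of rows copied.
    // The caller persists the offset of the last successful chunk as its checkpoint.
    static int copyChunk(Connection source, Connection target, long offset) throws SQLException {
        // 1. Read one chunk from the source, in a stable order
        List<Object[]> rows = new ArrayList<>();
        try (Statement st = source.createStatement();
             ResultSet rs = st.executeQuery(
                     "SELECT col_a, col_b FROM table_a ORDER BY col_a "
                     + "LIMIT " + CHUNK_SIZE + " OFFSET " + offset)) {
            while (rs.next()) {
                rows.add(new Object[] { rs.getObject(1), rs.getObject(2) });
            }
        }
        if (rows.isEmpty()) return 0;

        // 2. Write the chunk as one multi-value INSERT inside one transaction
        StringBuilder values = new StringBuilder();
        for (int i = 0; i < rows.size(); i++) {
            values.append(i == 0 ? "(?, ?)" : ", (?, ?)");
        }
        target.setAutoCommit(false);
        try (PreparedStatement ps = target.prepareStatement(
                "INSERT INTO table_a (col_a, col_b) VALUES " + values)) {
            int p = 1;
            for (Object[] row : rows) {
                ps.setObject(p++, row[0]);
                ps.setObject(p++, row[1]);
            }
            ps.executeUpdate();
            target.commit();
        }
        return rows.size();
    }
}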
I can only speak about PostgreSQL.
The size of the SELECT is not a problem if you use server-side cursors by calling setFetchSize with a value greater than 0 (perhaps 10000) on the statement.
The INSERTS will perform well if
you run them all in a single transaction
you use a PreparedStatement for the INSERT
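A minimal sketch combining both points - a server-side cursor via setFetchSize on the source and a batched PreparedStatement insert inside a single transaction on the target - assuming a hypothetical table table_a(col_a, col_b). Note that the PostgreSQL JDBC driver only uses a server-side cursor when autocommit is off:

import java.sql.*;

public class PgCopy {
    static void copy(Connection source, Connection target) throws SQLException {
        // Server-side cursor: requires autocommit off and a fetch size > 0
        source.setAutoCommit(false);
        target.setAutoCommit(false); // run all inserts in a single transaction

        try (Statement select = source.createStatement();
             PreparedStatement insert = target.prepareStatement(
                     "INSERT INTO table_a (col_a, col_b) VALUES (?, ?)")) {
            select.setFetchSize(10000); // rows fetched per round trip

            try (ResultSet rs = select.executeQuery("SELECT col_a, col_b FROM table_a")) {
                int pending = 0;
                while (rs.next()) {
                    insert.setObject(1, rs.getObject(1));
                    insert.setObject(2, rs.getObject(2));
                    insert.addBatch();
                    if (++pending % 10000 == 0) {
                        insert.executeBatch(); // flush the batch, stay in the same transaction
                    }
                }
                insert.executeBatch();
            }
            target.commit();
        }
    }
}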
Each insert into Vertica goes into WOS (memory), and periodically data from WOS gets moved to ROS (disk) into a single container. You can only have 1024 ROS containers per projection per node. Doing many thousands of INSERTs at a time is never a good idea for Vertica. The best way to do this is to copy all that data into a file and bulk load the file into Vertica using the COPY command.
This will create a single ROS container for the contents of the file. Depending on how many rows you want to copy it will be many times (sometimes even hundreds of times) faster.
https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/SQLReferenceManual/Statements/COPY/COPY.htm
https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/ConnectingToVertica/ClientJDBC/UsingCOPYLOCALWithJDBC.htm
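A hedged sketch of the bulk-load step over JDBC; the host, table name, file path, and COPY options are placeholders, and the linked documentation covers the full syntax:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class VerticaBulkLoad {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:vertica://vertica-host:5433/mydb", "user", "password");
             Statement st = conn.createStatement()) {
            // COPY ... FROM LOCAL streams the client-side file to the server
            // and bulk loads it in one operation.
            st.execute("COPY table_a FROM LOCAL '/tmp/table_a.csv' "
                     + "DELIMITER ',' ABORT ON ERROR");
        }
    }
}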

Fetching data from DB2 for more than 49 records takes more time

I am basically comparing data from the lines of a text file with data from the DB. Here are my steps:
Reading data from each line & saving it in an array.
Iterating over each line of the text file & fetching data from DB2 for the unique card number present in that line.
Problem: When I run the above comparison for 50 records it works fine (takes only 29 seconds) and gives the correct result. But when I increase the number of lines in the text file (i.e. 55-60 lines), it still executes but takes an unexpected 20 minutes.
Due to data security I can't share code.
There isn't enough real information here to give an "answer". I do agree with Mao that line-by-line processing really isn't sustainable, but that's not the cause of this aberrant behaviour.
I assume based on the relative naiveté of your approach that you are probably creating a new Connection object for each line of the file. I suspect that what's happening is that you are reaching a limit on the number of concurrent connections for your DB2 server - 50 seems like a reasonable setting for that. If you are doing this, create a single Connection object and reuse it for every line processed.
Alternatively, if you're re-using the same Connection object for all the lines, I think it's likely that the DB2 server has some unusual settings that are causing your connection to behave in an unexpected way, e.g. limit on number of operations per connection. I am not super knowledgeable about DB2 so I can't suggest any specific settings to look for. You could try to work around this by disconnecting and reconnecting every 50 lines.
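A short sketch of the single-connection approach, assuming a hypothetical CARDS table keyed by CARD_NUMBER and an input file cards.txt; the point is that the Connection and PreparedStatement are created once, outside the loop:

import java.io.BufferedReader;
import java.io.FileReader;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class CardCheck {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:db2://db2-host:50000/MYDB", "user", "password");
             PreparedStatement ps = conn.prepareStatement(
                     "SELECT * FROM CARDS WHERE CARD_NUMBER = ?");
             BufferedReader reader = new BufferedReader(new FileReader("cards.txt"))) {
            String line;
            while ((line = reader.readLine()) != null) {
                ps.setString(1, line.trim()); // reuse the same connection and statement per line
                try (ResultSet rs = ps.executeQuery()) {
                    // compare the returned row with the line from the file here
                }
            }
        }
    }
}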
You should not iterate over each line and query the DB.
Instead, collect the card numbers into a list and get the data from the DB all at once with:
SELECT * FROM table WHERE CARD_NUMBER IN (cardNumberList)
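A sketch of that approach with a PreparedStatement, where the number of ? placeholders is generated from the size of the list (table and column names are placeholders; a very long list may still need to be split across several IN lists):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.List;
import java.util.stream.Collectors;

public class BulkCardLookup {
    static void fetchAll(Connection conn, List<String> cardNumbers) throws Exception {
        // Build "?, ?, ?, ..." with one placeholder per card number
        String placeholders = cardNumbers.stream()
                .map(c -> "?")
                .collect(Collectors.joining(", "));
        String sql = "SELECT * FROM CARDS WHERE CARD_NUMBER IN (" + placeholders + ")";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            for (int i = 0; i < cardNumbers.size(); i++) {
                ps.setString(i + 1, cardNumbers.get(i));
            }
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    // compare each returned row with the corresponding file line
                }
            }
        }
    }
}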

java.lang.OutOfMemoryError: Java heap space at executeQuery

I know there are lots of similar questions and I've read every one of 'em (at least I believe so) and I was not able to resolve my issue with the java.lang.OutOfMemoryError: Java heap space. Let me describe my problem.
I'm working on a simple Java program which queries DB and generates a CSV file.
Everything was fine and I was able to generate CSV files for queries with huge data and with around 320+ columns.
And sometime later I faced this issue where I queried a table with exactly 309 columns.
The query is something like SELECT * FROM TABLE_A, and the table had no rows, so the query returns 0 records.
Ideally this should create an empty file, which happens for all the other queries I've tried, except this one where I got this error, with the console pointing to the line where executeQuery is executed. Even with data in the table I get the same error.
The CSV only gets generated when I explicitly increase the heap size to more than 3 GB, whereas for the others it works with the default heap size. (No idea why it needs so much heap space for a table containing 0 records.)
And with the default heap size, this particular report gets generated successfully only when I have fewer columns, like 100-150.
Why do I get the out of memory issue for this one alone? Is it something to do with the table? To my knowledge the table is similar to all the other tables. Could it be because of the column sizes for this table? For most of the columns I have 255 as the column size.
I've spent 2-3 days analysing why this is happening, with no luck.
Can someone help me out with this? I don't think this is similar to any of the other out-of-memory issues out there. It's kinda weird.

Why does a Derby database take up so much space?

I am new to databases and I love how easy it is to get data from a relational database (such as a Derby database). What I don't like is how much data one takes up; I have made a database with two tables, written a total of 130 records to these tables (each table has 6 columns), and the whole relational database gets saved in the system directory as a folder that takes up a total of approximately 1914014 bytes! (Assuming that I did the arithmetic right....) What the heck is going on to cause such a huge request of memory?! I also notice that there is a log1.dat file in the log folder that takes up exactly 1 MB. I looked into this file via Notepad++ and saw that it was mostly NULL characters. What is that all about?
Derby needs to keep track of your database data, the redo logs, and transactions, so that your database stays in a consistent state and can recover even from PC crashes.
It also creates most files with a fixed size (like 1 MB) so that it doesn't need to grow them later (which avoids performance issues and keeps its files from fragmenting too much).
Over its runtime, or when stopping, Derby will clean up some of these files or regroup them and free space.
So overall, the space and the files are the trade-off you get for using a database.
Maybe you can change some of this behaviour via Derby configuration properties (I did not find a suitable one in the docs).
When last checked in 2011, an empty Derby database took about 770 KB of disk space: http://apache-database.10148.n7.nabble.com/Database-size-larger-than-expected-td104630.html
The log1.dat file is your transaction log, and records database changes so that the database can be recovered if there is a crash (or if your program calls ROLLBACK).
Note that log1.dat is disk space, not memory.
If you'd like to learn about the basics of Derby's transaction log, start here: http://db.apache.org/derby/papers/recovery.html

Performance problem on Java DB Derby Blobs & Delete

I’ve been experiencing a performance problem with deleting blobs in derby, and was wondering if anyone could offer any advice.
This is primarily with 10.4.2.0 under windows and solaris, although I’ve also tested with the new 10.5.1.1 release candidate (as it has many lob changes), but this makes no significant difference.
The problem is that with a table containing many large blobs, deleting a single row can take a long time (often over a minute).
I’ve reproduced this with a small test that creates a table, inserts a few rows with blobs of differing sizes, then deletes them.
The table schema is simple, just:
create table blobtest( id integer generated BY DEFAULT as identity, b blob )
and I’ve then created 7 rows with the following blob sizes : 1024 bytes, 1Mb, 10Mb, 25Mb, 50Mb, 75Mb, 100Mb.
I’ve read the blobs back, to check they have been created properly and are the correct size.
They have then been deleted using the sql statement ( “delete from blobtest where id = X” ).
If I delete the rows in the order I created them, average timings to delete a single row are:
1024 bytes: 19.5 seconds
1Mb: 16 seconds
10Mb: 18 seconds
25Mb: 15 seconds
50Mb: 17 seconds
75Mb: 10 seconds
100Mb: 1.5 seconds
If I delete them in reverse order, the average timings to delete a single row are:
100Mb: 20 seconds
75Mb: 10 seconds
50Mb: 4 seconds
25Mb: 0.3 seconds
10Mb: 0.25 seconds
1Mb: 0.02 seconds
1024 bytes: 0.005 seconds
If I create seven small blobs, delete times are all instantaneous.
It thus appears that the delete time seems to be related to the overall size of the rows in the table more than the size of the blob being removed.
I’ve run the tests a few times, and the results seem reproducible.
So, does anyone have any explanation for the performance, and any suggestions on how to work around it or fix it? It does make using large blobs quite problematic in a production environment…
I have exactly the same issue you have.
I found that when I do a DELETE, Derby actually "reads through" the large segment file completely. I used Filemon.exe to observe how it runs.
My file size is 940 MB, and it takes 90 seconds to delete just a single row.
I believe that Derby stores the table data in a single file internally, and somehow a design/implementation bug causes it to read everything rather than use a proper index.
I do batch deletes to work around this problem.
I rewrote part of my program. It used "where id=?" in auto-commit mode.
I then rewrote it to use "where ID IN(?,.......?)" enclosed in a transaction.
The total time dropped to about 1/1000 of what it was before.
I suggest adding a "mark as deleted" column, with a scheduled job that performs the actual deletion in batches.
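A hedged sketch of that batched delete against the blobtest table from the question, with auto-commit off and the IDs grouped into one IN list per transaction:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;
import java.util.stream.Collectors;

public class BatchBlobDelete {
    static void deleteIds(Connection conn, List<Integer> ids) throws SQLException {
        // One placeholder per id: "?, ?, ?, ..."
        String placeholders = ids.stream().map(i -> "?").collect(Collectors.joining(", "));
        String sql = "DELETE FROM blobtest WHERE id IN (" + placeholders + ")";
        conn.setAutoCommit(false); // one transaction for the whole batch
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            for (int i = 0; i < ids.size(); i++) {
                ps.setInt(i + 1, ids.get(i));
            }
            ps.executeUpdate();
            conn.commit();
        } catch (SQLException e) {
            conn.rollback();
            throw e;
        }
    }
}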
As far as I can tell, Derby will only store BLOBs inline with the other database data, so you end up with the BLOB split up over a ton of separate DB page files. This BLOB storage mechanism is good for ACID, and good for smaller BLOBs (say, image thumbnails), but breaks down with larger objects. According to the Derby docs, turning autocommit off when manipulating BLOBs may also improve performance, but this will only go so far.
I strongly suggest you migrate to H2 or another DBMS if good performance on large BLOBs is important, and the BLOBs must stay within the DB. You can use the SQuirrel SQL client and its DBCopy plugin to directly migrate between DBMSes (you just need to point it to the Derby/JavaDB JDBC driver and the H2 driver). I'd be glad to help with this part, since I just did it myself, and haven't been happier.
Failing this, you can move the BLOBs out of the database and into the filesystem. To do this, you would replace the BLOB column in the database with a BLOB size (if desired) and location (a URI or platform-dependent file string). When creating a new blob, you create a corresponding file in the filesystem. The location could be based off of a given directory, with the primary key appended. For example, your DB is in "DBFolder/DBName" and your blobs go in "DBFolder/DBName/Blob" and have filename "BLOB_PRIMARYKEY.bin" or somesuch. To edit or read the BLOBs, you query the DB for the location, and then do read/write to the file directly. Then you log the new file size to the DB if it changed.
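A compact sketch of that layout, assuming a hypothetical blobmeta(id, blob_size, location) table that replaces the BLOB column; the blob bytes themselves live in files next to the database, named by primary key:

import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class FileBackedBlobs {
    // Blobs live next to the database folder, named by primary key
    static final Path BLOB_DIR = Paths.get("DBFolder", "DBName", "Blob");

    // Store the blob bytes on disk and record only size + location in the DB.
    static void saveBlob(Connection conn, int primaryKey, byte[] data) throws Exception {
        Files.createDirectories(BLOB_DIR);
        Path file = BLOB_DIR.resolve("BLOB_" + primaryKey + ".bin");
        Files.write(file, data);
        try (PreparedStatement ps = conn.prepareStatement(
                "INSERT INTO blobmeta (id, blob_size, location) VALUES (?, ?, ?)")) {
            ps.setInt(1, primaryKey);
            ps.setLong(2, data.length);
            ps.setString(3, file.toString());
            ps.executeUpdate();
        }
    }

    // Look up the location first, then read the file directly.
    static byte[] loadBlob(Connection conn, int primaryKey) throws Exception {
        try (PreparedStatement ps = conn.prepareStatement(
                "SELECT location FROM blobmeta WHERE id = ?")) {
            ps.setInt(1, primaryKey);
            try (ResultSet rs = ps.executeQuery()) {
                rs.next();
                return Files.readAllBytes(Paths.get(rs.getString(1)));
            }
        }
    }
}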
I'm sure this isn't the answer you want, but for a production environment with throughput requirements I wouldn't use Java DB. MySQL is just as free and will handle your requirements a lot better. I think you are really just beating your head against a limitation of the solution you've chosen.
I generally only use Derby as a test case, and especially only when my entire DB can fit easily into memory. YMMV.
Have you tried increasing the page size of your database?
There's information about this and more in the Tuning Java DB manual which you may find useful.
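A small sketch of setting the page size via the derby.storage.pageSize database property before the BLOB table is created; the value shown is only an example, and the tuning guide linked above covers the details:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class DerbyPageSize {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:derby:blobdb;create=true");
             Statement st = conn.createStatement()) {
            // Applies to tables and indexes created after the property is set
            st.execute("CALL SYSCS_UTIL.SYSCS_SET_DATABASE_PROPERTY("
                     + "'derby.storage.pageSize', '32768')");
            st.execute("create table blobtest("
                     + " id integer generated BY DEFAULT as identity, b blob )");
        }
    }
}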
