Dynamically create tables for each year (hibernate+java+postgresql) - java

There is a Java web application that uses Hibernate to interact with the database. Recently the web app has become slow for some of its operations, and it keeps getting worse; after debugging I found that it is a database issue. One of the tables, responsible for a specific kind of data, has grown to about 3 million rows and is still growing, and it now takes about 5-10 seconds to add one record.
I was thinking about creating a separate table for each year, so that the data for 2018, for example, would be stored only in table_xxx_2018. One solution would be to create all the tables and classes manually beforehand and map them in .hbm.xml files, but I don't know whether that is a good solution, and it doesn't seem sustainable. So I would like to know whether it is possible to create tables dynamically with Hibernate and map them to the corresponding Java classes.
I also googled some tweaks and improvements for PostgreSQL, but they didn't help. The main problem is the increased flow of data coming into the app, and it seems that Postgres started choking at 3 million+ rows per table.
These are the server specs, only Postgres is running on this server and nothing else:
CPU Intel Xeon E5-2630 @ 2.30GHz
RAM 32 GB and SSD drives in RAID
Any suggestions are welcome
EDIT1:
An example of the piece of code
@Override
public void saveOrUpdateCitizen(Citizen citizen) {
    Session session = sessionFactory.getCurrentSession();
    session.saveOrUpdate(citizen);
}
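One way to avoid hand-maintaining a class and a mapping per year is to let PostgreSQL do the splitting. With declarative partitioning (PostgreSQL 10 and later), Hibernate can keep mapping a single parent table while the database routes rows into yearly partitions. The following is only a sketch of generating the DDL for one year. It assumes a parent table named table_xxx that was created with PARTITION BY RANGE on some date column; neither detail is confirmed by the application described above.

```java
import java.time.LocalDate;

public class YearlyPartitionDdl {

    /**
     * Builds the DDL for one yearly partition of a range-partitioned parent table.
     * Assumption: the parent was created with PARTITION BY RANGE (some date column).
     */
    public static String createPartitionSql(String parentTable, int year) {
        // Table names cannot be bound as statement parameters, so validate before concatenating.
        if (!parentTable.matches("[a-z_][a-z0-9_]*")) {
            throw new IllegalArgumentException("unexpected table name: " + parentTable);
        }
        LocalDate from = LocalDate.of(year, 1, 1);
        LocalDate to = from.plusYears(1); // the upper bound of a range partition is exclusive
        return "CREATE TABLE IF NOT EXISTS " + parentTable + "_" + year
                + " PARTITION OF " + parentTable
                + " FOR VALUES FROM ('" + from + "') TO ('" + to + "')";
    }

    public static void main(String[] args) {
        // table_xxx is the placeholder name used in the question.
        System.out.println(createPartitionSql("table_xxx", 2018));
    }
}
```

The statement would be executed once per year, for example from a migration script or a scheduled job, so no new classes or .hbm.xml mappings are needed. Whether partitioning actually helps depends on why the insert is slow, which the information given does not settle.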

Related

Performance of the Application with a database

I have a problem, as described below. I developed an application in Java with NetBeans that uses a MySQL database, and there are 2 cases:
1. When the Java application uses the database on my own machine, everything runs smoothly:
it does not slow down when showing data from the database, for example when showing a list of products in a table.
2. When the Java application uses an external database, such as one hosted on Hostinger (my case), it is slow. For example, opening a table with a list of products takes almost 2 minutes to display, which should not be the case.
Because of this I have used connection pools with both the ComboPooledDataSource and BasicDataSource classes, but the result is the same; perhaps I am missing some setting. I cannot find the solution to this problem. Two things come to mind: connections or network latency. I would really appreciate your help with this.
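If network latency is the cause, the number of round trips to the database matters more than the pool itself. The rough estimate below uses made-up numbers: the row count and both latencies are assumptions, not measurements from this application. It only illustrates how issuing one query per row can add up to minutes against a remote host while going unnoticed locally.

```java
public class RoundTripEstimate {

    /** Total wait in milliseconds when each of the given statements pays one network round trip. */
    public static long totalMillis(int queries, long roundTripMillis) {
        return (long) queries * roundTripMillis;
    }

    public static void main(String[] args) {
        int rows = 800;        // hypothetical number of products in the list
        long localMs = 1;      // hypothetical round trip to a database on the same machine
        long remoteMs = 150;   // hypothetical round trip to a remote hosted database

        // One query for the list plus one extra query per row (the "N+1" pattern).
        System.out.println("local,  N+1 queries: " + totalMillis(rows + 1, localMs) + " ms");
        System.out.println("remote, N+1 queries: " + totalMillis(rows + 1, remoteMs) + " ms");
        // The same data fetched with a single query pays the round trip once.
        System.out.println("remote, 1 query:     " + totalMillis(1, remoteMs) + " ms");
    }
}
```

Logging how many statements run while the product table opens would show whether this pattern applies; a pool only saves the cost of opening connections, not the cost of each round trip.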

Reading an AS400 .MBR file to a flat file using Java

This is what I have been trying to achieve.
We are in the process of retiring a vendor tool called GO-Anywhere that reads data from a DB2 database: after firing a select query it creates a file, writes the data to it, zips it, and SFTPs it to a machine where our ETL tool can read it.
I have been able to do what GA does in almost the same time, in fact beating the tool by 5 minutes on a 6.5GB file, by using JSCH and jarring/un-jarring on the fly. This brings the time to read and write the file down from 32 minutes to 27 minutes.
But to meet the new SLA requirements we need to bring the time down further, to almost half of what I have now, that is, around 13 minutes.
To achieve this, I have been able to read the .MBR file directly and push it to the Linux machine in 13 minutes or less, but the format of this file is not clear text.
I would like to know how to convert the .MBR file into plain text format using Java or an AS400 command, without firing the SQL.
Any help appreciated.
You're under the mistaken impression that a "FILE" on the IBM i is like a file on Windows/Unix/Linux.
It's not.
Like every other object type in IBM i, it's an object with well defined interfaces.
In the particular case of a *FILE object, it's a database table. DB2 for i isn't an add-on DBMS installed on top of the OS; DB2 for i is simply the name they gave to the DBMS integrated into the OS. A user program can't simply open storage space directly like you can do with files on Windows/Unix/Linux. You have to go through the interfaces provided by the OS.
There are two interfaces available, Record Level Access (RLA) or SQL. Both can be used from a Java application. RLA is provided by the com.ibm.as400.access.AS400File class. SQL access is provided by the JDBC classes.
SQL is likely to provide the best performance, since you're dealing with a set of records instead of one at a time as with RLA.
Take a look at the various performance-related JDBC properties available.
From a performance standpoint, it's unlikely that your single process would fully utilize the system, i.e. CPU usage won't be at 100%, nor will disk activity be upwards of 60-80%.
That being the case, your best bet is to break the process into multiple ones. You'll need some way to limit each process to a selected set of records. Possibly segregation by primary key. That will add some overhead unless the records are in primary key order. If the table doesn't have deleted records, using RRN() to segregate by physical order may work. But be warned, on older versions of the OS, the use of RRN() required a full table scan.
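Splitting the work by key (or by relative record number) comes down to handing each process a contiguous range. A minimal sketch of that split follows; the class and method names are hypothetical, and it assumes the minimum and maximum values are already known, for example from a MIN/MAX query.

```java
import java.util.ArrayList;
import java.util.List;

public class KeyRangeSplitter {

    /**
     * Splits the inclusive range [min, max] into at most `workers` contiguous,
     * non-overlapping ranges of nearly equal size. Each element is {from, to}.
     */
    public static List<long[]> split(long min, long max, int workers) {
        if (workers < 1 || max < min) {
            throw new IllegalArgumentException("need workers >= 1 and max >= min");
        }
        long total = max - min + 1;
        long base = total / workers;   // size every range gets
        long extra = total % workers;  // the first `extra` ranges get one more
        List<long[]> ranges = new ArrayList<>();
        long start = min;
        for (int i = 0; i < workers; i++) {
            long size = base + (i < extra ? 1 : 0);
            if (size == 0) {
                break; // more workers than values: stop once everything is assigned
            }
            ranges.add(new long[] {start, start + size - 1});
            start += size;
        }
        return ranges;
    }

    public static void main(String[] args) {
        for (long[] r : split(1, 10, 3)) {
            System.out.println(r[0] + " .. " + r[1]);
        }
    }
}
```

Each process would then run its own query restricted to one range, binding the two bounds as parameters of a BETWEEN predicate on the key column or on RRN().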
My guess at what is happening is that there are packed decimal fields in the source table which aren't getting unpacked by your home-grown method of reading the table.
There are several possibilities.
Have the IBM i team create a view over the source table which has all of the numeric columns zoned decimal. Additionally, omit columns that the ETL doesn't need - it will reduce the I/O by not having to move those bytes around. Perform the extract over that. Note: there may be such a view already on the system.
Have the IBM i team build appropriate indexes. Often, SQL bottlenecks can be alleviated with proper indexes.
Don't ZIP and UNZIP; send the raw file to the other system. Even at 6GB, gigabit Ethernet can easily deal with that.
Load an ODBC driver on the ETL system and have it directly read the source table (or the appropriate view) rather than send a copy to the ETL system.
Where did the SLA time limit come from? If the SLA said 'subsecond response time' what would you do? At some point, the SLA needs to reflect some version of reality as defined by the laws of physics. I'm not saying that you've reached that limit: I'm saying that you need to find the rationale for it.
Have the IBM i team make sure they are current on patches (PTFs). IBM often addresses performance issues via PTF.
Have the IBM i team make sure that the subsystem where your jobs are running has enough memory.
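To illustrate why the raw bytes are not clear text, here is a sketch of decoding a single packed decimal field. Packed decimal stores two digits per byte, with the final half-byte holding the sign. The field's byte length and scale come from the table definition, so the values used in main are purely hypothetical. If the raw bytes must be decoded, a maintained converter such as the IBM Toolbox for Java's AS400PackedDecimal class would normally be preferable to hand-rolled code.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

public class PackedDecimal {

    /** Decodes one packed decimal field; `scale` is the number of decimal places. */
    public static BigDecimal unpack(byte[] packed, int scale) {
        if (packed.length == 0) {
            throw new IllegalArgumentException("empty field");
        }
        StringBuilder digits = new StringBuilder();
        for (int i = 0; i < packed.length; i++) {
            int high = (packed[i] >> 4) & 0x0F;
            int low = packed[i] & 0x0F;
            boolean last = (i == packed.length - 1);
            if (high > 9 || (!last && low > 9)) {
                throw new IllegalArgumentException("not a packed decimal digit at byte " + i);
            }
            digits.append(high);
            if (!last) {
                digits.append(low); // in the last byte the low half is the sign, not a digit
            }
        }
        int sign = packed[packed.length - 1] & 0x0F;
        BigInteger unscaled = new BigInteger(digits.toString());
        // 0xD and 0xB mark negative values; the other sign codes are treated as positive.
        if (sign == 0x0D || sign == 0x0B) {
            unscaled = unscaled.negate();
        }
        return new BigDecimal(unscaled, scale);
    }

    public static void main(String[] args) {
        // Hypothetical 3-byte field with 2 decimal places: digits 1 2 3 4 5, sign C (positive).
        System.out.println(unpack(new byte[] {0x12, 0x34, 0x5C}, 2));
    }
}
```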

Hibernate deadlocking with 20 simulated users

We have a basic Java EE app that runs under tomcat and maintains a connection pool to a SQL server database. We were having some data issues showing up only in production, so I created a testing tool that would simulate different numbers of users going through the system on different paths.
I've worked on this a bit and so the problem's evolved as I chased it. Now the problem is this.
Ten user threads works perfectly. Twenty user threads and the log record that gets created when the user logs into the system never gets inserted for any of the 20 users. In fact, Hibernate 3.3 goes through the motions of inserting the record, but when I use the show_sql setting, the insert statement never shows up in the dump. Again this works perfectly with 10 users. And more puzzling, every once in a while it will work for one of the 20 users. :(
I'm using the JTDS driver, btw, to avoid the problems we kept finding with the MS one.
I am running SQL Server Express 2008 R2 on my local box with tomcat and running my test app in my eclipse IDE. Has anyone seen anything like this? Any ideas as to why hibernate might be locking after 10 users?
I believe the problem is that you cannot open as many sessions as you need (because they are pooled).
How do you open the session?
What size does your connection pool have?
Do you always close the sessions?
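Why these questions matter can be shown with a small simulation. It uses a semaphore as a stand-in for the connection pool; this is not how Hibernate or any pool library is implemented, and the pool size of 10 is only an assumption chosen to match the symptom described in the question. When sessions are never closed, only the first 10 of 20 users ever get a connection.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class PoolExhaustionDemo {

    /**
     * Starts `users` threads against `poolSize` permits (stand-ins for pooled connections)
     * and returns how many threads obtained one within the timeout.
     */
    public static int run(int poolSize, int users, boolean closeSessions) {
        Semaphore pool = new Semaphore(poolSize);
        AtomicInteger served = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(users);
        for (int i = 0; i < users; i++) {
            new Thread(() -> {
                try {
                    if (pool.tryAcquire(500, TimeUnit.MILLISECONDS)) {
                        served.incrementAndGet(); // the login insert would happen here
                        if (closeSessions) {
                            pool.release();       // closing the session returns the connection
                        }
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    done.countDown();
                }
            }).start();
        }
        try {
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return served.get();
    }

    public static void main(String[] args) {
        System.out.println("sessions never closed: " + run(10, 20, false) + " of 20 served");
        System.out.println("sessions closed:       " + run(10, 20, true) + " of 20 served");
    }
}
```

In real code the equivalent of the release is closing the session in a finally block, or letting the transaction manager do it, so that an exception on one path cannot leak a connection.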

Downloading A Large SQLite Database From Server in Binary vs. Creating It On The Device

I have an application that requires the creation and download of a significantly large SQLite database. Depending on the user's data, creation of the db and the syncing of data from the server can take upwards of 20 to 25 minutes (some customers have a LOT of data). The data is downloaded as JSON and processed with Android's built in JSON classes.
To account for OutOfMemory issues I was having with some devices, I needed to limit the per-call download from the server to 500 records at a time. But, as of now, all of the above is working successfully - although slow.
Recently, there has been talk from my team of creating the complete SQLite db on the server side and then just downloading it to the device in binary in an effort to speed things up. I've never done this before. Is this indeed a viable option OR should I just be looking into speeding up the processing of the JSON through a 3rd party lib like GSON or Jackson.
Thanks in advance for your input.
From my experience with mobile devices, reinventing synchronization is overkill most of the time. It obviously depends on the hardware, software and amounts of data you're working with. But most of the time long operation execution times on mobile devices are caused by faulty design, careless coding or specifics of embedded systems not taken into consideration.
Unfortunately, I can only give you some hints to consider, given the pretty vague description of the issues you're facing. I mean "LOT" doesn't mean much to me - I've seen mobile apps with DBs containing millions of records running pretty smoothly and ones that had around 1K records running horribly slowly and causing the UI to freeze. You also didn't mention what OS version and device (or at least its capabilities) you're using, what the server configuration is, what software is installed, and what libraries/frameworks are used and in what modes. It all matters when you want to really speed things up.
Apart from the encoding being gzip (which I believe you left at the default, which is on), you should give these ideas a try:
Streaming! - make sure both the client and the server use a streaming version of the JSON API and use buffered streams. If either doesn't - replace it with a library that does. Jackson has one of the fastest streaming APIs. Sure, it's more cumbersome to write a (de)serializer, but it pays off. When done properly, neither side has to create a buffer large enough for (de)serialization of all the data, fill it with contents, and then parse/write it. Instead, a much smaller buffer is allocated and filled gradually as successive fields are serialized. When this buffer is full, its contents are immediately sent to the other end of the data channel, where they can be deserialized right away. The process continues until all data has been transmitted in small chunks. It makes the data interchange much smoother and less resource-intensive.
For large batch inserts or updates use prepared statements. It also sometimes helps to insert your data without constraints and then create them - that way, for example, an index can be computed in one run instead of for each insert. Don't use transactions (they require maintaining extra database logs) or commit every 300 rows to minimize the overhead. If you're updating an existing database and atomic modifications are necessary - load the new data into a temporary database and, if everything is ok, replace the old database with the new one on the fly.
Almost always some data can be precomputed and stored on an sd-card for example. Or it can be loaded directly to an sd-card as a prepared SQLite DB in the company. If a task requires data that is so large that an import takes more than 10 minutes, you probably shouldn't do that task on mobile devices in the first place.

MySQL performance

I have this LAMP application with about 900k rows in MySQL and I am having some performance issues.
Background - Apart from the LAMP stack, there's also a Java process (multi-threaded) that runs in its own JVM. So together, LAMP and Java form the complete solution. The Java process is responsible for inserts/updates and a few selects as well. These inserts/updates are usually done in bulk/batch, anywhere between 5-150 rows. The PHP front-end code only does SELECTs.
Issue - the PHP/SELECT queries become very slow when the Java process is running. When the Java process is stopped, SELECTs perform alright. I mean the performance difference is huge. When the Java process is running, any action performed on the PHP front-end results in 80% or more CPU usage for the mysqld process.
Any help would be appreciated.
MySQL is running with default parameters & settings.
Software stack -
Apache - 2.2.x
MySQL -5.1.37-1ubuntu5
PHP - 5.2.10
Java - 1.6.0_15
OS - Ubuntu 9.10 (karmic)
What engine are you using for MySQL? The thing to note here is if you're using MyISAM, then you're going to have locking issues due to the table locking that engine uses.
From: MySQL Table Locking
Table locking is also disadvantageous under the following scenario:
* A session issues a SELECT that takes a long time to run.
* Another session then issues an UPDATE on the same table. This session waits until the SELECT is finished.
* Another session issues another SELECT statement on the same table. Because UPDATE has higher priority than SELECT, this SELECT waits for the UPDATE to finish, after waiting for the first SELECT to finish.
I won't repeat them here, but the page has some tips on increasing concurrency on a table within MySQL. Obviously, one option would be to change to an engine like InnoDB which has a more complex row locking mechanism that for high concurrency tables can make a huge difference in performance. For more info on InnoDB go here.
Prior to changing the engine though it would probably be worth looking at the other tips like making sure your table is indexed properly, etc. as this will increase select and update performance regardless of the storage engine.
Edit based on user comment:
I would say it's one possible solution based on the symptoms you've described, but it may not be the one that will get you where you want to be. It's impossible to say without more information.
You could be doing full table scans due to the lack of indexes. This could be causing I/O contention on your disk, which just further exacerbates the table locks used by MyISAM. If this is the case, then the root cause is the improper indexing, and rectifying that would be your best course of action before changing storage engines.
Also, make sure your tables are normalized. This can have profound implications on performance, especially on updates. Normalized tables can allow you to update a single row instead of hundreds or thousands in an un-normalized table, because values are not duplicated. It can also save huge amounts of I/O on selects, as the db can cache data blocks more efficiently. Without knowing the structure of the tables you're working with or the indexes you have present, it's difficult to provide you with a more detailed response.
Edit after user attempted using InnoDB:
You mentioned that your Java process is multi-threaded. Have you tried running the process with a single thread? I'm wondering whether you might be sending the same rows out to multiple threads to update, and/or whether the way you're updating across threads is causing locking issues.
Outside of that, I would check the following:
Have you checked your explain plans to verify you have reasonable costs and that the query is actually using the indexes you have?
Are your tables normalized? More specifically, are you updating 100 rows when you could update a single record if the tables were normalized?
Is it possible that you're running out of physical memory when the Java process is running and the machine is busy swapping stuff in and out?
Are you flooding your disk (a single disk?) with more IOPS than it can reasonably handle?
We'd need to know a lot more about the system to say if that's normal or how to solve the problem.
with about 900k rows in MySQL
I would say that makes it very small - so if it's performing badly then you're going seriously wrong somewhere.
Enable the query log to see exactly what queries are running, prioritize based on the product of frequency and duration. Have a look at the explain plans, create some indexes. Think about splitting the database across multiple disks.
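Ranking by the product of frequency and duration can be sketched as follows. The QueryStat type and the sample numbers are hypothetical; in practice the counts and timings would be extracted from the query log.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class QueryPriority {

    /** Hypothetical summary of one distinct query taken from the log. */
    public static final class QueryStat {
        public final String sql;
        public final long executions;
        public final double avgMillis;

        public QueryStat(String sql, long executions, double avgMillis) {
            this.sql = sql;
            this.executions = executions;
            this.avgMillis = avgMillis;
        }

        /** Frequency times duration: the total time the server spends on this query. */
        public double totalMillis() {
            return executions * avgMillis;
        }
    }

    /** Returns a copy of the list ordered by total time, most expensive first. */
    public static List<QueryStat> byTotalTime(List<QueryStat> stats) {
        List<QueryStat> sorted = new ArrayList<>(stats);
        Comparator<QueryStat> byTotal = Comparator.comparingDouble(QueryStat::totalMillis);
        sorted.sort(byTotal.reversed());
        return sorted;
    }

    public static void main(String[] args) {
        List<QueryStat> stats = new ArrayList<>();
        stats.add(new QueryStat("rare but slow report", 3, 4000.0));
        stats.add(new QueryStat("frequent lookup", 50000, 2.0));
        stats.add(new QueryStat("occasional update", 200, 15.0));
        for (QueryStat s : byTotalTime(stats)) {
            System.out.println(s.sql + ": " + s.totalMillis() + " ms in total");
        }
    }
}
```

With these made-up numbers the frequent 2 ms lookup outranks the 4-second report, which is the point of multiplying rather than looking at duration alone.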
HTH
C.
