I have a main database and, coupled to it, a temporary auxiliary database in a BerkeleyDB JE environment. The problem is as follows:
I am using transactions, and atomicity must span both the main and the aux DB.
Therefore, as I understand the docs, I have to use a single environment (transactions cannot span several environments).
So the two databases share one directory. But do they also share the same files?
I ask because the aux DB can grow very big, and I want to wipe it at the start of the application.
I have used e.truncateDatabase(txn, name, false) to do so.
But the database directory never seems to shrink: if the aux DB uses, say, 500 MB in every application run, then after four runs the directory is already 2 GB, regardless of the truncation. I also cannot see distinct files for the main and aux DBs.
How can I really wipe the aux database so that the disk space is freed? This is also a performance problem, because with those multi-GB directories BDB has serious trouble starting up and shutting down. Can I force BDB to use separate files, so that I can just delete a particular file?
Somehow this single environment seems to be at the root of the problem. For example, I would love to increase performance by giving the aux DB setTxnNoSync(), but then this will also affect the main DB.
If I use setTemporary on the aux DB, I get a runtime exception; apparently it is disallowed to combine transactions with a temporary database!?
java.lang.IllegalArgumentException: Attempted to open Database aux and two ore more of the following exclusive properties are true: deferredWrite, temporary, transactional
I improved the situation a bit with the following setting:
envCfg.setConfigParam(EnvironmentConfig.CLEANER_MIN_FILE_UTILIZATION, "33")
and using removeDatabase instead of truncateDatabase upon application start. At least I don't seem to get infinite growth any longer.
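Roughly, my startup code now looks like this (just a sketch; the environment home is a placeholder and the error handling is stripped down):

    import java.io.File;
    import com.sleepycat.je.DatabaseException;
    import com.sleepycat.je.Environment;
    import com.sleepycat.je.EnvironmentConfig;
    import com.sleepycat.je.Transaction;

    public class Startup {
        static Environment openAndWipeAux(File envHome) throws DatabaseException {
            EnvironmentConfig envCfg = new EnvironmentConfig();
            envCfg.setAllowCreate(true);
            envCfg.setTransactional(true);
            // Make the cleaner reclaim log files more aggressively.
            envCfg.setConfigParam(EnvironmentConfig.CLEANER_MIN_FILE_UTILIZATION, "33");

            Environment e = new Environment(envHome, envCfg);

            // Drop the aux DB entirely instead of truncating it.
            Transaction txn = e.beginTransaction(null, null);
            if (e.getDatabaseNames().contains("aux")) {
                e.removeDatabase(txn, "aux");
            }
            txn.commit();
            return e;
        }
    }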
I'm still interested in hearing whether I can make BDB use dedicated log files for either database.
This is what I have been trying to achieve.
We are in the process of retiring a vendor tool called GO-Anywhere that reads data from a DB2 database: after firing a select query it creates a file, writes the data to it, zips it, and SFTPs it to a machine where our ETL tool can read it.
I have been able to achieve what GO-Anywhere does in almost the same time, in fact beating the tool by 5 minutes on a 6.5 GB file, by using JSch and jarring/un-jarring on the fly. This brings the time to read and write the file down from 32 minutes to 27 minutes.
But to meet the new SLA requirements we need to bring the time down further, to roughly half of what I have now, i.e. around 13 minutes.
To achieve that, I have been able to read the .MBR file directly and push it to the Linux machine in 13 minutes or less, but the format of this file is not clear text.
I would like to know how one can convert the .MBR file into plain text using Java or an AS400 command, without firing the SQL.
Any help appreciated.
You're under the mistaken impression that a "FILE" on the IBM i is like a file on Windows/Unix/Linux.
It's not.
Like every other object type in IBM i, it's an object with well defined interfaces.
In the particular case of a *FILE object, it's a database table. DB2 for i isn't an add-on DBMS installed on top of the OS; DB2 for i is simply the name they gave to the DBMS integrated into the OS. A user program can't simply open storage space directly like you can do with files on Windows/Unix/Linux. You have to go through the interfaces provided by the OS.
There are two interfaces available, Record Level Access (RLA) or SQL. Both can be used from a Java application. RLA is provided by the com.ibm.as400.access.AS400File class. SQL access is provided by the JDBC classes.
SQL is likely to provide the best performance, since you're dealing with a set of records instead of one at a time with RLA.
Take a look at the various performance-related JDBC properties available.
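For example, a rough jt400 sketch; the host, credentials, library and table are placeholders, and "block size" and "prefetch" are just two of the performance-related properties, so check the driver documentation before copying anything:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class Jt400Extract {
        public static void main(String[] args) throws Exception {
            // jt400 (JTOpen) JDBC driver. Host, credentials, library and table
            // are placeholders; verify the property names and values against
            // the jt400 documentation.
            Class.forName("com.ibm.as400.access.AS400JDBCDriver");
            String url = "jdbc:as400://MYSYSTEM;block size=512;prefetch=true";
            Connection con = DriverManager.getConnection(url, "USER", "PASSWORD");
            Statement stmt = con.createStatement();
            ResultSet rs = stmt.executeQuery("SELECT * FROM MYLIB.MYTABLE");
            while (rs.next()) {
                // stream each row to the output file here
            }
            rs.close();
            stmt.close();
            con.close();
        }
    }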
From a performance standpoint, it's unlikely that your single process will fully utilize the system, i.e. CPU usage won't be at 100%, nor will disk activity be upwards of 60-80%.
That being the case, your best bet is to break the process into multiple ones. You'll need some way to limit each process to a selected set of records. Possibly segregation by primary key. That will add some overhead unless the records are in primary key order. If the table doesn't have deleted records, using RRN() to segregate by physical order may work. But be warned, on older versions of the OS, the use of RRN() required a full table scan.
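As a sketch of the split-by-key idea (table, column and connection details are made up; the real partitioning depends on your keys):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;

    public class RangeExtractor implements Runnable {
        private final long lowKey;    // this worker's slice of the primary key range
        private final long highKey;

        RangeExtractor(long lowKey, long highKey) {
            this.lowKey = lowKey;
            this.highKey = highKey;
        }

        public void run() {
            try {
                // MYLIB.MYTABLE and ID are placeholders for your table and key column.
                Connection con = DriverManager.getConnection("jdbc:as400://MYSYSTEM", "USER", "PASSWORD");
                PreparedStatement ps = con.prepareStatement(
                        "SELECT * FROM MYLIB.MYTABLE WHERE ID BETWEEN ? AND ?");
                ps.setLong(1, lowKey);
                ps.setLong(2, highKey);
                ResultSet rs = ps.executeQuery();
                while (rs.next()) {
                    // write this slice to its own output file
                }
                rs.close();
                ps.close();
                con.close();
            } catch (Exception ex) {
                ex.printStackTrace();
            }
        }
    }

Start one thread (or one separate JVM) per slice and send the pieces in parallel; how well this scales depends on your keys and on the CPU/disk headroom on the IBM i side.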
My guess at what is happening: there are packed-decimal fields in the source table which aren't getting unpacked by your home-grown method of reading the table.
There are several possibilities.
Have the IBM i team create a view over the source table which has all of the numeric columns zoned decimal. Additionally, omit columns that the ETL doesn't need - it will reduce the I/O by not having to move those bytes around. Perform the extract over that. Note: there may be such a view already on the system.
Have the IBM i team build appropriate indexes. Often, SQL bottlenecks can be alleviated with proper indexes.
Don't ZIP and UNZIP; send the raw file to the other system (a rough JSch sketch follows this list). Even at 6 GB, gigabit Ethernet can easily deal with that.
Load an ODBC driver on the ETL system and have it directly read the source table (or the appropriate view) rather than send a copy to the ETL system.
Where did the SLA time limit come from? If the SLA said 'subsecond response time' what would you do? At some point, the SLA needs to reflect some version of reality as defined by the laws of physics. I'm not saying that you've reached that limit: I'm saying that you need to find the rationale for it.
Have the IBM i team make sure they are current on patches (PTFs). IBM often addresses performance issues via PTFs.
Have the IBM i team make sure that the subsystem where your jobs are running has enough memory.
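On the ZIP point above: skipping the compression round trip can be as simple as a plain SFTP put. A rough sketch with JSch, since you're already using it (host, credentials and paths are placeholders):

    import com.jcraft.jsch.ChannelSftp;
    import com.jcraft.jsch.JSch;
    import com.jcraft.jsch.Session;

    public class PlainSftpPush {
        public static void main(String[] args) throws Exception {
            // Host, user, password and paths are placeholders.
            JSch jsch = new JSch();
            Session session = jsch.getSession("user", "etl-host", 22);
            session.setPassword("password");
            session.setConfig("StrictHostKeyChecking", "no"); // demo only
            session.connect();

            ChannelSftp sftp = (ChannelSftp) session.openChannel("sftp");
            sftp.connect();
            // Send the extracted file as-is, no zip/unzip round trip.
            sftp.put("/local/extract.txt", "/remote/extract.txt");
            sftp.disconnect();
            session.disconnect();
        }
    }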
I am currently writing a Java application that receives data from various sensors. How often this happens varies, but I believe that my application will receive signals about 100k times per day. I would like to log the data received from a sensor every time the application receives a signal. Because the application does much more than just log sensor data, performance is an issue. I am looking for the best and fastest way to log the data. Thus, I might not use a database, but rather write to a file and keep 1 file per day.
So what is faster, using a database or logging to files? No doubt there are also a lot of options for which logging software to use. Which is best for my purpose, if logging to a file is the best option?
The data stored might be used later for analytical purposes, so please keep this in mind as well.
First of all, I would recommend that you use log4j (or any other logging framework).
You can use a JDBC appender that writes into the DB, or any kind of file appender that writes into a file. The point is that your code will be generic enough to be changed later if you like...
In general, files are much faster than DB access, but there is room for optimization here.
If performance is critical, you can use batching/asynchronous calls to the logging infrastructure.
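For instance, a rough log4j 1.x sketch with an asynchronous file appender (the logger name, file name and pattern are just examples):

    import org.apache.log4j.AsyncAppender;
    import org.apache.log4j.FileAppender;
    import org.apache.log4j.Logger;
    import org.apache.log4j.PatternLayout;

    public class SensorLogSetup {
        public static Logger createSensorLogger() throws Exception {
            // Layout, file name and logger name are just examples.
            FileAppender file = new FileAppender(
                    new PatternLayout("%d{ISO8601} %m%n"), "sensor.log", true);

            // AsyncAppender buffers events so the sensor thread doesn't block on I/O.
            AsyncAppender async = new AsyncAppender();
            async.addAppender(file);

            Logger logger = Logger.getLogger("sensors");
            logger.addAppender(async);
            logger.setAdditivity(false);
            return logger;
        }
    }

Swapping the FileAppender for org.apache.log4j.jdbc.JDBCAppender (or a DailyRollingFileAppender if you want one file per day) leaves the rest of the code untouched, which is the point about staying generic.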
A free database on a cheap PC should be able to record 10 records per second easily.
A tuned database on a good system or a logger on a cheap PC should be able to write 100 records/lines per second easily.
A tuned logger should be able to write 1000 lines per second easily.
A fast binary logger can perform 1 million records per second easily (depending on the size of the record)
Your requirement is about 1.2 records per second on average (100k per day), which you should be able to achieve any way you like. I assume you want to be able to query your data, so you want it in a database eventually; I would put it there from the start.
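If you do put it in a database, batching the inserts keeps the per-record overhead low. A minimal JDBC sketch, with made-up table and column names:

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.Timestamp;
    import java.util.List;

    public class SensorWriter {
        // Table and column names are made up for the example.
        private static final String SQL =
                "INSERT INTO sensor_log (sensor_id, reading, received_at) VALUES (?, ?, ?)";

        public static void writeBatch(Connection con, List<SensorReading> readings) throws Exception {
            con.setAutoCommit(false);
            PreparedStatement ps = con.prepareStatement(SQL);
            for (SensorReading r : readings) {
                ps.setString(1, r.sensorId);
                ps.setDouble(2, r.value);
                ps.setTimestamp(3, r.receivedAt);
                ps.addBatch();
            }
            ps.executeBatch();   // one round trip for the whole batch
            ps.close();
            con.commit();
        }

        public static class SensorReading {
            String sensorId;
            double value;
            Timestamp receivedAt;
        }
    }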
Ah, the world of embedded systems. I had a similar problem when working with a hovercraft. I solved it with a separate computer (you can do this with a separate program) on the local area network that would just sit and listen as a server for the logs I sent to it. The file-writer program was written in C++. This solves two of your problems. First is the obvious performance gain while writing the logs. Second, the Java program is freed of writing any logs at all (it only acts as a proxy) and can concentrate on performance-critical tasks. Using a DB for this is going to be overkill, except perhaps SQLite.
Good luck!
We use BDB JE in one of our applications, and DbDump for backing up the database. Something interesting happened one day: DbDump started to throw an OutOfMemoryError. Postmortem analysis showed that a lot of memory was used by internal BDB nodes (INs). It seems like BerkeleyDB reads the whole dataset into memory while backing it up, which is quite strange to me.
Another strange fact is that this behavior is only visible when the environment is opened by the application itself. When DbDump is the only client that opens the environment, everything seems to be fine.
Have you considered using DbBackup instead? I know they do two different things, but if all you're looking to do is backup a database, there's no need to pull it all into memory when simply copying the files elsewhere will do. Or is the command line ability the deciding factor here?
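For reference, a rough sketch of what DbBackup usage looks like (the actual file copying is up to you):

    import com.sleepycat.je.Environment;
    import com.sleepycat.je.util.DbBackup;

    public class HotBackup {
        public static void backup(Environment env) throws Exception {
            DbBackup backup = new DbBackup(env);
            backup.startBackup();            // freezes the set of log files to copy
            try {
                String[] files = backup.getLogFilesInBackupSet();
                for (String name : files) {
                    // copy <envHome>/<name> to the backup location here
                }
            } finally {
                backup.endBackup();          // lets the cleaner delete old files again
            }
        }
    }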
Doing profiling on a Java application running on WebSphere 7 and DB2, we can see that we spend most of our time in the com.ibm.ws.rsadapter.jdbc package, handling connections to and from the database.
How can we tune our jdbc performance?
What other strategies exist when database performance is a bottleneck?
Thanks
You should check your WebSphere manual for how to configure a connection pool.
Update 2021
Here is an introduction including code samples.
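Not from that introduction, just the usual pattern: define the pooled DataSource in the WebSphere admin console and look it up via JNDI. A minimal sketch, with a placeholder JNDI name:

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import javax.naming.InitialContext;
    import javax.sql.DataSource;

    public class PooledQuery {
        public void query() throws Exception {
            // "jdbc/MyDB2DataSource" is a placeholder for the JNDI name you
            // configured in the WebSphere admin console.
            InitialContext ctx = new InitialContext();
            DataSource ds = (DataSource) ctx.lookup("jdbc/MyDB2DataSource");

            // getConnection() borrows a connection from the WebSphere pool;
            // close() returns it to the pool instead of physically closing it.
            Connection con = ds.getConnection();
            try {
                PreparedStatement ps = con.prepareStatement("SELECT 1 FROM SYSIBM.SYSDUMMY1");
                ResultSet rs = ps.executeQuery();
                rs.next();
                rs.close();
                ps.close();
            } finally {
                con.close();
            }
        }
    }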
One cause of slow connect times is a deactivated database, which does not open its files and allocate its memory buffers and heaps until the first application attempts to connect to it. Ask your DBA to confirm that the database is active before running your tests. The LIST ACTIVE DATABASES command (run from the local DB2 server or over a remote attachment) should show your database in its output. If the database is not activated, have your DBA activate it explicitly with ACTIVATE DATABASE yourDBname. That will ensure that the database files and memory structures remain available even when the last user disconnects from the database.
Use GET MONITOR SWITCHES to ensure all your monitor switches are enabled for your database, otherwise you'll miss out on some potentially revealing performance details. The additional overhead of tracking the data associated with those monitor switches is minimal, while the value of the performance data is significant.
If the database is always active and things still seem slow, there are detailed DB2 traces called event monitors that log everything they encounter to a file, pipe, or DB2 table. The statement event monitor is one I turn to fairly often to analyze SQL statement efficiency and UOW hygiene. I also prefer taking the extra hit to log the event monitor records to a table rather than a file, so I can use SQL to search the data for all sorts of patterns. The db2evtbl utility makes it fairly easy to define the event monitor you want and create the tables to store its output. The SET EVENT MONITOR STATE command is how you start and stop the event monitor you've created.
In my experience what you are seeing is pretty common. The question to ask is what exactly is the DB2 connection doing...
The first thing to do is to try to isolate the performance issue to a section of the website, i.e. is there one part of the application that sees poor performance? Once you find that, you can increase the trace logging to see whether you can identify the query causing the issues.
Additionally, if you chat with your DBAs, they may be able to run some analysis on the database to tell you which queries are taking the time to return values; this may also help in your troubleshooting.
Good luck!
Connection pooling
Caching
DBAs
I have this LAMP application with about 900k rows in MySQL and I am having some performance issues.
Background - Apart from the LAMP stack, there's also a Java process (multi-threaded) that runs in its own JVM. Together with LAMP and Java, they form the complete solution. The Java process is responsible for inserts/updates and a few selects as well. These inserts/updates are usually in bulk/batch, anywhere between 5 and 150 rows. The PHP front-end code only does SELECTs.
Issue - the PHP SELECT queries become very slow when the Java process is running. When the Java process is stopped, SELECTs perform all right; the performance difference is huge. When the Java process is running, any action performed on the PHP front end results in 80% or more CPU usage for the mysqld process.
Any help would be appreciated.
MySQL is running with default parameters & settings.
Software stack -
Apache - 2.2.x
MySQL -5.1.37-1ubuntu5
PHP - 5.2.10
Java - 1.6.0_15
OS - Ubuntu 9.10 (karmic)
What engine are you using for MySQL? The thing to note here is if you're using MyISAM, then you're going to have locking issues due to the table locking that engine uses.
From: MySQL Table Locking
Table locking is also disadvantageous under the following scenario:
* A session issues a SELECT that takes a long time to run.
* Another session then issues an UPDATE on the same table. This session waits until the SELECT is finished.
* Another session issues another SELECT statement on the same table. Because UPDATE has higher priority than SELECT, this SELECT waits for the UPDATE to finish, after waiting for the first SELECT to finish.
I won't repeat them here, but the page has some tips on increasing concurrency on a table within MySQL. Obviously, one option would be to change to an engine like InnoDB, which has a more granular row-locking mechanism that can make a huge difference in performance for high-concurrency tables. For more info on InnoDB, go here.
Before changing the engine, though, it is probably worth looking at the other tips, like making sure your tables are indexed properly, as this will improve select and update performance regardless of the storage engine.
Edit based on user comment:
I would say it's one possible solution based on the symptoms you've described, but it may not be the one that will get you where you want to be. It's impossible to say without more information.
You could be doing full table scans due to the lack of indexes. This could be causing I/O contention on your disk, which further exacerbates the table locks used by MyISAM. If this is the case, then the root cause is the improper indexing, and rectifying that would be your best course of action before changing storage engines.
Also, make sure your tables are normalized. This can have profound implications for performance, especially on updates. Normalized tables can allow you to update a single row instead of hundreds or thousands in an un-normalized table, because values are not duplicated. It can also save huge amounts of I/O on selects, as the DB can more efficiently cache data blocks. Without knowing the structure of the tables you're working with, or the indexes you have present, it's difficult to provide a more detailed response.
Edit after user attempted using InnoDB:
You mentioned that your Java process is multi-threaded. Have you tried running the process with a single thread? I'm wondering whether you're possibly sending the same rows out to multiple threads for update, and/or whether the way you're updating across threads is causing locking issues.
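One cheap way to rule that out, purely as a sketch (it assumes each row has a non-negative numeric primary key): give each thread a disjoint slice of the keys so no two threads ever touch the same row.

    import java.util.ArrayList;
    import java.util.List;

    public class UpdatePartitioner {
        // "Row" is a placeholder for whatever your update unit looks like;
        // it just needs a non-negative numeric primary key.
        public static class Row {
            long primaryKey;
            // ... columns to update
        }

        // Split the pending updates so that each worker thread owns a disjoint
        // set of rows: the same key always lands in the same slice, so two
        // threads can never update the same row concurrently.
        public static List<List<Row>> partition(List<Row> pending, int workers) {
            List<List<Row>> slices = new ArrayList<List<Row>>();
            for (int i = 0; i < workers; i++) {
                slices.add(new ArrayList<Row>());
            }
            for (Row row : pending) {
                int slice = (int) (row.primaryKey % workers);
                slices.get(slice).add(row);
            }
            return slices;
        }
    }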
Outside of that, I would check the following:
Have you checked your explain plans to verify you have reasonable costs and that the query is actually using the indexes you have?
Are your tables normalized? More specifically, are you updating 100 rows when you could update a single record if the tables were normalized?
Is it possible that you're running out of physical memory when the Java process is running and the machine is busy swapping stuff in and out?
Are you flooding your disk (a single disk?) with more IOPs than it can reasonably handle?
We'd need to know a lot more about the system to say whether that's normal or how to solve the problem.
with about 900k rows in MySQL
I would say that makes it very small - so if it's performing badly then you're going seriously wrong somewhere.
Enable the query log to see exactly what queries are running, prioritize based on the product of frequency and duration. Have a look at the explain plans, create some indexes. Think about splitting the database across multiple disks.
HTH
C.