This is what I have been trying to achieve.
We are in the process of retiring a vendor tool called GO-Anywhere that reads data from a DB2 database by firing a SELECT query, creates a file, writes the data to it, zips it, and SFTPs it to a machine where our ETL tool can read it.
I have been able to achieve what GA does in almost the same time, in fact beating the above tool by 5 minutes on a 6.5 GB file, by using JSch and jarring/un-jarring on the fly. This brings the time to read and write the file down from 32 minutes to 27 minutes.
But to meet the new SLA requirements we need to bring the time down further, to roughly half of what I have now, i.e. around 13 minutes.
To achieve this, I have been able to read the .MBR file directly and push it onto the Linux machine in 13 minutes or less, but the file is not in clear text.
I would like to know how one can convert the .MBR file into plain-text format using Java or an AS400 command, without firing the SQL query.
Any help appreciated.
You're under the mistaken impression that a "FILE" on the IBM i is like a file on Windows/Unix/Linux.
It's not.
Like every other object type in IBM i, it's an object with well defined interfaces.
In the particular case of a *FILE object, it's a database table. DB2 for i isn't an add-on DBMS installed on top of the OS; DB2 for i is simply the name they gave to the DBMS integrated into the OS. A user program can't simply open the storage space directly the way you can with files on Windows/Unix/Linux. You have to go through the interfaces provided by the OS.
There are two interfaces available, Record Level Access (RLA) or SQL. Both can be used from a Java application. RLA is provided by the com.ibm.as400.access.AS400File class. SQL access is provided by the JDBC classes.
SQL is likely to provide the best performance, since you're dealing with a set of records instead of one record at a time as with RLA.
Take a look at the various performance-related JDBC properties available.
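For illustration, here is a minimal JDBC sketch using the IBM Toolbox for Java (jt400) driver. The host, library, and table names are placeholders, and the "block size" and "prefetch" connection properties are the commonly used performance-related ones; verify the exact property names and values against the driver documentation for your release.
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class Db2iExtract {
        public static void main(String[] args) throws Exception {
            Class.forName("com.ibm.as400.access.AS400JDBCDriver");
            // "block size" and "prefetch" are jt400 connection properties aimed at
            // reducing round trips; the values here are illustrative.
            String url = "jdbc:as400://MYIBMI;naming=sql;block size=512;prefetch=true";
            try (Connection con = DriverManager.getConnection(url, "user", "password");
                 Statement st = con.createStatement()) {
                st.setFetchSize(1000); // hint the driver to fetch rows in larger batches
                try (ResultSet rs = st.executeQuery("SELECT * FROM MYLIB.MYTABLE")) {
                    while (rs.next()) {
                        // stream each row straight to your output file here
                    }
                }
            }
        }
    }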
From a performance standpoint, it's unlikely that your single process would fully utilize the system, i.e. CPU usage won't be at 100%, nor will disk activity be upwards of 60-80%.
That being the case, your best bet is to break the process into multiple ones. You'll need some way to limit each process to a selected set of records. Possibly segregation by primary key. That will add some overhead unless the records are in primary key order. If the table doesn't have deleted records, using RRN() to segregate by physical order may work. But be warned, on older versions of the OS, the use of RRN() required a full table scan.
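To illustrate the multi-process idea, here is a rough sketch that partitions the extract by RRN() ranges and runs the partitions in parallel threads, each with its own connection and output file. The host, library/table name, row count, and delimiter are all placeholders, and it assumes the table has no deleted records so the RRN ranges map cleanly onto physical order.
    import java.io.BufferedWriter;
    import java.io.FileWriter;
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class ParallelExtract {
        static final String URL = "jdbc:as400://MYIBMI;naming=sql"; // placeholder host
        static final long TOTAL_ROWS = 40_000_000L;                 // placeholder row count
        static final int PARTITIONS = 4;

        public static void main(String[] args) throws Exception {
            ExecutorService pool = Executors.newFixedThreadPool(PARTITIONS);
            long chunk = TOTAL_ROWS / PARTITIONS + 1;
            for (int i = 0; i < PARTITIONS; i++) {
                final long from = i * chunk + 1;
                final long to = from + chunk - 1;
                final String outFile = "extract_part" + i + ".txt";
                pool.submit(() -> extractRange(from, to, outFile));
            }
            pool.shutdown();
        }

        static Void extractRange(long from, long to, String outFile) throws Exception {
            // Each worker gets its own connection and writes its own slice of the table.
            try (Connection con = DriverManager.getConnection(URL, "user", "password");
                 PreparedStatement ps = con.prepareStatement(
                         "SELECT * FROM MYLIB.MYTABLE T WHERE RRN(T) BETWEEN ? AND ?");
                 BufferedWriter out = new BufferedWriter(new FileWriter(outFile))) {
                ps.setLong(1, from);
                ps.setLong(2, to);
                try (ResultSet rs = ps.executeQuery()) {
                    int cols = rs.getMetaData().getColumnCount();
                    while (rs.next()) {
                        StringBuilder line = new StringBuilder();
                        for (int c = 1; c <= cols; c++) {
                            if (c > 1) line.append('|'); // placeholder delimiter
                            line.append(rs.getString(c));
                        }
                        out.write(line.toString());
                        out.newLine();
                    }
                }
            }
            return null;
        }
    }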
My guess at what is happening: there are packed-decimal fields in the source table that aren't getting unpacked by your home-grown method of reading the table.
There are several possibilities.
Have the IBM i team create a view over the source table that presents all of the numeric columns as zoned decimal. Additionally, omit columns that the ETL doesn't need - that reduces the I/O by not having to move those bytes around. Perform the extract over that view. Note: such a view may already exist on the system.
Have the IBM i team build appropriate indexes. Often, SQL bottlenecks can be alleviated with proper indexes.
Don't ZIP and UNZIP; send the raw file to the other system. Even at 6GB, gigabit Ethernet can easily deal with that.
Load an ODBC driver on the ETL system and have it directly read the source table (or the appropriate view) rather than send a copy to the ETL system.
Where did the SLA time limit come from? If the SLA said 'subsecond response time' what would you do? At some point, the SLA needs to reflect some version of reality as defined by the laws of physics. I'm not saying that you've reached that limit: I'm saying that you need to find the rationale for it.
Have the IBM i team make sure they are current on patches (PTFs). IBM often addresses performance issues via PTFs.
Have the IBM i team make sure that the subsystem where your jobs are running has enough memory.
I have a main database and a coupled (temporary, auxiliary) database in a BerkeleyDB JE environment. The problem is as follows:
I am using transactions, and atomicity must span both the main and the aux DB
therefore, as I understand the docs, I have to use a single environment (transactions cannot span several environments)
So the two databases share one directory. But do they also share the same files?
I ask because the aux DB can grow very big, and I want to wipe it at the start of the application.
I have used e.truncateDatabase(txn, name, false) to do so
But it appears the database directory never shrinks, so if in every application run the aux DB uses e.g. 500 MB, then after four runs the directory is already 2 GB, irrespective of the truncation. Also, I cannot see distinct files for the main and aux DBs.
How can I really wipe the aux database, so that the disk space is freed? This is also a performance problem, because with those several GB large directories, BDB has serious trouble starting up and winding down. Can I force BDB to use separate files, so I can just delete a particular file?
Somehow this single environment seems to be at the root of the problem. For example, I would love to increase performance by giving the aux DB setTxnNoSync(), but then this will also affect the main DB.
If I use setTemporary on the aux DB, I get a runtime exception; apparently it is disallowed to use transactions with a temporary database!?
java.lang.IllegalArgumentException: Attempted to open Database aux and two ore more of the following exclusive properties are true: deferredWrite, temporary, transactional
I improved the situation a bit with the following setting:
envCfg.setConfigParam(EnvironmentConfig.CLEANER_MIN_FILE_UTILIZATION, "33")
and using removeDatabase instead of truncateDatabase upon application start. At least I don't seem to get infinite growth any longer.
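For reference, a minimal sketch of that setup, using the environment parameter and the database name ("aux") from the question; how quickly the cleaner actually reclaims the log files after removeDatabase will depend on your JE version and cleaner settings.
    import java.io.File;
    import com.sleepycat.je.DatabaseNotFoundException;
    import com.sleepycat.je.Environment;
    import com.sleepycat.je.EnvironmentConfig;
    import com.sleepycat.je.Transaction;

    public class AuxDbWipe {
        public static Environment openAndWipe(File envHome) throws Exception {
            EnvironmentConfig envCfg = new EnvironmentConfig();
            envCfg.setAllowCreate(true);
            envCfg.setTransactional(true);
            // Make the cleaner reclaim log files sooner once the aux data is dropped.
            envCfg.setConfigParam(EnvironmentConfig.CLEANER_MIN_FILE_UTILIZATION, "33");
            Environment env = new Environment(envHome, envCfg);

            // Remove the aux database entirely at startup instead of truncating it.
            Transaction txn = env.beginTransaction(null, null);
            try {
                env.removeDatabase(txn, "aux");
                txn.commit();
            } catch (DatabaseNotFoundException e) {
                txn.abort(); // nothing to remove on a fresh environment
            }
            return env;
        }
    }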
I'm still interested in hearing whether I can make BDB use dedicated log files for either database.
I am currently writing a Java application that receives data from various sensors. How often this happens varies, but I believe that my application will receive signals about 100k times per day. I would like to log the data received from a sensor every time the application receives a signal. Because the application does much more than just log sensor data, performance is an issue. I am looking for the best and fastest way to log the data. Thus, I might not use a database, but rather write to a file and keep 1 file per day.
So which is faster: using a database or logging to files? No doubt there are also a lot of options for which logging software to use. Which is best for my purpose, if logging to a file is the best option?
The data stored might be used later for analytical purposes, so please keep this in mind as well.
I would recommend, first of all, that you use log4j (or any other logging framework).
You can use a JDBC appender that writes into the DB or any kind of file appender that writes into a file. The point is that your code will be generic enough to be changed later if you like.
In general, files are much faster than DB access, but there is room for optimization here.
If performance is critical, you can use batching/asynchronous calls to the logging infrastructure.
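As a concrete example of that, here is a hedged sketch of a programmatic log4j 1.x setup that combines a daily rolling file appender with an AsyncAppender, so the sensor-handling threads are not blocked on disk I/O. The file name and logger name are placeholders.
    import org.apache.log4j.AsyncAppender;
    import org.apache.log4j.DailyRollingFileAppender;
    import org.apache.log4j.Logger;
    import org.apache.log4j.PatternLayout;

    public class SensorLogSetup {
        public static Logger createSensorLogger() throws java.io.IOException {
            // One log file per day, rolled at midnight.
            DailyRollingFileAppender file = new DailyRollingFileAppender(
                    new PatternLayout("%d{ISO8601} %m%n"), "sensor.log", "'.'yyyy-MM-dd");
            // Decouple the sensor-handling threads from disk I/O.
            AsyncAppender async = new AsyncAppender();
            async.addAppender(file);
            Logger logger = Logger.getLogger("sensors");
            logger.setAdditivity(false);
            logger.addAppender(async);
            return logger;
        }
    }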
A free database on a cheap PC should be able to record 10 records per second easily.
A tuned database on a good system or a logger on a cheap PC should be able to write 100 records/lines per second easily.
A tuned logger should be able to write 1000 lines per second easily.
A fast binary logger can perform 1 million records per second easily (depending on the size of the record)
Your requirement is about 1.2 records per second, which you should be able to achieve any way you like. I assume you want to be able to query your data, so you will want it in a database eventually; I would put it there.
Ah, the world of embedded systems. I had a similar problem when working with a hovercraft. I solved it with a separate computer (you can do this with a separate program) on the local area network that would just SIT and LISTEN as a server for the logs I sent to it. That FileWriter program was written in C++. This solves two of your problems. First is the obvious performance gain while writing the logs. Second, the Java program is FREED of writing any logs at all (it only acts as a proxy) and can concentrate on performance-critical tasks. Using a DB for this is going to be overkill, except perhaps SQLite.
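If you go that route, the Java side only needs a thin sender. Below is an illustrative sketch of a TCP client that ships log lines to such a listener; the host, port, and line-per-log protocol are assumptions, since the original receiver was a custom C++ program.
    import java.io.BufferedWriter;
    import java.io.IOException;
    import java.io.OutputStreamWriter;
    import java.net.Socket;

    public class RemoteLogClient implements AutoCloseable {
        private final BufferedWriter out;

        public RemoteLogClient(String host, int port) throws IOException {
            Socket socket = new Socket(host, port);
            socket.setTcpNoDelay(true); // don't let the kernel batch small log lines
            out = new BufferedWriter(new OutputStreamWriter(socket.getOutputStream(), "UTF-8"));
        }

        // One log entry per line; the receiver just appends lines to a file.
        public void log(String line) throws IOException {
            out.write(line);
            out.newLine();
        }

        @Override
        public void close() throws IOException {
            out.flush();
            out.close(); // closing the stream also closes the socket
        }
    }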
Good luck!
I have an application that requires the creation and download of a significantly large SQLite database. Depending on the user's data, creation of the db and the syncing of data from the server can take upwards of 20 to 25 minutes (some customers have a LOT of data). The data is downloaded as JSON and processed with Android's built in JSON classes.
To account for OutOfMemory issues I was having on some devices, I needed to limit the per-call download from the server to 500 records at a time. But, as of now, all of the above is working successfully - although slowly.
Recently, there has been talk on my team of creating the complete SQLite db on the server side and then just downloading it to the device in binary, in an effort to speed things up. I've never done this before. Is this indeed a viable option, or should I just be looking into speeding up the processing of the JSON through a 3rd-party lib like GSON or Jackson?
Thanks in advance for your input.
From my experience with mobile devices, reinventing synchronization is overkill most of the time. It obviously depends on the hardware, software, and amounts of data you're working with, but most of the time long execution times on mobile devices are caused by faulty design, careless coding, or specifics of embedded systems not being taken into consideration.
Unfortunately, I can only give you some hints to consider, given the pretty vague description of the issues you're facing. I mean, "LOT" doesn't mean much to me - I've seen mobile apps with DBs containing millions of records running pretty smoothly, and ones with around 1K records running horribly slowly and causing the UI to freeze. You also didn't mention what OS version and device (or at least its capabilities) you're using, what the server configuration is, what software is installed, or what libraries/frameworks are used and in what modes. It all matters when you want to really speed things up.
Apart from the encoding being gzip (which I believe you left at the default, which is on), you should give these ideas a try:
Streaming! - make sure both the client and the server use a streaming version of a JSON API, and use buffered streams. If either doesn't, replace it with a library that does. Jackson has one of the fastest streaming APIs. Sure, it's more cumbersome to write a (de)serializer, but it pays off. When done properly, neither side has to create a buffer large enough for (de)serialization of all the data, fill it with the contents, and then parse/write it. Instead, a much smaller buffer is allocated and filled gradually as successive fields are serialized. When this buffer gets filled, its contents are immediately sent to the other end of the data channel, where they can be deserialized right away. The process continues until all the data has been transmitted in small chunks. It makes the data interchange much more fluent and less resource-intensive.
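To make the streaming idea concrete, here is a hedged sketch of the client-side deserialization using Jackson's streaming API (Jackson 2.x). It assumes the server sends a JSON array of flat record objects, and the RecordHandler callback is a hypothetical hook where each record would be inserted as soon as it has been read.
    import com.fasterxml.jackson.core.JsonFactory;
    import com.fasterxml.jackson.core.JsonParser;
    import com.fasterxml.jackson.core.JsonToken;
    import java.io.IOException;
    import java.io.InputStream;

    public class StreamingImporter {
        public void importRecords(InputStream in, RecordHandler handler) throws IOException {
            JsonFactory factory = new JsonFactory();
            try (JsonParser parser = factory.createParser(in)) {
                if (parser.nextToken() != JsonToken.START_ARRAY) {
                    throw new IOException("Expected a JSON array");
                }
                // Each object is handled as soon as it is fully read, so memory use
                // stays proportional to a single record, not the whole payload.
                while (parser.nextToken() == JsonToken.START_OBJECT) {
                    while (parser.nextToken() != JsonToken.END_OBJECT) {
                        String field = parser.getCurrentName();
                        parser.nextToken(); // move to the value
                        handler.onField(field, parser.getText());
                    }
                    handler.onRecordComplete();
                }
            }
        }

        public interface RecordHandler {
            void onField(String name, String value);
            void onRecordComplete();
        }
    }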
For large batch inserts or updates, use prepared statements. It also sometimes helps to insert your data without constraints and then create them afterwards - that way, for example, an index can be computed in one pass instead of for each insert. Don't use transactions (they require maintaining extra database logs), or commit only every 300 rows or so, to minimize the overhead. If you're updating an existing database and atomic modification is necessary, load the new data into a temporary database and, if everything is OK, replace the old database with the new one on the fly.
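For the prepared-statement point, a minimal Android sketch follows; the readings table and its columns are hypothetical, and whether you wrap chunks of a few hundred rows in a transaction is the trade-off discussed above.
    import android.database.sqlite.SQLiteDatabase;
    import android.database.sqlite.SQLiteStatement;

    public class BulkLoader {
        // Hypothetical table: readings(sensor_id INTEGER, value REAL).
        public static void insertAll(SQLiteDatabase db, long[] sensorIds, double[] values) {
            // Compile the INSERT once and re-bind per row, instead of parsing SQL each time.
            SQLiteStatement insert =
                    db.compileStatement("INSERT INTO readings (sensor_id, value) VALUES (?, ?)");
            try {
                for (int i = 0; i < sensorIds.length; i++) {
                    insert.clearBindings();
                    insert.bindLong(1, sensorIds[i]);
                    insert.bindDouble(2, values[i]);
                    insert.executeInsert();
                }
            } finally {
                insert.close();
            }
        }
    }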
Almost always, some data can be precomputed and stored on an SD card, for example; or it can be loaded onto an SD card as a pre-built SQLite DB at the company. If a task requires data so large that the import takes more than 10 minutes, you probably shouldn't be doing that task on mobile devices in the first place.
This is a little related to my previous question, Solaris: Mounting a file system on an application's handlers, except this question is for a different purpose and is simpler: there is no open/close/lock, just a fixed-length block of bytes with read/write operations.
Is there any way I can create a virtual slice, kind of like a RAM disk or an SVM slice, but where the reads and writes go through my app?
I am planning to use ZFS to take multiple of these virtual slices/disks and make them into one larger one for distributed backup storage with snapshots. I really like the compression and stacking that ZFS offers. If necessary I can guarantee that there is only one instance of ZFS accessing these virtual disks at a time (to prevent cache conflicts and such). If the one instance goes down, we can make sure it won't start back up and then we can start another instance of that ZFS.
I am planning to have those disks in chunks of about 4 GB or so; then I can move each chunk around and decide where to store them (mirrored multiple times, of course), and then have ZFS access the chunks and put them together into larger chunks for actual use. ZFS would also permit adding these small chunks as necessary to increase the size of the larger chunk.
I am aware there would be extra latency / network traffic if we used my own app in Java, but this is just for backup storage. The production storage is an entirely different configuration that is not related.
Edit: We have a system that uses all the space available and basically when there is not enough space it will remove old snapshots and increase the gaps between old snapshots. The purpose of my proposal is to allow the unused space from production equipment to be put to use at no extra cost. At different times different units of our production equipment will have free space. Also the system I am describing should eliminate any single point of failure when attempting to access data. I am hoping to not have to buy two large units and keep them synchronized. I would prefer just to have two access points and then we can mix large/small units in any way we want and move data around seamlessly.
This is a cross-post because it is more software-related than sysadmin-related. The original question is here: https://serverfault.com/questions/212072. It may be a good idea for the original to be closed.
One way would be to write a Solaris device driver, specifically a block device driver that emulates a real disk but communicates back to your application instead.
Start with reading the Device Driver Tutorial, then have a look at the OpenSolaris source code for real driver code.
Alternatively, you might investigate modifying the Solaris iSCSI target to be the interface to your application. Again, looking at OpenSolaris COMSTAR would be a good start.
It seems that any fixed-length file on any file system will do as a block device for use with ZFS. Not sure how reboots would work, but I am sure we can write some boot-up commands to sort that out.
Edit: The fixed length file would be on a network file system such as NFS.
I have this LAMP application with about 900k rows in MySQL and I am having some performance issues.
Background - Apart from the LAMP stack, there's also a multi-threaded Java process that runs in its own JVM. So together, LAMP and Java form the complete solution. The Java process is responsible for inserts/updates and a few selects as well. These inserts/updates are usually in bulk/batch, anywhere between 5 and 150 rows. The PHP front-end code only does SELECTs.
Issue - the PHP SELECT queries become very slow when the Java process is running. When the Java process is stopped, SELECTs perform fine. I mean, the performance difference is huge. When the Java process is running, any action performed on the PHP front end results in 80% or more CPU usage for the mysqld process.
Any help would be appreciated.
MySQL is running with default parameters & settings.
Software stack -
Apache - 2.2.x
MySQL - 5.1.37-1ubuntu5
PHP - 5.2.10
Java - 1.6.0_15
OS - Ubuntu 9.10 (karmic)
What engine are you using for MySQL? The thing to note here is that if you're using MyISAM, you're going to have locking issues due to the table-level locking that engine uses.
From: MySQL Table Locking
Table locking is also disadvantageous under the following scenario:
* A session issues a SELECT that takes a long time to run.
* Another session then issues an UPDATE on the same table. This session waits until the SELECT is finished.
* Another session issues another SELECT statement on the same table. Because UPDATE has higher priority than SELECT, this SELECT waits for the UPDATE to finish, after waiting for the first SELECT to finish.
I won't repeat them here, but the page has some tips on increasing concurrency on a table within MySQL. Obviously, one option would be to change to an engine like InnoDB, which has a more sophisticated row-locking mechanism that, for high-concurrency tables, can make a huge difference in performance. For more info on InnoDB, go here.
Prior to changing the engine, though, it would probably be worth looking at the other tips, like making sure your table is indexed properly, etc., as this will increase select and update performance regardless of the storage engine.
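If it helps, the current engine can be checked, and a table converted, from any JDBC connection; a rough sketch with placeholder credentials and a hypothetical table name (test the conversion on a copy first, since ALTER TABLE rewrites the whole table):
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class EngineCheck {
        public static void main(String[] args) throws Exception {
            Class.forName("com.mysql.jdbc.Driver");
            Connection con = DriverManager.getConnection(
                    "jdbc:mysql://localhost/mydb", "user", "password"); // placeholder credentials
            Statement st = con.createStatement();

            // List each table and the storage engine it currently uses.
            ResultSet rs = st.executeQuery("SHOW TABLE STATUS");
            while (rs.next()) {
                System.out.println(rs.getString("Name") + " -> " + rs.getString("Engine"));
            }
            rs.close();

            // Convert a hypothetical table to InnoDB to get row-level locking.
            st.executeUpdate("ALTER TABLE orders ENGINE = InnoDB");

            st.close();
            con.close();
        }
    }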
Edit based on user comment:
I would say it's one possible solution based on the symptoms you've described, but it may not be the one that gets you where you want to be. It's impossible to say without more information.
You could be doing full table scans due to the lack of indexes. This could be causing I/O contention on your disk, which further exacerbates the table locks used by MyISAM. If this is the case, then the root cause is the improper indexing, and rectifying that would be your best course of action before changing storage engines.
Also, make sure your tables are normalized. This can have profound implications for performance, especially on updates. Normalized tables can allow you to update a single row instead of hundreds or thousands in an un-normalized table, because values are not duplicated. It can also save huge amounts of I/O on selects, as the DB can cache data blocks more efficiently. Without knowing the structure of the tables you're working with or the indexes you have present, it's difficult to provide a more detailed response.
Edit after user attempted using InnoDB:
You mentioned that your Java process is multi-threaded. Have you tried running the process with a single thread? I'm wondering whether you might be sending the same rows out to multiple threads for update, and/or whether the way you're updating across threads is causing locking issues.
Outside of that, I would check the following:
Have you checked your explain plans to verify you have reasonable costs and that the query is actually using the indexes you have?
Are your tables normalized? More specifically, are you updating 100 rows when you could update a single record if the tables were normalized?
Is it possible that you're running out of physical memory when the Java process is running and the machine is busy swapping stuff in and out?
Are you flooding your disk (a single disk?) with more IOPs than it can reasonably handle?
We'd need to know a lot more about the system to say whether that's normal or how to solve the problem.
with about 900k rows in MySQL
I would say that makes it very small - so if it's performing badly, then you're going seriously wrong somewhere.
Enable the query log to see exactly which queries are running, and prioritize based on the product of frequency and duration. Have a look at the explain plans and create some indexes. Think about splitting the database across multiple disks.
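For instance, a hedged sketch of turning on the general query log and inspecting a plan over JDBC; the credentials, table, and query are placeholders, and SET GLOBAL requires a suitably privileged account:
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.ResultSetMetaData;
    import java.sql.Statement;

    public class QueryDiagnostics {
        public static void main(String[] args) throws Exception {
            Class.forName("com.mysql.jdbc.Driver");
            Connection con = DriverManager.getConnection(
                    "jdbc:mysql://localhost/mydb", "user", "password"); // placeholder credentials
            Statement st = con.createStatement();

            // Log every statement the server receives (general query log).
            st.execute("SET GLOBAL general_log = 'ON'");

            // Inspect the plan of a suspect SELECT; hypothetical table/column names.
            ResultSet rs = st.executeQuery("EXPLAIN SELECT * FROM orders WHERE customer_id = 42");
            ResultSetMetaData md = rs.getMetaData();
            while (rs.next()) {
                StringBuilder row = new StringBuilder();
                for (int i = 1; i <= md.getColumnCount(); i++) {
                    row.append(md.getColumnLabel(i)).append('=').append(rs.getString(i)).append(' ');
                }
                System.out.println(row);
            }
            rs.close();
            st.close();
            con.close();
        }
    }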
HTH
C.