Performance hit when persisting hundreds of millions of SQL Timestamp objects - java

During execution of a program that relies on the oracle.sql package, there is a large performance hit for persisting > 200 million Timestamps when compared to persisting the same number of longs.
Basic Schema
Java to persist:
Collection<ARRAY> longs = new ArrayList<ARRAY>(SIZE);
Collection<ARRAY> timeStamps = new ArrayList<ARRAY>(SIZE);
for (int i = 0; i < SIZE; i++)
{
    longs.add(new ARRAY(description, connection, i));
    timeStamps.add(new ARRAY(description, connection, new Timestamp((long) i)));
}
PreparedStatement timeStatement = conn.prepareStatement(insertTimestampSql); // insert SQL not shown
timeStatement.setObject(1, timeStamps);
timeStatement.execute(); // 5 minutes
PreparedStatement longStatement = conn.prepareStatement(insertLongSql); // insert SQL not shown
longStatement.setObject(1, longs);
longStatement.execute(); // 1 minute 15 seconds
My question is what does Oracle do to Timestamps that make them so awful to insert in a bulk manner?
Configuration:
64 bit RHEL 5
jre 6u16
ojdbc14.jar
64 GB dedicated to the JVM
UPDATE
java.sql.Timestamp is being used

Number takes 4 bytes, Timestamp takes 11 bytes. In addition, Timestamp has metadata associated with it. For each Timestamp, Oracle seems to compute the metadata and store it with the field.

Oracle timestamps are not stored as an absolute offset since the epoch, the way a java.sql.Timestamp internally holds one. Internally it is a packed structure containing values for the various "human" fields: century, year, month, and so on.
So each one of your epoch-offset timestamps is getting parsed into a "human" date before storage.

Adding to Srini's post, here is documentation on memory use by data type:
Oracle Doc on Data Types: http://docs.oracle.com/cd/E11882_01/timesten.112/e21642/types.htm#autoId31 (includes memory size for Number and Timestamp)
The docs state that Number takes 5-22 bytes, Timestamp takes 11 bytes, Integer takes 4 bytes.
Also, to your point on querying against a date range: could you insert the dates as long values instead of timestamps and then convert (for example in a stored procedure) when you are querying the data? This will obviously impact the speed of the queries, so it could be kicking the problem down the road, but.... :) A minimal sketch of the idea follows.
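To make that concrete, here is a hedged sketch assuming the dates are kept as epoch milliseconds in a NUMBER column; the table and column names (events, ts_millis) and the fromTimestamp/toTimestamp variables are made up for illustration. The point is that only the two range bounds get converted, in Java, rather than every stored row.
// Hypothetical: dates stored as epoch milliseconds, range bounds converted once in Java
PreparedStatement ps = conn.prepareStatement(
    "SELECT COUNT(*) FROM events WHERE ts_millis BETWEEN ? AND ?");
ps.setLong(1, fromTimestamp.getTime()); // java.sql.Timestamp -> millis
ps.setLong(2, toTimestamp.getTime());
ResultSet rs = ps.executeQuery();
The trade-off is that anything reading the table directly sees raw numbers rather than dates.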

Related

Conversion of Clob to String slow

I have an Oracle table that contains a number of CLOB fields. The CLOB fields themselves are plain-text CSV files that are about 4 kB-8 kB in size. There are about 200 records in my table. I have a SQL query that I run through JdbcTemplate, and it runs pretty quickly (about 2 seconds).
Unfortunately, when I try to extract the CLOB field into a String for all 200 records, the execution time goes from 2 seconds to over 20 seconds, which is far too slow.
I am using the following line to convert the clob to a string.
String clobValue = clob.getSubString(1, (int) clob.length());
It seems to be this conversion that is killing performance for me. Are there any alternatives open to me on the Oracle or Java side to speed this up?
I have tried some other suggestions from here, but performance doesn't improve - Most efficient solution for reading CLOB to String, and String to CLOB in Java?
Surely there is a different, or more efficient way to do this?
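One alternative worth benchmarking (a sketch, not a guaranteed fix) is to read each CLOB through its character stream instead of a single large getSubString() call; the 8 kB buffer size below is an arbitrary choice. If you are on a recent Oracle driver, the LOB prefetch connection property may also be worth experimenting with.
// Sketch: stream the CLOB via its Reader (needs java.io.Reader)
Reader reader = clob.getCharacterStream();
StringBuilder sb = new StringBuilder((int) clob.length());
char[] buffer = new char[8192];
int read;
while ((read = reader.read(buffer)) != -1) {
    sb.append(buffer, 0, read);
}
reader.close();
String clobValue = sb.toString();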

Best (fastest and memory saving) Java collection / data structure to concurrently insert and delete items

I intend to write a Spring Boot application in Java 8 which shall allow inserting (float) values over a large number of TCP connections. These values shall be kept for a defined time period and deleted afterwards. It may be that more than 250,000 values are inserted (concurrently) in a short time period.
I am looking for the best Java collection to store float values in terms of:
fast insertion of values coming in over a lot of TCP connections
fast deletion after a defined time (maybe using the Timer class?)
I don't need sorting or anything like that; I only have to sum up the values in the collection (frequently requested) and delete all the values older than x minutes / hours / days / ... (whatever). I am thinking about deleting the items/elements using Timer/TimerTask.
Because I have to compare the timestamp at which each value was inserted into the collection, I think I cannot store only the (float) value; I also need a timestamp (Instant?) next to it. Hence I would probably use a bean (POJO) with attributes such as the following (a rough sketch of the whole idea comes after the snippet):
private float value;
private Instant time; // e.g. Instant.now()
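A minimal sketch of one possible answer, assuming a lock-free queue of value+timestamp holders plus a periodic purge; all class and field names here are illustrative, not from the question.
import java.time.Duration;
import java.time.Instant;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class Sample {
    final float value;
    final Instant time;
    Sample(float value) { this.value = value; this.time = Instant.now(); }
}

class SampleStore {
    private final ConcurrentLinkedQueue<Sample> samples = new ConcurrentLinkedQueue<>();
    private final ScheduledExecutorService purger = Executors.newSingleThreadScheduledExecutor();

    SampleStore(Duration maxAge) {
        // Evict old entries periodically instead of per insert; insertions stay lock-free.
        purger.scheduleAtFixedRate(
            () -> samples.removeIf(s -> s.time.isBefore(Instant.now().minus(maxAge))),
            1, 1, TimeUnit.MINUTES);
    }

    void insert(float value) { samples.add(new Sample(value)); } // called from many TCP handlers

    double sum() {
        double total = 0;
        for (Sample s : samples) total += s.value;
        return total;
    }
}
Whether a queue, a deque, or a time-bucketed map is best depends on how often the sum is requested versus how often values arrive, so treat this as a starting point to measure against.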

How to create a timeline in phpmyadmin using nanoseconds?

I am currently trying to create a timeline in phpMyAdmin from a MySQL database, using Java and JDBC. In the database I store IPs that I get from a pcap file, and I get the time I want from the packet (using packet.getCaptureHeader().nanos()). Every time an IP occurs I increment a counter. What I want is to create a timeline showing the progress of the sum of the counter of each IP. I tried something but I think I am on the wrong track. Any suggestions?
long timer=packet.getCaptureHeader().nanos();
Class.forName("com.mysql.jdbc.Driver");
connect = DriverManager.getConnection("jdbc:mysql://localhost/thesis?"
+ "user=sqluser&password=sqluserpw");
preparedStatement = connect.prepareStatement("INSERT INTO thesis.ICMP values (?, ?, ?, ?) ON DUPLICATE KEY UPDATE counter=counter+1;");
preparedStatement.setString(1, xIP);
preparedStatement.setString(2, "ICMP");
preparedStatement.setInt(3, 1);
preparedStatement.setLong(4, timer);
preparedStatement.executeUpdate();
I noticed that if I use a DATE-type column I can create timelines easily, but DATE doesn't support that kind of accuracy. Please feel free to think out of the box, or even suggest a new approach; I won't mind.
MySQL doesn't support nanosecond resolution. Since MySQL 5.6.4, there is support for fractional seconds with microsecond precision, but for further precision (or if you have an old MySQL version), you'll have to come up with something on your own.
Probably what I'd do in this particular case is store it as a date up to second resolution, then store the fractional portion of the second, converted to nanoseconds, as an unsigned INT (a signed integer would work, but you'd never want a negative value anyway). The downside is that searching and sorting becomes more difficult. This person is discussing storing the nanoseconds as a decimal, which I don't understand, but has some good thoughts on the issue aside from that. Another possibility is to use a BIGINT to store the number of nanoseconds since an epoch, but that only gives about 500 years of possible values. For your use, that may be fine, but is a limitation to keep in mind.
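A rough sketch of the "seconds plus nanosecond remainder" layout described above. It assumes you can obtain a full nanoseconds-since-epoch value; the table and column names (icmp_times, ip, captured_at, captured_nanos) are made up for illustration.
long epochNanos = timer; // assumption: timer holds nanoseconds since the epoch
long seconds = epochNanos / 1000000000L;
int nanoRemainder = (int) (epochNanos % 1000000000L);

PreparedStatement ps = connect.prepareStatement(
    "INSERT INTO thesis.icmp_times (ip, captured_at, captured_nanos) VALUES (?, ?, ?)");
ps.setString(1, xIP);
ps.setTimestamp(2, new java.sql.Timestamp(seconds * 1000L)); // second precision only
ps.setInt(3, nanoRemainder);                                 // 0..999,999,999
ps.executeUpdate();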

whats the best way to compress a date?

My site needs to store the IP and timestamp of every visit in MySQL. I am concerned that very quickly I will have 1e6 rows in my database.
What is the best way to compress a date in MySQL or Java? Does MySQL already compress dates? Ideally, I would like to uncompress the date values rather quickly to generate reports.
Update: sorry, I meant a million per day. But I guess that is still minuscule.
Mate, one million rows is a tiny database. I wouldn't be worrying too much about that. In any case, MySQL uses a pretty compressed format (3 bytes) anyway as per this page:
DATE: A three-byte integer packed as DD + MM×32 + YYYY×16×32
In other words, at bit level (based on the 1000-01-01 thru 9999-12-31 range):
00000yyy yyyyyyym mmmddddd
Use the built-in MySQL DATETIME type. A million rows isn't that many.
A MySQL TIMESTAMP would be only 4 bytes. An integer representing the timestamp would be the same size. It would be efficient to save it as a MySQL type since you'd be able to index and/or query on that column efficiently (a small sketch follows below).
Any "compressed" form that is not a native MySQL type would be inefficient to query.

Service usage limiter implementation

I need to limit multiple service usages for multiple customers. For example, customer customer1 can send max 1000 SMS per month. My implementation is based on one MySQL table with 3 columns:
date TIMESTAMP
name VARCHAR(128)
value INTEGER
For every service usage (sending an SMS) one row is inserted into the table. value holds the usage count (e.g. if the SMS was split into 2 parts then value = 2). name holds the limiter name (e.g. customer1-sms).
To find out how many times the service was used this month (March 2011), a simple query is executed:
SELECT SUM(value) FROM service_usage WHERE name = 'customer1-sms' AND date > '2011-03-01';
The problem is that this query is slow (0.3 sec). We are using indexes on the date and name columns.
Is there some better way to implement service usage limitation? My requirement is that it must be flexible (e.g. in one case I need to know the usage within the last 10 minutes, in another within the current month). I am using Java.
Thanks in advance
You should have one composite index covering both columns, not two separate indexes on each of the columns. This should make the query very fast.
If it still doesn't, then you could use a table with a month, a name and a value, and increment the value for the current month each time an SMS is sent. This would remove the SUM from your query. It would still need an index on (month, name) to be as fast as possible, though; a rough sketch of this idea follows.
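A hedged sketch of that counter-table idea; the table and column names are illustrative, and (month, name) is assumed to be the primary key.
// One row per (month, name); each send bumps the running total atomically.
PreparedStatement ps = conn.prepareStatement(
    "INSERT INTO service_usage_monthly (month, name, value) VALUES (?, ?, ?) " +
    "ON DUPLICATE KEY UPDATE value = value + VALUES(value)");
ps.setString(1, "2011-03");       // or a DATE for the first day of the month
ps.setString(2, "customer1-sms");
ps.setInt(3, smsParts);           // e.g. 2 for a two-part message
ps.executeUpdate();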
I found one solution to my problem. Instead of inserting just the usage increment, I will insert a running total (the most recent value plus the increment):
BEGIN;
-- select the most recent value
SELECT value FROM service_usage WHERE name = %name ORDER BY date DESC LIMIT 1;
-- insert the new running total
INSERT INTO service_usage VALUES (CURRENT_TIMESTAMP, %name, %value + %increment);
COMMIT;
To find out service usage since %date:
SELECT value AS value1 FROM test WHERE name = %name ORDER BY date DESC LIMIT 1;
SELECT value AS value2 FROM test WHERE name = %name AND date <= %date ORDER BY date DESC LIMIT 1;
The result will be value1 - value2.
This way I'll need transactions. I'll probably implement it as a stored procedure.
Any additional hints are still appreciated :-)
It's worth trying to replace your "=" with "like". Not sure why, but in the past I've seen this perform far more quickly than the "=" operator on varchar columns.
SELECT SUM(value) FROM service_usage WHERE name like 'customer1-sms' AND date > '2011-03-01';
Edited after comments:
Okay, now I can sort of re-create your issue: the first time I run the query it takes around 0.03 seconds, subsequent runs take 0.001 second. Inserting new records causes the query to revert to 0.03 seconds.
Suggested solution:
COUNT does not show the same slow-down. I would change the business logic so that every time the user sends an SMS you insert a record with value "1"; if the message is a multi-part message, simply insert two rows.
Then replace the SUM with a COUNT.
I've applied this to my test data, and even after inserting a new record, the "count" query returns in 0.001 second.
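For reference, the count-based version of the original query might look like this (parameterized rather than hard-coded, but otherwise the same shape):
// COUNT instead of SUM: with one row per SMS part, the row count is the usage.
PreparedStatement ps = conn.prepareStatement(
    "SELECT COUNT(*) FROM service_usage WHERE name = ? AND date > ?");
ps.setString(1, "customer1-sms");
ps.setTimestamp(2, java.sql.Timestamp.valueOf("2011-03-01 00:00:00"));
ResultSet rs = ps.executeQuery();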
