What's the best way to compress a date? - java

My site needs to store the IP and timestamp of every visit in MySQL. I am concerned that very quickly I will have 1e6 rows in my database.
What is the best way to compress a date in MySQL or Java? Does MySQL already compress dates? Ideally, I would like to uncompress the date values rather quickly to generate reports.
Update: sorry, I meant a million per day. But I guess that is still minuscule.

Mate, one million rows is a tiny database. I wouldn't be worrying too much about that. In any case, MySQL uses a pretty compressed format (3 bytes) anyway as per this page:
DATE: A three-byte integer packed as DD + MM×32 + YYYY×16×32
In other words, at bit level (based on the 1000-01-01 thru 9999-12-31 range, which needs 14 bits for the year, 4 for the month and 5 for the day):
0yyyyyyy yyyyyyym mmmddddd
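To make the quoted formula concrete, here is a minimal Java sketch of the same arithmetic. The class and method names are made up for illustration, and it only mirrors the DD + MM×32 + YYYY×16×32 packing quoted above; it is not MySQL's own code.

```java
final class PackedDate {
    private PackedDate() {}

    // Pack per the quoted formula: DD + MM*32 + YYYY*16*32.
    static int pack(int year, int month, int day) {
        return day + month * 32 + year * 16 * 32;
    }

    // Day lives in the low 5 bits, month in the next 4, year in the rest.
    static int day(int packed)   { return packed & 0x1F; }
    static int month(int packed) { return (packed >>> 5) & 0x0F; }
    static int year(int packed)  { return packed >>> 9; }

    public static void main(String[] args) {
        int packed = pack(2011, 4, 15);
        // Prints: 1029775 -> 2011-04-15
        System.out.printf("%d -> %04d-%02d-%02d%n",
                packed, year(packed), month(packed), day(packed));
    }
}
```

Even 9999-12-31 packs to a value below 2^24, which is why three bytes are enough.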

Use the built in MySQL datetime type. A million rows isn't that many.

A MySQL timestamp would be only 4 bytes. An integer representing the timestamp would be the same size. It would be efficient to save it as a MySQL type, since you'd be able to index and query on that column efficiently.
Any "compressed" form that is not a MySQL type would be inefficient to query.

Related

Converting a Java date stored in SQL Server to a Readable Date/Time

We are using a third-party Java application that is storing some date/time info in an SQL Server database. In order for the application to be database agnostic, it stores this info in a VARCHAR(15) column. I need to be able to access this in a readable format and from what I have read from the site and the Java docs I have come up with the following:
declare @javadate varchar(15);
select @javadate = '001439586176038' -- Sample date from the DB
select dateadd(mi, datediff(mi, sysutcdatetime(), sysdatetime()), dateadd(s, cast(cast(@javadate as bigint)/1000 as integer), '1/1/1970 00:00:00'));
In the above example, I am just using a fixed example in @javadate (I would actually turn the last statement into a function), but I want to know if this is an accurate way to get the value I want and if there is a better way to do this.
EDIT: I should mention that the date/time is stored in UTC (thus the need for the datediff).
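For cross-checking the SQL result, the same value can be decoded on the Java side. This is only a sketch, assuming the column holds zero-padded milliseconds since the Unix epoch (which the sample value suggests); the class name is made up. Note that the datediff in the SQL above applies the server's current UTC offset, so historical values on the other side of a daylight-saving change may come out an hour off, whereas a ZoneId applies the offset that was in force at that instant.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZonedDateTime;

final class JavaMillisDecoder {
    private JavaMillisDecoder() {}

    // Long.parseLong accepts the leading zeros of the VARCHAR(15) value.
    static Instant parse(String stored) {
        return Instant.ofEpochMilli(Long.parseLong(stored.trim()));
    }

    // Converts to a wall-clock time using the zone rules valid at that instant.
    static ZonedDateTime inZone(String stored, ZoneId zone) {
        return parse(stored).atZone(zone);
    }

    public static void main(String[] args) {
        // Prints: 2015-08-14T21:02:56.038Z
        System.out.println(parse("001439586176038"));
        System.out.println(inZone("001439586176038", ZoneId.systemDefault()));
    }
}
```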

How to create a timeline in phpmyadmin using nanoseconds?

I am currently trying to create a timeline in phpmyadmin from a MySQL database, using Java and JDBC. In the database I store IPs that I get from a pcap file, and I get the time I want from the packet (using packet.getCaptureHeader().nanos()). Every time an IP occurs I increment a counter. What I want is to create a timeline showing how the sum of the counter of each IP progresses. I tried something, but I think I am on the wrong track. Any suggestions?
long timer=packet.getCaptureHeader().nanos();
Class.forName("com.mysql.jdbc.Driver");
connect = DriverManager.getConnection("jdbc:mysql://localhost/thesis?"
+ "user=sqluser&password=sqluserpw");
preparedStatement = connect.prepareStatement("INSERT INTO thesis.ICMP values (?, ?, ?, ?) ON DUPLICATE KEY UPDATE counter=counter+1;");
preparedStatement.setString(1, xIP);
preparedStatement.setString(2, "ICMP");
preparedStatement.setInt(3, 1);
preparedStatement.setLong(4, timer);
preparedStatement.executeUpdate();
I noticed that if I use a DATE type variable, I can easily create timelines, but DATE doesn't support that kind of accuracy. Please feel free to think outside the box or even suggest a new approach, I won't mind.
MySQL doesn't support nanosecond resolution. Since MySQL 5.6.4, there is support for fractional seconds with microsecond precision, but for further precision (or if you have an old MySQL version), you'll have to come up with something on your own.
Probably what I'd do in this particular case is store it as a date up to second resolution, then store the fractional portion of the second, converted to nanoseconds, as an unsigned INT (a signed integer would work, but you'd never want a negative value anyway). The downside is that searching and sorting become more difficult. This person is discussing storing the nanoseconds as a decimal, which I don't understand, but has some good thoughts on the issue aside from that. Another possibility is to use a BIGINT to store the number of nanoseconds since an epoch, but that only gives a limited range of possible values: about 292 years either side of the epoch for a signed BIGINT, or about 584 years for an unsigned one. For your use, that may be fine, but it is a limitation to keep in mind.
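The "seconds plus nanosecond fraction" split described above could look roughly like this in Java. It is a sketch that assumes the capture time is available as a single count of nanoseconds since the epoch; the class and method names are invented for illustration.

```java
import java.time.Instant;

final class NanoSplit {
    private NanoSplit() {}

    static final long NANOS_PER_SECOND = 1_000_000_000L;

    // Whole seconds: would go into the second-resolution date column.
    static long seconds(long epochNanos) {
        return Math.floorDiv(epochNanos, NANOS_PER_SECOND);
    }

    // Fraction in [0, 999_999_999]: fits an unsigned INT column.
    static int nanoFraction(long epochNanos) {
        return (int) Math.floorMod(epochNanos, NANOS_PER_SECOND);
    }

    // Rebuilds the original count; throws ArithmeticException on overflow.
    static long join(long seconds, int nanoFraction) {
        return Math.addExact(Math.multiplyExact(seconds, NANOS_PER_SECOND), nanoFraction);
    }

    public static void main(String[] args) {
        long captured = 1_439_586_176_038_000_123L;
        long s = seconds(captured);
        int f = nanoFraction(captured);
        // Prints: 1439586176 s + 38000123 ns = 2015-08-14T21:02:56.038000123Z
        System.out.println(s + " s + " + f + " ns = " + Instant.ofEpochSecond(s, f));
    }
}
```

Sorting by the pair (seconds, fraction) gives the same order as sorting by the original nanosecond count, which keeps timeline queries workable with a two-column index.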

Is it Java best practice to store dates as longs in your database?

My reason for doing so is that dates stored as date objects in whatever database tend to be written in a specific format, which may greatly differ from what you need to present to the user on the front-end. I also think it's especially helpful if your application is pulling info from different types of data stores. A good example would be the difference between a MongoDB and SQL date object.
However, I don't know whether this is recommended practice. Should I keep storing dates as longs (time in milliseconds) or as date objects?
I can't speak for it in relation to MongoDB, but in a SQL database, no, it's not best practice. That doesn't mean there might not be the occasional use case, but "best practice," no.
Store them as dates, retrieve them as dates. Your best bet is to set up your database to store them as UTC (loosely, "GMT") so that the data is portable and you can use different local times as appropriate (for instance, if the database is used by geographically diverse users), and handle any conversions from UTC to local time in the application layer (e.g., via Calendar or a third-party date library).
Storing dates as numbers means your database is hard to report against, run ad-hoc queries against, etc. I made that mistake once, it's not one I'll repeat without a really good reason. :-)
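As a rough illustration of the "store UTC, convert in the application layer" advice above, here is a sketch using java.time rather than Calendar (that substitution is my assumption; the answer only names Calendar or a third-party library). The class name and the sample zones are made up.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

final class UtcToLocal {
    private UtcToLocal() {}

    // Renders a UTC instant as local wall-clock time for one user's zone.
    static String render(Instant storedUtc, ZoneId userZone) {
        return DateTimeFormatter.ISO_LOCAL_DATE_TIME.format(storedUtc.atZone(userZone));
    }

    public static void main(String[] args) {
        Instant stored = Instant.parse("2011-04-15T17:52:57Z");
        // Prints: 2011-04-15T13:52:57
        System.out.println(render(stored, ZoneId.of("America/New_York")));
        // Prints: 2011-04-16T02:52:57
        System.out.println(render(stored, ZoneId.of("Asia/Tokyo")));
    }
}
```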
It very much depends on:
What database you're using and its date/time support
Your client needs (e.g. how happy are you to bank on the idea that you'll always be using Java)
What information you're really trying to represent
Your diagnostic tools
The third point is probably the most important. Think about what the values you're trying to store really mean. Even though you're clearly not using Noda Time, hopefully my user guide page on choosing which Noda Time type to use based on your input data may help you think about this clearly.
If you're only ever using Java, and your database doesn't have terribly good support for date/time types, and you're only trying to represent an "instant in time" (rather than, say, an instant in a particular time zone, or a local date/time with an offset, or just a local date/time, or just a date...), and you're comfortable writing diagnostic tools to convert your data into more human readable forms - then storing a long is reasonable. But that's a pretty long list of "if"s.
If you want to be able to perform date manipulation in the database - e.g. asking for all values which occur on the first day of the month - then you should probably use a date/time type, being careful around time zones. (My experience is that most databases are at least shockingly badly documented when it comes to their date/time types.)
In general, you should use whatever type is able to meet all your requirements and is the most natural representation for that particular environment. So in a database which has a date/time type which doesn't give you issues when you interact with it (e.g. performing arbitrary time zone conversions in an unrequested way), use that type. It will make all kinds of things easier.
The advantage of using a more "primitive" representation (e.g. a 64 bit integer) is precisely that the database won't mess around with it. You're effectively hiding the meaning of the data from the database, with all the normal pros and cons (mostly cons) of that approach.
It depends on various aspects. When using the standard "seconds since epoch" and storing it in a signed 32-bit integer, dates are limited to roughly the 1901-2038 range (1970-2038 if only non-negative values are used).
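The 2038 limit follows directly from the size of a signed 32-bit integer, as this small sketch shows (the class name is made up):

```java
import java.time.Instant;

final class Epoch32 {
    private Epoch32() {}

    // Largest instant a signed 32-bit seconds counter can represent.
    static Instant latest() {
        return Instant.ofEpochSecond(Integer.MAX_VALUE);
    }

    // Smallest instant, reachable only if negative values are allowed.
    static Instant earliest() {
        return Instant.ofEpochSecond(Integer.MIN_VALUE);
    }

    public static void main(String[] args) {
        System.out.println(latest());   // 2038-01-19T03:14:07Z
        System.out.println(earliest()); // 1901-12-13T20:45:52Z
        // One second past the maximum wraps around to the minimum.
        int wrapped = (int) (Integer.MAX_VALUE + 1L);
        System.out.println(wrapped == Integer.MIN_VALUE); // true
    }
}
```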
But there also is some precision issue. For example, unix time ignores leap seconds. Every day is defined to have the same number of seconds. So when computing time deltas between unix time, you do get some error.
But the more important thing is that you assume all your dates are completely known, as your representation cannot hold half-specified dates. In reality, there are a lot of events you do not know to second (or even millisecond) precision. So it is a feature if a representation allows specifying, e.g., only day precision. Ideally, you would store dates with their precision information.
Furthermore, say you are building a calendar application. There is time, but there also is local time. Quite often, you need both pieces of information available. When checking for scheduling overlaps, you can of course do this best in a synchronized time, so longs will be good here. If, however, you also want to ensure you are not scheduling events outside of 9-20 h local time, you always need to preserve timezone information as well. For anything that spans more than one location, you really need to include the time zone in your date representation. Assuming that you can just convert all dates you see to your current local time is quite naive.
Note that dates in SQL can lead to odd situations. One of my favorites is the following MySQL absurdity:
SELECT * FROM Dates WHERE date IS NULL AND date IS NOT NULL;
may return records that have the date 0000-00-00 00:00:00, although this violates the popular understanding of logic.
Since this question is tagged with MongoDB: MongoDB does not store dates in String or what not, they actually store it as a long ( http://www.mongodb.org/display/DOCS/Dates ):
A BSON Date value stores the number of milliseconds since the Unix epoch (Jan 1, 1970) as a 64-bit integer. v2.0+ : this number is signed so dates before 1970 are stored as a negative numbers.
Since MongoDB has no immediate plans to utilise the complex date handling functions (like getting only the year for querying, etc.) that SQL has within standard querying, there is no real downside; it might in fact reduce the size of your indexes and storage.
There is one thing to take into consideration here, the aggregation framework: http://docs.mongodb.org/manual/reference/aggregation/#date-operators (there are weird and wonderful things you can only do with the supported BSON date type in MongoDB). Whether this matters to you depends upon your queries.
Do you see yourself as needing the aggregation framework's functions? Or would housing the extra object overhead be a pain?
My personal opinion is that the BSON date type is such a small object that to store a document without it would be detrimental to the entire system and its future compatibility for no apparent reason. So, yes, I would use the BSON date type rather than a long, and I would consider it good practice to do so.
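For what it's worth, java.util.Date uses the same model as the BSON description quoted above (signed milliseconds since the Unix epoch), so the round trip between a date object and a long is lossless. A tiny sketch with plain JDK classes only; no MongoDB driver calls are shown or assumed, and the class name is made up.

```java
import java.util.Date;

final class MillisRoundTrip {
    private MillisRoundTrip() {}

    // The long that a millisecond-based date type would hold.
    static long toStoredMillis(Date date) {
        return date.getTime();
    }

    // Rebuilds the date; negative values are instants before 1970.
    static Date fromStoredMillis(long millis) {
        return new Date(millis);
    }

    public static void main(String[] args) {
        Date dayBeforeEpoch = fromStoredMillis(-86_400_000L);
        System.out.println(dayBeforeEpoch.toInstant()); // 1969-12-31T00:00:00Z
        System.out.println(toStoredMillis(dayBeforeEpoch)); // -86400000
    }
}
```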
I don't think it's a best practice to store dates as longs, because that would mean you would not be able to do any of the date-specific queries, like:
where date between
We also won't be able to easily get the day, month or year from the table using SQL queries.
It is better to use a single date format converter in the Java layer, convert the date with it, and use a single format throughout the application.
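To show what the extra work looks like when dates are stored as longs: a "between" style query then needs its bounds computed in the application first. A sketch, assuming epoch milliseconds in UTC and a hypothetical column called visit_ms; the class name is also made up.

```java
import java.time.YearMonth;
import java.time.ZoneOffset;

final class MonthRange {
    private MonthRange() {}

    // First millisecond of the month, in UTC.
    static long startMillis(YearMonth month) {
        return month.atDay(1).atStartOfDay(ZoneOffset.UTC).toInstant().toEpochMilli();
    }

    // Exclusive upper bound: first millisecond of the following month.
    static long endMillis(YearMonth month) {
        return startMillis(month.plusMonths(1));
    }

    public static void main(String[] args) {
        YearMonth april = YearMonth.of(2011, 4);
        // visit_ms is a hypothetical BIGINT column holding epoch milliseconds.
        System.out.println("WHERE visit_ms >= " + startMillis(april)
                + " AND visit_ms < " + endMillis(april));
    }
}
```

With a native date column, the database can evaluate the same range (or extract the month) by itself.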
IMHO, storing dates in the DB is best done with Strings. That way you avoid unnecessary data going back and forth to the server if you don't need all the fields in Calendar.
There is a lot of data in a Calendar, and each instance of Calendar is pretty heavy too.
So store it as a String with only the required data, and convert it back to a Calendar whenever you need to use it.

Performance hit when persisting hundreds of millions of SQL Timestamp objects

During execution of a program that relies on the oracle.sql package there is a large performance hit for persisting > 200 million Timestamps when compared to persisting the same number of longs.
Basic Schema
Java to persist:
// Build SIZE oracle.sql.ARRAY values of each kind.
Collection<ARRAY> longs = new ArrayList<ARRAY>(SIZE);
Collection<ARRAY> timeStamps = new ArrayList<ARRAY>(SIZE);
for (int i = 0; i < SIZE; i++)
{
    longs.add(new ARRAY(description, connection, i));
    timeStamps.add(new ARRAY(description, connection, new Timestamp((long) i)));
}
// Persist the timestamps.
Statement timeStatement = conn.createStatement();
timeStatement.setObject(1, timeStamps);
timeStatement.execute(); // 5 minutes
// Persist the longs.
Statement longStatement = conn.createStatement();
longStatement.setObject(1, longs);
longStatement.execute(); // 1 minute 15 seconds
My question is what does Oracle do to Timestamps that make them so awful to insert in a bulk manner?
Configuration:
64 bit RHEL 5
jre 6u16
ojdbc14.jar
64 GB dedicated to the JVM
UPDATE
java.sql.Timestamp is being used
Number takes 4 bytes, Timestamp takes 11 bytes. In addition, Timestamp has metadata associated with it. For each Timestamp, Oracle seems to compute the metadata and store with the field.
Oracle timestamps are not stored as an absolute value since the epoch, which is what a java.sql.Timestamp holds internally. Each one is a big bitmask containing values for the various "human" fields: centuries, months, etc.
So each one of your nanosecond-since-epoch timestamps is getting parsed into a "human" date before storage.
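To give a feel for the kind of conversion described above, here is a sketch that breaks an epoch value into "human" fields and packs them into 11 bytes. The layout (century, year of century, month, day, hour, minute, second, then 4 bytes of nanoseconds) is purely illustrative and is not claimed to match Oracle's actual encoding; the class name is made up.

```java
import java.nio.ByteBuffer;
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;

final class FieldEncoding {
    private FieldEncoding() {}

    // Illustrative 11-byte layout only; NOT Oracle's real format.
    static byte[] toFields(long epochMillis) {
        LocalDateTime t = LocalDateTime.ofInstant(
                Instant.ofEpochMilli(epochMillis), ZoneOffset.UTC);
        return ByteBuffer.allocate(11)
                .put((byte) (t.getYear() / 100))   // century
                .put((byte) (t.getYear() % 100))   // year of century
                .put((byte) t.getMonthValue())
                .put((byte) t.getDayOfMonth())
                .put((byte) t.getHour())
                .put((byte) t.getMinute())
                .put((byte) t.getSecond())
                .putInt(t.getNano())               // fractional second
                .array();
    }

    public static void main(String[] args) {
        // Epoch zero becomes 19, 70, 1, 1, 0, 0, 0 plus four zero bytes.
        System.out.println(java.util.Arrays.toString(toFields(0L)));
    }
}
```

A plain long needs none of this calendar arithmetic, which is one plausible contributor to the difference, though the sketch does not measure anything.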
Adding to Srini's post, for documentation on memory use by data type:
Oracle Doc on Data Types: http://docs.oracle.com/cd/E11882_01/timesten.112/e21642/types.htm#autoId31 (includes memory size for Number and Timestamp)
The docs state that Number takes 5-22 bytes, Timestamp takes 11 bytes, Integer takes 4 bytes.
Also - to your point on querying against a date range - could you insert the dates as long values instead of timestamps and then use a stored procedure to convert when you are querying the data? This will obviously impact the speed of the queries, so it could be kicking the problem down the road, but.... :)

Number of days since registration

I am writing a small console program, and I store the date each player joined in a database, like this:
CreateDate
2011-04-15 17:52:57
Now I want to do a check like this, using a function that returns how many days a player has been registered:
if(player.getDaysSinceRegistration) {
Thanks for any help.
joda-time has an easy way to do this:
Days.daysBetween(new DateTime(registeredDate), new DateTime()).getDays();
Without 3rd party libraries:
(System.currentTimeMillis() - registeredDate.getTime()) / MILLIS_PER_DAY;
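On Java 8 or later, java.time can do the same without Joda-Time. A sketch, assuming the CreateDate value arrives as text in the format shown in the question; the class name, method name and pattern constant are made up, and "now" is passed in so the result is repeatable.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoUnit;

final class RegistrationAge {
    private RegistrationAge() {}

    // Matches values such as "2011-04-15 17:52:57".
    private static final DateTimeFormatter CREATE_DATE_FORMAT =
            DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm:ss");

    // Counts complete 24-hour days between registration and "now".
    static long daysSince(String createDate, LocalDateTime now) {
        LocalDateTime registered = LocalDateTime.parse(createDate, CREATE_DATE_FORMAT);
        return ChronoUnit.DAYS.between(registered, now);
    }

    public static void main(String[] args) {
        System.out.println(daysSince("2011-04-15 17:52:57", LocalDateTime.now()));
    }
}
```

In the millisecond version above, MILLIS_PER_DAY is not defined in the snippet; presumably it is 24 * 60 * 60 * 1000.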
First of all, don't store dates in databases in textual form.
You should save them as database dates (a proper column type, just like numbers or varchars), as this allows you both to ask the database to do date calculations (the exact way to do so is database-specific) and to have the value automatically returned as a Java date object by the JDBC driver, which is much easier to work with than strings. See Bozho's answer for suggestions on doing this in the Java layer.
