I am using Hibernate and the Criteria API to write my database queries. I need to compute the difference between two dates in days and compare that number of days with a specific value.
E.g. a commonly written criteria restriction looks like this:
criteria.add(Restrictions.eq("someProperty", someValue));
What I need is:
criteria.add(Restrictions.ge("dateProperty1 - dateProperty2", 15));
That is, the difference between the two dates should be greater than or equal to 15 days.
I don't see how to achieve this. I have searched Google a lot for a possible solution but did not find anything that covers what I need.
If you check the documentation of the Restrictions class, you will see that:
most functions operate on a "property vs value" pair (just like in your first example)
the remaining comparator functions operate on "property vs other property" pair
But no customized function is available for your case. The option you have is sqlRestriction, which lets you express this condition in the native SQL of your DBMS. That would be a different, but much easier problem, although clearly not as elegant as your original idea.
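A minimal sketch of the sqlRestriction route, assuming MySQL (where DATEDIFF(a, b) returns a - b in whole days) and assuming the mapped columns are called date_property1 and date_property2; both column names and the helper class are hypothetical. The helper only builds the native fragment, so it compiles without Hibernate, and the commented call shows where the fragment would be passed. {alias} is the placeholder Hibernate replaces with the root entity's table alias.

```java
import java.util.regex.Pattern;

// Builds the native SQL fragment for "laterColumn - earlierColumn >= N days".
// Assumption: MySQL syntax; other databases need a different date function.
final class DateDiffRestriction {
    private static final Pattern IDENTIFIER = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    static String daysBetweenAtLeast(String laterColumn, String earlierColumn) {
        requireIdentifier(laterColumn);
        requireIdentifier(earlierColumn);
        // The "?" is bound to the number of days by sqlRestriction.
        return "DATEDIFF({alias}." + laterColumn + ", {alias}." + earlierColumn + ") >= ?";
    }

    // Column names are concatenated into SQL, so accept plain identifiers only.
    private static void requireIdentifier(String name) {
        if (!IDENTIFIER.matcher(name).matches()) {
            throw new IllegalArgumentException("not a plain column name: " + name);
        }
    }
}

// With Hibernate on the classpath (not compiled here):
// criteria.add(Restrictions.sqlRestriction(
//         DateDiffRestriction.daysBetweenAtLeast("date_property1", "date_property2"),
//         15, StandardBasicTypes.INTEGER));
```

Note that the fragment uses column names, not property names, and that older Hibernate versions expose the integer type as Hibernate.INTEGER rather than StandardBasicTypes.INTEGER.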
This sounds pretty simple, but I just can't wrap my head around it.
I have two fields in my DB. availableFromDate and availableToDate.
The user is doing a search to find all entities available within a given range. So my action receives two dates:
searchFromDate and searchToDate.
All I need to do is return all entities that are available for the whole period specified by the user.
Preferably using Criteria.
Anyone?
If I understand your question correctly, that is, that the availability of the entity should lie completely within the search date range, then it should boil down to this:
If you ensure that "availableFromDate.before(availableToDate)" and "searchFromDate.before(searchToDate)", which you should do anyway in this case, then both your availableFromDate and your availableToDate have to be in the search date range.
You can use the solution from this answer to make sure both dates are within the range: "How to compare dates in hibernate"
criteria.add(Restrictions.between("availableFromDate", searchFromDate, searchToDate));
criteria.add(Restrictions.between("availableToDate", searchFromDate, searchToDate));
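To make the two possible readings of the question explicit, here is a small plain-Java sketch of the checks the database would be performing. It assumes java.time.LocalDate values and inclusive bounds; the class and method names are made up for illustration. The first method mirrors the two between restrictions above, the second is the other reading, where the entity must be available for the whole searched period.

```java
import java.time.LocalDate;

final class AvailabilityCheck {
    // Reading used in the answer: the availability window lies inside the search range.
    static boolean isWithinSearchRange(LocalDate availableFrom, LocalDate availableTo,
                                       LocalDate searchFrom, LocalDate searchTo) {
        return !availableFrom.isBefore(searchFrom) && !availableFrom.isAfter(searchTo)
            && !availableTo.isBefore(searchFrom) && !availableTo.isAfter(searchTo);
    }

    // Other reading: the entity is available during the entire search range.
    static boolean coversSearchRange(LocalDate availableFrom, LocalDate availableTo,
                                     LocalDate searchFrom, LocalDate searchTo) {
        return !availableFrom.isAfter(searchFrom) && !availableTo.isBefore(searchTo);
    }
}
```

If the second reading is the one you want, the corresponding restrictions would probably be Restrictions.le("availableFromDate", searchFromDate) and Restrictions.ge("availableToDate", searchToDate).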
My reason for doing so is that dates stored as date objects in whatever database tend to be written in a specific format, which may greatly differ from what you need to present to the user on the front-end. I also think it's especially helpful if your application is pulling info from different types of data stores. A good example would be the difference between a MongoDB and SQL date object.
However, I don't know whether this is recommended practice. Should I keep storing dates as longs (time in milliseconds) or as date objects?
I can't speak for it in relation to MongoDB, but in SQL database, no, it's not best practice. That doesn't mean there might not be the occasional use case, but "best practice," no.
Store them as dates, retrieve them as dates. Your best bet is to set up your database to store them as UTC (loosely, "GMT") so that the data is portable and you can use different local times as appropriate (for instance, if the database is used by geographically diverse users), and handle any conversions from UTC to local time in the application layer (e.g., via Calendar or a third-party date library).
Storing dates as numbers means your database is hard to report against, run ad-hoc queries against, etc. I made that mistake once, it's not one I'll repeat without a really good reason. :-)
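As a rough illustration of "store UTC, convert in the application layer", here is a sketch that assumes Java 8+ and java.time instead of Calendar; the class and method names are hypothetical. The stored value is treated as an Instant (a point on the UTC timeline) and only turned into a local date/time when it is shown to a particular user.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;

final class UtcToLocal {
    // Converts a UTC instant read from the database into the user's local date/time.
    static LocalDateTime toLocal(Instant storedUtc, ZoneId userZone) {
        return storedUtc.atZone(userZone).toLocalDateTime();
    }
}
```

The same stored instant can then be rendered differently for users in, say, Europe/Berlin and Asia/Tokyo without touching the data.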
It very much depends on:
What database you're using and its date/time support
Your client needs (e.g. how happy are you to bank on the idea that you'll always be using Java)
What information you're really trying to represent
Your diagnostic tools
The third point is probably the most important. Think about what the values you're trying to store really mean. Even though you're clearly not using Noda Time, hopefully my user guide page on choosing which Noda Time type to use based on your input data may help you think about this clearly.
If you're only ever using Java, and your database doesn't have terribly good support for date/time types, and you're only trying to represent an "instant in time" (rather than, say, an instant in a particular time zone, or a local date/time with an offset, or just a local date/time, or just a date...), and you're comfortable writing diagnostic tools to convert your data into more human readable forms - then storing a long is reasonable. But that's a pretty long list of "if"s.
If you want to be able to perform date manipulation in the database - e.g. asking for all values which occur on the first day of the month - then you should probably use a date/time type, being careful around time zones. (My experience is that most databases are at least shockingly badly documented when it comes to their date/time types.)
In general, you should use whatever type is able to meet all your requirements and is the most natural representation for that particular environment. So in a database which has a date/time type which doesn't give you issues when you interact with it (e.g. performing arbitrary time zone conversions in an unrequested way), use that type. It will make all kinds of things easier.
The advantage of using a more "primitive" representation (e.g. a 64 bit integer) is precisely that the database won't mess around with it. You're effectively hiding the meaning of the data from the database, with all the normal pros and cons (mostly cons) of that approach.
It depends on various aspects. When using the standard "seconds since epoch", and someone uses only integer precision, their dates are limited to the 1970-2038 year range.
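To see where the 2038 limit comes from, here is a small sketch (class name made up) that interprets a signed 32-bit value as seconds since the epoch. The largest such value falls in January 2038, and incrementing it wraps around to the smallest value, which lands in 1901.

```java
import java.time.Instant;

final class EpochSecondsLimit {
    // Interprets a signed 32-bit counter as seconds since 1970-01-01T00:00:00Z.
    static Instant fromInt32Seconds(int secondsSinceEpoch) {
        return Instant.ofEpochSecond(secondsSinceEpoch);
    }

    // What a 32-bit counter holds one second after its maximum (integer overflow).
    static int oneSecondAfterMax() {
        return Integer.MAX_VALUE + 1; // wraps to Integer.MIN_VALUE
    }
}
```

A 64-bit long does not have this problem in practice, which is one reason milliseconds are usually stored as long rather than int.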
But there is also a precision issue. For example, unix time ignores leap seconds: every day is defined to have the same number of seconds. So when computing time deltas between unix timestamps, you do get some error.
But the more important thing is that you assume all your dates to be completely known, as your representation cannot hold half-specified dates. In reality, there are a lot of events you do not know at second (or even ms) precision. So it is a feature if a representation allows specifying e.g. only day precision. Ideally, you would store dates with their precision information.
Furthermore, say you are building a calendar application. There is time, but there is also local time. Quite often, you need both pieces of information available. When checking for scheduling overlaps, you can of course do this best in a synchronized time, so longs will be good here. If, however, you also want to ensure you are not scheduling events outside of 9-20 h local time, you always need to preserve timezone information as well. For anything that spans more than one location, you really need to include the time zone in your date representation. Assuming that you can just convert all dates you see to your current local time is quite naive.
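The 9-20 h example can be sketched like this, assuming java.time and made-up names. The point is that the same instant passes or fails the check depending on the zone, so the zone has to be stored or otherwise known alongside the long.

```java
import java.time.Instant;
import java.time.LocalTime;
import java.time.ZoneId;

final class LocalHoursCheck {
    private static final LocalTime OPEN = LocalTime.of(9, 0);
    private static final LocalTime CLOSE = LocalTime.of(20, 0);

    // True if the event starts at or after 09:00 and before 20:00 local time in the given zone.
    static boolean startsWithinLocalHours(Instant start, ZoneId zone) {
        LocalTime local = start.atZone(zone).toLocalTime();
        return !local.isBefore(OPEN) && local.isBefore(CLOSE);
    }
}
```

For example, 08:30 UTC on a January day is 09:30 in Berlin, so the check passes for a Berlin event but fails if the instant is wrongly interpreted as UTC local time.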
Note that dates in SQL can lead to odd situations. One of my favorites is the following MySQL absurdity:
SELECT * FROM Dates WHERE date IS NULL AND date IS NOT NULL;
may return records that have the date 0000-00-00 00:00:00, although this violates the popular understanding of logic.
Since this question is tagged with MongoDB: MongoDB does not store dates in String or what not, they actually store it as a long ( http://www.mongodb.org/display/DOCS/Dates ):
A BSON Date value stores the number of milliseconds since the Unix epoch (Jan 1, 1970) as a 64-bit integer. v2.0+ : this number is signed so dates before 1970 are stored as a negative numbers.
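On the Java side, java.util.Date uses the same representation (signed milliseconds since the epoch in a long), so converting between the two forms is lossless. A tiny sketch with hypothetical helper names:

```java
import java.util.Date;

final class EpochMillisDates {
    // Wraps a signed millisecond count in a java.util.Date; negative values are dates before 1970.
    static Date toDate(long millisSinceEpoch) {
        return new Date(millisSinceEpoch);
    }

    // Extracts the millisecond count again.
    static long toMillis(Date date) {
        return date.getTime();
    }
}
```

So choosing the date type costs nothing in precision compared to a raw long; the difference is only in what the database knows about the value.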
Since MongoDB has no immediate plans to utilise, within standard querying, the complex date handling functions that SQL has (like getting only the year for querying etc.), there is no real downside; it might in fact reduce the size of your indexes and storage.
There is one thing to take into consideration here, the aggregation framework: http://docs.mongodb.org/manual/reference/aggregation/#date-operators there are weird and wonderful things you can only do with the supported BSON date type in MongoDB; however, whether this matters to you depends upon your queries.
Do you see yourself as needing the aggregation framework's functions? Or would housing the extra object overhead be a pain?
My personal opinion is that the BSON date type is such a small object that to store a document without it would be detrimental to the entire system and its future compatibility for no apparent reason. So, yes, I would use the BSON date type rather than a long and I would consider it good practice to do so.
I don't think it's a best practice to store dates as long, because that would mean that you would not be able to do any of the date-specific queries, like:
where date between
We also won't be able to easily get the day, month or year from the table using SQL queries.
It is better to use a single date format converter in the Java layer, convert the date with it, and use a single format throughout the application.
IMHO, storing dates in the DB works best if you can use Strings. That way you avoid unnecessary data going up and down to the server if you don't need all the fields in Calendar.
There is a lot of data in Calendar, and each instance of Calendar is pretty heavy too.
So store it as a String with only the required data, and convert it back to a Calendar whenever you need it.
I am using the CriteriaBuilder and CriteriaQuery to build my query to the database, but I have encountered an issue that I do not know how to solve, since I am very new to this whole ordeal called JPA.
In Java, I have a property called timestamp for a class called Report, and it is set to the same corresponding @TemporalType.
I also have a class called Affiliate which has a list of Report objects.
In my query, I want to fetch all the Affiliate objects that do not have a Report in the last Affiliate.maxSilenceMinutes.
My questions:
Are there any ways in standardized JPA to modify dates? Like a CriteriaBuilder.subtractMilliseconds(Expression<Timestamp>, Long) of sorts?
If not, is there a way to cast Expression<Timestamp> to Expression<Long> so that I can subtract on a currentTimestamp literal to get the minimum value for a CriteriaBuilder.lessThanOrEqualTo(greatestReportTimestampMs, minimumAllowedMs)?
I know this might feel like a confusing question, but the main part is simply: Is it possible to go Expression<Timestamp> to Expression<Long>? It throws an exception for me if I try to use the .as(Long.class) method, but which should be the default underlying data type in most DBs anyway?
Hope you guys can help, since I feel kind of stuck :)
If you know the value you want to subtract at the time of querying,
you can subtract beforehand:
Calendar c = Calendar.getInstance(); // Calendar is abstract, so it cannot be created with new
c.setTime(timestamp.getTimestamp());
c.add(Calendar.DAY_OF_MONTH, -someNumberOfDays); // or whatever unit you want
Date d = c.getTime();
If not, you probably need to call a database function to do the subtraction, via
CriteriaBuilder.function()
CriteriaBuilder.lessThanOrEqualTo() works on Comparables. Timestamps are comparable, so you could construct a Timestamp via new Timestamp(long ms)
and compare it with the other expression.
I hope this helps.
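Putting the two suggestions together, a minimal sketch of the "subtract beforehand" approach could look like this. It assumes the silence threshold is known in Java at query time (which is not the case if it has to be read per row from Affiliate.maxSilenceMinutes); the class and method names are hypothetical.

```java
import java.sql.Timestamp;
import java.util.concurrent.TimeUnit;

final class SilenceCutoff {
    // Returns the point in time maxSilenceMinutes before "now".
    static Timestamp cutoff(Timestamp now, long maxSilenceMinutes) {
        return new Timestamp(now.getTime() - TimeUnit.MINUTES.toMillis(maxSilenceMinutes));
    }
}

// With JPA on the classpath (not compiled here), the result can be used as a plain value:
// cb.lessThanOrEqualTo(report.<Timestamp>get("timestamp"), SilenceCutoff.cutoff(now, 30));
```

This avoids any cast from Expression<Timestamp> to Expression<Long>, because the arithmetic happens in Java and the database only sees a timestamp comparison.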
This is not built into Hibernate, so you will need a custom function of some kind.
The JDBC standard includes a function escape {fn TIMESTAMPADD( SQL_TSI_SECOND, secs, timestamp)} which should be translated into the correct SQL for the target database, but not all JDBC implementations provide it. There is therefore a chance you can add a custom StandardJDBCEscapeFunction to Hibernate's Dialect to get the result you need.
If you don't have that available, you'll have to find out what the correct database specific implementation is and there is a lot of variability here. For example:
Oracle: (timestamp + secs/86400)
SQLServer: DATEADD(ss,secs,timestamp)
DB2: (timestamp + secs SECONDS)
MySQL: DATE_ADD(timestamp, INTERVAL secs SECOND)
Once you know it, you can use the correct expression as an SQL restriction in your criteria.
The fact that date-time manipulation is not standardised in the Dialect and not fully implemented in many JDBC drivers means that what you are trying to do will be very difficult to write in a database-neutral way.
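One way to keep that variability in a single place is a small helper that picks the expression per database. This is only a sketch with made-up names; the expressions are the ones listed above (note that MySQL's interval unit keyword is SECOND), and the arguments are assumed to be trusted column names or parameter placeholders, not user input.

```java
final class TimestampAddSql {
    enum Db { ORACLE, SQL_SERVER, DB2, MYSQL }

    // Returns a native SQL expression that adds secsExpr seconds to timestampExpr.
    static String addSeconds(Db db, String timestampExpr, String secsExpr) {
        switch (db) {
            case ORACLE:
                return "(" + timestampExpr + " + " + secsExpr + "/86400)";
            case SQL_SERVER:
                return "DATEADD(ss, " + secsExpr + ", " + timestampExpr + ")";
            case DB2:
                return "(" + timestampExpr + " + " + secsExpr + " SECONDS)";
            case MYSQL:
                return "DATE_ADD(" + timestampExpr + ", INTERVAL " + secsExpr + " SECOND)";
            default:
                throw new IllegalArgumentException("unsupported database: " + db);
        }
    }
}
```

The resulting string can then be embedded in a native SQL restriction, so only this helper has to change when the target database does.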
What is the best solution, in terms of performance and "readability/good coding style", for representing a (Java) enumeration (a fixed set of constants) on the DB layer: an integer (or any number datatype in general) or a string representation?
Caveat: Some database systems support "enums" directly, but this would require keeping the database enum definition in sync with the business-layer implementation. Furthermore, this kind of datatype might not be available on all database systems and might differ in syntax, so I am looking for a solution that is easy to manage and available on all database systems. (So my question only addresses the number vs. string representation.)
The number representation of a constant seems very efficient to store (for example, it consumes only two bytes as an integer) and is most likely very fast in terms of indexing, but it is hard to read ("0" vs. "1" etc.).
The string representation is more readable (storing "enabled" and "disabled" compared to "0" and "1"), but it consumes much more storage space and is most likely also slower in regard to indexing.
My question is: did I miss some important aspects? What would you suggest for representing an enum on the database layer?
Thank you very much!
In most cases, I prefer to use a short alphanumeric code, and then have a lookup table with the expanded text. When necessary I build the enum table in the program dynamically from the database table.
For example, suppose we have a field that is supposed to contain, say, transaction type, and the possible values are Sale, Return, Service, and Layaway. I'd create a transaction type table with code and description, make the codes maybe "SA", "RE", "SV", and "LY", and use the code field as the primary key. Then in each transaction record I'd post that code. This takes less space than an integer key in the record itself and in the index. Exactly how it is processed depends on the database engine but it shouldn't be dramatically less efficient than an integer key. And because it's mnemonic it's very easy to use. You can dump a record and easily see what the values are and likely remember which is which. You can display the codes without translation in user output and the users can make sense of them. Indeed, this can give you a performance gain over integer keys: In many cases the abbreviation is good for the users -- they often want abbreviations to keep displays compact and avoid scrolling -- so you don't need to join on the transaction table to get a translation.
I would definitely NOT store a long text value in every record. Like in this example, I would not want to dispense with the transaction table and store "Layaway". Not only is this inefficient, but it is quite possible that someday the users will say that they want it changed to "Layaway sale", or even some subtle difference like "Lay-away". Then you not only have to update every record in the database, but you have to search through the program for every place this text occurs and change it. Also, the longer the text, the more likely that somewhere along the line a programmer will mis-spell it and create obscure bugs.
Also, having a transaction type table provides a convenient place to store additional information about the transaction type. Never ever ever write code that says "if whatevercode='A' or whatevercode='C' or whatevercode='X' then ..." Whatever it is that makes those three codes somehow different from all other codes, put a field for it in the transaction table and test that field. If you say, "Well, those are all the tax-related codes" or whatever, then fine, create a field called "tax_related" and set it to true or false for each code value as appropriate. Otherwise when someone creates a new transaction type, they have to look through all those if/or lists and figure out which ones this type should be added to and which it shouldn't. I've read plenty of baffling programs where I had to figure out why some logic applied to these three code values but not others, and when you think a fourth value ought to be included in the list, it's very hard to tell whether it is missing because it is really different in some way, or if the programmer made a mistake.
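Here is a minimal sketch of what the in-program side of such a lookup table might look like. The four codes come from the example above; the taxRelated values are invented purely for illustration, and the static block stands in for the rows a real program would read from the transaction type table.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class TransactionType {
    private static final Map<String, TransactionType> BY_CODE = new LinkedHashMap<>();

    final String code;         // short mnemonic key, e.g. "SA"
    final String description;  // expanded text for display
    final boolean taxRelated;  // attribute stored per code instead of if/or lists

    private TransactionType(String code, String description, boolean taxRelated) {
        this.code = code;
        this.description = description;
        this.taxRelated = taxRelated;
    }

    // In a real program: one call per row loaded from the lookup table.
    static void register(String code, String description, boolean taxRelated) {
        BY_CODE.put(code, new TransactionType(code, description, taxRelated));
    }

    static TransactionType byCode(String code) {
        TransactionType type = BY_CODE.get(code);
        if (type == null) {
            throw new IllegalArgumentException("unknown transaction type code: " + code);
        }
        return type;
    }

    static {
        // Sample rows; the tax_related flags are hypothetical.
        register("SA", "Sale", true);
        register("RE", "Return", true);
        register("SV", "Service", false);
        register("LY", "Layaway", false);
    }
}
```

Calling code then tests TransactionType.byCode(code).taxRelated rather than comparing against a hard-coded list of codes, so adding a new type only means adding a row.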
The only time I don't create the translation table is when the list is very short, there is no additional data to keep, and it is clear from the nature of the universe that it is unlikely to ever change so the values can be safely hard-coded. Like true/false or positive/negative/zero or male/female. (And hey, even that last one, obvious as it seems, there are people insisting we now include "transgendered" and the like.)
Some people dogmatically insist that every table have an auto-generated sequential integer key. Such keys are an excellent choice in many cases, but for code lists, I prefer the short alpha key for the reasons stated above.
I would store the string representation, as this is easy to correlate back to the enum and much more stable. Using ordinal() would be bad because it can change if you add a new enum to the middle of the series, so you would have to implement your own numbering system.
In terms of performance, it all depends on what the enums would be used for, but it is most likely a premature optimization to develop a whole separate representation with conversion rather than just use the natural String representation.
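A short sketch of the string approach, with a hypothetical Status enum: name() gives the value to store and valueOf() turns the stored value back into the constant. Unlike ordinal(), the stored value does not change when a new constant is inserted in the middle of the enum.

```java
final class EnumStorage {
    enum Status { ENABLED, DISABLED }

    // Value to write to the database column.
    static String toColumn(Status status) {
        return status.name();
    }

    // Constant for a value read from the database column; throws IllegalArgumentException if unknown.
    static Status fromColumn(String value) {
        return Status.valueOf(value);
    }
}
```

If you map entities with JPA, @Enumerated(EnumType.STRING) does this conversion for you. The one thing to watch with either approach is that renaming a constant requires migrating the stored values.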
I have a database of companies. My application receives data that references a company by name, but the name may not exactly match the value in the database. I need to match the incoming data to the company it refers to.
For instance, my database might contain a company with name "A. B. Widgets & Co Ltd." while my incoming data might reference "AB Widgets Limited", "A.B. Widgets and Co", or "A B Widgets".
Some words in the company name (A B Widgets) are more important for matching than others (Co, Ltd, Inc, etc). It's important to avoid false matches.
The number of companies is small enough that I can maintain a map of their names in memory, ie. I have the option of using Java rather than SQL to find the right name.
How would you do this in Java?
You could standardize the formats as much as possible in your DB/map & input (i.e. convert to upper/lowercase), then use the Levenshtein (edit) distance metric from dynamic programming to score the input against all your known names.
You could then have the user confirm the match & if they don't like it, give them the option to enter that value into your list of known names (on second thought--that might be too much power to give a user...)
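A rough sketch of that idea in plain Java. The normalization rules (lower-casing, dropping punctuation and spaces, ignoring a small set of filler words) are my own assumptions and would need tuning for real data; the distance function is the standard dynamic-programming Levenshtein.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

final class CompanyNameMatcher {
    // Words that carry little information for matching (an assumed, incomplete list).
    private static final Set<String> NOISE =
            new HashSet<>(Arrays.asList("co", "company", "ltd", "limited", "inc", "llc", "and"));

    // Lower-cases, splits on anything that is not a letter or digit, drops noise words, joins the rest.
    static String normalize(String name) {
        StringBuilder sb = new StringBuilder();
        for (String token : name.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty() && !NOISE.contains(token)) {
                sb.append(token);
            }
        }
        return sb.toString();
    }

    // Levenshtein edit distance using two rolling rows.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Distance between two company names after normalization; 0 means "same after cleanup".
    static int distance(String x, String y) {
        return levenshtein(normalize(x), normalize(y));
    }
}
```

To match an incoming name, compute the distance against every known name and accept the smallest one only if it is below a threshold you are comfortable with, which helps avoid false matches.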
Although this thread is a bit old, I recently did an investigation on the efficiency of string distance metrics for name matching and came across this library:
https://code.google.com/p/java-similarities/
If you don't want to spend ages on implementing string distance algorithms, I recommend giving it a try as a first step. There are ~20 different algorithms already implemented (incl. Levenshtein, Jaro-Winkler, Monge-Elkan etc.), and its code is structured well enough that you don't have to understand the whole logic in depth; you can start using it in minutes.
(BTW, I'm not the author of the library, so kudos for its creators.)
You can use an LCS algorithm to score them.
I do this in my photo album to make it easy to email in photos and get them to fall into security categories properly.
LCS code
Example usage (guessing a category based on what people entered)
I'd do LCS ignoring spaces, punctuation, case, and variations on "co", "llc", "ltd", and so forth.
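The code behind the two links above is not reproduced here, so the following is not that code, just a generic sketch of scoring with the longest common subsequence. It assumes the inputs have already been cleaned up as described (spaces, punctuation, case and company suffixes removed); the names are made up.

```java
final class LcsScore {
    // Length of the longest common subsequence, classic dynamic programming table.
    static int lcsLength(String a, String b) {
        int[][] dp = new int[a.length() + 1][b.length() + 1];
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                if (a.charAt(i - 1) == b.charAt(j - 1)) {
                    dp[i][j] = dp[i - 1][j - 1] + 1;
                } else {
                    dp[i][j] = Math.max(dp[i - 1][j], dp[i][j - 1]);
                }
            }
        }
        return dp[a.length()][b.length()];
    }

    // Score between 0 and 1: LCS length relative to the longer input.
    static double score(String a, String b) {
        int longest = Math.max(a.length(), b.length());
        return longest == 0 ? 1.0 : (double) lcsLength(a, b) / longest;
    }
}
```

The candidate with the highest score wins; a minimum score can be required before accepting a match at all.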
Have a look at Lucene. It's an open source full text search Java library with 'near match' capabilities.
Your database may support the use of Regular Expressions (regex) - see below for some tutorials in Java - here's the link to the MySQL documentation (as an example):
http://dev.mysql.com/doc/refman/5.0/en/regexp.html#operator_regexp
You would probably want to store in the database a fairly complex regular expression for each company that encompasses the variations in spelling that you might anticipate - or the sub-elements of the company name that you would like to weight as being significant.
You can also use the regex library in Java
JDK 1.4.2
http://java.sun.com/j2se/1.4.2/docs/api/java/util/regex/Pattern.html
JDK 1.5.0
http://java.sun.com/j2se/1.5.0/docs/api/java/util/regex/Matcher.html
Using Regular Expressions in Java
http://www.regular-expressions.info/java.html
The Java Regex API Explained
http://www.sitepoint.com/article/java-regex-api-explained/
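As an illustration of the per-company pattern idea with java.util.regex, here is a sketch for the "A. B. Widgets" example from the question. The pattern itself is an assumption about which spelling variations to accept and would have to be written by hand for each company.

```java
import java.util.regex.Pattern;

final class CompanyPattern {
    // Accepts "A. B. Widgets", "AB Widgets", "A.B. Widgets", "A B Widgets", followed by anything.
    static final Pattern AB_WIDGETS =
            Pattern.compile("^a\\.?\\s*b\\.?\\s+widgets\\b.*$", Pattern.CASE_INSENSITIVE);

    static boolean matches(String incomingName) {
        return AB_WIDGETS.matcher(incomingName.trim()).matches();
    }
}
```

The significant words ("A", "B", "Widgets") are spelled out in the pattern, while the trailing part (Co, Ltd, Limited and so on) is deliberately left open.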
You might also want to see if your database supports Soundex capabilities (for example, see the following link to MySQL)
http://dev.mysql.com/doc/refman/5.0/en/string-functions.html#function_soundex
To be more precise: rather than the Longest Common Subsequence, the Longest Common Substring should work better, as the order of characters is important.
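For comparison, a sketch of the longest common substring variant (names made up). Unlike a subsequence, the matching characters must be contiguous in both inputs.

```java
final class LongestCommonSubstring {
    // Length of the longest run of characters that appears contiguously in both strings.
    static int length(String a, String b) {
        int best = 0;
        int[] prev = new int[b.length() + 1];
        for (int i = 1; i <= a.length(); i++) {
            int[] curr = new int[b.length() + 1];
            for (int j = 1; j <= b.length(); j++) {
                if (a.charAt(i - 1) == b.charAt(j - 1)) {
                    // Extend the common run that ended at the previous characters.
                    curr[j] = prev[j - 1] + 1;
                    best = Math.max(best, curr[j]);
                }
            }
            prev = curr;
        }
        return best;
    }
}
```

The difference shows up on inputs like "a-b-c" and "abc": the longest common subsequence has length 3, but the longest common substring only has length 1, so this variant is stricter about stray characters in between.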
You could use Lucene to index your database, then query the Lucene index. There are a number of search engines built on top of Lucene, including Solr.