At the company I work for, we are having a major discussion about whether it is better to use wrapper classes for primitives (java.lang.Integer, java.lang.Long) or to use the primitive types directly in the POJOs that map entities to tables in Hibernate.
The idea is that we want these values to not be null in the database.
The arguments in favor of using primitives:
Handling these values as int means that they can never be null, which makes it impossible to inadvertently get a null reference on the field.
An int takes 32 bits (4 bytes) of memory, while an Integer object typically takes around 16 bytes, and is also slower.
The arguments in favor of using wrapper objects:
We can add a constraint at the database level that always prevents null values from getting there.
With primitives we can end up with misleading data: the database gets 0s instead of nulls whenever the user doesn't set a value, and buggy data like that is hard to catch.
Objects have more expressive power than primitives. We have null values as well as integer values, so we can validate them more easily, for example with annotations (javax.validation.constraints.NotNull).
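A minimal sketch of the "0 instead of null" argument, assuming a hypothetical OrderLine entity shown without its JPA/Hibernate annotations so it runs on its own. A freshly constructed object whose quantity was never set reads as 0 through the primitive field but as null through the wrapper field:

```java
// Hypothetical entity; a real mapping would add @Entity, @Column, etc.
class OrderLine {
    // A primitive field defaults to 0, so "never set" and "set to zero"
    // are indistinguishable once the row is written.
    private int quantityPrimitive;

    // A wrapper field defaults to null, so "never set" stays visible and
    // can be rejected by @NotNull or a NOT NULL column constraint.
    private Integer quantityWrapper;

    int getQuantityPrimitive() { return quantityPrimitive; }
    Integer getQuantityWrapper() { return quantityWrapper; }

    public static void main(String[] args) {
        OrderLine line = new OrderLine(); // the user never set a quantity
        System.out.println(line.getQuantityPrimitive()); // prints 0
        System.out.println(line.getQuantityWrapper());   // prints null
    }
}
```

With the wrapper, the missing value fails loudly at validation or insert time instead of silently becoming 0 in the table.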
Use wrappers, make your life simple.
Your data model should dictate this. You should be enforcing nullability in the database anyway.
If they are nullable in the database, then use wrappers. If they are not nullable and you use wrappers, then you'll get an exception if you try to insert a null into the database.
If your data model doesn't dictate it, then go for a convention, use wrappers all of the time. That way people don't have to think, or decide that a value of 0 means null.
I would also question your assertion that it would be less performant. Have you measured this? I mean really measured it? When you're talking to a database, there are a lot more considerations than the difference between 32 bits and 16 bytes.
Just use the simple, consistent solution. Use wrappers everywhere, unless somebody gives you a very good reason (with accurate measured statistics) to do otherwise.
Thought it should be mentioned:
The Hibernate recommendation (section 4.1.2) to use non-primitive properties in persistent classes actually refers, as its title says, to identifier properties:
4.1.2. Provide an identifier property
Cat has a property called id. This property maps to the primary key column(s) of a database table. The property might have been called anything, and its type might have been any primitive type, any primitive "wrapper" type, java.lang.String or java.util.Date.
...
We recommend that you declare consistently-named identifier properties on persistent classes and that you use a nullable (i.e., non-primitive) type.
Nonetheless, the advantages of primitives aren't strong:
Having an inconsistent non-null value in a property is worse than a NullPointerException, because the lurking bug is harder to track: more time passes between writing the code and detecting the problem, and it may show up in a totally different code context from its source.
Regarding performance: until the code has been measured, it is generally a premature consideration. Safety should come first.
The Hibernate documentation (just the first version I happened to find) states:
The property might have been called anything, and its type might have been any primitive type, any primitive "wrapper" type, java.lang.String or java.util.Date.
...
We recommend that you declare consistently-named identifier properties on persistent classes and that you use a nullable (i.e., non-primitive) type.
So the "expert's voice" suggests using Integer / Long... but it's not described why this is the case.
I wonder whether it's so that an object which hasn't been persisted yet can be created without an identifier (i.e. with a property value of null), distinguishing it from persisted entities.
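A minimal sketch of that guess, using a hypothetical Cat class without annotations (in a real mapping the id field would carry @Id and @GeneratedValue, and Hibernate would assign it on save). A null Long id marks an instance that has never been persisted; with a primitive long, a fresh object would report id 0, which could be mistaken for a real key:

```java
// Hypothetical entity, illustrative only.
class Cat {
    private Long id;          // null until the entity has been saved
    private final String name;

    Cat(String name) { this.name = name; }

    Long getId() { return id; }
    String getName() { return name; }

    // Hypothetical stand-in for what the ORM does after the INSERT.
    void assignGeneratedId(long generated) { this.id = generated; }

    // A null identifier distinguishes transient objects from persisted ones.
    boolean isNew() { return id == null; }

    public static void main(String[] args) {
        Cat cat = new Cat("Tom");
        System.out.println(cat.isNew()); // prints true
        cat.assignGeneratedId(42L);
        System.out.println(cat.isNew()); // prints false
    }
}
```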
I've been dealing with this, and now I want to take control of it. Because of the data size, I have to control the list that Hibernate populates.
@OneToMany(mappedBy="members")
private List<Members> membersList;
The membersList can grow to 100 entries, and Android's SQLite database cannot handle it. I mean that a list this large ends up stored in the internal database.
Is there any way to control the list size before saving to the Android internal database?
Thanks,
Pusp
You need to change the type of your collection:
@OneToMany(mappedBy="members")
private Set<Members> membersList;
UPDATE
The documentation says:
Naturally Hibernate also allows to persist collections. These persistent collections can contain almost any other Hibernate type, including: basic types, custom types, components and references to other entities. The distinction between value and reference semantics is in this context very important. An object in a collection might be handled with "value" semantics (its life cycle fully depends on the collection owner), or it might be a reference to another entity with its own life cycle. In the latter case, only the "link" between the two objects is considered to be a state held by the collection.
As a requirement persistent collection-valued fields must be declared as an interface type (see Example 7.2, “Collection mapping using @OneToMany and @JoinColumn”). The actual interface might be java.util.Set, java.util.Collection, java.util.List, java.util.Map, java.util.SortedSet, java.util.SortedMap or anything you like ("anything you like" means you will have to write an implementation of org.hibernate.usertype.UserCollectionType).
Link: https://docs.jboss.org/hibernate/orm/3.6/reference/en-US/html/collections.html#collections-mapping
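The interface-type requirement quoted above can be illustrated with a plain Java sketch. Team and Member are hypothetical classes and the JPA annotations are omitted so the example runs on its own; the point is only that the field is typed as Set and merely initialized with a HashSet, which leaves Hibernate free to substitute its own Set implementation (such as its PersistentSet) when it loads the entity:

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical owning entity; in a real mapping this field would carry
// @OneToMany(mappedBy = "...").
class Team {
    // Declared as the interface, not HashSet, so the ORM can replace it.
    private Set<Member> members = new HashSet<>();

    Set<Member> getMembers() { return members; }

    void addMember(Member member) { members.add(member); }

    // Hypothetical child type; identity-based equals/hashCode for brevity.
    static class Member {
        private final String name;
        Member(String name) { this.name = name; }
        String getName() { return name; }
    }
}
```

Code that only ever talks to the Set interface keeps working whichever implementation sits behind the field.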
My task is to build a disk cache on Android for my application (it is a kind of messenger). I'd like to store messages in a database, but I have run into the problem of storing different types of messages (currently there are 5 message types; each type has its own fields, and they all extend a base class).
GreenDao documentation says:
Note: currently it’s impossible to have another entity as a super class (there are no polymorphic queries either)
I am planning to have an entity that maps almost 1 to 1 to the base class, except for one column: raw binary or JSON data in which every child class can write anything it needs.
My questions are:
Is GreenDao a good solution in this case? Are there any solutions that let me not worry about inheritance, and how much do they cost in terms of efficiency?
How do I "serialize" data into such a field (which method should I override, or where should I put the code that does all the necessary work)?
How do I give GreenDao the correct constructor to "deserialize" JSON or binary data into the correct class instance?
Should I use reflection, or just a switch/case to find the correct constructor (only 5 constructors are possible)? How much will reflection "cost" in this case?
If you really need inheritance, greendao is not the right choice, since it doesn't support it. But I think you can do without inheritance:
You can design an entity with a discriminator column (messagetype) and a binary or text column (data). Then you can use an abstract factory to create the desired objects from data depending on the messagetype.
If the conversion is complex, I'd put it in a separate class, otherwise I'd put it as a method in the keep section.
Be aware that this design may slow you down, if you really have a lot of messages, since separate tables would reduce index sizes.
Talking about indexes: if you want to access a message through some property of your data column later on, you are screwed since you can't put an index on it.
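A minimal sketch of the discriminator-plus-data design, assuming hypothetical message classes and a plain string payload (a real app would likely serialize to JSON or bytes). It is not greendao-generated code; MessageRow only mirrors the two columns. The factory uses a switch over the known types, so no reflection is involved:

```java
// Hypothetical domain types; the payload is a plain string to keep the
// sketch self-contained.
abstract class Message {
    abstract String type();     // value for the messagetype column
    abstract String payload();  // value for the data column
}

final class TextMessage extends Message {
    final String text;
    TextMessage(String text) { this.text = text; }
    @Override String type() { return "text"; }
    @Override String payload() { return text; }
}

final class ImageMessage extends Message {
    final String url;
    ImageMessage(String url) { this.url = url; }
    @Override String type() { return "image"; }
    @Override String payload() { return url; }
}

// Flat row matching a single table: discriminator + opaque data.
final class MessageRow {
    final String messageType;
    final String data;
    MessageRow(String messageType, String data) {
        this.messageType = messageType;
        this.data = data;
    }
}

final class MessageFactory {
    private MessageFactory() {}

    static MessageRow toRow(Message message) {
        return new MessageRow(message.type(), message.payload());
    }

    // One case per known type; an unknown discriminator fails fast.
    static Message fromRow(MessageRow row) {
        switch (row.messageType) {
            case "text":
                return new TextMessage(row.data);
            case "image":
                return new ImageMessage(row.data);
            default:
                throw new IllegalArgumentException("Unknown message type: " + row.messageType);
        }
    }
}
```

With only a handful of types, a switch like this is typically cheaper and easier to follow than reflective constructor lookup.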
For possibly no other good reason at this point in time than 'we've always done it like this', how are new systems being architected to use reference data that represents state codes?
For example, a Case may have 2 valid states, 'Open' or 'Closed'. Historically I've seen many systems where these valid values would be stored in a database table containing this reference data, and referred to as a code type ('CaseStatus'), and each valid value has a 'code' value (eg 'OPN') and a decode or display value that is used when the value is needed to be displayed to a user (in this case 'Open').
If developing a Java based system today, from a code point of view with type safety, we would define an Enum like this:
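public enum CaseStatus {
    Open("OPN"),
    Closed("CLS");

    // Code stored in the database / reference table
    private final String codeValue;

    private CaseStatus(String codeValue) {
        this.codeValue = codeValue;
    }

    public String getCodeValue() {
        return codeValue;
    }
}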
This is great solely from the view of the source code: the Enum enforces type safety with a restricted list of valid values, but by itself there is no representation of this code type or its valid values in the database. If there are users of the data who run ad hoc reports directly against the database, they need a way to look up decoded values for 'OPN', 'CLS'. Historically this would have been done using a reference table containing the code type, the codes and their decode values.
It seems odd that we continue to use these state code values as '3 letter codes', where the motivation at this point is no longer because we need to save space in the database ('OPN' vs 'Open' is hardly a great optimization anyway).
What other approaches have people used or seen on recent systems they have worked on? Do you maintain the reference data only in the database, only in code, or in both places, and if you maintain it in both, what approaches do you use to keep the two in sync?
First, if there are only two possible values, and it is not possible to expect them to develop into a larger number (as in your example of open/closed), I would probably define a status_open column as BOOLEAN or SMALLINT (0/1) or CHAR (Y/N).
When the set of statuses is bigger (or may grow to more than two values), I would use a surrogate key. While saving a few bytes is hardly an optimization, indexing and joining CHAR columns is more expensive than indexing and joining INTEGER columns. I don't have measurements for INTEGER vs CHAR(3), but I would expect the difference in this case to be much smaller than for INTEGER vs CHAR(50).
However, a disadvantage I find with short CHAR abbreviations is that it is sometimes difficult to find meaningful values. Suppose that you have a status of "broken - replacement has been ordered": does it help if I call it "BRO"? Is it better than calling it 3?
On the other hand, even when the model does not require it, I find it convenient to add a short VARCHAR column to the status table describing what each mnemonic or surrogate key means. (Once the model grows, it becomes quite difficult to remember all of them!)
My implementation (with due exceptions in particular cases) would likely be:
On the Java side, the enum, as you defined it. (Even for boolean-like values, sometimes it helps having different enums for each value, particularly if there are methods taking several of those values as parameter. Methods with a long list of parameters of the same type are a recipe for disaster).
On the SQL side:
CREATE TABLE status (
id INTEGER PRIMARY KEY,
description VARCHAR(40)
);
CREATE TABLE entity (
...
status_id INTEGER REFERENCES status(id)
);
INSERT INTO status VALUES (0,'Closed');
INSERT INTO status VALUES (1,'Open');
INSERT INTO status VALUES (2,'Broken - replacement has been ordered');
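A minimal sketch of how the Java enum could line up with that status table, assuming the ids 0, 1 and 2 from the INSERT statements above; the name EntityStatus is hypothetical. Each constant carries its surrogate key, and fromId maps a status_id column value back to the enum:

```java
// Hypothetical enum mirroring the rows of the status table.
enum EntityStatus {
    CLOSED(0, "Closed"),
    OPEN(1, "Open"),
    BROKEN_REPLACEMENT_ORDERED(2, "Broken - replacement has been ordered");

    private final int id;              // matches status.id
    private final String description;  // matches status.description

    EntityStatus(int id, String description) {
        this.id = id;
        this.description = description;
    }

    int getId() { return id; }
    String getDescription() { return description; }

    // Linear scan is fine for a handful of values; unknown ids fail fast.
    static EntityStatus fromId(int id) {
        for (EntityStatus status : values()) {
            if (status.id == id) {
                return status;
            }
        }
        throw new IllegalArgumentException("Unknown status id: " + id);
    }
}
```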
One solution I've encountered is to use a materialized view in the database to dynamically recalculate the denormalized relation. In a document based database you would probably store the CaseStatus as a String. Finally, you might use an ORM tool to store CaseStatus as an Object but in the cases I'm familiar with the reference data is stored in the database (if you store it in code then it requires a build and deployment to production, along with additional testing for the release).
Would like opinions on the best way to go.
As you can see, an int cannot be null, whereas an Integer object can.
Problem: a numeric database column can be null or can contain a number.
When the database returns null, we receive an exception stating that
"primitive values cannot be null"
But when we use Integer class, we are creating that object (which of course is bigger/bulkier than a primitive type int)
That leaves me with a few choices:
Use Integer type.
Set Database column to "default"
Set int to default if there is something different in the database, then accept that
Any other suggestions?
I don't think you need to worry about Integer <-> int conversion performance (on the order of 100,000,000 operations per second) when you query a database (around 5,000 operations per second). Use boxed types without hesitation.
Use Hibernate (or similar ORM) and let the framework deal with the database directly. Then you can program it how you like, and not have to deal with converting.
Reinventing the wheel seldom works as well as just using someone else's wheel in the first place, especially when thousands of others already use the same wheel.
My choices:
First: if it is possible, and logically OK, set a default in the database (it might be a FK to a reference table that holds all values plus a default).
Second: if the first is not possible, I would use Integer objects without any reservation. I don't think you will have performance issues. The code will be clean, but document on the variable that it can contain null, and check for null when working with it.
I would always go for something which is more readable and understandable to keep the software maintainable and flexible.
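A minimal JDBC sketch of reading such a column into an Integer, assuming plain JDBC rather than Hibernate; NullableColumns and its method names are hypothetical. ResultSet.getInt returns 0 for SQL NULL, so wasNull() is what tells a stored 0 apart from a missing value. The orDefault helper corresponds to the "use a default" option from the question:

```java
import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical helper class, illustrative only.
final class NullableColumns {
    private NullableColumns() {}

    // getInt yields 0 for NULL; wasNull() must be checked right after it.
    static Integer getNullableInt(ResultSet rs, String column) throws SQLException {
        int value = rs.getInt(column);
        return rs.wasNull() ? null : value;
    }

    // Falls back to a default only where the caller decides that is valid.
    static int orDefault(Integer value, int defaultValue) {
        return value != null ? value : defaultValue;
    }
}
```

Keeping the Integer in the domain object and applying a default only at the point of use avoids baking "0 means null" into the data.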
Most projects have some sort of data that are essentially static between releases and well-suited for use as an enum, like statuses, transaction types, error codes, etc. For example's sake, I'll just use a common status enum:
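public enum Status {
    ACTIVE(10, "Active"),
    EXPIRED(11, "Expired");
    /* other statuses go before the semicolon above... */

    private final int id;
    private final String label;

    Status(int id, String label) { this.id = id; this.label = label; }
    public int getId() { return id; }
    public String getLabel() { return label; }
}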
I'd like to know what others do in terms of persistence regarding data like these. I see a few options, each of which has some obvious advantages and disadvantages:
Persist the possible statuses in a status table and keep all of the possible status domain objects cached for use throughout the application
Only use an enum and don't persist the list of available statuses, creating a data consistency holy war between me and my DBA
Persist the statuses and maintain an enum in the code, but don't tie them together, creating duplicated data
My preference is the second option, although my DBA claims that our end users might want to access the raw data to generate reports, and not persisting the statuses would lead to an incomplete data model (counter-argument: this could be solved with documentation).
Is there a convention that most people use here? What are peoples' experiences with each and are there other alternatives?
Edit:
After thinking about it for a while, my real persistence struggle comes with handling the id values that are tied to the statuses in the database. These values would be inserted as default data when installing the application. At this point they'd have ids that are usable as foreign keys in other tables. I feel like my code needs to know about these ids so that I can easily retrieve the status objects and assign them to other objects. What do I do about this? I could add another field, like "code", to look stuff up by, or just look up statuses by name, which is icky.
We store enum values using some explicit string or character value in the database. Then to go from database value back to enum we write a static method on the enum class to iterate and find the right one.
If you expect a lot of enum values, you could create a static mapping HashMap<String,MyEnum> to translate quickly.
Don't store the actual enum name (i.e. "ACTIVE" in your example) because that's easily refactored by developers.
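A minimal sketch of the static-map lookup described in this answer, assuming hypothetical one-letter codes "A" and "E". The stored code is a separate field, so renaming a constant does not change what is in the database:

```java
import java.util.HashMap;
import java.util.Map;

enum Status {
    ACTIVE("A", "Active"),
    EXPIRED("E", "Expired");

    // Explicit database value, decoupled from the constant name.
    private final String code;
    private final String label;

    // Filled once, after all constants exist, for O(1) lookups.
    private static final Map<String, Status> BY_CODE = new HashMap<>();

    static {
        for (Status status : values()) {
            BY_CODE.put(status.code, status);
        }
    }

    Status(String code, String label) {
        this.code = code;
        this.label = label;
    }

    String getCode() { return code; }
    String getLabel() { return label; }

    // Database value -> enum; unknown codes fail fast.
    static Status fromCode(String code) {
        Status status = BY_CODE.get(code);
        if (status == null) {
            throw new IllegalArgumentException("Unknown status code: " + code);
        }
        return status;
    }
}
```

For a couple of values, a loop over values() works just as well; the map only pays off as the list grows.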
I'm using a blend of the three approaches you have documented...
Use the database as the authoritative source for the Enum values. Store the values in a 'code' table of some sort. Each time you build, generate a class file for the Enum to be included in your project.
This way, if the enum changes value in the database, your code will be properly invalidated and you will receive appropriate compile errors from your Continuous Integration server. You have a strongly typed binding to your enumerated values in the database, and you don't have to worry about manually syncing the values between code and the data.
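A minimal sketch of the build-time generation step, assuming the rows of the code table have already been read (for example via JDBC) into hypothetical CodeRow objects; EnumGenerator is likewise a hypothetical name. It only produces the enum source text; writing it into the source tree is left to the build:

```java
import java.util.List;
import java.util.Locale;

final class EnumGenerator {
    // One row of the hypothetical code table: stored code plus display text.
    static final class CodeRow {
        final String code;
        final String description;
        CodeRow(String code, String description) {
            this.code = code;
            this.description = description;
        }
    }

    private EnumGenerator() {}

    // Assumes codes contain no quotes/backslashes and descriptions start with a letter.
    static String generate(String packageName, String enumName, List<CodeRow> rows) {
        StringBuilder sb = new StringBuilder();
        sb.append("package ").append(packageName).append(";\n\n");
        sb.append("public enum ").append(enumName).append(" {\n");
        for (int i = 0; i < rows.size(); i++) {
            CodeRow row = rows.get(i);
            sb.append("    ").append(constantName(row.description))
              .append("(\"").append(row.code).append("\")")
              .append(i < rows.size() - 1 ? ",\n" : ";\n");
        }
        if (rows.isEmpty()) {
            sb.append("    ;\n"); // an enum body with no constants still needs the semicolon
        }
        sb.append("\n    private final String code;\n\n");
        sb.append("    ").append(enumName).append("(String code) { this.code = code; }\n\n");
        sb.append("    public String getCode() { return code; }\n");
        sb.append("}\n");
        return sb.toString();
    }

    // "New Student" -> "NEW_STUDENT"
    static String constantName(String description) {
        return description.trim().toUpperCase(Locale.ROOT).replaceAll("[^A-Z0-9]+", "_");
    }
}
```

If a row disappears from the table, the regenerated enum loses that constant and any code still using it stops compiling, which is the safety net this answer describes.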
Joshua Bloch gives an excellent explanation of enums and how to use them in his book "Effective Java, Second Edition" (p.147)
There you can find all sorts of tricks how to define your enums, persist them and how to quickly map them between the database and your code (p.154).
During a talk at Jazoon 2007, Bloch gave the following reasons to use an extra attribute to map enums to DB fields and back: an enum is a constant, but code isn't. To make sure that a developer editing the source can't accidentally break the DB mapping by reordering the enums or renaming them, you should add a specific attribute (like "dbName") to the enum and use that to map it.
Enums have an intrinsic id (their ordinal, which is what switch statements rely on), but this id changes when you change the order of the elements (for example by sorting them or by adding elements in the middle).
So the best solution is to add toDB() and fromDB() methods and an additional field. I suggest using short, readable strings for this new field, so you can decode a database dump without having to look up the enums.
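A small sketch of the ordinal problem and the dbName fix, using two hypothetical versions of the same enum. In the second version a developer has inserted PENDING in the middle: the ordinal of CLOSED changes, but its dbName does not, so rows written by the old code still decode correctly through fromDB():

```java
// Version 1 of a hypothetical enum.
enum OrderStateV1 {
    OPEN("OPN"), CLOSED("CLS");

    final String dbName;
    OrderStateV1(String dbName) { this.dbName = dbName; }
}

// Version 2: PENDING inserted in the middle shifts CLOSED's ordinal.
enum OrderStateV2 {
    OPEN("OPN"), PENDING("PND"), CLOSED("CLS");

    final String dbName;
    OrderStateV2(String dbName) { this.dbName = dbName; }

    // Value written to the database column.
    String toDB() { return dbName; }

    // Column value -> enum; independent of declaration order.
    static OrderStateV2 fromDB(String value) {
        for (OrderStateV2 state : values()) {
            if (state.dbName.equals(value)) {
                return state;
            }
        }
        throw new IllegalArgumentException("Unknown state in database: " + value);
    }
}
```

Here OrderStateV1.CLOSED.ordinal() is 1 while OrderStateV2.CLOSED.ordinal() is 2, so a stored ordinal would now decode to PENDING, whereas the stored "CLS" still maps to CLOSED.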
While I am not familiar with the idea of "attributes" in Java (and I don't know what language you're using), I've generally used the idea of a code table (or domain specific tables) and I've attributed my enum values with more specific data, such as human readable strings (for instance, if my enum value is NewStudent, I would attribute it with "New Student" as a display value). I then use Reflection to examine the data in the database and insert or update records in order to bring them in line with my code, using the actual enum value as the key ID.
What I have done on several occasions is to define the enum in the code and a storage representation in the persistence layer (DB, file, etc.), and then have conversion methods to map them to each other. These conversion methods need only be used when reading from or writing to the persistent store, and the application can use the type-safe enums everywhere. In the conversion methods I used switch statements to do the mapping. This also makes it possible to throw an exception if a new or unknown state is to be converted (usually because either the app or the data is newer than the other and new or additional states have been declared).
If there's even a small chance that the list of values will need to be updated, then it's option 1. Otherwise, it's option 3.
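A minimal sketch of switch-based conversion methods, assuming a hypothetical Phase enum stored as a single character. Only PhaseConverter knows the storage representation, and both directions throw on values they do not recognize:

```java
// Hypothetical in-application enum.
enum Phase { DRAFT, PUBLISHED, ARCHIVED }

final class PhaseConverter {
    private PhaseConverter() {}

    // Enum -> stored value. The default branch catches a constant that was
    // added to the enum but not yet mapped here.
    static char toStorage(Phase phase) {
        switch (phase) {
            case DRAFT: return 'D';
            case PUBLISHED: return 'P';
            case ARCHIVED: return 'A';
            default: throw new IllegalArgumentException("Unmapped phase: " + phase);
        }
    }

    // Stored value -> enum. Unknown codes (e.g. written by a newer app
    // version) fail loudly instead of being silently misread.
    static Phase fromStorage(char code) {
        switch (code) {
            case 'D': return Phase.DRAFT;
            case 'P': return Phase.PUBLISHED;
            case 'A': return Phase.ARCHIVED;
            default: throw new IllegalArgumentException("Unknown stored phase: " + code);
        }
    }
}
```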
Well we don't have a DBA to answer to, so our preference is for option 2).
We simply save the Enum value into the database, and when we are loading data out of the database and into our Domain Objects, we just cast the integer value to the enum type.
This avoids any of the synchronisation headaches with options 1) and 3). The list is defined once - in the code.
However, we have a policy that nobody else accesses the database directly; they must come through our web services to access any data. So this is why it works well for us.
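Java does not allow casting an int to an enum, so if this approach is used from Java, the closest equivalent is indexing values() by the stored ordinal. A minimal sketch with a hypothetical Priority enum follows; note that it carries the reordering risk described earlier for ordinals, since inserting a constant in the middle changes what existing rows decode to:

```java
// Hypothetical enum persisted by ordinal.
enum Priority { LOW, MEDIUM, HIGH }

final class PriorityMapper {
    // values() returns a fresh copy on every call, so cache it once.
    private static final Priority[] VALUES = Priority.values();

    private PriorityMapper() {}

    static int toDb(Priority priority) { return priority.ordinal(); }

    // Bounds check instead of an ArrayIndexOutOfBoundsException.
    static Priority fromDb(int stored) {
        if (stored < 0 || stored >= VALUES.length) {
            throw new IllegalArgumentException("Unknown priority ordinal: " + stored);
        }
        return VALUES[stored];
    }
}
```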
In your database, the primary key of this "domain" table doesn't have to be a number. Just use a varchar PK and a description column (for your DBA's purposes). If you need to guarantee the ordering of your values without relying on alphabetical sorting, just add a numeric column named "order" or "sequence".
In your code, create a static class with constants whose name (camel-cased or not) maps to the description and value maps to the pk. If you need more than this, create a class with the necessary structure and comparison operators and use instances of it as the value of the constants.
If you do this a lot, write a script to generate the instantiation/declaration code.
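A minimal sketch of the class-with-constants variant, assuming a hypothetical CaseState class: each constant's code mirrors the varchar primary key, and sortOrder mirrors the optional "sequence" column. The private constructor keeps the set of instances fixed, and compareTo provides the comparison this answer mentions:

```java
// Hypothetical constants class mirroring a varchar-keyed domain table.
final class CaseState implements Comparable<CaseState> {
    static final CaseState OPEN = new CaseState("OPN", "Open", 10);
    static final CaseState CLOSED = new CaseState("CLS", "Closed", 20);

    private final String code;        // varchar primary key in the table
    private final String description; // description column
    private final int sortOrder;      // "sequence" column

    private CaseState(String code, String description, int sortOrder) {
        this.code = code;
        this.description = description;
        this.sortOrder = sortOrder;
    }

    String getCode() { return code; }
    String getDescription() { return description; }

    // Orders by the sequence column rather than alphabetically.
    @Override
    public int compareTo(CaseState other) {
        return Integer.compare(sortOrder, other.sortOrder);
    }

    @Override
    public String toString() { return description; }
}
```

In Java this is essentially what an enum gives you for free, so the hand-written class mainly makes sense in languages, or older Java code, without enums.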