JPA merge fails due to duplicate key

I have a simple entity, Code, that I need to persist to a MySQL database.
public class Code implements Serializable {
    @Id
    private String key;
    private String description;
    // ...getters and setters...
}
The user supplies a file full of key, description pairs which I read, convert to Code objects and then insert in a single transaction using em.merge(code). The file will generally have duplicate entries which I deal with by first adding them to a map keyed on the key field as I read them in.
A problem arises though when keys differ only by case (for example: XYZ and XyZ). My map will, of course, contain both entries but during the merge process MySQL sees the two keys as being the same and the call to merge fails with a MySQLIntegrityConstraintViolationException.
I could easily fix this by uppercasing the keys as I read them in but I'd like to understand exactly what is going wrong. The conclusion I have come to is that JPA considers XYZ and XyZ to be different keys but MySQL considers them to be the same. As such, when JPA checks its list of known keys (or does whatever it does to determine whether it needs to perform an insert or an update), it fails to find the previous insert and issues another, which then fails. Is this correct? Is there any way round this other than better filtering of the client data?
I haven't defined .equals or .hashCode on the Code class so perhaps this is the problem.

I haven't defined .equals or .hashCode on the Code class so perhaps this is the problem.
Well, you really should: you don't want entities to inherit the default behavior of Object. Whether you want to use the primary key, do a case sensitive comparison, or use a business identity is another story but you certainly don't want to use reference equality. You don't want the following entities:
Code code1 = new Code();
code1.setKey("abc");
Code code2 = new Code();
code2.setKey("abc");
To be considered as different by JPA.
Second, if you want to be able to insert an Entity with XYZ as key and another one with XyZ, then you should use a case sensitive column type (you can make the varchar column case sensitive by using the binary attribute) or you'll get a primary key constraint violation.
So, to summarize:
Implement equals (and hashCode), and decide whether you need case sensitive comparison of the key or not.
Use the appropriate column type at the database level.
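The first point could look roughly like the sketch below. It assumes equality is based on the key alone and compared case-sensitively (which would match a binary column); the JPA annotations are left out so that the class compiles on its own.

```java
import java.io.Serializable;
import java.util.Objects;

// Sketch of the question's Code entity with value-based equality.
// JPA annotations (@Entity, @Id) are omitted so the class compiles on its own.
class Code implements Serializable {
    private String key;
    private String description;

    public String getKey() { return key; }
    public void setKey(String key) { this.key = key; }
    public String getDescription() { return description; }
    public void setDescription(String description) { this.description = description; }

    // Two instances are equal when their keys match exactly (case-sensitive).
    // For a case-insensitive column, compare normalized keys here and in hashCode instead.
    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof Code)) return false;
        return Objects.equals(key, ((Code) other).key);
    }

    // Must agree with equals: equal keys produce equal hash codes.
    @Override
    public int hashCode() {
        return Objects.hashCode(key);
    }
}
```

With this in place, two separately constructed instances with the key "abc" are equal, while "abc" and "ABC" are not.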

This depends on how your column is defined in MySQL. MySQL is the odd man out among databases in that VARCHAR and similar columns default to case-insensitive matches. If you want XYZ and XyZ to be distinct legal options, you'll need to change your CREATE TABLE statement to create a case-sensitive column (see the docs for your version of MySQL).
It's likely something like this:
CREATE TABLE code (
    `key` VARCHAR(32) BINARY PRIMARY KEY, -- KEY is a reserved word in MySQL, hence the backticks
    value VARCHAR(32) BINARY
);
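If the column has to stay case-insensitive, the other route is to make the in-memory map agree with the database before merging. A minimal sketch, assuming the last pair read wins and that String.CASE_INSENSITIVE_ORDER is a close enough stand-in for the column's collation (the two are not identical for all Unicode input); the class and method names are hypothetical:

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class CaseInsensitiveDedup {
    // Collapses {key, description} pairs whose keys differ only by case.
    // The first spelling of a key is kept; the last description read wins.
    static Map<String, String> dedupe(List<String[]> pairs) {
        Map<String, String> byKey = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        for (String[] pair : pairs) {
            byKey.put(pair[0], pair[1]);
        }
        return byKey;
    }
}
```

Each remaining entry can then be turned into a Code object and merged, so XYZ and XyZ no longer reach the database as two separate inserts.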

Related

Flyway - auto increment id not working with test data in PostgreSQL

Before I added Flyway to my project, I could run POST request and the new user was created successfully with ID = 1, next one ID = 2 etc.
Then I added Flyway to create tables and insert some test data by V1_init.sql:
create table "user"(
    id int8 not null,
    username varchar(255)
);
insert into "user" values (1, 'user1');
insert into "user" values (2, 'user2');
insert into "user" values (3, 'user3');
Table is created. Users are inserted.
Trying to run POST request -> error 500
org.postgresql.util.PSQLException: ERROR: duplicate key value violates unique constraint "organisation_pkey" Key (id)=(1) already exists.
So my app should add new user with ID=4 but it looks like it can't recognize that there are 3 users already added.
I'm using GenericEntity:
@Getter
@Setter
@MappedSuperclass
public abstract class GenericEntity<ID extends Serializable> implements Serializable {
    @Id
    @GeneratedValue
    protected ID id;
}
application.properties:
spring.datasource.driver-class-name=org.postgresql.Driver
spring.datasource.url=jdbc:postgresql://localhost:5432/my-app
spring.datasource.username=user
spring.datasource.password=user
spring.jpa.hibernate.ddl-auto=update
spring.jpa.database-platform=org.hibernate.dialect.PostgreSQLDialect
spring.jpa.properties.hibernate.format_sql=true
I tried to use all strategies @GeneratedValue, changing spring.jpa.hibernate.ddl-auto, adding users in init.sql without id (not working)
but still no positive effects. Any ideas what could be wrong?

You seem to have only a half understanding of what you're doing...
I tried to use all strategies @GeneratedValue
You don't need to randomly try strategies, you need to pick the one that matches your current database design.
changing spring.jpa.hibernate.ddl-auto
This is dangerous and you should set it to "none", given that you are using flyway.
adding users in init.sql without id (not working)
This will only work if postgresql is set up to automatically generate ids (which is easiest through a sequence).
From your code, it does not look like that is the case.
what could be wrong?
JPA's @GeneratedValue is capable of ensuring that values are generated when it is responsible for creating rows (that means when you pass the entity to EntityManager#persist). It does not and cannot know about your flyway scripts where you bypass JPA to insert rows manually.
Furthermore, let's look at @GeneratedValue's strategy property. The strategy you choose will influence how JPA generates IDs. There are only a few options: TABLE, SEQUENCE, IDENTITY and AUTO. Since you did not explicitly specify a strategy, you are currently using the default, which is AUTO. This is not recommended because it is not explicit, and now it's hard to say what your code is doing.
Under the TABLE and SEQUENCE strategies, JPA will do an interaction with the database in order to generate an ID value. In those cases, JPA is responsible for generating the value, though it will rely on the database to do so. Unsurprisingly, the former will use a table (this is rare, btw, but also the only strategy that is guaranteed to work on all RDBMS) and the latter will use a sequence (far more common and supported by practically every commercially relevant RDBMS).
With IDENTITY, JPA will not attempt to generate a key at all, because this strategy assumes that the DB will generate an ID value on its own. The responsibility is thus delegated to the database entirely. This is great for databases that have their own auto-increment mechanism.
Postgres does not really have an auto-increment system but it has some nice syntactic sugar that nearly makes it work like it: the serial "datatype". If you specify the datatype of a column as "serial", it will in fact be created with datatype int, but postgresql will also create a sequence and tie the default value of the ID column to the sequence's next value generator.
In your case, JPA is most likely using either SEQUENCE or TABLE. Since your DDL setting is set to "update", Hibernate will have generated a table or sequence behind your back. You should check your database with something like pgAdmin to verify which it is, but I'd put my money on a sequence (so I'm assuming it's using the SEQUENCE strategy).
Because you haven't specified a @SequenceGenerator, a default will be used which, AFAIK, will start from 1.
Then when JPA tries to insert a new row, it will call that sequence to generate an ID value. It will get the next value of the sequence, which will be 1. This will conflict with the IDs you manually entered in flyway.
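The collision can be reproduced without a database. The following is a toy model only, not Hibernate or PostgreSQL code: a set stands in for the primary-key index and an AtomicLong stands in for the sequence. All names in it are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicLong;

// Toy model of a table with a primary key and a separate id sequence.
class SequenceCollisionDemo {
    private final Set<Long> primaryKeys = new HashSet<>();
    private final AtomicLong sequence = new AtomicLong(0); // the next value handed out is 1

    // Mimics an insert with an explicit id: the sequence is not touched.
    // Returns false when the id already exists (a duplicate key).
    boolean insertWithExplicitId(long id) {
        return primaryKeys.add(id);
    }

    // Mimics an insert whose id comes from the sequence.
    // Returns false when the generated id already exists (a duplicate key).
    boolean insertWithGeneratedId() {
        return primaryKeys.add(sequence.incrementAndGet());
    }
}
```

After explicit inserts of ids 1, 2 and 3, the first three generated inserts in this model fail and the fourth succeeds, because every attempt consumes a sequence value whether or not the insert goes through.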
My recommended solution would be to:
redefine your postgresql data type from int8 to "bigserial", the 8-byte variant of "serial" (which is actually int8 + a sequence + sets up default value linking the ID column to the sequence so that postgres will automatically generate an ID if you don't explicitly specify one - careful, also don't specify null, just don't specify the ID column in the insert statement at all!)
explicitly set the generator strategy to IDENTITY on the JPA side
update your flyway scripts to insert users without explicit ID value (this will ensure that the test data advance the sequence, so that when JPA uses that same sequence later, it will not generate a conflicting ID)
I'd say there are alternative solutions, but other than using the TABLE strategy or generating keys in memory (both things which you should avoid), there isn't really a viable alternative because it will boil down to using a sequence anyway. I suppose it's possible to manually specify the sequence, forego the default value on the id field, call the sequence manually in your insert statements, and map the sequence explicitly in JPA... but I don't see why you'd make things hard on yourself.

Type-safe approaches for handling reference data in Java applications

For possibly no good reason at this point other than 'we've always done it like this', how are new systems being architected to use reference data that represents state codes?
For example, a Case may have 2 valid states, 'Open' or 'Closed'. Historically I've seen many systems where these valid values would be stored in a database table containing this reference data, and referred to as a code type ('CaseStatus'), and each valid value has a 'code' value (eg 'OPN') and a decode or display value that is used when the value is needed to be displayed to a user (in this case 'Open').
If developing a Java based system today, from a code point of view with type safety, we would define an Enum like this:
public enum CaseStatus{
Open("OPN"),
Closed("CLS");
private String codeValue;
private CaseStatus(String codeValue){
this.codeValue = codeValue;
}
}
This is great solely from the view of the source code, the Enum enforces type-safety with a restricted list of valid values, but by itself there is no representation of this code type or its valid values in the database. If there are users of the data who run ad hoc reports directly against the database, they need a way to look up decoded values for 'OPN', 'CLS'. Historically this would have been done using a reference table containing the codetype, the codes and their decode values.
It seems odd that we continue to use these state code values as '3 letter codes', where the motivation at this point is no longer because we need to save space in the database ('OPN' vs 'Open' is hardly a great optimization anyway).
What other approaches have people used or seen on recent systems they have worked on? Do you maintain the reference data only in the database, only in code, or in both places, and if you maintain it in both, what approaches do you use to keep the two in sync?

First, if there are only two possible values, and it is not possible to expect them to develop into a larger number (as in your example of open/closed), I would probably define a status_open column as BOOLEAN or SMALLINT (0/1) or CHAR (Y/N).
When the universe of status is bigger (or may increase to more than two values), I would use a surrogate key. While saving a few bytes is hardly an optimization, indexing and joining CHAR valued columns is more expensive than indexing and joining INTEGER columns. While I don't have a metric on the issue of INTEGER vs CHAR(3), I would suppose that for this case the difference would not be as big as in the case of INTEGER vs CHAR(50).
However, a disadvantage that I find in small CHAR abbreviations is that sometimes it is difficult to find meaningful values. Suppose that you have a status of "broken - replacement has been ordered", does it help if I call it "BRO"? Is it better than calling it 3?
On the other hand, even when it is not required by the model, I have found it convenient to add a short VARCHAR column to the status table, describing what each mnemonic or surrogate key means. (After the model grows, it becomes quite difficult to remember all of them!)
My implementation (with due exceptions in particular cases) would likely be:
On the Java side, the enum, as you defined it. (Even for boolean-like values, sometimes it helps having different enums for each value, particularly if there are methods taking several of those values as parameter. Methods with a long list of parameters of the same type are a recipe for disaster).
On the SQL side:
CREATE TABLE status (
    id INTEGER PRIMARY KEY,
    description VARCHAR(40)
);
CREATE TABLE entity (
    ...
    status_id INTEGER REFERENCES status(id)
);
INSERT INTO status VALUES (0,'Closed');
INSERT INTO status VALUES (1,'Open');
INSERT INTO status VALUES (2,'Broken - replacement has been ordered');

One solution I've encountered is to use a materialized view in the database to dynamically recalculate the denormalized relation. In a document based database you would probably store the CaseStatus as a String. Finally, you might use an ORM tool to store CaseStatus as an Object but in the cases I'm familiar with the reference data is stored in the database (if you store it in code then it requires a build and deployment to production, along with additional testing for the release).

Sharing constants between java and database

Assume that you have a STORE table having a varchar column STATUS that accepts values (OPEN,CLOSED)
On the java side, and especially in SQL, I find myself writing queries like this:
select * from store where status='OPEN'
Now this is not a written contract and is open to lots of bugs.
I want to manage cases where a new status is added or an existing one is renamed on the db side, and handle that on the java side. For example, if all OPEN statuses in the STORE table are changed to OP, my sql code will fail.
PS:This question is in fact programming language and database server agnostic, but I tag it with java since I deal with it more.

Your need is a bit strange. Usually things don't just "happen" in the database, leaving you to cope with them. Rather, you decide to change things in your app(s) and both change your code and migrate your data.
This being said, if you want to ensure your data are consistent with a well-known set of values, you can create library tables. In your case:
create table STORE (status varchar(32)); -- your table
create table LIB_STORE_STATUS (status varchar(32) unique); -- a lib table for statuses
alter table STORE add constraint FK_STORE_STATUS foreign key (status) references LIB_STORE_STATUS(status); -- constrains the values in your STORE table
Then:
insert into STORE values ('A'); -- fails
insert into LIB_STORE_STATUS values ('A');
insert into STORE values ('A'); -- passes
With this, you just have to ensure your lib table is always in sync with your code (i.e. your enum names when using JPA's @Enumerated(EnumType.STRING) mapping strategy).
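That sync requirement can be checked at startup or in a test. The sketch below assumes the values of LIB_STORE_STATUS have already been loaded into a set by whatever data access code the application uses; StoreStatus and StatusSyncCheck are hypothetical names.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

// Hypothetical enum whose constant names are what EnumType.STRING would store.
enum StoreStatus { OPEN, CLOSED }

class StatusSyncCheck {
    // Returns every name that exists on only one side: in the enum but not in
    // the lib table, or in the lib table but not in the enum.
    static <E extends Enum<E>> Set<String> mismatches(Class<E> enumType, Set<String> dbValues) {
        Set<String> enumNames = Arrays.stream(enumType.getEnumConstants())
                .map(Enum::name)
                .collect(Collectors.toCollection(TreeSet::new));
        Set<String> common = new TreeSet<>(enumNames);
        common.retainAll(dbValues);
        Set<String> onlyOnOneSide = new TreeSet<>(enumNames);
        onlyOnOneSide.addAll(dbValues);
        onlyOnOneSide.removeAll(common);
        return onlyOnOneSide;
    }
}
```

An empty result means the code and the lib table agree. If OPEN were renamed to OP in the database only, the result would contain both OP and OPEN.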

Use enums: you can map directly to the enum instance name (it is not necessary to convert to the int ordinal).
But in this case I would have a boolean/bit column called open, and its possible values would be true or false.
(boolean is a bit 0/1 in most DBs)

Using Hibernate sequence generators manually

Basically, I want a way to access sequence values in a database-neutral way.
The use case is that I have a field on an entity that I want to set based on an incrementing value (other than the id).
For instance, say I have a Shipment entity. At some point after the shipment gets created, it gets shipped. Once it gets shipped, a manifest number is generated for it and assigned. The manifest number looks something like M000009 (Where the stuff after the 'M' is a left-padded value from a sequence).
Something similar was asked here on SO, but I'm not a fan of the solution since it requires another table to maintain and seems like a weird relationship to have.
Does anyone know if it is possible to use something like hibernate's MultipleHiLoPerTableGenerator as something other than an ID generator?
If that's not possible, does anyone know of any libraries that handle this (either using hibernate or even just pure JDBC). I'd prefer not to have to write this myself (and have to deal with prefetching values, locking and synchronization).
Thanks.

I think the complexity of your task depends on whether or not your manifest number needs to be sequential:
If you don't need sequential manifest numbers then it's happy days and you can use a sequence.
If you do need sequential manifest numbers (or your database doesn't support sequences) then use an id table with the appropriate locking so that each transaction gets a unique sequential value.
Then you've got 2 options that I can think of:
write the necessary JDBC code on your client, ensuring (if the manifest number is sequential) that the transaction being used is the same as that for the database update.
use a trigger to create the manifest number when the appropriate update occurs.
I think my preference would be the trigger because the transaction side of things would be taken care of although it would mean the object would need refreshing on the client.

I didn't read over the linked similar solution, but sounds like something I wound up doing. I created a table just for sequences. I added a row to the table for each sequence type I needed.
I then had a sequence generator class that would do the necessary sql query to fetch and update the sequence value for a particular named sequence.
I used hibernate's Dialect class to do it in a db neutral way.
I also would 'cache' the sequences. I would bump the stored sequence value by a large number, and then dole out those allocated values from my generator class. If the class was destroyed (i.e. app shutdown), a new instance of the sequence generator would start up at the stored value. (Having a gap in my sequence numbers did not matter.)
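A rough sketch of that caching scheme follows. The storage access is abstracted as a function that reserves a block and returns its first value; in a real application that function would run the fetch-and-update query against the sequence table in its own transaction. BlockSequence and formatManifestNumber are hypothetical names, and the M plus six digits format is taken from the question's example.

```java
import java.util.function.LongUnaryOperator;

// Hands out sequence values from blocks reserved in a backing store.
class BlockSequence {
    // Given a block size, reserves that many values in the store and
    // returns the first value of the reserved block.
    private final LongUnaryOperator reserveBlock;
    private final long blockSize;
    private long next;   // next value to hand out
    private long limit;  // exclusive upper bound of the current block

    BlockSequence(LongUnaryOperator reserveBlock, long blockSize) {
        this.reserveBlock = reserveBlock;
        this.blockSize = blockSize;
    }

    // Returns the next value, reserving a new block when the current one is used up.
    synchronized long nextValue() {
        if (next >= limit) {
            next = reserveBlock.applyAsLong(blockSize);
            limit = next + blockSize;
        }
        return next++;
    }

    // Formats a value as 'M' followed by the number left-padded to six digits.
    static String formatManifestNumber(long value) {
        return String.format("M%06d", value);
    }
}
```

Values left unused in a block when the application stops are lost, which produces the gaps mentioned above, so this only fits the case where manifest numbers do not have to be strictly sequential.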

Here is a code sample. I would like to caveat this with: I have not compiled this and it requires Spring code. Having said this, it should still provide the bones of what you want to do.
public Long getManifestNumber() {
    // Run a native query through Spring's HibernateTemplate.
    // MY_SEQUENCE and the "from dual" syntax are Oracle-specific.
    final Object result = getHibernateTemplate().execute(new HibernateCallback() {
        public Object doInHibernate(Session sess) throws HibernateException, SQLException {
            SQLQuery sqlQuery = sess.createSQLQuery("select MY_SEQUENCE.NEXTVAL from dual");
            // Return the single value so that execute() hands it back to the caller.
            return sqlQuery.uniqueResult();
        }
    });
    // The value is expected to arrive as a BigDecimal; anything else yields null.
    Long toReturn = null;
    if (result instanceof BigDecimal) {
        toReturn = ((BigDecimal) result).longValue();
    }
    return toReturn;
}

Persisting data suited for enums

Most projects have some sort of data that are essentially static between releases and well-suited for use as an enum, like statuses, transaction types, error codes, etc. For example's sake, I'll just use a common status enum:
public enum Status {
    ACTIVE(10, "Active"),
    EXPIRED(11, "Expired");
    /* other statuses... */
    /* constructors, getters, etc. */
}
I'd like to know what others do in terms of persistence regarding data like these. I see a few options, each of which has some obvious advantages and disadvantages:
Persist the possible statuses in a status table and keep all of the possible status domain objects cached for use throughout the application
Only use an enum and don't persist the list of available statuses, creating a data consistency holy war between me and my DBA
Persist the statuses and maintain an enum in the code, but don't tie them together, creating duplicated data
My preference is the second option, although my DBA claims that our end users might want to access the raw data to generate reports, and not persisting the statuses would lead to an incomplete data model (counter-argument: this could be solved with documentation).
Is there a convention that most people use here? What are peoples' experiences with each and are there other alternatives?
Edit:
After thinking about it for a while, my real persistence struggle comes with handling the id values that are tied to the statuses in the database. These values would be inserted as default data when installing the application. At this point they'd have ids that are usable as foreign keys in other tables. I feel like my code needs to know about these ids so that I can easily retrieve the status objects and assign them to other objects. What do I do about this? I could add another field, like "code", to look stuff up by, or just look up statuses by name, which is icky.

We store enum values using some explicit string or character value in the database. Then to go from database value back to enum we write a static method on the enum class to iterate and find the right one.
If you expect a lot of enum values, you could create a static mapping HashMap<String,MyEnum> to translate quickly.
Don't store the actual enum name (i.e. "ACTIVE" in your example) because that's easily refactored by developers.
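A minimal sketch of this approach, assuming single-letter database codes "A" and "E" chosen purely for illustration; AccountStatus, toDb and fromDb are hypothetical names:

```java
import java.util.HashMap;
import java.util.Map;

enum AccountStatus {
    ACTIVE("A"),
    EXPIRED("E");

    // Lookup table from the stored database value to the enum constant, built once.
    private static final Map<String, AccountStatus> BY_DB_CODE = new HashMap<>();
    static {
        for (AccountStatus status : values()) {
            BY_DB_CODE.put(status.dbCode, status);
        }
    }

    // The explicit value written to the database, independent of the constant's name.
    private final String dbCode;

    AccountStatus(String dbCode) {
        this.dbCode = dbCode;
    }

    String toDb() {
        return dbCode;
    }

    // Translates a stored value back to the enum; unknown values are rejected.
    static AccountStatus fromDb(String dbCode) {
        AccountStatus status = BY_DB_CODE.get(dbCode);
        if (status == null) {
            throw new IllegalArgumentException("Unknown status code: " + dbCode);
        }
        return status;
    }
}
```

Renaming ACTIVE to something else then leaves the stored data readable, as long as the "A" code is left alone.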

I'm using a blend of the three approaches you have documented...
Use the database as the authoritative source for the Enum values. Store the values in a 'code' table of some sort. Each time you build, generate a class file for the Enum to be included in your project.
This way, if the enum changes value in the database, your code will be properly invalidated and you will receive appropriate compile errors from your Continuous Integration server. You have a strongly typed binding to your enumerated values in the database, and you don't have to worry about manually syncing the values between code and the data.
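The generation step could be as small as the sketch below. It assumes the rows of the code table have already been read into an ordered map of code to description, that every code is a valid Java identifier, and that descriptions contain no quotes or backslashes; EnumSourceGenerator is a hypothetical name and a real build would write the result to a source file before compiling.

```java
import java.util.Map;
import java.util.StringJoiner;

class EnumSourceGenerator {
    // Builds the source text of an enum from code -> description rows.
    // Assumes codes are valid Java identifiers and descriptions need no escaping.
    static String generate(String enumName, Map<String, String> rows) {
        StringJoiner constants = new StringJoiner(",\n", "", ";\n");
        for (Map.Entry<String, String> row : rows.entrySet()) {
            constants.add("    " + row.getKey() + "(\"" + row.getValue() + "\")");
        }
        return "public enum " + enumName + " {\n"
                + constants
                + "\n    private final String description;\n\n"
                + "    " + enumName + "(String description) { this.description = description; }\n\n"
                + "    public String getDescription() { return description; }\n"
                + "}\n";
    }
}
```

If a code that the application references disappears from the table, the regenerated enum no longer contains that constant and the build fails at compile time, which is the effect described above.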

Joshua Bloch gives an excellent explanation of enums and how to use them in his book "Effective Java, Second Edition" (p.147)
There you can find all sorts of tricks how to define your enums, persist them and how to quickly map them between the database and your code (p.154).
During a talk at Jazoon 2007, Bloch gave the following reasons to use an extra attribute to map enums to DB fields and back: An enum is a constant but code isn't. To make sure that a developer editing the source can't accidentally break the DB mapping by reordering the enums or renaming them, you should add a specific attribute (like "dbName") to the enum and use that to map it.
Enums have an intrinsic id (the ordinal, which is used in the switch() statement) but this id changes when you change the order of elements (for example by sorting them or by adding elements in the middle).
So the best solution is to add a toDB() and fromDB() method and an additional field. I suggest using short, readable strings for this new field, so you can decode a database dump without having to look up the enums.

While I am not familiar with the idea of "attributes" in Java (and I don't know what language you're using), I've generally used the idea of a code table (or domain specific tables) and I've attributed my enum values with more specific data, such as human readable strings (for instance, if my enum value is NewStudent, I would attribute it with "New Student" as a display value). I then use Reflection to examine the data in the database and insert or update records in order to bring them in line with my code, using the actual enum value as the key ID.

What I have used on several occasions is to define the enum in the code and a storage representation in the persistence layer (DB, file, etc.) and then have conversion methods to map them to each other. These conversion methods need only be used when reading from or writing to the persistent store and the application can use the type safe enums everywhere. In the conversion methods I used switch statements to do the mapping. This also makes it possible to throw an exception if a new or unknown state is to be converted (usually because either the app or the data is newer than the other and new or additional states have been declared).

If there's at least a minor chance that the list of values will need to be updated, then it's option 1. Otherwise, it's option 3.

Well we don't have a DBA to answer to, so our preference is for option 2).
We simply save the Enum value into the database, and when we are loading data out of the database and into our Domain Objects, we just cast the integer value to the enum type.
This avoids any of the synchronisation headaches with options 1) and 3). The list is defined once - in the code.
However, we have a policy that nobody else accesses the database directly; they must come through our web services to access any data. So this is why it works well for us.

In your database, the primary key of this "domain" table doesn't have to be a number. Just use a varchar pk and a description column (for the purposes your dba is concerned about). If you need to guarantee the ordering of your values without relying on the alphabetical sort, just add a numeric column named "order" or "sequence".
In your code, create a static class with constants whose name (camel-cased or not) maps to the description and value maps to the pk. If you need more than this, create a class with the necessary structure and comparison operators and use instances of it as the value of the constants.
If you do this a lot, build a script to generate the instantiation / declaration code.
