Create a custom Aggregate Function with jOOQ - java

Context
I am working with jOOQ against a PostgreSQL database.
I want to apply jsonb_object_agg(name, value) to the result set of a LEFT OUTER JOIN.
Problem
The join being an OUTER one, the name component of the aggregation function is sometimes simply null, and that can't work. I would then go for:
COALESCE(
    json_object_agg(table.name, table.value) FILTER (WHERE table.name IS NOT NULL),
    '{}'
)::json
As of now, the code I use to call jsonb_object_agg is (not exactly, but boils down to) the following:
public static Field<?> jsonbObjectAgg(final Field<?> key, final Select<?> select) {
    return DSL.field("jsonb_object_agg({0}, ({1}))::jsonb", JSON_TYPE, key, select);
}
... where JSON_TYPE is:
private static final DataType<JsonNode> JSON_TYPE = SQLDataType.VARCHAR.asConvertedDataType(/* a custom Converter */);
Incomplete solution
I would love to leverage jOOQ's AggregateFilterStep interface, and in particular, to be able to use its AggregateFilterStep#filterWhere(Condition... conditions).
However, the org.jooq.impl.Function class that implements AggregateFilterStep (indirectly via AggregateFunction and ArrayAggOrderByStep) is restricted in visibility to its package, so I can't just blindly recycle the implementation of DSL#arrayAgg:
public static <T> ArrayAggOrderByStep<T[]> arrayAgg(Field<T> field) {
    return new org.jooq.impl.Function<T[]>(Term.ARRAY_AGG, field.getDataType().getArrayDataType(), nullSafe(field));
}
Attempts
The closest I got to something reasonable is... building my own coalesceAggregation function that specifically coalesces aggregated fields:
// Can't quite use AggregateFunction as the return type here
public static <T> Field<T> coalesceAggregation(final Field<T> agg, final Condition coalesceWhen, @NonNull final T coalesceTo) {
    return DSL.coalesce(DSL.field("{0} FILTER (WHERE {1})", agg.getType(), agg, coalesceWhen), coalesceTo);
}

public static <T> Field<T> coalesceAggregation(final Field<T> agg, @NonNull final T coalesceTo) {
    return coalesceAggregation(agg, agg.isNotNull(), coalesceTo);
}
... But I then ran into issues with my T type being JsonNode, where DSL#coalesce seems to CAST my coalesceTo to varchar.
Or, you know:
DSL.field("COALESCE(jsonb_object_agg({0}, ({1})) FILTER (WHERE {0} IS NOT NULL), '{}')::jsonb", JSON_TYPE, key, select)
But that'd be the very last resort: it'd feel like I'd merely be one step away from letting the user inject any SQL they want into my database 🙄
In short
Is there a way in jOOQ to "properly" implement one's own aggregate function, as an actual org.jooq.AggregateFunction?
I'd like to avoid having it generated by jooq-codegen as much as possible (not that I don't like it – it's just our pipeline that's horrible).

Starting with jOOQ 3.14.0
The JSON_OBJECTAGG aggregate function is supported natively in jOOQ now:
DSL.jsonObjectAgg(TABLE.NAME, TABLE.VALUE).filterWhere(TABLE.NAME.isNotNull());
Support for the FILTER clause was added in jOOQ 3.14.8.
Starting with jOOQ 3.14.8 and 3.15.0
If jOOQ doesn't implement a specific aggregate function, you can now use DSL.aggregate() to make use of custom aggregate functions.
DSL.aggregate("json_object_agg", SQLDataType.JSON, TABLE.NAME, TABLE.VALUE)
.filterWhere(TABLE.NAME.isNotNull());
This was implemented with https://github.com/jOOQ/jOOQ/issues/1729
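Putting the pieces together for the COALESCE expression from the question, a sketch on jOOQ 3.14.8+ using the jsonb variant (TABLE.NAME and TABLE.VALUE stand in for the actual columns):
// A sketch, not the canonical solution: combines DSL.aggregate() with
// DSL.coalesce() to default the aggregate to an empty jsonb object.
Field<JSONB> agg = DSL.coalesce(
    DSL.aggregate("jsonb_object_agg", SQLDataType.JSONB, TABLE.NAME, TABLE.VALUE)
       .filterWhere(TABLE.NAME.isNotNull()),
    DSL.inline(JSONB.valueOf("{}")));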
Pre jOOQ 3.14.0
There's a missing feature in the jOOQ DSL API, namely to create plain SQL aggregate functions. The reason why this is not available yet (as of jOOQ 3.11) is because there are a lot of delicate internals to specifying a vendor agnostic aggregate function that supports all of the vendor-specific options including:
FILTER (WHERE ...) clause (as you mentioned in the question), which has to be emulated using CASE (see the sketch after this list)
OVER (...) clause to turn an aggregate function into a window function
WITHIN GROUP (ORDER BY ...) clause to support ordered set aggregate functions
DISTINCT clause, where supported
Other, vendor-specific extensions to aggregate functions
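For illustration, the FILTER emulation from the first point works because aggregate functions ignore NULL inputs; a sketch in jOOQ (BOOK.PRICE is a made-up column):
// COUNT(*) FILTER (WHERE price > 100) is equivalent to aggregating a CASE
// expression that yields NULL for non-matching rows:
Field<Integer> filtered = DSL.count(DSL.when(BOOK.PRICE.gt(100), DSL.inline(1)));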
The easy workaround in your specific case is to use plain SQL templating all the way as you mentioned in your question as well:
DSL.field("COALESCE(jsonb_object_agg({0}, ({1})) FILTER (WHERE {0} IS NOT NULL), '{}')::jsonb", JSON_TYPE, key, select)
Or you do the thing you've mentioned previously. Regarding that concern:
... But I then ran into issues with my T type being JsonNode, where DSL#coalesce seems to CAST my coalesceTo to varchar.
That's probably because you used agg.getType() which returns Class<?> instead of agg.getDataType() which returns DataType<?>.
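For reference, the question's helper rewritten with getDataType(), so the default value is bound with the field's DataType (and its attached converter) rather than being cast to varchar; a sketch:
public static <T> Field<T> coalesceAggregation(final Field<T> agg, final Condition coalesceWhen, final T coalesceTo) {
    return DSL.coalesce(
        DSL.field("{0} FILTER (WHERE {1})", agg.getDataType(), agg, coalesceWhen),
        DSL.val(coalesceTo, agg.getDataType()));
}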
But that'd be the very last resort: it'd feel like I'd merely be one step away from letting the user inject any SQL they want into my database
I'm not sure why that is an issue here. You will still be able to control your plain SQL API usage yourself, and users won't be able to inject arbitrary things into key and select because you control those elements as well.

Related

Using "lower" and "upper" aggregate functions on JOOQ Postgres range types

I'm using the Postgres numrange type with JOOQ as defined here and want to call the lower/upper aggregate function on the selected ranges. My understanding is that these functions are not actually implemented in the jooq-postgres-extensions module and that I somehow have to implement this myself. Reading through this blog post, the author mentions that you have to implement these functions yourself, and he gives some examples:
static <T extends Comparable<T>> Condition rangeContainsElem(Field<Range<T>> f1, T e) {
    return DSL.condition("range_contains_elem({0}, {1})", f1, val(e));
}

static <T extends Comparable<T>> Condition rangeOverlaps(Field<Range<T>> f1, Range<T> f2) {
    return DSL.condition("range_overlaps({0}, {1})", f1, val(f2, f1.getDataType()));
}
However, he does not show any implementation of the lower/upper functions. How are these functions implemented?
Ideally, the end-goal would be to be able to do something like this, where the lower and upper bound of a column of ranges is retrieved:
val rangeMetadata = create.select(
    BigDecimalRange(
        max(upper(RANGE_PARAMETER.VALUE)),
        true,
        min(lower(RANGE_PARAMETER.VALUE)),
        true
    )
)
.from(RANGE_PARAMETER)
.fetch()
There isn't a big difference between defining your rangeOverlaps() and a lower() or upper() aggregate function, you can do it exactly the same way. In order to map that data directly into the BigDecimalRange type, you can use:
Nested records
Ad hoc conversion
create.select(
    row(
        max(upper(RANGE_PARAMETER.VALUE)),
        min(lower(RANGE_PARAMETER.VALUE))
    ).mapping { u, l -> bigDecimalRange(u, true, l, true) }
)
.from(RANGE_PARAMETER)
.fetch()
A note regarding:
My understanding is that these functions are not actually implemented in the jooq-postgres-extensions module and that I somehow have to implement this myself
There isn't a big reason why this shouldn't be done. It's just a task that hasn't been implemented yet. I've created an issue for this. Could be useful to support this out of the box: #13828
How to create a plain SQL templating Field<T>
Apparently, there seems to be some difficulty in going from the plain SQL templating Condition (which you already have) to a Field<T>, but it's just the same thing:
public static Field<BigDecimal> upper(Field<? extends BigDecimalRange> f) {
    return DSL.field("upper({0})", SQLDataType.NUMERIC, f);
}
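And the lower() counterpart is defined the same way:
public static Field<BigDecimal> lower(Field<? extends BigDecimalRange> f) {
    return DSL.field("lower({0})", SQLDataType.NUMERIC, f);
}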
See also:
Plain SQL
Plain SQL templating

How to convert SearchHits<T> return type to Page<T> return type in spring data elasticsearch 4.0

I have a spring boot application connecting to an ElasticSearch cluster and I'm performing the following operation on it:
@Autowired
private ElasticsearchOperations operations;
public Page<Metadata> search(Request request) {
    Query query = this.queryBuilder(request);
    SearchHits<Metadata> hits = operations.search(query, Metadata.class);
    // some code to convert SearchHits<Metadata> to Page<Metadata> type
}
Metadata is my custom entity and queryBuilder is another function I've defined that returns an Elasticsearch query at runtime.
The issue is, I need to page this incoming data before returning it. I've looked through the entire official documentation for Spring Data Elasticsearch but haven't found any examples of paging results. In fact, the only example of paging I could find anywhere on the internet used a Spring repository search method, something like this:
Page<Metadata> page = repository.search(query, PageRequest.of(0,1));
but that method is now deprecated in the 4.x version of the package.
The query I'm using is constructed dynamically by another function (the queryBuilder function) and depends on the incoming request parameters, so defining methods on the repository interface is out of the question: it would require me to define methods for every single combination of parameters and to check which one to use with if-else blocks each time.
Is there any way to either return a Page<T> type from the ElasticsearchOperations interface methods (the official documentation claims a SearchPage<T> type as one of the available return values, but none of the suggested methods returns that type), or alternatively, is there any way to convert SearchHits<T> to Page<T>?
Any help is much appreciated.
I have done this in the following way:
SearchPage<YourClass> page = SearchHitSupport.searchPageFor(searchHits, query.getPageable());
return (Page)SearchHitSupport.unwrapSearchHits(page);
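Alternatively, if a plain Page<T> is all you need, you can assemble it by hand with PageImpl; a sketch, assuming pageable is the page request the query was executed with:
// PageImpl is from org.springframework.data.domain:
List<Metadata> content = hits.getSearchHits().stream()
        .map(SearchHit::getContent)
        .collect(Collectors.toList());
Page<Metadata> page = new PageImpl<>(content, pageable, hits.getTotalHits());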
Fortunately I found an answer to this question, thanks to this question asked earlier.
As mentioned over there, the query parameter passed to the search method has a setPageable method.
So the code will now look somewhat like this:
public SearchPage<Metadata> search(Request request) {
    Query query = this.queryBuilder(request);
    query.setPageable(PageRequest.of(pageNumber, pageSize));
    SearchHits<Metadata> hits = operations.search(query, Metadata.class);
    return SearchHitSupport.searchPageFor(hits, query.getPageable());
}
As suggested by a commenter, SearchPage and Page are both interfaces, so changing the method's return type is not strictly necessary.

How do I handle nullable fields using either the Mono<Connection> or the DatabaseClient provided by R2dbc in Spring?

I am at a loss for how to construct an efficient query in R2DBC (Java) using spring-webflux (reactive). Using the DatabaseClient object provided by R2DBC (or alternatively, a Connection object), it seems that I am only able to call variations of one of these two methods: bind(Object field, Object value) or bindNull(Object field, Class<?> type). If I have a schema, and a corresponding class in Java, with multiple nullable fields, how am I expected to handle this [somewhat] efficiently?
Take for example:
public Flux<Item> saveOrUpdate(Item entity) {
    Mono<Connection> connection = this.connection;
    Flux<? extends Result> itemFlux = connection
        .doOnError(e -> e.printStackTrace())
        .flatMapMany(connect -> connect.createStatement(INSERT_OR_UPDATE_ITEM)
            .bind("itemId", entity.getItemId()).returnGeneratedValues("itemid")
            .bind("auditId", entity.getTx().getId())
            .bind("itemNum", entity.getItemNum())
            .bind("itemCat", entity.getItemCat()) // nullable
            // How would I know when to use this?
            .bindNull("sourcedQty", Integer.class) // nullable
            .bind("makeQty", entity.getMakeQty())
            .bind("nameShown", entity.getNameShown()) // nullable
            .bind("price", entity.price())
            .bind("dateCreated", entity.getDateCreated()) // nullable
            .add()
            .execute())...
    ...
}
OR
public Mono<Item> saveOrUpdate(Item entity) {
    Mono<Item> itemMono = databaseClient.execute(INSERT_OR_UPDATE_ITEM)
        .bind("itemId", entity.getItemId()).returnGeneratedValues("itemid")
        .bind("auditId", entity.getTx().getId())
        .bind("itemNum", entity.getItemNum())
        .bind("itemCat", entity.getItemCat())
        .bind("sourcedQty", entity.getSourcedQty())
        .bind("makeQty", entity.getMakeQty())
        .bind("nameShown", entity.getNameShown())
        .bind("price", entity.price())
        .bind("dateCreated", entity.getDateCreated())
        .as(Item.class)
        .fetch()
        .one()...
    ...
}
For my nullable fields I can replace .bind with .bindNull of course. The problem is that if I call bind, the value cannot be null, and if I call bindNull, the value must be null. How would I be able to call one or the other based on whether my value is actually null? I already know that I can just make a bunch of methods for each scenario, or call something along the lines of retryOnError. But if I want to do an insertOrUpdate(List<Item> items), this would waste a ton of time/resources. Ideally I would like to do something analogous to (field == null) ? bindNull("field", field.class) : bind("field", myObj.field) somewhere somehow. If that is clearly off the table, I am still interested in figuring out a way to implement this as efficiently as possible given what I'm working with. Appreciate any feedback.
Either setting a value or null can be done using the Parameter class as shown below:
import org.springframework.r2dbc.core.Parameter;
// rest of the code
.bind("customerId", Parameter.fromOrEmpty(o.getCustomerId(), UUID.class))
Earlier, it was SettableValue.fromOrEmpty, which is now deprecated.
These are two questions:
How to bind potentially nullable values to a Statement/DatabaseClient in a fluent style?
How to let the database figure the rest out?
R2DBC and Spring Data R2DBC make null handling explicit by requiring either binding a value to your Statement or binding a null. There's no method that accepts a potentially nullable argument. There are two reasons for that:
You should deal with nullability to make it obvious what happens there. Handling nullable values explicitly is a good habit, rather than making null handling implicit. The implicit nature of null is what causes the most bugs.
Being explicit is required by databases. Parametrized statements with placeholders consist, on the execution side, of two chunks: the SQL statement itself and the parameter bindings (descriptors). A parameter descriptor requires an association with a placeholder, type information (VARCHAR, BIT, INT, …) and the actual value. When calling bind(…) with a value, a driver can derive the type information. When binding a null value, the driver requires the type information separately; otherwise, the query cannot be executed.
That being said:
There's no API like bindPotentiallyNull("auditId", entity.getTx().getId(), Integer.class)
You cannot do anything within the SQL query because binding parameter information is supplied by auxiliary methods.
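You can, however, write the ternary from the question once yourself, as a small helper against the raw io.r2dbc.spi.Statement API; a sketch (bindNullable is my name, not part of the SPI):
static Statement bindNullable(Statement stmt, String name, Object value, Class<?> type) {
    return value == null ? stmt.bindNull(name, type) : stmt.bind(name, value);
}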
We face a similar issue when talking about stored procedures, because stored procedures require additional details about in/out/in-out parameters. We discussed potential wrapper types like
Parameters.in(@Nullable T value, Class<? super T> valueType)
so these could be used as wrappers in
bind("auditId", Parameters.in(entity.getTx().getId(), Integer.class))
Further details:
Mailing list discussion
Postgres protocol documentation
Microsoft SQL Server: sp_cursorprepare

Jpa Specification to find subset of field's value

I am writing a webapp using Spring Data JPA on the persistence layer; more specifically, my DAOs extend the JpaSpecificationExecutor interface, so I am able to implement some kind of filter. Imagine a list of Items with several attributes (I omit annotations and other metadata for the sake of clarity):
data class Item(var tags: MutableList<String>)
On my service layer, my filter method looks like this:
fun findBy(tagsToFilterBy: List<String>): List<Items> {
    return dao.findAll { root, query, builder ->
        builder.//??
    }
}
What I want to achieve is to retrieve only Items that contain exactly that tagsToFilterBy, in other words, tagsToFilterBy should be a subset of Item.tags.
I know about the isMember(...) method, but I think its usage wouldn't be very pleasant with many tags, as it accepts only a single "entity" per call. Could you advise me something?
My other question is whether it is safe to use user input directly in, let's say, builder.like(someExpression, inputFromUser), or whether I have to put it in builder.parameter(...) and then query.setParameter(...).
Thank you for any idea
So I managed to write it by myself. I'm not saying that it is pretty, but it is the prettiest one I could come up with:
dao.findAll { root, query, builder ->
    val lst = mutableListOf<Predicate>()
    val tagsPath = root.get<List<Tag>>("tags")
    tagsToFilterBy.forEach {
        lst.add(builder.isMember(it, tagsPath))
    }
    builder.or(*lst.toTypedArray())
}
This basically goes through the given tags, checking for each one whether it is a member of tags or not.
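Note that builder.or(...) matches items having any of the given tags; for the strict subset semantics from the question (every tag in tagsToFilterBy must be a member), the predicates would be combined with and() instead. A Java sketch of that variant (names are illustrative):
// Requires ALL given tags to be members of the item's tags collection:
Specification<Item> hasAllTags(List<String> tagsToFilterBy) {
    return (root, query, builder) -> {
        Expression<Collection<String>> tags = root.get("tags");
        return builder.and(tagsToFilterBy.stream()
                .map(tag -> builder.isMember(tag, tags))
                .toArray(Predicate[]::new));
    };
}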
One way is to use filter and test each element to see if your filter list contains it.
val result = dao.filter { tagsToFilterBy.contains(it.tag) }
To speed it up, you could force sort your filter list, and maybe use binarySearch, but performance improvement (or not) would depend on the size of the filter list. For example, assuming tagsToFilterBy is sorted, then:
val result2 = dao.filter { tagsToFilterBy.binarySearch(it.tag) >= 0 }
The Kotlin Collection page describes each of these extension methods.

What's the difference between Spring Data's MongoTemplate and MongoRepository?

I need to write an application with which I can do complex queries using spring-data and MongoDB. I started out using MongoRepository but struggled with complex queries, both finding examples and actually understanding the syntax.
I'm talking about queries like this:
@Repository
public interface UserRepositoryInterface extends MongoRepository<User, String> {
    List<User> findByEmailOrLastName(String email, String lastName);
}
or the use of JSON-based queries, which I tried by trial and error because I can't get the syntax right, even after reading the MongoDB documentation (non-working example due to wrong syntax):
@Repository
public interface UserRepositoryInterface extends MongoRepository<User, String> {
    @Query("'$or':[{'firstName':{'$regex':?0,'$options':'i'}},{'lastName':{'$regex':?0,'$options':'i'}}]")
    List<User> findByEmailOrFirstnameOrLastnameLike(String searchText);
}
After reading through all the documentation it seems that MongoTemplate is far better documented than MongoRepository. I'm referring to the following documentation:
http://static.springsource.org/spring-data/data-mongodb/docs/current/reference/html/
Can you tell me which is more convenient and powerful to use, MongoTemplate or MongoRepository? Are both equally mature, or does one of them lack features compared to the other?
"Convenient" and "powerful to use" are contradicting goals to some degree. Repositories are by far more convenient than templates but the latter of course give you more fine-grained control over what to execute.
As the repository programming model is available for multiple Spring Data modules, you'll find more in-depth documentation for it in the general section of the Spring Data MongoDB reference docs.
TL;DR
We generally recommend the following approach:
Start with the repository abstraction and just declare simple queries using the query derivation mechanism or manually defined queries.
For more complex queries, add manually implemented methods to the repository (as documented here). For the implementation use MongoTemplate.
Details
For your example this would look something like this:
Define an interface for your custom code:
interface CustomUserRepository {
    List<User> yourCustomMethod();
}
Add an implementation for this class and follow the naming convention to make sure we can find the class.
class UserRepositoryImpl implements CustomUserRepository {

    private final MongoOperations operations;

    @Autowired
    public UserRepositoryImpl(MongoOperations operations) {
        Assert.notNull(operations, "MongoOperations must not be null!");
        this.operations = operations;
    }

    public List<User> yourCustomMethod() {
        // custom implementation here
    }
}
Now let your base repository interface extend the custom one and the infrastructure will automatically use your custom implementation:
interface UserRepository extends CrudRepository<User, Long>, CustomUserRepository {
}
This way you essentially get the choice: everything that's easy to declare goes into UserRepository, everything that's better implemented manually goes into CustomUserRepository. The customization options are documented here.
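For instance, a custom method mirroring the regex query from the question could be implemented roughly like this (a sketch; the method and field names are illustrative):
public List<User> findByFirstnameOrLastnameLike(String searchText) {
    // Case-insensitive regex match on either name field, built dynamically:
    Query query = new Query(new Criteria().orOperator(
            Criteria.where("firstName").regex(searchText, "i"),
            Criteria.where("lastName").regex(searchText, "i")));
    return operations.find(query, User.class);
}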
FWIW, regarding updates in a multi-threaded environment:
MongoTemplate provides "atomic" out-of-the-box operations updateFirst, updateMulti, findAndModify, upsert... which allow you to modify a document in a single operation. The Update object used by these methods also allows you to target only the relevant fields.
MongoRepository only gives you the basic CRUD operations find, insert, save, delete, which work with POJOs containing all the fields. This forces you either to update the documents in several steps (1. find the document to update, 2. modify the relevant fields from the returned POJO, and then 3. save it), or to define your own update queries by hand using @Query.
In a multi-threaded environment, like e.g. a Java back-end with several REST endpoints, single-method updates are the way to go, in order to reduce the chances of two concurrent updates overwriting one another's changes.
Example: given a document like this: { _id: "ID1", field1: "a string", field2: 10.0 } and two different threads concurrently updating it...
With MongoTemplate it would look somewhat like this:
THREAD_001 THREAD_002
| |
|update(query("ID1"), Update().set("field1", "another string")) |update(query("ID1"), Update().inc("field2", 5))
| |
| |
and the final state of the document is always { _id: "ID1", field1: "another string", field2: 15.0 }, since each thread accesses the DB only once and only the specified field is changed.
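Spelled out, the single-operation update from the diagram could look like this; a sketch, where Doc stands for a hypothetical mapped document class:
// assuming: import static org.springframework.data.mongodb.core.query.Criteria.where;
// assuming: import static org.springframework.data.mongodb.core.query.Query.query;
mongoTemplate.updateFirst(
        query(where("_id").is("ID1")),
        new Update().set("field1", "another string"), // only field1 is written
        Doc.class);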
Whereas the same case scenario with MongoRepository would look like this:
THREAD_001 THREAD_002
| |
|pojo = findById("ID1") |pojo = findById("ID1")
|pojo.setField1("another string") /* field2 still 10.0 */ |pojo.setField2(pojo.getField2()+5) /* field1 still "a string" */
|save(pojo) |save(pojo)
| |
| |
and the final document being either { _id: "ID1", field1: "another string", field2: 10.0 } or { _id: "ID1", field1: "a string", field2: 15.0 } depending on which save operation hits the DB last.
(NOTE: Even if we used Spring Data's @Version annotation as suggested in the comments, not much would change: one of the save operations would throw an OptimisticLockingFailureException, and the final document would still be one of the above, with only one field updated instead of both.)
So I'd say that MongoTemplate is a better option, unless you have a very elaborated POJO model or need the custom queries capabilities of MongoRepository for some reason.
This answer may be a bit delayed, but I would recommend avoiding the whole repository route. You get very few implemented methods of any great practical value, and to make it work you run into the Java configuration nonsense, which you can spend days and weeks on without much help from the documentation.
Instead, go with the MongoTemplate route and create your own data access layer, which frees you from the configuration nightmares faced by Spring programmers. MongoTemplate is really the savior for engineers who are comfortable architecting their own classes and interactions, since there is a lot of flexibility. The structure can be something like this:
Create a MongoClientFactory class that runs at the application level and gives you a MongoClient object. You can implement this as a Singleton or using an enum Singleton (this is thread safe).
Create a data access base class from which you can inherit a data access object for each domain object. The base class can implement a method for creating a MongoTemplate object, which your class-specific methods can use for all DB accesses.
Each data access class for each domain object can implement the basic methods, or you can implement them in the base class.
The Controller methods can then call methods in the data access classes as needed.
