Hi, I have an SQL query that I'm trying to implement in MongoDB using the Mongo Java driver Jdbc [2.10]. My SQL query is:
SELECT DISTINCT table1.id FROM table1, table2
WHERE table1.x = table2.x
AND table1.y IN (somevalue) AND table2.y IN (somevalue)
In MongoDB I have a Table1 collection and a Table2 collection. Using the driver I created two objects to access the two collections. Suppose I have 1 lakh (100,000) records in each collection. If I compare every single document against the other collection, that takes 100,000 × 100,000 comparisons, and after that I still have to match on the 'y' values.
Can anyone suggest how I can efficiently convert this query to a MongoDB query?
Thanks
MongoDB doesn't support joins like that, so you'd need to do multiple queries. Something like this, maybe:
db.collection1.distinct( 'id', { y: { $in: [...] } } )
Then take those IDs and do another $in query against collection2.
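Since the SQL actually joins on x, a two-step version with the 2.x Java driver might look roughly like this (just a sketch; the database name, collection names, and the someValues list are assumptions based on the question):

import com.mongodb.*;
import java.util.List;

// Assumes a local mongod and a database named "test"
DB db = new MongoClient().getDB("test");
DBCollection table1 = db.getCollection("Table1");
DBCollection table2 = db.getCollection("Table2");

// Step 1: distinct x values from Table2, filtered by its y values
List<?> xValues = table2.distinct("x",
        new BasicDBObject("y", new BasicDBObject("$in", someValues)));

// Step 2: distinct ids from Table1 where y matches and x appeared in step 1
List<?> ids = table1.distinct("id",
        new BasicDBObject("y", new BasicDBObject("$in", someValues))
                .append("x", new BasicDBObject("$in", xValues)));

This avoids the 100,000 × 100,000 client-side comparison: each collection is queried once, and the join is reduced to a single $in filter.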
Though, I have to ask why you'd have a table without unique IDs.
With a classic RDBMS, you model your database first and then write your queries.
With MongoDB, it tends to be the opposite: you list your use cases, i.e. your access patterns, and then model your data according to those needs.
The Mongo Java driver does not support SQL or the JDBC API. MongoDB does not support joins. If you want to use SQL, there is a JDBC driver available: JDBC Driver for MongoDB. You can also avoid joins by combining the two collections into one using nesting.
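For illustration, the nested alternative might look like this (a sketch; the combined collection name and the field values are placeholders, and db is a DB handle as in the driver example above):

// One combined document instead of a join: Table2's fields embedded in Table1's
DBObject doc = new BasicDBObject("id", 1)
        .append("x", "x1")
        .append("y", "someValue")
        .append("table2", new BasicDBObject("y", "someValue"));
db.getCollection("Combined").insert(doc);

A single query against the combined collection can then filter on both y fields at once.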
Given below is a gist of the query, which I'm able to run successfully in MySQL:
SELECT a.*,
COALESCE(SUM(condition1 or condition2), 0) as countColumn
FROM table a
-- left joins with multiple tables
GROUP BY a.id;
Now, I'm trying to write it with jOOQ:
ctx.select(a.asterisk(),
coalesce(sum("How to get this ?")).as("columnCount"))
.from(a)
.leftJoin(b).on(someCondition)
.leftJoin(c).on(someCondition)
.leftJoin(d).on(someCondition)
.leftJoin(e).on(someCondition)
.groupBy(a.ID);
I'm having a hard time preparing the coalesce() part, and would really appreciate some help.
jOOQ's API is stricter about the distinction between Condition and Field<Boolean>, which means you cannot simply treat booleans as numbers as you can in MySQL. It's usually not a bad idea to be explicit about data types to prevent edge cases, so this strictness isn't necessarily a bad thing.
So, you can transform your booleans to integers as follows:
coalesce(
sum(
when(condition1.or(condition2), inline(1))
.else_(inline(0))
),
inline(0)
)
But even better than that, why not use a standard SQL FILTER clause, which can be emulated in MySQL using a COUNT(CASE ...) aggregate function:
count().filterWhere(condition1.or(condition2))
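Plugged into the query from the question, that could look like the following sketch (a, b, a.ID, and someCondition are placeholders from the question; the remaining joins work the same way):

ctx.select(a.asterisk(),
           count().filterWhere(condition1.or(condition2)).as("countColumn"))
   .from(a)
   .leftJoin(b).on(someCondition)
   .groupBy(a.ID)
   .fetch();

Since COUNT never produces NULL (an empty group simply counts 0), the COALESCE wrapper from the MySQL version is no longer needed.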
I have a table SUBSCRIPTION, and I want to run multiple selectCount queries written with jOOQ, with different predicates, against the database in one connection.
To do so, I have created a list of queries:
List<Query> countQueries = channels.stream().<Query>map(c ->
selectCount().from(SUBSCRIPTION)
.innerJoin(SENDER).on(SENDER.ID.equal(SUBSCRIPTION.SENDER_ID))
.innerJoin(CHANNEL).on(CHANNEL.ID.equal(SUBSCRIPTION.CHANNEL_ID))
.where(SENDER.CODE.equal(senderCode))
.and(CHANNEL.CODE.equal(c))
).collect(toList());
And finally, I have launched this list of queries using batch:
using(configuration).batch(countQueries).execute();
I expected to get the results of the above queries in the return value of execute, but I get an array of integers filled with 0 values.
Is this the right way to run multiple selectCount using JOOQ?
What is the meaning of the integer array returned by the execute method?
I have checked this link on the jOOQ blog talking about "How to Calculate Multiple Aggregate Functions in a Single Query", but it's just about plain SQL, not the jOOQ API.
Comments on your assumptions
I expected to get the results of the above queries in the return value of execute, but I get an array of integers filled with 0 values.
The batch() API can only be used for DML queries (INSERT, UPDATE, DELETE), just like with native JDBC. You can run the queries as a batch, but you cannot fetch results this way: the integer array returned by execute() contains the JDBC update counts of the individual statements, which is why it is filled with zeroes for statements that don't modify any rows.
I have checked this link on the jOOQ blog talking about "How to Calculate Multiple Aggregate Functions in a Single Query", but it's just about plain SQL, not the jOOQ API.
Plain SQL queries almost always translate quite literally to jOOQ, so you can apply the technique from that article in your case as well. In fact, you should: running that many separate queries is definitely not a good idea.
Translating that linked query to jOOQ
So, let's look at how to translate that plain SQL example from the link to your case:
Record record =
ctx.select(
channels.stream()
.map(c -> count().filterWhere(CHANNEL.CODE.equal(c)).as(c))
.collect(toList())
)
.from(SUBSCRIPTION)
.innerJoin(SENDER).on(SENDER.ID.equal(SUBSCRIPTION.SENDER_ID))
.innerJoin(CHANNEL).on(CHANNEL.ID.equal(SUBSCRIPTION.CHANNEL_ID))
.where(SENDER.CODE.equal(senderCode))
.and(CHANNEL.CODE.in(channels)) // Not strictly necessary, but might speed up things
.fetchOne();
This will produce a single record containing all the count values.
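The individual counts can then be read back from the record by the aliases given above, e.g. (assuming channels is a List<String>):

Integer count = record.get(channels.get(0), Integer.class);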
As always, this is assuming the following static import
import static org.jooq.impl.DSL.*;
Using classic GROUP BY
Of course, you can also just use a classic GROUP BY in your particular case. This might even be a bit faster:
Result<?> result =
ctx.select(CHANNEL.CODE, count())
.from(SUBSCRIPTION)
.innerJoin(SENDER).on(SENDER.ID.equal(SUBSCRIPTION.SENDER_ID))
.innerJoin(CHANNEL).on(CHANNEL.ID.equal(SUBSCRIPTION.CHANNEL_ID))
.where(SENDER.CODE.equal(senderCode))
.and(CHANNEL.CODE.in(channels)) // This time, you need to filter
.groupBy(CHANNEL.CODE)
.fetch();
This now produces a table with one count value per code. Alternatively, fetch this into a Map<String, Integer>:
Map<String, Integer> map =
ctx.select(CHANNEL.CODE, count())
.from(SUBSCRIPTION)
.innerJoin(SENDER).on(SENDER.ID.equal(SUBSCRIPTION.SENDER_ID))
.innerJoin(CHANNEL).on(CHANNEL.ID.equal(SUBSCRIPTION.CHANNEL_ID))
.where(SENDER.CODE.equal(senderCode))
.and(CHANNEL.CODE.in(channels))
.groupBy(CHANNEL.CODE)
.fetchMap(CHANNEL.CODE, count());
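Note that channels without any matching subscription rows won't appear in the result at all (the inner joins filter them out), so a default may be handy when reading from the map (someChannel is a placeholder for one of the codes from channels):

// 0 for channels that have no subscriptions at all
Integer count = map.getOrDefault(someChannel, 0);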
I am designing a system where I have a fixed set of attributes (an entity) and then some dynamic attributes per client.
e.g. customer_name, customer_id, etc. are common attributes,
whereas order_id, patient_number, date_of_joining, etc. are dynamic attributes.
I read about EAV (entity-attribute-value) being an anti-pattern. I wish to use a combination of MySQL and a NoSQL datastore for complex queries. I already use Elasticsearch.
I cannot let the mapping explode with an unlimited number of fields, so I have devised the following model:
MySQL: customer, custom_attribute, custom_attribute_mapping, custom_attribute_value
Array of nested documents in Elasticsearch:
[{
"field_id" :123,
"field_type" : "date",
"value" : "01/01/2020" // mapping type date - referred from mysql table at time on inserting data
}...]
I cannot use flattened mappings in ES, as I wish to run range queries on the custom fields as well.
Is there a better way to do it? Or an obvious choice of another database that I am too naive to see?
If I need to modify the question to add more info, I'd welcome the feedback.
P.S.: I will have large data volumes (on the order of tens of millions of records).
Why not use something like MongoDB as a pure NoSQL database?
Or, as a less popular solution, I would recommend triple stores such as Virtuoso or similar ones. You can then use SPARQL as a query language over them, and there are many drivers for such stores, e.g. Jena for Java.
Triple stores allow you to store data in the format <subject predicate object>,
where in your case the subject is the customer id, the predicates are the attributes, and the object is the value. All standard and dynamic attributes end up in the same table.
Triple stores can also be modeled as a 3-column table in any database management system.
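As a minimal sketch with Apache Jena (an assumption; the namespace and attribute names are purely illustrative):

import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.Resource;

Model model = ModelFactory.createDefaultModel();
String ns = "http://example.com/"; // hypothetical namespace

// <customer predicate value> triples: fixed and dynamic attributes alike
Resource customer = model.createResource(ns + "customer/1");
customer.addProperty(model.createProperty(ns, "customer_name"), "Alice");
customer.addProperty(model.createProperty(ns, "order_id"), "ORD-42"); // dynamic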
My question is similar to
Is there any good dynamic SQL builder library in Java?
However, one important point taken from the above thread:
Querydsl and jOOQ seem to be the most popular and mature choices however there's one thing to be aware of: Both rely on the concept of code generation, where meta classes are generated for database tables and fields. This facilitates a nice, clean DSL but it faces a problem when trying to create queries for databases that are only known at runtime.
Is there any way to create the queries at runtime besides just using plain JDBC + String concatenation?
What I'm looking for is a web application that can be used to build forms to query existing databases. Now, if something like that already exists, links to such a product would be welcome too.
While source code generation for database metadata certainly adds much value to using jOOQ, it is not a prerequisite. Many jOOQ users use jOOQ for the same use-case that you envision. This is also reflected in the jOOQ tutorials, which list using jOOQ without code generation as a perfectly valid use-case. For example:
String sql = create.select(
fieldByName("BOOK","TITLE"),
fieldByName("AUTHOR","FIRST_NAME"),
fieldByName("AUTHOR","LAST_NAME"))
.from(tableByName("BOOK"))
.join(tableByName("AUTHOR"))
.on(fieldByName("BOOK", "AUTHOR_ID").eq(
fieldByName("AUTHOR", "ID")))
.where(fieldByName("BOOK", "PUBLISHED_IN").eq(1948))
.getSQL();
In a similar fashion, bind values can be extracted from any Query using Query.getBindValues().
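For example, given a Query built as above (here called query), the generated SQL and bind values can be fed into plain JDBC like this (a sketch; connection is an existing java.sql.Connection):

import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.List;

String sql = query.getSQL();
List<Object> binds = query.getBindValues();

try (PreparedStatement stmt = connection.prepareStatement(sql)) {
    for (int i = 0; i < binds.size(); i++)
        stmt.setObject(i + 1, binds.get(i)); // JDBC parameters are 1-based
    try (ResultSet rs = stmt.executeQuery()) {
        // process the result set
    }
}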
This approach will still beat plain JDBC + String concatenation for dynamic SQL statements, as you do not need to worry about:
Syntax correctness
Cross-database compatibility
SQL Injection
Bind variable indexing
(Disclaimer: I work for the vendor of jOOQ)
SQLBuilder (http://openhms.sourceforge.net/sqlbuilder/) has been very useful for me.
Some simple examples:
String query1 = new InsertQuery("table1")
.addCustomColumn("s01", "12")
.addCustomColumn("stolbez", 19)
.addCustomColumn("FIRSTNAME", "Alexander")
.addCustomColumn("LASTNAME", "Ivanov")
.toString();
String query2 = new UpdateQuery("table2")
.addCustomSetClause("id", 1)
.addCustomSetClause("FIRSTNAME", "Alexander")
.addCustomSetClause("LASTNAME", "Ivanov")
.toString();
Results:
INSERT INTO table1 (s01,stolbez,FIRSTNAME,LASTNAME) VALUES ('12',19,'Alexander','Ivanov')
UPDATE table2 SET id = 1,FIRSTNAME = 'Alexander',LASTNAME = 'Ivanov'
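For the query-building use case from the question, SQLBuilder also has a SelectQuery. A rough sketch (method names taken from the SQLBuilder javadocs; worth verifying against your version):

import com.healthmarketscience.sqlbuilder.*;

String query3 = new SelectQuery()
        .addCustomColumns(new CustomSql("FIRSTNAME"), new CustomSql("LASTNAME"))
        .addCustomFromTable(new CustomSql("table2"))
        .addCondition(BinaryCondition.equalTo(new CustomSql("id"), 1))
        .toString();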
I have a custom solution for dynamically generating such SQL queries with just 2-3 classes, built for a similar requirement. It is a simple approach.
It is described at Creating Dynamic SQL queries in Java.
For simpler use cases, like a dynamic filter condition based on inputs selected in the UI, one can take the simpler approach of writing the query in the following style:
select t1.id, t1.col1, t1.col2
from table1 t1
where (:col1Value is null or t1.col1 = :col1Value)
and (:col2Value is null or t1.col2 = :col2Value);
Here the bind values for :col1Value and :col2Value may be null; a null value makes its predicate evaluate to true, so that filter is effectively skipped and the query still works fine.
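One common way to execute such a query with named parameters is Spring's NamedParameterJdbcTemplate (an assumption, as is the dataSource; depending on the JDBC driver you may need to declare an SQL type for null binds):

import org.springframework.jdbc.core.namedparam.MapSqlParameterSource;
import org.springframework.jdbc.core.namedparam.NamedParameterJdbcTemplate;

MapSqlParameterSource params = new MapSqlParameterSource()
        .addValue("col1Value", col1Value) // non-null: the filter applies
        .addValue("col2Value", null);     // null: the predicate collapses to true

List<Map<String, Object>> rows = new NamedParameterJdbcTemplate(dataSource)
        .queryForList(
            "select t1.id, t1.col1, t1.col2 from table1 t1 "
          + "where (:col1Value is null or t1.col1 = :col1Value) "
          + "and (:col2Value is null or t1.col2 = :col2Value)", params);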
I have recently started taking much interest in CQL, as I am thinking of using the DataStax Java driver. Previously, I was using column families instead of tables, with the Astyanax driver. I need to clarify something here:
I am using the column family definition below in my production cluster, and I can insert arbitrary columns (with their values) on the fly without actually modifying the column family schema.
create column family FAMILY_DATA
with key_validation_class = 'UTF8Type'
and comparator = 'UTF8Type'
and default_validation_class = 'BytesType'
and gc_grace = 86400;
But after going through this post, it looks like I need to alter the schema every time I get a new column to insert, which is not what I want to do, as I believe CQL3 requires column metadata to exist.
Is there any other way I can still add arbitrary columns and their values if I go with the DataStax Java driver?
Any code samples/examples will help me understand better. Thanks.
I believe in CQL you solve this problem using collections.
You can define the data type of a field to be a map, and then insert arbitrary numbers of key-value pairs into it; that should mostly behave as dynamic columns did in traditional Thrift.
Something like:
CREATE TABLE data ( data_id int PRIMARY KEY, data_time bigint, data_values map<text, float> );
INSERT INTO data (data_id, data_time, data_values) VALUES (1, 21341324, {'sum': 2134, 'avg': 44.5 });
Here is more information.
Additionally, you can find the mapping between the CQL3 types and the Java types used by the DataStax driver here.
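With the DataStax Java driver, writing to that map column might look like the following sketch (the contact point and keyspace name are assumptions):

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;
import java.util.HashMap;
import java.util.Map;

Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
Session session = cluster.connect("my_keyspace"); // hypothetical keyspace

Map<String, Float> values = new HashMap<>();
values.put("sum", 2134f);
values.put("avg", 44.5f);

// Positional bind values; the driver maps java.util.Map to a CQL map
session.execute(
    "INSERT INTO data (data_id, data_time, data_values) VALUES (?, ?, ?)",
    1, 21341324L, values);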
If you enable compact storage for that table, it will be backwards compatible with Thrift and CQL 2.0, both of which allow you to enter dynamic column names.
You can have as many columns with whatever names you want with this approach. The primary key is composed of two parts: the first element, which is the row key, and the remaining elements, which combined form a single column name.
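For instance, a wide-row table along those lines might be declared like this (a sketch; the table and column names are illustrative, and session is a connected DataStax driver Session as above):

session.execute(
    "CREATE TABLE family_data (" +
    "  row_key text," + // first primary key element: the row key
    "  column_name text," + // remaining element: forms the dynamic column name
    "  value blob," +
    "  PRIMARY KEY (row_key, column_name)" +
    ") WITH COMPACT STORAGE");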
See the tweets example here.
Though you've said this is in production already, it may not be possible to alter a table with existing data to use compact storage.