I have an application in Java that requires me to find specific records given specific conditions. For example, I have the table:
id | song                    | artist      | record_label
---+-------------------------+-------------+--------------------
 1 | Never Gonna Give You Up | Rick Astley | Rickroll'd Records
 2 | Blackbird               | The Beatles | Apple Records
 3 | Yesterday               | The Beatles | Apple Records
 4 | WonderWall              | Oasis       | Columbia Records
I'd like to bulk query a subset of them based on specific conditions. Something similar to:
SELECT id FROM songs
WHERE
(song = 'Blackbird' AND artist = 'The Beatles' AND record_label = 'Apple Records') OR
(song = 'WonderWall' AND artist = 'Oasis' AND record_label = 'Columbia Records') OR
(song = 'Yesterday' AND artist = 'The Beatles' AND record_label = 'Apple Records')
The application is going to receive these conditions from the user and could be trying to find thousands of these records. As a result, I'm hoping to find a way to do this without any possibility of SQL injection and in as few queries as possible.
My first approach would be some flavor of PreparedStatement where I iterate through this SQL query to query each individual record:
SELECT id from songs WHERE song = ? AND artist = ? AND record_label = ?
This prevents SQL injection, but I feel like it could be optimized further, since we would be hammering the DB with thousands of these requests within seconds.
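For reference, a minimal sketch of that per-record approach (SongKey is a hypothetical holder for one user-supplied triple, and connection is an open JDBC Connection) would be:

// Sketch only: SongKey is a made-up holder class for one (song, artist, record_label) triple.
String sql = "SELECT id FROM songs WHERE song = ? AND artist = ? AND record_label = ?";
List<Long> ids = new ArrayList<>();
try (PreparedStatement ps = connection.prepareStatement(sql)) {
    for (SongKey key : keys) {
        ps.setString(1, key.getSong());
        ps.setString(2, key.getArtist());
        ps.setString(3, key.getRecordLabel());
        try (ResultSet rs = ps.executeQuery()) {
            while (rs.next()) {
                ids.add(rs.getLong("id"));   // one round trip per record looked up
            }
        }
    }
}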
Another option is to create a temp table, import our passed conditions into the temp table and do an INNER JOIN on the songs table to only retrieve the rows that match between the 2. This solves both problems, but it requires a good amount of development work.
I'm wondering if there's any other methods I haven't taken into account. Thanks in advance for any suggestions!
One way I can think of is to pass the parameters as a JSON string; then you can have a single parameter:
SELECT id
FROM songs
WHERE (song, artist, record_label)
  in (select item ->> 'song',
             item ->> 'artist',
             item ->> 'record_label'
      from jsonb_array_elements(cast(? as jsonb)) as p(item)
     );
The parameter would then be a String passed through PreparedStatement.setString().
For your sample query e.g.
[
{"song": "Blackbird", "artist": "The Beatles", "record_label": "Apple Records"},
{"song": "Wonderwall", "artist": "Oasis", "record_label": "Columbia Records"},
{"song": "Yesterday", "artist": "The Beatles", "record_label": "Apple Records"}
]
Not sure about performance, but the OR condition is usually a performance killer to begin with, so the small overhead of parsing and unnesting the JSON array shouldn't make a big difference.
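For completeness, a rough sketch of the Java side might look like the following. It assumes Jackson's ObjectMapper is available for building the JSON string and a hypothetical Condition class holding the three user-supplied values; any JSON library (or careful manual string building) would work the same way.

// Sketch only: Condition is a hypothetical holder class; Jackson is an assumed dependency.
List<Long> findIds(Connection connection, List<Condition> userConditions) throws Exception {
    String sql = "SELECT id FROM songs"
            + " WHERE (song, artist, record_label) IN ("
            + "   SELECT item ->> 'song', item ->> 'artist', item ->> 'record_label'"
            + "   FROM jsonb_array_elements(cast(? as jsonb)) AS p(item))";

    // Build the JSON array from the conditions and serialize it to one string.
    List<Map<String, String>> conditions = new ArrayList<>();
    for (Condition c : userConditions) {
        conditions.add(Map.of("song", c.getSong(),
                              "artist", c.getArtist(),
                              "record_label", c.getRecordLabel()));
    }
    String json = new ObjectMapper().writeValueAsString(conditions);

    List<Long> ids = new ArrayList<>();
    try (PreparedStatement ps = connection.prepareStatement(sql)) {
        ps.setString(1, json);   // the whole condition list is a single bound parameter
        try (ResultSet rs = ps.executeQuery()) {
            while (rs.next()) {
                ids.add(rs.getLong("id"));
            }
        }
    }
    return ids;
}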
A simple test shows that Spring Boot does not batch the query:
myRepo.findAllById(IntStream
.range(0, 1000000)
.mapToObj(i -> UUID.randomUUID())
.collect(toList()));
producing a big query
... where myRepo0_.id in (? , ? , ?, ...
and failing with big ranges.
The default saveAll JPA implementation does not use batching either.
You should probably first test the speed of performing the queries one by one:
List<MyResult> myResults = myQueryParams
.stream()
.map(qp -> myRepo.findByMyParams(qp...))
.collect(toList());
If that is too slow, check that the query is optimal, and only if speed is really important (e.g. if the transaction is too long for an HTTP request you can do it asynchronously, with paging, ...) should you use batching.
To batch the queries you can create a temporary table (it exists only for your specific session/query, so it will not lock db objects):
long t0 = System.currentTimeMillis();
try (Connection cnx = primaryDataSource.getConnection()) {
    cnx.setAutoCommit(false);
    cnx.createStatement().execute("create temp table resultSet(id uuid)");
    PreparedStatement s = cnx.prepareStatement("insert into resultSet(id) select id from tq_event where id = ?");
    for (int i = 0; i < 1000000; i++) {
        s.setObject(1, UUID.fromString("39907bfb-f77a-47a3-9ab6-2b4794c7d6ec"));
        s.addBatch();
    }
    s.executeBatch();
    cnx.commit();
    ResultSet rs = cnx.createStatement().executeQuery("select id from resultSet");
    while (rs.next())
        System.out.printf("%s%n", rs.getString(1));
    cnx.createStatement().execute("drop table resultSet");
    cnx.commit();
}
System.out.printf("Time: %d mS%n", System.currentTimeMillis() - t0);
Running from my PC and querying a cloud PostgreSQL database over ssh (a very slow connection), querying 1,000,000 rows took Time: 179640 mS (avg 0.18 mS per row).
You can also try @a_horse_with_no_name's solution:
PreparedStatement s = cnx.prepareStatement("select uid from test where (id) in (select (k ->> 'id')::integer from jsonb_array_elements(cast(? as jsonb)) as p(k))");
s.setString(1, "[" + IntStream.range(1, 1000000).mapToObj(i -> "{\"id\": " + i + "}").collect(joining(",")) + "]");
ResultSet rs = s.executeQuery();
while(rs.next())
System.out.printf("%s%n", rs.getString(1));
which runs much faster (only one query is sent to the server), but casting datatypes could be a problem and you have a limit of 2 GB for all your parameters (which looks fine for many cases).
Related
com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '#gmail.com' at line 1
I don't know where the problem is:
public List<UserModel> listUser(String emailParam) throws SQLException {
    List<UserModel> users = new ArrayList<>();
    Connection conn = null;
    PreparedStatement pre = null;
    ResultSet rs = null;
    // Get Connection
    conn = dataSource.getConnection();
    // fetch query
    String fetchUser = "SELECT * FROM user WHERE email = " + emailParam;
    pre = conn.prepareStatement(fetchUser);
    // execute query
    rs = pre.executeQuery();
    // fetch data using resultSet interface;
    while (rs.next()) {
        Integer id = rs.getInt("id");
        String firstName = rs.getString("firstName");
        ...
        String email = rs.getString("email");
        Boolean isActive = rs.getBoolean("isActive");
        Boolean isLibrarian = rs.getBoolean("isLibrarian");
        // insert into user constructor
        UserModel theUser = new UserModel(id, firstName, lastName, gender,
                department, idNo, contactNo, address, email, null,
                isLibrarian, isActive);
        // insert into ArrayList
        users.add(theUser);
    }
    // close connection
    close(conn, pre, rs);
    return users;
}
Where is the problem? Thanks in advance.
The error is here, in listUser():
// fetch query
String fetchUser = "SELECT * FROM user WHERE email = " + emailParam;
pre = conn.prepareStatement(fetchUser);
You managed to use a prepared statement when inserting the user, and you need to do the same here when querying:
// fetch query
String fetchUser = "SELECT * FROM user WHERE email = ?";
pre = conn.prepareStatement(fetchUser);
pre.setString(1, emailParam);
As a general thought, preparing statements has two main uses:
1) Minimise preparation time when executing a query
2) For security - among other things "query rewriting"
I also have an observation about a syntax error in your select at the bottom of this post.
If you are going to prepare statements, then it is better to do it once, then "remember" the preparedStatement that you get back. Do not prepare the same query over and over.
Most, if not all, DBMS's work as follows for prepared query processing:
1) you send the template query to the DBMS for parsing and optimisation. The output of this is known by a few different names, but for the purposes of this we can call this the "executable plan". This is the PrepareXXX call.
2) The DBMS remembers all of those details for the second stage, i.e. when you send the data up as a result of the prepdQuery.executeQuery() (or similar) call. This has the effect of sending up the data and plugging it into the prepared query's executable plan.
This will always involve two network trips (one to prepare and one to execute). However ....
... If you need to run the same query again with different data (e.g. a different email), just execute the second step - this bypasses the overheads associated with parsing and planning. Which will increase your throughput significantly - especially for single row operations such as the insert (and most likely the select) shown above.
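As a minimal sketch of that pattern (assuming an open Connection conn and a hypothetical list of emails to look up), prepare once and execute many times:

// Prepare once ...
try (PreparedStatement ps = conn.prepareStatement("SELECT * FROM user WHERE email = ?")) {
    // ... then re-execute with different parameters; only the data is sent each time,
    // so the parse/plan work from the prepare step is reused.
    for (String email : emailsToLookUp) {
        ps.setString(1, email);
        try (ResultSet rs = ps.executeQuery()) {
            while (rs.next()) {
                System.out.println(rs.getString("email"));
            }
        }
    }
}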
The alternative is the string concatenation method which will always require parsing etc and execution - but at least it will be only one trip over the network. This works best for longer running queries (where parse time is insignificant compared to execution times) or where the query logic is dynamic (made up at run time based upon user input).
However, if you do send the query text concatenated with user input, make sure you address point 2 above (query rewriting).
Also, finally, your concatenated SQL is missing single quotes.
The query must look like this (the text must be quoted)
select ... from ... where email = 'email#domain.com';
Thus your concatenation must look like this:
String fetchUser = "SELECT * FROM user WHERE email = '" + emailParam + "'";
What is query rewriting? Imagine if the emailParam entered by the user looked like this:
emailParam = "'; delete from user all; select 'hello"
Try plugging that into your select BUT DO NOT RUN IT unless you have a backup copy of your users table (or you get lucky).
Also, note that you never put quote marks around the ? placeholders in prepared queries - even if the parameter is a text or date value.
I have performance problems when querying CLOBs and LONGs of big Oracle database tables.
So far, I wrote the following unit tests with cx_Oracle (python) and JDBC (java):
Python code using cx_Oracle:
class CXOraclePerformanceTest(TestCase):

    def test_cx_oracle_performance_with_clob(self):
        self.execute_cx_oracle_performance("CREATE TABLE my_table (my_text CLOB)")

    def test_cx_oracle_performance_with_long(self):
        self.execute_cx_oracle_performance("CREATE TABLE my_table (my_text LONG)")

    def execute_cx_oracle_performance(self, create_table_statement):
        # prepare test data
        current_milli_time = lambda: int(round(time.time() * 1000))
        db = cx_Oracle.connect(CONNECT_STRING)
        db.cursor().execute(create_table_statement)
        db.cursor().execute("INSERT INTO my_table (my_text) VALUES ('abc')")
        for i in range(13):
            db.cursor().execute("INSERT INTO my_table (my_text) SELECT 'abc' FROM my_table")
        row_count = db.cursor().execute("SELECT count(*) FROM my_table").fetchall()[0][0]
        self.assertEqual(8192, row_count)

        # execute query with big result set
        timer = current_milli_time()
        rows = db.cursor().execute("SELECT * FROM my_table")
        for row in rows:
            self.assertEqual("abc", str(row[0]))
        timer = current_milli_time() - timer
        print("{} -> duration: {} ms".format(create_table_statement, timer))

        # clean-up
        db.cursor().execute("DROP TABLE my_table")
        db.close()
Java code using ojdbc7.jar:
public class OJDBCPerformanceTest {

    @Test public void testOJDBCPerformanceWithCLob() throws Exception {
        testOJDBCPerformance("CREATE TABLE my_table (my_text CLOB)");
    }

    @Test public void testOJDBCPerformanceWithLong() throws Exception {
        testOJDBCPerformance("CREATE TABLE my_table (my_text LONG)");
    }

    private void testOJDBCPerformance(String createTableStmt) throws Exception {
        // prepare connection
        OracleConnection connection = (OracleConnection) DriverManager.getConnection(connectionString);
        connection.setAutoCommit(false);
        connection.setDefaultRowPrefetch(512);

        // prepare test data
        Statement stmt = connection.createStatement();
        stmt.execute(createTableStmt);
        stmt.execute("INSERT INTO my_table (my_text) VALUES ('abc')");
        for (int i = 0; i < 13; i++)
            stmt.execute("INSERT INTO my_table (my_text) SELECT 'abc' FROM my_table");
        ResultSet resultSet = stmt.executeQuery("SELECT count(*) FROM my_table");
        resultSet.next();
        Assert.assertEquals(8192, resultSet.getInt(1));

        // execute query with big result set
        long timer = new Date().getTime();
        stmt = connection.createStatement();
        resultSet = stmt.executeQuery("SELECT * FROM my_table");
        while (resultSet.next())
            Assert.assertEquals("abc", resultSet.getString(1));
        timer = new Date().getTime() - timer;
        System.out.println(String.format("%s -> duration: %d ms", createTableStmt, timer));

        // clean-up
        stmt = connection.createStatement();
        stmt.execute("DROP TABLE my_table");
    }
}
Python test output:
CREATE TABLE my_table (my_text CLOB) -> duration: 31186 ms
CREATE TABLE my_table (my_text LONG) -> duration: 218 ms
Java test output:
CREATE TABLE my_table (my_text CLOB) -> duration: 359 ms
CREATE TABLE my_table (my_text LONG) -> duration: 14174 ms
Why is the difference between both durations so high?
What can I do to improve the performance in one or both programs?
Is there any Oracle specific option or parameter which I can use to improve the query performance?
To get the same performance as LONG, you need to tell cx_Oracle to fetch the CLOBs in that fashion. You can look at this sample:
https://github.com/oracle/python-cx_Oracle/blob/master/samples/ReturnLongs.py.
In your code, I added this method:
def output_type_handler(self, cursor, name, defaultType, size, precision, scale):
    if defaultType == cx_Oracle.CLOB:
        return cursor.var(cx_Oracle.LONG_STRING, arraysize = cursor.arraysize)
Then, after the connection to the database has been created, I added this code:
db.outputtypehandler = self.output_type_handler
With those changes, the performance is virtually identical.
Note that behind the scenes, cx_Oracle is using dynamic fetching and allocation. This method works very well for small CLOBs (where small generally means a few megabytes or less). In that case, the database can send the data directly, whereas when LOBs are used, just the locator is returned to the client and then another round trip to the database is required to fetch the data. As you can imagine, that significantly slows down the operation, particularly if the database and client are separated on the network!
After some research I can partly answer my question.
I managed to improve the OJDBC performance. The OJDBC API provides the property useFetchSizeWithLongColumn with which you can query LONG columns very fast.
New query duration:
CREATE TABLE my_table (my_text LONG) -> duration: 134 ms
Oracle documentation:
THIS IS A THIN ONLY PROPERTY. IT SHOULD NOT BE USED WITH ANY OTHER DRIVERS.
If set to "true", the performance when retrieving data in a 'SELECT' will be improved but the default behavior for handling LONG columns will be changed to fetch multiple rows (prefetch size). It means that enough memory will be allocated to read this data. So if you want to use this property, make sure that the LONG columns you are retrieving are not too big or you may run out of memory. This property can also be set as a java property :
java -Doracle.jdbc.useFetchSizeWithLongColumn=true myApplication
Or via API:
Properties props = new Properties();
props.setProperty("useFetchSizeWithLongColumn", "true");
OracleConnection connection = (OracleConnection) DriverManager.getConnection(connectionString, props);
http://docs.oracle.com/cd/E11882_01/appdev.112/e13995/oracle/jdbc/OracleDriver.html
I still have no solution for cx_Oracle. That's why I opened a github issue:
https://github.com/oracle/python-cx_Oracle/issues/63
I'm trying to get the equivalent for this code on Oracle & MySQL
if (vardbtype.equals("POSTGRESQL")) {
    Long previousTxId = 0L;
    Long nextTxId = 0L;
    Class.forName("org.postgresql.Driver");
    System.out.println("----------------------------");
    try (Connection c = DriverManager.getConnection("jdbc:postgresql://localhost:5432/" + vardbserver, vardbuser, vardbpassword);
         PreparedStatement stmts = c.prepareStatement("SELECT * FROM " + vardbname + " where xmin::varchar::bigint > ? and xmin::varchar::bigint < ? ");
         PreparedStatement max = c.prepareStatement("select max(xmin::varchar::bigint) as txid from " + vardbname)
    ) {
        c.setAutoCommit(false);
        while (true) {
            stmts.clearParameters();
            try (ResultSet rss = max.executeQuery()) {
                if (rss.next()) {
                    nextTxId = rss.getLong(1);
                }
            }
            stmts.setLong(1, previousTxId);
            stmts.setLong(2, nextTxId + 1);
            try (ResultSet rss = stmts.executeQuery()) {
                while (rss.next()) {
                    String message = rss.getString("MESSAGE");
                    System.out.println("Message = " + message);
                    TextMessage mssg = session.createTextMessage(message);
                    System.out.println("Sent: " + mssg.getText());
                    producer.send(mssg);
                }
                previousTxId = nextTxId;
            }
            Thread.sleep(batchperiod2);
        }
    }
}
Basically, the code reads the contents of a database table and sends them to ActiveMQ. When the table is updated, it sends only the newly updated content (not the content that was already sent). But this code only works on PostgreSQL.
I'm planning to add an "if" branch so I can get the data from other databases (Oracle and MySQL) as well.
I guess I must change this code, right?
try(Connection c = DriverManager.getConnection("jdbc:postgresql://localhost:5432/"+ vardbserver, vardbuser, vardbpassword);
PreparedStatement stmts = c.prepareStatement("SELECT * FROM "+ vardbname +" where xmin::varchar::bigint > ? and xmin::varchar::bigint < ? ");
PreparedStatement max = c.prepareStatement("select max(xmin::varchar::bigint) as txid from "+ vardbname)
) {
A couple thoughts supplemental to Thorsten's answer.
First, xmin is a system column which is, iirc, stored in the row header on disk. It is updated by writes. I have not yet run into a case where the transaction ids don't increase; however, there has to be some wraparound point. For this reason I think you are better off with a trigger which stores the transaction ids in another table, and using that table to drive your processing.
For Oracle and MySQL, underlying storage is sufficiently different that I don't see how you can do this directly.
If you want a common solution, you want a queue table where you can use a trigger to insert waiting copies, and then select/delete from that in your worker. This will likely work better on MySQL than on PostgreSQL, and for Oracle you want to look at index-organized tables. If autovacuum has trouble keeping up, ask more questions or hire a consultant.
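As a rough sketch of the worker side only (the trigger that copies new rows is DBMS-specific and not shown), assuming a hypothetical message_queue(id, message) table and the same JMS session/producer used in the question:

// Poll-and-delete worker loop for a hypothetical message_queue(id, message) table.
// The trigger that fills message_queue on every insert into the source table is not shown.
try (Connection c = DriverManager.getConnection(url, user, password);
     PreparedStatement del = c.prepareStatement("DELETE FROM message_queue WHERE id = ?")) {
    c.setAutoCommit(false);
    while (true) {
        List<Long> processed = new ArrayList<>();
        try (Statement s = c.createStatement();
             ResultSet rs = s.executeQuery("SELECT id, message FROM message_queue ORDER BY id")) {
            while (rs.next()) {
                processed.add(rs.getLong("id"));
                producer.send(session.createTextMessage(rs.getString("message"))); // hand off to ActiveMQ
            }
        }
        for (Long id : processed) {   // remove everything that was handed off
            del.setLong(1, id);
            del.addBatch();
        }
        del.executeBatch();
        c.commit();
        Thread.sleep(pollIntervalMillis);
    }
}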
After further research
InnoDB provides a DB_TRX_ID column which is similar. Note you cannot assume you have this column if you are running MySQL because MySQL has different table storage engines and not all even support transactions. So that is an important limitation.
I was unable to locate a similar column on Oracle.
This script looks at a table at intervals and outputs all messages inserted since the last loop.
PostgreSQL stores the transaction number that inserted a record, so this can be used to find the newly inserted records (although I am not sure whether it is guaranteed for a new transaction to have a higher number than all previous ones as the script assumes).
Other DBMSs don't have this pseudo column, so you would have to have a timestamp column in your table and use that instead. You'd have to change the two queries as well as the code to match the data type (I suppose java.sql.Timestamp instead of Long, but I am no Java guy).
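A hedged sketch of how the two queries and the loop might change, assuming the table has a created_at timestamp column (that column name is made up; jdbcUrl would be the Oracle or MySQL URL, and the other variables come from the question's code):

// Same polling idea as the xmin version, but driven by a created_at timestamp column.
try (Connection c = DriverManager.getConnection(jdbcUrl, vardbuser, vardbpassword);
     PreparedStatement stmts = c.prepareStatement(
             "SELECT * FROM " + vardbname + " WHERE created_at > ? AND created_at <= ?");
     PreparedStatement max = c.prepareStatement("SELECT MAX(created_at) FROM " + vardbname)) {
    java.sql.Timestamp previous = new java.sql.Timestamp(0L);
    while (true) {
        java.sql.Timestamp next = previous;
        try (ResultSet rs = max.executeQuery()) {
            if (rs.next() && rs.getTimestamp(1) != null) {
                next = rs.getTimestamp(1);
            }
        }
        stmts.setTimestamp(1, previous);
        stmts.setTimestamp(2, next);
        try (ResultSet rs = stmts.executeQuery()) {
            while (rs.next()) {
                producer.send(session.createTextMessage(rs.getString("MESSAGE")));
            }
        }
        previous = next;
        Thread.sleep(batchperiod2);
    }
}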
I have a table with millions of records in it. In order to make the system faster, I need to implement the pagination concept in my Java code. I need to fetch just 1000 records at a time and process them, then pick another 1000 records and do my processing and so on. I have already tried a few things and none of them is working. Some of the things I tried are listed below -
1) String query = "select * from TABLENAME" + " WHERE ROWNUM BETWEEN %d AND %d";
sql = String.format(query, firstrow, firstrow + rowcount);
In the above example, when the query is SELECT * from TABLENAME WHERE ROWNUM BETWEEN 0 AND 10 it gives me a result, but when the query is SELECT * from TABLENAME WHERE ROWNUM BETWEEN 10 AND 20, it returns an empty result set. I even tried to run it directly in the DB and it returned an empty result set (not sure why!).
2) preparedStatement.setFetchSize(100); I have that in my Java code, but it still fetches all the records from the table. Adding this statement didn't affect my code in any way.
Please help!
It sounds like you don't actually need to paginate the results but just to process them in batches. If that is the case, then all you need to do is set the fetch size to 1000 using setFetchSize and iterate over the ResultSet as usual (using resultSet.next()), processing the rows as you iterate (a short sketch follows the links below). There are many resources describing setFetchSize and what it does. Do some research:
What does Statement.setFetchSize(nSize) method really do in SQL Server JDBC driver?
How JDBC Statement.SetFetchsize exaclty works
What and when should I specify setFetchSize()?
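A minimal sketch of that batch-processing approach (assuming an open Connection and the TABLENAME from the question):

// Process the table in driver-level batches of 1000 rows instead of paginating.
try (PreparedStatement ps = connection.prepareStatement("SELECT * FROM TABLENAME")) {
    ps.setFetchSize(1000);   // rows fetched per round trip to the server
    try (ResultSet rs = ps.executeQuery()) {
        while (rs.next()) {
            // process one row here; the driver only holds ~1000 rows at a time
        }
    }
}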
For Oracle pagination there are a lot of resources describing how to do this. Just do a web search. Here are a couple of resources that describe how to do it:
http://www.databasestar.com/limit-the-number-of-rows-in-oracle/
http://ocptechnology.com/how-to-use-row-limiting-clause/
Pagination is not very useful if you do not define a consistent ordering (ORDER BY clause) since you cannot rely on the order they are returned.
This answer explains why your BETWEEN statement is not working: https://stackoverflow.com/a/10318244/908961
From that answer: if you are using an Oracle version older than 12c, you need to do a sub-select to get your results. Something like:
SELECT c.*
FROM (SELECT c.*, ROWNUM as rnum
FROM (SELECT * FROM TABLENAME ORDER BY id) c) c
WHERE c.rnum BETWEEN %d AND %d
If you are using Oracle 12c or greater I would recommend using the newer OFFSET FETCH syntax instead of fiddling with rownum. See the first link above or
http://www.toadworld.com/platforms/oracle/b/weblog/archive/2016/01/23/oracle-12c-enhanced-syntax-for-row-limiting-a-k-a-top-n-queries
So your query would be something like
String query = "select * from TABLENAME OFFSET %d ROWS FETCH NEXT 1000 ONLY";
String.format(query, firstrow);
or using prepared statements
PreparedStatement statement = con.prepareStatement("select * from TABLENAME OFFSET ? ROWS FETCH NEXT 1000 ROWS ONLY");
statement.setInt(1, firstrow);
ResultSet rs = statement.executeQuery();
Alternatively, you can also use the limit keyword as described here http://docs.oracle.com/javadb/10.10.1.2/ref/rrefjdbclimit.html and your query would be something like
String query = "select * from TABLENAME { LIMIT 1000 OFFSET %d }";
String.format(query, firstrow);
The normal way to implement pagination in Oracle is to use an analytic windowing function, e.g. row_number together with an ORDER BY clause that defines the row ordering. The query with the analytic function is then wrapped into an inline view (or a "window"), from which you can query the row numbers you need. Here's an example that queries the first 1000 rows from my_table (ordering by column_to_sort_by):
select rs.* from
(select t.*,
row_number() over (order by column_to_sort_by) as row_num
from my_table t
) rs
where rs.row_num >= 1 and rs.row_num < 1001
order by rs.row_num
A JDBC implementation could then look like the following:
public void queryWithPagination() throws SQLException {
    String query = "select rs.* from"
            + " (select t.*,"
            + " row_number() over (order by column_to_sort_by) as row_num"
            + " from my_table t"
            + " ) rs"
            + " where rs.row_num >= ? and rs.row_num < ?"
            + " order by rs.row_num";

    final int pageSize = 1000;
    int rowIndex = 1;

    try (PreparedStatement ps = myConnection.prepareStatement(query)) {
        do {
            ps.setInt(1, rowIndex);
            ps.setInt(2, rowIndex + pageSize);
            rowIndex += pageSize;
        } while (handleResultSet(ps, pageSize));
    }
}

private boolean handleResultSet(PreparedStatement ps, final int pageSize)
        throws SQLException {
    int rows = 0;
    try (ResultSet rs = ps.executeQuery()) {
        while (rs.next()) {
            /*
             * handle rows here
             */
            rows++;
        }
    }
    return rows == pageSize;
}
Note that the table should remain unchanged while you're reading it so that the pagination works correctly across different query executions.
If there are so many rows in the table that you're running out of memory, you probably need to purge/serialize your list after some pages have been read.
EDIT:
If the ordering of rows doesn't matter to you at all, then -- as @bdrx mentions in his answer -- you probably don't need pagination, and the fastest solution would be to query the table without a WHERE condition in the SELECT. As suggested, you can adjust the fetch size of the statement to a larger value to improve throughput.
Trying my hand at the JSONB datatype for the first time (discussion continued from (Join tables using a value inside a JSONB column) on advice from @Erwin, starting a new thread).
Two tables (obfuscated data and table names):
Discussion table { discussion_id int, contact_id, group_id, discussion_updates jsonb } [has around 600 thousand rows]
Authorization table { user_id varchar , auth_contacts jsonb, auth_groups jsonb } [has around 100 thousand rows]
The auth_contacts jsonb data holds key-value pairs, for example:
{ "CC1": "rr", "CC2": "ro" }
The auth_groups jsonb data holds key-value pairs, for example:
{ "GRP1": "rr", "GRP2": "ro" }
First, on inserts into the database via Java JDBC, what I am doing is:
JSONObject authContacts = new JSONObject();
for (each record in data) {
    authContacts.put(contactKey, contactRight);
    authGroups.put(groupKey, groupRight);
}

String insertSql = "INSERT INTO SSTA_AuthAll(employee_id, auth_contacts, auth_groups) VALUES(?,?::jsonb,?::jsonb)";

//-- Connect to Db and prepare query
preparedStatement.setObject(2, authContacts.toJSONString());
preparedStatement.setObject(3, authGroups.toJSONString());
// INSERT into DB
Now, toJSONString() takes time (as much as 1 second sometimes - TIME FOR toJSON STRING LOOP: 17238 ms), which again is inefficient.
So again, is this the right way to do it? Most examples on Google directly insert a ready-made string.
If I directly insert a Map into the jsonb column, it expects the HSTORE extension, which is what I shouldn't be using if I am going for jsonb?
Now on the next part:
I need to join contact_id from the discussion table with the contact_id keys of the auth_contacts json column [the keys shown in the example above], and likewise join the group_id keys of auth_groups with group_id of the discussion table.
As of now I have tried the join only on contact_id:
SELECT *
FROM discussion d
JOIN
(SELECT user_id, jsonb_object_keys(a.contacts) AS contacts
FROM auth_contacts a
WHERE user_id = 'XXX') AS c ON (d.contact_id = c.contacts::text)
ORDER BY d.updated_date DESC
This join, for a user who has around 60 thousand authorized contacts, takes around 60 ms, and consecutive runs take less. The obfuscated explain plan is as follows:
"Sort (cost=4194.02..4198.39 rows=1745 width=301) (actual time=50.791..51.042 rows=5590 loops=1)"
" Sort Key: d.updated_date"
" Sort Method: quicksort Memory: 3061kB"
" Buffers: shared hit=11601"
" -> Nested Loop (cost=0.84..4100.06 rows=1745 width=301) (actual time=0.481..44.437 rows=5590 loops=1)"
" Buffers: shared hit=11598"
" -> Index Scan using auth_contacts_pkey on auth_contacts a (cost=0.42..8.93 rows=100 width=888) (actual time=0.437..1.074 rows=1987 loops=1)"
" Index Cond: ((user_id)::text = '105037'::text)"
" Buffers: shared hit=25"
" -> Index Scan using discussion_contact_id on discussion d (cost=0.42..40.73 rows=17 width=310) (actual time=0.016..0.020 rows=3 loops=1987)"
" Index Cond: ((contact_id)::text = (jsonb_object_keys(a.contacts)))"
" Buffers: shared hit=11573"
"Planning time: 17.866 ms"
"Execution time: 52.192 ms"
My final aim is an additional join in the same query on group_id too. What jsonb_object_keys actually does is create a userid vs auth_contacts mapping for each key, so for a user with 60 thousand contacts it will create a view of 60 thousand rows (probably in memory). If I also include a join on auth_groups (which for the sample user with 60 thousand contacts would have around 1000 thousand groups), the query would get slower.
So is this the right way to join on a jsonb object, and is there a better way to do this?