I was able to successfully execute a JPQL query and print the result, which is stored in a queryResults variable. What I want to achieve next is to store just the IDs (the primary key column) in a list, without the date values, but I am not too sure if this is possible; perhaps using something like a Java Map? If it is possible, how can it be easily achieved?
private static final TestDao TEST_DAO = new TestDao();

@Test
public void testById() {
    // findById("") is the method that executes the query in the TestDao class; the resulting records are stored in queryResults
    List<TestEntity> queryResults = TEST_DAO.findById("");
    for (TestEntity qResult : queryResults) { // loop through the query results to print the rows
        System.out.println(qResult.getId());
        System.out.println(qResult.getDate());
    }
    System.out.println("This is the sql result " + queryResults);
}
Output:
This is the sql result [TestEntity(id=101, date=2020-01-19 15:12:32.447), TestEntity(id=102, date=2020-09-01 11:04:10.0)] // I want to get the IDs 101 and 102 and store them in a list, without the dates
I tried using a map this way:
Map<Integer, Timestamp> map = (Map<Integer, Timestamp>) queryResults.get(0);
but I got an exception:
java.lang.ClassCastException: TestEntity cannot be cast to java.util.Map
There are some points to make before the implementation.
Why are you defining the DAO as static? I think this is a bad implementation unless I am missing a particular reason you declared it static. You should define it as a member variable, not a static member.
The naming of the method: findById() reads in English as "find something by this Id", but you are fetching a list of records, so the naming is not correct.
Point 2 becomes invalid if the ID property is not a primary key in your table; then it makes sense, but the naming is still bad. An Id is something we use to define the primary key in a database, and it should be (and will be) unique. But your comments suggest that the ID is unique and is the primary key, so read up on how databases work.
And even if it is not unique: if you pass an Id to find some records, why would you get different Ids back in the records?!
About implementation:
Changing in your existing code:
private TestDao testDao = new TestDao(); // member variable instead of a static field

@Test
public void testById() {
    List<TestEntity> queryResults = testDao.findById("");
    List<Long> listOfIds = new ArrayList<>(); // assuming the Id is of type Long; the same logic works for any type
    for (TestEntity qResult : queryResults) {
        System.out.println(qResult.getId());
        listOfIds.add(qResult.getId()); // just add it to the list
        System.out.println(qResult.getDate());
    }
}
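If you prefer a more compact version of the loop above, here is a stream-based sketch of the same Id collection step (assuming Java 8+ and that getId() returns a Long):

// Requires: import java.util.stream.Collectors;
List<Long> listOfIds = queryResults.stream()
        .map(TestEntity::getId)           // keep only the Id of each entity
        .collect(Collectors.toList());    // gather the Ids into a List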
In case you want to be more efficient with the query:
You can use JPQL with Hibernate (or any other JPA provider).
You can then write a query like:

// Create a TypedQuery using the EntityManager, then fetch the result list
TypedQuery<Long> query = entityManager.createQuery("select te.id from TestEntity te", Long.class);
List<Long> ids = query.getResultList();
In case you are using Spring Data JPA, you can define a repository, declare a method on it, and pass the query with the @Query annotation (see the Spring Data JPA documentation).
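A minimal repository sketch of that approach (TestEntityRepository and findAllIds are hypothetical names; assumes TestEntity has a Long id):

import java.util.List;
import org.springframework.data.jpa.repository.JpaRepository;
import org.springframework.data.jpa.repository.Query;

public interface TestEntityRepository extends JpaRepository<TestEntity, Long> {

    // Fetch only the id column instead of loading whole entities
    @Query("select te.id from TestEntity te")
    List<Long> findAllIds();
}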
I'm trying to use a SimpleQuery to get distinct results from my Solr collection, but even after setting my StatsOptions with calcDistinct set to true, I can't get the result I want.
BTW, I'm using spring-data-solr-2.1.4.RELEASE.
Sample code:
Field field = new SimpleField("fieldName");
StatsOptions statsOptions = new StatsOptions().addField(field).setCalcDistinct(true);
SimpleQuery query = new SimpleQuery("*:*").setStatsOptions(statsOptions);
StatsPage<MyClass> statsPage = solrTemplate.queryForStatsPage(query, MyClass.class);
FieldStatsResult statsResult = statsPage.getFieldStatsResult(field);
Collection<Object> distinctValues = statsResult.getDistinctValues();
Set<String> result = distinctValues.stream().map((i) -> i.toString()).collect(Collectors.toSet());
return result;
After running the code above, all I get is the max, min, and count, but nothing for the distinct count or the distinct values.
What am I doing wrong in this sample?
Looks like your distinctValues collection is an implementation of EmptyList, which means there are no values in the response.
First, check whether your query returns any results at all. Then try also enabling the selective calcDistinct:

StatsOptions statsOptions = new StatsOptions().addField(field).setSelectiveCalcDistinct(true).setCalcDistinct(true);

The processStatsOptions method in org.springframework.data.solr.core.DefaultQueryParser contains the following:
Boolean selectiveCountDistincts = statsOptions.isSelectiveCalcDistincts(field);
if (selectiveCountDistincts != null) {
solrQuery.add(selectiveCalcDistinctParam, String.valueOf(selectiveCountDistincts.booleanValue()));
}
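For completeness, once the stats are actually populated, reading the distinct data back looks like this (a sketch; getDistinctValues() is taken from the question, while getDistinctCount() is assumed to be available in your spring-data-solr version):

FieldStatsResult statsResult = statsPage.getFieldStatsResult(field);
Long distinctCount = statsResult.getDistinctCount();            // assumed accessor
Collection<Object> distinctValues = statsResult.getDistinctValues();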
I'd like to imagine there's existing API functionality for this. Suppose there was Java code that looks something like this:
JavaRDD<Integer> queryKeys = ...; //values not particularly important
List<Document> allMatches = db.getCollection("someDB").find(queryKeys); //doesn't work, I'm aware
JavaPairRDD<Integer, Iterator<ObjectContainingKey>> dbQueryResults = ...;
Goal of this: after a bunch of data transformations, I end up with an RDD of integer keys that I'd like to make a single DB query with (rather than a bunch of individual queries), based on this collection of keys.
From there, I'd like to turn the query results into a pair RDD of each key and all of its results in an iterator (making it easy to hit the ground running for the next steps I intend to take). To clarify, I mean a pair of the key and its results as an iterator.
I know there's functionality in MongoDB for coordinating with Spark, but I haven't found anything that works for this yet (it seems to lean toward writing to a database rather than querying one).
I managed to figure this out in an efficient enough manner.

JavaRDD<Integer> queryKeys = ...;
// Map each key to an equality clause on the key field
JavaRDD<BasicDBObject> queries = queryKeys.map(value -> new BasicDBObject("keyName", value));
// Combine all of the clauses into a single $or query
BasicDBObject orQuery = SomeHelperClass.buildOrQuery(queries.collect());
// Run one query against the database and materialize the results
List<Document> queryResults = db.getCollection("docs").find(orQuery).into(new ArrayList<>());
// Re-distribute the results across the cluster
JavaRDD<Document> parallelResults = sparkContext.parallelize(queryResults);
JavaRDD<ObjectContainingKey> results = parallelResults.map(doc -> SomeHelperClass.fromJSONtoObj(doc));
// Group the converted objects by their key
JavaPairRDD<Integer, Iterable<ObjectContainingKey>> keyResults = results.groupBy(obj -> obj.getKey());
And the method buildOrQuery here:
public static BasicDBObject buildOrQuery(List<BasicDBObject> queries) {
BasicDBList or = new BasicDBList();
for(BasicDBObject query : queries) {
or.add(query);
}
return new BasicDBObject("$or", or);
}
Note that there's a fromJSONtoObj method that converts the object back from JSON into all of the required field variables. Also note that obj.getKey() is simply a getter method associated with whatever the "key" is.
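As a design note, a single $in query may be simpler than building many $or clauses (a sketch, reusing the "keyName" field assumed above):

// One $in clause over the collected keys replaces N $or equality clauses
BasicDBObject inQuery = new BasicDBObject("keyName",
        new BasicDBObject("$in", queryKeys.collect()));
List<Document> queryResults = db.getCollection("docs").find(inQuery).into(new ArrayList<>());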
I have noticed that you can pass "params" straight into the boilerplate code below:
[fooInstanceList: Foo.list(params), fooInstanceTotal: Foo.count()]
Is it possible to pass "params" in as part of a Hibernate criteria for example the one below?
def c = Foo.createCriteria()
def results = c {
not { eq("bar","test") }
}
[fooInstanceList: results, fooInstanceTotal: results.size()]
I am looking to use the "max" and "offset" params so I can use them for paging, for example. I would also like the equivalent of a count over all non-paged results. I think results.size() would only give me the paged results, instead of the desired non-paged total. How would I go about this?
You can use params while using the criteria. I suppose you have a typo: you should be calling c.list.
def c = Foo.createCriteria()
def results = c.list(params) {
not { eq("bar","test") }
}
This assumes params contains max and offset.
The criteria returns a PagedResultList, from which you can get the totalCount. So:

results.totalCount // results.getTotalCount()

should give you the total count, although a second query is always fired to get that count. In this case Hibernate does it for you instead of you doing it explicitly.
Friends!
I am using MongoDB in a Java project via spring-data. I use Repository interfaces to access data in collections. For some processing I need to iterate over all elements of a collection. I can use the fetchAll method of the repository, but it always returns an ArrayList.
However, one of the collections is expected to be large: up to a million records, each at least several kilobytes. I suppose I should not use fetchAll in such cases, but I could find neither convenient methods returning some iterator (which might allow the collection to be fetched partially), nor convenient methods with callbacks.
I've only seen support for retrieving such collections in pages. I wonder whether that is the only way to work with such collections?
Late response, but maybe it will help someone in the future. Spring Data doesn't provide any API to wrap the MongoDB cursor capabilities. It uses the cursor within find methods, but always returns a completed list of objects. Your options are to use the Mongo API directly or to use the Spring Data Paging API, something like this:
final int pageLimit = 300;
int pageNumber = 0;
Page<T> page = repository.findAll(new PageRequest(pageNumber, pageLimit));
while (page.hasNextPage()) {
processPageContent(page.getContent());
page = repository.findAll(new PageRequest(++pageNumber, pageLimit));
}
// process last page
processPageContent(page.getContent());
UPD (!): This method is not sufficient for large sets of data (see @Shawn Bush's comments). Please use the Mongo API directly for such cases.
Since this question got bumped recently, this answer needs some more love!
If you use Spring Data Repository interfaces, you can declare a custom method that returns a Stream, and it will be implemented by Spring Data using cursors:
import java.util.stream.Stream;
public interface AlarmRepository extends CrudRepository<Alarm, String> {
Stream<Alarm> findAllBy();
}
So for a large amount of data, you can stream the records and process them one by one without memory limitations.
See https://docs.spring.io/spring-data/mongodb/docs/current/reference/html/#mongodb.repositories.queries
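A minimal consumption sketch (the try-with-resources block ensures the underlying cursor is closed; process() is a placeholder for your own logic):

try (Stream<Alarm> alarms = alarmRepository.findAllBy()) {
    alarms.forEach(alarm -> process(alarm)); // one document at a time, no full list in memory
}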
You can still use mongoTemplate to access the collection and simply use a DBCursor:
DBCollection collection = mongoTemplate.getCollection("boundary");
DBCursor cursor = collection.find();
while(cursor.hasNext()){
DBObject obj = cursor.next();
Object object = obj.get("polygons");
    // ...
}
Use MongoTemplate::stream() as probably the most appropriate Java wrapper around the DBCursor.
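A sketch of that approach (note: the return type of stream() depends on your Spring Data MongoDB version; this assumes a recent version where it returns a closeable Stream, and process() is a placeholder):

try (Stream<Alarm> stream = mongoTemplate.stream(new Query(), Alarm.class)) {
    stream.forEach(alarm -> process(alarm));
}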
Another way:

int pageNumber = 0;
Page<T> page;
do {
    page = repository.findAll(new PageRequest(pageNumber, pageLimit));
    processPageContent(page.getContent());
    pageNumber++;
} while (!page.isLastPage());
Check the new method for handling results on a per-document basis:
http://docs.spring.io/spring-data/mongodb/docs/current/api/org/springframework/data/mongodb/core/MongoTemplate.html#executeQuery-org.springframework.data.mongodb.core.query.Query-java.lang.String-org.springframework.data.mongodb.core.DocumentCallbackHandler-
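A sketch of that callback API (assumes a driver version where documents are org.bson.Document; process() is a placeholder for your own logic):

mongoTemplate.executeQuery(new Query(), "collectionName", new DocumentCallbackHandler() {
    @Override
    public void processDocument(Document document) {
        // called once per matching document; no full list is materialized
        process(document);
    }
});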
You may want to try the DBCursor way like this:
DBObject query = new BasicDBObject(); //setup the query criteria
query.put("method", method);
query.put("ctime", (new BasicDBObject("$gte", bTime)).append("$lt", eTime));
logger.debug("query: {}", query);
DBObject fields = new BasicDBObject(); //only get the needed fields.
fields.put("_id", 0);
fields.put("uId", 1);
fields.put("ctime", 1);
DBCursor dbCursor = mongoTemplate.getCollection("collectionName").find(query, fields);
while (dbCursor.hasNext()){
DBObject object = dbCursor.next();
logger.debug("object: {}", object);
//do something.
}
The best way to iterate over a large collection is to use the Mongo API directly. I used the code below and it worked like a charm for my use case.
I had to iterate over more than 15M records, and the document size was huge for some of them.
The following code is from a Kotlin Spring Boot app (Spring Boot version: 2.4.5):
fun getAbcCursor(batchSize: Int, from: Long?, to: Long?): MongoCursor<Document> {
val collection = xyzMongoTemplate.getCollection("abc")
val query = Document("field1", "value1")
if (from != null) {
val fromDate = Date(from)
val toDate = if (to != null) { Date(to) } else { Date() }
query.append(
"createTime",
Document(
"\$gte", fromDate
).append(
"\$lte", toDate
)
)
}
return collection.find(query).batchSize(batchSize).iterator()
}
Then, from a service-layer method, you can just keep calling MongoCursor.next() on the returned cursor as long as MongoCursor.hasNext() returns true.
An important observation: do not forget to add batchSize on the 'FindIterable' (the return type of MongoCollection.find()). If you don't provide the batch size, the cursor will fetch the initial 101 records and will hang after that (it tries to fetch all the remaining records at once).
For my scenario, I used a batch size of 2000, as it gave the best results during testing. The optimal batch size will depend on the average size of your records.
Here is the equivalent code in Java (with createTime removed from the query, as it is specific to my data model):

MongoCursor<Document> getAbcCursor(int batchSize) {
    MongoCollection<Document> collection = xyzMongoTemplate.getCollection("your_collection_name");
    Document query = new Document("field1", "value1"); // query --> {"field1": "value1"}
    return collection.find(query).batchSize(batchSize).iterator();
}
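A minimal consumption sketch (MongoCursor is Closeable, so try-with-resources releases it when done; process() is a placeholder):

try (MongoCursor<Document> cursor = getAbcCursor(2000)) {
    while (cursor.hasNext()) {
        process(cursor.next()); // handle one document at a time
    }
}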
This answer is based on: https://stackoverflow.com/a/22711715/5622596
That answer needs a bit of an update, since the way a PageRequest is constructed has changed.
With that said, here is my modified version:
// Page numbers are zero-based, so start at 0
int pageNumber = 0;
// Change the value to whatever size you want the page to have
int pageLimit = 100;
Page<SomeClass> page;
List<SomeClass> compoundList = new LinkedList<>();
do {
    PageRequest pageRequest = PageRequest.of(pageNumber, pageLimit);
    page = repository.findAll(pageRequest);
    List<SomeClass> listFromPage = page.getContent();
    // Do something with this list; example below
    compoundList.addAll(listFromPage);
    pageNumber++;
} while (!page.isLast());

// Do something with the compoundList; example below
return compoundList;