How does one filter a SpatialIndexFeatureCollection?

When looking at the documentation for Geotools FeatureCollection, the subsection on Performance Options notes:
TreeSetFeatureCollection: the traditional TreeSet implementation used by default.
Note this does not perform well with spatial queries as the contents
are not indexed.
Later it recommends a SpatialIndexFeatureCollection for faster queries:
SpatialIndexFeatureCollection: uses a spatial index to hold on to
contents for fast visual display in a MapLayer; you cannot add more
content to this feature collection once it is used
DataUtilities.source( featureCollection ) will wrap
SpatialIndexFeatureCollection in a SpatialIndexFeatureSource that is
able to take advantage of the spatial index.
The example given is:
final SimpleFeatureType TYPE =
DataUtilities.createType("location","geom:Point,name:String");
WKTReader2 wkt = new WKTReader2();
SimpleFeatureCollection collection = new SpatialIndexFeatureCollection();
collection.add( SimpleFeatureBuilder.build( TYPE, new Object[]{ wkt.read("POINT(1,2)"), "name1"} ));
collection.add( SimpleFeatureBuilder.build( TYPE, new Object[]{ wkt.read("POINT(4,4)"), "name1"} ));
// Fast spatial Access
SimpleFeatureSource source = DataUtilities.source( collection );
SimpleFeatureCollection features = source.getFeatures( filter );
Besides this code not compiling (SimpleFeatureCollection is an interface and does not declare an add method), SpatialIndexFeatureSource.getFeatures(Filter) directly calls SpatialIndexFeatureCollection.subCollection(Filter), which is defined as
public SimpleFeatureCollection subCollection(Filter filter) {
    throw new UnsupportedOperationException();
}
(source: GeoTools on GitHub)
Here is an example of my own attempt to use this
FilterFactory2 ff = CommonFactoryFinder.getFilterFactory2();
SimpleFeatureCollection answers = getAnswers();
SpatialIndexFeatureCollection collection = new SpatialIndexFeatureCollection();
collection.addAll(answers);
SimpleFeatureSource source = DataUtilities.source(collection);
SimpleFeatureCollection gridCollection = getGridCollection();
SimpleFeatureIterator iter = gridCollection.features();
while (iter.hasNext()) {
    SimpleFeature grid = iter.next();
    Geometry gridCell = (Geometry) grid.getDefaultGeometry();
    Filter gridFilter = ff.intersects(ff.property("geometry"), ff.literal(gridCell));
    SimpleFeatureCollection results = source.getFeatures(gridFilter);
}
Unsurprisingly, this results in an UnsupportedOperationException.
I have not been able to get this example to work and would really like to take advantage of the spatial indexing. How am I supposed to use a SpatialIndexFeatureCollection in a manner similar to the example above?

SpatialIndexFeatureCollection now implements the subCollection method. See the PR here. I haven't had a chance to backport the changes, but future releases will work the way you expected.
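With that fix in place, the pattern from the question should behave the way the documentation describes. A minimal sketch, assuming a GeoTools release that includes the fix and that your feature type's geometry attribute is actually named "geometry" (FeatureIterator is Closeable in recent GeoTools, hence the try-with-resources):
FilterFactory2 ff = CommonFactoryFinder.getFilterFactory2();
SpatialIndexFeatureCollection collection = new SpatialIndexFeatureCollection();
collection.addAll(getAnswers());
SimpleFeatureSource source = DataUtilities.source(collection);

try (SimpleFeatureIterator grids = getGridCollection().features()) {
    while (grids.hasNext()) {
        SimpleFeature grid = grids.next();
        Geometry cell = (Geometry) grid.getDefaultGeometry();
        Filter gridFilter = ff.intersects(ff.property("geometry"), ff.literal(cell));
        // answered from the spatial index via SpatialIndexFeatureSource
        SimpleFeatureCollection results = source.getFeatures(gridFilter);
    }
}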

Related

How to store dictionaries (map) in FlatBuffers in Java

I was learning FlatBuffers from this link, but there was no example demonstrating how to store a dictionary (map). There is a mention of "Storing dictionaries in java/Csharp" in this link, but I did not understand much of it. I come from a Java background. Any example of how to store a dictionary/map in FlatBuffers in Java would be helpful.
I realize this is an old question, but I came across it when I was trying to figure out the same thing. Here is what I did to get a "dictionary/map":
Schema File
namespace com.dictionary;

table DataItem {
    keyValue:string (key);
    name:string;
    age:int;
    codes:[string];
}

table DictionaryRoot {
    items:[DataItem];
}

root_type DictionaryRoot;
When you run this through the FlatBuffers compiler (flatc -j schema.fbs), it will produce two Java files: DictionaryRoot.java and DataItem.java.
In your Java application
Using those two generated Java files you will need to construct the buffer. This has to be done from the innermost data to the outermost. So you need to construct your DataItems (and keep track of their offsets) before your DictionaryRoot.
In this example, let's assume that you have a map of Objects in Java that you need to create the buffer from.
List<Integer> offsets = new ArrayList<>();
FlatBufferBuilder builder = new FlatBufferBuilder(1024);
for (Entry<String, DataObj> entry : map.entrySet()) {
    DataObj dataObj = entry.getValue();
    // use the builder to create the strings and keep track of their offsets
    int keyValueOffset = builder.createString(entry.getKey());
    int nameOffset = builder.createString(dataObj.getName());
    int age = dataObj.getAge(); // scalars are stored directly, not as offsets
    int[] codesOffsets = dataObj.getCodes().stream().mapToInt(builder::createString)
            .toArray();
    // use the builder to create a vector from the string offsets above
    int codesVectorOffset = DataItem.createCodesVector(builder, codesOffsets);
    // now, with all the inner data created, build the DataItem
    DataItem.startDataItem(builder);
    DataItem.addKeyValue(builder, keyValueOffset);
    DataItem.addName(builder, nameOffset);
    DataItem.addAge(builder, age);
    DataItem.addCodes(builder, codesVectorOffset);
    // ensure you 'end' the DataItem to get its offset
    int dataItemOffset = DataItem.endDataItem(builder);
    // track the offsets
    offsets.add(dataItemOffset);
}
// create a vector sorted by the key field from your offsets; sorting is
// critical, because keyed lookup relies on binary search
int sortedVectorOffset = builder.createSortedVectorOfTables(new DataItem(),
        offsets.stream().mapToInt(Integer::intValue).toArray());
// now, with the sorted vector, create the DictionaryRoot
DictionaryRoot.startDictionaryRoot(builder);
DictionaryRoot.addItems(builder, sortedVectorOffset);
int dictRootOffset = DictionaryRoot.endDictionaryRoot(builder);
// signal to the builder that you are done
builder.finish(dictRootOffset);
// write the data to a file
try (FileOutputStream outputStream = new FileOutputStream("output.bin")) {
    outputStream.write(builder.sizedByteArray());
}
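For completeness, here is a sketch of reading the buffer back; the accessor names (getRootAsDictionaryRoot, itemsLength, items, keyValue, name, age) follow flatc's standard generated-code naming for the schema above. flatc also generates a binary-search lookup helper for tables with a (key) field; check your generated DataItem.java for the exact method.
byte[] bytes = Files.readAllBytes(Paths.get("output.bin"));
DictionaryRoot root = DictionaryRoot.getRootAsDictionaryRoot(ByteBuffer.wrap(bytes));
for (int i = 0; i < root.itemsLength(); i++) {
    DataItem item = root.items(i);
    System.out.println(item.keyValue() + " -> " + item.name() + ", age " + item.age());
}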
I hope that will help someone else along their journey using FlatBuffers.

Convert list to hashmap

The title of the question may give you the impression that it is a duplicate, but in my opinion it is not.
I am just a few months old in Java and a month old in MongoDB, Spring Boot and REST.
I have a Mongo collection with 3 fields in a document: _id (default field), appName and appKey. I am using a list to iterate through all the documents and find the one whose appName and appKey match the ones that were passed. This collection currently has only 4 entries, so it runs smoothly. But I have been reading about collections and found that with a larger number of documents, looking the pair up in a list will be much slower than in a HashMap.
But as I have already said, I am quite new to Java and am having a bit of trouble converting my code to a HashMap, so I was hoping someone could guide me through it.
I am also attaching my code for reference.
public List<Document> fetchData() {
    // Collection that stores appName and appKey
    MongoCollection<Document> collection = db.getCollection("info");
    List<Document> nameAndKeyList = new ArrayList<Document>();
    // Getting the list of appName and appKey pairs from the info DB
    AggregateIterable<Document> output = collection
            .aggregate(Arrays.asList(new BasicDBObject("$group", new BasicDBObject("_id",
                    new BasicDBObject("_id", "$id").append("appName", "$appName").append("appKey", "$appKey"))
            )));
    for (Document doc : output) {
        nameAndKeyList.add((Document) doc.get("_id"));
    }
    return nameAndKeyList;
} // End of method
And then I am calling it in another method of the same class:
List<Document> nameAndKeyList = new ArrayList<>();
// InfoController is the name of the class
InfoController obj1 = new InfoController();
nameAndKeyList = obj1.fetchData();
// Fetching and checking, one by one, whether the appName & appKey pair
// is present in the DB.
// If appName & appKey mismatch, increment 'i' and
// check against the other values in the DB
for (int i = 0; i < nameAndKeyList.size(); i++) {
    // ..."followed by my code"
}
And if I am not wrong, there should be no need for the above loop either.
Thanks in advance.
You just need a simple find query to get the record you need directly from Mongo DB.
Document document = collection
.find(new Document("appName", someappname).append("appKey", someappkey)).first();
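Note that first() returns null when nothing matches, so guard for that:
if (document == null) {
    // no matching appName/appKey pair in the DB; handle the mismatch here
}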
First of all, a list is not inherently much slower or faster than a HashMap. A HashMap is commonly used to store key-value pairs such as "ID" and "Name". In your case I see you are using an ArrayList without a specified initial size; consider a LinkedList when you do not know the size in advance, because an ArrayList is backed by an array that grows by copying. If you want to build a HashMap from the list, you need to map an ID to the value of each record, for example:
HashMap<String /*type of the identifier*/, String /*type of value*/> map = new HashMap<String, String>();
for (Document doc : output) {
    map.put(doc.getString("_id"), doc.getString("_value"));
}
First, avoid premature optimization (look up the expression if you don't know what it is). Put a realistic number of thousands of items containing near-realistic data in your list. Try to retrieve an item that isn't there; this forces your for loop to traverse the entire list. See how long it takes. Try a number of times to get an impression of whether you get impatient. If you don't, you're done.
If you find out that you need a speed-up, I agree that a HashMap is one of the obvious solutions to try. One of the first things to consider is a key type for your HashMap. As I understand it, what you need to search for is an item where appName and appKey are both right. A good solution is to write a simple class with these two fields and equals and hashCode methods (I'll call it DocumentHashMapKey for now; think of a better name). For hashCode(), try Objects.hash(appName, appKey). If it doesn't give satisfactory performance with the data you have, consider alternatives. Now you are ready to build your HashMap<DocumentHashMapKey, Document>.
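A minimal sketch of such a key class (the name and fields are taken from the paragraph above; adapt to your model):
import java.util.Objects;

final class DocumentHashMapKey {
    private final String appName;
    private final String appKey;

    DocumentHashMapKey(String appName, String appKey) {
        this.appName = appName;
        this.appKey = appKey;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof DocumentHashMapKey)) return false;
        DocumentHashMapKey other = (DocumentHashMapKey) o;
        return Objects.equals(appName, other.appName)
                && Objects.equals(appKey, other.appKey);
    }

    @Override
    public int hashCode() {
        return Objects.hash(appName, appKey); // as suggested above
    }
}
With that in place, a lookup becomes map.get(new DocumentHashMapKey(appName, appKey)).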
If you’re lazy or just want a first impression of how a HashMap performs, you may also build your keys by concatenating appName + "$##" + appKey (where the string in the middle is something that is unlikely to be part of a name or key) and use HashMap<String, Document>.
Everything I said can be refined depending on your needs. This was just to get you started.
Thanks everyone for your help, without which I would not have got to a solution.
public HashMap<String, String> fetchData() {
    // Collection that stores appName and appKey
    MongoCollection<Document> collection = db.getCollection("info");
    HashMap<String, String> appKeys = new HashMap<String, String>();
    // Getting the list of appName and appKey pairs from the info DB
    AggregateIterable<Document> output = collection
            .aggregate(Arrays.asList(new BasicDBObject("$group", new BasicDBObject("_id",
                    new BasicDBObject("_id", "$id").append("appName", "$appName").append("appKey", "$appKey"))
            )));
    String appName = null;
    String appKey = null;
    for (Document doc : output) {
        Document temp = (Document) doc.get("_id");
        appName = (String) temp.get("appName");
        appKey = (String) temp.get("appKey");
        appKeys.put(appName, appKey);
    }
    return appKeys;
}
Calling the above method into another method of the same class.
InfoController obj = new InfoController();
// Fetching the 'appName' & 'appKey' values sent from the 'info' DB
HashMap<String, String> appKeys = obj.fetchData();
storedAppKey = appKeys.get(appName);
// Handling the case of a mismatch
if (storedAppKey == null || storedAppKey.compareTo(appKey) != 0) {
    // Then the response and further processing that I need to do.
What the HashMap has done is make my code more readable, and the 'for' loop I was using for iteration is gone, although it might not make much difference in performance for now.
Thanks once again to everyone for your help and support.

How do I query in MongoDB with Apache Spark JavaRDDs?

I'd like to imagine there's existing API functionality for this. Suppose there was Java code that looks something like this:
JavaRDD<Integer> queryKeys = ...; //values not particularly important
List<Document> allMatches = db.getCollection("someDB").find(queryKeys); //doesn't work, I'm aware
JavaPairRDD<Integer, Iterator<ObjectContainingKey>> dbQueryResults = ...;
Goal of this: After a bunch of data transformations, I end up with an RDD of integer keys that I'd like to make a single db query with (rather than a bunch of queries) based on this collection of keys.
From there, I'd like to turn the query results into a pair RDD of each key and all of its results in an iterator (making it easy to hit the ground running for the next steps I intend to take). To clarify, I mean a pair of the key and its results as an iterator.
I know there's functionality in MongoDB capable of coordinating with Spark, but I haven't found anything that'll work with this yet (it seems to lean towards writing to a database rather than querying it).
I managed to figure this out in an efficient enough manner.
JavaRDD<Integer> queryKeys = ...;
// turn each key into a single-field query document
JavaRDD<BasicDBObject> queries = queryKeys.map(value -> new BasicDBObject("keyName", value));
// combine all of the key queries into one $or query and run it once
BasicDBObject orQuery = SomeHelperClass.buildOrQuery(queries.collect());
List<Document> queryResults = db.getCollection("docs").find(orQuery).into(new ArrayList<>());
// bring the results back into Spark and group them by key
JavaRDD<Document> parallelResults = sparkContext.parallelize(queryResults);
JavaRDD<ObjectContainingKey> results = parallelResults.map(doc -> SomeHelperClass.fromJSONtoObj(doc));
JavaPairRDD<Integer, Iterable<ObjectContainingKey>> keyResults = results.groupBy(obj -> obj.getKey());
And the method buildOrQuery here:
public static BasicDBObject buildOrQuery(List<BasicDBObject> queries) {
    BasicDBList or = new BasicDBList();
    for (BasicDBObject query : queries) {
        or.add(query);
    }
    return new BasicDBObject("$or", or);
}
Note that there's a fromJSONtoObj method that converts the document back from JSON into all of the required field variables. Also note that obj.getKey() is simply a getter method associated with whatever the "key" is.
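For illustration, a hypothetical fromJSONtoObj could look like the following; the field name "keyName" mirrors the query above, but the constructor and the "someField" field of ObjectContainingKey are assumptions, not part of any library API:
public static ObjectContainingKey fromJSONtoObj(Document doc) {
    // "keyName" matches the query field used above; the constructor is hypothetical
    return new ObjectContainingKey(doc.getInteger("keyName"), doc.getString("someField"));
}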

Sling - AEM sling:members from a new collection

I've recently entered the world of AEM and the Sling API. What I'm trying to do is write Java code to get the sling:members node and its sling:resources property for a new collection I created in the Touch UI. I'm able to reference the collection properties with ResourceResolver.resolve(path). The sling:members show up as { ....}. Do I have to do a separate ResourceResolver?
String path="/content/dam/collections/m/fafdsfdaf/my_collection";
Resource resourceMember = resourceResolver.resolve(path+"/sling:members");
ValueMap metaData = resourceMember.adaptTo(ValueMap.class);
String[] slingResources = metaData.get("sling:resources", new String[0]);
Am I totally off the mark? Any help would be greatly appreciated.
The correct way to get the members of the collection is to use the ResourceCollection API. To do this, you obtain the resource which points to the collection, then adapt it to a ResourceCollection. From there you call getResources(), which returns an iterator over the members.
Resource r = resourceResolver.getResource("/content/dam/collections/m/fafdsfdaf/my_collection");
ResourceCollection collection = r.adaptTo(ResourceCollection.class);
Iterator<Resource> it = collection.getResources();
while (it.hasNext()) {
    Resource p = it.next();
    %><%= p.getPath() %><%
}
Turns out this is the correct way to do this and is working.

Iterate over large collection in MongoDB via spring-data

Friends!
I am using MongoDB in a Java project via spring-data. I use Repository interfaces to access data in collections. For some processing I need to iterate over all elements of a collection. I can use the fetchAll method of the repository, but it always returns an ArrayList.
However, one of the collections is expected to be large: up to a million records of at least several kilobytes each. I suppose I should not use fetchAll in such cases, but I could not find either convenient methods returning an iterator (which might allow the collection to be fetched partially), or convenient methods with callbacks.
I've seen only support for retrieving such collections in pages. I wonder whether that is the only way to work with such collections?
Late response, but maybe it will help someone in the future. Spring Data doesn't provide any API to wrap the MongoDB cursor capabilities. It uses cursors within its find methods, but always returns a completed list of objects. Your options are to use the Mongo API directly or to use the Spring Data paging API, something like this:
final int pageLimit = 300;
int pageNumber = 0;
Page<T> page = repository.findAll(new PageRequest(pageNumber, pageLimit));
while (page.hasNextPage()) {
    processPageContent(page.getContent());
    page = repository.findAll(new PageRequest(++pageNumber, pageLimit));
}
// process the last page
processPageContent(page.getContent());
UPD (!): this method is not sufficient for large sets of data (see Shawn Bush's comments). Please use the Mongo API directly for such cases.
Since this question got bumped recently, this answer needs some more love!
If you use Spring Data Repository interfaces, you can declare a custom method that returns a Stream, and it will be implemented by Spring Data using cursors:
import java.util.stream.Stream;

public interface AlarmRepository extends CrudRepository<Alarm, String> {
    Stream<Alarm> findAllBy();
}
So for large amounts of data you can stream the results and process them one by one without memory limitations.
See https://docs.spring.io/spring-data/mongodb/docs/current/reference/html/#mongodb.repositories.queries
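One caveat worth adding: the returned Stream is backed by an open cursor, so it must be closed when you are done; try-with-resources is the usual pattern. A short usage sketch (the repository name comes from the interface above):
try (Stream<Alarm> alarms = alarmRepository.findAllBy()) {
    alarms.forEach(alarm -> {
        // process one alarm at a time, without loading the whole collection
    });
}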
You can still use mongoTemplate to access the collection and simply use a DBCursor:
DBCollection collection = mongoTemplate.getCollection("boundary");
DBCursor cursor = collection.find();
while (cursor.hasNext()) {
    DBObject obj = cursor.next();
    Object object = obj.get("polygons");
    // ... process the object
}
Use MongoTemplate::stream() as probably the most appropriate Java wrapper for DBCursor.
Another way:
int pageNumber = 0;
final int pageLimit = 300;
Page<T> page;
do {
    page = repository.findAll(new PageRequest(pageNumber, pageLimit));
    processPageContent(page.getContent());
    pageNumber++;
} while (!page.isLastPage());
Check out the newer method for handling results on a per-document basis:
http://docs.spring.io/spring-data/mongodb/docs/current/api/org/springframework/data/mongodb/core/MongoTemplate.html#executeQuery-org.springframework.data.mongodb.core.query.Query-java.lang.String-org.springframework.data.mongodb.core.DocumentCallbackHandler-
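A sketch of that callback-based approach, per the signature linked above (note: in older Spring Data versions the callback receives a DBObject rather than an org.bson.Document):
Query query = new Query(Criteria.where("method").is(method));
mongoTemplate.executeQuery(query, "collectionName", new DocumentCallbackHandler() {
    @Override
    public void processDocument(Document document) {
        // called once per document as the cursor streams results
    }
});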
You may want to try the DBCursor way like this:
DBObject query = new BasicDBObject(); // set up the query criteria
query.put("method", method);
query.put("ctime", (new BasicDBObject("$gte", bTime)).append("$lt", eTime));
logger.debug("query: {}", query);

DBObject fields = new BasicDBObject(); // only fetch the needed fields
fields.put("_id", 0);
fields.put("uId", 1);
fields.put("ctime", 1);

DBCursor dbCursor = mongoTemplate.getCollection("collectionName").find(query, fields);
while (dbCursor.hasNext()) {
    DBObject object = dbCursor.next();
    logger.debug("object: {}", object);
    // do something.
}
The best way to iterate over a large collection is to use the Mongo API directly. I used the code below and it worked like a charm for my use case.
I had to iterate over more than 15M records, and the document size was huge for some of them.
The following code is from a Kotlin Spring Boot app (Spring Boot version: 2.4.5).
fun getAbcCursor(batchSize: Int, from: Long?, to: Long?): MongoCursor<Document> {
    val collection = xyzMongoTemplate.getCollection("abc")
    val query = Document("field1", "value1")
    if (from != null) {
        val fromDate = Date(from)
        val toDate = if (to != null) { Date(to) } else { Date() }
        query.append(
            "createTime",
            Document(
                "\$gte", fromDate
            ).append(
                "\$lte", toDate
            )
        )
    }
    return collection.find(query).batchSize(batchSize).iterator()
}
Then, from a service-layer method, you can keep calling MongoCursor.next() on the returned cursor for as long as MongoCursor.hasNext() returns true.
An important observation: please do not miss setting batchSize on the 'FindIterable' (the return type of MongoCollection.find()). If you don't provide the batch size, the cursor will fetch the initial 101 records and then hang (it tries to fetch all the remaining records at once).
For my scenario, I used a batch size of 2000, as it gave the best results during testing. The optimal batch size depends on the average size of your records.
Here is the equivalent code in Java (removing createTime from the query, as it is specific to my data model).
MongoCursor<Document> getAbcCursor(int batchSize) {
    MongoCollection<Document> collection = xyzMongoTemplate.getCollection("your_collection_name");
    Document query = new Document("field1", "value1"); // query --> {"field1": "value1"}
    return collection.find(query).batchSize(batchSize).iterator();
}
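A usage sketch for either version; MongoCursor is Closeable, so try-with-resources ensures the server-side cursor is released:
try (MongoCursor<Document> cursor = getAbcCursor(2000)) {
    while (cursor.hasNext()) {
        Document doc = cursor.next();
        // process one document at a time; the driver fetches batches behind the scenes
    }
}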
This answer is based on: https://stackoverflow.com/a/22711715/5622596
That answer needs a bit of an update, as PageRequest has changed how it is constructed.
With that said, here is my modified response:
// PageRequest.of is zero-based, so start at page 0
int pageNumber = 0;
// Change the value to whatever size you want the page to have
int pageLimit = 100;
Page<SomeClass> page;
List<SomeClass> compoundList = new LinkedList<>();
do {
    PageRequest pageRequest = PageRequest.of(pageNumber, pageLimit);
    page = repository.findAll(pageRequest);
    List<SomeClass> listFromPage = page.getContent();
    // Do something with this list; here we simply accumulate it
    compoundList.addAll(listFromPage);
    pageNumber++;
} while (!page.isLast());
// Do something with the compoundList; here we simply return it
return compoundList;
