Hibernate-search search by list of numbers - java

I am working in a Hibernate-search, Java application with an entity which has a numeric field indexed:
@Field
@NumericField
private Long orgId;
I want to get the list of entities whose orgId matches any value in a list of Long values. I used "simpleQueryString" because it supports "OR" logic, with the | character separating several target values. I have something like this:
queryBuilder.simpleQueryString().onField("orgId").matching("1|3|8").createQuery()
After running my application I get:
The specified query '+(orgId:1 orgId:3 orgId:8)' contains a string based sub query which targets the numeric encoded field(s) 'orgId'. Check your query or try limiting the targeted entities.
Can somebody tell me what is wrong with this code? Is there another way to do what I need?
=================================
UPDATE 1:
yrodiere's answer solves the issue, but I have another question. I also want to check that the entities match other fields. I know I can use BooleanJunction, but then I need to mix "must" and "should" clauses, right? For example:
BooleanJunction<?> bool = queryBuilder.bool();
for (Integer orgId: orgIds) {
bool.should( queryBuilder.keyword().onField("orgId").matching(orgId).createQuery() );
}
bool.must(queryBuilder.keyword().onField("name").matching("anyName").createQuery() );
This way I am requiring that the entities match a "name" and also match one of the given orgIds. Am I right?

As the error message says:
The specified query [...] contains a string based sub query which targets the numeric encoded field(s) 'orgId'.
simpleQueryString can only be used to target text fields. Numeric fields are not supported.
If your string was generated programmatically, and you have a list of integers, this is what you'll need to do:
List<Integer> orgIds = Arrays.asList(1, 3, 8);
BooleanJunction<?> bool = queryBuilder.bool();
for (Integer orgId: orgIds) {
bool.should( queryBuilder.keyword().onField("orgId").matching(orgId).createQuery() );
}
Query query = bool.createQuery(); // org.apache.lucene.search.Query
query will match documents whose orgId field contains 1, 3 OR 8.
See https://docs.jboss.org/hibernate/search/5.11/reference/en-US/html_single/#_combining_queries
EDIT: If you need additional clauses, I'd recommend not mixing must and should in the same boolean junction, but nesting boolean junctions instead.
For example:
BooleanJunction<?> boolForOrgIds = queryBuilder.bool();
for (Integer orgId: orgIds) {
boolForOrgIds.should(queryBuilder.keyword().onField("orgId").matching(orgId).createQuery());
}
BooleanJunction<?> boolForWholeQuery = queryBuilder.bool();
boolForWholeQuery.must(boolForOrgIds.createQuery());
boolForWholeQuery.must(queryBuilder.keyword().onField("name").matching("anyName").createQuery());
// and add as many "must" as you need
Query query = boolForWholeQuery.createQuery(); // org.apache.lucene.search.Query
Technically you can mix 'must' and 'should', but the effect won't be what you expect: 'should' clauses will become optional and will only raise the score of documents when they match. So, not what you need here.
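The logic that the nested junctions express can be sketched with plain Java predicates. This is only an illustration of the boolean structure, not Hibernate Search API: the Org class and the matchingIds method are hypothetical names introduced for the example.

```java
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;
import java.util.stream.Collectors;

class NestedJunctionSketch {
    // Hypothetical stand-in for the indexed entity; not part of Hibernate Search.
    static final class Org {
        final long orgId;
        final String name;

        Org(long orgId, String name) {
            this.orgId = orgId;
            this.name = name;
        }
    }

    // (orgId == a OR orgId == b OR ...) AND name == wantedName
    static List<Long> matchingIds(List<Org> all, Set<Long> orgIds, String wantedName) {
        // Plays the role of the inner junction made only of "should" clauses.
        Predicate<Org> anyOrgId = o -> orgIds.contains(o.orgId);
        // Plays the role of the second "must" clause of the outer junction.
        Predicate<Org> nameMatches = o -> wantedName.equals(o.name);
        return all.stream()
                .filter(anyOrgId.and(nameMatches)) // both conditions are required
                .map(o -> o.orgId)
                .collect(Collectors.toList());
    }
}
```

An entity with a matching name but an orgId outside the list is rejected here, which is the behaviour the nested "must" clauses are meant to give.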

Related

Java Stream - Retrieving repeated records from CSV

I searched the site and didn't find something similar. I'm newbie to using the Java stream, but I understand that it's a replacement for a loop command. However, I would like to know if there is a way to filter a CSV file using stream, as shown below, where only the repeated records are included in the result and grouped by the Center field.
Initial CSV file
Final result
In addition, the same pair cannot appear in the final result inversely, as shown in the table below:
This shouldn't happen
Is there a way to do it using stream and grouping at the same time, since theoretically, two loops would be needed to perform the task?
Thanks in advance.
You can do it in one pass as a stream with O(n) efficiency:
class PersonKey {
// have a field for every column that is used to detect duplicates
String center, name, mother, birthdate;
public PersonKey(String line) {
// implement String constructor
}
// implement equals and hashCode using all fields
}
List<String> lines; // the input
Set<PersonKey> seen = new HashSet<>();
List<String> duplicates = lines.stream()
.filter(line -> !seen.add(new PersonKey(line))) // keep only lines whose key was already seen
.distinct()
.collect(Collectors.toList());
The trick here is that a HashSet has constant time operations and its add() method returns false if the value being added is already in the set, true otherwise.
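A minimal runnable sketch of that trick, using plain strings instead of PersonKey so it stays self-contained. The class and method names are made up for the example. Note that the lambda mutates the set, so this should only be used on a sequential stream, and that the first occurrence of each value is not part of the result.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

class SeenSetSketch {
    // Returns each value that occurs more than once, reported a single time.
    static List<String> repeated(List<String> values) {
        Set<String> seen = new HashSet<>();
        return values.stream()
                .filter(v -> !seen.add(v)) // add() returns false when v is already in the set
                .distinct()                // a value repeated three times is still listed once
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(repeated(Arrays.asList("a", "b", "a", "c", "b", "a")));
    }
}
```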
What I understood from your examples is that you consider an entry a duplicate if all attributes except the ID have the same value. You can use anyMatch for this:
list.stream().filter(x ->
list.stream().anyMatch(y -> isDuplicate(x, y))).collect(Collectors.toList())
So what does the isDuplicate(x,y) do?
This returns a boolean. You can check whether all the entries have same value except the id in this method:
private boolean isDuplicate(CsvEntry x, CsvEntry y) {
return !x.getId().equals(y.getId())
&& x.getName().equals(y.getName())
&& x.getMother().equals(y.getMother())
&& x.getBirth().equals(y.getBirth());
}
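I've assumed all the fields are Strings; change the checks according to the actual types. This gives you the duplicate entries with their corresponding IDs. Note that the nested stream compares every pair of entries, so the cost grows quadratically with the size of the list.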
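A different technique, sketched here under assumptions, is to group entries by the columns that define a duplicate and keep only groups with more than one member. Each group appears once, so a pair cannot show up again in reverse order. The CsvEntry fields and the choice of key columns (center, name, mother, birth) are guesses based on the snippets above, since the original tables are not available.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class DuplicateGroupsSketch {
    // Hypothetical row type; the field names are assumptions.
    static final class CsvEntry {
        final String id, center, name, mother, birth;

        CsvEntry(String id, String center, String name, String mother, String birth) {
            this.id = id;
            this.center = center;
            this.name = name;
            this.mother = mother;
            this.birth = birth;
        }
    }

    // Maps each duplicated key (center, name, mother, birth) to the ids sharing it.
    static Map<List<String>, List<String>> duplicateIdsByKey(List<CsvEntry> entries) {
        Map<List<String>, List<String>> byKey = entries.stream()
                .collect(Collectors.groupingBy(
                        e -> Arrays.asList(e.center, e.name, e.mother, e.birth),
                        LinkedHashMap::new, // keeps the order in which keys first appear
                        Collectors.mapping(e -> e.id, Collectors.toList())));
        byKey.values().removeIf(ids -> ids.size() < 2); // drop records that are not repeated
        return byKey;
    }
}
```

Because the center is the first element of the key, the result can be iterated center by center if the input is sorted on that column.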

Check if all object entities are equal using Java Streams [duplicate]

I am new to Java 8. I have a list of custom objects of type A, where A is like below:
class A {
int id;
String name;
}
I would like to determine if all the objects in that list have same name. I can do it by iterating over the list and capturing previous and current value of names. In that context, I found How to count number of custom objects in list which have same value for one of its attribute. But is there any better way to do the same in java 8 using stream?
You can map from A to String, apply the distinct intermediate operation, use limit(2) to allow short-circuiting where possible, and then check whether the count is less than or equal to 1. If it is, all objects have the same name; otherwise they do not.
boolean result = myList.stream()
.map(A::getName)
.distinct()
.limit(2)
.count() <= 1;
With the example shown above, we leverage the limit(2) operation so that we stop as soon as we find two distinct object names.
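A self-contained version of this check, as a sketch. The class A below adds the getName() getter that the method reference needs; the wrapper class name is made up. With this approach an empty list counts as "all the same", because the count is 0.

```java
import java.util.List;

class SameNameSketch {
    static final class A {
        final int id;
        final String name;

        A(int id, String name) {
            this.id = id;
            this.name = name;
        }

        String getName() {
            return name;
        }
    }

    // True when the list holds at most one distinct name.
    static boolean allSameName(List<A> list) {
        return list.stream()
                .map(A::getName)
                .distinct()
                .limit(2) // a second distinct name is enough to decide
                .count() <= 1;
    }
}
```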
One way is to get the name of the first element of the list, then call allMatch and check every element against it.
String firstName = yourListOfAs.get(0).name;
boolean allSameName = yourListOfAs.stream().allMatch(x -> x.name.equals(firstName));
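One caveat with this version is that get(0) throws on an empty list. A possible guarded variant is sketched below; treating an empty list as "all the same" and using Objects.equals to tolerate null names are choices made for the example, not something the question specifies.

```java
import java.util.List;
import java.util.Objects;

class AllMatchSketch {
    static final class A {
        final int id;
        final String name;

        A(int id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    static boolean allSameName(List<A> list) {
        if (list.isEmpty()) {
            return true; // assumption: nothing to compare means no mismatch
        }
        String firstName = list.get(0).name;
        // allMatch stops at the first element whose name differs.
        return list.stream().allMatch(a -> Objects.equals(firstName, a.name));
    }
}
```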
Another way is to count the distinct names:
boolean result = myList.stream().map(A::getName).distinct().count() == 1;
Of course, you need to add a getter for the 'name' field.
One more option by using Partitioning. Partitioning is a special kind of grouping, in which the resultant map contains at most two different groups – one for true and one for false.
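With this, you can get the number of matching and non-matching elements. Compare the names with equals rather than ==, since == only checks whether both references point to the same String object:
String firstName = yourListOfAs.get(0).name;
Map<Boolean, List<A>> partitioned = yourListOfAs.stream().collect(Collectors.partitioningBy(a -> firstName.equals(a.name)));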
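A runnable sketch of the partitioning idea that returns counts directly. It uses the two-argument partitioningBy with Collectors.counting() as the downstream collector; the class and method names are invented for the example, and a non-empty list is assumed.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class PartitionSketch {
    static final class A {
        final int id;
        final String name;

        A(int id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    // true -> how many elements share the first element's name, false -> how many do not.
    static Map<Boolean, Long> countByFirstName(List<A> list) {
        String firstName = list.get(0).name; // assumes a non-empty list
        return list.stream().collect(
                Collectors.partitioningBy(a -> firstName.equals(a.name), Collectors.counting()));
    }
}
```

All objects have the same name exactly when the count under the false key is 0.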
With Java 9 you can use takeWhile, which takes elements until the predicate first returns false, similar to a break statement in a while loop:
String firstName = yourListOfAs.get(0).name;
List<A> filterList = yourListOfAs.stream()
.takeWhile(a -> firstName.equals(a.name)).collect(Collectors.toList());
if (filterList.size() == yourListOfAs.size())
{
//all objects have the same name
}
Or use groupingBy then check entrySet size.
boolean b = list.stream()
.collect(Collectors.groupingBy(A::getName,
Collectors.toList())).entrySet().size() == 1;

How can I find Documents with Duplicate Array Elements?

Here is my Document:
{
"_id":"5b1ff7c53e3ac841302cfbc2",
"idProf":"5b1ff7c53e3ac841302cfbbf",
"pacientes":["5b20d2c83e3ac841302cfbdb","5b20d25f3e3ac841302cfbd0"]
}
I want to know how to find a duplicate entry in the array using MongoCollection in Java.
This is what I'm trying:
BasicDBObject query = new BasicDBObject("idProf", idProf);
query.append("$in", new BasicDBObject().append("pacientes", idJugador.toString()));
collection.find(query)
We can try to solve this in your Java-application code.
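private final MongoCollection<Document> collection;
public boolean hasDuplicatePacientes(String idProf) {
Document d = collection.find(eq("idProf", idProf)).first();
if (d == null) {
return false; // no document for this idProf
}
@SuppressWarnings("unchecked")
List<String> pacientes = (List<String>) d.get("pacientes");
int original = pacientes.size();
if (original == 0) {
return false;
}
// a Set drops duplicates, so a smaller size means the array had repeated entries
Set<String> unique = new HashSet<>(pacientes);
return original != unique.size();
}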
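The core of this check does not depend on the MongoDB driver, so it can be tried in isolation. The sketch below uses hypothetical helper names and also shows one way to report which values are repeated.

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class ArrayDuplicatesSketch {
    // A set drops repeated values, so a size difference reveals duplicates.
    static boolean hasDuplicates(List<String> values) {
        return new HashSet<>(values).size() != values.size();
    }

    // Collects the values that occur more than once, in order of first repetition.
    static Set<String> duplicatedValues(List<String> values) {
        Set<String> seen = new HashSet<>();
        Set<String> duplicated = new LinkedHashSet<>();
        for (String value : values) {
            if (!seen.add(value)) { // add() returns false for a value already seen
                duplicated.add(value);
            }
        }
        return duplicated;
    }
}
```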
Or if you're searching for a way to do this fully on db-side, I believe it's also possible with something like Neil Lunn provided.
The best approach really is to compare the length of the array to the length of an array which would have all duplicates removed. A "Set" does not have duplicate entries, so what you need to do is convert an array into a "Set" and test against the original.
Modern MongoDB $expr
Modern MongoDB releases have $expr which can be used with aggregation expressions in a regular query. Here the expressions we would use are $setDifference and $size along with $ne for the boolean comparison:
Document query = new Document(
"$expr", new Document(
"$ne", Arrays.asList(
new Document("$size", "$pacientes"),
new Document("$size",
new Document("$setDifference", Arrays.asList("$pacientes", Collections.emptyList()))
)
)
)
);
MongoCursor<Document> cursor = collection.find(query).iterator();
Which serializes as:
{
"$expr": {
"$ne": [
{ "$size": "$pacientes" },
{ "$size": { "$setDifference": [ "$pacientes", [] ] } }
]
}
}
Here it is actually the $setDifference which is doing the comparison and returning only unique elements. The $size is returning the length, both of the original document array content and the newly reduced "set". And of course where these are "not equal" ( the $ne ) the condition would be true meaning that a duplicate was found in the document.
The $expr operates on receiving a boolean true/false value in order whether to consider the document a match for the condition or not.
Earlier Version $where clause
Basically $where is a JavaScript expression that evaluates on the server
String whereClause = "this.pacientes.length != Object.keys(this.pacientes.reduce((o,e) => Object.assign(o, { [e.valueOf()]: null}), {})).length";
Document query = new Document("$where", whereClause);
MongoCursor<Document> cursor = collection.find(query).iterator();
JavaScript evaluation must not have been explicitly disabled on the server (it is enabled by default), and this approach is not as efficient as using $expr with the native aggregation operators. But JavaScript expressions can be evaluated in the same way using $where, and the argument in Java code is basically sent as a string.
In the expression the .length is a property of all JavaScript arrays, so you have the original document content and the comparison to the "set". The Array.reduce() uses each array element as a "key" in a resulting object, from which the Object.keys() will then return those "keys" as a new array.
Since JavaScript objects work like a Map, only unique keys are allowed and this is a way to get that "set" result. And of course the same != comparison will return true when the removal of duplicate entries resulted in a change of length.
In either case of $expr or $where these are computed conditions which cannot use an index where present on the collection. As such it is generally recommended that additional criteria which use regular equality or range based query expressions which can indeed utilize an index be used alongside these expressions. Such additional criteria in the predicate would improve query performance greatly where an index is in place.

Firestore query order on field with filter on a different field

I have a problem with query conditions in Google Cloud Firestore.
Can anyone help me?
Below is my code to get the first document whose articleId starts with HA_, ordered by id descending:
public Article getLastArticleProvider() {
ApiFuture<QuerySnapshot> query = firebaseDB.getTblArticles()
.whereGreaterThanOrEqualTo("articleId", "HA_")
.orderBy("id", Query.Direction.DESCENDING)
.limit(1)
.get();
QuerySnapshot snapshotApiFuture;
Article article = null;
try {
snapshotApiFuture = query.get();
List<QueryDocumentSnapshot> documents = snapshotApiFuture.getDocuments();
for (QueryDocumentSnapshot document : documents) {
article = document.toObject(Article.class);
}
} catch (InterruptedException | ExecutionException e) {
return null;
}
return article;
}
I want to get the last id of the articles whose articleId starts with "HA_" or "XE_".
For example:
if(articleId start with "HA_") => return object of id 831
if(articleId start with "XE_") => return object of id 833
Now I get an error:
java.util.concurrent.ExecutionException: com.google.api.gax.rpc.InvalidArgumentException: io.grpc.StatusRuntimeException: INVALID_ARGUMENT: inequality filter property and first sort order must be the same: articleId and id
TL;DR
Add a dummy order right after your inequality filter property and change your query priority to fit your desires.
I encountered the same issue in JavaScript, but I believe the solution also applies to Java.
Full Solution:
When I run my query:
module.exports.getLastGroupChat = async function (groupId) {
let colRef = db.collection('messages')
colRef = colRef.where('groupId', '==', groupId)
colRef = colRef.where('userId', '!=', '')
colRef = colRef.orderBy('timestamp', 'desc').limit(1)
const doc = await colRef.get()
return doc
}
And received:
inequality filter property and first sort order must be the same:
userId and timestamp
To solve that issue, first of all, I had to add a sort order, of the same inequality property, right after my inequality filter.
In addition, I had to change my query priority to achieve a dummy sort order of the inequality property.
Note: You can run where -> order -> where -> order on the same query!
module.exports.getLastGroupChat = async function (groupId) {
let colRef = db.collection('messages')
colRef = colRef.where('userId', '!=', '')
colRef = colRef.orderBy('userId', 'desc')
colRef = colRef.where('groupId', '==', groupId)
colRef = colRef.orderBy('timestamp', 'desc').limit(1)
const doc = await colRef.get()
return doc
}
That query worked perfectly on my local debug Firestore. Push your changes to your Firebase Cloud Functions and trigger your function. Check your function logs; you may get an indexing error:
The query requires an index. You can create it here: https://console.firebase.google.com/v1/r/project...
Make sure you follow the link and build the index. It will take about five to ten minutes to take effect. Then run your function again and everything should be just fine.
Have fun! :)
Perhaps Firebase has changed things since this question was asked.
The answer is that you CAN filter and order by different fields.
You can chain orderBy calls and filters. However, if you apply a range filter first, you have to order by the filtered field before you can order by any other field.
e.g.
citiesRef.where('population', '>', 2500000).orderBy('population').orderBy('country');
It's in their docs.
https://firebase.google.com/docs/firestore/query-data/order-limit-data#order_and_limit_data
However, if you have a filter with a range comparison (<, <=, >, >=), your first ordering must be on the same field, see the list of orderBy() limitations below.
You can chain orderBys on different fields; I just did it in one of my queries. (You may get an error the first time you run it asking you to create an index; it even includes a link to create the index.)
await db.collection('daily_equipments').where('date', '<', tomorrow._d).where('date', '>=', date).orderBy('date').orderBy('order','asc').get();
As the following error says:
inequality filter property and first sort order must be the same: articleId and id
So you cannot apply a range filter on one property and use a different property as the first sort order, which is what happens in your case with articleId and id.
There is also an example of how not to do it in the official documentation:
Range filter and first orderBy on different fields
citiesRef.whereGreaterThan("population", 100000).orderBy("country"); //Invalid
So to solve this, you should filter and order on the same document property.
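If the query is ordered by articleId as required, picking the entry with the largest id can still be done in application code once the matching documents are fetched. The sketch below works on an in-memory list and does not call the Firestore API; the Article class and latestWithPrefix are hypothetical, and it assumes the candidate set is small enough to load. It also illustrates that strings compare lexicographically, so a filter of the form articleId >= "HA_" on its own also matches values starting with "XE_".

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

class LatestByPrefixSketch {
    // Hypothetical stand-in for the document class; only the two relevant fields.
    static final class Article {
        final long id;
        final String articleId;

        Article(long id, String articleId) {
            this.id = id;
            this.articleId = articleId;
        }
    }

    // Returns the article with the highest id among those whose articleId starts with prefix.
    static Optional<Article> latestWithPrefix(List<Article> articles, String prefix) {
        return articles.stream()
                .filter(a -> a.articleId.startsWith(prefix)) // a true prefix check, not just >=
                .max(Comparator.comparingLong((Article a) -> a.id));
    }
}
```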

Morphia to return List of strings which are the fields of all documents

If I got a collection full of following elements
@Entity
public class MyEntity {
public String name;
public String type;
...
}
And I want to return a List<String> (or Set) of not the elements, but only their name fields.
List<String> allNames = datastore.find(MyEntity.class).asList("name");
This is sample query, there is no such method of Morphia datastore.
To limit the fields returned call the "retrievedFields" method on Query. For example, to only get the name field of all MyEntity objects:
datastore.find(MyEntity.class).retrievedFields( true, "name").asList()
Edit - You can get a list of Strings using the following query, as long as you don't mind that the list will only contain unique values (i.e. no duplicate names):
DBCollection m = datastore.getCollection( MyEntity.class );
List names = m.distinct( "name", new BasicDBObject() );
The "names" list will only contain Strings.
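If duplicate names must be kept, one option is to map the entities returned by the projection query to their name field in application code. The sketch below shows only that mapping step on an in-memory list; it does not call Morphia, and the helper name is made up.

```java
import java.util.List;
import java.util.stream.Collectors;

class NameProjectionSketch {
    // Mirrors the entity from the question, without the Morphia annotation.
    static final class MyEntity {
        final String name;
        final String type;

        MyEntity(String name, String type) {
            this.name = name;
            this.type = type;
        }
    }

    // Extracts the name of every entity, keeping order and duplicates.
    static List<String> names(List<MyEntity> entities) {
        return entities.stream()
                .map(e -> e.name)
                .collect(Collectors.toList());
    }
}
```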
The problem here is that there is no actual query for "keys". The queries all return "key/value pairs".
In theory, the fields in datastore.find() should map to the fields in MyEntity so you could just use reflection. However, if you have other people writing to the DB from different places they may have seeded extra tables.
If this is the case, you will need to run a Map/Reduce to get the list of all the "key" names.
There is a sample here.
You are looking for datastore.find(MyEntity.class).retrievedFields(true, "name").asList();. This will contain the _id and name attribute.
See http://code.google.com/p/morphia/wiki/Query#Ignoring_Fields
