Storing numeric values in Lucene 6.5.0


I need to store a numeric field in Lucene documents, but in Lucene 6.5.1 the signature of the numeric field class is
NumericDocValuesField(String name, long value)
In older Lucene versions the constructor looked like this:
NumericField(String, Field.Store, boolean)
Can someone explain how to store numeric values in a document using Lucene 6.5.1?
Regards,
Raghavan

NumericDocValuesField is used for scoring/sorting only:
http://lucene.apache.org/core/6_5_0/core/org/apache/lucene/document/NumericDocValuesField.html
If you want to store any kind of value (including numeric values), you have to use a StoredField:
https://lucene.apache.org/core/6_5_0/core/org/apache/lucene/document/StoredField.html
Depending on what you need, you add several fields with the same name, one per purpose. If you have a long value and want to run range queries on it and sort by it, you would do something like this:
// for range queries
doc.add(new LongPoint(field, value));
// for storing the value (so it can be read back from search hits)
doc.add(new StoredField(field, value));
// for sorting / scoring
doc.add(new NumericDocValuesField(field, value));
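A minimal end-to-end sketch of the three-field pattern above, assuming lucene-core and lucene-analyzers-common 6.5 on the classpath (the analyzer is only needed to build the IndexWriterConfig). The field name `price`, the class `NumericFieldDemo` and its methods are hypothetical. It indexes each value as a point, a stored field and doc values, runs `LongPoint.newRangeQuery`, sorts on the doc values in descending order, and reads the stored value back from each hit.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.LongPoint;
import org.apache.lucene.document.NumericDocValuesField;
import org.apache.lucene.document.StoredField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;
import org.apache.lucene.search.TopFieldDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;

class NumericFieldDemo {
    static final String FIELD = "price"; // hypothetical field name

    /** One document carrying the same long value for three purposes. */
    static Document newDoc(long value) {
        Document doc = new Document();
        doc.add(new LongPoint(FIELD, value));             // indexed: exact and range queries
        doc.add(new StoredField(FIELD, value));           // stored: readable from search hits
        doc.add(new NumericDocValuesField(FIELD, value)); // doc values: sorting
        return doc;
    }

    /** Indexes the values and returns those in [lower, upper], sorted in descending order. */
    static long[] rangeDescending(long[] values, long lower, long upper) {
        try (Directory dir = new RAMDirectory()) {
            try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
                for (long v : values) {
                    writer.addDocument(newDoc(v));
                }
            }
            try (DirectoryReader reader = DirectoryReader.open(dir)) {
                IndexSearcher searcher = new IndexSearcher(reader);
                Query range = LongPoint.newRangeQuery(FIELD, lower, upper); // bounds are inclusive
                Sort descending = new Sort(new SortField(FIELD, SortField.Type.LONG, true));
                TopFieldDocs top = searcher.search(range, Math.max(1, values.length), descending);
                long[] result = new long[top.scoreDocs.length];
                for (int i = 0; i < result.length; i++) {
                    Document hit = searcher.doc(top.scoreDocs[i].doc);
                    result[i] = hit.getField(FIELD).numericValue().longValue();
                }
                return result;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For example, indexing 5, 42, 17 and 100 and asking for the range [10, 50] should return 42 and then 17.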

Use the type-specific numeric fields (these exist in Lucene 5.x and earlier; in Lucene 6.x they were renamed to LegacyIntField, LegacyLongField, LegacyFloatField and LegacyDoubleField and deprecated in favor of the point fields):
IntField intField = new IntField("int_value", 100, Field.Store.YES);
LongField longField = new LongField("long_value", 100L, Field.Store.YES);
FloatField floatField = new FloatField("float_value", 100.0F, Field.Store.YES);
DoubleField doubleField = new DoubleField("double_value", 100.0D, Field.Store.YES);
You can store their values and sort on them if you need to. All of these fields are indexed.
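On Lucene 6.x, a rough replacement for these legacy fields is to combine a point field, a StoredField and a doc-values field per value, following the LongPoint / StoredField / NumericDocValuesField pattern from the previous answer. A sketch for int and double, assuming lucene-core 6.x; the class and method names are hypothetical, and float values work the same way with FloatPoint and FloatDocValuesField.

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.DoubleDocValuesField;
import org.apache.lucene.document.DoublePoint;
import org.apache.lucene.document.IntPoint;
import org.apache.lucene.document.NumericDocValuesField;
import org.apache.lucene.document.StoredField;

class LegacyFieldMapping {
    // Rough stand-in for new IntField(name, value, Field.Store.YES), plus sorting support.
    static void addInt(Document doc, String name, int value) {
        doc.add(new IntPoint(name, value));              // exact and range queries
        doc.add(new StoredField(name, value));           // value retrievable from hits
        doc.add(new NumericDocValuesField(name, value)); // sort with SortField.Type.INT
    }

    // Same idea for new DoubleField(name, value, Field.Store.YES).
    static void addDouble(Document doc, String name, double value) {
        doc.add(new DoublePoint(name, value));           // exact and range queries
        doc.add(new StoredField(name, value));           // value retrievable from hits
        doc.add(new DoubleDocValuesField(name, value));  // sort with SortField.Type.DOUBLE
    }
}
```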

Related

BigTable ReadModifyWriteRow Support for Mapping Function

I am aware that Bigtable supports append and increment operations through ReadModifyWriteRow requests, but I'm wondering whether there is support for, or an alternative to, more generic mapping functions, where the value from the cell can be read and modified within some sort of closure. For instance, bitwise ANDing a long value in a cell:
Function<Long, Long> modifyFunc = f -> f & 10L;
ReadModifyWriteRow
.create("tableName", "rowKey")
.apply("family", "qualifier", modifyFunc);

Bigtable does not support a mapping like this, but here is an option you could try. It only works with single-cluster instances, because it relies on the strong consistency those provide.
You could add a column that tracks a row version (separate from Bigtable's built-in cell versions), then read the data and the version, modify the data in memory, and write it back with a checkAndMutate conditioned on the version you read. Something like this:
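Row row = dataClient.readRow(tableId, rowkey);
// Read the current value and the current version from their columns (newest cell first).
RowCell valueCell = row.getCells(COLUMN_FAMILY_NAME, COLUMN_NAME).get(0);
RowCell versionCell = row.getCells(COLUMN_FAMILY_NAME, VERSION_COLUMN).get(0);
ByteString value = valueCell.getValue();
ByteString version = versionCell.getValue();
// Do your mapping to the new value, and compute the next version.
ByteString newValue = ...;
ByteString newVersion = ...;
// Write the new value and the new version in the same mutation.
Mutation mutation =
Mutation.create()
.setCell(COLUMN_FAMILY_NAME, ByteString.copyFromUtf8(COLUMN_NAME), newValue)
.setCell(COLUMN_FAMILY_NAME, ByteString.copyFromUtf8(VERSION_COLUMN), newVersion);
// Filter on the version column: latest cell only, and only if it still holds the version we read.
Filter filter =
FILTERS
.chain()
.filter(FILTERS.family().exactMatch(COLUMN_FAMILY_NAME))
.filter(FILTERS.qualifier().exactMatch(VERSION_COLUMN))
.filter(FILTERS.limit().cellsPerColumn(1))
.filter(FILTERS.value().exactMatch(version));
ConditionalRowMutation conditionalRowMutation =
ConditionalRowMutation.create(tableId, rowkey).condition(filter).then(mutation);
// false means another writer changed the version first: re-read the row and retry.
boolean success = dataClient.checkAndMutateRow(conditionalRowMutation);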
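The read / modify / conditional-write cycle above is optimistic concurrency control, and in practice it runs in a retry loop. The sketch below simulates that loop in plain Java: an `AtomicReference` stands in for the row, and the hypothetical `checkAndMutate` method plays the role of `checkAndMutateRow`. It is not the Bigtable client API; it only illustrates the retry logic you would wrap around the real call.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.LongUnaryOperator;

class OptimisticUpdate {
    /** Immutable snapshot of a hypothetical row: a value plus a version counter. */
    static final class Row {
        final long value;
        final long version;
        Row(long value, long version) { this.value = value; this.version = version; }
    }

    // In-memory stand-in for a single row (hypothetical, not Bigtable).
    private final AtomicReference<Row> row;

    OptimisticUpdate(long initial) { row = new AtomicReference<>(new Row(initial, 0L)); }

    /** Stand-in for checkAndMutateRow: write only if the version is still the one we read. */
    private boolean checkAndMutate(long expectedVersion, long newValue) {
        Row current = row.get();
        if (current.version != expectedVersion) {
            return false;
        }
        return row.compareAndSet(current, new Row(newValue, expectedVersion + 1));
    }

    /** Read, map in memory, write conditionally, retry on conflict. Returns the new value. */
    long apply(LongUnaryOperator mapping) {
        while (true) {
            Row snapshot = row.get();                            // read value + version
            long newValue = mapping.applyAsLong(snapshot.value); // modify in memory
            if (checkAndMutate(snapshot.version, newValue)) {    // conditional write
                return newValue;
            }
            // Another writer bumped the version in between: loop and re-read.
        }
    }

    long value() { return row.get().value; }
    long version() { return row.get().version; }
}
```

Usage mirrors the question's closure: `new OptimisticUpdate(15L).apply(v -> v & 10L)` yields 10 and bumps the version to 1.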

Hibernate-search search by list of numbers

I am working in a Hibernate-search, Java application with an entity which has a numeric field indexed:
@Field
@NumericField
private Long orgId;
I want to get the list of entities whose orgId matches any value in a list of Long values. I used simpleQueryString because it lets me express "OR" logic with the | character for several target values. I have something like this:
queryBuilder.simpleQueryString().onField("orgId").matching("1|3|8").createQuery()
When I run my application I get:
The specified query '+(orgId:1 orgId:3 orgId:8)' contains a string based sub query which targets the numeric encoded field(s) 'orgId'. Check your query or try limiting the targeted entities.
So, can somebody tell me what is wrong with this code? Is there another way to do what I need?
=================================
UPDATE 1:
yrodiere's answer solves the issue, but I have another question: I also want to check whether entities match other fields. I know I can use BooleanJunction, but then I need to mix "must" and "should" clauses, right? I.e.:
BooleanJunction<?> bool = queryBuilder.bool();
for (Integer orgId: orgIds) {
bool.should( queryBuilder.keyword().onField("orgId").matching(orgId).createQuery() );
}
bool.must(queryBuilder.keyword().onField("name").matching("anyName").createQuery() );
Then I am checking that the entities must match a "name" and also match one of the given orgIds. Am I right?

As the error message says:
The specified query [...] contains a string based sub query which targets the numeric encoded field(s) 'orgId'.
simpleQueryString can only be used to target text fields. Numeric fields are not supported.
If your string was generated programmatically, and you have a list of integers, this is what you'll need to do:
List<Integer> orgIds = Arrays.asList(1, 3, 8);
BooleanJunction<?> bool = queryBuilder.bool();
for (Integer orgId: orgIds) {
bool.should( queryBuilder.keyword().onField("orgId").matching(orgId).createQuery() );
}
Query query = bool.createQuery(); // org.apache.lucene.search.Query
query will match documents whose orgId field contains 1, 3 OR 8.
See https://docs.jboss.org/hibernate/search/5.11/reference/en-US/html_single/#_combining_queries
EDIT: If you need additional clauses, I'd recommend not mixing must and should in the same boolean junction, but nesting boolean junctions instead.
For example:
BooleanJunction<?> boolForOrgIds = queryBuilder.bool();
for (Integer orgId: orgIds) {
boolForOrgIds.should(queryBuilder.keyword().onField("orgId").matching(orgId).createQuery());
}
BooleanJunction<?> boolForWholeQuery = queryBuilder.bool();
boolForWholeQuery.must(boolForOrgIds.createQuery());
boolForWholeQuery.must(queryBuilder.keyword().onField("name").matching("anyName").createQuery());
// and add as many "must" as you need
Query query = boolForWholeQuery.createQuery(); // org.apache.lucene.search.Query
Technically you can mix 'must' and 'should', but the effect won't be what you expect: 'should' clauses will become optional and will only raise the score of documents when they match. So, not what you need here.
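The difference between mixing and nesting can be seen without Lucene. The sketch below is a simplified, hypothetical model of boolean matching that ignores scoring, with a made-up `Org` entity; it is not the Hibernate Search or Lucene API. With at least one MUST clause present, the SHOULD clauses become optional, so the mixed version also returns entities whose orgId is not in the list.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

class BooleanClauseDemo {
    /** Hypothetical entity with the two fields from the question. */
    static final class Org {
        final long orgId;
        final String name;
        Org(long orgId, String name) { this.orgId = orgId; this.name = name; }
    }

    /**
     * Simplified matching rule: if there is at least one MUST clause, all MUST clauses must match
     * and SHOULD clauses are optional (they would only affect the score). With no MUST clauses,
     * at least one SHOULD clause must match.
     */
    static boolean matches(Org o, List<Predicate<Org>> must, List<Predicate<Org>> should) {
        if (!must.isEmpty()) {
            return must.stream().allMatch(p -> p.test(o));
        }
        return should.stream().anyMatch(p -> p.test(o));
    }

    static List<Long> filter(List<Org> all, List<Predicate<Org>> must, List<Predicate<Org>> should) {
        List<Long> out = new ArrayList<>();
        for (Org o : all) {
            if (matches(o, must, should)) {
                out.add(o.orgId);
            }
        }
        return out;
    }

    /** MUST(name) and SHOULD(orgId == id) mixed in one junction. */
    static List<Long> mixed(List<Org> all, Set<Long> ids, String name) {
        List<Predicate<Org>> must = new ArrayList<>();
        must.add(o -> o.name.equals(name));
        List<Predicate<Org>> should = new ArrayList<>();
        for (Long id : ids) {
            should.add(o -> o.orgId == id); // one SHOULD clause per orgId
        }
        return filter(all, must, should);
    }

    /** MUST(inner SHOULD-only junction over orgIds) and MUST(name), as in the nested answer. */
    static List<Long> nested(List<Org> all, Set<Long> ids, String name) {
        List<Predicate<Org>> orgIdClauses = new ArrayList<>();
        for (Long id : ids) {
            orgIdClauses.add(o -> o.orgId == id);
        }
        List<Predicate<Org>> must = new ArrayList<>();
        must.add(o -> matches(o, Collections.emptyList(), orgIdClauses));
        must.add(o -> o.name.equals(name));
        return filter(all, must, Collections.emptyList());
    }
}
```

With entities (1, "anyName"), (3, "other"), (5, "anyName"), (8, "anyName") and orgIds {1, 3, 8}, the mixed version keeps 1, 5 and 8, while the nested version keeps only 1 and 8.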

Sorting search result in Lucene based on a numeric field

I have some docs with two fields: text, count.
I've used Lucene to index docs and now I want to search in text and get the result sorted by count in descending order. How can I do that?

The default search implementation of Apache Lucene returns results sorted by score (the most relevant result first), then by document id (the earliest indexed document first).
This behavior can be customized at query time with an additional Sort parameter:
TopFieldDocs Searcher#search(Query query, Filter filter, int n, Sort sort)
The Sort parameter specifies the fields or properties used for sorting. The default implementation is defined this way:
new Sort(new SortField[] { SortField.FIELD_SCORE, SortField.FIELD_DOC });
To change the sorting, just replace these fields with the ones you want:
new Sort(new SortField[] {
SortField.FIELD_SCORE,
new SortField("field_1", SortField.STRING),
new SortField("field_2", SortField.STRING) });
This sounds simple, but it will not work until the following conditions are met:
You have to specify the type parameter of SortField(String field, int type) to make Lucene find your field, even though this is normally optional.
The sort fields must be indexed but not tokenized:
document.add (new Field ("byNumber", Integer.toString(x), Field.Store.NO, Field.Index.NOT_ANALYZED));
The content of the sort fields must be plain text only. If even a single element has a special character or accent in one of the fields used for sorting, the whole search will return unsorted results.
Check this tutorial.

The line below will do the trick. The last parameter is the boolean reverse; if you set it to true, results are sorted in reverse order, i.e. descending in your case.
SortField longSort = new SortedNumericSortField(FIELD_NAME_LONG, SortField.Type.LONG, true);
Sample code:
IndexSearcher searcher = new IndexSearcher(reader);
Query q = new MultiFieldQueryParser(new String[] { FIELD_NAME_NAME }, analyzer).parse("YOUR_QUERY");
SortField longSort = new SortedNumericSortField(FIELD_NAME_LONG, SortField.Type.LONG, true);
Sort sort = new Sort(longSort);
ScoreDoc[] hits = searcher.search(q, 10 , sort).scoreDocs;
You also need to add your sort-enabled field as a NumericDocValuesField when you build the index:
doc.add(new NumericDocValuesField(FIELD_NAME_LONG, longValue)); // sort-enabled field
The code targets lucene-core 5.0.0.
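Applied to the question's text/count documents, a self-contained sketch might look like the following. It assumes lucene-core and lucene-analyzers-common 6.x (not the 5.0.0 version above) and uses a plain `SortField` with `SortField.Type.LONG` and `reverse = true` rather than `SortedNumericSortField`; the class `CountSortDemo` and its method are hypothetical. The count is indexed twice: as doc values for sorting and as a stored field so the sketch can read it back.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.NumericDocValuesField;
import org.apache.lucene.document.StoredField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopFieldDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;

class CountSortDemo {
    /** Indexes (text, count) pairs, searches text for one term, returns counts in descending order. */
    static long[] searchSortedByCount(String[] texts, long[] counts, String term) {
        try (Directory dir = new RAMDirectory()) {
            try (IndexWriter w = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
                for (int i = 0; i < texts.length; i++) {
                    Document doc = new Document();
                    doc.add(new TextField("text", texts[i], Field.Store.NO)); // analyzed, searchable
                    doc.add(new NumericDocValuesField("count", counts[i]));  // used for sorting
                    doc.add(new StoredField("count", counts[i]));            // read back below
                    w.addDocument(doc);
                }
            }
            try (DirectoryReader reader = DirectoryReader.open(dir)) {
                IndexSearcher searcher = new IndexSearcher(reader);
                Sort byCountDesc = new Sort(new SortField("count", SortField.Type.LONG, true));
                // StandardAnalyzer lowercases terms, so the query term should be lowercase too.
                TopFieldDocs top = searcher.search(new TermQuery(new Term("text", term)), 10, byCountDesc);
                long[] out = new long[top.scoreDocs.length];
                for (int i = 0; i < out.length; i++) {
                    out[i] = searcher.doc(top.scoreDocs[i].doc).getField("count").numericValue().longValue();
                }
                return out;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```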

first:
Fieldable count = new NumericField("count", Store.YES, true);
second:
SortField field = new SortField("count", SortField.INT);
Sort sort = new Sort(field);
third:
TopFieldDocs docs = searcher.search(query, 20, sort);
ScoreDoc[] sds = docs.scoreDocs;
This works.

Updating values of given field using expression in mongodb using java

Consider a collection of objects having fields like:
{
id: // String
type: //Integer
score: //Double value
}
I would like to query the collection by type and, for the returned documents, divide their scores by the maximum score. Consider the following query object:
DBObject searchQuery = new BasicDBObject("type", 2);
collection.find(searchQuery);
The above query returns some documents. I want to get the maximum score among those documents and then divide each of their scores by that maximum.
How can I do this?
I could find maximum using aggregation as follows:
String propertyToOperateOn = "score";
DBObject match = new BasicDBObject("$match", searchQuery);
DBObject groups = new BasicDBObject("_id", null);
DBObject operation = new BasicDBObject("$max", "$" + propertyToOperateOn);
groups.put("maximum", operation);
DBObject apply = new BasicDBObject("$group", groups);
AggregationOutput output = mongoConstants.IAScores.aggregate(match, apply);
Here output will contain the maximum value. But then how can I update (divide) all documents' scores by this maximum??
I hope there is a better way to do this, but I haven't found it, as I'm very new to MongoDB (and to databases in general).

This is technically the same issue as "mongodb: java: How to update a field in MongoDB using expression with existing value", but I'll repeat the answer:
At the moment, MongoDB doesn't allow you to update the value of a field based on the existing value of a field, which means you can't do the equivalent of the following SQL:
UPDATE foo SET field1 = field1 / 2;
In MongoDB, you will need to do this in your application, but be aware that this is no longer an atomic operation as you need to read and then write.
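Done in the application, the task splits into two steps: find the maximum (what the `$group`/`$max` aggregation in the question computes), then divide each matching score by it. The sketch below shows that logic on a hypothetical in-memory list rather than through the MongoDB driver; `NormalizeScores` and `Doc` are made-up names. Against a real collection, the second step is a separate write, so another client could change scores between the two steps.

```java
import java.util.List;

class NormalizeScores {
    /** Hypothetical in-memory stand-in for a document {id, type, score}. */
    static final class Doc {
        final String id;
        final int type;
        double score;
        Doc(String id, int type, double score) { this.id = id; this.type = type; this.score = score; }
    }

    /**
     * Step 1: maximum score among documents of the given type. Step 2: divide each matching
     * score by it. Returns the maximum, or NaN if nothing matched; scores are left unchanged
     * when the maximum is 0 to avoid dividing by zero.
     */
    static double normalize(List<Doc> docs, int type) {
        double max = Double.NaN;
        for (Doc d : docs) {
            if (d.type == type) {
                max = Double.isNaN(max) ? d.score : Math.max(max, d.score);
            }
        }
        if (Double.isNaN(max) || max == 0.0) {
            return max;
        }
        for (Doc d : docs) {
            if (d.type == type) {
                d.score = d.score / max; // in MongoDB: a separate, non-atomic write
            }
        }
        return max;
    }
}
```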

How to check if a Set contains an object which has one member variable equal to some value

I have a java Set of Result objects. My Result class definition looks like this:
private String url;
private String title;
private Set<String> keywords;
I have stored my information in a database table called Keywords which looks like this
Keywords = [id, url, title, keyword, date-time]
As you can see there isn't a one-to-one mapping between an object and a row in the database. I am using SQL (MySQL DB) to extract the values and have a suitable ResultSet object.
How do I check whether the Set already contains a Result with a given URL?
If the set already contains a Result object with the current URL I simply want to add the extra keyword to the Set of keywords, otherwise I create a new Result object for adding to the Set of Result objects.

When you iterate over the JDBC ResultSet (to create your own set of Results), why don't you put them into a Map? To create the Map after the fact from your Set of Results:
Map<String, List<Result>> map = new HashMap<String, List<Result>>();
for (Result r : results) {
if (map.containsKey(r.url)) {
map.get(r.url).add(r);
} else {
List<Result> list = new ArrayList<Result>();
list.add(r);
map.put(r.url, list);
}
}
Then just use map.containsKey(url) to check.

Normalization is your friend
http://en.wikipedia.org/wiki/Database_normalization

If possible, I suggest changing your database design to eliminate this problem. Your current design requires storing the id, url, title and date-time once per keyword, which could waste quite a bit of space if you have lots of keywords.
I would suggest having two tables. Assuming that the id field is guaranteed to be unique, the first table would store the id, url, title and date-time and would have only one row per id. The second table would store the id and a keyword. You would insert as many rows into this table as required.
Is that possible / does that make sense?

You can use a Map with the URLs as the keys:
Map<String, Result> map = new HashMap<String, Result>();
for (Result r : results) {
if (map.containsKey(r.url)) {
map.get(r.url).keywords.addAll(r.keywords);
} else {
map.put(r.url, r);
}
}
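Building on the Map idea, the grouping can also happen directly while reading rows, using `Map.computeIfAbsent` (Java 8+) to create a Result only the first time a URL is seen. A minimal sketch, assuming one call per database row; `ResultIndex` and `addRow` are hypothetical names, and `Result` mirrors the question's fields with the keyword set initialised empty.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

class ResultIndex {
    /** Mirrors the question's Result class (url, title, keywords). */
    static final class Result {
        final String url;
        final String title;
        final Set<String> keywords = new LinkedHashSet<>();
        Result(String url, String title) { this.url = url; this.title = title; }
    }

    // One Result per URL, kept in the order URLs were first seen.
    private final Map<String, Result> byUrl = new LinkedHashMap<>();

    /** Call once per database row (url, title, keyword). */
    void addRow(String url, String title, String keyword) {
        // Reuse the Result for this URL if present; otherwise create and register a new one.
        byUrl.computeIfAbsent(url, u -> new Result(u, title)).keywords.add(keyword);
    }

    boolean containsUrl(String url) { return byUrl.containsKey(url); }
    Result get(String url) { return byUrl.get(url); }
    int size() { return byUrl.size(); }
}
```

The final Set of Results is then simply `byUrl.values()`.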

I think you need to override the equals() method of your Result class. In that method, put the logic that checks for what you are looking for.
N.B. When you override equals(), you also need to override hashCode().
For more on overriding equals() and hashCode(), see this other question.
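A minimal sketch of that approach, assuming two results count as equal when their URLs match; `UrlKeyedResult` is a hypothetical stand-in for the question's Result class. Keywords are deliberately left out of equals() and hashCode() so that adding keywords does not change an element's hash while it sits in a HashSet. Note that a Set can tell you whether an equal element exists, but it cannot hand back the existing instance, so to add a keyword to it you would still need to iterate or keep a Map as in the other answers.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

final class UrlKeyedResult {
    final String url;
    final String title;
    final Set<String> keywords = new HashSet<>();

    UrlKeyedResult(String url, String title) { this.url = url; this.title = title; }

    // Two results are equal when their URLs match, so Set.contains can look up by URL.
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof UrlKeyedResult)) return false;
        return Objects.equals(url, ((UrlKeyedResult) o).url);
    }

    // Consistent with equals: derived from the same field only.
    @Override
    public int hashCode() { return Objects.hashCode(url); }
}
```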
