Solr sorting by custom function query - java

I'm running into some issues developing a custom function query using Solr 3.6.2.
My goal is to be able to implement a custom sorting technique.
I have a field called daily_prices_str; it is a single-valued str field.
Example:
<str name="daily_prices_str">
2014-05-01:130 2014-05-02:130 2014-05-03:130 2014-05-04:130 2014-05-05:130 2014-05-06:130 2014-05-07:130 2014-05-08:130 2014-05-09:130 2014-05-10:130 2014-05-11:130 2014-05-12:130 2014-05-13:130 2014-05-14:130 2014-05-15:130 2014-05-16:130 2014-05-17:130 2014-05-18:130 2014-05-19:130 2014-05-20:130 2014-05-21:130 2014-05-22:130 2014-05-23:130 2014-05-24:130 2014-05-25:130 2014-05-26:130 2014-05-27:130 2014-05-28:130 2014-05-29:130 2014-05-30:130 2014-05-31:130 2014-06-01:130 2014-06-02:130 2014-06-03:130 2014-06-04:130 2014-06-05:130 2014-06-06:130 2014-06-07:130 2014-06-08:130 2014-06-09:130 2014-06-10:130 2014-06-11:130 2014-06-12:130 2014-06-13:130 2014-06-14:130 2014-06-15:130 2014-06-16:130 2014-06-17:130 2014-06-18:130 2014-06-19:130 2014-06-20:130 2014-06-21:130 2014-06-22:130 2014-06-23:130 2014-06-24:130 2014-06-25:130 2014-06-26:130 2014-06-27:130 2014-06-28:130 2014-06-29:130 2014-06-30:130 2014-07-01:130 2014-07-02:130 2014-07-03:130 2014-07-04:130 2014-07-05:130 2014-07-06:130 2014-07-07:130 2014-07-08:130 2014-07-09:130 2014-07-10:130 2014-07-11:130 2014-07-12:130 2014-07-13:130 2014-07-14:130 2014-07-15:130 2014-07-16:130 2014-07-17:130 2014-07-18:130 2014-07-19:170 2014-07-20:170 2014-07-21:170 2014-07-22:170 2014-07-23:170 2014-07-24:170 2014-07-25:170 2014-07-26:170 2014-07-27:170 2014-07-28:170 2014-07-29:170 2014-07-30:170 2014-07-31:170 2014-08-01:170 2014-08-02:170 2014-08-03:170 2014-08-04:170 2014-08-05:170 2014-08-06:170 2014-08-07:170 2014-08-08:170 2014-08-09:170 2014-08-10:170 2014-08-11:170 2014-08-12:170 2014-08-13:170 2014-08-14:170 2014-08-15:170 2014-08-16:170 2014-08-17:170 2014-08-18:170 2014-08-19:170 2014-08-20:170 2014-08-21:170 2014-08-22:170 2014-08-23:170 2014-08-24:170 2014-08-25:170 2014-08-26:170 2014-08-27:170 2014-08-28:170 2014-08-29:170 2014-08-30:170
</str>
As you can see, the structure of the string is date:price.
Basically, I would like to parse the string to get the price for a particular period and sort by that price.
I've already developed the Java plugin for the custom function query, and I'm at the point where my code compiles, runs, executes, etc. Solr is happy with my code.
Example:
price(daily_prices_str,2015-01-01,2015-01-03)
If I run this query I can see the correct price in the score field:
/select?price=price(daily_prices_str,2015-01-01,2015-01-03)&q={!func}$price
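For context, a plugin like this is typically exposed through a ValueSourceParser. Below is a minimal sketch against the Solr 3.x function-query API as I recall it; PriceValueSource and the argument handling are assumptions for illustration, not the asker's actual code:
import org.apache.lucene.queryParser.ParseException;
import org.apache.solr.search.FunctionQParser;
import org.apache.solr.search.ValueSourceParser;
import org.apache.solr.search.function.ValueSource;

// Registered in solrconfig.xml as:
// <valueSourceParser name="price" class="com.example.PriceFunctionParser"/>
public class PriceFunctionParser extends ValueSourceParser {
    @Override
    public ValueSource parse(FunctionQParser fp) throws ParseException {
        ValueSource prices = fp.parseValueSource(); // daily_prices_str
        String from = fp.parseArg();                // e.g. 2015-01-01
        String to = fp.parseArg();                  // e.g. 2015-01-03
        return new PriceValueSource(prices, from, to); // hypothetical: scans the date:price pairs
    }
}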
One of the problems is that I cannot sort by function result.
If I run this query:
/select?price=price(daily_prices_str,2015-01-01,2015-01-03)&q={!func}$price&sort=$price+asc
I get a 404 saying that "sort param could not be parsed as a query, and is not a field that exists in the index: $price"
But it works with a workaround:
/select?price=sum(0,price(daily_prices_str,2015-01-01,2015-01-03))&q={!func}$price&sort=$price+asc
The main problem is that I cannot filter by range:
/select?price=sum(0,price(daily_prices_str,2015-1-1,2015-1-3))&q={!frange l=100 u=400}$price
Maybe I'm going about this totally incorrectly?

Instead of passing the newly created "price" parameter to "sort", can you pass the function call itself, like so?
q=*:*&sort=price(daily_prices_str,2015-01-01,2015-01-03) ...
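Since Solr accepts a raw function in both sort and an {!frange} filter, the same call can drive sorting and range filtering directly, without the $price indirection (assuming the plugin is registered under the name price):
q=*:*&sort=price(daily_prices_str,2015-01-01,2015-01-03)+asc&fq={!frange l=100 u=400}price(daily_prices_str,2015-01-01,2015-01-03)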


Substitute ints into Dataflow via Cloudbuild yaml

I've got a streaming Dataflow pipeline, written in Java with Beam 2.35. It commits data to BigQuery via the StorageWriteApi. Initially the code looks like
BigQueryIO.writeTableRows()
.withTimePartitioning(/* some column */)
.withClustering(/* another column */)
.withMethod(BigQueryIO.Write.Method.STORAGE_WRITE_API)
.withTriggeringFrequency(Duration.standardSeconds(30))
.withNumStorageWriteApiStreams(20) // want to make this dynamic
This code runs in different environments, e.g. Dev & Prod. When I deploy to Dev I want 2 StorageWriteApiStreams; in Prod I want 20. I'm trying to pass/resolve these values at the moment I deploy with Cloud Build.
The cloudbuild-dev.yaml looks like
steps:
- # ... lots of steps ...
  args:
  - --numStorageWriteApiStreams=${_NUM_STORAGEWRITEAPI_STREAMS}
substitutions:
  _PROJECT: dev-project
  _NUM_STORAGEWRITEAPI_STREAMS: '2'
I expose the substitution in the job code with an interface
ValueProvider<String> getNumStorageWriteApiStreams();
void setNumStorageWriteApiStreams(ValueProvider<String> numStorageWriteApiStreams);
I then refactor the writeTableRows() call to invoke getNumStorageWriteApiStreams()
BigQueryIO.writeTableRows()
.withTimePartitioning(/* some column */)
.withClustering(/* another column */)
.withMethod(BigQueryIO.Write.Method.STORAGE_WRITE_API)
.withTriggeringFrequency(Duration.standardSeconds(30))
.withNumStorageWriteApiStreams(Integer.parseInt(String.valueOf(options.getNumStorageWriteApiStreams())))
Now it's dynamic but I get a build failure on account of java.lang.IllegalArgumentException: methods with same signature getNumStorageWriteApiStreams() but incompatible return types: [class java.lang.Integer, interface org.apache.beam.sdk.options.ValueProvider]
My understanding was that Integer.parseInt returns an int, which I want so I can pass it to withNumStorageWriteApiStreams() which requires an int.
I'd appreciate any help I can get here, thanks.
Turns out BigQueryOptions.java already has a method getNumStorageWriteApiStreams() that returns an Integer. I was unknowingly trying to redeclare it with a different return type, oops.
https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryOptions.java#L95-L98
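Given that, the fix is to reuse the inherited option rather than redeclare it. A minimal sketch (the interface name and the surrounding chain are assumptions; only the Beam calls come from the question and BigQueryOptions):
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryOptions;
import org.joda.time.Duration;

// Extend BigQueryOptions instead of redeclaring the getter; the inherited accessor
// already returns Integer and is populated from --numStorageWriteApiStreams.
public interface MyPipelineOptions extends BigQueryOptions { /* other job options */ }

// ... later, with options of type MyPipelineOptions:
BigQueryIO.writeTableRows()
    // .withTimePartitioning(...) / .withClustering(...) as before
    .withMethod(BigQueryIO.Write.Method.STORAGE_WRITE_API)
    .withTriggeringFrequency(Duration.standardSeconds(30))
    // Integer auto-unboxes to the int the builder expects.
    .withNumStorageWriteApiStreams(options.getNumStorageWriteApiStreams());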

Order by with long type field not working in Spark Java Dataset

I have a Dataset, and below are the fields in the Employee object.
class Employee { String loginSession; Long loginDateTimeMilli; String name; String empType; String location; }
I have to perform the below operations on the dataset:
group by loginSession
order by loginDateTimeMilli
Below is how I am performing this:
ds.orderBy(col("loginDateTimeMilli").asc())
.groupBy(col("loginSession"))
.agg(collect_list(struct) ...) ...
This gives me the below output (example):
[[loginSession, WrappedArray([name1,emptype1,location1,loginDateTimeMilli_2]
,[name1,emptype1,location1,loginDateTimeMilli_1]
,[name1,emptype1,location1,loginDateTimeMilli_3])]
Below is the expected output
[[loginSession, WrappedArray([name1,emptype1,location1,loginDateTimeMilli_1]
,[name1,emptype1,location1,loginDateTimeMilli_2]
,[name1,emptype1,location1,loginDateTimeMilli_3])]
Not sure why it's not working. Am I doing something wrong?
Any help will be appreciated.
I am using the Java API for Spark.
EDIT: I am creating files for each loginSession, which is working fine; the only issue is that the details are not in sorted order. Also, the issue is not with all the generated files; it only appears for some.
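A likely explanation (not from the original thread): the order imposed by orderBy() is not guaranteed to survive the shuffle that groupBy() performs, so collect_list can return the structs in any order. One common workaround is to sort the collected array itself; if loginDateTimeMilli is the first field of the struct, sort_array orders by it. A sketch under those assumptions:
import static org.apache.spark.sql.functions.*;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

// Collect first, then sort the array; sort_array compares structs field by field,
// so the timestamp goes in the leading position.
Dataset<Row> result = ds
    .groupBy(col("loginSession"))
    .agg(sort_array(collect_list(struct(
        col("loginDateTimeMilli"), col("name"), col("empType"), col("location")
    ))).as("events"));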

Using java, how to change the default operator of match query to AND in elasticsearch?

When using the Java API as below
query.must(matchQuery("name", object.getName()));
The resulting Elasticsearch query is
"bool":{
"must":[
{"match":{"name":{"query":"De Michael Schuster","operator":"OR","boost":1.0}}}
.....
Right now I am getting back documents matching name: De OR Michael OR Schuster, as expected.
I want to change the operator to AND to match the whole string.
I know I can use term query, but that is not an option in my scenario.
I came across this, but no answer is given: https://discuss.elastic.co/t/changing-the-default-operator-for-search-api/47033
How can I achieve this using Java?
Simply like this:
query.must(matchQuery("name", object.getName()).operator(Operator.AND));
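In fuller context (a sketch assuming a recent Elasticsearch Java client; in older 1.x clients the enum lives at MatchQueryBuilder.Operator instead):
import static org.elasticsearch.index.query.QueryBuilders.boolQuery;
import static org.elasticsearch.index.query.QueryBuilders.matchQuery;

import org.elasticsearch.index.query.BoolQueryBuilder;
import org.elasticsearch.index.query.Operator;

// With AND, every term ("De", "Michael", "Schuster") must match, not just any one of them.
BoolQueryBuilder query = boolQuery()
    .must(matchQuery("name", object.getName()).operator(Operator.AND));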

Spark reduceByKey function seems not to work with a single key

I have 5 rows of records in MySQL, like
sku:001 seller:A stock:UK margin:10
sku:002 seller:B stock:US margin:5
sku:001 seller:A stock:UK margin:10
sku:001 seller:A stock:UK margin:3
sku:001 seller:A stock:UK margin:7
And I've read these rows into Spark and transformed them into
JavaPairRDD<Tuple3<String,String,String>, Map>(<sku,seller,stock>, Map<margin,xxx>).
Everything seems to work fine up to this point.
However, when I use the reduceByKey function to sum the margins into a structure like:
JavaPairRDD<Tuple3<String,String,String>, Map>(<sku,seller,stock>, Map<marginSummary, xxx>).
the final result has 2 elements:
JavaPairRDD<Tuple3<String,String,String>, Map>(<sku,seller,stock>, Map<margin,xxx>).
JavaPairRDD<Tuple3<String,String,String>, Map>(<sku,seller,stock>, Map<marginSummary, xxx>).
It seems like row 2 didn't enter the reduceByKey function body. I was wondering why?
This is the expected outcome. func is called only when objects for a single key are merged; if there is only one value for a given key, there is no reason to call it.
Unfortunately it looks like you have a bigger problem, which can be inferred from your question. You are trying to change the type of the value in reduceByKey. In general it shouldn't even compile, as reduceByKey takes Function2<V,V,V>; the input and output types have to be identical.
If you want to change a type, you should use either combineByKey
public <C> JavaPairRDD<K,C> combineByKey(Function<V,C> createCombiner,
Function2<C,V,C> mergeValue,
Function2<C,C,C> mergeCombiners)
or aggregateByKey
public <U> JavaPairRDD<K,U> aggregateByKey(U zeroValue,
Function2<U,V,U> seqFunc,
Function2<U,U,U> combFunc)
Both can change the types and would fix your current problem. Please refer to the Java test suite for examples: 1 and 2.
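For instance, a hedged sketch of aggregateByKey for this case, summing the margins per (sku, seller, stock) key while changing the value type from Map to Integer (the variable name rows and the "margin" map key are assumptions based on the question):
import java.util.Map;

import org.apache.spark.api.java.JavaPairRDD;
import scala.Tuple3;

// rows: JavaPairRDD<Tuple3<String,String,String>, Map<String,Integer>> as in the question.
JavaPairRDD<Tuple3<String, String, String>, Integer> summed = rows.aggregateByKey(
    0,                                   // zeroValue: initial accumulator per key
    (acc, m) -> acc + m.get("margin"),   // seqFunc: fold one row's margin into the accumulator
    (a, b) -> a + b                      // combFunc: merge accumulators across partitions
);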

MongoDB can't find by regex or query with national characters

It looks like I'm missing something important.
I have some records inserted into MongoDB that contain fields with national characters. There is no problem inserting them into the DB or finding them, and all values look fine.
But if I try to find a particular one with find() or a regex, nothing is returned. For example:
DBObject query = new BasicDBObject();
query.put("position", Pattern.compile(".*forsøg.*"));
--or--
query.put("position","forsøg");
System.out.println(collection.find(query).count()); // prints 0
In the log:
query={ "position" : { "$regex" : ".*������.*"}}
--or--
query={ "position" : "������"}
The field value for "position" is equal to "forsøg", of course. Pattern.matches(".*forsøg.*", "forsøg") returns true.
If I replace the pattern with one containing only ASCII characters (".*abc.*" for example), all methods work as expected. Collection.findAll() returns all saved instances with readable and correct values.
Versions: MongoDB 2.0.6 64-bit, mongo-java-driver 2.8.0, Java 7. I tried the same with spring-data-mongodb 1.0.2.RELEASE but then removed it.
It looks like I hit a strange bug related to Maven + TestNG. The same code executed from the .war and from the test suite produces totally different results in the database.
The difference can easily be seen if you point your browser to
http://127.0.0.1:28017/baseName/collectionName/
and look at the values after each execution.
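If the root cause was the JVM default charset differing between the servlet container and the Surefire/TestNG fork, forcing UTF-8 for the test JVM is a common fix. This is an assumption, not something confirmed in the thread; a sketch for Maven Surefire:
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <configuration>
    <!-- run tests with the same charset the production JVM uses -->
    <argLine>-Dfile.encoding=UTF-8</argLine>
  </configuration>
</plugin>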
