Replacement for Unnest transform - java

Up until Beam 2.19, there was a transform called org.apache.beam.sdk.schemas.transforms.Unnest. After that release, the transform seems to have been removed. I am looking for documentation / guides on its replacement.

Sounds like Unnest was replaced by Select in https://github.com/apache/beam/pull/10766/.
There is a test case that does the following:
PCollection<Row> unnested =
    pipeline
        .apply(Create.of(rows).withRowSchema(SIMPLE_SCHEMA))
        .apply(Select.flattenedSchema());
Can you please verify if Select.flattenedSchema does what you are expecting?
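To make "flattened" concrete, here is a toy model in plain Java that uses nested maps as a stand-in for Rows. It is not Beam code: FlattenSketch and flatten are made-up names, and joining nested names with "_" is my understanding of the default naming, so please check the Select javadoc before relying on it.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy stand-in for schema flattening; NOT the Beam API.
final class FlattenSketch {
    private FlattenSketch() {}

    // Returns a single-level map containing every leaf field of the nested row.
    static Map<String, Object> flatten(Map<String, Object> row) {
        Map<String, Object> out = new LinkedHashMap<>();
        flattenInto("", row, out);
        return out;
    }

    @SuppressWarnings("unchecked")
    private static void flattenInto(String prefix, Map<String, Object> row, Map<String, Object> out) {
        for (Map.Entry<String, Object> e : row.entrySet()) {
            // Assumption: nested names are joined with "_" (e.g. address_city).
            String name = prefix.isEmpty() ? e.getKey() : prefix + "_" + e.getKey();
            if (e.getValue() instanceof Map) {
                flattenInto(name, (Map<String, Object>) e.getValue(), out);
            } else {
                out.put(name, e.getValue());
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Object> address = new LinkedHashMap<>();
        address.put("city", "Mumbai");
        address.put("zip", "400001");
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("id", 1);
        row.put("address", address);
        System.out.println(flatten(row)); // {id=1, address_city=Mumbai, address_zip=400001}
    }
}
```

If the flattened output of your pipeline has this shape (one top-level field per leaf), then Select.flattenedSchema is probably the replacement you are after.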

Related

Using Apache Solr's boost query function with Spring in Java

I'm writing a Java application that is using Apache Solr to index and search through a list of articles. A requirement I am dealing with is that when a user searches for something, we are supplying a list of recommended related search terms, and the user has the option to include those extra terms in their search. The problem I'm having, however, is that we want the user's original search term to be prioritized, and results that match that should appear before results that only match related terms.
My research suggests that Solr's boost function is the solution for this, but I'm having some trouble getting it to work with Spring. The code all runs fine and I get my search results as expected, but the boost function doesn't seem to actually be re-ordering my searches at all. For example, I'm trying to do something like this:
Query query = new SimpleQuery();
Criteria searchCriteria = Criteria.where("title").contains("A").boost((float) 2);
Criteria extraCriteria = Criteria.where("title").contains("B").boost((float) 1);
query.addCriteria(searchCriteria.or(extraCriteria));
In this example I would be searching for any document whose title contains "A" or "B", but I want to boost results that match "A" to the top of the list.
I've also tried using the Extended DisMax Query Parser with a different syntax to achieve the same result, with similar lack of success. To follow the same example pattern, I'm trying to use the expression criteria as follows:
Query query = new SimpleQuery();
Criteria searchCriteria = Criteria.where("title").expression("A^2.0 OR B^1.0");
query.setDefType("edismax");
query.addCriteria(searchCriteria);
Again I would expect this to return documents with titles matching "A" or "B" but boost results matching "A", and again it simply doesn't seem to actually affect the ordering of my results at all.
Okay, I figured out the problem here. Elsewhere in the code someone else had added this snippet:
query.setPageRequest(pageable);
This was done to support pagination of the search results, but the pageable object also contained some sort orders, which appear to have been added to the query by the .setPageRequest method. Something to look out for in the future: it looks like explicit sorts override boosting when working with Spring Solr queries in this scenario.
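For anyone wondering why the boost seemed to do nothing, here is a toy model in plain Java. It is not Solr or Spring Data code: Hit, byScore and byTitle are made-up names, and the scores are invented to mimic a boosted "A" match. The point is only that once results are ordered by an explicit sort field, the relevance score no longer decides the order.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Toy model; NOT the Solr or Spring Data API.
final class SortVsBoost {
    private SortVsBoost() {}

    // Hypothetical search hit: a title plus its relevance score after boosting.
    static final class Hit {
        final String title;
        final double score;

        Hit(String title, double score) {
            this.title = title;
            this.score = score;
        }
    }

    // Default behaviour in this model: highest score first, so boosts matter.
    static List<String> byScore(List<Hit> hits) {
        return hits.stream()
                .sorted(Comparator.comparingDouble((Hit h) -> h.score).reversed())
                .map(h -> h.title)
                .collect(Collectors.toList());
    }

    // Explicit sort on a field: the score is ignored entirely.
    static List<String> byTitle(List<Hit> hits) {
        return hits.stream()
                .sorted(Comparator.comparing((Hit h) -> h.title))
                .map(h -> h.title)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Hit> hits = List.of(new Hit("Zebra A", 2.0), new Hit("Apple B", 1.0));
        System.out.println(byScore(hits)); // [Zebra A, Apple B]
        System.out.println(byTitle(hits)); // [Apple B, Zebra A]
    }
}
```

So if boosting appears to have no effect, it is worth checking whether anything (such as a page request) is attaching a sort to the query.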

Exact Match in SOLR 5.1

I have set up Solr 5.1.0 with data import from a MySQL database, and it is working well.
But I want only exact-match results, or results relevant to the exact match.
For example:
Dancers in Mumbai
This returns all results that contain both "dancers" and "mumbai", as well as results that contain only "dancers" or only "mumbai". I want only the results that contain both "dancers" and "mumbai", not the others.
This is not a complete answer, but it's the direction I'm trying to take with a similar problem. Comments are very welcome.
Step 1:
Implement multiple Solr cores, core 1 is "jobs" (dancers/lawyers/etc), and core 2 is "cities" (mumbai/chennai/etc).
Step 2:
Query each core for exact matches, so implement the KeywordTokenizerFactory on the relevant field to find exact matches only. This will give you all the matches across cores (e.g. jobs:dancers and cities:mumbai).
Step 3:
Perform your general query using EDisMax for a user-friendly search (e.g. searching for "dancers in mumbai" across many fields), and use the boost field to boost the jobs/cities found in the earlier query.
I would love to know if there is a better way of doing something this elaborate, but I have not found it yet. Hope it helps.
Using required terms like: +dancers +mumbai
Or a phrase query: "dancers in mumbai"
Would work.
You can also set the default operator for your query to be "AND", using the q.op parameter.
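If you build the query string in Java, the two forms above could be produced roughly like this. This is only a sketch: ExactMatchQueries, requireAll and phrase are made-up helper names, not part of SolrJ, and no special characters other than the double quote are escaped.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

// Hypothetical helpers for building query strings; NOT part of SolrJ.
final class ExactMatchQueries {
    private ExactMatchQueries() {}

    // "dancers mumbai" -> "+dancers +mumbai": every term becomes required.
    static String requireAll(String input) {
        return Arrays.stream(input.trim().split("\\s+"))
                .filter(term -> !term.isEmpty())
                .map(term -> "+" + term)
                .collect(Collectors.joining(" "));
    }

    // "dancers in mumbai" -> "\"dancers in mumbai\"": the words must appear together as a phrase.
    static String phrase(String input) {
        return "\"" + input.trim().replace("\"", "\\\"") + "\"";
    }

    public static void main(String[] args) {
        System.out.println(requireAll("dancers mumbai"));  // +dancers +mumbai
        System.out.println(phrase("dancers in mumbai"));   // "dancers in mumbai"
    }
}
```

The resulting string would then be passed as the q parameter of the query.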

IN Equivalent Query In Solr and Solrj

I am using Solr 5.0.0. I would like to know the equivalent of an IN query in Solr or SolrJ.
If I need to query products of several brands, I can use an IN clause. Say I have brands like dell, sony and samsung; I need to find the products with these brands using Solr, and from Java using SolrJ.
Right now I am using this code in SolrJ:
qry.addFilterQuery("brand:dell OR brand:sony OR brand:samsung");
I know that I can use OR here, but I would like to know whether Solr has an equivalent of IN, and how OR performs.
As you can read in Solr's wiki about its query syntax, Solr by default uses a superset of Lucene's query parser. As you can see when reading both documents, something like IN does not exist. But you can write something shorter than the example query you presented.
If your default operator is OR, you can leave it out of the query. In addition, you can make use of Field Grouping.
qry.addFilterQuery("brand:(dell sony samsung)");
If OR is not your default operator, or you are not sure about this, you can use Local Parameters in the filter query so that OR is enforced. You can then make use of Field Grouping again.
qry.addFilterQuery("{!q.op=OR}brand:(dell sony samsung)");
Keep in mind that you need to surround a phrase with " to keep the words together:
qry.addFilterQuery("{!q.op=OR}brand:(dell sony samsung \"packard bell\")");
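If the list of brands is dynamic, the filter string can be assembled from a collection. Below is a minimal sketch, assuming a made-up helper class InFilter (it is not part of SolrJ). It only quotes values that contain whitespace and escapes backslashes and double quotes; for full escaping of query syntax characters, SolrJ's ClientUtils.escapeQueryChars is worth a look.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper that builds an IN-style filter query string; NOT part of SolrJ.
final class InFilter {
    private InFilter() {}

    // Wraps values containing whitespace in double quotes so they are treated as phrases.
    static String quoteIfNeeded(String value) {
        String escaped = value.replace("\\", "\\\\").replace("\"", "\\\"");
        boolean hasWhitespace = escaped.chars().anyMatch(Character::isWhitespace);
        return hasWhitespace ? "\"" + escaped + "\"" : escaped;
    }

    // Produces e.g. {!q.op=OR}brand:(dell sony "packard bell")
    static String build(String field, List<String> values) {
        String grouped = values.stream()
                .map(InFilter::quoteIfNeeded)
                .collect(Collectors.joining(" "));
        return "{!q.op=OR}" + field + ":(" + grouped + ")";
    }

    public static void main(String[] args) {
        System.out.println(build("brand", List.of("dell", "sony", "samsung", "packard bell")));
        // {!q.op=OR}brand:(dell sony samsung "packard bell")
    }
}
```

The result can be passed straight to qry.addFilterQuery(...), as in the snippets above.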

Setting pseudo fields (fl) in Solrj

I want to use a pseudo field to return the distance from the center of my solr (geo) spatial search, like it's explained here: http://wiki.apache.org/solr/SpatialSearch#geodist_-_The_distance_function when it says:
Returning the distance
Solr4.0
You can use the pseudo-field feature to return the distance along with the stored fields of each document by adding fl=geodist() to the request. Use an alias like fl=_dist_:geodist() to make the distance come back in the _dist_ pseudo-field instead. Here is an example of sorting by distance ascending and returning the distance for each document in _dist_.
...&q=*:*&sfield=store&pt=45.15,-93.85&sort=geodist() asc&fl=_dist_:geodist()
Now, I'm using solrj (4.5.1) and I can't find a way to set the fl=_dist_:geodist() part properly. I can actually manage to add it to the solrQuery object doing:
solrQuery.setParam('fl', '_dist_:geodist()')
with no compilation errors, but for some reason this is messing up my returned documents.
Any ideas how it should be done?
Ps. code is in groovy language, don't freak out for no semi-colons or string within single quotes :)
* UPDATE *
Setting the fl param as explained above actually results in returned documents that contain only the _dist_ field!
After a few minutes of searching, I found this article: http://solr.pl/en/2011/11/22/solr-4-0-new-fl-parameter-functionalities-first-look/
It explains how to return the new alias field(s) in addition to all the other fields, simply like this (please note the * part):
fl=*,stock:sum(stockMain,stockShop)
So, in my example for solrj, it will be:
solrQuery.setParam('fl', '*,_dist_:geodist()')

sort and limit castor query

I am attempting to return a single object via castor query that has the earliest date.
This is the sort of thing I have been trying:
SELECT p FROM model.objects.Product p LIMIT $1 WHERE p.status=$2 ORDER BY p.statusDate;
This results in: org.exolab.castor.jdo.oql.OQLSyntaxException: An incorrect token type was found near WHERE (found KEYWORD_WHERE, but expected END_OF_QUERY)
I am using version 0.9.6 which I believe supports this kind of thing.
Any hints or pointers much appreciated.
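As per my comment, it indeed appears that the LIMIT clause must appear after the ORDER BY clause, i.e. something like SELECT p FROM model.objects.Product p WHERE p.status=$2 ORDER BY p.statusDate LIMIT $1. See the Castor query syntax.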
