Spring Data Elasticsearch containing query with spaces

I have an entity named Port with a field portName. I wrote the following Spring Data ES query method for a containing query:
List<Port> ports = portRepository.findByPortNameContaining(searchText);
It works fine as long as searchText doesn't contain any spaces. If it does, I get the following error:
"Cannot constructQuery '*\"sample port\"*'. Use expression or multiple clauses instead."
When I try Spring Data ES search method as:
List<Port> ports = Lists.newArrayList(portRepository.search(
queryStringQuery(searchText)
.field("portName")
));
If I have a port named Loui Kentucky, I only get results when the searchText is exactly one or more complete words, such as Loui, Kentucky or Loui Kentucky. The same happens with analyzeWildcard:
List<Port> ports = Lists.newArrayList(portRepository.search(
boolQuery().should(queryStringQuery(searchText).analyzeWildcard(true).field("portName"))
));
I want to construct a simple containing query which can handle spaces as well. No fuzziness. Search results should appear even when I search for i K as Loui Kentucky contains that substring.

Related

How can I get the highlights of my result set in Hibernate search 6?

I am using the Hibernate Search 6 Lucene backend in my Java application.
There are various search operations I am performing, including a fuzzy search.
I get search results without any issues.
Now I want to show why each result in my result list was picked.
Let's say the search keyword is "test", and the fuzzy search is performed on the fields "name", "description", "Id" etc. I get 10 results in a List. Now I want to highlight the values in the fields of each result which caused that result to match.
E.g.: consider the below to be one of the items in the search result List object (for clarity I have written it in JSON format).
{
name:"ABC some test name",
description: "this is a test element",
id: "abc123"
}
As the result suggests it's been picked as a search result because the keyword "test" is there in both the fields "name" and the "description". I want to highlight those specific fields in the frontend when I show the search results.
Currently, I am retrieving search results through a Java REST API to my Angular frontend. How can I get those specific fields and their values using Hibernate Search 6 in my Java application?
So far I have gone through the Hibernate Search 6 documentation and found nothing. (https://docs.jboss.org/hibernate/stable/search/reference/en-US/html_single/#preface) I also looked at what seemed to be related issues on the web over the past week and got nothing so far. It seems like my requirement is a little specific and that's why I need your help here.

Highlighting is not yet implemented in Hibernate Search, see HSEARCH-2192.
That being said, you can leverage native Elasticsearch / Lucene APIs.
With Elasticsearch it's relatively easy: you can use a request transformer to add a highlight element to the HTTP request, then use the jsonHit projection to retrieve the JSON for each hit, which contains a highlight element that includes the highlighted fields and the highlighted fragments.
With Lucene it would be more complex and you'll have to rely on unsupported features, but that's doable.
Retrieve the Lucene Query from your Hibernate Search predicate:
SearchPredicate predicate = ...;
Query query = LuceneMigrationUtils.toLuceneQuery(predicate);
Then do the highlighting. The question "Hibernate search highlighting not analyzed fields" may help with that, but that code uses an older version of Lucene, so you might have to adapt it:
String highlightText(Query query, Analyzer analyzer, String fieldName, String text)
        throws IOException, InvalidTokenOffsetsException {
    QueryScorer queryScorer = new QueryScorer(query);
    SimpleHTMLFormatter formatter = new SimpleHTMLFormatter("<span>", "</span>");
    Highlighter highlighter = new Highlighter(formatter, queryScorer);
    // Returns null when the text contains no match for the query
    return highlighter.getBestFragment(analyzer, fieldName, text);
}
You'll need to add a dependency to org.apache.lucene:lucene-highlighter.
To retrieve the analyzer, use the Hibernate Search metadata: https://docs.jboss.org/hibernate/stable/search/reference/en-US/html_single/#backend-lucene-access-analyzers
So, connecting the dots... something like that?
Analyzer retrieveSearchAnalyzer(SearchScope<?> scope) {
    // Taking a shortcut here to retrieve the index manager,
    // since we already have the scope
    // WARNING: This only works when searching a single index
    return scope.includedTypes().iterator().next().indexManager()
            .unwrap( LuceneIndexManager.class )
            .searchAnalyzer();
}
Highlighter createHighlighter(SearchPredicate predicate) {
    // WARNING: this method is not supported and might disappear in future versions of HSearch
    Query query = LuceneMigrationUtils.toLuceneQuery(predicate);
    QueryScorer queryScorer = new QueryScorer(query);
    SimpleHTMLFormatter formatter = new SimpleHTMLFormatter("<span>", "</span>");
    return new Highlighter(formatter, queryScorer);
}
String highlight(Highlighter highlighter, Analyzer analyzer, String fieldName, String text) {
    try {
        return highlighter.getBestFragment(analyzer, fieldName, text);
    }
    catch (IOException | InvalidTokenOffsetsException e) {
        // getBestFragment throws checked exceptions, which the lambda below cannot propagate
        throw new IllegalStateException(e);
    }
}
SearchSession searchSession = Search.session( entityManager );
SearchScope<Book> scope = searchSession.scope( Book.class );
SearchPredicate predicate = scope.predicate().match()
        .fields( "title", "authors.name" )
        .matching( "refactoring" )
        .toPredicate();
Highlighter highlighter = createHighlighter(predicate);
Analyzer analyzer = retrieveSearchAnalyzer(scope);
// Using Pair from Apache Commons, but others would work just as well
List<Pair<Book, String>> hits = searchSession.search( scope )
        .select( f -> f.composite(
                // Highlighting the title only, but you can do the same for other fields
                book -> Pair.of( book, highlight(highlighter, analyzer, "title", book.getTitle()) ),
                f.entity()
        ) )
        .where( predicate )
        .fetchHits( 20 );
Not sure this compiles, but that should get you started.
Relatedly, but not exactly what you're asking for, there's an explain feature to get a sense of why a given hit has a given score: https://docs.jboss.org/hibernate/stable/search/reference/en-US/html_single/#search-dsl-query-explain

Azure CosmosDb continuation token for SELECT with JOIN

I'm using the async client from the Java library com.azure:azure-cosmos:4.3.0 to connect to and query Azure Cosmos DB (SQL API) in direct mode. All the documents in the collection have the following structure:
{
"id":"string",
"timestamp":123456789,
"tags":["tag1", "tag2"]
}
What I want to do is find all documents that have any matching tags from a given list of input tags. What is the best way to do this?
What I've tried but couldn't get to work:
I inserted four documents to test with and then ran this query in the Azure portal: SELECT DISTINCT VALUE CONTAINER_ALIAS FROM CONTAINER_ALIAS JOIN CONTAINER_ALIAS_tags IN CONTAINER_ALIAS.tags WHERE CONTAINER_ALIAS_tags IN ( "tag1", "tag2" ) AND ( CONTAINER_ALIAS.timestamp BETWEEN 1599773389000 AND 1602365389000 ). This gave me 3 documents in the response, which is right because, out of the 4 test documents, one doesn't have "tag1" or "tag2".
Then I tried to generate the same query using the client library, which produces SELECT DISTINCT VALUE CONTAINER_ALIAS FROM CONTAINER_ALIAS JOIN CONTAINER_ALIAS_tags IN CONTAINER_ALIAS.tags WHERE CONTAINER_ALIAS_tags IN ( "tag1", "tag2" ) AND ( CONTAINER_ALIAS.timestamp BETWEEN @timestamp_START AND @timestamp_End )
This query is passed to the database as follows:
List<SqlParameter> sqlParameters = Arrays.asList(
    new SqlParameter("@timestamp_START", 1599773389000L),
    new SqlParameter("@timestamp_End", 1602365389000L)
);
SqlQuerySpec sqlQuerySpec = new SqlQuerySpec(query, sqlParameters);
int pageSize = 20;
container.queryItems(sqlQuerySpec, MyCustomEntity.class)
    .byPage(continuationToken, pageSize)
    .take(1)
    .next();
This returns only 1 document in the result, although it does return a continuation token: {"lastHash":"","sourceToken":"{\"token\":null,\"range\":\"{\\\"min\\\":\\\"05C1CFFFFFFFF8\\\",\\\"max\\\":\\\"05C1D7FFFFFFFC\\\",\\\"isMinInclusive\\\":true,\\\"isMaxInclusive\\\":false}\"}"}
I have used the client library with WHERE but without JOIN and was able to make it work properly. The result would include pageSize documents, and it would return a proper continuation token if more documents were available. However, when I use JOIN I'm not getting all the results.
Another thing I noticed is that the RU charge when I run the above query from the Azure portal is around 46, but when I print the logs from the client library the RU charge is around 2. I'm not sure whether this information helps to figure out what I'm doing wrong.
Please let me know what I'm doing wrong, or whether there is a better way to achieve this.

How to map columns with special characters in Java namedparameterjdbctemplate batchupdate insert query?

I'm trying to fetch custom column data from SA360 into a MySQL DB. One of the columns is named Blended KPI 85/10/5, and I have used the same name, Blended KPI 85/10/5, for the column in the DB.
The data first gets stored in a CSV file. I then read the records from the CSV file into a List<Map<String, Object>>, and these records are later stored in the DB. Since I have 5000+ records, I'm using a batch insert. I'm getting what looks like a syntax error. Please see the code snippet and the error below.
I tried escaping the special characters, but with no success.
Values inside dailyRecords:
{account_id=2, brand_id=2, platform_id=1, campaign_id=71700000028596159, Blended_KPI_85/10/5=0.0, CPB_(85/10/5)=0.0}
Code:
String sql = "INSERT INTO campaign_table (`account_id` ,`brand_id` ,`platform_id` ,`campaign_id` , `Blended KPI 85/10/5` , `CPB (85/10/5)` ) VALUES (:account_id, :brand_id, :platform_id, :campaign_id, :Blended_KPI_85/10/5 , :CPB_(85/10/5))";
namedParameterJdbcTemplate.batchUpdate(sql, dailyRecords.toArray(new Map[dailyRecords.size()]));
On executing, I'm getting the below error:
No value supplied for the SQL parameter 'Blended_KPI_85': No value registered for key 'Blended_KPI_85'

You cannot use the characters /, ( and ) in a placeholder name, because they are SQL syntax characters and the named-parameter parser treats them as the end of the name. That is why the error mentions 'Blended_KPI_85': the name was cut off at the first /. A quick workaround is to change the placeholder names in the SQL statement and change the keys in your data to match.
If your Java version is 8 or above, you can easily modify the keys of each Map in your data with streams:
String sql = "INSERT INTO campaign_table (`account_id` ,`brand_id` ,`platform_id` ,`campaign_id` ,`Blended KPI 85/10/5` ,`CPB (85/10/5)`) VALUES (:account_id, :brand_id, :platform_id, :campaign_id, :Blended_KPI_85_10_5 , :CPB_85_10_5)";
Map[] params = dailyRecords.stream().map(m -> {
    m.put("Blended_KPI_85_10_5", m.get("Blended_KPI_85/10/5"));
    m.put("CPB_85_10_5", m.get("CPB_(85/10/5)"));
    return m;
}).toArray(Map[]::new);
namedParameterJdbcTemplate.batchUpdate(sql, params);
Note that I removed these characters and changed the placeholder names in your sql statement as below:
:Blended_KPI_85/10/5 => :Blended_KPI_85_10_5
:CPB_(85/10/5) => :CPB_85_10_5
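If there are many such columns, the renaming could be done generically instead of key by key. The following is only a minimal sketch using the plain JDK: the class PlaceholderKeys and its methods are hypothetical names, and it assumes that no two keys in a record collapse to the same sanitized name. It returns copies, so the original maps are left unchanged.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class PlaceholderKeys {

    // Hypothetical helper: collapses every run of non-alphanumeric characters
    // into a single underscore, then trims underscores at both ends.
    static String sanitizeKey(String key) {
        return key.replaceAll("[^A-Za-z0-9]+", "_").replaceAll("^_+|_+$", "");
    }

    // Returns a copy of the record with sanitized keys; the input map is not modified.
    // Assumption: no two keys of one record sanitize to the same name.
    static Map<String, Object> sanitizeKeys(Map<String, Object> record) {
        Map<String, Object> copy = new LinkedHashMap<>();
        record.forEach((key, value) -> copy.put(sanitizeKey(key), value));
        return copy;
    }

    // Applies sanitizeKeys to every record of the batch.
    static List<Map<String, Object>> sanitizeAll(List<Map<String, Object>> records) {
        return records.stream()
                .map(PlaceholderKeys::sanitizeKeys)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, Object> record = new LinkedHashMap<>();
        record.put("account_id", 2);
        record.put("Blended_KPI_85/10/5", 0.0);
        record.put("CPB_(85/10/5)", 0.0);
        System.out.println(sanitizeAll(Collections.singletonList(record)));
    }
}
```

With the sample record, the keys become account_id, Blended_KPI_85_10_5 and CPB_85_10_5, which match the placeholder names used in the SQL statement above. The sanitized list can then be converted to an array and passed to batchUpdate as before.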
Hope this helps. Cheers!

Stop the JPA (Hibernate) Criteria API creating repeated INs for a group of OR Predicates

I'm using the Criteria API with the Hibernate implementation of JPA. I want to improve the structure of my generated SQL query so that the same IN expression isn't repeated for each OR predicate.
I want to do this because the code is running on GAE and I get a StackOverflowError in the case where the list of names is long in the IN condition (It's due to Hibernate using StringBuilder to build the parameter list). I've pinned the problem down to this section of code as when I remove, say 8, of the OR predicates the code runs without error. The code runs fine on non-memory restrictive environments (my PC)....and yes I have increased the instance memory allocation on GAE but I still get the same error.
The Java code I have to build this part of the query is below (parameter names edited, and I'm using @StaticMetamodel classes for the parameter names):
private void buildNamesToQuery(List<String> names, Root<ARoot> theRoot,
Join<Entity1, Entity2> aJoin, List<Predicate> orPredicates) {
orPredicates.add(aJoin.get(Entity1_.name1).in(names));
orPredicates.add(aJoin.get(Entity1_.name2).in(names));
orPredicates.add(aJoin.get(Entity1_.name3).in(names));
orPredicates.add(aJoin.get(Entity1_.name4).in(names));
orPredicates.add(theRoot.get(Entity2_.name5).in(names));
orPredicates.add(theRoot.get(Entity2_.name6).in(names));
orPredicates.add(theRoot.get(Entity2_.name7).in(names));
orPredicates.add(theRoot.get(Entity2_.name8).in(names));
orPredicates.add(theRoot.get(Entity2_.name9).in(names));
orPredicates.add(theRoot.get(Entity2_.name10).in(names));
}
This generates SQL with the following structure (parameter names edited):-
select
//lots of detail for the select
where
//lots of condition detail
//but what I'm interested in is below
and (
Entity_.name1 in (?,?,?....
) //Lots of names
or Entity_.name2 in (?,?,?....
) //same list of names
or Entity_.name3 in (?,?,?...
) //same list of names again
//.
//continues like this for remainder of the query
//
);
How can I change the Criteria API code so the SQL becomes this?:-
and (
joinEntity_.name1 or joinEntity.name2 or joinEntity.name3 ... etc IN (?,?,?...)
with the same long list of names? Hopefully that's clear enough for you to suggest a solution.

How to query a clause for solr java client?

I am using Solr for querying. I want to find all the documents whose key "title" contains the text "Bifidobacterium bifidum" or whose key "abstract" contains the text "Bifidobacterium bifidum". So I wrote my query like this:
String queryCondition = "title:*Bifidobacterium bifidum* OR abstract:*Bifidobacterium bifidum*";
The returned result is not what I want: documents whose title contains "Bifidobacterium" or "bifidum", or whose abstract contains "Bifidobacterium" or "bifidum", are all returned. So my question is: how should I write my query to get what I need?

The * is a special symbol, a wildcard. Much like .* in a regular expression, it matches any sequence of characters. So querying for bifidum* would return everything that starts with bifidum. Also, the space splits your query into separate terms, so *Bifidobacterium and bifidum* are matched independently. Not what you want, right?
When reading about Solr's query syntax in the manual, you will find a section named Specifying Terms for the Standard Query Parser, which says:
A phrase is a group of words surrounded by double quotes such as "hello dolly"
This is what you need ...
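Applied to the query from the question, this could look like the sketch below. It only builds the query string with the plain JDK; the class PhraseQueryBuilder and its methods are hypothetical names, and the escaping of backslashes and embedded double quotes is an added precaution that the question itself does not need.

```java
public class PhraseQueryBuilder {

    // Wraps the text in double quotes so the standard query parser reads it as one phrase.
    // Backslashes and embedded double quotes are escaped first.
    static String phrase(String text) {
        String escaped = text.replace("\\", "\\\\").replace("\"", "\\\"");
        return "\"" + escaped + "\"";
    }

    // Builds a clause of the form field:"some phrase".
    static String fieldPhrase(String field, String text) {
        return field + ":" + phrase(text);
    }

    public static void main(String[] args) {
        String text = "Bifidobacterium bifidum";
        String queryCondition = fieldPhrase("title", text) + " OR " + fieldPhrase("abstract", text);
        // Prints: title:"Bifidobacterium bifidum" OR abstract:"Bifidobacterium bifidum"
        System.out.println(queryCondition);
    }
}
```

The resulting queryCondition replaces the wildcard version from the question. Note that a phrase query matches the two words next to each other as analyzed terms, so how strictly it behaves depends on the field type configured in your schema.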
