Elasticsearch: highlighted field not always returned (Java)

Using the Java API, I need to retrieve the field (and its highlighted fragment) that matched the query. To do this, I query the _all field (or else *) and request highlighting on every field in the response.
It works most of the time, but not always. Here is a snippet:
final BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery();
Arrays.asList(query.split(" "))
.stream()
.map(QueryParser::escape)
.map(x -> String.format("*%s*", x))
.forEach(x -> {
boolQueryBuilder.should(
QueryBuilders.queryStringQuery(x)
.field("_all")
.allowLeadingWildcard(true));
});
SearchResponse response = client
.prepareSearch()
.setSize(10)
.addHighlightedField("*")
.setHighlighterRequireFieldMatch(false)
.setQuery(boolQueryBuilder)
.setHighlighterFragmentSize(40)
.setHighlighterNumOfFragments(40)
.execute()
.actionGet();
Any idea why the field, as well as the highlighted field, is not always available in the response, given that it is technically always queried?

Not sure, but I think you might be looking for this:
String aQueryWithPartialSearch = null;
final BoolQueryBuilder aBoolQueryBuilder = new BoolQueryBuilder();
// Enabling partial search: wrap every token in wildcards
if (query.contains(" ")) {
    List<String> aTokenList = Arrays.asList(query.split(" "));
    aQueryWithPartialSearch = String.join(" ", aTokenList.stream().map(p -> "*" + p + "*").collect(Collectors.toList()));
} else {
    aQueryWithPartialSearch = "*" + query + "*";
}
aBoolQueryBuilder.should(QueryBuilders.queryStringQuery(aQueryWithPartialSearch));

Related

Elasticsearch inner hits response

This is my query function :
public List<feed> search(String id) throws IOException {
Query nestedQuery = NestedQuery.of(nq ->nq.path("comment").innerHits(InnerHits.of(ih -> ih)).query(MatchQuery
.of(mq -> mq.field("comment.c_text").query(id))._toQuery()))._toQuery();
Query termQueryTitle = TermQuery.of(tq -> tq.field("title").value(id))._toQuery();
Query termQueryBody = TermQuery.of(tq -> tq.field("body").value(id))._toQuery();
Query boolQuery = BoolQuery.of(bq -> bq.should(nestedQuery, termQueryBody, termQueryTitle))._toQuery();
SearchRequest searchRequest = SearchRequest.of(s -> s.index(indexName).query(boolQuery));
var response = elasticsearchClient.search(searchRequest, feed.class);
for (var hit : response.hits().hits()){
System.out.println("this is inner hit response: " + (hit.innerHits().get("comment").hits().hits())); }
List<Hit<feed>> hits = response.hits().hits();
List<feed> feeds = new ArrayList<>();
feed f=null;
for(Hit object : hits){
f = (feed) object.source();
feeds.add(f); }
return feeds;
}
I have added this code:
for (var hit : response.hits().hits()){
System.out.println("this is inner hit response: " + (hit.innerHits().get("comment").hits().hits())); }
If it finds 2 records, it prints references to the 2 records but not the actual records. For example, this is the output when 2 inner hits are found:
this is inner hit response: [co.elastic.clients.elasticsearch.core.search.Hit@75679b1a]
this is inner hit response: [co.elastic.clients.elasticsearch.core.search.Hit@1916d9c6]
Can anyone help me print the actual records?
This works properly for me in the console:
for (var hit : response.hits().hits()) {
var innerHits = hit.innerHits().get("comment").hits().hits();
for (var innerHit : innerHits) {
JsonData source = innerHit.source();
String jsonDataString = source.toString();
System.out.println("Matched comments"+jsonDataString);
}
}
I created a class Comment with a property "c_text" and converted each inner hit's source to it before adding it to a list of comments.
var comments = new ArrayList<Comment>();
for (var hit : response.hits().hits()) {
comments.addAll(hit.innerHits().get("comment").hits().hits().stream().map(
h -> h.source().to(Comment.class)
).collect(Collectors.toList()));
}
System.out.println(comments);
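The `Hit@75679b1a` style output in the question has the shape produced by Java's default `Object.toString()`: the class name, an `@`, and a hexadecimal hash code. The same applies to the final `System.out.println(comments)` above, which only prints field values if `Comment` overrides `toString()`. Here is a minimal sketch of the difference; `RawComment` and `Comment` are hypothetical illustration classes, not types from the Elasticsearch client:

```java
import java.util.Arrays;
import java.util.List;

class ToStringSketch {

    // Hypothetical class with no toString(): printing falls back to Object.toString().
    static final class RawComment {
        final String cText;

        RawComment(String cText) {
            this.cText = cText;
        }
    }

    // Same data, but toString() is overridden so println shows the content.
    static final class Comment {
        final String cText;

        Comment(String cText) {
            this.cText = cText;
        }

        @Override
        public String toString() {
            return "Comment{c_text='" + cText + "'}";
        }
    }

    public static void main(String[] args) {
        List<RawComment> raw = Arrays.asList(new RawComment("nice post"));
        List<Comment> readable = Arrays.asList(new Comment("nice post"));
        // Prints the class name, '@' and a hash code that varies between runs.
        System.out.println(raw);
        // Prints [Comment{c_text='nice post'}]
        System.out.println(readable);
    }
}
```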

How can I convert this loop to a Java stream?

I am pretty new to Java 8 streams. I was trying to process a collection of objects using a stream, but I could not get the result I wanted.
Below is the snippet I came up with (which gives the wrong result). The expected end result is a List<String> of entries like "Names email@test.com".
recordObjects is a collection of objects.
choices = recordObjects.stream()
.filter(record -> record.getAttribute
(OneRecord.AT_RECORD_SUBMITTER_TABLE_EMAIL) != null)
.filter(record -> !record.getAttributeAsString
(OneRecord.AT_RECORD_SUBMITTER_TABLE_EMAIL).isEmpty())
.map(record -> record.getMultiValuedAttribute
(OneRecord.AT_RECORD_SUBMITTER_TABLE_EMAIL, String.class))
.flatMap(Collection::stream)
.map(email -> getFormattedEmailAddress(ATTRI_AND_RECORD_CONTACT_DEFAULT_NAME, email))
.collect(Collectors.toList());
But below is the exact logic I want to implement using streams:
for (CallerObject record : recordObjects) {
List<String> emails = record.getMultiValuedAttribute(
OneRecord.AT_RECORD_SUBMITTER_TABLE_EMAIL, String.class);
List<String> names = record.getMultiValuedAttribute(
OneRecord.AT_RECORD_SUBMITTER_TABLE_NAME, String.class);
int N = emails.size();
for (int i = 0 ; i < N ; i++) {
if(!isNullOrEmpty(emails.get(i)))
{
choices.add(getFormattedEmailAddress(isNullOrEmpty(names.get(i)) ?
ATTRI_AND_RECORD_CONTACT_DEFAULT_NAME : names.get(i) , emails.get(i)));
}
}
}
Since we don't know the getFormattedEmailAddress method, I used String.format instead to achieve the desired representation "Names email@test.com":
// the mapper function: using String.format
// note: this reads one value per record; it does not pair multi-valued names and emails by index
Function<CallerObject, String> toEmailString = record -> {
    // getAttributeAsString(...) is the single-value accessor already used in the question
    String email = record.getAttributeAsString(OneRecord.AT_RECORD_SUBMITTER_TABLE_EMAIL);
    String name = record.getAttributeAsString(OneRecord.AT_RECORD_SUBMITTER_TABLE_NAME);
    if (email != null && !email.isEmpty()) {
        return String.format("%s %s", name, email);
    } else {
        return null;
    }
};
choices = recordObjects.stream()
        .map(toEmailString) // map to email-format or null
        .filter(Objects::nonNull) // exclude null strings where no email was found
        .collect(Collectors.toList());
Your pre-Java 8 code, converted to Java 8 streams:
final Function<CallerObject, List<String>> filteredEmail = ro -> {
    final List<String> emails = ro.getMultiValuedAttribute(
            OneRecord.AT_RECORD_SUBMITTER_TABLE_EMAIL, String.class);
    final List<String> names = ro.getMultiValuedAttribute(
            OneRecord.AT_RECORD_SUBMITTER_TABLE_NAME, String.class);
    return IntStream.range(0, emails.size())
            .filter(index -> !isNullOrEmpty(emails.get(index)))
            // mapToObj is required here: IntStream.map can only map an int to an int
            .mapToObj(index -> getFormattedEmailAddress(isNullOrEmpty(names.get(index)) ?
                    ATTRI_AND_RECORD_CONTACT_DEFAULT_NAME : names.get(index), emails.get(index)))
            .collect(Collectors.toList());
};
choices = recordObjects
        .stream()
        .map(filteredEmail)
        .flatMap(Collection::stream)
        .collect(Collectors.toList());
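The index-based pairing from the loop can be tried in isolation. The following is a minimal sketch under assumptions: `Contact` is a hypothetical stand-in for `CallerObject` with two parallel lists, `DEFAULT_NAME` is an invented default, and `String.format` replaces `getFormattedEmailAddress`, whose real behaviour is not shown in the question.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class PairedEmailSketch {

    // Hypothetical stand-in for CallerObject: two parallel multi-valued attributes.
    static final class Contact {
        final List<String> names;
        final List<String> emails;

        Contact(List<String> names, List<String> emails) {
            this.names = names;
            this.emails = emails;
        }
    }

    static final String DEFAULT_NAME = "Unknown"; // assumed default, not from the question

    static boolean isNullOrEmpty(String s) {
        return s == null || s.isEmpty();
    }

    // Pairs names.get(i) with emails.get(i) and skips entries that have no email.
    static List<String> choices(List<Contact> contacts) {
        return contacts.stream()
                .flatMap(c -> IntStream.range(0, c.emails.size())
                        .filter(i -> !isNullOrEmpty(c.emails.get(i)))
                        // mapToObj turns each index into a String; IntStream.map could not
                        .mapToObj(i -> String.format("%s %s",
                                isNullOrEmpty(c.names.get(i)) ? DEFAULT_NAME : c.names.get(i),
                                c.emails.get(i))))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Contact> contacts = Arrays.asList(
                new Contact(Arrays.asList("Ann", ""), Arrays.asList("ann@test.com", "bob@test.com")),
                new Contact(Arrays.asList("Cid"), Arrays.asList("")));
        // Prints [Ann ann@test.com, Unknown bob@test.com]
        System.out.println(choices(contacts));
    }
}
```

Using `flatMap` directly on the per-record index stream avoids building an intermediate list for every record.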

Web scraping using multithreading

I wrote some code to look up movie names on IMDb, but if, for instance, I search for "Harry Potter", I find more than one movie. I would like to use multithreading, but I don't have much knowledge in this area.
I am using the strategy design pattern to search across several websites, and inside one of the methods I have this code:
for (Element element : elements) {
String searchedUrl = element.select("a").attr("href");
String movieName = element.select("h2").text();
if (movieName.matches(patternMatcher)) {
Result result = new Result();
result.setName(movieName);
result.setLink(searchedUrl);
result.setTitleProp(super.imdbConnection(movieName));
System.out.println(movieName + " " + searchedUrl);
resultList.add(result);
}
}
which, for each element (a movie name), opens a new connection to IMDb to look up ratings and other details, on the super.imdbConnection(movieName) line.
The problem is that I would like all the connections to run at the same time, because with 5-6 movies found the process takes much longer than expected.
I am not asking for code; I want some ideas. I thought about creating an inner class which implements Runnable and using that, but I am not sure it makes sense.
How can I rewrite that loop to use multithreading?
I am using Jsoup for parsing; Element and Elements are from that library.
The simplest way is parallelStream():
List<Result> resultList = elements.parallelStream()
        .map(element -> {
            String searchedUrl = element.select("a").attr("href");
            String movieName = element.select("h2").text();
            if (movieName.matches(patternMatcher)) {
                Result result = new Result();
                result.setName(movieName);
                result.setLink(searchedUrl);
                result.setTitleProp(super.imdbConnection(movieName));
                System.out.println(movieName + " " + searchedUrl);
                return result;
            } else {
                return null;
            }
        }).filter(Objects::nonNull)
        .collect(Collectors.toList());
If you don't like parallelStream() and want to use threads, you can do this:
// `elements` is the Jsoup Elements collection from the question
// create a function which returns an implementation of `Callable`
// input: Element
// output: Callable<Result>
// a lambda is used instead of an anonymous class so that `super` still
// refers to the parent of the enclosing class, not to the Callable
Function<Element, Callable<Result>> scrapFunction = element -> () -> {
    String searchedUrl = element.select("a").attr("href");
    String movieName = element.select("h2").text();
    if (movieName.matches(patternMatcher)) {
        Result result = new Result();
        result.setName(movieName);
        result.setLink(searchedUrl);
        result.setTitleProp(super.imdbConnection(movieName));
        System.out.println(movieName + " " + searchedUrl);
        return result;
    } else {
        return null;
    }
};
// create a fixed pool of threads (newFixedThreadPool rejects a size of 0)
ExecutorService executor = Executors.newFixedThreadPool(Math.max(1, elements.size()));
// submit a Callable<Result> for every Element
// by using scrapFunction.apply(...)
List<Future<Result>> futures = elements.stream()
        .map(e -> executor.submit(scrapFunction.apply(e)))
        .collect(Collectors.toList());
// collect all results from Callable<Result>
List<Result> resultList = futures.stream()
        .map(e -> {
            try {
                return e.get();
            } catch (Exception ignored) {
                return null;
            }
        }).filter(Objects::nonNull)
        .collect(Collectors.toList());
// release the worker threads once all results are in
executor.shutdown();
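The thread-pool approach can be tried without Jsoup or a network connection. This is a minimal sketch under assumptions: `slowLookup` is a hypothetical stand-in for `super.imdbConnection(movieName)` that just sleeps, and the cap of 8 threads is an arbitrary choice for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

class ParallelLookupSketch {

    // Hypothetical stand-in for super.imdbConnection(movieName): sleeps to mimic network latency.
    static String slowLookup(String movieName) throws InterruptedException {
        TimeUnit.MILLISECONDS.sleep(200);
        return "rating for " + movieName;
    }

    static List<String> lookupAll(List<String> movieNames) {
        // Cap the pool size; newFixedThreadPool needs at least one thread.
        int threads = Math.max(1, Math.min(movieNames.size(), 8));
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<String>> tasks = new ArrayList<>();
            for (String name : movieNames) {
                tasks.add(() -> slowLookup(name));
            }
            // invokeAll blocks until every task has finished and returns futures in input order.
            List<String> results = new ArrayList<>();
            for (Future<String> future : executor.invokeAll(tasks)) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for lookups", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a lookup failed", e.getCause());
        } finally {
            executor.shutdown(); // otherwise the worker threads keep the JVM alive
        }
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        List<String> results = lookupAll(Arrays.asList("Movie A", "Movie B", "Movie C"));
        long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        System.out.println(results + " in about " + elapsedMs + " ms");
    }
}
```

Because the lookups spend their time waiting on I/O rather than computing, a dedicated pool gives more control over the number of concurrent connections than the shared pool behind `parallelStream()`.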

Elasticsearch Java query with combination of AND/OR

I am trying to write a query in Elasticsearch via Spring and Java (Elasticsearch client).
The query is somewhat like:
SELECT *** FROM elasticsearch_index
WHERE isActive = 1 AND
(
(store_code = 41 AND store_genre IN ('01', '03') )
OR (store_code = 40 AND store_genre IN ('02') )
OR (store_code = 42 AND store_genre IN ('05', '06') )
)
AND LATITUDE ...
AND LONGITUDE...
Please note that the parameters within the outer brackets come from a Map<Integer, String[]>, so I would iterate over the map to add the AND + OR conditions.
I tried the equivalent Java approach, but it does not seem to work:
BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery();
boolQueryBuilder.must(QueryBuilders.matchQuery("isActive", 1));
BoolQueryBuilder orQuery = QueryBuilders.boolQuery();
for (Entry<Integer, String[]> entry : cvsDepoMapping.entrySet()) {
int key = entry.getKey();
String[] value = entry.getValue();
orQuery.must(QueryBuilders.matchQuery("storeCode", key));
orQuery.must(QueryBuilders.termsQuery("storeGenre", value)); // IN clause
boolQueryBuilder.should(orQuery);
}
But this is not working, nor am I certain it is the right approach.
I am struggling to find the Java equivalent of the above condition.
I am using:
Spring Boot 2.1.1.RELEASE
Elasticsearch 6.4.3
Within your OR query you need to put a nested AND query for each entry.
Without having tried to run it:
BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery();
boolQueryBuilder.must(QueryBuilders.matchQuery("isActive", 1));
BoolQueryBuilder orQuery = QueryBuilders.boolQuery();
for (Entry<Integer, String[]> entry : cvsDepoMapping.entrySet()) {
BoolQueryBuilder storeQueryBuilder = QueryBuilders.boolQuery();
int key = entry.getKey();
String[] value = entry.getValue();
storeQueryBuilder.must(QueryBuilders.matchQuery("storeCode", key));
storeQueryBuilder.must(QueryBuilders.termsQuery("storeGenre", value)); // IN clause
orQuery.should(storeQueryBuilder);
}
boolQueryBuilder.must(orQuery);
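The nesting can be checked without a cluster by modelling it with plain `java.util.function.Predicate` objects. This is not Elasticsearch API code: `Store` is a hypothetical document shape, `and` plays the role of `must`, and `or` plays the role of `should`. It is only a sketch of why each map entry needs its own AND group.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

class BoolNestingSketch {

    // Hypothetical document shape used only for this illustration.
    static final class Store {
        final int isActive;
        final int storeCode;
        final String storeGenre;

        Store(int isActive, int storeCode, String storeGenre) {
            this.isActive = isActive;
            this.storeCode = storeCode;
            this.storeGenre = storeGenre;
        }
    }

    // Builds: isActive = 1 AND ((code = k1 AND genre IN v1) OR (code = k2 AND genre IN v2) ...)
    static Predicate<Store> build(Map<Integer, String[]> mapping) {
        Predicate<Store> anyPair = s -> false; // plays the role of orQuery
        for (Map.Entry<Integer, String[]> entry : mapping.entrySet()) {
            int code = entry.getKey();
            List<String> genres = Arrays.asList(entry.getValue());
            // One fresh AND group per entry, like storeQueryBuilder in the answer.
            Predicate<Store> pair = s -> s.storeCode == code && genres.contains(s.storeGenre);
            anyPair = anyPair.or(pair); // like orQuery.should(storeQueryBuilder)
        }
        Predicate<Store> active = s -> s.isActive == 1;
        return active.and(anyPair); // like boolQueryBuilder.must(orQuery)
    }

    public static void main(String[] args) {
        Map<Integer, String[]> mapping = new LinkedHashMap<>();
        mapping.put(41, new String[] {"01", "03"});
        mapping.put(40, new String[] {"02"});
        Predicate<Store> query = build(mapping);
        System.out.println(query.test(new Store(1, 41, "03"))); // true
        System.out.println(query.test(new Store(1, 41, "02"))); // false: genre belongs to store 40
        System.out.println(query.test(new Store(0, 40, "02"))); // false: not active
    }
}
```

In the question's version every entry added its `must` clauses to the same builder, which corresponds to AND-ing all store codes together, so no single document could satisfy it once the map had more than one entry.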

How to read Query Parameters key and value

javax.persistence.Query
Query query = ....
query.setParameter("PARAM_1","1")
.setParameter("PARAM_2","2")
.setParameter("PARAM_3","3")
...
...;
I want to get the parameters and write them to the console, like this:
Console output:
PARAM_1 - 1
PARAM_2 - 2
PARAM_3 - 3
...
...
java.util.Set<Parameter<?>> params = query.getParameters();
for (Parameter<?> p : params) {
    String paramName = p.getName();
    System.out.print(paramName + " - ");
    System.out.println(query.getParameterValue(paramName));
}
You just have to look at the Javadoc for the method getParameters():
Query q = ...;
...
Set<Parameter<?>> parameters = q.getParameters();
for (Parameter<?> param : parameters){
if (null == param.getName()){
System.out.print(param.getPosition());
} else {
System.out.print(param.getName());
}
System.out.print(" - ");
System.out.println(q.getParameterValue(param));
}
You can try this:
Query query = ...;
String[] keys = new String[] {"PARAM_1", "PARAM_2", "PARAM_3"};
for(String key : keys) {
System.out.println(key + " - " + query.getParameterValue(key));
}
Check the Query interface. It has several getParameter*() methods; see which one suits you.
getParameter() is a good place to start.
This worked for me:
((NativeQueryImpl) fQuery).getNamedParameterMap()
