how to insert unique data in Elasticsearch index document in java - java

this is my previous question - how to insert data in elastic search index
index mapping is as follows
{
"test" : {
"mappings" : {
"properties" : {
"name" : {
"type" : "keyword"
},
"info" : {
"type" : "nested"
},
"joining" : {
"type" : "date"
}
}
}
how can i check the data of field is already present or not before uploading a data to the index
Note :- I dont have id field maintained in index. need to check name in each document if it is already present then dont insert document into index
thanks in advance

As you don't have a id field in your mapping, you have to search on name field and you can use below code to search on it.
public List<SearchResult> search(String searchTerm) throws IOException {
SearchRequest searchRequest = new SearchRequest(INDEX_NAME);
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
MatchQueryBuilder multiMatchQueryBuilder = new
MatchQueryBuilder(searchTerm, "firstName");
searchSourceBuilder.query(matchQueryBuilder);
searchRequest.source(searchSourceBuilder);
SearchResponse searchResponse = esclient.search(searchRequest, RequestOptions.DEFAULT);
return getSearchResults(searchResponse);
}
Note, as you have keyword field instead of match you can use termquerybuilder
And it uses the utility method to parse the searchResponse of ES, code of which is below:
private List<yourpojo> getSearchResults(SearchResponse searchResponse) {
RestStatus status = searchResponse.status();
TimeValue took = searchResponse.getTook();
Boolean terminatedEarly = searchResponse.isTerminatedEarly();
boolean timedOut = searchResponse.isTimedOut();
// Start fetching the documents matching the search results.
//https://www.elastic.co/guide/en/elasticsearch/client/java-rest/current/java-rest-high-search
// .html#java-rest-high-search-response-search-hits
SearchHits hits = searchResponse.getHits();
SearchHit[] searchHits = hits.getHits();
List<sr> sr = new ArrayList<>();
for (SearchHit hit : searchHits) {
// do something with the SearchHit
String index = hit.getIndex();
String id = hit.getId();
float score = hit.getScore();
//String sourceAsString = hit.getSourceAsString();
Map<String, Object> sourceAsMap = hit.getSourceAsMap();
String firstName = (String) sourceAsMap.get("firstName");
sr.add(userSearchResultBuilder.build());
}

Related

Elasticsearch inner hits reponse

This is my query function :
public List<feed> search(String id) throws IOException {
Query nestedQuery = NestedQuery.of(nq ->nq.path("comment").innerHits(InnerHits.of(ih -> ih)).query(MatchQuery
.of(mq -> mq.field("comment.c_text").query(id))._toQuery()))._toQuery();
Query termQueryTitle = TermQuery.of(tq -> tq.field("title").value(id))._toQuery();
Query termQueryBody = TermQuery.of(tq -> tq.field("body").value(id))._toQuery();
Query boolQuery = BoolQuery.of(bq -> bq.should(nestedQuery, termQueryBody, termQueryTitle))._toQuery();
SearchRequest searchRequest = SearchRequest.of(s -> s.index(indexName).query(boolQuery));
var response = elasticsearchClient.search(searchRequest, feed.class);
for (var hit : response.hits().hits()){
System.out.println("this is inner hit response: " + (hit.innerHits().get("comment").hits().hits())); }
List<Hit<feed>> hits = response.hits().hits();
List<feed> feeds = new ArrayList<>();
feed f=null;
for(Hit object : hits){
f = (feed) object.source();
feeds.add(f); }
return feeds;
}
i have add this code
for (var hit : response.hits().hits()){
System.out.println("this is inner hit response: " + (hit.innerHits().get("comment").hits().hits())); }
if it founds 2 records it gives me the refrence of 2 records but dont show me the actual records like its outpout is as follow if it founds 2 records in inner hit :
this is inner hit response [co.elastic.clients.elasticsearch.core.search.Hit#75679b1a]
this is inner hit response [co.elastic.clients.elasticsearch.core.search.Hit#1916d9c6]
can anyone help me to poput the actual records
This properly works for me in console :
for (var hit : response.hits().hits()) {
var innerHits = hit.innerHits().get("comment").hits().hits();
for (var innerHit : innerHits) {
JsonData source = innerHit.source();
String jsonDataString = source.toString();
System.out.println("Matched comments"+jsonDataString);
}
}
I created a class Comment with property "c_text" and did a cast before adding inside a lists comments.
var comments = new ArrayList<Comment>();
for (var hit : response.hits().hits()) {
comments.addAll(hit.innerHits().get("comment").hits().hits().stream().map(
h -> h.source().to(Comment.class)
).collect(Collectors.toList()));
}
System.out.println(comments);

Elasticsearch Java API implementation to fetch millions of records

I want to get all doc (millions) in elastic index based on some condition. I used below query in elastic.
GET /<index-name>/_search
{
"from" : 99550, "size" : 500,
"query" : {
"term" : { "CC_ENGAGEMENT_NUMBER" : "1967" }
}
}
And below are my java implementation.
public IndexSearchResult findByStudIdAndcollageId(final String studId, final String collageId,
Integer Page_Number_Start_Index, Integer Total_No_Of_Records) {
SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
List<Map<String, Object>> searchResults = new ArrayList<Map<String, Object>>();
IndexSearchResult indexSearchResult = new IndexSearchResult();
try {
QueryBuilder qurBd = new BoolQueryBuilder().minimumShouldMatch(2)
.should(QueryBuilders.matchQuery("STUD_ID", studId).operator(Operator.AND))
.should(QueryBuilders.matchQuery("CLG_ID", collageId).operator(Operator.AND));
sourceBuilder.from(Page_Number_Start_Index).size(Total_No_Of_Records);
sourceBuilder.query(qurBd);
sourceBuilder.sort(new FieldSortBuilder("ROLL_NO.keyword").order(SortOrder.DESC));
SearchRequest searchRequest = new SearchRequest();
searchRequest.indices("clgindex");
searchRequest.source(sourceBuilder);
SearchResponse response;
response = rClient.search(searchRequest, RequestOptions.DEFAULT);
response.getHits().forEach(searchHit -> {
searchResults.add(searchHit.getSourceAsMap());
});
indexSearchResult.setListOfIndexes(searchResults);
log.info("searchResultsHits {}", searchResults.size());
} catch (Exception e) {
log.error("search :: Search on clg flat index. {}", e.getMessage());
}
return indexSearchResult;
}
So if the limit from 99550 and size 500 then it will not fetch more that 1L records.
Error: "reason" : "Result window is too large, from + size must be less than or equal to: [100000] but was [100050]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting."
}
I don't want to change [index.max_result_window]. Only want solution at Java side to search all docs in index based on conditions by implementing elasticserach API.
Thanks in advance..

How to update elastic search in batches for multiple id?

I have a requirement where I need to update a field in elastic search for multiple ids. Currently I am using XcontentBuilder and passing an Id along with field name but it's a for loop that's why time complexity becomes horrible if I pass multiple Ids. Is there a way where I can do same operation in batches?
My Code is like this:
UpdateRequest updateRequest = new UpdateRequest();
updateRequest.index("index");
updateRequest.type("_doc");
updateRequest.id("1");
updateRequest.doc(jsonBuilder()
.startObject()
.field("gender", "male")
.endObject());
client.update(updateRequest).get();
Id is a dynamic field and for each Id I am running a loop using above code.
I have not tested this but you can check If this can be of any help
Option 1:
List<String> ids = new ArrayList<>();
ids.add("1");
ids.add("2");
for (String id : ids) {
UpdateRequest updateRequest = new UpdateRequest();
updateRequest.index("index");
updateRequest.type("_doc");
updateRequest.id("1");
updateRequest.doc(jsonBuilder().startObject().field("gender", "male").endObject());
bulkRequest.add(updateRequest);
}
BulkResponse bulkResponse = bulkRequest.execute().actionGet();
Option 2: Using Script
List<String> ids = new ArrayList<>();
ids.add("1");
ids.add("2");
Map<String, Object> params = new HashMap<>();
String scriptCode = "ctx._source.gender=params.gender";
params.put("gender", "male");
BulkRequestBuilder bulkRequest = client.prepareBulk();
for(String id : ids) {
UpdateRequestBuilder updateRequestBuilder = client.prepareUpdate("index", "type", id)
.setScript(new Script(ScriptType.INLINE, "painless", scriptCode, params));
bulkRequest.add(updateRequestBuilder);
}
BulkResponse bulkResponse = bulkRequest.execute().actionGet();

How to fetch the particular data from Mongo subdocument in java?

I have tried to fetch data for the particular column value in the mongo document but its displaying whole data.
Following is the mongo document:
{
"_id" : ObjectId("59db2321811a592384865711"),
"User_ID" : "demo",
"Project_ID" : "demo-1",
"Project_Information" : {
"Project_Description" : "Sample",
"Primary_Building_Type" : "Office",
"State" : "AR",
"Analysis_Type" : "1",
"Project_Billing_Number" : "WY",
"Country" : "USA",
"Climate_Zone" : "3A",
"Zip_Code" : "71611"
"City" : "WA",
"Units" : "IP"
}
}
I want to fetch the following output:
[
{
"User_ID": "demo",
"Project_Description": "Sample"
}]
I have tried using dot: Project_Information.Project_Description.The code is as below:
public Object[] addDemo1(String User_ID) throws Exception {
DB db = ConnectToDB.getConnection();
Properties prop = new Properties();
InputStream input = null;
input = GetProjectStatus.class.getClassLoader().getResourceAsStream("config.properties");
prop.load(input);
String col = prop.getProperty("COLLECTION_PI");
System.out.println("data is.." + col);
DBCollection collection = db.getCollection(col);
BasicDBObject obj = new BasicDBObject();
BasicDBObject fields = new BasicDBObject();
BasicDBObject fields2 = new BasicDBObject();
List<DBObject> obj1 = null;
if (User_ID != null && !User_ID.equals("") && User_ID.length() > 0) {
obj.put("User_ID", User_ID);
fields.put("_id", 0);
fields.put("User_ID", 1);
fields.put("Project_ID", 1);
fields.append("Project_Information.Project_Description", "Project_Description");
BasicDBObject fields1 = new BasicDBObject();
fields1.put("User_ID", User_ID);
}
DBCursor cursor = collection.find(obj, fields);
System.out.println("count is:" + cursor.count());
obj1 = cursor.toArray();
System.out.println("" + obj1);
cursor.close();
db.getMongo().close();
return obj1.toArray();
}
But it displays the whole structure of Project_Information.
Please specify how to achieve this. Thanks for help.
Using a 2.x MongoDB Java Driver
Here's an example using the MongoDB 2.x Java driver:
DBCollection collection = mongoClient.getDB("stackoverflow").getCollection("demo");
BasicDBObject filter = new BasicDBObject();
BasicDBObject projection = new BasicDBObject();
// project on "Project_Information.Project_Description"
projection.put("Project_Information.Project_Description", 1);
DBCursor documents = collection.find(filter, projection);
for (DBObject document : documents) {
// the response contains a sub document under the key: "Project_Information"
DBObject projectInformation = (DBObject) document.get("Project_Information");
// the "Project_Description" is in this sub document
String projectDescription = (String) projectInformation.get("Project_Description");
// prints "Sample"
System.out.println(projectDescription);
// to return this single String value in an Object[] (as implied by your OP) just do create the Object[] like this and then return it ...
Object[] r = new Object[] {projectDescription};
// prints the entire projected document e.g.
// { "_id" : { "$oid" : "59db2321811a592384865711" }, "Project_Information" : { "Project_Description" : "Sample" } }
System.out.println(document.toString());
}
Using a 3.x MongoDB Java Driver
Here's an example using the MongoDB 3.x Java driver:
// this finds all documents in a given collection (note: no parameter supplied to the find() call)
// and for each document it projects on Project_Information.Project_Description
FindIterable<Document> documents =
mongoClient.getDatabase("...").getCollection("...")
.find()
// for each attrbute you want to project you must include its dot notation path and the value 1 ...
// this is the equivalent of specifying {'Project_Information.Project_Description': 1} in the MongoDB shell
.projection(new Document("Project_Information.Project_Description", 1));
for (Document document : documents) {
// the response contains a sub document under the key: "Project_Information"
Document projectInformation = (Document) document.get("Project_Information");
// the "Project_Description" is in this sub document
String projectDescription = projectInformation.getString("Project_Description");
// prints "Sample"
System.out.println(projectDescription);
// to return this single String value in an Object[] (as implied by your OP) just do create the Object[] like this and then return it ...
Object[] r = new Object[] {projectDescription};
// prints the entire projected document e.g. { "_id" : { "$oid" : "59db2321811a592384865711" }, "Project_Information" : { "Project_Description" : "Sample" } }
System.out.println(document.toJson());
}
Java libraries won't let you directly access using dot.
They have build in getter and setter methods.
You have not mentioned which package you are using.
Here's the query that you need:
db.mycol.find({},{User_ID:1,"Project_Information.Project_Description":1})
It will give:
{ "_id" : ObjectId("59db2321811a592384865711"),
"User_ID" : "demo",
"Project_Information" : { "Project_Description" : "Sample" }
}
You will have to convert the query in whatever format your package accepts.
Here's a tutorial:
https://www.mongodb.com/blog/post/getting-started-with-mongodb-and-java-part-i

Efficient way to get only _ids in ElasticSearch - Java

Is this the most efficient way to retrieve only ids from ElasticSearch?
requestBuilder.setQuery(queryBuilder);
requestBuilder.setFrom(start);
requestBuilder.setSize(limit);
requestBuilder.setFetchSource(false);
SearchResponse response = requestBuilder.execute().actionGet();
SearchHit[] hits = response.getHits().getHits();
List<Long> refugeeIds = new ArrayList<>();
for (SearchHit hit : hits) {
if (hit.getId() != null) {
refugeeIds.add(Long.parseLong(hit.getId().toString()));
}
}
That should be the best way. You don't return the _source and ES will only return the _type, _index, _score and _id.

Categories