Spring Elasticsearch - bulk save multiple indices in one line? - java

I have multiple documents with different index names that are bulk-saved in Elasticsearch:
public void bulkCreateOrUpdate(List<Person> personUpdateList, List<Address> addressUpdateList, List<Position> positionUpdateList) {
    this.operations.bulkUpdate(personUpdateList, Person.class);
    this.operations.bulkUpdate(addressUpdateList, Address.class);
    this.operations.bulkUpdate(positionUpdateList, Position.class);
}
However, can this be optimized into a single call that saves multiple lists of different index types?

TL;DR
The bulk API certainly allows for it. This is a valid call:
POST _bulk
{"index":{"_index":"index_1"}}
{"data":"data"}
{"index":{"_index":"index_2"}}
{"data":"data"}
How your Java client deals with it, I am not sure.
Solution
Java client - Bulk
This could be done:
BulkRequest.Builder br = new BulkRequest.Builder();
br.operations(op -> op
    .index(idx -> idx
        .index("index_1")
        .id("1")
        .document(document)
    )
);
br.operations(op -> op
    .index(idx -> idx
        .index("index_2")
        .id("1")
        .document(document)
    )
);
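For completeness, a minimal sketch of executing the built request with the Java API Client (assuming esClient is a configured ElasticsearchClient and document is any serializable POJO):
// one bulk call writes to both indices
BulkResponse result = esClient.bulk(br.build());
if (result.errors()) {
    // inspect result.items() for per-operation failures
}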
Java Rest Client - Bulk
This could be done this way:
BulkRequest request = new BulkRequest();
request.add(new IndexRequest("index_1").id("1")
        .source(XContentType.JSON, "data", "data"));
request.add(new IndexRequest("index_2").id("1")
        .source(XContentType.JSON, "data", "data"));
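And again for completeness, a sketch of executing it (assuming client is a RestHighLevelClient):
BulkResponse bulkResponse = client.bulk(request, RequestOptions.DEFAULT);
if (bulkResponse.hasFailures()) {
    // inspect bulkResponse.getItems() for per-operation failures
}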

For Spring Data Elasticsearch:
The ElasticsearchOperations.bulkXXX() methods take a List<IndexQuery> as their first parameter. You can set an index name on each of these objects to specify which index the data should be written to or updated in. The index name taken from the last parameter (either the entity class or an IndexCoordinates object) is used only when no index name is set on the IndexQuery itself.
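A minimal sketch of that approach (an assumption-laden illustration: it presumes a Spring Data Elasticsearch version where IndexQueryBuilder exposes withIndexName(); method names vary across versions, and the index names here are made up):
List<IndexQuery> queries = new ArrayList<>();
for (Person p : personUpdateList) {
    queries.add(new IndexQueryBuilder()
            .withId(p.getId())
            .withObject(p)
            .withIndexName("person_index") // per-query index name, as described above (assumed setter)
            .build());
}
// ... build the Address and Position queries the same way, each with its own index name ...
operations.bulkIndex(queries, IndexCoordinates.of("person_index")); // fallback index for queries without one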

Related

how to use mongodb java-driver Projections.slice

I am trying to use Aggregates.project to slice the array in my documents.
My documents look like
{
    "date": "",
    "stype_0": [1, 2, 3, 4]
}
and my code in Java is:
Aggregates.project(Projections.fields(
    Projections.slice("stype_0", pst-1, pen-pst), Projections.slice("stype_1", pst-1, pen-pst),
    Projections.slice("stype_2", pst-1, pen-pst), Projections.slice("stype_3", pst-1, pen-pst)))
Finally I get the error
First argument to $slice must be an array, but is of type: int
I guess that is because the first element in stype_0 is an int, but I really do not know why. Thanks a lot!
$slice has two versions: $slice (aggregation) and $slice (projection). You are using the wrong one.
The aggregation $slice has no built-in helper in the Java driver, so you build the document yourself. Below is an example for one such projection; do the same for all the other projection fields.
List<Object> stype_0 = Arrays.asList("$stype_0", 1, 1);
Bson project = Aggregates.project(Projections.fields(new Document("stype_0", new Document("$slice", stype_0))));
AggregateIterable<Document> iterable = dbCollection.aggregate(Arrays.asList(project));
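Applying the same pattern to all four fields in a single $project stage might look like this sketch (pst and pen as in the question):
Document slices = new Document();
for (String field : Arrays.asList("stype_0", "stype_1", "stype_2", "stype_3")) {
    // aggregation $slice takes [ array expression, skip, count ]
    slices.append(field, new Document("$slice", Arrays.asList("$" + field, pst - 1, pen - pst)));
}
Bson project = Aggregates.project(slices);
AggregateIterable<Document> iterable = dbCollection.aggregate(Arrays.asList(project));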

Which naming pattern should I use for basic CRUD?

I'm creating a RESTful API for the first time using the Spring framework, and now I'm a bit confused about the common labels used to create, read, update and delete. I want to follow a pattern for easy maintenance of the code. Is there any rule or naming pattern for the labels that I should follow?
I'm thinking about:
/service -> return every services
/service/new -> create new service
/service/update -> update service
/service/delete -> delete service
Use the HTTP verb to control what you want to do with the resources:
GET: /services -> returns all elements
GET: /services/{id} -> returns element with id
POST: /services -> creates a new object, pass the object in the body
PUT: /services/{id} -> updates element with id, pass updated values in body
DELETE: /services/{id} -> delete element with id
I strongly recommend using query params for paging in GET: /services, returning a default count and page 1 if they are not specified.
A full request could look like: http://www.example.com/services?page=5&count=10
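As a hedged sketch, those mappings could look like this in a Spring controller (ServiceDto and the in-memory store are illustrative placeholders, not a real persistence layer):
import java.util.*;
import java.util.concurrent.atomic.AtomicLong;
import java.util.stream.Collectors;
import org.springframework.web.bind.annotation.*;

// ServiceDto is a placeholder POJO, e.g. public record ServiceDto(String name) {}
@RestController
@RequestMapping("/services")
public class ServiceController {

    private final Map<Long, ServiceDto> store = new LinkedHashMap<>();
    private final AtomicLong ids = new AtomicLong();

    // GET /services?page=5&count=10 -> returns one page of elements
    @GetMapping
    public List<ServiceDto> findAll(@RequestParam(defaultValue = "1") int page,
                                    @RequestParam(defaultValue = "10") int count) {
        return store.values().stream()
                .skip((long) (page - 1) * count)
                .limit(count)
                .collect(Collectors.toList());
    }

    // GET /services/{id} -> returns element with id
    @GetMapping("/{id}")
    public ServiceDto findOne(@PathVariable long id) {
        return store.get(id);
    }

    // POST /services -> creates a new object, passed in the body
    @PostMapping
    public ServiceDto create(@RequestBody ServiceDto dto) {
        store.put(ids.incrementAndGet(), dto);
        return dto;
    }

    // PUT /services/{id} -> updates element with id, updated values in the body
    @PutMapping("/{id}")
    public ServiceDto update(@PathVariable long id, @RequestBody ServiceDto dto) {
        store.put(id, dto);
        return dto;
    }

    // DELETE /services/{id} -> deletes element with id
    @DeleteMapping("/{id}")
    public void delete(@PathVariable long id) {
        store.remove(id);
    }
}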

ElasticSearch Java API to query and return single column instead of the whole json document

While searching using the Java API in Elasticsearch, I would like to retrieve only one column.
Currently when I query using the Java API it returns the whole record, like this: [{_id=123-456-7890, name=Wonder Woman, gender=FEMALE}, {_id=777-990-7890, name=Cat Woman, gender=FEMALE}]
The records above correctly match the search condition, as shown in the code below:
List<Map<String, Object>> result = new ArrayList<Map<String, Object>>();
SearchRequestBuilder srb = client.prepareSearch("heros")
        .setSearchType(SearchType.DFS_QUERY_THEN_FETCH);
MatchQueryBuilder mqb = QueryBuilders.matchQuery("name", "Woman");
srb.setQuery(mqb);
SearchResponse response = srb.execute().actionGet();
long totalHitCount = response.getHits().getTotalHits();
System.out.println(totalHitCount);
for (SearchHit hit : response.getHits()) {
    result.add(hit.getSource());
}
System.out.println(result);
I want only one column to be returned. If I search for name, I just want the full names back in a list: "Wonder Woman" and "Cat Woman" only, not the whole JSON record for each of them. If I need to iterate over the resulting list of maps in Java, please propose an example of how to do that in this case.
You can specify the fields to be returned from a search, per the documentation. This can be set via SearchRequestBuilder.addFields(String... fields), i.e.:
SearchRequestBuilder srb = client.prepareSearch("heros")
.setSearchType(SearchType.DFS_QUERY_THEN_FETCH)
.addFields("name");
Better, combine both:
use .addFields("name") to tell ES that it needs to return only this column
use hit.field("name").getValue().toString() to get the result
It is important to use .addFields when you don't need the whole document but only specific field(s), as it lowers the overhead and the network traffic.
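Putting the two together, a sketch built only from the snippets in this thread (same heros index and client as in the question):
SearchResponse response = client.prepareSearch("heros")
        .setSearchType(SearchType.DFS_QUERY_THEN_FETCH)
        .addFields("name") // fetch only the name field
        .setQuery(QueryBuilders.matchQuery("name", "Woman"))
        .execute().actionGet();

List<String> names = new ArrayList<String>();
for (SearchHit hit : response.getHits()) {
    names.add(hit.field("name").getValue().toString());
}
// names now contains e.g. "Wonder Woman" and "Cat Woman"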
I figured it out.
List<String> valuesList = new ArrayList<String>();
for (SearchHit hit : response.getHits()) {
    result.add(hit.getSource());
    valuesList.add(hit.getSource().get("name").toString());
}
The other solutions didn't work for me; hit.getSource() was returning null. Maybe they are deprecated? Not sure. But here was my solution, which, FYI, can speed things up considerably if you are only getting one field and lots of results.
Use addFields(String...) on your SearchRequestBuilder as mentioned, but when you are getting the values you need to use:
hit.getFields().get(fieldName).getValue()
or
hit.getFields().get(fieldName).getValues()
to get a single value or a list of values, depending on the field.

Faceting using SolrJ and Solr4

I've gone through the related questions on this site but haven't found a relevant solution.
When querying my Solr4 index using an HTTP request of the form
&facet=true&facet.field=country
The response contains all the different countries along with counts per country.
How can I get this information using SolrJ?
I have tried the following but it only returns total counts across all countries, not per country:
solrQuery.setFacet(true);
solrQuery.addFacetField("country");
The following does seem to work, but I do not want to have to explicitly set all the groupings beforehand:
solrQuery.addFacetQuery("country:usa");
solrQuery.addFacetQuery("country:canada");
Secondly, I'm not sure how to extract the facet data from the QueryResponse object.
So two questions:
1) Using SolrJ how can I facet on a field and return the groupings without explicitly specifying the groups?
2) Using SolrJ how can I extract the facet data from the QueryResponse object?
Thanks.
Update:
I also tried something similar to Sergey's response (below).
List<FacetField> ffList = resp.getFacetFields();
log.info("size of ffList:" + ffList.size());
for (FacetField ff : ffList) {
    String ffname = ff.getName();
    int ffcount = ff.getValueCount();
    log.info("ffname:" + ffname + "|ffcount:" + ffcount);
}
The above code shows ffList with size=1 and the loop goes through 1 iteration. In the output ffname="country" and ffcount is the total number of rows that match the original query.
There is no per-country breakdown here.
I should mention that on the same solrQuery object I am also calling addField and addFilterQuery. Not sure if this impacts faceting:
solrQuery.addField("user-name");
solrQuery.addField("user-bio");
solrQuery.addField("country");
solrQuery.addFilterQuery("user-bio:" + "(Apple OR Google OR Facebook)");
Update 2:
I think I got it, again based on what Sergey said below. I extracted the List object using FacetField.getValues().
List<FacetField> fflist = resp.getFacetFields();
for (FacetField ff : fflist) {
    String ffname = ff.getName();
    int ffcount = ff.getValueCount();
    List<Count> counts = ff.getValues();
    for (Count c : counts) {
        String facetLabel = c.getName();
        long facetCount = c.getCount();
    }
}
In the above code, facetLabel holds each facet group and facetCount is the corresponding count for that grouping.
Actually you only need to set the facet field and faceting will be activated (check the SolrJ source code):
solrQuery.addFacetField("country");
Where did you look for the facet information? It must be in QueryResponse.getFacetFields() (then getValues() and getCount()).
In the Solr response you should use QueryResponse.getFacetFields() to get the list of FacetFields, among which "country" figures, so "country" is identified by QueryResponse.getFacetFields().get(0).
You then iterate over it to get the list of Count objects using QueryResponse.getFacetFields().get(0).getValues().get(i), get the name of each facet value using QueryResponse.getFacetFields().get(0).getValues().get(i).getName(), and the corresponding count using QueryResponse.getFacetFields().get(0).getValues().get(i).getCount().
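Putting the whole flow together, a compact sketch (assuming solr is your configured SolrJ client; field names as in the question):
SolrQuery solrQuery = new SolrQuery("*:*");
solrQuery.addFilterQuery("user-bio:(Apple OR Google OR Facebook)");
solrQuery.setFacet(true);
solrQuery.addFacetField("country");

QueryResponse resp = solr.query(solrQuery);
for (FacetField ff : resp.getFacetFields()) {
    for (FacetField.Count c : ff.getValues()) {
        System.out.println(c.getName() + ": " + c.getCount()); // one line per country with its count
    }
}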

How do I remove redundant tuples in microarray data using Java?

In WEKA, a data mining tool for microarray data, how can I remove the redundant tuples from an existing data set? The code to remove the redundancy should be in Java.
E.g., the data set contains data such as
H,A,X,1,3,1,1,1,1,1,0,0,0
D,R,O,1,3,1,1,2,1,1,0,0,0
H,A,X,1,3,1,1,1,1,1,0,0,0
C,S,O,1,3,1,1,2,1,1,0,0,0
H,A,X,1,3,1,1,1,1,1,0,0,0
Here tuples 1, 3 and 5 are identical, so two of them are redundant.
The code should return the following redundancy-removed data set:
H,A,X,1,3,1,1,1,1,1,0,0,0
D,R,O,1,3,1,1,2,1,1,0,0,0
C,S,O,1,3,1,1,2,1,1,0,0,0
You could use one of the classes that implement the Set interface, such as java.util.HashSet. Note that this only deduplicates correctly if your Tuple class overrides equals() and hashCode().
You can load your data set into the Set and then extract the unique tuples either by converting to an array via the Set.toArray() method or by iterating over the set.
Set<Tuple> tupleSet = new HashSet<Tuple>();
for (Tuple tuple : tupleList) {
    tupleSet.add(tuple);
}
// now all of your tuples are unique
for (Tuple tuple : tupleSet) {
    System.out.println("tuple: " + tuple);
}
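Since the rows in the example are plain comma-separated lines, a minimal self-contained sketch can deduplicate them as strings; a LinkedHashSet keeps the first-seen order (the hard-coded rows stand in for reading the real file):
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

public class DedupeRows {
    public static void main(String[] args) {
        String[] rows = {
                "H,A,X,1,3,1,1,1,1,1,0,0,0",
                "D,R,O,1,3,1,1,2,1,1,0,0,0",
                "H,A,X,1,3,1,1,1,1,1,0,0,0",
                "C,S,O,1,3,1,1,2,1,1,0,0,0",
                "H,A,X,1,3,1,1,1,1,1,0,0,0"
        };
        // LinkedHashSet drops duplicates while preserving first-seen order
        Set<String> unique = new LinkedHashSet<>(Arrays.asList(rows));
        unique.forEach(System.out::println); // prints the three distinct rows from the question
    }
}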
