Apache Solr slow search - java

I am struggling with Solr to build a better search than the current implementation in my code.
The current code looks up data in some caches/hashmaps, and what I want to do is optimize the query response time.
I have already indexed two versions of documents (simple documents that do not contain other objects inside them, only strings and ints), and everything works great.
But now I'm facing another problem while trying to index a separate core for a more complex bean.
I have a bean like:
public class Person {
    String name;
    String surname;
    List<Adresse> adress;
    List<Stuff> stuff;
    List<HashMap<String, String>> otherStuff;
}
Solr only mapped the simple lists and the list of maps for me, so I manually mapped the remaining members (the other lists) by converting each object to a list of strings and, on the way back, converting each string to an object and setting the value on the resulting bean.
But this approach causes really slow response times for my queries.
I am also facing another problem: execution time gets very slow once I receive more than 10 documents from the index.
Can you please help me with suggestions/ideas on how to make all of this faster?

If you have a very complex structure, you might be better off not trying to get it back from Solr. Instead, have the field definitions with stored=false and just get back IDs. Then, round-trip to your original source to get the actual objects.
That way, Solr becomes just the way to search, and you can skip sending it any fields that you are not searching against.
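A minimal SolrJ sketch of that pattern, assuming SolrJ 6+, a hypothetical personId field, and a loadPersons(...) lookup against your original data source (the core name, field, and method are illustrative, not taken from the question):

import java.util.List;
import java.util.stream.Collectors;

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class PersonSearch {

    public List<Person> search(String userQuery) throws Exception {
        SolrClient solr = new HttpSolrClient.Builder("http://localhost:8983/solr/persons").build();

        SolrQuery query = new SolrQuery(userQuery);
        // Only ask Solr for the id; every other field can be indexed=true, stored=false.
        query.setFields("personId");

        QueryResponse response = solr.query(query);
        List<String> ids = response.getResults().stream()
                .map(doc -> (String) doc.getFieldValue("personId"))
                .collect(Collectors.toList());

        // Round-trip to the original source (DB/caches) to rebuild the full Person beans.
        return loadPersons(ids);
    }

    private List<Person> loadPersons(List<String> ids) {
        // Hypothetical lookup against your existing caches/hashmaps or database.
        throw new UnsupportedOperationException("load from your own data source");
    }
}

The gain is that Solr only returns tiny documents, and the expensive object reconstruction stays in the code you already have for your data source.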

Related

Mapping neo4j ogm query results to java objects

I'm collecting info from a Neo4j DB, but the values I return are picked from multiple nodes, so what I'm basically returning is a table with some properties. For this example, let's say I return the properties color:String, name:String, count:String. I query these results using session.query(*QUERY*, queryParams).
Now, when I get the results, I want to map them onto an existing Java object that I created to hold this data. This is kind of different from the 'normal' mapping, where you generally want to map your graph nodes to objects that represent those nodes. Here, my POJOs have nothing to do with the graph nodes.
I managed to do this using custom CompositeAttributeConverter classes for each of my data-objects, but I feel there must be a better solution than writing a new class for every new object.
You might want to take a look at executing arbitrary Cypher queries using the Session object. You can get an Iterable<Map<String,Object>> from the returned Result object, which you can iterate over or just collect into a collection of Map results.
Or, if you have APOC Procedures installed, you can always write up a query to return your results as a JSON string, and convert that to JSON objects in Java with the appropriate library and use those as needed.
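A rough sketch of that generic-map approach, assuming the Neo4j OGM Session.query(String, Map) API; the Cypher, the {name} parameter style (newer servers use $name), and the ColorRow POJO with its (color, name, count) constructor are all hypothetical:

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.neo4j.ogm.model.Result;
import org.neo4j.ogm.session.Session;

public class ColorRowMapper {

    public List<ColorRow> fetch(Session session, String name) {
        Map<String, Object> params = new HashMap<>();
        params.put("name", name);

        // Return plain columns rather than nodes, then map each row by hand.
        Result result = session.query(
                "MATCH (c:Color)-[:OF]->(t:Thing) WHERE t.name = {name} "
                        + "RETURN c.color AS color, t.name AS name, count(*) AS count",
                params);

        List<ColorRow> rows = new ArrayList<>();
        for (Map<String, Object> row : result) {
            rows.add(new ColorRow(
                    (String) row.get("color"),
                    (String) row.get("name"),
                    String.valueOf(row.get("count"))));
        }
        return rows;
    }
}

This avoids writing one CompositeAttributeConverter class per result shape; the cost is a few lines of manual mapping per query instead.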

Change Field name in elasticsearch response

I need to change field names in the Elasticsearch response (e.g. change "title" to "header"). I want to avoid parsing the JSON response, which takes too much time.
Is there any way to do that?
I'm afraid this might not be available in Elasticsearch; you might have to parse the response. Consider:
Aliasing
One of the things introduced in Apache Solr 4.0 and not available in ElasticSearch right now is the ability to transform result documents. First of all Solr allows you to alias returned fields, so for example you can return field price_usd or price_eur as price depending on your needs. The second thing is the ability to return values returned by functions as a (pseudo) field in the result (or fields). Solr also has the ability to return fields which start with a given prefix (for example all fields starting with price). Apart from the ability to get a function value as a field added to matched documents on the fly other functionalities are not ground breaking, though they can be handy in some cases.
from http://blog.sematext.com/2012/10/01/solr-vs-elasticsearch-part-3-searching/
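For comparison, the Solr-side aliasing mentioned above is just an fl parameter of the form alias:field. A minimal SolrJ sketch, assuming Solr 4.0+ (the core name "articles" and the fields are illustrative):

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class FieldAliasExample {
    public static void main(String[] args) throws Exception {
        SolrClient solr = new HttpSolrClient.Builder("http://localhost:8983/solr/articles").build();

        SolrQuery query = new SolrQuery("title:lucene");
        // fl=header:title returns the stored "title" field under the name "header".
        query.setFields("id", "header:title");

        QueryResponse response = solr.query(query);
        response.getResults().forEach(doc -> System.out.println(doc.getFieldValue("header")));
    }
}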

Spring Data Neo4j and queries

I'm trying to write a query that returns a fairly large amount of data (200ish nodes). The nodes are very simple:
public class MyPojo {
    private String type;
    private String value;
    private Long createdDate;
    ...
}
I originally used the Spring Data Neo4j template interface, but found that it was very slow after around 100 nodes were returned.
public interface MyPojoRepository extends GraphRepository<MyPojo> {
    public List<MyPojo> findByType(String type);
}
I turned on debugging to see why it was so slow, and it turned out SDN was making an extra query for each node's labels. This made sense: as I understand it, SDN needs the labels to do its duck-typing. However, Cypher returns all the pertinent data in one go, so there should be no need for this.
So, I tried rewriting it as a Cypher query instead:
public interface MyPojoRepository extends GraphRepository<MyPojo> {
    @Query("MATCH (n:MyPojo) WHERE n.type = {0} RETURN n")
    public List<MyPojo> findByType(String type);
}
This had the same problem. I dug a little deeper, and while this query returned all node data in one go, it leaves out the labels. There is a way to get them, which works in the Neo4j console so I tried it with SDN:
"MATCH(n:MyPojo) WHERE n.type = {0} RETURN n, labels(n)"
Unfortunately, this caused an exception about having multiple columns. After looking through the source code, this makes sense because Neo4j returns a map of returned columns which in my case looked like: n, labels(n). SDN would have no way of knowing that there was a label column to read.
So, my question is this: is there a way to provide labels as part of the Cypher query to prevent needing to query again for each node? Or, barring that, is there a way to feed SDN a Node containing labels and properties and convert it to a POJO?
Note: I realize that the SDN team is working on using Cypher completely under the hood in a future release. As of now, a lot of the codebase uses the old (and, I believe, deprecated) REST services. If there is any future work going on that would affect this, I would be overjoyed to hear about it.
You're right, it would be solvable for the simple use case, and it should also be solved.
Unfortunately the current APIs don't return the labels as part of the node, so we would have to rewrite the inner workings to generate the extra meta-information and return all of it correctly.
One idea is to use RETURN {id: id(n), labels: labels(n), data: n} AS n for the full representation.
The problem is that this breaks with user-defined queries.
Not sure when and how we can schedule that work. Feel free to raise it as a JIRA issue, or watch/upvote any related issues.
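Until something like that lands, one manual workaround is to bypass the repository for this query and map the rows yourself. A rough sketch, assuming the SDN 3.x Neo4jTemplate.query(String, Map) API (check the exact signature for your version) and a hypothetical MyPojo.fromProperties(...) factory:

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

import org.springframework.data.neo4j.support.Neo4jTemplate;

public class MyPojoFinder {

    private final Neo4jTemplate template;

    public MyPojoFinder(Neo4jTemplate template) {
        this.template = template;
    }

    public List<MyPojo> findByType(String type) {
        // Return plain columns (including labels if you need them) so no per-node label query is issued.
        String cypher = "MATCH (n:MyPojo) WHERE n.type = {type} "
                + "RETURN id(n) AS id, labels(n) AS labels, "
                + "n.type AS type, n.value AS value, n.createdDate AS createdDate";

        List<MyPojo> result = new ArrayList<>();
        for (Map<String, Object> row : template.query(cypher, Collections.singletonMap("type", (Object) type))) {
            // Hypothetical factory that builds the POJO straight from the returned columns.
            result.add(MyPojo.fromProperties(row));
        }
        return result;
    }
}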

In MongoDB: longer property names make bigger documents / use more memory?

I use Mongo with its Java driver and Morphia. I am mapping this class:
public class Transaction {
    @Id
    private ObjectId id;
    private String transactionUniqueIdentifier;
}
I see in the console that Mongo saves a transaction in a form like {transactionUniqueIdentifier: "xjeer"}
Does it mean I should use shorter property names ("uuid" instead of "transactionUniqueIdentifier") to get a smaller database? Or is there a setting in Mongo which would deal with that for me (create shorter names internally...).
Any pointer would be appreciated, thx.
There is no internal mapping of field names within MongoDB.
Whether or not to create shorter names depends upon numerous things, including usage of the document in Map Reduce, overall size of the document and hardware on your servers.
For example, if your document is easier to handle in MapReduce using transactionUniqueIdentifier rather than uuid, the document is quite small (say, about 5KB each), and you have SSDs (probably not even needed), then shrinking the field names becomes almost useless.
Some would argue that it isn't, but real-world usage dictates that you have bigger things to worry about.
However, if you had a lot of fields with long names, or names even longer than transactionUniqueIdentifier, then you might want to look into shortening them; otherwise you could spend most of your time loading a document's field names from disk instead of its actual values (since the field names would be bigger than the total size of the document's values).
There are plans to compress the field names however, as of yet, other features have taken priority.
With Morphia, the document property names match the Java field names by default. You can control the name each field is serialized with in MongoDB by using the @Property annotation and providing whatever name you'd like.
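A minimal sketch of that, assuming the org.mongodb.morphia package layout of that era (older and newer releases use com.google.code.morphia or dev.morphia instead); the short name "tuid" is just an example:

import org.bson.types.ObjectId;
import org.mongodb.morphia.annotations.Entity;
import org.mongodb.morphia.annotations.Id;
import org.mongodb.morphia.annotations.Property;

@Entity("transactions")
public class Transaction {

    @Id
    private ObjectId id;

    // Stored in MongoDB as {"tuid": "..."} while the Java field keeps its long, readable name.
    @Property("tuid")
    private String transactionUniqueIdentifier;
}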

How to search across multiple fields in Lucene using Query Syntax?

I'm searching a lucene index and I'm building search queries like
field1:"hello" AND field2:"world"
but I'd like to search for a value in any field as well as the values in specific fields in the same query i.e.
field1:"hello" AND anyField:"world"
Can anyone tell me how I can search across all indexed fields in this way?
Based on the answers I got for this question: Impact of repeat value across multiple fields in Lucene...
I can put the same search term into multiple fields, and so I created an "all" field that everything goes into. This way I can write a query like...
field1:"hello" AND all:"world"
This seems to work very nicely, prevents the need for huge search queries, and apparently the performance impact is minimal.
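A small indexing-time sketch of that approach, assuming a reasonably recent Lucene Document/TextField API (field names are illustrative):

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;

public class AllFieldIndexer {

    // Adds each value under its own field and also under a catch-all "all" field.
    static void addDocument(IndexWriter writer, String field1, String field2) throws Exception {
        Document doc = new Document();
        doc.add(new TextField("field1", field1, Field.Store.YES));
        doc.add(new TextField("field2", field2, Field.Store.YES));
        doc.add(new TextField("all", field1, Field.Store.NO));
        doc.add(new TextField("all", field2, Field.Store.NO));
        writer.addDocument(doc);
    }
}

Queries such as field1:"hello" AND all:"world" from the question then work with the standard query parser, at the cost of some extra index size.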
To search multiple fields you use a Boolean (OR) query with a clause for each field. MultiFieldQueryParser will do that as well, but the fields still need to be enumerated. There is no implicit "all fields" option, but IndexReader.getFieldNames can retrieve them.
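For completeness, a sketch of the MultiFieldQueryParser route; the package and constructor assume Lucene 5+ (older releases take a Version argument and use the org.apache.lucene.queryParser package):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryparser.classic.MultiFieldQueryParser;
import org.apache.lucene.search.Query;

public class MultiFieldExample {
    public static void main(String[] args) throws Exception {
        String[] fields = {"field1", "field2", "field3"};
        MultiFieldQueryParser parser = new MultiFieldQueryParser(fields, new StandardAnalyzer());

        // Expands roughly to (field1:world OR field2:world OR field3:world).
        Query query = parser.parse("world");
        System.out.println(query);
    }
}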
This might not apply to you, but in Azure Search, which is based on Lucene, using Lucene syntax, I use this:
name:plywood^100 OR plywood
Results with "plywood" in the "name" field are boosted.
