I posted this in the Nabble group also, but figured I might get some advice here as well.
Is there a way to get Solr to search whatever index I tell it to at search time, without using multiple cores?
I don't build my indexes with Solr; I build them with my own Java class, but I do use Solr to search them later. It would be nice to tell Solr at search time which index to access.
I have tried combining them as well, and this works, but there are a few issues in my particular case that would be easier to solve by sending the index name/path at search time.
Thanks
I don't think you can really do what you are looking for here. Part of the simplicity of Solr comes from having the core (and therefore the index) in the URL. What you could do is hack how Solr works to add another parameter to the URL, and then when Solr goes to do a search, use that parameter to determine which index it uses. I think you might end up throwing out all the auto-warming of caches etc. though.
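To make the first point concrete, here is a hedged SolrJ sketch (assuming SolrJ 6+, a local Solr on the default port, and a core actually named "core1") showing that the index to search is fixed by the URL when the client is built, not chosen per query:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class PerCoreQuery {
    public static void main(String[] args) throws Exception {
        // The core name ("core1") is part of the base URL, so there is no
        // standard "which index" parameter to pass at search time.
        HttpSolrClient solr =
            new HttpSolrClient.Builder("http://localhost:8983/solr/core1").build();
        QueryResponse rsp = solr.query(new SolrQuery("*:*"));
        System.out.println("hits: " + rsp.getResults().getNumFound());
        solr.close();
    }
}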
Out of curiosity, why do you NOT want to use multiple cores? Is it that you expect to have thousands and thousands, or that each index is incredibly transient?
Eric
You might want to look into Lucid Works' Enterprise Solr release.
http://www.lucidimagination.com/products/lucidworks-search-platform/enterprise
They implement Collections in a way that is similar to your use case.
Related
My current search engine involves two desktop applications based on Lucene (Java): one dedicated to indexing internal documents, the other to searching.
Now I have been asked to offer the search engine as a web page, so my first thought was to use Solr. I read the manual (https://lucene.apache.org/solr/guide/7_4/overview-of-searching-in-solr.html), but then I realized that during the indexing phase we have special processing for PDFs. For example, we detect whether a PDF originates from a scanned document, or we limit the number of pages that will be OCRed in a scanned PDF, since only the first pages are valuable for search. For now everything works via calls to the Lucene API, in classes with lots of ifs!
So my question is: should I use SolrJ to customize the indexing to our needs, should I keep the current indexing part and only use Solr(J) for searching, or should I override some Solr classes to meet our needs and avoid reinventing the wheel? For the latter (overriding Solr classes), how should I go about it?
Thank you very much in advance for your advice.
While this is rather opinion-based, I'll offer my opinion. All your suggested solutions would work, but the best one is to write the indexing code as a separate process, external to Solr (i.e. reuse your existing code that pushes data to a Lucene index directly today).
Take the tool you have today and, instead of writing data to a Lucene index, use SolrJ to submit the documents to Solr. That abstracts away the Lucene part of the code you're using today, but still allows you to process PDFs in your custom way. Keeping the code outside of Solr will also make it far easier to upgrade Solr in the future, or to switch to a newer version of the PDF library you're using for parsing, without having to coordinate and integrate it into Solr. A hedged sketch of the submission step is below.
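As a rough illustration (SolrJ 6+ assumed; the collection name "docs" and the field names are placeholders, and your real schema will differ):

import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class PdfIndexer {
    public static void main(String[] args) throws Exception {
        HttpSolrClient solr =
            new HttpSolrClient.Builder("http://localhost:8983/solr/docs").build();

        // Your existing pipeline runs first: detect scanned PDFs, OCR only
        // the first pages, etc., and hands over plain text.
        String extractedText = "...text produced by your PDF pipeline...";

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "report-42");        // placeholder field names
        doc.addField("content", extractedText);
        solr.add(doc);
        solr.commit();
        solr.close();
    }
}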
It'll also allow you to run the indexing code completely separately from Solr, and if you decide to drop Solr for another HTTP-interfaced technology in the future (for example Elasticsearch, which is also based on Lucene), you can rip out the small-ish part that pushes content to Solr and push it to Elasticsearch instead.
Running multiple indexing processes in parallel is also easier when as much of the indexing code as possible lives outside of Solr, since Solr will then only be concerned with the actual text and won't have to spend time processing and parsing PDFs when it should be responding to user queries (and your updates) instead.
I am using Lucene 3.6 and I am trying to implement FieldCache. I have seen some posts but did not get any clear idea. Can anyone please suggest a link where I can find a proper example of FieldCache and how to use it while searching?
You don't typically use it directly; it is an internal API used by Lucene.
If you are extending Lucene's search API and need to use it, you will need to provide more details.
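For what it's worth, here is a hedged sketch of the kind of indirect use that comes up when extending the search API in Lucene 3.6: a custom Collector that pulls a per-document field value out of the cache. The "category" field is a made-up example, and the field needs to be indexed un-tokenized for this to be meaningful.

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.Collector;
import org.apache.lucene.search.FieldCache;
import org.apache.lucene.search.Scorer;

// Collects the "category" value of every matching document via FieldCache.
public class CategoryCollector extends Collector {
    private final List<String> hits = new ArrayList<String>();
    private String[] categories; // un-inverted values for the current segment

    @Override
    public void setScorer(Scorer scorer) {
        // scores are not needed for this example
    }

    @Override
    public void setNextReader(IndexReader reader, int docBase) throws IOException {
        // Loads (or reuses) the cached values for this segment.
        categories = FieldCache.DEFAULT.getStrings(reader, "category");
    }

    @Override
    public void collect(int doc) {
        hits.add(categories[doc]);
    }

    @Override
    public boolean acceptsDocsOutOfOrder() {
        return true;
    }

    public List<String> getHits() {
        return hits;
    }
}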
I have an OpenSearch description and am supposed to send my queries with it programmatically. Unfortunately, I'm really new to all of this. Are there any examples out there using Lucene and Java?
Thanks!
I don't know anything about OpenSearch, but I can give you some hints about Lucene. To search with Lucene, you first have to build a Lucene index of your data (text files, XML files, a database, etc.).
There are lots of examples of how to use Lucene all over the internet. This looks like quite a good example of how to build a Lucene index.
It covers everything: indexing, searching, and displaying results (it is a very simple example). Because your question is not very specific, this is as much as I can help; if you run into a specific problem, I would be happy to help.
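In case the linked example goes away, here is a hedged, self-contained sketch of the same index-then-search cycle (written against a recent Lucene API, roughly 8.x; class names shift between major versions):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class HelloLucene {
    public static void main(String[] args) throws Exception {
        StandardAnalyzer analyzer = new StandardAnalyzer();
        Directory index = new ByteBuffersDirectory(); // in-memory, demo only

        // 1. Index a couple of documents.
        IndexWriter writer = new IndexWriter(index, new IndexWriterConfig(analyzer));
        for (String title : new String[] {"Lucene in Action", "Lucene for Dummies"}) {
            Document doc = new Document();
            doc.add(new TextField("title", title, Field.Store.YES));
            writer.addDocument(doc);
        }
        writer.close();

        // 2. Parse a query and search.
        Query q = new QueryParser("title", analyzer).parse("lucene");
        IndexSearcher searcher = new IndexSearcher(DirectoryReader.open(index));
        for (ScoreDoc hit : searcher.search(q, 10).scoreDocs) {
            System.out.println(searcher.doc(hit.doc).get("title"));
        }
    }
}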
I am building an application where I am going to need full-text search, so I found Compass, but that project is no longer maintained and has been replaced by Elasticsearch. However, I don't understand it. Is it its own server that I need to make requests (GET, PUT, etc.) against and then parse the JSON response? Are there no annotations like in Compass? I don't understand how this is a replacement, or how I would use it with Java EE.
Or are there other better projects to use?
Elasticsearch is a great choice nowadays; if you liked Compass you'll love it. Have a look at this answer, in which the author explains why he went on to create Elasticsearch after Compass. In fact, both Elasticsearch and Solr make the use of Lucene pretty easy, while adding some features on top of it. You basically get a whole search engine server that is able to index your data, which you can then query in order to retrieve the data you indexed.
Elasticsearch exposes RESTful APIs and is JSON based, but if you are looking for annotations in the Compass style, you can have a look at the Object Search Engine Mapper for ElasticSearch.
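To make the "RESTful and JSON based" part concrete, here is a hedged sketch using only the JDK's HTTP client (Java 11+; it assumes Elasticsearch is running locally on the default port 9200 and an index named "books"; in a real application you would use one of the official clients):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class EsHello {
    public static void main(String[] args) throws Exception {
        HttpClient http = HttpClient.newHttpClient();

        // Index a document: PUT /books/_doc/1 with a JSON body.
        HttpRequest put = HttpRequest.newBuilder()
            .uri(URI.create("http://localhost:9200/books/_doc/1"))
            .header("Content-Type", "application/json")
            .PUT(HttpRequest.BodyPublishers.ofString(
                "{\"title\":\"Lucene in Action\"}"))
            .build();
        System.out.println(http.send(put, HttpResponse.BodyHandlers.ofString()).body());

        // Query it back: GET /books/_search?q=... (a freshly indexed
        // document may need an index refresh before it shows up).
        HttpRequest get = HttpRequest.newBuilder()
            .uri(URI.create("http://localhost:9200/books/_search?q=title:lucene"))
            .build();
        System.out.println(http.send(get, HttpResponse.BodyHandlers.ofString()).body());
    }
}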
I would say give Lucene or Solr a try. They build an inverted index over your documents, which makes searching fast.
I would recommend either:
Elasticsearch, if the system is big and needs clustering;
Lucene or Solr, if you want to code at a low level;
Hibernate Search, if you are using Hibernate as your ORM.
We have recently installed a Google Search Appliance to power our internal search (via the Java API), and all seems to be well. However, I have a question regarding 'automatic' sitemap generation that I'm hoping you may know the answer to.
We are aware of the GSA's ability to auto-generate sitemaps for each of its collections; however, this process is rather manual, and considering that we have around 10 regional sites that need to be updated as often as possible, it's not ideal to have to log into the admin interface on a regular basis in order to export them to the site root where search engines can find them.
Unfortunately there doesn't seem to be any API support for this, at least none that I can find, so I was wondering if anyone had any ideas for a solution or workaround, or, failing that, the best alternative.
At present I'm thinking that if we can get the full index back from the API in the form of a list, we could write an XML file out the old-fashioned way using a cronjob or similar, but this seems like a bit of a clumsy solution. Any better ideas?
You could try the GSA Admin Toolkit, or simply write some code yourself that logs in on the administration page and then uses that session to invoke the sitemap export URL (which is basically what the Admin Toolkit does).
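If you do fall back to the cronjob idea from the question, the XML side of it is trivial. A hedged sketch follows; where the URL list comes from is the placeholder, and real code should XML-escape the URLs:

import java.io.PrintWriter;
import java.util.Arrays;
import java.util.List;

public class SitemapWriter {
    public static void main(String[] args) throws Exception {
        // Placeholder: in practice this list would come from the GSA API.
        List<String> urls = Arrays.asList(
            "http://example.com/", "http://example.com/about");

        try (PrintWriter out = new PrintWriter("sitemap.xml", "UTF-8")) {
            out.println("<?xml version=\"1.0\" encoding=\"UTF-8\"?>");
            out.println("<urlset xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">");
            for (String url : urls) {
                out.println("  <url><loc>" + url + "</loc></url>");
            }
            out.println("</urlset>");
        }
    }
}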