Solr MultiCore Search - java

I am using Apache Solr to provide personal, per-user search: each user has a separate physical Lucene index. So for 10 users, I have 10 separate physical indexes on disk.
To support searches on these indexes, I am planning to use Solr's multicore feature. From the various articles I have been reading about it, it looks like this would work.
What I am not sure about is this: when Solr gets a query, instead of sending it to all the cores, how do I funnel the query to the core that holds that particular user's index? Is this a config change, or do I need code-level changes?
That is, I want to send the query to only one Solr core (based on user ID). Is this even possible?
UPDATE: So according to one of the solutions, I can add the cores in solr.xml, i.e. at the time of starting Solr I'll need to list the cores (or in my case, the users). So now, if I want to add a new user's index, I'll probably need to stop Solr, edit its config, add that user's core, and start Solr again. Is there any way to dynamically add cores to a running Solr instance?

Solr cores are essentially multiple indices run in the same context on an application server. You can think of it as installing one war file for each user. Each core is identified by its name, so you must keep track yourself of which URL is valid for which user.
E.g.,
http://host.com/solr/usercore1/select?q=test
http://host.com/solr/usercore2/select?q=test
Which is based on the config solr.xml:
<solr persistent="true" sharedLib="lib">
  <cores adminPath="/admin/cores">
    <core name="usercore1" instanceDir="usercore1" />
    <core name="usercore2" instanceDir="usercore2" />
  </cores>
</solr>
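Because each user maps to exactly one core, routing can be done entirely on the client side by building the right URL. A minimal sketch, assuming cores follow the usercore<N> naming from the solr.xml above; the CoreRouter class and its methods are hypothetical helpers, not part of Solr:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: maps a user id to the select URL of that user's core.
// Assumes cores are named "usercore" + userId, as in the solr.xml above.
class CoreRouter {
    private final String solrBaseUrl; // e.g. "http://host.com/solr"

    CoreRouter(String solrBaseUrl) {
        this.solrBaseUrl = solrBaseUrl;
    }

    // Naming convention is an assumption; any stable user-to-core mapping works.
    String coreNameFor(int userId) {
        return "usercore" + userId;
    }

    // Builds the select URL for one user's core with a URL-encoded query.
    String selectUrl(int userId, String query) {
        String q = URLEncoder.encode(query, StandardCharsets.UTF_8);
        return solrBaseUrl + "/" + coreNameFor(userId) + "/select?q=" + q;
    }

    public static void main(String[] args) {
        CoreRouter router = new CoreRouter("http://host.com/solr");
        System.out.println(router.selectUrl(1, "test"));
    }
}
```

The query only ever reaches the core named in the URL, so no other user's index is touched.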
...instead of sending the query to all the multiple-cores...
This approach is called sharding and is based on distributed search, which is a completely separate feature that focuses on splitting one user's index over multiple Solr instances.
[EDIT]
One approach to creating new cores is through SolrJ, which provides the routine CoreAdminRequest.createCore(..). You could also do this with a manual HTTP request: /admin/cores?action=CREATE&name=usercore3...
Solr can also reload its config dynamically, so if you had a script in place that edited the cores config, those changes should be picked up too.
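The manual HTTP request mentioned above can be sketched in Java with the JDK's own HTTP client. This is only an illustration: the CoreAdminClient class is hypothetical, the admin path is taken from the adminPath in the solr.xml shown earlier, and using the core name as instanceDir is an assumption.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Hypothetical helper that asks a running Solr to create a new core.
class CoreAdminClient {

    // Builds the CoreAdmin CREATE URL; instanceDir == core name is an assumption.
    static String createCoreUrl(String solrBaseUrl, String coreName) {
        String encoded = URLEncoder.encode(coreName, StandardCharsets.UTF_8);
        return solrBaseUrl + "/admin/cores?action=CREATE"
                + "&name=" + encoded
                + "&instanceDir=" + encoded;
    }

    // Sends the request and returns the HTTP status code reported by Solr.
    static int createCore(String solrBaseUrl, String coreName)
            throws IOException, InterruptedException {
        HttpRequest request = HttpRequest
                .newBuilder(URI.create(createCoreUrl(solrBaseUrl, coreName)))
                .GET()
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        return response.statusCode();
    }

    public static void main(String[] args) {
        // Only prints the URL; call createCore(...) against a live Solr to execute it.
        System.out.println(createCoreUrl("http://localhost:8983/solr", "usercore3"));
    }
}
```

The instance directory with its conf folder has to exist on the Solr host before the request is sent.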

You can combine multicore with sharding via the following URL:
http://localhost:8983/solr/core0/select?shards=localhost:8983/solr/core0,localhost:8983/solr/core1&q=*:*

I'm using SolrJ.
First, creating cores. I found two ways.
First way:
// Create the core directly on the shared CoreContainer, then register it.
SolrCore solrCore = coreContainer.create(
        new CoreDescriptor(coreContainer, coreName, "."));
coreContainer.register(solrCore, true);
Second way:
// Send a CoreAdmin CREATE request through the regular query API.
SolrQuery solrQuery = new SolrQuery();
solrQuery.setParam(CommonParams.QT, "/admin/cores");
solrQuery.setParam(CoreAdminParams.ACTION, CoreAdminParams.CoreAdminAction.CREATE.name());
solrQuery.setParam(CoreAdminParams.NAME, name);
solrQuery.setParam(CoreAdminParams.INSTANCE_DIR, "./" + name);
solrQuery.setParam(CoreAdminParams.CONFIG, solrHomeRelativePath + solrConfigHomeRelativePath);
solrQuery.setParam(CoreAdminParams.SCHEMA, solrHomeRelativePath + solrSchemaHomeRelativePath);
solrQuery.setParam(CoreAdminParams.DATA_DIR, ".");
solrServer.query(solrQuery);
To query a particular core, I just do:
SolrServer solrServer = new EmbeddedSolrServer(coreContainer, coreName);
and then perform my queries the way I usually do with SolrJ.
So in your case, you would simply look up the core name associated with the user making the search request. The coreContainer instance would be shared, but not the SolrServer instance.
By the way, I'm doing something similar to you!
See you.
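The "one shared container, one server object per core" idea above could be sketched as a small cache. This is an assumption about how one might organize it, not something SolrJ provides: CoreClientCache is a hypothetical class, the type parameter C stands in for SolrServer, and the factory would wrap new EmbeddedSolrServer(coreContainer, coreName).

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical cache holding one client per core name.
// C stands in for SolrServer so the sketch compiles without SolrJ.
class CoreClientCache<C> {
    private final Map<String, C> clients = new ConcurrentHashMap<>();
    private final Function<String, C> factory;

    // factory: creates a client for a core name, e.g.
    // name -> new EmbeddedSolrServer(coreContainer, name)
    CoreClientCache(Function<String, C> factory) {
        this.factory = factory;
    }

    // Returns the cached client for this core, creating it on first use.
    C forCore(String coreName) {
        return clients.computeIfAbsent(coreName, factory);
    }

    public static void main(String[] args) {
        CoreClientCache<String> cache = new CoreClientCache<>(name -> "client-for-" + name);
        System.out.println(cache.forCore("usercore1"));
    }
}
```

A search request would then resolve the user's core name first and call forCore(coreName) to get the matching client.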

When you use multicore, you create a separate conf folder for each core, with its own query configuration and schema, which together determine how results are fetched.
When you open the URL below,
http://{your localhost}:8983/solr
you will see the list of cores that you have created. For each core, you will have to build the index like this:
http://{your localhost}:8983/solr/{your_core_name1}/dataimport?command=full-import
http://{your localhost}:8983/solr/{your_core_name2}/dataimport?command=full-import
http://{your localhost}:8983/solr/{your_core_name3}/dataimport?command=full-import
After creating the indexes, you will have to refer to the core when performing a search, like this:
http://{your localhost}:8983/solr/{your_core_name3}/select/?q=*:*&start=0&rows=100

Related

CMIS query trying to retrieve folders/files under specific path returns no documents

Greetings to the community! I am using Alfresco Community Edition 6.0.0 and I just ran into a very weird problem. I am using the Java API to access my Alfresco repository by running CMIS queries. I successfully fetched documents using cmis-strict, as shown below:
Example 1)
select * from cmis:document WHERE cmis:name like '%doc%' AND cmis:objectId = 'e318a431-0ff4-4a4a-9537-394d2bd761af'
Example 2)
SELECT * FROM cmis:document WHERE IN_FOLDER('63958f9c-819f-40f4-bedf-4a2e402f8b9f') AND cmis:name like '%temp%'
Both work perfectly. What I would like to do is retrieve files/folders under a specific path (for example, fetch all folders under /app:company_home/app:user_homes).
To do that, I run the following cmis-strict query from the Alfresco node browser:
SELECT * FROM cmis:folder WHERE CONTAINS('PATH:"//app:company_home/app:user_homes//*"')
but even though folders exist under that path, nothing is returned. It seems that the PATH argument is not recognized as it should be, because when I run the query
SELECT * FROM cmis:folder
I get back many results whose parent is the app:company_home/app:user_homes node.
Any idea what the problem may be? Any help would be greatly appreciated, thanks :)
EDIT:
I have also tried a Lucene query like
PATH:"/app:company_home/app:user_homes//*"
but it returned no results either.
Your user homes CONTAINS query works for me in both 5.2 and 6.1.1.
I like @Lista's suggestion of checking your index. If that doesn't bear fruit, you could get the CMIS object ID of the user homes folder and use it with the IN_FOLDER clause you've already proven works.
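The IN_FOLDER fallback above can be sketched as a small query builder. The CmisFolderQueries class is hypothetical and the ID whitelist is an assumption about what node IDs look like. IN_FOLDER matches direct children only, while the standard CMIS IN_TREE predicate matches all descendants, which is closer to what the PATH query with //* was meant to do.

```java
import java.util.regex.Pattern;

// Hypothetical helper that builds folder queries from a known folder object ID.
class CmisFolderQueries {
    // Assumed whitelist: letters, digits and the punctuation seen in node IDs/refs.
    private static final Pattern SAFE_ID = Pattern.compile("[A-Za-z0-9:/;.-]+");

    // Folders that are direct children of the given folder.
    static String directChildren(String folderId) {
        return "SELECT * FROM cmis:folder WHERE IN_FOLDER('" + check(folderId) + "')";
    }

    // Folders anywhere below the given folder.
    static String allDescendants(String folderId) {
        return "SELECT * FROM cmis:folder WHERE IN_TREE('" + check(folderId) + "')";
    }

    // Rejects IDs that could break out of the quoted literal.
    private static String check(String folderId) {
        if (folderId == null || !SAFE_ID.matcher(folderId).matches()) {
            throw new IllegalArgumentException("unexpected folder id: " + folderId);
        }
        return folderId;
    }

    public static void main(String[] args) {
        // The ID below is the example ID from the question, not the user homes folder.
        System.out.println(allDescendants("63958f9c-819f-40f4-bedf-4a2e402f8b9f"));
    }
}
```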
I think both Lucene and CMIS queries (if using CONTAINS) end up on the index (not the database), so it's reasonable to assume something is off with the index itself. Have you tried rebuilding it? Are your nodes even in the index (there's a Solr admin console you can use to check this)?
https://docs.alfresco.com/6.0/concepts/query-lang-support.html

Retrieving full objects from a query done via Bolt protocol

In Neo4j, I want to use the Bolt protocol.
I installed version 3.1 of Neo4j.
In my Java project, which already works well with the normal HTTP REST API of Neo4j, I added the needed drivers with Maven and managed to perform requests over Bolt.
The problem is that every example you find when searching for Bolt looks like this one:
MATCH (a:Product) return a.name
But I don't want the name; I want all the data of every product, whether or not I know the columns in advance, like here:
MATCH (a:Product) return * --> here I retrieve only the ids of nodes
I found at https://github.com/neo4j-contrib/neo4j-jdbc/tree/master/neo4j-jdbc-bolt that we can "flatten" the result, but it does not seem to work, or I didn't understand how it works:
GraphDatabase.driver( "bolt://localhost:7687/?flatten=-1", AuthTokens.basic( "neo4j", "......." ) );
I put the ?flatten=-1 at the end of my connection address... but that changed nothing.
Can anyone help, or confirm that it's not possible or not working?
Thanks
OK, I understood my error: I didn't dig deep enough into the returned object. Being used to a JSON-formatted response, I didn't see that I have to look inside the StatementResult object to find the wanted object with its properties. In fact, the Eclipse "Expressions" view shows only the ids at first glance, but the data is there inside the object.
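// rs is the StatementResult; "m" is the name the node was returned under in the query.
Record oneRecord = rs.next();
String src = oneRecord.get("m").get("source").asString();
That way I can reconstruct my object.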

Java mysql using url patterns

I'm trying to build a Java HTTP server connected to a MySQL database, where one of my objectives is to NOT use a GUI.
Instead, I want to let the "user" insert data simply by entering the desired input as part of a URL.
For example:
In this URL, the user is inserting his ID number (123) and his name (JOE):
localhost/employee/add?id=123&name=JOE
Or a remove example, removing a row by ID only:
localhost/employee/remove?id=123
I searched the web for a code example for over 4 hours, into the early morning, with no luck :/
All I found was that this is called "URL patterns", which you configure via Java EE and NetBeans or similar platforms (that's all I've encountered so far),
but none of the tutorials/explanations demonstrate how to implement it by sending queries and configuring multiple URL patterns together.
Can someone explain or demonstrate this technique?
A thousand thank-yous in advance, Iguana.
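One possible shape for this, without Java EE, is the HTTP server that ships with the JDK plus plain JDBC. The following is only a sketch under stated assumptions: the EmployeeServer class, the JDBC URL, the credentials, and the employee(id, name) table are all hypothetical, and a MySQL JDBC driver must be on the classpath for the database calls to work.

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch only: connection details and table layout below are assumptions.
class EmployeeServer {
    static final String JDBC_URL = "jdbc:mysql://localhost:3306/company"; // hypothetical
    static final String DB_USER = "user";                                 // hypothetical
    static final String DB_PASSWORD = "secret";                           // hypothetical

    // Splits "id=123&name=JOE" into {id=123, name=JOE}, URL-decoding each part.
    static Map<String, String> parseQuery(String rawQuery) {
        Map<String, String> params = new LinkedHashMap<>();
        if (rawQuery == null || rawQuery.isEmpty()) {
            return params;
        }
        for (String pair : rawQuery.split("&")) {
            String[] kv = pair.split("=", 2);
            String key = URLDecoder.decode(kv[0], StandardCharsets.UTF_8);
            String value = kv.length > 1 ? URLDecoder.decode(kv[1], StandardCharsets.UTF_8) : "";
            params.put(key, value);
        }
        return params;
    }

    // Runs one parameterized statement; values from the URL never become SQL text.
    static int update(String sql, Object... args) throws SQLException {
        try (Connection con = DriverManager.getConnection(JDBC_URL, DB_USER, DB_PASSWORD);
             PreparedStatement ps = con.prepareStatement(sql)) {
            for (int i = 0; i < args.length; i++) {
                ps.setObject(i + 1, args[i]);
            }
            return ps.executeUpdate();
        }
    }

    static void respond(HttpExchange ex, int status, String body) throws IOException {
        byte[] bytes = body.getBytes(StandardCharsets.UTF_8);
        ex.sendResponseHeaders(status, bytes.length);
        try (OutputStream out = ex.getResponseBody()) {
            out.write(bytes);
        }
    }

    // Reads the named URL parameters and binds them to the statement in order.
    static void handle(HttpExchange ex, String sql, String... paramNames) throws IOException {
        Map<String, String> params = parseQuery(ex.getRequestURI().getRawQuery());
        Object[] args = new Object[paramNames.length];
        for (int i = 0; i < paramNames.length; i++) {
            args[i] = params.get(paramNames[i]);
            if (args[i] == null) {
                respond(ex, 400, "missing parameter: " + paramNames[i]);
                return;
            }
        }
        try {
            respond(ex, 200, "rows affected: " + update(sql, args));
        } catch (SQLException e) {
            respond(ex, 500, "database error");
        }
    }

    // One context per URL pattern; each maps to one SQL statement.
    static HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/employee/add",
                ex -> handle(ex, "INSERT INTO employee (id, name) VALUES (?, ?)", "id", "name"));
        server.createContext("/employee/remove",
                ex -> handle(ex, "DELETE FROM employee WHERE id = ?", "id"));
        server.start();
        return server;
    }
}
```

Calling EmployeeServer.start(8080) from a main method would make localhost:8080/employee/add?id=123&name=JOE insert a row. Changing data through GET URLs matches the question as asked, though a real service would normally use POST or DELETE for this.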

How can I configure my index to use BM25 in ElasticSearch using the JAVA API?

I'm trying to migrate from a MySQL database to Elasticsearch so I can use full-text search with BM25 similarity on each field. I'm using Java to fetch entries from MySQL and add them to the Elasticsearch index.
I'm building my index using the Java index API, but I can't figure out a way to set the BM25 similarity on my fields.
Consider a products table in MySQL, and dev as the index, with products as its index type.
The original table products contains the following fields :
id
title
description
You can find the code on my GitHub if you'd like to take a look.
It's a forked project that I've configured with Maven integration.
Any suggestion and any help is welcome, Thanks!
I found the answer for my question.
Here is the code :
Settings settings = ImmutableSettings
        .settingsBuilder()
        .put("cluster.name", "es_cluster_name")
        // Define similarity module settings
        .put("similarity.custom.type", "BM25")
        .put("similarity.custom.k1", 2.0f)
        .put("similarity.custom.b", 1.5f)
        .build();
Client client = new TransportClient(settings);
It seems that you can define the similarity modules you wish to use in the Settings before you instantiate your Client.
Here is the list of similarity modules currently supported by Elasticsearch:
default, BM25, DFR, IB, LMDirichlet and LMJelinekMercer. You can specify which one you want to use in the Settings like below:
.put("similarity.custom.type", "..." )
Each similarity has its own parameters which you would want to configure as well in order to use it properly.
Note: Code tested on elasticsearch 1.1.0.
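To make the k1 and b parameters above more concrete, here is a minimal sketch of a per-term BM25 score. It uses one common form of the formula; the idf variant is an assumption, Elasticsearch computes all of this internally, and the Bm25 class is purely illustrative. k1 controls how quickly repeated occurrences of a term stop adding to the score, and b (normally between 0 and 1) controls how strongly long documents are penalized.

```java
// Illustrative BM25 term score; not Elasticsearch's actual implementation.
class Bm25 {

    // Inverse document frequency: rarer terms get a higher weight.
    // docCount = documents in the index, docFreq = documents containing the term.
    static double idf(long docCount, long docFreq) {
        return Math.log(1.0 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
    }

    // tf = occurrences of the term in the document,
    // docLen / avgDocLen = document length relative to the average length.
    static double termScore(double tf, double docLen, double avgDocLen,
                            long docCount, long docFreq, double k1, double b) {
        // Length normalization: b = 0 disables it, b = 1 applies it fully.
        double norm = k1 * (1.0 - b + b * docLen / avgDocLen);
        // Term-frequency saturation: the fraction approaches (k1 + 1) as tf grows.
        return idf(docCount, docFreq) * tf * (k1 + 1.0) / (tf + norm);
    }

    public static void main(String[] args) {
        // Same term frequency, short document versus long document.
        System.out.println(termScore(2, 10, 50, 1000, 10, 1.2, 0.75));
        System.out.println(termScore(2, 100, 50, 1000, 10, 1.2, 0.75));
    }
}
```

With the same term frequency, the shorter document scores higher whenever b is greater than 0.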

Transform Cassandra query result to POJO with Astyanax

I am working on a Spring web application that uses Cassandra with the Astyanax client. I want to transform the result data retrieved from Cassandra queries into a POJO, but I do not know which library or Astyanax API supports this.
For example, I have a User column family (CF) with some basic properties (username, password, email), and other related additional information can be added to this CF. I then fetch one User row from that CF, using OperationResult<ColumnList<String>> to hold the returned data, like this:
OperationResult<ColumnList<String>> columns = getKeyspace().prepareQuery(getColumnFamily()).getRow(rowKey).execute();
What I want to do next is populate my User object from "columns". Here I have two problems; could you please help me solve them:
1/ What is the best structure for the User class to hold the corresponding data retrieved from the User CF? My suggestion is:
public class User {
String userName, password, email; // Basic properties
Map<String, Object> additionalInfo;
}
2/ How can I transform the Cassandra data into this POJO using a generic method (so that it can be applied to every CF that has a mapped POJO)?
I am sorry if some of my questions are naive; I only started with NoSQL concepts, Cassandra and Astyanax two weeks ago.
Thank you so much for your help.
You can try Achilles: https://github.com/doanduyhai/achilles, a JPA-compliant Entity Manager for Cassandra.
Right now there is a complete implementation using the Thrift API via Hector.
The CQL3 implementation using the Datastax Java Driver is in progress. A beta version will be available in a few months (July-August 2013).
CQL3 is great but it's still too low level because you need to extract the data yourself from the ResultSet. It's like coming back to the time when only JDBC Template was available.
Achilles is there to fill the gap.
I would suggest using a library like Playorm, with which you can easily perform CRUD operations on your entities. See this for an example of how you can create a User object; you can then get the POJO easily with
User user1 = mgr.find(User.class, email);
Assuming that email is your NoSqlId(Primary key or row key in Cassandra).
I use com.netflix.astyanax.mapping.Mapping and com.netflix.astyanax.mapping.MappingCache for exactly this purpose.
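For the generic method asked about in 2/, a reflection-based mapper is one option. The sketch below is not Astyanax's mapping API: ColumnMapper is a hypothetical class, a plain Map of column name to value stands in for the ColumnList returned by Astyanax, and the rule that unmatched columns go into a field named additionalInfo is an assumption based on the User class proposed in the question.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Field;
import java.util.HashMap;
import java.util.Map;

// The POJO from the question, with the catch-all map for extra columns.
class User {
    String userName, password, email; // basic properties
    Map<String, Object> additionalInfo = new HashMap<>();
}

// Hypothetical generic mapper; a Map stands in for an Astyanax ColumnList.
class ColumnMapper {

    // Copies each column into the field of the same name. Columns without a
    // matching field are collected into "additionalInfo" if the class has one.
    static <T> T map(Map<String, Object> columns, Class<T> type) {
        try {
            Constructor<T> ctor = type.getDeclaredConstructor();
            ctor.setAccessible(true);
            T target = ctor.newInstance();
            Map<String, Object> extras = new HashMap<>();
            for (Map.Entry<String, Object> column : columns.entrySet()) {
                Field field = findField(type, column.getKey());
                if (field != null) {
                    field.set(target, column.getValue());
                } else {
                    extras.put(column.getKey(), column.getValue());
                }
            }
            Field extraField = findField(type, "additionalInfo");
            if (extraField != null) {
                extraField.set(target, extras);
            }
            return target;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot map columns to " + type.getName(), e);
        }
    }

    // Returns the accessible declared field, or null if the class has none by that name.
    private static Field findField(Class<?> type, String name) {
        try {
            Field field = type.getDeclaredField(name);
            field.setAccessible(true);
            return field;
        } catch (NoSuchFieldException e) {
            return null;
        }
    }
}
```

With Astyanax, the Map would first be filled by iterating over the returned columns, and the same map(...) call then works for any CF that has a matching POJO.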
