Import Data to Solr using Java

I am trying to upload data to a Solr server using Java.
Is it possible to create a collection and upload data directly from Java, or is there some other way to do so?
I found two options: DIH (DataImportHandler) and Tika.
Any advice will be helpful.

You can try the SolrJ API: https://wiki.apache.org/solr/Solrj. It can be used both to index documents into and to search against a Solr instance.

If you are running Solr in SolrCloud (ZooKeeper) mode, you can create a collection using SolrJ.
However, you must upload the configuration that SolrCloud will use before issuing the collection-creation command.
If you are using standalone mode, create the core manually.
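For the SolrCloud case, the collection can also be created from plain Java by calling the Collections API over HTTP. The sketch below only builds the request URL and assumes the configuration was already uploaded to ZooKeeper under the name passed as configName; the host URL, collection name, shard and replica counts, and config name are placeholders. action=CREATE, name, numShards, replicationFactor, and collection.configName are Collections API parameters.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Sketch: builds a Collections API CREATE URL. All argument values are placeholders.
final class CollectionCreateUrl {

    static String build(String solrBase, String name, int numShards,
                        int replicationFactor, String configName) {
        return solrBase + "/admin/collections?action=CREATE"
                + "&name=" + enc(name)
                + "&numShards=" + numShards
                + "&replicationFactor=" + replicationFactor
                // Must match a config set already uploaded to ZooKeeper.
                + "&collection.configName=" + enc(configName);
    }

    // URL-encodes a query-parameter value as UTF-8.
    private static String enc(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hypothetical collection "students" using a hypothetical config set "studentsConf".
        System.out.println(build("http://localhost:8983/solr", "students", 2, 1, "studentsConf"));
    }
}
```

Sending the request is then an ordinary HTTP GET against that URL.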
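Sample code to upload a document to the Solr server using SolrJ (SolrJ 4.x API; in SolrJ 5.0 and later, HttpSolrServer is replaced by HttpSolrClient):
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.UpdateResponse;
import org.apache.solr.common.SolrInputDocument;

// Replace CORE_NAME with your core; add() and commit() throw SolrServerException and IOException.
SolrServer server = new HttpSolrServer("http://localhost:8983/solr/CORE_NAME/");
SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", "1");
doc.addField("Name", "John");
doc.addField("RollNo", "101");
server.add(doc);
// Commit so the document becomes visible to searches.
UpdateResponse updateResponse = server.commit();
System.out.println(updateResponse.getStatus()); // 0 means success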
Make sure you have the following entries in schema.xml, which is in the conf folder of the core:
<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />
<field name="Name" type="text_general" indexed="true" stored="true"/>
<field name="RollNo" type="text_general" indexed="true" stored="true"/>
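If adding the SolrJ dependency is not an option, the same document can be sent as JSON to the core's update handler using only the JDK. This is a sketch assuming Solr 4.0 or later (whose /update handler accepts application/json); CORE_NAME is the same placeholder as above, and the hand-rolled JSON escaping handles only backslashes and double quotes, so it is not a general-purpose serializer.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: posts one flat document to <coreUrl>/update?commit=true as JSON.
final class SolrJsonUpdate {

    // Serializes a flat document as a one-element JSON array: [{"id":"1",...}].
    static String toJsonArray(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder("[{");
        boolean first = true;
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (!first) sb.append(',');
            first = false;
            sb.append('"').append(esc(e.getKey())).append("\":\"")
              .append(esc(e.getValue())).append('"');
        }
        return sb.append("}]").toString();
    }

    // Naive escaping: backslashes and double quotes only.
    private static String esc(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Sends the JSON body and returns the HTTP status code.
    static int post(String coreUrl, String json) throws IOException {
        HttpURLConnection conn = (HttpURLConnection)
                URI.create(coreUrl + "/update?commit=true").toURL().openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/json");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(json.getBytes(StandardCharsets.UTF_8));
        }
        int status = conn.getResponseCode();
        conn.disconnect();
        return status;
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("id", "1");
        doc.put("Name", "John");
        doc.put("RollNo", "101");
        // Requires a running Solr core at this placeholder URL.
        System.out.println(post("http://localhost:8983/solr/CORE_NAME", toJsonArray(doc)));
    }
}
```

The field names must still match the schema.xml entries above.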

Related

How to solve Invalid property value in Alfresco

I have introduced two new fields in Alfresco using the Model Manager; both fields are lists of values. After I run the application and try to edit a file that contains those fields, I get this:
The reason is not clear to me, because this is not the first time I have created new fields, and previously everything was OK.
If I add some values and save them, then it is OK. I suppose it is related to the database, but I am not sure.
This is the code:
<field id="adadoc:regions" set='regionsAndCountries'>
<control template="/org/alfresco/components/form/controls/select-many-regions.ftl">
<control-param name="mode">and</control-param>
<control-param name="style">width:325px</control-param>
</control>
</field>
<field id="adadoc:countries" set='regionsAndCountries' >
<control template="/org/alfresco/components/form/controls/select-many-countries.ftl">
<control-param name="mode">and</control-param>
<control-param name="style">width:325px</control-param>
</control>
</field>
What could be the problem?

HP ALM REST API Java - How to handle UserLists or any other type of List except LookUp?

I'm trying to update existing defects using REST in HP ALM. Updating LookUp lists is easy, and I succeeded in doing so. But there are UserLists-type lists that I can't update the same way I update LookUp lists; it always returns an internal server error, so there must be a different way to handle them. Has anyone managed to update UserLists with REST? If so, can you please help me?
I just checked UserLists for the defect entity on ALM 12.20.
I used the field 'Assigned To' in the ALM client. In REST, this field has a different name: 'owner'.
I performed the following REST API call:
Method: PUT
URL: http://{your_server}/qcbin/rest/domains/{your_domain}/projects/{your_proj}/defects/{your_defect_id}
Header: Content-Type=application/xml
Body:
<Entity Type="defect">
<Fields>
<Field Name="owner">
<Value>kevin</Value>
</Field>
</Fields>
</Entity>
And I got this response:
<Entity Type="defect">
<ChildrenCount>
<Value>0</Value>
</ChildrenCount>
<Fields>
<Field Name="owner">
<Value>kevin</Value>
</Field>
......
</Fields>
<RelatedEntities/>
</Entity>
So carefully double-check everything you send to the ALM server.
Also check that you have the appropriate permissions to change the values of the specific UserLists fields you are trying to update.
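The same PUT can be issued from Java with HttpURLConnection. This is only a sketch: it assumes you have already authenticated against ALM and hold the session cookies (passed here as one Cookie header string, obtained by a step not shown), and the defect URL is the placeholder URL from this answer. The body mirrors the XML above.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

// Sketch: updates one field (e.g. "owner") on an ALM defect via REST.
final class AlmDefectUpdate {

    // Builds the <Entity> body that sets a single field on a defect.
    static String fieldUpdateXml(String fieldName, String value) {
        return "<Entity Type=\"defect\"><Fields>"
                + "<Field Name=\"" + esc(fieldName) + "\"><Value>" + esc(value) + "</Value></Field>"
                + "</Fields></Entity>";
    }

    // Minimal XML escaping for text and attribute values.
    private static String esc(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    // PUTs the body to the defect URL; sessionCookies is assumed to come from
    // a prior ALM authentication step. Returns the HTTP status code.
    static int put(String defectUrl, String sessionCookies, String xml) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(defectUrl).toURL().openConnection();
        conn.setRequestMethod("PUT");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/xml");
        conn.setRequestProperty("Accept", "application/xml");
        conn.setRequestProperty("Cookie", sessionCookies);
        try (OutputStream out = conn.getOutputStream()) {
            out.write(xml.getBytes(StandardCharsets.UTF_8));
        }
        int status = conn.getResponseCode();
        conn.disconnect();
        return status;
    }

    public static void main(String[] args) {
        // Prints the request body; calling put() needs a live ALM server and session.
        System.out.println(fieldUpdateXml("owner", "kevin"));
    }
}
```

If the server still returns an internal server error, comparing this exact body against the field's REST name (here 'owner') is a good first check.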

How to properly outline an Apache Solr document?

What is the difference between delta import and full import in Apache Solr?
What naming conventions must be followed when writing deltaImportQuery and deltaQuery (ID, TXT_ID, etc.)? Are there any references or tutorials explaining in detail the differences and relationship between deltaImportQuery and deltaQuery, such as what each one is and what it does? How should deltaImportQuery and deltaQuery be written?
How do I configure multiple entities in one document? Suppose there are three tables in the database, T1, T2, and T3; how should this be configured in schema.xml, given that only one <uniqueKey>somename</uniqueKey> is considered per schema.xml file?
How do I parse BLOB-type input from MySQL? convert(column_name using utf8) as alias_name solves this, but what is the right convention? Other methods are also available, such as using TikaEntityProcessor or writing custom BLOB transformers.
As with ORM, are there any concepts explaining how to denormalize and outline an Apache Solr document, or any showcase project covering all use cases and features?
How do I configure entities like this in data-config.xml?
<dataConfig>
<dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost:3306/solrdemo" user="root" password="123" batchSize='1' />
<document name="user">
<entity name ="user1" query = "SELECT * FROM user1">
<field column='id' name='id1' />
</entity>
<entity name ="user2" query = "SELECT * FROM user2">
<field column='id' name='id2' />
</entity>
<entity name ="user3" query = "SELECT * FROM user3">
<field column='id' name='id3' />
</entity>
</document>
</dataConfig>
With the above configuration, which id should be configured as the <uniqueKey></uniqueKey> in schema.xml?
The result of the above configuration is:
Indexing completed. Added/Updated: 2,866 documents. Deleted 0 documents. (Duration: 03s)
Indexing completes successfully, but 0 documents are added/updated. How can I resolve this issue?
Overall, are there any references available for proper conventions and configurations when working with Apache Solr?

Solr dismax highlighting not respecting analyzer

In the schema of Solr 3.6.2 there are two field declarations, text and exact:
<field name="text" type="text" indexed="true" stored="true" />
<field name="exact" type="string" indexed="true" stored="true" />
The former uses StandardTokenizer and the latter KeywordTokenizer.
Solr queries describing the problem:
?hl=true
&hl.fl=text,exact
&defType=edismax
&qf=text+exact <-------- here
&q=a-b
Highlight output for field exact:
<em>a</em>-<em>b</em>.
The problem is that the summary for field exact is produced using the analyzer of text.
?hl=true
&hl.fl=text,exact
&defType=edismax
&qf=exact <-------- here
&q=a-b
Highlight output for field exact:
<em>a-b</em>.
Simply removing text from qf gives the correct analyzer. Why?
With debugQuery on:
+DisjunctionMaxQuery(((exact:a-b) | ((text:a text:b)~2)))
After finding a match in exact, the Solr highlighter also seems to highlight a and b in that field, simply because those terms are present in the query (from the text clause). Setting hl.requireFieldMatch=true does seem to fix that!
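A sketch of the request from the question with hl.requireFieldMatch=true added, using only the JDK to URL-encode the parameters. The parameters are the question's own; the core URL in main is a placeholder.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Sketch: builds the edismax + highlighting query string from the question.
final class HighlightQuery {

    static String queryString(String q) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("q", q);
        p.put("defType", "edismax");
        p.put("qf", "text exact");
        p.put("hl", "true");
        p.put("hl.fl", "text,exact");
        // Highlight a field only with query terms that matched in that same field.
        p.put("hl.requireFieldMatch", "true");
        return p.entrySet().stream()
                .map(e -> enc(e.getKey()) + "=" + enc(e.getValue()))
                .collect(Collectors.joining("&"));
    }

    // URL-encodes as UTF-8 (spaces become '+', commas become %2C).
    private static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println("http://localhost:8983/solr/select?" + queryString("a-b"));
    }
}
```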

Solr search is returning partial string matches

Using Solr 3.6.1, I have these fields in my schema.xml:
<field name="names" type="text_general" indexed="true" stored="false" multiValued="true"/>
<dynamicField name="names_*" type="text_general" indexed="true" stored="true"/>
The documentation in schema.xml states that "text_general" should:
tokenize with StandardTokenizer
remove stop words listed in the case-insensitive "stopwords.txt" (which is currently empty)
lowercase the string
at query time only, also apply synonyms (the synonyms file is also currently empty)
I have two documents indexed in Solr with this data for the field:
<!-- doc 1 -->
<str name="names_data">Name ABC Dev Loc</str>
<!-- doc 2 -->
<str name="names_data">Name ABC Dev Location</str>
When I execute the following query:
id:(doc1 OR doc2) AND names:Dev+Location
Both documents are returned. I would have expected only doc2 to be returned, based on my understanding of how Solr's StandardTokenizer works.
Why does "Dev+Location" match "Dev Loc" and "Dev Location"?
The type text_general is probably configured to use a stemmer, which treats Loc as a variant of Location.
You could configure the type not to use a stemmer, or try searching for the whole phrase using names:"Dev Location".
This might be why.
In the query, the part names:Dev+Location searches only names:Dev (in a URL, the + decodes to a space). Because the Location term has no field-name qualifier, it is searched against whatever <defaultSearchField> is set to in schema.xml.
So you could try quoting the phrase, as in names:"Dev Location", or qualifying both terms, as in names:Dev AND names:Location.
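To make the second answer concrete: a JDK-only sketch showing that names:Dev+Location decodes to two clauses, and building the two suggested alternatives. The helper names are illustrative, not part of any Solr API.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Sketch: shows how '+' is decoded and builds field-qualified alternatives.
final class FieldQualifiedQuery {

    // What Solr receives for a URL parameter value like names:Dev+Location.
    static String decode(String raw) {
        return URLDecoder.decode(raw, StandardCharsets.UTF_8);
    }

    // Phrase query: both terms must appear in the field, in order and adjacent.
    static String phrase(String field, String text) {
        return field + ":\"" + text + "\"";
    }

    // Every term qualified with the field name, joined with AND.
    static String allTerms(String field, String... terms) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < terms.length; i++) {
            if (i > 0) sb.append(" AND ");
            sb.append(field).append(':').append(terms[i]);
        }
        return sb.toString();
    }

    // Encodes a query for use as the q parameter in a URL.
    static String encode(String q) {
        return URLEncoder.encode(q, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(decode("names:Dev+Location"));
        System.out.println(encode(phrase("names", "Dev Location")));
        System.out.println(encode(allTerms("names", "Dev", "Location")));
    }
}
```

The decoded string names:Dev Location is two clauses, and only the first carries the names: prefix, which is why the unqualified Location falls through to the default search field.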
