Searching Images in Solr 3.3.0 - java

I am working with Solr 3.3.0 and I need to index and search images. I am able to index the image files, but they are not retrieved as search results.
Can anyone help me out with this?
My data-config.xml is:
<dataConfig>
<dataSource type="BinFileDataSource" name="bin"/>
<document>
<entity name="f" processor="FileListEntityProcessor" recursive="true"
rootEntity="false"
dataSource="null" baseDir="C:/Files"
fileName=".*\.(DOC)|(PDF)|(XML)|(xml)|(JPEG)|(jpg)|(ZIP)|(zip)|(pdf)|(doc)" onError="skip">
<entity name="tika-test" processor="TikaEntityProcessor"
url="${f.fileAbsolutePath}" format="text" dataSource="bin" onError="skip">
<field column="Author" name="author" meta="true"/>
<field column="title" name="title" meta="true"/>
<field column="text" name="text"/>
<field column="id" name="id"/>
<field column="Keywords" name="keywords" meta="true"/>
</entity>
<field column="file" name="fileName"/>
<field column="fileAbsolutePath" name="links"/>
</entity>
</document>
</dataConfig>
This works fine for types other than images; I am not able to get images in the search results.

Ans) After indexing, you have to search from the Solr admin page using a query in the URL.
Whenever your images are indexed, they show up in the Overview tab of the Solr admin.
If the Overview tab is not displaying the documents and the index time, the images were not indexed, so searching for them will keep returning nothing.
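For example, assuming the default core URL from the Solr tutorial, you can first check whether the image documents made it into the index at all:
http://localhost:8983/solr/select?q=*:*&fl=fileName,links&rows=20
If the image files are listed there, keep in mind that Tika extracts only metadata (no body text) from plain images, so a search against the text field will never match them; query the fileName or metadata fields instead.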

Related

Castor JDO / OQL query returning duplicate objects and missing some dependants

I have an issue when querying certain objects (persisted in a Postgres database) using Castor 0.9.6 (can't upgrade). The problem only occurs on the production machine, not the dev machine, despite both using the same database.
udOql = db.getOQLQuery("SELECT DISTINCT ud FROM model.objects.UserDefinition ud WHERE ud.siteId = $1");
udOql.bind(other.getSiteId());
This normally works fine, but for some reason I am getting duplicate objects, even though I include DISTINCT.
Mapping XML is as follows:
<class name="model.objects.UserDefinition" identity="userId" key-generator="HIGH-LOW">
<cache-type type="unlimited" capacity="1000"/>
<map-to table="userdefinition" xml="userDefinition" />
<field name="siteId" type="java.lang.Integer">
<sql name="siteid" type="integer" />
</field>
<field name="userAttributeDefinition" type="model.objects.UserAttributeDefinition" collection="vector">
<sql many-key="userid" />
</field>
<field name="roleAssignmentDefinition" type="model.objects.RoleAssignmentDefinition" collection="vector">
<sql many-key="userid" />
</field>
</class>
Not only does it return duplicate UserDefinition objects, some of them are missing linked roleAssignmentDefinition or userAttributeDefinition entries. Restarting Tomcat usually fixes the problem, but then it comes back. Sometimes the issue is there from the start.
The database holds correct data. Any tips or pointers are much appreciated.

How to pass customized parameters to Solr DIH query

I have a scenario where I need to pass customized parameters to the Solr data import query.
Ex: select * from customer where last_updated_date >= last_updated_indexed_date
The last_updated_indexed_date comes from another table that holds details about the core.
How can I pass that last_updated_indexed_date to the DIH query?
The data-config can be configured something like below:
<dataConfig>
<dataSource name="ds-db" driver="oracle.jdbc.driver.OracleDriver"
url="jdbc:oracle:thin:#127.0.0.1:1521:test" user="dev" password="dev" />
<dataSource name="ds-file" type="BinFileDataSource" />
<document name="documents">
<entity name="book" dataSource="ds-db"
query="select distinct
book.id as id,
book.title,
book.author,
book.publisher,
from Books book
where book.book_added_date >= to_date($ {dataimporter.request.lastIndexDate}, 'DD/MM/YYYY HH24:MI:SS')))"
transformer="DateFormatTransformer">
<field column=”id” name=”id” />
<field column=”title” name=”title” />
<field column=”author” name=”author” />
<field column=”publisher” name=”publisher” />
<entity name=”content” query=”select description from content
where content_id='${book.Id}' ”>
<field column=”description” name=”description” />
</entity>
</entity>
</document>
</dataConfig>
Here '${book.id}' shows how a value resolved from the outer entity is passed into the inner query. You will need something similar for last_updated_indexed_date in your data-config.xml, as sketched below. Alternatively, if you don't keep that value in your tables, you can pass it to the data import URL as a request parameter such as lastIndexDate (please refer to the data import URL below).
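A hedged sketch of the nested-entity approach, assuming the tracking table is called core_details with a last_updated_indexed_date column (both names are guesses based on your description; also note that the column-name case in ${coreMeta...} may need to match what the JDBC driver returns, often uppercase with Oracle):
<entity name="coreMeta" dataSource="ds-db" rootEntity="false"
query="select to_char(last_updated_indexed_date, 'DD/MM/YYYY HH24:MI:SS') as last_indexed
from core_details where core_name = 'customer'">
<entity name="customer" dataSource="ds-db"
query="select * from customer
where last_updated_date >= to_date('${coreMeta.last_indexed}', 'DD/MM/YYYY HH24:MI:SS')">
</entity>
</entity>
Each row of the outer entity is visible to the inner query through the ${coreMeta.*} variables, exactly like ${book.id} above.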
The data import URL will be like
http://localhost:8080/solr/admin/select/?qt=/dataimport&command=full-import&clean=false&commit=true&lastIndexDate='08/05/2011 20:16:11'

Indexing data from CSV file in Apache Solr

I have followed the tutorial given in the link below:
Indexing csv file in solr
I have configured the Solr server on my local machine, but when I try to post the CSV file using the command below:
java -Dtype=text/csv -Durl=http://localhost:8983/solr/jcg/update -jar post.jar books.csv
I get the error response below in the command prompt. Can anyone tell me why I am getting it?
Error:
<response>
<lst name="responseHeader"><int name="status">400</int><int name="QTime">111</int></lst><lst name="error"><lst name="metadata"><str name="error-class">org.apache.solr.common.SolrException</str><str name="root-error-class">org.apache.solr.common.SolrException</str></lst><str name=
"msg">ERROR: [doc=0553573403] unknown field 'cat'</str><int name="code">400</int></lst>
</response>
The tutorial you're following is quite old, so if you want to follow it there are two possibilities:
The tutorial misses the creation of the schema configuration in the jcg collection. In this case you should fix the managed-schema file of your jcg configuration, taking care to add the following fields as suggested in the tutorial, then reload the configuration (or restart Solr). At this point the "Indexing the Data" step should work correctly.
<uniqueKey>id</uniqueKey>
<!-- Fields added for books.csv load-->
<field name="cat" type="text_general" indexed="true" stored="true"/>
<field name="name" type="text_general" indexed="true" stored="true"/>
<field name="price" type="tdouble" indexed="true" stored="true"/>
<field name="inStock" type="boolean" indexed="true" stored="true"/>
<field name="author" type="text_general" indexed="true" stored="true"/>
Or you have correctly added the schema fields but have not reloaded the collection. In that case, just reload the configuration (or restart Solr) and continue with the tutorial.
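For example, assuming a standalone (non-SolrCloud) Solr instance with a core named jcg, the reload can be done through the CoreAdmin API:
http://localhost:8983/solr/admin/cores?action=RELOAD&core=jcg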
On the other hand, if you're using a more recent version of Solr (such as 6.4), I suggest deleting the jcg collection and creating it again:
bin/solr delete -c jcg
bin/solr create -c jcg -d ./server/solr/configsets/sample_techproducts_configs

How can I create a blob index field correctly with Solr 5?

I think the title of my question explains much of what I need. I am using the Data Import Handler of Apache Solr 5. I configured my solrconfig.xml, schema.xml and data-config.xml, and it's working for now.
However, I need to add one more field: an Oracle BLOB field. First, let me show my configurations:
data-config.xml
<dataConfig>
<!-- Datasource -->
<dataSource name="myDS"
setReadOnly="true"
driver="oracle.jdbc.OracleDriver"
url="jdbc:oracle:thin:#//server.example.com:1521/service_name"
user="user"
password="pass"/>
<document name="products">
<entity name="product"
dataSource="myDS"
query="select * from products"
pk="id"
processor="SqlEntityProcessor">
<field column="id" name="id" />
<field column="name" name="name" />
<field column="price" name="price" />
<field column="store" name="store" />
<!-- I've added this blob field -->
<field column="picture" name="picture" />
</entity>
</document>
</dataConfig>
solrconfig.xml
<requestHandler name="/products" class="org.apache.solr.handler.dataimport.DataImportHandler">
<lst name="defaults">
<str name="config">data-config.xml</str>
</lst>
</requestHandler>
<!-- JDBCs -->
<lib dir="../../../lib" />
My fields in schema.xml
<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />
<field name="_version_" type="long" indexed="true" stored="true"/>
<field name="_text" type="string" indexed="true" stored="false" multiValued="true"/>
<field name="name" type="string" indexed="true" stored="true"/>
<field name="price" type="float" indexed="true" stored="true"/>
<!-- BLOB field -->
<field name="picture" type="binary" indexed="true" stored="true"/>
<copyField source="*" dest="_text"/>
<!-- ommited solr default fields -->
Now, when I start a full-importer, SOLR only indexes some records. This is the output after SOLR finish the importing:
Indexing completed. Added/Updated: 64 documents. Deleted 0 documents. (Duration: 04s)
Requests: 1 (0/s), Fetched: 1369 (342/s), Skipped: 0, Processed: 64 (16/s)
Started: less than a minute ago
As you can see, I have 1369 records, but Solr only indexes 64 documents. If I remove the picture field from the schema, or set its indexed and stored attributes to false, Solr imports all documents.
I opened the SOLR log, and found this error when importing the blob field:
3436212 [Thread-19] WARN org.apache.solr.handler.dataimport.SolrWriter – Error creating document : SolrInputDocument(fields: [name=PRODUCTNAME, price=PRICE, store=STORE, picture=oracle.sql.BLOB@4130607a, _version_=1497915495941144576])
org.apache.solr.common.SolrException: ERROR: [doc=<ID>] Error adding field 'picture'='oracle.sql.BLOB@4130607a' msg=Illegal character .
at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:176)
at org.apache.solr.update.AddUpdateCommand.getLuceneDocument(AddUpdateCommand.java:78)
at org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:240)
at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:166)
at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:69)
at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51)
at org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:931)
at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:1085)
at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:697)
at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:104)
at org.apache.solr.handler.dataimport.SolrWriter.upload(SolrWriter.java:71)
at org.apache.solr.handler.dataimport.DataImportHandler$1.upload(DataImportHandler.java:263)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:511)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:415)
at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:330)
at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:232)
at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:416)
at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:480)
at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:461)
Caused by: java.lang.IllegalArgumentException: Illegal character .
at org.apache.solr.common.util.Base64.base64toInt(Base64.java:150)
at org.apache.solr.common.util.Base64.base64ToByteArray(Base64.java:117)
at org.apache.solr.schema.BinaryField.createField(BinaryField.java:89)
at org.apache.solr.schema.FieldType.createFields(FieldType.java:305)
at org.apache.solr.update.DocumentBuilder.addField(DocumentBuilder.java:48)
at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:123)
... 18 more
I checked by querying directly against the database, and it works fine. I am using Solr 5, ojdbc7 and Java 8. How can I use the binary field correctly in Solr?
Update
I've changed the properties of picture in schema.xml setting indexed=false. This way:
<!-- BLOB field -->
<field name="picture" type="binary" indexed="false" stored="true"/>
Then I restarted Solr, reloaded my core, and did a full-import again. No success, and the same exception. The same 64 documents described above were imported, and the picture field does not appear in the JSON response. The query I execute is:
/select?q=*%3A*&wt=json&indent=true
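The stack trace points at the root cause: Solr's BinaryField expects a Base64-encoded string, but DIH hands it the oracle.sql.BLOB object, whose toString() output is not valid Base64. Below is a minimal sketch of a custom DIH transformer that Base64-encodes the blob before it reaches the schema (the class name, package, and the hard-coded picture column are my assumptions; the class would be compiled onto Solr's classpath and registered on the entity as transformer="com.example.BlobToBase64Transformer"):
package com.example;

import java.sql.Blob;
import java.util.Base64;
import java.util.Map;
import org.apache.solr.handler.dataimport.Context;
import org.apache.solr.handler.dataimport.Transformer;

// Sketch: converts the raw BLOB column into the Base64 string
// that BinaryField.createField() tries to decode.
public class BlobToBase64Transformer extends Transformer {
    @Override
    public Object transformRow(Map<String, Object> row, Context context) {
        Object value = row.get("picture");
        if (value instanceof Blob) {
            try {
                Blob blob = (Blob) value;
                byte[] bytes = blob.getBytes(1, (int) blob.length());
                row.put("picture", Base64.getEncoder().encodeToString(bytes));
            } catch (Exception e) {
                // Drop the field rather than failing the whole document
                row.remove("picture");
            }
        }
        return row;
    }
}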

Trying to serialize an object compactly using Castor

I'm using Castor to write out a map of user IDs to time intervals. I'm using it to save and resume progress in a lengthy task, and I'm trying to make the XML as compact as possible. My map is from string user IDs to a class that contains the interval timestamps, along with additional transient data that I don't need to serialize.
I'm able to use a nested class mapping:
...
<field name="userIntervals" collection="map">
<bind-xml name="u">
<class name="org.exolab.castor.mapping.MapItem">
<field name="key" type="string"><bind-xml name="n" node="attribute"/></field>
<field name="value" type="my.package.TimeInterval"/>
</class>
</bind-xml>
</field>
...
<class name="my.package.TimeInterval">
<map-to xml="ti"/>
<field name="intervalStart" type="long"><bind-xml name="s" node="attribute"/></field>
<field name="intervalEnd" type="long"><bind-xml name="e" node="attribute"/></field>
</class>
...
And get output that looks like:
<u n="36164639"><value s="1292750896000" e="1292750896000"/></u>
What I'd like is the name, start, and end for the user in a single node, like this:
<u n="36164639" s="1292750896000" e="1292750896000"/>
But I can't seem to finagle it so the start and end attributes in the "value" go in the same node as the "key". Any ideas would be greatly appreciated.
Nash,
I think arranging the Castor mapping here is a bit tricky.
If you want to have a structure like
<u n="36164639" s="1292750896000" e="1292750896000"/>
then you need to create a new POJO that holds all three fields: key, intervalStart, intervalEnd (a sketch follows the mapping below). Name the class KeyTimeInterval and map it like this:
<field name="userIntervals" collection="map">
<class name="org.exolab.castor.mapping.MapItem">
<field name="u" type="my.package.KeyTimeInterval">
<bind-xml name="u" node="element"/>
</field>
</class>
</field>
<class name="my.package.KeyTimeInterval">
<field name="key" type="String">
<bind-xml name="n" node="attribute"/></field>
<field name="intervalStart" type="long">
<bind-xml name="s" node="attribute"/></field>
<field name="intervalEnd" type="long">
<bind-xml name="e" node="attribute"/></field>
</class>
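A minimal sketch of that POJO, assuming standard bean conventions (which Castor's default field handling relies on); the mapping above would refer to it by its fully qualified name:
// Holds the three values serialized as attributes of the single <u> element
public class KeyTimeInterval {
    private String key;
    private long intervalStart;
    private long intervalEnd;

    public String getKey() { return key; }
    public void setKey(String key) { this.key = key; }
    public long getIntervalStart() { return intervalStart; }
    public void setIntervalStart(long intervalStart) { this.intervalStart = intervalStart; }
    public long getIntervalEnd() { return intervalEnd; }
    public void setIntervalEnd(long intervalEnd) { this.intervalEnd = intervalEnd; }
}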
I think you should be able to use location on s and e. Try this:
...
<class name="my.package.TimeInterval">
<map-to xml="ti"/>
<field name="intervalStart" type="long">
<bind-xml name="s" location="u" node="attribute"/>
</field>
<field name="intervalEnd" type="long">
<bind-xml name="e" location="u" node="attribute"/>
</field>
</class>
I'm answering my own question here, since there is a solution that does exactly what I want, and there's actually an error in the explanation at http://www.castor.org/xml-mapping.html#Sample-3:-Using-the-container-attribute - the container attribute is exactly what's needed here.
Changing one line in the mapping:
<field name="value" type="my.package.TimeInterval" container="true"/>
did exactly what I wanted: it didn't create a subelement for the value, it just mapped the fields into the existing parent element. Since then, I've used this quite a few times to map multiple-value classes into their parent.
The error is that the documentation states you do this by setting the container attribute to false; of course, it should be true.
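For reference, this is the full map mapping from the question with that one-line change applied:
<field name="userIntervals" collection="map">
<bind-xml name="u">
<class name="org.exolab.castor.mapping.MapItem">
<field name="key" type="string"><bind-xml name="n" node="attribute"/></field>
<field name="value" type="my.package.TimeInterval" container="true"/>
</class>
</bind-xml>
</field>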
