I have a Solr schema that has a "url" field:
<fieldType name="url" class="solr.TextField"
positionIncrementGap="100">
</fieldType>
<fields>
<field name="id" type="string" stored="true" indexed="true"/>
<field name="url" type="url" stored="true" indexed="false"/>
<field name="chunkNum" type="long" stored="true" indexed="false"/>
<field name="origScore" type="float" stored="true" indexed="true"/>
<field name="concept" type="string" stored="true" indexed="true"/>
<field name="text" type="text" stored="true" indexed="true"
required="true"/>
<field name="title" type="text" stored="true" indexed="true"/>
<field name="origDoctype" type="string" stored="true" indexed="true"/>
<field name="keywords" type="string" stored="true" indexed="true"/>
</fields>
<uniqueKey>id</uniqueKey>
<defaultSearchField>text</defaultSearchField>
<solrQueryParser defaultOperator="OR"/>
I can add SolrInputDocuments with all the fields and query them back using the text field and/or with a filter query on "concept". But when I try to query a specific url, I don't get any results. My code looks like:
SolrQuery query = new SolrQuery();
query.setQuery("url:" + ClientUtils.escapeQueryChars(url));
//query.setQuery("*:*");
//query.addFilterQuery("url:" + ClientUtils.escapeQueryChars(url));
List<Chunk> retCode = null;
try
{
QueryResponse resp = solrServer.query(query);
SolrDocumentList docs = resp.getResults();
retCode = new ArrayList<Chunk>(docs.size());
for (SolrDocument doc : docs)
{
LOG.debug("got doc " + doc);
Chunk chunk = new Chunk(doc);
retCode.add(chunk);
}
}
catch (SolrServerException e)
{
LOG.error("caught a server exception", e);
}
return retCode;
I've tried with and without the ClientUtils.escapeQueryChars and I've tried using a query of "url:" or a filter query on url. I never get anything back. Any hints?
What's the actual type of "url"? In your schema.xml you should have a set of "fieldType" elements that list the actual Solr backing classes and filters that make up a data type.
For your "fieldType" for the "url" you are interested in the "class" attribute. E.g. the most basic free-text type has a class="solr.TextField". You might be using a type that has some wacky filters on it and Lucene/Solr ends up indexing your data differently from what you would expect.
Download Luke and look at your index visually:
http://www.getopt.org/luke/
It will help you "look" at your data. Like I said, maybe it's stored differently than you expect.
Dammit, another stupid one on my part: Thanks to Cody's suggestion of using Luke, I discovered this inconvenient part of the schema:
<field name="url" type="url" stored="true" indexed="false"/>
Changing that to indexed="true" fixed the problem.
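For anyone who wants to catch this without opening Luke, here is a minimal sketch that lists every field explicitly declared with indexed="false" in a schema snippet. It uses only the JDK's XML parser, not any Solr API. The class name NonIndexedFields and the wrapping <schema> root element are assumptions made for illustration, and fields that omit the attribute (and so inherit the field type's default) are not reported.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of Solr.
class NonIndexedFields {

    // Returns the names of <field> elements whose indexed attribute is exactly "false".
    static List<String> find(String schemaXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(schemaXml)));
            NodeList fields = doc.getElementsByTagName("field");
            List<String> result = new ArrayList<>();
            for (int i = 0; i < fields.getLength(); i++) {
                Element field = (Element) fields.item(i);
                if ("false".equals(field.getAttribute("indexed"))) {
                    result.add(field.getAttribute("name"));
                }
            }
            return result;
        } catch (Exception e) {
            // Malformed XML or parser setup failure.
            throw new IllegalArgumentException("could not parse schema", e);
        }
    }

    public static void main(String[] args) {
        // Three fields copied from the schema in the question, wrapped in an assumed root.
        String schema = "<schema><fields>"
                + "<field name=\"id\" type=\"string\" stored=\"true\" indexed=\"true\"/>"
                + "<field name=\"url\" type=\"url\" stored=\"true\" indexed=\"false\"/>"
                + "<field name=\"chunkNum\" type=\"long\" stored=\"true\" indexed=\"false\"/>"
                + "</fields></schema>";
        System.out.println(find(schema)); // prints [url, chunkNum]
    }
}
```

Any field name this prints is stored but not searchable, so a query or filter query against it will come back empty.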
I want to filter data nearby location using Solr Spatial Search but can't find any data.
I have done the following changes in schema.xml:
<fieldType name="location" class="solr.LatLonPointSpatialField" docValues="true" />
<field name="latlong" type="text_general" indexed="true" stored="false" multiValued="false"/>
<field name="location" type="location" indexed="true" stored="true" multiValued="false"/>
Query :
q=*:*&fq={!geofilt%20sfield=location}&pt=22.303894,70.802162&d=5
Response :
{
"responseHeader":{
"zkConnected":true,
"status":0,
"QTime":0,
"params":{
"q":"*:*",
"pt":"22.303894,70.802162",
"d":"5",
"fq":"{!geofilt sfield=location}"}},
"response":{"numFound":0,"start":0,"numFoundExact":true,"docs":[]
}}
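One thing that may be worth checking, given the empty result: geofilt reads the location field, which takes a single "lat,lon" value, while the fields shown for the document this question expects to match are separate latitude and longitude values (22.2913494 and 70.8023621) with no location value among them. The sketch below is plain Java with no Solr calls. It estimates the great-circle distance between the query point and that document to confirm the 5 km radius itself is not the problem, and shows the "lat,lon" string form. The class and method names are hypothetical, and the haversine formula with a mean Earth radius is only an approximation of what Solr computes.

```java
// Hypothetical helper, not part of Solr.
class GeoDistanceSketch {

    static final double EARTH_RADIUS_KM = 6371.0; // mean Earth radius (approximation)

    // Great-circle distance in kilometres using the haversine formula.
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    // Builds the "lat,lon" text form that a lat/lon point field accepts.
    static String toLatLon(double lat, double lon) {
        return lat + "," + lon;
    }

    public static void main(String[] args) {
        // Query point (pt) and the expected document's coordinates, both from the question.
        double km = haversineKm(22.303894, 70.802162, 22.2913494, 70.8023621);
        System.out.println("distance in km: " + km);    // roughly 1.4
        System.out.println("within d=5: " + (km <= 5)); // prints within d=5: true
        System.out.println("location value: " + toLatLon(22.2913494, 70.8023621));
    }
}
```

If the distance is well inside the radius, as it is here, the more likely explanation is that the location field is empty for that document at index time.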
Expected Document:
{
"responseHeader":{
"zkConnected":true,
"status":0,
"QTime":0,
"params":{
"q":"*:*",
"indent":"true",
"q.op":"OR",
"_":"1671173790419"}},
"response":{"numFound":1,"start":0,"numFoundExact":true,"docs":[
{
"brandnd_name":["Luxury"],
"country":["India"],
"latitude":[22.2913494],
"longitude":[70.8023621],
"updated_date":["54:03.6"],
"added_date":["0018-05-14T11:19:00Z"],
"active_ind":[true],
"synxis_hotel_id":[37273],
"id":"4598a178-d984-41f1-b009-bbda37469b7c",
"_version_":1752351636887437312},
I think the title of my question explains much of what I need. I am using the Data Import Handler of Apache SOLR 5. I configured my solrconfig.xml, schema.xml and data-config.xml. It's working for now.
However, I need to add one more field. An Oracle Blob field. First, let me show my configurations:
data-config.xml
<dataConfig>
<!-- Datasource -->
<dataSource name="myDS"
setReadOnly="true"
driver="oracle.jdbc.OracleDriver"
url="jdbc:oracle:thin:@//server.example.com:1521/service_name"
user="user"
password="pass"/>
<document name="products">
<entity name="product"
dataSource="myDS"
query="select * from products"
pk="id"
processor="SqlEntityProcessor">
<field column="id" name="id" />
<field column="name" name="name" />
<field column="price" name="price" />
<field column="store" name="store" />
<!-- I've added this blob field -->
<field column="picture" name="picture" />
</entity>
</document>
</dataConfig>
solrconfig.xml
<requestHandler name="/products" class="org.apache.solr.handler.dataimport.DataImportHandler">
<lst name="defaults">
<str name="config">data-config.xml</str>
</lst>
</requestHandler>
<!-- JDBCs -->
<lib dir="../../../lib" />
My fields in schema.xml
<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />
<field name="_version_" type="long" indexed="true" stored="true"/>
<field name="_text" type="string" indexed="true" stored="false" multiValued="true"/>
<field name="name" type="string" indexed="true" stored="true"/>
<field name="price" type="float" indexed="true" stored="true"/>
<!-- BLOB field -->
<field name="picture" type="binary" indexed="true" stored="true"/>
<copyField source="*" dest="_text"/>
<!-- omitted Solr default fields -->
Now, when I start a full import, SOLR only indexes some of the records. This is the output after SOLR finishes importing:
Indexing completed. Added/Updated: 64 documents. Deleted 0 documents. (Duration: 04s)
Requests: 1 (0/s), Fetched: 1369 (342/s), Skipped: 0, Processed: 64 (16/s)
Started: less than a minute ago
As you can see, 1369 records were fetched, but SOLR only indexed 64 documents. If I remove the picture field from the schema, or set its indexed and stored attributes to false, SOLR imports all documents.
I opened the SOLR log, and found this error when importing the blob field:
3436212 [Thread-19] WARN org.apache.solr.handler.dataimport.SolrWriter – Error creating document : SolrInputDocument(fields: [name=PRODUCTNAME, price=PRICE, store=STORE, picture=oracle.sql.BLOB@4130607a, _version_=1497915495941144576])
org.apache.solr.common.SolrException: ERROR: [doc=<ID>] Error adding field 'picture'='oracle.sql.BLOB@4130607a' msg=Illegal character .
at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:176)
at org.apache.solr.update.AddUpdateCommand.getLuceneDocument(AddUpdateCommand.java:78)
at org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:240)
at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:166)
at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:69)
at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51)
at org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:931)
at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:1085)
at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:697)
at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:104)
at org.apache.solr.handler.dataimport.SolrWriter.upload(SolrWriter.java:71)
at org.apache.solr.handler.dataimport.DataImportHandler$1.upload(DataImportHandler.java:263)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:511)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:415)
at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:330)
at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:232)
at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:416)
at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:480)
at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:461)
Caused by: java.lang.IllegalArgumentException: Illegal character .
at org.apache.solr.common.util.Base64.base64toInt(Base64.java:150)
at org.apache.solr.common.util.Base64.base64ToByteArray(Base64.java:117)
at org.apache.solr.schema.BinaryField.createField(BinaryField.java:89)
at org.apache.solr.schema.FieldType.createFields(FieldType.java:305)
at org.apache.solr.update.DocumentBuilder.addField(DocumentBuilder.java:48)
at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:123)
... 18 more
I checked by querying the database directly, and that works fine. I am using SOLR 5, ojdbc7 and Java 8. How can I use the binary field correctly in SOLR?
Update
I've changed the properties of picture in schema.xml, setting indexed="false", like this:
<!-- BLOB field -->
<field name="picture" type="binary" indexed="false" stored="true"/>
Then I restarted SOLR, reloaded my core, and ran a full import again. No success, and the same exception. The same 64 documents that I described above were imported, and the picture field does not appear in the JSON response. The query I execute is:
/select?q=*%3A*&wt=json&indent=true
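A note on what the stack trace in this question seems to show: BinaryField.createField calls a Base64 decoder on the incoming value, and the value it received is the BLOB object's default string form (class name plus hash code) rather than Base64 text. The sketch below reproduces that failure mode with the JDK's java.util.Base64, which is a stand-in for Solr's own Base64 class, and shows the Base64 form of some raw bytes. The class name and the sample bytes are made up for illustration. How the BLOB bytes get read during the import is outside this sketch.

```java
import java.util.Base64;

// Hypothetical helper, not part of Solr or the Oracle driver.
class BlobBase64Sketch {

    // True if the text decodes cleanly as standard Base64.
    static boolean isBase64(String text) {
        try {
            Base64.getDecoder().decode(text);
            return true;
        } catch (IllegalArgumentException e) {
            // Thrown for characters outside the Base64 alphabet, such as '.'
            return false;
        }
    }

    // Encodes raw bytes as standard Base64 text.
    static String encode(byte[] raw) {
        return Base64.getEncoder().encodeToString(raw);
    }

    public static void main(String[] args) {
        // Default Object.toString() form of a BLOB handle, as reported in the log.
        String objectToString = "oracle.sql.BLOB@4130607a";
        System.out.println(isBase64(objectToString)); // prints false

        // Made-up sample bytes standing in for the picture content.
        byte[] samplePicture = {(byte) 0x89, 'P', 'N', 'G'};
        String encoded = encode(samplePicture);
        System.out.println(isBase64(encoded)); // prints true
    }
}
```

The first '.' in the class name is already outside the Base64 alphabet, which matches the "Illegal character ." message in the exception.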
I've been working with SolrJ for months now without any problem with a schema that follows the following pattern, with underscores and camelcasing:
<field name="museum_eventActor" type="text" indexed="true" stored="true" multiValued="true"/>
<field name="museum_eventType" type="text" indexed="false" stored="true" multiValued="true"/>
<field name="museum_eventPlace" type="text" indexed="true" stored="true" multiValued="true"/>
<field name="museum_eventDate" type="text" indexed="true" stored="true" multiValued="true"/>
We recently decided that we wanted to index some PDF content, so I started using curl to test some content:
curl "http://localhost:8090/solr-museum/archival/update/extract?stream.file=/home/user/Downloads/transcript.pdf&stream.contentType=application/pdf&literal.id=C1-1-5&literal.museum_eventActor=test&fmap.content=text&commit=true"
But I was noticing that although Solr was acknowledging my fields, none of them were showing up in my index. Solr log says:
792560 [http-8090-1] INFO org.apache.solr.update.processor.LogUpdateProcessor – [archivalRecord] webapp=/solr-museum path=/update/extract params={fmap.content=text&commit=true&literal.museum_eventActor=&literal.id=C1-1-5&stream.contentType=application/pdf&stream.file=/home/user/Downloads/transcript.pdf} {add=[C1-1-5 (1467000805262360576)],commit=} 0 698
and the index looks like:
<doc>
<str name="id">C1-1-5</str>
<long name="_version_">1467000805262360576</long>
<arr name="content">
<str>1467000805262360576</str>
</arr>
</doc>
After a day of playing around and searching online, I found this SO question which made me wonder about camelcasing: Solr - Missing Required Field
So I modified my schema to look something like this:
<field name="museum_eventactor" type="text" indexed="true" stored="true" multiValued="true"/>
<field name="museumeventtype" type="text" indexed="false" stored="true" multiValued="true"/>
<field name="museumeventPlace" type="text" indexed="true" stored="true" multiValued="true"/>
<field name="museum_eventDate" type="text" indexed="true" stored="true" multiValued="true"/>
And fired over this request:
curl "http://localhost:8090/solr-museum/archival/update/extract?stream.file=/home/user/Downloads/transcript.pdf&stream.contentType=application/pdf&literal.id=C1-1-5&literal.museum_eventactor=test&literal.museumeventtype=test&literal.museumeventPlace=test&literal.museum_eventDate=test&fmap.content=text&commit=true"
And sure enough, the fields with camelcasing aren't being recognized:
<doc>
<arr name="museum_eventactor">
<str>test</str>
</arr>
<str name="id">C1-1-5</str>
<arr name="museumeventtype">
<str>test</str>
</arr>
<long name="_version_">1467001178833289216</long>
</doc>
Now I've searched through a lot of the Solr documentation, and although they point out repeatedly that there are very few restrictions on field names if you're willing to accept the consequences, never have I ever encountered a scenario where camelcasing isn't a valid naming scheme especially in Java. I'm kind of stumped as to why this might be happening. Does anyone have any ideas that might explain this behavior?
In a SmartGwtEE project I have a hierarchy of DataSources described in .ds.xml files. Here are some of them:
BaseElement_DS.ds.xml
<DataSource ID="BaseElement_DS" serverConstructor="com.isomorphic.jpa.JPADataSource"
beanClassName="lnudb.server.model.BaseElement">
<fields>
<field name="id" type="sequence" hidden="true" primaryKey="true" />
<field name="name" type="text" title="Name" required="true" />
<field name="dsId" type="text" title="Datasource" hidden="true"/>
</fields>
</DataSource>
Human_DS.ds.xml
<DataSource ID="Human_DS" serverConstructor="com.isomorphic.jpa.JPADataSource"
beanClassName="org.zasadnyy.lnudb.server.model.Human" inheritsFrom="BaseElement_DS"
useParentFieldOrder="true">
<fields>
<field name="surname" type="text" />
<field name="birthday" type="date" title="Birthday" required="false" />
</fields>
</DataSource>
Problem: when I try to get the parent datasource ID in code
String parentDsId = DataSource.get("Human_DS").getInheritsFrom();
a ClassCastException is raised from inside the getInheritsFrom() method:
java.lang.ClassCastException: com.google.gwt.core.client.JavaScriptObject$ cannot be cast to java.lang.String
I will be grateful for any help.
This way you won't get the exception any longer:
String parentDsId = DataSource.get("Human_DS").getInheritsFrom() + "";
However, I'm not sure whether this is "ok" for your purposes. If it isn't, try creating a JavaScript object and assigning the aforementioned value to it. I hope this helps.
I'm using Castor to write out a map of user IDs to time intervals. I'm using it to save and resume progress in a lengthy task, and I'm trying to make the XML as compact as possible. My map is from string user IDs to a class that contains the interval timestamps, along with additional transient data that I don't need to serialize.
I'm able to use a nested class mapping:
...
<field name="userIntervals" collection="map">
<bind-xml name="u">
<class name="org.exolab.castor.mapping.MapItem">
<field name="key" type="string"><bind-xml name="n" node="attribute"/></field>
<field name="value" type="my.package.TimeInterval"/>
</class>
</bind-xml>
</field>
...
<class name="my.package.TimeInterval">
<map-to xml="ti"/>
<field name="intervalStart" type="long"><bind-xml name="s" node="attribute"/></field>
<field name="intervalEnd" type="long"><bind-xml name="e" node="attribute"/></field>
</class>
...
And get output that looks like:
<u n="36164639"><value s="1292750896000" e="1292750896000"/></u>
What I'd like is the name, start, and end of the user in a single node, like this:
<u n="36164639" s="1292750896000" e="1292750896000"/>
But I can't seem to finagle it so the start and end attributes in the "value" go in the same node as the "key". Any ideas would be greatly appreciated.
Nash,
I think arranging the Castor mapping this way is a bit tricky.
If you want a structure like
<u n="36164639" s="1292750896000" e="1292750896000"/>
then you need to create a new POJO class that holds all three fields: key, intervalStart, and intervalEnd.
Name the class KeyTimeInterval and map it like this:
<field name="userIntervals" collection="map">
<class name="org.exolab.castor.mapping.MapItem">
<field name="u" type="my.package.KeyTimeInterval">
<bind-xml name="u" node="element"/>
</field>
</class>
</field>
<class name="my.package.KeyTimeInterval">
<field name="key" type="string">
<bind-xml name="n" node="attribute"/></field>
<field name="intervalStart" type="long">
<bind-xml name="s" node="attribute"/></field>
<field name="intervalEnd" type="long">
<bind-xml name="e" node="attribute"/></field>
</class>
I think you should be able to use location on s and e. Try this:
...
<class name="my.package.TimeInterval">
<map-to xml="ti"/>
<field name="intervalStart" type="long">
<bind-xml name="s" location="u" node="attribute"/>
</field>
<field name="intervalEnd" type="long">
<bind-xml name="e" location="u" node="attribute"/>
</field>
</class>
I'm answering my own question here, since there is a solution that does exactly what I want, and there's actually an error in the explanation at http://www.castor.org/xml-mapping.html#Sample-3:-Using-the-container-attribute - the container attribute is exactly what's needed here.
Changing one line in the mapping:
<field name="value" type="my.package.TimeInterval" container="true"/>
did exactly what I wanted: it didn't create a subelement for the value, it just mapped the fields into the existing parent element. Since then, I've used this quite a few times to map multiple-value classes into their parent.
The error, of course, is that the documentation states you do this by setting the container attribute to false. It should be true.