How to get started on searching through bean properties with Apache Lucene?

I'm trying out full-text search engine frameworks for a Java EE application with JPA. I don't want to switch to Hibernate, which offers the quite neat Hibernate Search feature, so I'm starting with Apache Lucene now.
I'd like to search through the String fields of JPA entities (after creating an index for them, i.e. the usual writer/reader example). I'll use an EJB wrapping the persistence layer to keep the index up to date. I assume it's irrelevant that I'm using JPA and Java EE.
Since Apache projects don't seem to have a policy to keep their documentation up to date, or at least to mark outdated pages as such, most of the examples at https://wiki.apache.org/lucene-java/TheBasics and similar sites don't work, because classes and methods have been removed. The same goes for blog posts found via search engines: anything one finds needs to be tried out, because there's a ca. 90% chance that the example refers to classes or methods which no longer exist...
I'm looking for any example showing the above use case with an up-to-date version of Lucene, which is 6.5.0 afaik.

I am not sure what all has changed in 6.5, but the code below was written for Lucene 6.0.0 and compiles and runs with 6.5.0 too.
IndexCreation
import java.io.File;
import java.io.IOException;
import org.apache.lucene.analysis.core.SimpleAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.IndexWriterConfig.OpenMode;
import org.apache.lucene.store.FSDirectory;

public class IndexCreator {

    public static IndexWriter getWriter() throws IOException {
        File indexDir = new File("D:\\Experiment");
        // SimpleAnalyzer lower-cases and splits on non-letter characters
        SimpleAnalyzer analyzer = new SimpleAnalyzer();
        IndexWriterConfig indexWriterConfig = new IndexWriterConfig(analyzer);
        // reuse an existing index, or create a new one if none exists yet
        indexWriterConfig.setOpenMode(OpenMode.CREATE_OR_APPEND);
        IndexWriter indexWriter = new IndexWriter(
                FSDirectory.open(indexDir.toPath()), indexWriterConfig);
        indexWriter.commit();
        return indexWriter;
    }
}
Now you can use this writer to index your documents with the writer.addDocument(...) and writer.updateDocument(...) methods.
Fields can be added to a document like below:
doc.add(new Field("NAME", "Tom", new FieldType(TextField.TYPE_STORED)));
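Putting that together for the question's use case, here is a minimal sketch of indexing the String fields of an entity; the Person class and its getId()/getName() accessors are hypothetical stand-ins for your JPA entity:

import java.io.IOException;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;

public class PersonIndexer {
    public static void index(IndexWriter writer, Person person) throws IOException {
        Document doc = new Document();
        // StringField is indexed as a single token (good for IDs); TextField is analyzed for full-text search
        doc.add(new StringField("ID", String.valueOf(person.getId()), Field.Store.YES));
        doc.add(new TextField("NAME", person.getName(), Field.Store.YES));
        // updateDocument replaces any existing document with the same ID term, so re-indexing is idempotent
        writer.updateDocument(new Term("ID", String.valueOf(person.getId())), doc);
        writer.commit();
    }
}

Calling something like this from your EJB after every persist/merge/remove of the entity keeps the index in sync with the database.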
Searching
import java.io.IOException;
import java.nio.file.Paths;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.search.WildcardQuery;
import org.apache.lucene.store.FSDirectory;

public class LuceneSearcher {

    public static void searchIndex() throws IOException {
        IndexReader reader = DirectoryReader.open(FSDirectory.open(Paths.get("D:\\Experiment")));
        IndexSearcher searcher = new IndexSearcher(reader);
        // match every document that has a NAME field; 20 is the maximum number of hits returned
        TopDocs hits = searcher.search(new WildcardQuery(new Term("NAME", "*")), 20);
        if (null == hits.scoreDocs || hits.scoreDocs.length <= 0) {
            System.out.println("No hits found");
            return;
        }
        System.out.println(hits.scoreDocs.length + " docs found!");
        for (ScoreDoc hit : hits.scoreDocs) {
            Document doc = searcher.doc(hit.doc);
            System.out.println(doc.get("NAME")); // print the stored NAME field
        }
        reader.close();
    }
}
The searcher code assumes that you have indexed documents with NAME as a field name.
I think this should be enough to get you started.
Let me know if you need anything else.
I have these Maven dependencies:
<dependencies>
    <dependency>
        <groupId>org.apache.lucene</groupId>
        <artifactId>lucene-core</artifactId>
        <version>6.5.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.lucene</groupId>
        <artifactId>lucene-analyzers-common</artifactId>
        <version>6.5.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.lucene</groupId>
        <artifactId>lucene-queryparser</artifactId>
        <version>6.5.0</version>
    </dependency>
</dependencies>

Related

Autofilter excel using java poi

I am trying to filter an Excel sheet using the POI classes. When I used this to set the filter:
CTAutoFilter sheetFilter = my_sheet.getCTWorksheet().getAutoFilter();
CTFilterColumn myFilterColumn = sheetFilter.insertNewFilterColumn(0);
I got the error below on "CTFilterColumn":
Multiple markers at this line
- The method insertNewFilterColumn(int) from the type CTAutoFilter refers to the missing type CTFilterColumn
- The type org.openxmlformats.schemas.spreadsheetml.x2006.main.CTFilterColumn cannot be resolved. It is indirectly referenced from required .class files
- CTFilterColumn cannot be resolved to a type
Entire code :
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import org.apache.poi.ss.usermodel.*;
import org.apache.poi.hssf.usermodel.*;
import org.apache.poi.hssf.util.*;
import org.apache.poi.ss.usermodel.ComparisonOperator;
import org.apache.poi.ss.usermodel.IndexedColors;
import org.apache.poi.ss.util.CellRangeAddress;
import org.apache.poi.xssf.usermodel.XSSFCell;
import org.apache.poi.xssf.usermodel.XSSFRow;
import org.apache.poi.xssf.usermodel.XSSFSheet;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;
import org.openxmlformats.schemas.spreadsheetml.x2006.main.CTAutoFilter;
public class Test1 {
    public static void main(String[] args) {
        try {
            // To read values and enable auto filter
            FileInputStream fileIn = new FileInputStream("./XMLs/Test001.xlsx");
            XSSFWorkbook my_workbook = new XSSFWorkbook(fileIn);
            XSSFSheet my_sheet = my_workbook.getSheet("Sheet1");
            CTAutoFilter sheetFilter = my_sheet.getCTWorksheet().getAutoFilter();
            CTFilterColumn myFilterColumn = sheetFilter.insertNewFilterColumn(0);
Instead of "import org.openxmlformats.schemas.spreadsheetml.x2006.main.CTAutoFilter;" I tried import org.openxmlformats.schemas.spreadsheetml.x2006.main.*;, but CTFilterColumn is not even listed in the suggestions.
Is it because of some missing jar files? Please help.
Promoting a comment to an answer
This is covered in this Apache POI FAQ Entry - I'm using the poi-ooxml-schemas jar, but my code is failing with "java.lang.NoClassDefFoundError: org/openxmlformats/schemas/something". The key section is:
There are two jar files available, as described in the components overview section. The full jar of all of the schemas is ooxml-schemas-1.3.jar, and it is currently around 15mb. The smaller poi-ooxml-schemas jar is only about 4mb. This latter jar file only contains the typically used parts though.
Many users choose to use the smaller poi-ooxml-schemas jar to save space. However, the poi-ooxml-schemas jar only contains the XSDs and classes that are typically used, as identified by the unit tests. Every so often, you may try to use part of the file format which isn't included in the minimal poi-ooxml-schemas jar. In this case, you should switch to the full ooxml-schemas-1.3.jar. Longer term, you may also wish to submit a new unit test which uses the extra parts of the XSDs, so that a future poi-ooxml-schemas jar will include them.
So, short term you need to swap out your small poi-ooxml-schemas jar for the full ooxml-schemas-1.3 jar. Longer term, if you submit a unit test to Apache POI which uses these extra classes, it'll be included in a future small schema jar.
Maven artifact details for the full schema are covered here on the Apache POI site
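For illustration, a sketch of that swap in a Maven build; the 1.3 version matches the FAQ quote above, but check which ooxml-schemas version your POI release expects:

<!-- use the complete schema jar instead of the minimal poi-ooxml-schemas -->
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>ooxml-schemas</artifactId>
    <version>1.3</version>
</dependency>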

Java jena fuseki set OntModelSpec pellet reasoner

The following code is used to send RDF data to a SPARQL endpoint.
It worked fine until I tried to add a reasoner to the OntModel.
Now the compiler says:
"cannot convert from com.hp.hpl.jena.ontology.OntModelSpec to org.apache.jena.ontology.OntModelSpec".
So my question is: what do I have to edit to make it work?
(I know that the problem is obviously in "PelletReasonerFactory.THE_SPEC", which is not from com.hp.hpl...; so is there something similar to this one which also comes from org.apache.jena...?)
package services;

import java.io.File;
import java.io.IOException;
import org.apache.jena.query.DatasetAccessor;
import org.apache.jena.query.DatasetAccessorFactory;
import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.query.QuerySolution;
import org.apache.jena.query.ResultSet;
import org.apache.jena.query.ResultSetFormatter;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.RDFNode;
import org.apache.jena.ontology.OntModel;
import org.mindswap.pellet.jena.PelletReasonerFactory;
import org.apache.jena.ontology.OntModelSpec;

class FusekiExample {

    public void addRDF(File rdf, String serviceURI) throws IOException {
        // the next commented line is the old working version...
        //Model m = ModelFactory.createDefaultModel();
        // these lines are the modified version which doesn't work:
        OntModelSpec oms = PelletReasonerFactory.THE_SPEC;
        OntModel m = ModelFactory.createOntologyModel(oms);
        ...
    }
}
It looks like your Pellet reasoner is very old and still uses the old Jena libraries (com.hp.hpl.jena) rather than the newest ones.
You need to find a newer version of your reasoner that works with current Jena, or you need to work with an older Jena version.
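For illustration only (an assumption on my part, not something from the question): Openllet is a Pellet fork built against the org.apache.jena packages. Assuming its openllet-jena Maven artifact and assuming it preserves the THE_SPEC constant under the renamed openllet.jena package, the failing lines would become:

// requires the com.github.galigator.openllet:openllet-jena dependency (hypothetical setup)
OntModelSpec oms = openllet.jena.PelletReasonerFactory.THE_SPEC;
OntModel m = ModelFactory.createOntologyModel(oms);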

NodeBuilder not found in the ElasticSearch API application

I am trying to implement the Elasticsearch API, but I have errors with the system accepting nodeBuilder. Here is the code:
import org.elasticsearch.action.index.IndexResponse;
import org.elasticsearch.action.update.UpdateRequest;
import org.elasticsearch.client.Client;
//import org.elasticsearch.common.settings.ImmutableSettings;
import org.elasticsearch.common.settings.*;
import org.elasticsearch.ElasticsearchException;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.action.search.SearchType;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.node.NodeBuilder.*;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.transport.InetSocketTransportAddress;
import org.elasticsearch.common.unit.TimeValue;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;
// on startup
Node node = nodeBuilder().node(); // nodeBuilder not recognised.
Client client = node.client();
// on shutdown
node.close();
<dependency>
    <groupId>org.elasticsearch</groupId>
    <artifactId>elasticsearch</artifactId>
    <version>2.2.0</version>
</dependency>
Client is recognised. Any ideas?
NodeBuilder is removed in ES API 5.
You can use org.elasticsearch.common.settings.Settings.builder() which will give you an instance of Builder.
Example:
Settings.Builder elasticsearchSettings =
        Settings.builder()
                .put("http.enabled", "true")
                .put("index.number_of_shards", "1")
                .put("path.data", new File(tmpDir, "data").getAbsolutePath())
                .put("path.logs", new File(tmpDir, "logs").getAbsolutePath())
                .put("path.work", new File(tmpDir, "work").getAbsolutePath())
                .put("path.home", tmpDir);
NodeBuilder has been removed. While using Node directly within an application is not officially supported, it can still be constructed with the Node(Settings) constructor.
Refer to https://www.elastic.co/guide/en/elasticsearch/reference/5.5/breaking_50_java_api_changes.html
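A minimal sketch of that ES 5 route, assuming you accept the unsupported status and that the required transport modules are on the classpath; the path.home value is a hypothetical placeholder:

import org.elasticsearch.client.Client;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.node.Node;

Settings settings = Settings.builder()
        .put("path.home", "/tmp/es-home") // hypothetical directory
        .build();
Node node = new Node(settings); // direct construction, not officially supported
node.start();                   // declares NodeValidationException in ES 5
Client client = node.client();
// ... use the client ...
node.close();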
org.elasticsearch.node.Node node = org.elasticsearch.node.NodeBuilder.nodeBuilder().node();
I have some other dependencies that also use a Node type, and hence the system could not resolve the Elasticsearch one. The full qualification resolved it.
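As an aside on the 2.2.0 code in the question: nodeBuilder() is a static method of NodeBuilder, so the plain import in the question cannot find it. Besides fully qualifying the call, a static import (note the static keyword) should also work:

import static org.elasticsearch.node.NodeBuilder.nodeBuilder;
import org.elasticsearch.node.Node;

Node node = nodeBuilder().node();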

ElasticSearch Lucene UnicodeUtil not found

I'm trying to work with Elasticsearch from Java:
import java.util.Date;
import java.util.HashMap;
import java.util.Map;
import org.elasticsearch.action.index.IndexResponse;
import org.elasticsearch.client.Client;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.transport.InetSocketTransportAddress;
public class EST {
    public static void main(String[] args) {
        Client client = new TransportClient()
                .addTransportAddress(new InetSocketTransportAddress("10.154.12.180", 9200));
        Map<String, Object> json = new HashMap<String, Object>();
        json.put("user", "kimchy");
        json.put("postDate", new Date());
        json.put("message", "trying out Elasticsearch");
        IndexResponse response = client.prepareIndex("twitter", "tweet")
                .setSource(json)
                .execute()
                .actionGet();
        client.close();
    }
}
and added the elasticsearch, lucene-core, lucene-queryparser, lucene-analyzers-common and lucene-demo libraries. After running, I'm getting this NoSuchMethodError:
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.lucene.util.UnicodeUtil.UTF16toUTF8(Ljava/lang/CharSequence;IILorg/apache/lucene/util/BytesRef;)V
at org.elasticsearch.common.Strings.toUTF8Bytes(Strings.java:1529)
at org.elasticsearch.common.Strings.toUTF8Bytes(Strings.java:1525)
at org.elasticsearch.search.facet.filter.InternalFilterFacet.<clinit>(InternalFilterFacet.java:40)
at org.elasticsearch.search.facet.TransportFacetModule.configure(TransportFacetModule.java:39)
at org.elasticsearch.common.inject.AbstractModule.configure(AbstractModule.java:60)
at org.elasticsearch.common.inject.spi.Elements$RecordingBinder.install(Elements.java:204)
at org.elasticsearch.common.inject.spi.Elements.getElements(Elements.java:85)
at org.elasticsearch.common.inject.InjectorShell$Builder.build(InjectorShell.java:130)
at org.elasticsearch.common.inject.InjectorBuilder.build(InjectorBuilder.java:99)
at org.elasticsearch.common.inject.Guice.createInjector(Guice.java:93)
at org.elasticsearch.common.inject.Guice.createInjector(Guice.java:70)
at org.elasticsearch.common.inject.ModulesBuilder.createInjector(ModulesBuilder.java:59)
at org.elasticsearch.client.transport.TransportClient.<init>(TransportClient.java:188)
at org.elasticsearch.client.transport.TransportClient.<init>(TransportClient.java:118)
at estest.EST.main(EST.java:17)
Coincidentally, I encountered this problem just now; while googling it, I found your question. Google is indeed amazingly fast at indexing, just 6 hours.
Here's how to fix it:
import lucene-core-4.9.0.jar (using Maven, Gradle, or by dropping it in your classpath; see the dependency sketch below)
The version you are using (probably 4.10) has a different method signature; ES, however, is linked against 4.9.
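A sketch of that pin, assuming a Maven build:

<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-core</artifactId>
    <version>4.9.0</version>
</dependency>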
To avoid compatibility problems between the Java client and ES, it's best to just use the jars delivered in the ES *.zip in the bin folder.

Drools 6.0 dynamically load rules at runtime

I want to load a drl file at runtime. The posts I've found, including this one, work for version 5.0, but I can't figure out how to do it for Drools version 6.0.
In Drools 6, your rules packages are deployed to Maven. A KieScanner is provided, which you can attach to your KieContainer. It polls your repository at a defined interval to see whether a package has been updated, and downloads the latest version if that's the case.
A full explanation of how to define a KieScanner (including code samples) is provided in the Drools documentation here:
https://docs.jboss.org/drools/release/latest/drools-docs/html/ch04.html
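A minimal sketch of wiring a KieScanner up with the kie-api classes; the org.acme:rules coordinates are hypothetical placeholders for your deployed rules artifact:

import org.kie.api.KieServices;
import org.kie.api.builder.KieScanner;
import org.kie.api.builder.ReleaseId;
import org.kie.api.runtime.KieContainer;
import org.kie.api.runtime.KieSession;

KieServices ks = KieServices.Factory.get();
// "LATEST" lets the scanner pick up newly deployed releases of the artifact
ReleaseId releaseId = ks.newReleaseId("org.acme", "rules", "LATEST");
KieContainer kContainer = ks.newKieContainer(releaseId);
KieScanner kScanner = ks.newKieScanner(kContainer);
kScanner.start(10000L); // poll the Maven repository every 10 seconds

KieSession kSession = kContainer.newKieSession();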
I used info taken from these two docs:
https://docs.jboss.org/drools/release/6.0.1.Final/drools-docs/html_single/#d0e109
https://github.com/droolsjbpm/drools/blob/master/drools-compiler/src/test/java/org/drools/compiler/CommonTestMethodBase.java
I've come up with this snippet, which loads the rules defined in the /drl/file/path file into the stateful session you obtain in the last line.
File path = new File("/drl/file/path");
KnowledgeBuilder kbuilder = KnowledgeBuilderFactory.newKnowledgeBuilder();
kbuilder.add(ResourceFactory.newFileResource(path), ResourceType.DRL);
if (kbuilder.hasErrors()) {
    throw new RuntimeException("Errors: " + kbuilder.getErrors());
}
KnowledgeBase kbase = KnowledgeBaseFactory.newKnowledgeBase();
kbase.addKnowledgePackages(kbuilder.getKnowledgePackages());
StatefulKnowledgeSession ksession = kbase.newStatefulKnowledgeSession();
Some methods are deprecated, so don't expect this solution to remain valid in the following releases.
Please double-check the imports; they are all from org.kie, not from Drools packages. I admit there are more imports than needed, but I'm pasting the code from an example I'm trying to develop, so I have more things in my code; sorry for that.
import java.io.File;
import org.kie.api.KieServices;
import org.kie.api.builder.KieBuilder;
import org.kie.api.builder.KieFileSystem;
import org.kie.api.builder.KieScanner;
import org.kie.api.builder.ReleaseId;
import org.kie.api.builder.model.KieBaseModel;
import org.kie.api.builder.model.KieModuleModel;
import org.kie.api.builder.model.KieSessionModel;
import org.kie.api.conf.EqualityBehaviorOption;
import org.kie.api.conf.EventProcessingOption;
import org.kie.api.io.Resource;
import org.kie.api.io.ResourceType;
import org.kie.api.runtime.KieContainer;
import org.kie.api.runtime.conf.ClockTypeOption;
import org.kie.internal.KnowledgeBase;
import org.kie.internal.KnowledgeBaseFactory;
import org.kie.internal.builder.KnowledgeBuilder;
import org.kie.internal.builder.KnowledgeBuilderFactory;
import org.kie.internal.io.ResourceFactory;
import org.kie.internal.runtime.StatefulKnowledgeSession;
Hope it helps.
