How to use a CustomScoreQuery in Lucene 7.x - java

I'm relatively new to Lucene and want to implement my own CustomScoreQuery for a university project.
I used the Lucene demo as my starting point to index all documents in a folder, and I want to score them using my own algorithm.
Here are the links to the source code of the demo.
https://lucene.apache.org/core/7_1_0/demo/src-html/org/apache/lucene/demo/IndexFiles.html
https://lucene.apache.org/core/7_1_0/demo/src-html/org/apache/lucene/demo/SearchFiles.html
I'm inspecting the index with Luke (the Lucene Toolbox Project), and it looks as expected. My problem occurs when accessing it.
package CustomModul;

import java.io.IOException;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.index.Terms;
import org.apache.lucene.queries.CustomScoreProvider;
import org.apache.lucene.queries.CustomScoreQuery;
import org.apache.lucene.search.Query;

public class CountingQuery extends CustomScoreQuery {

    public CountingQuery(Query subQuery) {
        super(subQuery);
    }

    public class CountingQueryScoreProvider extends CustomScoreProvider {

        private final String _field;

        public CountingQueryScoreProvider(String field, LeafReaderContext context) {
            super(context);
            _field = field;
        }

        @Override
        public float customScore(int doc, float subQueryScore, float[] valSrcScores) throws IOException {
            IndexReader r = context.reader();
            // getTermVector returns null here
            Terms vec = r.getTermVector(doc, _field);
            // *TO-DO* scoring algorithm
            return 1.0f;
        }
    }

    @Override
    protected CustomScoreProvider getCustomScoreProvider(LeafReaderContext context) throws IOException {
        return new CountingQueryScoreProvider("contents", context);
    }
}
In my customScore function I access the index as described in most tutorials. I should be able to access it via getTermVector, but it returns null.
In other posts I read that this could be caused by contents being a TextField, which is how the field is declared in the Lucene demo's IndexFiles.
After trying a lot of different approaches I came to the conclusion that I need help, and here I am.
My question now is: do I need to adjust the indexing process (and if so, how?), or is there another way to access the index in the ScoreProvider other than getTermVector?

I was able to solve the problem myself and want to share my solution in case someone finds this question looking for answers.
The problem was indeed caused by contents being a TextField in
https://lucene.apache.org/core/7_1_0/demo/src-html/org/apache/lucene/demo/IndexFiles.html
To solve it, one has to construct a custom Field, which I did by replacing line 193 in said IndexFiles with
FieldType myFieldType = new FieldType(TextField.TYPE_STORED);
myFieldType.setOmitNorms(true);
myFieldType.setIndexOptions(IndexOptions.DOCS_AND_FREQS);
myFieldType.setStored(false);
myFieldType.setStoreTermVectors(true);
myFieldType.setTokenized(true);
myFieldType.freeze();

Field myField = new Field("contents",
        new BufferedReader(new InputStreamReader(stream, StandardCharsets.UTF_8)),
        myFieldType);
doc.add(myField);
This allows the use of getTermVector in the customScore function. Hope this helps someone in the future.
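For completeness, here is a minimal sketch of how the custom query could be used for searching. The index path and the query term here are assumptions, not taken from the demo:

import java.nio.file.Paths;

import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.FSDirectory;

public class CountingQuerySearch {
    public static void main(String[] args) throws Exception {
        // "index" is an assumed path to the directory created by the demo's IndexFiles
        IndexReader reader = DirectoryReader.open(FSDirectory.open(Paths.get("index")));
        IndexSearcher searcher = new IndexSearcher(reader);

        // Wrap an ordinary query so customScore() is invoked for each matching document
        Query subQuery = new TermQuery(new Term("contents", "lucene"));
        Query query = new CountingQuery(subQuery);

        TopDocs hits = searcher.search(query, 10);
        System.out.println("Total hits: " + hits.totalHits);
        reader.close();
    }
}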

Related

Soap Webservice Client for JAVAFX Application

I am trying to call a web service from my application. If I call it in a sample project it works perfectly fine, but when I merge it into my JavaFX project I get many errors. The web service client is auto-generated using Eclipse, and I am only trying to call its methods. Can anyone help me?
Correction: I have edited the project and am now using JavaSE-15 and JavaFX SDK 11.0.2. The error is:
The package javax.xml.namespace is accessible from more than one module: java.xml, jaxrpc
Update 2: I have removed the java.xml dependencies and the module-info file as well, but the new error is:
Error: Could not find or load main class gload.Main
Caused by: java.lang.NoClassDefFoundError: javafx/application/Application
And if I keep the module-info file it shows:
Error occurred during initialization of boot layer
java.lang.module.FindException: Module javafx.graphics not found, required by gload
Model:
package gload.model;
import java.io.File;
import java.io.FileInputStream;
import java.io.FilenameFilter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.swing.JOptionPane;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;
import org.datacontract.schemas._2004._07.PE_PPER_MyPdmWebServiceClient_Data.CustomerItem;
import org.datacontract.schemas._2004._07.PE_PPER_MyPdmWebServiceClient_Data.Result;
import org.tempuri.IService;
import org.tempuri.ServiceLocator;
public class PdmData
{
public String scode;
public boolean state = false;
public static String CdfFile;
public static String pdflocation;
public static String Custom_Ci;
public static String Generic_Ci;
public static String Mp_ref;
public static String Interface;
public static String Comments;
public static String PersoAppli;
public static String Code;
public static String Revision;
public static String Customer_Name;
public static String Customer_reference;
public static String getCode() {
return Code;
}
public static void setCode(String code) {
Code = code;
}
public static String getRevision() {
return Revision;
}
public static void setRevision(String revision) {
Revision = revision;
}
public static String getCustomer_Name() {
return Customer_Name;
}
public static void setCustomer_Name(String customer_Name) {
Customer_Name = customer_Name;
}
public static String getCustomer_reference() {
return Customer_reference;
}
public static void setCustomer_reference(String customer_reference) {
Customer_reference = customer_reference;
}
public static String getPersoAppli() {
return PersoAppli;
}
public static void setPersoAppli(String persoAppli) {
PersoAppli = persoAppli;
}
public static String getGeneric_Ci() {
return Generic_Ci;
}
public static void setGeneric_Ci(String generic_Ci) {
Generic_Ci = generic_Ci;
}
public static String getCdfFile() {
return CdfFile;
}
public static void setCdfFile(String cdfFile) {
CdfFile = cdfFile;
}
public static String getPdflocation() {
return pdflocation;
}
public static void setPdflocation(String pdflocation) {
PdmData.pdflocation = pdflocation;
}
public String Cdffile(String reference) {
ServiceLocator locator = new ServiceLocator(); // --> web service locator and call
try {
IService basicHttpBinding_IService = locator.getBasicHttpBinding_IService();
Result result = basicHttpBinding_IService.getFilebyDcode(reference);
//To download the files
String link = result.getLocation();
System.out.println(link);
File out = new File("C:\\TempDownload\\" + reference +".zip"); //Creating a zip file to store the contents of download file
new Thread(new Download(link,out)).start();
//To Unzip the file
Path source = Paths.get("C:\\TempDownload\\" + reference +".zip");
Path target = Paths.get("C:\\TempDownload\\Unzip");
try {
unzipFolder(source, target);
System.out.println("Done");
} catch (IOException e) {
e.printStackTrace();
}
//Creating a File object for directory
File directoryPath = new File("C:\\TempDownload\\Unzip\\Pre Ppc" + reference + "A_Released");
//List of all files and directories
String[] contents = directoryPath.list();
System.out.println("List of files and directories in the specified directory:");
FilenameFilter pdffilter = new FilenameFilter() {
public boolean accept(File dir, String name) {
String lowercaseName = name.toLowerCase();
if (lowercaseName.endsWith(".pdf")) {
return true;
} else {
return false;
}
}
};
String[] contents1 = directoryPath.list(pdffilter);
for(String fileName : contents1) {
System.out.println(fileName);
setCdfFile(fileName);
setPdflocation(directoryPath.toString());
}
//To extract the Data From PDF
File file = new File(getPdflocation() + "\\" + getCdfFile());
//FileInputStream fis = new FileInputStream(file);
PDDocument document = PDDocument.load(file);
PDFTextStripper pdfReader = new PDFTextStripper();
String docText = pdfReader.getText(document);
System.out.println(docText);
document.close();
//To extract details from document
String CI_Ref = "CI Ref";
int pos ;
pos = docText.indexOf(CI_Ref);
setGeneric_Ci(docText.substring(pos+7 , pos+15));
System.out.println("Generic CI: " + getGeneric_Ci());
//To get Details of CI
CustomerItem customerItem = basicHttpBinding_IService.getCiDetails(getGeneric_Ci());
setPersoAppli(customerItem.getPersoAppli());
setCode(customerItem.getCode());
setRevision(customerItem.getRevision());
setCustomer_Name(customerItem.getCustomerName());
setCustomer_reference(customerItem.getCustomerReference());
}catch (Exception e) {
e.printStackTrace();
JOptionPane.showMessageDialog(null, "Unable to reach Service : " + e.getMessage());
}
return getPersoAppli();
}
Module info file
module gload {
    requires javafx.controls;
    requires javafx.fxml;
    requires java.desktop;
    requires java.rmi;
    requires java.base;
    requires axis;
    requires jaxrpc;
    requires org.apache.pdfbox;

    opens gload;
    opens gload.views.main;
    opens gload.utils;
    opens gload.model;
    opens gload.controllers;
    opens org.tempuri;
    opens org.datacontract.schemas._2004._07.PE_PPER_MyPdmWebServiceClient_Data;
}
And if I keep jaxrpc on the classpath instead of the module path, I get an error like this:
The type javax.xml.rpc.ServiceException cannot be resolved. It is indirectly referenced from required .class files
OK, this won't really be an answer, more pointers to related issues and potential approaches to come up with solutions. But I'll post it as an answer as it is likely better to do that than lots of comments.
Unfortunately, you have multiple errors and issues, so I'll try to deal with some of them separately.
According to:
Java FX Modular Application, Module not found (Java 11, Intellij)
The error:
Error occurred during initialization of boot layer
java.lang.module.FindException:
Module X not found, required by Y
can occur when --module-path is wrong and the module can't be found. Probably, that is at least one of your issues. The linked answer is for Idea and I don't use Eclipse, so I don't know how to resolve the issue in Eclipse, but perhaps you could do some research to find out.
Regarding:
The package javax.xml.namespace is accessible from more than one module
there is some info on what is going on here:
Eclipse is confused by imports ("accessible from more than one module").
This fix appears tricky to me. Please review the linked questions and solutions. It looks like you need to either
Forego Java 9+ modularity OR
Manage your dependencies to not include the violating transitive dependency OR
Change to a library that doesn't rely on the broken library (probably the preferred solution in this case).
The broken library causing this issue is likely the version of jaxrpc you are using. My guess is that some of the relevant XML libraries were only added to standard Java in Java 9, but the jaxrpc library you are using was developed prior to that. So, jaxrpc either includes the XML libraries in its classes or makes use of a transitive library that does the same. This causes a conflict because the XML libraries can only be included once in the project.
Further info on your issues is in this answer:
Eclipse can't find XML related classes after switching build path to JDK 10
The info there is fairly ugly; you could read the answer, but it may either help or discourage you.
Some things you could do to help resolve the situation
What should be done about this is kind of tricky and will depend on your skill level and how, or if, you can solve it. I'll offer up some advice on things you could do, but there are other options. You know your application better than I do, so you may be able to come up with better solutions for your application.
I'd advise separating these things out, just as a way of troubleshooting: get a project which works with all of the JavaFX components and one which works with all of the SOAP components, and make sure they build and do what you want. Then try to combine the two projects, either by integrating them into one project or by running them in separate VMs with communication between the two (e.g. via an added REST API, though that is a much more complicated solution, so think hard about it before attempting it).
Also, upgrade to the latest version of JavaFX. I don't think it will fix your issue, but it can't hurt and it is possible some refinements in recent JavaFX versions may have done some things which might help ease some of your issues (though not all of them, as some of your issues stem from jaxrpc usage in a modular project, which is unrelated to JavaFX).
Also, and probably more importantly, consider using a different SOAP client framework that interacts better with modular Java 9+ than the broken implementation that jaxrpc appears to have.
In terms of whether you should make your application modular or not (include a module-info or not), I don't really know the best approach for you. Certainly, whichever way you choose you will run into issues. But, the issues and how to resolve them will be different depending on the chosen solution path (as I guess you have already discovered during the course of your investigation for the question).
If necessary, isolate the issues down to single separate issues. If you need help resolving each separate issue, post new questions that feature minimal reproducible example code to replicate the issue. If you do so, make sure the code is absolutely minimal and also complete, so that it replicates and asks about only one issue (not a combination of more than one), and that the questions are appropriately tagged - e.g. if the question is about jaxrpc and modularity it should include jaxrpc and modular tags and no JavaFX code or tags (and vice versa), and certainly no PDF code or dependencies anywhere if that isn't part of the problem.

extracting all fields from a Lucene8 index

Given an index created with Lucene 8, but without knowledge of the fields used, how can I programmatically extract all the fields? (I'm aware that the Luke browser can be used interactively (thanks to @andrewjames); see Examples for using latest version of Lucene.) The scenario is that, during a development phase, I have to read indexes without prescribed schemas.
I'm using
IndexReader reader = DirectoryReader.open(FSDirectory.open(Paths.get(index)));
IndexSearcher searcher = new IndexSearcher(reader);
The reader has methods such as:
reader.getDocCount(field);
but this requires knowing the fields in advance.
I understand that documents in the index may be indexed with different fields; I'm quite prepared to iterate over all documents and extract the fields on a regular basis (these indexes are not huge).
I'm using Lucene 8.5.*, so posts and tutorials based on earlier Lucene versions may not work.
You can access basic field info as follows:
import java.io.IOException;
import java.nio.file.Paths;
import java.util.List;

import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexableField;
import org.apache.lucene.store.FSDirectory;

public class IndexDataExplorer {

    private static final String INDEX_PATH = "/path/to/index/directory";

    public static void doSearch() throws IOException {
        IndexReader reader = DirectoryReader.open(FSDirectory.open(Paths.get(INDEX_PATH)));
        for (int i = 0; i < reader.numDocs(); i++) {
            Document doc = reader.document(i);
            List<IndexableField> fields = doc.getFields();
            for (IndexableField field : fields) {
                // use these to get field-related data:
                //field.name();
                //field.fieldType().toString();
            }
        }
    }
}
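Note that fields which are indexed but not stored will not show up on the documents returned by reader.document(). As a hedged alternative (not part of the original answer), the per-segment FieldInfos can be used to list every field name known to the index:

import java.io.IOException;
import java.nio.file.Paths;

import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.FieldInfo;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.store.FSDirectory;

public class FieldNameLister {

    public static void listFieldNames(String indexPath) throws IOException {
        try (IndexReader reader = DirectoryReader.open(FSDirectory.open(Paths.get(indexPath)))) {
            // Each leaf (segment) carries its own field metadata,
            // so the same field name may be printed once per segment.
            for (LeafReaderContext leaf : reader.leaves()) {
                for (FieldInfo fieldInfo : leaf.reader().getFieldInfos()) {
                    System.out.println(fieldInfo.name);
                }
            }
        }
    }
}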

Intellij Completion Contributor

I am developing a plugin for IntelliJ and I want to add custom suggestions to the XML editor based on an XSD. Up to now I can get the required suggestions from the XSD file.
I have implemented a completion contributor for XML as follows:
import com.intellij.codeInsight.completion.*;
import com.intellij.codeInsight.lookup.LookupElementBuilder;
import com.intellij.patterns.PlatformPatterns;
import com.intellij.psi.xml.XmlElementType;
import com.intellij.util.ProcessingContext;
import com.intellij.lang.xml.*;
import org.jetbrains.annotations.NotNull;

public class SimpleCompletionContributor extends CompletionContributor {
    public SimpleCompletionContributor() {
        extend(CompletionType.BASIC,
               PlatformPatterns.psiElement(XmlElementType.XML_ATTRIBUTE_VALUE).withLanguage(XMLLanguage.INSTANCE),
               new CompletionProvider<CompletionParameters>() {
                   public void addCompletions(@NotNull CompletionParameters parameters,
                                              ProcessingContext context,
                                              @NotNull CompletionResultSet resultSet) {
                       resultSet.addElement(LookupElementBuilder.create("Hello"));
                   }
               }
        );
    }
}
But this does not provide any suggestions, whereas when I implement a custom language it works. My objective is to inspect the context of the cursor position and provide suggestions based on it; for example, when the user starts a tag in an XML file, the plugin should offer attributes as code completion. I'm new to this custom-language API.
So can anyone help me with this completion contributor?
Finally I found a way to solve this problem. Here is my code:
import com.intellij.codeInsight.completion.*;
import com.intellij.codeInsight.lookup.LookupElementBuilder;
import com.intellij.patterns.PlatformPatterns;
import com.intellij.util.ProcessingContext;
import org.jetbrains.annotations.NotNull;

public class ScalaXMLCompletionContributor extends CompletionContributor {
    public ScalaXMLCompletionContributor() {
        final RelativeNodes rlt = new RelativeNodes(); // helper class that reads siblings and children from a sample XML file generated from the given XSD
        /* if the position is an XML attribute, suggest attributes taken from the XSD */
        extend(CompletionType.BASIC,
               PlatformPatterns.psiElement(),
               new CompletionProvider<CompletionParameters>() {
                   public void addCompletions(@NotNull CompletionParameters parameters, // the completion parameters describe the cursor position
                                              ProcessingContext context,
                                              @NotNull CompletionResultSet resultSet) { // the result set collects the completions to suggest
                       if ("XmlAttribute".equals(parameters.getPosition().getContext().toString())) { // check whether the editor position is an XML attribute position, e.g. <name |
                           try {
                               // extract the tag text from the completion parameters and look up suggestions in RelativeNodes
                               String[] suggestions = rlt.getAttribute(parameters.getPosition().getParent().getParent().getFirstChild().getNextSibling().getText().replaceFirst("IntellijIdeaRulezzz", ""));
                               for (String suggestion : suggestions) {
                                   if (suggestion == null) {
                                       break; // the array is null-terminated; stop at the first empty slot
                                   }
                                   resultSet.addElement(LookupElementBuilder.create(suggestion)); // add the suggestion to the editor's completion list
                               }
                           } catch (NullPointerException e) {
                               // the PSI navigation above can fail; ignore and offer no suggestions
                           }
                       }
                   }
               }
        );
    }
}
In this case we can obtain the cursor position, and the tokens related to it, from the completion parameters, and we can inject suggestions through the completion result set. This can be implemented in Scala too.
To register the completion contributor in plugin.xml:
<extensions defaultExtensionNs="com.intellij">
<completion.contributor language="Scala" implementationClass="com.hsr.ScalaXMLCompletionContributor"/>
</extensions>
The JavaDoc for com.intellij.codeInsight.completion.CompletionContributor contains an FAQ.
The last question in it addresses debugging completion that is not working.
In my case the issue was language="Java" in the registration, whereas all caps (language="JAVA") is expected.
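As an aside, if the goal is to complete inside attribute values of ordinary XML files (rather than a custom language), a pattern built with XmlPatterns may be a better fit than matching the element type directly, since the completion position is a token inside the attribute value. A minimal sketch, not verified against the asker's setup; the suggestion text is a placeholder:

import com.intellij.codeInsight.completion.CompletionContributor;
import com.intellij.codeInsight.completion.CompletionParameters;
import com.intellij.codeInsight.completion.CompletionProvider;
import com.intellij.codeInsight.completion.CompletionResultSet;
import com.intellij.codeInsight.completion.CompletionType;
import com.intellij.codeInsight.lookup.LookupElementBuilder;
import com.intellij.patterns.XmlPatterns;
import com.intellij.util.ProcessingContext;
import org.jetbrains.annotations.NotNull;

public class XmlAttributeValueCompletionContributor extends CompletionContributor {
    public XmlAttributeValueCompletionContributor() {
        // Match the completion position anywhere inside an XML attribute value,
        // instead of requiring the position element itself to be the attribute value.
        extend(CompletionType.BASIC,
               XmlPatterns.psiElement().inside(XmlPatterns.xmlAttributeValue()),
               new CompletionProvider<CompletionParameters>() {
                   @Override
                   protected void addCompletions(@NotNull CompletionParameters parameters,
                                                 @NotNull ProcessingContext context,
                                                 @NotNull CompletionResultSet resultSet) {
                       resultSet.addElement(LookupElementBuilder.create("Hello"));
                   }
               });
    }
}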

Retrieve code executed by function in Java

I'm trying to analyse some bits of Java code, checking whether the code is written too complexly. I start with a String containing the contents of a Java class.
From there I want to retrieve, given a function name, the "inner code" of that function. In this example:
public class testClass{
    public int testFunction(char x) throws Exception{
        if(x=='a'){
            return 1;
        }else if(x=='{'){
            return 2;
        }else{
            return 3;
        }
    }

    public int testFunctionTwo(int y){
        return y;
    }
}
I want it so that when I call String code = getcode("testFunction");, code contains if(x=='a'){ ... return 3; }. I've made the input code extra ugly to demonstrate some of the problems one might encounter when doing character-by-character analysis (because of the else if, the curly brackets will no longer match; because of the exception thrown, the function declaration is not of the form functionName{ //contents }; etc.).
Is there a solid way to get the contents of testFunction, or should I handle all of the problems described manually?
You need a Java parser. I have worked with QDox; it is easy to use. Example here:
import java.io.File;
import java.io.IOException;

import com.thoughtworks.qdox.JavaProjectBuilder;
import com.thoughtworks.qdox.model.JavaClass;
import com.thoughtworks.qdox.model.JavaMethod;

public class Parser {
    public void parseFile() throws IOException {
        File file = new File("/path/to/testClass.java");
        JavaProjectBuilder builder = new JavaProjectBuilder();
        builder.addSource(file);
        for (JavaClass javaClass : builder.getClasses()) {
            if (javaClass.getName().equals("testClass")) {
                for (JavaMethod javaMethod : javaClass.getMethods()) {
                    if (javaMethod.getName().equals("testFunction")) {
                        // getSourceCode() returns the body of the method
                        System.out.println(javaMethod.getSourceCode());
                    }
                }
            }
        }
    }
}
Have you considered using a parser to read your code? There are a lot of parsers out there; the last time I worked on a problem like this, http://qdox.codehaus.org made short work of these kinds of problems.

Weka: Supervised Discretize issue and error during attribute selection "Not enough training instances"

I've been learning the Weka API on my own for the past month or so (I'm a student). What I am doing is writing a program that will filter a specific set of data and eventually build a Bayes net for it, and a week ago I finished my discretization class and attribute selection class. Just a few days ago I realized that I needed to change my discretization to supervised discretization and ended up using the default Fayyad & Irani method. After I did this I began to get this error in my attribute selection class:
Exception in thread "main" weka.core.WekaException:
weka.attributeSelection.CfsSubsetEval: Not enough training instances with class labels (required: 1, provided: 0)!
at weka.core.Capabilities.test(Capabilities.java:1138)
at weka.core.Capabilities.test(Capabilities.java:1023)
at weka.core.Capabilities.testWithFail(Capabilities.java:1302)
at weka.attributeSelection.CfsSubsetEval.buildEvaluator(CfsSubsetEval.java:331)
at weka.attributeSelection.AttributeSelection.SelectAttributes(AttributeSelection.java:597)
at weka.filters.supervised.attribute.AttributeSelection.batchFinished(AttributeSelection.java:456)
at weka.filters.Filter.useFilter(Filter.java:663)
at AttributeSelectionFilter.selectionFilter(AttributeSelectionFilter.java:29)
at Runner.main(Runner.java:70)
My attribute selection worked just fine before the change, so I think that I may have done something wrong in my discretize class. The other part of my question relates to that, because I also noticed that my discretize class does not appear to really be discretizing the data; it's just putting all the numeric data into ONE range, not binning it strategically like the Fayyad & Irani method should.
Here is my discretize class:
import weka.core.Instances;
import weka.filters.Filter;
import weka.filters.supervised.attribute.Discretize;
import weka.filters.unsupervised.attribute.NumericToNominal;

public class DiscretizeFilter
{
    private Instances data;
    private boolean sensitiveOption;
    private Filter filter = new Discretize();

    public DiscretizeFilter(Instances data, boolean sensitiveOption)
    {
        this.data = data;
        this.sensitiveOption = sensitiveOption;
    }

    public Instances discreteFilter() throws Exception
    {
        NumericToNominal nm = new NumericToNominal();
        nm.setInputFormat(data);
        Filter.useFilter(data, nm);
        Instances nominalData = nm.getOutputFormat();
        if(sensitiveOption)//if the user wants extra sensitivity
        {
            String options[] = new String[1];
            options[0] = options[0];
            options[2] = "-E";
            ((Discretize) filter).setOptions(options);
        }
        filter.setInputFormat(nominalData);
        Filter.useFilter(nominalData,filter);
        return filter.getOutputFormat();
    }
}
Here is my attribute selection class:
import weka.attributeSelection.BestFirst;
import weka.attributeSelection.CfsSubsetEval;
import weka.core.Instances;
import weka.filters.supervised.attribute.AttributeSelection;

public class AttributeSelectionFilter
{
    public Instances selectionFilter(Instances data) throws Exception
    {
        AttributeSelection filter = new AttributeSelection();
        for(int i = 0; i < data.numInstances(); i++)
        {
            filter.input(data.instance(i));
        }
        CfsSubsetEval eval = new CfsSubsetEval();
        BestFirst search = new BestFirst();
        filter.setSearch(search);
        filter.setEvaluator(eval);
        filter.setInputFormat(data);
        AttributeSelection.useFilter(data, filter);
        return filter.getOutputFormat();
    }

    public int attributeCounter(Instances data)
    {
        return data.numAttributes();
    }
}
Any help would be greatly appreciated!!!
Internally, Weka stores attribute values as doubles. It appears that an exception was thrown because every single instance in your dataset (data) is "missing a class", i.e. was given an internal class attribute value of NaN ("not a number") for whatever reason. I would recommend double-checking whether data's class attribute was created/set correctly.
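As an illustration, a sketch of setting the class attribute right after loading the data; it assumes the class is the last attribute, which may not be true for your dataset:

import java.io.BufferedReader;
import java.io.FileReader;

import weka.core.Instances;

public class LoadData {
    public static Instances load(String arffPath) throws Exception {
        Instances data = new Instances(new BufferedReader(new FileReader(arffPath)));
        // Tell Weka which attribute is the class; supervised filters and
        // evaluators like CfsSubsetEval need this to be set.
        data.setClassIndex(data.numAttributes() - 1);
        return data;
    }
}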
I figured it out; my mistake was misunderstanding the description of the method "outputFormat()" in the Discretize class. I instead took the filtered instances from the return value of useFilter(), and that solved my problems! I was just giving the attribute selection filter the wrong kind of data.
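A minimal sketch of that corrected pattern (assuming the class index on data has already been set):

import weka.attributeSelection.BestFirst;
import weka.attributeSelection.CfsSubsetEval;
import weka.core.Instances;
import weka.filters.Filter;
import weka.filters.supervised.attribute.AttributeSelection;
import weka.filters.supervised.attribute.Discretize;

public class SupervisedPipeline {
    public static Instances run(Instances data) throws Exception {
        // Supervised discretization: keep the Instances returned by useFilter(),
        // not the (empty) output format of the filter.
        Discretize discretize = new Discretize();
        discretize.setInputFormat(data);
        Instances discretized = Filter.useFilter(data, discretize);

        // Attribute selection on the discretized data, again keeping the returned Instances.
        AttributeSelection selection = new AttributeSelection();
        selection.setEvaluator(new CfsSubsetEval());
        selection.setSearch(new BestFirst());
        selection.setInputFormat(discretized);
        return Filter.useFilter(discretized, selection);
    }
}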
