I'm trying to figure out how to embed Terrier indexing and retrieval in my app, but I can't even get the documentation demo to run properly (https://github.com/terrier-org/terrier-core/blob/5.x/doc/quickstart-integratedsearch.md).
I know there have been some major updates recently, but it feels like a mess to me.
Can anyone help?
This is the most recent example I could find:
import java.io.File;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Iterator;
import org.terrier.indexing.Document;
import org.terrier.indexing.TaggedDocument;
import org.terrier.indexing.tokenisation.Tokeniser;
import org.terrier.querying.LocalManager;
import org.terrier.querying.Manager;
import org.terrier.querying.ManagerFactory;
import org.terrier.querying.ScoredDoc;
import org.terrier.querying.ScoredDocList;
import org.terrier.querying.SearchRequest;
import org.terrier.realtime.memory.MemoryIndex;
import org.terrier.utility.ApplicationSetup;
import org.terrier.utility.Files;
public class IndexingAndRetrievalExample {
public static void main(String[] args) throws Exception {
// Directory containing files to index
String aDirectoryToIndex = "/my/directory/containing/files/";
// Configure Terrier
ApplicationSetup.setProperty("indexer.meta.forward.keys", "docno");
ApplicationSetup.setProperty("indexer.meta.forward.keylens", "30");
// Create a new Index
MemoryIndex memIndex = new MemoryIndex();
// For each file
for (String filename : new File(aDirectoryToIndex).list() ) {
String fullPath = aDirectoryToIndex+filename;
// Convert it to a Terrier Document
Document document = new TaggedDocument(Files.openFileReader(fullPath), new HashMap(), Tokeniser.getTokeniser());
// Add a meaningful identifier
document.getAllProperties().put("docno", filename);
// index it
memIndex.indexDocument(document);
}
// Set up the querying process
ApplicationSetup.setProperty("querying.processes", "terrierql:TerrierQLParser,"
+ "parsecontrols:TerrierQLToControls,"
+ "parseql:TerrierQLToMatchingQueryTerms,"
+ "matchopql:MatchingOpQLParser,"
+ "applypipeline:ApplyTermPipeline,"
+ "localmatching:LocalManager$ApplyLocalMatching,"
+ "filters:LocalManager$PostFilterProcess");
// Enable the decorate enhancement
ApplicationSetup.setProperty("querying.postfilters", "org.terrier.querying.SimpleDecorate");
// Create a new manager to run queries
Manager queryingManager = ManagerFactory.from(memIndex.getIndexRef());
// Create a search request
SearchRequest srq = queryingManager.newSearchRequestFromQuery("search for document");
// Specify the model to use when searching
srq.setControl(SearchRequest.CONTROL_WMODEL, "BM25");
// Enable querying processes
srq.setControl("terrierql", "on");
srq.setControl("parsecontrols", "on");
srq.setControl("parseql", "on");
srq.setControl("applypipeline", "on");
srq.setControl("localmatching", "on");
srq.setControl("filters", "on");
// Enable post filters
srq.setControl("decorate", "on");
// Run the search
queryingManager.runSearchRequest(srq);
// Get the result set
ScoredDocList results = srq.getResults();
// Print the results
System.out.println("The top "+results.size()+" of documents were returned");
System.out.println("Document Ranking");
int i = 0; // rank counter
for(ScoredDoc doc : results) {
int docid = doc.getDocid();
double score = doc.getScore();
String docno = doc.getMetadata("docno");
System.out.println(" Rank "+i+": "+docid+" "+docno+" "+score);
i++;
}
}
}
The indexing part seems to be okay.
These lines in the retrieval part seem problematic:
Manager queryingManager = ManagerFactory.from(memIndex.getIndexRef());
cursor message: Cannot resolve method 'getIndexRef' in 'MemoryIndex'
srq.setControl(SearchRequest.CONTROL_WMODEL, "BM25");
cursor message: Cannot resolve symbol 'CONTROL_WMODEL'
ScoredDocList results = srq.getResults();
cursor message: Cannot resolve method 'getResults' in 'SearchRequest'
I think the problem is that there are new ways to do this and some methods are now deprecated.
Could anyone try this code and see if it works?
It is a Maven project.
These are the dependencies:
<dependencies>
<dependency>
<groupId>org.terrier</groupId>
<artifactId>terrier-core</artifactId>
<version>5.5</version>
</dependency>
<dependency>
<groupId>org.terrier</groupId>
<artifactId>terrier-core</artifactId>
<version>5.4</version>
</dependency>
<dependency>
<groupId>org.terrier</groupId>
<artifactId>terrier-core</artifactId>
<version>5.1</version>
</dependency>
<dependency>
<groupId>org.terrier</groupId>
<artifactId>terrier-realtime</artifactId>
<version>5.1</version>
</dependency>
<dependency>
<groupId>org.terrier</groupId>
<artifactId>terrier-core</artifactId>
<version>4.4</version>
</dependency>
<dependency>
<groupId>org.terrier</groupId>
<artifactId>terrier-core</artifactId>
<version>4.2</version>
</dependency>
<dependency>
<groupId>org.terrier</groupId>
<artifactId>terrier-batch-indexers</artifactId>
<version>5.4</version>
</dependency>
<dependency>
<groupId>org.terrier</groupId>
<artifactId>terrier-batch-retrieval</artifactId>
<version>5.4</version>
</dependency>
<dependency>
<groupId>org.terrier</groupId>
<artifactId>terrier-index-api</artifactId>
<version>5.5</version>
</dependency>
</dependencies>
I'm trying to run Selenium code with @Test, but Eclipse won't run it and asks for a main method.
I've created a Maven project and added the Selenium and TestNG dependencies to pom.xml.
Please help with the issue I'm facing.
Sample code:
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.Properties;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;
import org.testng.annotations.Test;
public class BaseTest {
@Test
public void openBrowser() {
File file = new File("./src/test/resources/config.properties");
FileInputStream fileInput = null;
try {
fileInput = new FileInputStream(file);
} catch (FileNotFoundException e) {
e.printStackTrace();
System.out.println("File not found");
}
Properties prop = new Properties();
//load properties file
try {
prop.load(fileInput);
} catch (IOException e) {
e.printStackTrace();
System.out.println("Values not found");
}
System.setProperty("webdriver.chrome.driver","./src/test/resources/drivers/chromedriver.exe");
WebDriver driver = new ChromeDriver();
driver.get(prop.getProperty("baseURL"));
}
}
When I run it, I get:
Error: Main method not found in class com.digivalsolutions.digiassess.LoginTest, please define the main method as:
public static void main(String[] args)
or a JavaFX application class must extend javafx.application.Application
My pom.xml file:
<dependencies>
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>3.8.1</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.seleniumhq.selenium</groupId>
<artifactId>selenium-java</artifactId>
<version>3.141.59</version>
</dependency>
<dependency>
<groupId>org.testng</groupId>
<artifactId>testng</artifactId>
<version>7.3.0</version>
<scope>test</scope>
</dependency>
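I think you should install the TestNG plugin for your IDE. Without it, Eclipse treats the class as a plain Java application and looks for a main method instead of running the @Test methods. You can check this link for more details.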
I am using GeoTools and the JTS class BufferOp to create a buffer around my geometries. During testing I came across a weird result with point geometries: if I set the cap style to flat, my result is always an empty polygon.
Lines and polygons work. Only points seem to have this issue.
If I change it to the round or square parameter, I get the expected result.
I am using the GeoTools 21 snapshot with Maven and Java 8.
Here is the Maven pom file snippet I've been using and the code example:
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<maven.compiler.source>1.7</maven.compiler.source>
<maven.compiler.target>1.7</maven.compiler.target>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<geotools.version>21-SNAPSHOT</geotools.version>
</properties>
<dependencies>
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>4.11</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.geotools</groupId>
<artifactId>gt-geometry</artifactId>
<version>${geotools.version}</version>
</dependency>
<dependency>
<groupId>org.geotools</groupId>
<artifactId>gt-epsg-hsql</artifactId>
<version>${geotools.version}</version>
</dependency>
<dependency>
<groupId>org.geotools</groupId>
<artifactId>gt-epsg-wkt</artifactId>
<version>${geotools.version}</version>
</dependency>
import org.geotools.geometry.jts.JTSFactoryFinder;
import org.geotools.referencing.CRS;
import org.geotools.util.factory.Hints;
import org.locationtech.jts.geom.*;
import org.locationtech.jts.io.ParseException;
import org.locationtech.jts.io.WKTReader;
import org.locationtech.jts.operation.buffer.BufferOp;
import org.locationtech.jts.operation.buffer.BufferParameters;
import org.opengis.referencing.FactoryException;
import org.opengis.referencing.crs.CoordinateReferenceSystem;
import org.opengis.referencing.operation.TransformException;
import java.io.IOException;
public class App
{
public static void main( String[] args ) throws ParseException, IOException, FactoryException, TransformException {
Integer epsg= 32632;
String wkt = "POINT (5293201.002716452 1208988.4067087262)";
//setup geometry point in utm coordinates (meter)
// create geometry
CoordinateReferenceSystem crs = CRS.decode(("EPSG:"+ epsg.toString()));
Hints hints = new Hints(Hints.CRS, crs);
GeometryFactory geometryFactoryWKT = JTSFactoryFinder.getGeometryFactory(hints);
WKTReader wktReader = new WKTReader(geometryFactoryWKT);
Geometry geom = wktReader.read(wkt);
geom.setSRID(epsg);
// creates BufferParameters
BufferParameters bufferParam = new BufferParameters();
bufferParam.setEndCapStyle(BufferParameters.CAP_FLAT);
// if using any other parameter result is as expected
// bufferParam.setEndCapStyle(BufferParameters.CAP_ROUND);
bufferParam.setJoinStyle(BufferParameters.JOIN_BEVEL );
bufferParam.setMitreLimit(5);
bufferParam.setSimplifyFactor(0.01);
bufferParam.setQuadrantSegments(8);
// creates buffer geom on point with 10m distance and use set bufferParameters
Geometry bufferGeom = BufferOp.bufferOp(geom ,10, bufferParam);
System.out.println(bufferGeom);
}
}
Does anyone know why?
Having had a look at the code it seems to come down to how OffsetCurveBuilder handles the end caps. It seems to (quite reasonably in my opinion) not calculate anything for flat end caps and since a point doesn't generate anything except ends you get nothing for a flat end cap.
private void computePointCurve(Coordinate pt, OffsetSegmentGenerator segGen) {
switch (bufParams.getEndCapStyle()) {
case BufferParameters.CAP_ROUND:
segGen.createCircle(pt);
break;
case BufferParameters.CAP_SQUARE:
segGen.createSquare(pt);
break;
// otherwise curve is empty (e.g. for a butt cap);
}
}
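One possible workaround, sketched under assumptions: decide the cap style per geometry type before calling BufferOp, and fall back to a square cap when a flat cap is requested for a point. The CapStyleFallback class, its nested CapStyle enum and the effectiveCap helper are hypothetical names used for illustration, not part of JTS; the caller would still translate the result to the matching BufferParameters.CAP_* constant.

```java
final class CapStyleFallback {
    /** Hypothetical stand-in for the three JTS end cap styles (not a JTS type). */
    enum CapStyle { ROUND, FLAT, SQUARE }

    private CapStyleFallback() {}

    /**
     * Returns the cap style to use for a buffer call.
     * A point has no line segments, only "ends", so a flat cap would give an
     * empty result; in that case this sketch substitutes a square cap.
     */
    static CapStyle effectiveCap(boolean isPoint, CapStyle requested) {
        if (isPoint && requested == CapStyle.FLAT) {
            return CapStyle.SQUARE;
        }
        return requested;
    }

    public static void main(String[] args) {
        // A point with a flat cap falls back to a square cap ...
        System.out.println(effectiveCap(true, CapStyle.FLAT));
        // ... while lines and polygons keep the requested style.
        System.out.println(effectiveCap(false, CapStyle.FLAT));
    }
}
```

In the question's code, the first argument could be `geom instanceof Point`, and the returned value decides which constant is passed to `bufferParam.setEndCapStyle(...)`. Whether a square buffer is an acceptable substitute for a "flat" point buffer depends on the use case.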
I am using iText PDF 5.5.11 to convert PDF to XML. I have already checked similar answers on Stack Overflow. I get the error below when I run the jar file from the command line on Ubuntu (java version "1.8.0_101"):
Exception in thread "main" java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.eclipse.jdt.internal.jarinjarloader.JarRsrcLoader.main(JarRsrcLoader.java:58)
Caused by: java.lang.NoClassDefFoundError: org/bouncycastle/asn1/ASN1Encodable
at com.itextpdf.text.pdf.PdfEncryption.<init>(PdfEncryption.java:147)
at com.itextpdf.text.pdf.PdfReader.readDecryptedDocObj(PdfReader.java:1063)
at com.itextpdf.text.pdf.PdfReader.readDocObj(PdfReader.java:1469)
at com.itextpdf.text.pdf.PdfReader.readPdf(PdfReader.java:751)
at com.itextpdf.text.pdf.PdfReader.<init>(PdfReader.java:198)
at com.itextpdf.text.pdf.PdfReader.<init>(PdfReader.java:236)
at com.itextpdf.text.pdf.PdfReader.<init>(PdfReader.java:224)
at com.itextpdf.text.pdf.PdfReader.<init>(PdfReader.java:214)
at test.pdfreader.readXml(pdfreader.java:34)
at test.pdfreader.main(pdfreader.java:30)
I am not very familiar with Java. I call this jar file from PHP using PHP's exec function.
Below is the code I use to convert PDF to XML.
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.pdf.AcroFields;
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.XfaForm;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
public class pdfreader {
public static void main(String[] args) throws IOException, DocumentException, TransformerException {
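if (args.length < 2) {
System.err.println("Usage: pdfreader <src.pdf> <dest.xml>");
return;
}
// first argument: source PDF, second argument: destination XML
String SRC = args[0];
String DEST = args[1];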
File file = new File(DEST);
file.getParentFile().mkdirs();
new pdfreader().readXml(SRC, DEST);
}
public void readXml(String src, String dest) throws IOException, DocumentException, TransformerException {
PdfReader reader = new PdfReader(src);
AcroFields form = reader.getAcroFields();
XfaForm xfa = form.getXfa();
Node node = xfa.getDatasetsNode();
NodeList list = node.getChildNodes();
for (int i = 0; i < list.getLength(); ++i) {
if ("data".equals(list.item(i).getLocalName())) {
node = list.item(i);
break;
}
}
list = node.getChildNodes();
Transformer tf = TransformerFactory.newInstance().newTransformer();
tf.setOutputProperty("encoding", "UTF-8");
tf.setOutputProperty("indent", "yes");
FileOutputStream os = new FileOutputStream(dest);
tf.transform(new DOMSource(node), new StreamResult(os));
reader.close();
}
}
When you use Maven for your Java project, all you need to do is add a dependency on iText. Maven then takes care of the transitive dependencies (optional ones such as BouncyCastle still have to be declared explicitly). Maven takes away most of the heavy lifting.
The same principle applies to other build systems such as Gradle.
Now, if you want to do it all manually and put the correct jars on your classpath, then you need to do some homework. This means looking at the pom.xml of each and every one of your dependencies, seeing which transitive dependencies they have, which dependencies those dependencies have, and so on ad nauseam.
In case of iText, you take a look at the pom.xml that you can find on Maven Central: https://search.maven.org/#artifactdetails%7Ccom.itextpdf%7Citextpdf%7C5.5.11%7Cjar
In particular this part:
<dependencies>
<dependency>
<groupId>org.bouncycastle</groupId>
<artifactId>bcprov-jdk15on</artifactId>
<version>1.49</version>
<optional>true</optional>
</dependency>
<dependency>
<groupId>org.bouncycastle</groupId>
<artifactId>bcpkix-jdk15on</artifactId>
<version>1.49</version>
<optional>true</optional>
</dependency>
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>4.8.2</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.apache.santuario</groupId>
<artifactId>xmlsec</artifactId>
<version>1.5.1</version>
<optional>true</optional>
</dependency>
</dependencies>
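This tells you that iText 5.5.11 has an optional dependency on BouncyCastle 1.49. Optional dependencies are not resolved transitively, so when your code reaches a feature that needs BouncyCastle (here, PdfEncryption while reading an encrypted PDF), you have to add bcprov-jdk15on 1.49 yourself.
BouncyCastle has a reputation for changing and breaking its API even in minor updates, which is why you need to be very precise about your BouncyCastle version.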
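To fail fast with a readable message instead of a NoClassDefFoundError deep inside PdfReader, a startup check along these lines may help. It is only a sketch: ClasspathCheck and isPresent are illustrative names, and the class name checked is the one taken from the stack trace in the question.

```java
final class ClasspathCheck {
    private ClasspathCheck() {}

    /** Returns true if the named class can be located by this class's class loader. */
    static boolean isPresent(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Class reported as missing in the question's stack trace
        String required = "org.bouncycastle.asn1.ASN1Encodable";
        if (!isPresent(required)) {
            System.err.println("Missing class " + required
                    + ": add the BouncyCastle provider jar (bcprov) to the classpath.");
        }
    }
}
```

Calling such a check at the top of main makes the missing jar obvious to whoever runs the program, for example from PHP's exec, where a long Java stack trace is easy to overlook.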
Hi, in the zookeeper.service file just change Environment="KAFKA_ARGS=-javaagent:/home/ec2-user/prometheus/jmx_prometheus_javaagent-0.3.1.jar=8080:/home/ec2-user/prometheus/kafka-0-8-2.yml" to the line below, and the issue is resolved:
Environment="KAFKA_OPTS=-javaagent:/home/ec2-user/prometheus/jmx_prometheus_javaagent-0.3.1.jar=8080:/home/ec2-user/prometheus/zookeeper.yml"
I'm trying to get the metadata values from an Office document, but the only key-value pair it shows is this one:
Content-Type: application/zip
I just can't tell what the issue is. Why does it only show the Content-Type?
What I'm interested in are keys like title.
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.sax.BodyContentHandler;
import org.xml.sax.SAXException;
public class App
{
private static final String PATH = "C:/docs/myDocument.docx";
public static void main( String[] args ) throws IOException, SAXException, TikaException
{
Metadata metadata = new Metadata();
AutoDetectParser parser = new AutoDetectParser();
InputStream fileStream = new FileInputStream(PATH);
BodyContentHandler handler = new BodyContentHandler();
parser.parse(fileStream, handler, metadata);
String[] metadataNames = metadata.names();
for (String key : metadataNames) {
String value = metadata.get(key);
System.out.println(key + ": " + value);
}
}
}
Promoting a comment to an answer - you appear to be missing some key Apache Tika jars or their dependencies.
If you're using Maven, then your pom (as of January 2015) should have something like:
<properties>
<tika.version>1.7</tika.version>
</properties>
<dependencies>
<dependency>
<groupId>org.apache.tika</groupId>
<artifactId>tika-core</artifactId>
<version>${tika.version}</version>
</dependency>
<dependency>
<groupId>org.apache.tika</groupId>
<artifactId>tika-parsers</artifactId>
<version>${tika.version}</version>
</dependency>
</dependencies>
The tika-core artifact gives you everything you need to run Tika and develop your own parsers, but it doesn't come with any parsers. It's the tika-parsers artifact (+dependencies!) which provides all the built-in Tika parsers, which you need to process files like yours.
I have a Microsoft Excel worksheet with some data in it of the form:
String-String-Integer, and I want to be able to choose an item and then print its info to a PDF in this format:
Name: String
Date: String
ID: int
Is it possible? If yes, how? If not, are there Java libraries that would allow me to do that?
Okay, I would like to give you a small example that finds a String value in an XLS document:
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.logging.Level;
import java.util.logging.Logger;
import org.apache.poi.hssf.usermodel.HSSFRow;
import org.apache.poi.hssf.usermodel.HSSFSheet;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.ss.usermodel.Cell;
/**
*
* @author Patrick Ott <Patrick.Ott@professional-webworkx.de>
* @version 1.0
*/
public class App {
public static final String USER_HOME = System.getProperty("user.home");
public static final void main(String[] args) {
FileInputStream fis = null;
try {
// change the path, where your xls file is located
fis = new FileInputStream(new File(USER_HOME + "/test/workbook.xls"));
// create a new Workbook
HSSFWorkbook workbook = new HSSFWorkbook(fis);
// get the first sheet; if you have more, use index 1, 2, ...
HSSFSheet sheet = workbook.getSheetAt(0);
// getLastRowNum() is the 0-based index of the last row, so it must be included
int lastRowNum = sheet.getLastRowNum();
// go through all rows
for (int i = 0; i <= lastRowNum; i++) {
if (sheet.getRow(i) != null) {
HSSFRow row = sheet.getRow(i);
// get the last cell of the current row
short lastCellNum = row.getLastCellNum();
for(int j = 0; j < lastCellNum; j++) {
// if the current column is != null and contains a String and
// equals("Espresso") (i added this for test in my example) then
// print "Found", here you can start to create the PDF
if(row.getCell(j) != null && row.getCell(j).getCellType() == Cell.CELL_TYPE_STRING && row.getCell(j).getStringCellValue().equalsIgnoreCase("Espresso")) {
System.out.println("Found");
}
}
}
}
} catch (FileNotFoundException ex) {
Logger.getLogger(App.class.getName()).log(Level.SEVERE, null, ex);
} catch (IOException ex) {
Logger.getLogger(App.class.getName()).log(Level.SEVERE, null, ex);
}
}
}
If you want to work with Maven, then here is the pom.xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>de.professional_webworkx.apachepoi</groupId>
<artifactId>ApachePOIDemo</artifactId>
<version>1.0-SNAPSHOT</version>
<packaging>jar</packaging>
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<maven.compiler.source>1.7</maven.compiler.source>
<maven.compiler.target>1.7</maven.compiler.target>
</properties>
<dependencies>
<dependency>
<groupId>org.apache.poi</groupId>
<artifactId>poi</artifactId>
<version>3.10-FINAL</version>
</dependency>
<dependency>
<groupId>org.apache.poi</groupId>
<artifactId>poi-ooxml</artifactId>
<version>3.10-FINAL</version>
</dependency>
<dependency>
<groupId>org.apache.poi</groupId>
<artifactId>poi-contrib</artifactId>
<version>3.5-FINAL</version>
</dependency>
<dependency>
<groupId>org.apache.poi</groupId>
<artifactId>poi-excelant</artifactId>
<version>3.10-FINAL</version>
</dependency>
<dependency>
<groupId>org.apache.poi</groupId>
<artifactId>poi-scratchpad</artifactId>
<version>3.10-FINAL</version>
</dependency>
<dependency>
<groupId>commons-logging</groupId>
<artifactId>commons-logging</artifactId>
<version>1.1.3</version>
</dependency>
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>4.11</version>
</dependency>
</dependencies>
</project>
Otherwise you have to add the Apache POI JARs to your project manually.
For creating PDF files with Java, you can have a look at iText or PDFBox. Both libraries are also available as Maven dependencies, which you can add to your pom.xml.
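Whichever PDF library you pick, the text for one row can be prepared first. The sketch below formats the three values from the question (name, date, ID) into the requested layout using only the JDK. RecordFormatter and formatRecord are hypothetical names, the sample values are made up, and writing the resulting string into a PDF is left to iText or PDFBox.

```java
final class RecordFormatter {
    private RecordFormatter() {}

    /** Builds the three-line "Name / Date / ID" block for one spreadsheet row. */
    static String formatRecord(String name, String date, int id) {
        return "Name: " + name + "\n"
             + "Date: " + date + "\n"
             + "ID: " + id;
    }

    public static void main(String[] args) {
        // Example values only; in the real app they would come from the matched row.
        System.out.println(formatRecord("Espresso", "2014-05-01", 42));
    }
}
```

Keeping the formatting separate from the PDF code makes it easy to check the output on the console before any PDF library is involved.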
I hope this helps you get started with your own app. If you run into specific problems, ask again, but first make sure the question hasn't already been answered.
Patrick