WordnetSynonymParser in Lucene


I am new to Lucene and I'm trying to use WordnetSynonymParser to expand queries using the WordNet synonyms Prolog file. Here is what I have so far:
public class CustomAnalyzer extends Analyzer {
    @Override
    protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
        // TODO Auto-generated method stub
        Tokenizer source = new ClassicTokenizer(Version.LUCENE_47, reader);
        TokenStream filter = new StandardFilter(Version.LUCENE_47, source);
        filter = new LowerCaseFilter(Version.LUCENE_47, filter);
        SynonymMap mySynonymMap = null;
        try {
            mySynonymMap = buildSynonym();
        } catch (IOException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
        filter = new SynonymFilter(filter, mySynonymMap, false);
        return new TokenStreamComponents(source, filter);
    }

    private SynonymMap buildSynonym() throws IOException
    {
        File file = new File("wn/wn_s.pl");
        InputStream stream = new FileInputStream(file);
        Reader rulesReader = new InputStreamReader(stream);
        SynonymMap.Builder parser = null;
        parser = new WordnetSynonymParser(true, true, new StandardAnalyzer(Version.LUCENE_47));
        ((WordnetSynonymParser) parser).add(rulesReader);
        SynonymMap synonymMap = parser.build();
        return synonymMap;
    }
}
I get the error "The method add(CharsRef, CharsRef, boolean) in the type SynonymMap.Builder is not applicable for the arguments (Reader)"
However, the documentation of WordnetSynonymParser expects a Reader argument for the add function.
What am I doing wrong here?
Any help is appreciated.

If the documentation you are reading says that WordnetSynonymParser has a method add(Reader), you are probably looking at documentation for an older version. That method is not in the source code for 4.7. As of version 4.6.0, the method you are looking for is WordnetSynonymParser.parse(Reader).

Related

Delete existing file and create/write data into same file using TransformerFactory within a loop

I want to transform the Input.xml file with an XSL stylesheet and overwrite the Output.xml file in a loop, so that I end up with the final Output.xml file.
I am able to do this, but instead of overwriting Output.xml, each iteration of the loop appends its data to the same file.
To resolve this, I tried to delete the existing Output.xml file inside the loop, just before transforming the input with the XSL, but I get this error:
java.nio.file.FileSystemException: Output.xml: The process cannot access the file because it is being used by another process.
So I am able neither to delete the existing file nor to overwrite it.
Can anyone help with this?
I believe resolving either one of these issues would be enough.
Thanks
public static void main(String[] args) throws SAXException, IOException, ParserConfigurationException {
    // TODO Auto-generated method stub
    try {
        Double currentValue = 1.0;
        String inputXMLPath = "C:/MySystem/Input.xml";
        String outputXMLPath = "C:/MySystem/Output.xml";
        StreamSource inputStream = new StreamSource(inputXMLPath);
        FileOutputStream opStream = new FileOutputStream(new File(outputXMLPath));
        while (currentValue != 7.0) {
            String xslPath = "C:/MySystem/input.xsl";
            Path path = FileSystems.getDefault().getPath(outputXMLPath);
            Files.delete(path);
            performTransformation(xslPath, inputStream, opStream, outputXMLPath);
            StreamSource secondStream = new StreamSource(outputXMLPath);
            inputStream = secondStream;
            currentValue++;
            opStream.flush();
        }
    } catch (Exception e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }
}

public static void performTransformation(String xslPath, StreamSource inputStream, FileOutputStream opStream,
        String outputXMLPath) throws Exception {
    TransformerFactory tFactory = TransformerFactory.newInstance();
    Transformer transformer = null;
    transformer = tFactory.newTransformer(new StreamSource(xslPath));
    transformer.transform(inputStream, new StreamResult(opStream));
    opStream.flush();
}

You most likely can't edit or delete this file because a stream on it is still open, which is why it is reported as in use. If you close the stream, then delete the XML file, and then create a new one, it should work.
This should fit your use case, assuming the file always has the same name.
The Files.deleteIfExists(Path) method is also useful: it deletes the file but does not throw an exception if the file does not exist.
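The close-then-delete sequence could look roughly like the sketch below. The class name DeleteThenRewriteSketch and the helper rewrite are hypothetical, and a temporary file stands in for Output.xml. The point is only that the stream is closed by try-with-resources before the next Files.deleteIfExists call runs.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class DeleteThenRewriteSketch {

    // Hypothetical helper: removes any previous file, then writes fresh content.
    static void rewrite(Path target, String content) throws IOException {
        // Returns false instead of throwing when the file is absent.
        Files.deleteIfExists(target);
        // try-with-resources closes the stream even if write() throws.
        try (OutputStream out = Files.newOutputStream(target)) {
            out.write(content.getBytes(StandardCharsets.UTF_8));
        }
    }

    public static void main(String[] args) throws IOException {
        // A temp file stands in for Output.xml (an assumption for this sketch).
        Path target = Files.createTempFile("output", ".xml");
        for (int i = 1; i <= 3; i++) {
            rewrite(target, "<run>" + i + "</run>");
        }
        // Only the content of the last iteration remains in the file.
        System.out.println(new String(Files.readAllBytes(target), StandardCharsets.UTF_8));
        Files.delete(target);
    }
}
```

Because no stream is left open between iterations, each delete operates on a file that this process no longer holds.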

You can't delete the file because you haven't closed it. But instead of deleting it, why don't you re-create the file inside the loop, instead of outside it:
public static void main(String[] args) throws SAXException, IOException, ParserConfigurationException {
    // TODO Auto-generated method stub *** DELETE THIS LINE ***
    try {
        Double currentValue = 1.0;
        String inputXMLPath = "C:/MySystem/Input.xml";
        String outputXMLPath = "C:/MySystem/Output.xml";
        StreamSource inputStream = new StreamSource(inputXMLPath);
        // *** DANGER! COMPARING DOUBLES IS PRONE TO ERRORS! ***
        while (currentValue != 7.0) {
            // *** USE TRY-WITH-RESOURCES TO ENSURE THE STREAM GETS CLOSED ***
            try (FileOutputStream opStream = new FileOutputStream(outputXMLPath)) {
                String xslPath = "C:/MySystem/input.xsl";
                performTransformation(xslPath, inputStream, opStream, outputXMLPath);
                StreamSource secondStream = new StreamSource(outputXMLPath);
                inputStream = secondStream;
                currentValue++;
            }
        }
    } catch (Exception e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }
}
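One thing to watch when the same path serves as both input and output: opening a FileOutputStream on a file truncates it, so the previous result may be gone before the transformer reads it. A possible alternative, sketched below under assumptions, keeps the intermediate results in memory and leaves writing the final bytes to the caller. The class name ChainedTransformSketch, the helper transformRepeatedly, and the tiny stylesheet are all made up for illustration. The sketch also uses an int counter rather than comparing doubles.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class ChainedTransformSketch {

    // Applies the same stylesheet `passes` times, feeding each result into the next pass.
    static byte[] transformRepeatedly(byte[] xml, byte[] xsl, int passes) throws Exception {
        // Compile the stylesheet once and reuse it for every pass.
        Templates templates = TransformerFactory.newInstance()
                .newTemplates(new StreamSource(new ByteArrayInputStream(xsl)));
        byte[] current = xml;
        for (int pass = 0; pass < passes; pass++) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            templates.newTransformer().transform(
                    new StreamSource(new ByteArrayInputStream(current)),
                    new StreamResult(out));
            current = out.toByteArray();
        }
        return current;
    }

    public static void main(String[] args) throws Exception {
        // Illustrative stylesheet (not from the question): wraps the root element in <wrap>.
        String xsl = "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
                + "<xsl:output omit-xml-declaration='yes'/>"
                + "<xsl:template match='/'><wrap><xsl:copy-of select='*'/></wrap></xsl:template>"
                + "</xsl:stylesheet>";
        byte[] result = transformRepeatedly(
                "<a/>".getBytes(StandardCharsets.UTF_8),
                xsl.getBytes(StandardCharsets.UTF_8),
                2);
        System.out.println(new String(result, StandardCharsets.UTF_8));
    }
}
```

For the loop in the question, that would mean reading Input.xml into a byte array, running six passes, and writing the returned bytes to Output.xml once, for example with Files.write.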

"The method addAttribute(Class<CharTermAttribute>) is undefined for the type TokenStream" while using Apache Lucene in Java

I am using Apache Lucene to tokenize a string in Java and I am encountering the error below, even after importing all the external jar files.
The error is:
The method addAttribute(Class) is undefined for the type TokenStream
The line of code in which the error occurred:
CharTermAttribute charTermAttri=stream.addAttribute(CharTermAttribute.class);
The code is:
public static ArrayList<String> tokenizeString(Analyzer analyzer, String string) {
    ArrayList<String> result = new ArrayList<String>();
    try {
        TokenStream stream = analyzer.tokenStream(null, new StringReader(string));
        CharTermAttribute charTermAttri = stream.addAttribute(CharTermAttribute.class);
        stream.reset();
        while (stream.incrementToken()) {
            result.add(stream.getAttribute(CharTermAttribute.class).toString());
        }
        stream.close();
    } catch (IOException e) {
        // not thrown b/c we're using a string reader...
        throw new RuntimeException(e);
    }
    return result;
}

converting a java.util.stream.Stream<String> into a java.io.Reader

Part of my application is given an InputStream and wants to do some processing on this to produce another InputStream.
try (
    final BufferedReader inputReader = new BufferedReader(new InputStreamReader(inputStream, UTF_8), BUFFER_SIZE);
    final Stream<String> resultLineStream = inputReader.lines().map(lineProcessor::processLine);
    final InputStream resultStream = new ReaderInputStream(new StringStreamReader(resultLineStream), UTF_8);
) {
    s3Client.putObject(targetBucket, s3File, resultStream, new ObjectMetadata());
} catch (IOException e) {
    throw new RuntimeException("Exception", e);
}
I am using the new Java 8 BufferedReader.lines() to get a Stream onto which I can easily map my processing function.
The only thing still missing is the class StringStreamReader, which is supposed to turn my Stream into a Reader from which the Apache commons-io ReaderInputStream can create an InputStream again. (The detour to readers and back seems reasonable for dealing with encodings and line breaks.)
To be very clear, the code above assumes
public class StringStreamReader extends Reader {
    public StringStreamReader(Stream<String> stringStream) { ... }
    @Override
    public int read(char cbuf[], int off, int len) throws IOException { ... }
    // possibly override other methods to avoid bad performance or high resource consumption
}
So is there any library that offers such a StringStreamReader class? Or is there another way to write the application code above without implementing a custom Reader or InputStream subclass?

You can do something like this:
PipedWriter writer = new PipedWriter();
PipedReader reader = new PipedReader();
reader.connect(writer);
strings.stream().forEach(string -> {
    try {
        writer.write(string);
        writer.write("\n");
    } catch (Exception e) {
        e.printStackTrace();
    }
});
But I guess you want some form of lazy processing. The Stream API does not really help in that case; you need a dedicated thread plus some buffer to do that.
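One alternative that avoids a second thread is to pull lines on demand from the stream's iterator, which Stream.iterator() evaluates lazily. The sketch below is one possible shape for the StringStreamReader class named in the question, not a library class. Appending "\n" after each element is an assumption about the desired line separator, and the demo class is hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.Iterator;
import java.util.stream.Stream;

class StringStreamReader extends Reader {
    private final Stream<String> stream;
    private final Iterator<String> lines;
    private String current = "";
    private int pos = 0;

    StringStreamReader(Stream<String> stream) {
        this.stream = stream;
        // iterator() is a terminal operation, but elements are produced only on demand.
        this.lines = stream.iterator();
    }

    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        // Pull the next element only when the current one is exhausted.
        while (pos >= current.length()) {
            if (!lines.hasNext()) {
                return -1; // end of stream
            }
            current = lines.next() + "\n"; // assumed line separator
            pos = 0;
        }
        int n = Math.min(len, current.length() - pos);
        current.getChars(pos, pos + n, cbuf, off);
        pos += n;
        return n;
    }

    @Override
    public void close() {
        stream.close();
    }
}

class StringStreamReaderDemo {
    public static void main(String[] args) throws IOException {
        Stream<String> processed = Stream.of("a", "b").map(String::toUpperCase);
        try (BufferedReader reader = new BufferedReader(new StringStreamReader(processed))) {
            System.out.println(reader.readLine()); // prints "A"
            System.out.println(reader.readLine()); // prints "B"
        }
    }
}
```

Only one element is held in memory at a time, so the mapping function runs as the consumer reads rather than up front.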

Extract raw text from rtf file

Once upon a time I was using Apache Tika to extract RTF files. I was actually using the TXTParser class, because then I could use the raw output from the RTF (with all the formatting as text) to do various text extraction wizardry based on the formatting.
Then one day it just started to output blank strings and I have no idea why.
public class TextParser {
    //@SuppressWarnings({ "rawtypes", "unchecked" })
    public TextParser() {
        // TODO Auto-generated constructor stub
    }

    public static void main(final String[] args) throws IOException, TikaException {
        //detecting the file type
        BodyContentHandler handler = new BodyContentHandler(-1);
        Metadata metadata = new Metadata();
        FileInputStream inputstream = new FileInputStream(new File("/Users/sebastianzeki/Documents/PhysJava/dance.rtf"));
        ParseContext pcontext = new ParseContext();
        //Text document parser
        TXTParser TXTParser = new TXTParser();
        try {
            TXTParser.parse(inputstream, handler, metadata, pcontext);
        } catch (SAXException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
        //Some tidying up
        String s = handler.toString();
        System.out.println(s);
    }
}
I know there's nothing wrong with the file, because if I use another class (e.g. RTFParser) I get the whole file.
HtmlParser would be an alternative, but it only returns half of the file.
Can anyone suggest an alternative way to get the RTF as required, or a fix for this weird problem?

Unable to move a file using java while using apache tika

I am passing a file as an input stream to the parser.parse() method of the Apache Tika library to convert the file to text. The method throws an exception (shown below), but the input stream is closed successfully in the finally block. When I then rename the file, the File.renameTo method from java.io returns false. I am not able to rename or move the file even though the input stream was closed successfully. I suspect that another handle to the file is created while parser.parse() processes it, and that this handle is not closed by the time the exception is thrown. Is that possible? If so, what should I do to rename the file?
The Exception thrown while checking the content type is
java.lang.NoClassDefFoundError: Could not initialize class com.adobe.xmp.impl.XMPMetaParser
at com.adobe.xmp.XMPMetaFactory.parseFromBuffer(XMPMetaFactory.java:160)
at com.adobe.xmp.XMPMetaFactory.parseFromBuffer(XMPMetaFactory.java:144)
at com.drew.metadata.xmp.XmpReader.extract(XmpReader.java:106)
at com.drew.imaging.jpeg.JpegMetadataReader.extractMetadataFromJpegSegmentReader(JpegMetadataReader.java:112)
at com.drew.imaging.jpeg.JpegMetadataReader.readMetadata(JpegMetadataReader.java:71)
at org.apache.tika.parser.image.ImageMetadataExtractor.parseJpeg(ImageMetadataExtractor.java:91)
at org.apache.tika.parser.jpeg.JpegParser.parse(JpegParser.java:56)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:244)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:244)
at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:121)
Please suggest any solution. Thanks in advance.
public static void main(String args[])
{
    InputStream is = null;
    StringWriter writer = new StringWriter();
    Metadata metadata = new Metadata();
    Parser parser = new AutoDetectParser();
    File file = null;
    File destination = null;
    try
    {
        file = new File("E:\\New folder\\testFile.pdf");
        boolean a = file.exists();
        destination = new File("E:\\New folder\\test\\testOutput.pdf");
        is = new FileInputStream(file);
        parser.parse(is, new WriteOutContentHandler(writer), metadata, new ParseContext()); //EXCEPTION IS THROWN HERE.
        String contentType = metadata.get(Metadata.CONTENT_TYPE);
        System.out.println(contentType);
    }
    catch(Exception e1)
    {
        e1.printStackTrace();
    }
    catch(Throwable t)
    {
        t.printStackTrace();
    }
    finally
    {
        try
        {
            if(is!=null)
            {
                is.close(); //CLOSES THE INPUT STREAM
            }
            writer.close();
        }
        catch(Exception e2)
        {
            e2.printStackTrace();
        }
    }
    boolean x = file.renameTo(destination); //RETURNS FALSE
    System.out.println(x);
}

This might be because another process, such as an anti-virus program, is still using the file. It is also possible that another part of your application holds a lock on it.
Please check for that and release the lock; it may solve your problem.
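To narrow down the cause, it may help to use java.nio.file.Files.move instead of File.renameTo, because it throws an IOException that describes the failure rather than just returning false. The sketch below is only an illustration: the class name MoveWithDiagnosticsSketch and the helper move are hypothetical, temporary paths stand in for the E:\ paths, and creating the destination directory first is an assumption that a missing target folder could be one of the causes.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

class MoveWithDiagnosticsSketch {

    // Hypothetical helper: moves a file, creating the target directory first.
    static void move(Path source, Path destination) throws IOException {
        Path parent = destination.getParent();
        if (parent != null) {
            Files.createDirectories(parent);
        }
        // Unlike File.renameTo, this throws an IOException that describes the failure.
        Files.move(source, destination, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        // Temporary paths stand in for the paths used in the question.
        Path dir = Files.createTempDirectory("move-demo");
        Path source = Files.createFile(dir.resolve("testFile.pdf"));
        Path destination = dir.resolve("test").resolve("testOutput.pdf");
        try {
            move(source, destination);
            System.out.println("moved: " + Files.exists(destination));
        } catch (IOException e) {
            // The exception type and message indicate why the move failed.
            System.out.println("move failed: " + e);
        }
    }
}
```

If another process does hold the file, the exception message should say so, which confirms or rules out the lock theory.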
