I need to index documents while they are being uploaded, into different indexes based on their content, in a Java web application where multiple users can each be uploading multiple documents simultaneously.
I am using Lucene 6.2.1 for indexing.
For this I have created a Stateless EJB called IndexingSessionBean, which indexes the document while it is being uploaded.
But as I cannot have multiple IndexWriters open on one index, I have created a @Singleton and @ApplicationScoped bean called CatagoryIndexWriters, which should hold a map of IndexWriters (one per category of document) and pass the right one to IndexingSessionBean.
My code is given below.
IndexingSessionBean.java
@Stateless
public class IndexingSessionBean {
@EJB
CatagoryIndexWriters catagoryIndexWriters;
public void indexFile(String documentId, String catId, byte[] fileBytes, boolean isUpdate) {
String content = // get contents of the fileBytes in String
try {
IndexWriter writer = catagoryIndexWriters.getCatagoryIndexWriter(catId);
Document doc = new Document();
Field documentIdField = new StringField("documentId", documentId, Field.Store.YES);
doc.add(documentIdField);
doc.add(new TextField("contents", content, Field.Store.YES));
if (!isUpdate) {
LOG.log(Level.INFO, "Indexing file with documentId {0}", documentId);
writer.addDocument(doc);
} else {
LOG.log(Level.INFO, "Updating Index for file with documentId {0}", documentId);
writer.updateDocument(new Term("documentId", documentId), doc);
}
}
catch (IOException ex) {
LOG.log(Level.SEVERE, "Unable to index document!", ex);
}
}
}
CatagoryIndexWriters
@Singleton
@ApplicationScoped
@ConcurrencyManagement(BEAN)
public class CatagoryIndexWriters {
@EJB
SystemConfigBean systemConfigBean;
Map<String, IndexWriter> indexWritersMap =new HashMap<String, IndexWriter>();
private double RAMBufferSize = 256.00;
public IndexWriter getCatagoryIndexWriter(String catId){
IndexWriter writer;
writer = indexWritersMap.get(catId);
if (writer != null){
return writer;
}else{
addCatagoryIndexWriterToMap(catId);
return indexWritersMap.get(catId);
}
}
private void createCatagoryIndexPath(String catId){
String indexPath = systemConfigBean.getSearchindexPath();
String catIndexPathString = indexPath+systemConfigBean.SEPARATORCHAR+catId;
Path catIndexPath = new File(catIndexPathString).toPath();
//Check the Catagory Index Folder if there is no index folder create it.
}
private void addCatagoryIndexWriterToMap(String catId){
createCatagoryIndexPath(catId);
String indexPath = systemConfigBean.getSearchindexPath();
String catIndexPathString = indexPath+systemConfigBean.SEPARATORCHAR+catId;
Path catIndexPath = new File(catIndexPathString).toPath();
try {
Directory dir = FSDirectory.open(catIndexPath);
Analyzer analyzer = new StandardAnalyzer();
IndexWriterConfig iwc = new IndexWriterConfig(analyzer);
iwc.setOpenMode(OpenMode.CREATE_OR_APPEND);
iwc.setRAMBufferSizeMB(this.RAMBufferSize);
try (IndexWriter writer = new IndexWriter(dir, iwc)) {
indexWritersMap.put(catId, writer);
}
}
catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
}
But while adding a document I get the following exception:
Mai 12, 2017 12:54:59 PM org.apache.openejb.core.transaction.EjbTransactionUtil handleSystemException
SCHWERWIEGEND: EjbTransactionUtil.handleSystemException: this IndexWriter is closed
org.apache.lucene.store.AlreadyClosedException: this IndexWriter is closed
at org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:740)
at org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:754)
at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1558)
at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1307)
at de.zaffar.docloaddoc.beans.IndexingSessionBean.indexFile(IndexingSessionBean.java:257)
I don't know where the close method of IndexWriter is being called from.
Your issue seems to be the line try (IndexWriter writer = new IndexWriter(dir, iwc)): this resource is closed automatically at the end of the try statement, i.e. right after you have put it into the map.
try-with-resources has a very specific use case: the resource is meant to be used inside the try block, and it is closed as soon as the block exits.
IndexWriter does implement AutoCloseable, so it gets closed.
Remove it from the try-with-resources, make it a normal statement, and then try again.
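The effect described above can be reproduced without Lucene. The following is a minimal sketch, assuming a hypothetical FakeWriter class that stands in for IndexWriter and only records whether close() was called; the class and method names are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

public class TryWithResourcesDemo {

    /** Hypothetical stand-in for IndexWriter: it only remembers whether close() was called. */
    public static class FakeWriter implements AutoCloseable {
        private boolean closed = false;

        @Override
        public void close() {
            closed = true;
        }

        public boolean isClosed() {
            return closed;
        }
    }

    /** Mirrors the question: the writer is stored in the map, then closed when the try block exits. */
    public static FakeWriter storeInsideTryWithResources(Map<String, FakeWriter> map, String key) {
        try (FakeWriter writer = new FakeWriter()) {
            map.put(key, writer);
        } // close() runs here, although the map still holds the reference
        return map.get(key);
    }

    /** The suggested fix: a plain statement, so the writer stays open until it is closed explicitly. */
    public static FakeWriter storeWithPlainStatement(Map<String, FakeWriter> map, String key) {
        FakeWriter writer = new FakeWriter();
        map.put(key, writer);
        return map.get(key);
    }

    public static void main(String[] args) {
        Map<String, FakeWriter> map = new HashMap<>();
        System.out.println("inside try-with-resources, closed: "
                + storeInsideTryWithResources(map, "a").isClosed());
        System.out.println("plain statement, closed: "
                + storeWithPlainStatement(map, "b").isClosed());
    }
}
```

With the plain-statement variant, nothing closes the writers automatically any more, so the singleton would presumably need to close every writer in the map itself when the application shuts down.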
Related
I am using EJB 3.0 + Jersey RESTful API + Lucene 6.1.
The analyzer is the Jcseg Chinese analyzer.
Code:
@Stateless
public class GoodsSearchBiz implements Serializable {
@Override
public List<String> test(){
Analyzer analyzer = new JcsegAnalyzer5X(JcsegTaskConfig.SEARCH_MODE);
JcsegAnalyzer5X jcseg = (JcsegAnalyzer5X) analyzer;
JcsegTaskConfig config = jcseg.getTaskConfig();
config.setAppendCJKSyn(true);
config.setAppendCJKPinyin(true);
TokenStream stream = null;
List<String> strList = new ArrayList<>();
try {
FSDirectory directory = FSDirectory.open(Paths.get(ResourcesUtils.loadGoodsMarketIndexDir()));
IndexWriterConfig iwConfig = new IndexWriterConfig(analyzer);
iwConfig.setOpenMode(IndexWriterConfig.OpenMode.CREATE_OR_APPEND);
IndexWriter iwriter = new IndexWriter(directory, iwConfig);
iwriter.deleteAll();
String words = "中华人民共和国";
Document doc = new Document();
doc.add(new TextField(SearchGoodsVO.FIELD_NAME, words, Field.Store.YES));
iwriter.addDocument(doc);
iwriter.commit();
iwriter.close();
stream = analyzer.tokenStream(SearchGoodsVO.FIELD_NAME, words);
stream.reset();
CharTermAttribute offsetAtt = stream.addAttribute(CharTermAttribute.class);
while (stream.incrementToken()) {
strList.add(offsetAtt.toString());
}
stream.end();
if (stream != null) stream.close();
} catch (Exception e) {
e.printStackTrace();
}
System.out.println(strList);
return strList;
}
Running it from main gives different results:
public static void main(String[] args) {
GoodsSearchBiz goodsSearchBiz = new GoodsSearchBiz();
goodsSearchBiz.test();
}
}
/*The Api*/
@Path("/search")
@Produces(RestMediaType.JSON_HEADER)
@Consumes(RestMediaType.JSON_HEADER)
public class GoodsSearchApi {
@EJB
GoodsSearchBiz searchBiz;
@GET
@Path("/test")
public List<String> test() {
return searchBiz.test();
}
}
Results:
from Main:
[中华, 中华人民共和国, 华人, 人民, 人民共和国, 共和, 共和国]
Process finished with exit code 0
from API:
09:31:05,433 INFO [stdout] (default task-1) [中, 华, 人, 民, 共, 和, 国]
Why does the same code give different results like this?
You have to let Jcseg load its lexicons.
In your API mode, Jcseg didn't load the lexicon correctly.
Visit https://github.com/lionsoul2014/jcseg for more help if you can read Chinese.
I'm trying to match the text Config migration from ASA5505 8.2 to ASA5516 in the column TITLE.
My program looks like this:
Directory directory = FSDirectory.open(indexDir);
MultiFieldQueryParser queryParser = new MultiFieldQueryParser(Version.LUCENE_35,new String[] {"TITLE"}, new StandardAnalyzer(Version.LUCENE_35));
IndexReader reader = IndexReader.open(directory);
IndexSearcher searcher = new IndexSearcher(reader);
queryParser.setPhraseSlop(0);
queryParser.setLowercaseExpandedTerms(true);
Query query = queryParser.parse("TITLE:Config migration from ASA5505 8.2 to ASA5516");
System.out.println(query);
TopDocs topDocs = searcher.search(query,100);
System.out.println(topDocs.totalHits);
ScoreDoc[] hits = topDocs.scoreDocs;
System.out.println(hits.length + " Record(s) Found");
for (int i = 0; i < hits.length; i++) {
int docId = hits[i].doc;
Document d = searcher.doc(docId);
System.out.println("\"Title :\" " +d.get("TITLE") );
}
But it's returning
"Title :" Config migration from ASA5505 8.2 to ASA5516
"Title :" Firewall migration from ASA5585 to ASA5555
"Title :" Firewall migration from ASA5585 to ASA5555
The last two results are not expected. What modification is required to match the exact text Config migration from ASA5505 8.2 to ASA5516?
And my indexing function looks like this:
public class Lucene {
public static final String INDEX_DIR = "./Lucene";
private static final String JDBC_DRIVER = "oracle.jdbc.OracleDriver";
private static final String CONNECTION_URL = "jdbc:oracle:thin:xxxxxxx";
private static final String USER_NAME = "localhost";
private static final String PASSWORD = "localhost";
private static final String QUERY = "select * from TITLE_TABLE";
public static void main(String[] args) throws Exception {
File indexDir = new File(INDEX_DIR);
Lucene indexer = new Lucene();
try {
Date start = new Date();
Class.forName(JDBC_DRIVER).newInstance();
Connection conn = DriverManager.getConnection(CONNECTION_URL, USER_NAME, PASSWORD);
SimpleAnalyzer analyzer = new SimpleAnalyzer(Version.LUCENE_35);
IndexWriterConfig indexWriterConfig = new IndexWriterConfig(Version.LUCENE_35, analyzer);
IndexWriter indexWriter = new IndexWriter(FSDirectory.open(indexDir), indexWriterConfig);
System.out.println("Indexing to directory '" + indexDir + "'...");
int indexedDocumentCount = indexer.indexDocs(indexWriter, conn);
indexWriter.close();
System.out.println(indexedDocumentCount + " records have been indexed successfully");
System.out.println("Total Time:" + (new Date().getTime() - start.getTime()) / (1000));
} catch (Exception e) {
e.printStackTrace();
}
}
int indexDocs(IndexWriter writer, Connection conn) throws Exception {
String sql = QUERY;
Statement stmt = conn.createStatement();
stmt.setFetchSize(100000);
ResultSet rs = stmt.executeQuery(sql);
int i = 0;
while (rs.next()) {
System.out.println("Adding Doc No:" + i);
Document d = new Document();
System.out.println(rs.getString("TITLE"));
d.add(new Field("TITLE", rs.getString("TITLE"), Field.Store.YES, Field.Index.ANALYZED));
d.add(new Field("NAME", rs.getString("NAME"), Field.Store.YES, Field.Index.ANALYZED));
writer.addDocument(d);
i++;
}
return i;
}
}
PVR is correct that a phrase query is probably the right solution here, but their answer gets the use of the PhraseQuery class wrong. You are already using QueryParser though, so just use the query parser syntax by enclosing your search text in quotes:
Query query = queryParser.parse("TITLE:\"Config migration from ASA5505 8.2 to ASA5516\"");
Based on your update, you are using a different analyzer at index-time and query-time. SimpleAnalyzer and StandardAnalyzer don't do the same things. Unless you have a very good reason to do otherwise, you should analyze the same way when indexing and querying.
So, change the analyzer in your indexing code to StandardAnalyzer (or vice-versa, use SimpleAnalyzer when querying), and you should see better results.
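To see why the analyzer mismatch matters for these titles, here is a rough plain-Java approximation, not Lucene code. It assumes that SimpleAnalyzer behaves like "maximal runs of letters, lower-cased" and uses whitespace splitting plus lower-casing as a crude stand-in for an analyzer that keeps alphanumeric tokens such as asa5505 intact; stop-word removal and all other analyzer details are ignored. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class AnalyzerMismatchSketch {

    /** Approximates letter-only tokenization: runs of letters, lower-cased; digits and punctuation end a token. */
    public static List<String> letterTokens(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (Character.isLetter(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    /** Crude stand-in for a tokenizer that keeps letters and digits together: split on whitespace, lower-case. */
    public static List<String> whitespaceTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.trim().split("\\s+")) {
            tokens.add(part.toLowerCase(Locale.ROOT));
        }
        return tokens;
    }

    public static void main(String[] args) {
        String wanted = "Config migration from ASA5505 8.2 to ASA5516";
        String unwanted = "Firewall migration from ASA5585 to ASA5555";
        System.out.println("letter-only, wanted title:   " + letterTokens(wanted));
        System.out.println("letter-only, unwanted title: " + letterTokens(unwanted));
        System.out.println("alphanumeric, wanted title:  " + whitespaceTokens(wanted));
    }
}
```

Under the letter-only approximation the model numbers collapse to the bare token asa and 8.2 disappears, so the two titles differ only in their first word. Query terms such as asa5505 or asa5516 produced at search time would then have nothing to match in the index.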
Here is what I have written for you, which works for me:
Use: queryParser.parse("\"Config migration from ASA5505 8.2 to ASA5516\"");
1. To create the index
public static void main(String[] args)
{
IndexWriter writer = getIndexWriter();
Document doc = new Document();
Document doc1 = new Document();
Document doc2 = new Document();
doc.add(new Field("TITLE", "Config migration from ASA5505 8.2 to ASA5516",Field.Store.YES,Field.Index.ANALYZED));
doc1.add(new Field("TITLE", "Firewall migration from ASA5585 to ASA5555",Field.Store.YES,Field.Index.ANALYZED));
doc2.add(new Field("TITLE", "Firewall migration from ASA5585 to ASA5555",Field.Store.YES,Field.Index.ANALYZED));
try
{
writer.addDocument(doc);
writer.addDocument(doc1);
writer.addDocument(doc2);
writer.close();
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
public static IndexWriter getIndexWriter()
{
IndexWriter indexWriter=null;
try
{
File file=new File("D://index//");
if(!file.exists())
file.mkdir();
IndexWriterConfig conf=new IndexWriterConfig(Version.LUCENE_34, new StandardAnalyzer(Version.LUCENE_34));
Directory directory=FSDirectory.open(file);
indexWriter=new IndexWriter(directory, conf);
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
return indexWriter;
}
}
2. To search for the string
public static void main(String[] args)
{
IndexReader reader=getIndexReader();
IndexSearcher searcher = new IndexSearcher(reader);
QueryParser parser = new QueryParser(Version.LUCENE_34, "TITLE" ,new StandardAnalyzer(Version.LUCENE_34));
Query query;
try
{
query = parser.parse("\"Config migration from ASA5505 8.2 to ASA5516\"");
TopDocs hits = searcher.search(query,3);
ScoreDoc[] document = hits.scoreDocs;
int i=0;
for(i=0;i<document.length;i++)
{
// Look up the hit's Lucene document ID, not the loop index
Document doc = searcher.doc(document[i].doc);
System.out.println("TITLE=" + doc.get("TITLE"));
}
searcher.close();
}
catch (Exception e)
{
// TODO Auto-generated catch block
e.printStackTrace();
}
}
public static IndexReader getIndexReader()
{
IndexReader reader=null;
Directory dir;
try
{
dir = FSDirectory.open(new File("D://index//"));
reader=IndexReader.open(dir);
} catch (IOException e)
{
// TODO Auto-generated catch block
e.printStackTrace();
}
return reader;
}
Try PhraseQuery as follows:
BooleanQuery mainQuery= new BooleanQuery();
String searchTerm="config migration from asa5505 8.2 to asa5516";
String strArray[]= searchTerm.split(" ");
for(int index=0;index<strArray.length;index++)
{
PhraseQuery query1 = new PhraseQuery();
query1.add(new Term("TITLE",strArray[index]));
mainQuery.add(query1,BooleanClause.Occur.MUST);
}
And then execute the mainQuery.
Check out this Stack Overflow thread; it may help you use PhraseQuery for exact search.
I am experimenting with Lucene to see how it can help me, and I am unable to get a very simple example working. I am using Lucene 5.1.
My expectation is that when I search, the document ID of the document I added to the index is printed to the console. Instead I get nothing and no errors (just "Done" printed to the console at the end).
Here is my code:
public static void main(String[] args) throws Exception {
// create structure on file system
IndexWriter writer = createOrGetIndexWriter(LocalDate.now());
writer.close();
// open for writing
writer = createOrGetIndexWriter(LocalDate.now());
Document document = new Document();
document.add(new IntField("test_field", 1, Field.Store.YES));
// write document and close.
writer.addDocument(document);
writer.commit();
writer.close();
// open reader
IndexReader reader = getIndexReader(LocalDate.now());
IndexSearcher indexSearcher = new IndexSearcher(reader);
Query q = new TermQuery(new Term("test_field", "1"));
// callback should be synchronous
indexSearcher.search(q, new SimpleCollector() {
@Override
public void collect(int i) throws IOException {
System.out.println(i);
}
@Override
public boolean needsScores() {
return false;
}
});
System.out.println("Done");
}
public static IndexWriter createOrGetIndexWriter(LocalDate date) throws Exception {
Directory directory = FSDirectory.open(Paths.get(date.toString()));
IndexWriterConfig iwc = new IndexWriterConfig(new StandardAnalyzer());
iwc.setOpenMode(IndexWriterConfig.OpenMode.CREATE_OR_APPEND);
return new IndexWriter(directory, iwc);
}
public static IndexReader getIndexReader(LocalDate date) throws Exception {
return DirectoryReader.open(FSDirectory.open(Paths.get(date.toString())));
}
I need to fetch a user-uploaded XML file from the DAM, parse this file and store the contents in the JCR. Here's what I have so far:
public class foo implements Runnable {
private static final Logger log = LoggerFactory
.getLogger(foo.class);
@Reference
ResourceResolverFactory resourceResolverFactory;
@Reference
ResourceProvider resourceProvider;
ResourceResolver resourceResolver = null;
@Reference
SlingRepository repository;
Session session;
// private static ReadXMLFileUsingDomparserTest readxml;
File tempFile;
public void run(){
log.info("\n *** Seems okay ***\n");
ResourceResolver resourceResolver = null;
try {
resourceResolver = resourceResolverFactory.getAdministrativeResourceResolver(null);
Resource resource = resourceResolver.getResource("/content/dam/foo/file.xml");
Node node = resource.adaptTo(Node.class);
boolean isAssest = DamUtil.isAsset(resource);
if (isAssest) {
Asset asset = resource.adaptTo(Asset.class);
List<Rendition> rendition = asset.getRenditions();
for (Rendition re : rendition) {
InputStream in = re.getStream();
File xmlFile = copy(in,tempFile);
if(filetest.exists()){
ReadXMLFileUsingDomparserTest.parseXML(filetest,null);
}else {
log.info("File not found at all");
}
}
}
// File xmlFile = copy(in,tempFile);
}catch (Exception e) {
log.error("Exception while running foo" , e);
}
}
private File copy(InputStream in, File file) {
try {
OutputStream out = new FileOutputStream(file);
byte[] buf = new byte[1024];
int len;
while ((len = in.read(buf)) > 0) {
out.write(buf, 0, len);
}
out.close();
in.close();
} catch (Exception e) {
e.printStackTrace();
}
return file;
}
}
Although I'm able to pick up the Node object correctly (doing Node.getPath() returns the correct path), I am not able to translate this node into a File object. (cannot be Adapted). I want to access this in terms of a File object for parsing. This is why I went through the renditions of the asset and used the stream to copy it into a file.
However, this always shows null for the above code; the output is always File not found at all.
What is the correct way to get a File object with the requisite data from the DAM so that I can successfully parse it?
The uploaded XML file should have an nt:file node, which has a jcr:content node with a jcr:data property. You can read the XML from jcr:data, i.e.: jcrContent.getProperty("jcr:data").getBinary().getStream();
Here are the built-in adapters: http://dev.day.com/docs/en/cq/current/developing/sling-adapters.html
I think you can use InputStream here...
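As a sketch of the InputStream route: the JDK's DOM parser accepts a stream directly, so no intermediate File object is needed. The example below uses only standard JAXP classes; the ByteArrayInputStream is a stand-in (an assumption for illustration) for the stream obtained from jcr:data or from a rendition, and the class name is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class ParseFromStreamSketch {

    /** Parses XML straight from a stream and closes the stream afterwards. */
    public static Document parse(InputStream in) {
        try (InputStream stream = in) {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // The XML is user-uploaded, so ask the parser to apply its secure-processing limits.
            factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            return builder.parse(stream);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML from stream", e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for the stream read from the repository.
        InputStream in = new ByteArrayInputStream(
                "<root><item>42</item></root>".getBytes(StandardCharsets.UTF_8));
        Document doc = parse(in);
        System.out.println(doc.getDocumentElement().getNodeName());
    }
}
```

If the existing parseXML helper really does require a File, the same stream could instead be copied to a temporary file first, but the target File has to be created before it is passed to copy.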
I have a method (getSingleNodeValue()) which, when passed an XPath expression, extracts the value of the specified element in the XML document referred to by 'doc'. Assume doc has at this point been initialised as shown below, and that xmlInput is the buffer containing the XML content.
SAXBuilder builder = null;
Document doc = null;
XPath xpathInstance = null;
doc = builder.build(new StringReader(xmlInput));
When I call the method, I pass the following XPath expression:
/TOP4A/PERLODSUMDEC/TINPLD1/text()
Here is the method. It basically just takes an xml buffer and uses xpath to extract the value:
public static String getSingleNodeValue(String xpathExpr) throws Exception{
Text list = null;
try {
xpathInstance = XPath.newInstance(xpathExpr);
list = (Text) xpathInstance.selectSingleNode(doc);
} catch (JDOMException e) {
throw new Exception(e);
}catch (Exception e){
throw new Exception(e);
}
return list==null ? "?" : list.getText();
}
The above method always returns "?" i.e. nothing is found so 'list' is null.
The xml document it looks at is
<TOP4A xmlns="http://www.testurl.co.uk/enment/gqr/3232/1">
<HEAD>
<Doc>ABCDUK1234</Doc>
</HEAD>
<PERLODSUMDEC>
<TINPLD1>10109000000000000</TINPLD1>
</PERLODSUMDEC>
</TOP4A>
The same method works with other XML documents, so I am not sure what is special about this one. There is no exception, so the XML is valid. It's just that the method always sets 'list' to null. Any ideas?
Edit
Ok as suggested, here is a simple running program that demonstrates the above
import org.jdom.*;
import org.jdom.input.*;
import org.jdom.xpath.*;
import java.io.IOException;
import java.io.StringReader;
public class XpathTest {
public static String getSingleNodeValue(String xpathExpr, String xmlInput) throws Exception{
Text list = null;
SAXBuilder builder = null;
Document doc = null;
XPath xpathInstance = null;
try {
builder = new SAXBuilder();
doc = builder.build(new StringReader(xmlInput));
xpathInstance = XPath.newInstance(xpathExpr);
list = (Text) xpathInstance.selectSingleNode(doc);
} catch (JDOMException e) {
throw new Exception(e);
}catch (Exception e){
throw new Exception(e);
}
return list==null ? "Nothing Found" : list.getText();
}
public static void main(String[] args){
String xmlInput1 = "<TOP4A xmlns=\"http://www.testurl.co.uk/enment/gqr/3232/1\"><HEAD><Doc>ABCDUK1234</Doc></HEAD><PERLODSUMDEC><TINPLD1>10109000000000000</TINPLD1></PERLODSUMDEC></TOP4A>";
String xpathExpr = "/TOP4A/PERLODSUMDEC/TINPLD1/text()";
XpathTest xp = new XpathTest();
try {
System.out.println(xp.getSingleNodeValue(xpathExpr, xmlInput1));
} catch (Exception e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
}
When I run the above, the output is
Nothing Found
Edit
I have run some further tests, and it appears that if I remove the namespace URL it does work. I am not sure why yet. Is there any way I can tell it to ignore the namespace?
Edit
Please also note that the above is implemented on JDK 1.4.1, so I don't have the option of using later versions of the JDK. This is the reason why I had to stick with JDOM.
The problem is with XML namespaces: your XPath query starts by selecting a 'TOP4A' element that is in no namespace, because unprefixed names in an XPath 1.0 expression never pick up the document's default namespace. Your XML file, however, has a 'TOP4A' element in the 'http://www.testurl.co.uk/enment/gqr/3232/1' namespace instead.
Is it an option to remove the xmlns from the XML?
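If the xmlns has to stay, the usual options are to bind a prefix to the namespace URI and use it in the expression, or to match on local-name() so that the namespace is ignored. The sketch below demonstrates both with the JDK's javax.xml.xpath API rather than JDOM (an assumption made so the example is self-contained; that API is not available on JDK 1.4). The prefix t and the class name are arbitrary choices for illustration. In JDOM 1.x, XPath.addNamespace(prefix, uri) should serve the same purpose as the NamespaceContext here.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class NamespaceXPathSketch {

    static final String NS = "http://www.testurl.co.uk/enment/gqr/3232/1";

    /** Evaluates expr against xml with the prefix "t" bound to NS; returns "" when nothing matches. */
    public static String evaluate(String xml, String expr) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // namespace awareness is off by default
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(new NamespaceContext() {
                public String getNamespaceURI(String prefix) {
                    return "t".equals(prefix) ? NS : XMLConstants.NULL_NS_URI;
                }
                public String getPrefix(String uri) {
                    return NS.equals(uri) ? "t" : null;
                }
                public Iterator<String> getPrefixes(String uri) {
                    String prefix = getPrefix(uri);
                    return prefix == null
                            ? Collections.<String>emptyIterator()
                            : Collections.singleton(prefix).iterator();
                }
            });
            return xpath.evaluate(expr, doc);
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<TOP4A xmlns=\"" + NS + "\"><HEAD><Doc>ABCDUK1234</Doc></HEAD>"
                + "<PERLODSUMDEC><TINPLD1>10109000000000000</TINPLD1></PERLODSUMDEC></TOP4A>";
        // Unprefixed names: matches nothing, as in the question.
        System.out.println("[" + evaluate(xml, "/TOP4A/PERLODSUMDEC/TINPLD1/text()") + "]");
        // Prefix bound to the namespace URI.
        System.out.println("[" + evaluate(xml, "/t:TOP4A/t:PERLODSUMDEC/t:TINPLD1/text()") + "]");
        // Ignoring the namespace by matching on the local name only.
        System.out.println("[" + evaluate(xml,
                "/*[local-name()='TOP4A']/*[local-name()='PERLODSUMDEC']/*[local-name()='TINPLD1']/text()") + "]");
    }
}
```

The local-name() form is plain XPath 1.0, so it is the closest thing to "ignoring the namespace" without touching the XML, at the cost of a more verbose expression.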