How to manipulate content of a comment with Apache POI

I would like to find a comment in Docx document (somehow, by author or ID...), then create new content. I was able to create a comment, with the help of this answer, but had no luck with manipulation.

As said in my answer linked in your question, up to now XWPFDocument only reads that package part when the document is created. There is neither write access nor a way to create that package part. This is noted in XWPFDocument.java, protected void onDocumentRead(), code line 210: "// TODO Create according XWPFComment class, extending POIXMLDocumentPart".
So for now we need to do this ourselves. We need to provide a class extending POIXMLDocumentPart for comments and register this relation instead of the relation to the plain POIXMLDocumentPart. Changes made through that class are then committed when the XWPFDocument is written.
Example:
import java.io.*;
import org.apache.poi.*;
import org.apache.poi.openxml4j.opc.*;
import org.apache.xmlbeans.*;
import org.apache.poi.xwpf.usermodel.*;
import static org.apache.poi.POIXMLTypeLoader.DEFAULT_XML_OPTIONS;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.*;
import javax.xml.namespace.QName;
import java.math.BigInteger;
import java.util.GregorianCalendar;
import java.util.Locale;
public class WordChangeComments {
public static void main(String[] args) throws Exception {
XWPFDocument document = new XWPFDocument(new FileInputStream("WordDocumentHavingComments.docx"));
for (POIXMLDocumentPart.RelationPart rpart : document.getRelationParts()) {
String relation = rpart.getRelationship().getRelationshipType();
if (relation.equals(XWPFRelation.COMMENT.getRelation())) {
POIXMLDocumentPart part = rpart.getDocumentPart(); //this is only POIXMLDocumentPart, not a high level class extending POIXMLDocumentPart
//provide class extending POIXMLDocumentPart for comments
MyXWPFCommentsDocument myXWPFCommentsDocument = new MyXWPFCommentsDocument(part.getPackagePart());
//and registering this relation instead of only relation to POIXMLDocumentPart
String rId = document.getRelationId(part);
document.addRelation(rId, XWPFRelation.COMMENT, myXWPFCommentsDocument);
//now the comments are available from the new MyXWPFCommentsDocument
for (CTComment ctComment : myXWPFCommentsDocument.getComments().getCommentArray()) {
System.out.print("Comment: Id: " + ctComment.getId());
System.out.print(", Author: " + ctComment.getAuthor());
System.out.print(", Date: " + ctComment.getDate());
System.out.print(", Text: ");
for (CTP ctp : ctComment.getPArray()) {
System.out.print(ctp.newCursor().getTextValue());
}
System.out.println();
//changes made here are committed when the XWPFDocument is written
if (BigInteger.ONE.equals(ctComment.getId())) { //the second comment (Id 0 = first)
ctComment.setAuthor("New Author");
ctComment.setInitials("NA");
ctComment.setDate(new GregorianCalendar(Locale.US));
CTP newCTP = CTP.Factory.newInstance();
newCTP.addNewR().addNewT().setStringValue("The new Text for Comment with Id 1.");
ctComment.setPArray(new CTP[]{newCTP });
}
}
}
}
document.write(new FileOutputStream("WordDocumentHavingComments.docx"));
document.close();
}
//a wrapper class for the CommentsDocument /word/comments.xml in the *.docx ZIP archive
private static class MyXWPFCommentsDocument extends POIXMLDocumentPart {
private CTComments comments;
private MyXWPFCommentsDocument(PackagePart part) throws Exception {
super(part);
comments = CommentsDocument.Factory.parse(part.getInputStream(), DEFAULT_XML_OPTIONS).getComments();
}
private CTComments getComments() {
return comments;
}
@Override
protected void commit() throws IOException {
System.out.println("============MyXWPFCommentsDocument is committed=================");
XmlOptions xmlOptions = new XmlOptions(DEFAULT_XML_OPTIONS);
xmlOptions.setSaveSyntheticDocumentElement(new QName(CTComments.type.getName().getNamespaceURI(), "comments"));
PackagePart part = getPackagePart();
OutputStream out = part.getOutputStream();
comments.save(out, xmlOptions);
out.close();
}
}
}
This works for Apache POI 3.17. As of Apache POI 4.0.0 the OOXML classes live in their own package, so the imports must be:
...
import org.apache.poi.ooxml.*;
...
import static org.apache.poi.ooxml.POIXMLTypeLoader.DEFAULT_XML_OPTIONS;
...
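To check the result independently of POI, the comments part can be read with plain JDK classes. The following is only a sketch: the class name CommentsXmlInspector, the method listComments and the sample author "Alice" are made up for illustration, and the XML in main is a hand-written stand-in for /word/comments.xml, not output produced by POI.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of Apache POI.
class CommentsXmlInspector {
    // WordprocessingML main namespace used by the w:comment elements
    static final String W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns one "id|author|text" entry per w:comment element found in the stream.
    static List<String> listComments(InputStream commentsXml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // needed so that the w: prefix is resolved
        Document dom = factory.newDocumentBuilder().parse(commentsXml);
        NodeList comments = dom.getElementsByTagNameNS(W_NS, "comment");
        List<String> result = new ArrayList<>();
        for (int i = 0; i < comments.getLength(); i++) {
            Element comment = (Element) comments.item(i);
            String id = comment.getAttributeNS(W_NS, "id");
            String author = comment.getAttributeNS(W_NS, "author");
            result.add(id + "|" + author + "|" + comment.getTextContent().trim());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // Hand-written sample standing in for /word/comments.xml
        String sample = "<w:comments xmlns:w=\"" + W_NS + "\">"
            + "<w:comment w:id=\"0\" w:author=\"Alice\"><w:p><w:r><w:t>First</w:t></w:r></w:p></w:comment>"
            + "<w:comment w:id=\"1\" w:author=\"New Author\"><w:p><w:r>"
            + "<w:t>The new Text for Comment with Id 1.</w:t></w:r></w:p></w:comment>"
            + "</w:comments>";
        InputStream in = new ByteArrayInputStream(sample.getBytes(StandardCharsets.UTF_8));
        for (String line : listComments(in)) {
            System.out.println(line);
        }
    }
}
```

For a real file, the stream could come from java.util.zip.ZipFile, using the entry named word/comments.xml.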

Related

How to add an altChunk element to a XWPFDocument using Apache POI

I would like to add HTML as an altChunk to a DOCX file using Apache POI. I know that docx4j can do this with a simpler API, but for technical reasons I need to use Apache POI.
Using the CT classes to do low level stuff with the xml is a little tricky. I can create an altChunk with following code:
import java.io.File;
import java.io.FileOutputStream;
import javax.xml.namespace.QName;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFRun;
import org.apache.xmlbeans.impl.values.XmlComplexContentImpl;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTDocument1;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.impl.CTBodyImpl;
public class AltChunkTest {
public static void main(String[] args) throws Exception {
XWPFDocument doc = new XWPFDocument();
doc.createParagraph().createRun().setText("AltChunk below:");
QName ALTCHUNK = new QName ( "http://schemas.openxmlformats.org/wordprocessingml/2006/main" , "altChunk" ) ;
CTDocument1 ctDoc = doc.getDocument() ;
CTBodyImpl ctBody = (CTBodyImpl) ctDoc. getBody();
XmlComplexContentImpl xcci = ( XmlComplexContentImpl ) ctBody.get_store().add_element_user(ALTCHUNK);
// what is needed now to add "<b>Hello World!</b>"?
FileOutputStream out = new FileOutputStream(new File("test.docx"));
doc.write(out);
}
}
But how do I now add the HTML content to 'xcci'?

In Office Open XML for Word (*.docx) the altChunk provides a method for using pure HTML to describe document parts.
Two important notes about altChunk:
First: It is used only for importing content. If you open the document using Word and save it, the newly saved document will not contain the alternative format content part, nor the altChunk markup that references it. Word saves all imported content as default Office Open XML elements.
Second: Most applications other than Word that can read *.docx will not read the altChunk content at all. For example, LibreOffice or OpenOffice Writer will not read it, and neither will Apache POI when opening a *.docx file.
How is altChunk implemented in the *.docx ZIP file structure?
There are /word/*.html files in the *.docx ZIP file. Those are referenced by Id in /word/document.xml, for example as <w:altChunk r:id="htmlDoc1"/>. The relation between the Ids and the /word/*.html files is given in /word/_rels/document.xml.rels, for example as <Relationship Id="htmlDoc1" Target="htmlDoc1.html" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/aFChunk"/>.
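This layout can be inspected with the JDK's ZIP classes alone. The sketch below is illustrative only: the class AltChunkPartLister and its methods are hypothetical names, and sampleArchive() builds a tiny stand-in archive containing just the three entries named above (it is not a valid Word file).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper, not part of Apache POI.
class AltChunkPartLister {
    // Collects the names of all word/*.html entries in a *.docx (ZIP) stream.
    static List<String> listHtmlParts(InputStream docx) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                String name = entry.getName(); // ZIP entry names have no leading slash
                if (name.startsWith("word/") && name.endsWith(".html")) {
                    names.add(name);
                }
            }
        }
        return names;
    }

    // Builds a stand-in archive with the three parts named in the text (not a valid Word file).
    static byte[] sampleArchive() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            String[][] parts = {
                {"word/document.xml", "<w:altChunk r:id=\"htmlDoc1\"/>"},
                {"word/_rels/document.xml.rels", "<Relationship Id=\"htmlDoc1\" Target=\"htmlDoc1.html\"/>"},
                {"word/htmlDoc1.html", "<html><body><b>Hello World!</b></body></html>"}
            };
            for (String[] part : parts) {
                zip.putNextEntry(new ZipEntry(part[0]));
                zip.write(part[1].getBytes(StandardCharsets.UTF_8));
                zip.closeEntry();
            }
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(listHtmlParts(new ByteArrayInputStream(sampleArchive())));
    }
}
```

Passing a FileInputStream of a real *.docx to listHtmlParts would list the HTML parts that the altChunk elements refer to.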
So first we need POIXMLDocumentParts for the /word/*.html files and POIXMLRelations for the relations between the Ids and the /word/*.html files. The following code provides both by means of a wrapper class which extends POIXMLDocumentPart for the /word/htmlDoc#.html files in the *.docx ZIP archive. The wrapper also provides methods for manipulating the HTML, and there is a method for creating the /word/htmlDoc#.html files in the *.docx ZIP archive together with the relations to them.
Code:
import java.io.*;
import org.apache.poi.*;
import org.apache.poi.ooxml.*;
import org.apache.poi.openxml4j.opc.*;
import org.apache.poi.xwpf.usermodel.*;
public class CreateWordWithHTMLaltChunk {
//a method for creating the htmlDoc /word/htmlDoc#.html in the *.docx ZIP archive
//String id will be htmlDoc#.
private static MyXWPFHtmlDocument createHtmlDoc(XWPFDocument document, String id) throws Exception {
OPCPackage oPCPackage = document.getPackage();
PackagePartName partName = PackagingURIHelper.createPartName("/word/" + id + ".html");
PackagePart part = oPCPackage.createPart(partName, "text/html");
MyXWPFHtmlDocument myXWPFHtmlDocument = new MyXWPFHtmlDocument(part, id);
document.addRelation(myXWPFHtmlDocument.getId(), new XWPFHtmlRelation(), myXWPFHtmlDocument);
return myXWPFHtmlDocument;
}
public static void main(String[] args) throws Exception {
XWPFDocument document = new XWPFDocument();
XWPFParagraph paragraph;
XWPFRun run;
MyXWPFHtmlDocument myXWPFHtmlDocument;
paragraph = document.createParagraph();
run = paragraph.createRun();
run.setText("Default paragraph followed by first HTML chunk.");
myXWPFHtmlDocument = createHtmlDoc(document, "htmlDoc1");
myXWPFHtmlDocument.setHtml(myXWPFHtmlDocument.getHtml().replace("<body></body>",
"<body><p>Simple <b>HTML</b> <i>formatted</i> <u>text</u></p></body>"));
document.getDocument().getBody().addNewAltChunk().setId(myXWPFHtmlDocument.getId());
paragraph = document.createParagraph();
run = paragraph.createRun();
run.setText("Default paragraph followed by second HTML chunk.");
myXWPFHtmlDocument = createHtmlDoc(document, "htmlDoc2");
myXWPFHtmlDocument.setHtml(myXWPFHtmlDocument.getHtml().replace("<body></body>",
"<body>" +
"<table>"+
"<caption>A table</caption>" +
"<tr><th>Name</th><th>Date</th><th>Amount</th></tr>" +
"<tr><td>John Doe</td><td>2018-12-01</td><td>1,234.56</td></tr>" +
"</table>" +
"</body>"
));
document.getDocument().getBody().addNewAltChunk().setId(myXWPFHtmlDocument.getId());
FileOutputStream out = new FileOutputStream("CreateWordWithHTMLaltChunk.docx");
document.write(out);
out.close();
document.close();
}
//a wrapper class for the htmlDoc /word/htmlDoc#.html in the *.docx ZIP archive
//provides methods for manipulating the HTML
//TODO: We should *not* use String methods for manipulating HTML!
private static class MyXWPFHtmlDocument extends POIXMLDocumentPart {
private String html;
private String id;
private MyXWPFHtmlDocument(PackagePart part, String id) throws Exception {
super(part);
this.html = "<!DOCTYPE html><html><head><meta http-equiv=\"content-type\" content=\"text/html; charset=utf-8\"><style></style><title>HTML import</title></head><body></body>";
this.id = id;
}
private String getId() {
return id;
}
private String getHtml() {
return html;
}
private void setHtml(String html) {
this.html = html;
}
@Override
protected void commit() throws IOException {
PackagePart part = getPackagePart();
OutputStream out = part.getOutputStream();
Writer writer = new OutputStreamWriter(out, "UTF-8");
writer.write(html);
writer.close();
out.close();
}
}
//the XWPFRelation for /word/htmlDoc#.html
private final static class XWPFHtmlRelation extends POIXMLRelation {
private XWPFHtmlRelation() {
super(
"text/html",
"http://schemas.openxmlformats.org/officeDocument/2006/relationships/aFChunk",
"/word/htmlDoc#.html");
}
}
}
Note: Because it uses altChunk, this code needs the full jar of all schemas, ooxml-schemas-*.jar, as mentioned in the Apache POI FAQ (faq-N10025).

Based on Axel Richter's answer, I replaced the call to CTBody.addNewAltChunk() with CTBodyImpl.get_store().add_element_user(QName) which eliminates the added 15MB dependency on ooxml-schemas. Since this is being used in a desktop app, we are trying to keep the app size as small as possible. In case it may be of help to anyone else:
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import javax.xml.namespace.QName;
import org.apache.poi.ooxml.POIXMLDocumentPart;
import org.apache.poi.ooxml.POIXMLRelation;
import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.openxml4j.opc.PackagePart;
import org.apache.poi.openxml4j.opc.PackagePartName;
import org.apache.poi.openxml4j.opc.PackagingURIHelper;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.xmlbeans.SimpleValue;
import org.apache.xmlbeans.impl.values.XmlComplexContentImpl;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.impl.CTBodyImpl;
public class AltChunkTest {
public static void main(String[] args) throws Exception {
XWPFDocument doc = new XWPFDocument();
doc.createParagraph().createRun().setText("AltChunk below:");
addHtml(doc,"chunk1","<!DOCTYPE html><html><head><style></style><title></title></head><body><b>Hello World!</b></body></html>");
FileOutputStream out = new FileOutputStream(new File("test.docx"));
doc.write(out);
}
static void addHtml(XWPFDocument doc, String id,String html) throws Exception {
OPCPackage oPCPackage = doc.getPackage();
PackagePartName partName = PackagingURIHelper.createPartName("/word/" + id + ".html");
PackagePart part = oPCPackage.createPart(partName, "text/html");
class HtmlRelation extends POIXMLRelation {
private HtmlRelation() {
super( "text/html",
"http://schemas.openxmlformats.org/officeDocument/2006/relationships/aFChunk",
"/word/htmlDoc#.html");
}
}
class HtmlDocumentPart extends POIXMLDocumentPart {
private HtmlDocumentPart(PackagePart part) throws Exception {
super(part);
}
@Override
protected void commit() throws IOException {
try (OutputStream out = part.getOutputStream()) {
try (Writer writer = new OutputStreamWriter(out, "UTF-8")) {
writer.write(html);
}
}
}
};
HtmlDocumentPart documentPart = new HtmlDocumentPart(part);
doc.addRelation(id, new HtmlRelation(), documentPart);
CTBodyImpl b = (CTBodyImpl) doc.getDocument().getBody();
QName ALTCHUNK = new QName("http://schemas.openxmlformats.org/wordprocessingml/2006/main", "altChunk");
XmlComplexContentImpl altchunk = (XmlComplexContentImpl) b.get_store().add_element_user(ALTCHUNK);
QName ID = new QName("http://schemas.openxmlformats.org/officeDocument/2006/relationships", "id");
SimpleValue target = (SimpleValue)altchunk.get_store().add_attribute_user(ID);
target.setStringValue(id);
}
}

This works in poi-ooxml 4.0.0, where the classes POIXMLDocumentPart and POIXMLRelation are in the package org.apache.poi.ooxml:
import org.apache.poi.ooxml.POIXMLDocumentPart;
import org.apache.poi.ooxml.POIXMLRelation;
But how can we use this in poi-ooxml 3.9, where the classes are slightly different and live in org.apache.poi?
import org.apache.poi.POIXMLDocumentPart;
import org.apache.poi.POIXMLRelation;

Importing URLs for JSOUP to Scrape via Spreadsheet

I finally got IntelliJ to work. I'm using the code below, and it works perfectly. I need it to loop and pull links from a spreadsheet so that it finds the price for each of several items. I have a spreadsheet with a few sample URLs in column C, starting at row 2. How can I have JSOUP use the URLs in this spreadsheet and then write the output to column D?
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
public class Scraper {
public static void main(String[] args) throws Exception {
// Jsoup.connect needs a full URL including the protocol
final Document document = Jsoup.connect("https://examplesite.com").get();
for (Element row : document.select("#price")) {
final String price = row.select("#price").text();
System.out.println(price);
}
}
}
Thanks in advance for any help!
Eric

You can use the JExcel library to read and edit sheets: https://sourceforge.net/projects/jexcelapi/ .
The zip file you download with the library also contains a very useful tutorial.html.
Explanation in comments:
import java.io.File;
import java.io.IOException;
import jxl.Cell;
import jxl.CellType;
import jxl.Workbook;
import jxl.write.Label;
import jxl.write.WritableSheet;
import jxl.write.WritableWorkbook;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
public class StackoverflowQuestion51577491 {
private static final int URL_COLUMN = 2; // Column C
private static final int PRICE_COLUMN = 3; // Column D
public static void main(final String[] args) throws Exception {
// open worksheet with URLs
Workbook originalWorkbook = Workbook.getWorkbook(new File("O:/original.xls"));
// create editable copy
WritableWorkbook workbook = Workbook.createWorkbook(new File("O:/updated.xls"), originalWorkbook);
// close read-only workbook as it's not needed anymore
originalWorkbook.close();
// get first available sheet
WritableSheet sheet = workbook.getSheet(0);
// skip title row 0
int currentRow = 1;
Cell cell;
// iterate each cell from column C until we find an empty one
while (!(cell = sheet.getCell(URL_COLUMN, currentRow)).getType().equals(CellType.EMPTY)) {
// read cell contents
String url = cell.getContents();
System.out.println("parsing URL: " + url);
// parse and get the price
String price = parseUrlWithJsoupAndGetProductPrice(url);
System.out.println("found price: " + price);
// create new cell with price
Label cellWithPrice = new Label(PRICE_COLUMN, currentRow, price);
sheet.addCell(cellWithPrice);
// go to next row
currentRow++;
}
// save and close file
workbook.write();
workbook.close();
}
private static String parseUrlWithJsoupAndGetProductPrice(String url) throws IOException {
// download page and parse it to Document
Document doc = Jsoup.connect(url).get();
// get the price from html
return doc.select("#priceblock_ourprice").text();
}
}

Read /Edit / Write jpg IPTC metadata in Java

I am using Apache Commons, but it is not enough for me because it is such old technology. So I found iCafe, which seems better, but I am getting the error below. Any idea what I am doing wrong?
private static List<IPTCDataSet> createIPTCDataSet() {
List<IPTCDataSet> iptcs = new ArrayList<IPTCDataSet>();
iptcs.add(new IPTCDataSet(IPTCApplicationTag.COPYRIGHT_NOTICE, "Copyright 2014-2016, yuwen_66@yahoo.com"));
iptcs.add(new IPTCDataSet(IPTCApplicationTag.CATEGORY, "ICAFE"));
iptcs.add(new IPTCDataSet(IPTCApplicationTag.KEY_WORDS, "Welcome 'icafe' user!"));
return iptcs;
}
private static IPTC createIPTC() {
IPTC iptc = new IPTC();
iptc.addDataSets(createIPTCDataSet());
return iptc;
}
public static void main(String[] args) throws IOException {
FileInputStream fin = new FileInputStream("C:/Users/rajab/Desktop/test/ibo.jpeg");
FileOutputStream fout = new FileOutputStream("C:/Users/rajab/Desktop/test/ibo/ibo.jpeg");
List<Metadata> metaList = new ArrayList<Metadata>();
//metaList.add(populateExif(TiffExif.class));
metaList.add(createIPTC());
metaList.add(new Comments(Arrays.asList("Comment1", "Comment2")));
Metadata.insertMetadata(metaList, fin, fout);
}
}
And my exception:
run:
Exception in thread "main" java.lang.NoClassDefFoundError: org/slf4j/LoggerFactory
at com.icafe4j.image.meta.Metadata.<clinit>(Unknown Source)
at vectorcleaner.Metadata1.populateExif(Metadata1.java:41)
at vectorcleaner.Metadata1.main(Metadata1.java:127)
Caused by: java.lang.ClassNotFoundException: org.slf4j.LoggerFactory
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 3 more
C:\Users\rajab\AppData\Local\NetBeans\Cache\8.2\executor-snippets\run.xml:53:
Java returned: 1
BUILD FAILED (total time: 0 seconds)

Not sure what exactly you want but here is what ICAFE can do with IPTC metadata:
Read IPTC from image.
Insert IPTC into image or update IPTC of existing image.
Delete IPTC from image.
For reading IPTC, here is an example:
import java.io.IOException;
import java.util.Collection;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import com.icafe4j.image.meta.Metadata;
import com.icafe4j.image.meta.MetadataEntry;
import com.icafe4j.image.meta.MetadataType;
import com.icafe4j.image.meta.iptc.IPTC;
public class ExtractIPTC {
public static void main(String[] args) throws IOException {
Map<MetadataType, Metadata> metadataMap = Metadata.readMetadata(args[0]);
IPTC iptc = (IPTC)metadataMap.get(MetadataType.IPTC);
if(iptc != null) {
Iterator<MetadataEntry> iterator = iptc.iterator();
while(iterator.hasNext()) {
MetadataEntry item = iterator.next();
printMetadata(item, "", " ");
}
}
}
// static so that it can be called from main; prints to stdout because no logger is defined here
private static void printMetadata(MetadataEntry entry, String indent, String increment) {
String value = entry.getValue();
System.out.println(indent + entry.getKey() + ((value == null || value.isEmpty()) ? "" : ": " + value));
if(entry.isMetadataEntryGroup()) {
indent += increment;
Collection<MetadataEntry> entries = entry.getMetadataEntries();
for(MetadataEntry e : entries) {
printMetadata(e, indent, increment);
}
}
}
}
For insert/update IPTC, here is an example:
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import com.icafe4j.image.meta.Metadata;
import com.icafe4j.image.meta.MetadataEntry;
import com.icafe4j.image.meta.MetadataType;
import com.icafe4j.image.meta.iptc.IPTCDataSet;
import com.icafe4j.image.meta.iptc.IPTCApplicationTag;
public class InsertIPTC {
public static void main(String[] args) throws IOException {
FileInputStream fin = new FileInputStream("C:/Users/rajab/Desktop/test/ibo.jpeg");
FileOutputStream fout = new FileOutputStream("C:/Users/rajab/Desktop/test/ibo/ibo.jpeg");
Metadata.insertIPTC(fin, fout, createIPTCDataSet(), true);
}
private static List<IPTCDataSet> createIPTCDataSet() {
List<IPTCDataSet> iptcs = new ArrayList<IPTCDataSet>();
iptcs.add(new IPTCDataSet(IPTCApplicationTag.COPYRIGHT_NOTICE, "Copyright 2014-2016, yuwen_66@yahoo.com"));
iptcs.add(new IPTCDataSet(IPTCApplicationTag.OBJECT_NAME, "ICAFE"));
iptcs.add(new IPTCDataSet(IPTCApplicationTag.KEY_WORDS, "Welcome 'icafe' user!"));
return iptcs;
}
}
The above example uses Metadata.insertIPTC instead of Metadata.insertMetadata because we need one more boolean parameter. If it is set to true, the existing IPTC data is kept and only the entries we want are updated. Some IPTC entries allow multiple values; for those, the new values are appended. Unique entries are replaced by the new ones.
It looks like you want to add keywords and a title. Your question already shows code to insert keywords; to insert a title, use OBJECT_NAME as in the example above.
Note: you can add multiple keywords as well. Some software can only handle one keywords record. In that case, you can put all the keywords in one record, separated by semicolons, instead of inserting multiple entries.
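Building that single record needs nothing beyond the JDK. A minimal sketch, assuming the keywords are held in a List<String>; the class KeywordRecord and the method joinKeywords are hypothetical names, not part of ICAFE:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, not part of ICAFE.
class KeywordRecord {
    // Trims each keyword, drops empty ones and joins the rest with semicolons.
    static String joinKeywords(List<String> keywords) {
        return keywords.stream()
                .map(String::trim)
                .filter(keyword -> !keyword.isEmpty())
                .collect(Collectors.joining(";"));
    }

    public static void main(String[] args) {
        // prints "moon;night;sky"
        System.out.println(joinKeywords(Arrays.asList("moon", "night", "sky")));
    }
}
```

The resulting string would then be passed as the value of a single IPTCDataSet with IPTCApplicationTag.KEY_WORDS, in the same way as the keyword entry in the example above.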

Exception while calling Parser method outside main class

In my application I have a method which I can't execute outside the main method. It only runs inside the main method. When I call that method inside my servlet class, it throws an exception.
My class with the main method:
package com.books.servlet;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.net.URL;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;
import java.util.HashSet;
import java.util.Set;
import opennlp.tools.cmdline.parser.ParserTool;
import opennlp.tools.parser.Parse;
import opennlp.tools.parser.Parser;
import opennlp.tools.parser.ParserFactory;
import opennlp.tools.parser.ParserModel;
public class ParserTest {
// download
public void download(String url, File destination) throws IOException, Exception {
URL website = new URL(url);
ReadableByteChannel rbc = Channels.newChannel(website.openStream());
FileOutputStream fos = new FileOutputStream(destination);
fos.getChannel().transferFrom(rbc, 0, Long.MAX_VALUE);
fos.close();
rbc.close();
}
public static Set<String> nounPhrases = new HashSet<>();
private static String line = "The Moon is a barren, rocky world ";
public void getNounPhrases(Parse p) {
if (p.getType().equals("NN") || p.getType().equals("NNS") || p.getType().equals("NNP")
|| p.getType().equals("NNPS")) {
nounPhrases.add(p.getCoveredText());
}
for (Parse child : p.getChildren()) {
getNounPhrases(child);
}
}
public void parserAction() throws Exception {
// InputStream is = new FileInputStream("en-parser-chunking.bin");
File modelFile = new File("en-parser-chunking.bin");
if (!modelFile.exists()) {
System.out.println("Downloading model.");
download("https://drive.google.com/uc?export=download&id=0B4uQtYVPbChrY2ZIWmpRQ1FSVVk", modelFile);
}
ParserModel model = new ParserModel(modelFile);
Parser parser = ParserFactory.create(model);
Parse topParses[] = ParserTool.parseLine(line, parser, 1);
for (Parse p : topParses) {
// p.show();
getNounPhrases(p);
}
}
public static void main(String[] args) throws Exception {
new ParserTest().parserAction();
System.out.println("List of Noun Parse : " + nounPhrases);
}
}
It gives me the output below:
List of Noun Parse : [barren,, world, Moon]
Then I commented out the main method and called the parserAction() method in my servlet class:
if (name.equals("bkDescription")) {
bookDes = value;
try {
new ParserTest().parserAction();
System.out.println("Nouns Are"+ParserTest.nounPhrases);
} catch (Exception e) {
}
It gives me the below exceptions
And below error in my Browser
Why is this happening? I can run this with the main method, but when I remove the main method and call it from my servlet, it throws an exception. Is there any way to fix this issue?
NOTE - I have read the instructions below in the OpenNLP documentation, but I have no clear idea what they mean. Please help me fix this issue.
Unlike the other components to instantiate the Parser a factory method
should be used instead of creating the Parser via the new operator.
The parser model is either trained for the chunking parser or the tree
insert parser the parser implementation must be chosen correctly. The
factory method will read a type parameter from the model and create an
instance of the corresponding parser implementation.

Either create an object of ParserTest class or remove new keyword in this line new ParserTest().parserAction();

Validate each field against multiple constraints using CSV Parser

I am working on a requirement where I need to check the fields of CSV records against multiple validations. I am using Super CSV, which supports field-level processors to validate data.
My requirement is to validate each field of a record/row against multiple validations and save the records to the database with a success/failure status. For failed records I have to display all the failed validations using some codes.
Super CSV is working fine, but it checks only the first validation for a field and, if that fails, ignores the second validation for the same field. Please look at the code below and help me with this.
package com.demo.supercsv;
import java.io.FileReader;
import java.io.IOException;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import org.supercsv.cellprocessor.Optional;
import org.supercsv.cellprocessor.constraint.NotNull;
import org.supercsv.cellprocessor.constraint.StrMinMax;
import org.supercsv.cellprocessor.constraint.StrRegEx;
import org.supercsv.cellprocessor.constraint.UniqueHashCode;
import org.supercsv.cellprocessor.ift.CellProcessor;
import org.supercsv.exception.SuperCsvCellProcessorException;
import org.supercsv.io.CsvBeanReader;
import org.supercsv.io.CsvBeanWriter;
import org.supercsv.io.ICsvBeanReader;
import org.supercsv.io.ICsvBeanWriter;
import org.supercsv.prefs.CsvPreference;
public class ParserDemo {
public static void main(String[] args) throws IOException {
List<Employee> emps = readCSVToBean();
System.out.println(emps);
System.out.println("******");
writeCSVData(emps);
}
private static void writeCSVData(List<Employee> emps) throws IOException {
ICsvBeanWriter beanWriter = null;
StringWriter writer = new StringWriter();
try{
beanWriter = new CsvBeanWriter(writer, CsvPreference.STANDARD_PREFERENCE);
final String[] header = new String[]{"id","name","role","salary"};
final CellProcessor[] processors = getProcessors();
// write the header
beanWriter.writeHeader(header);
//write the beans data
for(Employee emp : emps){
beanWriter.write(emp, header, processors);
}
}finally{
if( beanWriter != null ) {
beanWriter.close();
}
}
System.out.println("CSV Data\n"+writer.toString());
}
private static List<Employee> readCSVToBean() throws IOException {
ICsvBeanReader beanReader = null;
List<Employee> emps = new ArrayList<Employee>();
try {
beanReader = new CsvBeanReader(new FileReader("src/employees.csv"),
CsvPreference.STANDARD_PREFERENCE);
// the name mapping provide the basis for bean setters
final String[] nameMapping = new String[]{"id","name","role","salary"};
//just read the header, so that it doesn't get mapped to an Employee object
final String[] header = beanReader.getHeader(true);
final CellProcessor[] processors = getProcessors();
Employee emp;
while ((emp = beanReader.read(Employee.class, nameMapping,
processors)) != null) {
emps.add(emp);
if (!CaptureExceptions.SUPPRESSED_EXCEPTIONS.isEmpty()) {
System.out.println("Suppressed exceptions for row "
+ beanReader.getRowNumber() + ":");
for (SuperCsvCellProcessorException e :
CaptureExceptions.SUPPRESSED_EXCEPTIONS) {
System.out.println(e);
}
// for processing next row clearing validation list
CaptureExceptions.SUPPRESSED_EXCEPTIONS.clear();
}
}
} finally {
if (beanReader != null) {
beanReader.close();
}
}
return emps;
}
private static CellProcessor[] getProcessors() {
final CellProcessor[] processors = new CellProcessor[] {
new CaptureExceptions(new NotNull(new StrRegEx("\\d+",new StrMinMax(0, 2)))),//id must consist of digits and must not be longer than two characters
new CaptureExceptions(new Optional()),
new CaptureExceptions(new Optional()),
new CaptureExceptions(new NotNull()),
// Salary
};
return processors;
}
}
Exception Handler:
package com.demo.supercsv;
import java.util.ArrayList;
import java.util.List;
import org.supercsv.cellprocessor.CellProcessorAdaptor;
import org.supercsv.cellprocessor.ift.CellProcessor;
import org.supercsv.exception.SuperCsvCellProcessorException;
import org.supercsv.util.CsvContext;
public class CaptureExceptions extends CellProcessorAdaptor {
public static List<SuperCsvCellProcessorException> SUPPRESSED_EXCEPTIONS =
new ArrayList<SuperCsvCellProcessorException>();
public CaptureExceptions(CellProcessor next) {
super(next);
}
public Object execute(Object value, CsvContext context) {
try {
return next.execute(value, context);
} catch (SuperCsvCellProcessorException e) {
// save the exception
SUPPRESSED_EXCEPTIONS.add(e);
if(value!=null)
return value.toString();
else
return "";
}
}
}
Sample CSV file:
ID,Name,Role,Salary
a123,kiran,CEO,"5000USD"
2,Kumar,Manager,2000USD
3,David,developer,1000USD
When I run my program, the Super CSV exception handler displays this message for the ID value in the first row:
Suppressed exceptions for row 2:
org.supercsv.exception.SuperCsvConstraintViolationException: 'a123' does not match the regular expression '\d+'
processor=org.supercsv.cellprocessor.constraint.StrRegEx
context={lineNo=2, rowNo=2, columnNo=1, rowSource=[a123, kiran, CEO, 5000USD]}
[com.demo.supercsv.Employee@23bf011e, com.demo.supercsv.Employee@50e26ae7, com.demo.supercsv.Employee@40d88d2d]
For the field Id, the value should not be null, should not be longer than two characters, and should be numeric. I have defined the field processor like this:
new CaptureExceptions(new NotNull(new StrRegEx("\\d+",new StrMinMax(0, 2))))
But Super CSV ignores the second validation (max length 2) if the given input is not numeric. If my input is 100, it does validate the max length. How can I get both validations reported for a wrong input? Please help me with this.

Super CSV cell processors work in sequence: the next constraint is checked only if the value passes the previous one.
To achieve your goal, you need to write a custom CellProcessor that checks both whether the input is numeric (digits only) and whether its length is between 0 and 2,
so that both checks are done in a single step.
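The checking logic for such a processor could look roughly like the sketch below. It is plain Java so that it runs on its own; in Super CSV the same logic would sit inside the execute method of a class extending CellProcessorAdaptor, like the CaptureExceptions class in the question. The class name IdFieldValidator and the codes NOT_NULL, NOT_NUMERIC and TOO_LONG are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Super CSV.
class IdFieldValidator {
    // Runs every check on the raw value and returns all failures,
    // instead of stopping at the first failed constraint.
    static List<String> validateId(String value) {
        List<String> violations = new ArrayList<>();
        if (value == null) {
            violations.add("NOT_NULL");
            return violations; // nothing else can be checked on a missing value
        }
        if (!value.matches("\\d+")) {
            violations.add("NOT_NUMERIC");
        }
        if (value.length() > 2) {
            violations.add("TOO_LONG");
        }
        return violations;
    }

    public static void main(String[] args) {
        System.out.println("a123 -> " + validateId("a123")); // [NOT_NUMERIC, TOO_LONG]
        System.out.println("100 -> " + validateId("100"));   // [TOO_LONG]
        System.out.println("2 -> " + validateId("2"));       // []
    }
}
```

Returning a list of codes rather than throwing on the first failure is what allows all failed validations of a field to be stored with the record.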
