Apache commons CSV: quoted input doesn't work - java

I am trying to parse a simple CSV file with the Apache Commons CSV parser. It works fine as long as I don't use quotes. When I add a quote to the input
"a";42
it gives me the error:
invalid char between encapsulated token and delimiter
Here is a simple, complete example:
import java.io.IOException;
import java.io.StringReader;
import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVParser;
public class Test {
public static void main(String[] args) throws IOException {
String DATA = "\"a\";12";
CSVParser csvParser =
CSVFormat.EXCEL
.withIgnoreEmptyLines()
.withIgnoreHeaderCase()
.withRecordSeparator('\n').withQuote('"')
.withEscape('\\').withRecordSeparator(';').withTrim()
.parse(new StringReader(DATA));
}
}
I simply can't find out what I've missed in the code.

The problem was so trivial I missed it.
I used withRecordSeparator instead of withDelimiter to set the field separator.
This works as I expected:
public class Test {
public static void main(String[] args) throws IOException {
String DATA = "\"a\";12";
CSVParser csvParser =
CSVFormat.EXCEL
.withIgnoreEmptyLines()
.withIgnoreHeaderCase()
.withRecordSeparator('\n').withQuote('"')
.withEscape('\\').withDelimiter(';').withTrim()
.parse(new StringReader(DATA));
}
}

Related

Retrieve element with Jdom / XPath and &quot;

I'm working on an application that has these kinds of xml file (document.xml):
<root>
<subRoot myAttribute="CN=Ok">
Ok
</subRoot>
<subRoot myAttribute="CN=&quot;Problem&quot;">
Problem
</subRoot>
</root>
I need to retrieve Elements using XPath expressions. I'm not able to retrieve the second element, which I need to select using the value of myAttribute. This is due to the &quot; character ...
Here is a test class. The second assertion is throwing an AssertionError because the object is null.
import static org.junit.Assert.assertNotNull;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import org.apache.commons.io.IOUtils;
import org.jdom.Document;
import org.jdom.Element;
import org.jdom.JDOMException;
import org.jdom.input.SAXBuilder;
import org.jdom.xpath.XPath;
import org.junit.Test;
public class XPathTest {
@Test
public void quotesXpath() throws JDOMException, IOException {
Document document = getDocumentFromContent(getClasspathResource("document.xml"));
String okXPath = "/root/subRoot[@myAttribute=\"CN=Ok\"]";
assertNotNull(getElement(document, okXPath)); // Ok ...
String problemXPath = "/root/subRoot[@myAttribute=\"CN=&quot;Problem&quot;\"]";
assertNotNull(getElement(document, problemXPath)); // Why null ?
}
public String getClasspathResource(String filePath) throws IOException {
try (InputStream inputStream = this.getClass().getClassLoader().getResourceAsStream(filePath)) {
return IOUtils.toString(inputStream, StandardCharsets.UTF_8);
}
}
public static Document getDocumentFromContent(String content) throws IOException, JDOMException {
try (InputStream is = new ByteArrayInputStream(content.getBytes(StandardCharsets.UTF_8))) {
SAXBuilder builder = new SAXBuilder();
return builder.build(is);
}
}
public Element getElement(Document document, String xpathExpression) throws JDOMException {
XPath xpath = XPath.newInstance(xpathExpression);
return (Element) xpath.selectSingleNode(document);
}
}
The application is using Jdom 1.1.3
<dependency>
<groupId>org.jdom</groupId>
<artifactId>jdom</artifactId>
<version>1.1.3</version>
</dependency>
How can I change my XPath expression so that the second element is returned? Is this possible with this version of JDOM?
Thank you for your help!
Try this expression:
String problemXPath = "/root/subRoot[@myAttribute='CN=\"Problem\"']";
Firstly, when the document is parsed, the entity &quot; is replaced with the " character, so the literal " should be used directly in the XPath expression.
Secondly, in XPath you can use either single or double quotes for string constants, which is convenient if you have strings that contain quotes.
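As a rough check of both points, here is a minimal sketch that uses the JDK's built-in javax.xml.xpath API instead of JDOM 1.1.3 (that swap is an assumption made so the example runs without extra dependencies; the quoting rule itself belongs to XPath, not to a particular library). The class name QuotedAttributeXPath and the helper select are hypothetical, and the document is inlined without whitespace around the text nodes.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class QuotedAttributeXPath {
    // Same structure as document.xml in the question; the XML parser
    // turns &quot; into a plain " before XPath ever sees the value.
    static final String XML =
        "<root>"
        + "<subRoot myAttribute=\"CN=Ok\">Ok</subRoot>"
        + "<subRoot myAttribute=\"CN=&quot;Problem&quot;\">Problem</subRoot>"
        + "</root>";

    // Hypothetical helper: returns the text of the first node matching the expression.
    static String select(String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expression, new InputSource(new StringReader(XML)));
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("Invalid XPath: " + expression, e);
        }
    }

    public static void main(String[] args) {
        // Single quotes delimit the XPath string, so it may contain double quotes.
        System.out.println(select("/root/subRoot[@myAttribute='CN=\"Problem\"']"));
    }
}
```

The Java string still has to escape the double quotes with a backslash, but that is only Java syntax; the XPath engine receives 'CN="Problem"' as the literal.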

Exception while calling Parser method outside main class

In my application I have a method that only runs when it is called from the main method. When I call that method from my servlet class, it throws an exception.
My class with the main method:
package com.books.servlet;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.net.URL;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;
import java.util.HashSet;
import java.util.Set;
import opennlp.tools.cmdline.parser.ParserTool;
import opennlp.tools.parser.Parse;
import opennlp.tools.parser.Parser;
import opennlp.tools.parser.ParserFactory;
import opennlp.tools.parser.ParserModel;
public class ParserTest {
// download
public void download(String url, File destination) throws IOException, Exception {
URL website = new URL(url);
ReadableByteChannel rbc = Channels.newChannel(website.openStream());
FileOutputStream fos = new FileOutputStream(destination);
fos.getChannel().transferFrom(rbc, 0, Long.MAX_VALUE);
fos.close();
rbc.close();
}
public static Set<String> nounPhrases = new HashSet<>();
private static String line = "The Moon is a barren, rocky world ";
public void getNounPhrases(Parse p) {
if (p.getType().equals("NN") || p.getType().equals("NNS") || p.getType().equals("NNP")
|| p.getType().equals("NNPS")) {
nounPhrases.add(p.getCoveredText());
}
for (Parse child : p.getChildren()) {
getNounPhrases(child);
}
}
public void parserAction() throws Exception {
// InputStream is = new FileInputStream("en-parser-chunking.bin");
File modelFile = new File("en-parser-chunking.bin");
if (!modelFile.exists()) {
System.out.println("Downloading model.");
download("https://drive.google.com/uc?export=download&id=0B4uQtYVPbChrY2ZIWmpRQ1FSVVk", modelFile);
}
ParserModel model = new ParserModel(modelFile);
Parser parser = ParserFactory.create(model);
Parse topParses[] = ParserTool.parseLine(line, parser, 1);
for (Parse p : topParses) {
// p.show();
getNounPhrases(p);
}
}
public static void main(String[] args) throws Exception {
new ParserTest().parserAction();
System.out.println("List of Noun Parse : " + nounPhrases);
}
}
It gives me below output
List of Noun Parse : [barren,, world, Moon]
Then I commented out the main method and called the parserAction() method in my servlet class:
if (name.equals("bkDescription")) {
bookDes = value;
try {
new ParserTest().parserAction();
System.out.println("Nouns Are"+ParserTest.nounPhrases);
} catch (Exception e) {
}
It gives me the below exceptions
And below error in my Browser
Why is this happening? I can run this from the main method, but when I remove the main method and call it from my servlet, it throws an exception. Is there any way to fix this issue?
NOTE - I have read the instructions below in the OpenNLP documentation, but I don't fully understand them. Please help me fix this issue.
Unlike the other components to instantiate the Parser a factory method
should be used instead of creating the Parser via the new operator.
The parser model is either trained for the chunking parser or the tree
insert parser the parser implementation must be chosen correctly. The
factory method will read a type parameter from the model and create an
instance of the corresponding parser implementation.
Either create an object of ParserTest class or remove new keyword in this line new ParserTest().parserAction();

Text Segmentation using Gate

I am trying to write my own Java program to segment a set of text files into sentences. I searched the available NLP tools and found GATE, but I couldn't work out how to use its pipeline for segmentation only.
Any ideas on how to limit the functionality of the pipeline?
Any piece of code that could help me write my program?
Adapted from a different answer:
import gate.*;
import gate.creole.SerialAnalyserController;
import java.io.File;
import java.util.*;
public class Segmenter {
public static void main(String[] args) throws Exception {
Gate.setGateHome(new File("C:\\Program Files\\GATE_Developer_8.0"));
Gate.init();
regiterGatePlugin("ANNIE");
SerialAnalyserController pipeline = (SerialAnalyserController) Factory.createResource("gate.creole.SerialAnalyserController");
pipeline.add((ProcessingResource) Factory.createResource("gate.creole.tokeniser.DefaultTokeniser"));
pipeline.add((ProcessingResource) Factory.createResource("gate.creole.splitter.SentenceSplitter"));
Corpus corpus = Factory.newCorpus("SegmenterCorpus");
Document document = Factory.newDocument("Text to be segmented.");
corpus.add(document);
pipeline.setCorpus(corpus);
pipeline.execute();
AnnotationSet defaultAS = document.getAnnotations();
AnnotationSet sentences = defaultAS.get("Sentence");
for (Annotation sentence : sentences) {
System.err.println(Utils.stringFor(document, sentence));
}
//Clean up
Factory.deleteResource(document);
Factory.deleteResource(corpus);
for (ProcessingResource pr : pipeline.getPRs()) {
Factory.deleteResource(pr);
}
Factory.deleteResource(pipeline);
}
public static void regiterGatePlugin(String name) throws Exception {
Gate.getCreoleRegister().registerDirectories(new File(Gate.getPluginsHome(), name).toURI().toURL());
}
}

How to print an String variable as italicized text

I have the following statements inside my class:
String myName = "Joe";
System.out.println("My name is " +myName);
I need the value on the variable myName to be printed as italic text.
Try:
System.out.println("\033[3mText goes here\033[0m");
Which will output italic text if your console supports it. You can use [1m for bold, etc. Play around with the different values of [nm.
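Applied to the variable from the question, this could look like the sketch below. The class name AnsiStyle and the helper italic are hypothetical; the escape sequences are the ones given above, and whether the text actually appears in italics still depends on the terminal.

```java
class AnsiStyle {
    static final String ITALIC = "\033[3m"; // switch italics on
    static final String RESET = "\033[0m";  // reset all attributes

    // Hypothetical helper: wraps the text so only it is italicized.
    static String italic(String text) {
        return ITALIC + text + RESET;
    }

    public static void main(String[] args) {
        String myName = "Joe";
        System.out.println("My name is " + italic(myName));
    }
}
```

Resetting right after the value keeps the rest of the console output in the normal style.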
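Alternatively, if the output does not have to go to the console, here is an example that writes the text to an HTML file using italic tags: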
import java.io.FileWriter;
import java.io.PrintWriter;
import java.io.IOException;
public class Foo
{
public static void main(String[] args) throws IOException
{
PrintWriter out = new PrintWriter(new FileWriter("myFile.html"));
out.println("<u><i>my output</i></u>");
out.flush();
out.close();
}
}

Find the second duplicate word

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
public class Example {
public static void main(String[] args) throws IOException
{
File fis=new File("D:/Testcode/Test.txt");
BufferedReader br;
String input;
String var = null;
if(fis.isAbsolute())
{
br=new BufferedReader(new FileReader(fis.getAbsolutePath()));
while ((input=br.readLine())!=null) {
var=input;
}
}
//String var="Duminy to Warner, OUT, Duminy gets a wicket again. He has been breaking...
if(var!=null)
{
String splitstr[]=var.split(",");
if(splitstr[0].contains("to"))
{
String ss=splitstr[0];
String a[]=ss.split("\\s+");
int value=splitstr[0].indexOf("to");
System.out.println("Subject:"+splitstr[0].substring(0,value));
System.out.println("Object:"+splitstr[0].substring(value+2));
System.out.println("Event:"+splitstr[1]);
int count=var.indexOf(splitstr[2]);
System.out.println("Narrated Information:"+var.substring(count));
}
}
}
}
The above program shows the following output:
Subject:Duminy
Object: Warner
Event: OUT
Narrated Information: Duminy gets a wicket again. He has been breaking....
My question is: the text may contain, for example, "Dumto to Warner, OUT, Duminy gets a wicket again. He has been breaking...". In that case the above program does not produce output like the above. How can I take the surrounding spaces into account when checking the condition?
Instead of:
if(splitstr[0].contains("to"))
Change it to:
if(splitstr[0].contains(" to "))
It should then work fine IMO.
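Note that the question's code also calls indexOf("to"), which would still find the "to" inside "Dumto" even after the contains check is changed. A minimal sketch that searches for " to " in both places might look like this; the class name CommentaryParser and the method subjectAndObject are hypothetical, and it only covers the subject/object part of the original program.

```java
class CommentaryParser {
    // Hypothetical helper: returns {subject, object} taken from the text
    // before the first comma, or null when that text has no " to ".
    static String[] subjectAndObject(String line) {
        String head = line.split(",")[0];
        int at = head.indexOf(" to "); // the spaces keep "Dumto" from matching
        if (at < 0) {
            return null;
        }
        String subject = head.substring(0, at).trim();
        String object = head.substring(at + " to ".length()).trim();
        return new String[] { subject, object };
    }

    public static void main(String[] args) {
        String var = "Dumto to Warner, OUT, Duminy gets a wicket again.";
        String[] parts = subjectAndObject(var);
        if (parts != null) {
            System.out.println("Subject:" + parts[0]); // Subject:Dumto
            System.out.println("Object:" + parts[1]);  // Object:Warner
        }
    }
}
```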
