OpenNLP train Thai language

I am experimenting with OpenNLP 1.7.2 and maxent-3.0.0.jar to train a model for the Thai language. Below is the code that reads the Thai training data and creates the binary model.
public class TrainPerson {
public static void main(String[] args) throws IOException {
String trainFile = "/Documents/workspace/ThaiOpenNLP/bin/thaiPerson.train";
String modelFile = "/Documents/workspace/ThaiOpenNLP/bin/th-ner-person.bin";
writePersonModel(trainFile, modelFile);
}
private static void writePersonModel(String trainFile, String modelFile)
throws FileNotFoundException, IOException {
Charset charset = Charset.forName("UTF-8");
InputStreamFactory fileInputStream = new MarkableFileInputStreamFactory(new File(trainFile));
ObjectStream<String> lineStream = new PlainTextByLineStream(fileInputStream, charset);
ObjectStream<NameSample> sampleStream = new NameSampleDataStream(lineStream);
TokenNameFinderModel model;
try {
model = NameFinderME.train("th", "person", sampleStream , TrainingParameters.defaultParams(), new TokenNameFinderFactory());
} finally {
sampleStream.close();
}
BufferedOutputStream modelOut = null;
try {
modelOut = new BufferedOutputStream(new FileOutputStream(modelFile));
model.serialize(modelOut);
} finally {
if (modelOut != null) {
modelOut.close();
}
}
}}
The Thai training data looks like the attached file trainingData.
I am using the output model to detect person names, as shown in the program below. It fails to identify the name.
public class ThaiPersonNameFinder {
static String modelFile = "/Users/avinashpaula/Documents/workspace/ThaiOpenNLP/bin/th-ner-person.bin";
public static void main(String[] args) {
try {
InputStream modelIn = new FileInputStream(new File(modelFile));
TokenNameFinderModel model = new TokenNameFinderModel(modelIn);
NameFinderME nameFinder = new NameFinderME(model);
String sentence[] = new String[]{
"จอห์น",
"30",
"ปี",
"จะ",
"เข้าร่วม",
"ก",
"เริ่มต้น",
"ขึ้น",
"บน",
"มกราคม",
"."
};
Span nameSpans[] = nameFinder.find(sentence);
for (int i = 0; i < nameSpans.length; i++) {
System.out.println(nameSpans[i]);
}
}
catch (IOException e) {
e.printStackTrace();
}
}
}
What am I doing wrong?
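One thing worth ruling out before looking at the finder code is the training file itself. As far as I know, NameSampleDataStream expects one whitespace-tokenized sentence per line with each name wrapped as <START:person> ... <END>, and TrainingParameters.defaultParams() uses a feature cutoff of 5, so a file with only a handful of annotated sentences can yield a model that never tags anything. Below is a minimal sketch, using only the JDK, that counts sentences and annotated spans. The class name TrainingDataCheck and the sample lines are made up for illustration; they are not the asker's data.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of OpenNLP.
class TrainingDataCheck {

    // Matches one annotated span such as "<START:person> John Smith <END>".
    // The tags must be separated from the tokens by whitespace.
    private static final Pattern PERSON_SPAN =
            Pattern.compile("<START:person>\\s+(.+?)\\s+<END>");

    /** Counts the annotated person spans across all lines. */
    static int countPersonSpans(List<String> lines) {
        int count = 0;
        for (String line : lines) {
            Matcher m = PERSON_SPAN.matcher(line);
            while (m.find()) {
                count++;
            }
        }
        return count;
    }

    /** Counts non-blank lines, i.e. training sentences. */
    static int countSentences(List<String> lines) {
        int count = 0;
        for (String line : lines) {
            if (!line.trim().isEmpty()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Assumed sample; Thai tokens behave the same way as long as the
        // training file is read as UTF-8.
        List<String> sample = Arrays.asList(
                "<START:person> John <END> , 30 years old , will join in January .",
                "The meeting starts on Monday .",
                "");
        System.out.println("sentences: " + countSentences(sample));
        System.out.println("person spans: " + countPersonSpans(sample));
    }
}
```

If the counts come out in the tens rather than the thousands, lowering the cutoff or adding more annotated sentences would be the first things to try.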

Related

How to create multiple CSV files from a large CSV file in Java

I am kind of stuck. I know how to create a single CSV file, but I am missing something in this code: I am not able to create multiple CSV files from a POJO class. The file is usually larger than 15 MB, and I need to split it into multiple CSV files of about 5 MB each. Any suggestions would be a great help. Here is the sample code I am trying, which fails.
public static void main(String[] args) throws IOException {
getOrderList();
}
public static void getOrderList() throws IOException {
List<Orders> ordersList = new ArrayList<>();
Orders orders = new Orders();
orders.setOrderNumber("1");
orders.setProductName("mickey");
Orders orders1 = new Orders();
orders1.setOrderNumber("2");
orders1.setProductName("mini");
ordersList.add(orders);
ordersList.add(orders1);
Object [] FILE_HEADER = {"orderNumber","productName"};
ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
int rowCount = 0;
int fileCount = 1;
try {
BufferedWriter fileWriter = new BufferedWriter(new OutputStreamWriter(byteArrayOutputStream));
CSVPrinter csvFilePrinter = new CSVPrinter(fileWriter,
CSVFormat.DEFAULT.withRecordSeparator("\n"));
csvFilePrinter.printRecord(FILE_HEADER);
for (Orders patient : ordersList) {
rowCount++;
patient.getOrderNumber();
patient.getProductName();
if (rowCount <= 1) {
csvFilePrinter.printRecord(patient);
csvFilePrinter.flush();
}
if (rowCount > 1 ) {
csvFilePrinter.printRecord(patient);
fileCount++;
csvFilePrinter.flush();
}
}
} catch (IOException e) {
throw new RuntimeException("Cannot generate csv file", e);
}
byte[] csvOutput = byteArrayOutputStream.toByteArray();
OutputStream outputStream = null;
outputStream = new FileOutputStream("demos" + fileCount + ".csv");
byteArrayOutputStream = new ByteArrayOutputStream();
byteArrayOutputStream.write(csvOutput);
byteArrayOutputStream.writeTo(outputStream);
}
public static class Orders {
private String orderNumber;
private String productName;
public String getOrderNumber() {
return orderNumber;
}
public void setOrderNumber(String orderNumber) {
this.orderNumber = orderNumber;
}
public String getProductName() {
return productName;
}
public void setProductName(String productName) {
this.productName = productName;
}
}
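A minimal sketch of one way to do the split, using only the JDK: track the byte size of the current file and start a new one (repeating the header) when the next row would push it past the limit. It assumes the rows are already formatted as CSV lines; the class and method names (CsvSplitter, splitBySize, writeChunks) are hypothetical, and no quoting or escaping is done here.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only.
class CsvSplitter {

    /** Size of one line on disk: its UTF-8 bytes plus the trailing '\n'. */
    static int lineBytes(String line) {
        return line.getBytes(StandardCharsets.UTF_8).length + 1;
    }

    /**
     * Groups rows into chunks. Every chunk starts with the header and stays
     * within maxBytes, unless a single row is larger than the limit by itself.
     */
    static List<List<String>> splitBySize(String header, List<String> rows, int maxBytes) {
        List<List<String>> chunks = new ArrayList<>();
        List<String> current = null;
        int currentBytes = 0;
        for (String row : rows) {
            int rowBytes = lineBytes(row);
            // Start a new chunk for the first row, or when this row would
            // overflow a chunk that already holds at least one data row.
            if (current == null || (current.size() > 1 && currentBytes + rowBytes > maxBytes)) {
                current = new ArrayList<>();
                current.add(header);
                currentBytes = lineBytes(header);
                chunks.add(current);
            }
            current.add(row);
            currentBytes += rowBytes;
        }
        return chunks;
    }

    /** Writes chunk i to "<prefix><i+1>.csv" inside dir. */
    static void writeChunks(List<List<String>> chunks, Path dir, String prefix) throws IOException {
        for (int i = 0; i < chunks.size(); i++) {
            Path file = dir.resolve(prefix + (i + 1) + ".csv");
            try (BufferedWriter out = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
                for (String line : chunks.get(i)) {
                    out.write(line);
                    out.write('\n');
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        List<String> rows = Arrays.asList("1,mickey", "2,mini");
        // 5 MB per file, as described in the question.
        List<List<String>> chunks = splitBySize("orderNumber,productName", rows, 5 * 1024 * 1024);
        Path dir = Files.createTempDirectory("csv-split");
        writeChunks(chunks, dir, "demos");
        System.out.println("wrote " + chunks.size() + " file(s) to " + dir);
    }
}
```

Keeping the grouping separate from the file writing makes the size logic easy to check on its own. For a truly large input, the same counter could be applied while streaming rows instead of collecting them in a list first.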

Lucene IndexWriter.commit() doesn't finish on Ubuntu

Here is the initialization code:
public class Main {
public void index(String input_path, String index_dir, String separator, String extension, String field, DataHandler handler) {
Index index = new Index(handler);
index.initWriter(index_dir, new StandardAnalyzer());
index.run(input_path, field, extension, separator);
}
public List<?> search(String input_path, String index_dir, String separator, String extension, String field, DataHandler handler) {
Search search = new Search(handler);
search.initSearcher(index_dir, new StandardAnalyzer());
return search.runUsingFiles(input_path, field, extension, separator);
}
@SuppressWarnings("unchecked")
public static void main(String[] args) {
String lang = "en-US";
String dType = "data";
String train = "res/input/" +lang+ "/" +dType +"/train/";
String test = "res/input/"+ lang+ "/" +dType+ "/test/";
String separator = "\\|";
String extension = "csv";
String index_dir = "res/index/" +lang+ "." +dType+ ".index";
String output_file = "res/result/" +lang+ "." +dType+ ".output.json";
String searched_field = "utterance";
Main main = new Main();
DataHandler handler = new DataHandler();
main.index(train, index_dir, separator, extension, searched_field, handler);
//List<JSONObject> result = (List<JSONObject>) main.search(test, index_dir, separator, extension, searched_field, handler);
//handler.writeOutputJson(result, output_file);
}
}
Here is my indexing code:
public class Index {
private IndexWriter writer;
private DataHandler handler;
public Index(DataHandler handler) {
this.handler = handler;
}
public Index() {
this(new DataHandler());
}
public void initWriter(String index_path, Directory store, Analyzer analyzer) {
IndexWriterConfig config = new IndexWriterConfig(analyzer);
try {
this.writer = new IndexWriter(store, config);
} catch (IOException e) {
e.printStackTrace();
}
}
public void initWriter(String index_path, Analyzer analyzer) {
try {
initWriter(index_path, FSDirectory.open(Paths.get(index_path)), analyzer);
} catch (IOException e) {
e.printStackTrace();
}
}
public void initWriter(String index_path) {
List<String> stopWords = Arrays.asList();
CharArraySet stopSet = new CharArraySet(stopWords, false);
initWriter(index_path, new StandardAnalyzer(stopSet));
}
@SuppressWarnings("unchecked")
public void indexDocs(List<?> datas, String field) throws IOException {
FieldType fieldType = new FieldType();
FieldType fieldType2 = new FieldType();
fieldType.setStored(true);
fieldType.setTokenized(true);
fieldType.setIndexOptions(IndexOptions.DOCS_AND_FREQS_AND_POSITIONS);
fieldType2.setStored(true);
fieldType2.setTokenized(false);
fieldType2.setIndexOptions(IndexOptions.DOCS);
for(int i = 0 ; i < datas.size() ; i++) {
Map<String,String> temp = (Map<String,String>) datas.get(i);
Document doc = new Document();
for(String key : temp.keySet()) {
if(key.equals(field))
continue;
doc.add(new Field(key, temp.get(key), fieldType2));
}
doc.add(new Field(field, temp.get(field), fieldType));
this.writer.addDocument(doc);
}
}
public void run(String path, String field, String extension, String separator) {
List<File> files = this.handler.getInputFiles(path, extension);
List<?> data = this.handler.readDocs(files, separator);
try {
System.out.println("start index");
indexDocs(data, field);
this.writer.commit();
this.writer.close();
System.out.println("done");
} catch (IOException e) {
e.printStackTrace();
}
}
public void run(String path) {
run(path, "search_field", "csv", "\t");
}
}
I made a simple search module using Java and Lucene.
The module consists of two phases, indexing and search.
In the indexing phase, it reads CSV files, converts each row to a Document, and adds it to the IndexWriter object using the IndexWriter.addDocument() method.
Finally, it calls the IndexWriter.commit() method.
This works well on my local PC (Windows),
but on an Ubuntu PC the IndexWriter.commit() call never finishes.
IndexWriter.flush() doesn't work either.
What is the problem?

Sentiment Analysis with OpenNLP on a text file

I have 100 sentences of test data. I am trying to run sentiment analysis on them but no matter what input String I am using, I am only getting a positive estimation of the input string. Each sentence gets a return value of 1.0. Any idea why this might be happening? Even if I use negative example inputs from the .txt file, the result is a positive value.
public class StartSentiment
{
public static DoccatModel model = null;
public static String[] analyzedTexts = {"Good win"};
public static void main(String[] args) throws IOException {
// begin of sentiment analysis
trainModel();
for(int i=0; i<analyzedTexts.length;i++){
classifyNewText(analyzedTexts[i]);}
}
private static String readFile(String pathname) throws IOException {
File file = new File(pathname);
StringBuilder fileContents = new StringBuilder((int)file.length());
Scanner scanner = new Scanner(file);
String lineSeparator = System.getProperty("line.separator");
try {
while(scanner.hasNextLine()) {
fileContents.append(scanner.nextLine() + lineSeparator);
}
return fileContents.toString();
} finally {
scanner.close();
}
}
public static void trainModel() {
MarkableFileInputStreamFactory dataIn = null;
try {
dataIn = new MarkableFileInputStreamFactory(
new File("src\\sentiment\\Results.txt"));
ObjectStream<String> lineStream = null;
lineStream = new PlainTextByLineStream(dataIn, StandardCharsets.UTF_8);
ObjectStream<DocumentSample> sampleStream = new DocumentSampleStream(lineStream);
TrainingParameters tp = new TrainingParameters();
tp.put(TrainingParameters.CUTOFF_PARAM, "1");
tp.put(TrainingParameters.ITERATIONS_PARAM, "100");
DoccatFactory df = new DoccatFactory();
model = DocumentCategorizerME.train("en", sampleStream, tp, df);
} catch (Exception e) {
e.printStackTrace();
} finally {
if (dataIn != null) {
try {
} catch (Exception e2) {
e2.printStackTrace();
}
}
}
}
public static void classifyNewText(String text) throws IOException{
DocumentCategorizerME myCategorizer = new DocumentCategorizerME(model);
double[] outcomes = myCategorizer.categorize(text.split(" ") );
String category = myCategorizer.getBestCategory(outcomes);
if (category.equalsIgnoreCase("1")){
System.out.print("The text is positive");
} else {
System.out.print("The text is negative");
}
}
}
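A quick check that may help narrow this down: as far as I know, DocumentSampleStream reads one document per line, with the category as the first whitespace-separated token. If nearly every line in Results.txt starts with the same label, or the labels are not where the stream expects them, the model will lean heavily toward one category. The sketch below uses only the JDK and counts the labels; the class name LabelDistribution and the sample lines are assumptions for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of OpenNLP.
class LabelDistribution {

    /** Counts how many non-blank lines start with each category label. */
    static Map<String, Integer> countLabels(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            // The first whitespace-separated token is treated as the category.
            String label = trimmed.split("\\s+", 2)[0];
            counts.merge(label, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Assumed sample lines in "category text" form.
        List<String> sample = Arrays.asList(
                "1 Good win",
                "0 Terrible loss",
                "1 Great game");
        System.out.println(countLabels(sample));
    }
}
```

Running this over the real training file (for example via Files.readAllLines) shows at a glance whether both categories are present and roughly balanced.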

JACOB:Presentation.Export : PowerPoint can't save ^0 to ^1

public boolean PptExport2Png(String filePath,String exportPath){
boolean flag = false;
ActiveXComponent component = new ActiveXComponent("PowerPoint.Application");
try{
Dispatch presentations = component.getProperty("Presentations").toDispatch();
Dispatch presentation = Dispatch.call(presentations, "Open", new Variant(filePath),
new Variant(-1), new Variant(-1), new Variant(0))
.toDispatch();
Dispatch.call(presentation,"Export",new Variant(exportPath),new Variant(720),new Variant(540));
// Reached only if Export did not throw.
flag = true;
}catch (Exception e){
System.out.println("|||" + e.toString());
}
return flag;
}
public static void main(String[] strs)throws Exception{
String filePath="D://ppttest.ppt";
String pngPath="D://folder22";
JacobPptUtils jac = new JacobPptUtils(filePath,true);
jac.PptExport2Png(filePath,pngPath);
}
I ran into this error when using JACOB to export the PPT, and I have no idea how to fix it. Please give me some advice on how to deal with it.
Here is what I have tried so far:
1. Modifying the file path expression:

public static void main(String args[]) throws Exception{
String source = "D:/test.ppt";
String dest = "D:/xxx.pdf";
File file = new File(source);
if(!file.exists()){
throw new Exception("error");
}
ppt2Pdf(source,dest);
}
}

StackOverflowError while performing external sort

I am trying to do an external merge sort. Method: open all the files in the folder 'output', take the first line of each and sort them, write the smallest one to the 'final' file, then read the next line of the file it came from and repeat. I get a StackOverflowError. My file size is greater than memory.
public class mergefile6 {
public static ArrayList<String> al = new ArrayList<String>();
static HashMap hm = new HashMap();
public static String line;
public static String[][] filepoint = new String[100][2];
public static int fileline=1;
public static int i=0;
public static void main(String[] args) throws Exception{
fileread();
}
public static void fileread() throws Exception{
FileReader fileReader = null;
BufferedReader bufferedReader = null;
try {
File folder = new File("./output/");
if (folder.isDirectory()) {
for (File file : folder.listFiles()) {
fileReader = new FileReader(file);
bufferedReader = new BufferedReader(fileReader);
int lineCount = 0;
while ((line = bufferedReader.readLine())!=null) {
lineCount++;
if (1 == lineCount) {
hm.put(line,file);
al.add(line);
filepoint[i][0]=file.toString();
filepoint[i][1]=Integer.toString(fileline);
++i;
}
}
}
}
if (null != fileReader){
try {
fileReader.close();
} catch (IOException e) {
e.printStackTrace();
}
}
if (null != bufferedReader){
try {
bufferedReader.close();
} catch (IOException e) {
e.printStackTrace();
}
}
Sorting(al);
test(al);
} catch (Exception e) {
} finally {
}
}
public static void Sorting(ArrayList<String> al)throws Exception{
int length = al.size();
ArrayList<String> tmp = new ArrayList<String>(al);
mergeSort(al, tmp, 0, al.size() - 1);
}
private static void mergeSort(ArrayList<String> al, ArrayList<String> tmp, int left, int right){
//sort code
}
public static void test(ArrayList<String> al) throws Exception{
BufferedWriter bw = null;
FileWriter fw = null;
fw = new FileWriter("final",true);
bw = new BufferedWriter(fw);
bw.write(al.get(0)+" \n");
//bw.flush();
bw.close();
fw.close();
String filename = hm.get(al.get(0)).toString();
hm.remove(al.get(0));
al.remove(0);
fileforward(filename,al);
}
public static void fileforward(String filename,ArrayList<String> al) throws Exception{
long list;
FileReader fr = null;
BufferedReader br = null;
fr = new FileReader(filename);
br = new BufferedReader(fr);
for(int j=0;j<i;++j){
// Compare string contents; == would compare object references.
if(filepoint[j][0].equals(filename)){
fileline = Integer.parseInt(filepoint[j][1]);
list = br.skip(99*fileline);
if((line = br.readLine())!=null){
hm.put(line,filename);
al.add(line);
++fileline;
filepoint[j][1]=Integer.toString(fileline);
br.close(); fr.close();
}else{}
}
}
if(al.size()==3){
Sorting(al);
test(al); }
}
}
What could be causing this error?

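It might be a stack overflow caused by the mutual recursion between fileforward() and test(): each one calls the other, so the call stack keeps growing for every line written and never unwinds. I'm not sure, though; try debugging the ArrayList's size with logs or prints. If it's always equal to 3, that's the problem.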
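If the recursion is indeed the cause, one way around it is to do the merge in a plain loop with a priority queue, so the stack depth stays constant no matter how many lines there are. The sketch below is only an illustration under some assumptions: every input is already sorted, and each Iterator<String> would in practice be backed by a BufferedReader over one chunk file (for example reader.lines().iterator()), so only one line per file is held in memory. The names KWayMerge, Head and merge are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.function.Consumer;

// Hypothetical sketch of an iterative k-way merge.
class KWayMerge {

    /** One pending line together with the source it came from. */
    private static final class Head implements Comparable<Head> {
        final String line;
        final Iterator<String> source;

        Head(String line, Iterator<String> source) {
            this.line = line;
            this.source = source;
        }

        @Override
        public int compareTo(Head other) {
            return line.compareTo(other.line);
        }
    }

    /** Merges already-sorted sources into the sink without recursion. */
    static void merge(List<Iterator<String>> sources, Consumer<String> sink) {
        PriorityQueue<Head> heap = new PriorityQueue<>();
        // Seed the heap with the first line of every non-empty source.
        for (Iterator<String> source : sources) {
            if (source.hasNext()) {
                heap.add(new Head(source.next(), source));
            }
        }
        // Repeatedly emit the smallest line and refill from the same source.
        while (!heap.isEmpty()) {
            Head smallest = heap.poll();
            sink.accept(smallest.line);
            if (smallest.source.hasNext()) {
                heap.add(new Head(smallest.source.next(), smallest.source));
            }
        }
    }

    public static void main(String[] args) {
        List<Iterator<String>> sources = new ArrayList<>();
        sources.add(Arrays.asList("apple", "melon").iterator());
        sources.add(Arrays.asList("banana", "zebra").iterator());
        sources.add(Arrays.asList("cherry").iterator());

        List<String> merged = new ArrayList<>();
        merge(sources, merged::add);
        System.out.println(merged); // prints [apple, banana, cherry, melon, zebra]
    }
}
```

In the real program the sink would write each line to the 'final' file through a single BufferedWriter that stays open for the whole merge, instead of reopening the file for every line.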
