Here is the initialization code:
public class Main {
public void index(String input_path, String index_dir, String separator, String extension, String field, DataHandler handler) {
Index index = new Index(handler);
index.initWriter(index_dir, new StandardAnalyzer());
index.run(input_path, field, extension, separator);
}
public List<?> search(String input_path, String index_dir, String separator, String extension, String field, DataHandler handler) {
Search search = new Search(handler);
search.initSearcher(index_dir, new StandardAnalyzer());
return search.runUsingFiles(input_path, field, extension, separator);
}
@SuppressWarnings("unchecked")
public static void main(String[] args) {
String lang = "en-US";
String dType = "data";
String train = "res/input/" + lang + "/" + dType + "/train/";
String test = "res/input/" + lang + "/" + dType + "/test/";
String separator = "\\|";
String extension = "csv";
String index_dir = "res/index/" + lang + "." + dType + ".index";
String output_file = "res/result/" + lang + "." + dType + ".output.json";
String searched_field = "utterance";
Main main = new Main();
DataHandler handler = new DataHandler();
main.index(train, index_dir, separator, extension, searched_field, handler);
//List<JSONObject> result = (List<JSONObject>) main.search(test, index_dir, separator, extension, searched_field, handler);
//handler.writeOutputJson(result, output_file);
}
}
And here is my Index class:
public class Index {
private IndexWriter writer;
private DataHandler handler;
public Index(DataHandler handler) {
this.handler = handler;
}
public Index() {
this(new DataHandler());
}
public void initWriter(String index_path, Directory store, Analyzer analyzer) {
IndexWriterConfig config = new IndexWriterConfig(analyzer);
try {
this.writer = new IndexWriter(store, config);
} catch (IOException e) {
e.printStackTrace();
}
}
public void initWriter(String index_path, Analyzer analyzer) {
try {
initWriter(index_path, FSDirectory.open(Paths.get(index_path)), analyzer);
} catch (IOException e) {
e.printStackTrace();
}
}
public void initWriter(String index_path) {
List<String> stopWords = Arrays.asList();
CharArraySet stopSet = new CharArraySet(stopWords, false);
initWriter(index_path, new StandardAnalyzer(stopSet));
}
@SuppressWarnings("unchecked")
public void indexDocs(List<?> datas, String field) throws IOException {
FieldType fieldType = new FieldType();
FieldType fieldType2 = new FieldType();
fieldType.setStored(true);
fieldType.setTokenized(true);
fieldType.setIndexOptions(IndexOptions.DOCS_AND_FREQS_AND_POSITIONS);
fieldType2.setStored(true);
fieldType2.setTokenized(false);
fieldType2.setIndexOptions(IndexOptions.DOCS);
for(int i = 0 ; i < datas.size() ; i++) {
Map<String,String> temp = (Map<String,String>) datas.get(i);
Document doc = new Document();
for(String key : temp.keySet()) {
if(key.equals(field))
continue;
doc.add(new Field(key, temp.get(key), fieldType2));
}
doc.add(new Field(field, temp.get(field), fieldType));
this.writer.addDocument(doc);
}
}
public void run(String path, String field, String extension, String separator) {
List<File> files = this.handler.getInputFiles(path, extension);
List<?> data = this.handler.readDocs(files, separator);
try {
System.out.println("start index");
indexDocs(data, field);
this.writer.commit();
this.writer.close();
System.out.println("done");
} catch (IOException e) {
e.printStackTrace();
}
}
public void run(String path) {
run(path, "search_field", "csv", "\t");
}
}
I made a simple search module using Java and Lucene.
The module consists of two phases: index and search.
In the index phase, it reads CSV files, converts each row to a Document, and adds it to an IndexWriter object using the IndexWriter.addDocument() method.
Finally, it calls the IndexWriter.commit() method.
This works well on my local PC (Windows),
but on an Ubuntu PC the IndexWriter.commit() method never finishes.
IndexWriter.flush() does not return either.
What is the problem?
I don't know why isFile() is always false for the element f in the files list traversed by the for loop in the FileInit class below. But if I delete the FileItem-related code, it returns to normal. It seems weird to me, and I didn't find a targeted answer; it might be some peculiarity of Java. This is a cloud-disk project I'm using to learn Java, and it uses Google's Gson library.
public class FileInit {
String path;
public String getPath() {
return path;
}
public void setPath(String path) {
this.path = path;
}
//Query files; returns a JSON string containing the file information
public String queryFiles() throws NoSuchAlgorithmException, IOException {
//Query the path and build a File collection
File file = new File(path);
File[] files = file.listFiles();
//Build a List to store the converted FileItem
List<FileItem> fileItems = new ArrayList<>();
//Traversing files to make judgments
for (File f:files){
FileItem fileItem = new FileItem(f);
fileItem.printFile();
fileItems.add(fileItem);
}
Gson gson = new Gson();
return gson.toJson(fileItems);
}
//formatted output
public void printFiles(String files){
Gson gson = new Gson();
Type type = new TypeToken<List<FileItem>>(){}.getType();
List<FileItem> fileItems = gson.fromJson(files, type);
// Format output file list
int count = 0;
for (FileItem f :fileItems ) {
String name;
if (f.getFileType()==null){
name = f.getFileName() + "/";
}else {
name = f.getFileName();
}
System.out.printf("%-40s", name);
count++;
if (count % 3 == 0) {
System.out.println();
}
}
System.out.println();
}
//Change directory command
public void changeDic(String addPath){
File fileDic1 = new File(path+addPath);
File fileDic2 = new File(addPath);
if (addPath.equals("..")) {
File parent = new File(path).getParentFile();
if (parent != null) {
this.path = parent.getPath();
}else{
System.out.println("Parent directory does not exist");
}
}
else {
if (fileDic1.exists()){
this.path = path+addPath;
} else if (fileDic2.exists()) {
this.path = addPath;
}else{
System.out.println("Illegal input path");
}
}
}
}
public class FileItem {
private String fileName;
private String fileHash;
private String filePath;
private long fileLength;
private String fileType;
//Construction method of FileItem
/*
A constructor that needs only fileName and filePath; it decides whether to compute the
hash value, file size, and file type depending on whether the path points to a folder or a file.
*/
public FileItem(String fileName,String filePath) {
this.fileName = fileName;
this.filePath = filePath;
File file =new File(this.filePath+"/"+this.fileName);
if (file.isFile()){
try {
//Get the file size through the file built-in method
this.fileLength = file.length();
// Define a regular expression for extracting the suffix of the file name
String regex = "\\.(\\w+)$";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(fileName);
// If the match is successful
if (matcher.find()) {
// Get the suffix of the file name
this.fileType = matcher.group(1);
}else{
this.fileType = null;
}
//Calculate the Hash value of the file by calling the FileHash method
this.fileHash=FileHash(file.getPath());
System.out.printf(fileHash);
System.out.print("\n");
} catch (NoSuchAlgorithmException | IOException e) {
throw new RuntimeException(e);
}
}else{
this.fileName=fileName;
this.fileLength=0;
this.fileType = null;
this.fileHash=null;
}
}
//Build a constructor that only needs a json file
public FileItem(String json){
Gson gson = new Gson();
FileItem fileItem = gson.fromJson(json,FileItem.class);
this.fileName=fileItem.getFileName();
this.filePath=fileItem.getFilePath();
this.fileHash=fileItem.getFileHash();
this.fileLength=fileItem.getFileLength();
this.fileType=fileItem.getFileType();
}
//Convert a FileItem back to a File
public File toFile(){
return new File(this.filePath+"/"+this.fileName);
}
public FileItem(File file) throws NoSuchAlgorithmException, IOException {
FileItem fileItem = new FileItem(file.getName(),file.getPath());
this.fileName=fileItem.getFileName();
this.filePath=fileItem.getFilePath();
this.fileHash=fileItem.getFileHash();
this.fileLength=fileItem.getFileLength();
this.fileType=fileItem.getFileType();
}
//Display FileItem related information
public void printFile(){
if (fileType == null) {
System.out.println(fileName + " is a folder");
} else {
System.out.println(fileName + " is a " + fileType + " file");
}
double fileLenOp=(double)fileLength;
//Different file sizes use different output methods
if(fileLength<=1024*1024){
System.out.printf("file size is:%.3fKB\n",fileLenOp/1024);
} else if ((fileLength<1024*1024*1024)) {
System.out.printf("file size is:%.3fMB\n",fileLenOp/1024/1024);
}else {
System.out.printf("file size is:%.3fGB\n",fileLenOp/1024/1024/1024);
}
System.out.println("file hash is:"+fileHash);
}
//Getter and Setter methods for FileItem
public String getFileName() {
return fileName;
}
public void setFileName(String fileName) {
this.fileName = fileName;
}
public String getFileHash() {
return fileHash;
}
public void setFileHash(String fileHash) {
this.fileHash = fileHash;
}
public String getFilePath() {
return filePath;
}
public void setFilePath(String filePath) {
this.filePath = filePath;
}
public long getFileLength() {
return fileLength;
}
public void setFileLength(long fileLength) {
this.fileLength = fileLength;
}
public String getFileType() {
return fileType;
}
public void setFileType(String fileType) {
this.fileType = fileType;
}
}
I try to query the file information in a folder using the File class methods and return it as a collection. Then I convert each File to a FileItem by traversing this collection, forming a new FileItem collection that can be serialized to JSON for network transmission. However, every FileItem in the generated collection is judged to be a folder. If I comment out the FileItem-related code, the File checks behave normally, but if I don't, isFile() always returns false.
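One detail that may explain this: File.getPath() already includes the file name, and the FileItem(File) constructor passes it as the directory part, so the inner constructor probes a path with the name doubled. A small sketch of the effect (the file path here is an assumption):

import java.io.File;

public class PathCheck {
    public static void main(String[] args) {
        File f = new File("/tmp/example.txt"); // assumes this file exists
        System.out.println(f.getPath()); // "/tmp/example.txt" - already contains the name
        // FileItem(File) effectively does this, probing "<full path>/<name>":
        File probed = new File(f.getPath() + "/" + f.getName());
        System.out.println(probed.isFile()); // false: "/tmp/example.txt/example.txt" does not exist
        // Probing with the parent directory instead behaves as expected:
        File ok = new File(f.getParent() + "/" + f.getName());
        System.out.println(ok.isFile()); // true
    }
}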
I have written a controller which currently defaults to MotorUploadService (for motor uploads), but I need a factory design so that,
based on parentPkId, it calls HealUploadService, TempUploadService, PersonalUploadService, etc., each of which has its own file-processing stages.
The controller is below.
@RequestMapping(value = "/csvUpload", method = RequestMethod.POST)
public List<String> csvUpload(@RequestParam String parentPkId, @RequestParam List<MultipartFile> files)
throws IOException, InterruptedException, ExecutionException, TimeoutException {
log.info("Entered method csvUpload() of DaoController.class");
List<String> response = new ArrayList<String>();
ExecutorService executor = Executors.newFixedThreadPool(10);
CompletionService<String> compService = new ExecutorCompletionService<String>(executor);
List< Future<String> > futureList = new ArrayList<Future<String>>();
for (MultipartFile f : files) {
compService.submit(new ProcessMutlipartFile(f ,parentPkId,uploadService));
futureList.add(compService.take());
}
for (Future<String> f : futureList) {
long timeout = 0;
System.out.println(f.get(timeout, TimeUnit.SECONDS));
response.add(f.get());
}
executor.shutdown();
return response;
}
Here is the ProcessMutlipartFile class, which implements the Callable interface. CompletionService's compService.submit() invokes it, which in turn executes the call() method, which processes a file.
public class ProcessMutlipartFile implements Callable<String>
{
private MultipartFile file;
private String temp;
private MotorUploadService motUploadService;
public ProcessMutlipartFile(MultipartFile file,String temp, MotorUploadService motUploadService )
{
this.file=file;
this.temp=temp;
this.motUploadService=motUploadService;
}
public String call() throws Exception
{
return motUploadService.csvUpload(temp, file);
}
}
Below is the MotorUploadService class, where I process the uploaded CSV file line by line and then call the validateCsvData() method to validate the data,
which returns an ErrorObject containing the line number and the errors associated with it.
If csvErrorRecords is null, the row is error-free and I proceed with saving to the DB;
otherwise I save the errorList to the DB and return an upload failure.
@Component
public class MotorUploadService {
@Value("${external.resource.folder}")
String resourceFolder;
public String csvUpload(String parentPkId, MultipartFile file) {
String OUT_PATH = resourceFolder;
try {
DateFormat df = new SimpleDateFormat("yyyyMMddhhmmss");
String[] nameParts = file.getOriginalFilename().split("\\.");
String fileName = nameParts[0] + df.format(new Date()) + "." + nameParts[1];
Path path = Paths.get(OUT_PATH, fileName);
Files.copy(file.getInputStream(), path, StandardCopyOption.REPLACE_EXISTING);
}
catch(IOException e){
e.printStackTrace();
return "Failed to Upload File...try Again";
}
List<TxnMpMotSlaveRaw> txnMpMotSlvRawlist = new ArrayList<TxnMpMotSlaveRaw>();
try {
BufferedReader br = new BufferedReader(new InputStreamReader(file.getInputStream()));
String line = "";
int header = 0;
int lineNum = 1;
TxnMpSlaveErrorNew txnMpSlaveErrorNew = new TxnMpSlaveErrorNew();
List<CSVErrorRecords> errList = new ArrayList<CSVErrorRecords>();
while ((line = br.readLine()) != null) {
// TO SKIP HEADER
if (header == 0) {
header++;
continue;
}
lineNum++;
header++;
// Use Comma As Separator
String[] csvDataSet = line.split(",");
CSVErrorRecords csvErrorRecords = validateCsvData(lineNum, csvDataSet);
System.out.println("Errors from csvErrorRecords is " + csvErrorRecords);
if (csvErrorRecords == null || csvErrorRecords.getRecordNo() == 0) {
//Function to Save to Db
} else {
// add to errList
continue;
}
}
if (txnMpSlaveErrorNew.getErrRecord().size() == 0) {
//save all
return "Successfully Uploaded " + file.getOriginalFilename();
}
else {
// save the error in db;
return "Failure as it contains Faulty Information" + file.getOriginalFilename();
}
} catch (IOException ex) {
ex.printStackTrace();
return "Failure Uploaded " + file.getOriginalFilename();
}
}
private TxnMpMotSlaveRaw saveCsvData(String[] csvDataSet, String parentPkId) {
/*
Mapping csvDataSet to PoJo
returning Mapped Pojo;
*/
}
private CSVErrorRecords validateCsvData(int lineNum, String[] csvDataSet) {
/*
Logic for Validation goes here
*/
}
}
How do I make this into a factory design pattern from the controller,
so that
parentPkId='Motor' calls MotorUploadService, and
parentPkId='Heal' calls HealUploadService?
I'm not very familiar with the factory design pattern, so please help me out.
Thanks in advance.
If I understood the question, in essence you would create an interface, and then return a specific implementation based upon the desired type.
So
public interface UploadService {
String csvUpload(String temp, MultipartFile file) throws IOException;
}
The particular implementations
public class MotorUploadService implements UploadService
{
public String csvUpload(String temp, MultipartFile file) throws IOException {
...
}
}
public class HealUploadService implements UploadService
{
public String csvUpload(String temp, MultipartFile file) throws IOException {
...
}
}
Then a factory
public class UploadServiceFactory {
public UploadService getService(String type) {
if ("Motor".equals(type)) {
return new MotorUploadService();
}
else if ("Heal".equals(type)) {
return new HealUploadService();
}
throw new IllegalArgumentException("Unknown upload type: " + type);
}
}
The factory might cache the particular implementations. One can also use an abstract class rather than an interface if appropriate.
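For example, a caching variant might look like this (a sketch; it assumes the service implementations are stateless and thread-safe, so one shared instance per type is safe):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class UploadServiceFactory {
    // One shared instance per type; safe only if the services hold no per-request state.
    private final Map<String, UploadService> cache = new ConcurrentHashMap<>();

    public UploadService getService(String type) {
        return cache.computeIfAbsent(type, t -> {
            if ("Motor".equals(t)) {
                return new MotorUploadService();
            } else if ("Heal".equals(t)) {
                return new HealUploadService();
            }
            throw new IllegalArgumentException("Unknown upload type: " + t);
        });
    }
}

In a Spring application you could also let the container do the caching by injecting the UploadService beans into the factory and looking them up by key.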
I think you currently have a class UploadService but that is really the MotorUploadService if I followed your code, so I would rename it to be specific.
Then in the controller, presumably having used injection for the UploadServiceFactory
...
for (MultipartFile f : files) {
UploadService uploadSrvc = uploadServiceFactory.getService(parentPkId);
compService.submit(new ProcessMutlipartFile(f, parentPkId, uploadSrvc));
futureList.add(compService.take());
}
And with some corresponding rework in your classes:
public class ProcessMutlipartFile implements Callable<String>
{
private MultipartFile file;
private String temp;
private UploadService uploadService;
// change to take the interface UploadService
public ProcessMutlipartFile(MultipartFile file,String temp, UploadService uploadService )
{
this.file=file;
this.temp=temp;
this.uploadService=uploadService;
}
public String call() throws Exception
{
return uploadService.csvUpload(temp, file);
}
}
I have 100 sentences of test data. I am trying to run sentiment analysis on them, but no matter what input String I use, I only get a positive estimation of the input string. Each sentence gets a return value of 1.0. Any idea why this might be happening? Even if I use negative example inputs from the .txt file, the result is a positive value.
public class StartSentiment
{
public static DoccatModel model = null;
public static String[] analyzedTexts = {"Good win"};
public static void main(String[] args) throws IOException {
// begin of sentiment analysis
trainModel();
for (int i = 0; i < analyzedTexts.length; i++) {
classifyNewText(analyzedTexts[i]);
}
}
private static String readFile(String pathname) throws IOException {
File file = new File(pathname);
StringBuilder fileContents = new StringBuilder((int)file.length());
Scanner scanner = new Scanner(file);
String lineSeparator = System.getProperty("line.separator");
try {
while(scanner.hasNextLine()) {
fileContents.append(scanner.nextLine() + lineSeparator);
}
return fileContents.toString();
} finally {
scanner.close();
}
}
public static void trainModel() {
MarkableFileInputStreamFactory dataIn = null;
try {
dataIn = new MarkableFileInputStreamFactory(
new File("src\\sentiment\\Results.txt"));
ObjectStream<String> lineStream = null;
lineStream = new PlainTextByLineStream(dataIn, StandardCharsets.UTF_8);
ObjectStream<DocumentSample> sampleStream = new DocumentSampleStream(lineStream);
TrainingParameters tp = new TrainingParameters();
tp.put(TrainingParameters.CUTOFF_PARAM, "1");
tp.put(TrainingParameters.ITERATIONS_PARAM, "100");
DoccatFactory df = new DoccatFactory();
model = DocumentCategorizerME.train("en", sampleStream, tp, df);
} catch (Exception e) {
e.printStackTrace();
}
// nothing to clean up here: MarkableFileInputStreamFactory is only a factory and has no close() method
}
public static void classifyNewText(String text) throws IOException{
DocumentCategorizerME myCategorizer = new DocumentCategorizerME(model);
double[] outcomes = myCategorizer.categorize(text.split(" ") );
String category = myCategorizer.getBestCategory(outcomes);
if (category.equalsIgnoreCase("1")){
System.out.print("The text is positive");
} else {
System.out.print("The text is negative");
}
}
}
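To check whether the model actually learned both labels, a helper like the following could be added to StartSentiment (a sketch; it assumes trainModel() has already run). If one label never shows up in the output, the training file is probably being parsed as a single-class corpus:

public static void debugScores(String text) {
    DocumentCategorizerME categorizer = new DocumentCategorizerME(model);
    double[] outcomes = categorizer.categorize(text.split(" "));
    // Print the probability of every category, not just the best one.
    for (int i = 0; i < categorizer.getNumberOfCategories(); i++) {
        System.out.println(categorizer.getCategory(i) + " = " + outcomes[i]);
    }
}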
I am experimenting with OpenNLP 1.7.2 and maxent-3.0.0.jar to train for the Thai language. Below is the code that reads the Thai training data and creates the bin model.
public class TrainPerson {
public static void main(String[] args) throws IOException {
String trainFile = "/Documents/workspace/ThaiOpenNLP/bin/thaiPerson.train";
String modelFile = "/Documents/workspace/ThaiOpenNLP/bin/th-ner-person.bin";
writePersonModel(trainFile, modelFile);
}
private static void writePersonModel(String trainFile, String modelFile)
throws FileNotFoundException, IOException {
Charset charset = Charset.forName("UTF-8");
InputStreamFactory fileInputStream = new MarkableFileInputStreamFactory(new File(trainFile));
ObjectStream<String> lineStream = new PlainTextByLineStream(fileInputStream, charset);
ObjectStream<NameSample> sampleStream = new NameSampleDataStream(lineStream);
TokenNameFinderModel model;
try {
model = NameFinderME.train("th", "person", sampleStream , TrainingParameters.defaultParams(), new TokenNameFinderFactory());
} finally {
sampleStream.close();
}
BufferedOutputStream modelOut = null;
try {
modelOut = new BufferedOutputStream(new FileOutputStream(modelFile));
model.serialize(modelOut);
} finally {
if (modelOut != null) {
modelOut.close();
}
}
}}
The Thai training data looks like the attached trainingData file.
I am using the output model to detect person names, as shown in the program below. It fails to identify the name.
public class ThaiPersonNameFinder {
static String modelFile = "/Users/avinashpaula/Documents/workspace/ThaiOpenNLP/bin/th-ner-person.bin";
public static void main(String[] args) {
try {
InputStream modelIn = new FileInputStream(new File(modelFile));
TokenNameFinderModel model = new TokenNameFinderModel(modelIn);
NameFinderME nameFinder = new NameFinderME(model);
String sentence[] = new String[]{
"จอห์น",
"30",
"ปี",
"จะ",
"เข้าร่วม",
"ก",
"เริ่มต้น",
"ขึ้น",
"บน",
"มกราคม",
"."
};
Span nameSpans[] = nameFinder.find(sentence);
for (int i = 0; i < nameSpans.length; i++) {
System.out.println(nameSpans[i]);
}
}
catch (IOException e) {
e.printStackTrace();
}
}
}
What am I doing wrong?
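One quick diagnostic (a sketch; NameFinderME.probs(Span[]) returns the model's probability for each span from the last find() call) is to print the span probabilities, or to confirm that no spans are returned at all:

Span[] nameSpans = nameFinder.find(sentence);
double[] probs = nameFinder.probs(nameSpans);
// Low probabilities, or an empty span array, suggest the model never
// learned the person entity from the training data.
for (int i = 0; i < nameSpans.length; i++) {
    System.out.println(nameSpans[i] + " p=" + probs[i]);
}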
For Lucene 3.6.2 I have the following Analyzer:
public final class StandardAnalyzerV36 extends Analyzer {
private Analyzer analyzer;
public StandardAnalyzerV36() {
analyzer = new StandardAnalyzer(Version.LUCENE_36);
}
public StandardAnalyzerV36(Set<?> stopWords) {
analyzer = new StandardAnalyzer(Version.LUCENE_36, stopWords);
}
@Override
public final TokenStream tokenStream(String fieldName, Reader reader) {
return analyzer.tokenStream(fieldName, new HTMLStripCharFilter(CharReader.get(reader)));
}
@Override
public final TokenStream reusableTokenStream(String fieldName, Reader reader) throws IOException {
return analyzer.reusableTokenStream(fieldName, reader);
}
}
Could you please help me port it to Lucene 5.5.0? The Analyzer interface changed in the new version.
UPDATED
I have reimplemented this Analyzer as follows:
public final class StandardAnalyzerV36 extends Analyzer {
public static final CharArraySet STOP_WORDS_SET = StopAnalyzer.ENGLISH_STOP_WORDS_SET;
@Override
protected TokenStreamComponents createComponents(String fieldName) {
final ClassicTokenizer src = new ClassicTokenizer();
TokenStream tok = new StandardFilter(src);
tok = new StopFilter(new LowerCaseFilter(tok), STOP_WORDS_SET);
return new TokenStreamComponents(src, tok);
}
@Override
protected Reader initReader(String fieldName, Reader reader) {
return new HTMLStripCharFilter(reader);
}
}
but my test fails on the following call:
tokens = LuceneUtils.tokenizeString(analyzer, "[{(RDBMS)}]");
public static List<String> tokenizeString(Analyzer analyzer, String string) {
List<String> result = new ArrayList<String>();
try {
TokenStream stream = analyzer.tokenStream(null, new StringReader(string));
stream.reset();
while (stream.incrementToken()) {
result.add(stream.getAttribute(CharTermAttribute.class).toString());
}
} catch (IOException e) {
// not thrown b/c we're using a string reader...
throw new RuntimeException(e);
}
return result;
}
with a following exception:
java.lang.IllegalStateException: TokenStream contract violation: close() call missing
at org.apache.lucene.analysis.Tokenizer.setReader(Tokenizer.java:90)
at org.apache.lucene.analysis.Analyzer$TokenStreamComponents.setReader(Analyzer.java:315)
at org.apache.lucene.analysis.Analyzer.tokenStream(Analyzer.java:143)
What is wrong with this code?
Finally I got it working:
public final class StandardAnalyzerV36 extends Analyzer {
public static final CharArraySet STOP_WORDS_SET = StopAnalyzer.ENGLISH_STOP_WORDS_SET;
@Override
protected TokenStreamComponents createComponents(String fieldName) {
final ClassicTokenizer src = new ClassicTokenizer();
TokenStream tok = new StandardFilter(src);
tok = new StopFilter(new LowerCaseFilter(tok), STOP_WORDS_SET);
return new TokenStreamComponents(src, tok);
}
@Override
protected Reader initReader(String fieldName, Reader reader) {
return new HTMLStripCharFilter(reader);
}
}
public class LuceneUtils {
public static List<String> tokenizeString(Analyzer analyzer, String string) {
List<String> result = new ArrayList<String>();
TokenStream stream = null;
try {
stream = analyzer.tokenStream(null, new StringReader(string));
stream.reset();
while (stream.incrementToken()) {
result.add(stream.getAttribute(CharTermAttribute.class).toString());
}
} catch (IOException e) {
// not thrown b/c we're using a string reader...
throw new RuntimeException(e);
} finally {
IOUtils.closeQuietly(stream);
}
return result;
}
}
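For what it's worth, the underlying cause: an Analyzer caches and reuses its TokenStreamComponents, and Tokenizer.setReader() refuses to run if the previous stream was never close()d, so the first leaked stream poisons every later tokenStream() call. Since TokenStream implements Closeable, the cleanup can also be written with try-with-resources (a sketch equivalent to the version above):

public static List<String> tokenizeString(Analyzer analyzer, String string) {
    List<String> result = new ArrayList<>();
    // try-with-resources guarantees the close() that the Analyzer reuse contract requires
    try (TokenStream stream = analyzer.tokenStream(null, new StringReader(string))) {
        stream.reset();
        while (stream.incrementToken()) {
            result.add(stream.getAttribute(CharTermAttribute.class).toString());
        }
        stream.end();
    } catch (IOException e) {
        throw new RuntimeException(e);
    }
    return result;
}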