I am trying to use the ghost4j library to convert a PDF to an image. My code looks like this:
import org.ghost4j.document.*;
import org.ghost4j.renderer.*;
import java.util.ArrayList;
import java.util.List;
...
PDFDocument document = new PDFDocument();
document.load(new File("M:\\test.pdf"));
SimpleRenderer renderer = new SimpleRenderer();
renderer.setResolution(300);
List<Image> images = renderer.render(document);
but I get the following error:
error: cannot find symbol
    List<Image> images = renderer.render(document);
         ^
  symbol:   class Image
  location: class ImageRead
1 error
which indicates that the compiler cannot find the symbol "Image". I'm not sure why.
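For what it's worth, ghost4j's SimpleRenderer.render() appears to return a list of java.awt.Image objects, so the cannot-find-symbol error usually means the AWT Image import (and java.io.File for load()) is missing. A minimal sketch, assuming that is the only problem (the class name ImageRead is taken from the error message):

import java.awt.Image;   // the Image type the compiler cannot find
import java.io.File;
import java.util.List;
import org.ghost4j.document.PDFDocument;
import org.ghost4j.renderer.SimpleRenderer;

public class ImageRead {
    public static void main(String[] args) throws Exception {
        PDFDocument document = new PDFDocument();
        document.load(new File("M:\\test.pdf"));
        SimpleRenderer renderer = new SimpleRenderer();
        renderer.setResolution(300);
        List<Image> images = renderer.render(document);
        System.out.println("Rendered " + images.size() + " page(s)");
    }
}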
I have a model obtained from a Weka classifier and I want to test it in Java code, but when I read the instances, an error appears:
Exception in thread "main" java.io.IOException: keyword #relation expected, read Token[Word], line 1
at weka.core.Instances.errms(Instances.java:1863)
at weka.core.Instances.readHeader(Instances.java:1740)
at weka.core.Instances.<init>(Instances.java:119)
at licenta1.LoadModelWeka.main(LoadModelWeka.java:18)
My code is:
package licenta1;

import weka.core.Instances;
import weka.classifiers.bayes.NaiveBayes;
import weka.classifiers.trees.J48;
import weka.classifiers.Evaluation;
import java.util.Random;
import java.io.BufferedReader;
import java.io.FileReader;

public class LoadModelWeka {
    public static void main(String[] args) throws Exception {
        // training
        BufferedReader reader = new BufferedReader(new FileReader("D:\\aaaaaaaaaaaaaaaaaaaaaa\\Licenta\\BioArtLicTrainSetTask1.csv"));
        Instances train = new Instances(reader);
        train.setClassIndex(0);
        reader.close();

        NaiveBayes nb = new NaiveBayes();
        nb.buildClassifier(train);

        Evaluation eval = new Evaluation(train);
        eval.crossValidateModel(nb, train, 10, new Random(1));
        System.out.println(eval.toSummaryString("\n Results \n=====\n", true));
        System.out.println(eval.fMeasure(1) + " " + eval.precision(1) + " " + eval.recall(1) + " ");
    }
}
Can somebody help me? My training set is in .csv format.
This snippet loads CSV content directly and converts it to Instances. Weka operations usually use .arff files; this loader converts CSV files to ARFF internally and then to the Instances class.
import weka.core.converters.CSVLoader;
import java.io.File;

CSVLoader loader = new CSVLoader();
loader.setSource(new File("filename.csv"));
Instances trainingDataSet = loader.getDataSet();
Instead of using a BufferedReader you can try DataSource (from weka.core.converters.ConverterUtils):
DataSource source = new DataSource("/some/where/data.arff");
For more information, visit this link: http://weka.wikispaces.com/Use+WEKA+in+your+Java+code
I'm using Weka jar 3.7.10, and this is how I load a CSV using Weka:
DataSource source1 = new DataSource("D:\\aaaaaaaaaaaaaaaaaaaaaa\\Licenta\\BioArtLicTrainSetTask1.csv");
Instances pred_test = source1.getDataSet();
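Putting the pieces together, here is a minimal end-to-end sketch based on the code in the question, but loading the CSV through DataSource instead of a BufferedReader (the class name LoadCsvWeka is hypothetical; the class index of 0 is taken from the original code):

import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.bayes.NaiveBayes;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class LoadCsvWeka {
    public static void main(String[] args) throws Exception {
        // DataSource picks the right converter (CSV, ARFF, ...) from the file extension
        DataSource source = new DataSource("D:\\aaaaaaaaaaaaaaaaaaaaaa\\Licenta\\BioArtLicTrainSetTask1.csv");
        Instances train = source.getDataSet();
        train.setClassIndex(0); // first column is the class, as in the question

        NaiveBayes nb = new NaiveBayes();
        nb.buildClassifier(train);

        Evaluation eval = new Evaluation(train);
        eval.crossValidateModel(nb, train, 10, new Random(1));
        System.out.println(eval.toSummaryString("\n Results \n=====\n", true));
    }
}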
EDIT: I have 10,000 identical files, each containing the text below.
TRANSACTION_ID=9093626660000000001,VAULT_REPORT_NAME=VUpldr_QA_1Mb_Report00001,DIMENSION=REGION:Europe;LOB:All LOB;CATEGORY:RO Reporting;CUSTOMER:All Customer;FREQUENCY:Daily;REPORT AUDIENCE:Apple RO Reporting;REPORT SUBSCRIPTION:Apple RO Reporting
My requirement is to make the following replacements across the 10,000 files: TRANSACTION_ID from 9093626660000000001 to 9093626660000010000, and VAULT_REPORT_NAME from VUpldr_QA_1Mb_Report00001 to VUpldr_QA_1Mb_Report10000.
So my output files' contents would be:
1st file:
TRANSACTION_ID=9093626660000000001,VAULT_REPORT_NAME=VUpldr_QA_1Mb_Report00001,DIMENSION=REGION:Europe;LOB:All LOB;CATEGORY:RO Reporting;CUSTOMER:All Customer;FREQUENCY:Daily;REPORT AUDIENCE:Apple RO Reporting;REPORT SUBSCRIPTION:Apple RO Reporting
2nd file:
TRANSACTION_ID=9093626660000000002,VAULT_REPORT_NAME=VUpldr_QA_1Mb_Report00002,DIMENSION=REGION:Europe;LOB:All LOB;CATEGORY:RO Reporting;CUSTOMER:All Customer;FREQUENCY:Daily;REPORT AUDIENCE:Apple RO Reporting;REPORT SUBSCRIPTION:Apple RO Reporting
10000th file:
TRANSACTION_ID=9093626660000010000,VAULT_REPORT_NAME=VUpldr_QA_1Mb_Report10000,DIMENSION=REGION:Europe;LOB:All LOB;CATEGORY:RO Reporting;CUSTOMER:All Customer;FREQUENCY:Daily;REPORT AUDIENCE:Apple RO Reporting;REPORT SUBSCRIPTION:Apple RO Reporting
I wrote the code below, but it isn't working:
import java.io.*;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class mdScript {
    public static void main(String[] args) throws IOException {
        for (int i = 1; i <= 10000; i++) {
            StringBuffer pathVarBuf = new StringBuffer();
            pathVarBuf.append("/Users/564169/Desktop/Vault_Testing/vUpldr/MD/1MbReport");
            pathVarBuf.append(i);
            pathVarBuf.append(".md");
            //System.out.println(pathVarBuf);
            Path path = Paths.get(pathVarBuf.toString());
            Charset charset = StandardCharsets.UTF_8;
            String content = new String(Files.readAllBytes(path), charset);
            content = content.replaceAll("VUpldr_QA_1Mb_Report00001", "RVUpldr_QA_1Mb_Report" + i);
            int id = 999900001; // (Transaction id in the org MD file is 90)
            id = id + i;
            content = content.replaceAll("999900001", Integer.toString(id));
            Files.write(path, content.getBytes(charset));
            System.out.println(content);
        }
    }
}
I am getting the error below:
Exception in thread "main" java.lang.NullPointerException
at sun.nio.fs.UnixFileSystem.getPath(UnixFileSystem.java:267)
at java.nio.file.Paths.get(Paths.java:84)
at mdScript.main(mdScript.java:22)
There are many ways to do this; you can do it programmatically as well (see the sketch below).
A simple solution is to use TextPad. Here is a link for the same:
TextPad find and replace in files
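For the programmatic route, here is a rough sketch based on the NIO code in the question. The path and file naming (1MbReport1.md ... 1MbReport10000.md) are taken from the question's loop; the class name MdScriptFixed, the zero-padded counter, and the long arithmetic for the transaction id are my assumptions about the intended result:

import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class MdScriptFixed {
    public static void main(String[] args) throws Exception {
        for (int i = 1; i <= 10000; i++) {
            Path path = Paths.get("/Users/564169/Desktop/Vault_Testing/vUpldr/MD/1MbReport" + i + ".md");
            String content = new String(Files.readAllBytes(path), StandardCharsets.UTF_8);
            // file i gets transaction id 9093626660000000000 + i (file 1 keeps its original id)
            content = content.replace("9093626660000000001", Long.toString(9093626660000000000L + i));
            // zero-pad the report counter to five digits: Report00001 ... Report10000
            content = content.replace("VUpldr_QA_1Mb_Report00001", String.format("VUpldr_QA_1Mb_Report%05d", i));
            Files.write(path, content.getBytes(StandardCharsets.UTF_8));
        }
    }
}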
I have written Java code to convert images into text, but my code takes only a single image as input. I want the program to fetch images from a folder and then run OCR on them.
My code is:
import java.io.FileOutputStream;
import org.bytedeco.javacpp.*;
import org.junit.Test;
import static org.bytedeco.javacpp.lept.*;
import static org.bytedeco.javacpp.tesseract.*;
import static org.junit.Assert.assertTrue;
import java.io.File;

public class BasicTesseractExampleTest {

    @Test
    public void givenTessBaseApi_whenImageOcrd_thenTextDisplayed() throws Exception {
        BytePointer outText;
        TessBaseAPI api = new TessBaseAPI();
        // Initialize tesseract-ocr with English, without specifying the tessdata path
        if (api.Init(".", "ENG") != 0) {
            System.err.println("Could not initialize tesseract.");
            System.exit(1);
        }
        PIX image = pixRead("IMG_0012 (1).jpg");
        api.SetImage(image);
        // Get the OCR result
        outText = api.GetUTF8Text();
        String string = outText.getString();
        assertTrue(!string.isEmpty());
        System.out.println(string);
        // Destroy used object and release memory
        api.End();
        outText.deallocate();
        pixDestroy(image);
    }
}
To read the list of files in a given path, use for example:
File f = new File("C:/programs");
File[] fileArray = f.listFiles();
Now you can check each File from the fileArray to see whether it is a directory and skip it with:
if(fileArray[0].isDirectory()) continue;
To find the images you can check, for example, the ending of the file name with:
fileArray[0].getName().endsWith(".jpg")
Do this check for all files in the fileArray and call your method with the matching files. To pick the right file you have to change this line of your code:
PIX image = pixRead("IMG_0012 (1).jpg");
and pass fileArray[?], where ? is replaced with the right index.
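Putting that together with the OCR code from the question, a loop over a folder might look roughly like this (the folder path "C:/images" is a placeholder, and api is assumed to be the already initialized TessBaseAPI from the question):

File folder = new File("C:/images");          // placeholder folder containing the images
File[] fileArray = folder.listFiles();
if (fileArray != null) {
    for (File file : fileArray) {
        if (file.isDirectory()) continue;                 // skip sub-directories
        if (!file.getName().endsWith(".jpg")) continue;   // keep only .jpg images
        PIX image = pixRead(file.getAbsolutePath());      // same API call as in the question
        api.SetImage(image);
        BytePointer outText = api.GetUTF8Text();
        System.out.println(file.getName() + ": " + outText.getString());
        outText.deallocate();
        pixDestroy(image);
    }
}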
I have an ANTLR grammar called "Test.g4", and with ANTLRWorks 2 I generated without any problems the files Test.tokens, TestBaseListener.java, TestLexer.java, TestLexer.tokens, TestListener.java and TestParser.java.
Now I want to use the grammar in my program Test.java:
import org.antlr.v4.runtime.*;
import org.antlr.v4.runtime.tree.*;

public class Test {
    public static void main(String[] args) throws Exception {
        // create a CharStream that reads from standard input
        ANTLRInputStream input = new ANTLRInputStream(System.in);
        // create a lexer that feeds off of input CharStream
        TestLexer lexer = new TestLexer(input);
        // create a buffer of tokens pulled from the lexer
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        // create a parser that feeds off the tokens buffer
        TestParser parser = new TestParser(tokens);
        ParseTree tree = parser.init(); // begin parsing at init rule
        System.out.println(tree.toStringTree(parser)); // print LISP-style tree
    }
}
When I try to compile it with "javac -classpath /path/java2/antlr-4.4-complete.jar Test.java" I get these errors:
Test.java:19: error: cannot find symbol
TestLexer lexer = new TestLexer(input);
^
symbol: class TestLexer
location: class Test
Test.java:19: error: cannot find symbol
TestLexer lexer = new TestLexer(input);
^
symbol: class TestLexer
location: class Test
Test.java:25: error: cannot find symbol
TestParser parser = new TestParser(tokens);
^
symbol: class TestParser
location: class Test
Test.java:25: error: cannot find symbol
TestParser parser = new TestParser(tokens);
^
symbol: class TestParser
location: class Test
4 errors
Thank you!
TestLexer.java and TestParser.java should also be compiled together with Test.java in the same command; otherwise the compiler does not know where to find those classes. Try calling javac as follows:
javac -classpath /path/java2/antlr-4.4-complete.jar *.java
Or manually pass all files:
javac -classpath /path/java2/antlr-4.4-complete.jar Test.java TestLexer.java TestParser.java
You need to build and import the lexer and parser generated by ANTLR. To do that you need to (see the sketch after this list):
add an import statement to the file with your main method
put the classes generated by ANTLR into a package matching your import statement
build both the generated classes and the class with your main method
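For illustration only: if the generated classes were placed in a package named antlrgen (a hypothetical name), the arrangement would look roughly like this:

// TestLexer.java and TestParser.java (generated) would each start with
//     package antlrgen;
// and live in a folder named antlrgen next to Test.java.

import antlrgen.TestLexer;   // import the generated lexer
import antlrgen.TestParser;  // import the generated parser
import org.antlr.v4.runtime.*;
import org.antlr.v4.runtime.tree.*;

public class Test {
    public static void main(String[] args) throws Exception {
        ANTLRInputStream input = new ANTLRInputStream(System.in);
        TestLexer lexer = new TestLexer(input);
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        TestParser parser = new TestParser(tokens);
        ParseTree tree = parser.init();
        System.out.println(tree.toStringTree(parser));
    }
}

Then compile everything together, e.g. javac -classpath /path/java2/antlr-4.4-complete.jar Test.java antlrgen/*.java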
Using workbook.getAllPictures() I can get an array of picture data, but unfortunately it is only the data, and those objects have no methods for accessing the name of the picture or any other related information.
There is an HSSFPicture class which would contain all the details of the picture, but how do I get, for example, an array of those objects from the xls?
Update:
I found the SO question How can I find a cell, which contain a picture in apache poi, which has a method for looping through all the pictures in the worksheet. That works.
Now that I was able to try the HSSFPicture class, I found out that the getFileName() method returns the file name without the extension. I can use getPictureData().suggestFileExtension() to get a suggested file extension, but I really need the extension the picture had when it was added to the xls file. Is there a way to get it?
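For reference, a loop like the one described in that linked answer might look roughly like this (a sketch, not the code from the link; it assumes an HSSFSheet variable named sheet), which also shows why only a suggested extension is available:

HSSFPatriarch patriarch = sheet.getDrawingPatriarch();
for (HSSFShape shape : patriarch.getChildren()) {
    if (shape instanceof HSSFPicture) {
        HSSFPicture picture = (HSSFPicture) shape;
        // getFileName() has no extension; suggestFileExtension() only guesses from the image bytes
        System.out.println(picture.getFileName() + " -> ." + picture.getPictureData().suggestFileExtension());
    }
}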
Update 2:
The pictures are added into the xls with a macro. This is the part of the macro that adds the images to the sheet; fname is the full path and imageName is the file name, both including the extension.
Set img = Sheets("Receipt images").Pictures.Insert(fname)
img.Left = 10
img.top = top + 10
img.Name = imageName
Set img = Nothing
The routine to check if the picture already exists in the Excel file.
For Each img In Sheets("Receipt images").Shapes
If img.Name = imageName Then
Set foundImage = img
Exit For
End If
Next
This recognizes that "image.jpg" is different from "image.gif", so the img.Name includes the extension.
The shape names are not exposed in the default POI objects, so if we need them we have to deal with the underlying objects. For shapes in HSSF that is mainly the EscherAggregate (http://poi.apache.org/apidocs/org/apache/poi/hssf/record/EscherAggregate.html), which we can get from the sheet. From its parent class AbstractEscherHolderRecord we can get all EscherOptRecords, which contain the options of the shapes. Among those options you can also find the groupshape.shapenames.
My example is not a complete solution; it is only provided to show which objects could be used to achieve this.
Example:
import org.apache.poi.hssf.usermodel.*;
import org.apache.poi.ss.usermodel.*;
import java.io.FileOutputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.FileInputStream;
import java.io.InputStream;
import org.apache.poi.openxml4j.exceptions.InvalidFormatException;
import org.apache.poi.hssf.record.*;
import org.apache.poi.ddf.*;
import java.util.List;
import java.util.ArrayList;

class ShapeNameTestHSSF {

    public static void main(String[] args) {
        try {
            InputStream inp = new FileInputStream("workbook1.xls");
            Workbook wb = WorkbookFactory.create(inp);
            Sheet sheet = wb.getSheetAt(0);

            EscherAggregate escherAggregate = ((HSSFSheet) sheet).getDrawingEscherAggregate();
            EscherContainerRecord escherContainer = escherAggregate.getEscherContainer().getChildContainers().get(0);
            //throws java.lang.NullPointerException if no Container present

            List<EscherRecord> escherOptRecords = new ArrayList<EscherRecord>();
            escherContainer.getRecordsById(EscherOptRecord.RECORD_ID, escherOptRecords);

            for (EscherRecord escherOptRecord : escherOptRecords) {
                for (EscherProperty escherProperty : ((EscherOptRecord) escherOptRecord).getEscherProperties()) {
                    System.out.println(escherProperty.getName());
                    if (escherProperty.isComplex()) {
                        System.out.println(new String(((EscherComplexProperty) escherProperty).getComplexData(), "UTF-16LE"));
                    } else {
                        if (escherProperty.isBlipId()) System.out.print("BlipId = ImageId = ");
                        System.out.println(((EscherSimpleProperty) escherProperty).getPropertyValue());
                    }
                    System.out.println("=============================");
                }
                System.out.println(":::::::::::::::::::::::::::::");
            }

            FileOutputStream fileOut = new FileOutputStream("workbook1.xls");
            wb.write(fileOut);
            fileOut.flush();
            fileOut.close();

        } catch (InvalidFormatException ifex) {
        } catch (FileNotFoundException fnfex) {
        } catch (IOException ioex) {
        }
    }
}
Again: this is not a ready-to-use solution. A ready-to-use solution cannot be provided here because of the complexity of the EscherRecords. To get the correct EscherRecords for the image shapes and their related EscherOptRecords, you may have to loop recursively through all EscherRecords in the EscherAggregate, checking whether each one is a container record and, if so, looping through its children, and so on.
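A rough sketch of that recursive walk, under the assumption that collecting every EscherOptRecord in the tree is enough for your case (the method name collectOptRecords is mine):

// Recursively collect all EscherOptRecords below the given record.
static void collectOptRecords(EscherRecord record, List<EscherOptRecord> found) {
    if (record instanceof EscherOptRecord) {
        found.add((EscherOptRecord) record);
    }
    if (record instanceof EscherContainerRecord) {
        for (EscherRecord child : ((EscherContainerRecord) record).getChildRecords()) {
            collectOptRecords(child, found);
        }
    }
}

// usage: start from the records held by the EscherAggregate
// List<EscherOptRecord> optRecords = new ArrayList<EscherOptRecord>();
// for (EscherRecord r : escherAggregate.getEscherRecords()) {
//     collectOptRecords(r, optRecords);
// }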
Start here:
http://poi.apache.org/spreadsheet/quick-guide.html#Images
This tutorial can help you extract an image's information from an xls spreadsheet using Apache POI.