I'm trying to use the pdf library from jpedal, using the code snippet found here: http://www.jpedal.org/simple_image_example.php
/** instance of PdfDecoder to convert PDF into image */
PdfDecoder decode_pdf = new PdfDecoder(true);
/** set mappings for non-embedded fonts to use */
FontMappings.setFontReplacements();
/** open the PDF file - can also be a URL or a byte array */
try {
    decode_pdf.openPdfFile("C:/myPDF.pdf"); //file
    //decode_pdf.openPdfFile("C:/myPDF.pdf", "password"); //encrypted file
    //decode_pdf.openPdfArray(bytes); //bytes is byte[] array with PDF
    //decode_pdf.openPdfFileFromURL("http://www.mysite.com/myPDF.pdf",false);
    /** get page 1 as an image */
    //page range if you want to extract all pages with a loop
    //int start = 1, end = decode_pdf.getPageCount();
    BufferedImage img = decode_pdf.getPageAsImage(1);
    /** close the pdf file */
    decode_pdf.closePdfFile();
} catch (PdfException e) {
    e.printStackTrace();
}
But on this line:
decode_pdf.openPdfFile("C:/myPDF.pdf"); //file
Eclipse throws an error:
The type javax.swing.JPanel cannot be resolved. It is indirectly
referenced from required .class files
It seems as if I'm missing javax.swing.*
IntelliSense does give me other javax.* options, but not the swing package.
I already searched Google for this, but I had no luck finding a solution.
Any ideas?
Can't get any clearer than this:
http://www.jpedal.org/PDFblog/2011/09/java-is-not-android/
It appears that the library I wanted to use is not compatible with Android at all.
Also note the last sentence:
that makes converting a Java PDF viewer to Android a major task.
Thanks for crushing that last bit of hope for me jpedal...
I suspect the path is not being resolved. Try this:
decode_pdf.openPdfFile("C:\\myPDF.pdf");
I am working on a project where I need to iterate through a file system, extract text from a pdf, and scan through that text. Previously, the file system was an N drive (which acts as a local file system), so using the java File API, I could access each pdf file. Using this method, I would then extract the text:
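// Imports assume iText 5.x package names
import java.io.File;
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.parser.PdfTextExtractor;

public static String returnStringOfPDFiText(File file)
{
    try {
        PdfReader reader = new PdfReader(file.getPath());
        int n = reader.getNumberOfPages();
        // StringBuilder avoids the literal "null" prefix that `null += ...` would produce
        StringBuilder pdfText = new StringBuilder();
        for (int i = 1; i <= n; i++)
        {
            // extract the text of page i (pages are 1-based)
            pdfText.append(PdfTextExtractor.getTextFromPage(reader, i));
        }
        reader.close();
        System.out.println(pdfText);
        return pdfText.toString();
    }
    catch (Exception e)
    {
        System.out.print(e);
        return null;
    }
}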
From here, I could scan through the text.
I now need to do the same thing against a Dropbox file system. However, I can only find a way to get each file's metadata, not the file itself, which I need in order to extract the text.
Is there a way to get the file so I can call this method on it to extract the text, or to extract the text directly from the Dropbox file?
Edit: I am working with the Dropbox API already (though I might be missing some methods, since I haven't read through much of the documentation). I am aware of the download method, but I don't want to use it: we will be working with around 1 GB of PDFs, and downloading all of it would be very inefficient.
Dropbox does offer an API you can use for listing, uploading, and downloading files, among other operations. You can find everything you need to get started with the Dropbox API, including documentation, tutorials, and SDKs here:
https://www.dropbox.com/developers
For Java specifically, we recommend you use the official Dropbox Java SDK:
https://github.com/dropbox/dropbox-sdk-java
To download a file's contents using that, you can use the download method:
https://dropbox.github.io/dropbox-sdk-java/api-docs/v5.2.0/com/dropbox/core/v2/files/DbxUserFilesRequests.html#download(java.lang.String)
You can find an example of that here:
https://github.com/dropbox/dropbox-sdk-java/blob/e52fc828c7c753e04c3fa9d47ab6de7e85d000c4/examples/tutorial/src/main/java/com/dropbox/core/examples/tutorial/Main.java#L54
I have created a Java application in Eclipse using JavaFX and exported it as a runnable jar file. However, when I run the .jar file, it fails with an error saying it cannot find my images. It works perfectly fine when I run it inside Eclipse.
My file structure:
In UserPane I have a function that takes the image name as "title" and returns the imageview:
InputStream is = null;
try {
    is = new FileInputStream("./Images/" + title + ".jpg");
} catch (FileNotFoundException e) {
    e.printStackTrace();
}
Image img = new Image(is);
ImageView imgview = new ImageView();
imgview.setImage(img);
imgview.setFitHeight(300);
imgview.setFitWidth(300);
return imgview;
I have tried some solutions, but it won't run in eclipse. I need it to run in eclipse and from the .jar.
I need it to be set in the imageview please.
Update: I did not use any of the suggestions mentioned because none of them were helpful, none of them worked. What worked for me is moving my .jar in the same folder where my images folder is, while still keeping the above code.
Thank you
FileInputStream is called that because it operates on files. In Java, a file means exactly what it says: a file is a thing on a disk someplace.
An entry in a jar file is NOT ITSELF a file!
Therefore, when you write new FileInputStream? You lost the game. That can never work.
What you're looking for is a thing that lets you ask the JVM to give you resources from the same place the JVM is loading your class files.
Fortunately, that exists!
URL url = UserPane.class.getResource("/Images/" + title + ".jpg");
This gets you a URL object, which you can pass to e.g. ImageIcon or, since this is JavaFX, to new Image(url.toExternalForm()).
If you want to instead read it directly:
try (InputStream in = UserPane.class.getResourceAsStream("/Images/" + title + ".jpg")) {
    // use 'in' here. It's `null` if the resource doesn't exist.
}
For this specific use case, you're looking for the URL variant. You don't need the InputStream at all.
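To make the null case concrete, here is a minimal sketch that uses only the JDK. The class name ResourceLookup, the method requireResource, and the resource name in main are made up for illustration; only Class.getResource comes from the answer above.

```java
import java.net.URL;

class ResourceLookup {
    /**
     * Looks up a classpath resource through this class.
     * A leading '/' makes the name absolute (resolved from the classpath root).
     * Throws instead of returning null so a missing image fails loudly.
     */
    static URL requireResource(String name) {
        URL url = ResourceLookup.class.getResource(name);
        if (url == null) {
            throw new IllegalArgumentException("Resource not found on classpath: " + name);
        }
        return url;
    }

    public static void main(String[] args) {
        try {
            // Hypothetical resource name; it resolves only if the file is packaged on the classpath.
            URL url = requireResource("/Images/example.jpg");
            System.out.println("Found: " + url);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Because the lookup goes through the classpath, the same call should behave identically inside Eclipse and from the jar, provided the Images folder is actually packaged as a resource.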
I am using the Tesseract Java API (tess4J) to convert Tiff images to PDFs.
This works nicely, but I am forced to write both the source Tiff image and the output PDF to local filestore as actual physical files in order to use the TessAPI1.TessPDFRendererCreate API.
Please note the following in the code snippet below: -
The input Tiff is originally a java.awt.image.BufferedImage, but I have to write it to a physical file (sourceTiffFile is a File object).
I must specify a file path for the output (pdfFullFilepath is a String representing an absolute path for the new PDF file).
try {
    ImageIO.write(bufferedImage, "tiff", sourceTiffFile);
} catch (Exception ioe) {
    //handling code...
}
TessResultRenderer renderer = TessAPI1.TessPDFRendererCreate(pdfFullFilepath, dataPath, 0);
TessAPI1.TessResultRendererInsert(renderer, TessAPI1.TessPDFRendererCreate(pdfFullFilepath, dataPath, 0));
int result = TessAPI1.TessBaseAPIProcessPages(handle, sourceTiffFile.getAbsolutePath(), null, 0, renderer);
I would really like to avoid creating physical files, but am not sure if it is possible with this API. Ideally, I would like to pass the Tiff as a java.awt.image.BufferedImage or a byte array and receive the output PDF as a byte array.
Any suggestions would be most welcome as always. Thank you :)
You can pass a Pix, which can be converted from a BufferedImage, to the ProcessPage API method, but the output will still be a physical file. The Tesseract API dictates that.
https://tesseract-ocr.github.io/tessapi/4.0.0/a01625.html
http://tess4j.sourceforge.net/docs/docs-4.4/net/sourceforge/tess4j/TessAPI1.html
For example:
int result = TessAPI1.TessBaseAPIProcessPage(handle, LeptUtils.convertImageToPix(bufferedImage), page_index, "input file name", null, 0, renderer);
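Since the output has to be a physical file, one possible workaround is to confine that file to a short-lived temp file and hand bytes back to the caller. The sketch below uses only the JDK; TempFileBridge and the FileStep interface are hypothetical stand-ins for the file-based renderer calls and are not part of Tess4J. Depending on how the renderer derives its final file name from the path it is given, the output path passed to the step may need adjusting.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

class TempFileBridge {
    /** Hypothetical stand-in for any conversion step that insists on file paths. */
    interface FileStep {
        void run(Path input, Path output) throws IOException;
    }

    /** Writes the input to a temp file, runs the step, and returns the output file's bytes. */
    static byte[] convert(byte[] inputBytes, FileStep step) throws IOException {
        Path in = Files.createTempFile("convert-in-", ".tif");
        Path out = Files.createTempFile("convert-out-", ".pdf");
        try {
            Files.write(in, inputBytes);
            step.run(in, out);
            return Files.readAllBytes(out);
        } finally {
            // Remove both temp files whether or not the step succeeded.
            Files.deleteIfExists(in);
            Files.deleteIfExists(out);
        }
    }
}
```

The caller then deals only in byte arrays, while the files exist just for the duration of one conversion.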
I have to create a function that tells me whether an uploaded PDF file is encrypted only, timestamped only, neither, or both.
So far I am using PDFBox 2.0.5, and I have only succeeded in telling timestamped-only files apart from files that are neither.
Here is the current code:
try {
    InputStream fis = new ByteArrayInputStream(file.getBytes());
    PDDocument d = PDDocument.load(fis); //an encrypted file throws InvalidPasswordException here
    //***file not encrypted***
    List<PDSignature> a = d.getSignatureDictionaries();
    List<PDSignatureField> b = d.getSignatureFields();
    if (a.size() == 0 && b.size() == 0) {
        //***file not timestamped***
    } else {
        //***file timestamped***
    }
} catch (InvalidPasswordException e) {
    //***file encrypted***
    //***is file timestamped?***
    //***is file not timestamped?***
} catch (IOException e) {
}
My question is: how can I tell whether there is a timestamp (as a signature) on an encrypted PDF file?
Note: I have found several PDFBox usage examples, but rarely any that use the latest PDFBox version.
Update note: file is an uploaded MultipartFile object, and I am having a hard time getting its path without copying or transferring it into another object type.
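The four outcomes the function has to report are just the combinations of two independent checks. As a small sketch of that result type (the names PdfStatusDemo, PdfStatus, and classify are made up here, and how the two flags are obtained from PDFBox is deliberately left out):

```java
class PdfStatusDemo {
    enum PdfStatus { NONE, ENCRYPTED_ONLY, TIMESTAMPED_ONLY, BOTH }

    /** Combines the two independent checks into one of the four outcomes. */
    static PdfStatus classify(boolean encrypted, boolean timestamped) {
        if (encrypted && timestamped) return PdfStatus.BOTH;
        if (encrypted) return PdfStatus.ENCRYPTED_ONLY;
        if (timestamped) return PdfStatus.TIMESTAMPED_ONLY;
        return PdfStatus.NONE;
    }

    public static void main(String[] args) {
        System.out.println(classify(false, true)); // prints TIMESTAMPED_ONLY
    }
}
```

Keeping the classification separate from the detection means the open part of the question, detecting a timestamp on an encrypted file, only has to produce one boolean.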
I'm working on a project that entails photographing text (from any hard copy of text) and converting that text into a text file. Then I'd like to use that text file to do some different things, such as provide hyperlinks to news articles or allow the user to edit the document.
The tool I've tried so far is Java OCR from sourceforge.net, which works fine on the images provided in the package. But when I photograph my own text, it doesn't work at all. Is there some training process I should be implementing? If so, does anybody know how to implement it? Any help will go a long way. Thank you!
I have a Java application where I ended up deciding to use Tesseract OCR and simply call out to it using Runtime.exec(). Perhaps not quite the answer you need, but just in case you hadn't considered it.
Edit + code added in response to comment reply
On a Windows installation I think I was able to use an installer, or unzip a ready made binary.
On a Linux server, I needed to compile Tesseract myself, but it's not too hard if you're used to that kind of thing (gcc); the only gotcha is that there's a dependency on Leptonica which also needs to be compiled.
// Tesseract can only handle .tif format, so we have to convert it
ImageIO.write(ImageIO.read(new java.io.File(file.getPath())), "tif", tmpFile[0]);
String[] tesseractCmd = new String[]{"tesseract", tmpFile[0].getAbsolutePath(), StringUtils.removeEnd(tmpFile[1].getAbsolutePath(), ".txt")};
final Process process = Runtime.getRuntime().exec(tesseractCmd);
try {
    int exitValue = process.waitFor();
    if (exitValue == 0) {
        final String extractedText = SearchableTextExtractionUtils.extractPlainText(new FileReader(tmpFile[1]));
        return extractedText;
    }
    throw new SearchableTextExtractionException(exitValue, Arrays.toString(tesseractCmd));
} catch (InterruptedException e) {
    throw new SearchableTextExtractionException(e);
} finally {
    process.destroy();
}
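The same call can also be sketched with ProcessBuilder and no third-party helpers. This is a minimal illustration, not the code from the application above: the class TesseractCommand and its methods are hypothetical, and it assumes a tesseract executable is on the PATH.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

class TesseractCommand {
    /**
     * Builds "tesseract <image> <outputBase>". Tesseract appends ".txt" to the
     * output base itself, so a trailing ".txt" is stripped from the target name.
     */
    static List<String> build(Path image, Path textFile) {
        String out = textFile.toString();
        String outputBase = out.endsWith(".txt") ? out.substring(0, out.length() - 4) : out;
        return Arrays.asList("tesseract", image.toString(), outputBase);
    }

    /** Runs the command and returns its exit code; assumes tesseract is on the PATH. */
    static int run(List<String> command) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command)
                .inheritIO() // forward the tool's output so its pipes cannot fill up and block it
                .start();
        try {
            return process.waitFor();
        } finally {
            process.destroy();
        }
    }
}
```

A caller would pass the result of build(...) to run(...) and read the text file only when the exit code is 0, mirroring the flow in the snippet above.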