I have a REST webservice built with Jersey that does OCR (Optical Character Recognition) using Tesseract via the Tess4J Java binding. Now the Tess4J library expects you to send it an image file (png, jpg, tif amongst others), but with Jersey processing I get an InputStream that contains the image.
How do I convert this InputStream to a file type that Tesseract would recognise? I've tried the following:
import org.apache.commons.io.IOUtils;
.....
private static File stream2file (InputStream in) throws IOException {
final File tempFile = File.createTempFile("stream2file", ".tmp");
tempFile.deleteOnExit();
try (FileOutputStream out = new FileOutputStream(tempFile)) {
IOUtils.copy(in, out);
}
return tempFile;
}
But then the Tesseract library throws an exception saying that it doesn't accept the file type I'm sending (which in this case is 'tmp'). I've tried changing 'tmp' to 'tif' and other supported file types, but that yielded the same result, so I'm obviously missing something here.
So how can I take an InputStream, convert it, and forward it to Tesseract as one of the supported file types that it expects?
The file extension of the temp file has to match that of the original input image file.
Besides a File, Tess4J also accepts a BufferedImage as input. Just read the InputStream into one, as follows:
BufferedImage image = ImageIO.read(is);
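A slightly more defensive version of that call might look like the sketch below. ImageIO.read returns null when no registered reader recognises the data, so it checks for that case explicitly. The class and method names (StreamImages, readImage) are made up for illustration.

```java
import java.awt.image.BufferedImage;
import java.io.IOException;
import java.io.InputStream;
import javax.imageio.ImageIO;

// Hypothetical helper, not part of Tess4J or Jersey.
final class StreamImages {
    private StreamImages() {}

    /** Decodes the stream into a BufferedImage; fails if no ImageIO reader understands the data. */
    static BufferedImage readImage(InputStream in) throws IOException {
        BufferedImage image = ImageIO.read(in);
        if (image == null) {
            // ImageIO.read signals "unknown format" with null rather than an exception
            throw new IOException("No ImageIO reader could decode the stream");
        }
        return image;
    }
}
```

The returned image can then be handed to the BufferedImage overload mentioned above instead of a temp file.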
try (FileOutputStream out = new FileOutputStream(tempFile)). This line is not the source of the error.
FileOutputStream has both a FileOutputStream(File) and a FileOutputStream(String) constructor, so passing the File object is valid.
If you do want the String form, pass tempFile.getPath() rather than tempFile.getName().
getName() returns only the last path component, so the stream would write to a file of that name in the current working directory instead of to the temp file.
Related
I am trying to read and write an image dataset in Hadoop using Java. I am only familiar with the normal BufferedImage.
folder = new File("");
img = ImageIO.read(f);
ImageIO.read accepts an InputStream
You can get an InputStream from an HDFS file using the FileSystem class's open function
I got a strange issue with a GIF image in Java. The image is provided by an XML API as Base64 encoded string. To decode the Base64, I use the commons-codec library in version 1.13.
When I just decode the Base64 string and write the bytes out to a file, the image shows properly in browsers and MS Paint (nothing else to test here).
final String base64Gif = "[Base64 as provided by API]";
final byte[] sigImg = Base64.decodeBase64(base64Gif);
File sigGif = new File("C:/Temp/pod_1Z12345E5991872040.org.gif");
try (FileOutputStream fos = new FileOutputStream(sigGif)) {
fos.write(sigImg);
fos.flush();
}
The resulting file opened in MS Paint:
But when I now start consuming this file using Java (for example creating a PDF document from HTML using the openhtmltopdf library), it is corrupted and does not show properly.
final String htmlLetterStr = "[HTML as provided by API]";
final Document doc = Jsoup.parse(htmlLetterStr);
try (FileOutputStream fos = new FileOutputStream(new File("C:/Temp/letter_1Z12345E5991872040.pdf"))) {
PdfRendererBuilder builder = new PdfRendererBuilder();
builder.useFastMode();
builder.withW3cDocument(new W3CDom().fromJsoup(doc), "file:///C:/Temp/");
builder.toStream(fos);
builder.useDefaultPageSize(210, 297, BaseRendererBuilder.PageSizeUnits.MM);
builder.run();
fos.flush();
}
When I now open the resulting PDF, the image created above looks like this. It seems that only the first pixel lines are printed, some layer is missing, or something like that.
The same happens, if I read the image again with ImageIO and try to convert it into PNG. The resulting PNG looks exactly the same as the image printed in the PDF document.
How can I get the image to display properly in the PDF document?
Edit:
Link to original GIF Base64 as provided by API: https://pastebin.com/sYJv6j0h
As @haraldK pointed out in the comments, the GIF file provided via the XML API does not conform to the GIF standard and thus cannot be parsed by Java's ImageIO API.
Since there does not seem to exist a pure Java tool to repair the file, the workaround I came up with now is to use ImageMagick via Java's Process API. Calling the convert command with the -coalesce option will parse the broken GIF and create a new one that does conform to the GIF standard.
// Decode broken GIF image and write to disk
final String base64Gif = "[Base64 as provided by API]";
final byte[] sigImg = Base64.decodeBase64(base64Gif);
Path gifPath = Paths.get("C:/Temp/pod_1Z12345E5991872040.tmp.gif");
if (!Files.exists(gifPath)) {
Files.createFile(gifPath);
}
Files.write(gifPath, sigImg, StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING);
// Use the Java Process API to call ImageMagick (on Linux you would use the 'convert' binary)
ProcessBuilder procBuild = new ProcessBuilder();
procBuild.command("C:\\Program Files\\ImageMagick-7.0.9-Q16\\magick.exe", "C:\\Temp\\pod_1Z12345E5991872040.tmp.gif", "-coalesce", "C:\\Temp\\pod_1Z12345E5991872040.gif");
Process proc = procBuild.start();
// Wait for ImageMagick to complete its work
proc.waitFor();
The newly created file can be read by Java's ImageIO API and be used as expected.
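The snippet above ignores the value returned by waitFor(), so a failed conversion would go unnoticed. A minimal sketch of the same call with an exit-code check follows. GifRepair, coalesceCommand and run are hypothetical names, and the path of the ImageMagick executable is whatever applies on your system.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.List;

// Hypothetical wrapper around the ImageMagick call shown above.
final class GifRepair {
    private GifRepair() {}

    /** Builds the argument list: <magick> <input> -coalesce <output>. */
    static List<String> coalesceCommand(String magickExecutable, Path brokenGif, Path repairedGif) {
        return List.of(magickExecutable, brokenGif.toString(), "-coalesce", repairedGif.toString());
    }

    /** Runs the command and throws when the process exits with a non-zero code. */
    static void run(List<String> command) throws IOException, InterruptedException {
        Process proc = new ProcessBuilder(command)
                .inheritIO() // forward the tool's output so its error messages are not lost
                .start();
        int exitCode = proc.waitFor();
        if (exitCode != 0) {
            throw new IOException("Command failed with exit code " + exitCode + ": " + command);
        }
    }
}
```

Passing each argument as its own list element, as the original code also does, avoids quoting problems with paths that contain spaces.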
String fileName="raj.doc";
ServletOutputStream stream=null;
BufferedInputStream buf=null;
stream=res.getOutputStream();
String s1=getServletContext().getRealPath("/web-inf/lib/raj.doc");
File doc=new File(s1);
res.setContentType("application/vnd.ms-word");
res.addHeader("Content-Disposition","attachment;filename= "+fileName);
res.setContentLength((int)doc.length());
FileInputStream input=new FileInputStream(doc);
buf=new BufferedInputStream(input);
int readBytes=0;
while((readBytes=buf.read())!=-1)
stream.write(readBytes);
Please give me an example of downloading an MS Word file in Java, and tell me which JAR files are needed.
You don't need any jars if you want to only download the file and not work with it.
Just use this code and replace the URL with the URL of your document. Then you should be able to create a new File and just feed everything you read from the URL in the outputstream of the file.
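The code this answer refers to is not included here. As a rough sketch of the idea it describes, using only the JDK, something like the following would copy whatever a URL serves into a local file. UrlDownload and download are illustrative names, and the URL and target path are up to you.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper; no third-party JARs involved.
final class UrlDownload {
    private UrlDownload() {}

    /** Copies everything readable from the URL into target and returns the number of bytes written. */
    static long download(URL source, Path target) throws IOException {
        try (InputStream in = source.openStream()) {
            // Files.copy reads the stream to its end; REPLACE_EXISTING overwrites an older download
            return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }
}
```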
I have a web page with an upload feature that lets you upload an Excel file; hitting upload fires an Ajax call. On the server I get the upload's input stream by calling fileItem.getInputStream(). I need to pass the file to a method in another class, and that method takes a FileInputStream parameter. So my question is: how do I convert the InputStream to a FileInputStream?
A detailed solution would be appreciated as I am a junior developer, so I am still learning.
Many thanks.
From JavaDoc
A FileInputStream obtains input bytes from a file in a file system.
I would suggest two solutions:
The proper one is to change the API and to have InputStream as a parameter. I don't see a reason why you have FileInputStream in your API.
If you don't own the API and cannot change it, I'm afraid you will need to save the InputStream to a temp file and then create a FileInputStream from the path to that file. This is a suboptimal solution: you first write the file to disk (risking running out of space) and then read it back, whereas the streaming API is designed for reading and writing data on the fly.
If you are using the org.apache.commons.fileupload.FileItem interface, then your class is probably DefaultFileItem, which is a subclass of DiskFileItem, so you can cast the FileItem to DiskFileItem. If you look at the source code of DiskFileItem, you'll find that getInputStream() actually returns either a FileInputStream or a ByteArrayInputStream. If you get a FileInputStream from DiskFileItem, you can pass it directly to your other class. But if you get a ByteArrayInputStream, you will have to write the contents to your own temporary file and then open another FileInputStream on that temp file. There is also another method, DiskFileItem.getStoreLocation(), which seems to return the server-side File used for the upload, but it may return null if the file is cached in memory.
In conclusion: you cannot be sure that there is going to be a server side file because the upload may be cached in memory. Therefore if you need a FileInputStream elsewhere you will have to create it yourself by creating a temp file. There is an example on how to pipe between two streams here.
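A minimal sketch of that temp-file approach, using only the JDK, could look like this. TempFileStreams, toFileInputStream and the "upload-" prefix are placeholder names chosen for the example.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper for APIs that insist on a FileInputStream.
final class TempFileStreams {
    private TempFileStreams() {}

    /** Spools the stream into a temp file and returns a FileInputStream opened on that file. */
    static FileInputStream toFileInputStream(InputStream in, String suffix) throws IOException {
        Path temp = Files.createTempFile("upload-", suffix);
        temp.toFile().deleteOnExit(); // best-effort cleanup when the JVM exits normally
        // createTempFile has already created the file, so the copy must be allowed to replace it
        Files.copy(in, temp, StandardCopyOption.REPLACE_EXISTING);
        return new FileInputStream(temp.toFile());
    }
}
```

The caller is responsible for closing the returned stream. In a long-running server it may be better to delete the temp file explicitly once processing is done rather than rely on deleteOnExit().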
//Pass file path/name directly to FileInputStream
FileInputStream input1 = new FileInputStream("input.txt");
//Save file path that has been passed in by the user, into a string variable.
String fileName = args[0];
//pass path to File object
File inputFile = new File(fileName);
//pass file object to FileOutputStream
FileOutputStream output = new FileOutputStream(inputFile);
I have an InputStream which I would like to convert to a PDF, and save that PDF in a directory. Currently, my code is able to convert the InputStream to a PDF and the PDF does show up in the correct directory. However, when I try to open it, the file is damaged.
Here is the current code:
InputStream pAdESStream = signingServiceConnector.getDirectClient().getPAdES(this.statusReader.getStatusResponse().getpAdESUrl());
byte[] buffer = new byte[pAdESStream.available()];
pAdESStream.read(buffer);
File targetFile = new File(System.getProperty("user.dir") + "targetFile2.pdf");
OutputStream outStream = new FileOutputStream(targetFile);
outStream.write(buffer);
Originally, the InputStream was a pAdES-file (https://en.wikipedia.org/wiki/PAdES). However, it should be able to be read as just a regular PDF.
Does anyone know how to convert the InputStream to a PDF, without getting a damaged PDF as a result?
It might be a bit late, but you can use the PDFBox API (or iText). Here is a tutorial on the process:
https://www.tutorialkart.com/pdfbox/create-write-text-pdf-file-using-pdfbox/
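If the stream already delivers valid PDF bytes, one plausible cause of the damaged file is the copy itself. available() only estimates how many bytes can be read without blocking, and a single read(buffer) call may return fewer bytes than the buffer holds. The output stream in the question is also never closed. The sketch below assumes that diagnosis and copies the stream in a loop until it reports end of stream. StreamToFile, copyFully and save are illustrative names.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper; it copies bytes verbatim and does not inspect the PDF structure.
final class StreamToFile {
    private StreamToFile() {}

    /** Reads until end of stream, writing every chunk that arrives; returns the byte count. */
    static long copyFully(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) { // -1 marks end of stream, not available() == 0
            out.write(buffer, 0, n);
            total += n;
        }
        return total;
    }

    /** Saves the whole stream to target; both streams are closed by try-with-resources. */
    static long save(InputStream in, Path target) throws IOException {
        try (InputStream source = in; OutputStream out = Files.newOutputStream(target)) {
            return copyFully(source, out);
        }
    }
}
```

Note also that System.getProperty("user.dir") + "targetFile2.pdf" joins the two strings without a path separator. Paths.get(System.getProperty("user.dir"), "targetFile2.pdf") would build the target path inside the working directory.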