Converting a PDF to an image produces a very large file - Java

I am using the code below to convert a PDF to a PNG image.
Document document = new Document();
try {
document.setFile(myProjectPath);
System.out.println("Parsed successfully...");
} catch (PDFException ex) {
System.out.println("Error parsing PDF document " + ex);
} catch (PDFSecurityException ex) {
System.out.println("Error encryption not supported " + ex);
} catch (FileNotFoundException ex) {
System.out.println("Error file not found " + ex);
} catch (IOException ex) {
System.out.println("Error handling PDF document " + ex);
}
// save page captures to file.
float scale = 1.0f;
float rotation = 0f;
// Paint each page's content to an image and write the image to file
InputStream fis2 = null;
File file = null;
for (int i = 0; i < 1; i++) {
BufferedImage image = (BufferedImage) document.getPageImage(i,
GraphicsRenderingHints.SCREEN,
Page.BOUNDARY_CROPBOX, rotation, scale);
RenderedImage rendImage = image;
// capture the page image to file
try {
System.out.println("\t capturing page " + i);
file = new File(myProjectActualPath + "myImage.png");
ImageIO.write(rendImage, "png", file);
fis2 = new BufferedInputStream(new FileInputStream(myProjectActualPath + "myImage.png"));
} catch (IOException ioe) {
System.out.println("IOException :: " + ioe);
} catch (Exception e) {
System.out.println("Exception :: " + e);
}
image.flush();
}
myProjectPath is the path to the PDF file.
The problem is that my PDF file is 305 KB, but when I convert it with the code above, the resulting image is 5.5 MB, which is unexpected. Why is this happening? Is there a way to compress the output? A solution that reduces the size by lowering the pixel dimensions would also be fine.
Note: for other PDF files, the images come out at about 305 KB. This only happens with one PDF file, and I am not sure why.
Edit 1
I am using the following JAR files:
icepdf-core.jar
icepdf-viewer.jar
The imports I have are:
import org.icepdf.core.exceptions.PDFException;
import org.icepdf.core.exceptions.PDFSecurityException;
import org.icepdf.core.pobjects.Document;
import org.icepdf.core.pobjects.Page;
import org.icepdf.core.util.GraphicsRenderingHints;

You could extract the embedded images from the PDF instead (example using PDFBox):
List<PDPage> pages = document.getDocumentCatalog().getAllPages();
for(PDPage page : pages) {
Map<String, PDXObjectImage> images = page.getResources().getImages();
for(PDXObjectImage image : images.values()){
//TODO: write image to disk
}
}
Alternatively (or additionally), you may want to save them to disk as JPEG, since JPEG offers lossy compression and usually produces smaller files than PNG for photographic images.
You could even identify the format of the original image and use it when writing to disk by calling:
image.getSuffix();
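As a rough sketch of the JPEG suggestion, the following uses only the JDK's ImageIO to write a BufferedImage (for example the page image from the question) as JPEG with an explicit compression quality. The class and method names are hypothetical, and the quality value that still looks acceptable for your pages is something you would have to try out; JPEG is lossy, so sharp text edges may show artifacts.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Iterator;
import javax.imageio.IIOImage;
import javax.imageio.ImageIO;
import javax.imageio.ImageWriteParam;
import javax.imageio.ImageWriter;
import javax.imageio.stream.ImageOutputStream;

class JpegWriterSketch {

    /** Encodes an image as JPEG; quality ranges from 0.0f (smallest file) to 1.0f (best quality). */
    static byte[] toJpeg(BufferedImage source, float quality) throws IOException {
        // The JDK JPEG writer does not handle an alpha channel well, so copy onto an opaque
        // RGB canvas first; transparent areas end up white.
        BufferedImage rgb = new BufferedImage(source.getWidth(), source.getHeight(), BufferedImage.TYPE_INT_RGB);
        Graphics2D g = rgb.createGraphics();
        try {
            g.setColor(Color.WHITE);
            g.fillRect(0, 0, rgb.getWidth(), rgb.getHeight());
            g.drawImage(source, 0, 0, null);
        } finally {
            g.dispose();
        }

        Iterator<ImageWriter> writers = ImageIO.getImageWritersByFormatName("jpeg");
        if (!writers.hasNext()) {
            throw new IOException("No JPEG writer available");
        }
        ImageWriter writer = writers.next();
        ImageWriteParam param = writer.getDefaultWriteParam();
        param.setCompressionMode(ImageWriteParam.MODE_EXPLICIT); // required before setting quality
        param.setCompressionQuality(quality);

        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ImageOutputStream out = ImageIO.createImageOutputStream(bytes)) {
            writer.setOutput(out);
            writer.write(null, new IIOImage(rgb, null, null), param);
        } finally {
            writer.dispose();
        }
        return bytes.toByteArray();
    }
}
```

The resulting bytes can be written to a file with java.nio.file.Files.write, in place of the ImageIO.write(rendImage, "png", file) call from the question.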

You should be able to reduce the size of the file by lowering scale. PDFs are often much smaller than rendered images: they can represent text and vector graphics compactly, whereas the rendered image needs a lot of bytes to represent the same content as pixels. I'm actually somewhat surprised that any of your PNGs are about the same size as the PDFs (unless the PDFs are just pictures).
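To illustrate why scale matters: the number of pixels grows with the square of the scale factor, so scale = 0.5f yields about a quarter of the pixels of scale = 1.0f. The sketch below stands in for the ICEpdf rendering with an arbitrary BufferedImage (ScaleVsSize and pngSizeAtScale are hypothetical names) and simply measures the encoded PNG size at a given scale; with ICEpdf you would instead pass the smaller value as the scale argument of getPageImage.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import javax.imageio.ImageIO;

class ScaleVsSize {

    /** Redraws source at the given scale factor and returns the size of its PNG encoding in bytes. */
    static int pngSizeAtScale(BufferedImage source, double scale) throws IOException {
        int w = Math.max(1, (int) Math.round(source.getWidth() * scale));
        int h = Math.max(1, (int) Math.round(source.getHeight() * scale));
        BufferedImage scaled = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = scaled.createGraphics();
        try {
            g.setRenderingHint(RenderingHints.KEY_INTERPOLATION, RenderingHints.VALUE_INTERPOLATION_BILINEAR);
            g.drawImage(source, 0, 0, w, h, null); // draw the whole source into a w x h area
        } finally {
            g.dispose();
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        ImageIO.write(scaled, "png", out);
        return out.size();
    }
}
```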

Related

Why upload a file to AWS by writing it to disk first?

This example uploads a file that most likely does not reside in RAM:
http://docs.aws.amazon.com/AmazonS3/latest/dev/UploadObjSingleOpJava.html
But I already have a buffered file from a client request, and in the code below this file gets written to disk. Why? Writing to disk slows the whole process down; can't I avoid it?
EDIT (below is an explanation of what I am trying to achieve):
A user's image is uploaded, then scaled by the server and saved to the server's disk, and only then is the scaled image sent to AWS. Afterwards, the user gets an AWS link to where the image resides on the Amazon server.
public void transferToS3(String region, String bucket, String entity, String resolution, String filename, BufferedImage bufferedImage) {
if (bufferedImage != null) {
String objectpath = "/" + "images" + "/" + entity + "/" + resolution + "/" + filename + "." + "png";
Path tmpFile = null;
try {
tmpFile = Files.createTempFile(imagesPath, "tmp_", ".png");
} catch (IOException e) {
e.printStackTrace();
return; // without a temp file there is nothing to upload (tmpFile would be null below)
}
tmpFile.toFile().deleteOnExit();
try {
ImageIO.write(bufferedImage, "png", tmpFile.toFile());
S3AsyncClient client = S3AsyncClient.builder().region(Region.of(region)).build();
CompletableFuture<PutObjectResponse> future =
client.putObject(PutObjectRequest.builder()
.bucket(bucket)
.key(objectpath)
.contentType("image/png")
.build(),
AsyncRequestProvider.fromFile(tmpFile.toAbsolutePath()));
Path finalTmpFile = tmpFile;
future.whenComplete((resp, err) -> {
try {
if (resp != null) {
logger.debug(resp.toString());
} else {
logger.error(err.toString());
}
Files.deleteIfExists(finalTmpFile.toAbsolutePath());
} catch (IOException e) {
e.printStackTrace();
} finally {
FunctionalUtils.invokeSafely(client::close);
}
});
} catch (IOException e) {
e.printStackTrace();
}
}
The scaling routine returns a scaled BufferedImage, which is then passed to the transferToS3 method.
public BufferedImage scale(int width, int height, BufferedImage bufferedImage) {
BufferedImage scaledBufferedImage = null;
if (bufferedImage != null) {
Image image = bufferedImage.getScaledInstance(width, height, Image.SCALE_SMOOTH);
scaledBufferedImage = new BufferedImage(image.getWidth(null), image.getHeight(null), BufferedImage.TYPE_INT_ARGB);
Graphics2D g = scaledBufferedImage.createGraphics();
g.drawImage(image, 0, 0, null);
g.dispose(); // release the graphics context once drawing is done
}
return scaledBufferedImage;
}
Putting the two together:
BufferedImage scaledBufferedImage = imageService.scale(width, height, finalBufferedImage);
imageService.transferToS3(region, bucket, name, k, file, scaledBufferedImage);
You can do whatever you wish with the data stream from the request. Feel free to scale the image in memory and send it back in the response. The example you linked writes the file to disk because this is by far the most common scenario. It also allows the author to focus on the details of uploading a file without polluting the example with unrelated code.
Note that what you receive from the request is not a file; it is a stream. I suspect the author saved the image to disk to avoid assuming anything about the size of the image. If the image is too large to fit in RAM, you will have difficulty doing the scaling in memory.
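If the image does fit in memory, the temporary file can be skipped by encoding the scaled BufferedImage into a byte array. A minimal JDK-only sketch (InMemoryPng is a hypothetical helper):

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import javax.imageio.ImageIO;

class InMemoryPng {

    /** Encodes the image as PNG into a byte array without touching the disk. */
    static byte[] toPngBytes(BufferedImage image) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (!ImageIO.write(image, "png", out)) {
            // ImageIO.write returns false when no writer is found for the format.
            throw new IOException("No PNG writer available");
        }
        return out.toByteArray();
    }
}
```

The bytes could then be handed to the SDK instead of a file. In the released AWS SDK for Java 2.x that would be AsyncRequestBody.fromBytes(...); the question's AsyncRequestProvider appears to come from a preview release, so check what your SDK version offers. This keeps the whole encoded image in RAM, which is exactly the trade-off described above.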

Error trying to read a region from a very large image file in java

I need to read a very large image file (56000 × 49000 px) in small rectangular chunks, so I am trying to follow this example: Read region from very large image file in Java
However, I get this error: java.lang.IllegalArgumentException: width*height > Integer.MAX_VALUE!
My code snippet is below (taken more or less exactly from the link above):
ImageInputStream stream = null;
try {
stream = ImageIO.createImageInputStream(new File(this.inFile)); // File or input stream
} catch (Exception ex) {
Logger.getLogger(CreateTrainingSetFromImage.class.getName()).log(Level.SEVERE, null, ex);
}
Iterator<ImageReader> readers = ImageIO.getImageReaders(stream);
ImageReader r = readers.next();
System.out.println("Using reader: " + r);
r.setInput(stream);
try {
System.out.println("Height = " + r.getHeight(0));
System.out.println("Width = " + r.getWidth(0));
} catch (IOException ex) {
Logger.getLogger(CreateTrainingSetFromImage.class.getName()).log(Level.SEVERE, null, ex);
}
ImageReadParam param = r.getDefaultReadParam();
Rectangle sourceRegion = new Rectangle(0, 0, 200, 200);
param.setSourceRegion(sourceRegion); // Set region
BufferedImage image = null;
try {
image = r.read(0, param); // Will read only the region specified
System.out.println("Read file " + this.inFile);
System.out.println("Width = " + image.getWidth());
System.out.println("Height = " + image.getHeight());
} catch (Exception ex) {
Logger.getLogger(CreateTrainingSetFromImage.class.getName()).log(Level.SEVERE, null, ex);
}
My understanding is that specifying the Rectangle sourceRegion for param would make the ImageReader read only that small chunk of the image, so I don't understand what's causing the error. Any help would be much appreciated. If it helps, I am using the TwelveMonkeys ImageIO plugins.
Here is the output:
Using reader: com.twelvemonkeys.imageio.plugins.jpeg.JPEGImageReader#5437dd04
Height = 49429
Width = 56281
Apr 23, 2017 11:57:17 AM createtrainingsetfromimage.CreateTrainingSetFromImage test
SEVERE: null
java.lang.IllegalArgumentException: width*height > Integer.MAX_VALUE!
at javax.imageio.ImageReader.getDestination(ImageReader.java:2840)
at com.sun.imageio.plugins.jpeg.JPEGImageReader.readInternal(JPEGImageReader.java:1066)
at com.sun.imageio.plugins.jpeg.JPEGImageReader.read(JPEGImageReader.java:1034)
at com.twelvemonkeys.imageio.plugins.jpeg.JPEGImageReader.read(JPEGImageReader.java:382)
at createtrainingsetfromimage.CreateTrainingSetFromImage.test(CreateTrainingSetFromImage.java:102)
at createtrainingsetfromimage.CreateTrainingSetFromImage.createTrainingSet(CreateTrainingSetFromImage.java:168)
at createtrainingsetfromimage.CreateTrainingSetFromImage.main(CreateTrainingSetFromImage.java:46)
There's a known bug/limitation in the getDestination method of the ImageReader base class (the superclass of the JPEGImageReader and all other ImageIO reader implementations): it checks width * height of the entire input image, rather than of the region you are actually trying to read. This prevents you from reading even small parts of such images.
The code looks like this, and the width and height parameters are the dimensions of the input:
if ((long) width * height > Integer.MAX_VALUE) {
throw new IllegalArgumentException("width*height > Integer.MAX_VALUE!");
}
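Plugging in the dimensions printed in the question shows why even a 200 × 200 region fails: the check uses the full image size, 56281 × 49429 = 2,781,913,549 pixels, which is larger than Integer.MAX_VALUE (2,147,483,647), whereas the requested region is only 40,000 pixels. The same test, isolated in a hypothetical helper:

```java
class DestinationSizeCheck {

    /** Mirrors the check in ImageReader.getDestination: is width * height too big for an int? */
    static boolean exceedsIntRange(int width, int height) {
        // Widen to long before multiplying so the product itself cannot overflow.
        return (long) width * height > Integer.MAX_VALUE;
    }
}
```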
In most of my (the TwelveMonkeys ImageIO library's) ImageReaders I work around this limitation by using a different implementation of the getDestination method. But for the JPEGImageReader I delegate the actual decoding to the com.sun....JPEGImageReader, which uses the original method, and causes this exception.
It might be possible to work around the problem by using the readRaster method instead of read (as it does not use the getDestination method), but it requires extra work, and I haven't had the chance to test this yet.
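An untested sketch of that readRaster idea, using only standard ImageIO calls (the class name is hypothetical): it asks the reader whether it supports readRaster and only falls back to read otherwise. Whether a given plugin's readRaster actually avoids the size check for your image is an assumption to verify. Also note that readRaster returns raw samples without color conversion, so for JPEG the bands may be YCbCr rather than RGB.

```java
import java.awt.Rectangle;
import java.awt.image.BufferedImage;
import java.awt.image.Raster;
import java.io.IOException;
import java.util.Iterator;
import javax.imageio.ImageIO;
import javax.imageio.ImageReadParam;
import javax.imageio.ImageReader;
import javax.imageio.stream.ImageInputStream;

class RegionRasterReader {

    /**
     * Reads only the given region of the first image in the stream.
     * Prefers readRaster when the reader supports it; otherwise falls back to read.
     */
    static Raster readRegion(ImageInputStream stream, Rectangle region) throws IOException {
        Iterator<ImageReader> readers = ImageIO.getImageReaders(stream);
        if (!readers.hasNext()) {
            throw new IOException("No ImageReader found for this input");
        }
        ImageReader reader = readers.next();
        try {
            reader.setInput(stream);
            ImageReadParam param = reader.getDefaultReadParam();
            param.setSourceRegion(region);
            if (reader.canReadRaster()) {
                // Raw sample data, no color conversion applied.
                return reader.readRaster(0, param);
            }
            BufferedImage image = reader.read(0, param);
            return image.getRaster();
        } finally {
            reader.dispose(); // the caller still owns and closes the stream
        }
    }
}
```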

Java can't read an image with javax.imageio or Sanselan

I want to read an image in order to scale it, using AWT and Apache Commons Imaging (previously known as Apache Commons Sanselan).
I cannot scale the image, even though the browser displays it properly as a JPEG.
Getting the image info using
Sanselan.getMetadata(fileData)
I get this info:
No Exif metadata.
Photoshop (IPTC) metadata:
The code
public static byte[] scale(byte[] fileData, int width, int height) {
ByteArrayInputStream in = new ByteArrayInputStream(fileData);
try {
BufferedImage img = javax.imageio.ImageIO.read(in);
....
return buffer.toByteArray();
} catch (IOException e1) {
System.out.println ("e1 -> " + e1.getMessage());
try {
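BufferedImage img = Sanselan.getBufferedImage(fileData); // re-read from the byte array: the stream 'in' was already (at least partly) consumed by ImageIO.read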
} catch (ImageReadException | IOException e2) {
System.out.println ("e2 -> " + e2.getMessage());
}
}
return fileData;
}
but I get these exceptions:
e1 -> Unsupported Image Type
e2 -> Can't parse this format.
Image scaling can be done without external libraries.
Image img = ImageIO.read(url); // url: a java.net.URL pointing at the image
Image scaledImg = img.getScaledInstance(IMG_WIDTH, IMG_HEIGHT, Image.SCALE_DEFAULT);
See the docs for further inspiration.
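A slightly fuller sketch of the same idea, matching the question's byte[] in / byte[] out signature and using Graphics2D with an interpolation hint instead of getScaledInstance. Encoding the result as PNG is an assumption (the question elides that part), and ByteImageScaler is a hypothetical name. This does not by itself fix the "Unsupported Image Type" error: the JDK's built-in JPEG reader is commonly reported to throw it for CMYK JPEGs, such as some files saved by Photoshop, and whether that is the cause here is not confirmed.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import javax.imageio.ImageIO;

class ByteImageScaler {

    /** Decodes fileData, scales it to width x height, and re-encodes the result as PNG. */
    static byte[] scale(byte[] fileData, int width, int height) throws IOException {
        BufferedImage source = ImageIO.read(new ByteArrayInputStream(fileData));
        if (source == null) {
            // ImageIO.read returns null when no installed reader recognizes the format.
            throw new IOException("No ImageIO reader for this data");
        }
        BufferedImage scaled = new BufferedImage(width, height, BufferedImage.TYPE_INT_ARGB);
        Graphics2D g = scaled.createGraphics();
        try {
            g.setRenderingHint(RenderingHints.KEY_INTERPOLATION, RenderingHints.VALUE_INTERPOLATION_BICUBIC);
            g.drawImage(source, 0, 0, width, height, null);
        } finally {
            g.dispose();
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        ImageIO.write(scaled, "png", out);
        return out.toByteArray();
    }
}
```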

Viewing a TIFF image on the web and processing it with a Java applet

I want to browse and display a large TIFF image on the web using HTML and JavaScript, and do image processing on it using a Java applet.
All image loading and processing should happen on the client machine.
For viewing the image I want to use HTML and JavaScript.
For image processing I want to use a Java applet.
I faced the same problem. Most web browsers cannot display TIFF images, so we have to convert them to PNG or another supported format.
File file = new File(path_of_tiff_file, name_of_tiff_file);
String newName = file.getName();
// item is the uploaded file (presumably a Commons FileUpload FileItem); save it to disk first
item.write(file);
String lowerName = item.getName().toLowerCase();
if (lowerName.endsWith(".tif") || lowerName.endsWith(".tiff")) {
newName = item.getName().substring(0, item.getName().lastIndexOf('.')) + ".png";
File newFile = new File(path, newName);
BufferedImage image = null;
try {
image = Sanselan.getBufferedImage(file);
} catch (Exception e) {
e.printStackTrace();
}
// only write the PNG if the TIFF could be decoded
if (image != null) {
Sanselan.writeImage(image, newFile, ImageFormat.IMAGE_FORMAT_PNG, new Hashtable());
}
}
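On Java 9 and later, ImageIO ships with a TIFF reader, so the same server-side conversion can also be sketched without Sanselan. This assumes Java 9+ and a TIFF variant the built-in plugin can decode; TiffToPng is a hypothetical name.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

class TiffToPng {

    /** Converts a TIFF file to PNG using only the JDK (the built-in TIFF plugin needs Java 9+). */
    static void convert(File tiff, File png) throws IOException {
        BufferedImage image = ImageIO.read(tiff);
        if (image == null) {
            // null means no installed reader recognized the file.
            throw new IOException("No ImageIO reader could decode " + tiff);
        }
        if (!ImageIO.write(image, "png", png)) {
            throw new IOException("No PNG writer available");
        }
    }
}
```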

writing to pbm file using java ImageIo

I am trying to convert the pages of a PDF to .pbm images, using the following code:
for (int i=0; i<pages.size(); i++) {
PDPage singlePage = (PDPage) pages.get(i);
int pageno=i+1;
BufferedImage buffImage = null;
String imagefilename=prefix+"-"+pageno+".pbm";
try {
buffImage = singlePage.convertToImage();
} catch (IOException e1) {
System.out.println("Font not found");
return;
}
try {
File output=new File(imagefilename);
if (buffImage!=null){
ImageIO.write( buffImage, "pbm", output);
}
} catch (Exception e) {
System.out.println("File can not be written, check permission?");
}
}
Writing PNG files this way seems to work perfectly, but I cannot write PBM files: ImageIO.write(buffImage, "pbm", output) returns false. What could be a possible remedy?
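ImageIO.write returns false when no installed writer supports the requested format, and the JDK's built-in plugins do not include PBM. One option is to install a third-party ImageIO plugin that provides it; another is to write the format directly, since binary PBM (P4) is simple: a header "P4", the width and height, then each row packed at 1 bit per pixel (1 = black, most significant bit first), padded to a whole byte. A sketch under those assumptions, using a hypothetical PbmWriter class and a plain luminance threshold:

```java
import java.awt.image.BufferedImage;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import javax.imageio.ImageIO;

class PbmWriter {

    /** True if some installed ImageIO plugin can write the given format name. */
    static boolean hasWriter(String format) {
        return ImageIO.getImageWritersByFormatName(format).hasNext();
    }

    /** Writes a binary (P4) PBM; pixels darker than threshold (0-255 luminance) become black. */
    static void writePbm(BufferedImage image, OutputStream out, int threshold) throws IOException {
        int width = image.getWidth();
        int height = image.getHeight();
        out.write(("P4\n" + width + " " + height + "\n").getBytes(StandardCharsets.US_ASCII));
        byte[] row = new byte[(width + 7) / 8]; // each row is padded to a whole number of bytes
        for (int y = 0; y < height; y++) {
            Arrays.fill(row, (byte) 0);
            for (int x = 0; x < width; x++) {
                int rgb = image.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF, g = (rgb >> 8) & 0xFF, b = rgb & 0xFF;
                int luma = (r * 299 + g * 587 + b * 114) / 1000;
                if (luma < threshold) {
                    // Set the bit for this pixel, most significant bit first within each byte.
                    row[x / 8] |= (byte) (0x80 >> (x % 8));
                }
            }
            out.write(row);
        }
    }
}
```

In the loop from the question, this would replace ImageIO.write(buffImage, "pbm", output) with something like writePbm(buffImage, stream, 128), where stream is a FileOutputStream for output opened in a try-with-resources block.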
