Dom manipulation [duplicate] - java

How would one go about converting an SVG file to a PDF programmatically? (I need to alter the SVG in certain respects before generating the PDF, so simply pre-converting it with a tool won't be sufficient.)
Ideally using Java but Perl or PHP would be fine too.
I am mainly considering Apache FOP and Batik with Java. However, no matter how long I search, I cannot find a simple introduction on how to do it. Things like SVGConverter have descriptions like "Defines the interface for classes that are able to convert part or all of a GraphicContext", but I don't really know what that means.
I have this feeling there must be an API to do this quite simply, provided by FOP or Batik, but I'm just not able to find it at the moment (or perhaps it really doesn't exist.)
In terms of the supported SVG features I need, the file has some paths which are filled with some linear gradients.
Ideally I could pass the SVG in as a DOM Document; then I would load my template SVG file, change it as specified by the user, and generate the PDF.

Thanks to Adrian for showing how the Batik rasterizer API is supposed to be used. However, I needed a more lightweight solution: I can't write to temporary files, and I want fewer dependencies. So, starting from the methods he pointed to, I found a way to access the lower-level code that does the conversion and nothing else.
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import org.apache.batik.transcoder.Transcoder;
import org.apache.batik.transcoder.TranscoderException;
import org.apache.batik.transcoder.TranscoderInput;
import org.apache.batik.transcoder.TranscoderOutput;
import org.apache.fop.svg.PDFTranscoder;

public class Test {
    public static void main(String[] argv) throws TranscoderException, IOException {
        Transcoder transcoder = new PDFTranscoder();
        // try-with-resources closes both streams, so the PDF is fully flushed to disk
        try (InputStream in = new FileInputStream("/tmp/test.svg");
             OutputStream out = new FileOutputStream("/tmp/test.pdf")) {
            transcoder.transcode(new TranscoderInput(in), new TranscoderOutput(out));
        }
    }
}
The compile-and-run commands are
javac -cp batik-rasterizer.jar -d build Test.java
java -cp build:batik-rasterizer.jar Test
The important point is that TranscoderInput and TranscoderOutput can work with any InputStream and OutputStream, not just file streams. Note that one of the TranscoderInput constructors takes an org.w3c.dom.Document, which means you don't even need to serialize an SVG DOM into an SVG string, saving a step.
This version also doesn't write anything to stdout/stderr, unlike the high-level API.
For JPEG, PNG, or TIFF output, replace org.apache.fop.svg.PDFTranscoder with org.apache.batik.transcoder.image.JPEGTranscoder, PNGTranscoder, or TIFFTranscoder (note that these raster formats are in a different package).
(I'm not quite sure how Java finds the org.apache.batik.transcoder.* and org.apache.fop.svg.PDFTranscoder classes, since I don't see them in the batik-rasterizer.jar.)
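The point about skipping temporary files can be illustrated with the JDK alone. The following is a minimal sketch, not part of Batik or FOP: the class name SvgTemplateEditor, its methods, and the SAMPLE template are made up for illustration. It parses an SVG template, changes the gradient stops in the DOM, and serializes the result to a byte array. Going by the description above, either the Document or a stream over those bytes could then be handed to TranscoderInput; that last step is only shown as a comment because it needs Batik on the classpath.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper (not a Batik/FOP class): edits an SVG template entirely in memory.
final class SvgTemplateEditor {
    static final String SVG_NS = "http://www.w3.org/2000/svg";

    // Made-up template with one linear gradient, standing in for the real template file.
    static final String SAMPLE =
        "<svg xmlns=\"http://www.w3.org/2000/svg\" width=\"100\" height=\"100\">"
        + "<defs><linearGradient id=\"g\">"
        + "<stop offset=\"0\" stop-color=\"#0000ff\"/>"
        + "<stop offset=\"1\" stop-color=\"#ffffff\"/>"
        + "</linearGradient></defs>"
        + "<path d=\"M0 0 H100 V100 H0 Z\" fill=\"url(#g)\"/></svg>";

    /** Parses SVG markup into a namespace-aware DOM Document. */
    static Document parse(String svg) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // SVG elements live in the SVG namespace
            byte[] bytes = svg.getBytes(StandardCharsets.UTF_8);
            return factory.newDocumentBuilder().parse(new ByteArrayInputStream(bytes));
        } catch (Exception e) {
            throw new IllegalArgumentException("Input is not well-formed XML", e);
        }
    }

    /** Sets the stop-color attribute on every gradient stop element. */
    static void recolorStops(Document doc, String color) {
        NodeList stops = doc.getElementsByTagNameNS(SVG_NS, "stop");
        for (int i = 0; i < stops.getLength(); i++) {
            ((Element) stops.item(i)).setAttribute("stop-color", color);
        }
    }

    /** Serializes the DOM to bytes without touching the file system. */
    static byte[] toBytes(Document doc) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            TransformerFactory.newInstance().newTransformer()
                .transform(new DOMSource(doc), new StreamResult(out));
            return out.toByteArray();
        } catch (TransformerException e) {
            throw new IllegalStateException("Could not serialize the SVG DOM", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(SAMPLE);
        recolorStops(doc, "#ff0000");
        byte[] svgBytes = toBytes(doc);
        System.out.println(new String(svgBytes, StandardCharsets.UTF_8));
        // With Batik on the classpath, either of these could feed the transcoder:
        //   new TranscoderInput(doc)
        //   new TranscoderInput(new ByteArrayInputStream(svgBytes))
    }
}
```

The output side can be handled the same way: a ByteArrayOutputStream (or a servlet response stream) can take the place of the FileOutputStream.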
Edit:
Although the simple commandline-compilation works with the batik-rasterizer.jar only, it's doing some sort of classloader magic to find all the necessary classes. In a more realistic case (building a project with Ant), you have to find the classes by hand. They can be found in batik-1.7.zip from the Batik project and fop-1.1.zip from the FOP project. From Batik, you need to compile with batik-transcoder.jar and run with
batik-transcoder.jar
batik-anim.jar
batik-awt-util.jar
batik-bridge.jar
batik-css.jar
batik-dom.jar
batik-ext.jar
batik-gvt.jar
batik-parser.jar
batik-script.jar
batik-svg-dom.jar
batik-util.jar
batik-xml.jar
xml-apis-ext.jar
From FOP, you need to compile with fop.jar and run with
fop.jar
avalon-framework-4.2.0.jar
xmlgraphics-commons-1.5.jar

I finally managed to find the appropriate lines of code to solve this using Batik.
You need to have the SVG file and the resulting PDF as files on disk, i.e. I couldn't find a way to do it in memory (I am writing an HTTP servlet, so I have no intrinsic need to write anything to a file; ideally I would stream the result to the HTTP client). I used File.createTempFile to create a file to dump my SVG into, and another for the resulting PDF to be written to.
So the lines I used are the following:
import org.apache.batik.apps.rasterizer.DestinationType;
import org.apache.batik.apps.rasterizer.SVGConverter;
import ...
// SVG available as a DOM object (created programmatically by my program)
Document svgXmlDoc = ...
// Save this SVG into a file (required by SVG -> PDF transformation process)
File svgFile = File.createTempFile("graphic-", ".svg");
Transformer transformer = TransformerFactory.newInstance().newTransformer();
DOMSource source2 = new DOMSource(svgXmlDoc);
try (FileOutputStream fOut = new FileOutputStream(svgFile)) {
    transformer.transform(source2, new StreamResult(fOut));
}
// Convert the SVG into PDF
File outputFile = File.createTempFile("result-", ".pdf");
SVGConverter converter = new SVGConverter();
converter.setDestinationType(DestinationType.PDF);
converter.setSources(new String[] { svgFile.toString() });
converter.setDst(outputFile);
converter.execute();
And I have the following JARs (search using Google to find the projects and download them):
avalon-framework-4.2.0.jar
batik-all-1.7.jar
commons-io-1.3.1.jar
commons-logging-1.0.4.jar
fop-0.95.jar
log4j-1.2.15.jar
xml-apis-ext.jar
xmlgraphics-commons-1.3.1.jar

You will need libraries for rendering SVGs and PDFs.
I recommend SVG Salamander for the former and iText for the latter. With SVG Salamander you can read the SVG and create an image object, and with iText you can write that image to a PDF.

I use Altsoft Xml2PDF. If I have understood all your needs and requirements correctly, you should try the Server version of Xml2PDF.

All you need is PhantomJS. You don't need the unwieldy Batik for this at all; just get to a point where you can run phantomjs, calling rasterize.js with the URL of the SVG as the source and a file location as the output. Depending on what you want to do with the .pdf, you don't even need Java.
http://phantomjs.org/screen-capture.html
Look at the part starting with "Beside PNG format, PhantomJS supports JPEG, GIF, and PDF."

Related

Apache FOP and Java Image Issues - Combining multiple sources

I am trying to "automate" the building of a PDF using Apache FOP and Java. I want to minimize the hard coding since I don't know in advance all the file combinations I am going to need to support. In addition, I want to avoid saving files on the hard drive. Files on the hard drive introduce security, performance, threading and cleanup considerations I would rather not handle.
The test case I am using right now has 1 FO and 2 PNG files. One of the PNG files is over 1MB.
Ideally I would create 3 sources:
InputStream fo = new FileInputStream(new File("C:\\Temp\\FOP\\Test\\blah.fo"));
InputStream png1 = new FileInputStream(new File("C:\\Temp\\FOP\\Test\\image-1.png"));
InputStream png2 = new FileInputStream(new File("C:\\Temp\\FOP\\Test\\image-2.png"));
Source foSrc = new StreamSource(fo);
Source png1Src = new StreamSource(png1);
Source png2Src = new StreamSource(png2);
and then combine them all together to generate the PDF. I can't find a way using the API to do that.
The FO file refers to the images via:
<fo:external-graphic src="file:image-1.png"/>
<fo:external-graphic src="file:image-2.png"/>
When I use the command line FOP tools, it builds the PDF as I would expect. As long as the two images are in the same directory as the FO file, then all is good. Using the command line, there is no need to point out the existence or location of the images.
When using Java, I have tried a number of configurations, but none of them fit my need:
I saved the FO file and the 2 images into the same directory and referred to them using the following FopFactory constructor:
private static final FopFactory fopFactory = FopFactory.newInstance(new File("C:\\Temp\\FOP\\test").toURI());
This code base only finds the smaller of the two images. It seems like the larger one is being ignored since it is bigger than some limit.
I have tried the above constructor using various relative and absolute paths.
I have tried constructing FopFactory using the default "fop.xconf" file and adding the "C:\Temp\FOP\Test" directory to the classpath.
I have "hardcoded" the files and their locations in the FO file.
I have tried using intermediate files structure (IFDocumentHandler, IFSerializer and IFConcatenator) for the images and get errors that way. Seems the intermediate files are not intended for images.
I have been able to embed the file into the FO file using base64 encoding and the syntax:
<fo:external-graphic src="url('data:image/png;base64,iVBORw...ggg==')"/>
The last one seems like the best solution other than taking 3 sources and using all 3 to generate the PDF. Any suggestions on how to use the API to combine the 3 sources?
Thanks.
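The base64 embedding described above can be generated in memory rather than typed by hand. This is a minimal sketch using only java.util.Base64; the class name DataUriGraphic and its method are hypothetical, and it only builds the fo:external-graphic element in the syntax quoted in the question. It does not show how to feed three separate sources to FOP.

```java
import java.util.Base64;

// Hypothetical helper: wraps raw PNG bytes in the data-URI syntax quoted above.
final class DataUriGraphic {

    /** Returns an fo:external-graphic element whose src embeds the PNG as base64. */
    static String toExternalGraphic(byte[] pngBytes) {
        String encoded = Base64.getEncoder().encodeToString(pngBytes);
        return "<fo:external-graphic src=\"url('data:image/png;base64," + encoded + "')\"/>";
    }

    public static void main(String[] args) {
        // Only the first four bytes of the PNG signature, as a stand-in for a real image.
        byte[] signature = { (byte) 0x89, 'P', 'N', 'G' };
        System.out.println(toExternalGraphic(signature));
    }
}
```

The image bytes can come from any InputStream, so nothing needs to be written to disk before the FO document is assembled.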

How to copy and compress and then paste multiple jpg images [duplicate]

From PageSpeed I only get the image link and the possible savings in bytes and as a percentage, for example:
Compressing and resizing https://example.com/…ts/xyz.jpg?036861 could save 212KiB (51% reduction).
Compressing https://example.com/…xyz.png?303584508 could save 4.4KiB (21% reduction).
For example, I have an image of 300 KB, and for this image PageSpeed reports a possible saving of 100 KB (30% reduction).
This is only one image, but I am sure I will have lots of images to compress.
So how can I compress an image in Java by passing the bytes or the percentage as a parameter, or by using any other calculation (via an API or an image-processing tool), so that I get the compressed version of the image as suggested by Google?
Thanks in advance.
You can use the Java ImageIO package to compress many image formats. Here is an example:
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import javax.imageio.IIOImage;
import javax.imageio.ImageIO;
import javax.imageio.ImageWriteParam;
import javax.imageio.ImageWriter;
import javax.imageio.stream.ImageOutputStream;

public class Compression {
    public static void main(String[] args) throws IOException {
        BufferedImage image = ImageIO.read(new File("original_image.jpg"));
        File compressedImageFile = new File("compressed_image.jpg");
        ImageWriter writer = ImageIO.getImageWritersByFormatName("jpg").next();
        // Resources close in reverse order: the image stream is flushed before the file stream
        try (OutputStream os = new FileOutputStream(compressedImageFile);
             ImageOutputStream ios = ImageIO.createImageOutputStream(os)) {
            writer.setOutput(ios);
            ImageWriteParam param = writer.getDefaultWriteParam();
            param.setCompressionMode(ImageWriteParam.MODE_EXPLICIT);
            param.setCompressionQuality(0.05f); // 0.0 = smallest file, 1.0 = best quality
            writer.write(null, new IIOImage(image, null, null), param);
        } finally {
            writer.dispose();
        }
    }
}
You can find more details about it here
Also there are some third party tools like these
https://collicalex.github.io/JPEGOptimizer/
https://github.com/depsypher/pngtastic
EDIT: If you want to use Google PageSpeed in your application, it is available as web server module either for Apache or Nginx, you can find how to configure it for your website here
https://developers.google.com/speed/pagespeed/module/
But if you want to integrate the PageSpeed C++ library in your application, you can find build instructions for it here.
https://developers.google.com/speed/pagespeed/psol
It also has a Java Client here
https://github.com/googleapis/google-api-java-client-services/tree/main/clients/google-api-services-pagespeedonline/v5
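Returning to the ImageIO approach: since the question asks for compression driven by a byte count, one option is to search over the quality setting until the encoded size fits. The following is a rough sketch under that assumption, not something PageSpeed itself does; the class JpegTargetSize and its method names are made up for illustration, and the input is assumed to be an image without an alpha channel (for example TYPE_INT_RGB).

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import javax.imageio.IIOImage;
import javax.imageio.ImageIO;
import javax.imageio.ImageWriteParam;
import javax.imageio.ImageWriter;
import javax.imageio.stream.ImageOutputStream;
import javax.imageio.stream.MemoryCacheImageOutputStream;

// Hypothetical helper: picks a JPEG quality so the encoded image fits a byte budget.
final class JpegTargetSize {

    /** Encodes the image as JPEG at the given quality (0.0 to 1.0), entirely in memory. */
    static byte[] encode(BufferedImage image, float quality) {
        ImageWriter writer = ImageIO.getImageWritersByFormatName("jpg").next();
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ImageOutputStream ios = new MemoryCacheImageOutputStream(bytes)) {
            writer.setOutput(ios);
            ImageWriteParam param = writer.getDefaultWriteParam();
            param.setCompressionMode(ImageWriteParam.MODE_EXPLICIT);
            param.setCompressionQuality(quality);
            writer.write(null, new IIOImage(image, null, null), param);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            writer.dispose();
        }
        return bytes.toByteArray();
    }

    /**
     * Binary-searches the quality setting for the highest quality that still fits maxBytes.
     * If even the lowest quality is too large, that lowest-quality encoding is returned.
     */
    static byte[] compressToTarget(BufferedImage image, int maxBytes) {
        float lo = 0.0f;
        float hi = 1.0f;
        byte[] best = encode(image, lo); // fallback: smallest file this encoder produces
        for (int i = 0; i < 7; i++) {
            float mid = (lo + hi) / 2;
            byte[] candidate = encode(image, mid);
            if (candidate.length <= maxBytes) {
                best = candidate; // fits: keep it and try a higher quality
                lo = mid;
            } else {
                hi = mid;         // too big: try a lower quality
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Synthetic gradient image, standing in for a real photo.
        BufferedImage image = new BufferedImage(256, 256, BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < 256; y++) {
            for (int x = 0; x < 256; x++) {
                image.setRGB(x, y, (x << 16) | (y << 8) | ((x ^ y) & 0xff));
            }
        }
        int full = encode(image, 1.0f).length;
        int fitted = compressToTarget(image, full / 2).length;
        System.out.println("quality 1.0: " + full + " bytes, fitted to half: " + fitted + " bytes");
    }
}
```

The search assumes that file size grows with the quality setting, which usually holds for JPEG but is not guaranteed for every image.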
There is colour compression ("compression quality") and there is resolution compression ("resizing"). Fujy's answer deals with compression quality, but this is not where the main savings come from: the main savings come from resizing down to a smaller size. E.g. I got a 4mb photo to 207K using the maximum compression quality using fujy's answer, and it looked awful, but I got it down to 12K using a reasonable quality but a smaller size.
So the above code should be used for "compression quality", but this is my recommendation for resizing:
https://github.com/rkalla/imgscalr/blob/master/src/main/java/org/imgscalr/Scalr.java
I wish resizing was part of the standard Java libraries, but it seems it's not, (or there are image quality problems with the standard methods?). But Riyad's library is really small - it's just one class. I just copied this class into my project, because I never learnt how to use Maven, and it works great.
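For comparison, a basic resize is possible with the JDK alone by drawing the source into a smaller BufferedImage. This is a minimal sketch of that standard-library technique, not the imgscalr implementation; the class SimpleResize and its method are hypothetical, and a single bilinear pass like this can look softer or more aliased than a dedicated library when shrinking by a large factor.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;

// Hypothetical helper: plain-JDK downscaling with Graphics2D (not imgscalr).
final class SimpleResize {

    /** Scales the image so its width becomes targetWidth, keeping the aspect ratio. */
    static BufferedImage resizeToWidth(BufferedImage src, int targetWidth) {
        float ratio = targetWidth / (float) src.getWidth();
        int targetHeight = Math.max(1, Math.round(src.getHeight() * ratio));
        BufferedImage dst = new BufferedImage(targetWidth, targetHeight, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = dst.createGraphics();
        try {
            // Bilinear interpolation smooths the result compared with nearest-neighbour
            g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                               RenderingHints.VALUE_INTERPOLATION_BILINEAR);
            g.drawImage(src, 0, 0, targetWidth, targetHeight, null);
        } finally {
            g.dispose();
        }
        return dst;
    }

    public static void main(String[] args) {
        BufferedImage original = new BufferedImage(400, 300, BufferedImage.TYPE_INT_RGB);
        BufferedImage smaller = resizeToWidth(original, 100);
        System.out.println(smaller.getWidth() + "x" + smaller.getHeight());
    }
}
```

The resized image can then be written with the compression-quality code above, which combines both kinds of saving.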
One liner java solution: thumbnailator.
Maven dependency:
<!-- https://mvnrepository.com/artifact/net.coobird/thumbnailator -->
<dependency>
<groupId>net.coobird</groupId>
<artifactId>thumbnailator</artifactId>
<version>0.4.17</version>
</dependency>
The one liner:
Thumbnails.of(inputImagePathString).scale(scalingFactorFloat).outputQuality(qualityFactorFloat).toFile(outputImagePathString);
As a solution for this problem I can recommend the API of TinyPNG.
You can use it for compressing as well as resizing the image.
Documentation: tinypng.com/developers/reference/java

Error in read .doc and .docx file's content

I want to read .txt, .doc and .docx files and print their contents. When I run the code below, some .doc and .txt files are read, but many files cannot be read.
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import javax.swing.JFileChooser;
import javax.swing.JOptionPane;

public class FindYourDocx {
    public static void main(String[] args) {
        String text = "";
        int read, N = 1024 * 1024;
        char[] buffer = new char[N];
        try {
            JFileChooser openFile = new JFileChooser();
            openFile.setCurrentDirectory(new File("."));
            openFile.showOpenDialog(null);
            File f = openFile.getSelectedFile();
            JOptionPane.showMessageDialog(null, f);
            try (BufferedReader br = new BufferedReader(new FileReader(f))) {
                // read() returns -1 at end of file
                while ((read = br.read(buffer, 0, N)) != -1) {
                    text += new String(buffer, 0, read);
                }
            }
            System.out.println("Follows" + text + " ");
        } catch (Exception ex) {
            ex.printStackTrace();
        }
    }
}
By executing the above code (for some files) I got some weird messages, as follows:
http://i.stack.imgur.com/RwNWM.jpg
Could someone please help me solve this issue?
to read .docx i came across something like XWPFDocument using apacheio ....what is this ?
First of all, you should think about your problem: what do the different file types look like as files, what is their structure, what content would you like to print, and what does "printing" mean at all? What you are doing is reading files, treating them as text and printing them to STDOUT. Does "printing" mean this in your case? I interpret "printing" as sending content to a printer and getting some paper.
Another hint: .doc and .docx are binary files, which contain "printable" text "somewhere". You can't just read the files and do something with the data. You need to know what those file formats look like, where the content is, etc. Java can't do that out of the box; you need additional libraries to parse those file formats and do something with them.
There are many tutorials and questions around formats like docx:
How to read docx file content in java api using poi jar
to read .docx i came across something like XWPFDocument using apacheio ....what is this ?
You mean Apache POI. To find out more, check the website. In brief, both Apache POI and docx4j (which I note you have tagged) are Java libraries aimed at reading, manipulating, and writing Microsoft Office files.
'doc' files are Microsoft proprietary binary files. If you try to read them in and display them using the Java IO API alone, all you will see is a representation of the binary data. It won't be useful to you. You need to use an API specifically for loading up and traversing Word files, which is where Apache POI or docx4j come in.
'docx' files are a newer XML-based Microsoft Office format. A docx file is essentially a zipped folder containing the various assets that make up a Word file.
As I said, in order to read a Word file properly, you will need to use one of the libraries mentioned. Both the Apache and docx4j websites contain plenty of example code to get you started opening and traversing Word documents (note that POI can work with the older .doc format, whereas docx4j is only for .docx files).
http://www.docx4java.org
http://poi.apache.org
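The "zipped folder" description of .docx can be checked with the JDK alone. The sketch below is only an illustration of the container format, not a substitute for Apache POI or docx4j: the class DocxPeek is hypothetical, it assumes Java 9+ for readAllBytes, and it assumes the main body lives in the word/document.xml entry with text in w:t elements. The sample archive it builds is a stand-in containing just that one entry, not a valid Word file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: peeks inside a .docx archive using only the JDK.
final class DocxPeek {
    static final String W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Finds word/document.xml in the zip stream and returns its text, one line per paragraph. */
    static String extractText(InputStream docx) {
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            for (ZipEntry entry = zip.getNextEntry(); entry != null; entry = zip.getNextEntry()) {
                if (entry.getName().equals("word/document.xml")) {
                    return paragraphsOf(zip.readAllBytes()); // reads just this entry
                }
            }
            throw new IllegalArgumentException("No word/document.xml entry found");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Collects the w:t text runs of each w:p paragraph (nested paragraphs are not handled). */
    static String paragraphsOf(byte[] xml) {
        Document doc;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            doc = factory.newDocumentBuilder().parse(new ByteArrayInputStream(xml));
        } catch (Exception e) {
            throw new IllegalArgumentException("word/document.xml is not well-formed XML", e);
        }
        NodeList paragraphs = doc.getElementsByTagNameNS(W_NS, "p");
        StringBuilder text = new StringBuilder();
        for (int i = 0; i < paragraphs.getLength(); i++) {
            if (i > 0) {
                text.append('\n');
            }
            NodeList runs = ((Element) paragraphs.item(i)).getElementsByTagNameNS(W_NS, "t");
            for (int j = 0; j < runs.getLength(); j++) {
                text.append(runs.item(j).getTextContent());
            }
        }
        return text.toString();
    }

    /** Builds a tiny stand-in archive with only word/document.xml (not a valid Word file). */
    static byte[] sampleArchive() {
        String xml = "<w:document xmlns:w=\"" + W_NS + "\"><w:body>"
            + "<w:p><w:r><w:t>Hello</w:t></w:r></w:p>"
            + "<w:p><w:r><w:t>World</w:t></w:r></w:p>"
            + "</w:body></w:document>";
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(buffer)) {
            zip.putNextEntry(new ZipEntry("word/document.xml"));
            zip.write(xml.getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(extractText(new ByteArrayInputStream(sampleArchive())));
    }
}
```

This also shows why reading a .docx through FileReader produces garbage: the bytes on disk are compressed zip data, not text. The older .doc format is not a zip archive at all, so this approach does not apply to it.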

How do we convert WMF/EMF (MS metafiles) into standard images like JPG or PNG using any Java API?

I have been stuck in converting WMF/EMF images into standard image format such as JPG or PNG using Java.
What are the best options available?
The Batik library is a toolkit to handle SVG in Java. There are converters included like WMFTranscoder to convert from WMF to SVG and JPEGTranscoder and PNGTranscoder to convert SVG to JPEG/PNG. See Transcoder API Docs for more details.
Another alternative is ImageMagick. It's not Java but has Java bindings: im4java and JMagick.
wmf is a vector file format. For best results, convert them to .svg or .pdf format.
I did it in two stages
1) wmf2fig --auto XXXX.wmf
2) fig2pdf --nogv XXXX.fig
I created a Python script for bulk conversion:
import glob
import os
import subprocess

# Stage 1: convert every WMF in the current directory to FIG
wmf_files = glob.glob("*.wmf")
for wmf in wmf_files:
    subprocess.run(["wmf2fig", "--auto", wmf])

# Stage 2: convert each resulting FIG to PDF
for wmf in wmf_files:
    fig = os.path.splitext(wmf)[0] + ".fig"
    subprocess.run(["fig2pdf", "--nogv", fig])

# Remove the intermediate FIG files
for fig in glob.glob("*.fig"):
    os.remove(fig)
If you are deploying your application in a Windows environment, then SWT can handle the conversion for you.
Image image = new Image(Display.getCurrent(), "test.wmf");
ImageLoader loader = new ImageLoader();
loader.data = new ImageData[] { image.getImageData() };
try(FileOutputStream stream = new FileOutputStream("test.png"))
{
loader.save(stream, SWT.IMAGE_PNG);
}
image.dispose();
The purpose of SWT is to provide a Java wrapper around native functionality, and in this case it is calling the windows GDI directly to get it to render the WMF.
I've created some wrappers around the Batik package (as mentioned by vanje's answer) some time ago, that provides ImageIO support for SVG and WMF/EMF.
With these plugins you should be able to write:
ImageIO.write(ImageIO.read(wmfFile), "png", pngFile);
Source code on GitHub.
While the ImageIO plugins are convenient, im4java and JMagick might still have better format support.
Here is one way.
1. Get (or make) a Java component that can render the files in question.
2. Create a BufferedImage the same size as the component needs to display the image.
3. Get the Graphics object from the BufferedImage.
4. Call renderComponent.paintComponent(Graphics)
5. Save the image using one of the ImageIO.write() variants.
See my answer to Swing: Obtain Image of JFrame for steps 2-5. Step 1. is something I'd ask Google about.
