Apache FOP and Java Image Issues - Combining multiple sources - java

I am trying to "automate" the building of a PDF using Apache FOP and Java. I want to minimize the hard coding since I don't know in advance all the file combinations I am going to need to support. In addition, I would like to avoid saving files on the hard drive. Files on disk introduce security, performance, threading and cleanup considerations I would rather not handle.
The test case I am using right now has 1 FO and 2 PNG files. One of the PNG files is over 1MB.
Ideally I would create 3 sources:
InputStream fo = new FileInputStream(new File("C:\\Temp\\FOP\\Test\\blah.fo"));
InputStream png1 = new FileInputStream(new File("C:\\Temp\\FOP\\Test\\image-1.png"));
InputStream png2 = new FileInputStream(new File("C:\\Temp\\FOP\\Test\\image-2.png"));
Source foSrc = new StreamSource(fo);
Source png1Src = new StreamSource(png1);
Source png2Src = new StreamSource(png2);
and then combine them all together to generate the PDF. I can't find a way using the API to do that.
The FO file refers to the images via:
<fo:external-graphic src="file:image-1.png"/>
<fo:external-graphic src="file:image-2.png"/>
When I use the command line FOP tools, it builds the PDF as I would expect. As long as the two images are in the same directory as the FO file, then all is good. Using the command line, there is no need to point out the existence or location of the images.
When using Java, I have tried a number of configurations, but none of them fit my need:
I saved the FO file and the 2 images into the same directory and referred to them using the following FopFactory constructor:
private static final FopFactory fopFactory = FopFactory.newInstance(new File("C:\\Temp\\FOP\\test").toURI());
This code base only finds the smaller of the two images. It seems like the larger one is being ignored since it is bigger than some limit.
I have tried the above constructor using various relative and absolute paths.
I have tried constructing FopFactory using the default "fop.xconf" file and adding the "C:\Temp\FOP\Test" directory to the classpath.
I have "hardcoded" the files and their locations in the FO file.
I have tried using the intermediate format classes (IFDocumentHandler, IFSerializer and IFConcatenator) for the images, but I get errors that way. It seems the intermediate format is not intended for images.
I have been able to embed the file into the FO file using base64 encoding and the syntax:
<fo:external-graphic src="url('data:image/png;base64,iVBORw...ggg==')"/>
The last one seems like the best solution other than taking 3 sources and using all 3 to generate the PDF. Any suggestions on how to use the API to combine the 3 sources?
Thanks.
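This is not a way to hand FOP three separate sources, but the base64 route mentioned in the question can at least be automated without touching the disk. Below is a minimal sketch, assuming the FO document is held as a String and each image as a byte[]. The names DataUriEmbedder, toDataUri and embed are made up for this illustration, and only JDK classes are used.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Sketch: inline image bytes into an FO string as base64 data URIs (JDK only). */
final class DataUriEmbedder {

    /** Builds a data URI of the form data:<mimeType>;base64,<payload> from raw bytes. */
    static String toDataUri(String mimeType, byte[] bytes) {
        return "data:" + mimeType + ";base64," + Base64.getEncoder().encodeToString(bytes);
    }

    /** Replaces src="file:<name>" with the same image wrapped as url('data:...'). */
    static String embed(String fo, String name, String mimeType, byte[] bytes) {
        String from = "src=\"file:" + name + "\"";
        String to = "src=\"url('" + toDataUri(mimeType, bytes) + "')\"";
        return fo.replace(from, to);
    }

    public static void main(String[] args) {
        String fo = "<fo:external-graphic src=\"file:image-1.png\"/>";
        // Placeholder bytes standing in for a real PNG that is already in memory.
        byte[] fakePng = "not a real png".getBytes(StandardCharsets.US_ASCII);
        System.out.println(embed(fo, "image-1.png", "image/png", fakePng));
    }
}
```

The rewritten FO string can then be wrapped in a single StreamSource over a StringReader. Keep in mind that base64 makes each image roughly a third larger, so this may not suit very large images.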

Related

Is there any way to update the Exif tag in jpg file using java?

I'm cleaning up my jpg files by tagging them with categories. However, there are too many to do by hand, so I'm trying to tag the jpg files using Java code (to automate it).
I googled several times but could not find the answer.
// What I want is something like this.
String path = "F:\\test.jpg";
File test_file = new File(path);
test_file.property.tag = "school";
test_file.property.tag = "xxxxCity";
I have not tried anything yet, since I don't know what code would do this.

Save image file to HDFS using Spark

I have an image file
image = JavaSparkContext.binaryFiles("/path/to/image.jpg");
I would like to process it and then save the binary data to HDFS using Spark. Something like:
image.saveAsBinaryFile("hdfs://cluster:port/path/to/image.jpg")
Is this possible? I am not asking whether it is this simple, just whether it can be done, and if so, how. I am trying to keep a one-to-one mapping if possible, keeping the extension and type, so that if I download the file directly using the hdfs command line it is still a viable image file.
Yes, it is possible. But you need some data serialization plugin, for example Avro (https://github.com/databricks/spark-avro).
Assume each image is represented as binary (byte[]) in your program, so the images can be a Dataset<byte[]>.
You can save it using
datasetOfImages.write()
.format("com.databricks.spark.avro")
.save("hdfs://cluster:port/path/to/images.avro");
images.avro would be a folder containing multiple partitions, and each partition would be an Avro file holding some of the images.
Edit:
It is also possible, but not recommended, to save the images as separate files. You can call foreach on the dataset and use the HDFS API to save each image.
See below for a piece of code written in Scala. You should be able to translate it into Java.
import org.apache.hadoop.fs.{FileSystem, Path}
datasetOfImages.foreachPartition { images =>
  val fs = FileSystem.get(sparkContext.hadoopConfiguration)
  images.foreach { image =>
    val out = fs.create(new Path("/path/to/this/image"))
    out.write(image);
    out.close();
  }
}

Is it possible to read a shapefile using geotools WITHOUT specifying the url of the file?

I am creating a web application which will allow the upload of shape files for use later on in the program. I want to be able to read an uploaded shapefile into memory and extract some information from it without doing any explicit writing to the disk. The framework I am using (play-framework) automatically writes a temporary file to the disk when a file is uploaded, but it nicely handles the creation and deletion of said file for me. This file does not have any extension, however, so the traditional means of reading a shapefile via Geotools, like this
public void readInShpAndDoStuff(File the_upload) throws IOException {
    Map<String, Serializable> map = new HashMap<>();
    map.put( "url", the_upload.toURI().toURL() );
    DataStore dataStore = DataStoreFinder.getDataStore( map );
}
fails with an exception which states
NAME_OF_TMP_FILE_HERE is not one of the files types that is known to be associated with a shapefile
After looking at the source of Geotools I see that the file type is checked by looking at the file extension, and since this is a tmp file it has none. (Running file FILENAME shows that the OS recognizes this file as a shapefile).
So at long last my question is, is there a way to read in the shapefile without specifying the Url? Some function or constructor which takes a File object as the argument and doesn't rely on a path? Or is it too much trouble and I should just save a copy on the disk? The latter option is not preferable, since this will likely be operating on a VM server at some point and I don't want to deal with file system specific stuff.
Thanks in advance for any help!
I can't see how this is going to work for you: a shapefile (despite its name) is a group of 3 (or more) files which share a basename and have extensions of .shp, .dbf, .shx (and usually .prj, .sbn, .fix, .qix etc).
Is there some way to make Play write the extensions with the temp file name?
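To make the "group of files" point concrete, here is a small sketch that checks whether a set of uploaded file names forms one complete shapefile group before anything is handed to a reader. It assumes that .shp, .shx and .dbf are the mandatory members; the class and method names (ShapefileGroupCheck, missing) are made up for this illustration, and nothing here calls GeoTools.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;

/** Sketch: check that uploaded file names form one complete shapefile group. */
final class ShapefileGroupCheck {

    // Assumption: these three extensions are treated as the mandatory members.
    static final List<String> REQUIRED = Arrays.asList("shp", "shx", "dbf");

    /** Returns the required extensions that are missing for the given base name. */
    static Set<String> missing(String baseName, List<String> uploadedNames) {
        Set<String> present = new HashSet<>();
        for (String name : uploadedNames) {
            int dot = name.lastIndexOf('.');
            // Only count files whose name before the last dot matches the base name.
            if (dot > 0 && name.substring(0, dot).equals(baseName)) {
                present.add(name.substring(dot + 1).toLowerCase(Locale.ROOT));
            }
        }
        Set<String> result = new TreeSet<>(REQUIRED);
        result.removeAll(present);
        return result;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("roads.shp", "roads.dbf", "roads.prj");
        System.out.println("missing: " + missing("roads", names));
    }
}
```

A temp file with no extension would fail this check immediately, which is the situation described in the question: the upload has to arrive as several files (or an archive of them) whose names can be restored.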

Convert SVG to PDF

How would one go about converting an SVG file to a PDF programmatically? (I need to alter the SVG in certain respects before generating the PDF, so simply pre-converting it using a tool won't be sufficient.)
Ideally using Java but Perl or PHP would be fine too.
Naturally, I am considering Apache FOP and Batik with Java. However, no matter how long I search, I cannot find a simple introduction on how to do it. Things like SVGConverter have descriptions like "Defines the interface for classes that are able to convert part or all of a GraphicContext", but I don't really know what that means.
I have this feeling there must be an API to do this quite simply, provided by FOP or Batik, but I'm just not able to find it at the moment (or perhaps it really doesn't exist.)
In terms of the supported SVG features I need, the file has some paths which are filled with some linear gradients.
Ideally I could pass the SVG in as a DOM Document; then I would load my template SVG file, change it as specified by the user, and generate the PDF.
Thanks to Adrian for showing how the Batik rasterizer API is supposed to be used. However, I needed a more lightweight solution: I can't write to temporary files, and I want fewer dependencies. So, starting from the methods he pointed to, I found a way to access the lower-level code to do the conversion and nothing else.
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import org.apache.batik.transcoder.Transcoder;
import org.apache.batik.transcoder.TranscoderException;
import org.apache.batik.transcoder.TranscoderInput;
import org.apache.batik.transcoder.TranscoderOutput;
import org.apache.fop.svg.PDFTranscoder;
public class Test {
    public static void main(String[] argv) throws TranscoderException, IOException {
        Transcoder transcoder = new PDFTranscoder();
        // try-with-resources closes both streams, even if transcoding fails
        try (InputStream in = new FileInputStream(new File("/tmp/test.svg"));
             OutputStream out = new FileOutputStream(new File("/tmp/test.pdf"))) {
            transcoder.transcode(new TranscoderInput(in), new TranscoderOutput(out));
        }
    }
}
The compile-and-run commands are
javac -cp batik-rasterizer.jar -d build Test.java
java -cp build:batik-rasterizer.jar Test
The important point is that TranscoderInput and TranscoderOutput can work with any InputStream and OutputStream, not just file streams. Note that one of the TranscoderInput constructors takes an org.w3c.dom.Document, which means that you don't even need to serialize an SVG DOM into an SVG string, saving an additional step.
This version also doesn't write anything to stdout/stderr, unlike the high-level API.
For JPEG, PNG, or TIFF output, replace org.apache.fop.svg.PDFTranscoder with org.apache.batik.transcoder.image.JPEGTranscoder, PNGTranscoder, or TIFFTranscoder (note that these raster formats are in a different package).
(I'm not quite sure how Java finds the org.apache.batik.transcoder.* and org.apache.fop.svg.PDFTranscoder classes, since I don't see them in the batik-rasterizer.jar.)
Edit:
Although the simple commandline-compilation works with the batik-rasterizer.jar only, it's doing some sort of classloader magic to find all the necessary classes. In a more realistic case (building a project with Ant), you have to find the classes by hand. They can be found in batik-1.7.zip from the Batik project and fop-1.1.zip from the FOP project. From Batik, you need to compile with batik-transcoder.jar and run with
batik-transcoder.jar
batik-anim.jar
batik-awt-util.jar
batik-bridge.jar
batik-css.jar
batik-dom.jar
batik-ext.jar
batik-gvt.jar
batik-parser.jar
batik-script.jar
batik-svg-dom.jar
batik-util.jar
batik-xml.jar
xml-apis-ext.jar
From FOP, you need to compile with fop.jar and run with
fop.jar
avalon-framework-4.2.0.jar
xmlgraphics-commons-1.5.jar
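To illustrate the in-memory point from this answer without needing Batik or FOP on the classpath, here is a minimal sketch of the surrounding plumbing using only JDK classes: load a template SVG, change it through the DOM, and expose the result as a stream backed by a byte array rather than a temporary file. The class and method names (InMemorySvg, loadAndRecolor, toBytes) and the tiny template are made up for this illustration; the transcoder call itself is deliberately left out.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

/** Sketch: edit an SVG DOM and expose it as an in-memory stream (JDK only). */
final class InMemorySvg {

    static final String SVG_NS = "http://www.w3.org/2000/svg";

    /** Parses a template SVG and changes the fill of its first rect element. */
    static Document loadAndRecolor(String svgText, String fill) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(svgText.getBytes(StandardCharsets.UTF_8)));
        Element rect = (Element) doc.getElementsByTagNameNS(SVG_NS, "rect").item(0);
        rect.setAttribute("fill", fill);
        return doc;
    }

    /** Serializes the DOM into a byte array instead of a temporary file. */
    static byte[] toBytes(Document doc) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        transformer.transform(new DOMSource(doc), new StreamResult(buffer));
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        String template = "<svg xmlns=\"" + SVG_NS + "\" width=\"10\" height=\"10\">"
                + "<rect width=\"10\" height=\"10\" fill=\"red\"/></svg>";
        byte[] svgBytes = toBytes(loadAndRecolor(template, "blue"));
        // This stream can stand in wherever a FileInputStream was used before.
        InputStream in = new ByteArrayInputStream(svgBytes);
        System.out.println(in.available() + " bytes of SVG ready in memory");
    }
}
```

On the output side a ByteArrayOutputStream, or a servlet response stream, can take the place of the FileOutputStream in the same way. If the Document is passed to the transcoder directly, the toBytes step is not needed at all.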
I finally managed to find the appropriate lines of code to solve this using Batik.
You need to have the SVG file and the resulting PDF as files on disk, i.e. I couldn't find a way to do it in memory (I am writing an HTTP servlet, so I have no intrinsic need to write anything to a file; ideally I would stream the result to the HTTP client). I used File.createTempFile to create one file to dump my SVG into and another for the resulting PDF to be written to.
So the lines I used are the following:
import org.apache.batik.apps.rasterizer.DestinationType;
import org.apache.batik.apps.rasterizer.SVGConverter;
import ...
// SVG available as a DOM object (created programmatically by my program)
Document svgXmlDoc = ...
// Save this SVG into a file (required by SVG -> PDF transformation process)
File svgFile = File.createTempFile("graphic-", ".svg");
Transformer transformer = TransformerFactory.newInstance().newTransformer();
DOMSource source2 = new DOMSource(svgXmlDoc);
FileOutputStream fOut = new FileOutputStream(svgFile);
try { transformer.transform(source2, new StreamResult(fOut)); }
finally { fOut.close(); }
// Convert the SVG into PDF
File outputFile = File.createTempFile("result-", ".pdf");
SVGConverter converter = new SVGConverter();
converter.setDestinationType(DestinationType.PDF);
converter.setSources(new String[] { svgFile.toString() });
converter.setDst(outputFile);
converter.execute();
And I have the following JARs (search using Google to find the projects and download them):
avalon-framework-4.2.0.jar
batik-all-1.7.jar
commons-io-1.3.1.jar
commons-logging-1.0.4.jar
fop-0.95.jar
log4j-1.2.15.jar
xml-apis-ext.jar
xmlgraphics-commons-1.3.1.jar
You will need a library for rendering SVGs and one for writing PDFs.
I recommend SVG Salamander for the former, and iText for the latter. With SVG Salamander you can read the SVG and create an image object, and with iText you can write that image to a PDF.
I use Altsoft Xml2PDF. If I understood all your needs and requirements correctly, you'd better try their Server version of Xml2PDF.
All you need is phantomjs. You don't need the unwieldy Batik for this at all; just get to a point where you can run phantomjs, calling rasterize.js with the URL of the SVG as the source and a file location as the output. Depending on what you want to do with the PDF, you don't even need Java.
http://phantomjs.org/screen-capture.html
Look at the part starting with "Beside PNG format, PhantomJS supports JPEG, GIF, and PDF."
