Since SVG is regular XML, and both the ImageTranscoder.transcode() API and the corresponding TranscoderInput constructor accept an org.w3c.dom.Document, one would expect that loading and parsing the file with the stock Java XML parser would work:
TranscoderInput input = new TranscoderInput(loadSvgDocument(new FileInputStream(svgFile)));
BufferedImageTranscoder t = new BufferedImageTranscoder();
t.transcode(input, null);
where the loadSvgDocument() method is defined as:
Document loadSvgDocument(InputStream is) {
    DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
    // using stock Java 8 XML parser
    Document document;
    try {
        DocumentBuilder db = dbf.newDocumentBuilder();
        document = db.parse(is);
    } catch (...) {...}
    return document;
}
It does not work. I get a strange ClassCastException:
Exception in thread "main" java.lang.ClassCastException: org.apache.batik.dom.GenericElement cannot be cast to org.w3c.dom.svg.SVGSVGElement
at org.apache.batik.anim.dom.SVGOMDocument.getRootElement(SVGOMDocument.java:235)
at org.apache.batik.transcoder.SVGAbstractTranscoder.transcode(SVGAbstractTranscoder.java:193)
at org.apache.batik.transcoder.image.ImageTranscoder.transcode(ImageTranscoder.java:92)
at org.apache.batik.transcoder.XMLAbstractTranscoder.transcode(XMLAbstractTranscoder.java:142)
at org.apache.batik.transcoder.SVGAbstractTranscoder.transcode(SVGAbstractTranscoder.java:156)
Note: class BufferedImageTranscoder is my class, created as per Batik blueprints, extending ImageTranscoder which in turn extends SVGAbstractTranscoder mentioned in the stack trace above.
Unfortunately, I cannot use Batik's own parser, SAXSVGDocumentFactory:
String parser = XMLResourceDescriptor.getXMLParserClassName();
SAXSVGDocumentFactory f = new SAXSVGDocumentFactory(parser);
svgDocument = (SVGDocument) f.createDocument(..);
I am trying to render Papirus SVG icons, but they all have <svg ... version="1">, which SAXSVGDocumentFactory rejects: createDocument(..) fails with "Unsupport SVG version '1'" (they probably meant "Unsupported").
Exception in thread "main" java.lang.RuntimeException: Unsupport SVG version '1'
at org.apache.batik.anim.dom.SAXSVGDocumentFactory.getDOMImplementation(SAXSVGDocumentFactory.java:327)
at org.apache.batik.dom.util.SAXDocumentFactory.startElement(SAXDocumentFactory.java:640)
. . .
at org.apache.batik.anim.dom.SAXSVGDocumentFactory.createDocument(SAXSVGDocumentFactory.java:225)
Changing version="1" to version="1.0" in the file itself fixes the problem, and the icon renders nicely. But there are hundreds (perhaps thousands) of icons; fixing them all is tedious and would effectively turn this into a port of their project, so that is not a way forward for me. It is much easier to make the fix at run time, using the DOM API:
Element svgDocumentNode = svgDocument.getDocumentElement();
String svgVersion = svgDocumentNode.getAttribute("version");
if (svgVersion.equals("1")) {
svgDocumentNode.setAttribute("version", "1.0");
}
But that can be done only with the stock Java XML parser: the Batik XML parser blows up too early, before the Document is generated and before this code can be reached. And when I use the stock XML parser and apply the version fix, the Batik transcoder (rasterizer) does not accept the result. So I have hit a wall here.
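For reference, the run-time version fix described above can be tried with nothing but the JDK. This is a minimal sketch, not a confirmed fix for the Batik exception: the class and method names are made up for illustration, and the setNamespaceAware(true) call reflects an assumption that the missing namespace information may matter to Batik (DocumentBuilderFactory is not namespace-aware by default). Whether the transcoder then accepts the document is not verified here.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.SAXException;

// Hypothetical helper, not part of Batik or of the original post.
class SvgVersionPatchSketch {

    /** Parses SVG text with the stock JDK parser and rewrites version="1" to "1.0". */
    static Document parseAndPatch(String svgText) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            // Assumption: keep namespace information so elements carry the SVG namespace URI.
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(svgText.getBytes(StandardCharsets.UTF_8)));
            Element root = doc.getDocumentElement();
            // getAttribute returns "" when the attribute is absent, so this is null-safe.
            if ("1".equals(root.getAttribute("version"))) {
                root.setAttribute("version", "1.0");
            }
            return doc;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse SVG text", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parseAndPatch("<svg xmlns=\"http://www.w3.org/2000/svg\" version=\"1\"/>");
        System.out.println(doc.getDocumentElement().getAttribute("version"));
    }
}
```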
Is there a converter from an org.w3c.dom.Document produced by a stock XML parser to a Batik-compatible org.w3c.dom.svg.SVGDocument?
OK, I found a solution that bypasses the problem. Luckily, the SAXSVGDocumentFactory class can easily be subclassed and its critical method, getDOMImplementation(), overridden.
protected Document loadSvgDocument(InputStream is) {
String parser = XMLResourceDescriptor.getXMLParserClassName();
SAXSVGDocumentFactory f = new LenientSaxSvgDocumentFactory(parser);
SVGDocument svgDocument;
try {
svgDocument = (SVGDocument) f.createDocument("aaa", is);
} catch (...) {
...
}
return svgDocument;
}
static class LenientSaxSvgDocumentFactory extends SAXSVGDocumentFactory {
    public LenientSaxSvgDocumentFactory(String parser) {
        super(parser);
    }

    @Override
    public DOMImplementation getDOMImplementation(String ver) {
        // code is mostly rip-off from original Apache Batik 1.9 code
        // only the condition was extended to accept also "1" string
        if (ver == null || ver.length() == 0
                || ver.equals("1.0") || ver.equals("1.1") || ver.equals("1")) {
            return SVGDOMImplementation.getDOMImplementation();
        } else if (ver.equals("1.2")) {
            return SVG12DOMImplementation.getDOMImplementation();
        }
        throw new RuntimeException("Unsupported SVG version '" + ver + "'");
    }
}
This time I got lucky. The main question remains, however: is there a converter from an org.w3c.dom.Document produced by a stock XML parser to a Batik-compatible org.w3c.dom.svg.SVGDocument?
I'm using itextpdf-5.0.6.jar (Java 8), and when I try to export HTML code containing an img tag with a base64 source I get a file-not-found exception.
If I remove the image tag, everything works great!
I found a few solutions that override the image tag processor, but most of them are old and not compatible with version 5.0.6.
Here is the HTML I send:
"<!doctype html>\n<html lang=\"en\">\n<head>\n
<meta charset=\"UTF-8\">\n
<title>Test PDF</title>\n</head>\n<body>\n\n
<div class=\"pdf-header\">\n\n
<img src=\"\"> \n\n\n</div>\n\n<div class=\"main\">\n<div class=\"canvas\">\nHellow world</div></div></body>\n</html>"
part of my code:
fileOutputStream = new FileOutputStream(file);
Document document = new Document();
PdfWriter.getInstance(document, fileOutputStream);
document.open();
HTMLWorker htmlWorker = new HTMLWorker(document);
StringReader stringReader = new StringReader(htmlCode);
htmlWorker.parse(stringReader);
document.close();
fileOutputStream.close();
Any help will be appreciated.
Thanks
Please stop using HTMLWorker. As repeated many times on StackOverflow, the HTMLWorker class was abandoned in favor of XML Worker a long time ago. We won't invest in further development of HTMLWorker, so it's a very bad choice to use it. Please switch to XML Worker.
Also upgrade to the latest iText version. The version you are using dates from February 4, 2011, and many bugs have been fixed in the 4 years that have passed. Make sure that the iText jar and the XML Worker jar have the same version number.
Base64 images aren't supported yet, but I have made you a very simple Proof of Concept, showing how easy it is to add support for such images. Take a look at the ParseHtml4 example and the resulting PDF: html_4.pdf.
To achieve this, you need to write an implementation of the ImageProvider interface. I have done this by extending the AbstractImageProvider class:
class Base64ImageProvider extends AbstractImageProvider {

    @Override
    public Image retrieve(String src) {
        int pos = src.indexOf("base64,");
        try {
            if (src.startsWith("data") && pos > 0) {
                // skip the 7 characters of "base64," and decode the rest
                byte[] img = Base64.decode(src.substring(pos + 7));
                return Image.getInstance(img);
            }
            else {
                return Image.getInstance(src);
            }
        } catch (BadElementException ex) {
            return null;
        } catch (IOException ex) {
            return null;
        }
    }

    @Override
    public String getImageRootPath() {
        return null;
    }
}
As you can see, I check for the existence of "base64," in whatever is passed to XML Worker through the src attribute of the img tag. If that String is present, I decode whatever follows that "base64," and I return an Image object that is created using the resulting bytes.
Once you have this ImageProvider implementation, it's only a matter of passing it to XML Worker.
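The data-URI check described above can also be tried in isolation, without iText on the classpath. The sketch below is only an illustration under stated assumptions: it swaps iText's Base64.decode for the JDK's java.util.Base64, and the class and method names are hypothetical. It returns the raw bytes (or null for a non-data src) instead of an iText Image.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical stand-alone helper; the real provider would wrap the bytes in an iText Image.
class DataUriSketch {

    /** Returns the decoded payload of a base64 data URI, or null if src is not one. */
    static byte[] decodeDataUri(String src) {
        final String marker = "base64,";
        int pos = src.indexOf(marker);
        if (src.startsWith("data:") && pos > 0) {
            // Everything after "base64," is the encoded image data.
            return Base64.getDecoder().decode(src.substring(pos + marker.length()));
        }
        return null; // an ordinary path or URL; the caller would load it the usual way
    }

    public static void main(String[] args) {
        String payload = Base64.getEncoder().encodeToString("hello".getBytes(StandardCharsets.UTF_8));
        byte[] bytes = decodeDataUri("data:text/plain;base64," + payload);
        System.out.println(new String(bytes, StandardCharsets.UTF_8));
    }
}
```

Note that the JDK's basic decoder rejects line breaks inside the payload; if the HTML wraps long base64 strings, Base64.getMimeDecoder() may be the safer choice.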
I want to read an XML file in Java and then update certain elements in that file with new values. My file is larger than 200 MB and performance is important, so the DOM model cannot be used.
I feel that a StAX parser is the solution, but there is no decent literature on using Java StAX to read XML and then write it back to the same file.
(For reference I have been using the java tutorial and this helpful tutorial to get what I have so far)
I am using Java 7, but there don't seem to have been any updates to the XML parsing API in a long time, so this probably isn't relevant.
Currently I have this:
public static String readValueFromXML(final File xmlFile, final String value) throws FileNotFoundException, XMLStreamException
{
    // newFactory() is a static factory method, so it is called without "new"
    XMLEventReader reader = XMLInputFactory.newFactory().createXMLEventReader(new FileReader(xmlFile));
    String found = "";
    boolean read = false;
    while (reader.hasNext())
    {
        XMLEvent event = reader.nextEvent();
        if (event.isStartElement() &&
            event.asStartElement().getName().getLocalPart().equals(value))
        {
            read = true;
        }
        if (event.isCharacters() && read)
        {
            found = event.asCharacters().getData();
            break;
        }
    }
    return found;
}
which will read the XMLFile and return the value of the selected element. However, I have another method updateXMLFile(final File xmlFile, final String value) which I want to use in conjunction with this.
So my question is threefold:
Is there a StAX implementation for editing XML?
Will XPath be any help? Can that be used without converting my file to a Document?
(More Generally) Why doesn't Java have a better XML API?
There are two things you may want to look at. The first is to use JAXB to bind the XML to POJOs which you can then have your way with and serialize the structure back to XML when needed.
The second is a JDBC driver for XML, there are several available for a fee, not sure if there are any open source ones or not. In my experience JAXB is the better choice. If the XML file is too large to handle efficiently with JAXB I think you need to look at using a database as a replacement for the XML file.
This is my approach, which reads events from the file using StAX and writes them to another file. The values are updated as the loop passes over the correctly named elements.
public void read(String key, String value)
{
    try (FileReader fReader = new FileReader(inputFile); FileWriter fWriter = new FileWriter(outputFile))
    {
        XMLEventFactory factory = XMLEventFactory.newInstance();
        XMLEventReader reader = XMLInputFactory.newFactory().createXMLEventReader(fReader);
        XMLEventWriter writer = XMLOutputFactory.newFactory().createXMLEventWriter(fWriter);
        // Declared outside the loop so the flag survives from the start tag to its text event.
        boolean update = false;
        while (reader.hasNext())
        {
            XMLEvent event = reader.nextEvent();
            if (event.isStartElement() && event.asStartElement().getName().getLocalPart().equals(key))
            {
                update = true;
            }
            else if (event.isCharacters() && update)
            {
                // Swap the old text for the new value.
                Characters characters = factory.createCharacters(value);
                event = characters;
                update = false;
            }
            writer.add(event);
        }
        // Push any buffered events out before the underlying FileWriter is closed.
        writer.flush();
    }
    catch (XMLStreamException | FactoryConfigurationError | IOException e)
    {
        e.printStackTrace();
    }
}
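The same read-and-rewrite loop can be exercised on in-memory strings, which makes it easy to experiment with before pointing it at a 200 MB file. This is a minimal sketch rather than a drop-in replacement: the class and method names are invented for illustration, and enabling IS_COALESCING is an added assumption so that an element's text arrives as a single Characters event.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Hypothetical helper illustrating the event-copy technique on strings.
class StaxUpdateSketch {

    /** Copies xml event by event, replacing the text of every element named key. */
    static String replaceText(String xml, String key, String value) {
        try {
            XMLInputFactory inputFactory = XMLInputFactory.newFactory();
            // Assumption: merge adjacent text chunks so one element yields one Characters event.
            inputFactory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
            XMLEventReader reader = inputFactory.createXMLEventReader(new StringReader(xml));
            StringWriter out = new StringWriter();
            XMLEventWriter writer = XMLOutputFactory.newFactory().createXMLEventWriter(out);
            XMLEventFactory events = XMLEventFactory.newInstance();

            boolean update = false; // true between a matching start tag and its text
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (event.isStartElement()) {
                    update = event.asStartElement().getName().getLocalPart().equals(key);
                } else if (event.isCharacters() && update) {
                    event = events.createCharacters(value); // substitute the new text
                    update = false;
                } else if (event.isEndElement()) {
                    update = false; // empty element: nothing to replace
                }
                writer.add(event);
            }
            writer.close();
            reader.close();
            return out.toString();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not rewrite XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(replaceText("<root><name>old</name><other>x</other></root>", "name", "new"));
    }
}
```

Only one event is held in memory at a time, so the approach scales with file size. Writing to a second file and renaming it afterwards avoids reading and writing the same file at once.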
I am trying to join two PostScript files to one with ghost4j 0.5.0 as follows:
final PSDocument[] psDocuments = new PSDocument[2];
psDocuments[0] = new PSDocument();
psDocuments[0].load("1.ps");
psDocuments[1] = new PSDocument();
psDocuments[1].load("2.ps");
psDocuments[0].append(psDocuments[1]);
psDocuments[0].write("3.ps");
During this simplified process I got the following exception message for the above "append" line:
org.ghost4j.document.DocumentException: java.lang.ClassCastException:
org.apache.xmlgraphics.ps.dsc.events.UnparsedDSCComment cannot be cast to
org.apache.xmlgraphics.ps.dsc.events.DSCCommentPage
So far I have not managed to find out what the problem is. Maybe it is some kind of problem within one of the PostScript files?
Any help would be appreciated.
EDIT:
I tested with the Ghostscript command-line tool:
gswin32.exe -dQUIET -dBATCH -dNOPAUSE -sDEVICE=pswrite -sOutputFile="test.ps" --filename "1.ps" "2.ps"
which results in a document where 1.ps and 2.ps are merged into one(!) page (i.e. overlay).
When removing the --filename the resulting document will be a PostScript with two pages as expected.
The exception occurs because one of the 2 documents does not follow the Adobe Document Structuring Convention (DSC), which is mandatory if you want to use the Document append method.
Use the SafeAppenderModifier instead. There is an example here: http://www.ghost4j.org/highlevelapisamples.html (Append a PDF document to a PostScript document)
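To find out which of the two files is the non-conforming one, a rough check can help. The sketch below is only a heuristic under stated assumptions, not a DSC validator and not part of ghost4j: it tests that the first line starts with the "%!PS-Adobe-" header and that a "%%Pages:" comment appears somewhere in the file. The class and method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical helper: a quick plausibility check, not a full DSC parser.
class DscCheckSketch {

    /** Returns true if the PostScript text has a DSC header line and a %%Pages: comment. */
    static boolean looksLikeDsc(Reader source) {
        try (BufferedReader in = new BufferedReader(source)) {
            String first = in.readLine();
            if (first == null || !first.startsWith("%!PS-Adobe-")) {
                return false; // no DSC version header on the first line
            }
            String line;
            while ((line = in.readLine()) != null) {
                if (line.startsWith("%%Pages:")) {
                    return true;
                }
            }
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String sample = "%!PS-Adobe-3.0\n%%Pages: 1\n%%EndComments\nshowpage\n";
        System.out.println(looksLikeDsc(new StringReader(sample)));
    }
}
```

For real files, a FileReader for 1.ps and 2.ps could be passed in place of the StringReader.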
I think something is wrong in the document or in the XMLGraphics library as it seems it cannot parse a part of it.
Here is the ghost4j code that I think is failing (link):
DSCParser parser = new DSCParser(bais);
Object tP = parser.nextDSCComment(DSCConstants.PAGES);
while (tP instanceof DSCAtend)
tP = parser.nextDSCComment(DSCConstants.PAGES);
DSCCommentPages pages = (DSCCommentPages) tP;
And here you can see why XMLGraphics may be responsible (link):
private DSCComment parseDSCComment(String name, String value) {
DSCComment parsed = DSCCommentFactory.createDSCCommentFor(name);
if (parsed != null) {
try {
parsed.parseValue(value);
return parsed;
} catch (Exception e) {
//ignore and fall back to unparsed DSC comment
}
}
UnparsedDSCComment unparsed = new UnparsedDSCComment(name);
unparsed.parseValue(value);
return unparsed;
}
It seems parsed.parseValue(value) threw an exception that was swallowed by the catch block, so the method returned an unparsed comment that ghost4j did not expect.
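The failure pattern itself is easy to reproduce with plain Java. The sketch below uses hypothetical stand-in classes (PagesComment and UnparsedComment are not the real XMLGraphics types) to show how a silent fallback turns a parse error into a later ClassCastException, and how an instanceof guard before the cast produces a clearer message instead.

```java
// Hypothetical stand-ins for the comment types; names and fields are invented.
class CastGuardSketch {

    interface Comment { }

    static final class PagesComment implements Comment {
        final int count;
        PagesComment(int count) { this.count = count; }
    }

    static final class UnparsedComment implements Comment {
        final String raw;
        UnparsedComment(String raw) { this.raw = raw; }
    }

    /** Mirrors the pattern above: try the strict parse, silently fall back on failure. */
    static Comment parse(String value) {
        try {
            return new PagesComment(Integer.parseInt(value.trim()));
        } catch (NumberFormatException e) {
            return new UnparsedComment(value); // the original error is lost here
        }
    }

    /** Guards the cast so a fallback result is reported clearly. */
    static int pageCount(String value) {
        Comment c = parse(value);
        if (!(c instanceof PagesComment)) {
            throw new IllegalStateException("Could not parse page count from '" + value + "'");
        }
        return ((PagesComment) c).count;
    }

    public static void main(String[] args) {
        System.out.println(pageCount("2"));
    }
}
```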
I'm having problems parsing an XML string using XmlBeans. The problem itself occurs in a J2EE application where the string is received from external systems, but I replicated the problem in a small test project.
The only solution I found is to let XmlBeans parse a File instead of a String, but that's not an option in the J2EE application. Plus, I really want to know exactly what the problem is because I want to solve it.
Source of test class:
public class TestXmlSpy {
public static void main(String[] args) throws IOException {
InputStreamReader reader = new InputStreamReader(new FileInputStream("d:\\temp\\IE734.xml"),"UTF-8");
BufferedReader r = new BufferedReader(reader);
String xml = "";
String str;
while ((str = r.readLine()) != null) {
xml = xml + str;
}
xml = xml.trim();
System.out.println("Ready reading XML");
XmlOptions options = new XmlOptions();
options.setCharacterEncoding("UTF-8");
try {
XmlObject xmlObject = XmlObject.Factory.parse(new File("D:\\temp\\IE734.xml"), options);
System.out.println("Ready parsing File");
XmlObject.Factory.parse(xml, options);
System.out.println("Ready parsing String");
} catch (XmlException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
}
The XML file validates perfectly against the XSDs I'm using. Also, parsing it as a File object works fine and gives me a parsed XmlObject to work with. However, parsing the XML String gives the stack trace below. I've checked the string itself in the debugger and don't see anything wrong with it at first sight, especially not at row 1, column 1, which is where I think the SAX parser is having a problem, if I'm interpreting the error correctly.
Stacktrace:
Ready reading XML
Ready parsing File
org.apache.xmlbeans.XmlException: error: Unexpected element: CDATA
at org.apache.xmlbeans.impl.store.Locale$SaxLoader.load(Locale.java:3511)
at org.apache.xmlbeans.impl.store.Locale.parse(Locale.java:713)
at org.apache.xmlbeans.impl.store.Locale.parseToXmlObject(Locale.java:697)
at org.apache.xmlbeans.impl.store.Locale.parseToXmlObject(Locale.java:684)
at org.apache.xmlbeans.impl.schema.SchemaTypeLoaderBase.parse(SchemaTypeLoaderBase.java:208)
at org.apache.xmlbeans.XmlObject$Factory.parse(XmlObject.java:658)
at xmlspy.TestXmlSpy.main(TestXmlSpy.java:37)
Caused by: org.xml.sax.SAXParseException; systemId: file:; lineNumber: 1; columnNumber: 1; Unexpected element: CDATA
at org.apache.xmlbeans.impl.piccolo.xml.Piccolo.reportFatalError(Piccolo.java:1038)
at org.apache.xmlbeans.impl.piccolo.xml.Piccolo.parse(Piccolo.java:723)
at org.apache.xmlbeans.impl.store.Locale$SaxLoader.load(Locale.java:3479)
... 6 more
This is an encoding problem. The code below worked for me:
File xmlFile = new File("./data/file.xml");
FileDocument fileDoc = FileDocument.Factory.parse(xmlFile);
The exception is caused by the length of the XML file. If you add or remove one character from the file, the parser will succeed.
The problem occurs within the 3rd party PiccoloLexer library that XMLBeans relies on. It has been fixed in revision 959082 but has not been applied to xbean 2.5 jar.
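Until a fixed jar is available, one possible stopgap is to nudge the input off the critical length before handing it to the parser. The sketch below is an assumption-heavy illustration, not a verified fix: the 8193 figure comes from the title of the linked issue, it is unclear whether the limit applies to bytes or characters (so both are checked), and the class and method names are hypothetical. It relies on trailing whitespace after the root element being legal XML.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical workaround helper; the threshold is taken from the linked issue title.
class LengthWorkaroundSketch {

    static final int CRITICAL_LENGTH = 8193; // assumption, see lead-in

    /** Appends a newline if the document hits the critical length, otherwise returns it unchanged. */
    static String padIfCritical(String xml) {
        int byteLength = xml.getBytes(StandardCharsets.UTF_8).length;
        if (byteLength == CRITICAL_LENGTH || xml.length() == CRITICAL_LENGTH) {
            return xml + "\n"; // trailing whitespace after the root element is allowed
        }
        return xml;
    }

    public static void main(String[] args) {
        String xml = "<a/>";
        System.out.println(padIfCritical(xml).length());
    }
}
```

The padded string would then be passed to XmlObject.Factory.parse(...) in place of the original.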
What does the org.apache.xmlbeans.XmlException with a message of “Unexpected element: CDATA” mean?
XMLBeans - Problem with XML files if length is exactly 8193bytes
Issue reported on XMLBean Jira