ghost4j ClassCastException when joining two PostScript files - java

I am trying to join two PostScript files to one with ghost4j 0.5.0 as follows:
final PSDocument[] psDocuments = new PSDocument[2];
psDocuments[0] = new PSDocument();
psDocuments[0].load("1.ps");
psDocuments[1] = new PSDocument();
psDocuments[1].load("2.ps");
psDocuments[0].append(psDocuments[1]);
psDocuments[0].write("3.ps");
During this simplified process I got the following exception message for the above "append" line:
org.ghost4j.document.DocumentException: java.lang.ClassCastException:
org.apache.xmlgraphics.ps.dsc.events.UnparsedDSCComment cannot be cast to
org.apache.xmlgraphics.ps.dsc.events.DSCCommentPage
So far I have not been able to find out what the problem is. Maybe it is some kind of problem within one of the PostScript files?
So help would be appreciated.
EDIT:
I tested with ghostScript commandline tool:
gswin32.exe -dQUIET -dBATCH -dNOPAUSE -sDEVICE=pswrite -sOutputFile="test.ps" --filename "1.ps" "2.ps"
which results in a document where 1.ps and 2.ps are merged into one(!) page (i.e. overlay).
When removing the --filename the resulting document will be a PostScript with two pages as expected.

The exception occurs because one of the two documents does not follow the Adobe Document Structuring Conventions (DSC), which is mandatory if you want to use the Document append method.
Use the SafeAppenderModifier instead. There is an example here: http://www.ghost4j.org/highlevelapisamples.html (Append a PDF document to a PostScript document)

I think something is wrong either in the document or in the XMLGraphics library, as it seems that part of it cannot be parsed.
Here is the code in ghost4j that I think is failing (link):
DSCParser parser = new DSCParser(bais);
Object tP = parser.nextDSCComment(DSCConstants.PAGES);
while (tP instanceof DSCAtend)
    tP = parser.nextDSCComment(DSCConstants.PAGES);
DSCCommentPages pages = (DSCCommentPages) tP;
And here you can see why XMLGraphics may be responsible (link):
private DSCComment parseDSCComment(String name, String value) {
    DSCComment parsed = DSCCommentFactory.createDSCCommentFor(name);
    if (parsed != null) {
        try {
            parsed.parseValue(value);
            return parsed;
        } catch (Exception e) {
            // ignore and fall back to unparsed DSC comment
        }
    }
    UnparsedDSCComment unparsed = new UnparsedDSCComment(name);
    unparsed.parseValue(value);
    return unparsed;
}
It seems parsed.parseValue(value) threw an exception; it was swallowed by the catch block, and the method returned an unparsed version that ghost4j did not expect.
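As a quick diagnostic before handing files to append(), you can check which DSC comments a PostScript file actually carries. This is a minimal stdlib sketch, not ghost4j or XMLGraphics API: DSC comments are lines beginning with %%, and the failing code path above expects a parseable %%Pages: comment (a value of "(atend)" means it is deferred to the trailer).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;

public class DscCheck {

    // Collect DSC comments (lines of the form "%%Key: value" or "%%Key")
    // so you can see whether e.g. %%Pages: is present and what its value is.
    // Repeated keys (like %%Page:) keep only the last occurrence here.
    static Map<String, String> dscComments(Reader src) throws IOException {
        Map<String, String> comments = new LinkedHashMap<>();
        BufferedReader r = new BufferedReader(src);
        String line;
        while ((line = r.readLine()) != null) {
            if (!line.startsWith("%%")) continue;
            int colon = line.indexOf(':');
            if (colon >= 0) {
                comments.put(line.substring(2, colon), line.substring(colon + 1).trim());
            } else {
                comments.put(line.substring(2), "");
            }
        }
        return comments;
    }

    public static void main(String[] args) throws IOException {
        String ps = "%!PS-Adobe-3.0\n%%Pages: 2\n%%Page: 1 1\nshowpage\n";
        Map<String, String> c = dscComments(new StringReader(ps));
        // A DSC-conformant file should report its page count here.
        System.out.println("Pages comment: " + c.get("Pages"));
    }
}
```

If a file yields no Pages entry at all, the document likely lacks the DSC structure append() relies on, and the SafeAppenderModifier route is the way to go.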

Related

Using Groovy to overwrite a FlowFile in NiFi

I'm trying to do something fairly simple and read an i9 PDF form from an incoming FlowFile, parse the first and last name out of it into a JSON, then output the JSON to the outgoing FlowFile.
I found no official documentation on how to do this, but someone has written up several cookbooks on doing things in several scripting languages in NiFi here. It seems pretty straightforward and I'm pretty sure I'm doing what is written there, but I'm not even sure the PDF is being read at all. It simply passes the PDF unmodified out to REL_SUCCESS every time.
Link to sample PDF
import java.awt.Rectangle
import java.nio.charset.StandardCharsets
import org.apache.pdfbox.pdmodel.PDDocument
import org.apache.pdfbox.pdmodel.PDPage
import org.apache.pdfbox.util.PDFTextStripperByArea
import com.google.gson.Gson
def flowFile = session.get()
flowFile = session.write(flowFile, { inputStream, outputStream ->
    try {
        // Load FlowFile contents
        PDDocument document = PDDocument.load(inputStream)
        PDFTextStripperByArea stripper = new PDFTextStripperByArea()
        // Get the first page
        List<PDPage> allPages = document.getDocumentCatalog().getAllPages()
        PDPage page = allPages.get(0)
        // Define the areas to search and add them as search regions
        stripper = new PDFTextStripperByArea()
        Rectangle lname = new Rectangle(25, 226, 240, 15)
        stripper.addRegion("lname", lname)
        Rectangle fname = new Rectangle(276, 226, 240, 15)
        stripper.addRegion("fname", fname)
        // Load the results into a JSON
        def boxMap = [:]
        stripper.setSortByPosition(true)
        stripper.extractRegions(page)
        regions = stripper.getRegions()
        for (String region : regions) {
            String box = stripper.getTextForRegion(region)
            boxMap.put(region, box)
        }
        Gson gson = new Gson()
        // Remove random noise from the output
        json = gson.toJson(boxMap, LinkedHashMap.class)
        json = json.replace('\\n', '')
        json = json.replace('\\r', '')
        json = json.replace(',"', ',\n"')
        // Overwrite FlowFile contents with the JSON
        outputStream.write(json.getBytes(StandardCharsets.UTF_8))
    } catch (Exception e) {
        System.out.println(e.getMessage())
        session.transfer(flowFile, REL_FAILURE)
    }
} as StreamCallback)
session.transfer(flowFile, REL_SUCCESS)
EDIT:
Was able to confirm that the flowFile object is being read properly by subbing a txt file in. So the problem seems to be that the inputStream is never being handed off to the PDDocument or something is happening when it does. I edited the code to try reading it into a File object first but that resulted in an error:
FlowFileHandlingException: null is not known in this session
EDIT Edit:
Solved by moving my try/catch; I don't quite understand why that works. My code above has been edited and now works properly.
session.get() can return null, so definitely add a line after it: if (!flowFile) return. Also put the try/catch outside the session.write(); that way you can put the session.transfer(flowFile, REL_SUCCESS) after the session.write() (inside the try), and the catch can transfer to failure.
Also I can't tell from the code how the PDFTextStripperByArea works to get the info from the incoming document. It looks like all the document stuff is inside the try, so wouldn't be available to the PDFTextStripper (and isn't passed in).
None of these things explain why you're getting the original flow file on the success relationship, but maybe there's something I'm not seeing that would be magically fixed by the changes above :)
Also, if you use log.info() or log.error() rather than System.out.println(), you will see the output in the NiFi logs. For errors, it will also post a bulletin to the processor, and you can see the message by hovering over the top-right corner of the processor (a red square appears there when a bulletin is present).
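For reference, the noise-removal step in the script operates on the serialized JSON string, not on the map itself. A plain-Java sketch of the same three replacements (the toJson helper here is a hypothetical stand-in for Gson, just enough to reproduce its escaping of newlines):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class JsonClean {

    // Minimal stand-in for gson.toJson(map): serializes a string map,
    // escaping newlines and carriage returns the way JSON requires.
    static String toJson(Map<String, String> m) {
        StringBuilder sb = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, String> e : m.entrySet()) {
            if (!first) sb.append(',');
            first = false;
            sb.append('"').append(e.getKey()).append("\":\"")
              .append(e.getValue().replace("\n", "\\n").replace("\r", "\\r"))
              .append('"');
        }
        return sb.append('}').toString();
    }

    // The same three replacements as the Groovy script: strip escaped
    // newlines, then put each subsequent key on its own line.
    static String clean(String json) {
        return json.replace("\\n", "").replace("\\r", "").replace(",\"", ",\n\"");
    }

    public static void main(String[] args) {
        Map<String, String> boxes = new LinkedHashMap<>();
        boxes.put("lname", "Smith\n"); // region text typically ends in a newline
        boxes.put("fname", "Jane\n");
        System.out.println(clean(toJson(boxes)));
    }
}
```

Note that "\\n" in the source matches the two characters backslash and n, i.e. the escaped newline inside the JSON string, not a real line break.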

Using vanilla W3C Document and Apache Batik SVG rasterizer

As SVG is regular XML, and the TranscoderInput constructor used by the ImageTranscoder.transcode() API accepts an org.w3c.dom.Document, one would expect that loading and parsing the file with a stock Java XML parser would work:
TranscoderInput input = new TranscoderInput(loadSvgDocument(new FileInputStream(svgFile)));
BufferedImageTranscoder t = new BufferedImageTranscoder();
t.transcode(input, null);
Where loadSvgDocument() method is defined as:
Document loadSvgDocument(InputStream is) {
    DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
    // using the stock Java 8 XML parser
    Document document;
    try {
        DocumentBuilder db = dbf.newDocumentBuilder();
        document = db.parse(is);
    } catch (...) {...}
    return document;
}
It does not work. I am getting strange casting exceptions.
Exception in thread "main" java.lang.ClassCastException: org.apache.batik.dom.GenericElement cannot be cast to org.w3c.dom.svg.SVGSVGElement
at org.apache.batik.anim.dom.SVGOMDocument.getRootElement(SVGOMDocument.java:235)
at org.apache.batik.transcoder.SVGAbstractTranscoder.transcode(SVGAbstractTranscoder.java:193)
at org.apache.batik.transcoder.image.ImageTranscoder.transcode(ImageTranscoder.java:92)
at org.apache.batik.transcoder.XMLAbstractTranscoder.transcode(XMLAbstractTranscoder.java:142)
at org.apache.batik.transcoder.SVGAbstractTranscoder.transcode(SVGAbstractTranscoder.java:156)
Note: class BufferedImageTranscoder is my class, created as per Batik blueprints, extending ImageTranscoder which in turn extends SVGAbstractTranscoder mentioned in the stack trace above.
Unfortunately I cannot use Batik own parser, SAXSVGDocumentFactory:
String parser = XMLResourceDescriptor.getXMLParserClassName();
SAXSVGDocumentFactory f = new SAXSVGDocumentFactory(parser);
svgDocument = (SVGDocument) f.createDocument(..);
I am trying to render Papirus SVG icons but they all have <svg ... version="1"> and SAXSVGDocumentFactory does not like that and fails on the createDocument(..) with Unsupport SVG version '1'. They probably meant unsupported.
Exception in thread "main" java.lang.RuntimeException: Unsupport SVG version '1'
at org.apache.batik.anim.dom.SAXSVGDocumentFactory.getDOMImplementation(SAXSVGDocumentFactory.java:327)
at org.apache.batik.dom.util.SAXDocumentFactory.startElement(SAXDocumentFactory.java:640)
. . .
at org.apache.batik.anim.dom.SAXSVGDocumentFactory.createDocument(SAXSVGDocumentFactory.java:225)
Changing version="1" to version="1.0" in the file itself fixes the problem and the icon is rendered nicely for me. But there are hundreds (thousands) of icons and fixing them all is tedious and I would effectively create a port of their project. This is not a way forward for me. Much easier is to make the fix in run time, using DOM API:
Element svgDocumentNode = svgDocument.getDocumentElement();
String svgVersion = svgDocumentNode.getAttribute("version");
if (svgVersion.equals("1")) {
    svgDocumentNode.setAttribute("version", "1.0");
}
But that can be done only with the stock Java XML parser; Batik's own parser blows up too early, before this code can be reached, before the Document is even created. And when I use the stock XML parser and make the version fix, the Batik transcoder (rasterizer) does not like the result. So I hit a wall here.
Is there a converter from a stock-parser-produced org.w3c.dom.Document to a Batik-compatible org.w3c.dom.svg.SVGDocument?
OK, I found a solution bypassing the problem. Luckily, the class SAXSVGDocumentFactory can easily be subclassed and the critical method getDOMImplementation() overridden.
protected Document loadSvgDocument(InputStream is) {
    String parser = XMLResourceDescriptor.getXMLParserClassName();
    SAXSVGDocumentFactory f = new LenientSaxSvgDocumentFactory(parser);
    SVGDocument svgDocument;
    try {
        svgDocument = (SVGDocument) f.createDocument("aaa", is);
    } catch (...) {
        ...
    }
    return svgDocument;
}

static class LenientSaxSvgDocumentFactory extends SAXSVGDocumentFactory {
    public LenientSaxSvgDocumentFactory(String parser) {
        super(parser);
    }

    @Override
    public DOMImplementation getDOMImplementation(String ver) {
        // code is mostly a rip-off of the original Apache Batik 1.9 code;
        // only the condition was extended to also accept the "1" string
        if (ver == null || ver.length() == 0
                || ver.equals("1.0") || ver.equals("1.1") || ver.equals("1")) {
            return SVGDOMImplementation.getDOMImplementation();
        } else if (ver.equals("1.2")) {
            return SVG12DOMImplementation.getDOMImplementation();
        }
        throw new RuntimeException("Unsupported SVG version '" + ver + "'");
    }
}
This time I got lucky; the main question remains, however: is there a converter from a stock-parser-produced org.w3c.dom.Document to a Batik-compatible org.w3c.dom.svg.SVGDocument?
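The run-time version fix with the stock parser alone can be exercised like this (pure JAXP, no Batik on the classpath; the SVG string is a made-up minimal example):

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class SvgVersionFix {

    // Parse SVG with the stock JAXP parser and normalize version="1" to "1.0".
    static Document loadAndFix(byte[] svgBytes) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // SVG is namespaced; Batik expects that
        Document doc = dbf.newDocumentBuilder()
                .parse(new ByteArrayInputStream(svgBytes));
        Element root = doc.getDocumentElement();
        if ("1".equals(root.getAttribute("version"))) {
            root.setAttribute("version", "1.0");
        }
        return doc;
    }

    public static void main(String[] args) throws Exception {
        String svg = "<svg xmlns=\"http://www.w3.org/2000/svg\" version=\"1\"/>";
        Document doc = loadAndFix(svg.getBytes(StandardCharsets.UTF_8));
        System.out.println(doc.getDocumentElement().getAttribute("version"));
    }
}
```

The catch, as described above, is that the result is still a generic DOM Document, not an SVGOMDocument, which is exactly why the transcoder throws the ClassCastException; the LenientSaxSvgDocumentFactory subclass remains the working route.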

How to get the opened document using UNO?

I'm writing an add-on that opens a dialog, and I need to access the currently opened text document, but I don't know how to get it.
I'm using the OpenOffice plug-in in NetBeans, and I started from an Add-on project. It created a class that gives me an XComponentContext instance, but I don't know how to use it to get an OfficeDocument instance for the current document.
I've been googling for some time and I can't find any example that uses an existing, opened, document. They all start from a new document or a document that is loaded first so they have an URL for it.
I gave it a try based on the OpenOffice wiki (https://wiki.openoffice.org/wiki/API/Samples/Java/Office/DocumentHandling) and this is what I came up with:
private OfficeDocument getDocument() {
    if (this.officeDocument == null) {
        try {
            // this causes the error
            XMultiComponentFactory xMultiComponentFactory = this.xComponentContext.getServiceManager();
            Object oDesktop = xMultiComponentFactory.createInstanceWithContext("com.sun.star.frame.Desktop", this.xComponentContext);
            XComponentLoader xComponentLoader = UnoRuntime.queryInterface(XComponentLoader.class, oDesktop);
            String url = "private:factory/swriter";
            String targetFrameName = "_self";
            int searchFlags = FrameSearchFlag.SELF;
            PropertyValue[] propertyValues = new PropertyValue[1];
            propertyValues[0] = new PropertyValue();
            propertyValues[0].Name = "Hidden";
            propertyValues[0].Value = Boolean.TRUE;
            XComponent xComponent = xComponentLoader.loadComponentFromURL(url, targetFrameName, searchFlags, propertyValues);
            XModel xModel = UnoRuntime.queryInterface(XModel.class, xComponent);
            this.officeDocument = new OfficeDocument(xModel);
        } catch (com.sun.star.uno.Exception ex) {
            throw new RuntimeException(ex);
        }
    }
    return this.officeDocument;
}
But there is something strange going on. Just having this method in my class, even if it's never called anywhere, causes an error when adding the add-on.
(com.sun.star.depoyment.DeploymentDescription){{ Message = "Error during activation of: VaphAddOn.jar", Context = (com.sun.star.uno.XInterface) #6ce03e0 }, Cause = (any) {(com.sun.star.registry.CannotRegisterImplementationException){{ Message = "", Context = (com.sun.star.uno.XInterface) #0 }}}}
It seems this line causes the error:
XMultiComponentFactory xMultiComponentFactory = this.xComponentContext.getServiceManager();
I have no idea how to proceed.
I posted this question on the OpenOffice forum but I haven't got a response there. I'm trying my luck here now.
Use this in your code to get the current document:
import com.sun.star.frame.XDesktop;
...
XDesktop xDesktop = (XDesktop) UnoRuntime.queryInterface(XDesktop.class, oDesktop);
XComponent xComponent = xDesktop.getCurrentComponent();
I opened the BookmarkInsertion sample in NetBeans and added this code to use the current document instead of loading a new document.
As for the error, there may be a problem with how the add-on is being built. A couple of things to check:
Does the Office SDK version match the Office version? Check version number and whether it's 32- or 64-bit.
Make sure that 4 .jar files (juh.jar, jurt.jar, unoil.jar, ridl.jar) are shown under Libraries in NetBeans, because they need to be included along with the add-on.
If you get frustrated with trying to get the build set up correctly, then you might find it easier to use Python, since it doesn't need to be compiled. Also, Python does not require queryInterface().

Change id3 tag version programmatically (pref java)

I need a way to change the id3 tag version of mp3 files to some id3v2.x programmatically, preferably using Java, though anything that works is better than nothing. Bonus points if it converts the existing tag so that already existing data isn't destroyed, rather than creating a new tag entirely.
Edit: Jaudiotagger worked, thanks. Sadly I had to restrict it to mp3 files, and to saving data contained in previous tags only if they were id3. I decided to convert the tag to ID3v2.3, since Windows Explorer can't handle v2.4; it was a bit tricky, since the program was somewhat confused about whether to use the copy constructor or the conversion constructor.
MP3File mf = null;
try {
    mf = (MP3File) AudioFileIO.read(new File(pathToMp3File));
} catch (Exception e) {}
ID3v23Tag tag;
if (mf.hasID3v2Tag()) tag = new ID3v23Tag(mf.getID3v2TagAsv24());
else if (mf.hasID3v1Tag()) tag = new ID3v23Tag(mf.getID3v1Tag());
else tag = new ID3v23Tag();
My application must be able to read id3v1 and id3v1.1 tags, but shall write only v2.3, so I needed a somewhat longer piece of code:
AudioFile mf;
Tag mTagsInFile;
...
mf = ... // open the audio file the usual way
...
mTagsInFile = mf.getTag();
if (mTagsInFile == null)
{
    // contrary to getTag(), getTagOrCreateAndSetDefault() ignores id3v1 tags
    mTagsInFile = mf.getTagOrCreateAndSetDefault();
}
// mp3 id3v1 and id3v1.1 are suboptimal, convert to id3v23
if (mf instanceof MP3File)
{
    MP3File mf3 = (MP3File) mf;
    if (mf3.hasID3v1Tag() && !mf3.hasID3v2Tag())
    {
        // convert ID3v1 tag to ID3v23
        mTagsInFile = new ID3v23Tag(mf3.getID3v1Tag());
        mf3.setID3v1Tag(null);   // remove v1 tags
        mf3.setTag(mTagsInFile); // add v2 tags
    }
}
Basically, we have to know that getTagOrCreateAndSetDefault() and similar methods unfortunately ignore id3v1 tags, so we first call getTag(), and only if that fails do we call the method mentioned above.
Additionally, the code must also deal with flac and mp4, so we make sure to do our conversion only for mp3 files.
Finally there is a bug in JaudioTagger. You may replace this line
String genre = "(" + genreId + ") " + GenreTypes.getInstanceOf().getValueForId(genreId);
in "ID3v24Tag.java" with this one
String genre = GenreTypes.getInstanceOf().getValueForId(genreId);
Otherwise genre 12 from id3v1 ends up as "(12) Other", which is later converted to "Other Other", and that is not what we would expect. Maybe someone has a more elegant solution.
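If you only need to know which tag versions a file carries before deciding how to convert, the ID3 formats make that detectable from raw bytes alone, without a tagging library: an ID3v2 tag starts the file with the bytes "ID3" followed by a major-version byte, and an ID3v1 tag occupies the last 128 bytes of the file, beginning with "TAG". A minimal sketch:

```java
public class Id3Probe {

    // An ID3v2 tag header is 10 bytes: 'I','D','3', major version,
    // revision, flags, and a 4-byte size.
    static boolean hasId3v2(byte[] file) {
        return file.length >= 10
                && file[0] == 'I' && file[1] == 'D' && file[2] == '3';
    }

    // Major version byte: 3 means ID3v2.3, 4 means ID3v2.4; -1 if no v2 tag.
    static int id3v2Major(byte[] file) {
        return hasId3v2(file) ? file[3] : -1;
    }

    // An ID3v1 tag is the last 128 bytes of the file, starting with "TAG".
    static boolean hasId3v1(byte[] file) {
        if (file.length < 128) return false;
        int off = file.length - 128;
        return file[off] == 'T' && file[off + 1] == 'A' && file[off + 2] == 'G';
    }

    public static void main(String[] args) {
        byte[] v23 = new byte[] {'I', 'D', '3', 3, 0, 0, 0, 0, 0, 0};
        System.out.println(hasId3v2(v23) + " v2." + id3v2Major(v23));

        byte[] v1 = new byte[128];
        v1[0] = 'T'; v1[1] = 'A'; v1[2] = 'G';
        System.out.println(hasId3v1(v1));
    }
}
```

This mirrors what hasID3v1Tag()/hasID3v2Tag() report in Jaudiotagger, and can be handy for filtering files before opening them with the library.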
You can use different libraries for this purpose, for example this or this.

XmlBeans error: unexpected element CDATA when parsing String

I'm having problems parsing an XML string using XmlBeans. The problem itself occurs in a J2EE application, where the string is received from external systems, but I replicated it in a small test project.
The only solution I found is to let XmlBeans parse a File instead of a String, but that's not an option in the J2EE application. Plus, I really want to know what exactly the problem is, because I want to solve it.
Source of test class:
public class TestXmlSpy {
    public static void main(String[] args) throws IOException {
        InputStreamReader reader = new InputStreamReader(new FileInputStream("d:\\temp\\IE734.xml"), "UTF-8");
        BufferedReader r = new BufferedReader(reader);
        String xml = "";
        String str;
        while ((str = r.readLine()) != null) {
            xml = xml + str;
        }
        xml = xml.trim();
        System.out.println("Ready reading XML");
        XmlOptions options = new XmlOptions();
        options.setCharacterEncoding("UTF-8");
        try {
            XmlObject xmlObject = XmlObject.Factory.parse(new File("D:\\temp\\IE734.xml"), options);
            System.out.println("Ready parsing File");
            XmlObject.Factory.parse(xml, options);
            System.out.println("Ready parsing String");
        } catch (XmlException e) {
            e.printStackTrace();
        }
    }
}
The XML file validates perfectly against the XSDs I'm using. Also, parsing it as a File object works fine and gives me a parsed XmlObject to work with. However, parsing the XML string gives the stack trace below. I've checked the string itself in the debugger and don't really see anything wrong with it at first sight, especially not at row 1, column 1, which is where I think the SAX parser hits a problem, if I'm interpreting the error correctly.
Stacktrace:
Ready reading XML
Ready parsing File
org.apache.xmlbeans.XmlException: error: Unexpected element: CDATA
at org.apache.xmlbeans.impl.store.Locale$SaxLoader.load(Locale.java:3511)
at org.apache.xmlbeans.impl.store.Locale.parse(Locale.java:713)
at org.apache.xmlbeans.impl.store.Locale.parseToXmlObject(Locale.java:697)
at org.apache.xmlbeans.impl.store.Locale.parseToXmlObject(Locale.java:684)
at org.apache.xmlbeans.impl.schema.SchemaTypeLoaderBase.parse(SchemaTypeLoaderBase.java:208)
at org.apache.xmlbeans.XmlObject$Factory.parse(XmlObject.java:658)
at xmlspy.TestXmlSpy.main(TestXmlSpy.java:37)
Caused by: org.xml.sax.SAXParseException; systemId: file:; lineNumber: 1; columnNumber: 1; Unexpected element: CDATA
at org.apache.xmlbeans.impl.piccolo.xml.Piccolo.reportFatalError(Piccolo.java:1038)
at org.apache.xmlbeans.impl.piccolo.xml.Piccolo.parse(Piccolo.java:723)
at org.apache.xmlbeans.impl.store.Locale$SaxLoader.load(Locale.java:3479)
... 6 more
This is an encoding problem. I used the code below, which worked for me:
File xmlFile = new File("./data/file.xml");
FileDocument fileDoc = FileDocument.Factory.parse(xmlFile);
The exception is caused by the length of the XML file. If you add or remove one character from the file, the parser will succeed.
The problem occurs within the 3rd party PiccoloLexer library that XMLBeans relies on. It has been fixed in revision 959082 but has not been applied to xbean 2.5 jar.
What does the org.apache.xmlbeans.XmlException with a message of “Unexpected element: CDATA” mean?
XMLBeans - Problem with XML files if length is exactly 8193bytes
Issue reported on XMLBean Jira
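Independent of the Piccolo bug, note that the question's read loop concatenates lines without their terminators, so the in-memory string is not byte-for-byte what was on disk. A simpler and safer way to load a whole file into a String with an explicit encoding (one call, newlines preserved; on Java 11+ you could use Files.readString instead):

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ReadWholeFile {

    // Read an entire file into a String, preserving line terminators.
    static String read(Path p) throws IOException {
        return new String(Files.readAllBytes(p), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Demo with a temp file standing in for IE734.xml
        Path p = Files.createTempFile("ie734", ".xml");
        Files.write(p, "<root>\n  <a/>\n</root>\n".getBytes(StandardCharsets.UTF_8));
        String xml = read(p);
        System.out.println(xml.contains("\n")); // newlines survive the round trip
        Files.delete(p);
    }
}
```

This does not fix the length-sensitive PiccoloLexer bug described above, but it does remove one source of difference between the File-based and String-based parse paths.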
