I am using a library that converts HTML to an image and saves the image as a file.
I don't want the image to be saved as a file; I want it in a string that I can process further.
Is there any way I can do that?
Sorry if this question has already been asked.
The short answer is: NO!
I assume the image is in a binary format of some sort. It represents the rendered HTML output as, for example, a .jpg, so all layers of the DOM have been merged and you cannot reproduce the DOM from the binary representation.
You may want to save the HTML code as text instead, ideally as .html (or, if need be, as .xml), and put that into a file.
You want to work with binary data.
If you want the binary data as Base64, use the Apache Commons Codec library for the conversion:
http://commons.apache.org/proper/commons-codec/apidocs/org/apache/commons/codec/binary/Base64.html
This is a very commonly used Base64 library.
Also check out Apache Commons IO (FileUtils, IOUtils).
Example usage:
String base64 = Base64.encodeBase64String(FileUtils.readFileToByteArray(file));
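If the file can be skipped altogether, the image can be encoded straight from memory. The following is only a sketch under two assumptions: the conversion library can hand you a BufferedImage (the question does not name the library), and Java 8 or later is available so the JDK's own java.util.Base64 can stand in for Commons Codec. The class and method names are made up for illustration.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Base64;
import javax.imageio.ImageIO;

final class ImageToBase64 {
    // Encodes the image as PNG into an in-memory buffer (no file involved)
    // and returns the bytes as a Base64 string.
    static String toBase64Png(BufferedImage image) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        if (!ImageIO.write(image, "png", buffer)) {
            throw new IOException("No PNG writer available");
        }
        return Base64.getEncoder().encodeToString(buffer.toByteArray());
    }

    public static void main(String[] args) throws IOException {
        // Placeholder image; in practice this would come from the HTML renderer.
        BufferedImage image = new BufferedImage(2, 2, BufferedImage.TYPE_INT_RGB);
        String encoded = toBase64Png(image);
        System.out.println(encoded.length() + " Base64 characters");
    }
}
```

The string can be turned back into image bytes with Base64.getDecoder().decode(...) whenever the binary form is needed again.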
How can I convert a .prn file into an HTML page using Java?
I am treating it as a text file and reading it line by line, but that's quite cumbersome because each line requires its own splitting logic. Since the .prn file is nicely formatted, can we extract its contents directly and load them as HTML? Any suggestions?
Since a .prn file is a byte stream that is sent to a printer for printing, I think you are going to have to keep using your custom parser, as the Java Print Service doesn't appear to have any options for parsing.
If the tags are consistent with other file formats, it may be worthwhile to check out other parsing libraries such as simple.json and adapt them to your needs.
I have used iText in Java to convert HTML to PDF.
Now I want to test whether the generated PDF is correct, i.e. that the contents are all correct and at the correct positions.
Is there a way to test my code?
Basically, your question is about validating iText output.
If you do not trust the library that converts HTML to PDF, you probably do not trust it to read the raw PDF data either. You can therefore use another library (PDF Clown) to parse the PDF as a validation step.
You have two approaches.
The first one requires rasterizing the PDF (Ghostscript) and comparing the result to the HTML. The performance overhead is significant.
The second one parses the document format. I have gone into depth in my previous answer about searching for text inside a PDF file.
There I covered searching for text as well as finding its position on the page.
I would suggest simply not validating the output unless you know something is wrong.
These libraries are widely-used and well-tested.
I have created a program that should one day become a PDF editor.
Its purpose will be saving the GUI's textual content to a PDF and loading it back from the PDF. The GUI resembles a text editor, but it only has certain fields (JTextAreas, actually).
It can look like this (this is only one page; it can have many more, and the upper and lower margins are cut out of the picture). It should actually resemble A4 in pixel size.
I have looked around a bit for PDF libraries and found that iText could suit my PDF-creation needs. However, if I understood correctly, it retrieves the text of a whole page as one string, which won't work for me, because I need to detect the different fields/paragraphs/or something similar to be able to load them back into the program.
Now, I'm a bit lazy, and I don't want to spend hours going through numerous PDF libraries just to find out that they won't work for me.
Instead, I'm asking someone with a bit more Java PDF handling experience to recommend one that fits my needs.
Or maybe recommend how to add invisible parts to the PDF that would help my program determine exactly where each piece of text is situated inside the PDF file...
Just to be clear (I phrased my question badly before): the only thing I need to put in my PDF is text, and that's all I need to be able to get out later. My program only needs to read PDFs that it created itself...
Also, because of the designated use of the files created with this program, they need to be in PDF format.
Short Answer: Use an intermediate format like JSON or XML.
Long Answer: You're using PDFs in a way they weren't designed for. PDFs were not designed to store data; they were designed to present and format data in a portable form. Furthermore, a PDF is a very "heavy" way to store data. I suggest storing your data in another manner, perhaps in a format like JSON or XML.
The advantage now is that you are not tied to a specific output-format like PDF. This can come in handy later on if you decide that you want to export your data into another format (like a Word document, or an image) because you now have a common representation.
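As a rough illustration of such an intermediate format, the sketch below stores each text field under a key and round-trips it through XML using only the JDK's java.util.Properties. The class name and the key scheme ("page1.field1") are assumptions made up for this example; a real editor would probably want a richer schema.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Properties;

final class FieldStore {
    // Serializes the field texts to an XML document held in memory.
    static byte[] save(Properties fields) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        fields.storeToXML(out, "editor fields");
        return out.toByteArray();
    }

    // Reads the field texts back from the XML produced by save().
    static Properties load(byte[] xml) throws IOException {
        Properties fields = new Properties();
        fields.loadFromXML(new ByteArrayInputStream(xml));
        return fields;
    }

    public static void main(String[] args) throws IOException {
        Properties fields = new Properties();
        // Hypothetical keys: one entry per text area on a page.
        fields.setProperty("page1.field1", "Title text");
        fields.setProperty("page1.field2", "Body text");

        Properties restored = load(save(fields));
        System.out.println(restored.getProperty("page1.field1"));
    }
}
```

The same bytes could be written to a file next to the exported PDF, so the PDF stays a pure presentation format while the XML remains the source of truth.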
I found this link and another link with examples that show how to store and read back metadata in your PDF. This might be what you're looking for, but again, I don't recommend it.
If you really insist on using PDF to store data, I suggest that you store the actual data in either XML or RDF and attach that to the PDF file when you generate it. Then you can read the XML back to recover the data.
Assuming that your application will only consume PDF files generated by the same application, there is a part of the PDF specification called Marked Content that was introduced precisely for this purpose. Using Marked Content you can specify the structure of the text in your document (chapter, paragraph, etc.).
Read Chapter 14 - Document Interchange of the PDF Reference Document for more details.
I created a Microsoft Word document and tried to write the buffered image to it but all I got was garbled text. Is there a way to write (preferably append) a buffered image to a doc or RTF file?
I want to avoid using docx4j or iText or any external package for that matter due to some constraints. But if there is no other way then please do let me know.
My code, in case anyone needs it for reference:
ps_file = new File("ps_file.doc");
ImageIO.write(i1, "jpg", ps_file);
Word documents have their own syntax for storing data, so you can't just append content to them and expect it to work.
You will have to use a third-party library unless you're willing to reinvent the wheel.
You can however create an RTF file which stores the image. There's a question similar to it that's been answered here:
Programmatically adding Images to RTF Document
Obviously it's for C#, but the same procedure can be applied in Java.
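A minimal Java sketch of that idea, using only the JDK: the image is encoded as PNG in memory and embedded as hexadecimal data in an RTF \pict group. The class name is hypothetical, the 15 twips-per-pixel factor assumes a 96 dpi image, and how a given version of Word renders the result has not been verified here.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import javax.imageio.ImageIO;

final class RtfImageWriter {
    // Builds a small RTF document whose only content is the given image.
    static String toRtf(BufferedImage image) throws IOException {
        ByteArrayOutputStream png = new ByteArrayOutputStream();
        if (!ImageIO.write(image, "png", png)) {
            throw new IOException("No PNG writer available");
        }
        StringBuilder rtf = new StringBuilder("{\\rtf1\\ansi\n{\\pict\\pngblip");
        // \picw and \pich are in pixels; \picwgoal and \pichgoal are in twips
        // (1/1440 inch). 15 twips per pixel assumes 96 dpi.
        rtf.append("\\picw").append(image.getWidth())
           .append("\\pich").append(image.getHeight())
           .append("\\picwgoal").append(image.getWidth() * 15)
           .append("\\pichgoal").append(image.getHeight() * 15)
           .append('\n');
        // RTF carries the picture bytes as plain hexadecimal text.
        for (byte b : png.toByteArray()) {
            rtf.append(String.format("%02x", b & 0xFF));
        }
        return rtf.append("}\n}").toString();
    }

    public static void main(String[] args) throws IOException {
        BufferedImage image = new BufferedImage(4, 4, BufferedImage.TYPE_INT_RGB);
        System.out.println(toRtf(image).length() + " characters of RTF");
    }
}
```

The returned string contains only ASCII characters, so it can be written to a .rtf file with any ASCII-compatible encoding.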
I have an XML file which I want to print as a PDF using PCL. I am new to PCL. Can I use PCL to get the XML printed in PDF format directly, or do I need some intermediate process to create a PDF file and then use PCL to get it printed?
If you have an XML file, there are two ways to get a PDF file.
1. Create a stylesheet for your XML and use XEP
or
2. use just your XML and VisualXSL, which will help you create your PDF for print.
In addition, if you create your own XSL stylesheet, XEP can produce many types of PDF, for example PDF/A-1a or other conformance levels.
Both XEP and VisualXSL are RenderX products (http://www.renderx.com/tools/index.html) and they have trial versions that you can use. I have used both products many times and was satisfied.
You can also visit the forum, where you can find answers about how to use the products described above and how useful they are. http://cooltools.renderx.com
PCL is a printer control language: command bytes that you send to a (usually HP) printer, which converts them to ink on a page. This is normally not the way to generate a PDF, since too much information from the original will be lost.
You will normally want to convert your XML to something that describes the actual printed output you want. A reasonable choice for this is the XSL-FO XML dialect, which, however, is not very pleasant to write by hand. Alternatively, you can convert your XML into DocBook XML, which in turn has very nice style sheets for converting further to XSL-FO and other formats.
You can then use Apache FOP to convert XSL-FO into many formats, one of them being PDF. This also allows you, if FOP becomes too limited, to replace it with one of several commercial XSL-FO rendering engines at a later date.
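The first step of that pipeline, turning your own XML into XSL-FO, can be sketched with the XSLT support built into the JDK. The input element name (note), the class name, and the tiny stylesheet are all invented for illustration; the rendering step itself (FOP or another engine) is not shown.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class XmlToFo {
    // Hypothetical stylesheet: wraps the text of <note> in one A4 page.
    private static final String XSLT =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
      + " xmlns:fo='http://www.w3.org/1999/XSL/Format'>"
      + "<xsl:template match='/note'>"
      + "<fo:root><fo:layout-master-set>"
      + "<fo:simple-page-master master-name='A4' page-height='297mm'"
      + " page-width='210mm' margin='20mm'><fo:region-body/>"
      + "</fo:simple-page-master></fo:layout-master-set>"
      + "<fo:page-sequence master-reference='A4'>"
      + "<fo:flow flow-name='xsl-region-body'>"
      + "<fo:block><xsl:value-of select='.'/></fo:block>"
      + "</fo:flow></fo:page-sequence></fo:root>"
      + "</xsl:template></xsl:stylesheet>";

    // Applies the stylesheet to the input XML and returns the XSL-FO text.
    static String toFo(String xml) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(XSLT)));
        StringWriter fo = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)),
                              new StreamResult(fo));
        return fo.toString();
    }

    public static void main(String[] args) throws TransformerException {
        System.out.println(toFo("<note>Hello, PDF</note>"));
    }
}
```

The resulting XSL-FO document is what a formatter such as Apache FOP or XEP would take as input to produce the PDF.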