I have an SVG file (created by com.spire.pdf) that was generated from a PDF file; the PDF itself was created from HTML using the iText library.
I need to convert this SVG into an SVG that uses other tags (path) instead of text, or convert the SVG to PDF.
All the tags in my example look like this: <text style="fill:#000000;font-family:Arial;font-weight:bold;" font-size="10" x="36" y="46.21002" letter-spacing="-0.075">2020</text>
I found one tool that can do the conversion, Inkscape: its --export-text-to-path parameter makes the text in my SVG file unselectable.
The final goal is a PDF file from which it is impossible to select text (as if it had been created from an image).
Is there a way to convert an SVG "text" file to an SVG "image" file, or to PDF?
Inkscape is not a good fit for me, as I want to do this in Java code (a Spring Boot app).
So far, I have found only one tool that can do this: Inkscape with the --export-text-to-path parameter.
Inkscape works in a few threads and can be run from Java. Conversions are slow, and the size of the exported files depends on the Inkscape version.
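For reference, running Inkscape from Java as described above could look roughly like the sketch below. It assumes the Inkscape 1.x command line, where --export-filename selects the output file and the format follows from its extension (older 0.9x releases used different export options such as --export-pdf). The class name, method names and timeout handling are made up for illustration.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.List;
import java.util.concurrent.TimeUnit;

final class InkscapeTextToPath {

    /** Builds the argument list. Assumption: Inkscape 1.x option names. */
    static List<String> buildCommand(String inkscapeBinary, Path inputSvg, Path outputFile) {
        return List.of(
                inkscapeBinary,
                "--export-text-to-path",            // flag named in the question
                "--export-filename=" + outputFile,  // assumption: Inkscape 1.x; format taken from extension
                inputSvg.toString());
    }

    /** Runs the conversion; fails if Inkscape times out or exits with a non-zero code. */
    static void convert(String inkscapeBinary, Path inputSvg, Path outputFile, long timeoutSeconds)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(inkscapeBinary, inputSvg, outputFile))
                .redirectErrorStream(true)                        // merge stderr into stdout
                .redirectOutput(ProcessBuilder.Redirect.DISCARD)  // avoid blocking on unread output
                .start();
        if (!process.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
            process.destroyForcibly();
            throw new IOException("Inkscape timed out after " + timeoutSeconds + " s");
        }
        if (process.exitValue() != 0) {
            throw new IOException("Inkscape exited with code " + process.exitValue());
        }
    }
}
```

A call such as InkscapeTextToPath.convert("inkscape", Path.of("in.svg"), Path.of("out.pdf"), 60) would then produce a PDF whose text has been turned into outlines. This still starts one external process per file, so it does not remove the speed problem mentioned above.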
Related
I use Velocity engine to generate a pdf file and I am trying to create a signature. I use a svg file for the signature and I want it to be part of my output pdf file.
I first tried to integrate the SVG into the Velocity file (.vm), initially by putting it inside an image and then by creating a new .vm file and calling it from there, but neither option seems to work.
When I try to include the SVG in the Velocity file inside image tags, it is not recognized.
I need to parse various document formats (e.g. .docx, .pdf) and convert their content (including images) to an .xhtml file. I'm using Apache Tika 1.17 (as a Maven dependency) in a Java project.
I've analyzed several already existing questions about this (one, another), and using a custom EmbeddedDocumentExtractor, I was able to extract the included .png images alongside the generated .xhtml file.
The problem is that in both cases (.docx and .pdf input files), inside the generated .xhtml file, the images are referred to not simply by their name, instead using this kind of syntax:
<img src="embedded:image5.png" alt="image0.png" />.
So only the content of the alt attribute is displayed, not the image itself.
Could I somehow change or configure this?
Would it be possible to somehow include the images inside the .xhtml file, as binary data?
Or what other options would I have around this problem?
Thank you.
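On the "include the images as binary data" idea: one possible approach is to post-process the generated XHTML and replace each embedded: reference with a Base64 data: URI. The sketch below is plain Java and does not use any Tika API; it assumes you already collected the image bytes by name (for example in your custom EmbeddedDocumentExtractor). The class and method names are hypothetical.

```java
import java.util.Base64;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class EmbeddedImageInliner {

    // Matches src="embedded:image5.png" and captures the file name.
    private static final Pattern EMBEDDED_SRC = Pattern.compile("src=\"embedded:([^\"]+)\"");

    /** Replaces embedded: references with data: URIs for every image found in the map. */
    static String inline(String xhtml, Map<String, byte[]> imagesByName) {
        Matcher matcher = EMBEDDED_SRC.matcher(xhtml);
        StringBuilder result = new StringBuilder();
        while (matcher.find()) {
            String name = matcher.group(1);
            byte[] data = imagesByName.get(name);
            String replacement = (data == null)
                    ? matcher.group() // unknown image: leave the reference untouched
                    : "src=\"data:" + mimeType(name) + ";base64,"
                            + Base64.getEncoder().encodeToString(data) + "\"";
            matcher.appendReplacement(result, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(result);
        return result.toString();
    }

    /** Guesses a MIME type from the file extension (only a few common image types). */
    static String mimeType(String fileName) {
        String lower = fileName.toLowerCase();
        if (lower.endsWith(".png")) return "image/png";
        if (lower.endsWith(".jpg") || lower.endsWith(".jpeg")) return "image/jpeg";
        if (lower.endsWith(".gif")) return "image/gif";
        return "application/octet-stream";
    }
}
```

If the extracted .png files are written next to the .xhtml file anyway, the same regular expression could instead rewrite the src to the plain file name, which keeps the XHTML much smaller.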
Problem Statement
I have SVG markup sent from front-end JavaScript to back-end action classes. I am using JasperReports to generate a PDF that should contain the SVG image (I have the markup data only). How do I do that?
What I have tried
I have tried to embed the SVG image (using a link to the image file) in the PDF file while generating reports.
Looking for
How to embed the SVG markup so that I can see the image in the PDF, or any better approach to solve this.
In later versions of JasperReports you no longer need to add class="net.sf.jasperreports.engine.JRRenderable" to the imageExpression.
It is the default in JasperReports 6+, and Jaspersoft Studio (JSS) will remove it if you add it in the Source pane.
The Tomcat SVG file provided in the JasperSoft Community answer works nicely. My own SVG file would show properly in MS Edge or Chrome but didn't appear in JSS.
When I added width, height, viewBox and overflow attributes to the svg element, it then did appear in JSS, so try the example first before trying the SVG you actually want.
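If the markup arrives as a string, the missing attributes can also be added in code before the markup is handed to the renderer. The following is only a sketch using the JDK's built-in XML APIs: it sets width, height and viewBox on the root element when they are absent. The class name and the default values passed in are assumptions, and overflow is left out because the right value depends on the drawing.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

final class SvgSizeAttributes {

    /** Returns the markup with width, height and viewBox set on the root element if missing. */
    static String withExplicitSize(String svgMarkup, int width, int height) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(svgMarkup)));
            Element root = doc.getDocumentElement();

            // Only fill in what is missing; existing values are kept.
            if (!root.hasAttribute("width")) root.setAttribute("width", String.valueOf(width));
            if (!root.hasAttribute("height")) root.setAttribute("height", String.valueOf(height));
            if (!root.hasAttribute("viewBox")) root.setAttribute("viewBox", "0 0 " + width + " " + height);

            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not process SVG markup", e);
        }
    }
}
```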
According to their own web site (here: http://community.jaspersoft.com/wiki/how-add-svg-image-your-report-jrxml) you need to include the SVG using a special image expression in the JRXML:
<imageExpression class="net.sf.jasperreports.engine.JRRenderable">
<![CDATA[net.sf.jasperreports.renderers.BatikRenderer.getInstance(new java.io.File("D:\\tomcat.svg"))]]>
</imageExpression>
Since the question says there is no SVG file, only markup, you could stick to the answer by David van Driessche but use a different getter:
BatikRenderer.getInstanceFromText($P{svgMarkup}), for example, where svgMarkup is a parameter of type String containing the SVG markup data.
Is there any Java library for this?
How can I make PDF text searchable using a Java library?
Open source or paid.
How to apply OCR to a PDF using PDFBox?
How to make PDF text searchable programmatically using PDFBox?
I searched a lot and didn't find any solution.
Can anyone post code for OCR with PDFBox?
Try Apache PDFBox.
To extract text: Textextraction.
Any java library? How to make searchable text using any java library? Open source or Paid.
You can achieve this using Gnostice XtremeDocumentStudio for Java. For more details, follow the link below.
http://www.gnostice.com/nl_article.asp?id=289&t=How_to_convert_scanned_images_to_searchable_PDF_in_Java
FYI, in the article, we have demonstrated how to convert scanned image to searchable PDF. In fact the input can be any scanned document (images, PDF or DOCX).
Disclaimer: I work for Gnostice.
You can use PDFBox to extract images from a PDF file, and then use the OCR system of your choice (for example, Tesseract) to obtain the text. Alternatively, if the PDF is mixed text and images, you can use Ghostscript to create an image of each PDF page, and then run OCR.
If you then need a searchable PDF file, build a new PDF by writing the text first, and then drawing the image over top of the text. The text will be searchable, but you will only see the image.
Note that OCR engines like Tesseract and Google Vision will return positional information for each word, so you will be able to place the text in the correct position.
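To illustrate the positioning step: OCR engines typically report word boxes in image pixels with the origin at the top-left, while PDF text is placed in points (1/72 inch) with the origin at the bottom-left of the page. A minimal sketch of that conversion is below. It is plain Java with no PDFBox or Tesseract calls, the class and field names are hypothetical, and it assumes the page image was rendered at a known DPI and that the bottom of the word box is close enough to the text baseline.

```java
final class OcrToPdfCoordinates {

    static final double POINTS_PER_INCH = 72.0;

    /** Word box as reported by an OCR engine: pixels, origin at the top-left of the image. */
    static final class PixelBox {
        final int left, top, width, height;
        PixelBox(int left, int top, int width, int height) {
            this.left = left; this.top = top; this.width = width; this.height = height;
        }
    }

    /** Text start position in PDF user space: points, origin at the bottom-left of the page. */
    static final class PdfPoint {
        final double x, y;
        PdfPoint(double x, double y) { this.x = x; this.y = y; }
    }

    /** Maps the bottom-left corner of the pixel box to PDF coordinates (flips the y axis). */
    static PdfPoint toPdfPosition(PixelBox box, double dpi, double pageHeightPoints) {
        double x = box.left * POINTS_PER_INCH / dpi;
        double y = pageHeightPoints - (box.top + box.height) * POINTS_PER_INCH / dpi;
        return new PdfPoint(x, y);
    }

    /** Rough font size: the box height converted to points. */
    static double fontSizePoints(PixelBox box, double dpi) {
        return box.height * POINTS_PER_INCH / dpi;
    }
}
```

The resulting x, y and font size are what you would feed to the text-drawing calls of whichever PDF library writes the text layer. Words with descenders (g, p, y) will sit slightly low with this approximation, which usually does not matter for search and selection.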
I'm developing a web application using Flex and JSP.
I am having some performance issues with displaying multiple PDF files.
I am trying to display about 50-100 PDF files. I know that is a little crazy.
Hence, I set up the project to convert the PDF files to JPG format and display the JPG files.
I'm wondering if there is a way to reduce the file size of a PDF to the size of a JPG.
Additionally, I would like to hear about other ways that may improve the performance.
Does anyone know a good way to display many PDF files (that will be mostly just text) in a web application? Or should I just have it display JPG files?
If the PDF files are mostly text you should probably use HTML. Is there something that would prevent you from making regular pages from your PDFs?
You can convert the PDF to an RTF text file and use the text from the RTF file to populate your HTML page, perhaps in a table.
Check out the Ghostscript library for doing this conversion.