How to extract graphics which is not an image - java

The first page of this PDF displays the following white decorated text on top of an image.
With the PDFBox utility PrintImageLocations, this graphic is not extracted as an image; only the background image is extracted, without the white decorated text. When the PDF is converted to a Word document, the decorated text is extracted as a shape with editable properties such as fill color, border color, and more.
Is it possible to extract that shape from the PDF, using PDFBox? How?

The simplest way to extract such graphics is to convert whatever can be converted into Scalable Vector Graphics (SVG). In the result shown here I had to change the colour from white to magenta, otherwise it would look like a snowscape.
I don't use PDFBox, so I can't say how easy that would be there. I simply exported page 1 as SVG using
MuPDF\mutool.exe convert -o page1.svg -O no-reuse-images Xcel_Energy-AR2018.pdf 1
However, you will get the whole page as SVG, including the lower text. Note the extra header text in the top left corner and the page number in the lower left corner, which were not visible behind the pixel graphics.
Note that everything is converted to SVG objects, including conventional text and image pixels; there is no easier way to extract all the PostScript-printer-style moveto and lineto operations. So yes, it is overkill, as the output needs parsing to get just the object of interest (more easily done in a GUI such as Inkscape, or in InDesign where it was constructed). It is not a good methodology for shape recognition, since the x and y values are described as rectangles and will have positions and scale factors that most likely vary from page to page; thus there are no constants other than the filled appearance. The filled object would best be "seen" by rendering it to pixels for visual symbol recognition (much like OCR).
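The parsing step mentioned above could look roughly like the following sketch, which uses only the JDK's XML parser. It assumes the fill colour sits in a fill attribute directly on each path element; the actual mutool output may put it elsewhere (for example on a parent group), so treat the class name SvgPathFilter and the colour value as illustrative only.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: collects the path data of all SVG paths with a given fill.
class SvgPathFilter {

    /** Returns the "d" attribute of every path element whose fill attribute equals the given colour. */
    static List<String> pathsWithFill(String svg, String fill) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Do not fetch the SVG DTD if the file declares a DOCTYPE.
            factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(svg)));

            List<String> result = new ArrayList<>();
            NodeList paths = doc.getElementsByTagName("path");
            for (int i = 0; i < paths.getLength(); i++) {
                Element path = (Element) paths.item(i);
                // Assumption: the colour is written as a plain fill attribute, e.g. fill="#ffffff".
                if (fill.equalsIgnoreCase(path.getAttribute("fill"))) {
                    result.add(path.getAttribute("d"));
                }
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse SVG", e);
        }
    }

    public static void main(String[] args) {
        // Tiny inline SVG standing in for the exported page1.svg.
        String svg = "<svg xmlns=\"http://www.w3.org/2000/svg\">"
                + "<path d=\"M0 0L10 0L10 10Z\" fill=\"#ffffff\"/>"
                + "<path d=\"M5 5L6 6\" fill=\"#000000\"/>"
                + "</svg>";
        System.out.println(pathsWithFill(svg, "#ffffff"));
    }
}
```

This only narrows the candidates by colour; picking the one shape of interest among all white paths still needs a position or size criterion.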

Related

Again having invisible text coming from PdfTextStripper

File example: file.
Problem: when extracting text using PdfTextStripper, the tokens "9/1/2017" and "387986" appear after "ASSETS" at the start of the page and should be removed, along with some other hidden tokens.
I have already applied this solution (I do not copy-paste it here because the problem is exactly the same), and that hidden text still appears on the page. Could it be hidden by something else except clip path?
thanks!
Could it be hidden by something else except clip path?
Yes. In case of your new document the text is written in white on white, e.g. the 387986 after ASSETS is drawn like this:
1 1 1 rg
/TT0 16 Tf
-1011.938 115.993 Td
(#A,BAC)Tj
The initial 1 1 1 rg sets the fill color to RGB WHITE. (Additionally that text is quite tiny but would still be visible if drawn in e.g. BLACK.)
The solution you refer to was implemented for documents like the sample document presented in that issue in which the invisible text is made invisible by defining clip paths (outside the bounds of which the text is) and by filling paths (hiding the text underneath). Thus, your white text won't be recognized by it as hidden.
Unfortunately, invisibility of WHITE on WHITE text is more difficult to determine than that of clipped or covered text: it is not enough to know a property of the current graphics state (like the clip path) or to remove all text inside a given path; one also needs to know the color of that part of the page right before the text is drawn (to check the on WHITE detail).
If, on the other hand, you assume the page background to be essentially WHITE, it is fairly simple to ignore all white text: Simply also detect the current fill color in processTextPosition:
PDColor fillColor = gs.getNonStrokingColor();
and compare it to the flavors of WHITE you want to consider invisible. (Usually it should suffice to compare with RGB, CMYK, and Grayscale WHITE; in rare cases you'll also have to correctly interpret more complex color spaces. Additionally, you might also consider nearly WHITE colors invisible: (.99, .99, .99) RGB can hardly be distinguished from WHITE.)
If you find the current color to be WHITE, ignore the current TextPosition.
Be aware, though, just like the solution you referenced this is not yet the final solution recognizing all WHITE text: For that you'll also have to check the text rendering mode: If it is just filling (the default), the above holds, but if it is (also) stroking, you'll (also) have to consider the stroking color; if it is rendered invisible, there is no color to consider; and if the text rendering mode includes adding to path for clipping, you'll have to wait and determine what will be later drawn in this part of the page as long as the clip path holds, definitely not trivial!
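As a rough illustration of the color comparison described above, here is a minimal, PDFBox-free sketch of the "is this (nearly) WHITE?" test. It assumes you pass in the component array of the fill color (in PDFBox, PDColor.getComponents() returns such an array) and that the number of components identifies the color space: 1 for gray, 3 for RGB, 4 for CMYK. That is a simplification; a Lab color, for instance, also has three components. The class name WhiteTextCheck and the tolerance parameter are illustrative, not part of PDFBox.

```java
// Hypothetical helper: decides whether a fill color counts as "white enough" to ignore.
class WhiteTextCheck {

    /**
     * @param components color components in the range 0..1
     * @param tolerance  how far from pure white a color may be, e.g. 0.02f
     */
    static boolean isNearWhite(float[] components, float tolerance) {
        switch (components.length) {
            case 1: // assumed grayscale: 1 means white
                return components[0] >= 1f - tolerance;
            case 3: // assumed RGB: all three channels at full intensity
                return components[0] >= 1f - tolerance
                        && components[1] >= 1f - tolerance
                        && components[2] >= 1f - tolerance;
            case 4: // assumed CMYK: no ink at all means white
                return components[0] <= tolerance
                        && components[1] <= tolerance
                        && components[2] <= tolerance
                        && components[3] <= tolerance;
            default: // pattern or other color spaces are not handled by this sketch
                return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isNearWhite(new float[] {1f, 1f, 1f}, 0.02f));       // RGB white
        System.out.println(isNearWhite(new float[] {0f, 0f, 0f, 0f}, 0.02f));   // CMYK white
        System.out.println(isNearWhite(new float[] {0.2f, 0.2f, 0.2f}, 0.02f)); // dark gray in RGB
    }
}
```

Inside processTextPosition, a TextPosition would then be skipped whenever this check returns true for the current fill color, subject to the text rendering mode caveats already mentioned.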

Need a way to get the Text in the Image in Black/white

I have a colored image containing text that I need to convert to black and white.
Specifically, the text should be black and the background white, whatever colors they originally were.
I am using Java to perform this operation.
Can anybody help me out with a code snippet or point me to a useful discussion?
If you want to develop your own RGB to black/white converter, please check Java - get pixel array from image. There you import an image, access its pixels, and use a specific color threshold to decide whether each pixel becomes black or white.
Or you can use http://www.imagemagick.org/ as a command-line tool (with API http://www.imagemagick.org/script/api.php).
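A minimal sketch of such a threshold converter, using only java.awt.image.BufferedImage. The class name Binarizer and the use of a plain average as brightness are assumptions for illustration; it also assumes the text is darker than the background, so light text on a dark background would need the comparison inverted.

```java
import java.awt.image.BufferedImage;

// Hypothetical helper: turns every pixel into pure black or pure white.
class Binarizer {

    /** Pixels whose average brightness is below the threshold (0..255) become black, all others white. */
    static BufferedImage toBlackAndWhite(BufferedImage src, int threshold) {
        BufferedImage out = new BufferedImage(src.getWidth(), src.getHeight(), BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < src.getHeight(); y++) {
            for (int x = 0; x < src.getWidth(); x++) {
                int rgb = src.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF;
                int g = (rgb >> 8) & 0xFF;
                int b = rgb & 0xFF;
                int brightness = (r + g + b) / 3; // simple average, not perceptually weighted
                out.setRGB(x, y, brightness < threshold ? 0x000000 : 0xFFFFFF);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        BufferedImage src = new BufferedImage(2, 1, BufferedImage.TYPE_INT_RGB);
        src.setRGB(0, 0, 0x202060); // dark blue "text" pixel
        src.setRGB(1, 0, 0xF0F0A0); // pale yellow "background" pixel
        BufferedImage bw = toBlackAndWhite(src, 128);
        System.out.printf("%06X %06X%n", bw.getRGB(0, 0) & 0xFFFFFF, bw.getRGB(1, 0) & 0xFFFFFF);
    }
}
```

A fixed threshold of 128 is only a starting point; images with uneven lighting or low contrast usually need a threshold derived from the image itself.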
Regards

Jpeg to Svg and Image Tracing

Currently we have a requirement where we have an image depicting the blueprint of the mall (red specifies the booked up areas and white specifies the available areas) and the image is available in a raster (JPEG) format.
We would like to drag and drop some icons onto the available areas of the image (in white). The image should also support zooming in and out.
Since the JPEG has a lossy scaling, zooming after a certain limit can result in a jagged image. One proposed solution is to convert the image to SVG (Scalable Vector graphics).
Going with the expanded form of SVG, it simply tells us that image is:
s=>scalable (i.e. you can zoom to any level without compromising the quality)
v=>vectorized (i.e co-ordinates are available)
So by simply looking at the XML of the image, we can decide whether to allow dropping an object based on fill=red or fill=white, where red and white are the two colors in the image. This might not be an appropriate solution, but I'm just guessing it this way.
Now the problems I see with this approach are:
Converting the image with an open source tool (Inkscape) - if we trace it with Inkscape, which uses potrace internally to trace the image, it can handle only black and white colors.
Note: most of the tools come with some license.
The problem with Inkscape is that it embeds the image into the SVG and does not create the co-ordinates. If you trace it with Inkscape, it only creates the outline of the image.
Converting it with some online utility - not recommended in our case, and doing so results in a very large SVG image. For a 700 KB file, the SVG generated is about 39 MB, which crashes the browser when opened.
Most of the time when the image is converted to an SVG, it becomes way too large, which is a big factor to worry about. There are utilities like gzip to compress files, but this is a two-step route: first you convert, then you compress.
Using Delineate (which employs the potrace and AutoTrace engines) - the quality of the image produced is not good.
Using Java code - again the quality suffers. Java graphics are not fully developed to handle the conversion (the size is again way too large).
Converting the image to PDF, then to SVG - this also embeds the image into the SVG file, which is useless as no co-ordinates are available.
Does anybody have any idea how to deal with this situation? Can we handle the drag and drop on raster (JPEG, PNG, etc.) images themselves?
Thanks
Dishant Anand

Create image from text with unknown number of lines

I would like to convert a string of text into an image. The issue is, I want the text to wrap if it is wider than the length of the image, and the height of the image to be dynamically sized to perfectly fit the text, so that I know how much space the text takes up.
I'm working in Java and there are several things I have tried:
Rendering HTML in a JPanel and saving it as a BufferedImage. The problem here was that most of the CSS I used was ignored by the JPanel and the image was unusable.
Using ImageMagick and im4java. This solution had two big failures. The first was that I needed the command-line tool installed, which I can't do on our server. The second was that I couldn't easily convert the image to a BufferedImage for use in the rest of the app.
Does anyone know a way to do this in Java?
Thanks!
In this example, an arbitrary panel is rendered into a BufferedImage and displayed in an adjacent panel at half-scale. The example uses a grid of labels, but you can use the wrap feature of JTextArea or the geometry supplied by TextLayout, examined here.
You might use a label containing HTML for the line-wrap, as shown here.
To get an image of that, see LabelRenderTest.
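For the TextLayout route mentioned above, a minimal sketch without any Swing components could look like this. It wraps a single paragraph to a fixed pixel width with LineBreakMeasurer, sums the line heights, and only then allocates the image. The class name WrappedTextImage is made up for illustration, and the sketch assumes non-empty text with no embedded newlines.

```java
import java.awt.Color;
import java.awt.Font;
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.font.FontRenderContext;
import java.awt.font.LineBreakMeasurer;
import java.awt.font.TextAttribute;
import java.awt.font.TextLayout;
import java.awt.image.BufferedImage;
import java.text.AttributedString;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: renders wrapped text into an image whose height fits the text.
class WrappedTextImage {

    static BufferedImage render(String text, Font font, int width) {
        // AttributedString rejects attributes on empty text, so text must be non-empty.
        AttributedString attributed = new AttributedString(text);
        attributed.addAttribute(TextAttribute.FONT, font);
        FontRenderContext frc = new FontRenderContext(null, true, true);
        LineBreakMeasurer measurer = new LineBreakMeasurer(attributed.getIterator(), frc);

        // First pass: break into lines and add up their heights.
        List<TextLayout> lines = new ArrayList<>();
        float height = 0;
        while (measurer.getPosition() < text.length()) {
            TextLayout line = measurer.nextLayout(width);
            lines.add(line);
            height += line.getAscent() + line.getDescent() + line.getLeading();
        }

        // Second pass: draw the lines onto an image of exactly that height.
        BufferedImage image = new BufferedImage(width, (int) Math.ceil(height), BufferedImage.TYPE_INT_RGB);
        Graphics2D g = image.createGraphics();
        g.setRenderingHint(RenderingHints.KEY_TEXT_ANTIALIASING, RenderingHints.VALUE_TEXT_ANTIALIAS_ON);
        g.setColor(Color.WHITE);
        g.fillRect(0, 0, image.getWidth(), image.getHeight());
        g.setColor(Color.BLACK);
        float y = 0;
        for (TextLayout line : lines) {
            y += line.getAscent(); // move down to this line's baseline
            line.draw(g, 0, y);
            y += line.getDescent() + line.getLeading();
        }
        g.dispose();
        return image;
    }

    public static void main(String[] args) {
        Font font = new Font(Font.SANS_SERIF, Font.PLAIN, 16);
        BufferedImage image = render("A fairly long sentence that will not fit on a single line of the image.", font, 150);
        System.out.println(image.getWidth() + " x " + image.getHeight());
    }
}
```

Because the height is computed before the image is created, the returned image's getHeight() tells you how much vertical space the text takes up. Text with several paragraphs would need to be split on newlines and measured paragraph by paragraph.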

Java: Taking an Image and turning it into Black and White with RGB

So I am doing a group project for my programming class. We are making a photo editing program, and one of my parts of the program is taking the image and turning it into black and white using RGB. I was wondering what would be the best value or way in RGB to achieve black and white?
I would recommend letting the Java 2D library worry about the conversion:
create a greyscale BufferedImage (BufferedImage.TYPE_BYTE_GRAY);
get a graphics context by createGraphics()
ensure that colour rendering is accurate on that graphics context: call setRenderingHint(RenderingHints.KEY_COLOR_RENDERING, RenderingHints.VALUE_COLOR_RENDER_QUALITY)
draw the colour image you want to "convert" to the graphics context
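The four steps above can be sketched as follows; the class and method names are illustrative only.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;

// Hypothetical helper: lets Java 2D do the colour-to-grey conversion.
class GrayscaleByDrawing {

    static BufferedImage toGray(BufferedImage colour) {
        // 1. Create a greyscale image of the same size.
        BufferedImage gray = new BufferedImage(colour.getWidth(), colour.getHeight(), BufferedImage.TYPE_BYTE_GRAY);
        // 2. Get a graphics context for it.
        Graphics2D g = gray.createGraphics();
        // 3. Ask for quality colour rendering.
        g.setRenderingHint(RenderingHints.KEY_COLOR_RENDERING, RenderingHints.VALUE_COLOR_RENDER_QUALITY);
        // 4. Draw the colour image; the conversion happens while drawing.
        g.drawImage(colour, 0, 0, null);
        g.dispose();
        return gray;
    }

    public static void main(String[] args) {
        BufferedImage colour = new BufferedImage(2, 1, BufferedImage.TYPE_INT_RGB);
        colour.setRGB(0, 0, 0x000000);
        colour.setRGB(1, 0, 0xFFFFFF);
        BufferedImage gray = toGray(colour);
        System.out.println(gray.getRaster().getSample(0, 0, 0) + " " + gray.getRaster().getSample(1, 0, 0));
    }
}
```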
If you do the conversion "manually" and you want to do it as accurately as possible, then you need to take into account that the eye is more sensitive to certain colour components than others. (If you want a "rough and ready" conversion, you can average the colour components, but this isn't strictly speaking the most accurate conversion.)
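For the manual route, one common choice of weights that reflects the eye's differing sensitivity is the Rec. 601 luma formula, Y = 0.299 R + 0.587 G + 0.114 B. The sketch below applies it to a single packed RGB value; the choice of these particular weights is an assumption on my part, and other standards use slightly different ones. The class name Luma is illustrative.

```java
// Hypothetical helper: weighted grey conversion of one packed 0xRRGGBB pixel.
class Luma {

    /** Returns a grey 0xRRGGBB value (alpha is dropped) using Rec. 601 weights. */
    static int toGray(int rgb) {
        int r = (rgb >> 16) & 0xFF;
        int g = (rgb >> 8) & 0xFF;
        int b = rgb & 0xFF;
        // Green contributes most and blue least, matching the eye's sensitivity.
        int y = (int) Math.round(0.299 * r + 0.587 * g + 0.114 * b);
        return (y << 16) | (y << 8) | y;
    }

    public static void main(String[] args) {
        System.out.printf("%06X%n", toGray(0xFF0000)); // pure red
        System.out.printf("%06X%n", toGray(0x00FF00)); // pure green
        System.out.printf("%06X%n", toGray(0x0000FF)); // pure blue
    }
}
```

With these weights pure green comes out much lighter than pure red or pure blue, which a plain average of the three components would not show.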
For each pixel, you can convert the RGB to HSB (using Color.RGBtoHSB), set the saturation to 0, and convert back to a Color instance using Color.getHSBColor.
Wikipedia actually has a good piece about how to perform the transformation once you have the RGB colors.
Hey, this page has many Java image filters which are freely available for download. All filters are standard Java BufferedImageOps and can be plugged directly into existing programs. The GrayscaleFilter can convert the image to black and white.
Using java.awt.image.ColorConvertOp with a gray destination ColorSpace is very efficient. There's an excellent example here.
