Algorithm for estimating text width based on contents - java

This is a long shot, but does anyone know of an algorithm for estimating and categorising text width (for a variable width font) based on its contents?
For example, I'd like to know that iiiiiiii is not as wide as abcdefgh, which in turn is not as wide as WWWWWWWW, even though all three strings are eight characters in length.
This is actually an attempt to build some smarts into a string truncation method, which at the moment is correctly truncating a visually wide string, but is also unnecessarily truncating a visually narrow string, because both strings contain the same number of characters. It's probably sufficient for the algorithm to categorise the input string as narrow, normal or wide and then truncate as appropriate.
This question isn't really language-specific, but if there is an algorithm then I'll implement it in Java. This is for a web application. I'm aware that there are answers on SO that deal with this problem using JavaScript to obtain the width of a containing div element, but I wondered if a server-side solution is possible.

Most GUI frameworks provide some way to calculate text metrics for fonts on given output devices.
Using java.awt.FontMetrics, for example, I believe you can do this:
import java.awt.Font;
import java.awt.FontMetrics;
import java.awt.Graphics;

public int measureText(Graphics g, String text) {
    g.setFont(new Font("TimesRoman", Font.PLAIN, 12));
    FontMetrics metrics = g.getFontMetrics();
    return metrics.stringWidth(text);
}
Not tested, but you get the idea.
Under .NET you can use the Graphics.MeasureString method. In C#:
private void MeasureStringMin(PaintEventArgs e)
{
    // Set up string.
    string measureString = "Measure String";
    Font stringFont = new Font("Arial", 16);
    // Measure string.
    SizeF stringSize = new SizeF();
    stringSize = e.Graphics.MeasureString(measureString, stringFont);
    // Draw rectangle representing size of string.
    e.Graphics.DrawRectangle(new Pen(Color.Red, 1), 0.0F, 0.0F, stringSize.Width, stringSize.Height);
    // Draw string to screen.
    e.Graphics.DrawString(measureString, stringFont, Brushes.Black, new PointF(0, 0));
}

This worked for me:
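// Needs java.awt.Font, java.awt.font.FontRenderContext and java.awt.geom.AffineTransform.
AffineTransform af = new AffineTransform();
FontRenderContext fr = new FontRenderContext(af, true, true); // anti-aliased, fractional metrics
Font f = new Font("Arial", Font.PLAIN, 10); // use the exact font
double width = f.getStringBounds("my string", fr).getWidth();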

For a web application, you cannot really get a proper estimate. Different fonts have different widths, so the result depends not only on the client (browser) and its zoom and DPI settings, but also on the fonts present on that machine (and operating system) and their substitutions.
If you need exact measurements, create a graphic (bitmap, SVG, or even a PDF) that is laid out and rendered on the server rather than on the client.

There is no reliable server-side solution for calculating the width of text (other than creating an image of the text, and probably SVG).
If you try out a tool like browser-shots and run it against relatively basic pages, you'll immediately see why. It's hard to predict how wide even the most mundane examples will turn out, much less if a user decides to zoom in the browser, etc.
It's not stated precisely why you want to truncate the string (knowing that might help in suggesting solutions), but a common reason is that you want to cut off the text at some point and show an ellipsis.
This can be done reliably on many browser platforms by using a CSS property, and NO JavaScript:
http://www.jide.fr/emulate-text-overflowellipsis-in-firefox-with-css

You really have no way of knowing what browser, font settings, screen size etc the client is using. (OK, sometimes the request headers provide an indication, but really nothing consistent or reliable.)
So, I would:
display some sample text in Internet Explorer with default settings on a screen/window size of 1024x768 (this is generally the most common size)
take an average characters/line with this configuration and use that for your estimate.
I've generally found this "good enough", for example, for estimating the number of lines that some text will take up on a browser, in order to estimate how many adverts to show next to the text.
If it were really crucial to you, then I can imagine a convoluted scheme whereby you initially send some JavaScript to the client, which takes a measurement and sends it back to your server, and you then use this information for future pages for that client. I can't imagine it's usually worth the effort, though.

I think you should choose one of these solutions:
Exact solution: Sum up the widths of all characters in the string (most APIs will give you this information).
Quick estimate: Take either the maximum or the minimum character width and multiply it by the number of characters.
Of course some mixed estimate is possible, but I think this is too much effort in the wrong direction.
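As a rough illustration of such a mixed estimate, here is a minimal sketch that needs no font API at all. The character groups and the weights (0.5, 1.0, 1.2, 1.5) are made-up assumptions and not measurements of any real font; the class and method names are hypothetical too. It only shows the shape of the narrow/normal/wide idea from the question.

```java
// Sketch only: weights are guesses, expressed in units of an "average" character.
final class WidthEstimator {
    enum WidthClass { NARROW, NORMAL, WIDE }

    // Assumed character groups; tune them against the font you actually use.
    private static final String NARROW_CHARS = "iljtfI.,:;'!|1 ";
    private static final String WIDE_CHARS = "mwMWOQ@%";

    /** Relative width of one character (assumed values). */
    static double charWeight(char c) {
        if (NARROW_CHARS.indexOf(c) >= 0) return 0.5;
        if (WIDE_CHARS.indexOf(c) >= 0) return 1.5;
        if (Character.isUpperCase(c)) return 1.2;
        return 1.0;
    }

    /** Estimated width of the whole string, as the sum of its character weights. */
    static double estimate(String s) {
        double total = 0;
        for (int i = 0; i < s.length(); i++) {
            total += charWeight(s.charAt(i));
        }
        return total;
    }

    /** Categorises a string by its average character weight. */
    static WidthClass classify(String s) {
        if (s.isEmpty()) return WidthClass.NORMAL;
        double average = estimate(s) / s.length();
        if (average < 0.8) return WidthClass.NARROW;
        if (average > 1.2) return WidthClass.WIDE;
        return WidthClass.NORMAL;
    }

    /** Cuts the string once its estimated width exceeds maxUnits, then appends "...". */
    static String truncate(String s, double maxUnits) {
        double total = 0;
        for (int i = 0; i < s.length(); i++) {
            total += charWeight(s.charAt(i));
            if (total > maxUnits) {
                return s.substring(0, i) + "...";
            }
        }
        return s;
    }
}
```

With these assumed weights, a budget of 6 units keeps all of "iiiiiiii" but cuts "WWWWWWWW" after four characters. The ellipsis itself is not counted against the budget here, which a real implementation would probably want to do.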

For a nice* client-side solution, you could try a hybrid CSS-and-JavaScript approach as suggested by RichieHindle's answer to my question.
Given that you don't know what font the user will see the page in (they can always override your selection, Ctrl-+ the page, etc), the "right" place to do this is on the browser... although browsers don't make it easy!
* when I say "nice", I mean "probably nice but I haven't actually tried it yet".

This is actually an attempt to build some smarts into a string truncation method [...]
Is it really worth the effort? We had this exact problem. And this was across languages. The fix was to leave it as-is. The complexity of keeping this intelligence up increases rapidly (and probably exponentially) with every language that you add support for. Hence our decision.
[...] an algorithm for estimating and categorising text width (for a variable width font) based on its contents?
Most font-libraries will give you this information. But this is pretty low-level stuff. The basic idea is to pass in a string and get back the width in points.

Related

Comparing images using color difference

I'm trying to figure out a good method for comparing two images in terms of their color. One idea I had was to take the average color of both images and subtract that amount to get a "color distance." Whichever two images have the smallest color distance would be a match. Does this seem like a viable option for identifying an image from a database of images?
Ideally I would like to use this to identify playing cards put through an image scanner.
For example if I were to scan a real version of this card onto my computer I would want to be able to compare that with all the images in my database to find the closest one.
Update:
I forgot to mention the challenges involved in my specific problem.
The scanned image of the card and the original image of the card are most likely going to be different sizes (in terms of width and height).
I need to make this as efficient as possible. I plan on using this to scan/identify hundreds of cards at a time. I figured that finding (and storing) a single average color value for each image would be far more efficient than comparing the individual pixels of each image in the database (the database has well over 10,000 images) for each scanned card that needed to be identified. The reason why I was asking about this was to see if anyone had tried to compare average color values before as a means of image recognition. I have a feeling it might not work as I envision due to issues with both color value precision and accuracy.
Update 2:
Here's an example of what I was envisioning.
Image to be identified = A
Images in database = { D1, D2 }
average color of image A = avg(A) = #8ba489
average color of images in database = { #58727a, #8ba489 }
D2 matches with image A because #8ba489 - #8ba489 is less than #8ba489 - #58727a.
Of course the test image would not be an exact match with any of those images because it would be scanned in; however, I'm trying to find the closest match.
Content based image retrieval (CBIR) can do the trick for you. There's LIRE, a java library for that. You can even first try several approaches using different color based image features with the demo. See https://code.google.com/p/lire/ for downloads & source. There's also the "Simple Application" which gets you started with indexing and search really fast.
Based on my experience, I'd recommend using either the ColorLayout feature (if the images are not rotated), the OpponentHistogram, or the AutoColorCorrelogram. The CEDD feature might also yield good results, and it's the smallest, with ~60 bytes of data per image.
If you want to check color difference like this:
http://en.wikipedia.org/wiki/Color_difference
You can use Catalano Framework,
http://code.google.com/p/catalano-framework/
It works in Java and Android.
Example using Color Difference:
float[] lab = ColorConverter.RGBtoLAB(100, 120, 150, ColorConverter.CIE2_D65);
float[] lab2 = ColorConverter.RGBtoLAB(50, 80, 140, ColorConverter.CIE2_D65);
double diff = ColorDifference.DeltaC(lab, lab2);
I think your idea is not good enough for the task.
Your method will say all images below are the same (the average color of each image is 128).
Your color averaging approach would most likely fail, as @Heejin already explained.
You can try it in a different way. Shrink all images to some arbitrary size, and then subtract the unknown image from each of the known images; the one with the smallest difference is the one you are looking for. It's a really simple method and it wouldn't be slower than the averaging.
Another option is to use some smarter algorithm:
http://www.hackerfactor.com/blog/index.php?/archives/432-Looks-Like-It.html
I have used this method in the past and the results are okay-ish. It works great for finding identical images, not so well for finding similar images.
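The shrink-and-subtract idea described above could be sketched roughly as follows, using only the standard java.awt classes. The 16x16 thumbnail size, the sum-of-absolute-differences metric and all class and method names are assumptions chosen for illustration, not something the answer prescribes.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;
import java.util.List;

// Sketch only: compares images by shrinking both to a tiny thumbnail first.
final class ThumbnailDistance {
    private static final int SIZE = 16; // arbitrary thumbnail edge length (assumption)

    /** Scales any image to SIZE x SIZE so images of different sizes become comparable. */
    static BufferedImage shrink(BufferedImage src) {
        BufferedImage small = new BufferedImage(SIZE, SIZE, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = small.createGraphics();
        g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                RenderingHints.VALUE_INTERPOLATION_BILINEAR);
        g.drawImage(src, 0, 0, SIZE, SIZE, null);
        g.dispose();
        return small;
    }

    /** Sum of absolute per-channel differences between the two thumbnails. */
    static long distance(BufferedImage a, BufferedImage b) {
        BufferedImage sa = shrink(a);
        BufferedImage sb = shrink(b);
        long sum = 0;
        for (int y = 0; y < SIZE; y++) {
            for (int x = 0; x < SIZE; x++) {
                int pa = sa.getRGB(x, y);
                int pb = sb.getRGB(x, y);
                sum += Math.abs(((pa >> 16) & 0xFF) - ((pb >> 16) & 0xFF)); // red
                sum += Math.abs(((pa >> 8) & 0xFF) - ((pb >> 8) & 0xFF));   // green
                sum += Math.abs((pa & 0xFF) - (pb & 0xFF));                 // blue
            }
        }
        return sum;
    }

    /** Index of the known image with the smallest distance to the unknown one, or -1 if none. */
    static int closest(BufferedImage unknown, List<BufferedImage> known) {
        int best = -1;
        long bestDistance = Long.MAX_VALUE;
        for (int i = 0; i < known.size(); i++) {
            long d = distance(unknown, known.get(i));
            if (d < bestDistance) {
                bestDistance = d;
                best = i;
            }
        }
        return best;
    }
}
```

For a database of 10,000 cards you would presumably shrink each known image once and store the thumbnail, rather than rescaling it for every comparison as this sketch does.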

Approximating a fitting image size

The solution I am aiming for should select the best-fitting image size from a given number of sizes.
Given a number of rather random resolutions, I would like to find an image sized as close as possible to my preferred size.
Suppose I would like to use an image sized width x height (preferredImageSize).
Example: 320x200
Suppose I have the following image sizes at my disposal (availableImageSize) width1 x height1, width2 x height2, ... (maybe up to 10 different sizes).
Examples: 474x272, 474x310, 264x150, 226x128, 640x365, 474x410, 480x276, 256x144, 160x90, 320x182, 640x365, 192x108, 240x137, 480x276
To develop a generic approach in which preferredImageSize is variable, I am trying to find a good solution that computes rather quickly but also results in something that looks good on the screen.
I define looks good on the screen as an image that is:
hardly upscaled
as close to the given aspect-ratio (preferredImageSize.width / preferredImageSize.height) as possible
may be heavily downscaled
may be cropped/stretched in very small amounts
My initial (rather trivial) approach:
Run through the available image sizes once and find the smallest width delta (abs(preferredImageSize.width - availableImageSize.width)). The image with that smallest delta is then chosen (bestFitWidth).
That is certainly a way to solve the issue but definitely does not comply with my looks good on the screen hopes.
Any hints, whether text, source or links, would be awesome. Oh, and if you think that my requirements (aka hopes) are already leading in the wrong direction, go ahead and let me know...
Edit: added cropping and stretching as options - which, I am afraid will make the issue even harder to solve. So if needed leave it out of the equation.
Simple "if/then" approach:
I would do two things:
Since you would rather not upscale, but are OK with downscaling (which I find a good choice), NEVER use a source image that is smaller than your target, unless none is available.
Since "heavy" downscaling is OK, I would try to find an image that matches the aspect ratio as closely as possible, starting with the smallest acceptable image and going to progressively larger images.
To put it together, first throw out all images from the list that are smaller than your target. Then, start with the smallest image left and check its aspect ratio against your target. If the mismatch is acceptable (which you need to quantify), use the image, otherwise go to the next bigger one. If you don't find any acceptable ones, use the one with the best match.
If you've already thrown out all images as smaller than your target, you will likely end up with a bad-looking image either way, but you should then try out whether it is worse to use an image that requires more upscaling, or whether it is worse to use an image that is a worse aspect ratio match.
One other thing you need to think about is whether you want to stretch or crop the images to match your target aspect ratio.
More complex quantitative approach:
The most flexible approach, though, would be to define yourself a "penalty" function that depends on the size mismatch and the aspect ratio mismatch and then find the source image that gives you the lowest "penalty". This is what you have currently done and you've defined your penalty function as abs(preferredImageSize.width - availableImageSize.width). You could go with something a little more complex, like for example:
width_diff = preferredImageSize.width - availableImageSize.width
height_diff = preferredImageSize.height - availableImageSize.height
// a positive diff means the source is too small and must be upscaled
if (width_diff > 0) width_penalty = upscale_penalty * width_diff
else width_penalty = downscale_penalty * abs(width_diff)
if (height_diff > 0) height_penalty = upscale_penalty * height_diff
else height_penalty = downscale_penalty * abs(height_diff)
// use floating-point division so the ratios are not truncated
aspect_penalty = abs((preferredImageSize.width / preferredImageSize.height) -
    (availableImageSize.width / availableImageSize.height)) * stretch_penalty
total_penalty = width_penalty + height_penalty + aspect_penalty
Now you can play with the 3 numbers upscale_penalty, downscale_penalty, and stretch_penalty to give these three quality reducing operations different importance. Just try a couple of combinations and see which works best.
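The penalty idea above might look like this in Java. This is a sketch under assumptions: the class and method names are hypothetical, java.awt.Dimension is used merely as a convenient width/height holder, and absolute values are taken so that downscaling and aspect mismatch can never produce a negative penalty.

```java
import java.awt.Dimension;
import java.util.List;

// Sketch only: picks the available size with the lowest weighted penalty.
final class ImageSizePicker {

    /** Weighted penalty for using 'available' when 'preferred' is wanted. */
    static double penalty(Dimension preferred, Dimension available,
                          double upscalePenalty, double downscalePenalty,
                          double stretchPenalty) {
        int widthDiff = preferred.width - available.width;
        int heightDiff = preferred.height - available.height;

        // A positive diff means the source is too small and must be upscaled.
        double widthPenalty = widthDiff > 0
                ? upscalePenalty * widthDiff
                : downscalePenalty * -widthDiff;
        double heightPenalty = heightDiff > 0
                ? upscalePenalty * heightDiff
                : downscalePenalty * -heightDiff;

        // Floating-point division, so the aspect ratios are not truncated.
        double preferredRatio = (double) preferred.width / preferred.height;
        double availableRatio = (double) available.width / available.height;
        double aspectPenalty = stretchPenalty * Math.abs(preferredRatio - availableRatio);

        return widthPenalty + heightPenalty + aspectPenalty;
    }

    /** Returns the available size with the lowest total penalty. */
    static Dimension best(Dimension preferred, List<Dimension> available,
                          double upscalePenalty, double downscalePenalty,
                          double stretchPenalty) {
        if (available.isEmpty()) {
            throw new IllegalArgumentException("no image sizes available");
        }
        Dimension best = null;
        double bestPenalty = Double.POSITIVE_INFINITY;
        for (Dimension candidate : available) {
            double p = penalty(preferred, candidate,
                    upscalePenalty, downscalePenalty, stretchPenalty);
            if (p < bestPenalty) {
                bestPenalty = p;
                best = candidate;
            }
        }
        return best;
    }
}
```

With, say, an upscale weight of 10 and a downscale weight of 1, a slightly larger source wins over a slightly smaller one, which matches the "hardly upscaled, may be heavily downscaled" requirement. The weights themselves are something to tune by eye.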

How to find true visible size of a text string in Java

Is it possible to find the true bounding box of a string in Java? ie the smallest rectangle which includes the pixels which actually get painted?
I have looked at FontMetrics and LineMetrics, and although they allow a string to be passed in, they don't appear to take account of the characters themselves, e.g. "a", "p" and "P" all return the same height.
Specifically, they seem to include the descent in the string height even if the actual character does not descend below the baseline. Are there other metrics I can access which return a true bounding box?
Alternatively, is there any way to tell if a particular character has a descender?
See this tutorial on measuring text, which is heavily focused on FontMetrics.
For more advanced measurements (to get the bounding box of a particular string), TextLayout is your friend, as explained here.
In addition to that tutorial on TextLayout, the javadoc contains examples of its use.
You can use javax.swing.SwingUtilities.layoutCompoundLabel. Do not be deterred by the many parameters. There are two versions; the one that takes a JComponent (which may be null) takes more settings into account. It is used for JLabel, so it is quite versatile, and it yields a Rectangle.
BTW, a descender may still be added to the bounds here, even for "a". You could take the GlyphVector and calculate a bounding box from that, but what about when font hinting is on, so that the pixel positions are slightly off and the error might accumulate over several characters?
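To make the GlyphVector suggestion concrete, here is a small sketch that asks for the visual (ink) bounds of a string and uses them to guess whether a character has a descender. The class name, the choice of "x" as a baseline reference and the 5% tolerance are assumptions for illustration; results depend on the font that is actually installed, and hinting effects are ignored.

```java
import java.awt.Font;
import java.awt.font.FontRenderContext;
import java.awt.font.GlyphVector;
import java.awt.geom.Rectangle2D;

// Sketch only: measures the painted outline rather than the font-wide ascent/descent.
final class InkBounds {
    // A null transform means identity; anti-aliasing and fractional metrics enabled.
    private static final FontRenderContext FRC = new FontRenderContext(null, true, true);

    /** Bounds of the glyph outlines, relative to the baseline origin (y grows downwards). */
    static Rectangle2D visualBounds(Font font, String text) {
        GlyphVector glyphs = font.createGlyphVector(FRC, text);
        return glyphs.getVisualBounds();
    }

    /** How far the ink extends below the baseline, or 0 if it does not. */
    static double depthBelowBaseline(Font font, String text) {
        return Math.max(0.0, visualBounds(font, text).getMaxY());
    }

    /**
     * Heuristic: a character "has a descender" if it reaches clearly lower than "x".
     * Round letters such as "a" or "o" overshoot the baseline slightly, hence the tolerance.
     */
    static boolean hasDescender(Font font, char c) {
        double reference = depthBelowBaseline(font, "x");
        double tolerance = 0.05 * font.getSize2D(); // assumed threshold
        return depthBelowBaseline(font, String.valueOf(c)) > reference + tolerance;
    }
}
```

With a typical Latin font, visualBounds for "a" should come out shorter than for "P", and hasDescender should report true for 'p' and false for 'a', unlike the FontMetrics height, which is the same for all three.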

Resize Image files

I have a lot of images taken with my digital camera at a very high resolution (3000 * 4000), and they take up a lot of hard disk space. I used Photoshop to open each image and resize it to a smaller resolution, but that takes a lot of time and effort.
I think I could write a simple program that opens the folder of images, reads each file, gets its width and height and, if they are very high, shrinks the image and overwrites the original with the smaller one.
Here is some code I use in a Java EE project (it should work in a normal application too):
int rw = the width I needed;
BufferedImage image = ImageIO.read(new File(filename));
ResampleOp resampleOp = new ResampleOp(rw,(rw * image.getHeight()) / image.getWidth() );
resampleOp.setFilter(ResampleFilters.getLanczos3Filter());
image = resampleOp.filter(image, null);
File tmpFile = new File(tmpName);
ImageIO.write(image, "jpg", tmpFile);
The resample filter comes from the java-image-scaling library. It also contains BSpline and Bicubic filters, among others, if you don't like Lanczos3. If the images are not in the sRGB color space, Java silently converts them to sRGB (which coincidentally was what I needed).
Also, Java loses all EXIF data, though it does provide some (very hard to use) methods to retrieve it. For color-correct rendering you may wish to at least add an sRGB flag to the file. For that see here.
+1 to what some of the other folks said about not specifically needing Java for this, but I imagine you must have known this and were maybe asking because you either wanted to write such a utility or thought it would be fun?
Either way, getting the image file listing from a directory is straightforward; resizing the images correctly can take a bit more legwork, as you'll notice from Googling for best practices and seeing about 9 different ways to actually resize the files.
I wrote imgscalr to address this exact issue; it's a dead-simple API (single class, bunch of static methods) and has some good adoption in webapps and other tools utilizing it.
Steps to resize would look like this (roughly):
Get file list
BufferedImage image = ImageIO.read(files[i]);
image = Scalr.resize(image, width);
ImageIO.write(image, "jpg", files[i]); // format name and target file are required; this overwrites the original
There are a multitude of "resize" methods to call on the Scalr class, and all of them honor the image's original proportions. So if you scale only using a targetWidth (say 1024 pixels) the height will be calculated for you to make sure the image still looks exactly right.
If you scale with width and height, but they would violate the proportions of the image and make it look "Stretched", then based on the orientation of the image (portrait or landscape) one dimension will be used as the anchor and the other incorrect dimension will be recalculated for you transparently.
There are also a multitude of different Quality settings and FIT-TO scaling modes you can use, but the library was designed to "do the right thing" always, so using it is very easy.
You can dig through the source, it is all Apache 2 licensed. You can see that it implements the Java2D team's best-practices for scaling images in Java and pedantically cleans up after itself so no memory gets leaked.
Hope that helps.
You do not need Java to do this. It's a waste of time and resources. If you have Photoshop, you can do it by recording actions: batch resize using actions
AffineTransformOp offers the additional flexibility of choosing the interpolation type, as shown here.
You can individually or batch resize with our desktop image resizing application called Sizester. There's a full functioning 15-day free trial on our site (www.sizester.com).

Java 2D Image resize ignoring bicubic/bilinear interpolation rendering hints (OS X + linux)

I'm trying to create thumbnails for uploaded images in a JRuby/Rails app using the Image Voodoo plugin - the problem is the resized thumbnails look like... ass.
It seems that the code to generate the thumbnails is absolutely doing everything correctly to set the interpolation rendering hint to "bicubic", but it isn't honoring them on our dev environment (OS X), or on the production web server (Linux).
I've extracted out the code to generate the thumbnails, rewritten it as a straight Java app (ie kicked off from a main() method) with the interpolation rendering hint explicitly set to "bicubic", and have reproduced the (lack of) bicubic and bilinear resizing.
As expected, on both OS X and Linux the thumbnails are ugly and pixelated, but on Windows it resizes the images nicely with bicubic interpolation used.
Is there any JVM environment setting and/or additional libraries that I'm missing to make it work? I'm doing a lot of banging of head against wall for this one.
I realize this question was asked a while ago, but in case anyone else is still running into this:
The thumbnails look like ass for two reasons (primarily the first one):
Non-incremental image scaling in Java is very rough, throws a lot of pixel data out and averages the result once regardless of the rendering hint.
Processing a poorly supported BufferedImage type in Java2D (typically GIFs) can result in very poor looking/dithered results.
As it turns out the old AreaAveragingScaleFilter does a decent job of making good looking thumbnails, but it is slow and deprecated by the Java2D team -- unfortunately they didn't replace it with any nice out-of-the-box alternative and left us sort of on our own.
Chris Campbell (from the Java2D team) addressed this a few years ago with the concept of incremental scaling -- instead of going from your starting resolution to the target resolution in one operation, you do it in steps, and the result looks much better.
Given that the code for this is decently large, I wrote all the best-practices up into a library called imgscalr and released it under the Apache 2 license.
The most basic usage looks like this:
BufferedImage img = ImageIO.read(...); // load image
BufferedImage scaledImg = Scalr.resize(img, 640);
In this use case the library uses what is called its "automatic" scaling mode and will fit the resulting image (honoring its proportions) within a bounding box of 640x640. So if the image is not square but a standard 4:3 image, it will resize it to 640x480; the argument is just its largest dimension.
There are a slew of other methods on the Scalr class (all static and easy to use) that allow you to control everything.
For the best looking thumbnails possible, the command would look like this:
BufferedImage img = ImageIO.read(...); // load image
BufferedImage scaledImg = Scalr.resize(img, Method.QUALITY,
150, 100, Scalr.OP_ANTIALIAS);
The Scalr.OP_ANTIALIAS is optional, but a lot of users feel that when you scale down to a small enough thumbnail in Java, some of the transitions between pixel values are a little too discrete and make the image look "sharp", so a lot of users asked for a way to soften the thumbnail a bit.
That is done through a ConvolveOp and if you have never used them before, trying to figure out the right "kernel" to use is... a pain in the ass. That OP_ANTIALIAS constant defined on the class is the best looking anti-aliasing op I found after a week of testing with another user who had deployed imgscalr into their social network in Brazil (used to scale the profile photos). I included it to make everyone's life a bit easier.
Also, on top of all these examples, you might have noticed when you scale GIFs and some other types of images (BMPs) that sometimes the scaled result looks TERRIBLE compared to the original... that is because the image is in a poorly supported BufferedImage type and Java2D falls back to using its software rendering pipeline instead of the hardware accelerated one it uses for better supported image types.
imgscalr will take care of all of that for you and keep the image in the best supported image type possible to avoid that.
Anyway, that is a REALLY long way of saying "You can use imgscalr to do all that for you and not have to worry about anything".
Maybe this is a solution for you:
public BufferedImage resizeImage(BufferedImage source, int width, int height)
{
    // Draw the source into a new image of the requested size.
    BufferedImage result = new BufferedImage(width, height, BufferedImage.TYPE_INT_ARGB);
    Graphics g = result.getGraphics();
    g.drawImage(source, 0, 0, width, height, null);
    g.dispose();
    return result;
}
In the end, upgrading to the latest version of ImageVoodoo seemed to improve quality.
Looking through the source code, it looks like they're doing some funky AWT rendering, and then pulling that out. Nasty, but it seems to work.
Still not as good as ImageMagick, but better than it was.
@Riyad, the code for incremental scaling isn't "decently large"; it's quite small, as you can see from a post back in 2007 (http://today.java.net/pub/a/today/2007/04/03/perils-of-image-getscaledinstance.html#creating-scaled-instances). Having a library that gives other options might be useful, but making a library to use a library is nonsense.
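For reference, the incremental scaling idea discussed in this thread can be sketched in a few lines. This is a minimal illustration and not the code from the linked article or from imgscalr: the class name is hypothetical, it always produces TYPE_INT_RGB (so transparency is dropped), and it assumes bilinear interpolation at every halving step.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;

// Sketch only: scales down in repeated halving steps instead of one big jump.
final class IncrementalScaler {

    /**
     * Shrinks 'src' towards targetWidth x targetHeight, halving each dimension
     * per step until the target is reached. Returns 'src' unchanged if it is
     * already no larger than the target.
     */
    static BufferedImage scaleDown(BufferedImage src, int targetWidth, int targetHeight) {
        int w = src.getWidth();
        int h = src.getHeight();
        BufferedImage current = src;

        while (w > targetWidth || h > targetHeight) {
            // Never step below the target size.
            w = Math.max(targetWidth, w / 2);
            h = Math.max(targetHeight, h / 2);

            BufferedImage step = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
            Graphics2D g = step.createGraphics();
            g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                    RenderingHints.VALUE_INTERPOLATION_BILINEAR);
            g.drawImage(current, 0, 0, w, h, null);
            g.dispose();

            current = step;
        }
        return current;
    }
}
```

Scaling a 1000x800 image to 150x120 this way goes through 500x400 and 250x200 before the final step. The caller is responsible for choosing a target that preserves the aspect ratio.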
