I am trying to parse an HTML page to find the most prominent image. After parsing the page and extracting all img tags, I am trying to find the largest image by comparing the images' dimensions.
Is it right to compare the images by calculating the area as (width * height)?
That depends entirely on your definition of 'largest'. width * height is certainly a valid approach, but it has the flaw that a 1x1000 image is 'larger' than a 30x30 one even though the latter could very well be more noticeable. It also has the problem that a large image that's mostly the same as the background color will be more 'noticeable' than a medium image that isn't, which might not be the case.
To decide how to find the 'largest' image, you first need to specify why you want it.
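If you do settle on area as the measure, the 1x1000 problem can at least be softened by discarding extremely elongated images before ranking. A minimal Java sketch, assuming the img tags have already been parsed into width/height values (the `ImgTag` record and the `MAX_ASPECT` cutoff of 5 are hypothetical choices, not part of the question):

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

public class LargestImage {
    // Hypothetical holder for the width/height attributes parsed from one <img> tag.
    record ImgTag(String src, int width, int height) {
        long area() {
            return (long) width * height;
        }

        // Longer side divided by shorter side (1.0 for a square).
        double elongation() {
            return (double) Math.max(width, height) / Math.min(width, height);
        }
    }

    // Hypothetical cutoff: ignore strips more than 5 times longer than they are wide.
    static final double MAX_ASPECT = 5.0;

    static Optional<ImgTag> largest(List<ImgTag> tags) {
        return tags.stream()
                .filter(t -> t.width() > 0 && t.height() > 0)  // skip tags without a usable size
                .filter(t -> t.elongation() <= MAX_ASPECT)    // drop 1x1000-style strips
                .max(Comparator.comparingLong(ImgTag::area)); // then rank by area
    }

    public static void main(String[] args) {
        List<ImgTag> tags = List.of(
                new ImgTag("banner.png", 1000, 1),
                new ImgTag("logo.png", 30, 30),
                new ImgTag("photo.jpg", 400, 300));
        System.out.println(largest(tags).map(ImgTag::src).orElse("none"));
    }
}
```

Tags without a width/height (represented as 0 here) are simply skipped; in practice you may have to download those images to learn their real size.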
I have two BufferedImages, image1 and image2, and I want to know whether they are equal.
image1 is made beforehand and stored in an image file, which I load using ImageIO. However, image2 is generated on the spot, so it is pretty much guaranteed that the two have different sizes. What I do know is that image2 will equal one of 9 different image1s.
So I want to check whether they are the same image while ignoring the white pixels around the edges: because the sizes differ, comparing all pixels would report a difference no matter what. If you're wondering why there is white around the edges, the images are digits, so the remaining space is white.
To keep it simple, the color of the actual digit will always be black, but I would prefer a generic solution (one that takes all colors into account) so I can reuse the concepts later.
private boolean equals(BufferedImage image1, BufferedImage image2) {
// This is what I want to fill out.
}
My first attempt was to find the first non-white pixel of image1 and the first non-white pixel of image2, and then check the rows after that to see if everything is equal. However, the images are pretty big, and this approach takes more than O(n^2). I need a faster way.
There is probably no significantly faster way using this approach. You could use edge detection, but those algorithms aren't really faster either.
I would try to work with bounding boxes for each image (number).
If it is possible to save image1 at the size of the number itself, this is the way to go: just crop the image to the actual extent of the number and save that image to disk. You can then crop image2 to its bounding box too, and the comparison becomes simple and fast.
If cropping is not an option, calculating the bounding box is. Go through the image array and detect the topmost and the leftmost non-white pixel in both images. That gives you at least the top and left edges of the bounding box, which is all you need to compare the images. (If the images can differ in size, you need the whole bounding box.)
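A minimal Java sketch of the bounding-box idea, filling in the `equals` stub from the question. It assumes "background" means pixels that are exactly white (0xFFFFFF) and that both digits were rendered identically, so the cropped regions must match pixel for pixel; anti-aliased or scaled renderings would need a tolerance instead.

```java
import java.awt.Rectangle;
import java.awt.image.BufferedImage;

public class DigitCompare {
    // A pixel counts as background only if its RGB value is exactly white (alpha ignored).
    static boolean isWhite(int argb) {
        return (argb & 0xFFFFFF) == 0xFFFFFF;
    }

    // Smallest rectangle containing every non-white pixel, or null if the image is all white.
    static Rectangle boundingBox(BufferedImage img) {
        int minX = Integer.MAX_VALUE, minY = Integer.MAX_VALUE, maxX = -1, maxY = -1;
        for (int y = 0; y < img.getHeight(); y++) {
            for (int x = 0; x < img.getWidth(); x++) {
                if (!isWhite(img.getRGB(x, y))) {
                    minX = Math.min(minX, x);
                    maxX = Math.max(maxX, x);
                    minY = Math.min(minY, y);
                    maxY = Math.max(maxY, y);
                }
            }
        }
        return maxX < 0 ? null : new Rectangle(minX, minY, maxX - minX + 1, maxY - minY + 1);
    }

    // Compares only the content inside each image's bounding box, using the
    // box corners as offsets, and stops at the first differing pixel.
    static boolean equals(BufferedImage image1, BufferedImage image2) {
        Rectangle a = boundingBox(image1), b = boundingBox(image2);
        if (a == null || b == null) return a == b; // two blank images count as equal
        if (a.width != b.width || a.height != b.height) return false;
        for (int y = 0; y < a.height; y++) {
            for (int x = 0; x < a.width; x++) {
                int p1 = image1.getRGB(a.x + x, a.y + y) & 0xFFFFFF;
                int p2 = image2.getRGB(b.x + x, b.y + y) & 0xFFFFFF;
                if (p1 != p2) return false; // early exit on the first mismatch
            }
        }
        return true;
    }
}
```

The bounding box of each of the 9 reference images only needs to be computed once, so per comparison you scan image2 once and then stop at the first mismatching pixel.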
By the way, you don't need to run in O(n^2). Once you have detected the topmost or leftmost pixel in both images, you can use it as an offset to work from. You only need to find one difference to conclude that the numbers are different. You can also use logic to determine which number it must be, based on simple tests. For example, take the numbers one (1) and zero (0): zero has white pixels in the middle, whereas one must have black pixels there, and vice versa. So detecting areas where the digits are definitely black or white can help you identify the number in the image by testing up to 9 areas.
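The "test a few areas" idea can be sketched as a 3x3 zone signature: each cell of the digit's bounding box is marked dark or light, and the 9 bits are compared against precomputed signatures of the reference digits. This sketch assumes the image has already been cropped to its bounding box; the luminance threshold of 128 and the "more than half dark" rule are hypothetical choices.

```java
import java.awt.image.BufferedImage;

public class ZoneProbe {
    // Hypothetical threshold: luminance below this counts as "black".
    static final int DARK_THRESHOLD = 128;

    static boolean isDark(int rgb) {
        int r = (rgb >> 16) & 0xFF, g = (rgb >> 8) & 0xFF, b = rgb & 0xFF;
        // Integer approximation of Rec. 601 luma.
        return (r * 299 + g * 587 + b * 114) / 1000 < DARK_THRESHOLD;
    }

    // Splits the (already cropped) digit into a 3x3 grid and sets one bit per cell,
    // row-major, when more than half of that cell's pixels are dark.
    static int zoneSignature(BufferedImage digit) {
        int w = digit.getWidth(), h = digit.getHeight(), sig = 0;
        for (int row = 0; row < 3; row++) {
            for (int col = 0; col < 3; col++) {
                int x0 = col * w / 3, x1 = (col + 1) * w / 3;
                int y0 = row * h / 3, y1 = (row + 1) * h / 3;
                int dark = 0, total = 0;
                for (int y = y0; y < y1; y++) {
                    for (int x = x0; x < x1; x++) {
                        total++;
                        if (isDark(digit.getRGB(x, y))) dark++;
                    }
                }
                if (total > 0 && dark * 2 > total) sig |= 1 << (row * 3 + col);
            }
        }
        return sig;
    }
}
```

A zero and a one differ in the center cell (bit 4), for example, so digits with different signatures can be told apart without a full pixel comparison; only candidates whose signatures match need the exact check.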
I'm trying to figure out a good method for comparing two images in terms of their color. One idea I had was to take the average color of both images and subtract that amount to get a "color distance." Whichever two images have the smallest color distance would be a match. Does this seem like a viable option for identifying an image from a database of images?
Ideally I would like to use this to identify playing cards put through an image scanner.
For example if I were to scan a real version of this card onto my computer I would want to be able to compare that with all the images in my database to find the closest one.
Update:
I forgot to mention the challenges involved in my specific problem.
The scanned image of the card and the original image of the card are most likely going to be different sizes (in terms of width and height).
I need to make this as efficient as possible. I plan on using this to scan/identify hundreds of cards at a time. I figured that finding (and storing) a single average color value for each image would be far more efficient than comparing the individual pixels of each image in the database (the database has well over 10,000 images) for each scanned card that needed to be identified. The reason why I was asking about this was to see if anyone had tried to compare average color values before as a means of image recognition. I have a feeling it might not work as I envision due to issues with both color value precision and accuracy.
Update 2:
Here's an example of what I was envisioning.
Image to be identified = A
Images in database = { D1, D2 }
average color of image A = avg(A) = #8ba489
average color of images in database = { #58727a, #8ba489 }
D2 matches with image A because #8ba489 - #8ba489 is less than #8ba489 - #58727a.
Of course the test image would not be an exact match with any of those images because it would be scanned in; however, I'm trying to find the closest match.
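One detail worth spelling out: subtracting the hex values as single numbers (#8ba489 - #58727a) mixes the red, green and blue channels together, so a "color distance" is more usefully computed per channel, for example as the Euclidean distance between RGB triples. A minimal Java sketch of computing the average color and that distance (the class and method names are illustrative only):

```java
import java.awt.image.BufferedImage;

public class AverageColor {
    // Mean red, green and blue over every pixel of the image.
    static double[] averageRgb(BufferedImage img) {
        long r = 0, g = 0, b = 0;
        long n = (long) img.getWidth() * img.getHeight();
        for (int y = 0; y < img.getHeight(); y++) {
            for (int x = 0; x < img.getWidth(); x++) {
                int p = img.getRGB(x, y);
                r += (p >> 16) & 0xFF;
                g += (p >> 8) & 0xFF;
                b += p & 0xFF;
            }
        }
        return new double[] { (double) r / n, (double) g / n, (double) b / n };
    }

    // Euclidean distance between two RGB triples.
    static double distance(double[] c1, double[] c2) {
        double dr = c1[0] - c2[0], dg = c1[1] - c2[1], db = c1[2] - c2[2];
        return Math.sqrt(dr * dr + dg * dg + db * db);
    }

    // Convenience overload for packed 0xRRGGBB values such as 0x8ba489.
    static double distance(int rgb1, int rgb2) {
        return distance(toTriple(rgb1), toTriple(rgb2));
    }

    private static double[] toTriple(int rgb) {
        return new double[] { (rgb >> 16) & 0xFF, (rgb >> 8) & 0xFF, rgb & 0xFF };
    }
}
```

For #8ba489 and #58727a, the channel differences are 51, 50 and 15, giving a distance of about 73.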
Content based image retrieval (CBIR) can do the trick for you. There's LIRE, a java library for that. You can even first try several approaches using different color based image features with the demo. See https://code.google.com/p/lire/ for downloads & source. There's also the "Simple Application" which gets you started with indexing and search really fast.
Based on my experience, I'd recommend using either the ColorLayout feature (if the images are not rotated), the OpponentHistogram, or the AutoColorCorrelogram. The CEDD feature might also yield good results, and it's the smallest, at ~60 bytes of data per image.
If you want to check color difference like this:
http://en.wikipedia.org/wiki/Color_difference
You can use Catalano Framework,
http://code.google.com/p/catalano-framework/
It works in Java and Android.
Example using Color Difference:
float[] lab = ColorConverter.RGBtoLAB(100, 120, 150, ColorConverter.CIE2_D65);
float[] lab2 = ColorConverter.RGBtoLAB(50, 80, 140, ColorConverter.CIE2_D65);
double diff = ColorDifference.DeltaC(lab, lab2);
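If you would rather not add a dependency, the underlying math is short. The following is not Catalano's API but a standalone sketch of the textbook sRGB to CIE L*a*b* conversion (D65 white point, 2 degree observer) and the CIE76 Delta E, which is simply the Euclidean distance in Lab space; it may not produce the same number as the `DeltaC` call above.

```java
public class Cie76 {
    // sRGB component (0-255) to linear-light value (0-1).
    static double linearize(int c) {
        double v = c / 255.0;
        return v <= 0.04045 ? v / 12.92 : Math.pow((v + 0.055) / 1.055, 2.4);
    }

    // CIE Lab companding function with the standard epsilon and kappa constants.
    static double f(double t) {
        return t > 216.0 / 24389.0 ? Math.cbrt(t) : (24389.0 / 27.0 * t + 16.0) / 116.0;
    }

    // sRGB -> XYZ (D65) -> L*a*b*.
    static double[] rgbToLab(int r, int g, int b) {
        double rl = linearize(r), gl = linearize(g), bl = linearize(b);
        double x = 0.4124564 * rl + 0.3575761 * gl + 0.1804375 * bl;
        double y = 0.2126729 * rl + 0.7151522 * gl + 0.0721750 * bl;
        double z = 0.0193339 * rl + 0.1191920 * gl + 0.9503041 * bl;
        // Normalize by the D65 reference white.
        double fx = f(x / 0.95047), fy = f(y / 1.00000), fz = f(z / 1.08883);
        return new double[] { 116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz) };
    }

    // CIE76 color difference: Euclidean distance between two Lab colors.
    static double deltaE76(double[] lab1, double[] lab2) {
        double dl = lab1[0] - lab2[0], da = lab1[1] - lab2[1], db = lab1[2] - lab2[2];
        return Math.sqrt(dl * dl + da * da + db * db);
    }

    public static void main(String[] args) {
        System.out.println(deltaE76(rgbToLab(100, 120, 150), rgbToLab(50, 80, 140)));
    }
}
```

CIE76 is the simplest of the Delta E formulas; later ones (CIE94, CIEDE2000) weight the terms to better match perceived differences.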
I think your idea is not good enough for this task.
Your method would treat very different images as identical: in my example, every image had the same average color (128).
Your color-averaging approach would most likely fail, as Heejin already explained.
You can try it a different way: shrink all images to some arbitrary size, then subtract the unknown image from each known image; the one with the smallest difference is the one you are looking for. It's a really simple method, and it wouldn't be slower than averaging.
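A minimal Java sketch of the shrink-and-subtract approach. The 16x16 thumbnail size, bilinear interpolation and the sum of absolute channel differences are arbitrary choices for illustration; the thumbnails of the known images would be computed once and stored.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;
import java.util.List;

public class ShrinkCompare {
    // Hypothetical thumbnail size; every image is squashed to SIZE x SIZE.
    static final int SIZE = 16;

    static BufferedImage shrink(BufferedImage src) {
        BufferedImage dst = new BufferedImage(SIZE, SIZE, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = dst.createGraphics();
        g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                RenderingHints.VALUE_INTERPOLATION_BILINEAR);
        g.drawImage(src, 0, 0, SIZE, SIZE, null);
        g.dispose();
        return dst;
    }

    // Sum of absolute per-channel differences between two thumbnails.
    static long difference(BufferedImage a, BufferedImage b) {
        long total = 0;
        for (int y = 0; y < SIZE; y++) {
            for (int x = 0; x < SIZE; x++) {
                int p = a.getRGB(x, y), q = b.getRGB(x, y);
                total += Math.abs(((p >> 16) & 0xFF) - ((q >> 16) & 0xFF))
                        + Math.abs(((p >> 8) & 0xFF) - ((q >> 8) & 0xFF))
                        + Math.abs((p & 0xFF) - (q & 0xFF));
            }
        }
        return total;
    }

    // Index of the precomputed thumbnail closest to the unknown image.
    static int bestMatch(BufferedImage unknown, List<BufferedImage> knownThumbs) {
        BufferedImage u = shrink(unknown);
        int best = -1;
        long bestDiff = Long.MAX_VALUE;
        for (int i = 0; i < knownThumbs.size(); i++) {
            long d = difference(u, knownThumbs.get(i));
            if (d < bestDiff) {
                bestDiff = d;
                best = i;
            }
        }
        return best;
    }
}
```

Comparing one 16x16 thumbnail against 10,000 stored ones is about 7.7 million channel subtractions, which is cheap, and unlike a single average it keeps some information about where the colors are.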
Another option is to use some smarter algorithm:
http://www.hackerfactor.com/blog/index.php?/archives/432-Looks-Like-It.html
I have used this method in the past, and the results are okay-ish. It works great for finding identical images, not so well for finding similar ones.
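The linked article describes, among other things, an "average hash": shrink the image to 8x8, convert it to grayscale, set one bit per pixel depending on whether it is above the mean, and compare hashes by Hamming distance. A minimal Java sketch of that idea (my own rendering of it, not code from the article):

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;

public class AverageHash {
    // 64-bit average hash: 8x8 grayscale thumbnail, one bit per pixel above the mean.
    static long hash(BufferedImage src) {
        BufferedImage small = new BufferedImage(8, 8, BufferedImage.TYPE_BYTE_GRAY);
        Graphics2D g = small.createGraphics();
        g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                RenderingHints.VALUE_INTERPOLATION_BILINEAR);
        g.drawImage(src, 0, 0, 8, 8, null); // scales and converts to gray in one step
        g.dispose();

        int[] gray = new int[64];
        long sum = 0;
        for (int i = 0; i < 64; i++) {
            gray[i] = small.getRaster().getSample(i % 8, i / 8, 0);
            sum += gray[i];
        }
        double mean = sum / 64.0;

        long bits = 0;
        for (int i = 0; i < 64; i++) {
            if (gray[i] > mean) bits |= 1L << i;
        }
        return bits;
    }

    // Number of differing bits; 0 means the thumbnails are essentially identical.
    static int hammingDistance(long h1, long h2) {
        return Long.bitCount(h1 ^ h2);
    }
}
```

How large a distance still counts as "the same picture" is a threshold you would have to tune for your images.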
I am looking for a solution that selects the best-fitting image size from a given set of sizes.
Given a number of rather random resolutions, I would like to find an image sized as close as possible to my preferred size.
Suppose I would like to use an image sized width x height (preferredImageSize).
Example: 320x200
Suppose I have the following image sizes at my disposal (availableImageSize) width1 x height1, width2 x height2, ... (maybe up to 10 different sizes).
Examples: 474x272, 474x310, 264x150, 226x128, 640x365, 474x410, 480x276, 256x144, 160x90, 320x182, 640x365, 192x108, 240x137, 480x276
To support a variable preferredImageSize, I am trying to find a generic solution that computes fairly quickly but also results in something that looks good on the screen.
I define "looks good on the screen" as an image that is:
hardly upscaled
as close to the given aspect-ratio (preferredImageSize.width / preferredImageSize.height) as possible
may be heavily downscaled
may be cropped/stretched in very small amounts
My initial (rather trivial) approach:
Run through the available image sizes once and find the smallest width delta (abs(preferredImageSize.width - availableImageSize.width)). The image with that smallest delta is then chosen (bestFitWidth).
That is certainly a way to solve the issue but definitely does not comply with my looks good on the screen hopes.
Any hints, whether text, source, or links, would be awesome. Oh, and if you think my requirements (aka hopes) are already leading in the wrong direction, go ahead and let me know...
Edit: added cropping and stretching as options, which I am afraid will make the issue even harder to solve. So if needed, leave them out of the equation.
Simple "if/then" approach:
I would do two things:
Since you would rather not upscale, but are OK with downscaling (which I find a good choice), NEVER use a source image that is smaller than your target, unless none is available.
Since "heavy" downscaling is OK, I would try to find an image that matches the aspect ratio as closely as possible, starting with the smallest acceptable image and going to progressively larger images.
To put it together, first throw out all images from the list that are smaller than your target. Then, start with the smallest image left and check its aspect ratio against your target. If the mismatch is acceptable (which you need to quantify), use the image, otherwise go to the next bigger one. If you don't find any acceptable ones, use the one with the best match.
If you've already thrown out all images as smaller than your target, you will likely end up with a bad-looking image either way, but you should then test whether it is worse to use an image that requires more upscaling or an image with a worse aspect-ratio match.
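The if/then procedure above, as a minimal Java sketch. The `ASPECT_TOLERANCE` of 0.1 is a hypothetical value you would need to tune, and for the case where every image is smaller than the target this sketch simply picks the best aspect-ratio match, which is only one of the options the answer suggests trying.

```java
import java.util.Comparator;
import java.util.List;

public class SizePicker {
    record Size(int width, int height) {
        double aspect() {
            return (double) width / height;
        }

        long area() {
            return (long) width * height;
        }
    }

    // Hypothetical tolerance: accept an aspect ratio within 0.1 of the target's.
    static final double ASPECT_TOLERANCE = 0.1;

    static Size pick(Size target, List<Size> available) {
        Comparator<Size> byAspectMismatch =
                Comparator.comparingDouble(s -> Math.abs(s.aspect() - target.aspect()));

        // 1. Throw out everything smaller than the target, then sort smallest first.
        List<Size> bigEnough = available.stream()
                .filter(s -> s.width() >= target.width() && s.height() >= target.height())
                .sorted(Comparator.comparingLong(Size::area))
                .toList();

        if (bigEnough.isEmpty()) {
            // Everything needs upscaling; this sketch just takes the best aspect match.
            return available.stream().min(byAspectMismatch).orElseThrow();
        }
        // 2. The smallest image with an acceptable aspect ratio wins.
        for (Size s : bigEnough) {
            if (Math.abs(s.aspect() - target.aspect()) <= ASPECT_TOLERANCE) return s;
        }
        // 3. Otherwise fall back to the best aspect match among the big-enough images.
        return bigEnough.stream().min(byAspectMismatch).orElseThrow();
    }
}
```

With the sizes from the question and a target of 320x200 (aspect 1.6), 474x272 and 480x276 are rejected for their aspect ratios (about 1.74), and 474x310 (about 1.53) is chosen.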
One other thing you need to think about is whether you want to stretch or crop the images to match your target aspect ratio.
More complex quantitative approach:
The most flexible approach, though, would be to define a "penalty" function that depends on the size mismatch and the aspect-ratio mismatch, and then pick the source image with the lowest penalty. That is essentially what you have done already, with abs(preferredImageSize.width - availableImageSize.width) as your penalty function. You could go with something a little more complex, for example:
width_diff = preferredImageSize.width - availableImageSize.width
height_diff = preferredImageSize.height - availableImageSize.height
// positive diff: the source is smaller than the target and must be upscaled
if (width_diff > 0) width_penalty = upscale_penalty * width_diff
else width_penalty = downscale_penalty * -width_diff
if (height_diff > 0) height_penalty = upscale_penalty * height_diff
else height_penalty = downscale_penalty * -height_diff
// floating-point division and abs() so the aspect term is never negative
aspect_penalty = abs((preferredImageSize.width / (double) preferredImageSize.height) -
                 (availableImageSize.width / (double) availableImageSize.height)) * stretch_penalty;
total_penalty = width_penalty + height_penalty + aspect_penalty;
Now you can play with the 3 numbers upscale_penalty, downscale_penalty, and stretch_penalty to give these three quality reducing operations different importance. Just try a couple of combinations and see which works best.
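The penalty function in Java, as a minimal sketch. The downscale and aspect terms are taken as absolute values so that every term is a non-negative cost, and the aspect ratios use floating-point division; the three weights are hypothetical starting points to experiment with.

```java
import java.util.List;

public class PenaltyPicker {
    record Size(int width, int height) {}

    // Hypothetical weights: upscaling hurts much more than downscaling.
    static final double UPSCALE_PENALTY = 10.0;
    static final double DOWNSCALE_PENALTY = 1.0;
    static final double STRETCH_PENALTY = 500.0;

    static double penalty(Size preferred, Size available) {
        int widthDiff = preferred.width() - available.width();
        int heightDiff = preferred.height() - available.height();
        // Positive diff: the source is smaller than the target and must be upscaled.
        double widthPenalty = widthDiff > 0
                ? UPSCALE_PENALTY * widthDiff : DOWNSCALE_PENALTY * -widthDiff;
        double heightPenalty = heightDiff > 0
                ? UPSCALE_PENALTY * heightDiff : DOWNSCALE_PENALTY * -heightDiff;
        double aspectPenalty = Math.abs(
                (double) preferred.width() / preferred.height()
                        - (double) available.width() / available.height()) * STRETCH_PENALTY;
        return widthPenalty + heightPenalty + aspectPenalty;
    }

    // Returns the available size with the lowest total penalty.
    static Size best(Size preferred, List<Size> available) {
        Size best = null;
        double bestPenalty = Double.POSITIVE_INFINITY;
        for (Size s : available) {
            double p = penalty(preferred, s);
            if (p < bestPenalty) {
                bestPenalty = p;
                best = s;
            }
        }
        return best;
    }
}
```

With these weights and a 320x200 target, a 640x400 source gets a penalty of 520 (320 + 200 of downscaling, no aspect mismatch), while an exact 320x200 source scores 0.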
I have a lot of images taken with my digital camera at a very high resolution (3000 x 4000), and they take up a lot of hard disk space. I used Photoshop to open each image and resize it to a smaller resolution, but that takes a lot of time and effort.
I think I could write a simple program that opens the folder of images, reads each file, gets its width and height, and, if they are very large, resizes the image and overwrites it with the smaller one.
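For reference, a minimal sketch of such a program using only the JDK (Java2D and ImageIO, no third-party library). The 1600-pixel limit and the .jpg/.jpeg filter are hypothetical choices. It overwrites the originals, so try it on a copy of the folder first; like any ImageIO round trip, it also drops EXIF metadata.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

public class BatchShrink {
    // Hypothetical limit: shrink anything whose longer side exceeds this many pixels.
    static final int MAX_SIDE = 1600;

    // Scales proportionally so that the longer side becomes maxSide.
    static BufferedImage scaleDown(BufferedImage src, int maxSide) {
        double scale = (double) maxSide / Math.max(src.getWidth(), src.getHeight());
        int w = Math.max(1, (int) Math.round(src.getWidth() * scale));
        int h = Math.max(1, (int) Math.round(src.getHeight() * scale));
        BufferedImage dst = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = dst.createGraphics();
        g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                RenderingHints.VALUE_INTERPOLATION_BILINEAR);
        g.setRenderingHint(RenderingHints.KEY_RENDERING, RenderingHints.VALUE_RENDER_QUALITY);
        g.drawImage(src, 0, 0, w, h, null);
        g.dispose();
        return dst;
    }

    // Overwrites every JPEG in dir whose longer side exceeds maxSide; returns how many changed.
    static int shrinkFolder(File dir, int maxSide) throws IOException {
        File[] files = dir.listFiles((d, name) -> name.toLowerCase().matches(".*\\.jpe?g"));
        if (files == null) throw new IOException("Not a directory: " + dir);
        int changed = 0;
        for (File f : files) {
            BufferedImage img = ImageIO.read(f);
            if (img == null || Math.max(img.getWidth(), img.getHeight()) <= maxSide) continue;
            ImageIO.write(scaleDown(img, maxSide), "jpg", f);
            changed++;
        }
        return changed;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(shrinkFolder(new File(args[0]), MAX_SIDE) + " image(s) resized");
    }
}
```

A single bilinear step is fast but can look soft or aliased at large reduction ratios (3000x4000 down to 1200x1600 is 2.5x); the library-based answers cover higher-quality resampling.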
Here is some code I use in a Java EE project (it should work in a normal application too):
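import com.mortennobel.imagescaling.ResampleFilters;
import com.mortennobel.imagescaling.ResampleOp;
import java.awt.image.BufferedImage;
import java.io.File;
import javax.imageio.ImageIO;

int rw = 1024; // the width you need (1024 is only an example value)
BufferedImage image = ImageIO.read(new File(filename));
// keep the aspect ratio: new height = rw * original height / original width
ResampleOp resampleOp = new ResampleOp(rw, (rw * image.getHeight()) / image.getWidth());
resampleOp.setFilter(ResampleFilters.getLanczos3Filter());
image = resampleOp.filter(image, null);
File tmpFile = new File(tmpName);
ImageIO.write(image, "jpg", tmpFile);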
The resample filter comes from the java-image-scaling library. It also contains BSpline and Bicubic filters, among others, if you don't like Lanczos3. If the images are not in the sRGB color space, Java silently converts them to sRGB (which happened to be what I needed).
Also, Java loses all EXIF data, though it does provide some (very hard to use) methods to retrieve it. For color-correct rendering, you may wish to at least add an sRGB flag to the file. For that, see here.
+1 to what some of the other folks said about not specifically needing Java for this, but I imagine you knew that and were asking because you either wanted to write such a utility or thought it would be fun?
Either way, getting the image file listing from a directory is straightforward; resizing the images correctly can take a bit more legwork, as you'll notice from Googling for best practices and finding about 9 different ways to actually resize the files.
I wrote imgscalr to address this exact issue; it's a dead-simple API (single class, bunch of static methods) and has some good adoption in webapps and other tools utilizing it.
Steps to resize would look like this (roughly):
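File[] files = dir.listFiles(); // get the file list of your image folder
for (File file : files) {
    BufferedImage image = ImageIO.read(file);
    image = Scalr.resize(image, width); // width: your target size in pixels
    ImageIO.write(image, "jpg", file); // overwrites the original file
}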
There are a multitude of "resize" methods to call on the Scalr class, and all of them honor the image's original proportions. So if you scale only using a targetWidth (say 1024 pixels) the height will be calculated for you to make sure the image still looks exactly right.
If you scale with width and height, but they would violate the proportions of the image and make it look "Stretched", then based on the orientation of the image (portrait or landscape) one dimension will be used as the anchor and the other incorrect dimension will be recalculated for you transparently.
There are also a multitude of different Quality settings and FIT-TO scaling modes you can use, but the library was designed to "do the right thing" always, so using it is very easy.
You can dig through the source, it is all Apache 2 licensed. You can see that it implements the Java2D team's best-practices for scaling images in Java and pedantically cleans up after itself so no memory gets leaked.
Hope that helps.
You do not need Java to do this; it's a waste of time and resources. If you have Photoshop, you can do it by recording an action: batch resize using actions
AffineTransformOp offers the additional flexibility of choosing the interpolation type, as shown here.
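A minimal sketch of scaling with `AffineTransformOp`, where the interpolation type is passed as one of its `TYPE_NEAREST_NEIGHBOR`, `TYPE_BILINEAR` or `TYPE_BICUBIC` constants (the `scale` helper name is illustrative):

```java
import java.awt.geom.AffineTransform;
import java.awt.image.AffineTransformOp;
import java.awt.image.BufferedImage;

public class AffineScale {
    // Scales uniformly by 'factor' using the given AffineTransformOp interpolation type.
    static BufferedImage scale(BufferedImage src, double factor, int interpolationType) {
        AffineTransformOp op = new AffineTransformOp(
                AffineTransform.getScaleInstance(factor, factor), interpolationType);
        return op.filter(src, null); // null: the op creates a suitably sized destination
    }

    public static void main(String[] args) {
        BufferedImage src = new BufferedImage(100, 50, BufferedImage.TYPE_INT_RGB);
        BufferedImage out = scale(src, 0.5, AffineTransformOp.TYPE_BICUBIC);
        System.out.println(out.getWidth() + "x" + out.getHeight());
    }
}
```

Passing `null` as the destination lets the op size the output from the transformed bounds, so a 100x50 image scaled by 0.5 comes back as 50x25.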
You can individually or batch resize with our desktop image resizing application called Sizester. There's a full functioning 15-day free trial on our site (www.sizester.com).
I have a folder with many images that have different backgrounds, and I need to sort these images by background color.
Can I write a Java program that reads the folder and each image file in it and determines the background color of each image? Please share options.
Yes, it is possible. You can load images with ImageIO.
BufferedImage img = ImageIO.read(imageFile);
int rgb = img.getRGB(x,y);
Color color = new Color(rgb);
But you have to create an algorithm that finds out which color is the background color. That depends on the kind of images.
So, not knowing what your images really look like, you may want to average as much of the background as you can to come up with a good representation of the background color.
I would consider a couple things:
* Read in the pixels of each of the four edges. If there's little variance in the pixel color, then you may be done, just take the average.
* Do the same, but also read in lines from the edge to the middle until you hit a pixel that has a rather different color than your running average. Do this for all edges.
Those would be the cheapest things that I can think of to cover variances in background color. Depending on the images you're working with, you may have to get fancier.
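The first suggestion (average the four edges, but only trust the result if they are fairly uniform) as a minimal Java sketch. The `MAX_STDDEV` threshold of 10 is a hypothetical value; when the method returns null, you would fall back to one of the fancier approaches.

```java
import java.awt.Color;
import java.awt.image.BufferedImage;

public class BackgroundColor {
    // Hypothetical limit on the mean per-channel standard deviation of the border pixels.
    static final double MAX_STDDEV = 10.0;

    // Averages the pixels on the four edges; returns null if the border is too varied.
    static Color edgeAverage(BufferedImage img) {
        int w = img.getWidth(), h = img.getHeight();
        double[] sum = new double[3], sumSq = new double[3];
        int n = 0;
        for (int x = 0; x < w; x++) { // top and bottom rows
            n += add(img.getRGB(x, 0), sum, sumSq);
            if (h > 1) n += add(img.getRGB(x, h - 1), sum, sumSq);
        }
        for (int y = 1; y < h - 1; y++) { // left and right columns, corners excluded
            n += add(img.getRGB(0, y), sum, sumSq);
            if (w > 1) n += add(img.getRGB(w - 1, y), sum, sumSq);
        }
        double[] mean = new double[3];
        double stddev = 0;
        for (int c = 0; c < 3; c++) {
            mean[c] = sum[c] / n;
            stddev += Math.sqrt(Math.max(0, sumSq[c] / n - mean[c] * mean[c])) / 3;
        }
        if (stddev > MAX_STDDEV) return null;
        return new Color((int) Math.round(mean[0]), (int) Math.round(mean[1]),
                (int) Math.round(mean[2]));
    }

    // Accumulates one pixel's R, G, B values; returns 1 so the caller can count pixels.
    private static int add(int rgb, double[] sum, double[] sumSq) {
        int[] ch = { (rgb >> 16) & 0xFF, (rgb >> 8) & 0xFF, rgb & 0xFF };
        for (int c = 0; c < 3; c++) {
            sum[c] += ch[c];
            sumSq[c] += (double) ch[c] * ch[c];
        }
        return 1;
    }
}
```

To sort the folder, you could read each file with `ImageIO.read`, call `edgeAverage`, and group or sort the files by the returned color (for example by hue from `Color.RGBtoHSB`).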
A BufferedImage should get you your image data.
Mark