I'm trying to figure out a geo-hashing method for images. It is hard because the space of possible images is of much higher dimensionality than lat/lng. (geo-hashing converts a location to a string where the string progressively refines the location)
So, what I need is something that:
INPUT: A list of JPG or PNG images on disk
OUTPUT: For each image, a string such that the longer the prefix two images' strings have in common, the higher the chance that the two images are the same.
It doesn't need to be perfect, and it doesn't need to handle extreme cases, like cropped images or heavily adjusted images. It is intended for multiple copies of the same image at different resolutions and compression levels.
I can't use:
File or image-data hashing, because even a teeny change between two images makes a completely different hash and you don't get any proximity
Image subtraction, because it won't be an N-to-N comparison.
I've read in other answers to try wavelet compression or a Laplacian/Gaussian pyramid, but I'm not sure how to implement them in Java or Python. However, I have made progress!
Resize to 32x32 using http://today.java.net/pub/a/today/2007/04/03/perils-of-image-getscaledinstance.html so as not to discard data. It's OK that everything gets turned into a square.
Create a pyramid of successively smaller thumbnails all the way down to 2x2.
In the 2x2, encode a string of "is the next pixel brighter than the current? If so, 1, else 0". (This throws away all hue and saturation; I may want to use hue somehow.)
Encode successive binary numbers from the 8x8 and 32x32 pyramid levels in the same way.
Convert the big binary number to some higher radix representation, like Base62.
This seems to work well! Minor differences from compression or color balancing aren't enough to flip an "is the left side of this area brighter than the right side?" bit. However, I think I'm re-inventing the wheel; some sort of progressive encoding might be better? SIFT and other feature detection is overkill; I don't need to be able to handle cropping or rotation.
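A rough sketch of that pipeline in Java, assuming the input is already the 32x32 grayscale thumbnail; halve() stands in for whatever high-quality downscale is used, and Base36 is used only because BigInteger caps the radix at 36 (a Base62 alphabet works the same way):

import java.awt.image.BufferedImage;
import java.math.BigInteger;
import java.util.ArrayDeque;
import java.util.Deque;

// Build the thumbnail pyramid, emit "is the next pixel brighter?" bits
// coarse-to-fine, and re-pack the resulting bit string in a higher radix.
static String pyramidHash(BufferedImage thumb32) {
    Deque<BufferedImage> coarseToFine = new ArrayDeque<>();
    for (BufferedImage level = thumb32; level.getWidth() >= 2; level = halve(level)) {
        coarseToFine.addFirst(level);                        // the 2x2 level ends up first
    }
    StringBuilder bits = new StringBuilder("1");             // sentinel bit keeps leading zeros
    for (BufferedImage level : coarseToFine) {
        int prev = -1;
        for (int y = 0; y < level.getHeight(); y++) {
            for (int x = 0; x < level.getWidth(); x++) {
                int luma = level.getRaster().getSample(x, y, 0);   // grayscale band 0
                if (prev >= 0) bits.append(luma > prev ? '1' : '0');
                prev = luma;
            }
        }
    }
    return new BigInteger(bits.toString(), 2).toString(36);
}

Because the coarse (2x2) bits sit at the most significant end of the number, two images that agree on the coarse levels tend to share the leading characters of the output string.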
How about this. The hash string is made up of groups of three characters, representing red green and blue:
{R0, G0, B0}, {R1, G1, B1}, {R2, G2, B2}, ...
For each group, the image is resized to a 2^N by 2^N square. Then, the value is the sum (mod, say, 255, or whatever your encoding is) of the differences in intensity of each of the colours over some walk through the pixels.
So as a small example, to compute e.g. group 1 (the 2x2 image) one might use the following code (I have only bothered with the red channel):
int rSum = 0;
int rLast = 0;
for (int i = 0; i < 2; i++) {
    for (int j = 0; j < 2; j++) {
        // extract the red component; image is assumed to be a java.awt.image.BufferedImage
        int r = (image.getRGB(j, i) >> 16) & 0xFF;
        rSum += Math.abs(r - rLast);
        rLast = r;
    }
}
rSum %= 255;
I believe this has the property that similar images should be close to each other, both for each character in the hash and in terms of successive characters in the hash.
Although for higher values of N the chance of a collision gets higher (many images will have the same sum-of-difference values for R, G and B intensities across them), each successive iteration should reveal new information about the image that was not tested with the previous iteration.
It could be fairly computationally expensive, but you have the advantage (which I infer from your question you might desire) that you can end the computation of the hash as soon as a mismatch beyond a certain threshold is detected.
Just an idea, let me know if I wasn't clear!
What you're describing seems to me to be an example of Locality-Sensitive Hashing applied to the image similarity problem.
I'm not sure that the common prefix property is desirable for a good hash function. I would expect a good hash function to have two properties:
1) Good localization - for images I1 and I2, norm(Hash(I1) - Hash(I2)) should represent the visually perceived similarity of I1 and I2.
2) Good compression - The high-dimension image data should be embedded in the low-dimension space of hash values in the most discriminative way.
I'm getting good results from the following:
Scale down (using good scaling that doesn't discard information) to three images:
a 1x7 image
a 7x1 image
and a 6x6 image.
Convert all to grayscale.
For each image, do the "is the next pixel brighter? '1' : '0'" encoding, and output it as Base62.
Those outputs become the values for three columns. Nice successively refined differencing, packed into 2 chars, 2 chars, and 6 chars. True, it discards all color, but it's still good!
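A minimal sketch of that encoding for one thumbnail. The scaling step is left out; gray is assumed to hold the thumbnail's grayscale pixels in scan order, the method name is made up, and the Base62 alphabet is just one possible choice:

// Encode "is the next pixel brighter than the current one?" bits,
// then re-pack them as a fixed-width, zero-padded Base62 string.
static String brighterBitsBase62(int[] gray, int width) {
    final String ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
    long value = 0;
    for (int i = 0; i + 1 < gray.length; i++) {
        value = (value << 1) | (gray[i + 1] > gray[i] ? 1L : 0L);
    }
    StringBuilder out = new StringBuilder();
    for (int k = 0; k < width; k++) {                // fixed width, zero-padded on the left
        out.insert(0, ALPHABET.charAt((int) (value % 62)));
        value /= 62;
    }
    return out.toString();
}

Called with widths 2, 2 and 6 for the 1x7, 7x1 and flattened 6x6 thumbnails respectively, the three outputs concatenate into the hash described above.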
I've built a matrix of LEDs controlled by a Java program on my Raspberry Pi. I want to display characters on this matrix. So what I need to do is convert the characters to a two-dimensional boolean array (each LED is represented by one boolean).
The only way to do this I can think of is to design a separate matrix for each existing character, but this is way too much work.
Is there any way to do this differently?
You could rasterize (draw) a given font at a given point size using something like AWT or FreeType and then examine the image to see which pixels/LEDs should be on or off.
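A minimal render-and-read sketch with AWT; the font, point size, and matrix dimensions are placeholder choices, and the method name is made up:

import java.awt.Color;
import java.awt.Font;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

// Draw one character into a small image, then read back which pixels/LEDs are "on".
static boolean[][] rasterize(char c, int cols, int rows) {
    BufferedImage img = new BufferedImage(cols, rows, BufferedImage.TYPE_BYTE_GRAY);
    Graphics2D g = img.createGraphics();
    g.setColor(Color.BLACK);
    g.fillRect(0, 0, cols, rows);
    g.setColor(Color.WHITE);
    g.setFont(new Font(Font.MONOSPACED, Font.PLAIN, rows));   // placeholder font and size
    g.drawString(String.valueOf(c), 0, rows - 1);              // crude baseline placement
    g.dispose();

    boolean[][] on = new boolean[rows][cols];
    for (int y = 0; y < rows; y++) {
        for (int x = 0; x < cols; x++) {
            on[y][x] = (img.getRGB(x, y) & 0xFF) > 127;        // bright pixel -> LED on
        }
    }
    return on;
}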
This will break down as the font size gets smaller. Below some point, you're probably better off coming up with the matrices yourself rather than pouring a bunch of effort into something that doesn't work.
OTOH, "render-and-read" would be Much Less Boring... so YMMV.
You could load a monochrome image for each character at a pixel size matching your LED matrix and check with two for loops whether the pixel at a certain position is black (true) or white (false).
I am working on a project where I have to display some pictures (grayscale), and I noticed that many of them were too dark to see properly.
Then looking at ImageJ API documentation, I found the class: ij.plugin.ContrastEnhancer
There are two methods there whose conceptual difference I am having a hard time understanding: stretchHistogram() and equalize(). Both make the image brighter, but I still want to understand the differences.
My question is: what is the conceptual differences between those methods?
A histogram stretch is where you have an image that has a low dynamic range - so all of the pixel intensities are concentrated in a smaller band than the 0 to 255 range of an 8-bit greyscale image, for example. So the darkest pixel in the image may be 84 and the brightest 153. Stretching just takes this narrow range and performs a linear mapping to the full 0 to 255 range.
Histogram equalisation attempts to achieve a flat histogram - so all possible pixel intensities are equally represented in the image. This means that where there are peaks in the histogram - concentrations of values in a certain range - these are expanded to cover a wider range so that the peak is flattened, and where there are troughs in the histogram, these are mapped to a narrower range so that the trough is levelled out.
For a uni-modal histogram with a low dynamic range, the two operations are roughly equivalent, but in cases where the histogram already covers the full range of intensities, histogram equalisation gives a useful visual improvement while stretching does nothing (because there's nothing to stretch). The curve for mapping to equalise a histogram is derived from the cumulative distribution (so imagine each histogram bar is the sum of all previous values) and theoretically it's possible to achieve a perfectly flat histogram. However, because we are (normally) dealing with discrete values of pixel intensities, histogram equalisation only gives an approximation to a flat histogram.
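A sketch of the two mappings for an 8-bit grayscale histogram; hist is assumed to be the 256-entry pixel-count histogram, and ImageJ's own implementations add refinements (such as saturation clipping) that are left out here:

// Build a 256-entry lookup table for each operation; apply it as lut[pixelValue].

// Histogram stretch: linear map from the occupied range [min, max] to [0, 255].
static int[] stretchLut(int[] hist) {
    int min = 0, max = 255;
    while (min < 255 && hist[min] == 0) min++;
    while (max > 0 && hist[max] == 0) max--;
    int[] lut = new int[256];
    for (int v = 0; v < 256; v++) {
        int mapped = (max > min) ? (v - min) * 255 / (max - min) : v;
        lut[v] = Math.max(0, Math.min(255, mapped));
    }
    return lut;
}

// Histogram equalisation: map each value through the normalised cumulative histogram.
static int[] equalizeLut(int[] hist, long totalPixels) {
    int[] lut = new int[256];
    long cumulative = 0;
    for (int v = 0; v < 256; v++) {
        cumulative += hist[v];
        lut[v] = (int) (cumulative * 255 / totalPixels);
    }
    return lut;
}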
The solution I am aiming for selects the best-fitting image size from a given set of sizes.
Given a number of rather random resolutions, I would like to find an image sized as close as possible to my preferred size.
Suppose I would like to use an image sized width x height (preferredImageSize).
Example: 320x200
Suppose I have the following image sizes at my disposal (availableImageSize) width1 x height1, width2 x height2, ... (maybe up to 10 different sizes).
Examples: 474x272, 474x310, 264x150, 226x128, 640x365, 474x410, 480x276, 256x144, 160x90, 320x182, 640x365, 192x108, 240x137, 480x276
To develop a generic approach in which preferredImageSize can vary, I am trying to find a good solution that computes rather quickly but also results in something that looks good on the screen.
I define looks good on the screen as an image that is:
hardly upscaled
as close to the given aspect-ratio (preferredImageSize.width / preferredImageSize.height) as possible
may be heavily downscaled
may be cropped/stretched in very small amounts
My initial (rather trivial) approach:
Run through the available image sizes once and find the smallest width delta (abs(preferredImageSize.width - availableImageSize.width)). The image with that smallest delta is then chosen (bestFitWidth).
That is certainly a way to solve the issue, but it definitely does not live up to my "looks good on the screen" hopes.
Any hints, no matter if text, source or links, would be awesome. Oh, and if you think that my requirements (aka hopes) are already leading in the wrong direction, go ahead, let me know...
Edit: added cropping and stretching as options - which, I am afraid will make the issue even harder to solve. So if needed leave it out of the equation.
Simple "if/then" approach:
I would do two things:
Since you would rather not upscale, but are OK with downscaling (which I find a good choice), NEVER use a source image that is smaller than your target, unless none is available.
Since "heavy" downscaling is OK, I would try to find an image that matches the aspect ratio as closely as possible, starting with the smallest acceptable image and going to progressively larger images.
To put it together, first throw out all images from the list that are smaller than your target. Then, start with the smallest image left and check its aspect ratio against your target. If the mismatch is acceptable (which you need to quantify), use the image, otherwise go to the next bigger one. If you don't find any acceptable ones, use the one with the best match.
If you've already thrown out all images as smaller than your target, you will likely end up with a bad-looking image either way, but you should then try out whether it is worse to use an image that requires more upscaling, or whether it is worse to use an image that is a worse aspect-ratio match.
One other thing you need to think about is whether you want to stretch or crop the images to match your target aspect ratio.
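A sketch of the selection logic just described; Size is a hypothetical class with int width/height fields, and the mismatch threshold is whatever you decide is acceptable:

import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

static Size pickIfThen(Size preferred, List<Size> available, double maxAspectMismatch) {
    double targetAspect = (double) preferred.width / preferred.height;

    // 1. Throw out everything smaller than the target, unless nothing would be left.
    List<Size> candidates = new ArrayList<>();
    for (Size s : available) {
        if (s.width >= preferred.width && s.height >= preferred.height) candidates.add(s);
    }
    if (candidates.isEmpty()) candidates = new ArrayList<>(available);

    // 2. Walk from the smallest remaining image upwards, take the first acceptable aspect ratio.
    candidates.sort(Comparator.comparingInt(s -> s.width * s.height));
    Size bestMatch = candidates.get(0);
    double bestMismatch = Double.MAX_VALUE;
    for (Size s : candidates) {
        double mismatch = Math.abs((double) s.width / s.height - targetAspect);
        if (mismatch <= maxAspectMismatch) return s;
        if (mismatch < bestMismatch) { bestMismatch = mismatch; bestMatch = s; }
    }
    return bestMatch;   // 3. No acceptable match: fall back to the closest aspect ratio.
}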
More complex quantitative approach:
The most flexible approach, though, would be to define yourself a "penalty" function that depends on the size mismatch and the aspect ratio mismatch and then find the source image that gives you the lowest "penalty". This is what you have currently done and you've defined your penalty function as abs(preferredImageSize.width - availableImageSize.width). You could go with something a little more complex, like for example:
double width_diff = preferredImageSize.width - availableImageSize.width;
double height_diff = preferredImageSize.height - availableImageSize.height;

// a positive diff means the source is smaller than the target and would need upscaling
double width_penalty = (width_diff > 0) ? upscale_penalty * width_diff
                                        : downscale_penalty * -width_diff;
double height_penalty = (height_diff > 0) ? upscale_penalty * height_diff
                                          : downscale_penalty * -height_diff;

// compare aspect ratios in floating point and penalise the absolute mismatch
double aspect_penalty = Math.abs((double) preferredImageSize.width / preferredImageSize.height
                              - (double) availableImageSize.width / availableImageSize.height)
                        * stretch_penalty;

double total_penalty = width_penalty + height_penalty + aspect_penalty;
Now you can play with the 3 numbers upscale_penalty, downscale_penalty, and stretch_penalty to give these three quality reducing operations different importance. Just try a couple of combinations and see which works best.
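Applied in a loop, the lowest-penalty image wins. Here penalty() is a hypothetical wrapper around the computation above and Size is again a made-up width/height holder:

import java.util.List;

static Size pickByPenalty(Size preferred, List<Size> available) {
    Size best = null;
    double bestPenalty = Double.MAX_VALUE;
    for (Size candidate : available) {
        double p = penalty(preferred, candidate);   // total_penalty from the snippet above
        if (p < bestPenalty) {
            bestPenalty = p;
            best = candidate;
        }
    }
    return best;
}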
I am trying to add noise to a BufferedImage in Java, but I am more interested in the algorithm used to add noise to an image rather than the Java or any other language-specific implementation.
I have searched the web and found out about Gaussian noise, but the tutorials/articles either show only code samples that are not very useful to me, or complex mathematical explanations.
It's not clear what your question is, but here's some random observations in case they help:
If the image is relatively unprocessed (it hasn't been scaled in size) then the noise in each pixel is roughly independent. So you can simulate that by looping over each pixel in turn, calculating a new noise value, and adding it.
Even when images have been processed the approach above is often a reasonable approximation.
The amount of noise in an image depends on a lot of factors. For typical images generated by digital sensors a common approximation is that the noise in each pixel is about the same. In other words you choose some standard deviation (SD) and then, in the loop above, select a value from a Gaussian distribution with that SD.
For astronomical images (and other low-noise electronic images), there is a component of the noise where the SD is proportional to the square root of the brightness of the pixel.
So likely what you want to do is:
Pick a SD (how noisy you want the image to be)
In a loop, for each pixel:
Generate a random number from a Gaussian with the given SD (and mean of zero) and add it to the pixel (assuming a greyscale image). For a colour image generate three values and add them to red, green and blue, respectively.
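A sketch of that loop for a colour BufferedImage, using java.util.Random.nextGaussian(); sd is whatever standard deviation you picked in step 1, the method name is made up, and results are clamped to 0..255:

import java.awt.image.BufferedImage;
import java.util.Random;

// Add independent Gaussian noise (mean 0, standard deviation sd) to each channel of each pixel.
static void addGaussianNoise(BufferedImage img, double sd) {
    Random rng = new Random();
    for (int y = 0; y < img.getHeight(); y++) {
        for (int x = 0; x < img.getWidth(); x++) {
            int rgb = img.getRGB(x, y);
            int r = clamp(((rgb >> 16) & 0xFF) + (int) Math.round(rng.nextGaussian() * sd));
            int g = clamp(((rgb >> 8) & 0xFF) + (int) Math.round(rng.nextGaussian() * sd));
            int b = clamp((rgb & 0xFF) + (int) Math.round(rng.nextGaussian() * sd));
            img.setRGB(x, y, (rgb & 0xFF000000) | (r << 16) | (g << 8) | b);
        }
    }
}

static int clamp(int v) { return Math.max(0, Math.min(255, v)); }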
Update: I imagine night vision is going to be something like astronomical imaging. In that case you might try varying the SD for each pixel so that it includes a constant plus something that depends on the square root of the brightness. So, say, if a pixel has brightness b then you might use 100 + 10 * sqrt(b) as the SD. You'll need to play with the values but that might look more realistic.
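For the brightness-dependent case, the only change inside a loop like the one above is computing the SD per pixel before drawing the noise value; brightness here stands for the pixel's current value, and the 100 and 10 are just the example numbers above:

// Per-pixel SD: a constant noise floor plus a term that grows with sqrt(brightness).
double sd = 100.0 + 10.0 * Math.sqrt(brightness);
int noisy = clamp(brightness + (int) Math.round(rng.nextGaussian() * sd));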
I currently have a Java program that will get the RGB values for each of the pixels in an image. I also have a method to calculate a Haar wavelet on a 2D matrix of values. However, I don't know which values I should give to my method that calculates the Haar wavelet. Should I average each pixel's RGB values and compute a Haar wavelet on that? Or maybe just use one of R, G, B?
I am trying to create a unique fingerprint for an image. I read elsewhere that this was a good method as I can take the dot product of 2 wavelets to see how similar the images are to each other.
Please let me know of what values I should be computing a Haar wavelet on.
Thanks
Jess
You should regard the R/G/B components as different images: Create one matrix for R, G and B each, then apply the wavelet to parts of those independently.
You then reconstruct the R/G/B images from the 3 wavelet-compressed channels and finally combine those into a 3-channel bitmap.
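A sketch of a single 2D Haar decomposition level on one such channel matrix; channel is assumed to hold, say, the red values as doubles, n is the (even) side length, and the method name is made up. For further levels you repeat the same step on the top-left n/2 x n/2 block:

// One level of the 2D Haar transform: averages end up in the first half of each
// row/column, differences in the second half. Transform rows first, then columns.
static void haarStep(double[][] channel, int n) {
    double[] tmp = new double[n];
    for (int row = 0; row < n; row++) {
        for (int i = 0; i < n / 2; i++) {
            tmp[i]         = (channel[row][2 * i] + channel[row][2 * i + 1]) / 2.0;
            tmp[n / 2 + i] = (channel[row][2 * i] - channel[row][2 * i + 1]) / 2.0;
        }
        System.arraycopy(tmp, 0, channel[row], 0, n);
    }
    for (int col = 0; col < n; col++) {
        for (int i = 0; i < n / 2; i++) {
            tmp[i]         = (channel[2 * i][col] + channel[2 * i + 1][col]) / 2.0;
            tmp[n / 2 + i] = (channel[2 * i][col] - channel[2 * i + 1][col]) / 2.0;
        }
        for (int i = 0; i < n; i++) channel[i][col] = tmp[i];
    }
}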
Since eznme didn't answer your question (you want fingerprints, he explains compression and reconstruction), here's a method you'll often come across:
You separate color and brightness information (chrominance and luma), and weigh them differently. Sometimes you'll even throw away the chrominance and just use the luma part. This reduces the size of your fingerprint significantly (~factor three) and takes into account how we perceive an image - mainly by local brightness, not by absolute color. As a bonus you gain some robustness concerning color manipulation of the image.
The separation can be done in different ways, e.g. transforming your RGB image to YUV or YIQ color space. If you only want to keep the luma component, these two color spaces are equivalent. However, they encode the chrominance differently.
Here's the linear transformation for the luma Y from RGB:
Y = 0.299*R + 0.587*G + 0.114*B
When you take a look at the mathematics, you notice that we're doing nothing else than creating a grayscale image – taking into account that we perceive green as brighter than red, and red as brighter than blue, when they all are numerically equal.
In case you want to keep a bit of chrominance information, in order to keep your fingerprint as concise as possible, you could reduce the resolution of the two U, V components (each actually 8 bit). So you could join them both into one 8-bit value by reducing their information to 4 bit each and combining them with the shift operator (don't know how that works in Java). The chrominance should weigh less than the luma in the final fingerprint-distance calculation (the dot product you mentioned).
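For illustration, here is a sketch of the luma formula above plus the 4-bit chroma packing in Java; the U/V coefficients are the standard BT.601 ones, the +128 offset and clamping are just one reasonable way to keep the values in 0..255, and the method names are made up:

// Luma (BT.601 weights), as in the formula above.
static int luma(int r, int g, int b) {
    return (int) Math.round(0.299 * r + 0.587 * g + 0.114 * b);      // 0..255
}

// U and V chroma squeezed to 4 bits each and packed into one byte with shifts.
static int packedChroma(int r, int g, int b) {
    int u = clamp((int) Math.round(-0.147 * r - 0.289 * g + 0.436 * b + 128));
    int v = clamp((int) Math.round( 0.615 * r - 0.515 * g - 0.100 * b + 128));
    int u4 = (u >> 4) & 0x0F;         // keep only the top 4 bits of each component
    int v4 = (v >> 4) & 0x0F;
    return (u4 << 4) | v4;            // one byte laid out as UUUUVVVV
}

static int clamp(int x) { return Math.max(0, Math.min(255, x)); }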