I am trying to add noise to a BufferedImage in Java, but I am more interested in the algorithm used to add noise to an image rather than the Java or any other language-specific implementation.
I have searched the web and found out about Gaussian noise, but the tutorials/articles either show only code samples that are not very useful to me, or give complex mathematical explanations.
It's not clear what your question is, but here are some random observations in case they help:
If the image is relatively unprocessed (it hasn't been scaled in size) then the noise in each pixel is roughly independent. So you can simulate that by looping over each pixel in turn, calculating a new noise value, and adding it.
Even when images have been processed the approach above is often a reasonable approximation.
The amount of noise in an image depends on a lot of factors. For typical images generated by digital sensors a common approximation is that the noise in each pixel is about the same. In other words you choose some standard deviation (SD) and then, in the loop above, select a value from a Gaussian distribution with that SD.
For astronomical images (and other low-noise electronic images), there is a component of the noise where the SD is proportional to the square root of the brightness of the pixel.
So likely what you want to do is the following (a rough Java sketch of these steps appears after the list):
Pick a SD (how noisy you want the image to be)
In a loop, for each pixel:
Generate a random number from a Gaussian with the given SD (and mean of zero) and add it to the pixel (assuming a greyscale image). For a colour image generate three values and add them to red, green and blue, respectively.
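A minimal sketch of those steps in Java, assuming an RGB BufferedImage and using Random.nextGaussian() scaled by the chosen SD; the class and method names are just placeholders:

```java
import java.awt.image.BufferedImage;
import java.util.Random;

public class GaussianNoise {

    // Adds zero-mean Gaussian noise with the given standard deviation to every
    // pixel, independently for the red, green and blue channels.
    public static BufferedImage addNoise(BufferedImage src, double sd) {
        Random rng = new Random();
        BufferedImage out = new BufferedImage(src.getWidth(), src.getHeight(),
                BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < src.getHeight(); y++) {
            for (int x = 0; x < src.getWidth(); x++) {
                int rgb = src.getRGB(x, y);
                int r = clamp(((rgb >> 16) & 0xFF) + rng.nextGaussian() * sd);
                int g = clamp(((rgb >> 8) & 0xFF) + rng.nextGaussian() * sd);
                int b = clamp((rgb & 0xFF) + rng.nextGaussian() * sd);
                out.setRGB(x, y, (r << 16) | (g << 8) | b);
            }
        }
        return out;
    }

    // Rounds the noisy value and clips it back into the valid 0..255 range.
    private static int clamp(double v) {
        return (int) Math.max(0, Math.min(255, Math.round(v)));
    }
}
```

For the night-vision variant described in the update below, you would compute a per-pixel SD inside the loop (e.g. 100 + 10 * sqrt(brightness)) instead of using one constant value.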
Update: I imagine night vision is going to be something like astronomical imaging. In that case you might try varying the SD for each pixel so that it includes a constant plus something that depends on the square root of the brightness. So, say, if a pixel has brightness b then you might use 100 + 10 * sqrt(b) as the SD. You'll need to play with the values, but that might look more realistic.
I have a neural network (written in Java) which classifies handwritten digits, trained using the MNIST data set.
I have a GUI where the user draws a number (shown on the left). When the user hits the "guess" button, the drawing is converted into a 400 by 470 image, down-scaled to a 20 by 20 image, then centered in a 28 by 28 image to feed into the network, and the output is shown on the right.
Here is what the GUI looks like:
My problem, however, is that when I have a number that doesn't take up the majority of the panel (such as the 3 in the image above), the down-scaled image used as the input for the network is too small, which causes the network to guess incorrectly.
Here is the final input image when the number is drawn small:
Here is the final input image when the number is drawn large:
What I'm asking is: is there any way to make the number that is drawn small the same size as the number drawn large while still keeping the size of the image as 28 by 28?
You can either use another object-detection network just to find the bounding box, or just calculate where the leftmost, rightmost, topmost, and bottommost drawn pixels are. If you fear there will be outliers (there should not be, unless the user purposefully clicks an area far from the figure), you can remove them fairly easily. There are a number of ways, but one method is to compute the distance of each drawn pixel to the center of the image, fit a distribution to those distances (normal might be good enough), compute which ones are outliers, and get rid of them (or compute the distance beyond which pixels become outliers, and crop the box to fit). Then you scale the rectangle up to the correct size.
This is just a general method. As for specifics, I do not know how exactly your images are represented, but you can iterate over every pixel and note their positions (the number of iterations is not overly expensive).
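As a rough Java sketch of the bounding-box idea (without the outlier handling), assuming the drawing is dark strokes on a light background; note that a faithful MNIST-style pipeline would also preserve the aspect ratio when scaling the crop to 20 by 20:

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;

public class DigitNormalizer {

    // Finds the bounding box of all "drawn" (dark) pixels, scales that box to
    // 20x20 and pastes it into the centre of a 28x28 image.
    public static BufferedImage normalize(BufferedImage drawing) {
        int minX = drawing.getWidth(), minY = drawing.getHeight(), maxX = -1, maxY = -1;
        for (int y = 0; y < drawing.getHeight(); y++) {
            for (int x = 0; x < drawing.getWidth(); x++) {
                int grey = drawing.getRGB(x, y) & 0xFF;   // assume roughly greyscale input
                if (grey < 128) {                          // "drawn" = dark pixel (assumption)
                    minX = Math.min(minX, x); maxX = Math.max(maxX, x);
                    minY = Math.min(minY, y); maxY = Math.max(maxY, y);
                }
            }
        }
        if (maxX < 0) return drawing; // nothing was drawn

        BufferedImage result = new BufferedImage(28, 28, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = result.createGraphics();
        g.setColor(java.awt.Color.WHITE);
        g.fillRect(0, 0, 28, 28);
        g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                RenderingHints.VALUE_INTERPOLATION_BILINEAR);
        // Draw the cropped bounding box, scaled to 20x20, centred at offset (4, 4).
        g.drawImage(drawing.getSubimage(minX, minY, maxX - minX + 1, maxY - minY + 1),
                4, 4, 20, 20, null);
        g.dispose();
        return result;
    }
}
```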
I implemented the diamond square algorithm in Java, but I'm not entirely satisfied with the results as a height map. It forms a lot of "lakes" - small areas of low height. The heights are generated using the diamond square algorithm, then normalized. In the example below, white = high, black = low and blue is anything below height 15: a placeholder for oceans.
This image shows the uncolored height map
How can I smooth the terrain to reduce the number of lakes?
I've investigated a simple box blurring function (setting each pixel to the average of its neighbors), but this causes strange artifacts, possibly because of the square step of the diamond square.
Would a different (perhaps Gaussian) blur be appropriate, or is this a problem with my implementation? This link says the diamond square has some inherent issues, but these don't seem to be regularly spaced artifacts, and my heightmap is seeded with 16 (not 4) values.
Your threshold algorithm needs to be more logical. You need to actually specify what is to be removed in terms of size, not just height. Basically the simple threshold sets "sea level", and anything below this level will be water. The problem is that the algorithm used to generate the terrain does so in a haphazard way, so small areas can end up filled with water.
To fix this you need to essentially determine the size of regions of water and only allow larger areas.
One simple way to do this is to not allow single "pixels" to represent water. Essentially either do not set them as water (you could use a bitmap where each bit represents whether there is water or not) or simply raise the level up. This should get most of the single pixels out of your image and clear it up quite a bit.
You can extend this to N pixels (essentially representing area). Basically you identify the size of each region of water by counting connected pixels. The problem with this is that it allows long thin regions (which could represent rivers).
So it is better to take it one step further and count the width and length separately.
e.g., to detect a simple single pixel
if map[i,j] < threshold && (map[i-1,j-1] > threshold && ... && map[i+1,j+1] > threshold) then Area = 1
will detect isolated pixels.
You can modify this to detect larger groups and write a generic algorithm to measure any size of potential "oceans"... then it should be simple to generate any height map with any minimum (and maximum) size of oceans you want (see the flood-fill sketch below). The next step is to "fix up" (or use a bitmap for) the parts of the map that may be below sea level but did not convert to actual water, i.e., since we generally expect things below sea level to contain water. By using a bitmap you can allow for water in water or water in land, etc.
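Here is one possible flood-fill sketch in Java for measuring connected below-threshold regions and raising the small ones back to "sea level"; the names seaLevel and minArea are just illustrative parameters:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

public class OceanFilter {

    // Measures each connected below-threshold region with a flood fill and
    // raises regions smaller than minArea back up to sea level.
    public static void removeSmallLakes(double[][] map, double seaLevel, int minArea) {
        int w = map.length, h = map[0].length;
        boolean[][] visited = new boolean[w][h];
        for (int x = 0; x < w; x++) {
            for (int y = 0; y < h; y++) {
                if (map[x][y] < seaLevel && !visited[x][y]) {
                    // Collect the whole connected region starting at (x, y).
                    List<int[]> region = new ArrayList<>();
                    ArrayDeque<int[]> stack = new ArrayDeque<>();
                    stack.push(new int[]{x, y});
                    visited[x][y] = true;
                    while (!stack.isEmpty()) {
                        int[] p = stack.pop();
                        region.add(p);
                        int[][] nbrs = {{p[0]+1,p[1]}, {p[0]-1,p[1]}, {p[0],p[1]+1}, {p[0],p[1]-1}};
                        for (int[] n : nbrs) {
                            if (n[0] >= 0 && n[0] < w && n[1] >= 0 && n[1] < h
                                    && !visited[n[0]][n[1]] && map[n[0]][n[1]] < seaLevel) {
                                visited[n[0]][n[1]] = true;
                                stack.push(n);
                            }
                        }
                    }
                    // Too small to count as an ocean: raise it to land height.
                    if (region.size() < minArea) {
                        for (int[] p : region) {
                            map[p[0]][p[1]] = seaLevel;
                        }
                    }
                }
            }
        }
    }
}
```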
If you use smoothing, it might work just as well, but you will still always run into such problems. Smoothing reduces the size of the "oceans": a large ocean might turn into a small one, and a small one eventually into a single pixel. Depending on the overall average of the map, you might end up with all water or all land after enough iterations. Blurring also reduces the detail of the map.
The good news is that if you design your algorithm with controllable parameters, then you can control things like how many oceans are in the map, how large they are, how square they are (or how circular, if you want), how much total water can be used, etc.
The more effort you put into this, the more accurately you can simulate reality. Ultimately, if you want to be infinitely complex, you can take into account how terrains are actually formed, etc... but, of course, the whole point of these simple algorithms is to allow them to be computable in reasonable amounts of time.
Most modern mobile cameras have a family of techniques called Image Stabilization to reduce shaky effects in photographs due to the motion of the camera lens or associated hardware. But still quite a number of mobile cameras produce shaky photographs. Is there a reliable algorithm or method that can be implemented on mobile devices, specifically on Android, for finding whether a given input image is shaky or not? I do not expect the algorithm to stabilize the input image, but it should reliably return a definitive boolean of whether the image is shaky or not. It doesn't have to be Java; it can also be C/C++ so that one can build it through the native kit and expose the APIs to the top layer. The following illustration describes the expected result. Also, this question deals with single-image problems, so multiple-frame-based solutions won't work in this case. It is specifically about images, not videos.
Wouldn't out-of-focus images imply that:
a) Edges are blurred, so any gradient-based operator will have low values compared to the luminance in the image
b) Edges are blurred, so any curvature-based operator will have low values
c) For shaky pictures, the pixels will be correlated with other pixels in the direction of the shake (a translation or a rotation)
I took your picture into GIMP, applied Sobel for a) and Laplacian for b) (both available in OpenCV), and got images that are a lot darker in the upper portion.
Calibrating thresholds for general images would be quite difficult I guess.
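One common, concrete version of a) and b) is the "variance of the Laplacian" measure. Here is a rough sketch using the OpenCV Java bindings; the threshold value is only an illustrative guess and would need the calibration mentioned above:

```java
import org.opencv.core.Core;
import org.opencv.core.CvType;
import org.opencv.core.Mat;
import org.opencv.core.MatOfDouble;
import org.opencv.imgcodecs.Imgcodecs;
import org.opencv.imgproc.Imgproc;

public class BlurCheck {

    // Returns true if the variance of the Laplacian falls below the threshold,
    // i.e. the image has few strong edges and is probably blurred.
    public static boolean isBlurry(String path, double threshold) {
        Mat grey = Imgcodecs.imread(path, Imgcodecs.IMREAD_GRAYSCALE);
        Mat laplacian = new Mat();
        Imgproc.Laplacian(grey, laplacian, CvType.CV_64F);
        MatOfDouble mean = new MatOfDouble();
        MatOfDouble stdDev = new MatOfDouble();
        Core.meanStdDev(laplacian, mean, stdDev);
        double variance = Math.pow(stdDev.get(0, 0)[0], 2);
        return variance < threshold;
    }

    public static void main(String[] args) {
        System.loadLibrary(Core.NATIVE_LIBRARY_NAME); // load the native OpenCV library
        System.out.println(isBlurry("photo.jpg", 100.0)); // threshold must be tuned per use case
    }
}
```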
Are you dealing with a video stream or a single image?
In the case of a video stream: the best way is to calculate the difference between each pair of adjacent frames and mark each pixel that differs. When the number of such pixels is low, you are in a non-shaky frame. Note that this method does not check whether the image is in focus; it is only designed to combat motion blur in the image.
Your implementation should include the following (a rough Java sketch of the per-frame check appears after the steps):
For each frame 'i', normalize the image (work with grey levels; when working with floating point, normalize the mean to 0 and the standard deviation to 1).
Save the previous video frame.
On each new video frame, calculate the pixel-wise difference between the images and count the number of pixels for which the difference exceeds some threshold. If the number of such pixels is too high (say > 5% of the image), that means the movement between the previous frame and the current frame is big and you can expect motion blur. When a person holds the phone firmly, you will see a sharp drop in the number of pixels that changed.
If your images are represented not in floating point but in fixed point (say 0..255), then you can match the histograms of the images prior to subtraction in order to reduce noise.
As long as you are getting images with motion, just drop those frames and display a message to the user: "hold your phone firmly". Once you get a good stabilized image, process it, but keep remembering the previous one and do the subtraction for each video frame.
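A rough Java sketch of the per-frame difference check, assuming the frames have already been converted to normalized greyscale arrays as in step 1; both thresholds are illustrative and need tuning:

```java
public class MotionDetector {

    // Fraction of pixels whose grey-level difference between two consecutive
    // frames exceeds pixelThreshold; frames are assumed to be normalized
    // greyscale arrays of identical size.
    public static double changedFraction(double[][] prev, double[][] curr, double pixelThreshold) {
        int changed = 0, total = 0;
        for (int y = 0; y < prev.length; y++) {
            for (int x = 0; x < prev[y].length; x++) {
                if (Math.abs(curr[y][x] - prev[y][x]) > pixelThreshold) {
                    changed++;
                }
                total++;
            }
        }
        return (double) changed / total;
    }

    // The rule of thumb from the answer: treat the frame as shaky when more
    // than ~5% of the pixels changed significantly since the previous frame.
    public static boolean isShaky(double[][] prev, double[][] curr) {
        return changedFraction(prev, curr, 0.5) > 0.05; // both thresholds need tuning
    }
}
```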
The algorithm above should be strong enough (I used it in one of my projects, and it worked like magic).
In case of Single Image: The algorithm above does not solve unfocused images and is irrelevant for a single image.
To solve the focus problem, I recommend calculating image edges and counting the number of pixels that have strong edges (higher than a threshold). Once you get a high number of pixels with edges (say > 5% of the image), you say that the image is in focus. This algorithm is far from perfect and may make many mistakes, depending on the texture of the image. I recommend using X, Y and diagonal edges, but smooth the image before edge detection to reduce noise.
A stronger algorithm would be to take all the edges (derivatives) and calculate their histogram (how many pixels in the image had each specific edge intensity). This is done by first calculating an image of edges and then calculating a histogram of the edge image. Now you can analyse the shape of the histogram (the distribution of edge strength). For example, take only the top 5% of pixels with the strongest edges and calculate the variance of their edge intensity.
Important fact: in unfocused images you expect the majority of the pixels to have a very low edge response, a few to have a medium edge response and almost none to have a strong edge response. In images with perfect focus you still have the majority of the pixels with a low edge response, but the ratio between medium response and strong response changes. You can see this clearly in the histogram shape. That is why I recommend taking only a few percent of the pixels with the strongest edge response and working only with them; the rest is just noise. Even a simple algorithm that takes the ratio between the number of pixels with a strong response and the number of pixels with a medium response will be quite good (a small sketch follows).
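A rough Java sketch of that ratio, using simple central differences in place of a proper smoothed Sobel filter; the medium and strong thresholds are illustrative and would need tuning:

```java
public class FocusMeasure {

    // Computes gradient magnitudes with central differences and returns the
    // ratio of "strong" edge pixels to "medium" edge pixels.
    public static double strongToMediumRatio(double[][] grey, double medium, double strong) {
        int mediumCount = 0, strongCount = 0;
        for (int y = 1; y < grey.length - 1; y++) {
            for (int x = 1; x < grey[y].length - 1; x++) {
                double gx = grey[y][x + 1] - grey[y][x - 1];
                double gy = grey[y + 1][x] - grey[y - 1][x];
                double magnitude = Math.sqrt(gx * gx + gy * gy);
                if (magnitude > strong) {
                    strongCount++;
                } else if (magnitude > medium) {
                    mediumCount++;
                }
            }
        }
        // A well-focused image tends to have a noticeably higher ratio.
        return mediumCount == 0 ? 0 : (double) strongCount / mediumCount;
    }
}
```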
Focus problem in video:
If you have a video stream, then you can use the algorithms described above for problematic-focus detection, but instead of using constant thresholds, just update them as the video runs. Eventually they will converge to better values than predefined constants.
Last note: the focus detection problem in a single image is a very tough one. There are a lot of academic papers (using Fourier transforms, wavelets and other "big algorithmic cannons"). But the problem remains very difficult, because when you are looking at a blurred image you cannot know whether the camera generated the blur through wrong focus, or whether the original scene was already blurry (for example, white walls are very blurry, pictures taken in the dark tend to be blurry even under perfect focus, and pictures of a water surface or table surface tend to be blurry).
Anyway, there are a few threads on Stack Overflow regarding focus in images, like this one. Please read them.
You can also compute the Fourier transform of the image; if there is little accumulation in the high-frequency bins, then the image is probably blurred. JTransform is a reasonable library that provides FFTs if you wish to travel down this route.
There is also a fairly extensive blog post here about different methods that could be used.
There is also another Stack Overflow question asking this but with OpenCV; OpenCV also has Java bindings and can be used in Android projects, so that answer could also be helpful.
I am working on a project where I have to display some grayscale pictures, and I noticed that many of them were too dark to see properly.
Then, looking at the ImageJ API documentation, I found the class ij.plugin.ContrastEnhancer.
There are two methods there whose conceptual differences I am having a hard time understanding: stretchHistogram() and equalize(). Both make the image brighter, but I still want to understand the differences.
My question is: what are the conceptual differences between those methods?
A histogram stretch is where you have an image that has a low dynamic range - so all of the pixel intensities are concentrated in a smaller band than the 0 to 255 range of an 8-bit greyscale image, for example. So the darkest pixel in the image may be 84 and the brightest 153. Stretching just takes this narrow range and performs a linear mapping to the full 0 to 255 range. Something like this:
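In code, a minimal sketch of that linear remapping for an 8-bit greyscale image could look like this (ImageJ's own stretchHistogram() differs in detail, e.g. it can saturate a small percentage of extreme pixels):

```java
public class HistogramStretch {

    // Linearly remaps the observed [min, max] range of an 8-bit greyscale
    // image onto the full [0, 255] range.
    public static int[][] stretch(int[][] grey) {
        int min = 255, max = 0;
        for (int[] row : grey) {
            for (int v : row) {
                min = Math.min(min, v);
                max = Math.max(max, v);
            }
        }
        int[][] out = new int[grey.length][];
        for (int y = 0; y < grey.length; y++) {
            out[y] = new int[grey[y].length];
            for (int x = 0; x < grey[y].length; x++) {
                out[y][x] = (max == min) ? grey[y][x]
                        : (grey[y][x] - min) * 255 / (max - min);
            }
        }
        return out;
    }
}
```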
Histogram equalisation attempts to achieve a flat histogram - so all possible pixel intensities are equally represented in the image. This means that where there are peaks in the histogram - concentrations of values in a certain range - these are expanded to cover a wider range so that the peak is flattened, and where there are troughs in the histogram, these are mapped to a narrower range so that the trough is levelled out. Again, something like this:
For a uni-modal histogram with a low dynamic range, the two operations are roughly equivalent, but in cases where the histogram already covers the full range of intensities the histogram equalisation gives a useful visual improvement while stretching does nothing (because there's nothing to stretch). The curve for mapping to equalise a histogram is derived from the cumulative distribution (so imagine each histogram bar is the sum of all previous values) and theoretically it's possible to achieve a perfectly flat histogram. However, because we are (normally) dealing with discrete values of pixel intensities, histogram equalisation gives an approximation to a flat histogram as shown above.
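A textbook sketch of equalization via the cumulative distribution, for comparison with the stretch above (again, ImageJ's equalize() differs in detail):

```java
public class HistogramEqualization {

    // Classic histogram equalization for an 8-bit greyscale image: build the
    // histogram, compute the cumulative distribution, and use it as the
    // intensity mapping.
    public static int[][] equalize(int[][] grey) {
        int[] histogram = new int[256];
        int total = 0;
        for (int[] row : grey) {
            for (int v : row) {
                histogram[v]++;
                total++;
            }
        }
        // Cumulative distribution, scaled to 0..255.
        int[] mapping = new int[256];
        int cumulative = 0;
        for (int i = 0; i < 256; i++) {
            cumulative += histogram[i];
            mapping[i] = cumulative * 255 / total;
        }
        int[][] out = new int[grey.length][];
        for (int y = 0; y < grey.length; y++) {
            out[y] = new int[grey[y].length];
            for (int x = 0; x < grey[y].length; x++) {
                out[y][x] = mapping[grey[y][x]];
            }
        }
        return out;
    }
}
```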
Note that the images above were taken from this web page.
I currently have a Java program that will get the RGB values for each of the pixels in an image. I also have a method to calculate a Haar wavelet on a 2D matrix of values. However, I don't know which values I should give to my method that calculates the Haar wavelet. Should I average each pixel's RGB values and compute a Haar wavelet on that, or maybe just use one of R, G, B?
I am trying to create a unique fingerprint for an image. I read elsewhere that this was a good method as I can take the dot product of 2 wavelets to see how similar the images are to each other.
Please let me know of what values I should be computing a Haar wavelet on.
Thanks
Jess
You should regard the R/G/B components as different images: Create one matrix for R, G and B each, then apply the wavelet to parts of those independently.
You then reconstruct the R/G/B images from the 3 wavelet-compressed channels and finally combine those into a 3-channel bitmap.
Since eznme didn't answer your question (you want fingerprints; he explains compression and reconstruction), here's a method you'll often come across:
You separate color and brightness information (chrominance and luma), and weigh them differently. Sometimes you'll even throw away the chrominance and just use the luma part. This reduces the size of your fingerprint significantly (~factor three) and takes into account how we perceive an image - mainly by local brightness, not by absolute color. As a bonus you gain some robustness concerning color manipulation of the image.
The separation can be done in different ways, e.g. transforming your RGB image to YUV or YIQ color space. If you only want to keep the luma component, these two color spaces are equivalent. However, they encode the chrominance differently.
Here's the linear transformation for the luma Y from RGB:
Y = 0.299*R + 0.587*G + 0.114*B
When you take a look at the mathematics, you notice that we're doing nothing other than creating a grayscale image – taking into account that we perceive green as brighter than red, and red as brighter than blue, when they are all numerically equal.
In case you want to keep a bit of chrominance information, in order to keep your fingerprint as concise as possible, you could reduce the resolution of the two U, V components (each actually 8 bit). So you could join them both into one 8-bit value by reducing their information to 4 bits each and combining them with the shift operator (a small sketch of this, and of the luma formula, follows). The chrominance should weigh less than the luma in the final fingerprint-distance calculation (the dot product you mentioned).
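A small sketch of the luma formula and the 4-bit chroma packing in Java; the class and method names are placeholders, and the U, V values are assumed to already be in the 0..255 range:

```java
public class Fingerprint {

    // Extracts the luma channel from a packed ARGB pixel using the
    // Rec. 601 weights quoted above.
    public static double luma(int argb) {
        int r = (argb >> 16) & 0xFF;
        int g = (argb >> 8) & 0xFF;
        int b = argb & 0xFF;
        return 0.299 * r + 0.587 * g + 0.114 * b;
    }

    // Packs two 8-bit chrominance values into one byte by keeping only the
    // top 4 bits of each: U goes in the high nibble, V in the low nibble.
    public static int packChroma(int u, int v) {
        return (u & 0xF0) | ((v & 0xF0) >> 4);
    }
}
```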