Most modern mobile cameras have a family of techniques called Image Stabilization to reduce shaky effects in photographs caused by the motion of the camera lens or associated hardware. But quite a number of mobile cameras still produce shaky photographs. Is there a reliable algorithm or method that can be implemented on mobile devices, specifically on Android, for determining whether a given input image is shaky or not? I do not expect the algorithm to stabilize the input image, but it should reliably return a definitive boolean indicating whether the image is shaky. It doesn't have to be Java; it can also be C/C++, so that one can build it with the Android NDK and expose the APIs to the top layer. The following illustration describes the expected result. Also, this question deals with single images, so solutions based on multiple frames won't work in this case. It is specifically about images, not videos.
Wouldn't out-of-focus images imply that:
a) edges are blurred, so any gradient-based operator will have low values compared to the luminance in the image;
b) edges are blurred, so any curvature-based operator will have low values;
c) for shaky pictures, the pixels will be correlated with other pixels in the direction of the shake (a translation or a rotation)?
I loaded your picture into GIMP and applied Sobel for a) and Laplacian for b) (both are also available in OpenCV), and got images that are a lot darker in the upper portion.
Calibrating thresholds for general images would be quite difficult, I guess.
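A minimal sketch of measures a) and b) in plain Java, assuming the picture has already been converted to a grayscale `double[][]` (row-major, values 0..255). `SharpnessMetrics` is a hypothetical name; on Android, OpenCV's Sobel and Laplacian functions would do the same job faster. Lower scores suggest blur, but where to put the threshold depends on the image content.

```java
/** Hypothetical helper: two simple blur indicators on a grayscale image (values 0..255). */
final class SharpnessMetrics {

    /** Measure a): mean Sobel gradient magnitude over interior pixels. */
    static double meanSobelMagnitude(double[][] g) {
        int h = g.length, w = g[0].length;
        double sum = 0;
        int n = 0;
        for (int y = 1; y < h - 1; y++) {
            for (int x = 1; x < w - 1; x++) {
                // Horizontal and vertical 3x3 Sobel kernels.
                double gx = (g[y - 1][x + 1] + 2 * g[y][x + 1] + g[y + 1][x + 1])
                          - (g[y - 1][x - 1] + 2 * g[y][x - 1] + g[y + 1][x - 1]);
                double gy = (g[y + 1][x - 1] + 2 * g[y + 1][x] + g[y + 1][x + 1])
                          - (g[y - 1][x - 1] + 2 * g[y - 1][x] + g[y - 1][x + 1]);
                sum += Math.hypot(gx, gy);
                n++;
            }
        }
        return n == 0 ? 0 : sum / n;
    }

    /** Measure b): variance of the 4-neighbour Laplacian over interior pixels. */
    static double laplacianVariance(double[][] g) {
        int h = g.length, w = g[0].length;
        double sum = 0, sumSq = 0;
        int n = 0;
        for (int y = 1; y < h - 1; y++) {
            for (int x = 1; x < w - 1; x++) {
                double lap = g[y - 1][x] + g[y + 1][x] + g[y][x - 1] + g[y][x + 1] - 4 * g[y][x];
                sum += lap;
                sumSq += lap * lap;
                n++;
            }
        }
        if (n == 0) return 0;
        double mean = sum / n;
        return sumSq / n - mean * mean;
    }
}
```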
Are you dealing with a video stream or a single image?
In the case of a video stream: the best way is to calculate the difference between each pair of adjacent frames and mark each pixel whose difference is large. When the number of such pixels is low, you are in a non-shaky frame. Note that this method does not check whether the image is in focus; it is only designed to combat motion blur in the image.
Your implementation should include the following:
For each frame 'i', normalize the image (work with gray levels; when working with floating point, normalize the mean to 0 and the standard deviation to 1).
Save the previous video frame.
On each new video frame, calculate the pixel-wise difference between the images and count the number of pixels for which the difference exceeds some threshold. If the number of such pixels is too high (say > 5% of the image), the movement between the previous frame and the current frame is large and you should expect motion blur. When the person holds the phone firmly, you will see a sharp drop in the number of pixels that changed.
If your images are represented not in floating point but in fixed point (say 0..255), then you can match the histograms of the images prior to subtraction in order to reduce noise.
As long as you are getting images with motion, just drop those frames and display the message "hold your phone firmly" to the user. Once you get a good stabilized image, process it, but keep remembering the previous frame and do the subtraction for each video frame.
The algorithm above should be robust enough (I used it in one of my projects, and it worked like magic).
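The steps above can be sketched as a small stateful gate, assuming frames arrive as grayscale `float[]` arrays of equal length. `MotionGate`, the per-pixel threshold (expressed in standard-deviation units because frames are normalized first) and the 5% cutoff are illustrative assumptions to tune, not values from the original project.

```java
/** Hypothetical helper: flags frames that differ too much from the previous one. */
final class MotionGate {
    private final double pixelThreshold;     // per-pixel difference, in std-dev units
    private final double maxChangedFraction; // e.g. 0.05 for "5% of the image"
    private float[] previous;                // normalized previous frame

    MotionGate(double pixelThreshold, double maxChangedFraction) {
        this.pixelThreshold = pixelThreshold;
        this.maxChangedFraction = maxChangedFraction;
    }

    /** Normalizes to mean 0 and standard deviation 1; a constant frame becomes all zeros. */
    static float[] normalize(float[] gray) {
        double mean = 0;
        for (float v : gray) mean += v;
        mean /= gray.length;
        double var = 0;
        for (float v : gray) var += (v - mean) * (v - mean);
        double std = Math.sqrt(var / gray.length);
        float[] out = new float[gray.length];
        for (int i = 0; i < gray.length; i++) {
            out[i] = std > 0 ? (float) ((gray[i] - mean) / std) : 0f;
        }
        return out;
    }

    /** True if this frame is close to the previous one; the first frame returns false. */
    boolean isStable(float[] gray) {
        float[] current = normalize(gray);
        boolean stable = false;
        if (previous != null && previous.length == current.length) {
            int changed = 0;
            for (int i = 0; i < current.length; i++) {
                if (Math.abs(current[i] - previous[i]) > pixelThreshold) changed++;
            }
            stable = changed <= maxChangedFraction * current.length;
        }
        previous = current; // always remember the latest frame for the next comparison
        return stable;
    }
}
```

A caller would show "hold your phone firmly" while `isStable` returns false and process the frame once it returns true.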
In the case of a single image: the algorithm above does not detect unfocused images and is irrelevant for a single image.
To detect focus problems, I recommend calculating image edges and counting the number of pixels that have strong edges (higher than a threshold). Once you get a high number of pixels with edges (say > 5% of the image), you say that the image is in focus. This algorithm is far from perfect and may make many mistakes, depending on the texture of the image. I recommend using X, Y and diagonal edges, but smooth the image before edge detection to reduce noise.
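A minimal sketch of the edge-counting test, assuming a grayscale `double[][]` input. The 3x3 box blur stands in for "smooth the image", and each pixel's edge strength is the largest absolute central difference over the X, Y and two diagonal directions. The class name, the box filter and the threshold values are assumptions for illustration.

```java
/** Hypothetical helper: "enough strong edges => in focus" heuristic. */
final class EdgeFocusCheck {

    /** 3x3 box blur; border pixels are copied unchanged. */
    static double[][] boxBlur(double[][] g) {
        int h = g.length, w = g[0].length;
        double[][] out = new double[h][w];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                if (y == 0 || x == 0 || y == h - 1 || x == w - 1) {
                    out[y][x] = g[y][x];
                    continue;
                }
                double s = 0;
                for (int dy = -1; dy <= 1; dy++)
                    for (int dx = -1; dx <= 1; dx++) s += g[y + dy][x + dx];
                out[y][x] = s / 9.0;
            }
        }
        return out;
    }

    /** Fraction of interior pixels whose strongest X/Y/diagonal difference exceeds edgeThreshold. */
    static double strongEdgeFraction(double[][] gray, double edgeThreshold) {
        double[][] g = boxBlur(gray);
        int h = g.length, w = g[0].length, strong = 0, n = 0;
        for (int y = 1; y < h - 1; y++) {
            for (int x = 1; x < w - 1; x++) {
                double ex = Math.abs(g[y][x + 1] - g[y][x - 1]);
                double ey = Math.abs(g[y + 1][x] - g[y - 1][x]);
                double ed1 = Math.abs(g[y + 1][x + 1] - g[y - 1][x - 1]);
                double ed2 = Math.abs(g[y + 1][x - 1] - g[y - 1][x + 1]);
                if (Math.max(Math.max(ex, ey), Math.max(ed1, ed2)) > edgeThreshold) strong++;
                n++;
            }
        }
        return n == 0 ? 0 : (double) strong / n;
    }

    /** E.g. minFraction = 0.05 for the "> 5% of the image" rule. */
    static boolean looksInFocus(double[][] gray, double edgeThreshold, double minFraction) {
        return strongEdgeFraction(gray, edgeThreshold) > minFraction;
    }
}
```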
A stronger algorithm would be to take all the edges (derivatives) and calculate their histogram (how many pixels in the image have each specific edge intensity). This is done by first calculating an edge image and then calculating the histogram of that edge image. Now you can analyse the shape of the histogram (the distribution of edge strengths). For example, take only the top 5% of pixels with the strongest edges and calculate the variance of their edge intensity.
Important fact: in unfocused images you expect the majority of pixels to have a very low edge response, a few to have a medium edge response, and almost none to have a strong edge response. In images with perfect focus, the majority of pixels still have a low edge response, but the ratio between medium and strong responses changes. You can clearly see this in the shape of the histogram. That is why I recommend taking only the few percent of pixels with the strongest edge response and working only with them; the rest are just noise. Even a simple algorithm that divides the number of pixels with a strong response by the number of pixels with a medium response will work quite well.
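A minimal sketch of both histogram statistics, assuming the edge magnitudes have already been computed into a `double[]` (for example with one of the operators above). The "medium" and "strong" boundaries are hypothetical parameters you would have to pick; for brevity the top-5% selection sorts the values instead of building an explicit histogram, which selects the same pixels.

```java
import java.util.Arrays;

/** Hypothetical helper: statistics on the distribution of edge strengths. */
final class EdgeHistogramStats {

    /** Number of strong-edge pixels divided by number of medium-edge pixels. */
    static double strongToMediumRatio(double[] edges, double mediumMin, double strongMin) {
        int medium = 0, strong = 0;
        for (double e : edges) {
            if (e >= strongMin) strong++;
            else if (e >= mediumMin) medium++;
        }
        if (medium == 0) return strong > 0 ? Double.POSITIVE_INFINITY : 0;
        return (double) strong / medium;
    }

    /** Variance of the strongest topFraction of edge magnitudes (e.g. 0.05 for the top 5%). */
    static double topVariance(double[] edges, double topFraction) {
        if (edges.length == 0) return 0;
        double[] sorted = edges.clone();
        Arrays.sort(sorted); // ascending, so the strongest edges are at the end
        int k = Math.max(1, (int) Math.ceil(topFraction * sorted.length));
        double sum = 0, sumSq = 0;
        for (int i = sorted.length - k; i < sorted.length; i++) {
            sum += sorted[i];
            sumSq += sorted[i] * sorted[i];
        }
        double mean = sum / k;
        return sumSq / k - mean * mean;
    }
}
```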
Focus problem in video:
If you have a video stream, then you can use the algorithms described above for detecting focus problems, but instead of using constant thresholds, just update them as the video runs. Eventually they will converge to better values than predefined constants.
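One possible way to update the thresholds as the video runs is an exponential moving average of the per-frame focus score, flagging frames that score well below it. This is an assumption about how the adaptation could be done, not the answerer's code; `AdaptiveThreshold`, `alpha` and `tolerance` are hypothetical.

```java
/** Hypothetical helper: a focus-score threshold that follows the running average of recent frames. */
final class AdaptiveThreshold {
    private final double alpha;     // smoothing factor in (0, 1]; larger values adapt faster
    private final double tolerance; // e.g. 0.7: flag frames scoring below 70% of the running average
    private double average = Double.NaN;

    AdaptiveThreshold(double alpha, double tolerance) {
        this.alpha = alpha;
        this.tolerance = tolerance;
    }

    /** Returns true if the score looks acceptable, then folds it into the running average. */
    boolean accept(double score) {
        if (Double.isNaN(average)) {
            average = score; // the first frame only initializes the estimate
            return true;
        }
        boolean ok = score >= tolerance * average;
        // Updated on every frame so the estimate follows scene changes.
        average = alpha * score + (1 - alpha) * average;
        return ok;
    }

    double currentAverage() {
        return average;
    }
}
```

Updating on every frame lets the threshold follow the scene; updating only on accepted frames would resist drift from long blurry stretches but could get stuck after a scene change.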
Last note: focus detection in a single image is a very tough problem. There are a lot of academic papers (using Fourier transforms, wavelets and other "big algorithmic cannons"), but the problem remains very difficult because when you look at a blurred image you cannot know whether the camera generated the blur through wrong focus or the original scene is already blurry (for example, white walls look very blurry, pictures taken in the dark tend to be blurry even with perfect focus, and pictures of a water surface or a table surface tend to be blurry).
Anyway, there are a few threads on Stack Overflow regarding focus in images, like this one. Please read them.
You can also compute the Fourier transform of the image; if there is little accumulation in the high-frequency bins, then the image is probably blurred. JTransforms is a reasonable library that provides FFTs if you wish to go down this route.
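To show the idea without an external library, here is a minimal sketch using a naive separable 2D DFT. It is O(N³), so it is only suitable for a small grayscale patch (say 32x32 or 64x64, e.g. a downscaled image); for full images an FFT library would be used instead. It returns the fraction of non-DC spectral energy above a radial frequency `cutoff` (in cycles per pixel); the class name and the cutoff are assumptions.

```java
/** Hypothetical helper: share of spectral energy at high spatial frequencies. */
final class SpectrumBlurCheck {

    static double highFrequencyFraction(double[][] g, double cutoff) {
        int h = g.length, w = g[0].length;
        // Pass 1: 1D DFT of every row.
        double[][] rowRe = new double[h][w], rowIm = new double[h][w];
        for (int y = 0; y < h; y++) {
            for (int u = 0; u < w; u++) {
                double re = 0, im = 0;
                for (int x = 0; x < w; x++) {
                    double a = -2 * Math.PI * u * x / w;
                    re += g[y][x] * Math.cos(a);
                    im += g[y][x] * Math.sin(a);
                }
                rowRe[y][u] = re;
                rowIm[y][u] = im;
            }
        }
        // Pass 2: 1D DFT down every column of the row spectra, accumulating power.
        double high = 0, total = 0;
        for (int v = 0; v < h; v++) {
            for (int u = 0; u < w; u++) {
                double re = 0, im = 0;
                for (int y = 0; y < h; y++) {
                    double a = -2 * Math.PI * v * y / h;
                    double c = Math.cos(a), s = Math.sin(a);
                    re += rowRe[y][u] * c - rowIm[y][u] * s; // complex multiply
                    im += rowRe[y][u] * s + rowIm[y][u] * c;
                }
                if (u == 0 && v == 0) continue; // skip DC (mean brightness)
                double power = re * re + im * im;
                double fu = Math.min(u, w - u) / (double) w; // 0 .. 0.5 cycles/pixel
                double fv = Math.min(v, h - v) / (double) h;
                total += power;
                if (Math.hypot(fu, fv) > cutoff) high += power;
            }
        }
        return total == 0 ? 0 : high / total;
    }
}
```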
There is also a fairly extensive blog post here about different methods that could be used.
There is also another Stack Overflow question asking this but with OpenCV. OpenCV also has Java bindings and can be used in Android projects, so that answer could also be helpful.
Related
Is it best practice to create graphic assets at the EXACT size for each screen resolution to increase performance? Or is it good enough to calculate an approximate final size for those assets and then scale them so they don't lose sharpness/detail?
It's difficult to answer with all the different things you've tagged.
On a modern mobile platform, the compositing is probably GPU-side. The image will cover the same number of pixels regardless of the texture size, which means it'll run the pixel shader the same number of times. You may see moderate variation in how fast the pixel shader runs based on whether the texture fits into the GPU's texture cache. Smaller is better and dimensions that are powers of 2 are usually better.
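If you follow the power-of-two suggestion, a small helper (hypothetical name) can round an asset's exact target dimension up to the next power of two; the unused area is the price of the friendlier texture size.

```java
/** Hypothetical helper for power-of-two texture dimensions. */
final class TextureSizes {

    /** Smallest power of two >= n, for 1 <= n <= 2^30. */
    static int nextPowerOfTwo(int n) {
        if (n < 1 || n > (1 << 30)) throw new IllegalArgumentException("n out of range: " + n);
        // highestOneBit(n - 1) is the largest power of two strictly below n (for n > 1).
        return n == 1 ? 1 : Integer.highestOneBit(n - 1) << 1;
    }

    static boolean isPowerOfTwo(int n) {
        return n > 0 && (n & (n - 1)) == 0;
    }
}
```

For example, a 1000-pixel-wide asset would be padded to a 1024-pixel-wide texture.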
Aside from speed, if you match the exact size it'll be visually crisper because it avoids resampling.
If the compositing is CPU-side, making the image exactly the right size will probably skip rescaling altogether and be much, much faster (by CPU standards, at least; either way, you're better off on the GPU).
Edit: And of course the real answer is to measure the frame rate difference in your particular situation, instead of asking the internet. :)
I am currently working on a road sign detection application on Android using OpenCV, and I found that while processing frames in real time, my camera often focuses on brighter parts of the image, such as the sky, and everything below (road, trees, and signs) becomes dark. Because of this, my application is not able to detect the signs, since they are too dark in this particular condition.
Has anyone had to deal with such a problem and found a decent solution? If so, I would appreciate any clues (especially ones with good performance, which is important in real-time processing).
As a preprocessing step, you can apply intensity normalization. For example, histogram equalization can be applied:
http://docs.opencv.org/doc/tutorials/imgproc/histograms/histogram_equalization/histogram_equalization.html
with example code in Java:
http://answers.opencv.org/question/7490/i-want-a-code-for-histogram-equalization/?answer=11014#post-id-11014
Note that such additional steps may slow down your overall detection. To increase overall speed, you can shrink your region of interest by detecting the sky and excluding it.
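The linked tutorial uses OpenCV (in Java, `Imgproc.equalizeHist`). As a library-free illustration of what equalization does, here is a minimal sketch for an 8-bit grayscale buffer stored as unsigned values in a `byte[]`, using the classic cumulative-histogram mapping. The class name is hypothetical, and on Android the OpenCV call will almost certainly be faster.

```java
/** Hypothetical helper: global histogram equalization for 8-bit grayscale pixels. */
final class HistogramEqualizer {

    /** Returns an equalized copy; values are treated as unsigned 0..255. */
    static byte[] equalize(byte[] gray) {
        int[] hist = new int[256];
        for (byte b : gray) hist[b & 0xFF]++;
        // Cumulative distribution function of the grey levels.
        int[] cdf = new int[256];
        int running = 0;
        for (int i = 0; i < 256; i++) {
            running += hist[i];
            cdf[i] = running;
        }
        // Smallest non-zero CDF value, so the darkest occupied level maps to 0.
        int cdfMin = 0;
        for (int c : cdf) {
            if (c > 0) { cdfMin = c; break; }
        }
        int n = gray.length;
        byte[] out = new byte[n];
        if (n == cdfMin) { // empty image or a single grey level: nothing to stretch
            System.arraycopy(gray, 0, out, 0, n);
            return out;
        }
        int[] lut = new int[256];
        for (int i = 0; i < 256; i++) {
            long mapped = Math.round((cdf[i] - cdfMin) * 255.0 / (n - cdfMin));
            lut[i] = (int) Math.max(0, mapped); // unused levels below the darkest one clamp to 0
        }
        for (int i = 0; i < n; i++) out[i] = (byte) lut[gray[i] & 0xFF];
        return out;
    }
}
```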
You said that the camera gets focused on bright objects such as sky.
In most modern phones you can set the area of the image that is included in the auto-focus calculation. Since the sky is always in the upper part of the image (once you take phone orientation into account), you can set the focus zone to the lower half of the image. This will take care of the problem in the first place.
If, however, you meant that the camera is not focusing on the bright objects but rather doing white balancing based on them, you can solve this in the same way as described for focus. If that does not help, try histogram equalization and gamma correction techniques. These will help improve the contrast.
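Gamma correction is usually done with a 256-entry lookup table. A minimal sketch, assuming 8-bit grayscale data and the mapping out = 255 · (in/255)^gamma, so `gamma` values below 1 brighten dark regions such as the road under a bright sky. The class name and any particular gamma value are assumptions to tune.

```java
/** Hypothetical helper: lookup-table gamma correction for 8-bit grayscale pixels. */
final class GammaCorrection {

    /** lut[i] = round(255 * (i / 255)^gamma); gamma < 1 brightens, gamma > 1 darkens. */
    static int[] buildLut(double gamma) {
        if (!(gamma > 0)) throw new IllegalArgumentException("gamma must be positive");
        int[] lut = new int[256];
        for (int i = 0; i < 256; i++) {
            lut[i] = (int) Math.round(255.0 * Math.pow(i / 255.0, gamma));
        }
        return lut;
    }

    /** Applies the table in place; pixels are treated as unsigned 0..255. */
    static void applyInPlace(byte[] gray, double gamma) {
        int[] lut = buildLut(gamma);
        for (int i = 0; i < gray.length; i++) gray[i] = (byte) lut[gray[i] & 0xFF];
    }
}
```

Building the table once per gamma value keeps the per-frame cost to a single array lookup per pixel, which matters for real-time processing.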
Apologies, as this is a common topic, but I haven't found a widely agreed-upon solution.
We have a game world "grid" size of 1220 x 1080 (based on our designer's Photoshop designs). Currently we test on a Nexus 4 (1280x768 at 320 DPI) and a TF201 Transformer Prime tablet (1280x800 at 149 DPI).
When packing textures with TexturePacker, we're a bit confused about which combination of filters to use. We've read the following page:
http://www.badlogicgames.com/wordpress/?p=1403
When using "Nearest, Nearest", our FPS was fine at 60, but the assets became pixelated. When we packed using "Mipmap, Mipmap" instead, our FPS went down to 30, but the textures were smoothly edged again.
Is there an agreed-upon combination of these filters, or do they simply depend on requirements? There are quite a lot of combinations to set for "min filter" and "mag filter" in the packer, and we don't want to keep setting them at random until everything is smoothly resized and FPS is high again, without fully understanding what they do.
Many thanks.
J
If you are supporting multiple screen sizes (which you are if targeting Android), the Mag filter should always be Linear. There is no such thing as a mip-mapped mag filter, and on some devices it won't even work (you'll get pure black). It's kind of a "gotcha", because some devices will just assume you meant Linear and fix it for you, so if you don't test on a device that lacks this behaviour, you'll be unaware of the problem. Nearest will look pixelated when stretched bigger, and you would only want to use it if you are doing retro low-resolution graphics or drawing something pixel-perfect.
You can choose one of the following for the Min filter, from fastest (and worst looking) to slowest (and best looking):
Nearest - this will look pixelated and I can't think of any situation where this would be the right choice for a min filter.
MipMapNearestNearest - Won't look or perform better than nearest, and uses more memory. No reason to ever use this.
MipMapNearestLinear - Gets the nearest pixel from the two nearest mips and then linearly interpolates between them. This will still look pixelated. I don't think this is ever used either.
MipMapLinearNearest - Gets the nearest mip level and linearly determines the pixel color. This is most commonly used on mobile for smooth graphics, I think. It performs significantly faster than the option below, but there are cases where it will look slightly blurry (when the nearest mip is somewhat on the small side for what's on screen).
MipMapLinearLinear - Gets the two nearest mip levels, linearly determines the pixel color on each of them, and then linearly blends between the two. If you have a sprite that shrinks from nothing to full size, you probably won't be able to detect any difference in quality from smallest to largest. But this is also slow. In the past, I have limited its use to my fonts. I also did one project, three years ago, that ran at 60 fps on the new devices of the time while using this on everything; I was very careful about overdraw in that app, so I could get away with it.
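To make the difference between the MipMap modes concrete, here is a simplified sketch (not how libGDX or any particular GPU computes it) of mip level selection: the level of detail is roughly log2 of how many texels map onto one screen pixel. The `*NearestMipmap`-style modes (MipMapLinearNearest, MipMapNearestNearest) round to one level, while MipMapLinearLinear and MipMapNearestLinear use the two neighbouring levels, weighted by the fractional part. `MipSelection` is a hypothetical name.

```java
/** Hypothetical illustration of mip level choice; real GPUs use a more detailed formula. */
final class MipSelection {

    /** Level of detail: log2(texels per screen pixel), clamped to [0, maxLevel]. */
    static double lod(double texelsPerPixel, int maxLevel) {
        double l = Math.log(Math.max(texelsPerPixel, 1e-12)) / Math.log(2);
        return Math.min(Math.max(l, 0), maxLevel); // magnification (< 1 texel/pixel) uses level 0
    }

    /** One level only, as in MipMapLinearNearest / MipMapNearestNearest. */
    static int nearestLevel(double texelsPerPixel, int maxLevel) {
        return (int) Math.round(lod(texelsPerPixel, maxLevel));
    }

    /** {lowerLevel, upperLevel, weightOfUpper}, as in MipMapLinearLinear / MipMapNearestLinear. */
    static double[] blendedLevels(double texelsPerPixel, int maxLevel) {
        double l = lod(texelsPerPixel, maxLevel);
        int lower = (int) Math.floor(l);
        int upper = Math.min(lower + 1, maxLevel);
        return new double[] {lower, upper, l - lower};
    }
}
```

Sampling two levels instead of one is where the extra cost of MipMapLinearLinear comes from.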
Finally, there's linear filtering, which looks and performs worse than the mip-mapping options (for a Min filter):
Linear - this will look smooth if the image is only slightly smaller on screen than its original texture. It doesn't use up the 33% extra texture memory that mip-mapping does, but once the texture is drawn at less than 50% of its original size it will perform worse than mip-mapping (and start to shimmer), because the four texels it blends for each screen pixel are spread further and further apart in the original texture, which defeats the GPU's texture cache and skips texels entirely.
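The "33% extra texture memory" comes from the geometric series of mip levels: each level has a quarter of the texels of the previous one, so the full chain approaches 1 + 1/4 + 1/16 + ... = 4/3 of the base level. A small sketch (hypothetical class name) that sums the actual chain, assuming 4 bytes per texel for RGBA8888:

```java
/** Hypothetical helper: memory used by a texture plus its full mip chain. */
final class MipMemory {

    /** Sums every level down to 1x1, halving each side (rounding down, minimum 1) per level. */
    static long mipChainBytes(int width, int height, int bytesPerTexel) {
        long total = 0;
        int w = width, h = height;
        while (true) {
            total += (long) w * h * bytesPerTexel;
            if (w == 1 && h == 1) break;
            w = Math.max(1, w / 2);
            h = Math.max(1, h / 2);
        }
        return total;
    }
}
```

For a 1024x1024 RGBA8888 texture this returns 5,592,404 bytes, against 4,194,304 bytes for the base level alone, i.e. about 33% more.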
I have a very large hi-res map which I want to use in an application (the image size is around 80 MB).
I would like to know the following:
How can I load this image in the best way possible? I know it will take a few seconds to load the image (which is OK), but I would like to notify the user of the progress. I would like to use determinate mode and show this in some sort of JProgressBar. It should reflect the number of bytes that have been loaded, or something like that. Is there any image loading method that can provide this functionality (like ImageIO.read())?
Because the map is of very high resolution, I would like to let the user scroll to zoom in and out. How can I do this in the best way? I know for a fact that rescaling a BufferedImage the standard way would take a VERY long time for such a big file. Is there any efficient way of doing this?
Thank you for your input!
kind regards,
Héctor van den Boorn
P.S. The image will be drawn on the canvas of a JPanel.
Hi Andrew, thank you so much for your help; everything worked out perfectly and is loading quickly.
Without your expertise and explanation I would have still been working on this so you've earned the bounty fair and square.
What I did was the following: using ImageMagick, I created multiple images of different resolutions, and at the start of execution I load only the smallest-resolution image. The rest are loaded in separate threads so execution is not stalled. Using the information you provided, I then use the appropriate images when zooming in or out. I'm a bit sceptical of using tiles because I need to draw my own images on top of the map, and I couldn't find the paint function in the external jar you told me to use. So I ended up using something simple: when zooming or panning, the rescale mode is set to fast, and when you're not zooming or panning, it is set to smooth for pixel-perfect images (just like you suggested). This turns out to be fast enough, so I don't need tiles (although I do see that with even larger images they would be necessary, and I understand the information you've given me).
So thanks again and everything is working perfectly :)
There are two approaches you should (simultaneously) take:
Downscaling your image into various sizes. You should downscale your image to a series of lower resolutions (1/2, 1/4, 1/8, etc.) until the image is about the size of the largest likely screen resolution. When the user first opens the image, you show the lowest-resolution image. This will load fast and allow the user to pan. When the user zooms in, you use a higher-resolution image. You can use ImageMagick for this: http://www.imagemagick.org/Usage/resize/
Tile your larger images. This breaks the single large image down into many small images in a grid pattern. When a user zooms in on an area, you compute which tiles the user is looking at and render only those, not the other areas of the image. You can use ImageMagick to split an image into tiles; see e.g. the question "ImageMagick. What is the correct way to dice an image into sub-tiles". The documentation is http://www.imagemagick.org/Usage/crop/#crop_tile
(Providing a cache of appropriately sized and tiled images is what allows Google Earth and countless other mapping applications to render so fast, yet zoom into the map at incredibly high resolution.)
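Both approaches can be sketched in plain Java2D. This is a minimal illustration, not the API of any tiling engine: `MapCache` and its methods are hypothetical. The pyramid is built by repeated bilinear halving (an in-Java alternative to the ImageMagick commands; for an 80 MB source it is better done offline), and the tile lookup assumes square tiles and a viewport in the same pixel coordinates as the chosen level.

```java
import java.awt.Graphics2D;
import java.awt.Rectangle;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helpers: a downscaled pyramid and a visible-tile lookup. */
final class MapCache {

    /** Level 0 is the original; each next level halves both sides until the image fits maxSide. */
    static List<BufferedImage> buildPyramid(BufferedImage source, int maxSide) {
        if (maxSide < 1) throw new IllegalArgumentException("maxSide must be >= 1");
        List<BufferedImage> levels = new ArrayList<>();
        levels.add(source);
        BufferedImage current = source;
        while (current.getWidth() > maxSide || current.getHeight() > maxSide) {
            int w = Math.max(1, current.getWidth() / 2);
            int h = Math.max(1, current.getHeight() / 2);
            BufferedImage half = new BufferedImage(w, h, BufferedImage.TYPE_INT_ARGB);
            Graphics2D g = half.createGraphics();
            try {
                g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                        RenderingHints.VALUE_INTERPOLATION_BILINEAR);
                g.drawImage(current, 0, 0, w, h, null);
            } finally {
                g.dispose();
            }
            levels.add(half);
            current = half;
        }
        return levels;
    }

    /**
     * Inclusive tile range {firstCol, firstRow, lastCol, lastRow} overlapping the viewport,
     * clipped to a cols x rows grid. first > last means no tile is visible.
     */
    static int[] visibleTiles(Rectangle viewport, int tileSize, int cols, int rows) {
        int firstCol = Math.max(0, Math.floorDiv(viewport.x, tileSize));
        int firstRow = Math.max(0, Math.floorDiv(viewport.y, tileSize));
        int lastCol = Math.min(cols - 1, Math.floorDiv(viewport.x + viewport.width - 1, tileSize));
        int lastRow = Math.min(rows - 1, Math.floorDiv(viewport.y + viewport.height - 1, tileSize));
        return new int[] {firstCol, firstRow, lastCol, lastRow};
    }
}
```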
Once you have your tiles, you can use one of several engines in Java:
https://wiki.openstreetmap.org/wiki/Tirex
http://www.slick2d.org/wiki/index.php/Tiled
There may be others as well.
You can implement arbitrary zooming (suitable for pinch-to-zoom or similar) within this framework. Within the zoom limits you allow, your algorithm would be something like:
For the zoom level chosen by the user, choose the closest higher-resolution cache. For example, if you have 100%, 50%, 25% and 12.5% tiles, and the user chooses 33% zoom, select the 50% tiles.
Set the layout for the tiles so the tile squares have the correct size for the chosen zoom (this might be a single tile at the lowest zoom levels). For example, at 33% zoom using 50% tiles, with the tiles being 100 pixels square, the grid will be made of 67-pixel squares.
Individually load and scale the tile images to fit the screen (this can be multi-threaded, which works well on modern CPU architectures).
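The first two steps can be sketched as follows, assuming the cache levels are given as scale factors (1.0 = 100%) and the array is non-empty. With a zoom of one third, this picks the 50% tiles and draws 100-pixel tiles 67 pixels wide, matching the example. `ZoomCache` is a hypothetical name.

```java
/** Hypothetical helper: pick a cached tile level and the on-screen tile size for a zoom. */
final class ZoomCache {

    /** Smallest cached scale >= zoom; falls back to the largest scale when zooming past it. */
    static double chooseLevel(double[] cachedScales, double zoom) {
        double best = Double.NaN, largest = Double.NEGATIVE_INFINITY;
        for (double s : cachedScales) {
            largest = Math.max(largest, s);
            if (s >= zoom && (Double.isNaN(best) || s < best)) best = s;
        }
        return Double.isNaN(best) ? largest : best;
    }

    /** Tiles stored at cacheScale are drawn resized by zoom / cacheScale. */
    static int displayedTileSize(int tilePixels, double cacheScale, double zoom) {
        return (int) Math.round(tilePixels * zoom / cacheScale);
    }
}
```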
There are a couple of points to note:
The scaling algorithm changes when you reach the greatest resolution you have tiles for.
Up to 100% zoom, use bilinear or bicubic scaling. This gives photographs an excellent appearance with little jaggedness.
Above 100%, you probably want to show the pixels, so nearest-neighbour might be a good choice.
For higher fidelity, use higher-resolution tiles and scale them down by more than 50%. For example, suppose you have tiles prepared at 100%, 50%, 25% and 12.5%. To show 40% zoom, don't scale down the 50% tiles; instead, use the 100% tiles and scale them down to 40%. This is useful:
If your images are text or diagrams (i.e. raster images containing many straight lines). Scaling these types of images will often produce nasty artefacts if you don't oversample.
If you need very high fidelity on photographic-style images.
If you need to render a preview of the zoom (e.g. while the user is still pinching and zooming), grab a screenshot at the start of the gesture and zoom that. It matters much more that the animation is smooth than that the zoom preview is pixel-perfect.
Selecting the right tile size is important. Very large tiles (fewer than one per screen) are slow to render. Tiles that are too small create other overheads and often produce nasty rendering artefacts where you see the screen filling up randomly. A good compromise between performance and complexity is to make the tiles about a quarter of the full-screen size.
When using these techniques, the images should load much faster, so the progress bar is less important. If it still matters, you need to register an IIOReadProgressListener on the ImageReader:
ImageReader.addIIOReadProgressListener()
From the JavaDoc:
An interface used by ImageReader implementations to notify callers of their image and thumbnail reading methods of progress.
This interface receives general indications of decoding progress (via the imageProgress and thumbnailProgress methods), and events indicating when an entire image has been updated (via the imageStarted, imageComplete, thumbnailStarted and thumbnailComplete methods). Applications that wish to be informed of pixel updates as they happen (for example, during progressive decoding), should provide an IIOReadUpdateListener.
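A minimal sketch of wiring this up, assuming the whole image is read in one call. `ProgressImageLoader` and its `IntConsumer` callback are hypothetical; the `ImageIO`, `ImageReader` and `IIOReadProgressListener` calls are the standard `javax.imageio` API. Progress is reported on the reading thread, so a `JProgressBar` should be updated through `SwingUtilities.invokeLater`.

```java
import java.awt.image.BufferedImage;
import java.io.IOException;
import java.util.Iterator;
import java.util.function.IntConsumer;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;
import javax.imageio.event.IIOReadProgressListener;
import javax.imageio.stream.ImageInputStream;

/** Hypothetical helper: reads the first image from `source`, reporting progress as 0..100. */
final class ProgressImageLoader {

    static BufferedImage read(Object source, IntConsumer onPercent) throws IOException {
        try (ImageInputStream in = ImageIO.createImageInputStream(source)) {
            if (in == null) throw new IOException("Unsupported input: " + source);
            Iterator<ImageReader> readers = ImageIO.getImageReaders(in);
            if (!readers.hasNext()) throw new IOException("No ImageReader for this format");
            ImageReader reader = readers.next();
            try {
                reader.setInput(in);
                reader.addIIOReadProgressListener(new IIOReadProgressListener() {
                    @Override public void imageStarted(ImageReader r, int imageIndex) { onPercent.accept(0); }
                    @Override public void imageProgress(ImageReader r, float percentageDone) {
                        onPercent.accept(Math.round(percentageDone));
                    }
                    @Override public void imageComplete(ImageReader r) { onPercent.accept(100); }
                    // The remaining callbacks are not needed for one image without thumbnails.
                    @Override public void sequenceStarted(ImageReader r, int minIndex) { }
                    @Override public void sequenceComplete(ImageReader r) { }
                    @Override public void thumbnailStarted(ImageReader r, int imageIndex, int thumbnailIndex) { }
                    @Override public void thumbnailProgress(ImageReader r, float percentageDone) { }
                    @Override public void thumbnailComplete(ImageReader r) { }
                    @Override public void readAborted(ImageReader r) { }
                });
                return reader.read(0);
            } finally {
                reader.dispose();
            }
        }
    }
}
```

For example, from a background thread such as a `SwingWorker`: `ProgressImageLoader.read(new File("map.png"), p -> SwingUtilities.invokeLater(() -> bar.setValue(p)))`, where `map.png` and `bar` are placeholders.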
I am trying to add noise to a BufferedImage in Java, but I am more interested in the algorithm used to add noise to an image rather than the Java or any other language-specific implementation.
I have searched the web and found out about Gaussian noise, but the tutorials/articles either show only code samples that are not very useful to me, or complex mathematical explanations.
It's not clear what your question is, but here are some random observations in case they help:
If the image is relatively unprocessed (it hasn't been scaled in size) then the noise in each pixel is roughly independent. So you can simulate that by looping over each pixel in turn, calculating a new noise value, and adding it.
Even when images have been processed the approach above is often a reasonable approximation.
The amount of noise in an image depends on a lot of factors. For typical images generated by digital sensors a common approximation is that the noise in each pixel is about the same. In other words you choose some standard deviation (SD) and then, in the loop above, select a value from a Gaussian distribution with that SD.
For astronomical images (and other low-noise electronic images), there is a component of the noise where the SD is proportional to the square root of the brightness of the pixel.
So likely what you want to do is:
Pick a SD (how noisy you want the image to be)
In a loop, for each pixel:
Generate a random number from a Gaussian with the given SD (and mean of zero) and add it to the pixel (assuming a greyscale image). For a colour image generate three values and add them to red, green and blue, respectively.
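The loop above can be sketched for a colour `BufferedImage` as follows, using `java.util.Random.nextGaussian()` (mean 0, SD 1) scaled by the chosen SD. The class name and the seed parameter (for reproducible output) are assumptions; results are clamped to 0..255 and alpha is kept.

```java
import java.awt.image.BufferedImage;
import java.util.Random;

/** Hypothetical helper: independent Gaussian noise on each RGB channel of each pixel. */
final class GaussianNoise {

    static BufferedImage addNoise(BufferedImage src, double sd, long seed) {
        Random rng = new Random(seed);
        BufferedImage out = new BufferedImage(src.getWidth(), src.getHeight(), BufferedImage.TYPE_INT_ARGB);
        for (int y = 0; y < src.getHeight(); y++) {
            for (int x = 0; x < src.getWidth(); x++) {
                int argb = src.getRGB(x, y);
                int a = argb >>> 24;
                // Three independent samples: one each for red, green and blue.
                int r = clamp(((argb >> 16) & 0xFF) + rng.nextGaussian() * sd);
                int g = clamp(((argb >> 8) & 0xFF) + rng.nextGaussian() * sd);
                int b = clamp((argb & 0xFF) + rng.nextGaussian() * sd);
                out.setRGB(x, y, (a << 24) | (r << 16) | (g << 8) | b);
            }
        }
        return out;
    }

    private static int clamp(double v) {
        return (int) Math.max(0, Math.min(255, Math.round(v)));
    }
}
```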
Update: I imagine night vision is going to be something like astronomical imaging. In that case you might try varying the SD for each pixel so that it includes a constant plus something that depends on the square root of the brightness. So, say, if a pixel has brightness b, then you might use 100 + 10 * sqrt(b) as the SD. You'll need to play with the values, but that might look more realistic.
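A minimal sketch of that brightness-dependent variant for a grayscale image stored as an `int[]` of 0..255 values, with SD = base + k · sqrt(b). The example constants 100 and 10 are passed in as parameters because, as noted, they need tuning; on an 8-bit scale much smaller values will probably look more natural. The class name and seed are assumptions.

```java
import java.util.Random;

/** Hypothetical helper: grayscale noise whose SD grows with the square root of brightness. */
final class NightVisionNoise {

    static int[] addNoise(int[] gray, double base, double k, long seed) {
        Random rng = new Random(seed);
        int[] out = new int[gray.length];
        for (int i = 0; i < gray.length; i++) {
            int b = gray[i];
            double sd = base + k * Math.sqrt(b); // constant part plus brightness-dependent part
            double v = b + rng.nextGaussian() * sd;
            out[i] = (int) Math.max(0, Math.min(255, Math.round(v)));
        }
        return out;
    }
}
```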