I have a screenshot of the screen in which I need to find the (approximate) coordinates of the center of a button. The screenshot and the sample buttons are *.png files. I'm assuming a method with this signature:
public Coordinate getBtnCoordinate(BufferedImage src, BufferedImage dst) {
...
}
@Data
class Coordinate {
private int x;
private int y;
}
In the future it will be used like this:
Coordinate coordinate = getBtnCoordinate(...);
Robot robot = new Robot();
robot.mouseMove(coordinate.getX(), coordinate.getY());
robot.mousePress(InputEvent.BUTTON1_MASK);
But after almost a week, my attempts to implement getBtnCoordinate have not led anywhere :(. Please help me implement this method. I would be grateful for any help.
The implementation depends on whether the button you are searching for matches one of the example images exactly, or whether you need to find "similar" buttons, which is probably a lot harder.
If you search for an exact match, you could scan the screenshot for a pixel that matches the pixel color at a given position in the button. Once you find a matching pixel, compare the remaining pixels until there is too much mismatch or the whole button matches.
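For illustration, here is a minimal sketch of that exact-match scan, assuming the Lombok @Data class above (so the generated setters exist) and that the first exact occurrence is the one you want; the ButtonFinder class name is made up:
import java.awt.image.BufferedImage;

public class ButtonFinder {

    // Scan the screenshot (src) for the button image (dst) and return the
    // approximate center of the first exact match, or null if none is found.
    public Coordinate getBtnCoordinate(BufferedImage src, BufferedImage dst) {
        int w = dst.getWidth();
        int h = dst.getHeight();
        for (int y = 0; y <= src.getHeight() - h; y++) {
            for (int x = 0; x <= src.getWidth() - w; x++) {
                if (matchesAt(src, dst, x, y)) {
                    Coordinate c = new Coordinate();
                    c.setX(x + w / 2);
                    c.setY(y + h / 2);
                    return c;
                }
            }
        }
        return null; // button not found
    }

    // Compare the button pixel by pixel at one candidate position,
    // bailing out on the first mismatch.
    private boolean matchesAt(BufferedImage src, BufferedImage dst, int ox, int oy) {
        for (int dy = 0; dy < dst.getHeight(); dy++) {
            for (int dx = 0; dx < dst.getWidth(); dx++) {
                if (src.getRGB(ox + dx, oy + dy) != dst.getRGB(dx, dy)) {
                    return false;
                }
            }
        }
        return true;
    }
}
Note that this only works for pixel-perfect matches; anti-aliasing or scaling that alters the screenshot will defeat it, which is where the fuzzier techniques below come in.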
If the screenshots have a single color background in the interesting area, you can use segmentation, i.e. find rectangular areas with content first, then compare only these areas to your example buttons.
If the buttons might vary in size, you probably want to segment the buttons themselves into content and top, left, right and bottom border to be able to do partial matches.
If the matches need to be really fuzzy, you may need more advanced techniques, e.g. use machine learning to train a model of the example buttons.
Without seeing typical example screenshots and buttons for your case, it's hard to provide more concrete advice for this problem.
I'm developing a program with Java and Sikuli and I want to click on a red image with a specific shape that is located on screen.
The problem is that on the screen there is another image with that same shape but different color, blue.
import org.sikuli.script.Screen;
this.screen.type("C:\\Images\\TestImage.png", "a"); // this is what I'm using.
My mouse keeps moving between the two images because it can't tell the difference in color.
There is no way Sikuli can make the right choice for you. It can only locate a matching area based on your pattern (color, in this case). To work around this issue you should provide some reference points that are unique and can be used to "help" Sikuli find the right match. For example, if the pattern you are interested in is located at the left side of the screen, you can limit the search to the left side of the screen only. Or if you have a unique visual object in the area of interest, you can use it as a pivot and look only around it.
On top of that, if a few similar items appear in some ordered fashion (one under another, for example), you can let Sikuli find all of them, calculate their coordinates, and select the object you need based on these coordinates.
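A minimal sketch of both workarounds, assuming the red copy happens to be the left-most match and that C:\Images\TestImage.png is your pattern file:
import org.sikuli.script.FindFailed;
import org.sikuli.script.Match;
import org.sikuli.script.Region;
import org.sikuli.script.Screen;

import java.util.Iterator;

public class RedShapeClicker {

    public static void main(String[] args) throws FindFailed {
        Screen screen = new Screen();

        // Workaround 1: restrict the search to the left half of the screen.
        Region leftHalf = new Region(0, 0, screen.w / 2, screen.h);
        leftHalf.click("C:\\Images\\TestImage.png");

        // Workaround 2: find all matches and pick one by its coordinates
        // (here: the left-most, which we assume is the red one).
        Iterator<Match> matches = screen.findAll("C:\\Images\\TestImage.png");
        Match leftmost = null;
        while (matches.hasNext()) {
            Match m = matches.next();
            if (leftmost == null || m.x < leftmost.x) {
                leftmost = m;
            }
        }
        if (leftmost != null) {
            leftmost.click();
        }
    }
}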
Searching using color is possible with Brobot, a state-based, testable automation framework for Java. Brobot wraps Sikuli methods and uses Sikuli variables such as Match in its core functionality. Its color methods depend heavily on matrix operations with OpenCV. The Brobot documentation gives more detailed information on the framework. Additionally, there is a page in the documentation specifically for color searches.
Color Selection Methods
There are 3 methods used to identify which colors to search for.
Color averaging. The average color of all pixels is determined from all image files (a single Brobot image can contain multiple image files). The range of acceptable colors around the target color can be adjusted, as well as the required size of the color region.
K-Means. Images that contain a variety of colors are not well suited to color averaging. The k-means method from OpenCV finds the salient colors in an image, which are then used in the search. Adjustable options include the number of k-means centers, the acceptable color range, and the required size.
Histogram. Brobot divides images into 5 regions (4 corners + the middle region) in order to preserve some spatial color characteristics (for example, blue sky on the top of an image will not match blue sea on the bottom of an image). The search can be adjusted by color range, as well as the number of histogram bins for hue, saturation, and value.
Chained Find Operations
Each of these color methods can be used independently or combined with the traditional Sikuli pattern searches by chaining find operations together. There are two main flavors of chained find operations:
Nested finds. Matches are searched for inside the matches from the previous find operation. The last find operation is responsible for providing the function’s return value.
Confirmed finds. The matches from the first find operation are filtered by the results of following find operations. The later find operations act as confirmation or rejection of the initial matches. Confirmed matches from the first find operation are returned.
Documentation for nested finds and confirmed finds
Finding the Red Image
For your specific question, it would be best to use a chain of two find operations, set up as a confirmed find. The first operation would be a pattern search (equivalent to findAll in Sikuli) and the second operation, the confirmation operation, would be a color search.
Actions in Brobot are built with:
The action configuration (an ActionOptions variable)
The objects to act on (images, regions, locations, etc.)
Brobot encourages defining states with a collection of images that belong together. The code below assumes you have a state StateWithRedImage that contains the image RedImage. The results are returned in a Matches object, which contains information about the matches and the action performed.
public class RedImageFinder {

    private final Action action;
    private final StateWithRedImage stateWithRedImage;

    public RedImageFinder(Action action, StateWithRedImage stateWithRedImage) {
        this.action = action;
        this.stateWithRedImage = stateWithRedImage;
    }

    public Matches find() {
        // Set up as a confirmed find (see above): the pattern search
        // (Find.ALL) produces the initial matches, and the color search
        // (Find.COLOR) confirms or rejects each of them.
        ActionOptions actionOptions = new ActionOptions.Builder()
                .setAction(ActionOptions.Action.FIND)
                .setFind(ActionOptions.Find.ALL)
                .addFind(ActionOptions.Find.COLOR)
                .keepLargerMatches(true)
                .build();
        return action.perform(actionOptions, stateWithRedImage.getRedImage());
    }
}
Disclaimer: I'm the developer of Brobot. It's free and open source.
Here's something that might help: create a Region and try to find the image within that region, as in the example in the link.
http://seleniumqg.blogspot.com/2017/06/findfailed-exception-in-sikuili-script.html?m=1
I am working on a game using LibGDX, and right now I am working on the menu screen. What I want to do is take a small image, set a bounding box, and expand it however large I need it. What I think would be optimal would be to use two rectangles, one for width and one for height: if the image needs to get bigger or smaller, the rectangle would be duplicated beside or beneath the current one, depending on whether it is for the width or the height. I believe there is a built-in class for this, but I cannot seem to find it.
You might want to take a look at NinePatch, if I'm not misunderstanding your question. Link here: https://github.com/libgdx/libgdx/wiki/Ninepatches
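For illustration, a minimal sketch of NinePatch in use; "button.png" and the 8-pixel border widths are assumptions for the example:
import com.badlogic.gdx.ApplicationAdapter;
import com.badlogic.gdx.graphics.Texture;
import com.badlogic.gdx.graphics.g2d.NinePatch;
import com.badlogic.gdx.graphics.g2d.SpriteBatch;

public class MenuScreenDemo extends ApplicationAdapter {
    private SpriteBatch batch;
    private NinePatch patch;

    @Override
    public void create() {
        batch = new SpriteBatch();
        // The four ints are the left/right/top/bottom border widths (px)
        // that are NOT stretched when the patch is resized.
        patch = new NinePatch(new Texture("button.png"), 8, 8, 8, 8);
    }

    @Override
    public void render() {
        batch.begin();
        // Draw the small image stretched to 300x100; only the middle
        // regions scale, so the borders stay crisp.
        patch.draw(batch, 50, 50, 300, 100);
        batch.end();
    }
}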
I am trying to use the Java Robot class to create a bot to automate some tedious tasks for me; I have never used the Robot class before. I have looked up the class in the Java docs, and usage seems straightforward, but I have an issue with finding a certain image (I say image, but I mean a certain part of the screen) effectively. Is there any other way than loading 'x' amount of pixels, checking them, checking the next amount, etc., until I find the image I am looking for? Also, is there any list of the button and mouse-button identifiers needed for the Java Robot class? I cannot find any.
For the mouse button identifiers, you are supposed to use BUTTON1_MASK and the other button-mask constants that java.awt.event.MouseEvent inherits from java.awt.event.InputEvent. For example, to click the mouse you would do something like:
Robot r = new Robot();
r.mousePress(MouseEvent.BUTTON1_MASK);
r.mouseRelease(MouseEvent.BUTTON1_MASK);
I believe BUTTON1_MASK is the left mouse button, BUTTON2_MASK is the middle mouse button, and BUTTON3_MASK is the right mouse button, but it has been a month or so since I have used Robot.
As for checking for an image, I have no idea how that is normally done. But the approach you described, where you check each group of pixels, shouldn't be too computationally expensive, because you can get the screen image as an array of primitives and then access any desired pixel with a bit of math.
When checking each "rectangle" of pixels in which you are searching for your image, only keep checking as long as the pixels keep matching; the moment you find a pixel that does not match, move on to the next rectangle. The probability of finding a run of pixels that match the image but turn out not to be the image is extremely low, so each rectangle will only need about five or fewer pixel checks on average. Any software that performs this task has to check every pixel on the screen at least once (unless it takes shortcuts based on the probabilities of image variations), and the algorithm described here checks each pixel about five times, so it is not that bad to implement, unless you have a huge image to check.
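A rough sketch of that idea, assuming an exact pixel match is acceptable: grab the screen once with Robot, pull the pixels into flat int arrays, and abandon each candidate position on the first mismatching pixel. The ScreenSearch name is made up:
import java.awt.AWTException;
import java.awt.Dimension;
import java.awt.Rectangle;
import java.awt.Robot;
import java.awt.Toolkit;
import java.awt.image.BufferedImage;

public class ScreenSearch {

    // Return the bounding rectangle of the first on-screen occurrence of
    // 'target', or null if it is not found.
    public static Rectangle findOnScreen(BufferedImage target) throws AWTException {
        Dimension size = Toolkit.getDefaultToolkit().getScreenSize();
        BufferedImage screen = new Robot().createScreenCapture(new Rectangle(size));

        int sw = screen.getWidth(), sh = screen.getHeight();
        int tw = target.getWidth(), th = target.getHeight();
        // Pull the pixels into primitive arrays so the inner loop is just index math.
        int[] screenPx = screen.getRGB(0, 0, sw, sh, null, 0, sw);
        int[] targetPx = target.getRGB(0, 0, tw, th, null, 0, tw);

        for (int y = 0; y <= sh - th; y++) {
            for (int x = 0; x <= sw - tw; x++) {
                if (matches(screenPx, sw, targetPx, tw, th, x, y)) {
                    return new Rectangle(x, y, tw, th);
                }
            }
        }
        return null; // image not on screen
    }

    private static boolean matches(int[] screenPx, int sw,
                                   int[] targetPx, int tw, int th,
                                   int ox, int oy) {
        for (int ty = 0; ty < th; ty++) {
            for (int tx = 0; tx < tw; tx++) {
                if (screenPx[(oy + ty) * sw + ox + tx] != targetPx[ty * tw + tx]) {
                    return false; // most candidates fail within a few pixels
                }
            }
        }
        return true;
    }
}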
Hope this helps!
I have an app where the user draws pictures, and these pictures are then converted to PDF. I need to be able to crop out the whitespace before conversion. Originally I kept track of the highest and lowest x and y values (http://stackoverflow.com/questions/13462088/cropping-out-whitespace-from-a-user-drawn-image). This worked for a while, but now I want to give the user the ability to erase. This is a problem because if, for example, the user erases the topmost point, the bounding box would change, but I wouldn't know the new dimensions of the box.
Right now I'm going through the entire image, pixel by pixel, to determine the bounding box. This isn't bad for one image, but I'm going to have ~70, and it's way too slow for 70. I also thought about keeping every pixel in an ArrayList, but I don't feel like that would work well at all.
Is there an algorithm that would help me solve this? Perhaps something already built in? Speed is more important to me than accuracy. If there is some whitespace left on each side it won't be a tragedy.
Thank you so much.
You mentioned that you are keeping track of the min and max values for the X and Y co-ordinates (that is also the solution you chose in the earlier question).
In a similar way, you should be able to find the min and max X & Y co-ordinates of the erased area from the erase event.
When the user erases part of the image, you can simply compare the co-ordinates of the erased part with those of the actual image to find the final co-ordinates.
There is a related problem of trying to see if 2 rectangles overlap:
Determine if two rectangles overlap each other?
You can use similar (though slightly different) logic to figure out the final min/max X & Y values.
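As an illustration, a sketch under the assumption that you track the content bounds as an android.graphics.Rect; a rescan is only needed when the erased rectangle actually touches an edge of the current bounds. BoundsTracker and its method names are made up:
import android.graphics.Rect;

public class BoundsTracker {

    private final Rect contentBounds = new Rect(); // min/max x & y of drawn content

    // Called while drawing: grow the bounds to include the new point.
    public void onDraw(int x, int y) {
        if (contentBounds.isEmpty()) {
            contentBounds.set(x, y, x + 1, y + 1);
        } else {
            contentBounds.union(x, y);
        }
    }

    // Called after erasing: the bounds can only have shrunk if the erased
    // rectangle overlaps an edge of the current bounds, so only then is a
    // rescan (of the affected edge strips) necessary.
    public boolean boundsMayHaveShrunk(Rect erased) {
        if (!Rect.intersects(erased, contentBounds)) {
            return false; // erase happened entirely outside the content bounds
        }
        boolean strictlyInside = erased.left > contentBounds.left
                && erased.top > contentBounds.top
                && erased.right < contentBounds.right
                && erased.bottom < contentBounds.bottom;
        return !strictlyInside;
    }
}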
I'm building an Android puzzle game where the user rotates and shifts pieces of a puzzle to form a final picture. It's a bit like a sliding block puzzle but the shape and size of pieces is not uniform - more like a sliding block version of tetris.
At the moment I've got puzzle pieces as imageViews which can be selected and moved around a view to position them. I've got the vector forms of the shapes behind the scenes as ArrayLists of Points.
But...I'm stuck on how to snap align the pieces together. I.e. when a piece is nearby another, shift one piece so that the nearby edges overlay each other (i.e. essentially share a boundary).
I'm sure this has been done plenty of times, but I can't find examples with code (in any language). It's similar to snapping to a grid, but not the same; it's the same kind of functionality you get in a diagramming interface when you snap objects to each other.
Can anyone point me toward a tutorial (any language), code, or advice on how to implement it?
Yours is like a Tangram game. I don't think it can be done with pieces of an image forming a final picture. It can be done by creating geometry shapes (for both the final shape and the pieces/slices of the final picture) using the android.graphics package. It's quite easy to determine the final shape from the edges and vertices of the pieces/slices.
http://code.google.com/p/photogaffe/ is worth checking out. It is an open-source sliding puzzle consisting of 15 pieces that allows the user to choose an image from their gallery.
You would only have to figure out your various shapes and how to rotate them. And if you are supplying your own images, how to load them.
Hope that helps.
What about drawing a box around each shape? Afterwards you define its middle. Then you can store a rotation value for each piece, and you would need to store each piece's neighbours together with a vector to their middle.
Then you simply have to check that the vector is within a reasonable range and the rotation is within ±X degrees. For example, if the vector is within ±10 pixels and the rotation within ±3°, you could rotate the piece and fit it into the puzzle.
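A minimal sketch of that snap test; Piece, trySnap, and the expected-offset parameters are hypothetical names, and the 10-pixel / 3-degree thresholds are the example values above:
public class SnapHelper {

    static final float SNAP_DISTANCE = 10f; // pixels
    static final float SNAP_ANGLE = 3f;     // degrees

    public static class Piece {
        float centerX, centerY; // middle of the piece's bounding box
        float rotation;         // degrees
    }

    // expectedDx/expectedDy: the vector from 'piece' to 'neighbour' when
    // both are correctly placed.
    public static boolean trySnap(Piece piece, Piece neighbour,
                                  float expectedDx, float expectedDy) {
        float dx = neighbour.centerX - piece.centerX - expectedDx;
        float dy = neighbour.centerY - piece.centerY - expectedDy;
        float distance = (float) Math.hypot(dx, dy);

        // Smallest angular difference, handling wrap-around at 360 degrees.
        float angleDiff = Math.abs(piece.rotation - neighbour.rotation) % 360f;
        if (angleDiff > 180f) {
            angleDiff = 360f - angleDiff;
        }

        if (distance <= SNAP_DISTANCE && angleDiff <= SNAP_ANGLE) {
            // Close enough: align the rotation and move the piece so the
            // vector to the neighbour's middle matches exactly.
            piece.rotation = neighbour.rotation;
            piece.centerX = neighbour.centerX - expectedDx;
            piece.centerY = neighbour.centerY - expectedDy;
            return true;
        }
        return false;
    }
}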