I'm developing a program with Java and Sikuli and I want to click on a red image with a specific shape that is located on screen.
The problem is that on the screen there is another image with that same shape but different color, blue.
import org.sikuli.script.Screen;
this.screen.type("C:\\Images\\TestImage.png", "a"); // this is what I'm using.
My mouse keeps moving between the two images because it can't tell the difference in color.
There is no way Sikuli can make the right choice for you. It can only locate an area matching your pattern, and in your case both shapes match it well. To work around this issue you should provide some reference points that are unique and can be used to "help" Sikuli find the right match. For example, if the pattern you are interested in is located at the left side of the screen, you can limit the search to the left side of the screen only. Or, if you have a unique visual object in the area of interest, you can use it as a pivot and look only around it.
On top of that, if a few similar items appear in some ordered fashion (one under another, for example), you can let Sikuli find all of them, calculate their coordinates, and select the object you need based on those coordinates.
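The find-all-and-pick approach can be sketched without the Sikuli runtime. Below, `MatchPoint` is a hypothetical stand-in for Sikuli's `Match` that only exposes coordinates; with real Sikuli you would fill the list from `screen.findAll(pattern)` instead:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Stand-in for Sikuli's Match: only the coordinates matter for the selection step.
class MatchPoint {
    final int x, y;
    MatchPoint(int x, int y) { this.x = x; this.y = y; }
}

public class MatchPicker {
    // Sort the matches top-to-bottom and return the n-th one (0-based).
    static MatchPoint nthFromTop(List<MatchPoint> matches, int n) {
        List<MatchPoint> sorted = new ArrayList<>(matches);
        sorted.sort(Comparator.comparingInt(m -> m.y));
        return sorted.get(n);
    }
}
```

The same idea works for left-to-right ordering by sorting on `x` instead.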
Searching using color is possible with Brobot, a state-based, testable automation framework for Java. Brobot wraps Sikuli methods and uses Sikuli variables such as Match in its core functionality. Its color methods depend heavily on matrix operations with OpenCV. The Brobot documentation gives more detailed information on the framework. Additionally, there is a page in the documentation specifically for color searches.
Color Selection Methods
There are 3 methods used to identify which colors to search for.
Color averaging. The average color of all pixels is determined from all image files (a single Brobot image can contain multiple image files). The range of acceptable colors around the target color can be adjusted, as well as the required size of the color region.
K-Means. Images that contain a variety of colors are not suitable for color averaging. The k-means method from OpenCV finds the salient colors in an image, which are then used in the search. Adjustable options include the number of means (k), the acceptable color range, and the required size.
Histogram. Brobot divides images into 5 regions (4 corners + the middle region) in order to preserve some spatial color characteristics (for example, blue sky on the top of an image will not match blue sea on the bottom of an image). The search can be adjusted by color range, as well as the number of histogram bins for hue, saturation, and value.
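To illustrate the idea behind the color-averaging method (a simplified sketch, not Brobot's actual implementation): average the pixels, then accept or reject based on a per-channel tolerance around the target color.

```java
import java.awt.Color;

public class ColorAverage {
    // Average RGB over a non-empty array of packed 0xRRGGBB pixels.
    static Color average(int[] pixels) {
        long r = 0, g = 0, b = 0;
        for (int p : pixels) {
            r += (p >> 16) & 0xFF;
            g += (p >> 8) & 0xFF;
            b += p & 0xFF;
        }
        int n = pixels.length;
        return new Color((int) (r / n), (int) (g / n), (int) (b / n));
    }

    // True if every channel of the average is within `tolerance` of the target.
    static boolean matches(int[] pixels, Color target, int tolerance) {
        Color avg = average(pixels);
        return Math.abs(avg.getRed() - target.getRed()) <= tolerance
                && Math.abs(avg.getGreen() - target.getGreen()) <= tolerance
                && Math.abs(avg.getBlue() - target.getBlue()) <= tolerance;
    }
}
```

The `tolerance` parameter plays the role of the adjustable color range mentioned above.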
Chained Find Operations
Each of these color methods can be used independently or combined with the traditional Sikuli pattern searches by chaining find operations together. There are two main flavors of chained find operations:
Nested finds. Matches are searched for inside the matches from the previous find operation. The last find operation is responsible for providing the function’s return value.
Confirmed finds. The matches from the first find operation are filtered by the results of following find operations. The later find operations act as confirmation or rejection of the initial matches. Confirmed matches from the first find operation are returned.
Documentation for nested finds and confirmed finds
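A confirmed find can be illustrated without any framework code. The sketch below (an illustration, not Brobot's implementation) keeps an initial match only when some confirming match falls inside it, using `java.awt.Rectangle` as a stand-in for match regions:

```java
import java.awt.Rectangle;
import java.util.ArrayList;
import java.util.List;

public class ConfirmedFind {
    // Keep only the initial matches that contain at least one confirming match.
    static List<Rectangle> confirm(List<Rectangle> initial, List<Rectangle> confirmations) {
        List<Rectangle> confirmed = new ArrayList<>();
        for (Rectangle candidate : initial) {
            for (Rectangle c : confirmations) {
                if (candidate.contains(c)) {
                    confirmed.add(candidate); // the original match is returned, not the confirmation
                    break;
                }
            }
        }
        return confirmed;
    }
}
```

A nested find would differ by returning the inner matches rather than the outer candidates.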
Finding the Red Image
For your specific question, it would be best to use a chain of two find operations, set up as a confirmed find. The first operation would be a pattern search (equivalent to findAll in Sikuli) and the second operation, the confirmation operation, would be a color search.
Actions in Brobot are built with:
The action configuration (an ActionOptions variable)
The objects to act on (images, regions, locations, etc.)
Brobot encourages defining states with a collection of images that belong together. The code below assumes you have a state StateWithRedImage that contains the image RedImage. The results are returned in a Matches object, which contains information about the matches and the action performed.
public class RedImageFinder {

    private final Action action;
    private final StateWithRedImage stateWithRedImage;

    public RedImageFinder(Action action, StateWithRedImage stateWithRedImage) {
        this.action = action;
        this.stateWithRedImage = stateWithRedImage;
    }

    public Matches find() {
        // First find: a pattern search for all matches (like Sikuli's findAll).
        // Second find: a color search that confirms or rejects each initial match.
        ActionOptions actionOptions = new ActionOptions.Builder()
                .setAction(ActionOptions.Action.FIND)
                .setFind(ActionOptions.Find.ALL)
                .addFind(ActionOptions.Find.COLOR)
                .keepLargerMatches(true)
                .build();
        return action.perform(actionOptions, stateWithRedImage.getRedImage());
    }
}
Disclaimer: I'm the developer of Brobot. It's free and open source.
Here's something that might help: create a Region and try to find the image within that region, as in the example at this link:
http://seleniumqg.blogspot.com/2017/06/findfailed-exception-in-sikuili-script.html?m=1
Related
As the title says, I'm trying to find a way to generate a transformation matrix to best align two images (the solution with the smallest error value computed with an arbitrary metric, for example the SAD of all distances between corresponding points). Example provided below:
This is just an example in the sense that the outer contour can be any shape, the "holes" can be any shape, any size and any number.
The "from" image was drawn by hand in order to show that the shape is not perfect, but rather a contour extracted from a camera acquired image.
The API function that seems to be what I need is Video.estimateRigidTransform but I ran into a couple of issues and I'm stuck:
The transformation must be rigid in the strongest sense, meaning it must not do any kind of scaling, only translation and rotation.
Since the shapes in the "from" image are not perfect, the number of points in its contour is not the same as in the "to" image, and the function above needs two sets of corresponding points. To bypass this I have tried another approach: I calculated the centroids of the holes and the outer contour and tried aligning those. There are two issues here:
I need alignment even if one of the holes is missing in the "from" image.
The points must be in the same order in both lists passed to Video.estimateRigidTransform, and there is no guarantee that findContours will provide them in the same order for both shapes.
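One way to attack the ordering problem is to sort each shape's centroids by polar angle around their common centroid. The two lists then correspond up to a cyclic shift (rotation between the images shifts the angles), so you can try each cyclic shift and keep the one with the smallest residual. A sketch of the sorting step:

```java
import java.awt.geom.Point2D;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class CentroidOrder {
    // Sort points by polar angle around their mean, so that two contours of the
    // same shape produce corresponding lists regardless of the starting index.
    static List<Point2D.Double> sortByAngle(List<Point2D.Double> pts) {
        double sx = 0, sy = 0;
        for (Point2D.Double p : pts) { sx += p.x; sy += p.y; }
        final double mx = sx / pts.size(), my = sy / pts.size();
        List<Point2D.Double> sorted = new ArrayList<>(pts);
        sorted.sort(Comparator.comparingDouble(p -> Math.atan2(p.y - my, p.x - mx)));
        return sorted;
    }
}
```

Note this does not by itself handle a missing hole; for that you would still need some outlier-tolerant matching on top.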
I have yet to try to run a feature extractor and matcher to obtain some corresponding points but I'm not very confident in this method, especially since the "From" image is a natural image with irregularities.
Any ideas would be greatly appreciated.
I am trying to use the Java Robot class to create a bot to automate some tedious tasks for me; I have never used the Robot class before. I have looked up the class in the Java docs and usage seems straightforward, but I have an issue with finding a certain image (I say image, but I mean a certain part of the screen) effectively. Is there any way other than loading 'x' amount of pixels, checking them, checking the next amount, etc., until I find the image I am looking for? Also, is there any list of the Button and MouseButton identifiers needed for the Java Robot class, as I cannot find any.
For the mouse button identifiers, you are supposed to use BUTTON1_MASK and the other button mask constants defined in java.awt.event.InputEvent (MouseEvent inherits them). For example, to click the mouse you would do something like:
Robot r = new Robot();
r.mousePress(InputEvent.BUTTON1_MASK);
r.mouseRelease(InputEvent.BUTTON1_MASK);
I believe BUTTON1_MASK is the left mouse button, BUTTON2_MASK is the middle mouse button, and BUTTON3_MASK is the right mouse button, but it has been a month or so since I have used Robot. (On current Java versions these constants are deprecated in favor of InputEvent.BUTTON1_DOWN_MASK and friends, which Robot also accepts.)
As for checking for an image, I have no idea how that is normally done. But the way you specified in your question, where you just check every group of pixels, shouldn't be too computationally expensive, because you can get the screen image as an array of primitives and then access the desired pixel with a bit of math.

So when checking the "rectangle" of pixels that you are searching for your image in, only keep checking the pixels as long as they keep matching. The moment you find a pixel that does not match, move on to the next "rectangle" of pixels. The probability that you will find a bunch of pixels that match the image but end up not being the image is extremely low, meaning that each rectangle will only need about 5 or fewer pixel checks on average.

Any software that performs this task would have to check every pixel on the screen at least once (unless it makes a few shortcuts/assumptions based on the probabilities of image variations occurring), and the algorithm I described checks each pixel about 5 times, so it is not that bad to implement, unless you have a huge image to check.
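The early-exit scan described above might look like the following (a naive exact-match search over packed RGB pixels; a real screen capture would usually need a per-channel tolerance rather than strict equality):

```java
public class PixelSearch {
    // Scan `screen` (w x h, row-major 0xRRGGBB) for the first occurrence of
    // `patch` (pw x ph). Returns {x, y} of the top-left corner, or null.
    static int[] find(int[] screen, int w, int h, int[] patch, int pw, int ph) {
        for (int y = 0; y <= h - ph; y++) {
            for (int x = 0; x <= w - pw; x++) {
                if (matchesAt(screen, w, patch, pw, ph, x, y)) {
                    return new int[] { x, y };
                }
            }
        }
        return null; // not found
    }

    private static boolean matchesAt(int[] screen, int w, int[] patch,
                                     int pw, int ph, int x, int y) {
        for (int dy = 0; dy < ph; dy++) {
            for (int dx = 0; dx < pw; dx++) {
                if (screen[(y + dy) * w + (x + dx)] != patch[dy * pw + dx]) {
                    return false; // bail out of this candidate on the first mismatch
                }
            }
        }
        return true;
    }
}
```

With `Robot.createScreenCapture` you would get the `screen` array via `BufferedImage.getRGB(0, 0, w, h, null, 0, w)`.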
Hope this helps!
I want to achieve the goal that, having two images in Java with defined control points for both of them, I will be able to overlap and compose a final image based on these control points.
This means that each control point on ONE image has a direct relationship with a control point on the second, so that when composing the two images they will match perfectly.
An example of this usage could be dressing a person with different clothes (the shirt has control points which match control points on the body) by overlapping and resizing.
The problem is that the normal resizing methods resize images in a "proportionate" way, meaning only by width and height. I'd like to create some control points on an image in such a way:
so that I can resize the image based just on those control points.
Any help?
You need to look at nonrigid image deformation techniques, sometimes referred to as image morphing or image warping. Be aware that they require a good deal of mathematics to understand, and numerical software components (esp. a good linear solver) to be implemented.
A classic method for control-point-based image deformation is the Thin Plate Spline. I find the original paper more helpful for implementation than the Wikipedia entry.
This is a page with some other techniques.
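For reference, the classic thin plate spline interpolant has a closed form (this is the standard formulation; the coefficients come from a linear system over the control points, which is where the linear solver mentioned above comes in):

```latex
f(x, y) = a_1 + a_x x + a_y y + \sum_{i=1}^{n} w_i \, U\!\left(\lVert P_i - (x, y) \rVert\right),
\qquad U(r) = r^2 \log r^2
```

subject to the side conditions \(\sum_i w_i = 0\), \(\sum_i w_i x_i = 0\), \(\sum_i w_i y_i = 0\), where the \(P_i\) are the control points. One such function is solved for each output coordinate.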
Problem Description
I am writing a Java application that lets programmers query for page elements on a web page by specifying visible attributes. One of the most important and difficult is Color.
To be specific, I need a way to get the user-visible color of web page elements using Selenium 2 and WebDriver. I want to be able to query for color values (#ff0000) or names (red).
One parameter should control the percentage of similar colors needed to be "dominating" enough. If set to 100% the element is not allowed to have any other color. If set to 50% the element needs to be halfway filled with the color.
There should be another parameter to control the "tolerance" of these colors. With a higher tolerance, red could also match the orange "Ask Question" button here on Stack Overflow.
Example
Given the well-known Stack Overflow web page, I highlighted the page element to check:
With a higher color tolerance and a not too high domination percentage, the following queries should return the specified result:
color('#FFEFC6') // exact match: true
color('yellow') // match in tolerance range: true
color('orange') // true
color('blue') // false
color('green') // false
My first approach
The best bet would be using CSS attributes like color and background-color. But these do not take images into account, which are needed for good color queries. They could also produce difficulties because of CSS selector inheritance and the handling of transparency. In addition, absolutely positioned elements with a higher z-index above the current element could produce unexpected results.
Given is the web page element to check. It is represented either as a JavaScript DOM element (or jQuery object) or as a RemoteWebElement in the Java bindings of WebDriver.
It is possible to take automated screenshots of the current state of the web page (I am using Firefox); see: Take a screenshot with Selenium WebDriver
The coordinates of the page element to check are known. Therefore, the screenshot image could be cropped to that size and area and analyzed somehow to check whether the query returns true or false.
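The cropping step can be done with BufferedImage.getSubimage once the screenshot is decoded. The clamping below guards against an element box that hangs off the edge of the capture (a sketch, assuming page coordinates line up with screenshot pixels, which may not hold with scrolling or device scaling):

```java
import java.awt.image.BufferedImage;

public class ElementCrop {
    // Crop a full-page screenshot to the element's bounding box, clamped to
    // the screenshot bounds (getSubimage throws if the region leaks out).
    static BufferedImage crop(BufferedImage shot, int x, int y, int w, int h) {
        int cx = Math.max(0, x), cy = Math.max(0, y);
        int cw = Math.min(w, shot.getWidth() - cx);
        int ch = Math.min(h, shot.getHeight() - cy);
        return shot.getSubimage(cx, cy, cw, ch);
    }
}
```

The returned image shares pixel data with the original, so no copying cost is incurred for the crop itself.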
Implementation
I am not limited to Java in this case. JavaScript would be very nice because I am doing the other queries with the help of jQuery too. Performance matters. I am counting on you; I fear this is a very difficult task, therefore I need your input.
UPDATE
I solved this issue by taking screenshots and analyzing the pixel data of the relevant part. That way I can deal with all kinds of background images and transparency. It's part of the Abmash framework, which is open source and free for anybody to use: Abmash on GitHub
Easiest way:
Get screenshot (save in memory)
Crop the screenshot to the element: top = el.offsetTop, left = el.offsetLeft, width = el.offsetWidth, height = el.offsetHeight
Get the pixel data for the cropped image
Loop through the pixels, summing the R, G, and B channels, then divide each sum by the pixel count to get the average. Test the average color against your constraints.
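The averaging in step 4 covers the simple case; the questioner's "domination percentage" and "tolerance" parameters map more directly onto a per-pixel count. A sketch of that variant (target, tolerance, and minFraction are the assumed knobs, not part of any library API):

```java
import java.awt.Color;

public class ColorDominance {
    // True if at least `minFraction` of the pixels are within a per-channel
    // `tolerance` of the target color. Pixels are packed 0xRRGGBB.
    static boolean dominates(int[] pixels, Color target, int tolerance, double minFraction) {
        int hits = 0;
        for (int p : pixels) {
            int r = (p >> 16) & 0xFF, g = (p >> 8) & 0xFF, b = p & 0xFF;
            if (Math.abs(r - target.getRed()) <= tolerance
                    && Math.abs(g - target.getGreen()) <= tolerance
                    && Math.abs(b - target.getBlue()) <= tolerance) {
                hits++;
            }
        }
        return (double) hits / pixels.length >= minFraction;
    }
}
```

Named queries like color('yellow') would then just resolve the name to an RGB target before calling this.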
If you really want to use JavaScript
You could send the pixel data to JavaScript for processing if you're intent on doing the final check in JavaScript.
Or you could send JavaScript the image URI for the cropped image, then draw that IMG to a CANVAS and loop through the pixels with ctx2d.getImageData(...).
Only do the above if the element is an IMG or has a background-image CSS property. Otherwise, just use the color and background-color CSS checks.
I am facing an issue while using Sikuli through Java: if there are 2 elements of the same kind (or similar image), it fails to click on the correct element. So I wanted to know if it is possible to make Sikuli work only inside a particular region, and can someone please explain how it can be done?
Yes, Sikuli can work within a particular region. The challenge is defining a region that only contains one of your two elements. You define a region by x,y coordinates. You can also increase the size of a region based on the location of a unique pattern (image) on your display.
while exists("foo.png"):
    hover("bar.png")
    ClickMeRegion = find("bar.png").nearby(5).right()
    ClickMeRegion.click("baz.png")
So in the above I look for foo.png/bar.png/baz.png image groups that are being displayed. First I hover on bar.png so that visually I can see which group the script is looking at. Then I create a region extending 5 pixels around the center of bar.png and extend it to the right edge of the display. This contains a single baz.png image, so I can then click the one baz.png that I am interested in.
For more info on regions see: http://doc.sikuli.org/region.html