Google custom image search returns bad images - java

I want to get images from the Google Custom Search API. My problem is that I am getting very weird images, no matter what I change in the settings:
keywords: empty
edition: free, with ads
image search: on
safe search: off
speech input: off
language: english
sites to search: -
restrictions: empty
search entire web: on
(Sorry if something is translated incorrectly; my UI is in German.)
Another user had the same problem, but their solution didn't help me: "Google custom search - poor image results".
So no matter what I change in the settings, I am getting the same images.
If I search for "apfel" (English: apple), I am getting this image link:
https://scontent-atl3-1.cdninstagram.com/v/t51.2885-19/s150x150/31514744_140795226776868_4684314220345425920_n.jpg?_nc_ht=scontent-atl3-1.cdninstagram.com&_nc_ohc=FdhVBUbROnkAX9AJdVR&oh=ea552d4c8b23acd0a3c82d83632e0895&oe=5ECA7F0E
But when I search it in the UI I get this:
The code should not be the issue, but here it is:
public static void main(String[] args) throws Exception {
    String key = "";
    String cx = "";
    String keyword = "apfel";
    URL url = new URL("https://www.googleapis.com/customsearch/v1?key=" + key + "&cx=" + cx + "&q=" + keyword);
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.addRequestProperty("User-Agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)");
    BufferedReader br = new BufferedReader(new InputStreamReader((conn.getInputStream())));
    String output;
    System.out.println("Output from Server .... \n");
    while ((output = br.readLine()) != null) {
        if (output.contains("\"src\": \"")) {
            System.out.println(output); // Will print the Google search links
        }
    }
    conn.disconnect();
}

Related

Google custom image search - only returns website images

I want to fetch some Google images with the Google Custom Search API. But instead of the Google images, I am getting the thumbnails of the websites. Here is an example:
I am getting the links of these thumbnail images:
But I want to have the links of these images:
Maybe someone can tell me how to do that!
The code:
public static void main(String[] args) throws Exception {
    String key = "";
    String cx = "";
    String keyword = "coke";
    URL url = new URL("https://www.googleapis.com/customsearch/v1?key=" + key + "&cx=" + cx + "&q=" + keyword);
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.addRequestProperty("User-Agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)");
    BufferedReader br = new BufferedReader(new InputStreamReader((conn.getInputStream())));
    String output;
    System.out.println("Output from Server .... \n");
    while ((output = br.readLine()) != null) {
        if ((output.contains("jpg") || output.contains("png")) && output.contains("src")) {
            System.out.println(output); // Will print the Google search links
        }
    }
    conn.disconnect();
}
Thanks a lot!
You aren't specifying that you want image search from Google. You are just searching for possible images in normal results. You'll need to add searchType=image.
Check this question and learn more about querying here.
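As a rough sketch of that suggestion, the request URL from the question could be built with searchType=image appended. The class and method names here are made up for illustration, and the key and cx values are placeholders. URL-encoding the values is an extra precaution that the original code does not take. To my understanding, each item in an image-search response then carries the image URL in its "link" field, so filtering the output on "src" would no longer be the right check.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class ImageSearchUrl {

    // Hypothetical helper: builds a Custom Search request that asks for image results.
    public static String build(String key, String cx, String query) {
        return "https://www.googleapis.com/customsearch/v1"
                + "?key=" + URLEncoder.encode(key, StandardCharsets.UTF_8)
                + "&cx=" + URLEncoder.encode(cx, StandardCharsets.UTF_8)
                + "&q=" + URLEncoder.encode(query, StandardCharsets.UTF_8)
                + "&searchType=image"; // the parameter the answer above refers to
    }

    public static void main(String[] args) {
        // "YOUR_KEY" and "YOUR_CX" are placeholders, not working credentials.
        System.out.println(build("YOUR_KEY", "YOUR_CX", "coke"));
    }
}
```

The resulting string can be passed to new URL(...) exactly as in the question; the rest of the connection code stays the same.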

An issue with an URLConnection using java

I'm trying to read the source code of a website.
But there is an issue when I want to receive the code of this page, for example: "https://www.amazon.de/gp/bestsellers/pet-supplies/#2"
I tried a lot, but I'm still just receiving the code of "https://www.amazon.de/gp/bestsellers/pet-supplies". So something isn't working the way I want: I want to receive places 21-40, not 1-20.
I'm using a URLConnection and a BufferedReader:
public String fetchPage(String urlS) {
    // A StringBuilder avoids the literal "null" prefix that
    // `String qc = null; qc += s;` would put in front of the page source.
    StringBuilder qc = new StringBuilder();
    try {
        URL url = new URL(urlS);
        URLConnection uc = url.openConnection();
        uc.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 10.0; WOW64; rv:51.0) Gecko/20100101 Firefox/51.0");
        // try-with-resources closes the reader even if reading fails
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(uc.getInputStream()))) {
            String s;
            while ((s = reader.readLine()) != null) {
                qc.append(s);
            }
        }
    } catch (IOException e) {
        e.printStackTrace();
        return "receiving qc failed";
    }
    return qc.toString();
}
Thank you in advance for your effort :)
The URL you're fetching contains an anchor (the #2 at the end). An anchor is a client-side concept, originally used to jump to a certain part of the page. Some web apps (mostly single-page apps) use the anchor to keep track of some sort of state (e.g. which page of products you're viewing).
Since the anchor is a client-side concept, it is never sent to the web server: your browser or HTTP client library drops it, so the server responds as if you had requested https://www.amazon.de/gp/bestsellers/pet-supplies.
Bottom line is that you'll never get the second page this way... Good luck scraping Amazon, though ;)
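To illustrate the point about anchors, here is a small sketch using java.net.URI. The class and method names are invented for this example. It only shows how the URL splits into the part that goes into the request and the fragment that stays on the client; it does not fetch anything.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class FragmentDemo {

    // Returns the anchor (fragment) part of the URL, or null if there is none.
    public static String fragment(String url) {
        return URI.create(url).getFragment();
    }

    // Rebuilds the URL without its fragment, i.e. what the server can actually see.
    public static String withoutFragment(String url) throws URISyntaxException {
        URI u = URI.create(url);
        return new URI(u.getScheme(), u.getAuthority(), u.getPath(), u.getQuery(), null).toString();
    }

    public static void main(String[] args) throws Exception {
        String url = "https://www.amazon.de/gp/bestsellers/pet-supplies/#2";
        System.out.println("fragment:  " + fragment(url));
        System.out.println("requested: " + withoutFragment(url));
    }
}
```

If the second page is reachable at all, it would have to be through whatever request the page's own JavaScript makes when the anchor changes, which this sketch does not attempt to reproduce.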

Searching image with its description on google

I have successfully created an API key for the Google Custom Search API. The task I now want to perform is to upload an image from my hard drive and get results from the websites I specified in the control panel of the Google console when I created the API key. I have tried the code from this question asked on Stack Overflow (code also given below):
public static void main(String[] args) throws Exception {
    String key = "YOUR KEY";
    String qry = "Android";
    URL url = new URL(
            "https://www.googleapis.com/customsearch/v1?key=" + key + "&cx=013036536707430787589:_pqjad5hr1a&q=" + qry + "&alt=json");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setRequestMethod("GET");
    conn.setRequestProperty("Accept", "application/json");
    BufferedReader br = new BufferedReader(new InputStreamReader(
            (conn.getInputStream())));
    String output;
    System.out.println("Output from Server .... \n");
    while ((output = br.readLine()) != null) {
        if (output.contains("\"link\": \"")) {
            String link = output.substring(output.indexOf("\"link\": \"") + ("\"link\": \"").length(), output.indexOf("\","));
            System.out.println(link); // Will print the Google search links
        }
    }
    conn.disconnect();
}
Now, how can I search with my image and get the results? Also, this piece of code searches the whole of Google, but I want it to search only the websites I specified in the control panel of the Google console when I created the API key.

Using Regex Pattern with Jsoup on a Weblink

I'm trying to write a parser to get product info from a website. I've built a similar tool with PHP and regex, and I'd like to do the same in Java. The goal is to take a parent link, build the child product links from it with a regex, and then fetch each product's info in a loop:
String curl = TextField1.getText();
URL url = new URL(curl);
URLConnection spoof = url.openConnection();
spoof.setRequestProperty("User-Agent", "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0; H010818)");
BufferedReader in = new BufferedReader(new InputStreamReader(spoof.getInputStream(), "UTF-8"));
// Compile the pattern once, outside the read loop
Pattern pattern = Pattern.compile("style='color:#000000;font-weight:bold;'>(.*?)</a>");
String strLine = "";
while ((strLine = in.readLine()) != null) {
    strLine = strLine.replaceAll(" ", "_");
    strLine = strLine.replaceAll("d'", "d");
    Matcher m = pattern.matcher(strLine);
    while (m.find()) {
        String enfurl = "http://www.exemple.com/fr/" + m.group(1) + ".htm";
        System.out.println(enfurl);
    }
}
This code works, but someone told me that Jsoup is a better way to parse HTML. I'm reading the Jsoup documentation, but after establishing a connection, I don't know which syntax to use. Could you help me?
EDIT: OK, with this code:
Elements links = doc.select("a[href][title*=Cliquer pour obtenir des détails]");
for (Element link : links) {
    System.out.println(link.attr("href"));
    String urlenf = link.attr("href");
    Document docenf = Jsoup.connect(urlenf).get();
    System.out.println(docenf.body().text());
}
I've got the links... but now I have to open another Jsoup connection to get the product info, and this test doesn't work. How can I use another Jsoup connection inside the for loop? Thanks.
Try to get the urls (and generally, the content) like this.
String url = "PAGE_URL_GOES_HERE";
InputStream is = new URL(url).openStream();
String encoding = "UTF-8";
Document doc = Jsoup.parse(is , encoding , url);
Update
Are you sure the problem is with the encoding of the url?
I tried the below code, and it works just fine.
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class Main {
    public static void main(String[] args) {
        try {
            String url = "http://www.larousse.fr/dictionnaires/francais-anglais/écrémer/27576?q=écrémé";
            Document doc = Jsoup.connect(url)
                    .userAgent("Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0; H010818)")
                    .get();
            System.out.println(doc.toString());
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
Update 2
In any case, try this one too, Jsoup.connect(new String(url.getBytes("UTF-8")))
There are plenty of examples of jsoup usage on the net.
Document document = Jsoup.connect(targetUrl).get(); // get the HTML page
Elements descElements = document
        .select("table#searchResult td:nth-child(2) font.detDesc"); // find elements by CSS selector
for (int i = 0; i < descElements.size(); i++) {
    String torrentDesc = descElements.get(i).html(); // get the tag content
}
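One possible reason the nested Jsoup.connect call in the question's for loop fails (this is an assumption, since the error is not shown) is that the href values are relative, so they are not valid URLs to connect to on their own. Jsoup can return an absolute URL through link.absUrl("href"). The JDK-only sketch below shows the same kind of resolution with URI.resolve; the class name, the method name, and the index.htm and produit.htm file names are made up for illustration.

```java
import java.net.URI;

public class HrefResolver {

    // Resolves an href found on a page against that page's own URL.
    // Note: URI.create rejects hrefs containing raw spaces or other illegal characters.
    public static String resolve(String pageUrl, String href) {
        return URI.create(pageUrl).resolve(href).toString();
    }

    public static void main(String[] args) {
        String page = "http://www.exemple.com/fr/index.htm"; // hypothetical parent page
        System.out.println(resolve(page, "produit.htm"));     // relative to the page's directory
        System.out.println(resolve(page, "/fr/produit.htm")); // relative to the site root
    }
}
```

If the links turn out to be absolute already, resolution leaves them unchanged, and the cause of the failure lies elsewhere.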

Google search using custom search

I've been asked to write an inverted index, so as a start I would like to write a Java program that searches Google for a word and puts the results into an ArrayList.
Here's my code:
String search = "Dan";
String google = "http://www.google.com/cse/publicurl?cx=012216303403008813404:kcqyeryhhm8&q=" + search;
URL url = new URL(google);
HttpURLConnection conn = (HttpURLConnection) url.openConnection();
conn.setRequestMethod("GET");
conn.setRequestProperty("Accept", "application/json");
BufferedReader reader = new BufferedReader(new InputStreamReader(
        (conn.getInputStream())));
// Gather the results into a list of strings
List<String> resultsList = new ArrayList<String>();
String r;
while ((r = reader.readLine()) != null) {
    resultsList.add(r);
}
conn.disconnect();
System.out.println("Google Search for: " + search + " Is Done!");
The program runs without crashing, but I only get the source code of a page (which does not contain any links).
What do I need to change in the code? Maybe I need a whole different method?
If you want to use Google search in your app, you should use Google's API for that:
Custom search API
You get search results in JSON format.
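Once the JSON response is available as a string, the result URLs can be collected into a list. The sketch below pulls every "link" value out with a regular expression; the class name and the sample JSON are made up for illustration, and a real JSON library would be the more robust choice, since this pattern does not handle escaped quotes inside values.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LinkExtractor {

    // Matches "link": "<value>" pairs; escaped quotes inside the value are not supported.
    private static final Pattern LINK = Pattern.compile("\"link\"\\s*:\\s*\"([^\"]*)\"");

    // Collects every "link" value found in the given JSON text, in order of appearance.
    public static List<String> extractLinks(String json) {
        List<String> links = new ArrayList<>();
        Matcher m = LINK.matcher(json);
        while (m.find()) {
            links.add(m.group(1));
        }
        return links;
    }

    public static void main(String[] args) {
        // Made-up sample that only mimics the general shape of a search response.
        String sample = "{\"items\":[{\"title\":\"A\",\"link\":\"https://example.com/a\"},"
                + "{\"title\":\"B\",\"link\": \"https://example.com/b\"}]}";
        System.out.println(extractLinks(sample));
    }
}
```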
