Protocol get Java URL - java

I'm trying to get, in JSON format, all the websites found when querying Google.
Code:
import java.io.FileWriter;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URL;

/**
 * Created by Vlad on 19/03/14.
 */
public class Query {
    public static void main(String[] args) {
        try {
            String arg;
            arg = "random";
            URL url = new URL("GET https://www.googleapis.com/customsearch/v1?key=&cx=017576662512468239146:omuauf_lfve&q=" + arg);
            InputStreamReader reader = new InputStreamReader(url.openStream(), "UTF-8");
            int ch;
            while ((ch = reader.read()) != -1) {
                System.out.print((char) ch); // cast so characters are printed, not their integer codes
            }
        } catch (Exception e) {
            System.out.println("This ain't good");
            System.out.println(e);
        }
    }
}
Exception:
java.net.MalformedURLException: no protocol: GET https://www.googleapis.com/customsearch/v1?key=AIzaSyCS26VtzuCs7bEpC821X_l0io_PHc4-8tY&cx=017576662512468239146:omuauf_lfve&q=random

You should delete the GET at the beginning ;)
You should replace your code with:
URL url = new URL("https://www.googleapis.com/customsearch/v1?key=AIzaSyCS26VtzuCs7bEpC821X_l0io_PHc4-8tY&cx=017576662512468239146:omuauf_lfve&q=" + arg);
URLs never start with GET or POST or anything like that ;)

URLs are supposed to start with a transfer protocol, and GET https://www.googleapis.com/customsearch/v1?key=AIzaSyCS26VtzuCs7bEpC821X_l0io_PHc4-8tY&cx=017576662512468239146:omuauf_lfve&q=random starts with GET instead, which is why the exception is thrown.
Change it to https://www.googleapis.com/customsearch/v1?key=AIzaSyCS26VtzuCs7bEpC821X_l0io_PHc4-8tY&cx=017576662512468239146:omuauf_lfve&q=random
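As a side note, the query value should be URL-encoded before it is appended, or spaces and special characters in the search term will break the URL. A minimal sketch of the corrected program, assuming the same endpoint as above (java.net.URLEncoder is part of the standard library):

import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLEncoder;

public class QueryFixed {
    public static void main(String[] args) throws Exception {
        // Encode the search term so spaces, quotes, etc. survive the trip
        String arg = URLEncoder.encode("random", "UTF-8");
        URL url = new URL("https://www.googleapis.com/customsearch/v1?key=AIzaSyCS26VtzuCs7bEpC821X_l0io_PHc4-8tY&cx=017576662512468239146:omuauf_lfve&q=" + arg);
        InputStreamReader reader = new InputStreamReader(url.openStream(), "UTF-8");
        int ch;
        while ((ch = reader.read()) != -1) {
            System.out.print((char) ch);
        }
    }
}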

Related

My HTML fetcher program in java returns incomplete results

My Java code is:
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class celebGrepper {
    static class CelebData {
        URL link;
        String name;

        CelebData(URL link, String name) {
            this.link = link;
            this.name = name;
        }
    }

    public static String grepper(String url) {
        URL source;
        String data = null;
        try {
            source = new URL(url);
            HttpURLConnection connection = (HttpURLConnection) source.openConnection();
            connection.connect();
            InputStream is = connection.getInputStream();
            /**
             * Attempting to fetch an entire line at a time instead of just a character each time!
             */
            StringBuilder str = new StringBuilder();
            BufferedReader br = new BufferedReader(new InputStreamReader(is));
            while ((data = br.readLine()) != null)
                str.append(data);
            data = str.toString();
        } catch (IOException e) {
            e.printStackTrace();
        }
        return data;
    }

    public static ArrayList<CelebData> parser(String html) throws MalformedURLException {
        ArrayList<CelebData> list = new ArrayList<CelebData>();
        Pattern p = Pattern.compile("<td class=\"image\".*<img src=\"(.*?)\"[\\s\\S]*<td class=\"name\"><a.*?>([\\w\\s]+)<\\/a>");
        Matcher m = p.matcher(html);
        while (m.find()) {
            CelebData current = new CelebData(new URL(m.group(1)), m.group(2));
            list.add(current);
        }
        return list;
    }

    public static void main(String... args) throws MalformedURLException {
        String html = grepper("https://www.forbes.com/celebrities/list/");
        System.out.println("RAW Input: " + html);
        System.out.println("Start Grepping...");
        ArrayList<CelebData> celebList = parser(html);
        for (CelebData item : celebList) {
            System.out.println("Name:\t\t " + item.name);
            System.out.println("Image URL:\t " + item.link + "\n");
        }
        System.out.println("Grepping Done!");
    }
}
It's supposed to fetch the entire HTML content of https://www.forbes.com/celebrities/list/. However, when I compare the actual result below to the original page, I find the entire table that I need is missing! Is it because the page isn't completely loaded when I start getting the bytes from the page via the input stream? Please help me understand.
The Output of the page:
https://jsfiddle.net/e0771aLz/
What can I do to just extract the Image link and the names of the celebs?
I know it's extremely bad practice to try to parse HTML with regex and that it's the stuff of nightmares, but in a certain video training course for Android that's exactly what the guy did, and I just want to follow along, since it's only this one lesson.
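For what it's worth, this is most likely not a timing issue with the stream: HttpURLConnection hands you exactly the bytes the server sent, and the reader loop above buffers the whole response. Since the table is missing from that raw HTML, it is almost certainly injected client-side by JavaScript after the initial page loads, so no plain HTTP fetch will ever contain it. A quick hedged check, reusing the grepper method from the code above:

String html = celebGrepper.grepper("https://www.forbes.com/celebrities/list/");
// If this prints "absent", the table is built by JavaScript in the browser,
// and no amount of stream handling will make it appear in the raw HTML.
System.out.println(html.contains("<td class=\"name\"") ? "table markup present" : "absent");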

Programmatically Fetching Google+ status updates

Is there a way to programmatically fetch Google+ updates for a user's profile? I can't seem to find much in the documentation at https://developers.google.com/+/api/latest/people and http://developer.android.com/reference/com/google/android/gms/plus/model/people/Person.html about fetching statuses. I would like to fetch the data by making an HTTP request, or if there is some sort of SDK for Android that will help me, that would work too.
The API you are looking for is plus.activities.list. This will list the Google+ equivalent of Facebook status updates. The referenced page has example code to get you started.
When accessing the API, you should use the Google API client as documented here.
The following code will be useful for retrieving the HTTP responses.
import java.io.IOException;
import java.io.InputStream;
import java.lang.reflect.Type;
import java.net.HttpURLConnection;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.Scanner;
import java.util.zip.GZIPInputStream;

import com.google.gson.Gson;
import com.google.gson.reflect.TypeToken;

public class GooglePlusStatusHelper {

    public GooglePlusStatusHelper() {
    }

    public static void main(String... args) {
        GooglePlusStatusHelper googlePlusStatusHelper = new GooglePlusStatusHelper();
        try {
            googlePlusStatusHelper.tagsUsed();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    private void tagsUsed() throws IOException {
        URL url = createQuery("users");
        // The original read new TypeToken<Wrapper<Status>>(){}, but no Wrapper class
        // is defined anywhere, so the plain Status type is used here.
        Type dataType = new TypeToken<Status>(){}.getType();
        Status status = executeQuery(url, dataType);
        System.out.println(status);
    }

    private URL createQuery(String inputParam) throws MalformedURLException {
        String baseUrl = "http://api.example.com/" + inputParam;
        System.out.println(baseUrl);
        URL url = new URL(baseUrl);
        return url;
    }

    private Status executeQuery(URL url, Type clz) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.connect();
        System.out.println("Response Code:" + conn.getResponseCode());
        System.out.println("Response Message:" + conn.getResponseMessage());
        System.out.println("TYPE:" + conn.getContentType());
        InputStream content = conn.getInputStream();
        String encoding = conn.getContentEncoding();
        if (encoding != null && encoding.equals("gzip")) {
            // Transparently unwrap gzip-compressed responses
            content = new GZIPInputStream(content);
        }
        String result = new Scanner(content, "UTF-8").useDelimiter("\\A").next();
        content.close();
        Gson gson = new Gson();
        return gson.fromJson(result, clz);
    }
}
Status class:
public class Status {
    private int count;
    private String status;
    ......

    public String toString() {
        String result = "\ncount: " + count +
                "\nstatus: " + status; // the original "\status" is an invalid escape sequence
        result = result + "\n------------";
        return result;
    }
}
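For reference, a minimal sketch of how Gson maps a JSON response onto that Status class; the payload below is made up for illustration, and Gson matches the JSON keys to the private fields by name:

import com.google.gson.Gson;

public class StatusDemo {
    public static void main(String[] args) {
        // Hypothetical payload with fields matching Status
        String json = "{\"count\": 2, \"status\": \"active\"}";
        Status status = new Gson().fromJson(json, Status.class);
        System.out.println(status); // printed via Status.toString()
    }
}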

StringIndexOutOfBoundsException in Java?

I'm trying to run this Java program, which serves a web page from my webroot folder:
import java.io.DataOutputStream;
import java.io.File;
import java.io.InputStreamReader;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.Scanner;

public class WebServer {
    static ServerSocket requestListener;
    static Socket requestHandler;
    static Scanner requestReader, pageReader;
    static DataOutputStream pageWriter;
    static String HTTPMessage;
    static String requestedFile;
    public static int HTTP_PORT = 12346;

    public static void main(String[] args) {
        try {
            requestListener = new ServerSocket(HTTP_PORT);
            System.out.println("Waiting For IE to request a page:");
            requestHandler = requestListener.accept();
            System.out.println("Page Requested: Request Header:");
            requestReader = new Scanner(new InputStreamReader(
                    requestHandler.getInputStream()));
            // This is the part where it's throwing the error
            int lineCount = 0;
            do {
                lineCount++; // This will be used later
                HTTPMessage = requestReader.next();
                System.out.println(HTTPMessage);
                if (lineCount == 1) {
                    requestedFile = "WebRoot\\"
                            + HTTPMessage.substring(5,
                                    HTTPMessage.indexOf("HTTP/1.1") - 1);
                    requestedFile = requestedFile.trim();
                }
                // localhost:12346/default.htm
                // HTTPMessage = requestReader.nextLine();
                pageReader = new Scanner(new File(requestedFile));
                pageWriter = new DataOutputStream(
                        requestHandler.getOutputStream());
                while (pageReader.hasNext()) {
                    String s = pageReader.nextLine();
                    // System.out.println(s);
                    pageWriter.writeBytes(s);
                }
                // Tells the Browser we're done sending
                pageReader.close();
                pageWriter.close();
                requestHandler.close();
            } while (HTTPMessage.length() != 0);
        } catch (Exception e) {
            System.out.println(e.toString());
            System.out.println("\n");
            e.printStackTrace();
        }
    }
}
and I get this error message. I am supposed to get a web page in IE, but all I get is this error message.
Waiting For IE to request a page:
Page Requested: Request Header:
GET
java.lang.StringIndexOutOfBoundsException: String index out of range: -7
at java.lang.String.substring(Unknown Source)
at WebServer.main(WebServer.java:39)
This error is thrown because the String HTTPMessage does not contain the string "HTTP/1.1". Hence
HTTPMessage.indexOf("HTTP/1.1") => returns -1
So inside your substring call, this is what actually gets passed:
HTTPMessage.substring(5, -2);
Hence the error; the -7 in the message is the resulting length, endIndex - beginIndex = -2 - 5.
To solve this error, first check whether HTTPMessage contains the required string, and only then compute the substring. Make the following change:
if (lineCount == 1 && HTTPMessage.indexOf("HTTP/1.1") != -1) {
    requestedFile = "WebRoot\\"
            + HTTPMessage.substring(5,
                    HTTPMessage.indexOf("HTTP/1.1") - 1);
    requestedFile = requestedFile.trim();
}
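Note also why HTTPMessage never contains "HTTP/1.1" in the first place: requestReader.next() returns one whitespace-delimited token at a time, so the first read yields just "GET" rather than the full request line, which is exactly what the output above shows. Reading the whole line sidesteps this; a minimal sketch, assuming the usual "GET /file HTTP/1.1" request-line shape:

// Read the entire request line, e.g. "GET /default.htm HTTP/1.1"
String requestLine = requestReader.nextLine();
if (requestLine.contains("HTTP/1.1")) {
    // substring(5) skips the leading "GET /" (five characters)
    requestedFile = "WebRoot\\" + requestLine.substring(5,
            requestLine.indexOf("HTTP/1.1") - 1).trim();
}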

Grabbing text from websites

I have this small chunk of code that grabs the HTML code from a website. I'm interested in parsing a certain section of that code, several times. More specifically, I'm making a Pokédex and would like to parse certain descriptions from, say, a Bulbapedia page, http://bulbapedia.bulbagarden.net/wiki/Bulbasaur_(Pok%C3%A9mon) for example. How would I make this parser take just the description of Bulbasaur? How would I create a boundary to start and stop at?
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;

public class WebCrawler {
    public static void main(String[] args) {
        try {
            URL google = new URL("http://pokemondb.net/pokedex/bulbasaur");
            URLConnection yc = google.openConnection();
            BufferedReader in = new BufferedReader(new InputStreamReader(yc.getInputStream()));
            String inputLine;
            while ((inputLine = in.readLine()) != null) {
                System.out.println(inputLine);
            }
            in.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
You can use Jsoup; with this code you can get the description of Bulbasaur:
import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class Test {
    public static void main(String[] args) throws IOException {
        Document doc = Jsoup
                .connect("http://bulbapedia.bulbagarden.net/wiki/Bulbasaur_(Pok%C3%A9mon)")
                .get();
        // Select every paragraph inside the page's main content div
        Elements newsHeadlines = doc.select("#mw-content-text p");
        for (Element paragraph : newsHeadlines) {
            System.out.println(paragraph.toString());
        }
    }
}
Where mw-content-text is the main content div.
Try with Jsoup.
Its selector syntax is jQuery-like.
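To narrow the output down to just the opening description rather than every paragraph, here is a minimal sketch building on the answer above (it assumes, as that answer does, that the intro text is the first paragraph under #mw-content-text):

import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class FirstParagraph {
    public static void main(String[] args) throws IOException {
        Document doc = Jsoup
                .connect("http://bulbapedia.bulbagarden.net/wiki/Bulbasaur_(Pok%C3%A9mon)")
                .get();
        // first() returns the first matching element, or null if nothing matched
        Element intro = doc.select("#mw-content-text p").first();
        if (intro != null) {
            System.out.println(intro.text()); // text() strips the HTML tags
        }
    }
}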

Parse HTML links from a google query

First, the revised code, which throws javax.swing.text.ChangedCharSetException:
import java.io.*;
import java.net.*;

public class Main
{
    public static void main(String[] args) throws IOException, Exception
    {
        String query = "#pragma";
        Socket s = new Socket("google.com", 80);
        PrintStream p = new PrintStream(s.getOutputStream());
        p.print("GET /search?q=" + query + " HTTP/1.0\r\n");
        p.print("User-Agent: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)\r\n");
        p.print("Connection: close\r\n\r\n");
        InputStreamReader in = new InputStreamReader(s.getInputStream());
        BufferedReader buffer = new BufferedReader(in);
        // String line;
        //
        // while ((line = buffer.readLine()) != null)
        // { System.out.println(line); }
        HTMLUtils.ParseLinks(buffer);
        in.close();
    }
}
import java.io.BufferedReader;
import java.io.IOException;
//import java.io.FileReader;
import java.io.Reader;
import java.util.List;
import java.util.ArrayList;

import javax.swing.text.html.parser.ParserDelegator;
import javax.swing.text.html.HTMLEditorKit.ParserCallback;
import javax.swing.text.html.HTML.Tag;
import javax.swing.text.html.HTML.Attribute;
import javax.swing.text.MutableAttributeSet;

public class HTMLUtils
{
    private HTMLUtils() {}

    public static List<String> extractLinks(Reader reader) throws IOException
    {
        final ArrayList<String> list = new ArrayList<String>();
        ParserDelegator parserDelegator = new ParserDelegator();
        ParserCallback parserCallback = new ParserCallback()
        {
            public void handleText(final char[] data, final int pos) { }

            public void handleStartTag(Tag tag, MutableAttributeSet attribute, int pos)
            {
                if (tag == Tag.A) {
                    String address = (String) attribute.getAttribute(Attribute.HREF);
                    list.add(address);
                }
            }

            public void handleEndTag(Tag t, final int pos) { }
            public void handleSimpleTag(Tag t, MutableAttributeSet a, final int pos) { }
            public void handleComment(final char[] data, final int pos) { }
            public void handleError(final java.lang.String errMsg, final int pos) { }
        };
        // Note: the third argument (ignoreCharSet) is false, which is what makes
        // ParserDelegator throw ChangedCharSetException when the page declares a charset
        parserDelegator.parse(reader, parserCallback, false);
        return list;
    }

    public static void ParseLinks(BufferedReader buffer) throws Exception {
        //FileReader reader = new FileReader("buffer");
        List<String> links = HTMLUtils.extractLinks(buffer);
        for (String link : links) {
            System.out.println(link);
        }
    }
}
Notice that the user agent in this example identifies as IE.
Now I have 3 problems:
1. How can I pass the HTMLUtils.ParseLinks method a "raw buffer" instead of the HTML file it expects? (I can write the buffer to a file, but I guess that is unnecessary.)
2. I don't know how to put inverted commas (" ") inside the query string in order to search for the whole phrase, i.e.: query = ""New York Yankees""
3. Is it so complicated to get the User-Agent string from the host machine?
I should say that HTMLUtils is an imported class that I use, and I don't really understand what's going on in there. I'll try to understand it once it works [-8
Thanks
Have a read of http://code.google.com/apis/ajaxsearch/, it's going to be a lot easier to get the data out of a JSON string than digging through acres of HTML. There's an open source Java class for digesting JSON: http://www.json.org/java/. Transferring the JSON will require a lot less bandwidth too!
If you want to do it in Java, you should consider using XPath to extract all links from the response. Therefore you first have to convert the response to XML. Then you can apply an XPath query like
//a/@href
to extract all href attributes for links. You can modify the query to only include links from the Google results and not from advertisements etc.
Here is another tutorial to get you started.
Happy coding.
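If the response has already been converted to well-formed XML, the standard javax.xml.xpath API can run that query directly. A minimal self-contained sketch (the tiny XML document here is a stand-in for the converted Google response):

import java.io.StringReader;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathLinkDemo {
    public static void main(String[] args) throws Exception {
        // Stand-in for the search results page after HTML-to-XML conversion
        String xml = "<html><body><a href=\"http://example.com/one\">one</a>"
                + "<a href=\"http://example.com/two\">two</a></body></html>";
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        // Select every href attribute of every anchor tag
        NodeList hrefs = (NodeList) xpath.evaluate("//a/@href", doc, XPathConstants.NODESET);
        for (int i = 0; i < hrefs.getLength(); i++) {
            System.out.println(hrefs.item(i).getNodeValue());
        }
    }
}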
BTW: To avoid mistakes when you create your HTTP request and (even more important) to avoid unnecessary work, you could use a library like Apache Commons HTTPClient. This would reduce your work to:
import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.HttpMethod;
import org.apache.commons.httpclient.HttpStatus;
import org.apache.commons.httpclient.methods.GetMethod;

HttpClient client = new HttpClient();
HttpMethod method = new GetMethod("http://www.google.com/search?q=" + query);
int statusCode = client.executeMethod(method);
if (statusCode != HttpStatus.SC_OK) {
    System.err.println("Method failed: " + method.getStatusLine());
}
String response = new String(method.getResponseBody());
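Regarding the second problem above: the query has to be URL-encoded before it goes onto the request line anyway, and that encoding handles the embedded inverted commas as well as the spaces. A minimal sketch using only the standard library:

import java.net.URLEncoder;

public class EncodeDemo {
    public static void main(String[] args) throws Exception {
        // \" embeds the inverted commas; URLEncoder makes the result URL-safe
        String query = URLEncoder.encode("\"New York Yankees\"", "UTF-8");
        System.out.println("GET /search?q=" + query + " HTTP/1.0");
        // prints: GET /search?q=%22New+York+Yankees%22 HTTP/1.0
    }
}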
