Resolve HtmlCleaner issue of getting HTTP response code 403 - java

I'm using HtmlCleaner to get data from a website, but I keep getting this error:
Server returned HTTP response code: 403 for URL: http://www.groupon.com/browse/chicago?z=skip
I'm not sure what I'm doing wrong, because I've used the same code before and it worked perfectly.
Is anyone able to help me, please?
Code is below:
public ArrayList ParseGrouponDeals(ArrayList arrayList) {
    try {
        CleanerProperties props = new CleanerProperties();
        props.setTranslateSpecialEntities(true);
        props.setTransResCharsToNCR(true);
        props.setOmitComments(true);
        TagNode root = new HtmlCleaner(props).clean(new URL("http://www.groupon.com/browse/chicago?z=skip"));
        // Get the wrapper.
        Object[] objects = root.evaluateXPath("//*[@id=\"browse-deals\"]");
        TagNode dealWrapper = (TagNode) objects[0];
        // Get the children.
        TagNode[] todayDeals = dealWrapper.getElementsByAttValue("class", "deal-list-tile grid_5_third", true, true);
        System.out.println("++++ Groupon Deal Today: " + todayDeals.length + " deals");
        for (int i = 0; i < todayDeals.length; i++) {
            String link = String.format("http://www.groupon.com%s", todayDeals[i].findElementByAttValue("class", "deal-permalink", true, true).getAttributeByName("href").toString());
            arrayList.add(link);
        }
        return arrayList;
    } catch (Exception e) {
        System.out.println("Error parsing Groupon: " + e.getMessage());
        e.printStackTrace();
    }
    return null;
}

For me, adding a 'User-Agent' header solves the problem; use it like in this snippet:
final URL urlSB = new URL("http://www.groupon.com/browse/chicago?z=skip");
final URLConnection urlConnection = urlSB.openConnection();
urlConnection.addRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 6.3; WOW64; rv:31.0) Gecko/20100101 Firefox/31.0");
urlConnection.connect();
final HtmlCleaner cleaner = new HtmlCleaner();
final CleanerProperties props = cleaner.getProperties();
props.setNamespacesAware(false);
final TagNode tagNodeRoot = cleaner.clean(urlConnection.getInputStream());

Related

Why is it saying this is not a JSON object?

I am new to JSON parsing and I'm trying to figure out why it's returning null.
Here is my Java code (if you aren't familiar with the Spigot API: https://hub.spigotmc.org/javadocs/spigot/overview-summary.html).
Can you tell me what I am doing wrong? I'll give the Gson part of the code first and then the rest; think of it as just outputting the JSON in a console if you don't feel like reading the API.
try {
    URL hypixel = new URL("https://api.hypixel.net/player?key=apikey&name=" + username);
    URLConnection urlConn = hypixel.openConnection();
    urlConn.setRequestProperty("User-Agent", "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.4; en-US; rv:1.9.2.2) Gecko/20100316 Firefox/3.6.2");
    urlConn.getDoOutput();
    try (final BufferedReader reader = new BufferedReader(new InputStreamReader(urlConn.getInputStream()))) {
        final JsonParser parser = new JsonParser();
        parser.parse(reader.readLine());
        final JsonObject object = parser.parse("").getAsJsonObject();
        String userId = object.getAsJsonObject("player").get("_id").getAsString();
        p.sendMessage(ChatColor.GREEN + "UID: " + userId);
    }
} catch (IOException e) {
    p.sendMessage(ChatColor.RED + "Something went wrong!");
}
(p.sendMessage would be the thing going to the console)
Here is all of the code:
@Override
public boolean onCommand(CommandSender sender, Command command, String label, String[] args) {
    Player p = (Player) sender;
    if (command.getName().equalsIgnoreCase("hypixel")) {
        if (args.length == 2) {
            String username = args[0];
            try {
                URL hypixel = new URL("https://api.hypixel.net/player?key=apikey&name=" + username);
                URLConnection urlConn = hypixel.openConnection();
                urlConn.setRequestProperty("User-Agent", "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.4; en-US; rv:1.9.2.2) Gecko/20100316 Firefox/3.6.2");
                urlConn.getDoOutput();
                try (final BufferedReader reader = new BufferedReader(new InputStreamReader(urlConn.getInputStream()))) {
                    final JsonParser parser = new JsonParser();
                    parser.parse(reader.readLine());
                    final JsonObject object = parser.parse("").getAsJsonObject();
                    String userId = object.getAsJsonObject("player").get("_id").getAsString();
                    p.sendMessage(ChatColor.GREEN + "UID: " + userId);
                }
            } catch (IOException e) {
                p.sendMessage(ChatColor.RED + "Something went wrong!");
            }
        }
    }
    return false;
}
Any help is appreciated, thank you!
(Oh, and here is the part of the response from the API that I want to parse:)
{"success":true,"player":{"_id":"5442f08f48b8f1e1e64a0400"}}
parser.parse("").getAsJsonObject()
is expecting "" to be JSON, which it is not. Note that the result of parser.parse(reader.readLine()) on the previous line is thrown away; it is that result you should call getAsJsonObject() on.
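To make the fix concrete, here is a minimal sketch of the corrected parsing step, assuming Gson is on the classpath; the key change is parsing the string that was actually read instead of an empty string. The helper name extractPlayerId is mine, not from the original code:

```java
import com.google.gson.JsonObject;
import com.google.gson.JsonParser;

public class HypixelParseExample {

    // Parse the JSON body and pull out player._id.
    // The result of parse(...) must be used, not discarded.
    static String extractPlayerId(String json) {
        JsonObject object = new JsonParser().parse(json).getAsJsonObject();
        return object.getAsJsonObject("player").get("_id").getAsString();
    }

    public static void main(String[] args) {
        // Sample response taken verbatim from the question above.
        String body = "{\"success\":true,\"player\":{\"_id\":\"5442f08f48b8f1e1e64a0400\"}}";
        System.out.println("UID: " + extractPlayerId(body));
    }
}
```

In the original loop you would read the full response into a StringBuilder first (readLine() only returns the first line) and pass that string to extractPlayerId.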

java how to force refresh on url connection

I am behind a proxy server and I am using the following code to get data from a URL:
private static String getData(String address) throws Exception {
    System.setProperty("java.net.useSystemProxies", "true");
    Date d = new Date();
    String finalAdress = address + "?x=" + d.getTime();
    URL url = new URL(finalAdress);
    System.out.println(finalAdress);
    InputStream html = null;
    HttpsURLConnection con = (HttpsURLConnection) url.openConnection();
    con.setUseCaches(false);
    con.setDefaultUseCaches(false);
    con.setRequestMethod("GET");
    con.addRequestProperty("Cache-Control", "no-cache");
    //con.addRequestProperty("Cache-Control", "max-age=0");
    con.addRequestProperty("Expires", "0");
    con.addRequestProperty("Pragma", "no-cache");
    html = con.getInputStream();
    //html = url.openStream();
    int c = 0;
    StringBuffer buffer = new StringBuffer("");
    while (c != -1) {
        c = html.read();
        buffer.append((char) c);
    }
    return buffer.toString();
}
However, when data changes on the server side, I still for some time get the same old (cached) data as a response.
I tried to use:
-Cache-Control headers
-slightly modified URLs: address+"?x="+d.getTime();
but nothing seems to work.
Is there a way to force refresh as I would with a web browser (ctrl-F5) ?
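As an aside, independent of the caching question, the read loop above has a subtle bug: when read() returns -1, the cast (char) -1 appends a stray 0xFFFF character to the buffer before the loop exits. A minimal corrected loop, with the helper name readAll being mine for illustration:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ReadAllExample {

    // Read the stream to the end, checking for -1 BEFORE appending,
    // so the end-of-stream sentinel is never cast to a char.
    static String readAll(InputStream in) throws IOException {
        StringBuilder buffer = new StringBuilder();
        int c;
        while ((c = in.read()) != -1) {
            buffer.append((char) c);
        }
        return buffer.toString();
    }

    public static void main(String[] args) throws IOException {
        String s = readAll(new ByteArrayInputStream("hello".getBytes()));
        System.out.println(s); // prints "hello" with no trailing garbage character
    }
}
```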

I'd like to send list to php file and save list to mysqli

Hi, I'm trying to send a course list with more than 2,000 course entries to MySQL through a PHP file, but whenever I try to send the list, it never reaches the server.
Can you help me solve this problem?
First, the Java source:
public static void sendCourseInfoToDB(List<Subject> subjects, String url) {
    try {
        // url is my *.php file
        URL target = new URL(url);
        HttpURLConnection con = (HttpURLConnection) target.openConnection();
        con.setRequestMethod("POST");
        con.setDoOutput(true);
        con.setDoInput(true);
        con.setUseCaches(false);
        con.setRequestProperty("Content-Type", "text/html; charset = utf-8");
        DataOutputStream out = new DataOutputStream(con.getOutputStream());
        int len = subjects.size();
        for (int i = 0; i < len; ++i) {
            //String t = subjects.get(i).toString();
            out.writeBytes(subjects.get(i).toString());
            out.flush();
        }
        out.flush();
        out.close();
        int responseCode = con.getResponseCode();
        System.out.println("Post request to URL: " + url);
        System.out.println("Post params: " + subjects.get(0).toString());
        System.out.println("Response code: " + Integer.toString(responseCode));
        con.disconnect();
    } catch (Exception e) {
        e.printStackTrace();
    }
}
The Subject class overrides toString(); the string it returns is built from UTF-8 encoded parameters, like this:
courseCode = 12156&courseName = %EC%8B%A0%EC%86%8C%EC%9E%AC%EA%B3%B5%ED%95%99%EB%B6%80&subjectName = %EC%A2%85%ED%95%A9%EA%B3%BC%EC%A0%9C%EC%84%A4%EA%B3%841&kindOfSubject = %EC%A0%84%EA%B3%B5&score = 2
and the PHP file:
<?php
header("Content-Type : text/html; charset = utf-8");
$mysqli = new mysqli("localhost", "user", "password", "db");
if ($mysqli->mysqli_errno) {
    print $mysqli_error;
    exit();
}
$courseCode = $_POST["courseCode"];
$courseName = $_POST["courseName"];
$subjectName = $_POST["subjectName"];
$kindOfSubject = $_POST["kindOfSubject"];
$score = $_POST["score"];
$mysqli->query("INSERT INTO COURSE VALUES('$courseCode', '$courseName', '$subjectName', '$kindOfSubject', '$score')");
$response = $courseCode;
echo $response;
?>
Should I call the sendCourseInfoToDB function every time I send course info to the DB? I don't know what is wrong. Help me, crazy coding people!
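One thing worth checking here: PHP only populates $_POST when the request body is sent as application/x-www-form-urlencoded (or multipart/form-data), not text/html, and the pairs must have no spaces around = or & (otherwise the keys arrive as "courseCode " with a trailing space). A small sketch of building such a body with the standard URLEncoder; the method names are mine, not from the original code:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class FormBodyExample {

    // Encode one key=value pair, with no spaces around '='.
    static String encodePair(String key, String value) throws UnsupportedEncodingException {
        return URLEncoder.encode(key, "UTF-8") + "=" + URLEncoder.encode(value, "UTF-8");
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // Pairs are joined with '&'; multibyte values become %XX escape sequences.
        String body = encodePair("courseCode", "12156")
                + "&" + encodePair("courseName", "신소재공학부");
        System.out.println(body);
        // The matching request header would be:
        // con.setRequestProperty("Content-Type", "application/x-www-form-urlencoded; charset=UTF-8");
    }
}
```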

Unexpected HTTP 403 error in Java

I am using an API in my Java app and triggering this URL (http://checkdnd.com/api/check_dnd_no_api.php?mobiles=9999999999). I get an HTTP 403 error in the console, but in a web browser no error occurs and I get the expected response. I also tried other URLs and they work fine without any problems or errors.
So, what is the problem with this URL and what should I do?
Here is source code :
Main.java
import org.json.simple.*;
import org.json.simple.parser.*;

public class Main {
    public static void main(String[] args) throws Exception {
        String numb = "9999999999,8888888888";
        String response = new http_client("http://checkdnd.com/api/check_dnd_no_api.php?mobiles=" + numb).response;
        System.out.println(response);
        // parsing response
        Object obj = JSONValue.parse(response);
        JSONObject jObj = (JSONObject) obj;
        String msg = (String) jObj.get("msg");
        System.out.println("MESSAGE : " + msg);
        JSONObject msg_text = (JSONObject) jObj.get("msg_text");
        String[] numbers = numb.split(",");
        for (String number : numbers) {
            if (number.length() != 10 || number.matches(".*[A-Za-z].*")) {
                System.out.println(number + " is invalid.");
            } else {
                if (msg_text.get(number).equals("Y")) {
                    System.out.println(number + " is DND Activated.");
                } else {
                    System.out.println(number + " is not DND Activated.");
                }
            }
        }
    }
}
Now, http_client.java:
import java.net.*;
import java.io.*;

public class http_client {
    String response = "";

    http_client(String URL) throws Exception {
        URL url = new URL(URL);
        HttpURLConnection con = (HttpURLConnection) url.openConnection();
        con.setRequestMethod("GET");
        BufferedReader bs = new BufferedReader(new InputStreamReader(con.getInputStream()));
        String data = "";
        String response = "";
        while ((data = bs.readLine()) != null) {
            response = response + data;
        }
        con.disconnect();
        url = null;
        con = null;
        this.response = response;
    }
}
It is a wee bit difficult to determine exactly where your problem lies, but my first guess would be that the link you provided (http://checkdnd.com/api/check_dnd_no_api.php?mobiles=9999999999) is only accessible through a Secure Socket Layer (SSL). In other words, the link should start with https:// instead of http://.
To validate this, simply make the change to your URL string: https://checkdnd.com/api/check_dnd_no_api.php?mobiles=9999999999 and try again.
You won't have this issue in a browser, for the simple reason that browsers will generally try both protocols to make a connection. It is also up to the website which protocols are acceptable; many allow both and some just don't.
To check whether a URL string is utilizing a valid protocol, you can use this little method I quickly whipped up:
/**
 * This method will take the supplied URL String regardless of the protocol (http or https)
 * specified at the beginning of the string, and will return whether it is actually an
 * "http" (no SSL) or "https" (is SSL) protocol. A connection to the URL is attempted first
 * with the http protocol and if successful (by way of data acquisition) that protocol is
 * returned. If not, then the https protocol is attempted, and if successful then
 * that protocol is returned. If neither protocol was successful then null is returned.<br><br>
 *
 * Returns null if the supplied URL String is invalid, a protocol does not
 * exist, or a valid connection to the URL can not be established.<br><br>
 *
 * @param webLink (String) The full link path.<br>
 *
 * @return (String) Either "http" for a non-SSL link or "https" for an SSL link.
 * Null is returned if the supplied URL String is invalid, a protocol does
 * not exist, or a valid connection to the URL can not be established.
 */
public static String isHttpOrHttps(String webLink) {
    URL url;
    try {
        url = new URL(webLink);
    } catch (MalformedURLException ex) {
        return null;
    }
    String protocol = url.getProtocol();
    if (protocol.equals("")) {
        return null;
    }
    URLConnection yc;
    try {
        yc = url.openConnection();
        BufferedReader in = new BufferedReader(new InputStreamReader(yc.getInputStream()));
        in.close();
        return "http";
    } catch (IOException e) {
        // Do nothing....check for https instead.
    }
    try {
        yc = new URL(webLink).openConnection();
        // send request for page data...
        yc.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.95 Safari/537.11");
        yc.connect();
        BufferedReader in = new BufferedReader(new InputStreamReader(yc.getInputStream()));
        in.close();
        return "https";
    } catch (IOException e) {
        // Do nothing....allow for null to be returned.
    }
    return null;
}
To use this method:
// Note that the http protocol is supplied within the url string:
String protocol = isHttpOrHttps("http://checkdnd.com/api/check_dnd_no_api.php?mobiles=9999999999");
System.out.println(protocol);
The output to the console will be: https. The isHttpOrHttps() method has determined that https is the protocol to use in order to acquire data, even though http was supplied.
To pull the page source from the web page you can perhaps use a method like this:
/**
 * Returns a List ArrayList containing the page source for the supplied web
 * page link.<br><br>
 *
 * @param link (String) The URL address of the web page to process.<br>
 *
 * @return (List ArrayList) A List ArrayList containing the page source for
 * the supplied web page link.
 */
public static List<String> getWebPageSource(String link) {
    if (link.equals("")) {
        return null;
    }
    try {
        URL url = new URL(link);
        URLConnection yc = null;
        // If url is an SSL endpoint (using a Secure Socket Layer such as https)...
        if (link.startsWith("https:")) {
            yc = new URL(link).openConnection();
            // send request for page data...
            yc.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.95 Safari/537.11");
            yc.connect();
        }
        // and if not an SSL endpoint (just http)...
        else {
            yc = url.openConnection();
        }
        BufferedReader in = new BufferedReader(new InputStreamReader(yc.getInputStream()));
        String inputLine;
        List<String> sourceText = new ArrayList<>();
        while ((inputLine = in.readLine()) != null) {
            sourceText.add(inputLine);
        }
        in.close();
        return sourceText;
    } catch (MalformedURLException ex) {
        // Do whatever you want with the exception.
        ex.printStackTrace();
    } catch (IOException ex) {
        // Do whatever you want with the exception.
        ex.printStackTrace();
    }
    return null;
}
In order to utilize both of the methods supplied here, you can try something like this:
String netLink = "http://checkdnd.com/api/check_dnd_no_api.php?mobiles=9999999999";
String protocol = isHttpOrHttps(netLink);
String netLinkProtocol = netLink.substring(0, netLink.indexOf(":"));
if (!netLinkProtocol.equals(protocol)) {
    netLink = protocol + netLink.substring(netLink.indexOf(":"));
}
List<String> list = getWebPageSource(netLink);
for (int i = 0; i < list.size(); i++) {
    System.out.println(list.get(i));
}
And the console output will display:
{"msg":"success","msg_text":{"9999999999":"N"}}

URL set Connection Timeout is not working

I am using RSS feeds to get the latest news, and I get an XML response back.
The issue I am facing is that if it takes longer than 5 seconds, I just want the program to stop.
This is my code (for testing purposes I have set the time to 1 second):
public static void main(String args[]) {
    Connection dbConnection = null;
    PreparedStatement inserpstmt = null;
    try {
        final JSONArray latestnews = new JSONArray();
        builder = getDocumentBuilderInstance();
        final URL url = new URL("http://www.rssmix.com/u/8171434/rss.xml");
        url.openConnection().setConnectTimeout(1000);
        url.openConnection().setReadTimeout(1000);
        final Document doc = builder.parse(url.openStream());
        final NodeList items = doc.getElementsByTagName("item");
        for (int i = 0; i < items.getLength(); i++) {
            final Element item = (Element) items.item(i);
            final String title = getValue(item, "title");
            System.out.println(title);
        }
    } catch (Exception e) {
        e.printStackTrace();
        e.getMessage();
    } catch (Throwable e) {
        e.getMessage();
        e.printStackTrace();
    } finally {
    }
}
But could you please let me know why this isn't stopping, and keeps waiting for more than 1 second?
Edited code:
StringBuffer sb = new StringBuffer("http://www.rssmix.com/u/8171434/rss.xml");
URLConnection conn = new URL(sb.toString()).openConnection();
conn.setConnectTimeout(7000);
conn.setReadTimeout(7000);
final Document doc = builder.parse(new InputSource(conn.getInputStream()));
You should probably approach this in the following fashion.
final URL url = new URL("http://www.rssmix.com/u/8171434/rss.xml");
URLConnection urlConn = url.openConnection();
urlConn.setConnectTimeout(1000);
urlConn.setReadTimeout(1000);
final Document doc = builder.parse(urlConn.getInputStream());
In the above code, each time you call openConnection() you get a new connection object. You are also using openStream(), which is equivalent to openConnection().getInputStream(). So all the timeouts are set on different connection objects, and no timeout is set on the connection object from which the InputStream is actually taken. That is why it was not working. The code below works because the timeouts are set on the same object from which the InputStream is retrieved:
URLConnection connection = url.openConnection();
connection.setConnectTimeout(1000);
connection.setReadTimeout(1000);
connection.connect();
