Java 403 Exception When My Bot Tries To Send An Embedded Link

I have a Discord bot written in Java, and one of its jobs is to send an embedded link (to a site I don't own) every time someone leaves the server. It worked the first 2-3 times, but every time after that I get the following exception:
java.io.IOException: Server returned HTTP response code: 403 for URL: ...
Example link:
https://signature.hzgaming.net/sig.php?name=Juntao_Lubu&style=1
I've tried numerous solutions I found online (setting a User-Agent and similar tricks), but none of them work for me.
Is there any other workaround for this?
Code:
String name = allMembers.get(mEvent.getUser().getDiscriminatedName()).replace(" ", "_");
String link = "https://signature.hzgaming.net/sig.php?name=" + name + "&style=1";
URLConnection urlCon = new URL(link).openConnection();
urlCon.addRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.29 Safari/537.36");
StringBuilder textBuilder = new StringBuilder();
// try-with-resources closes the stream even if reading fails
try (Reader reader = new BufferedReader(new InputStreamReader(urlCon.getInputStream(), StandardCharsets.UTF_8))) {
    int c;
    while ((c = reader.read()) != -1) {
        textBuilder.append((char) c);
    }
}
// strip HTML tags, keeping only the text
String result = textBuilder.toString().replaceAll("<[^>]*>", "");
if (!result.equalsIgnoreCase("Non-Existant Player") && !result.equalsIgnoreCase("Non-ExistantPlayer")) {
    new MessageBuilder().append(link).send((TextChannel) server.getChannelById(973242211623895080L).get());
}
Thanks in advance.
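Whatever makes the site start refusing requests, the bot can at least avoid failing on it. `getInputStream()` throws an `IOException` for a 403 response, whereas `HttpURLConnection.getResponseCode()` simply returns the status. Below is a minimal sketch, assuming the bot should still post the link when the existence check cannot be made; `SignatureCheck`, `fetchSignatureText`, and `isExistingPlayer` are hypothetical names, and the sketch does not get around the block itself.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.Optional;
import java.util.stream.Collectors;

public class SignatureCheck {

    // Hypothetical helper: returns the page text, or empty if the server refused the request.
    static Optional<String> fetchSignatureText(String link) throws IOException {
        HttpURLConnection con = (HttpURLConnection) URI.create(link).toURL().openConnection();
        con.setRequestProperty("User-Agent", "Mozilla/5.0");
        int status = con.getResponseCode(); // returns 403 instead of throwing
        if (status != HttpURLConnection.HTTP_OK) {
            System.err.println("Server returned " + status + " for " + link);
            con.disconnect();
            return Optional.empty();
        }
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(con.getInputStream(), StandardCharsets.UTF_8))) {
            return Optional.of(reader.lines().collect(Collectors.joining("\n")));
        }
    }

    // Strips HTML tags and applies the same "Non-Existant Player" check as the question.
    static boolean isExistingPlayer(String html) {
        String text = html.replaceAll("<[^>]*>", "").trim();
        return !text.equalsIgnoreCase("Non-Existant Player")
                && !text.equalsIgnoreCase("Non-ExistantPlayer");
    }

    public static void main(String[] args) throws IOException {
        String link = "https://signature.hzgaming.net/sig.php?name=Juntao_Lubu&style=1";
        Optional<String> page = fetchSignatureText(link);
        // Assumption: if the site refuses the request, post the link without checking it.
        boolean post = page.map(SignatureCheck::isExistingPlayer).orElse(true);
        System.out.println(post ? "post " + link : "skip " + link);
    }
}
```

Posting unchecked on a refusal is only one possible policy; skipping the message instead is just as reasonable if invalid links are worse than missing ones.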

Related

HTML contents different from Google "View page source"

I've read on this page that this has something to do with the user agent, but I can't find a way to get the one Google Chrome uses.
I'm trying to fetch the HTML contents of, say, https://www.kayak.fr/flights/TLS-ATH/2019-10-04/2019-10-07?sort=price_a. When I click "View page source" in Google Chrome, I get the prices etc. (what I need), but I can't access them with my Java code.
Do I have to find the user agent of my Google Chrome? I found this page, but I still get exactly the same result as before in Java.
Any ideas?
Here's my code:
try {
    URL url = new URL("https://www.kayak.fr/flights/TLS-ATH/2019-10-04/2019-10-07?sort=price_a");
    URLConnection con = url.openConnection();
    con.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.117 Safari/537.36");
    BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(con.getInputStream(), "UTF-8"));
    String line;
    while ((line = bufferedReader.readLine()) != null) {
        System.out.println(line);
    }
    bufferedReader.close();
} catch (IOException e) {
    e.printStackTrace();
}
The setRequestProperty call is fairly arbitrary in this code because I'm still testing.
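A way to narrow this down is to send a fuller set of browser-like headers and compare the status code and content type Java receives with what the browser's Network tab shows for the same URL. This sketch uses the `java.net.http` client (Java 11+), which is a swap from the `URLConnection` in the question; the header values and the names `BrowserLikeFetch` and `buildRequest` are illustrative assumptions, not what Kayak is known to require.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class BrowserLikeFetch {

    // Builds a GET request with headers similar to a desktop browser's.
    // Illustrative values: copy the real ones from your browser's developer tools.
    static HttpRequest buildRequest(String url) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36")
                .header("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8")
                .header("Accept-Language", "fr-FR,fr;q=0.9,en;q=0.8")
                .GET()
                .build();
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL) // follow redirects as a browser would
                .build();
        HttpRequest request = buildRequest("https://www.kayak.fr/flights/TLS-ATH/2019-10-04/2019-10-07?sort=price_a");
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        // Compare these values with the browser's view of the same request.
        System.out.println("Status: " + response.statusCode());
        System.out.println("Content-Type: " + response.headers().firstValue("Content-Type").orElse("(none)"));
        System.out.println("Body length: " + response.body().length());
    }
}
```

No `Accept-Encoding: gzip` header is sent, because this client does not decompress responses on its own.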

Java - Read page source from url returns unknown characters

I am using the code below in NetBeans to read the page source from a URL (https://www.amazon.com) with the "UTF-8" charset, but it returns unreadable characters (see the attached image). I have no idea what the problem is and would be grateful if you could help me fix the code. Thanks.
public static String getURLSource(String url) throws IOException
{
    URL urlObject = new URL(url);
    URLConnection urlConnection = urlObject.openConnection();
    urlConnection.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.95 Safari/537.11");
    return toString(urlConnection.getInputStream());
}

private static String toString(InputStream inputStream) throws IOException
{
    try (BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(inputStream, "UTF-8")))
    {
        String inputLine;
        StringBuilder stringBuilder = new StringBuilder();
        while ((inputLine = bufferedReader.readLine()) != null)
        {
            stringBuilder.append(inputLine);
        }
        return stringBuilder.toString();
    }
}
Use HttpsURLConnection instead of URLConnection. See a similar question.
You just need to unzip the content, which the server sends gzip-compressed. Here is the code that worked for me:
HttpClient httpClient = new HttpClient();
try {
    httpClient.setConnectionUrl("https://www.amazon.com");
    ByteBuffer buff = httpClient.setRequestHeader("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.95 Safari/537.11")
            .sendHttpRequestForBinaryResponse(HttpClient.HttpMethod.GET);
    try (
            ByteArrayInputStream bais = new ByteArrayInputStream(buff.array());
            // the raw bytes are gzip-compressed, so unzip them before decoding text
            GZIPInputStream gzis = new GZIPInputStream(bais);
            InputStreamReader isr = new InputStreamReader(gzis, StandardCharsets.UTF_8);
            BufferedReader br = new BufferedReader(isr)
    ) {
        br.lines().forEach(line -> System.out.println(line));
    }
} catch (Exception e) {
    System.out.println(httpClient.getLastResponseCode() + " "
            + httpClient.getLastResponseMessage() + TextUtils.getStacktrace(e, false));
}
A few clarifications: in this example I use a third-party HTTP client class, HttpClient (and also the class TextUtils). Both come from the open-source MgntUtils library, written and maintained by me, but you don't have to use it. The main point is to read the data from the InputStream as binary (as a byte array or ByteBuffer) and then unzip it with GZIPInputStream, as in my example.
If you do want to use the MgntUtils library, you can get it as a Maven artifact or from GitHub (including source code and Javadoc). The Javadoc is also available online.
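The same unzipping can be done with the JDK alone. A minimal sketch, assuming the server announces compression in the `Content-Encoding` response header and the page is UTF-8; `GzipAwareReader` and `readBody` are hypothetical names, and this is not the MgntUtils approach above.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;

public class GzipAwareReader {

    // Decodes a response body, unzipping it first when Content-Encoding is gzip.
    static String readBody(InputStream raw, String contentEncoding) throws IOException {
        InputStream in = "gzip".equalsIgnoreCase(contentEncoding) ? new GZIPInputStream(raw) : raw;
        try (in) {
            // Assumption: the page is UTF-8; otherwise use the charset from Content-Type.
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        HttpURLConnection con = (HttpURLConnection) URI.create("https://www.amazon.com").toURL().openConnection();
        con.setRequestProperty("User-Agent", "Mozilla/5.0");
        // getContentEncoding() returns the Content-Encoding response header, or null.
        String encoding = con.getContentEncoding();
        System.out.println(readBody(con.getInputStream(), encoding));
    }
}
```

Checking the header rather than always wrapping in `GZIPInputStream` avoids a `ZipException` when the server happens to send uncompressed content.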

403 forbidden for url in java but not in browser

I'm behind a corporate firewall, but I can paste the URL into my browser, with or without the browser's proxy settings enabled, and retrieve the data fine. I just can't do it from Java.
Any ideas?
Code:
private static String getURLToString(String strUrl) throws IOException {
    // LOG.debug("Calling URL: [" + strUrl + "]");
    URLConnection connection = new URL(strUrl).openConnection();
    connection.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.95 Safari/537.11");
    connection.connect();
    StringBuilder content = new StringBuilder();
    try (BufferedReader br = new BufferedReader(new InputStreamReader(connection.getInputStream(), Charset.forName("UTF-8")))) {
        String inputLine;
        while ((inputLine = br.readLine()) != null) {
            content.append(inputLine);
        }
    }
    return content.toString();
}
Error:
java.io.FileNotFoundException: Response: '403: Forbidden' for url: '<url here>'
at weblogic.net.http.HttpURLConnection.getInputStream(HttpURLConnection.java:778)
at weblogic.net.http.SOAPHttpURLConnection.getInputStream(SOAPHttpURLConnection.java:37)
Note: the '<url here>' portion is for anonymizing.
Because you are receiving a "403: Forbidden" error, your Java code can reach the server, but the request is missing something the server requires.
In the browser, press F12 (developer tools) and request the URL again. Check the headers and cookies that are being sent. Most likely you will need to add one or more of them to your request to receive the content.
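Once you have the browser's headers and cookies from the developer tools, they can be replayed on a `URLConnection` before it connects. A minimal sketch; the header values, the cookie name `SESSIONID`, and the names `CopiedHeaders` and `applyHeaders` are placeholders, not what any particular server expects.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLConnection;
import java.util.LinkedHashMap;
import java.util.Map;

public class CopiedHeaders {

    // Sets every header (including Cookie) on a connection that has not connected yet;
    // setRequestProperty throws IllegalStateException after connecting.
    static void applyHeaders(URLConnection connection, Map<String, String> headers) {
        headers.forEach(connection::setRequestProperty);
    }

    public static void main(String[] args) throws IOException {
        // Placeholder values: replace them with what the browser's Network tab shows.
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("User-Agent", "Mozilla/5.0");
        headers.put("Accept", "*/*");
        headers.put("Cookie", "SESSIONID=replace-me");

        URLConnection connection = URI.create("https://example.com/").toURL().openConnection();
        applyHeaders(connection, headers);
        // Reads back a header that will be sent once the connection is opened.
        System.out.println(connection.getRequestProperty("User-Agent"));
    }
}
```

Adding the headers one at a time (User-Agent first, then Accept, then cookies) helps show which one the server actually checks.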
Adding "User-Agent" header fixed it for me:
connection.setRequestProperty("User-Agent", "Mozilla/5.0");

403 Forbidden with Java but not web browser?

I am writing a small Java program to get the number of results for a given Google search term. For some reason, Java gets a 403 Forbidden, while web browsers get the right results. Code:
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;

public class DataGetter {
    public static void main(String[] args) throws IOException {
        getResultAmount("test");
    }

    private static int getResultAmount(String query) throws IOException {
        BufferedReader r = new BufferedReader(new InputStreamReader(new URL("https://www.google.com/search?q=" + query).openConnection()
                .getInputStream()));
        String line;
        String src = "";
        while ((line = r.readLine()) != null) {
            src += line;
        }
        System.out.println(src);
        return 1;
    }
}
And the error:
Exception in thread "main" java.io.IOException: Server returned HTTP response code: 403 for URL: https://www.google.com/search?q=test
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(Unknown Source)
at sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(Unknown Source)
at DataGetter.getResultAmount(DataGetter.java:15)
at DataGetter.main(DataGetter.java:10)
Why is it doing this?
You just need to set the User-Agent header for it to work:
URLConnection connection = new URL("https://www.google.com/search?q=" + query).openConnection();
connection.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.95 Safari/537.11");
connection.connect();
BufferedReader r = new BufferedReader(new InputStreamReader(connection.getInputStream(), Charset.forName("UTF-8")));
StringBuilder sb = new StringBuilder();
String line;
while ((line = r.readLine()) != null) {
    sb.append(line);
}
System.out.println(sb.toString());
SSL is handled transparently for you, as your exception stack trace shows (the request already goes through HttpsURLConnectionImpl).
Getting the result count is not that simple, though: after this you have to pretend to be a browser by fetching the cookie and following the redirect link that carries a token.
// response is the page read in the previous step
String response = sb.toString();
String cookie = connection.getHeaderField("Set-Cookie").split(";")[0];
Pattern pattern = Pattern.compile("content=\\\"0;url=(.*?)\\\"");
Matcher m = pattern.matcher(response);
if (m.find()) {
    String url = m.group(1);
    connection = new URL(url).openConnection();
    connection.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.95 Safari/537.11");
    connection.setRequestProperty("Cookie", cookie);
    connection.connect();
    r = new BufferedReader(new InputStreamReader(connection.getInputStream(), Charset.forName("UTF-8")));
    sb = new StringBuilder();
    while ((line = r.readLine()) != null) {
        sb.append(line);
    }
    response = sb.toString();
    pattern = Pattern.compile("<div id=\"resultStats\">About ([0-9,]+) results</div>");
    m = pattern.matcher(response);
    if (m.find()) {
        // the enclosing method must be declared to return long
        long amount = Long.parseLong(m.group(1).replaceAll(",", ""));
        return amount;
    }
}
Running the full code I get 2930000000L as a result.
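The manual copying of `Set-Cookie` into a `Cookie` header can also be left to the JDK: once a `java.net.CookieManager` is installed as the default `CookieHandler`, `URLConnection` stores cookies from responses and sends matching ones on later requests. It does not follow the HTML meta-refresh, so the regex step above is still needed. A minimal sketch; `CookieSession`, `installCookieManager`, and `get` are hypothetical names, and accepting all cookies is an assumption made for simplicity.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.CookieHandler;
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.net.URI;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.util.stream.Collectors;

public class CookieSession {

    // Installs a JVM-wide, in-memory cookie store (null store = default store).
    static CookieManager installCookieManager() {
        CookieManager manager = new CookieManager(null, CookiePolicy.ACCEPT_ALL);
        CookieHandler.setDefault(manager);
        return manager;
    }

    // Fetches a page; cookies are saved from and sent with the request automatically.
    static String get(String url) throws IOException {
        URLConnection connection = URI.create(url).toURL().openConnection();
        connection.setRequestProperty("User-Agent", "Mozilla/5.0");
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(connection.getInputStream(), StandardCharsets.UTF_8))) {
            return r.lines().collect(Collectors.joining("\n"));
        }
    }

    public static void main(String[] args) throws IOException {
        CookieManager manager = installCookieManager();
        get("https://www.google.com/search?q=test");
        // Cookies from the first response will be sent on the follow-up request.
        System.out.println("Stored cookies: " + manager.getCookieStore().getCookies());
    }
}
```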
For me it worked by adding the header:
"Accept": "*/*"
You probably aren't setting the correct headers. Use LiveHttpHeaders (or equivalent) in the browser to see what headers the browser is sending, then emulate them in your code.
It's because the site uses SSL. Try using the Jersey HTTP client. You will probably also have to learn a little about HTTPS and certificates, but I think Jersey can be set to ignore most of the details relating to the actual security.

Passing sessionId obtained from one response to the next request

I need to download a CSV file from Google Insights programmatically. Since it requires authentication, I used ClientLogin to get the session ID.
How do I download the file by passing the session ID as a cookie?
I tried using a new URLConnection object and setting the cookie with the setRequestProperty method, hoping it would authenticate my login, but it doesn't seem to work. I have a feeling I shouldn't use two separate connections; is that true?
If so, how do I pass the session ID as a parameter when I download the file? I also tried using the same connection, and that didn't work either. Please help.
try {
    URL url1 = new URL("https://www.google.com/accounts/ClientLogin?accountType=GOOGLE&Email=*******.com&Passwd=*****&service=trendspro&source=test-test-v1");
    URL url2 = new URL("http://www.google.com/insights/search/overviewReport?cat=0-7&geo=BR&cmpt=geo&content=1&export=1");
    URLConnection conn = url1.openConnection();
    // fake request coming from browser
    conn.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.56 Safari/535.11");
    BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream(), "UTF-8"));
    String f = in.readLine();
    // obtaining the sid (strips the leading "SID=")
    String sid = f.substring(4);
    System.out.println(sid);
    URLConnection conn2 = url2.openConnection();
    conn2.setRequestProperty("Cookie", sid);
    BufferedInputStream i = new BufferedInputStream(conn2.getInputStream());
    FileOutputStream fos = new FileOutputStream("f:/testplans.csv");
    BufferedOutputStream bout = new BufferedOutputStream(fos, 1024);
    byte[] data = new byte[1024];
    int n;
    // write only the bytes actually read on each pass
    while ((n = i.read(data, 0, data.length)) != -1) {
        bout.write(data, 0, n);
    }
    bout.close();
    i.close();
    in.close();
} catch (IOException e) {
    e.printStackTrace();
}
Try the following link. Check the top answer there: it doesn't use the SID but the Auth token.
If it's working for Google Reader, it will probably work for Google Insights as well.
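Following that answer, the ClientLogin response can be parsed for its `Auth` line instead of taking `substring(4)` of the first (`SID=`) line. The `GoogleLogin auth=` header format below comes from the since-retired ClientLogin documentation and should be treated as an assumption; `ClientLoginTokens`, `parseTokens`, and `authorizationHeader` are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

public class ClientLoginTokens {

    // Parses a body of "SID=...", "LSID=...", "Auth=..." lines into a map.
    static Map<String, String> parseTokens(String body) {
        Map<String, String> tokens = new HashMap<>();
        for (String line : body.split("\\R")) { // \R matches \n, \r\n, and \r
            int eq = line.indexOf('=');
            if (eq > 0) {
                // split on the first '=' only, so values may contain '='
                tokens.put(line.substring(0, eq), line.substring(eq + 1));
            }
        }
        return tokens;
    }

    // Builds the value for an "Authorization" request header from the Auth token.
    static String authorizationHeader(Map<String, String> tokens) {
        String auth = tokens.get("Auth");
        if (auth == null) {
            throw new IllegalArgumentException("response contained no Auth token");
        }
        return "GoogleLogin auth=" + auth;
    }

    public static void main(String[] args) {
        // Sample body with made-up token values.
        String body = "SID=sid-value\nLSID=lsid-value\nAuth=auth-value";
        System.out.println(authorizationHeader(parseTokens(body)));
    }
}
```

In the question's code, the download request would then use `conn2.setRequestProperty("Authorization", ClientLoginTokens.authorizationHeader(tokens))` in place of the `Cookie` header, with `tokens` parsed from the whole login response rather than its first line.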
