How to NOT get the mobile version of a website in Java?

This code returns the mobile version of a website. How can I get the desktop version instead?
InputStreamReader page = new InputStreamReader(new URL("http://www.***.com/").openStream());

Use a user agent that matches a desktop browser.
Edit, with sources:
URL url = new URL("http://www.clarku.edu/");
URLConnection connection = url.openConnection();
connection.addRequestProperty("User-Agent", "Mozilla/6.0 (Windows NT 6.2; WOW64; rv:16.0.1) Gecko/20121011 Firefox/16.0.1");
BufferedReader in = new BufferedReader(new InputStreamReader(connection.getInputStream()));
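For reference, a self-contained sketch of the same approach (the URL and user-agent string are only examples; any string that looks like a desktop browser is usually enough):

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;

public class DesktopFetch {
    public static void main(String[] args) throws IOException {
        URL url = new URL("http://www.clarku.edu/");
        URLConnection connection = url.openConnection();
        // Pretend to be a desktop browser so the server does not serve the mobile site
        connection.addRequestProperty("User-Agent",
                "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:60.0) Gecko/20100101 Firefox/60.0");
        try (BufferedReader in = new BufferedReader(new InputStreamReader(connection.getInputStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}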

Related

HTML contents different from Google "View page source"

I've read on this page that this has something to do with the user agent being used, but I can't find a way to get the one used by Google Chrome.
I'm trying to fetch the HTML contents of, let's say, https://www.kayak.fr/flights/TLS-ATH/2019-10-04/2019-10-07?sort=price_a. When I click on "View page source" in Google Chrome, I get the prices etc. (what I need), but I can't access those with my Java code.
Do I have to find the user agent of my Google Chrome? I found this page, but I'm getting the exact same result as before with Java.
Any ideas?
Here's my code:
try {
    URL url = new URL("https://www.kayak.fr/flights/TLS-ATH/2019-10-04/2019-10-07?sort=price_a");
    URLConnection con = url.openConnection();
    con.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.117 Safari/537.36");
    BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(con.getInputStream(), "UTF-8"));
    String line;
    while ((line = bufferedReader.readLine()) != null) {
        System.out.println(line);
    }
    bufferedReader.close();
} catch (IOException e) {
    e.printStackTrace();
}
The setRequestProperty value is fairly random in this code because I'm still testing.
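One thing worth trying (purely a guess): the difference may come from more than the User-Agent, since the browser also sends Accept headers, cookies, and so on. Copying the additional request headers shown in Chrome's DevTools (Network tab) sometimes gets you closer to what the browser receives. A sketch, with illustrative values only:

URL url = new URL("https://www.kayak.fr/flights/TLS-ATH/2019-10-04/2019-10-07?sort=price_a");
URLConnection con = url.openConnection();
// Values below are examples; copy the real ones from DevTools -> Network -> Request Headers
con.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.100 Safari/537.36");
con.setRequestProperty("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8");
con.setRequestProperty("Accept-Language", "fr-FR,fr;q=0.9,en;q=0.8");
// con.setRequestProperty("Cookie", "paste the Cookie header from DevTools here");
try (BufferedReader in = new BufferedReader(new InputStreamReader(con.getInputStream(), "UTF-8"))) {
    in.lines().forEach(System.out::println);
}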

Java - Read page source from url returns unknown characters

I am using the code below to read the page source from a URL (https://www.amazon.com) with the "UTF-8" charset in NetBeans, but it returns unknown characters (see the attached image). I have no idea what the problem is and would be grateful if you could help me modify the code to work properly. Thanks.
public static String getURLSource(String url) throws IOException
{
    URL urlObject = new URL(url);
    URLConnection urlConnection = urlObject.openConnection();
    urlConnection.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.95 Safari/537.11");
    return toString(urlConnection.getInputStream());
}

private static String toString(InputStream inputStream) throws IOException
{
    try (BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(inputStream, "UTF-8")))
    {
        String inputLine;
        StringBuilder stringBuilder = new StringBuilder();
        while ((inputLine = bufferedReader.readLine()) != null)
        {
            stringBuilder.append(inputLine);
        }
        return stringBuilder.toString();
    }
}
Use HttpsURLConnection instead of URLConnection. See a similar question.
You just need to unzip the content. Here is the code that worked for me:
HttpClient httpClient = new HttpClient();
try {
    httpClient.setConnectionUrl("https://www.amazon.com");
    ByteBuffer buff = httpClient.setRequestHeader("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.95 Safari/537.11")
            .sendHttpRequestForBinaryResponse(HttpClient.HttpMethod.GET);
    try (
        ByteArrayInputStream bais = new ByteArrayInputStream(buff.array());
        GZIPInputStream gzis = new GZIPInputStream(bais);
        InputStreamReader isr = new InputStreamReader(gzis);
        BufferedReader br = new BufferedReader(isr)
    ) {
        br.lines().forEach(line -> System.out.println(line));
    }
} catch (Exception e) {
    System.out.println(httpClient.getLastResponseCode() + " "
            + httpClient.getLastResponseMessage() + TextUtils.getStacktrace(e, false));
}
Just a few clarifications: in this example I use a third-party HTTP client class HttpClient (and also the class TextUtils). They both come from the open-source MgntUtils library, written and maintained by me. But you don't have to use it. The main part is to read the info from the InputStream as binary data (a byte array or ByteBuffer) and then unzip it with GZIPInputStream, as in my example.
If you do want to use the MgntUtils library, you can get it as a Maven artifact or from GitHub (including source code and Javadoc); the Javadoc is also available online.
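If you prefer to stay with the JDK, the same idea (read the body, and gunzip it when the server reports Content-Encoding: gzip) looks roughly like the sketch below. It mirrors the getURLSource method from the question and has not been tested against amazon.com:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.stream.Collectors;
import java.util.zip.GZIPInputStream;

public static String getURLSource(String url) throws IOException {
    HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
    conn.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.95 Safari/537.11");
    InputStream body = conn.getInputStream();
    // Unwrap gzip only if the server actually compressed the response
    if ("gzip".equalsIgnoreCase(conn.getContentEncoding())) {
        body = new GZIPInputStream(body);
    }
    try (BufferedReader reader = new BufferedReader(new InputStreamReader(body, StandardCharsets.UTF_8))) {
        return reader.lines().collect(Collectors.joining("\n"));
    }
}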

403 forbidden for url in java but not in browser

I'm behind a corporate firewall, but I can paste the URL into my browser, with or without my proxy settings enabled in the browser, and retrieve the data fine. I just can't within Java.
Any ideas?
Code:
private static String getURLToString(String strUrl) throws IOException {
    // LOG.debug("Calling URL: [" + strUrl + "]");
    String content = "";
    URLConnection connection = new URL(strUrl).openConnection();
    connection.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.95 Safari/537.11");
    connection.connect();
    BufferedReader br = new BufferedReader(new InputStreamReader(connection.getInputStream(), Charset.forName("UTF-8")));
    String inputLine;
    while ((inputLine = br.readLine()) != null) {
        content += inputLine;
    }
    br.close();
    return content;
}
Error:
java.io.FileNotFoundException: Response: '403: Forbidden' for url: '<url here>'
    at weblogic.net.http.HttpURLConnection.getInputStream(HttpURLConnection.java:778)
    at weblogic.net.http.SOAPHttpURLConnection.getInputStream(SOAPHttpURLConnection.java:37)
Note: the '<url here>' portion is anonymized.
As you are receiving a "403: Forbidden" error, your Java code can reach the URL, but the request lacks something that is required to access it.
In the browser, press F12 (developer/debug mode) and request the URL again. Check the headers and cookies that are being sent. Most likely you will need to add one of these for you to be able to receive the content you need.
Adding "User-Agent" header fixed it for me:
connection.setRequestProperty("User-Agent", "Mozilla/5.0");
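If the 403 persists even with a User-Agent, it helps to look at exactly what the server sends back instead of letting getInputStream() throw. A sketch using standard HttpURLConnection calls (it slots into the getURLToString method above; the header value is illustrative):

HttpURLConnection connection = (HttpURLConnection) new URL(strUrl).openConnection();
connection.setRequestProperty("User-Agent", "Mozilla/5.0");
int status = connection.getResponseCode();
System.out.println("HTTP status: " + status);
// For 4xx/5xx responses the body, if any, is on the error stream, not the input stream
InputStream stream = status >= 400 ? connection.getErrorStream() : connection.getInputStream();
if (stream != null) {
    try (BufferedReader br = new BufferedReader(new InputStreamReader(stream, StandardCharsets.UTF_8))) {
        br.lines().forEach(System.out::println);
    }
}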

Java HttpURLConnection trying to login with cookie

So I'm trying to log in to this website using Java, but for some reason it's not working as expected. I've got the host and all that stuff, but it's not going to the account page with the cookie; it still shows the login page. And yes, my account info is correct. Any help is appreciated.
public static void main(String[] args) {
    try {
        String params = "loginEmail=private#hotmail.com&loginPassword=privatepassword&Submit=Sign+In";
        String urls = "http://www.filefactory.com/member/signin.php";
        URL url = new URL(urls);
        HttpURLConnection connection = (HttpURLConnection) url.openConnection();
        connection.setRequestMethod("POST");
        connection.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
        connection.setRequestProperty("Host", "www.filefactory.com");
        connection.setRequestProperty("Referer", "http://www.filefactory.com/member/signin.php");
        connection.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/38.0.2125.101 Safari/537.36 OPR/25.0.1614.50");
        connection.setRequestProperty("Content-Length", Integer.toString(params.getBytes().length));
        connection.setRequestProperty("Content-Language", "en-US");
        connection.setDoInput(true);
        connection.setDoOutput(true);
        // Send request
        DataOutputStream wr = new DataOutputStream(connection.getOutputStream());
        wr.writeBytes(params);
        wr.flush();
        wr.close();
        // Get response
        InputStream is = connection.getInputStream();
        BufferedReader rd = new BufferedReader(new InputStreamReader(is));
        String line;
        StringBuilder response = new StringBuilder();
        while ((line = rd.readLine()) != null) {
            response.append(line);
            response.append('\r');
        }
        rd.close();
        System.out.println(response.toString());
        // get the cookie if needed, for login
        String cookies = connection.getHeaderField("Set-Cookie");
        // open a new connection again
        connection = (HttpURLConnection) new URL("http://www.filefactory.com/account/").openConnection();
        connection.setRequestProperty("Cookie", cookies);
        connection.addRequestProperty("Accept-Language", "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8");
        connection.addRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/38.0.2125.101 Safari/537.36 OPR/25.0.1614.50");
        connection.addRequestProperty("Host", "www.filefactory.com");
        System.out.println("Redirect to URL : " + "http://www.filefactory.com/account/");
        BufferedReader in = new BufferedReader(new InputStreamReader(connection.getInputStream()));
        String inputLine;
        StringBuilder html = new StringBuilder();
        while ((inputLine = in.readLine()) != null) {
            html.append(inputLine);
        }
        in.close();
        System.out.println("URL Content... \n" + html.toString());
        System.out.println("Done");
    } catch (Exception e) {
        e.printStackTrace();
    }
}
}
You are using: String cookies = connection.getHeaderField("Set-Cookie");
Are you sure there is only one entry for that header? There could be more.
http://en.wikipedia.org/wiki/HTTP_cookie
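To collect every Set-Cookie entry rather than just one, getHeaderFields() returns all values for a header. Something along these lines could replace the single getHeaderField call in the code above (a sketch; add java.util.List to the imports):

// Gather all Set-Cookie headers from the login response and join them
// into a single Cookie header for the follow-up request.
List<String> setCookies = connection.getHeaderFields().get("Set-Cookie");
StringBuilder cookieHeader = new StringBuilder();
if (setCookies != null) {
    for (String c : setCookies) {
        if (cookieHeader.length() > 0) {
            cookieHeader.append("; ");
        }
        // Keep only the "name=value" part, dropping attributes like Path or Expires
        cookieHeader.append(c.split(";", 2)[0]);
    }
}
connection = (HttpURLConnection) new URL("http://www.filefactory.com/account/").openConnection();
connection.setRequestProperty("Cookie", cookieHeader.toString());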
Try using Chrome or Firefox and logging in manually to capture the request and response. That may give you some hints about what could be wrong.
Additionally, you could use a tool to view the communication between your client and the server (unless you are already doing so).
It's hard to tell without knowing exactly how that website works, but note that it already sends you cookies when it presents the login page. When you send in your credentials, you have to send them together with those cookies, so that the site knows to associate the credentials with that cookie.
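One way to handle that automatically is the JDK's CookieManager: install it before the first request, and cookies received with the login page (or the login POST) are replayed on later requests to the same host. A sketch:

import java.net.CookieHandler;
import java.net.CookieManager;
import java.net.CookiePolicy;

// Install once, before opening any connections; HttpURLConnection will then
// store cookies from responses and attach them to subsequent requests.
CookieManager cookieManager = new CookieManager(null, CookiePolicy.ACCEPT_ALL);
CookieHandler.setDefault(cookieManager);

// 1. GET the login page first so any session cookie is captured.
new URL("http://www.filefactory.com/member/signin.php").openConnection().getInputStream().close();
// 2. POST the credentials as in the code above; the cookie is now sent automatically.
// 3. GET http://www.filefactory.com/account/ through the same mechanism.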

Passing sessionId obtained from one response to the next request

I need to download a CSV file from Google Insights programmatically. Since it requires authentication, I used ClientLogin to get the session id.
How do I download the file by passing the session id as a cookie?
I tried using a new URLConnection object and setting the cookie with setRequestProperty, hoping it would authenticate my login, but it doesn't seem to be working. I have a feeling I shouldn't use two separate connections; is that true?
If so, how do I pass the session id as a parameter when I download the file? I also tried using the same connection; that didn't work either. Please help.
try {
    URL url1 = new URL("https://www.google.com/accounts/ClientLogin?accountType=GOOGLE&Email=*******.com&Passwd=*****&service=trendspro&source=test-test-v1");
    URL url2 = new URL("http://www.google.com/insights/search/overviewReport?cat=0-7&geo=BR&cmpt=geo&content=1&export=1");
    URLConnection conn = url1.openConnection();
    // fake request coming from a browser
    conn.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.56 Safari/535.11");
    BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream(), "UTF-8"));
    String f = in.readLine();
    // obtaining the sid
    String sid = f.substring(4);
    System.out.println(sid);
    URLConnection conn2 = url2.openConnection();
    conn2.setRequestProperty("Cookie", sid);
    BufferedInputStream i = new BufferedInputStream(conn2.getInputStream());
    FileOutputStream fos = new FileOutputStream("f:/testplans.csv");
    BufferedOutputStream bout = new BufferedOutputStream(fos, 1024);
    byte[] data = new byte[1024];
    int count;
    // write only the bytes actually read in each pass
    while ((count = i.read(data, 0, 1024)) >= 0) {
        bout.write(data, 0, count);
    }
    bout.close();
    in.close();
}
Try the following: link. Check the top answer: they don't use the SID, but the Auth.
If it's working for Google Reader, it will probably work for Google Insights as well.
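For what it's worth, the (now long-deprecated) ClientLogin response had lines like SID=..., LSID=..., Auth=..., and as far as I remember it was the Auth token that mattered, sent as an Authorization header rather than a cookie. A rough sketch from memory, not tested:

BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream(), "UTF-8"));
String auth = null;
String line;
while ((line = in.readLine()) != null) {
    if (line.startsWith("Auth=")) {
        auth = line.substring("Auth=".length());
    }
}
in.close();

URLConnection conn2 = url2.openConnection();
// ClientLogin expected the token in an Authorization header, not a Cookie
conn2.setRequestProperty("Authorization", "GoogleLogin auth=" + auth);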
