JSOUP Blocked: Returning HTTPStatusException - java

I keep getting HttpStatusExceptions with status 500, 502, 503, and even 522. I'm guessing my IP has been blacklisted? What options do I have to work around this?
I've noticed that the site takes forever to load in a browser, and that trying to view the page source times out.
public Document getTPBDocument(String searchField) throws IOException {
Connection.Response response = Jsoup.connect("https://thepiratebay.org/search/" + searchField + "/0/99/0").userAgent("Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/535.21 (KHTML, like Gecko) Chrome/19.0.1042.0 Safari/535.21")
.referrer("http://www.google.com")
.timeout(30000)
.followRedirects(true)
.header("Content-Type", "application/json;charset=UTF-8")
.execute();
return response.parse();
}
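The codes listed in the question (500, 502, 503, 522) are all server-side or CDN errors, which, together with the slow loading in a browser, is consistent with an overloaded site rather than a ban, though nothing here confirms either. One option is to retry with an increasing delay. The sketch below is only an illustration under that assumption: the class name RetryPolicy, the choice of retryable codes, and the delay values are hypothetical. In real code the status would come from jsoup's HttpStatusException.getStatusCode().

```java
import java.util.Set;

class RetryPolicy {
    // Status codes taken from the question; treating them as transient is an assumption.
    private static final Set<Integer> RETRYABLE = Set.of(500, 502, 503, 522);

    // True if a request that failed with this status is worth retrying.
    static boolean isRetryable(int statusCode) {
        return RETRYABLE.contains(statusCode);
    }

    // Exponential backoff: baseMillis * 2^attempt, capped at maxMillis (attempt starts at 0).
    static long backoffMillis(int attempt, long baseMillis, long maxMillis) {
        long delay = baseMillis << Math.min(attempt, 20); // bound the shift to avoid overflow
        return Math.min(delay, maxMillis);
    }

    public static void main(String[] args) {
        for (int attempt = 0; attempt < 4; attempt++) {
            System.out.println("attempt " + attempt + " -> wait "
                    + backoffMillis(attempt, 1000, 30000) + " ms");
        }
    }
}
```

A caller would catch the exception, check isRetryable, sleep for backoffMillis, and try again up to some fixed number of attempts. If the errors persist at any delay, the site itself may be unavailable.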

Related

How to Automate login to a webpage using jsoup in android?

I am trying to log my users in automatically to http://www.bvrit.edu.in using jsoup, and then display the logged-in page in a WebView. I added the jsoup library, checked with Inspect Element that the id of the username field is txtId1 and that of the password field is txtPwd1, and used those names in the POST data. I also added the internet permission to the manifest, but I am not able to display the web page. My code is shown below. I think I am getting some basics wrong, but I have not been able to figure out what.
public class MainActivity extends AppCompatActivity {
@Override
protected void onCreate(Bundle savedInstanceState) {
super.onCreate(savedInstanceState);
setContentView(R.layout.activity_main);
}
public void main(String[] args) throws IOException {
WebView browser = (WebView) findViewById(R.id.bvritWebview);
Connection.Response loginForm = Jsoup.connect("http://www.bvrit.edu.in/")
.ignoreContentType(true)
.userAgent("Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:25.0) Gecko/20100101 Firefox/25.0")
.referrer("http://www.google.com")
.timeout(12000)
.followRedirects(true)
.method(Connection.Method.GET)
.execute();
Connection.Response loginFormFilled = Jsoup.connect("http://www.bvrit.edu.in/")
.ignoreContentType(true)
.userAgent("Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:25.0) Gecko/20100101 Firefox/25.0")
.followRedirects(true)
.referrer("https://login.to/")
.data("txtId1", "username")//check the form to find field name for user name
.data("txtPwd1", "password")//check the form to find field name for user password
.cookies(loginForm.cookies())
.method(Connection.Method.POST)
.execute();
int statusCode = loginFormFilled.statusCode();
Map<String, String> cookies = loginFormFilled.cookies();
browser.getSettings().setJavaScriptEnabled(true);
browser.loadUrl("http://www.bvrit.edu.in");
}
}
Header section of the network tab after logging in:
You are missing a few parameters in your POST request. Load the page in your browser, press F12 to open the developer tools, and look at the POST request. You will see something like this:
You must send all of these parameters to the server, not just the ones you send now.
The first three parameters are unique to each session, and you can get them from the first GET request, something like this (the CSS selector may be different; I didn't try it on your URL):
Document doc = loginForm.parse();
Element e = doc.select("input[id=__VIEWSTATE]").first();
String viewState = e.attr("value");
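Once the session-specific hidden values have been read from the GET response, they have to be sent together with the credentials. The following is a minimal sketch of that merge step, assuming the hidden values have already been collected into a map. The class LoginForm, its method names, and the sample __VIEWSTATE value are hypothetical, and txtId1/txtPwd1 are the ids from the question (the form's name attributes may differ). It also prints the URL-encoded body, which can be compared with the Form Data shown in the browser's developer tools.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

class LoginForm {
    // Combine the hidden fields from the GET response with the user's credentials.
    // Credentials are applied last, so they win if a name appears in both maps.
    static Map<String, String> buildPostData(Map<String, String> hiddenFields,
                                             Map<String, String> credentials) {
        Map<String, String> data = new LinkedHashMap<>(hiddenFields);
        data.putAll(credentials);
        return data;
    }

    // Render the fields as an application/x-www-form-urlencoded body.
    static String encode(Map<String, String> data) {
        StringJoiner body = new StringJoiner("&");
        for (Map.Entry<String, String> e : data.entrySet()) {
            body.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8) + "="
                    + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return body.toString();
    }

    public static void main(String[] args) {
        Map<String, String> hidden = new LinkedHashMap<>();
        hidden.put("__VIEWSTATE", "/wEPDw=="); // placeholder, not a real value
        Map<String, String> credentials = new LinkedHashMap<>();
        credentials.put("txtId1", "username");
        credentials.put("txtPwd1", "password");
        System.out.println(encode(buildPostData(hidden, credentials)));
    }
}
```

With jsoup, the merged map can be passed directly to the data(Map<String, String>) overload of Connection, which takes care of the encoding itself; the encode method here is only for inspection.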

java.io.IOException: 403 error loading URL jsoup login

Help me with a jsoup login: I have to log in, then go from the login page to another page using the saved session.
public void parseXhtml() throws IOException{
String sessionID=null;
Map<String, String> cookies = new HashMap<String, String>();
cookies.put("login", "login");
cookies.put("password", "password");
Connection conn=Jsoup.connect("http://localhost:8080/dir/login.xhtml");
Connection.Response res = Jsoup
.connect("http://localhost:8080/dir/login.xhtml")
.data(cookies)
.execute();
res = Jsoup.connect("http://localhost:8080/dir/dir/index.xhtml")
.cookie("JSESSIONID", res.cookies().get("JSESSIONID"))
.method(Method.GET)
.userAgent("Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.86 Safari/537.36")
.execute();
Document doc = res.parse();
System.out.println(doc.html());
sessionID = res.cookie("JSESSIONID");
Document docu = Jsoup.connect("http://localhost:8080/dir/dir/index.xhtml")
.userAgent("Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.86 Safari/537.36")
.cookie("JSESSIONID", res.cookies().get("JSESSIONID"))
.method(Connection.Method.GET)
.get();
It throws the following exception:
java.io.IOException: 403 error loading URL http://localhost:8080/dir/dir/index.xhtml
If I do it as described in "Sending POST request with username and password and save session cookie", it throws the same exception.
ExternalContext ec=FacesContext.getCurrentInstance().getExternalContext();
HttpServletRequest req=(HttpServletRequest) ec.getRequest();
HttpSession sess=(HttpSession) ec.getSession(true);
String url = req.getRequestURL().append(";jsessionid=").append(sess.getId()).toString();
ec.setRequest("http://localhost:8080/dir/dir/index.xhtml");
HttpServletRequest req2=(HttpServletRequest) ec.getRequest();
String url2 = req2.getRequestURL().append(";jsessionid=").append(sess.getId()).toString();
Document doc2=Jsoup.connect(url2).get();
System.out.println(doc2.html());
I finally got it working. I do not know whether this is the right way, but it works.
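The workaround above passes the session by URL rewriting, that is, by appending ;jsessionid=<id> to the request URL instead of sending a cookie. The sketch below isolates that step as a small helper. The class and method names are hypothetical, and it assumes the server accepts session ids in the URL. It also places the parameter before any query string, which is where servlet containers expect it.

```java
class SessionUrl {
    // Insert ";jsessionid=<id>" after the path and before the query string, if any.
    static String withSessionId(String url, String sessionId) {
        int q = url.indexOf('?');
        String path = (q < 0) ? url : url.substring(0, q);
        String query = (q < 0) ? "" : url.substring(q);
        return path + ";jsessionid=" + sessionId + query;
    }

    public static void main(String[] args) {
        // "ABC123" is a placeholder session id.
        System.out.println(withSessionId("http://localhost:8080/dir/dir/index.xhtml", "ABC123"));
        System.out.println(withSessionId("http://localhost:8080/dir/dir/index.xhtml?page=2", "ABC123"));
    }
}
```

The resulting string can then be handed to Jsoup.connect(...) in the same way as url2 in the code above.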

Login to website through Jsoup post method not working

I have the following code, which I am using to log in to a website programmatically. However, instead of returning the logged-in page's HTML (with the user's data), it returns the HTML of the login page. I have tried several times to find what is going wrong, but I can't.
public class LauncherClass {
static String username = "----username here------"; //blocked out here for obvious reasons
static String password = "----password here------";
static String loginUrl = "https://parents.mtsd.k12.nj.us/genesis/parents/j_security_check";
static String userDataUrl = "https://parents.mtsd.k12.nj.us/genesis/parents?module=gradebook";
public static void main(String[] args) throws IOException{
LauncherClass launcher = new LauncherClass();
launcher.Login(loginUrl, username, password);
}
public void Login(String url, String username, String password) throws IOException {
Connection.Response res = Jsoup
.connect(url)
.data("j_username",username,"j_password",password)
.followRedirects(true)
.ignoreHttpErrors(true)
.method(Method.POST)
.userAgent("Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.4 Safari/537.36")
.timeout(500)
.execute();
Map <String,String> cookies = res.cookies();
Document loggedIn = Jsoup.connect(userDataUrl)
.cookies(cookies)
.get();
System.out.print(loggedIn);
}
}
[NOTE] The login form does have a line:
<input type="submit" class="saveButton" value="Login">
but this does not have a "name" attribute so I did not post it
Any answers/comments are appreciated!
[UPDATE2] For the login page, browser displays the following...
---General
Remote Address:107.0.42.212:443
Request URL:https://parents.mtsd.k12.nj.us/genesis/j_security_check
Request Method:POST
Status Code:302 Found
----Response Headers
view source
Content-Length:0
Date:Sun, 26 Jul 2015 20:06:15 GMT
Location:https://parents.mtsd.k12.nj.us/genesis/parents?gohome=true
Server:Apache-Coyote/1.1
----Request Headers
view source
Accept:text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Accept-Encoding:gzip, deflate
Accept-Language:en-US,en;q=0.8
Cache-Control:max-age=0
Connection:keep-alive
Content-Length:51
Content-Type:application/x-www-form-urlencoded
Cookie:JSESSIONID=33C445158EB6CCAFFF77D2873FD66BC0; lastvisit=458D80553DC34ADD8DB232B5A8FC99CA
Host:parents.mtsd.k12.nj.us
HTTPS:1
Origin:https://parents.mtsd.k12.nj.us
Referer:https://parents.mtsd.k12.nj.us/genesis/parents?gohome=true
User-Agent:Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.4 Safari/537.36
----Form Data
j_username: ---username here---
j_password: ---password here---
You have to log in to the site in two stages.
STAGE 1 -
You send a GET request to this URL - https://parents.mtsd.k12.nj.us/genesis/parents?gohome=true - and you get the session cookies.
STAGE 2 -
You send a POST request with your username and password, and add the cookies you got in stage 1.
The code for that is -
Connection.Response res = null;
Document doc = null;
try { //first connection with GET request
res = Jsoup.connect("https://parents.mtsd.k12.nj.us/genesis/parents?gohome=true")
// .userAgent(YourUserAgent)
// .header("Accept", WhateverTheSiteSends)
// .timeout(Utilities.timeout)
.method(Method.GET)
.execute();
} catch (Exception ex) {
//Do some exception handling here
}
try {
doc = Jsoup.connect("https://parents.mtsd.k12.nj.us/genesis/parents/j_security_check")
// .userAgent(YourUserAgent)
// .referrer(Referer)
// .header("Content-Type", ...)
.cookies(res.cookies())
.data("j_username",username)
.data("j_password",password)
.post();
} catch (Exception ex) {
//Do some exception handling here
}
//Now you can use doc!
You may have to add different headers to both requests, such as the user agent, referrer, content type and so on. After the second request, doc should hold the HTML of the logged-in page.
The reason you cannot log in is that you are sending the POST request without the session cookies, so the server treats it as an invalid request.
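When a login takes two requests, each response may set cookies, and later requests generally need the combined set. A minimal sketch of that bookkeeping follows, assuming cookies are held as name/value maps (as returned by the cookies() method of a jsoup response). The class CookieJar, its method names, and the sample values are hypothetical. The toHeader method shows what the combined set looks like on the wire, in the same form as the Cookie request header in the browser output above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

class CookieJar {
    // Merge cookies from two responses; values from the later response replace earlier ones.
    static Map<String, String> merge(Map<String, String> earlier, Map<String, String> later) {
        Map<String, String> all = new LinkedHashMap<>(earlier);
        all.putAll(later);
        return all;
    }

    // Format the cookies as the value of a Cookie request header: "a=1; b=2".
    static String toHeader(Map<String, String> cookies) {
        StringJoiner header = new StringJoiner("; ");
        for (Map.Entry<String, String> c : cookies.entrySet()) {
            header.add(c.getKey() + "=" + c.getValue());
        }
        return header.toString();
    }

    public static void main(String[] args) {
        Map<String, String> fromGet = new LinkedHashMap<>();
        fromGet.put("JSESSIONID", "AAAA"); // placeholder values
        fromGet.put("lastvisit", "XXXX");
        Map<String, String> fromPost = new LinkedHashMap<>();
        fromPost.put("JSESSIONID", "BBBB"); // server issued a new session id after login
        System.out.println(toHeader(merge(fromGet, fromPost)));
    }
}
```

The merged map can be passed to the cookies(...) method of the next jsoup connection. Keeping the earlier cookies matters when the login response only sets some of them.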

Logging in to a website with Jsoup which redirects, and scraping a page that isn't the redirect

This is the website I'm trying to scrape from.
I'm able to log in to the website fairly easily. However, I'm unable to retrieve and reuse the cookies or session ID to scrape a page other than the one the login page redirects to. I receive a 403 every time.
Here is an example of what I've tried:
try
{
String userAgent = "User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64; rv:38.0) Gecko/20100101 Firefox/38.0";
Connection.Response res = Jsoup.connect("http://www.interpals.net/login.php")
.data("action", "login")
.data("login", username)
.data("password", password)
.data("auto_login", "1")
.userAgent(userAgent)
.method(Connection.Method.POST)
.followRedirects(false)
.execute();
res.parse();
String sessionID = res.cookie("interpals_sessid");
Document doc = Jsoup.connect("http://www.interpals.net/friends.php").cookie("interpals_sessid", sessionID).get();
This code works for me:
try {
String url = "http://www.interpals.net/login.php";
String userAgent = "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.130 Safari/537.36";
Connection.Response response = Jsoup.connect(url).userAgent(userAgent)
.method(Connection.Method.GET)
.execute();
response = Jsoup.connect(url)
.cookies(response.cookies())
.data("action", "login")
.data("login", "login")
.data("password", "password")
.data("auto_login", "1")
.userAgent(userAgent)
.method(Connection.Method.POST)
.followRedirects(true)
.execute();
Document doc = Jsoup.connect("http://www.interpals.net/friends.php")
.cookies(response.cookies())
.userAgent(userAgent)
.get();
System.out.println(doc);
} catch (IOException e) {
e.printStackTrace();
}

Login to Facebook with Jsoup and proper cookies

I am currently trying to automatically scrape my own home page, and possibly other pages that I have access to when logged in to Facebook. However, I can't seem to be logged in after using the code below and setting the cookies.
Connection.Response res = Jsoup.connect("http://www.facebook.com/login.php?login_attempt=1")
.data("email", "#####", "pass", "#####").userAgent("Mozilla")
.method(Method.POST)
.execute();
Map<String, String> cookies = res.cookies();
try{
Document doc2 = Jsoup.connect("https://www.facebook.com/")
.cookies(cookies).post();
System.out.println(doc2.text());
}
catch(Exception e){
e.printStackTrace();
}
When I do this, it just sends me back the Facebook home page as though I were not logged in. When I try my own about page I get
HTTP error fetching URL. Status=404
Am I doing something wrong here? Are there other fields I need to set?
I found another post that said other fields need to be set, but I haven't the slightest clue how to find this information. The post can be found here: Login Facebook via Jsoup
Any advice would be helpful. Thank you in advance.
You can see an example here: How to Login on Facebook w/ Jsoup
public static void main(String[] args) {
Response req;
try {
String userAgent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36";
req = Jsoup.connect("https://m.facebook.com/login/async/?refsrc=https%3A%2F%2Fm.facebook.com%2F&lwv=100")
.userAgent(userAgent)
.method(Method.POST).data("email", "YOUR_EMAIL").data("pass", "YOUR_PASSWORD")
.followRedirects(true).execute();
Document d = Jsoup.connect("https://m.facebook.com/profile.php?ref=bookmarks").userAgent(userAgent)
.cookies(req.cookies()).get();
System.out.println(d.body().text().toString());
} catch (Exception e) {
e.printStackTrace();
}
}
