Logging in and navigating using Jsoup - java

I am trying to log in to a website using Jsoup. My goal is to scrape some data from the site, but I am having problems with logging in and navigating.
The code below shows what I currently have.
try {
Connection.Response response = Jsoup.connect("https://app.northpass.com/login")
.method(Connection.Method.GET)
.execute();
response = Jsoup.connect("https://app.northpass.com/login")
.data("educator[email]", "email123")
.data("educator[password]", "password123")
.cookies(response.cookies())
.method(Connection.Method.POST)
.execute();
// Go to new page
Document coursePage = Jsoup.connect("https://app.northpass.com/course")
.cookies(response.cookies())
.get();
System.out.println(coursePage.title());
} catch (IOException e) {
e.printStackTrace();
}
I have also tried adding
.data("commit", "Log in")
and
.userAgent("Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/535.21 (KHTML, like Gecko) Chrome/19.0.1042.0 Safari/535.21")
without any success.
The error I get is as follows:
org.jsoup.HttpStatusException: HTTP error fetching URL. Status=500, URL=https://app.northpass.com/login
From what I have read on other threads, people suggest using a userAgent (which, as said above, I have already tried). Thanks in advance for any help.

If you look at the network traffic when you attempt a login in your browser, you'll see that an additional item of data is sent: authenticity_token. This is a hidden field in the form.
You then need to extract it from the initial response and send it with the POST request:
try {
    // 1. GET the login page to obtain the session cookies and the hidden token
    Connection.Response response = Jsoup.connect("https://app.northpass.com/login")
            .method(Connection.Method.GET)
            .execute();
    // I can't test this, but it will be something like the following;
    // see https://jsoup.org/cookbook/extracting-data/selector-syntax
    Document document = response.parse();
    // Select the field by name; "input[hidden]" would only match an attribute called "hidden"
    String token = document.select("input[name=authenticity_token]").first().val();
    // 2. POST the credentials together with the token and the cookies
    response = Jsoup.connect("https://app.northpass.com/login")
            .data("educator[email]", "email123")
            .data("educator[password]", "password123")
            .data("authenticity_token", token)
            .cookies(response.cookies())
            .method(Connection.Method.POST)
            .execute();
    // 3. Go to the new page using the cookies from the login response
    Document coursePage = Jsoup.connect("https://app.northpass.com/course")
            .cookies(response.cookies())
            .get();
    System.out.println(coursePage.title());
} catch (IOException e) {
    e.printStackTrace();
}

Related

Auto login on website, stay logged in, and parse with Jsoup (java)

For the last three weeks I have been trying to write a program that logs in to a website and loops through its pages to filter specific information (specific rows/columns of tables). To be fair, this program is the reason I taught myself to code (in Java). I created a kind of autofiller, which works but is very slow, since it has to log in again for every page. So I went back to work out why my first program (below) isn't working. For some reason I am able to log in, but as soon as I switch from the login page to the specific page (which is only accessible when logged in), I am redirected to the login page.
For the purpose of this question I created a fake account. Maybe someone can help, or tell me where I can read further on this topic. I suspect there is a problem with the cookies, though I'm not sure.
try {
String url1 = "https://www.novaragnarok.com/";
String url2 = "https://www.novaragnarok.com/?module=vending&action=item&id=2499";
Connection.Response res = Jsoup
.connect(url1)
.followRedirects(true)
.data("username", "stackoverflowww", "password", "stackpw")
.method(Method.POST)
.execute();
Map<String, String> cookies = res.cookies();
Document doc = Jsoup.connect(url2)
.cookies(cookies)
.followRedirects(true)
.get();
System.out.println(cookies);
System.out.println(doc);
} catch (IOException e) {
e.printStackTrace();
}
Try this:
String USER_AGENT = "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko)" +
        " Chrome/56.0.2924.87 Safari/537.36";
public void parseWebsite() {
    try {
        // Load the homepage first to pick up the initial session cookies
        Connection.Response homepage = Jsoup.connect("https://www.novaragnarok.com/")
                .userAgent(USER_AGENT)
                .method(Connection.Method.GET)
                .timeout(6000)
                .execute();
        // Post the credentials to the login action, sending the homepage cookies along
        Connection.Response login = Jsoup.connect("https://www.novaragnarok.com//?module=account&action=login&return_url=")
                .cookies(homepage.cookies())
                .data("txtbox", "stackoverflowww")
                .data("password", "stackpw")
                .userAgent(USER_AGENT)
                .method(Connection.Method.POST)
                .timeout(6000)
                .execute();
        // Request the protected page with the cookies from the login response
        Connection.Response url2 = Jsoup.connect("https://www.novaragnarok.com/?module=vending&action=item&id=2499")
                .cookies(login.cookies())
                .userAgent(USER_AGENT)
                .method(Connection.Method.GET)
                .timeout(6000)
                .execute();
        // Your code here, e.g. Document doc = url2.parse();
    } catch (IOException | UncheckedIOException e) {
        // execute() throws IOException (SocketException is a subclass); don't swallow it silently
        e.printStackTrace();
    }
}
}

Login to website through Jsoup post method not working

I have the following code which I am using to login to a website programmatically. However, instead of returning the logged in page's html (with user data info), it returns the html for the login page. I have tried to find what's going wrong multiple times but I can't seem to find it.
public class LauncherClass {
static String username = "----username here------"; //blocked out here for obvious reasons
static String password = "----password here------";
static String loginUrl = "https://parents.mtsd.k12.nj.us/genesis/parents/j_security_check";
static String userDataUrl = "https://parents.mtsd.k12.nj.us/genesis/parents?module=gradebook";
public static void main(String[] args) throws IOException{
LauncherClass launcher = new LauncherClass();
launcher.Login(loginUrl, username, password);
}
public void Login(String url, String username, String password) throws IOException {
Connection.Response res = Jsoup
.connect(url)
.data("j_username",username,"j_password",password)
.followRedirects(true)
.ignoreHttpErrors(true)
.method(Method.POST)
.userAgent("Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.4 Safari/537.36")
.timeout(500)
.execute();
Map <String,String> cookies = res.cookies();
Document loggedIn = Jsoup.connect(userDataUrl)
.cookies(cookies)
.get();
System.out.print(loggedIn);
}
}
[NOTE] The login form does have a line:
<input type="submit" class="saveButton" value="Login">
but this does not have a "name" attribute so I did not post it
Any answers/comments are appreciated!
[UPDATE2] For the login page, browser displays the following...
---General
Remote Address:107.0.42.212:443
Request URL:https://parents.mtsd.k12.nj.us/genesis/j_security_check
Request Method:POST
Status Code:302 Found
----Response Headers
view source
Content-Length:0
Date:Sun, 26 Jul 2015 20:06:15 GMT
Location:https://parents.mtsd.k12.nj.us/genesis/parents?gohome=true
Server:Apache-Coyote/1.1
----Request Headers
view source
Accept:text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Accept-Encoding:gzip, deflate
Accept-Language:en-US,en;q=0.8
Cache-Control:max-age=0
Connection:keep-alive
Content-Length:51
Content-Type:application/x-www-form-urlencoded
Cookie:JSESSIONID=33C445158EB6CCAFFF77D2873FD66BC0; lastvisit=458D80553DC34ADD8DB232B5A8FC99CA
Host:parents.mtsd.k12.nj.us
HTTPS:1
Origin:https://parents.mtsd.k12.nj.us
Referer:https://parents.mtsd.k12.nj.us/genesis/parents?gohome=true
User-Agent:Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.4 Safari/537.36
----Form Data
j_username: ---username here---
j_password: ---password here---
You have to log in to the site in two stages.
STAGE 1 -
Send a GET request to https://parents.mtsd.k12.nj.us/genesis/parents?gohome=true to obtain the session cookies.
STAGE 2 -
Send a POST request with your username and password, adding the cookies you got in stage 1.
The code for that is:
Connection.Response res = null;
Document doc = null;
try { // first connection: GET request to obtain the session cookies
    res = Jsoup.connect("https://parents.mtsd.k12.nj.us/genesis/parents?gohome=true")
            // .userAgent(YourUserAgent)
            // .header("Accept", WhateverTheSiteSends)
            // .timeout(Utilities.timeout)
            .method(Method.GET)
            .execute();
} catch (Exception ex) {
    // Do some exception handling here
}
try { // second connection: POST the credentials with the stage 1 cookies
    // Note: res is still null here if the first request failed
    doc = Jsoup.connect("https://parents.mtsd.k12.nj.us/genesis/parents/j_security_check")
            // .userAgent(YourUserAgent)
            // .referrer(Referer)
            // .header("Content-Type", ...)
            .cookies(res.cookies())
            .data("j_username", username)
            .data("j_password", password)
            .post();
} catch (Exception ex) {
    // Do some exception handling here
}
// Now you can use doc!
You may have to add headers such as the user agent, referrer, and content type to both requests. After the second request, doc should contain the HTML of the logged-in page.
The reason you cannot log in to the site is that you are sending the POST request without the session cookies, so the server treats it as an invalid request.
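A related detail, as far as I understand Jsoup's behaviour: cookies() on a response holds the cookies the server set in that response, not the ones you sent with the request. If the login response sets additional cookies, later requests may need both sets. Below is a minimal sketch of combining them with plain java.util maps. The class and method names (CookieMerge, merge) are hypothetical helpers, not part of Jsoup.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not part of Jsoup): combines cookies from two responses.
final class CookieMerge {
    private CookieMerge() {}

    // Returns a new map with all cookies from 'earlier', overridden by any
    // cookie of the same name in 'later'. Neither input map is modified.
    static Map<String, String> merge(Map<String, String> earlier, Map<String, String> later) {
        Map<String, String> merged = new LinkedHashMap<>(earlier);
        merged.putAll(later);
        return merged;
    }
}
```

With the two-stage login above, this could be called as CookieMerge.merge(res.cookies(), loginResponse.cookies()), where loginResponse is a hypothetical Connection.Response obtained by ending the second request with .method(Method.POST).execute() instead of .post().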

Login to Facebook with Jsoup and proper cookies

I am currently trying to automatically scrape my own home page, and possibly other pages that I have access to when logged in to Facebook. However, I don't seem to be logged in after running the code below and setting the cookies.
Connection.Response res = Jsoup.connect("http://www.facebook.com/login.php?login_attempt=1")
.data("email", "#####", "pass", "#####").userAgent("Mozilla")
.method(Method.POST)
.execute();
Map<String, String> cookies = res.cookies();
try{
Document doc2 = Jsoup.connect("https://www.facebook.com/")
.cookies(cookies).post();
System.out.println(doc2.text());
}
catch(Exception e){
e.printStackTrace();
}
When I do this it will just send me back the facebook home page as though I were not logged in. When I try my own about page I get
HTTP error fetching URL. Status=404
Am I doing something wrong here? Are there other fields I need to set?
I found another post saying that other fields need to be set, but I haven't the slightest clue how to find this information. The post is: Login Facebook via Jsoup
Any advice would be helpful. Thank you in advance.
You can see an example here: How to Login on Facebook w/ Jsoup
public static void main(String[] args) {
Response req;
try {
String userAgent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36";
req = Jsoup.connect("https://m.facebook.com/login/async/?refsrc=https%3A%2F%2Fm.facebook.com%2F&lwv=100")
.userAgent(userAgent)
.method(Method.POST).data("email", "YOUR_EMAIL").data("pass", "YOUR_PASSWORD")
.followRedirects(true).execute();
Document d = Jsoup.connect("https://m.facebook.com/profile.php?ref=bookmarks").userAgent(userAgent)
.cookies(req.cookies()).get();
System.out.println(d.body().text().toString());
} catch (Exception e) {
e.printStackTrace();
}
}

jsoup posting Java

I'm struggling to get Java to submit POST requests over HTTPS.
The code I'm using is:
try{
Response res = Jsoup.connect(LOGIN_URL)
.data("username", "blah", "password", "blah")
.method(Method.POST)
.userAgent("Mozilla/5.0 (Windows NT 6.1; WOW64; rv:19.0) Gecko/20100101 Firefox/19.0")
.header("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8")
.execute();
System.out.println(res.body());
System.out.println("Code " +res.statusCode());
}
catch (Exception e){
System.out.println(e.getMessage());
}
and also this
Document doc = Jsoup.connect(LOGIN_URL)
.data("username", "blah")
.data("password", "blah")
.userAgent("Mozilla/5.0 (Windows NT 6.1; WOW64; rv:19.0) Gecko/20100101 Firefox/19.0")
.header("Content-type", "application/x-www-form-urlencoded")
.method(Method.POST)
.timeout(3000)
.post();
where LOGIN_URL = https://xxx.com/Login?val=login
Over HTTP this seems to work; over HTTPS it doesn't, but it doesn't throw any exceptions either.
How can I POST over HTTPS?
Edit:
It seems a 302 redirect is involved when the server receives a POST over HTTPS (which doesn't happen over HTTP). How can I use Jsoup to carry the cookie sent with the 302 over to the next page?
This is my code:
URL form = new URL(Your_url);
HttpURLConnection connection1 = (HttpURLConnection) form.openConnection();
// Send the stored cookie(s) along with the request
connection1.setRequestProperty("Cookie", your_cookie);
connection1.setReadTimeout(10000);
StringBuilder whole = new StringBuilder();
BufferedReader in = new BufferedReader(
        new InputStreamReader(new BufferedInputStream(connection1.getInputStream())));
String inputLine;
while ((inputLine = in.readLine()) != null)
    whole.append(inputLine);
in.close();
Document doc = Jsoup.parse(whole.toString());
String title = doc.title();
I used this code to get the title of the new page.
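The your_cookie value in the code above has to be a single Cookie header string of the form name1=value1; name2=value2. If the cookies are held in a map (which is what Jsoup's cookies() returns), a small helper can join them. This is only a sketch, and the names CookieHeader and toHeader are hypothetical.

```java
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper: turns a cookie map into the value of a "Cookie" request header.
final class CookieHeader {
    private CookieHeader() {}

    // Joins the entries as "name=value" pairs separated by "; ".
    // Returns an empty string for an empty map.
    static String toHeader(Map<String, String> cookies) {
        StringJoiner joiner = new StringJoiner("; ");
        for (Map.Entry<String, String> cookie : cookies.entrySet()) {
            joiner.add(cookie.getKey() + "=" + cookie.getValue());
        }
        return joiner.toString();
    }
}
```

The result can then be passed to setRequestProperty("Cookie", ...) in place of your_cookie.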
Here is what you can try...
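import java.io.IOException;
import java.util.Map;
import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;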
Connection.Response res = null;
try {
res = Jsoup
.connect("your-first-page-link")
.data("username", "blah", "password", "blah")
.method(Connection.Method.POST)
.execute();
} catch (IOException e) {
e.printStackTrace();
}
Now save all your cookies and make a request to the other page you want.
// Saving cookies
Map<String, String> cookies = res.cookies();
Then make the request to the other page:
try {
Document doc = Jsoup.connect("your-second-page-link").cookies(cookies).get();
}
catch(Exception e){
e.printStackTrace();
}
Comment if further help needed.
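If you end up handling the 302 yourself, for example with HttpURLConnection and redirects disabled, the cookies arrive as raw Set-Cookie header lines rather than as a ready-made map. The sketch below shows one way to reduce those lines to name/value pairs, assuming only the first name=value pair of each line matters and attributes such as Path or Secure can be ignored. SetCookieParser and parse are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: extracts cookie names and values from Set-Cookie header lines.
final class SetCookieParser {
    private SetCookieParser() {}

    static Map<String, String> parse(List<String> setCookieHeaders) {
        Map<String, String> cookies = new LinkedHashMap<>();
        for (String header : setCookieHeaders) {
            // Only the part before the first ';' is the cookie; the rest are attributes
            int semicolon = header.indexOf(';');
            String pair = semicolon >= 0 ? header.substring(0, semicolon) : header;
            int equalsSign = pair.indexOf('=');
            if (equalsSign <= 0) {
                continue; // skip malformed lines that have no cookie name
            }
            String name = pair.substring(0, equalsSign).trim();
            String value = pair.substring(equalsSign + 1).trim();
            cookies.put(name, value);
        }
        return cookies;
    }
}
```

With HttpURLConnection, the input list would typically come from connection.getHeaderFields().get("Set-Cookie"), which can be null when the server sets no cookies, after calling setInstanceFollowRedirects(false) so that the 302 response itself is visible.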

How to retrieve cookies on a https connection?

I'm trying to save the cookies from a URL that uses SSL, but they always come back null.
private Map<String, String> cookies = new HashMap<String, String>();
private Document get(String url) throws IOException {
Connection connection = Jsoup.connect(url);
for (Entry<String, String> cookie : cookies.entrySet()) {
connection.cookie(cookie.getKey(), cookie.getValue());
}
Response response = connection.execute();
cookies.putAll(response.cookies());
return response.parse();
}
private void buscaJuizado(List<Movimentacao> movimentacoes) {
try {
Connection.Response res = Jsoup .connect("https://projudi.tjpi.jus.br/projudi/publico/buscas/ProcessosParte?publico=true")
.userAgent("Mozilla/5.0 (Windows NT 6.1; rv:15.0) Gecko/20120716 Firefox/15.0a2")
.timeout(0)
.response();
cookies = res.cookies();
Document doc = get("https://projudi.tjpi.jus.br/projudi/listagens/DadosProcesso?numeroProcesso=" + campo);
System.out.println(doc.body());
} catch (IOException ex) {
Logger.getLogger(ConsultaProcessoTJPi.class.getName()).log(Level.SEVERE, null, ex);
}
}
I try to capture the cookies from the first connection, but they are always null. I think it might have something to do with the secure connection (HTTPS).
Any ideas?
The issue is not HTTPS; it is a small mistake in the code.
To fix it, simply replace .response() with .execute(), like so:
private void buscaJuizado(List<Movimentacao> movimentacoes) {
try {
Connection.Response res = Jsoup
.connect("https://projudi.tjpi.jus.br/projudi/publico/buscas/ProcessosParte?publico=true")
.userAgent("Mozilla/5.0 (Windows NT 6.1; rv:15.0) Gecko/20120716 Firefox/15.0a2")
.timeout(0)
.execute(); // changed from response()
cookies = res.cookies();
Document doc = get("https://projudi.tjpi.jus.br/projudi/listagens/DadosProcesso?numeroProcesso="+campo);
System.out.println(doc.body());
} catch (IOException ex) {
Logger.getLogger(ConsultaProcessoTJPi.class.getName()).log(Level.SEVERE, null, ex);
}
}
Overall, you have to make sure you execute the request first.
Calling .response() is useful for grabbing the Connection.Response object after the request has already been executed. The Connection.Response object won't be of much use if you haven't executed the request.
In fact, if you were to call res.body() on the unexecuted response, you would get the following exception, which points to the issue:
java.lang.IllegalArgumentException: Request must be executed (with .execute(), .get(), or .post() before getting response body
