I'm struggling to get Java to submit POST requests over HTTPS.
This is the code I'm using:
try{
Response res = Jsoup.connect(LOGIN_URL)
.data("username", "blah", "password", "blah")
.method(Method.POST)
.userAgent("Mozilla/5.0 (Windows NT 6.1; WOW64; rv:19.0) Gecko/20100101 Firefox/19.0")
.header("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8")
.execute();
System.out.println(res.body());
System.out.println("Code " +res.statusCode());
}
catch (Exception e){
System.out.println(e.getMessage());
}
I have also tried this:
Document doc = Jsoup.connect(LOGIN_URL)
.data("username", "blah")
.data("password", "blah")
.userAgent("Mozilla/5.0 (Windows NT 6.1; WOW64; rv:19.0) Gecko/20100101 Firefox/19.0")
.header("Content-type", "application/x-www-form-urlencoded")
.method(Method.POST)
.timeout(3000)
.post();
where LOGIN_URL = https://xxx.com/Login?val=login
Over HTTP this seems to work. Over HTTPS it doesn't, but it doesn't throw any exceptions either.
How can I POST over HTTPS?
Edit:
It seems a 302 redirect is involved when the server gets a POST over HTTPS (which doesn't happen over HTTP). How can I use jsoup to carry the cookie sent with the 302 over to the next page?
This is my code:
URL form = new URL(Your_url);
HttpURLConnection connection1 = (HttpURLConnection) form.openConnection();
connection1.setRequestProperty("Cookie", your_cookie);
connection1.setReadTimeout(10000);
StringBuilder whole = new StringBuilder();
BufferedReader in = new BufferedReader(
new InputStreamReader(new BufferedInputStream(connection1.getInputStream())));
String inputLine;
while ((inputLine = in.readLine()) != null)
whole.append(inputLine);
in.close();
Document doc = Jsoup.parse(whole.toString());
String title = doc.title();
I used this code to get the title of the new page.
Here is what you can try...
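import java.io.IOException;
import java.util.Map;
import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;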
Connection.Response res = null;
try {
res = Jsoup
.connect("your-first-page-link")
.data("username", "blah", "password", "blah")
.method(Connection.Method.POST)
.execute();
} catch (IOException e) {
e.printStackTrace();
}
Now save all your cookies and make a request to the other page you want.
// Saving cookies
Map<String, String> cookies = res.cookies();
Making a request to another page:
try {
Document doc = Jsoup.connect("your-second-page-link").cookies(cookies).get();
}
catch(Exception e){
e.printStackTrace();
}
Comment if further help needed.
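If the session spans more than two requests, it may help to collect the cookies in one place instead of passing a single response's map along. The following is only a sketch using the standard library: SessionCookies and its methods are hypothetical names, not part of jsoup, and it assumes each response exposes its cookies as a Map<String, String>, as res.cookies() does.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical helper that accumulates cookies across several responses. */
class SessionCookies {
    private final Map<String, String> jar = new LinkedHashMap<>();

    /** Merges the cookies of one response; a newer value replaces an older one. */
    void absorb(Map<String, String> responseCookies) {
        jar.putAll(responseCookies);
    }

    /** Read-only view, suitable for passing to Connection.cookies(Map). */
    Map<String, String> asMap() {
        return Collections.unmodifiableMap(jar);
    }

    /** Renders the jar as a Cookie request header value (name=value pairs joined by "; "). */
    String asHeader() {
        StringJoiner joiner = new StringJoiner("; ");
        jar.forEach((name, value) -> joiner.add(name + "=" + value));
        return joiner.toString();
    }

    public static void main(String[] args) {
        SessionCookies session = new SessionCookies();
        session.absorb(Map.of("JSESSIONID", "abc")); // stands in for res.cookies() after the login POST
        session.absorb(Map.of("token", "xyz"));      // stands in for the cookies of a later response
        System.out.println(session.asHeader());
    }
}
```

With jsoup you would call session.absorb(res.cookies()) after each execute() and pass session.asMap() to .cookies(...) on the next request. asHeader() is there for the plain HttpURLConnection route, where the value goes into setRequestProperty("Cookie", ...).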
I am trying to log in to a website using Jsoup. My goal is to scrape some data from the website, but I am having problems with logging in and navigating.
This is what the code currently looks like:
try {
Connection.Response response = Jsoup.connect("https://app.northpass.com/login")
.method(Connection.Method.GET)
.execute();
response = Jsoup.connect("https://app.northpass.com/login")
.data("educator[email]", "email123")
.data("educator[password]", "password123")
.cookies(response.cookies())
.method(Connection.Method.POST)
.execute();
// Go to new page
Document coursePage = Jsoup.connect("https://app.northpass.com/course")
.cookies(response.cookies())
.get();
System.out.println(coursePage.title());
} catch (IOException e) {
e.printStackTrace();
}
I have also tried adding
.data("commit", "Log in")
and
.userAgent("Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/535.21 (KHTML, like Gecko) Chrome/19.0.1042.0 Safari/535.21")
without any success.
The error I get is as follows:
org.jsoup.HttpStatusException: HTTP error fetching URL. Status=500, URL=https://app.northpass.com/login
From what I have read on other threads, people suggest using a userAgent (which, as said above, I have already tried). Thanks in advance for any help.
If you look at the network traffic when you attempt a login in your browser you'll see that an additional item of data is sent: authenticity_token. This is a hidden field in the form.
You will then need to extract that token from the initial response and send it with the POST request:
try {
Connection.Response response = Jsoup.connect("https://app.northpass.com/login")
.method(Connection.Method.GET)
.execute();
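// I can't test this, but it will be something like the following;
// see https://jsoup.org/cookbook/extracting-data/selector-syntax
Document document = response.parse();
// Select the field by name; "input[hidden]" would only match an attribute called "hidden", not type="hidden"
String token = document.select("input[name=authenticity_token]").first().val();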
response = Jsoup.connect("https://app.northpass.com/login")
.data("educator[email]", "email123")
.data("educator[password]", "password123")
.data("authenticity_token", token)
.cookies(response.cookies())
.method(Connection.Method.POST)
.execute();
// Go to new page
Document coursePage = Jsoup.connect("https://app.northpass.com/course")
.cookies(response.cookies())
.get();
System.out.println(coursePage.title());
} catch (IOException e) {
e.printStackTrace();
}
For the last 3 weeks I have been trying to write a program that logs on to a website and loops through its pages to filter out specific information (specific rows/columns of tables). To be fair, this program is the reason I taught myself to code (in Java). I created a kind of autofiller, which works but is very slow, since it has to log in again for every page. So I went back to figure out why my first program (below) isn't working. For some reason I'm able to log in, but as soon as I switch from the login page to the specific page (which is only accessible when logged in), I am redirected to the login page.
For the purpose of this question I created a fake account. Maybe someone can help, or tell me where I can read further on this topic. I guess there is a problem with the cookies, though I'm not sure.
try {
String url1 = "https://www.novaragnarok.com/";
String url2 = "https://www.novaragnarok.com/?module=vending&action=item&id=2499";
Connection.Response res = Jsoup
.connect(url1)
.followRedirects(true)
.data("username", "stackoverflowww", "password", "stackpw")
.method(Method.POST)
.execute();
Map<String, String> cookies = res.cookies();
Document doc = Jsoup.connect(url2)
.cookies(cookies)
.followRedirects(true)
.get();
System.out.println(cookies);
System.out.println(doc);
} catch (IOException e) {
e.printStackTrace();
}
Try this:
String USER_AGENT = "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko)" +
" Chrome/56.0.2924.87 Safari/537.36";
public void parseWebsite(){
try{
Connection.Response homepage = Jsoup.connect("https://www.novaragnarok.com/").userAgent(USER_AGENT)
.method(Connection.Method.GET).timeout(6000).execute();
Connection.Response login = Jsoup.connect("https://www.novaragnarok.com//?module=account&action=login&return_url=")
.cookies(homepage.cookies()).data("txtbox", "stackoverflowww")
.data("password", "stackpw").userAgent(USER_AGENT).method(Connection.Method.POST)
.timeout(6000).execute();
Connection.Response url2 = Jsoup.connect("https://www.novaragnarok.com/?module=vending&action=item&id=2499")
.cookies(login.cookies()).userAgent(USER_AGENT).method(Connection.Method.GET).timeout(6000).execute();
//Your Code here
}catch (SocketException e){
e.printStackTrace();
}
catch (UncheckedIOException e){
e.printStackTrace();
}
catch(Exception e){
e.printStackTrace(); // don't swallow unexpected failures silently
}
}
This is the website I'm trying to scrape from.
I'm able to log in to the website fairly easily. However, I'm unable to retrieve and reuse the cookies or session ID to scrape a page other than the one the login page redirects to. I receive a 403 every time.
Here is an example of what I've tried:
try
{
String userAgent = "User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64; rv:38.0) Gecko/20100101 Firefox/38.0";
Connection.Response res = Jsoup.connect("http://www.interpals.net/login.php")
.data("action", "login")
.data("login", username)
.data("password", password)
.data("auto_login", "1")
.userAgent(userAgent)
.method(Connection.Method.POST)
.followRedirects(false)
.execute();
res.parse();
String sessionID = res.cookie("interpals_sessid");
Document doc = Jsoup.connect("http://www.interpals.net/friends.php").cookie("interpals_sessid", sessionID).get();
This code works for me:
try {
String url = "http://www.interpals.net/login.php";
String userAgent = "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.130 Safari/537.36";
Connection.Response response = Jsoup.connect(url).userAgent(userAgent)
.method(Connection.Method.GET)
.execute();
response = Jsoup.connect(url)
.cookies(response.cookies())
.data("action", "login")
.data("login", "login")
.data("password", "password")
.data("auto_login", "1")
.userAgent(userAgent)
.method(Connection.Method.POST)
.followRedirects(true)
.execute();
Document doc = Jsoup.connect("http://www.interpals.net/friends.php")
.cookies(response.cookies())
.userAgent(userAgent)
.get();
System.out.println(doc);
} catch (IOException e) {
e.printStackTrace();
}
I have the following code which will call the server through HttpUrlConnection.
String response = HttpUtil.submitRequest(json.toJSONString(), "http://ipaddr:port/SessionMgr/validateSession?sessionId=_78998348uthjae3a&showLoginPage=true");
The above lines will call the following code:
public static String submitRequest(String request, String requestUrl) {
try {
URL url = new URL(requestUrl);
HttpURLConnection conn = (HttpURLConnection) url.openConnection();
conn.setDoOutput(true);
conn.setRequestMethod("POST");
conn.setRequestProperty("Content-Type", "application/json; charset=UTF-8");
OutputStream os = conn.getOutputStream();
os.write(request.getBytes());
os.flush();
if (conn.getResponseCode() != HttpURLConnection.HTTP_OK) {
throw new RuntimeException("Failed : HTTP error code : "
+ conn.getResponseCode());
}
BufferedReader br = new BufferedReader(new InputStreamReader(
(conn.getInputStream())));
String output;
StringBuffer sb = new StringBuffer();
while ((output = br.readLine()) != null) {
sb.append(output);
}
conn.disconnect();
return sb.toString();
} catch (MalformedURLException e) {
} catch (IOException e) {
}
return "";
}
The requestUrl will go to the servlet below:
public class ValidateSessionServlet extends HttpServlet {
String session = req.getParameter(sessionId);
if (session == null) {
// redirect to servlet which will display login page.
response.setContentType("text/html");
String actionUrl = getIpPortUrl(request)
+ PropertyConfig.getInstance().getFromIdPConfig(globalStrings.getCheckSSOSession());
out.write("<!DOCTYPE html PUBLIC \"-//W3C//DTD HTML 4.01 Transitional//EN\" \"http://www.w3.org/TR/html4/loose.dtd\"> \n");
out.write("<html><head><body onload=\"document.forms[0].submit()\">\n");
out.write("<form method=\"POST\" action=\"" + actionUrl + "\">\n");
out.write("<input type=\"hidden\" name=\"locale\" value=\"" + locale + "\"/>\n");
out.write("<input type=\"hidden\" name=\"Sessionrequest\" value=\"" + true + "\"/>\n");
out.write("</form>\n</body>\n</html>\n");
}
}
In the above code the form should be submitted to the servlet given in actionUrl, but instead it goes back to the servlet from step (1).
1) Is there a way to make the HTML form from step (3) get submitted and redirected to the servlet in actionUrl?
To summarize the requirement behind the code above: if the session is null, I have to redirect the user to the login page, validate the credentials against the database, and then send the response back to step (1). Is that possible?
If you want your HttpURLConnection to support redirects, you need to configure it like this:
...
conn.setRequestProperty("User-agent", "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/13.0.782.215 Safari/535.1");
conn.setInstanceFollowRedirects(true);
...
Then, if your server redirects your request somewhere else, conn will receive the redirected response.
To clarify, setInstanceFollowRedirects(true) only dictates whether HTTP redirects should be automatically followed by the HttpURLConnection instance. In your particular case, it seems that you want to redirect to servlets based on whether session is null (or some other condition based on your specific application logic).
The correct (and more bug-proof) solution is to check for HTTP 3xx status codes and handle the redirect manually. Here is a code snippet as an example:
if (responseStatusCode != HttpURLConnection.HTTP_OK) {
switch (responseStatusCode) {
case HttpURLConnection.HTTP_MOVED_TEMP: // 302, intentionally falls through
case HttpURLConnection.HTTP_MOVED_PERM: // 301, intentionally falls through
case HttpURLConnection.HTTP_SEE_OTHER: // 303
String newUrl = conn.getHeaderField("Location"); // use redirect url from "Location" header field
String cookies = conn.getHeaderField("Set-Cookie"); // if cookies are needed (i.e. for login)
// manually redirect using a new connection
conn = (HttpURLConnection) new URL(newUrl).openConnection();
if (cookies != null) { // the redirect may not set any cookie
conn.setRequestProperty("Cookie", cookies);
}
conn.addRequestProperty("User-agent", "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/13.0.782.215 Safari/535.1");
break;
default:
// handle default (other) case
break;
}
}
The above code is similar to what I use for my app's user login redirects, and I've found that it's very easy to debug. (In general, I handle HTTP status code cases manually in order to avoid bugs down the road.)
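Two details of the manual redirect are easy to get wrong: the Location value may be relative, and getHeaderField("Set-Cookie") returns only one header even when the server sends several, attributes such as Path included. Below is a rough sketch of helpers for both, using only the standard library. RedirectSupport and its method names are made up for illustration, and the cookie handling is deliberately naive (it ignores Domain, Path and expiry).

```java
import java.net.URI;
import java.util.List;
import java.util.StringJoiner;

/** Hypothetical helpers for following a redirect by hand. */
class RedirectSupport {

    /** True for the status codes that carry a Location header to follow. */
    static boolean isRedirect(int status) {
        return status == 301 || status == 302 || status == 303
                || status == 307 || status == 308;
    }

    /** Resolves a possibly relative Location value against the URL that was requested. */
    static String resolveLocation(String requestUrl, String location) {
        return URI.create(requestUrl).resolve(location).toString();
    }

    /** Builds a Cookie request header from all Set-Cookie values, dropping their attributes. */
    static String toCookieHeader(List<String> setCookieHeaders) {
        StringJoiner joiner = new StringJoiner("; ");
        for (String header : setCookieHeaders) {
            int end = header.indexOf(';');
            // Keep only the leading name=value pair; Path, HttpOnly etc. are not sent back.
            String pair = (end >= 0 ? header.substring(0, end) : header).trim();
            if (!pair.isEmpty()) {
                joiner.add(pair);
            }
        }
        return joiner.toString();
    }
}
```

In the switch above this could be used as resolveLocation(requestUrl, conn.getHeaderField("Location")), and toCookieHeader(conn.getHeaderFields().get("Set-Cookie")) after checking that the list is not null.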
Finally, I would recommend using a good JSON lib, such as json.org, to parse your responses.
I'm trying to save the cookies from a URL that uses SSL, but they always come back as null.
private Map<String, String> cookies = new HashMap<String, String>();
private Document get(String url) throws IOException {
Connection connection = Jsoup.connect(url);
for (Entry<String, String> cookie : cookies.entrySet()) {
connection.cookie(cookie.getKey(), cookie.getValue());
}
Response response = connection.execute();
cookies.putAll(response.cookies());
return response.parse();
}
private void buscaJuizado(List<Movimentacao> movimentacoes) {
try {
Connection.Response res = Jsoup.connect("https://projudi.tjpi.jus.br/projudi/publico/buscas/ProcessosParte?publico=true")
.userAgent("Mozilla/5.0 (Windows NT 6.1; rv:15.0) Gecko/20120716 Firefox/15.0a2")
.timeout(0)
.response();
cookies = res.cookies();
Document doc = get("https://projudi.tjpi.jus.br/projudi/listagens/DadosProcesso?numeroProcesso=" + campo);
System.out.println(doc.body());
} catch (IOException ex) {
Logger.getLogger(ConsultaProcessoTJPi.class.getName()).log(Level.SEVERE, null, ex);
}
}
I try to capture the cookies from the first connection, but they are always set to null. I think it might have something to do with the secure connection (HTTPS).
Any idea?
The issue is not HTTPS. The problem is more or less a small mistake.
To fix your issue you can simply replace .response() with .execute(). Like so,
private void buscaJuizado(List<Movimentacao> movimentacoes) {
try {
Connection.Response res = Jsoup
.connect("https://projudi.tjpi.jus.br/projudi/publico/buscas/ProcessosParte?publico=true")
.userAgent("Mozilla/5.0 (Windows NT 6.1; rv:15.0) Gecko/20120716 Firefox/15.0a2")
.timeout(0)
.execute(); // changed from response()
cookies = res.cookies();
Document doc = get("https://projudi.tjpi.jus.br/projudi/listagens/DadosProcesso?numeroProcesso="+campo);
System.out.println(doc.body());
} catch (IOException ex) {
Logger.getLogger(ConsultaProcessoTJPi.class.getName()).log(Level.SEVERE, null, ex);
}
}
Overall you have to make sure you execute the request first.
Calling .response() is useful to grab the Connection.Response object after the request has already been executed. Obviously the Connection.Response object won't be very useful if you haven't executed the request.
In fact if you were to try calling res.body() on the unexecuted response you would receive the following exception indicating the issue.
java.lang.IllegalArgumentException: Request must be executed (with .execute(), .get(), or .post() before getting response body