How could I accept cookies with jsoup? - java

I'm trying to scrape comments from a digital newspaper in Java with Jsoup. The problem is that I have to accept the site's cookies before I can search for news articles. The site is https://elpais.com/buscador/
I tried connecting with Jsoup.connect(url) and reusing the cookies from the response in a second request, but it didn't succeed.
Document doc = null;
// Cookies collected so far: sent with every request, updated from every response.
Map<String, String> cookies = new TreeMap<String, String>();
try {
    // First request.
    Connection connection1 = Jsoup.connect(url);
    for (Entry<String, String> cookie : cookies.entrySet()) {
        connection1.cookie(cookie.getKey(), cookie.getValue());
    }
    Response response1 = connection1.execute();
    cookies.putAll(response1.cookies());
    // Second request, carrying the cookies returned by the first one.
    Connection connection2 = Jsoup.connect(url);
    for (Entry<String, String> cookie : cookies.entrySet()) {
        connection2.cookie(cookie.getKey(), cookie.getValue());
    }
    Response response2 = connection2.execute();
    cookies.putAll(response2.cookies());
    doc = response2.parse();
} catch (IOException ex) {
    System.out.println("Exception raised: " + ex.getMessage());
}
return doc;
I found this code in several posts, but it didn't help me :(
I would like to request the URL "https://elpais.com/buscador/?qt=Espa%C3%B1a+Politica&sf=1&np=1&bu=ep&of=html" and get the response containing the news articles.
Thank you.
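As a side note on the URL above, here is a minimal sketch of how its query string can be built from a plain search phrase. It does not address the cookie consent itself. The class and method names are made up for illustration, and the parameters other than qt are simply copied from the URL in the question without any claim about what they mean.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class SearchUrlSketch {
    // Search endpoint taken from the question.
    private static final String BASE = "https://elpais.com/buscador/";

    /** Builds the search URL for a free-text query (hypothetical helper). */
    static String searchUrl(String query) {
        try {
            // Form encoding: spaces become '+', non-ASCII characters become
            // the %XX escapes of their UTF-8 bytes.
            String qt = URLEncoder.encode(query, "UTF-8");
            // Remaining parameters are copied verbatim from the question's URL.
            return BASE + "?qt=" + qt + "&sf=1&np=1&bu=ep&of=html";
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }
}
```

Calling SearchUrlSketch.searchUrl("España Politica") reproduces the URL quoted in the question.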

Related

Set simple string header (not key/value pair) in Volley HTTP POST request

EDIT: It looks like the main issue is setting a string-only header without key/value pairs and associated separators, since running a manual curl request with no "="s got me a server response, therefore I've edited the title.
I am trying to send a POST request to authenticate as described in this Amazon Alexa tutorial, from an Android app, using Volley.
For the second request I am sending (which requires a header), I receive a 400 server error, indicating a bad request.
According to the tutorial page, this is what the request header should look like:
The request must include the following headers:
POST /auth/o2/token
Host: api.amazon.com
Content-Type: application/x-www-form-urlencoded
If I use the regular getHeaders() method for the Volley Request class to override the headers, I can only set a hashmap, which results in the following header format:
{Host=api.amazon.com, Content-Type=application/x-www-form-urlencoded}
(or {POST=/auth/o2/token, Host=api.amazon.com, Content-Type=application/x-www-form-urlencoded} if I include another line for the first bit)
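A note on the two formats above: the {Host=..., Content-Type=...} form is just what Map.toString() prints, not what goes over the wire. An HTTP/1.1 client writes each header as a "Name: value" line, and "POST /auth/o2/token" is the request line rather than a header. The sketch below illustrates the difference with plain JDK classes; it is not Volley's implementation, and the class and method names are made up.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class HeaderFormatSketch {
    /** Renders header entries the way HTTP/1.1 writes them: one "Name: value" line each. */
    static String toWireFormat(Map<String, String> headers) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : headers.entrySet()) {
            sb.append(e.getKey()).append(": ").append(e.getValue()).append("\r\n");
        }
        return sb.toString();
    }

    /** The two headers from the question, in insertion order. */
    static Map<String, String> exampleHeaders() {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("Host", "api.amazon.com");
        headers.put("Content-Type", "application/x-www-form-urlencoded");
        return headers;
    }
}
```

Here exampleHeaders().toString() gives the brace-and-equals form seen in the debugger or log, while toWireFormat(exampleHeaders()) gives the lines in the shape the tutorial shows.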
Being new to Volley in general, I wonder if I'm missing something really obvious here. This is the request I am sending:
StringRequest tokenPoller = new StringRequest(
        Request.Method.POST,
        "https://api.amazon.com/auth/O2/token",
        new Response.Listener<String>() {
            @Override
            public void onResponse(String response) {
                Log.i("volley", response);
            }
        },
        new Response.ErrorListener() {
            @Override
            public void onErrorResponse(VolleyError error) {
                VolleyLog.d("volley", "Error: " + error.getMessage());
                error.printStackTrace();
            }
        }) {
    @Override
    public String getBodyContentType() {
        return "application/x-www-form-urlencoded";
    }
    @Override
    protected Map<String, String> getParams() throws AuthFailureError {
        Map<String, String> params = new HashMap<String, String>();
        params.put("grant_type", "device_code");
        params.put("device_code", {{my device code from previous request}});
        params.put("user_code", {{my user code from previous request}});
        return params;
    }
    @Override
    public Map<String, String> getHeaders() throws AuthFailureError {
        Map<String, String> headers = new HashMap<String, String>();
        headers.put("Host", "api.amazon.com");
        headers.put("Content-Type", "application/x-www-form-urlencoded");
        return headers;
    }
};
I suspect that something about the headers is off, but I really can't put my finger on it. I've tried not overriding the headers too, to no avail. Any pointers will be highly appreciated! Thanks!
The 400 response in this case did not mean that my request was malformed. It was just the response code the server returns with all errors connected to the authorization process, in my case e.g. authorization_pending. Since I wasn't requesting the error message (via Volley or otherwise), I did not see that until much later.
While the tutorial does mention that several kinds of responses can be delivered while polling for a token, it does not mention that they are in fact error and error description for a 400 response.
I ended up switching from Volley to HttpsURLConnection, and used this question's responses to receive the error and error message as a JSON response and implement the respective reactions.
In the end, my HTTP request looked like this:
String stringUrl = "https://api.amazon.com/auth/o2/token";
URL url = null;
try {
    url = new URL(stringUrl);
} catch (MalformedURLException exception) {
    Log.e(TAG, "Error with creating URL", exception);
}
String response = "";
HttpsURLConnection conn = null;
HashMap<String, String> params = new HashMap<String, String>();
try {
    params.put("grant_type", "device_code");
    params.put("device_code", mDeviceCode);
    params.put("user_code", mUserCode);
    // Build the application/x-www-form-urlencoded request body.
    StringBuilder postData = new StringBuilder();
    for (Map.Entry<String, String> param : params.entrySet()) {
        if (postData.length() != 0) {
            postData.append('&');
        }
        postData.append(URLEncoder.encode(param.getKey(), "UTF-8"));
        postData.append('=');
        postData.append(URLEncoder.encode(String.valueOf(param.getValue()), "UTF-8"));
    }
    byte[] postDataBytes = postData.toString().getBytes("UTF-8");
    conn = (HttpsURLConnection) url.openConnection();
    conn.setRequestMethod("POST");
    conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
    conn.setDoOutput(true);
    conn.getOutputStream().write(postDataBytes);
    // getInputStream() throws on 4xx/5xx responses, so fall back to the
    // error stream to read the JSON error body.
    InputStream inputStream = null;
    try {
        inputStream = conn.getInputStream();
    } catch (IOException exception) {
        inputStream = conn.getErrorStream();
    }
    BufferedReader reader = new BufferedReader(new InputStreamReader(inputStream));
    StringBuilder builder = new StringBuilder();
    for (String line = null; (line = reader.readLine()) != null;) {
        builder.append(line).append("\n");
    }
    reader.close();
    response = builder.toString();
} catch (IOException exception) {
    Log.e(TAG, "Error while requesting the token", exception);
} finally {
    if (conn != null) {
        conn.disconnect();
    }
}
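Once the error body has been read through getErrorStream(), the poller has to decide whether a 400 means "keep waiting" or a real failure. Below is a minimal sketch of that decision. It assumes the body is JSON with an "error" field (the field name follows the usual OAuth convention and is not confirmed by the tutorial text quoted here), and it uses a regular expression only to stay dependency-free; a real JSON parser would be more robust. The class and method names are made up.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TokenPollSketch {
    // Matches an "error" field; the key "error_description" is a different
    // string and is therefore not matched by this pattern.
    private static final Pattern ERROR_FIELD =
            Pattern.compile("\"error\"\\s*:\\s*\"([^\"]*)\"");

    /** Returns the value of the "error" field, or null when the body has none. */
    static String errorCode(String jsonBody) {
        Matcher m = ERROR_FIELD.matcher(jsonBody);
        return m.find() ? m.group(1) : null;
    }

    /** True when the 400 only says that the user has not approved the device yet. */
    static boolean shouldKeepPolling(int status, String jsonBody) {
        return status == 400 && "authorization_pending".equals(errorCode(jsonBody));
    }
}
```

With this, a 400 whose body carries authorization_pending leads to another poll after a delay, while any other error code is surfaced to the caller.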

Calling APIGateway endpoint with HttpGet from Java

I'm trying to call an APIGateway endpoint using an HttpGet in Java. Here's what my code looks like:
    try {
        CloseableHttpClient httpClient = HttpClients.createDefault();
        String HOST = "https://<resourceId>.execute-api.us-west-2.amazonaws.com/beta/update";
        /* Prepare get request */
        HttpGet httpGet = new HttpGet(HOST);
        /* Add headers to get request */
        httpGet.addHeader("Content-Type", "application/json");
        httpGet.addHeader("host", HOST);
        TreeMap<String, String> awsHeaders = new TreeMap<String, String>();
        awsHeaders.put("host", HOST);
        AWSV4Auth awsAuth = new AWSV4Auth.Builder("key", "value")
                .regionName("us-west-2")
                .serviceName("execute-api") // es - elastic search. use your service name
                .httpMethodName("GET") // GET, PUT, POST, DELETE, etc...
                .debug()
                .awsHeaders(awsHeaders)
                .build();
        Map<String, String> headers = awsAuth.getHeaders();
        for (Map.Entry<String, String> entrySet : headers.entrySet()) {
            httpGet.addHeader(entrySet.getKey(), entrySet.getValue());
        }
        System.out.println("Actual http headers");
        List<Header> getHeaders = Arrays.asList(httpGet.getAllHeaders());
        for (Header header : getHeaders) {
            System.out.println(header.getName() + " : " + header.getValue());
        }
        HttpResponse response = httpClient.execute(httpGet);
        HttpEntity entity = response.getEntity();
        String jsonResponse = EntityUtils.toString(entity, "UTF-8");
        System.out.println("jsonResponse" + jsonResponse);
    } catch (Exception e) {
        System.out.println("Error calling HttpClient");
        e.printStackTrace();
    }
}
My APIGateway stage name is beta and the method name is update.
I'm using this GitHub class for signing my request: https://github.com/javaquery/Examples/blob/master/src/com/javaquery/aws/AWSV4Auth.java
I keep seeing this error: "This request could not be satisfied."
I used Postman as a reference to generate the headers, and I still cannot figure out where I am going wrong.
I am not trying to use the generated SDK from APIGateway because of a specific internal problem. Am I missing any other headers that need to be passed in?
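One detail worth checking in the code above, offered as a possibility rather than a confirmed cause: the "host" header is set to the full URL, including the scheme and the path. A Host header normally carries only the host name (plus an optional port), and Signature Version 4 includes the host header in what gets signed, so its exact value matters. The sketch below shows how the host and path can be split out with the JDK's URI class; the resource id abc123 used in the usage note is a made-up placeholder, and the class name is hypothetical.

```java
import java.net.URI;

final class HostHeaderSketch {
    /** Host name only: what the Host header carries (no scheme, no path). */
    static String hostOf(String endpoint) {
        return URI.create(endpoint).getHost();
    }

    /** Path part of the endpoint, here the stage name followed by the method name. */
    static String pathOf(String endpoint) {
        return URI.create(endpoint).getPath();
    }
}
```

For "https://abc123.execute-api.us-west-2.amazonaws.com/beta/update", hostOf returns the execute-api host name and pathOf returns "/beta/update"; the former would be the value for the host header and the latter the canonical path handed to the signer.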

Jersey JAVA REST Client giving Error 500 "BAD Request" for POST request, while POSTMAN is able POST to same Restful API

I am trying to post form data through a Java Jersey REST client, but I receive response code 500 and the following exception:
java.lang.RuntimeException: Failed with HTTP error code : 500
The same request from POSTMAN (Chrome extension) works successfully.
I am making a POST request to the StreamSets Data Collector API.
Below is my code:
    public static String testUploadService(String httpURL, File filePath) throws Exception {
        // local variables
        ClientConfig clientConfig = null;
        Client client = null;
        WebTarget webTarget = null;
        Invocation.Builder invocationBuilder = null;
        Response response = null;
        FileDataBodyPart fileDataBodyPart = null;
        FormDataMultiPart formDataMultiPart = null;
        int responseCode;
        String responseMessageFromServer = null;
        String responseString = null;
        String name = "*******";
        String password = "*******";
        String authString = name + ":" + password;
        String sdc = "sdc";
        byte[] encoding = Base64.getEncoder().encode(authString.getBytes());
        byte[] encoding2 = Base64.getEncoder().encode(sdc.getBytes());
        String USER_PASS = new String(encoding);
        String auth2 = new String(encoding2);
        try {
            ClientConfig cc = new ClientConfig();
            cc.register(MultiPartFeature.class);
            try {
                client = new JerseywithSSL().initClient(cc);
            } catch (KeyManagementException e) {
                e.printStackTrace();
            } catch (NoSuchAlgorithmException e) {
                e.printStackTrace();
            }
            webTarget = client.target(httpURL);
            // set file upload values
            fileDataBodyPart = new FileDataBodyPart("uploadFile", filePath, MediaType.MULTIPART_FORM_DATA_TYPE);
            formDataMultiPart = new FormDataMultiPart();
            formDataMultiPart.bodyPart(fileDataBodyPart);
            // invoke service
            invocationBuilder = webTarget.request();
            invocationBuilder.header("Authorization", "Basic " + USER_PASS);
            invocationBuilder.header("X-Requested-By", "SDC"); // Additional header required by the StreamSets REST API
            invocationBuilder.header("Content-type", "multipart/form-data");
            response = invocationBuilder.post(Entity.entity(formDataMultiPart, MediaType.MULTIPART_FORM_DATA));
            // get response code
            responseCode = response.getStatus();
            System.out.println("Response code: " + responseCode);
            if (response.getStatus() != 200) {
                throw new RuntimeException("Failed with HTTP error code : " + responseCode);
            }
            // get response message
            responseMessageFromServer = response.getStatusInfo().getReasonPhrase();
            System.out.println("ResponseMessageFromServer: " + responseMessageFromServer);
            // get response string
            responseString = response.readEntity(String.class);
        } catch (Exception ex) {
            ex.printStackTrace();
        } finally {
            // release resources, if any
            fileDataBodyPart.cleanup();
            formDataMultiPart.cleanup();
            formDataMultiPart.close();
            response.close();
            client.close();
        }
        return responseString;
    }
}
And here is a screenshot of POSTMAN with all headers and authentication included.
I can't figure out whether it's an issue with how I form the multipart request or an issue on the server side, and if it's the former, then where exactly am I going wrong?
PS: I got past the SSL certificate error by adding a trust certificate.
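For reference, the Basic Authorization value used in the code above can be produced in one step with an explicit charset. This is only a tidier equivalent of the encode-then-new-String sequence in the question, not a fix for the 500 response; the class and method names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class BasicAuthSketch {
    /** Builds the value of an HTTP Basic Authorization header. */
    static String basicAuthHeader(String user, String password) {
        String credentials = user + ":" + password;
        // Encode the UTF-8 bytes of "user:password" as Base64 text.
        String encoded = Base64.getEncoder()
                .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
        return "Basic " + encoded;
    }
}
```

The result can be passed directly as the value of the Authorization header.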
UPDATE 1
After digging further, I got the following error stack trace:
responseString : {
"RemoteException" : {
"message" : "java.lang.NullPointerException: in is null",
"errorCode" : "CONTAINER_0000",
"localizedMessage" : "in is null",
"exception" : "NullPointerException",
"javaClassName" : "java.lang.NullPointerException",
"stackTrace" : "java.lang.NullPointerException: in is null\n\tat java.util.zip.ZipInputStream.<init>(ZipInputStream.java:101)\n\tat java.util.zip.ZipInputStream.<init>(ZipInputStream.java:80)\n\tat com.streamsets.datacollector.restapi.PipelineStoreResource.importPipelines(PipelineStoreResource.java:551)\n\tat sun.reflect.GeneratedMethodAccessor573.invoke(Unknown Source)\n\tat sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)\n\tat java.lang.reflect.Method.invoke(Method.java:498)\n\tat org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory$1.invoke(ResourceMethodInvocationHandlerFactory.java:81)\n\tat org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:144)\n\tat org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:161)\n\tat org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$ResponseOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:160)\n\tat org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:99)\n\tat org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:389)\n\tat org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:347)\n\tat org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:102)\n\tat org.glassfish.jersey.server.ServerRuntime$2.run(ServerRuntime.java:326)\n\tat org.glassfish.jersey.internal.Errors$1.call(Errors.java:271)\n\tat org.glassfish.jersey.internal.Errors$1.call(Errors.java:267)\n\tat org.glassfish.jersey.internal.Errors.process(Errors.java:315)\n\tat org.glassfish.jersey.internal.Errors.process(Errors.java:297)\n\tat org.glassfish.jersey.internal.Errors.process(Errors.java:267)\n\tat 
org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:317)\n\tat org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:305)\n\tat org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:1154)\n\tat org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:473)\n\tat org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:427)\n\tat org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:388)\n\tat org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:341)\n\tat org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:228)\n\tat org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:841)\n\tat org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1634)\n\tat com.streamsets.datacollector.http.GroupsInScopeFilter.lambda$doFilter$0(GroupsInScopeFilter.java:82)\n\tat com.streamsets.datacollector.security.GroupsInScope.execute(GroupsInScope.java:33)\n\tat com.streamsets.datacollector.http.GroupsInScopeFilter.doFilter(GroupsInScopeFilter.java:81)\n\tat org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1621)\n\tat org.eclipse.jetty.servlets.CrossOriginFilter.handle(CrossOriginFilter.java:308)\n\tat org.eclipse.jetty.servlets.CrossOriginFilter.doFilter(CrossOriginFilter.java:262)\n\tat org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1621)\n\tat com.streamsets.datacollector.http.LocaleDetectorFilter.doFilter(LocaleDetectorFilter.java:39)\n\tat org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1621)\n\tat com.streamsets.pipeline.http.MDCFilter.doFilter(MDCFilter.java:47)\n\tat org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1621)\n\tat org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:541)\n\tat 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)\n\tat org.eclipse.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:494)\n\tat org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:513)\n\tat org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:190)\n\tat org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1592)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:188)\n\tat org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1239)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:168)\n\tat org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:481)\n\tat org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1561)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:166)\n\tat org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1141)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)\n\tat org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)\n\tat org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)\n\tat org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:118)\n\tat org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)\n\tat org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)\n\tat org.eclipse.jetty.server.Server.handle(Server.java:564)\n\tat org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)\n\tat org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)\n\tat org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:279)\n\tat 
org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:110)\n\tat org.eclipse.jetty.io.ssl.SslConnection.onFillable(SslConnection.java:258)\n\tat org.eclipse.jetty.io.ssl.SslConnection$3.succeeded(SslConnection.java:147)\n\tat org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:110)\n\tat org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:124)\n\tat org.eclipse.jetty.util.thread.Invocable.invokePreferred(Invocable.java:122)\n\tat org.eclipse.jetty.util.thread.strategy.ExecutingExecutionStrategy.invoke(ExecutingExecutionStrategy.java:58)\n\tat org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:201)\n\tat org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:133)\n\tat org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:672)\n\tat org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:590)\n\tat java.lang.Thread.run(Thread.java:748)\n"
}
}

SPOJ Login in Java

I was trying to automate the SPOJ login in Java with Jsoup, but the response page isn't what I expected. Below is the code:
public static void login() {
    try {
        Document doc;
        Connection.Response response;
        Map<String, String> cookies = new HashMap<>(), form = new HashMap<>();
        response = Jsoup.connect(LOGIN_URL) // http://spoj.com/login
                .method(Connection.Method.GET)
                .userAgent(USER_AGENT)
                .execute();
        cookies.putAll(response.cookies());
        form.put("login_user", LOGIN_ACCOUNT);
        form.put("password", LOGIN_PASSWORD);
        form.put("next_raw", "/");
        response = Jsoup.connect(LOGIN_URL)
                .cookies(cookies)
                .method(Connection.Method.POST)
                .data(form)
                .userAgent(USER_AGENT)
                .execute();
        System.out.println(response.body());
    } catch (IOException e) {
    }
}
The response body still contains the sign-in form, which means my login attempt failed. Can anyone help, please?
I resolved this issue by retrieving the cookie from the Set-Cookie response header and changing the login implementation to use the Apache HTTP client:
public static void login() {
    try {
        // Submit login form
        HttpResponse response = Request.Post(LOGIN_URL)
                .bodyForm(Form.form()
                        .add("login_user", ACCOUNT)
                        .add("password", PASSWORD)
                        .add("next_raw", "/")
                        .add("autologin", "1")
                        .build())
                .userAgent(USER_AGENT)
                .execute()
                .returnResponse();
        // Retrieve cookie from response
        StringBuilder sb = new StringBuilder();
        for (Header header : response.getAllHeaders()) {
            if (header.getName().equalsIgnoreCase("set-cookie") || header.getName().equalsIgnoreCase("set-cookie2")) {
                sb.append(header.getValue()).append("; ");
            }
        }
        COOKIE = sb.toString();
        // Debugging to verify that the login is successful
        Document doc = Jsoup.connect(BASE_URL).cookies(cookieMapper()).get();
        System.out.println(doc.text());
    } catch (IOException e) {
        e.printStackTrace();
    }
}
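The cookieMapper() helper used above is not shown in the answer. The following is one possible shape for such a helper, written as a guess and not as the author's code: it turns raw Set-Cookie header values into the name/value map that Jsoup's cookies(...) expects, keeping only the leading name=value pair of each header and dropping attributes such as Path or Expires. The cookie names in the usage note are invented.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class SetCookieSketch {
    /** Keeps only the leading name=value pair of each Set-Cookie header value. */
    static Map<String, String> toCookieMap(List<String> setCookieValues) {
        Map<String, String> cookies = new LinkedHashMap<>();
        for (String value : setCookieValues) {
            // Everything after the first ';' is an attribute, not a cookie.
            int semicolon = value.indexOf(';');
            String pair = (semicolon >= 0 ? value.substring(0, semicolon) : value).trim();
            int equals = pair.indexOf('=');
            if (equals > 0) {
                cookies.put(pair.substring(0, equals).trim(),
                        pair.substring(equals + 1).trim());
            }
        }
        return cookies;
    }
}
```

Given the header values "SPOJ=abc123; path=/; HttpOnly" and "autologin_hash=xyz", the map holds SPOJ and autologin_hash with their values and nothing else, so attribute names never leak into the Cookie header of the follow-up request.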

How to retrieve cookies on a https connection?

I'm trying to save the cookies from a URL that uses SSL, but I always get NULL.
private Map<String, String> cookies = new HashMap<String, String>();

private Document get(String url) throws IOException {
    Connection connection = Jsoup.connect(url);
    for (Entry<String, String> cookie : cookies.entrySet()) {
        connection.cookie(cookie.getKey(), cookie.getValue());
    }
    Response response = connection.execute();
    cookies.putAll(response.cookies());
    return response.parse();
}

private void buscaJuizado(List<Movimentacao> movimentacoes) {
    try {
        Connection.Response res = Jsoup
                .connect("https://projudi.tjpi.jus.br/projudi/publico/buscas/ProcessosParte?publico=true")
                .userAgent("Mozilla/5.0 (Windows NT 6.1; rv:15.0) Gecko/20120716 Firefox/15.0a2")
                .timeout(0)
                .response();
        cookies = res.cookies();
        Document doc = get("https://projudi.tjpi.jus.br/projudi/listagens/DadosProcesso?numeroProcesso=" + campo);
        System.out.println(doc.body());
    } catch (IOException ex) {
        Logger.getLogger(ConsultaProcessoTJPi.class.getName()).log(Level.SEVERE, null, ex);
    }
}
I try to capture the cookies from the first connection, but they are always NULL. I think it might have something to do with the secure connection (HTTPS).
Any ideas?
The issue is not HTTPS. The problem is more or less a small mistake.
To fix your issue you can simply replace .response() with .execute(). Like so,
private void buscaJuizado(List<Movimentacao> movimentacoes) {
    try {
        Connection.Response res = Jsoup
                .connect("https://projudi.tjpi.jus.br/projudi/publico/buscas/ProcessosParte?publico=true")
                .userAgent("Mozilla/5.0 (Windows NT 6.1; rv:15.0) Gecko/20120716 Firefox/15.0a2")
                .timeout(0)
                .execute(); // changed from response()
        cookies = res.cookies();
        Document doc = get("https://projudi.tjpi.jus.br/projudi/listagens/DadosProcesso?numeroProcesso=" + campo);
        System.out.println(doc.body());
    } catch (IOException ex) {
        Logger.getLogger(ConsultaProcessoTJPi.class.getName()).log(Level.SEVERE, null, ex);
    }
}
Overall you have to make sure you execute the request first.
Calling .response() is useful to grab the Connection.Response object after the request has already been executed. Obviously the Connection.Response object won't be very useful if you haven't executed the request.
In fact if you were to try calling res.body() on the unexecuted response you would receive the following exception indicating the issue.
java.lang.IllegalArgumentException: Request must be executed (with .execute(), .get(), or .post() before getting response body
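Once the first request has really been executed, its cookies have to be stored and sent back on later requests, which is what the get(...) helper in the question does by hand with a map. As a library-independent illustration of that store-then-replay cycle, the sketch below uses the JDK's java.net.CookieManager with no network access. It is not Jsoup code; the class name, the example.com URL and the cookie value in the usage note are made up.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.CookieManager;
import java.net.URI;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class CookieJarSketch {
    /** Stores one Set-Cookie value for a site and returns the Cookie header a later request would carry. */
    static String replay(String siteUrl, String setCookieValue) {
        try {
            CookieManager jar = new CookieManager();
            URI uri = URI.create(siteUrl);
            // Counterpart of "execute the first request and keep its cookies".
            jar.put(uri, Collections.singletonMap(
                    "Set-Cookie", Collections.singletonList(setCookieValue)));
            // Counterpart of "attach the stored cookies to the next request".
            Map<String, List<String>> requestHeaders =
                    jar.get(uri, new HashMap<String, List<String>>());
            List<String> cookie = requestHeaders.get("Cookie");
            return cookie == null ? "" : String.join("; ", cookie);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling replay("https://example.com/", "JSESSIONID=abc; Path=/") yields the name=value pair only; the Path attribute is used for matching and is not sent back.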
