HTTP calls from spark map function ensuring single instance of HttpClient - java

I have a dataset and need to call an API for each row, so I am using a map function.
Inside the map function, I make the API call and return a new object.
The API calls succeed when I create a new HttpClient for each call:
HttpClient client = new DefaultHttpClient();
Here the client is initialised and used inside the map function.
However, when I try to use a single instance of HttpClient, the API calls fail with:
java.lang.IllegalStateException: Invalid use of BasicClientConnManager: connection still allocated. Make sure to release the connection before allocating another one.
I am using the following approach to ensure a single HttpClient object:
private static HttpClient httpClient;
public static HttpClient gethttpClient() {
    if (httpClient == null) {
        httpClient = new DefaultHttpClient();
    }
    return httpClient;
}
I then call gethttpClient() to make the API calls. However, this gives the above error.
What is the correct way to make API calls from a map function in Java Spark?

It looks like you want to avoid creating excess HttpClient objects, hence the shared-instance method. The way out of this is to iterate over the rows in batches and use a new DefaultHttpClient() for each batch.
Here dataset is of type org.apache.spark.sql.Dataset:
dataset.foreachPartition( dataSetBatch -> {
    DefaultHttpClient http = new DefaultHttpClient();
    // Loop over every row of the partition; an if would only handle the first row
    while (dataSetBatch.hasNext()) {
        dataSetBatch.next();
        // invoke submit HTTP request here
    }
    http.close();
});
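The batching idea above can be sketched without Spark on the classpath. This is a minimal sketch under assumptions, not Spark or HttpClient API: PartitionCalls, callPerPartition, clientFactory and apiCall are hypothetical names introduced only for illustration. It creates one client per partition iterator, calls the API once per row, collects the results, and closes the client when the partition is exhausted.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.Supplier;

// Hypothetical helper: one client per partition, one API call per row.
final class PartitionCalls {
    private PartitionCalls() {}

    static <C extends Closeable, T, R> List<R> callPerPartition(
            Iterator<T> rows,
            Supplier<C> clientFactory,
            BiFunction<C, T, R> apiCall) {
        List<R> results = new ArrayList<>();
        // The client is created once here and closed automatically at the end.
        try (C client = clientFactory.get()) {
            while (rows.hasNext()) {
                results.add(apiCall.apply(client, rows.next()));
            }
        } catch (IOException e) {
            // Only close() can throw IOException in this sketch.
            throw new UncheckedIOException(e);
        }
        return results;
    }
}
```

In a Spark job, the same body could presumably sit inside dataset.mapPartitions(...) so that the per-row results are returned rather than discarded. Because the client never leaves the partition that created it, no two tasks share a connection manager.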

Related

Assign a different proxy to thread - Java

Hi guys, I am making a bot that uses another REST API to create a user, but the API only supports one user at a time and I don't want to change it. So I am using multithreading to call the API multiple times, and I want to use proxies with it: a different proxy for each thread. All the threads use HttpClient. I tried its #proxy method and gave it the IP and port of the proxy, but when I called the API it returned a null response. Without the proxy it did return a valid response.
So is there any other way to assign a proxy to a thread?
My code:
HttpClient client = HttpClient.newBuilder()
.version(HttpClient.Version.HTTP_1_1)
.followRedirects(HttpClient.Redirect.NORMAL)
//THIS LINE BELOW IS HOW I USED PROXIES
.proxy(ProxySelector.of(new InetSocketAddress("proxy.example.com", 80)))
.connectTimeout(Duration.ofSeconds(30))
.build();
HttpRequest discordAccNoCaptcha = HttpRequest.newBuilder()
.uri(URI.create("https://myapiserver.com/api/v9/auth/register"))
.timeout(Duration.ofMinutes(2))
.POST(HttpRequest.BodyPublishers.ofString(body))
.build();
String response = client.send(discordAccNoCaptcha, HttpResponse.BodyHandlers.ofString()).body(); /* This is null when using ProxySelector and valid response when using no ProxySelector */
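One possible arrangement, sketched under assumptions: with java.net.http.HttpClient the proxy is configured on the client builder rather than on a thread, so each worker thread can be handed its own client built with its own ProxySelector. ProxiedClients, forProxy and forProxies are hypothetical names, and the proxy addresses passed in are placeholders. This sketch does not by itself explain the null response body described above.

```java
import java.net.InetSocketAddress;
import java.net.ProxySelector;
import java.net.http.HttpClient;
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: builds one HttpClient per proxy address.
final class ProxiedClients {
    private ProxiedClients() {}

    // Same builder settings as in the question, with the proxy passed in.
    static HttpClient forProxy(InetSocketAddress proxy) {
        return HttpClient.newBuilder()
                .version(HttpClient.Version.HTTP_1_1)
                .followRedirects(HttpClient.Redirect.NORMAL)
                .proxy(ProxySelector.of(proxy))
                .connectTimeout(Duration.ofSeconds(30))
                .build();
    }

    // One client per proxy; client i is meant to be used by worker thread i.
    static List<HttpClient> forProxies(List<InetSocketAddress> proxies) {
        List<HttpClient> clients = new ArrayList<>();
        for (InetSocketAddress proxy : proxies) {
            clients.add(forProxy(proxy));
        }
        return clients;
    }
}
```

Each thread would then call send(...) on the client it was given, so requests from different threads go through different proxies.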

Best way for apache HttpClients using in a multithreaded environment

I need to create a service on the server-side for sending HTTP requests. It works like this:
1. Client sends a request to the server
2. Server uses singleton service for calling 3rd-party API via HTTP.
3. API returns the response
4. Server processes this response and returns it to the client
It is a synchronous process.
I created this service:
public class ApacheHttpClient {
private final static CloseableHttpClient client = HttpClients.createDefault();
public String sendPost(String serverUrl, String body) {
try {
HttpPost httpPost = new HttpPost(serverUrl);
httpPost.setEntity(new StringEntity(body));
CloseableHttpResponse response = client.execute(httpPost);
String responseAsString = EntityUtils.toString(response.getEntity());
response.close();
return responseAsString;
} catch (IOException e) {
throw new RestException(e);
}
}
}
ApacheHttpClient is a singleton in my system. CloseableHttpClient client is a singleton too.
Question 1: Is it correct to use one instance of CloseableHttpClient client, or should I create a new instance for each request?
Question 2: When should I close the client?
Question 3: This client can process only 2 connections at a time. Should I use an executor?
1. Using one HttpClient instance as a singleton per distinct HTTP service is correct and in line with Apache HttpClient best practices. It should not be static, though.
http://hc.apache.org/httpcomponents-client-5.1.x/migration-guide/preparation.html
2. One should close the HttpClient when releasing the HTTP service. In your case ApacheHttpClient should implement Closeable and close the internal instance of CloseableHttpClient in its #close method.
3. You probably should not use an executor, but it really depends on how exactly your application deals with request execution.
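The ownership advice above can be sketched with JDK types only. This is a minimal sketch, not the Apache API: ThirdPartyApiService is a hypothetical name, and the type parameter C stands in for CloseableHttpClient so the example does not depend on the library. It shows the client held in an instance field (not a static one) and closed when the service itself is closed.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.Objects;

// Hypothetical service wrapper; C stands in for CloseableHttpClient.
final class ThirdPartyApiService<C extends Closeable> implements Closeable {
    // Instance field rather than static: the client lives exactly as long as the service.
    private final C client;

    ThirdPartyApiService(C client) {
        this.client = Objects.requireNonNull(client, "client");
    }

    // Request methods such as sendPost would use this shared client.
    C client() {
        return client;
    }

    // Releasing the service releases the client it owns.
    @Override
    public void close() throws IOException {
        client.close();
    }
}
```

The application would create one such service at startup and call close() at shutdown, for example from its container's lifecycle hook.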

Spring Boot: efficiently get data from REST API

I have a Spring Boot application that (among other things) gets some data from a third party JSON API (secured with OAuth), processes the result and presents it to the user. The application receives approx. 1 request each second.
Unfortunately, this process is very slow at the moment (and in many cases even ends with a 503 error), and I am looking for ideas to improve the implementation. (By the way, the third-party API itself does not seem to be the bottleneck: an instance of my app running on my local machine and using the exact same API responds very fast at the same time that the deployed instance takes very long.)
For the API call I use the Apache HTTP library - or more specifically the Async HTTP Client:
this.httpClientAsync = HttpAsyncClients.custom()
.setDefaultCredentialsProvider(credsProvider) //for forward proxy
.build();
And the actual call to the API is this:
updateToken(); //get or update OAuth Token
HttpGet httpget = new HttpGet(URL);
httpget.addHeader("Authorization", "Bearer " + accessToken);
Future<HttpResponse> f = this.httpClientAsync.execute(httpget, callback);
Do you have any suggestion on how to improve the implementation?
To be honest, I don't even have an idea where the bottleneck is at the moment. Any idea on how to find out about that?
Thanks for your hints!
One more thing/update:
the Spring Controller looks something like this:
@RequestMapping(value = "/api/v1/api_data")
public DeferredResult<ResponseEntity<Map>> getAPIData() throws IOException, InterruptedException {
    DeferredResult<ResponseEntity<Map>> res = new DeferredResult<>();
    triggerAPICall(new FutureCallback() {
        public void completed(Object o) {
            (...)
            res.setResult(...);
        }
        (...)
    });
    return res;
}
Furthermore, I was originally not using the async version of the HTTP client, but the blocking version. This then even slowed down the rest of the application...

Put request parameters not getting set

This may be standard stuff, but I am unable to get it working.
I'm using org.apache.commons.httpclient.methods for making HTTP requests from my Java code. In one instance I have to make a PUT request and pass some parameters. I'm doing it the following way:
PutMethod putMethod = new PutMethod(url);
putMethod.getParams().setParameter("param1", "param1Value");
putMethod.getParams().setParameter("param2", "param2Value");
httpClient.executeMethod(putMethod);
But at the server, when it tries to read these parameters - it can only get null.
However, when I modify my url to url?param1=param1Value&param2=param2Value, it works.
How do I get it working using setParameter method?
To add Query Params to PutMethod, follow this method.
NameValuePair[] putParameters = new NameValuePair[2];
putParameters[0] = new NameValuePair(Param1, value1);
putParameters[1] = new NameValuePair(Param2, value2);
HttpClient client = new HttpClient();
PutMethod putMethod = new PutMethod(url);
putMethod.setQueryString(putParameters);
Then call:
int response = client.executeMethod(putMethod);
Instead of putMethod.setQueryString(putParameters); you could also use
putMethod.setRequestBody(EncodingUtil.formUrlEncode(putParameters, "UTF-8"));
(Note that setRequestBody is deprecated.)
GetMethod and PostMethod differ slightly from the above code when adding query params.
For More Code Examples : http://www.massapi.com/class/pu/PutMethod.html
Hope this helps.
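For reference, the query string that the question reports as working (url?param1=...&param2=...) can also be built by hand with the JDK. This is a minimal sketch under assumptions: QueryStrings and withQuery are hypothetical names, and the URL and values are placeholders. As far as I know, getParams() on a method object holds protocol settings rather than request parameters, which would be consistent with the server reading null.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper: appends form-encoded name=value pairs to a URL.
final class QueryStrings {
    private QueryStrings() {}

    static String withQuery(String baseUrl, Map<String, String> params) {
        if (params.isEmpty()) {
            return baseUrl;
        }
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            // Encode both name and value so characters like '&' and ' ' stay inside the value.
            query.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return baseUrl + "?" + query;
    }
}
```

The resulting string could be passed to new PutMethod(...) directly. Note that URLEncoder produces form encoding, so a space becomes '+'.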
Your server-side code has to support the PUT method.
For example, if it is a servlet you can override the method
doPut(); // your PUT request will be delivered to this method
If you use a REST-based framework such as Jersey,
you can use
@PUT
Response yourPutMethod(){..}

java: apache HttpClient > how to disable retry

I'm using Apache Httpclient for Ajax-calls on a website. In some cases requests to external webservice fail, often with:
I/O exception (java.net.ConnectException) caught when processing request: Connection timed out: connect.
In that case, more often than not, I want to skip retrying the request (something that Httpclient seems to do automatically) .
However, I can't find any method, param, etc. to skip retrying.
anyone?
Thanks Geert-Jan
From HttpClient 4.3 onwards, use HttpClientBuilder:
HttpClientBuilder.create().disableAutomaticRetries().build();
client.setHttpRequestRetryHandler(new DefaultHttpRequestRetryHandler(0, false));
That would do it.
OK. There is an issue in the documentation, and the API and its methods have also changed.
So if you want to use DefaultHttpRequestRetryHandler, here are the ways to do that:
DefaultHttpClient httpClient = new DefaultHttpClient();
DefaultHttpRequestRetryHandler retryHandler = new DefaultHttpRequestRetryHandler(0, false);
httpClient.setHttpRequestRetryHandler(retryHandler);
or
HttpClient httpClient = new DefaultHttpClient();
DefaultHttpRequestRetryHandler retryHandler = new DefaultHttpRequestRetryHandler(0, false);
((AbstractHttpClient)httpClient).setHttpRequestRetryHandler(retryHandler);
In the first one, we use the concrete DefaultHttpClient (which is a subclass of AbstractHttpClient and so has the setHttpRequestRetryHandler() method).
In the second one, we are programming to the HttpClient interface (which, oddly, doesn't expose that method), so we have to do that nasty cast.
There's a description in the HttpClient tutorial.
client.getParams().setParameter(HttpMethodParams.RETRY_HANDLER,
new DefaultHttpMethodRetryHandler(0, false)); // 0 retries; the no-argument constructor retries up to 3 times
See the tutorial for more information; for instance, retrying may be harmful if the request has side effects (i.e. is not idempotent).
The cast to AbstractHttpClient is not necessary. Another way is to use AutoRetryHttpClient with a DefaultServiceUnavailableRetryStrategy whose retry parameter is set to 0. A better way would be to extend AbstractHttpClient or implement HttpClient to expose the desired method.
