Efficiently making multiple GET requests to the same url in Java

I need to make multiple GET requests to the same URL but with different queries. I will be doing this on a mobile device (Android), so I need to optimise as much as possible. I learnt from watching an Android web seminar by Google that it takes ~200ms to connect to a server, and there are also various other delays involved in making data calls. I'm just wondering if there's a way I can optimise the process of making multiple requests to the same URL to avoid some of these delays?
I have been using the method below so far, but I have been calling it 6 times, once for each GET request.
// Make a GET request to url with the given headers.
// Returns the contents of the retrieved resource.
public String getRequest(String url, String query, Map<String, List<String>> headers) throws IOException {
    String getUrl = url + "?" + query;
    URLConnection connection = new URL(getUrl).openConnection();
    for (Map.Entry<String, List<String>> h : headers.entrySet()) {
        for (String s : h.getValue()) {
            connection.addRequestProperty(h.getKey(), s);
        }
    }
    BufferedInputStream bis = new BufferedInputStream(connection.getInputStream());
    try {
        StringBuilder builder = new StringBuilder();
        int byteRead;
        while ((byteRead = bis.read()) != -1) {
            builder.append((char) byteRead);
        }
        return builder.toString();
    } finally {
        // Close the stream even if reading fails
        bis.close();
    }
}

If you expect a different result for every request, and you cannot combine requests by adding more than one GET variable to the same request, then you cannot avoid the 6 calls.
However, you can use multiple threads to run your requests simultaneously. You may use a thread pool approach using the native ExecutorService in Java. I would suggest using an ExecutorCompletionService to run your requests. As the processing time is network-bound rather than CPU-bound, you may use more threads than you have CPUs.
For instance, in some of my projects I use 10+, sometimes 50+ threads (in a thread pool) to retrieve URL data simultaneously, even though I only have 4 CPU cores.
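A minimal sketch of the thread pool approach described above, assuming Java 8 or later. The class name ParallelGet and the fetcher parameter are made up for illustration; in the question, the fetcher would wrap the getRequest(url, query, headers) method.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

final class ParallelGet {
    // Runs one fetch per query on a fixed pool and returns the responses
    // in completion order (fastest first), not in submission order.
    static List<String> fetchAll(List<String> queries, Function<String, String> fetcher, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CompletionService<String> completion = new ExecutorCompletionService<>(pool);
        try {
            for (String query : queries) {
                completion.submit(() -> fetcher.apply(query));
            }
            List<String> results = new ArrayList<>();
            for (int i = 0; i < queries.size(); i++) {
                // take() blocks until the next request has finished
                results.add(completion.take().get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for responses", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("A request failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Stand-in fetcher (assumption); a real one would perform the HTTP GET.
        Function<String, String> fakeFetcher = query -> "response for " + query;
        List<String> results = fetchAll(Arrays.asList("a=1", "b=2", "c=3"), fakeFetcher, 3);
        System.out.println(results.size() + " responses received");
    }
}
```

Because results come back in completion order, each response should carry enough information to tell which query it belongs to if that matters to the caller.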

Related

Two Threads Executing Same Method

I am developing an API request and I'm using multithreading. In the output I'm getting the same request generated twice, by two threads. When I debugged, I saw that both threads are calling the same method. I need help resolving this issue.
This is my pseudo code:
public void run() {
    logger.debug("Thread " + currentThread().getName() + " Running");
    String message = "";
    Connection connection = null;
    InputStream fileinput = null;
    Properties properties = new Properties();
    try {
        File file = new File("/home/sridhar.anirudh/eclipse-workspace/API/Change.properties");
        fileinput = new FileInputStream(file);
        properties.load(fileinput);
        soapEndpointUrl = properties.getProperty("endpoint_url");
        soapAction = properties.getProperty("soap_action");
    } catch (Exception e) {
        e.printStackTrace();
    }
    try {
        connection = Database.getInstance().getConnection();
    } catch (SQLException e1) {
        logger.error("Failed To Get Connection " + e1.getMessage());
        return;
    }
    if (CATEGORY.equalsIgnoreCase("fraudrestriction")) {
        String soapResponse = callSoapWebServiceFraudRestriction(soapEndpointUrl, soapAction);
        String response_status = "";
        if (soapResponse.contains("<tns:Description>SUCCESS</tns:Description>") &&
                soapResponse.contains("<tns:Code>ERR_000</tns:Code>")) {
            response_status = "SUCCESS";

If you kick off two copies of the thread, they will both run, creating the effect you see.
You can create multiple worker threads, but you need to allocate the work between those workers such that each performs a subset of the total workload.
Since you're (seemingly) parsing and processing a file, and making a network service request in response to that file's contents, it's not clear how you intend to divide up the work. That's the key; to use multiple threads to improve throughput, you the programmer must devise a means of partitioning the work between those threads.
As an analogy, if you have one (human) worker working on a job, simply hiring a second worker won't get the job completed any faster unless the work is divided between those workers. That division is your problem. There's nothing magical about threads that can do this for you.
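One possible way to divide the work, sketched under assumptions: all pending requests go into a shared queue, and each worker takes the next item from it, so every request is handled by exactly one thread. The class name PartitionedWorkers and the string "requests" are hypothetical placeholders for whatever unit of work the real program has.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

final class PartitionedWorkers {
    // Processes every request exactly once, spread across workerCount threads.
    static List<String> processAll(List<String> requests, int workerCount) {
        Queue<String> pending = new ConcurrentLinkedQueue<>(requests);
        Queue<String> handled = new ConcurrentLinkedQueue<>();
        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < workerCount; i++) {
            Thread worker = new Thread(() -> {
                String request;
                // poll() hands each request to exactly one worker
                while ((request = pending.poll()) != null) {
                    // Placeholder for the real work (e.g. the SOAP call)
                    handled.add(Thread.currentThread().getName() + " sent " + request);
                }
            }, "worker-" + i);
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return new ArrayList<>(handled);
    }

    public static void main(String[] args) {
        List<String> log = processAll(Arrays.asList("req-1", "req-2", "req-3", "req-4"), 2);
        for (String line : log) {
            System.out.println(line);
        }
    }
}
```

With this layout, starting a second worker no longer duplicates requests, because the queue decides which worker gets which item.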

Inconsistent output from multithreaded FTP InputStreams

I'm trying to create a java program that downloads certain asset files from an FTP server to a local file. Because my (free) FTP server doesn't support file sizes over a few megabytes, I decided to split up the files when they are uploaded and recombine them when the program downloads them. This works, but it is rather slow, because for each file, it has to get the InputStream, which takes some time.
The FTP server I use has a way to download the files without actually logging into the server, so I'm using this code to get the InputStream:
private static final InputStream getInputStream(String file) throws IOException {
    return new URL("http://site.website.com/path/" + file).openStream();
}
To get the InputStream of a part of the asset file I'm using this code:
public static InputStream getAssetInputStream(String asset, int num) throws IOException, FTPException {
    try {
        return getInputStream("assets/" + asset + "_" + num + ".raf");
    } catch (Exception e) {
        // error handling
    }
}
Because the getAssetInputStream(String, int) method takes some time to run (especially if the file size is more than a megabyte), I decided to make the code that actually downloads the file multi-threaded. Here is where my problem lies.
final Map<Integer, Boolean> done = new HashMap<Integer, Boolean>();
final Map<Integer, byte[]> parts = new HashMap<Integer, byte[]>();
for (int i = 0; i < numParts; i++) {
    final int part = i;
    done.put(part, false);
    new Thread(new Runnable() {
        @Override
        public void run() {
            try {
                InputStream is = FTP.getAssetInputStream(asset, part);
                ByteArrayOutputStream baos = new ByteArrayOutputStream();
                byte[] buf = new byte[DOWNLOAD_BUFFER_SIZE];
                int len = 0;
                while ((len = is.read(buf)) > 0) {
                    baos.write(buf, 0, len);
                    curDownload.addAndGet(len);
                    totAssets.addAndGet(len);
                }
                parts.put(part, baos.toByteArray());
                done.put(part, true);
            } catch (IOException e) {
                // error handling
            } catch (FTPException e) {
                // error handling
            }
        }
    }, "Download-" + asset + "-" + i).start();
}
while (done.values().contains(false)) {
    try {
        Thread.sleep(100);
    } catch (InterruptedException e) {
        e.printStackTrace();
    }
}
File assetFile = new File(dir, "assets/" + asset + ".raf");
assetFile.createNewFile();
FileOutputStream fos = new FileOutputStream(assetFile);
for (int i = 0; i < numParts; i++) {
    fos.write(parts.get(i));
}
fos.close();
This code works, but not always. When I run it on my desktop computer, it works almost always. Not 100% of the time, but often it works just fine. On my laptop, which has a far worse internet connection, it almost never works. The result is a file that is incomplete. Sometimes, it downloads 50% of the file. Sometimes, it downloads 90% of the file, it differs every time.
Now, if I replace the .start() by .run(), the code works just fine, 100% of the time, even on my laptop. It is, however, incredibly slow, so I'd rather not use .run().
Is there a way I could change my code so it does work multi-threaded? Any help will be appreciated.

Firstly, get your FTP server replaced, there are plenty of free FTP servers that support arbitrary file size serving with additional features, but I digress...
Your code seems to have many unrelated problems that could potentially all cause the behavior you are seeing, addressed below:
You have race conditions because the done and parts maps are accessed from multiple threads without any synchronization. This could cause data corruption and loss of synchronization for these variables between threads, potentially causing done.values().contains(false) to return a result that does not reflect the real state.
You are calling done.values().contains() repeatedly at a high frequency. Whilst the javadoc doesn't state it explicitly, a hash map likely traverses every value in an O(n) fashion to check whether the map contains a given value. Coupled with the fact that other threads are modifying the map, you'll get undefined behavior. According to the values() javadoc:
If the map is modified while an iteration over the collection is in progress (except through the iterator's own remove operation), the results of the iteration are undefined.
You are somehow calling new URL("http://site.website.com/path/" + file).openStream(); but stating you are using FTP. The http:// in the link defines the protocol openStream() tries to open in and http:// is not ftp://. Not sure if this is a typo or did you mean HTTP (or do you have an HTTP server serving identical files).
Any thread raising any type of Exception will cause the code to fail, given that not all parts will have "completed" (based on your busy-wait loop design). Granted, you may have redacted some other logic that guards against this, but otherwise this is a potential problem with the code.
You aren't closing any streams that you've opened. This could mean that the underlying socket itself is also left open. Not only does this constitute resource leakage, if the server itself has some sort of maximum number of simultaneous connection limit, you are only causing new connections to fail because the old, completed transfers are not closed.
Based on the issues above, I propose moving the download logic into a Callable task and running them through an ExecutorService as follows:
LinkedList<Callable<byte[]>> tasksToExecute = new LinkedList<>();
// Populate tasks to run
for (int i = 0; i < numParts; i++) {
    final int part = i;
    // Lambda that downloads one part and returns its bytes (null on failure)
    tasksToExecute.add(() -> {
        InputStream is = null;
        try {
            is = FTP.getAssetInputStream(asset, part);
            ByteArrayOutputStream baos = new ByteArrayOutputStream();
            byte[] buf = new byte[DOWNLOAD_BUFFER_SIZE];
            int len = 0;
            while ((len = is.read(buf)) > 0) {
                baos.write(buf, 0, len);
                curDownload.addAndGet(len);
                totAssets.addAndGet(len);
            }
            return baos.toByteArray();
        } catch (IOException e) {
            // handle exception
        } catch (FTPException e) {
            // handle exception
        } finally {
            if (is != null) {
                try {
                    is.close();
                } catch (IOException ignored) {}
            }
        }
        return null;
    });
}
// Retrieve an ExecutorService instance; note that the work stealing pool requires Java 8 or later
// This can be substituted for newFixedThreadPool(nThreads) for Java < 8 as well as for tight control over the number of simultaneous links
ExecutorService executor = Executors.newWorkStealingPool(4);
// Tells the executor to execute all the tasks and give us the results
List<Future<byte[]>> resultFutures = executor.invokeAll(tasksToExecute);
// All tasks are done at this point, so the pool can be released
executor.shutdown();
// Populates the file
File assetFile = new File(dir, "assets/" + asset + ".raf");
assetFile.createNewFile();
try (FileOutputStream fos = new FileOutputStream(assetFile)) {
    // Iterate through the futures, writing them to file in order
    for (Future<byte[]> result : resultFutures) {
        byte[] partData = result.get();
        if (partData == null) {
            // exception occurred during downloading this part, handle appropriately
        } else {
            fos.write(partData);
        }
    }
} catch (IOException ex) {
    // handle exception
} catch (InterruptedException | ExecutionException ex) {
    // result.get() can throw these; handle exception
}
Using the executor service, you further optimize your multi-threading scenario, since threads are reused to save on thread creation costs. Note that invokeAll() returns only once every task has completed, so the file is written after all parts have been downloaded; if you want writing to start as soon as the first parts (in order) are available, submit the tasks individually with submit() and call get() on the futures in order.
As mentioned, there could be the case where too many simultaneous connections cause the server to reject connections (or, even more dangerously, write an EOF to make you think the part was downloaded). In this case, the number of worker threads can be limited with newFixedThreadPool(nThreads) to ensure that at most nThreads downloads happen concurrently at any given time.
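A rough sketch of the fixed-pool variant, assuming Java 8 or later. Tasks are submitted one by one and the futures are read back in part order, so part 0 can be written while later parts are still downloading. OrderedPartWriter and the downloadPart function are hypothetical names; downloadPart stands in for the per-part download logic.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.IntFunction;

final class OrderedPartWriter {
    // Downloads numParts parts with at most nThreads running at once
    // and writes them to out in part order.
    static void downloadAndWrite(int numParts, IntFunction<byte[]> downloadPart,
                                 OutputStream out, int nThreads)
            throws IOException, InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(nThreads);
        try {
            List<Future<byte[]>> futures = new ArrayList<>();
            for (int i = 0; i < numParts; i++) {
                final int part = i;
                Callable<byte[]> task = () -> downloadPart.apply(part);
                futures.add(pool.submit(task));
            }
            for (Future<byte[]> future : futures) {
                // get() blocks only until this particular part is ready
                out.write(future.get());
            }
        } finally {
            // Cancels anything still running if an earlier part failed
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        // Stand-in download (assumption): each part is just a short label.
        downloadAndWrite(3, part -> ("part" + part + ";").getBytes(StandardCharsets.UTF_8), sink, 2);
        System.out.println(sink.toString("UTF-8"));
    }
}
```

A failed part surfaces as an ExecutionException from get() instead of a null array, which makes an incomplete download harder to miss.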

Multithreaded http/https Man in the middle Proxy, Socket Performance

Question edited following first comment.
My problem is mostly with java socket performance, and especially reading from the target server.
The server is a simple serversocket.accept() loop that creates a client thread for every connection from Firefox.
The main problem is that reading from the socket input stream blocks for enormous amounts of time.
Client thread is as follows :
// Takes an HttpRequest (hc.apache.org), the raw string HTTP request, and the Firefox socket output stream
private void handle(HttpRequest req, String raw, OutputStream out)
{
    InputStream targetIn = null;
    OutputStream targetOut = null;
    Socket target = null;
    try {
        System.out.println("HANDLE HTTP");
        String host = req.getHeaders("Host")[0].getValue();
        URI uri = new URI(req.getRequestLine().getUri());
        int port = uri.getPort() != -1 ? uri.getPort() : 80;
        target = new Socket(host, port);
        //**I have tried to play around with these but cannot seem to get a difference in performance**
        target.setTcpNoDelay(true);
        // target.setReceiveBufferSize(1024 *1024);
        // target.setSendBufferSize(1024 * 1024);
        //Get your plain old in/out streams
        targetIn = target.getInputStream();
        targetOut = target.getOutputStream();
        //Send the request to the target
        System.out.println("---------------Start response---------------");
        targetOut.write(raw.getBytes());
        System.out.println("request sent to target");
        ////Same as membrane
        byte[] buffer = new byte[8 * 1024];
        int length = 0;
        try {
            while ((length = targetIn.read(buffer)) > 0) {
                out.write(buffer, 0, length);
                out.flush();
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
        System.out.println("closing out + target socket");
        //IOUTILS
        // long count = IOUtils.copyLarge(targetIn, out, 0L, 1048576L);
        // int count = IOUtils.copy(targetIn, out);
        // System.out.println("transfered : " + count );
        //CHANNEL COPY
        //
        // ReadableByteChannel input = Channels.newChannel(targetIn);
        // WritableByteChannel output = Channels.newChannel(out);
        //
        // ChannelTools.fastChannelCopy(input, output);
        //
        // input.close();
        // output.close();
        //CHAR TO CHAR COPY
        // int c;
        // while ((c = targetIn.read()) != -1) {
        //     out.write(c);
        // }
        target.close();
        out.close();
        System.out.println("-------------------- end response ------------------------------");
    }
    catch (Exception e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }
}
The main problem lies in finding the appropriate method to copy the target input stream to the client (Firefox) output stream.
The site I am using to test this is http://www.ouest-france.fr (a news site with a load of images that makes loads of requests).
Ping time from workstation to target : 10ms
Normal Loading in iceweasel (debian firefox, firebug time) : 14 secs, 2.5MB
Loading behind this proxy : 14 minutes (the Firebug net panel is full of fake 404s and aborted requests that go back to black after a certain time; loads of requests are in blocking or waiting mode)
When running it, I load up VisualVM and launch profiling with no class filter (to see where the app is really spending its time). It spends 99% of its time in java.net.SocketInputStream.read(byte[], int, int), which is reading from the target socket input stream.
I think I have done my homework and have been searching for and testing different solutions just about anywhere I could,
but performance never seems to improve.
What I have already tried:
-Putting input and output streams into their buffered version, no change at all
-int to int copy, no change at all,
-classic byte[] array copy with variable sized arrays, no change at all
-fiddling around with setTcpNoDelay, setSendBufferSize, setReceiveBufferSize, could not get any change.
I was thinking of trying out NIO SocketChannels, but cannot find a way to do the Socket to SSLSocket hijacking.
So at the moment I am a bit stuck and searching for solutions.
I have looked at the source code of open source proxies and cannot seem to find a fundamental difference in logic, so I am completely lost with this.
I tried another test:
export http_proxy="localhost:4242"
wget debiandvd.iso
Throughput gets to 2MB/s.
And threads seem to spend 66% of their time reading from the target and 33% writing to the client.
I was thinking that maybe I have too many threads running, but a test on www.google.com has far fewer requests going through and still shows the same problems as www.ouest-france.fr.
With the Debian ISO test I was thinking I had too many threads running (ouest-france is around 270 requests), but the Google test (10 requests) seems to confirm that the number of threads is not the problem.
Any help will be appreciated.
Environment is debian, sun java 1.6, dev with eclipse and visualvm
I can provide the rest of the code as needed.
Thank you

Partial solution found :
Not a very clean solution but works.
I still have a throughput problem.
What I do is set the socket timer to a normal timeout (30000ms).
When the first read has come in the loop I reset the timer to something a lot lower (1000ms at the moment).
That allows me to wait for the server to start sending data, and if I get 1 second without any new data coming I consider the transfer to be finished.
Response times are still quite slow but way better.
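A sketch of the two-stage timeout described above, under assumptions: the class name IdleTimeoutCopy and the demo() helper are made up, and demo() uses a loopback server that sends a few bytes and then keeps the connection open, which is roughly what a keep-alive server does. It only illustrates the workaround; it does not parse Content-Length or chunked encoding, which would be needed to find the real end of an HTTP response.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;
import java.nio.charset.StandardCharsets;

final class IdleTimeoutCopy {
    // Copies from target to out. Waits up to firstByteTimeoutMs for the response
    // to start, then treats idleTimeoutMs without data as the end of the transfer.
    static long copyUntilIdle(Socket target, OutputStream out,
                              int firstByteTimeoutMs, int idleTimeoutMs) throws IOException {
        InputStream in = target.getInputStream();
        byte[] buffer = new byte[8 * 1024];
        long total = 0;
        target.setSoTimeout(firstByteTimeoutMs);
        try {
            int length;
            while ((length = in.read(buffer)) != -1) {
                out.write(buffer, 0, length);
                total += length;
                // Data has started flowing: switch to the shorter idle timeout
                target.setSoTimeout(idleTimeoutMs);
            }
        } catch (SocketTimeoutException e) {
            // No data within the current timeout: assume the transfer is finished
        }
        out.flush();
        return total;
    }

    // Hypothetical demo: the peer sends "hello" and then stays connected for 2 seconds.
    static String demo() throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback)) {
            Thread sender = new Thread(() -> {
                try (Socket peer = server.accept()) {
                    peer.getOutputStream().write("hello".getBytes(StandardCharsets.US_ASCII));
                    peer.getOutputStream().flush();
                    Thread.sleep(2000);
                } catch (IOException | InterruptedException ignored) {
                    // demo only
                }
            });
            sender.setDaemon(true);
            sender.start();
            try (Socket target = new Socket(loopback, server.getLocalPort())) {
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                copyUntilIdle(target, out, 5000, 200);
                return new String(out.toByteArray(), StandardCharsets.US_ASCII);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(demo());
    }
}
```

The trade-off is that every response costs at least one idle timeout before the connection is released, which fits the observation that response times are better but still slow.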

Best practice for reading / writing to a java server socket

How do you design a read and write loop which operates on a single socket (which supports parallel read and write operations)? Do I have to use multiple threads? Is my (java) solution any good? What about that sleep command? How do you use that within such a loop?
I'm trying to use 2 Threads:
Read
public void run() {
    InputStream clientInput;
    ByteArrayOutputStream byteBuffer;
    BufferedInputStream bufferedInputStream;
    byte[] data;
    String dataString;
    int lastByte;
    try {
        clientInput = clientSocket.getInputStream();
        byteBuffer = new ByteArrayOutputStream();
        bufferedInputStream = new BufferedInputStream(clientInput);
        while (isRunning) {
            while ((lastByte = bufferedInputStream.read()) > 0) {
                byteBuffer.write(lastByte);
            }
            data = byteBuffer.toByteArray();
            dataString = new String(data);
            byteBuffer.reset();
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
}
Write
public void run() {
    OutputStream clientOutput;
    byte[] data;
    String dataString;
    try {
        clientOutput = clientSocket.getOutputStream();
        while (isOpen) {
            if (!commandQueue.isEmpty()) {
                dataString = commandQueue.poll();
                data = dataString.getBytes();
                clientOutput.write(data);
            }
            Thread.sleep(1000);
        }
        clientOutput.close();
    }
    catch (IOException e) {
        e.printStackTrace();
    }
    catch (InterruptedException e) {
        e.printStackTrace();
    }
}
Read fails to deliver a proper result, since there is no -1 sent.
How do I solve this issue?
Is this sleep / write loop a good solution?

There are basically three ways to do network I/O:
Blocking. In this mode reads and writes will block until they can be fulfilled, so if you want to do both simultaneously you need separate threads for each.
Non-blocking. In this mode reads and writes will return zero (Java) or in some languages (C) a status indication (return == -1, errno=EAGAIN/EWOULDBLOCK) when they cannot be fulfilled, so you don't need separate threads, but you do need a third API that tells you when the operations can be fulfilled. This is the purpose of the select() API.
Asynchronous I/O, in which you schedule the transfer and are given back some kind of a handle via which you can interrogate the status of the transfer, or, in more advanced APIs, a callback.
You should certainly never use the while (in.available() > 0)/sleep() style you are using here. InputStream.available() has few correct uses and this isn't one of them, and the sleep is literally a waste of time. The data can arrive within the sleep time, and a normal read() would wake up immediately.
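For the write side in blocking mode, one option is to let the writer thread block on a queue instead of sleeping and polling. This is a sketch, not the asker's code: QueueWriter and the STOP sentinel are made-up names, and any OutputStream (for example the one from clientSocket.getOutputStream()) can be passed in.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

final class QueueWriter implements Runnable {
    // Hypothetical sentinel that tells the writer to stop.
    static final String STOP = "__STOP__";

    private final BlockingQueue<String> commandQueue;
    private final OutputStream clientOutput;

    QueueWriter(BlockingQueue<String> commandQueue, OutputStream clientOutput) {
        this.commandQueue = commandQueue;
        this.clientOutput = clientOutput;
    }

    @Override
    public void run() {
        try {
            while (true) {
                // take() blocks until a command is queued, so no sleep is needed
                String command = commandQueue.take();
                if (STOP.equals(command)) {
                    break;
                }
                clientOutput.write(command.getBytes(StandardCharsets.UTF_8));
                clientOutput.flush();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        Thread writer = new Thread(new QueueWriter(queue, sink));
        writer.start();
        queue.add("PING\n");
        queue.add(STOP);
        writer.join();
        System.out.print(new String(sink.toByteArray(), StandardCharsets.UTF_8));
    }
}
```

A command is written as soon as it is queued, instead of up to a second later as with the sleep-based loop.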

You should use a boolean variable instead of while(true) so that you can shut the thread down cleanly when you want to. Also, yes, you should create multiple threads, one per connected client, as the thread will block until new data is received (with DataInputStream.read(), for example). And no, this is not really a design question: each library, framework or language has its own way of listening on a socket. For example, to listen on a socket in Qt you should use what are called "signals and slots", not an infinite loop.
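A sketch of a per-client reader thread controlled by a boolean flag. It assumes, purely for illustration, that messages are newline-terminated, which gives the reader a message boundary without waiting for -1; LineReaderTask and the onMessage callback are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.function.Consumer;

final class LineReaderTask implements Runnable {
    // volatile so that a stop() call from another thread is seen by the reader
    private volatile boolean running = true;
    private final InputStream clientInput;
    private final Consumer<String> onMessage;

    LineReaderTask(InputStream clientInput, Consumer<String> onMessage) {
        this.clientInput = clientInput;
        this.onMessage = onMessage;
    }

    void stop() {
        running = false;
    }

    @Override
    public void run() {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(clientInput, StandardCharsets.UTF_8))) {
            String line;
            // readLine() blocks until a full line arrives and returns null at end of stream
            while (running && (line = reader.readLine()) != null) {
                onMessage.accept(line);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    public static void main(String[] args) {
        // Stand-in for clientSocket.getInputStream() (assumption)
        InputStream fake = new ByteArrayInputStream("hello\nworld\n".getBytes(StandardCharsets.UTF_8));
        new LineReaderTask(fake, System.out::println).run();
    }
}
```

The flag is only checked between messages; to interrupt a read that is already blocked, close the socket from another thread.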

How to create a Java non-blocking InputStream from a HttpsURLConnection?

Basically, I have a URL that streams xml updates from a chat room when new messages are posted. I'd like to turn that URL into an InputStream and continue reading from it as long as the connection is maintained and as long as I haven't sent a Thread.interrupt(). The problem I'm experiencing is that BufferedReader.ready() doesn't seem to become true when there is content to be read from the stream.
I'm using the following code:
BufferedReader buf = new BufferedReader(new InputStreamReader(ins));
String str = "";
while (Thread.interrupted() != true)
{
    connected = true;
    debug("Listening...");
    if (buf.ready())
    {
        debug("Something to be read.");
        if ((str = buf.readLine()) != null) {
            // str is one line of text; readLine() strips the newline character(s)
            urlContents += String.format("%s%n", str);
            urlContents = filter(urlContents);
        }
    }
    // Give the system a chance to buffer or interrupt.
    try { Thread.sleep(1000); } catch (Exception ee) { debug("Caught thread exception."); }
}
When I run the code, and post something to the chat room, buf.ready() never becomes true, resulting in the lines never being read. However, if I skip the "buf.ready()" part and just read lines directly, it blocks further action until lines are read.
How do I either a) get buf.ready() to return true, or b) do this in such a way as to prevent blocking?
Thanks in advance,
James

How to create a Java non-blocking InputStream
You can't. Your question embodies a contradiction in terms. Streams in Java are blocking. There is therefore no such thing as a 'non-blocking InputStream'.
Reader.ready() returns true when data can be read without blocking. Period. InputStreams and Readers are blocking. Period. Everything here is working as designed. If you want more concurrency with these APIs you will have to use multiple threads. Or Socket.setSoTimeout() and its near relation in HttpURLConnection.

For non-blocking IO, don't use InputStream and Reader (or OutputStream/Writer); use the java.nio.* classes instead, in this case a SocketChannel (and additionally a CharsetDecoder).
Edit: as an answer to your comment:
Specifically looking for how to create a socket channel to an https url.
Sockets (and also SocketChannels) work on the transport layer (TCP), one (or two) level(s) below application layer protocols like HTTP. So you can't create a socket channel to an https url.
You would instead have to open a SocketChannel to the right server and the right port (443 if nothing else is given in the URI), create an SSLEngine (in javax.net.ssl) in client mode, then read data from the channel, feeding it to the SSL engine and the other way around, and send/get the right HTTP protocol lines to/from your SSLEngine, always checking the return values to know how many bytes were in fact processed and what the next step to take would be.
This is quite complicated (I did it once), and you don't really want to do this if you are not implementing a server with lots of clients connected at the same time (where you can't have a single thread for each connection). Instead, stay with your blocking InputStream which reads from your URLConnection, and put it simply in a spare thread which does not hinder the rest of your application.
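The "spare thread" suggestion could look something like the sketch below. StreamListener and listen() are made-up names, and ins stands for the blocking InputStream obtained from the URLConnection. The reader thread blocks on readLine() while the rest of the application picks up lines from a queue whenever it wants, with a timeout.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

final class StreamListener {
    // Starts a daemon thread that reads lines from ins and queues them.
    static BlockingQueue<String> listen(InputStream ins) {
        BlockingQueue<String> lines = new LinkedBlockingQueue<>();
        Thread reader = new Thread(() -> {
            try (BufferedReader buf = new BufferedReader(
                    new InputStreamReader(ins, StandardCharsets.UTF_8))) {
                String line;
                // Blocking here is fine: only this spare thread waits
                while ((line = buf.readLine()) != null) {
                    lines.put(line);
                }
            } catch (IOException | InterruptedException e) {
                // Stream closed or thread interrupted: stop listening
            }
        }, "stream-listener");
        reader.setDaemon(true);
        reader.start();
        return lines;
    }

    public static void main(String[] args) throws InterruptedException {
        // Stand-in for the chat room stream (assumption)
        InputStream fake = new ByteArrayInputStream("first\nsecond\n".getBytes(StandardCharsets.UTF_8));
        BlockingQueue<String> lines = listen(fake);
        String line;
        // poll() returns null if nothing arrived within the timeout
        while ((line = lines.poll(500, TimeUnit.MILLISECONDS)) != null) {
            System.out.println("Got: " + line);
        }
    }
}
```

The caller never blocks for longer than the poll timeout, and no ready() check or sleep loop is needed.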

You can use the Java NIO library which provides non-blocking I/O capabilities. Take a look at this article for details and sample code: http://www.drdobbs.com/java/184406242.

There is no HTTP/HTTPS implementation using Channels. There is no way to read the InputStream from an HttpURLConnection in a non-blocking way. You either have to use a third-party library or implement HTTP over SocketChannel yourself.

import java.io.InputStream;
import java.util.Arrays;

/**
 * This code demonstrates non blocking read from standard input using separate
 * thread for reading.
 */
public class NonBlockingRead {
    // Holder for temporary store of read(InputStream is) value
    private static String threadValue = "";

    public static void main(String[] args) throws InterruptedException {
        NonBlockingRead test = new NonBlockingRead();
        while (true) {
            String tmp = test.read(System.in, 100);
            if (tmp.length() > 0)
                System.out.println(tmp);
            Thread.sleep(1000);
        }
    }

    /**
     * Non blocking read from input stream using controlled thread
     *
     * @param is
     *            InputStream to read
     * @param timeout
     *            timeout in milliseconds, should not be less than 10
     * @return the text read within the timeout, or an empty string
     */
    String read(final InputStream is, int timeout) {
        // Start reading bytes from stream in separate thread
        Thread thread = new Thread() {
            public void run() {
                byte[] buffer = new byte[1024]; // read buffer
                byte[] readBytes = new byte[0]; // holder of actually read bytes
                try {
                    Thread.sleep(5);
                    // Read available bytes from stream
                    int size = is.read(buffer);
                    if (size > 0)
                        readBytes = Arrays.copyOf(buffer, size);
                    // and save read value in static variable
                    setValue(new String(readBytes, "UTF-8"));
                } catch (Exception e) {
                    System.err.println("Error reading input stream");
                    e.printStackTrace();
                }
            }
        };
        thread.start(); // Start thread
        try {
            thread.join(timeout); // and join it with specified timeout
        } catch (InterruptedException e) {
            System.err.println("Interrupted while waiting for data");
        }
        return getValue();
    }

    private synchronized void setValue(String value) {
        threadValue = value;
    }

    private synchronized String getValue() {
        String tmp = threadValue;
        setValue("");
        return tmp;
    }
}
