I am using Apache HttpClient 4 to connect to Twitter's streaming API with default-level access. It works perfectly well in the beginning, but after a few minutes of retrieving data it bails out with this error:
2012-03-28 16:17:00,040 DEBUG org.apache.http.impl.conn.SingleClientConnManager: Get connection for route HttpRoute[{tls}->http://myproxy:80->https://stream.twitter.com:443]
2012-03-28 16:17:00,040 WARN com.cloudera.flume.core.connector.DirectDriver: Exception in source: TestTwitterSource
java.lang.IllegalStateException: Invalid use of SingleClientConnManager: connection still allocated.
Make sure to release the connection before allocating another one.
    at org.apache.http.impl.conn.SingleClientConnManager.getConnection(SingleClientConnManager.java:216)
    at org.apache.http.impl.conn.SingleClientConnManager$1.getConnection(SingleClientConnManager.java:190)
I understand why I am facing this issue. I am trying to use this HttpClient in a Flume cluster as a Flume source. The code looks like this:
public Event next() throws IOException, InterruptedException {
    try {
        HttpHost target = new HttpHost("stream.twitter.com", 443, "https");
        new BasicHttpContext();
        HttpPost httpPost = new HttpPost("/1/statuses/filter.json");
        StringEntity postEntity = new StringEntity("track=birthday", "UTF-8");
        postEntity.setContentType("application/x-www-form-urlencoded");
        httpPost.setEntity(postEntity);
        HttpResponse response = httpClient.execute(target, httpPost,
                new BasicHttpContext());
        BufferedReader reader = new BufferedReader(new InputStreamReader(
                response.getEntity().getContent()));
        String line = null;
        StringBuffer buffer = new StringBuffer();
        while ((line = reader.readLine()) != null) {
            buffer.append(line);
            if (buffer.length() > 30000) break;
        }
        return new EventImpl(buffer.toString().getBytes());
    } catch (IOException ie) {
        throw ie;
    }
}
I am trying to buffer 30,000 characters from the response stream in a StringBuffer and then return that as the event data. I am obviously not closing the connection, but I do not want to close it just yet. Twitter's dev guide talks about this here. It reads:
Some HTTP client libraries only return the response body after the
connection has been closed by the server. These clients will not work
for accessing the Streaming API. You must use an HTTP client that will
return response data incrementally. Most robust HTTP client libraries
will provide this functionality. The Apache HttpClient will handle
this use case, for example.
It clearly says that HttpClient will return response data incrementally. I've gone through the examples and tutorials, but I haven't found anything that comes close to doing this. If you have used an HTTP client (Apache or otherwise) to read Twitter's streaming API incrementally, please let me know how you achieved that feat. Those who haven't, please feel free to contribute answers. TIA.
UPDATE
I tried doing this: 1) I moved obtaining the stream handle to the open method of the Flume source. 2) I now use a plain InputStream and read the data into a byte buffer. So here is what the method body looks like now:
byte[] buffer = new byte[30000];
while (true) {
    int count = instream.read(buffer);
    if (count == -1)
        continue;
    else
        break;
}
return new EventImpl(buffer);
This works to an extent: I get tweets, and they are nicely written to a destination. The problem is with the instream.read(buffer) return value. Even when there is no data on the stream, the buffer still holds its default \u0000 bytes, all 30,000 of them, and they get written to the destination. So the destination file looks like this: "tweets..tweets..tweets.. \u0000\u0000\u0000\u0000\u0000\u0000\u0000...tweets..tweets...". I understand the count won't return -1 because this is a never-ending stream, so how do I figure out how much of the buffer the read call actually filled?
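A minimal sketch of the usual fix, assuming instream and Flume's EventImpl from the snippet above: read() returns how many bytes it actually put into the buffer, so the event payload can be trimmed to that count (java.util.Arrays and java.io.EOFException are the only extra imports needed):

byte[] buffer = new byte[30000];
// read() blocks until at least one byte is available and returns
// how many bytes it actually wrote into the buffer
int count = instream.read(buffer);
if (count == -1) {
    throw new EOFException("Stream closed by server");
}
// copy only the bytes that were filled, so no trailing \u0000 padding
return new EventImpl(Arrays.copyOf(buffer, count));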
The problem is that your code is leaking connections. Please make sure that, no matter what, you either close the content stream or abort the request.
InputStream instream = response.getEntity().getContent();
try {
    BufferedReader reader = new BufferedReader(
            new InputStreamReader(instream));
    String line = null;
    StringBuffer buffer = new StringBuffer();
    while ((line = reader.readLine()) != null) {
        buffer.append(line);
        if (buffer.length() > 30000) {
            httpPost.abort();
            // connection will not be re-used
            break;
        }
    }
    return new EventImpl(buffer.toString().getBytes());
} finally {
    // if request is not aborted the connection can be re-used
    try {
        instream.close();
    } catch (IOException ex) {
        // log or ignore
    }
}
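Note the distinction the comments draw: httpPost.abort() tears down the underlying connection immediately, while letting the finally block close the content stream normally allows HttpClient to drain it and return the connection to the manager for re-use.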
It turns out that it was a Flume issue. Flume is optimized to transfer events of up to 32 KB; anything beyond that makes it bail out. (The workaround is to tune the maximum event size to be greater than 32 KB.) So I've changed my code to buffer around 20,000 characters instead. It kind of works, but it is not foolproof: it can still fail if the buffered length exceeds 32 KB. However, it hasn't failed so far in an hour of testing; I believe that has to do with the fact that Twitter doesn't send a lot of data on its public stream.
while ((line = reader.readLine()) != null) {
    buffer.append(line);
    if (buffer.length() > 20000) break;
}
Related
Suppose I have a multi-threaded web server that only allows clients to perform GET requests for a couple of HTML files. I want to maintain a persistent connection (i.e. HTTP Connection: keep-alive) while "dynamically" serving the content for each request the client makes, e.g. if they first request index.html, then foo.html, and so on. The problem right now is that when I don't close the streams and socket, the program hangs until they are closed.
Simply put, the multi-threaded web server consists of a thread pool (Java's ExecutorService) with a ServerSocket that listens on a specific port (e.g. 9000) and selects a thread from the pool to handle the opening of a client socket to the server. It is basically the same setup as shown in http://tutorials.jenkov.com/java-multithreaded-servers/thread-pooled-server.html.
My modified setup looks like this:
WorkerRunnable.java:
public void run() {
    try {
        InputStream input = this.clientSocket.getInputStream();
        OutputStream output = this.clientSocket.getOutputStream();
        List<String> headers = readInputStream(input);
        Request request = new Request(headers);
        Response response = new Response(request);
        // response.raw() returns correctly formatted HTTP
        output.write(response.raw().getBytes(StandardCharsets.UTF_8));
        // close the socket if the client specifies Connection: close
        if (!request.keepAlive()) {
            output.close();
            input.close();
        } else {
            this.clientSocket.setKeepAlive(true);
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
}

private List<String> readInputStream(InputStream input) throws IOException {
    BufferedReader reader = new BufferedReader(new InputStreamReader(input));
    String line;
    List<String> headers = new ArrayList<>();
    while ((line = reader.readLine()) != null && !line.isEmpty()) {
        headers.add(line);
    }
    return headers;
}
My problem is that the HTML is only displayed once the input/output streams (and thus also the socket) are closed. As far as I understand, reading from the socket's InputStream basically hangs until it receives EOF, which arrives when the stream is closed. But if I want to maintain a persistent connection, it doesn't really make sense to close the streams and client socket. So I was wondering how to maintain a persistent connection while also serving the content for multiple GET requests from clients (assuming this is the correct approach). If not, please let me know if I've approached this task wrongly.
I have tried to flush the output stream as suggested here, but the problem still persists.
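For reference, here is a minimal sketch of the usual shape of a keep-alive handler, reusing the Request/Response classes from the code above and assuming response.raw() emits a correct Content-Length header (so the client knows where each response ends without needing EOF). Each request is framed by the blank line after its headers, so nothing has to be closed between requests:

public void run() {
    try (Socket socket = this.clientSocket;
         BufferedReader reader = new BufferedReader(
                 new InputStreamReader(socket.getInputStream()));
         OutputStream output = socket.getOutputStream()) {
        boolean keepAlive = true;
        while (keepAlive) {
            // read one request's headers, ending at the blank line
            List<String> headers = new ArrayList<>();
            String line;
            while ((line = reader.readLine()) != null && !line.isEmpty()) {
                headers.add(line);
            }
            if (headers.isEmpty()) break;        // client closed the connection
            Request request = new Request(headers);
            Response response = new Response(request);
            output.write(response.raw().getBytes(StandardCharsets.UTF_8));
            output.flush();                      // push the response without closing
            keepAlive = request.keepAlive();     // honour Connection: close
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
}

The key point is the flush() after each response: the bytes go out immediately, but the socket stays open for the next request.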
I have observed that one of my APIs takes much more time when called through Java (URLConnection, Apache HttpClient, or OkHttp) the first time. For subsequent calls, the time is much lower.
Postman and curl.exe, by contrast, take very little time (comparable to the second Java iteration).
On my machine, the first-call overhead is around 2 seconds, but on some machines it rises to around 5-6 seconds. Thereafter it is around 300 ms per round trip.
Here is my sample code:
public static String DoPostUsingURLConnection(String s_uri) throws Exception {
    try {
        URL uri = new URL(s_uri);
        HttpURLConnection connection = (HttpURLConnection) uri.openConnection();
        // Logger.log("Opened Connection");
        connection.setRequestMethod("POST");
        connection.setRequestProperty("Content-Type", "application/json");
        connection.setDoOutput(true);
        connection.setRequestProperty("Authorization", authorizationHeader);
        // Create the request body
        try (OutputStream os = connection.getOutputStream()) {
            byte[] input = jsonRequestBody.getBytes("utf-8");
            os.write(input, 0, input.length);
        }
        // Logger.log("Written Output Stream");
        int responseCode = connection.getResponseCode();
        InputStream is = null;
        if (responseCode == HttpURLConnection.HTTP_OK)
            is = connection.getInputStream();
        else
            is = connection.getErrorStream();
        BufferedReader in = new BufferedReader(new InputStreamReader(is));
        String inputLine;
        StringBuffer response = new StringBuffer();
        while ((inputLine = in.readLine()) != null) {
            response.append(inputLine).append("\n");
        }
        in.close();
        return response.toString();
    } catch (Exception ex) {
        return ex.getMessage();
    } finally {
        // Logger.log("Got full response");
    }
}
You can investigate where the time is going by logging OkHttp's connection events.
https://square.github.io/okhttp/events/
It will be particularly relevant if DNS returns both an IPv4 and an IPv6 address and one is timing out while the other succeeds.
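As a rough illustration (not from the original answer), an OkHttp EventListener that timestamps each phase might look like the following; the output makes it obvious whether the first-call cost is DNS, TCP connect, or the TLS handshake:

import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.util.List;
import okhttp3.*;

// Logs elapsed time for the phases that typically dominate a cold first call.
class TimingListener extends EventListener {
    private long start;

    private void log(String phase) {
        System.out.println(((System.nanoTime() - start) / 1_000_000) + " ms  " + phase);
    }

    @Override public void callStart(Call call) { start = System.nanoTime(); log("callStart"); }
    @Override public void dnsStart(Call call, String domainName) { log("dnsStart"); }
    @Override public void dnsEnd(Call call, String domainName, List<InetAddress> addresses) {
        log("dnsEnd " + addresses);
    }
    @Override public void connectStart(Call call, InetSocketAddress address, Proxy proxy) {
        log("connectStart " + address);
    }
    @Override public void secureConnectStart(Call call) { log("secureConnectStart"); }
    @Override public void secureConnectEnd(Call call, Handshake handshake) { log("secureConnectEnd"); }
    @Override public void connectEnd(Call call, InetSocketAddress address, Proxy proxy, Protocol protocol) {
        log("connectEnd " + protocol);
    }
    @Override public void callEnd(Call call) { log("callEnd"); }
}

Install it with new OkHttpClient.Builder().eventListener(new TimingListener()).build() and compare the timings of the first and second calls.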
This is just a guess, but it is how HTTP connections work: when you invoke an endpoint for the first time, the connection has to be established, and that takes time. After that, the connection is not closed immediately, in the expectation that more requests will come and it can be re-used. In your case the subsequent requests do indeed re-use the previously created connection rather than re-establishing it, which is the expensive part. I have written my own Open Source library that has a simplistic HTTP client in it, and I noticed the same effect: the first request takes much longer than subsequent requests. That doesn't explain why Postman and curl don't show the same effect, though. Anyway, if you want to work around this and you know your URL in advance, send a request upon your app's initialization (you can even do it in a separate thread). That will solve your problem.
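A minimal sketch of that warm-up idea, assuming a hypothetical helper class and that apiUrl is the endpoint you will call later (the request itself is throwaway; its only job is to pay the DNS/TCP/TLS cost up front):

import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

public class ConnectionWarmup {
    // Fire one best-effort request at startup, on a background thread, so
    // DNS, TCP and TLS setup do not land on the first real call.
    public static void warmUp(String apiUrl) {
        Thread t = new Thread(() -> {
            try {
                HttpURLConnection c =
                        (HttpURLConnection) new URL(apiUrl).openConnection();
                c.setRequestMethod("HEAD");   // cheap request; no body needed
                c.getResponseCode();          // forces the handshake to happen now
            } catch (IOException ignored) {
                // warm-up is best effort; real calls will just be cold
            }
        });
        t.setDaemon(true);
        t.start();
    }
}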
If you are interested in looking at my library, here is the Javadoc link. You can find it as a Maven artifact here and on GitHub here. An article about the library covering a partial list of its features is here.
I have a software driver which communicates with a third-party controller; I have an API for using the latter but no visibility of its source code, and the supplier is not co-operative in trying to improve things!
The situation is as follows.
To send a request to the controller, I send an XML packet as the content of an HTTP POST to a servlet, which then sends me the response. The original code, implemented by a previous developer, works stably using java.net.Socket. However, our driver is implemented such that a new socket is created for EVERY request sent and, if the driver gets busy, the third-party controller struggles to keep up in terms of socket handling. In fact, their support guy said to me: "You really need to leave 5 seconds between each request...". This simply isn't commercially acceptable.
To improve performance, I wanted to try leaving our end of the socket open and reusing the socket pretty much indefinitely (given that connections can drop unexpectedly of course, but that's the least of my concerns and is manageable). However, whatever I seem to do, the effect is that if I use Comms.getSocket(false), a new socket is created for each request and everything works OK but bottlenecks when busy. If I use Comms.getSocket(true), the following happens:
Controller is sent first request
Controller responds to first request
Controller is sent second request (maybe 5 seconds later)
Controller never responds to second request or anything after it
postRequest() keeps getting called: for the first 12 seconds, the console outputs "Input shut down ? false" but, after that, the code no longer reaches there and doesn't get past the bw.write() and bw.flush() calls.
The controller allows both HTTP 1.0 and 1.1, but their docs say zilch about keep-alive. I've tried both, and the code below shows that I've added Keep-Alive headers as well, but I'm guessing the controller, as server, is ignoring them (I don't think I have any way of knowing, do I?). In HTTP 1.0 mode the controller certainly returns a "Connection: close", but it doesn't do that in HTTP 1.1 mode.
The likelihood is then that the server side is insisting on a "one socket per request" approach.
However, I wondered if I might be doing anything wrong (or missing something) in the following code to achieve what I want:
private String postRequest() throws IOException {
    String resp = null;
    String logMsg;
    StringBuilder sb = new StringBuilder();
    StringBuilder sbWrite = new StringBuilder();
    Comms comms = getComms();
    Socket socket = comms.getSocket(true);
    BufferedReader br = comms.getReader();
    BufferedWriter bw = comms.getWriter();
    if (null != socket) {
        System.out.println("Socket closed ? " + socket.isClosed());
        System.out.println("Socket bound ? " + socket.isBound());
        System.out.println("Socket connected ? " + socket.isConnected());
        // Write the request
        sbWrite
            .append("POST /servlet/receiverServlet HTTP/1.1\r\n")
            .append("Host: 192.168.200.100\r\n")
            .append("Connection: Keep-Alive\r\n")
            .append("Keep-Alive: timeout=10\r\n")
            .append("Content-Type: text/xml\r\n")
            .append("Content-Length: " + requestString.length() + "\r\n\r\n")
            .append(requestString);
        System.out.println("Writing:\n" + sbWrite.toString());
        bw.write(sbWrite.toString());
        bw.flush();
        // Read the response
        System.out.println("Input shut down ? " + socket.isInputShutdown());
        String line;
        boolean flag = false;
        while ((line = br.readLine()) != null) {
            System.out.println("Line: <" + line + ">");
            if (flag) sb.append(line);
            if (line.isEmpty()) flag = true;
        }
        resp = sb.toString();
    }
    else {
        System.out.println("Socket not available");
    }
    return resp; // Another method will parse the response
}
To ease testing, I provide the socket using an extra Comms helper class and a method called getSocket(boolean reuse) where I can choose to always create a new socket or reuse the one that Comms creates for me, as follows:
public Comms(String ip, int port) {
    this.ip = ip;
    this.port = port;
    initSocket();
}

private void initSocket() {
    try {
        socket = new Socket(ip, port);
        socket.setKeepAlive(true);
        socket.setPerformancePreferences(1, 0, 0);
        socket.setReuseAddress(true);
        bw = new BufferedWriter(new OutputStreamWriter(socket.getOutputStream(), StandardCharsets.UTF_8));
        br = new BufferedReader(new InputStreamReader(socket.getInputStream(), StandardCharsets.UTF_8));
        System.out.println("### CREATED NEW SOCKET");
    }
    catch (UnknownHostException uhe) {
        System.out.println("### UNKNOWN HOST FOR SOCKET");
    }
    catch (IOException ioe) {
        System.out.println("### SOCKET I/O EXCEPTION");
    }
}

public BufferedReader getReader() { return br; }
public BufferedWriter getWriter() { return bw; }

public Socket getSocket(boolean reuse) {
    if (!reuse) initSocket();
    return socket;
}
Can anyone help?
If we assume the keep-alive side is working as expected, I think the line while ((line = br.readLine()) != null) is the faulty one, as it is effectively an infinite loop on a persistent connection.
readLine() returns null only when there is no more data to read, i.e. at EOF, when the server or client closes the connection. That breaks your socket-reuse approach: an open stream will never cause readLine() to return null; it will just block.
You need to fix the response-reading logic (why not use an existing HTTP client?): check the Content-Length header and, once you have read the required amount of body data, go around the loop again while keeping the socket alive.
Likewise, after setting flag to true, you need to know what kind of data to read (consider the MIME/Content-Type) and, more importantly, how long it is, so reading the body with readLine() is not good practice here.
Also make sure the server allows persistent connections at all, by checking whether it echoes the Connection: keep-alive header in its response.
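To make that concrete, here is a minimal sketch of Content-Length-based framing over a reused socket. It is an illustration only (the names are made up), and it assumes the controller always sends a Content-Length header and never uses chunked transfer encoding:

import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class ResponseFraming {

    // Reads exactly one HTTP response from the stream and returns its body,
    // leaving the connection open and positioned at the next response.
    static String readOneResponse(InputStream in) throws IOException {
        DataInputStream din = new DataInputStream(in);
        int contentLength = -1;
        String line;
        // status line and headers end at the first blank line
        while (!(line = readAsciiLine(din)).isEmpty()) {
            if (line.toLowerCase().startsWith("content-length:")) {
                contentLength = Integer.parseInt(line.substring(15).trim());
            }
        }
        if (contentLength < 0) {
            throw new IOException("No Content-Length; cannot frame the response");
        }
        byte[] body = new byte[contentLength];
        din.readFully(body); // read exactly the advertised number of body bytes
        return new String(body, StandardCharsets.UTF_8);
    }

    // Byte-wise line reader: unlike BufferedReader, it never buffers past the
    // end of the current line, so no body bytes are swallowed.
    private static String readAsciiLine(DataInputStream din) throws IOException {
        StringBuilder sb = new StringBuilder();
        int b;
        while ((b = din.read()) != -1 && b != '\n') {
            if (b != '\r') sb.append((char) b);
        }
        return sb.toString();
    }
}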
I'm trying to read an HTTP request using only the Socket and BufferedReader classes in Java. The problem is that I can't reach the body part of the request: the BufferedReader gives me only the request line and the headers. Here is part of the code:
bufferedReader = new BufferedReader(new InputStreamReader(socket.getInputStream()));
String comando = "";
while ((msgDoSocket = bufferedReader.readLine()) != null) {
    //telaOutput.adicionaFim(msgDoSocket);
    try {
        comando += msgDoSocket + " ";
        //System.out.println(comando);
        if (msgDoSocket.isEmpty()) {
            processaInput(comando);
        }
    } catch (Exception ex) {
        Logger.getLogger(ServerThread.class.getName()).log(Level.SEVERE, null, ex);
    }
}
Here is a Wireshark capture showing that the POST body is being sent. My program is running on port 15000 and the data is just the string "teste12345". I'm using the Postman app from Google Chrome to send the requests.
I'm having exactly the same problem described in this thread, but the solutions proposed there didn't work: the read still stops at the last header and goes no further. Thanks in advance.
Edit: Problem Solved!
Following the suggestion proposed in the answer, I changed the reading to:
reader = new DataInputStream(socket.getInputStream());
String comando = "";
int dt;
// read() returns -1 at end of stream; readByte() would instead throw
// EOFException and go negative on byte values above 0x7F
while ((dt = reader.read()) >= 0) {
    comando += (char) dt;
    //... do the rest of the stuff
}
Reading it as binary made it possible to reach the body part of the request.
I'm far from being a Java guru, but I bet readLine() only returns once it finds a \r\n sequence. Since your body is not terminated with \r\n, the method never returns for it. Try manually adding that character sequence to your body and see what happens, or, alternatively, use the raw InputStream to read the body as a byte array.
Nevertheless, you can't expect any HTTP body to actually be a string; it can also be a binary sequence that knows nothing about \r\n.
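Sketching that suggestion with hypothetical names, and assuming the client sends a Content-Length header (as Postman does): read the header lines from the raw stream, then read exactly Content-Length bytes of body from the same stream. readHeaderLine() stands for a small byte-wise line reader like the one sketched earlier on this page, not BufferedReader.readLine():

DataInputStream in = new DataInputStream(socket.getInputStream());
int contentLength = 0;
String line;
// header lines end at the first blank line
while (!(line = readHeaderLine(in)).isEmpty()) {
    if (line.toLowerCase().startsWith("content-length:")) {
        contentLength = Integer.parseInt(line.substring(15).trim());
    }
}
byte[] body = new byte[contentLength];
in.readFully(body);                      // "teste12345" ends up here
String bodyText = new String(body, StandardCharsets.UTF_8);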
After almost 2 workdays of Googling and trying several different possibilities I found throughout the web, I'm asking this question here, hoping that I might finally get an answer.
First of all, here's what I want to do:
I'm developing a client and a server application with the purpose of exchanging a lot of large files between multiple clients on a single server. The client is developed in pure Java (JDK 1.6), while the web application is done in Grails (2.0.0).
As the purpose of the client is to allow users to exchange a lot of large files (usually about 2GB each), I have to implement it in a way, so that the uploads are resumable, i.e. the users are able to stop and resume uploads at any time.
Here's what I did so far:
I actually managed to do what I wanted to do and stream large files to the server while still being able to pause and resume uploads using raw sockets. I would send a regular request to the server (using Apache's HttpClient library) to get the server to send me a port that was free for me to use, then open a ServerSocket on the server and connect to that particular socket from the client.
Here's the problem with that:
Actually, there are at least two problems with that:
I open those ports myself, so I have to manage open and used ports myself. This is quite error-prone.
I actually circumvent Grails' ability to manage a huge amount of (concurrent) connections.
Finally, here's what I'm supposed to do now and the problem:
As the problems I mentioned above are unacceptable, I am now supposed to use Java's URLConnection/HttpURLConnection classes, while still sticking to Grails.
Connecting to the server and sending simple requests is no problem at all; everything works fine. The problems started when I tried to use the streams (the connection's OutputStream in the client and the request's InputStream on the server). Opening the client's OutputStream and writing data to it is as easy as it gets, but reading from the request's InputStream seems impossible, as that stream always appears to be empty.
Example Code
Here's an example of the server side (Groovy controller):
def test() {
    InputStream inStream = request.inputStream
    if(inStream != null) {
        int read = 0;
        byte[] buffer = new byte[4096];
        long total = 0;
        println "Start reading"
        while((read = inStream.read(buffer)) != -1) {
            total += read
            println "Read " + read + " bytes from input stream buffer" //<-- this is NEVER called
        }
        println "Reading finished"
        println "Read a total of " + total + " bytes" // <-- 'total' will always be 0 (zero)
    } else {
        println "Input Stream is null" // <-- This is NEVER called
    }
}
This is what I did on the client side (Java class):
public void connect() throws IOException {
    final URL url = new URL("myserveraddress");
    final byte[] message = "someMessage".getBytes(); // Any byte[] - will be a file one day
    HttpURLConnection connection = (HttpURLConnection) url.openConnection();
    connection.setRequestMethod("GET"); // other methods - same result
    connection.setDoOutput(true);       // required before writing to the output stream
    // Write message
    DataOutputStream out = new DataOutputStream(connection.getOutputStream());
    out.write(message);
    out.flush();
    out.close();
    // Actually connect
    connection.connect(); // is this placed correctly?
    // Get response
    BufferedReader in = new BufferedReader(new InputStreamReader(connection.getInputStream()));
    String line = null;
    while((line = in.readLine()) != null) {
        System.out.println(line); // Prints the whole server response as expected
    }
    in.close();
}
As I mentioned, the problem is that request.inputStream always yields an empty InputStream, so I am never able to read anything from it. But as that is exactly what I'm trying to do (stream the file to the server, read from the InputStream, and save it to a file), this is rather disappointing.
I tried different HTTP methods, different data payloads, and also rearranged the code over and over again, but did not seem to be able to solve the problem.
What I hope to find
I hope to find a solution to my problem, of course. Anything is highly appreciated: hints, code snippets, library suggestions and so on. Maybe I'm even having it all wrong and need to go in a totally different direction.
So, how can I implement resumable file uploads for rather large (binary) files from a Java client to a Grails web application without manually opening ports on the server side?
The HTTP GET method has special headers for range retrieval: http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.35. Most downloaders use them to resume downloads from a server.
As far as I understand, there is no standard practice for using these headers with POST/PUT requests, but it's up to you, right? You can make a pretty standard Grails controller that accepts a standard HTTP upload with a header like Range: bytes=500-999, and the controller should put those 500 uploaded bytes from the client into the file, starting at position 500.
In this case you don't need to open any sockets, invent your own protocols, etc.
P.S. 500 bytes is just an example; in practice you would probably use much bigger chunks.
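As an illustration of that idea (the names are made up, and it assumes the server-side controller parses the Content-Range header and writes the chunk at the given offset, e.g. with RandomAccessFile.seek()), the client side of one resumable chunk might look like this:

import java.io.File;
import java.io.IOException;
import java.io.OutputStream;
import java.io.RandomAccessFile;
import java.net.HttpURLConnection;
import java.net.URL;

public class ResumableUploadClient {
    // Uploads file bytes [offset, offset+chunkSize) to the target URL,
    // labelling them with a Content-Range header so the server knows
    // where to write them.
    public static void uploadChunk(File file, long offset, int chunkSize, String target)
            throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(target).openConnection();
        conn.setRequestMethod("PUT");
        conn.setDoOutput(true);
        long end = Math.min(offset + chunkSize, file.length()) - 1; // inclusive
        conn.setRequestProperty("Content-Range",
                "bytes " + offset + "-" + end + "/" + file.length());
        try (RandomAccessFile raf = new RandomAccessFile(file, "r");
             OutputStream out = conn.getOutputStream()) {
            raf.seek(offset);                  // resume from where we stopped
            byte[] buf = new byte[8192];
            long remaining = end - offset + 1;
            int read;
            while (remaining > 0
                    && (read = raf.read(buf, 0, (int) Math.min(buf.length, remaining))) != -1) {
                out.write(buf, 0, read);
                remaining -= read;
            }
        }
        System.out.println("Server responded: " + conn.getResponseCode());
    }
}

How the client learns the right offset after an interruption (for example by asking the server how many bytes it already has) is omitted here; that query is the other half of any resumable-upload scheme.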
Client-side Java program:
public class NonFormFileUploader {
    static final String UPLOAD_URL = "http://localhost:8080/v2/mobileApp/fileUploadForEOL";
    static final int BUFFER_SIZE = 4096;

    public static void main(String[] args) throws IOException {
        // takes file path from first program's argument
        String filePath = "G:/study/GettingStartedwithGrailsFinalInfoQ.pdf";
        File uploadFile = new File(filePath);
        System.out.println("File to upload: " + filePath);
        // creates a HTTP connection
        URL url = new URL(UPLOAD_URL);
        HttpURLConnection httpConn = (HttpURLConnection) url.openConnection();
        httpConn.setDoOutput(true);
        httpConn.setRequestMethod("POST");
        // sets file name as a HTTP header
        httpConn.setRequestProperty("fileName", uploadFile.getName());
        // opens output stream of the HTTP connection for writing data
        OutputStream outputStream = httpConn.getOutputStream();
        // opens input stream of the file for reading data
        FileInputStream inputStream = new FileInputStream(uploadFile);
        byte[] buffer = new byte[BUFFER_SIZE];
        int bytesRead = -1;
        while ((bytesRead = inputStream.read(buffer)) != -1) {
            System.out.println("bytesRead:" + bytesRead);
            outputStream.write(buffer, 0, bytesRead);
            outputStream.flush();
        }
        System.out.println("Data was written.");
        outputStream.flush();
        outputStream.close();
        inputStream.close();
        int responseCode = httpConn.getResponseCode();
        if (responseCode == HttpURLConnection.HTTP_OK) {
            // reads server's response
            BufferedReader reader = new BufferedReader(new InputStreamReader(
                    httpConn.getInputStream()));
            String response = reader.readLine();
            System.out.println("Server's response: " + response);
        } else {
            System.out.println("Server returned non-OK code: " + responseCode);
        }
    }
}
Server-side Grails program:
Inside the controller:
def fileUploadForEOL(){
    def result
    try{
        result = mobileAppService.fileUploadForEOL(request);
    }catch(Exception e){
        log.error "Exception in fileUploadForEOL service",e
    }
    render result as JSON
}
Inside the Service Class:
def fileUploadForEOL(request){
    def status = false;
    int code = 500
    def map = [:]
    try{
        String fileName = request.getHeader("fileName");
        File saveFile = new File(SAVE_DIR + fileName);
        System.out.println("===== Begin headers =====");
        Enumeration<String> names = request.getHeaderNames();
        while (names.hasMoreElements()) {
            String headerName = names.nextElement();
            System.out.println(headerName + " = " + request.getHeader(headerName));
        }
        System.out.println("===== End headers =====\n");
        // opens input stream of the request for reading data
        InputStream inputStream = request.getInputStream();
        // opens an output stream for writing file
        FileOutputStream outputStream = new FileOutputStream(saveFile);
        byte[] buffer = new byte[BUFFER_SIZE];
        int bytesRead
        long count = 0
        // count only the bytes actually read; the original version also
        // added the final -1 returned at end of stream
        while((bytesRead = inputStream.read(buffer)) != -1) {
            outputStream.write(buffer, 0, bytesRead);
            count += bytesRead
        }
        println "count:"+count
        System.out.println("Data received.");
        outputStream.close();
        inputStream.close();
        System.out.println("File written to: " + saveFile.getAbsolutePath());
        code = 200
    }catch(Exception e){
        mLogger.log(java.util.logging.Level.SEVERE,"Exception in fileUploadForEOL",e);
    }finally{
        map <<["code":code]
    }
    return map
}
I have tried the above code and it worked for me, but only for files of 3 to 4 MB. For smaller files some bytes go missing or never arrive, even though the Content-Length in the request headers is correct; I am not sure why that happens.