We are streaming data from the server to the client because of the huge volume involved. Essentially, a browser requests search results and the matching records are streamed back to it. In the performance test, it streamed 150,000+ records in batches of 1,000.
Option 1: use a WebSocket, sending each batch of records via org.springframework.messaging.simp.SimpMessageSendingOperations, which consistently completed within 30-40 seconds.
Option 2: use a normal output stream, via the java.io.Writer that Spring MVC injects into the controller method. However, this performed poorly, taking 3-4 minutes!
The reason we are considering option 2 is some inconsistent behaviour with WebSockets (admittedly we are using them for the first time). Hence, I coded our fallback to normal streaming via an output stream, but the slowdown is not acceptable. Does anyone have any idea why there is such a huge performance difference between the two options? Is it still possible to make option 2 faster? (For now I will be playing around with the batch size.)
The snippet for the batch writing process (where the app reads the search results from a NoSQL input stream and writes to the browser's output stream) is as follows:
public class DefaultStreamProcessor implements StreamProcessor {

    public void process(InputStream searchResponse) {
        try (BufferedReader br = new BufferedReader(new InputStreamReader(searchResponse))) {
            String line;
            while ((line = br.readLine()) != null) {
                addToBatch(line);
                if (isBatchFull()) {
                    // option 1: use websocket
                    //simpMessageSendingOperations.convertAndSend("/topic/result/" + searchId, new ResultObject(getBatchAsMessage()));
                    // option 2: use normal printwriter
                    //writer.write(getBatchAsMessage());
                }
            }
        } catch (IOException e) {
            // readLine can throw; without this catch (or a throws clause) the snippet would not compile
            throw new RuntimeException(e);
        }
    }
}
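One thing worth ruling out before blaming the Writer itself is per-record or unbuffered writing to the response. Below is a minimal sketch of the option 2 path with an explicit buffer and one flush per batch; the 64 KB buffer size, the newline-joined batch format and the two-argument process signature are assumptions for illustration, not part of the original code:
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Writer;
import java.util.ArrayList;
import java.util.List;

public class BufferedStreamProcessor {

    private static final int BATCH_SIZE = 1000; // batch size used in the performance test

    public void process(InputStream searchResponse, Writer responseWriter) throws IOException {
        // Buffer the response Writer so records are not pushed through the
        // HTTP response one at a time; flush once per batch instead.
        BufferedWriter out = new BufferedWriter(responseWriter, 64 * 1024);
        List<String> batch = new ArrayList<>(BATCH_SIZE);
        try (BufferedReader br = new BufferedReader(new InputStreamReader(searchResponse))) {
            String line;
            while ((line = br.readLine()) != null) {
                batch.add(line);
                if (batch.size() == BATCH_SIZE) {
                    writeBatch(out, batch);
                }
            }
            if (!batch.isEmpty()) {
                writeBatch(out, batch); // trailing partial batch
            }
            out.flush();
        }
    }

    private void writeBatch(BufferedWriter out, List<String> batch) throws IOException {
        out.write(String.join("\n", batch));
        out.write('\n');
        out.flush(); // push the whole batch to the client in one go
        batch.clear();
    }
}
If it is still slow with buffering in place, comparing what each path actually puts on the wire would be the next step, since the WebSocket route may simply be sending more compact messages.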
Related
I have a Flink app which reads MySQL CDC JSON messages from Kafka. The CDC JSON strings for 5 tables are read and processed, and I override AbstractKafkaDeserializationSchema to turn the Kafka byte[] into my customized bean object. However, I found that for 2 of the 5 tables the Kafka input byte[] takes much longer to convert to a String than for the other 3 tables; in the worst case it just gets stuck there for minutes, seemingly forever, and the Flink web UI shows backpressure on the Source subtask. The conversion is just String strValue = new String(valueBytes). I also tried new String(valueBytes, "UTF-8") and new String(valueBytes, StandardCharsets.US_ASCII); it makes no difference. The overridden method is just:
deserialize(byte[] keyBytes, byte[] valueBytes, String topic, int partition, long offset) throws IOException
This problem has stopped me from releasing the app for a week. Since the conversion is so simple, I can't find any alternative way to do it; I searched on Stack Overflow and found some similar complaints but no working solution for me.
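For what it's worth, a minimal sketch of how the conversion could be timed inside the deserializer, to check whether the two slow tables simply carry much larger payloads; the 1 ms threshold and the logging format are illustrative assumptions:
import java.nio.charset.StandardCharsets;

public class ConversionTimer {
    // Times the suspect conversion so slow tables can be compared against fast ones.
    static String toStringTimed(byte[] valueBytes, String topic) {
        long start = System.nanoTime();
        String value = new String(valueBytes, StandardCharsets.UTF_8);
        long micros = (System.nanoTime() - start) / 1_000;
        if (micros > 1_000) { // flag anything slower than 1 ms
            System.err.printf("slow conversion: topic=%s bytes=%d took=%dus%n",
                    topic, valueBytes.length, micros);
        }
        return value;
    }
}
If the logged byte counts for the two slow tables are much larger than for the other three, the problem is message size rather than the String constructor itself.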
Summary
I need to build a set of statistics during a Camel server in-modify-out process, and emit those statistics as one object (a single JSON log line).
Those statistics need to include:
input file metrics (size/chars/bytes and other, file-section specific measures)
processing time statistics (start/end/duration of processing time, start/end/duration of metrics gathering time)
output file metrics (the same measures as the input file metrics, though the numbers will differ since the output file has changed)
The output file metrics are the problem, as I can't access the file until it's written to disk, and it's not written to disk until 'process'ing finishes.
Background
A log4j implementation is being used for service logging, but after some tinkering we realised it really doesn't suit the requirement here, as it would output multi-line JSON and embed the JSON in a single top-level field. We need varying top-level fields, depending on the file processed.
The server is expected to deal with multiple file operations asynchronously, and the files vary in size (from tiny to fairly immense - which is one reason we need to iterate stats and measures before we start to tune or review)
Current State
Input file stats and even the processing time stats are working OK, and I'm using the following technique to get them:
Inside the 'process' override method of "MyProcessor" I create a new instance of my JsonLogWriter class. (shortened pseudo code with ellipsis)
import org.apache.camel.Exchange;
import org.apache.camel.Processor;
...
@Component
public class MyProcessor implements Processor {
    ...
    @Override
    public void process(Exchange exchange) throws Exception {
        ...
        JsonLogWriter jlw = new JsonLogWriter();
        jlw.logfilePath = jsonLogFilePath;
        jlw.inputFilePath = inFilePath;
        jlw.outputfilePath = outFilePath;
        ...
        jlw.metricsInputFile(); // gathers metrics using inputFilePath - OK
        ...
        // input file is processed / changed and returned as an InputStream:
        InputStream result = myEngine.readAndUpdate(inFilePath);
        // ... get timings
        jlw.write();
    }
}
From this you can see that JsonLogWriter has:
properties for the file paths (input file, output file, log output),
a set of methods to populate data,
and a method to emit the data to a file (once ready).
Once I have populated all the JSON objects in the class, I call the write() method, the class pulls all the JSON objects together, and the stats all arrive in a log file (in a single line of JSON) - OK.
Error - no output file (yet)
If I use the metricsOutputFile method however:
InputStream result = myEngine.readAndUpdate(inFilePath);
// ... get timings
jlw.metricsOutputFile(); // using outputfilePath
jlw.write();
}
... the JsonLogWriter fails as the file doesn't exist yet.
java.nio.file.NoSuchFileException: aroute\output\a_long_guid_filename
When debugging, I can't see any part of the exchange or result objects that I might pipe into a file-read/statistics-gathering process.
Will this require more Camel routes to solve? What might be an alternative approach where I can get all the stats from the input and output files and keep them in one object / one line of JSON?
(Very happy to receive constructive criticism - as in "why is your Java so heavy-handed" - and yes, it may well be; I am prototyping solutions at this stage, so this isn't production code, nor do I profess deep understanding of Java internals. I can usually get stuff working, though.)
Use one route and two processors: one for writing the file and the next for reading it, so the first finishes writing before the second starts reading.
Alternatively, you can use two routes: one that writes the file (to:file) and another that listens for and reads the file (from:file).
You can check the common EIP patterns, which will solve most of these questions, here:
https://www.enterpriseintegrationpatterns.com/patterns/messaging/
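For illustration, a minimal sketch of the first suggestion (one route, two processors) in Camel's Java DSL; the endpoint URIs and the MetricsProcessor class are assumptions, not taken from the question:
import org.apache.camel.builder.RouteBuilder;

public class StatsRouteBuilder extends RouteBuilder {
    @Override
    public void configure() throws Exception {
        from("file:aroute/input")
            .process(new MyProcessor())        // transforms the file and gathers input metrics
            .to("file:aroute/output")          // the output file is fully written at this point
            .process(new MetricsProcessor());  // runs after the write, so the file exists on disk
    }
}
Because Camel executes the steps of a route sequentially, the hypothetical MetricsProcessor would be the place to call jlw.metricsOutputFile() and jlw.write(), once the file is safely on disk.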
I am writing a program in Java which has to make around 6.5 million calls to various pages on the same server (the URL is slightly altered by appending a user name read from a text file). Firstly, I want to know the most time-efficient way of doing this; secondly, can anybody guess how much time this may consume? Currently I am reading each URL in a separate thread of an ExecutorService object, something like this:
ExecutorService executor = Executors.newFixedThreadPool(10);
Runnable worker = new MyRunnable(allUsers[n]);
executor.execute(worker);
and the run method looks like:
is = url.openStream(); // throws an IOException
br = new BufferedReader(new InputStreamReader(is));
while ((line = br.readLine()) != null) {
page = page + line;
// More code follows
}
Any suggestions will be highly appreciated
No, nobody can guess how much "time" this will consume. We don't know if the server will take a millisecond or an hour to complete a request.
The most efficient way is to use a more efficient API, one that allows bulk requests.
At 10 threads your program will likely be IO bound. You will need to profile the number of threads required to ensure full CPU use. You can avoid this by using Java 7 NIO features or an NIO framework like Netty or MINA, so one thread can service many requests concurrently. (I'm not sure whether those are client-side, though.)
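As an illustration of the one-thread-many-requests idea on newer JVMs (this goes beyond the Java 7 options mentioned above), java.net.http.HttpClient can issue asynchronous requests without dedicating a thread per call; the URL pattern and user list below are placeholders, and in practice you would throttle instead of starting millions of requests at once:
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;

public class AsyncFetcher {
    public static void main(String[] args) {
        HttpClient client = HttpClient.newHttpClient();
        List<String> users = List.of("alice", "bob"); // stand-in for names read from the text file
        List<CompletableFuture<String>> pending = users.stream()
                .map(u -> HttpRequest.newBuilder(URI.create("http://example.com/page/" + u)).build())
                .map(req -> client.sendAsync(req, HttpResponse.BodyHandlers.ofString())
                        .thenApply(HttpResponse::body))
                .collect(Collectors.toList());
        pending.forEach(f -> System.out.println(f.join().length())); // join() blocks; demo only
    }
}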
I concur with the other comments and answers that it is impossible to predict how long this will take, and that a "bulk transfer" request will most likely give better performance.
A couple more points:
If you can use a RESTful API that returns JSON or XML instead of web pages ... that will be faster.
On the client side, this is going to be inefficient if the documents you are fetching are large:
while ((line = br.readLine()) != null) {
page = page + line;
}
That is going to do an excessive amount of copying. A better approach is this:
StringBuilder sb = new StringBuilder(...);
while ((line = br.readLine()) != null) {
sb.append(line);
}
page = sb.toString();
If you can get a good estimate of the page size, then create the StringBuilder that big.
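Putting those two points together, a minimal sketch of a fetch method that pre-sizes the StringBuilder and reads in chunks instead of line by line; the 64 KB size estimate and the UTF-8 charset are assumptions:
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class PageFetcher {
    // Rough page-size estimate; tune this to your real pages.
    private static final int EXPECTED_PAGE_CHARS = 64 * 1024;

    static String fetch(URL url) throws Exception {
        StringBuilder sb = new StringBuilder(EXPECTED_PAGE_CHARS);
        char[] buf = new char[8192];
        try (Reader in = new InputStreamReader(url.openStream(), StandardCharsets.UTF_8)) {
            int n;
            while ((n = in.read(buf)) != -1) {
                sb.append(buf, 0, n); // bulk append, no per-line String objects
            }
        }
        return sb.toString();
    }
}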
I reused already-working save/load game code to send a player's state via sockets, and I encountered a problem: the game save is correct, but the server is not receiving the client's player state.
Here is the base code that is tested and working:
int retval = fc.showSaveDialog(givenComponent);
if (retval == JFileChooser.APPROVE_OPTION) {
    File file = fc.getSelectedFile();
    try {
        BufferedWriter bw = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(file), "UTF-8"));
        XStream xs = new XStream();
        GameSave gs = new GameSave();
        ArrayList<PlayerSerialize> listps = new ArrayList<PlayerSerialize>();
        for (Player tempplayer : Players.players) {
            PlayerSerialize ps = new PlayerSerialize();
            ps.getPlayerData(tempplayer);
            listps.add(ps);
        }
        gs.playersSerialize = listps;
        gs.gamedate = Dateutils.gamedate;
        String s = xs.toXML(gs);
        bw.write(s);
        bw.close();
    } catch (IOException ex) {
        Logger.getLogger(DialogMainField.class.getName()).log(Level.SEVERE, null, ex);
    }
}
Here is the client-side code that is not sending anything to the server:
XStream xs = new XStream();
GameSave gs = new GameSave();
ArrayList<PlayerSerialize> listps = new ArrayList<PlayerSerialize>();
PlayerSerialize ps = new PlayerSerialize();
ps.getPlayerData(Players.players.get(1));
listps.add(ps);
gs.playersSerialize = listps;
gs.gamedate = Dateutils.gamedate;
String s = xs.toXML(gs);
out.println("clientplayertoserver");
out.println(s);
Here is the server side just in case:
if (strIn.contains("clientplayertoserver")) {
    strIn = in.readLine();
    XStream xs = new XStream();
    GameSave gs = (GameSave) xs.fromXML(strIn);
    Players.players.get(1).getPlayerSerializeData(gs.playersSerialize.get(0));
}
I need some kind of clue because I'm stuck investigating the problem. Are there any XStream limitations? Or is the error in how I'm working with the sockets? The same code works in one place and not in another. Thanks in advance for any help with this weird situation.
Well, you are doing two different things here:
1) Saving the data to a file, which is ok.
2) Sending data via a socket. You seem to assume that all your data (the XStream-serialized object) is actually on one line. This will usually not be the case. Even if you configure XStream to serialize all data without indentation, you still cannot be sure you won't have line breaks in the serialized data (your variables).
To solve your issue, you should separate your concerns here.
First, serialize/deserialize your objects to String and back (that seems to be working for you).
Second, send this data to a medium, like a file (which you already have) or a server.
For sending string data to a server, you'll need some kind of protocol. Either you can reuse an existing protocol, like HTTP (POST request to a server), Web Service, Rest Call or whatever else your server is running.
If you want to implement your own protocol (as you have tried above), you must ensure that the server knows what to expect and how to treat it properly. Usually you should split your request in a header and a payload section or something like that.
Include in your header what you want to do (e.g save player state) and the meta information of that (e.g how many bytes payload you are sending).
After the header, send the payload.
The server must now read the header first (e.g. everything until the first newline), parse the header to understand what is going on (e.g. save player state, 543 bytes of data) and act on it (read the data, turn it into a String, deserialize the XStream object and store it in a local database or whatever the server should do with it).
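A minimal sketch of such a header-plus-payload exchange; the SAVE command name and the framing are invented for illustration:
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class SaveProtocol {

    // Client side: one header line carrying the byte count, then the raw payload.
    public static void sendSave(Socket socket, String xml) throws IOException {
        byte[] payload = xml.getBytes(StandardCharsets.UTF_8);
        DataOutputStream out = new DataOutputStream(socket.getOutputStream());
        out.write(("SAVE " + payload.length + "\n").getBytes(StandardCharsets.UTF_8));
        out.write(payload);
        out.flush();
    }

    // Server side: parse the header, then read exactly that many bytes.
    public static String receiveSave(Socket socket) throws IOException {
        DataInputStream in = new DataInputStream(socket.getInputStream());
        StringBuilder header = new StringBuilder();
        int b;
        while ((b = in.read()) != -1 && b != '\n') {
            header.append((char) b); // the header is plain ASCII
        }
        int length = Integer.parseInt(header.toString().trim().split(" ")[1]);
        byte[] payload = new byte[length];
        in.readFully(payload); // blocks until the full payload has arrived
        return new String(payload, StandardCharsets.UTF_8);
    }
}
With explicit framing like this, line breaks inside the XStream XML no longer matter, which is exactly the problem a readLine()-based server runs into.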
So, after all this information, please adapt your question. As you have seen, you do not really have a question about XStream, but about how to send some data from a client to a custom server.
I have offline JSON definitions (in assets folder) - and with them I create my data model. It has like 8 classes which all inherit (extend) one abstract Model class.
Would it be better solution if I parse the JSON and keep the model in memory (more or less everything is Integer or String) through the whole life cycle of the App or would it be smarter if I parse the JSON files as they are needed?
thanks
Parsing the files and storing all the data in memory will definitely give you a speed advantage. The problem with this solution is that if your application goes to the background (the user receives a phone call or simply leaves the app), no one can guarantee that the data will stay intact in memory.
This data can be cleared by the GC if the system decides that it needs more memory.
This means that when the user comes back to the application, if you rely on the data being in memory you might face an exception, so you need to consider this situation.
From that point of view, it is good to store your data in a file that can be parsed at the desired time, even though this might be a slower solution.
Another solution you may look at is to parse this data at first application start-up into an SQLite DB and use it from there, or even store it in the DB in the first place. This gives you the advantages of both worlds: you don't have to parse the data before using it, you have quick access to it using a Cursor, and you don't face the problem of data deletion in case of insufficient memory in the system.
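A rough sketch of the SQLite route; the database name, table schema and helper method are invented for illustration:
import android.content.Context;
import android.database.Cursor;
import android.database.sqlite.SQLiteDatabase;
import android.database.sqlite.SQLiteOpenHelper;

public class ModelDbHelper extends SQLiteOpenHelper {

    public ModelDbHelper(Context context) {
        super(context, "models.db", null, 1);
    }

    @Override
    public void onCreate(SQLiteDatabase db) {
        // Populated once (e.g. on first start-up) from the parsed JSON assets.
        db.execSQL("CREATE TABLE model (id INTEGER PRIMARY KEY, name TEXT, value TEXT)");
    }

    @Override
    public void onUpgrade(SQLiteDatabase db, int oldVersion, int newVersion) {
        db.execSQL("DROP TABLE IF EXISTS model");
        onCreate(db);
    }

    public Cursor loadModels() {
        // Quick access via a Cursor, with no JSON parsing on later launches.
        return getReadableDatabase().rawQuery("SELECT id, name, value FROM model", null);
    }
}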
I'd read all the file content at once and keep it as a static String somewhere in my application that is available to all application components (Singleton pattern), since maintaining a small string in memory is usually much cheaper than opening and closing files frequently.
To address the GC point @Emil raised, you can write your code something like this:
public class DataManager {

    private static String myData;

    public static String getData(Context context) {
        if (myData == null) {
            loadData(context);
        }
        return myData;
    }

    private static void loadData(Context context) {
        try {
            BufferedReader reader = new BufferedReader(
                    new InputStreamReader(context.getAssets().open("data.txt"), "UTF-8"));
            StringBuilder builder = new StringBuilder();
            String mLine;
            while ((mLine = reader.readLine()) != null) {
                builder.append(mLine);
            }
            reader.close();
            myData = builder.toString();
        } catch (IOException e) {
            // Handle or log the failure; leaving myData null lets the next call retry.
        }
    }
}
And from any class in your application that has a valid Context reference:
String data = DataManager.getData(context);