How to retrieve data from a remote server faster in Java?

I have an enhanced for loop that uses ch.ethz.ssh2.Connection to obtain over 200 values. On every iteration a new server is authenticated and only one value is retrieved from it. Each iteration saves the data into an ArrayList that is displayed in HTML tables using Thymeleaf. Running through all 200 servers one at a time takes forever, and the whole thing has to run again when I open localhost:8080 to load the tables. The page takes over 5 minutes to load. What can I do to speed things up?
Problem in code
List<DartModel> data = new ArrayList<DartModel>();
for (String server : serverArray) {
    try {
        conn = new ch.ethz.ssh2.Connection(server);
        conn.connect();
        boolean isAuthenticated = conn.authenticateWithPassword(
                username_array[j], password_array[j]);
        if (!isAuthenticated) {
            throw new IOException("Authentication failed.");
        }
        // ... run the command, read one value, add it to 'data' ...
    } catch (IOException e) {
        // ... handle the failure for this server ...
    }
}
I need to somehow rework the code above so that I can obtain all the data quickly.
Output
Loop1: Server1
Loop2: DifferentServer2
Loop3: AllDifferentServer3
and so on...
Alternative
I was thinking of letting the Java program run periodically and save the data into Redis with an expiration time, so that each run pushes fresh data into Redis. But I was unable to get the data from Redis into the Thymeleaf HTML tables. Would this work? If so, how can I display the data in Thymeleaf?

You can query multiple servers at once (in parallel).
If your framework for remote connections is blocking (the methods you call actually wait until the response is received), you'd have to start a handful of threads (one thread per server in the extreme case) to do that in parallel, which doesn't scale very well.
If you can use some Future/Promise-based tool, you can do it without much overhead (convert 200 futures into one future of 200 values/responses).
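A minimal sketch of that futures approach, assuming a blocking fetchValue helper that wraps the SSH call from the question (the helper name and the pool size of 20 are illustrative choices, not anything from your code):
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;

public class ParallelFetch {

    // Hypothetical blocking helper: opens the SSH connection, reads one value,
    // closes the connection again (the body would be the code from the question).
    static String fetchValue(String server) {
        return "value-from-" + server;
    }

    public static List<String> fetchAll(List<String> servers) {
        ExecutorService pool = Executors.newFixedThreadPool(20); // bounded pool instead of 200 threads
        try {
            // Start all requests in parallel...
            List<CompletableFuture<String>> futures = servers.stream()
                    .map(s -> CompletableFuture.supplyAsync(() -> fetchValue(s), pool))
                    .collect(Collectors.toList());
            // ...then turn "200 futures" into one list of 200 results.
            return futures.stream()
                    .map(CompletableFuture::join)
                    .collect(Collectors.toList());
        } finally {
            pool.shutdown();
        }
    }
}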
Note: if you were querying a single server for 200 responses, it would not be a good idea to do it this way, because you would flood it with too many requests at once. In that case you should implement some way to get all the data in one request.

Short answer:
Create a message protocol that sends all values in one response.
More Info:
Define a simple response message protocol.
One simple example might be this:
count,value,...
count: contains the number of values returned.
value: one of the values.
Concrete simple example:
5,123,234,345,456,567
You can go bigger and define the response using JSON or XML.
Use whatever seems best for your implementation.
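If you go with the plain comma-separated format, the client-side parsing could look roughly like this (a sketch; parseResponse is just an illustrative name):
// Parses a response of the form "count,value,value,..." (e.g. "5,123,234,345,456,567").
static int[] parseResponse(String response) {
    String[] parts = response.split(",");
    int count = Integer.parseInt(parts[0].trim());
    if (parts.length - 1 != count) {
        throw new IllegalArgumentException(
                "Expected " + count + " values but got " + (parts.length - 1));
    }
    int[] values = new int[count];
    for (int i = 0; i < count; i++) {
        values[i] = Integer.parseInt(parts[i + 1].trim());
    }
    return values;
}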
Edit: My bad, this will not work if you are polling multiple servers. This solution assumes that you are retrieving 200 values from one server, not one value from 200 servers.

At face value, it's hard to tell without looking at your code (I recommend sharing a gist or your code repo).
I assume you are using the ch.ethz.ssh2 library. In general, a single SSH2 operation will make several attempts to authenticate a client. It will iterate over several "methods". If you are using ssh from the command line, you can see these when you use the flag -vv. If one fails, it tries the next. The Java library implementation that I found appears to do the same.
In the loop you posted (assuming you loop 200 times), you'll attempt authentication 200 x (number of authentication methods) times. I suspect the majority of your execution time is burned in SSH handshakes. This can be avoided by authenticating each connection only once and getting as much as you can out of the (already authenticated) open socket.
Consider moving your connection setup outside the loop. If you absolutely must do SSH, and the data you are fetching is too large, parallelism may help some, but that will involve more coordination.
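For illustration, here is a rough sketch of reusing one authenticated connection for several commands with the ch.ethz.ssh2 library: the library allows only one command per Session, but the expensive TCP/SSH handshake and authentication happen once per Connection. The command string is a placeholder.
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import ch.ethz.ssh2.Connection;
import ch.ethz.ssh2.Session;
import ch.ethz.ssh2.StreamGobbler;

// Run one command on an already connected and authenticated Connection.
static String runCommand(Connection conn, String command) throws IOException {
    Session session = conn.openSession();   // cheap compared to a new Connection
    try {
        session.execCommand(command);
        StringBuilder out = new StringBuilder();
        BufferedReader reader = new BufferedReader(
                new InputStreamReader(new StreamGobbler(session.getStdout())));
        String line;
        while ((line = reader.readLine()) != null) {
            out.append(line).append('\n');
        }
        return out.toString();
    } finally {
        session.close();
    }
}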

Related

Streaming large number of small objects with Java

A client application and a server application need to be implemented in Java. The scenario requires reading a large number of small objects from a database on the server side and sending them to the client.
This is not about transferring large files; rather, it requires streaming a large number of small objects to the client.
The number of objects that needs to be sent from server to client in a single request could be one or one million (let's assume the number of clients is limited for the sake of discussion; ignore throttling).
The total size of the objects will in most cases be too big to hold in memory, so a way to defer the read-and-send operation on the server side until the client requests the objects is needed.
Based on my previous experience, the WCF framework in .NET supports the scenario above with
a transferMode of StreamedResponse,
the ability to return an IEnumerable of objects,
and deferred serialization with the help of yield.
Is there a Java framework that can stream objects as they are requested while keeping the connection open with the client?
NOTE: This may sound like a very general question, but I am hoping the specific details given above will lead to a clear answer benefiting me and possibly others.
A standard approach is to use a form of pagination and fetch the results in chunks that can be held temporarily in memory. The specifics depend on the database, but a basic JDBC approach would be to first execute a statement to find out the number of records and then fetch them in chunks. For example, Oracle has a ROWNUM pseudo-column that you can use to manage the ranges of records to return; other databases have similar options.
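A rough JDBC sketch of the chunked approach using Oracle's ROWNUM (the table and column names here are made up for illustration, and sendToClient stands in for whatever serialization you use):
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Fetch and forward rows in fixed-size chunks so only one chunk is in memory at a time.
static void streamInChunks(Connection db, int chunkSize) throws SQLException {
    String sql = "SELECT id, payload FROM ("
               + "  SELECT t.*, ROWNUM rn FROM (SELECT id, payload FROM items ORDER BY id) t"
               + "  WHERE ROWNUM <= ?"
               + ") WHERE rn > ?";
    int offset = 0;
    while (true) {
        int rowsInChunk = 0;
        try (PreparedStatement ps = db.prepareStatement(sql)) {
            ps.setInt(1, offset + chunkSize);
            ps.setInt(2, offset);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    sendToClient(rs.getLong("id"), rs.getString("payload"));
                    rowsInChunk++;
                }
            }
        }
        if (rowsInChunk < chunkSize) {
            break;                 // last (possibly partial) chunk reached
        }
        offset += chunkSize;
    }
}

static void sendToClient(long id, String payload) { /* hypothetical serialization step */ }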
You could use ObjectOutputStream / ObjectInputStream to do this.
The key to making this work would be to periodically call reset() on the output stream. If you don't do that, the sending and receiving ends will build a massive map that contains references to all objects sent / received over the stream.
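On the sending side that might look like this (a sketch; the interval of 1000 objects is an arbitrary choice):
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.io.Serializable;
import java.util.Iterator;

// Write objects one by one and reset() periodically so the stream's
// back-reference table does not grow without bound on either end.
static void sendAll(OutputStream rawOut, Iterator<? extends Serializable> objects) throws IOException {
    ObjectOutputStream out = new ObjectOutputStream(rawOut);
    int sent = 0;
    while (objects.hasNext()) {
        out.writeObject(objects.next());
        if (++sent % 1000 == 0) {
            out.reset();   // tells the receiving side to discard its handle table too
            out.flush();
        }
    }
    out.flush();
}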
However, there may be issues with keeping a single request / response (or database cursor) open for a long time. And resuming a stream that failed could be problematic. So your solution should probably combine the above with some kind of pagination.
The other thing to note is that a scalable solution needs to keep network latency from becoming the bottleneck. It may be worth implementing a receiver thread that eagerly pulls objects from the stream and buffers them in a (bounded) queue.

Storing a list of used tokens in an App Engine servlet

I have a little GAE application, a backend for my Android app.
I have a servlet in the app that pulls data from the datastore and sends it to the user.
I don't want just anyone to be able to use this servlet, so I store a private key in the app, and with every request I send a token (a hash string of the private key and the current milliseconds) together with the milliseconds I used in the hash.
The server takes the milliseconds and the private key and compares the result with the token. If it matches, the server stores the milliseconds in a HashSet so it knows not to accept them again (someone could sniff the device traffic and send the same milliseconds and token over and over again).
At first I held a static field in the servlet class, which later turned out to be a mistake, because this field is not persisted and all the data is lost when the instance gets destroyed.
I've read about Memcache, but it's not an optimal solution because, from what I understand, the data in Memcache can get erased if the app is low on memory, or even if there are server failures.
I don't want to use the datastore because it will make the requests much slower.
I guess I'm not the first who is facing the problem.
How can I solve it?
I used a reverse approach in one of my apps:
Whenever a new client connects, I generate a set of three random "challenges" on the server (like your milliseconds), which I store in memcache with an expiration time of a minute or so. Then I send these challenges to the client. For each request that the client makes, it needs to use one of these 3 challenges (hashed with a private key). The server then deletes the used challenge, creates a new one and sends it to the client. That way, each challenge is single-use and I won't have to worry about replay attacks.
A couple of notes on this approach:
The reason I generate 3 challenges is to allow for multiple requests in flight in parallel.
The longer you make the challenge, the less likely it is that a challenge will be randomly reused (which would then allow a replay attack).
If memcache forgets the challenges I stored, the app's request will fail. In the failure response, I include a "forget all other challenges and use these 3 new ones: ..." command.
You can tie the challenges to the client's IP address or some other sort of session info to make it even less likely that someone can "hack" you.
In general, it's probably always best to have the server generate the challenge or salt for authentication rather than giving that flexibility to the client.
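A rough sketch of the challenge bookkeeping on App Engine (the key prefix, challenge size and expiration below are illustrative choices, not what my app uses verbatim):
import java.security.SecureRandom;
import java.util.ArrayList;
import java.util.Base64;
import java.util.List;
import com.google.appengine.api.memcache.Expiration;
import com.google.appengine.api.memcache.MemcacheService;
import com.google.appengine.api.memcache.MemcacheServiceFactory;

public class ChallengeStore {

    private static final SecureRandom RANDOM = new SecureRandom();
    private final MemcacheService cache = MemcacheServiceFactory.getMemcacheService();

    // Issue a few random single-use challenges for this client, valid for about a minute.
    public List<String> issueChallenges(String clientId, int count) {
        List<String> challenges = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            byte[] bytes = new byte[16];
            RANDOM.nextBytes(bytes);
            String challenge = Base64.getUrlEncoder().withoutPadding().encodeToString(bytes);
            cache.put("challenge:" + clientId + ":" + challenge, Boolean.TRUE,
                      Expiration.byDeltaSeconds(60));
            challenges.add(challenge);
        }
        return challenges;
    }

    // delete() returns true only if the entry existed, so each challenge works exactly once.
    public boolean consume(String clientId, String challenge) {
        return cache.delete("challenge:" + clientId + ":" + challenge);
    }
}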
Another approach you could use if you would like to stick with using a timestamp is to use the first request interchange to determine the time offset between your server instance and your client device. Then, only accept requests with a "current" timestamp. For this, you would need to determine the uncertainty with which you can get the time offset and use that as a cutoff for a timestamp not to be current. To prevent replay-attacks within that cutoff period, you might need to save and disallow the last couple of timestamps used. This, you can probably do inside your instance since AppEngine, AFAIK, routes requests from the same client preferentially to the same instance. Then, if it takes longer to shut down an instance and restart one (i.e. to clear your disallow cache) than your "current"-cutoff is, you shouldn't have too many issues with replay-attacks.
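And a sketch of the timestamp variant (the 30-second cutoff and the cache size of 1000 are arbitrary illustrative values; the offset is whatever you measured during the first exchange):
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Accept a request only if its (offset-adjusted) timestamp is "current", and remember
// recently accepted timestamps so a replay inside the window is rejected.
class TimestampValidator {

    private static final long CUTOFF_MS = 30_000;
    private final long clientOffsetMs;    // measured during the first request exchange

    // Bounded, per-instance cache of recently accepted timestamps.
    private final Map<Long, Boolean> recentlySeen = Collections.synchronizedMap(
            new LinkedHashMap<Long, Boolean>() {
                @Override
                protected boolean removeEldestEntry(Map.Entry<Long, Boolean> eldest) {
                    return size() > 1000;
                }
            });

    TimestampValidator(long clientOffsetMs) {
        this.clientOffsetMs = clientOffsetMs;
    }

    boolean accept(long clientTimestampMs) {
        long adjusted = clientTimestampMs + clientOffsetMs;
        if (Math.abs(System.currentTimeMillis() - adjusted) > CUTOFF_MS) {
            return false;                                                   // not "current"
        }
        return recentlySeen.put(clientTimestampMs, Boolean.TRUE) == null;   // reject replays
    }
}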

Multiplayer-Game Network Protocol

I am responsible for the network part of a multiplayer game.
I hope some of you have experience with this.
My questions are:
Should I create one object which contains all the information (coordinates, stats, chat), or is it better to send a separate object for each of them?
And how can I avoid the object(s) being cached at the client, so that I can update an object and send it again? (I tried ObjectInputStream.reset() but it still received the same data.)
(Sorry for my bad english ;))
Sending all the data every time is not a good solution; sending just a diff against the previous values can be better. Occasionally (e.g. once every 10 or maybe 100 updates) send all the values to re-sync.
1. In the logic layer you can split the objects, and in the transmission layer you send what you want; of course you can also combine them and send them together.
2. You can maintain a version number for each user, and the client also holds that version number. When things change, update the corresponding version on the server and then send the updates to all the clients; the clients then update their version. It should work as a subscribe model.
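For the second point, the update message might be shaped roughly like this (a sketch; the field names are placeholders, and a full snapshot is sent only for the occasional re-sync mentioned above):
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

// One update sent from server to client: only what changed since the last acknowledged version.
class StateDiff implements Serializable {
    long version;                                    // the version this diff brings the client to
    boolean fullSnapshot;                            // true for the occasional full re-sync
    Map<String, Object> changed = new HashMap<>();   // e.g. "x" -> 42.0, "chat" -> "hello"
}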

Trouble with Cassandra and ConsistencyLevel (Redundancy)

So, I have been playing with Cassandra, and have set up a cluster with three nodes. I am trying to figure out how redundancy works with consistency levels. Currently, I am writing data with ConsistencyLevel.ALL and am reading data with ConsistencyLevel.ONE. From what I have been reading, this seems to make sense. I have three Cassandra nodes, and I want to write to all three of them. I only care about reading from one of them, so I will take the first response. To test this, I have written a bunch of data (again, with ConsistencyLevel.ALL). I then kill one of my nodes (not the "seed" or "listen_address" machine).
When I then try to read, I expect, maybe after some delay, to get my data back. Initially, I get a TimeoutException... which I expect. This is what one gets when Cassandra is trying to deal with an unexpected node loss, right? After about 20 seconds, I try again, and now am getting an UnavailableException, which is described as "Not all the replicas required could be created and/or read".
Well, I don't care about all the replicas... just one (as in ConsistencyLevel.ONE on my get statement), right?
Am I missing the ConsistencyLevel point here? How can I configure this to still get my information if a node dies?
Thanks
It sounds like you have Replication Factor (RF) set to 1, meaning only one node holds any given row. Thus, when you take a node down, no matter what consistency level you use, you won't be able to read or write 1/3 of your data. Your expectations match what should happen with RF = 3.

Designing a process

I challenge you :)
I have a process that someone already implemented. I will try to describe the requirements, and I was hoping I could get some input on the "best way" to do this.
It's for a financial institution.
I have a routing framework that allows me to receive files and send requests to other systems. I have a database I can use as I wish, but only I and my software have access to it.
The facts
Via the routing framework I receive a file.
Each line in this file follows a fixed length format with the identification of a person and an amount (+ lots of other stuff).
99% of the time this file is below 100 MB (around 800 bytes per line, i.e. 2.2 MB ≈ 2,600 lines).
Once a year we have 1-3 GB of data instead.
Running on an "appserver"
I can fork subprocesses as I like. (within reason)
I cannot ensure consistency when running for more than two days: subprocesses may die, the connection to the DB/framework might be lost, files might move.
I can NOT send reliable messages via the framework. The call is synchronous, so I must wait for the answer.
It's possible/likely that sending these getPerson requests will crash my "process" when sending lots of them.
We're using java.
Requirements
I must return a file with all the data, and I must add some more info for some lines (about 25-50% of the lines: 25,000 at least).
This info I can only get by doing a getPerson request via the framework to another system, one per person. Each takes between 200 and 400 ms.
It must be able to complete within two days
Nice to have
Checkpointing. If I'm going to run for a long time, I sure would like to be able to restart the process without starting from the top.
...
How would you design this?
I will later add the current "hack" and my brief idea
========== Current solution ================
It's running on BEA/Oracle Weblogic Integration, not by choice but by definition
When the file is received, each line is read into a database table with
id, line, status and batchfilename, with status set to 'Needs processing'.
When all lines are in the database, the rows are split by mod 4 and a process is started per quarter of the rows; each line that needs it is enriched by the getPerson call and its status is set to 'Processed' (38.0000 in the current batch).
When all 4 quarters of the rows have been processed, a writer process starts, selecting 100 rows at a time from the database, writing them to a file and updating their status to 'Written'.
When all is done, the new file is handed back to the routing framework, and an "I'm done" email is sent to the operations crew.
The 4 processing processes can/will fail, so it's possible to restart them with an HTTP GET to a servlet on WLI.
Simplify as much as possible.
The batches (trying to process them as units, and their various sizes) appear to be discardable in terms of the simplest process. It sounds like the rows are atomic, not the batches.
Feed all the lines as separate atomic transactions through an asynchronous FIFO message queue, with a good mechanism for detecting (and appropriately logging and routing) failures. Then you can deal with the problems strictly on an exception basis. (A queue table in your database can probably work.)
Maintain batch identity only with a column in the message record, and summarize batches by that means however you need, whenever you need.
When you receive the file, parse it and put the information in the database.
Make one table with a record per line that will need a getPerson request.
Have one or more threads get records from this table, perform the request and put the completed record back in the table.
Once all records are processed, generate the complete file and return it.
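A sketch of that worker idea; LineDao, LineRecord and getPerson below stand in for your real table access and the framework call, and the batch size of 100 is arbitrary:
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

interface LineRecord {
    String getPersonId();
    void setExtraInfo(String info);
}

interface LineDao {
    List<LineRecord> claimBatch(int size);   // atomically mark up to 'size' unprocessed rows as claimed
    void markProcessed(LineRecord line);     // persist the enriched line; acts as a per-line checkpoint
}

class Enricher {
    static String getPerson(String personId) {
        return "info-for-" + personId;       // placeholder for the 200-400 ms framework call
    }

    static void enrichAll(LineDao dao, int workers) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (int i = 0; i < workers; i++) {
            pool.submit(() -> {
                List<LineRecord> batch;
                // Each worker keeps claiming small batches until the table is drained.
                while (!(batch = dao.claimBatch(100)).isEmpty()) {
                    for (LineRecord line : batch) {
                        line.setExtraInfo(getPerson(line.getPersonId()));
                        dao.markProcessed(line);
                    }
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(2, TimeUnit.DAYS);   // the batch must complete within two days anyway
    }
}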
If the processing of the file takes 2 days, then I would start by implementing some sort of resume feature. Split the large file into smaller ones and process them one by one. If for some reason the whole processing is interrupted, then you will not have to start all over again.
By splitting the larger file into smaller files, you could also use more servers to process them.
You could also use a bulk loader (Oracle's SQL*Loader, for example) to get the large amount of data from the file into the table, again adding a column to mark whether a line has been processed, so you can pick up where you left off if the process should crash.
The return value could be many small files which would be combined into a single large file at the end. If the database approach is chosen, you could also save the results in a table which could then be extracted to a CSV file.
