Large File Transfer between Java Servers over HTTPS

I have a central server, to which many distributed servers need to transmit data in the form of somewhat large files, 500MB - 10GB+. The servers are not on the same physical network and can't be connected to one another via a VPN. While we're trying to get other ports opened, currently we can only talk over 443, HTTPS, which is great for our REST services but terrible for file transfer between servers.
I know this isn't as specific a question as one would like for Stack Overflow, but I would like to know: what methods might work better than the ones I've tried?
Server A -> generate file -> transfer over https -> DMZ -> proxy pass -> receive at Server B
Both servers use Java 1.8, Tomcat, and Spring 4.1.4.RELEASE. The DMZ is just Apache and pretty much out of our control.
Things I've tried...
Make RPC calls to a service using Spring's HttpInvokerProxyFactoryBean (this works fine for smaller sites, but the larger sites often drop connections while transferring data)
Multipart form post using Apache HttpPost (this also works, but we have to configure file limits in apache/tomcat, plus its connection is unreliable as well)
Using a library called RMIIO, which basically simulates RMI over HTTP if configured properly. This seemed promising, as it requests a stream from the server and writes to the stream from the remote server. I haven't gotten this to work over HTTPS yet. The library was written in 2007 (with some updates up through July 2016), but it feels dated and not well maintained, and I suspect there are better ways to do this sort of thing nowadays (not that I can find them).
Looked at gRPC but realized it's just a binary protocol and I'd have to basically handle chunking the files if I wanted to get a streaming effect.
Read an article about developing non-blocking REST services with Spring MVC: http://callistaenterprise.se/blogg/teknik/2014/04/22/c10k-developing-non-blocking-rest-services-with-spring-mvc/. It again looked interesting if we were receiving a lot of files at the same time, but I don't see how it helps with a single file transfer.
I've looked at a lot of other things and tried a few more, but it all seems wrong. When I read about big data and Spark streams or any of the million streaming options that I see, I feel like there should be something similar for transferring a single file from one server to another without a broker in the middle. Maybe there are, just not over HTTPS.
It would be nice to know the progress of the transfer (on both ends) and be able to recover should there be connectivity issues or transfer errors.
But any direction or thoughts would be immensely helpful. Thanks for your time and input.

Related

Best approach for large file transfer to multiple clients using java sockets

I have researched a lot but could not find anything suitable on the topic, hence I am asking the question here.
I want to build an application like Dropbox using Java sockets, combined with a social media website, where the files we upload to our shared folder get automatically downloaded or synced onto the systems of all the friends we added on the social media website.
So far, my plan is to have a server running, and every time a client connects (logs in) I will start two handlers: one for uploads and one for downloads. The DOWNLOAD handler will check every 5 minutes for new files from all my friends (meaning they added new files to their shared directory) and sync them. The UPLOAD handler will upload files to the server, sending each as a byte array when the handler receives it from the client. The client uses a directory watcher to track changes in the directory and sends the data to the server.
Now the question is: is starting 2 threads per client feasible? I think it will slow down the server badly; with, say, 100 clients that means 200 threads. Can you point me in the right direction as to what approach I should take? I read about NIO and IO and got confused. Also, is there any particular library which could be helpful? I looked at Netty and Apache MINA but don't understand how they can be helpful.
Thanks in advance :)

I recommend you take a look at this article about NIO: https://today.java.net/pub/a/today/2007/02/13/architecture-of-highly-scalable-nio-server.html. Also think about scalability: if your server sends files, what's the speed of your hard drives? I think that matters more than the number of threads, but keep an eye on thread locking.
Why do you want to implement something that the web already does so well? If I were you, I would think about a secure proxy rather than dealing with raw bytes. Even if you want to transfer files in multiple parts, you can use a multipart zip file, download each part programmatically, and then rebuild the file. With this approach you can reuse your infrastructure for web and client, and you also benefit from the high I/O throughput of modern web servers.
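The split-and-rebuild idea can be sketched in plain Java. This is only a minimal illustration, not a full transfer protocol: the class name FileParts, its method names and the ".partN" naming scheme are made up for this example, and it splits raw bytes rather than producing a real multipart zip.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits a file into fixed-size parts and joins them again.
final class FileParts {

    // Writes source into files named <source>.part0, <source>.part1, ... inside dir.
    static List<Path> split(Path source, Path dir, int partSize) throws IOException {
        if (partSize <= 0) {
            throw new IllegalArgumentException("partSize must be positive");
        }
        List<Path> parts = new ArrayList<>();
        byte[] buffer = new byte[partSize];
        try (InputStream in = Files.newInputStream(source)) {
            int filled;
            while ((filled = readFully(in, buffer)) > 0) {
                Path part = dir.resolve(source.getFileName() + ".part" + parts.size());
                try (OutputStream out = Files.newOutputStream(part)) {
                    out.write(buffer, 0, filled);
                }
                parts.add(part);
            }
        }
        return parts;
    }

    // Concatenates the parts, in list order, into target (overwriting it).
    static void join(List<Path> parts, Path target) throws IOException {
        try (OutputStream out = Files.newOutputStream(target)) {
            for (Path part : parts) {
                Files.copy(part, out);
            }
        }
    }

    // Reads until the buffer is full or the stream ends; returns the bytes read.
    private static int readFully(InputStream in, byte[] buffer) throws IOException {
        int total = 0;
        while (total < buffer.length) {
            int n = in.read(buffer, total, buffer.length - total);
            if (n < 0) {
                break;
            }
            total += n;
        }
        return total;
    }
}
```

Each part could then be served as an ordinary static file by the web server, and a client that loses its connection only needs to re-download the part that failed.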

If you expect a large number of clients, the standard Socket and ServerSocket classes aren't going to work well. These require 2 threads per client, as you already pointed out, and eventually this will eat up all your server resources. What you need is the java.nio package. In there you will find SocketChannel and ServerSocketChannel, which let you set up non-blocking socket communication. This type of communication is event based, meaning you can have multiple clients sharing the same 2 threads on the server for reading and writing.
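To make the event-based model concrete, here is a minimal sketch of a non-blocking echo server. The class name NioEchoServer is invented for this example, and it goes one step further than the answer by handling both reads and writes on a single Selector thread. It is a simplification: a real server would register OP_WRITE instead of looping when a client's socket buffer is full.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.ClosedSelectorException;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.Iterator;

// Hypothetical echo server: one thread serves every connected client.
final class NioEchoServer implements Runnable, AutoCloseable {
    private final Selector selector;
    private final ServerSocketChannel server;

    NioEchoServer(int port) throws IOException {
        selector = Selector.open();
        server = ServerSocketChannel.open();
        // Port 0 asks the OS for any free port; loopback only, for the sketch.
        server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), port));
        server.configureBlocking(false);
        server.register(selector, SelectionKey.OP_ACCEPT);
    }

    int port() {
        return server.socket().getLocalPort();
    }

    @Override
    public void run() {
        try {
            while (selector.isOpen()) {
                selector.select(); // blocks until some channel is ready
                if (!selector.isOpen()) {
                    break;
                }
                Iterator<SelectionKey> keys = selector.selectedKeys().iterator();
                while (keys.hasNext()) {
                    SelectionKey key = keys.next();
                    keys.remove();
                    if (!key.isValid()) {
                        continue;
                    }
                    if (key.isAcceptable()) {
                        accept();
                    } else if (key.isReadable()) {
                        echo(key);
                    }
                }
            }
        } catch (IOException | ClosedSelectorException e) {
            // Selector was closed or the listening socket failed: stop serving.
        }
    }

    private void accept() throws IOException {
        SocketChannel client = server.accept();
        if (client == null) {
            return;
        }
        client.configureBlocking(false);
        // Each client gets its own buffer, stored as the key attachment.
        client.register(selector, SelectionKey.OP_READ, ByteBuffer.allocate(4096));
    }

    private void echo(SelectionKey key) {
        SocketChannel client = (SocketChannel) key.channel();
        ByteBuffer buffer = (ByteBuffer) key.attachment();
        try {
            if (client.read(buffer) < 0) { // client closed the connection
                client.close();
                return;
            }
            buffer.flip();
            while (buffer.hasRemaining()) {
                client.write(buffer); // simplification: spins if the socket buffer is full
            }
            buffer.clear();
        } catch (IOException e) {
            key.cancel();
            try {
                client.close();
            } catch (IOException ignored) {
                // nothing more to do for this client
            }
        }
    }

    @Override
    public void close() throws IOException {
        selector.close(); // also wakes up a blocked select()
        server.close();
    }
}
```

Running it amounts to `new Thread(new NioEchoServer(0)).start()`; adding more clients adds selection keys, not threads.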
If you want to know more, check out my Socket programming tutorial. The third part goes into setting up a non-blocking variant and should give you everything you need to get started. If you still have questions, let me know, and I'll see if I can help you out further.

Understanding of back end file seeding to provide fast client downloads

The theme of my project is to implement a distributed server which provides several clients several files to download. The server is hosting several files and we want that the server should implement some best algorithms to quickly let the clients download data from it.
My idea of implementation of project:
Just as a client generally downloads a file using a download manager, there must be server-side managers/code/algorithms that upload/seed the file quickly so the client can download it. The client must not have to do anything except select the file to be downloaded!
How should I write the code for such a server on the back end, analogous to the multi-threaded download managers used by clients on the front end?
How should the server seed/make the file available to the client if the client only sends the path as a String to the server in Java for downloading?
Or, if I am missing something/my idea is totally wrong, please enlighten me with an alternative process/algorithm which I must implement on the server side. Please remember that the whole purpose of asking this question is the back end server seeding algorithm OR equivalent algorithms/methods.

I assume this server of yours has a good internet connection with plenty of upstream bandwidth. If that is the case, then the limiting factor when only a few clients are downloading a few files is the bandwidth of those clients. So you will at most get as fast as the downstream bandwidth of your clients, and simply taking an off-the-shelf HTTP server library to serve the downloads should be sufficient.
Where your backend implementation really matters and is able to improve download performance is when many users are connecting to your server and downloading many files. First off, there are the following points to consider:
TCP has a startup time. When you first open a connection, the download rate slowly increases until it hits the maximum. To minimize this time when downloading multiple files, the connection opened for one file download should be reused for the next file.
Downloading many files at once (on the client side) is not reasonable when bandwidth is the limiting factor, because the client has to start up many TCP connections and the data will either be fragmented when written to disk or (when allocating beforehand) keep the disk busy jumping between sectors.
Your server should generally use a non-blocking IO library (e.g. java.nio) and refrain from creating a thread per incoming connection, since this leads to thrashing, which in turn decreases your server's performance drastically.
If you have a really big amount of clients simultaneously downloading from your server, the limit you will probably hit will be either:
The upstream limit of your provider
The read speed of your hard drive (SSDs have ~500MB/s as far as I'm informed)
Your server can try to hold the most commonly requested files in its memory and serve the content from there (DDR3 RAM reaches speeds of 17GB/s). I doubt that you have so few files on your server that you could cache them all in your server's RAM.
So the main engineering task lies in the clever selection of which content should be cached and which should not. This could be done on a priority basis by assigning higher priorities to certain files, or by a metric which encodes the probability of a single file being downloaded in the next few minutes. Or simply by caching the files which are being downloaded by the most clients at this point in time.
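As one simple instance of such a selection policy, here is a sketch of a least-recently-used (LRU) cache with a byte budget. LRU is swapped in here as a stand-in for the priority and probability metrics described above, and the FileCache class and its methods are hypothetical names chosen for this example.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical in-memory file cache: keeps recently requested files and
// evicts the least recently used ones once a byte budget is exceeded.
final class FileCache {
    private final long maxBytes;
    private long currentBytes;
    // accessOrder = true: iteration runs from least to most recently used.
    private final LinkedHashMap<String, byte[]> entries =
            new LinkedHashMap<>(16, 0.75f, true);

    FileCache(long maxBytes) {
        this.maxBytes = maxBytes;
    }

    // Returns the cached content, or null on a miss; a hit marks the entry as recently used.
    synchronized byte[] get(String path) {
        return entries.get(path);
    }

    synchronized boolean contains(String path) {
        return entries.containsKey(path);
    }

    synchronized void put(String path, byte[] content) {
        byte[] old = entries.put(path, content);
        if (old != null) {
            currentBytes -= old.length;
        }
        currentBytes += content.length;
        // Evict from the least recently used end until we are within budget.
        Iterator<Map.Entry<String, byte[]>> it = entries.entrySet().iterator();
        while (currentBytes > maxBytes && it.hasNext()) {
            Map.Entry<String, byte[]> eldest = it.next();
            currentBytes -= eldest.getValue().length;
            it.remove();
        }
    }
}
```

A download handler would call get first and fall back to reading from disk (followed by put) on a miss. A file larger than the whole budget is evicted again immediately, so it is effectively never cached.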
With such considerations you are able to push the limits of your download server until a certain point from which the only improvement can be achieved by distributing or replicating your files onto many servers.
If you are heading in a direction where serving millions of clients simultaneously must be possible, you should consider buying such a service from a CDN. CDNs specialize in fast delivery and have many upstream servers in most ASes, so that every client can download files from a regional CDN server.
I know I haven't given any algorithm or code examples, but I didn't intend to answer this question completely. I just wanted to give you some important guidelines and thoughts on the topic. I hope you can at least use some of these thoughts for your project.

How can I handle caching a log4j HTTP appender in Java?

So I am writing an application that will accept log entries from many other applications. We mostly use log4j.
Since these applications are on different machines, I wanted to have a web service that accepts POSTed data from each application, which we could then search, etc.
I realize there are services like Loggly that handle this, but I want to write my own (mainly for our security purposes and because the company doesn't like log information on 3rd party providers).
Anyway, I successfully got my own custom HttpAppender to work, so that each application sends its messages to a web service.
But before I break out the champagne, I realize that a direct post over HTTP could be a bad thing because some of these apps generate MILLIONS of rows in the logs. So the last thing I want is my HttpAppender bringing down some T1 line or something.
So my idea was to buffer the HTTP POSTs somehow and then periodically send those buffers in a single post: fewer large posts instead of many smaller posts.
Of course, buffering in something like Redis/memcached locally on the same machine would help, but I have to assume that I can't use external caching (on the same server). So I would have to cache in the appender's memory/process.
Am I on the right track with buffering these HTTP posts? Or should I write the buffers to log files and then periodically post those log files?
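The in-process buffering idea could look roughly like the sketch below. It is not log4j API: LogBatcher and its methods are hypothetical names, and the sender callback stands in for whatever code performs the actual HTTP POST. A real appender would call append from its own logging callback.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical in-process buffer: collects log lines and hands them to a
// sender in batches, so many small posts become fewer large ones.
final class LogBatcher {
    private final int maxBatchSize;
    private final Consumer<List<String>> sender; // e.g. one HTTP POST per batch
    private List<String> pending = new ArrayList<>();

    LogBatcher(int maxBatchSize, Consumer<List<String>> sender) {
        if (maxBatchSize <= 0) {
            throw new IllegalArgumentException("maxBatchSize must be positive");
        }
        this.maxBatchSize = maxBatchSize;
        this.sender = sender;
    }

    // Buffers one line; sends a batch as soon as the buffer is full.
    void append(String line) {
        List<String> full = null;
        synchronized (this) {
            pending.add(line);
            if (pending.size() >= maxBatchSize) {
                full = drain();
            }
        }
        if (full != null) {
            sender.accept(full); // send outside the lock so loggers are not blocked
        }
    }

    // Sends whatever is buffered; intended to be called on a timer and at shutdown.
    void flush() {
        List<String> batch;
        synchronized (this) {
            batch = drain();
        }
        if (!batch.isEmpty()) {
            sender.accept(batch);
        }
    }

    // Caller must hold the lock: swaps in a fresh list and returns the old one.
    private List<String> drain() {
        List<String> batch = pending;
        pending = new ArrayList<>();
        return batch;
    }
}
```

Calling flush from a ScheduledExecutorService every few seconds would give the "periodically send" behaviour. Note that anything still in the buffer is lost if the process dies, which is the main argument for the write-to-file alternative.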

Android: Transfer file over TCP Java Socket

I am currently trying to transfer a file from an Android device to a Java TCP server, but I am unable to find a good example which explains the structure I would need to implement this. There are many Java client and server examples out there which explain file transfer, but I want to make sure this will still work once one throws an Android device in there.
My question is: how do I implement this sort of structure? And if it doesn't work, would I be better off sending the file over an HTTP connection to a PHP server? I see a lot of examples and documentation online for the latter method, so I presume it is more reliable. I would, however, prefer to use a Java server.
The file consists of a large set of coordinates recorded by the Android device which will then be sent to the server. I have not yet established how I will store this data yet but I was originally going to store them in a primitive text file.

Design
The first thing you need is something to allow you to run Java code on your server.
There are a number of options. Two of the most popular technologies are Glassfish and Apache Tomcat.
Crudely speaking Apache Tomcat is sufficient for simple client-server communication and Glassfish is used if you need to do more complex stuff. Both allow Servlets (which are essentially self contained server classes written in Java) to run on the server-side.
They handle communication with the client from inside a single long-running JVM (Java Virtual Machine), dispatching each request they receive to a worker thread. The Java servlet runs inside that JVM and can do some processing if required before sending a response back to the client. Typically one servlet instance is shared by all requests, so per-request state should live in local variables rather than instance fields. Because the container manages the threads, dealing with multiple concurrent requests is simpler (no need to write the threading code yourself).
Networking (sending data to and from the server)
In networking situations the client can be a PC, an Android phone, or any other device capable of connecting to the internet. As far as the server is concerned, if the client can communicate using HTTP (a standard protocol which it understands) then it doesn't care what sort of device it is. This means that solutions for PC desktop client-server applications are similar to those for a phone.
You can use a library such as Apache HttpComponents to make it easier to handle HTTP requests and responses between the device and the server. Of course you could write your own classes to do this using sockets, but this would be very time consuming, particularly if you have never done it before.
Storage of Data
If you have time I would recommend implementing some sort of database to store the information.
Databases have a number of benefits, such as data recovery mechanisms, indexing for fast searching, data integrity guarantees, better structuring of data and so on.
If you decide to use a database, I recommend MySQL. It is free and, more importantly, well documented.
Aside: JDBC can be used to communicate with the database with Java.
Sorry about the in-line hyperlinks - apparently my reputation isn't high enough to post more than two!
Source: Personal experience from implementing a similar design.

Fat Java client need two-way communication channel to web server over http/https

I have a situation where I want a Java client to have a two-way data channel with a servlet (I have control over both), so that either can begin data transferring without having to wait for the other to do something first, but to get through the firewalls this needs to be tunnelled in http or https.
I have looked around, but I do not believe I know the right terms for asking Google.
I was originally looking at http-tunneling modules, but realizing that I have a web container at the other end, I believe the appropriate way is to think of it as a fat client needing to communicate home. I was thinking that the persistent connection in HTTP 1.1 might be very useful here. I can easily do heartbeat transfers to keep the connection from idling.
At this point in time I just need to do a proof of concept so I primarily need something that works now, which can then be optimized or even replaced later.
So, I'd appreciate pointers to projects that allow me to have a connection where either side can at will push information (like a serialized object or a descriptive stream of bytes) to the other side. I'd prefer pure Java, if at all possible.
EDIT: Thanks for the pointers. It appears that what I need, will be available in the servlet 3.0 specification, which I might end up using in the long term depending on when it will be supported in the various web containers.
For now I am investigating the Cometd package, which appears to be able to do exactly what I need for my prototype.

Search terms: comet, long-polling
These are mostly used in an AJAX context, but I see no reason why you could not use them in a Java project.

Please take a look at Eclipse Net4J,
http://wiki.eclipse.org/Net4j
It supports all the features you mentioned. A special nice feature is that it supports HTTP connection pooling so you can have lots of channels between client and server but use only a few HTTP connections.
The only problem is that it doesn't have documentation at all. You just have to read the source code. Once you figure it out, it's very easy to use.
There are a few more diagrams on old Net4J site,
http://net4j.berlios.de/

How fast does it need to be? You could always just do polling on the client. Just check for new messages every so often.

You can use the Hessian protocol over HTTP. It's a fast binary protocol for serializing data. Typically used for a web-services style RPC communication, but there's no reason it couldn't be 2-way - see Hessian mux. It's pure Java, too :-)

Generally this is done by having the server not respond to an HTTP request immediately. It waits for some update (or a timeout) before sending a response. Obviously some care needs to be taken to ensure that the server can handle this under load.
See, for instance, Comet.
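The "wait for an update or a timeout" step can be sketched with a blocking queue. This is a simplified illustration, not Comet or Cometd API: LongPollMailbox is a hypothetical class, and in a servlet the awaitNext call would sit inside the request handler. It holds one thread per waiting client, which is exactly the load concern mentioned above.

```java
import java.util.Optional;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical per-client mailbox used to hold a long-poll request open.
final class LongPollMailbox {
    private final BlockingQueue<String> messages = new LinkedBlockingQueue<>();

    // Called by server code whenever it has something to push to the client.
    void publish(String message) {
        messages.add(message);
    }

    // Called from the request handler: waits up to timeoutMillis for a message.
    // An empty result means "nothing happened, the client should poll again".
    Optional<String> awaitNext(long timeoutMillis) {
        try {
            return Optional.ofNullable(messages.poll(timeoutMillis, TimeUnit.MILLISECONDS));
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return Optional.empty();
        }
    }
}
```

The client simply issues the next request as soon as the previous one returns, so the server always has an open request it can answer when it wants to push something.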
