XML processing in a client-server application in Java using sockets

I am writing a client-server application in Java for XML processing.
I have successfully implemented a multi-threaded server that can handle multiple clients. The XML file has a number of transaction tags that act like commands for the server, instructing it to do something (they can have a number of values, including binary data). So a single client can send multiple commands. These commands are also processed in separate threads that I create using Executors. I have successfully transmitted an XML file and executed its commands.
I am stuck while trying to incorporate additional functionality: I want the transfer to resume from the point where it stopped, say due to a network disconnection. For this I am splitting the file into bundles, transmitting these bundles, and keeping track of the last bundle successfully transmitted (through an acknowledgement), so that transmission can resume from the last bundle transmitted. Obviously there are other fields, like the total number of bundles to be transmitted, and so on.
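A minimal sketch of one way such bundle framing could look over DataOutputStream/DataInputStream; the field layout (sequence number, total count, payload length) and the class name are illustrative, not the actual design:

import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class BundleSender {
    // Frame each bundle as (seqNo, totalBundles, length, bytes) and wait for
    // the receiver to echo the sequence number back as an acknowledgement.
    public static void sendBundle(DataOutputStream out, DataInputStream in,
            int seqNo, int totalBundles, byte[] payload) throws IOException {
        out.writeInt(seqNo);
        out.writeInt(totalBundles);
        out.writeInt(payload.length);
        out.write(payload);
        out.flush();
        int acked = in.readInt(); // receiver echoes seqNo on success
        if (acked != seqNo) {
            throw new IOException("expected ack " + seqNo + ", got " + acked);
        }
        // Persist `acked` somewhere durable so a reconnect can resume at seqNo + 1.
    }
}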
Now suppose some part of the XML file is transmitted and received at the server when the network is disconnected. This part may have some transaction tags or commands.
I want them to be processed by the server without waiting for the complete file to be received. In fact the correct solution would be to read the XML file being received and keep calling the command handler as soon as a complete transaction tag is received, without waiting for the complete XML file to be received.
One naive way of doing it would be to write my XML file line by line, that is, with a line separator after each line, and then read the socket line by line to determine the location of the transaction tags. I would be really grateful if anybody could suggest a better solution.

You don't need to write your XML line by line. You can use a SAX parser.
The parser will read the input and dispatch events for each tag it reads. The parser is responsible for buffering the input and notifying your code when a tag starts or ends.
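A minimal sketch of this approach; the element name transaction and the dispatchCommand hook are assumptions for illustration. SAX fires endElement as soon as each closing tag arrives, so each command can be handed off before the rest of the document has been received:

import java.io.InputStream;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class TransactionStreamHandler extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();

    @Override
    public void startElement(String uri, String local, String qName, Attributes attrs) {
        if ("transaction".equals(qName)) text.setLength(0); // new command begins
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        if ("transaction".equals(qName)) {
            dispatchCommand(text.toString()); // complete tag: hand off immediately
        }
    }

    private void dispatchCommand(String payload) {
        // submit to the command-handling Executor here
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for socket.getInputStream(); SAX buffers the stream itself.
        InputStream socketIn = System.in;
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(socketIn, new TransactionStreamHandler());
    }
}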

Related

Reading a huge file and writing to an RDBMS

I have a huge text file which is continuously appended to at a common location. I need to read it line by line from my Java application and update a SQL RDBMS, such that if the Java application crashes, it starts from where it left off and not from the beginning.
It's a plain text file. Each row contains:
<Datatimestamp> <service name> <paymentType> <success/failure> <session ID>
Also, the data retrieved from the database should be available in real time, without any performance or availability issues in the web application.
Here is my approach:
Deploy the application on two boxes, each with a heartbeat that pings the other system for service availability.
When you get a successful heartbeat response, you also get the timestamp of the last line successfully read.
When the next heartbeat response fails, the application on the other system can take over, based on:
1. the failed response
2. the last successful timestamp
Also, since the need for data retrieval is very real-time and the data is huge, can I crawl the database and put the data into Solr or Elasticsearch for faster retrieval, instead of making database calls?
There are various ways to do this; what is the best way?
I would put a messaging system between the text file and the DB-writing applications (for example RabbitMQ). In this case, the messaging system functions as a queue: one application constantly reads the file and inserts the rows as messages to the broker, and on the other side, multiple "DB writing applications" can read from the queue and write to the DB.
The advantage of the messaging system is its support for multiple clients reading from the queue. The messaging system takes care of synchronizing between the clients, dealing with errors, dead letters, etc. The clients don't care about what payload was processed by other instances.
Regarding maintaining multiple instances of "DB writing applications": I would go for ready-made cluster solutions, perhaps a Docker cluster managed by Kubernetes.
Another viable alternative is a streaming platform, like Apache Kafka.
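A minimal sketch of both sides using the com.rabbitmq:amqp-client library; the queue name, broker host, and writeToDatabase hook are illustrative:

import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;
import com.rabbitmq.client.MessageProperties;
import java.nio.charset.StandardCharsets;

public class LogLineQueue {
    private static final String QUEUE = "log-lines"; // illustrative name

    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost"); // assumed broker location
        Connection conn = factory.newConnection();
        Channel ch = conn.createChannel();
        ch.queueDeclare(QUEUE, true, false, false, null); // durable queue

        // File-reader side: publish each line as a persistent message.
        String line = "<timestamp> <service> <paymentType> <status> <sessionId>";
        ch.basicPublish("", QUEUE, MessageProperties.PERSISTENT_TEXT_PLAIN,
                line.getBytes(StandardCharsets.UTF_8));

        // DB-writer side: ack only after the row is safely in the database,
        // so a crashed writer leaves the message for another instance.
        ch.basicConsume(QUEUE, false, (tag, delivery) -> {
            String row = new String(delivery.getBody(), StandardCharsets.UTF_8);
            writeToDatabase(row); // placeholder for the JDBC insert
            ch.basicAck(delivery.getEnvelope().getDeliveryTag(), false);
        }, tag -> { });
    }

    static void writeToDatabase(String row) { /* JDBC insert goes here */ }
}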
You can use software like Filebeat to read the file and direct the Filebeat output to RabbitMQ or Kafka. From there a Java program can subscribe to / consume the data and put it into an RDBMS.

Need help separating out a file-processing server

I have developed a Document Management System (DMS) with an OCR feature. However, it takes too much time to process and causes high CPU usage.
My current process is synchronous, as below:
User uploads his file
OCR process
Store document information in DB
Considering the real-time production load, I want to make the second step above asynchronous, on a dedicated, separate file-processing server.
My questions are:
Is it the right way to do it?
How do I send/retrieve the file to another server for processing? I also found out about using a message queue, but I cannot put the whole file in it.
Is there any way we can acknowledge process completion?
Just to close this question: I have successfully separated the OCR process onto a separate file-processing server using a FIFO method, which really helped resolve the high CPU usage.
I followed the steps below:
User uploads file
OCR status set to pending
A separate server processes the pending files one at a time, in FIFO order (see the sketch after this list).
Update the OCR process status in the database.
Processing servers can be added later, as the need and server load dictate.
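A minimal sketch of such a FIFO worker; the documents table, its columns, the connection string, and the runOcr hook are all illustrative, not from the original setup:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class OcrWorker {
    public static void main(String[] args) throws Exception {
        try (Connection db = DriverManager.getConnection(
                "jdbc:postgresql://localhost/dms", "dms", "secret")) {
            while (true) {
                // Oldest pending document first: FIFO order, one at a time.
                try (PreparedStatement ps = db.prepareStatement(
                        "SELECT id, path FROM documents "
                        + "WHERE ocr_status = 'PENDING' ORDER BY id LIMIT 1");
                     ResultSet rs = ps.executeQuery()) {
                    if (!rs.next()) {
                        Thread.sleep(5000); // nothing pending; poll again later
                        continue;
                    }
                    long id = rs.getLong("id");
                    runOcr(rs.getString("path")); // hand off to the OCR engine
                    try (PreparedStatement upd = db.prepareStatement(
                            "UPDATE documents SET ocr_status = 'DONE' WHERE id = ?")) {
                        upd.setLong(1, id);
                        upd.executeUpdate();
                    }
                }
            }
        }
    }

    private static void runOcr(String path) { /* OCR processing goes here */ }
}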

Reading the end of a huge and dynamic file via SFTP from a server

I am trying to find a way to read just the end of a huge, constantly growing log file (say the last 20-30 lines) via SFTP from a server, to save the point up to which I have read, and, if I need more lines, to read further up from that point.
Everything I've tried takes too long. I've tried copying the file to my machine and then reading from the end using ReversedLinesFileReader, but that method needs a File object, while via SFTP you only get an InputStream, and downloading the whole file takes a lot of time.
I also tried counting the lines and reading from the nth line, but that also takes too long and throws an exception, because the file is sometimes modified in the meantime. Another way I tried was to connect via SSH and use tail -100, which gave the desired result, but only once: the next time I would get the new logs as well, whereas I need to move further up. Is there a fast way to get the end of the file, save the point, and read further up from that point later? Any ideas?
You don't say what SFTP library you're using, but the most widely used Java SSH/SFTP library is JSch, so I'll assume you're using that.
The SFTP protocol has operations to perform random-access I/O on remote files. Unfortunately, the JSch SFTP client doesn't expose the full range of operations. However, it does have versions of the get operation (for getting a file from the remote server) which permit skipping over the first part of the remote file. You can use one of these operations to read for example the last 10 KB of a file.
Several of the JSch get operations return an InputStream. You can read the contents of the remote file from the input stream. If you want to access the remote file line by line, you can convert it to a Reader using InputStreamReader.
So, a process might do the following:
Call stat() on the remote file to get its size.
Figure out where in the file you want to start reading from. You could keep track of where you stopped reading last time, or you could guess based on the amount of data you're willing to download and the expected size in bytes of these last 20-30 lines.
Call get() to start reading it.
Process data read from the InputStream returned by the get() call.
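A minimal sketch of those four steps with JSch; the host, credentials, remote path, and the 10 KB window are illustrative:

import com.jcraft.jsch.ChannelSftp;
import com.jcraft.jsch.JSch;
import com.jcraft.jsch.Session;
import com.jcraft.jsch.SftpProgressMonitor;
import java.io.BufferedReader;
import java.io.InputStreamReader;

public class RemoteTail {
    public static void main(String[] args) throws Exception {
        JSch jsch = new JSch();
        Session session = jsch.getSession("user", "host", 22);
        session.setPassword("password");
        session.setConfig("StrictHostKeyChecking", "no"); // sketch only
        session.connect();
        ChannelSftp sftp = (ChannelSftp) session.openChannel("sftp");
        sftp.connect();

        String path = "/var/log/app.log";          // illustrative path
        long size = sftp.stat(path).getSize();     // step 1: remote file size
        long skip = Math.max(0, size - 10 * 1024); // step 2: last ~10 KB

        // Steps 3 and 4: get() with a skip offset returns an InputStream
        // positioned near the end of the remote file.
        try (BufferedReader in = new BufferedReader(new InputStreamReader(
                sftp.get(path, (SftpProgressMonitor) null, skip)))) {
            in.readLine(); // discard the first, likely partial, line
            for (String line; (line = in.readLine()) != null; ) {
                System.out.println(line);
            }
        }
        sftp.disconnect();
        session.disconnect();
    }
}

Remembering the skip offset between runs lets you continue reading further up from the saved point next time.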
Best would be to have some kind of rotating log files, possibly with compression.
However, rsync is a unidirectional synchronisation tool that can transmit only the changed parts of a file: for a log, the new end.
I am not sure whether it performs well enough in your case, and SSH is a prerequisite.

Displaying the console feed from a game server in an RCON client

I'm writing an RCON client for an Insurgency (Source engine game) dedicated server. I'm using the RCON protocol defined by Valve that is used in all of the games that use the Source engine. I can successfully send commands to the server and display the server's response to those commands. However, I have no idea how to read or request the feed displayed by the in-game console (which contains the part I'm primarily interested in: the killfeed). I have looked into querying the server for a way to request the feed, but no such functionality is listed.
How would I go about retrieving the console feed from the server?
You cannot request the console feed from the server via RCON.
Two alternative solutions come to mind:
Save the output of the server application
Insurgency (or most source servers for that matter) prints the information you are looking for to stdout. The most elegant solution to save this output would be to start the server via systemd and read it from the syslog via journalctl.
As a simpler solution, you can just write it to a file using a pipe:
./start_server.sh > output.log
Or if you still want to see the output as it is printed:
./start_server.sh | tee output.log
Use SourceMod
You can write a SourceMod plugin, or use an existing one that records and provides that information. The SuperLogs plugin comes to mind, but I haven't used it in a long time. This will be significantly more work.
I have been using the first solution for a long time now. Be aware that Insurgency buffers the output and only writes it once that buffer is full, leading to delays of 20 minutes and upwards. This can be improved by setting sv_logflush 1 in the Insurgency config.

Throttle the speed at which a servlet accepts an HTTP POST body under Tomcat

I have a servlet that accepts large (up to 4GB) binary file uploads. The submitted file is transmitted as the body of an HTTP POST.
The servlet has to perform some time-consuming processing as it receives the file, and it has to finish doing that before sending the response. As a result, the server can appear to a fast client to have hung, because the client may wait for a minute or two after sending the last few bytes before getting the response.
Is there a way, either within Tomcat or within the Servlet API, to throttle the speed at which the server accepts the file? I would like it to appear to the client that the server is accepting the file at (for example) 10 MB/second, rather than accepting the file at 50 MB/second and then taking a few minutes after receiving the body to return a response.
Thanks.
I'm extending on the comment of Mark Thomas here because I feel that this is worth being an answer (or the answer), rather than a comment. Mark, let me know if you want to convert the comment yourself and I'll happily delete mine.
John, you're trying to solve your problem in a way that imposes severe limitations: What's the bandwidth that you want to throttle to? What happens when the server is upgraded to a beefier CPU and can process more quickly? What if multiple uploads happen at the same time?
You probably want a 4 GB upload to complete as quickly as possible: imagine the connection going down in the middle; in a web application this typically means you'll have to restart the upload from the beginning. Thus you should decouple your processing from the upload procedure as much as possible.
You also don't mention the file format that gets uploaded. If it happens to be a zip file, note that the server can't do anything with the file until it's fully transmitted, as zip files have their directory of contents at the end. (This might be old knowledge, but at least the old spec had it this way; someone correct me if this has changed.)
So, the proper way: accept the file for processing, and signal that you received it and are processing it. If you like, implement Ajax updates for when you're done. In the simplest case: "click here to see if processing finished", or frequently reload the page. Anything works, and everything is better than throttling throughput at this layer.
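A minimal sketch of that decoupling in a servlet; the URL pattern, the job-id scheme, and the process hook are illustrative:

import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.UUID;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

@WebServlet("/upload")
public class UploadServlet extends HttpServlet {
    // Background pool so the heavy work never blocks a request thread.
    private final ExecutorService pool = Executors.newFixedThreadPool(4);

    @Override
    protected void doPost(HttpServletRequest req, HttpServletResponse resp)
            throws IOException {
        // Drain the POST body to a temp file at full wire speed.
        Path tmp = Files.createTempFile("upload-", ".bin");
        try (InputStream in = req.getInputStream()) {
            Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
        }
        String jobId = UUID.randomUUID().toString();
        pool.submit(() -> process(tmp, jobId)); // time-consuming step runs later
        resp.setStatus(HttpServletResponse.SC_ACCEPTED); // 202: received, processing
        resp.getWriter().write(jobId); // client polls a status page with this id
    }

    private void process(Path file, String jobId) {
        // heavy processing; record completion so the status page can report it
    }
}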
