I have a web service method that is supposed to process a very large file and output several files to the server. However, the web service just times out, and the invoker never gets the CREATED status. I am wondering whether there is a way to run the processing job (starting a new thread or something) and return the status without waiting for the process to finish.
public Response processFile(InputStream inputStream) {
    // I want to process the file here,
    // but I don't want the invoker to wait for that process to finish.
    // I just want the response to be returned right away.
    return Response.status(Response.Status.CREATED).build();
}
The file comes from the input stream, right? So if you send back a CREATED status (effectively closing the connection), you might sever the connection before you receive the entirety of the input file.
That's what I thought, anyway... in which case you'll just want to set the timeout to a lengthier value.
If that's not the case, then I guess it would be fine to start a new thread, process everything there in good time, and send back the CREATED status.
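For reference, here's a minimal sketch of that last approach in JAX-RS. The temp-file handling, the executor, and the processLargeFile helper are illustrative assumptions, not from the original post; the key point is to drain the upload before responding, so closing the connection can't truncate the input:

import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import javax.ws.rs.core.Response;

public class FileResource {

    // Shared worker pool (illustrative; size it for your workload).
    private static final ExecutorService executor = Executors.newFixedThreadPool(4);

    public Response processFile(InputStream inputStream) throws Exception {
        // Drain the upload to a temp file BEFORE responding, so closing
        // the connection can't cut off the input.
        Path tempFile = Files.createTempFile("upload-", ".dat");
        Files.copy(inputStream, tempFile, StandardCopyOption.REPLACE_EXISTING);

        // Do the heavy work in the background; the invoker gets CREATED immediately.
        executor.submit(() -> processLargeFile(tempFile));

        return Response.status(Response.Status.CREATED).build();
    }

    private static void processLargeFile(Path file) {
        // ... long-running processing that writes the output files ...
    }
}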
I have a Spring Boot application with a basic REST API.
My question is: what should we use to download bulk data? What is the preferable way to download bulk data without a memory leak? Let's suppose we have 10 million records.
Here are some approaches, but I'm not sure about them:
1) Download with a PipedInputStream, with the data written through a PipedOutputStream in a separate thread. Is that fine, or is it not a good choice?
2) Download with a ByteArrayOutputStream, with the data written into a temp file in a separate thread; after it finishes, the file is ready to download. We can mark this operation with flags for the end user, e.g. DOWNLOAD_ACTIVE, DOWNLOAD_DONE. The user initiates the download with the result flag DOWNLOAD_ACTIVE and pings the server for the response flag DOWNLOAD_DONE. When it is done, the user sends a request to download the data.
Summary of approach 2):
1. Initiate a request to download the data - ACTIVE state
2. Ping the server; it returns the current state - ACTIVE or DONE
3. If the state is DONE, the user initiates the final request to download the data
Thanks
You can use the second approach, which prepares the data in the background; once it's ready, you can download it.
Send a request to prepare the data. The server responds with a UUID.
The server starts preparing the files in the background. The server has a Map whose key is the new UUID and whose value is the status ACTIVE.
The client saves the UUID and checks the server at a certain interval by passing the UUID.
Once the server finishes the task, it updates the Map for the given UUID with the status DONE.
When the status is DONE, the next poll returns DONE and the UI sends another request to download the file.
The above approach only works if you don't refresh the page, as a page refresh clears the UUID and you have to start over.
To survive refreshes and cross-logins, use a database table instead of the Map. Store the username along with the other information and inform the user once the file is ready.
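A minimal Spring Boot sketch of that flow, just to show the shape of it. The endpoint paths, the flag strings, and the file-writing helper are illustrative assumptions, not from the answer:

import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import org.springframework.core.io.FileSystemResource;
import org.springframework.core.io.Resource;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.*;

@RestController
public class DownloadController {

    // UUID -> status; swap for a DB table to survive restarts and refreshes.
    private final ConcurrentHashMap<String, String> status = new ConcurrentHashMap<>();
    private final ExecutorService executor = Executors.newFixedThreadPool(2);

    @PostMapping("/downloads")
    public String initiate() {
        String id = UUID.randomUUID().toString();
        status.put(id, "DOWNLOAD_ACTIVE");
        executor.submit(() -> {
            writeRecordsTo(fileFor(id));      // stream the records to disk
            status.put(id, "DOWNLOAD_DONE");
        });
        return id;                            // client saves this and polls with it
    }

    @GetMapping("/downloads/{id}/status")
    public String pollStatus(@PathVariable String id) {
        return status.getOrDefault(id, "UNKNOWN");
    }

    @GetMapping("/downloads/{id}")
    public ResponseEntity<Resource> download(@PathVariable String id) {
        if (!"DOWNLOAD_DONE".equals(status.get(id))) {
            return ResponseEntity.notFound().build();   // not ready yet
        }
        return ResponseEntity.ok(new FileSystemResource(fileFor(id)));
    }

    private Path fileFor(String id) {
        return Paths.get(System.getProperty("java.io.tmpdir"), id + ".csv");
    }

    private void writeRecordsTo(Path file) {
        try {
            // Illustrative: in reality, stream the 10 million records in batches
            // so the full data set never sits in memory.
            Files.writeString(file, "...records...");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}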
So I have a Java application running on Heroku where it's always listening for something. It's not supposed to be pinged, but if it is, it returns an H20 error because the process never ends. How can I prevent this?
I've tried listening for a GET call:
get("", new Route() {
#Override
public Object handle(Request request, Response response) throws Exception {
return "Hello";
}
});
But then I realized that this ends the whole process, meaning it no longer runs until I do heroku restart, because the process starts as soon as I deploy.
What can I do to make the server ignore pings, without ending the process itself?
To make things clear, this is supposed to be a text bot listening for commands in a certain chat.
An H20 is an App Boot Timeout.
For web process types, Heroku requires you to bind to the port specified in $PORT. If your process isn't bound to that port after 75 seconds, it is considered too slow to boot and is therefore killed.
If your bot isn't meant to listen for HTTP requests, you can fix this by changing the process type name from web to anything else in your Procfile.
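For instance, a one-line Procfile along these lines (the jar path is an illustrative assumption) declares the bot as a worker process; Heroku never routes HTTP traffic to workers and never expects them to bind a port:

worker: java -jar target/my-bot.jar

After deploying, you can start it with heroku ps:scale worker=1.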
A client sends a request and catches a timeout exception. However, the server is still processing the request and saving it to the database. Before that happens, the client has already sent a second request, which duplicates the record in the database. How do I prevent that from happening? I'm using Java servlets and JavaScript.
A few suggestions:
1) Increase the client timeout.
2) Make the server more efficient so it can respond faster.
3) Get the server to respond with an intermediate "I'm working on it" response before returning with the main response.
4) Does the server need to do all the work before it responds to the client, or can some of it be offloaded to a separate process for running later?
A client sends a request and catches a timeout exception. However the server is still processing the request
Make the servlet generate some output (can be just blank spaces) and flush the stream every so often (every 15 seconds for example).
If the connection has been closed on the client side, the write will fail with a socket exception.
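A rough sketch of that keep-alive trick inside a servlet, assuming the real work runs on another thread (the workDone flag is an illustrative stand-in for your own completion check). PrintWriter swallows IOExceptions, so checkError() is the reliable way to notice the client going away:

import java.io.IOException;
import java.io.PrintWriter;
import java.util.concurrent.atomic.AtomicBoolean;
import javax.servlet.http.HttpServletResponse;

public class KeepAlive {

    // Emit a space every 15 seconds while the background work runs.
    public static void pingUntilDone(HttpServletResponse resp, AtomicBoolean workDone)
            throws IOException, InterruptedException {
        PrintWriter out = resp.getWriter();
        while (!workDone.get()) {
            out.print(' ');
            out.flush();
            if (out.checkError()) {
                // The client closed the connection; stop, or flag the
                // background job so it can cancel.
                return;
            }
            Thread.sleep(15_000);
        }
    }
}

Note that the padding becomes part of the response body, so the client has to tolerate the leading whitespace.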
Before that happening, the client already sent a second request which doubles the record on the database
Use the atomicity of the database, for example a unique key. Start the process by creating a unique record (maybe in some "unfinished" status); the insert will fail if the record already exists.
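A minimal JDBC sketch of that pattern (the requests table and its columns are illustrative assumptions): the client sends the same request ID on every retry, and the primary-key constraint makes the duplicate insert fail instead of creating a second record. Some drivers report this as a plain SQLException rather than the subclass caught here, so adjust for yours:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.SQLIntegrityConstraintViolationException;

public class IdempotentInsert {

    // Assumes a table like:
    //   CREATE TABLE requests (request_id VARCHAR(36) PRIMARY KEY,
    //                          status VARCHAR(16), payload TEXT);
    // The client generates requestId once and resends it on every retry.
    public static boolean tryClaim(Connection con, String requestId, String payload)
            throws SQLException {
        String sql = "INSERT INTO requests (request_id, status, payload) "
                   + "VALUES (?, 'UNFINISHED', ?)";
        try (PreparedStatement ps = con.prepareStatement(sql)) {
            ps.setString(1, requestId);
            ps.setString(2, payload);
            ps.executeUpdate();
            return true;   // we own this request; safe to process it
        } catch (SQLIntegrityConstraintViolationException duplicate) {
            return false;  // a retry of a request that's already being processed
        }
    }
}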
Currently, I'm using XMLHttpRequest to upload files to the server using HTML5 capabilities.
There's a progress bar:
xhr.upload.addEventListener('progress', function(e) {
    var done = e.position || e.loaded, total = e.totalSize || e.total;
    console.log(done);
});
Everything works fine, but it doesn't account for the server's processing of the file, so it shows 100% uploaded even when the file hasn't been created yet.
The receiver is a Java servlet, which is only able to respond after it returns, so there's no way to gauge the remaining work from its response.
Are there ways around this?
If the processing the server does takes a long time and you want to give feedback while it happens, here's a rough outline of a solution. This will take a while to implement, so it's only really worth doing if the processing is slow.
Modify your servlet to do its work asynchronously, and return a 202 Accepted response to the client.
As the server processes the file, set the progress on an object, typically a ConcurrentHashMap injected with Spring if that's what you're using.
Expose an API that queries the current progress of the task without blocking until it completes.
In your JavaScript, poll this API until the task completes, and show the progress.
You can return a tracking ID in the response at step 1 if you need to.
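A rough sketch of those steps with a plain servlet, since the receiver here is a servlet rather than Spring. The URL pattern, the shared PROGRESS map, and the chunked process loop are illustrative assumptions:

import java.io.IOException;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

@WebServlet("/upload")
public class UploadServlet extends HttpServlet {

    // taskId -> percent complete (0..100).
    private static final ConcurrentHashMap<String, Integer> PROGRESS = new ConcurrentHashMap<>();

    // POST: accept the file, start processing, return a tracking ID immediately.
    @Override
    protected void doPost(HttpServletRequest req, HttpServletResponse resp) throws IOException {
        String taskId = UUID.randomUUID().toString();
        PROGRESS.put(taskId, 0);

        byte[] body = req.getInputStream().readAllBytes(); // drain the upload first

        new Thread(() -> process(taskId, body)).start();   // process asynchronously

        resp.setStatus(202);                               // Accepted
        resp.getWriter().print(taskId);                    // the JS polls with this ID
    }

    // GET /upload?taskId=...: report progress without blocking.
    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws IOException {
        Integer pct = PROGRESS.get(req.getParameter("taskId"));
        resp.getWriter().print(pct == null ? -1 : pct);
    }

    private void process(String taskId, byte[] body) {
        // Illustrative: bump the percentage as each slice of real work completes.
        for (int pct = 10; pct <= 100; pct += 10) {
            // ... do a slice of the actual processing of `body` ...
            PROGRESS.put(taskId, pct);
        }
    }
}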
Salesforce can send up to 100 requests inside 1 SOAP message. While sending this type of bulk Outbound Message request, my PHP script finishes executing but SF fails to accept the ACK used to clear the message queue on the Salesforce side of things. Looking at the Outbound Message log (monitoring), I see all the messages in a pending state with the Delivery Failure Reason "java.net.SocketTimeoutException: Read timed out". If my script has finished execution, why do I get this error?
I have tried these methods to increase the execution time on my server as I have no access on the Salesforce side:
set_time_limit(0); // in the script
max_execution_time = 360 ; Maximum execution time of each script, in seconds
max_input_time = 360 ; Maximum amount of time each script may spend parsing request data
memory_limit = 32M ; Maximum amount of memory a script may consume
I used the high settings just for testing.
Any thoughts as to why this is failing the ACK delivery back to Salesforce?
Here is some of the code:
This is how I accept the incoming SOAP request and send the ACK:
$data = 'php://input';
$content = file_get_contents($data);
if ($content) {
    respond('true');
} else {
    respond('false');
}
The respond function:
function respond($tf) {
    $ACK = <<<ACK
<?xml version = "1.0" encoding = "utf-8"?>
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <soapenv:Body>
        <notifications xmlns="http://soap.sforce.com/2005/09/outbound">
            <Ack>$tf</Ack>
        </notifications>
    </soapenv:Body>
</soapenv:Envelope>
ACK;
    print trim($ACK);
}
These are in a generic script that I include into the script that uses the data for a specific workflow. I can process about 25 requests (that arrive in 1 SOAP response), but once I go over that I get the timeout error in the Salesforce queue. For 50 requests it usually takes my PHP script 86.77 seconds.
Could it be Apache? PHP?
I have also tested just accepting the 100-request SOAP response and sending the ACK without any processing; the queue clears out, so I know the problem is on my side of things.
I see no errors in the Apache log; the script runs fine.
I did find some info on the Salesforce site but still no luck. Here is the link.
Also, I'm using the PHP Toolkit 11 (from Salesforce).
Other forum with good SF help
Thanks for any insight into this,
--Phill
UPDATE:
If I receive the incoming message and print the response, should that happen first regardless of whether I do anything else afterwards? Or does it wait for my processing to finish and then print the response?
UPDATE #2:
Okay, I think I've found the problem:
PHP uses a single-threaded processing approach and will not send back the ACK until the thread has completed its processing. Is there a way to make this a multi-threaded process?
Thread #1 - accept the incoming SOAP request and send back the ACK
Thread #2 - Process the SOAP request
I know I could break it up using something like a DB table or flat file, but is there a way to accomplish this without doing that?
I'm going to try to close the socket after the ACK submission and continue the processing; fingers crossed it will work.
Sounds like the outbound message is hitting the timeout. Other users have reported timeouts as low as 10 seconds (see forum link below). The sandbox instance that I use (cs1) is timing out after about 1 minute, from my testing. It's possible that the timeout is an organization or instance level setting that Salesforce controls.
Two things you could try:
1) Open a support ticket with Salesforce to see if they can increase the timeout value for outbound messages. From my experience, there are a lot of settings that they can modify on the organization level - this might be one of them.
2) Offload processing of your data, so that the ACK is sent back to Salesforce immediately. The actual processing of your data will then take place asynchronously - i.e. a message queue, separate thread, etc.
Some other resources that might be helpful:
related Salesforce forum discussion
Outbound messaging documentation
I think they time out while waiting for your script to end.
There is a way you could try to fix this.
Output the envelope with the ACK message at the beginning, and then flush it so that their server gets it before you end processing. No threading, just a plain rethinking of priorities :)
Read this for the best info on flushing content.
Are you 100% sure that Salesforce will wait the amount of time your scripts need to run? 80 seconds seems like a long time to me.
If all requests failed I would guess that Salesforce expects you to set the Content-Type header appropriately, but this does not seem to be the case.
I don't know about Salesforce, but if you want to do some multithreading with PHP you should take a look at this code example, and more precisely at pcntl_fork().
N.B.: pcntl is not enabled by default and won't work on Windows platforms.
So what I've done is:
1. Accept all incoming OBMs and parse them into a DB
2. When this is done, kick off a process that runs in the background (actually I send it to the background so the script can end)
3. Send the ACK file back
Just accepting the raw data, parsing it into fields, and inserting it into a DB is fairly quick. Then I issue a Linux command-line call that sends the processing script to run in the background. Then I send the ACK file to SF, and the script ends within the allotted time. It is cumbersome to split the process into two separate stages, but it works.