I'm quite new to Apache Camel and trying to bring some routes into action.
I have a TCP server which serves large JSON-Messages (up to ~30-50kB in size, where i do not have any control about the source size) that contain lots of measurement data which i want to process using certain additional routes that work fine.
I'm using camel 2.20 within spring-boot environment 1.5.7.
I faced the problem that if i commented out every other routes except the incoming reduced netty4 route (only from and to a counter), see below
#Bean
public RouteBuilder getRoute() {
String fromSource = String.format("netty4:tcp://%s:%d?clientMode=true&textline=true&receiveBufferSize=64000&decoderMaxLineLength=64000",sourceIp,sourcePort);
return new RouteBuilder() {
from(fromSource)
.to("metrics:counter:incomingCounter");
};
}
The route works nearly fine but consumes more and more heap-space (around 2MB every second, where there are messages served with a frequency of around 20-30Hz) until java throws java.lang.OutOfMemoryError: Java heap space.
Without any route no memory-leak was registered, as i can focus the problem to the netty-route
Any help will be appreciated.
Thanks in advance.
I found the resolution myself by debugging the code.
I forgot to set property sync=false in netty4-camel endpoint as i don't want to process message and send an answer back to the server after processing, just consuming - while sync=true (default settings) buffers all incoming data for later response which caused my "memory-leak".
The behavior of "sync" was not totally clear from the netty4-camel documentation (http://camel.apache.org/netty4.html) - i'll suggest an improvement of the documentation (will write a mail with a proposal) to make the usage a little more clearly.
Maybe this helps someone another having a similar problem.
Best
As I wrote in title we need in project notify or execute method of some thread by another. This implementation is part of long polling. In following text describe and show my implementation.
So requirements are that:
UserX send request from client to server (poll action) immediately when he got response from previous. In service is executed spring async method where thread immediately check cache if there are some new data in database. I know that cache is usually used for methods where for specific input is expected specific output. This is not that case, because I use cache to reduce database calls and output of my method is always different. So cache help me store notification if I should check database or not. This checking is running in while loop which end when thread find notification to read database in cache or time expired.
Assume that UserX thread (poll action) is currently in while loop and checking cache.
In that moment UserY (push action) send some data to server, data are stored in database in separated thread, and also in cache is stored userId of recipient.
So when UserX is checking cache he found id of recipient (id of recipient == his id in this case), and then break loop and fetch these data.
So in my implementation I use google guava cache which provide manually write.
private static Cache<Long, Long> cache = CacheBuilder.newBuilder()
.maximumSize(100)
.expireAfterWrite(5, TimeUnit.MINUTES)
.build();
In create method I store id of user which should read these data.
public void create(Data data) {
dataRepository.save(data);
cache.save(data.getRecipient(), null);
System.out.println("SAVED " + userId + " in " + Thread.currentThread().getName());
}
and here is method of polling data:
#Async
public CompletableFuture<List<Data>> pollData(Long previousMessageId, Long userId) throws InterruptedException {
// check db at first, if there are new data no need go to loop and waiting
List<Data> data = findRecent(dataId, userId));
data not found so jump to loop for some time
if (data.size() == 0) {
short c = 0;
while (c < 100) {
// check if some new data added or not, if yes break loop
if (cache.getIfPresent(userId) != null) {
break;
}
c++;
Thread.sleep(1000);
System.out.println("SEQUENCE: " + c + " in " + Thread.currentThread().getName());
}
// check database on the end of loop or after break from loop
data = findRecent(dataId, userId);
}
// clear data for that recipient and return result
cache.clear(userId);
return CompletableFuture.completedFuture(data);
}
After User X got response he send poll request again and whole process is repeated.
Can you tell me if is this application design for long polling in java (spring) is correct or exists some better way? Key point is that when user call poll request, this request should be holded for new data for some time and not response immediately. This solution which I show above works, but question is if it will be works also for many users (1000+). I worry about it because of pausing threads which should make slower another requests when no threads will be available in pool. Thanks in advice for your effort.
Check Web Sockets. Spring supports it from version 4 on wards. It doesn't require client to initiate a polling, instead server pushes the data to client in real time.
Check the below:
https://spring.io/guides/gs/messaging-stomp-websocket/
http://www.baeldung.com/websockets-spring
Note - web sockets open a persistent connection between client and server and thus may result in more resource usage in case of large number of users. So, if you are not looking for real time updates and is fine with some delay then polling might be a better approach. Also, not all browsers support web sockets.
Web Sockets vs Interval Polling
Longpolling vs Websockets
In what situations would AJAX long/short polling be preferred over HTML5 WebSockets?
In your current approach, if you are having a concern with large number of threads running on server for multiple users then you can trigger the polling from front-end every time instead. This way only short lived request threads will be triggered from UI looking for any update in the cache. If there is an update, another call can be made to retrieve the data. However don't hit the server every other second as you are doing otherwise you will have high CPU utilization and user request threads may also suffer. You should do some optimization on your timing.
Instead of hitting the cache after a delay of 1 sec for 100 times, you can apply an intelligent algorithm by analyzing the pattern of cache/DB update over a period of time.
By knowing the pattern, you can trigger the polling in an exponential back off manner to hit the cache when the update is most likely expected. This way you will be hitting the cache less frequently and more accurately.
I have a requirement to process a list of large number of users daily to send them email and SMS notifications based on some scenario. I am using Java EE batch processing model for this. My Job xml is as follows:
<step id="sendNotification">
<chunk item-count="10" retry-limit="3">
<reader ref="myItemReader"></reader>
<processor ref="myItemProcessor"></processor>
<writer ref="myItemWriter"></writer>
<retryable-exception-classes>
<include class="java.lang.IllegalArgumentException"/>
</retryable-exception-classes>
</chunk>
</step>
MyItemReader's onOpen method reads all users from database, and readItem() reads one user at a time using list iterator. In myItemProcessor, the actual email notification is sent to user, and then the users are persisted in database in myItemWriter class for that chunk.
#Named
public class MyItemReader extends AbstractItemReader {
private Iterator<User> iterator = null;
private User lastUser;
#Inject
private MyService service;
#Override
public void open(Serializable checkpoint) throws Exception {
super.open(checkpoint);
List<User> users = service.getUsers();
iterator = users.iterator();
if(checkpoint != null) {
User checkpointUser = (User) checkpoint;
System.out.println("Checkpoint Found: " + checkpointUser.getUserId());
while(iterator.hasNext() && !iterator.next().getUserId().equals(checkpointUser.getUserId())) {
System.out.println("skipping already read users ... ");
}
}
}
#Override
public Object readItem() throws Exception {
User user=null;
if(iterator.hasNext()) {
user = iterator.next();
lastUser = user;
}
return user;
}
#Override
public Serializable checkpointInfo() throws Exception {
return lastUser;
}
}
My problem is that checkpoint stores the last record that was executed in the previous chunk. If I have a chunk with next 10 users, and exception is thrown in myItemProcessor of the 5th user, then on retry the whole chunck will be executed and all 10 users will be processed again. I don't want notification to be sent again to the already processed users.
Is there a way to handle this? How should this be done efficiently?
Any help would be highly appreciated.
Thanks.
I'm going to build on the comments from #cheng. My credit to him here, and hopefully my answer provides additional value in organizing and presenting the options usefully.
Answer: Queue up messages for another MDB to get dispatched to send emails
Background:
As #cheng pointed out, a failure means the entire transaction is rolled back, and the checkpoint doesn't advance.
So how to deal with the fact that your chunk has sent emails to some users but not all? (You might say it rolled back but with "side effects".)
So we could restate your question then as: How to send email from a batch chunk step?
Well, assuming you had a way to send emails through an transactional API (implementing XAResource, etc.) you could use that API.
Assuming you don't, I would do a transactional write to a JMS queue, and then send the emails with a separate MDB (as #cheng suggested in one of his comments).
Suggested Alternative: Use ItemWriter to send messages to JMS queue, then use separate MDB to actually send the emails
With this approach you still gain efficiency by batching the processing and the updates to your DB (you were only sending the emails one at a time anyway), and you can benefit from simple checkpointing and restart without having to write complicated error handling.
This is also likely to be reusable as a pattern across batch jobs and outside of batch even.
Other alternatives
Some other ideas that I don't think are as good, listed for the sake of discussion:
Add batch application logic tracking users emailed (with ItemProcessListener)
You could build your own list of either/both successful/failed emails using the ItemProcessListener methods: afterProcess and onProcessError.
On restart, then, you could know which users had been emailed in the current chunk, which we are re-positioned to since the entire chunk rolled back, even though some emails have already been sent.
This certainly complicates your batch logic, and you also have to persist this success or failure list somehow. Plus this approach is probably highly specific to this job (as opposed to queuing up for an MDB to process).
But it's simpler in that you have a single batch job without the need for a messaging provider and a separate app component.
If you go this route you might want to use a combination of both a skippable and a "no-rollback" retryable exception.
single-item chunk
If you define your chunk with item-count="1", then you avoid complicated checkpointing and error handling code. You sacrifice efficiency though, so this would only make sense if the other aspects of batch were very compelling: e.g. scheduling and management of jobs through a common interface, the ability to restart at the failing step within a job
If you were to go this route, you might want to consider defining socket and timeout exceptions as "no-rollback" exceptions (using ) since there's nothing to be gained from rolling back, and you might want to retry on a network timeout issue.
Since you specifically mentioned efficiency, I'm guessing this is a bad fit for you.
use a Transaction Synchronization
This could work perhaps, but the batch API doesn't especially make this easy, and you still could have a case where the chunk completes but one or more email sends fail.
Your current item processor is doing something outside the chunk transaction scope, which has caused the application state to be out of sync. If your requirement is to send out emails only after all items in a chunk have successfully completed, then you can move the emailing part to a ItemWriterListener.afterWrite(items).
Im attempting to stream data from a kafka installation into BigQuery using Java based on Google samples. The data is JSON rows ~12K in length. I batching these into blocks of 500 (roughly 6Mb) and streaming them as:
InsertAllRequest.Builder builder = InsertAllRequest.newBuilder(tableId);
for (String record : bqStreamingPacket.getRecords()) {
Map<String, Object> mapObject = objectMapper.readValue(record.replaceAll("\\{,", "{"), new TypeReference<Map<String, Object>>() {});
// remove nulls
mapObject.values().removeIf(Objects::isNull);
// create an id for each row - use to retry / avoid duplication
builder.addRow(String.valueOf(System.nanoTime()), mapObject);
}
insertAllRequest = builder.build();
...
BigQueryOptions bigQueryOptions = BigQueryOptions.newBuilder().
setCredentials(Credentials.getAppCredentials()).build();
BigQuery bigQuery = bigQueryOptions.getService();
InsertAllResponse insertAllResponse = bigQuery.insertAll(insertAllRequest);
Im seeing insert times of 3-5 seconds for each call. Needless to say this makes BQ streaming less than useful. From their documents I was worried about hitting per-table insert quotas (Im streaming from Kafka at ~1M rows / min) but now Id be happy to deal with that problem.
All rows insert fine. No errors.
I must be doing something very wrong with this setup. Please advise.
We measure between 1200-2500 ms for each streaming request, and this was consistent over the last three years as you can see in the chart, we stream from Softlayer to Google.
Try to vary the numbers from hundreds to thousands row, or until you reach some streaming api limits and measure each call.
Based on this you can deduce more information such as bandwidth problem between you and BigQuery API, latency, SSL handshake, and eventually optimize it for your environment.
You can leave also your project id/table and maybe some Google engineer will check it.
I've been writing a little application that will let people upload & download files to me. I've added a web service to this applciation to provide the upload/download functionality that way but I'm not too sure on how well my implementation is going to cope with large files.
At the moment the definitions of the upload & download methods look like this (written using Apache CXF):
boolean uploadFile(#WebParam(name = "username") String username,
#WebParam(name = "password") String password,
#WebParam(name = "filename") String filename,
#WebParam(name = "fileContents") byte[] fileContents)
throws UploadException, LoginException;
byte[] downloadFile(#WebParam(name = "username") String username,
#WebParam(name = "password") String password,
#WebParam(name = "filename") String filename) throws DownloadException,
LoginException;
So the file gets uploaded and downloaded as a byte array. But if I have a file of some stupid size (e.g. 1GB) surely this will try and put all that information into memory and crash my service.
So my question is - is it possible to return some kind of stream instead? I would imagine this isn't going to be terribly OS independent though. Although I know the theory behind web services, the practical side is something that I still need to pick up a bit of information on.
Cheers for any input,
Lee
Yes, it is possible with Metro. See the Large Attachments example, which looks like it does what you want.
JAX-WS RI provides support for sending and receiving large attachments in a streaming fashion.
Use MTOM and DataHandler in the programming model.
Cast the DataHandler to StreamingDataHandler and use its methods.
Make sure you call StreamingDataHandler.close() and also close the StreamingDataHandler.readOnce() stream.
Enable HTTP chunking on the client-side.
Stephen Denne has a Metro implementation that satisfies your requirement. My answer is provided below after a short explination as to why that is the case.
Most Web Service implementations that are built using HTTP as the message protocol are REST compliant, in that they only allow simple send-receive patterns and nothing more. This greatly improves interoperability, as all the various platforms can understand this simple architecture (for instance a Java web service talking to a .NET web service).
If you want to maintain this you could provide chunking.
boolean uploadFile(String username, String password, String fileName, int currentChunk, int totalChunks, byte[] chunk);
This would require some footwork in cases where you don't get the chunks in the right order (Or you can just require the chunks come in the right order), but it would probably be pretty easy to implement.
When you use a standardized web service the sender and reciever do rely on the integrity of the XML data send from the one to the other. This means that a web service request and answer only are complete when the last tag was sent. Having this in mind, a web service cannot be treated as a stream.
This is logical because standardized web services do rely on the http-protocol. That one is "stateless", will say it works like "open connection ... send request ... receive data ... close request". The connection will be closed at the end, anyway. So something like streaming is not intended to be used here. Or he layers above http (like web services).
So sorry, but as far as I can see there is no possibility for streaming in web services. Even worse: depending on the implementation/configuration of a web service, byte[] - data may be translated to Base64 and not the CDATA-tag and the request might get even more bloated.
P.S.: Yup, as others wrote, "chuinking" is possible. But this is no streaming as such ;-) - anyway, it may help you.
I hate to break it to those of you who think a streaming web service is not possible, but in reality, all http requests are stream based. Every browser doing a GET to a web site is stream based. Every call to a web service is stream based. Yes, all. We don't notice this at the level where we are implementing services or pages because lower levels of the architecture are dealing with this for you - but it is being done.
Have you ever noticed in a browser that sometimes it can take a while to fetch a page - the browser just keeps cranking away showing the hourglass? That is because the browser is waiting on a stream.
Streams are the reason mime/types have to be sent before the actual data - it's all just a byte stream to the browser, it wouldn't be able to identify a photo if you didn't tell it what it was first. It's also why you have to pass the size of a binary before sending - the browser won't be able to tell where the image stops and the page picks up again.
It's all just a stream of bytes to the client. If you want to prove this for yourself, just get a hold of the output stream at any point in the processing of a request and close() it. You will blow up everything. The browser will immediately stop showing the hourglass, and will display a "cannot find" or "connection reset at server" or some other such message.
That a lot of people don't know that all of this stuff is stream based shows just how much stuff has been layered on top of it. Some would say too much stuff - I am one of those.
Good luck and happy development - relax those shoulders!
For WCF I think its possible to define a member on a message as stream and set the binding appropriately - I've seen this work with wcf talking to Java web service.
You need to set the transferMode="StreamedResponse" in the httpTransport configuration and use mtomMessageEncoding (need to use a custom binding section in the config).
I think one limitation is that you can only have a single message body member if you want to stream (which kind of makes sense).
Apache CXF supports sending and receiving streams.
One way to do it is to add a uploadFileChunk(byte[] chunkData, int size, int offset, int totalSize) method (or something like that) that uploads parts of the file and the servers writes it the to disk.
Keep in mind that a web service request basically boils down to a single HTTP POST.
If you look at the output of a .ASMX file in .NET , it shows you exactly what the POST request and response will look like.
Chunking, as mentioned by #Guvante, is going to be the closest thing to what you want.
I suppose you could implement your own web client code to handle the TCP/IP and stream things into your application, but that would be complex to say the least.
I think using a simple servlet for this task would be a much easier approach, or is there any reason you can not use a servlet?
For instance you could use the Commons open source library.
The RMIIO library for Java provides for handing a RemoteInputStream across RMI - we only needed RMI, though you should be able to adapt the code to work over other types of RMI . This may be of help to you - especially if you can have a small application on the user side. The library was developed with the express purpose of being able to limit the size of the data pushed to the server to avoid exactly the type of situation you describe - effectively a DOS attack by filling up ram or disk.
With the RMIIO library, the server side gets to decide how much data it is willing to pull, where with HTTP PUT and POSTs, the client gets to make that decision, including the rate at which it pushes.
Yes, a webservice can do streaming. I created a webservice using Apache Axis2 and MTOM to support rendering PDF documents from XML. Since the resulting files could be quite large, streaming was important because we didn't want to keep it all in memory. Take a look at Oracle's documentation on streaming SOAP attachments.
Alternately, you can do it yourself, and tomcat will create the Chunked headers. This is an example of a spring controller function that streams.
#RequestMapping(value = "/stream")
public void hellostreamer(HttpServletRequest request, HttpServletResponse response) throws CopyStreamException, IOException
{
response.setContentType("text/xml");
OutputStreamWriter writer = new OutputStreamWriter (response.getOutputStream());
writer.write("this is streaming");
writer.close();
}
It's actually not that hard to "handle the TCP/IP and stream things into your application". Try this...
class MyServlet extends HttpServlet
{
public void doGet(HttpServletRequest request, HttpServletResponse response)
{
response.getOutputStream().println("Hello World!");
}
}
And that is all there is to it. You have, in the above code, responded to an HTTP GET request sent from a browser, and returned to that browser the text "Hello World!".
Keep in mind that "Hello World!" is not valid HTML, so you may end up with an error on the browser, but that really is all there is to it.
Good Luck in your development!
Rodney