Vertx 3 - Java serializing large objects - java

Vert.x 3 newbie. I'm using the Java API. The use case is a reporting app which typically deals with large objects (POJOs). These POJOs contain the data to be exported to PDF, CSV, etc., and are typically Lists of Maps.
If I have to pass the data asynchronously to different verticles via the EventBus, there is going to be a serialization/deserialization cost. Are there any tips or tricks for dealing with large objects so that we don't incur a huge overhead for serialization/deserialization?

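I think it's a bad idea to send anything large through the EventBus. You can use Vert.x SharedData and send only the ID of your object:
// A local map is shared only within one Vert.x instance (not across a cluster).
// In Vert.x 3, values must be immutable types or implement Shareable.
LocalMap<String, LargeObject> map = vertx.sharedData().getLocalMap("uniq-map-id");
map.put("unique-id", data);
vertx.eventBus().send(ADDRESS, "unique-id");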
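The receiving side is not shown above. Here is a minimal sketch of the whole round trip, using plain JDK types as stand-ins so that it runs without Vert.x: a ConcurrentHashMap plays the role of the LocalMap and a BlockingQueue plays the role of the event bus address. The class and method names are hypothetical. The point is that only the short key crosses the "bus", and the receiver removes the entry once it has claimed the object, so the map does not grow without bound.

```java
import java.util.List;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical illustration: JDK stand-ins for LocalMap and the event bus.
public class ClaimCheck {
    // Stand-in for vertx.sharedData().getLocalMap(...)
    static final Map<String, List<Map<String, Object>>> STORE = new ConcurrentHashMap<>();
    // Stand-in for an event bus address: carries only small String keys.
    static final BlockingQueue<String> BUS = new LinkedBlockingQueue<>();

    // Sender: park the large object, publish only its key.
    static String send(List<Map<String, Object>> report) {
        String id = UUID.randomUUID().toString();
        STORE.put(id, report);
        BUS.add(id);
        return id;
    }

    // Receiver: read the next key, then claim (and remove) the object.
    // Returns null when no key is pending.
    static List<Map<String, Object>> receive() {
        String id = BUS.poll();
        return id == null ? null : STORE.remove(id);
    }
}
```

In real Vert.x code the receiver would be the event bus consumer, which would look up the key from the message body in the same LocalMap.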

You can deploy an anonymous worker verticle to do that.
Assuming LongOperatingVerticle is the verticle that handles your POJOs:
class LongOperatingVerticle extends AbstractVerticle {
    @Override
    public void start() {
        final String pojo = "Very long file...";
        final Future<String> f = Future.future();
        // Anonymous verticle in worker mode
        this.vertx.deployVerticle(new AbstractVerticle() {
            @Override
            public void start() throws Exception {
                // Blocking is acceptable here because this runs on a worker thread.
                Thread.sleep(5000);
                f.complete("Ok");
            }
        }, new DeploymentOptions().setWorker(true));
        System.out.println("Will wait now");
        f.setHandler((e) -> {
            System.out.println(e.result());
        });
    }
}

Related

How to read Message in netty in other class

I want to read a message at a specific position in a class other than the InboundHandler. I can't find a way to read it except in the channelRead0 method, which is called by the Netty framework.
For example:
context.writeMessage("message");
String msg = context.readMessage();
If this is not possible, how can I map a result that I get in the channelRead0 method to a specific call I made in another class?
The Netty framework is designed to be asynchronously driven. Using this model, it can handle a large number of connections with minimal thread usage. If you are creating an API that uses the Netty framework to dispatch calls to a remote location, you should use the same model for your calls.
Instead of making your API return the value directly, make it return a Future<?> or a Promise<?>. There are different ways of implementing this in your application; the simplest is a custom handler that maps incoming responses to Promises held in a FIFO queue.
An example of this could be the following:
This is heavily based on this answer that I submitted in the past.
We start with our handler, which maps the responses to the requests in our pipeline:
public class MyLastHandler extends SimpleChannelInboundHandler<String> {
    // A plain FIFO queue (e.g. a ConcurrentLinkedQueue); a SynchronousQueue cannot hold elements.
    private final Queue<Promise<String>> queue;

    public MyLastHandler(Queue<Promise<String>> queue) {
        super();
        this.queue = queue;
    }

    // The following is called messageReceived(ChannelHandlerContext, String) in 5.0.
    @Override
    public void channelRead0(ChannelHandlerContext ctx, String msg) {
        // Complete the oldest pending promise (FIFO order).
        this.queue.remove().setSuccess(msg);
        // Or setFailure(Throwable)
    }
}
We then need a method for sending commands to the remote server:
Channel channel = ....;
Queue<Promise<String>> queue = ....; // the same queue instance the handler reads from

public Future<String> sendCommandAsync(String command) {
    // DefaultPromise has no public no-arg constructor; ask the event loop for a promise.
    return sendCommandAsync(command, channel.eventLoop().<String>newPromise());
}

public Future<String> sendCommandAsync(String command, Promise<String> promise) {
    // Queue order must match write order, so do both under one lock.
    synchronized (channel) {
        queue.offer(promise);
        channel.write(command);
    }
    channel.flush();
    return promise;
}
Once these methods are in place, we need a way to call them:
sendCommandAsync("USER anonymous").addListener(
    (FutureListener<String>) f -> {
        String response = f.get();
        if (response.startsWith("331")) {
            // do something
        }
        // etc
    }
);
If the caller would like to use our API as a blocking call, they can also do that:
String response = sendCommandAsync("USER anonymous").get();
if (response.startsWith("331")) {
    // do something
}
// etc
Notice that Future.get() can throw an InterruptedException if the thread is interrupted, unlike a socket read operation, which can only be cancelled by some interaction on the socket. This exception should not be a problem in the FutureListener.
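The correlation idea itself does not depend on Netty. Below is a minimal sketch of it using only JDK types, so it can be run on its own: CompletableFuture stands in for Netty's Promise, a Consumer<String> stands in for the channel write, and onResponse plays the role of channelRead0. All names are hypothetical, and the sketch assumes the server answers strictly in request order.

```java
import java.util.Queue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.function.Consumer;

// Hypothetical illustration of FIFO request/response correlation.
public class FifoCorrelator {
    private final Queue<CompletableFuture<String>> pending = new ConcurrentLinkedQueue<>();
    private final Consumer<String> transport; // stand-in for the channel write

    public FifoCorrelator(Consumer<String> transport) {
        this.transport = transport;
    }

    // Caller side: queue the future before writing, so a response
    // can never arrive before its future is registered.
    public CompletableFuture<String> sendCommandAsync(String command) {
        CompletableFuture<String> future = new CompletableFuture<>();
        synchronized (this) { // keep queue order identical to write order
            pending.add(future);
            transport.accept(command);
        }
        return future;
    }

    // Handler side: complete the oldest pending future with the response.
    public void onResponse(String msg) {
        CompletableFuture<String> future = pending.poll();
        if (future != null) {
            future.complete(msg);
        }
    }
}
```

If the protocol can reorder responses, a FIFO queue is not enough and each request would need an explicit correlation ID instead.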

REST method needs a long time

This is a design question and I am asking for some ideas.
I have a REST method that triggers long-running tasks (10~15 minutes).
Since the function takes a long time, I run it in a separate thread.
This avoids a method timeout, but how can I know if the thread went wrong?
Runnable loader = new Runnable() {
public void run() {
//tasks
}
};
(new Thread(loader)).start();
Update: the REST service looks like this:
@Path()
beginload() {
    // let the thread run and return info first
    // how can I know if this thread went wrong?
    (new Thread(loader)).start();
    return "need 15 minutes";
}
Conceptually there has to be a way for the service to communicate a failure to the client. There are multiple ways you can do this. Here are three examples:
After the client calls the service, the service immediately returns a job ID. The client can use the job ID later to query the service for the status (including error). For example, when you launch instances at AWS EC2, it takes a while for EC2 to service the request, so the launch request returns a so-called "reservation ID" that you can use in subsequent operations (like querying for status, terminating the launch, etc.).
Pro: Usable in a wide variety of cases, and easy enough to implement.
Con: Requires polling. (I.e. more chatty.)
The client offers a callback URI that the service invokes upon job completion. The callback URI can either be configured into the service, or else passed along as a request parameter. (Don't hardcode the callback URI in the service since services shouldn't depend on their clients.)
Pro: Still pretty simple, and avoids polling.
Con: Client has to have URI for the service to call, which may not be convenient. (E.g. the client may be a desktop app rather than a service, firewall may prevent it, etc.)
The service pushes a notification into a message queue, and the client listens to that queue.
Pro: Avoids polling, and the client doesn't need to expose an endpoint for the service to call.
Con: More work to set up (requires messaging infrastructure).
There are other possibilities but those are typical approaches.
Do you need to differentiate between different requests? If there are several tasks to perform, you need an ID.
You can do something like the following:
private static final ExecutorService es = Executors.newFixedThreadPool(10);
// Concurrent map and counter: requests are handled by multiple threads,
// and System.currentTimeMillis() could hand out the same ID twice.
private static final Map<Long, Future<Void>> map = new ConcurrentHashMap<>();
private static final AtomicLong nextId = new AtomicLong();

@GET
@Path("/submit")
public Response submitTask() {
    long id = nextId.incrementAndGet();
    Future<Void> future = es.submit(new Callable<Void>() {
        public Void call() throws Exception {
            // long task
            // you must throw an exception for a bad task
            return null;
        }
    });
    map.put(id, future);
    return Response.ok(id, MediaType.TEXT_PLAIN).build();
}

@GET
@Path("/status/{id}")
public Response getStatus(@PathParam("id") long id) {
    Future<Void> future = map.get(id);
    if (future == null) {
        // Unknown (or already purged) task ID
        return Response.status(Response.Status.NOT_FOUND).build();
    }
    if (future.isDone()) {
        try {
            future.get();
            return Response.ok("Successful!", MediaType.TEXT_PLAIN).build();
        } catch (InterruptedException | ExecutionException e) {
            // log
            return Response.ok("Bad task!", MediaType.TEXT_PLAIN).build();
        }
    }
    return Response.ok("Wait a few seconds.", MediaType.TEXT_PLAIN).build();
}
This can give you an idea. Remember to purge the map of old tasks.
If you want to get the return value of your thread and throw/catch possible exceptions, consider using Callable rather than Runnable. It can be used along with ExecutorService, which provides more functionality.
Callable : A task that returns a result and may throw an exception.
Implementors define a single method with no arguments called call.
public interface Callable<V> {
V call() throws Exception;
}
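To make the "how can I know if the thread went wrong?" part concrete, here is a minimal sketch of how an exception thrown inside call() reaches the submitting code: Future.get() wraps it in an ExecutionException, and the original exception is available as the cause. The class and method names are hypothetical, and the sketch blocks on get() only to keep the example short; a REST service would check the Future later, for example from a status endpoint.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class CallableFailureDemo {
    // Runs the task on a worker thread and reports how it ended.
    static String runAndReport(Callable<String> task) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Future<String> future = executor.submit(task);
            return "ok: " + future.get(); // blocks until the task finishes
        } catch (ExecutionException e) {
            // The exception thrown inside call() is available as the cause.
            return "failed: " + e.getCause().getMessage();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "interrupted";
        } finally {
            executor.shutdown();
        }
    }
}
```

With a plain Thread and Runnable, the same exception would only end up in the thread's uncaught exception handler, where the code that started the thread never sees it.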

Using concurrent classes to process files in a directory in parallel

I am trying to figure out how to use the types from the java.util.concurrent package to parallelize processing of all the files in a directory.
I am familiar with the multiprocessing package in Python, which is very simple to use, so ideally I am looking for something similar:
public interface FictionalFunctor<T>{
void handle(T arg);
}
public class FictionalThreadPool {
public FictionalThreadPool(int threadCount){
...
}
public <T> FictionalThreadPoolMapResult<T> map(FictionalFunctor<T> functor, List<T> args){
// Executes the given functor on each and every arg from args in parallel. Returns, when
// all the parallel branches return.
// FictionalThreadPoolMapResult allows to abort the whole mapping process, at the least.
}
}
dir = getDirectoryToProcess();
pool = new FictionalThreadPool(10); // 10 threads in the pool
pool.map(new FictionalFunctor<File>(){
@Override
public void handle(File file){
// process the file
}
}, dir.listFiles());
I have a feeling that the types in java.util.concurrent allow me to do something similar, but I have absolutely no idea where to start.
Any ideas?
Thanks.
EDIT 1
Following the advice given in the answers, I have written something like this:
public void processAllFiles() throws IOException {
ExecutorService exec = Executors.newFixedThreadPool(6);
BlockingQueue<Runnable> tasks = new LinkedBlockingQueue<Runnable>(5); // Figured we can keep the contents of 6 files simultaneously.
exec.submit(new MyCoordinator(exec, tasks));
for (File file : dir.listFiles(getMyFilter())) {
try {
tasks.add(new MyTask(file));
} catch (IOException exc) {
System.err.println(String.format("Failed to read %s - %s", file.getName(), exc.getMessage()));
}
}
}
public class MyTask implements Runnable {
private final byte[] m_buffer;
private final String m_name;
public MyTask(File file) throws IOException {
m_name = file.getName();
m_buffer = Files.toByteArray(file);
}
@Override
public void run() {
// Process the file contents
}
}
private class MyCoordinator implements Runnable {
private final ExecutorService m_exec;
private final BlockingQueue<Runnable> m_tasks;
public MyCoordinator(ExecutorService exec, BlockingQueue<Runnable> tasks) {
m_exec = exec;
m_tasks = tasks;
}
@Override
public void run() {
while (true) {
Runnable task = m_tasks.remove();
m_exec.submit(task);
}
}
}
How I thought the code works is:
The files are read one after another.
Each file's contents are saved in a dedicated MyTask instance.
A blocking queue with the capacity of 5 to hold the tasks. I count on the ability of the server to keep the contents of at most 6 files at one time - 5 in the queue and another fully initialized task waiting to enter the queue.
A special MyCoordinator task fetches the file tasks from the queue and dispatches them to the same pool.
OK, so there is a bug - more than 6 tasks can be created. Some will be submitted, even though all the pool threads are busy. I've planned to solve it later.
The problem is that it does not work at all. The MyCoordinator thread blocks on the first remove, which is fine. But it never unblocks, even though new tasks were placed in the queue. Can anyone tell me what I am doing wrong?
The thread pool you are looking for is provided by the ExecutorService interface. You can create a fixed-size thread pool using Executors.newFixedThreadPool. This allows you to easily implement a producer-consumer pattern, with the pool encapsulating all the queue and worker functionality for you:
ExecutorService exec = Executors.newFixedThreadPool(10);
You can then submit tasks in the form of objects whose type implements Runnable (or Callable if you want to also get a result):
class ThreadTask implements Runnable {
public void run() {
// task code
}
}
...
exec.submit(new ThreadTask());
// alternatively, using an anonymous type
exec.submit(new Runnable() {
public void run() {
// task code
}
});
A big word of advice on processing multiple files in parallel: if you have a single mechanical disk holding the files it's wise to use a single thread to read them one-by-one and submit each file to a thread pool task as above, for processing. Do not do the actual reading in parallel as it will degrade performance.
A simpler solution than using ExecutorService is to implement your own producer-consumer scheme. Have a thread that creates tasks and submits them to a LinkedBlockingQueue or ArrayBlockingQueue, and have worker threads that check this queue to retrieve the tasks and run them. You may need a special kind of task named ExitTask that forces the workers to exit. At the end of the job, if you have n workers, you need to add n ExitTasks to the queue.
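A minimal sketch of that scheme follows, with hypothetical names and a bounded queue of 5 as in the question. It uses put() and take(), which block when the queue is full or empty. By contrast, remove() throws NoSuchElementException on an empty queue and add() throws IllegalStateException on a full bounded queue; since an exception thrown inside a task passed to submit() is only reported through the returned Future, that may be why the coordinator in the question appears to hang.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

public class PoisonPillDemo {
    // Sentinel task (the "ExitTask" of the text); compared by identity.
    private static final Runnable EXIT_TASK = () -> { };

    // Runs all tasks on workerCount threads; returns how many completed normally.
    static int runAll(List<Runnable> tasks, int workerCount) {
        BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>(5); // bounded
        AtomicInteger completed = new AtomicInteger();
        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < workerCount; i++) {
            Thread worker = new Thread(() -> {
                try {
                    while (true) {
                        Runnable task = queue.take(); // blocks while the queue is empty
                        if (task == EXIT_TASK) {
                            return;
                        }
                        try {
                            task.run();
                            completed.incrementAndGet();
                        } catch (RuntimeException e) {
                            // A failing task must not kill the worker; log and continue.
                        }
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            worker.start();
            workers.add(worker);
        }
        try {
            for (Runnable task : tasks) {
                queue.put(task); // blocks while the queue is full
            }
            for (int i = 0; i < workerCount; i++) {
                queue.put(EXIT_TASK); // one exit marker per worker
            }
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return completed.get();
    }
}
```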
Basically, what @Tudor said: use an ExecutorService. I wanted to expand on his code, and I always feel strange editing other people's posts. Here's a skeleton of what you would submit to the ExecutorService:
public class MyFileTask implements Runnable {
final File fileToProcess;
public MyFileTask(File file) {
fileToProcess = file;
}
public void run() {
// your code goes here, e.g.
handle(fileToProcess);
// if you prefer, implement Callable instead
}
}
See also my blog post here for some more details if you get stuck
Since processing Files often leads to IOExceptions, I'd prefer a Callable (which can throw a checked Exception) to a Runnable, but YMMV.
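For the pool.map(...) shape asked for in the question, ExecutorService.invokeAll comes fairly close: it takes a collection of Callables, blocks until all of them have completed, and returns the Futures in the same order as the input. A minimal sketch, with a hypothetical ParallelMap helper that is generic rather than file-specific:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

public class ParallelMap {
    // Applies fn to every argument on a fixed pool; results are in input order.
    static <T, R> List<R> map(int threadCount, Function<T, R> fn, List<T> args) {
        ExecutorService pool = Executors.newFixedThreadPool(threadCount);
        try {
            List<Callable<R>> calls = new ArrayList<>();
            for (T arg : args) {
                calls.add(() -> fn.apply(arg));
            }
            List<R> results = new ArrayList<>();
            // invokeAll blocks until every task has completed.
            for (Future<R> future : pool.invokeAll(calls)) {
                results.add(future.get());
            }
            return results;
        } catch (ExecutionException e) {
            throw new IllegalStateException("a task failed", e.getCause());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

For the directory case, the call would look something like ParallelMap.map(10, file -> process(file), Arrays.asList(dir.listFiles())), where process is whatever per-file function you have. Note that this variant reads the files inside the pool threads, so the single-disk caveat above still applies.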

Is there any executor in the Java concurrent package which guarantees that all tasks will be done in the order they were submitted?

A code sample for demonstration of the idea from the title:
executor.submit(runnable1);
executor.submit(runnable2);
I need to be sure that runnable1 will finish before runnable2 starts, and I haven't found any proof of such behavior in the executors documentation.
About the problem I'm solving:
I need to write lots of logs to a file. Each log entry requires a lot of precomputing (formatting and some other stuff). So I want to put each logging task into a kind of queue and process these tasks in a separate thread. And, of course, it's important to keep the log ordering.
A single-threaded executor will perform all tasks in the order submitted. You would only use a thread pool with multiple threads if you wanted the tasks to be performed concurrently.
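The Javadoc of Executors.newSingleThreadExecutor states that tasks are guaranteed to execute sequentially. A minimal sketch that exercises this, with a hypothetical class name and a list append standing in for "format and write one log entry":

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class OrderedLogDemo {
    // Submits `count` log tasks from one thread and returns the order in which they ran.
    static List<Integer> writeInOrder(int count) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        List<Integer> written = Collections.synchronizedList(new ArrayList<>());
        for (int i = 0; i < count; i++) {
            final int entry = i;
            executor.execute(() -> written.add(entry)); // stand-in for format + write
        }
        executor.shutdown(); // already submitted tasks still run
        try {
            executor.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return written;
    }
}
```

The ordering is relative to the order in which tasks reach the executor, so if several threads submit log tasks, the order between those threads is whatever order their submit calls happen to interleave in.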
Adding tasks to a queue can be expensive in itself. You can use an Exchanger like this
http://vanillajava.blogspot.com/2011/09/exchange-and-gc-less-java.html?z#!/2011/09/exchange-and-gc-less-java.html
This avoids using a queue or creating objects.
A faster alternative is to use a memory-mapped file, which doesn't require a background thread (actually, the OS is working in the background). This is much faster again. It supports sub-microsecond latencies and millions of messages per second.
https://github.com/peter-lawrey/Java-Chronicle
You could create a simple wrapper like the one below so that all your Runnables are executed in the same thread (i.e. sequentially), and submit that wrapper to the executor instead. That does not address the logging issue.
class MyRunnable implements Runnable {
private List<Runnable> runnables = new ArrayList<>();
public void add(Runnable r) {
runnables.add(r);
}
@Override
public void run() {
for (Runnable r : runnables) {
r.run();
}
}
}
//......
MyRunnable r = new MyRunnable();
r.add(runnable1);
r.add(runnable2);
executor.submit(r);
Presumably you are doing some post-analysis of the logfile? Have you considered not caring about the order in which entries are written and re-ordering them offline later? You could allocate a unique ID at submit time, using a timestamp or an AtomicLong.
A code sketch (untested) would look like this:
import java.util.concurrent.atomic.AtomicLong;

class MyProcessor {
    public void work() {
        for (Object data : allData) {
            executor.submit(new MySequencedRunnable(data));
        }
    }
}

class MySequencedRunnable implements Runnable {
    private static final AtomicLong LOG_SEQUENCE_ID = new AtomicLong(0);
    private final long sequenceId;
    private final Object data;

    MySequencedRunnable(Object data) {
        // Allocate the ID at submit time, so it reflects submission order
        // rather than the order in which the pool happens to run the tasks.
        this.sequenceId = LOG_SEQUENCE_ID.incrementAndGet();
        this.data = data;
    }

    public void run() {
        LOGGER.log(sequenceId, data);
    }
}
Also consider, if you're using something like log4j, using NDC or MDC to assist with the re-ordering.

Connecting multiple flex clients to a single java class

I have a multi-user application consisting of a Flex client and a BlazeDS/Spring/Java backend. I have the main elements working fine, i.e. sending messages to a destination, consuming and producing. Flex clients are able to send and retrieve a string from this class with no problem. What I want to do is to give the 2 clients access to the same variable. In this crude sample I'm sending a guid from each swf, which I append to a string _players server side. What happens is that when I launch Swf A, it receives its guid back fine, as does Swf B. Then Swf A receives the guid from Swf B, but Swf B does not receive the guid from Swf A. BTW, this is the same swf code, just launched twice, each in a different browser.
Can anyone see where I'm going wrong or what might be a better solution?
public class GameFeed {
private static GaneFeedThread thread;
private final MessageTemplate template;
public GameFeed(MessageTemplate template) {
this.template = template;
}
public void start() {
if (thread == null) {
thread = new GaneFeedThread(this.template);
thread.start();
}
}
public void stop() {
thread.running = false;
thread = null;
}
public static class GaneFeedThread extends Thread {
public boolean running = false;
private final MessageTemplate template;
public GaneFeedThread(MessageTemplate template) {
this.template = template;
}
        private static String _players;

        public void addPlayer(String name) {
            _players += name + ",";
        }

        @Override
        public void run() {
            while (this.running) {
                this.template.send("game-feed", _players);
            }
        }
    }
}
You have a threading problem in your class. I am not sure this is the cause of your problem, but it could be.
It seems that you are sharing data through the _players variable, but this variable is not thread safe. It has two major problems:
Issue 1: If two clients invoke the addPlayer method at the same time, it is not clear what happens to your _players variable. At the very least you could get a lost update.
Issue 2 (this may be the cause): The Java memory model does not guarantee that an update to the _players variable becomes visible to both threads without proper synchronization.
To fix it you have to do two things:
First: wrap _players += name + ","; in a synchronized block (for issue 1).
Second: mark _players as volatile (for issue 2).
See http://jeremymanson.blogspot.com/2008/11/what-volatile-means-in-java.html
It's probably the server that's preventing this. Traditionally, data that is to be shared between clients, or otherwise persisted, is written to a DB or some other datasource. You might do well with an in-memory DB. Most web servers have one configured out of the box using HSQLDB or Derby.
Another general solution would be to use a thread-safe collection instead of the String, but this may lead to other problems and is not as efficient as your string.
Nevertheless, you should rethink your decision to use a static variable in a Thread class to store shared business data like your player list.
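As one possible shape for the thread-safe collection variant, here is a minimal sketch. PlayerRegistry is a hypothetical class, not part of the original code: it keeps the players in a CopyOnWriteArrayList held by an ordinary instance (which the feed thread and the service would share) instead of a static String on the thread class.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical holder for the shared player list.
public class PlayerRegistry {
    // Writes copy the backing array, so readers always see a consistent snapshot.
    private final List<String> players = new CopyOnWriteArrayList<>();

    public void addPlayer(String name) {
        players.add(name); // safe to call from any thread
    }

    // Comma-separated form, similar to what the feed sends to clients.
    public String asFeedMessage() {
        return String.join(",", players);
    }
}
```

CopyOnWriteArrayList suits this case because players are added rarely while the feed loop reads constantly; with frequent writes, the copying cost would argue for a different structure.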
