Thread VS RabbitMQ Worker resource consumption

I am using Java ExecutorService threads to send emails through Amazon SES. This lets me open concurrent connections to Amazon SES via its API and send mail very quickly. Amazon accepts only a certain number of requests per second; for my account that's 50. So I execute 50 threads per second and send around 1 million emails daily.
This works well, but the volume of mail is going to increase, and I don't want to invest more in RAM and processors.
A friend suggested using RabbitMQ workers instead of threads, so instead of 50 threads I'd have 50 workers doing the job.
Before changing code to test resource usage, I'd like to know: will there be a big difference in consumption? Currently, when I execute my threads, Java consumes 20-30% of memory. Would workers use less or more?
Or is there any alternative option to this?
Here is my thread email sending function:
@Override
public void run() {
Destination destination = new Destination().withToAddresses(new String[] { this.TO });
Content subject = new Content().withData(SUBJECT);
Content textBody = new Content().withData(BODY);
Body body = new Body().withHtml(textBody);
Message message = new Message().withSubject(subject).withBody(body);
SendEmailRequest request = new SendEmailRequest().withSource(FROM).withDestination(destination).withMessage(message);
Connection connection = new Connection();
java.util.Date dt = new java.util.Date();
java.text.SimpleDateFormat sdf = new java.text.SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
String insert = "";
try {
System.out.println("Attempting to send an email to " + this.TO);
ctr++;
client.sendEmail(request);
insert = "INSERT INTO email_histories (campaign_id,contact_id,group_id,is_opened,mail_sent_at,mail_opened_at,ip_address,created_at,updated_at,is_sent) VALUES (" + this.campaign_id + ", " + this.contact_id + ", " + this.group_id + ", false, '" + sdf.format(dt) + "', null, null, '" + sdf.format(dt) + "', '" + sdf.format(dt) + "', true);";
connection.insert(insert);
System.out.println("Email sent! to " + this.TO);
} catch (Exception ex) {
System.out.println("The email was not sent.");
System.out.println("Error message: " + ex.getMessage());
}
}

I have no experience with RabbitMQ, so I'll have to leave that for others to answer.
Or is there any alternative option to this?
Instead of using one thread per mail, move that code into a Runnable and submit it to a thread pool. Add a shared Semaphore whose number of permits equals the number of mails you want to send per second. Take one permit per mail, and refill the permits every second from another thread (e.g. a separate ScheduledExecutorService or a Timer). Then adjust the executor's thread pool size to whatever your server can handle.
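A minimal sketch of that semaphore idea, using only the JDK. The class name, the pool size of 8 and the `sent::incrementAndGet` stand-in for `client.sendEmail(request)` are illustrative assumptions, not part of the original code; in real use each submitted task would build and send one SES request.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class RateLimitedSender {
    private final Semaphore permits;
    private final ExecutorService workers;
    private final ScheduledExecutorService refiller = Executors.newSingleThreadScheduledExecutor();

    public RateLimitedSender(int permitsPerSecond, int poolSize) {
        this.permits = new Semaphore(permitsPerSecond);
        this.workers = Executors.newFixedThreadPool(poolSize);
        // Once per second, top the semaphore back up to permitsPerSecond.
        // Only this thread releases permits, so the count never exceeds the limit.
        refiller.scheduleAtFixedRate(() -> {
            int missing = permitsPerSecond - permits.availablePermits();
            if (missing > 0) {
                permits.release(missing);
            }
        }, 1, 1, TimeUnit.SECONDS);
    }

    /** Queues one send; the pool thread blocks until a permit is available. */
    public Future<?> submit(Runnable sendOneMail) {
        return workers.submit(() -> {
            try {
                permits.acquire(); // one permit per mail
                sendOneMail.run();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
    }

    /** Lets queued sends finish, then stops the refill timer. */
    public void shutdown() throws InterruptedException {
        workers.shutdown();
        workers.awaitTermination(1, TimeUnit.MINUTES);
        refiller.shutdownNow();
    }

    public static void main(String[] args) throws InterruptedException {
        RateLimitedSender sender = new RateLimitedSender(50, 8);
        AtomicInteger sent = new AtomicInteger();
        long start = System.nanoTime();
        for (int i = 0; i < 120; i++) {
            sender.submit(sent::incrementAndGet); // stand-in for client.sendEmail(request)
        }
        sender.shutdown();
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(sent.get() + " sends in about " + elapsedMs + " ms");
    }
}
```

Because the pool size and the send rate are now independent, the pool can be sized for the server while the semaphore alone enforces the per-second limit.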

From a RabbitMQ perspective, each connection consumes a small and fairly constant amount of memory and network resources. If you use a pool of worker threads to read from the RabbitMQ queue or queues, you may save some resources because you are no longer creating and garbage-collecting individual threads. As far as alternatives are concerned, I would use a thread pool in any case. Although perhaps too heavyweight for your use, the Spring Framework has a very good thread pool that I have used in the past.
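To illustrate the thread-reuse point, here is a small sketch with the JDK's own ThreadPoolExecutor rather than Spring's pool (Spring's ThreadPoolTaskExecutor wraps one of these). The class name, pool size and queue capacity are assumptions chosen for the example.

```java
import java.util.Set;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class PooledWorkers {
    /**
     * Fixed-size pool with a bounded queue. When the queue is full, the submitting
     * thread runs the task itself (CallerRunsPolicy), which slows the producer down
     * instead of letting queued work pile up in memory.
     */
    static ThreadPoolExecutor newBoundedPool(int threads, int queueCapacity) {
        return new ThreadPoolExecutor(
                threads, threads,            // core == max: a fixed number of reused threads
                0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueCapacity),
                new ThreadPoolExecutor.CallerRunsPolicy());
    }

    /** Runs `tasks` small jobs and returns the names of the threads that executed them. */
    static Set<String> runJobs(int threads, int tasks) throws InterruptedException {
        ThreadPoolExecutor pool = newBoundedPool(threads, 100);
        Set<String> threadNames = ConcurrentHashMap.newKeySet();
        for (int i = 0; i < tasks; i++) {
            pool.execute(() -> threadNames.add(Thread.currentThread().getName()));
        }
        pool.shutdown();
        pool.awaitTermination(30, TimeUnit.SECONDS);
        return threadNames;
    }

    public static void main(String[] args) throws InterruptedException {
        Set<String> names = runJobs(4, 10_000);
        // Thousands of jobs, but only the 4 pool threads (plus, at times, the caller) ran them.
        System.out.println("distinct threads used: " + names.size());
    }
}
```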


How to improve the performance of iterating over 130 items and uploading them to AWS S3

I have to iterate over 130 Data Transfer Objects, and for each one generate a JSON document to be uploaded to AWS S3.
Without any improvements, the whole process takes around 90 seconds. I tried both with and without a lambda, with the same results for both:
for(AbstractDTO dto: dtos) {
try {
processDTO(dealerCode, yearPeriod, monthPeriod, dto);
} catch (FileAlreadyExistsInS3Exception e) {
failedToUploadDTOs.add(e.getLocalizedMessage() + ": " + dto.fileName() + ".json");
}
}
dtos.stream().forEach(dto -> {
try {
processDTO(dealerCode, yearPeriod, monthPeriod, dto);
} catch (FileAlreadyExistsInS3Exception e) {
failedToUploadDTOs.add(e.getLocalizedMessage() + ": " + dto.fileName() + ".json");
}
});
After some investigation, I concluded that the processDTO method takes around 650 ms per item to run.
My first attempt was to use parallel streams, and the results were pretty good, taking around 15 seconds to complete the whole process:
dtos.parallelStream().forEach(dto -> {
try {
processDTO(dealerCode, yearPeriod, monthPeriod, dto);
} catch (FileAlreadyExistsInS3Exception e) {
failedToUploadDTOs.add(e.getLocalizedMessage() + ": " + dto.fileName() + ".json");
}
});
But I still need to decrease that time.
I researched how to improve parallel streams and discovered the custom ForkJoinPool trick:
ForkJoinPool forkJoinPool = new ForkJoinPool(PARALLELISM_NUMBER);
forkJoinPool.submit(() ->
dtos.parallelStream().forEach(dto -> {
try {
processDTO(dealerCode, yearPeriod, monthPeriod, dto);
} catch (FileAlreadyExistsInS3Exception e) {
failedToUploadDTOs.add(e.getLocalizedMessage() + ": " + dto.fileName() + ".json");
}
})).get();
forkJoinPool.shutdown();
Unfortunately, the results were a bit confusing for me.
When PARALLELISM_NUMBER is 8, it takes around 13 seconds to complete the whole process. Not a big improvement.
When PARALLELISM_NUMBER is 16, it takes around 8 seconds to complete the whole process.
When PARALLELISM_NUMBER is 32, it takes around 5 seconds to complete the whole process.
All tests were done using Postman requests that call the controller method, which ends up iterating over the 130 items.
I'm satisfied with 5 seconds, using 32 as PARALLELISM_NUMBER, but I'm worried about the consequences.
Is it ok to keep 32?
What is the ideal PARALLELISM_NUMBER?
What do I have to keep in mind when deciding its value?
I'm running on a 2.2 GHz i7 Mac:
sysctl hw.physicalcpu hw.logicalcpu
hw.physicalcpu: 4
hw.logicalcpu: 8
Here's what processDTO does:
private void processDTO(int dealerCode, int yearPeriod, int monthPeriod, AbstractDTO dto) throws FileAlreadyExistsInS3Exception {
String flatJson = JsonFlattener.flatten(new JSONObject(dto).toString());
String jsonFileName = dto.fileName() + JSON_TYPE;
String jsonFilePath = buildFilePathNew(dto.endpoint(), dealerCode, yearPeriod, monthPeriod, AWS_S3_JSON_ROOT_FOLDER);
uploadFileToS3(jsonFilePath + jsonFileName, flatJson);
}
public void uploadFileToS3(String fileName, String fileContent) throws FileAlreadyExistsInS3Exception {
if (s3client.doesObjectExist(bucketName, fileName)) {
throw new FileAlreadyExistsInS3Exception(ErrorMessages.FILE_ALREADY_EXISTS_IN_S3.getMessage());
}
s3client.putObject(bucketName, fileName, fileContent);
}

The parallelism parameter decides how many threads the ForkJoinPool will use. By default (new ForkJoinPool() with no arguments), the parallelism value is the available CPU core count:
Math.min(MAX_CAP, Runtime.getRuntime().availableProcessors())
In your case the bottleneck should be checking that a file exists and uploading it to S3. The time this takes depends on at least a few factors: CPU, network card and driver, operating system, and others. It seems that the S3 network operations are not CPU-bound in your case, since you observe an improvement by creating more simultaneous worker threads; perhaps the network requests are being queued by the operating system.
The right value for parallelism varies from one workload type to another. A CPU-bound workload is better off with the default parallelism equal to the number of CPU cores, due to the negative impact of context switching. A non-CPU-bound workload like yours can be sped up with more worker threads, assuming the workload doesn't occupy the CPU, e.g. by busy waiting.
There is no one single ideal value for parallelism in ForkJoinPool.
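One way to act on this without relying on the common pool is to give the I/O-bound uploads their own fixed thread pool, sized with a common rule of thumb from Java Concurrency in Practice: threads ≈ cores × (1 + wait time / compute time). The sketch below assumes roughly 600 ms of waiting and 50 ms of CPU per item (numbers picked to resemble the timings in the question, not measured), and `fakeUpload` is a hypothetical stand-in for processDTO. It also prints the default parallelism of a `new ForkJoinPool()` and of the common pool used by parallel streams; the latter defaults to one less than the number of available processors (minimum 1) unless overridden with the `java.util.concurrent.ForkJoinPool.common.parallelism` system property.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ForkJoinPool;

public class IoBoundPool {
    /** Rule of thumb: threads ≈ cores * (1 + waitTime / computeTime). */
    static int ioPoolSize(int cores, double waitMs, double computeMs) {
        return (int) Math.ceil(cores * (1 + waitMs / computeMs));
    }

    /** Hypothetical stand-in for processDTO: mostly waiting on the network. */
    static String fakeUpload(int id) {
        try {
            Thread.sleep(100); // simulated S3 round trip
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "dto-" + id;
    }

    /** Runs every upload on a dedicated pool and returns the results in submission order. */
    static List<String> uploadAll(int items, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<CompletableFuture<String>> futures = new ArrayList<>();
            for (int i = 0; i < items; i++) {
                final int id = i;
                futures.add(CompletableFuture.supplyAsync(() -> fakeUpload(id), pool));
            }
            List<String> results = new ArrayList<>();
            for (CompletableFuture<String> f : futures) {
                results.add(f.join());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        // Default parallelism values (machine-dependent)
        System.out.println("new ForkJoinPool() parallelism: " + new ForkJoinPool().getParallelism());
        System.out.println("commonPool parallelism: " + ForkJoinPool.commonPool().getParallelism());

        int threads = ioPoolSize(cores, 600, 50); // assumed: ~600 ms waiting, ~50 ms CPU per item
        long start = System.nanoTime();
        List<String> done = uploadAll(130, threads);
        System.out.printf("%d items on %d threads in %d ms%n",
                done.size(), threads, (System.nanoTime() - start) / 1_000_000);
    }
}
```

A dedicated pool also keeps these blocking uploads from occupying the common pool, which every other parallel stream in the same JVM shares.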

I managed to reduce it to 8 seconds thanks to all your helpful advice and explanations.
Since the bottleneck was the upload to AWS S3, and you mentioned a non-blocking API in AWS, I did some research and found that the TransferManager class provides a non-blocking upload.
So instead of using ForkJoinPool to increase the number of threads, I kept the simple parallelStream:
dtos.parallelStream().forEach(dto -> {
try {
processDTO(dealerCode, yearPeriod, monthPeriod, dto);
} catch (FileAlreadyExistsInS3Exception e) {
failedToUploadDTOs.add(e.getLocalizedMessage() + ": " + dto.fileName() + ".json");
}
});
The upload method changed a bit: instead of calling putObject on the AmazonS3 client, I used the TransferManager:
public Upload uploadAsyncFileToS3(String fileName, String fileContent) throws FileAlreadyExistsInS3Exception {
if (s3client.doesObjectExist(bucketName, fileName)) {
throw new FileAlreadyExistsInS3Exception(ErrorMessages.FILE_ALREADY_EXISTS_IN_S3.getMessage());
}
// Encode once, with an explicit charset, so the declared length matches the bytes sent
byte[] bytes = fileContent.getBytes(java.nio.charset.StandardCharsets.UTF_8);
InputStream targetStream = new ByteArrayInputStream(bytes);
ObjectMetadata metadata = new ObjectMetadata();
metadata.setContentLength(bytes.length);
// upload() returns immediately; the transfer proceeds in the background
return transferManager.upload(bucketName, fileName, targetStream, metadata);
}
This way, the upload call doesn't wait for the transfer to finish, so the next DTO can be processed. When all DTOs are processed, I check each upload's status for possible errors (outside the first forEach).
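The same fire-first, check-later structure can be sketched without the AWS SDK using CompletableFuture; `uploadAsync` below is a hypothetical stand-in that fails for every seventh id. With TransferManager, calling waitForCompletion() on each collected Upload would play the role of the wait phase. One caveat that applies to the code in the question as well: failedToUploadDTOs is appended to from a parallel stream, so it should be a thread-safe collection (for example a ConcurrentLinkedQueue or Collections.synchronizedList); a plain ArrayList can silently lose entries or throw under concurrent adds.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class FireThenCheck {
    /** Hypothetical async upload; fails for ids divisible by 7 to simulate errors. */
    static CompletableFuture<Void> uploadAsync(int id, Executor executor) {
        return CompletableFuture.runAsync(() -> {
            if (id % 7 == 0) {
                throw new IllegalStateException("upload failed for dto-" + id);
            }
        }, executor);
    }

    static List<String> uploadAllAndCollectFailures(int items) {
        ExecutorService pool = Executors.newFixedThreadPool(8);
        // May be written from several threads at once, so it must be thread-safe
        Queue<String> failures = new ConcurrentLinkedQueue<>();
        try {
            // Phase 1: start every upload without waiting for any of them
            List<CompletableFuture<Void>> uploads = new ArrayList<>();
            for (int i = 1; i <= items; i++) {
                final int id = i;
                uploads.add(uploadAsync(id, pool).whenComplete((ok, err) -> {
                    if (err != null) {
                        Throwable cause = (err instanceof CompletionException && err.getCause() != null)
                                ? err.getCause() : err;
                        failures.add("dto-" + id + ".json: " + cause.getMessage());
                    }
                }));
            }
            // Phase 2: wait for all of them; errors were already recorded above
            CompletableFuture.allOf(uploads.toArray(new CompletableFuture<?>[0]))
                    .exceptionally(e -> null)
                    .join();
        } finally {
            pool.shutdown();
        }
        return new ArrayList<>(failures);
    }

    public static void main(String[] args) {
        List<String> failed = uploadAllAndCollectFailures(130);
        System.out.println(failed.size() + " failed uploads");
        failed.forEach(System.out::println);
    }
}
```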

Vert.x performance drop when starting with -cluster option

I'm wondering if anyone has experienced the same problem.
We have a Vert.x application whose ultimate purpose is to insert 600 million rows into a Cassandra cluster. We are testing the speed of Vert.x in combination with Cassandra by running tests with smaller amounts of data.
If we run the fat jar (built with the Shade plugin) without the -cluster option, we are able to insert 10 million records in about a minute. When we add the -cluster option (eventually we will run the Vert.x application in a cluster), it takes about 5 minutes to insert 10 million records.
Does anyone know why?
We know that the Hazelcast config will create some overhead, but never thought it would be 5 times slower. This implies we would need 5 EC2 instances in a cluster to get the same result as 1 EC2 instance without the cluster option.
As mentioned, everything runs on EC2 instances:
2 Cassandra servers on t2.small
1 Vert.x server on t2.2xlarge

You are actually running into corner cases of the Vert.x Hazelcast Cluster manager.
First of all, you are using a worker verticle to send your messages (30 million of them). Under the hood Hazelcast is blocking, and version 3.3.3 does not take that into account when you send a message from a worker. We recently added this fix: https://github.com/vert-x3/issues/issues/75 (not present in 3.4.0.Beta1 but present in 3.4.0-SNAPSHOT), which will improve this case.
Second, when you send all your messages at the same time, you run into another corner case that prevents the Hazelcast cluster manager from using its cache of the cluster topology. This topology cache is usually updated after the first message has been sent, and sending all the messages in one shot prevents the cache from being used (short explanation: HazelcastAsyncMultiMap#getInProgressCount will be > 0, which prevents the cache from being used), so you pay the penalty of an expensive lookup (which is why the cache exists).
If I use Bertjan's reproducer with 3.4.0-SNAPSHOT + Hazelcast and make the following change (send one message to the destination and wait for the reply; upon reply, send all the messages), I get a big improvement:
Without clustering: 5852 ms
With clustering, HZ 3.3.3: 16745 ms
With clustering, HZ 3.4.0-SNAPSHOT + initial message: 8609 ms
I also believe you should not use a worker verticle to send that many messages; instead, send them in batches from an event-loop verticle. Perhaps you could explain your use case, and we can think about the best way to solve it.
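A rough sketch of the batching idea, written against a plain single-threaded ExecutorService so that it runs without Vert.x: each batch is a separate task, and each batch queues the next one, so other work on the same thread can run in between. In a Vert.x event-loop verticle the analogous step would presumably be scheduling each batch with vertx.runOnContext(...); that wiring, the batch size of 1,000 and the class name are assumptions, not something the answer specifies.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.IntConsumer;

public class BatchedSender {
    /**
     * Sends ids [from, to) in batches of batchSize. Each batch is a separate task on a
     * single-threaded executor (a stand-in for an event loop), and each batch queues
     * the next one, so other queued tasks can run between batches.
     */
    static void sendInBatches(ExecutorService loop, int from, int to, int batchSize,
                              IntConsumer send, CountDownLatch done) {
        loop.execute(() -> {
            int end = Math.min(from + batchSize, to);
            for (int i = from; i < end; i++) {
                send.accept(i);
            }
            if (end < to) {
                sendInBatches(loop, end, to, batchSize, send, done); // queue the next batch
            } else {
                done.countDown();
            }
        });
    }

    public static void main(String[] args) throws InterruptedException {
        ExecutorService loop = Executors.newSingleThreadExecutor();
        List<Integer> sent = new ArrayList<>(); // only touched by the loop thread until done
        CountDownLatch done = new CountDownLatch(1);
        sendInBatches(loop, 1, 10_001, 1_000, sent::add, done);
        done.await();
        loop.shutdown();
        System.out.println("sent " + sent.size() + " messages");
    }
}
```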

When you enable clustering (of any kind) in an application, you make it more resilient to failures, but you also add a performance penalty.
For example your current flow (without clustering) is something like:
client ->
vert.x app ->
in-memory, same-process event bus (negligible) ->
handler -> cassandra
<- vert.x app
<- client
Once you enable clustering:
client ->
vert.x app ->
serialize request ->
network request cluster member ->
deserialize request ->
handler -> cassandra
<- serialize response
<- network reply
<- deserialize response
<- vert.x app
<- client
As you can see, there are many encode/decode operations plus several network calls, and all of this adds to your total request time.
To achieve the best performance you need to take advantage of locality: the closer you are to your data store, usually the faster.

Just to add the code of the project; I guess that would help.
Sender verticle:
public class ProviderVerticle extends AbstractVerticle {
@Override
public void start() throws Exception {
IntStream.range(1, 30000001).parallel().forEach(i -> {
vertx.eventBus().send("clustertest1", Json.encode(new TestCluster1(i, "abc", LocalDateTime.now())));
});
}
@Override
public void stop() throws Exception {
super.stop();
}
}
And the inserter verticle:
public class ReceiverVerticle extends AbstractVerticle {
private int messagesReceived = 1;
private Session cassandraSession;
@Override
public void start() throws Exception {
PoolingOptions poolingOptions = new PoolingOptions()
.setCoreConnectionsPerHost(HostDistance.LOCAL, 2)
.setMaxConnectionsPerHost(HostDistance.LOCAL, 3)
.setCoreConnectionsPerHost(HostDistance.REMOTE, 1)
.setMaxConnectionsPerHost(HostDistance.REMOTE, 3)
.setMaxRequestsPerConnection(HostDistance.LOCAL, 20)
.setMaxQueueSize(32768)
.setMaxRequestsPerConnection(HostDistance.REMOTE, 20);
Cluster cluster = Cluster.builder()
.withPoolingOptions(poolingOptions)
.addContactPoints(ClusterSetup.SEEDS)
.build();
System.out.println("Connecting session");
cassandraSession = cluster.connect("kiespees");
System.out.println("Session connected:\n\tcluster [" + cassandraSession.getCluster().getClusterName() + "]");
System.out.println("Connected hosts: ");
cassandraSession.getState().getConnectedHosts().forEach(host -> System.out.println(host.getAddress()));
PreparedStatement prepared = cassandraSession.prepare(
"insert into clustertest1 (id, value, created) " +
"values (:id, :value, :created)");
PreparedStatement preparedTimer = cassandraSession.prepare(
"insert into timer (name, created_on, amount) " +
"values (:name, :createdOn, :amount)");
BoundStatement timerStart = preparedTimer.bind()
.setString("name", "clusterteststart")
.setInt("amount", 0)
.setTimestamp("createdOn", new Timestamp(new Date().getTime()));
cassandraSession.executeAsync(timerStart);
EventBus bus = vertx.eventBus();
System.out.println("Bus info: " + bus.toString());
MessageConsumer<String> cons = bus.consumer("clustertest1");
System.out.println("Consumer info: " + cons.address());
System.out.println("Waiting for messages");
cons.handler(message -> {
TestCluster1 tc = Json.decodeValue(message.body(), TestCluster1.class);
if (messagesReceived % 100000 == 0)
System.out.println("Message received: " + messagesReceived);
BoundStatement boundRecord = prepared.bind()
.setInt("id", tc.getId())
.setString("value", tc.getValue())
.setTimestamp("created", new Timestamp(new Date().getTime()));
cassandraSession.executeAsync(boundRecord);
if (messagesReceived % 100000 == 0) {
BoundStatement timerStop = preparedTimer.bind()
.setString("name", "clusterteststop")
.setInt("amount", messagesReceived)
.setTimestamp("createdOn", new Timestamp(new Date().getTime()));
cassandraSession.executeAsync(timerStop);
}
messagesReceived++;
//message.reply("OK");
});
}
@Override
public void stop() throws Exception {
super.stop();
cassandraSession.close();
}
}

Random occurrences of java.net.ConnectException

I'm experiencing java.net.ConnectException at random.
My servlet runs in Tomcat 6.0 (JDK 1.6).
The servlet periodically fetches data from 4-5 third-party web servers.
The servlet uses a ScheduledExecutorService to fetch the data.
Run locally, all is fine and dandy. Run on my prod server, I see semi-random failures to fetch data from 1 of the third parties (Canadian weather data).
These are the URLs that are failing (plain RSS feeds):
http://weather.gc.ca/rss/city/pe-1_e.xml
http://weather.gc.ca/rss/city/pe-2_e.xml
http://weather.gc.ca/rss/city/pe-3_e.xml
http://weather.gc.ca/rss/city/pe-4_e.xml
http://weather.gc.ca/rss/city/pe-5_e.xml
http://weather.gc.ca/rss/city/pe-6_e.xml
http://meteo.gc.ca/rss/city/pe-1_f.xml
http://meteo.gc.ca/rss/city/pe-2_f.xml
http://meteo.gc.ca/rss/city/pe-3_f.xml
http://meteo.gc.ca/rss/city/pe-4_f.xml
http://meteo.gc.ca/rss/city/pe-5_f.xml
http://meteo.gc.ca/rss/city/pe-6_f.xml
Strange: each cycle, when I periodically fetch this data, the success/fail is all over the map: some succeed, some fail, but it never seems to be the same twice. So, I'm not completely blocked, just randomly blocked.
I slowed down my fetches by introducing a 61s pause between each one. That had no effect.
The guts of the code that does the actual fetch:
private static final int TIMEOUT = 60*1000; //msecs
public String fetch(String aURL, String aEncoding /*UTF-8*/) {
String result = "";
long start = System.currentTimeMillis();
Scanner scanner = null;
URLConnection connection = null;
try {
URL url = new URL(aURL);
connection = url.openConnection(); //this doesn't talk to the network yet
connection.setConnectTimeout(TIMEOUT);
connection.setReadTimeout(TIMEOUT);
connection.connect(); //actually connects; this shouldn't be needed here
scanner = new Scanner(connection.getInputStream(), aEncoding);
scanner.useDelimiter(END_OF_INPUT);
result = scanner.next();
}
catch (IOException ex) {
long end = System.currentTimeMillis();
long time = end - start;
fLogger.severe(
"Problem connecting to " + aURL + " Encoding:" + aEncoding +
". Exception: " + ex.getMessage() + " " + ex.toString() + " Cause:" + ex.getCause() +
" Connection Timeout: " + connection.getConnectTimeout() + "msecs. Read timeout:" +
connection.getReadTimeout() + "msecs."
+ " Time taken to fail: " + time + " msecs."
);
}
finally {
if (scanner != null) scanner.close();
}
return result;
}
Example log entry showing a failure:
SEVERE: Problem connecting to http://weather.gc.ca/rss/city/pe-5_e.xml Encoding:UTF-8.
Exception: Connection timed out java.net.ConnectException: Connection timed out
Cause:null
Connection Timeout: 60000msecs.
Read timeout:60000msecs.
Time taken to fail: 15028 msecs.
Note that the time to fail is always 15s + a tiny amount.
Also note that it fails well before reaching the configured 60s connection timeout.
The host-server admins (Environment Canada) state that they don't have any kind of a blacklist for the IP address of misbehaving clients.
Also important: the code had been running for several months without this happening.

Someone suggested that instead I should use curl, a bash script, and cron. I implemented that, and it works fine.
I'm not able to solve this problem using Java.

Db4o client doesn't seem to get commit events from other clients

I have a system with a db4o server that has two clients. One client is hosted in-process; the other is a web server hosting a number of servlets that need to query the database.
In the web server's connection code, I have registered for the commit event and use it to refresh objects, as suggested by the db4o documentation at http://community.versant.com/documentation/reference/db4o-8.0/java/reference/Content/advanced_topics/callbacks/possible_usecases/committed_event_example.htm:
client = Db4oClientServer.openClient (context.getBean ("db4oClientConfiguration", ClientConfiguration.class),
arg0.getServletContext ().getInitParameter ("databasehost"),
Integer.parseInt (arg0.getServletContext ().getInitParameter ("databaseport")),
arg0.getServletContext ().getInitParameter ("databaseuser"),
arg0.getServletContext ().getInitParameter ("databasepassword"));
System.out.println ("DB4O connection established");
EventRegistry events = EventRegistryFactory.forObjectContainer (client);
events.committed ().addListener (new EventListener4<CommitEventArgs> () {
public void onEvent (Event4<CommitEventArgs> commitEvent, CommitEventArgs commitEventArgs)
{
for (Iterator4<?> it = commitEventArgs.updated ().iterator (); it.moveNext ();)
{
LazyObjectReference reference = (LazyObjectReference) it.current ();
System.out.println ("Updated object: " + reference.getClass () + ":" + reference.getInternalID ());
//if (trackedClasses.contains (reference.getClass ()))
{
Object obj = reference.getObject ();
commitEventArgs.objectContainer ().ext ().refresh (obj, 1);
System.out.println (" => updated (" + obj + ")");
}
}
}
});
In the in-process client, the following code is then executed:
try {
PlayerCharacter pc = new PlayerCharacter (player, name);
pc.setBio(bio);
pc.setArchetype(archetype);
player.getCharacters ().add (pc);
database.store (pc);
database.store (player.getCharacters ());
database.store (player);
database.commit ();
con.sendEvent (id, "CHARACTER_CREATED".getBytes (Constants.CHARSET));
}
catch (Exception e)
{
con.sendEvent (id, EventNames.ERROR, e.toString ());
}
The 'CHARACTER_CREATED' event gets sent successfully, so I know that commit isn't throwing an exception, but nothing shows up on the other client. It continues to use the old versions of the objects, and the 'updated object' messages I'm expecting don't show up on the server console.
Any ideas what I'm doing wrong?

Apparently the .committed() event only fires on a client when the commit comes from another TCP client.
So you would need to turn your internal .openClient() / .openSession() clients into full-blown TCP clients to see the events.
The .openClient() / .openSession() object containers are much more lightweight and bypass all code related to network communication, apparently including event distribution across the network.

Turning onMessage() method into an atomic action

I've encountered a problem: if my method below fails or throws an exception, the message is still consumed. I want it to roll back in the catch block and place the message back on the queue/topic.
public void onMessage(Message message)
{
String messageId = null;
Date messagePublished = null;
try
{
messageId = message.getJMSMessageID();
messagePublished = new Date(message.getJMSTimestamp());
LOGGER.info("JMS Message id =" + messageId + " JMS Timestamp= " + messagePublished);
process(message);
LOGGER.info(" returning from onMessage() successfully =" + messageId + " JMS Timestamp= " + messagePublished);
}
catch(Throwable t)
{
LOGGER.error("Exception:",t);
LOGGER.error("Exception is unrecoverable."); // the stack trace was already logged above
throw new RuntimeException("Failed to handle message.",t);
}
}

You can look at the different acknowledgement modes that exist within JMS for this. See this article: http://www.javaworld.com/javaworld/jw-02-2002/jw-0315-jms.html.
The appropriate mode for you would be client acknowledge mode (Session.CLIENT_ACKNOWLEDGE).
So basically, the client acknowledges a message explicitly once it is happy that it has processed it.
You could call message.acknowledge() after the call to process(message). If an exception occurs in the process(message) method, the message will not have been dequeued, since you didn't acknowledge it. We used this approach before with Oracle AQ and it worked very well.
This approach means you don't have to worry about transactions for the messages on the queue (database transactions are another story). The only thing you need to ensure is that your app can handle a call to process(message) with potentially duplicate messages.
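A toy model of the acknowledge-after-processing idea and of the duplicate handling it requires. It is deliberately not the JMS API: in real JMS code you would create the session with `connection.createSession(false, Session.CLIENT_ACKNOWLEDGE)` and call `message.acknowledge()` after `process(message)`. Here the in-memory queue, the `applied` set used to detect redelivered messages, and all names are assumptions made for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Toy, single-threaded model of acknowledge-after-processing (not the JMS API):
 * a message leaves the queue only once it has been acknowledged, so a failure
 * before the ack leads to redelivery, and the handler must tolerate duplicates.
 */
public class AckAfterProcess {
    private final Deque<String> queue = new ArrayDeque<>();   // pending message ids
    private final Set<String> applied = new HashSet<>();      // ids whose effect is already applied
    final List<String> effects = new ArrayList<>();

    void publish(String id) { queue.addLast(id); }

    int pending() { return queue.size(); }

    /** Idempotent handler: processing the same message twice has no extra effect. */
    void process(String id) {
        if (applied.contains(id)) {
            return; // duplicate redelivery: skip the work
        }
        // In a real system these two writes would share one database transaction
        effects.add("handled " + id);
        applied.add(id);
    }

    /** One delivery attempt; the message is removed ("acknowledged") only after success. */
    void deliver(boolean crashBeforeAck) {
        String id = queue.peekFirst();
        if (id == null) {
            return;
        }
        try {
            process(id);
            if (crashBeforeAck) {
                throw new IllegalStateException("failed after processing, before the ack");
            }
            queue.pollFirst(); // acknowledge
        } catch (RuntimeException e) {
            // Not acknowledged: the message stays queued and will be delivered again
        }
    }

    public static void main(String[] args) {
        AckAfterProcess broker = new AckAfterProcess();
        broker.publish("m1");
        broker.deliver(true);  // work done, but the ack never happens
        broker.deliver(false); // redelivered: duplicate detected, then acknowledged
        System.out.println(broker.effects + ", still pending: " + broker.pending());
        // prints "[handled m1], still pending: 0"
    }
}
```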

You should be able to simply make your onMessage method transacted.
