We're nearing the end of development on a web app, and the client we're working with has suddenly sprung on us the fact that we will need to handle load balancing.
We have batch jobs running which would need to run on both servers, but we don't want them to overlap. They select rows from the database, process the objects, and merge them back into the database. One of these jobs MUST run at the same time each day, while the others run every n minutes. We have about a week at most to get something working, and we accept that it will become technical debt for us.
My question is: what quick-and-dirty hacks exist to get this working properly? We have a SQL Server 2008 instance and are running Java EE 6 on JBoss 5, which will be load balanced between two servers. We're using Spring 3 and JPA 2 backed by Hibernate, and the stock Spring scheduler to schedule and run the jobs. Help me, Obi-Wan Kenobi; you're my only hope!
On JBoss 5, the simplest solution is to use the Scheduler API. The implementation is built on top of Quartz, and generally you would use a clustered configuration like the one described here:
http://quartz-scheduler.org/documentation/quartz-2.x/configuration/ConfigJDBCJobStoreClustering
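For reference, a minimal sketch of such a clustered JDBCJobStore setup, bootstrapped programmatically against the shared SQL Server instance (the data source name, connection details, and credentials are placeholders; the same keys normally live in quartz.properties):

import java.util.Properties;
import org.quartz.Scheduler;
import org.quartz.impl.StdSchedulerFactory;

public class ClusteredSchedulerBootstrap {
    public static void main(String[] args) throws Exception {
        // Clustered Quartz job store backed by the shared database: each job
        // fires on exactly one node, so the two servers never overlap.
        Properties props = new Properties();
        props.put("org.quartz.scheduler.instanceName", "batchScheduler");
        props.put("org.quartz.scheduler.instanceId", "AUTO"); // unique id per node
        props.put("org.quartz.jobStore.class", "org.quartz.impl.jdbcjobstore.JobStoreTX");
        props.put("org.quartz.jobStore.driverDelegateClass", "org.quartz.impl.jdbcjobstore.MSSQLDelegate");
        props.put("org.quartz.jobStore.isClustered", "true");
        props.put("org.quartz.jobStore.clusterCheckinInterval", "20000");
        props.put("org.quartz.jobStore.dataSource", "myDS"); // placeholder name
        props.put("org.quartz.dataSource.myDS.driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver");
        props.put("org.quartz.dataSource.myDS.URL", "jdbc:sqlserver://dbhost;databaseName=quartz"); // assumed
        props.put("org.quartz.dataSource.myDS.user", "quartz"); // assumed
        props.put("org.quartz.dataSource.myDS.password", "secret"); // assumed

        Scheduler scheduler = new StdSchedulerFactory(props).getScheduler();
        scheduler.start();
    }
}

Since the question already uses Spring 3, the same properties can instead be handed to Spring's SchedulerFactoryBean; either way the Quartz tables need to be created in SQL Server first, using the DDL script that ships with the Quartz distribution.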
Almost 10 years after this question was asked, I had the same need, and the "quickest and dirtiest" solution for me was a load balancer using a shared file system, without any master.
Each worker locks, then picks, jobs from the shared file system, independently of the other workers. To balance load, each worker sleeps X seconds between job-polling iterations, where X is proportional to the load on the worker (in my case, the load is the count of processes the worker has started in the background). Thus a heavily loaded worker sleeps longer, giving the other workers a higher probability of picking up the next job. The worker loops run under supervisor (Linux).
My use case was the execution of sparklyr client-mode jobs on a Spark/Hadoop cluster without overloading the edge nodes. It was implemented as a bash script within a few hours, then scaled to 3 hosts, and has been stable for some months now, until there is time to invest in a better solution.
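The original was a bash script, but the lock-then-pick loop might look roughly like this in Java, assuming one file per job dropped into a shared directory (the mount point and the load metric are placeholders):

import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.*;

// Sketch of one worker: list job files, try to lock each, process and delete on
// success, then sleep in proportion to local load before polling again.
public class SharedFsWorker {
    public static void main(String[] args) throws Exception {
        Path jobDir = Paths.get("/mnt/shared/jobs"); // assumed shared mount
        while (true) {
            try (DirectoryStream<Path> jobs = Files.newDirectoryStream(jobDir, "*.job")) {
                for (Path job : jobs) {
                    tryProcess(job);
                }
            }
            // A heavily loaded worker sleeps longer, yielding jobs to idle workers.
            long load = countBackgroundProcesses(); // stub for the real load metric
            Thread.sleep(1000L * (1 + load));
        }
    }

    static void tryProcess(Path job) throws IOException {
        try (FileChannel ch = FileChannel.open(job, StandardOpenOption.WRITE);
             FileLock lock = ch.tryLock()) {
            if (lock == null) return; // another worker owns this job
            // ... run the job here, then remove the file so nobody re-runs it ...
            Files.delete(job);
        } catch (NoSuchFileException gone) {
            // another worker already picked it up and deleted it
        }
    }

    static long countBackgroundProcesses() {
        return 0; // stub: the original counted processes the worker had started
    }
}

Be aware that file locks over network filesystems are advisory at best, which is part of what keeps this approach in the quick-and-dirty category.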
Related
I have a Spring Batch app that runs on Tomcat 8.5.
This batch job works with lots of data (around ten million records) and is too slow.
I want to find the most time-consuming parts, e.g. database queries (socket I/O), thread blocking or waiting, CPU consumption, or garbage collection, that may be slowing the app down.
I'm mostly suspicious of the JDBC queries, i.e. socket I/O.
I tried using local partitioning to scale it up, giving Tomcat more memory, and increasing the commit interval in the Spring Batch settings.
I had a look at the Socket IO tab in JMC and logged the execution time of one of the methods it shows, but it only takes 15 to 30 milliseconds.
Another problem is that JMC only shows percentages, not exact times, so I could not figure out how long things actually take.
I'm a little confused.
Thanks very much in advance.
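One way to get exact numbers without JMC is to time the JDBC layer yourself. A minimal sketch using only the JDK (the TimingJdbc class and its log format are invented here, not part of any library):

import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.Statement;

// Sketch: wrap a Connection so the statements it hands out print wall-clock
// times for every execute* call.
public final class TimingJdbc {

    public static Connection wrap(Connection real) {
        return (Connection) proxy(real, Connection.class);
    }

    private static Object proxy(Object target, Class<?> iface) {
        return Proxy.newProxyInstance(iface.getClassLoader(), new Class<?>[] { iface },
                (unused, method, args) -> {
                    long start = System.nanoTime();
                    try {
                        Object result = method.invoke(target, args);
                        if (result instanceof PreparedStatement) { // wrap statements too
                            result = proxy(result, PreparedStatement.class);
                        } else if (result instanceof Statement) {
                            result = proxy(result, Statement.class);
                        }
                        return result;
                    } catch (InvocationTargetException e) {
                        throw e.getCause(); // rethrow the underlying SQLException
                    } finally {
                        if (method.getName().startsWith("execute")) {
                            System.out.printf("%s took %.3f ms%n",
                                    method.getName(), (System.nanoTime() - start) / 1e6);
                        }
                    }
                });
    }
}

Wrapping the connection your reader/writer obtains then gives per-query times you can compare against the overall chunk time, to confirm or rule out the JDBC suspicion.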
Question: How do I create a lightweight on-demand instance, preconfigured with Java 8 and my code, that pulls a task from a task queue, executes the memory-intensive task, and shuts itself down? (on-demand, high-memory, medium-CPU, single-task executors)
History: I was successfully using the Google App Engine Task Queue in Java for "bursty" processing of relatively rare events - maybe once a week someone would submit a form, the form would create ~10 tasks, the system would chew up some memory and CPU cycles thinking about the tasks for a few minutes, save the results, and the webpage would poll the backend for completion. It worked great within Google App Engine - auto scaling would remove all idle instances, Task Queues would handle getting the processing done, I'd make sure not to overload things by setting max-concurrent-requests=1, and life was good!
But then my tasks got too memory intensive for instance-class: F4_1G 😢 I'd love to pick something with more memory, but that isn't an option. So I need to figure something out.
I think my best bet is to spin up a generic instance using the API com.google.api.services.compute.model.Instance, but I get stuck there. I'm so spoiled by how easy the Task Queue was to build on that I'd hate to get lost in the weeds just to get a higher-memory instance - I don't need a cluster, and I don't need any sort of reliability!
Is this a Docker container thing?
Is it going to be hard, auth-wise, to pull from the Pull Queue outside of GAE?
Is it crazy to spin up/down an instance (container?) for each task if a task takes ~10 minutes?
I found some similar questions, but no answers that quite fit:
How to Consume App Engine Task Queue Via Compute Engine Instance
How do I integrate google app engine, taks queue and google compute engine?
I would have a read about GAE modules. These can be set to use basic scaling, so an instance gets created on demand and then expires some time later, as set by you in your appengine-web.xml using something such as:
<basic-scaling>
  <max-instances>2</max-instances>
  <idle-timeout>5m</idle-timeout>
</basic-scaling>
If the module processes requests from a task queue, then it has 10 minutes to get its job done, which is probably ample for many tasks.
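A hedged sketch of the combination, assuming a push queue whose target is the basic-scaling module (the servlet URL and queue name are made-up placeholders):

import java.io.IOException;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import com.google.appengine.api.taskqueue.Queue;
import com.google.appengine.api.taskqueue.QueueFactory;
import com.google.appengine.api.taskqueue.TaskOptions;

// Sketch: the handler runs in the basic-scaling module; App Engine starts an
// instance when a task arrives and tears it down after the idle timeout.
@WebServlet("/tasks/crunch") // placeholder URL; use web.xml on pre-3.1 servlet runtimes
public class CrunchTaskServlet extends HttpServlet {

    // Enqueue side (can live in the frontend module): push a task at the worker.
    public static void enqueue(String payload) {
        Queue queue = QueueFactory.getQueue("crunch-queue"); // placeholder queue name
        queue.add(TaskOptions.Builder.withUrl("/tasks/crunch").param("payload", payload));
    }

    @Override
    protected void doPost(HttpServletRequest req, HttpServletResponse resp) throws IOException {
        String payload = req.getParameter("payload");
        // ... do the memory-intensive work here ...
        resp.setStatus(HttpServletResponse.SC_OK); // non-2xx would make the queue retry
    }
}

Routing the task to the worker module is done with the queue's target setting in queue.xml, and max-concurrent-requests=1 applies exactly as before.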
I developed a plugin of my own for Neo4j in order to speed up the process of inserting nodes, mainly because I needed to insert nodes and relationships only if they didn't already exist, which can be too slow using the REST API.
If I call my plugin 100 times, inserting roughly 100 nodes and 100 relationships each time, it takes approximately 350 ms per call. Each call inserts different nodes, in order to rule out locking as the cause.
However, if I parallelize my calls (2, 3, 4... at a time), the response time degrades in proportion to the degree of parallelism: it takes 750 ms to insert my 200 objects when I make 2 calls at a time, 1000 ms when I make 3, etc.
I'm calling my plugin from a .NET MVC controller, using HttpWebRequest. I set maxConnection to 10000, and I can see all the TCP connections opened.
I investigated this issue a little, and something seems very wrong. I must have done something wrong, either in my Neo4j configuration or in my plugin code. Using VisualVM I found that the threads launched by Neo4j to handle my calls are working sequentially. See the picture linked below.
http://i.imgur.com/vPWofTh.png
My configuration:
Windows 8, 2 cores
8 GB of RAM
Neo4j 2.0M03 installed as a service, with no configuration tuning
I hope someone will be able to help me. As it is, I will be unable to use Neo4j in production, where there will be tens of concurrent calls, which cannot be handled sequentially.
Neo4j is transactional. Every commit triggers an I/O operation on the filesystem, which needs to run in a synchronized block; this explains the picture you've attached. Therefore it's best practice to run writes single-threaded. Any preprocessing beforehand can of course benefit from parallelization.
In general, for maximum performance, go with the stable version (1.9.2 as of today). Early milestone builds are not optimized yet, so you might get a misleading picture.
Another thing to consider is the transaction size used in your plugin. 10k to 50k operations in a single transaction should give you the best results. If your transactions are very small, the transactional overhead is significant; with huge transactions, you need lots of memory.
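As an illustration, the batching might look like this in an embedded plugin using the 1.9-style API (the name payload is a placeholder for whatever your plugin actually inserts):

import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.Transaction;

// Sketch: commit every ~20k operations instead of once per call, so the
// per-commit fsync is amortized over a large batch.
public class BatchedInserter {

    private static final int BATCH_SIZE = 20000; // in the 10k-50k sweet spot

    public static void insertAll(GraphDatabaseService db, Iterable<String> names) {
        Transaction tx = db.beginTx();
        try {
            int ops = 0;
            for (String name : names) {
                Node node = db.createNode();
                node.setProperty("name", name);
                if (++ops % BATCH_SIZE == 0) { // commit and start a fresh transaction
                    tx.success();
                    tx.finish();
                    tx = db.beginTx();
                }
            }
            tx.success();
        } finally {
            tx.finish();
        }
    }
}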
Write performance is heavily driven by the performance of the underlying I/O subsystem. If possible, use fast SSD drives; even better, stripe them.
I have two identical machines (COM1 and COM2), and both are single-core.
Both machines have CouchDB and Tomcat running.
My application queries the database via REST requests, and I implemented a thread pool of 10 to speed up the process. Each thread has its own database instance.
When I set my application to use the local database with the thread pool (the WAR file is on COM1, the database is on COM1), 30 queries take 431.83 milliseconds. The same configuration without the thread pool takes 823.83 milliseconds.
However, when I set it to use the remote database with the thread pool (the WAR is on COM1, the database is on COM2), 30 queries take 276.52 milliseconds. The same configuration without the thread pool takes 960.00 milliseconds.
My questions are:
Why am I getting a speed increase on a single core when I use a thread pool?
Why is the remote database configuration faster than the local one?
Thanks
Why am I getting a speed increase on a single core when I use a thread pool?
Threads aren't always doing stuff on the CPU. Some will be reading data from disk, network, memory, etc., and other threads can use the CPU in the meantime. If you have an espresso maker AND a milk steamer, letting two people work on producing cappuccinos will be faster than having one guy do it all.
Why is the remote database configuration faster than the local one?
If your query is CPU-intensive, it is conceivable that having two CPUs at hand gains enough performance that the loss to network latency is compensated for. That is, if making the espresso takes long enough, it makes sense to use the espresso maker on the next floor, even if you have to climb the stairs. Note that doing this with only one guy makes no sense; that's why you get 960 ms instead of 823 ms for that setup (i.e. useless stair climbing).
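For what it's worth, the measured pattern boils down to something like this sketch (the host, port, and document ids are placeholders for your CouchDB setup):

import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch: 30 blocking REST queries over a pool of 10 threads. While one thread
// waits on the network or disk, another can use the single core.
public class PooledQueries {
    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(10);
        List<Callable<Integer>> queries = new ArrayList<Callable<Integer>>();
        for (int i = 0; i < 30; i++) {
            final int id = i;
            queries.add(new Callable<Integer>() {
                public Integer call() throws Exception {
                    return fetch("http://com2:5984/mydb/doc-" + id); // placeholder URL
                }
            });
        }
        long start = System.nanoTime();
        List<Future<Integer>> results = pool.invokeAll(queries); // waits for all 30
        for (Future<Integer> f : results) f.get(); // surface any failures
        System.out.printf("30 queries took %.2f ms%n", (System.nanoTime() - start) / 1e6);
        pool.shutdown();
    }

    static int fetch(String url) throws Exception {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        InputStream in = conn.getInputStream();
        try {
            int total = 0;
            byte[] buf = new byte[8192];
            for (int n; (n = in.read(buf)) != -1; ) total += n;
            return total; // bytes read, just to have a result
        } finally {
            in.close();
        }
    }
}

Swapping newFixedThreadPool(10) for newFixedThreadPool(1) reproduces the "without threadpool" case, which makes the comparison easy to rerun.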
My team built a Java application using the Hadoop libraries to transform a bunch of input files into useful output.
Given the current load, a single multicore server will do fine for the coming year or so. We do not (yet) need a multi-server Hadoop cluster, but we chose to start this project "being prepared".
When I run this app on the command line (or in Eclipse or NetBeans), I have not yet been able to convince it to use more than one map and/or reduce thread at a time.
Given that the tool is very CPU-intensive, this "single-threadedness" is my current bottleneck.
When running it in the netbeans profiler I do see that the app starts several threads for various purposes, but only a single map/reduce is running at the same moment.
The input data consists of several input files so Hadoop should at least be able to run 1 thread per input file at the same time for the map phase.
What do I do to at least have 2 or even 4 active threads running (which should be possible for most of the processing time of this application)?
I'm expecting this to be something very silly that I've overlooked.
I just found this: https://issues.apache.org/jira/browse/MAPREDUCE-1367
This implements the feature I was looking for in Hadoop 0.21
It introduces the flag mapreduce.local.map.tasks.maximum to control it.
For now I've also found the solution described here in this question.
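On 0.21, using that flag per job might look like this (a sketch; the job name and the rest of the setup are placeholders):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class LocalParallelJob {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Flag from MAPREDUCE-1367: lets the local runner execute up to 4 maps at once.
        conf.setInt("mapreduce.local.map.tasks.maximum", 4);
        Job job = new Job(conf, "local-parallel"); // placeholder job name
        // ... set mapper, reducer, input and output paths as usual, then:
        // job.waitForCompletion(true);
    }
}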
I'm not sure if I'm correct, but when you are running tasks in local mode, you can't have multiple mappers/reducers.
Anyway, to set the maximum number of running mappers and reducers, use the configuration options mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum. By default those options are set to 2, so I might be right.
Finally, if you want to be prepared for a multi-node cluster, go straight to running this in fully distributed mode, but have all the servers (namenode, datanode, tasktracker, jobtracker, ...) run on a single machine.
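On such a single-machine "cluster", the two limits above would go into mapred-site.xml on the tasktracker host, something like this (4 is just an example; match it to your core count):

<configuration>
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>4</value>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>4</value>
  </property>
</configuration>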
Just for clarification...
If Hadoop runs in local mode, you don't have parallel execution at the task level (unless you're running Hadoop >= 0.21 with MAPREDUCE-1367). You can, however, submit multiple jobs at once, and those then get executed in parallel.
All those
mapred.tasktracker.{map|reduce}.tasks.maximum
properties apply only to Hadoop running in distributed mode!
HTH
Joahnnes
According to this thread on the hadoop.core-user email list, you'll want to change the mapred.tasktracker.tasks.maximum setting to the maximum number of tasks you would like your machine to handle (which would be the number of cores).
This (and other properties you may want to configure) is also documented in the main documentation on how to setup your cluster/daemons.
What you want to do is run Hadoop in "pseudo-distributed" mode: one machine, but running task trackers and name nodes as if it were a real cluster. Then it will (potentially) run several workers.
Note that if your input is small Hadoop will decide it's not worth parallelizing. You may have to coax it by changing its default split size.
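Coaxing it might look like this with the new API (a sketch; the 16 MB cap is an arbitrary example):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class SmallSplits {
    public static void main(String[] args) throws Exception {
        Job job = new Job(new Configuration(), "small-splits"); // placeholder name
        // Cap the split size so even a small input yields several map tasks.
        FileInputFormat.setMaxInputSplitSize(job, 16L * 1024 * 1024);
    }
}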
In my experience, "typical" Hadoop jobs are I/O bound, sometimes memory-bound, way before they are CPU-bound. You may find it impossible to fully utilize all the cores on one machine for this reason.