We've developed a Java standalone program. We've configured in our Linux (RedHat ES 4) cron
schedule to execute this Java standalone every 10 minutes. Each standalone
may sometime take more than 1 hour to complete, or sometime it may complete
even within 5 minutes.
My problem/solution I'm looking for is, the number of Java standalones executing
at any time should not exceed, for example, 5 process. So, for example,
before even a Java standalone/process starts, if there are already 5 processes running,
then this process should not be started; otherwise this would indirectly start
creating OutOfMemoryError problems. How do I control this? I would also like to make this 5 process limit configurable.
Other Information:
I've also configured -Xms and -Xmx heap size settings.
Is there any tool/mechanism by which we can control this?
I also heard about Java Service Wrapper. What is this all about?
You can create 5 empty files (with names "1.lock",...,"5.lock") and make the app to lock one of them to execute (or exit if all files are already locked).
First, I am assuming you are using the words "thread" and "process" interchangably. Two ideas:
Have the cron job be a script that will check the currently running processes and count them. If less than threshold spawn new process, otherwise exit, here threshold can be defined in your script.
Have the main method in your executing java file check some external resource (a file, database table, etc) for a count of running processes, if it is below threshold increment and start process, otherwise exit (this is assuming the simple main method will not be enough to cause your OOME problem). You may also need to use an appropriate locking mechanism on the external resource (though if your job is every 10 minutes, this may be overkill), here you could defin threshold in a .properties, or some other configuration file for your program.
Java Service Wrapper helps you set up a java program as a Windows service or a *nix daemon. It doesn't really deal with the concurrency issue you are looking at--the closest thing is a config setting that disallows concurrent instances if its a Windows service.
Related
I am running a task via Docker on AWS's ECS. The task does some calculations which are CPU-bound, which I would like to run in parallel. I start a thread pool with the number of threads specified in Runtime.getRuntime().availableProcessors() which works fine locally on my PC. For some reason, on AWS ECS, this always returns 1, even though there are multiple cores available. Therefore my calculations run serially, and do not utilize the multiple cores.
For example, right now, I have a task running on a "t3.medium" instance which should have 2 cores according to the docs.
When I execute the following code:
System.out.println("Java reports " +
Runtime.getRuntime().availableProcessors() + " cores");
Then the following gets displayed on the log:
Java reports 1 cores
I do not specify the cpu parameter in ECS's task definition. I see that in the list of tasks within the ECS Management Console it has a column for "CPU" which reads 0 for my task. I also notice that in the list of instances (= VMs) it lists "CPU available" as 2048 which presumably has something to do with the fact the VM has 2 cores.
I would like my Java program to see all cores that the VM has to offer. (As would normally be the case when a Java program runs on a computer without Docker).
How do I go about doing that?
Thanks to #stdunbar in the comments for pointing me in the right direction.
EDIT: Thanks to #Imran in the comments. If you start lots of threads, they will absolutely be scheduled to multiple cores. This answer is only about getting Runtime.getRuntime().availableProcessors() to return the right value. Many "thread pools" start as many threads as that method returns: it should return the number of cores available.
There seem to be two main solutions, neither of which is ideal:
Set the cpu parameter in the task definition. For example, if you have 2 cores and want to use them both you have to set "cpu":2048 in the task's definition. This isn't very convenient for two reasons:
If you choose a bigger instance, you have to make sure to update this parameter.
If you want to have two tasks running simultaneously, both of which can sporadically use all cores for short-term activities, AWS will not schedule two tasks on a 2-core system with "cpu":2048. It says the VM is "full" from a CPU perspective. This goes against the timesharing (Unix etc.) philosophy of every task taking what it needs (for example, imagine on a desktop PC, if you run Word and Excel on a dual-core computer, and Windows wouldn't allow you to start any other tasks, on the grounds that Word might need all of one core, and Excel might do too, so if another program might need all the core at the same time, there wouldn't be enough cores.)
Use the -XX:ActiveProcessorCount=xx JVM option in JDK 10 onwards, as described here. This isn't convenient because:
As above, you have to change the value if you change your instance type.
I wrote a longer blog post describing my findings here: https://www.databasesandlife.com/java-docker-aws-ecs-multicore/
I am using Tanuki Software Wrapper for building a java application as Windows Service . I follow the example Simple HelloWorldServer Java Class and it works fine . I have made configure in wrapper.conf file wrapper.ntservice.starttype = AUTO_START for automatically starting the service on windows system starting .
But i want that my service would be automatically started on every two hours , how can i do it , if any one has idea please help me .
Thanks a lot in advance .
Finally I have done through following configuration in the wrapper.conf file as
wrapper.pausable=TRUE
wrapper.pause-on-startup=TRUE
wrapper.timer.1.interval=minute=120
wrapper.timer.1.action=restart, resume
wrapper.on_exit.default=PAUSE
It basically pause the wrapper action after main jvm(java application) is closed , and then after 2 hours it automatically restart wrapper's local JVM and resume the required output with updated data .
Thanks to all for trying to help me .
It's better to keep your java application running, and schedule tasks from within your application.
E.g. use http://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ScheduledExecutorService.html
If you schedule a task in your main() method, a new Timer Thread is started, so the application will keep running after the main() has ended, and keep executing the scheduled task at the rate you specified.
Ajeet,
As GreyFairer said, it is usually a good idea to run tasks from within the JVM, especially if they happen often.
The Wrapper's ability to stop and start the JVM using the pausable feature definitely works as well. This approach can be better if your JVM is large, and the task it needs to complete is relatively infrequent. There is a bit of load to launch a JVM.
Relaunching the JVM as you are doing also has the benefit of allowing you to change the configuration for each invocation if you combine configuration include files with the wrapper.restart.reload_configuration=TRUE property. You can modify the include file as needed so each JVM runs with the needed information. (There are of course ways of getting the same results within a single JVM invocation if needed.)
Cheers, Leif
I have a .bat file in a Windows machine that starts our program by calling a main class of a Java executable(.Jar)
Now I need to run this every 30 mins.
I gone through several ways of doing it, but unable to decide which is better.
Scheduling through Windows scheduler or Using Java Timer. Which one to choose?
I want only one instance of the process running. If the previous process doesnt complete within 30min, i could wait.
Please let me know what to go for, based on my use case.
Thanks in advance.
You're better off using the Windows Scheduler. If there's a real risk of the process taking too long, you can create a file, or open a socket while the process is running and when another one tries to start up, it can detect that and simply quit. This would make it "miss" the 30m window (i.e. if the first job started at 12 and finished at 12:35, the next job would not start until 1).
But this way you don't have to worry at all about setting up long running processes, starting and stopping the java service, etc. The Windows scheduler just makes everything easier for something like this.
TimerTask is not a scheduling system, it is a library that provides tools for in-app scheduling. Seems that for your use-case you need a the system: you need it to run whether or not your app is running, you need reporting, etc. Windows Scheduler (or cron on unix/linux) is more appropriate for your needs.
My team built a Java application using the Hadoop libraries to transform a bunch of input files into useful output.
Given the current load a single multicore server will do fine for the coming year or so. We do not (yet) have the need to go for a multiserver Hadoop cluster, yet we chose to start this project "being prepared".
When I run this app on the command-line (or in eclipse or netbeans) I have not yet been able to convince it to use more that one map and/or reduce thread at a time.
Given the fact that the tool is very CPU intensive this "single threadedness" is my current bottleneck.
When running it in the netbeans profiler I do see that the app starts several threads for various purposes, but only a single map/reduce is running at the same moment.
The input data consists of several input files so Hadoop should at least be able to run 1 thread per input file at the same time for the map phase.
What do I do to at least have 2 or even 4 active threads running (which should be possible for most of the processing time of this application)?
I'm expecting this to be something very silly that I've overlooked.
I just found this: https://issues.apache.org/jira/browse/MAPREDUCE-1367
This implements the feature I was looking for in Hadoop 0.21
It introduces the flag mapreduce.local.map.tasks.maximum to control it.
For now I've also found the solution described here in this question.
I'm not sure if I'm correct, but when you are running tasks in local mode, you can't have multiple mappers/reducers.
Anyway, to set maximum number of running mappers and reducers use configuration options mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum by default those options are set to 2, so I might be right.
Finally, if you want to be prepared for multinode cluster go straight with running this in fully-distributed way, but have all servers (namenode, datanode, tasktracker, jobtracker, ...) run on a single machine
Just for clarification...
If hadoop runs in local mode you don't have parallel execution on a task level (except you're running >= hadoop 0.21 (MAPREDUCE-1367)). Though you can submit multiple jobs at once and these getting executed in parallel then.
All those
mapred.tasktracker.{map|reduce}.tasks.maximum
properties do only apply to the hadoop running in distributed mode!
HTH
Joahnnes
According to this thread on the hadoop.core-user email list, you'll want to change the mapred.tasktracker.tasks.maximum setting to the max number of tasks you would like your machine to handle (which would be the number of cores).
This (and other properties you may want to configure) is also documented in the main documentation on how to setup your cluster/daemons.
What you want to do is run Hadoop in "pseudo-distributed" mode. One machine, but, running task trackers and name nodes as if it were a real cluster. Then it will (potentially) run several workers.
Note that if your input is small Hadoop will decide it's not worth parallelizing. You may have to coax it by changing its default split size.
In my experience, "typical" Hadoop jobs are I/O bound, sometimes memory-bound, way before they are CPU-bound. You may find it impossible to fully utilize all the cores on one machine for this reason.
I have an application that has a license for a set number of cpus and I want to be able to set the number of cpus that java runs in to 1 before the check is done. I am running Solaris and have looked at pbind but thought that if I started the application and then used pbind it would have checked the license before it had set the number of CPUs that java could use.
Does anyone know a way of starting an application with a set number of CPUs on Solaris?
It is a workaround, but using Solaris 10 you could set up a zone with a single CPU available and then run the application inside that zone.
If you want to do testing without running the full application, this bit of Java is most likely what they are using to get the number of CPU's:
Runtime runtime = Runtime.getRuntime();
int nrOfProcessors = runtime.availableProcessors();
A full example here.
This isn't a complete solution, but might be enough to develop into one. There's definitely a point at which the java process exists (and thus can be controlled by pbind) and at which point it hasn't yet run the code to do the processor check. If you could pause the launch of the application itself until pbind had done its work, this should be OK (assuming that the pbind idea will work from the CPU-checking point of view).
One way to do this that should definitely pause the JVM at an appropriate place is the socket attach for remote debuggers and starting with suspend mode. If you pass the following arguments to the java invocation:
-Xdebug -Xrunjdwp:transport=dt_socket,address=8000,suspend=y,server=y
then the JVM will pause after starting the java process but before executing the main class, until a debugger/agent is attached to port 8000.
So perhaps it would be possible to use a wrapper script to start the program in the background with these parameters, sleep for a second or so, use pbind to set the number of processors to one for the java process, then attach and detach some agent to port 8000 (which will be enough to get Java to proceed with execution).
Flaws or potential hiccoughs in this idea would be whether running in debug mode would notably affect performance of your app (it doesn't seem to have a big impact in general), whether you can control some kind of no-op JDWP agent from the command line, and whether you're able to open ports on the machine. It's not something I've tried to automate before (though I've used something broadly similar in a manual way to increase the niceness of a Java process before letting it loose), so there might be other issues I've overlooked.
I think the most direct answer to your question is to use pbind to bind the running shell process, and then start Java from that shell. According to the man page the effects of pbind are inherited by processes that are created from a bound process. Try this:
% pbind -b 0 $$
% java ...
Googling over, I found that you are right, pbind binds processes to processors.
More info and examples at: http://docs.sun.com/app/docs/doc/816-5166/pbind-1m?a=view