Remote debugging threads with different debuggers


I have an application that acts as a scheduler, running different threads.
The application can load new Runnable classes and run them.
The application is currently in production, that is, it runs on a remote server.
My team consists of 3 people who develop Runnable classes.
When a class is ready, it is uploaded to the server and loaded into the scheduler.
I would like to give my team the ability to debug specific threads.
That is: person A may debug the threads of Runnable A, person B those of Runnable B, and so on.
Giving them full access to the remote JVM is not a solution, because
the developers are not allowed to see the system core or each other's solutions.
So my question is: how can I allow multiple remote debugging sessions with thread-specific connections?
Preferred IDE: Eclipse
EDIT:
It's possible to connect remotely and work on a specific thread with jdb:
http://docs.oracle.com/javase/7/docs/technotes/tools/windows/jdb.html
Here is an example: http://www.itec.uni-klu.ac.at/~harald/CSE/Content/debugging.html
1) Find your thread with the jdb threads command
2) Set a breakpoint and switch to the wanted thread
Still, the security issue remains.
One solution was to compile the protected code without debug symbols, but that only protects the core; developers could still see each other's threads.
So the next step is digging into the Security Manager. Maybe there's a privilege layer suitable for my situation.

I'm not sure I've got a good answer to your question, but let's see how it pans out.
As I understand it, you want to allow different developers to debug their own class alone, and each class runs as a thread within a single Java process.
On the face of it, that runs counter to the nature of debugging, in that you normally have access to everything in the process. I don't imagine that Java is any different from any other language in this respect (I'm no Java programmer).
So how about running the classes in separate Java processes? That way I presume the standard Eclipse tools would allow each developer to attach remotely and debug their own class.
However, I presume that these classes need to interact with each other in some way, otherwise you wouldn't be asking your question in the first place. And running each class in a separate process (JVM) sounds like a bad thing as far as interaction is concerned.
So how about a different form of interaction where the process boundary between the classes doesn't really matter that much? You could look at using JCSP which, as far as I can tell, doesn't really care whether two threads are in the same process or not.
It's a completely different interaction model, based solely on synchronous message passing. You get some nice fringe benefits - scalability is suddenly no longer a massive problem, and it allows you to dodge many pitfalls normally associated with multithreaded programs (deadlock, etc). However if you've already written a large amount of code, adopting JCSP is probably a significant rewrite.
Is that anywhere near the mark? Good luck.
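The separate-process idea can be sketched roughly as follows. This is only an illustration under assumptions: the jar names, host class names, and port numbers are hypothetical, and the sketch just builds the command line for one JVM per developer, each with its own JDWP debug port, without starting anything.

```java
import java.util.List;

// Hypothetical launcher: builds the command line for running one developer's
// Runnable host in its own JVM with its own debug port. All names are made up.
class IsolatedTaskLauncher {

    /** Builds a `java` command that starts mainClass with a JDWP agent listening on debugPort. */
    static List<String> buildCommand(String classpath, String mainClass, int debugPort) {
        return List.of(
                "java",
                // Standard JDWP agent options: socket transport, act as server,
                // do not suspend the JVM while waiting for a debugger.
                "-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=" + debugPort,
                "-cp", classpath,
                mainClass);
    }

    public static void main(String[] args) {
        // One JVM and one port per developer; only the commands are printed here.
        System.out.println(String.join(" ",
                buildCommand("tasks-a.jar", "com.example.RunnableAHost", 8001)));
        System.out.println(String.join(" ",
                buildCommand("tasks-b.jar", "com.example.RunnableBHost", 8002)));
        // To actually start a process: new ProcessBuilder(command).inheritIO().start();
    }
}
```

Each developer would then attach Eclipse's remote debugger to their own port only. Note that on JDK 9 and later a bare port number listens on localhost only, so a remote attach would need an address of the form `*:8001` or a tunnel, plus whatever network-level access control the server uses.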

Related

What is an execution environment?

Given that I never took college courses on hardware or operating systems (I've only tried to follow some YouTube videos and read some online articles, without much success), could someone explain to me what an execution environment is? I'm studying Java and am now starting on multithreading. In a section of an Oracle tutorial I found this definition of process and thread: "Both processes and threads provide an execution environment". The problem is that I really don't get what this means.

In this context, "execution environment" means a context in which code can execute (run). A process can use multiple threads in parallel. A thread executes a single stack of code at a time.
This is a gross oversimplification, but hopefully you get the point.

The terms 'execution environment' and 'environment' are heavily overloaded. They mean different things in different contexts (though the meanings are related).
There are many layers of abstraction in a software system. Each layer depends on something from the layers immediately below it and is shielded from the details of the layers further down. For instance, computers depend fundamentally on physics, so at the bottom there is physics. On top of physics there are transistors. On top of transistors there are logic gates. On top of logic gates there are logic circuits. On top of that there is the microarchitecture, and on top of that the architecture (instruction set, registers, etc.). On top of that there is software written against the architecture. The software is further divided into the operating system and applications, and the OS and applications have layered parts themselves.
So everything depends on something else, and that something else is called the environment. In other words, everything your part does not implement but depends on is called its 'environment'.
The OS is an environment for applications because when an application performs I/O, it uses functionality provided by the operating system by making system calls.
Platforms such as the Java or C++ runtime provide further functionality to applications. Hence they are also called a form of environment.
In the context of processes and threads, the following can be said.
To improve hardware utilization, operating systems do not run just one program at a time. (Running code may block for several reasons, and during that time the OS runs another program.) Furthermore, there is something called 'timesharing', which means the OS allocates the CPU(s) to different programs for a limited amount of time, pausing and resuming them.
To do that, the OS isolates a specific instance of program execution through an abstraction called a 'process'. This gives each program an 'environment' it can use without having to think about the other programs that are running.
For instance, code running in one process cannot normally read data written to memory allocated to another process.
Threads are like processes to some extent, but they share the memory of the process they belong to. Namely, they share the heap of the process but have separate stacks. Since they have separate stacks, they can be executing different parts of the same code. From their stacks they hold references into the shared heap, which lets them communicate more efficiently than inter-process communication would.
To sum up, everything you benefit from, implicitly or explicitly, but do not implement yourself is called the 'environment'.
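The point about threads sharing the heap while keeping separate stacks can be illustrated with a small sketch. The class and thread names are invented for illustration: each thread has its own `local` counter on its stack, while both update one counter object that lives on the shared heap.

```java
import java.util.concurrent.atomic.AtomicInteger;

class SharedHeapDemo {

    /** Runs two threads over the same code and returns the final value of the shared counter. */
    static int runTwoThreads(int incrementsPerThread) throws InterruptedException {
        // One object on the heap, reachable from both threads.
        AtomicInteger shared = new AtomicInteger();

        Runnable work = () -> {
            int local = 0; // lives on the executing thread's own stack
            for (int i = 0; i < incrementsPerThread; i++) {
                local++;                  // private to this thread
                shared.incrementAndGet(); // visible to both threads
            }
            System.out.println(Thread.currentThread().getName() + " local=" + local);
        };

        Thread a = new Thread(work, "worker-a");
        Thread b = new Thread(work, "worker-b");
        a.start();
        b.start();
        a.join();
        b.join();
        return shared.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("shared=" + runTwoThreads(1000));
    }
}
```

Each thread reports its own `local` count, while the shared counter ends up holding the sum of both, because the two stacks are separate but the heap object is common.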

"In a distributed environment, one does not use multithreading" - Why?

I am working on a platform that hosts small Java applications, all of which currently use a single thread, live inside a Docker engine, consume data from a Kafka server, and log to a central DB.
Now I need to add another Java application to this platform. This app uses multithreading relatively heavily. I have already tested it inside a Docker container and it works perfectly there, so I'm ready to deploy it on the platform, where it would be scaled manually: a human would define the number of containers to start, each containing an instance of this app.
My architect has an objection, saying that "In a distributed environment we never use multithreading". So now I have to refactor my application to remove any thread-related logic and make it single-threaded. I asked him for more detailed reasoning, but he just yells "If you are not aware of this principle, you have no place near Java".
Is it really a mistake to use a multithreaded Java application in a distributed system: a simple cluster of ten or twenty physical machines, each hosting a number of virtual machines, which in turn run Docker containers with Java applications inside them?
Honestly, I don't see the problem with multithreading inside a container.
Is it really a mistake or somehow "forbidden"?
Thanks.

When you write for example a web application that will run in a Java EE application server, then normally you should not start up your own threads in your web application. The application server will manage threads, and will allocate threads to process incoming requests on the server.
However, there is no hard rule or reason why it is never a good idea to use multi-threading in a distributed environment.
There are advantages to making applications single-threaded: the code will be simpler and you won't have to deal with difficult concurrency issues.
But "in a distributed environment we never use multithreading" is not necessarily always true and "if you are not aware of this principle, you have no place near Java" sounds arrogant and condescending.

I guess he only tells you this as using a single thread eliminates multi threading and data ordering issues.
There is nothing wrong with multithreading though.

Distributed systems usually have tasks that are heavily I/O bound.
If I/O calls are blocking in your system:
- The only way to achieve concurrency within the process is to spawn new threads to do other useful work (multithreading).
- The caveat with this approach is that if there are too many threads in flight, the operating system will spend too much time context switching between them, which is wasted work.
If I/O calls are non-blocking in your system:
- You can avoid the multithreading approach and use a single thread to service all your requests (read about event loops, Java's Netty framework, or Node.js).
The upside of the single-threaded approach:
- The OS does not do any wasteful thread context switches.
- You will NOT run into concurrency problems like deadlocks or race conditions.
The downside:
- It is often harder to code and think in a non-blocking fashion.
- You typically end up using more memory in the form of blocking queues.
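For the blocking case, one common way to keep the number of threads in flight under control is a fixed-size pool. The following is a minimal sketch under assumptions: `blockingFetch` is a made-up stand-in for a real blocking I/O call, and the pool size is arbitrary.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class BlockingIoPool {

    // Hypothetical stand-in for a blocking I/O call (database, HTTP, ...).
    static String blockingFetch(int id) throws InterruptedException {
        Thread.sleep(50); // the calling thread is parked while "waiting for I/O"
        return "result-" + id;
    }

    /** Issues `requests` blocking calls using at most `poolSize` threads. */
    static List<String> fetchAll(int requests, int poolSize) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (int i = 0; i < requests; i++) {
                final int id = i;
                futures.add(pool.submit(() -> blockingFetch(id)));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> f : futures) {
                results.add(f.get()); // wait for each call, in submission order
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(fetchAll(8, 4));
    }
}
```

The pool size caps how many threads the OS has to switch between, which is the trade-off described above: more threads overlap more waiting, but too many make context switching the dominant cost.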

What? We use RxJava and Spring Reactor pretty heavily in our application and it works pretty fine. You can't work with threads across two JVMs anyway. So just make sure that your logic is working as you expect on a single JVM.

Use JNI library in Servlet container

I'm working on a web application, but I need to call certain proprietary C++ library functions. As I understand it, native methods are not thread safe, and it is therefore possible that an access violation in native code crashes the application server's JVM (Tomcat). This native API is a very small part of the overall web application's functionality; I would say only 5% of users will ever use it. No matter how thoroughly the application is tested (I don't have access to the native source code), there is a risk that a bug in the native library brings down the whole application server, logging out users and potentially causing downtime.
So the question is: which strategy is better?
1) Should I wrap the native library in a separate process so that the main web server is not impacted by a bug in native code? I could probably use UNIX sockets to communicate with this separate process from my web server (avoiding the overhead of a TCP socket). If a crash happens, fix the problem as quickly as possible and accept downtime for 5% of users.
Or
2) Bite the bullet and continue to use JNI in the servlet container (with the risk of potential downtime for everyone).
Regards,
Rohit

It depends:
Take into account that a function not being thread-safe does not necessarily mean it will crash if called from multiple threads. It might simply return completely wrong results.
If your application cannot cope with that somehow, then you have no other option: you need to serialize access to the native code.
If you are sure that the only side effect of calling the non-thread-safe function is that it can crash, then you need to make sure that the crash does not result in other kinds of errors, such as inconsistent data in your application's back end (database corruption, etc.). (You may use transactions to prevent this.)
If your application can cope with all of the above, then a third piece of information is still needed:
You need to find out how much downtime your users will tolerate. If they tolerate the possible downtime, then go ahead and do not worry about the crashes; you can safely "bite the bullet", because it won't harm your users or your application.
In all other cases you have to serialize access to the native functions.
Wrapping them in a process might be a good idea, but you have to make sure that the function(s) can run in ONLY one thread at a time. So you probably need to implement some mechanism that makes the other threads/servlets wait until one of them has finished calling the function(s).
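Serializing access as described could look roughly like the sketch below. It rests on assumptions: `unsafeCompute` is a hypothetical pure-Java stand-in for the proprietary native function (a real one would be declared `native`), and the in-flight counters exist only to show that callers never overlap.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

class SerializedNativeAccess {
    private static final ReentrantLock LOCK = new ReentrantLock();
    private static final AtomicInteger inFlight = new AtomicInteger();
    private static final AtomicInteger maxInFlight = new AtomicInteger();

    // Hypothetical stand-in for the non-thread-safe native function.
    private static int unsafeCompute(int x) {
        int now = inFlight.incrementAndGet();
        maxInFlight.accumulateAndGet(now, Math::max); // record peak concurrency
        try {
            Thread.sleep(5); // pretend the call takes a while
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            inFlight.decrementAndGet();
        }
        return x * 2;
    }

    /** The only entry point: other threads wait here until the current caller is done. */
    static int compute(int x) {
        LOCK.lock();
        try {
            return unsafeCompute(x);
        } finally {
            LOCK.unlock();
        }
    }

    /** Calls compute() from several threads and returns the peak number of simultaneous callers. */
    static int maxConcurrentCallers(int threads) throws InterruptedException {
        maxInFlight.set(0);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            final int arg = i;
            workers[i] = new Thread(() -> compute(arg));
            workers[i].start();
        }
        for (Thread t : workers) {
            t.join();
        }
        return maxInFlight.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("peak simultaneous callers: " + maxConcurrentCallers(8));
    }
}
```

A lock like this only prevents concurrent calls. It does not protect the JVM from a crash inside the native code, so the separate-process option is still the one that isolates the web server.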

How to build a distributed java application?

First of all, I have a conceptual question: does the word "distributed" only mean that the application runs on multiple machines? Or are there other ways in which an application can be considered distributed (for example, if there are many independent modules interacting together but on the same machine, is this distributed)?
Second, I want to build a system which executes four types of tasks. There will be multiple customers, and each one will have many tasks of each type to be run periodically. For example, customer1 will have task_type1 today, task_type2 in two days, and so on; there might be a customer2 who has a task_type1 to be executed at the same time as customer1's task_type1, i.e. there is a need for concurrency. The configuration for executing the tasks will be stored in a DB, and the outcomes of these tasks will be stored in the DB as well. The customers will use the system from a web browser (HTML pages) to interact with it (basically, to configure tasks and see the outcomes).
I thought about using a REST web service (using JAX-RS) that the HTML pages would communicate with, and using threads on the backend for concurrent execution.
Questions:
1. This sounds simple, but am I going in the right direction? Or should I be using other technologies or concepts, like Java Beans for example?
2. If my approach is fine, do I need to use a scripting language like JSP, or can I submit HTML forms directly to the REST URLs and get the result (using JSON, for example)?
3. If I want to make the application distributed, is that possible with my idea? If not, what would I need to use?
Sorry for asking so many questions, but I am really confused about this.

I just want to add one point to the already posted answers. Please take my remarks with a grain of salt, since all the web applications I have ever built have run on one server only (aside from applications deployed to Heroku, which may "distribute" your application for you).
If you feel that you may need to distribute your application for scalability, the first thing you should think about is not web services and multithreading and message queues and Enterprise JavaBeans and...
The first thing to think about is your application domain itself and what the application will be doing. Where will the CPU-intensive parts be? What dependencies are there between those parts? Do the parts of the system naturally break down into parallel processes? If not, can you redesign the system to make it so? IMPORTANT: what data needs to be shared between threads/processes (whether they are running on the same or different machines)?
The ideal situation is where each parallel thread/process/server can get its own chunk of data and work on it without any need for sharing. Even better is if certain parts of the system can be made stateless -- stateless code is infinitely parallelizable (easily and naturally). The more frequent and fine-grained data sharing between parallel processes is, the less scalable the application will be. In extreme cases, you may not even get any performance increase from distributing the application. (You can see this with multithreaded code -- if your threads constantly contend for the same lock(s), your program may even be slower with multiple threads+CPUs than with one thread+CPU.)
The conceptual breakdown of the work to be done is more important than what tools or techniques you actually use to distribute the application. If your conceptual breakdown is good, it will be much easier to distribute the application later if you start with just one server.
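The "own chunk of data, no sharing" ideal can be illustrated with a small sketch. The summing task and all names are invented for illustration: each worker reads only its own slice and keeps its own running total, and the only coordination is collecting the partial results at the end.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ChunkedSum {

    /** Sums `data` by splitting it into independent chunks, one task per chunk. */
    static long parallelSum(int[] data, int chunks) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(chunks);
        try {
            int chunkSize = (data.length + chunks - 1) / chunks; // round up
            List<Future<Long>> parts = new ArrayList<>();
            for (int c = 0; c < chunks; c++) {
                final int from = Math.min(data.length, c * chunkSize);
                final int to = Math.min(data.length, from + chunkSize);
                parts.add(pool.submit(() -> {
                    long sum = 0; // private to this task: no locks, no contention
                    for (int i = from; i < to; i++) {
                        sum += data[i];
                    }
                    return sum;
                }));
            }
            long total = 0;
            for (Future<Long> part : parts) {
                total += part.get(); // the only point where results meet
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        int[] data = new int[100];
        for (int i = 0; i < data.length; i++) {
            data[i] = i + 1;
        }
        System.out.println(parallelSum(data, 4));
    }
}
```

Because the tasks share nothing while they run, the same breakdown would carry over to separate processes or machines, with the slices sent out and the partial sums sent back.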

The term "distributed application" means that parts of the application system will execute on different computational nodes (which may be different CPU/cores on different machines or among multiple CPU/cores on the same machine).
There are many different technological solutions to the question of how the system could be constructed. Since you were asking about Java technologies, you could, for example, build the web application using Google's Web Toolkit, which will give you a rich browser based client user experience. For the server deployed parts of your system, you could start out using simple servlets running in a servlet container such as Tomcat. Your servlets will be called from the browser using HTTP based remote procedure calls.
Later, if you run into scalability problems, you can start to migrate parts of the business logic to EJB3 components that can ultimately be deployed on many computational nodes within the context of an application server such as GlassFish. I don't think you need to tackle this problem until you run into it. It is hard to say whether you will without knowing more about the nature of the tasks the customer will be performing.

To answer your first question - you could get the form to submit directly to the rest urls. Obviously it depends exactly on your requirements.
As @AlexD mentioned in the comments above, you don't always need to distribute an application. However, if you wish to do so, you should probably consider looking at JMS, a messaging API that lets you run almost any number of worker machines, reading messages from the message queue and processing them.
If you wanted to produce a dynamically distributed application, to run on say, multiple low-resourced VMs (such as Amazon EC2 Micro instances) or physical hardware, that can be added and removed at will to cope with demand, then you might wish to consider integrating it with Project Shoal, which is a Java framework that allows for clustering of application nodes, and having them appear/disappear at any time. Project Shoal uses JXTA and JGroups as the underlying communication protocol.
Another route could be to distribute your application using EJBs running on an application server.

Which Java APIs create Threads

Without having the source code for a Java API, is there any way to know whether the API methods create multiple threads? Are there any conventions to follow if you are writing Java APIs that create multiple threads? This may be a very fundamental question, but it came out of a discussion in which the crux was: "How do you know which Java APIs create threads and which don't?"

One way of determining which libraries create new threads is to disallow Thread creation and ThreadGroup modification in the SecurityManager. See the java.lang.SecurityManager.checkAccess(Thread) method. By implementing your own SecurityManager, you can react to the creation of Threads.
To answer the other question: many libraries create new threads, even if you don't expect it. For example, APIs for HTTP communication create Timers for keep-alives or session timeouts. Java 2D creates a signalling thread. Java itself has multiple threads, e.g. the Finalizer thread, the AWT/Swing event dispatcher thread, etc.
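The claim that ordinary API calls can start threads behind the scenes is easy to check for `java.util.Timer`. The sketch below is only an illustration with made-up class and thread names: it looks for a thread with a given name before and after constructing a Timer.

```java
import java.util.Timer;

class HiddenThreadDemo {

    /** Returns true if a live thread with the given name currently exists in this JVM. */
    static boolean threadExists(String name) {
        return Thread.getAllStackTraces().keySet().stream()
                .anyMatch(t -> t.getName().equals(name));
    }

    /** Returns true if constructing a Timer made a thread with the given name appear. */
    static boolean timerStartsThread(String threadName) {
        boolean before = threadExists(threadName);
        Timer timer = new Timer(threadName, true); // daemon thread, started by the constructor
        try {
            return !before && threadExists(threadName);
        } finally {
            timer.cancel(); // let the timer thread terminate
        }
    }

    public static void main(String[] args) {
        System.out.println("Timer started a thread: " + timerStartsThread("demo-timer"));
    }
}
```

The same before-and-after comparison of `Thread.getAllStackTraces()` can be wrapped around a call into any library whose source is not available, as a cheap way to see whether it left new threads running.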

There's no way to tell. Actually, I don't think you would normally care that much unless you're in some kind of constrained environment. What I've found to be more relevant is determining whether a method is written with the expectation of being run on a particular thread (the AWT event dispatch thread, in the case I've seen). There's no way to do that either, unless the code uses some kind of naming convention or it's documented.

In my experience, if you are looking at core java, not J2EE, the only time I can think that threads are created in core Java is with Swing.
I haven't seen any example of other threads being created by the core Java APIs, except for the Thread class, of course. :)
If you are using other libraries, though, they may well be creating threads. If you don't want to profile, you may want to use AspectJ to log whenever a new thread is created, along with the stack trace of what called it, so you can see what is creating the threads.
UPDATE:
Swing uses 4 threads, according to this post, but he also explains how you can go about killing off the threads, if needed.
http://www.herongyang.com/Swing/jframe_2.html

If you want to see active threads, just fire up the jvisualvm application (located in your $JDK/bin directory) to connect to any local Java process. You'll be able to see a multitude of information about the process, including thread names, status, and history.
