The Evented and Threaded concurrency models are both popular and widely discussed.
The 'Threaded' approach, where every I/O operation can block, is simpler. It's easier to write and debug synchronous code, and the fact of the matter is that most ecosystems provide blocking I/O libraries: just use a thread pool, configure it properly, and you're good to go.
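As a minimal sketch of that threaded style (the server name, port, and pool size are made up for illustration, not from the question), here's a blocking echo server in Java: one pooled OS thread per in-flight connection, plain synchronous reads and writes.

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.io.PrintWriter;
    import java.net.ServerSocket;
    import java.net.Socket;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class BlockingEchoServer {
        public static void main(String[] args) throws Exception {
            // One OS thread per in-flight connection, capped by the pool size.
            ExecutorService pool = Executors.newFixedThreadPool(200);
            try (ServerSocket server = new ServerSocket(9000)) {
                while (true) {
                    Socket client = server.accept();      // blocks until a client connects
                    pool.submit(() -> handle(client));    // hand the connection to a pooled thread
                }
            }
        }

        static void handle(Socket client) {
            try (Socket c = client;
                 BufferedReader in = new BufferedReader(new InputStreamReader(c.getInputStream()));
                 PrintWriter out = new PrintWriter(c.getOutputStream(), true)) {
                String line;
                while ((line = in.readLine()) != null) {  // blocking read; the thread simply waits
                    out.println(line);
                }
            } catch (Exception e) {
                // ignore per-connection errors in this sketch
            }
        }
    }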
But... it doesn't scale: every concurrent connection ties up an OS thread, with its stack memory and context-switch costs.
And then there's the 'Evented' approach, where there is one thread (or one per CPU) that never blocks and performs only CPU instructions. When I/O completes, it activates the appropriate computation, allowing better utilization of the CPU.
But... it's harder to code, it's easier to create unreadable spaghetti code, and there aren't enough libraries for async I/O. Worse, non-blocking and blocking I/O don't mix well, so the evented style is very problematic in ecosystems that weren't designed from the ground up to be non-blocking. In Node.js all I/O has been non-blocking from the beginning (because JavaScript never had a blocking I/O library to begin with). Good luck trying to implement the same in C++ or Java: you can try your best, but it takes only one synchronous call to kill your performance.
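For contrast, here's a rough sketch of the evented style using Java NIO's Selector (again, the port and buffer size are arbitrary): a single thread services every connection, and nothing in the loop is allowed to block. Note how the control flow is already harder to follow than the threaded version above.

    import java.net.InetSocketAddress;
    import java.nio.ByteBuffer;
    import java.nio.channels.SelectionKey;
    import java.nio.channels.Selector;
    import java.nio.channels.ServerSocketChannel;
    import java.nio.channels.SocketChannel;
    import java.util.Iterator;

    public class EventedEchoServer {
        public static void main(String[] args) throws Exception {
            Selector selector = Selector.open();
            ServerSocketChannel server = ServerSocketChannel.open();
            server.bind(new InetSocketAddress(9001));
            server.configureBlocking(false);                  // nothing in this loop may block
            server.register(selector, SelectionKey.OP_ACCEPT);

            ByteBuffer buf = ByteBuffer.allocate(4096);
            while (true) {
                selector.select();                            // wait for readiness events
                Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                while (it.hasNext()) {
                    SelectionKey key = it.next();
                    it.remove();
                    if (key.isAcceptable()) {
                        SocketChannel client = server.accept();
                        client.configureBlocking(false);
                        client.register(selector, SelectionKey.OP_READ);
                    } else if (key.isReadable()) {
                        SocketChannel client = (SocketChannel) key.channel();
                        buf.clear();
                        int n = client.read(buf);             // returns immediately, never blocks
                        if (n < 0) { client.close(); continue; }
                        buf.flip();
                        client.write(buf);                    // echo back (real code must handle partial writes)
                    }
                }
            }
        }
    }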
And then came Go. I started looking into Go recently because I found its concurrency model interesting. Go gives you the ability to "get the best of both worlds": all I/O is blocking, you write synchronous code, but you still enjoy full utilization of the CPU.
Go has an abstraction over threads called 'goroutines', which are basically user-level threads. The Go runtime (which gets compiled into your program) is in charge of scheduling the different goroutines onto real OS threads (say, one per CPU). Whenever a goroutine performs a blocking system call, the runtime schedules another goroutine to run on one of the OS threads; it 'multiplexes' the goroutines onto the OS threads.
User-level threads aren't a new concept, but Go's approach is nice and simple, so I started wondering: why doesn't the JVM world use a similar abstraction? It's child's play compared to what usually happens under the hood there.
Then I found out it did: Sun's 1.2 JVM had 'green threads', which were user-level threads, but they were multiplexed onto a single OS thread. Sun moved on to real OS threads to allow the use of multi-core CPUs.
Why wasn't this idea revisited in the JVM world after 1.2? Am I failing to see the downsides of the Go approach? Or is there some concept that applies to Go but would not be implementable on the JVM?
Why does Java perform so much better than other interpreted languages like Python? I know this probably has something to do with the fact that it's compiled beforehand, but what about concurrency?
How is the JVM able to perform so much better with concurrent programs, whereas interpreted languages have to deal with things like global interpreter locking, etc., that really slow things down?
This is a really interesting question, but I'm not sure there's a simple way to answer it. JVMs these days use a range of highly aggressive optimizations to try to improve performance. Here are a few:
Dynamic compilation: Most good JVMs can dynamically compile the bytecode directly into machine code, which then executes at native speed (see the warm-up sketch after this list).
Polymorphic inline caching: Many JVMs use inline caching to speed up method dispatch by caching, at each call site, which method implementations have actually been invoked there before.
Static typing: Since Java is statically-typed, bytecode instructions rarely have to do expensive introspection on the type of an object to determine how to perform an operation on it. Field offsets can be computed statically, and method indices in a virtual function table can be precomputed as well. Contrast this with languages like JavaScript, which don't have static typing and are much harder to interpret.
Garbage collection: The JVM garbage collector is optimized to allocate and deallocate objects efficiently. It uses generational collection, combining copying (stop-and-copy) and mark-and-sweep techniques, to make most allocations really fast and to make it easy to reclaim lots of memory quickly.
Known choke points (safepoints): Rather than having one huge VM lock, some JVM implementations automatically insert extra code into each piece of compiled/interpreted code to periodically check in with the VM and determine whether it can keep running. That way, if the JVM only needs to pause a few threads, it can do so while the other threads keep running. If it needs to do a stop-the-world operation, it will only occur when the threads hit those specific points, meaning that simple operations don't have to continuously check in with the VM state.
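To see the dynamic-compilation point from the list above in action, here's a deliberately naive warm-up sketch (hand-rolled timings like this are unreliable; a real measurement would use a harness such as JMH). The same method usually gets dramatically faster once the JIT has compiled it:

    public class WarmupDemo {
        // Some work the JIT can compile and optimize once it's hot.
        static long sumOfSquares(int n) {
            long total = 0;
            for (int i = 0; i < n; i++) total += (long) i * i;
            return total;
        }

        public static void main(String[] args) {
            for (int round = 1; round <= 5; round++) {
                long start = System.nanoTime();
                long result = sumOfSquares(10_000_000);
                long micros = (System.nanoTime() - start) / 1_000;
                // Early rounds tend to run interpreted; later rounds usually run JIT-compiled code.
                System.out.println("round " + round + ": " + micros + " us (result " + result + ")");
            }
        }
    }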
There are many, many more optimizations in place that I'm probably not aware of, but I hope that this helps you get toward an answer!
Java code has next to no optimisation during compilation.
The runtime JIT does most of the compilation.
What may be different about Java is that it is relatively feature-poor, with minimal side effects. This makes the code easier to optimise.
whereas interpreted languages have to do deal with things like global interpreter locking etc, that really slow things down?
This is an implementation issue. Java was designed with multi-threading support from the start. I suspect Python was designed for scripting and rapid development cycles, something it does much better as a result.
I am writing a server (Java-based) which will only be called by programs on the same host.
So, in terms of performance and reliability, should I use UDP or a Unix domain socket?
UDP is not reliable. I'm picking nits, but there is no guarantee of either delivery order or delivery at all with UDP. Between two processes on the same host, this is probably never going to manifest as a problem, but it may be possible under extreme load scenarios.
There's also a lot more overhead in a UDP packet than there is in a Unix socket. Again, this is unlikely to be a practical problem except under the most extreme load, and you'd have a lot of other load-related problems before that was a concern, because the overhead for both is nominal in modern computing terms.
If you're really worried about performance and reliability, stick with Unix sockets.
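For what it's worth, modern Java can do this natively: JDK 16 added Unix domain socket support (JEP 380); at the time this question was asked you would have needed a JNI-based library such as junixsocket. A minimal sketch, with a made-up socket path, running client and server in one process for brevity:

    import java.net.StandardProtocolFamily;
    import java.net.UnixDomainSocketAddress;
    import java.nio.ByteBuffer;
    import java.nio.channels.ServerSocketChannel;
    import java.nio.channels.SocketChannel;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Path;

    public class LocalIpcDemo {
        public static void main(String[] args) throws Exception {
            Path path = Path.of("/tmp/myserver.sock");    // hypothetical socket path
            Files.deleteIfExists(path);
            UnixDomainSocketAddress addr = UnixDomainSocketAddress.of(path);

            try (ServerSocketChannel server = ServerSocketChannel.open(StandardProtocolFamily.UNIX)) {
                server.bind(addr);
                // Client side: connect to the same socket file and send a request.
                try (SocketChannel client = SocketChannel.open(addr)) {
                    client.write(ByteBuffer.wrap("ping".getBytes(StandardCharsets.UTF_8)));
                    try (SocketChannel conn = server.accept()) {
                        ByteBuffer buf = ByteBuffer.allocate(64);
                        conn.read(buf);                   // stream socket: ordered, reliable delivery
                        buf.flip();
                        System.out.println(StandardCharsets.UTF_8.decode(buf)); // prints "ping"
                    }
                }
            } finally {
                Files.deleteIfExists(path);
            }
        }
    }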
If you have any plan to distribute and load-balance it in the future, UDP will give you more flexibility if you need to support multiple hosts.
Having said all that, none of this is a practical concern these days. Most services use TCP for even local communication, and then layer other services like ZeroMQ on top of that. You almost definitely should not be worrying about that level of performance. Use software that makes your code easier to write and maintain, and scale up the system in the unlikely event that you need to. It's easier and cheaper to throw new servers at problems than it is to spend man-hours re-engineering software that wasn't written to be flexible.
Also note that ZeroMQ (and other message-queueing systems) will pick the most efficient transport available. For example, ZeroMQ will use its ipc transport between processes on the same host (implemented over Unix domain sockets on POSIX systems), which avoids the network stack entirely, and it will also scale up to thousands of hosts worldwide over the Internet if you need that, and you basically won't have to change your code.
Never prematurely optimize.
A Unix socket will certainly spare you the encapsulation/decapsulation overhead of the TCP/IP stack. But how perceptible will that gain be? I think it depends on your requirements for performance and reliability and on the load you're expecting this server to handle.
I'm a PHP developer, but recently I had to write the same application twice, once in PHP and once in Java, for a class I'm taking at school. Out of curiosity I benchmarked the two and found that the Java version was 2 to 20 times slower than the PHP version if the database is accessed, and 1 to 10 times slower without DB access. I see two immediate possibilities:
I suck at Java.
I can finally tell people to quit whining about php.
I posted my servlet code here. I don't want any nit-picky whining or minor improvements, but can someone see a horrible glaring performance issue in there? Or can anybody explain why Java feels like it has to suck?
I've always heard people say that Java is faster and more scalable than PHP; my teacher especially is convinced of it. But the more requests that are made, the slower Java gets, while PHP doesn't seem to be affected by increased load and remains constant.
In a mature Java web application the Servlet would make use of an existing JDBC connection pool. Establishing a new connection will be by far the greatest cost you pay in time.
Calling Class.forName for every attempt to get the connection will also cause an unnecessary slow down.
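A hedged sketch of those two fixes together (the servlet name, JNDI path, and query are hypothetical, not from the posted code): the pool is looked up once in init(), so no Class.forName or connection setup happens on the request path; each request merely borrows a pooled connection.

    import java.io.IOException;
    import java.sql.Connection;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.sql.Statement;
    import javax.naming.InitialContext;
    import javax.naming.NamingException;
    import javax.servlet.ServletException;
    import javax.servlet.http.HttpServlet;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;
    import javax.sql.DataSource;

    public class ReportServlet extends HttpServlet {
        private DataSource pool; // container-managed connection pool

        @Override
        public void init() throws ServletException {
            try {
                // Look up the pool once; no Class.forName(...) per request.
                pool = (DataSource) new InitialContext().lookup("java:comp/env/jdbc/mydb");
            } catch (NamingException e) {
                throw new ServletException(e);
            }
        }

        @Override
        protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws IOException {
            try (Connection con = pool.getConnection();   // borrowed from the pool, not newly created
                 Statement st = con.createStatement();
                 ResultSet rs = st.executeQuery("SELECT name FROM items")) {
                while (rs.next()) {
                    resp.getWriter().println(rs.getString("name"));
                }
            } catch (SQLException e) {
                resp.sendError(HttpServletResponse.SC_INTERNAL_SERVER_ERROR);
            }
        }
    }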
JVM tuning could also be a factor. In an enterprise environment the JVM memory and possibly GC configurations would be adjusted and tuned to achieve a desirable balance between responsiveness and resource utilization.
As Stephen C points out, the JVM also has a concept of a sort of "warm up".
All that said, I have no idea how PHP compares to Java, and I feel both languages offer great solutions to overlapping but distinct sets of needs.
Based on not much info (where the best decisions are made), my guess is the Class.forName("com.mysql.jdbc.Driver"); in getConnection() is the big timesink.
Creating a new String in importFile when the char[] can be passed to out.println is me nitpicking.
Your test seems to reflect initial overhead more than steady-state performance. Try doing the non-DB tests multiple times in a loop (so that each test would run the code multiple times) and look at the linear relationship between runtime and number of iterations. I suspect the incremental cost for Java is lower than that for PHP.
Question about Cassandra
Why the hell on earth would anybody write a database ENGINE in Java?
I can understand why you would want to have a Java interface, but the engine...
I was under the impression that there's nothing faster than C/C++, and that a database engine shouldn't be any slower than max speed, and certainly not use garbage collection...
Can anybody explain to me what possible sense that makes, and why Cassandra can be faster than an ordinary SQL database running on C/C++ code?
Edit:
Sorry for the "Why the hell on earth" part, but it really didn't make any sense to me.
I neglected to consider that a database, unlike the average garden-variety user program, needs to be started only once and then runs for a very long time, probably also as the only program on the server, which self-evidently makes for an important performance difference.
I was really comparing it to a 'dysfunctional' (to put it mildly) Java tax program I was using at the time of writing (or rather, would have liked to use).
In fact, unlike using Java for tax programs, using Java for writing a dedicated server program makes perfect sense.
What do you mean, C++? Hand-coded assembly would be faster, if you have a few decades to spare.
I can see a few reasons:
Security: it's easier to write secure software in Java than in C++ (remember the buffer overflows?)
Performance: it's not THAT much worse. Startup is definitely worse, but once the code is up and running, it's not a big deal. Actually, you have to remember an important point here: Java code is continually optimized by the VM, so in some circumstances it can even end up faster than C++.
Why the hell on earth would anybody write a database ENGINE in Java?
Platform independence is a pretty big factor for servers, because you have a lot more hardware and OS heterogeneity than with desktop PCs. Another is security. Not having to worry about buffer overflows means most of the worst kinds of security holes are simply impossible.
I was under the impression that there's nothing faster than C/C++, and that a database engine shouldn't be any slower than max speed, and certainly not use garbage collection...
Your impression is incorrect. C/C++ is not necessarily faster than Java, and modern garbage collectors have a big part in that: with a generational collector, allocating an object is typically just a pointer bump, which makes object creation incredibly fast.
Don't forget that Java VMs make use of a just-in-time (JIT) engine that performs on-the-fly optimisations to make Java comparable to C++ in terms of speed. Bearing in mind that Java is quite a productive language (despite its naysayers) and portable, the JIT optimisation capability means that Java isn't an unreasonable choice for something like this.
The performance penalty of modern Java runtimes is not that big, and programming in Java is less error-prone than in C.
I've had an argument with a friendly coder who was mildly damaged by Joel's Law of Leaky Abstractions. It is very hard to convince him to use any new framework/toolbox at all. I'm trying to present the point that "abstractions are fine as long as they allow low-level access to the abstracted level".
Examples:
GWT - Google's superb Java-to-JavaScript compiler has JSNI, the ability to write "native" JavaScript if you really want to.
Hibernate - AFAIK has SQLQuery, a way to write native SQL (see the sketch after this list).
Java - JNI - if you miss C.
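To make the Hibernate example concrete, here's a sketch of that escape hatch using the classic Hibernate 3 API (the Customer entity and the query are invented for illustration): when HQL or the criteria API can't express what you need, you drop down to hand-written SQL and still get mapped entities back.

    import java.util.List;
    import org.hibernate.SQLQuery;
    import org.hibernate.Session;

    public class NativeSqlEscapeHatch {
        // Customer is assumed to be a mapped Hibernate entity defined elsewhere.
        @SuppressWarnings("unchecked")
        static List<Customer> topCustomers(Session session) {
            // Drop below HQL/criteria to hand-written SQL when the abstraction gets in the way.
            SQLQuery query = session.createSQLQuery(
                    "SELECT * FROM customers WHERE lifetime_value > :minValue ORDER BY lifetime_value DESC");
            query.addEntity(Customer.class);   // map the raw rows back onto the mapped entity
            query.setParameter("minValue", 10_000);
            return query.list();
        }
    }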
Is it sound?
Am I missing something?
Thanks
What I took from reading the leaky abstractions article wasn't that abstractions are bad, but that you should make it a point to understand what goes on under the hood, so that you can account for "unexpected" behavior and avoid it.
What does your friend program in? Machine language? :)
Joel's point (as I understand it) is that by abstracting complexity away you are sacrificing the finer control over that underlying complexity. For anything but trivial cases you will eventually need to access that finer granularity of control at which point the abstraction breaks down.
So all abstractions are leaky (almost) by definition:
If there is complexity in a system, it must be there for a reason (or you should find a way to remove it) and so will occasionally be useful/vital.
By abstracting you are limiting the control you have over the underlying complexity.
When those occasions come along, you will have to break the abstraction.
To some extent he has a point. Traditional C/Unix development works against a platform that is simple enough to understand more or less in its entirety. Modern platforms are orders of magnitude more complex, and understanding how all of the layers interact is much harder, often infeasible.
The law of leaky abstractions mainly applies when the framework does a bad job of managing the underlying complexity. Some of the ways in which a framework may be judged are its transparency (ease of understanding what's going on behind the scenes) and its ability to drop out to a custom workaround for limitations in its functionality.
When a framework does a lot of complex magic behind the scenes, diagnosing and troubleshooting become much harder, often requiring a disproportionately large amount of expertise in the framework's underlying architecture. This means the productivity gains from the framework get absorbed by the extra effort of training and debugging. It also makes the framework difficult to learn and use with confidence, unlike the simple, transparent platforms your C-programming friend is used to.
When a framework obstructs you from working around its limitations, it becomes an impediment to development. When this happens often enough, the code base either gets boxed in or becomes polluted with ever greater and messier hacks to work around the issues. This also leads to stability and debugging problems.
Examples of frameworks with these flaws abound. MFC was quite famous for failing to hide the underlying complexity of Win32. It also made extensive use of wizards that generated messy code that needed to be manually modified afterwards, defeating the purpose of having a code generator in the first place. Early Java GUI toolkits (AWT and early versions of Swing) had very little uptake in desktop applications because they obstructed developers from implementing a native look-and-feel. SWT was built in no small part because of these limitations in Swing.
However, now that Java has matured a bit, it could be argued that most of its early sins have been fixed in modern frameworks. J2EE is still a large, complex system, and developing a non-trivial user interface in a browser is also quite a substantial undertaking. Becoming proficient in this platform is quite a lot of work, but it's not beyond the wit of man.
While I think it's true that every abstraction is leaky, that is not necessarily a bad thing.
For example, in traditional plain C (or C#, Java, whatever) code, you usually retrieve data from arrays using loops, ifs, etc. But that way you over-specify your solution to the problem.
The way SQL and LINQ approach the same problem is more intelligent: you just say WHAT you want, and the machine figures out HOW to do it. The solution is thereby freed from any specific ordering of commands, so the machine can split the work across different CPUs, for example, or reorder operations to make better use of caches.
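A small Java illustration of that WHAT-versus-HOW difference (the data and the predicate are made up): the loop nails down the exact order of operations, while the stream version only declares the result, leaving the runtime free to parallelize.

    import java.util.List;

    public class DeclarativeVsImperative {
        public static void main(String[] args) {
            List<Integer> prices = List.of(5, 42, 17, 99, 3, 60);

            // Imperative: you spell out the iteration order and the accumulator yourself.
            int total = 0;
            for (int p : prices) {
                if (p > 10) total += p;
            }

            // Declarative: say WHAT you want; the library decides HOW.
            int total2 = prices.stream()
                               .filter(p -> p > 10)
                               .mapToInt(Integer::intValue)
                               .sum();

            // The declarative form can be spread over cores with one change:
            int total3 = prices.parallelStream()
                               .filter(p -> p > 10)
                               .mapToInt(Integer::intValue)
                               .sum();

            System.out.println(total + " " + total2 + " " + total3); // all print 218
        }
    }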
So yes, you give some control to the machine, but the machine has one big advantage over you: it runs at the user/customer's site, and so it can make on-demand decisions (like using multiple cores, taking advantage of MMX, whatever).