Performance of Java combined with Scala project [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
I'm building a REST web service and I would like to have the best performance. I was thinking about using Scala; I read in some sources that it's faster than Java. Is there any way to achieve better performance by mixing Java and Scala when my project uses Maven? I found the maven-scala-plugin but still don't know what influence it has on performance.

Java and Scala both compile to the same bytecode; more or less anything you can do in one you can do in the other. What is true is that Scala sometimes makes it more practical to do the things that performance requires. For example, async I/O is much easier to work with in Scala, and @specialized can autogenerate implementations of generic code for primitive types, which would have to be done manually in Java. IIRC, the Scala libraries spray and Unfiltered placed highly in the TechEmpower benchmarks before they were removed.
But those benchmarks are about theoretical performance limits you are unlikely to ever reach. In practice, it's highly unlikely that you can afford the optimization effort it would take to achieve those levels of performance. "I want the best performance" is unlikely to be your real requirement. Would you spend 10x as long developing in order to get 1% better performance? If you really do need that absurd level of performance, you should be able to run your own benchmarks on the scale of the TechEmpower ones. That is a lot of work, but it's peanuts compared to the work "best performance" demands in a real program.
In more realistic scenarios, any language implementation that supports asynchronous I/O (assuming a typical REST workload here) and doesn't actively slow everything down by being full of hashtable lookups (i.e. Python or Ruby) is likely to be more than fast enough. Maybe you need parallelism (though I doubt it), but fortunately the JVM has that. You are extraordinarily unlikely to be working at the level where the choice between Scala and Java makes a difference, as long as you use a sensible framework (i.e. an async one) in either language.
And as always, the key to performance, far more than technology choices, is to measure and experiment. Profile, ideally in an automated way; look for bottlenecks, address them, and measure whether your fixes worked. In any realistic codebase, a few minutes of profiling will produce a much greater speedup than any possible difference between Java and Scala.
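Scala's @specialized generates primitive versions of generic code automatically. To show what that saves, here is a minimal Java sketch of doing the same specialization by hand. The class and method names are hypothetical, chosen only for illustration. The generic version boxes every element. The `int` copy avoids boxing, but it has to be written and maintained separately.

```java
import java.util.List;
import java.util.function.BinaryOperator;
import java.util.function.IntBinaryOperator;

final class Reducers {
    // Generic version: elements are boxed objects (e.g. Integer),
    // and every step goes through the BinaryOperator interface.
    static <T> T reduce(List<T> items, T identity, BinaryOperator<T> op) {
        T acc = identity;
        for (T item : items) {
            acc = op.apply(acc, item);
        }
        return acc;
    }

    // Hand-specialized copy for int: works on a primitive array, no boxing per element.
    // In Java this duplicate must be written manually for each primitive type you care about.
    static int reduceInt(int[] items, int identity, IntBinaryOperator op) {
        int acc = identity;
        for (int item : items) {
            acc = op.applyAsInt(acc, item);
        }
        return acc;
    }

    public static void main(String[] args) {
        int boxed = reduce(List.of(1, 2, 3, 4), 0, Integer::sum);
        int primitive = reduceInt(new int[] {1, 2, 3, 4}, 0, Integer::sum);
        System.out.println(boxed + " " + primitive); // prints "10 10"
    }
}
```

Both calls compute the same result. Per the answer above, it is the duplication of the second method that Scala's annotation does for you.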

You need to define performance. A good way to define it is through specific scenarios.
For example:
99% of my REST requests complete within 1 second
the REST service returns results to 1,000 concurrent clients within 2 seconds over a WAN
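Once you record per-request latencies, a scenario like the first one can be checked automatically. Below is a minimal sketch that uses the nearest-rank percentile definition. The class name and the sample latencies are illustrative only, not real measurements.

```java
import java.util.Arrays;

final class LatencySla {
    // Nearest-rank percentile: the smallest sample such that at least p% of samples are <= it.
    static long percentile(long[] latenciesMs, double p) {
        if (latenciesMs.length == 0) throw new IllegalArgumentException("no samples");
        long[] sorted = latenciesMs.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0); // 1-based rank
        return sorted[Math.max(rank, 1) - 1];
    }

    // True if the p-th percentile latency is within the budget, e.g. p = 99, budget = 1000 ms.
    static boolean meets(long[] latenciesMs, double p, long budgetMs) {
        return percentile(latenciesMs, p) <= budgetMs;
    }

    public static void main(String[] args) {
        long[] samples = new long[100];
        for (int i = 0; i < samples.length; i++) samples[i] = 100 + i; // 100..199 ms
        samples[99] = 5000; // one slow outlier
        System.out.println(percentile(samples, 99));  // prints 198
        System.out.println(meets(samples, 99, 1000)); // prints true
    }
}
```

Phrasing the requirement as a percentile means a single slow outlier does not fail the check, while a generally slow service does.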
Once you have an idea of this, you need to design/architect your system. Performance depends on a lot of factors:
overall structure of your system
data structures and algorithms used
latency and bandwidth of your network
available resources (memory, CPU and storage type)
frameworks and libraries used
databases
other qualities in your system (for example, security will degrade performance)
...
...
HTH

Related

How to improve performance of code you don't own?

We have an in-house third-party Java application that runs a scheduling algorithm on a ridiculously hefty Linux box. The application runs far too slowly for the load we need. We do not have the code, and the vendor won't make any changes to the application for monetary reasons, so I can't improve the code. The application is single-threaded and its design does not lend itself to parallelization (so I can't split the load between two boxes).
What can I do, either software- or hardware-wise, to improve the performance of the application?
Get on the newest version of Java (newer versions tend to have performance improvements)
Give Java more memory to work with (benchmark to see if this makes any difference)
Measure what it's doing with top. Upgrade whatever it's having problems with (more memory, faster CPU, SSD). Some CPUs are better at single threaded work loads than others (read: don't run this on a Bulldozer; something with Turbo Boost might be helpful).
Play with other experimental JVM options (benchmark to see if this makes any difference)
Remove any other applications running on this machine (benchmark to see if there's any benefit -- no sense wasting money if it doesn't help)
Pay the vendor to make it faster or give you the code (i.e. give them monetary reasons to fix this)
Find an alternative
Write your own alternative
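For the first two suggestions, it helps to confirm which Java version and heap ceiling a JVM actually ends up with. Here is a minimal probe that uses only the standard `Runtime` API and the `java.version` system property. Run it with the same `java` binary and options as the vendor application's launch script, assuming you can read that script. For example, use `java -Xmx8g JvmProbe`, where 8g is just an example value. Depending on the garbage collector, `maxMemory()` may report somewhat less than `-Xmx`.

```java
final class JvmProbe {
    // Reports the Java version, heap ceiling, and CPU count of the JVM this class runs in.
    static String describe() {
        Runtime rt = Runtime.getRuntime();
        long maxHeapMb = rt.maxMemory() / (1024 * 1024);
        return "java.version=" + System.getProperty("java.version")
                + ", maxHeapMb=" + maxHeapMb
                + ", availableProcessors=" + rt.availableProcessors();
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

If the reported heap does not change when you change the flag, the flag is probably not reaching the JVM. That is worth knowing before you benchmark anything.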
1) You can improve the hardware that the application runs on. Do this by looking at what resources the application is using. Is it maxing out CPU, or using all the memory (or both)? If so, you can add more CPU power or RAM accordingly.
2) Is there a way you can cache the results from the application? Can you avoid calling it at all in some cases?
Otherwise, there really isn't much more you can do. If it becomes a bigger problem, you might have to write your own scheduling algorithm, or better yet, find a better vendor.
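If the same inputs recur, a cache can skip the slow call entirely. Below is a minimal sketch, assuming you invoke the vendor application through a wrapper you control. `runScheduler` is a hypothetical stand-in for that call, and inputs are assumed to have proper `equals`/`hashCode`. It also assumes the output depends only on the input. If it depends on time or external state, you would need invalidation.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

final class CachedScheduler<I, O> {
    private final Map<I, O> cache = new ConcurrentHashMap<>();
    private final Function<I, O> slowCall;

    CachedScheduler(Function<I, O> slowCall) {
        this.slowCall = slowCall;
    }

    // Runs the slow call only on a cache miss; repeated inputs are served from memory.
    O schedule(I input) {
        return cache.computeIfAbsent(input, slowCall);
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        // Hypothetical stand-in for invoking the vendor application.
        Function<String, String> runScheduler = jobs -> {
            calls.incrementAndGet();
            return "schedule for " + jobs;
        };
        CachedScheduler<String, String> scheduler = new CachedScheduler<>(runScheduler);
        scheduler.schedule("jobs-A");
        scheduler.schedule("jobs-A");
        System.out.println(calls.get()); // prints 1
    }
}
```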
Can you preprocess the input so the application has less work to do?
For example, perhaps the first thing the application does is sort the list of jobs to be scheduled using a merge sort. If you pre-sort the list, then the application's sort will have no work to do. You might be able to sort the list faster than the application can - use many cores, do it ahead of time, etc.
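Here is a sketch of pre-sorting on many cores with `Arrays.parallelSort`, which spreads the work over the common fork/join pool. The `Job` class and the sort-by-deadline order are hypothetical. You would have to match whatever input format and order the vendor's code actually expects.

```java
import java.util.Arrays;
import java.util.Comparator;

final class PreSort {
    // Hypothetical job representation; the real input is whatever the vendor application reads.
    static final class Job {
        final String id;
        final long deadline;

        Job(String id, long deadline) {
            this.id = id;
            this.deadline = deadline;
        }
    }

    // Returns a sorted copy; large arrays are sorted in parallel, small ones sequentially.
    static Job[] sortByDeadline(Job[] jobs) {
        Job[] copy = jobs.clone();
        Arrays.parallelSort(copy, Comparator.comparingLong((Job j) -> j.deadline));
        return copy;
    }

    public static void main(String[] args) {
        Job[] jobs = { new Job("c", 30), new Job("a", 10), new Job("b", 20) };
        for (Job j : sortByDeadline(jobs)) System.out.print(j.id);
        System.out.println(); // prints "abc"
    }
}
```

The sorted list would then be written out in the application's input format, so its own merge sort finds the data already in order.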
Run it on a faster computer. This is probably the cheapest solution of the lot.

Python or Java for text processing (text mining, information retrieval, natural language processing) [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 9 years ago.
I'm about to start a new project where I will do lots of text processing tasks like searching, categorization/classification, clustering, and so on.
There's going to be a huge number of documents to process, probably millions. After the initial processing, the system must also be updated daily with new documents.
Can I use Python to do this, or is Python too slow? Is it best to use Java?
If possible, I would prefer Python since that's what I have been using lately. Plus, I would finish the coding part much faster. But it all depends on Python's speed. I have used Python for some small scale text processing tasks with only a couple of thousand documents, but I am not sure how well it scales up.
Both are good. Java has a lot of momentum in text processing. Stanford's text processing system, OpenNLP, UIMA, and GATE seem to be the big players (I know I am missing some). You can literally run the StanfordNLP module on a large corpus after a few minutes of playing with it. However, it has major memory requirements (3 GB or so when I was using it).
NLTK, Gensim, Pattern, and many other Python modules are very good at text processing. Their memory usage and performance are very reasonable.
Python scales up because text processing is a very easily scalable problem. You can use multiprocessing very easily when parsing/tagging/chunking/extracting documents. Once you get your text into any sort of feature vector, you can use numpy arrays, and we all know how great numpy is...
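The same embarrassingly parallel structure is available on the JVM. Here is a minimal Java sketch, kept in Java to match the other examples on this page. Each document is tokenized independently on a parallel stream, and the per-word counts are merged into one map. The regex tokenizer is a deliberate simplification, not a real NLP tokenizer.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

final class ParallelWordCount {
    // Documents are processed independently, so the work spreads across cores;
    // groupingByConcurrent merges the per-word counts into one concurrent map.
    static Map<String, Long> countWords(List<String> documents) {
        return documents.parallelStream()
                .flatMap(doc -> Arrays.stream(doc.toLowerCase(Locale.ROOT).split("\\W+")))
                .filter(token -> !token.isEmpty()) // split can yield "" for leading punctuation
                .collect(Collectors.groupingByConcurrent(Function.identity(), Collectors.counting()));
    }

    public static void main(String[] args) {
        Map<String, Long> counts = countWords(List.of("the cat sat", "The dog sat"));
        System.out.println(counts.get("the")); // prints 2
    }
}
```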
I learned with NLTK, and Python has greatly reduced my development time, so I suggest you give it a shot first. They have a very helpful mailing list as well, which is worth joining.
If you have custom scripts, you might want to check out how well they perform with PyPy.
It's very difficult to answer questions like this without trying. So why don't you:
Figure out what would be a difficult operation
Implement that (and I mean the simplest, quickest hack that you can make work)
Run it with a lot of data, and see how long it takes
Figure out if it's too slow
I've done this in the past, and it's really the best way to see whether something performs well enough for your needs.
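For the "run it with a lot of data and see how long it takes" step, even a crude harness can tell "seconds" from "hours". Below is a minimal Java sketch. The timed operation, which splits synthetic documents into words, is a hypothetical stand-in for whatever you identify as the difficult operation. For careful microbenchmarks, a dedicated tool such as JMH is more reliable, because JIT warm-up and dead-code elimination can distort naive timings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

final class QuickTiming {
    // Runs the task a few times so the JIT can warm up, then returns the best timed run in ms.
    static <T> long bestOfMillis(Supplier<T> task, int warmups, int runs) {
        for (int i = 0; i < warmups; i++) task.get();
        long best = Long.MAX_VALUE;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            T result = task.get();
            long elapsed = System.nanoTime() - start;
            // Touch the result so it is not trivially unused (a crude guard, not a guarantee).
            if (result == null) throw new IllegalStateException("task returned null");
            best = Math.min(best, elapsed);
        }
        return best / 1_000_000;
    }

    public static void main(String[] args) {
        // Synthetic corpus: 100,000 small documents.
        List<String> docs = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) docs.add("document " + i + " with some words in it");
        long ms = bestOfMillis(() -> docs.stream().mapToInt(d -> d.split(" ").length).sum(), 2, 5);
        System.out.println("best run: " + ms + " ms");
    }
}
```

Scaling the corpus size up by 10x and rerunning gives a rough sense of how the operation grows before you commit to a language.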
Just write it; the biggest flaw people have in programming is premature optimization. Work on the project, write it out, and get it working. Then go back, fix the bugs, and make sure it's optimized. A number of people will harp on about the speed of x vs. y and how y is better than x, but at the end of the day it's just a language. It's not what a language is, but how it does it.
It's not the language you have to evaluate, but the frameworks and app servers for clustering, data storage/retrieval, etc. available for it.
You can use Jython to get all the Java enterprise technologies for a high-load system and still do the text parsing in Python.

Help to make a decision on which programming language [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 6 years ago.
We're developing a system that is best described as a layer above social networking. It's extremely data-heavy and would involve a huge user base, so the database would be large, but it wouldn't involve complex computations. The main complexity is fast data retrieval. We have programmers who are comfortable with Java as well as PHP. The front end is settled: JavaScript, HTML, CSS. But we're having a huge debate about what to use for the back end. We consulted numerous blogs and forums and have a consensus on the pluses and minuses. To sum it up, people say that Java is a pain to host but extremely scalable, whereas PHP is fast and easy to host but not entirely OO or sturdy. We're still not able to reach a decision. For a system like ours, does it even matter what we go with? Because in the end, it's the performance boosters that matter... or am I entirely wrong in thinking that? Any input will be appreciated. Thanks in advance!
Java and PHP are both valid choices for developing a web backend. Personally I would choose Java (I'm an OO guy), but the decision for you should be more about what your developers are more comfortable with.
Forcing developers to adopt a technology they are resisting will cause problems for management and is likely to hurt the outcome of the project. I would not invest in a project knowing that the developers were unhappy with the technology being used; it is too much of a risk.
I would prefer PHP because of the amount of help you can get (if you are ever stuck). It is also very adaptable, and has many CMS systems available.
I agree that for a system like yours, the language does not even matter. Retrieving the data efficiently is what matters, and there are umpteen solutions to help you with that.
So you should choose the language your team is most comfortable with and that helps you code in a safe way (read: TDD).
For me, from the choices you give, that would be Java, as I don't know of any TDD tools for PHP (which doesn't mean they don't exist).
I would be so bold as to suggest an alternative: use Ruby and Ruby on Rails. At least, that is the switch I made, and I have never been happier. But that is personal, and you and your team would have to be willing to learn a new platform (and I can imagine that would defeat the whole purpose).
To rephrase: according to what you told us, the language does not matter. Choose the language and toolset your team is most comfortable with.
If it's a huge database with massive real-time reads and writes, much like apps layered on Twitter and Facebook feeds, you are probably going to be more worried about how quickly and efficiently you can retrieve data from the database. There are tools for that in both Java and PHP, so it doesn't matter much which one you start with.
I would suggest going with the one for which you have more suitable tools and APIs. So, to answer your questions:
For a system like ours, does it even matter what we go with?
It does not matter what you choose. At least at the start, it is premature to decide what's best. I've seen pretty efficient systems in Java, so I am inclined towards Java.
Because at the end, it's the performance boosters that matter.... or am I entirely wrong in thinking that?
The performance boost will come from faster data retrieval and the ability to handle a large request load. Java scales pretty well, but there are also high-performance sites built on PHP.
I would say it's a good idea to evaluate which option lets you do less work to get your app done, and go with that. Under massive load, you might have to tweak things in either case.
You should design your system to scale well with the number of users. CPU and memory are relatively cheap these days, so using a bit more of them is not as big a problem as it used to be.
I would say if you can use Java or PHP, you should consider using Java AND PHP. This way you can take advantage of the best of both languages.

Java GPU programming [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 4 years ago.
Is it possible to do GPU programming in Java?
I mean without using native libraries.
And how much of a performance improvement can one expect when switching over to GPUs?
Edit:
I am not looking at game programming; I want to do hardcore number crunching.
Yes. Java3D, LWJGL and JOGL support GLSL (OpenGL Shading Language).
Edit:
You can use OpenCL if you want to do platform-neutral, general-purpose computation on GPUs. This framework lets you write code that treats all processing units identically, despite wildly varying feature sets and execution environments. That said, this is very low-level programming compared with Java.
It seems your ideal would be a JVM written with OpenCL support. Searching online, I found a little bit of interest in the idea but no evidence of any major backing.
how much of a performance improvement can one expect
That depends on the system you're running on and what sort of data you're processing (matrix and vector math is extremely efficient on GPUs). You'd likely get some major gains on a system like mine, with two powerful GPUs and a modest single-core CPU. However, on a computer with a modest GPU and a quad-core CPU, the performance gains might have a hard time overcoming the overhead.
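The workloads that gain most from GPUs apply the same small operation independently to every element of large arrays. This Java sketch runs only on the CPU, using a parallel stream over indices. It is not OpenCL or CUDA code. The loop body is written the way a GPU kernel would be: one index per work item and no shared mutable state between items. Whether offloading such a kernel pays off depends on array size, because moving data to and from the GPU is part of the overhead mentioned above.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

final class Saxpy {
    // y[i] = a * x[i] + y[i] for every i. Each index is independent,
    // which is the shape of work that maps well onto GPU threads.
    static float[] saxpy(float a, float[] x, float[] y) {
        if (x.length != y.length) throw new IllegalArgumentException("length mismatch");
        float[] out = new float[y.length];
        // Here the "kernel" runs on CPU cores via a parallel stream, one index per task.
        IntStream.range(0, x.length).parallel().forEach(i -> out[i] = a * x[i] + y[i]);
        return out;
    }

    public static void main(String[] args) {
        float[] r = saxpy(2f, new float[] {1, 2, 3}, new float[] {10, 20, 30});
        System.out.println(Arrays.toString(r)); // prints [12.0, 24.0, 36.0]
    }
}
```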
Rootbeer1 has just been released on github: https://github.com/pcpratts/rootbeer1
With Rootbeer you can program using almost any Java except the following:
native methods and fields
reflection
dynamic method invocation
garbage collection
sleeping inside a monitor
This means you can use arbitrary object graphs with composite types.
If you are still considering hardcore number crunching in Java on the GPU without using native libraries, you might be interested in this blog article: http://ateji.blogspot.com/2011/02/java-on-gpu-with-ateji-px.html
We (I am part of the Ateji team) have seen performance of up to 60x so far on Java applications that can be massively parallelized.
Definitely check out Rootbeer1, but before you start, make sure you have a CUDA-capable graphics card and have completed all of the NVIDIA setup.
Download: search for "CUDA download"
Getting started guide:
http://developer.download.nvidia.com/compute/DevZone/docs/html/C/doc/CUDA_C_Getting_Started_Windows.pdf

Anyone using JavaSpaces technology? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 6 years ago.
Are there real practical uses of JavaSpaces technology out there and how exactly is it implemented?
We are currently using javaspaces (the Sun outrigger implementation), to coordinate loosely coupled processes. The idea behind it is compelling, and the API is very simple. The actual implementation has been a problem. It's built on Jini, so 5 or 6 processes are required to bring up a space. And, at least in Sun's implementation, there is no way to have it communicate over specific ports, which makes firewalls a bit of a pain.
The other issue that we have run into is that there is no implied ordering in the space. So if you put 5 objects in, and your template on the read/take matches all 5, it is unspecified which one you will get. Depending on the application, this may or may not be an issue.
GigaSpaces is a mature version of JavaSpaces. It is widely used in financial applications, which are kept quiet.
As for the implementation, it is basically a transactional object database on top of Jini. The queries are similar to db4o.
I've seen it used in a financial application, mostly for managing compute workers (grid style) where entries were written into the space from front-tier applications and pulled out by workers by matching on a field showing work was needed. Results could be written back into the space, triggering a notify registered by the front-tier app which then reads back the finished piece of work.
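Here is a toy, in-memory model of the write/take-by-template pattern, so you can see it without setting up Jini. It is not the `net.jini.space.JavaSpace` API: it has no leases, transactions, remote access, or notify, and the front tier simply blocks on `take`. The `Task` entry type and its fields are hypothetical. As in JavaSpaces, a `null` field in a template matches anything. When several entries match, which one `take` returns is unspecified.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

final class ToySpace {
    // Hypothetical entry type. In a template, a null field means "match anything".
    static final class Task {
        final String status; // "pending" or "done"
        final Integer input;
        final Integer result;

        Task(String status, Integer input, Integer result) {
            this.status = status;
            this.input = input;
            this.result = result;
        }

        boolean matches(Task template) {
            return (template.status == null || template.status.equals(status))
                    && (template.input == null || template.input.equals(input))
                    && (template.result == null || template.result.equals(result));
        }
    }

    private final List<Task> entries = new ArrayList<>();

    synchronized void write(Task entry) {
        entries.add(entry);
        notifyAll(); // wake any take() waiting for a matching entry
    }

    // Blocks until an entry matches the template, then removes and returns it.
    // With several matches, one is picked at random: no ordering is promised.
    synchronized Task take(Task template) throws InterruptedException {
        while (true) {
            List<Integer> hits = new ArrayList<>();
            for (int i = 0; i < entries.size(); i++) {
                if (entries.get(i).matches(template)) hits.add(i);
            }
            if (!hits.isEmpty()) {
                int pick = hits.get(ThreadLocalRandom.current().nextInt(hits.size()));
                return entries.remove(pick);
            }
            wait(); // releases the lock until the next write()
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ToySpace space = new ToySpace();
        // Worker: takes pending work, squares the input, writes the result back into the space.
        Thread worker = new Thread(() -> {
            try {
                for (int n = 0; n < 3; n++) {
                    Task job = space.take(new Task("pending", null, null));
                    space.write(new Task("done", job.input, job.input * job.input));
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.start();
        // Front tier: writes three pieces of work, then collects finished results in any order.
        for (int i = 1; i <= 3; i++) space.write(new Task("pending", i, null));
        int sum = 0;
        for (int n = 0; n < 3; n++) sum += space.take(new Task("done", null, null)).result;
        worker.join();
        System.out.println(sum); // prints 14 (1 + 4 + 9)
    }
}
```

The random pick is deliberate: code built on a space should not rely on getting entries back in the order they were written.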
For compute workers it's OK, but lack of ordering may be an issue for you (if only because of unpredictability) - some implementations have features to enforce FIFO ordering. It was also used for long term data storage as it was persistent, but I don't think that was a good idea. The admin tooling wasn't good enough to make it manageable and performance suffered due to the volume of data.
Dan Creswell's Blitz JavaSpaces implementation was used - it's got a good range of features (can run in transient or persistent modes), is designed to be robust (with transaction logging) and retain high performance, and it's very tunable. As with the other Jini services, you can configure the "exporter" to have it listen on specific ports to make firewalling easier - SSL transports and full PKI were used too and are made possible by Jini's abstraction of communication.
I think Gigaspaces is the only implementation that has continued to innovate by extending the specification in numerous ways, which is nice to see. They've made it fit a wide variety of use-cases and added implementation features such as clustering and high availability. Using it would worry me though, as I'd be much happier seeing two or more implementations of these features in the community, given Gigaspaces is fairly proprietary.
I believe Orbitz, which is a reservation system for hotels, runs on Jini.
Based on Java Posse episodes #82, #84 and #86, which are an interview with Vin Simmons, this technology is sometimes used in military or financial applications, which are unfortunately kept quiet.
I used it a few years back but it probably has not changed much.
@Keith: It is (or at least used to be) possible to start all the services in a single process/JVM, and I think there is documentation out there on how to do this.
I believe Jini/Javaspaces is used in a few large applications (ticketing, cell phones etc) in Europe. Also used by GE Aircraft for research and analysis.
SORCER lab at Texas Tech has a large SOA architecture built on top of Jini/Javaspaces and you may be able to find some help there.
I'm not aware of any new usage of JavaSpaces at this point in time. For distributed computing, most large-scale systems are being built with In-Memory Data Grid technology or partitioned NoSQL-like solutions. (I see a lot of Oracle Coherence being used, but that's probably because I work with it.)
For the sake of full disclosure, I work at Oracle. The opinions and views expressed in this post are my own, and do not necessarily reflect the opinions or views of my employer.
