Do external libraries make apps slower? - java

I am building an app that scrapes information from web pages. To do that I have chosen to use an html scraper called Jsoup because it's so simple to use. Jsoup is also dependent on Apache Commons Lang libray. (Together they make up a total of 385kB ).
So Jsoup will be used to Download the page and parse it.
My question is if the use of these simplifying libraries, instead of using Androids built-in libraries, will make my app slower? (in terms of downloading data and parsing).
I was thinking that the internal libraries would be optimized for Android.

The next release of jsoup will not require Apache Commons-Lang or any other external dependencies, which brings down the jar size to around 115K.
Internally, jsoup uses standard Java libraries (URL connection, HashMap etc) which are going to be reasonably well Android optimised.
I've spent a good amount of time optimising jsoup's parse execution time and data extractor methods; and certainly if you find any ways to improve it, I'm all ears.

If the question is, "Will external libraries INHERENTLY make my app slower than if I wrote the same code myself?", the answer is generally, "Yes, but not very much."
It will take the JVM some time to load an external library. It's likely that the library has functions or features that you aren't using, and loading these or reading past them will take some time. But in most cases this difference will be trivial, and I wouldn't worry about it unless you are in a highly constrained environment.
If what you mean is, "Can I write code that will do the same function faster than an external library?", the answer is, "Almost certainly yes, but is it worth your time?"
The odds are that any external library you use will have all sorts of features that you don't need but are included to accomodate the needs of others. The authors of the library don't know exactly what every user is up to so they have to optimize in a general way. So if you wrote your own code, you could make it do exactly what you need and nothing more, and be optimized to exactly what you are up to.
Whether it's worth the trouble in your particular case is the big question.

The external libraries will also use the internal libraries that are optimized for Android. I guess the real question is: would your custom implementation be faster than the generic implementation of these libraries?
In most cases, third-party libraries solve the problem that you want to solve, but also other problems that you might not need to solve, and it's this part that might hurt performance. You have to find the balance between reinventing the wheel and using optimized code just for your basic needs.
Additionally, if these libraries weren't designed with the Android platform in mind, make sure to test them extensively.

It's the classical build-vs-buy argument.
If run-time performance is really important for your application then you should consider rolling out your own implementation or optimizing the library (assuming it's open source.) However, before you do that you should know good or bad the performance of the existing library is. You won't know that unless you actually use it and get some data.
As a first step I would recommend using the library and collect data regarding it's performance OR ask someone who has already used this library on Android for performance numbers. The library may be slow but if it's acceptable then I guess it's better than rolling one on your own.
Keep in mind when you create your own implementation it will cost your time and money (design, coding, testing and maintenance.) So you are trading off runtime performance for reuse and reduced development cost.
EDIT: Another important point is that performance is a function of many things. For example, the hardware, the Android version and the network. If your target device is running 2.1 or less and you may get a boost in performance by using 2.2. On the other hand, if you want to target all versions you have to adopt a different strategy.

Related

Is there a difference in the speed of the *generated code* between ASM and Javassist?

I am considering runtime byte-code generation/modification for a Java project.
Two important, and still maintained, APIs are ASM and Javassist.
ASM is the fastest at generating the code, and probably the most powerful. But it's also a lot less user-friendly than Javassist.
In my case, I want to perform the bytecode manipulation upfront, so that it is complete at the end of the application setup phase. So the speed of manipulation/generation is not critical. What is critical, is the speed of the generated code, because it will be part of a real-time desktop game, not the typical web-app, where the network delays hide the costs of reflection completely.
So my question is, does Javassist introduce some unnecessary overhead in the byte-code, which would not be present when using ASM? Or, expressed another way, is working at the ASM level going to provide me with a speed boost in the generated code compared to working with Javassist?
[EDIT] I'm interested in the newest version of both tools, and mostly interested to see if anyone tried them both on the same problem, and saw any significant difference in the speed of the resulting classes.
I don't think it would be possible to provide a simple objective answer to this. It would (I imagine) depend on the versions of the respective tools, the nature of the code you are generating, and (most importantly) whether you are using the respective tools as well as they can be used.
The other thing you should consider is whether going down the byte-code manipulation route is likely to give you enough performance benefit to compensate for the increased software development and maintenance pain. (And that can't be answered by anyone but yourself ...)
Anyway, my advice would be to only go down this route if you've already tried the "pure Java" approach and found it to give unacceptable performance.

Should I use a 3rd party stock charting java library or not?

I am creating a project for my studies which is based upon a trading system.
This trading system is using some moving averages to decide when to buy or sell a stock.
The language that I will use is java.
My question is whether I should use a 3rd party library (like JFreeChart) or should I try to build the whole application on my own (I have some java swing experience)?
Thank you.
If there is something which already does what you need, you should really use that in the first place.
Creating your own, is usually not advisable since using 3rd party applications will mean that you will be using something which is already tested and has lots of more features.
Using a 3rd party API will save you time, time which you can than invest into other aspects of your project (communities always tend to make life simpler).
A few more points I thought I should mention is that although developing everything yourself is usually a great (if not the best) way of learning things, I think that employers tend to prefer it when people are already familiar with certain 3rd party popular API's.
If the licenses are compatible (for instance JFreeChart is LGPL, so you can use it without opening your source code), and the library does what you need, why would you reinvent the wheel?
Using a library that is already tested and developed by many people can save you a lot of time, and if you need something different or a bug fixed, you can always fix it yourself and commit it so the rest of the community can also benefit.
If I were you I would befriend JFreeChart. It is a de facto standard for Swing plotting/charting (at least in the free world). It has some quirks and needs some hacking to get your result from time to time. But it is well tested, the limitations are well known, there is a community of users with experience that can help you with the problems.
I would rather not reinvent the wheel, unless I needed a very specific wheel...
It depends how much time you have and what you want to achieve, if you create your own charting it would be great experience, but if you are looking towards building your career in finance domain then you would be better off looking at the pricing models.
Also there are many matured solutions to get your charting requirement done, you can just use them so as to save time and definitely you can learn a lot by looking at some great Open Source code.

MKL Accelerated Math Libraries for Java

I've looked at the related threads on StackOverflow and Googled with not much luck. I'm also very new to Java (I'm coming from a C# and .NET background) so please bear with me. There is so much available in the Java world it's pretty overwhelming.
I'm starting on a new Java-on-Linux project that requires some heavy and highly repetitious numerical calculations (i.e. statistics, FFT, Linear Algebra, Matrices, etc.). So maximizing the performance of the mathematical operations is a requirement, as is ensuring the math is correct. So hence I have an interest in finding a Java library that perhaps leverages native acceleration such as MKL, and is proven (so commercial options are definitely a possibility here).
In the .NET space there are highly optimized and MKL accelerated commercial Mathematical libraries such as Centerspace NMath and Extreme Optimization. Is there anything comparable in Java?
Most of the math libraries I have found for Java either do not seem to be actively maintained (such as Colt) or do not appear to leverage MKL or other native acceleration (such as Apache Commons Math).
I have considered trying to leverage MKL directly from Java myself (e.g. JNI), but me being new to Java (let alone interoperating between Java and native libraries) it seemed smarter finding a Java library that has already done this correctly, efficiently, and is proven.
Again I apologize if I am mistaken or misguided (even in regarding any libraries I've mentioned) and my ignorance of the Java offerings. It's a whole new world for me coming from the heavily commercialized Microsoft stack so I could easily be mistaken on where to look and regarding the Java libraries I've mentioned. I would greatly appreciate any help or advice.
For things like FFT (bulk operations on arrays), the range check in java might kill your performance (at least recently it did). You probably want to look for libraries which optimize the provability of their index bounds.
According to the The HotSpot spec
The Java programming language
specification requires array bounds
checking to be performed with each
array access. An index bounds check
can be eliminated when the compiler
can prove that an index used for an
array access is within bounds.
I would actually look at JNI, and do your bulk operations there if they are individually very large. The longer the operation takes (i.e. solving a large linear system, or large FFT) the more its worth it to use JNI (even if you have to memcpy there and back).
Personally, I agree with your general approach, offloading the heavyweight maths from Java to a commercial-grade library.
Googling around for Java / MKL integration I found this so what you propose is technically possible. Another option to consider would be the NAG libraries. I use the MKL all the time, though I program in Fortran so there are no integration issues. I can certainly recommend their quality and performance. We tested, for instance, the MKL version of FFTW against a version we built from sources ourselves. The MKL implementation was faster by a small integer multiple.
If you have concerns about the performance of calling a library through JNI, then you should plan to structure your application to make fewer larger calls in preference to more smaller ones. As to the difficulties of using JNI, my view (I've done some JNI programming) is that the initial effort you have to make in learning how to use the interface will be well rewarded.
I note that you don't seem to be overwhelmed yet with suggestions of what Java maths libraries you could use. Like you I would be suspicious of research-quality, low-usage Java libraries trawled from the net.
You'd probably be better off avoiding them I think. I could be wrong, it's not a bit I'm too familiar with, so don't take too much from this unless a few others agree with me, but calling up the JNI has quite a large overhead, since it has to go outside of the JRE and everything to do it, so unless you're grouping a lot of things together into a single function to put through at once, the slight benefit of the external library's will be outweighed hugely by the cost of calling them. I'd give up looking for an MKL library and find an optimized pure Java library. I can't say I know of any better than the standard one to recommend though, sorry.

Using SIFT for Augmented Reality

I've come across MANY AR libraries/SDKs/APIs, all of them are marker-based, until I found this video, from the description and the comments, it looks like he's using SIFT to detect the object and follow it around.
I need to do that for Android, so I'm gonna need a full implementation of SIFT in pure Java.
I'm willing to do that but I need to know how SIFT is used for augmented reality first.
I could make use of any information you give.
In my opinion, trying to implement SIFT for a portable device is madness. SIFT is an image feature extraction algorithm, which includes complex math and certainly requires a lot of computing power. SIFT is also patented.
Still, if you indeed want to go forth with this task, you should do quite some research at first. You need to check things like:
Any variants of SIFT that enhance performance, including different algorithms all around
I would recommend looking into SURF which is very robust and much more faster (but still one of those scary algorithms)
Android NDK (I'll explain later)
Lots and lots of publications
Why Android NDK? Because you'll probably have a much more significant performance gain by implementing the algorithm in a C library that's being used by your Java application.
Before starting anything, make sure you do that research cause it would be a pity to realize halfway that the image feature extraction algorithms are just too much for an Android phone. It's a serious endeavor in itself implementing such an algorithm that provides good results and runs in an acceptable amount of time, let alone using it to create an AR application.
As in how you would use that for AR, I guess that the descriptor you get from running the algorithm on an image would have to be matched against with data saved in a central database. Then the results can be displayed to the user. The features of an image gathered from SURF are supposed to describe it such as that it can be then identified using those. I'm not really experienced on doing that but there's always resources on the web. You'd probably wanna start with generic stuff such as Object Recognition.
Best of luck :)
I have tried SURF for 330Mhz Symbian mobile and it was still too slow even with all optimizations and lookup tables. And SIFT should be even more slow. Everyone using FAST for mobiles now. Anyway feature extraction is not a biggest problem. Correspondence and clearing false positive in it is more difficult.
FAST link
http://svr-www.eng.cam.ac.uk/~er258/work/fast.html
If I where you, I'd look into how (and why) the SIFT feature works (as was said, its wikipedia-page offers a good cochise explanation, and for more details check the science paper (which is linked to at wikipedia)), and then build your own variant that suits your taste; i.e. has the optimal balance between performance and cpu-load, needed for your application.
For instance, I think Gaussian smoothing might be replaced by some faster way of smoothing.
Also, when you build your own variant, you don't have anything to do with patents (there already are lots of variants, like GLOH).
I would recommend you to start by looking at the features already implemented in the OpenCV library, which include SURF, MSER and others:
http://opencv.willowgarage.com/documentation/cpp/feature_detection.html
This might be enough for your application and are faster than SIFT. And as mentioned above, SIFT is patented.
Also, start by making performance tests in your mobile platform, just by extracting the features at every frame, this way you'll have an idea which ones can run real-time or not.
Have you tried OpenCV's FAST implementation in the Android port? I've tested it out and it runs blazingly fast.
You can also compute reduced histogram descriptors around the detected FAST keypoints. I've heard of 3x3 rather than standard 4x4 of SIFT. That has a decent chance of working in real time if you optimize it heavily with NEON instructions. Otherwise, I'd recommend something fast and simple like sum of squared or absolute differences for a patch around the keypoints which are very fast.
SIFT is not a panacea. For real time video applications, it's usually overkill.
As always, Wikipedia is a good place to start from : http://en.wikipedia.org/wiki/Scale-invariant_feature_transform, but note that SIFT is patented.

Choosing Java vs Python on Google App Engine

Currently Google App Engine supports both Python & Java. Java support is less mature. However, Java seems to have a longer list of libraries and especially support for Java bytecode regardless of the languages used to write that code. Which language will give better performance and more power? Please advise. Thank you!
Edit:
http://groups.google.com/group/google-appengine-java/web/will-it-play-in-app-engine?pli=1
Edit:
By "power" I mean better expandability and inclusion of available libraries outside the framework. Python allows only pure Python libraries, though.
I'm biased (being a Python expert but pretty rusty in Java) but I think the Python runtime of GAE is currently more advanced and better developed than the Java runtime -- the former has had one extra year to develop and mature, after all.
How things will proceed going forward is of course hard to predict -- demand is probably stronger on the Java side (especially since it's not just about Java, but other languages perched on top of the JVM too, so it's THE way to run e.g. PHP or Ruby code on App Engine); the Python App Engine team however does have the advantage of having on board Guido van Rossum, the inventor of Python and an amazingly strong engineer.
In terms of flexibility, the Java engine, as already mentioned, does offer the possibility of running JVM bytecode made by different languages, not just Java -- if you're in a multi-language shop that's a pretty large positive. Vice versa, if you loathe Javascript but must execute some code in the user's browser, Java's GWT (generating the Javascript for you from your Java-level coding) is far richer and more advanced than Python-side alternatives (in practice, if you choose Python, you'll be writing some JS yourself for this purpose, while if you choose Java GWT is a usable alternative if you loathe writing JS).
In terms of libraries it's pretty much a wash -- the JVM is restricted enough (no threads, no custom class loaders, no JNI, no relational DB) to hamper the simple reuse of existing Java libraries as much, or more, than existing Python libraries are similarly hampered by the similar restrictions on the Python runtime.
In terms of performance, I think it's a wash, though you should benchmark on tasks of your own -- don't rely on the performance of highly optimized JIT-based JVM implementations discounting their large startup times and memory footprints, because the app engine environment is very different (startup costs will be paid often, as instances of your app are started, stopped, moved to different hosts, etc, all trasparently to you -- such events are typically much cheaper with Python runtime environments than with JVMs).
The XPath/XSLT situation (to be euphemistic...) is not exactly perfect on either side, sigh, though I think it may be a tad less bad in the JVM (where, apparently, substantial subsets of Saxon can be made to run, with some care). I think it's worth opening issues on the Appengine Issues page with XPath and XSLT in their titles -- right now there are only issues asking for specific libraries, and that's myopic: I don't really care HOW a good XPath/XSLT is implemented, for Python and/or for Java, as long as I get to use it. (Specific libraries may ease migration of existing code, but that's less important than being able to perform such tasks as "rapidly apply XSLT transformation" in SOME way!-). I know I'd star such an issue if well phrased (especially in a language-independent way).
Last but not least: remember that you can have different version of your app (using the same datastore) some of which are implemented with the Python runtime, some with the Java runtime, and you can access versions that differ from the "default/active" one with explicit URLs. So you could have both Python and Java code (in different versions of your app) use and modify the same data store, granting you even more flexibility (though only one will have the "nice" URL such as foobar.appspot.com -- which is probably important only for access by interactive users on browsers, I imagine;-).
Watch this app for changes in Python and Java performance:
http://gaejava.appspot.com/
(edit: apologies, link is broken now. But following para still applied when I saw it running last)
Currently, Python and using the low-level API in Java are faster than JDO on Java, for this simple test. At least if the underlying engine changes, that app should reflect performance changes.
Based on experience with running these VMs on other platforms, I'd say that you'll probably get more raw performance out of Java than Python. Don't underestimate Python's selling points, however: The Python language is much more productive in terms of lines of code - the general agreement is that Python requires a third of the code of an equivalent Java program, while remaining as or more readable. This benefit is multiplied by the ability to run code immediately without an explicit compile step.
With regards to available libraries, you'll find that much of the extensive Python runtime library works out of the box (as does Java's). The popular Django Web framework (http://www.djangoproject.com/) is also supported on AppEngine.
With regards to 'power', it's difficult to know what you mean, but Python is used in many different domains, especially the Web: YouTube is written in Python, as is Sourceforge (as of last week).
June 2013: This video is a very good answer by a google engineer:
http://www.youtube.com/watch?v=tLriM2krw2E
TLDR; is:
Pick the language that you and your team is most productive with
If you want to build something for production: Java or Python (not Go)
If you have a big team and a complex code base: Java (because of static code analysis and refactoring)
Small teams that iterate quickly: Python (although Java is also okay)
An important question to consider in deciding between Python and Java is how you will use the datastore in each language (and most other angles to the original question have already been covered quite well in this topic).
For Java, the standard method is to use JDO or JPA. These are great for portability but are not very well suited to the datastore.
A low-level API is available but this is too low level for day-to-day use - it is more suitable for building 3rd party libraries.
For Python there is an API designed specifically to provide applications with easy but powerful access to the datastore. It is great except that it is not portable so it locks you into GAE.
Fortunately, there are solutions being developed for the weaknesses listed for both languages.
For Java, the low-level API is being used to develop persistence libraries that are much better suited to the datastore then JDO/JPA (IMO). Examples include the Siena project, and Objectify.
I've recently started using Objectify and am finding it to be very easy to use and well suited to the datastore, and its growing popularity has translated into good support. For example, Objectify is officially supported by Google's new Cloud Endpoints service. On the other hand, Objectify only works with the datastore, while Siena is 'inspired' by the datastore but is designed to work with a variety of both SQL databases and NoSQL datastores.
For Python, there are efforts being made to allow the use of the Python GAE datastore API off of the GAE. One example is the SQLite backend that Google released for use with the SDK, but I doubt they intend this to grow into something production ready. The TyphoonAE project probably has more potential, but I don't think it is production ready yet either (correct me if I am wrong).
If anyone has experience with any of these alternatives or knows of others, please add them in a comment. Personally, I really like the GAE datastore - I find it to be a considerable improvement over the AWS SimpleDB - so I wish for the success of these efforts to alleviate some of the issues in using it.
I'm strongly recommending Java for GAE and here's why:
Performance: Java is potentially faster then Python.
Python development is under pressure of a lack of third-party libraries. For example, there is no XSLT for Python/GAE at all. Almost all Python libraries are C bindings (and those are unsupported by GAE).
Memcache API: Java SDK have more interesting abilities than Python SDK.
Datastore API: JDO is very slow, but native Java datastore API is very fast and easy.
I'm using Java/GAE in development right now.
As you've identified, using a JVM doesn't restrict you to using the Java language. A list of JVM languages and links can be found here. However, the Google App Engine does restrict the set of classes you can use from the normal Java SE set, and you will want to investigate if any of these implementations can be used on the app engine.
EDIT: I see you've found such a list
I can't comment on the performance of Python. However, the JVM is a very powerful platform performance-wise, given its ability to dynamically compile and optimise code during the run time.
Ultimately performance will depend on what your application does, and how you code it. In the absence of further info, I think it's not possible to give any more pointers in this area.
I've been amazed at how clean, straightforward, and problem free the Python/Django SDK is. However I started running into situations where I needed to start doing more JavaScript and thought I might want to take advantage of the GWT and other Java utilities. I've gotten just half way through the GAE Java tutorial, and have had one problem after another: Eclipse configuration issues, JRE versionitis, the mind-numbing complexity of Java, and a confusing and possibly broken tutorial. Checking out this site and others linked from here clinched it for me. I'm going back to Python, and I'll look into Pyjamas to help with my JavaScript challenges.
I'm a little late to the conversation, but here are my two cents. I really had a hard time choosing between Python and Java, since I am well versed in both languages. As we all know, there are advantages and disadvantages for both, and you have to take in account your requirements and the frameworks that work best for your project.
As I usually do in this type of dilemmas, I look for numbers to support my decision. I decided to go with Python for many reasons, but in my case, there was one plot that was the tipping point. If you search "Google App Engine" in GitHub as of September 2014, you will find the following figure:
There could be many biases in these numbers, but overall, there are three times more GAE Python repositories than GAE Java repositories. Not only that, but if you list the projects by the "number of stars" you will see that a majority of the Python projects appear at the top (you have to take in account that Python has been around longer). To me, this makes a strong case for Python because I take in account community adoption & support, documentation, and availability of open-source projects.
It's a good question, and I think many of the responses have given good view points of pros and cons on both sides of the fence. I've tried both Python and JVM-based AppEngine (in my case I was using Gaelyk which is a Groovy application framework built for AppEngine). When it comes to performance on the platform, one thing I hadn't considered until it was staring me in the face is the implication of "Loading Requests" that occur on the Java side of the fence. When using Groovy these loading requests are a killer.
I put a post together on the topic (http://distractable.net/coding/google-appengine-java-vs-python-performance-comparison/) and I'm hoping to find a way of working around the problem, but if not I think I'll be going back to a Python + Django combination until cold starting java requests has less of an impact.
Based on how much I hear Java people complain about AppEngine compared to Python users, I would say Python is much less stressful to use.
There's also project Unladen Swallow, which is apparently Google-funded if not Google-owned. They're trying to implement a LLVM-based backend for Python 2.6.1 bytecode, so they can use a JIT and various nice native code/GC/multi-core optimisations. (Nice quote: "We aspire to do no original work, instead using as much of the last 30 years of research as possible.") They're looking for a 5x speed-up to CPython.
Of course this doesn't answer your immediate question, but points towards a "closing of the gap" (if any) in the future (hopefully).
The beauty of python nowdays is how well it communicates with other languages. For instance you can have both python and java on the same table with Jython. Of course jython even though it fully supports java libraries it does not support fully python libraries. But its an ideal solution if you want to mess with Java Libraries. It even allows you to mix it with Java code with no extra coding.
But even python itself has made some steps forwared. See ctypes for example, near C speed , direct accees to C libraries all of this without leaving the comfort of python coding. Cython goes one step further , allowing to mix c code with python code with ease, or even if you dont want to mess with c or c++ , you can still code in python but use statically type variables making your python programms as fast as C apps. Cython is both used and supported by google by the way.
Yesterday I even found tools for python to inline C or even Assembly (see CorePy) , you cant get any more powerful than that.
Python is surely a very mature language, not only standing on itself , but able to coooperate with any other language with easy. I think that is what makes python an ideal solution even in a very advanced and demanding scenarios.
With python you can have acess to C/C++ ,Java , .NET and many other libraries with almost zero additional coding giving you also a language that minimises, simplifies and beautifies coding. Its a very tempting language.
Gone with Python even though GWT seems a perfect match for the kind of an app I'm developing. JPA is pretty messed up on GAE (e.g. no #Embeddable and other obscure non-documented limitations). Having spent a week, I can tell that Java just doesn't feel right on GAE at the moment.
One think to take into account are the frameworks you intend yo use. Not all frameworks on Java side are well suited for applications running on App Engine, which is somewhat different than traditional Java app servers.
One thing to consider is the application startup time. With traditional Java web apps you don't really need to think about this. The application starts and then it just runs. Doesn't really matter if the startup takes 5 seconds or couple of minutes. With App Engine you might end up in a situation where the application is only started when a request comes in. This means the user is waiting while your application boots up. New GAE features like reserved instances help here, but check first.
Another thing are the different limitations GAE psoes on Java. Not all frameworks are happy with the limitations on what classes you can use or the fact that threads are not allowed or that you can't access local filesystem. These issues are probably easy to find out by just googling about GAE compatibility.
I've also seen some people complaining about issues with session size on modern UI frameworks (Wicket, namely). In general these frameworks tend to do certain trade-offs in order to make development fun, fast and easy. Sometimes this may lead to conflicts with the App Engine limitations.
I initially started developing working on GAE with Java, but then switched to Python because of these reasons. My personal feeling is that Python is a better choice for App Engine development. I think Java is more "at home" for example on Amazon's Elastic Beanstalk.
BUT with App Engine things are changing very rapidly. GAE is changing itself and as it becomes more popular, the frameworks are also changing to work around its limitations.

Categories