Is it necessary to learn Java for Hadoop? [closed] - java

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 9 years ago.
I want to start learning big data technology from scratch. Is it necessary to learn Java to work with Hadoop, given that I am already well versed in Python?

No, you don't necessarily need Java knowledge, as you can write MapReduce jobs perfectly well in Pig or Hive (the latter is similar to SQL). However, as with all layers of abstraction, at some point you may well need to know what is going on "behind the scenes", and being able to read, understand and debug the underlying Java is a big advantage.
There is a lot of effort currently going into providing a more complete SQL interface to Hadoop, with tools such as Impala (Cloudera), Presto (Facebook), Phoenix and Hive (already mentioned).

Check out MRJob, a Python-based wrapper for running, logging and monitoring Hadoop jobs.
Although pure Java solutions might be faster in some cases, you will hardly ever need to debug Java code.

Not needed at all, though that's just my opinion. If you know Python well you should be fine.
Check this out: writing a Hadoop MapReduce program in Python. There are a lot of ways to implement solutions with Hadoop. Just because a great deal of them are in Java doesn't mean Java is the only tool you can use. If you're working with legacy code that is written in Java then knowing the basics may help, but to be honest I think you could just look things up as you come across them. There is no need to spend a week learning the intricacies of Java 7 and what's new in Java 8 for your current needs.
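For what it's worth, here is a minimal sketch of what a MapReduce word count can look like in plain Python, in the style used with Hadoop Streaming: the mapper and reducer read lines on stdin and write tab-separated key/value pairs to stdout. The file name wordcount.py, the map/reduce command-line argument and all function names are made up for illustration; this is not code from the tutorial mentioned above.

```python
#!/usr/bin/env python3
"""Word count in the Hadoop Streaming style (illustrative sketch)."""
import sys
from itertools import groupby


def map_lines(lines):
    """Mapper: emit (word, 1) for every whitespace-separated word."""
    for line in lines:
        for word in line.split():
            yield word.lower(), 1


def parse_pairs(lines):
    """Turn 'word<TAB>count' lines back into (word, int) pairs."""
    for line in lines:
        word, count = line.rstrip("\n").split("\t")
        yield word, int(count)


def reduce_pairs(pairs):
    """Reducer: sum the counts per word.

    Assumes the pairs arrive sorted by word, which is what the
    shuffle/sort phase provides between the map and reduce steps.
    """
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        yield word, sum(count for _, count in group)


def run(role, stdin, stdout):
    """Act as the mapper or the reducer, depending on role."""
    if role == "map":
        pairs = map_lines(stdin)
    else:
        pairs = reduce_pairs(parse_pairs(stdin))
    for word, count in pairs:
        stdout.write(f"{word}\t{count}\n")


# Hypothetical invocation: "wordcount.py map" or "wordcount.py reduce".
if __name__ == "__main__" and sys.argv[1:] in (["map"], ["reduce"]):
    run(sys.argv[1], sys.stdin, sys.stdout)
```

You can try it without a cluster by piping through sort, e.g. cat input.txt | python3 wordcount.py map | sort | python3 wordcount.py reduce (input.txt being any text file you have). On a cluster, Hadoop Streaming takes the two commands through its -mapper and -reducer options.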

Related

Integration of different parts of code [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 6 years ago.
I am working on a small project on metadata extraction from documents and have run into, eh, a dilemma. I have some libraries in Java which work well for document handling and information retrieval, like Apache Tika, POI, etc., plus some more tools in other languages, like Ruby (pdf-extract), and a bash script that fetches data from a RESTful API using wget.
AFAIK, code reuse is a good thing, right? But then, if it's not possible (natively, I mean) to reuse all this code, what approach should be taken?
Using Java to run terminal commands is a solution, but I don't think it is good programming practice.
Integrating multiple technologies is something that is very common in real world applications. In order for it to scale properly, you probably want to use some methodology to keep things consistent. To me, the weakest part is probably fetching using wget, but that's my opinion.
In order to integrate and for everything to scale nicely you may want to look at some message passing protocols and have some sort of handling of queues where individual workers run in different programming languages and environments. Look at:
https://www.amqp.org/ (message passing standard)
https://www.rabbitmq.com/ (Java, .NET, Ruby, Python, PHP, JavaScript...)
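To make the queue idea a bit more concrete, here is a rough sketch of the pattern. It uses Python's standard-library queue.Queue as a stand-in for a real broker such as RabbitMQ, so it only shows the shape of the design, not an AMQP client. The message fields (kind, payload, result) and the handler name are invented for this example. The point is that tasks travel as JSON text, so the worker on the other end could just as well be written in Java or Ruby.

```python
import json
import queue
import threading


def make_task(kind, payload):
    """Serialize a task as JSON so a worker in any language could read it."""
    return json.dumps({"kind": kind, "payload": payload})


def worker(tasks, results, handlers):
    """Pull JSON tasks until a None sentinel arrives, dispatching on 'kind'."""
    while True:
        message = tasks.get()
        if message is None:  # sentinel: no more work
            break
        task = json.loads(message)
        handler = handlers[task["kind"]]
        reply = {"kind": task["kind"], "result": handler(task["payload"])}
        results.put(json.dumps(reply))


def run_pipeline(messages, handlers):
    """Feed messages to a single worker thread and collect its replies."""
    tasks, results = queue.Queue(), queue.Queue()
    thread = threading.Thread(target=worker, args=(tasks, results, handlers))
    thread.start()
    for message in messages:
        tasks.put(message)
    tasks.put(None)
    thread.join()
    replies = []
    while not results.empty():
        replies.append(json.loads(results.get()))
    return replies


if __name__ == "__main__":
    # Hypothetical handler: count the words in a piece of extracted text.
    handlers = {"count_words": lambda text: len(text.split())}
    print(run_pipeline([make_task("count_words", "some extracted text")], handlers))
```

With a real broker the two in-process queues would be replaced by named queues on the broker, and each worker would be a separate process in whatever language suits its library.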

Java: SQL and Statistics/Machine Learning [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 8 years ago.
I have a question for you concerning Java. I am basically a Java user and did most of my work with it. However, in the machine learning classes I took in college, we mostly used Python with the scikit-learn and NumPy packages.
Now I want to do a project where I crawl data from the web, store it in SQL databases, and then do machine learning on this data. Maybe some of you have experience with those things and can share some of it? I mean, of course it is possible to do these things with Java, but maybe you have had some particular experiences that suggest why I should use something else, or what to consider?
I am happy for all your thoughts :-)
Have a great weekend!
It turns out that programming language and database implementation are secondary problems. Think first about the machine learning you want to do. Review the existing packages (in any language) and pick one according to how well it fits the needs of the business problem you are trying to solve. Then work with whatever language is most convenient for that package. You will probably find that no single language is suitable for all parts of the problem; you will end up gluing together Java, Python, R, shell scripts, etc, to make a complete solution, and there's nothing wrong with that. Consider that your job is problem solving instead of programming in a specific language and go from there.
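As a small illustration of the gluing, here is a sketch in which a SQL store sits between whatever collects the data and whatever learns from it. Everything in it is an assumption for the sake of the example: the table name samples, the columns x and y, and a plain least-squares line standing in for the "machine learning" step. It uses SQLite from the Python standard library only because that needs no setup; the crawler that fills the table could be written in any language that can talk to the database.

```python
import sqlite3


def store_samples(conn, samples):
    """Persist (x, y) pairs; a crawler in any language could fill this table."""
    conn.execute("CREATE TABLE IF NOT EXISTS samples (x REAL, y REAL)")
    conn.executemany("INSERT INTO samples (x, y) VALUES (?, ?)", samples)
    conn.commit()


def load_samples(conn):
    """Read the pairs back for the modelling step."""
    return conn.execute("SELECT x, y FROM samples ORDER BY x").fetchall()


def fit_line(points):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # in-memory database, nothing written to disk
    store_samples(conn, [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0)])
    print(fit_line(load_samples(conn)))
```

The database is the only contract between the two halves, which is why the choice of language on either side matters less than it first seems.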

How to maintain a database of words in a Java dictionary application (other than files) [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
I am developing a Java dictionary application...
I used JDBC for maintaining words, but I realized that it won't work on other machines since I am making a desktop application...
Please suggest a way to maintain words other than files.
I don't think there is an alternative to using files if you want to persist your data - it will have to be stored somewhere!
What about the following:
- Use an in-memory database that can be persisted to file (H2, for example)
- Use XML persistence - there are plenty of libraries to help.
My suggestion: http://www.manning.com/ingersoll/ :)
And after getting familiar with it, try Solr.
https://lucene.apache.org/solr/
Btw: Please make your question more specific. As this is your first post, I upvoted the question. But think of all the guys around here, who have no knowledge about your problem. Everyone wants to help, but you have to provide more information.
Why not use Embedded Derby?
It's platform independent, uses standard JDBC, and writes files which are accessible across any platform.
http://db.apache.org/derby/papers/DerbyTut/embedded_intro.html
Incidentally, there's no reason why you cannot use JDBC in a standalone Java application. Think of JDBC as a kind of socket you can plug into: what's behind that socket is entirely implementation-specific, and can usually be defined by configuration.
JCR is another good example of this, but this kind of engineering technique is plentiful in the Java universe.

Convert Prolog application to a JVM-based language? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 9 years ago.
I have a legacy ISO Prolog application of medium size that I would like to move to a JVM-based language. The application is a command line tool that parses text files, does some evaluation/transformations and then exports a text-based file.
My team develops mainly in Java, so we have a lot of existing Java competence and reusable components. Prolog competence, however, is very low.
I don't expect there to be a tool that takes Prolog source code and transforms it to some other language. But I'm trying to understand what would be the easiest solution. Starting from scratch in Java or using a more functional language like Clojure?
But I'm trying to understand what would be the easiest solution.
Some implementations of Prolog run on the JVM platform. Wikipedia lists 5 of them here: http://en.wikipedia.org/wiki/Comparison_of_Prolog_implementations. So maybe the easiest solution is to train someone on your team in Prolog, and just port the application to a JVM Prolog implementation. (Which might be a simple thing ...)
Someone on your team is likely to need Prolog skills anyway to successfully translate Prolog to some other language.
However, I recognize that there could be other reasons to translate; e.g. if the existing Prolog code needs a major overhaul anyway.
That mainly depends on your team's skills. You mentioned that your team has pretty good Java skills; why not start with that?
If they don't know LISP, they will spend a lot of time learning it from scratch. Learning LISP is quite an investment, but it definitely pays off in the end.
Although Clojure is going to help you a lot in your case (because of data flow and data transformation), I would say that Java is a better bet since your team is competent with it.
You could consider using Clojure together with core.logic (tutorial) which is a miniKanren implementation. You would need some logic/functional programming skills but you could stay on the JVM.
Prolog is very different from Java and other object-oriented languages. I studied this language to see another way of programming.
But I don't think there is a magic solution to convert a Prolog app to a Java app. The logic is not the same, and no other language is like Prolog. I think you will have to analyse what your Prolog app does exactly and start from scratch with a new Java app.

Extend an already existing open source PHP tool in Java [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
Please tell me, is it possible to extend a PHP-based tool in the Java programming language?
I am trying some open source tools and found one that satisfies most of my requirements. But my problem is that the tool I found is developed in PHP, and I only know Java, and there are some business requirements as well. I need to add some more features to this tool. What is the best possible way to achieve this extension?
Please guide me friends.
Thanks a lot.
You can't wrap or extend PHP code from Java the way you would a Java library, but you might be able to call PHP code on your server and work with the results from your own Java application. See How can I execute a PHP script from Java?
Barring that, I agree with the comment above. It might be easier to either learn PHP or rewrite the parts of the code that you need in Java.
After seeing in your comments what PHP tool you're talking about, rewriting in Java doesn't seem like much of an option. If you're starting a site from scratch (meaning you don't already have a Java application that you want to add this to) then PHP is not that hard to learn. Other than that, you could look for Java projects that do the same thing as the PHP tool you're looking at now.
Well, it would be useful if you'd tell us what exactly your PHP tool/framework does, maybe even give us a link to its site; then I'm sure you'll get tons of recommendations for a corresponding Java tool/framework that does the same thing. I don't think you need to remake a PHP tool in Java by hand. Really, I can't think of PHP libraries that don't also exist for Java...
