I am trying to build a recommendation engine using Mahout, Hadoop, and Java. This is my first time working with Hadoop. I get my data sets from a server where Hadoop is already installed, which is a Linux environment. My development environment is Windows. Do I need to install Mahout in my development environment or on the server? And if I need Mahout in my development environment, do I also need to install Hadoop there?
If you don't have Hadoop installed on your machine, Mahout will simply run its jobs locally (Hadoop's standalone/local mode) on the current machine, using the Hadoop libraries it bundles.
Nonetheless, Windows and Hadoop don't really get along, and depending on your Mahout version (more specifically, on the Hadoop dependency it pulls in), you will most likely run into this issue (link). The issue is present from Hadoop 0.20.204 onwards (although I must admit I don't know whether it has been fixed in the latest version of Hadoop).
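If you mainly want to prototype the recommendation logic on Windows before touching the cluster, one option is Mahout's non-distributed recommender API (the "Taste" classes), which needs neither Hadoop nor the server. Below is a minimal sketch, assuming a CSV file of userID,itemID,rating lines; the file name, the neighborhood size, and the user ID are illustrative placeholders, not values from your setup.

import java.io.File;
import java.util.List;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;

public class LocalRecommenderSketch {
    public static void main(String[] args) throws Exception {
        // ratings.csv: one "userID,itemID,rating" triple per line (placeholder path)
        DataModel model = new FileDataModel(new File("ratings.csv"));

        // User-based collaborative filtering with a 10-nearest-neighbor neighborhood
        UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
        UserNeighborhood neighborhood = new NearestNUserNeighborhood(10, similarity, model);
        Recommender recommender = new GenericUserBasedRecommender(model, neighborhood, similarity);

        // Top 3 recommendations for user 1
        List<RecommendedItem> recommendations = recommender.recommend(1, 3);
        for (RecommendedItem item : recommendations) {
            System.out.println(item.getItemID() + " : " + item.getValue());
        }
    }
}

Once the logic works locally, you can move on to Mahout's Hadoop-based jobs on the Linux server where Hadoop is already installed.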
I'm new to Storm, ZooKeeper, and Java.
I imported a project that uses Storm 0.8.2 with the recommended Maven 3.x, but I don't know which versions of ZooKeeper and Maven I should install.
I have installed Java 7:
java -version
java version "1.7.0_80"
Is it a problem that I am using a different Java version than the original author did? And is there any problem using Java 7 with an old version of Storm?
There is no problem using Java 7 with Storm 0.8.2; I have done it myself. But stay away from ZooKeeper 3.3.3: try ZooKeeper 3.3.5 or higher, since 3.3.3 caused Storm workers to hang, crash, or stop after 4-5 hours.
As for Maven, Maven 3.3 will work fine to build a JAR compatible with Storm.
I am trying to run some MapReduce programs, over SSH, on a remote Hadoop 2.0.0 server running CentOS 6.4.
I am using Eclipse Luna on my Windows 8 machine.
Is there a way to run the programs directly from Eclipse without converting them to JAR files?
If Hadoop is running on a Linux machine, you cannot connect directly from Windows; the connection has to go over SSH.
I believe you are looking for something like this:
https://help.eclipse.org/kepler/index.jsp?topic=%2Forg.eclipse.rse.doc.user%2Fgettingstarted%2Fgusing.html
A similar question with the correct answer is here:
Work on a remote project with Eclipse via SSH
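If the main goal is to iterate on the code without building a JAR each time, another option during development is to run the job in Hadoop's local mode directly from Eclipse, and only package a JAR once you submit to the real cluster over SSH. A rough sketch, assuming Hadoop 2.x configuration keys; the paths are placeholders, and the identity Mapper/Reducer stand in for your own classes.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class LocalModeDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "file:///");          // local file system instead of HDFS
        conf.set("mapreduce.framework.name", "local"); // run in-process, no JAR is shipped anywhere

        Job job = Job.getInstance(conf, "local-dev-run");
        job.setMapperClass(Mapper.class);    // replace with your mapper
        job.setReducerClass(Reducer.class);  // replace with your reducer
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);

        FileInputFormat.addInputPath(job, new Path("input"));     // placeholder local paths
        FileOutputFormat.setOutputPath(job, new Path("output"));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}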
I would like to develop and test a MapReduce program on a Windows 7 machine before deploying it to a Hadoop cluster.
If this is possible, can anybody point me to some good resources?
I have Eclipse installed on my Windows machine; what else do I need to develop and test a MapReduce program?
You can test locally with MRUnit: http://mrunit.apache.org/
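A rough sketch of what such a test can look like, assuming MRUnit is on the classpath together with JUnit; the WordCount-style mapper and the test values are illustrative, not taken from your project.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mrunit.mapreduce.MapDriver;
import org.junit.Before;
import org.junit.Test;

public class WordCountMapperTest {

    // Hypothetical mapper under test: emits (word, 1) for each token in a line
    public static class WordCountMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer tokens = new StringTokenizer(value.toString());
            while (tokens.hasMoreTokens()) {
                word.set(tokens.nextToken());
                context.write(word, ONE);
            }
        }
    }

    private MapDriver<LongWritable, Text, Text, IntWritable> mapDriver;

    @Before
    public void setUp() {
        mapDriver = MapDriver.newMapDriver(new WordCountMapper());
    }

    @Test
    public void emitsOneCountPerToken() throws IOException {
        mapDriver.withInput(new LongWritable(0), new Text("hello hadoop"))
                 .withOutput(new Text("hello"), new IntWritable(1))
                 .withOutput(new Text("hadoop"), new IntWritable(1))
                 .runTest();
    }
}

Tests like this run entirely inside the JVM on Windows, so no Hadoop installation is needed for them.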
Linux is the only OS supported for production. I don't know whether you can install Hadoop on Windows (I never tried), but I would suggest going with a "Hadoop distribution".
You can, for example, get a VirtualBox image from Hortonworks or Cloudera (major Hadoop vendors) and run it on your Windows machine for development and testing purposes.
Such a distribution usually includes the Hadoop ecosystem: Hadoop, HDFS, Pig, Hive, HCatalog, Impala, and so on.
Everything is configured to be compatible with everything else (a real time-saver).
I submit my MapReduce jobs from a Java application running on Windows to a Hadoop 2.2 cluster running on Ubuntu. In Hadoop 1.x this worked as expected, but on Hadoop 2.2 I get a strange error:
java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Ljava/lang/String;I)Z
I compiled the necessary Windows libraries (hadoop.dll and winutils.exe), and I can access HDFS from code and read the cluster information using the Hadoop API. Only the job submission does not work.
Any help is appreciated.
Solution: I found it out myself; the directory containing the Windows Hadoop binaries has to be added to the Windows PATH variable.
Get hadoop.dll (or libhadoop.so on *nix). Make sure its bitness (32- vs. 64-bit) matches your JVM's.
Make sure it is available via PATH or java.library.path.
Note that setting java.library.path overrides PATH. If you set java.library.path, make sure it is correct and contains the hadoop library.
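A small diagnostic sketch you can run from the same JVM that later submits the jobs: it prints the paths the JVM actually searches and fails fast if hadoop.dll cannot be loaded. The C:\hadoop location is an assumption; point it at your own installation.

public class NativeLibCheck {
    public static void main(String[] args) {
        System.out.println("PATH              = " + System.getenv("PATH"));
        System.out.println("java.library.path = " + System.getProperty("java.library.path"));

        // Hadoop's shell utilities look for winutils.exe under %HADOOP_HOME%\bin;
        // setting hadoop.home.dir programmatically is a common workaround when the
        // environment variable is not visible to the JVM that submits the job.
        System.setProperty("hadoop.home.dir", "C:\\hadoop"); // assumed location

        // Throws UnsatisfiedLinkError here (instead of deep inside job submission)
        // if hadoop.dll is missing from java.library.path or has the wrong bitness.
        System.loadLibrary("hadoop");
        System.out.println("hadoop native library loaded successfully");
    }
}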
This error generally occurs because of a mismatch between the binary files in your %HADOOP_HOME%\bin folder and your Hadoop version. What you need to do is get hadoop.dll and winutils.exe built for your specific Hadoop version and copy them into your %HADOOP_HOME%\bin folder.
I had been having issues with my Windows 10 Hadoop installation since this morning: the NameNode and DataNode would not start because of mismatched binary files. The issues were resolved after I replaced the bin folder with one that corresponds to my Hadoop version. Possibly the bin folder that came with the installation was built for a different version; I don't know how that happened. If all your configuration is intact, you might want to replace the bin folder with one that corresponds to your Hadoop installation.
I have installed Hadoop in pseudo-distributed mode on a VMware VM running SUSE Linux Enterprise Server 11. I am able to run the hello-world examples such as word count. I also used WinSCP to connect to that VM and uploaded several XML files into the Hadoop cluster.
My question is: how can I configure Eclipse on my local Windows 7 machine to connect to that VM and write some Java code to work with the data I have dumped into the cluster? I did some work and was able to get the Map/Reduce perspective in Eclipse, but I cannot figure out how to connect to the Hadoop instance on the VM from my local machine, write my Java code (mapper and reducer classes) to process the data, and save the results back into the cluster.
If someone can help me with this, that would be great. Thanks in advance.
Let me know if more information is needed.
I am using hadoop-0.20.2-cdh3u5 and Eclipse Europa 3.3.1.
I am struggling with this as well at the moment. Maybe you will find these links helpful:
http://www.bigdatafarm.com/?p=4
http://developer.yahoo.com/hadoop/tutorial/module3.html
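In the meantime, here is roughly what the client side can look like when the driver runs in Eclipse on Windows and the job executes against the VM. This is only a sketch: "vm-host", the ports, and the paths are placeholders for your own setup, and the configuration keys are the old ones matching hadoop-0.20.x.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class RemoteClusterDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Point the local client at the cluster running inside the VM (placeholder host/ports)
        conf.set("fs.default.name", "hdfs://vm-host:8020");
        conf.set("mapred.job.tracker", "vm-host:8021");

        Job job = new Job(conf, "xml-processing");
        job.setJarByClass(RemoteClusterDriver.class); // the job classes are shipped to the cluster as a JAR
        job.setMapperClass(Mapper.class);             // replace with your mapper class
        job.setReducerClass(Reducer.class);           // replace with your reducer class
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);

        // Input and output live on the VM's HDFS, so the results stay in the cluster
        FileInputFormat.addInputPath(job, new Path("/user/you/input-xml"));
        FileOutputFormat.setOutputPath(job, new Path("/user/you/output"));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}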
Cheers