## INSTALLING HADOOP ON MAC OSX LION

Although you are likely to run Hadoop on a big cluster of computers, it is useful to have it installed locally for debugging and testing purposes. Here are some quick notes on how to set up Hadoop on Mac OS X Lion. Please refer to the references below for details.

## Detailed Instructions

Step 1: Install Hadoop using Homebrew

If you haven’t heard about Homebrew, you should definitely give it a try. It makes installing and uninstalling software effortless and keeps your machine clean of unused files. Below I am using Homebrew to install Hadoop.

brew install hadoop


Step 2: Edit Configurations

Step 2.1: Add the following line to /usr/local/Cellar/hadoop/1.0.1/libexec/conf/hadoop-env.sh. This line is required to overcome an error related to “SCDynamicStore”, specifically “Unable to load realm info from SCDynamicStore”.

export HADOOP_OPTS="-Djava.security.krb5.realm=OX.AC.UK -Djava.security.krb5.kdc=kdc0.ox.ac.uk:kdc1.ox.ac.uk"
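
If you prefer to script the edit, here is a hedged sketch that appends the line only if it is not already present. The real path is the Homebrew conf directory above; the sketch defaults to a throwaway directory so it is safe to try as-is.

```shell
# Idempotently append the SCDynamicStore workaround to hadoop-env.sh.
# Set CONF to /usr/local/Cellar/hadoop/1.0.1/libexec/conf for a real
# install; it defaults to a temporary directory here for safety.
CONF="${CONF:-$(mktemp -d)}"
touch "$CONF/hadoop-env.sh"
LINE='export HADOOP_OPTS="-Djava.security.krb5.realm=OX.AC.UK -Djava.security.krb5.kdc=kdc0.ox.ac.uk:kdc1.ox.ac.uk"'
# -x matches whole lines, -F treats the pattern as a fixed string.
grep -qxF "$LINE" "$CONF/hadoop-env.sh" || echo "$LINE" >> "$CONF/hadoop-env.sh"
```

Running it twice leaves only one copy of the line, so it is safe to re-run after upgrades.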


Step 2.2: Add the following content to /usr/local/Cellar/hadoop/1.0.1/libexec/conf/core-site.xml. One key property is hadoop.tmp.dir. Note that we are placing the HDFS store in the current user’s folder and naming it hadoop-store. You don’t need to create this folder; it will be created automatically in a later step.

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/Users/${user.name}/hadoop-store</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:8020</value>
</property>
</configuration>

Step 2.3: Add the following content to /usr/local/Cellar/hadoop/1.0.1/libexec/conf/mapred-site.xml.

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
<property>
<name>mapred.job.tracker</name>
<value>localhost:9001</value>
</property>
<property>
<name>mapred.tasktracker.map.tasks.maximum</name>
<value>2</value>
</property>
<property>
<name>mapred.tasktracker.reduce.tasks.maximum</name>
<value>2</value>
</property>
</configuration>

Step 2.4: Add the following content to /usr/local/Cellar/hadoop/1.0.1/libexec/conf/hdfs-site.xml.

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>

Step 3: Enable SSH to localhost

Make sure that you have SSH private (~/.ssh/id_rsa) and public (~/.ssh/id_rsa.pub) keys already set up. If you are missing these two files, run the following command (thanks to Ryan Rosario for pointing this out). Instead of an RSA key, you can also use DSA (replace rsa with dsa in the command below); however, the instructions below assume an RSA key.

ssh-keygen -t rsa

Step 3.1: Make sure that “Remote Login” is enabled in your system preferences. Go to “System Preferences” -> “Sharing” and check “Remote Login”.

Step 3.2: From the terminal, run the following command. Make sure that authorized_keys has 0600 permission (see Raj Bandyopadhay’s comment).

cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

Step 3.3: Try logging in to localhost.
If you get an error, remove (or rename) ~/.ssh/known_hosts and retry connecting to localhost.

ssh localhost

Step 4: Start and Test Hadoop

hadoop namenode -format
/usr/local/Cellar/hadoop/1.0.1/bin/start-all.sh
hadoop jar /usr/local/Cellar/hadoop/1.0.1/libexec/hadoop-examples-1.0.1.jar pi 10 100

To make sure that all Hadoop processes started, use the following command:

ps ax | grep hadoop | wc -l # expected output is 6

There are 5 processes related to Hadoop; the sixth line is the grep command itself, which matches its own command line. If you see fewer than 6 lines, check the log files, located at /usr/local/Cellar/hadoop/1.0.1/libexec/logs/*.log.

## Additional Notes

- Namenode info: http://localhost:50070/dfshealth.jsp
- Jobtracker: http://localhost:50030
- Start the Hadoop cluster: /usr/local/Cellar/hadoop/1.0.1/bin/start-all.sh
- Stop the Hadoop cluster: /usr/local/Cellar/hadoop/1.0.1/bin/stop-all.sh
- Verify Hadoop started properly: use ps ax | grep hadoop | wc -l and make sure you see 6 as output (5 Hadoop processes plus the grep command itself).

## Common Issues

- Unable to load realm info from SCDynamicStore: refer to Step 2.1.
- could only be replicated to 0 nodes, instead of 1: refer to Step 3. Most likely SSH to localhost is not available.
- Jobtracker not starting: I stumbled across this problem and found a spelling mistake — I had misspelled mapred as mapread (an extra a) in mapred-site.xml. Also see the additional notes above to make sure that all 5 Hadoop processes are running.

## References

## About Ritesh Agrawal

I am an applied researcher who enjoys anything related to statistics, large data analysis, data mining, machine learning and data visualization.

### 39 Responses to INSTALLING HADOOP ON MAC OSX LION

1. kindjal@gmail.com says:

Thank you for the writeup. Very helpful!
> hadoop nodename -format
Exception in thread “main” java.lang.NoClassDefFoundError: nodename
Caused by: java.lang.ClassNotFoundException: nodename
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)

Perhaps you meant: > hadoop namenode -format

2. Great tutorial, very useful. One small caveat is that the .ssh/authorized_keys file must have permission bits set to 0600. Use ‘chmod 0600 .ssh/authorized_keys’ after creating that file.

3. Thank you for the excellent tutorial. This is my first time installing on Mac — I usually use Hadoop on Ubuntu. One thing to note: an SSH keypair must already exist in order to do step 3.1:

ssh-keygen -t dsa

• Thanks Ryan for pointing that out. I updated the instructions.

4. Great instructions. I went through it successfully, but I still have trouble running hadoop directly on java files and cannot set the HADOOP_CLASSPATH properly as I am going over the hadoop book. Any ideas? Thanks!

• @kavic, I didn’t explicitly set HADOOP_CLASSPATH. Did you use brew to install hadoop? Also make sure that JAVA_HOME is properly set. In my case JAVA_HOME points to /System/Library/Java/JavaVirtualMachines/1.6.0.jdk/Contents/Home. Also make sure you are able to run the hadoop test:

hadoop jar /usr/local/Cellar/hadoop/1.0.1/libexec/hadoop-examples-1.0.1.jar pi 10 100

Let me know if you are able to solve this problem. I would like to update the blog based on your solution.

• Thanks for following up. I finally figured out what was wrong, since the tests were running fine and I could even run python scripts via streaming. The problem was that HADOOP_CLASSPATH is apparently set relative to the home directory /user/hduser, and once I copied the classes over to a new directory there things were fixed… I am still wondering exactly what went wrong though!

5. Philip says:

I have followed this exactly but unfortunately receive an error when trying to run the example. It says something about a protocol mismatch: ClientProtocol version mismatch (client = 61, server = 63). Any ideas?

• Check out this link.
It might help solve your problem: http://comments.gmane.org/gmane.comp.jakarta.lucene.hadoop.user/27552

• Philip says:

Hi Ritesh, I read through that but can’t see what to do?

• Philip says:

New error occurring now:

Number of Maps = 10
Samples per Map = 100
12/07/18 21:10:38 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 0 time(s).
12/07/18 21:10:39 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 1 time(s).
12/07/18 21:10:40 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 2 time(s).
12/07/18 21:10:41 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 3 time(s).
12/07/18 21:10:42 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 4 time(s).
12/07/18 21:10:43 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 5 time(s).
12/07/18 21:10:44 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 6 time(s).
12/07/18 21:10:45 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 7 time(s).
12/07/18 21:10:46 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 8 time(s).
12/07/18 21:10:47 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 9 time(s).
java.lang.RuntimeException: java.net.ConnectException: Call to localhost/127.0.0.1:8020 failed on connection exception: java.net.ConnectException: Connection refused
at org.apache.hadoop.mapred.JobConf.getWorkingDirectory(JobConf.java:546)
at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:318)
at org.apache.hadoop.examples.PiEstimator.estimate(PiEstimator.java:265)
at org.apache.hadoop.examples.PiEstimator.run(PiEstimator.java:342)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.examples.PiEstimator.main(PiEstimator.java:351)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
Caused by: java.net.ConnectException: Call to localhost/127.0.0.1:8020 failed on connection exception: java.net.ConnectException: Connection refused
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
at $Proxy1.getProtocolVersion(Unknown Source)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1404)
… 17 more
Caused by: java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:599)
at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:434)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:560)
at org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:184)
… 31 more
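
“Connection refused” on port 8020 almost always means the NameNode is not running — commonly because HDFS was never formatted (Step 4) or SSH to localhost is broken (Step 3). A quick, hedged sketch to check whether anything is listening on the port, using bash’s built-in /dev/tcp so no extra tools are needed:

```shell
#!/bin/bash
# Probe the NameNode RPC port; prints a hint instead of a stack trace.
PORT=8020
if (exec 3<>"/dev/tcp/localhost/$PORT") 2>/dev/null; then
  STATUS="open"
else
  STATUS="closed"
fi
echo "port $PORT is $STATUS"
# If closed, check the namenode log under libexec/logs before retrying.
```

If the port is closed, start with the NameNode log file rather than re-running the job.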

6. ahmedahmedov says:

hadoop dfs -ls shows local files, not HDFS. Any solutions?

• Can you try executing “hadoop dfs” and see if it shows help. If it does, try copying something from local to hdfs and see if that works.

• Also, make sure that you are using correct path in step 2.2.
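
When the bare listing shows local files, the client is usually falling back to file:/// because core-site.xml is not being read. A hedged sketch to tell the two apart by listing with an explicit hdfs:// URI (guarded so it is a no-op on machines without hadoop on the PATH):

```shell
# List with an explicit hdfs:// URI: if this fails while a bare
# `hadoop dfs -ls` shows local files, fs.default.name isn't being
# picked up from core-site.xml (see Step 2.2).
if command -v hadoop >/dev/null 2>&1; then
  hadoop dfs -ls hdfs://localhost:8020/ && RESULT="hdfs reachable" \
    || RESULT="hdfs listing failed - check fs.default.name in core-site.xml"
else
  RESULT="hadoop not on PATH; skipping check"
fi
echo "$RESULT"
```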

7. unaur says:

For RSA key generation, we can make sure that it is stored in the proper file name (note the capital -P, which sets an empty passphrase at generation time; lowercase -p would try to change an existing key’s passphrase):
ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa

8. Parag Patel says:

This worked flawlessly on OS X 10.8 Mountain Lion. Could you please update this page to indicate that?

Thanks,

Parag

Hello,
I am using Mac OS 10.8. Could you tell me what is going wrong?

When trying out the tutorial the map seems to work, but it cannot compute the reduce.
12/08/13 08:58:12 INFO mapred.JobClient: Running job: job_201208130857_0001
12/08/13 08:58:13 INFO mapred.JobClient: map 0% reduce 0%
12/08/13 08:58:27 INFO mapred.JobClient: map 20% reduce 0%
12/08/13 08:58:33 INFO mapred.JobClient: map 30% reduce 0%
12/08/13 08:58:36 INFO mapred.JobClient: map 40% reduce 0%
12/08/13 08:58:39 INFO mapred.JobClient: map 50% reduce 0%
12/08/13 08:58:42 INFO mapred.JobClient: map 60% reduce 0%
12/08/13 08:58:45 INFO mapred.JobClient: map 70% reduce 0%
12/08/13 08:58:48 INFO mapred.JobClient: map 80% reduce 0%
12/08/13 08:58:51 INFO mapred.JobClient: map 90% reduce 0%
12/08/13 08:58:54 INFO mapred.JobClient: map 100% reduce 0%
12/08/13 08:59:14 INFO mapred.JobClient: Task Id : attempt_201208130857_0001_m_000000_0, Status : FAILED
Too many fetch-failures
12/08/13 08:59:18 INFO mapred.JobClient: map 89% reduce 0%
12/08/13 08:59:21 INFO mapred.JobClient: map 100% reduce 0%
12/08/13 09:00:14 INFO mapred.JobClient: Task Id : attempt_201208130857_0001_m_000001_0, Status : FAILED
Too many fetch-failures

Here is what I get when I try to see the tasklog using the links given in the output
2012-08-13 08:58:39.189 java[74092:1203] Unable to load realm info from SCDynamicStore

Also this error of Unable to load realm info from SCDynamicStore does not show up when I do ‘hadoop namenode -format’ or ‘start-all.sh’

10. Dave says:

In the event that a non-admin will be running hadoop, you’ll also need to adjust permissions on the hadoop log directory. For a typical developer workstation, something like this will usually be fine:

chmod -R a+w libexec/logs

11. Krishna says:

Cool Ritesh. It was a piece of cake.
However, I have a couple of observations -
1. “/usr/local/Cellar/hadoop/1.0.1/libexec/conf/” is incorrect. The correct one should be “/usr/local/Cellar/hadoop/1.0.4/libexec/conf/”.

• When I installed hadoop, it was at 1.0.1. Now the latest stable version is 1.0.4, which is why you see 1.0.4 instead of 1.0.1.
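
Since the Cellar path changes with every Hadoop release, a small sketch to resolve the installed version instead of hard-coding 1.0.1 (guarded so it degrades gracefully where Homebrew’s Cellar doesn’t exist; the lexicographic sort is good enough for versions like these):

```shell
# Pick the newest installed version under Homebrew's Cellar, if any.
CELLAR=/usr/local/Cellar/hadoop
if [ -d "$CELLAR" ]; then
  HADOOP_VER=$(ls "$CELLAR" | sort | tail -n 1)
  MSG="conf dir: $CELLAR/$HADOOP_VER/libexec/conf"
else
  MSG="no Homebrew hadoop install found at $CELLAR"
fi
echo "$MSG"
```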

12. Tomasz Kleczek says:

13. I get this error message after entering the below into the terminal. I’m not doing something right.

• Make sure you have ssh permission correctly set

14. Kevin says:

Great Job ! I’m using it on My Macbook Air OS X Mountain Lion with the 1.1.0 version.

Works like a charm

15. Kevin says:

Hello again Ritesh Agrawal !

I would like to know how Hadoop works with the other examples in hadoop-examples.jar.

There are several examples that can be used as a test.

I found the WordCount example, but I would like to know how to execute it with the right syntax.

First I need a .txt file with some words appearing twice, three times, etc.

Is there any command to let hadoop know, or do I just need to do something like this:

Is there any documentation where I can find some help with the examples? Or a wiki?

Kevin

• liseregnier says:

Let’s say you want to count the words in the file ~/Downloads/ulysse.txt.
First copy it to HDFS:
To run the example:
It will write the result to /user/yourname/wordcount-ex/output.
To see the result:

Hope this helps!
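
The actual commands were lost from the comment above; below is a plausible reconstruction of the described steps. The wordcount-ex paths and the 1.0.1 jar location are assumptions based on the rest of this post, and the block skips itself when hadoop isn’t installed.

```shell
# Reconstructed WordCount walkthrough (paths are illustrative).
# Relative HDFS paths resolve under /user/<your-username>.
if command -v hadoop >/dev/null 2>&1; then
  # 1. Copy the input file into HDFS.
  hadoop dfs -copyFromLocal ~/Downloads/ulysse.txt wordcount-ex/input.txt
  # 2. Run the bundled WordCount example.
  hadoop jar /usr/local/Cellar/hadoop/1.0.1/libexec/hadoop-examples-1.0.1.jar \
    wordcount wordcount-ex/input.txt wordcount-ex/output
  # 3. Inspect the result.
  hadoop dfs -cat wordcount-ex/output/part-*
  RAN="wordcount attempted"
else
  RAN="hadoop not on PATH; skipping"
fi
echo "$RAN"
```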

16. Thank you so much! Worked seamlessly on Mac OS X 10.8 !!

17. wesseljm says:

Reblogged this on Only The Best SQL Tips, Tricks, & Shortcuts and commented:
Great Intro to Config of Hadoop

18. subu says:

Instructions were great. Worked like a charm. Thanks much for putting this together.

19. liseregnier says:

I followed your tutorial to install the current hadoop-1.1.2 on lion 10.8.3 with java 1.6.0_43.
It seems to work pretty well, at least the pi example works fine.
But when I run the word count examples (as explained above), they work, but I have two warnings bothering me:

Do you know how I can solve these ?

20. Pramod says:

Thanks Ritesh, the instructions are really good and worked without issue. I am new to Hadoop. Would you recommend any links to further examples that would help me write my own job?

21. mkstayalive says:

Thank you. Helped me as well.

22. Chao Chen says:

Dear Ritesh,

I have done step 2.1; however, I still get the error. Do you know why? Thank you.