Although you are likely to run Hadoop on a big cluster of computers, it is useful to have it installed locally for debugging and testing purposes. Here are some quick notes on how to set up Hadoop on Mac OS X Lion. Please refer to the references mentioned below for details.
Quick Summary
Detailed Instructions
Step 1: Installing Hadoop
If you haven’t heard about Homebrew, you should definitely give it a try. It makes installing and uninstalling software effortless and keeps your machine clean of unused files. Below, I am using Homebrew to install Hadoop.
brew install hadoop
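The Cellar paths used throughout this post include the Hadoop version number (1.0.1 at the time of writing), which changes whenever Homebrew updates the formula. As a hedged sketch, you can derive the versioned directory instead of hard-coding it; this assumes Homebrew’s default /usr/local/Cellar prefix:

```shell
# Pick the (lexically) latest installed Hadoop version directory under
# the Homebrew Cellar. If nothing is installed yet, HADOOP_DIR is empty.
# Note: lexical sorting is fine for versions like 1.0.1 vs 1.0.4, but
# would misorder e.g. 1.0.9 vs 1.0.10.
HADOOP_DIR=$(ls -d /usr/local/Cellar/hadoop/*/ 2>/dev/null | tail -n 1)
echo "Hadoop lives in: ${HADOOP_DIR:-<not installed>}"
```

With that variable set, the configuration files edited in the steps below live under ${HADOOP_DIR}libexec/conf/.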
Step 2.1: Add the following line to /usr/local/Cellar/hadoop/1.0.1/libexec/conf/hadoop-env.sh. This line is required to overcome an error related to “SCDynamicStore”, specifically “Unable to load realm info from SCDynamicStore”.
export HADOOP_OPTS="-Djava.security.krb5.realm=OX.AC.UK -Djava.security.krb5.kdc=kdc0.ox.ac.uk:kdc1.ox.ac.uk"
Step 2.2: Add the following content to /usr/local/Cellar/hadoop/1.0.1/libexec/conf/core-site.xml. One key property is hadoop.tmp.dir. Note that we are placing HDFS in the current user’s home folder and naming it hadoop-store. You don’t need to create this folder; it will be created automatically in the later stages.
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/Users/${user.name}/hadoop-store</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:8020</value>
  </property>
</configuration>
Step 2.3: Add the following content to /usr/local/Cellar/hadoop/1.0.1/libexec/conf/mapred-site.xml.
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>2</value>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>2</value>
  </property>
</configuration>
Step 2.4: Add the following content to /usr/local/Cellar/hadoop/1.0.1/libexec/conf/hdfs-site.xml.
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
Step 3: Enable SSH to localhost
Make sure that you have SSH private (~/.ssh/id_rsa) and public (~/.ssh/id_rsa.pub) keys already set up. If you are missing the above two files, run the following command (thanks to Ryan Rosario for pointing this out). Instead of an rsa key, you can also use dsa (replace rsa with dsa in the command below); however, the instructions below assume that you have used an rsa key.
ssh-keygen -t rsa
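If you are not sure whether keys already exist, a small guard avoids overwriting them. This is just a sketch assuming an RSA key with an empty passphrase, which is acceptable for a local test box but nothing else:

```shell
# Only generate a key pair when one is not already present.
# -P "" sets an empty passphrase; -q keeps ssh-keygen quiet.
if [ ! -f "$HOME/.ssh/id_rsa" ]; then
  mkdir -p "$HOME/.ssh"
  chmod 700 "$HOME/.ssh"
  ssh-keygen -t rsa -P "" -q -f "$HOME/.ssh/id_rsa"
fi
```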
Step 3.1: Make sure that “Remote Login” is enabled in your system preferences. Go to “System Preferences” -> “Sharing” and check “Remote Login”.
Step 3.2: From the terminal, run the following command. Make sure that authorized_keys has 0600 permissions (see Raj Bandyopadhay’s comment).
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
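sshd refuses keys when authorized_keys is group- or world-writable, so tighten the permission bits right after appending. Here is a minimal sketch of the append-and-tighten pattern, shown on a scratch directory so it is safe to experiment with:

```shell
# Demonstrate the append-and-tighten pattern on a scratch directory;
# for real use, replace $DEMO with $HOME/.ssh.
DEMO=$(mktemp -d)
echo "ssh-rsa AAAA...example user@host" > "$DEMO/id_rsa.pub"  # stand-in public key
cat "$DEMO/id_rsa.pub" >> "$DEMO/authorized_keys"
chmod 0600 "$DEMO/authorized_keys"
ls -l "$DEMO/authorized_keys"  # permissions column should read -rw-------
```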
Step 3.3: Try logging in to localhost. If you get an error, remove (or rename) ~/.ssh/known_hosts and retry connecting to localhost.
ssh localhost
Step 4: Format HDFS, start Hadoop, and run an example job

hadoop namenode -format
/usr/local/Cellar/hadoop/1.0.1/bin/start-all.sh
hadoop jar /usr/local/Cellar/hadoop/1.0.1/libexec/hadoop-examples-1.0.1.jar pi 10 100
To make sure that all Hadoop processes have started, use the following command:
ps ax | grep hadoop | wc -l # expected output is 6
There are 5 processes related to Hadoop; the sixth line is the grep command itself. If you see fewer than 6 lines, check the log files, located at /usr/local/Cellar/hadoop/1.0.1/libexec/logs/*.log.
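Note that `grep hadoop` matches its own entry in the ps output, which is why the expected count above is 6 rather than 5. A common trick to exclude the grep process itself is to wrap one character of the pattern in brackets: the grep command line then contains the literal text `[h]adoop`, which the pattern `[h]adoop` (matching "hadoop") does not match:

```shell
# With the bracket trick, only real hadoop processes are counted;
# when all daemons are up, the expected output is 5 instead of 6.
ps ax | grep '[h]adoop' | wc -l
```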
Additional Notes
- Namenode info: http://localhost:50070/dfshealth.jsp
- Jobtracker: http://localhost:50030
- Starting hadoop cluster: /usr/local/Cellar/hadoop/1.0.1/bin/start-all.sh
- Stop hadoop cluster: /usr/local/Cellar/hadoop/1.0.1/bin/stop-all.sh
- Verify hadoop started properly: use ps ax | grep hadoop | wc -l and make sure you see 6 as the output. Five processes are associated with Hadoop and one is the grep command itself.
Common Issues
- Unable to load realm info from SCDynamicStore: Refer step 2.1
- could only be replicated to 0 nodes, instead of 1: Refer to Step 3. Most likely this problem occurs because SSH to localhost is not available.
- Jobtracker not starting: I stumbled across this problem and found that there was a spelling mistake in mapred-site.xml; I had misspelled mapred as mapread (an extra a). Also see the additional notes above to make sure that all 5 Hadoop processes are running.
Thank you for the writeup. Very helpful!
> hadoop nodename -format
Exception in thread “main” java.lang.NoClassDefFoundError: nodename
Caused by: java.lang.ClassNotFoundException: nodename
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
Perhaps you meant:
> hadoop namenode -format
Great tutorial, very useful. One small caveat is that the .ssh/authorized_keys file must have permission bits set to 0600. Use ‘chmod 0600 .ssh/authorized_keys’ after creating that file.
Thank you for the excellent tutorial. This is my first time installing on Mac — I usually use Hadoop on Ubuntu.
One thing to note. An SSH keypair must already exist in order to do step 3.1:
ssh-keygen -t dsa
Thanks Ryan for pointing out that. I updated the instructions.
Great instructions. I went through it successfully but I still have trouble running hadoop directly on java files and cannot set the HADOOP_CLASSPATH properly as I am going over the hadoop book. Any ideas?
Thanks!
@kavic,
I didn’t explicitly set HADOOP_CLASSPATH. Did you use brew to install Hadoop? Also make sure that your JAVA_HOME is properly set. In my case JAVA_HOME points to /System/Library/Java/JavaVirtualMachines/1.6.0.jdk/Contents/Home
Also make sure you are able to run the hadoop test
hadoop jar /usr/local/Cellar/hadoop/1.0.1/libexec/hadoop-examples-1.0.1.jar pi 10 100
Let me know if you are able to solve this problem. I would like to update the blog based on your solution.
Thanks for following up.
I finally figured out what was wrong, since the tests were running fine and I could even run python scripts via streaming. The problem was that HADOOP_CLASSPATH is apparently set relative to the home directory /user/hduser, and once I copied the classes over to a new directory there, things were fixed… I am still wondering exactly what went wrong though!
I have followed this exactly but unfortunately receive an error when trying to run the example. It says something about protocol mismatch, ClientProtocol version mismatch. (client = 61, server = 63). Any ideas?
checkout this link. It might help solve your problem: http://comments.gmane.org/gmane.comp.jakarta.lucene.hadoop.user/27552
Hi Ritesh, I read through that but can’t see what to do?
A new error is occurring now:
Number of Maps = 10
Samples per Map = 100
12/07/18 21:10:38 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 0 time(s).
12/07/18 21:10:39 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 1 time(s).
12/07/18 21:10:40 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 2 time(s).
12/07/18 21:10:41 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 3 time(s).
12/07/18 21:10:42 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 4 time(s).
12/07/18 21:10:43 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 5 time(s).
12/07/18 21:10:44 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 6 time(s).
12/07/18 21:10:45 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 7 time(s).
12/07/18 21:10:46 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 8 time(s).
12/07/18 21:10:47 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 9 time(s).
java.lang.RuntimeException: java.net.ConnectException: Call to localhost/127.0.0.1:8020 failed on connection exception: java.net.ConnectException: Connection refused
at org.apache.hadoop.mapred.JobConf.getWorkingDirectory(JobConf.java:546)
at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:318)
at org.apache.hadoop.examples.PiEstimator.estimate(PiEstimator.java:265)
at org.apache.hadoop.examples.PiEstimator.run(PiEstimator.java:342)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.examples.PiEstimator.main(PiEstimator.java:351)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:64)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by: java.net.ConnectException: Call to localhost/127.0.0.1:8020 failed on connection exception: java.net.ConnectException: Connection refused
at org.apache.hadoop.ipc.Client.wrapException(Client.java:1099)
at org.apache.hadoop.ipc.Client.call(Client.java:1075)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
at $Proxy1.getProtocolVersion(Unknown Source)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:396)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:379)
at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:119)
at org.apache.hadoop.hdfs.DFSClient.(DFSClient.java:238)
at org.apache.hadoop.hdfs.DFSClient.(DFSClient.java:203)
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:89)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1386)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1404)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:254)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:123)
at org.apache.hadoop.mapred.JobConf.getWorkingDirectory(JobConf.java:542)
… 17 more
Caused by: java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:599)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:489)
at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:434)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:560)
at org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:184)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1206)
at org.apache.hadoop.ipc.Client.call(Client.java:1050)
… 31 more
Hadoop dfs -ls shows local files not the HDFS :( any solutions?
Can you try executing “hadoop dfs” and see if it shows the help? If it does, try copying something from local to HDFS and see if that works.
Also, make sure that you are using correct path in step 2.2.
Ritesh – I have the same problem too. When I give the command ‘hadoop dfs’ I do get the help options but when I give any ‘hadoop dfs’ commands (-ls, -mkdir, copyFromLocal, etc) it creates everything on UFS where I am. (I am using MacOS). I also get the following warning: “WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform… using builtin-java classes where applicable”
For rsa key generation, we can make sure that it’s stored under the proper file name:
ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
This worked flawlessly on OS X 10.8 (Mountain Lion). Could you please update this page to indicate that?
Thanks,
Parag
Hello,
I am using Mac OS 10.8. Could you tell me what is going wrong?
When trying out the tutorial, the map phase seems to work, but it cannot complete the reduce.
12/08/13 08:58:12 INFO mapred.JobClient: Running job: job_201208130857_0001
12/08/13 08:58:13 INFO mapred.JobClient: map 0% reduce 0%
12/08/13 08:58:27 INFO mapred.JobClient: map 20% reduce 0%
12/08/13 08:58:33 INFO mapred.JobClient: map 30% reduce 0%
12/08/13 08:58:36 INFO mapred.JobClient: map 40% reduce 0%
12/08/13 08:58:39 INFO mapred.JobClient: map 50% reduce 0%
12/08/13 08:58:42 INFO mapred.JobClient: map 60% reduce 0%
12/08/13 08:58:45 INFO mapred.JobClient: map 70% reduce 0%
12/08/13 08:58:48 INFO mapred.JobClient: map 80% reduce 0%
12/08/13 08:58:51 INFO mapred.JobClient: map 90% reduce 0%
12/08/13 08:58:54 INFO mapred.JobClient: map 100% reduce 0%
12/08/13 08:59:14 INFO mapred.JobClient: Task Id : attempt_201208130857_0001_m_000000_0, Status : FAILED
Too many fetch-failures
12/08/13 08:59:14 WARN mapred.JobClient: Error reading task outputServer returned HTTP response code: 403 for URL: http://10.1.66.17:50060/tasklog?plaintext=true&attemptid=attempt_201208130857_0001_m_000000_0&filter=stdout
12/08/13 08:59:14 WARN mapred.JobClient: Error reading task outputServer returned HTTP response code: 403 for URL: http://10.1.66.17:50060/tasklog?plaintext=true&attemptid=attempt_201208130857_0001_m_000000_0&filter=stderr
12/08/13 08:59:18 INFO mapred.JobClient: map 89% reduce 0%
12/08/13 08:59:21 INFO mapred.JobClient: map 100% reduce 0%
12/08/13 09:00:14 INFO mapred.JobClient: Task Id : attempt_201208130857_0001_m_000001_0, Status : FAILED
Too many fetch-failures
Here is what I get when I try to see the tasklog using the links given in the output
http://10.1.66.17:50060/tasklog?plaintext=true&attemptid=attempt_201208130857_0001_m_000000_0&filter=stderr —>
2012-08-13 08:58:39.189 java[74092:1203] Unable to load realm info from SCDynamicStore
http://10.1.66.17:50060/tasklog?plaintext=true&attemptid=attempt_201208130857_0001_m_000000_0&filter=stdout —>
Also, this “Unable to load realm info from SCDynamicStore” error does not show up when I do ‘hadoop namenode -format’ or ‘start-all.sh’.
In the event that a non-admin will be running hadoop, you’ll also need to adjust permissions on the hadoop log directory. For a typical developer workstation, something like this will usually be fine:
chmod -R a+w libexec/logs
(from the hadoop directory).
Cool Ritesh. It was a piece of cake.
However I have a couple of observations –
1. “/usr/local/Cellar/hadoop/1.0.1/libexec/conf/” is incorrect. The correct one should be “/usr/local/Cellar/hadoop/1.0.4/libexec/conf/”.
When I installed hadoop, it was at 1.0.1. The latest stable version is now 1.0.4, and that’s why you are seeing 1.0.4 instead of 1.0.1.
Great post, really helpful, thanks!
I get this error message after entering the below into the terminal. I’m not doing something right.
-MacBook-Pro:hadoop jonathanschaller$ /usr/local/Cellar/hadoop/1.0.4/libexec/conf/hadoop-env.sh
-bash: /usr/local/Cellar/hadoop/1.0.4/libexec/conf/hadoop-env.sh: Permission denied
Make sure you have the SSH permissions correctly set.
Great Job ! I’m using it on My Macbook Air OS X Mountain Lion with the 1.1.0 version.
Works like a charm :)
Hello again Ritesh Agrawal!
I would like to know how Hadoop works with the other examples in hadoop-examples.jar.
There are several examples to use as a test.
I found the WordCount test, but I would like to know how to execute it with the right syntax.
First I need a .txt file with some words repeated twice, three times, etc.
Is there any command to let Hadoop know, or do I just need to do something like this:
hadoop jar /usr/local/~/hadoop-example-1.1.0.jar worldcount? 10 100 /usr/~/toto.txt
Is there any documentation where I can find some help with the examples? Or a wiki?
Thanks for your help,
Kevin
Let’s say you want to count the words of the file : ~/Downloads/ulysse.txt
First, copy it to HDFS:
hadoop fs -put ~/Downloads/ulysse.txt /user/yourname/wordcount-ex
To run the example:
hadoop jar hadoop-examples*.jar wordcount /user/yourname/wordcount-ex/ulysse.txt /user/yourname/wordcount-ex/output
It will write the result in /user/yourname/wordcount-ex/output
To see the result :
hadoop fs -cat /user/yourname/wordcount-ex/output/part-r-00000
Hope this helps!
Thank you so much! Worked seamlessly on Mac OS X 10.8 !! :) :)
Reblogged this on Only The Best SQL Tips, Tricks, & Shortcuts and commented:
Great Intro to Config of Hadoop
Instructions were great. Worked like a charm. Thanks much for putting this together.
I followed your tutorial to install the current hadoop-1.1.2 on lion 10.8.3 with java 1.6.0_43.
It seems to work pretty well, at least the pi example works fine.
But when I run the word count example (as I explained above), it works, although I get two warnings bothering me:
WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform… using builtin-java classes where applicable
WARN snappy.LoadSnappy: Snappy native library not loaded.
Do you know how I can solve these ?
I get the first warning message too. Let me know how you fixed it.
I get the same warning too… did you find a solution?
nope no solution yet but it is working fine even with the warnings
did you try :
brew install snappy
Thanks Ritesh, the instructions are really good and worked without issue. I am new to Hadoop. Would you recommend any links with further examples that would help me write my own job?
Thank you. Helped me as well.
Dear Ritesh,
I have done step 2.1, however, still got the error, do you know why? Thank you.
starting namenode, logging to /usr/local/Cellar/hadoop/1.1.2/libexec/bin/../logs/hadoop-chaochen-namenode-Chaos-iMac.local.out
2013-05-18 23:13:39.369 java[4390:1b03] Unable to load realm info from SCDynamicStore
@Chao,
Make sure that you copied the whole statement in 2.1: export HADOOP_OPTS="-Djava.security.krb5.realm=OX.AC.UK -Djava.security.krb5.kdc=kdc0.ox.ac.uk:kdc1.ox.ac.uk"
Apart from that I am not sure. I haven’t tried installing hadoop 1.1.2; let me know if that doesn’t work and I can try installing it tonight.
Ritesh
Thanks, very useful, but when I try to install hadoop 1.1.2 I get the following error.
The following two commands seem to work ok:
~ $ hadoop namenode -format
~ $ /usr/local/Cellar/hadoop/1.1.2/bin/start-all.sh
As you can see other commands seem to work ok
~ $ ps ax | grep hadoop | wc -l
5
However the example fails miserably… ideas?
~ $ hadoop jar /usr/local/Cellar/hadoop/1.1.2/libexec/hadoop-examples-1.1.2.jar pi 10 100
Number of Maps = 10
Samples per Map = 100
2013-05-27 15:05:30.221 java[30151:1703] Unable to load realm info from SCDynamicStore
13/05/27 15:05:30 WARN hdfs.DFSClient: DataStreamer Exception: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /user/sergep/PiEstimator_TMP_3_141592654/in/part0 could only be replicated to 0 nodes, instead of 1
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1639)
I just realized that for hadoop 1.1.2 you can skip Step #2 entirely. The other steps are still required.
Well explained. Thanks. Just installed 1.1.2.
I am running 1.1.2 and undid the step 2.* and I got the “Unable to load realm info from SCDynamicStore” error while executing hadoop namenode -format. But the command formatted. “/usr/local/Cellar/hadoop/1.1.2/bin/start-all.sh” starts hadoop, and “hadoop jar /usr/local/Cellar/hadoop/1.1.2/libexec/hadoop-examples-1.1.2.jar pi 10 10” throws the “Unable to load realm info from SCDynamicStore” error, but completes. However I think the results are incorrect. The last 2 lines of the output are:
Job Finished in 2.517 seconds
Estimated value of Pi is 3.20000000000000000000
The pi value seems way off.
I just played around a bit more with the sample and it appears that the pi estimation is correct. You just have to modify the final term, samples per map, to get more decimal places. Here are more detailed results.
hadoop jar /usr/local/Cellar/hadoop/1.1.2/libexec/hadoop-examples-1.1.2.jar pi 10 1000000000
Number of Maps = 10
Samples per Map = 1000000000
……..
Job Finished in 321.893 seconds
Estimated value of Pi is 3.14159266440000000000
Great post. You should start a series on installing Hadoop-related components on the Mac, such as Hive, Accumulo, etc.
I am unable to ssh to local host on my Mac (OS 10.8.x). I looked around on the web but couldn’t solve. Would greatly appreciate any help. Here is what I did.
* tested with rsa keys, but when that didn’t work I created a new .ssh directory and created dsa files; didn’t work
* followed the instructions and enabled ‘remote login’
* disabled the firewall too.
* Following is what I get:
ssh localhost
Connection closed by ::1
In the /var/log I see the following message every time I try to test ‘ssh localhost’
sshd[1787]: fatal: Access denied for user by PAM account configuration [preauth]
Did you allow remote connections in System Preferences -> Sharing -> Remote Login?
Fixed the problem. It was the ‘enable remote login’ setting on the Mac.
Well, I did PAM file manipulation and edited the sshd config file. Still didn’t work. This just did it. Thanks a lot. Luvyaaa..
Hi, when I run start-all.sh, I see some processes launched and I can see the icons for those on my screen. If I am working in some window and launch a map-reduce job, my work gets interrupted by those processes launching and coming to the foreground/focus. How do I avoid that on the Mac?
VJ
Thanks buddy.
Great instruction! It works on Mavericks!
Hi, I am trying to create a smoketest user in my HDFS. When I give the command hadoop fs -chmod 757 /mapred, it shows the following error:
chmod: Call From diliprnair-VAIO/192.168.1.136 to diliprnair-VAIO:8020 failed on connection exception: java.net.ConnectException: Connection refused: no further information; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
Could you help?