Setting Up a Hadoop Pseudo-Distributed Environment on a MacBook Air
Introduction
I want to play with Apache Mahout, so I decided to set up a Hadoop environment on my MacBook Air.
There seems to be plenty of information out there, but version and environment differences leave it scattered, so I'm writing up these notes.
References
The following pages were helpful.
http://lizan.asia/blog/2012/11/13/mountain-lion-setup-hadoop/
http://shayanmasood.com/blog/how-to-setup-hadoop-on-mac-os-x-10-9-mavericks/
http://www.ayutaya.com/ops/os-x/hadoop-pdist
http://metasearch.sourceforge.jp/wiki/index.php?Hadoop%A5%BB%A5%C3%A5%C8%A5%A2%A5%C3%A5%D7
Environment
- MacBook Air (OS X)
- Homebrew
- Hadoop 1.2.1
- Java 1.6.0_65
Steps
Installing hadoop
brew install hadoop
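As a quick sanity check (my addition, not part of the original steps), hadoop version should report 1.2.1 if the install went through:

hadoop version
# should print "Hadoop 1.2.1", assuming the Homebrew formula above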
Configuring hadoop
All of the configuration files are apparently under /usr/local/Cellar/hadoop/1.2.1/libexec/conf.
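If you want to double-check that path (exact contents may vary by formula version), just list the directory:

ls /usr/local/Cellar/hadoop/1.2.1/libexec/conf
# expect to see hadoop-env.sh, core-site.xml, hdfs-site.xml, mapred-site.xml, among others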
The settings below mimic the reference pages.
- hadoop-env.sh
export HADOOP_OPTS="-Djava.security.krb5.realm=OX.AC.UK -Djava.security.krb5.kdc=kdc0.ox.ac.uk:kdc1.ox.ac.uk"
export JAVA_HOME=`/usr/libexec/java_home -v 1.6`
- core-site.xml
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
- hdfs-site.xml
<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/Users/${user.name}/hdfs/name-node</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/Users/${user.name}/hdfs/data-node</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
- mapred-site.xml
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>2</value>
  </property>
</configuration>
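As far as I know, the dfs.name.dir and dfs.data.dir paths above don't have to exist beforehand (formatting creates the name directory, and the DataNode creates its directory on first start), but if you'd rather create them up front:

mkdir -p ~/hdfs/name-node ~/hdfs/data-node
# ${user.name} resolves to your login user, so these match the hdfs-site.xml values above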
Configuring ssh
In a pseudo-distributed setup Hadoop sshes into localhost, so that needs to work.
Enable "System Preferences" → "Sharing" → "Remote Login", then:
ssh-keygen -t rsa -P ""
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
ssh localhost
If you can log in with that, you're all set.
hostname
sudo hostname localhost
I had to set the hostname like this before things would work.
I'd like to configure this properly later; one option is sketched below.
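For a persistent setting (my addition, not something I've actually wired into this setup), OS X can store the hostname with scutil so it survives reboots, unlike the plain hostname command:

sudo scutil --set HostName localhost
# persists across reboots; plain "sudo hostname" only lasts until shutdown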
Starting hadoop
Now it's finally time to start hadoop.
Initialization
hadoop namenode -format
14/01/19 20:25:51 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = localhost/127.0.0.1
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 1.2.1
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.2 -r 1503152; compiled by 'mattf' on Mon Jul 22 15:23:09 PDT 2013
STARTUP_MSG:   java = 1.6.0_65
************************************************************/
Re-format filesystem in /Users/junji/hdfs ? (Y or N) Y
14/01/19 20:25:54 INFO util.GSet: Computing capacity for map BlocksMap
14/01/19 20:25:54 INFO util.GSet: VM type = 64-bit
14/01/19 20:25:54 INFO util.GSet: 2.0% max memory = 1039859712
14/01/19 20:25:54 INFO util.GSet: capacity = 2^21 = 2097152 entries
14/01/19 20:25:54 INFO util.GSet: recommended=2097152, actual=2097152
14/01/19 20:25:54 INFO namenode.FSNamesystem: fsOwner=junji
14/01/19 20:25:55 INFO namenode.FSNamesystem: supergroup=supergroup
14/01/19 20:25:55 INFO namenode.FSNamesystem: isPermissionEnabled=true
14/01/19 20:25:55 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
14/01/19 20:25:55 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
14/01/19 20:25:55 INFO namenode.FSEditLog: dfs.namenode.edits.toleration.length = 0
14/01/19 20:25:55 INFO namenode.NameNode: Caching file names occuring more than 10 times
14/01/19 20:25:55 INFO common.Storage: Image file /Users/junji/hdfs/current/fsimage of size 111 bytes saved in 0 seconds.
14/01/19 20:25:55 INFO namenode.FSEditLog: closing edit log: position=4, editlog=/Users/junji/hdfs/current/edits
14/01/19 20:25:55 INFO namenode.FSEditLog: close success: truncate to 4, editlog=/Users/junji/hdfs/current/edits
14/01/19 20:25:55 INFO common.Storage: Storage directory /Users/junji/hdfs has been successfully formatted.
14/01/19 20:25:55 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at localhost/127.0.0.1
************************************************************/
If hadoop-env.sh isn't set up correctly, you'll get this error:
Unable to load realm info from SCDynamicStore
Startup
start-all.sh
starting namenode, logging to /usr/local/Cellar/hadoop/1.2.1/libexec/bin/../logs/hadoop-junji-namenode-localhost.out
localhost: starting datanode, logging to /usr/local/Cellar/hadoop/1.2.1/libexec/bin/../logs/hadoop-junji-datanode-localhost.out
localhost: starting secondarynamenode, logging to /usr/local/Cellar/hadoop/1.2.1/libexec/bin/../logs/hadoop-junji-secondarynamenode-localhost.out
starting jobtracker, logging to /usr/local/Cellar/hadoop/1.2.1/libexec/bin/../logs/hadoop-junji-jobtracker-localhost.out
localhost: starting tasktracker, logging to /usr/local/Cellar/hadoop/1.2.1/libexec/bin/../logs/hadoop-junji-tasktracker-localhost.out
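To make sure all five daemons actually came up (my addition; jps is bundled with the JDK), list the running Java processes:

jps
# expect NameNode, DataNode, SecondaryNameNode, JobTracker, and TaskTracker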
Verification
- Access http://127.0.0.1:50070/ to check the NameNode
- Access http://127.0.0.1:50030/ (the JobTracker) and confirm state: RUNNING
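As an extra check from the command line (my addition; the /tmp/smoke-test path is just an arbitrary example), you can round-trip a file through HDFS with the standard fs shell:

echo hello > /tmp/hello.txt
hadoop fs -mkdir /tmp/smoke-test
hadoop fs -put /tmp/hello.txt /tmp/smoke-test/
hadoop fs -cat /tmp/smoke-test/hello.txt
# should print "hello" back from HDFS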
Sample run
Run the pi estimation example bundled with Hadoop (2 maps, 100 samples per map):
hadoop jar /usr/local/Cellar/hadoop/1.2.1/libexec/hadoop-examples-1.2.1.jar pi 2 100
Number of Maps = 2
Samples per Map = 100
Wrote input for Map #0
Wrote input for Map #1
Starting Job
14/01/19 21:22:11 INFO mapred.FileInputFormat: Total input paths to process : 2
14/01/19 21:22:11 INFO mapred.JobClient: Running job: job_201401192119_0001
14/01/19 21:22:12 INFO mapred.JobClient:  map 0% reduce 0%
14/01/19 21:22:18 INFO mapred.JobClient:  map 100% reduce 0%
14/01/19 21:22:25 INFO mapred.JobClient:  map 100% reduce 33%
14/01/19 21:22:26 INFO mapred.JobClient:  map 100% reduce 100%
14/01/19 21:22:27 INFO mapred.JobClient: Job complete: job_201401192119_0001
14/01/19 21:22:27 INFO mapred.JobClient: Counters: 27
14/01/19 21:22:27 INFO mapred.JobClient:   Job Counters
14/01/19 21:22:27 INFO mapred.JobClient:     Launched reduce tasks=1
14/01/19 21:22:27 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=8154
14/01/19 21:22:27 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
14/01/19 21:22:27 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
14/01/19 21:22:27 INFO mapred.JobClient:     Launched map tasks=2
14/01/19 21:22:27 INFO mapred.JobClient:     Data-local map tasks=2
14/01/19 21:22:27 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=8750
14/01/19 21:22:27 INFO mapred.JobClient:   File Input Format Counters
14/01/19 21:22:27 INFO mapred.JobClient:     Bytes Read=236
14/01/19 21:22:27 INFO mapred.JobClient:   File Output Format Counters
14/01/19 21:22:27 INFO mapred.JobClient:     Bytes Written=97
14/01/19 21:22:27 INFO mapred.JobClient:   FileSystemCounters
14/01/19 21:22:27 INFO mapred.JobClient:     FILE_BYTES_READ=50
14/01/19 21:22:27 INFO mapred.JobClient:     HDFS_BYTES_READ=480
14/01/19 21:22:27 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=165610
14/01/19 21:22:27 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=215
14/01/19 21:22:27 INFO mapred.JobClient:   Map-Reduce Framework
14/01/19 21:22:27 INFO mapred.JobClient:     Map output materialized bytes=56
14/01/19 21:22:27 INFO mapred.JobClient:     Map input records=2
14/01/19 21:22:27 INFO mapred.JobClient:     Reduce shuffle bytes=56
14/01/19 21:22:27 INFO mapred.JobClient:     Spilled Records=8
14/01/19 21:22:27 INFO mapred.JobClient:     Map output bytes=36
14/01/19 21:22:27 INFO mapred.JobClient:     Total committed heap usage (bytes)=454238208
14/01/19 21:22:27 INFO mapred.JobClient:     Map input bytes=48
14/01/19 21:22:27 INFO mapred.JobClient:     Combine input records=0
14/01/19 21:22:27 INFO mapred.JobClient:     SPLIT_RAW_BYTES=244
14/01/19 21:22:27 INFO mapred.JobClient:     Reduce input records=4
14/01/19 21:22:27 INFO mapred.JobClient:     Reduce input groups=4
14/01/19 21:22:27 INFO mapred.JobClient:     Combine output records=0
14/01/19 21:22:27 INFO mapred.JobClient:     Reduce output records=0
14/01/19 21:22:27 INFO mapred.JobClient:     Map output records=4
Job Finished in 16.776 seconds
Estimated value of Pi is 3.12000000000000000000
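When you're done playing, stop-all.sh (the counterpart to start-all.sh) shuts all the daemons back down:

stop-all.sh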
Summary
Next time I'd like to bring Mahout into the mix and run it on this setup.