This is a quick note on how to set up the necessary Hadoop ecosystem for Hadoop development by installing Apache BigTop.
- Remember that we need a 64-bit OS to run Big Data projects.
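A quick way to check the 64-bit requirement before going further. This is only a minimal sketch: the helper name `is_64bit_arch` and the list of accepted architecture names are my own assumptions, so extend the list if your platform reports something else.

```shell
# Hypothetical helper: succeeds (exit code 0) when the given machine string
# names a 64-bit architecture. The accepted names are an assumption.
is_64bit_arch() {
  case "$1" in
    x86_64|amd64|aarch64|ppc64le) return 0 ;;
    *) return 1 ;;
  esac
}

# uname -m prints the machine hardware name, e.g. x86_64 or i686.
arch="$(uname -m)"
if is_64bit_arch "$arch"; then
  echo "OK: $arch is 64-bit"
else
  echo "Warning: $arch does not look like a 64-bit architecture"
fi
```

If the warning shows up, it is probably better to stop here and switch to a 64-bit OS before adding the repo.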
- First, add a new repo. The latest BigTop repo at the time of writing (0.8.0) is located at http://www.apache.org/dist/bigtop/bigtop-0.8.0/repos/ and we just need to add the appropriate repo file for the current OS. For example, with CentOS 6:
[bash]sudo wget -O /etc/yum.repos.d/bigtop.repo http://www.apache.org/dist/bigtop/bigtop-0.8.0/repos/centos6/bigtop.repo[/bash]
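For other yum-based systems the URL follows the same pattern, so it can be built from the OS directory name. The sketch below assumes this pattern holds. The function name `bigtop_repo_url` is hypothetical, and only `centos6` is confirmed by this post, so check the repo listing for the directory name that matches your OS.

```shell
# Base URL of the BigTop 0.8.0 repos, as given in this post.
BIGTOP_REPO_BASE="http://www.apache.org/dist/bigtop/bigtop-0.8.0/repos"

# Hypothetical helper: print the .repo file URL for an OS directory name.
# Only "centos6" is taken from the post; any other name is an assumption.
bigtop_repo_url() {
  printf '%s/%s/bigtop.repo\n' "$BIGTOP_REPO_BASE" "$1"
}

bigtop_repo_url centos6

# To actually install the repo file (needs root):
#   sudo wget -O /etc/yum.repos.d/bigtop.repo "$(bigtop_repo_url centos6)"
```

The download itself is left as a comment so the snippet can be run safely just to see which URL it would use.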
- Then simply install the BigTop packages you want, for example:
[bash]sudo yum install hadoop\* mahout\* oozie\* hbase\* hive\* hue\* pig\* zookeeper\*[/bash]
- We then need a Java environment for all BigTop programs:
[bash]sudo yum install java-1.7.0-openjdk-devel.x86_64[/bash]
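It can help to confirm that Java is really on the PATH before formatting the NameNode. A minimal sketch, where `have_cmd` is a hypothetical helper name of mine:

```shell
# Hypothetical helper: succeeds when the named command can be found on PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

if have_cmd java; then
  # java -version writes to stderr, so redirect it to show the first line.
  java -version 2>&1 | head -n 1
else
  echo "java not found on PATH; install the JDK first"
fi
```

This only checks that a `java` binary is reachable. It does not verify that the Hadoop services pick up the same JDK.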
- Format the NameNode:
[bash]sudo /etc/init.d/hadoop-hdfs-namenode init[/bash]
- Start the Hadoop services for your pseudo-distributed cluster:
[bash]for i in hadoop-hdfs-namenode hadoop-hdfs-datanode ; \
do sudo service $i start ; done[/bash]
- Create a sub-directory structure in HDFS:
[bash]sudo /usr/lib/hadoop/libexec/init-hdfs.sh[/bash]
- Start the YARN daemons:
[bash]sudo service hadoop-yarn-resourcemanager start
sudo service hadoop-yarn-nodemanager start[/bash]
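Once everything is started, the four daemons from the steps above can be checked in one loop. This is a sketch under assumptions: it assumes the BigTop init scripts accept `status` like standard init scripts do, and the `RUN` variable and the `check_services` name are hypothetical. By default it only prints the commands, and it runs them when `RUN=1` is set.

```shell
# Daemons started in the steps above, in start order.
SERVICES="hadoop-hdfs-namenode hadoop-hdfs-datanode hadoop-yarn-resourcemanager hadoop-yarn-nodemanager"

# Print the status command for each daemon; with RUN=1, execute it instead.
# Assumption: the init scripts support the "status" action.
check_services() {
  for svc in $SERVICES; do
    if [ "${RUN:-0}" = "1" ]; then
      sudo service "$svc" status
    else
      echo "sudo service $svc status"
    fi
  done
}

check_services
```

To run the checks for real, put the snippet in a file and call it with `RUN=1` set in the environment.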
The above steps save much of the effort of installing and linking compatible software, compared with installing each package individually :-). Of course, we can still use other prebuilt distributions such as those from Cloudera or Hortonworks.
Troubleshooting for Noobs
1. If an “org.apache.hadoop.security.AccessControlException” related to “/tmp/hadoop-yarn” appears while running hadoop/pig/etc. scripts, it is caused by the permissions on that tmp folder in HDFS. Do not be frustrated, just run
[bash]sudo -u hdfs hadoop fs -chmod -R 777 /tmp/hadoop-yarn[/bash]
and you will be fine 🙂