Install Apache BigTop for Hadoop Development

This is a quick note on how to set up the necessary Hadoop ecosystem by installing Apache BigTop for Hadoop development.

  1. Remember that we need a 64-bit OS to run the Big Data stack.
  2. First, add a new repo. The latest BigTop repo (0.8.0 at the time of writing) is located at http://www.apache.org/dist/bigtop/bigtop-0.8.0/repos/ and we just need to add the appropriate repo for the current OS. For example, with CentOS 6:

    [bash]sudo wget -O /etc/yum.repos.d/bigtop.repo http://www.apache.org/dist/bigtop/bigtop-0.8.0/repos/centos6/bigtop.repo[/bash]

  3. Then, simply install the software you want from Apache BigTop, e.g.:

    [bash]sudo yum install hadoop\* mahout\* oozie\* hbase\* hive\* hue\* pig\* zookeeper\*[/bash]

  4. We then need a Java environment for all BigTop programs:

    [bash]sudo yum install java-1.7.0-openjdk-devel.x86_64[/bash]

  5. Format the NameNode:

    [bash]sudo /etc/init.d/hadoop-hdfs-namenode init[/bash]

  6. Start the Hadoop services for your pseudo-distributed cluster:

    [bash]for i in hadoop-hdfs-namenode hadoop-hdfs-datanode ; \
    do sudo service $i start ; done[/bash]

  7. Create a sub-directory structure in HDFS:

    [bash]sudo /usr/lib/hadoop/libexec/init-hdfs.sh[/bash]

  8. Start the YARN daemons:

    [bash]sudo service hadoop-yarn-resourcemanager start
    sudo service hadoop-yarn-nodemanager start[/bash]
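Steps 1 and 2 can be checked up front with a small shell sketch. The function names `is_x86_64` and `bigtop_repo_url` are hypothetical helpers, not part of BigTop, and the URL layout is only assumed to follow the CentOS 6 example in step 2. Directory names for other operating systems should be confirmed by browsing the repos page first.

```shell
# Hypothetical helper: succeeds if the given machine string (default: the
# output of `uname -m`) is 64-bit x86. Other 64-bit architectures are
# deliberately not covered by this sketch.
is_x86_64() {
  case "${1:-$(uname -m)}" in
    x86_64|amd64) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical helper: build the bigtop.repo URL from a BigTop version ($1)
# and an OS directory name ($2), assuming the same layout as the CentOS 6
# example in step 2.
bigtop_repo_url() {
  printf 'http://www.apache.org/dist/bigtop/bigtop-%s/repos/%s/bigtop.repo\n' "$1" "$2"
}

# Possible use (not executed here):
#   is_x86_64 && sudo wget -O /etc/yum.repos.d/bigtop.repo "$(bigtop_repo_url 0.8.0 centos6)"
```

Keeping the version and OS name as arguments makes it easy to try a different repo without retyping the whole URL.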

The steps above save much of the effort of installing and linking compatible software, compared with installing each package individually :-). Of course, we can still use other prebuilt distributions such as Cloudera’s or Hortonworks’.
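For later restarts of the pseudo-distributed cluster, the start commands from steps 6 and 8 could be wrapped in one function. This is only a sketch: `run_cmd`, `start_bigtop_services` and the `DRY_RUN` variable are hypothetical names, and it assumes the one-time NameNode format (step 5) and HDFS directory setup (step 7) have already been done.

```shell
# Service names come from steps 6 and 8; the HDFS daemons start before YARN.
BIGTOP_SERVICES="hadoop-hdfs-namenode hadoop-hdfs-datanode hadoop-yarn-resourcemanager hadoop-yarn-nodemanager"

# Hypothetical helper: with DRY_RUN=1 the command is printed instead of run.
run_cmd() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Start every service in order and stop at the first failure.
start_bigtop_services() {
  for svc in $BIGTOP_SERVICES; do
    run_cmd sudo service "$svc" start || return 1
  done
}

# Possible use (not executed here):
#   DRY_RUN=1; start_bigtop_services   # only print the commands
#   DRY_RUN=0; start_bigtop_services   # actually start the daemons
```

Running it once with `DRY_RUN=1` is a cheap way to review the exact commands before they touch the system.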

Troubleshooting for Noobs

1. If an “org.apache.hadoop.security.AccessControlException” related to “/tmp/hadoop-yarn” appears while running hadoop/pig/etc. scripts, it is caused by the permissions on that tmp folder. Do not be frustrated, just run
[bash]sudo -u hdfs hadoop fs -chmod -R 777 /tmp/hadoop-yarn[/bash]
and you will be fine 🙂
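To see whether the permission fix is needed at all, the mode string of the folder can be inspected first. The sketch below uses a hypothetical helper, `is_world_writable`, and assumes the mode string (for example `drwxrwxrwx`) is the first column of the `hadoop fs -ls -d` output. That output format is an assumption worth checking on your own cluster.

```shell
# Hypothetical helper: succeeds if a mode string such as "drwxrwxrwx" grants
# write access to "others", i.e. its 9th character is "w".
is_world_writable() {
  case "$1" in
    ????????w*) return 0 ;;
    *) return 1 ;;
  esac
}

# Possible use against HDFS (not executed here; assumes the mode string is
# the first column of the listing):
#   mode="$(sudo -u hdfs hadoop fs -ls -d /tmp/hadoop-yarn | awk '{print $1}')"
#   is_world_writable "$mode" || echo "/tmp/hadoop-yarn is not world-writable"
```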
