Hadoop Configuration using Ansible!

Karthik Avula
4 min read · Dec 20, 2020

Hello everyone! Welcome to this blog on configuring Hadoop services using Ansible!

A Hadoop cluster typically consists of one master and many slaves. Let's start writing our playbook for the master!

In my case, the IP of my master is 172.20.10.2. We already know that in order to create a Hadoop cluster, we need the JDK (Hadoop's dependency) and the Hadoop package. I have the JDK RPM and the Hadoop RPM, and I store their locations in variables, which helps while installing them. The playbook starts with the copy module, which copies the files from the controller to any desired location on the target node. In my case I am storing them in the / folder of the target node, which will act as the Hadoop namenode.

vars:
  jdkrpm: /jdk-8u171-linux-x64.rpm
  hadooprpm: /hadoop-1.2.1-1.x86_64.rpm
tasks:
  - name: Copying jdk file
    copy:
      src: /home/karthik/jdk-8u171-linux-x64.rpm
      dest: /
  - name: Copying Hadoop file
    copy:
      src: /home/karthik/hadoop-1.2.1-1.x86_64.rpm
      dest: /

Next, I use the shell module to install the packages whose paths are stored in the variables. Note that a task can run only one module, so each RPM install gets its own task. Since I am using Red Hat, I can install the packages with the rpm command; in your case, use your OS's respective package manager!

- name: Installing jdk
  shell: "rpm -i {{ jdkrpm }} --force"
- name: Installing Hadoop
  shell: "rpm -hiv {{ hadooprpm }} --force"

If any cluster was configured before, it is always recommended to delete the previous folder (in my case, the /nn folder) and then create a new folder for your new cluster. I use ignore_errors: yes so the play does not fail with a "no such file or directory" error when there is no previous cluster or folder. Since you are configuring the master, it is also recommended to clear the RAM caches before creating the cluster!

- name: Deleting previous caches (if any)
  command: "rm -r /nn"
  ignore_errors: yes
- name: Creating a new folder
  file:
    path: /nn
    state: directory
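As a side note, the rm -r plus ignore_errors pattern can be replaced with the file module alone, which is idempotent and needs no error handling. The sketch below shows that alternative, plus one common way to clear the RAM caches mentioned above; the drop_caches step is my own assumption of how you might do it on Linux, not something from the original playbook:

```yaml
# Sketch: idempotent alternative to rm -r + ignore_errors.
# state: absent removes the directory only if it exists, so
# the play never fails on a missing path.
- name: Removing previous namenode directory (if any)
  file:
    path: /nn
    state: absent
- name: Creating a fresh namenode directory
  file:
    path: /nn
    state: directory
# Assumption: dropping the kernel page cache is one way to
# "clear RAM" on Linux; use it only when you really need to.
- name: Clearing RAM caches
  shell: "echo 3 > /proc/sys/vm/drop_caches"
```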

In these next tasks, I update the core-site.xml and hdfs-site.xml files, which tell Hadoop that this node is the master.

- name: Updating core-site
  copy:
    src: "/home/karthik/core-site.xml"
    dest: "/etc/hadoop/core-site.xml"
- name: Updating hdfs-site
  copy:
    src: "/home/karthik/hdfs-site.xml"
    dest: "/etc/hadoop/hdfs-site.xml"
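For reference, the two files copied above typically look something like this for a Hadoop 1.x namenode. The property values here (the port and paths) are illustrative assumptions; your addresses and directories may differ:

```xml
<!-- core-site.xml (illustrative): tells clients where the namenode is -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://172.20.10.2:9001</value>
  </property>
</configuration>

<!-- hdfs-site.xml (illustrative): where the namenode keeps its metadata -->
<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/nn</value>
  </property>
</configuration>
```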

It is a best practice to format the namenode before starting the cluster. Because the command module does not understand shell features such as pipes, the format step uses the shell module; the command module then starts the namenode service, and finally the debug module reports that the namenode started successfully.

- name: Formatting namenode
  shell: "echo Y | hadoop namenode -format"
- name: Starting Namenode service
  command: "hadoop-daemon.sh start namenode"
- debug:
    msg: Namenode started successfully!

Now, my entire playbook looks like this:

- hosts: 172.20.10.2
  vars:
    jdkrpm: /jdk-8u171-linux-x64.rpm
    hadooprpm: /hadoop-1.2.1-1.x86_64.rpm
  tasks:
    - name: Copying jdk file
      copy:
        src: /home/karthik/jdk-8u171-linux-x64.rpm
        dest: /
    - name: Copying Hadoop file
      copy:
        src: /home/karthik/hadoop-1.2.1-1.x86_64.rpm
        dest: /
    - name: Installing jdk
      shell: "rpm -i {{ jdkrpm }} --force"
    - name: Installing Hadoop
      shell: "rpm -hiv {{ hadooprpm }} --force"
    - name: Deleting previous caches (if any)
      command: "rm -r /nn"
      ignore_errors: yes
    - name: Creating a new folder
      file:
        path: /nn
        state: directory
    - name: Updating core-site
      copy:
        src: "/home/karthik/core-site.xml"
        dest: "/etc/hadoop/core-site.xml"
    - name: Updating hdfs-site
      copy:
        src: "/home/karthik/hdfs-site.xml"
        dest: "/etc/hadoop/hdfs-site.xml"
    - name: Formatting namenode
      shell: "echo Y | hadoop namenode -format"
    - name: Starting Namenode service
      command: "hadoop-daemon.sh start namenode"
    - debug:
        msg: Namenode started successfully!

You can configure the slave nodes in the same way! In my case I am using only one slave, but you can use any number of slaves and group them in your Ansible inventory file.
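A minimal inventory for this setup might look like the following. The group names and the slave addresses are my own assumptions (only the master's IP comes from this post); substitute your nodes' actual addresses:

```ini
# /etc/ansible/hosts (sketch)
[master]
172.20.10.2

[slaves]
# hypothetical slave addresses; replace with your own
172.20.10.3
172.20.10.4
```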

- hosts: slaves
  vars:
    jdkrpm: /jdk-8u171-linux-x64.rpm
    hadooprpm: /hadoop-1.2.1-1.x86_64.rpm
  tasks:
    - name: Copying jdk file
      copy:
        src: /home/karthik/jdk-8u171-linux-x64.rpm
        dest: /
    - name: Copying Hadoop file
      copy:
        src: /home/karthik/hadoop-1.2.1-1.x86_64.rpm
        dest: /
    - name: Installing jdk
      shell: "rpm -i {{ jdkrpm }} --force"
    - name: Installing Hadoop
      shell: "rpm -hiv {{ hadooprpm }} --force"
    - name: Deleting previous caches (if any)
      command: "rm -r /dn"
      ignore_errors: yes
    - name: Creating a new folder
      file:
        path: /dn
        state: directory
    - name: Updating core-site
      copy:
        src: "/home/karthik/core-site.xml"
        dest: "/etc/hadoop/core-site.xml"
    - name: Updating hdfs-site
      copy:
        src: "/home/karthik/hdfs-site.xml"
        dest: "/etc/hadoop/hdfs-site.xml"
    - name: Starting Datanode service
      command: "hadoop-daemon.sh start datanode"
    - debug:
        msg: Datanode started successfully!

After running these playbooks, your namenode and datanode are configured! And the best part is that you can simplify these playbooks using the template module and other Ansible features, which I recommend you try!
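For example, with the template module you could render hdfs-site.xml from a single Jinja2 file and reuse it for both node types, instead of copying two separate files. This is only a sketch: hdfs-site.xml.j2 and the node_dir variable are hypothetical names of my own, not part of the playbooks above:

```yaml
# Sketch: one template serving both the master (/nn) and the slave (/dn).
# The template would contain {{ node_dir }} where the directory path goes.
- name: Rendering hdfs-site from a template
  template:
    src: /home/karthik/hdfs-site.xml.j2   # hypothetical template file
    dest: /etc/hadoop/hdfs-site.xml
  vars:
    node_dir: /nn   # set to /dn in the slave play
```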

After starting your services, you can check the status of your nodes on the target/managed nodes using the jps command, which shows output like the screenshots below.

[Screenshot: jps output on the namenode (master)]
[Screenshot: jps output on the datanode (slave)]

Now our services have been started!

Thanks for reading my blog. I hope you all loved it, and I will come back next time with another exciting blog. This blog is part of my journey in ARTH — The School of Technologies, guided by World Record Holder Mr. Vimal Daga.

Karthik Avula

CS Student. Skillset: Linux, Bash Shell Scripting, Python, AWS, Hadoop, Ansible, Docker, Kubernetes, Networking and Troubleshooting, OS Concepts.