
I think Docker simplifies big data DevOps concerns by an order of magnitude.

It is easy enough for me to run a single command and bring up any specific Hadoop distribution in Docker containers.

To get a flavor of it, I thought of writing this blog entry. In part 1 of this blog I set up a Linux-container-based environment; in this entry I am posting a Docker-based environment setup.

Step 1:  Install Docker
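On CentOS 6 the usual route at the time was the `docker-io` package from EPEL; here is a sketch (the package name assumes CentOS 6.x — on CentOS 7 and later you would use the `docker` packages and repos instead):

```shell
# Install Docker on CentOS 6 via EPEL (package name docker-io is specific to CentOS 6)
DOCKER_PKG=docker-io
sudo yum install -y epel-release        # enable the EPEL repository
sudo yum install -y "$DOCKER_PKG"
sudo service docker start               # start the Docker daemon
sudo chkconfig docker on                # start the daemon on boot
sudo docker info                        # sanity check: the daemon answers
```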

Step 2: Install Kubernetes with Kubectl
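For kubectl, downloading the standalone binary from the Kubernetes release bucket is enough for this exercise; the version below is an assumption — pick whatever matches your cluster:

```shell
# Fetch a standalone kubectl binary (version v1.0.6 is an assumption)
KUBECTL_VERSION=v1.0.6
curl -LO "https://storage.googleapis.com/kubernetes-release/release/${KUBECTL_VERSION}/bin/linux/amd64/kubectl"
chmod +x kubectl
sudo mv kubectl /usr/local/bin/kubectl
kubectl version                         # verify the client runs
```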

In my case I do not want to mess with my laptop, so I use a CentOS 6.6 VM on my MacBook Pro. That adds one extra step to start up the VM, but it keeps my host laptop free of installations and configuration.

Once both step #1 and step #2 are working for you, here is how to launch a Hadoop instance.

Step 3: Create a pod definition for Kubernetes. Pick any available Hadoop image from Docker Hub.

[dockeruser@centos6 docker-for-hadoop]$ vi hbase-single-node-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: hbase-single-node-pod
  labels:
    name: hbase-single-node-pod
spec:
  containers:
  - name: hbase
    image: santanu77/hadoop-docker
    ports:
    - containerPort: 60000
      hostPort: 60000
    - containerPort: 60010
      hostPort: 60010
    - containerPort: 8088
      hostPort: 8088
Step 4: Just launch the instance
[dockeruser@centos6 docker-for-hadoop]$ kubectl create -f hbase-single-node-pod.yaml
Now let us check the status of the instance. Once it is running, we can log into it, inspect its services, and so on.
[dockeruser@centos6 docker-for-hadoop]$ kubectl describe pod hbase-single-node-pod
Name:                hbase-single-node-pod
Namespace:            default
Image(s):            santanu77/hadoop-docker
Labels:                name=hbase-single-node-pod
Status:                Running
Replication Controllers:    <none>
    Image:        santanu77/hadoop-docker
    State:        Running
      Started:        Thu, 10 Sep 2015 23:55:16 -0400
    Ready:        True
    Restart Count:    0
  Type        Status
  Ready     True
No events.
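With the pod in the Running state, you can tail its logs or open a shell inside it. A sketch (the pod name comes from the YAML above; the presence of bash inside the image is an assumption):

```shell
POD=hbase-single-node-pod
kubectl logs "$POD"                  # container stdout/stderr
kubectl exec -it "$POD" -- bash      # interactive shell (assumes bash exists in the image)
```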

I can hit the Hadoop cluster manager service from my host as well, since port 8088 was mapped to the host's port. So I can access it using my VM's static IP and port 8088.
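A quick way to verify the mapping is to hit the YARN ResourceManager's REST API on the mapped port; the address below is a placeholder for your VM's static IP:

```shell
VM_IP=192.168.56.101                               # placeholder -- substitute your VM's static IP
curl -s "http://${VM_IP}:8088/ws/v1/cluster/info"  # YARN ResourceManager cluster-info REST endpoint
```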

I can also SSH directly into my Hadoop instance like any other machine and start running a job.
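For example, once inside, a stock Hadoop 2.x layout lets you kick off one of the bundled example jobs; the `HADOOP_PREFIX` variable and the examples-jar path are assumptions about this particular image:

```shell
# Run the bundled 'pi' example job (paths assume a stock Hadoop 2.x layout inside the image)
cd "$HADOOP_PREFIX"
EXAMPLES_JAR=share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar
bin/hadoop jar $EXAMPLES_JAR pi 2 5     # 2 map tasks, 5 samples each
```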