
Big is Small

~ APIs, ML Engineering at scale, and Cloud is making it all small, connected and intelligent.


Setting up a local Development Environment for Playing with Big Data: Part 2 : Play with Hadoop on Docker

11 Friday Sep 2015

Posted by santanu77 in How-to

Tags

Big Data, docker, hadoop, how to

I think Docker simplifies big data DevOps concerns by a factor of 10 or more.

It is easy enough to run a single command and bring to life any specific distribution of Hadoop in Docker containers.

To give a flavor of it, I thought of writing this blog entry. In part 1 of this series I set up a Linux container based environment. In this entry, I set up a Docker-based environment.

Step 1:  Install Docker

Step 2: Install Kubernetes with Kubectl

In my case I do not want to mess with my laptop, so I use a CentOS 6.6 VM on my MacBook Pro. Starting the VM is one extra step, but it keeps my host laptop free of installations and configurations.

Once both Step 1 and Step 2 are working for you, here is how to launch a Hadoop instance.

Step 3: Create a Pod definition for Kubernetes. Pick any available Hadoop image from Docker Hub.

[dockeruser@centos6 docker-for-hadoop]$ vi hbase-single-node-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: hbase-single-node-pod
  labels:
    name: hbase-single-node-pod
spec:
  containers:
  - name: hbase
    image: 'santanu77/hadoop-docker'
    ports:
    - containerPort: 60000
      hostPort: 60000
    - containerPort: 60010
      hostPort: 60010
    - containerPort: 8088
      hostPort: 8088

Step 4: Just launch the instance

[dockeruser@centos6 docker-for-hadoop]$ kubectl create -f hbase-single-node-pod.yaml
pods/hbase-single-node-pod

Now let us check the status of the instance. Once it is running, we can log into the instance, view a service, and so on.

[dockeruser@centos6 docker-for-hadoop]$ kubectl describe pod hbase-single-node
Name:                hbase-single-node-pod
Namespace:            default
Image(s):            santanu77/hadoop-docker
Node:                127.0.0.1/127.0.0.1
Labels:                name=hbase-single-node-pod
Status:                Running
Reason:
Message:
IP:                172.17.0.1
Replication Controllers:    <none>
Containers:
  hbase:
    Image:        santanu77/hadoop-docker
    State:        Running
      Started:        Thu, 10 Sep 2015 23:55:16 -0400
    Ready:        True
    Restart Count:    0
Conditions:
  Type        Status
  Ready     True
No events.
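
Instead of re-running the describe command by hand, the status check can be scripted. The following is a minimal sketch, not part of the original setup: pod_status and wait_for_pod are hypothetical helper names, and the parsing assumes the describe output keeps a top-level "Status:" line as shown above.

```shell
#!/usr/bin/env bash

# Read `kubectl describe pod` output on stdin and print the value of the
# top-level "Status:" field. Indented lines (container State, Conditions)
# do not match because the pattern is anchored to the start of the line.
pod_status() {
  awk '/^Status:/ { print $2; exit }'
}

# Hypothetical helper: poll a pod until it reports Running.
# $1 = pod name, $2 = maximum number of attempts (assumed default: 30).
wait_for_pod() {
  local pod="$1" attempts="${2:-30}" i
  for i in $(seq 1 "$attempts"); do
    if [ "$(kubectl describe pod "$pod" | pod_status)" = "Running" ]; then
      return 0
    fi
    sleep 2
  done
  return 1
}
```

For example, `wait_for_pod hbase-single-node-pod && echo "pod is up"` would block until the pod from Step 4 is running, or give up after the attempt limit.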

I can also reach the Hadoop cluster manager service from my host, because container port 8088 was mapped to the same port on the VM. I access it using my VM's static IP and port 8088.
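
The same reachability check can be done from a terminal on the host. This is a rough sketch under assumptions: rm_url and check_rm are hypothetical helper names, 192.168.56.101 stands in for your VM's static IP, and /cluster is assumed to be the path where the cluster manager web UI answers on port 8088.

```shell
#!/usr/bin/env bash

# Build the cluster manager URL for a host.
# $1 = host or IP, $2 = port (defaults to 8088, the hostPort in the pod definition).
rm_url() {
  local host="$1" port="${2:-8088}"
  printf 'http://%s:%s/cluster\n' "$host" "$port"
}

# Return 0 if the web UI answers within 5 seconds, non-zero otherwise.
check_rm() {
  curl -sf --max-time 5 -o /dev/null "$(rm_url "$@")"
}
```

Usage, with the placeholder IP replaced by your own: `check_rm 192.168.56.101 && echo "cluster manager reachable"`.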

[Image: Hadoop Cluster]

I can also SSH directly into my Hadoop instance, as with any other instance, and start running a job.
