
Big is Small

~ APIs, ML Engineering at scale, and Cloud is making it all small, connected and intelligent.


Tag Archives: docker

Spark Cluster using Multi-Node Kubernetes and Docker

03 Saturday Sep 2016

Posted by santanu77 in How-to


Tags

Big Data, cluster, container, docker, How-to, kubernetes

In this post I am going to share my experience with:

  • setting up a kubernetes multinode cluster on docker
  • then running a spark cluster on kubernetes
Installation
My installation had three nodes. I used VirtualBox and CentOS 7 to create the master node first, and then cloned it to create the worker nodes.
  • Kubernetes Master & 1st worker node – 192.168.56.121
  • Kubernetes 2nd worker node – 192.168.56.122
  • Kubernetes 3rd worker node – 192.168.56.123


I used the following software versions and the same installation steps on every node. The environment variables below also reflect the software versions.
export MASTER_IP=192.168.56.121 # is needed by all nodes
export K8S_VERSION=v1.4.0-alpha.1 # get the latest from https://storage.googleapis.com/kubernetes-release/release/latest.txt or /stable.txt
export ETCD_VERSION=2.2.5 # get the latest from https://gcr.io/v2/google_containers/etcd-amd64/tags/list
export FLANNEL_VERSION=0.5.5 # get the latest from https://quay.io/repository/coreos/flannel?tag=latest&tab=tags
export FLANNEL_IFACE=enp0s8 # name of the interface that would connect the nodes
export FLANNEL_IPMASQ=true
  • The installation steps to follow are in the Kubernetes official guide: http://kubernetes.io/docs/getting-started-guides/docker-multinode/
  • For the time being, allow all connections through iptables and delete all rules. Keep the service running.
  • Install Docker – I used version 1.11.2.
  • Optionally, set up host names in the /etc/hosts file if you are connecting via hostnames, and also set the hostname of each VM.
You will note that in the Docker-based version of Kubernetes, two Docker daemons run on every node:
  • The first, let's call it the bootstrap Docker instance – this one runs the Kubernetes components themselves.
  • The second is the regular installation of Docker, which runs the managed containers on the VM nodes.
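The two daemons are addressed separately: the bootstrap instance only via its own socket with `-H`, the regular one as usual. A quick way to see which daemon runs what (a sketch; the helper names are mine, and the socket path assumes the `-H` option used in the start script below):

```shell
#!/bin/sh
# List container names on each of the two daemons.
bootstrap_ps() {
  # the bootstrap daemon only listens on its dedicated unix socket
  docker -H unix:///var/run/docker-bootstrap.sock ps --format '{{.Names}}'
}

regular_ps() {
  # the regular daemon answers plain docker commands
  docker ps --format '{{.Names}}'
}
```

Running `bootstrap_ps` should list the Kubernetes system containers (etcd, flannel), while `regular_ps` lists the managed workload containers.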

An easy way to create a script to start the bootstrap version is

case $1 in
 start)
 sudo sh -c 'docker daemon -H unix:///var/run/docker-bootstrap.sock -p /var/run/docker-bootstrap.pid --iptables=false --ip-masq=false --mtu=1500 --bridge=none --exec-root=/var/run/docker-bootstrap --graph=/var/lib/docker-bootstrap 2> /var/log/docker-bootstrap.log 1> /dev/null &'
esac

Note: In my case I had to add the MTU option to get past this issue: https://github.com/docker/docker/issues/15498. Otherwise the MTU option should be optional.
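A matching stop routine can be sketched around the PID file the start command writes with `-p /var/run/docker-bootstrap.pid`. This is my own sketch, not the repo's script; the function name and the PID-file argument are illustrative:

```shell
#!/bin/sh
# Stop the bootstrap docker daemon via the PID file written at start time.
# The pidfile path is parameterized only to make the routine easy to test;
# by default it matches the -p option used in the start script.
stop_bootstrap() {
  pidfile=${1:-/var/run/docker-bootstrap.pid}
  if [ -f "$pidfile" ]; then
    kill "$(cat "$pidfile")" 2>/dev/null
    rm -f "$pidfile"
    echo "docker-bootstrap stopped"
  else
    echo "docker-bootstrap not running"
  fi
}
```

Run it as root (or via sudo), since the daemon itself was started by root.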

Then I also wrote scripts to start and stop all the services. Below is the start script for the Kubernetes master node.

## Checking command-line arguments
while test $# -gt 0; do
  case "$1" in
    -h|--help)
      echo "Usage:"
      echo "start-k8s.sh [-p|--proxy true]"
      exit 0
      ;;
    -p|--proxy)
      shift
      if test $# -gt 0; then
        export PROXY=$1
      else
        echo "invalid argument for proxy"
        exit 1
      fi
      shift
      ;;
    *)
      break
      ;;
  esac
done

#stop any running instance
/opt/docker-bootstrap/stop-k8s.sh

## Setting up env var
#############################################
export MASTER_IP=192.168.56.121

# get from https://storage.googleapis.com/kubernetes-release/release/latest.txt or /stable.txt
export K8S_VERSION=v1.4.0-alpha.1

# get from https://gcr.io/v2/google_containers/etcd-amd64/tags/list
export ETCD_VERSION=2.2.5

# get from https://quay.io/repository/coreos/flannel?tag=latest&tab=tags
export FLANNEL_VERSION=0.5.5

# the interface that would connect all hosts
export FLANNEL_IFACE=enp0s8
export FLANNEL_IPMASQ=true

## starting docker boot-strap
/opt/docker-bootstrap/docker-bootstrap start

echo "waiting for docker-bootstrap to start"
sleep 5

## starting up docker
#sudo systemctl start docker

## start etcd
sudo docker -H unix:///var/run/docker-bootstrap.sock run -d \
--net=host \
gcr.io/google_containers/etcd-amd64:${ETCD_VERSION} \
/usr/local/bin/etcd \
--listen-client-urls=http://127.0.0.1:4001,http://${MASTER_IP}:4001 \
--advertise-client-urls=http://${MASTER_IP}:4001 \
--data-dir=/var/etcd/data
echo "waiting for etcd to start"
sleep 25

## Save a network config
sudo docker -H unix:///var/run/docker-bootstrap.sock run \
--net=host \
gcr.io/google_containers/etcd-amd64:${ETCD_VERSION} \
etcdctl set /coreos.com/network/config '{ "Network": "10.1.0.0/16" }'

echo "waiting for network config to save"
sleep 5

## Run Flannel
flannel_image_id=$(sudo docker -H unix:///var/run/docker-bootstrap.sock run -d \
--net=host \
--privileged \
-v /dev/net:/dev/net \
quay.io/coreos/flannel:${FLANNEL_VERSION} \
/opt/bin/flanneld \
--ip-masq=${FLANNEL_IPMASQ} \
--etcd-endpoints=http://${MASTER_IP}:4001 \
--iface=${FLANNEL_IFACE})

echo "waiting for Flannel to pick up config"
sleep 5

echo Flannel config is
SET_VARIABLES=$(sudo docker -H unix:///var/run/docker-bootstrap.sock exec $flannel_image_id cat /run/flannel/subnet.env)
eval $SET_VARIABLES
sudo bash -c "echo [Service] > /etc/systemd/system/docker.service.d/docker.conf"

if [ "$PROXY" == "true" ]
then
sudo bash -c "echo Environment=HTTP_PROXY=http://203.127.104.198:8080/ NO_PROXY=localhost,127.0.0.1,192.168.0.0/16,10.0.0.0/16 FLANNEL_NETWORK=$FLANNEL_NETWORK FLANNEL_SUBNET=$FLANNEL_SUBNET FLANNEL_MTU=$FLANNEL_MTU >>/etc/systemd/system/docker.service.d/docker.conf"
else
sudo bash -c "echo Environment=FLANNEL_NETWORK=$FLANNEL_NETWORK FLANNEL_SUBNET=$FLANNEL_SUBNET FLANNEL_MTU=$FLANNEL_MTU >>/etc/systemd/system/docker.service.d/docker.conf"
fi

echo FLANNEL_NETWORK=$FLANNEL_NETWORK FLANNEL_SUBNET=$FLANNEL_SUBNET FLANNEL_MTU=$FLANNEL_MTU

## Delete docker networking
sudo /sbin/ifconfig docker0 down
sudo brctl delbr docker0

## Start docker service
sudo systemctl daemon-reload
sudo systemctl start docker
sudo systemctl status docker -l

## Start kubernetes master
sudo docker run \
--volume=/:/rootfs:ro \
--volume=/sys:/sys:ro \
--volume=/var/lib/docker/:/var/lib/docker:rw \
--volume=/var/lib/kubelet:/var/lib/kubelet:rw,rslave \
--volume=/var/run:/var/run:rw \
--net=host \
--privileged=true \
--pid=host \
-d \
gcr.io/google_containers/hyperkube-amd64:${K8S_VERSION} \
/hyperkube kubelet \
--allow-privileged=true \
--api-servers=http://localhost:8080 \
--v=2 \
--address=0.0.0.0 \
--enable-server \
--hostname-override=127.0.0.1 \
--config=/etc/kubernetes/manifests-multi \
--containerized \
--cluster-dns=10.0.0.10 \
--cluster-domain=cluster.local

## Wait for the master components to come up, then set up the dashboard and list the pods
sleep 10
kubectl create -f dashboard-service.yaml --namespace=kube-system
kubectl get pod --all-namespaces

Note: All the source code is available at https://github.com/santanu-dey/kubernetes-cluster

Similar scripts are available for starting and stopping Kubernetes and the related services on the worker nodes; check out the GitHub repo. Once the master VM is ready, it can be cloned to create the worker VMs.

Start up the services

Once the Kubernetes services are up, the Spark services can be started as shown below.

Master Node

# ./start-k8s.sh
# kubectl get node
NAME        STATUS    AGE
127.0.0.1   Ready     1m

Similarly, when the worker nodes are up, they show up in the node list:
# kubectl get node
NAME          STATUS    AGE
127.0.0.1     Ready     1h
kubernetes2   Ready     31m
kubernetes3   Ready     19m
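Rather than eyeballing the list, the start sequence can poll until the expected number of nodes report Ready. This small helper is my own addition, not part of the repo's scripts:

```shell
#!/bin/sh
# Poll `kubectl get node` until at least $1 nodes report Ready,
# or give up after $2 seconds (default 300). Counting lines that
# contain " Ready" deliberately skips the header and NotReady nodes.
wait_for_nodes() {
  expected=$1
  timeout=${2:-300}
  elapsed=0
  while [ "$elapsed" -lt "$timeout" ]; do
    ready=$(kubectl get node 2>/dev/null | grep -c ' Ready')
    if [ "$ready" -ge "$expected" ]; then
      echo "all $expected nodes Ready"
      return 0
    fi
    sleep 5
    elapsed=$((elapsed + 5))
  done
  echo "timed out: $ready of $expected nodes Ready" >&2
  return 1
}
```

For the three-node set-up above, `wait_for_nodes 3` blocks until the master and both workers have joined.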

And the Kubernetes system pods would show up as below:

# kubectl get pod --all-namespaces
NAMESPACE     NAME                                READY     STATUS    RESTARTS   AGE
kube-system   k8s-master-127.0.0.1                4/4       Running   1          4m
kube-system   k8s-proxy-127.0.0.1                 1/1       Running   0          4m
kube-system   kube-addon-manager-127.0.0.1        2/2       Running   0          4m
kube-system   kube-dns-v18-7tvnm                  3/3       Running   0          4m
kube-system   kubernetes-dashboard-v1.1.0-q30lc   1/1       Running   0          4m

With that, the Kubernetes cluster is ready to run any container workload. I am using Spark for this example. The scripts and YAML files to start the Spark cluster are also available in the same GitHub repo: https://github.com/santanu-dey/kubernetes-cluster
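The Spark start-up itself reduces to feeding the YAML files to kubectl in order: master controller, master service, then the workers. A sketch, assuming file names in the style of the Kubernetes Spark example of that era; check the repo for the actual names and labels:

```shell
#!/bin/sh
# Create the Spark master controller and service first, then the workers,
# and finally list the master pod. File names and the component label are
# illustrative; adjust them to the yaml files in the repo.
start_spark() {
  for f in spark-master-controller.yaml \
           spark-master-service.yaml \
           spark-worker-controller.yaml; do
    kubectl create -f "$f" || return 1
  done
  kubectl get pod -l component=spark-master
}
```

The ordering matters: the workers resolve the master through its service, so the service should exist before the worker pods start.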


Setting up a local Development Environment for Playing with Big Data: Part 2 : Play with Hadoop on Docker

11 Friday Sep 2015

Posted by santanu77 in How-to


Tags

Big Data, docker, hadoop, how to

I think Docker is simplifying big data dev-ops concerns by a factor of 10x or more.

It is easy enough to run a single command and bring to life any specific distribution of Hadoop in Docker containers.

To give a flavor of that, I thought of writing this blog entry. In part 1 of this blog I set up a Linux-container-based environment; in this entry I am posting a Docker-based set-up.

Step 1:  Install Docker

Step 2: Install Kubernetes with Kubectl

In my case I do not want to mess with my laptop, so I use a CentOS 6.6 VM on my MacBook Pro. That adds one extra step of starting up the VM, but it keeps my host laptop free of installations and configurations.

Once both step #1 and step #2 are working for you, here is how to launch a Hadoop instance.

Step 3: Create a Pod definition for Kubernetes. Pick any available Hadoop image from Docker Hub.

[dockeruser@centos6 docker-for-hadoop]$ vi hbase-single-node-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: hbase-single-node-pod
  labels:
    name: hbase-single-node-pod
spec:
  containers:
  - name: hbase
    image: 'santanu77/hadoop-docker'
    ports:
    - containerPort: 60000
      hostPort: 60000
    - containerPort: 60010
      hostPort: 60010
    - containerPort: 8088
      hostPort: 8088
Step 4: Just launch the instance
[dockeruser@centos6 docker-for-hadoop]$ kubectl create -f hbase-single-node-pod.yaml
pods/hbase-single-node-pod
Now let us check the status of the instance. Once it is running, we can log into the instance, view a service, and so on.
[dockeruser@centos6 docker-for-hadoop]$ kubectl describe pod hbase-single-node
Name:                hbase-single-node-pod
Namespace:            default
Image(s):            santanu77/hadoop-docker
Node:                127.0.0.1/127.0.0.1
Labels:                name=hbase-single-node-pod
Status:                Running
Reason:
Message:
IP:                172.17.0.1
Replication Controllers:    <none>
Containers:
  hbase:
    Image:        santanu77/hadoop-docker
    State:        Running
      Started:        Thu, 10 Sep 2015 23:55:16 -0400
    Ready:        True
    Restart Count:    0
Conditions:
  Type        Status
  Ready     True
No events.

I can hit the Hadoop cluster manager service from my host as well, given that port 8088 was mapped to the host's port. So I can access it using my VM's static IP and port 8088.

Also, I can directly SSH into my Hadoop instance like any other instance and start running a job.
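Since port 8088 is the YARN ResourceManager web UI, the mapping can be verified from the host with a plain HTTP request against the standard REST endpoint. A small sketch; the helper function is my own, and the IP shown is just my VM's static address:

```shell
#!/bin/sh
# Build the YARN ResourceManager cluster-info URL for a given host.
# /ws/v1/cluster/info is the standard ResourceManager REST endpoint.
rm_info_url() {
  echo "http://${1}:8088/ws/v1/cluster/info"
}

# Example -- fetch cluster info from the VM (substitute your own IP):
# curl -s "$(rm_info_url 192.168.56.121)"
```

A JSON response with the cluster state confirms that the container port mapping works end to end.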
