Operational Challenges of Data Science
05 Friday Jul 2019
Posted in How-to
Businesses on their analytics journey often need to decide on the technologies, timeframe, scale, budget, team structure, and so on required to be successful. Taking a holistic approach starts with discovering the current situation. To take stock of the organization's analytics requirements, capabilities, and priorities, some essential questions need to be discussed in a structured manner by the relevant business units and stakeholders, possibly including external consultants.
In my experience, the best way to ensure that all relevant points are covered is a standard "Analytics Platform Assessment Questionnaire" — a good tool to get you started. It covers questions from a strategy point of view, project-level details, and data perspectives as well.
Here is the download link: Analytics Platform Assessment Questionnaire.
Please share your email by submitting the contact form below. (I will not sell your email or spam you; this is just for my own download tracking purposes.)
03 Saturday Sep 2016
Posted in How-to

In this post I am going to share my experience with setting up a Kubernetes cluster bootstrapped on Docker, using etcd and Flannel.
export MASTER_IP=192.168.56.121   # is needed by all nodes
export K8S_VERSION=v1.4.0-alpha.1 # get the latest from https://storage.googleapis.com/kubernetes-release/release/latest.txt or /stable.txt
export ETCD_VERSION=2.2.5         # get the latest from https://gcr.io/v2/google_containers/etcd-amd64/tags/list
export FLANNEL_VERSION=0.5.5      # get the latest from https://quay.io/repository/coreos/flannel?tag=latest&tab=tags
export FLANNEL_IFACE=enp0s8       # name of the interface that connects the nodes
export FLANNEL_IPMASQ=true
An easy way to start the bootstrap docker daemon is a small script like this:
case "$1" in
start)
    sudo sh -c 'docker daemon -H unix:///var/run/docker-bootstrap.sock \
        -p /var/run/docker-bootstrap.pid \
        --iptables=false --ip-masq=false --mtu=1500 --bridge=none \
        --exec-root=/var/run/docker-bootstrap \
        --graph=/var/lib/docker-bootstrap \
        2> /var/log/docker-bootstrap.log 1> /dev/null &'
    ;;
esac
Note: In my case I had to add the --mtu option to get past this issue: https://github.com/docker/docker/issues/15498. Otherwise MTU is optional.
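The script above only shows the start branch. A matching stop branch can use the pid file the daemon writes via the -p flag. Here is a minimal sketch; the function name and the pidfile argument are my own, not from the original script:

```shell
# Hypothetical "stop" helper: kills the bootstrap daemon via its pid file.
# The default path matches the -p flag used in the start command above.
stop_bootstrap() {
    pidfile=${1:-/var/run/docker-bootstrap.pid}
    if [ -f "$pidfile" ]; then
        kill "$(cat "$pidfile")" && rm -f "$pidfile"
    fi
}
```

This keeps start and stop symmetric in one init-style script, so `stop-k8s.sh` can call it before restarting everything.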
Then I also wrote scripts to start and stop all the services. Below is the one for the Kubernetes master node:
## Check command-line arguments
while test $# -gt 0; do
    case "$1" in
        -h|--help)
            echo Usage:
            echo "start-k8s.sh [--proxy true]"
            exit 0
            ;;
        -p|--proxy)
            shift
            if test $# -gt 0; then
                export PROXY=$1
            else
                echo "invalid argument for proxy"
                exit 1
            fi
            shift
            ;;
        *)
            break
            ;;
    esac
done

## Stop any running instance
/opt/docker-bootstrap/stop-k8s.sh

## Set up environment variables
export MASTER_IP=192.168.56.121   # is needed by all nodes
export K8S_VERSION=v1.4.0-alpha.1 # get from https://storage.googleapis.com/kubernetes-release/release/latest.txt or /stable.txt
export ETCD_VERSION=2.2.5         # get from https://gcr.io/v2/google_containers/etcd-amd64/tags/list
export FLANNEL_VERSION=0.5.5      # get from https://quay.io/repository/coreos/flannel?tag=latest&tab=tags
export FLANNEL_IFACE=enp0s8       # the interface that connects all hosts
export FLANNEL_IPMASQ=true

## Start the docker bootstrap daemon
/opt/docker-bootstrap/docker-bootstrap start
echo "waiting for docker-bootstrap to start"
sleep 5

## Start etcd
sudo docker -H unix:///var/run/docker-bootstrap.sock run -d \
    --net=host \
    gcr.io/google_containers/etcd-amd64:${ETCD_VERSION} \
    /usr/local/bin/etcd \
    --listen-client-urls=http://127.0.0.1:4001,http://${MASTER_IP}:4001 \
    --advertise-client-urls=http://${MASTER_IP}:4001 \
    --data-dir=/var/etcd/data
echo "waiting for etcd to start"
sleep 25

## Save a network config for Flannel
sudo docker -H unix:///var/run/docker-bootstrap.sock run \
    --net=host \
    gcr.io/google_containers/etcd-amd64:${ETCD_VERSION} \
    etcdctl set /coreos.com/network/config '{ "Network": "10.1.0.0/16" }'
echo "waiting for network config to save"
sleep 5

## Run Flannel
flannel_image_id=$(sudo docker -H unix:///var/run/docker-bootstrap.sock run -d \
    --net=host \
    --privileged \
    -v /dev/net:/dev/net \
    quay.io/coreos/flannel:${FLANNEL_VERSION} \
    /opt/bin/flanneld \
    --ip-masq=${FLANNEL_IPMASQ} \
    --etcd-endpoints=http://${MASTER_IP}:4001 \
    --iface=${FLANNEL_IFACE})
echo "waiting for Flannel to pick up config"
sleep 5

## Pick up the subnet configuration Flannel has written
SET_VARIABLES=$(sudo docker -H unix:///var/run/docker-bootstrap.sock exec $flannel_image_id cat /run/flannel/subnet.env)
echo "Flannel config is: $SET_VARIABLES"
eval $SET_VARIABLES

## Point the main docker daemon at the Flannel subnet
sudo bash -c "echo [Service] > /etc/systemd/system/docker.service.d/docker.conf"
if [ "$PROXY" == "true" ]; then
    sudo bash -c "echo Environment=HTTP_PROXY=http://203.127.104.198:8080/ NO_PROXY=localhost,127.0.0.1,192.168.0.0/16,10.0.0.0/16 FLANNEL_NETWORK=$FLANNEL_NETWORK FLANNEL_SUBNET=$FLANNEL_SUBNET FLANNEL_MTU=$FLANNEL_MTU >> /etc/systemd/system/docker.service.d/docker.conf"
else
    sudo bash -c "echo Environment=FLANNEL_NETWORK=$FLANNEL_NETWORK FLANNEL_SUBNET=$FLANNEL_SUBNET FLANNEL_MTU=$FLANNEL_MTU >> /etc/systemd/system/docker.service.d/docker.conf"
fi
echo FLANNEL_NETWORK=$FLANNEL_NETWORK FLANNEL_SUBNET=$FLANNEL_SUBNET FLANNEL_MTU=$FLANNEL_MTU

## Delete the default docker networking
sudo /sbin/ifconfig docker0 down
sudo brctl delbr docker0

## Start the main docker service
sudo systemctl daemon-reload
sudo systemctl start docker
sudo systemctl status docker -l

## Start the Kubernetes master
sudo docker run \
    --volume=/:/rootfs:ro \
    --volume=/sys:/sys:ro \
    --volume=/var/lib/docker/:/var/lib/docker:rw \
    --volume=/var/lib/kubelet:/var/lib/kubelet:rw,rslave \
    --volume=/var/run:/var/run:rw \
    --net=host \
    --privileged=true \
    --pid=host \
    -d \
    gcr.io/google_containers/hyperkube-amd64:${K8S_VERSION} \
    /hyperkube kubelet \
    --allow-privileged=true \
    --api-servers=http://localhost:8080 \
    --v=2 \
    --address=0.0.0.0 \
    --enable-server \
    --hostname-override=127.0.0.1 \
    --config=/etc/kubernetes/manifests-multi \
    --containerized \
    --cluster-dns=10.0.0.10 \
    --cluster-domain=cluster.local

echo "get all pods"
sleep 10
kubectl create -f dashboard-service.yaml --namespace=kube-system
kubectl get pod --all-namespaces
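The `eval $SET_VARIABLES` step in the script works because flanneld writes a small env-style file at /run/flannel/subnet.env. Here is a sketch of what that file looks like and how eval-ing it exposes the values to the docker systemd drop-in; the subnet and MTU values below are made up for illustration:

```shell
# Hypothetical contents of /run/flannel/subnet.env as written by flanneld;
# the real subnet/MTU depend on your etcd network config and interface.
cat > /tmp/subnet.env <<'EOF'
FLANNEL_NETWORK=10.1.0.0/16
FLANNEL_SUBNET=10.1.42.1/24
FLANNEL_MTU=1472
EOF

# eval-ing (or sourcing) the file puts the values into the environment,
# ready to be echoed into /etc/systemd/system/docker.service.d/docker.conf.
eval "$(cat /tmp/subnet.env)"
echo "$FLANNEL_SUBNET"   # prints 10.1.42.1/24
```

Note the MTU in particular: Flannel subtracts its encapsulation overhead, which is why docker needs to be told the value rather than guessing its own.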
Note: All the source code is available at https://github.com/santanu-dey/kubernetes-cluster
Similar scripts are available for starting and stopping Kubernetes and the related services on the worker nodes; check out the GitHub repo. Once the master VM is ready, it can be cloned to create the worker VMs.
Start up the services
Once the services are started up, the Spark services can be started as below:
Master Node
# ./start-k8s.sh
# kubectl get node
NAME        STATUS    AGE
127.0.0.1   Ready     1m
Similarly, when the worker nodes are up they show up in the list of nodes:
# kubectl get node
NAME          STATUS    AGE
127.0.0.1     Ready     1h
kubernetes2   Ready     31m
kubernetes3   Ready     19m
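When automating the cluster bring-up it can be handy to wait until every node reports Ready. A small sketch that counts the Ready rows; it is shown here against a saved copy of the `kubectl get node` output above, but in a live cluster you would pipe the command directly into awk:

```shell
# Count nodes in Ready state from `kubectl get node`-style output.
# Saved-copy input stands in for the live command here.
cat > /tmp/nodes.txt <<'EOF'
NAME          STATUS    AGE
127.0.0.1     Ready     1h
kubernetes2   Ready     31m
kubernetes3   Ready     19m
EOF

ready_count=$(awk 'NR > 1 && $2 == "Ready"' /tmp/nodes.txt | wc -l)
echo "$ready_count"   # 3 for the run above
```

A wrapper loop could sleep and retry until `ready_count` matches the expected node count before launching workloads.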
The Kubernetes system pods also show up as below:
# kubectl get pod --all-namespaces
NAMESPACE     NAME                                READY     STATUS    RESTARTS   AGE
kube-system   k8s-master-127.0.0.1                4/4       Running   1          4m
kube-system   k8s-proxy-127.0.0.1                 1/1       Running   0          4m
kube-system   kube-addon-manager-127.0.0.1        2/2       Running   0          4m
kube-system   kube-dns-v18-7tvnm                  3/3       Running   0          4m
kube-system   kubernetes-dashboard-v1.1.0-q30lc   1/1       Running   0          4m
The Kubernetes cluster is then ready to run any container workload; I am using Spark for this example. The scripts and YAML files to start the Spark cluster are also available in the same GitHub repo: https://github.com/santanu-dey/kubernetes-cluster
Putting it all together:
23 Saturday Jul 2016
| Considerations | Kubernetes | Docker Swarm |
| --- | --- | --- |
| Adoption and maturity | Kubernetes is well ahead, with adoption by major companies such as Red Hat for OpenShift and Rackspace for Solum. Google Cloud Platform and AWS have also seen Kubernetes deployments; it is a standard offering. The project is also very active on GitHub and updates frequently. | Docker Swarm is relatively new, and its code frequency is not as heavy as that of Kubernetes. |
| Deployment environment | Kubernetes readily installs on virtually everything, from a bare Linux OS to Docker, Vagrant, cloud platforms, or Mesos. | The Docker Swarm manager runs on Linux; installation on anything else has to follow the manual installation steps. |
| Features | Kubernetes is feature-rich out of the box. | Most of the same capabilities can be achieved with Docker Swarm as well, but as of now they are not out-of-the-box features. |
docker daemon -H tcp://0.0.0.0:2375 -H unix:///var/run/docker.sock
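The daemon line above makes Docker listen on TCP port 2375 in addition to the local Unix socket, which is what lets a Swarm manager (or any remote client) drive the node. A client then targets it via DOCKER_HOST; the IP below is a placeholder, not from the original post:

```shell
# Point the docker CLI at a remote daemon (IP is hypothetical).
export DOCKER_HOST=tcp://192.168.56.121:2375
# docker info     # would now query the remote daemon, not the local socket
# unset DOCKER_HOST  # to go back to the local daemon
```

Note that plain TCP on 2375 is unauthenticated and unencrypted, so it should only be exposed on a trusted private network.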
13 Sunday Mar 2016
Posted in How-to

12 Friday Feb 2016
Posted in How-to

This is why I think this place is crowded!
09 Tuesday Feb 2016
Posted in How-to
Tags: analytics, aws, javascript, mobile, sdk
Here is a short video on using AWS Mobile Analytics.
Associated source code can be found at https://github.com/santanu-dey/aws-mobile-analytics.git
05 Monday Oct 2015
Posted in How-to

I tried AWS CloudFormation to automate AWS deployments. Functionally it is similar to the Docker/Kubernetes combination: it launches and maintains a host of computing resources. The thing is that AWS is very slow to launch and terminate these resources, but the concept works very well. Check it out below:
Note: If the video does not work, you can view it directly on YouTube: https://www.youtube.com/watch?v=Cs_0r04ajb8
The slides are here:
11 Friday Sep 2015
I think Docker simplifies big data DevOps concerns by a factor of 10x or more. It is easy enough for me to run a single command and bring to life any specific distribution of Hadoop in Docker containers. To give a flavor of it, I thought of writing this blog entry. In part 1 of this blog I set up a Linux-container-based environment; in this entry I am posting a Docker-based environment setup.
Step 1: Install Docker
Step 2: Install Kubernetes with Kubectl
In my case I do not want to mess with my laptop, so I use a CentOS 6.6 VM on my MacBook Pro. That adds one extra step to start up the VM, but it keeps my host laptop free of installations and configuration.
Once both step #1 and step #2 are working for you, here is how to launch a Hadoop instance.
Step 3: Create a pod definition for Kubernetes. Pick any available Hadoop image from Docker Hub.
[dockeruser@centos6 docker-for-hadoop]$ vi hbase-single-node-pod.yaml

apiVersion: v1
kind: Pod
metadata:
  name: hbase-single-node-pod
  labels:
    name: hbase-single-node-pod
spec:
  containers:
    - name: hbase
      image: 'santanu77/hadoop-docker'
      ports:
        - containerPort: 60000
          hostPort: 60000
        - containerPort: 60010
          hostPort: 60010
        - containerPort: 8088
          hostPort: 8088
[dockeruser@centos6 docker-for-hadoop]$ kubectl create -f hbase-single-node-pod.yaml
pods/hbase-single-node-pod
[dockeruser@centos6 docker-for-hadoop]$ kubectl describe pod hbase-single-node-pod
Name:               hbase-single-node-pod
Namespace:          default
Image(s):           santanu77/hadoop-docker
Node:               127.0.0.1/127.0.0.1
Labels:             name=hbase-single-node-pod
Status:             Running
Reason:
Message:
IP:                 172.17.0.1
Replication Controllers: <none>
Containers:
  hbase:
    Image:          santanu77/hadoop-docker
    State:          Running
      Started:      Thu, 10 Sep 2015 23:55:16 -0400
    Ready:          True
    Restart Count:  0
Conditions:
  Type      Status
  Ready     True
No events.
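To script against the pod (for example, to poll its services), the pod IP can be scraped from the describe output. A sketch, run here against a saved copy of the relevant lines from the output above:

```shell
# Extract the pod IP from `kubectl describe pod`-style output;
# a saved copy stands in for the live command here.
cat > /tmp/describe.txt <<'EOF'
Name: hbase-single-node-pod
Namespace: default
IP: 172.17.0.1
Status: Running
EOF

pod_ip=$(awk '$1 == "IP:" {print $2}' /tmp/describe.txt)
echo "$pod_ip"   # 172.17.0.1
```

With the IP in hand, the in-cluster services (e.g. HBase on 60000/60010) can be reached directly from the VM.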
I can hit the Hadoop cluster manager service from my host as well, since port 8088 was mapped to the host's port. So I can access it using my VM's static IP and port 8088.
17 Sunday May 2015
Posted in How-to

OAuth is very relevant for Internet and data-sharing use cases, and it has become a base standard for various consumer-centric services. Here is a quick primer on the concept, along with a quick demo, without going into the low-level details of the spec. The slides:
The video: if the embedded video is not playing, please use this link: http://youtu.be/H0P6rXQCoSU