Operational Challenges of Data Science
05 Friday Jul 2019
Posted How-to
in05 Friday Jul 2019
Posted How-to
in02 Wednesday Jan 2019
Posted Viewpoint
inTags
01 Saturday Dec 2018
Posted Viewpoint
inTags
When dealing with Predictive Maintenance of machines and systems, some capabilities are essential in the underlying data platform to effectively carry out maintenance Just-In-Time before the actual failure happens.
Let us look at the most key capabilities more closely.
Imminent failure signals can be detected within a few minutes of appearing, if real-time data is made available to the predictive analytics system. Not taking advantage of available signals as soon as they occur results in inability to take corrective actions within a reasonable time and leads to operational outages.
Also, real-time situational awareness of the overall operational aspects can only be derived from real-time ingestion of data. Real-time visualisation can be only possible with real-time data points available in the system.
Real-time ingestion and processing of telemetry data from sensor might technically become very challenging soon
Now add time-series to this mix.
Usually time series data has the following characteristics:
General purpose NoSQL databases such as HBase and traditional RDBMS databases such as MySQL are not well equipped to handle time-series data mainly due to the following reasons:
The Data Platform for Predictive Maintenance use cases should be equipped with a time-series database that supports compression algorithms built-in, more data can be efficiently made available for computation workloads.
There are various reasons why serverless is important in this use case.
17 Sunday Sep 2017
Posted How-to
inOften businesses on their Analytics journey need to decide on the technologies, timeframe, scale, budget, team structure etc. to be successful. In order to take a holistic approach it is critical to discover the current situation at first. To take stock of the organization’s analytics requirements, capabilities, priorities and so on, some essential questions need to be discussed in a structured manner by the relevant Business units, stakeholders and even may include external consultants.
In my experience, the best way to ensure that all relevant points are covered, a standard “Analytics Platform Assessment Questionnaire” is a good tool to that can get you started. It covers questions from strategy point of view, project level details and data perspectives as well.
Here is the download link: Analytics Platform Assessment Questionnaire.
Please share your email by submitting the contact form below, (I will not sell your emails or spam you, this is just for my own download tracking purposes)
03 Saturday Sep 2016
Posted How-to
inIn this post I am going to share my experience with
export MASTER_IP=192.168.56.121 # is needed by all nodes export K8S_VERSION=v1.4.0-alpha.1# get the latest from https://storage.googleapis.com/kubernetes-release/release/latest.txt or /stable.txt export ETCD_VERSION=2.2.5 # get the latest from https://gcr.io/v2/google_containers/etcd-amd64/tags/list export FLANNEL_VERSION=0.5.5 # get the latest from https://quay.io/repository/coreos/flannel?tag=latest&tab=tags export FLANNEL_IFACE=enp0s8 # name of the interface that would connect the nodes export FLANNEL_IPMASQ=true
An easy way to create a script to start the bootstrap version is
case $1 in start) sudo sh -c 'docker daemon -H unix:///var/run/docker-bootstrap.sock -p /var/run/docker-bootstrap.pid --iptables=false --ip-masq=false --mtu=1500 --bridge=none --exec-root=/var/run/docker-bootstrap --graph=/var/lib/docker-bootstrap 2> /var/log/docker-bootstrap.log 1> /dev/null &' esac
Note: In my case I had to add the MTU option to get past this issue https://github.com/docker/docker/issues/15498 . Otherwise MTU should be optional.
Then I also wrote a script to start and stop all services. Below is the one for kubernetes master node
## Checking commandline arguments while test $# -gt 0; do case "$1" in -h|--help) echo Usage: echo "start-k8s.sh [proxy true]" exit 0 ;; -p|--proxy) shift if test $# -gt 0; then export PROXY=$1 else echo "invalid argument for proxy" exit 1 fi shift ;; *) break ;; esac done #stop any running instance /opt/docker-bootstrap/stop-k8s.sh ## Setting up env var ############################################# export MASTER_IP=192.168.56.121 # get from https://storage.googleapis.com/kubernetes-release/release/latest.txt or /stable.txt export K8S_VERSION=v1.4.0-alpha.1 # get from https://gcr.io/v2/google_containers/etcd-amd64/tags/list export ETCD_VERSION=2.2.5 # get from https://quay.io/repository/coreos/flannel?tag=latest&tab=tags export FLANNEL_VERSION=0.5.5 # the interface that would connect all hosts export FLANNEL_IFACE=enp0s8 export FLANNEL_IPMASQ=true ## starting docker boot-strap /opt/docker-bootstrap/docker-boostrap start echo "waiting for docker-bootstrap to start" sleep 5 ## starting up docker #sudo systemctl start docker ## start etcd sudo docker -H unix:///var/run/docker-bootstrap.sock run -d \ --net=host \ gcr.io/google_containers/etcd-amd64:${ETCD_VERSION} \ /usr/local/bin/etcd \ --listen-client-urls=http://127.0.0.1:4001,http://${MASTER_IP}:4001 \ --advertise-client-urls=http://${MASTER_IP}:4001 \ --data-dir=/var/etcd/data echo "waiting for etc-d to start" sleep 25 ## Save a network config sudo docker -H unix:///var/run/docker-bootstrap.sock run \ --net=host \ gcr.io/google_containers/etcd-amd64:${ETCD_VERSION} \ etcdctl set /coreos.com/network/config '{ "Network": "10.1.0.0/16" }' echo "waiting for network config to save" sleep 5 ## Run Flannel flannel_image_id=$(sudo docker -H unix:///var/run/docker-bootstrap.sock run -d \ --net=host \ --privileged \ -v /dev/net:/dev/net \ quay.io/coreos/flannel:${FLANNEL_VERSION} \ /opt/bin/flanneld \ --ip-masq=${FLANNEL_IPMASQ} \ --etcd-endpoints=http://${MASTER_IP}:4001 \ --iface=${FLANNEL_IFACE}) echo "waiting for Flannel to pick up config" sleep 5 echo Flannel config is SET_VARIABLES=$(sudo docker -H unix:///var/run/docker-bootstrap.sock exec $flannel_image_id cat /run/flannel/subnet.env) eval $SET_VARIABLES sudo bash -c "echo [Service] > /etc/systemd/system/docker.service.d/docker.conf" if [ "$PROXY" == "true" ] then sudo bash -c "echo Environment=HTTP_PROXY=http://203.127.104.198:8080/ NO_PROXY=localhost,127.0.0.1,192.168.0.0/16,10.0.0.0/16 FLANNEL_NETWORK=$FLANNE L_NETWORK FLANNEL_SUBNET=$FLANNEL_SUBNET FLANNEL_MTU=$FLANNEL_MTU >>/etc/systemd/system/docker.service.d/docker.conf" else sudo bash -c "echo Environment=FLANNEL_NETWORK=$FLANNEL_NETWORK FLANNEL_SUBNET=$FLANNEL_SUBNET FLANNEL_MTU=$FLANNEL_MTU >>/etc/systemd/system/docker.s ervice.d/docker.conf" fi echo FLANNEL_NETWORK=$FLANNEL_NETWORK FLANNEL_SUBNET=$FLANNEL_SUBNET FLANNEL_MTU=$FLANNEL_MTU ## Delete docker networking sudo /sbin/ifconfig docker0 down sudo brctl delbr docker0 ## Start docker service sudo systemctl daemon-reload sudo systemctl start docker sudo systemctl status docker -l ## Start kubernetes master sudo docker run \ --volume=/:/rootfs:ro \ --volume=/sys:/sys:ro \ --volume=/var/lib/docker/:/var/lib/docker:rw \ --volume=/var/lib/kubelet:/var/lib/kubelet:rw,rslave \ --volume=/var/run:/var/run:rw \ --net=host \ --privileged=true \ --pid=host \ -d \ gcr.io/google_containers/hyperkube-amd64:${K8S_VERSION} \ sudo bash -c "echo Environment=FLANNEL_NETWORK=$FLANNEL_NETWORK FLANNEL_SUBNET=$FLANNEL_SUBNET FLANNEL_MTU=$FLANNEL_MTU >>/etc/systemd/system/docker.s ervice.d/docker.conf" fi echo FLANNEL_NETWORK=$FLANNEL_NETWORK FLANNEL_SUBNET=$FLANNEL_SUBNET FLANNEL_MTU=$FLANNEL_MTU ## Delete docker networking sudo /sbin/ifconfig docker0 down sudo brctl delbr docker0 ## Start docker service sudo systemctl daemon-reload sudo systemctl start docker sudo systemctl status docker -l ## Start kubernetes master sudo docker run \ --volume=/:/rootfs:ro \ --volume=/sys:/sys:ro \ --volume=/var/lib/docker/:/var/lib/docker:rw \ --volume=/var/lib/kubelet:/var/lib/kubelet:rw,rslave \ --volume=/var/run:/var/run:rw \ --net=host \ --privileged=true \ --pid=host \ -d \ gcr.io/google_containers/hyperkube-amd64:${K8S_VERSION} \ /hyperkube kubelet \ --allow-privileged=true \ --api-servers=http://localhost:8080 \ --v=2 \ --address=0.0.0.0 \ --enable-server \ --hostname-override=127.0.0.1 \ --config=/etc/kubernetes/manifests-multi \ --containerized \ --cluster-dns=10.0.0.10 \ --cluster-domain=cluster.local ## Sleep 10 echo get all pods sleep 10 kubectl create -f dashboard-service.yaml --namespace=kube-system kubectl get pod --all-namespaces
Note: All the source code is available at https://github.com/santanu-dey/kubernetes-cluster
Similar scripts are available for starting and stopping the kubernetes and related services on the worker nodes. Checkout the git hub repo. Once the master VM is ready, it can be cloned to create the worker VMs.
Start up the services
Once the services are started up then the spark services can be started up like below:
Master Node
# ./start-k8s.sh # kubectl get node NAME STATUS AGE 127.0.0.1 Ready 1m 
Similarly when the worker nodes are up they would show up on the list of node
# kubectl get node NAME STATUS AGE 127.0.0.1 Ready 1h kubernetes2 Ready 31m kubernetes3 Ready 19m
And also the kubernetes cluster would show up as below
# kubectl get svc --all-namespaces -o yaml NAMESPACE NAME READY STATUS RESTARTS AGE kube-system k8s-master-127.0.0.1 4/4 Running 1 4m kube-system k8s-proxy-127.0.0.1 1/1 Running 0 4m kube-system kube-addon-manager-127.0.0.1 2/2 Running 0 4m kube-system kube-dns-v18-7tvnm 3/3 Running 0 4m kube-system kubernetes-dashboard-v1.1.0-q30lc 1/1 Running 0 4m
And then the kubernetes cluster is ready for running any container workload. I am using the Spark for this example. The script and yaml files to start the spark cluster are also available in the same github repo https://github.com/santanu-dey/kubernetes-cluster
Putting it all together :
23 Saturday Jul 2016
Considerations
|
Kubernetes
|
Docker Swarm
|
Adoption and Maturity
|
Kubernetes is much ahead with adoption from major companies like RedHat for OpenShit, Rackspace for Solum.
Google cloud platform and AWS also has seen Kubernetes deployments. It is a standard offering.
The product is also quite active in git hub and has been updating frequently.
|
Docker swarm is relatively new.
Also the code frequency is not as massive as kubernetes.
|
Deployment Environment
|
Kubernetes readily installs on virtually everything starting from bare Linux OS to Docker or Vagrant or Cloud or Mesos.
|
Docker swarm manager can run on linux.
Installation on anything else will have to be done following the installation steps.
|
Features
|
Kubernetes is feature reach, for now:
|
All of these can be achieved in docker swarm as well. However, as of now these are not straight out of the box features in Docker Swarm.
|
docker daemon -H tcp://0.0.0.0:2375 -H unix:///var/run/docker.sock
13 Sunday Mar 2016
Posted How-to
in12 Friday Feb 2016
Posted How-to
inThis is why I think this place is crowded !!
09 Tuesday Feb 2016
Posted How-to
inTags
analytics, aws, javascript, mobile, sdk
Here is a short video on using AWS Mobile Analytics.
Associated source code can be found at https://github.com/santanu-dey/aws-mobile-analytics.git
05 Monday Oct 2015
Posted How-to
inI tried to use AWS CloudFormation to automate AWS deployment. This is similar to Docker / Kubernetes combination, functionally, to launch and maintain a host of computing resources. However the things is that AWS is very slow to launch and terminate these resources. But the concept works very well. Check out below:
Note: If the video does not work, you can view it directly at youtube https://www.youtube.com/watch?v=Cs_0r04ajb8
The slides are here: