
Big is Small

~ APIs, ML Engineering at scale, and Cloud is making it all small, connected and intelligent.


Tag Archives: container

Data Management Capabilities Needed for Real-time Predictive Maintenance Use Cases

01 Saturday Dec 2018

Posted by santanu77 in Viewpoint


Tags

AI, Big Data, container, kubernetes, ML, Serverless

Capabilities Summary

When dealing with predictive maintenance of machines and systems, certain capabilities are essential in the underlying data platform to carry out maintenance just in time, before the actual failure happens.

  • Near real-time ingestion of millions of data points per second
  • Ability to apply ML models to the ingested data in real-time to predict machine failure
  • Efficient handling of time-series data, both historical and real-time
  • API- and microservices-based integration with upstream and downstream systems, to support the event-driven nature of the use case
  • Alerts and visibility integrated with downstream systems for end-to-end automation

Let us look at the key capabilities more closely.

Why Real-Time?

Imminent failure signals can be detected within a few minutes of appearing, if real-time data is made available to the predictive analytics system. Failing to act on these signals as soon as they occur makes it impossible to take corrective action within a reasonable time, and leads to operational outages.

Also, real-time situational awareness of the overall operation can only be derived from real-time ingestion of data, and real-time visualisation is only possible with real-time data points available in the system.

Real-time ingestion and processing of telemetry data from sensors can quickly become technically challenging:

  • Combining and correlating multiple event streams in real-time
  • Combining fresh real-time data with large volumes of historical data for trend analysis
  • Combining fresh real-time data with static reference data for additional context
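
The last point can be illustrated with a plain key join (the sensor ids and file paths below are made up): fresh readings are enriched with a static reference table keyed on sensor id.

```shell
# Made-up sample data: fresh readings and a static reference table, keyed by sensor id
cat > /tmp/readings.csv <<'EOF'
pump-17,0.82
fan-03,0.12
EOF
cat > /tmp/reference.csv <<'EOF'
fan-03,hall-B
pump-17,hall-A
EOF

# join requires both inputs sorted on the join key
sort /tmp/readings.csv > /tmp/readings.sorted
sort /tmp/reference.csv > /tmp/reference.sorted
join -t, /tmp/readings.sorted /tmp/reference.sorted
# prints:
# fan-03,0.12,hall-B
# pump-17,0.82,hall-A
```

A streaming platform does the same enrichment continuously, per event, rather than in a batch over files.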

Now add time-series to this mix.

Why Time Series Database?

Usually time-series data has the following characteristics:

  • Records may contain hundreds or even thousands of attributes
  • Records are generated in time order, at intervals that can be either uniform or irregular
  • The data is generally immutable, e.g. sensor data points, once recorded, remain unaltered; new data points are generated for each new time interval
  • The raw data grows quickly and linearly over time; however, the insights needed from the data are based on various time-aggregation functions, such as:
    • Min/max/averages/moving averages/standard deviations etc. over various time windows
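
As a toy illustration of such window aggregations (the sample readings below are invented), a moving average over time-ordered data needs only the last N values at any moment:

```shell
# Made-up time-ordered samples: "timestamp,value"
cat > /tmp/sensor.csv <<'EOF'
1,10
2,12
3,11
4,15
5,13
EOF

# 3-point moving average over the value column, emitted once the window is full
awk -F, '{
  buf[NR % 3] = $2                       # ring buffer of the last 3 values
  if (NR >= 3)
    printf "%s,%.2f\n", $1, (buf[0] + buf[1] + buf[2]) / 3
}' /tmp/sensor.csv
# prints:
# 3,11.00
# 4,12.67
# 5,13.00
```

A time-series database runs exactly this kind of computation over billions of rows, which is why window-aware storage and retrieval matter so much.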

General-purpose NoSQL databases such as HBase and traditional RDBMS databases such as MySQL are not well equipped to handle time-series data, mainly for the following reasons:

  • High IOPS: Time-series data requires a very high write speed (IOPS). The usual transactional databases are overwhelmed by millions of records per second, because they are designed around transactional consistency rather than raw write throughput.
  • Rolling Time Window: Time-series prediction algorithms operate on rolling windows, where a window of consecutive observations is used to predict the future samples. This fixed-length window moves from the beginning of the data to the end. Traditional databases do not support retrieval of data by rolling time windows. Even in batch operations, when the rolling window straddles two files, data from both is required, which makes it hard to process the data in a distributed, and hence timely, manner.
  • Data Compression: Time-series data grows quickly and linearly, and disk space concerns limit the:
    • Granularity of data that can be stored for historical analysis and ML training
    • Amount of historical data that can be stored and made available for ML training

The data platform for predictive maintenance use cases should therefore be equipped with a time-series database with built-in compression algorithms, so that more data can be efficiently made available for computation workloads.
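
To see why built-in compression pays off, consider delta encoding, a technique commonly used by time-series storage engines: timestamps arriving at a near-regular interval reduce to a stream of small, repetitive differences that compress far better than the raw values. A toy sketch with invented timestamps:

```shell
# Raw timestamps at a (roughly) 10-unit interval reduce to tiny, repetitive deltas
printf '%s\n' 1000 1010 1020 1030 1041 |
  awk 'NR > 1 { print $1 - prev } { prev = $1 }'
# prints the deltas: 10 10 10 11
```

Small repeated values like these are ideal input for further run-length or bit-packing compression, which is how time-series engines keep fine-grained history affordable.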

Why Serverless?

There are various reasons why serverless is important in this use case.

  • Decoupling ML models from proprietary platforms and environment dependencies: Following microservice principles, ML models should be exposed as RESTful APIs. This decouples a model from the underlying platform, so that it can be ported easily or even invoked remotely from other apps.
  • Functions as the unit for ML models: An ML model has two distinct phases in its lifecycle. First, the model is trained, tested and developed. Second, the model is deployed to evaluate fresh data points. This evaluation phase is well suited to deployment as functions.
    In a serverless architecture, functions act as the unit of functionality and scale, which makes it a scalable architecture for deploying ML models during the evaluation phase of their lifecycle, as stated above. Each instance of a model can be thought of as an independent function that can be versioned, deployed, invoked, updated or even deleted at any time without compromising the rest of the system.
  • Event-driven: Serverless functions are triggered by events. In scenarios such as sensor data analytics from machinery, the events occur in real-time, and the ML models should be triggered as the events occur for the best possible results. The ML models should be housed in serverless containers; serverless functions can typically be triggered by REST API calls, MQTT messages, file drops, schedules and so on.
  • Auto-scalability: No run-time management or administration is required for the ML model functions deployed as containers. Everything is taken care of by the underlying container management platform, such as Kubernetes, which manages availability, auto-scaling, monitoring, logging and security for the containers. In this way the ML functions can be scaled and managed easily.
  • Support for any language / polyglot architecture: Most common frameworks and languages capable of binding web services, language APIs or Spark data provider interfaces are supported within serverless functions; Go, Python, Java, NodeJS, .NET and shell scripts are the most common. So AI and ML frameworks that use Python packages, R/CRAN, TensorFlow, etc. can all be deployed within a serverless environment, based on the choice of the developer.
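
To make the evaluation-phase idea concrete, here is a deliberately trivial stand-in for a model-evaluation function (the function name and the 0.75 vibration threshold are invented for illustration); in a real deployment the body would load and score the trained model, packaged in a serverless container:

```shell
# Toy stand-in for a deployed model-evaluation function
predict_failure() {
  # $1: vibration reading; the 0.75 threshold is an assumed toy parameter,
  # standing in for a real trained model
  awk -v v="$1" 'BEGIN { s = (v > 0.75) ? "FAIL_LIKELY" : "OK"; print s }'
}

predict_failure 0.82   # prints FAIL_LIKELY
predict_failure 0.40   # prints OK
```

The point is the shape, not the model: the function is stateless, takes an event payload as input, and returns a verdict, which is exactly the contract a serverless platform expects.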

Spark Cluster using Multi-Node Kubernetes and Docker

03 Saturday Sep 2016

Posted by santanu77 in How-to


Tags

Big Data, cluster, container, docker, How-to, kubernetes

In this post I am going to share my experience with:

  • setting up a Kubernetes multi-node cluster on Docker
  • then running a Spark cluster on Kubernetes
Installation
My installation had 3 nodes.
I used VirtualBox and CentOS 7 to create the master node first, and then cloned it to create the worker nodes.
  • Kubernetes Master & 1st worker node – 192.168.56.121
  • Kubernetes 2nd worker node – 192.168.56.122
  • Kubernetes 3rd worker node – 192.168.56.123


I used the following software versions and installation steps on all nodes.
  • The environment variables below also reflect the software versions
export MASTER_IP=192.168.56.121 # is needed by all nodes
export K8S_VERSION=v1.4.0-alpha.1 # get the latest from https://storage.googleapis.com/kubernetes-release/release/latest.txt or /stable.txt
export ETCD_VERSION=2.2.5 # get the latest from https://gcr.io/v2/google_containers/etcd-amd64/tags/list
export FLANNEL_VERSION=0.5.5 # get the latest from https://quay.io/repository/coreos/flannel?tag=latest&tab=tags
export FLANNEL_IFACE=enp0s8 # name of the interface that would connect the nodes
export FLANNEL_IPMASQ=true
  • Installation steps to follow are here in Kubernetes official site http://kubernetes.io/docs/getting-started-guides/docker-multinode/
  • For the time being, allow all connections through iptables and delete all rules, but keep the service running
  • Install Docker – I used version 1.11.2
  • Optionally set up host names in the /etc/hosts file – if you are connecting via hostnames – and also set the hostname of each VM
You will note that, in the Docker-based installation of Kubernetes, two Docker instances run on every node:
  • The 1st instance – let's call it the bootstrap Docker instance – is required for the Kubernetes components themselves
  • The 2nd instance is the regular installation of Docker, required for running the managed containers on the VM nodes

An easy way to create a script to start the bootstrap instance is:

case $1 in
 start)
 sudo sh -c 'docker daemon -H unix:///var/run/docker-bootstrap.sock -p /var/run/docker-bootstrap.pid --iptables=false --ip-masq=false --mtu=1500 --bridge=none --exec-root=/var/run/docker-bootstrap --graph=/var/lib/docker-bootstrap 2> /var/log/docker-bootstrap.log 1> /dev/null &'
 ;;
esac

Note: In my case I had to add the MTU option to get past this issue: https://github.com/docker/docker/issues/15498. Otherwise the MTU option should be optional.

Then I also wrote a script to start and stop all services. Below is the one for the Kubernetes master node:

## Check command-line arguments
while test $# -gt 0; do
  case "$1" in
    -h|--help)
      echo Usage:
      echo "start-k8s.sh [--proxy true]"
      exit 0
      ;;
    -p|--proxy)
      shift
      if test $# -gt 0; then
        export PROXY=$1
      else
        echo "invalid argument for proxy"
        exit 1
      fi
      shift
      ;;
    *)
      break
      ;;
  esac
done

#stop any running instance
/opt/docker-bootstrap/stop-k8s.sh

## Setting up env var
#############################################
export MASTER_IP=192.168.56.121

# get from https://storage.googleapis.com/kubernetes-release/release/latest.txt or /stable.txt
export K8S_VERSION=v1.4.0-alpha.1

# get from https://gcr.io/v2/google_containers/etcd-amd64/tags/list
export ETCD_VERSION=2.2.5

# get from https://quay.io/repository/coreos/flannel?tag=latest&tab=tags
export FLANNEL_VERSION=0.5.5

# the interface that would connect all hosts
export FLANNEL_IFACE=enp0s8
export FLANNEL_IPMASQ=true

## starting docker boot-strap
/opt/docker-bootstrap/docker-boostrap start

echo "waiting for docker-bootstrap to start"
sleep 5

## starting up docker
#sudo systemctl start docker

## start etcd
sudo docker -H unix:///var/run/docker-bootstrap.sock run -d \
--net=host \
gcr.io/google_containers/etcd-amd64:${ETCD_VERSION} \
/usr/local/bin/etcd \
--listen-client-urls=http://127.0.0.1:4001,http://${MASTER_IP}:4001 \
--advertise-client-urls=http://${MASTER_IP}:4001 \
--data-dir=/var/etcd/data
echo "waiting for etc-d to start"
sleep 25

## Save a network config
sudo docker -H unix:///var/run/docker-bootstrap.sock run \
--net=host \
gcr.io/google_containers/etcd-amd64:${ETCD_VERSION} \
etcdctl set /coreos.com/network/config '{ "Network": "10.1.0.0/16" }'

echo "waiting for network config to save"
sleep 5

## Run Flannel
flannel_image_id=$(sudo docker -H unix:///var/run/docker-bootstrap.sock run -d \
--net=host \
--privileged \
-v /dev/net:/dev/net \
quay.io/coreos/flannel:${FLANNEL_VERSION} \
/opt/bin/flanneld \
--ip-masq=${FLANNEL_IPMASQ} \
--etcd-endpoints=http://${MASTER_IP}:4001 \
--iface=${FLANNEL_IFACE})

echo "waiting for Flannel to pick up config"
sleep 5

echo Flannel config is
SET_VARIABLES=$(sudo docker -H unix:///var/run/docker-bootstrap.sock exec $flannel_image_id cat /run/flannel/subnet.env)
eval $SET_VARIABLES
sudo bash -c "echo [Service] > /etc/systemd/system/docker.service.d/docker.conf"

if [ "$PROXY" == "true" ]
then
sudo bash -c "echo Environment=HTTP_PROXY=http://203.127.104.198:8080/ NO_PROXY=localhost,127.0.0.1,192.168.0.0/16,10.0.0.0/16 FLANNEL_NETWORK=$FLANNEL_NETWORK FLANNEL_SUBNET=$FLANNEL_SUBNET FLANNEL_MTU=$FLANNEL_MTU >>/etc/systemd/system/docker.service.d/docker.conf"
else
sudo bash -c "echo Environment=FLANNEL_NETWORK=$FLANNEL_NETWORK FLANNEL_SUBNET=$FLANNEL_SUBNET FLANNEL_MTU=$FLANNEL_MTU >>/etc/systemd/system/docker.service.d/docker.conf"
fi

echo FLANNEL_NETWORK=$FLANNEL_NETWORK FLANNEL_SUBNET=$FLANNEL_SUBNET FLANNEL_MTU=$FLANNEL_MTU

## Delete docker networking
sudo /sbin/ifconfig docker0 down
sudo brctl delbr docker0

## Start docker service
sudo systemctl daemon-reload
sudo systemctl start docker
sudo systemctl status docker -l

## Start kubernetes master
sudo docker run \
--volume=/:/rootfs:ro \
--volume=/sys:/sys:ro \
--volume=/var/lib/docker/:/var/lib/docker:rw \
--volume=/var/lib/kubelet:/var/lib/kubelet:rw,rslave \
--volume=/var/run:/var/run:rw \
--net=host \
--privileged=true \
--pid=host \
-d \
gcr.io/google_containers/hyperkube-amd64:${K8S_VERSION} \
/hyperkube kubelet \
--allow-privileged=true \
--api-servers=http://localhost:8080 \
--v=2 \
--address=0.0.0.0 \
--enable-server \
--hostname-override=127.0.0.1 \
--config=/etc/kubernetes/manifests-multi \
--containerized \
--cluster-dns=10.0.0.10 \
--cluster-domain=cluster.local

## wait a few seconds for the kubelet to come up
sleep 10
echo get all pods
kubectl create -f dashboard-service.yaml --namespace=kube-system
kubectl get pod --all-namespaces
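
As an aside, the option-parsing loop at the top of the script can be exercised in isolation; below is the same logic wrapped in a function (the function name is invented) so its effect on PROXY can be checked directly:

```shell
# Same parsing logic as in start-k8s.sh, wrapped in a function for a quick check
parse_proxy_arg() {
  PROXY=""
  while test $# -gt 0; do
    case "$1" in
      -p|--proxy)
        shift
        if test $# -gt 0; then
          PROXY=$1
        else
          echo "invalid argument for proxy" >&2
          return 1
        fi
        shift
        ;;
      *)
        break
        ;;
    esac
  done
  echo "$PROXY"
}

parse_proxy_arg --proxy true   # prints: true
```

Without the proxy flag the function prints an empty string, matching the script's behaviour of leaving PROXY unset.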

Note: All the source code is available at https://github.com/santanu-dey/kubernetes-cluster

Similar scripts are available for starting and stopping Kubernetes and related services on the worker nodes. Check out the GitHub repo. Once the master VM is ready, it can be cloned to create the worker VMs.

Start up the services

Once the services are started on each node, the cluster can be verified as below:

Master Node

# ./start-k8s.sh
# kubectl get node
NAME STATUS AGE
127.0.0.1 Ready 1m 

Similarly, when the worker nodes are up, they show up in the list of nodes:
# kubectl get node
NAME STATUS AGE
127.0.0.1 Ready 1h
kubernetes2 Ready 31m
kubernetes3 Ready 19m

And the Kubernetes system pods would show up as below:

# kubectl get pod --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system k8s-master-127.0.0.1 4/4 Running 1 4m
kube-system k8s-proxy-127.0.0.1 1/1 Running 0 4m
kube-system kube-addon-manager-127.0.0.1 2/2 Running 0 4m
kube-system kube-dns-v18-7tvnm 3/3 Running 0 4m
kube-system kubernetes-dashboard-v1.1.0-q30lc 1/1 Running 0 4m

And then the Kubernetes cluster is ready to run any container workload. I am using Spark for this example. The script and yaml files to start the Spark cluster are also available in the same GitHub repo: https://github.com/santanu-dey/kubernetes-cluster
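
For a rough idea of what those yaml files contain, a Spark master controller for this vintage of Kubernetes might look like the sketch below; the resource names, image and command here are guesses for illustration only, and the authoritative files are the ones in the repo above:

```yaml
# Hypothetical sketch only -- see the linked repo for the real files
apiVersion: v1
kind: ReplicationController
metadata:
  name: spark-master-controller
spec:
  replicas: 1
  selector:
    component: spark-master
  template:
    metadata:
      labels:
        component: spark-master
    spec:
      containers:
        - name: spark-master
          image: gcr.io/google_containers/spark:1.5.2_v1   # assumed image
          command: ["/start-master"]                       # assumed entrypoint
          ports:
            - containerPort: 7077
```

A matching service exposing port 7077 lets the worker controller find the master by name.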


