Predictive maintenance of machines and systems needs some essential capabilities in the underlying data platform, so that maintenance can be carried out just in time, before the actual failure happens:
Near real-time data ingestion of millions of data points per second
Ability to apply ML models for predicting machine failure on the ingested data in real-time
Efficient handling of time-series data, both historical and real-time
API- and microservices-based integration with downstream and upstream systems, to support the event-driven nature of the use case
Alerts and Visibility integrated with the downstream systems for end-to-end Automation
Let us look at the key capabilities more closely.
Why Real-Time?
Imminent failure signals can be detected within a few minutes of appearing if real-time data is made available to the predictive analytics system. Failing to act on available signals as soon as they occur leaves no reasonable time for corrective action and leads to operational outages.
Real-time situational awareness of overall operations can likewise only be derived from real-time ingestion of data: real-time visualisation is only possible when real-time data points are available in the system.
Real-time ingestion and processing of telemetry data from sensors can quickly become technically challenging. It may involve:
Combining and correlating multiple event streams in real time
Combining fresh real-time data with large volumes of historical data for trend analysis
Combining fresh real-time data with static reference data for additional context
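As a rough illustration of the last two points, the sketch below enriches one fresh reading with static reference data and a baseline computed from historical readings. All names and values here (MACHINE_REFERENCE, HISTORY, enrich, the pump identifiers and temperatures) are hypothetical and chosen only for the example; a real platform would do this join on streams rather than on in-memory dictionaries.

```python
from statistics import mean

# Hypothetical static reference data: per-machine context that rarely changes.
MACHINE_REFERENCE = {
    "pump-1": {"site": "plant-a", "max_temp_c": 80.0},
    "pump-2": {"site": "plant-b", "max_temp_c": 95.0},
}

# Hypothetical historical readings used to establish a baseline per machine.
HISTORY = {
    "pump-1": [61.0, 62.5, 63.0, 64.5],
    "pump-2": [70.0, 70.5, 69.5, 70.0],
}


def enrich(event):
    """Combine one fresh reading with reference data and a historical baseline."""
    machine_id = event["machine_id"]
    reference = MACHINE_REFERENCE[machine_id]
    baseline = mean(HISTORY[machine_id])
    return {
        **event,
        "site": reference["site"],
        "baseline_temp_c": baseline,
        "deviation_c": event["temp_c"] - baseline,
        "over_limit": event["temp_c"] > reference["max_temp_c"],
    }


fresh_event = {"machine_id": "pump-1", "temp_c": 82.0}
enriched = enrich(fresh_event)
print(enriched)
```

The enriched record carries both the trend context (deviation from the historical baseline) and the reference context (site and temperature limit), which is the kind of combined view a downstream model or alert would need.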
Now add time-series to this mix.
Why Time Series Database?
Usually time series data has the following characteristics:
The data may have records containing hundreds or even thousands of attributes
Data records are generated in time order, where time-intervals can be either uniform or irregular
The data is generally immutable: for example, a sensor data point, once recorded, remains unaltered. New data points are generated for each new time interval.
The raw data grows quickly and linearly over time. However, the insights needed from the data are based on various time aggregation functions, such as:
Min/Max/Averages/Moving Averages/Standard Deviations etc. over various time windows
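A minimal sketch of such aggregations over a fixed-length rolling window is shown below, using only the Python standard library. The function name rolling_aggregates, the sample readings and the window size are assumptions made for the example; a time-series database would typically expose these aggregations natively instead of computing them in application code.

```python
from collections import deque
from statistics import mean, pstdev


def rolling_aggregates(values, window_size):
    """Yield min, max, mean and standard deviation for each full window."""
    window = deque(maxlen=window_size)  # oldest value drops out automatically
    for value in values:
        window.append(value)
        if len(window) == window_size:
            snapshot = list(window)
            yield {
                "min": min(snapshot),
                "max": max(snapshot),
                "mean": mean(snapshot),
                "stdev": pstdev(snapshot),  # population standard deviation
            }


# Hypothetical sensor readings, one per time interval.
readings = [10.0, 12.0, 11.0, 15.0, 14.0]
results = list(rolling_aggregates(readings, window_size=3))
for row in results:
    print(row)
```

Each emitted row summarises the most recent three readings, so the "mean" column is a moving average and the window slides forward by one reading at a time.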
General-purpose NoSQL databases such as HBase and traditional relational databases such as MySQL are not well equipped to handle time-series data, mainly for the following reasons:
High IOPS: Time-series data requires very high write throughput (IOPS). The usual transactional databases, which are concerned with consistency, are overwhelmed by millions of records per second.
Rolling Time Window: Time-series prediction algorithms operate on rolling windows, where a window of consecutive observations is used to predict future samples. This fixed-length window moves from the beginning of the data to the end. Traditional databases do not support retrieval of data by rolling time windows. Even in batch operations, when the rolling window straddles two files, data from both is required, which makes it hard to process the data in a distributed, and hence timely, manner.
Data Compression: Time-series data grows quickly and linearly and disk space concerns limit the:
Granularity of data that can be stored for historical analysis and ML training
Amount of historical data that can be stored and made available for ML training
The data platform for predictive maintenance use cases should therefore be equipped with a time-series database that has built-in compression, so that more data can be made available efficiently for computation workloads.
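To give a feel for why time-series data compresses well, here is a small sketch of delta encoding, one simple technique in which only the difference to the previous value is stored. This is an illustration under assumed sample timestamps, not a description of how any particular time-series database implements compression; the function names are hypothetical.

```python
def delta_encode(values):
    """Store the first value, then only the difference to the previous one."""
    if not values:
        return []
    encoded = [values[0]]
    for previous, current in zip(values, values[1:]):
        encoded.append(current - previous)
    return encoded


def delta_decode(encoded):
    """Rebuild the original values by accumulating the differences."""
    values = []
    total = 0
    for index, item in enumerate(encoded):
        total = item if index == 0 else total + item
        values.append(total)
    return values


# Hypothetical epoch timestamps arriving roughly every ten seconds.
timestamps = [1700000000, 1700000010, 1700000020, 1700000030, 1700000041]
encoded = delta_encode(timestamps)
print(encoded)
print(delta_decode(encoded) == timestamps)
```

Because readings arrive at near-regular intervals, the encoded series consists mostly of small, repetitive numbers, which need far fewer bits than the full timestamps and compress further with general-purpose algorithms.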
Why Serverless?
There are various reasons why serverless is important in this use case.
Decoupling ML models from proprietary platforms and environment dependencies: Following microservice principles, ML models should be exposed as a RESTful API. This decouples the ML models from the underlying platform, so that they can be ported easily or even used remotely from other apps.
Function as the unit for ML models: An ML model has two distinct phases in its lifecycle. In the first, the model is developed, trained and tested. In the second, the model is deployed to evaluate fresh data points. This evaluation phase is well suited to deployment as functions.
In a serverless architecture, functions act as the unit of functionality and scale, which makes it a scalable way to deploy ML models during the evaluation phase described above. Each instance of a model can be thought of as an independent function that can be versioned, deployed, invoked, updated or even deleted at any time without compromising the rest of the system.
Event-driven: Serverless functions are triggered by events. In scenarios such as sensor data analytics for machinery, the events occur in real time, and the ML models should be triggered as the events occur for the best possible results. The ML models should be housed in the serverless container; serverless functions can usually be triggered by a REST API call, an MQTT message, a file drop, a schedule and so on.
Auto-scalability: No run-time management or administration is required for ML model functions that are deployed as containers. Everything is taken care of by the underlying container management platform, such as Kubernetes. For example, Kubernetes manages the availability, automatic scaling, monitoring, logging and security aspects of the containers. In this way the ML functions can be scaled and managed easily.
Support for any language (polyglot architecture): Most common frameworks and languages capable of binding to web services, various language APIs or Spark data provider interfaces are usually supported within serverless functions. Go, Python, Java, NodeJS, .NET and shell scripts are the most common. So AI and ML frameworks that use Python packages, R/CRAN, TensorFlow and so on can all be deployed within a serverless environment, based on the choice of the developer.
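The "function as unit" and event-driven points above can be sketched in a few lines of Python. Everything in this example is hypothetical: the in-memory REGISTRY stands in for a serverless platform, handle_event stands in for the platform's trigger, and vibration_model_v1 is a fixed threshold standing in for a trained model.

```python
# Hypothetical in-memory registry: (model name, version) -> scoring function.
REGISTRY = {}


def deploy(name, version, scoring_function):
    """Register one version of a model as an independent function."""
    REGISTRY[(name, version)] = scoring_function


def delete(name, version):
    """Remove one model version without touching any other."""
    REGISTRY.pop((name, version), None)


def handle_event(event):
    """Entry point invoked once per incoming event."""
    scoring_function = REGISTRY.get((event["model"], event["version"]))
    if scoring_function is None:
        return {"status": 404, "error": "model version not deployed"}
    return {"status": 200, "failure_risk": scoring_function(event["features"])}


def vibration_model_v1(features):
    """Stand-in for a trained model: a fixed threshold on vibration."""
    return 1.0 if features["vibration_mm_s"] > 7.0 else 0.0


deploy("vibration", "v1", vibration_model_v1)
event = {"model": "vibration", "version": "v1", "features": {"vibration_mm_s": 9.2}}
response = handle_event(event)
print(response)
```

Because each model version is addressed separately, a new version can be deployed alongside the old one, or an old one deleted, without affecting the other functions.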
Businesses on their analytics journey often need to decide on technologies, timeframe, scale, budget, team structure and so on to be successful. To take a holistic approach, it is critical to first discover the current situation. To take stock of the organization’s analytics requirements, capabilities, priorities and so on, some essential questions need to be discussed in a structured manner by the relevant business units and stakeholders, possibly including external consultants.
In my experience, a standard “Analytics Platform Assessment Questionnaire” is a good tool to get you started and to ensure that all relevant points are covered. It covers questions from a strategy point of view, as well as project-level details and data perspectives.
Please share your email by submitting the contact form below. (I will not sell your email address or spam you; this is just for my own download tracking purposes.)
I used the following software versions and installation steps on all nodes.
The environment variables below also reflect the software versions:
export MASTER_IP=192.168.56.121 # is needed by all nodes
export K8S_VERSION=v1.4.0-alpha.1 # get the latest from https://storage.googleapis.com/kubernetes-release/release/latest.txt or /stable.txt
export ETCD_VERSION=2.2.5 # get the latest from https://gcr.io/v2/google_containers/etcd-amd64/tags/list
export FLANNEL_VERSION=0.5.5 # get the latest from https://quay.io/repository/coreos/flannel?tag=latest&tab=tags
export FLANNEL_IFACE=enp0s8 # name of the interface that would connect the nodes
export FLANNEL_IPMASQ=true
Similar scripts are available for starting and stopping Kubernetes and related services on the worker nodes. Check out the GitHub repo. Once the master VM is ready, it can be cloned to create the worker VMs.
Start up the services
Once the services are started, the Spark services can be started as shown below:
Master Node
# ./start-k8s.sh
# kubectl get node
NAME STATUS AGE
127.0.0.1 Ready 1m 
Similarly, when the worker nodes are up, they show up in the list of nodes:
# kubectl get node
NAME STATUS AGE
127.0.0.1 Ready 1h
kubernetes2 Ready 31m
kubernetes3 Ready 19m
The Kubernetes cluster would also show up as below:
The Kubernetes cluster is then ready to run any container workload. I am using Spark for this example. The scripts and YAML files to start the Spark cluster are also available in the same GitHub repo: https://github.com/santanu-dey/kubernetes-cluster
Here is my analysis of Docker Swarm and Kubernetes.
Why use Containers?
Containers are isolated process groups sharing a single OS, while VMs are separate operating systems running on the same hardware.
Containers have the following characteristics:
Isolated processes
User isolation
Application lib / binary isolation
Network Isolation
Memory limitations can be defined
Disk I/O through a volume shared with the host
What is Container Orchestration?
When running applications within containers, there are various operational aspects to manage, such as:
Lifecycle of the containers from creation to destruction
Compute & storage resources underneath the container OS
Networking between containers
Maintenance such as scale-up, scale-down, monitoring, logging etc.
Kubernetes & Docker Swarm
Recently I was looking at Docker Swarm (released by Docker last month) and felt compelled to compare it with Kubernetes. It is surprising in many ways that Kubernetes was not a product from Docker. But now that Docker has released Docker Swarm, it obviously overlaps with container orchestration engines such as Cloud Foundry Diego or Kubernetes.
Considerations: Kubernetes vs. Docker Swarm
Adoption and Maturity
Kubernetes: Kubernetes is well ahead, with adoption by major companies such as Red Hat for OpenShift and Rackspace for Solum. Google Cloud Platform and AWS have also seen Kubernetes deployments; it is a standard offering. The project is also quite active on GitHub and is updated frequently.
Docker Swarm: Docker Swarm is relatively new. Its code frequency is also not as high as that of Kubernetes.
Installation
Kubernetes: Kubernetes readily installs on virtually everything, from a bare Linux OS to Docker, Vagrant, cloud or Mesos.
Docker Swarm: Docker Swarm manager can run on Linux. Installation on anything else has to be done by following the installation steps.
Features
Kubernetes: Kubernetes is feature rich, for now, with a visual dashboard UI, the ability to autoscale, and its own management of persistent volumes.
Docker Swarm: All of these can be achieved in Docker Swarm as well. However, as of now, they are not out-of-the-box features in Docker Swarm.
When Kubernetes is Better
The Persistent Volumes feature in Kubernetes allows the compute nodes to be used just for running the containers, with persistent volumes allocated to the containers from a separate pool of persistent volumes. This is a more scalable, manageable and cleaner architecture.
Load balancing and auto-scaling features are now declaratively available in Kubernetes, without the user having to write any additional script on top.
The ability to deploy readily on Google Cloud and AWS is great for folks who are already using those platforms.
Where Docker Swarm Aces
Docker swarm commands are easier to learn if you are familiar with Docker.
It is native to Docker, hence the architecture is simpler. For example, a resource node just has the same Docker daemon listening remotely on TCP in swarm mode. In the case of Kubernetes, there are additional processes, such as the kubelet, that need to run on each node in addition to the Docker process.
Networking is also Docker native, whereas Kubernetes creates another layer of networking around the nodes.
What is next?
Whichever of Kubernetes or Swarm produces more readily available templates for launching common deployment units will see adoption.
Docker should use its Docker Hub registry to create a similar registry of Docker Swarm templates, or even Kubernetes templates, and provide an easy way to launch orchestrated deployment units in AWS, Azure, Docker Cloud or Google Cloud: something like an orchestration layer independent of the cloud provider.
Sophisticated analytics, monitoring, alerting and anomaly detection capabilities in a dashboard will be needed soon.