Data Management Capabilities Needed for Real-time Predictive Maintenance Use Cases

01 Saturday Dec 2018

Tags

AI, Big Data, container, kubernetes, ML, Serverless

Capabilities Summary

When dealing with Predictive Maintenance of machines and systems, the underlying data platform needs certain capabilities so that maintenance can be carried out just in time, before the actual failure happens.

Near real-time data ingestion of millions of data points per second
Ability to apply ML models for predicting machine failure on the ingested data in real-time
Efficient handling of time-series data, both historical and real-time
API- and microservices-based integration with downstream and upstream systems, to support the event-driven nature of the use case
Alerts and visibility integrated with the downstream systems for end-to-end automation

Let us look at the key capabilities more closely.

Why Real-Time?

Imminent failure signals can be detected within a few minutes of appearing, if real-time data is made available to the predictive analytics system. Not taking advantage of available signals as soon as they occur results in inability to take corrective actions within a reasonable time and leads to operational outages.

Also, real-time situational awareness of the overall operation can only be derived from real-time ingestion of data. Real-time visualisation is only possible when real-time data points are available in the system.

Real-time ingestion and processing of telemetry data from sensors can quickly become technically challenging, for example when:

Combining and correlating multiple event streams in real-time
Combining fresh real-time data with large volumes of historical data for trend analysis
Combining fresh real-time data with static reference data for additional context
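
To make the last point concrete, here is a minimal sketch of enriching a fresh sensor reading with static reference data. The `Reading` fields, the `MACHINE_REFERENCE` lookup table and the `max_temp_c` limit are all hypothetical names I made up for illustration; a real platform would do this join inside its stream processing layer.

```python
from dataclasses import dataclass

# Hypothetical static reference data: machine id -> context attributes.
MACHINE_REFERENCE = {
    "pump-7": {"site": "plant-a", "max_temp_c": 90.0},
}


@dataclass
class Reading:
    machine_id: str
    timestamp: float  # seconds since epoch
    temp_c: float


def enrich(reading: Reading, reference: dict) -> dict:
    """Join one fresh reading with its static reference record."""
    context = reference.get(reading.machine_id, {})
    limit = context.get("max_temp_c")
    return {
        "machine_id": reading.machine_id,
        "timestamp": reading.timestamp,
        "temp_c": reading.temp_c,
        "site": context.get("site"),
        # Only flag a reading when a limit is actually known for the machine.
        "over_limit": limit is not None and reading.temp_c > limit,
    }
```

A reading from an unknown machine simply comes back with no site and `over_limit` set to false, so missing reference data does not stop the stream.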

Now add time-series to this mix.

Why Time Series Database?

Usually time series data has the following characteristics:

The data may have records containing 100s or even 1000s of attributes
Data records are generated in time order, where time-intervals can be either uniform or irregular
The data is generally immutable, e.g. a sensor data point, once recorded, remains unaltered. New data points are generated for each new time interval.
The raw data grows quickly and linearly over time; however, the insights needed from the data are based on various time aggregation functions, such as:
- Min/Max/Averages/Moving Averages/Standard Deviations etc. over various time windows
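
As a rough illustration of such aggregations, the sketch below keeps the readings of the last N seconds in memory and summarises them. The `RollingWindow` class is my own illustrative name, it assumes readings arrive in timestamp order, and a time-series database would normally do this for you at much larger scale.

```python
from collections import deque
from statistics import mean, pstdev


class RollingWindow:
    """Keep readings from the last `span_seconds` and summarise them."""

    def __init__(self, span_seconds: float):
        self.span_seconds = span_seconds
        self._points = deque()  # (timestamp, value) pairs, oldest first

    def add(self, timestamp: float, value: float) -> None:
        # Assumes timestamps are non-decreasing.
        self._points.append((timestamp, value))
        # Evict points that have fallen out of the window.
        cutoff = timestamp - self.span_seconds
        while self._points and self._points[0][0] <= cutoff:
            self._points.popleft()

    def summary(self) -> dict:
        values = [v for _, v in self._points]
        if not values:
            raise ValueError("window is empty")
        return {
            "min": min(values),
            "max": max(values),
            "mean": mean(values),
            "stdev": pstdev(values),  # population standard deviation
        }
```

Each call to `add` slides the window forward, so `summary` always reflects only the most recent `span_seconds` of data.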

General purpose NoSQL databases such as HBase and traditional RDBMS databases such as MySQL are not well equipped to handle time-series data mainly due to the following reasons:

High IOPS: Time-series data requires a very high write-speed (IOPS). The usual transactional databases are overwhelmed by millions of records per second, because they are primarily concerned with transactional consistency.
Rolling Time Window: Time series prediction algorithms operate on rolling windows, where a window of consecutive observations is used to predict future samples. This fixed-length window moves from the beginning of the data to the end. Traditional databases are not designed for retrieving data by rolling time windows. Even in batch operations, when the rolling window straddles two files, data from both files is required, which makes it hard to process the data in a distributed, and hence timely, manner.
Data Compression: Time-series data grows quickly and linearly and disk space concerns limit the:
- Granularity of data that can be stored for historical analysis and ML training
- Amount of historical data that can be stored and made available for ML training

The data platform for Predictive Maintenance use cases should therefore be equipped with a time-series database with built-in compression, so that more data can be made available efficiently for computation workloads.
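
To give a feel for why time-series data compresses well, here is a small sketch of delta encoding for timestamps. This is only an illustration of the general idea, not the algorithm of any particular database; the function names are my own.

```python
from itertools import accumulate
from typing import List


def delta_encode(timestamps: List[int]) -> List[int]:
    """Store the first timestamp, then only the difference to the previous one."""
    if not timestamps:
        return []
    encoded = [timestamps[0]]
    for prev, cur in zip(timestamps, timestamps[1:]):
        encoded.append(cur - prev)
    return encoded


def delta_decode(encoded: List[int]) -> List[int]:
    """Rebuild the original timestamps by taking running sums."""
    return list(accumulate(encoded))
```

With readings taken at near-regular intervals, the encoded list is mostly small, repetitive numbers, which need far fewer bits to store than full timestamps.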

Why Serverless?

There are various reasons why serverless is important in this use case.

Decoupling ML models from proprietary platforms and environment dependencies; Following microservice principles, ML models should be exposed as a RESTful API. This decouples the models from the underlying platform, so that they can be ported easily or even used remotely from other apps.

Function as unit for ML models; An ML model has two distinct phases in its life-cycle. First, the model is developed, trained and tested. Second, the model is deployed to evaluate fresh data points. This evaluation phase of the life-cycle is well suited to deployment as functions.
In a serverless architecture, functions act as the unit of functionality and scale, which makes it a scalable way to deploy ML models for the evaluation phase described above. Each instance of a model can be thought of as an independent function that can be versioned, deployed, invoked, updated or even deleted at any time without compromising the rest of the system.
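
The idea of a model as a versioned function can be sketched as below. Everything here is hypothetical: `predict_failure_v1` is a made-up rule standing in for real model inference, the feature names are invented, and `handle` only mimics the kind of entry point a serverless runtime would invoke with an event body.

```python
import json


def predict_failure_v1(features: dict) -> float:
    """Placeholder for a trained model: returns a failure score in [0, 1]."""
    # Hypothetical rule standing in for real model inference.
    score = (features["vibration_mm_s"] / 10.0 + features["temp_c"] / 100.0) / 2.0
    return max(0.0, min(1.0, score))


# Each model version is an independently deployable function.
MODELS = {"v1": predict_failure_v1}


def handle(event_body: str, version: str = "v1") -> str:
    """Entry point that would be called once per incoming event."""
    features = json.loads(event_body)
    score = MODELS[version](features)
    return json.dumps({"model_version": version, "failure_score": score})
```

Adding a new model version is then just a matter of registering another function under a new key, without touching the callers of the existing one.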

Event-Driven; Serverless functions are triggered by events. In scenarios such as sensor data analytics from machinery, the events occur in real time, and the ML models should be triggered as the events occur for the best possible results. The ML models should be housed in the serverless container; serverless functions can usually be triggered by REST API, MQTT, file-drop, schedule and so on.

Auto-scalability; No run-time management and administration is required for the ML model functions that are deployed as containers. Everything is taken care of by the underlying container management platform, such as Kubernetes. For example, Kubernetes manages availability, automatic-scalability, monitoring, logging and security aspects of the containers. In this way the ML functions can be scaled and managed easily.

Support for any language / polyglot architecture
Most common frameworks or languages capable of binding web services, various language APIs or Spark data provider interfaces are usually supported within serverless functions. Go, Python, Java, NodeJS, .NET, and shell scripts are the most common. So AI and ML frameworks that use Python packages, R/CRAN, TensorFlow etc. can all be deployed within a serverless environment, based on the choice of the developer.

Analytics Platform Assessment Questionnaire Download

17 Sunday Sep 2017

Posted by santanu77 in How-to

Tags

analytics, Big Data

Often businesses on their Analytics journey need to decide on the technologies, timeframe, scale, budget, team structure etc. to be successful. To take a holistic approach, it is critical to first discover the current situation. To take stock of the organization’s analytics requirements, capabilities, priorities and so on, some essential questions need to be discussed in a structured manner by the relevant business units and stakeholders, possibly including external consultants.

In my experience, a standard “Analytics Platform Assessment Questionnaire” is a good tool to get you started and to ensure that all relevant points are covered. It covers questions from a strategy point of view, project-level details and data perspectives as well.

Here is the download link: Analytics Platform Assessment Questionnaire.

Please share your email by submitting the contact form below, (I will not sell your emails or spam you, this is just for my own download tracking purposes)

Spark Cluster using Multi-Node Kubernetes and Docker

03 Saturday Sep 2016

Posted by santanu77 in How-to

Tags

Big Data, cluster, container, docker, How-to, kubernetes

In this post I am going to share my experience with

setting up a kubernetes multinode cluster on docker
then running a spark cluster on kubernetes

Installation

My installation had 3 nodes:

I used VirtualBox and CentOS 7 to create the master node first and then cloned it to create the worker nodes.

Kubernetes Master & 1st worker node – 192.168.56.121
Kubernetes 2nd worker node – 192.168.56.122
Kubernetes 3rd worker node – 192.168.56.123

I used the following software versions and installation steps on all nodes.

The environment variables also reflect the software versions:

export MASTER_IP=192.168.56.121 # is needed by all nodes
export K8S_VERSION=v1.4.0-alpha.1 # get the latest from https://storage.googleapis.com/kubernetes-release/release/latest.txt or /stable.txt
export ETCD_VERSION=2.2.5 # get the latest from https://gcr.io/v2/google_containers/etcd-amd64/tags/list
export FLANNEL_VERSION=0.5.5 # get the latest from https://quay.io/repository/coreos/flannel?tag=latest&tab=tags
export FLANNEL_IFACE=enp0s8 # name of the interface that would connect the nodes
export FLANNEL_IPMASQ=true

The installation steps to follow are on the official Kubernetes site: http://kubernetes.io/docs/getting-started-guides/docker-multinode/
For the time being, allow all connections through iptables and delete all rules. Keep the service running.
Install Docker. I used version 1.11.2.
Optionally set up host names in the /etc/hosts file if you are connecting via hostnames, and also set the hostname of the VM.

You will note that in the Docker-based version of Kubernetes there are two Docker instances running on all nodes.

Docker 1st instance, let’s call it the bootstrap Docker instance: this one is required for the Kubernetes components themselves.
Docker 2nd instance: this is the regular installation of Docker required for running the managed containers on the VM nodes.

An easy way to start the bootstrap instance is with a script such as:

case $1 in
 start)
 sudo sh -c 'docker daemon -H unix:///var/run/docker-bootstrap.sock -p /var/run/docker-bootstrap.pid --iptables=false --ip-masq=false --mtu=1500 --bridge=none --exec-root=/var/run/docker-bootstrap --graph=/var/lib/docker-bootstrap 2> /var/log/docker-bootstrap.log 1> /dev/null &'
 ;;
esac

Note: In my case I had to add the MTU option to get past this issue https://github.com/docker/docker/issues/15498 . Otherwise MTU should be optional.

Then I also wrote a script to start and stop all services. Below is the one for the Kubernetes master node:

## Checking commandline arguments
while test $# -gt 0; do
case "$1" in
-h|--help)
echo Usage:
echo "start-k8s.sh [proxy true]"
exit 0
;;
-p|--proxy)
shift
if test $# -gt 0; then
export PROXY=$1
else
echo "invalid argument for proxy"
exit 1
fi
shift
;;
*)
break
;;
esac
done

#stop any running instance
/opt/docker-bootstrap/stop-k8s.sh

## Setting up env var
#############################################
export MASTER_IP=192.168.56.121

# get from https://storage.googleapis.com/kubernetes-release/release/latest.txt or /stable.txt
export K8S_VERSION=v1.4.0-alpha.1

# get from https://gcr.io/v2/google_containers/etcd-amd64/tags/list
export ETCD_VERSION=2.2.5

# get from https://quay.io/repository/coreos/flannel?tag=latest&tab=tags
export FLANNEL_VERSION=0.5.5

# the interface that would connect all hosts
export FLANNEL_IFACE=enp0s8
export FLANNEL_IPMASQ=true

## starting docker boot-strap
/opt/docker-bootstrap/docker-boostrap start

echo "waiting for docker-bootstrap to start"
sleep 5

## starting up docker
#sudo systemctl start docker

## start etcd
sudo docker -H unix:///var/run/docker-bootstrap.sock run -d \
--net=host \
gcr.io/google_containers/etcd-amd64:${ETCD_VERSION} \
/usr/local/bin/etcd \
--listen-client-urls=http://127.0.0.1:4001,http://${MASTER_IP}:4001 \
--advertise-client-urls=http://${MASTER_IP}:4001 \
--data-dir=/var/etcd/data
echo "waiting for etcd to start"
sleep 25

## Save a network config
sudo docker -H unix:///var/run/docker-bootstrap.sock run \
--net=host \
gcr.io/google_containers/etcd-amd64:${ETCD_VERSION} \
etcdctl set /coreos.com/network/config '{ "Network": "10.1.0.0/16" }'

echo "waiting for network config to save"
sleep 5

## Run Flannel
flannel_image_id=$(sudo docker -H unix:///var/run/docker-bootstrap.sock run -d \
--net=host \
--privileged \
-v /dev/net:/dev/net \
quay.io/coreos/flannel:${FLANNEL_VERSION} \
/opt/bin/flanneld \
--ip-masq=${FLANNEL_IPMASQ} \
--etcd-endpoints=http://${MASTER_IP}:4001 \
--iface=${FLANNEL_IFACE})

echo "waiting for Flannel to pick up config"
sleep 5

echo Flannel config is
SET_VARIABLES=$(sudo docker -H unix:///var/run/docker-bootstrap.sock exec $flannel_image_id cat /run/flannel/subnet.env)
eval $SET_VARIABLES
sudo bash -c "echo [Service] > /etc/systemd/system/docker.service.d/docker.conf"

if [ "$PROXY" == "true" ]
then
sudo bash -c "echo Environment=HTTP_PROXY=http://203.127.104.198:8080/ NO_PROXY=localhost,127.0.0.1,192.168.0.0/16,10.0.0.0/16 FLANNEL_NETWORK=$FLANNEL_NETWORK FLANNEL_SUBNET=$FLANNEL_SUBNET FLANNEL_MTU=$FLANNEL_MTU >>/etc/systemd/system/docker.service.d/docker.conf"

else
sudo bash -c "echo Environment=FLANNEL_NETWORK=$FLANNEL_NETWORK FLANNEL_SUBNET=$FLANNEL_SUBNET FLANNEL_MTU=$FLANNEL_MTU >>/etc/systemd/system/docker.service.d/docker.conf"

fi

echo FLANNEL_NETWORK=$FLANNEL_NETWORK FLANNEL_SUBNET=$FLANNEL_SUBNET FLANNEL_MTU=$FLANNEL_MTU

## Delete docker networking
sudo /sbin/ifconfig docker0 down
sudo brctl delbr docker0

## Start docker service
sudo systemctl daemon-reload
sudo systemctl start docker
sudo systemctl status docker -l

## Start kubernetes master
sudo docker run \
--volume=/:/rootfs:ro \
--volume=/sys:/sys:ro \
--volume=/var/lib/docker/:/var/lib/docker:rw \
--volume=/var/lib/kubelet:/var/lib/kubelet:rw,rslave \
--volume=/var/run:/var/run:rw \
--net=host \
--privileged=true \
--pid=host \
-d \
gcr.io/google_containers/hyperkube-amd64:${K8S_VERSION} \
/hyperkube kubelet \
--allow-privileged=true \
--api-servers=http://localhost:8080 \
--v=2 \
--address=0.0.0.0 \
--enable-server \
--hostname-override=127.0.0.1 \
--config=/etc/kubernetes/manifests-multi \
--containerized \
--cluster-dns=10.0.0.10 \
--cluster-domain=cluster.local

## Sleep 10
echo get all pods
sleep 10
kubectl create -f dashboard-service.yaml --namespace=kube-system
kubectl get pod --all-namespaces

Note: All the source code is available at https://github.com/santanu-dey/kubernetes-cluster

Similar scripts are available for starting and stopping Kubernetes and related services on the worker nodes. Check out the GitHub repo. Once the master VM is ready, it can be cloned to create the worker VMs.

Start up the services

Start the Kubernetes services as shown below. Once they are up, the Spark services can be started.

Master Node

# ./start-k8s.sh
# kubectl get node
NAME STATUS AGE
127.0.0.1 Ready 1m

Similarly, when the worker nodes are up they show up in the list of nodes:

 
# kubectl get node
NAME STATUS AGE
127.0.0.1 Ready 1h
kubernetes2 Ready 31m
kubernetes3 Ready 19m

The Kubernetes system pods should also show up as below:

# kubectl get pod --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system k8s-master-127.0.0.1 4/4 Running 1 4m
kube-system k8s-proxy-127.0.0.1 1/1 Running 0 4m
kube-system kube-addon-manager-127.0.0.1 2/2 Running 0 4m
kube-system kube-dns-v18-7tvnm 3/3 Running 0 4m
kube-system kubernetes-dashboard-v1.1.0-q30lc 1/1 Running 0 4m

At this point the Kubernetes cluster is ready for running any container workload. I am using Spark for this example. The script and YAML files to start the Spark cluster are also available in the same GitHub repo https://github.com/santanu-dey/kubernetes-cluster

Putting it all together :

Setting up a local Development Environment for Playing with Big Data: Part 2 : Play with Hadoop on Docker

11 Friday Sep 2015

Posted by santanu77 in How-to

Tags

Big Data, docker, hadoop, how to

I think Docker is simplifying big data DevOps concerns by a factor of 10 or more.

It is easy enough for me to just run a single command and bring to life any specific distribution of Hadoop in Docker containers.

To give a flavor of it, I thought of writing this blog entry. In Part 1 of this blog I set up a Linux container based environment. In this entry, I am posting a Docker-based environment setup.

Step 1: Install Docker

Step 2: Install Kubernetes with Kubectl

In my case I do not want to mess with my laptop so I use a VM centos6.6 on my macbook pro. That way it is one extra step to start-up the VM, but it keeps my host laptop free of installations and configurations.

Once both step #1 and step #2 are working for you, here is how you launch a Hadoop instance.

Step 3: Create a Pod definition for Kubernetes. Pick any available Hadoop image from Docker Hub.

[dockeruser@centos6 docker-for-hadoop]$ vi hbase-single-node-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: hbase-single-node-pod
  labels:
    name: hbase-single-node-pod
spec:
  containers:
  - name: hbase
    image: 'santanu77/hadoop-docker'
    ports:
    - containerPort: 60000
      hostPort: 60000
    - containerPort: 60010
      hostPort: 60010
    - containerPort: 8088
      hostPort: 8088

Step 4: Just launch the instance

[dockeruser@centos6 docker-for-hadoop]$ kubectl create -f hbase-single-node-pod.yaml
pods/hbase-single-node-pod

Now let us check the status of the instance. Once this is working we can log into the instance or view a service etc.

[dockeruser@centos6 docker-for-hadoop]$ kubectl describe pod hbase-single-node
Name:                hbase-single-node-pod
Namespace:            default
Image(s):            santanu77/hadoop-docker
Node:                127.0.0.1/127.0.0.1
Labels:                name=hbase-single-node-pod
Status:                Running
Reason:
Message:
IP:                172.17.0.1
Replication Controllers:    <none>
Containers:
  hbase:
    Image:        santanu77/hadoop-docker
    State:        Running
      Started:        Thu, 10 Sep 2015 23:55:16 -0400
    Ready:        True
    Restart Count:    0
Conditions:
  Type        Status
  Ready     True
No events.

I can hit the Hadoop cluster manager service from my host as well, given that port 8088 was mapped to the host’s port. So I can access it using my VM’s static IP and port 8088.

Also I can directly SSH into my hadoop instance as any other instance and start running a job.

IoT Simplified

15 Wednesday Apr 2015

Posted by santanu77 in Viewpoint

Tags

API, Big Data, Device, IoT, Sensor

Here is an interesting high-level graphical view of what IoT consists of.

Setting up a local Development Environment for Playing with Big Data: Part 1

23 Monday Jun 2014

Posted by santanu77 in How-to

Tags

Big Data, LXC, VirtualBox, Virtualization

Creating a VirtualBox Environment

I wanted to have a local environment with more than one virtual node, so that I could use it to simulate the cluster of servers that distributed systems need. For example, if I need a Hadoop cluster for development work, it would be so neat to have everything within my MacBook Pro itself. I decided to use LXC because it would be lightweight enough to run many nodes on a single physical system. But I did not want to install the LXC software directly on my MacBook Pro, so I decided to use a VirtualBox guest on my macOS host. This guest can be used to host the Linux containers.

CentOS was my preferred OS for the guest. But I soon realized that installing LXC on CentOS 6.3 is a bit challenging. I had read that CentOS natively supports LXC, but that was probably a later version. The version of CentOS I had also came with Security-Enhanced Linux (SELinux), which makes it even more cumbersome to run LXC on CentOS. So I got a VirtualBox image of Ubuntu instead from the following link: http://virtualboxes.org/images/ubuntu-server/

Ubuntu Linux Server Edition 14.04 x86
Size (compressed/uncompressed): 430 MB/1.4 GB
MD5SUM of ova image: 7afed719e42e59f870509b6ffe53c442
Link: https://s3-eu-west-1.amazonaws.com/virtualboxes.org/ubuntu-14.04-server-i386.ova.torrent
Active user account(s)(username/password): ubuntu/reverse
Notes: US keyboard, Guest Additions NOT installed. Additional packages: OpenSSH server.

I configured a NAT and a host-only network on it, and gave it about 3 GB of RAM considering that my MacBook only has about 8 GB.

Then start it up!

 
ubuntu@ubuntu-i386:~$ uname -a
Linux ubuntu-i386 3.13.0-24-generic #46-Ubuntu SMP Thu Apr 10 19:08:14 UTC 2014 i686 i686 i686 GNU/Linux

NAT lets the guest box access the internet. The host-only network on the guest allows it to be reached at a static IP from the host. I could now ssh into the box from iTerm running on the host macOS.

Installing LXC on Ubuntu

I used the following simple step to install LXC on Ubuntu. It was very straightforward and went through without any issues.

sudo apt-get install lxc

Note that after the installation one more bridge network interface (lxcbr0) is added to the Ubuntu OS:

ubuntu@ubuntu-i386:~$ ifconfig

eth0 Link encap:Ethernet HWaddr 08:00:27:60:64:8b
 inet addr:10.0.2.15 Bcast:10.0.2.255 Mask:255.255.255.0
 inet6 addr: fe80::a00:27ff:fe60:648b/64 Scope:Link
 UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
 RX packets:43555 errors:0 dropped:0 overruns:0 frame:0
 TX packets:19870 errors:0 dropped:0 overruns:0 carrier:0
 collisions:0 txqueuelen:1000
 RX bytes:62172162 (62.1 MB) TX bytes:1239397 (1.2 MB)

eth1 Link encap:Ethernet HWaddr 08:00:27:ea:17:80
 inet addr:192.168.100.102 Bcast:192.168.100.255 Mask:255.255.255.0
 inet6 addr: fe80::a00:27ff:feea:1780/64 Scope:Link
 UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
 RX packets:3867 errors:0 dropped:0 overruns:0 frame:0
 TX packets:3112 errors:0 dropped:0 overruns:0 carrier:0
 collisions:0 txqueuelen:1000
 RX bytes:389120 (389.1 KB) TX bytes:900839 (900.8 KB)

lo Link encap:Local Loopback
 inet addr:127.0.0.1 Mask:255.0.0.0
 inet6 addr: ::1/128 Scope:Host
 UP LOOPBACK RUNNING MTU:65536 Metric:1
 RX packets:20 errors:0 dropped:0 overruns:0 frame:0
 TX packets:20 errors:0 dropped:0 overruns:0 carrier:0
 collisions:0 txqueuelen:0
 RX bytes:1560 (1.5 KB) TX bytes:1560 (1.5 KB)

lxcbr0 Link encap:Ethernet HWaddr 82:b0:dc:e5:4e:a5
 inet addr:10.0.3.1 Bcast:10.0.3.255 Mask:255.255.255.0
 inet6 addr: fe80::80b0:dcff:fee5:4ea5/64 Scope:Link
 UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
 RX packets:0 errors:0 dropped:0 overruns:0 frame:0
 TX packets:8 errors:0 dropped:0 overruns:0 carrier:0
 collisions:0 txqueuelen:0
 RX bytes:0 (0.0 B) TX bytes:648 (648.0 B)

Testing a Container

I needed to check that the installation worked by creating a container and accessing it

ubuntu@ubuntu-i386:~$ sudo su
root@ubuntu-i386:/home/ubuntu# lxc-create -n test1
root@ubuntu-i386:/home/ubuntu# lxc-start -d -n test1 -o output.log
root@ubuntu-i386:/home/ubuntu# lxc-ls --fancy

NAME STATE IPV4 IPV6 AUTOSTART
------------------------------------------
test1 RUNNING 10.0.3.92 - NO

In order to login to the container use:


root@ubuntu-i386:/home/ubuntu# lxc-attach -n test1

Check that network is working:


root@test1:/home/ubuntu# ping google.com
PING google.com (74.125.236.162) 56(84) bytes of data.
64 bytes from maa03s16-in-f2.1e100.net (74.125.236.162): icmp_seq=1 ttl=61 time=30.2 ms
64 bytes from maa03s16-in-f2.1e100.net (74.125.236.162): icmp_seq=2 ttl=61 time=22.8 ms
64 bytes from maa03s16-in-f2.1e100.net (74.125.236.162): icmp_seq=3 ttl=61 time=45.3 ms

In order to stop and destroy the container :

root@ubuntu-i386:/home/ubuntu# lxc-stop -n test1
root@ubuntu-i386:/home/ubuntu# lxc-destroy -n test1

Now that containers are available within the VM, they can be used to install Hadoop or any other software.

In Part 2 of this blog write-up I will explore using Docker instead of Linux containers. The advantage of Docker is that various pre-built images are easy to pull and run as containers, without having to start the installation from scratch.

Big is Small

~ APIs, ML Engineering at scale, and Cloud is making it all small, connected and intelligent.
