Making deep learning accessible on OpenStack
by Samuel Cozannet on 25 April 2016
This week at the OpenStack Developer Summit we are excited to showcase how Canonical is working together with IBM, Mesosphere, Skymind and Data Fellas to make the opportunities of deep learning easier for everyone to access.
Deep learning is a completely new way of building applications. These applications, built around neural network models, rapidly become self-learning and self-evolving. Make no mistake: this is, once again, a paradigm shift for our industry. It opens up a new world of very exciting possibilities.
Deep learning allows complex, big software to skip many of the traditional design and development processes by having the software itself evolve. This is important since we quickly encounter significant constraints when building and deploying big software and big data projects. These constraints include not only the complexity of the operations involved but also the very stark realisation that there are not enough people, with the required skills and domain expertise, to meet industry demand. Unless we find a smarter way of addressing these constraints, we will severely limit the speed at which our industry can liberate the opportunities of big software, and of deep learning in particular.
At the heart of deep learning is the concept of neural networks that monitor a specific environment, searching for significant events and patterns and learning how best to adapt to them. A key part of this process is the period in which the artificial intelligence is in training. Once initiated, the model continues to improve itself as more data is analyzed over time.
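To make the training idea concrete, here is a minimal sketch, written in plain NumPy rather than any of the frameworks described in this post, of a tiny feedforward network whose prediction error shrinks as it repeatedly processes its training data:

```python
import numpy as np

# Minimal illustration of supervised training: a tiny one-hidden-layer
# network learning XOR, a classic non-linearly-separable toy problem.
rng = np.random.default_rng(0)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Randomly initialised weights: 2 inputs -> 8 hidden units -> 1 output.
W1 = rng.normal(0, 1, (2, 8)); b1 = np.zeros(8)
W2 = rng.normal(0, 1, (8, 1)); b2 = np.zeros(1)

def train_step(lr=0.5):
    """One full-batch gradient-descent step; returns the loss before the update."""
    global W1, b1, W2, b2
    h = sigmoid(X @ W1 + b1)           # forward pass
    out = sigmoid(h @ W2 + b2)
    loss = np.mean((out - y) ** 2)     # mean squared error
    # Backward pass (chain rule, by hand).
    d_out = 2 * (out - y) / len(X) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= lr * h.T @ d_out; b2 -= lr * d_out.sum(0)
    W1 -= lr * X.T @ d_h;   b1 -= lr * d_h.sum(0)
    return loss

first = train_step()
for _ in range(5000):
    last = train_step()
print(first, last)   # the error falls as more passes over the data are made
```

Scaled up to millions of parameters and distributed over a cluster, this same loop is what frameworks such as DeepLearning4j automate.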
Across all industries we see meaningful applications of deep learning emerging. In healthcare, a recent challenge was launched to improve the process and outcomes around cardiac diagnosis. In personal concierge services and in retail, neural networks are being married to image recognition to drive recommendation engines. In natural language processing, deep learning is being used not only to automate a higher level of interaction with customers but also to understand, through sentiment analysis, when the experience is degrading and when a human needs to intervene. There are of course many projects and many stories emerging in deep learning, and these only scratch the surface of what is possible. This raises the question: “Why are we not seeing an explosion of new, real-world experiences constructed around deep learning?”
The answer is that, beyond the constraints mentioned above, there are additional things to consider for anyone involved in this space. If you have a small data set, it is easy to set up a small project cheaply in a few days. But when you start to tackle big data sets and operate at scale, doing so quickly becomes significantly more challenging and your options narrow.
Canonical and Ubuntu underpin the world of scale-out architectures and automation around big software projects. We wake up every day thinking about how we can help simplify, codify, automate and unleash the potential of technology such as deep learning. That is why we have been working with partners such as IBM Power Systems, Mesosphere, Skymind and Data Fellas.
- IBM Power Systems accelerates the processing of deep learning applications, using Coherent Accelerator Processor Interface (CAPI) or GPU attach to increase throughput. This performance will improve even further when NVLink becomes available.
- Mesosphere makes running distributed systems and containers as simple as using a single computer. It makes it easy to deploy and operate complex datacenter services like Spark.
- Skymind is the commercial support arm of DeepLearning4j, the open-source Java/Scala deep learning framework, bringing the power of deep learning to the enterprise on Spark and Hadoop.
- Data Fellas provides an agile data science toolkit that puts the full power of distributed machine learning into the hands of a new generation of data scientists.
The first thing we created is a model, built with Juju (Canonical’s application modelling framework), that automates the process of building a complete, classic deep learning stack, from bare metal up to the essential interfaces. The model includes:
- A data pipeline to push data into Hadoop HDFS.
- A scalable data computation stack made of Spark and Hadoop.
- A computing framework, based on Mesos, for scheduling Spark jobs.
- An interactive notebook to create training pipelines and build neural networks.
The system, modelled by Juju, is deployed on GPU-enabled IBM Power machines for performance and operated by Juju in LXD containers. The Juju model for this looks like this:
We can provide guidance on how to deploy your own machine/deep learning stack at scale and do your own data analysis. We believe this early work significantly increases everyone’s ability to get their hands on classic big data infrastructure in just minutes.
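As a rough illustration of the workflow (the charm and application names below are placeholders for this sketch, not the exact charms in the model described above), deploying and relating services with Juju looks like this:

```shell
# Sketch only: charm names are illustrative; browse the charm store
# for the actual charms and bundles behind this model.
juju bootstrap                        # stand up a Juju controller

# Deploy the building blocks of the stack as individual applications...
juju deploy apache-hadoop-namenode hdfs
juju deploy apache-spark spark
juju deploy apache-zeppelin notebook

# ...and wire them together; Juju configures each side of the relation.
juju add-relation spark hdfs
juju add-relation notebook spark

juju status                           # watch the units come up
```

The same model can also be captured in a single bundle file and deployed with one `juju deploy` command.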
There are many use cases for deep learning and it’s hard to pick only one! However, Canonical is heavily engaged in major OpenStack projects across all sectors, including telco, retail, media and finance. Our initial projects have therefore gravitated towards making operations around OpenStack more performant.
OpenStack Logs
Canonical runs the OpenStack Interoperability Lab (OIL). With over 35 vendors participating and over 170 configuration combinations, Canonical typically builds over 3,500 OpenStack clouds every month to complete interoperability testing.
This generates over 10GB of logs on OpenStack interoperability and performance every week. We use these log results to train the deep learning model to predict when an OpenStack cloud is going to fail. This produced two outcomes.
First, even at an early stage, this showed an improvement over traditional monitoring systems, which only assess status based on how OpenStack engineers and operators have configured the monitoring of the solution. Intelligent agents were able to trigger alarms based on “feeling” the network, rather than on fixed values and probabilities. This is a bit like a spam filter reducing the workload of a support team by notifying it of the threat level.
Second, over time, as the cloud grows, losing a node becomes less and less manageable. These agents are able to make completely automated decisions such as “migrate all containers off this node” or “restart these services asap”.
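As a toy illustration of that decision logic (the thresholds and actions here are invented for this example, not taken from Canonical’s agents), an agent can map a model’s anomaly score to an operational action:

```python
# Illustrative sketch only: thresholds and action strings are made up.
def decide(anomaly_score: float) -> str:
    """Map a model's anomaly score (0..1) to an operational action."""
    if anomaly_score > 0.9:
        return "migrate all containers off this node"
    if anomaly_score > 0.6:
        return "restart these services asap"
    if anomaly_score > 0.3:
        return "raise an alarm for operators"
    return "no action"

print(decide(0.95))  # → migrate all containers off this node
print(decide(0.10))  # → no action
```

In practice such thresholds would themselves be tuned, or learned, from the cost of false alarms versus missed failures.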
The beauty of this is that it doesn’t depend on OpenStack itself. The same network will be trainable on any form of applications, creating a new breed of monitoring and metrology systems, combining the power of logs with metrics. Ultimately this makes OpenStack more reliable and performant.
Network Intrusion
We also applied our reference architecture to anomaly detection using NIDS (network intrusion detection system) data. This is a classic problem for neural networks. Models are trained to monitor and identify unauthorized, illicit and anomalous network behavior, notify network administrators, and take autonomous actions to preserve the integrity of the network.
Several datasets were used for this initial proof of concept and the models used included:
- MLP | Feedforward (currently used for streaming)
- RNN
- AutoEncoder
- MLP simulated AutoEncoder
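The autoencoder idea behind these models can be sketched in a few lines of NumPy. This is a simplified stand-in, with synthetic data, for the models listed above: it uses a linear autoencoder (whose optimum is PCA) to learn a compressed representation of “normal” traffic, then flags records that it reconstructs poorly.

```python
import numpy as np

# Sketch only: synthetic data and a linear autoencoder, not the post's
# actual NIDS datasets or models. Non-linear autoencoders, MLPs and RNNs
# apply the same principle: high reconstruction error => anomalous.
rng = np.random.default_rng(42)

# Synthetic "normal" traffic: 8-dim feature vectors near a 2-D subspace.
latent = rng.normal(size=(500, 2))
mixing = rng.normal(size=(2, 8))
normal_traffic = latent @ mixing + 0.05 * rng.normal(size=(500, 8))

# "Training": find the top-2 principal directions of normal traffic.
mean = normal_traffic.mean(axis=0)
_, _, vt = np.linalg.svd(normal_traffic - mean, full_matrices=False)
components = vt[:2]                    # tied encoder/decoder weights

def reconstruction_error(x):
    """Encode onto the learned subspace, decode, and measure what is lost."""
    code = (x - mean) @ components.T   # encode
    recon = code @ components + mean   # decode
    return float(np.sum((x - recon) ** 2))

ok = reconstruction_error(latent[0] @ mixing)        # looks like training data
odd = reconstruction_error(rng.normal(size=8) * 5)   # off-subspace: anomalous
print(ok, odd)   # the anomalous record scores a much larger error
```

A streaming detector then simply thresholds this error on each incoming record.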
If you would like to start a conversation with Canonical to help us identify the applications and workloads that are most meaningful to you please get in touch with Samuel Cozannet. Or if you are keen to partner with us in this work please get in touch.
Alternatively you can join us on our upcoming webinar “Modelling Deep Learning and Data Science enablement with Juju” on September 8th, 2016.