Let’s Talk Artificial Intelligence

There has been a lot of buzz around Artificial Intelligence as Bixby, Alexa, and Siri become household names. Today, self-driving cars share the roads with human drivers, hog farms keep an “artificial eye” on the vitals of individual animals in their pens, and soon enough we may be able to monitor heart health with a scan of the eye instead of being hooked up to an EKG machine.

I wanted to know how this is happening and learn about the brains behind these advancements, so I recently set out to explore the landscape of Linux AI projects, including NuPIC, Caffe, and TensorFlow. This post covers some of the important things I learned about each of these platforms.

NuPIC

NuPIC stands for Numenta Platform for Intelligent Computing, and I found the theory behind it to be fascinating. It’s an implementation of Hierarchical Temporal Memory (HTM), a theory of intelligence inspired by and based on the neuroscience of the neocortex, and it’s well suited to finding anomalies in live data streams. The algorithms and framework are implemented in Python, JavaScript, C++, Shell, and Java.

I picked nupic and nupic-core (the implementation of the core NuPIC algorithms in C++) to play with on the x86_64 (Ubuntu 17.10) and odroid-xu4 (Ubuntu 16.04) platforms. I had good luck compiling nupic and nupic-core from sources on both platforms. I was able to run unit_tests on x86_64; however, on odroid-xu4 I couldn’t run them because its pip was too old. NuPIC requires pip version 9.0.1, so I’ll have to come back another time to build pip 9.0.1 from sources.

Here is a short summary of the steps to install NuPIC and run tests.

git clone https://github.com/numenta/nupic.core.git

cd nupic.core/
export NUPIC_CORE=$PWD
mkdir -p $NUPIC_CORE/build/scripts
cd $NUPIC_CORE/build/scripts
cmake $NUPIC_CORE -DCMAKE_BUILD_TYPE=Release \
    -DCMAKE_INSTALL_PREFIX=../release \
    -DPY_EXTENSIONS_DIR=$NUPIC_CORE/bindings/py/src/nupic/bindings

# While still in $NUPIC_CORE/build/scripts
make -j3

# While still in $NUPIC_CORE/build/scripts
make install

# Validate the install by running cpp_region_test and unit_tests
cd $NUPIC_CORE/build/release/bin
./cpp_region_test
./unit_tests

# Note: ./unit_tests worked on x86_64 (which has pip 9.0.1) and
# failed on odroid-xu4 due to the older pip 8.1.1.
git clone https://github.com/numenta/nupic
cd nupic

#Follow the detailed instructions in README.md to compile and install

# Install from local source code
sudo pip install -e .

sudo apt-get install python-pytest

# Run the Python unit tests
py.test tests/unit

# Note: this step worked on x86_64 (which has pip 9.0.1) and
# failed on odroid-xu4 due to the older pip 8.1.1.
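
With NuPIC installed, its Online Prediction Framework (OPF) is a convenient entry point for streaming anomaly detection. Below is a minimal sketch of what that looks like; the model parameters are elided here and would normally be copied from one of the parameter files shipped with the examples in the nupic repository, so treat the names and values as placeholders:

from nupic.frameworks.opf.model_factory import ModelFactory

# Placeholder: real parameters (encoders, SP/TM settings) come from
# a params file such as those shipped with the nupic examples
MODEL_PARAMS = {}

model = ModelFactory.create(MODEL_PARAMS)
model.enableInference({"predictedField": "value"})

for record in ({"value": 10.0}, {"value": 10.5}, {"value": 98.6}):
    result = model.run(record)
    # anomalyScore ranges from 0 to 1; scores near 1 flag surprising input
    print(result.inferences["anomalyScore"])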

Caffe

Caffe is a deep learning framework from Berkeley AI Research (BAIR) and the Berkeley Vision and Learning Center (BVLC). Tutorials and installation guides can be found in the project’s README.

I compiled Caffe on the odroid-xu4 successfully and was able to run the tests. I had to install several dependencies for “make all” to complete successfully, and at the end I was able to run “make runtest”, which ran 1110 tests from 152 test cases. The experience was relatively painless!

Here is a short summary of the steps for installing Caffe and running its tests.

git clone https://github.com/BVLC/caffe.git
cd caffe
# Create Makefile.config from Makefile.config.example
# Uncomment CPU_ONLY := 1 to build the CPU-only version of Caffe

# Install dependencies
sudo apt-get install protobuf-compiler libboost-all-dev
sudo apt-get install libgoogle-glog-dev
sudo apt-get install libopenblas-dev
sudo apt-get install libopencv-highgui-dev
sudo apt-get install libhdf5-dev
export CPATH="/usr/include/hdf5/serial/"
sudo apt-get install libleveldb-dev
sudo apt-get install liblmdb-dev
sudo apt-get install libatlas-base-dev
sudo apt-get install libgflags-dev

make all
make test
make runtest
[==========] 1110 tests from 152 test cases ran. (197792 ms total)
[  PASSED  ] 1110 tests.

The Caffe command can be found under .build_release/tools/caffe.

caffe: command line brew
usage: caffe <command> <args>

commands:
  train           train or finetune a model
  test            score a model
  device_query    show GPU diagnostic information
  time            benchmark model execution time

  Flags from tools/caffe.cpp:
    -gpu (Optional; run in GPU mode on given device IDs separated by ','.Use
      '-gpu all' to run on all available GPUs. The effective training batch
      size is multiplied by the number of devices.) type: string default: ""
    -iterations (The number of iterations to run.) type: int32 default: 50
    -level (Optional; network level.) type: int32 default: 0
    -model (The model definition protocol buffer text file.) type: string
      default: ""
    -phase (Optional; network phase (TRAIN or TEST). Only used for 'time'.)
      type: string default: ""
    -sighup_effect (Optional; action to take when a SIGHUP signal is received:
      snapshot, stop or none.) type: string default: "snapshot"
    -sigint_effect (Optional; action to take when a SIGINT signal is received:
      snapshot, stop or none.) type: string default: "stop"
    -snapshot (Optional; the snapshot solver state to resume training.)
      type: string default: ""
    -solver (The solver definition protocol buffer text file.) type: string
      default: ""
    -stage (Optional; network stages (not to be confused with phase), separated
      by ','.) type: string default: ""
    -weights (Optional; the pretrained weights to initialize finetuning,
      separated by ','. Cannot be set simultaneously with snapshot.)
      type: string default: ""
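
Caffe also ships Python bindings (built with “make pycaffe”), which can be handier than the command-line tool for quick experiments. Here is a minimal sketch of loading a trained model and running a CPU-only forward pass; note that deploy.prototxt, weights.caffemodel, and the “data” blob name are hypothetical placeholders that depend on the model you use:

import numpy as np
import caffe

caffe.set_mode_cpu()  # matches the CPU_ONLY build above

# Hypothetical file names; substitute your model definition and weights
net = caffe.Net("deploy.prototxt", "weights.caffemodel", caffe.TEST)

# Fill the input blob with random data shaped to the net's expectations
net.blobs["data"].data[...] = np.random.rand(*net.blobs["data"].data.shape)

output = net.forward()
print(output.keys())  # names of the network's output blobs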

TensorFlow

Now, on to TensorFlow: a machine learning framework. It provides a C API defined in c_api.h, which is suitable for building bindings for other languages. I played with TensorFlow and the C API, compiling both TensorFlow and Bazel, a prerequisite, from sources. Compiling from sources took a long time on the odroid-xu4 and ran into the same pip version issue I mentioned earlier. On the x86_64 platform, I managed to write a hello_tf.c using the C API and watch it run.

The TensorFlow website has a great guide that explains how to prepare your Linux environment. There are two choices for installing Bazel: you can either install the Bazel binaries or compile it from sources. Here are my setup notes for compiling Bazel from sources:

sudo apt-get install build-essential openjdk-8-jdk python zip
sudo apt-get install python-numpy python-dev python-pip python-wheel

mkdir bazel; cd bazel

wget https://github.com/bazelbuild/bazel/releases/download/0.10.0/bazel-0.10.0-dist.zip

unzip bazel-0.10.0-dist.zip

./compile.sh

# The above failed with "The system is out of resources."
# message. Workaround reference:
# https://github.com/bazelbuild/bazel/issues/1308

vi scripts/bootstrap/compile.sh

# Find the line:
#   run "${JAVAC}" -classpath "${classpath}" -sourcepath "${sourcepath}"
# and add -J-Xms256m -J-Xmx384m so that it reads:
#   run "${JAVAC}" -J-Xms256m -J-Xmx384m -classpath "${classpath}" -sourcepath "${sourcepath}"

# Run compile again: Compile should work now.
./compile.sh

sudo pip install six numpy wheel

git clone https://github.com/tensorflow/tensorflow.git
cd tensorflow
export PATH=$PATH:../linux_ai/bazel/output
./configure

# The bazel build command below builds a script named build_pip_package.
# Running that script builds a .whl file in the /tmp/tensorflow_pkg directory:

bazel build --jobs 1 --local_resources 2048,0.5,1.0 --verbose_failures --config=opt //tensorflow/tools/pip_package:build_pip_package

bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
sudo pip install /tmp/tensorflow_pkg/tensorflow-1.6.0rc0-cp27-cp27mu-linux_armv7l.whl

You can validate your TensorFlow installation by running python in a different directory and executing the following code:

python

Python 2.7.14 (default, Sep 23 2017, 22:06:14) 
[GCC 7.2.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> hello = tf.constant('Hello, TensorFlow!')
>>> sess = tf.Session()
>>> print(sess.run(hello))

#If you see "Hello, TensorFlow!" as a result, you are good to start
#developing TensorFlow programs.
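
Beyond the hello-world constant, a couple of placeholders exercise the full graph-and-session workflow. This small sketch uses the same TensorFlow 1.x Python API as the validation step above; the tensor names are arbitrary:

import tensorflow as tf

# Build a tiny graph: two scalar inputs and their product
a = tf.placeholder(tf.float32, name="a")
b = tf.placeholder(tf.float32, name="b")
product = tf.multiply(a, b)

# Feed concrete values when the graph is run
with tf.Session() as sess:
    print(sess.run(product, feed_dict={a: 6.0, b: 7.0}))  # prints 42.0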

For added fun, I also installed the TensorFlow C API. Here is the sample program:

#include <stdio.h>
#include <tensorflow/c/c_api.h>

int main()
{
	printf("Hello from TensorFlow C library version %s\n", TF_Version());
	return 0;
}
# Compile and run (the -o flag names the binary hello_tf)
gcc -I/usr/local/include -L/usr/local/lib hello_tf.c -o hello_tf -ltensorflow
./hello_tf
Hello from TensorFlow C library version 1.5.0
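
The same C entry point can also be reached from Python with ctypes, which is a quick way to confirm the shared library is visible to the loader. A minimal sketch, assuming libtensorflow.so was installed under /usr/local/lib as above:

import ctypes

# Load the TensorFlow C library; use the full path if the loader
# does not search /usr/local/lib by default
lib = ctypes.CDLL("libtensorflow.so")

# TF_Version() returns a const char *
lib.TF_Version.restype = ctypes.c_char_p
print("Hello from TensorFlow C library version %s" % lib.TF_Version())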

Summary

My experience has given me a few tips and takeaways I think are worth sharing:

  • Most AI frameworks depend on recent toolchain versions, so it’s important to ensure you have the latest versions of tools like pip and Python.
  • Compiling TensorFlow can be very tedious.
  • Caffe installed easily on odroid-xu4.
  • Unlike TensorFlow Java and TensorFlow Go, TensorFlow for C appears to be covered by the TensorFlow API stability guarantees.

This post has provided a quick summary of the AI projects I experimented with, and I hope some of my notes will be useful to you. My work in this exciting area continues, and I will share more new discoveries in the future. If you’re working on some interesting AI projects, we’d love to hear about them in the comments below.

Even More Open Source Medicine and Machine Learning, in This Week’s Wrap Up

Open Source Wrap Up: November 14-20, 2015

Group of Biohackers Start Work on Open Source Insulin

More than 370 million people worldwide suffer from diabetes, and these people rely on regular insulin injections to regulate the amount of sugar in their blood. Despite this major need for insulin, there is no generic version available on the market, and its high cost limits its availability in poorer parts of the world. A group of citizen and academic researchers and biohackers, led by Counter Culture Labs, has launched a project to develop a simple method for producing insulin and to release the process into the public domain. They have launched a crowdfunding campaign (which has already exceeded its goal) to fund the first stage of this research. For stage 1, “the team will insert an optimized DNA sequence for insulin into E. coli bacteria, induce the bacteria to express insulin precursors, and verify that human proinsulin has been produced.”

Netflix Launches Open Source Platform for Continuous Delivery

Netflix has launched Spinnaker, a new open source project for continuous delivery, in conjunction with Google, Microsoft, and Pivotal. It is designed to serve as a replacement for Asgard, the tool Netflix previously used to manage Amazon Web Services. Spinnaker expands on Asgard by providing support for multiple platforms, including Google Cloud Platform and Cloud Foundry, with support for Microsoft Azure reportedly in development. Spinnaker also provides cluster management and visibility into the footprint of applications on cloud infrastructure.

Check out the code on GitHub.

Linux Goes to the International Space Station

The United Space Alliance, a NASA contractor, has decided to migrate to Linux for operations on the International Space Station (ISS). The company’s Laptops and Network Integration team is in charge of the station’s OpsLAN: a network of laptops that provide the crew with vital capabilities for day-to-day operations. The team decided to migrate from Windows to Linux in order to improve stability and reliability, and provide better control over the operating system they deploy. The Linux Foundation provided the training needed for this team to develop a solution based on Linux through classes that introduced the team members to Linux and application development on the platform.

Read More from the Linux Foundation.

Microsoft Releases Distributed Machine Learning Toolkit

Following in the wake of Google’s release of TensorFlow, Microsoft has answered with its own open source, distributed machine learning toolkit. DMTK includes a flexible framework that supports a unified interface for data parallelization, big model storage, model scheduling, and automatic pipelining. It also includes distributed algorithms for word embedding and multi-sense word embedding, as well as LightLDA: a fast and lightweight system for large-scale topic modeling. DMTK offers an API designed to reduce the barrier to entry for distributed machine learning, with the goal of letting researchers and developers focus on the data, the model, and the core training logic. By open sourcing this software, Microsoft hopes to encourage academia and industry to get involved in the platform’s development.

The code can be found on GitHub.
