Monthly Archives: February 2015

Text-based games are something of a guilty pleasure for me. Inform7 and assorted text-based interactive fiction consistently eat up far more time than I allocate for them, for reasons that still evade me. One frustration I've consistently had with them is the cumbersome and error-prone communication between entities. In the most perfect of worlds I'd love a fully AI-driven text adventure, similar to DnD, but with an artificial dungeon master. A SomethingAwful IRC cohort pointed out the implausibility of this option at this point in time, but the idea stuck with me. Gradually, things coalesced into this project, the game I'm calling Voight.

Chatbots are fun, but I don't have much interest in doing Markov chain learning for them. I want to really stretch and throw some of the deep learning research I've been doing at the NLP problem and see what comes out of it. The technology might not be far enough along to support a fully general artificial intelligence system, but if we constrain our goals to conversation only AND justify absurdities as being 'part of a buggy robot', things become more tractable. So what's the setting? Your character, Dr. Voight, is interviewing broken or simple robots. While I have some ideas for driving the plot further, all of that is irrelevant if the chatbot part isn't working properly.

What are the mechanics? It's a simple question/response or statement/response interface. The non-AI part of the application watches (1) keywords in the robot's response and (2) the distance of the internal state from key values to determine whether or not to advance the plot.
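To make that concrete, here's a rough sketch of what such a plot gate might look like. Everything in it is hypothetical: the function name, the state names like `trust` and `fear`, the use of Euclidean distance, and the choice to require both a keyword hit and a close-enough state are my assumptions for illustration, not settled design.

```python
import math

def should_advance(response, state, keywords, target_state, max_distance):
    """Hypothetical plot gate: True when the robot's reply contains a trigger
    keyword AND its internal state is close enough to the target values."""
    # Lowercase the reply and strip punctuation so "Tortoise." matches "tortoise"
    words = {w.strip(".,!?;:\"'").lower() for w in response.split()}
    keyword_hit = any(k.lower() in words for k in keywords)
    # Euclidean distance over the state values we care about (an assumption)
    distance = math.sqrt(sum((state[name] - target) ** 2
                             for name, target in target_state.items()))
    return keyword_hit and distance <= max_distance

state = {"trust": 0.8, "fear": 0.1}
target = {"trust": 1.0, "fear": 0.0}
print(should_advance("I remember the tortoise.", state, ["tortoise"], target, 0.3))
```

Requiring both conditions keeps a stray keyword from advancing the plot while the robot is still in the wrong frame of mind; swapping `and` for `or` would make the gate much looser.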

Why Voight? Simple: In Blade Runner, the human/replicant test is the Voight-Kampff test.

Continuing this week's theme of quick Python snippets, here's a chunk of code which, when given two directories full of images named #.jpg, will build and test an SVM classifier. The code is terribly simple. Mostly, I'm including it here because I don't want to get my flash drive from the other room, and I'm too lazy to SSH/SFTP it over to my other machine.

import os
import numpy
from sklearn import svm 
from sklearn.utils import shuffle
from PIL import Image

training_examples = list()
training_labels = list()
test_examples = list()
test_labels = list()

# Load data
def load_data(folder, count, label, example_list, label_list, start_index=0):
	index = start_index
	start_example_count = len(example_list) # We may have some examples already
	# Note: this loops forever if the folder holds fewer than `count` readable images.
	while len(example_list)-start_example_count < count:
		try:
			img = Image.open(os.path.join(folder, "{}.jpg".format(index)))
			img = img.convert('L') # Convert to greyscale
			img = numpy.asarray(img, dtype=numpy.float64) # Convert to numpy matrix with floating point values
			img = img.reshape((1,-1)) # Force image to a single row
			img /= 255.0 # Rescale from 0,255 to 0,1 for our SVM.
			example_list.append(img[0]) # The [0] unpacks the 1xM matrix into a length-M vector.
			label_list.append(label) # Keep labels aligned with examples
		except IOError as ioe:
			print("Error loading image from folder {}, number {}".format(folder, index))
		index += 1

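MAX_EXAMPLES = 1000 # Assumed value: the test calls below skip the first 500 images of each class
load_data("positive", MAX_EXAMPLES//2, 1, training_examples, training_labels)
load_data("negative", MAX_EXAMPLES//2, 0, training_examples, training_labels)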
load_data("positive", 100, 1, test_examples, test_labels, 500) # Skip the first 500 images, which we used for training
load_data("negative", 100, 0, test_examples, test_labels, 500)

# Shuffle data
training_examples, training_labels = shuffle(training_examples, training_labels)

# Build and train classifier
classifier = svm.SVC()
classifier.fit(training_examples, training_labels)

# Test predictions
predictions = classifier.predict(test_examples)

# Calculate error
hits = 0
misses = 0
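for prediction, truth in zip(predictions, test_labels):
	if prediction == truth:
		hits += 1
	else:
		misses += 1

print("Hits: {}, misses: {}, accuracy: {:.3f}".format(hits, misses, hits/float(hits+misses)))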
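
For a two-class problem it can also help to see which class the misses come from. Here's a minimal sketch using only numpy. The `truth` and `predicted` arrays are made-up stand-ins for `test_labels` and `predictions`, not results from the classifier above.

```python
import numpy

# Made-up stand-ins for test_labels and predictions (1 = positive, 0 = negative)
truth = numpy.array([1, 1, 1, 0, 0, 0, 0, 1])
predicted = numpy.array([1, 0, 1, 0, 0, 1, 0, 1])

# Count each cell of the 2x2 confusion table
true_pos = int(numpy.sum((truth == 1) & (predicted == 1)))
false_neg = int(numpy.sum((truth == 1) & (predicted == 0)))
true_neg = int(numpy.sum((truth == 0) & (predicted == 0)))
false_pos = int(numpy.sum((truth == 0) & (predicted == 1)))

accuracy = (true_pos + true_neg) / float(len(truth))
print("TP {} FN {} TN {} FP {} accuracy {:.2f}".format(
    true_pos, false_neg, true_neg, false_pos, accuracy))
```

If one class soaks up nearly all the misses, that usually says more about the data than a single accuracy number does.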

This borders on blogspam, but I found it so useful that I can't help but share.

Evan Shelhamer shared a snippet showing the format Caffe expects for its LMDB input data.

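import caffe
import lmdb

# `inputs` is not defined in this snippet; it is assumed to be a list of image file paths.
in_db = lmdb.open('image-lmdb', map_size=int(1e12))
with in_db.begin(write=True) as in_txn:
    for in_idx, in_ in enumerate(inputs):
        im = caffe.io.load_image(in_) # Assumed loader; returns a Height x Width x Channel array
        # Caffe expects Channel x Height x Width, hence the transpose
        im_dat = caffe.io.array_to_datum(im.astype(float).transpose((2, 0, 1)))
        # lmdb keys must be bytes on Python 3
        in_txn.put('{:0>10d}'.format(in_idx).encode('ascii'), im_dat.SerializeToString())
in_db.close()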
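
Two details of this format are easy to check without Caffe installed: the zero-padded key, and the channel-first layout Caffe uses for image data. Here's a small sketch of both. The 4x6 dummy image is made up for illustration.

```python
import numpy as np

# LMDB orders keys bytewise, so plain decimal strings sort out of numeric order
unpadded = sorted(str(i) for i in (2, 10, 1))
# Zero-padding to a fixed width makes bytewise order match numeric order
padded = sorted('{:0>10d}'.format(i) for i in (2, 10, 1))
print(unpadded)  # ['1', '10', '2']
print(padded)    # ['0000000001', '0000000002', '0000000010']

# A dummy Height x Width x Channel image: 4 rows, 6 columns, 3 channels
im = np.zeros((4, 6, 3))
# Move the channel axis to the front: Channel x Height x Width
chw = im.transpose((2, 0, 1))
print(chw.shape)  # (3, 4, 6)
```

Without the padding, record 10 would come back before record 2 when iterating the database in key order.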