Monthly Archives: October 2015

Or: "How I learned to stop worrying and generate planet names."

A friend of mine is working on a procedural universe generator. Procedural stuff is something I happen to love, as you might ascertain from my older ramblings. In the interest of handing out some knowledge, I want to illustrate an application of different number system representations. After this is done, you'll have an intuitive understanding of how to take any number (from 0 through 2^64-1) and get back a planet name like 'Gregorio-X. Ignus system. Charion group. Darion supercluster.'

First, some background.

One numeric base to which you're already accustomed is base-10. Base ten has the digits 0, 1, 2, 3, 4, 5, 6, 7, 8, and 9 in it. If you write a number in base ten, like 123, you know that a digit in the 2's position counts for 10x what a digit in the 3's position does. That's perhaps a long way of saying "123 is 100 + 20 + 3". We can rewrite 123 as 1*10^2 + 2*10^1 + 3*10^0. Anything to the power of zero is 1, so 10^0 is 1 and the last term, 3*10^0, reduces to 3*1 = 3. Great. Now something easier: 10^1 = 10, so 2*10^1 is 2*10 = 20. The process repeats. What you should see from this pattern is each digit position increasing the weight by a factor of 10 -- each step to the left makes the exponent bigger by one. This will become important when we look at other bases.


Binary is base-2. The digits in binary are simple: 0 and 1. Let's look at the number 101. That's 5 in base-10. Similar to how we can write 123 as 1*10^2 + 2*10^1 + 3*10^0, we can rewrite 101 as 1*2^2 + 0*2^1 + 1*2^0. 1*4 + 0*2 + 1*1 = 4 + 0 + 1 = 5.
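Python can double-check these expansions for us; its built-in `int` accepts a base argument (a quick illustration of mine, not from the original post):

```python
# The positional expansions of 123 (base 10) and 101 (base 2).
assert 1*10**2 + 2*10**1 + 3*10**0 == 123
assert 1*2**2 + 0*2**1 + 1*2**0 == 5

# int() can parse a string in any base from 2 to 36.
assert int("101", 2) == 5
```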

With me so far? Hang tight. We're almost there.


What would happen if we wanted a base with MORE than ten digits? Well, hexadecimal happens to be base 16. We have digits 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, F! Note that we need more than just our regular ten digits; to represent all sixteen, we have to include some letters. 'A' in this case is '10' and 'B' is '11', and so on up to 'F' = '15'. This means '10' in hexadecimal is 16 in decimal: 1*16^1 + 0*16^0. C2A is C*16^2 + 2*16^1 + A*16^0. Substitute C with 12 and A with 10 to get 12*16^2 + 2*16^1 + 10*16^0. 12*256 + 2*16 + 10*1 = 3072 + 32 + 10 = 3114.
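The same trick verifies the hex arithmetic (again, my own quick check, not part of the original post):

```python
# C2A expanded by hand, then parsed directly by int().
assert 12*16**2 + 2*16**1 + 10*16**0 == 3114
assert int("C2A", 16) == 3114
```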

Arbitrary Bases

Notice how in hexadecimal we had to include other digits A-F. We don't _have_ to use 0-9 as our basis digits, either. Let's use base-10, but for digits 0-9 we will substitute 'fluffy', 'fuzzy', 'poof', 'bunny', 'kitten', 'puppy', 'snuggly', 'happy', 'cuddly', 'puff'. The number 0189 then is 'fluffyfuzzycuddlypuff'. We can also go backwards, 'snugglypuppypoof' is 652.
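The substitution is mechanical enough to sketch in a few lines of Python (my own illustration; the digit names are the ones from the paragraph above):

```python
# Base-10, but with cute names standing in for the digits 0-9.
DIGITS = ['fluffy', 'fuzzy', 'poof', 'bunny', 'kitten',
          'puppy', 'snuggly', 'happy', 'cuddly', 'puff']

def encode(n):
    """Write a non-negative integer using the named digits."""
    if n == 0:
        return DIGITS[0]
    parts = []
    while n > 0:
        parts.append(DIGITS[n % 10])  # Peel off the least significant digit.
        n //= 10
    return ''.join(reversed(parts))

assert encode(652) == 'snugglypuppypoof'
assert encode(189) == 'fuzzycuddlypuff'
```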

As a natural extension to this, we can pick bigger bases. In my code, I picked base-65, which needs around 11 digits to represent all the numbers from 0 through 2^64-1. Here's the Python code which converts from base-10 to base-65. My digits for the base are made up names.
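The snippet itself didn't survive in this archive, but the conversion would look roughly like the sketch below. The `NAMES` list here is a five-entry stand-in I pulled from the example in the intro; the real code used 65 made-up names, with successive 'digits' becoming the planet, system, group, and supercluster names.

```python
# Stand-in digit names; the real list had 65 entries.
NAMES = ['Gregorio', 'Ignus', 'Charion', 'Darion', 'Ferrun']

def to_names(n, names=NAMES):
    """Convert a non-negative integer to base-len(names), one name per digit."""
    base = len(names)
    digits = []
    while True:
        digits.append(names[n % base])  # Least significant 'digit' first.
        n //= base
        if n == 0:
            return list(reversed(digits))
```

With a full 65-name list, every value from 0 through 2^64-1 fits in at most 11 named digits.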

I've been trying to squeeze data out of my RBM (restricted Boltzmann machine) trainer for a few days. I can get some good-looking weights, but I'm a bit stuck on generating novel data. Here are some things I've tried and learned.

Increase Gibbs Sampling

In Hinton's Coursera lectures, he speaks of flattening out the energy landscape. Imagine the RBM as water trying to move to the state of lowest energy (the reservoirs). In the case of MNIST, we have ten pools, one for each of the digits zero through nine. If we use one step of Gibbs sampling, we flatten out the area near each of those local pools, meaning drops that don't fall far from them will move downhill and fall into one of the ten reservoirs. If we start farther out, however, the droplet might get stuck in a different local minimum -- one outside the range we'd smoothed. We have to gradually increase the number of Gibbs samples so that we flatten out progressively larger regions. In doing so, randomly generated data in the field will "flow" into one of our predefined categories.
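As a rough sketch of what running a longer chain looks like, here's a numpy illustration of my own (bias-free, like the networks later in the post; `gibbs_chain` and its shapes are my invention, not the Aij API):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gibbs_chain(v, w, steps, rng):
    """Alternate hidden/visible sampling for `steps` rounds in a bias-free RBM.

    v is a (num_visible,) state; w is a (num_visible, num_hidden) weight matrix.
    """
    for _ in range(steps):
        h = (sigmoid(v @ w) > rng.random(w.shape[1])).astype(float)  # Sample binary hiddens.
        v = sigmoid(h @ w.T)  # Mean-field reconstruction of the visibles.
    return v

# The schedule from the text: start with one step, then lengthen the chain
# as training proceeds (e.g. steps = 1 + epoch // 10) to flatten larger regions.
```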

Generate Random Inputs, Not Random Hiddens

I tried randomly setting the hidden values and propagating back from that. It didn't work, since there are some 'most liked' states. I imagine if I were setting those states based on learned patterns I'd have more luck, but until then I just have to randomly generate a visible state.

Don't Accumulate Samples

In the following examples, assume input is a randomly initialized array of appropriate size.

This is the code I tried:

public Matrix daydream(Matrix input, int numCycles) {
	Matrix accumulator = Matrix.zeros(input.numRows(), input.numColumns());
	for(int i=0; i < numCycles; i++) {
		input = reconstruct(predict(input), true); // True means binary reconstruction.
		accumulator.elementAddition_i(input); // Sum the samples so we can average below (in-place add; exact method name may differ).
	}
	return accumulator.elementMultiply_i(1.0/(double)numCycles);
}

It's not the worst I've seen -- at least the samples aren't all the same, but it's far from good.

On the other hand, this one seems to go to zero after a few dozen iterations:

public Matrix daydream(Matrix input, int numCycles) {
	for(int i=0; i < numCycles; i++) {
		// False means gaussian reconstruction, but true yields the same results.
		input = reconstruct(predict(input), false);
	}
	return input;
}

I would guess it's because I have bias disabled for these networks.

Aij, my egotistically named AI learning toolkit, is still in development. Last weekend, I went through and did some bug fixes which allow Hinton's results on MNIST to be reproduced.

You can see the full source on Github here:


AIJ library after training on MNIST for a few epochs. No visible or hidden bias. (~100)

Unbiased RBM with Long-Training Time

Another unbiased RBM after several thousand more iterations.

Although I feel like I'm at a pretty comfortable place in my life with respect to programming languages (all the right tools for the right job), I'm still looking for the magical, silver bullet that can do everything I want with perfect convenience, high entertainment value, and minimal headache.

My current repertoire consists of Python for most prototyping and web applications, Java for desktop and games, and C for the occasional embedded and low-level stuff. Throw in some JavaScript for client-side things by necessity.

But not one of these is perfect in any sense of the word. They're all ALGOL descendants, which is fine with me, aside from the nagging voice in my head and the choir of society saying 'try new things!'. I feel like my approach to learning new languages is less about learning the finer details of the language and more about figuring out what can be done with it. For many of my programs (like the ML exercises or games), it's less about how I put together the application and more a question of 'how would I do this mathematically?', which means an easier language makes sense to me. I've experimented with dozens to get outside the comfort zone -- Clojure, Haskell, Scheme, Elixir, C# -- but none of them let me really think about the things I want to _build_. Perhaps/Probably because I'm too lazy to get fluent enough to get cozy.

But if I _were_ hoping for a perfect programming language, what would it be like? I'm not sure I can specify the syntax, but I can probably provide the feature set.

Compiled or JVM Compatible

No interpreter. Python is a great language. I use it regularly and it's still entertaining, but distribution is a total pain in the ass. It lets me build decent web applications with great ease, lets me do anything I want with matrix multiplications and machine learning, and the venv/PIP combo is a great ecosystem. Distribution, though. Goddamn. You can't compile a Python app to native. You can't build a statically linked library. You can't build a DLL. You can't make assembly code for an IC. Well, okay, you kinda' can, but it's not great. I just want an application which can compile down to fast, native code. Better yet, let me use one compiler on Windows or OSX or Linux and build for ALL THE OTHER PLATFORMS. That would be fantastic. Excluding that, I suppose I can accept building for the JVM.

Linear Algebra Library and Operators

Java also fits a lot of these, and while it's a bit more boilerplate-heavy when it comes to web applications (Jetty? Netty? Grizzly? Glassfish? Tomcat? Containers? Servlets? Oh my!), it's really the de facto industry standard. It's also awesome when it comes to distribution. Some people whine about installing the JVM, and I don't blame them; Java devs don't notice the problem because the JDK doesn't include any of the Ask toolbar nonsense. You do what you can. But I digress: linear algebra libraries. Python has numpy, which works brilliantly; Java kinda' has jblas, along with a bunch of others. The Java language, though, doesn't support operator overloading, for better or worse. That means you're left with a.mmul(Matrix.ones(b.getRows(), b.getColumns()).transpose().div(b)).transpose().div(c) instead of a*(1.0/b).T/c. Huge problem? Not really, but it does make it that much harder to transliterate. In a very similar sense, it helps to have one standard image library that gives me pixel-level access and a few useful methods to go from disk image -> picture -> matrix -> picture -> disk image.

Static Typing

Again we come to Python. I actually really hate inferring type. It might mean more writing in Java when it comes to saying DoubleMatrix derp, but it's really helpful to be able to say, "I EXPECT TYPE DOUBLEMATRIX" when I need it. This is especially helpful when you're getting data from a web application and you can't tell if some middleware has cast a variable from an integer to a string.

Easy or Automatic Parallelization

Part of me misses the OpenMP pragmas from C++ and C. Being able to say `#pragma omp parallel` and know that you're automatically going to spin up stuff on a bunch of cores is profoundly satisfying. At the same time, I like the ease with which Java lets me spin up a thread and say, "Do this thing in the background until I join." C's threading library isn't terrible, but Java really has it down nicely. I find myself lusting after a parallel-for in Java (which is getting better thanks to JDK8) and missing Threads when working in C. Combining this with the linear algebra library above would be an absolute godsend, especially if the program can parallelize things on a GPU or FPGA.

Superb Ecosystem and Tools

This probably goes without saying, but having a compiler which points out the problem line, a debugger which lets you step, and a REPL which lets you play can do amazing things. Being able to say "pip install" without worrying is also a great touch. Gradle is a great piece of this, since it gives a lot of the flexibility of Make scripts without the bullshit package management.

Simple, Iteratively Deepening Syntax

Haskell is a cool tool, but it's a lot to take in all at once. With Python, I was able to start simply, experimenting in the style I was accustomed to with plain for-loops and such, and eventually move up to stuff like `blah_map = {x[0]:x[1] for x in itertools.product(foo, bar)}`. Being able to start somewhat familiarly helps a lot, since, as mentioned above, I'm the type that cares more about the product and the math than the tools themselves.

Too much to ask? Probably, but I'm excited to see what the future holds.