In an earlier update, I wrote about implementing computational graphs to learn new programming languages.  I took the time recently to implement one for Kotlin, as I was having a hard time moving forward in Rust.  Here are some implementation decisions I made and how I feel about them looking back:

Implementing operators as standalone classes.

When I built Aij (the Java compute graph), Rust Compute Graph, and the first revision of CG4j, I had a graph class which contained 'add node'/'add operator' methods that appended lambdas to a big list of operations for the forward and reverse passes.  This meant the graph class was nice and portable.  I could drop it into new projects and be on my way.  The downside is serialization: Java's built-in serializer can't handle lambdas, so saving and restoring the graph automatically isn't possible.  Another downside is that lambdas incur a small performance overhead, from what I've seen, even in Rust, where they force an unnecessary heap allocation.  The solution: define an operator or node class and subclass it.
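To make the tradeoff concrete, here's a minimal sketch of the subclass design in Python (the class names and structure are hypothetical, not CG4j's actual API). Because each operation is a plain class rather than a lambda, the whole graph round-trips through the serializer, which is exactly where the lambda approach fell over:

```python
# Sketch of the "operator as a standalone class" design.  Hypothetical names,
# not the real CG4j API.  No lambdas means pickle can serialize the graph.
import pickle

class Node:
    def __init__(self, *inputs):
        self.inputs = list(inputs)

    def forward(self, values):
        raise NotImplementedError

class Input(Node):
    def __init__(self, name):
        super().__init__()
        self.name = name

    def forward(self, values):
        return values[self.name]

class Add(Node):
    def forward(self, values):
        a, b = self.inputs
        return a.forward(values) + b.forward(values)

x, y = Input("x"), Input("y")
z = Add(x, y)
# Round-trips cleanly because there are no lambdas involved.
restored = pickle.loads(pickle.dumps(z))
print(restored.forward({"x": 2.0, "y": 3.0}))  # 5.0
```

The same idea maps directly onto Kotlin or Java: an abstract Node class with forwardOperation/reverseOperation methods, subclassed once per operator.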

Returning Nodes or Integers?

Depending on whether you opt to use nodes or node indices, you might find yourself passing your constructors either a node index plus a graph pointer OR a node.  Passing around a graph is inconvenient, especially if you have to load one from disk; it means you need to deal with the whole chicken-and-egg problem.  Passing around a node is easier by far, but it means you have to handle saving out nodes which might exist on multiple paths and, more problematically...

Recursively computing the forward propagation calls.

One thing I did do correctly in the Rust implementation was calculating the nodes in order.  Each node had an ID/index, and when doing forward-prop I'd just iterate left-to-right across the node list and calculate each one.  In the Kotlin implementation, I opted to memoize and recursively compute the nodes.  In theory that shouldn't have been an issue, since I was caching results, but function calls aren't free, and it means getting stack traces like this:

 at com.josephcatrambone.cg4j.Tensor.add_i(Tensor.kt:256)
 at com.josephcatrambone.cg4j.AddNode.forwardOperation(Node.kt:87)
 at com.josephcatrambone.cg4j.Graph.forward(Graph.kt:33)
 at com.josephcatrambone.cg4j.Graph.forward(Graph.kt:30)
 at com.josephcatrambone.cg4j.Graph.forward(Graph.kt:30)
 at com.josephcatrambone.cg4j.Graph.forward(Graph.kt:30)
 at com.josephcatrambone.cg4j.Graph.forward(Graph.kt:30)
 at com.josephcatrambone.cg4j.Graph.forward(Graph.kt:30)
 at com.josephcatrambone.cg4j.Graph.forward$default(Graph.kt:19)
 at com.josephcatrambone.cg4j.RNN.predict(RNN.kt:129)
 at com.josephcatrambone.cg4j.RNN.predict$default(RNN.kt:97)
 at com.josephcatrambone.cg4j.MainKt.testRNN(Main.kt:107)

The single advantage to this approach is that unused code paths don't get called.  If you have nodes A, B, ..., X, Y, Z, and Z = A+B, then the non-memoized version actually computes the values for everything between A and Z, as opposed to just A, B, and Z for the recursive version.  Depending on how much of your graph is used at any one time, this can be a big savings.  I'm not sure there's a clear winner here.
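The two traversal strategies can be sketched side by side; this is a toy illustration of the tradeoff, not the actual CG4j or Rust code. Each node here is an (operation, dependency-indices) pair:

```python
# Toy sketch contrasting the two forward-pass strategies described above.
def forward_iterative(nodes, inputs):
    # Left-to-right over the whole node list: no deep call stack,
    # but every node is computed even if the target doesn't need it.
    values = []
    for op, deps in nodes:
        values.append(op(inputs, [values[d] for d in deps]))
    return values

def forward_memoized(nodes, inputs, target, cache=None):
    # Recursive + memoized: only ancestors of `target` are computed,
    # at the cost of one stack frame per step along the longest path.
    cache = {} if cache is None else cache
    if target in cache:
        return cache[target]
    op, deps = nodes[target]
    cache[target] = op(inputs, [forward_memoized(nodes, inputs, d, cache)
                                for d in deps])
    return cache[target]

# Node 0 = input 'a', 1 = input 'b', 2 = an expensive node Z never needs,
# 3 = Z = A + B.
nodes = [
    (lambda i, d: i["a"], []),
    (lambda i, d: i["b"], []),
    (lambda i, d: sum(range(10**6)), []),   # skipped by the memoized version
    (lambda i, d: d[0] + d[1], [0, 1]),
]
print(forward_memoized(nodes, {"a": 1.0, "b": 2.0}, 3))  # 3.0
```

The iterative version pays for node 2 every time; the recursive version pays in stack depth, which is exactly where the stack trace above comes from.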

In a previous blog post [link], I mentioned looking for a more perfect programming language.  I realized that I don't generally approach languages with the same project in mind, partially because I don't have a standard ensemble of things to do.  In the hopes of better establishing a baseline, I'll try to enumerate a standard set of problems to tackle when picking up a new language, to see how it "feels".  These problems are more in tune with the kinds of things that I handle and don't necessarily generalize to the broader world, but hopefully, by describing the details that go into each problem, they'll provide insight into what one should look for when evaluating a new language.  If I'm lucky, this will also provide an opportunity to discuss and deconstruct which languages solve a problem particularly well.  These are more complete applications than the ones seen on Rosetta Code [link] and Project Euler [link], and take into account things like the availability of image and linear algebra libraries.

Problem 0: Format

Problem description: This is a high level description of what the problem will be and why we want to solve it.  Hiccups and design considerations will be posted here.

What this shows us: These are the features of the language that this problem should highlight.


Problem 1: Image Cat

Problem description: Sometimes we're SSH'ed into a remote machine and don't want to open up an image to see if it matches our expectations.  "Is test.jpg a logo?  Is training_data_000.png a pokemon picture?  Is it blank?"  Sure, we can do a quick psftp to the remote machine, but that gets tedious.  We can also use imgcat from the creator of iTerm2 [link], but that's not available everywhere.  The user should be able to specify an image as an argument, or the program should accept an image from an input stream like stdin if they're using curl.

What this shows us: Fundamentally, this problem shows us how easy it is to work with images and color data.  It also lets us do CLI argument handling and shows how quickly the application can spin-up and spin-down.
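The core of the problem is the pixel-to-terminal mapping. Here's a rough sketch of that piece in Python; a real implementation would decode the file with an image library, while this version uses a synthetic grayscale array so it stands alone:

```python
# Sketch of the "image cat" core: map pixel intensities (0-255) onto the
# ANSI 256-color grayscale ramp.  A real tool would decode an actual image;
# a synthetic gradient stands in here.
def gray_to_ansi(value):
    # The 256-color grayscale ramp occupies codes 232..255 (24 levels).
    level = 232 + min(23, int(value / 256 * 24))
    return f"\x1b[48;5;{level}m  \x1b[0m"

def render(rows):
    return "\n".join("".join(gray_to_ansi(v) for v in row) for row in rows)

gradient = [[x * 32 for x in range(8)] for _ in range(2)]
print(render(gradient))
```

Whether this is a ten-minute task or an afternoon in a given language mostly comes down to how painful the image decoding and argument handling are, which is the point of the exercise.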

Problem 2: Computational Graph

Problem description: Calculus on computational graphs is the cornerstone of most of the newest generation of deep learning frameworks.  If you've ever used Theano, TensorFlow, Torch, or Keras (along with a handful of others), you've used a computational graph toolkit.

What this shows us: Being able to easily pass around anonymous functions and evaluate named arguments is a nice test of language usability.  Additionally, this should let you see firsthand how good serialization/deserialization is in your language of choice.  Lastly, it should provide some experience with the language's linear algebra libraries.  When I implemented this in Rust the first time, I had problems passing around anonymous functions, with Box<Fn(a,b)->c> everywhere.  Python was pretty easy, though it took some reasoning to be sure I was storing nodes correctly, and being able to pass kwargs to functions made my life much easier.
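The kwargs convenience mentioned above looks something like this in Python; the tiny API here is invented for illustration, not taken from any real library:

```python
# Sketch of feeding named graph inputs via kwargs (hypothetical API).
def make_graph():
    ops = {}

    def define(name, fn, *deps):
        ops[name] = (fn, deps)

    def run(name, **feed):
        # Named inputs arrive as keyword arguments; anything not fed
        # is computed recursively from its dependencies.
        if name in feed:
            return feed[name]
        fn, deps = ops[name]
        return fn(*(run(d, **feed) for d in deps))

    return define, run

define, run = make_graph()
define("sum", lambda a, b: a + b, "x", "y")
define("scaled", lambda s: 2 * s, "sum")
print(run("scaled", x=3, y=4))  # 14
```

In a language without keyword arguments or lightweight closures, both halves of this sketch get noticeably heavier, which is what makes it a useful probe.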

Problem 3: Top-Down/Raycasting Game

Problem description: Build a top-down or first person game where the user must navigate a level.  Include physics, level loading, and possibly an editor.

What this shows us: This shows our language's ability to do real-time input and rendering. Depending on whether or not you decide to do raycasting, it may also test the language's numerical capabilities, and it can highlight the ease of working with heap- or stack-allocated variables, especially vectors.

Problem 4: Shared Library Plugin

Problem description: Combine the computational graph program above with another language like Python or Rust.  Build a graph in your new language.

What this shows us: This shows us what convenience methods we have for our FFI in both the host and target language.  It also demonstrates error handling and portability.

Problem 5: Image Sharing Site

Problem description: Develop a site which allows users to upload pictures after signing in.

What this shows us: This demonstrates how easy it is to get a web application up and running in our language, as well as its interoperability with databases.  It also covers file uploads and template rendering.


I had the distinct honor of working with a number of talented individuals at the unofficial unconference.

We were ambitious, maybe too much so, but I comfort myself by saying that incremental progress is progress nonetheless, and that it's a worthwhile endeavor to report both our successes and our failures. The following is an account of what we tried and our reason for trying.

We began with the problem phrased as follows: "Given a high-level input description of a program, produce source code which meets the requirements." We didn't expect to solve this problem in a day, but figured that generating a dataset would be both feasible in the time frame and worthwhile. Collectively, we decided to produce tuples of problem descriptions, programs, and example outputs. The next question was one of output language: we wanted to generate examples in a language which was simple enough for a machine to learn and also usable by a human creating the programming problems. Python was the favorite language among the group, but it had its limitations -- it was a big language with lots of minutiae. Learning to generate Python would require understanding list comprehensions and more syntactic subtlety than we felt was strictly necessary. Instead, we opted for a high-level Turing-complete computational graph representation of a language (basically, an AST). The graph could be "compiled" to a description or "compiled" to Python and then run, giving all the required outputs of the problem.
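The "compile the same graph two ways" idea can be sketched on a toy node set; this is an illustrative reconstruction of the approach, not the hackathon code, and the node format is invented:

```python
# Toy version of the dual-target AST: one graph, rendered either as
# runnable Python or as an English description.  Nodes are ("op", left,
# right) tuples; leaves are variable names.
def to_python(node):
    if isinstance(node, str):
        return node
    op, left, right = node
    return f"({to_python(left)} {op} {to_python(right)})"

def to_english(node):
    if isinstance(node, str):
        return node
    op, left, right = node
    words = {"+": "plus", "*": "times", "-": "minus"}
    return f"{to_english(left)} {words[op]} {to_english(right)}"

ast = ("-", ("*", "a", "c"), "b")
src = to_python(ast)                         # "((a * c) - b)"
print(to_english(ast))                       # "a times c minus b"
print(eval(src, {"a": 2, "b": 1, "c": 3}))   # 5
```

Both renderings come from the same structure, so description, program, and output stay consistent by construction, which is what made the dataset generation tractable in a day.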

The next issue was generating programming problems of sufficient variety to be useful. Too few examples would basically guarantee overfitting, so the manual construction of programming examples was out. Too repetitive a set would mean the model would ignore the details of the English language and pick up only on the structure of the sentences. That seemed okay at the time -- we figured we could remove details without too much effort to make problems more 'programming-challenge-esque'. It quickly became apparent that selecting which details to omit to frame the problem was almost as big a challenge as the original.

Our graph approach was producing problems like, "Given variables a, b, c, return a*c - b if c > a else c*b." Not a particularly interesting problem, since it basically involves a direct translation from the description to machine code, and we wanted to avoid building "a compiler, but with ML."

The remainder of the day was split in two: the first part went to trying to construct more elaborate program descriptions and more subtle, interesting problems; the latter went to an entirely different approach, where we decided to try using an autoencoder to learn the Python AST and another autoencoder to learn the structure of English, then bridge the two using our limited dataset scraped from Project Euler and from OpenAI's sample dataset.

I'm not yet sure what we'd do differently. It seemed we made the right choices given the available information most of the time -- the biggest oversight to me remains misjudging the quality of the English output graph. I have to wonder if that could be improved through clever graph optimization tricks. Perhaps that's a project for another hackathon.

On a personal note, I was overwhelmingly impressed by the depth of skill among the people there. As much as I don't enjoy being the least smart person in a room, that's a good state for a programming collaboration. I look forward to working with everyone again after I've had a chance to study some more. Maybe a lot more.

Thank you to everyone who participated. It has been a pleasure.

UPDATE: This code is now available in both Java and Python!

I've been on an automatic differentiation kick ever since reading about dual numbers on Wikipedia.

I implemented a simple forward-mode autodiff system in Rust, thinking it would allow me to do ML faster. I failed to realize/read that forward differentiation, while simpler, requires one forward pass to get the derivative of ALL outputs with respect to ONE input variable. Reverse mode, in contrast, gives you the derivative of one output with respect to ALL inputs in a single pass.

That is to say, if I had f(x, y, z) = [a, b, c], forward mode would give me da/dx, db/dx, dc/dx in a single pass. Reverse mode would give me da/dx, da/dy, da/dz in a single pass.
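The forward-mode half of this is exactly the dual-number trick from the Wikipedia rabbit hole: every value carries a derivative alongside it, and the chain rule falls out of operator overloading. A minimal Python sketch (mine was in Rust, but the mechanics are identical):

```python
# Forward-mode AD via dual numbers: seed the input you care about with
# derivative 1, everything else with 0, and read the derivative off the end.
class Dual:
    def __init__(self, value, deriv=0.0):
        self.value, self.deriv = value, deriv

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

# f(x, y) = x*x + x*y, differentiated with respect to x by seeding x with 1.
x, y = Dual(3.0, 1.0), Dual(2.0, 0.0)
f = x * x + x * y
print(f.value, f.deriv)  # 15.0, and df/dx = 2x + y = 8.0
```

Getting df/dy requires a second pass with y seeded instead, which is precisely the "one input per pass" limitation described above.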

Forward mode is really easy. I have a repo with code changes here:

Reverse mode took me a while to figure out, mostly because I was confused about how adjoints worked. I'm still confused, but I'm now so accustomed to the strangeness that I'm not noticing it. Here's some simple, single-variable reverse-mode autodiff. It's about 100 lines of Python:
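Since the hundred-line version lives at the link rather than inline, here is a compressed sketch of the same idea: the forward pass records each operation's inputs and local gradients, and the backward pass pushes adjoints from the output down. This is my own condensed illustration, not the linked code:

```python
# Compressed reverse-mode sketch.  Each Var remembers its parents and the
# local gradient along each edge; backward() accumulates adjoints.
class Var:
    def __init__(self, value, parents=()):
        self.value, self.parents, self.adjoint = value, parents, 0.0

    def __add__(self, other):
        return Var(self.value + other.value, [(self, 1.0), (other, 1.0)])

    def __mul__(self, other):
        return Var(self.value * other.value,
                   [(self, other.value), (other, self.value)])

    def backward(self, seed=1.0):
        # The adjoint of a node is d(output)/d(this node).
        self.adjoint += seed
        for parent, local_grad in self.parents:
            parent.backward(seed * local_grad)

x, y = Var(3.0), Var(2.0)
f = x * x + x * y            # f = x^2 + x*y
f.backward()
print(x.adjoint, y.adjoint)  # df/dx = 2x + y = 8.0, df/dy = x = 3.0
```

One pass of backward() fills in the derivative of f with respect to every input at once, which is why reverse mode is the one ML frameworks actually want.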

I did my masters in machine learning, so I'm a little touchy on the subject. It always stands out to me when someone says, 'big data punishes poor people' because it sounds like "polynomials are anti-Semitic" or "bolt cutters are racist".

Machine learning is a tool like any other, and it can be used for nefarious purposes. I don't think it's an unreasonable assertion that things like search-bubbling actually contribute negatively to echo-chamber effects, as they result in people seeing only data that reinforces their viewpoints (as a side effect of being more relevant). Casting a blanket statement like this, however, strikes me as catchy but unnecessarily negative.

I hope the book doesn't overlook the positive contributions that data mining has made, like discovering genetic markers for diseases, finding new antibiotics, finding treatments for cancers, decreasing water consumption in agriculture, tracking diminishing animal populations, or even more mundane things like providing automatic subtitles to videos for the hearing impaired.

The most interesting question I have to raise is this: is it _more_ humane to remove the biases of a human? Humans are REALLY good at seeing patterns. We're so good at seeing patterns that we see them where there are none -- we see Jesus in toast, we see faces in the sky, we see people as part of a group. That last one is racist, and while we can't alter our perceptions we can be made aware of them and do everything we can to try and work around our 'feelings'. Machines are getting good at recognizing patterns too, now. They even beat us in a lot of cases. If we train a model with racist data, though, it will generate racist predictions. Can we efficiently sanitize data to be sure that it's fair to everyone involved? Is it inevitable that people will abuse statistics to further their own ends? Equally curious: if data suggests a 99% chance that someone will default on a loan, should we chide the operator of the tool for using it? What if they're trying to protect their own best interests? I don't know if there's a winner there.

There are a lot of answers I don't have and, ironically, an inability to predict the future, but I do have an emotional response to the article: it's unpleasant and bothersome. I can't say it's wrong, but I can say it's an incomplete picture and that it furthers the author's agenda: making a boogeyman of an emerging technology. I don't like that.

tl;dr: This is a nuanced topic and I'm dubious that the author can reasonably cover it, fearing instead that it devolves into fear-mongering.