
In a previous blog post [link], I mentioned looking for a more perfect programming language. I realized that I don’t generally approach languages with the same project in mind, partially because I don’t have a standard ensemble of things to do. In the hopes of establishing a better baseline, I’ll try to enumerate a standard set of problems to tackle when picking up a new language, so I can see how it “feels”. These problems are more in tune with the kinds of things I work on and don’t necessarily generalize to the broader world, but by describing the details that go into each problem, I hope they’ll provide insight into what one should look for when evaluating a new language. If I’m lucky, this will also provide an opportunity to discuss and deconstruct which languages solve a problem particularly well. These are more complete applications than the ones seen on Rosetta Code [link] and Project Euler [link], and take into account things like the availability of image and linear algebra libraries.

Problem 0: Format

Problem description: This is a high-level description of what the problem is and why we want to solve it. Hiccups and design considerations will be noted here.

What this shows us: These are the features of the language that this problem should highlight.

 

Problem 1: Image Cat

Problem description: Sometimes we’re SSH’ed into a remote machine and can’t easily open an image to see if it matches our expectations. “Is test.jpg a logo? Is training_data_000.png a pokemon picture? Is it blank?” Sure, we can do a quick psftp to the remote machine, but that gets tedious. We can also use imgcat from the creator of iTerm2 [link], but that’s not available everywhere. The user should be able to specify an image as an argument, or the program should accept an image from an input stream like stdin if they’re piping from curl.

What this shows us: Fundamentally, this problem shows us how easy it is to work with images and color data. It also lets us do CLI argument handling and shows how quickly the application can spin up and spin down.
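To make the target concrete, here’s a rough sketch of what a Python solution might look like, assuming Pillow is available; the terminal dimensions and the 24-bit color escape are arbitrary choices for illustration, not part of the problem statement:

    #!/usr/bin/env python
    # Terminal image preview: read a path argument or raw bytes from stdin and
    # print ANSI 24-bit background-color blocks.  Rough sketch, assuming Pillow.
    import sys
    from io import BytesIO
    from PIL import Image

    def load_image(argv):
        # Prefer a path argument; otherwise read bytes from stdin (e.g. piped from curl).
        if len(argv) > 1:
            return Image.open(argv[1])
        return Image.open(BytesIO(sys.stdin.buffer.read()))

    def main():
        img = load_image(sys.argv).convert("RGB")
        img.thumbnail((80, 40))  # fit a typical terminal, preserving aspect ratio
        for y in range(img.height):
            row = []
            for x in range(img.width):
                r, g, b = img.getpixel((x, y))
                row.append("\x1b[48;2;%d;%d;%dm " % (r, g, b))
            print("".join(row) + "\x1b[0m")  # reset colors at the end of each row

    if __name__ == "__main__":
        main()

Something like curl -s https://example.com/test.jpg | python imgcat.py (filename hypothetical) would then exercise the stdin path.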

Problem 2: Computational Graph

Problem description: Calculus on computational graphs is the cornerstone of most of the newest generation of deep learning frameworks. If you’ve ever used Theano, TensorFlow, Torch, or Keras (along with a handful more), you’ve used a computational graph toolkit. The task here is to implement a small toolkit of your own: build expressions as graphs of nodes, evaluate them, and propagate gradients backward through them.

What this shows us: Being able to easily pass around anonymous functions and evaluate functions with named arguments is a nice test of language usability. Additionally, this should let you see firsthand how good the serialization/deserialization story is in your language of choice. Lastly, this should provide some experience with the language’s linear algebra libraries. When I first implemented this in Rust, I had problems passing around anonymous functions, with Box<Fn(a,b)->c> everywhere. Python was pretty easy, though it took some reasoning to be sure I was storing nodes correctly, and being able to pass kwargs to functions made my life much easier.
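For reference, here’s a bare-bones sketch of the kind of node I have in mind, in Python; it’s scalar-only and reverse-mode, and the names and the naive traversal are my own simplifications rather than anything a real toolkit would ship:

    # Scalar-only, reverse-mode sketch of a computational graph node.  A real
    # toolkit would use tensors, a topological sort, and serialization; this is
    # only meant to show the shape of the problem.
    class Node:
        def __init__(self, value, parents=()):
            self.value = value
            self.parents = parents
            self.grad = 0.0
            self._backward = lambda grad: None  # leaves have nothing to do

        def __add__(self, other):
            out = Node(self.value + other.value, (self, other))
            def backward(grad):
                self.grad += grad
                other.grad += grad
            out._backward = backward
            return out

        def __mul__(self, other):
            out = Node(self.value * other.value, (self, other))
            def backward(grad):
                self.grad += grad * other.value
                other.grad += grad * self.value
            out._backward = backward
            return out

        def backprop(self):
            # Naive traversal; fine for this expression, wrong in general
            # without a topological ordering of the graph.
            self.grad = 1.0
            stack = [self]
            while stack:
                node = stack.pop()
                node._backward(node.grad)
                stack.extend(node.parents)

    a, b = Node(3.0), Node(4.0)
    y = a * b + a          # y = a*b + a, so dy/da = b + 1 and dy/db = a
    y.backprop()
    print(y.value, a.grad, b.grad)  # 15.0 5.0 3.0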

Problem 3: Top-Down/Raycasting Game

Problem description: Build a top-down or first-person game where the user must navigate a level. Include physics, level loading, and possibly an editor.

What this shows us: This shows our language’s ability to do real-time input and rendering. Depending on whether or not you decide to do raycasting, it may also test the language’s numerical capabilities. It can also highlight how easy it is to work with heap- and stack-allocated variables, especially vectors.
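A minimal sketch of the plumbing this problem exercises, assuming pygame (or an equivalent windowing library) is available; this is just the input/update/draw loop, not the game:

    import pygame

    # Bare input/update/draw loop: poll the keyboard, move a square, redraw at
    # roughly 60 FPS.  Sketch only, assuming pygame is installed.
    pygame.init()
    screen = pygame.display.set_mode((640, 480))
    clock = pygame.time.Clock()
    x, y, speed = 320.0, 240.0, 200.0  # position in pixels, speed in pixels/second
    running = True
    while running:
        dt = clock.tick(60) / 1000.0  # seconds since the last frame
        for event in pygame.event.get():
            if event.type == pygame.QUIT:
                running = False
        keys = pygame.key.get_pressed()
        x += (keys[pygame.K_RIGHT] - keys[pygame.K_LEFT]) * speed * dt
        y += (keys[pygame.K_DOWN] - keys[pygame.K_UP]) * speed * dt
        screen.fill((20, 20, 20))
        pygame.draw.rect(screen, (200, 80, 80), pygame.Rect(int(x), int(y), 16, 16))
        pygame.display.flip()
    pygame.quit()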

Problem 4: Shared Library Plugin

Problem description: Combine the computational graph program above with another language like Python or Rust: build the graph library in your new language, compile it as a shared library, and drive it from the host language.

What this shows us: This shows us what convenience methods we have for our FFI in both the host and target language.  It also demonstrates error handling and portability.
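To keep this post to one language, here’s a sketch of only the Python host side using ctypes; the library name (libcompgraph.so) and the exported functions are hypothetical stand-ins for whatever the new language actually exposes:

    import ctypes

    # Host side only: load a shared library built by the "new" language and call
    # into it.  The library name and exported functions are hypothetical.
    lib = ctypes.CDLL("./libcompgraph.so")

    lib.graph_new.restype = ctypes.c_void_p
    lib.graph_add_constant.argtypes = [ctypes.c_void_p, ctypes.c_double]
    lib.graph_add_constant.restype = ctypes.c_int
    lib.graph_eval.argtypes = [ctypes.c_void_p, ctypes.c_int]
    lib.graph_eval.restype = ctypes.c_double
    lib.graph_free.argtypes = [ctypes.c_void_p]

    graph = lib.graph_new()
    node = lib.graph_add_constant(graph, 3.0)
    print(lib.graph_eval(graph, node))
    lib.graph_free(graph)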

Problem 5: Image Sharing Site

Problem description: Develop a site which allows users to upload pictures after signing in.

What this shows us: This demonstrates how easy it is to get a web application up and running in our language, as well as how well it interoperates with databases. It also covers file uploads and template rendering.
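A minimal sketch of the upload path using Flask, assuming that’s the framework of choice; sign-in, the database, and templates are deliberately left out, and the route and directory names are made up for illustration:

    import os
    from flask import Flask, request, redirect, url_for, send_from_directory
    from werkzeug.utils import secure_filename

    # Minimal upload endpoint.  Sign-in, the database, and HTML templates are
    # left out; directory and route names are arbitrary.
    UPLOAD_DIR = "uploads"
    os.makedirs(UPLOAD_DIR, exist_ok=True)
    app = Flask(__name__)

    @app.route("/upload", methods=["POST"])
    def upload():
        f = request.files["image"]
        name = secure_filename(f.filename)
        f.save(os.path.join(UPLOAD_DIR, name))
        return redirect(url_for("show", name=name))

    @app.route("/images/<name>")
    def show(name):
        return send_from_directory(UPLOAD_DIR, name)

    if __name__ == "__main__":
        app.run(debug=True)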

 

I had the distinct honor of working with a number of talented individuals at the unofficial unconference.

We were ambitious, maybe too ambitious, but I comfort myself by saying that incremental progress is progress nonetheless, and that it’s a worthwhile endeavor to report both our successes and our failures. The following is an account of what we tried and our reasons for trying it.

We began with the problem phrased as follows: “Given a high-level input description of a program, produce source code which meets the requirements.” We didn’t expect to solve this problem in a day, but figured that generating a dataset would be both feasible in the time frame and worthwhile. Collectively, we decided to produce tuples of problem descriptions, programs, and example outputs. The next question was one of output language: we wanted to generate examples in a language which was simple enough for a machine to learn and also usable by a human creating the programming problems. Python was the favorite language among the group, but it had its limitations: it is a big language with lots of minutiae. Learning to generate Python would require grasping list comprehensions and more syntactic subtlety than we felt was strictly necessary. Instead, we opted for a high-level, Turing-complete computational graph representation of a language (basically, an AST). The graph could be “compiled” to an English description or “compiled” to Python and then run, giving us all the required outputs for a problem.

The next issue was generating programming problems of sufficient variety to be useful. Too few examples would basically guarantee overfitting, so manually constructing the examples was out. Too much repetition would mean the model would ignore the details of the English and pick up only on the structure of the sentences. That seemed okay at the time; we figured we could strip away details without too much effort to make the problems more ‘programming-challenge-esque’. It quickly became apparent that selecting which details to omit when framing a problem was almost as big a challenge as the original one.

Our graph approach was producing problems like “Given variables a, b, c, return a*c - b if c > a else c*b.” Not a particularly interesting problem, since it basically involves a direct translation from the description to machine code, and we wanted to avoid building “a compiler, but with ML.”
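For concreteness, here’s a toy reconstruction of that graph-to-text/graph-to-Python idea in Python; the node set and method names are invented here for illustration and aren’t the code we actually wrote:

    # Toy version of the idea: the same graph node renders to an English
    # description or to Python source.  Node set and names are invented here.
    class Var:
        def __init__(self, name):
            self.name = name
        def to_english(self):
            return self.name
        def to_python(self):
            return self.name

    class BinOp:
        def __init__(self, op, left, right):
            self.op, self.left, self.right = op, left, right
        def to_english(self):
            return "%s %s %s" % (self.left.to_english(), self.op, self.right.to_english())
        def to_python(self):
            return "(%s %s %s)" % (self.left.to_python(), self.op, self.right.to_python())

    class IfElse:
        def __init__(self, cond, then, other):
            self.cond, self.then, self.other = cond, then, other
        def to_english(self):
            return "return %s if %s, otherwise %s" % (
                self.then.to_english(), self.cond.to_english(), self.other.to_english())
        def to_python(self):
            return "%s if %s else %s" % (
                self.then.to_python(), self.cond.to_python(), self.other.to_python())

    expr = IfElse(BinOp(">", Var("c"), Var("a")),
                  BinOp("-", BinOp("*", Var("a"), Var("c")), Var("b")),
                  BinOp("*", Var("c"), Var("b")))
    print("Given variables a, b, c, " + expr.to_english())
    print("lambda a, b, c: " + expr.to_python())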

The remainder of the day was split. We spent the first part trying to construct more elaborate program descriptions and more subtle, interesting problems. We spent the latter part in an entirely different mode, where we decided to try using one autoencoder to learn the Python AST and another to learn the structure of English, then bridge the two using our limited dataset scraped from Project Euler and from OpenAI’s sample dataset.

I’m not yet sure what we’d do differently. It seems we made the right choices given the available information most of the time; the biggest oversight, to me, remains misjudging the quality of the English output from the graph. I have to wonder whether that could be improved through clever graph optimization tricks. Perhaps that’s a project for another hackathon.

On a personal note, I was overwhelmingly impressed by the depth of skill shown by the people there. As much as I don’t enjoy being the least smart person in a room, that’s a good state for a programming collaboration. I look forward to working with everyone again after I’ve had a chance to study some more. Maybe a lot more.

Thank you to everyone who participated. It has been a pleasure.