Joseph's Blog: September 2016

UPDATE: This code is now available in both Java and Python!

I’ve been on an automatic differentiation kick ever since reading about dual numbers on Wikipedia.
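Here is a rough illustration of the idea. It is a minimal sketch, not the Rust code from the repo, and `Dual` and `derivative` are names made up for this example. A dual number a + bε carries a value and its derivative together. Because ε² = 0 by definition, multiplying two of them applies the product rule automatically:

```python
class Dual:
    """A dual number real + eps * ε, where ε * ε == 0.

    `real` carries the value and `eps` carries the derivative.
    """

    def __init__(self, real, eps=0.0):
        self.real = real
        self.eps = eps

    @staticmethod
    def _lift(other):
        # Treat plain numbers as constants: their derivative part is 0.
        return other if isinstance(other, Dual) else Dual(other, 0.0)

    def __add__(self, other):
        other = Dual._lift(other)
        # (a + bε) + (c + dε) = (a + c) + (b + d)ε
        return Dual(self.real + other.real, self.eps + other.eps)

    __radd__ = __add__  # Addition commutes, so 3 + x works too.

    def __mul__(self, other):
        other = Dual._lift(other)
        # (a + bε)(c + dε) = ac + (ad + bc)ε + bdε², and ε² = 0
        return Dual(self.real * other.real,
                    self.real * other.eps + self.eps * other.real)

    __rmul__ = __mul__  # Multiplication commutes, so 3 * x works too.


def derivative(f, x):
    # Seed the derivative part with 1 (dx/dx = 1) and read it off the result.
    return f(Dual(x, 1.0)).eps


def f(x):
    return x * x + 3 * x  # f'(x) = 2x + 3


print(derivative(f, 2.0))  # 7.0
```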

I implemented a simple forward-mode autodiff system in Rust, thinking it would let me do ML faster. What I failed to realize is that forward-mode differentiation, while simpler, needs one forward pass per input variable: each pass gives you the derivatives of ALL outputs with respect to ONE input. Reverse mode, in contrast, gives you the derivatives of ONE output with respect to ALL inputs in a single pass. Since ML usually means one scalar loss and many parameters, reverse mode is the one you want there.

That is to say, if I had f(x, y, z) = [a, b, c], forward mode would give me da/dx, db/dx, dc/dx in a single pass. Reverse mode would give me da/dx, da/dy, da/dz in a single pass.
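To make the pass counting concrete, here is a small sketch using (value, tangent) pairs. The function f(x, y, z) = [x·y, y + z, x·z] is an arbitrary example chosen for illustration, and so are the helper names. Each forward pass seeds one input and yields one column of the Jacobian. Collecting the row da/dx, da/dy, da/dz therefore takes one pass per input:

```python
# Forward mode on (value, tangent) pairs.

def add(p, q):
    return (p[0] + q[0], p[1] + q[1])

def mul(p, q):
    # Product rule on the tangent part.
    return (p[0] * q[0], p[0] * q[1] + p[1] * q[0])

def f(x, y, z):
    # Hypothetical example: f(x, y, z) = [a, b, c]
    a = mul(x, y)
    b = add(y, z)
    c = mul(x, z)
    return [a, b, c]

def forward_pass(values, seed_index):
    """One forward-mode pass: derivatives of ALL outputs w.r.t. ONE input."""
    # Seed the chosen input with tangent 1 and every other input with 0.
    seeded = [(v, 1.0 if j == seed_index else 0.0) for j, v in enumerate(values)]
    return [tangent for _, tangent in f(*seeded)]

point = [2.0, 3.0, 5.0]  # x, y, z

# One pass seeded on x gives [da/dx, db/dx, dc/dx].
print(forward_pass(point, 0))  # [3.0, 0.0, 5.0]

# The row [da/dx, da/dy, da/dz] needs one pass per input: three here.
columns = [forward_pass(point, i) for i in range(len(point))]
row_a = [column[0] for column in columns]
print(row_a)  # [3.0, 2.0, 0.0]
```

A reverse pass over the same function would produce a whole row per pass instead. That is why reverse mode wins when there are many inputs and one output.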

Forward mode is really easy. The code is in my repo here: https://github.com/JosephCatrambone/RustML

Reverse mode took me a while to figure out, mostly because I was confused about how adjoints worked. I’m still confused, but I’m now so accustomed to the strangeness that I’m not noticing it. Here’s some simple, single-variable reverse-mode autodiff. It’s about 100 lines of Python:

	#!/usr/bin/env python
	# JAD: Joseph's Automatic Differentiation


	class Graph(object):
	    def __init__(self):
	        self.names = list()
	        self.operations = list()
	        self.derivatives = list()  # A list of LISTS: derivatives[n][k] is d(node n)/d(its k-th input).
	        self.node_inputs = list()  # node_inputs[n] holds the indices of node n's input nodes.
	        self.shapes = list()
	        self.graph_inputs = list()
	        self.forward = list()  # Cleared on forward pass.
	        self.adjoint = list()  # Cleared on reverse pass.

	    def get_output(self, input_set, node=-1):
	        # Nodes are stored in the order they were added, so every node's
	        # inputs have already been evaluated by the time we reach it.
	        self.forward = list()
	        for op in self.operations:
	            self.forward.append(op(input_set))
	        return self.forward[node]

	    def get_gradient(self, input_set, node, forward_data=None):
	        if forward_data is not None:
	            self.forward = forward_data
	        else:
	            self.get_output(input_set)
	        if node < 0:
	            node += len(self.operations)
	        # Initialize adjoints to 0 except our target, which is 1.
	        self.adjoint = [0.0] * len(self.forward)
	        self.adjoint[node] = 1.0
	        # Walk the nodes in reverse insertion order (a reverse topological order).
	        # By the time we reach a node, every node that consumes it has already
	        # pushed its contribution, so its adjoint is complete. Each node is
	        # visited exactly once, so shared paths are summed, not double-counted.
	        for current_node in range(node, -1, -1):
	            for arg_index, input_node in enumerate(self.node_inputs[current_node]):
	                # Chain rule: adjoint[input] += adjoint[current] * d(current)/d(input)
	                dop = self.derivatives[current_node][arg_index]
	                self.adjoint[input_node] += self.adjoint[current_node] * dop(input_set)
	        return self.adjoint

	    def get_shape(self, node):
	        return self.shapes[node]

	    def add_input(self, name, shape):
	        index = len(self.names)
	        self.names.append(name)
	        self.operations.append(lambda inputs: inputs[name])
	        self.derivatives.append([])  # Inputs are leaves: nothing to propagate into.
	        self.node_inputs.append([])
	        self.graph_inputs.append(index)
	        self.shapes.append(shape)
	        return index

	    def add_add(self, name, left, right):
	        index = len(self.names)
	        self.names.append(name)
	        self.operations.append(lambda inputs: self.forward[left] + self.forward[right])
	        # d(l + r)/dl = 1 and d(l + r)/dr = 1
	        self.derivatives.append([lambda inputs: 1, lambda inputs: 1])
	        self.node_inputs.append([left, right])
	        self.shapes.append(self.get_shape(left))
	        return index

	    def add_multiply(self, name, left, right):
	        index = len(self.names)
	        self.names.append(name)
	        self.operations.append(lambda inputs: self.forward[left] * self.forward[right])
	        # d(l * r)/dl = r and d(l * r)/dr = l, read from the forward pass.
	        self.derivatives.append([lambda inputs: self.forward[right], lambda inputs: self.forward[left]])
	        self.node_inputs.append([left, right])
	        self.shapes.append(self.get_shape(left))
	        return index


	if __name__ == "__main__":
	    g = Graph()
	    x = g.add_input("x", (1, 1))
	    y = g.add_input("y", (1, 1))
	    a = g.add_add("a", x, y)
	    b = g.add_multiply("b", a, x)

	    input_map = {'x': 2, 'y': 3}
	    print(g.get_output(input_map))  # 10
	    # b = (x + y) * x = x^2 + xy, so db/dx = 2x + y = 7 and db/dy = x = 2.
	    print(g.get_gradient(input_map, b))  # [7.0, 2.0, 2.0, 1.0]: db/dx, db/dy, db/da, db/db
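Since adjoints were the confusing part, here is the example graph from `__main__` worked by hand. Write $\bar{v} = \partial b / \partial v$ for the adjoint of node $v$, seed $\bar{b} = 1$, then walk the graph backwards, summing every path into a node:

$$
\begin{aligned}
a &= x + y = 5, \qquad b = a\,x = 10 \\
\bar{b} &= 1 \\
\bar{a} &= \bar{b}\,\frac{\partial b}{\partial a} = \bar{b}\,x = 2 \\
\bar{y} &= \bar{a}\,\frac{\partial a}{\partial y} = \bar{a} \cdot 1 = 2 \\
\bar{x} &= \bar{b}\,\frac{\partial b}{\partial x} + \bar{a}\,\frac{\partial a}{\partial x} = \bar{b}\,a + \bar{a} \cdot 1 = 5 + 2 = 7
\end{aligned}
$$

Note that $x$ collects two contributions, one directly through $b$ and one through $a$. Expanding $b = x^2 + xy$ gives $\partial b / \partial x = 2x + y = 7$ directly, which agrees.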

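A cheap way to catch bugs in a reverse pass is to compare it against central finite differences. This sketch re-implements the example b = (x + y)·x as plain Python and estimates db/dx and db/dy at x = 2, y = 3. The function names and the step size `h = 1e-6` are arbitrary choices, not part of the code above. A correct reverse pass should report roughly 7 and 2:

```python
def b(x, y):
    # Same graph as the example: a = x + y, b = a * x
    a = x + y
    return a * x


def central_difference(f, args, i, h=1e-6):
    """Approximate df/d(args[i]) with a central difference of step h."""
    up = list(args)
    down = list(args)
    up[i] += h
    down[i] -= h
    return (f(*up) - f(*down)) / (2 * h)


point = (2.0, 3.0)  # x, y
db_dx = central_difference(b, point, 0)
db_dy = central_difference(b, point, 1)
print(round(db_dx, 4), round(db_dy, 4))  # 7.0 2.0
```

Central differences carry an O(h²) truncation error, which vanishes here because b is quadratic, plus a small floating-point error. They are fine as a spot check but too slow to replace autodiff when there are many inputs.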

I did my master’s in machine learning, so I’m a little touchy on the subject. It always stands out to me when someone says “big data punishes poor people,” because it sounds like “polynomials are anti-Semitic” or “bolt cutters are racist.”

Machine learning is a tool like any other, and it can be used for nefarious purposes. I don’t think it’s unreasonable to assert that things like search bubbling contribute to echo-chamber effects, since they result in people seeing only data that reinforces their viewpoints (as a side effect of that data being more relevant). Casting a blanket statement like this, however, strikes me as catchy but unnecessarily negative.

I hope the book doesn’t overlook the positive contributions that data mining has made, like discovering genetic markers for diseases, finding new antibiotics, finding treatments for cancers, decreasing water consumption in agriculture, tracking diminishing animal populations, or even more mundane things like providing automatic subtitles to videos for the hearing impaired.

The most interesting question I have to raise is this: is it _more_ humane to remove the biases of a human? Humans are REALLY good at seeing patterns. We’re so good at seeing patterns that we see them where there are none: we see Jesus in toast, we see faces in the sky, we see people as part of a group. That last one is racist, and while we can’t alter our perceptions, we can be made aware of them and do everything we can to try to work around our ‘feelings’. Machines are getting good at recognizing patterns too, now; they even beat us in a lot of cases. If we train a model with racist data, though, it will generate racist predictions. Can we efficiently sanitize data to be sure that it’s fair to everyone involved? Is it inevitable that people will abuse statistics to further their own ends? Equally curious: if data suggests a 99% chance that someone will default on a loan, should we chide the operator of the tool for using it? What if they’re trying to protect their own best interests? I don’t know if there’s a winner there.

There are a lot of answers I don’t have and, ironically, an inability to predict the future, but I do have an emotional response to the article: it’s unpleasant and bothersome. I can’t say it’s wrong, but I can say it’s an incomplete picture and that it furthers the author’s agenda: making a boogeyman of an emerging technology. I don’t like that.

tl;dr: This is a nuanced topic, and I’m dubious that the author can reasonably cover it. I fear it instead devolves into fear-mongering.


Monthly Archives: September 2016

Automatic Differentiation

“Weapons of Math Destruction” – An Alternative View