Shuffling a dataset is a fairly standard operation in data science. Without it, we risk training a model on a moving target that gradually shifts over time. Most interesting datasets don’t fit in memory, but we’d still like to access them at random. Sure, you can generate a random number and seek to that position on disk, but nothing guarantees that the position is the start of a valid line. Jumping into the middle of a JSON blob and reading to the end of the line is bound to yield a bad time. Here is a brief solution: scan the file once, remember where each line starts, and seek to those positions later.

class LineSeekableFile:
    def __init__(self, seekable):
        self.fin = seekable
        self.line_count = 0
        self.line_map = list() # Map from line index -> file position.
        # Scan the file once, recording where each line starts.
        position = seekable.tell()
        while seekable.readline():
            self.line_map.append(position)
            self.line_count += 1
            position = seekable.tell()

    def __getitem__(self, index):
        # NOTE: This assumes that you're not reading the file sequentially.  For that, just use 'for line in file'.
        self.fin.seek(self.line_map[index])
        return self.fin.readline()

    def __len__(self):
        return self.line_count
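
To show how the same offset index supports an actual shuffle, here is a minimal sketch. The helper names build_line_offsets and iter_shuffled_lines are made up for this illustration, and the in-memory BytesIO stands in for a large JSON-lines file opened in binary mode.

```python
import io
import random

def build_line_offsets(seekable):
    """Return the position at which each line of `seekable` starts."""
    offsets = []
    seekable.seek(0)
    position = seekable.tell()
    while seekable.readline():
        offsets.append(position)
        position = seekable.tell()
    return offsets

def iter_shuffled_lines(seekable, seed=None):
    """Yield every line exactly once, in random order."""
    offsets = build_line_offsets(seekable)
    # Shuffle the small list of offsets rather than the data itself.
    random.Random(seed).shuffle(offsets)
    for offset in offsets:
        seekable.seek(offset)
        yield seekable.readline()

# Stand-in for a large JSON-lines file (assumption for the demo).
data = io.BytesIO(b'{"id": 0}\n{"id": 1}\n{"id": 2}\n{"id": 3}\n')
shuffled = list(iter_shuffled_lines(data, seed=0))
```

Only the list of offsets lives in memory, so the cost is one integer per line rather than the whole dataset.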

It is also available as a Python Gist.

Wrapping up the end of the first day. There’s a chance I’ll do more tonight, but I’ve got to pack for my trip. Progress was faster than expected. I have characters on the screen and movement.

One hiccup I had today was unprojecting from a screen click to a point in physics space. I was calling camera.unproject(blah blah) and couldn’t figure out why my Y-coordinate was flipped when I had correctly set up my view. Whether or not I set y-down in my orthographic camera, the closer I clicked to the bottom of the frame, the larger the number got! It turns out that if you attach an InputListener to a Stage object in libGDX, it is called with x and y values that are already unprojected based on the current camera, so I was unprojecting twice. I figured this out when I made my camera follow the player and saw different values coming in while not moving the mouse. Important safety tip.
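
For the curious, the symptom can be reproduced with a toy model. This is a rough Python sketch and not libGDX code: OrthoCamera and its unproject method are hypothetical stand-ins for a y-up orthographic camera where one pixel equals one world unit.

```python
from dataclasses import dataclass

@dataclass
class OrthoCamera:
    """Toy orthographic camera (hypothetical, not the libGDX class)."""
    x: float       # world x of the camera center
    y: float       # world y of the camera center
    width: float   # viewport width (pixels == world units in this toy)
    height: float  # viewport height

    def unproject(self, sx, sy):
        # Screen y grows downward from the top-left corner;
        # world y grows upward here, so the y axis is flipped.
        wx = sx - self.width / 2 + self.x
        wy = (self.height - sy) - self.height / 2 + self.y
        return wx, wy

camera = OrthoCamera(x=0.0, y=0.0, width=800.0, height=600.0)
click = (400.0, 500.0)  # a click near the bottom of the screen

once = camera.unproject(*click)   # what the listener already hands over
twice = camera.unproject(*once)   # the bug: world coords misread as screen coords

# Move the camera (as if following the player) and repeat the same click.
moved = OrthoCamera(x=100.0, y=0.0, width=800.0, height=600.0)
once_moved = moved.unproject(*click)
```

Here once is (0.0, -200.0), below the camera center as expected, while twice is (-400.0, 500.0): the y value grows as the click moves down the screen. With the camera moved, the same click gives once_moved of (100.0, -200.0), which is why the incoming values change even when the mouse does not.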

The fruits of today’s labor:


Indie Game: The Movie – See it.  The melodrama becomes irksome at points, but it does great things for inspiration.

Kii Keyboard: Download it. It’s everything you could want from a swipe keyboard without the frustrating pieces.  Works beautifully on a Droid 3 and doesn’t interfere when the physical keyboard is in use.

O’Reilly HTML5 and JavaScript Web Apps: Avoid it.  Nothing new or particularly valuable.

O’Reilly Programming Computer Vision with Python: Buy it.  It’s a beautiful companion to the myriad theoretical books in the marketplace.  Be forewarned that it’s almost too applied, favoring implementation over pseudocode.  If you know the math and want to see it used, there is no better source.