Status

Motion capture is neat! There are a bunch of tools available now: iMotion, assorted Kinect libraries, and so on. I’ve tried most of them but haven’t found anything I particularly enjoy using. Most markerless systems suffer from drift, poor localization, jitter, and a host of other issues. Inertial motion capture units drift too, and they misbehave when metallic objects or magnetic fields are nearby. I don’t think I can get around the markerless issues without putting together a fiducial system, but I can still try to make a useful piece of software.

Motion Capture Mk5 is a system that captures human poses and streams the data across the network to connected clients. The protocol should be absurdly simple, simple enough that a client can be written in almost any language in the span of a day. That makes the tool multi-purpose: it can stream capture data to Blender for animation, to VRChat for real-time interaction, or to whatever other software for whatever other reason. Running the application on a networked machine means that if you, like me, have a somewhat constrained living space, you can capture from a larger volume while the machine sits off to the side and out of the way.
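To make “absurdly simple” concrete, here’s the kind of wire format I have in mind: one JSON object per frame, newline-delimited, pushed over a plain TCP socket. Everything below (field names, units, even the transport) is a placeholder until I actually pin the protocol down; think of it as a sketch, not a spec.

```rust
// Sketch only. Assumed Cargo deps: serde (with the "derive" feature) and serde_json.
use serde::Serialize;

#[derive(Serialize)]
struct Joint {
    name: &'static str, // e.g. "head"; the actual joint names are still TBD
    pos: [f32; 3],      // position in meters, capture space
    rot: [f32; 4],      // rotation as a quaternion (x, y, z, w)
}

#[derive(Serialize)]
struct Frame {
    t: f64, // seconds since capture started
    joints: Vec<Joint>,
}

fn main() {
    let frame = Frame {
        t: 0.016,
        joints: vec![Joint {
            name: "head",
            pos: [0.0, 1.7, 0.0],
            rot: [0.0, 0.0, 0.0, 1.0],
        }],
    };
    // One line per frame; in the real tool this goes down a TcpStream
    // instead of stdout.
    println!("{}", serde_json::to_string(&frame).unwrap());
}
```

A client in any language is then a socket read loop plus a JSON parser, which is about as close to a one-day client as I can get.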

What skills do I hope to practice along the way?

  • UI development in Rust with mixed 2D/3D
  • Computer vision (if fiducials come into play)
  • Model integration into Rust via Tract or Candle (for depth estimation or pose estimation)
  • Rust networking and IPC

What kinds of deliverables will we see at the end?

I’d like to finish the month with an application that captures and sends the estimated positions and rotations of 13 joints. Ideally, the model will have depth estimation internally, too, so I can duplicate and modify the code into a SLAM tool. A Blender client or Godot client would not be out of the question.

Open Questions:

Should we start with a pre-canned pose model? There’s a risk that getting an existing model building and integrated takes more time than training one from scratch, but it could be a good jumping-off point.
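For scale, the “integrated” half with tract is not much code. A rough sketch, loosely following tract’s README; the model file name, input shape, and zeroed dummy frame are all placeholders:

```rust
use tract_onnx::prelude::*;

fn main() -> TractResult<()> {
    // Placeholder file and shape; a real pose model will differ.
    let model = tract_onnx::onnx()
        .model_for_path("pose_model.onnx")?
        .with_input_fact(
            0,
            InferenceFact::dt_shape(f32::datum_type(), tvec!(1, 3, 256, 256)),
        )?
        .into_optimized()?
        .into_runnable()?;

    // Stand-in for a camera frame: batch x channels x height x width.
    let input: Tensor = tract_ndarray::Array4::<f32>::zeros((1, 3, 256, 256)).into();
    let outputs = model.run(tvec!(input.into()))?;
    println!("output shape: {:?}", outputs[0].shape());
    Ok(())
}
```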

Should we try finding fiducial markers? AprilTag-rs could be a good option, but it doesn’t build on Windows, last I checked.

I have a notebook with a backlog of projects on which to work. For August, I’d like something on the easier side since I’ll be moving for a good chunk of it and probably exhausted for the rest.

Possibility 1: Workout Calculator

Three to four times a week I find myself needing to throw together a workout plan. That means assembling a set of sets that covers the main muscles, rests the ones that still need to recover, and gets the heart rate up. There are existing solutions online, but honestly they’re all ad-covered apps that annoy the hell out of me. This would probably be a mobile-first app, likely written in Godot.

Possibility 2: Sketch to Animation

I would very much like to set up a model that takes a character sheet and a pose sketch and produces a finalized posed frame for an animation. This would be similar to the “three-quarters view from front and side” project, because I hate drawing three-quarter views. It’s mostly targeted at people who like animation but aren’t really keen on detailed character work: let one person do the detailed work, let another make the animation, then combine their efforts automatically and remove the tedium of redrawing so much of the same content. Sure, existing animation tools already cover a lot of this, but the idea still captures my attention.

Possibility 3: OpenFX Davinci Resolve Plugin

I’ve been meaning to put together a plugin for Resolve so I can do something like implement thin-plate splines (i.e., port puppet warp from After Effects). I don’t think this would be a finished product in itself; I’d only want to get the .ofx file built, running in Resolve, and showing some kind of IO. It could be as simple as ‘invert color’, as long as it runs.
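For my own reference, the thin-plate spline warp itself is just the classic radial-basis form (standard TPS, nothing Resolve-specific):

$$f(\mathbf{x}) = a_0 + \mathbf{a}^\top \mathbf{x} + \sum_{i=1}^{n} w_i \, U(\lVert \mathbf{x} - \mathbf{p}_i \rVert), \qquad U(r) = r^2 \log r$$

where the $\mathbf{p}_i$ are the control points (the puppet pins) and the coefficients $a_0$, $\mathbf{a}$, $w_i$ come from solving a small linear system against the pin displacements. The hard part is the plugin plumbing, not the math.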

Possibility 4: Easily Addressable Short Term Memory Networks (EAST MN?)

It’s been a while since I did anything with language models, and I’ve been thinking about ways to make smaller networks more powerful. Transformers don’t carry any historical state (positional encoding passed as input doesn’t count), which is a blessing and a curse. I’m curious whether it’s possible to add a memory component to transformers that’s addressable the same way attention is.
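To make “addressable the same way attention is” a bit more concrete: imagine a memory matrix $M \in \mathbb{R}^{n \times d}$ of $n$ slots that persists across steps, read with the usual scaled dot-product form. This is just me sketching; I haven’t thought through the write rule at all:

$$r_t = \operatorname{softmax}\!\left(\frac{q_t (M W_K)^\top}{\sqrt{d}}\right) M W_V$$

where $q_t$ is the current query and $W_K$, $W_V$ project the memory slots into key and value space. The interesting open question is how and when the network writes back into $M$.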

I have 36 hours left to decide. I’m leaning towards #3 or #4 — #3 because I like writing Rust and #4 because it would be useful and would sate my curiosity.

HuggingFace Jam 2023 happened to coincide with this month’s project. While I’d always intended to have the month’s plan in place by day three so I could get to work, practicality beats purity.

The theme of the jam is “expand”, which I’m going to take to mean “expand your skills”. A long time ago I wrote “The Rage of Painting” and I’m still proud of what I’ve done there. I’d like to rebuild that game to run on mobile and web platforms, and to use more of the latest advancements in computer vision.

The jam is 48 hours long, but at least 16 of that will be sleep, and another 12 goes to everything else I have to do. That leaves a time budget of 20 hours.

4 hours: get a screen up in Godot that I can draw to, with a variety of colors.

4 hours: get a text UI up and have the tutor give the prompts. (No judging yet.)

4 hours: polish the assets and knock off the rough edges. Playable build uploaded to itch.io.

4 hours: wire up uploading the drawing to a service for ‘judging’.

That’s 16 hours. The remaining 4 I’m going to write off as buffer.

Let’s gooooo.

Hour 1:

My first big decision is how to actually draw the points to the screen. Historically I’ve used a TextureRect and set pixels based on the user’s mouse coordinates, but I’m wondering if I should try a shader here to be faster and more memory-efficient. I’m going to spend 30 minutes trying to figure it out before falling back to the dumb solution (since I have some afternoon obligations).

Hours 2-3:

Unfortunately, using a shader to do the drawing was a no-go. I can’t write to the texture buffer from the shader unless I use a compute shader, and the web build doesn’t support compute shaders. Fortunately, using Image and blitting a brush works fairly well. I’m trying not to overengineer the brushes by adding support for movement speed, brush dynamics, etc. Perhaps I’ll come back and add that.
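For posterity, the blit itself is nothing fancy. Here’s the gist of the stamping math written out engine-agnostically in Rust rather than GDScript; Godot’s Image.blend_rect does roughly this for you, and the radial-falloff brush is my own placeholder:

```rust
// Alpha-blend a round, soft-edged brush stamp into an RGBA8 pixel buffer.
struct Canvas {
    w: usize,
    h: usize,
    pixels: Vec<u8>, // RGBA8, w * h * 4 bytes
}

impl Canvas {
    fn new(w: usize, h: usize) -> Self {
        Canvas { w, h, pixels: vec![255; w * h * 4] } // start with a white canvas
    }

    fn stamp(&mut self, cx: i32, cy: i32, radius: i32, color: [u8; 4]) {
        debug_assert!(radius > 0);
        let r2 = (radius * radius) as f32;
        for y in (cy - radius)..=(cy + radius) {
            for x in (cx - radius)..=(cx + radius) {
                if x < 0 || y < 0 || x >= self.w as i32 || y >= self.h as i32 {
                    continue; // off the canvas
                }
                let d2 = ((x - cx).pow(2) + (y - cy).pow(2)) as f32;
                if d2 > r2 {
                    continue; // outside the round brush tip
                }
                // Soft edge: full opacity at the center, zero at the rim.
                let a = (1.0 - d2 / r2) * (color[3] as f32 / 255.0);
                let i = (y as usize * self.w + x as usize) * 4;
                for c in 0..3 {
                    let blended =
                        color[c] as f32 * a + self.pixels[i + c] as f32 * (1.0 - a);
                    self.pixels[i + c] = blended as u8;
                }
            }
        }
    }
}
```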

Hour 4:

I did end up adding some interpolation to the brush to help with smoothness. I thought there might be an issue with the number of substeps, but it still performs okay. I did a build and uploaded it to itch.io to make sure everything was fine on the web and it works even better there.
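The interpolation is equally simple: stamp along the segment from the previous pointer position to the current one so fast strokes stay solid. This continues the Canvas sketch above; the quarter-radius substep spacing is a heuristic I picked, not anything Godot prescribes:

```rust
fn stroke(canvas: &mut Canvas, from: (f32, f32), to: (f32, f32), radius: i32, color: [u8; 4]) {
    let (dx, dy) = (to.0 - from.0, to.1 - from.1);
    let dist = (dx * dx + dy * dy).sqrt();
    // Space substeps at a fraction of the brush radius so stamps overlap.
    let spacing = (radius as f32 * 0.25).max(1.0);
    let steps = (dist / spacing).ceil() as usize;
    for s in 0..=steps {
        let t = if steps == 0 { 0.0 } else { s as f32 / steps as f32 };
        canvas.stamp(
            (from.0 + dx * t).round() as i32,
            (from.1 + dy * t).round() as i32,
            radius,
            color,
        );
    }
}

fn main() {
    let mut canvas = Canvas::new(512, 512);
    // A fast diagonal flick: without interpolation this would be two
    // isolated dots; with it, an unbroken line of stamps.
    stroke(&mut canvas, (40.0, 40.0), (400.0, 300.0), 12, [20, 20, 20, 255]);
}
```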

Hours 5-48:

Things have developed, not necessarily for the better. A planned visit to scout a new apartment and a short stay at a birthday party turned into a full day-and-a-half adventure. My Sundays are already spoken for, leaving no time to complete much of anything. This was perhaps a complete failure.

I also found myself unable to muster the motivation to carry the project through the rest of the month, between disenchantment, apartment hunting, and life in general. I’m going to call July a failure and start again.