Tag Archives: programming

Motion capture is neat! There are a bunch of tools available now: iMotion, assorted Kinect libraries, etc. I’ve tried most of them but haven’t found anything that I particularly enjoy using. Most markerless systems suffer from drift, poor localization, jitter, and a host of other issues. Inertial motion capture units have drift of their own and misbehave when there are metallic objects or magnetic fields nearby. I don’t think I’ll be able to get around the markerless issues unless I put together a fiducial system, but I can still try to make a useful piece of software.

Motion Capture Mk5 is a system that captures human poses and streams the data across the network to connected clients. The protocol should be absurdly simple so that a client can be written in almost any language in the span of a day. This makes the tool multi-purpose: it can stream capture data to Blender for animation, to VRChat for real-time interaction, or to some other software for some other reason. Running the application on a networked machine means that if you, like me, have a slightly constrained living space, you can capture from a larger volume while your machine sits off to the side and out of the way.
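To give a feel for how small such a protocol could be, here is a minimal sketch of one possible frame format, written as a Python client would see it. Everything about the layout is my assumption for illustration and not a settled spec: a little-endian frame counter followed by 13 joints, each with a position (x, y, z) and a rotation quaternion (x, y, z, w) as 32-bit floats. The names `encode_frame` and `decode_frame` are hypothetical.

```python
import struct

JOINT_COUNT = 13  # joint count from the project goal; joint ordering is not specified

# Assumed layout (illustrative only): u32 frame index, then per joint
# 3 floats of position followed by 4 floats of quaternion rotation.
HEADER = struct.Struct("<I")
JOINT = struct.Struct("<7f")
FRAME_SIZE = HEADER.size + JOINT_COUNT * JOINT.size  # fixed-size message


def encode_frame(frame_index, joints):
    """Pack JOINT_COUNT (position, rotation) pairs into one fixed-size message."""
    if len(joints) != JOINT_COUNT:
        raise ValueError(f"expected {JOINT_COUNT} joints, got {len(joints)}")
    parts = [HEADER.pack(frame_index)]
    for position, rotation in joints:
        parts.append(JOINT.pack(*position, *rotation))
    return b"".join(parts)


def decode_frame(payload):
    """Unpack one message into (frame_index, [(position, rotation), ...])."""
    if len(payload) != FRAME_SIZE:
        raise ValueError(f"expected {FRAME_SIZE} bytes, got {len(payload)}")
    (frame_index,) = HEADER.unpack_from(payload, 0)
    joints = []
    for i in range(JOINT_COUNT):
        values = JOINT.unpack_from(payload, HEADER.size + i * JOINT.size)
        joints.append((values[:3], values[3:]))
    return frame_index, joints
```

Because every frame has the same size, a client only needs to read `FRAME_SIZE` bytes at a time from whatever transport ends up being used, which keeps a port to another language down to a handful of lines.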

What skills do I hope to practice along the way?

  • UI development in Rust with mixed 2D/3D
  • Computer vision (if fiducials come into play)
  • Model integration into Rust via Tract or Candle (for depth estimation or pose estimation)
  • Rust networking and IPC

What kinds of deliverables will we see at the end?

I’d like to finish the month with an application that captures and sends the estimated positions and rotations of 13 joints. Ideally, the model will have depth estimation internally, too, so I can duplicate and modify the code into a SLAM tool. A Blender client or Godot client would not be out of the question.

Open Questions:

Should we start by using a pre-canned pose model? There’s a risk it will take more time to get a model building and integrated than it will to train from scratch, but it could be a good jumping-off point.

Should we try finding fiducial markers? AprilTag-rs could be a good option, but it doesn’t build on Windows, last I checked.

I have a notebook with a backlog of projects on which to work. For August, I’d like something on the easier side since I’ll be moving for a good chunk of it and probably exhausted for the rest.

Possibility 1: Workout Calculator

Three to four times a week I find myself needing to throw together a workout plan. That means making a set of sets that covers the main muscles, rests the ones that need to recover, and gets the heart rate up. There are some existing solutions online, but honestly they’re all apps that are covered in ads and they annoy the hell out of me. This would probably be a mobile-first app; I’m thinking written in Godot.

Possibility 2: Sketch to Animation

I would very much like to set up a model that takes a character sheet and a pose sketch and produces a finalized posed frame for an animation. This would be similar to the “three-quarters view from front and side” project, which exists because I hate drawing three-quarter views. This is mostly targeted at people who like animation but aren’t really keen on detailed character work. Let someone do the detailed work and let another person make animations, then combine their efforts automatically while removing the tedium of redrawing lots of the same content. Sure, there are animation solutions already that cover a lot of this, but the idea still captures my attention.

Possibility 3: OpenFX DaVinci Resolve Plugin

I’ve been meaning to put together a plugin for Resolve so I can do something like implement thin-plate splines (i.e., porting puppet warp from After Effects). I don’t think this would be a finished product in itself; I’d only want to get the .ofx file built and running in Resolve and showing some kind of IO. Could be as simple as ‘invert color’, as long as it runs.

Possibility 4: Easily Addressable Short Term Memory Networks (EAST MN?)

It’s been a while since I did anything with language models, and I’ve been thinking about options to make smaller networks that are more powerful. Transformers do not have any historical state (positional encoding passed as input doesn’t count), which is a blessing and a curse. I’m curious about whether it would be possible to add a memory component to transformers that’s addressable in the same way attention is.
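For a sense of what addressing memory “the same way attention is” could mean, here is a rough sketch of the read side only: a query is scored against each memory slot’s key with a scaled dot product, the scores go through a softmax, and the result is a weighted mix of the slot values. This is plain scaled dot-product attention over a hypothetical memory bank, not a worked-out design; `attention_read` and the slot layout are names I made up for illustration, and how slots get written is left open.

```python
import math


def attention_read(query, keys, values):
    """Read from memory slots by content: softmax(q . k / sqrt(d)) weighted sum of values."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]

    # Subtract the max score before exponentiating for numerical stability.
    top = max(scores)
    exps = [math.exp(score - top) for score in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Blend the slot values by their attention weights.
    width = len(values[0])
    read = [sum(w * value[i] for w, value in zip(weights, values)) for i in range(width)]
    return read, weights
```

A query that lines up with one slot’s key pulls mostly that slot’s value, while a query that matches nothing in particular gets a blend of everything.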

I have 36 hours left to decide. I’m leaning towards #3 or #4 — #3 because I like writing Rust and #4 because it would be useful and would sate my curiosity.

This month’s project was something of a success, though I’m late on the write-up. The suggestion came from my partner Lisa, who proposed something like it for our friends’ group.

GooglyEyeBot sits in a server and, with a configurable random probability, replies to a message and applies googly eyes. It can also be set to only apply googly eyes when tagged, which I’ve made the default.
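The reply decision boils down to something like the sketch below. This is not lifted from the repo; `should_apply_eyes` and its parameter names are hypothetical, and I’m assuming here that a direct tag always triggers a reply even when random replies are enabled.

```python
import random


def should_apply_eyes(bot_was_tagged, tagged_only=True, reply_probability=0.0, rng=random.random):
    """Decide whether the bot should reply to a message with googly eyes."""
    if bot_was_tagged:
        return True  # assumption: tagging the bot always works
    if tagged_only:
        return False  # default mode: stay quiet unless tagged
    # Random mode: reply to roughly reply_probability of untagged messages.
    return rng() < reply_probability
```

Passing the random source in as `rng` is only there to make the behavior easy to check without relying on luck.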

Things that went well:

  • First version was very fast to produce. Used MediaPipe with the expectation that I’d find time to retrain a face model when the time came.
  • Discord.py is rather nice, even if “the right way” to do something is a little impenetrable at times. I found myself looking for how messages would connect with each other. I think Discord itself is moving through some changes, so it makes sense.

Things that were tricky:

  • I’m still not entirely sure how to determine if the user sending a command is an administrator of the server (guild).
  • Never got around to making my own face model or training set.
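On the administrator question, one approach that should work: in discord.py, a message author inside a guild is a `Member`, and `Member.guild_permissions` exposes an `administrator` flag. The helper below is a sketch rather than what the bot currently does; `is_guild_admin` is a made-up name, and the `SimpleNamespace` objects only stand in for real discord.py objects so the snippet runs without the library.

```python
from types import SimpleNamespace


def is_guild_admin(author):
    """Return True if a message author has the administrator permission in the guild."""
    # In direct messages the author is a plain User with no guild_permissions,
    # so treat a missing attribute as "not an admin".
    permissions = getattr(author, "guild_permissions", None)
    return bool(permissions is not None and permissions.administrator)


# Stand-ins for discord.py objects, for illustration only.
admin = SimpleNamespace(guild_permissions=SimpleNamespace(administrator=True))
regular = SimpleNamespace(guild_permissions=SimpleNamespace(administrator=False))
dm_user = SimpleNamespace()

print(is_guild_admin(admin), is_guild_admin(regular), is_guild_admin(dm_user))
```

For bots built on the commands extension, the `commands.has_permissions(administrator=True)` check decorator is another option worth looking at.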

Repo: https://github.com/JosephCatrambone/googlyeyebot