Making the video editing workload much lighter
Shotcut is a Free (GPLv3) cross-platform video editor. I’ve used it a couple of times lately to put some simple clips together (like the sorting-out-the-Take-2-copyright-claim GTA Online video).
I figured I’d use it to take a clip of my friends and me getting schooled by someone with a bomb lance in Hunt: Showdown.
Actually, my first thought was to write a script to put a clip together using MELT — based on JSON, of course — but on reflection for these I wanted something a bit more refined.
So, enter Shotcut. One of the things I was keen to include were text-based captions. I’ve been including these in gifs (example) for a while now, and I think they work really well for video. They can be informative, and sometimes funny!
Text in Shotcut is doable natively via filters: text, HTML, etc. But this felt awkward to me: I’d rather have something directly visible in the timeline which is easy to manipulate, and which I can add filters to itself if it comes to it.
So I decided… to write a script to generate images with these captions, based on — yup! — JSON. I quickly threw together a JSON file for the dialogue in the clip I wanted to caption:
{ "captions": [
    [0, "close by here"],
    [0, "other side of this wall"],
    [1, "yep yep yep"],
    [2, "That was a Sparks! :o"],
    [0, "ohhhh fudge"],
    [0, "I die to this"],
    [0, "GADDAMMITTT"],
    [1, "what was that?"],
    [0, "bomblance :("],
    [1, "where?"],
    [2, "he's with me"],
    [2, ":("],
    [0, "you've got one bullet left"],
    [0, "maybe on top if he's got a bomblance?"],
    [1, "good idea"],
    [0, "is that not him at the gate?"],
    [1, "dunno where he is"],
    [2, "he's on our bodies"],
    [1, "I know..."],
    [1, "WHAT?! *panicflee*"],
    [1, "this is a bit difficult"],
    [1, "fuq! :("],
    [1, "I should have run again"],
    [1, "oh well"],
    [0, "gg wp Flakel, you beat us o7"]
  ]
}
Simple! The numbers refer to speakers: 0 is the first, 1 the second, 2 the third. I didn’t actually need to zero-index the speakers, and in fact I could use text strings to denote who is speaking, but writing numbers is quicker when there are twenty-five captions to do.
The script, which I will throw up on GitHub, goes through this list and generates a caption image for each item. It assigns a colour to each ‘speaker’.
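The core of that loop can be sketched like this; the colour values and output file names here are my own placeholders, not what the actual script uses:

```python
import json

# Placeholder palette: one colour per speaker index (values are assumptions)
SPEAKER_COLOURS = ["#ffd24d", "#7ec8e3", "#b0f2b4"]

def load_captions(path):
    """Read the JSON file and return the list of [speaker, text] pairs."""
    with open(path) as f:
        return json.load(f)["captions"]

def caption_jobs(captions):
    """Yield (output filename, text, colour) for each [speaker, text] pair."""
    for i, (speaker, text) in enumerate(captions):
        colour = SPEAKER_COLOURS[speaker % len(SPEAKER_COLOURS)]
        yield f"caption_{i:03d}.png", text, colour
```

Numbering the output files keeps them sorted in the same order as the dialogue, which makes dragging them onto the timeline painless.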
Due to familiarity, my instinct was to use imagemagick, but I started with Pillow as I wanted to [re]gain a bit of familiarity with it. Once I had [re]acquainted myself with the few bits I needed, it was relatively straightforward to generate a cropped image with the text appropriately sized, coloured and stroked. But I found myself wanting a full 1920×1080 frame, as this made the Shotcut workflow much quicker: there is no need to set a position if the image is the same size as the source video.
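A minimal sketch of the Pillow approach — the font name, sizes and offsets are illustrative guesses, not the script’s actual values:

```python
from PIL import Image, ImageDraw, ImageFont

def render_caption(text, colour, size=(1920, 1080)):
    """Draw one stroked caption near the bottom of a fully transparent frame.

    Emitting a full 1920x1080 image means no repositioning in Shotcut:
    the caption PNG just goes on a track above the source video.
    """
    img = Image.new("RGBA", size, (0, 0, 0, 0))  # transparent canvas
    draw = ImageDraw.Draw(img)
    try:
        # Font choice is an assumption; fall back to Pillow's built-in font
        font = ImageFont.truetype("DejaVuSans.ttf", 72)
    except OSError:
        font = ImageFont.load_default()
    # Centre horizontally, sit a little above the bottom edge
    left, top, right, bottom = draw.textbbox((0, 0), text, font=font)
    x = (size[0] - (right - left)) // 2
    y = size[1] - (bottom - top) - 120
    draw.text((x, y), text, font=font, fill=colour,
              stroke_width=3, stroke_fill="black")
    return img
```

The black stroke is what keeps the text readable over busy gameplay footage.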
So I swapped Pillow/PIL out for imagemagick and subprocess, and redid the whole thing in a few minutes. The imagemagick version is significantly slower, but not so slow as to be intolerable, even when tweaking a couple of the captions.
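The subprocess version boils down to building a `convert` command line, assuming imagemagick is on the PATH; the flags below are standard `convert` options, though the font and placement values are illustrative rather than what my script actually uses:

```python
import subprocess

def im_caption_command(text, colour, out_path,
                       size="1920x1080", pointsize=72):
    """Build the imagemagick `convert` invocation for one caption."""
    return [
        "convert",
        "-size", size, "xc:none",        # transparent full-frame canvas
        "-gravity", "south",             # anchor the text at the bottom
        "-pointsize", str(pointsize),
        "-fill", colour,
        "-stroke", "black", "-strokewidth", "3",
        "-annotate", "+0+120", text,     # nudge up from the bottom edge
        out_path,
    ]

def render_caption_im(text, colour, out_path):
    """Shell out to imagemagick; raises if convert exits non-zero."""
    subprocess.run(im_caption_command(text, colour, out_path), check=True)
```

Each caption is a separate `convert` process, which is where the slowdown relative to Pillow comes from — but for a couple of dozen images it hardly matters.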
I’m quite happy with how it turned out:
The ‘automatic’ text sizing could use a little tweak!
Lessons learned:
- using something you’re familiar with is often easier than learning something new
- PIL is faster than imagemagick for generating simple text on a transparent background
- bomb lancers can be pretty deadly