
Generating Text Captions for Shotcut


Making the video editing workload much lighter

Shotcut is a Free (GPLv3) cross-platform video editor. I’ve used it a couple of times lately to put some simple clips together (like sorting the Take 2 copyright claim GTA Online video).

I figured I’d use it to put together a clip of my friends and me getting schooled by someone with a bomb lance in Hunt: Showdown.

Actually, my first thought was to write a script to put a clip together using MELT — based on JSON, of course — but on reflection for these I wanted something a bit more refined.

So, enter Shotcut. One of the things I was keen to include was text-based captions. I’ve been including these in gifs (example) for a while now, and I think they work really well for video. They can be informative, and sometimes funny!

Text in Shotcut is doable natively via filters (text, HTML etc.), but this felt awkward to me. I’d rather have something directly visible in the timeline that is easy to manipulate, and that can have filters of its own added if it comes to it.

So I decided… to write a script to generate images with these captions, based on (yup!) JSON. I quickly threw together a JSON file for the dialogue in the clip I wanted to caption:

{ "captions": [
        [0, "close by here"],
        [0, "other side of this wall"],
        [1, "yep yep yep"],
        [2, "That was a Sparks! :o"],
        [0, "ohhhh fudge"],
        [0, "I die to this"],
        [0, "GADDAMMITTT"],
        [1, "what was that?"],
        [0, "bomblance :("],
        [1, "where?"],
        [2, "he's with me"],
        [2, ":("],
        [0, "you've got one bullet left"],
        [0, "maybe on top if he's got a bomblance?"],
        [1, "good idea"],
        [0, "is that not him at the gate?"],
        [1, "dunno where he is"],
        [2, "he's on our bodies"],
        [1, "I know..."],
        [1, "WHAT?! *panicflee*"],
        [1, "this is a bit difficult"],
        [1, "fuq! :("],
        [1, "I should have run again"],
        [1, "oh well"],
        [0, "gg wp Flakel, you beat us o7"]
]
}

Simple! The numbers refer to speakers: 0 is the first, 1 the second, 2 the third. I didn’t actually need to zero-index the speakers, and in fact I can use text strings to denote who is speaking, but writing numbers is quicker when there are twenty-five captions to do.

The script, which I will throw up on GitHub, goes through this and generates the caption for each item in the list. It has assigned colours for each ‘speaker’.
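As a rough idea of what that loop might look like, here is a minimal Python sketch. It is not the actual script: the SPEAKER_COLOURS palette and the load_captions helper are made up for illustration, and it assumes the captions file is strict JSON with quoted keys and strings.

```python
import json

# Hypothetical palette: one colour per speaker index (not the author's real colours).
SPEAKER_COLOURS = {0: "#ffd166", 1: "#06d6a0", 2: "#ef476f"}


def load_captions(text):
    """Parse the captions JSON and return (index, speaker, colour, caption) tuples."""
    data = json.loads(text)
    rows = []
    for index, (speaker, caption) in enumerate(data["captions"]):
        rows.append((index, speaker, SPEAKER_COLOURS[speaker], caption))
    return rows


# A three-line excerpt of the dialogue, written as strict JSON.
example = '{"captions": [[0, "close by here"], [1, "yep yep yep"], [2, "That was a Sparks! :o"]]}'

for index, speaker, colour, caption in load_captions(example):
    print(f"{index:02d} speaker={speaker} colour={colour} text={caption}")
```

Keeping the running index around is handy for naming the output images so they sort in dialogue order.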

Due to familiarity, I was going to use ImageMagick, but I started with Pillow as I wanted to [re]gain a bit of familiarity with it. Once I had [re]acquainted myself with the few bits I needed, it was relatively straightforward to generate a cropped image with the text appropriately sized, coloured and stroked. But I found myself wanting a full 1920×1080 frame: that makes the Shotcut workflow much quicker, since there is no need to set a position when the image is the same size as the source video.

So I swapped Pillow/PIL out for ImageMagick and subprocess and redid the whole thing in a few minutes. The ImageMagick version is significantly slower, but not so slow as to be intolerable even when wanting to tweak a couple of the captions.
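For the curious, calling ImageMagick through subprocess for one full-frame caption could look something like the sketch below. This is a guess at the shape, not the real script: the caption_command and render_caption helpers, the point size, stroke width and offset are all assumptions. It also assumes the ImageMagick 6 `convert` binary (ImageMagick 7 calls it `magick`).

```python
import subprocess

FRAME_SIZE = "1920x1080"  # same size as the source video, so no positioning in Shotcut


def caption_command(text, colour, out_path, pointsize=64):
    """Build an ImageMagick argument list for one transparent full-frame caption."""
    return [
        "convert",                   # assumption: ImageMagick 6; use "magick" on version 7
        "-size", FRAME_SIZE,
        "xc:none",                   # fully transparent canvas
        "-gravity", "south",         # anchor the text to the bottom centre
        "-pointsize", str(pointsize),
        "-fill", colour,
        "-stroke", "black",
        "-strokewidth", "2",
        "-annotate", "+0+80", text,  # 80 px up from the bottom edge
        out_path,
    ]


def render_caption(text, colour, out_path):
    """Run ImageMagick; raises CalledProcessError if the command fails."""
    subprocess.run(caption_command(text, colour, out_path), check=True)


# Only print the command here, so the sketch runs without ImageMagick installed.
cmd = caption_command("ohhhh fudge", "#ffd166", "caption_04.png")
print(" ".join(cmd))
```

Passing the arguments as a list avoids shell quoting problems with captions like "WHAT?! *panicflee*", though ImageMagick itself may still treat a leading @ or a % in the text specially.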

I’m quite happy with how it turned out:

The ‘automatic’ text sizing could use a little tweak!

Lessons learned:

  • using something you’re familiar with is often easier than learning something new
  • PIL is faster than ImageMagick for generating simple text on a transparent background
  • bomb lancers can be pretty deadly
