Automating YouTube Uploads With OCR Part 8: Output

Nearly a working tool!

We’ve been using Python and Tesseract to OCR frames from video footage of Deep Rock Galactic, extracting metadata that we can use when putting the videos on YouTube.


Nearly all of the elements are captured; there’s just the mutators left: warnings and anomalies. These appear in text form on the starting screen on either side of the mission block:

Here we have a Cave Leech Cluster and a Rich Atmosphere.

Since each mutator can only be one of a known list of ten or fewer values, we can OCR a wide box around the text, then hard-cast the result to whichever known value has the smallest Levenshtein distance to it.
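To make the hard-casting step concrete, here is a minimal sketch in plain Python. The function names are my own for illustration, and the two lists are deliberately incomplete: only Cave Leech Cluster and Rich Atmosphere come from the screenshot above; the other entries are placeholders to show the shape.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


# Illustrative, incomplete lists (assumption: the real ones have up to ten entries each)
WARNINGS = ["Cave Leech Cluster", "Low Oxygen", "Parasites"]
ANOMALIES = ["Rich Atmosphere", "Low Gravity", "Gold Rush"]


def snap_to_known(ocr_text, candidates):
    """Return the candidate with the smallest edit distance to the OCR text."""
    cleaned = ocr_text.strip().lower()
    return min(candidates, key=lambda c: levenshtein(cleaned, c.lower()))


print(snap_to_known("CAVE LEECH CLUSTE|", WARNINGS))
print(snap_to_known("R1CH ATMOSPHERE", ANOMALIES))
```

One thing this sketch does not handle is a mission with no mutator at all: whatever noise the OCR returns would still be snapped to something, so a maximum-distance cut-off is probably needed in practice.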

Tie-Breaking Frames

The loading/ending frame detection works well for most videos, but on the odd one or two it suffers. It’s best to ignore frames that are completely or nearly dark (ie either a transition or a fade-in), and ones that are very bright (eg a light flash), as that hurts contrast and so hurts OCR.

Using ImageStat from PIL we can grab the frame mean (averaged across RGB values), then normalise it to add to our frame scoring function in the detection routine.

We want to normalise between 0 and 1, which is easy to do if you want to scale linearly between 0 and 255 (RGB max value): just divide the average by 255. But we don’t want that. Manually looking at a few good, contrasty frames, it seemed that a mean of around 75 was best; even by 150 the frame was looking quite washed out. So we want a score of 0 at mean pixel values of 0 and 150, and a score of 1 at a mean pixel value of 75:

# Tie break score graph should look something like:
# (tb_val)          
# |    /\            
# |   /  \           
# |  /    \          
# |_/      \_ (x)                
# 0    75    150                
# For sake of argument using 75 as goldilocks value
# ie not too dark, not too bright

75 is thus our ‘goldilocks’ value: not too dark, not too light. So our tiebreak value is:

tb_val = (goldilocks - (abs(goldilocks - frame_mean)))/goldilocks
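As a quick sanity check of that formula, here is a small sketch wrapped in a function. The function name and the clamp at zero are my additions: the bare formula goes negative for means above 150, whereas the graph above flattens out at 0, so I’ve assumed clamping is the intent.

```python
GOLDILOCKS = 75  # the 'not too dark, not too bright' mean pixel value from above


def tiebreak_score(frame_mean, goldilocks=GOLDILOCKS):
    """Triangular score: 0 at a mean of 0, 1 at goldilocks, back to 0 at 2 * goldilocks."""
    raw = (goldilocks - abs(goldilocks - frame_mean)) / goldilocks
    # Assumption: clamp so very bright frames score 0 rather than negative
    return max(0.0, raw)


# frame_mean would come from PIL, e.g. for an RGB frame:
#   frame_mean = sum(ImageStat.Stat(frame).mean) / 3
for mean in (0, 37.5, 75, 150, 200):
    print(mean, tiebreak_score(mean))
```

A frame at mean 37.5 lands halfway up the slope, and anything at 150 or brighter scores nothing, which matches the shape sketched in the graph.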


Since we’ve got detection of the various elements to where we want it, we can start generating output. Our automated YT uploader works with JSON, and looks for the following fields: filename, title, description, tags, playlists, game, thumb (time, title, additional), and scheduled.

Thumb time and additional we can safely ignore. Title is easy, as I use mission_type: mission_name. All of my Deep Rock Galactic uploads go into the one playlist. Tags are a bunch of things like hazard level, minerals, biome and some other common-to-all ones like “Deep Rock Galactic” (for game auto detection). The fun ones are description and scheduled.

Funnily enough, one of my earliest forays into JavaScript was a mad-libs style page which took the phrases via prompt() and put them in some text.

This was back in the days of IE4, and JavaScript wasn’t quite what it is today…

For the description, I took a bit of a “mad libs” style approach: use the various bits and pieces we’ve captured with a variety of linking verbs and phrases to give non-repetitive output. This mostly comes down to writing the phrases, sticking them in a bunch of lists and using random.choice() to pick one of them.
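A stripped-down sketch of the idea is below. The phrase lists, function names and sentence template are all made up for illustration (only the first entry of each list echoes the real output further down); the actual lists are longer and more varied.

```python
import random

# Hypothetical phrase lists; the real ones are longer
ORDERS = [
    "get their orders from Mission Control",
    "receive a briefing from Mission Control",
]
ARRIVALS = [
    "get dropped in to",
    "touch down in",
    "drill their way into",
]


def join_names(names):
    """Turn ['a', 'b', 'c'] into 'a, b and c'."""
    if len(names) == 1:
        return names[0]
    return ", ".join(names[:-1]) + " and " + names[-1]


def make_description(names, biome, mission_name, mission_type, recorded, rng=random):
    """Build a description by picking one phrase from each list."""
    sentence = "{crew} {orders} and {arrival} the {biome} for {name} ({mtype})".format(
        crew=join_names(names),
        orders=rng.choice(ORDERS),
        arrival=rng.choice(ARRIVALS),
        biome=biome,
        name=mission_name,
        mtype=mission_type,
    )
    return sentence + "\n\nRecorded on " + recorded


print(make_description(
    ["BertieB", "Costello", "graham"],
    "Fungus Bogs", "Illuminated Pocket", "Elimination", "2019-10-17",
))
```

Passing in `rng` is just a convenience so a seeded `random.Random` can be used when you want repeatable output.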

For obvious reasons, I don’t want to publish fifty-odd videos at once, but rather spread them out over a period. I publish a couple of DRG videos on Monday, Wednesday, Friday and at the weekend. To do this in Python, I decided to use a generator, and call next() on it every time we need to populate the scheduled field. The function itself is fairly simple: if the time of scheduled_date is the earlier of the two times at which I publish, move to the later one and return the full date; if it’s at the later time, increment the date by two days (if Monday/Wednesday) or by one day (otherwise), and set the time to the earlier one.
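Roughly, the generator could look like the sketch below. The two slot times are placeholders (only 18:00 shows up in the sample output; 16:00 is an assumption), and this version yields the current slot before advancing, which may differ slightly from the real function.

```python
from datetime import datetime, time, timedelta

EARLY_SLOT = time(16, 0)  # assumed earlier publish time
LATE_SLOT = time(18, 0)   # assumed later publish time


def schedule(start):
    """Yield 'YYYY-MM-DD HH:MM' strings, two per publishing day."""
    current = start
    while True:
        yield current.strftime("%Y-%m-%d %H:%M")
        if current.time() == EARLY_SLOT:
            # Same day, later slot
            current = datetime.combine(current.date(), LATE_SLOT)
        else:
            # Monday (0) and Wednesday (2) skip a day; Fri/Sat/Sun move on by one
            step = 2 if current.weekday() in (0, 2) else 1
            next_day = current.date() + timedelta(days=step)
            current = datetime.combine(next_day, EARLY_SLOT)


slots = schedule(datetime(2019, 11, 18, 16, 0))  # a Monday
for _ in range(4):
    print(next(slots))
```

Starting on a publishing day at the earlier slot keeps the sequence on Monday, Wednesday, Friday, Saturday and Sunday from then on.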

We run this through json.dumps() and we have output! For example:

{
  "filename": "2019-10-17 19-41-38.mkv",
  "title": "Elimination: Illuminated Pocket",
  "description": "BertieB, Costello and graham get their orders from Mission Control and get dropped in to the Fungus Bogs to take on the mighty Dreadnoughts in Illuminated Pocket (Elimination)\n\nRecorded on 2019-10-17",
  "tags": [
    "Deep Rock Galactic",
    "Fungus Bogs",
    "Hazard 4",
    "Enor Pearl"
  ],
  "playlists": "Deep Rock Galactic",
  "game": "drg",
  "thumb": {
    "title": "Pocket Elimination"
  },
  "scheduled": "2019-11-18 18:00"
}

Looks good!
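For reference, assembling a record like that one might look something like this sketch. The function and parameter names are hypothetical, and the thumbnail title is passed in directly since how it gets derived isn’t covered here.

```python
import json


def build_record(filename, mission_type, mission_name, description,
                 extra_tags, thumb_title, scheduled):
    """Collect the detected fields into the dict the uploader expects."""
    return {
        "filename": filename,
        "title": f"{mission_type}: {mission_name}",
        "description": description,
        # 'Deep Rock Galactic' is common to every upload (used for game auto-detection)
        "tags": ["Deep Rock Galactic"] + list(extra_tags),
        "playlists": "Deep Rock Galactic",
        "game": "drg",
        "thumb": {"title": thumb_title},
        "scheduled": scheduled,
    }


record = build_record(
    filename="2019-10-17 19-41-38.mkv",
    mission_type="Elimination",
    mission_name="Illuminated Pocket",
    description="...",  # would come from the mad-libs step
    extra_tags=["Fungus Bogs", "Hazard 4", "Enor Pearl"],
    thumb_title="Pocket Elimination",
    scheduled="2019-11-18 18:00",
)
print(json.dumps(record, indent=2))
```

Using `indent=2` gives the same two-space layout as the example output.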

Automating YouTube Uploads With OCR Part 4: Exploring Approaches To Improve Detection

My path in the woods diverged, and I took them all

We’ve been seeing if we can apply OCR to the loading screen of Deep Rock Galactic to generate metadata for YouTube uploads for automation.

Last time, we got a quick-and-dirty script that would successfully pull out the various parts of one image. Now we’d like to do that for any given loading screen: any number of dwarves, any hazard level, biome and level mutators (which the original image lacked).

We picked nine loading screens to expand our detection to:

The results are mixed:

Starting DRG OCR...
             file                          names       mission_type    mission_name                       biome    hazard                                objective
0  tests/drg1.png   [graham, MaoTheCat, BertieB]           EGG HUNT     DEFECT CELL                  MAGMA CORE  HAZARD 3     COLLECT 6 EGGS\nCollect 25 Hollomite
1  tests/drg2.png                  [&l, [T, @&3]               IR e      OPEN TRICK  RADIOACTIVE EXCLUSION ZONE  HAZARD 3  (COLLECT 225 MORKITE\nCollect 10 Fossil
2  tests/drg3.png       [BertieB, L), MaoTheCat]    INING EXPEDITI!  URIFIED LEGAC)              GLACIAL STRATA  HAZARD 3   COLLECT 250 MORKITE\nCollect 10 Fossil
3  tests/drg4.png               [T, 3 Oz!, o\no]     VAGE OPERATION    HEALTHY WREC                  MAGMA CORE   LLrZtl]         SR RTINS\nCollect 15 Apoca Bloom
4  tests/drg5.png                [o383, (o383, ]        LT X g (o))    RAPID POCKET                    SALTPITS    HAZARD                  COLLECT 10 AQUARQS\n(=R
5  tests/drg6.png  [BertieB, Costello, Noobface]  SALVAGE OPERATION      ANGRY LUCK               DENSE BIOZONE    HAZARD             NIV T\nCollect 20 Boolo Cap.
6  tests/drg7.png            [®29, @&28, T VL R]                      ANGER’S PRIZE                 FUNGUS BOGS    HAZARD     COLLECT 6 EGGS\nCollect 25 Hollomite
7  tests/drg8.png         [IR A )], Costello, T]  MINING EXPEDITION    BRIGHT JEWEL              GLACIAL STRATA    HAZARD            (eI VRS\nCollect 20 Boolo Cap
8  tests/drg9.png             [. ®29, (o], I ‘4]                          HIOELR DY  RADIOACTIVE EXCLUSION ZONE    LLYZU]     COLLECT 6 EGGS\nCollect 20 Boolo Cap

or in image form:

The mission type was a source of issues for text detection before, but looking at the generated crop boxes, it seems text is getting cut off. That will also affect mission name detection, as the two are presented together.

When we started this, I knew the number of players would have an impact on the locations of the text for the player names. However, given only up to four players can play at once, it wouldn’t be too bad to write detection for the four possibilities. But if other text is moving, that gets messy very quickly.

We have a couple of options at this point:

  • enlarge the detection boxes for the longest/biggest text we have in the examples and see if that works across all of them
  • think about using something like OpenCV to do text ROI (region of interest) detection (eg as pyimagesearch does it)

The first seems like it could be done quicker than the second, so we’ll give that a try first. We’re still in the “what approach works” stage (aka the quick-and-dirty stage) here!

Unfortunately, the approach wasn’t quite successful. It’s possible that the particular frames we picked from each video had an impact, but that’s not something we can easily test with our current setup. Let’s see about adding OpenCV to the mix…


We’re going to reuse the approach taken by Adrian on pyimagesearch as the work has been done for us, and see where that gets us.


Well, the short answer is: not as far as I had hoped!

The boxes it detects on a full image capture either too little or too much, though the latter could probably be helped by some video pixel averaging to blur the background and keep the text crisp. However, it also splits on non-word boundaries. All of these problems can be worked around, but perhaps there’s another approach we can add to the mix?

Another Image

As well as a start screen, there’s also an end screen:

Another successful mission!

The information is presented slightly differently, but importantly i) it presents the info more uniformly ii) background noise looks like less of an issue. Let’s put this one through the paces we did for the loading screen.

Overall, naive OCR pulls out names well but misses just about everything else. Mission name: yes. Mission type: nope. Minerals: yes. Promising! Heck, we could even pull out mission time and total hazard bonus if we wanted.

Let’s put OpenCV on the back burner for the time being, and see what a combined approach using two images gets us.

             file                          names       mission_type                       biome      hazard     mission_name                     minerals
0  tests/drg1.png   [graham, MaoTheCat, BertieB]           EGG HUNT                  MAGMA CORE  HAZARD 3 -       OPEN TRICK                 F ATl\n\nEL]
1  tests/drg2.png                  [&l, [T, @&3]       INING EXPEDI  RADIOACTIVE EXCLUSION ZONE  HAZARD 3 -  PURIFIED LEGACY              RGN AL\n\n48 17
2  tests/drg3.png       [BertieB, L), MaoTheCat]  MINING EXPEDITIO|              GLACIAL STRATA  HAZARD 3 -  UNHEALTHY WRECK    MAGNITE 3 CROPPA\n\n39 -3
3  tests/drg4.png               [T, 3 Oz!, o\no]        AL ol 2N ()                  MAGMA CORE    HAZARD 4     RAPID POCKET       2 nli) |2 el 1T\n\n3 4
4  tests/drg5.png                [o383, (o383, ]   POINT EXTRACTION                    SALTPITS    HAZARD 4       ANGRY LUCK               BISMOR UMANITE
5  tests/drg6.png  [BertieB, Costello, Noobface]  SALVAGE OPERATIO|               DENSE BIOZONE    HAZARD 4         I E Vi S             Tt AL v4\n\n3} 8
6  tests/drg7.png            [®29, @&28, T VL R]                                    FUNGUS BOGS    HAZARD 4  CECOND COMEBACK               S 6syTel) fivd
7  tests/drg8.png         [IR A )], Costello, T]  MINING EXPEDITION              GLACIAL STRATA    HAZARD 4    COLDSSAL DOOM       [IChley [ (e\n\n169 48
8  tests/drg9.png             [. ®29, (o], I ‘4]                     RADIOACTIVE EXCLUSION ZONE    HAZARD 4   TIRTIY TN T (3  COSLINCL IR MAGNITE\n\nX 3]

Improvement! We’re getting somewhere now, and we’ll see what we can do to clean the rest of it up using two images as a basis.
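In code terms, the combined approach boils down to taking each field from whichever screen reads it best. The sketch below is only that: the function name is hypothetical, and the split of fields between the two screens is my guess from the table columns above rather than a settled design.

```python
# Assumed split: which screen each field is read from
LOADING_FIELDS = ("names", "mission_type", "biome", "hazard")
ENDING_FIELDS = ("mission_name", "minerals")


def combine(loading, ending):
    """Merge OCR results from the loading and end screens into one record."""
    merged = {}
    for field in LOADING_FIELDS:
        merged[field] = loading.get(field, "")
    for field in ENDING_FIELDS:
        merged[field] = ending.get(field, "")
    return merged


loading = {"names": ["graham", "MaoTheCat", "BertieB"], "mission_type": "EGG HUNT",
           "biome": "MAGMA CORE", "hazard": "HAZARD 3", "mission_name": "DEFECT CELL"}
ending = {"mission_name": "OPEN TRICK", "minerals": "..."}
print(combine(loading, ending))
```

Keeping the split in two tuples makes it cheap to move a field from one screen to the other as detection improves.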