Automating YouTube Uploads With OCR Part 4: Exploring Approaches To Improve Detection

My path in the woods diverged, and I took them all

We’ve been seeing whether we can apply OCR to the loading screen of Deep Rock Galactic to generate metadata for automated YouTube uploads.

Last time, we got a quick-and-dirty script that successfully pulled out the various parts of one image. Now we’d like to do that for any given loading screen: any number of dwarves, any hazard level, any biome, and any level mutators (which the original image lacked).

We picked nine loading screens to expand our detection to:

The results are mixed:

Starting DRG OCR...
             file                          names       mission_type    mission_name                       biome    hazard                                objective
0  tests/drg1.png   [graham, MaoTheCat, BertieB]           EGG HUNT     DEFECT CELL                  MAGMA CORE  HAZARD 3     COLLECT 6 EGGS\nCollect 25 Hollomite
1  tests/drg2.png                  [&l, [T, @&3]               IR e      OPEN TRICK  RADIOACTIVE EXCLUSION ZONE  HAZARD 3  (COLLECT 225 MORKITE\nCollect 10 Fossil
2  tests/drg3.png       [BertieB, L), MaoTheCat]    INING EXPEDITI!  URIFIED LEGAC)              GLACIAL STRATA  HAZARD 3   COLLECT 250 MORKITE\nCollect 10 Fossil
3  tests/drg4.png               [T, 3 Oz!, o\no]     VAGE OPERATION    HEALTHY WREC                  MAGMA CORE   LLrZtl]         SR RTINS\nCollect 15 Apoca Bloom
4  tests/drg5.png                [o383, (o383, ]        LT X g (o))    RAPID POCKET                    SALTPITS    HAZARD                  COLLECT 10 AQUARQS\n(=R
5  tests/drg6.png  [BertieB, Costello, Noobface]  SALVAGE OPERATION      ANGRY LUCK               DENSE BIOZONE    HAZARD             NIV T\nCollect 20 Boolo Cap.
6  tests/drg7.png            [®29, @&28, T VL R]                      ANGER’S PRIZE                 FUNGUS BOGS    HAZARD     COLLECT 6 EGGS\nCollect 25 Hollomite
7  tests/drg8.png         [IR A )], Costello, T]  MINING EXPEDITION    BRIGHT JEWEL              GLACIAL STRATA    HAZARD            (eI VRS\nCollect 20 Boolo Cap
8  tests/drg9.png             [. ®29, (o], I ‘4]                          HIOELR DY  RADIOACTIVE EXCLUSION ZONE    LLYZU]     COLLECT 6 EGGS\nCollect 20 Boolo Cap

or in image form:

The mission type was a source of trouble for text detection before, but looking at the generated crop boxes, it seems the text is getting cut off. That will also affect mission name detection, as the two are presented together.

When we started this, I knew the number of players would have an impact on the locations of the text for the player names. However, given only up to four players can play at once, it wouldn’t be too bad to write detection for the four possibilities. But if other text is moving, that gets messy very quickly.
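To make "detection for the four possibilities" concrete, here is a minimal sketch of how the name crop boxes could be derived from the player count. It assumes the names sit in one horizontal row centred on the screen; the function name `name_boxes` and every pixel value are made-up placeholders, not measurements from the actual loading screen.

```python
def name_boxes(player_count, image_width=1920, box_width=300,
               gap=40, top=900, bottom=950):
    """Return one (left, top, right, bottom) crop box per player name.

    Assumption: names are laid out in a single row, centred horizontally.
    All default pixel values are placeholders for illustration only.
    """
    if not 1 <= player_count <= 4:
        raise ValueError("expected between 1 and 4 players")

    # Total width of the row of boxes, including the gaps between them.
    total = player_count * box_width + (player_count - 1) * gap
    left = (image_width - total) // 2

    boxes = []
    for i in range(player_count):
        x0 = left + i * (box_width + gap)
        boxes.append((x0, top, x0 + box_width, bottom))
    return boxes


if __name__ == "__main__":
    for count in range(1, 5):
        print(count, name_boxes(count))
```

That covers the names, but it only stays manageable as long as the names are the only thing that moves.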

We have a couple of options at this point:

  • enlarge the detection boxes for the longest/biggest text we have in the examples and see if that works across all of them
  • think about using something like OpenCV to do text ROI (region of interest) detection (e.g. as pyimagesearch does it)

The first seems like it could be done quicker than the second, so we’ll give that a try first. We’re still in the “what approach works” stage (aka the quick-and-dirty stage) here!
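Enlarging a box is simple enough to sketch. The helper below is hypothetical (the name `expand_box` and the padding values are ours for illustration); it grows a crop box by a fixed padding and clamps it to the image bounds. Boxes are (left, top, right, bottom) tuples, the order Pillow's `Image.crop` expects.

```python
def expand_box(box, pad_x, pad_y, image_size):
    """Grow a (left, top, right, bottom) box by pad_x/pad_y pixels per side,
    clamped so it never extends outside the image."""
    left, top, right, bottom = box
    width, height = image_size
    return (
        max(0, left - pad_x),
        max(0, top - pad_y),
        min(width, right + pad_x),
        min(height, bottom + pad_y),
    )


if __name__ == "__main__":
    # Placeholder box and padding, not values from the real script.
    print(expand_box((100, 50, 300, 90), 40, 10, (1920, 1080)))
```

The trade-off is that a bigger box also lets in more background, which is exactly what OCR struggles with.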

Unfortunately, that approach wasn’t quite successful. It’s possible that the particular frames we picked from each video had an impact, but that’s not something we can easily test with our current setup. Let’s see about adding OpenCV to the mix…

OpenCV

We’re going to reuse the approach taken by Adrian on pyimagesearch as the work has been done for us, and see where that gets us.

(…)

Well, the short answer is: not as far as I had hoped!

The boxes it detects on a full image cover either too little or too much, though the latter could probably be helped by some video pixel averaging to blur the background and keep the text crisp. However, it also splits on non-word boundaries. All of these problems can be worked around, but perhaps there’s another approach we can add to the mix?
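For reference, the pixel averaging idea could look something like this. It is only a sketch, assuming we already have several frames of the same loading screen as equally sized NumPy arrays (grabbing them from the video is left out), and `average_frames` is a name made up for this example. Pixels that stay the same across frames, such as static text, keep their value, while anything that changes gets smeared towards its mean.

```python
import numpy as np


def average_frames(frames):
    """Average a sequence of equally sized uint8 frames pixel by pixel."""
    # Work in float so the sum cannot overflow the uint8 range.
    stack = np.stack([np.asarray(frame, dtype=np.float32) for frame in frames])
    return stack.mean(axis=0).round().astype(np.uint8)


if __name__ == "__main__":
    # Two tiny synthetic "frames": the top-right pixel is static text (255),
    # the rest is background that changes between frames.
    frame_a = np.array([[0, 255], [0, 0]], dtype=np.uint8)
    frame_b = np.array([[200, 255], [200, 200]], dtype=np.uint8)
    print(average_frames([frame_a, frame_b]))
```

Whether that is enough to rescue the detection is untested here; it only addresses the "too much" half of the problem.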

Another Image

As well as a start screen, there’s also an end screen:

Another successful mission!

The information is presented slightly differently, but importantly (i) it is laid out more uniformly and (ii) background noise looks like less of an issue. Let’s put this one through the paces we did for the loading screen.

Overall, naive OCR pulls out names well but misses just about everything else. Mission name: yes. Mission type: nope. Minerals: yes. Promising! Heck, we could even pull out mission time and total hazard bonus if we wanted.

Let’s put OpenCV on the back burner for the time being, and see what a combined approach using two images gets us.
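One way the combining step might work, as a rough sketch: treat the OCR result of each screen as a dict of fields, start from the loading screen, and let the end screen win for the fields it reads better (or fill in anything the loading screen left empty). The function name `combine_fields`, the choice of which fields to prefer, and the sample values are all assumptions for illustration.

```python
def combine_fields(loading, ending, prefer_ending=("mission_name", "minerals")):
    """Merge per-field OCR results from the loading and end screens.

    Fields listed in prefer_ending are taken from the end screen; for all
    other fields the end screen is only used when the loading screen gave
    nothing (missing key or empty string).
    """
    combined = dict(loading)
    for field, value in ending.items():
        if field in prefer_ending or not combined.get(field):
            combined[field] = value
    return combined


if __name__ == "__main__":
    # Illustrative values only, not a real matched pair of screens.
    loading = {"biome": "MAGMA CORE", "mission_name": "URIFIED LEGAC)", "mission_type": ""}
    ending = {"mission_name": "PURIFIED LEGACY", "mission_type": "MINING EXPEDITION"}
    print(combine_fields(loading, ending))
```

Keeping the preference list as a parameter makes it easy to change our minds once we see which screen is more reliable for each field.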

             file                          names       mission_type                       biome      hazard     mission_name                     minerals
0  tests/drg1.png   [graham, MaoTheCat, BertieB]           EGG HUNT                  MAGMA CORE  HAZARD 3 -       OPEN TRICK                 F ATl\n\nEL]
1  tests/drg2.png                  [&l, [T, @&3]       INING EXPEDI  RADIOACTIVE EXCLUSION ZONE  HAZARD 3 -  PURIFIED LEGACY              RGN AL\n\n48 17
2  tests/drg3.png       [BertieB, L), MaoTheCat]  MINING EXPEDITIO|              GLACIAL STRATA  HAZARD 3 -  UNHEALTHY WRECK    MAGNITE 3 CROPPA\n\n39 -3
3  tests/drg4.png               [T, 3 Oz!, o\no]        AL ol 2N ()                  MAGMA CORE    HAZARD 4     RAPID POCKET       2 nli) |2 el 1T\n\n3 4
4  tests/drg5.png                [o383, (o383, ]   POINT EXTRACTION                    SALTPITS    HAZARD 4       ANGRY LUCK               BISMOR UMANITE
5  tests/drg6.png  [BertieB, Costello, Noobface]  SALVAGE OPERATIO|               DENSE BIOZONE    HAZARD 4         I E Vi S             Tt AL v4\n\n3} 8
6  tests/drg7.png            [®29, @&28, T VL R]                                    FUNGUS BOGS    HAZARD 4  CECOND COMEBACK               S 6syTel) fivd
7  tests/drg8.png         [IR A )], Costello, T]  MINING EXPEDITION              GLACIAL STRATA    HAZARD 4    COLDSSAL DOOM       [IChley [ (e\n\n169 48
8  tests/drg9.png             [. ®29, (o], I ‘4]                     RADIOACTIVE EXCLUSION ZONE    HAZARD 4   TIRTIY TN T (3  COSLINCL IR MAGNITE\n\nX 3]

Improvement! We’re getting somewhere now, and we’ll see what we can do to clean the rest of it up using two images as a basis.
