Automating YouTube Uploads With OCR Part 5: Refinements and Improving Accuracy

Having limited output possibilities helps immensely

We’ve been using pytesseract to OCR screens from Deep Rock Galactic (DRG) to get metadata for YouTube uploads.

Last time we explored a number of approaches to get the output on the right track. We settled on augmenting the processing with a second image, taken from the end screen, which has clearer text.

Colour Inversion

Let’s see if we can improve that further with refinements to the OCR boxes and what the Tesseract wiki suggests.

Yes:

             file                          names             mission_type                       biome      hazard        mission_name                      minerals
0  tests/drg1.png   [graham, MaoTheCat, BertieB]               1 EGG HUNT                  MAGMA CORE  HAZARD 3 -          OPEN TRICK       .ENDH PEARL UMANITE\n98
1  tests/drg2.png                  [&l, [T, @&3]      > miNiNG ExPeDITIBN  RADIOACTIVE EXCLUSION ZONE  HAZARD 3 -     PURIFIED LEGACY         � MAGNITE UMANITE\n17
2  tests/drg3.png       [BertieB, L), MaoTheCat]        MINING EXPEDITION              GLACIAL STRATA  HAZARD 3 -     UNHEALTHY WRECK          � MAGNITE CROPPA\n41
3  tests/drg4.png               [T, 3 Oz!, o\no]         ALVAGE OPERATION                  MAGMA CORE    HAZARD 4        RAPID POCKET      BISMOR ENOR PEARL\n22 24
4  tests/drg5.png                [o383, (o383, ]       ~ POINT EXTRACTION                    SALTPITS    HAZARD 4          ANGRY LUCK         BISMOR UMANITE\n94 19
5  tests/drg6.png  [BertieB, Costello, Noobface]        SALVAGE OPERATION               DENSE BIOZONE    HAZARD 4      RANGER'S PRIZE             � CROPPA JADIZ\n8
6  tests/drg7.png            [�29, @&28, T VL R]                | EGGHUNT                 FUNGUS BOGS    HAZARD 4     CECOND COMEBACK             �BISHUH JADIZ\na8
7  tests/drg8.png         [IR A )], Costello, T]  y\n\n MINING EXPEDITION              GLACIAL STRATA    HAZARD 4       COLDSSAL DOOM  � UMANITE ENOR PEARL\n169 48
8  tests/drg9.png             [. �29, (o], I �4]           EGG HUNT __ .l  RADIOACTIVE EXCLUSION ZONE    HAZARD 4  ILLUMINATED POCKET     .ENDH PEARL I MAGNITE\n29

Inverting the image so the text is black-on-white helps hugely. In fact, given that many of the fields have very restricted possibilities, we probably have enough to work with once we deal with the variable number of names.
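A minimal sketch of this preprocessing step, assuming Pillow is installed and that each field is read from a rectangular crop of the screenshot. The `prepare_for_ocr` and `ocr_region` helper names and the box coordinates are hypothetical, not taken from the original script. Converting to greyscale before inverting also normalises whatever colour mode the screenshot was saved in.

```python
from PIL import Image, ImageOps


def prepare_for_ocr(image, box=None):
    """Optionally crop, then greyscale and invert so light-on-dark text becomes dark-on-light."""
    if box is not None:
        # box is (left, upper, right, lower) in pixels
        image = image.crop(box)
    grey = image.convert("L")
    return ImageOps.invert(grey)


def ocr_region(path, box):
    """OCR a single field of a screenshot after inverting it."""
    import pytesseract  # imported here so prepare_for_ocr works without Tesseract installed

    with Image.open(path) as img:
        prepared = prepare_for_ocr(img, box)
    return pytesseract.image_to_string(prepared).strip()
```

Usage would look like `ocr_region("tests/drg1.png", (100, 50, 600, 90))`, where the coordinates are placeholders rather than the real position of any DRG field.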

Handling Different Numbers of Players / Names

In DRG there are 1-4 players per game. My games usually have 3 or 4 players, sometimes 2, and very, very rarely it's just me. As the players’ names appear in different positions depending on the number of players, we need to either:

i) use fixed boxes for each number and see which one has sensible output

ii) use OpenCV to detect the regions of text to OCR

The first approach is relatively straightforward. Since there are only a few regular players, including myself, we can check each layout’s output for any of those names and keep it if it seems sensible.
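The selection step can be sketched as follows, assuming each fixed layout (one set of boxes per player count) has already been OCR'd into a list of candidate names. The `KNOWN_PLAYERS` set, the `pick_layout` name and the scoring rule (most recognised names first, then fewest unrecognised ones) are illustrative assumptions rather than the exact logic used here.

```python
# Hypothetical set of regulars; the names are taken from the test output in this post.
KNOWN_PLAYERS = {"BertieB", "graham", "MaoTheCat", "Costello", "Noobface"}


def count_known(names, known=KNOWN_PLAYERS):
    """Count detected names that contain a known player's name (case-insensitive)."""
    known_lower = [k.lower() for k in known]
    return sum(any(k in name.lower() for k in known_lower) for name in names)


def pick_layout(ocr_by_layout, known=KNOWN_PLAYERS):
    """Choose the player count whose boxes produced the most sensible names.

    ocr_by_layout maps a player count (1-4) to the names OCR'd from that
    layout's boxes. Layouts are ranked by how many known names they contain,
    then by how few unrecognised names they contain.
    """
    def score(item):
        _, names = item
        hits = count_known(names, known)
        return (hits, -(len(names) - hits))

    return max(ocr_by_layout.items(), key=score)
```

For example, `pick_layout({2: ["BertieB", "Costello"], 3: ["BertieB", "L)", "Costello"], 4: ["o383", "BertieB", "Costello", "T"]})` returns `(2, ["BertieB", "Costello"])`: all three layouts find the same two regulars, but the two-player layout has no leftover noise.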

Doing that gets us to:

There’s a bit of overdetection, particularly in the last row, which actually only had two players. We can clean things up by:

i) if a name contains BertieB plus anything else, it’s BertieB, as my name doesn’t change (note this may not be true for everyone: some folks like to change their username)

ii) non-alphanumeric names can be pruned

iii) names of 1-3 chars are likely noise detected as text*

* The last one could probably be dealt with by appropriate thresholding, but that’s a topic for another time.
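The three clean-up rules translate fairly directly into a filter. This is a sketch: `clean_names` and `MY_NAME` are hypothetical names, rule ii is interpreted as "drop any name containing a character that isn't a letter or digit" (via `str.isalnum`), and the de-duplication at the end is an extra assumption.

```python
MY_NAME = "BertieB"  # my own username, which never changes


def clean_names(names, my_name=MY_NAME, min_len=4):
    """Apply the clean-up rules to a list of OCR'd player names."""
    cleaned = []
    for raw in names:
        name = raw.strip()
        if my_name.lower() in name.lower():
            # Rule i: anything containing my name is my name
            name = my_name
        elif not name.isalnum():
            # Rule ii: prune names with non-alphanumeric characters
            continue
        elif len(name) < min_len:
            # Rule iii: 1-3 character "names" are most likely noise
            continue
        if name not in cleaned:  # assumption: drop duplicates
            cleaned.append(name)
    return cleaned
```

Note that `isalnum` also rejects usernames containing spaces or underscores, so rule ii in this form may be too aggressive for some groups of players.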

Doing that, we get:

That is a huge improvement. We could hard-lock the output to a subset of names (the people I play 99% of my games with), but then I would have to remember to check it whenever I play on a public server or someone from my stream joins in. This is “good enough” for the time being!

Levenshtein Distance

Using the Levenshtein distance (the minimum number of single-character insertions, deletions, and substitutions needed to transform one string into another), we can compare the OCR’d text to each of the five mission types and pick whichever is closest. We can do the same with the biomes, minerals, and mission names. It should work excellently for the first three, as there are few choices, and it should still work acceptably well for the mission names, even though there are over a hundred first and last names to choose from.
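For reference, this is the standard dynamic-programming definition of the Levenshtein distance between strings a and b, where D(i, j) is the distance between the first i characters of a and the first j characters of b:

```latex
D_{i,0} = i, \qquad D_{0,j} = j, \qquad
D_{i,j} = \min\begin{cases}
  D_{i-1,j} + 1 & \text{(deletion)} \\
  D_{i,j-1} + 1 & \text{(insertion)} \\
  D_{i-1,j-1} + [a_i \neq b_j] & \text{(substitution)}
\end{cases}
\qquad
\operatorname{lev}(a, b) = D_{|a|,|b|}
```

As a worked example, the OCR’d "ALVAGE OPERATION" from tests/drg4.png, once lowercased, needs a single insertion (the leading "s") to become "salvage operation", so its distance is 1, while every other mission type would need far more edits.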

Our code is simple:

def hard_cast_text(detected_text, choices):
    """Hard cast detected_text to one of a list of choices"""
    from Levenshtein import distance
    distances = {}

    for choice in choices:
        # compare case-insensitively so OCR capitalisation doesn't matter
        distances[choice] = distance(choice.lower(),
                                     detected_text.lower())

    # the choice needing the fewest edits wins
    return min(distances, key=distances.get)

This could probably be made a one-liner if I thought long and hard enough about it. But we’re here to automate, not golf Python.
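For what it's worth, the one-liner is mostly a matter of passing a key function to `min` directly. To keep this sketch runnable without the external library, it swaps python-Levenshtein's `distance` for a small pure-Python version of the classic two-row dynamic-programming algorithm. `MISSION_TYPES` is an assumed list; the first four entries appear in the test output in this post.

```python
# Assumed list of mission types; the first four appear in the test output.
MISSION_TYPES = ["Egg Hunt", "Mining Expedition", "Salvage Operation",
                 "Point Extraction", "Elimination"]


def levenshtein(a, b):
    """Edit distance between a and b, keeping only two rows of the DP table."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]


def hard_cast_text(detected_text, choices):
    """Hard cast detected_text to the closest of choices, ignoring case."""
    return min(choices, key=lambda c: levenshtein(c.lower(), detected_text.lower()))
```

With this, `hard_cast_text("> miNiNG ExPeDITIBN", MISSION_TYPES)` gives `"Mining Expedition"`. Skipping the intermediate dictionary changes nothing about the result, since `min` with a key still returns the choice with the smallest distance.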

The minerals needed a little extra handling, since Enor Pearl is two words and certain detections were closer in Levenshtein distance to, say, Jadiz. Another scoring system that weights the beginning of strings more heavily may have helped there, but sticking with Levenshtein means I can strip out the external library and implement my own if I so wish.
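One way to handle the two-word Enor Pearl is to stop casting word by word and instead compare the whole mineral line against every ordered pair of minerals, so "enor pearl" is always matched as a unit. This is a hypothetical sketch, not the extra handling actually used: it assumes exactly two minerals per mission (as in all nine tests), keeps only the first line of the OCR text (the amounts sit on the second line), and inlines a compact Levenshtein so it runs without the external library. `cast_minerals` is an illustrative name.

```python
import re
from itertools import permutations

MINERALS = ["Bismor", "Croppa", "Enor Pearl", "Jadiz", "Magnite", "Umanite"]


def levenshtein(a, b):
    """Two-row dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def cast_minerals(detected_text, minerals=MINERALS):
    """Return the ordered pair of minerals closest to the OCR'd mineral line."""
    first_line = detected_text.splitlines()[0] if detected_text else ""
    # Replace anything but lowercase letters and spaces, then collapse whitespace
    cleaned = " ".join(re.sub(r"[^a-z ]", " ", first_line.lower()).split())
    best = min(permutations(minerals, 2),
               key=lambda pair: levenshtein(" ".join(pair).lower(), cleaned))
    return list(best)
```

For example, `cast_minerals(".ENDH PEARL UMANITE\n98")` returns `["Enor Pearl", "Umanite"]`, because "endh pearl umanite" is only two substitutions away from "enor pearl umanite". Trying all 30 ordered pairs is trivially cheap; a prefix-weighted measure such as Jaro-Winkler would be one option for the alternative scoring mentioned above, but isn't shown here.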

Our output for these nine tests looks good:

             file                                  names       mission_type                       biome    hazard        mission_name               minerals
0  tests/drg1.png           [graham, MaoTheCat, BertieB]           Egg Hunt                  Magma Core  Hazard 3          Open Trick  [Umanite, Enor Pearl]
1  tests/drg2.png  [BertieB, graham, MaoTheCat, ksyme99]  Mining Expedition  Radioactive Exclusion Zone  Hazard 3     Purified Legacy     [Magnite, Umanite]
2  tests/drg3.png           [BertieB, graham, MaoTheCat]  Mining Expedition              Glacial Strata  Hazard 3     Unhealthy Wreck      [Croppa, Magnite]
3  tests/drg4.png                    [BertieB, Costello]  Salvage Operation                  Magma Core  Hazard 4        Rapid Pocket   [Bismor, Enor Pearl]
4  tests/drg5.png  [BertieB, graham, Noobface, Costello]   Point Extraction                   Salt Pits  Hazard 4          Angry Luck      [Bismor, Umanite]
5  tests/drg6.png          [BertieB, Costello, Noobface]  Salvage Operation               Dense Biozone  Hazard 4      Ranger's Prize        [Jadiz, Croppa]
6  tests/drg7.png           [BertieB, Costello, bTRRABN]           Egg Hunt                 Fungus Bogs  Hazard 4     Second Comeback        [Bismor, Jadiz]
7  tests/drg8.png            [BertieB, Costello, graham]  Mining Expedition              Glacial Strata  Hazard 4       Colossal Doom  [Umanite, Enor Pearl]
8  tests/drg9.png                    [BertieB, Costello]           Egg Hunt  Radioactive Exclusion Zone  Hazard 4  Illuminated Pocket  [Magnite, Enor Pearl]

Next step? Further automation, of course!

Discover more from Rob's Blog