Having limited output possibilities helps immensely
We’ve been using pytesseract to help us OCR screens in Deep Rock Galactic to get metadata for YouTube uploads.
Last time we explored a number of approaches to get the output on the right track. We settled on using a second image from the end screen which had clearer text to augment the processing.
Colour Inversion
Let’s see if we can improve that further with box refinements and what the tesseract wiki suggests.
Yes:
file names mission_type biome hazard mission_name minerals
0 tests/drg1.png [graham, MaoTheCat, BertieB] 1 EGG HUNT MAGMA CORE HAZARD 3 - OPEN TRICK .ENDH PEARL UMANITE\n98
1 tests/drg2.png [&l, [T, @&3] > miNiNG ExPeDITIBN RADIOACTIVE EXCLUSION ZONE HAZARD 3 - PURIFIED LEGACY MAGNITE UMANITE\n17
2 tests/drg3.png [BertieB, L), MaoTheCat] MINING EXPEDITION GLACIAL STRATA HAZARD 3 - UNHEALTHY WRECK MAGNITE CROPPA\n41
3 tests/drg4.png [T, 3 Oz!, o\no] ALVAGE OPERATION MAGMA CORE HAZARD 4 RAPID POCKET BISMOR ENOR PEARL\n22 24
4 tests/drg5.png [o383, (o383, ] ~ POINT EXTRACTION SALTPITS HAZARD 4 ANGRY LUCK BISMOR UMANITE\n94 19
5 tests/drg6.png [BertieB, Costello, Noobface] SALVAGE OPERATION DENSE BIOZONE HAZARD 4 RANGER'S PRIZE CROPPA JADIZ\n8
6 tests/drg7.png [29, @&28, T VL R] | EGGHUNT FUNGUS BOGS HAZARD 4 CECOND COMEBACK BISHUH JADIZ\na8
7 tests/drg8.png [IR A )], Costello, T] y\n\n MINING EXPEDITION GLACIAL STRATA HAZARD 4 COLDSSAL DOOM UMANITE ENOR PEARL\n169 48
8 tests/drg9.png [. 29, (o], I 4] EGG HUNT __ .l RADIOACTIVE EXCLUSION ZONE HAZARD 4 ILLUMINATED POCKET .ENDH PEARL I MAGNITE\n29
Inverting the image to be black-on-white helps hugely. In fact, given many of the fields have very restricted possibilities, we probably have enough to work with, once we take care of variable number of names.
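To make the inversion step concrete, here is a minimal sketch that works on a plain grid of 8-bit greyscale values rather than a real screenshot. The function name and the sample pixel values are made up for illustration. In practice an imaging library would do this on the actual crop before it goes to pytesseract (Pillow's ImageOps.invert is one option).

```python
def invert_greyscale(pixels):
    """Invert 8-bit greyscale pixels so light-on-dark text becomes dark-on-light."""
    return [[255 - value for value in row] for row in pixels]


# Hypothetical 2x3 crop: bright text pixels (230) on a dark background (20)
crop = [
    [20, 230, 20],
    [20, 230, 230],
]
inverted = invert_greyscale(crop)
```

After inversion the text pixels are dark and the background is light, which is the black-on-white form that worked better above.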
Handling Different Numbers of Players / Names
In DRG there are 1-4 players. My games are usually 3 or 4 players, sometimes 2, and very rarely solo. As the players' names appear in different positions depending on the number of players, we need to either:
i) use fixed boxes for each number and see which one has sensible output
ii) use OpenCV to detect text to OCR
The first way is manageable in a relatively straightforward manner. Since there is a small number of regular players including myself, we can check for the presence of any of those in the output and keep it if it seems sensible.
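A rough sketch of how the fixed-box approach could pick a layout: OCR the name boxes for each possible player count, then keep whichever layout produced the most known names. Everything here is illustrative. The function names, the list of regulars, and the sample OCR strings are assumptions, and the OCR step itself is left out (the dictionary stands in for its results).

```python
# Hypothetical list of regulars; substitute the names you actually play with
KNOWN_PLAYERS = ("BertieB", "graham", "MaoTheCat", "Costello")


def count_known(names, known=KNOWN_PLAYERS):
    """Count OCR'd strings that contain one of the known player names."""
    return sum(
        1 for name in names
        if any(player.lower() in name.lower() for player in known)
    )


def pick_layout(ocr_by_layout, known=KNOWN_PLAYERS):
    """Pick the fixed-box layout whose OCR output has the most known names.

    ocr_by_layout maps a player count (1-4) to the strings read from that
    layout's name boxes. Returns (player_count, names); ties go to the
    layout listed first.
    """
    return max(ocr_by_layout.items(),
               key=lambda item: count_known(item[1], known))


# Made-up OCR results for one screenshot, one entry per candidate layout
candidates = {
    2: ["T", "3 Oz!"],
    3: ["BertieB", "L)", "MaoTheCat"],
    4: ["o383", "BertieB.", "", "~"],
}
player_count, names = pick_layout(candidates)
```

Here the three-player layout wins because two of its boxes contain known names. A screenshot with no regulars in it would need a different tie-break, which this sketch does not attempt.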
Doing that gets us to:
There’s a bit of overdetection, particularly in the last row, which actually only had two players. We can clean things up by:
i) if a name contains BertieB plus anything else, it’s BertieB, as my name doesn’t change (note this may not be true for everyone; some folks like to change their username)
ii) non-alphanumeric names can be pruned
iii) names of 1-3 chars are likely noise detected as text*
* The last one could probably be dealt with by appropriate thresholding, but that’s a topic for another time.
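The three clean-up rules can be sketched as a small filter. This is an illustration rather than the exact code used: the function name is made up, "non-alphanumeric" is read here as "contains any character that is not a letter or digit" (so names with spaces or punctuation are dropped), and the de-duplication at the end is an extra assumption.

```python
def clean_names(raw_names, fixed_name="BertieB"):
    """Apply the three clean-up rules to a list of OCR'd name strings."""
    cleaned = []
    for name in raw_names:
        name = name.strip()
        # Rule i: anything containing the fixed name is just that name
        if fixed_name in name:
            name = fixed_name
        # Rule ii: prune names with non-alphanumeric characters
        # (assumed reading; also drops empty strings)
        if not name.isalnum():
            continue
        # Rule iii: names of 1-3 characters are likely noise
        if len(name) <= 3:
            continue
        # Assumption: the same player should not be listed twice
        if name not in cleaned:
            cleaned.append(name)
    return cleaned
```

For example, clean_names(["BertieB|", "L)", "MaoTheCat"]) keeps BertieB and MaoTheCat and drops the noise in between. The alphanumeric rule would also discard real usernames containing spaces or underscores, which is a trade-off of reading the rule this strictly.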
Doing that, we get:
Which is a huge improvement. We could hard-lock the output to a subset of names (which 99% of my games are with), but that would be a headache to remember to check when playing on a public server or with people who want to join in from my stream. This is “good enough” for the time being!
Levenshtein Distance
Using the Levenshtein distance (the number of edits needed to transform one string into another), we can compare the OCR’d text to the five mission types and pick whichever is closest. We can do the same thing with the biomes, minerals, and mission names. It should work excellently for the first three as there are few choices; however, it should still work acceptably well for the mission names, even though there are over a hundred first and last names.
Our code is simple:
def hard_cast_text(detected_text, choices):
    """Hard cast detected_text to one of list of choices"""
    from Levenshtein import distance
    # Score each choice by its case-insensitive edit distance to the OCR text
    distances = {}
    for choice in choices:
        distances[choice] = distance(choice.lower(),
                                     detected_text.lower())
    # The choice with the smallest distance wins
    return min(distances, key=distances.get)
This could probably be made a one-liner if I thought long and hard enough about it. But we’re here to automate, not golf Python.
The minerals needed a little extra to handle enor pearl being two words and certain detections being closer in Levenshtein distance to, say, jadiz. Another scoring system that weights the beginning of strings more heavily may have helped there, but keeping it to Levenshtein means I can strip out the external library and implement my own if I so wish.
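For reference, a self-contained Levenshtein distance is short enough to write by hand. The sketch below is one standard dynamic-programming version, not the external library's implementation, and the names levenshtein, closest_choice, and MISSION_TYPES are illustrative. The list holds only the four mission types that appear in the test output here. It also shows the choice-picking step collapsed into a single min call.

```python
def levenshtein(a, b):
    """Count the single-character edits needed to turn string a into string b."""
    # previous[j] is the distance between the first i-1 chars of a and first j of b
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # delete char_a
                current[j - 1] + 1,                    # insert char_b
                previous[j - 1] + (char_a != char_b),  # substitute, or free match
            ))
        previous = current
    return previous[-1]


def closest_choice(detected_text, choices):
    """Return the choice with the smallest case-insensitive edit distance."""
    return min(choices,
               key=lambda c: levenshtein(c.lower(), detected_text.lower()))


# Only the mission types seen in the nine test screenshots
MISSION_TYPES = ["Egg Hunt", "Mining Expedition",
                 "Salvage Operation", "Point Extraction"]
```

With this, closest_choice("ALVAGE OPERATION", MISSION_TYPES) picks "Salvage Operation", since only the missing leading letter separates the two once case is ignored.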
Our output for these nine tests looks good:
file names mission_type biome hazard mission_name minerals
0 tests/drg1.png [graham, MaoTheCat, BertieB] Egg Hunt Magma Core Hazard 3 Open Trick [Umanite, Enor Pearl]
1 tests/drg2.png [BertieB, graham, MaoTheCat, ksyme99] Mining Expedition Radioactive Exclusion Zone Hazard 3 Purified Legacy [Magnite, Umanite]
2 tests/drg3.png [BertieB, graham, MaoTheCat] Mining Expedition Glacial Strata Hazard 3 Unhealthy Wreck [Croppa, Magnite]
3 tests/drg4.png [BertieB, Costello] Salvage Operation Magma Core Hazard 4 Rapid Pocket [Bismor, Enor Pearl]
4 tests/drg5.png [BertieB, graham, Noobface, Costello] Point Extraction Salt Pits Hazard 4 Angry Luck [Bismor, Umanite]
5 tests/drg6.png [BertieB, Costello, Noobface] Salvage Operation Dense Biozone Hazard 4 Ranger's Prize [Jadiz, Croppa]
6 tests/drg7.png [BertieB, Costello, bTRRABN] Egg Hunt Fungus Bogs Hazard 4 Second Comeback [Bismor, Jadiz]
7 tests/drg8.png [BertieB, Costello, graham] Mining Expedition Glacial Strata Hazard 4 Colossal Doom [Umanite, Enor Pearl]
8 tests/drg9.png [BertieB, Costello] Egg Hunt Radioactive Exclusion Zone Hazard 4 Illuminated Pocket [Magnite, Enor Pearl]
Next step? Further automation, of course!