Automating YouTube Uploads With OCR Part 3: Programming with pytesseract and pillow

Last time, a bit of investigating showed that with a little cropping, tesseract can give good OCR results on a still of Deep Rock Galactic’s loading screen.

However, we were cropping manually, which defeats the purpose of the exercise: automating metadata generation.

Thankfully, most of the operations we want to do are purely crops, so it’s straightforward to write a basic Python script to get tesseract to recognise the right words.

Let’s jump right in with something quick and dirty. The goal here is to get some useful output quickly, so we can confirm that the approach is viable; proper code architecture comes later.
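As a rough illustration of the crop-then-OCR idea, here is a minimal sketch. The box coordinates, the names BOXES, ocr_regions and main, and the optional ocr parameter are all assumptions made for this example; they are not taken from the original script.

```python
# Hypothetical crop boxes as (left, upper, right, lower) pixel coordinates.
# These numbers are placeholders, not the values used for the real screenshot.
BOXES = {
    "mission_type": (800, 100, 1120, 140),
    "mission_name": (760, 140, 1160, 200),
    "biome": (780, 210, 1140, 250),
    "hazard": (860, 260, 1060, 300),
}


def ocr_regions(img, boxes, ocr=None):
    """Crop each named box out of img and OCR it, returning {name: text}."""
    if ocr is None:
        # Imported lazily so the function can be exercised with a stub OCR.
        import pytesseract
        ocr = pytesseract.image_to_string
    return {name: ocr(img.crop(box)).strip() for name, box in boxes.items()}


def main(path):
    """Hypothetical entry point: OCR every box of the image at path."""
    from PIL import Image

    print("Starting DRG OCR...")
    with Image.open(path) as img:
        for name, text in ocr_regions(img, BOXES).items():
            print(f"{name}: {text}")
```

Passing the OCR function in as a parameter is only a convenience for trying the cropping logic without tesseract installed. The run below comes from the original script rather than from this sketch.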

Starting DRG OCR...
['BertieB', 'graham', 'ksyme99']
POINT EXTRACTION
CLOUDED Joy
DENSE BIOZONE
HAZARD 3
COLLECT 7 AQUARQS
Collect 15 Apoca Bloom

We got nearly everything we want from the image. The exception is the minerals, which are shown as pictographs and which tesseract, to my knowledge, doesn’t handle.

There was one gotcha though. While the mission type (Point Extraction) was handled fine when using the full-sized image, none of the crop boxes I tried managed to OCR the text correctly. If I used a box which included the mission name, it read both okay, so it would have been possible to do a combined OCR and split on newline.
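That fallback is easy to picture. A rough sketch, assuming the mission type comes out as the first non-empty line of the combined text and the mission name as the second (split_type_and_name is a hypothetical helper, not part of the original script):

```python
def split_type_and_name(text):
    """Split combined OCR text into (mission_type, mission_name).

    Assumes the mission type is the first non-empty line and the
    mission name is the second; blank lines are ignored.
    """
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if len(lines) < 2:
        raise ValueError(f"expected two lines of text, got {lines!r}")
    return lines[0], lines[1]
```

Raising on fewer than two lines makes a bad crop obvious instead of silently producing a missing field.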

One of the techniques to get a more accurate result with tesseract is to surround a small box with a border, which gave the right result:

# Pad the small crop with a 10px white border before handing it to tesseract
img_mission_type = ImageOps.expand(img.crop(mission_type_box), border=10, fill="white")
mission_type = pytesseract.image_to_string(img_mission_type)

Our very quick-and-dirty script gets what we’re expecting. The next step is to clean it up and expand our testing base. We can also consider the final output: if we’re giving it a set of images to improve the range it can deal with, we might as well get useful output from it!

We’ll start by adapting it to these nine images. The one at middle bottom might be an issue, because the exhaust from the drop ship changes the contrast quite significantly. Either it’ll be made to work or we’ll have to choose a different frame from that video.

Running the script as-is on image 1 (top-left), we get:

Starting DRG OCR...
['graham', 'PR A', 'BertieB']
EGG HUNT
' DEFECT CELL
MAGMA CORE
HAZARD 3
COLLECT 6 EGGS
Collect 25 Hollomite

Not bad, but it’s tripped up on MaoTheCat and added an extra apostrophe to the mission name. Looking at the crop boxes, it seems the name box for the middle player is too high, and the mission name box is catching just a tiny bit of the mission icon. Tweaking the boxes, we get:

Starting DRG OCR...
['graham', 'MaoTheCat', 'BertieB']
EGG HUNT
DEFECT CELL
MAGMA CORE
HAZARD 3
COLLECT 6 EGGS
Collect 25 Hollomite

And the output from the original image remains accurate too. We will continue this process for the rest of the test images and see where it takes us…
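Since every box tweak risks breaking an image that already worked, one way to keep track is a small regression check. This is only a sketch: find_regressions, the extract callable and the field names are hypothetical, and the expected values would be filled in by hand for each test image.

```python
def find_regressions(expected, extract):
    """Compare extracted fields against known-good values.

    expected maps an image name to {field: known-good text}.
    extract is any callable taking an image name and returning
    {field: text}, for example a wrapper around the OCR script.
    Returns {image: {field: (wanted, got)}} for every mismatch.
    """
    failures = {}
    for image, wanted in expected.items():
        got = extract(image)
        diffs = {
            field: (value, got.get(field))
            for field, value in wanted.items()
            if got.get(field) != value
        }
        if diffs:
            failures[image] = diffs
    return failures
```

An empty result after a tweak means every image checked so far still reads the same way.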
