cool programming

AI Dungeon 2 Is Fun Nonsense

Apparently, there’s something about Mary

It’s night. I’m a private detective from Chicago named Joseph, on the hunt for someone named Jim, and I have a gun and a badge. I’m in the woods, and I hear some noise from behind the trees. Suddenly an old man shoots an arrow from a bow at a hitherto-unseen target. He runs off, but I catch up with him and ask his name. It turns out that he’s also a detective from Chicago named John, and he’s hot on the trail of Jim too.

I ask “How did you know my name?” and he replies, succinctly: “Because we’re both detectives.” I try to discuss the case with him, but he refuses to be drawn on it, preferring to cryptically state “I’m sure we’ll have some clues soon enough”.

We come across a small house in the woods, and I venture inside. A woman sits, reading quietly. I ask her about Jim, but she only says that he left long ago. I make a note of the house and return the next day without John. I look around and find some white socks and black pants. Ah-ha! These are crucial to the case. I put them on immediately. Surely it’s now only a matter of time before I find Jim.


I go back outside and see John, the other detective, watching me cautiously. Clearly he’s jealous of my new socks and pants. He disappears into the woods. I run after him but find only a shack, in which a single light bulb illuminates a strange assortment of books and papers with diagrams. I picture Jim with this:

Combing through the strange lot of papers, I find one that might help my case! It’s a drawing. A drawing of a man in front of a tree. He has a hat, and the hat has horns. His eyes are wide open and staring at me.

This is Jim!

I find the tree in the drawing. It’s odd. It isn’t right. It seems to be made of wood, but it has cracks all over and seems as if it was never alive in the first place. Maybe it has Jim inside it? In any case it isn’t right. It has to go.

I break the tree apart, fling a piece at a nearby wall, which thuds, then silence.

The next day, I come home and see that everything is gone.

The above is how my first dabble with AI Dungeon 2 started. I was linked to it without context, so had no preconceptions going in. It all started off somewhat normally; I wondered if it was some kind of randomly-generated MUD (an old text-based system, predating popular MMORPGs, that let users create text-based worlds and interact with one another). But as things got slowly more odd it seemed like it was something else. It had the slightly weird, funny cadence that computer-generated text has.

I had come close to finding Jim. The house, the pants, the drawing in the shack, and the tree. They all fitted together, and I knew I must be close. I returned to the woods.

Thereupon I chanced on a woman sitting on a rock, crying. She explained that her sister Mary had gone missing only the night before. Perhaps Jim had a hand in this. I tried to explain the situation as best I could, but this only upset her more. So instead, I gave her a hug. This calmed her down, perhaps too much. She fell to the ground. She needed to be somewhere safe, but where? Ah! The shack! I carry her there.

Going in, I find a man dressed in an old coat and wearing glasses. He has long white hair that hangs down to his shoulders. His eyes are closed and he seems very tired looking. What the heck is he doing there? I demand to know his name.

“My name is James, but everyone calls me Jack.” Joseph, John, Jim, James, Jack… Wait! James? As in the unshortened form of Jim..? I have to think on my feet, and decide to act quickly.

“Where is Mary?”

I’ve got him now. Or so I think. But the man just sighs and shakes his head. He thinks he’s won. But I’m Joseph, a detective from Chicago. And Chicago detectives know how to roll with the punches, literally and figuratively. I decide to roll with this one and throw him off balance. I drop my voice, lean in close and growl:

“Where is Jim?”


He yawns and rubs his eyes. He looks tired too. But he knows I’ve got him. “Mary… Jim… Where is Mary?” He’s trying to throw me, but he didn’t reckon with my Windy City credentials. He coughs and then speaks. “She left with another guy named John.”


The one thing I wasn’t expecting. The one man I didn’t suspect.

Time for action. Mary and John can wait, but Jim’s my case and he has questions to answer. I grab Jim by the collar and pull him from behind the desk. He puts up a brief resistance, but he isn’t strong enough to break free. Up against the wall he goes, and I cuff his hands together behind his back. Time to take him downtown.

I’ve long enjoyed the output of Markov chains. They are some relatively simple procedures for generating sequences based on previous values and frequencies. You can apply this to text, and generate new text based on frequencies of letters, or words.
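To make that concrete, here’s a minimal word-level Markov chain sketch (the function names are my own, not from any particular library): build a table mapping each run of words to the words seen to follow it, then walk the table picking random successors.

```python
import random
from collections import defaultdict

def build_chain(text, order=2):
    """Map each sequence of `order` words to the words seen to follow it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        key = tuple(words[i:i + order])
        chain[key].append(words[i + order])
    return chain

def generate(chain, length=20):
    """Walk the chain: repeatedly pick a random successor of the current key."""
    key = random.choice(list(chain))
    out = list(key)
    for _ in range(length):
        choices = chain.get(tuple(out[-len(key):]))
        if not choices:
            break  # dead end: this run of words was never followed by anything
        out.append(random.choice(choices))
    return " ".join(out)
```

Short orders produce near-gibberish; longer orders reproduce the source ever more faithfully, exactly as described below.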

The old resources I used to learn about Markov chains way back when have somewhat stuck in my head. I recall a reference to ‘Alice in Elsinore’, which can be found at a page called ‘Fun with Markov Chains‘. There’s another piece which went into varying lengths: short lengths (say, one to three characters) produce gibberish that almost looks like it might have been English once, and longer lengths come gradually closer and closer to the original text[s]. That seems to have been part of Programming Pearls, which used to be available to read online; I only managed to find part of that section archived on Jeff Atwood’s blog by use of some judicious Google search tools.

You can create some fun things with Markov chains. The examples given above included a generated Alice in Elsinore and the Revelation of Alice. I implemented Markov chain text generation as a command for an IRC bot that I wrote, which could talk in the ‘voice’ of my friends that hung out on there; that command was definitely my favourite.


Latterly, we’ve seen a resurgence in this with the rise of ‘AI’, such as this ‘AI-written Harry Potter fanfiction’:

Harry Potter and the Portrait of What Looked Like a Large Pile of Ash
Hungry indeed

or less child-friendly things, like Trump speeches:

But calling any of this ‘AI’ is a stretch. It’s picking things based on random chance and frequency. If I have a sock drawer with thirty red socks, six green and two blue, I’d be… a bit boring. But if I closed my eyes and picked socks from there, it would be a bit misleading to write an article saying “I got an AI to choose my clothes for the week and these are the results”.

But I digress.

Having brought in Jim, my attention must turn to Mary. Her sister was counting on me. I trusted my Chicago detective instincts and followed up on a lead that Jim spilled during his interrogation.

I went to the park. There I met two men, Mikey and Brenda. Apparently, they didn’t get along. I knew Mikey was hiding something, and decided to find out what it was. I dragged him into an alleyway, shoved my knee into his back, and started punching him.


Good Cop time was over, now it’s Bad Cop’s shift.

Mikey pleaded with me for mercy, this was all a misunderstanding, help would be forthcoming, he didn’t want to die, etc. I told him to shut up.

“Where is Jim?” I asked in the same voice I used on Jim earlier… Wait, wait. Wasn’t Jim at the police station? “Oh, that’s right,” Mikey says. “He went home for the day.” I was confused, but went along with it. “Oh, good.” But then Mikey had a surprise for me. He grabbed me, threatened me and apologised. I sensed that Jim was a touchy subject best left alone, so asked about Mary.

“Mary?” Mikey asks. “Who’s Mary?” I explained about the woman’s missing sister. “What about her?” Mikey enquires further. But at that point we spot Mary coming out of a store. I approach Mary, and she looks surprised to see me.

“Hey, you’re not my brother anymore,” Mary says. “Are…are you?”

Apparently she recognised me. I ask about her sister and Mary explains she’s at work.

At this point I realise something weird is going on. Sounds seem muffled, colours aren’t quite right, and time and place seem strangely elastic.

I thought perhaps AI Dungeon 2 was a bit like Sleep Is Death (Geisterfahrer) by Jason Rohrer, where the stories are written by players; or Cleverbot, where responses given by people are saved and can be reused.

But AI Dungeon 2 instead uses deep learning techniques to keep generating content, no matter what is thrown at it. It does have limitations, but it’s an interesting concept sprung from a Hackathon.

Best bit? It’s Free Software, MIT licensed! Check out its GitHub!

Things were getting weird. I tried to dance with Mary, which seemed like the thing to do at the time. She stared at me, but not in an uncomfortable way. I tried a backflip, and it ended with us falling asleep together. Then I had to run away, far away; away from the voices shouting that we’re not sisters.

A group of men accosted me. They looked like they had been drinking heavily. I had to keep the initiative; my detective instincts took over and I slapped one of the men. It surprised the group. I slapped another one and it surprised them identically. But they started to beat me, which I guess was inevitable.

I tried everything to distract them. The harmonica, juggling, telling a joke. Fortunately, the last one worked. Unfortunately, at that moment a helicopter landed and I was kidnapped. Mary tried to rescue me, but the jailer was having none of her pleas for mercy or bribes. Eventually, he tired of the conversation and wandered off into the woods, and Mary went all Bastille Day on the prisoners.

The narrative was based on my first interaction with AI Dungeon 2, which can be read in full.


automation computer vision programming python

Automating YouTube Uploads With OCR Part 8: Output

Nearly a working tool!

We’ve been using python and tesseract to OCR frames from video footage of Deep Rock Galactic to extract metadata we can use when putting the videos on YouTube.


Nearly all of the elements are captured; there are just the mutators left to capture: warnings and anomalies. These appear in text form on the starting screen on either side of the mission block:

Here we have a Cave Leech Cluster and a Rich Atmosphere.

Since the text of these mutators comes from a known list of ten or fewer for each, we can detect them using a wide box, then hard-cast them to whichever potential output has the smallest Levenshtein distance.
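A sketch of that hard-casting, using a pure-python edit distance rather than an external library; the mutator lists here are abbreviated examples, not the full in-game set.

```python
# Abbreviated example lists; the real game has more of each
WARNINGS = ["Cave Leech Cluster", "Exploder Infestation", "Low Oxygen"]
ANOMALIES = ["Rich Atmosphere", "Critical Weakness", "Low Gravity"]

def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,        # deletion
                           cur[j - 1] + 1,     # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def hard_cast(detected, choices):
    """Snap noisy OCR output to whichever known choice is closest."""
    return min(choices, key=lambda c: levenshtein(detected.lower(), c.lower()))
```

Even badly mangled OCR output lands on the right mutator, since the candidate lists are so small.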

Tie-Breaking Frames

The loading/ending frame detection works well for most, but on the odd one or two it suffers. It’s best to ignore frames which are completely or mostly dark (ie either transition or fade-in), and ones that are very bright (eg a light flash), as both hurt contrast and so hurt OCR.

Using ImageStat from PIL we can grab the frame mean (averaged across RGB values), then normalise it to add to our frame scoring function in the detection routine.

We want to normalise between 0 and 1, which is easy to do if you want to scale linearly between 0 and 255 (the RGB max value): just divide the average by 255. But we don’t want that. Manually looking at a few good, contrasty frames, it seemed that a mean value of 75 was the best; even by 150 the frame was looking quite washed out. So we want a score of 0 at mean pixel values of 0 and 150, and a score of 1 at a mean pixel value of 75:

# Tie break score graph should look something like:
# (tb_val)          
# |    /\            
# |   /  \           
# |  /    \          
# |_/      \_ (x)                
# 0    75    150                
# For sake of argument using 75 as goldilocks value
# ie not too dark, not too bright

75 is thus our ‘goldilocks’ value: not too dark, not too light. So our tiebreak value is:

tb_val = (goldilocks - (abs(goldilocks - frame_mean)))/goldilocks
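Putting that together with the ImageStat mean mentioned above, a sketch of the scoring helper (assuming Pillow; `tiebreak_score` is my own name for it, and I clamp below zero so over-bright frames don’t go negative):

```python
from PIL import Image, ImageStat

GOLDILOCKS = 75  # the 'not too dark, not too bright' mean pixel value

def tiebreak_score(img):
    """Score a frame's brightness: 1.0 at the goldilocks mean,
    falling linearly to 0 at means of 0 and 150 (clamped beyond that)."""
    frame_mean = sum(ImageStat.Stat(img.convert("RGB")).mean) / 3
    tb_val = (GOLDILOCKS - abs(GOLDILOCKS - frame_mean)) / GOLDILOCKS
    return max(tb_val, 0.0)
```

This value is then added into the frame scoring function in the detection routine.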


Since we’ve gotten detection of the various elements to where we want them, we can start generating output. Our automated YT uploader works with JSON, and looks for the following fields: filename, title, description, tags, playlists, game, thumb (time, title, additional), and scheduled.

Thumb time and additional we can safely ignore. Title is easy, as I use mission_type: mission_name. All of my Deep Rock Galactic uploads go into the one playlist. Tags are a bunch of things like hazard level, minerals, biome and some other common-to-all ones like “Deep Rock Galactic” (for game auto detection). The fun ones are description and scheduled.

Funnily enough, one of my earliest forays into javascript was a mad-libs style page which took the phrases via prompt() and put them in some text.

This was back in the days of IE4, and javascript wasn’t quite what it is today…

For the description, I took a bit of a “mad libs” style approach: use the various bits and pieces we’ve captured with a variety of linking verbs and phrases to give non-repetitive output. This mostly comes down to writing the phrases, sticking them in a bunch of lists and using random.choice() to pick one of them.
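A sketch of that approach; the phrase lists here are made-up stand-ins for the real ones, and `build_description` is a hypothetical helper name.

```python
import random

# Hypothetical fragments; the real lists are longer and more varied
INTROS = ["{players} get their orders from Mission Control",
          "{players} are dropped in"]
OBJECTIVE_PHRASES = ["to take on {objective}",
                     "to see to {objective}"]

def build_description(players, biome, objective):
    """Mad-libs style: pick one phrase from each list and fill in
    the metadata captured by the OCR stages."""
    intro = random.choice(INTROS).format(players=players)
    action = random.choice(OBJECTIVE_PHRASES).format(objective=objective)
    return f"{intro} in the {biome} {action}"
```

With a handful of phrases per list, consecutive videos get noticeably different descriptions from the same captured data.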

For obvious reasons, I don’t want to publish fifty-odd videos at once, rather spread them out over a period. I publish a couple of DRG videos on a Monday, Wednesday, Friday and at the weekend. To do this in python, I decided to use a generator, and call next() on it every time we need to populate the scheduled field. The function itself is fairly simple: if the time of scheduled_date is the earlier of the times at which I publish, go to the later one and return the full date; if it’s at the later time, increment by two days (if Monday/Wednesday) or one day (otherwise), and set the time back to the earlier one.
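A sketch of such a generator; the 18:00/20:00 slot times are placeholders rather than my actual schedule, and the two-then-one day step gives the Mon/Wed/Fri-plus-weekend pattern.

```python
from datetime import datetime, timedelta

EARLY, LATE = "18:00", "20:00"  # placeholder publish times

def schedule_slots(start):
    """Yield publish datetimes: the earlier slot then the later slot on
    each publishing day, then skip two days ahead from Monday/Wednesday
    and one day otherwise."""
    current = start
    while True:
        yield current.strftime("%Y-%m-%d") + " " + EARLY
        yield current.strftime("%Y-%m-%d") + " " + LATE
        step = 2 if current.weekday() in (0, 2) else 1  # 0=Mon, 2=Wed
        current += timedelta(days=step)
```

Each video’s scheduled field is then populated with next() on this generator.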

We run this through json.dumps() and we have output! For example:

{
  "filename": "2019-10-17 19-41-38.mkv",
  "title": "Elimination: Illuminated Pocket",
  "description": "BertieB, Costello and graham get their orders from Mission Control and get dropped in to the Fungus Bogs to take on the mighty Dreadnoughts in Illuminated Pocket (Elimination)\n\nRecorded on 2019-10-17",
  "tags": [
    "Deep Rock Galactic",
    "Fungus Bogs",
    "Hazard 4",
    "Enor Pearl"
  ],
  "playlists": "Deep Rock Galactic",
  "game": "drg",
  "thumb": {
    "title": "Pocket Elimination"
  },
  "scheduled": "2019-11-18 18:00"
}

Looks good!

automation ocr programming python

Automating YouTube Uploads With OCR Part 5: Refinements and Improving Accuracy

Having limited output possibilities helps immensely

We’ve been using pytesseract to help us OCR screens in Deep Rock Galactic to get metadata for YouTube uploads.

Last time we explored a number of approaches to get the output on the right track. We settled on using a second image from the end screen which had clearer text to augment the processing.

Colour Inversion

Let’s see if we can improve that further with box refinements and what the tesseract wiki suggests.


             file                          names             mission_type                       biome      hazard        mission_name                      minerals
0  tests/drg1.png   [graham, MaoTheCat, BertieB]               1 EGG HUNT                  MAGMA CORE  HAZARD 3 -          OPEN TRICK       .ENDH PEARL UMANITE\n98
1  tests/drg2.png                  [&l, [T, @&3]      > miNiNG ExPeDITIBN  RADIOACTIVE EXCLUSION ZONE  HAZARD 3 -     PURIFIED LEGACY         ‘ MAGNITE UMANITE\n17
2  tests/drg3.png       [BertieB, L), MaoTheCat]        MINING EXPEDITION              GLACIAL STRATA  HAZARD 3 -     UNHEALTHY WRECK          ‘ MAGNITE CROPPA\n41
3  tests/drg4.png               [T, 3 Oz!, o\no]         ALVAGE OPERATION                  MAGMA CORE    HAZARD 4        RAPID POCKET      BISMOR ENOR PEARL\n22 24
4  tests/drg5.png                [o383, (o383, ]       ~ POINT EXTRACTION                    SALTPITS    HAZARD 4          ANGRY LUCK         BISMOR UMANITE\n94 19
5  tests/drg6.png  [BertieB, Costello, Noobface]        SALVAGE OPERATION               DENSE BIOZONE    HAZARD 4      RANGER'S PRIZE             ‘ CROPPA JADIZ\n8
6  tests/drg7.png            [®29, @&28, T VL R]                | EGGHUNT                 FUNGUS BOGS    HAZARD 4     CECOND COMEBACK             ‘BISHUH JADIZ\na8
7  tests/drg8.png         [IR A )], Costello, T]  y\n\n MINING EXPEDITION              GLACIAL STRATA    HAZARD 4       COLDSSAL DOOM  ‘ UMANITE ENOR PEARL\n169 48
8  tests/drg9.png             [. ®29, (o], I ‘4]           EGG HUNT __ .l  RADIOACTIVE EXCLUSION ZONE    HAZARD 4  ILLUMINATED POCKET     .ENDH PEARL I MAGNITE\n29

Inverting the image to be black-on-white helps hugely. In fact, given many of the fields have very restricted possibilities, we probably have enough to work with, once we take care of variable number of names.
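The inversion itself is a one-liner with Pillow’s ImageOps; a sketch, with `prepare_for_ocr` being my own helper name:

```python
from PIL import Image, ImageOps

def prepare_for_ocr(img, box=None):
    """Optionally crop to a region, then invert so the game's light-on-dark
    text becomes black-on-white, which tesseract handles far better."""
    if box is not None:
        img = img.crop(box)
    return ImageOps.invert(img.convert("RGB"))
```

The result is then handed to pytesseract.image_to_string() as before.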

Handling Different Numbers of Players / Names

In DRG there are 1-4 players. My games are usually 3 or 4 players, sometimes 2, very very rarely solo. As the players’ names appear in different positions depending on the number of players, we need to either

i) use fixed boxes for each number and see which one has sensible output

ii) use OpenCV to detect text to OCR

The first way is manageable in a relatively straightforward manner. Since there is a small number of regular players including myself, we can check for the presence of any of those in the output and keep it if it seems sensible.

Doing that gets us to:

There’s a bit of overdetection, particularly in the last row, which actually only had two players. We can clean things up by:

i) if a name is BertieB with anything else, it’s BertieB, as my name doesn’t change (note this may not be true for everyone; some folks like to change their username)

ii) non-alphanumeric names can be pruned

iii) names of 1-3 chars are likely noise detected as text*

* The last one could probably be dealt with by appropriate thresholding, but that’s a topic for another time.
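Those three rules translate fairly directly into code; a sketch, with `clean_names` being my own name for it (the order matters, with my own name checked first):

```python
import re

def clean_names(detected):
    """Prune OCR'd name candidates using the three rules above."""
    names = []
    for name in detected:
        if "BertieB" in name:
            names.append("BertieB")       # rule i: my name never changes
        elif not re.search(r"[A-Za-z0-9]", name):
            continue                      # rule ii: pure punctuation is noise
        elif len(name) <= 3:
            continue                      # rule iii: 1-3 chars is likely noise
        else:
            names.append(name)
    return names
```

Run over the noisy detections from the table above, this strips out the `[&l`-style junk while keeping real player names.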

Doing that, we get:

Which is a huge improvement. We could hard-lock the output to a subset of names (which 99% of my games are with), but that would be a headache to remember to check in the case of playing a game on a public server or people who want to join in my stream. This is “good enough” for the time being!

Levenshtein Distance

Using the Levenshtein distance – the number of edits needed to transform one string into another – we can compare the OCR’d text to the five mission types, and pick whichever is closest. We can do the same thing with the biomes, minerals, and mission names. It should work excellently for the first three as there are few choices; however it should still work acceptably well for the mission names, even though there are over a hundred first and last names.

Our code is simple:

def hard_cast_text(detected_text, choices):
    """Hard cast detected_text to one of list of choices"""
    from Levenshtein import distance
    distances = {}
    for choice in choices:
        distances[choice] = distance(choice.lower(), detected_text.lower())
    return min(distances, key=distances.get)

This could probably be made a one-liner if I thought long and hard enough about it. But we’re here to automate, not golf python.

The minerals needed a little extra to handle Enor Pearl being two words, and certain detections being closer in Levenshtein distance to, say, Jadiz. Another scoring system that weights the beginning of strings more heavily might have helped there, but keeping it to Levenshtein means I can strip out the external library and implement my own if I so wish.

Our output for these nine tests looks good:

             file                                  names       mission_type                       biome    hazard        mission_name               minerals
0  tests/drg1.png           [graham, MaoTheCat, BertieB]           Egg Hunt                  Magma Core  Hazard 3          Open Trick  [Umanite, Enor Pearl]
1  tests/drg2.png  [BertieB, graham, MaoTheCat, ksyme99]  Mining Expedition  Radioactive Exclusion Zone  Hazard 3     Purified Legacy     [Magnite, Umanite]
2  tests/drg3.png           [BertieB, graham, MaoTheCat]  Mining Expedition              Glacial Strata  Hazard 3     Unhealthy Wreck      [Croppa, Magnite]
3  tests/drg4.png                    [BertieB, Costello]  Salvage Operation                  Magma Core  Hazard 4        Rapid Pocket   [Bismor, Enor Pearl]
4  tests/drg5.png  [BertieB, graham, Noobface, Costello]   Point Extraction                   Salt Pits  Hazard 4          Angry Luck      [Bismor, Umanite]
5  tests/drg6.png          [BertieB, Costello, Noobface]  Salvage Operation               Dense Biozone  Hazard 4      Ranger's Prize        [Jadiz, Croppa]
6  tests/drg7.png           [BertieB, Costello, bTRRABN]           Egg Hunt                 Fungus Bogs  Hazard 4     Second Comeback        [Bismor, Jadiz]
7  tests/drg8.png            [BertieB, Costello, graham]  Mining Expedition              Glacial Strata  Hazard 4       Colossal Doom  [Umanite, Enor Pearl]
8  tests/drg9.png                    [BertieB, Costello]           Egg Hunt  Radioactive Exclusion Zone  Hazard 4  Illuminated Pocket  [Magnite, Enor Pearl]

Next step? Further automation, of course!

automation computer vision ocr programming python

Automating YouTube Uploads With OCR Part 4: Exploring Approaches To Improve Detection

My path in the woods diverged, and I took them all

We’ve been seeing if we can apply OCR to the loading screen of Deep Rock Galactic to generate metadata for YouTube uploads for automation.

Last time, we got a quick-and-dirty script that would pull out the various parts of one image successfully. Now we’d like to do that for any given loading screen: any number of dwarves, hazard level, biome, level mutators (which the original image lacked).

We picked nine loading screens to expand our detection to:

The results are mixed:

Starting DRG OCR...
             file                          names       mission_type    mission_name                       biome    hazard                                objective
0  tests/drg1.png   [graham, MaoTheCat, BertieB]           EGG HUNT     DEFECT CELL                  MAGMA CORE  HAZARD 3     COLLECT 6 EGGS\nCollect 25 Hollomite
1  tests/drg2.png                  [&l, [T, @&3]               IR e      OPEN TRICK  RADIOACTIVE EXCLUSION ZONE  HAZARD 3  (COLLECT 225 MORKITE\nCollect 10 Fossil
2  tests/drg3.png       [BertieB, L), MaoTheCat]    INING EXPEDITI!  URIFIED LEGAC)              GLACIAL STRATA  HAZARD 3   COLLECT 250 MORKITE\nCollect 10 Fossil
3  tests/drg4.png               [T, 3 Oz!, o\no]     VAGE OPERATION    HEALTHY WREC                  MAGMA CORE   LLrZtl]         SR RTINS\nCollect 15 Apoca Bloom
4  tests/drg5.png                [o383, (o383, ]        LT X g (o))    RAPID POCKET                    SALTPITS    HAZARD                  COLLECT 10 AQUARQS\n(=R
5  tests/drg6.png  [BertieB, Costello, Noobface]  SALVAGE OPERATION      ANGRY LUCK               DENSE BIOZONE    HAZARD             NIV T\nCollect 20 Boolo Cap.
6  tests/drg7.png            [®29, @&28, T VL R]                      ANGER’S PRIZE                 FUNGUS BOGS    HAZARD     COLLECT 6 EGGS\nCollect 25 Hollomite
7  tests/drg8.png         [IR A )], Costello, T]  MINING EXPEDITION    BRIGHT JEWEL              GLACIAL STRATA    HAZARD            (eI VRS\nCollect 20 Boolo Cap
8  tests/drg9.png             [. ®29, (o], I ‘4]                          HIOELR DY  RADIOACTIVE EXCLUSION ZONE    LLYZU]     COLLECT 6 EGGS\nCollect 20 Boolo Cap

or in image form:

The mission type was a source of issues before for text detection, but looking at the generated crop boxes, it seems text is getting cut off, which will also affect the mission name detection, as they are presented together.

When we started this, I knew the number of players would have an impact on the locations of the text for the player names. However, given only up to four players can play at once, it wouldn’t be too bad to write detection for the four possibilities. But if other text is moving, that gets messy very quickly.

We have a couple of options at this point:

  • enlarge the detection boxes for the longest/biggest text we have in the examples and see if that works across all of them
  • think about using something like OpenCV to do text ROI (region of interest) detection (eg as pyimagesearch does it)

The first seems like it could be done quicker than the second, so we’ll give that a try first. We’re still in the “what approach works” stage (aka the quick-and-dirty stage) here!

Unfortunately, the approach wasn’t quite successful. It’s possible that the particular frames we picked from each video had an impact, but that’s not something we can easily test around with our current setup. Let’s see about adding OpenCV to the mix…


We’re going to reuse the approach taken by Adrian on pyimagesearch as the work has been done for us, and see where that gets us.


Well, the short answer is: not as far as I had hoped!

The boxes it detects on a full image capture either too little or too much, though the latter could probably be helped by some video pixel averaging to blur the background and keep the text crisp. However, it also splits on non-word boundaries. All of these problems can be worked around, but perhaps there’s another approach we can add to the mix?

Another Image

As well as a start screen, there’s also an end screen:

Another successful mission!

The information is presented slightly differently, but importantly i) it presents the info more uniformly, and ii) background noise looks like less of an issue. Let’s put this one through the paces we did for the loading screen.

Overall, naive OCR pulls out the names well, with mixed results elsewhere. Mission name: yes. Mission type: nope. Minerals: yes. Promising! Heck, we could even pull out mission time and total hazard bonus if we wanted.

Let’s put OpenCV on the back burner for the time being, and see what a combined approach using two images gets us.

             file                          names       mission_type                       biome      hazard     mission_name                     minerals
0  tests/drg1.png   [graham, MaoTheCat, BertieB]           EGG HUNT                  MAGMA CORE  HAZARD 3 -       OPEN TRICK                 F ATl\n\nEL]
1  tests/drg2.png                  [&l, [T, @&3]       INING EXPEDI  RADIOACTIVE EXCLUSION ZONE  HAZARD 3 -  PURIFIED LEGACY              RGN AL\n\n48 17
2  tests/drg3.png       [BertieB, L), MaoTheCat]  MINING EXPEDITIO|              GLACIAL STRATA  HAZARD 3 -  UNHEALTHY WRECK    MAGNITE 3 CROPPA\n\n39 -3
3  tests/drg4.png               [T, 3 Oz!, o\no]        AL ol 2N ()                  MAGMA CORE    HAZARD 4     RAPID POCKET       2 nli) |2 el 1T\n\n3 4
4  tests/drg5.png                [o383, (o383, ]   POINT EXTRACTION                    SALTPITS    HAZARD 4       ANGRY LUCK               BISMOR UMANITE
5  tests/drg6.png  [BertieB, Costello, Noobface]  SALVAGE OPERATIO|               DENSE BIOZONE    HAZARD 4         I E Vi S             Tt AL v4\n\n3} 8
6  tests/drg7.png            [®29, @&28, T VL R]                                    FUNGUS BOGS    HAZARD 4  CECOND COMEBACK               S 6syTel) fivd
7  tests/drg8.png         [IR A )], Costello, T]  MINING EXPEDITION              GLACIAL STRATA    HAZARD 4    COLDSSAL DOOM       [IChley [ (e\n\n169 48
8  tests/drg9.png             [. ®29, (o], I ‘4]                     RADIOACTIVE EXCLUSION ZONE    HAZARD 4   TIRTIY TN T (3  COSLINCL IR MAGNITE\n\nX 3]

Improvement! We’re getting somewhere now, and we’ll see what we can do to clean the rest of it up using two images as a basis.

automation programming python

Automating YouTube Uploads With OCR Part 3: Programming with pytesseract and pillow

Last time, a bit of investigating showed that with a little cropping, tesseract can give good OCR results on a still of Deep Rock Galactic’s loading screen.

However, we were cropping manually, which defeats the purpose of this exercise, which is to automate metadata generation.

Thankfully, most of the operations we want to do are purely crops, so it’s straightforward to write a basic python script to get tesseract to recognise the right words.

Let’s jump right in with something quick and dirty. The goal here is to get some useful output quickly, so we can confirm that the approach is viable; proper code architecture comes later.

Starting DRG OCR...
['BertieB', 'graham', 'ksyme99']
Collect 15 Apoca Bloom

We got nearly all of what we want from the image, except for the minerals which are pictographs, which tesseract to my knowledge doesn’t handle.

There was one gotcha, though. While the mission type (Point Extraction) was handled fine when using the full-sized image, none of the crop boxes I tried managed to OCR the text correctly. If I used a box which included the mission name, both read okay; so it would have been possible to do a combined OCR and split on the newline.

One of the techniques to get a more accurate result with tesseract is to surround a small box with a border, which gave the right result:

img_mission_type = ImageOps.expand(img.crop(mission_type_box), border=10, fill="white")                       
mission_type = pytesseract.image_to_string(img_mission_type) 

Our very quick-and-dirty script gets what we’re expecting. The next step is to clean it up and expand our testing base. We can also consider the final output – if we’re giving it a set of images to improve the range it can deal with, we might as well get useful output from it!

We’ll start by adapting it to these nine images. The one at middle bottom might be an issue due to the exhaust from the drop ship changing the contrast quite significantly; either it’ll be made to work or we’ll have to choose a different frame from that video.

Running the script as-is on image 1 (top-left), we get:

Starting DRG OCR...
['graham', 'PR A', 'BertieB']
Collect 25 Hollomite

Not bad, but it’s tripped up on MaoTheCat and added an extra apostrophe to the mission name. Looking at the crop boxes, it seems one’s too high for the middle player, and the mission name box is getting just a tiny bit of the mission icon. Tweaking the boxes, we get:

Starting DRG OCR...
['graham', 'MaoTheCat', 'BertieB']
Collect 25 Hollomite

And the output from the original image remains accurate too. We will continue this process for the rest of the test images and see where it takes us…

automation programming python

Automating YouTube Uploads With OCR Part 2: Getting Started with tesseract

Last time, we decided that Deep Rock Galactic is a game which is ripe for extracting video metadata from, thanks to its beautiful loading screen filled with information:

For OCR we need look no further than tesseract! It’s open source, under development (since 1985 no less!) and easy to install in Arch.

Let’s jump right in and point it at the image above, default settings.

$ tesseract drg-ocr-1.png stdout                                
Warning: Invalid resolution 0 dpi. Using 70 instead.
Estimating resolution as 189                              

7 A

Sticktogether and help your fellow dwarves. Getting incapacitated too far away from your team might
‘mean they won' be able to getto you.



' x Collect 15 Apoca Bloom [ErE hhb :: ‘


-1581440568 -1581440568 654 3 2

Oh, er. Now, for an image that’s a still from a video that’s not too bad, actually! It missed the names, classes, and biome, and thinks “Alone!” is “Aloney”; but on the plus side it got the mission type, name, objectives and hazard level.

Not a bad start, and I reckon we can clean that up when we get to actually processing the image with a bit of smarts.

Perhaps using a smaller region would help?

Let’s see:

Detected 34 diacritics


‘ /‘ e I

4Bert1ea j 3 Eraham
DHILLEH b scout

' /xé./f,,
" // II/ s

Eh, sort of? Given we’ve done no processing or cleanup, tesseract isn’t doing terribly.

Let’s make it real easy!

$ tesseract drg-ocr-name-bertieb.jpg stdout


We haven’t done any of the things that can improve tesseract’s accuracy, like image clean up or changing page segmentation mode. Despite that, we’re getting good, usable results from simply cropping.

The next stage is automation!

automation programming python

Automating YouTube Uploads with OCR


I play games. Quite often I stream those games, and I also upload the footage to YouTube for my friends and subscribers to enjoy.

However, each session of a game is one video, so I end up with many videos. In fact, Jefe, I’d say I have a plethora of videos. Since they are different rounds of the same games, many of the videos have similar structure to their descriptions.

Lots of things being similar sounds like fertile ground for automation!

I have a system, described elsewhere, which uploads and publishes videos to YouTube based on metadata I write. That’s vastly more convenient than doing it manually through the web interface, which is a bit clunky to work with when doing videos in any quantity.

But if the metadata is similar, what if we could automatically generate that?

Deep Rock Galactic

If you’re not familiar with Deep Rock Galactic, it’s a coop FPS game for up to four players that sees you going on missions in procedurally-generated caves on a fictional world to extract materials and kill aliens. It’s great fun, but don’t take my word for it, go watch some videos!

DRG has a loading screen that very helpfully includes all the information on it that is needed to generate the metadata for the YouTube video:

The loading screen. It has all the information about the mission and so the video. DRG devs, THANK YOU!

Let’s break down the elements:

Here we have the names of the brave dwarven miners. This lets me say who is in the video.

It also has the classes. I don’t use that information currently, but since it’s there I could.

This has the mission type (Point Extraction), and the generated name (Clouded Joy).

Lots going on here.

  1. Biome (location) of mission
  2. Potential minerals*
  3. Objectives
  4. Hazard level (difficulty)

* these are in pictograph format, but we can still work with that.


Let’s look at an example video’s metadata and see how it maps up:

DRG Elimination: Rippled Outpost

BertieB and Costello brave the Glacial Strata to eliminate two Glyphid Dreadnoughts

It goes: <Game Name> <Mission Type>: <Mission Name>

And the rest of the metadata mentioned above is included in tags, but it could be put into the description just as easily.

All the elements are there; all we need to do is a bit of image recognition on them. Fortunately python has bindings for such things, so once we’ve figured out where everything is, all that’s left is to write the code. That’s the easy bit, right?