Automating YouTube Uploads With OCR Part 3: Programming with pytesseract and pillow

Last time, a bit of investigating showed that with a little cropping, tesseract can give good OCR results on a still of Deep Rock Galactic’s loading screen.

However, we were cropping manually, which defeats the purpose of this exercise: automating metadata generation.

Thankfully, most of the operations we want to do are purely crops, so it’s straightforward to write a basic python script to get tesseract to recognise the right words.

Let’s jump right in with something quick and dirty. The goal here is to get some useful output quickly, so we can confirm that the approach is viable; proper code architecture comes later.
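The script itself isn’t reproduced here, so what follows is only a minimal sketch of the crop-then-OCR approach, not the original code. The crop box coordinates and the `BOXES` and `ocr_regions` names are placeholder assumptions; `crop` and `pytesseract.image_to_string` are the real Pillow and pytesseract calls.

```python
"""Quick-and-dirty DRG loading screen OCR (illustrative sketch only)."""

# Hypothetical crop boxes as (left, upper, right, lower) pixel coordinates.
# These numbers are placeholders, not the values used for the real screenshots.
BOXES = {
    "mission_type": (700, 560, 1220, 610),
    "mission_name": (700, 610, 1220, 680),
    "biome": (100, 760, 600, 800),
    "hazard": (100, 800, 600, 840),
}


def ocr_regions(img, boxes, ocr=None):
    """Crop each named box out of img and return {name: recognised text}.

    `ocr` is any callable that takes an image and returns a string.
    It defaults to pytesseract.image_to_string.
    """
    if ocr is None:
        # Imported lazily so the cropping logic can be tried without tesseract.
        import pytesseract

        ocr = pytesseract.image_to_string
    return {name: ocr(img.crop(box)).strip() for name, box in boxes.items()}
```

With Pillow installed, something like `ocr_regions(Image.open("drg-ocr-1.png"), BOXES)` would return one string per region, which can then be printed or checked.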

Starting DRG OCR...
['BertieB', 'graham', 'ksyme99']
POINT EXTRACTION
CLOUDED Joy
DENSE BIOZONE
HAZARD 3
COLLECT 7 AQUARQS
Collect 15 Apoca Bloom

We got nearly all of what we want from the image, except for the minerals which are pictographs, which tesseract to my knowledge doesn’t handle.

There was one gotcha though. While the mission type (Point Extraction) was handled fine when using the full-sized image, all the crop boxes I tried didn’t manage to OCR the text correctly. If I used a box which included the mission name, it read both okay; so it would have been possible to do a combined OCR and split on newline.
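As a rough sketch of that combined-OCR fallback (the function name is hypothetical, and it assumes the mission type comes out on the first non-blank line and the mission name on the second):

```python
def split_type_and_name(text):
    """Split combined OCR text into (mission_type, mission_name).

    Blank lines and surrounding whitespace are dropped first, since
    OCR output often carries stray newlines.
    """
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if len(lines) < 2:
        raise ValueError(f"expected two lines of text, got {lines!r}")
    return lines[0], lines[1]
```

Raising on fewer than two lines makes a bad crop obvious straight away, rather than silently producing an empty mission name.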

One of the techniques to get a more accurate result with tesseract is to surround a small box with a border, which gave the right result:

from PIL import ImageOps
import pytesseract
img_mission_type = ImageOps.expand(img.crop(mission_type_box), border=10, fill="white")
mission_type = pytesseract.image_to_string(img_mission_type)

Our very quick-and-dirty script gets what we’re expecting. The next step is to clean it up and expand our testing base. We can also consider the final output – if we’re giving it a set of images to improve the range it can deal with, we might as well get useful output from it!

We’ll start by adapting it to these nine images. The one at middle bottom might be an issue due to the exhaust from the drop ship changing the contrast quite significantly; either it’ll be made to work or we’ll have to choose a different frame from that video.

Running the script as-is on image 1 (top-left), we get:

Starting DRG OCR...
['graham', 'PR A', 'BertieB']
EGG HUNT
' DEFECT CELL
MAGMA CORE
HAZARD 3
COLLECT 6 EGGS
Collect 25 Hollomite

Not bad, but it’s tripped up on MaoTheCat and added an extra apostrophe to the mission name. Looking at the crop boxes, it seems one’s too high for the middle player, and the mission name box is getting just a tiny bit of the mission icon. Tweaking the boxes, we get:

Starting DRG OCR...
['graham', 'MaoTheCat', 'BertieB']
EGG HUNT
DEFECT CELL
MAGMA CORE
HAZARD 3
COLLECT 6 EGGS
Collect 25 Hollomite

And the output from the original image remains accurate too. We will continue this process for the rest of the test images and see where it takes us…
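One way to keep earlier images accurate while tweaking boxes for new ones is a small regression check. This is only a sketch: the `check_images` helper, the `EXPECTED` table and the filename are hypothetical, and `extract` stands in for whatever function runs the OCR on one image. The expected values are the ones from the corrected output above.

```python
# Hypothetical table of known-good results, keyed by (placeholder) filename.
EXPECTED = {
    "test-image-1.png": {
        "players": ["graham", "MaoTheCat", "BertieB"],
        "mission_type": "EGG HUNT",
        "mission_name": "DEFECT CELL",
        "biome": "MAGMA CORE",
        "hazard": "HAZARD 3",
    },
}


def check_images(expected, extract):
    """Return a list of (filename, field, wanted, got) for every mismatch.

    `expected` maps filename -> {field: wanted value}.
    `extract` is a callable taking a filename and returning {field: value}.
    """
    failures = []
    for filename, wanted_fields in expected.items():
        got_fields = extract(filename)
        for field, wanted in wanted_fields.items():
            got = got_fields.get(field)
            if got != wanted:
                failures.append((filename, field, wanted, got))
    return failures
```

An empty list means every test image still reads correctly; anything else points at the image and field that a box tweak has broken.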

Automating YouTube Uploads With OCR Part 2: Getting Started with tesseract

Last time, we decided that Deep Rock Galactic is a game which is ripe for extracting video metadata from, thanks to its beautiful loading screen filled with information:

For OCR we need look no further than tesseract! It’s open source, under development (since 1985 no less!) and easy to install in Arch.

Let’s jump right in and point it at the image above, default settings.

$ tesseract drg-ocr-1.png stdout                                
Warning: Invalid resolution 0 dpi. Using 70 instead.
Estimating resolution as 189                              
Y

I
7 A

WORK TOGETHER...OR DIE ALONEY 
Sticktogether and help your fellow dwarves. Getting incapacitated too far away from your team might
‘mean they won' be able to getto you.
TAP [ 70 CALL FOR ATTENTION

   

POINT EXTRACTION
CLOUDED JoY
LA HAZARD 3- DANGEROUS

> COLLECT 7 AQUARQS ECEES
' x Collect 15 Apoca Bloom [ErE hhb :: ‘

  

-1581440568 -1581440568 654 3 2

Oh, er. Now, for an image that’s a still from a video that’s not too bad, actually! It missed the names, classes, and biome, and thinks “Alone!” is “Aloney”; but on the plus side it got the mission type, name, objectives and hazard level.

Not a bad start, and I reckon we can clean that up when we get to actually processing the image with a bit of smarts.

Perhaps using a smaller region would help?

Let’s see:

Detected 34 diacritics
 

N

‘ /‘ e I

4Bert1ea j 3 Eraham
DHILLEH b scout

' /xé./f,,
" // II/ s

Eh, sort of? Given we’ve done no processing or cleanup, tesseract isn’t doing terribly.

Let’s make it real easy!

$ tesseract drg-ocr-name-bertieb.jpg stdout
BertieB

Beautiful.

We haven’t done any of the things that can improve tesseract’s accuracy, like image clean up or changing page segmentation mode. Despite that, we’re getting good, usable results from simply cropping.
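For reference, here is a hedged sketch of what those two improvements could look like from Python; neither was used for the results above. The helper names are made up for illustration. `--psm 7` asks tesseract to treat the image as a single line of text, and `ImageOps.grayscale` and `ImageOps.autocontrast` are standard Pillow operations.

```python
def tesseract_config(psm):
    """Build a tesseract config string selecting a page segmentation mode."""
    if not 0 <= psm <= 13:
        raise ValueError("tesseract page segmentation modes run from 0 to 13")
    return f"--psm {psm}"


def ocr_single_line(img, psm=7):
    """Greyscale and contrast-stretch img, then OCR it as one line of text."""
    # Imported here so the config helper above works without Pillow or tesseract.
    import pytesseract
    from PIL import ImageOps

    cleaned = ImageOps.autocontrast(ImageOps.grayscale(img))
    return pytesseract.image_to_string(cleaned, config=tesseract_config(psm)).strip()
```

Whether either step helps on these stills is untested here; it would need trying on the same crops.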

The next stage is automation!

Automating YouTube Uploads with OCR

Why?

I play games. Quite often I stream those games, and I also upload the footage to YouTube for my friends and subscribers to enjoy.

However, each session of a game is one video, so I end up with many videos. In fact, Jefe, I’d say I have a plethora of videos. Since they are different rounds of the same games, many of the videos have similar structure to their descriptions.

Lots of things being similar sounds like fertile ground for automation!

I have a system, described elsewhere, which uploads and publishes videos to YouTube based on metadata I write. That is vastly more convenient than doing it manually through the web interface, which is a bit clunky to work with when doing videos in any quantity.

But if the metadata is similar, what if we could automatically generate that?

Deep Rock Galactic

If you’re not familiar with Deep Rock Galactic, it’s a coop FPS game for up to four players that sees you going on missions in procedurally-generated caves on a fictional world to extract materials and kill aliens. It’s great fun, but don’t take my word for it, go watch some videos!

DRG has a loading screen that very helpfully includes all the information on it that is needed to generate the metadata for the YouTube video:

The loading screen. It has all the information about the mission and so the video. DRG devs, THANK YOU!

Let’s break down the elements:

Here we have the names of the brave dwarven miners. This lets me say who is in the video.

It also has the classes. I don’t use that information currently, but since it’s there I could.

This has the mission type (Point Extraction), and the generated name (Clouded Joy).

Lots going on here.

  1. Biome (location) of mission
  2. Potential minerals*
  3. Objectives
  4. Hazard level (difficulty)

* these are in pictograph format, but we can still work with that.

Descriptions

Let’s look at an example video’s metadata and see how it maps:

DRG Elimination: Rippled Outpost

BertieB and Costello brave the Glacial Strata to eliminate two Glyphid Dreadnoughts

It goes: <Game Name> <Mission Type>: <Mission Name>
<Players><Biome><Objectives>

And the rest of the metadata mentioned above is included in tags, but it could be put into the description just as easily.
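The template above can be sketched as a couple of string-building helpers. The function names are hypothetical, and the objective phrase (“eliminate two Glyphid Dreadnoughts”) is assumed to be supplied already worded; turning raw OCR text into that phrasing is a separate step not covered here.

```python
def join_names(names):
    """Join names as 'A', 'A and B' or 'A, B and C'."""
    names = list(names)
    if len(names) <= 1:
        return "".join(names)
    return ", ".join(names[:-1]) + " and " + names[-1]


def build_title(game, mission_type, mission_name):
    """<Game Name> <Mission Type>: <Mission Name>"""
    return f"{game} {mission_type}: {mission_name}"


def build_description(players, biome, objective):
    """<Players> brave(s) the <Biome> to <Objective>"""
    verb = "braves" if len(players) == 1 else "brave"
    return f"{join_names(players)} {verb} the {biome} to {objective}"
```

Calling `build_title("DRG", "Elimination", "Rippled Outpost")` and `build_description(["BertieB", "Costello"], "Glacial Strata", "eliminate two Glyphid Dreadnoughts")` reproduces the example title and description above.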

All the elements are there; all we need to do is a bit of image recognition on them. Fortunately python has bindings to such things, so as we’ve figured out where everything is, all that’s left to do is write the code- that’s the easy bit, right?

Services, Servers, DomUs and Containers, Oh My!

What a tangled web we weave…

I was confused. Staring at a console window, wondering how I’d installed a program†. The system package manager knew nothing about it, pip pretended like it never heard of it, and I hadn’t downloaded and compiled the source.

I’ve never really had what anyone would call a sane management approach to my servers and services. The closest I’ve got is using Xen as a hypervisor and trying to separate DomUs (VM guests) by service type, backed by LVM-on-mdraid storage*. That sounds alright in theory, but in practice most services have tended to congregate on the largest guest, turning that DomU into a virtualised general purpose server. A server that mixes and matches the system package manager, other package managers like pip, SteamCMD and manual installs.

Pug fugly, in other words.

Pug-fugly, as coined in Pgymoelian

* the storage also sounds good in theory but the implementation has led me to dub the machine the ‘Frankenserver’ (more on that another time)

This is fairly typical. You need to do X. Not only that, but you need to do it RIGHT NOW. Software ABC does that. So you download it from whatever source seems most convenient and up-to-date, glance at the ‘Quickstart guide’ (while saying thank goodness for those!), do a bit of minimal configuration and then you’re up and running, doing X.

But what’s the big deal? The service[s] work, after all.

The issue is a general one: it takes time to figure out the setup before you can usefully interact with it.

This doesn’t apply just to DevOps, but also to coding, writing, maintenance and repair, DIY, cooking, house management.

Or more simply: Fail to plan, plan to fail.

I found that it was taking me time to get my head around:

  • what I was dealing with
  • how it had been set up
  • why it wasn’t working
  • how to fix it
  • how to update it

I’d do those things, sort whatever, get the service working again, then six months later I’d have to figure it all out again.

“What a way to run a railroad…”

Clearly, there must be a better way.

In fact there are several better ways, depending on what you want to do. DevOps is huge business, and scales into the multinational megacorp range. But the home user can benefit too. There are clear benefits in using a well-organised system for pretty much anything, and managing servers, services and other applications is no exception. Used well, such a system improves maintainability, security and reliability.

But how does one get started? There are plenty to choose from. Some folks I know love Docker, others opt for LXD on LXC (those options are not exclusive). There are also the configuration management tools, like Puppet, Ansible, Chef (etc).

Well, I briefly used Docker in the past, and now have it on one of the DomU guests, hosting a few services I used to run elsewhere. This seems like reason enough to dip my toes deeper in the waters and move more services to containers, or at least to automated processes.

It’s rarely glamorous, but writing good documentation can make a huge difference to the person that follows you. Even when that person is you.

As an aside, the other key ingredient, besides having good systems in place, is good documentation.

For example, before writing this up I wondered about installing a new spam plugin. I used to use Spam Karma 2, but that’s been unmaintained for a long while. But which one? Well, seems I’ve used Akismet and Anti Spam Bee in the past, but why did I stop using them? I have a vague recollection of the former re-moderating old comments and declaring them spam, and the latter not working in some way, but what?

Good documentation, make it your non-New Year’s Resolution.

So the take-home message here is that ad-hoc setups pop up and stick around for longer than they should; don’t do that, have a good system instead and document what you’re doing and why.

Because it’s good to have a goal, my aim is to get low-hanging fruit services moved over to Docker in the first instance (heh) to learn more about the tech. From there I can decide what I can move to containers, and maybe even see if LXC would fit my needs anywhere. I’d also like to see if I can apply this to the wee tools I write myself to help automate my workflows- rather than running them manually, perhaps I can develop them as services. And, while doing all of this, documenting what I am doing and why.

Then maybe one day I won’t have to ask “I have a (python-based) program installed that doesn’t seem to have been installed either by apt or pip, and obviously I can’t remember… is there any way to figure out how I actually installed it? :D”.


† beets, for music library organisation/tagging/management

Featured image by steve gibson on Flickr

Terminal sharing via web? Use Gotty

I found an old 2.5″ hard drive that I didn’t recognise, and it was not keen to mount- it kept erroring out. So I figured I’d ddrescue it, and wanted to share that process with a friendly online community.

But sharing my terminal was something I hadn’t done before, so I cast around for suggestions and someone mentioned gotty:

GoTTY – Share your terminal as a web application

GoTTY is a simple command line tool that turns your CLI tools into web applications.

It does read-only (and interactive) sharing easily, though I had some issues getting it to play nice with nginx, mostly down to my own configuration choices elsewhere.

So I got it all hooked up and folks watched the recovery via a web interface at http://bertieb.org/tty/ (if I’m not running gotty that’ll error out, but I can see using it again in future) with bated breath… or maybe that was just me!

PS It turns out the drive didn’t fully recover but I got enough to figure out that it wasn’t even mine- I think I removed it from a salvaged laptop that used to run some services for me.

Screenshot update:

Softly, Softly

Sometimes when confronted by a setback, people lose their cool.

It happens. You have spent hours, perhaps days going over something in your head. You do research, can’t quite get to the answer you want, so you post on a QA site, thinking “these guys really know what they’re talking about, I’ll get an answer real quick”.

But the process goes wrong. Instead of seeing your post and congratulating you for your witty and well-chosen phrasing, they take issue with terminology, ask questions that seem obvious, or irrelevant, or both! Then the question is put on hold until such time as you can ‘improve’ it.

Being told to improve isn’t the end of the world, but it’s a definite poke in the ego. The tribe has circled, they won’t let you in. As a result, some choose to express their frustrations in a non-constructive manner. This is unfortunate, because it evokes less consideration and feelings of altruism, not more.

In the example above, I’ve tried to pull things back from that brink:

Hi Raven, welcome to Super User. It seems you are a bit frustrated by the QA process here. Questions are sometimes put “on hold” so that they can be more easily answered- this has the benefit of making it more likely for you to get an answer. While it may be clear what you mean and intend, some of our experienced members find the specifics of what you are asking unclear.

However, I’m afraid you cannot ‘require’ that we take a post from being “on hold”, the best option for this to happen is to clarify or add detail as requested. Furthermore, your comments come across as quite aggressive; that’s hopefully not what you intended (since that would not be okay); you should consider removing those before a moderator does.

Futhermore, as your edit does not contain information pertinent to answering your question, I am going to ‘rollback’ to your original version. You are welcome to make further edits to add information to make answering the question easier.

(typos mine)

There are two things that need to be done, as I see it: 1) let the person know that their approach won’t work and is not acceptable; 2) try to get things back on track.

The first, if done right, is doable; in fact it’s very satisfying when it happens. You can go from someone insulting you and being aggressive, to having a laugh with them (in the best case!). That said, in this case getting the tone back to being civil would be enough. Being nice when someone expects hostility can be incredibly disarming. It has to be sincere though; phoney friendliness backfires.

The second is important from a QA point of view. Interactions are hazardous: they sometimes go wrong and the wrong things get said, but those are all a distraction. Ideally, a question would be improved so that it can be answered (some cannot). A reasonable response from the community side is more likely to engender a helpful clarification or vital detail. It’s not easy to keep that up though, especially seeing the same thing day after day.

Post-scriptum: in this case the user characterised the on-hold review process and comments as ‘nefarious’ and declared that what they had written was ‘ALREADY 100% CLARIFIED’, amongst other things. It doesn’t always work.

Metal Gear Solid V: The Phantom Pain Mini-Review

I intended to include this in my post about reaching 100%, but that turned into a story about sneaking into a base armed with trousers.

So here’s the mini-review I was going to include with that.

The good: Controls and gameplay are (for the most part) deliciously smooth, and this strongly complements the organic gameplay possible- attacking a military fortress as a super-soldier with high tech gear plays as fluidly as ambushing an outpost wearing nothing but a pair of trousers. I ended up enjoying the story, but that’s going to be variable for others.

The bad: It’s only 2/3 complete, thanks Konami! Why do I have to stare at the Boss in a chopper every time I want to do something from the ACC? The grind for 100% was tedious, though optional. The animations and controls, while smooth, were on occasion a bit slow- I kept getting into AA emplacements rather than fultoning them; and during certain fights, dodging some attacks failed because Snake wanted to lie down and/or crawl instead of dive-dodge.

The ugly: At this point I’m not sure if Hideo Kojima is trying to illuminate/draw attention to sexism/misogyny etc, but uh, there were some highly questionable bits.

100% in MGSV:TPP

I finally got there! I have mixed feelings, as I no longer have an excuse to load the game up and mess around; but on the other hand I am happy to be done with some of the slog.

Even towards the end, after a hundred hours the game could still surprise me. I’ll give you a quick example: I was attempting a Subsistence mission (where you start with nothing but a pair of trousers and your wits). Some careful sneaking at an outpost netted me an AK and a bit of ammo, and so armed I went for the main objective- a small base that served as a radio relay station. I dispatched their forward patrol non-lethally and made my way through a pass that led to a ledge directly above the base.

I do like the non-lethal approach, which has been consistently rewarded by other titles in the Metal Gear series. My normal approach would be to methodically and silently tranquilise the various patrols and guards without arousing suspicion; but with my only ranged weapon being unsilenced this wasn’t an option. Could I lure the guards somewhere secluded and incapacitate them? Perhaps! I made a bit of a movement on the ledge and the guard decided he would run the several hundred meters around the hill and through the pass to get to me. A quick bit of CQC and one down! Okay it took a few minutes but the theory was solid. Another guard was similarly lured.

There my plan stalled. No matter what wild, strange dance I did on that ledge, I couldn’t attract anyone’s attention. It being night probably didn’t help. There was nothing to do — aside from firing my gun, but that would have attracted a little too much attention — but drop down into the base proper. I was able to knock out another guard when I got the harsh musical sting and slowmo (which will forever belong to Max Payne in my head) meaning someone else had seen me. What’s more, they were too far away to silence before the brief window that lets you prevent a full-blown alert closed. Whoopsy.

So, there was a full-blown alert! I ducked behind the small building housing the objective (some radio comms equipment), peeking out to exchange fire. I soaked up a bit of damage — trousers are not as good armour as you might think — and was seriously considering reloading from checkpoint. But by this point in the game I had already achieved the S rank for the mission (which generally depends on either speed or being super-stealthy or both) so I was determined to roll with the punches. Even if the punches were in the form of many automatic rifle rounds to the not-armoured-trousers.

Combat was faring less well than I hoped. The guards were well-armed and well-armoured. Ducking around one end of the building was rewarded with a warning of a missile lock on me (!), and the other had a shotgun/machine gun duo who were surprisingly effective. I was low on health, low on ammo and lacking a weapon that dealt sufficient damage to those who wished to do me harm. Then I was thrown a bone.

Warning: Sandstorm approaching

A very windy, sandy bone; but a bone nonetheless. Sandstorms limit visibility to about 2 feet, so this was an opportunity to scarper. Which I did, in the direction of a nearby mortar.

In the nigh-on 100 hours of gameplay, I don’t think I’d used a mortar before. They just didn’t mesh terribly well with the whole stealthy-and-non-lethal approach I so heavily favoured. As I could still see marked enemies in the sandstorm, I was able to take out most of the guards that had previously been gunning me with great effect. And some of the guards I hadn’t spotted before and couldn’t see through the sand. And some of the radio transmitters. And the anti-air radar.

Mortars are great!

After that finishing off the last few non-mortally-wounded (or is that non-mortarily-wounded?) soldiers and completing the objective was a relative breeze.

MGSV:TPP had many of these moments, when things didn’t work out the way they were planned; but the outcome was even better than expected.

tl;dr: good game, would recommend.

Closing in (95%)

Nearly there!

It’s getting annoyingly grindy now! I spend most of the time in the chopper, either:

  • deploying to an area to put down capture cages then immediately leaving
  • returning to the medical platform on mother base to hand over photographs

The latter is especially irritating as there’s no indication that you need to do it, and you can’t give all ten photographs at once! So chopper in, run to room, cutscene into room, hand over photo, run back to chopper, leave; repeat.

krusty_groan.wav

I have a sheet of paper on which I’m crossing things off as I do them. It’s slightly illegible due to my broken fingers, but usable.

On “Back up, Back Down”

The gods of irony got together with the gods of gaming after my recent gripe about having to do and redo things in MGSV:TPP:

Some missions have mutually-exclusive objectives – I’m looking at you, Backup, Back Down – so may require more

Well, I played through the “Extreme” version of Back Up, Back Down to do the additional objectives, and the team searching for the prisoner got there, stood around him and then… very kindly didn’t execute him.

So I ran up, stunned them all with a non-lethal assault rifle and fultoned him out! All optional objectives complete.