Automating YouTube Uploads With OCR Part 10: Reflection, Lessons Learned, and Improvements

Every day is a school day

We set out to use OCR to extract metadata from frames of the loading and ending screens of Deep Rock Galactic, and to use that metadata to fill in the details of videos destined for YouTube.

In other words we went from:

To:

Why?

It’s always good to reflect when you’ve done something. Did it go well, or not as well as expected? What did you hope to achieve? Did you achieve it? What has it changed? There are as many ways to reflect as there are things to reflect on.

In this project I wanted to achieve a greater degree of automation with my video creation workflow. Partly because it would save me time:

The ever-relevant XKCD (https://xkcd.com/1205/)

The other reason is that copying text is no longer the province of monks in a scriptorium- it’s a repetitive, uncreative task. I enjoy spending time playing games with my friends, and the videos are there so that they and others can relive and enjoy them too; copying text across by hand is not a good use of my time.

However, there’s a more pertinent image for this sort of task:

Pretty much spot on (https://xkcd.com/1319/)

There were 47 videos in the test batch. Let’s say I would have spent five minutes per video copying across the title, writing a description, figuring out the tags and so on; doing that manually would have taken 235 minutes, or nearly four hours. That might sound like a lot, but it’s certainly less time than I spent working on the automation.

The automatic OCR will have ongoing benefits – there are more videos to process.

But the best part is that I learned. I learned about tesseract and OCR, a bit about OpenCV, and honed my python programming skills.

Lessons Learned

OCR is good enough to extract text from video stills. I assumed this, but it is good to have it confirmed.

Cleaning up images makes a huge difference to OCR accuracy. If I had cleaned images up earlier in the process, I could probably have improved detection on the loading image enough to rely on it alone; but using both the loading and ending images gives more metadata, so it worked out okay.
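
For illustration, a minimal clean-up step might look something like the sketch below; the greyscale-plus-threshold approach and the threshold value of 180 are assumptions for the example, not the exact processing the tool uses.

    import cv2
    import pytesseract

    def ocr_frame(path: str) -> str:
        """Greyscale and threshold a frame before handing it to tesseract."""
        image = cv2.imread(path)
        # Colour only confuses the character shapes, so go greyscale first.
        grey = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
        # Threshold to high-contrast text; 180 is an illustrative guess for bright HUD text.
        _, mono = cv2.threshold(grey, 180, 255, cv2.THRESH_BINARY)
        return pytesseract.image_to_string(mono)

    print(ocr_frame("loading_screen.png"))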

It’s really easy to leak file descriptors. Late on, when I went to test with a wider variety of videos, I ran into "OSError: [Errno 24] Too many open files". Instead of tempfile.mkstemp, which hands back an open file descriptor that I was never closing, I had to use tempfile.NamedTemporaryFile. That one took a bit of hunting down, as it looked like pytesseract was failing; coincidentally, earlier versions of pytesseract had a couple of issues caused by the same thing (mktemp vs NamedTemporaryFile)! Most confusing.
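
As a rough illustration of the difference (a sketch, not the tool’s actual code; frame_bytes below is a stand-in for whatever image data is being written), the leaky pattern and the fix look something like this:

    import os
    import tempfile

    frame_bytes = b"..."  # stand-in for real image data

    # Leaky pattern: mkstemp returns an open file descriptor that must be
    # closed explicitly; forget os.close(fd) inside a loop and Errno 24 follows.
    fd, path = tempfile.mkstemp(suffix=".png")
    with open(path, "wb") as f:
        f.write(frame_bytes)
    os.close(fd)      # the easy-to-forget step
    os.unlink(path)

    # Safer pattern: NamedTemporaryFile closes (and here deletes) itself.
    with tempfile.NamedTemporaryFile(suffix=".png") as tmp:
        tmp.write(frame_bytes)
        tmp.flush()
        # hand tmp.name to whatever needs a path on disk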

What Would I Do Differently?

Implement automated testing. This would have hugely helped in the refinements stage, where regressions in detection accuracy occurred as I refined. There were a couple of reasons that put me off at the time, but they were more excuses than reasons:

  • this was a “quick and dirty” attempt to get a tool working; refinements to it could come later

    This is an old, old excuse, proved false time and again. It’s sometimes phrased as “This is just a temporary fix, we’ll do it properly later” and other variants. What it boils down to is “We’re going to do this the ‘wrong’ way for now, and change it later”.

    It sounds fine, if you actually sort it later, but invariably that doesn’t happen. Time and effort have to be focused somewhere, and it’s a harder sell to redo something that “works” (however hackily) than to implement a new feature, or get a product out the door.

    Here it was even worse: doing that work may well have improved the “quick and dirty” process.
  • the frame extraction + OCR processes aren’t quick, and tests should be quick to run; it’s also hard to break apart the pipeline

    This excuse is on slightly firmer ground, but not by much! It’s true that these things take time, but they can be broken down into components and tested individually using sample images, for example (see the sketch after this list).

    It might not provide the coverage of a full, real-life data set, but it’ll catch the worst regressions.
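
A minimal sketch of what that component-level testing could look like, assuming a hypothetical extract_title() helper and a couple of reference frames with known ground truth checked in alongside the tests:

    # test_ocr.py - run with pytest; the helper name and sample paths are
    # assumptions for illustration, not the project's actual layout.
    import pytest

    from drg_ocr import extract_title  # hypothetical module and function

    SAMPLES = [
        ("samples/loading_screen.png", "Point Extraction"),
        ("samples/ending_screen.png", "Mission Complete"),
    ]

    @pytest.mark.parametrize("image_path,expected", SAMPLES)
    def test_known_frames_ocr_correctly(image_path, expected):
        # Each sample has a known answer, so a drop in detection accuracy shows
        # up as a failing test rather than as a bad upload later on.
        assert expected in extract_title(image_path)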

Future Improvements + Directions

Use only the start or end frame if the other is missing. At the moment a video is skipped if either the start or end frame is not detected. That leaves the video to be done entirely manually- yet we could get at least some of the metadata from whichever frame we do have.
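
A hedged sketch of that fallback, assuming each screen’s OCR results arrive as a dictionary (or None when detection fails); the field names are placeholders:

    def build_metadata(loading_result, ending_result):
        """Merge whatever OCR produced; either argument may be None."""
        if loading_result is None and ending_result is None:
            return None                       # nothing usable - fully manual entry
        merged = {}
        merged.update(loading_result or {})   # loading-screen fields first...
        merged.update(ending_result or {})    # ...ending-screen fields win on conflict
        merged["partial"] = loading_result is None or ending_result is None
        return merged

    # e.g. the ending screen was detected but the loading screen was missed:
    print(build_metadata(None, {"mission": "Point Extraction", "result": "Success"}))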

Detect the in-game menu screen. For times when I hit the record button too late (or OBS takes too long to spin up), I could open the in-game menu, which shows a couple of bits of metadata. I would need to remember to do this, but I usually do realise at the time that I’ve hit record too late. Combined with the above improvement, this would increase video coverage.

Expand OCR to other games. This is non-trivial, but an obvious way to go. Killing Floor 2 is the likeliest next candidate: at the moment it’s the game we play most, and it also has metadata worth capturing.

Consider a further automated pipeline. As it stands, I have to run the program against videos manually; not a big deal. But a tool that detects new videos, automatically runs the OCR tool against them and puts them and the JSON output in a convenient place (± automatically uploading them to YouTube) would make the process more streamlined. This may be beyond my own need, or indeed tolerance- I could see it being frustrating if I wanted to handle a particular video differently by hand.
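
Purely as a sketch (the directory names, file extension and command line are placeholders, and a simple polling loop stands in for something neater like inotify), such a watcher might be no more than:

    import subprocess
    import time
    from pathlib import Path

    WATCH_DIR = Path("~/videos/incoming").expanduser()    # placeholder path
    OUTPUT_DIR = Path("~/videos/processed").expanduser()  # placeholder path
    seen = set()

    while True:
        for video in WATCH_DIR.glob("*.mkv"):
            if video in seen:
                continue
            seen.add(video)
            # Placeholder command standing in for the OCR tool from this series.
            subprocess.run(["python", "ocr_tool.py", str(video), "--json-out", str(OUTPUT_DIR)])
        time.sleep(60)    # poll once a minute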

Overall though, I am happy with how the tool turned out.

Using Discourse Dev with Traefik (without ‘Bad Gateway’ + ‘blocked host’)

tl;dr:

  • Traefik grabs the first port it sees, which on the dev image is 1080- we want port 9292. Use --label=traefik.http.services.discourse-dev.loadBalancer.server.port=9292
  • You need to set a dev host using an env var in the container: -e DISCOURSE_DEV_HOSTS=your_dev_hostname \

With the dev version of Discourse working, I wanted its connectivity to be managed by the traefik proxy. But whichever way I sliced it, I would get a Bad Gateway error. The usual suspects for this are not setting a port, or having the service on a different network from traefik itself; but even with those addressed, the issue persisted for me.

I had to add the following to (discourse_source_root)/bin/docker/boot_dev, in the docker run ... section:

    --network=traefik_default \
    --label=traefik.port=80 \
    --label=traefik.docker.network=traefik_default \
    --label=traefik.http.routers.discourse-dev.rule=Host\(\`$DEVHOST\`\) \
    --label=traefik.http.services.discourse-dev.loadBalancer.server.port=9292 \

I set DEVHOST=<my dev host> earlier in the file, or you can use the host there directly. The last line points traefik at the correct port (9292) in the discourse-dev container.

Accessing by host then produces a page with a blocked host error:

Blocked host: discourse_dev_host
To allow requests to discourse_dev_host, add the following to your environment configuration:
config.hosts << "discourse_dev_host"

Setting DISCOURSE_DEV_HOSTS permits access on those hosts. We need to do this in the container, so add the following to the same section in the same file:

-e DISCOURSE_DEV_HOSTS=$DEVHOST \

This permits access via that hostname (or those hostnames).

[solved] ‘Connection closed’ in Discourse Dev Install

tl;dr: this was a temporary issue solved by a later commit; if you checked out discourse after 28 Oct but before 4 Nov, git pull to update


Having installed a production Discourse forum, I wanted to get a local dev instance up and running for testing.

There are good instructions for doing just that using Docker. Don’t do what I did first: follow the production install method and assume it will work if you point the prod hostname at it in /etc/hosts.

Unfortunately, when I followed the instructions to set up the dev instance, I was greeted with an ‘Unable to connect’ screen (ERR_FAILED). Even telnet from the same host failed:

bertieb@ubunutu-vm:~/discourse$ telnet 127.0.0.1 9292
Trying 127.0.0.1...
Connected to 127.0.0.1.
Escape character is '^]'.
Connection closed by foreign host.

Dang. I tried this across fresh Arch and Ubuntu Server (19.10 + 18.04.3 LTS) installs and got the same thing.

Installing the non-Docker version worked but only for localhost; then a comment on that guide’s topic pointed me at a recent change to interface binding. Checking out the commit before that change let me connect from other hosts in both the Docker and non-Docker versions.

As of 2019-11-04 a later commit sorted this issue and added a specific flag (-b) for permitting connections from other hosts.

Automating YouTube Uploads With OCR Part 9: Bringing it All Together

I love it when a plan comes together

We’ve been finding a way to automate YouTube uploads, using tesseract to OCR frames from Deep Rock Galactic videos and extract the metadata used for each video on YouTube.

We got to the stage where we have good, useful JSON output that our automated upload tool can work on. Job done? Well, yes- I could point the tool at it and let it work through that, but it would take quite a while. You see, to give a broad test base and plenty of ‘live-fire’ ammunition, I let a backlog of a month’s videos build up.

Automating Metadata Updates

Why is that an issue for an automated tool? The YouTube API by default permits 10 000 units of access per day, and uploading a video costs 1600 units. That limits us to six videos per day at most, or five once the costs of other API calls are factored in. So I’d rather upload the videos in the background through the web interface, and let our automated tool set the metadata.

For that we need the videoIds reported by the API. My tool of choice to obtain those was shoogle. I wrapped it in a python script to get the playlistId of the uploads playlist, then grabbed the videoIds of the 100 latest videos, got the fileDetails of those to find each uploaded fileName… and matched that list against the filenames in the JSON entries.
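
The tool here wraps shoogle; purely to illustrate the same lookup chain, below is a hedged sketch using google-api-python-client instead, with credentials handling elided. The part and field names are the standard YouTube Data API v3 ones, and fileDetails is only visible to the video’s owner.

    from googleapiclient.discovery import build

    creds = ...  # placeholder: an authorised OAuth2 credentials object (flow elided)
    youtube = build("youtube", "v3", credentials=creds)

    # 1. The channel's "uploads" playlist contains every uploaded video.
    channel = youtube.channels().list(part="contentDetails", mine=True).execute()
    uploads_id = channel["items"][0]["contentDetails"]["relatedPlaylists"]["uploads"]

    # 2. Collect recent videoIds from that playlist (one page of 50 shown here).
    items = youtube.playlistItems().list(
        part="contentDetails", playlistId=uploads_id, maxResults=50
    ).execute()
    video_ids = [item["contentDetails"]["videoId"] for item in items["items"]]

    # 3. fileDetails exposes the original fileName, which can be matched against
    #    the filenames recorded in the OCR tool's JSON output.
    details = youtube.videos().list(part="fileDetails", id=",".join(video_ids)).execute()
    name_to_id = {v["fileDetails"]["fileName"]: v["id"] for v in details["items"]}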

So far so good.

Faster Thumbnails

But one of the personal touches that I like, and that will likely not be automated away, is picking a frame from the video for the thumbnail. So I need a way to quickly go through the videos, find a frame that would make a good thumbnail, and add that as a thumb field on the correct video entry. I’ve used xdotool in the past to speed up some of the more repetitive parts of data entry (if you’ve used AutoHotKey on Windows, it’s similar in some ways).

I threw together a quick script to switch to the terminal with vim, go to the filename of the current video in VLC (VLC can expose a JSON interface with the current video’s metadata- the bits I’m interested in are the filename and the current seek position), create a thumb entry with the current time, and then switch back to VLC. That script can be assigned a key combo in Openbox, so the process is: find frame, hit hotkey, find frame in next video, hotkey, repeat.
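
For illustration, fetching those two fields from VLC might look like the sketch below. It assumes VLC’s Lua HTTP interface is enabled on port 8080 with a password set; the JSON paths are those of the status.json endpoint, though the exact layout can vary between VLC versions.

    import requests

    VLC_URL = "http://localhost:8080/requests/status.json"
    VLC_PASSWORD = "changeme"      # whatever was set in VLC's HTTP interface settings

    # VLC's HTTP interface uses basic auth with an empty username.
    status = requests.get(VLC_URL, auth=("", VLC_PASSWORD)).json()

    seek_seconds = status["time"]  # current playback position, in whole seconds
    filename = status["information"]["category"]["meta"]["filename"]

    print(f"{filename}: thumbnail at {seek_seconds}s")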

Though the process is streamlined, finding a good frame in 47 videos isn’t the quickest! But the final result is worth it:

We have videos with full metadata, thumbnail and scheduled date/time set.

Glorious.

I included a video that failed OCR due to a missing loading screen (I hit record too late). There’s a handful of those- I found five while doing the thumbnails. I could do a bit of further work and get partial output from the loading/ending screen alone; or I could bite the bullet and do those ones manually, using it as a reminder to hit the record button at the right time!