Categories
cool programming

AI Dungeon 2 Is Fun Nonsense

Apparently, there’s something about Mary

It’s night. I’m a private detective from Chicago named Joseph, on the hunt for someone named Jim, and I have a gun and a badge. I’m in the woods, and I hear some noise from behind the trees. Suddenly an old man shoots an arrow from a bow at a hitherto-unseen target. He runs off, but I catch up with him and ask his name. It turns out that he’s also a detective from Chicago named John, and he’s hot on the trail of Jim too.

I ask “How did you know my name?” and he replies, succinctly: “Because we’re both detectives.” I try to discuss the case with him, but he refuses to be drawn on it, preferring to cryptically state “I’m sure we’ll have some clues soon enough”.

We come across a small house in the woods, and I venture inside. A woman sits, reading quietly. I ask her about Jim, but she only says that he left long ago. I make a note of the house and return the next day without John. I look around and find some white socks and black pants. Ah-ha! These are crucial to the case. I put them on immediately. Surely it’s now only a matter of time before I find Jim.

I go back outside, and see John, the other detective, watching me cautiously. Clearly he’s jealous of my new socks and pants. He disappears into the woods. I run after him but find only a shack, in which a single light bulb illuminates a strange assortment of books and papers with diagrams. I picture Jim with this:

Combing through the strange lot of papers, I find one that might help my case! It’s a drawing. A drawing of a man in front of a tree. He has a hat, and the hat has horns. His eyes are wide open and staring at me.

This is Jim!

I find the tree in the drawing. It’s odd. It isn’t right. It seems to be made of wood, but it has cracks all over and seems as if it was never alive in the first place. Maybe it has Jim inside it? In any case it isn’t right. It has to go.

I break the tree apart and fling a piece at a nearby wall. A thud, then silence.

The next day, I come home and see that everything is gone.


The above is how my first dabble with AI Dungeon 2 started. I was linked to it without context, so I had no preconceptions going in. It all started off somewhat normally, and I wondered if it was some kind of randomly-generated MUD (an old text-based system, predating popular MMORPGs, that let users create text-based worlds and interact with one another). But as things got slowly more odd, it seemed like it was something else. It had the slightly weird, funny cadence that computer-generated text has.


I had come close to finding Jim. The house, the pants, the drawing in the shack, and the tree. They all fitted together, and I knew I must be close. I returned to the woods.

Thereupon I chanced on a woman sitting on a rock, crying. She explained that her sister Mary had gone missing only the night before. Perhaps Jim had a hand in this. I tried to explain the situation as best I could, but this only upset her more. So instead, I gave her a hug. This calmed her down, perhaps too much. She fell to the ground. She needed to be somewhere safe, but where? Ah! The shack! I carry her there.

Going in, I find a man dressed in an old coat and wearing glasses. He has long white hair that hangs down to his shoulders. His eyes are closed and he seems very tired looking. What the heck is he doing there? I demand to know his name.

“My name is James, but everyone calls me Jack.” Joseph, John, Jim, James, Jack… Wait! James? As in the unshortened form of Jim..? I have to think on my feet, and decide to act quickly.

“Where is Mary?”

I’ve got him now. Or so I think. But the man just sighs and shakes his head. He thinks he’s won. But I’m Joseph, a detective from Chicago. And Chicago detectives know how to roll with the punches, literally and figuratively. I decide to roll with this one and throw him off balance. I drop my voice, lean in close and growl:

“Where is Jim?”

“Mary..? She left with another guy named John.”

He yawns and rubs his eyes. He looks tired too. But he knows I’ve got him. “Mary… Jim… Where is Mary?” He’s trying to throw me, but he didn’t reckon with my Windy City credentials. He coughs and then speaks. “She left with another guy named John.”

Damn.

The one thing I wasn’t expecting. The one man I didn’t suspect.

Time for action. Mary and John can wait, but Jim’s my case and he has questions to answer. I grab Jim by the collar and pull him from behind the desk. He puts up a brief resistance, but he isn’t strong enough to break free. Up against the wall he goes, and I cuff his hands together behind his back. Time to take him downtown.


I’ve long enjoyed the output of Markov chains. They are relatively simple procedures for generating sequences based on previous values and frequencies. You can apply this to text, and generate new text based on frequencies of letters, or words.

The old resources I used to learn about Markov chains way back when have somewhat stuck in my head. I recall a reference to ‘Alice in Elsinore’, which can be found on a page called ‘Fun with Markov Chains’. There’s another bit which went into varying chain lengths: how short lengths — say, one to three characters — produced gibberish that almost looked like it might once have been English, while longer lengths came closer and closer to the original text[s]. That seems to have been part of Programming Pearls, which used to be available to read online; I only managed to find part of that section archived on Jeff Atwood’s blog by use of some judicious Google search tools.
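To show how little machinery is involved, here’s a minimal character-level sketch in Python; the order parameter corresponds to the ‘length’ discussed above, and the corpus filename is just a placeholder:

import random
from collections import defaultdict

def build_chain(text, order=3):
    """Map each run of `order` characters to the characters seen following it."""
    chain = defaultdict(list)
    for i in range(len(text) - order):
        chain[text[i:i + order]].append(text[i + order])
    return chain

def generate(chain, order=3, length=300):
    """Walk the chain, picking successors at random (weighted by frequency)."""
    state = random.choice(list(chain.keys()))
    out = state
    for _ in range(length):
        successors = chain.get(state)
        if not successors:  # dead end (end of corpus): restart from a random state
            state = random.choice(list(chain.keys()))
            out += state
            continue
        out += random.choice(successors)
        state = out[-order:]
    return out

corpus = open("corpus.txt").read()  # any plain-text corpus you fancy
print(generate(build_chain(corpus, order=3), order=3))

Bumping order up from 1 towards 8 or so walks the output from near-gibberish to something very close to the source text, exactly as described above.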

You can create some fun things with Markov chains. The examples given above included a generated Alice in Elsinore and the Revelation of Alice. I implemented Markov chain text generation as a command for an IRC bot that I wrote, which could talk in the ‘voice’ of my friends that hung out on there; that command was definitely my favourite.

Latterly, we’ve seen a resurgence of this with the rise of ‘AI’, such as this ‘AI-written Harry Potter fanfiction’:

Harry Potter and the Portrait of What Looked Like a Large Pile of Ash
Hungry indeed

or less child-friendly things, like Trump speeches:

But calling any of this ‘AI’ is a stretch. It’s picking things based on random chance and frequency. If I have a sock drawer with thirty red socks, six green and two blue, I’d be… a bit boring. But if I closed my eyes and picked socks from there, it would be a bit misleading to write an article saying “I got an AI to choose my clothes for the week and these are the results”.
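In Python terms, that ‘AI stylist’ is a single weighted random choice over the sock counts from the drawer above:

import random

# 30 red, 6 green, 2 blue: pick an "outfit" for each day of the week,
# weighted by how many of each colour are in the drawer
week = random.choices(["red", "green", "blue"], weights=[30, 6, 2], k=7)
print(week)  # e.g. ['red', 'red', 'green', 'red', 'red', 'blue', 'red']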

But I digress.


Having brought in Jim, my attention must turn to Mary. Her sister was counting on me. I trusted my Chicago detective instincts and followed up on a lead that Jim spilled during his interrogation.

I went to the park. There I met two men, Mikey and Brenda. Apparently, they didn’t get along. I knew Mikey was hiding something, and decided to find out what it was. I dragged him into an alleyway, shoved my knee into his back, and started punching him.

Good Cop time was over; now it was Bad Cop’s shift.

Mikey pleaded with me for mercy, this was all a misunderstanding, help would be forthcoming, he didn’t want to die, etc. I told him to shut up.

“Where is Jim?” I asked in the same voice I used on Jim earlier… Wait, wait. Wasn’t Jim at the police station? “Oh, that’s right,” Mikey says. “He went home for the day.” I was confused, but went along with it. “Oh, good”. But then Mikey had a surprise for me. He grabbed me, threatened me and apologised. I sensed that Jim was a touchy subject best left alone, so asked about Mary.

“Mary?” Mikey asks. “Who’s Mary?” I explained about the woman’s missing sister. “What about her?” Mikey enquires further. But at that point we spot Mary coming out of a store. I approach Mary, and she looks surprised to see me.

“Hey, you’re not my brother anymore,” Mary says. “Are…are you?”

Apparently she recognised me. I ask about her sister and Mary explains she’s at work.

At this point I realise something weird is going on. Sounds seem muffled, colours aren’t quite right, and time and place seem strangely elastic.


I thought perhaps AI Dungeon 2 was a bit like Sleep Is Death (Geisterfahrer) by Jason Rohrer, where the stories are written by players; or Cleverbot, where responses given by people are saved and can be reused.

But AI Dungeon 2 instead uses deep learning techniques to keep generating content, no matter what is thrown at it. It does have limitations, but it’s an interesting concept sprung from a Hackathon.

Best bit? It’s Free Software, MIT licensed! Check out its GitHub!


Things were getting weird. I tried to dance with Mary, which seemed like the thing to do at the time. She stared at me, but not in an uncomfortable way. I tried a backflip, and it ended with us falling asleep together. Then I had to run away, far away; away from the voices shouting that we’re not sisters.

A group of men accosted me. They looked like they had been drinking heavily. I had to keep the initiative; my detective instincts took over and I slapped one of the men. It surprised the group. I slapped another one and it surprised them identically. But they started to beat me, which I guess was inevitable.

I tried everything to distract them. The harmonica, juggling, telling a joke. Fortunately, the last one worked. Unfortunately, at that moment a helicopter landed and I was kidnapped. Mary tried to rescue me, but the jailer was having none of her pleas for mercy or bribes. Eventually, he tired of the conversation and wandered off into the woods, and Mary went all Bastille Day on the prisoners.


The narrative was based on my first interaction with AI Dungeon 2, which can be read in full.


Categories
linux

Protip: Don’t dd Your Root Partition

In which our hero makes the titular mistake.

I was in the process of creating a new DomU, a virtual machine guest under Xen, and had just completed a basic Arch install.

At this point I thought “Oh, it would be handy to have a bare-bones Arch image ready to go, I should make that happen”. So I took an LVM snapshot of the logical volume in one terminal window, and continued with post-install setup in another.

I went to copy the logical volume using dd and tab completed:

$ dd if=/dev/vg/newdomudisk of=/dev/vg/a<TAB>
$ dd if=/dev/vg/newdomudisk of=/dev/vg/archroot

Because it’s an Arch install, I had probably named it ‘archsomething’, right? Well, no.

I had named the intended LV ‘basearch’ because it’s a base Arch install. While I continued customising the guest, I had a nagging feeling that something wasn’t right.

$ ls /etc
  Segmentation fault

Side note: this is almost the same point as Mario Wolczko in the [in]famous recovery story as told to alt.folklore.computers, archived in a bunch of places (mirror here). Only his error was “ls: not found.” The story is well worth a read for the creativity shown in recovery.


My reaction was ‘Oh poop‘. I stopped the dd. Unfortunately it had written a good couple of gigabytes by that point. The ssh connection stayed up for a while, letting me see that most things had been nuked. Then the connection hung, and the guests stopped responding.

I was caught out in this situation by a couple of things. My other server running the Xen hypervisor uses Debian as a base, so it didn’t cross my mind that an Arch logical volume would be the one with the hypervisor. I was also multitasking, and didn’t double-check the target (LV) before dd-ing.

So: make names obvious. Make them blindingly obvious. I’ve named the new LV containing the root for the Xen hypervisor xenroot, and you can bet I’ll be double- and triple-checking dd for a good while, at least!

Categories
all posts games

Take 2 Claims ‘WZLJHRS’

Jack Howitzer as Jack Howitzer in ‘Jack Howitzer’

I played some GTA V: Online the other night — my three word review: ‘fun but clunky’ — and uploaded the footage of it as I usually do, leaving it as a draft to be later updated with my automation tools.

Later on I saw I had a notification on YouTube and thought “Ah! Someone’s subscribed, or commented, or similar”. Actually, I had a copyright claim from Take 2 Interactive for ‘WZLJHRS’. What?

“There are some visibility restrictions on your video. However, your channel isn’t affected. No one can view this video due to one or more of the Content ID claims below. WZLJHRS: Video cannot be seen or monetized; Blocked in all territories by Take 2 Interactive”

The segment in question, just under two minutes long, was a GTA teevee programme (‘Jack Howitzer’, a documentary/mockumentary about a washed-up action movie actor) I watched while waiting for my friend to arrive at my office. It had some funny moments.

I am mindful of YouTube’s Content ID system, and I mute game music pre-emptively, having been bitten by that in the past. I didn’t suspect for a second that a fake TV show in a game would result in an entire video being blocked.

I will have to amend the video and reupload.

PS: WZLJHRS: WZL = Weazel News network || JHRS = Jack Howitzer show?

Categories
docker matrix

Matrix bot: Maubot on Docker

Beep, boop, I’m a bot

maubot is a bot system for Matrix, the decentralised messaging platform. The good folks in the room I hang out in have expressed a desire for a bot on occasion, so I thought I’d check it out.

While it’s described as a ‘system’, maubot can just as happily run a single bot, if that’s what you want. Features are provided by plugins; maubot won’t do anything out of the box.

It has a Docker image, so I figured I’d use that. I followed the instructions on the Wiki page with only minor modification:

0. Create a directory (mkdir maubot) and enter it (cd maubot).

1. Pull the docker image with docker pull dock.mau.dev/maubot/maubot:latest

2. Run the container for the first time to create a config file:

docker run --rm -v `pwd`:/data:z dock.mau.dev/maubot/maubot:latest

3. Update the config to your liking. I added a username with password in the admins section, and my homeserver in the registration_secrets section.

I then created a short docker-compose.yml file:

version: "3"
services:
  maubot:
    image: dock.mau.dev/maubot/maubot
    container_name: maubot
    volumes:
      - /home/robert/docker-containers/maubot:/data:z
    ports:
      - 29316:29316
    restart: unless-stopped
    labels:
      - "traefik.http.routers.maubot.rule=Host(`maubot.bertieb.org`)"

networks:
  default:
    external:
      name: traefik_default

And start with docker-compose up -d.

(The traefik labels make the service available via http://maubot.bertieb.org)

Setting up a Bot

This gets you a management interface, but the bot itself needs to be set up. It’s not entirely obvious, though the instructions are present elsewhere in the wiki.

I manually created a user using the Riot web interface, though there are instructions on how to do it via CLI using mbc. If you go the manual route, the access token that maubot asks for can be found by clicking your avatar/username dropdown in the top left to access ‘Settings’ -> ‘Help & About’ -> ‘Access Token’:

Once done, you should have a bot:

But it won’t do anything just yet; you need to add plugins!

As an aside, I managed to run into a permissions issue at this point, where the maubot interface wasn’t responding via HTTP, and docker logs was complaining:

[2019-11-29 19:18:15,617] [INFO@maubot.init] Initializing maubot 0.1.0.dev28
[2019-11-29 19:18:15,618] [DEBUG@maubot.loader.zip] Preloading plugins...
Traceback (most recent call last):                             
  File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/opt/maubot/maubot/__main__.py", line 58, in <module>
    init_zip_loader(config)
  File "/opt/maubot/maubot/loader/zip.py", line 270, in init
    ZippedPluginLoader.load_all()                                   
  File "/opt/maubot/maubot/loader/zip.py", line 253, in load_all
    for file in os.listdir(directory):
PermissionError: [Errno 13] Permission denied: '/data/plugins'

It’s simplest to upload the desired plugins from the web interface itself. As the wiki points out elsewhere:

To add a plugin, upload a zip file containing the maubot.yaml and relevant files at the top level. Github releases of plugins have those premade (see i.e. https://github.com/TomCasavant/PollMaubot/releases – file casavant.tom.poll-v1.0.0.mbp) – mau.dev/maubot has a CI that makes those. Also, mbc build will make those with the relevant files.

So, for the echo plugin, go to its GitHub releases page, grab the file xyz.maubot.echo-v1.3.1.mbp and upload that to the maubot web interface by hitting the plus arrow beside ‘plugins’.

Alternatively, you can compile the plugins to a .mbp yourself: clone the main maubot repo, run setup.py to get the dependencies, then clone the plugin[s] in maubot’s directory, and finally run mbc build <plugin> for the plugin you just cloned (per mChron’s comment).

Once you have the plugins you want, create an instance for them and assign them to the bot client. That interface will also show you configuration options and let you view logs, if needed. Then you’re good to go!

Categories
all posts

A Sunlit Kelpie

Cold but bright

It may have been down at around -2°C that morning, but the walk in Falkirk was great. The Kelpies are really impressive!

Categories
all posts

Automating YouTube Uploads With OCR Part 10: Reflection, Lessons Learned, and Improvements

Every day is a school day

We set out to use OCR to extract metadata from frames of the loading and ending screens of Deep Rock Galactic, and to use that metadata to fill in the details of videos destined for YouTube.

In other words we went from:

To:

Why?

It’s always good to reflect when you’ve done something. Did it go well, or not as well as expected? What did you hope to achieve? Did you achieve that? What has it changed? There are as many ways to reflect as there are things to reflect on.

In this project I wanted to achieve a greater degree of automation with my video creation workflow. Partly because it would save me time:

The ever-relevant XKCD (https://xkcd.com/1205/)

The other reason is that copying text is no longer the province of monks in a scriptorium; it’s a repetitive, uncreative task. I enjoy spending time playing games with my friends, and those videos are there so that they and others can relive and enjoy them too; spending time copying text is not a good use of my time.

However, there’s a more pertinent image for this sort of task:

Pretty much spot on (https://xkcd.com/1319/)

There were 47 videos in the test batch. Let’s say that I would have spent five minutes per video copying across the title, writing a description, figuring out the tags and such; doing that manually would have taken 235 minutes, or nearly four hours. That might sound like a lot, but it’s certainly less time than I worked on the automation.

The automatic OCR will have ongoing benefits – there are more videos to process.

But the best part is that I learned. I learned about tesseract and OCR, a bit about OpenCV, and honed my python programming skills.

Lessons Learned

OCR is good enough to extract text from video stills. I assumed this, but it is good to have it confirmed.

Cleaning up images makes a huge difference to OCR accuracy. If I had done the clean-up earlier in the process, I could probably have improved detection on the opening image enough to use just that; but using both loading and ending images gives more metadata, so it worked out okay.

It’s really easy to leak file descriptors. Late on, when I went to test with a wider variety of videos, I ran into "OSError: [Errno 24] Too many open files". Instead of using tempfile.mktemp, which unexpectedly kept the fd, I had to use tempfile.NamedTemporaryFile. That one took a bit of hunting down, as it looked like pytesseract was failing; coincidentally, previous versions of pytesseract had a couple of issues due to the same thing (mktemp vs NamedTemporaryFile)! Most confusing.
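For reference, the shape of the fix was roughly this; a sketch rather than the exact code, with pytesseract standing in for whatever consumes the temporary image:

import tempfile

from PIL import Image
import pytesseract

def ocr_frame(frame: Image.Image) -> str:
    # NamedTemporaryFile is closed (and the file deleted) when the with-block
    # exits, so no file descriptor is left dangling per frame processed
    with tempfile.NamedTemporaryFile(suffix=".png") as tmp:
        frame.save(tmp, format="PNG")
        tmp.flush()
        return pytesseract.image_to_string(Image.open(tmp.name))

(Re-opening the file by name while it’s still open works fine on Linux, which is where all of this runs.)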

What Would I Do Differently?

Implement automated testing. This would have hugely helped in the refinement stage, where regressions in detection accuracy crept in as I refined. There were a couple of reasons that put me off at the time, but they were more excuses than reasons:

  • this was a “quick and dirty” attempt to get a tool working; refinements to it can come later

    This is an old, old excuse, proved false time and again. It’s sometimes phrased as “This is just a temporary fix, will do it properly later” and other variants. What it boils down to is “We’re going to do this the ‘wrong’ way for now, and change it later”.

    It sounds fine, if you actually sort it later, but invariably that doesn’t happen. Time and effort have to be focused somewhere, and it’s a harder sell to redo something that “works” (however hackily) than to implement a new feature, or get a product out the door.

    Here it was even worse: doing that work may well have improved the “quick and dirty” process.
  • the frame extraction + OCR processes aren’t quick, and tests should be quick to run; it’s also hard to break apart the pipeline

    This excuse is on slightly firmer ground, but not by much! It’s true that these things take time, but they can be broken down into components and tested individually using sample images (see the sketch after this list).

    It might not provide the coverage of a real-life full data set, but it’ll catch the worst of regressions.
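Even a single test along these lines would have caught most of the regressions. The sample frame and expected string here are placeholders for a known-good loading screen checked into the repo:

import pytesseract
from PIL import Image

# Placeholder fixture: a known-good loading-screen frame and a string
# that we know appears on it
SAMPLE_FRAME = "tests/fixtures/sample_loading_screen.png"
EXPECTED_MISSION_TYPE = "POINT EXTRACTION"

def test_ocr_finds_mission_type_on_sample_frame():
    text = pytesseract.image_to_string(Image.open(SAMPLE_FRAME))
    assert EXPECTED_MISSION_TYPE in text.upper()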

Future Improvements + Directions

Use only a start or end frame if one is missing. At the moment a video is skipped if either the start or end frame is not detected. That leaves the video to be done entirely manually; we could get at least some of the metadata from one frame without the other.

Detect the in-game menu screen. For times when I hit the record button too late (or OBS takes too long to spin up), I could go into the menu, which has a couple of bits of metadata. I would need to remember to do this, but I usually realise when I’ve hit record too late. Combined with the above improvement, we could increase video coverage.

Expand OCR to other games. This is non-trivial but an obvious way to go. Killing Floor 2 is the likeliest next candidate as at the moment it’s the one we play the most and also has metadata to capture.

Consider a further automated pipeline. As it stands, I have to run the program against videos manually; not a big deal. But a tool that detected new videos, automatically ran the OCR tool against them and put them and the JSON output in a convenient place (± automatically uploading them to YouTube) would make the process more streamlined. This may be beyond my own need or indeed tolerance; I could see it being potentially frustrating if I wanted to manually handle a video differently.

Overall though, I am happy with how the tool turned out.

Categories
docker troubleshooting

Using Discourse Dev with Traefik (without ‘Bad Gateway’ + ‘blocked host’)

tl;dr:

  • Traefik grabs the first port it sees, which on the dev image is 1080; we want port 9292. Use --label=traefik.http.services.discourse-dev.loadBalancer.server.port=9292
  • You need to set a dev host using an env var in the container: -e DISCOURSE_DEV_HOSTS=your_dev_hostname \

With the dev version of Discourse working, I wanted to let its connectivity be managed by the traefik proxy. But whichever way I sliced it, I would get a Bad Gateway error. The usual suspects for this are not setting a port, or having the service on a different network from traefik itself; but neither was the case here, and the issue persisted.

I had to add the following to (discourse_source_root)/bin/docker/boot_dev, in the docker run ... section:

    --network=traefik_default \
    --label=traefik.port=80 \
    --label=traefik.docker.network=traefik_default \
    --label=traefik.http.routers.discourse-dev.rule=Host\(\`$DEVHOST\`\) \
    --label=traefik.http.services.discourse-dev.loadBalancer.server.port=9292 \

I set DEVHOST=<my dev host> earlier in the file, or you can use the host there directly. The last line points traefik at the correct port (9292) in the discourse-dev container.

Accessing by host then produces a page with a blocked host error:

Blocked host: discourse_dev_host
To allow requests to discourse_dev_host, add the following to your environment configuration:
config.hosts << "discourse_dev_host"

Setting DISCOURSE_DEV_HOSTS permits access on those hosts. We need to do this in the container, so add the following to the same section in the same file:

-e DISCOURSE_DEV_HOSTS=$DEVHOST \

Which permits access via that (or those) hostname(s).

Categories
docker linux troubleshooting

[solved] ‘Connection closed’ in Discourse Dev Install

tl;dr: this was a temporary issue solved by a later commit; if you checked out discourse after 28 Oct but before 4 Nov, git pull to update


Having installed a production Discourse forum, I wanted to get a local dev instance up and running for testing.

There are good instructions for doing just that using Docker. Don’t do what I did, which was to follow the production install method and assume it would work by pointing the prod hostname at it in /etc/hosts.

Unfortunately, when I followed the instructions to set up the dev instance, I was greeted with an ‘Unable to connect’ screen (ERR_FAILED). Even using telnet from the same host failed:

bertieb@ubunutu-vm:~/discourse$ telnet 127.0.0.1 9292
Trying 127.0.0.1...
Connected to 127.0.0.1.
Escape character is '^]'.
Connection closed by foreign host.

Dang. I tried this across fresh Arch and Ubuntu Server (19.10 + 18.04.3 LTS) installs and got the same thing.

Installing the non-Docker version worked but only for localhost; then a comment on that guide’s topic pointed me at a recent change to interface binding. Checking out the commit before that change let me connect from other hosts in both the Docker and non-Docker versions.

As of 2019-11-04, a later commit sorted this issue and added a specific flag (-b) for permitting connections from other hosts.

Categories
automation ocr python

Automating YouTube Uploads With OCR Part 9: Bringing it All Together

I love it when a plan comes together

We’ve been finding a way to automate YouTube uploads using tesseract to OCR frames from Deep Rock Galactic videos to extract metadata used in the video on YouTube.

We got to the stage where we have good, useful JSON output that our automated upload tool can work on. Job done? Well, yes: I could point the tool at it and let it work on that, but it would take quite a while. You see, to give a broad test base and plenty of ‘live-fire’ ammunition, I let a backlog of a month’s videos build up.

Automating Metadata Updates

Why is that an issue for an automated tool? The YouTube API by default permits 10 000 units per day of access, and uploading a video costs 1600 units. That limits us to six videos per day max, or five once the costs of other API calls are factored in. So I’d rather upload the videos in the background using the web interface, and let our automated tool set the metadata.

For that we need the videoIds reported by the API. My tool of choice to obtain those was shoogle. I wrapped it in a python script to get the playlistId of the uploads playlist, then grabbed the videoIds of the 100 latest videos, got the fileDetails of those to get the uploaded fileName… and matched that list against the filenames of the JSON entries.
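The matching step is the interesting bit. Here’s a sketch of it, assuming the uploads have already been fetched (via shoogle, or however you like) into a list of videoId/fileName pairs; the videos.json name and entry layout are just how I’m illustrating it:

import json

def attach_video_ids(uploads, metadata_path="videos.json"):
    """Attach YouTube videoIds to the generated metadata entries.

    `uploads` is a list of dicts like {"videoId": ..., "fileName": ...},
    i.e. the id plus fileDetails.fileName for each recent upload.
    Fetching that list (playlistItems.list + videos.list) is omitted here.
    """
    with open(metadata_path) as f:
        entries = json.load(f)

    by_filename = {u["fileName"]: u["videoId"] for u in uploads}
    for entry in entries:
        video_id = by_filename.get(entry["filename"])
        if video_id:
            entry["videoId"] = video_id
        else:
            print(f"No upload found yet for {entry['filename']}")

    with open(metadata_path, "w") as f:
        json.dump(entries, f, indent=2)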

So far so good.

Faster Thumbnails

But one of the personal touches that I like to do, and that will likely not be automated away, is to pick a frame from the video for the thumbnail. So I need a way to quickly go through the videos, find a frame that would make a good thumbnail, and add that time as a thumb field on the correct video entry. I’ve used xdotool in the past to speed up some of the more repetitive parts of data entry (if you’ve used AutoHotKey for Windows, it’s similar to that in some ways).

I threw together a quick script to switch to the terminal with vim, go to the filename of the current video in VLC (VLC can expose a JSON interface with current video metadata; the bits I’m interested in are the filename and the current seek position), create a thumb → time entry with the current time and then switch back to VLC. That script can be assigned a key combo in Openbox, so the process is: find frame, hit hotkey, find frame in next video, hotkey, repeat.
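The VLC side of that boils down to a single HTTP request against the web interface’s status endpoint (this assumes the Lua HTTP interface is enabled on the default port with a password set; the xdotool window-juggling is left out):

import requests

# VLC's web interface uses HTTP Basic auth with an empty username
VLC_STATUS = "http://localhost:8080/requests/status.json"
VLC_PASSWORD = "vlcpass"  # whatever was set in VLC's Lua HTTP settings

def current_video_position():
    status = requests.get(VLC_STATUS, auth=("", VLC_PASSWORD)).json()
    filename = status["information"]["category"]["meta"]["filename"]
    return filename, status["time"]  # filename + seconds into playback

filename, seconds = current_video_position()
print(f"{filename}: thumb time {seconds}s")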

Though the process is streamlined, finding a good frame in 47 videos isn’t the quickest! But the final result is worth it:

We have videos with full metadata, thumbnail and scheduled date/time set.

Glorious.

I included a video that failed OCR due to a missing loading screen (I hit record too late). There’s a handful of those; I found five while doing the thumbnails. I could do a bit of further work and get partial output from the loading/ending screen alone, or I could bite the bullet and do those ones manually, using it as a reminder to hit the record button at the right time!

Categories
automation computer vision programming python

Automating YouTube Uploads With OCR Part 8: Output

Nearly a working tool!

We’ve been using python and tesseract to OCR frames from video footage of Deep Rock Galactic to extract metadata which we can use for putting the videos on YouTube.

Mutators

Nearly all of the elements are captured; there are just the mutators left: warnings and anomalies. These appear in text form on the starting screen on either side of the mission block:

Here we have a Cave Leech Cluster and a Rich Atmosphere.

Since the text of these mutators comes from a known list of ten or fewer possibilities each, we can detect them using a wide box, then hard-cast the OCR output to whichever potential value it has the smallest Levenshtein distance to.
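Snapping the OCR output to the nearest known mutator is just a closest-match lookup; a sketch, with a couple of example warnings standing in for the full lists:

def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]

KNOWN_WARNINGS = ["Cave Leech Cluster", "Exploder Infestation", "Low Oxygen"]  # examples

def snap_to_known(ocr_text: str, candidates=KNOWN_WARNINGS) -> str:
    """Return the known mutator closest (by edit distance) to the OCR output."""
    return min(candidates, key=lambda known: levenshtein(ocr_text.lower(), known.lower()))

print(snap_to_known("Cave Leech C1uster"))  # -> Cave Leech Cluster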

Tie-Breaking Frames

The loading/ending frame detection works well for most videos, but on the odd one or two it suffers. It’s best to ignore the frames which are completely or mostly dark (i.e. either transition or fade-in), and the ones that are very bright (e.g. a light flash), as that hurts contrast and so hurts OCR.

Using ImageStat from PIL we can grab the frame mean (averaged across RGB values), then normalise it to add to our frame scoring function in the detection routine.

We want to normalise between 0 and 1, which is easy to do if you want to scale linearly between 0 and 255 (the RGB max value): just divide the average by 255. But we don’t want that. Manually looking at a few good, contrasty frames, it seemed that a value of 75 was best; even by 150 the frame was looking quite washed out. So we want a score of 0 at a mean pixel value of 0 and 150, and a score of 1 at a mean pixel value of 75:

# Tie break score graph should look something like:
# (tb_val)          
# |    /\            
# |   /  \           
# |  /    \          
# |_/      \_ (x)                
# 0    75    150                
#                   
# For sake of argument using 75 as goldilocks value
# ie not too dark, not too bright

75 is thus our ‘goldilocks’ value: not too dark, not too light. So our tiebreak value is:

tb_val = (goldilocks - (abs(goldilocks - frame_mean)))/goldilocks
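Put together with PIL, the whole tie-break looks something like this:

from PIL import Image, ImageStat

GOLDILOCKS = 75  # mean pixel value that tends to give nice, contrasty frames

def tiebreak_score(frame: Image.Image) -> float:
    """Score 1.0 at the goldilocks mean brightness, falling to 0 at 0 and at 150+."""
    frame_mean = sum(ImageStat.Stat(frame.convert("RGB")).mean) / 3
    return max(0.0, (GOLDILOCKS - abs(GOLDILOCKS - frame_mean)) / GOLDILOCKS)

The max() just clamps very bright frames (mean above 150) to zero rather than letting them go negative.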

Output

Since we’ve gotten detection of the various elements to where we want them, we can start generating output. Our automated YT uploader works with JSON, and looks for the following fields: filename, title, description, tags, playlists, game, thumb (time, title, additional), and scheduled.

Thumb time and additional we can safely ignore. Title is easy, as I use mission_type: mission_name. All of my Deep Rock Galactic uploads go into the one playlist. Tags are a bunch of things like hazard level, minerals, biome and some other common-to-all ones like “Deep Rock Galactic” (for game auto detection). The fun ones are description and scheduled.

Funnily enough, one of my earliest forays into javascript was a mad-libs style page which took the phrases via prompt() and put them in some text.

This was back in the days of IE4, and javascript wasn’t quite what it is today…

For the description, I took a bit of a “mad libs” style approach: use the various bits and pieces we’ve captured with a variety of linking verbs and phrases to give non-repetitive output. This mostly comes down to writing the phrases, sticking them in a bunch of lists and using random.choice() to pick one of them.
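A sketch of the idea; the phrase lists are illustrative rather than the real ones:

import random

OPENERS = ["get their orders from Mission Control and get dropped in to",
           "strap into the drop pod and head for",
           "descend once more into"]
LINKERS = ["to take on", "in search of", "hunting for"]

def make_description(players, biome, objective, mission, mission_type, date):
    return (f"{players} {random.choice(OPENERS)} the {biome} "
            f"{random.choice(LINKERS)} {objective} "
            f"in {mission} ({mission_type})\n\nRecorded on {date}")

print(make_description("BertieB, Costello and graham", "Fungus Bogs",
                       "the mighty Dreadnoughts", "Illuminated Pocket",
                       "Elimination", "2019-10-17"))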

For obvious reasons, I don’t want to publish fifty-odd videos at once, but rather spread them out over a period. I publish a couple of DRG videos on a Monday, Wednesday, Friday and at the weekend. To do this in python, I decided to use a generator, and call next() on it every time we need to populate the scheduled field. The function itself is fairly simple: if the time of scheduled_date is the earlier of the times at which I publish, go to the later one and return the full date; if it’s at the later time, increment by two days (if Monday/Wednesday) or one day, and set the time to the earlier one.
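As a sketch (I’ll use 18:00 and 20:00 as the two daily slots here; the exact times are beside the point):

from datetime import datetime, time, timedelta

EARLY, LATE = time(18, 0), time(20, 0)  # the two publish slots on a publish day

def schedule_slots(start: datetime):
    """Yield publish datetimes: two per publish day, skipping Tuesday/Thursday."""
    current = start
    while True:
        yield current
        if current.time() < LATE:
            # earlier slot today -> move to the later slot, same day
            current = datetime.combine(current.date(), LATE)
        else:
            # later slot -> next publish day (Monday/Wednesday skip a day)
            skip = 2 if current.weekday() in (0, 2) else 1  # 0 = Mon, 2 = Wed
            current = datetime.combine(current.date() + timedelta(days=skip), EARLY)

slots = schedule_slots(datetime(2019, 11, 18, 18, 0))  # a Monday
print(next(slots))  # 2019-11-18 18:00:00
print(next(slots))  # 2019-11-18 20:00:00
print(next(slots))  # 2019-11-20 18:00:00 (Wednesday)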

We run this through json.dumps() and we have output! For example:

{
  "filename": "2019-10-17 19-41-38.mkv",
  "title": "Elimination: Illuminated Pocket",
  "description": "BertieB, Costello and graham get their orders from Mission Control and get dropped in to the Fungus Bogs to take on the mighty Dreadnoughts in Illuminated Pocket (Elimination)\n\nRecorded on 2019-10-17",
  "tags": [
    "Deep Rock Galactic",
    "DRG",
    "PC",
    "Co-op",
    "Gaming",
    "Elimination",
    "Dreadnought",
    "Fungus Bogs",
    "Hazard 4",
    "Magnite",
    "Enor Pearl"
  ],
  "playlists": "Deep Rock Galactic",
  "game": "drg",
  "thumb": {
    "title": "Pocket Elimination"
  },
  "scheduled": "2019-11-18 18:00"
}

Looks good!