
Proactive Laziness: Extract Team Names from a ROM using awk


Do it not the easy way, and not the hard way, but some other way

Context

During a Subscriber Crown, I (usually) assign teams to subscribers. To do this, I wrote a team picker in Python which highlights a bunch of teams and then ‘picks’ one.

That Python program takes as one of its inputs a plain-text list of teams, one team per line. For old Genesis games (NHL 94, FIFA International Soccer, etc.) those team lists are usually available somewhere, like the game’s Wikipedia page or a retro gaming wiki. Simple enough to copy and paste. But in doing the entry for MLBPA Baseball, I couldn’t easily find a list, so I had to get it myself.

There are twenty-something teams in the game so the dumb-but-infallible method of loading up the ROM (which means booting up a VM) and manually copying the teams felt a bit, well, friction-y.

A Learning Experience

I’m a big fan of pushing oneself to learn features and new ways of doing things. It’s easy to pour scorn on someone who, say, works on a spreadsheet and manually adds up values in a column to get a total. But if that person comes from a background of ledgers or other paper spreadsheets and no-one told them that digital spreadsheets have formulae that can be used, much less ‘you can use SUM() for that’, well, it’s not their fault!

I could quite easily open up the ROM in a hex editor, and manually eyeball the teams. In fact, that’s what I did at first.

<insert “I don’t even see the code any more” reference>

Then I thought “don’t be so silly, there’s a learning opportunity here”. Vim can work as a hex editor, so maybe I could use it to jump by a fixed byte offset to each team, copying each one to a file! Except, er, I don’t know how to jump byte-wise in a file opened by xxd / a vim buffer run through xxd. I asked in #vim but I think there was a slight bit of talking at cross-purposes: I was learning-oriented whereas the person trying to help me was goal-oriented. No problem, just not quite a meeting of the minds; having done it myself, I do appreciate those who help others for free.

Over to awk

To be honest, strings gets me most of what I need from the file:

n;@U          
'n;@U          
( 0<          
( ;|          
tJ;|           
(_Nu           
f@09             
P(NuBmr         
Bms`BmsbBmsdBmsfA
dDR9            
dDR9           
S*JmT          
NnNu;|        
Atlanta        
ATL          
YNMaddux,Greg 
Glavine,Tom      
Avery,Steve    
Smoltz,John     
Smith,Pete      
McMichael,Greg 
Bedrosian,Steve
Howell,Jay    
Mercker,Kent    
Stanton,Mike     
Nixon,Otis    
Blauser,Jeff    
Gant,Ron      
McGriff,Fred     
Pendleton,Terry 
Justice,Dave   
Berryhill,Damon
Lemke,Mark      
Bream,Sid    
Olson,Greg    
Cabrera,Francisco
Hunter,Brian    
Belliard,Rafael
Pecota,Bill     
Sanders,Deion  
Baltimore          
BALT

(etc)

Looking at it, there’s an obvious section with the team name, team abbreviation and player roster. A couple of approaches come to mind: I could use regex with start+end markers to pull out only the section with teams; or I could select lines which are all caps (team abbreviation), and pull out the previous line (team name).

The first approach seems the most straightforward:

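# range pattern: prints the PAT1 and PAT2 lines as well as everything between
awk '/PAT1/,/PAT2/' file
# flag toggle: prints only the lines between the two markers
awk '/PAT1/{flag=1; next} /PAT2/{flag=0} flag' file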

via this SO answer.
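To see how the two forms differ, here is a minimal sketch run against a made-up sample. START and END are placeholder markers standing in for PAT1 and PAT2; they are not values from the actual ROM.

```shell
# Hypothetical sample standing in for the strings output.
sample='junk
START
Atlanta
Baltimore
END
more junk'

# Form 1: range pattern, which prints the marker lines too.
inclusive=$(printf '%s\n' "$sample" | awk '/START/,/END/')

# Form 2: flag toggle, which skips both marker lines.
exclusive=$(printf '%s\n' "$sample" | awk '/START/{flag=1; next} /END/{flag=0} flag')

printf 'inclusive:\n%s\n\nexclusive:\n%s\n' "$inclusive" "$exclusive"
```

The range form is handy when the first marker is itself a line you want (the first team name, say); the flag form suits markers that are just junk either side of the interesting block.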

From there it’s trivial to remove the bits we don’t want: player names, by filtering out lines with a comma (grep -v ,), and team abbreviations, by keeping only lines that contain something other than uppercase letters and spaces (grep -e "[^[:upper:] ]"):

formatted with column for improved screenshottability
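Put together, the whole pipeline might look something like the sketch below. The function name team_names, the ROM filename and the ENDMARK end marker are all made up for illustration; only Atlanta as the first team comes from the dump above.

```shell
# team_names FIRST LAST: read strings-style output on stdin, print team names.
# FIRST and LAST are regexes you pick by eyeballing your own dump.
team_names() {
    awk -v first="$1" -v last="$2" '$0 ~ first, $0 ~ last' |  # keep only the team block
        grep -v , |                                            # drop "Surname,Forename" lines
        grep -e '[^[:upper:] ]'                                # drop all-caps abbreviation lines
}

# Cut-down sample based on the dump above; ENDMARK is a hypothetical end marker.
sample='NnNu;|
Atlanta
ATL
YNMaddux,Greg
Sanders,Deion
Baltimore
BALT
ENDMARK'

printf '%s\n' "$sample" | team_names Atlanta ENDMARK

# Against a real ROM (hypothetical filename), something like:
#   strings mlbpa.bin | team_names Atlanta 'PAT2' | column
```

On the sample this prints Atlanta and Baltimore. Note that the range form includes the end-marker line, so this only works cleanly if that line is itself all caps or contains a comma; otherwise the flag form is the safer choice.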

I’d like to investigate more, and see if there’s an elegant way to do it in pure awk, but this is the output I need.
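One possible pure-awk version, following the second approach mentioned earlier (spot the all-caps abbreviation, print the line before it). This is a sketch rather than something tested against the real ROM: it assumes every abbreviation is made up only of the letters A-Z, and the function name team_names_awk is mine.

```shell
# Read strings-style output on stdin and print each line that sits directly
# above an all-caps line (assumed to be the team abbreviation).
team_names_awk() {
    awk '
        { sub(/[ \t]+$/, "") }                   # strip trailing padding
        /^[A-Z]+$/ && prev != "" { print prev }  # all-caps line: emit the line before it
        { prev = $0 }                            # remember this line for the next one
    '
}

# Cut-down sample based on the dump above, padding included.
printf '%s\n' 'NnNu;|' 'Atlanta   ' 'ATL  ' 'YNMaddux,Greg ' \
    'Sanders,Deion' 'Baltimore  ' 'BALT' | team_names_awk
```

On the sample this prints Atlanta and Baltimore. Any all-caps junk elsewhere in the dump would produce false positives, so in practice it probably still wants the start and end markers in front of it.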

Reflection

In my view it’s important to explore what tools can do, and I think little tasks like this are a useful way of gently pushing boundaries. It helps that it is a real task, so that you actually have to complete it, as compared to synthetic ‘follow along’ learning exercises, which can be very tempting to abandon!

I perhaps jumped into #vim a little hastily. I have a natural tendency to over-research before asking questions; I’d rather ‘waste’ my own time trying to find an answer than other folks’. I am trying to find the sweet spot there.
