Do it not the easy way, and not the hard way, but some other way
Context
During a Subscriber Crown, I assign teams (usually) to subscribers. To do this, I wrote a team picker in Python which highlights a bunch of teams and then ‘picks’ one.
That Python program takes as one of its inputs a plain-text list of teams, one team per line. For old Genesis games (NHL 94, FIFA International Soccer, etc) those team lists are usually available somewhere, like the game’s Wikipedia page or a retro gaming wiki. Simple enough to copy and paste. But in doing the entry for MLBPA Baseball, I couldn’t easily find a list, so I had to get it myself.
There are twenty-something teams in the game so the dumb-but-infallible method of loading up the ROM (which means booting up a VM) and manually copying the teams felt a bit, well, friction-y.
A Learning Experience
I’m a big fan of pushing oneself to learn features and new ways of doing things. It’s easy to pour scorn on someone who, say, works on a spreadsheet and manually adds up values in a column to get a total. But if that person comes from a background of ledgers or other paper spreadsheets and no-one told them that digital spreadsheets have formulae that can be used, much less ‘you can use SUM() for that’, well, it’s not their fault!
I could quite easily open up the ROM in a hex editor, and manually eyeball the teams. In fact, that’s what I did at first.
Then I thought “don’t be so silly, there’s a learning opportunity here”. Vim can work as a hex editor, so maybe I could use it to jump by a fixed byte offset to each team, copying each one to a file! Except, er, I don’t know how to jump byte-wise in a file opened by xxd / a Vim buffer run through xxd. I asked in #vim, but I think we were talking slightly at cross-purposes: I was learning-oriented whereas the person trying to help me was goal-oriented. No problem, just not quite a meeting of the minds; having done it myself, I do appreciate those who help others for free.
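For reference, here is a minimal sketch of reading at a fixed byte offset from the shell rather than from inside Vim. It uses dd against a throwaway sample file; the 16-byte record width, the space padding and the read_record helper are all made up for illustration, and nothing here says the ROM really lays its team names out at a fixed stride.

```shell
# Stand-in for a ROM region: three fixed-width, space-padded records.
# RECORD_SIZE=16 is hypothetical; a real ROM would need its own offsets.
RECORD_SIZE=16
sample=$(mktemp)
printf '%-16s%-16s%-16s' 'Atlanta' 'Baltimore' 'Boston' > "$sample"

# Print record number $2 (zero-based) from file $1.
# dd's skip= counts in units of bs=, so this seeks RECORD_SIZE * n bytes in.
read_record() {
    dd if="$1" bs="$RECORD_SIZE" skip="$2" count=1 2>/dev/null |
        awk '{ sub(/ +$/, ""); print }'   # trim the padding
}

read_record "$sample" 1   # prints: Baltimore
```

xxd can seek in the same way with -s (offset) and -l (length), which gives a hex dump of just that slice.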
Over to awk
To be honest, strings gets me most of what I need from the file:
n;@U 'n;@U ( 0< ( ;| tJ;| (_Nu f@09 P(NuBmr Bms`BmsbBmsdBmsfA dDR9 dDR9 S*JmT NnNu;| Atlanta ATL YNMaddux,Greg Glavine,Tom Avery,Steve Smoltz,John Smith,Pete McMichael,Greg Bedrosian,Steve Howell,Jay Mercker,Kent Stanton,Mike Nixon,Otis Blauser,Jeff Gant,Ron McGriff,Fred Pendleton,Terry Justice,Dave Berryhill,Damon Lemke,Mark Bream,Sid Olson,Greg Cabrera,Francisco Hunter,Brian Belliard,Rafael Pecota,Bill Sanders,Deion Baltimore BALT
(etc)
Looking at it, there’s an obvious section with the team name, team abbreviation and player roster. A couple of approaches come to mind: I could use regex with start+end markers to pull out only the section with teams; or I could select lines which are all caps (team abbreviation), and pull out the previous line (team name).
The first approach seems the most straightforward:
awk '/PAT1/,/PAT2/' file
awk '/PAT1/{flag=1; next} /PAT2/{flag=0} flag'
via this SO answer.
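To see how the two forms differ, here is a small sketch on sample input shaped like the strings output above. The ENDMARK line is an invented end marker, and range_inclusive / range_exclusive are hypothetical helper names; the real PAT1 and PAT2 depend on what surrounds the team data in the ROM.

```shell
# Sample input, one string per line. ENDMARK is a made-up end marker.
sample='junk
Atlanta
ATL
Maddux,Greg
Baltimore
BALT
ENDMARK
more junk'

# Form 1: range pattern. Prints both marker lines and everything between.
range_inclusive() {
    awk '/Atlanta/,/ENDMARK/'
}

# Form 2: flag variable. "next" skips the start line, and the flag is
# cleared before the end line is printed, so both markers are excluded.
range_exclusive() {
    awk '/Atlanta/{flag=1; next} /ENDMARK/{flag=0} flag'
}

printf '%s\n' "$sample" | range_inclusive   # Atlanta ... ENDMARK (6 lines)
printf '%s\n' "$sample" | range_exclusive   # ATL ... BALT (4 lines)
```

One consequence: if the start marker is itself a line worth keeping (here, the first team name), the first form is the one to use, since the second drops it.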
From there it’s trivial to remove the bits we don’t want: player names, by filtering out lines with a comma (grep -v ,), and team abbreviations, by keeping only lines that contain something other than uppercase letters and spaces (grep -e "[^[:upper:] ]").
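Put together, the whole pipeline might look like the sketch below. The markers are assumptions: Atlanta is the first team in the strings output above, while ENDMARK and the extract_teams name are invented for the example.

```shell
# Sample input, one string per line. ENDMARK is a made-up end marker.
sample='junk
Atlanta
ATL
Maddux,Greg
Glavine,Tom
Baltimore
BALT
ENDMARK
more junk'

extract_teams() {
    awk '/Atlanta/,/ENDMARK/' |      # keep only the team section
        grep -v , |                  # drop player names (they contain a comma)
        grep -e '[^[:upper:] ]'      # drop lines of only capitals and spaces
}

printf '%s\n' "$sample" | extract_teams
# prints:
# Atlanta
# Baltimore
```

The all-caps end marker is removed by the last grep along with the abbreviations; an end marker containing lowercase letters would survive and need trimming separately.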
I’d like to investigate more, and see if there’s an elegant way to do it in pure awk, but this is the output I need.
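One possible pure-awk version, offered as a sketch rather than the definitive answer: fold both grep filters into the action of the range pattern. As before, the Atlanta and ENDMARK markers and the extract_teams_awk name are assumptions for illustration. It uses [A-Z] in place of [[:upper:]] because not every awk implementation supports POSIX character classes, which makes it ASCII-only.

```shell
# Sample input, one string per line. ENDMARK is a made-up end marker.
sample='junk
Atlanta
ATL
Maddux,Greg
Glavine,Tom
Baltimore
BALT
ENDMARK
more junk'

extract_teams_awk() {
    awk '/Atlanta/,/ENDMARK/ {
        if ($0 ~ /,/) next          # player name: skip
        if ($0 !~ /[^A-Z ]/) next   # only capitals and spaces: skip
        print                       # whatever is left is a team name
    }'
}

printf '%s\n' "$sample" | extract_teams_awk
# prints:
# Atlanta
# Baltimore
```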
Reflection
In my view it’s important to explore what tools can do, and I think little tasks like this are a useful way of gently pushing boundaries. It helps that it is a real task, so that you actually have to complete it, as opposed to synthetic ‘follow along’ learning exercises, which can be very tempting to abandon!
I perhaps jumped into #vim a little hastily. I have a natural tendency to over-research before asking questions; I’d rather ‘waste’ my own time trying to find an answer than other folks’. I am trying to find the sweet spot there.