Services, Servers, DomUs and Containers, Oh My!

What a tangled web we weave…

I was confused. Staring at a console window, wondering how I’d installed a program†. The system package manager knew nothing about it, pip pretended it had never heard of it, and I hadn’t downloaded and compiled the source.

I’ve never really had what anyone would call a sane management approach to my servers and services. The closest I’ve got is using Xen as a hypervisor and trying to separate DomUs (VM guests) by service type, backed by LVM-on-mdraid storage*. That sounds alright in theory, but in practice most services have tended to congregate on the largest guest, turning that DomU into a virtualised general purpose server. A server that mixes and matches the system package manager, other package managers like pip, SteamCMD and manual installs.

Pug-fugly, in other words.

Pug-fugly, as coined in Pygmoelian

* the storage also sounds good in theory but the implementation has led me to dub the machine the ‘Frankenserver’ (more on that another time)

This is fairly typical. You need to do X. Not only that, but you need to do it RIGHT NOW. Software ABC does that. So you download it from whatever source seems most convenient and up-to-date, glance at the ‘Quickstart guide’ (thank goodness for those!), do a bit of minimal configuration, and then you’re up and running, doing X.

But what’s the big deal? The service[s] work, after all.

The issue is a general one: it takes time to figure out the setup before you can usefully interact with it.

This doesn’t just apply to DevOps, but to coding, writing, maintenance and repair, DIY, cooking, house management.

Or more simply: Fail to plan, plan to fail.

I found that it was taking me time to get my head around:

  • what I was dealing with
  • how it had been set up
  • why it wasn’t working
  • how to fix it
  • how to update it

I’d do those things, sort whatever, get the service working again, then six months later I’d have to figure it all out again.

“What a way to run a railroad…”

Clearly, there must be a better way.

In fact there are several better ways, depending on what you want to do. DevOps is huge business, and scales into the multinational megacorp range. But the home user can benefit too. There are clear benefits to using a well-organised system for pretty much anything, and managing servers, services and other applications is no exception. Used well, such a system enhances maintainability, security and reliability.

But how does one get started? There are plenty of options to choose from. Some folks I know love Docker, others opt for LXD on LXC (the two are not mutually exclusive). There are also the configuration management tools, like Puppet, Ansible and Chef.

Well, I briefly used Docker in the past, and now have it on one of the DomU guests, hosting a few services I used to run elsewhere. This seems like reason enough to dip my toes deeper into the water and move more services to containers, or at least to automated processes.

It’s rarely glamorous, but writing good documentation can make a huge difference to the person that follows you. Even when that person is you.

As an aside, the other key ingredient, besides having good systems in place, is good documentation.

For example, before writing this up I wondered about installing a new spam plugin; I used to use Spam Karma 2, but that’s been unmaintained for a long while. But which one to pick? Well, it seems I’ve used Akismet and Antispam Bee in the past, but why did I stop using them? I have a vague recollection of the former re-moderating old comments and declaring them spam, and of the latter not working in some way, but in what way?

Good documentation, make it your non-New Year’s Resolution.

So the take-home message here is that ad-hoc setups pop up and stick around for longer than they should. Don’t do that: have a good system instead, and document what you’re doing and why.

Because it’s good to have a goal, my aim is to get low-hanging fruit services moved over to Docker in the first instance (heh) to learn more about the tech. From there I can decide what else to move to containers, and maybe even see if LXC would fit my needs anywhere. I’d also like to see if I can apply this to the wee tools I write myself to help automate my workflows: rather than running them manually, perhaps I can develop them as services. And while doing all of this, I’ll document what I’m doing and why.
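For illustration, this is roughly the shape of the thing, using the stock nginx image as a stand-in for whichever service gets moved first (the names and paths here are hypothetical):

# run a service as a container: restarts automatically, port mapped,
# content served read-only from a host directory
docker run -d --name web \
    --restart unless-stopped \
    -p 8080:80 \
    -v /srv/www:/usr/share/nginx/html:ro \
    nginx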

Then maybe one day I won’t have to ask “I have a (python-based) program installed that doesn’t seem to have been installed either by apt or pip, and obviously I can’t remember… is there any way to figure out how I actually installed it? :D”.
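In the spirit of documenting things, here are a few checks that could have narrowed down where a stray program came from (beet being the beets CLI; substitute as appropriate):

$ command -v beet                # where does the executable live?
$ dpkg -S "$(command -v beet)"   # does a Debian/apt package own that path?
$ pip show beets                 # does pip know about the package?
$ ls -l "$(command -v beet)"     # a symlink into /opt or ~/src is a giveaway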


† beets, for music library organisation/tagging/management

Featured image by steve gibson on Flickr

[Solved] MySQL: Can't find file './db/table.frm' (errno: 13)

tl;dr: If you’re seeing this and the table does exist, check (and fix) permissions!

I was searching my backups for a database file which contained the entries for an old ranty blog I used to have before I cancelled the domain.

Lo and behold, I had a named, dated .tgz file. Unusually, it contained a backup of the MySQL directory structure, rather than a mysqldump'd set of SQL queries to reconstruct the databases and tables. No matter; I copied the database directory into /var/lib/mysql/db.

Browsing via the command-line interface indicated the database was present (SHOW DATABASES) and usable (USE db). So I tried to SELECT post_title FROM ranty_posts LIMIT 5. But no can do:

Can't find file: './db/ranty_posts.frm' (errno: 13)

The problem is permissions: the file is there, the slightly-misleading error message notwithstanding. Fortunately, it’s an easy fix: give mysqld the ability to read the files, e.g.:

# chown mysql:mysql /var/lib/mysql/db/ -R

This recursively changes the user and group ownership to mysql.
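To double-check the fix, confirm the new ownership and that the mysql user can actually read a table file:

# ls -l /var/lib/mysql/db/ | head
# sudo -u mysql test -r /var/lib/mysql/db/ranty_posts.frm && echo readable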


Database and table names changed to protect the innocent

[Fixed] MySQL: Table is marked as crashed and last (automatic?) repair failed (+ WordPress)

tl;dr: run myisamchk on the problematic table

I’ve run into the following error in my Apache error.log recently:

Table 'database.tablename' is marked as crashed and last (automatic?) repair failed

Fortunately the fix is simple: run myisamchk on the table which is marked as crashed:


$ sudo su
# service mysql stop
# cd /var/lib/mysql/databasename
# myisamchk -r tablename
MyISAM-table 'tablename' is not fixed because of errors
Try fixing it by using the --safe-recover (-o), the --force (-f)
 option or by not using the --quick (-q) flag
# myisamchk -r -o -f tablename
Data records: 107435
Found block that points outside data file at 16166832
# service mysql start

I’ve run into these errors before due to running out of disk space on the (admittedly tiny) VPS I had.
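As an aside, if a table keeps getting marked as crashed, one longer-term option (assuming the workload suits InnoDB, which doesn’t need myisamchk) is converting it away from MyISAM:

$ mysql -u root -p -e "ALTER TABLE databasename.tablename ENGINE=InnoDB;"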

I also had this problem with a WordPress database table, causing the often-seen and unhelpfully terse:

Error establishing a database connection

Interestingly, this wasn’t getting bounced to error.log, and I had to use the WordPress database repair screen to track down which table needed the fix (the fix being the same myisamchk).
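For anyone hunting for that screen: it’s disabled by default, and gets switched on with a constant in wp-config.php (above the ‘stop editing’ line; remove it again afterwards), after which it lives at /wp-admin/maint/repair.php:

define( 'WP_ALLOW_REPAIR', true );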

All sorted now!

Speeding up fdupes

tl;dr: use jdupes

I was merging some fileserver content, and realised I would inevitably end up with duplicates. “Aha”, I thought, “time to use good old fdupes”. Well yes, except a few hours later, fdupes was still only at a few percent. Turns out running it on a merged mélange of files several terabytes in size is not a speedy process.

Enter jdupes, Jody Bruchon’s fork of fdupes. It’s reportedly many times faster than the original, but that’s only half the story. The key, as with things like Project Euler, is to figure out the smart way of doing things; in this case, the smart way is to look for duplicates only among a subset of files. That might be between photo directories, if you think you might have imported duplicates.

In my case, I care about disk space (still haven’t got that LTO drive), and so restricting the search to files over, say, 50 megabytes seemed reasonable. I could probably have gone higher. Even still, it finished in minutes, rather than interminable hours.

./jdupes -S -Z -Q -X size-:50M -r ~/storage/

NB: Jody Bruchon makes an excellent point below about the use of -Q. From the documentation:

-Q --quick skip byte-for-byte confirmation for quick matching
WARNING: -Q can result in data loss! Be very careful!

As I was going to manually review (± delete) the duplicates myself, potential collisions are not a huge issue. I would not recommend using it if data loss is a concern, or if using the automated removal option.

jdupes is in the Arch AUR and in some repos for Debian, but the source code is easy to compile in any case.
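Building it is the usual make dance, something like the below (assuming a standard toolchain; the repository location may have changed since writing):

$ git clone https://github.com/jbruchon/jdupes.git
$ cd jdupes
$ make
# make install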

Compressing Teamspeak 3 Recordings Using sox

tl;dr: Loop through the files in bash, sox them to FLAC

Success!

I’ve been combining fileserver contents recently, and I came across a little archive of Teamspeak 3 recordings:

$ du -sh .
483G /home/robert/storage/media/ts_recordings/

Eep.

I wrote a quick-and-dirty script to convert the files:


#!/bin/bash

n=0
total=$(ls *.wav | wc -l)
ls *.wav | while read file; do
        sox -q "${file}" "${file%.*}.flac"
        if [ -e "${file%.*}.flac" ]; then
                if [ -s "${file%.*}.flac" ]; then
                        # FLAC exists and is non-empty: safe to remove the WAV
                        rm "${file}"
                else
                        echo "${file%.*}.flac is zero-length!"
                fi
        else
                echo "Failed on ${file}"
        fi

        ((n++))
        if ! (( n % 10 )); then
                echo "${n} of ${total}"
        fi
done

The script checks that the FLACs replacing the WAVs exist and are not zero-length before removing the original.

This was fine, but after finishing, I was still left with a bunch of uncompressed files in RF64 format, which sox unfortunately errored on.

It turns out sox 14.4.2 added RF64 read support, so I grabbed that on my Arch machine, and converted the few remaining files (substituting wav → rf64 in the two places it appears in the script above).
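Condensed, the substituted loop looks something like this (minus the counting and zero-length checks of the full script):

$ sox --version   # want 14.4.2 or later for RF64 read support
$ for file in *.rf64; do sox -q "${file}" "${file%.*}.flac" && rm "${file}"; done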

The final result?

$ du -sh .
64G /home/robert/storage/raid6/media/ts_recordings/

400 gigs less space and still lossless? Ahh, much better.

[Solved] “Logical volume is used by another device”

tl;dr: use dmsetup remove before trying lvremove

Note: Volume group and logical volume names have been substituted here. I’m not entirely sure it’s necessary, but better safe than sorry. If following this, please use the names of your volume group[s] and logical volume[s].

I am in the process of combining fileserver information, and so I have been touching parts of the system not usually looked at in normal day-to-day operation. For some reason, on one of my logical volumes I had created a partition table and added a partition. Of course, that worked normally so there was no reason to be aware of this — clearly I had blanked the fact that I did it at all not long after doing so — until recently.

The Problem

Logical volume vg/lv-old is used by another device.

After copying the data over to a new logical volume, I wanted to remove the now-unnecessary original logical volume that contained the partition. Easy, right?


# lvremove -v /dev/vg/lv-old
    DEGRADED MODE. Incomplete RAID LVs will be processed.
    Using logical volume(s) on command line
  Logical volume vg/lv-old is used by another device.

Okay, what’s using it? cat /proc/mounts reports that it isn’t mounted. lsof and fuser return nothing. Maybe retrying the command will work*… nope.
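For reference, those checks amounted to roughly:

# grep lv-old /proc/mounts   # not mounted
# lsof /dev/vg/lv-old        # nothing
# fuser -vam /dev/vg/lv-old  # nothing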

There are a bunch of posts around this, mostly saying “make sure it is unmounted first”, or “try using -f with lvremove“. And the old favourite: “a reboot fixed it”.

Find Out device-mapper’s Mapping

Well, the culprit in this case seemed to be device-mapper creating a mapping which counted as ‘in-use’. Check for the mapping via:


# dmsetup info -c | grep old
vg-lv--old       253   9 L--w    1    2      1 LVM-6O3jLvI6ZR3fg6ZpMgTlkqAudvgkfphCyPcP8AwpU2H57VjVBNmFBpLTis8ia0NE

Find Out Mapped Device

Then use that to find out what is holding it:


$ ls -la /sys/dev/block/253\:9/holders

drwxr-xr-x 2 root root 0 Dec 12 01:07 .
drwxr-xr-x 8 root root 0 Dec 12 01:07 ..
lrwxrwxrwx 1 root root 0 Dec 12 01:07 dm-18 -> ../../dm-18

Remove Device (via `dmsetup remove`)

Then do a dmsetup remove on that device-mapper device:


# dmsetup remove /dev/dm-18

Retry `lvremove`

And you’re good to go with lvremove:


# lvremove -v /dev/vg/lv-old
    DEGRADED MODE. Incomplete RAID LVs will be processed.
    Using logical volume(s) on command line
Do you really want to remove active logical volume lv-old? [y/n]: y
    Archiving volume group "vg" metadata (seqno 35).
    Removing vg-lv--old (253:9)
    Releasing logical volume "lv-old"
    Creating volume group backup "/etc/lvm/backup/vg" (seqno 36).
  Logical volume "lv-old" successfully removed

Bish bash bosh!

Addendum

*: I’m not sure of the thought process behind “just try it again”.

I’m reminded of a short bit of Darrell Hammond’s stand up (paraphrased):

“You know that message you get when you dial the wrong number that tells you to ‘check you have the right number and dial again’? Well, women will check the number and try again. Men will try the same number, but this time we’ll push the buttons a ******** harder…”

[Solved] “Filesystem is already n blocks long. Nothing to do!”

tl;dr: if you’re sure you did everything right, use lsblk or parted (etc) to see if a partition table is present on your logical volume.

So I am in the process of merging the content of two fileservers, and had the need to extend a logical volume to accommodate some additional data. No problem- that’s one of the benefits of using LVM!

Except after resizing, I ran into a problem:


$ lvextend -L +150G /dev/vg/lvinquestion
$ resize2fs /dev/vg/lvinquestion
> The filesystem is already 268435200 (4k) blocks long. Nothing to do!

Wait, what? Aside from the fact I could have combined the commands by including the --resizefs option to lvextend, why was resize2fs complaining that there was “Nothing to do!”?

Fortunately SE Arqade user ToxicFrog had the answer:

@bertieb parted reports the partition size, not the filesystem size
If it’s a partitioned LV you need to resize the partition after expanding the VL
*LV

Ah, whoops! I’m not sure why I partitioned the LV (it only had one partition) but I must have done so.

lsblk confirmed the partition:


sdh                                8:112  0   2.7T  0 disk
└─sdh1                             8:113  0   2.7T  0 part
  └─md1                            9:1    0  10.9T  0 raid6
    (...)
    ├─vg-lv                      253:9    0   1.2T  0 lvm
    │ └─vg-lv1                   253:18   0   1.1T  0 part

So, then what? Well, I used dd to copy the filesystem to a new logical volume, then extended that, and finally removed the original:


# dd if=/dev/dm-18 bs=1M | pv -s 1T |  dd of=/dev/vg/lv-new bs=1M
# lvextend --resizefs -L 1.15T /dev/vg/lv-new
# lvremove /dev/vg/lv
# lvrename vg lv-new lv

(pv was included to give a nice progress indicator, rather than faffing around with SIGUSR1)
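For completeness, ToxicFrog’s in-place suggestion should also have worked: grow the partition to fill the LV, then resize the filesystem. Something like the below sketch (untested here; device names follow the lsblk output above, and the kernel may need prompting to re-read the partition table):

# parted /dev/vg/lv resizepart 1 100%
# partprobe /dev/vg/lv
# resize2fs /dev/mapper/vg-lv1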

And that was that. There was a slight problem with removing the original logical volume, but more on that later…

Browsing MySQL Backups

tl;dr: Seems the quickest way of doing this was to fire up a VM, install mysql-server and mysql-client and browse that way.

I have backups of things. This is important, because as the old adage goes: running without backups is data loss waiting to happen. I’m not sure if that’s the adage, but it’s something resembling what I say to people. I’m a real hit at parties.

I wanted to check the backups of the database powering this blog, as there was a post that I could swear I remembered writing (iterating over files in bash) but couldn’t find. I had a gzipped dump of the MySQL database, and wanted to check that.

zgrep bash mysql.sql.gz | less was my first thought, but that gave me a huge amount of irrelevant stuff.

A few iterations later I was at zgrep bash mysql.sql.gz | grep -i iterate | grep -i files | grep -v comments and none the wiser. I had hoped there was some tool to perform arbitrary queries on dump files rather than going through a proper database server (that’s basically sqlite), but from my limited searching, such a thing didn’t seem to exist for MySQL.

What I ended up doing was firing up a VM, installing mysql-server and mysql-client and dumping the dump into that server via zcat:

zcat mysql.sql.gz | mysql -u 'root' -p database

And then querying the database: select post_title, post_date from wp_posts where post_title like '%bash%'; followed by select post_content from wp_posts where post_title like '%terate%';

And the post is back!

Wanted: One LTO-4/5/6/7 Drive!

I am something of a digital hoarder. I have files dating back to one of the earliest computers that anyone in my family owned. I think I even still have diskettes for an older word processor, the name of which escapes me at the moment. As such, I have slightly more than average storage requirements.

At present I handle these requirements via a Linux fileserver, using 3TB drives RAID6’d via mdadm. On top of that I use LVM to serve up some volumes for Xen, but that’s not strictly relevant to storage.

Looking at the capacities of LTO makes me quite covetous. LTO tapes are small, capacious and reliable; with a few tapes, I could archive a fair amount of data. I could also move the tapes outside my house, and lo: offline, offsite backups!

Sadly, drives are expensive, unless you step back to the relatively small-capacity* LTO-2 drives.

At present, given the cost of drives, some back-of-the-envelope calculations show that for any reasonable** dataset, simply buying hard drives (at time of writing, 3TB is cheapest per GB) is the most cost-effective means of archiving. Given that is where the focus of development is, I don’t think this is likely to change soon.
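To make that concrete, a back-of-the-envelope sketch with made-up-but-plausible numbers (illustrative only, not quotes):

$ # say: LTO-5 drive ~£1200 plus ~£20 per 1.5TB (native) tape, vs 3TB HDDs at ~£80
$ echo "Tape, ~30TB: £$(( 1200 + 20*20 ))"   # drive + 20 tapes = £1600
$ echo "HDD,  ~30TB: £$(( 10*80 ))"          # 10 x 3TB drives = £800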

I’ll just have to wait for a going-out-of-business auction, and hope the liquidators overlook the value of the backup system…