MiniCrib News

As of 31 December, the popular MiniCrib library of SCD cribs is now published in .DOCX (rather than .DOC) format. Way to go! We took the opportunity to improve the way MiniCrib is imported to the Strathspey SCD database.

Along with the e-cribs project founded by Eric Ferguson, the MiniCrib project is an important contributor to the database, and has been for a long time – the collaboration started when the late Charles Upton was still in charge of MiniCrib, and we're happy to say we have a very good working relationship with the current maintainers.

The MiniCrib dataset is published in the shape of a huge Microsoft Word file (now, as we mentioned, in .DOCX format) on the MiniCrib web site. Every night an automated process on the Strathspey database server checks whether a new version has appeared, and if so, this version is downloaded and converted from .DOCX to plain text, which is a lot easier to process. We're using a small tool called docx2txt, which is a nifty little Perl program (those were the days …) that will convert the 2.3-megabyte crib file to plain text in the blink of an eye.
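For the curious, the nightly "has a new version appeared?" check could look roughly like the sketch below. This is an illustration only, not the actual server code: comparing SHA-256 digests is an assumption on our part for the sake of the example, and the function name `is_new_version` and the state file are made up. Downloading the file and running docx2txt on it are left out.

```python
import hashlib
from pathlib import Path


def is_new_version(data: bytes, state_file: Path) -> bool:
    """Return True if `data` differs from what was seen on the previous run.

    Hypothetical helper: the digest of the last processed file is kept in
    `state_file`, and is updated whenever a change is detected.
    """
    digest = hashlib.sha256(data).hexdigest()
    previous = state_file.read_text().strip() if state_file.exists() else None
    if digest == previous:
        return False  # same file as last night, nothing to do
    state_file.write_text(digest)
    return True  # new or changed file: convert and import it
```

Only when the function returns True would the conversion and import steps need to run.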

The plain-text file is then fed to another program which will try to match the individual cribs to the corresponding dances in the database. This is not entirely trivial because various dance names have been used several times by different devisers, and there are sometimes variations in spelling which make the process more difficult than it seems. We have found that, rather than trying to find all the dances whose names match that of a given crib and then deciding which one is correct, it is better to try to identify the deviser first and then find the dance among the dances written by that person (or group of people). Even in the “worst case”, when you are dealing with a dance by John Drewry or John W. Mitchell, that means you will only need to distinguish between 700 or so candidates rather than over 22,000 – plus the chance of a name being used twice is much lower. It's still a bit complicated because we need to deal with variations of “Saint” and “St.”, or “Tenth” and “10th”, to name but a few of the oddities one might encounter. Some dances have an article like “A” or “The” in the database and don't in the crib file, or the other way round. And so on.
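To give a flavour of the deviser-first approach, here is a minimal sketch. It is not the real importer: the names `normalise` and `match_crib`, the tiny equivalence table, and the sample dances and deviser are all invented for illustration, and the real code has to cope with many more oddities (for one thing, “St.” does not always mean “Saint”).

```python
import re

# Hypothetical equivalence table; the real importer handles many more cases.
EQUIVALENTS = {"st": "saint", "10th": "tenth"}
ARTICLES = {"a", "an", "the"}


def normalise(name: str) -> str:
    """Reduce a dance name to a canonical form for comparison."""
    words = re.findall(r"[a-z0-9]+", name.lower())
    words = [EQUIVALENTS.get(w, w) for w in words]
    # Drop a leading article, but never reduce a name to nothing.
    if len(words) > 1 and words[0] in ARTICLES:
        words = words[1:]
    return " ".join(words)


def match_crib(crib_name, crib_deviser, dances_by_deviser):
    """Find the deviser first, then look for the dance among their dances.

    `dances_by_deviser` maps a lower-cased deviser name to a list of dance
    names. Returns the matching name, or None if there is no unique match.
    """
    candidates = dances_by_deviser.get(crib_deviser.strip().lower(), [])
    wanted = normalise(crib_name)
    hits = [dance for dance in candidates if normalise(dance) == wanted]
    return hits[0] if len(hits) == 1 else None


# Invented sample data, purely for illustration.
database = {"example deviser": ["The St. Andrew Reel", "Tenth Anniversary Jig"]}
print(match_crib("Saint Andrew Reel", "Example Deviser", database))
# prints: The St. Andrew Reel
```

Restricting the search to one deviser's dances first keeps the candidate list short, so the fairly aggressive normalisation is much less likely to confuse two different dances.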

Anyway, to cut a long story short, thanks to a few last-minute additions and changes by myself (in the importer code) and Murrough Landon (in the database), we're now in a position to match 7120 out of the 7126 cribs in the MiniCrib file to their corresponding dances in the database automatically. Out of the remaining six, four are unofficial variations of well-known dances (e.g., “Polharrow Burn” for three couples!?) which we don't carry in the database. One is a dance which we're listing, but which exists in so many variations that offering a crib would be futile, and the last one is a dance which probably exists but has never been officially published; we're in the process of tracking down the devisers to find out about its status and (hopefully) add it to the database.

Of course this is an ongoing process because the MiniCrib file is updated every six weeks or so, often with a handful of new dances added that the database may or may not have heard about. Murrough usually receives a set of proposed additions ahead of time, so he can add the new dances to the database, if necessary, before the new MiniCrib version becomes official. All in all this is a great example of cross-project collaboration in the SCD world, and we're all very satisfied that things are working this way, and have done so smoothly for a number of years now. Happy dancing, everyone!