This one is for the non-programmers out there.

I’m writing a program that takes electoral registers from around the country and sticks them all in the same place.  The reason for this is that it’s a Good Thing To Do.

Hopefully I’ve not lost you yet.

Local authorities are obliged by law to send their unabridged registers to the library in “electronic format”.  What the morans who made this legislation failed to realise (or realised but didn’t care about) is that “electronic format” is a virtually meaningless term.  So we get eleventybillion different formats that have just about this much in common: they contain a shed load of names and a shed load of addresses.  We get umpteen types of spreadsheet, masses of Word documents and piles of PDFs.

Names are fun.  Mine can be written like this: Tom Demeranville.  It can also be written as Thomas N. Demeranville.  Or Demeranville, T.  Now, a human can easily see that those are all the same name and pick out which bit is which.  For a computer it’s a bloody nightmare.
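For the curious, here is roughly what that looks like in code.  This is a minimal Python sketch and not the real parser: the function name `name_key` is made up for illustration, and it assumes a name is either "Surname, Given" or "Given ... Surname".  It boils each spelling down to a lowercase surname plus a first initial so the three versions above compare as equal.

```python
import re

def name_key(raw):
    """Reduce a name to (lowercase surname, first initial) for comparison.

    Hypothetical helper. Assumes only two layouts:
    "Surname, Given ..." or "Given ... Surname".
    """
    text = raw.strip()
    if not text:
        raise ValueError("empty name")
    if "," in text:
        # "Demeranville, T." -> surname comes before the comma
        surname, _, given = text.partition(",")
    else:
        # "Thomas N. Demeranville" -> surname is the last word
        parts = text.split()
        surname = parts[-1]
        given = " ".join(parts[:-1])
    # Pull out alphabetic tokens so "N." and "N" are treated alike
    given_tokens = re.findall(r"[A-Za-z]+", given)
    initial = given_tokens[0][0].upper() if given_tokens else ""
    return (surname.strip().lower(), initial)

variants = ["Tom Demeranville", "Thomas N. Demeranville", "Demeranville, T."]
keys = {name_key(v) for v in variants}
```

All three spellings end up with the same key.  The sketch falls over quickly on anything less tidy (double-barrelled surnames, titles, "van der" and friends), which is where the nightmare part comes in.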

Addresses are fun too.  What exactly does Address1 mean?  Or Address7?  Why did they stick the postcode in Address5 and leave Address1 empty?  Those and other questions I’ve given up on answering, but the code works nonetheless.
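One way to cope with this, sketched in Python, is to stop caring which numbered field holds what: throw away the empty ones and hunt for whichever field looks like a postcode.  This is an illustration only, not the real code.  The function name `split_address` is invented, it assumes UK-style postcodes, and the pattern is deliberately simplified rather than a full validator.

```python
import re

# Simplified UK-style postcode shape (assumption); not a complete validator.
POSTCODE_RE = re.compile(r"^[A-Z]{1,2}[0-9][A-Z0-9]? ?[0-9][A-Z]{2}$")

def split_address(fields):
    """Return (address_lines, postcode) from a list of Address1..AddressN values.

    Hypothetical helper: blank fields are dropped and the first field that
    looks like a postcode is pulled out, wherever it happens to be.
    """
    lines = []
    postcode = None
    for field in fields:
        value = " ".join(field.split())  # collapse stray whitespace
        if not value:
            continue  # skip empty fields such as a blank Address1
        if postcode is None and POSTCODE_RE.match(value.upper()):
            postcode = value.upper()
        else:
            lines.append(value)
    return lines, postcode

# Made-up example row: Address1 empty, postcode sitting in Address5
row = ["", "12 Example Street", "", "Exampletown", "ab1 2cd", "", ""]
lines, postcode = split_address(row)
```

For the made-up row above, the street and town come back as the address lines and the postcode is picked out of the fifth field, whatever the column headings claimed.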

So what I do all day is write code that can parse this data into a well-defined format, stick it in a database of some kind and make it available for search.  There’s some, er, oversimplification in that description but that’s about the size of it.
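In the same oversimplified spirit, the "stick it in a database and search it" part might look something like the Python sketch below.  Everything here is an assumption for illustration: SQLite stands in for "a database of some kind", the table and column names are invented, and the records are made up.

```python
import sqlite3

def build_index(records):
    """Load (surname, forename, address, postcode) tuples into an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE voters (surname TEXT, forename TEXT, address TEXT, postcode TEXT)"
    )
    conn.executemany("INSERT INTO voters VALUES (?, ?, ?, ?)", records)
    conn.commit()
    return conn

def search_by_surname(conn, surname):
    """Case-insensitive exact match on surname, ordered by forename."""
    cursor = conn.execute(
        "SELECT forename, surname, postcode FROM voters "
        "WHERE surname = ? COLLATE NOCASE ORDER BY forename",
        (surname,),
    )
    return cursor.fetchall()

# Made-up records, already parsed into the well-defined format
records = [
    ("Example", "Alice", "12 Example Street, Exampletown", "AB1 2CD"),
    ("Sample", "Bob", "3 Sample Road, Sampleville", "EF3 4GH"),
    ("EXAMPLE", "Carol", "7 Other Lane, Exampletown", "AB1 9ZZ"),
]
conn = build_index(records)
matches = search_by_surname(conn, "example")
```

Searching for "example" finds both Alice and Carol, even though one register shouted the surname in capitals.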

Hopefully this will make life much, much easier for those who currently curate the data and those who want to search it.

