Tower Hamlets Council payments to suppliers over £500

Open data? Open government? How is that relevant to me?

Simple. It’s your money. If you lost a tenner you would want to know where it went. Same principle applies to local councils. You want to know where the cash goes.

Below you can find links to files that contain details of payments over £500 by Tower Hamlets Council to its suppliers. That’s around 158,000 transactions between 2010 and 2013.

Handy, huh?

Scrubbed data for download (CSV)

(Updated Wednesday 16th April 2014. These datasets will be migrated to OpenSpending in future.)

Notice: These files contain public sector information licensed under the Open Government Licence v2.0.

You can find all the original spreadsheets and Comma Separated Value (CSV) files on the Tower Hamlets Council website. (A word of warning. The original data is held in 23 Excel spreadsheets. And 177 CSV files.)

But how is this (a) legal and (b) possible? Quite simple really.

“The public should be able to hold local councils to account about the services they provide. To do this, people need information about what decisions local councils are taking, and how local councils are spending public money.”

That’s what the Department for Communities and Local Government (DCLG) wants and who am I to argue?

Indeed our very own Council reinforces the fact that we as residents should have a poke around. This is what Tower Hamlets says:

“One of the benefits to the community in publishing this data is that it means that our spending will come under greater scrutiny by local people. We hope it will inform people about what we do, and encourage people to challenge how we spend money. It is your money and we welcome comments.”

So in December 2013 I thought I would have a look at the data Tower Hamlets Council published and see if there was anything interesting in it.

And I am still wondering.

Because, as with many things in life, having a look at this ‘open data’ was not quite as straightforward as it could have been.

Scrub a dub dub

Those with a leaning towards geekishness will know all too well what I mean when I say I spent a huge amount of time ‘data scrubbing’.

For those without pen pocket protectors, data scrubbing means taking raw data (the stuff in the Council’s files), going through every cell in every file of all 119,000+ records, and getting it into a basic, clean format that is usable.

Sometimes you can write programs to do this, other times you use Excel, sometimes it comes down to a very boring manual task. Bottom line is everything needs to be nice and tidy.
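
Where the scrubbing can be scripted, it tends to look something like the minimal Python sketch below. It assumes the raw files are CSV, that the different names for the same column have been catalogued by hand into a lookup table (the `HEADER_MAP` entries are hypothetical examples, not the Council’s actual headers), and that dates are in UK day/month/year order. Amounts are parsed as `Decimal` rather than float so pennies do not pick up rounding errors.

```python
import csv
import io
from datetime import datetime
from decimal import Decimal

# Hypothetical header variants, each mapped to one canonical column name.
HEADER_MAP = {
    "supplier name": "Supplier",
    "supplier": "Supplier",
    "amount": "Net Amount",
    "net amount": "Net Amount",
    "date paid": "Payment Date",
    "payment date": "Payment Date",
}

def canonical(header):
    """Collapse whitespace, lower-case, then look up the canonical column name."""
    key = " ".join(header.split()).lower()
    return HEADER_MAP.get(key, header.strip())

def parse_amount(text):
    """Turn '£3,420' or ' 3420 ' into Decimal('3420')."""
    return Decimal(text.replace("£", "").replace(",", "").strip())

def parse_date(text):
    """Turn a UK-style '24/12/2012' into ISO '2012-12-24', which databases import cleanly."""
    return datetime.strptime(text.strip(), "%d/%m/%Y").date().isoformat()

def scrub(csv_text):
    """Yield cleaned rows from one raw CSV file, skipping blank lines."""
    reader = csv.reader(io.StringIO(csv_text))
    headers = [canonical(h) for h in next(reader)]
    for row in reader:
        if not any(cell.strip() for cell in row):
            continue  # blank lines are common in spreadsheet exports
        record = {h: cell.strip() for h, cell in zip(headers, row)}
        if "Net Amount" in record:
            record["Net Amount"] = parse_amount(record["Net Amount"])
        if "Payment Date" in record:
            record["Payment Date"] = parse_date(record["Payment Date"])
        yield record
```

For example, the raw line `OSMANI TRUST,"£3,420",24/12/2012` under the headers `Supplier Name ,Amount,Date Paid` comes out as one tidy record with the supplier, `Decimal('3420')` and `2012-12-24`, and a trailing line of empty cells is dropped. The boring manual part is building `HEADER_MAP` in the first place.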

That’s why you will find links to only seven files on this web page compared to 200 on the Tower Hamlets Council site.

The only thing I have added to the files is a unique ID number. I would have used the transaction number, but there seems to be some duplication in the original data.
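
Since the transaction number cannot be trusted as a key, the ID has to come from somewhere else. One way to do it, sketched below on the assumption that each row carries a `Transaction Number` field (the function name and the `Id` field are this example’s own inventions), is to number the rows sequentially and list any repeated transaction numbers for a closer look.

```python
from collections import Counter

def add_unique_ids(rows, start=1):
    """Return copies of rows with a sequential 'Id', plus the transaction numbers seen more than once."""
    counts = Counter(row["Transaction Number"] for row in rows)
    duplicates = sorted(tn for tn, n in counts.items() if n > 1)
    # dict(row, Id=i) makes a copy, so the caller's rows are left untouched.
    numbered = [dict(row, Id=i) for i, row in enumerate(rows, start=start)]
    return numbered, duplicates

# Made-up transaction numbers, purely for illustration.
numbered, duplicates = add_unique_ids(
    [{"Transaction Number": "A1"}, {"Transaction Number": "A1"}, {"Transaction Number": "B2"}]
)
print(duplicates)  # ['A1']
```

Listing the duplicates rather than silently dropping them matters here: two payments can legitimately share a number, or one payment can have been exported twice, and only a look at the rows tells you which.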

Fido! Breakfast time!

In fact the original data is a mess. The proverbial dog’s breakfast.

Different fields. Different names for fields. Varying file names. No consistency in file names. Data values transposed. Descriptions truncated. Nasty.

A less charitable person than myself might suspect that Tower Hamlets Council was trying to conceal something. But obviously that is not the case. I mean, what on earth would they have to hide?

I think the exceptionally poor quality of the original 200 data files is just down to attitude.

The Council might publicly state that they “encourage people to challenge how we spend money”, but my experience of trying to do just that indicates the opposite.

Before I started to write this I was going to sum this up as Tower Hamlets Council only abiding by the spirit of open data. But that is not correct.

Just dumping a couple of hundred files onto a website complies with neither the spirit nor the letter of the guidelines to local councils for publishing data.

Yes, Tower Hamlets have published details of all spending to suppliers over £500. But the only people who can make use of it are people like me who have the experience, skills and lack of a social life to spend the time sorting the data out and turning it into information.

And that is why these files are published here.

Spot the problem

Just to add insult to injury, even once the data has been scrubbed and turned into nice clean files that can be imported into a MySQL database, the information in them is somewhat opaque.

A random example of Tower Hamlets spending data

Directorate: Communities & Localities
Service: Cultural and Related Services
Division: Culture & Heritage
Responsible Unit: Community Services Grants
Expense Type: Voluntary Associations
Payment Date: 24/12/2012
Transaction Number: 5990563
Net Amount: £3420
Supplier: OSMANI TRUST

Spot the problem with this? The transaction record tells you everything apart from what the Osmani Trust was paid £3,420 for. Eggs? Rockets? Squirrel hats? Who knows. Not us.

We can at least see who has been paid. We do not know what they have been paid for.

The Council must know. But for some reason it has decided not to reveal this. Remember that all this data is just a report from some big and nasty computer system in the Town Hall that spends all its time just tracking money.

So it should know what its suppliers have been paid for.

But this record – and the other 118,999 records published – does not have the fundamental information Tower Hamlets residents need.

What does Tower Hamlets Council spend our money on? We still don’t know. It seems that other councils also publish spending transactions only to this level of detail, which is not much use.

So get challenging people!

Of course having converted this spending data to an accessible format and published it for you all to enjoy I thought I might do some digging and challenging myself. More on this in future posts.

Fortunately for me I have the Wapping Mole to help me out.

For more information:

This material is Open Data

Mac OS X databases for data journalism

Guess what? If you dive into the murky waters of data journalism you are going to have a lot of data to manage. NSS. 

Excel Mac sucks

For my latest project I have been doing the usual Excel thing, but it’s not much fun. One reason is that Excel Mac sucks big time. I have seriously considered getting a Windows PC for the sole reason that Excel PC is better.

But I haven’t gone down that road yet.

Whatever your preferred initial data scrubbing tool, at some point you are going to end up with lots of data on your Mac that you need to interrogate on a regular basis. So you need a database.

Lots of messy data

My current project uses graph databases for Social Network Analysis (SNA), but way before I get anywhere near nodes and edges I need to consolidate lots of messy data from different sources. So it’s a relational database (RDBMS) first, then off into SNA land with Gephi.
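
The hand-off from relational database to Gephi can be as simple as an aggregate query written out as an edge list. The sketch below uses Python’s built-in `sqlite3` as a stand-in for MySQL, a hypothetical `payments` table and made-up rows; it turns individual payments into weighted directorate-to-supplier edges. As far as I know, Gephi’s spreadsheet import recognises `Source`, `Target` and `Weight` column headers for an edges table.

```python
import csv
import io
import sqlite3

def gephi_edges_csv(conn):
    """Aggregate payments into weighted Directorate -> Supplier edges, returned as CSV text."""
    rows = conn.execute(
        "SELECT directorate, supplier, SUM(amount) FROM payments "
        "GROUP BY directorate, supplier ORDER BY directorate, supplier"
    ).fetchall()
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])  # headers Gephi's edge import looks for
    writer.writerows(rows)
    return out.getvalue()

# Hypothetical table and made-up rows; in practice this would be the consolidated MySQL data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payments (directorate TEXT, supplier TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO payments VALUES (?, ?, ?)",
    [("Directorate X", "Supplier A", 100.0),
     ("Directorate X", "Supplier A", 50.0),
     ("Directorate Y", "Supplier A", 25.0)],
)
print(gephi_edges_csv(conn))
```

Doing the `GROUP BY` in the database keeps the graph small: Gephi gets one edge per directorate and supplier pair rather than one per transaction.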

Again, as a Mac OS X user, your choice of RDBMS is limited.

I do have a copy of Bento sitting on a shelf and that would probably do the job for starters but, in true Mac software fashion, it has been discontinued.

After some DuckDuckGo’ing and a little Googling I have come up with these three candidates.

Not sure which one(s) I will try as yet so will report back. My main requirements for a database are:

      • It works
      • The interface doesn’t look like a kidnapper’s ransom note
      • It works
      • Ideally exports JSON (I know there is an add-on for Sequel Pro for this)
      • You don’t have to have a degree in computing to use it (I have a couple but prefer to keep them for emergencies)
      • Donation-ware (if I use it I will pay for it, if I don’t I won’t), ideally Open Source
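
Until I settle on a tool, JSON export is easy enough to script. A minimal sketch, again using Python’s built-in `sqlite3` as a stand-in for whichever database wins and a hypothetical `payments` table: run a query and emit each row as a JSON object keyed by column name.

```python
import json
import sqlite3

def query_to_json(conn, sql, params=()):
    """Run a query and return its rows as a JSON array of objects keyed by column name."""
    cur = conn.execute(sql, params)
    columns = [d[0] for d in cur.description]  # first item of each description entry is the column name
    return json.dumps([dict(zip(columns, row)) for row in cur.fetchall()], ensure_ascii=False)

# Hypothetical table with one made-up row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payments (id INTEGER, supplier TEXT, amount REAL)")
conn.execute("INSERT INTO payments VALUES (1, 'Supplier A', 150.0)")
print(query_to_json(conn, "SELECT id, supplier, amount FROM payments WHERE amount > ?", (100,)))
# [{"id": 1, "supplier": "Supplier A", "amount": 150.0}]
```

`ensure_ascii=False` keeps characters such as £ readable in the output rather than escaping them.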

Gephi can be hooked up to MySQL so that’s a huge advantage for using this sort of software.

For the time being I am off to Data Scrubbing in Excel Land. I spent Christmas morning there and it wasn’t too bad, to be honest.

Site consolidation – the tech is the least of your worries

One of the tasks I find myself undertaking more frequently is site consolidation.

The typical scenario is that Huge Global Corporation Inc. has developed (and I use that term very loosely) their web presence over a period of years with no strategic plan. Yes, each of their different websites may serve the purposes of a different part of Huge Global Corporation Inc., but none of them connect to each other on even the most basic level. Forget APIs and data flow (big data? yeah right!): send an email via one and someone somewhere is probably printing it out and putting it on a colleague’s desk. I do not exaggerate.

How many sites do we have?

One company I worked with had between 70 and 140 websites. They did not actually know the precise number. Scary, huh?

But let’s say you have something more reasonable like a corporate who has 19 websites. And they want them all to play nice. Even all reflect the corporate brand maybe? (Don’t get me started on digital brand guidelines…)

First thing to do is create what I call a ‘universe map’, a schematic that shows all the existing sites (and apps, and Facebook pages, and Twitter feeds) and how they relate to each other. Oh, and how many different Content Management Systems they all use.

Then if you actually have accurate analytics (again, very unlikely) you can have a look and see where the traffic is flowing within and between the sites.

Business Requirements. Always handy.

Then you can refer back to what Huge Global Corporation Inc. now wants in terms of their business requirements (if they have them).

So you look at what they have and what they want and come up with a pragmatic plan. An information architecture. That, of course, sits on top of the technical architecture. The two will overlap, but remember that your users do not give a stuff what OS a server is running.

But then you hit the real killer of projects.


And so maybe the real skill of an Information Architect is in navigating the internal rivalries of Huge Global Corporation Inc., not the server configurations.