
Without identifiers the goals of Open Data are unachievable

By pure coincidence, the day after posting my previous entry, I saw an interesting tweet from the nice ODI people and the equally nice Thomson Reuters people about linking Open Data with identifiers.

The ODI and Thomson Reuters have written a white paper on this incredibly important issue, which you can find here: Open Data Institute and Thomson Reuters, 2014, Creating Value with Identifiers in an Open Data World, retrieved from thomsonreuters.com/site/data-identifiers/

As Dave Weller, Chief Enterprise Architect for Thomson Reuters, points out in this blog post, the human mind links facts to other facts to put the world in context.

If you know that (Fact A) the person in front of you with an axe raised is (Fact B) your Uncle Joe, and you know that (Fact C) you need logs chopped for the fire, all is well. If (Fact A) the person in front of you with an axe raised is (Fact B) a complete stranger who has just (Fact C) broken into your house, circumstances may not be so peachy. It’s basic logic.

In Open Data Digging Around For Interesting Stuff Land, it is often not easy to know which facts are which, and consequently whether any assumptions based on those facts are valid. Your logic chain may be faulty or just plain broken.

Is Joe Bloggs Acme Widget Limited of 23 Acacia Avenue owned by the same Joe Bloggs who also owns Joe Bloggs Charity Widgets for Africa at 24 Acacia Avenue? The only way to know for certain is to have access to data points that are unique identifiers, so that assumptions rest on facts, not guesses.
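To make that concrete, here is a minimal Python sketch, assuming each published record could carry an owner identifier. The `person_id` field and the `P-0001` style values are hypothetical, not any real register's format. The point is that there are three possible answers: the identifiers agree, the identifiers differ, or there simply isn't enough to know. A matching name on its own only ever gets you "don't know".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Owner:
    """An owner as it might appear in a published record (hypothetical shape)."""
    name: str
    person_id: Optional[str] = None  # hypothetical unique identifier; None if not published


def same_owner(a: Owner, b: Owner) -> Optional[bool]:
    """Return True/False when identifiers settle the question, None when they cannot.

    Names are never compared: a name match proves nothing, so without an
    identifier on both sides the honest answer is None ("unknown").
    """
    if a.person_id is not None and b.person_id is not None:
        return a.person_id == b.person_id
    return None


# Same name, no identifiers: we are guessing.
print(same_owner(Owner("Joe Bloggs"), Owner("Joe Bloggs")))  # None

# Same name, different identifiers: two different people.
print(same_owner(Owner("Joe Bloggs", "P-0001"), Owner("Joe Bloggs", "P-0002")))  # False
```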

For the last year I have encountered this issue on a daily basis while undertaking data journalism work relating to payments by Tower Hamlets Council.

Proof it is worth writing to your MP

At around the same time, an update to the Code of Recommended Practice for Local Authorities on Data Transparency was out for consultation, so I copied my views to Jim Fitzpatrick, my local MP. The text is below.

Email to Jim Fitzpatrick MP 15th January 2014

To: Transparencycode@communities.gsi.gov.uk

Re: Code of Recommended Practice for Local Authorities on Data Transparency

“I work in new media and also run a hyperlocal site, Love Wapping, in Wapping, Tower Hamlets. For various reasons I have spent the last two months working on a data journalism project relating to voluntary sector grant allocations by Tower Hamlets Council. This research is completely dependent on being able to identify the recipients of grants with reasonable accuracy. 100% would be nice, but I live in the real world.

As a result of spending weeks trying to reconcile what Tower Hamlets Council claims to have given in grants against the organisations that are supposed to have received them, I know from experience that there is a gaping hole in the legislation that needs to be fixed. Here it is:

Appendix A, pp. 43-44: Grants to voluntary, community and social enterprise organisations

Information Title

The proposal is that ‘information which must be published’ includes (among other things) the beneficiary of the grant. Well, fine. But – and this is the flaw – information ‘recommended’ for publication (my emphasis) includes providers’ registration numbers where the provider is from the voluntary or community sector.

‘Recommended’ is wrong. And this is why.

I have spent the last month working through the different organisations that Tower Hamlets Council have made grants to, and I know that if grants or any other monies are given to organisations and the details of those organisations do not include the organisation’s registration number (Company, Charity, Mutual, etc.), it is impossible to map a grant accurately to an organisation.

Additionally, I have found at least one instance where two organisations with exactly the same name operate in the same part of Tower Hamlets doing similar work. And without the registration number of the grant’s beneficiary, it’s a little tricky to work out which one got the cash.

So all I am saying is that the information which must be published has to include the provider’s registration number. Without this requirement the legislation will be toothless.

Many charities have a corporate identity as well, so without the registration number of the organisation that the funds were given to how is it possible to work out where the money has gone?”

Jim forwarded my views to Brandon Lewis MP, Minister for Local Authorities at the Department for Communities and Local Government (DCLG).

I have to admit that until writing this blog post I had not checked the resulting update to the Code of Recommended Practice for Local Authorities on Data Transparency. Sorry!

So I thought I should. And I was delighted to find this on page 15, in the section on grants to voluntary, community and social enterprise organisations:

33. For each identified grant, the following information must be published as a minimum:

  • date the grant was awarded
  • time period for which the grant has been given
  • local authority department which awarded the grant
  • beneficiary
  • beneficiary’s registration number 
  • summary of the purpose of the grant, and
  • amount
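As a rough sketch of what that list means for anyone consuming the data, the Python below models one grant line with the minimum fields and flags any that are blank. The field names and example values are my own hypothetical choices; the Code lists the items to publish, it doesn't prescribe a file format or column names.

```python
from dataclasses import dataclass, fields
from datetime import date
from decimal import Decimal
from typing import List, Optional


@dataclass
class GrantRecord:
    # Hypothetical field names, one per item in the Code's minimum list.
    date_awarded: Optional[date]
    period: Optional[str]                           # time period the grant covers
    department: Optional[str]                       # awarding local authority department
    beneficiary: Optional[str]
    beneficiary_registration_number: Optional[str]
    purpose: Optional[str]                          # summary of the purpose of the grant
    amount: Optional[Decimal]


def missing_fields(record: GrantRecord) -> List[str]:
    """Return the names of minimum fields that are absent (None) or blank strings."""
    missing = []
    for f in fields(record):
        value = getattr(record, f.name)
        if value is None or (isinstance(value, str) and not value.strip()):
            missing.append(f.name)
    return missing


# Made-up example line with the registration number left blank.
grant = GrantRecord(
    date_awarded=date(2014, 4, 1),
    period="12 months",
    department="Example Department",
    beneficiary="Example Community Group",
    beneficiary_registration_number="",
    purpose="Youth activities",
    amount=Decimal("5000.00"),
)
print(missing_fields(grant))  # ['beneficiary_registration_number']
```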

Woo hoo! Life just got a lot easier! Thanks Jim and Brandon! Of course this change no doubt had nothing whatsoever to do with my input, but I don’t care: it’s there.

Spot the difference

Now it is possible to tell the difference between a charity that gets a grant and a company with the same name that gets a grant. What fun!
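Here is a small Python sketch of why this matters when you add up the money, using made-up grant lines. The register labels "charity" and "company", the name and the numbers are all hypothetical. The charity and the company deliberately share the same digits (a contrived case) to show that the register a number comes from has to be part of the key, not just the number.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class OrgId(NamedTuple):
    scheme: str  # which register issued the number (illustrative labels only)
    number: str  # registration number within that register


# Hypothetical grant lines: (beneficiary name, identifier, amount in pounds).
grants: List[Tuple[str, OrgId, int]] = [
    ("Wapping Widgets", OrgId("charity", "1234567"), 2500),
    ("Wapping Widgets", OrgId("company", "1234567"), 4000),
    ("Wapping Widgets", OrgId("charity", "1234567"), 1000),
]


def totals_by_name(rows: List[Tuple[str, OrgId, int]]) -> Dict[str, int]:
    """Sum grants per beneficiary name: all you can do without identifiers."""
    out: Dict[str, int] = defaultdict(int)
    for name, _, amount in rows:
        out[name] += amount
    return dict(out)


def totals_by_id(rows: List[Tuple[str, OrgId, int]]) -> Dict[OrgId, int]:
    """Sum grants per (register, number) pair."""
    out: Dict[OrgId, int] = defaultdict(int)
    for _, org_id, amount in rows:
        out[org_id] += amount
    return dict(out)


print(totals_by_name(grants))  # {'Wapping Widgets': 7500}
print(totals_by_id(grants))
```

Summing by name silently lumps two organisations together into one £7,500 pot; summing by (register, number) keeps the charity's £3,500 and the company's £4,000 apart.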

But this is no-brainer stuff. The ODI / Thomson Reuters work promises a much more powerful system, another step towards a semantic web. However, it ain’t going to be easy.

The white paper addresses eight challenges for Open Data that need to be overcome for an open system of linked identifiers to work.

  1. Data is ungrounded
  2. Lack of reconciliation options
  3. Lack of identifier scheme documentation
  4. Proprietary identifier schemes
  5. Rationalising multiple identities
  6. Inability to resolve identifiers
  7. Fragile identifiers
  8. Identifier recycling and evolution
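To get a feel for a few of these (challenges 5 to 8, in part), here is a toy Python registry. It is my own illustration, not anything proposed in the white paper: identifiers can be resolved, a duplicate identity can be merged into another without breaking links that still use the old identifier, and an identifier once issued can never be handed out again. The class and the `ORG-` identifiers are hypothetical.

```python
from typing import Dict, Optional, Set


class IdentifierRegistry:
    """Toy identifier registry (hypothetical, for illustration only)."""

    def __init__(self) -> None:
        self._live: Dict[str, str] = {}       # live identifier -> entity name
        self._redirects: Dict[str, str] = {}  # merged-away identifier -> identifier it points to
        self._issued: Set[str] = set()        # every identifier ever issued

    def register(self, ident: str, name: str) -> None:
        # Challenge 8: refuse to recycle an identifier that has ever been used.
        if ident in self._issued:
            raise ValueError(f"{ident} was issued before and cannot be reused")
        self._issued.add(ident)
        self._live[ident] = name

    def merge(self, duplicate: str, canonical: str) -> None:
        # Challenge 5: two identifiers turn out to denote one entity.
        if duplicate == canonical:
            raise ValueError("cannot merge an identifier into itself")
        if duplicate not in self._live or canonical not in self._live:
            raise KeyError("both identifiers must be registered and live")
        del self._live[duplicate]
        # Challenge 7: the old identifier keeps working via a redirect.
        self._redirects[duplicate] = canonical

    def resolve(self, ident: str) -> Optional[str]:
        # Challenge 6: follow redirects to the live identifier, or None if unknown.
        # Redirects only ever point at identifiers that were live when merged,
        # so the chain cannot loop.
        while ident in self._redirects:
            ident = self._redirects[ident]
        return ident if ident in self._live else None


reg = IdentifierRegistry()
reg.register("ORG-1", "Joe Bloggs Charity Widgets for Africa")
reg.register("ORG-2", "Joe Bloggs Charity Widgets for Africa (duplicate record)")
reg.merge("ORG-2", "ORG-1")
print(reg.resolve("ORG-2"))  # ORG-1
print(reg.resolve("ORG-9"))  # None
```

Real schemes have to deal with far messier cases (splits, renames, identifiers that evolve), but even this toy shows why an identifier needs a keeper who promises it will resolve and will never mean something else.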

The good news is that, as Thomson Reuters is a big beastie in Data Land, they have done lots of interesting work on this.

Calais? Mon Dieu!

Part of this work is something called OpenCalais. This is a web service that automatically creates rich semantic metadata for submitted content.

OpenCalais schematic

Oh yes. Me likey! And to be honest, what I really like about this is that the named entities include people. Now that could be very powerful indeed.

The OpenCalais site is a bit dull and unloved, but there is an OpenCalais WordPress plugin (not updated for a year and well buggy on current WP installs) and a Drupal module too.

Bottom line with linked identities: you need legislation to force people like Local Authorities to undertake even the most basic of ID tagging, people like ODI / Thomson Reuters to take the lead, some decent software, and Open Data enthusiasts to see what they can do.

Because without identifiers the goals of Open Data are unachievable. Simple as that. We might as well all go home and have a nice cup of tea instead.