
dc.js + crossfilter.js + d3.js = huh? Part II

When I first started exploring dc.js and crossfilter I was more than a little baffled about why I needed both, and then realised that dc.js has native support for crossfilter. Doh!

I then found this great Hacker News discussion about how dc.js, crossfilter.js and d3.js relate to each other. Below are a few quotes but you really should read the whole thing.

Love the paintbrush description of D3. I only realised this after spending an awfully long time coding a bar chart in D3… Ooops! But I was learning D3 at my kitchen table – that’s my excuse and I am sticking to it.

“dc.js is the ‘glue’ that holds d3 and crossfilter together. So I can create a crossfilter, generate multiple dimensions, group those dimensions, then render multiple charts.”

“D3 is like a paintbrush — you can make anything with it if you’re DaVinci, but it’s a very low-level tool so you need to be a master if you want to make anything that’s not my drippy kindergarten giraffe drawing.”

“The benefits of crossfilter or dc.js over plain d3.js is the layer of abstraction making it easier to use.”

“Crossfilter seems really cool – but since it’s another library, what is it that Dc is offering?”

“dc.js sits on top of D3 and provides glue that between multiple D3 charts and crossfilter”

“dc.js marries crossfilter.js with d3.js — that’s it in a nutshell.”

Does any of that make any sense?

No? Well it should come as no surprise to anyone who has recently learned D3 that the best explanation comes from the great D3 Noob resource.

This is the D3 Noob explanation, which I think is the best explanation of the dc.js + crossfilter.js + d3.js thing that I have read:

“…crossfilter isn’t a library that’s designed to draw graphs. It’s designed to manipulate data. D3.js is a library that’s designed to manipulate graphical objects (and more) on a web page. The two of them will work really well together, but the barrier to getting data onto a web page can be slightly daunting because the combination of two non-trivial technologies can be difficult to achieve.

This is where dc.js comes in. It was developed by Nick Qi Zhu and the first version was released on the 7th of July 2012.

Dc.js is designed to be an enabler for both libraries. Taking the power of crossfilter’s data manipulation capabilities and integrating the graphical capabilities of d3.js.”
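To make the "manipulate data" half of that a bit more concrete, here is a rough sketch in plain JavaScript of the idea of dimensions, groups and filters. It is not the real crossfilter or dc.js API: `makeIndex`, `dimension`, `group` and `filter` are hypothetical helpers I have written just for this illustration, and the payments data is made up.

```javascript
// Plain-JavaScript stand-in for the idea behind crossfilter + dc.js.
// NOT the real crossfilter API: makeIndex, dimension, group and filter
// are hypothetical helpers written only for this illustration.
function makeIndex(records) {
  const filters = new Map(); // dimension id -> predicate
  let nextId = 0;

  function dimension(accessor) {
    const id = nextId++;
    return {
      // Keep only records whose key equals `value`; null clears the filter.
      filter(value) {
        if (value === null) filters.delete(id);
        else filters.set(id, (r) => accessor(r) === value);
      },
      // Sum `valueOf` per key, respecting filters set on OTHER dimensions.
      group(valueOf) {
        return {
          all() {
            const totals = new Map();
            for (const r of records) {
              let keep = true;
              for (const [otherId, test] of filters) {
                if (otherId !== id && !test(r)) keep = false;
              }
              if (!keep) continue;
              const key = accessor(r);
              totals.set(key, (totals.get(key) || 0) + valueOf(r));
            }
            return [...totals].map(([key, value]) => ({ key, value }));
          },
        };
      },
    };
  }
  return { dimension };
}

// Made-up sample data.
const payments = [
  { year: 2013, dept: "Housing", amount: 100 },
  { year: 2013, dept: "Youth", amount: 50 },
  { year: 2014, dept: "Housing", amount: 70 },
];

const index = makeIndex(payments);
const byYear = index.dimension((d) => d.year);
const byDept = index.dimension((d) => d.dept);
const totalByYear = byYear.group((d) => d.amount);
const totalByDept = byDept.group((d) => d.amount);

byYear.filter(2013); // pretend someone clicked the 2013 bar on a year chart
const deptTotalsFor2013 = totalByDept.all(); // only 2013 payments counted
const yearTotals = totalByYear.all(); // the year chart still shows every year
```

The point of the sketch is that each "chart" just re-reads its own group after a filter changes somewhere else. As I understand it, that is the bookkeeping crossfilter does for you, and dc.js is what wires those groups up to D3 charts.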

Better? Good. Next thing is to have a look at this Bare bones structure for a dc.js and crossfilter page then read this excellent explanation of Crossfilter, dc.js and d3.js for Data Discovery and then read this Introduction to dc.js.

There are times when I wonder how I learned coding before the existence of the interwebs and people who share their knowledge so freely.

Then I remember it was my friends on my Artificial Intelligence course who helped me get my head around Prolog’s tail end recursion.


Without identifiers the goals of Open Data are unachievable

By pure coincidence the day after posting my previous entry I saw an interesting tweet from the nice ODI people and the equally nice Thomson Reuters people about linking Open Data with identifiers.

The ODI and Thomson Reuters have written a white paper about this incredibly important issue, which you can find here: Open Data Institute and Thomson Reuters, 2014, Creating Value with Identifiers in an Open Data World, retrieved from

As Dave Weller, Chief Enterprise Architect for Thomson Reuters, points out in this blog post, the human mind links facts to other facts to put the world in context.

If you know (Fact A) the person in front of you with an axe raised is (Fact B) your Uncle Joe and you know (Fact C) you need logs chopped for the fire, all is well. If (Fact A) the person in front of you with an axe raised is (Fact B) a complete stranger who has (Fact C) just broken into your house, circumstances may not be so peachy. It’s basic logic.

In Open Data Digging Around For Interesting Stuff Land it is often not easy to know which facts are which, and consequently how valid any assumptions based on those facts are. Your logic chain may be faulty or just plain broken.

Is Joe Bloggs Acme Widget Limited of 23 Acacia Avenue owned by the same Joe Bloggs who also owns Joe Bloggs Charity Widgets for Africa at 24 Acacia Avenue? The only way to know for certain is by having access to data points that are unique identifiers and so allow assumptions to be made on facts, not guesses.

For the last year I have encountered this issue on a daily basis while undertaking data journalism work relating to payments by Tower Hamlets Council.

Proof it is worth writing to your MP

At around the same time an update to the Code of Recommended Practice for Local Authorities on Data Transparency was out for consultation so I copied my views to Jim Fitzpatrick who is my local MP. Text below.

Email to Jim Fitzpatrick MP 15th January 2014


Re: Code of Recommended Practice for Local Authorities on Data Transparency

“I work in new media and also run a hyperlocal site, Love Wapping, in Wapping, Tower Hamlets. For various reasons I have spent the last two months working on a data journalism project relating to voluntary sector grant allocations by Tower Hamlets Council. This research is completely dependent on being able to identify the recipients of grants with reasonable accuracy. 100% would be nice but I live in the real world.

As a result of spending weeks trying to reconcile what Tower Hamlets Council claims to have given in grants against the organisations that are supposed to be the recipients of grants I know from experience that there is a gaping hole in the legislation that needs to be fixed. Here it is:

Appendix A p43-44 Grants to voluntary, community and social enterprise organisations

Information Title

The proposal is that ‘information which must be published’ includes (among other things) the beneficiary of the grant. Well fine. But – and this is the flaw – information recommended for publication (my emphasis) includes providers’ registration numbers where the provider is from the voluntary or community sector.

‘Recommended’ is wrong. And this is why.

I have spent the last month working through the different organisations that Tower Hamlets Council have made grants to and I know that if grants or any other monies are given to organisations and the details of those organisations do not include the organisation’s registration number (Company, Charity, Mutual, etc.) it is impossible to accurately map a grant to an organisation.

Additionally I have found at least one instance where two organisations with exactly the same name operate in the same part of Tower Hamlets doing similar work. And without the registration number of the beneficiary of the grant it’s a little tricky to work out which one got the cash.

So all I am saying is that information which must be published has to include the provider’s registration number. Without this requirement the legislation will be toothless.

Many charities have a corporate identity as well, so without the registration number of the organisation that the funds were given to how is it possible to work out where the money has gone?”

Jim forwarded my views to Brandon Lewis MP, Minister for Local Authorities at the Department for Communities and Local Government (DCLG).

I have to admit that until writing this blog post I had not checked the resulting update to the Code of Recommended Practice for Local Authorities on Data Transparency. Sorry!

So I thought I should. And I was delighted to find this on page 15 relating to grants to voluntary, community and social enterprise organisations.

33. For each identified grant, the following information must be published as a minimum:

  • date the grant was awarded
  • time period for which the grant has been given
  • local authority department which awarded the grant
  • beneficiary
  • beneficiary’s registration number 
  • summary of the purpose of the grant, and
  • amount

Woo hoo! Life just got a lot easier! Thanks Jim and Brandon! Of course this change no doubt had nothing whatsoever to do with my input, but I don’t care, it’s there.

Spot the difference

Now it is possible to tell the difference between a charity that gets a grant and a company with the same name that gets a grant. What fun!
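Here is a tiny sketch of why that matters when you are matching grants to organisations. Everything in it is made up for illustration: the organisation name, the `CO-`/`CH-` registration number format, the field names and the `candidatesFor` helper are all my own assumptions, not anything from the Code or from the council's data.

```javascript
// Made-up register entries: two different organisations share one name.
const organisations = [
  { regNo: "CO-0001", type: "company", name: "Joe Bloggs Widgets" },
  { regNo: "CH-0002", type: "charity", name: "Joe Bloggs Widgets" },
];

// Made-up grant records, one with a registration number and one without.
const grants = [
  { beneficiary: "Joe Bloggs Widgets", regNo: "CH-0002", amount: 5000 },
  { beneficiary: "Joe Bloggs Widgets", regNo: null, amount: 3000 },
];

// Hypothetical helper: return every organisation a grant could refer to.
function candidatesFor(grant, orgs) {
  if (grant.regNo) {
    // A unique identifier pins the grant to exactly one organisation.
    return orgs.filter((o) => o.regNo === grant.regNo);
  }
  // With only a name to go on, every namesake is a candidate.
  return orgs.filter((o) => o.name === grant.beneficiary);
}

// Number of possible recipients for each grant.
const matches = grants.map((g) => candidatesFor(g, organisations).length);
```

The grant with a registration number matches one organisation. The grant without one matches both the company and the charity, so you are back to guessing.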

But this is no-brainer stuff. The ODI / Thomson Reuters work promises a much more powerful system, another step towards a semantic web. However it ain’t going to be easy.

The white paper addresses eight challenges for Open Data that need to be overcome for an open system of linked identifiers to work.

  1. Data is ungrounded
  2. Lack of reconciliation options
  3. Lack of identifier scheme documentation
  4. Proprietary identifier schemes
  5. Rationalising multiple identities
  6. Inability to resolve identifiers
  7. Fragile identifiers
  8. Identifier recycling and evolution

The good news is that as Thomson Reuters is a big beastie in Data Land they have done lots of interesting work on this.

Calais? Mon Dieu!

Part of this work is something called OpenCalais. This is a web service that automatically creates rich semantic metadata for submitted content.

OpenCalais schematic

Oh yes. Me likey! And to be honest what I really like about this is that named entities include people. Now that could be very powerful indeed.

The OpenCalais site is a bit dull and unloved but there is an OpenCalais WordPress plugin (not updated for a year and well buggy on current WP installs) and a Drupal module too.

Bottom line with linked identities is that you need legislation to force people like Local Authorities to undertake even the most basic of ID tagging, people like ODI / Thomson Reuters to take the lead, some decent software and Open Data enthusiasts to see what they can do.

Because without identifiers the goals of Open Data are unachievable. Simple as that. We might as well all go home and have a nice cup of tea instead.