Tired and generally worn out. But I have another data journalism project that I need to get done that might crack open all sorts of things.
To be honest it is more a continuation of previous work that I have been undertaking over the last 18 months or so. Might lead somewhere, might not. Either way I need to do some basic data analysis to see if what I have been told is true. If it is then BINGO. Maybe even double BINGO.

With all this talk of data driven this and data driven the other it’s sometimes easy to forget that the best information is almost always what someone tells you. The funny thing is that what might seem a minor occurrence to them can be a significant lead for investigative work.

I have had the Excel files sitting on my hard drive for a couple of weeks but life has been a bit busy. First thing – convert the Excel files with multiple worksheets into individual CSV files.
I use a file conversion service at Zamzar.com to do this.

Not any more I don’t – two Excel files I uploaded came back empty. Oops. Probably a fault in the files, not the Zamzar service, but I am not taking any chances at the moment.

This Excel VBA macro seems to do the job. Code below:
Sub SaveSheetsAsCSV()
    ' Save each worksheet in the active workbook as its own CSV file
    Dim wSheet As Worksheet
    Dim csvFile As String
    For Each wSheet In Worksheets
        On Error Resume Next
        ' Activate the sheet so SaveAs exports this sheet's data
        wSheet.Activate
        csvFile = CurDir & "\" & wSheet.Name & ".csv"
        ActiveWorkbook.SaveAs Filename:=csvFile, _
            FileFormat:=xlCSV, CreateBackup:=False
        ActiveWorkbook.Saved = True
    Next wSheet
End Sub

Oh joy. Data scrubbing.
I have tried to document a data journalism story before and failed miserably. This time I hope to be able to keep a little diary.
It is rare to get a press release that has everything needed to tell a story, let alone a data story.
A few weeks back I noticed an interesting data story from Barnardo’s UK
in a local newspaper in sarf lunnun (or South London if you don’t speak the East End patois).

Getting in touch with Dan at Barnardo’s I soon had a press release with real facts in it, a cracking model-released image and some interesting data. Even data references! How unusual is that?

Below is the result of my work.
The image was a little too tight for my liking so I had to use Content-Aware Fill in Photoshop and more than a little retouching to give myself a few more valuable pixels on the right-hand side. At the same time I wanted the typography and data to intrude onto the boy’s image.

I tried various styles of bar chart, most of them very slight variations on the finished result, the main option being a yellow key line around the ‘81 families in Tower Hamlets’ bar.

Barnardo’s liked what I had done and it helped to get the reality of life behind the statistics out there. Here is the story on my hyperlocal site Love Wapping.
Doing data journalism work means I have a common problem popping up most days.
Yet more weird (or not so weird) and wonderful (or not so wonderful) CSV files need to be imported into MySQL for analysis, and that means yet another table. So more time spent working out – aka guessing – the optimal values for the fields. After all we don’t want a nasty fat sluggish database do we? No.

I get annoyed having to do anything a machine can do so much better than me, so I had a quick Google and found this very handy bit of information – Determining the optimal Column Widths from a .csv File – over on DBA StackExchange courtesy of Cristian Porta. Thanks Cristian!

Running PROCEDURE ANALYSE() works a treat (see below) although I found that in MySQL Workbench I needed to follow these simple instructions to get the right result. In my version (Community 6.0) you need to look in the ‘SQL Queries’ tab in Preferences, not SQL Editor.
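For reference, the query itself is just a normal SELECT with PROCEDURE ANALYSE() appended – the table name here is made up for illustration, swap in whichever table you imported the CSV into:

```sql
-- Ask MySQL to suggest the tightest type for every column
-- ('imported_csv' is a hypothetical table name)
SELECT * FROM imported_csv PROCEDURE ANALYSE();
```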
| CHAR(7) NOT NULL
| ENUM('10','30','50') NOT NULL
| MEDIUMINT(6) UNSIGNED NOT NULL
| MEDIUMINT(6) UNSIGNED NOT NULL
| ENUM('E92000001') NOT NULL
| ENUM('E19000003') NOT NULL
| ENUM('E18000007') NOT NULL
| CHAR(0) NOT NULL
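If you ever want a rough second opinion without going near MySQL, the same idea – scan every value in a column and report the tightest type that fits – is easy to sketch in Python with just the standard library. This is my own simplification, not what PROCEDURE ANALYSE() actually does internally, and the sample rows are made up for illustration:

```python
import csv
import io

def suggest_column_types(rows):
    """Suggest a tight MySQL-ish type for each column, PROCEDURE ANALYSE() style."""
    header, *data = rows
    suggestions = []
    for i, name in enumerate(header):
        values = [row[i] for row in data]
        max_len = max((len(v) for v in values), default=0)
        if values and all(v.isdigit() for v in values):
            # All-numeric column: an unsigned integer type will do
            suggestions.append((name, f"INT({max_len}) UNSIGNED NOT NULL"))
        elif len(set(values)) <= 3:
            # Only a handful of distinct values: ENUM is a compact fit
            opts = ",".join(f"'{v}'" for v in sorted(set(values)))
            suggestions.append((name, f"ENUM({opts}) NOT NULL"))
        else:
            # Fall back to a fixed-width character column
            suggestions.append((name, f"CHAR({max_len}) NOT NULL"))
    return suggestions

# Hypothetical sample standing in for one of the converted worksheets
sample = io.StringIO("ward_code,count\nE92000001,10\nE92000001,30\nE92000001,50\n")
rows = list(csv.reader(sample))
for name, col_type in suggest_column_types(rows):
    print(name, col_type)
```

Crude, but handy as a sanity check against whatever the database suggests.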
Anyway it works.
Which means I can now get on with the real business of analysing the data.