Tag Archives: configuration

How to work out best field types from a .csv file for MySQL

Doing data journalism work means I have a common problem popping up most days.

Yet more weird (or not so weird) and wonderful (or not so wonderful) CSV files need to be imported into MySQL  for analysis and that means yet another table.

So more time spent working out – aka guessing – the optimal values for the fields. After all we don’t want a nasty fat sluggish database do we? No.

I get annoyed having to do anything a machine can do so much better than me so had a quick Google and found this very handy bit of information – Determining the optimal Column Widths from a .csv File – over on DBA StackExchange courtesy of Cristian Porta.

Thanks Cristian!

Running PROCEDURE ANALYSE() works a treat (see below) although I found that in MySQLWorkbench I needed to follow these simple instructions to get the right result. In my version (Community 6.0) you need to look in the ‘SQL Queries’ tab in Preferences not SQL Editor.

Optimal_fieldtype
CHAR(7) NOT NULL
ENUM(’10’,’30’,’50’) NOT NULL
MEDIUMINT(6) UNSIGNED NOT NULL
MEDIUMINT(6) UNSIGNED NOT NULL
ENUM(‘E92000001’) NOT NULL
ENUM(‘E19000003’) NOT NULL
ENUM(‘E18000007’) NOT NULL
CHAR(0) NOT NULL

Anyway it works.

Which means I can now get on with the real business of analysing the data.