Ruby: Parsing CSV files quickly

03 Aug 2006

I seem to have a limitless stream of Excel files coming at me from clients. Most of them share the same format: the first line holds the column names and the rest of the lines are data.

Ruby has excellent baked-in capabilities for handling files, data, and even CSV stuff. Still, it was pretty hard for me to figure out how to turn a CSV file into an array of hashes where each cell was named with the correct column name.

So here y’are folks: an easy way to turn CSV files into an array of hashes.

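    require 'csv'

    # Turn a CSV file whose first row holds column names into an array of hashes.
    def csv_to_array(file_location)
      csv = CSV.parse(File.read(file_location))
      fields = csv.shift  # first row: the column names
      # Pair each column name with the matching cell; .to_s turns empty (nil) cells into "".
      csv.collect { |record| Hash[*(0...fields.length).collect { |index| [fields[index], record[index].to_s] }.flatten] }
    end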
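
In Ruby 1.9 and later, FasterCSV became the standard CSV library, and its `headers: true` option treats the first row as column names for you. The sketch below assumes a modern Ruby; the helper name `csv_to_hashes` and the sample data are hypothetical, made up for illustration. One difference from `csv_to_array` above: empty cells come back as `nil` rather than `""`, because there is no `.to_s` step.

```ruby
require 'csv'
require 'tempfile'

# Hypothetical helper: read a CSV file whose first row is the header into an
# array of hashes, using the standard library's headers: option.
def csv_to_hashes(path)
  # CSV.read returns a CSV::Table; each element is a CSV::Row, and
  # CSV::Row#to_h maps header => cell value.
  CSV.read(path, headers: true).map(&:to_h)
end

# Hypothetical sample file, written to a temp file just for this demo.
rows = Tempfile.create(['clients', '.csv']) do |f|
  f.write("name,email\nAlice,alice@example.com\nBob,\n")
  f.flush
  csv_to_hashes(f.path) # Tempfile.create returns the block's value
end

rows.each { |row| puts row.inspect }
```

Bob's empty `email` cell shows up as `nil` here; add `.transform_values(&:to_s)` to each hash if you want the `""` behavior of the original.
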
  • Simen said: Have you had a look at FasterCSV [fastercsv.rubyforge.org]?
  • s. potter said: what makes FasterCSV faster?
  • Jens said: Hi, this is about 50 times slower than reading the CSV file into an array of arrays. Is that possible, or am I missing something obvious? I was using this code before:

        def csvread(filespec)
          out = []  # one array of fields per line
          File.open(filespec, 'r') do |infile|
            while (line = infile.gets)
              out << line.split(',')
            end
          end
          return out
        end

    Using Rails 1.2.1 on OS X 10.4 (MacBook dual 2GHz Core Duo 2). Jens
  • Danger said: I'll bet your code would be loads faster; it looks pretty good (sorry about the poor comment formatting). When I wrote the CSV => array of hashes, I valued the hashing of appropriate data far more than speed. I'll probably use your code when I don't need a hash. Thanks!
  • Caleb Jones said: The problem with simply splitting on ',' is that some fields in a CSV may contain commas themselves. In this case, the convention is to wrap such items in quotes ("). Simply splitting on ',' will break when this occurs. Not sure whether the Ruby CSV class handles this or not (though it would seem silly to write a whole parsing hierarchy for CSV files that failed on something this simple).
  • Danger said: You're right, this is a very basic implementation of CSV parsing. I imagine FasterCSV does a much better job. I threw this up simply because "ruby csv parse" doesn't return any good hits on Google and I wanted to give folks at least *some* way to make it work.
  • Ara Vartanian said: Lovely. Bravo. Just what I needed.
  • Mike said: What's up with the comments? :( May I recommend a book chapter on this post's topic, parsing CSV files quickly in Ruby? Check it out: http://www.springerlink.com/content/j3n125411240r200/
  • Neil Murphy said: The chapter from the book is charged at $25, for four pages. Calling it a ripoff seems a serious understatement.
  • Scott White said: I was able to simplify the Hash creation by using the array zip method:

        csv.collect { |record| Hash[*fields.zip(record).flatten] }

    though it doesn't turn all the records into strings... but I'm not sure why that's necessary anyway.
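
On Caleb's question: Ruby's CSV parser does honor quoted fields, which is exactly where a plain `split(',')` goes wrong. A minimal check, using a made-up line whose middle field contains a comma:

```ruby
require 'csv'

# Hypothetical input line: the second field contains a comma, so it is quoted.
line = %Q{Acme,"Springfield, IL",42}

naive  = line.split(',')       # splits inside the quotes too, leaving stray quote marks
proper = CSV.parse_line(line)  # follows the CSV quoting rules

puts naive.inspect
puts proper.inspect
```

`naive` ends up with four pieces, two of them carrying stray `"` characters, while `proper` has the three intended fields. Note also that `gets`, as used in Jens's loop, keeps the trailing newline, so the last field of each split line would still need a `chomp`.
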

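As for why the original code calls `.to_s`: Ruby's CSV parser returns `nil` for an empty unquoted cell, so `.to_s` is one way to make every value a string. The sketch below combines Scott's `zip` idea with that conversion; it assumes Ruby 2.1+ for `Array#to_h`, and the helper name `csv_string_to_hashes` (which takes CSV text rather than a file path) and the sample data are hypothetical.

```ruby
require 'csv'

# Hypothetical helper: zip-based variant that still turns every value into a String.
def csv_string_to_hashes(text)
  rows = CSV.parse(text)
  fields = rows.shift # first row: the column names
  # zip pairs each column name with its cell; a short record is padded with nil,
  # and .to_s turns those nils (and empty cells) into "".
  rows.map { |record| fields.zip(record).map { |field, value| [field, value.to_s] }.to_h }
end

# Bob has an empty city cell; Carol's row is missing the city column entirely.
data = "name,city\nAlice,Paris\nBob,\nCarol\n"
result = csv_string_to_hashes(data)
result.each { |row| puts row.inspect }
```

Iterating over `fields` rather than `record` keeps the keys fixed: a record with extra cells simply has them dropped, and a short record gets `""` for the missing columns.
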