How not to parse CSV using Java

I’ve only been on StackOverflow for a short while and already feel like I’m drowning under the sheer quantity of people asking how to parse CSV.  Most of them start in one of a few ways.  So, for the record, here’s how you don’t do it.

Don’t use regular expressions

That’s right.  You don’t use regex.  I almost feel like breaking out the <blink> and <marquee> tags to make this stand out more.  Regular expressions are really rather bad for this kind of thing.

String#split() is not your friend

If you’re familiar with Guava you may have come across this quiz in their documentation.  See if you can get it right.

",a,,b,".split(",") returns…

  1. "", "a", "", "b", ""
  2. null, "a", null, "b", null
  3. "a", null, "b"
  4. "a", "b"
  5. None of the above

Good luck with that.

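Ready?  It actually returns "", "a", "", "b", so the answer is 5: String#split() keeps the leading empty string but silently drops the trailing one.  On a side note here, the Guava Splitter class is excellent.  Check it out.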
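If you want to see it for yourself, here is a minimal sketch using only the JDK.  The class and method names (SplitDemo, naiveSplit) and the sample rows are made up for illustration.  It shows the quiz result, the negative limit that keeps trailing empties, and what happens as soon as a quoted field contains a comma.

```java
import java.util.Arrays;

class SplitDemo {
    // Hypothetical helper: the naive approach, splitting on every comma
    // with no awareness of quoting at all.
    static String[] naiveSplit(String line) {
        return line.split(",");
    }

    public static void main(String[] args) {
        // Four elements: the leading empty string survives, the trailing one is dropped.
        System.out.println(Arrays.toString(naiveSplit(",a,,b,")));

        // A negative limit keeps trailing empty strings, giving five elements.
        System.out.println(Arrays.toString(",a,,b,".split(",", -1)));

        // Three CSV fields, but the quoted comma cuts the middle one in two,
        // so this yields four pieces with the quote characters still attached.
        System.out.println(Arrays.toString(naiveSplit("1,\"Smith, John\",London")));
    }
}
```

Even with the limit trick, the last case is still wrong, and that is the one that turns up in real files.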

StringTokenizer isn’t any better

Why?  Because none of these methods deal with escaped characters.  You do know how CSV is escaped, right?

Wrong.  I bet you don’t.  It can be escaped in all sorts of ways.  My favourite is the old quote-a-quote method of putting two double quotes in a row ("").  Brilliant.  Which genius came up with that?
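To make that concrete, here is a rough sketch of StringTokenizer meeting an empty field and then a quote-a-quote field.  TokenizerDemo, tokens and the sample rows are invented for the example; only java.util.StringTokenizer itself is real.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class TokenizerDemo {
    // Hypothetical helper: collect every comma-delimited token from one line.
    static List<String> tokens(String line) {
        List<String> out = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(line, ",");
        while (st.hasMoreTokens()) {
            out.add(st.nextToken());
        }
        return out;
    }

    public static void main(String[] args) {
        // Empty fields vanish completely: only "a" and "b" come back.
        System.out.println(tokens(",a,,b,"));

        // The row is: "He said ""hi, there""",42
        // A CSV parser should see two fields: He said "hi, there" and 42.
        // The tokenizer returns three pieces, quotes and all.
        System.out.println(tokens("\"He said \"\"hi, there\"\"\",42"));
    }
}
```

So you lose columns on one row and gain them on the next, which is a fun thing to debug at 2am.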

But I want to write a regex.  I love writing regex.

Fine.  Write one for this:

Which I just grabbed as the most recent CSV question on StackOverflow.

So what do I do?

The answer is simple.  You don’t re-invent the wheel.  You use a CSV parser that’s been tested to death against all the weird edge cases you’ve not even thought about and has a nice API that reads something like whatever.parse(someCSV);

My favourite is the Jackson CsvMapper.

Parse CSV rows from a file into arrays of Objects like this (taken from the Jackson docs)
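A minimal sketch of that pattern, assuming jackson-dataformat-csv 2.x is on the classpath.  To keep it self-contained it reads from a String rather than a file (readValues also accepts a File) and asks for String[] rows rather than Object[].  JacksonRowsDemo, readRows and the sample data are made-up names for illustration.

```java
import com.fasterxml.jackson.databind.MappingIterator;
import com.fasterxml.jackson.dataformat.csv.CsvMapper;
import com.fasterxml.jackson.dataformat.csv.CsvParser;

import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.List;

class JacksonRowsDemo {
    // Hypothetical helper: read every row of the CSV text as an array of strings.
    static List<String[]> readRows(String csv) {
        CsvMapper mapper = new CsvMapper();
        try (MappingIterator<String[]> it = mapper
                .readerFor(String[].class)
                // Treat the whole document as a sequence of rows, one array per row.
                .with(CsvParser.Feature.WRAP_AS_ARRAY)
                .readValues(csv)) {
            return it.readAll();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Two rows of different lengths, with a quoted comma and doubled quotes.
        String csv = "1,\"Smith, John\",\"said \"\"hi\"\"\"\n2,Jones";
        for (String[] row : readRows(csv)) {
            System.out.println(String.join(" | ", row));
        }
    }
}
```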

Which is really useful for those fun files with variable length rows and random escape characters.

Or you can map rows directly to instances of your own classes.  It’s as simple as:
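Something along these lines, again assuming jackson-dataformat-csv 2.x.  The Person class, its fields and the sample data are hypothetical, and the sketch assumes the first line of the CSV is a header whose column names match the field names.

```java
import com.fasterxml.jackson.databind.MappingIterator;
import com.fasterxml.jackson.dataformat.csv.CsvMapper;
import com.fasterxml.jackson.dataformat.csv.CsvSchema;

import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.List;

class JacksonPojoDemo {
    // Hypothetical row type; public fields keep the example short.
    public static class Person {
        public String name;
        public int age;
    }

    // Hypothetical helper: map each data row onto a Person.
    static List<Person> readPeople(String csv) {
        CsvMapper mapper = new CsvMapper();
        // Take the column names from the first line of the input.
        CsvSchema schema = CsvSchema.emptySchema().withHeader();
        try (MappingIterator<Person> it = mapper
                .readerFor(Person.class)
                .with(schema)
                .readValues(csv)) {
            return it.readAll();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String csv = "name,age\n\"Smith, John\",42\nJones,37";
        for (Person p : readPeople(csv)) {
            System.out.println(p.name + " is " + p.age);
        }
    }
}
```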

There are plenty of other libraries out there to choose from, including the much-loved OpenCSV.  Good luck!

Update: I’ve written a companion piece on writing CSV using Jackson.

Written by Tom

