I’ve only been on StackOverflow for a short while and already feel like I’m drowning under the sheer quantity of people asking how to parse CSV. Most of them start in one of a few ways. So, for the record, here’s how you don’t do it.
Don’t use regular expressions
That’s right. You don’t use regex. I almost feel like breaking out the <blink> and <marquee> tags to make this stand out more. Regular expressions are really rather bad for this kind of thing.
String#split() is not your friend
If you’re familiar with Guava you may have come across this quiz in their documentation. See if you can get it right.
",a,,b,".split(",") returns…
- "", "a", "", "b", ""
- null, "a", null, "b", null
- "a", null, "b"
- "a", "b"
- None of the above
Good luck with that.
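Ready? It actually returns "", "a", "", "b", so the answer is "None of the above": Java keeps the leading empty string but silently drops the trailing one. On a side note, the Guava Splitter class is excellent; check it out.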
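If you want to see the quiz answer for yourself, here is a minimal sketch using only the standard library. The class and method names (SplitQuiz, defaultSplit, keepTrailing) are made up for illustration. It also shows that passing a negative limit keeps the trailing empty strings, which still does nothing about quoting or escaping.

```java
import java.util.Arrays;

class SplitQuiz {
    // Default split: leading empty strings are kept, trailing ones are dropped.
    static String[] defaultSplit(String line) {
        return line.split(",");
    }

    // A negative limit tells split to keep trailing empty strings too.
    static String[] keepTrailing(String line) {
        return line.split(",", -1);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(defaultSplit(",a,,b,"))); // 4 elements
        System.out.println(Arrays.toString(keepTrailing(",a,,b,"))); // 5 elements
    }
}
```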
String tokenizer isn’t any better
Why? Because none of these methods deal with escaped characters. You do know how CSV is escaped, right?
Wrong. I bet you don’t. It can be escaped in all sorts of ways. My favourite is the old quote-a-quote method of putting two double quotes in a row (""). Brilliant. Which genius came up with that?
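To make that concrete, here is a small sketch of what StringTokenizer does with the quiz string and with a doubled-quote field. TokenizerDemo and tokens are hypothetical names, and the sample rows are made up for illustration. The tokenizer swallows empty fields entirely and hands back the quoted field raw, doubled quotes and all.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class TokenizerDemo {
    // Collect every token StringTokenizer produces for a comma-delimited line.
    static List<String> tokens(String line) {
        List<String> out = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(line, ",");
        while (st.hasMoreTokens()) {
            out.add(st.nextToken());
        }
        return out;
    }

    public static void main(String[] args) {
        // Empty fields vanish: only "a" and "b" come back.
        System.out.println(tokens(",a,,b,"));
        // The second field is returned with its surrounding and doubled quotes intact.
        System.out.println(tokens("a,\"say \"\"hi\"\"\""));
    }
}
```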
But I want to write a regex. I love writing regex.
Fine. Write one for this:
    col1,col2
    ABCD,"<p>Its Awesome ,I want to fly..</p>"
Which I just grabbed as the most recent CSV question on StackOverflow.
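For the avoidance of doubt, here is roughly what a naive split does to the data row of that example. NaiveSplit and fields are made-up names for this sketch. The row has two columns, but the comma inside the quoted HTML cuts it into three pieces, with the quotes left dangling.

```java
import java.util.Arrays;

class NaiveSplit {
    // Split on every comma, ignoring quoting entirely.
    static String[] fields(String row) {
        return row.split(",");
    }

    public static void main(String[] args) {
        String row = "ABCD,\"<p>Its Awesome ,I want to fly..</p>\"";
        String[] parts = fields(row);
        // Two columns were intended, but the quoted field is cut in half.
        System.out.println(parts.length);
        System.out.println(Arrays.toString(parts));
    }
}
```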
So what do I do?
The answer is simple. You don’t re-invent the wheel. You use a CSV parser that’s been tested to death against all the weird edge cases you’ve not even thought about and has a nice API that reads something like whatever.parse(someCSV);
My favourite is the Jackson CsvMapper.
Parse CSV rows from a file into arrays of Objects like this (adapted from the Jackson docs):
    CsvMapper mapper = new CsvMapper();
    mapper.enable(CsvParser.Feature.WRAP_AS_ARRAY);
    File csvFile = new File("input.csv"); // or from String, URL etc
    // readerFor(...) replaces the deprecated reader(Class) in Jackson 2.6 and later
    MappingIterator<Object[]> it = mapper.readerFor(Object[].class).readValues(csvFile);
Which is really useful for those fun files with variable length rows and random escape characters.
Or you can map rows directly to instances of your own classes. It’s as simple as:
    public class CSVPerson {
        public String firstname;
        public String lastname;
        // etc
    }

    CsvMapper mapper = new CsvMapper();
    CsvSchema schema = CsvSchema.emptySchema().withHeader().withColumnSeparator(delimiter);
    MappingIterator<CSVPerson> it = mapper.readerFor(CSVPerson.class).with(schema).readValues(input);
    while (it.hasNext()) {
        CSVPerson row = it.next();
    }
There are plenty of other libraries out there to choose from, including the much-loved OpenCSV. Good luck!
Update: I’ve written a companion piece on writing CSV using Jackson.
Can you please give an example using string as well. I would appreciate it. Thanks.
Hi Ruby, I’m not sure what you mean. Could you explain?