Counterpoint - CSV is absolutely atrocious and should be cast into the Sun.
It's unkillable, like many eldritch horrors.
> The specification of CSV holds in its title: "comma separated values". Okay, it's a lie, but still, the specification holds in a tweet and can be explained to anybody in seconds: commas separate values, new lines separate rows. Now quote values containing commas and line breaks, double your quotes, and that's it. This is so simple you might even invent it yourself without knowing it already exists while learning how to program.
Except that's just one way people do it. It's not universal, so you cannot take arbitrary CSV files and parse them like this. You also can't take a CSV file constructed like this and feed it to any CSV-accepting program - many will totally break.
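To make the ambiguity concrete, here's a sketch with Python's stdlib `csv` module and an invented two-column file. Both parses "succeed"; only one means what the author intended, and nothing in the file tells you which.

```python
import csv
import io

# A perfectly ordinary European-style export: semicolon-delimited, decimal commas.
data = "name;score\nAda;3,14\nBob;2,72\n"

# Reader that guessed the right dialect.
print(list(csv.reader(io.StringIO(data), delimiter=";")))
# [['name', 'score'], ['Ada', '3,14'], ['Bob', '2,72']]

# Reader using RFC-ish comma defaults: no error, just quietly wrong columns.
print(list(csv.reader(io.StringIO(data))))
# [['name;score'], ['Ada;3', '14'], ['Bob;2', '72']]
```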
> Of course it does not mean you should not use a dedicated CSV parser/writer because you will mess something up.
Yes, implementers often have.
> No one owns CSV. It has no real specification
Yep. So all these monstrosities in the real world are... maybe valid? Lots of totally broken CSV files still parse as CSV, the result is just wrong. Sometimes subtly.
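For instance (stdlib `csv`, invented data): one unescaped comma in a free-text field and every later column shifts, with no error anywhere.

```python
import csv
import io

data = "name,comment,score\nAda,loves commas, obviously,5\n"
print(list(csv.reader(io.StringIO(data)))[1])
# ['Ada', 'loves commas', ' obviously', '5'] -- four fields where the header
# promises three; plenty of readers will happily carry on with misaligned data.
```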
> This means, by extension, that it can both be read and edited by humans directly, somehow.
One of the very common ways they get completely fucked up, yes. Someone sorts the rows and boom, broken, often with unrecoverable data loss. Someone adds or removes a comma in the wrong place. Someone concatenates two files that use different text encodings.
> CSV can be read row by row very easily without requiring more memory than what is needed to fit a single row.
CSV must be parsed row by row.
> By comparison, column-oriented data formats such as parquet are not able to stream files row by row without requiring you to jump here and there in the file or to buffer the memory cleverly so you don't tank read performance.
Sort of? Yes, if you're building your own parser, but who is doing that? Streaming row by row is also not hard with something like parquet.
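A quick sketch with pyarrow (the file name is hypothetical): streaming a parquet file in row-sized pieces takes a few lines and no cleverness on your side.

```python
import pyarrow.parquet as pq

pf = pq.ParquetFile("events.parquet")
for batch in pf.iter_batches(batch_size=10_000):  # small record batches, not the whole file
    for row in batch.to_pylist():                  # one plain dict per row
        ...                                        # process a single row at a time
```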
> But of course, CSV is terrible if you are only interested in specific columns because you will indeed need to read all of a row only to access the part you are interested in.
Or if you're interested in a specific row, because you have to carefully parse every row before it just to find where yours starts.
CSV does not have a reliable row separator. Or rather it does, but the format also lets that separator appear inside a quoted field, where it doesn't mean "end this row", so you can't simply trust it.
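Illustration with the stdlib `csv` module and invented data: the newline inside the quoted field is data, not a record boundary, so anything that splits on newlines miscounts the rows.

```python
import csv
import io

data = 'id,comment\n1,"line one\nline two"\n2,plain\n'

print(len(data.splitlines()))                    # 4 "rows" if you trust the newline
print(len(list(csv.reader(io.StringIO(data)))))  # 3 actual records (header + 2 rows)
```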
> But critics of CSV coming from this set of practices tend to only care about use-cases where everything is expected to fit into memory.
Parquet uses row groups, which means you can stream the file in chunks easily, and each chunk carries metadata so you can skip rows you don't need.
I much more often need to keep the whole thing in memory when working with CSV than with parquet. With parquet I don't even need to fit the file on disk: I can read just the chunk I want, remotely.
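Rough sketch of what I mean, using pyarrow against a hypothetical S3 object (bucket, path, and column names are made up):

```python
import pyarrow.parquet as pq
from pyarrow import fs

s3 = fs.S3FileSystem()  # any filesystem pyarrow supports works the same way
with s3.open_input_file("some-bucket/events.parquet") as f:
    pf = pq.ParquetFile(f)
    # The footer alone tells you the shape without reading any data pages.
    print(pf.metadata.num_row_groups, pf.metadata.num_rows)
    # Pull a single row group, and only the columns you care about.
    table = pf.read_row_group(0, columns=["user_id", "ts"])
```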
> CSV can be appended to
Yeah, that's easier. Row groups mean you can still do this with parquet, though granted it's not as easy. *However* I will point out that absolutely nothing stops someone from completely borking a CSV by appending something that isn't exactly the right format.
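A toy example of how little protection you get (throwaway file name, invented rows): appending to a CSV is just appending bytes, and nothing checks that those bytes match the header, column count, delimiter, or quoting already in the file.

```python
import csv

with open("people.csv", "w", newline="") as f:
    f.write("name,age\nAda,36\n")

with open("people.csv", "a", newline="") as f:   # a "helpful" append from elsewhere
    f.write("Bob;42;Oslo\n")                      # wrong delimiter, wrong column count

with open("people.csv", newline="") as f:
    print(list(csv.reader(f)))
# [['name', 'age'], ['Ada', '36'], ['Bob;42;Oslo']] -- still parses, now means garbage
```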
> CSV is dynamically typed
Not really. Everything is a string. You can get the same "flexibility" from any other format if you want it: JSON can hold numbers of any size too, if you just store them as strings.
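Illustration with the stdlib `csv` module: what comes back is strings, full stop, and any "typing" is a guess made by whoever reads the file next.

```python
import csv
import io

rows = list(csv.reader(io.StringIO("id,zip,flag\n001,02134,TRUE\n")))
print(rows[1])  # ['001', '02134', 'TRUE'] -- every value is a string
# Whether '001' stays text or collapses to the number 1, and whether 'TRUE'
# becomes a boolean, is decided by the reader, not by the file.
```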
> CSV is succinct
Yes, more so than JSONL, but not really more than (you guessed it) parquet. It's also horrific for compression compared to columnar formats, which can encode and compress each column separately.
> Reverse CSV is still valid CSV
Get a file format that doesn't absolutely suck and you can read it in reverse if you want. More usefully, you can read just the sections you actually care about!
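For example, with pyarrow (file, column, and filter values are made up), you can project just the columns you want and push a predicate down so row groups that can't match are skipped via their statistics:

```python
import pyarrow.parquet as pq

table = pq.read_table(
    "events.parquet",
    columns=["user_id", "amount"],     # read only these columns
    filters=[("country", "=", "NO")],  # row groups whose stats rule this out are skipped
)
```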
> Excel hates CSV
Helpfully, this just means that the most popular way of working with tabular data in the world doesn't play nicely with it.